How to Integrate a Streaming-First Data Architecture into Your Existing Infrastructure

by Caroline Maier

June 23, 2020 4:30pm


A Streaming-First Data Architecture is critical to real-time data ingestion



Let’s pretend we are back on Black Friday 2019. Bleary-eyed, early-morning shoppers await the familiar clink of the lock being pulled back as the doors finally open onto shiny new TVs, coats, and appliances. Meanwhile, websites open their digital doors to shoppers eager to explore the many deals that lie in wait.

With a real-time dashboard, you could adjust your website on the fly in response to what is selling in the store: push items getting strong traction, and pull deals on items that aren’t engaging your audience. What’s more, you can then make even better-informed decisions about what to offer, and how, on Cyber Monday. If you were capturing data every hour, every three hours, or worse yet, every day, you would miss critical business opportunities during an important and short-lived window. By building streaming into your data ingestion architecture, you unlock value for your organization that batch ingestion alone cannot deliver.

Batch still has an important role to play, however, as both real-time and historical analytics drive key insights for organizations. But in our current environment, we have all felt the pressure to adopt streaming into our infrastructure to guide us successfully into the future and achieve key business objectives.


Change Data Capture (CDC) should be a key component of your Streaming-First data strategy

Here is the good news for the many organizations that lack deep knowledge of open source streaming components and how to deploy them in a modern microservices environment: they can now focus on understanding their business models and data, and rely on a modern Change Data Capture (CDC) platform to do the hard data ingestion work.

The right CDC environment should simplify the conversion of batch workloads into streaming workloads and accelerate the creation of new streaming-first data pipelines with the potential to transform your business.

A modern CDC platform will capture changes in real time and replicate them with low latency into your data lake, data warehouse, operational database, or machine learning environment. With the right CDC environment, you can also transform and manipulate streaming data before it is replicated to different target platforms.
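To make the mechanics concrete, here is a minimal sketch of what consuming CDC change events looks like in plain code: a consumer reads Debezium-style change events from a Kafka topic and applies each insert, update, or delete to a target. The topic name, payload shape, and target helpers are illustrative assumptions, not any particular vendor’s API.

```python
import json
from kafka import KafkaConsumer  # kafka-python client

# Hypothetical target helpers -- stand-ins for writes to your warehouse or lake.
def upsert_into_target(row):
    print("UPSERT", row)

def delete_from_target(row):
    print("DELETE", row)

consumer = KafkaConsumer(
    "dbserver1.inventory.orders",               # hypothetical CDC topic for an "orders" table
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value
    op = event.get("op")                        # "c"=insert, "u"=update, "d"=delete, "r"=snapshot read
    if op in ("c", "u", "r"):
        upsert_into_target(event["after"])      # replicate the new row state
    elif op == "d":
        delete_from_target(event["before"])     # remove the old row
```

A real CDC platform layers delivery guarantees, schema handling, and monitoring on top of this basic loop, which is precisely the hard work you want to avoid reimplementing yourself.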


Multi-cloud data ingestion is here to stay


Very few organizations are 100% on-premises. Most are expanding into public cloud, or even multi-cloud, approaches as they move ahead with their current data infrastructure plans and look to the future. Usually we see some combination of the three major cloud service providers: AWS, Microsoft Azure, and Google Cloud Platform. Like other commercial products, each cloud provider has different strengths and weaknesses: one might be better at machine learning (e.g. GCP) while another shines with its infrastructure (e.g. AWS). Most importantly, organizations are increasingly reluctant to be “locked in” to just one cloud. With the advent of COVID-19 in particular, industries are all but certain to accelerate how they engage with the cloud, an activity that is likely already on the rise.

At Equalum, we believe strongly that you should not have to be locked into any particular cloud tool to successfully ingest real-time data. We can connect to any cloud you work with, replicating or transforming the data to meet your business needs. Access to our platform includes all of the CDC and streaming technologies you will need to push data to whatever cloud or alternative target you require, removing hassle and overhead.

So if you’re already embracing multi-cloud or planning to, we are here to help.


What kind of risks am I taking by implementing a Streaming-First Data strategy?


Streaming often enters an organization on top of layers of legacy batch processes. Some applications have been sitting there for years, so how can you create a new dynamic in your organization without causing massive disruption when you’re not looking at a greenfield IT environment?

If you want to adopt Apache Kafka or Spark directly, you will inherently assume some level of risk. You will have to find someone to manage the platforms and the data flowing through them. You will suddenly be stepping into streaming projects written in Scala or Java, forcing you to hire specialized data engineering and data ops staff to develop, manage, execute, and monitor your data ingestion pipelines. The costs can be high and the expertise in limited supply.
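To make that cost concrete, here is roughly what even a minimal hand-built streaming job looks like: a Spark Structured Streaming pipeline that reads JSON events from Kafka and lands them as Parquet (PySpark shown for brevity; production jobs are often Scala or Java). Every broker address, schema, path, and checkpoint setting here is an illustrative assumption, and all of it becomes yours to operate and tune.

```python
# Launch with the Kafka connector on the classpath, e.g.:
# spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 job.py
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructType

spark = SparkSession.builder.appName("orders-ingest").getOrCreate()

# Illustrative event schema -- you maintain this by hand and keep it
# in sync with whatever the upstream producers emit.
schema = (StructType()
          .add("order_id", StringType())
          .add("sku", StringType())
          .add("amount", DoubleType()))

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")  # brokers you run yourself
       .option("subscribe", "orders")                        # hypothetical topic
       .load())

orders = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("o"))
          .select("o.*"))

# Checkpointing, sink tuning, and failure recovery are also on you.
(orders.writeStream
 .format("parquet")
 .option("path", "/data/orders")                 # hypothetical target path
 .option("checkpointLocation", "/chk/orders")
 .start()
 .awaitTermination())
```

Multiply this by every source, target, and transformation in your estate, add broker sizing, upgrades, and alerting, and the staffing problem becomes clear.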

In Equalum’s platform, Kafka and Spark are orchestrated for you behind a drag-and-drop user interface. The platform can connect to any cloud and to numerous sources and targets, eliminating the overhead associated with these technologies. By employing CDC from various systems and sources, you can simply replicate data, or, using the platform’s streaming capabilities, transform it on the fly and employ real-time lookups and other dynamic features in one click. You no longer have to worry about questions like “is my Kafka broker configured properly?” or “how much memory should each Spark worker get for this particular workload?” Everything is completely managed.
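Conceptually, an on-the-fly transformation with a real-time lookup boils down to something like the sketch below: each change event is masked and enriched against reference data before it reaches the target. The field names and lookup table are hypothetical; this illustrates the idea, not Equalum’s actual interface.

```python
# Hypothetical in-memory lookup table; in production this might be a cache
# kept in sync with a reference database.
PRODUCT_CATEGORY = {"SKU-123": "televisions", "SKU-456": "appliances"}

def enrich(event):
    """Transform one change event in flight before it reaches the target."""
    row = dict(event["after"])                    # copy; leave the source event intact
    row["email"] = "***redacted***"               # simple masking transform
    row["category"] = PRODUCT_CATEGORY.get(row.get("sku"), "unknown")  # real-time lookup
    return row

# Example: a CDC insert event for an order row (illustrative shape).
sample = {"op": "c", "after": {"sku": "SKU-123", "email": "jane@example.com"}}
print(enrich(sample))
# {'sku': 'SKU-123', 'email': '***redacted***', 'category': 'televisions'}
```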

Additionally, Equalum's end-to-end pipeline can accommodate both streaming and batch at any data volume thanks to its scale-out architecture: adding nodes adds processing power, so the platform can grow to handle whatever volume of data you send through it.

Best of all, deployment is quick. Instead of spending years building a complex, custom system, within a few days of deployment and training your organization can leverage fully managed data streaming pipelines to power analytics.


Start Small with Streaming-First Data and Grow from There: Focus on Greenfield Opportunities


For many organizations, a long-sought vision has been to employ streaming data and streaming analytics to solve problems faster than ever before. You might not have a week, a month, or a year to arrive at a solution: the time is now, and the data must be too. But how can you achieve this with existing integration architectures, entrenched batch windows, and years of accumulated scripts driving complex, sometimes redundant, and often error-prone flows of information?

Start with a greenfield project as you embark upon streaming. Think proactively about new use cases while laying the infrastructure to migrate existing workloads.

Before you think about adopting streaming across your entire organization, identify bottlenecks in your business where appropriate real-time solutions could create quick wins. Once that business use case is proven, you can gradually fan out and apply the same framework to other use cases. Over time, you can start migrating batch workloads as well and watch the success of real-time data streaming unfold.



Interested in learning more about Streaming-First Data Architectures?

Download our White Paper "The Why and How of Streaming-First Data Architectures"
