Inside look at Equalum & live discussion on Data Ingestion Strategy

by Caroline Maier

April 15, 2020 8:12pm


Data Ingestion Strategies


According to Gartner, many legacy tools used for data ingestion and integration will be brought together in one unified solution, allowing for data streaming and replication in a single environment, based on what modern data pipelines require.

In our Office Hours event on April 15, 2020, Equalum's team of data experts walked attendees through the many facets of Data Ingestion, the top considerations you must evaluate as you build a Data Ingestion strategy, and a quick look behind the curtain at Equalum's powerful, end-to-end data ingestion platform. Below are the highlights.

Office Hours Moderator - Alton Dinsmore

Alton Dinsmore - our moderator for Office Hours - came to Equalum with a wealth of experience at large industry leaders. He has spent 40+ years working on complex data problems, with a focus on designing and architecting very large data analytics systems, complex data ingestion and processing systems, and moving large workloads to the cloud. He now offers that experience to our customers and Equalum followers.

Trends We Will See as Legacy Data Ingestion Tools Meet Modern Data Pipelines

Gartner reports that legacy CDC tools performing data integration will be brought together in one unified solution, allowing you to do streaming, data replication, and transformation in one environment. Gartner sees that trend happening by 2023, based on what modern data pipelines now require: fast reads of incoming data, plus the ability to transform and manipulate that data on the fly, because that is what target systems demand.
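To make "transform on the fly" concrete, here is a minimal sketch in Python: change events are reshaped as they stream through, rather than landing in a staging area first. The event shape and field names are hypothetical illustrations, not any vendor's actual format.

```python
def change_events():
    """Simulate a stream of change-data-capture (CDC) events."""
    yield {"op": "insert", "table": "orders", "row": {"id": 1, "amount": "19.99"}}
    yield {"op": "update", "table": "orders", "row": {"id": 1, "amount": "24.50"}}

def transform(events):
    """Manipulate each event in flight: cast types, tag the change type."""
    for event in events:
        row = dict(event["row"])
        row["amount"] = float(row["amount"])  # cast for the target system
        row["_op"] = event["op"]              # preserve insert/update semantics
        yield row

# The transform is applied lazily, event by event, as the stream flows.
pipeline = transform(change_events())
```

Because both stages are generators, no event is buffered longer than needed, which is the property that distinguishes streaming transformation from batch ETL.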

You might be dealing with data ingestion issues due to legacy tools. You might have a change data capture (CDC) environment processing streaming data alongside an ETL environment processing batch data. Often those data delivery styles operate in legacy silos: in one silo, streaming data replication with a CDC tool that has provided low latency for the past 20 years; in the other, a proprietary ETL tool that has processed batch data for the past 30 years. All of these tools have served an important function so far, but they may not serve you well in a modern data pipeline. On one side you get very fast data ingestion and acquisition but no ability to transform and manipulate the data; on the other, you can transform and manipulate, but you cannot stream in real time.

Many legacy data platforms are not adapting to new, modern data sources

Many traditional, legacy platforms are having a hard time adapting to new, modern data sources, especially those not provided by the platform's creator. They also face issues native to their proprietary architectures, which limits agility, scalability, and flexibility for users. All of this works against the logic you want to apply to a modern data ingestion strategy and its tools.

D.I.Y Data Ingestion Solutions can quickly become derailed and costly

While it is possible to build your own Data Ingestion platform from open-source frameworks, if you don't know how to install, integrate, monitor, manage, and orchestrate your data, you will very likely hit roadblocks on your path to success. You will also likely lack an environment optimized for your users, with drag-and-drop pipelines and other no-code tools.

Incorporate the 7 Core Principles when Building a Modern Data Ingestion Platform

We reviewed the 7 Core Principles of Modern Data Ingestion in the video:

  • Comprehensive Data Replication
  • Performance, Reliability, Ease of Use
  • Broad Support for Sources and Targets
  • Multi-Structured Data Support
  • Extensive ETL-like Operations
  • Fully Orchestrated OSS Foundation
  • Hybrid Deployment

We built Equalum with all of these principles in mind, on the open-source foundations of Kafka and Spark, which scale very well. The front end connects to sources and performs Change Data Capture; ETL operators in the middle transform that data; the platform then synchronizes the data and pushes it out to the targets where you want it stored. A target can in turn become a source for another workflow as needed.
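The source → operators → target flow described above, including a target feeding a second workflow, can be sketched roughly as follows. This is an illustrative toy in plain Python, not Equalum's API; all names are hypothetical.

```python
def run_pipeline(source_rows, operators, target):
    """Apply each ETL operator in order to every row, then write to the target."""
    for row in source_rows:
        for op in operators:
            row = op(row)
        target.append(row)
    return target

# First workflow: normalize raw source rows into a "warehouse" target.
warehouse = run_pipeline(
    [{"name": " Ada "}, {"name": "Grace"}],
    operators=[lambda r: {"name": r["name"].strip()}],
    target=[],
)

# The target of the first workflow now acts as the source of a second one.
report = run_pipeline(
    warehouse,
    operators=[lambda r: {"name": r["name"].upper()}],
    target=[],
)
```

The point of the sketch is the chaining: nothing about a "target" is terminal, so the same pipeline machinery can consume it as a source downstream.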

Watch the video to see our brief platform demo.

Interested in learning more?

Download our White Paper "A Step by Step Guide to Data Integration Modernization"


Ready to Get Started?

Experience Enterprise-Grade Data Integration + Real-Time Streaming