5 Insider Tips Before You Embark On A Streaming Data Architecture

by Caroline Maier

June 26, 2020 7:07am


With the emphasis on moving to a streaming data architecture, it might feel like a rip-and-replace of your current architecture is the only way to ensure success, but that’s hardly the case. Replacing complex, existing systems should not be your first move, and it is rarely something your team can adopt culturally all at once. Bottom line: change is hard, so smaller steps often lead to quicker adoption and long-term gains.


#1) Consider a Greenfield Project with New Revenue Streams to Implement Streaming & Build Cultural Adoption


So often, cultural adoption is what keeps organizations from embarking on a streaming journey. The Data Giants (or sometimes dinosaurs) are reluctant to break from what is known and functional. New technology can feel like a threat to their expertise and even their job security. Proceed knowing that as your architecture becomes streamlined and the load on the tech team lightens, there will be new room for innovation, collaboration and future projects where the experience and enthusiasm of all players can be fully utilized.

Take a good look at the business and identify bottlenecks where real-time data, instead of batch, could deliver a quick win and even generate new revenue streams. Give your team and business leaders a chance to see the true value of what streaming can achieve. Once you demonstrate success and the viability of the new revenue channel, the same architecture can be brought to other areas of your organization more quickly. You’ll have not only a proof of concept, but also infrastructure in place and buy-in from those who will need to learn and work with your new system.




#2) If You’re Choosing a DIY Approach, Prepare for the Long Haul

For years, many companies relied on ETL tools and are now transitioning to open source technologies like Kafka, Spark or Hadoop as they try to embrace streaming on a larger scale. But as people who have walked the DIY path know all too well, these technologies can be hard to maintain, manage and build on. Sure, open source frameworks are free, but they come with steep demands for technical expertise and coding, and little dedicated support when things go wrong. You will invest countless hours patching your system in the gray areas between platforms, and lose valuable ground when a team member leaves, taking their code knowledge and expertise with them. Before you know it, you will be trying to maintain and manage an unwieldy monster of a system, reinventing the wheel when you don’t have to. It might work, but it will consume more of your IT team’s time than anything else.

#3) Change Data Capture is KEY

Ultimately, we all want to avoid risk as much as possible. If you want to stream data from your data sources, querying a billion records every second is not going to work, and neither will rewriting the application. Change Data Capture (CDC) is your bridge: it captures data changes at the source. CDC captures incremental changes by listening to the source’s own change mechanisms, e.g. database transaction logs, journaling mechanisms within messaging services, REST APIs, etc. Every source is different and needs its own capture approach. CDC is how you will solve the first and hardest problem of a streaming architecture: capturing data from the source seamlessly, without making any changes to the application.
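To make the contrast with repeated querying concrete, here is a minimal Python sketch of log-based CDC. It assumes a hypothetical, in-memory, append-only change log ordered by a log sequence number (LSN), standing in for a real database transaction log; `ChangeEvent`, `CdcReader` and `apply_change` are illustrative names, not any product’s API. The reader remembers the last position it consumed and fetches only newer changes, so the source tables are never re-queried.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One entry in a hypothetical source transaction log."""
    lsn: int    # log sequence number: position in the log, strictly increasing
    table: str
    op: str     # "insert", "update" or "delete"
    key: Any
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


class CdcReader:
    """Reads only the changes recorded after the last position it consumed."""

    def __init__(self, log: List[ChangeEvent]) -> None:
        self._log = log      # assumed append-only and ordered by lsn
        self.last_lsn = 0    # would be persisted in a real system

    def poll(self) -> List[ChangeEvent]:
        # Incremental read: skip everything already consumed instead of
        # re-querying the source tables.
        new = [e for e in self._log if e.lsn > self.last_lsn]
        if new:
            self.last_lsn = new[-1].lsn
        return new


Replica = Dict[str, Dict[Any, Dict[str, Any]]]


def apply_change(replica: Replica, event: ChangeEvent) -> None:
    """Replay one change against an in-memory copy of the source tables."""
    table = replica.setdefault(event.table, {})
    if event.op == "delete":
        table.pop(event.key, None)
    else:  # inserts and updates both carry the full new row image
        table[event.key] = dict(event.row or {})


# Usage: two changes to one order, then a delete that arrives later.
log = [
    ChangeEvent(1, "orders", "insert", 101, {"status": "new"}),
    ChangeEvent(2, "orders", "update", 101, {"status": "shipped"}),
]
reader = CdcReader(log)
replica: Replica = {}
for event in reader.poll():
    apply_change(replica, event)
print(replica)  # {'orders': {101: {'status': 'shipped'}}}

log.append(ChangeEvent(3, "orders", "delete", 101))
for event in reader.poll():  # returns only the lsn-3 event
    apply_change(replica, event)
print(replica)  # {'orders': {}}
```

In a real deployment the last consumed position would be stored durably so that a restarted reader resumes where it left off rather than from the beginning of the log.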


#4) Use Replication Groups to Simplify Replication of High Volumes of Data and Changes

Replication groups allow a streaming-first data platform to process non-stop changes to groups of tables in one shot. If multiple tables are updated in one transaction, those changes are captured together. With replication groups, you can select tables for replication by name or by name pattern. If replication groups are not supported, the user is forced to create a separate data flow for each table, each executed independently: a laborious and error-prone process, to say the least.
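A replication group can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical `Change` record that carries its source transaction ID; `ReplicationGroup` and its methods are illustrative names. Tables are selected by exact name or by shell-style pattern using the standard library’s `fnmatch.fnmatchcase`, and matching changes are batched per transaction so a multi-table transaction is delivered as one unit.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Dict, List


@dataclass(frozen=True)
class Change:
    txid: int      # source transaction that produced the change
    table: str
    payload: str


@dataclass
class ReplicationGroup:
    """Hypothetical group: one definition covers every table it matches."""
    names: List[str] = field(default_factory=list)     # exact table names
    patterns: List[str] = field(default_factory=list)  # shell-style, e.g. "sales.*"

    def includes(self, table: str) -> bool:
        return table in self.names or any(fnmatchcase(table, p) for p in self.patterns)

    def batch_by_transaction(self, changes: List[Change]) -> List[List[Change]]:
        """Drop non-member tables and group the rest by source transaction,
        so all changes from one transaction are delivered together.
        Assumes `changes` arrive in commit order."""
        batches: Dict[int, List[Change]] = {}  # dicts keep insertion order
        for change in changes:
            if self.includes(change.table):
                batches.setdefault(change.txid, []).append(change)
        return list(batches.values())


# Usage: one transaction touches two sales tables plus an unrelated audit table.
group = ReplicationGroup(names=["crm.customers"], patterns=["sales.*"])
stream = [
    Change(7, "sales.orders", "order 1"),
    Change(7, "sales.order_items", "item 1a"),
    Change(7, "audit.log", "not replicated"),
    Change(8, "crm.customers", "customer 42"),
]
for batch in group.batch_by_transaction(stream):
    print(batch[0].txid, [c.table for c in batch])
# 7 ['sales.orders', 'sales.order_items']
# 8 ['crm.customers']
```

One pattern replaces a per-table flow definition: any future table whose name matches `sales.*` is covered without further configuration.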

Schema evolution is a common feature of replication groups: database schema changes are handled automatically, with options for the customer to decide how they want schema changes propagated. When a new table appears at the source and its name matches the replication pattern, the platform creates the table in the target database and starts replicating it immediately. When considering streaming-first data platforms, keep in mind that replication groups and schema evolution should radically simplify massive data streaming processes.
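The schema evolution behavior described above can be sketched as follows. This is a simplified, hypothetical model: the target catalog is an in-memory dict, the generated DDL is only recorded rather than executed, and the `column_policy` option (`"propagate"` or `"ignore"`) is an illustrative stand-in for the customer’s choice of how schema changes are propagated. Only newly created tables and added columns are handled; dropped or retyped columns would need policies of their own.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

Schema = Dict[str, str]  # column name -> type, e.g. {"id": "INT"}


class SchemaEvolver:
    """Hypothetical sketch of automated schema evolution for one replication pattern."""

    def __init__(self, pattern: str, column_policy: str = "propagate") -> None:
        if column_policy not in ("propagate", "ignore"):
            raise ValueError("column_policy must be 'propagate' or 'ignore'")
        self.pattern = pattern
        self.column_policy = column_policy
        self.target: Dict[str, Schema] = {}  # stands in for the target database catalog
        self.ddl_log: List[str] = []         # statements that would run on the target

    def on_source_schema(self, table: str, schema: Schema) -> None:
        """Called whenever the source reports a table's current schema."""
        if not fnmatchcase(table, self.pattern):
            return  # not part of this replication group
        if table not in self.target:
            # New matching table: create it on the target and start replicating.
            self.target[table] = dict(schema)
            cols = ", ".join(f"{c} {t}" for c, t in schema.items())
            self.ddl_log.append(f"CREATE TABLE {table} ({cols})")
            return
        if self.column_policy == "ignore":
            return
        # Existing table: propagate any columns added at the source.
        for col, typ in schema.items():
            if col not in self.target[table]:
                self.target[table][col] = typ
                self.ddl_log.append(f"ALTER TABLE {table} ADD COLUMN {col} {typ}")


# Usage
evolver = SchemaEvolver("sales.*")
evolver.on_source_schema("sales.orders", {"id": "INT", "total": "DECIMAL"})
evolver.on_source_schema("sales.orders", {"id": "INT", "total": "DECIMAL", "currency": "CHAR(3)"})
evolver.on_source_schema("sales.refunds", {"id": "INT"})  # a new table appears
evolver.on_source_schema("hr.salaries", {"id": "INT"})    # outside the pattern
for stmt in evolver.ddl_log:
    print(stmt)
# CREATE TABLE sales.orders (id INT, total DECIMAL)
# ALTER TABLE sales.orders ADD COLUMN currency CHAR(3)
# CREATE TABLE sales.refunds (id INT)
```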


#5) Find a Data Ingestion Solution That Simplifies Your Data Architecture

Everyone talks about simplifying their architecture, but achieving a truly streamlined approach is another story. Here are a few key components you should look for:

  • An end-to-end platform that can accommodate Streaming Replication (for ELT), Streaming ETL and Batch ETL, ideally with a no-code, drag-and-drop UI
  • Easy to deploy and onboard, with ongoing support from start to finish
  • A fully orchestrated platform that makes monitoring, alerting and management easy
  • Scalable: grows with you as your data volume, processing complexity and use cases grow
  • Cloud-agnostic: runs in any cloud, across multiple clouds or on-premises
  • Transformation, aggregation and correlation
  • Built on cutting-edge, best-of-breed open source distributed processing frameworks such as Spark and Kafka
  • Exactly-once guarantee
  • High availability and enterprise-grade security
  • Failure recovery from sources and targets
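The exactly-once and failure-recovery items are often achieved by committing each output together with the consumed source position in a single transaction, so a restart resumes from the last committed position without losing or duplicating results. The sketch below illustrates that one technique using Python’s standard `sqlite3` module; it is an assumption about how such a guarantee can be built, not a description of any particular platform. `open_sink`, `committed_offset` and `run` are hypothetical names, and `crash_after` simply injects a failure between the write and the checkpoint update.

```python
import sqlite3
from typing import List, Tuple

Event = Tuple[int, str]  # (source offset, payload)


def open_sink() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (payload TEXT)")
    conn.execute("CREATE TABLE checkpoint (id INTEGER PRIMARY KEY CHECK (id = 1), pos INTEGER)")
    conn.execute("INSERT INTO checkpoint VALUES (1, 0)")
    conn.commit()
    return conn


def committed_offset(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT pos FROM checkpoint WHERE id = 1").fetchone()[0]


def run(conn: sqlite3.Connection, source: List[Event], crash_after: int = -1) -> None:
    """Process every event past the checkpoint; crash_after simulates a failure."""
    start = committed_offset(conn)
    processed = 0
    for offset, payload in source:
        if offset <= start:
            continue  # already committed before a failure: skip, never duplicate
        # The output row and the new checkpoint share ONE transaction: the
        # connection's context manager commits on success, rolls back on error.
        with conn:
            conn.execute("INSERT INTO results VALUES (?)", (payload.upper(),))
            if processed == crash_after:
                raise RuntimeError("simulated crash between write and checkpoint")
            conn.execute("UPDATE checkpoint SET pos = ? WHERE id = 1", (offset,))
        processed += 1


# Usage: crash while handling offset 2, restart, then replay everything again.
source = [(1, "a"), (2, "b"), (3, "c")]
sink = open_sink()
try:
    run(sink, source, crash_after=1)
except RuntimeError:
    pass
run(sink, source)  # restart: resumes after offset 1
run(sink, source)  # a redundant replay changes nothing
print([r[0] for r in sink.execute("SELECT payload FROM results ORDER BY rowid")])
# ['A', 'B', 'C']
```

Because the checkpoint and the output share a transaction, the simulated crash rolls back the half-written row, and the restart processes that event exactly once.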

Streaming data will change the way your business operates. It’s an exciting new chapter, but one that needs a thoughtful and strategic approach to implementation. When the right steps are put in place from the start, you and your organization can reap the many benefits that follow.



Interested in learning more?


Download our Whitepaper in collaboration with Eckerson: The How & Why of Streaming First Data Architectures

