
Welcome to the second issue of the datum newsletter. Thank you so much for subscribing! Share it with a few friends if you feel like it.
An article on seizing the opportunity in data quality, published in MIT Sloan Management Review, says:
“The cost of bad data is an astonishing 15% to 25% of revenue for most companies. Two-thirds of these costs can be eliminated by getting in front on data quality.”
The situation is no different for real-time streaming data. I have developed a functional prototype for data quality column profiling using the Kafka Streams API, including test cases, that you can start using straight away in your projects. The project is available on GitHub as kafka-streams-dataquality: use, share, and contribute. Do let me know your feedback.
You can find the blog post here:
Streaming data quality — How to implement column profiling using Kafka Streams?— Data Quality Series
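To give a flavor of what column profiling means here, below is a minimal plain-Java sketch (not the kafka-streams-dataquality project's actual code) of the kind of per-column statistics — count, null count, min, max — that one could maintain inside a Kafka Streams aggregation; the class and column names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-column profile stats of the kind you would
// accumulate inside a Kafka Streams aggregate(). Not the project's real code.
public class ColumnProfile {
    long count = 0;      // total values seen
    long nullCount = 0;  // how many were null
    Double min = null;   // smallest non-null value
    Double max = null;   // largest non-null value

    void update(Double value) {
        count++;
        if (value == null) { nullCount++; return; }
        if (min == null || value < min) min = value;
        if (max == null || value > max) max = value;
    }

    public static void main(String[] args) {
        // Profile one numeric column from a stream of records.
        Map<String, ColumnProfile> profiles = new HashMap<>();
        Double[] priceColumn = {10.0, null, 3.5, 42.0};
        for (Double v : priceColumn) {
            profiles.computeIfAbsent("price", k -> new ColumnProfile()).update(v);
        }
        ColumnProfile p = profiles.get("price");
        System.out.println(p.count + " " + p.nullCount + " " + p.min + " " + p.max);
        // prints: 4 1 3.5 42.0
    }
}
```

In a real Kafka Streams topology, an accumulator like this would be the aggregate state keyed by column name, updated per record and emitted downstream.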
Archives and Recommendations
Understanding tradeoffs in designing real-time streaming analytical applications
There is no good or bad design; instead, there will be many tradeoffs to make and, hopefully, those tradeoffs are good for a particular use…
How do you explain distributing computing and Apache Spark with different levels of complexity
How do you explain Spark distributed computing to a 7-year-old kid, a 9th-grade student, a software engineer (Java), an ETL engineer, a machine learning engineer, and an executive?
Apache Spark performance recipe — Explicitly cache RDD when branching out from parent RDD
The word count example below illustrates the importance of caching the RDD when the RDD lineage breaks/branches out.
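The linked post covers the Spark word count in full; as a rough plain-Java analogy (this is not Spark code and not the post's example), here is the core idea: when two branches consume the same lazy parent without caching, the parent lineage is recomputed once per branch, whereas caching materializes it once.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical plain-Java analogy for Spark RDD caching, not Spark code:
// a lazy "parent" computation that two child branches both consume.
public class CacheDemo {
    public static void main(String[] args) {
        AtomicInteger parentRuns = new AtomicInteger(0);

        // "Parent RDD": an expensive lazy computation.
        Supplier<int[]> parent = () -> {
            parentRuns.incrementAndGet(); // count how often the parent is (re)computed
            return new int[]{1, 2, 3, 4};
        };

        // Without cache(): each branch re-triggers the parent computation.
        int sum = Arrays.stream(parent.get()).sum();              // branch 1
        int max = Arrays.stream(parent.get()).max().getAsInt();   // branch 2
        System.out.println("uncached parent runs: " + parentRuns.get()); // prints 2

        // With cache(): materialize once, reuse across both branches,
        // analogous to parentRDD.cache() followed by the two actions.
        parentRuns.set(0);
        int[] cached = parent.get();
        sum = Arrays.stream(cached).sum();
        max = Arrays.stream(cached).max().getAsInt();
        System.out.println("cached parent runs: " + parentRuns.get()); // prints 1
    }
}
```

The same accounting applies to Spark: without an explicit `cache()`, each action on a child RDD re-executes the parent lineage from the source.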
You are receiving this email because you have subscribed via our website. All the posts are available on the website.
Disclaimer: All the opinions expressed are personal independent thoughts and not to be attributed to my current or previous employers.