Soda announced today the launch of Soda SQL to bring data testing, monitoring and profiling to SQL-accessible data. Soda SQL is the first part of Soda’s strategy to provide freely available open source data management tools to engineers working in data-intensive environments where data quality is paramount. The launch follows the news that Soda has raised over €14M in Series A and Seed funding.
Businesses in online retail, biomedical, financial services and other fast-moving sectors are building more and more products with data as a core input, which makes it critical to test and monitor the quality of that data. Soda SQL helps data engineers identify, monitor and solve issues that might affect large datasets, data lakes and data warehouses. It equips them to screen data pipelines and monitor data through a series of fully configurable tests that examine different characteristics of a dataset.
“Almost every company is now trying to automate processes and create innovative new data products and services using data, but the key challenge for teams across the Enterprise is having data that is reliable enough to make this happen,” explains Tom Baeyens, Co-Founder and Chief Technology Officer, Soda. “By providing highly configurable and open source SQL data testing capabilities, Soda is taking the first step towards empowering data engineers with the right tools to meet these challenges.”
The configuration options within Soda SQL let data engineers control both the tests that screen for bad data and the metrics used to evaluate the results. Soda SQL uses efficient SQL queries to extract data metrics and column profiles, and declarative YAML configuration files give engineers full control over those queries. The tests run across the data pipeline and trigger alerts when problematic or bad data is found. The results can be viewed directly and used to catch problems, quarantine bad data and send updates to the Soda Enterprise data monitoring. This allows testing by individual data engineers to be integrated with the enterprise-wide data testing strategy.
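As a rough illustration of the general idea, the sketch below computes a few column metrics with a single SQL query and evaluates declaratively configured tests against them. It is not Soda SQL's actual API or configuration format: the `SCAN_CONFIG` dictionary (standing in for a YAML file), the function names, the `orders` table and the in-memory SQLite database are all assumptions made for this example.

```python
import sqlite3

# Hypothetical scan configuration, standing in for a declarative YAML file.
# Each test maps a metric name to a pass/fail condition on that metric.
SCAN_CONFIG = {
    "table": "orders",
    "column": "amount",
    "tests": {
        "row_count": lambda value: value > 0,
        "missing_count": lambda value: value == 0,
        "min_value": lambda value: value >= 0,
    },
}


def build_demo_connection():
    """Create an in-memory table with one missing and one negative value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (amount REAL)")
    conn.executemany(
        "INSERT INTO orders (amount) VALUES (?)",
        [(10.0,), (None,), (-5.0,)],
    )
    return conn


def compute_metrics(conn, table, column):
    """Extract several metrics for one column in a single aggregate query."""
    # Table and column names come from trusted configuration in this sketch;
    # they are interpolated because SQL identifiers cannot be bound parameters.
    query = (
        f"SELECT COUNT(*), COUNT(*) - COUNT({column}), MIN({column}) "
        f"FROM {table}"
    )
    row_count, missing_count, min_value = conn.execute(query).fetchone()
    return {
        "row_count": row_count,
        "missing_count": missing_count,
        "min_value": min_value,
    }


def run_tests(metrics, tests):
    """Evaluate each configured test; True means the test passed."""
    return {name: bool(check(metrics[name])) for name, check in tests.items()}


if __name__ == "__main__":
    conn = build_demo_connection()
    metrics = compute_metrics(conn, SCAN_CONFIG["table"], SCAN_CONFIG["column"])
    for name, passed in run_tests(metrics, SCAN_CONFIG["tests"]).items():
        print(f"{name}: {'passed' if passed else 'FAILED'} (value={metrics[name]})")
```

In this sketch a failed test is only printed; in a pipeline, a failure could instead raise an alert or stop bad data from moving downstream.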
“In software engineering, we started testing to catch bugs more than 20 years ago and we have never looked back,” explains Maarten Masschelein, CEO, Soda. “These software engineering principles are now making their way into the data engineering world, where the subject of testing is not code, but the data. By providing the tools for users to work collaboratively with data teams across the company, Soda helps data engineers to detect data quality issues early and involve wider team members to ensure quality is maintained and communicated. Soda SQL is an important first step towards becoming the de facto collaboration platform for Enterprises solving challenges arising with data products.”
Soda is developing a comprehensive suite for open source data testing and monitoring that will include developer tools for data frames and streaming data. The suite is intended to operate across all major data workloads, engines and environments, including Kafka, Spark, AWS S3, Azure Blob Storage, Google Cloud Datastore, Presto, Snowflake, Azure Synapse, Google BigQuery, and AWS Redshift. Soda SQL is provided as open source software under the Apache License and is available for free on GitHub. It runs either in the cloud or on data engineers’ local systems.