
Monte Carlo Launches Observability for Azure Data Factory, Informatica, and Databricks Workflows

Michael Segner

Michael writes about data engineering, data quality, and data teams.

Monte Carlo, the data observability leader, today announced new observability features for Azure Data Factory, Informatica, and Databricks Workflows. With the new release, Monte Carlo users can gain visibility into pipeline failures, lineage, and pipeline performance – all in a single pane of glass.

To provide business insights and power critical applications, engineers commonly use orchestration or ETL tools to create data pipelines that extract data from a variety of sources and then aggregate it into the form required for consumption. 
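For readers less familiar with this pattern, a minimal sketch of such a pipeline might look like the following. Everything here – file paths, column names, the table name, and the warehouse connection – is a hypothetical placeholder for illustration, not anything specific to Monte Carlo or the tools named above.

```python
# A minimal extract-transform-load sketch. All paths, names, and the
# warehouse connection are hypothetical placeholders.
import pandas as pd
import sqlalchemy

# Extract: pull raw records from two different sources.
orders = pd.read_csv("exports/orders_2024-06-01.csv")      # e.g. a nightly file drop
customers = pd.read_parquet("exports/customers.parquet")   # e.g. a CRM extract

# Transform: join and aggregate into the shape downstream consumers expect.
daily_revenue = (
    orders.merge(customers, on="customer_id", how="left")
          .groupby(["order_date", "region"], as_index=False)["amount"]
          .sum()
)

# Load: publish the curated table that dashboards and reports read from.
engine = sqlalchemy.create_engine("sqlite:///warehouse.db")  # stand-in for a real warehouse
daily_revenue.to_sql("daily_revenue", engine, if_exists="replace", index=False)
```

In practice, an orchestration or ETL tool such as Azure Data Factory, Informatica, or Databricks Workflows schedules and chains many steps like this one, which is exactly where failures can slip in unnoticed.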

These data pipelines, and the integrations between them, can fail, resulting in a cascade of data quality issues. Data engineers get overwhelmed by alerts across multiple tools, struggle to associate pipelines with the data tables they impact, and have no visibility into how pipeline failures create data anomalies.

With Monte Carlo’s end-to-end data observability platform, data teams can now get full visibility into how each Azure Data Factory, Informatica, or Databricks Workflows job interacts with downstream assets such as tables, dashboards, and reports.

This gives users a single pane of glass to quickly understand how pipeline failures create data issues and map which assets are impacted so they can inform stakeholders. Furthermore, alerts on pipeline failures are all routed and managed in the same place, streamlining and simplifying data team operations and preventing ownership gaps.

Users can easily see that Jira_support_issue_load is the Databricks job populating the table in question.
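The contents of that job aren't shown in the announcement, but as a rough, hypothetical sketch, a Databricks Workflows task like Jira_support_issue_load might boil down to something like this. The source path, schema, and target table name are assumptions for illustration only.

```python
# Hypothetical sketch of a Databricks Workflows task that populates a table
# which downstream assets depend on. Paths, columns, and the table name are
# assumed for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically inside a Databricks job

# Read raw support-issue exports landed in cloud storage.
raw_issues = spark.read.json("/mnt/raw/jira/support_issues/")

support_issues = (
    raw_issues
    .withColumn("created_at", F.to_timestamp("fields.created"))
    .select("key", "fields.summary", "fields.status.name", "created_at")
)

# If this write fails or produces bad data, every table, dashboard, and report
# downstream of analytics.support_issues is affected; that relationship is what
# lineage maps back to the job.
support_issues.write.mode("overwrite").saveAsTable("analytics.support_issues")
```

Tying the job to the table it writes is what lets an engineer jump straight from an anomalous table to the pipeline run that produced it.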

“For many organizations pipeline and system failures are a daily fact of life,” said Lior Gavish, CTO and co-founder, Monte Carlo. “Traditional approaches to data quality focus exclusively on detecting that data is broken, not understanding the why, which often goes beyond the data itself. Data observability is the only approach that monitors data, systems, and code to both detect and resolve data quality issues at scale and across your entire data ecosystem.

“We’re going to continue building integrations for every system our customers use to produce, process, and consume data, because it is the only way to ensure data quality issues get detected AND resolved. These systems in particular have been strongly adopted across the Fortune 500 and are critical to how those organizations deliver data reliability,” he added.

Eureciclo, a Brazil-based reverse logistics company, significantly reduced its average time to resolve data quality issues, thanks in part to Monte Carlo’s data lineage and pipeline failure alerts.

“When we had a pipeline failure our data engineer had to access every single system. He was going into Databricks, going into Airflow…and now suddenly all the information he needs is in one place for him,” said André Gonzalez, Data Manager, Eureciclo. “It’s made his post-mortem reports better and made the process much more efficient overall.”

Monte Carlo was the first data observability platform to support multi-system data lineage and alert on job failures for orchestration tools such as dbt, Airflow, Fivetran, and Prefect. It now supports more native ETL integrations than any other data observability provider, including pipelines common in large organizations.

Building The Data Observability “Data, System, Code” Framework

Incident management and resolution is a core focus of Monte Carlo’s development roadmap. The data observability platform recently announced multiple features – including new integrations with Databricks and GitLab – to make it easier for data teams to identify and resolve data quality incidents caused by code-related issues.

“Historically, data quality solutions have been focused on detecting data quality incidents,” said Barr Moses, CEO and co-founder, Monte Carlo. “Our vision is to take the next step and provide visibility into the three primary ways data breaks: problematic data sources, system failures, and code changes. Accelerating the root cause analysis process makes data teams more efficient, and reduces the risk that poor data quality will negatively impact business operations and investments in AI.”

To learn more about Monte Carlo, visit https://www.montecarlodata.com/
