Credit Karma’s Journey to Reliable Generative AI Models with Data Observability
Credit Karma is best known for providing credit scores to millions of consumers — but the fin-tech platform also delivers support for finding financial products like identity monitoring, insurance, checking accounts, saving accounts and loans. And the recommendation engines that power those experiences rely on a massive amount of data.
As we learned at IMPACT 2023 from Vishnu Ram, Credit Karma’s VP of Engineering, his team handles more than 65 billion model predictions every single day.
“All of these model predictions help our members get the best recommendations in terms of financial products that they want and have access to in the market,” Vishnu says.
And while that workload alone would be enough to keep any data organization busy, Credit Karma isn’t stopping there. Like many data-driven companies, they’re focused on operationalizing generative AI to provide even more value to their customers and their business.
But how does a fintech company — highly regulated, and dealing with huge volumes of highly sensitive data — build reliable experiences with GenAI?
By ensuring data quality, of course. Read on to learn how Vishnu and his team do exactly that.
Table of Contents
Credit Karma’s data + AI journey
Over the decade that Vishnu has spent at Credit Karma, the company’s experienced massive growth. When he joined the team in 2014, Credit Karma already had 25 million users — but over the next five years, they added 70 million more. And from 2017 to 2023, the company’s employee headcount doubled.
Along with growth in sheer numbers, there was an increasing appetite for data at Credit Karma. Teams began to build data-powered capabilities to expand beyond the brand’s core credit score monitoring functionality, like delivering recommendations for financial products like insurance, credit cards and loans. From 2018 to 2023, the data team grew from handling 8 billion model predictions per day to the aforementioned 65 billion, with additional training data growth to support the data science team.
“Everyone wanted to use data because they could see some teams getting a lot of benefits from using the data and delivering a lot of wins,” says Vishnu. “We really had to lean on building out a technology platform to deal with this kind of scale.”
The Credit Karma team needed to ensure the data they were serving up to stakeholders was reliable and accurate. “When everyone started using data to deliver member benefits, from an impact perspective, we were creating a lot of value,” says Vishnu. “We wanted to build a safety net that allowed these teams to do the things they wanted to do. We needed to ensure that they can rely on the systems and the data, so the value they’re delivering is not transient.”
And as they continually increased spending on their cloud infrastructure and storage, they had to make sure they were demonstrating a good return on that investment.
“We wanted to make sure we were able to manage our cloud spend as efficiently as possible,” says Vishnu. “And we needed to help our stakeholders understand why we’re spending all this money on the cloud, and what value the data was helping us deliver.”
But just as the team was focusing on improving stakeholder transparency, a challenge started to emerge: data silos.
Problem: Lack of data discoverability and quality
As more teams across Credit Karma used data to build products and support members, they wanted to protect the value of their work and ensure they had control over their data. But it was clear that there were differing levels of data awareness and skills on those teams. “Everyone wanted to use data to deliver value, but there are only so many people who came in with a background in data engineering, data science, or business intelligence,” he says.
So when the team didn’t have transparent visibility into how data was being collected and analyzed, they weren’t able to play a proactive role in identifying and fixing instances of data being erroneous, missing, or otherwise inaccurate — in other words, data downtime.
“One of the big gaps that emerged was having different teams operating in their own silos, trying to protect the value that they were creating,” Vishnu says. “We wanted a system in place that would allow us to operate within one system in a consistent way — that allows us to know when things are going bad, how we can protect our teams, and understand what they needed to protect the value they were creating.”
Solution: End-to-end data observability with Monte Carlo
That’s when the Credit Karma team learned about Monte Carlo’s end-to-end data observability platform. With automated monitoring and alerting for data health at each stage in the life cycle, across the entire data stack, Monte Carlo could provide a way to consistently manage this process throughout the organization.
When Credit Karma first adopted Monte Carlo, there were a handful of users and only a few teams that would receive alerts when data downtime occurred. “Now, we have 136 users across the company who are using Monte Carlo across 38 domains,” Vishnu says. Those users are setting up their own monitors to ensure their data is accurate and reliable.
Vishnu describes this widespread adoption of Monte Carlo as “…an indication of the value we’re creating with data. If we weren’t creating value with data, people would not be spending their time protecting it.”
Monte Carlo’s automated monitoring, alerting, and end-to-end lineage ensures that every team across Credit Karma can rely on their data. And that visibility and trust is key as the Credit Karma team looks to the next evolution in their platform: driving even more value with generative AI.
How Credit Karma is investing in generative AI
Up through 2022, Credit Karma was accomplishing great things with machine learning. They used data to power member experiences and build helpful recommendation systems. Vishnu likens their role to that of a master chef who uses a deep understanding of his diners, techniques, and ingredients to handcraft a menu that’s served in a restaurant.
“But in 2023, by bringing in generative AI, we’re getting into a mode where our users are coming in and we’re helping them use technology to come up with their own menu and solutions to the problems that they care about,” Vishnu says.
As a fin-tech company, there’s a lot at stake to getting this right. Vishnu and his team work in a highly regulated space, and need to safeguard against bias. “How do you build reliable GenAI experiences?” he says. “I think there’s a big role for data.”
Leveraging quality data to build reliable GenAI experiences
Vishnu’s team is using large language models (LLMs) to converse with members and help them find answers to their unique questions via Intuit Assist. But they have to be extremely careful around how those LLMs access and use data to avoid hallucinations and inaccuracies.
“We don’t want the LLM to randomly give some made-up number when someone asks for their credit score,” Vishnu says. “And we want to make sure that if someone is asking for their credit score, they can only ask for their credit score — not someone else’s.”
For Credit Karma’s teams, that means enabling LLMs to use the company’s existing machine learning tools and data, rather than operating on their own. “We want to make sure that the LLM is delegating some of this work to the really high-quality recommendation systems that we’ve built,” says Vishnu. “There’s a lot of things that we use for data today in Credit Karma that we can use to build out these GenAI experiences.”
For example, the team is creating GenAI experiences that provide targeted recommendations and personalize experiences based on behind-the-scenes data. “If the LLM doesn’t know my income or my credit score, it’s not going to come up with the right answers given my situation,” says Vishnu. “And if I’m looking to buy a car to replace my 10-year-old Honda, once I buy that car, it’s done. We want to make sure the LLM is not thinking about me replacing that car forever.”
The vital role of data observability in GenAI
This means Vishnu’s team needs to have visibility into how the model is working and how changes affect its output. They need to track the accuracy of data throughout the GenAI experience, such as adding more data sources to RAG (retrieval augment generation) or switching models.
“We need to look at when data gets converted into embeddings, and when those embeddings get converted into some rich text that the LLM puts in front of the user,” says Vishnu. “Where’s the source? How did the LLM decide to put this in front of the users?”
Vishnu believes data observability will play an important role in helping solve these questions. Lineage can help his team track data to embeddings to rich text, and monitoring can help ensure the accuracy of data that’s powering their models and RAG pipelines.
“We don’t have any choice — we need to be able to observe the data,” Vishnu says. “We need to understand what data we’re putting into the LLM, and if the LLM is coming up with its own thing, we need to know that — and then know how to deal with that situation. If you don’t have observability of what goes into the LLM and what comes out, you’re screwed.”
Getting started with data observability
The Credit Karma team is using data observability to ensure today’s data health, and set the stage for the future of building reliable experiences with GenAI. If your organization needs more visibility into the state of your data, contact our team to learn how data observability can provide transparency and accountability across your organization.
Our promise: we will show you the product.