November 20, 2024

Building Tomorrow’s Data Ecosystems for AI and Analytics with dbt's Tristan Handy

Tristan Handy’s decision to build dbt Labs meant stepping away from steady consulting revenue to bet on a product. With support from an open-source community and interest from enterprise clients, dbt quickly found a growing user base. By 2019, it had become a go-to tool for data professionals looking to streamline workflows and prepare data for AI. Today, dbt Labs serves over 50,000 teams globally, with dbt Cloud helping organizations tackle modern analytics and AI needs through data transformation, observability, and orchestration.

In this episode of Barrchives, Tristan discusses how dbt is adapting to support the evolving demands of today’s data ecosystems. He shares insights on how data teams can move beyond manual, repetitive tasks to create environments where data becomes a valuable, collaborative asset. From AI’s potential as an analytical “thought partner” to the emerging standards reshaping data access, Tristan explores the shifts making data infrastructure more adaptable and effective.

Tune in to hear Tristan’s thoughts on the future of data and how shared frameworks can drive faster, smarter insights, and read on for a deeper look at five highlights from the episode.

AI as a Thought Partner: Shrinking the Gap Between Curiosity and Insight

“What is the distance between curiosity and result? And if you can shrink that time, then you can non-linearly change the outputs of any analytical system... You move even one level of abstraction up and AI becomes a thought partner. You start to be able to ask it not just what questions… you can ask it why questions.”

Treating AI as a “thought partner” lets analysts approach data as a dialogue rather than a series of technical tasks. This reduces the lag between asking a question and receiving an actionable answer, making deeper, real-time analysis possible. Being able to ask “why,” not just “what,” opens the door to investigating patterns, anomalies, and root causes in the data.

Open Table Formats: A Path to Seamless Data Sharing

"I think the biggest thing happening in data right now is open table formats and the catalogs behind them. The fundamental shift here is that the total spend on data infrastructure has risen significantly… As spending and market division increase, teams often find themselves stuck on different sides of a company’s data infrastructure, with limited control over spending or collaboration. So the obvious solution becomes, ‘Why don’t we all store our data in S3 in a common format, so every platform can read from the same data?’

Tristan sees open table formats, and the catalogs behind them, as the biggest shift happening in data today. When every engine can read the same tables from shared object storage, teams are no longer locked into whichever platform happens to hold their data.
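To make the idea concrete, here is a minimal sketch using PyIceberg to write an Apache Iceberg table to S3 through a shared catalog. The catalog URI, bucket, and table names are hypothetical placeholders; once the table exists, any Iceberg-aware engine (Spark, Trino, Snowflake, DuckDB, and others) can read the same underlying files by asking the same catalog.

```python
import pyarrow as pa
from pyiceberg.catalog import load_catalog

# Connect to a shared catalog; the URI and bucket below are hypothetical
# placeholders for whatever catalog service and object store you run.
catalog = load_catalog(
    "demo",
    **{
        "type": "rest",
        "uri": "https://catalog.example.com",
        "warehouse": "s3://shared-data-lake/",
    },
)

# A small batch of records, typed up front rather than serialized to text.
orders = pa.table({
    "order_id": pa.array([1, 2, 3], type=pa.int64()),
    "amount": pa.array([19.99, 5.00, 42.50], type=pa.float64()),
})

# Register the table once (the "analytics" namespace is assumed to exist),
# then append data; files land in S3 as Parquet plus Iceberg metadata that
# any Iceberg-aware engine can discover through the same catalog.
table = catalog.create_table("analytics.orders", schema=orders.schema)
table.append(orders)
```

The design point is that the table’s schema and snapshots live in open metadata next to the data, not inside any one vendor’s system, which is what makes the “everyone reads the same S3 files” story workable.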

dbt’s Growing Role in Powering AI Workloads

"It's very hard for me to look at the top line numbers and say, this percentage of people are using dbt for this thing versus that thing... We did a survey early in the year and asked, does dbt power downstream AI workloads? There were two answers that were essentially yes—one was yes today, and the other was we will be in the next 12 months. So, the sum of those yes and yes in the near term was 55%, which I was quite surprised by.”

As dbt gains popularity, data engineers are applying it to feature engineering, a core step in machine learning, bridging the gap between traditional analytics and AI. The survey result suggests dbt is valued not only for data transformation but also as a key step in preparing data for AI-driven work. For data teams, managing both analytics and machine learning pipelines with one tool means more streamlined workflows, easier collaboration, and a consistent approach to data preparation across functions, making dbt a natural fit in modern, AI-enabled data ecosystems.
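One concrete shape this takes is dbt’s Python models (supported since dbt-core 1.3 on warehouse adapters like Snowflake, Databricks, and BigQuery), which let feature engineering live alongside SQL transformations in the same DAG. The sketch below derives per-customer features from a hypothetical `stg_orders` staging model; the exact dataframe type `dbt.ref()` returns depends on the adapter, so treat it as an outline rather than drop-in code.

```python
# models/customer_features.py -- a dbt Python model. "stg_orders" and its
# columns are hypothetical; adjust for your own project.
def model(dbt, session):
    dbt.config(materialized="table")

    # dbt.ref() returns the upstream model as a dataframe; the concrete
    # type depends on the adapter. to_pandas() works on Snowpark
    # dataframes -- other platforms expose similar conversions.
    orders = dbt.ref("stg_orders").to_pandas()

    # Simple per-customer features for a downstream ML workload.
    features = (
        orders.groupby("customer_id")
        .agg(
            order_count=("order_id", "count"),
            lifetime_value=("amount", "sum"),
            last_order_at=("ordered_at", "max"),
        )
        .reset_index()
    )

    # dbt materializes the returned dataframe as a table in the warehouse,
    # so the feature set is versioned, tested, and documented like any
    # other model.
    return features
```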

The Challenge of Feeding Structured Data into AI Pipelines

“I've asked lots and lots of people who are at the center of this ecosystem, what is the right way to feed structured data into large language models? And by and large, the answers I get are quite unsophisticated... We've worked so hard to get this in the exact format, and then we have to take that and put it into English.”

The integration of structured data into AI systems remains a huge hurdle. Tristan highlights a frustrating industry trend—converting structured data into text before feeding it to AI models. This process not only adds unnecessary steps but also risks reducing the precision of the data, compromising its effectiveness in AI applications. For data practitioners, this underscores the need for AI models and tools that can work directly with structured data. Bypassing this forced transformation would allow AI systems to process data as-is, preserving its integrity and boosting accuracy.
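A minimal sketch of the pattern he is describing, with made-up rows and column names: structured records get flattened into English before reaching the model, and the types, units, and schema the pipeline worked so hard to enforce dissolve into prose at exactly this step.

```python
# Hypothetical rows standing in for a carefully modeled warehouse table.
rows = [
    {"region": "EMEA", "revenue": 1_200_000, "quarter": "2024-Q3"},
    {"region": "AMER", "revenue": 3_400_000, "quarter": "2024-Q3"},
]

# Structured -> English: the schema and types are gone by the time the
# model sees this string.
context = "\n".join(
    f"In {r['quarter']}, the {r['region']} region booked "
    f"${r['revenue']:,} in revenue."
    for r in rows
)

prompt = (
    "Answer using only the data below.\n\n"
    f"{context}\n\n"
    "Question: Which region had the highest revenue, and by how much?"
)
print(prompt)  # this flattened text is what actually reaches the LLM
```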

Social Consensus in Data Definitions for AI

“The thing that's being missed there is the social consensus layer... you can't just look at a database and know how to calculate a metric, you have to go around and talk to all the different people who are stakeholders in this metric... I think that you have to actually take those definitions, store them in code, and then allow the LLM to do that.”

Defining metrics for AI isn’t as simple as writing code—it requires cross-team consensus. True accuracy in AI workflows comes from a shared understanding of key data definitions within an organization. Without alignment, there’s a risk of misinterpreting metrics, leading to inconsistent AI outputs. This points to the value of codifying agreed-upon definitions to ensure all teams work from the same playbook. For organizations, investing in this “social consensus layer” means embedding consistent standards across data and AI systems, reducing ambiguity and strengthening decision-making.
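As a deliberately simplified illustration of “store them in code,” the sketch below keeps each metric’s definition in one reviewed artifact and lets an LLM select a metric rather than improvise SQL. The names and SQL are hypothetical, and dbt’s actual semantic layer expresses metric definitions declaratively with far more machinery; this is just the shape of the idea.

```python
# Governed definitions live in one reviewed place; names and SQL are
# hypothetical. The LLM may only choose a metric and a filter -- it never
# re-derives the calculation.
METRICS = {
    "net_revenue": {
        "sql": "SUM(amount) - SUM(refunds)",
        "table": "analytics.orders",
        "description": "Gross order amount minus refunds, as agreed with Finance.",
    },
    "active_customers": {
        "sql": "COUNT(DISTINCT customer_id)",
        "table": "analytics.orders",
        "description": "Customers with at least one order in the period.",
    },
}


def compile_metric(name: str, where: str | None = None) -> str:
    """Expand a governed metric definition into SQL."""
    metric = METRICS[name]
    query = f"SELECT {metric['sql']} AS {name} FROM {metric['table']}"
    if where:
        query += f" WHERE {where}"
    return query


# An LLM-driven caller supplies only the metric name and filter:
print(compile_metric("net_revenue", where="quarter = '2024-Q3'"))
```

The point is the division of labor: humans negotiate and review the definitions, and the model only routes questions to them.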
