S4 - Episode 23
Data Masters Podcast
Released November 6, 2025
Runtime: 40m01s

How Semantic Layers Future-Proof Data Strategies with David P. Mariani of AtScale

David P. Mariani
Chief Technology Officer and Co-Founder of AtScale

The semantic layer is becoming the backbone of trusted, AI-ready data. We’re joined by David P. Mariani, Chief Technology Officer and Co-Founder of AtScale, to explore why defining a shared business language is critical for scalable analytics and AI innovation. David explains how the semantic layer enables teams to align on metrics, eliminate silos and create flexibility across BI tools, data platforms and emerging AI interfaces. He also shares how open standards and large language models are reshaping how businesses interact with their data.


In this episode, David P. Mariani, Chief Technology Officer and Co-Founder of AtScale, shares why the semantic layer is the key to trusted, AI-ready data. He explains how defining a shared business language eliminates silos, enables flexibility across tools and accelerates innovation.

Key Takeaways:

00:00 Introduction.

02:22 Semantic layers began in BI tools, tightly linked to the presentation layer.

08:02 Combining a semantic layer with LLMs unlocks powerful insights.

12:57 Relying on one BI tool creates inconsistent metrics as AI adds new consumption layers.

17:29 Open-sourcing SML prevents lock-in and standardizes semantic models.

22:38 Semantic layers with GenAI reshape strategy through language and a strong query engine.

25:45 Without a semantic layer, LLMs were wrong 80% of the time.

30:33 Data engineers should build base semantic objects as part of their pipeline.

38:53 MCP with semantic layers and knowledge graphs gives LLMs a richer context.

David: [00:00:00] The future of using these LLMs is to let the LLM do the needle-in-the-haystack work for you, not the traditional way, where we click and drag and drop as fast as we can because we need to ask a million questions to get to something that's actually a valuable answer.

Anthony: Welcome back to Data Masters. Today we're digging into a term that's absolutely everywhere but often misunderstood: the semantic layer. What is it, really, and why has it become so critical [00:01:00] to the future of both BI and AI? To help us cut through the noise, we're joined by a true pioneer in this space. Our guest is David Mariani, the co-founder and CTO at AtScale. With a background in building massive analytic platforms at places like Yahoo, David leads the charge in creating a universal semantic layer for some of the biggest companies in the world.

And he's here to give us a masterclass on what a semantic layer is, what it isn't, and why it's the key to unlocking trusted, AI-ready data. David, welcome to the show.

David: Thanks for having me, Anthony.

Anthony: So as I suggested in the introduction, this idea of a semantic layer is everywhere. But it's a little bit like one of those things where, if I ask people out on the street, "What's a semantic layer to you?", [00:02:00] everybody's got their own picture of what a semantic layer is. So I think it would be really helpful to ground the conversation if you can put a stake in the ground. I literally can't think of anyone better in the world to do this.

Like what is a semantic layer and what problem does it solve?

David: Yeah, well, let's start with the history lesson first, Anthony. Semantic layers have been around probably as long as business intelligence has been around. The difference today versus then is that the semantic layer has traditionally lived in the consumption layer, in the BI tool. So it's been a core part of not just the metadata and the logical view of that data, but it's also been really tightly knit with the presentation layer, with the visualization layer. Some people have looked at building semantic layers in the data platform itself, in the database.

And that took the form of things [00:03:00] like schemas. You hear a lot of people talking about the medallion architecture, so they're creating reporting views, or stored procedures and the like. So I've experienced semantic layers in both of those places during my career.

And it was really at Yahoo where I saw it in spades, and that was because Yahoo was born as a business-unit-focused organization where the business units owned their own stacks. Because of that, we had one of everything, and so we had multiple semantic layers. The issue I had there was that, for the basic measures of the business, which are basically clicks and page views, everybody had a different definition, because some people were using MicroStrategy, which had a semantic layer; some people were using Tableau, which had a semantic layer; and some people were using QlikView, which had a semantic layer. So [00:04:00] at the end of the day, nobody could really mesh those numbers together. Yahoo Finance's numbers were not comparable to Yahoo Sports' or to the Yahoo front page's. And when it came to revenue, search and display couldn't compare their numbers, because we couldn't agree on the very basic, fundamental metrics of the business.

So what really came to me at that point was that for the semantic layer to truly be the single source of truth, it needed to live independent of the consumption layer and the data platform layer. Just as we had multiple consumption tools at Yahoo, we also had multiple data platforms, because most of our data was in Oracle but moving to this new thing called Hadoop, which was the data lake. So there wasn't a single place to store your data. The universal semantic layer is something that we invented at AtScale, and the [00:05:00] idea there was to create a single source of truth, a logical representation of your data.

So you asked what it is. First of all, it's virtualized. The semantic layer is not a data store; we don't ingest data into a semantic layer. It is really a router, a logical data router, where we can present a logical view to the consumption tools, which include BI and AI, and then translate those logical business queries into physical SQL queries that we can run against the data platforms. While we're doing that, we're learning about the data, we're learning about patterns, and we're optimizing. So we're creating aggregates in the data platform to make those queries faster, so that customers can avoid having to create data extracts or use import modes, where they're actually creating copies of data for analytics.

So really that's it. A [00:06:00] semantic layer needs to be universal. It needs to be virtualized, so it doesn't store data or create yet another data stack. And it's gotta scale to handle the most complex business use cases, so that people can model their business in the semantic layer without having to resort to building that logic in the front end.
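The routing behaviour described above (answer a logical query from a prebuilt aggregate when one covers it, otherwise fall back to the detailed table) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not AtScale's implementation: the table names, the `Aggregate` class, and the `route` and `to_sql` functions are all hypothetical, and it only handles additive measures that can be re-summed from an aggregate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregate:
    """A pre-computed summary table living in the data platform (hypothetical)."""
    table: str
    dimensions: frozenset
    measures: frozenset


BASE_TABLE = "fact_page_events"  # assumed name of the detailed fact table

AGGREGATES = [
    Aggregate("agg_clicks_by_property_day",
              frozenset({"property", "day"}), frozenset({"clicks"})),
    Aggregate("agg_clicks_by_property",
              frozenset({"property"}), frozenset({"clicks", "page_views"})),
]


def route(dimensions, measures):
    """Pick the coarsest aggregate that covers the query, else the base table."""
    dims, meas = set(dimensions), set(measures)
    candidates = [a for a in AGGREGATES
                  if dims <= a.dimensions and meas <= a.measures]
    if not candidates:
        return BASE_TABLE
    # Fewer dimensions means a coarser (usually smaller) table.
    return min(candidates, key=lambda a: len(a.dimensions)).table


def to_sql(dimensions, measures):
    """Translate a logical request into physical SQL against the chosen table."""
    table = route(dimensions, measures)
    # Re-summing is only valid for additive measures such as counts.
    select = list(dimensions) + [f"SUM({m}) AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql
```

The consumption tool only ever names logical measures and dimensions; which physical table serves the request is decided behind the interface, which is what lets aggregates be added or dropped without changing any report.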

Anthony: So I think that's helpful, but let me add maybe a layer to it. I think the trick is in the word "semantic." So tell me if you would agree with this: a semantic layer is a way for businesses to describe the meaning of their data. Now, you make these important points that it follows, therefore, that it shouldn't be stored in the data layer.

It shouldn't be stored in the presentation layer; all of the points you made previously. But the problem that the semantic layer solves is giving language to [00:07:00] data, which almost by definition is sort of language independent, and giving business people a way to talk about their data.

Is that a fair summary? I don't wanna put words in your mouth.

David: It really is. When it comes to business intelligence, the semantic layer sort of manifests itself through measures and dimensions in those BI tools, where people are dragging and dropping and double-clicking to drill down. But it's really AI that's made it more obvious. Because what does an LLM need? An LLM needs business context. It needs context about your business. LLMs are trained on the general internet. They don't understand what Tamr's ARR means, or how you calculate your ARR, or what you call gross margin, or what you call a customer. That is definitely something that has been [00:08:00] defined for your business. So when you marry a universal semantic layer, which brings context and that detailed information about your business, with the power of an LLM, magic happens. And it happens because an LLM is so much more powerful than a traditional BI tool, where you're forcing the human to find the patterns in the data, versus relying on the LLM to navigate that for you using business context.

Anthony: Right. And again, the trick is in the words: using the language. But let's come back to the AI story in a second, because I want to go back to your introduction. I just wanna be careful and clear about differentiating between two other alternatives. So a not unreasonable perspective that someone might bring to this conversation is to say, wouldn't it be best to contain my [00:09:00] semantic layer

in my data preparation experience? Or, said differently, it is in the act of preparing my data, heaven forbid in a spreadsheet, that I will be giving it meaning and language. And actually, I suppose in a way the spreadsheet example's a perfect one,

because you can label columns and rows anything you like in a spreadsheet. But imagine for the moment that a not unreasonable perspective is data preparation as a mechanism for creating semantics. I imagine you think that's a terrible idea, but give language to why it's a terrible idea.

David: Let's talk about that. When it comes to data prep, first of all, the semantic layer is virtual, right? So it needs to talk to the physical. I like to separate the logical from the physical. But the physical data needs to get into shape, needs to be queryable, and needs to have [00:10:00] quality, and that's not the role of the semantic layer. Think about the medallion architecture, where you have bronze, which is the raw data; you have silver, which has been cleaned; and then you have what they call the gold layer. The gold layer traditionally is those reporting views. What I see customers doing is that the semantic layer can either take the place of the gold layer or sit on top of the gold layer. It doesn't matter, honestly. But what I find is that for customers who don't have a semantic layer and are trying to use the gold layer as their data API, the problem is that the business wants to combine that data in different fashions, in different ways. And so it goes back to IT trying to keep up with the pace of the business, where new things are happening all the time. That's sort of like IT, who doesn't understand the business, saying, "Let me physically store the [00:11:00] data you want, Mr. Business Customer." It's better to put the business logic in the hands of the business.

Allow them to use that virtual semantic layer to then combine and define additional virtual metrics and dimensions in a way that makes sense to the business. It just allows IT and the data engineers to do their job really well: put data in a form that can be consumed by the semantic layer. And it gives the business the power of creating their own data products, Anthony.

So I think it's very, very complementary. One cannot exist without the other.

Anthony: So I think that makes a ton of sense. And maybe to add to that, the confusion when people think about a semantic layer as data prep is that you're really just pushing the two ideas into one. You're saying the data is the semantic layer. And I think where you started is exactly right: the [00:12:00] language you use to describe the data is by its nature different from the data itself. You used the example of defining what a customer is or, in your Yahoo example, what a click is.

The idea of a click versus the actual records that comprise a click. The records themselves may be in multiple databases. They may be described in very different ways than the language. But that does sort of raise an alternative, which I'll throw to you: maybe this should all be described in the consumption tool, in the BI tool or, again, God forbid, the spreadsheet, but more likely the BI tool.

You've made this point, but I wanna give you the opportunity to help listeners be really crisp about why defining the semantic layer in Power BI, Qlik, Tableau, ThoughtSpot, whatever, is a terrible idea.

David: So, there are a couple of fronts on why it's a bad idea [00:13:00] now. If you only have one tool that you consume data with, that's one thing, but now that AI is here and gen AI is here, I don't think anybody can say that in the affirmative anymore. Now, you could say before AI, "Yeah, you know what, I'm a Power BI shop and that's all I'm going to use." Except there's also Excel, but Power BI can serve that too. So I guess, Anthony, if you have more than one consumption tool, then you're gonna have more than one definition of your business metrics. And I would argue that today you definitely have at least two: your BI platform and now your AI chatbot, AI agent, whatever you wanna use.

Anthony: But I actually don't even think you need that to be true. Let's just imagine a world where everybody's standardized on Power BI. Then you still have multiple Power BI workbooks.

David: That's exactly right. You have the balkanization within the tool [00:14:00] itself, because objects are not made to be shared there or to be consumed across different analytics. So thank you for making that point.

So that's one dimension of the problem. The other dimension of the problem, and I see this in customers all the time, is lock-in. We don't know what's gonna come up, and we don't know what the new consumption pattern will be. People may be talking to their data. Maybe we have headless agents, just machines that are taking action on data. We don't know. And so to make a bet where you're saying, "I'm gonna put my semantics in this BI tool," or even to make a bet to put it into a data platform of this type, which, by the way, is what Snowflake and Databricks are trying to make you do, that locks you in. [00:15:00] They call it shifting left. It's like, okay, that's great, except why is everybody investing in Iceberg? They're investing in Iceberg as their data format because they want to have the freedom of choosing the best engine to operate on that data. So if you go ahead and build your semantic layer in the data platform, now you're stuck there as well. And if you're gonna invest in Iceberg as your general data format because you want that freedom to move to the lowest-cost provider, well, that's now become an impediment

if all your semantics are now talking to something that's gonna be really hard to undo. So that's the second dimension, Anthony. It's not just that you don't have a single source of truth because you have multiple consumers of that data; you also can't future-proof your business, because you've

embedded your most [00:16:00] important meaning about your business in the data platform or in the consumption tool.

Anthony: Got it. And of course it's in their strong interest to have you do that. I wanna come back to this idea of open, but I just want to wrap up in a bow where we've gotten to. So the semantic layer is about the language you're using to describe the data. It's a business language for describing the meaning of the metrics and dimensions that drive your business.

It needs to be independent of the consumption tool and independent of the data storage layer. And it needs to be so both so that you can have flexibility in those underlying tools, and because those underlying tools themselves have a lot of complexity that you don't want this burdened with.

Now, you mentioned this point about lock-in, vendor lock-in, and a commitment to flexibility and openness. And I know AtScale in particular has made some big investments [00:17:00] here with SML. Everything needs a three-letter acronym, so you've appropriately gotten on the three-letter-acronym bandwagon.

Congrats. But I guarantee that no one listening to this has any idea what SML is. So why don't you start us off with what SML is, and then we can talk about why it's important.

David: So it stands for Semantic Modeling Language. Okay, so

Anthony: Very logical.

 

David: So I've just been talking about vendor lock-in, right? Well, what about AtScale? You don't wanna lock yourself into a single semantic layer either, because then you have the same problem: if a better tool or a lower-cost tool comes up, now you've gotta rewire everything from your semantic layer. So I can't talk out of both sides of my mouth, right? I can't say don't go for lock-in on the BI tool, don't go for lock-in on the data platform. So we invented SML and we open-sourced it, Anthony, and we're pushing it as a standard language for describing [00:18:00] semantics and semantic models. We made it object oriented. We made it YAML based, so it's very standard. It's composable, which is really important in today's world, where you want the business units to be able to create their own semantic models. Hey, we want LLMs and AI to create their own semantic models.

So it's gotta be composable for humans and machines. And so we did that, and we created it as a standard and open-sourced it because we want SML to be the common language that semantic layers can implement at their foundational level, before it gets outta hand.

So look, we were the first to do a universal semantic layer. Now there are a lot of different semantic layer providers out there, including some big boys. And I don't know if you saw this, it just recently came to press, where Snowflake announced their initiative, OSI, which is their [00:19:00] sort of open interchange for semantics. It's not a semantic modeling language. It's a way to move from one proprietary semantic modeling language to another. Just imagine, Anthony, that there was no SQL, no Structured Query Language, but you had a query language interchange, so that everybody could keep their own query languages, and then if you wanted to go from SQL Server to Oracle, you had to use this migration tool to migrate all your code to the new language. That doesn't make any sense to me. It was really important that SQL became a standard that the database vendors implemented. It created a whole data and analytics ecosystem around it, and it created some great possibilities. So we think that somebody had to do it. Nobody [00:20:00] was doing it, so we had to go ahead and say, "You know what, we're just gonna open-source this." And hopefully the rest of the players will sign on, and we'll make it so that the semantic layer won't have lock-in.

 

Anthony: And I think that's great, and I very much appreciate the point you're making about talking out of both sides of your [00:21:00] mouth. I think the argument here, of course, is that you can create the standard but then also be the best player within that standard. And maybe there's some innovation that occurs on the edges that's really useful and interesting and exciting, and helps advance the cause.

Obviously one area of innovation and excitement, and we've already brought this up, is large language models and AI. And, to your earlier commentary, the nature of a semantic layer is that it's language. It's using words to describe the data. You made this point earlier, just to bring it back to the

front of the conversation: large language models by their nature are not actually very good with data. They're actually very poor with calculations. And they don't know anything about your business. They've been trained on the public internet, as you [00:22:00] described.

So talk us through this in a bit more detail: the connection between databases, data, the semantic layer, AI, and maybe even BI. And actually, just to be controversial for a minute, is it your view that the predominant consumption experience will be through these AI large language model interfaces?

David: Yes, and I'll just describe my experience so far. My mind has been blown.

It's changed everything. As a leader of product at AtScale, I actually rewrote my whole product roadmap after having some experience with these tools in combination with the semantic layer. But let's just talk about it: there are two parts where the semantic layer really helps gen AI. One is the metadata, and most people just think about the metadata. You're right, Anthony, it is about the language, the business language overlay on top of the physical data. The [00:23:00] other really important part is the query engine itself that comes with a semantic layer. And why is that important? You just mentioned it: you said that LLMs are not really good at doing calculations, for example, or navigating joins and the like, and doing that in a deterministic fashion.

So what the semantic layer does is make the job for the LLM really easy, because the way we do it is that a model gets presented to the LLM as a single table. So you may have 250 metrics and a hundred different dimensions and all these different hierarchies, but to the LLM, it's a single logical table, which means no joins are necessary. It's the semantic layer that generates the SQL that does all those nasty joins, whether it's a many-to-many, or it's a semi-additive or non-additive measure, or you have to do a [00:24:00] calculation that requires multiple passes of the data. All those things are really hard for an LLM to do on its own. So it's really the combination of providing the language context to the LLM and then also providing a very easy interface, so that the LLM can generate very basic, logical SQL and the semantic engine takes care of all the nastiness of translating that into the physical queries against the respective data platforms.
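To make the "single logical table" idea concrete, here is a small sketch of a compiler that takes a join-free request (just measure and dimension names) and expands it into physical SQL with the joins filled in. It is an illustration only: the `MODEL` dictionary, the table and column names, and `compile_query` are hypothetical, and it uses an in-memory SQLite database as a stand-in for a real data platform.

```python
import sqlite3

# Hypothetical semantic model: every logical column maps to a physical
# expression plus the join (if any) needed to reach it.
MODEL = {
    "fact": "sales f",
    "joins": {
        "product": "JOIN product p ON p.product_id = f.product_id",
        "store": "JOIN store s ON s.store_id = f.store_id",
    },
    "dimensions": {
        "product_name": ("p.name", "product"),
        "country": ("s.country", "store"),
    },
    "measures": {
        "total_sales": "SUM(f.amount)",
        "order_count": "COUNT(*)",
    },
}


def compile_query(dimensions, measures, model=MODEL):
    """Expand a join-free logical request into physical SQL, deterministically."""
    # Only the joins required by the requested dimensions are emitted.
    needed = sorted({model["dimensions"][d][1] for d in dimensions})
    select = [f"{model['dimensions'][d][0]} AS {d}" for d in dimensions]
    select += [f"{model['measures'][m]} AS {m}" for m in measures]
    parts = [f"SELECT {', '.join(select)}", f"FROM {model['fact']}"]
    parts += [model["joins"][j] for j in needed]
    if dimensions:
        group = ", ".join(model["dimensions"][d][0] for d in dimensions)
        parts += [f"GROUP BY {group}", f"ORDER BY {group}"]
    return "\n".join(parts)


def demo():
    """Run one logical request end to end against toy data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE product (product_id INTEGER, name TEXT);
        CREATE TABLE store (store_id INTEGER, country TEXT);
        CREATE TABLE sales (product_id INTEGER, store_id INTEGER, amount REAL);
        INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
        INSERT INTO store VALUES (10, 'Canada'), (20, 'USA');
        INSERT INTO sales VALUES (1, 10, 5.0), (1, 20, 7.0), (2, 10, 3.0);
    """)
    # What the LLM would ask for: columns of one logical table, no joins.
    sql = compile_query(["country"], ["total_sales"])
    return con.execute(sql).fetchall()
```

Because the join paths and measure formulas live in the model rather than in the prompt, the same logical request always compiles to the same SQL, which is the determinism point made later in the conversation.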

Anthony: So, let me pause you there for a second, because I think that may be something new for listeners. There's certainly a school of thought out there that what large language models need is a text-to-SQL engine, and that will solve the problem. It sounds like you wouldn't agree with that, or at least you would say that's the wrong path to solving this problem,

and that it's better to give it a good semantic model. Is that fair?

David: Yeah, and we did test with and without a semantic model. So, [00:25:00] to give credit where credit's due, Juan Sequeda of data.world, which I think is now part of ServiceNow, did the original benchmark, where he compared an LLM on a semantic layer versus an LLM by itself. And it was pretty stark in terms of how much the semantic layer helped. So we took that same research, but we did it against TPC-DS, because TPC-DS is a more complicated schema. It's a retail-based model. It comes with predefined queries, so it's a little more standardized.

And when we ran it with the semantic layer, on AtScale, versus just raw, the LLM was wrong 80% of the time without a semantic layer, Anthony. It got 20% of the answers right. And it wasn't just that. Because the LLMs had to do the joins and all that navigation and calculation on their own, the answers were not deterministic.

Anthony: Oh, so one day you'd ask the question and get three, and the next day you'd ask it and get five.

David: [00:26:00] Exactly. That's really, really scary, because these LLMs are so confident in their answers, and they give users false confidence. And that could change from day to day or query to query. So the value of the semantic engine is that those queries will always be deterministic, because the semantic model has already defined those paths, and those paths don't change. What does change is the text-to-SQL part, for which that metadata is really important for the LLM. Now, if you combine those two together, the semantic layer and the LLM, what we found is that when we used Model Context Protocol, or MCP, as that interface, it's lights out, Anthony.

So now if you use MCP as a protocol, that LLM is talking straight to the semantic layer with no extra work, no training, no nothing. Back [00:27:00] to your original question, what's gonna be the future of consumption? I've been using just Claude as a chatbot. And a lot of people, when they think of natural language query,

they'll first start with, and I started with this in doing my demos, "Okay, show me sales by product for Canada for 2002." And it's like, okay. And then somebody else came to me and said, "David, why would you do that? You're acting like a BI tool now."

Anthony: Right.

David: Or you can imagine I was just automating the clicks, right?

And the

Anthony: Yeah. It's like what you would've done through the UI of a BI tool.

David: That's what I would've done. And instead he said, "Show me insights on sales." And the LLM then went and wrote 10, 15 queries against the semantic layer and came up with all these insights, and then even charted them and created a dashboard for me. I didn't ask for any of that.

I just said, "Show me insights on sales." And to me that was like, [00:28:00] okay, we are thinking about this the wrong way. The future of using these LLMs is to let the LLM do the needle-in-the-haystack work for you, not the traditional way, where we click and drag and drop as fast as we can because we need to ask a million questions to get to something that's actually a valuable answer. That blew my mind. So I do think things will definitely change in terms of how people consume analytics. I still don't know whether that's a chatbot or something else, Anthony. But I definitely know that it's an LLM paired with a semantic layer, and probably through MCP as a really good interface to make those LLMs really smart.
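For readers who have not seen MCP on the wire, the exchange between an LLM client and a semantic layer might look roughly like the sketch below. MCP messages are JSON-RPC 2.0, and `tools/list` and `tools/call` are method names from the published specification; everything else here is an assumption for illustration, including the `query_semantic_model` tool, its arguments, and the canned rows. A real server would use an MCP SDK and a transport rather than a bare function, and would distinguish more error cases.

```python
import json

# Hypothetical tool exposed by a semantic layer; the name and arguments are
# illustrative and not taken from any product.
TOOLS = [{
    "name": "query_semantic_model",
    "description": "Run a logical query (measures by dimensions).",
    "inputSchema": {
        "type": "object",
        "properties": {
            "measures": {"type": "array", "items": {"type": "string"}},
            "dimensions": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["measures"],
    },
}]


def run_logical_query(measures, dimensions):
    """Stand-in for the semantic engine; returns canned rows."""
    return [{"country": "Canada", "total_sales": 8.0}]


def handle_request(raw):
    """Answer one JSON-RPC 2.0 message of the kind MCP uses for tools."""
    req = json.loads(raw)
    reply = {"jsonrpc": "2.0", "id": req.get("id")}
    if req["method"] == "tools/list":
        # The client (and through it the LLM) discovers what it can call.
        reply["result"] = {"tools": TOOLS}
    elif (req["method"] == "tools/call"
          and req["params"]["name"] == "query_semantic_model"):
        args = req["params"].get("arguments", {})
        rows = run_logical_query(args["measures"], args.get("dimensions", []))
        reply["result"] = {
            "content": [{"type": "text", "text": json.dumps(rows)}],
            "isError": False,
        }
    else:
        # Simplified: every unsupported request gets the same error.
        reply["error"] = {"code": -32601, "message": "Method not found"}
    return json.dumps(reply)
```

An "insights on sales" prompt would then show up on the server side as a series of `tools/call` requests, each naming measures and dimensions rather than tables and joins.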

Anthony: Interesting. Yeah, I might refine what you said slightly, which is to say that it's the idea of an LLM using a semantic layer in the context of trying to create queries on data, [00:29:00] because there are lots of things LLMs can do that aren't about querying data. And I think there's another really important insight that you draw there, which is the tendency for us, especially people like us, to come at the problem as though we were clicking around in a BI tool, when what

business users do is just ask questions about the business, not about the data. The problem we're talking about solving here is translating the business question into a data question, which traditionally you would've given to a data team and said, "Okay, data team, figure out the data question that comes from this business question." Which, in a funny, roundabout way, brings me to a harder question, which is: who should own

the semantic layer? So imagine someone's listening to this podcast and they're like, "This all sounds great. We need a semantic layer. I need to go do it." Should they be a data person? Should they be a business leader? [00:30:00] If they are a data person, are they the report writer type, or are they the data engineer type writing Python code?

Like, who's the ideal owner of this layer, so to speak?

David: What I would do with AtScale at Yahoo is this. Definitely, there's participation from the data engineers. To me, the data engineers would create the base semantic objects. In SML, that's called a dataset. A dataset is a representation: it can be a straight map to a table, or it could be a query, or it could be a table with calculated columns. So if I were doing it, I would have my data engineers go a little bit further on their pipeline. They wouldn't just write the tables; they'd also write some base semantic objects, and they would include those in a repository.

So with SML, it's code, and with AtScale's Design Center, we store all of the SML in Git, and you can have multiple [00:31:00] repositories. So what I would then do is put those into a common repository. Then I'd open that up to the business, so each business unit could have their own Git repos, and they could create their own semantic objects and their own semantic models. And they could refer to those base objects. And those base objects are not just datasets. They could be base dimensions like a calendar. It's very important in retail to have a common calendar that you're reporting on, that

Anthony: Yes.

David: with time. It's really important to have a common product dimension, a common organization dimension. You can imagine that those are curated maybe by a central team, maybe by a team that's anointed with that duty, or maybe by a business unit that is designated as the owner. But it needs to be distributed,

Anthony, is the short answer to my long answer. It should be distributed in a semantic layer that's [00:32:00] composable and is backed by code where possible, so that you can decentralize your data product creation without creating chaos and mistrust in the data because you've lost your single source of truth. So semantic objects should also be controlled, just like source code is controlled. If somebody wants to contribute a new base object, it should be a pull request. And that pull request should be reviewed by somebody who is designated as the reviewer for that particular repository. They can decide to accept it, ask for changes, or reject it, because there could be another semantic object that's similar or a duplicate. So I really think semantics and semantic layers should follow a software, CI/CD kind of practice. And if you do that, I think you can open up data product creation and [00:33:00] move it to the edges without getting into the problems of the past, where you have everybody rolling their own when it comes to analytics and

business definitions.
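The workflow described here (a shared repository of base objects, business-unit models that refer to them, and a review step for new contributions) can be sketched in a few lines. This is not SML syntax and not how AtScale stores objects; the dictionaries, field names, and the `resolve` and `review` functions are hypothetical stand-ins chosen to show the idea of reference-by-name and an automated duplicate check.

```python
# Base objects curated centrally (hypothetical names and fields).
BASE_REPO = {
    "dim_calendar": {"type": "dimension", "source": "calendar", "key": "date_key"},
    "dim_product": {"type": "dimension", "source": "product", "key": "product_id"},
    "ds_sales": {"type": "dataset", "source": "sales", "key": None},
}

# A business unit's model refers to base objects by name instead of copying them.
RETAIL_MODEL = {
    "name": "retail_sales",
    "uses": ["ds_sales", "dim_calendar", "dim_product"],
    "measures": {"total_sales": "SUM(amount)"},
}


def resolve(model, base=BASE_REPO):
    """Expand a model's references, failing loudly on unknown objects."""
    missing = [ref for ref in model["uses"] if ref not in base]
    if missing:
        raise KeyError(f"unknown base objects: {missing}")
    return {**model, "uses": {ref: base[ref] for ref in model["uses"]}}


def review(name, proposed, base=BASE_REPO):
    """Checks a reviewer might automate before accepting a pull request."""
    if name in base:
        return f"reject: '{name}' already exists"
    for existing_name, existing in base.items():
        same = all(existing[k] == proposed.get(k)
                   for k in ("type", "source", "key"))
        if same:
            return f"request changes: duplicates '{existing_name}'"
    return "accept"
```

A check like `review` would typically run in CI on the pull request, leaving the human reviewer to judge the cases a script cannot, such as two dimensions that are similar but not identical.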

Anthony: that's a very nice way of describing it. the semantic layer is a software asset because I think it also helps, encapsulate why it's so different. Than the underlying data. so one of the obvious sort of parallels and overlaps here between, you know, the work that Tamer does and, and that you're doing, is thinking about giving people access to clean, curated, up-to-date data.

Well, there are two problems there. One is a language to describe that data. Like, what are we talking about? Are we talking about retail sales or are we talking about farm animals or, you know, whatever it is? And then the second is the actual underlying data, like, you know, where are the sales and, you know, what kind of animals are we talking about?

Like, it's that next level down that we think a lot about. And you're thinking about the language [00:34:00] to describe the software asset over the hardware substrate, if you want to think of it that way.

David: Before we leave this, another sort of comment I had is that, when we're talking about LLMs, I wanna make sure that your listeners don't just think of LLMs from a query perspective. Because LLMs are really good at writing code. You asked about, okay, who creates the semantic layer? Who creates the semantic model? Guess what? AI and LLMs are gonna create the semantic models. So that's what we're really investing in now, and we're doing it through MCP, because we can create tools.

I can create a measure tool, a dimension tool, a calculation tool, and a semantic model tool that then takes all those three elements and creates models from your data. So what we're investing in is going beyond just using the LLM to query a semantic layer, to also using the LLM to create the semantic [00:35:00] model itself.
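As a rough illustration of the tool idea, the Python sketch below registers four plain functions (measure, dimension, calculation, semantic model) and replays the kind of tool calls an LLM might emit. It does not use a real MCP SDK and is not AtScale's implementation; the `tool` decorator, the function names, the call format, and the `pending_review` status are all assumptions made for this example.

```python
from typing import Callable

# Hypothetical registry standing in for however an MCP server exposes tools.
TOOLS: dict[str, Callable[..., dict]] = {}


def tool(fn):
    """Register a function under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def create_measure(name: str, column: str, aggregation: str = "sum") -> dict:
    return {"type": "measure", "name": name, "column": column, "aggregation": aggregation}


@tool
def create_dimension(name: str, column: str) -> dict:
    return {"type": "dimension", "name": name, "column": column}


@tool
def create_calculation(name: str, expression: str) -> dict:
    return {"type": "calculation", "name": name, "expression": expression}


@tool
def create_semantic_model(name: str, elements: list[dict]) -> dict:
    """Combine the three element types into a draft model awaiting human review."""
    model = {"name": name, "measures": [], "dimensions": [], "calculations": [],
             "status": "pending_review"}
    for element in elements:
        model[element["type"] + "s"].append(element["name"])
    return model


def run_tool_call(call: dict) -> dict:
    """Dispatch one tool call of the form {'tool': ..., 'arguments': {...}}."""
    return TOOLS[call["tool"]](**call["arguments"])


# Tool calls an LLM might emit after inspecting a sales table (made-up example data).
calls = [
    {"tool": "create_measure", "arguments": {"name": "total_sales", "column": "amount"}},
    {"tool": "create_dimension", "arguments": {"name": "order_date", "column": "order_dt"}},
    {"tool": "create_calculation",
     "arguments": {"name": "avg_order_value", "expression": "total_sales / order_count"}},
]
elements = [run_tool_call(call) for call in calls]
draft = run_tool_call({"tool": "create_semantic_model",
                       "arguments": {"name": "sales", "elements": elements}})
print(draft)
```

The draft comes back marked `pending_review` rather than published, which is one simple way to keep a person in the approval loop for machine-generated models.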

Anthony: Interesting.

David: You could imagine, right, that this could be self-learning. So as people ask questions, we can see a pattern in those questions and then potentially create new semantic objects that are prebuilt to answer the questions that maybe those users are taking a haphazard path to get to.
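One simple way to picture that pattern-spotting step is to count which measure and dimension pairings keep showing up in questions and propose the frequent ones that are not modeled yet. The Python sketch below assumes the questions have already been parsed into a measure plus dimensions; the log format, the function name, and the threshold are hypothetical, and this is only one of many ways such a loop could work.

```python
from collections import Counter


def propose_semantic_objects(question_log, existing, min_count=3):
    """Suggest (measure, dimension) pairings that users ask about repeatedly.

    question_log: parsed questions, each {'measure': str, 'dimensions': [str, ...]}
    existing: set of (measure, dimension) pairs that are already modeled
    """
    counts = Counter()
    for entry in question_log:
        for dimension in entry["dimensions"]:
            counts[(entry["measure"], dimension)] += 1

    proposals = []
    for (measure, dimension), times_asked in counts.most_common():
        if times_asked >= min_count and (measure, dimension) not in existing:
            proposals.append(
                {"measure": measure, "dimension": dimension, "times_asked": times_asked}
            )
    return proposals


# Made-up question history for illustration.
log = [
    {"measure": "revenue", "dimensions": ["region", "week"]},
    {"measure": "revenue", "dimensions": ["region"]},
    {"measure": "revenue", "dimensions": ["region", "week"]},
    {"measure": "units", "dimensions": ["store"]},
]
already_modeled = {("revenue", "week")}

print(propose_semantic_objects(log, already_modeled, min_count=2))
```

Here revenue by region is proposed because it was asked three times and is not modeled, revenue by week is skipped because it already exists, and units by store falls below the threshold.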

Anthony: So, to play with that idea for a second, and tell me if you agree with the way I'm framing this, the point you're making is that the kinds of language and questions that people ask of the business, and I'm being careful not to say anything about data, the things they're asking about the business change, the strategic environment of the business changes, and, again, to put words in your mouth, the semantic layer needs to keep up. It needs to be prepared to answer the questions that are coming. Did I get it?

David: You got it. And right, today, the hardest part of implementing a semantic layer is [00:36:00] building the semantic models. So today it's far too human-curated. And so we've been investing in a chatbot, you know, in a copilot that uses LLMs through MCP. The human still has to approve it. The human still has to look and see what the LLM is generating. Just like, you know, you wouldn't trust an LLM to write your JavaScript without reviewing it. But it's gonna be much easier, because that's part of the barrier to getting value out of a semantic layer: it needs semantic models to generate that value and that ROI. So I think in the very near future, you're gonna see many more machine-generated semantic models with humans curating and human oversight.

Anthony: Interesting. Well, again, not to draw the parallel to Tamr too directly, but that sounds like a very similar conception to the way we talk about, [00:37:00] not the semantic layer, but the data itself. Right? Because, you know, fundamentally, when you have duplicate data records, like, you know, the model can be helpful and human feedback is important.

I was gonna ask you about the future of the semantic layer, but you've sort of gotten there yourself. I think this vision you're setting out of, you know, AI out and AI in is really compelling.

David: Yes, that's exactly right. That's well said. Yeah, and I think that, you know, one of the things that was always a challenge for us as a semantic layer platform company is that in the BI tools, right, there are a lot of client-side calculations. And we've gotten a lot of requests.

It's like, well, how can I turn those client-side calculations? I don't want them to be defined in the client. They should be defined in the semantic layer. And I was always like, oh man, if I try to do that, I'm gonna be presenting that semantic modeler with a bunch of noise. [00:38:00] I could use AI to just surface the ones that matter. And yeah, you know, with the right kind of rules, I could actually create and enhance the semantic model in place through

Anthony: like

David: users

Anthony: so good at summarizing long-form text. Here, you're asking it to summarize complex calculations.

David: So I'm really excited about it. I think there's so much more to be done. I was talking to a customer just this morning, and they have a knowledge graph tool, and they have AtScale, and it's like, okay, how can we make the knowledge graph tool and AtScale work together? And my answer was that they want to use it for AI because they have people that are asking questions, or customers are asking questions, about particular prescriptions or ailments and the like, and they have a knowledge graph with all these terms in it. And it's like, well, the answer is: use MCP. So you have MCP on top of your knowledge graph. You have [00:39:00] MCP on top of your semantic layer. The LLM has access to both of those, and let the LLM do the mashing. You're just informing the LLM, you're giving it more data about your business, and that data can come from any form, not just a semantic layer.

Anthony: Yeah. No, I love it. And it is a good way of thinking about the problem, because it's a separation of concerns, so.

David: Mm-hmm.

Anthony: Well, David, thank you for the time. I hope that for listeners, we've demystified the semantic layer. It certainly has been for me. And hopefully people have a better sense for what it is, why it's valuable, and how one should go about thinking about implementing one.

So thank you for the time.

David: Thanks, I appreciate it.
