
How Open Source, Python and AI Are Shaping the Data Future with Wes McKinney of Posit PBC, Voltron Data and Composed Ventures
Wes McKinney
The future of analytics isn’t just about bigger models — it’s about building smarter, more interoperable data systems. Wes McKinney, Principal Architect of Posit PBC, Chief Scientist of Voltron Data and a General Partner at Composed Ventures, joins us to explore how the modern data stack is evolving and what it means for the future of analytics. Wes reflects on his journey building pandas and Apache Arrow, sharing how open-source ecosystems grow, transform and shape the way organizations work with data today. Wes also highlights the rising importance of semantic layers, agentic workflows and defensive coding practices as teams embrace AI-driven development.
Key Takeaways:
00:00 Introduction.
02:32 Wes didn’t expect pandas to drive AI but he recognized Python’s unrealized potential.
05:09 A lucky convergence helped Python’s tools snowball into the AI standard.
10:40 Early big data focused on essentials, not the interoperable stacks we rely on today.
15:44 The composable data stack grew through bottom-up, grassroots open-source momentum.
21:56 Many “data science” roles ultimately became business intelligence and dashboard work.
25:24 Complex statistical work still depends on human judgment, not fully autonomous agents.
30:27 Frontier models retrieve table data reliably, while smaller models fail dramatically.
35:16 Better models and coding agents shifted Wes from an AI skeptic to an adopter.
40:07 AI-driven code demands stronger testing and review to avoid costly failures.
45:14 An AI-built finance project ballooned, revealing how agents inflate codebases.
Transcript
Wes: [00:00:00] It is problematic, especially for data work, where you need an exact answer every time you run the system. There's even the risk that if you have the model call tools and you're leaving some interpretation of the results to the model, one day you get one answer and another day, from the same input, you get a different answer, and that could lead to a business decision being taken that is detrimental to the business.
Anthony: Welcome back to Data Masters. If you've ever written import pandas as pd in Python, then our guest today needs no introduction. He is the creator [00:01:00] of pandas, the seminal open source library that practically defined Python as a language for data science. But he didn't stop there. He is also the creator of Apache Arrow,
which created the de facto standard for how high performance systems exchange data, in particular tabular data. He is currently Principal Architect at Posit, Chief Scientist at Voltron Data and a General Partner at Composed Ventures. Wes McKinney has spent nearly two decades building the foundational infrastructure that powers our industry.
And today we're gonna talk about what comes next, from the rise of the composable data stack to his contrarian view on why LLMs might be heading for a trough of disillusionment when it comes to actual analytics. Wes, welcome to Data Masters.
Wes: Thanks for having me on the podcast.
Anthony: So I wanted to maybe start by going back in history a little bit, and I know this is almost an [00:02:00] unfair question: cast your eye back 15 years ago, when you started the pandas project. I think it's fair to say, with the benefit of hindsight, that it built this Python-based data science ecosystem,
and in that sense, Python has become the de facto standard for a whole bunch of AI platforms: TensorFlow, PyTorch, et cetera. So I'm just curious, from your vantage point sitting here today, looking back: when you were first building pandas, did you have a sense that it would become this sort of foundational layer for AI?
Wes: It would be hard to predict that far into the future, but I definitely saw there was a lot of untapped potential in Python. If only there was a toolkit for basic data manipulation and data wrangling, that would help whatever potential future there was for Python as a mainstream data [00:03:00] language.
But back in 2008, when I started working on pandas, that was not at all the case; it was not a foregone conclusion. Even using Python for professional business data work was seen as fairly risky at the time, because Python was unproven. It had a fairly immature open source ecosystem for statistics and data analysis work. Pandas started out as a toolkit for myself, to do my work at my job at a quant hedge fund, and I enjoyed building the toolkit for myself. Eventually I was building it for my colleagues, who were excited about using it. Then we open sourced the project and I started engaging with the Python community and seeing: is there an appetite for this?
Do people want this? Is this something that the world needs? It turned out the answer was yes. It was a little bit of being at the right place at the right time, around 2011 and 2012, when people were starting to talk about data science and big data, and there was [00:04:00] suddenly a massive need for people with data skills. Python was an open source, accessible programming language that people could learn easily. And the sudden availability of a toolkit to read data out of databases, load CSV files and read Excel files, and then do meaningful work with that data, with code that was easy to write and easy to reason about, was one of the things that helped unlock Python as a language that could be accepted in the business world.
And I don't know if this was causal or something that really factored into Google choosing Python as the language for TensorFlow; I think it was a little bit of an accident. I was partly inspired by the fact that Google used Python as one of their three languages, the other two being C++ and Java.
Python was their scripting language that they would use to build interpreted interfaces on top of mainly C++ libraries, using SWIG and other wrapper generators. That was probably the main reason why [00:05:00] Google chose to do TensorFlow in Python. And eventually Meta started building PyTorch;
initially Torch was all in Lua, I believe, and eventually they migrated that to Python to create PyTorch. So there was a combination of being lucky and making the right prediction, but also this lucky confluence of open source projects and major AI research labs needing to choose a programming language to build their AI frameworks in. It just happened that everyone chose Python. We were all rolling around our little snowballs, and suddenly the snowballs merged together and became one really gigantic snowman that now powers the world.
So it's been an interesting time, but I've resisted patting myself on the back and saying, oh, I predicted this was going to happen and I knew it was going to end up like this, because that's definitely not the case. I was hopeful that things would end up like this, but I would've been satisfied with a much less successful outcome.
Anthony: Well, I can pat you on the back. So [00:06:00] how about that? I think that's fair. In that same spirit though, and maybe this is an unfair question, but just to throw it out there: were there other features of Python that lent themselves to this? Clearly you solved a big unmet need around access to data, and also munging that data, to use your term. But were there other features of Python that made it particularly relevant for this use case?
Wes: I think because Python was originally created as a teaching language, it's easy to learn and easy to read. People back in the day would often describe Python as readable pseudocode, similar to the code you would write to describe algorithms. So you could hand a piece of Python code to somebody who'd never written Python before, and they would pretty much be able to get the idea of what the code was doing, without a lot of types. Of course, now Python has types, so that's changing. But the language had an accessibility and a readability aspect that made it really appealing [00:07:00] to do scientific work in. Python also had an existing numerical computing ecosystem.
There was a group of folks essentially building an alternative to working in MATLAB. So if you were doing neuroscience research or physics research or things like that, you had NumPy and SciPy as a basic computing foundation for numerical algorithms, optimization and linear algebra, the essential things you would need to start doing statistics and data analysis work.
Whenever I needed to run a linear regression in Python, I didn't have to wrap linear algebra libraries myself; that work was already done. So that was an essential factor. I think another thing that really helped tie the room together was the development environment, which initially started out as the IPython shell and eventually the IPython notebook, which turned into the Jupyter Notebook and now the Jupyter ecosystem and JupyterLab. That gave [00:08:00] people one of the first mainstream, open source computing notebooks. People were familiar with Mathematica and other closed source computing notebooks in the past, and this one was inspired by things that had come before out of Mathematica and MATLAB.
But it was open source and just worked with matplotlib, the plotting libraries and all the things that existed at the time. So very quickly, by 2013 or so, we had this full stack environment with the bare essentials: an interactive computing environment through IPython and the notebook that became Jupyter, numerical computing through NumPy and SciPy, plotting through matplotlib, and data wrangling through pandas.
So if you were an aspiring data scientist, or somebody looking to do business analytics or build a data analysis application in a business setting, at that point you could credibly make the case to your colleagues and your boss, or whoever, that you [00:09:00] had the tools at your disposal to build something without getting pulled down into a rabbit hole of having to build essential components from scratch. Some people underestimate how important it is to be able to read CSV files, but it turned out that just being able to point pandas at a CSV file was its initial killer app. Like, oh, I have a data file; I can say pd.read_csv and read it. People take that for granted now, but circa 2012, that was a big level up for Python.
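For readers following along, the one-liner Wes describes looks like this; a minimal sketch with a hypothetical file name and columns:

```python
import pandas as pd

# Point pandas at a local CSV file; the path and column names here
# are made up for illustration.
df = pd.read_csv("sales.csv", parse_dates=["order_date"])

print(df.head())                             # peek at the first rows
print(df.groupby("region")["amount"].sum())  # simple aggregation
```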
Anthony: No, I mean, that alone. The hell that is parsing CSV files, being able to abstract that away and deal with the common use case there. I would venture to bet that 98% of use cases were loading CSVs off a local file system, even for very large amounts of data.
So, shifting closer to the present: you've also shifted your energy and focus towards Apache Arrow. [00:10:00] I think it would be fair to say that you would frame Arrow as the foundation for the composable data stack, and there are tools like DuckDB and others that build on that.
So talk a little bit about what the composable data stack is, for those who are not familiar, and maybe build on the lessons you talked about: building the bigger snowball, this idea of interoperability and building an ecosystem, and how Apache Arrow fits into that as well.
Wes: Well, you can think about the late 2000s and early 2010s as a little bit of the food, water and shelter era of big data and open source data processing in general. You had disparate communities building solutions to solve the immediate, essential problems that were right in front of them.
There was relatively little consideration given to building larger, more heterogeneous data stacks full of multiple programming [00:11:00] languages, different data analysis tools, processing engines and storage engines. People were in general building integrated systems, a solution for a particular problem, with a layer of technologies. Classically you would have a database system that has everything tightly integrated together:
the storage engine, the data ingestion, query planning and optimization, query execution, as well as the SQL front end, all present within a vertically integrated, full stack system. But one of the key ideas from the Hadoop big data era, which was originally kicked off by Google's MapReduce paper, was
this idea of disaggregated storage, or storage being decoupled from compute. So you could start to think: okay, how can we store data in a way that multiple compute engines can work on it? That gets you thinking about standardizing data formats, having open, open source [00:12:00] standards for data formats. That was stage one of what was happening in the big data ecosystem. But as time dragged on, there was a collective realization in the mid-2010s that the cost of building one-to-one, pairwise interfaces between programming language A and compute or storage system B was not only hampering the performance of systems by introducing a lot of serialization and interoperability overhead, it was also fragmenting effort and weighing down progress in the overall open source data ecosystem. So the original idea of Arrow was: if we could define an open standard for data interoperability for tabular data, in particular column-oriented data similar to what you would find in pandas or in an analytic database, we could use that format for moving data efficiently between programming languages, between compute engines, between storage systems and [00:13:00] execution layers. That would not only improve performance, but also reduce the amount of glue code that has to be written by developers to make these systems work.
We initially started with that data interoperability problem, essentially having a better wire protocol for hooking systems together, and that came through the Arrow project. We already had some standardized file formats like Parquet and ORC at the time, and Arrow was designed to work really well with those as a companion technology. What we've seen as time has moved on is that we can start to modularize and decouple the other layers of the stack. We can start to think about modular execution engines, or decoupled front ends, different types of user interfaces for interacting with compute engines.
Not only SQL, but also more data-frame-like interfaces, like what you would see [00:14:00] from using pandas. So the way we described this composable data stack, another way we looked at it, was the deconstructed database: take the architecture of a traditional, vertically integrated database system, separate all the different layers,
and build open protocols and open standards to connect those pieces with each other, to enable interchangeability. If somebody comes along and develops a better storage format, we can incorporate it. Maybe that storage format only works well for certain types of use cases, but you can choose to, say, use Parquet files for one set of use cases.
Now there are new file formats specialized for multimodal AI data, including images and video, like the Lance format. You'd like to be able to take advantage of those new developments in the ecosystem without doing a full tear-out of your system, basically throwing the baby out with the bathwater, in order to get access to new or [00:15:00] enhanced functionality in one layer of the stack, if that makes sense.
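As a concrete illustration of the interchange Wes describes, here is a minimal sketch (not the Arrow project's canonical example; the file name is hypothetical) of one table moving between pandas, Arrow's columnar format and Parquet storage without hand-written glue code:

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Start in pandas and convert to Arrow's language-agnostic columnar format.
df = pd.DataFrame({"region": ["east", "west"], "amount": [120.0, 87.5]})
table = pa.Table.from_pandas(df)

# Hand the same columnar data to a storage layer (Parquet, a companion format)...
pq.write_table(table, "amounts.parquet")

# ...and read it back; any Arrow-native engine or language could consume
# this table without row-by-row re-serialization.
print(pq.read_table("amounts.parquet").to_pandas())
```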
Anthony: It does. And what I'm curious about is how this interacts with the more traditional closed source database and analytics software providers. Have you found support there, or has this felt like competition? How do you get folks like that on board? Or is that not even a consideration, and it's really just a kind of orthogonal standard?
Wes: Well, I think it's one of those stories of the open source model succeeding, in the sense that a lot of the adoption, growth and success of the composable data stack, essentially Arrow, DuckDB, DataFusion and a collection of related open source technologies, has been driven by grassroots, bottom-up adoption and pressure on some of the larger, more [00:16:00] powerful forces in the ecosystem.
DataFusion, for example, is a modular, customizable columnar query engine similar to DuckDB. It's written in Rust, but it's really designed for customizability. Rather than being a batteries-included, ready-to-go system, which is more like DuckDB, DataFusion is meant to be customized. It wants you to mess around, modify it and add operators to its logical planner layer. It wants you to hack on the optimizer and introduce new features in its SQL dialect. The idea of DataFusion is that you could take this off-the-shelf, high performance, Arrow-native query engine and use it to build your custom database engine or your custom query processing solution. As time went on, DataFusion just got more and more popular, and better and better in terms of performance and extensibility, to the point where [00:17:00] Apple decided to go all in on using DataFusion to build its Spark accelerator layer, called DataFusion Comet. So now there's a team at Apple accelerating Apache Spark with DataFusion.
The creator of DataFusion, Andy Grove, works for Apple and leads that team there. But that wasn't necessarily a top-down thing; rather, it was people watching the evolution of the data stack and seeing: okay, these technologies are becoming integrated all over the place and they're getting better and better over time.
They're attracting more and more contributions from the open source ecosystem, and it's better to get involved in these projects, hire people to work on them and influence their direction in a beneficial way, than to go some totally different way or build something completely proprietary. I think another thing that has driven the adoption of these composable data [00:18:00] stack technologies has been the trend toward open data lakes, or what is now called the lakehouse architecture, where you have a structured data lake with a scalable metadata store like Apache Iceberg. You can think of it as an evolution, or a formalization, of some of the ideas from the Hadoop era. Originally, the way datasets and their metadata were managed in Hadoop was that you stored the data in HDFS, and there was a metastore called the Hive Metastore, basically a MySQL or Postgres database that contained all the details about what constituted a table.
Whenever you were planning a query, you would read from the Hive Metastore, and that would tell you what files you needed to read to run a particular SQL query with your chosen compute engine. But Hive ran into scalability challenges, especially in very large data lakes. That led to the creation of these new open data lake, or lakehouse, technologies like Iceberg, Hudi and Delta [00:19:00] Lake, which present a more scalable, higher performance approach for some of the really massive data lakes you find among the biggest companies in the world. So it's been interesting, but again, it's the success of the open source model. And it's been interesting to see cutting-edge research come out of CWI, one of the birthplaces of analytic columnar databases; CWI, MIT and a handful of academic database research labs. The research group at CWI chose to build DuckDB as an open source project. They could have gone and built another commercial analytic database company, but they chose to build a SQLite-type system for analytics, one of the best batteries-included, embeddable SQL engines out there.
Now they have a project that is open source and has massive adoption, and you'd be crazy to go and build a brand [00:20:00] new embedded database engine from scratch. Either you need something customizable and you want to work in Rust, so you use DataFusion, or you want something batteries-included and ready to go out of the box, so you choose DuckDB. That's what we're seeing across the board.
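To show the batteries-included end of that spectrum, here is a minimal sketch of embedding DuckDB in a Python script; it queries the hypothetical Parquet file from the earlier sketch in place and hands the result back as an Arrow table:

```python
import duckdb

# DuckDB runs in-process; no server to stand up. It can scan Parquet
# files directly (the file name is hypothetical).
result = duckdb.sql("""
    SELECT region, SUM(amount) AS total
    FROM 'amounts.parquet'
    GROUP BY region
    ORDER BY total DESC
""")

# Results come back in Arrow's columnar format, staying in the
# composable stack's common representation end to end.
print(result.arrow())
```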
Anthony: Yeah, so the open source strategy ultimately trumps, because it allows for the full range, from highly customizable through to, as you call it, batteries included and fully functioning. So let's shift the conversation a little, not totally casting our eyes forward, but to the present: this whole idea of building on top of this stack, how people
generate insights and conclusions from this data. There's maybe a working theory that there is no future for data scientists and data engineers, because smart models, and I'm being [00:21:00] careful not to say LLMs, but maybe LLMs, are going to become so good at understanding and working with data that the notion of asking an analyst to tackle a problem will seem silly; rather, we'll just ask our smart agent to figure it out itself.
You have a bit of a contrarian view here, and my sense, though I don't want to lead the witness as it were, is that it comes from your experience dealing with tabular data and complex business logic in both pandas and Arrow, and from trying to connect what we were just talking about to this challenge.
So let's start with the question of whether you think we're on the verge of firing all the data analysts and turning it over to ChatGPT or whatever. Let's start there, and then we can talk about why, and what it is about tabular data.
Wes: Yeah, there's a lot to that question, many [00:22:00] layers to unpack. First, one of the things I've observed, maybe one of the dirty secrets of the term data science, and maybe why we're hearing the terms data science and data scientist thrown around less and less these days, is that when people were trying to hire data scientists and build data science teams in the early and mid-2010s, a lot of those teams ended up essentially doing business intelligence: engineering data pipelines, doing ETL or reverse ETL or data plumbing, ultimately to create a dashboard or a series of dashboards that could be updated. It was an evolution of the traditional, old-school BI engineer or ETL engineer building a database to power somebody's Tableau instance. I do think a lot of that work, the mundane [00:23:00] building of dashboards and letting business users ask in natural language for bespoke dashboards that answer the exact question they need, is increasingly going to be done by LLMs, especially given the work happening recently on semantic layers. Those are, I think, frankly necessary in a lot of cases to make LLMs effective at reasoning about the relationships between tables and generating correct queries against the data.
Without a semantic layer, there are lots of examples showing how an LLM can reason incorrectly about the join relationships between tables and make the kinds of errors writing SQL queries that a first-year analyst would make: double counting and things like that. But I think that ecosystem will [00:24:00] mature, semantic layers will become more widely deployed and standardized, and more and more of that dashboard building and custom dashboard work will be taken care of by agents. But I think there is still data science work that involves modeling and asking nuanced, subtle questions that require judgment and intuition: having domain expertise and an understanding of the business context where the questions are being asked, and choosing the right techniques to build a statistical model or a machine learning model. Maybe you're trying to determine a causal relationship or do some type of forecasting. A lot of this type of data science work is part science and part art, relying on experience from past modeling or [00:25:00] statistics work.
I do think that a lot of that kind of statistical work in data science still requires a lot of human judgment. Maybe eventually, once we have AGI, it will be taken over by an AGI statistician; we'll see. But in the short term there's still a need, and that's one of the areas where the most human judgment is required. If you just turn all of that work over to an AI data scientist, it's likely to run into pitfalls, or to only explore the types of questions or analyses that the LLMs are well suited for. And for the more complex work: there might be some study you need to run that requires dozens of queries, where the results need to be stitched together and compared in lots of different ways. LLMs are still at a stage where they often struggle to count.
So asking them to, say, run these 35 queries, [00:26:00] stitch the results together and then reason about them: right now a lot of that work is being offloaded to tool calling, because LLMs are not great at looking at datasets, which we can talk more about. But I think we're still in the early days in terms of data scientists being put out of a job by AI agents. We'll see where things land in a few years.
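To make the double-counting failure mode concrete, here is a sketch in pandas with made-up tables. Joining across a one-to-many relationship before summing inflates the total, and encoding that relationship is exactly what a semantic layer is meant to do:

```python
import pandas as pd

# Hypothetical tables: one order can have several shipments.
orders = pd.DataFrame({"order_id": [1, 2], "amount": [100.0, 50.0]})
shipments = pd.DataFrame({"order_id": [1, 1, 2],
                          "carrier": ["ups", "fedex", "ups"]})

# Naive query: join first, then sum. Order 1's amount is counted twice.
joined = orders.merge(shipments, on="order_id")
print(joined["amount"].sum())  # 250.0 -- inflated

# Correct: aggregate at the orders grain, before (or instead of) joining.
print(orders["amount"].sum())  # 150.0 -- the true total
```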
Anthony: [00:27:00] So, I want to maybe make this super basic. It strikes me that with large language models, the answer is in the name: it's a language model, and language by its nature is sequential. It also has these slightly arcane rules; all languages have a grammar, et cetera. And language is decidedly not tabular data.
And it's not really even code. To the extent that we believe LLMs are good at writing code, for example, code is a much closer analogy to language; in fact, it's in a way a simpler version of language, because it has a very tight grammar, a very tight syntax. Whereas, as anyone learning the English language for the first time well knows, there are lots of poorly implemented rules in natural languages. Tabular data, meanwhile, has a very specific set of features that LLMs are just [00:28:00] not at all competent with. You gave the very simple example of counting, but even simple notions of sorting and querying and filtering, basic behaviors, are things that are just a mystery to them. Am I framing that fairly?
Wes: Yeah. I haven't run any studies myself or set up structured evals to get the accuracy data firsthand. But I remember early on, I think in the Claude Sonnet 3.5 or maybe Claude Sonnet 4 era, I built a little system to collect and summarize data from Git repository history. It would create little tables, and there was a lot of data to summarize. Then I asked it to analyze the data summarized in the tables, and it would struggle to do basic arithmetic in combining really small tables of data. That was the first time a light bulb went off in my head: oh gosh, these models are for language. They struggle to do [00:29:00] things like adding or combining datasets unless all the work is delegated to tool calling or to writing Python. They'd be better off writing Python code to do the work than actually trying to do the arithmetic or the logic in the language model itself. But yeah, you're absolutely right. And there's been some research lately around the retrieval problem. The idea of the retrieval problem is that you present a table, let's say a spreadsheet of data, say students in a class, with a bunch of columns of attributes about those students. Let's say every student has 10 attributes stored in the table, and you ask the model: for this student, can you tell me their attribute C, or their attribute F? So just essentially
Anthony: A straight lookup.
Wes: Straight up looking up the value in the table. The frontier models have gotten to a point where [00:30:00] they have high accuracy, over 90% accuracy, on the retrieval problem. The smaller models fail catastrophically at it. I'm blanking on the name of the blog post, but people have done studies even trying to determine the best data format in which to present a table to an LLM, especially to the smaller, more cost-efficient models, to get the best accuracy on the retrieval problem. And the results are a little surprising. You would think CSV would be a good format for an LLM to look at the data, but it turns out that among the 10 or 15 different ways you could format a tabular dataset, there are some weird formats. Markdown key-value, I think, was one that I saw, a format I'd never heard of; I think it was invented for the study. It turned out that presenting the data in this markdown [00:31:00] format, with each row as a markdown section, would yield better retrieval than putting the data in XML or JSON in the prompt.
I suspect, and I'm not an AI scientist, that some of it may have to do with the autoregressive, next-token-prediction design of these models. I'm sure they'll get better over time.
But these large frontier models are really expensive to run. The hope is that as LLMs advance, the small models become more and more effective, so we can run the model at the edge, on our phones. I just got myself one of NVIDIA's DGX Spark AI mini computers to experiment on and to build some of my own fine-tuned models. I'd be really excited if we could do really great work with local LLMs, not requiring $30,000 or $40,000 GPUs; but to really get the performance, you've got to run on these [00:32:00] really expensive hardware configurations.
So cutting-edge inference is quite expensive these days. It's an interesting problem. There are a number of companies working on models specifically for tabular data, basically an AI approach to prediction, forecasting, regression and things like that, and I think we'll definitely see more interest in that area. I'm surprised the frontier AI research labs aren't there; maybe they've got internal research projects they haven't announced. But maybe as some of the hype shifts away from chatbots, more work will shift towards building foundation models for tabular data, because ultimately a lot of the value to unlock for businesses does lie in their data. To get the most value out of AI in a business context, somehow we've got to reconcile this incongruence between current generation LLMs and [00:33:00] tabular datasets. Even MCP, which was developed to provide a standardized interface for LLMs to interact with external systems and tools, is not an especially efficient way to expose data to an LLM. Even if LLMs were really good at looking at datasets, MCP is not the interface you would want for presenting a hundred-thousand-row table to a model. Thinking about all the engineering work we've done on Arrow to achieve high performance interoperability in all these contexts, the AI equivalent, how we expose data to an LLM, looks like caveman tools by comparison.
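Wes doesn't name the blog post, so the exact format is uncertain; as a hedged sketch, here is one plausible reading of "markdown key-value", each row rendered as its own markdown section, next to the more obvious CSV rendering:

```python
import pandas as pd

# A tiny hypothetical table of students.
df = pd.DataFrame({
    "student": ["Ada", "Ben"],
    "grade": [91, 84],
    "attendance": [0.98, 0.91],
})

def to_markdown_kv(frame: pd.DataFrame) -> str:
    """Render each row as a markdown section of key: value lines.
    One guess at the format Wes mentions, not a spec."""
    sections = []
    for _, row in frame.iterrows():
        lines = [f"## {row.iloc[0]}"]
        lines += [f"- {col}: {row[col]}" for col in frame.columns[1:]]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)

print(to_markdown_kv(df))      # row-per-section prompt text
print(df.to_csv(index=False))  # the CSV alternative
```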
Anthony: Which is exactly the connection I was hoping you would draw, because we've done 10 or 15 years' worth of work to make data interoperable, and then we turn to this new world and we just start from [00:34:00] scratch. It doesn't make any sense. I was also going to ask you: it seems to me a failure point with these models is that they aren't deterministic. The one thing we can say confidently for any analytical problem is that if you run the analysis twice, you should get the same answer. It's not okay to say, well, it's sort of around two, or today it's two and tomorrow it's twenty.
Wes: Yeah.
Anthony: Whereas that's actually a feature, I think, of these language models. Obviously you can adjust the temperature, but I think something people like about them is that they give you different answers every time; otherwise, it's just a rules-based system. I don't know if that resonates with you.
Wes: It does. Throughout all of last year I was pretty AI-skeptical, let's put it that way. This year, in part because the models have gotten a lot better, and also because of the emergence of CLI coding agents like Claude Code, [00:35:00] that changed. For me that was a big unlock. I wasn't particularly enthused about using the AI IDEs like Cursor and Windsurf,
but within a couple of weeks of using Claude Code to delegate mundane coding work, refactoring, stuff that was taking up my time and didn't seem particularly high value, and seeing really quick returns and productivity benefits, I became a big believer. Now I use Claude Code almost every day, and I'm definitely costing Anthropic money on my Max plan, given the couple billion tokens I consume every month. But at the same time, the imperfection, the inconsistency, the non-determinism: it is problematic, especially for data work, where you need an exact answer every time you run the system. There's the risk that if you have the [00:36:00] model call tools and you're leaving some interpretation of the results to the model, one day you get one answer and another day, from the same input, you get a different answer, and that could lead to a business decision being taken that is detrimental to the business.
Like particularly like creating a consistent development environment where each time I pick up cloud code that I can count on the agent to predictably, do the same things. Whether it's mundane things like. making sure that the style checks run but you know, it seems that, 20 to 50% of the time, like from day to day, like without modifying the prompts or the Claw MD or any of the sub at all, like, you know, the next day I'll, I'll open Claude code and it will forget to. do things. And I'll say, Hey, you forgot to do this. Like, CI is failing. They're like, you're absolutely right. I ignored your instructions that [00:37:00] we wrote in Claw md and so the idea that these tools can just like casually, forget things that, you know, even with their massive context windows, like they, they have the memory of a goldfish and, so, you know, I'm sure that it will get better and, again, like I use these tools every day, like they, they bring a lot of value for me, but I'm also not quite, drinking the Kool-Aid and, and believing that like, you know, this is the next great step for humanity that's, you know, going to lead to, a world without work.
Anthony: Yeah, and maybe to say it in a really simple way, I think you would agree that it's made you much more productive, but it's not made you obsolete.
Wes: Yeah, no. If anything, my experience building software feels essential when I'm using these tools, because I need to be able to read the code, to review it as though I were reviewing the work of a junior developer, and to tell it all the things it messed up: the architectural problems, the incorrect or missing unit tests, [00:38:00] incorrect documentation, incorrect implementations.
I have a lot of experience reviewing other people's code, and I feel that is one of my biggest assets when working with these coding agents. I tell myself when I'm working with Claude Code: treat all the work coming out of this agent like the work of a very motivated, very productive junior developer who's prone to errors and, frankly, to creating messes.
Being able to spot at a glance the design and architectural problems, the things that need to be refactored, the code duplication, the code smells, all of that feels essential to getting the most value out of these tools. And I've heard from talking to other people that the AI users who get the most value out of the coding agents are the most experienced developers, who bring their experience and judgment not only to writing better prompts, being very specific and articulate [00:39:00] about what they're asking for, but also to judging the output and giving high quality feedback, so they can corral things in the right direction and get what they want.
But I do think we're likely to have a vibe coding epidemic, a kind of amplified Dunning-Kruger syndrome of people building software with AI slop, not reading the output carefully, not doing code review, basically just letting Codex or Claude Code do its thing and then slapping up a pull request without giving it a second's thought. There are likely to be substantial business losses because developers are deploying vibe-coded software into production without sufficient code review. Of course, you can protect yourself against some of this by taking a test-driven development approach and asking the agent to build a test suite, which you of course have to review before you set it to implementation work. In the past I was [00:40:00] never really a hardcore test-driven development, TDD, adherent, but now, using coding agents, I've become much more so, because each time I sit down with Claude Code I treat it as a defensive exercise: how do I protect myself from the agent doing incomplete work, or insisting that it's solved the problem in the way I asked for when it's actually deceived itself into believing it's finished? So the more test coverage, automated checks, benchmarking suites, all the defensive things you would already need to do to create a piece of production software, it's even more important to do that with these agents. Any software they create becomes a huge liability otherwise.
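A minimal sketch of the defensive, test-first loop Wes describes: the human writes, or at least reviews, the failing tests before the agent is set loose on the implementation. The module and function names here are hypothetical:

```python
# test_dedupe.py -- reviewed by a human before asking the agent to
# implement dedupe(); "pytest passes" then defines "done", rather than
# the agent's own claim that it finished.
import pytest

from dedupe import dedupe  # hypothetical module the agent must write


def test_preserves_first_occurrence_order():
    assert dedupe([3, 1, 3, 2, 1]) == [3, 1, 2]


def test_empty_input():
    assert dedupe([]) == []


def test_rejects_non_sequence_input():
    with pytest.raises(TypeError):
        dedupe(None)
```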
Anthony: I think that's a wonderful insight, actually: these practices become more important. And I want to loop it back to something you said before, [00:41:00] which is that the future is a smaller number of really experienced, thoughtful, architect-engineer types marshaling an army of these agents.
And then, to your point, layering in these defensive coding practices, maybe that model starts to feel closer to the future. I'm waiting for the first spaceship crashing into Mars due to a vibe coding error, as opposed to a units conversion problem.
Wes: It's interesting. I think one of the existential problems is: how will junior developers become senior developers? The old working
Anthony: Right.
Wes: model was that you become a senior developer by doing junior work over a long period of time,
getting good not only at writing code but at reviewing code and seeing what good code looks like. As your career progresses, more of your work transitions: maybe as a junior developer you're doing 90% writing code and 10% code review, [00:42:00] and a senior developer is doing 10% development and 90% code review. Now, delegating more of the coding work to these agents means more and more of our work shifts to code review, and I think code review is still going to be a bottleneck. Even with a senior developer and an army of AI agents, who's going to review all of that code?
I mean, you can have the agents review their own work, or you
Anthony: Well, good luck with that.
Wes: Yeah. I have friends who have told me about having Claude Code implement and Codex review, or vice versa, basically using the agents to review each other's slop and make it better.
But I think on the whole we're going to have fewer software engineers, and especially the really experienced engineers will spend more of their time reviewing the output of agents. There's still a human bottleneck in doing code review, assessing the work and determining [00:43:00] whether it should be accepted, or whether, in many cases, these AI-generated pull requests and patches should just be thrown out altogether and started over. One of the nice things is that the cost of starting over is so much smaller. It's basically the Panama Canal approach versus going around the bottom of South America: with an AI agent, you can generate an impressive amount of code, thousands and thousands of lines in an afternoon, to hack your way around a problem.
But maybe there's a more elegant solution that the AI agent just didn't see, that wasn't obvious given all the data in its training set. If everyone is always using the agents and creating solutions that are sometimes circuitous, or that miss the more elegant, more maintainable, more sustainable approach, [00:44:00] that's going to lead to projects where you have a hundred thousand lines of code that maybe should only be 15 or 20 thousand. Written by a human, it might be a much smaller codebase that's easier to maintain and more robust over time. With a large codebase, you start to reach a limit where even having the agent look at the code becomes unwieldy; with files of thousands of lines, it starts to choke on the input pretty quickly. I recently created a little personal finance tool with Claude Code called Money Flow, for interacting with Monarch Money, the personal finance tool I use. Now I have a project created from scratch with Claude Code, 95 to 99% written by it, with a lot of feedback and code review from me, and, including the test suite and all the infrastructure, it's pushing 40,000 lines of code. If I'd written it by hand, it would probably be a lot smaller, in part because I couldn't spend nearly the amount [00:45:00] of time it would take to write a codebase that large, and I would've cut more corners or made simplifying assumptions to accomplish the same things with a lot less code. So it's interesting. I'm learning day to day, and I'm no expert by any means, but I feel the more I use these tools, the better I understand them and the more I get out of them.
Anthony: Well, Wes, I really appreciate you taking the time. This was a bit of a journey, from where you started through to the present day, and also thinking about a future where we're marshaling agents on our behalf. I really appreciate you sharing your thoughts and insights.
Wes: Thanks again for having me on.