S3 - Episode 3
Data Masters Podcast
Released July 6, 2023
Runtime: 35m23s

The Language Revolution: Navigating the Frontier of AI-Language Models with Dean Abbott

Dean Abbott
Founder & President of Abbott Analytics

This thought-provoking podcast episode delves into the fascinating world of GPT's large language models and neural nets. Join us as we explore the impact of adding a chat interface to GPT and how it has taken off, captivating the public's consciousness. Our guest, an expert in the field, shares insights on the divergent opinions surrounding large language models and neural nets in relation to language.

Welcome to another episode of DataMasters. Our guest today is Dean Abbott, founder and president of Abbott Analytics. Since March 1999, Abbott Analytics has led organizations through applying and integrating leading-edge data mining and machine learning methods to marketing, research, and general business endeavors. Abbott Analytics has been dedicated to improving efficiency, ROI, and regulatory compliance through machine learning. Before founding Abbott Analytics, Dean worked as Wonderkind's chief data scientist and as chief data scientist and co-founder of Smarter HQ.


Intro - 00:00:02: Data Masters is the go-to place for data enthusiasts. We speak with data leaders from around the world about data, analytics, and the emerging technologies and techniques data-savvy organizations are tapping into to gain a competitive advantage. Our experts also share their opinions and perspectives about the hyped and overhyped industry trends we may all be geeking out over. Join the DataMasters podcast with your host, Anthony Deighton, who is Chief Product Officer at Tamr.

Anthony Deighton - 00:00:37: Welcome to another episode of DataMasters. Today's guest is Dean Abbott, founder and president of Abbott Analytics. Operating since March 1999, Abbott Analytics leads organizations through the process of applying and integrating leading-edge data mining and machine learning methods to marketing, research, and general business endeavors. Abbott Analytics has been dedicated to improving efficiency, ROI, and regulatory compliance through machine learning. Before founding Abbott Analytics, Dean worked as the chief data scientist of Wonderkind and as chief data scientist and co-founder of Smarter HQ. Welcome, Dean.

Dean Abbott - 00:01:19: It's a pleasure to be here. It's always fun to talk about data for all of us geeks out there.

Anthony Deighton - 00:01:24: Yes, and hopefully we can do a bit of that. So, before we dig into data and analytics and machine learning, maybe share a little bit about Abbott's founding story, how you came into this business. You've been doing it since 1999. That's a long time ago, so you have quite some history here.

Dean Abbott - 00:01:43: Yeah. And the Abbott Analytics story started in 1999, as you indicated. But my story with analytics in general really began in the late eighties, 1987, and I've been doing essentially the same kind of analysis for 35, 36 years now. Starting out of grad school, I was at the University of Virginia and found a temporary job with a company that was doing missile guidance. I was in applied math and control systems, control theory, and I was interested in arcane things like differential equations and stuff like that, which is why I did controls work. But the secret sauce at that company was applying statistical learning to optimize guidance commands. And that turned out to be the part that was really, really interesting. Several people in that company went on to start their own machine learning or data mining companies, as the field was called at the time. I finally went independent in 1999 as a consultant, after spending about a decade mostly in DoD consulting or at companies that were doing consulting, and since then I've been working with a wide range of companies, applying these techniques to solve business problems. I'll tell you, I'm not a recipe-driven kind of guy. I love the kinds of problems that require more thinking and more connection with the business. So it's not like you can read a book on algorithms and say, oh, we'll use this algorithm here. You've got to connect it to what the business is trying to accomplish. And sometimes the simplest thing is just a statistical measure. Sometimes it's a machine learning model, sometimes it's a whole infrastructure. So I've done this with a wide range of clients, everything from retail to DoD. I mentioned to you before that I built models for the Navy Seals for years to try to predict who would make it through Hell Week. I've worked with the IRS for a decade or more, finding non-compliance on corporate tax returns, and with Revenue Canada, to find how much someone owes in tax. I've also done a wide range of behavioral marketing with retail clients you would know and love, hopefully, to try to identify what would be the best content to put in an email, what's the best product to recommend for you, those kinds of questions. But they're all analytics-driven at the bottom line.

Anthony Deighton - 00:04:08: Yeah. And I think that's the point: rather than starting with the algorithm and thinking about how to apply it to a problem, you start with the problem and then think about what algorithms make sense, and sometimes a simpler technique is more applicable, more interpretable, and easier to apply to the problem. Given your history, I mean, in some sense, if you roll the clock back from 1999 through to today, the analytics industry has changed a lot. The machine learning industry arguably has come into its own. Certainly statistics has been around a lot longer, but using computers to calculate statistics extremely quickly and in great quantities has changed a lot. So I'm curious about your perspective on how all of these things, in a sense, have changed from 1999 through today. What are some of the bigger trends you've seen over that time?

Dean Abbott - 00:05:02: That's a really interesting question. It's one I get a lot, too, because I would say, except for a couple of very notable exceptions, it's the same. If you go online, to a website like KDnuggets, or look at the survey a buddy of mine, friend and colleague Karl Rexer, puts out every couple of years, the Rexer Analytics Data Science Survey, one of the things they ask is, what algorithms do you use? Now, these are data science practitioners. What algorithms do you use to solve problems? Number one, linear regression or logistic regression. Number two, decision trees. Number three, k-means clustering. It's the same story, and it's been like that for decades. However, there have been two big shifts, mathematically and algorithmically. Number one, starting around when I started, was the concept of ensemble modeling. Ensemble modeling means that instead of just building one model, just one regression or one tree, you actually build lots of them. All these models have a slightly different take on identifying the patterns you're interested in, and then you combine those predictions into one final decision. That approach has been tremendously effective at improving accuracy. If people listening are familiar with things like random forests or XGBoost, those are examples of ensemble methods, and they've gained huge popularity in the space. So that's relatively new, late nineties, and it really took off in the last ten years. The second big thing is the resurgence of neural networks in the form of deep learning. It's really captured the mindset of the leading-edge portion of the industry, and that's what things like ChatGPT are all about. It's really deep learning neural networks, and they're essentially the same kinds of neural networks I used to build in the 90s, but they do a lot more. The biggest difference, though, the reason why deep learning has taken off in the last six, eight, ten years, is because of something else you mentioned, which is computational power. We can compute things that a generation ago there's no way you could even dream of. I used to build essentially deep learning networks in the early 90s for image processing, but I had to use specialized hardware that cost tens of thousands of dollars for just one board, and it would do what we might call SIMD processing, single instruction, multiple data, so you'd do things in parallel and then stitch them together at the back end, all in hardware. Deep learning is essentially doing that, but it's so much easier to do now. The infrastructure is just phenomenal, and that means people right out of school, right out of grad school or undergraduate school, can just leverage something in the cloud and build something huge, incredibly sophisticated, incredibly complex, to solve very complex problems.
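
For context, here is a minimal sketch of the ensemble idea Dean describes, assuming a scikit-learn workflow (the dataset is synthetic and purely illustrative, not from the episode): a single logistic regression alongside two ensembles, a random forest and gradient boosting, each of which builds many trees with slightly different views of the data and combines their predictions into one decision.

```python
# Hypothetical illustration of single-model vs. ensemble approaches.
# The data is synthetic; the point is the pattern, not the numbers.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One model: a single logistic regression.
single = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Ensembles: many trees, each with a slightly different take on the
# patterns, combined into one final prediction (bagging and boosting).
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
boosted = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

for name, model in [("logistic regression", single),
                    ("random forest", forest),
                    ("gradient boosting", boosted)]:
    print(name, round(model.score(X_test, y_test), 3))
```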

Anthony Deighton - 00:08:02: I was going to say, I would very much agree with that perspective. So if I were to try to summarize: the techniques are largely the same, but what's changed in the last ten years is the availability of highly scalable computing. I can speak to that personally. As an undergrad, I did a statistical modeling class and was trying to build a prediction model for oil prices, and each time I would build and test a model it would take 15 to 20 minutes to build, calculate, and run. So my ability to iterate on models was constrained by the 286 processor in my Compaq lunchbox-style computer. It was the first all-nighter I pulled, and not because I was necessarily behind, but because simply getting through the models, testing them, and changing them just took a long time. Today I could probably do that entire exercise in under a second.

Dean Abbott - 00:08:55: And the irony is, now we're kind of back there, in a way. And in another way that I think is really fascinating, which I may touch on: because of the complexity of some of the models we can build now, all of a sudden time matters again. There was a period where time really didn't matter, because you could do things so quickly. I kind of crashed a meeting at the San Diego Supercomputer Center, at the University of California San Diego, in the late 90s. It was a meeting on big data with these famous statisticians, and I asked if I could attend. I just wanted to listen, because I was not in their league at all. One of the questions that was asked was, what's the largest data set you've worked with? How many rows? And for many of them it was like 1,000 records. Maybe 10,000 was big in that day. Millions were definitely possible, but that's kind of where people were capping things: millions, maybe 10 million if you had specialized hardware. And now, a million records, we don't even blink. I mean, that's like, yeah, whatever, at this point.

Anthony Deighton - 00:09:53: So let's talk a little bit about the man of the hour: GPTs, large language models, neural nets. It very much has captured the public consciousness. I've said this before, but I think one of the smartest things OpenAI did was putting a chat interface on top of GPT. In fact, I don't know this for sure, since I don't know anyone on the inside, but I'm guessing that someone did that as a lark. It's like, well, what if we put up an interface to let people type chat messages back and forth? What would happen? And all of a sudden it's taken off. So I'm curious. There's a lot of divergent opinion on large language models and neural nets as they relate to language. Some people see it as wildly uninteresting, verbal diarrhea, I've heard it described as; others think that we are going to see mass layoffs and the end of work as we turn over all of our daily work to a large language model. Where do you sit on that continuum?

Dean Abbott - 00:10:53: Yeah, it's a really good question, and in the end, I don't know. I usually avoid predictions of where the technology is going, because the times I have predicted, I've been so wildly wrong. But you're right that there is such a variety of opinions. I was speaking with a colleague the other day about the 10, 20, 30, 40% of an industry like copywriting that's just going to go away, in his opinion, because people will just enter it into ChatGPT. You tune the prompt properly and you get this wonderfully written article, and I know there are news agencies that are doing that, which is fascinating to me, because I've tried it a lot, as you can imagine. I love tinkering with things. And oh, by the way, on your supposition that it was a lark: I wouldn't be surprised if someone just said, how do we test this out? Let's just have a chat interface, and that'll give us a better idea of what it's doing. But it was genius, because the natural language responses are, I think, part of what's capturing everyone. Now, this is not new. There are other language models that do this kind of thing. In fact, for fun, 10 or 12 years ago, I used to go to this website, enter my name, and it would generate a random mathematics paper, ten pages in length. It was funny as anything, because all the language looked like math, but it was garbage. ChatGPT is not garbage, but that was completely garbage; it just read as very mathematically correct. In fact, they claimed they submitted papers to some conferences and got them accepted, that kind of thing. But my point is that this is not new. What is new is that such a large language model, so comprehensive and not so narrowly focused like the math one, is able to respond in cogent ways to any prompt you give it. It's really a phenomenal architecture and a phenomenal approach. And remember, this is one model, and I'm using that term not in a very specific way, just in kind of an abstract way: one model doing all this. So there's good and there's bad with it. I've used ChatGPT myself just to generate Python code. I'll say, generate Python code to do X, and it'll generate some Python code for me. It's not perfect; really, what it does is get me through that first draft. Or I'll say, generate text. I just did this. My daughter's getting married in a month and a half, and I wrote up a bunch of narratives about her and her fiance. I said, generate a rhyming poem about them that I could use as father of the bride. And it generated something for me. Now, I'll say it was horrible. It's not something I would ever read. It was correct, it had the correct information, it had rhyming patterns, but it so lacked creativity. And I think that's what I'm trying to get at from a writing standpoint: my experience with it has been that it's factual, kind of interesting and plausible, but it doesn't contain style. I know that, generally, if you prompt it properly, maybe you can get better style out of it. But the other part that is a little concerning to me is that it will regurgitate or spit out information that comes across as authoritative and correct but that's completely wrong. Before we started recording, I told you the story that a friend of mine asked it who invented PMML, and it said me, or that I was in the group that created PMML, and I'm not. Now, I've written about PMML, I've written about model deployment where I've mentioned PMML.
So I think I was associated with the data somehow, but it was completely wrong. And if you just took that prompt and wrote a news story from it, you'd be completely wrong. So I don't trust it. That's my main point. I don't trust it to do anything without some kind of layer put on top of it to refine and tune it, because it's not the same as a Google search. A Google search is finding an actual written document that says what you were asking for. This is generating language probabilistically, based upon what it's seen in the corpus of literature it's consumed, saying, oh, when you see this kind of concept, these kinds of things are associated with it, and then it formulates those concepts or ideas and strings them together. So it's not quite at the point where it can be fully trusted. That said, it is still a phenomenal achievement, and I will use it, but I will not use it blindly.
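
As a toy illustration of what "generating language probabilistically" means, here is a tiny bigram sketch (nothing like GPT's actual architecture, and not from the episode): each next word is sampled from the words that have followed the current word in the training text, which is why fluent-sounding output carries no guarantee of being factually right.

```python
# Toy bigram generator: sample each next word from the distribution of
# words that followed the current word in a (made-up) training corpus.
import random
from collections import defaultdict

corpus = "the model predicts the next word and the model sounds confident".split()

following = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    following[current].append(nxt)

word = "the"
output = [word]
for _ in range(8):
    choices = following.get(word)
    if not choices:              # dead end: no observed continuation
        break
    word = random.choice(choices)
    output.append(word)

print(" ".join(output))          # fluent-looking, but only echoes corpus statistics
```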

Anthony Deighton - 00:15:28: Yeah, I think this point about trust is a really interesting one. And the comparison to Google is obvious; Google executives are spending a lot of time thinking about how to respond to the threat. But one of the things the Google search engine does with PageRank is create a sense of authority. The theory being that a highly ranked article or page is more authoritative than a lower-ranked one. Which isn't to say that highly ranked pages on Google are true, but they're certainly more authoritative and more likely to be true. And of course, this begs the question, what is truth? And to your point about saying things with confidence that are false, I'd say humans have a long history of this. People get good grades in business school by saying things that are not true, but with authority. It's almost a core skill of a CEO to be able to speak with authority and make stuff up.

Dean Abbott - 00:16:25: Yeah, so it behooves us. In interacting with humans, we base how much we believe what someone says on their authority or their trustworthiness in the past, and we trust but verify, I guess, as the phrase goes. There are very few people we will just believe unflinchingly, without question. I think one of the things that is going to be a challenge for ChatGPT in the future is that the data it's using is, of course, digital data. So it doesn't have the same kind of common sense that we as humans have. I don't think we should view it as a human replacement, because we gather information in a wide variety of ways, through our senses, through our experience, that's not articulated in any particular way. And our way of learning is clearly not the same as what ChatGPT is doing. Now, I'm not saying that ChatGPT should try to emulate human learning. It was a colossal failure to try to emulate how to fly based upon how birds do it. We've seen all those movies of people trying to do that, and it just doesn't work. We had to understand the basic physics and then come up with another way to fly. And in terms of a natural conversant engine, a language engine or knowledge engine, I think we definitely need a different model than just trying to emulate humans, because we don't understand how people learn particularly well. There are some things we understand, of course, but it's not something we've plumbed the depths of to the degree that we can emulate it mathematically, anyway.

Anthony Deighton - 00:18:01: Fair point. So you make the point that the input data to these generative models is really important. And my guess is, although I don't want to put words in your mouth, that you have a general view that the quality of data in the enterprise, or the quality of data as an input to any of the work you have done or are doing, is one of the more important elements of the success of a project. And to that prior conversation, we would improve the quality of ChatGPT if we fed it better data. If we fed it only ground truth, whatever that means, then by definition it would only speak ground truth. But maybe speak for a moment more generally than ChatGPT about this idea of the quality of input data as an important grounding truth in building models and solving problems with models.

Dean Abbott - 00:18:49: Yeah, and I'm glad you're expanding it, because I was going to do the same thing. I think that principle of what data you use to build the models is so critically important, and there are obviously limitations with any model we build. Now, just philosophically, I'm in the camp that says math is not biased. Some people are concerned about bias in models, and I say it's not the math, it's not the algorithms, it's the data that's biased. The algorithms will spit out whatever data you feed them; that's what their ground truth is. And all of these algorithms, and this is something you mentioned too that I completely resonate with, these algorithms are naive. They trust everything you say; they think everything is true. Now, there are ways you can handle this with modeling data, by giving a kind of confidence or weighting associated with a record, to say how much I believe this. What happens mathematically is that it controls how much influence the record has in building the model, so you can give fractional weights to records from a less trusted source. Most people don't do that. Most people just say, here's a bunch of data, away we go. So there is going to be bias, undoubtedly, in every data set we bring to the picture, and it's important to understand what that bias might be. Sometimes the bias is a good thing, because we're trying to make specific decisions. For example, if I'm in retail and I'm trying to predict the likelihood someone will purchase something on the website in the next week, what should the set of data be that I use to build the models? You could say, I only want to know that answer for people who've bought something in the past. Okay, that's fine. That's a subset, and it's a biased sample. Of course they're more likely to buy something again, but if that model gets applied to the broad population, we have no guarantee it's going to do well. It might, by chance or by happenstance, or it may be that the patterns of behavior are the same between people who've bought something in the past and those who haven't, but probably not. That's a decision we make as a business: what pool, what population we're trying to make the decision for. So you design the data based upon what you're trying to decide on. And we do this as humans kind of naturally. Let's say you're making a hiring decision, and you say, okay, I need somebody for this position. Well, who are you looking for? I want somebody for this senior data science position, someone who knows the algorithms really well, so they've got a Master's or a PhD, but who also knows retail really well, so they've got, like, 15 years of experience. And they're experts in Python, they're experts in R. You come up with this whole list, and of course there's nobody who matches all of it. Right, and that's the way a lot of these reqs are created. But what do we do as people when we're looking at this internally? We trade off how important each of these criteria is when we're looking at the data coming in, and the algorithms don't know how to do that; they treat everything equally. So how do we achieve that? If we have a sense that, okay, all these things matter, but we really want to bias our pool more toward understanding the algorithms, and they can learn the retail part on the fly, then it's good if they have the retail experience, but it's not critical. So what does that mean for the data?
That means when we're bringing in data, we want the data to reflect those kinds of patterns, so that the decisions that come out reflect our biases. So sometimes we don't want bias, sometimes we do. And we've got to build the data to reflect what decision we're trying to make and what subpopulations are more important to us in the process of making those decisions.
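
A minimal sketch of the record-weighting idea Dean mentions, assuming a scikit-learn workflow (the data and the trust levels are invented for illustration): most estimators accept a per-record sample_weight that controls how much influence each record has on the fit.

```python
# Down-weighting records from a less-trusted source when fitting a model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Suppose the last 200 rows come from a source we only half trust:
# give them fractional influence on the fit.
weights = np.ones(1000)
weights[-200:] = 0.5

model = LogisticRegression().fit(X, y, sample_weight=weights)
print(model.coef_)
```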

Anthony Deighton - 00:22:30: Yeah, and I think what's interesting about that is that the kinds of bias we introduce into these models may not be obvious from the outset. So in building a model predicting what you might buy on a website or in a retail location, the idea of using purchases from the past seems quite logical, right? I mean, what other data do you have? That's the data you have, so you use it, while recognizing that it introduces a bias toward the kinds of people who've shopped with you in the past. So, for example, if you were trying to address a new market segment that you'd historically never sold to, and you used data from the past, data about a segment you've already acknowledged you weren't successful at selling into, then the model could be not only not helpful, it could actually be harmful. It could actually set you up for failure.

Dean Abbott - 00:23:16: Absolutely. It's such a tricky thing. A lot of times when I speak with execs, there's a view that we've got a whole pile of data, we'll throw it at you and magic will happen, because machine learning is basically magic: well, I don't know how it works, but good things just happen.

Anthony Deighton - 00:23:31: Well, just say "because machine learning."

Dean Abbott - 00:23:33: Exactly. So, to pick up your example with a new product: there's a concept in time series modeling in particular called stationarity. What we want to know is how consistent these patterns of behavior are over time. The more stationary they are, the further back we can go without breaking the patterns. But you were mentioning a new product, and there may be a problem if we go back too far in time with a new product, because maybe the business changed fundamentally; the demographic of who's buying has changed in the last three or five years, and sometimes that's very difficult to identify up front. We don't really know where the patterns break. So usually my rule of thumb is to keep the time period we look backward over as short as possible for these kinds of models. With the IRS models, for instance, sometimes there are regulatory constraints that have changed over time; the tax law has changed in the last ten years, so we can't look at patterns of non-compliance from ten years ago and assume they necessarily apply now, because things change so much. So I want as short a time period as possible, so that I have the most stationary data. But then you've got a conundrum, right? The shorter the time period, yes, the more relevant it is to what you're going to be predicting in the near future, but there's less of it. So how do you trade that off? You want enough data that you get stable patterns, but you don't want to go back so far in time that you break the patterns. Now, there are ways you can handle that, and I'm not going to go into them here. I'm happy to go through them; in fact, if there's ever a part two, I could talk about those kinds of strategies, because they rely on statistical tests and things like that. You can take a kind of design-of-experiments approach to how you build your data. But you have to think about it. You don't need a PhD in statistics to do this, either. You just have to have some common sense and probably some experience with where these models fail to be able to get there.
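
One simple way to probe the lookback-window trade-off, sketched here under assumptions Dean doesn't spell out (these are not the statistical-test strategies he alludes to, and the drifting data is synthetic): train on progressively longer histories and score each model on the most recent period only.

```python
# Compare training windows of different lengths on a recent holdout period.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 3000
dates = pd.date_range("2018-01-01", periods=n, freq="D")
X = rng.normal(size=(n, 3))
drift = np.linspace(-1, 1, n)                       # the relationship changes over time
y = (X[:, 0] * (1 + drift) + rng.normal(scale=0.8, size=n) > 0).astype(int)
df = pd.DataFrame(X, columns=["f1", "f2", "f3"]).assign(y=y, date=dates)

features = ["f1", "f2", "f3"]
holdout = df[df.date >= "2025-06-01"]               # most recent period as validation
cutoff = holdout.date.min()

for years_back in (1, 3, 5):
    start = cutoff - pd.DateOffset(years=years_back)
    train = df[(df.date >= start) & (df.date < cutoff)]
    model = LogisticRegression().fit(train[features], train.y)
    auc = roc_auc_score(holdout.y, model.predict_proba(holdout[features])[:, 1])
    print(f"{years_back} year(s) of history, {len(train)} rows: AUC = {auc:.3f}")
```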

Anthony Deighton - 00:25:24: So, speaking of experience, you brought up the IRS and the Navy Seals, and you're certainly welcome to pick either of those examples. I'm trying to decide whether it's better to ask you about spectacular failures or spectacular successes, since both are fun, but maybe share a practical example of how you've seen this work applied.

Dean Abbott - 00:25:42: Let me mention one, and if there's time, I'll mention a second one. One of the most surprising successes was building models for the Defense Finance and Accounting Service, which is essentially the government's accountants. I think this gets at a real core business question about what algorithms you pick, what data you use, and how you decide what's good or not. The problem here is that people submit invoices to the government, and some of them need to be investigated, because people are people and they try to defraud the government. They'll submit an invoice that's completely bogus, or they'll submit a duplicate invoice on a valid contract that's already been paid out, hoping it gets through the system, or there's a bunch of other ways this fraud occurs. One of the things that was interesting as we interviewed the stakeholders was that they have only so many investigators. So each month they may get, and I'm making the number up, like a million invoices coming in, but they could only investigate 100 each month; they can only investigate hundreds. What does that mean from a machine learning perspective? It means that when we're building a model, first of all, we want models that do really well at the tail. You can imagine in your head a normal distribution or something like that: just the tail, that's the only place that really matters. The rest of it we're not going to investigate. So you can think of it this way: the model does not have to be accurate at all, it could be a coin flip almost everywhere, as long as at the very tip of that distribution it does really well. That does inform what kinds of algorithms you might build. There are some algorithms that are great across all records, doing a better job on average, and there are some that are better at finding small, homogeneous subsets of interesting behavior, and that did drive us toward particular kinds of algorithms. Secondly, it drove what metric we used to decide which model or models to pick. In this case I would never use, and I almost never use, things like percent correct classification or average error with any of my clients, because what was interesting to them was maximizing the effectiveness of the models in that top hundred, period. So we picked the models that did the best job on that small subset, regardless of what the ROC area under the curve was, regardless of whatever classic metric, precision and recall or anything like that; we didn't care. And that was really successful, because it gave them a better list to hand to their workforce to investigate. So that drives the data, too. There were different data issues we had to consider, because most of the cases were not labeled. For most of the cases, they didn't know if it was fraud or not; they hadn't investigated them. That means that if we assume there's some low level of fraud among them, we don't want too many of the unlabeled cases in our modeling data, because the more there are, the more likely it is that something we're calling not fraud is actually fraudulent or suspicious, and the more those errors will leak into our model. So we wanted all of the fraud cases and a smaller subset of the non-fraud cases to build the model more effectively. That's a bias we introduced into the data to avoid a potential problem with the labeling of the data. I know I just blurted out a ton of stuff all at once, but I thought that was a helpful project to get at the myriad of issues we have to work through as machine learning engineers, or professionals, to solve the problem, because we want it to be effective for the business, of course.
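
A minimal sketch of the "only the top of the list matters" evaluation Dean describes, with invented stand-in scores and labels: instead of overall accuracy, score the model on how much fraud it surfaces in the 100 invoices the investigators can actually work.

```python
# Precision in the top-k highest-scored cases, where k is the number of
# investigations the workforce can handle (scores/labels are random stand-ins).
import numpy as np

rng = np.random.default_rng(0)
n_invoices = 1_000_000
scores = rng.random(n_invoices)              # model scores for a month of invoices
is_fraud = rng.random(n_invoices) < 0.001    # ground truth, ~0.1% base rate

capacity = 100                               # investigators can only work 100 cases
top = np.argsort(scores)[::-1][:capacity]    # highest-scored invoices
precision_at_k = is_fraud[top].mean()
print(f"fraud found in the top {capacity}: {precision_at_k:.2%}")
```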

Anthony Deighton - 00:29:11: And I think it speaks to this broader point, which is that things have changed a lot over time, so you do them very differently today than you might have done them in the past. You're thinking a lot about what data is being brought into the process and what algorithms you're selecting, and it's not as simple as just building a neural net and saying that must be the answer because we forgot what the question was, maybe. I'm going to go in a completely different direction for a second to wrap up a little bit. You've had a long and illustrious career in this space, and I think you've pointed out very well that the space today algorithmically looks very similar to what it looked like 20 years ago, but with some important differences in terms of how much compute is available and how we approach problems. What advice would you give to somebody who's just graduating from their undergraduate degree today? They saw ChatGPT, they think that's super interesting, I should be doing something with that. And let's assume, to make it a little easier, that they have some basic stats and math interest and/or skills; they're not just humanities majors. What career advice would you give them?

Dean Abbott - 00:30:13: It's a really good question, because the buzz is around things like ChatGPT and deep learning. But the vast majority, and when I say vast, I mean like 99% of the problems out there to be solved, are much simpler and much smaller, but very important. I've got a colleague, a friend named James Taylor, who has this great picture of making business decisions. There are the big strategic decisions, the ones that require complex models and things, and that's kind of the deep learning mindset, especially around image processing or language networks. But then there are all these simpler, smaller problems, and there are so many of them that, in aggregate, they'll probably generate more revenue or make your company more efficient than nailing the big strategic ones. So I would say to the graduating undergrad or grad student, number one, don't be afraid of working with a company that is not data-driven yet but wants to be data-driven, because there are so many problems to solve. To me, from a data perspective, almost all of them are interesting. There's a lot of work out there where you can make use of these algorithms in an effective way. If you're really just interested in the deep learning side, that's a more competitive and narrower niche of the space, but if you're interested in data more generally, there's plenty of work out there. Also, when you get into that company, one thing I recommend to people: your job when you come in is to make your boss, and your boss's boss, look good and smart by what you do. So learn the craft, learn how to build the models well, but also learn how to understand what they want. And what they want is not necessarily what they say, because decision makers in analytics, we're getting better, but at the project manager level, the director level, the VP level, a lot of these individuals still don't have the language of data down yet. They don't know what machine learning can do and what it can't do. So you have to be a good listener, to pick out the things they're trying to accomplish and translate that into an analytics approach. I strongly recommend, in all of these situations, lots of feedback loops: at every step, like weekly or bi-weekly at least, go to your boss and say, this is what I'm seeing in the data, does this make sense? Because again, the data-driven approach is great, these algorithms are fantastic, but they have no common sense, and the data bias issues are really hard to tease out until you see what the models actually do. So those are two things I definitely recommend for folks. Learn that well, become proficient, and learn how to apply the models in the context of the business. Get good at that before you venture out on your own. It'll take several years; even with our modern cycles, where people want to become an independent consultant in the first year, it takes a while to learn the things you need to learn. I've got a talk track of five things they didn't teach in grad school, and it's these kinds of things. It's not a negative toward them; they teach what they can. But there are things you learn about the art of this field that you just have to experience in order to understand.

Anthony Deighton - 00:33:40: And I think that's really good advice: get in, there are plenty of problems to be solved, go follow the problems that are interesting and motivating to you, dig in deeply, and learn how to solve those problems. Become good at the craft. And ultimately, like anything, it is a craft.

Dean Abbott - 00:33:56: And toward that end, remember, you're interviewing the company as well. Make sure the person you report to directly is somebody you respect and who will serve you well, because I've heard this from a lot of people: it's a good story, it's a good company, but the person they're reporting to is just terrible, and they have to leave because it's such a bad situation, it's too political or something. I know it's hard to tease out, but keep that in mind. The person you're reporting to matters, especially for one of those first jobs.

Anthony Deighton - 00:34:24: Of those first jobs, since I did ask somebody who's graduating undergrad as those first jobs, those people that you work with are ultimately some of your biggest advocates. Over could be many. Many.

Dean Abbott - 00:34:35: Yes, exactly.

Anthony Deighton - 00:34:37: Well, Dean, thank you so much for the time and the insights. It's been a pleasure to have you on DataMasters, and we'll look forward to chatting again soon.

Dean Abbott - 00:34:45: Thank you very much, Anthony, really enjoyed it.

Outro - 00:34:48: DataMasters is brought to you by Tamr, the leader in data products. Visit tamr.com to learn how Tamr helps data teams quickly improve the quality and accuracy of their customer and company data. Be sure to click subscribe so you don't miss any future episodes. On behalf of the team here at Tamr, thanks for listening.

Subscribe to the Data Masters podcast series

Apple Podcasts
Google Podcasts
Spotify
Amazon