In Conversation with John Parkinson: The Hype and Hope for Artificial Intelligence, Part 2

March 2025

AI is moving faster than most can track, but a recent conversation with John Parkinson cuts through the chaos

Greg Selker, Managing Director at Stanton Chase Baltimore and North America’s Regional Sector Leader of Technology, spoke with John Parkinson about artificial intelligence. Parkinson, former CTO at Capgemini and TransUnion and current Partner and Managing Director at Parkwood Advisors, shared his observations on AI implementation and its effects on organizations. Their conversation covered AI development, edge computing, open-source versus proprietary models, and practical business applications during this period of technological change.

This conversation has been edited for clarity and conciseness.

Dealing with the Growing Complexity of AI

Since we last spoke, the overall world of AI and LLMs has gotten more complex and seems to be more complex every day. How does one make sense of all this and navigate through this increasing complexity?

It’s difficult. We maintain research threads to try and keep track of it all. And we’ve almost given up trying to do this from a primary perspective. We subscribe to probably two dozen content aggregation services that do a pretty good primary survey of what’s going on in the field and then we cherry-pick amongst these. One of the nice things about the Edge AI that’s emerged over the last year is that I can run a lot of this locally. I have a couple of workstations with two GPUs each that let me run a fairly big model. We can run up to 70 billion parameter models in-house. So, I can collect all of this stuff, run it through the models and ask for the common themes. 

I think many people are unaware of how astounding that is and how quickly we’ve gotten to this point. What would this have looked like three to five plus years ago?

It would’ve looked like a data center running on tens of millions of dollars of compute. Just to run a query would’ve been equivalent to what it takes to run a training session today. And it continues to get edgier in the sense that there are some things now that I can run on my phone with half a terabyte. That’s not very much in model data terms. So, I’m dependent on an internet connection, which means I can’t really control the data that’s represented by the prompts that I put in. But in my home data center, I’ve got 300 terabytes of network attached storage with all of my accumulated source data. I can run augmented generation queries against that, so I don’t have to remember everything my search agents find. I just put it in and run a graph theory classifier that builds a knowledge graph of all the materials collected. That sits behind the retrieval augmented generation layer that the models use when I ask them to check locally whether it’s something we’ve already seen, a spin on something we’ve seen that adds something new, or something really new. This gives a continuous learning capability. 
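
As a rough illustration of the kind of local “have we seen this before?” check Parkinson describes, here is a minimal Python sketch. It assumes a placeholder embed() function and simple similarity thresholds rather than his actual knowledge-graph and RAG setup; the names and numbers are illustrative only.

# Hypothetical sketch: classify an incoming document as "already seen",
# "a spin on something seen", or "new" by comparing its embedding against
# a local store. embed() is a stand-in for whatever local embedding model
# is actually used; here it is just a character-frequency vector.
from math import sqrt

def embed(text: str) -> list[float]:
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def classify(new_doc: str, store: dict[str, list[float]]) -> str:
    query = embed(new_doc)
    best = max((cosine(query, v) for v in store.values()), default=0.0)
    if best > 0.95:
        return "already seen"
    if best > 0.75:
        return "spin on something seen"
    return "new"

store = {"gpu pricing note": embed("H100 GPU prices and availability in 2024")}
print(classify("Updated H100 GPU price and availability survey", store))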

The models never forget anything unless you tell them to. This is a new aspect that’s developed over the last eight months. And it is one of the drivers behind models performing better over time in that you can have them remember how good their answers were. With a little human feedback, they literally improve faster than expected. There’s a huge body of ongoing research that’s looking at the why, because it’s not entirely clear that we have a good theoretical understanding of it. 
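
A toy sketch of the feedback idea he describes: keep a local record of how good past answers were and prefer the best-rated one when a question recurs. This is purely illustrative and not tied to any particular model or framework.

# Toy feedback memory: remember how good past answers were and reuse the
# best-rated one when the same question comes back. Illustration only.
from collections import defaultdict

class FeedbackMemory:
    def __init__(self):
        self.history = defaultdict(list)  # question -> [(answer, rating)]

    def record(self, question: str, answer: str, rating: float) -> None:
        self.history[question].append((answer, rating))

    def best_answer(self, question: str):
        past = self.history.get(question)
        if not past:
            return None  # nothing remembered yet; ask the model instead
        return max(past, key=lambda pair: pair[1])[0]

memory = FeedbackMemory()
memory.record("what is edge AI?", "Running models on local hardware.", 0.9)
memory.record("what is edge AI?", "A cloud service.", 0.2)
print(memory.best_answer("what is edge AI?"))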

The Emergence of Open Source vs. Proprietary Models

You’re using the plural, “models.” Your answer to my question about navigating today’s complexity is that you’ve constructed a model to aggregate and differentiate between the models that are out there, to understand which ones make more sense for which applications.

That’s a reasonable inference. We basically decided to split the world into open source and not open source, or proprietary. There’s a halfway house where some parts of the model are open source, but the weights or the training data are not. We have some clients who are only interested in open-source models. They don’t want their data leaving the premises. At scale, there is literally no way to avoid that. We hope that the contractual arrangements with service providers are such that if you send your data or your queries to the cloud, your privacy is respected. But there are no guarantees that’s the case. And we have anecdotal evidence that “private” corporate data has shown up in training sets. 

Proprietary LLMs are the problem, because you can’t run those in-house; you must run them via an API. And when you use the API, you’re sending things to the model. Even if you point at in-house data for the RAG (Retrieval Augmented Generation) applications, it’s difficult to ensure that data doesn’t leak out somehow. 

We’re asking models to tell us your best guess, and then to go look at our data and tell us if there’s a better answer there. But if the compute is running in the cloud, you’ve opened a pipe into your corporate data that’s behind the firewall. We have clients who are uncomfortable with that. With open source, everything is in-house and run locally. There are performance issues, but by and large, you’re not trying to serve a million users when running the models locally. 

What are the limitations of that model?

The limit is that you are paying for all the compute required to run the model for fine-tuning and inference. For example, I have two high-end consumer GPUs, approximately $10,000 of hardware. That’s about a third of the cost of an H100. If I’m going to run the model in-house, I need to use it a lot to justify the cost of the compute. This has become a much bigger question than was commonly posed a year ago. Not, “is this stuff good and does it deliver the value?” It’s, “can I afford it?” 
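
One way to frame the “can I afford it?” question is simple break-even arithmetic: amortized cost per hour of owned hardware versus renting comparable capacity. Every figure below other than the $10,000 hardware number is an assumed placeholder, not something from the conversation.

# Hypothetical break-even sketch: amortized cost per hour of owned GPUs
# versus renting equivalent capacity in the cloud. All figures except the
# hardware cost are illustrative assumptions, not actual prices.
hardware_cost = 10_000           # two high-end consumer GPUs, as in the example
useful_life_hours = 3 * 365 * 8  # assume ~3 years at 8 busy hours per day
power_and_overhead_per_hour = 0.60
cloud_rate_per_hour = 6.00       # assumed rental rate for comparable compute

owned_per_hour = hardware_cost / useful_life_hours + power_and_overhead_per_hour
print(f"owned: ${owned_per_hour:.2f}/hr vs cloud: ${cloud_rate_per_hour:.2f}/hr")
print("local pays off only if utilization stays high:", owned_per_hour < cloud_rate_per_hour)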

The DeepSeek Breakthrough

This is a perfect segue to talk about DeepSeek.

Yes. The rational but not proven conclusion is that DeepSeek had a lot more GPUs than they claimed and that they did some very clever low-level programming below the CUDA level, which is where most people stop because it’s hard to figure out how to optimize what goes on in a GPU. But they did that with low-level primitives that NVIDIA provides, getting 5X-10X the throughput that the GPU would normally be capable of delivering. 

But the interesting thing is they used enhanced training strategies, so their resulting model is not just one model. It’s what the state of the art calls a “mixture of experts” where there’s one model that tries to guess what you’re trying to do and then it farms out the work to smaller models which do specific things well. The best guess I’ve seen is they’re probably 10X more expensive than they claimed. But that’s still 100X cheaper than the leading lab foundation models cost. 
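
A highly simplified sketch of the mixture-of-experts idea: a router guesses what kind of task a prompt is and hands it to a smaller specialist. In a real model the routing is learned and happens per token inside the network; this only shows the shape of the concept.

# Simplified mixture-of-experts routing: a router picks a specialist per
# prompt. Real MoE gating is learned and operates per token inside the
# transformer; this only illustrates the concept.
def code_expert(prompt: str) -> str:
    return f"[code expert] handling: {prompt}"

def math_expert(prompt: str) -> str:
    return f"[math expert] handling: {prompt}"

def general_expert(prompt: str) -> str:
    return f"[general expert] handling: {prompt}"

def route(prompt: str):
    text = prompt.lower()
    if any(word in text for word in ("def ", "compile", "bug")):
        return code_expert
    if any(word in text for word in ("integral", "prove", "equation")):
        return math_expert
    return general_expert

for p in ("fix this bug in my parser", "prove the equation converges", "summarize this memo"):
    print(route(p)(p))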

So put all that aside for a minute. The other thing they figured out, possibly by skirting the contractual language of the OpenAI API agreement you sign when you get an API key, is that you can use the early results of training your model as prompts to a big model to tell you how to improve your model. This reinforcement learning with machine feedback, not human feedback, looks to be a very interesting strategy. Then they built this distillation process where they produce smaller models, which help their big model perform better. I have a laptop which runs DeepSeek R1 distilled down to 1.4 billion parameters. Our benchmarks say it’s about as good as models up to the 70 billion parameter size. The big model, 671 billion parameters, takes a minimum of eight H100 GPUs to run. That’s expensive; $300,000 of hardware. I can get almost as good results, and as good on specific domain queries, on a $1,200 laptop. 
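
The distillation idea can be reduced to a toy objective: nudge a small “student” distribution toward a larger “teacher’s” output distribution by minimizing the gap between them. The numbers below are invented for illustration; real distillation operates on model logits over a large vocabulary.

# Toy distillation: move a student distribution toward a teacher
# distribution by minimizing KL divergence with plain gradient steps.
import math

teacher = [0.7, 0.2, 0.1]          # teacher's output probabilities (invented)
student_logits = [0.0, 0.0, 0.0]   # student starts uniform

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

learning_rate = 0.5
for step in range(200):
    student = softmax(student_logits)
    # Gradient of KL(teacher || student) w.r.t. the student logits is (student - teacher).
    for i in range(3):
        student_logits[i] -= learning_rate * (student[i] - teacher[i])

print([round(p, 3) for p in softmax(student_logits)])  # approaches the teacher distribution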

That’s significant.

Yes. It is. And from a theoretician’s point of view, it calls into question the scaling laws as the business driver that’s pushed the U.S. foundation labs to spend tens to hundreds of billions of dollars on compute capacity. We might still need it, because my laptop is supporting one user. I couldn’t get the same performance if I had a million people use it. We still don’t quite have the right stable model of how the infrastructure economics will work with DeepSeek. But it seems the road we thought led to the future might not be the only road. 

Beyond Raw Computing Power

So, it’s not just about increasing compute. There are fundamental changes in how we think about interacting with and leveraging compute, so that we can get more bang for the buck than by simply adding more power.

Yes, but nothing comes free in technology. For every thousand programmers who can write Python, there are probably 10-15 who understand how CUDA works and can use it effectively, not just call a library and hope for the best. Of those 10 or 15, there are two who understand PTX, the low-level programming environment that DeepSeek used to get the performance out of an otherwise subpar GPU. So, the population of people who can make things go really fast is very small. 

The proprietary foundation labs in the U.S. don’t talk about how they do this. Their papers hint at a few things, but the impression given is they’d rather throw dollars at hardware than do very complex and difficult low-level programming. We are pretty sure the Chinese would be willing to take the path of training a lot more people to do it. That would be culturally very difficult here in the US. 

Why?

We don’t turn out enough people in the class of a thousand, let alone in the class of two. One of the things I tell people is that you should never make success dependent on skills you don’t have or can’t get. The road that DeepSeek took depends on skills that are very difficult and that most people can’t get. 

So, what’s the best way to address this?

Use what works. We know how to do a whole bunch of things that don’t require specific skills because we’ve got tremendous compute resources. We are not banning people from buying NVIDIA GPUs in the US, so we’re going to have to traverse the scaling-laws route for a while. DeepSeek open-sourced almost everything, but we don’t have a complete view of the training data. We do know what the weights that go into the model are, and you can tweak them. You can de-bias the “be good about China” stuff relatively easily. You can go in and look at where information is stored in the parametric view of the model, and tweak the way information gets stored so the answers come back more as you would expect from a Western model. 

We can learn a lot from the DeepSeek experience that will improve the efficiency of both pre-training and post-training strategies. But the truth is that none of that impacts how inference works. The interesting thing about DeepSeek, which no foundation model did previously, is it tells you how it gets to your answer. So, you can look at it and see what chain-of-thought model it’s using and how it moves between the mixture-of-experts environments. You can basically replicate the entire thing. I have the source code for the model running on one of my computers. And you can think about how to alter what’s essentially a processing pipeline for specific situations to get better answers cheaper. At the end of the day, Jevons’ Paradox says that if a lot of people come and use it, it will get cheaper to use because you’ll get economies of scale. The first step is it’s got to be easy to use. 
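
A small sketch of what inspecting that visible reasoning can look like in practice, assuming the common convention in which distilled R1-style models wrap their chain of thought in <think> tags; the raw output string here is invented.

# Sketch: separate the visible reasoning trace from the final answer in a
# reasoning model's raw output, assuming the trace is wrapped in <think>
# tags (a common convention; an assumption for this illustration).
import re

def split_reasoning(raw_output: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", raw_output, flags=re.DOTALL)
    if not match:
        return "", raw_output.strip()
    reasoning = match.group(1).strip()
    answer = raw_output[match.end():].strip()
    return reasoning, answer

raw = "<think>The user wants a sum. 2 + 2 = 4.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print("reasoning:", reasoning)
print("answer:", answer)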

Traditional AI vs. Generative AI: Business Applications

Given this landscape, how does a business figure out when it is best to go down the GenAI track versus traditional AI?

The advice we give people is that deterministic analytic processes will always be cheaper than model-based processes. You can predict what they’re going to cost. You always get the same answer from the same inputs. You don’t have to invest in building guardrails. If your use case is about finding things that aren’t obvious, model-based extractive AI works fine, and it’s not terribly expensive. But you must understand the output to get value from it. It doesn’t provide declarative answers. It simply says, “We saw this pattern. Does it matter?” The world is full of patterns that don’t matter as well as patterns that do matter, if you can find them and understand contextually what they’re telling you. So, as a tool for aiding analysts looking for patterns and understanding them, non-GenAI is powerful and should be used more. The real problem is it requires human thought to understand the results. 
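
To make the contrast concrete, here is a toy comparison of a deterministic check (same input, same answer, predictable cost) with a simple statistical pattern finder that only flags something unusual and leaves the interpretation to a human. Both the data and the thresholds are invented for illustration.

# Deterministic rule vs. statistical pattern-finding on the same data.
# The rule always gives the same answer for the same input; the pattern
# finder only says "this looks unusual" and leaves judgment to a human.
from statistics import mean, stdev

invoices = [102, 98, 101, 97, 103, 250, 99, 100]

# Deterministic check: flag anything over a fixed policy limit.
over_limit = [x for x in invoices if x > 200]

# Extractive/statistical check: flag values more than 2 standard deviations
# from the mean. It found a pattern; whether it matters is a human call.
mu, sigma = mean(invoices), stdev(invoices)
outliers = [x for x in invoices if abs(x - mu) > 2 * sigma]

print("policy violations:", over_limit)
print("statistical outliers (needs human judgment):", outliers)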

The magic in GenAI is that it can create an output that was not perfectly represented in its training data. It can intuit things that you would not have seen. Some are hallucinations, but some are combinations of things that, given enough time, a human could potentially have gotten to, but you will get them much quicker. They’re probably going to be verifiable. And once you’ve told the model that it was right, it’s never going to forget that, and it will get better at doing it over time. 

Hallucinations and Their Consequences

Are the models getting better at differentiating hallucinated from non-hallucinated output?

Yes, in the sense that the error function, how often they get something wrong, given the vastly expanded set of questions they’re being asked, is declining. But it also means the things they get wrong cost more. 

Can you give an example?

If they make something up that could be a discovery, you’re not going to know, because it’s not been discovered before. You are not going to know whether it’s real or not, or if the model got it right, until you’ve made an investment to verify. If it comes up with something verifiably false, you’ll know it’s a hallucination. The famous example being that Mexico did not win the War of 1812. You can weed those out and, using reinforcement learning, eventually eliminate the probability that it will make that mistake again. But if it comes up with something like, “Build a fusion generator like this and it’ll work,” and you build it and it blows up, that’s a problem. Or “design a drug that does this,” and you’ve still got to do the in-silico testing of pharmaceuticals. It’s not good enough yet to eliminate in vitro or in vivo testing. So, you could spend a lot of money going down a rabbit hole if it’s a realistic but wrong hallucination. 

AI and Creative Work

This brings me to trying to understand the true applications or implications of GenAI for creative or generative work.

We still don’t see much evidence of true creativity in that regard. Valuable output absolutely exists, but it’s not true human creativity. It’s largely because you’re mining a huge set of data to put together things that a human could eventually get to. They would never be able to absorb and remember all the data, and they couldn’t organize it fast enough to get the output in the way that the model does. The areas where this is getting interesting are text-to-speech and text-to-video, which produced only trivial results for the longest time. Even the best models would only produce things that were cartoonish for tens of seconds at the most. But I’m talking to a company that’s got a photorealistic text-to-video generator for clips of up to two minutes, which means you can build TV ads in minutes at very low cost. Their compute cost is less than $10 a minute, a game changer. Now, you could argue that there’s a whole bunch of human creativity that’s wasted building idiotic TV ads, but there are a lot of idiotic TV ads. 

Getting long-form video out of that is a much harder challenge, but short-form photorealistic video is the first step. It might only be a matter of enough compute. The folks who built this are a combination of ex-Hollywood movie people and technology people. They think they can impose stylistic consistency over time on their videos. There’s some evidence that they can do this. So, you could go in and say, “Make me an ad that Steven Spielberg would’ve made,” and it would be recognizably Spielberg-like in the result. Which throws up a whole bunch of additional interesting copyright issues, because there’s currently no way to copyright a style. 

One could infer that the capability of developing a one- to two-minute video segment for a single purpose, e.g., engendering interest in a product, could over time, with enough learning going into the model and enough computing power, begin to generate longer and more complex stories.

A 30-second, or even a two-minute, ad has maybe 1,000 to 2,000 words. The context window, the amount of text you give the generator, is quite small for that, typically under 4K. You look at a 30-minute TV episode and you are pushing into the 25,000-to-40,000-word range. That’s a bigger context window, but it’s not unimaginable that it could be done. A full movie script is closer to 100,000 to 150,000 words, a lot of which are not spoken. They’re the nonverbal parts of the action and directions for the camera crew. That’s a huge context window. We can’t do that today, but we probably can within five years. 
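
The rough arithmetic behind those context-window sizes, using the common rule of thumb of about 1.3 tokens per English word, which is an assumption for illustration rather than a figure from the conversation:

# Rough context-window arithmetic: words -> approximate tokens, using the
# common ~1.3 tokens-per-word rule of thumb (an assumption for illustration).
TOKENS_PER_WORD = 1.3

scripts = {
    "30-second ad": 1_000,
    "two-minute ad": 2_000,
    "30-minute TV episode": 40_000,
    "feature film script": 150_000,
}

for name, words in scripts.items():
    tokens = int(words * TOKENS_PER_WORD)
    print(f"{name}: ~{words:,} words is roughly {tokens:,} tokens of context")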

Years ago, I had dinner with James Cameron not long after Avatar was released. He didn’t talk about how he thought about the movie so much as how he would see the movie so that it would be consistent with how he thought about it, and the technology they had to invent to do that. Movie directors have a mental picture in their heads of what the result should look like, and where GenAI could impact that is what’s interesting. As we previously discussed, much of this was unimagined and unimaginable. What we can imagine is now being driven by at least the examples of what we see as possible, even if they’re not yet always scalable to real use cases. 

This is fascinating because obviously there’s been a trend for companies to move their own proprietary data centers to the cloud. Will the trend towards leveraging open source and the power of the edge lead back to companies managing their own data centers rather than pushing everything to the cloud?

The jury is out. There is some work migrating back from the cloud because of data privacy concerns. But the issue really is that AI as we think of it, particularly the closed models, is SaaS. You are sending your data to somebody. You are not using a platform with your software and data on it to run the application. I believe we will increasingly see people building tuned versions of open-source models, and we think this is why it’s happening as fast as it is: tuned versions of open-source models that are worked out locally and implemented in the cloud as platform as a service, not SaaS. 

That makes sense. So back to the question of GenAI versus data-analytics AI that is driven more from an ML and RPA perspective?

More routine things are going to get software augmentation of some form. More processes are going to get data-driven, which is not a panacea in the sense that history is not always a good predictor of the future. But we’ve got to see this happen a lot more where there’s an evidence-driven performance improvement opportunity. Part of the problem is that we don’t really have a consistent open reference architecture for how to do this today. There’s a danger, like the early days of ERP, that RPA will lock you into a vendor who may not exist five years from now. So, we’ll probably see a consolidation around platform strategies. 

The other interesting dimension is that there’s a lot of focus on vertical-industry or market-vertical applications, because we have to tune to the language of the vertical, but there are also a lot of horizontal opportunities where every business has the same general solvable problems with a lightly tunable horizontal solution. I think we’re going to see both of those explode over the next two to five years, or even six months. I’ve given up on predicting timelines. 

Government Influence on AI Development

What impact do you believe the Trump administration’s Stargate project will have?

None. They don’t have any money. 

So, you don’t really see government-funded or government-backed AI initiatives shaping the global AI landscape.

Well, in the sense of having money, the government can print as much as they want, and they probably will. The arithmetic is interesting. It’s conceivable that you can conjure up $100 billion a year of private equity and subsidy-based investments in something like Stargate. But you can’t build data centers that fast. Even Elon couldn’t do it. It took everything that xAI had available to build one data center in eight months. And it’s not a scalable skill. You must start with something that’s already there, or permitting costs you two years. 

The government could reduce a lot of those factors. But there isn’t sufficient construction capacity in the United States to build out as much as Stargate says it’s going to, even if you had the money. We don’t have the raw materials, the land, or the electrical network connectivity. You could put natural gas generation plants in every data center, and we probably will, but you must build pipelines to get the natural gas in. That limits the number of places you can put it. The more concentrated you get, the more power you need to consume. And all the places where you can do that don’t have a workforce. So, when you start to deconstruct what these fancy headlines say, it’s not going to happen. 

Is it a lot of bluff and blunder?

It’s not like nothing will happen. Some things will happen, but Stargate won’t happen. We won’t get half a trillion dollars of investment in data center capacity for AI in the next five years. It’s just not going to happen. 

Risks and Strategic Considerations

Let’s spend the last few minutes looking at these different trends that we’ve discussed in the context of the risks posed.

Everybody you talk to in this space wants to go fast until they can’t. Then you start to see what the gating factors are for changing things in a business, a market, or an economy that prevents moving as fast as you’d like. There’s a lot of inertia built in, a lot of familiarity with the way things are. Homeostasis is very difficult to significantly disrupt without effectively breaking everything. And while we can afford to break little pieces here and there, trying to break big pieces is very risky for many social and economic reasons. A big risk I see is we will try and do more than we are able to absorb from the perspective of managing the rate of change. 

A lot of what we do now could be 5% but not 100% better. If we focused on the areas where we can surely make a big difference, they would be much narrower, much more likely to be successful, and therefore much less risky. But there is a danger. I remind people that back in 2000 when the dot-com crash happened, the global economy burned roughly $800 billion in 18 months on dot-coms. The telecom industry burned $3 trillion in value, which nobody was paying any attention to. That could happen again. Back then it was a $65-to-$70 trillion global economy. Now it’s a $120 trillion global economy. Arguably our ability to do damage has been magnified by the growth of the global economy. We really ought to pay more attention to the risks associated with that, because the things we break now are much more intertwined and complexly important than they were 25 years ago. 

I agree. A definite theme of mine is looking at the interconnectedness of systems, of infrastructures, of people, and the rapidity with which communication is occurring. These interconnections are obvious and have massive implications when disruptive events happen or when things fail. 

Right. And those kinds of changes have consequences that are not always well understood or thought out in advance, and not always even visible in advance. 

So, does this line of thought mean that we would be better suited to focus on where we can drive those 5%-10% incremental differences, or leave those aside and try to identify the areas where we can do massive evolution and change, knowing those improvements carry greater risk?

It is an interesting question, Greg. If you do the classic two-by-two evaluation, you probably want some high-risk, high-reward endeavors going on. And it’s difficult to pick the ones where the trade-off is understood well enough for you to be confident in their value. You want to avoid the high-risk, low-reward part of the world, and you probably won’t invest in the low-risk, low-reward part, although there’s a lot of that you probably could do. It’s the low-risk, high-reward part that we would really like to find. And those I don’t see terribly clearly yet. I’m sure there are some. 

Are we trying to identify what those lower risk, high reward opportunities are?

We are not, because the hype cycle drives you away from those kinds of areas. They’re almost always not very interesting. 

What do you see might need to happen to shift organizations, institutions, people, to begin focusing on those lower risk, high reward opportunities?

I think that if we stepped back a little from automation and looked more at augmentation assistance in making humans more productive, we’d find more of those opportunities. 

Instead of trying to replace, improve.

The focus on GenAI is philosophically interesting. But it could be a chimera. Maybe we’ll never get a machine that’s smarter. Faster, yes. More reliable, probably. Smarter, I doubt it. Because as I said in our original conversation, we don’t have a good model of what makes humans smart. We might not even notice if we made a machine intelligent. 

Although one of your observations was that the machines are learning even though we don’t know why, and we don’t yet completely understand what is happening to make this occur. 

So, what we’re apparently doing is constructing algorithms that are better at minimizing error functions. 
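
“Better at minimizing error functions” can be shown at its simplest with a toy example: an algorithm that repeatedly reduces a measured error, which is all the “learning” here amounts to. The data and learning rate are invented for illustration.

# Toy error minimization: gradient descent fitting a line to points.
# "Learning" here is nothing more than repeatedly reducing the error function.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)]  # roughly y = 2x

w = 0.0              # single parameter: slope of y = w * x
learning_rate = 0.01

for step in range(1000):
    # Mean squared error and its gradient with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
    w -= learning_rate * grad

error = sum((w * x - y) ** 2 for x, y in points) / len(points)
print(f"learned slope w = {w:.3f}, remaining error = {error:.4f}")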

That makes sense. So, is this giving the illusion of learning?

It certainly looks like learning to most people who don’t learn very well anyway. 

About the Author

Greg Selker is a Managing Director at Stanton Chase and the Regional Sector Leader of Technology for North America. He has been conducting retained executive searches for 35+ years in technology, completing numerous searches for CEOs and their direct reports at the CXO level, with a focus on fast-growth companies, often backed by leading mid-market private equity firms such as Great Hill Partners and JMI Equity. He has also conducted leadership development sessions with executives from companies such as BMC Software, Katzenbach Partners, NetSuite, Pfizer, SolarWinds, Symantec, TRW, and VeriSign.
