The Path to Artificial Capable Intelligence

An examination of the stages of AI progress, bottlenecks to overcome, and the realistic path toward human-level AI competence across tasks.

Introduction

AGI is the frontier everyone talks about, but as a milestone it will never be cleanly realised. Over time, AI will simply become good enough that we decide it counts as AGI, yet we will never be able to test enough different tasks to "validate" the claim. It is a fairly useless concept and a goalpost not worth shooting for. Economic growth figures and real-world impact are what to index on. Suleyman's term ACI seems more appropriate, and it is the term we will substitute for AGI here: a general AI system that generalises across enough tasks to be considered human-level competent.

This thesis covers the stages of progress, what the models need to do to get to ACI, and the various bottlenecks that need to be overcome. Obviously, the path is uncertain. Anyone with tonnes of conviction is masking that uncertainty, because there are several questions no one yet knows the answers to:

Will scaling laws hold up? How much longer does RL scale?
Will the transformer architecture and the current paradigm allow for memory and long-horizon learning?
Will chip manufacturing ramp at a fast enough rate?
Will investment continue to pour in?
What are the energy constraints?
How will deployment across all of society play out, and how do governments plan on managing it?
What are diffusion models and why do they matter?

This essay will examine all of these questions without committing to a definitive claim that we will have ACI by X date.

Stages of Progress

Chatbot → Reasoning Models → Agents → AI R&D Intern → Full Stack Progress

Next Horizon

We were in the chatbot phase for roughly three years: from ChatGPT's launch in late 2022 until 2025. A chatbot is fun for consumers to use and play with, but it does not hold much economic value. Reasoning is the stage we are in now; o3 and DeepSeek-R1 are the dominant reasoning models. Apple put out a paper disputing the advances here, but empirically these models reason quite well, and you can read their lengthy chains of thought in the ChatGPT interface. There is still progress to be made: they need to improve in messy environments and they do not handle updated context well. There is a little way to go here.

What is needed for this to happen

After complex reasoning, the next frontier is agentic workflows: drop-in remote workers that complete long-horizon tasks. Getting there requires overcoming several challenges, including:

Continual Learning: the ability to iterate based on past feedback. Human employees effectively carry huge context windows on their specific tasks; models currently do not.

Agent Scaffolding: we do not have an effective way of giving agents the necessary tools or the steps needed to complete and reason through a task. More importantly, we do not have good computer use. Unlike text, there is not an internet's worth of computer-interaction data to train on, so agents are not yet good at navigating the web, making autonomous choices, or making progress on long-term tasks.

Memory: ChatGPT's memory is horrendous. In many ways it is actually detrimental, because it only captures a glimpse of your overall persona. It remembers roughly 20 facts about you, which is not enough to contextualise you properly, yet it asserts that thin slice onto every response. I tried uploading details of my past life and it did not come away with any profound insights.

Onboarding Problem: it is hard to give an AI enough context about your life for it to make good decisions. Being human is highly contextual, built on millions of data points accumulated across our lives about how to spend time and what decisions to make. Without that context, the AI struggles to make effective decisions on your behalf.

Long-Term Horizon Thinking: for agents to be useful, they need to sustain a task for as long as human work tasks take. For a managerial role, that could mean assigning work to different people and giving feedback on it. For an execution-level employee, it could mean hours-long or even month-long tasks: pushing code, writing reports, or building investment theses for corporate finance jobs.

Physical-World Infra: some people, like Yann LeCun, think that without physical data and an implicit understanding of physics, we will not have successful agentic workflows.

These will take quite a long time, I think: somewhere around 4-5 years to fully accomplish, and perhaps longer to be fully deployed. Following agents, we should get automated AI R&D. Some, like Leopold Aschenbrenner, think this will catalyse superintelligence because it will unlock a plethora of algorithmic efficiencies. Agents seem to be the point where we get ACI, so I will not discuss the later stages in as much detail.

Bottlenecks along the way

This section walks through the different bottlenecks we could face, and the concerns associated with each of them.

Current Capabilities

Many in the community think that only a few changes to current models are needed to yield significant outcomes. Aschenbrenner is the main proponent: he believes that simply scaling compute to a reasonable threshold and unhobbling the models with memory and longer task horizons will have significant impacts on the economy.

Honestly, after trying to use the glorified "MCP", among other things, there is still a while to go, and I am not convinced of the above. Most of the gains still to come toward ACI are from agent scaffolding, including memory, rather than from getting the models to solve ever harder problems. That helps and is needed for scientific breakthroughs, but it is not needed for ACI.

Inherent troubles with the architecture — Transformer & Diffusion Models

There is a world where deep learning, and in particular the transformer architecture, is not a suitable path to ACI. Yann LeCun thinks this. The argument makes theoretical sense: transformers might not allow memory to be built in, and may not allow for native scaffolding. However, transformers do seem able to handle increasingly long tasks; the length of task they can complete is doubling roughly every seven months. Can that continue?
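To make that doubling trend concrete, here is a minimal sketch of the extrapolation. The seven-month doubling time is the trend cited above; the one-hour starting horizon is purely an illustrative assumption, and nothing guarantees the trend continues.

    # Extrapolate the "task length doubles every ~7 months" trend.
    # Assumptions for illustration: a 1-hour task horizon today and a constant
    # 7-month doubling time. Neither is guaranteed to hold.
    DOUBLING_MONTHS = 7
    START_HORIZON_HOURS = 1.0

    def horizon_after(months: float) -> float:
        """Projected task horizon in hours after `months`, if the trend holds."""
        return START_HORIZON_HOURS * 2 ** (months / DOUBLING_MONTHS)

    for years in (1, 2, 3, 4, 5):
        hours = horizon_after(12 * years)
        print(f"after {years} year(s): ~{hours:,.0f} hours (~{hours / 40:.1f} work weeks)")

If the trend held for five years, task horizons would grow by roughly 380x, which is the kind of jump the agent stage requires.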

I also need to learn more about diffusion language models. They are a different sort of architecture that is not autoregressive next-word prediction: rather than generating one token at a time, they draft a whole passage and refine it as a whole. Could this be the key to unlocking the next level?

My sense is that most algorithmic efficiencies come from inside the labs, and unless they come from open-source efforts like DeepSeek, where the paper is released with plenty of detail about why the method worked, it is hard to understand what is actually happening at the frontier.

Scaling Laws Stop

So far, over 80% of gains have come from simply scaling compute: buying more GPUs, building bigger clusters, and increasing total energy throughput. There is no indication that this will stop yet, though many predict it only goes so far and that genuine algorithmic innovation is needed to get to the next level. Still, training GPT-2 originally cost around $40k and would now cost around $600, so there has clearly been a lot of algorithmic progress. OpenAI just doubled the amount of o3 usage that Plus members have access to. Inference compute cost, and more importantly training compute cost, continues to fall substantially.
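Taking those GPT-2 figures at face value (the $40k and $600 costs come from above; the roughly six-year window between the original run and today is my assumption), the implied rate of decline works out to about a halving of cost every year:

    import math

    # Implied annual cost decline if training GPT-2 fell from ~$40,000 to ~$600.
    # Dollar figures are from the text; the ~6-year window is an assumption.
    cost_then, cost_now, years = 40_000, 600, 6

    total_drop = cost_then / cost_now                # ~67x cheaper overall
    annual_factor = total_drop ** (1 / years)        # per-year improvement factor
    halving_months = 12 * years * math.log(2) / math.log(total_drop)

    print(f"total drop: ~{total_drop:.0f}x")
    print(f"annual improvement: ~{annual_factor:.1f}x per year")
    print(f"cost halves roughly every {halving_months:.0f} months")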

We cannot meet scaling laws because of energy demand

Alternatively, scaling laws could hold, but we simply cannot build the energy and data-centre capacity required to power the GPUs that scaling demands. Epoch AI has published a lot of interesting data on this. Essentially, there appears to be sufficient energy to continue scaling at the current rate, though labs may have to shift from single-site training runs to training distributed across multiple data centres. While there are latency trade-offs, this seems to be the way forward. There are some numbers here I don't remember off the top of my head.
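As a rough sense of scale, here is a back-of-envelope power estimate for a single large training cluster. Every input is an assumption of mine, not a figure from Epoch or from this essay:

    # Illustrative power draw for a 100,000-GPU training cluster.
    # All inputs are assumptions: ~700W per accelerator and a 1.5x overhead
    # multiplier for cooling, networking and host machines.
    num_gpus  = 100_000
    gpu_watts = 700
    overhead  = 1.5

    total_mw = num_gpus * gpu_watts * overhead / 1e6
    print(f"~{total_mw:.0f} MW of sustained power")

Roughly a hundred megawatts of sustained draw is in the range of a small power plant's output, which is part of why splitting training across multiple sites starts to look attractive.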

TSMC fails to increase Chip Production

TSMC controls the manufacturing of virtually all advanced AI chips. This is unlikely to change: Dylan Patel estimates it would cost around $1TRN and take a decade for the US to build the capability in-house, and even basic fabs cost $20-30BN to build. Taiwan has advantages the US could not replicate: relatively low-paid, highly skilled engineers drawn from dominant universities, and 80-hour work weeks in the fabrication facilities. TSMC has been ramping production gradually, but it will need to ramp seriously, on the order of 50x. If it cannot, that becomes a serious bottleneck to GPU production, scaling laws, and so on.

Running out of data

Internet data will inevitably run out. The stock is roughly 5.5TRN tokens, and pre-training data use is growing about 2.9x year over year, so it is projected to be exhausted in the late 2020s or early 2030s. So far, models have been trained with roughly 20 tokens per parameter. Without more data you cannot maintain that ratio, which forces you to use fewer parameters, and therefore fewer GPUs, because past a certain point under-trained parameters dwindle in effectiveness.
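Using the numbers above (5.5TRN tokens and ~20 tokens per parameter, both from this essay, plus the standard ~6·N·D approximation for training FLOPs), the data-constrained model size works out roughly as follows:

    # Rough data-constrained sizing using the figures quoted above.
    # 5.5T tokens and 20 tokens/parameter come from the text; 6*N*D is the
    # standard approximation for training compute. Everything is approximate.
    TOKENS_AVAILABLE = 5.5e12
    TOKENS_PER_PARAM = 20

    max_params  = TOKENS_AVAILABLE / TOKENS_PER_PARAM   # largest well-trained model
    train_flops = 6 * max_params * TOKENS_AVAILABLE     # ~6*N*D estimate

    print(f"data-optimal model size: ~{max_params / 1e9:.0f}B parameters")
    print(f"implied training compute: ~{train_flops:.1e} FLOPs")

On these figures, the existing stock supports a compute-optimal model of a few hundred billion parameters; going bigger means repeating data or generating new data.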

One remedy is synthetic data, where the AI creates problems for itself, answers them, and the next model is trained on that new data. The risk is garbage in, garbage out: if the synthetic data is not high quality, the new model will not improve. Epoch thinks data is the bottleneck with the widest range of possible outcomes for how much it could slow AI down.

Latency

With training split across multiple data centres, there is a world where the latency between sites is too high for efficient batched, synchronised training. I do not understand the technical details here; the point is simply that it would hurt the ability to build bigger models.

Regulation

Fundamentally, model capabilities matter less than the deployment of those models in the economy. It might be cool to have super-powerful AI in some lab in SF, but if people are still doing mundane marketing roles and burger-flipping their way through finance Excel spreadsheets, it is sort of irrelevant.

Throughout human history, vast inequality and poor social outcomes have led to unrest, a lack of societal wellbeing, and eventually revolution: the French Revolution, the Russian Revolution, and the backlashes to technological revolutions alike were born out of discontent. Governments know that if people start losing jobs, there will be discontent. How quickly could a government implement UBI? How quickly can people re-skill and retrain into different jobs? These are interesting questions, and hard to answer.

In unregulated industries, governments lack the ability to intervene significantly. They could mandate a certain number of human employees in various functions, but more likely they will do this in the public sector. In Australia, 82% of jobs created since August 2022 have come from the public sector. The ZIRP era is over, interest rates and inflation are high, and private companies are wary of over-hiring when they don't need to. Governments could simply mandate human roles, or regulate AI's ability to seep into particular industries.

The counter is that competitive forces push countries to limit regulation over time, because countries that march ahead can invest more readily in their defence, healthcare, and agriculture sectors and become dominant over the long run. This is particularly true given the US-China cold war that has started around the AI race.

In the short term regulation could slow things down, but in the long term I do not see it as a significant bottleneck. It does, however, raise interesting questions about regulated industries like finance, law, healthcare, energy, and industrials, and how they will navigate AI adoption when most hospitals still carry around binders of paper.

Investment stops coming in & AI bubble bursts

Every $1 spent on NVIDIA GPUs should create roughly $1-5 of economic value for the software companies deploying them. NVIDIA's revenue is around $100BN today and is expected to reach roughly $200BN by 2027, driven almost entirely by the big AI buyers: Google, OpenAI, xAI, and Anthropic. To break even on that spend, these buyers would need roughly $400BN of revenue, and around $600BN at normal profit margins.

Currently, the economic value delivered is not large enough to justify the cost of compute. This happens with every software cycle, but it means the industry would need to create roughly $2TRN of end-user value to justify $600BN of spend. Right now, estimates put AI company revenues at around $20-30BN, based on API usage and consumer subscriptions. Perhaps they find other ways to monetise, like an ad network, but more likely many more AI application-layer companies will have to be built so that the APIs get called far more. Consumer subscriptions have a ceiling, perhaps around $140BN, and will always fall far short of the required figure.
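A quick back-of-envelope ties these figures together. All inputs are this essay's estimates rather than audited numbers:

    # Back-of-envelope on the investment gap, using the essay's figures.
    nvidia_revenue_2027 = 200e9                      # projected annual GPU spend
    breakeven_revenue   = 2 * nvidia_revenue_2027    # ~$400BN to break even
    margin_revenue      = 3 * nvidia_revenue_2027    # ~$600BN at normal margins
    consumer_value      = 2e12                       # ~$2TRN of end-user value
    current_ai_revenue  = 25e9                       # midpoint of the $20-30BN estimate

    print(f"revenue needed at normal margins: ${margin_revenue / 1e9:.0f}BN")
    print(f"current AI revenue covers ~{current_ai_revenue / margin_revenue:.1%} of that")
    print(f"implied end-user value multiple: ~{consumer_value / margin_revenue:.1f}x revenue")

On these numbers, today's AI revenues cover only a few percent of the required figure, which is the gap the bubble argument rests on.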

Companies could keep losing money for a while. Microsoft and Google, the main buyers of GPUs, could theoretically continue to lose a lot of money because they have excess profit and revenue from other parts of their businesses, but this can only go on for so long. There will also be interesting fluctuations depending on whether scaling laws hold, which feeds into demand for GPUs and, downstream, into NVIDIA's revenues and the costs borne by OpenAI and the other hyperscale buyers.

Some argue this creates a need for AGI by 2030, or else progress will slow. To mitigate this, we need ACI: AI deployed across functions like coding, marketing, and customer service, with businesses and consumers alike extracting tangible value from it.