W44 - Planning Next Year's AI Coding Technical Decisions
Lately I’ve been thinking about how to set goals and how much to invest in AI coding next year, and sketching a framework for this kind of technical decision-making. This year AI coding is still at the individual productivity stage; from the perspective of building organizational capability, our resource investment is approximately zero.
First, look at how much cost headroom the internal and external environments give us.
The business’s cost tolerance is the decisive variable. Next year AI will most likely not be as hot as this year, so the cost tolerance set by internal and external conditions will shrink. Opportunities missed this year are unlikely to come back next year.
Full‑stack engineers should be discussed in tiers. Training a full‑stack engineer who meets average expectations across all specialized stacks requires very high cost and long cycles. A more realistic approach is to define a scoped scenario and set achievable targets. For example, within six months train a full‑stack engineer who can handle a certain generalized business scenario, and measure how many times faster delivery is compared with specialized‑stack collaboration or how much manpower is saved.
Next, probe the engineering limits.
Keep deepening your understanding of the boundaries and limits of a technology cluster and its engineering; know how big a piece your knife can actually cut. The engineering limits of current LLMs and Agents are gradually coming into view.
That model capability has hit a wall is a conclusion drawn from many signals. There are three main reasons:
First, Sutton’s recent discussion of “big world and small world”: large models are not truly intelligent; they operate in a small world and cannot keep learning continuously from observation alone;
Second, the scaling law, an important empirical rule, was shown to hit a wall with Grok. Grok 4 already runs on compute on the order of 200K GPUs; Musk is an engineer who believes absolutely in engineering, yet the market response was lukewarm and the model didn’t open a decisive gap. Evidently, when model capabilities are similar, xAI’s ability to deliver to users is not as strong as OpenAI’s or Anthropic’s;
Third, the industry’s hopes for breaking Transformer path dependence have not been realized; many expected a “Thinking Machine,” but half a year has passed without a breakthrough;
Therefore I think the window for leaps in model capability is narrowing, and it’s fine to wait it out. The jump from Intel to Apple’s M1 is already done; what follows is M2, M3, and so on, and in a few years no one will remember which M generation we were on.
This year many research efforts helped clarify the boundaries of large models: why they can’t count how many r’s are in “strawberry,” why hallucinations and uncertainty exist, context rot, and similar issues.
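To see one of those boundaries concretely, here is a minimal sketch using the tiktoken tokenizer library; the exact token split shown is an illustration and can vary by encoding, but the point holds: models consume subword tokens, not characters, so character-level questions sit outside what they directly observe.

```python
# Why "count the r's in strawberry" is hard: the model never sees
# characters, only subword tokens. Assumes the tiktoken library
# (pip install tiktoken); the split shown is illustrative.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("strawberry")

# The pieces the model actually receives, e.g. [b'str', b'aw', b'berry'].
print([enc.decode_single_token_bytes(t) for t in tokens])

# Ordinary code counts characters trivially; the model has no such view.
print("strawberry".count("r"))  # 3
```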
The engineering limits of Agents are also being gradually mapped, and a large gap remains between Agents and real-world tasks. Take broken code as an example: for simple problems (syntax errors), let the AI fix it itself; for complex problems (logic bugs, unexpected errors), a person must first figure out the issue and tell the AI precisely what to fix, rather than dumping every error message into it. If you don’t know what’s wrong and just say, “AI, fix it,” and paste every error, the AI will make random changes and only make things worse.
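To make that division of labor concrete, here is a minimal sketch of the triage rule; the patterns and the `triage` helper are my own illustrative heuristics, not any agent product’s API.

```python
import re

# Illustrative heuristics for "mechanical" breakage that an AI agent
# can safely self-repair; anything else needs a human diagnosis first.
MECHANICAL_PATTERNS = [
    r"SyntaxError", r"IndentationError", r"is not defined",
    r"No module named", r"Unexpected token",
]

def triage(error_output: str) -> str:
    """Turn raw error output into a focused instruction, or escalate."""
    if any(re.search(p, error_output) for p in MECHANICAL_PATTERNS):
        # Simple problem: hand it straight to the agent, scoped tightly.
        return f"Fix this mechanical error and change nothing else:\n{error_output}"
    # Complex problem: a person localizes the fault and states the
    # expected behavior first -- never paste the full dump and say "fix it".
    return ("ESCALATE: a human should identify the faulty code path and the "
            "expected behavior, then give the agent a narrow instruction.")

print(triage("SyntaxError: invalid syntax (app.py, line 12)"))
```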
Next year the industry’s attention will tilt further toward efficiency and applications. One factor is cost: calling large models is still expensive today, but for recurring work the marginal cost can decline over time (DeepSeek‑OCR is one example). The other is that the Agent ecosystem may see a much larger breakout.
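For a feel of the cost side, a back-of-the-envelope sketch; the prices and workload numbers below are placeholder assumptions, not any vendor’s actual pricing.

```python
# Rough cost model for a recurring LLM workload. Prices are placeholder
# assumptions in USD per 1M tokens, not a real price list.
def monthly_cost(calls_per_day: int, in_tokens: int, out_tokens: int,
                 price_in: float, price_out: float) -> float:
    """Estimated monthly spend in USD for one recurring workload."""
    per_call = (in_tokens * price_in + out_tokens * price_out) / 1e6
    return per_call * calls_per_day * 30

# A hypothetical agent job: 2,000 calls/day, 8K tokens in / 1K out.
print(f"${monthly_cost(2000, 8000, 1000, 3.0, 15.0):,.0f}/month")  # $2,340/month
# If token prices halve, the same workload's marginal cost halves too:
print(f"${monthly_cost(2000, 8000, 1000, 1.5, 7.5):,.0f}/month")   # $1,170/month
```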
Finally, find the right scenarios and paths.
Although AI coding automates part of the work and shortens learning cycles, the people who can ultimately do the job still need to complete their knowledge system and build new cognition; that process remains necessary. Practical engineers understand this takes a cycle: each exploration is a contest between old and new, and the outcome is finding the most cost-effective scenario and position. Front-end already went through a Node full-stack wave; it didn’t become mainstream, but “what Node should do” is no longer debated, and that process took 3–5 years. Likewise, truly understanding the backend determines how deep you can go: front-end lacks experience in some problem domains, such as process models and concurrency control, databases and transactions, authentication and security models, and service scaling and disaster recovery. All of this says full-stack is not a single leap but a spectrum of continuous strategic play.
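To make one of those gaps concrete, here is a minimal sketch of transaction discipline, using Python’s standard sqlite3 module as a stand-in for a real backend database; the schema and `transfer` helper are hypothetical.

```python
import sqlite3

# One of the backend gaps above: databases and transactions. A naive
# read-modify-write can lose updates under concurrency; an atomic,
# conditional update inside a transaction cannot.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")

def transfer(conn, src: int, dst: int, amount: int) -> None:
    """Move funds atomically: both rows change, or neither does."""
    with conn:  # begins a transaction; commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?", (amount, src, amount))
        if cur.rowcount == 0:
            raise ValueError("insufficient funds")  # triggers rollback
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))

transfer(conn, 1, 2, 60)
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [(1, 40), (2, 60)]
```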