W50 - What the Team Should Do on AI Coding Next Year
This week I read a piece from a16z and OpenRouter, “State of AI.” It is empirical research built on real token traffic, at the quadrillion scale, on the OpenRouter platform. Just reading the opening gives a disorienting sense of time: o1’s stable release shipped on 2024-12-05, almost exactly one year ago. The share of tokens served by reasoning models rose steadily through 2025 and recently surpassed half.
Assuming model capabilities continue to strengthen over the next few years (multimodality, memory, and agentic behavior will keep expanding), a team’s core objective for AI-driven coding shouldn’t be merely adding a few tools or collecting some prompts, but building AI-ready engineering foundations. The stronger the models become, the more they can unlock our delivery capabilities.
For this work to stay on the main track over the long term, investment in the following five areas must compound over time.
1) Architecture
AI struggles with projects that lack structural clarity. Such projects not only consume large amounts of tokens; they are also prone to regression bugs caused by overlooked implicit dependencies. A common problem in frontend architecture is mixing views with business logic: logic entangled with side effects, and mutable state inseparable from the immutable parts. Adopting MVVM or MVC alone doesn’t solve this.
A project should be able to clearly answer “what is the core code.” The core code holds all business logic and state evolution; ideally it is composed of pure functions, framework-agnostic, environment-independent, unit-testable, and has deterministic inputs and outputs. The rest of the code handles network I/O, side effects, and UI rendering. This gives AI a stable focal point and makes changes more controllable.
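To make this concrete, here is a minimal sketch of such a split in TypeScript. The cart domain, names, and endpoint are all hypothetical; the point is that state evolution lives in pure functions while I/O stays at the edge.

```typescript
// cart.ts -- core: pure, framework-agnostic, deterministic, unit-testable.
export interface CartItem {
  sku: string;
  unitPrice: number; // in cents
  quantity: number;
}

export interface Cart {
  items: CartItem[];
}

// Pure state evolution: same inputs always produce the same output, no I/O.
export function addItem(cart: Cart, item: CartItem): Cart {
  const existing = cart.items.find((i) => i.sku === item.sku);
  const items = existing
    ? cart.items.map((i) =>
        i.sku === item.sku ? { ...i, quantity: i.quantity + item.quantity } : i
      )
    : [...cart.items, item];
  return { items };
}

export function total(cart: Cart): number {
  return cart.items.reduce((sum, i) => sum + i.unitPrice * i.quantity, 0);
}

// checkout.ts -- shell: network I/O and side effects live at the edges.
export async function checkout(cart: Cart): Promise<void> {
  // Illustrative endpoint, not a real API.
  await fetch("/api/orders", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ items: cart.items, total: total(cart) }),
  });
}
```

Everything AI (or a new teammate) needs to reason about business behavior sits in `cart.ts`, which can be tested without mocking a browser or a server.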
2) Making context explicit
Without historical context, AI cannot understand why certain odd hacks exist in the code. The importance of documentation and comments is no longer just knowledge preservation — it directly affects the quality of code generation.
More specifically, domain information, architectural constraints, tech-stack preferences, and team engineering norms should be made explicit and, ideally, captured in formats that are easier for AI to consume — for example ADRs, constraint checklists, and README “do-not-break” sections. Combined with an appropriate toolchain, feed that context to the model.
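As an illustration, a README “do-not-break” section might look like the following. Every line here is a hypothetical example; what matters is that the constraints are written down where both humans and AI tools will read them.

```markdown
## Do not break

- All business logic lives in `src/core/`; it must stay free of framework imports.
- Public types exported from `src/core/` change only through an ADR.
- All timestamps are stored and transmitted in UTC; conversion to local time
  happens in the view layer only.
```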
3) Shift constraints left
I strongly agree with the idea of shifting problems left via compilers and static analysis. Rust is the archetype of strict compilation, but most frontend scenarios don’t need Rust to avoid incidents; the real issue is that TypeScript’s language advantages aren’t being fully leveraged. Historically, to accelerate TypeScript adoption, teams allowed it to be used permissively, and many projects never enabled its full type and compiler constraints.
TypeScript’s strict mode is foundational and non-negotiable, and meaningless types like any should be aggressively forbidden. Standardized custom lint rules are just as necessary: author ESLint rules that enforce architectural boundaries and unify language style, for example enforcing the Composition API and forbidding cross-layer dependencies.
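A sketch of what this can look like with ESLint’s flat config and typescript-eslint. The plugin version, file layout, and import patterns are assumptions for a typical TypeScript repo, not a drop-in config.

```js
// eslint.config.js -- a sketch; assumes typescript-eslint v8+ and that
// pure core code lives under src/core/.
import tseslint from "typescript-eslint";

export default tseslint.config(
  // Strict baseline for the whole repo.
  ...tseslint.configs.strict,
  {
    rules: {
      // `any` defeats the type system; fail the lint run instead of warning.
      "@typescript-eslint/no-explicit-any": "error",
    },
  },
  {
    // Architectural boundary: the core may not import from view or I/O layers.
    files: ["src/core/**/*.ts"],
    rules: {
      "no-restricted-imports": [
        "error",
        {
          patterns: [
            {
              group: ["**/views/**", "**/api/**"],
              message: "Core code must stay framework-agnostic and I/O-free.",
            },
          ],
        },
      ],
    },
  }
);
```

Once rules like these run in CI, the architectural constraints from section 2 stop being documentation and become machine-enforced, which is exactly what AI-generated changes need.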
4) Programmatic verification
Code generation will keep getting cheaper; the truly expensive part is verifying that the generated code is correct, so low-cost verification should come first. This aligns with TDD principles and also addresses TDD’s historical pain point of high adoption cost.
With a good architecture you can clearly identify the core code, and covering that core with unit tests becomes highly cost-effective. Frontend and backend collaborate by contract, so adding runtime schema validation with Zod is sensible. Frontend and QA collaborate through test cases; writing clear acceptance criteria or natural-language test cases before implementation may feel tedious, but it is very effective.
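For the contract side, here is a minimal sketch with Zod; the Order shape and the endpoint are hypothetical.

```typescript
import { z } from "zod";

// The schema is the runtime form of the frontend/backend contract.
const OrderSchema = z.object({
  id: z.string(),
  totalCents: z.number().int().nonnegative(),
  status: z.enum(["pending", "paid", "shipped"]),
});

type Order = z.infer<typeof OrderSchema>;

export async function fetchOrder(id: string): Promise<Order> {
  const res = await fetch(`/api/orders/${id}`); // illustrative endpoint
  const json = await res.json();
  // safeParse surfaces contract drift as an explicit, typed failure
  // instead of a mysterious `undefined` deep in the UI.
  const parsed = OrderSchema.safeParse(json);
  if (!parsed.success) {
    throw new Error(`Contract violation: ${parsed.error.message}`);
  }
  return parsed.data;
}
```

The same schema doubles as a cheap verification target: a unit test can feed known-bad payloads to `OrderSchema` and assert they are rejected, with no server involved.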
5) Observability and troubleshooting
Customer complaint responses and production issue investigations grow less efficient as organizations scale and roles specialize. Logs are scattered across services, and gathering facts through chat threads can waste hours.
Integrate logging and observability so AI can see complete upstream and downstream data. At minimum, implement structured logs, unified tracing, and replayable critical paths. Automate fact collection first so engineers can respond more nimbly; this is also a necessary prerequisite for AI-driven automated diagnosis.
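A minimal sketch of what “structured” means in practice; the field names and the propagation mechanism are assumptions rather than any specific library’s API.

```typescript
// One JSON object per line, with a trace id that flows through every hop,
// makes logs trivial for pipelines (and AI) to parse and reassemble.
interface LogEvent {
  ts: string; // ISO timestamp
  level: "info" | "warn" | "error";
  traceId: string; // same id across every service touched by one request
  event: string; // machine-readable event name, not free prose
  data?: Record<string, unknown>;
}

export function makeLogger(traceId: string) {
  const log = (
    level: LogEvent["level"],
    event: string,
    data?: Record<string, unknown>
  ) => {
    const entry: LogEvent = {
      ts: new Date().toISOString(),
      level,
      traceId,
      event,
      data,
    };
    console.log(JSON.stringify(entry));
  };
  return {
    info: (event: string, data?: Record<string, unknown>) => log("info", event, data),
    error: (event: string, data?: Record<string, unknown>) => log("error", event, data),
  };
}

// Usage: every log line from one request shares the same traceId, so the
// full upstream/downstream path can be reassembled from raw logs alone.
// (crypto.randomUUID() is available in modern browsers and Node 19+.)
const logger = makeLogger(crypto.randomUUID());
logger.info("checkout.started", { cartSize: 3 });
logger.error("payment.declined", { code: "card_expired" });
```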