W24 - Half-Year Review of Large Model Applications

Recently I watched WWDC, Google I/O, Pichai’s interviews, and several summaries of the AI sector from the past six months.

AI has moved into productization and ecosystem integration, model races have cooled, and most companies can now confidently build “wrapper” applications.

Changes in model capability and pricing: open-source models are rapidly catching up with, and in some cases surpassing, closed models, and the gap between top-tier models has narrowed significantly. Local models remain limited, but models like Llama 3 70B can now run on ordinary laptops and approach GPT-4-level capability. Inference costs for large models continue to fall, declining far faster than costs did in the Moore's Law era of the internet: top-tier model inference prices commonly drop tenfold per year, and some lightweight or distilled models fall a hundredfold.

People used to measure AI product value by token consumption rate. Offering only a prompt box or chatbot and waiting for users to ask is clearly insufficient, but chasing token burn rate alone misses the point. The key metric is the value generated per token, which shows up in changing business models. The industry consensus is shifting from MaaS (Model as a Service) to RaaS (Results as a Service). Many B2B companies no longer bill by token usage but by metrics like new customers acquired or conversions achieved—similar to advertising’s shift from pay-per-impression to pay-for-performance.
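The shift from burn rate to value-per-token can be made concrete with a toy calculation. A minimal sketch below, with entirely hypothetical products and numbers, illustrates why a low-consumption product can outrank a high-consumption one under the RaaS framing:

```python
# Illustrative sketch (all products and numbers hypothetical): comparing
# raw token burn with value generated per token, the metric the
# Results-as-a-Service framing emphasizes.

def value_per_token(revenue_attributed: float, tokens_consumed: int) -> float:
    """Revenue (e.g. from new customers or conversions) attributed to an
    AI feature, divided by the tokens it consumed."""
    return revenue_attributed / tokens_consumed

# Two hypothetical products:
chatbot = {"tokens": 50_000_000, "revenue": 1_000.0}  # high burn, low value
agent   = {"tokens": 2_000_000,  "revenue": 5_000.0}  # low burn, high value

for name, p in [("chatbot", chatbot), ("agent", agent)]:
    print(f"{name}: {value_per_token(p['revenue'], p['tokens']):.6f} per token")
```

By this measure the hypothetical agent generates over a hundred times more value per token than the chatbot despite burning a fraction of the tokens, which is exactly why billing by outcomes rather than by usage changes which products look valuable.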

We are not seeing a paradigm shift where NUI replaces GUI. In daily use, clicking is still the most efficient action in most scenarios. However, the human-computer interface is trending clearly from the traditional “toolbar + menu” toward a more natural and intuitive “natural language + intelligent execution.”

Application engineering faces growing challenges. Issues such as models over-flattering users, system prompt leakage, and models “snitching” reflect challenges in model safety and controllability. Tool invocation, permission management, and prompt injection have become key engineering problems for deployment.
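One common mitigation for the tool-invocation and permission problems above is to put a policy gate between the model and the tools it can call. A minimal sketch, with all tool names and policy fields hypothetical, of a deny-by-default allowlist where side-effecting tools require explicit user confirmation:

```python
# Minimal sketch (all tool names and policy fields hypothetical) of a
# permission gate in front of model-requested tool calls: the model
# proposes a call, and the host application checks it against an
# allowlist before executing anything.

ALLOWED_TOOLS = {
    "search_docs": {},                               # read-only: allowed freely
    "send_email":  {"requires_confirmation": True},  # side effects: user-gated
}

def authorize_tool_call(tool_name: str, user_confirmed: bool = False) -> bool:
    policy = ALLOWED_TOOLS.get(tool_name)
    if policy is None:
        return False  # unknown tools are denied by default
    if policy.get("requires_confirmation") and not user_confirmed:
        return False  # side-effecting tools need explicit user approval
    return True
```

Deny-by-default matters here: a prompt-injected instruction asking for an unlisted tool is rejected without the model's cooperation, because the check runs in the host application rather than in the prompt.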

Apple has indeed lagged in AI. I previously thought its conservatism stemmed from a focus on taste and experience; as Jobs said, it comes down to taste. But after trying multimodal GPT-4o and Gemini Live, I found them already highly usable and a generational leap over Siri, and Apple's conservative approach is increasingly hard to justify.

When major tech firms launch more general-purpose products, niche vertical applications can be easily overtaken. For example, Google’s AI Mode and Agent Mode—the former enabling more complex conversational queries and the latter proactively completing multi-step tasks (like house hunting or booking tickets)—have made some prior vertical AI products (such as Perplexity and Manus) feel less compelling.
