W24 - Half-Year Review of Large Model Applications

Recently I watched WWDC, Google I/O, Pichai’s interviews, and several summaries of the AI sector from the past six months.

AI has moved into productization and ecosystem integration, model races have cooled, and most companies can now confidently build “wrapper” applications.

Changes in model capability and pricing: open-source models are rapidly catching up with, and in some cases surpassing, closed models, and the gap between top-tier models has narrowed significantly. Local models remain limited, but models like Llama 3 70B can now run on ordinary laptops and approach GPT-4-level capability. Inference costs for large models continue to fall, declining far faster than costs did in the Moore's Law era of the internet: top-tier model inference prices commonly drop tenfold per year, and some lightweight or distilled models fall a hundredfold.

People used to measure AI product value by token consumption rate. Offering only a prompt box or chatbot and waiting for users to ask is clearly insufficient, but chasing token burn rate alone misses the point. The key metric is the value generated per token, which shows up in changing business models. The industry consensus is shifting from MaaS (Model as a Service) to RaaS (Results as a Service). Many B2B companies no longer bill by token usage but by metrics like new customers acquired or conversions achieved—similar to advertising’s shift from pay-per-impression to pay-for-performance.
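The shift from burn rate to value-per-token can be made concrete with a toy calculation. A minimal sketch below, with entirely hypothetical products and numbers, illustrates why a low-consumption product can outrank a high-consumption one under the RaaS framing:

```python
# Illustrative sketch (all products and numbers hypothetical): comparing
# raw token burn with value generated per token, the metric the
# Results-as-a-Service framing emphasizes.

def value_per_token(revenue_attributed: float, tokens_consumed: int) -> float:
    """Revenue (e.g. from new customers or conversions) attributed to an
    AI feature, divided by the tokens it consumed."""
    return revenue_attributed / tokens_consumed

# Two hypothetical products:
chatbot = {"tokens": 50_000_000, "revenue": 1_000.0}  # high burn, low value
agent   = {"tokens": 2_000_000,  "revenue": 5_000.0}  # low burn, high value

for name, p in [("chatbot", chatbot), ("agent", agent)]:
    print(f"{name}: {value_per_token(p['revenue'], p['tokens']):.6f} per token")
```

By this measure the hypothetical agent generates over a hundred times more value per token than the chatbot despite burning a fraction of the tokens, which is exactly why billing by outcomes rather than by usage changes which products look valuable.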

We are not seeing a paradigm shift where NUI replaces GUI. In daily use, clicking is still the most efficient action in most scenarios. However, the human-computer interface is trending clearly from the traditional “toolbar + menu” toward a more natural and intuitive “natural language + intelligent execution.”

Application engineering faces growing challenges. Issues such as models over-flattering users, system prompt leakage, and models “snitching” reflect challenges in model safety and controllability. Tool invocation, permission management, and prompt injection have become key engineering problems for deployment.
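One common mitigation for the tool-invocation and permission problems above is to put a policy gate between the model and the tools it can call. A minimal sketch, with all tool names and policy fields hypothetical, of a deny-by-default allowlist where side-effecting tools require explicit user confirmation:

```python
# Minimal sketch (all tool names and policy fields hypothetical) of a
# permission gate in front of model-requested tool calls: the model
# proposes a call, and the host application checks it against an
# allowlist before executing anything.

ALLOWED_TOOLS = {
    "search_docs": {},                               # read-only: allowed freely
    "send_email":  {"requires_confirmation": True},  # side effects: user-gated
}

def authorize_tool_call(tool_name: str, user_confirmed: bool = False) -> bool:
    policy = ALLOWED_TOOLS.get(tool_name)
    if policy is None:
        return False  # unknown tools are denied by default
    if policy.get("requires_confirmation") and not user_confirmed:
        return False  # side-effecting tools need explicit user approval
    return True
```

Deny-by-default matters here: a prompt-injected instruction asking for an unlisted tool is rejected without the model's cooperation, because the check runs in the host application rather than in the prompt.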

Apple has indeed lagged in AI. I previously thought its conservatism stemmed from a focus on taste and experience; as Jobs said, it comes down to taste. But after trying multimodal GPT-4o and Gemini Live, I found them already highly usable and a generational leap over Siri, and Apple's conservative approach is increasingly hard to justify.

When major tech firms launch more general-purpose products, niche vertical applications can be easily overtaken. For example, Google’s AI Mode and Agent Mode—the former enabling more complex conversational queries and the latter proactively completing multi-step tasks (like house hunting or booking tickets)—have made some prior vertical AI products (such as Perplexity and Manus) feel less compelling.
