Essay Collection — AI

CES wrapped up, and with Chinese companies back in attendance this year, the consumer electronics scene was noticeably livelier, with more buzz. The two trendlines that left the strongest impressions on me were AI and energy storage devices.

AI is no longer just a remote cloud service; it is deeply integrated into consumer electronics. From AI PC to AI for All, every vendor is placing strategic bets on AI. A consensus is forming that 2024 is year one for AI-enabled devices. The Rabbit R1 is an example of a breakout new hardware form. Traditional phone and PC makers will certainly follow, but I'm more optimistic about leading niche players like iFlytek and DJI.

Energy storage devices are maturing and starting to enter the consumer market. Historically, energy has been supply-driven, with consumers having little choice—think reforms of the energy mix, the grid, and pricing. Energy storage can bring change at the consumer end, increasing the use of green energy and enabling household-level energy solutions: for example, storing green energy during off-peak hours and using it at peak hours to shave peaks and fill valleys, which addresses the intermittency of renewable generation and optimizes electricity costs. It will likely remain in an early-adoption stage for a long time, much as NAS did for years as a mostly niche hobbyist product. The idea of consumer-level energy management is fascinating: you could see how much green energy you used each day, and if you could feed power back to the grid, you could see how many people's demand you supplied and how much carbon you avoided. Household-level carbon neutrality could also create new trading markets.


I recently read This Is ChatGPT. Liu Jiang’s foreword is excellent.

First, a bit about the book’s author Stephen Wolfram—he’s a contemporary prodigy worth knowing. As a child he scorned the “stupid books” recommended by schools and was poor at arithmetic, so teachers initially thought he wouldn’t succeed. Yet by 13 he had written several physics books, at 15 published a high-energy physics paper that was cited five times, and earned a PhD from Caltech at 20. He later won a MacArthur Fellowship—the youngest recipient—and worked at the Institute for Advanced Study in Princeton. He studied cellular automata and is one of the founders of complexity science.

Learnability and computational irreducibility are a central tension in his work. The arrival of GPT created the illusion that computers suddenly became vastly more capable. In reality, tasks that are hard to compute remain difficult; the essence of learnability and computational irreducibility hasn’t changed. It’s just that tasks like writing, which we once thought computers struggled with, are actually “shallow in computational depth” and turn out to be easier than we imagined.

He considers ChatGPT's success an important scientific fact: it shows we can still hope to discover significant new "language laws," or more accurately "laws of thought." If we could make those laws explicit, we could do what ChatGPT does in a more direct, efficient, and transparent way. That those laws remain implicit is precisely GPT's technical advantage, and it explains why symbolic NLP has never reached satisfactory levels. Recently I saw an interview in which Fei-Fei Li suggested humanity might again be at a pre-Newtonian moment—the eve of understanding these laws.

Wolfram studies the underlying logic of nature and society. Early on he designed a language called the Wolfram Language. Ordinary programming languages tell computers exactly what to do; the Wolfram Language represents and processes human thinking computationally. It is a comprehensive computational language that can describe anything in the world in computational terms, intended to be a language that lets both humans and machines “think in computational terms.”

Wolfram turned his research into a web-accessible Wolfram|Alpha system: users input natural language queries and receive precise, detailed answers from a vast knowledge base, algorithms, and rules. For tasks like mathematical computation, its answers are far more reliable than ChatGPT's. Many regard it as the first truly practical AI technology you can try (https://www.wolframalpha.com/); it currently supports English only. When the book was written, GPT-4 did not yet exist, but Wolfram already wanted to combine the two: first run a query through Wolfram|Alpha to get analysis rooted in accurate knowledge and facts, then feed that into GPT to produce a more accurate and natural answer. There is now a Wolfram plugin for GPT-4, making it easier to use these two powerful AI systems together.

Lastly, reading this book made me find both the author and ChatGPT equally fascinating. The life stories of outlier talents like Wolfram feel both familiar and strange. After a few years of slowed progress during the pandemic, technology development has resumed at breakneck speed: in under a month after the New Year came Sora and Claude 3, and now there's breaking news almost every month. Our mysterious Eastern land has never lacked talent, but when will we produce more outliers? I kept thinking of Qian Xuesen's question and the Joseph Needham problem.

Wolfram offers the following suggestions for future work and study—let’s take them as mutual encouragement.

The most efficient approach is to discover new possibilities and define what is valuable to you.

Move from answering questions to learning how to ask them and how to determine which questions are worth asking. In other words, shift from knowledge execution to knowledge strategy.

Breadth of knowledge and clarity of thought will become important.

Directly learning every detail is no longer necessary: we can learn and work at a higher level, abstracting away many specifics. “Integrate,” rather than specialize. Think as broadly and deeply as possible and draw on as many bodies of knowledge and paradigms as you can.

Learn to use tools to get things done. In the past we leaned on logic and mathematics; going forward we should pay special attention to computational paradigms and adopt thinking modes directly related to computation.


Tencent Tech recently ran a “Reviewing China’s Large Models” interview series, which so far includes pieces by Yang Zhilin, Zhu Xiaohu, and Wang Xiaochuan.

The interview format works well—each person speaks in their own domain without heated confrontations, and the responses are reasoned defenses after time for reflection. The clashes of viewpoint, divergent worldviews, and varied perspectives are very interesting.

Yang Zhilin's core point is achieving generalization and personalization through long context and scaling laws. Long context is like ever-larger memory: it can record all your history and store vast amounts of digitized world information, modeling problems within that framework. Scaling laws mean that if you invest sufficient compute, performance will improve.
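Scaling laws are usually described as a power-law relationship between loss and compute. As a purely illustrative sketch—the constants below are made up, not fitted to any real model—the qualitative shape looks like this:

```python
# Illustrative scaling-law curve L(C) = a * C**(-b): loss falls smoothly as
# compute grows, with diminishing returns. Constants a and b are invented
# for illustration only; real values come from empirical fits.
def loss(compute: float, a: float = 100.0, b: float = 0.05) -> float:
    return a * compute ** (-b)

for c in (1e18, 1e20, 1e22):  # each 100x more compute lowers loss further
    print(f"compute={c:.0e}  loss={loss(c):.2f}")
```

The point of the curve is that there is no plateau in sight: as long as you keep investing compute (and matching data), the loss keeps falling, which is the bet behind "invest sufficient compute and performance will improve."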

Zhu Xiaohu represents the market-faith faction. The steep technical curve of large models will inevitably slow, and open-source models will catch up with closed-source ones as the technology spreads. “Sufficient AI capability” should be applied to commercial scenarios that can monetize quickly, using China’s massive and unique data to build moats. B2B is currently the most monetizable scenario; B2C is too costly right now.

After reading the first two interviews, Wang Xiaochuan felt they were like blind men describing an elephant or a colt testing a river—they couldn’t see the whole picture. Technology tends to look far; business tends to look near. The issue now is not distance but completeness. He has been thinking about how to turn life into mathematical models; GPT is a great tool, and that was his original motivation to return to entrepreneurship last year. Use GPT to build virtual worlds, life worlds, and the real world, and find product-technology fit in gaming, healthcare, and production efficiency. He holds a conservative view of Sora, seeing it as a phase product that’s not of the same kind as GPT and not language-centric. Sora is just a simulator and can’t produce a world model.

Whether intentional or not, the interview order is clever: from faith in technology, to faith in the market, to faith in the world. I wonder if changing the order would tell a different story.


Two examples of technology driving industry consolidation, and of how choosing the right diffusion path for a technology yields the most effective development.

Last week Stanford HAI released an AI trends report; chapter four on economics includes an interesting data set. Total private investment and jobs in AI have declined over the past three years, while investment and jobs in generative AI have increased. The overall decline is counterintuitive compared to past technological waves and reflects a concentrated diffusion of technology. Two reasons are visible: first, technical consensus across academia, industry, and research is clear, with the technical route converging toward LLMs represented by GPTs and accelerating the evolution from task-specific AI to AGI. Second, this wave of startups requires high costs and resources, which seems to favor Big Tech rather than disrupt them—at least for now.

Last week, while attending a meeting at the Agricultural Bank, I met our privacy computing team in person for the first time. I deeply sensed that banks have embraced efficient, outcome-focused marketing: they’ve moved away from scattershot spending. Whoever helps them clearly see returns and account for costs will win their marketing budgets. This is driven by technology, and privacy computing is an important vehicle for precise marketing. Domestic privacy computing development inevitably revolves around Ant Group’s AntTalk, which has led industry development almost single-handedly. Treating open source as a key means of technology diffusion may be their biggest commercialization move.


Last week I found a promising scenario for integrating LLM capabilities into a product. The business side has high-value use cases for large models in B2B super-bill scenarios, and the tech platform has the engineering capability to build intelligent applications; combined, they can lower merchants’ startup costs for exploring large models.

I tried the tech platform’s intelligent application builder; the product experience is comparable to Coze. Using the platform’s workflow capabilities, you can freely combine components like LLMs, HTTP, knowledge bases, and code execution blocks to quickly assemble intelligent applications. It’s currently in the third MVP phase and requires further research and validation for consumer-facing online service scenarios.

At last week’s super-bill brainstorm we reached concrete consensus on iteration directions: (1) data must be accurate and comprehensive, (2) bookkeeping operations should be convenient with decreasing marginal costs, (3) bill clustering must be reasonable with sufficiently fine-grained categories, and (4) income-and-expenditure analysis should guide operations and enable monetizable fund-flow services. There are two places to introduce large models: first, income analysis summaries—even a simple sentence is currently implemented with complex hard-coded rules; second, reducing bookkeeping operational costs—chatbot interaction is well suited for bookkeeping, allowing multimodal inputs like natural language text, voice, and images to lower user effort. If we also launch a WeChat public account for quick notes, the bookkeeping entry could upgrade from a merchant-side secondary entry to primary status in WeChat. On Android there’s an app called XiaoGuai Bookkeeping that already implements something similar.


Last week I talked with Hui Ling about the intelligent application builder platform; when she mentioned insights from the LRP about changes in middle- and back-office product structures, it opened my eyes. The trend is moving from most page-based interactions to large-model interactions. I’d put it this way: “Why must your next frontend application be a page?” What a purely LLM-based app will ultimately look like is still unclear, but interaction should be entirely based on NUI. For example, in the past you delivered a set of pages to operations, and launching a marketing campaign required N steps of configuration plus an operational SOP. A product based on large-model interaction should let operations express the campaign configuration tasks clearly, then check and confirm to complete them.


After switching to Huawei, installing the latest Silicon Valley tech became difficult, so I ended up adding an iPhone with a US-region Apple ID just to try overseas products. Two weeks of juggling two phones has amounted to an A/B comparison: in pure online digital products—the apps on the A side—we still lag in several areas, while on the B side we hold absolute advantages.

Two examples. I only learned last week that Perplexity was the progenitor of RAG products on the market; it remains simple and direct, unlike later C2C (copy-to-China) products that clutter the experience with flashy features, scattering users' attention and disrupting their flow. The second example is Arc: its mobile version launched this year to wide praise, and I hear iOS 18 will copy some of its designs. Browse-for-me features like pinch-to-summarize and call integration are new mobile browser interactions. Arc comes from The Browser Company in NYC—I vaguely recall being recommended the Arc desktop app during the 2023 Spring Festival (it was invite-only then). I respect this company; they've shown me what user-driven innovation looks like and seem to be rethinking and redefining the browser.


Apple redefined AI with a play on the initials: Apple Intelligence. Its core is on-device models; the long-discussed "on-device intelligence" has truly reached an inflection point.

Although Apple's WWDC demos didn't seem particularly groundbreaking, they are highly practical and deftly executed, making them easy for users to feel directly. For example, the new calculator on iPadOS—handwritten formula recognition, computation, and visualization—was a three-minute demo delivered with remarkable smoothness; I watched it slack-jawed.

The advantages of integrated hardware and software were once again on display. Apple's hardware-software integration is not just products and the OS; it extends to the underlying chips and developer toolchains. On-device models get system-level permissions and data access, enabling fine-tuning on a user's personal data. This requires balancing compute, data, models, and security; hardware-software integration seems the only viable way to do it well, and prioritizing privacy and security is essential for practical deployment.

Will there be a ModelStore after the App Store? As on-device compute grows, multiple vendors may provide models for devices; a developer might choose a popular pretrained model for their app. By building compute and Intent API infrastructure, Apple can create a new platform for a model marketplace: use in-house models to open up scenarios and spark community creativity.

Ultimately, success will depend on whether the community thrives. Following Apple's playbook, the self-congratulatory demos are meant to ignite developer enthusiasm and attract top talent. Apple's global pull on excellent developers remains the strongest, which makes me think of Huawei—this may be the hardest gap for Huawei to close. Huawei has made initial gains in hardware-software integration, but it has a long way to go in giving developers WWDC-level respect and decent monetization opportunities.


On-device AI development is encouraging. In November at FEDAY I thought engineering rollouts would take two to three years. Unexpectedly, in just over six months both Apple and Google have offered solutions to their ecosystems.

Jason Mayes shared new Web AI themes at Google I/O 2024. He previously promoted TensorFlow.js and is now the lead for Web AI. Web AI is built on WebAssembly and WebGPU. WebNN wasn’t mentioned, suggesting it hasn’t progressed as expected, since hardware with NPU configurations remains rare. Visual Blocks is an ML workflow builder for creating Web AI applications—essentially enabling no-code workflows for on-device AI tasks like translation, background removal, and text classification. Chrome has already implemented parts of Web AI: the Chrome 127 canary build supports Gemini Nano.

On-device AI’s advantages remain privacy, offline low latency, and low cost. Jason gave a vivid example about cost reduction.

Take video conferencing as an example: many platforms offer background blur or removal to protect user privacy.

Cameras typically generate video at 30 frames per second; for an average 30-minute meeting, that’s 54,000 frames that require background processing.

Assuming one million meetings per day, that amounts to 54 billion processing events per day.

If each processing event costs even a tiny $0.0001 (a hundredth of a cent), that still implies $5.4 million per day, or roughly $2 billion per year in server-side GPU compute costs.

By doing background blur in the client via Web AI, those costs disappear.

You can also port other models into the browser, such as background noise reduction, improving meeting experience at extremely low cost.
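The arithmetic above is easy to reproduce. Note that the per-frame cost must be taken as $0.0001 (0.01 cents) for the quoted totals to work out; a minimal sketch:

```python
# Back-of-the-envelope server-side cost for video background processing,
# following Jason Mayes's example. The $0.0001-per-frame cost is the
# assumption that makes the quoted daily/yearly totals work out.
FPS = 30
MEETING_MINUTES = 30
MEETINGS_PER_DAY = 1_000_000
COST_PER_FRAME_USD = 0.0001

frames_per_meeting = FPS * 60 * MEETING_MINUTES         # 54,000 frames
frames_per_day = frames_per_meeting * MEETINGS_PER_DAY  # 54 billion events
cost_per_day = frames_per_day * COST_PER_FRAME_USD      # ~$5.4 million
cost_per_year = cost_per_day * 365                      # ~$1.97 billion

print(f"{frames_per_meeting=:,}")
print(f"cost_per_day=${cost_per_day:,.0f}  cost_per_year=${cost_per_year:,.0f}")
```

Moving the same workload into the browser zeroes out the server-side term entirely, which is why even a tiny per-frame cost matters at this scale.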


A note on large models’ impact on programmers.

China's intelligent driving industry is undergoing a milestone shift: the technical route is switching from rule-based autonomous driving to end-to-end, mapless intelligent driving powered by large models. Previous smart-driving teams looked like labor-intensive industries because autonomous driving was implemented as perception, planning, and control modules; aside from perception, which involves some computer vision, planning and control were built by crafting rules and hard-coding them case by case. After the shift, you no longer need many rule-writing programmers, which explains the current boom and the related layoffs. Musk once tweeted that after Tesla adopted the new FSD, the C++ codebase shrank from 300,000 lines to 3,000—99% of the code disappeared. End-to-end driving faces two challenges. First, data: scaling laws require massive data to exceed human drivers, and only Tesla currently has that scale and designs its system fully around end-to-end models. Second, regulatory safety: the impact of each parameter on driving behavior is hard to explain, so controllability and auditability will need early legal tolerance.

A Reddit user named lapurita posted about achieving 10x development speed using Claude Sonnet 3.5. I haven’t figured out who lapurita is, but they seem influential. In short, they praise Claude Sonnet 3.5 for coding, claiming it’s clearly better than ChatGPT-4. I recently went through over a month of technical refactoring, exploring human-machine collaboration: which tasks to give to large models and which to do ourselves. I collaborated entirely with Claude and didn’t use Google once. I resonate with that post—coding with Claude is addictive. Large models gave me more time and choices, freeing me from trivial, low-value syntax work to think about design and optimization, enabling strategic programming. I truly believe our workflows and delivery processes will change soon, and the dimensions of competition between people and teams will shift as well.


I’ve discussed large models’ potential productivity impact before. A month or two later, products that deeply integrate large models with programming UX have begun to emerge.

TL;DR: you must try Cursor.

For progress in coding with large models, pay attention to this product trio: Claude + Cursor + Vercel. They may now be crossing the chasm from early adopters to the early majority. The potential impact is still on productivity and competitive strategy, both individual and organizational. In exploratory businesses, speed can be the decisive advantage: rapid iteration and capturing user mind-share are business strategies—whoever reaches PMF first gains a first-mover advantage. If development costs fall below a threshold and productivity reaches a new level, the "speed" strategy will deepen and could cause structural changes in overall competitive strategy.

  1. Claude 3.5 Sonnet has firmly established itself as the No.1 in code logic reasoning. Almost every strong AI coding product is switching its underlying model to Sonnet, including Cursor. Programmers are relatively expensive labor; in a productivity revolution, Claude chose the domain with the highest leverage. Early adopters of large models are mostly programmers focusing on code reasoning ability, which also helps win the developer community and influence application-layer choices—similar to how Stripe deployed a competitive strategy in the payments space.

  2. Cursor is the hottest deeply integrated AI IDE right now, drawing widespread community praise. After it switched to Sonnet, many Copilot users moved to Cursor. Compared with Copilot: Copilot is a plugin-first approach originally designed for code completion and suggestions rather than deep involvement in the whole development workflow; IDE vendors integrating Copilot lack deep customization, leading to inconsistent experiences, and commercial ties prevent switching the underlying model to Claude. Cursor takes an IDE approach—it's a heavily modified VS Code—and makes many deep UI and interaction optimizations that match developer intuition. To make LLMs more useful, they must understand your project better, and Cursor excels at handling whole-project context. Beyond Cursor, there is also Claude Engineer, a community command-line tool built on Claude, offering direct filesystem access and script execution plus web search—more convenient than an IDE in some scenarios.

  3. Vercel v0 is a popular code-generation tool that creates user interfaces and code for stacks like TypeScript, React, and Next.js. Its appeal lies in producing code increasingly close to production standards—code that programmers can keep working on. Claude has a similar feature, Artifacts, that also generates UI and code, but Vercel v0 seems closer to a full solution in this space. Vercel's role in the frontend ecosystem is growing: Next.js and Turbopack are Vercel projects, and the company aims to build an ecosystem covering the frontend application lifecycle from development and build to deployment and operations. Essentially, Vercel positions itself as a frontend infrastructure cloud vendor monetizing via cloud services. It's also progressing in AI-driven tooling to simplify and accelerate development: besides v0, Vercel offers an AI SDK for building AI web apps in TypeScript.


This year I refined the overall quality of my information sources and felt the impact on information distribution.

The internet’s information distribution has gone through four stages—portals, search, subscriptions, and recommendations—with increasing distribution efficiency.

A landmark event shifting subscriptions to recommendations was Google shutting down Google Reader ten years ago. Google Reader was an RSS reader; RSS required users to manage and organize sources actively, which suited only a small subset of users. After that, both domestically and abroad, content consumption moved into a machine-recommendation era.

Existing recommendation mechanisms cannot leapfrog the quality of sources; good sources still require more traditional searching. Compared to ten years ago, it’s unclear whether filter bubbles have made people more open or narrower. Information silos pose a tougher source-management challenge because many sources don’t offer direct subscriptions and are scattered across platform “follows.” Abroad there are still newsletters and RSS feeds, and platforms like Substack let readers interact directly with authors by providing technology and services, but such distribution models aren’t mainstream.

LLMs are creating large-scale effects on information distribution and may usher in the next stage. I don’t know exactly what will happen next. LLMs sinking into the OS layer could deconstruct existing platform distribution logic, and as data becomes an asset, everyone might start providing content to AIs.

One app I recently tried is Ground News. Conceptually I think it can partially represent the future; LLMs aren’t yet the decisive factor. Ground News is designed to counter informational asymmetry by presenting perspectives from different political leanings to avoid drifting into extremes. For each news item it breaks down two dimensions: Bias and Factuality. Bias indicates political leaning based on the outlets reporting the story and shows left-right proportions. If a story is mostly reported by right-leaning outlets, frequent left-leaning readers should pay attention or they’ll likely miss it. Factuality measures truthfulness: Ground News doesn’t verify each story directly but derives a factuality percentage for the reporting outlets based on past reporting—whether their stories had reliable sources and whether they corrected false information promptly.
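Ground News doesn't publish its methodology, but the two dimensions it reports are easy to illustrate. The toy sketch below—with entirely fictional outlet names and scores—shows one way to derive per-story bias proportions and an aggregate factuality score from outlet profiles:

```python
# Toy illustration of Ground News-style metrics: bias = share of reporting
# outlets per political lean; factuality = average of the outlets' historical
# reliability scores. All outlet data here is made up; Ground News's real
# scoring methodology is not public.
from collections import Counter

OUTLETS = {  # hypothetical outlet profiles
    "Daily Left":   {"lean": "left",   "factuality": 0.85},
    "Center Wire":  {"lean": "center", "factuality": 0.92},
    "Right Post":   {"lean": "right",  "factuality": 0.78},
    "Right Herald": {"lean": "right",  "factuality": 0.70},
}

def story_metrics(reporting_outlets):
    """Return (lean -> proportion, mean factuality) for one story."""
    leans = Counter(OUTLETS[o]["lean"] for o in reporting_outlets)
    total = len(reporting_outlets)
    bias = {lean: count / total for lean, count in leans.items()}
    factuality = sum(OUTLETS[o]["factuality"] for o in reporting_outlets) / total
    return bias, factuality

bias, factuality = story_metrics(["Right Post", "Right Herald", "Center Wire"])
print(bias)        # mostly right-leaning coverage: left-leaning readers may miss it
print(round(factuality, 2))
```

A story where `bias["right"]` dominates is exactly the case described above: a reader who follows mostly left-leaning outlets would likely never see it.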
