Essay Collection — AI
CES has concluded, and this year Chinese manufacturers returned to the show. The consumer electronics scene was noticeably livelier, with more hot topics. The two trends that impressed me most were AI and energy storage devices.
AI is no longer just a remote cloud service; it is being deeply integrated into consumer electronics. AI PC, AI for All: every vendor is placing strategic bets on AI. Consensus is growing that 2024 will be the first year of AI-enabled endpoint devices. The Rabbit R1 is a breakout example of a new hardware form. Traditional phone and PC manufacturers will surely follow, but I am more optimistic about leading niche players like iFlytek and DJI.
Energy storage devices are maturing and entering the consumer market. Historically energy has been supply-driven, with consumers having little choice—think structural reforms, grid reforms, and price reforms. Energy storage can bring change at the consumer end, increasing the use of green energy and enabling household-level energy solutions. For example, store green energy during off-peak times and use it during peak times to shave peaks and fill valleys. This both addresses the intermittency of renewable generation and optimizes power costs. It will likely remain in an early-adopter phase for a long time—like NAS, which after many years remained mostly a niche product for enthusiasts. Consumer applications of energy are interesting: you could see how much green energy you use each day; with grid-connected generation, you could see how many people’s electricity demand you supplied and how much carbon you reduced. Household-level carbon neutrality could even create new trading markets.
I recently read This Is ChatGPT. Liu Jiang’s foreword is excellent.
First, a bit about the book’s author Stephen Wolfram—he is a contemporary eccentric worth knowing. As a child he scorned the "dumb books" schools recommended and struggled with arithmetic, so teachers initially thought he was hopeless. Yet by 13 he had written several physics books, by 15 he published a high-energy physics paper that was cited five times, and at 20 he earned a PhD from Caltech. He later received a MacArthur Fellowship, the youngest recipient at the time, and worked at the Institute for Advanced Study in Princeton studying cellular automata, becoming one of the founders of complexity science.
Learnability and computational irreducibility form a core tension in his arguments. The arrival of GPT created the illusion that computers suddenly became extraordinarily capable. In reality, tasks that are hard to compute remain hard; the essential tension between learnability and computational irreducibility has not changed. It’s just that tasks like writing, which we once thought were beyond computers, turned out to be relatively shallow in computational depth and therefore easier than we imagined.
He views ChatGPT's success as an important scientific fact: it shows we can still hope to discover significant new "language laws," or rather "laws of thought." If we could make those laws explicit, we might perform what ChatGPT does in more direct, efficient, and transparent ways; that ChatGPT learns them implicitly, without anyone writing them down, is its technical advantage, and part of why symbolic NLP has struggled to meet expectations. I recently saw an interview with Fei-Fei Li in which she suggested humanity may once again be at a pre-Newtonian moment, on the eve of understanding these laws.
Wolfram studies the foundational logic of nature and society. Early on he designed a language called the Wolfram Language. Traditional programming languages precisely tell a computer what to do; the Wolfram Language represents and processes human thought computationally. It is a comprehensive computational language intended to discuss anything in the world in computational terms—a language aimed at enabling both humans and machines to "think in computational terms."
Wolfram turned his research into Wolfram|Alpha, a web-accessible system where users type natural-language queries and receive precise, detailed answers drawn from a vast knowledge base, algorithms, and rules. For tasks like mathematical computation, its answers are far more reliable than ChatGPT's. Many consider it the first truly practical AI technology, and it is worth trying: https://www.wolframalpha.com/ (currently English only). When the book was written GPT-4 did not yet exist, but Wolfram already wanted to combine the two: first run a query through Wolfram|Alpha to produce analysis grounded in accurate knowledge and facts, then feed that into GPT to generate a more accurate, natural answer. There is now a Wolfram plugin for GPT-4, making it easier to use both systems together.
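As a rough illustration of that combination (not Wolfram's or OpenAI's official integration), here is a minimal TypeScript sketch: query the Wolfram|Alpha Short Answers API for a computed fact, then let an LLM phrase the final answer around it. The app ID, API key, model name, and prompt wording are placeholders.

```ts
// Minimal sketch of the "Wolfram|Alpha first, LLM second" pipeline.
// WOLFRAM_APPID and OPENAI_API_KEY are assumed environment variables.

async function wolframShortAnswer(question: string): Promise<string> {
  // The Short Answers API returns a plain-text computed result.
  const url = `https://api.wolframalpha.com/v1/result?appid=${process.env.WOLFRAM_APPID}&i=${encodeURIComponent(question)}`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Wolfram|Alpha error: ${res.status}`);
  return res.text();
}

async function answerWithFacts(question: string): Promise<string> {
  const fact = await wolframShortAnswer(question);
  // Hand the verified result to an LLM purely for natural phrasing.
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o", // placeholder model name
      messages: [
        { role: "system", content: "Answer using only the provided computed fact." },
        { role: "user", content: `Question: ${question}\nComputed fact: ${fact}` },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// Example: answerWithFacts("What is the population of France divided by that of Norway?")
```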
Finally, reading this book made me find the author as fascinating as ChatGPT. The life of a prodigy like Wolfram feels both familiar and strange. After a few years of pandemic slowdown, technology is accelerating again at a breathtaking pace. Less than a month after I returned from the holiday, Sora and Claude 3 appeared; now there is breaking news nearly every month. Our mysterious Eastern land has never lacked talent, but when will another genius emerge? I am reminded of Qian Xuesen's question and the Needham problem.
Wolfram offered the following suggestions for future work and learning—let’s take them to heart.
The most efficient approach is to discover new possibilities and define what is valuable to you.
Shift from answering existing questions to learning how to pose questions and determine which questions are worth asking—that is, move from knowledge execution to knowledge strategy.
Breadth of knowledge and clarity of thought will become important.
Directly learning every detail is no longer necessary: we can operate at a higher level, abstracting many specifics. "Integrate," rather than narrowly specialize. Think as broadly and deeply as possible and draw on as many knowledge domains and paradigms as you can.
Learn to use tools to get things done. In the past we relied more on logic and mathematics; going forward pay special attention to computational paradigms and the modes of thought directly tied to computation.
Tencent Technology recently ran a series called "Reviewing China’s Large Models," currently featuring interviews with Yang Zhilin, Zhu Xiaohu, and Wang Xiaochuan.
The interview format works well: each person is interviewed separately, so there are no heated confrontations, and the responses are reasoned defenses given after some reflection. The collision of viewpoints, divergent worldviews, and diverse perspectives is very interesting.
Yang Zhilin's core view is that both generality and personalization can be achieved through long context plus scaling laws. Long context is like ever-growing memory that records your entire history and preserves a vast amount of digitized information about the world, modeling all questions within that framework. Scaling laws mean that with sufficient compute, performance keeps improving.
Zhu Xiaohu represents the market-faith faction. He argues that the steep technical curve of large models will eventually slow and that open-source models will catch up to closed-source ones as the technology diffuses. He advises deploying "sufficient AI capability" into monetizable commercial scenarios and using China’s vast, unique data to build moats. B2B is currently the best monetization scenario; B2C is too costly right now.
After reading the first two interviews, Wang Xiaochuan felt they were like blind men touching an elephant, or a pony crossing a river: unable to see the whole picture. Technology looks far ahead; business looks close up. The issue now is not distance but completeness. He has long been thinking about turning life into mathematical models, and GPT is a powerful tool for that, which is why he started a new company last year. Using GPT to build virtual worlds, life worlds, and real worlds, and finding the touchpoints between technology and product in gaming, healthcare, and productivity: that is TPF (technology-product fit). He is conservative about Sora, viewing it as a transitional product, not on the same track as GPT and not centered on language. Sora is just a simulator and cannot form a world model.
Whether by design or not, the order of those interviews is clever: from belief in technology, to belief in market, to belief in worldview. I wonder whether a different order would tell a different story.
Here are two examples of technology driving industry concentration, that is, of a diffusion path being chosen that makes development most effective.
Last week Stanford HAI published its AI trends report; chapter four, on the economy, contains an interesting set of data. Overall private investment and jobs in AI have fallen over the past three years, even though investment and roles in generative AI have increased. The aggregate decline seems counterintuitive and unlike previous tech waves; it reflects a concentration in how the technology is diffusing. Two reasons are visible. First, technical consensus across academia, industry, and research has become clear, converging on the LLM route exemplified by GPT and speeding the evolution from task-specific AI (TSAI) toward AGI. Second, this wave of startups requires high costs and resources, which seems to favor Big Tech rather than disrupt it, at least for now.
Last week, while attending a meeting at the Agricultural Bank, I met our privacy-computing team in person for the first time. I could sense that banks have embraced centralized, efficient marketing; they are done with scattershot spending. Whoever can clearly evaluate returns and account for costs will win marketing budgets. This is of course driven by technological progress; privacy computing is an important vehicle for precise marketing. China’s privacy-computing development has been heavily influenced by Ant’s "Hanyu," which has nearly single-handedly led the industry. Treating open source as a key mechanism for technology diffusion may be their largest commercial move.
Last week we identified a strong use case for LLM-driven performance integration. In B2B super-bill scenarios there is real value for large models, and our platform’s engineering capability for building intelligent apps can lower the startup cost for merchants exploring large models.
I tried the platform’s intelligent app builder; product-wise it competes with Coze. Using the platform’s workflow capabilities, you can freely combine LLMs, HTTP calls, knowledge bases, code execution blocks, and other components to rapidly assemble needed intelligent apps. It’s currently in the third MVP stage; for customer-facing online service scenarios it still needs further research and validation.
Our super-bill brainstorming last week yielded a concrete iteration consensus. One: data must be accurate and complete. Two: bookkeeping operations should be convenient, with decreasing marginal cost. Three: bill clustering should be reasonable, with sufficiently fine-grained categories. Four: income-and-expense analysis should guide operations and be capable of driving financing services. There are two opportunities to introduce large models now. The first is narrative summaries for revenue analysis: though seemingly a single sentence, these are currently implemented with complex hard-coded rules. The second is reducing the cost of bookkeeping operations: chatbot interaction suits bookkeeping well, enabling multimodal input (text, voice, images) to lower user effort. If we additionally launch a WeChat Official Account for quick notes, the bookkeeping entry could be upgraded from a secondary merchant-side feature to a primary WeChat entry point. An Android app called XiaoGuai Accounting has implemented something similar.
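To make the first opportunity concrete, here is a hypothetical sketch; the BillEntry shape and prompt are invented for illustration and are not our actual bill schema. The idea is to aggregate the period's figures in code and let the LLM only phrase the narrative, so it cannot invent numbers.

```ts
// Hypothetical bill-summary sketch; BillEntry and the prompt are illustrative only.
interface BillEntry {
  category: string; // e.g. "dine-in", "takeout", "refund"
  amount: number;   // positive = income, negative = expense, in yuan
  date: string;     // ISO date
}

function buildSummaryPrompt(entries: BillEntry[], period: string): string {
  const income = entries.filter(e => e.amount > 0).reduce((s, e) => s + e.amount, 0);
  const expense = entries.filter(e => e.amount < 0).reduce((s, e) => s - e.amount, 0);
  const byCategory = new Map<string, number>();
  for (const e of entries) {
    byCategory.set(e.category, (byCategory.get(e.category) ?? 0) + e.amount);
  }
  const breakdown = [...byCategory.entries()]
    .map(([category, total]) => `${category}: ${total.toFixed(2)}`)
    .join("; ");
  return [
    "You are a bookkeeping assistant for a small merchant.",
    `Period: ${period}. Total income: ${income.toFixed(2)}. Total expense: ${expense.toFixed(2)}.`,
    `Breakdown by category: ${breakdown}.`,
    "Write one short plain-language sentence summarizing the period's performance",
    "and one actionable suggestion. Do not invent numbers.",
  ].join("\n");
}
// The prompt string would then go to whichever LLM the platform exposes.
```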
Last week I talked with Huiling about the intelligent app-building platform, and her insights into mid- and back-office product structure trends in the LRP opened my eyes. The trend moves from page-based interactions toward LLM interactions; I might put it this way: "Why must your next front-end app be a page?" The final shape of a purely LLM-driven app is still uncertain, but interaction should be fully NUI-based (natural user interface). Previously you delivered a set of pages to operations, and launching a marketing campaign required multiple configuration steps with an SOP for the operators. A product built around LLM interaction should let operations describe the campaign configuration task in natural language, then review and confirm to complete it.
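As a sketch of what "describe, then review and confirm" could look like, the LLM's job is to turn the operator's sentence into a structured campaign config that the operator reviews before it takes effect. The schema and function names here are hypothetical, not our platform's actual API.

```ts
// Hypothetical campaign-config extraction; the schema is illustrative only.
interface CampaignConfig {
  name: string;
  audience: string;        // e.g. "new users in the last 30 days"
  discountPercent: number; // 0-100
  startDate: string;       // ISO date
  endDate: string;         // ISO date
}

const EXTRACTION_PROMPT = `
Turn the operator's request into JSON matching:
{ "name": string, "audience": string, "discountPercent": number,
  "startDate": "YYYY-MM-DD", "endDate": "YYYY-MM-DD" }
Return JSON only. If a field is missing, ask a follow-up question instead.
`;

// draftCampaign calls whichever LLM the platform exposes (passed in, left abstract),
// then the parsed config is shown in a confirmation UI instead of a settings page.
async function draftCampaign(
  request: string,
  callLLM: (system: string, user: string) => Promise<string>
): Promise<CampaignConfig> {
  const raw = await callLLM(EXTRACTION_PROMPT, request);
  return JSON.parse(raw) as CampaignConfig; // operator reviews before it takes effect
}
```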
After switching to Huawei, installing the latest Silicon Valley tech became difficult, so I reluctantly set up a US Apple ID on an iPhone specifically for foreign product hunts. Over the past two weeks of switching between two phones, one feeling emerged: on the A/B sides, pure online digital products—the A-side apps—still have a gap compared to the West. On the B-side applications we have an absolute advantage.
Two examples: I only learned last week that Perplexity is the progenitor of RAG products on the market; it remains simple and direct, unlike later C2C products that add flashy distractions and disrupt user flows. The second example is Arc. Arc’s mobile release this year has been highly praised; I heard iOS 18 will copy some of its designs. "Browser for Me," pinch-to-summarize, call integration—all are new mobile browsing interactions. Arc was created by The Browser Company in NYC; I vaguely recall someone recommending Arc’s desktop version to me around the 2023 Spring Festival (it was invite-only then). I greatly respect the company—watching them has shown me what user-driven innovation looks like, and I believe they are truly rethinking and redefining the browser.
Apple used a pun—Apple Intelligence—to redefine AI. Apple Intelligence centers on on-device models; the long-discussed "edge intelligence" really seems to have reached a turning point.
Although the use cases Apple showed at WWDC didn't look especially radical, Apple's app-first approach is clever and more likely to be genuinely felt by users. For instance, the new Calculator on iPadOS, with handwritten formula recognition, calculation, and visualization, was demonstrated smoothly in three minutes; I watched in astonishment.
The advantage of software-hardware integration is on display once again. Apple's integration is not just between product and OS; it includes the underlying chips and the developer toolchain. On-device models have system-level permissions and data access, enabling on-device fine-tuning with personal data. Compute, data, models, and security must all be balanced, and software-hardware integration appears to be the only viable way to do this well; prioritizing privacy and security is essential for real deployment.
Might there be a ModelStore after the App Store? As edge compute grows, the models on an endpoint may come from more than one vendor; a developer might, say, choose a popular pre-trained model aimed at programmers. Apple can build the compute and Intent API infrastructure and create a new marketplace platform for models: use its own models to open up the scenarios and stimulate community creativity.
Whether it succeeds will ultimately depend on whether the community can thrive. Following Apple's playbook, the hype serves to ignite developers and attract more top talent. Apple's appeal to global developers remains the strongest, which makes me think of Huawei: this may be Huawei's hardest gap to close. Huawei has made early wins in software-hardware integration, but it has a long way to go before it shows developers the same level of respect, and lets them earn a dignified living, the way WWDC does.
Endpoint AI’s development is encouraging. At FEDAY last November I thought engineering deployment would take two to three years; I didn’t expect Apple and Google to propose ecosystem-level solutions just six months later.
At Google I/O 2024 Jason Mayes shared a new Web AI theme. He previously promoted TensorFlow.js and now leads Web AI. Web AI is built on WebAssembly and WebGPU. WebNN wasn’t mentioned, suggesting WebNN’s development hasn’t met expectations—NPU-equipped hardware remains scarce. Visual Blocks is an ML workflow builder where you can create Web AI-based apps; in other words, you can build on-device AI workflows without code for tasks like translation, background removal, and text classification. Chrome has begun implementing parts of Web AI—Chrome 127 Canary can already run Gemini Nano.
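For a flavor of what running Gemini Nano in the browser looked like at that point, here is a sketch against the experimental window.ai prompt API that early Chrome 127 Canary builds exposed behind flags. The surface is experimental and has changed since, so treat the exact names (canCreateTextSession, createTextSession, prompt) as assumptions about that era rather than a stable API.

```ts
// Sketch of the experimental built-in prompt API in early Chrome Canary builds.
// window.ai sits behind flags and its shape has changed across versions;
// the calls below reflect the Chrome 127-era surface and are assumptions today.
async function askOnDevice(question: string): Promise<string | null> {
  const ai = (window as any).ai;
  if (!ai?.canCreateTextSession) return null;                   // API not exposed
  if ((await ai.canCreateTextSession()) === "no") return null;  // model unavailable
  const session = await ai.createTextSession();                 // Gemini Nano, fully on-device
  const answer: string = await session.prompt(question);
  session.destroy();
  return answer;
}
```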
On-device AI’s advantages remain privacy, offline low latency, and low cost. Jason gave a vivid example of cost reduction.
Take video conferencing: many conferencing apps offer background blur or removal to protect user privacy.
Cameras typically generate video at 30 frames per second. Assuming an average meeting length of 30 minutes, that’s 54,000 frames needing background blur processing.
If there are 1 million meetings per day, that amounts to 54 billion processing operations daily.
Even at an extremely low cost of $0.0001 per frame, that still implies $5.4 million per day, or roughly $2 billion per year, in server-side GPU compute costs.
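The arithmetic is easy to sanity-check; a few lines of TypeScript, with the per-frame price taken as the example's assumption rather than a measured GPU cost:

```ts
// Back-of-the-envelope check of the server-side cost example.
const fps = 30;
const meetingMinutes = 30;
const framesPerMeeting = fps * meetingMinutes * 60;     // 54,000 frames per meeting
const meetingsPerDay = 1_000_000;
const framesPerDay = framesPerMeeting * meetingsPerDay; // 54 billion frames per day
const costPerFrameUSD = 0.0001;                         // assumed price per inference
const costPerDay = framesPerDay * costPerFrameUSD;      // $5.4 million per day
const costPerYear = costPerDay * 365;                   // ≈ $1.97 billion per year
console.log({ framesPerMeeting, framesPerDay, costPerDay, costPerYear });
```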
Performing background blur on the client via Web AI eliminates those costs.
You can also port other models to the browser, such as background noise removal, improving meeting quality at very low cost.
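As a rough sketch of what the client-side version can look like, here is a minimal example using the TensorFlow.js body-segmentation package rather than whatever stack the talk actually demonstrated; the model choice and blur parameters are assumptions.

```ts
// Client-side background blur sketch with @tensorflow-models/body-segmentation.
// This stands in for the Web AI demo; the model and options are illustrative.
import "@tensorflow/tfjs-backend-webgl";
import * as bodySegmentation from "@tensorflow-models/body-segmentation";

async function startBackgroundBlur(video: HTMLVideoElement, canvas: HTMLCanvasElement) {
  const segmenter = await bodySegmentation.createSegmenter(
    bodySegmentation.SupportedModels.MediaPipeSelfieSegmentation,
    { runtime: "tfjs" } // runs entirely in the browser; no frames leave the device
  );

  async function renderFrame() {
    const people = await segmenter.segmentPeople(video);
    // Draw the video with the background blurred, keeping the person sharp.
    await bodySegmentation.drawBokehEffect(
      canvas, video, people,
      /* foregroundThreshold */ 0.5,
      /* backgroundBlurAmount */ 8,
      /* edgeBlurAmount */ 3
    );
    requestAnimationFrame(renderFrame);
  }
  renderFrame();
}
```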
Let’s discuss large models’ impact on programmers.
China's autonomous driving industry is undergoing a milestone shift. The technical route is moving from rule-based automated driving to end-to-end, mapless intelligent driving based on large models. Autonomous driving teams used to look like a labor-intensive industry, because driving systems were built from perception, planning, and control modules; aside from perception's CV tasks, planning and control largely meant designing rules case by case and hard-coding them. After the shift, far fewer programmers will be needed to implement rules, which partly explains why the current boom coincides with layoffs. Musk once tweeted that after Tesla's FSD moved to the end-to-end version, its C++ code went from 300,000 lines to 3,000; 99% of the code disappeared. End-to-end driving faces two challenges. First, data volume: scaling laws require massive data to surpass human drivers, and only Tesla likely has that. Second, safety and regulation: the effect of model parameters on driving behavior is hard to explain, so controllability and auditability suffer, and regulators will need to tolerate this in the early stage.
Someone named lapurita posted on Reddit about achieving a tenfold development speedup using Claude 3.5 Sonnet. I haven't figured out who lapurita is, but the post seems influential. It praises Claude 3.5 Sonnet for coding and claims it clearly outperforms GPT-4. I recently spent over a month on a tech refactor, essentially an experiment in human-machine collaboration: deciding which tasks suit the model and which to do yourself. I collaborated exclusively with Claude without using Google once. I resonate with that post: using Claude for coding is addictive. Large models gave me more time and more choices, freeing me from trivial, low-value syntax chores and leaving more time for design and optimization, that is, strategic programming. I strongly believe our workflows and delivery processes will change soon, and competition between people will shift accordingly.
I’ve discussed large models’ potential productivity impact before. A month or two later, polished products that deeply integrate large models with programming have emerged.
TL;DR: you must try Cursor.
For progress in coding, watch this product combination: Claude + Cursor + Vercel. We may be crossing the chasm from Early Adopters to the Early Majority. The potential effects remain on productivity and competitive strategy at both individual and organizational levels. In exploratory businesses, speed can be decisive—trial-and-error and rapid learning are business strategies. Whoever reaches PMF first gains first-mover advantage. If R&D costs drop below a critical threshold and productivity reaches a new level, the "fast" strategy will deepen and may produce structural changes in competitive strategy.
Claude 3.5 Sonnet has firmly established itself as the No. 1 for code logic and reasoning. Almost every leading AI coding product is switching its base model to Sonnet, including Cursor. Programmers are relatively expensive productive resources, so Claude targets areas with the highest leverage. Early adopters of large models are mostly programmers focusing on code logic and reasoning, which also helps win developer communities and influence application-level choices. I recall Stripe used a similar strategy in payments.
Cursor is the hottest deeply integrated AI IDE and has received strong community praise. After it adopted Sonnet, many Copilot users migrated to Cursor. Compare it with Copilot: Copilot is a plugin designed primarily for code completion and suggestions, not for deep participation in the whole development workflow; IDE vendors integrating Copilot lack deep customization, so the experience varies, and contractual ties prevent switching the base model to Claude. Cursor takes the IDE approach: essentially a heavily modified VS Code, with interface and interaction optimized for developer intuition. To get the most out of a model, it must understand your project, and Cursor excels at handling project-wide context. Besides Cursor, there is Claude Engineer, an open-source CLI tool built on Claude that offers direct file-system operations, script execution, and web search; in some scenarios it is more convenient than an IDE.
Vercel v0 is a much-discussed code-generation tool that can produce UI and code for stacks like TypeScript, React, and Next.js. The buzz stems from the generated code being increasingly production-ready, i.e., code programmers can keep working from. Claude has a similar feature, Artifacts, that generates UI and code, but Vercel v0 feels closer to a complete solution. Vercel's role in the front-end ecosystem keeps growing; Next.js and Turbopack are theirs. Vercel is not just offering tools or frameworks but building an ecosystem that covers the front-end application lifecycle from development to deployment and operations, positioning itself as a front-end infrastructure cloud vendor that monetizes through cloud services. It has also made progress on AI apps, using AI to simplify and accelerate development: besides v0, it offers an AI SDK for building AI web apps in TypeScript.
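For a taste of that AI SDK, here is a minimal sketch; the model id and prompt are placeholders, and the `ai` and `@ai-sdk/anthropic` packages are installed separately.

```ts
// Minimal Vercel AI SDK sketch: generate text with Claude from a TypeScript app.
import { anthropic } from "@ai-sdk/anthropic";
import { generateText } from "ai";

const { text } = await generateText({
  model: anthropic("claude-3-5-sonnet-20240620"),
  prompt: "Suggest a Next.js folder structure for a small invoicing app.",
});

console.log(text);
```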
This year I refined the overall quality of my information sources and had some reflections on information distribution.
Internet information distribution has gone through four stages—portals, search, subscription, and recommendation—with increasing distribution efficiency.
A landmark event in the shift from subscription to recommendation was Google shutting down Google Reader ten years ago. Google Reader was an RSS reader; RSS required users to be skilled at organizing information and constantly managing feeds, serving only a minority. After that, both domestically and internationally, content consumption shifted to machine recommendation.
Current recommendation mechanisms cannot leapfrog the quality of the sources themselves; finding good sources still requires traditional methods. Compared with ten years ago, the asymmetries created by information cocoons make it unclear whether people have become more open or more narrow. Information cocoons also make source management harder, because many sources don't offer direct subscriptions and are scattered across platform "Follows." Abroad you still see newsletters and RSS subscriptions; platforms like Substack let readers and writers interact directly by providing technology and services. But such distribution methods are not mainstream.
LLMs are already exerting large-scale influence on information distribution and may usher in the next stage. What comes next is uncertain. As LLMs sink into the OS layer, they may deconstruct current platform distribution logic. As data becomes an asset, people might supply content to AIs.
I’ll share an app I recently used called Ground News. Conceptually it may partly represent the future; LLMs are not yet decisive in it. Ground News is designed to combat information asymmetry by supplementing perspectives across political leanings and avoiding extremes. It evaluates each news item on two dimensions: Bias and Factuality. Bias measures political leaning—based on the outlets reporting the story, it calculates left-right proportions. If an item is mostly reported by right-leaning outlets, readers who follow left-leaning outlets should pay attention or risk missing it. Factuality measures truthfulness; Ground News doesn’t fact-check every story but assigns a factuality percentage to each outlet based on its past reporting—whether it cites reliable sources and whether it corrects errors promptly.
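To make the Bias mechanism concrete, here is a toy sketch of the idea as I understand it, not Ground News's actual algorithm; the leaning labels and the simple proportion are my own illustration.

```ts
// Toy illustration of a coverage-based bias score, not Ground News's real method.
type Leaning = "left" | "center" | "right";

interface SourceReport {
  outlet: string;
  leaning: Leaning; // the outlet's leaning, rated in advance
}

function coverageBias(reports: SourceReport[]): Record<Leaning, number> {
  const counts: Record<Leaning, number> = { left: 0, center: 0, right: 0 };
  for (const r of reports) counts[r.leaning]++;
  const total = reports.length || 1;
  return {
    left: counts.left / total,
    center: counts.center / total,
    right: counts.right / total,
  };
}

// Example: a story covered by 7 right-leaning, 2 center, and 1 left-leaning outlet
// yields { left: 0.1, center: 0.2, right: 0.7 }, a likely blind spot for readers
// who follow mostly left-leaning outlets.
```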