Mind GPT: The “super brain” of automotive AI
Li Xiang regards Mind GPT as the core of Li Auto’s AI strategy. As of January 2025, Mind GPT had gone through multiple iterations since December 2023, had been installed in more than 500,000 vehicles, was processing 120 million interaction requests per day, and executed tasks with 98.7% accuracy. Li Xiang has set the goal of ranking Mind GPT among the industry's top three within the next few years.
On January 16, 2025, Li Auto officially began rolling out OTA 7.0. Its biggest highlight is the upgrade of the Mind GPT model behind “Lixiang Tongxue” to Mind GPT-3o. Since its first version, Li Auto's Mind GPT has gone through multiple iterations and now powers an intelligent agent that spans perception, cognition, and proactive expression.
In April 2023, Mind GPT 1.0 was released;
In December 2023, Mind GPT 1.0 landed on vehicles with OTA 5.0 and was listed in the national foundation model registration. It is one of the earliest automotive large language models;
In August 2024, Mind GPT 2.0 was launched, with capabilities in language understanding, Q&A, and logical reasoning. Its architecture combines Mixture of Experts (MoE) with Transformer, doubling the model size;
In January 2025, Mind GPT-3o was pushed via OTA 7.0. It can listen, see, and remember (linked to Face ID and family accounts, it remembers the preferences and requirements of each family member); reason (it can understand and break down complex problems and visualize its perception and thinking process through animation, known as the “Lixiang Tongxue Workflow”); express itself (it has a more human-like voice that supports dozens of modal particles, holds more colloquial conversations, and can even sing and imitate animal sounds); and use 300+ tools (such as traffic restriction queries and Meituan).
As a powerful multimodal end-to-end foundation model, Mind GPT-3o understands speech, vision, and language and responds within hundreds of milliseconds. It perceives external information, understands it in depth, and expresses it naturally and accurately within a single architecture, forming a complete and coherent processing system. Powered by Mind GPT-3o, “Lixiang Tongxue” has made a qualitative leap in intelligent interaction, with comprehensive improvements in memory, planning, tool use, and expression. It can complete tasks, improve cognition, and provide emotional companionship for passengers.
In terms of interactive experience, “Lixiang Tongxue” combines sharp perception with broad knowledge. It can accurately answer questions about locations even outside the cabin, recognize animals, plants, cars, and paintings, and identify nearby buildings and geographic information.
At the same time, “Lixiang Tongxue” is not only a smart assistant but also a considerate family member. It can recognize you and your family members, remember everyone's preferences and special requirements, and make every trip easier and more convenient, with you at the center.
In daily life, it can quickly check traffic restrictions, present schedules clearly, and filter high-quality restaurants on Meituan according to your tastes and needs. It can also provide timely information on popular local events.
Cockpit evolution: from “function stacking” to “active foresight”
Powered by Mind GPT-3o, the smart space has evolved across the board with OTA 7.0. The upgraded RGB+IR visual module and richer multimodal input let “Lixiang Tongxue” not only understand users' instructions but also see what is happening inside the vehicle, so it can better grasp users' intentions. For example, when passengers are discussing a scenic spot, “Lixiang Tongxue” can combine visual recognition and voice analysis to quickly provide relevant information about it, such as an introduction and navigation.
With the cognitive capabilities of Mind GPT, “Lixiang Tongxue” can also serve as a round-the-clock travel assistant, car assistant, and entertainment assistant for the family. It can plan the best route in advance based on the user's daily travel habits and provide real-time traffic alerts. For car use, AI Task Master adds individual tasks, such as “Close the sunshade when parked for a while,” and also supports extended services, such as “Turn on the front air conditioner for ten minutes.”
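The two AI Task Master examples above are, in effect, a trigger-action rule and a timed action. The Python sketch below shows one way such tasks could be modeled; it is an illustration under stated assumptions, not Li Auto's implementation. `VehicleState`, `TaskRule`, `TaskScheduler`, `set_attr`, the simulated minute clock, and the reading of “for a while” as 5 minutes in park are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical vehicle state; field names are illustrative, not a real vehicle API.
@dataclass
class VehicleState:
    gear: str = "D"
    parked_minutes: float = 0.0
    sunshade_open: bool = True
    front_ac_on: bool = False

Action = Callable[[VehicleState], None]

@dataclass
class TaskRule:
    """One-shot trigger-action rule: run `action` the first time `condition` holds."""
    name: str
    condition: Callable[[VehicleState], bool]
    action: Action
    fired: bool = False

class TaskScheduler:
    """Evaluates rules and timed actions against a simulated clock in minutes."""

    def __init__(self) -> None:
        self.rules: List[TaskRule] = []
        self.pending: List[Tuple[float, Action]] = []  # (due time, stop callback)

    def add_rule(self, rule: TaskRule) -> None:
        self.rules.append(rule)

    def run_for(self, state: VehicleState, now: float, minutes: float,
                start: Action, stop: Action) -> None:
        """Start an action immediately and schedule its stop `minutes` later."""
        start(state)
        self.pending.append((now + minutes, stop))

    def tick(self, state: VehicleState, now: float) -> None:
        """Fire any newly satisfied rules, then any timed stops that are due."""
        for rule in self.rules:
            if not rule.fired and rule.condition(state):
                rule.action(state)
                rule.fired = True
        due = [stop for t, stop in self.pending if t <= now]
        self.pending = [(t, stop) for t, stop in self.pending if t > now]
        for stop in due:
            stop(state)

def set_attr(name: str, value: bool) -> Action:
    """Build an action that sets one boolean field on the vehicle state."""
    return lambda s: setattr(s, name, value)

# Usage: "Close the sunshade when parked for a while" (assumed: 5 minutes in P)
# and "Turn on the front air conditioner for ten minutes".
state = VehicleState()
scheduler = TaskScheduler()
scheduler.add_rule(TaskRule(
    name="close sunshade when parked",
    condition=lambda s: s.gear == "P" and s.parked_minutes >= 5,
    action=set_attr("sunshade_open", False),
))
scheduler.run_for(state, now=0, minutes=10,
                  start=set_attr("front_ac_on", True),
                  stop=set_attr("front_ac_on", False))
state.gear, state.parked_minutes = "P", 6
scheduler.tick(state, now=6)   # sunshade closes; AC still running
scheduler.tick(state, now=10)  # ten minutes elapsed: AC switches off
```

Keeping rules one-shot (`fired`) avoids re-issuing the same command on every tick; a production system would also need persistence and conflict handling between rules.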
For entertainment, “Lixiang Tongxue” can recommend music, movies, and other content based on the user's preferences. When the user wants a certain type of music, “Lixiang Tongxue” can quickly pick songs that match the user's taste from its vast music library and play them.
The “Lixiang Tongxue” app has launched, extending Lixiang AI from the IVI system to mobile phones, homes, and other scenarios, and providing general AI services such as Q&A and visual recognition (of menus, animals, plants, and more).
Advancement of Mind GPT-3o
The advancement of Mind GPT-3o stems from Li Auto’s full-stack independent R&D, deep scenario-based customization, hybrid deployment architecture, and ecosystem collaboration.
1. Multimodal end-to-end architecture: the industry's first full-link integration model
The “multimodal end-to-end integration” architecture is the most distinctive feature of Mind GPT-3o. Unlike traditional automotive foundation models that rely on modular stacking (such as separate speech recognition and image processing modules), Mind GPT-3o deeply integrates speech, vision, and language understanding in a single model. The entire chain from perception to cognition to expression runs as a closed loop within that one model. This design greatly reduces system latency (responses within hundreds of milliseconds) and the information loss caused by handoffs between modules.
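To make the information-loss argument concrete, the toy Python sketch below contrasts the two designs. It is not Mind GPT's architecture: the `Observation` fields and the rule-based stand-ins for “models” are hypothetical, chosen only to show that a cascaded pipeline can discard cues (the speaker's tone, what a cabin camera sees) at a module boundary, while a single model that receives every modality can still use them. In a cascaded design, latency also accumulates, because each stage must finish before the next one starts.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Observation:
    """Hypothetical multimodal input captured in the cabin."""
    transcript: str  # what the speaker said
    tone: str        # paralinguistic cue, e.g. "tired"
    scene: str       # what a cabin camera sees, e.g. "child asleep"

def parse_intent(text: str) -> str:
    """Toy intent parser shared by both designs."""
    return "play_music" if "music" in text.lower() else "unknown"

# Modular stacking: each module passes on only its own output type.
def speech_module(obs: Observation) -> str:
    return obs.transcript  # tone and scene are lost at this boundary

def language_module(text: str) -> Dict[str, str]:
    # Sees text only, so it cannot adapt to tone or scene.
    return {"intent": parse_intent(text), "volume": "normal"}

def modular_pipeline(obs: Observation) -> Dict[str, str]:
    return language_module(speech_module(obs))

# Single integrated model: one step conditions on every modality together.
def integrated_model(obs: Observation) -> Dict[str, str]:
    quiet = obs.scene == "child asleep" or obs.tone == "tired"
    return {"intent": parse_intent(obs.transcript),
            "volume": "low" if quiet else "normal"}

obs = Observation("Play some music", tone="tired", scene="child asleep")
print(modular_pipeline(obs))  # {'intent': 'play_music', 'volume': 'normal'}
print(integrated_model(obs))  # {'intent': 'play_music', 'volume': 'low'}
```

Both designs recover the same intent; only the integrated one can act on the cues that the text-only interface drops.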
2. Data scale and rapid iteration
Mind GPT-3o is trained on 3 trillion tokens of diverse data covering user habits, road scenarios, voice interactions, and more, far exceeding the industry average. OTA updates are frequent: 17 iterations were shipped via OTA in 2024, with an average cycle of 19 days. Li Auto responds quickly to user feedback and continuously optimizes functions (such as calling RedNote content and integrating multimodal instructions), forming a closed loop of “user feedback - model optimization - experience upgrade”.
3. Collaborative optimization of smart driving and smart cockpit
End-to-end architecture collaboration: Mind GPT-3o is deeply coupled with the smart driving system. For example, at highway toll stations and roundabouts, a vision language model (VLM) assists the end-to-end model's decision-making to achieve human-like driving (such as automatically selecting ETC lanes and overtaking dynamically). By contrast, the cockpits and smart driving systems of traditional OEMs are mostly independent modules with weak collaboration.
All-scenario coverage: It supports “D2D” navigation, AI inference visualization, and other functions. Combined with 2.9 billion kilometers of intelligent driving data, it forms a complete closed loop from perception to decision-making and integrates better with the mobility ecosystem than competitors that focus only on cockpit interaction.
4. Full-stack self-developed architecture and in-depth scenario customization
Technical path: Li Auto is the first OEM to launch a fully self-developed multimodal cognitive model. Its self-developed Taskformer neural network architecture provides a unified feature representation for multimodal data such as voice, vision, and text, avoiding the system fragmentation that comes with relying on third-party models.
Scenario focus: The model is deeply optimized for the automotive environment, covering 111 fields and more than 1,000 dedicated capabilities (such as semantic understanding, multilingual communication, and fuzzy perception). It is especially strong in spatial command execution in family scenarios (such as mid-air gesture control of the rear screen and voiceprint recognition) and in personalized services (such as children's mode and holiday greetings).
5. Flexibility of cloud + edge hybrid deployment
Computing efficiency: By distributing the inference load between the cloud-hosted GPT model and the edge NPU, the system depends less on on-board computing power, so older vehicle models can also run the foundation model smoothly. This gives it stronger compatibility than the edge-only deployments of some competitors, which rely on chips such as the Qualcomm 8295 or NVIDIA Orin.
Balance between privacy and performance: Critical tasks (such as navigation and vehicle control) are processed at the edge to protect privacy, while complex tasks (such as Q&A and entertainment) draw on cloud computing power to improve the experience, balancing efficiency and security.
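The edge/cloud split described above amounts to a routing decision per request. The Python sketch below is a minimal illustration under assumed rules, not Li Auto's dispatcher: the task category names, the `route` function, and the choice to fall back to the edge for unknown tasks or when the network is unavailable are all hypothetical.

```python
from enum import Enum

class Target(Enum):
    EDGE = "edge NPU"
    CLOUD = "cloud"

# Hypothetical task categories mirroring the split described in the text.
EDGE_TASKS = {"navigation", "vehicle_control"}
CLOUD_TASKS = {"qa", "entertainment"}

def route(task_type: str, network_available: bool = True) -> Target:
    """Pick where to run an inference request."""
    if task_type in EDGE_TASKS:
        # Critical, privacy-sensitive tasks never leave the vehicle.
        return Target.EDGE
    if task_type in CLOUD_TASKS and network_available:
        # Complex tasks use cloud compute when a connection exists.
        return Target.CLOUD
    # Assumed fallback: unknown tasks, or no network, stay on the edge.
    return Target.EDGE

for task in ["navigation", "qa", "entertainment"]:
    print(task, "->", route(task).value)
print("qa (offline) ->", route("qa", network_available=False).value)
```

Checking the edge list first means a privacy-critical task can never be sent to the cloud by accident, even if it were also added to the cloud list.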
6. Complex task processing: an “active assistant” that goes beyond command execution
The breakthrough of Mind GPT-3o is to upgrade the foundation model from a “question-answering tool” to a “task planning hub”. Its core capabilities include:
Complex problem decomposition: For example, if a user makes a vague request like “a family outing on the weekend”, the model can automatically break it down into sub-tasks such as route planning, attraction recommendations, restaurant reservations, and weather checks, and coordinate the vehicle systems with external APIs (such as Meituan and AutoNavi) to complete them.
Tool chain integration: It has 20+ built-in vertical-scenario tools, such as traffic restriction queries, calendar management, and fault diagnosis, and supports expansion with third-party services.
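The two capabilities above (breaking a vague request into sub-tasks, and dispatching those sub-tasks to registered tools) can be sketched as a planner plus a tool registry. This Python example is hypothetical throughout: `ToolRegistry`, `plan`, `execute`, the tool names, and their arguments are invented for illustration, the plan is hard-coded where the real system would have the model generate it, and the lambdas are stubs rather than calls to Meituan, AutoNavi, or any other real API.

```python
from typing import Callable, Dict, List

Tool = Callable[[dict], str]

class ToolRegistry:
    """Maps tool names to callables; built-in and third-party tools register the same way."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, name: str, fn: Tool) -> None:
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = fn

    def call(self, name: str, args: dict) -> str:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](args)

def plan(request: str) -> List[dict]:
    """Hypothetical fixed decomposition; a real system would have the model generate the plan."""
    if "outing" in request.lower():
        return [
            {"tool": "weather", "args": {"day": "Saturday"}},
            {"tool": "attractions", "args": {"audience": "family"}},
            {"tool": "route", "args": {"destination": "top attraction"}},
            {"tool": "restaurant_booking", "args": {"party_size": 4}},
        ]
    return []

def execute(request: str, registry: ToolRegistry) -> List[str]:
    """Run each planned sub-task in order and collect the results."""
    return [registry.call(step["tool"], step["args"]) for step in plan(request)]

# Stub tools standing in for real services such as weather or map APIs.
registry = ToolRegistry()
registry.register("weather", lambda a: f"forecast for {a['day']}")
registry.register("attractions", lambda a: f"attractions for {a['audience']}")
registry.register("route", lambda a: f"route to {a['destination']}")
registry.register("restaurant_booking", lambda a: f"table for {a['party_size']}")

for result in execute("Plan a family outing on the weekend", registry):
    print(result)
```

Separating the planner from the registry is what lets third-party services plug in: a new service only needs to register under a name the planner can emit.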