Automotive Voice Industry Report, 2025

Publisher: Research in China

Published: Feb 11, 2025

Length: 340 Pages

SKU: RIC19599829

Description

Automotive voice research: high-level voice function installation rate significantly increases, automotive voice moves towards "cognitive interaction"

1. Automotive voice installation rate exceeds 83%, high-level voice function installation rate significantly increases

From January to November 2024, installations of automotive voice systems reached 16.76 million units, an installation rate of 83.3%. Compared with the full year of 2023, the installation rate increased by 5 percentage points. By energy type, EREVs (extended-range electric vehicles) had the highest installation rate for automotive voice systems, reaching 100% from January to November 2024. Typical models of this energy type include the Li Auto L series, AITO M series, and Deepal S series.

In terms of voice functions, both the installations and the installation rates of continuous dialogue, see-and-speak, and wake-up-free functions rose sharply in 2024.

For the see-and-speak function, installations reached 4.66 million units from January to November 2024, an installation rate of 23% and an increase of 18 percentage points over the full year of 2023. By energy type, EREVs had the highest installation rate at 92.1%, while fuel vehicles had the lowest at only 7.1%. By price range, the see-and-speak function had the highest installation rate in the range above 500,000 RMB, with representative models such as the Zeekr 009, Yangwang U8, and NIO ES8. This range also saw the largest increase in installation rate, up by 48 percentage points. The rise of see-and-speak indicates a significant improvement in the intelligence level of automotive voice systems in 2024.
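As a quick consistency check on the figures quoted above, dividing installations by the installation rate gives the implied number of vehicles sold over the period. The following is a minimal Python sketch, assuming the installation rate is defined as installations divided by total vehicles sold (the text does not state the denominator explicitly); the function name is illustrative and not taken from the report.

```python
def implied_total_units(installations_million: float, rate_percent: float) -> float:
    """Implied total vehicles (in millions), assuming rate = installations / total."""
    return installations_million / (rate_percent / 100.0)


# Figures quoted in the text for January-November 2024.
voice_total = implied_total_units(16.76, 83.3)          # all voice systems
see_and_speak_total = implied_total_units(4.66, 23.0)   # see-and-speak function

print(f"Implied total from overall voice figures: {voice_total:.2f} million")
print(f"Implied total from see-and-speak figures: {see_and_speak_total:.2f} million")
```

Under that assumption both figures imply a market of roughly 20 million vehicles; the small gap between the two results is consistent with the 23% rate being rounded to a whole number.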

2. The cockpit connects to more ecosystem resources, voice assistants gain deep service capabilities

In the era of foundation models, a voice assistant that "knows a lot and can serve" depends increasingly on access to a diverse application ecosystem. For example, when users issue vague commands such as "the car is almost out of power," "I'm hungry," or "what should I wear for the Chinese New Year," the voice assistant's response requires support from applications such as maps, local life services, and online information.

In addition to common applications such as AMAP, iQiyi, Tencent Video, NetEase Cloud Music, and QQ Music, Li Auto has implemented voice access to content on the Xiaohongshu (Little Red Book) platform and launched a deeply customized voice skill for Meituan. For example, users can wake up Lixiang Tongxue and ask for "Chinese New Year outfits recommended by Xiaohongshu," "find a Beijing travel guide on Xiaohongshu," or "help me find a Cantonese restaurant on Meituan with an average price of 200 RMB and a rating above 4.5."
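The idea of mapping a vague command to the application categories that can serve it can be illustrated with a toy sketch. Everything here is hypothetical: the keyword rules stand in for a foundation model's intent understanding, the category names are taken from the examples above, and none of it reflects how Li Auto or any other OEM actually implements routing.

```python
from typing import List

# Hypothetical keyword rules standing in for foundation-model intent understanding.
# Service categories follow the examples in the text: maps, local life services,
# and online information.
SERVICE_RULES = {
    "maps": ("out of power", "charging", "navigate"),
    "local life services": ("hungry", "restaurant"),
    "online information": ("what should i wear", "travel guide"),
}


def route(utterance: str) -> List[str]:
    """Return the service categories whose keywords appear in the utterance."""
    text = utterance.lower()
    matched = [
        service
        for service, keywords in SERVICE_RULES.items()
        if any(keyword in text for keyword in keywords)
    ]
    # Fall back to open-ended conversation when no service matches.
    return matched or ["general chat"]


for command in ("The car is almost out of power", "I'm hungry"):
    print(command, "->", route(command))
```

A production assistant would replace the keyword table with model-based intent recognition; the sketch only shows why broader application access widens the set of commands an assistant can act on.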

3. Foundation model applications accelerate the development of automotive voice from "command interaction" to "cognitive interaction"

Unlike earlier command-based interaction, automotive voice systems empowered by foundation models are better at spoken language understanding, logical reasoning, knowledge Q&A, painting creation, and perceiving the vehicle's surroundings. For example:
XPeng's XGPT-powered Xiao P assistant has capabilities in spoken language understanding, logical reasoning, encyclopedic knowledge, painting, story, and fairy tale creation, and recognizing objects around the vehicle.
Li Auto's Mind GPT-powered Lixiang Tongxue supports fuzzy search, for example answering "I forgot the name of a movie, there's a black pianist, do you know what it is?", and search by image description: it can read the content of a movie poster and describe it freely, so children who cannot yet read can choose a movie by describing its poster.
Xiaomi's Xiaoai Tongxue also applies foundation models to understand and respond to vague commands. For example, it can recognize and respond to commands like "Where is my phone?", "Turn off the lights at home", "What mountain is that ahead?", and "What car is that ahead?".

XPeng Motors, for example, has developed its own XGPT (Lingxi) foundation model and integrated it into the voice system. It has also integrated the ZhiPu AI base foundation model and multimodal models, giving the voice assistant Xiao P stronger language understanding, image recognition, and generation capabilities, which can be linked with the in-vehicle perception system and the external environment.

4. AI foundation models become a must-have for OEMs to build intelligent automotive voice systems

In 2024, the number of brands equipping their intelligent cockpits with foundation models increased significantly, with Chinese independent brands being the primary drivers of this trend. Some brands have already completed the development path from cooperative supply to joint R&D, and finally to in-house R&D. For example, in January 2024, Geely applied Baidu's ERNIE Bot foundation model in its Galaxy L6. In the same month, Geely released its self-developed full-scenario AI foundation model, the Geely Xingrui AI Foundation Model.

Based on the Xingrui AI Foundation Model architecture, Geely has also developed derivative models such as the Xingrui NLP Language Foundation Model and the Xingrui Multimodal Foundation Model. Of these, the Xingrui NLP Language Foundation Model is entirely self-developed by the Xingrui Intelligent Computing Center, with a total training data volume exceeding 3 trillion tokens. It offers strong logical reasoning and contextual memory capabilities and includes an emotional module that allows for human-like emotional interactions.

In January 2025, at CES 2025, Geely showcased its development path for an in-cabin intelligent assistant based on the Xingrui AI Foundation Model: moving from "Assisted Intelligence" to "Agent Intelligence" and finally to "Autonomous Intelligence." With the support of the foundation model, the in-car assistant will evolve from "accurately responding to commands" to "understanding the environment and autonomously completing tasks," and ultimately to "possessing self-awareness and autonomous emotional capabilities."

Chinese independent brands such as BYD, SAIC, Dongfeng, GAC, Changan, and Chery, as well as emerging OEMs such as NIO, Li Auto, XPeng, AITO, and Xiaomi, have also implemented AI foundation models in automotive voice systems. As automotive intelligence enters its second phase, AI foundation models are gradually becoming a necessity for building intelligent voice interaction systems.

Please Note: PDF E-mail from Publisher purchase option allows up to 10 users and does not allow printing or editing. This functionality will require a Global Site License.

Table of Contents
1 Overview of Automotive Voice Industry
1.1 Overview of Automotive Voice Industry: Overview of Automotive Voice; Development History of Automotive Voice; Hierarchical Classification of Automotive Voice Interaction (1); Hierarchical Classification of Automotive Voice Interaction (2); Installation Forms of Foundation Model of OEMs; Key Participants in Foundation Model Installation; Enhancement Effects of Foundation Models on Cockpit Interaction; Pain Points in Automotive Voice Interaction Empowered by Foundation Models (1); Pain Points in Automotive Voice Interaction Empowered by Foundation Models (2); Pain Points in Automotive Voice Interaction Empowered by Foundation Models (3); Other Voice Interaction Technologies: Voiceprint Recognition; Comparison of Voiceprint Recognition Applications by OEMs
1.2 Installation Status of Automotive Voice Systems: Overall Installation Status of Automotive Voice Systems; Installation Status of Advanced Automotive Voice Functions: Continuous Dialogue (1); Installation Status of Advanced Automotive Voice Functions: Continuous Dialogue (2); Installation Status of Advanced Automotive Voice Functions: See-and-Speak (1); Installation Status of Advanced Automotive Voice Functions: See-and-Speak (2); Installation Status of Advanced Automotive Voice Functions: Wake-up-Free (1); Installation Status of Advanced Automotive Voice Functions: Wake-up-Free (2)
2 OEM Applications of Automotive Voice Systems: Summary of OEM Automotive Voice Interaction Functions (1); Summary of OEM Automotive Voice Interaction Functions (2); Summary of Foundation Model Applications by OEMs
2.1 SAIC: Application of IM Foundation Model in Automotive Voice Systems; Benchmark Model for Voice: IM L6; Benchmark Model for Voice: Rising F7; Details of Voice OTA Updates
2.2 BYD: Enhancement of In-Car Interaction by Xuanji AI Foundation Model; Benchmark Model for Voice: Denza Z9 (1); Benchmark Model for Voice: Denza Z9 (2); Details of Voice OTA Updates
2.3 Changan Auto: Deepal DEEPAL OS 3.0 Cockpit Interaction Capabilities; Avatr Harmony Cockpit Interaction Capabilities; Enhancement of Cockpit Interaction by Changan Xinghai Foundation Model; Benchmark Model for Voice: Deepal G318; Benchmark Model for Voice: Avatr 07; Details of Voice OTA Updates
2.4 GAC: AI Foundation Model Platform; Application of AI Foundation Model Platform in Intelligent Cockpit; Benchmark Model for Voice: AION RT
2.5 Geely Auto: Global AI Technology System for Intelligent Vehicles (1); Development Path of In-Car Intelligent Assistant Under the Global AI Technology System; Application of Geely Xingrui AI Foundation Model in In-Car Assistants; Architecture of Geely Xingrui AI Foundation Model; Cooperation of Geely's Foundation Models; Zeekr Kr Foundation Model (1); Zeekr Kr Foundation Model (2); Structural Design of Zeekr Foundation Model; Flyme Auto Voice Interaction Capabilities; Benchmark Model for Voice: Geely Galaxy E8; Benchmark Model for Voice: Zeekr 7X; Details of Voice OTA Updates
2.6 Jiyue: AI Foundation Model Cockpit (1); AI Foundation Model Cockpit (2); Voice OTA (1); Voice OTA (2); Voice OTA (3); Benchmark Model for Voice: Jiyue 07; Details of Voice OTA Updates
2.7 NIO: ONVO Intelligent Cockpit Interaction Capabilities; AI Foundation Model; NOMI GPT Interaction Framework; NOMI GPT Deployment Hierarchy; NOMI Module Design; NOMI Multimodal Perception Processing Flow; NOMI Command Reception + Scenario Understanding and Response Flow; NOMI Module Design: Command Distribution; NOMI Emotional Interaction Capabilities (1); NOMI Emotional Interaction Capabilities (2); NOMI Multimodal Perception and Interaction Capabilities; NOMI GPT Callable Scenarios; NOMI Vehicle Control Capabilities; Benchmark Model for Voice: NIO ET5T; Details of Voice OTA Updates
2.8 XPeng Motors: AI Xiao P; AI Underlying Capabilities; Enhancement of Voice Assistant Xiao P by XGPT; Remote Voice Vehicle Control Capabilities; Benchmark Model for Voice: XPeng P7i; Details of Voice OTA Updates
2.9 Li Auto: Empowerment of Cockpit Interaction by Mind GPT (1); Empowerment of Cockpit Interaction by Mind GPT (2); Empowerment of Cockpit Interaction by Mind GPT (3); Module Design of Lixiang Tongxue Empowered by Mind GPT; Benchmark Model for Voice: Li MEGA Ultra; Application Scenarios of Mind GPT in Lixiang Tongxue; Skills of Lixiang Tongxue (1); Skills of Lixiang Tongxue (2); Skills of Lixiang Tongxue (3); Skills of Lixiang Tongxue (4); Details of Voice OTA Updates
2.10 Leapmotor: Empowerment of Voice Assistant "Xiao Ling" by Alibaba Cloud's Tongyi Foundation Model; Benchmark Model for Voice: Leapmotor C16
2.11 Xiaomi: Foundation Model Installation; Empowerment of Xiaoai Tongxue by Foundation Model (1); Empowerment of Xiaoai Tongxue by Foundation Model (2); Architecture Design of Xiaoai Tongxue; Module Design of Xiaoai Tongxue; Core Technology of In-Car Xiaoai Tongxue; Benchmark Model for Voice: Xiaomi SU7; Introduction of Foundation Model in Xiaoai Tongxue OTA; Details of Voice OTA Updates
2.12 Harmony Intelligent Mobility Alliance (HIMA): Voice Highlights of HIMA Models (1); Voice Highlights of HIMA Models (2)
2.13 BAIC: AI Agent Architecture; Benchmark Model for Voice: ARCFOX αT5
2.14 Volkswagen: Cockpit Foundation Model; Cooperation with Baidu on Voice Model Installation; Benchmark Model for Voice: Volkswagen ID.3
2.15 Other Models: Benchmark Model for Voice: Voyah Zhuiguang; Benchmark Model for Voice: WEY Blue Mountain
3 Automotive Voice Suppliers: Summary and Comparison of Automotive Voice Supplier Solutions (1); Summary and Comparison of Automotive Voice Supplier Solutions (2)
3.1 iFlytek: Intelligent Cockpit Business Performance; Overview of Basic Voice Capabilities; Foundation Model Product System; Development History of Spark Foundation Model; Upgrade Content of Spark Foundation Model 4.0; Core Capabilities of Spark Foundation Model; Deployment Solutions for Spark Foundation Model; Vehicle Assistant Based on Spark Foundation Model; Spark Voice Foundation Model; Compatibility of Intelligent Vehicle AI Algorithm Chips; Multimodal Fusion Capabilities of Spark Foundation Model; Empowerment of Cockpit OS by Spark AI Capabilities; Low-Compute Deployment Solutions for Voice Algorithms; Case of Low-Compute Deployment for Voice Algorithms; Core Capabilities of Interaction Foundation Model; Application of Spark Foundation Model in Intelligent Cockpit; Summary of Experience in Deploying In-Car Interaction Foundation Models
3.2 AISpeech: Key Voice and Language Technologies (1); Key Voice and Language Technologies (2); Key Voice and Language Technologies (3); Automotive Voice Solutions; Automotive Voice Assistant; Development History of Foundation Model; Details of Foundation Model; Architecture Deployment Diagram of Foundation Model; Design Diagram of Voice Foundation Model; Voice Development Platform; Full-Chain + Foundation Model Layout of Voice
3.3 Cerence: Automotive Language Foundation Model Solutions; Integration of Voice Assistant with Foundation Models; Voice Assistant; External Voice Interaction; Core Voice Technologies (1); Core Voice Technologies (2); Core Voice Technologies (3); In-Car Interaction System
3.4 Unisound: AI Solutions; In-Car Foundation Model Solutions; Details of Foundation Model (1); Details of Foundation Model (2); Details of Foundation Model (3); Business Model of Automotive Voice Solutions; Basic Voice Technologies (1); Basic Voice Technologies (2)
3.5 SoundHound: Core AI Technologies (1); Core AI Technologies (2); Core AI Technologies (3); Major Clients; Automotive Voice Solutions; Voice AI Platform
3.6 Desay SV: Research History in Automotive Voice; Application Scenarios of Foundation Model in Voice; Cloud and Vehicle Deployment Solutions for Foundation Model Voice; Automotive Voice Capabilities; Overview of Voice Foundation Model Solutions; Solutions to Industry Pain Points in Voice (1); Solutions to Industry Pain Points in Voice (2); Solutions to Industry Pain Points in Voice (3); Solutions to Industry Pain Points in Voice (4); Future Plans for Foundation Model Voice; Technical Reserves in Automotive Voice
3.7 Mobvoi: AI Voice Core Technologies; Intelligent Automotive Voice Solutions (1); Intelligent Automotive Voice Solutions (2); Foundation Model Solutions: Sequence Monkey (1); Foundation Model Solutions: Sequence Monkey (2)
3.8 Pachira: Core Voice Technologies (1); Core Voice Technologies (2); Core Voice Technologies (3); Voice Foundation Model Solutions; Intelligent Cockpit Foundation Model (Hybrid Architecture + Open Integration); Automotive Voice Solutions (1); Automotive Voice Solutions (2); Advantages of Voice Products (1); Advantages of Voice Products (2); Highlights of Intelligent Cockpit Human-Machine Interaction Products
3.9 Huawei: Empowerment of Voice Assistant Xiaoyi by Pangu Foundation Model (1); Empowerment of Voice Assistant Xiaoyi by Pangu Foundation Model (2); Empowerment of Voice Assistant Xiaoyi by Pangu Foundation Model (3); Empowerment of Voice Assistant Xiaoyi by Pangu Foundation Model (4); Pangu Foundation Model 3.0 (1); Pangu Foundation Model 3.0 (2); Xiaoyi Dialogue Flow; Iteration of Pangu Foundation Model; Upgrade of Xiaoyi Based on Pangu Foundation Model 5.0; Commercial Vehicle Application Cases of Pangu Automotive Foundation Model; Qianwu Engine
3.10 Baidu: Apollo Super Cockpit Capability Framework; Apollo Super Cockpit Capability Framework: Fusion Perception (1); Apollo Super Cockpit Capability Framework: Fusion Perception (2); Apollo Super Cockpit Capability Framework: Fusion Perception (3); AI Foundation Model Cockpit: SIMO 2.0; Operating System Solutions Based on ERNIE Foundation Model (1); Operating System Solutions Based on ERNIE Foundation Model (2); DuerOS X's Automotive Voice Solutions; Development History of Xiaodu Automotive Voice (1); Development History of Xiaodu Automotive Voice (2); Development History of Xiaodu Automotive Voice (3); Empowerment of Xiaodu Assistant by Foundation Model; Intelligent Cockpit Foundation Model 2.0; Basic Architecture of ERNIE Bot Foundation Model
3.11 Tencent: AI Interaction Driven by Foundation Model; Intelligent Cockpit Foundation Model Framework; Applications of Intelligent Cockpit Foundation Model (1); Applications of Intelligent Cockpit Foundation Model (2); Tencent Cloud's Intelligent Automotive Voice Assistant; Xiaowei's Voice Capabilities (1); Xiaowei's Voice Capabilities (2); Xiaowei's Voice Capabilities (3); Application of TAI 5.0 Foundation Model in Voice Interaction; AI Lab's Voice Technologies (1); AI Lab's Voice Technologies (2); AI Lab's Intelligent Voice Interaction System; AI Lab's Multimodal Interaction; AI Lab's Front-End Solutions for Automotive Voice; Advantages of AI Lab's Front-End Solutions for Automotive Voice; AI Lab's Voice Wake-up
3.12 Alibaba: Types of Foundation Models; Genie Platform's Automotive Voice Solutions; Capabilities of Banma Foundation Model; Banma Voice AI Technologies; DAMO Academy's Voice Technologies; DAMO Academy's Open-Source TTS; DAMO Academy's Open-Source ASR
3.13 SenseTime: Foundation Model Voice Capabilities; Humanoid Foundation Model for Creating Emotional Interaction Scenarios; Cockpit AI Products; Multimodal Processing Capability Framework
3.14 VoiceAI: Automotive Voice Product Line (1); Automotive Voice Product Line (2); Automotive Voice Product Line (3); Underlying Voice Technologies
4 Automotive Voice Industry Chain
4.1 PATEO: Human-Machine Interaction Technology: Qing iVoka; Qing AI Voice Capability Configuration (1); Qing AI Voice Capability Configuration (2); Evolution of Qing AI Assistant; AI Language Foundation Model Layout
4.2 Tinnove: Interaction Framework; Voice Engine
4.3 Megatronix: Voice System Functions; Voice Development Kit; Foundation Model + Automotive Voice
4.4 Haitian Ruisheng: Voice Business (1); Voice Business (2); Solution for In-Vehicle Foundation Model Voice Requirements; Training Dataset Structure (Smart Voice); Product Coverage
4.5 DataBaker: Voice Data Services (1); Voice Data Services (2); Voice Foundation Model Products
4.6 Magic Data: Automotive Voice Business; Voice Foundation Model SFT Dataset; Voice Foundation Model SFT Dataset; High-Quality Chinese Dataset for Voice Cloning Foundation Model; High-Quality Chinese Dataset for Voice Cloning Foundation Model; Automotive Customer Collaboration Cases
4.7 Chipintelli: Voice Chip: CI11 Series; Voice Chip: CI11 Series Application Diagram; Voice Chip: CI13 Series; CI1311 & CI1312 Chip Application Diagram; Voice Chip; In-Vehicle Solutions: Pure Offline Voice Solution; In-Vehicle Solutions: Offline + Online Voice Solution; In-Vehicle Solutions: Offline Voice + IoT Solution; Voice Algorithm; Voice Algorithm
4.8 WUQi Micro: Automotive-Grade AI Voice Chip: WQ5301
5 Automotive Voice Development Trends: Summary of OEMs’ Foundation Model Configuration Parameters (1); Summary of OEMs’ Foundation Model Configuration Parameters (2); Trend 1:; Trend 2: With the Rise of In-House Voice Development, Voice Control Is Gradually Extending Beyond the Vehicle.; Trend 3:; Trend 4:; Trend 5:; Trend 6:

Pricing

Single User (Email from Publisher): $4,300
Global Site License (Email from Publisher): $6,400