Research Report on AI Foundation Models and Their Applications in Automotive Field, 2024-2025

Research on AI foundation models and automotive applications: reasoning, cost reduction, and explainability

Reasoning capabilities drive up the performance of foundation models.

Since the second half of 2024, foundation model companies inside and outside China have launched reasoning models that use reasoning frameworks such as Chain-of-Thought (CoT) to improve the ability of foundation models to handle complex tasks and make decisions independently.

The intensive release of reasoning models aims to enhance the ability of foundation models to handle complex scenarios and to lay the foundation for Agent applications. In the automotive industry, improved reasoning capabilities can address pain points in AI applications, for example by enhancing the intent recognition of cockpit assistants when semantics are complex and by improving the accuracy of spatiotemporal prediction in autonomous driving planning and decision-making.

In 2024, the reasoning technologies of mainstream foundation models introduced in vehicles primarily revolved around CoT and its variants (e.g., Tree-of-Thought (ToT), Graph-of-Thought (GoT), Forest-of-Thought (FoT)). Depending on the scenario, these were combined with generative models (e.g., diffusion models), knowledge graphs, causal reasoning models, cumulative reasoning, and multimodal reasoning chains.

For example, the Modularized Thinking Language Model (MeTHanol) proposed by Geely allows foundation models to synthesize human thoughts that supervise the hidden layers of LLMs. By adapting to daily conversations and personalized prompts, it generates human-like thinking behaviors, enhances the thinking and reasoning capabilities of large language models, and improves explainability.

In 2025, the focus of reasoning technology will shift to multimodal reasoning. Common training technologies include instruction fine-tuning, multimodal context learning, and multimodal CoT (M-CoT); they are often enabled by combining multimodal fusion alignment with LLM reasoning technologies.

Explainability bridges trust between AI and users.

Before users experience the "usefulness" of AI, they need to trust it. In 2025, the explainability of AI systems has therefore become a key factor in growing the user base of automotive AI. Demonstrating a long CoT is one way to address this challenge.

The explainability of AI systems can be achieved at three levels: data explainability, model explainability, and post-hoc explainability.

In Li Auto's case, its L3 autonomous driving uses "AI reasoning visualization technology" to intuitively present the thinking process of its end-to-end + VLM models. The visualization covers the entire process from the perception input of the physical world to the driving decision output by the foundation model, enhancing users’ trust in intelligent driving systems.

In Li Auto's "AI reasoning visualization technology":
• Attention system displays traffic and environmental information perceived by the vehicle, evaluates the behavior of traffic participants in real-time video streams and uses heatmaps to display evaluated objects.
• End-to-end (E2E) model displays the thinking process behind driving trajectory output. The model thinks about different driving trajectories, presents 10 candidate output results, and finally adopts the most likely output result as the driving path.
• Vision language model (VLM) displays its perception, reasoning, and decision-making processes through dialogue.
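
The selection step described for the E2E model (score several candidate trajectories, then adopt the most likely one) can be illustrated with a minimal sketch. This is not Li Auto's implementation: the `CandidateTrajectory` type, the raw `score` field, and the use of a softmax to turn scores into probabilities are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidateTrajectory:
    # Hypothetical representation: (x, y) waypoints in metres, vehicle frame.
    waypoints: list
    # Hypothetical unnormalized model score; higher means more likely.
    score: float


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_trajectory(candidates):
    """Return (index of the most likely candidate, all probabilities)."""
    if not candidates:
        raise ValueError("at least one candidate trajectory is required")
    probs = softmax([c.score for c in candidates])
    best_index = max(range(len(candidates)), key=lambda i: probs[i])
    return best_index, probs


# Ten made-up candidates; index 3 carries the highest score.
raw_scores = [0.2, 1.1, 0.7, 2.5, 0.3, 1.9, 0.1, 0.8, 1.4, 0.6]
candidates = [
    CandidateTrajectory(waypoints=[(0.0, 0.0), (5.0, 0.1 * i)], score=s)
    for i, s in enumerate(raw_scores)
]
best_index, probs = select_trajectory(candidates)
```

Showing all ten probabilities alongside the chosen path, rather than only the winner, is what makes such a display explanatory: the user can see which alternatives were considered and how strongly they were rejected.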

The dialogue interfaces of various reasoning models also use a long CoT to break down the reasoning process. One example is DeepSeek R1, which during conversations with users first presents the decision at each node through a CoT and then provides an explanation in natural language.
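
As a rough illustration of how an interface might separate the displayed reasoning from the final reply, the sketch below splits a model response into the two parts. It assumes the reasoning trace is wrapped in `<think>...</think>` tags, as in the raw text output of the open-weight DeepSeek R1 models; the function name `split_reasoning` and the sample text are hypothetical, and other models or APIs may deliver the trace differently.

```python
import re

# Assumption: the reasoning trace is enclosed in <think>...</think> tags.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw_output):
    """Split a response into (reasoning trace, final answer)."""
    match = THINK_PATTERN.search(raw_output)
    if match is None:
        # No trace present: treat the whole response as the answer.
        return "", raw_output.strip()
    reasoning = match.group(1).strip()
    answer = (raw_output[:match.start()] + raw_output[match.end():]).strip()
    return reasoning, answer


# Made-up response used only to exercise the function.
sample = (
    "<think>The user asks for the nearest charger.\n"
    "Battery is at 12%, so range matters more than price.</think>\n"
    "The closest fast charger is 3 km ahead on your route."
)
reasoning, answer = split_reasoning(sample)
```

A cockpit interface could then render `reasoning` in a collapsible panel and speak only `answer`, so the explanation is available without lengthening the voice reply.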

Additionally, most reasoning models, including Zhipu’s GLM-Zero-Preview, Alibaba’s QwQ-32B-Preview, and Skywork 4.0 o1, support demonstration of the long CoT reasoning process.

DeepSeek lowers the barrier to introduction of foundation models in vehicles, enabling both performance improvement and cost reduction.

Does the improvement in reasoning capabilities and overall performance mean higher costs? Not necessarily, as DeepSeek's popularity shows. In early 2025, OEMs began connecting to DeepSeek, primarily to enhance the overall capabilities of their vehicle foundation models in specific applications.

In fact, before DeepSeek models were launched, OEMs had already been developing and iterating their automotive AI foundation models. For cockpit assistants, some OEMs had completed the initial construction of their solutions and had either connected to cloud foundation model suppliers for trial operation or provisionally selected suppliers, including cloud service providers like Alibaba Cloud, Tencent Cloud, and Zhipu. They connected to DeepSeek in early 2025 because they valued the following:
• Strong reasoning performance: for example, the R1 reasoning model is comparable to OpenAI o1, and even excels in mathematical logic.
• Lower costs: performance is maintained while training and inference costs are kept at low levels for the industry.

By connecting to DeepSeek, OEMs can reduce the costs of hardware procurement, model training, and maintenance while maintaining performance when deploying intelligent driving and cockpit assistants:
• Low computing overhead supports high-level autonomous driving and equal access to technology: high-performance models can be deployed on low-compute automotive chips (e.g., edge computing units), reducing reliance on expensive GPUs. Combined with the DualPipe algorithm and FP8 mixed-precision training, these technologies optimize computing power utilization, allowing mid- and low-end vehicles to deploy high-level cockpit and autonomous driving features and accelerating the popularization of intelligent cockpits.
• Enhanced real-time performance: in driving environments, autonomous driving systems need to process large amounts of sensor data in real time, and cockpit assistants need to respond quickly to user commands, while vehicle computing resources are limited. With lower computing overhead, DeepSeek enables faster processing of sensor data, more efficient use of the computing power of intelligent driving chips (DeepSeek realizes 90% utilization of NVIDIA A100 chips during server-side training), and lower latency (e.g., on the Qualcomm 8650 platform, with computing power of 100 TOPS, DeepSeek reduces the inference response time from 20 milliseconds to 9-10 milliseconds). In intelligent driving systems, this helps ensure that driving decisions are timely and accurate, improving driving safety and user experience. In cockpit systems, it helps cockpit assistants respond quickly to user voice commands, achieving smooth human-computer interaction.
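
To put the quoted latency figures in proportion, the short calculation below converts the drop from 20 milliseconds to 9-10 milliseconds into a relative reduction and a speedup factor. Only the three latency values come from the text above; the function names are illustrative.

```python
def latency_reduction(before_ms, after_ms):
    """Fraction by which latency falls, e.g. 0.5 means 50% lower."""
    return (before_ms - after_ms) / before_ms


def speedup(before_ms, after_ms):
    """How many times faster one response is after the change."""
    return before_ms / after_ms


before = 20.0  # ms, figure quoted for the Qualcomm 8650 platform
for after in (9.0, 10.0):  # ms, the quoted range after the change
    print(
        f"{before:.0f} ms -> {after:.0f} ms: "
        f"{latency_reduction(before, after):.0%} lower latency, "
        f"{speedup(before, after):.2f}x speedup"
    )
```

In other words, the quoted range corresponds to a latency reduction of roughly 50-55%, or responses about 2 to 2.2 times faster.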




Definition
1 Overview of AI Foundation Models
1.1 Introduction to AI Mo
Definition and Features of AI M
Classification of AI Models by Archite
Classification of AI Models by Task Type/Training M
Classification of AI Models by Supervision
Classification of AI Models by Moda
Application Process of AI M
1.2 Introduction to Foundation M
Classification of Foundation M
Current Development of Foundation Models in Automotive Ind
Application Scenarios of Foundation Models in Automotive Ind
Application Case 1: Application of LLM in Autonomous Dr
Application Case 2: Application of VFM in Autonomous Dr
Application Case 3: Application of MFM in Autonomous Dri
2 Analysis of AI Foundation Models of Differing Types
2.1 Large Language Models (
Development History o
Key Capabilities o
Cases of Integration with Other M
2.2 Multimodal Large Language Models (
Development and Overview of Large Multimodal M
Large Multimodal Models VS. Large Single-modal Model
Large Multimodal Models VS. Large Single-modal Models
Technology Panorama of Large Multimodal Mo
Multimodal Information Represent
Multimodal Large Language Models (
Architecture and Core Components of
Status Quo of
Dataset Evaluation by Different MLLM Representa
Reasoning Capabilities of
Synergy between MLLM and
Application Case 1: Application of MLLM i
Application Case 2: Application of MLLM in Autonomous Dri
2.3 Vision-Language Models (VLM) and Vision-Language-Action (VLA) Mo
Development History o
Application o
Architecture o
Evolution of VLM in Intelligent Dr
Application Scenarios of VLM: End-to-end Autonomous Dr
Application Scenarios of VLM: Combination with Gaussian Fram
VL
VLA M
Principles o
Classification of VLA M
Application Cases of VL
Application Cases of VL
Application Cases of VLA
Application Cases of VL
Case 1: Core Functions of End-to-End Multimodal Model for Autonomous Driving (
Case 2: World Model Constru
Case 3: Improve Vision-Language Navigation Capabil
Case 4: VLA Generalization Enhanc
Case 5: Computing Overhead o
2.4 World M
Key Definitions of World Models and Application Develop
Basic Architecture of World M
Framework Setup and Implementation Challenges of World M
Video Generation Methods Based on Transformer and Diffusion M
Technical Principle and Path of WorldDr
World Models and End-to-end Intelligent Dr
World Models and End-to-end Intelligent Driving: Data Gener
Case 1: Tesla World
Case 2: N
Case 3: Infinity
Case 4: Worlds Labs Spatial Intelli
Case 5
Case 6: 1X's World
3 Common Technologies in AI Foundation Models
Common Foundation Model Algorithms and Architec
Comparison of Features and Application Scenarios between Foundation Model Algor
3.1 Foundation Model Architectures and Related Algori
Transformer: Architecture and Fea
Transformer: Algorithm Mecha
Transformer: Multi-head Attention Mechanisms and Their Var
KAN: Potential to Replac
KAN: Cases of Integration with Transformer Archite
MAMBA: Introdu
MAMBA: Architectural Founda
MAMBA: Latest Develop
MAMBA: Application Scen
MAMBA: Cases of Integration with Transformer Archite
Applicability of CNN in the Era of Foundation M
Applicability of RNN Variants in the Era of Foundation M
3.2 Visual Processing Algor
Common Vision Algor
CLIP Scenarios and Feat
CLIP Wor
LLaVA
3.3 Training and Fine-Tuning Technolo
Foundation Model Training Pr
Training Case: Geely's CPT Enhancement Sol
Instruction Fine-t
Training Case: Geely's Fine-tuning Framework for Multi-round Dial
3.4 Reinforcement Lea
Introduction to Reinforcement Lea
Reinforcement Learning Pr
Comparison between Some Reinforcement Learning Technology Ro
Cases of Reinforcement Learning (1
3.5 Knowledge G
Optimization Directions for Retrieval-Augmented Generation
Evolution Directions of RAG (1)
Evolution Directions of RAG (2)
Evolution Directions of RAG (3): Grap
RAG Application Cas
RAG Application Ca
RAG Application Case 3: Li
RAG Application Case 4:
Comparison between RAG R
Function
3.6 Reasoning Technolo
Reasoning Process of Transformer M
Evaluation of Reasoning Capabil
Three Optimization Directions for Foundation Model Reas
Reasoning Task Type
Reasoning Task Type
Reasoning Task Type
Common Reasoning Algorithm 1
Common Reasoning Algorithm 2: Go
Comparison between Common Reasoning Algor
Common Reasoning Algorithm 3: PagedAtte
Reasoning Case 1:
Reasoning Case 2: N
3.7 Sparsificat
Characteristics of MoE Archite
Principles of MoE Archite
MoE Training Strat
Advantages and Challenges o
MoE Models from Different Foundation Model Comp
Evolution Direction o
3.8 Generation Technolo
Introduction to Generative M
Comparison between Generation Technol
Case 1: Li
Case 2:
Case 3:
4 AI Foundation Model Companies
Development History of Mainstream Foundation M
Mainstream Foundation Models and Their Companies (For
Mainstream Foundation Models and Their Companies (Chi
Rankings of Evaluated Foundation M
4.1 O
Product La
Product Iteration Hi
GPT Series: Feat
GPT Series: Archite
From GPT-4V
Reasoning Model Open
SORA: Fea
SORA: Performance Evalu
SORA: Advantages and Limita
4.2 G
Development History of Foundation Mo
Typical Model BERT: Archite
Typical Model BERT: Var
Gemini
Cases of Foundation Models in the Automotive Ind
4.3
LLA
LLAMA Series: Evol
LLAMA Series: Fea
LLAMA Series: Training Met
LLAMA Series: A
LLAMA Series: V
4.4 Anth
Claude Performance Evaluat
Claude-based PC-side A
4.5 Mistr
Expert Model: Archite
Expert Model: Algorithm Feature
Expert Model: Algorithm Features
Large Language Model: Mistral La
4.6 A
Nova Product S
Application Cases of Amazon AI Cloud in the Automotive Industry (1
4.7 Stabili
Product S
Stable Diffusion Architecture Based on Diffusion M
Comparison between Stable Diffusion Video Generation Technology with Compet
4.
Product S
Capabilities of xAI M
Capabilities of Gr
Capabilities of Grok
4.9 Abu Dhabi Technology Innovation Inst
Iteration History of Falcon Model S
Parameters of Falcon 3 S
Evaluation of Falcon 3 S
4.10 Sens
Major Foundation Model Product Sy
Major Foundation Model Product Sy
Foundation Model Training Facil
Functional Scenarios of Foundation M
Foundation Model Technol
4.11 Alibaba
Foundation Model Product S
End-cloud Integration Solutions of Foundation Mo
4.12 Baidu AI C
Foundation Model Product Sy
4.13 Tencent
Foundation Model Product S
Reasoning Service Solutions (1
Generation Scenario Solutions for Foundation M
Q&A Scenario Solutions for Foundation Mo
4.14 ByteDance & Volcano E
Doubao Model S
Functional Highlights of Volcano Engine's Coc
4.15 H
Pangu Model Product S
Application Cases of Pangu Models in Data Synthe
LLM Architecture of Pangu M
Capabilities of Pangu Models: Multimodal Techn
Capabilities of Pangu Models: Thinking & Reasoning Techn
AI Cloud Services of Pangu M
4.16 Zhi
Product S
Foundation Model Base in the Automotive Ind
Technical Featu
4.17 F
Product S
Functional and Technical Highli
Cockpit AI S
4.18 Dee
Product Sy
Technical Inspiration from DeepSee
Technical Highlights of DeepSe
Application Cases of DeepSeek (1
5 Application Cases of AI Foundation Models in Automotive
5.1 Cockpit C
Lenovo's AI Vehicle Computing Framework Used in Coc
In-cabin Functions of Thundersoft's Rubik Foundation
LLM Empowers Smart Eye’s DMS/OMS Assistance Sy
Application of DIT in Voice Processing Scen
Application of Unisound's Shanhai Model in Cock
Phoenix Auto Intelligence’s Cockpit Smart B
5.2 Intelligent Driving C
Li Auto: Multimodal Technology in Autonomous Drivin
Li Auto: Multimodal Technology in Autonomous Drivin
Li Auto: Multimodal Technology in Autonomous Driving (3): Overcoming 2D Limita
Li Auto: Data Generation Technolog
Li Auto: Data Generation Technolog
Li Auto: CoT Technology in Dri
Li Auto: Application of Visual Proce
Li Auto: Data Sele
Geely: Application of Visual Proce
Geely: Multimodal Learning Fram
Waymo: Generative World Model G
Tesla: Algorithm Architecture (Including
Tesla: Skeleton, Neck, and Head of Vision Algor
Tesla: Core of Visual System - Hyd
Giga’s World
6 Application Trends of AI Foundation Models
6.1
Tre
Tre
6.2 Algo
Tre
Tre
Tr
Tre
6.3 Computing
Tre
Tre
6.4 Engine
Tr
Tr
