China Autonomous Driving Data Closed Loop Research Report, 2025

Publisher Research in China

Published Oct 17, 2025

Length 318 Pages

SKU # RIC20580204

Description

Data Closed-Loop Research: Synthetic Data Accounts for Over 50%, Full-process Automated Toolchain Gradually Implemented

Key Points:
From 2023 to 2025, the proportion of synthetic data increased from 20%-30% to 50%-60%, becoming a core resource to fill long-tail scenarios.
A full-process automated toolchain from collection to deployment is gradually being implemented, helping reduce costs and improve efficiency.
Efficient collaboration of the vehicle-cloud integrated data closed-loop is a key factor in achieving faster iterations.

The autonomous driving data closed loop is essentially a cyclic optimization system spanning collection, transmission, processing, training, and deployment. In 2025, the industry is moving from the 0→1 build-out stage into an era of high quality and high efficiency, where the core challenges are long-tail scenario coverage and cost control. OEMs and Tier 1 suppliers are actively building their own data closed-loop solutions. Through efficient data collection, processing, and analysis, they continuously improve autonomous driving algorithms and thereby enhance the accuracy and stability of intelligent driving systems.

I. From 2023 to 2025, the Proportion of Synthetic Data Increased from 20%-30% to Over 50%

The efficiency of acquiring high-quality data determines how fast intelligent driving evolves. Currently, data sources in the automotive field include trigger-based data uploads from mass-produced vehicles, collection of high-value specific scenarios by dedicated collection vehicles, reconstruction of the physical world from real roadside data, and data synthesis based on world models. The core path to large-scale application of autonomous driving technology is that real data anchors basic capabilities, while synthetic data pushes past capability boundaries. From 2023 to 2025, the mix of real and synthetic data in autonomous driving training data changed significantly, gradually shifting from a real data-dominated model in the early stage to a hybrid model with a growing proportion of synthetic data.

2023: Real data dominates, synthetic data starts (synthetic data accounts for 20%-30%): Real data was still the main body, used mainly for basic scenario training, but it suffered from insufficient coverage of long-tail scenarios. For example, Tesla relied in the early stage on real road test data from over one million vehicles, yet the collection efficiency for extreme scenarios (such as pedestrians darting into the road during heavy rain) was low. Synthetic data accounted for about 20%-30% and was mainly used to supplement long-tail scenarios. Experiments by Applied Intuition show that adding 30% synthetic data in which cyclists appear frequently to the real data significantly improves the perception model's recognition accuracy (mAP score) for cyclists.

2024: Accelerated penetration of synthetic data (proportion rises to 40%-50%): Synthetic data was upgraded from an auxiliary tool to a core production input. Its penetration rate rising to 40%-50% marks the entry of intelligent driving into a new data-driven paradigm. At the end of 2024, the Shanghai High-level Autonomous Driving Demonstration Zone launched a plan to deploy 100 data collection vehicles. With a hybrid model of real data collection plus world model-generated virtual data, the proportion of synthetic data is close to 50%. As another example, Nvidia DRIVE Sim generates synthetic data of distant objects (100-350 meters) to address sparse real-world annotations; after 92,000 synthetic images were added, the detection accuracy (F1 score) for vehicles 200 meters away improved by 33%.

2025: Synthetic data overtakes real data (accounts for over 50%): The ratio of synthetic data to real data moves towards 5:5 or even higher. Academician Wu Hequan pointed out that 90% of the training data for L4/L5 is simulation data, with only 10%-20% of real data retained as a "gene pool" to avoid model deviation. Li Auto is an example of innovative application of synthetic data. It uses world models to reconstruct historical scenarios and expand them into variants (such as turning an ordinary intersection into rainy-night and foggy versions), and automatically generates extreme cases for cyclic training. The proportion of synthetic data at Li Auto exceeds 90%, replacing real-vehicle testing as the means of verifying reliability.
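The variant expansion described above (one reconstructed scenario turned into rainy-night and foggy versions) can be pictured as a simple combinatorial step. The following is a minimal sketch only: the `Scenario` class, its fields, and `expand_variants` are hypothetical names for illustration and do not describe Li Auto's world model pipeline, which generates sensor-level data rather than labels.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class Scenario:
    """Hypothetical scenario descriptor; field names are illustrative only."""
    name: str
    weather: str = "clear"
    time_of_day: str = "day"


def expand_variants(base, weathers, times):
    """Return one variant per (weather, time) pair, skipping the base condition."""
    variants = []
    for weather, time_of_day in product(weathers, times):
        if (weather, time_of_day) == (base.weather, base.time_of_day):
            continue  # the original recording already covers this condition
        variants.append(replace(base, weather=weather, time_of_day=time_of_day))
    return variants


base = Scenario(name="ordinary_intersection")
variants = expand_variants(base, ["clear", "rain", "fog"], ["day", "night"])
for variant in variants:
    print(variant)
```

In practice each variant descriptor would be handed to a generative model or simulator; the sketch only shows how a single real scenario fans out into several conditions.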

According to Lang Xianpeng of Li Auto, in 2023 the company's effective real-vehicle test mileage was about 1.57 million kilometers, at a cost of 18 yuan per kilometer. By the first half of 2025, a total of 40 million kilometers had been tested, including only 20,000 kilometers of real-vehicle testing and 38 million kilometers of synthetic data. The test cost dropped to an average of 0.5 yuan per kilometer. Test quality is also high: a single scenario can be generalized into many related ones, and every test can be fully replayed.
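The ratios implied by these figures can be checked with simple arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative. Note that the quoted real and synthetic mileages do not sum to the 40 million kilometer total, and the text does not say how the remainder is categorized.

```python
# Figures as quoted in the text (first half of 2025 unless noted).
total_km = 40_000_000
real_km = 20_000
synthetic_km = 38_000_000
cost_per_km_2023 = 18.0   # yuan, real-vehicle testing in 2023
cost_per_km_2025 = 0.5    # yuan, blended average in H1 2025

synthetic_share = synthetic_km / total_km                # share of synthetic mileage
real_share = real_km / total_km                          # share of real-vehicle mileage
cost_reduction = 1 - cost_per_km_2025 / cost_per_km_2023 # per-kilometer cost drop
unaccounted_km = total_km - real_km - synthetic_km       # not broken down in the text

print(f"synthetic share: {synthetic_share:.1%}")
print(f"real share:      {real_share:.2%}")
print(f"cost reduction:  {cost_reduction:.1%}")
print(f"unaccounted km:  {unaccounted_km:,}")
```

On these numbers, synthetic mileage is 95% of the total and the per-kilometer cost falls by roughly 97%, consistent with the statement that synthetic data exceeds 90% at Li Auto.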

The advantages of synthetic data lie not only in cost and efficiency but also in a value density that goes beyond human experience. Synthetic data can be generated in batches at extremely low cost, which matches the high-frequency training needs of AI; it can also produce extreme corner cases that humans have never experienced but that still comply with physical laws.

II. Full-process Automated Toolchain from Collection to Deployment is Gradually Implemented, Helping Reduce Costs and Improve Efficiency

The autonomous driving data closed loop has shifted from an early focus on a single link (such as improving annotation efficiency) to an end-to-end automated architecture covering collection, annotation, training, simulation, and deployment. The core breakthrough is to remove data flow barriers using large AI models and cloud-edge collaboration, so that the closed loop can evolve on its own.

LiangDao Intelligence's LD Data Factory is a full-link 4D ground truth solution from collection to delivery. The LD Data Factory toolchain has been delivered to more than a dozen automotive OEMs and Tier 1 suppliers in China, Germany, and Japan. Its automated 4D annotation software has annotated more than 3,300 hours of road-collected data for customers, producing high-quality 4D continuous-frame ground truth; by mid-2025, LiangDao Intelligence had delivered more than 55 million frames of data to a well-known German luxury car brand.

LD Data Factory integrates data collection, automated annotation, manual annotation, quality control, and performance evaluation. The toolchain includes AI preprocessing and VLM-assisted collection, an automated annotation module for target detection, a full-process closed loop of automatic quality inspection, and hybrid cloud and private deployment. Data management and task collaboration run on a unified data management platform, and the core modules include time synchronization and spatial calibration, distributed storage and indexing services, the visual annotation platform LDEditor (full-stack annotation), the automated quality control module LD Validator, and the perception performance evaluation module LD KPI.

MindFlow's main products currently include an integrated data annotation platform, a data management platform (including a vector database), and a model training platform, covering the entire value chain from raw data to model deployment. Users can complete the whole algorithm development process in one place without switching between tools or platforms. The technical highlights of its third-generation MindFlow SEED platform include support for 4D point cloud annotation (lane lines, segmentation), RPA automated processes, and AI pre-annotation covering more than 4,000 functional modules.

Currently, MindFlow serves customers including SAIC Group, Changan Automobile, Great Wall Motors, Geely Automobile, FAW Group, Li Auto, Huawei, Bosch, ECARX, MAXIEYE, NavInfo and RoboSense.

III. Efficient Collaboration of the Vehicle-Cloud Integrated Data Closed-Loop is a Key Factor in Achieving Faster Iterations

The essence of the vehicle-cloud integrated data closed loop is to build a collaborative system that pairs a lightweight vehicle side with an intelligent cloud side, remove data flow barriers, and enable the continuous evolution of intelligent vehicles. The vehicle side collects environmental perception data (such as road conditions and vehicle operation data) in real time and uploads it to the cloud after desensitization, encryption, and compression. The cloud processes massive amounts of data (at PB/EB scale), performs annotation, model training, and algorithm optimization, and pushes the resulting new capabilities back to the vehicle side as OTA upgrades.

The ExceedData data closed-loop solution is a vehicle-cloud integrated solution that has been adopted for mass production by more than 15 automotive OEMs and is deployed in more than 30 mainstream models.

The ExceedData data closed-loop solution consists of the vehicle-side edge computing engine (vCompute), edge data engine (vADS), and edge database (vData), together with the cloud-side algorithm development tool (vStudio), cloud computing engine (vAnalyze), and cloud management platform (vCloud). The solution can reduce data transmission costs by 75%, cloud storage costs by 90%, and cloud computing costs by 33%. According to calculations from one OEM project with ExceedData, total cost can be reduced by 85%.
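How the three per-category reductions combine into an overall figure depends on how spending is split across transmission, storage, and computing, which the text does not state. The sketch below is a rough illustration under an assumed cost mix: `hypothetical_mix` is invented for the example and is not ExceedData's or any OEM's actual breakdown.

```python
# Per-category reductions as quoted in the text.
REDUCTIONS = {"transmission": 0.75, "storage": 0.90, "compute": 0.33}


def blended_reduction(cost_mix):
    """Weighted average of the category reductions for a given baseline cost mix."""
    total = sum(cost_mix.values())
    return sum(cost_mix[name] / total * REDUCTIONS[name] for name in REDUCTIONS)


# Assumption for illustration only: baseline spending dominated by storage.
hypothetical_mix = {"transmission": 0.20, "storage": 0.75, "compute": 0.05}
overall = blended_reduction(hypothetical_mix)
print(f"blended reduction: {overall:.1%}")
```

Because the blended value is a weighted average, it always lies between 33% and 90%. An overall reduction near 85% is therefore only reachable when storage and transmission account for most of the baseline cost.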

Among OEMs, Xpeng Motors is an example. Its self-built cloud-side model factory has a computing power reserve of 10 EFLOPS in 2025, and the end-to-end iteration cycle has been shortened to an average of 5 days, supporting a rapid closed loop from cloud-side pre-training to vehicle-side model deployment.

Xpeng launched China's first 72-billion-parameter multimodal world base model for L4 highly automated driving. The model has chain-of-thought (CoT) reasoning capabilities, can simulate human common-sense reasoning, and generates control signals. Through model distillation, the capabilities of the base model are transferred to a small vehicle-side model, enabling deployments that are compact yet highly capable.
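For readers unfamiliar with model distillation, the sketch below shows the generic temperature-scaled distillation loss commonly used to train a small student model to match a large teacher's output distribution. It is a textbook formulation in plain Python, not Xpeng's training code; the function names and the default temperature are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


teacher = [2.0, 1.0, 0.1]
print(distillation_loss(teacher, [2.0, 1.0, 0.1]))  # student matches teacher
print(distillation_loss(teacher, [0.1, 1.0, 2.0]))  # student disagrees
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge; in a real pipeline it would usually be combined with a task loss on ground-truth labels.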

High-value data (such as corner cases) is first screened by a rule engine on the vehicle side. The cloud then uses synthetic data generation technologies (such as GANs and diffusion models) to fill data gaps and improve model generalization. At the same time, end-to-end (E2E) and VLA models integrate multimodal inputs and output control commands directly; they rely on cloud-side large model training (such as Xpeng's 72-billion-parameter base model) to achieve lightweight deployment on the vehicle side.
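A vehicle-side rule engine of the kind mentioned above can be thought of as a set of named predicates evaluated on each data frame, with only triggering frames marked for upload. The following is a minimal sketch: the `Frame` fields, rule names, and thresholds are all hypothetical and are not taken from any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical per-frame signals; field names and units are illustrative."""
    speed_kmh: float
    decel_ms2: float                # positive values mean braking
    driver_takeover: bool
    min_obstacle_distance_m: float


# Illustrative trigger rules with assumed thresholds.
RULES = {
    "hard_brake": lambda f: f.decel_ms2 >= 6.0,
    "driver_takeover": lambda f: f.driver_takeover,
    "close_obstacle_at_speed": lambda f: f.speed_kmh > 30
    and f.min_obstacle_distance_m < 2.0,
}


def triggered_rules(frame):
    """Return the names of all rules that fire for this frame."""
    return [name for name, rule in RULES.items() if rule(frame)]


def should_upload(frame):
    """A frame is kept for upload if at least one rule fires."""
    return bool(triggered_rules(frame))


cruising = Frame(speed_kmh=60, decel_ms2=0.5, driver_takeover=False,
                 min_obstacle_distance_m=25.0)
emergency = Frame(speed_kmh=60, decel_ms2=7.2, driver_takeover=True,
                  min_obstacle_distance_m=1.5)
print(triggered_rules(cruising), should_upload(cruising))
print(triggered_rules(emergency), should_upload(emergency))
```

Keeping the rules as data rather than hard-coded branches means new triggers could in principle be pushed from the cloud without changing the evaluation loop, which fits the vehicle-cloud division of labor described in this section.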

As the entire intelligent driving system becomes model-based, car companies are pursuing lower cost, higher efficiency, and more stable services in the data closed loop. The delivery model for intelligent driving is shifting rapidly from delivering code for single-vehicle deployment to one centered on subscription-based cloud services. An efficiently collaborating vehicle-cloud integrated data closed loop is the key for intelligent vehicles to achieve faster AI-driven iteration.

Please Note: PDF E-mail from Publisher purchase option allows up to 10 users and does not allow printing or editing. This functionality will require a Global Site License.

Table of Contents
1 Overview/Trends of Autonomous Driving Data Closed-Loop: 1.1 Overview of Data Closed-Loop; One-stop Cases of Data Intelligence Platforms; Comparison of Data Closed-Loop Deployment Case Strategies; 1.2 Data Closed-Loop Moves Towards the Era of Full-Stack Self-Evolution; 1.3 Summary of Data Closed-Loop Progress Cases (1); 1.3 Summary of Data Closed-Loop Progress Cases (2); 1.3 Summary of Data Closed-Loop Progress Cases (3); 1.4 Data Closed-Loop Cooperation Models; 1.5 Summary of OEMs’ Data Closed-Loop Related Cooperation (1); 1.5 Summary of OEMs’ Data Closed-Loop Related Cooperation (2); 1.5 Summary of OEMs’ Data Closed-Loop Related Cooperation (3); 1.6 Trend 1; 1.7 Trend 2; 1.8 Trend 3; 1.9 Trend 4; 1.10 Trend 5; 1.11 Trend 6; Accelerated Popularization of High Computing Power on the Vehicle Side; Comparison of Major Autonomous Driving Chips; Comparison of Cloud-Side Computing Power and Intelligent Computing Centers; Case Analysis of Intelligent Computing Centers
2 Research on High-Quality Data Collection/Synthetic Simulation: 2.1 High-Quality Data Collection; Case 1: Lan-You Technology; Case 2: Kunyi Electronic; Case 3: TZTEK; Case 4: Keymotek; Case 5: EMQ Technologies; Case 6: ExceedData; Case 7: CARLINX; Case 8: YOOTTA; 2.2 Synthetic/Simulation Data; Overview of Autonomous Driving Synthetic Data; Advantages and Challenges of Synthetic Data; Summary of Synthetic Data Application Scenarios; Changes in the Proportion of Synthetic Data Applications; World Model-Based Data Synthesis Technology (1); World Model-Based Data Synthesis Technology (2); World Model-Based Data Synthesis Technology (3); World Model-Based Data Synthesis Technology (4); Case 1: Synkrotron; Toolchain Products (1); Toolchain Products (2); Data Management Platform; Synthetic Data Solutions (1); Synthetic Data Solutions (2); Traffic Flow Synthetic Data Platform for Advanced Intelligent Driving; Case 2: 51SIM; End-to-End Data-Driven Closed-Loop (1); End-to-End Data-Driven Closed-Loop (2); End-to-End Data-Driven Closed-Loop (3); Case 3: WayLancer; Data Products; Data Closed-Loop (1); Data Closed-Loop (2); Data Closed-Loop (3); Case 4: ThousandSim; Case 5: Lightwheel AI
3 Research on Data Storage/Processing: 3.1 Data Storage; Case 1: JOYNEXT; Case 2: MacrooSAN Technology; Case 3: Alibaba Cloud; Case 4: Baidu; Case 5: Tencent Intelligent Mobility; Case 5: Tencent Data Closed-Loop Platform (1); Case 5: Tencent Data Closed-Loop Platform (2); Case 6: AWS; 3.2 Efficient Data Processing; Case 1: Lan-You Technology; Case 2: ExceedData; Case 3: Keymotek; Case 4: Synkrotron; Case 5: Alibaba Intelligent Driving Data Preprocessing Solution
4 Research on Automated (AI) Annotation: Summary: Comparison of Automated Annotation Solutions; 4.1 Rere Data; Profile; Intelligent Driving Solutions of Retention Data; Enable AI Intelligent Data Annotation Platform; Data Collection Services; Data Security Management; 4.2 MindFlow; Profile; Data Service Solutions (1); Data Service Solutions (2); Third-Generation MindFlow SEED Platform; 4D Point Cloud Processing; Development Dynamics; 4.3 StardustAI; Profile; Self-Developed Algorithms; Rosetta Annotation Platform; MorningStar AI Data Management Platform; COSMO Large Model Data Pyramid Solution; Autonomous Driving Service Scenarios; Autonomous Driving Service Cases and Technical Capabilities; Data Annotation Service Customers; 4.4 Datatang; Intelligent Driving Solutions; Intelligent Driving Training Datasets; Comparison of Intelligent Driving Training Datasets; Shujiajia Pro Artificial Intelligence Data Annotation Platform; 4.5 Databaker Technology; Profile; Development History; Easy Collection Tool; 4D-BEV Annotation Tool; AI Data Platform (1); AI Data Platform (2); AI Data Platform (3); Large Model Data Solutions; Large-Scale High-Quality Datasets; 4.6 Boden AI; Profile; Product Matrix; Datasets; Autonomous Driving Datasets; Autonomous Driving Solutions; BASE Data Annotation Platform; 4D Point Cloud Annotation; BBot Agent Platform; Cooperation Cases; 4.7 ByteTree AI; Technology Layout; Full-link Data Services; Shanhai Data Management Platform; Intelligent Driving Data Closed-Loop Solutions; Ground Truth Reuse Solutions; 4D Dynamic Automated Annotation Large Model; Data Closed-Loop Capabilities; Cooperation Partners; Cooperation Cases
5 Research on Algorithms and Model Training: Algorithm Evolution; Algorithm Architecture Evolution; Comparison of Core Algorithm Architectures of OEMs; VLA Development Status; Latest Progress of VLA Solutions in Data Closed-Loop; Latest Progress of OEMs/Tier 1s in VLA Solutions; Comparative Analysis of Advanced Large Models; Case 1: DeepRoute.ai; Intelligent Driving Mileage and Commercialization Progress; End-to-End Technical Solutions; Data Closed-Loop Capabilities; Case 2: Nullmax; Data Closed-Loop Technology Progress; Platform-Based BEV-AI Architecture Design; One-Model End-to-End Core Technology; MaxDrive Platform-Based Solutions; Latest Development Dynamics; Case 3: iMotion Automotive Technology; Core Competitiveness; Intelligent Driving Technology and Model Training; Large Model R&D System; Advanced Parking and Driving Algorithm Platform and Products; Data Closed-Loop Capabilities; Case 4: Momenta; Data Closed-Loop and Mass Production Implementation; Mass Production/Cooperation Dynamics; Overview of World Models; Overview of World Models; Summary of Latest World Models; Core Architectures of Mainstream World Models; Development Direction of Synthetic Data for World Models; Case 1: SenseAuto; Case 2: YOOTTA; Case 3: Company H; Case 4: Horizon Robotics; Case 5: Xiaomi; Case 6: Wayve
6 Research on Representative Suppliers of Data Closed-Loop Technology: Summary: Comparison of Data Closed-Loop Technology Solutions of Representative Suppliers; 6.1 WUWEN.AI; Profile; Core Technologies; Data Closed-Loop Management Platform; Simulation Verification Platform; AI Data Annotation Platform; 6.2 LiangDao Intelligence; Data Factory; Core Modules of Data Factory; 4D Ground Truth Toolchain; Continuous Frame 4D Annotation; Customers; 6.3 ExceedData; Profile; Data Base (1); Data Base (2); Data Base (3); Vehicle-Cloud Full-Stack Products; Vehicle-Cloud Computing Engine; Empowerment of Vehicle-Cloud Computing Architecture; vADS Intelligent Driving Data Engine; vData Edge Database; vStudio Algorithm Development Tool; 6.4 Freetech; Intelligent Driving Platform ODIN3.0; FUZE Middleware Platform; Software and Algorithms; Data Closed-Loop Services; Product Matrix; Development Dynamics; Cooperation Partners; 6.5 MAXIEYE; Haishi Data Intelligent System; Mass Production Data Mileage; 6.6 Ruqi Mobility; Data Closed-Loop Flywheel; Annotation Base; Operation Data; 6.7 Yoocar; Profile; Development Updates; “DriveCloud” Intelligent Computing Solution; 6.8 Roadgrids; Automated Mass Production Mapping Capability Technology Architecture; Data Closed-Loop; 6.9 NavInfo; AI Infra-Empowered Data Closed-Loop; Services/Cloud Cooperation; 6.10 Kotei Informatics
7 Research on Typical OEMs’ Data Closed-Loop: 7.1 XPeng Motor; Summary of Data Closed-Loop and Software Supply Chain; Computing Power Data Center and Platform; Data Management Platform; Autonomous Driving Base Model; 7.2 Xiaomi Auto; Summary of Data Closed-Loop and Software Supply Chain; Delivery Data Statistics; Data Training; End-to-End Assisted Driving; Intelligent Driving Physical World Modeling System; End-to-End OTA Deployment; 7.3 NIO; Summary of Data Closed-Loop and Software Supply Chain; World Model (1); World Model (2); Delivery Data; Intelligent Function OTA Deployment; 7.4 Li Auto; Summary of Data Closed-Loop and Software Supply Chain; Model Training (1); Model Training (2); E2E Training Data Scale; Intelligent Driving Data Training Volume; Intelligent Function OTA Deployment; 7.5 Leapmotor; Summary of Data Closed-Loop and Software Supply Chain; Delivery Data Analysis; 7.6 IM Motors; Summary of Data Closed-Loop and Software Supply Chain; 7.7 Tesla; Summary of Data Closed-Loop and Software Supply Chain; 7.8 BYD; Summary of Data Closed-Loop and Software Supply Chain; End-to-End Large Model Training; Summary of DiPilot Series Assisted Driving Functions; Global Deployment; 7.9 Geely Automobile; Summary of Data Closed-Loop and Software Supply Chain; Full-domain AI Intelligence (1); Full-domain AI Intelligence (2); Brand/Global Production Capacity Layout; 7.10 FAW Group; Summary of Data Closed-Loop and Software Supply Chain; Intelligent Upgrade (1); Intelligent Upgrade (2); Intelligent Upgrade (3); 7.11 GAC; Data Closed-Loop System (1); Data Closed-Loop System (2); Data Closed-Loop System (3); 7.12 Summary of Changan Automobile Data Closed-Loop and Software Supply Chain; 7.13 Dongfeng Motor's One Core, Two Bases, Two Elements System; 7.14 Summary of Dongfeng Nissan Data Closed-Loop and Software Supply Chain; 7.14 Dongfeng Nissan Autonomous Driving Software Solutions and Supply Chain Construction; 7.15 Summary of Volkswagen Data Closed-Loop and Software Supply Chain; 7.16 Summary of Toyota Data Closed-Loop and Software Supply Chain

Pricing

Single User Email from Publisher $4,300
Global Site License Email from Publisher $6,400
