Harnessing the Interplay Between Data and Generative AI
This IDC Perspective covers the symbiotic relationship between data and generative AI. It includes data strategy for GenAI, evolving data architecture, and key architecture decisions for GenAI, such as the use of synthetic data, automated data pipelines, ETL for LLM applications, vector databases, RAG architecture, and prompt libraries. It also provides guidance for building an actionable data strategy for GenAI.
"Generative AI (GenAI) is changing the rules of the game for almost every industry. As organizations embark on their GenAI journey, they need high-quality data sets to tune their base foundation models for domain relevance and accuracy or use GenAI and modern data architecture to unlock insights. Hence, creating an appropriate data strategy is a prerequisite for building and deploying a successful GenAI application," says Ritu Jyoti, group vice president, Worldwide Artificial Intelligence and Automation research at IDC. "Without a data strategy for GenAI, an organization's efforts will be greater than necessary, risks will be magnified, and chances of success will be reduced."
Please Note: Extended description available upon request.
Executive Snapshot
Situation Overview
Data Strategy for GenAI Initiatives: Goals and Components
Data Architecture
Data Management
Data Architecture Decisions for GenAI Initiatives
Augmenting Training Data with Synthetic Samples
Development of Data Pipelines to Connect GenAI Models to High-Quality Data Sources
ETL for LLM Applications
Data Cleaning
Use of Vector Databases to Store and Retrieve Embeddings
RAG Architecture
Prompt Libraries
Unlocking Insights from SQL-Based Data Sources (e.g., Relational Database, Data Lakes, Data Warehouses) Using GenAI
Steps to Building an Actionable Data Strategy for GenAI Initiatives