Large Language Models in Modern Data Engineering: A Systematic Review of Architectures, Use Cases, and Limitations


Shambhu Adhikari

Abstract

The rapid advancement of large language models (LLMs) since 2022 has significantly reshaped modern data engineering practices. Originally developed for natural language processing tasks, LLMs are increasingly integrated into data engineering workflows, including data ingestion, schema inference, metadata generation, transformation logic synthesis, data quality monitoring, and natural-language interaction with analytical systems. This systematic review examines the role of LLMs in contemporary data engineering, focusing on architectural integration patterns, practical use cases across the data lifecycle, and inherent limitations affecting reliability and governance. Following PRISMA-informed guidelines, peer-reviewed articles, preprints, and industry reports published between 2022 and 2025 were analyzed. The review identifies Retrieval-Augmented Generation (RAG), hybrid vector-database architectures, and agent-based orchestration frameworks as the dominant deployment strategies. Evidence suggests that LLM-assisted pipelines improve developer productivity, reduce manual coding overhead, and make data platforms more accessible to non-technical stakeholders. However, persistent challenges remain, including hallucinations, data privacy risks, limited explainability, operational costs, and scalability constraints. The findings emphasize the need for robust architectural safeguards, evaluation benchmarks, and governance frameworks to ensure safe and effective production adoption. This review contributes a structured taxonomy of LLM-centric data engineering architectures and outlines future research directions to support trustworthy, scalable, and auditable data platforms.


