PROMPT LINEAGE AND GOVERNANCE IN LLM-ENABLED DATA ENGINEERING: A REFERENCE ARCHITECTURE

Shambhu Adhikari

Abstract

Large language models (LLMs) are increasingly integrated into modern data engineering ecosystems to automate data transformation, 
quality assurance, metadata generation, and analytical reasoning. While 
these models enhance productivity and adaptability, they introduce significant governance challenges due 
to their probabilistic behavior and heavy reliance on prompts as executable control artifacts. Unlike 
traditional data pipelines, where logic is encoded in version-controlled code, LLM-enabled systems often 
embed prompts in orchestration layers without formal lifecycle management, lineage tracking, or policy 
enforcement. This absence of prompt governance undermines reproducibility, auditability, and regulatory 
compliance in enterprise data platforms. This paper proposes a reference architecture for prompt lineage 
and governance in LLM-enabled data engineering environments. Drawing on principles from DataOps, 
MLOps, metadata management, and responsible AI, the architecture treats prompts as first-class governed 
assets. It enables versioning, lineage tracking, metadata capture, and policy enforcement across prompt 
creation, deployment, and execution. The proposed architecture integrates with modern lakehouse 
platforms, orchestration engines, and observability tools to provide end-to-end transparency across data, 
prompts, models, and outputs. This study contributes a structured and practical framework to support 
scalable, compliant, and trustworthy adoption of LLMs in data engineering workflows.
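
As an illustration of what treating prompts as first-class governed assets could look like in practice, the following is a minimal Python sketch and not the architecture proposed in the paper. All names here (`PromptRegistry`, `PromptVersion`, `record_execution`, `lineage_for`, the banned-term policy) are hypothetical and chosen only for this example. It assumes content-hash versioning of prompt templates, a simple policy check at registration time, and an in-memory lineage log linking prompt version, model, input dataset, and output dataset.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    """An immutable, versioned prompt (hypothetical structure)."""
    name: str
    template: str
    version_id: str  # derived from the template content


class PromptRegistry:
    """Toy in-memory registry: versioning, policy check, lineage log."""

    def __init__(self, banned_terms=()):
        self._versions = {}
        self._lineage = []
        self._banned = tuple(term.lower() for term in banned_terms)

    def register(self, name, template):
        # Policy enforcement (assumed rule): reject templates with banned terms.
        lowered = template.lower()
        for term in self._banned:
            if term in lowered:
                raise ValueError(f"policy violation: template contains {term!r}")
        # Content-hash versioning: identical text yields an identical version id.
        digest = hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]
        version = PromptVersion(name=name, template=template, version_id=digest)
        self._versions[(name, digest)] = version
        return version

    def record_execution(self, prompt, model, input_dataset, output_dataset):
        # Lineage event tying together data, prompt, model, and output.
        event = {
            "prompt_name": prompt.name,
            "prompt_version": prompt.version_id,
            "model": model,
            "input_dataset": input_dataset,
            "output_dataset": output_dataset,
        }
        self._lineage.append(event)
        return event

    def lineage_for(self, output_dataset):
        # Return every recorded execution that produced the given output.
        return [e for e in self._lineage if e["output_dataset"] == output_dataset]
```

In this sketch, an auditor could call `lineage_for("some_output_table")` to find which prompt version and model produced a given dataset. A production system would presumably persist these records in a metadata catalog rather than in memory.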
