PROMPT LINEAGE AND GOVERNANCE IN LLM-ENABLED DATA ENGINEERING: A REFERENCE ARCHITECTURE
Abstract
Large language models (LLMs) are increasingly integrated into modern data engineering ecosystems to automate data transformation,
quality assurance, metadata generation, and analytical reasoning. While
these models enhance productivity and adaptability, they introduce significant governance challenges due
to their probabilistic behavior and heavy reliance on prompts as executable control artifacts. Unlike
traditional data pipelines, where logic is encoded in version-controlled code, LLM-enabled systems often
embed prompts in orchestration layers without formal lifecycle management, lineage tracking, or policy
enforcement. This absence of prompt governance undermines reproducibility, auditability, and regulatory
compliance in enterprise data platforms. This paper proposes a reference architecture for prompt lineage
and governance in LLM-enabled data engineering environments. Drawing on principles from DataOps,
MLOps, metadata management, and responsible AI, the architecture treats prompts as first-class governed
assets. It enables versioning, lineage tracking, metadata capture, and policy enforcement across prompt
creation, deployment, and execution. The proposed architecture integrates with modern lakehouse
platforms, orchestration engines, and observability tools to provide end-to-end transparency across data,
prompts, models, and outputs. This study contributes a structured and practical framework to support
scalable, compliant, and trustworthy adoption of LLMs in data engineering workflows.
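
As an illustration of what treating prompts as first-class governed assets could look like in practice, the following is a minimal Python sketch, not the architecture proposed in the paper. It assumes a hypothetical in-memory `PromptRegistry` that derives a version identifier from a content hash, applies a simple banned-term policy when a prompt is registered, and appends one lineage event per execution linking the prompt version, the model, and the datasets involved. All class, method, and field names here are invented for illustration.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PromptVersion:
    """An immutable, versioned prompt (hypothetical structure)."""
    name: str
    template: str
    owner: str
    version_id: str  # short content hash of the template


class PromptRegistry:
    """Hypothetical in-memory registry: versioning, policy check, lineage."""

    def __init__(self, banned_terms=()):
        self._versions = {}
        self._banned = tuple(term.lower() for term in banned_terms)
        self.lineage_log = []

    def register(self, name, template, owner):
        # Policy enforcement at creation time: reject templates that
        # contain any banned term (a deliberately simplistic rule).
        lowered = template.lower()
        for term in self._banned:
            if term in lowered:
                raise ValueError(f"policy violation: template contains {term!r}")
        # Identical template text always maps to the same version id,
        # so a past run can be tied to the exact prompt that produced it.
        version_id = hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]
        prompt = PromptVersion(name, template, owner, version_id)
        self._versions[(name, version_id)] = prompt
        return prompt

    def record_execution(self, prompt, model, input_datasets, output_dataset):
        # Lineage capture at execution time: one event links the prompt
        # version, the model, and the data flowing in and out.
        event = {
            "prompt_name": prompt.name,
            "prompt_version": prompt.version_id,
            "model": model,
            "inputs": list(input_datasets),
            "output": output_dataset,
            "executed_at": datetime.now(timezone.utc).isoformat(),
        }
        self.lineage_log.append(event)
        return event


registry = PromptRegistry(banned_terms=["ssn"])
prompt = registry.register(
    "describe_column",
    "Write a one-line description for column {column}.",
    owner="data-platform",
)
event = registry.record_execution(
    prompt,
    model="example-llm",            # placeholder model name
    input_datasets=["raw.orders"],  # placeholder dataset names
    output_dataset="catalog.column_docs",
)
```

A production system would presumably persist these records in a metadata catalog and emit lineage events to an observability tool rather than keeping them in memory; the sketch only shows how versioning, policy enforcement, and lineage capture fit together around a single prompt.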

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.