AI Engineering Updated May 15, 2026

MLOps & LLMOps

Production practices for model lifecycle, evaluation gates, deployment, monitoring, drift detection, and LLM observability.

The Lifecycle

MLOps is the discipline of making model behavior reproducible, deployable, observable, and reversible.

Data version -> Training -> Evaluation gate -> Registry -> Deployment -> Monitoring -> Retraining

LLMOps adds prompt versions, retrieval indexes, tool traces, model-provider changes, hallucination checks, and cost observability.
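One way to make those extra moving parts reproducible is to pin them together in a single release record. The sketch below is illustrative only: the `LLMRelease` class, its field names, and the 12-character fingerprint are assumptions made for this example, not part of any particular tool.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LLMRelease:
    """Hypothetical record of everything that defines one LLM app release."""
    prompt_version: str
    retrieval_index_version: str
    provider_model_id: str
    tool_schema_version: str

    def fingerprint(self) -> str:
        # Serialize with sorted keys so the same fields always hash the same way.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


if __name__ == "__main__":
    release = LLMRelease("prompt-v7", "index-2026-05-01", "provider-model-a", "tools-v3")
    # Attach this value to traces and eval runs so they can be tied back to one release.
    print(release.fingerprint())
```

Logging the fingerprint with every trace means a change in behavior can be matched to the exact prompt, index, or provider model that changed.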

Production Gates

Before promotion, check:

Data schema validation
Training reproducibility
Offline model metrics
Slice-based error analysis
Latency and memory against serving budgets
Cost per request
Regression tests on golden examples
Rollback plan
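Several of these checks can be automated as one promotion gate. The following is a minimal sketch under stated assumptions: the report keys (`accuracy`, `slice_accuracy`, `p95_latency_ms`, `cost_per_request_usd`, `golden_failures`, `rollback_target`) and the limit names are hypothetical, and schema validation and training reproducibility are assumed to be checked upstream.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str


def evaluate_gates(candidate: dict, baseline: dict, limits: dict) -> list[GateResult]:
    """Compare a candidate's offline report against the baseline and fixed budgets."""
    results = []

    # Offline metric: the candidate may not fall more than the allowed margin below baseline.
    drop = baseline["accuracy"] - candidate["accuracy"]
    results.append(GateResult("offline_metric", drop <= limits["max_accuracy_drop"],
                              f"accuracy drop vs baseline: {drop:+.3f}"))

    # Slice analysis: the weakest slice must clear a floor, so an average cannot hide it.
    worst_slice, worst_acc = min(candidate["slice_accuracy"].items(), key=lambda kv: kv[1])
    results.append(GateResult("slice_floor", worst_acc >= limits["min_slice_accuracy"],
                              f"worst slice {worst_slice}: {worst_acc:.3f}"))

    # Latency and cost budgets.
    results.append(GateResult("latency", candidate["p95_latency_ms"] <= limits["max_p95_latency_ms"],
                              f"p95 latency: {candidate['p95_latency_ms']} ms"))
    results.append(GateResult("cost", candidate["cost_per_request_usd"] <= limits["max_cost_per_request_usd"],
                              f"cost per request: {candidate['cost_per_request_usd']} USD"))

    # Golden examples: any failure blocks promotion.
    results.append(GateResult("golden_examples", candidate["golden_failures"] == 0,
                              f"golden failures: {candidate['golden_failures']}"))

    # Rollback plan: a previous version to return to must be recorded.
    results.append(GateResult("rollback_plan", bool(candidate.get("rollback_target")),
                              f"rollback target: {candidate.get('rollback_target')!r}"))
    return results


def can_promote(results: list[GateResult]) -> bool:
    return all(r.passed for r in results)
```

Returning every result, rather than stopping at the first failure, gives the reviewer the full picture in one run.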

Monitoring

Monitor four layers:

Infrastructure: CPU, memory, queue depth, error rate.
Data: missing values, schema drift, embedding drift.
Model: prediction distribution, confidence, retrieval quality.
Product: user outcomes, correction rate, escalation.
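For the model layer, a shift in the prediction distribution can be tracked with a drift score. The sketch below uses the population stability index (PSI), one common choice and not something this page prescribes. The equal-width binning, the `eps` floor, and the alert thresholds mentioned afterwards are assumptions to tune per model.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference window and a current window of scores."""
    if not reference or not current:
        raise ValueError("both windows must be non-empty")
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference window has no spread to bin")
    width = (hi - lo) / bins

    def fractions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    ref = fractions(reference)
    cur = fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


if __name__ == "__main__":
    last_month = [i / 100 for i in range(100)]        # scores spread over [0, 1)
    this_week = [0.5 + i / 200 for i in range(100)]   # scores bunched in the upper half
    print(population_stability_index(last_month, this_week))
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but those cutoffs are conventions rather than guarantees. The same function can be applied to a single input feature to watch the data layer.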

Deployment Patterns

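Shadow mode to run the candidate on live traffic and log its outputs without serving them to users.
Canary to promote gradually, starting with a small share of traffic.
Blue-green to keep the previous version running, so rollback is a traffic switch.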
Batch inference for non-real-time use cases.
Streaming inference for interactive LLM UX.
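As an illustration of the canary pattern, traffic can be split deterministically by hashing a stable request or user key. This is a minimal sketch: the `route` function, the 10,000-bucket granularity, and the key format are assumptions made for the example.

```python
import hashlib


def route(key: str, canary_percent: float) -> str:
    """Return "canary" or "stable" for a key, given the canary share in percent (0 to 100)."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Map the key to one of 10,000 buckets, giving 0.01% granularity.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "canary" if bucket < canary_percent * 100 else "stable"


if __name__ == "__main__":
    # The same key always lands on the same side, so a user does not flip between versions.
    print(route("user-42", canary_percent=5))
```

Because buckets are ordered, raising the percentage only adds keys to the canary group. Keys already routed to the canary stay there as the rollout widens.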

Failure Modes

No model registry, so production cannot be reproduced.
Monitoring only infrastructure while model quality silently decays.
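Retraining on data from biased feedback loops, where the model's own outputs shape the labels it later learns from.
Prompt changes deployed without comparing eval results against the current version.
A vendor model update that changes behavior unexpectedly.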
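The prompt-change and vendor-update failure modes can both be caught by rerunning a golden set against the current and the proposed configuration. The sketch below assumes each configuration is wrapped as a plain function from question to answer. The substring check stands in for a real grader, and all names are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def compare_on_golden(
    golden: list[tuple[str, str]],
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
) -> dict:
    """Run both configurations on the golden set and report what changed."""
    regressions, fixes = [], []
    for question, expected in golden:
        # Simplified grading: the expected string must appear in the answer.
        base_ok = expected.lower() in baseline(question).lower()
        cand_ok = expected.lower() in candidate(question).lower()
        if base_ok and not cand_ok:
            regressions.append(question)
        elif cand_ok and not base_ok:
            fixes.append(question)
    # Any regression blocks the change, even if the candidate also fixes other cases.
    return {"regressions": regressions, "fixes": fixes, "ship": not regressions}


if __name__ == "__main__":
    golden = [("Capital of France?", "Paris"), ("2 + 2?", "4")]
    old_prompt = lambda q: "Paris" if "France" in q else "I am not sure"
    new_prompt = lambda q: "The answer is 4" if "2 + 2" in q else "Lyon"
    print(compare_on_golden(golden, old_prompt, new_prompt))
```

Scheduling the same comparison against an unchanged prompt also surfaces behavior changes that come from the provider side.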

Operating Habit

Every production model should have an owner, a dashboard, an eval suite, a rollback path, and a written definition of unacceptable behavior.