The Lifecycle
MLOps is the discipline of making model behavior reproducible, deployable, observable, and reversible.
Data version -> Training -> Evaluation gate -> Registry -> Deployment -> Monitoring -> Retraining
LLMOps adds prompt versions, retrieval indexes, tool traces, model-provider changes, hallucination checks, and cost observability.
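One way to make the lifecycle reproducible is to record, for each registered model, the versions of everything that produced it. The sketch below is illustrative only: `RegistryEntry`, its field names, and the fingerprint scheme are assumptions, not the API of any particular registry product.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical registry record tying a model to its inputs."""
    model_name: str
    data_version: str
    code_commit: str
    # LLMOps additions; left empty for classic ML models.
    prompt_version: str = ""
    retrieval_index_version: str = ""

    def fingerprint(self) -> str:
        """Short stable hash of all fields, usable as a lookup key."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
```

Two entries with identical fields share a fingerprint, while changing any one version (for example `data_version`) yields a different one, so a deployed fingerprint can be traced back to exact inputs.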
Production Gates
Before promotion, check:
- Data schema validation
- Training reproducibility
- Offline model metrics
- Slice-based error analysis
- Latency and memory footprint against their budgets
- Cost per request
- Regression tests on golden examples
- Rollback plan
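A subset of the gates above can be sketched as a single function that lists which checks a candidate fails. Everything here is an assumption for illustration: the `Candidate` fields, the gate names, and especially the threshold values, which depend entirely on the product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of a candidate model's pre-promotion checks."""
    schema_valid: bool
    offline_metric: float        # e.g. accuracy on the held-out set
    worst_slice_metric: float    # same metric on the weakest data slice
    p95_latency_ms: float
    cost_per_request_usd: float
    golden_pass_rate: float      # fraction of golden examples passed
    has_rollback_plan: bool


# Illustrative thresholds only; real values are a product decision.
THRESHOLDS = {
    "offline_metric": 0.90,
    "worst_slice_metric": 0.80,
    "p95_latency_ms": 300.0,
    "cost_per_request_usd": 0.01,
    "golden_pass_rate": 1.0,
}


def failed_gates(c: Candidate, t: dict = THRESHOLDS) -> list[str]:
    """Return the names of failed gates; an empty list means promotable."""
    checks = {
        "schema": c.schema_valid,
        "offline_metric": c.offline_metric >= t["offline_metric"],
        "worst_slice_metric": c.worst_slice_metric >= t["worst_slice_metric"],
        "p95_latency_ms": c.p95_latency_ms <= t["p95_latency_ms"],
        "cost_per_request_usd": c.cost_per_request_usd <= t["cost_per_request_usd"],
        "golden_pass_rate": c.golden_pass_rate >= t["golden_pass_rate"],
        "rollback_plan": c.has_rollback_plan,
    }
    return [name for name, ok in checks.items() if not ok]
```

Returning the failed gate names rather than a single boolean makes the reason for a blocked promotion visible in CI logs.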
Monitoring
Monitor four layers:
- Infrastructure: CPU, memory, queue depth, error rate.
- Data: missing values, schema drift, embedding drift.
- Model: prediction distribution, confidence, retrieval quality.
- Product: user outcomes, correction rate, escalation.
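As one example of a data-layer or model-layer check, the population stability index (PSI) compares a live sample of a numeric feature or prediction score against a reference sample. This is a minimal pure-Python sketch, not the only way to measure drift; the equal-width binning, the bin count, and the `1e-6` floor are assumptions. A commonly quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but alert thresholds should be tuned per feature.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population stability index between a reference and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant data

    def fractions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps the logarithm finite for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical samples give a PSI of zero; the value grows as the live distribution moves away from the reference.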
Deployment Patterns
- Shadow mode for observation without user impact.
- Canary for promoting a new version to a small share of traffic before full rollout.
- Blue-green for fast rollback.
- Batch inference for non-real-time use cases.
- Streaming inference for interactive LLM UX.
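A canary split is often implemented by hashing a stable identifier so that the same caller always lands on the same version. The sketch below assumes a hypothetical `route` function and string request or user IDs; in practice the split usually lives in a load balancer or service mesh rather than in application code.

```python
import hashlib


def route(request_id: str, canary_percent: float) -> str:
    """Deterministically send about canary_percent of IDs to the canary."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the hash onto 10,000 buckets for 0.01% granularity.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "canary" if bucket < canary_percent * 100 else "stable"
```

Because routing depends only on the ID, raising `canary_percent` keeps existing canary users on the canary and adds new ones, and setting it to 0 sends all traffic back to the stable version.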
Failure Modes
- No model registry, so the model running in production cannot be reproduced.
- Monitoring only infrastructure while model quality silently decays.
- Retraining on data from biased feedback loops, where the model's own outputs shape the labels it later learns from.
- Prompt changes deployed without eval comparison.
- Vendor model update changing behavior unexpectedly.
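The prompt-change and vendor-update failures above can both be caught by running the old and new versions over the same golden examples and listing what regressed. This is a sketch under stated assumptions: `regressions`, `prompt_v1`, `prompt_v2`, and `GOLDEN` are hypothetical, the two prompt functions are keyword stubs standing in for real model calls, and exact string match stands in for whatever scorer the task actually needs.

```python
from typing import Callable, Iterable, Tuple


def regressions(
    golden: Iterable[Tuple[str, str]],
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
) -> list:
    """Inputs the baseline gets right but the candidate gets wrong."""
    return [
        text
        for text, expected in golden
        if baseline(text) == expected and candidate(text) != expected
    ]


# Stand-ins for two prompt versions; a real system would call the model here.
def prompt_v1(text: str) -> str:
    return "refund" if "money back" in text else "other"


def prompt_v2(text: str) -> str:
    return "refund" if "refund" in text else "other"


GOLDEN = [
    ("I want my money back", "refund"),
    ("Where is my order?", "other"),
]
```

Calling `regressions(GOLDEN, prompt_v1, prompt_v2)` returns the first example, since the new prompt no longer recognizes it. Blocking deployment on a non-empty result turns the golden set into a regression test.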
Operating Habit
Every production model should have an owner, a dashboard, an eval suite, a rollback path, and a written definition of unacceptable behavior.