Enterprise Patterns: Scale, Resilience, Integration
Moving from prototype to production requires patterns for multi-tenancy, disaster recovery, model lifecycle, and enterprise integration.
Building a trustworthy AI system is one challenge. Operating it at enterprise scale is another. This post covers the patterns that separate proof-of-concept from production: how to serve multiple business units, survive failures, manage model evolution, and integrate with existing enterprise systems.
Multi-Tenancy: One Platform, Many Customers
Enterprise AI platforms rarely serve a single use case. Legal needs different guardrails than Marketing. Finance has stricter data residency requirements than HR. Yet running separate platforms for each is wasteful and inconsistent.
The Multi-Tenant Challenge: How do you share infrastructure for efficiency while providing isolation for security, compliance, and customization?
Tenant Isolation Models
There are three primary approaches, each with different trade-offs:
| Model | Isolation Level | Cost | Best For |
|---|---|---|---|
| Shared Everything | Logical (tenant ID filtering) | Lowest | Internal teams, low-risk data |
| Shared Compute, Isolated Data | Separate databases, shared services | Medium | Most enterprise scenarios |
| Fully Dedicated | Separate infrastructure per tenant | Highest | Regulated industries, extreme sensitivity |
For most enterprises, the middle path works: shared orchestration and guardrails, but isolated knowledge bases, policies, and audit logs per tenant.
What Gets Scoped Per Tenant
- Policies: Each tenant has its own guardrail rules, approval workflows, and escalation paths
- Model Access: Finance may only use local models; Marketing can use cloud providers
- Cost Quotas: Budget limits and alerts per tenant prevent runaway spending
- Knowledge Bases: Separate RAG sources ensure data doesn’t leak across tenants
- Audit Logs: Tenant-specific logs for compliance, accessible only to that tenant’s admins
- User Roles: RBAC scoped to tenant – HR admin can’t modify Finance policies
# Request carries tenant context throughout the system
request = {
    "tenant_id": "finance_001",
    "user_id": "analyst_jane",
    "query": "Summarize Q3 revenue",
    # Resolved from tenant config
    "allowed_models": ["ollama/llama3", "ollama/mistral"],
    "policy_set": "finance_sox_compliant",
    "knowledge_base": "finance_kb_v3",
    "cost_center": "finance_dept"
}
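How that context gets populated is a gateway concern. Here is a minimal sketch, assuming a simple in-memory tenant registry; the registry contents and the resolve_tenant_context helper are illustrative, not a specific product API:

# Hypothetical tenant registry; a real deployment would back this with a config service
TENANT_REGISTRY = {
    "finance_001": {
        "allowed_models": ["ollama/llama3", "ollama/mistral"],
        "policy_set": "finance_sox_compliant",
        "knowledge_base": "finance_kb_v3",
        "cost_center": "finance_dept"
    },
    "marketing_001": {
        "allowed_models": ["openai/gpt-4", "anthropic/claude-3"],
        "policy_set": "marketing_default",
        "knowledge_base": "marketing_kb_v1",
        "cost_center": "marketing_dept"
    }
}

def resolve_tenant_context(tenant_id: str, user_id: str, query: str) -> dict:
    """Attach tenant-scoped config to an incoming request; reject unknown tenants."""
    config = TENANT_REGISTRY.get(tenant_id)
    if config is None:
        raise PermissionError(f"Unknown tenant: {tenant_id}")
    return {"tenant_id": tenant_id, "user_id": user_id, "query": query, **config}

Downstream services never decide scoping for themselves; they trust the resolved context, which keeps policy enforcement in one place.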
Data Lineage & Provenance
When an AI produces an answer, can you trace it back to its sources? For audits, debugging, and compliance, you need end-to-end lineage tracking.
The Audit Question: Answering “Why did the AI say this?” requires tracing every transformation: which model was used, what context was retrieved, which guardrails fired, and what the original request was.
Request-Level Tracing
Every request gets a unique trace ID that propagates through all services, and parent-child relationships track how requests spawn sub-requests, so the full request tree can be reconstructed later.
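A minimal sketch of what that tree might look like, assuming an OpenTelemetry-style parent/child span model; the field and service names here are illustrative:

# One trace_id ties the parent request and every sub-request together;
# parent_span_id links each child back to the step that spawned it
trace = {
    "trace_id": "abc-123",
    "spans": [
        {"span_id": "s1", "parent_span_id": None, "service": "gateway",      "op": "handle_request"},
        {"span_id": "s2", "parent_span_id": "s1", "service": "retrieval",    "op": "search_knowledge_base"},
        {"span_id": "s3", "parent_span_id": "s1", "service": "guardrails",   "op": "pre_flight_checks"},
        {"span_id": "s4", "parent_span_id": "s1", "service": "model_router", "op": "invoke_model"}
    ]
}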
What Gets Logged
# Lineage record for each request
lineage_record = {
    "trace_id": "abc-123",
    "timestamp": "2026-01-03T14:23:45Z",
    # Input tracking
    "original_query": "...",
    "tenant_id": "finance_001",
    "user_id": "analyst_jane",
    # Processing decisions
    "model_selected": "ollama/llama3",
    "selection_reason": "privacy=local_only, score=0.87",
    "context_sources": ["policy_doc.pdf:3", "faq.md:2.1"],
    # Safety checks
    "guardrails_applied": ["pii_filter", "financial_data"],
    "guardrails_triggered": [],
    # Output
    "response_hash": "sha256:...",  # Don't store actual response
    "tokens_used": 847,
    "latency_ms": 1234
}
With this lineage, you can answer: “What documents informed this response?” “Did any guardrails fire?” “Which model version was used?” – months after the interaction.
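In practice that means the lineage store must be queryable by trace ID. A rough sketch, assuming records like the one above sit in a searchable log; the audit_trace helper is hypothetical:

def audit_trace(lineage_store: list, trace_id: str) -> dict:
    """Answer the common audit questions for a single past interaction."""
    record = next(r for r in lineage_store if r["trace_id"] == trace_id)
    return {
        "sources_used": record["context_sources"],
        "guardrails_fired": record["guardrails_triggered"],
        "model_version": record["model_selected"]
    }

# Example: audit the interaction logged above
print(audit_trace([lineage_record], "abc-123"))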
Model Lifecycle Management
Models aren’t static. New versions ship, performance characteristics drift, and providers deprecate endpoints. A production AI platform needs formal lifecycle management.
Key Lifecycle Patterns
| Pattern | Description | Use Case |
|---|---|---|
| A/B Testing | Route percentage of traffic to new model | Evaluating new model versions |
| Canary Deployment | Start at 5%, increase if metrics hold | Safe rollout of major changes |
| Shadow Mode | Run new model in parallel, compare outputs | Testing without user impact |
| Deprecation Window | Grace period before removing old model | Giving users time to migrate |
| Instant Rollback | Route all traffic back to previous version | Incident response |
# Model registry entry with lifecycle state
model_config = {
    "model_id": "llama3-v2.1",
    "state": "canary",
    "traffic_percentage": 10,
    "promoted_from": "staging",
    "promotion_date": "2026-01-01",
    # Promotion criteria
    "promotion_rules": {
        "min_requests": 1000,
        "max_error_rate": 0.01,
        "min_quality_score": 0.85
    },
    # Rollback config
    "fallback_model": "llama3-v2.0",
    "auto_rollback_on": ["error_rate > 0.05"]
}
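A sketch of how a registry might act on those rules once it receives aggregated canary metrics; the evaluate_canary function and metric names are illustrative, not part of any specific registry:

def evaluate_canary(config: dict, metrics: dict) -> str:
    """Decide whether a canary model is promoted, held, or rolled back."""
    rules = config["promotion_rules"]
    if metrics["error_rate"] > 0.05:  # mirrors auto_rollback_on above
        return "rollback:" + config["fallback_model"]
    if (metrics["requests"] >= rules["min_requests"]
            and metrics["error_rate"] <= rules["max_error_rate"]
            and metrics["quality_score"] >= rules["min_quality_score"]):
        return "promote:production"
    return "hold:canary"

# Example: 2,000 canary requests, 0.4% errors, quality 0.91 -> promote
print(evaluate_canary(model_config, {"requests": 2000, "error_rate": 0.004, "quality_score": 0.91}))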
Disaster Recovery & Resilience
AI services fail. APIs time out. Rate limits kick in. Models go offline for maintenance. A production platform needs graceful degradation, not catastrophic failure.
The Availability Question: What happens when GPT-4 returns a 429? When your local Ollama instance crashes? When the knowledge service is slow? Users shouldn’t see raw errors.
# Circuit breaker configuration
circuit_breaker = {
    "model_id": "openai/gpt-4",
    # Trip conditions
    "failure_threshold": 5,        # failures before tripping
    "failure_window_seconds": 60,  # time window for counting
    # Recovery
    "reset_timeout_seconds": 30,   # wait before testing
    "half_open_requests": 3,       # test requests before closing
    # Fallback chain
    "fallback_models": [
        "anthropic/claude-3",
        "ollama/llama3"
    ]
}
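The config only declares thresholds; the breaker itself is a small state machine. A simplified sketch, ignoring the failure window and half-open request count for brevity, and not tied to any particular resilience library:

import time

class CircuitBreaker:
    """closed -> open after too many failures; half_open after the reset timeout."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_seconds: int = 30):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout_seconds
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = time.monotonic()

    def record_success(self) -> None:
        self.failures = 0
        self.state = "closed"

    def allow_request(self) -> bool:
        if self.state == "open" and time.monotonic() - self.opened_at > self.reset_timeout:
            self.state = "half_open"  # let a few test requests through before closing
        return self.state != "open"

When a model’s breaker is open, the router skips it and walks the fallback chain instead of surfacing a raw error to the user.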
RTO and RPO for AI Services
Traditional DR metrics apply to AI services too:
| Metric | Definition | Typical Target |
|---|---|---|
| RTO (Recovery Time Objective) | How quickly must the service be restored after an outage? | < 5 minutes for failover |
| RPO (Recovery Point Objective) | How much data loss is acceptable? | 0 for requests (stateless), 1 hour for playbooks |
| MTBF (Mean Time Between Failures) | Average operating time between incidents | > 30 days per component |
| MTTR (Mean Time To Recover) | Average incident resolution time | < 15 minutes |
Integration Patterns
AI platforms don’t exist in isolation. They need to integrate with enterprise systems: identity providers, data catalogs, ticketing systems, MLOps pipelines.
Event-Driven Architecture
For loose coupling, emit events that other systems can consume:
# Events emitted by the AI platform
events = [
    # For audit and analytics
    {"type": "request.completed", "trace_id": "..."},
    {"type": "guardrail.triggered", "rule": "pii_detected"},
    # For incident management
    {"type": "model.circuit_opened", "model": "gpt-4"},
    {"type": "request.escalated", "reason": "human_override"},
    # For cost management
    {"type": "quota.threshold", "tenant": "finance", "pct": 80},
    # For learning systems (ACE)
    {"type": "feedback.received", "rating": "negative"}
]
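A sketch of what emitting these might look like behind a generic publish function, in front of whatever message bus the enterprise already runs; the topic naming and the publish_to_bus stand-in are assumptions, not a specific broker API:

import json

def publish_to_bus(topic: str, message: str) -> None:
    # Stand-in for a real message bus client (Kafka, SNS, Pub/Sub, ...)
    print(f"[{topic}] {message}")

def emit_event(event_type: str, payload: dict) -> None:
    """Publish a platform event to a topic derived from its type prefix."""
    topic = "ai-platform." + event_type.split(".")[0]  # e.g. "ai-platform.quota"
    publish_to_bus(topic, json.dumps({"type": event_type, **payload}))

# Example: cost management and audit pipelines can each subscribe independently
emit_event("quota.threshold", {"tenant": "finance", "pct": 80})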
The Trust Connection
Enterprise patterns aren’t just operational necessities – they’re trust enablers:
- Multi-tenancy ensures one team’s misconfiguration doesn’t affect another
- Data lineage enables “show your work” for any decision
- Model lifecycle prevents unexpected changes from impacting production
- Resilience means the platform is there when users need it
- Integration respects existing security and governance controls
Enterprise Trust: At scale, trust isn’t just about individual responses being correct. It’s about the platform being reliable, secure, auditable, and respectful of organizational boundaries. These patterns enable that.
Next in the Series: We’ve covered how to build, learn, and scale trustworthy AI. In Part 9, we’ll connect technical metrics to business value: how do you measure AI ROI, demonstrate compliance, and track adoption?