Governance: Complete Audit Trails
Who did what, when, and why? Governance is about answering those questions definitively.
Trust isn’t just about AI making good decisions. It’s about knowing who accessed the system, what they did, how much it cost, and whether policies were followed. Governance makes AI operations auditable.
The Four Pillars of AI Governance
Access Control: RBAC for AI
Role-Based Access Control (RBAC) maps users to roles, and roles to permissions. This creates a clear hierarchy:
# Role definitions
roles = {
"admin": {
"permissions": ["query", "admin", "audit", "configure"],
"quotas": {"daily_queries": 10000, "daily_cost": 100.0}
},
"analyst": {
"permissions": ["query", "audit"],
"quotas": {"daily_queries": 1000, "daily_cost": 25.0}
},
"viewer": {
"permissions": ["query"],
"quotas": {"daily_queries": 100, "daily_cost": 5.0}
}
}
Trust Insight: With RBAC, you can answer “who has access to what?” instantly. When an audit asks who could have seen sensitive data, you have a definitive answer – not a guess.
How RBAC is Enforced
Services call the governance endpoint before executing any operation. The check is synchronous and fail-secure (deny on error):
# POST /api/v1/access/check
{
"tenant_id": "acme-corp",
"user_id": "alice@acme.com",
"resource": "knowledge-base",
"action": "query"
}
# Response
{
"allowed": true,
"matched_roles": ["analyst"],
"matched_permissions": ["kb-query"]
}
# Or if denied:
{
"allowed": false,
"reason": "No matching permissions"
}
The enforcement pattern: User → Roles lookup → Permissions collected → Match against (resource, action). Every check is logged to the audit trail.
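The fail-secure behavior is the important part: any transport or service error is treated as a denial. A minimal sketch of that pattern, where `call_governance` stands in for the HTTP POST to the governance endpoint (hypothetical signature, not the POC's actual client):

```python
def check_access(call_governance, payload):
    """Return True only on an explicit allowed=True response; deny otherwise."""
    try:
        response = call_governance(payload)
        return response.get("allowed", False) is True
    except Exception:
        # Fail-secure: if the governance service errors out or is
        # unreachable, the operation is denied, never silently allowed.
        return False

payload = {"tenant_id": "acme-corp", "user_id": "alice@acme.com",
           "resource": "knowledge-base", "action": "query"}

def unreachable(p):
    raise ConnectionError("governance service down")

check_access(lambda p: {"allowed": True}, payload)  # True
check_access(unreachable, payload)                  # False: errors deny
```

Denying on error costs some availability when the governance service is down, but it guarantees that an outage can never become an authorization bypass.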
Policy Engine: Rules That Enforce Behavior
Policies define what’s allowed and what happens when rules are violated. Each policy has conditions and actions:
# Policy rule structure
policy_rule = {
"name": "cost-limit-per-request",
"description": "Block requests that exceed $0.50",
"conditions": {
"estimated_cost": {"operator": "gt", "value": 0.50}
},
"action": "BLOCK",
"message": "Request exceeds per-request cost limit"
}
Policy Actions
- BLOCK: Stop the request immediately with an error
- WARN: Allow but log a warning for review
- REQUIRE_APPROVAL: Queue for human approval before execution
- RATE_LIMIT: Slow down rather than block
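Evaluating a rule against a request is a matter of applying each condition's operator and firing the action only when all conditions match. A sketch of such an evaluator, using the operator names (`gt`, `eq`, `not_in`) from the examples in this post; the dispatch table is an assumption, not the POC's engine:

```python
import operator

# Operator names follow the policy examples in this post.
OPERATORS = {
    "gt": operator.gt,
    "eq": operator.eq,
    "not_in": lambda value, excluded: value not in excluded,
}

def evaluate(rule, request):
    """Return the rule's action if every condition matches, else None."""
    for field, cond in rule["conditions"].items():
        check = OPERATORS[cond["operator"]]
        if field not in request or not check(request[field], cond["value"]):
            return None  # a condition failed; this rule does not fire
    return rule["action"]

rule = {
    "name": "cost-limit-per-request",
    "conditions": {"estimated_cost": {"operator": "gt", "value": 0.50}},
    "action": "BLOCK",
}
evaluate(rule, {"estimated_cost": 0.75})  # "BLOCK"
evaluate(rule, {"estimated_cost": 0.10})  # None
```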
Common Policy Patterns
# Sensitive data policy
{
"name": "no-pii-to-external-models",
"conditions": {
"has_pii": {"operator": "eq", "value": True},
"model_type": {"operator": "eq", "value": "external"}
},
"action": "BLOCK"
}
# Time-based policy
{
"name": "off-hours-approval",
"conditions": {
"hour_of_day": {"operator": "not_in", "value": [9,10,...,17]}
},
"action": "REQUIRE_APPROVAL"
}
Cost Tracking: Know Your Spend
AI costs can spiral quickly. GPT-4 at $0.03/1K tokens adds up. Cost tracking provides:
- Real-time visibility into spending
- Per-user and per-role budgets
- Alerts when approaching limits
- Historical trends for forecasting
# Cost tracking aggregates
cost_summary = {
"period": "2024-12-07",
"total_requests": 1247,
"total_tokens": 892350,
"total_cost_usd": 42.50,
"by_model": {
"gpt-4o": {"requests": 423, "cost": 28.40},
"claude-3-haiku": {"requests": 824, "cost": 14.10}
}
}
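A rollup like `by_model` falls out of the raw per-request events. A sketch of that aggregation, assuming each event records its model and cost (field names taken from the audit-entry example later in this post):

```python
from collections import defaultdict

def cost_by_model(events):
    """Aggregate per-request cost events into a per-model rollup."""
    rollup = defaultdict(lambda: {"requests": 0, "cost": 0.0})
    for e in events:
        rollup[e["model"]]["requests"] += 1
        rollup[e["model"]]["cost"] += e["cost_usd"]
    return dict(rollup)

events = [
    {"model": "gpt-4o", "cost_usd": 0.10},
    {"model": "gpt-4o", "cost_usd": 0.20},
    {"model": "claude-3-haiku", "cost_usd": 0.05},
]
cost_by_model(events)  # per-model request counts and summed cost
```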
Real-Time Tracking with Redis
The POC uses Redis for fast, real-time cost aggregation across multiple time periods:
# Redis keys for cost tracking (from governance-service)
usage:daily:{tenant}:{user}:tokens:2024-12-07 # Daily aggregate
usage:weekly:{tenant}:{user}:tokens:2024-12-02 # Weekly aggregate
usage:monthly:{tenant}:{user}:tokens:2024-12 # Monthly aggregate
# Each request increments atomically
redis.incrbyfloat(daily_key, token_count)
# Check current usage in milliseconds
current_usage = redis.get(daily_key) # Fast quota checks
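The increment-then-check flow can be shown end to end without a Redis server by stubbing the two calls used above (`incrbyfloat`, `get`). A sketch, with the key format and `record_usage` helper as assumptions:

```python
from datetime import date

class FakeRedis:
    """In-memory stand-in for the two Redis calls used above."""
    def __init__(self):
        self.store = {}

    def incrbyfloat(self, key, amount):
        # Mirrors Redis INCRBYFLOAT: create-or-increment, return new total.
        self.store[key] = self.store.get(key, 0.0) + amount
        return self.store[key]

    def get(self, key):
        return self.store.get(key)

def record_usage(r, tenant, user, tokens, daily_limit=100_000):
    """Increment today's token count and report whether the quota still holds."""
    key = f"usage:daily:{tenant}:{user}:tokens:{date.today().isoformat()}"
    total = r.incrbyfloat(key, tokens)
    return total <= daily_limit

r = FakeRedis()
record_usage(r, "acme-corp", "alice@acme.com", 500)  # True while under quota
```

Because the increment is a single atomic operation on the server, concurrent requests from multiple service instances cannot lose updates.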
Multi-Period Quotas
Quotas can be enforced at daily, weekly, or monthly levels – each with its own limit and alert threshold:
# Quota configuration
quota = {
"metric_type": "TOKENS",
"quota_type": "daily", # daily | weekly | monthly
"limit": 100000,
"alert_threshold": 0.8, # Alert at 80%
"current_usage": 75000,
"usage_percentage": 0.75
}
# When threshold exceeded, alert generated automatically
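The alert-before-limit behavior reduces to a three-way classification of usage against the threshold and the limit. A sketch over the quota structure above (`quota_status` is a hypothetical helper):

```python
def quota_status(quota):
    """Classify a quota as 'ok', 'alert' (past threshold), or 'exceeded'."""
    pct = quota["current_usage"] / quota["limit"]
    if pct >= 1.0:
        return "exceeded", pct
    if pct >= quota["alert_threshold"]:
        return "alert", pct  # fire the alert before the limit is hit
    return "ok", pct

quota = {"limit": 100_000, "alert_threshold": 0.8, "current_usage": 75_000}
quota_status(quota)  # ("ok", 0.75)
```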
What You Can Measure: Redis-backed tracking gives you real-time visibility into usage by tenant, user, and model. Quota alerts fire before limits are reached, not after.
Audit Logging: Every Action Recorded
Audit logs capture everything: every query, every response, every policy decision. This creates an immutable record for:
- Compliance audits
- Incident investigation
- Usage pattern analysis
- Security forensics
Audit Log Structure
# Each audit entry captures comprehensive context
audit_entry = {
"timestamp": "2024-12-07T14:32:18.234Z",
"event_type": "QUERY_EXECUTED",
"user_id": "alice@company.com",
"role": "analyst",
"request_id": "req_abc123",
"model": "gpt-4o",
"tokens": {"prompt": 847, "completion": 400},
"cost_usd": 0.08,
"policies_evaluated": ["cost-limit", "pii-check"],
"policies_passed": True,
"latency_ms": 1847
}
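One minimal way to persist entries like the one above is an append-only JSON-lines file; a sketch under that assumption (the POC's actual storage backend may differ):

```python
import json

def append_audit(path, entry):
    """Append one audit entry as a single JSON line (append-only log)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")

def read_audit(path):
    """Read entries back for investigation or reporting."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

Append-only writes keep the trail tamper-evident in practice: entries are only ever added, and one JSON object per line makes the log easy to grep, stream, or ship to a SIEM.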
Trust Insight: With complete audit logs, you can reconstruct exactly what happened at any point in time. When something goes wrong, you’re not guessing – you’re analyzing facts.
Compliance Reports
Governance data feeds into compliance reports. Common requirements:
- Access reports: Who accessed what data and when
- Cost reports: Spending by team, user, or model
- Policy reports: What was blocked and why
- Usage reports: Patterns and anomalies
# Compliance report structure
compliance_report = {
"report_type": "MONTHLY_AUDIT",
"period": "2024-12",
"summary": {
"total_requests": 42847,
"unique_users": 127,
"total_cost": 1847.23,
"policies_blocked": 234,
"pii_incidents": 12
},
"top_users_by_cost": [...],
"blocked_by_policy": {...},
"recommendations": [
"Consider increasing analyst quotas - 15% hit limits"
]
}
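The summary fields can be rolled up directly from the audit log. A sketch, assuming entries shaped like the audit example earlier in this post (`summarize` is a hypothetical helper, not `compliance_reporter.py` itself):

```python
def summarize(entries):
    """Roll audit entries up into report-summary fields."""
    return {
        "total_requests": len(entries),
        "unique_users": len({e["user_id"] for e in entries}),
        "total_cost": round(sum(e.get("cost_usd", 0.0) for e in entries), 2),
        "policies_blocked": sum(
            1 for e in entries if not e.get("policies_passed", True)
        ),
    }

entries = [
    {"user_id": "alice", "cost_usd": 0.08, "policies_passed": True},
    {"user_id": "bob",   "cost_usd": 0.12, "policies_passed": False},
    {"user_id": "alice", "cost_usd": 0.05, "policies_passed": True},
]
summarize(entries)  # 3 requests, 2 users, 1 blocked
```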
What to Measure: Governance Metrics
Governance produces measurable signals. Track these to know your AI system is under control:
# Key governance metrics (from compliance_reporter.py)
governance_metrics = {
# Access Control
"active_users": 127,
"total_roles": 5,
"access_denied_count": 34, # Attempts blocked by RBAC
# Policy Enforcement
"policy_violations": 12, # Requests blocked by policy
"pii_incidents": 3, # PII detected and blocked
# Cost Control
"total_cost_usd": 1847.23,
"quota_breaches": 7, # Users who hit limits
"cost_by_model": {"gpt-4o": 1200, "claude": 500},
# Audit Coverage
"total_events_audited": 42847,
"audit_gaps": 0 # Operations without audit record
}
Compliance Status at a Glance: The POC generates compliance reports with overall status (compliant/warning/non-compliant) per section: access control, policy violations, cost tracking, data protection. Red flags surface automatically.
Why Local Governance Matters
Cloud platforms provide governance features, but local control gives you:
- Custom policies: Define rules that match your specific requirements
- Complete audit access: Export full logs, not just summaries
- Integration flexibility: Connect to your existing SIEM, IAM, or compliance tools
- Data sovereignty: Audit logs never leave your control
The Pattern: Governance isn’t about restriction – it’s about accountability. When everyone knows their actions are logged and policies are enforced consistently, trust follows naturally.
Coming Up Next
Governance tracks who did what. But for agentic AI workflows, we need deeper observability. In the final post, we’ll explore Agent Observability – tracking task completion, workflow efficiency, and cost analysis for AI agents.