Cost Tracking System#
Overview#
The Cost Tracking system provides real-time monitoring and analysis of AI token consumption and costs across Marcus projects. It tracks actual usage patterns rather than relying on naive time-based estimates, enabling precise cost attribution and intelligent resource management.
Architecture#
Core Components#
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  TokenTracker   │    │ AIUsageMiddleware│    │  AI Providers   │
│                 │◄───┤                  │◄───┤  (Anthropic,    │
│ - Per-project   │    │ - Method wrapping│    │   OpenAI, etc.) │
│   tracking      │    │ - Context mgmt   │    │                 │
│ - Rate calc     │    │ - Auto-tracking  │    │                 │
│ - Cost proj     │    │                  │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │
         │                       │
         ▼                       ▼
┌─────────────────┐    ┌──────────────────┐
│ data/token_     │    │ Conversation     │
│ usage.json      │    │ Logger           │
│                 │    │                  │
│ - Historical    │    │ - Usage alerts   │
│   persistence   │    │ - Cost tracking  │
└─────────────────┘    └──────────────────┘
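The persistence box in the diagram can be sketched as follows. This is a minimal sketch, not the Marcus implementation: only the `data/token_usage.json` path comes from the diagram, while the `save_usage` and `load_usage` helpers and the JSON layout are assumptions made for illustration.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict


def save_usage(path: Path, tokens: Dict[str, int], costs: Dict[str, float]) -> None:
    """Write per-project totals to JSON atomically (hypothetical layout)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"project_tokens": tokens, "project_costs": costs}
    # Write to a temp file in the same directory, then swap it into place,
    # so a crash mid-write never leaves a truncated file behind.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    os.replace(tmp_name, path)


def load_usage(path: Path) -> Dict[str, dict]:
    """Load totals, returning empty buckets if the file does not exist yet."""
    if not path.exists():
        return {"project_tokens": {}, "project_costs": {}}
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Writing through a temporary file is one way to keep the historical data readable after an interrupted write; whether the real tracker does this is not stated in this document.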
Data Flow#
1. AI Call Interception: Middleware wraps all AI provider methods
2. Context Resolution: Determines project/agent context for attribution
3. Token Extraction: Parses API responses for usage data
4. Real-time Tracking: Updates counters and rate calculations
5. Cost Calculation: Applies pricing models for cost estimation
6. Persistence: Stores historical data for trend analysis
7. Monitoring: Alerts on usage anomalies and cost spikes
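The token extraction step can be illustrated with a small helper. This is a sketch under stated assumptions: the function name `extract_token_usage` is hypothetical, and responses are assumed to have already been converted to plain dicts. The field names follow the usage blocks that Anthropic-style (`input_tokens`/`output_tokens`) and OpenAI chat-completion-style (`prompt_tokens`/`completion_tokens`) responses report.

```python
from typing import Any, Mapping, Tuple


def extract_token_usage(response: Mapping[str, Any]) -> Tuple[int, int]:
    """Return (input_tokens, output_tokens) from a dict-shaped API response."""
    usage = response.get("usage") or {}
    # Anthropic-style responses report input_tokens / output_tokens.
    if "input_tokens" in usage or "output_tokens" in usage:
        return int(usage.get("input_tokens", 0)), int(usage.get("output_tokens", 0))
    # OpenAI chat-completion-style responses report prompt_tokens / completion_tokens.
    if "prompt_tokens" in usage or "completion_tokens" in usage:
        return int(usage.get("prompt_tokens", 0)), int(usage.get("completion_tokens", 0))
    # Unknown shape: record zero rather than failing the wrapped AI call.
    return 0, 0
```

Returning zeros for an unrecognized shape keeps tracking from breaking the AI call itself, at the price of silently under-counting, so a real implementation would probably also log the miss.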
Integration with Marcus Ecosystem#
Project Lifecycle Integration#
The cost tracking system integrates at multiple points in the Marcus workflow:
graph TD
A[create_project] --> B[register_agent]
B --> C[request_next_task]
C --> D[AI Analysis Engine]
D --> E[Cost Tracking Middleware]
E --> F[Token Tracker]
F --> G[report_progress]
G --> H[Cost Reporting]
H --> I[report_blocker]
I --> J[finish_task]
subgraph "Cost Tracking Layer"
E
F
H
end
Service Dependencies#
AI Analysis Engine: Primary integration point for tracking AI usage
MCP Server: Imports cost tracking for agent context management
Conversation Logger: Receives cost alerts and usage notifications
Project Registry: Provides project context for attribution
Memory System: Could leverage usage patterns for optimization
When Cost Tracking is Invoked#
Automatic Triggers#
Agent Registration: Sets up project context for new agents
AI Provider Calls: Every call to wrapped AI methods triggers tracking
Task Assignment: Context switches update project attribution
Progress Reporting: Cost metrics included in progress updates
Background Monitoring: Continuous rate calculation and anomaly detection
Manual Triggers#
Context Manager Usage: Explicit project token tracking scopes
Direct TokenTracker Calls: Manual token logging for custom scenarios
Stats Queries: On-demand cost and usage reporting
What Makes This System Special#
Real-time Rate Calculation#
Unlike traditional hourly billing, the system calculates:
Current Spend Rate: Tokens/hour over last 5 minutes
Average Spend Rate: Session-wide usage patterns
Sliding Window Analysis: Recent vs. historical usage trends
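The difference between the current and average rates can be shown with a standalone function. This is a minimal sketch: `spend_rates` and its parameters are hypothetical names, and events are assumed to be dicts with `timestamp` and `tokens` keys in chronological order.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def spend_rates(events: List[Dict], now: datetime, window_minutes: int = 5) -> Dict[str, float]:
    """Return current (windowed) and session-average rates in tokens/hour."""

    def rate(subset: List[Dict]) -> float:
        if len(subset) < 2:
            return 0.0  # a rate needs at least two timestamps
        seconds = (subset[-1]["timestamp"] - subset[0]["timestamp"]).total_seconds()
        if seconds <= 0:
            return 0.0  # guard against division by zero
        return sum(e["tokens"] for e in subset) / seconds * 3600

    cutoff = now - timedelta(minutes=window_minutes)
    recent = [e for e in events if e["timestamp"] > cutoff]
    return {"current": rate(recent), "average": rate(events)}
```

A burst of calls in the last few minutes raises `current` well above `average`, which is the signal the anomaly detection relies on.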
Intelligent Cost Projection#
# Example projection logic
def _project_total_cost(self, project_id: str, current_rate: float) -> float:
    current_cost = self.project_costs[project_id]
    # Assumes the project is 20% complete, hence the 5x multiplier
    return current_cost * 5  # Placeholder - could integrate task completion %
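The placeholder above hard-codes 20% completion. A possible refinement, sketched here with a hypothetical `project_total_cost` function that takes the completion fraction as input, scales the spend so far by the inverse of the fraction of work done. Where that fraction would come from (for example, completed tasks over total tasks) is an assumption and is not defined in this document.

```python
def project_total_cost(current_cost: float, completion_fraction: float) -> float:
    """Scale spend-to-date by the inverse of the fraction of work completed."""
    if not 0.0 < completion_fraction <= 1.0:
        raise ValueError("completion_fraction must be in (0, 1]")
    return current_cost / completion_fraction
```

With `completion_fraction=0.2` this reduces to the 5x multiplier used by the placeholder.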
Anomaly Detection#
Spend Spike Alerts: Detects usage > 2x average and > 10k tokens/hour
Background Monitoring: Continuous rate tracking with 1-minute intervals
Historical Pattern Analysis: Maintains 1000-event history per project
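The spike rule combines a relative and an absolute threshold. A minimal sketch, using the 2x and 10k tokens/hour values listed above and a hypothetical function name:

```python
def is_spend_spike(current_rate: float, average_rate: float,
                   ratio: float = 2.0, floor: float = 10_000.0) -> bool:
    """True when the rate exceeds both ratio x average and the absolute floor."""
    return current_rate > average_rate * ratio and current_rate > floor
```

Requiring both conditions avoids alerts on small projects, where doubling a tiny average is noise, and on busy projects whose rate is high but steady.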
Context-Aware Attribution#
# Automatic context resolution
agent_id = kwargs.get('agent_id') or (getattr(args[0], 'agent_id', None) if args else None)
project_id = self.get_current_project(agent_id)
if not project_id:
    project_id = kwargs.get('project_id') or 'unassigned'
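The fallback order can be exercised in isolation. The sketch below is a standalone variant of the snippet above: `resolve_project_id` and the `agent_projects` mapping are hypothetical stand-ins for the middleware method and its context store.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_project_id(args: Tuple[Any, ...], kwargs: Dict[str, Any],
                       agent_projects: Dict[str, str]) -> str:
    """Resolve attribution: agent context first, then explicit kwarg, then 'unassigned'."""
    agent_id: Optional[str] = kwargs.get("agent_id")
    if agent_id is None and args:
        # Fall back to an agent_id attribute on the first positional argument.
        agent_id = getattr(args[0], "agent_id", None)
    project_id = agent_projects.get(agent_id) if agent_id else None
    return project_id or kwargs.get("project_id") or "unassigned"
```

The `'unassigned'` bucket means usage is never dropped, only attributed less precisely when context is missing.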
Technical Implementation Details#
TokenTracker Class#
Core Data Structures:
self.project_tokens: Dict[str, int] = defaultdict(int) # Total tokens per project
self.project_costs: Dict[str, float] = defaultdict(float) # Total costs per project
self.token_history: Dict[str, deque] = defaultdict( # Sliding window history
    lambda: deque(maxlen=1000)
)
self.session_start_times: Dict[str, datetime] = {} # Session tracking
self.spend_rates: Dict[str, List[float]] = defaultdict(list) # Rate history
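To show how these structures fit together, here is a reduced stand-in for the tracker. It is a sketch, not the Marcus class: `MiniTokenTracker`, the flat `cost_per_1k_tokens` price, and the event dict layout are assumptions, and the method is synchronous for brevity even though the middleware awaits `track_tokens`.

```python
from collections import defaultdict, deque
from datetime import datetime
from typing import Deque, Dict


class MiniTokenTracker:
    """Reduced stand-in for TokenTracker; not the Marcus implementation."""

    def __init__(self, cost_per_1k_tokens: float = 0.03) -> None:  # price is an assumption
        self.cost_per_1k_tokens = cost_per_1k_tokens
        self.project_tokens: Dict[str, int] = defaultdict(int)
        self.project_costs: Dict[str, float] = defaultdict(float)
        self.token_history: Dict[str, Deque[dict]] = defaultdict(lambda: deque(maxlen=1000))
        self.session_start_times: Dict[str, datetime] = {}

    def track_tokens(self, project_id: str, input_tokens: int, output_tokens: int) -> float:
        """Record one AI call and return its estimated cost."""
        total = input_tokens + output_tokens
        cost = total / 1000 * self.cost_per_1k_tokens
        now = datetime.now()
        self.session_start_times.setdefault(project_id, now)  # first call starts the session
        self.project_tokens[project_id] += total
        self.project_costs[project_id] += cost
        # deque(maxlen=1000) silently discards the oldest event once full.
        self.token_history[project_id].append({"timestamp": now, "tokens": total, "cost": cost})
        return cost
```

The totals keep growing for the life of the project, while the per-project history is capped at the most recent 1000 events.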
Key Algorithms:
Current Rate Calculation (5-minute sliding window):
def _calculate_current_spend_rate(self, project_id: str) -> float:
    history = list(self.token_history[project_id])  # copy to a list: deques do not support slicing
    cutoff = datetime.now() - timedelta(minutes=5)
    recent_events = [e for e in history if e['timestamp'] > cutoff]
    if len(recent_events) < 2:
        recent_events = history[-10:]  # Fall back to the last 10 events
    if len(recent_events) < 2:
        return 0.0  # Not enough data to compute a rate
    time_span = (recent_events[-1]['timestamp'] - recent_events[0]['timestamp']).total_seconds()
    if time_span <= 0:
        return 0.0  # Avoid division by zero
    total_tokens = sum(e['tokens'] for e in recent_events)
    return (total_tokens / time_span) * 3600  # tokens/hour
Background Monitoring:
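async def _monitor_rates(self):
    while True:
        await asyncio.sleep(60)  # Check every minute
        for project_id in list(self.project_tokens):
            stats = self.get_project_stats(project_id)
            current_rate = stats['current_spend_rate']
            avg_rate = stats['average_spend_rate']  # key name assumed; not shown in this excerpt
            # Alert only when the rate is high both relative to the average and in absolute terms
            if current_rate > avg_rate * 2 and current_rate > 10000:
                print(f"⚠️ Token spend spike for {project_id}: {current_rate:.0f} tokens/hour")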
AIUsageMiddleware Class#
Method Wrapping Strategy:
ai_methods = [
    'analyze', 'complete', 'chat', 'generate', 'call_model',
    'generate_task_instructions', 'analyze_blocker', 'generate_response',
    'classify', 'embed', 'summarize'
]
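One way such a name list can drive wrapping is sketched below. This is an illustration under assumptions, not the middleware's actual code: `wrap_ai_methods` and the `on_usage` callback are hypothetical, only async methods are wrapped, and results are assumed to be dicts carrying a `usage` entry.

```python
import functools
import inspect
from typing import Any, Callable, Iterable, List

AI_METHODS = ("analyze", "complete", "chat", "generate")  # subset of the list above


def wrap_ai_methods(provider: Any, on_usage: Callable[[str, dict], None],
                    method_names: Iterable[str] = AI_METHODS) -> List[str]:
    """Wrap the async methods that exist on provider; return the names wrapped."""
    wrapped = []
    for name in method_names:
        original = getattr(provider, name, None)
        if original is None or not inspect.iscoroutinefunction(original):
            continue  # skip names this provider does not implement

        def make_wrapper(func: Callable, method_name: str) -> Callable:
            # A factory binds func/method_name per iteration (avoids late binding).
            @functools.wraps(func)
            async def wrapper(*args: Any, **kwargs: Any) -> Any:
                result = await func(*args, **kwargs)
                usage = result.get("usage", {}) if isinstance(result, dict) else {}
                on_usage(method_name, usage)
                return result  # the caller sees the provider response unchanged
            return wrapper

        setattr(provider, name, make_wrapper(original, name))
        wrapped.append(name)
    return wrapped
```

Because wrapping is keyed on method names, a provider that exposes its AI call under a different name is silently left untracked, which is the provider coupling noted under Technical Debt.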
Decorator Implementation:
@functools.wraps(func)
async def wrapper(*args, **kwargs):
    # Context resolution
    agent_id = kwargs.get('agent_id') or (getattr(args[0], 'agent_id', None) if args else None)
    project_id = self.get_current_project(agent_id)
    # Function execution with timing
    start_time = datetime.now()
    result = await func(*args, **kwargs)
    end_time = datetime.now()
    duration = (end_time - start_time).total_seconds() * 1000  # milliseconds
    # Token extraction and tracking
    usage = result.get('usage', {})
    await self.token_tracker.track_tokens(
        project_id=project_id,
        input_tokens=usage.get('input_tokens', 0),
        output_tokens=usage.get('output_tokens', 0),
        metadata={'agent_id': agent_id, 'duration_ms': duration}
    )
    return result  # Hand the provider response back to the caller unchanged
Context Management#
Project Context Tracking:
def set_project_context(self, agent_id: str, project_id: str, task_id: Optional[str] = None):
    self.current_project_context[agent_id] = {
        'project_id': project_id,
        'task_id': task_id,
        'start_time': datetime.now()
    }
Context Manager for Explicit Scoping:
with track_project_tokens("project_123", "agent_1"):
    # All AI calls tracked to project_123
    await ai_engine.analyze(...)
Pros and Cons#
Advantages#
Precise Attribution: Tracks costs to specific projects/agents/tasks
Real-time Monitoring: Immediate feedback on usage patterns
Anomaly Detection: Prevents runaway costs through alerts
Historical Analysis: Enables cost trend analysis and optimization
Transparent Middleware: Integrates without changes to existing call sites
Flexible Context: Supports both automatic and manual context management
Persistent Storage: Survives restarts with historical data preservation
Limitations#
Projection Accuracy: Cost projections use simple heuristics (5x multiplier)
Limited Provider Support: Assumes specific API response formats
Memory Usage: Maintains in-memory deques for recent history (1000 events)
Context Dependency: Requires proper agent/project context setup
Single Pricing Model: Fixed cost per 1k tokens across all models
No Budget Enforcement: Tracks but doesn't prevent overspending
Technical Debt#
Hardcoded Constants: 5-minute windows, 1000-event limits, rate thresholds
Simple Projection: Should integrate with actual task completion percentages
Provider Coupling: Method names hardcoded for specific AI providers
Error Handling: Limited graceful degradation on tracking failures
Concurrency: No explicit thread safety for rate calculations
Why This Approach Was Chosen#
Design Philosophy#
Non-intrusive: Middleware pattern ensures existing code remains unchanged
Real-time: Immediate feedback enables proactive cost management
Granular: Project/agent/task level attribution for precise accountability
Extensible: Decorator pattern allows easy addition of new AI providers
Persistent: Historical data enables trend analysis and optimization
Alternative Approaches Considered#
Billing Integration: Direct integration with provider billing APIs
Rejected: Lag time and lack of real-time feedback
Manual Logging: Explicit tracking calls throughout codebase
Rejected: High maintenance burden and error-prone
Proxy Server: Network-level interception of API calls
Rejected: Complex setup and limited context awareness
Time-based Estimation: Hourly rates based on agent activity
Rejected: Inaccurate and doesn't reflect actual AI usage
Future Evolution#
Short-term Enhancements#
Dynamic Pricing: Support multiple models with different costs
Budget Enforcement: Hard limits with graceful degradation
Enhanced Projections: Integration with task completion tracking
Dashboard Integration: Real-time cost visualization
Export Capabilities: CSV/JSON exports for external analysis
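The first two enhancements could take roughly the following shape. This is a sketch only: the model names and prices in `MODEL_PRICES` are made-up placeholders rather than real provider rates, and `call_cost` and `within_budget` are hypothetical helpers.

```python
from typing import Dict, Tuple

# Hypothetical prices in USD per 1k tokens: (input, output). Not real provider rates.
MODEL_PRICES: Dict[str, Tuple[float, float]] = {
    "small-model": (0.001, 0.002),
    "large-model": (0.010, 0.030),
}
DEFAULT_PRICE: Tuple[float, float] = (0.010, 0.030)  # used for unknown models


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Price one call using per-model input and output rates."""
    input_price, output_price = MODEL_PRICES.get(model, DEFAULT_PRICE)
    return input_tokens / 1000 * input_price + output_tokens / 1000 * output_price


def within_budget(spent: float, next_call_cost: float, budget: float) -> bool:
    """True if the next call would keep total spend at or under the budget."""
    return spent + next_call_cost <= budget
```

A caller could check `within_budget` before dispatching a request and fall back to a cheaper model or a queued retry when it returns `False`, which is one reading of "graceful degradation".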
Medium-term Features#
Cost Optimization: AI usage pattern analysis and recommendations
Provider Switching: Automatic routing based on cost/performance
Resource Allocation: Dynamic agent assignment based on budget
Predictive Analytics: ML-based cost forecasting
Integration APIs: External billing system integration
Long-term Vision#
Multi-tenant Support: Isolated cost tracking per client/organization
Carbon Footprint: Environmental impact tracking alongside costs
Performance Correlation: Cost vs. quality analysis
Automated Optimization: Self-tuning cost management
Advanced Analytics: Cost attribution to business outcomes
Handling Simple vs Complex Tasks#
Task Complexity Detection#
Currently, the system tracks all AI usage uniformly, but could be enhanced to differentiate:
# Future enhancement example
def classify_task_complexity(metadata: Dict) -> str:
    token_count = metadata.get('total_tokens', 0)
    if token_count < 1000:
        return 'simple'
    elif token_count < 5000:
        return 'medium'
    else:
        return 'complex'
Differential Cost Management#
Simple Tasks (< 1k tokens):
Basic tracking only
Minimal rate monitoring
Batch processing for efficiency
Complex Tasks (≥ 5k tokens):
Enhanced monitoring
Real-time rate alerts
Detailed attribution tracking
Performance correlation analysis
Board-specific Considerations#
Kanban Integration#
The cost tracking system considers board-specific factors:
Project Size: MVP vs. Large projects have different cost profiles
Board Complexity: Number of lanes affects AI analysis frequency
Agent Density: More agents = more context switching overhead
Task Dependencies: Complex dependency graphs require more AI analysis
Board-aware Cost Attribution#
# Future enhancement
def get_board_cost_factors(board_config: Dict) -> Dict:
    return {
        'complexity_multiplier': 1.0 + (board_config.get('lanes', 3) - 3) * 0.1,
        'agent_overhead': board_config.get('max_agents', 5) * 0.05,
        'dependency_factor': 1.0 + board_config.get('dependency_depth', 1) * 0.2
    }
Integration with Cato#
Current State#
The cost tracking system is designed to be Cato-agnostic, but could integrate for:
Cost Attribution: Track costs per Cato reasoning session
Model Selection: Route to different models based on cost constraints
Quality Correlation: Analyze cost vs. reasoning quality trade-offs
Future Cato Integration#
# Potential Cato integration
class CatoCostTracker:
    def track_reasoning_session(self, session_id: str, steps: List[Dict]):
        total_tokens = sum(step.get('tokens', 0) for step in steps)
        reasoning_depth = len(steps)
        # Track correlation between depth and cost
Position in Marcus Workflow#
Typical Scenario Flow#
1. create_project
   ├── Initialize cost tracking context
   └── Set up project cost buckets
2. register_agent
   ├── Create agent cost context
   └── Begin session tracking
3. request_next_task
   ├── AI analysis for task assignment
   ├── Cost tracking middleware intercepts
   └── Update current spend rates
4. report_progress
   ├── Include cost metrics in progress
   ├── Check for spending anomalies
   └── Update cost projections
5. report_blocker
   ├── AI analysis for blocker resolution
   ├── Track additional analysis costs
   └── Cost-aware solution ranking
6. finish_task
   ├── Final cost attribution
   ├── Session cost summary
   └── Update historical patterns
Critical Integration Points#
Agent Registration: Establishes cost attribution context
Task Assignment: Major AI analysis point requiring cost tracking
Progress Updates: Opportunity for cost reporting and alerts
Blocker Analysis: High-cost AI operation requiring monitoring
Project Completion: Final cost accounting and pattern analysis
Cost Visibility#
At each stage, the system provides:
Real-time rates: Current tokens/hour consumption
Total costs: Accumulated project expenses
Projections: Estimated final costs at current burn rate
Alerts: Anomaly detection for unusual spending patterns
Attribution: Detailed breakdown by agent, task, and operation type
The cost tracking system serves as an observability layer: it reports and alerts on spend so that Marcus can be kept within budgetary constraints, although it does not enforce budgets itself, and it supplies the usage data needed for effective project management.