Cost Tracking System#

Overview#

The Cost Tracking system provides real-time monitoring and analysis of AI token consumption and costs across Marcus projects. It tracks actual usage patterns rather than relying on naive time-based estimates, enabling precise cost attribution and intelligent resource management.

Architecture#

Core Components#

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   TokenTracker  │    │ AIUsageMiddleware│    │  AI Providers   │
│                 │◄───┤                  │◄───┤ (Anthropic,     │
│ - Per-project   │    │ - Method wrapping│    │  OpenAI, etc.)  │
│   tracking      │    │ - Context mgmt   │    │                 │
│ - Rate calc     │    │ - Auto-tracking  │    │                 │
│ - Cost proj     │    │                  │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │
         │                       │
         ▼                       ▼
┌─────────────────┐    ┌──────────────────┐
│  data/token_    │    │ Conversation     │
│  usage.json     │    │ Logger           │
│                 │    │                  │
│ - Historical    │    │ - Usage alerts   │
│   persistence   │    │ - Cost tracking  │
└─────────────────┘    └──────────────────┘

Data Flow#

AI Call Interception: Middleware wraps all AI provider methods
Context Resolution: Determines project/agent context for attribution
Token Extraction: Parses API responses for usage data
Real-time Tracking: Updates counters and rate calculations
Cost Calculation: Applies pricing models for cost estimation
Persistence: Stores historical data for trend analysis
Monitoring: Alerts on usage anomalies and cost spikes

Integration with Marcus Ecosystem#

Project Lifecycle Integration#

The cost tracking system integrates at multiple points in the Marcus workflow:

graph TD
    A[create_project] --> B[register_agent]
    B --> C[request_next_task]
    C --> D[AI Analysis Engine]
    D --> E[Cost Tracking Middleware]
    E --> F[Token Tracker]
    F --> G[report_progress]
    G --> H[Cost Reporting]
    H --> I[report_blocker]
    I --> J[finish_task]

    subgraph "Cost Tracking Layer"
        E
        F
        H
    end

Service Dependencies#

AI Analysis Engine: Primary integration point for tracking AI usage
MCP Server: Imports cost tracking for agent context management
Conversation Logger: Receives cost alerts and usage notifications
Project Registry: Provides project context for attribution
Memory System: Could leverage usage patterns for optimization

When Cost Tracking is Invoked#

Automatic Triggers#

Agent Registration: Sets up project context for new agents
AI Provider Calls: Every call to wrapped AI methods triggers tracking
Task Assignment: Context switches update project attribution
Progress Reporting: Cost metrics included in progress updates
Background Monitoring: Continuous rate calculation and anomaly detection

Manual Triggers#

Context Manager Usage: Explicit project token tracking scopes
Direct TokenTracker Calls: Manual token logging for custom scenarios
Stats Queries: On-demand cost and usage reporting

What Makes This System Special#

Real-time Rate Calculation#

Unlike traditional hourly billing, the system calculates:

Current Spend Rate: Tokens/hour over last 5 minutes
Average Spend Rate: Session-wide usage patterns
Sliding Window Analysis: Recent vs. historical usage trends

Intelligent Cost Projection#

# Example projection logic
def _project_total_cost(self, project_id: str, current_rate: float) -> float:
    current_cost = self.project_costs[project_id]
    # Assumes 20% completion at current progress
    return current_cost * 5  # Placeholder - could integrate task completion %

Anomaly Detection#

Spend Spike Alerts: Detects usage > 2x average and > 10k tokens/hour
Background Monitoring: Continuous rate tracking with 1-minute intervals
Historical Pattern Analysis: Maintains 1000-event history per project

Context-Aware Attribution#

# Automatic context resolution
agent_id = kwargs.get('agent_id') or getattr(args[0], 'agent_id', None)
project_id = self.get_current_project(agent_id)
if not project_id:
    project_id = kwargs.get('project_id') or 'unassigned'

Technical Implementation Details#

TokenTracker Class#

Core Data Structures:

self.project_tokens: Dict[str, int] = defaultdict(int)           # Total tokens per project
self.project_costs: Dict[str, float] = defaultdict(float)        # Total costs per project
self.token_history: Dict[str, deque] = defaultdict(             # Sliding window history
    lambda: deque(maxlen=1000)
)
self.session_start_times: Dict[str, datetime] = {}              # Session tracking
self.spend_rates: Dict[str, List[float]] = defaultdict(list)    # Rate history

Key Algorithms:

Current Rate Calculation (5-minute sliding window):

def _calculate_current_spend_rate(self, project_id: str) -> float:
    cutoff = datetime.now() - timedelta(minutes=5)
    recent_events = [e for e in history if e['timestamp'] > cutoff]
    if len(recent_events) < 2:
        recent_events = history[-10:]  # Fallback to last 10 events

    time_span = (recent_events[-1]['timestamp'] - recent_events[0]['timestamp']).total_seconds()
    total_tokens = sum(e['tokens'] for e in recent_events)
    return (total_tokens / time_span) * 3600  # tokens/hour

Background Monitoring:

async def _monitor_rates(self):
    while True:
        await asyncio.sleep(60)  # Check every minute
        for project_id in self.project_tokens:
            current_rate = self.get_project_stats(project_id)['current_spend_rate']
            if current_rate > avg_rate * 2 and current_rate > 10000:
                print(f"⚠️ Token spend spike for {project_id}: {current_rate:.0f} tokens/hour")

AIUsageMiddleware Class#

Method Wrapping Strategy:

ai_methods = [
    'analyze', 'complete', 'chat', 'generate', 'call_model',
    'generate_task_instructions', 'analyze_blocker', 'generate_response',
    'classify', 'embed', 'summarize'
]

Decorator Implementation:

@functools.wraps(func)
async def wrapper(*args, **kwargs):
    # Context resolution
    agent_id = kwargs.get('agent_id') or getattr(args[0], 'agent_id', None)
    project_id = self.get_current_project(agent_id)

    # Function execution with timing
    start_time = datetime.now()
    result = await func(*args, **kwargs)
    end_time = datetime.now()

    # Token extraction and tracking
    usage = result.get('usage', {})
    await self.token_tracker.track_tokens(
        project_id=project_id,
        input_tokens=usage.get('input_tokens', 0),
        output_tokens=usage.get('output_tokens', 0),
        metadata={'agent_id': agent_id, 'duration_ms': duration}
    )

Context Management#

Project Context Tracking:

def set_project_context(self, agent_id: str, project_id: str, task_id: Optional[str] = None):
    self.current_project_context[agent_id] = {
        'project_id': project_id,
        'task_id': task_id,
        'start_time': datetime.now()
    }

Context Manager for Explicit Scoping:

with track_project_tokens("project_123", "agent_1"):
    # All AI calls tracked to project_123
    await ai_engine.analyze(...)

Pros and Cons#

Advantages#

Precise Attribution: Tracks costs to specific projects/agents/tasks
Real-time Monitoring: Immediate feedback on usage patterns
Anomaly Detection: Prevents runaway costs through alerts
Historical Analysis: Enables cost trend analysis and optimization
Transparent Middleware: Zero-impact integration with existing code
Flexible Context: Supports both automatic and manual context management
Persistent Storage: Survives restarts with historical data preservation

Limitations#

Projection Accuracy: Cost projections use simple heuristics (5x multiplier)
Limited Provider Support: Assumes specific API response formats
Memory Usage: Maintains in-memory deques for recent history (1000 events)
Context Dependency: Requires proper agent/project context setup
Single Pricing Model: Fixed cost per 1k tokens across all models
No Budget Enforcement: Tracks but doesn’t prevent overspending

Technical Debt#

Hardcoded Constants: 5-minute windows, 1000-event limits, rate thresholds
Simple Projection: Should integrate with actual task completion percentages
Provider Coupling: Method names hardcoded for specific AI providers
Error Handling: Limited graceful degradation on tracking failures
Concurrency: No explicit thread safety for rate calculations

Why This Approach Was Chosen#

Design Philosophy#

Non-intrusive: Middleware pattern ensures existing code remains unchanged
Real-time: Immediate feedback enables proactive cost management
Granular: Project/agent/task level attribution for precise accountability
Extensible: Decorator pattern allows easy addition of new AI providers
Persistent: Historical data enables trend analysis and optimization

Alternative Approaches Considered#

Billing Integration: Direct integration with provider billing APIs
- Rejected: Lag time and lack of real-time feedback
Manual Logging: Explicit tracking calls throughout codebase
- Rejected: High maintenance burden and error-prone
Proxy Server: Network-level interception of API calls
- Rejected: Complex setup and limited context awareness
Time-based Estimation: Hourly rates based on agent activity
- Rejected: Inaccurate and doesn’t reflect actual AI usage

Future Evolution#

Short-term Enhancements#

Dynamic Pricing: Support multiple models with different costs
Budget Enforcement: Hard limits with graceful degradation
Enhanced Projections: Integration with task completion tracking
Dashboard Integration: Real-time cost visualization
Export Capabilities: CSV/JSON exports for external analysis

Medium-term Features#

Cost Optimization: AI usage pattern analysis and recommendations
Provider Switching: Automatic routing based on cost/performance
Resource Allocation: Dynamic agent assignment based on budget
Predictive Analytics: ML-based cost forecasting
Integration APIs: External billing system integration

Long-term Vision#

Multi-tenant Support: Isolated cost tracking per client/organization
Carbon Footprint: Environmental impact tracking alongside costs
Performance Correlation: Cost vs. quality analysis
Automated Optimization: Self-tuning cost management
Advanced Analytics: Cost attribution to business outcomes

Handling Simple vs Complex Tasks#

Task Complexity Detection#

Currently, the system tracks all AI usage uniformly, but could be enhanced to differentiate:

# Future enhancement example
def classify_task_complexity(metadata: Dict) -> str:
    token_count = metadata.get('total_tokens', 0)
    if token_count < 1000:
        return 'simple'
    elif token_count < 5000:
        return 'medium'
    else:
        return 'complex'

Differential Cost Management#

Simple Tasks (< 1k tokens):
- Basic tracking only
- Minimal rate monitoring
- Batch processing for efficiency
Complex Tasks (> 5k tokens):
- Enhanced monitoring
- Real-time rate alerts
- Detailed attribution tracking
- Performance correlation analysis

Board-specific Considerations#

Kanban Integration#

The cost tracking system considers board-specific factors:

Project Size: MVP vs. Large projects have different cost profiles
Board Complexity: Number of lanes affects AI analysis frequency
Agent Density: More agents = more context switching overhead
Task Dependencies: Complex dependency graphs require more AI analysis

Board-aware Cost Attribution#

# Future enhancement
def get_board_cost_factors(board_config: Dict) -> Dict:
    return {
        'complexity_multiplier': 1.0 + (board_config.get('lanes', 3) - 3) * 0.1,
        'agent_overhead': board_config.get('max_agents', 5) * 0.05,
        'dependency_factor': 1.0 + board_config.get('dependency_depth', 1) * 0.2
    }

Integration with Cato#

Current State#

The cost tracking system is designed to be Cato-agnostic, but could integrate for:

Cost Attribution: Track costs per Cato reasoning session
Model Selection: Route to different models based on cost constraints
Quality Correlation: Analyze cost vs. reasoning quality trade-offs

Future Cato Integration#

# Potential Cato integration
class CatoCostTracker:
    def track_reasoning_session(self, session_id: str, steps: List[Dict]):
        total_tokens = sum(step.get('tokens', 0) for step in steps)
        reasoning_depth = len(steps)
        # Track correlation between depth and cost

Position in Marcus Workflow#

Typical Scenario Flow#

1. create_project
   ├── Initialize cost tracking context
   └── Set up project cost buckets

2. register_agent
   ├── Create agent cost context
   └── Begin session tracking

3. request_next_task
   ├── AI analysis for task assignment
   ├── Cost tracking middleware intercepts
   └── Update current spend rates

4. report_progress
   ├── Include cost metrics in progress
   ├── Check for spending anomalies
   └── Update cost projections

5. report_blocker
   ├── AI analysis for blocker resolution
   ├── Track additional analysis costs
   └── Cost-aware solution ranking

6. finish_task
   ├── Final cost attribution
   ├── Session cost summary
   └── Update historical patterns

Critical Integration Points#

Agent Registration: Establishes cost attribution context
Task Assignment: Major AI analysis point requiring cost tracking
Progress Updates: Opportunity for cost reporting and alerts
Blocker Analysis: High-cost AI operation requiring monitoring
Project Completion: Final cost accounting and pattern analysis

Cost Visibility#

At each stage, the system provides:

Real-time rates: Current tokens/hour consumption
Total costs: Accumulated project expenses
Projections: Estimated final costs at current burn rate
Alerts: Anomaly detection for unusual spending patterns
Attribution: Detailed breakdown by agent, task, and operation type

The cost tracking system serves as a critical observability layer, ensuring that Marcus operates within budgetary constraints while providing the intelligence needed for effective project management.