Marcus Enhanced Task Classifier#
Table of Contents#
Overview#
The Marcus Enhanced Task Classifier is a sophisticated pattern-matching and keyword-based task categorization system that automatically classifies, prioritizes, and routes tasks to the most suitable autonomous agents within the Marcus ecosystem.
What the System Does#
The Enhanced Task Classifier provides:
Intelligent Task Categorization: Automatic classification of tasks by type using expanded keyword lists and regex patterns
Confidence Scoring: Detailed confidence metrics for classification decisions
Skill Requirement Analysis: Extraction of required skills from task descriptions
Agent Matching: Optimal agent-to-task matching based on capabilities and workload
Context-Aware Classification: Considers task labels, descriptions, and project context
Classification Suggestions: Recommendations for improving task clarity
Supported Task Types#
The classifier categorizes tasks into six primary types:
DESIGN: Architecture, planning, research, wireframes, specifications
IMPLEMENTATION: Building features, coding, bug fixes, integrations
TESTING: Unit tests, integration tests, QA, verification
DOCUMENTATION: README files, API docs, code comments, guides
DEPLOYMENT: Releases, production rollouts, migrations
INFRASTRUCTURE: CI/CD, server setup, monitoring, DevOps
Architecture#
Single-Class Design#
The Enhanced Task Classifier is implemented as a single, focused EnhancedTaskClassifier class that uses pattern matching and keyword analysis.
graph TB
subgraph "EnhancedTaskClassifier"
KM[Keyword Matcher]
PM[Pattern Matcher]
SC[Score Calculator]
CR[Confidence Resolver]
end
Task[Task Input] --> KM
Task --> PM
KM --> SC
PM --> SC
SC --> CR
CR --> Result[ClassificationResult]
style EnhancedTaskClassifier fill:#e1f5fe
Key Components:
Keyword Matcher: Matches task text against comprehensive keyword dictionaries (primary, secondary, and verb keywords)
Pattern Matcher: Uses compiled regex patterns to identify task type indicators
Score Calculator: Weighs keyword and pattern matches to score each task type
Confidence Resolver: Calculates final confidence based on match strength and uniqueness
Ecosystem Integration#
Core Marcus Systems Integration#
The Enhanced Task Classifier integrates with Marcus’s core systems to provide intelligent task routing:
Basic Classification:
# src/integrations/enhanced_task_classifier.py
from src.core.models import Task
from src.integrations.nlp_task_utils import TaskType
class EnhancedTaskClassifier:
"""Enhanced task classifier with expanded keywords and pattern matching."""
def classify(self, task: Task) -> TaskType:
"""
Classify a task using enhanced logic.
Args:
task: Task to classify
Returns:
TaskType enum value
"""
result = self.classify_with_confidence(task)
return result.task_type
def classify_with_confidence(self, task: Task) -> ClassificationResult:
"""
Classify a task and return detailed results with confidence.
Returns:
ClassificationResult with type, confidence, and reasoning
"""
# Combine text from task name, description, and labels
text = f"{task.name} {task.description or ''} {' '.join(task.labels or [])}".lower()
# Score each task type
scores = {}
matched_keywords = {}
matched_patterns = {}
for task_type in TaskType:
if task_type == TaskType.OTHER:
continue
score, keywords, patterns = self._score_task_type(text, task_type, task.labels or [])
scores[task_type] = score
matched_keywords[task_type] = keywords
matched_patterns[task_type] = patterns
# Find best match and calculate confidence
best_type = max(scores.items(), key=lambda x: x[1])[0]
confidence = self._calculate_confidence(scores, best_type, matched_patterns[best_type])
return ClassificationResult(
task_type=best_type,
confidence=confidence,
matched_keywords=matched_keywords[best_type],
matched_patterns=matched_patterns[best_type],
reasoning=self._generate_reasoning(best_type, matched_keywords[best_type], matched_patterns[best_type])
)
Actual Consumers of EnhancedTaskClassifier:
Note:
AIAnalysisEngineandIntelligentTaskGeneratordo NOT useEnhancedTaskClassifier. The actual consumers are:
# phase_dependency_enforcer.py
from src.integrations.enhanced_task_classifier import EnhancedTaskClassifier
# dependency_inferer_hybrid.py
from src.integrations.enhanced_task_classifier import EnhancedTaskClassifier
# nlp_base.py
from src.integrations.enhanced_task_classifier import EnhancedTaskClassifier
# marcus_mcp/tools/task.py
from src.integrations.enhanced_task_classifier import EnhancedTaskClassifier
Workflow Integration#
The Enhanced Task Classifier integrates at multiple points in the Marcus workflow:
Task Creation Workflow#
Task Description Input → Classification → Type Assignment → Agent Matching → Task Assignment
↓ ↓ ↓ ↓ ↓
Text Analysis Keyword/Pattern Confidence Score Skill Matching Optimal Assignment
Matching Calculation & Availability
Real-Time Classification:
# Classify as user creates tasks
task = Task(
name="Design user authentication system",
description="Create architecture for OAuth 2.0 integration with social login providers"
)
classifier = EnhancedTaskClassifier()
result = classifier.classify_with_confidence(task)
print(f"Type: {result.task_type}")
# Output: Type: TaskType.DESIGN
print(f"Confidence: {result.confidence}")
# Output: Confidence: 0.92
print(f"Matched keywords: {result.matched_keywords}")
# Output: Matched keywords: ['design', 'architecture', 'integration']
print(f"Reasoning: {result.reasoning}")
# Output: Reasoning: Classified as DESIGN because task matched patterns:
# (?:create|define|plan)\s+(?:the\s+)?(?:system|application|software)\s+(?:architecture|design)
# and contains primary keywords: design, architecture
What Makes This System Special#
1. Comprehensive Keyword Coverage#
The classifier uses extensive keyword dictionaries organized by category (primary, secondary, verbs) for each task type:
TASK_KEYWORDS = {
TaskType.DESIGN: {
"primary": [
"design", "architect", "plan", "planning", "architecture",
"blueprint", "specification", "spec", "specs", "research",
"analyze", "analysis", "study", "investigate"
],
"secondary": [
"wireframe", "mockup", "prototype", "diagram", "model",
"schema", "structure", "layout", "interface", "ui/ux",
"ux", "ui", "workflow", "concept", "draft", "outline",
"framework", "pattern", "template"
],
"verbs": [
"design", "plan", "architect", "draft", "outline",
"conceptualize", "define", "specify", "model"
]
},
# Similar comprehensive dictionaries for other task types...
}
2. Regex Pattern Matching#
Advanced regex patterns capture complex task naming conventions:
TASK_PATTERNS = {
TaskType.DESIGN: [
r"(?:create|define|plan)\s+(?:the\s+)?(?:system|application|software)\s+(?:architecture|design)",
r"design\s+(?:the\s+)?(?:data|database)\s+(?:model|schema|structure)",
r"(?:create|design)\s+(?:ui|ux|user\s+interface|user\s+experience)",
r"(?:define|specify)\s+(?:api|interface)\s+(?:contracts?|specifications?)",
r"(?:plan|design)\s+(?:the\s+)?(?:workflow|process|flow)"
],
TaskType.TESTING: [
r"write.*tests?",
r"(?:write|create|add)\s+(?:unit\s+)?tests?\s+(?:for|to)",
r"(?:test|verify|validate)\s+(?:the\s+)?(?:\w+\s+)?(?:functionality|feature|component)",
r"(?:create|write)\s+(?:integration|e2e|end-to-end)\s+tests?",
r"(?:ensure|verify|check)\s+(?:that|if)\s+(?:\w+\s+)?(?:works?|functions?)",
r"(?:add|improve)\s+test\s+coverage"
],
# Patterns for other task types...
}
3. Intelligent Scoring System#
The classifier uses a sophisticated scoring algorithm that considers:
Keyword match strength: Primary keywords (2.0 points), secondary keywords (1.0 point), verbs (1.5 points)
Pattern matches: Regex pattern matches (3.0 points)
Position weight: Keywords at the beginning get 1.5x multiplier
Label boost: Direct label matches get strong boost (8.0 points)
Conflict penalties: Presence of competing keywords reduces score (-0.5 per conflict)
def _score_task_type(self, task_name: str, task_description: str, task_labels: list[str], task_type: TaskType) -> Tuple[float, List[str], List[str]]:
"""Score how well text matches a task type."""
score = 0.0
matched_keywords = []
matched_patterns = []
# Label boost
for label in labels:
if label.lower() in ["testing", "qa"] and task_type == TaskType.TESTING:
score += 8.0
matched_keywords.append(label.lower())
# Primary keywords
for keyword in keywords_dict.get("primary", []):
pattern = rf"\b{re.escape(keyword)}s?\b"
match = re.search(pattern, text)
if match:
position_weight = 1.5 if match.start() < 10 else 1.0
score += 2.0 * position_weight
matched_keywords.append(keyword)
# Pattern matches
for regex_pattern in self._compiled_patterns.get(task_type, []):
if regex_pattern.search(text):
score += 3.0
matched_patterns.append(regex_pattern.pattern)
return score, matched_keywords, matched_patterns
4. Confidence Calculation#
Advanced confidence calculation considers multiple factors:
# Base confidence from score strength
base_confidence = min(best_score / 5.0, 1.0)
# Uniqueness bonus (how much the winning type stands out)
uniqueness_bonus = (best_score / total_score) * 0.15
# Pattern match boost
if matched_patterns[best_type]:
confidence = min(confidence * 1.1, 0.95)
# Multiple keyword boost
if len(matched_keywords[best_type]) >= 3:
confidence = min(confidence * 1.05, 0.95)
# Conflict penalty (competing high scores)
if multiple_competing_scores:
confidence = min(confidence * 0.6, 0.65)
5. Ambiguity Resolution and Confidence Adjustment#
Ambiguity resolution and confidence adjustment are handled inline within classify_with_confidence — they are not separate public or private methods. The logic for resolving ambiguous cases (e.g., DESIGN vs IMPLEMENTATION) and applying confidence boosts or penalties is embedded directly in the classification flow:
# Handle DESIGN vs IMPLEMENTATION ambiguity (inline in classify_with_confidence)
if design_score > 0 and impl_score > 0:
if design_score / impl_score < 2.5: # Scores are close
design_keywords = ["design", "architect", "plan", "planning", "mockup", "wireframe"]
if any(keyword in task.name.lower() for keyword in design_keywords):
# Boost design score to resolve ambiguity
scores[TaskType.DESIGN] = max(design_score + 2.0, impl_score + 1.0)
Technical Implementation#
Core Classification Engine#
# src/integrations/enhanced_task_classifier.py
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Pattern, Tuple
@dataclass
class ClassificationResult:
"""Result of task type classification with confidence."""
task_type: TaskType
confidence: float
matched_keywords: List[str]
matched_patterns: List[str]
reasoning: str
class EnhancedTaskClassifier:
"""Enhanced task classifier with expanded keywords and pattern matching."""
def __init__(self) -> None:
"""Initialize the enhanced classifier."""
# Compile patterns for efficiency
self._compiled_patterns: Dict[TaskType, List[Pattern[str]]] = {}
for task_type, patterns in self.TASK_PATTERNS.items():
self._compiled_patterns[task_type] = [
re.compile(pattern, re.IGNORECASE) for pattern in patterns
]
def classify_with_confidence(self, task: Task) -> ClassificationResult:
"""Classify a task and return detailed results with confidence."""
# Combine text sources
text = f"{task.name} {task.description or ''} {' '.join(task.labels or [])}".lower()
# Score each task type
scores = {}
matched_keywords = {}
matched_patterns = {}
for task_type in TaskType:
if task_type == TaskType.OTHER:
continue
score, keywords, patterns = self._score_task_type(text, task_type, task.labels or [])
scores[task_type] = score
matched_keywords[task_type] = keywords
matched_patterns[task_type] = patterns
# Find best match
if not scores:
return ClassificationResult(
task_type=TaskType.OTHER,
confidence=0.0,
matched_keywords=[],
matched_patterns=[],
reasoning="No matching keywords or patterns found"
)
# Handle ambiguous cases
self._resolve_ambiguity(scores, task, matched_keywords)
best_type = max(scores.items(), key=lambda x: x[1])[0]
best_score = scores[best_type]
# Calculate confidence
total_score = sum(scores.values())
if best_score < 1.0:
return ClassificationResult(
task_type=TaskType.OTHER,
confidence=0.0,
matched_keywords=[],
matched_patterns=[],
reasoning="Insufficient evidence for classification"
)
# Sophisticated confidence calculation
score_ratio = best_score / total_score if total_score > 0 else 0
base_confidence = min(best_score / 5.0, 1.0)
uniqueness_bonus = score_ratio * 0.15
confidence = max(0.85, base_confidence + uniqueness_bonus)
# Adjust for conflicts and boosts
confidence = self._adjust_confidence(confidence, scores, best_type, best_score, matched_patterns, matched_keywords)
# Generate reasoning
reasoning = self._generate_reasoning(best_type, matched_keywords[best_type], matched_patterns[best_type])
return ClassificationResult(
task_type=best_type,
confidence=confidence,
matched_keywords=matched_keywords[best_type],
matched_patterns=matched_patterns[best_type],
reasoning=reasoning
)
Helper Methods#
def get_suggestions(self, task: Task) -> Dict[str, List[str]]:
"""Get suggestions for improving task classification."""
result = self.classify_with_confidence(task)
suggestions = {}
# Only provide suggestions for unclear tasks
if result.confidence < 0.8 or result.task_type == TaskType.OTHER:
if result.task_type == TaskType.OTHER:
suggestions["improve_clarity"] = [
"Consider starting with action words like: design, implement, test, document, deploy",
"Be more specific about the task type",
"Avoid ambiguous terms that could match multiple types"
]
else:
task_keywords = self.TASK_KEYWORDS.get(result.task_type, {})
primary = task_keywords.get("primary", [])
if primary:
suggestions["improve_clarity"] = [
f"Consider starting with: {', '.join(primary[:3])}",
"Be more specific about the task type"
]
return suggestions
def is_type(self, task: Task, task_type: TaskType) -> bool:
"""Check if a task is of a specific type."""
return self.classify(task) == task_type
def filter_by_type(self, tasks: List[Task], task_type: TaskType) -> List[Task]:
"""Filter tasks by type."""
return [task for task in tasks if self.classify(task) == task_type]
Pros and Cons#
Pros#
Simplicity and Maintainability:
Single-file implementation (905 lines) - easy to understand and modify
No external ML dependencies - no PyTorch, transformers, or sklearn required
Fast classification - microseconds per task
Deterministic results - same input always produces same output
Easy to debug - can trace exactly why a classification was made
Practical Effectiveness:
High accuracy for well-named tasks (~95% when task names follow conventions)
Comprehensive keyword coverage based on real-world usage patterns
Sophisticated pattern matching for complex task names
Confidence scoring provides transparency about classification certainty
Works offline - no API calls or model downloads required
Flexibility:
Easy to add new keywords or patterns for custom task types
Simple to adjust scoring weights for specific domains
Can be extended with custom pattern matchers
Lightweight integration with other Marcus systems
Production Ready:
No training data required - works immediately
No model retraining overhead
Minimal computational resources
No GPU or specialized hardware needed
Stable classification behavior over time
Cons#
Semantic Understanding Limitations:
Cannot understand synonyms or paraphrasing (e.g., “construct” vs “build” need both keywords)
Miss tasks with novel phrasing not in keyword list
No contextual understanding beyond simple pattern matching
Cannot infer task type from project context alone
Pattern Dependency:
Requires well-structured task names to achieve high accuracy
Poorly named tasks may be misclassified
Relies on developers following naming conventions
Ambiguous task names may get low confidence scores
Static Knowledge:
Keyword lists must be manually updated with new terminology
No learning from classification outcomes
Cannot adapt to domain-specific vocabularies automatically
Pattern effectiveness depends on manual curation
Accuracy Challenges:
Complex multi-faceted tasks may match multiple types equally
Short task names with generic verbs can be ambiguous
New types of tasks not covered by existing keywords will be classified as OTHER
Cannot handle tasks requiring deep technical understanding
Design Rationale#
Why This Approach Was Chosen#
Pragmatic Simplicity: Traditional machine learning approaches require extensive training data, computational resources, and ongoing model maintenance. Marcus’s keyword and pattern-based approach delivers practical accuracy without ML complexity.
Immediate Functionality: Unlike ML systems that need training data collection and model training, the pattern-based classifier works immediately out of the box with zero setup time.
Transparency and Debuggability: Every classification decision can be traced to specific keywords or patterns that matched. This makes debugging misclassifications straightforward and enables rapid iteration on keyword lists.
Production Reliability: Rule-based classification is deterministic and stable. There’s no model drift, no unexpected behavior from retraining, and no dependency on external AI services or GPUs.
Resource Efficiency: The classifier runs in microseconds with minimal memory usage. This enables real-time classification without performance overhead.
Future Evolution#
Planned Enhancements#
Hybrid Classification Approach: Future versions may combine the current keyword/pattern approach with lightweight NLP for improved semantic understanding while maintaining the simplicity and speed of the current system.
Dynamic Keyword Learning: System could learn new keywords from user corrections and task completion feedback without requiring full ML retraining.
Domain-Specific Customization: Allow projects to define custom keyword sets and patterns for their specific domain, technology stack, or workflow.
For Advanced ML-Based Classification:
See the aspirational future vision document: 44-enhanced-task-classifier-FUTURE.md
Current Implementation: This document describes the actual implemented system.
Aspirational Vision: See 44-enhanced-task-classifier-FUTURE.md for planned ML-based enhancements.
Last Updated: 2025-11-07