Persistence Layer Technical Documentation#
Overview#
The Persistence Layer is Marcusβs unified storage abstraction system that provides consistent data access patterns across all Marcus components. It implements a pluggable backend architecture supporting multiple storage engines (file-based JSON, SQLite, in-memory) with thread-safe concurrent access, atomic operations, and automatic timestamp management. This system serves as the foundational data layer enabling Marcusβs memory system, event distribution, context management, and project registry to maintain state across system restarts.
Architecture#
Core Components#
PersistenceBackend(Base Class)Defines the storage interface contract
Standardizes operations: store, retrieve, query, delete, clear_old
Enables pluggable backend implementations
Methods raise
NotImplementedErrormanually (not an ABC / noabstractmethoddecorators)Ensures consistent API across different storage engines
FilePersistence(Primary Backend)JSON-based file storage implementation
Atomic writes using temporary files and atomic rename operations
Per-collection file locking for concurrent access
Automatic timestamp injection (
_stored_at)Graceful error handling with rollback mechanisms
SQLitePersistence(Performance Backend)SQLite-based storage for better performance and complex queries
Asynchronous executor pattern for non-blocking database operations
Indexed storage with efficient timestamp-based queries
ACID compliance for data integrity
MemoryPersistence(Testing Backend)In-memory storage for unit tests and ephemeral operations
Full feature parity with persistent backends
Fast reset capabilities for test isolation
Persistence(Unified Interface)High-level facade wrapping backend implementations
Domain-specific methods for Events, Decisions, and arbitrary data
Built-in cleanup and maintenance operations
Type-safe object reconstruction from stored data
Storage Architecture#
# Backend Abstraction Layer
PersistenceBackend # Plain base class (raises NotImplementedError, not an ABC)
βββ FilePersistence # JSON files with atomic writes
βββ SQLitePersistence # SQLite database with indexing
βββ MemoryPersistence # In-memory for testing
# Domain Layer
Persistence (Facade)
βββ Event storage/retrieval methods
βββ Decision storage/retrieval methods
βββ Generic collection operations
βββ Cleanup and maintenance
Data Organization#
File Backend Structure:
data/marcus_state/
βββ events.json # Event distribution history
βββ decisions.json # Architectural decisions
βββ projects.json # Project registry data
βββ implementations.json # Code pattern storage
βββ patterns.json # Learned behavioral patterns
SQLite Backend Structure:
persistence(
collection TEXT, -- Logical grouping (events, decisions, etc.)
key TEXT, -- Unique identifier within collection
data TEXT, -- JSON-serialized object data
stored_at TIMESTAMP -- Automatic timestamp for cleanup/queries
)
Integration with Marcus Ecosystem#
Memory System Integration#
The Persistence Layer is the backbone of Marcusβs memory system:
Task Outcomes: Stores agent performance data for learning algorithms
Agent Profiles: Maintains skill assessments and success rates
Task Patterns: Preserves learned patterns for predictive assignment
Temporal Data: Enables time-based analysis and performance trending
Event System Integration#
Enables event history and replay capabilities:
Event Storage: Persists all system events for audit trails
Event Replay: Supports system recovery and debugging
Pattern Analysis: Stores event sequences for behavioral learning
Cross-Session Continuity: Maintains event context across restarts
Context System Integration#
Provides the foundation for architectural decision tracking:
Decision Records: Stores agent-made architectural decisions
Dependency Maps: Persists task relationship information
Implementation Context: Maintains code patterns and integration points
Cross-Agent Communication: Enables decision sharing between agents
Project Registry Integration#
Manages multi-project configurations:
Project Definitions: Stores provider configurations and metadata
Active Project State: Maintains current project selection
Usage Analytics: Tracks project access patterns
Configuration History: Preserves configuration changes over time
Workflow Integration Points#
The Persistence Layer is invoked throughout the typical Marcus workflow:
1. create_project β Project Registry Persistence#
Stores new project configuration
Updates project metadata and timestamps
Maintains project relationship data
2. register_agent β Agent Profile Initialization#
Creates or updates agent profile records
Initializes performance tracking data
Establishes agent capability baselines
3. request_next_task β Context and Memory Lookup#
Retrieves agent performance history
Loads task pattern data for prediction
Accesses architectural decisions for context
4. report_progress β Event and State Updates#
Stores progress events for tracking
Updates working memory state
Records milestone achievements
5. report_blocker β Blocker Analysis and Storage#
Persists blocker information for learning
Updates agent profile with blocking patterns
Stores decision context for future reference
6. finish_task β Outcome Recording and Learning#
Records complete task outcomes
Updates agent performance profiles
Triggers pattern learning and storage
Key Features#
1. Thread-Safe Concurrent Access#
# Per-collection locking prevents race conditions
self._locks = {} # collection -> asyncio.Lock()
async with self._get_lock(collection):
# Atomic operation guaranteed
2. Atomic Write Operations#
# Atomic file replacement prevents corruption
temp_file = file_path.with_suffix(".tmp")
await write_to_temp_file(temp_file, data)
temp_file.replace(file_path) # Atomic on POSIX systems
3. Automatic Timestamping#
# All stored data gets automatic timestamp
data_with_timestamp = {
**original_data,
"_stored_at": datetime.now().isoformat()
}
4. Flexible Querying#
# Filter functions enable complex queries
def filter_recent_errors(item):
return (item.get("event_type") == "error" and
item.get("timestamp") > cutoff_time)
events = await persistence.query("events", filter_recent_errors)
5. Intelligent Cleanup#
# Automatic cleanup of old data with configurable retention
await persistence.cleanup(days=30) # Remove data older than 30 days
Technical Implementation Details#
Asynchronous Operations#
All persistence operations are asynchronous to prevent blocking:
# File operations use aiofiles for non-blocking I/O
async with aiofiles.open(file_path, "w") as f:
await f.write(json.dumps(data))
# SQLite operations use executor pattern
await asyncio.get_event_loop().run_in_executor(None, sync_operation)
Error Handling and Resilience#
# Graceful error handling with detailed logging
try:
await storage_operation()
except Exception as e:
logger.error(f"Storage error in {collection}: {e}")
# Rollback temporary files if needed
if temp_file.exists():
temp_file.unlink()
raise
JSON Serialization Strategy#
# Custom serialization handles complex types
json.dumps(data, indent=2, default=str) # Fallback to string representation
Backend Selection#
Note (aspirational): An auto-selection strategy between SQLite and FilePersistence based on feature requirements does not exist in code. The
Persistence.__init__method simply defaults toFilePersistence()when no backend is provided:self.backend = backend or FilePersistence(). SQLite is only used if the caller explicitly passes aSQLitePersistenceinstance. The block below is aspirational, not current behavior.
# How the default is actually chosen:
self.backend = backend or FilePersistence()
# SQLite is only used when explicitly passed by the caller, e.g.:
persistence = Persistence(backend=SQLitePersistence(Path(persistence_path)))
calculate_median_task_duration() Method#
Both FilePersistence and SQLitePersistence expose a calculate_median_task_duration()
method for computing the median completion time of recorded tasks. This method is not
documented elsewhere but is available on both backends.
Performance Characteristics#
File Backend Performance#
Read Operations: O(1) for single key, O(n) for collection queries
Write Operations: O(n) due to full file rewrite (atomic safety)
Memory Usage: Loads entire collection into memory for operations
Concurrency: Per-collection locking, good for low-contention scenarios
SQLite Backend Performance#
Read Operations: O(log n) with indexed queries
Write Operations: O(1) with transaction batching
Memory Usage: Minimal memory footprint with cursor-based iteration
Concurrency: Row-level locking, excellent for high-contention scenarios
Memory Backend Performance#
Read Operations: O(1) dictionary access
Write Operations: O(1) dictionary update
Memory Usage: All data in memory, fast but volatile
Concurrency: Async lock-based, ideal for testing
Simple vs Complex Task Handling#
Simple Task Operations#
For basic task tracking and single-agent workflows:
File Backend: Sufficient for small datasets (< 1000 records)
Minimal Querying: Simple key-based lookups dominate
Low Concurrency: Single agent, sequential operations
Complex Task Operations#
For advanced features and multi-agent coordination:
SQLite Backend: Required for performance with large datasets
Complex Queries: Time-based filtering, pattern matching, aggregations
High Concurrency: Multiple agents accessing shared data simultaneously
Board-Specific Considerations#
Planka Integration#
Card Metadata: Stores Planka card IDs and board structure
Sync State: Tracks synchronization timestamps and conflicts
Attachment Handling: Persists file references and metadata
Linear Integration#
Issue Tracking: Maps Linear issue IDs to Marcus task IDs
State Synchronization: Maintains bidirectional state consistency
Webhook Processing: Stores webhook events for replay and analysis
GitHub Integration#
Repository Context: Persists repository structure and file patterns
Commit Tracking: Links commits to tasks and decisions
Branch Management: Tracks branch states and merge conflicts
Cato Integration#
The Persistence Layer provides the data foundation for Cato (the future Marcus AI assistant):
Knowledge Base Storage#
Decision History: All architectural decisions available for AI analysis
Pattern Recognition: Stored patterns enable AI-driven insights
Performance Analytics: Historical data feeds AI performance predictions
Context Preservation#
Conversation Context: Maintains context across AI interactions
Project Memory: AI can reference past project decisions and outcomes
Learning Continuity: AI learns from stored experiences and outcomes
Predictive Analytics Support#
Training Data: Historical outcomes provide AI training datasets
Feature Engineering: Stored patterns become AI model features
Validation Data: Past predictions vs. outcomes for model improvement
Pros and Cons of Current Implementation#
Advantages#
Backend Flexibility
Easy to swap storage engines based on requirements
Consistent API regardless of underlying storage
Future-proof for new storage technologies (Redis, PostgreSQL, etc.)
Data Integrity
Atomic operations prevent data corruption
Automatic timestamping for audit trails
Graceful error handling with rollback capabilities
Performance Optimization
SQLite backend for high-performance scenarios
File backend for simplicity and portability
Memory backend for testing and development
Developer Experience
Simple, intuitive API for common operations
Type-safe object reconstruction
Comprehensive error logging and debugging
Disadvantages#
File Backend Scalability
Full file rewrite on every update (O(n) writes)
Memory usage grows with collection size
Limited query capabilities compared to databases
Schema Evolution
No built-in migration system for data structure changes
JSON schema flexibility can lead to inconsistencies
Manual data transformation required for breaking changes
Cross-Collection Transactions
No support for transactions across multiple collections
Potential for partial failures in complex operations
Manual consistency management required
Query Limitations
File backend requires full collection scans for complex queries
No built-in indexing beyond timestamp ordering
Limited aggregation and analytical capabilities
Why This Approach Was Chosen#
1. Incremental Complexity#
Marcus started simple with file-based storage and added SQLite as performance demands grew. This approach allows:
Rapid Prototyping: File backend enables quick development cycles
Performance Scaling: SQLite backend handles production workloads
Deployment Flexibility: No external database dependencies for simple deployments
2. Backend Abstraction#
The pluggable architecture provides:
Future Flexibility: Easy to add Redis, PostgreSQL, or cloud storage
Testing Isolation: Memory backend ensures clean test environments
Environment Adaptation: Choose optimal backend for deployment constraints
3. Marcus-Specific Optimizations#
The design optimizes for Marcusβs specific access patterns:
Time-Based Queries: Most operations filter by timestamp
Collection Isolation: Different data types rarely need cross-collection joins
Event-Driven Architecture: Append-heavy workload with occasional cleanup
Future Evolution#
Short-Term Enhancements (Next 6 months)#
Schema Migration System
class PersistenceMigration: version: str migrate: Callable[[Dict], Dict] rollback: Callable[[Dict], Dict]
Query Optimization
Add indexing support to file backend
Implement query planning and optimization
Add aggregation and analytical query capabilities
Cross-Collection Transactions
async with persistence.transaction(): await persistence.store("events", event_id, event_data) await persistence.store("decisions", decision_id, decision_data)
Medium-Term Evolution (6-12 months)#
Distributed Storage Support
Redis backend for caching and real-time data
PostgreSQL backend for complex analytical queries
Cloud storage backends (S3, GCS, Azure Blob)
Advanced Analytics
Built-in time-series aggregation
Pattern matching and similarity queries
Machine learning feature extraction pipelines
Real-Time Synchronization
Change streams and real-time notifications
Conflict resolution for concurrent updates
Multi-master replication support
Long-Term Vision (12+ months)#
AI-Driven Storage Optimization
Automatic backend selection based on access patterns
Predictive caching and prefetching
Intelligent data placement and partitioning
Semantic Storage Layer
Vector embeddings for semantic similarity queries
Natural language query interface
Context-aware data retrieval
Self-Healing Infrastructure
Automatic corruption detection and repair
Performance monitoring and optimization
Adaptive scaling based on workload patterns
The Persistence Layer represents Marcusβs commitment to data reliability, performance, and flexibility. By providing a solid foundation for all data operations, it enables the sophisticated memory, context, and learning systems that make Marcus an intelligent project management platform. As Marcus evolves toward greater AI integration and autonomous operation, the Persistence Layer will continue to evolve to support these advanced capabilities while maintaining the simplicity and reliability that makes Marcus accessible to developers of all skill levels.