# System Architecture
## Overview
- RocksDB requires the `persistence` feature flag: `cargo build --features persistence`
## Processing Flow
## Components
### Compiler
- Parses VPL via the Pest PEG parser
- Generates an IR (Intermediate Representation)
- Applies static optimizations
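The static-optimization step can be pictured with a toy constant-folding pass. This is an illustrative sketch only: the `Expr` enum below is a hypothetical stand-in, not the engine's actual IR types.

```rust
// Hypothetical, simplified IR sketch (not the real varpulis IR).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Const(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// One classic static optimization: fold constant subtrees at compile time,
/// so the runtime never re-evaluates them per event.
fn fold(e: Expr) -> Expr {
    match e {
        Expr::Add(a, b) => match (fold(*a), fold(*b)) {
            (Expr::Const(x), Expr::Const(y)) => Expr::Const(x + y),
            (a, b) => Expr::Add(Box::new(a), Box::new(b)),
        },
        Expr::Mul(a, b) => match (fold(*a), fold(*b)) {
            (Expr::Const(x), Expr::Const(y)) => Expr::Const(x * y),
            (a, b) => Expr::Mul(Box::new(a), Box::new(b)),
        },
        other => other,
    }
}

fn main() {
    // (2 + 3) * 4 folds to 20 before the pipeline ever runs.
    let e = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Const(2)), Box::new(Expr::Const(3)))),
        Box::new(Expr::Const(4)),
    );
    assert_eq!(fold(e), Expr::Const(20));
}
```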
### Execution Graph
- DAG (Directed Acyclic Graph) of operations
- Intelligent scheduling
- Operator fusion when possible
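Scheduling a DAG of operations reduces to ordering operators so that every producer runs before its consumers. A standard sketch using Kahn's algorithm, purely illustrative (not the engine's scheduler):

```rust
use std::collections::{HashMap, VecDeque};

/// Kahn's algorithm: returns operator ids in a valid execution order,
/// or None if the graph has a cycle (i.e., it is not a DAG).
fn topo_order(n: usize, edges: &[(usize, usize)]) -> Option<Vec<usize>> {
    let mut indeg = vec![0usize; n];
    let mut adj: HashMap<usize, Vec<usize>> = HashMap::new();
    for &(u, v) in edges {
        indeg[v] += 1;
        adj.entry(u).or_default().push(v);
    }
    // Start from operators with no upstream dependencies (the sources).
    let mut q: VecDeque<usize> = (0..n).filter(|&i| indeg[i] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(u) = q.pop_front() {
        order.push(u);
        for &v in adj.get(&u).into_iter().flatten() {
            indeg[v] -= 1;
            if indeg[v] == 0 {
                q.push_back(v);
            }
        }
    }
    (order.len() == n).then_some(order)
}

fn main() {
    // source(0) -> filter(1) -> map(2) -> sink(3)
    let order = topo_order(4, &[(0, 1), (1, 2), (2, 3)]).unwrap();
    assert_eq!(order, vec![0, 1, 2, 3]);
}
```

Adjacent operators that end up back-to-back in such an order are also the natural candidates for fusion into a single pass.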
### Ingestion Layer
- Source connectors (Kafka, files, HTTP, etc.)
- Deserialization (JSON)
- Schema validation
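Schema validation at this layer can be sketched as a required-field check over a deserialized event. The `validate` helper and string-valued event map below are simplified stand-ins, not the real ingestion types (which also type-check JSON values).

```rust
use std::collections::HashMap;

/// Hypothetical, minimal schema check: every required field must be present.
fn validate(event: &HashMap<String, String>, required: &[&str]) -> Result<(), String> {
    for &field in required {
        if !event.contains_key(field) {
            return Err(format!("missing required field `{field}`"));
        }
    }
    Ok(())
}

fn main() {
    let mut event = HashMap::new();
    event.insert("symbol".to_string(), "ACME".to_string());
    event.insert("price".to_string(), "42.0".to_string());
    // Event satisfies one schema but not another.
    assert!(validate(&event, &["symbol", "price"]).is_ok());
    assert!(validate(&event, &["symbol", "volume"]).is_err());
}
```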
### Pattern Matcher (SASE+ with ZDD)
- NFA-based SASE+ engine for sequence and Kleene pattern detection
- ZDD (`varpulis-zdd` crate): compactly represents Kleene capture combinations — e.g., 100 matching events produce ~100 ZDD nodes instead of 2^100 explicit subsets
- Temporal constraints, negation, and partition-by support
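The node-count claim comes from sharing: when many subsets have the same continuation, one node serves them all. A toy sketch (omitting real ZDD variable labels and reduction rules) of how 100 chained nodes with shared children encode all 2^100 subsets:

```rust
use std::collections::HashMap;

/// Toy decision-diagram node: `lo` = skip this event, `hi` = include it.
/// `usize::MAX` marks the accepting terminal.
struct Node {
    lo: usize,
    hi: usize,
}

/// Count the subsets the diagram encodes; memoization visits each shared
/// node once, so counting is linear in nodes, not in subsets.
fn count_sets(nodes: &[Node], idx: usize, memo: &mut HashMap<usize, u128>) -> u128 {
    if idx == usize::MAX {
        return 1; // terminal: one completed subset
    }
    if let Some(&c) = memo.get(&idx) {
        return c;
    }
    let c = count_sets(nodes, nodes[idx].lo, memo) + count_sets(nodes, nodes[idx].hi, memo);
    memo.insert(idx, c);
    c
}

fn main() {
    let n = 100;
    // Chain: node i's lo and hi branches both lead to node i+1 (shared child),
    // so 100 nodes encode every subset of 100 events.
    let nodes: Vec<Node> = (0..n)
        .map(|i| {
            let next = if i + 1 == n { usize::MAX } else { i + 1 };
            Node { lo: next, hi: next }
        })
        .collect();
    assert_eq!(nodes.len(), 100);
    assert_eq!(count_sets(&nodes, 0, &mut HashMap::new()), 1u128 << 100);
}
```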
### State Manager
### Aggregation Engine (Hamlet)
- Aggregation functions (sum, avg, count, min, max, stddev, etc.)
- Temporal windows (tumbling, sliding, session)
- Key-based grouping
- Hamlet (`hamlet/` module): multi-query trend aggregation with graphlet-based sharing — O(1) per-event propagation, 3-100x faster than the experimental `zdd_unified/` aggregation module
- See trend-aggregation.md
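Tumbling windows with key-based grouping reduce to bucketing each event by floor(ts / width) and aggregating per (key, window) pair. A minimal sketch with hypothetical event tuples (sliding and session windows generalize this idea):

```rust
use std::collections::HashMap;

/// Tumbling-window sum: events are (key, timestamp, value); each timestamp
/// maps to exactly one window of `width` time units, and values are summed
/// per (key, window) pair.
fn tumbling_sum(events: &[(&str, u64, f64)], width: u64) -> HashMap<(String, u64), f64> {
    let mut out = HashMap::new();
    for &(key, ts, v) in events {
        let window = ts / width; // tumbling: windows never overlap
        *out.entry((key.to_string(), window)).or_insert(0.0) += v;
    }
    out
}

fn main() {
    let events = [("a", 0, 1.0), ("a", 5, 2.0), ("a", 10, 4.0), ("b", 3, 8.0)];
    let sums = tumbling_sum(&events, 10);
    assert_eq!(sums[&("a".to_string(), 0)], 3.0); // ts 0 and 5 share window 0
    assert_eq!(sums[&("a".to_string(), 1)], 4.0); // ts 10 opens window 1
    assert_eq!(sums[&("b".to_string(), 0)], 8.0); // grouped separately by key
}
```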
Note on ZDD modules: The `varpulis-zdd` crate (used by the Pattern Matcher above) compactly represents Kleene match combinations during detection. The `zdd_unified/` module is a separate experimental research module that explored ZDD-based aggregation — Hamlet supersedes it for production use.
### Pattern Forecasting (PST)
- Prediction Suffix Tree (`pst/` module): variable-order Markov model for pattern-completion forecasting
- Online learning — no pre-training required; starts predicting after a configurable warmup period
- Pattern Markov Chain (PMC): maps PST predictions onto SASE NFA states to forecast which events complete active sequence patterns
- Sub-microsecond prediction latency (51 ns single symbol, 105 ns full distribution)
- Activated via the `.forecast()` operator in VPL
- See forecasting.md
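The variable-order, online-learning idea can be sketched with a toy predictor that counts successors for every context up to a maximum order and predicts from the longest context it has seen. This is a drastically simplified stand-in for the `pst/` module, not its actual data structure:

```rust
use std::collections::HashMap;

/// Toy variable-order Markov predictor: learns next-symbol counts online
/// for every context of length 0..=max_order.
struct Predictor {
    max_order: usize,
    counts: HashMap<Vec<char>, HashMap<char, usize>>,
    history: Vec<char>,
}

impl Predictor {
    fn new(max_order: usize) -> Self {
        Self { max_order, counts: HashMap::new(), history: Vec::new() }
    }

    /// Online update: record `sym` as the successor of each recent context.
    fn observe(&mut self, sym: char) {
        let h = self.history.len();
        for k in 0..=self.max_order.min(h) {
            let ctx = self.history[h - k..].to_vec();
            *self.counts.entry(ctx).or_default().entry(sym).or_insert(0) += 1;
        }
        self.history.push(sym);
    }

    /// Predict the most frequent successor of the longest known context.
    fn predict(&self) -> Option<char> {
        let h = self.history.len();
        for k in (0..=self.max_order.min(h)).rev() {
            if let Some(next) = self.counts.get(&self.history[h - k..].to_vec()) {
                return next.iter().max_by_key(|(_, &c)| c).map(|(&s, _)| s);
            }
        }
        None
    }
}

fn main() {
    let mut p = Predictor::new(2);
    for sym in "abcabcabc".chars() {
        p.observe(sym);
    }
    // After "...abc", the context "bc" has always been followed by 'a'.
    assert_eq!(p.predict(), Some('a'));
}
```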
### Parallelism Manager
- See parallelism.md
### Context Orchestrator
- Named execution contexts with OS thread isolation
- CPU affinity pinning via `core_affinity`
- Cross-context routing via bounded `mpsc` channels
- Zero overhead when no contexts are declared
- See contexts guide
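Cross-context routing over a bounded channel can be sketched with the standard library's `sync_channel`, which blocks senders once the buffer fills (backpressure). Here `std::sync::mpsc` merely stands in for whatever channel implementation the engine actually uses:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Route events into a separate "context" thread through a bounded channel.
/// With capacity 8, senders block once 8 events are in flight instead of
/// growing an unbounded queue.
fn route_through_context(values: Vec<u32>) -> u32 {
    let (tx, rx) = sync_channel::<u32>(8);
    // The downstream context runs on its own OS thread and consumes
    // everything routed to it.
    let downstream = thread::spawn(move || rx.iter().sum::<u32>());
    for v in values {
        tx.send(v).unwrap();
    }
    drop(tx); // closing the sender lets the downstream iterator finish
    downstream.join().unwrap()
}

fn main() {
    assert_eq!(route_through_context((1..=10).collect()), 55);
}
```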
### Observability Layer
- See observability.md
### Checkpoint Manager
- Versioned state snapshots (windows, SASE runs, aggregation, variables, watermarks)
- Three backends: in-memory, filesystem, RocksDB (requires the `persistence` feature)
- Crash recovery with automatic watermark restoration
- Opt-in by design — see Feature Maturity below
- See state-management.md
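The snapshot/recover cycle can be sketched in miniature. The `State` and `InMemoryStore` types below are hypothetical illustrations, not the real `StateStore` API:

```rust
use std::collections::HashMap;

/// Hypothetical live engine state: some per-key counters plus a watermark.
#[derive(Clone)]
struct State {
    counters: HashMap<String, u64>,
    watermark: u64,
}

/// Versioned in-memory snapshot store: each checkpoint copies the live
/// state under an incrementing version; recovery restores the latest one.
struct InMemoryStore {
    snapshots: Vec<(u64, State)>, // (version, snapshot)
}

impl InMemoryStore {
    fn checkpoint(&mut self, live: &State) -> u64 {
        let version = self.snapshots.len() as u64 + 1;
        self.snapshots.push((version, live.clone()));
        version
    }

    fn recover(&self) -> Option<&State> {
        self.snapshots.last().map(|(_, s)| s)
    }
}

fn main() {
    let mut store = InMemoryStore { snapshots: Vec::new() };
    let mut live = State { counters: HashMap::new(), watermark: 0 };
    live.counters.insert("orders".into(), 3);
    live.watermark = 1_000;
    let v = store.checkpoint(&live);
    assert_eq!(v, 1);

    live.counters.insert("orders".into(), 99); // a crash loses this update...
    let restored = store.recover().unwrap();
    assert_eq!(restored.counters["orders"], 3); // ...recovery rewinds to v1,
    assert_eq!(restored.watermark, 1_000); // watermark included
}
```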
## Feature Maturity
Several advanced features are opt-in by design — they are production-ready but require explicit activation. This table clarifies their status to prevent them from being mistaken for incomplete implementations during audits.
| Feature | Status | Activation | Module |
|---|---|---|---|
| Checkpointing | Production | Pass a `StateStore` to `Engine`; call `checkpoint_tick()` periodically | `persistence.rs` |
| Contexts | Production | Declare context blocks in VPL; engine auto-creates OS threads | `context.rs` |
| `.concurrent()` | Production | Add `.concurrent()` to a stream; creates a rayon thread pool | `engine/compilation.rs` |
| Worker Pool | Production | Instantiate `WorkerPool` with a config and dispatch events to it | `worker_pool.rs` |
| Hamlet Aggregation | Production | Use `.trend_aggregate()` in VPL after a sequence pattern | `hamlet/` |
| PST Forecasting | Production | Use `.forecast()` in VPL after a sequence pattern | `pst/` |
Why opt-in? These features add runtime cost (threads, memory, I/O) that would be wasted in simple pipelines. A filter-only pipeline should not pay for checkpointing or multi-threading. The engine defaults to zero-overhead single-threaded execution and scales up only when the VPL program or API caller requests it.
## LSP Server
- Language Server Protocol implementation for VPL
- Go-to-definition, find-references, completions, hover docs, semantic tokens
- See lsp.md
## MCP Server
- Model Context Protocol server exposing VPL validation, pipeline management, and engine status to AI assistants
- See mcp.md