Skip to content

System Architecture

Overview

Varpulis Runtime Engine overview

* RocksDB requires the persistence feature flag: cargo build --features persistence

Processing Flow

Processing flow pipeline

Components

Compiler

  • Parse VPL via Pest PEG parser
  • Generates IR (Intermediate Representation)
  • Static optimizations

Execution Graph

  • DAG (Directed Acyclic Graph) of operations
  • Intelligent scheduling
  • Operator fusion when possible

Ingestion Layer

  • Source connectors (Kafka, files, HTTP, etc.)
  • Deserialization (JSON)
  • Schema validation

Pattern Matcher (SASE+ with ZDD)

  • NFA-based SASE+ engine for sequence and Kleene pattern detection
  • ZDD (varpulis-zdd crate): compactly represents Kleene capture combinations — e.g., 100 matching events produce ~100 ZDD nodes instead of 2^100 explicit subsets
  • Temporal constraints, negation, partition-by support

State Manager

Aggregation Engine (Hamlet)

  • Aggregation functions (sum, avg, count, min, max, stddev, etc.)
  • Temporal windows (tumbling, sliding, session)
  • Key-based grouping
  • Hamlet (hamlet/ module): multi-query trend aggregation with graphlet-based sharing — O(1) per-event propagation, 3-100x faster than the experimental zdd_unified/ aggregation module
  • See trend-aggregation.md

Note on ZDD modules: The varpulis-zdd crate (used by the Pattern Matcher above) compactly represents Kleene match combinations during detection. The zdd_unified/ module is a separate experimental research module that explored ZDD-based aggregation — Hamlet supersedes it for production use.

Pattern Forecasting (PST)

  • Prediction Suffix Tree (pst/ module): variable-order Markov model for pattern completion forecasting
  • Online learning — no pre-training required, starts predictions after a configurable warmup period
  • Pattern Markov Chain (PMC): maps PST predictions onto SASE NFA states to forecast which events complete active sequence patterns
  • Sub-microsecond prediction latency (51 ns single symbol, 105 ns full distribution)
  • Activated via the .forecast() operator in VPL
  • See forecasting.md

Parallelism Manager

Context Orchestrator

  • Named execution contexts with OS thread isolation
  • CPU affinity pinning via core_affinity
  • Cross-context routing via bounded mpsc channels
  • Zero overhead when no contexts are declared
  • See contexts guide

Observability Layer

Checkpoint Manager

  • Versioned state snapshots (windows, SASE runs, aggregation, variables, watermarks)
  • Three backends: in-memory, filesystem, RocksDB (requires persistence feature)
  • Crash recovery with automatic watermark restoration
  • Opt-in by design — see Feature Maturity above
  • See state-management.md

Feature Maturity

Several advanced features are opt-in by design — they are production-ready but require explicit activation. This table clarifies their status to prevent them from being mistaken for incomplete implementations during audits.

FeatureStatusActivationModule
CheckpointingProductionPass a StateStore to Engine; call checkpoint_tick() periodicallypersistence.rs
ContextsProductionDeclare context blocks in VPL; engine auto-creates OS threadscontext.rs
.concurrent()ProductionAdd .concurrent() to a stream; creates a rayon thread poolengine/compilation.rs
Worker PoolProductionInstantiate WorkerPool with a config and dispatch events to itworker_pool.rs
Hamlet AggregationProductionUse .trend_aggregate() in VPL after a sequence patternhamlet/
PST ForecastingProductionUse .forecast() in VPL after a sequence patternpst/

Why opt-in? These features add runtime cost (threads, memory, I/O) that would be wasted in simple pipelines. A filter-only pipeline should not pay for checkpointing or multi-threading. The engine defaults to zero-overhead single-threaded execution and scales up only when the VPL program or API caller requests it.

LSP Server

  • Language Server Protocol implementation for VPL
  • Go-to-definition, find-references, completions, hover docs, semantic tokens
  • See lsp.md

MCP Server

  • Model Context Protocol server exposing VPL validation, pipeline management, and engine status to AI assistants
  • See mcp.md

Varpulis - Next-generation streaming analytics engine