Pipeline System
VulnParse-Pin’s pipeline is built around derived passes that run after parsing and enrichment.
Pass contract
Pass abstractions live in src/vulnparse_pin/core/classes/pass_classes.py.
Passprotocol definesname,version, andrun(ctx, scan)PassRunnerexecutes passes in sequenceDerivedContextstores pass output by versioned key
Current default pass order:
Scoring@1.0TopN@1.0Summary@1.0
End-to-end flow
flowchart LR
A[Input File] --> B[Schema Detector]
B --> C[Parser]
C --> D[Normalized ScanResult]
D --> E[Enrichment Stage]
E --> F[PassRunner]
F --> G[ScoringPass]
G --> H[TopNPass]
H --> I[SummaryPass]
I --> J[Derived Context]
J --> K[JSON/CSV/MD/Overlay Output]
Runtime sequence
sequenceDiagram
participant CLI as CLI
participant Main as main.py
participant Detector as SchemaDetector
participant Parser as XML Parser
participant Enrich as Enrichment Services
participant Runner as PassRunner
participant Score as ScoringPass
participant TopN as TopNPass
participant Summary as SummaryPass
CLI->>Main: vpp <input> -o <output>
Main->>Detector: detect(input)
Detector-->>Main: parser spec
Main->>Parser: parse(input)
Parser-->>Main: ScanResult
Main->>Enrich: apply KEV/EPSS/NVD/ExploitDB
Enrich-->>Main: enriched ScanResult
Main->>Runner: run_all(ctx, scan)
Runner->>Score: run(ctx, scan)
Score-->>Runner: DerivedPassResult(Scoring@1.0)
Runner->>TopN: run(ctx, scan)
TopN-->>Runner: DerivedPassResult(TopN@1.0)
Runner->>Summary: run(ctx, scan)
Summary-->>Runner: DerivedPassResult(Summary@1.0)
Runner-->>Main: scan with derived context
Main-->>CLI: output artifacts
Scoring execution strategy
ScoringPass switches strategy based on finding count to minimize overhead at small scale and maximize throughput at large scale.
flowchart TD
A[Findings loaded] --> B{count < 100?}
B -- yes --> C[Sequential scoring]
B -- no --> D{count >= 20000?}
D -- no --> E[ThreadPool scoring]
D -- yes --> F[ProcessPool scoring]
F --> G{pool failure?}
G -- yes --> E
G -- no --> H[Merge and finalize]
C --> H
E --> H
Data flow guarantees
- Passes produce
DerivedPassResultinstead of mutating schema shape ad hoc - Derived outputs are namespaced by pass name/version
- Core findings remain available; derived artifacts are additive
- Output layers can consume either raw or derived-enriched context
Adding a new pass
- Implement a class matching
Passprotocol - Return stable
nameand semanticversion - Emit typed output payload inside
DerivedPassResult - Add to pass list in orchestrator
- Add contract and determinism tests
Common pass pitfalls
- Avoid non-serializable objects in process-pool worker inputs
- Use explicit tie-breakers in heaps when payloads include dicts/lists
- Keep worker functions top-level for pickle compatibility
- Preserve deterministic ordering where output is ranked