VulnParse-Pin

Benchmarks

This document summarizes observed performance behavior for VulnParse-Pin v1.0.0rc.1 under high-volume test scenarios.

Test environment

Date baseline: March 2026
Platform: Windows Server 2025
Python: 3.14
CPU profile: 8-core class system

Nessus scale benchmark (NVD optimization focus)

Dataset profile

Baseline file: nessus_expanded_200.xml
Baseline findings: 400
Large-scale file: nessus_benchmark_50k.xml
Large-scale findings: 50,000
Unique CVEs across both: 338

Runtime comparison

Baseline runtime: ~20s
50k runtime: ~194.59s
Input growth: 125x
Runtime growth: 9.73x

Result: sublinear scaling and significant throughput gains at scale.

Throughput signal

Baseline: ~20 findings/sec
50k run: ~257 findings/sec
Effective throughput improvement: ~12.8x

NVD optimization impact

Observed NVD phase remained near-flat relative to finding growth due to:

Streaming/filtered feed handling
CVE-targeted indexing
Early termination behavior
Parallel feed parsing

This decoupled major NVD cost from raw finding count when CVE distribution was stable.

OpenVAS high-volume stress benchmark

Mode used: offline with large-input allowance and low log verbosity.

Dataset sizes

openvas_real_stress_20k.xml — 20,000 findings (~10.48 MB)
openvas_real_stress_100k.xml — 100,000 findings (~52.41 MB)
openvas_real_stress_700k.xml — 700,000 findings (~367.10 MB)

Runtime results

20k: ~48.49s wall clock
100k: ~42.86s wall clock
700k: ~210.65s wall clock (3m 30.65s)

All runs completed with exit code 0.

Interpretation

NVD-related work can remain bounded when unique-CVE cardinality is constrained
At very large scales, bottlenecks shift toward scoring, ranking, and serialization
Output and I/O overhead become proportionally more significant as payloads grow

Projection guidance

Based on observed behavior and current architecture:

100k findings: practical for single-machine workflows
1M findings: feasible but likely dominated by scoring and output costs

Use this as directional guidance, not a guaranteed SLA.

Benchmarking best practices

Benchmark with production-like CVE diversity
Isolate cold-cache vs warm-cache runs
Record phase timing, not only total time
Keep configuration snapshots for reproducibility