Benchmarks

This document summarizes observed performance behavior for VulnParse-Pin v1.0.0rc.1 under high-volume test scenarios.

Public benchmark update (April 22, 2026)

This section is intended for external sharing. It summarizes the latest end-to-end runs, each of which produced the same output set: JSON, CSV, executive Markdown, technical Markdown, and runmanifest.

Consolidated benchmark table

Scenario	Input	Assets	Findings	Scored findings	Scoring coverage	Enriched findings	Enrichment coverage	Runtime	Throughput
Baseline lab (101)	Nessus	2	101	5	4.95%	4	3.96%	3.333s	30.30 findings/s
Lab scaled 5k	Nessus	10	5,000	5,000	100.00%	2,236	44.72%	11.663s	428.69 findings/s
OpenVAS stress 20k	OpenVAS	20	20,000	140	0.70%	20,000	100.00%	24.796s	806.57 findings/s

Source artifact: tests_output/public_benchmark_comparison_apr22_2026.csv
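The coverage and throughput columns are derived from the raw counts and runtimes, so they can be rechecked by hand. The following is a minimal Python sketch, assuming the values are copied from the table rather than parsed from the CSV (whose column names are not listed here). The `Scenario` type and the helper names are hypothetical. Because published runtimes are rounded to the millisecond, the throughput check allows for that rounding instead of demanding an exact match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    findings: int
    scored: int
    enriched: int
    runtime_s: float   # published runtime, rounded to the nearest millisecond
    throughput: float  # published findings/s, rounded to two decimals


def pct(part: int, whole: int) -> float:
    """Coverage as a percentage, rounded to two decimals like the table."""
    return round(100.0 * part / whole, 2)


def throughput_consistent(s: Scenario) -> bool:
    """True if the published throughput fits the published runtime.

    The true runtime lies within +/-0.0005 s of the rounded value, and the
    published throughput is itself rounded to +/-0.005 findings/s.
    """
    lo = s.findings / (s.runtime_s + 0.0005)
    hi = s.findings / (s.runtime_s - 0.0005)
    return lo - 0.005 <= s.throughput <= hi + 0.005


SCENARIOS = [
    Scenario("Baseline lab (101)", 101, 5, 4, 3.333, 30.30),
    Scenario("Lab scaled 5k", 5_000, 5_000, 2_236, 11.663, 428.69),
    Scenario("OpenVAS stress 20k", 20_000, 140, 20_000, 24.796, 806.57),
]

for s in SCENARIOS:
    print(
        f"{s.name}: scoring {pct(s.scored, s.findings):.2f}%, "
        f"enrichment {pct(s.enriched, s.findings):.2f}%, "
        f"throughput consistent: {throughput_consistent(s)}"
    )
```

A naive `findings / runtime` comparison would flag the 5k row (5,000 / 11.663 is about 428.71), which is why the rounding tolerance matters.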

What this shows (public-facing value)

Throughput rose from 30.30 findings/s on the 101-finding baseline to 806.57 findings/s on the 20,000-finding OpenVAS run, and every run still produced the full set of output artifacts.
Coverage depends on the dataset, which is expected and helps explain the quality of the available risk context. The Lab 5k benchmark reaches complete scoring coverage by construction (every finding carries a CVE), while OpenVAS 20k reaches complete enrichment coverage with selective scoring coverage (0.70%).
Exposure-inference signals (rule-hit summaries and confidence buckets) remain available at scale, enabling explainable prioritization rather than reporting raw finding counts.

Feature value versus regular scanner output

Feature area	VulnParse-Pin evidence from benchmark runs	Typical regular scanner output	Public-facing implication/value
Cross-finding prioritization	Produces scored findings and ranked assets/findings across all runs; 5k run scored 5,000/5,000 findings	Usually reports per-finding severity and plugin output with limited cross-finding ranking logic	Teams can prioritize remediation by risk concentration and asset context, not just severity labels
Exposure inference traceability	Decision trace summaries include exposure confidence and rule-hit counts (for example, `private_ip`, `public_service_port_hit`, `critical_asset_hint`)	Often provides host/open-port facts but not a transparent, countable inference trace for prioritization rules	Stakeholders can audit why an asset/finding moved up in priority and defend triage decisions
Multi-artifact decision consistency	Same run emits JSON, CSV, executive and technical reports, plus runmanifest with aligned totals validated in this benchmark cycle	Output formats may exist, but consistency validation is commonly left to downstream tooling	Reduces reporting drift between technical and executive views and improves governance confidence
Actionable risk shaping	Risk-band distributions and TopN-derived context are surfaced in summary artifacts (critical/high/medium/low/info + top assets)	Regular scanner views frequently center on scanner-native severity bins without policy-aware context	Provides clearer, operations-oriented remediation sequencing and communication to leadership
Reproducible benchmark evidence	Public table and raw comparison CSV included from actual e2e runs	Benchmark narratives are often anecdotal or not tied to reusable artifacts	Improves trust in performance/value claims during customer or leadership review
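Rule-hit counts and confidence buckets are aggregates over per-finding decision traces. The sketch below shows one way such a summary could be computed. The trace shape, field names, and bucket labels are hypothetical; only the rule names come from the benchmark output.

```python
from collections import Counter

# Hypothetical decision-trace entries, one per finding. The rule names are taken
# from the benchmark summaries; the field names and bucket labels are assumptions.
traces = [
    {"finding": "f1", "confidence": "high", "rules": ["public_service_port_hit", "critical_asset_hint"]},
    {"finding": "f2", "confidence": "low", "rules": ["private_ip"]},
    {"finding": "f3", "confidence": "medium", "rules": ["private_ip", "public_service_port_hit"]},
    {"finding": "f4", "confidence": "high", "rules": ["public_service_port_hit"]},
]


def summarize(entries):
    """Count rule hits and confidence buckets across all trace entries."""
    rule_hits = Counter(rule for e in entries for rule in e["rules"])
    buckets = Counter(e["confidence"] for e in entries)
    return rule_hits, buckets


rule_hits, buckets = summarize(traces)
for rule, n in rule_hits.most_common():
    print(f"{rule}: {n}")
print(dict(buckets))
```

Because every count is derived from individual trace entries, any aggregate in a summary can be traced back to the findings that produced it.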

Benchmark claims (evidence-bound)

VulnParse-Pin processed 20,000 OpenVAS findings in 24.796s while producing JSON, CSV, executive report, technical report, and runmanifest outputs in the same run.
On the 5,000-finding Nessus benchmark, VulnParse-Pin achieved 100.00% scoring coverage (5,000 of 5,000 findings).
Across benchmark scenarios, VulnParse-Pin maintained cross-artifact numeric consistency between JSON, CSV, Markdown summaries, and runmanifest pass summaries.
VulnParse-Pin exposes explainable prioritization traces through decision trace summaries (for example, exposure confidence buckets and rule-hit counts).
Benchmark claims are backed by reproducible artifacts in tests_output/public_benchmark_comparison_apr22_2026.csv and scenario-specific profile/delta files.
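The cross-artifact consistency claim amounts to checking that every output reports the same totals. Here is a minimal sketch of such a check, assuming hypothetical schemas: a `findings` list in the JSON report, one CSV row per finding, and a `total_findings` field in the manifest. The actual VulnParse-Pin artifacts may be structured differently.

```python
import csv
import io
import json


def totals_agree(json_text: str, csv_text: str, manifest: dict) -> bool:
    """Check that JSON, CSV, and run-manifest finding totals agree.

    Hypothetical schemas: the JSON report holds a "findings" list, the CSV
    has a header plus one row per finding, and the manifest records a
    "total_findings" count.
    """
    json_total = len(json.loads(json_text)["findings"])
    csv_total = sum(1 for _ in csv.DictReader(io.StringIO(csv_text)))
    return json_total == csv_total == manifest["total_findings"]


json_report = json.dumps({"findings": [{"id": 1}, {"id": 2}, {"id": 3}]})
csv_report = "id,score\n1,9.8\n2,7.5\n3,4.3\n"
print(totals_agree(json_report, csv_report, {"total_findings": 3}))
print(totals_agree(json_report, csv_report, {"total_findings": 4}))
```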

Claim boundaries:

These numbers describe the tested benchmark workloads and environment; they are not universal guarantees for all datasets.
Coverage percentages vary by input structure, enrichment density, and CVE distribution.

Important interpretation note

Different datasets have different CVE density and structure. Coverage percentages should be interpreted as workload characteristics, not universal fixed rates. The value signal is that VulnParse-Pin keeps decision-support outputs and traceability available across both small and high-volume runs.

Test environment

Baseline date: March 2026
Platform: Windows Server 2025
Python: 3.14
CPU profile: 8-core class system

Nessus scale benchmark (NVD optimization focus)

Dataset profile

Baseline file: nessus_expanded_200.xml
Baseline findings: 400
Synthetic demo-derived file: Lab_test_scaled_5k.nessus
Synthetic demo-derived findings: 5,000 across 10 assets
Large-scale file: nessus_benchmark_50k.xml
Large-scale findings: 50,000
Unique CVEs across the baseline and large-scale files: 338

The Lab_test_scaled_5k.nessus sample is generated from the bundled Lab_test.nessus template. It assigns exactly one CVE to each finding, with CVE years spanning 2019-2025, and populates titles and plugin outputs so that parsing and enrichment behave realistically.
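The generator itself is not shown here. For readers who want a comparable synthetic workload, this is a sketch of the core assignment step under stated assumptions: findings are spread round-robin over 10 assets and the seven CVE years, and a running sequence number keeps each CVE ID unique. The asset naming and ID numbering scheme are hypothetical.

```python
from collections import Counter

YEARS = list(range(2019, 2026))  # CVE years 2019-2025 inclusive


def synthetic_findings(n: int = 5_000, n_assets: int = 10):
    """Yield (asset, cve_id) pairs with exactly one CVE per finding.

    Hypothetical scheme: round-robin over assets and CVE years, with a
    running sequence number so every CVE ID is distinct.
    """
    for i in range(n):
        asset = f"asset-{i % n_assets:02d}"
        year = YEARS[i % len(YEARS)]
        yield asset, f"CVE-{year}-{10000 + i}"


findings = list(synthetic_findings())
per_asset = Counter(asset for asset, _ in findings)
print(len(findings), per_asset["asset-00"])
```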

5k demo-derived benchmark snapshot

Input file: Lab_test_scaled_5k.nessus
Assets: 10
Findings: 5,000
Runtime: ~7.91s wall clock
Output set: JSON, CSV, executive Markdown, technical Markdown
Scoring coverage: 100% (5,000 CVSS vectors assigned and validated)

Observed summary from the benchmark run:

Known exploits: 47 findings
KEV hits: 10 findings
EPSS coverage: 2,225 / 5,000 findings (44.50%)
Enriched findings: 2,226

Runtime comparison

Baseline runtime (400 findings): ~20s
5k runtime: ~7.91s
50k runtime: ~194.59s
Input growth (baseline to 50k): 125x
Runtime growth (baseline to 50k): 9.73x

Result: runtime grew sublinearly with input size, yielding significant throughput gains at scale.
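The 125x input growth and 9.73x runtime growth can be condensed into a single empirical scaling exponent k, where runtime grows roughly as findings to the power k. This is a two-point estimate: any fixed per-run cost is folded into it, and the baseline runtime is itself approximate, so the result is indicative only.

```python
import math

# Baseline and 50k points from the runtime comparison (baseline runtime is approximate).
baseline_findings, baseline_runtime_s = 400, 20.0
large_findings, large_runtime_s = 50_000, 194.59

input_growth = large_findings / baseline_findings  # 125.0
runtime_growth = large_runtime_s / baseline_runtime_s

# Two-point fit of runtime ~ c * findings**k; k < 1 means runtime grows sublinearly.
k = math.log(runtime_growth) / math.log(input_growth)
print(f"input x{input_growth:.0f}, runtime x{runtime_growth:.4f}, k = {k:.2f}")
```

The exponent comes out near 0.47, well below the linear value of 1.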

Throughput signal

Baseline: ~20 findings/sec
5k run: ~632 findings/sec
50k run: ~257 findings/sec
Effective throughput improvement (baseline to 50k): ~12.8x
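The throughput figures follow directly from the runtime comparison. The improvement factor equals the input growth divided by the runtime growth (125 / 9.73). The short Python sketch below reproduces them from the listed findings and approximate runtimes.

```python
import math

# Findings and approximate wall-clock runtimes from the runtime comparison.
runs = {
    "baseline": (400, 20.0),
    "5k": (5_000, 7.91),
    "50k": (50_000, 194.59),
}

throughput = {name: findings / seconds for name, (findings, seconds) in runs.items()}
for name, rate in throughput.items():
    print(f"{name}: {rate:.0f} findings/s")

# The throughput ratio equals input growth divided by runtime growth.
improvement = throughput["50k"] / throughput["baseline"]
input_growth = runs["50k"][0] / runs["baseline"][0]
runtime_growth = runs["50k"][1] / runs["baseline"][1]
assert math.isclose(improvement, input_growth / runtime_growth)
print(f"improvement: {improvement:.1f}x")
```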

NVD optimization impact

The NVD phase stayed nearly flat as finding counts grew, due to:

Streaming/filtered feed handling
CVE-targeted indexing
Early termination behavior
Parallel feed parsing

Together, these decoupled most of the NVD cost from the raw finding count, as long as the CVE distribution stayed stable.
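The filtering, indexing, and early-termination ideas can be illustrated with a small sketch. It assumes a hypothetical record shape (`{"id": ...}`) and illustrates the technique rather than reproducing VulnParse-Pin's NVD pipeline. Parallel feed parsing is omitted. The key property is that lookup work depends on the set of unique CVEs the findings reference, not on how many findings reference them. Early termination only pays off when the wanted CVEs appear before the end of the feed.

```python
from typing import Iterable


def build_cve_index(feed: Iterable[dict], wanted: set[str]) -> dict[str, dict]:
    """Keep only the feed records whose CVE ID some finding references.

    The feed is consumed lazily (streaming), non-matching records are skipped
    (filtering), and iteration stops once every wanted CVE is found
    (early termination).
    """
    index: dict[str, dict] = {}
    remaining = set(wanted)
    if not remaining:
        return index
    for record in feed:
        cve_id = record["id"]
        if cve_id in remaining:
            index[cve_id] = record
            remaining.discard(cve_id)
            if not remaining:
                break
    return index


# Hypothetical stand-in for a parsed NVD feed: 1,000 records.
feed = iter([{"id": f"CVE-2024-{i:04d}"} for i in range(1_000)])

# 50,000 findings that reference only 3 unique CVEs.
cve_pool = ("CVE-2024-0002", "CVE-2024-0005", "CVE-2024-0009")
finding_cves = [cve_pool[i % 3] for i in range(50_000)]

wanted = set(finding_cves)
index = build_cve_index(feed, wanted)
unread = sum(1 for _ in feed)  # records never examined thanks to early termination
print(len(wanted), sorted(index), unread)
```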

OpenVAS high-volume stress benchmark

Mode used: offline with large-input allowance and low log verbosity.

Dataset sizes

openvas_real_stress_20k.xml — 20,000 findings (~10.48 MB)
openvas_real_stress_100k.xml — 100,000 findings (~52.41 MB)
openvas_real_stress_700k.xml — 700,000 findings (~367.10 MB)

Runtime results

20k: ~48.49s wall clock
100k: ~42.86s wall clock
700k: ~210.65s wall clock (3m 30.65s)

All runs completed with exit code 0.

Interpretation

NVD-related work can remain bounded when unique-CVE cardinality is constrained
At very large scales, bottlenecks shift toward scoring, ranking, and serialization
Output and I/O overhead become proportionally more significant as payloads grow

Projection guidance

Based on observed behavior and current architecture:

100k findings: practical for single-machine workflows
1M findings: feasible but likely dominated by scoring and output costs

Use this as directional guidance, not a guaranteed SLA.

Benchmarking best practices

Benchmark with production-like CVE diversity
Isolate cold-cache vs warm-cache runs
Record phase timing, not only total time
Keep configuration snapshots for reproducibility
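Phase timing and configuration snapshots can be captured with the standard library alone. This is a minimal sketch: the phase names, the stand-in workloads, and the settings keys (including a `cache` field used to tag cold versus warm runs) are hypothetical placeholders, not VulnParse-Pin options.

```python
import json
import platform
import sys
import time
from contextlib import contextmanager


@contextmanager
def phase(timings: dict[str, float], name: str):
    """Accumulate wall-clock seconds spent in a named phase."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)


def config_snapshot(settings: dict) -> str:
    """Serialize run settings plus interpreter/platform details for reproducibility."""
    return json.dumps(
        {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
            "settings": settings,
        },
        sort_keys=True,
    )


timings: dict[str, float] = {}
with phase(timings, "parse"):
    rows = [str(i) for i in range(100_000)]  # stand-in for parsing
with phase(timings, "score"):
    total = sum(len(r) for r in rows)  # stand-in for scoring

snapshot = config_snapshot({"mode": "offline", "cache": "warm"})
print(timings)
print(snapshot)
```

Storing the cache state in the snapshot lets cold-cache and warm-cache runs be separated when comparing results later.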