Scoring and Prioritization Deep Dive
This page details how VulnParse-Pin computes risk, executes the scoring pass, and performs TopN asset/finding prioritization with inference.
Scope
Primary implementation files:
- src/vulnparse_pin/core/passes/Scoring/scoringPass.py
- src/vulnparse_pin/core/classes/scoring_pol.py
- src/vulnparse_pin/core/passes/TopN/topn_pass.py
- src/vulnparse_pin/core/passes/TopN/TN_triage_semantics.py
- src/vulnparse_pin/core/passes/TopN/TN_triage_config.py
- src/vulnparse_pin/core/classes/pass_classes.py
- src/vulnparse_pin/resources/scoring.json
- src/vulnparse_pin/resources/tn_triage.json
Triage methodology baseline
By default, VulnParse-Pin applies a triage model that puts real-world impact probability first:
- prioritize findings and assets with the highest near-term likelihood of meaningful operational impact
- treat exploitability and exposure evidence as first-order decision signals
- use chain/pathway context as decision support, with bounded influence
This baseline should be tuned for your environment. Adjust scoring and TopN policy to match business goals, risk tolerance, and regulatory obligations.
Pass dependency chain
Default pass order:
- Scoring@2.0
- TopN@1.0
- Summary@1.0
TopNPass depends on ScoringPass output from derived context. If scoring output is missing, TopN cannot rank.
Scoring policy model
ScoringPolicyV1 is loaded from scoring.json using load_score_policy(...) and supplemented by nmap_ctx.scoring_port_bonus from config.yaml.
Core policy groups:
- EPSS normalization (scale, min, max)
- evidence points (kev, exploit)
- band thresholds (critical, high, medium, low)
- weights (epss_high, epss_medium, kev, exploit)
- risk ceiling (max_raw_risk, max_operational_risk)
- finding-level CVE aggregation (finding_cve_score, finding_cve_decay, finding_cve_max_contributors)
- Nmap context (nmap_port_bonus, default 0.0)
Bootstrap validates monotonic bands and non-negative constraints before pass execution.
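As an illustration of the kind of check described above, here is a minimal sketch of band validation. The BandThresholds container and validate_bands function are hypothetical names, not taken from the project, and the sketch assumes "monotonic" means critical >= high >= medium >= low.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BandThresholds:
    # Hypothetical container for illustration; the real policy class may differ.
    critical: float
    high: float
    medium: float
    low: float


def validate_bands(bands: BandThresholds) -> None:
    """Reject negative thresholds and any ordering other than
    critical >= high >= medium >= low (assumed meaning of 'monotonic')."""
    ordered = [bands.critical, bands.high, bands.medium, bands.low]
    if any(value < 0 for value in ordered):
        raise ValueError("band thresholds must be non-negative")
    if any(upper < lower for upper, lower in zip(ordered, ordered[1:])):
        raise ValueError("band thresholds must satisfy critical >= high >= medium >= low")
```

Failing fast at bootstrap keeps an inconsistent policy from producing misleading bands partway through a run.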
Scoring calculation mechanics
Scoring computation in scoringPass.py:
- Build a per-CVE preview for every retained finding.cve_analysis record using CVSS, EPSS, KEV, and exploit signals.
- Sort contributor CVEs by descending raw preview score.
- Aggregate contributor raw scores with bounded decay (1.0, decay, decay^2, ...) up to finding_cve_max_contributors.
- Fall back to scanner/enrichment rollup fields when no per-CVE analysis is present.
- Add Nmap port bonus when a confirmed open port matches the finding's service port and scoring_port_bonus > 0.
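The steps above can be sketched roughly as follows. This is not the code from scoringPass.py: the function name, its parameters, and the single rollup_score fallback value are assumptions made for illustration. Only the decay sequence (1.0, decay, decay^2, ...), the contributor cap, and the port bonus condition come from the description.

```python
from typing import Collection, Optional, Sequence


def raw_finding_score(
    cve_previews: Sequence[float],
    rollup_score: float,
    decay: float,
    max_contributors: int,
    service_port: Optional[int] = None,
    confirmed_open_ports: Collection[int] = (),
    port_bonus: float = 0.0,
) -> float:
    """Hypothetical helper: combine per-CVE preview scores into one raw score."""
    if cve_previews:
        # Highest previews first, capped at max_contributors.
        ranked = sorted(cve_previews, reverse=True)[:max_contributors]
        # Position i is weighted by decay**i, so the top CVE counts in full.
        score = sum(preview * decay**i for i, preview in enumerate(ranked))
    else:
        # No per-CVE analysis: fall back to the scanner/enrichment rollup.
        score = rollup_score
    # The bonus applies only when it is enabled and the port is confirmed open.
    if port_bonus > 0 and service_port is not None and service_port in confirmed_open_ports:
        score += port_bonus
    return score
```

For example, previews of 8.0, 4.0, and 2.0 with decay 0.5 and a cap of two contributors give 8.0 + 4.0 * 0.5 = 10.0; the third CVE is dropped by the cap.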
raw_score is the composite pre-normalization score.
operational_score is normalized by max_raw_risk and clamped to max_operational_risk.
Risk band is assigned from raw score thresholds.
score_trace captures the full contributor list, per-CVE contribution weights, display CVE, union flags, and final scoring summary. This trace is stored in both assets[].findings[].score_trace and derived["Scoring@2.0"].scored_findings[*].score_trace.
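The normalization and banding rules in this section can be illustrated with a short sketch. The exact formula is not given on this page, so the linear scaling below is an assumption, as are the function names and the "unbanded" label for scores under the low threshold.

```python
from typing import Mapping


def operational_score(raw: float, max_raw_risk: float, max_operational_risk: float) -> float:
    """Assumed form: scale raw linearly onto [0, max_operational_risk], then clamp."""
    if max_raw_risk <= 0:
        raise ValueError("max_raw_risk must be positive")
    scaled = (raw / max_raw_risk) * max_operational_risk
    return max(0.0, min(scaled, max_operational_risk))


def risk_band(raw: float, thresholds: Mapping[str, float]) -> str:
    """Return the most severe band whose threshold the raw score reaches."""
    for band in ("critical", "high", "medium", "low"):
        if raw >= thresholds[band]:
            return band
    return "unbanded"  # hypothetical label, not taken from the project
```

Under these assumptions, a raw score of 12.0 with max_raw_risk 24.0 and max_operational_risk 10.0 maps to 5.0, and any raw score above 24.0 is clamped to 10.0.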
Scoring execution strategy and thresholds
ScoringPass selects execution path by workload:
- sequential mode for small workloads
- thread-pool mode above parallel_threshold (default 100)
- process-pool mode above process_pool_threshold (default 20_000)
Additional controls:
- min_findings_per_worker (default 50)
- optional worker override via process_workers
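A minimal sketch of how such a selection could look is shown below. The function names are hypothetical. Treating "above" as strictly greater than, and using min_findings_per_worker to cap the worker count, are both assumptions rather than documented behavior.

```python
from typing import Optional


def select_mode(
    n_findings: int,
    parallel_threshold: int = 100,
    process_pool_threshold: int = 20_000,
) -> str:
    """Pick an execution path from the workload size (strict '>' is assumed)."""
    if n_findings > process_pool_threshold:
        return "process"
    if n_findings > parallel_threshold:
        return "thread"
    return "sequential"


def worker_count(
    n_findings: int,
    cpu_count: int,
    min_findings_per_worker: int = 50,
    process_workers: Optional[int] = None,
) -> int:
    """Assumed sizing rule: honor an explicit override, otherwise avoid
    spawning workers that would each receive fewer than the minimum share."""
    if process_workers is not None:
        return max(1, process_workers)
    return max(1, min(cpu_count, n_findings // min_findings_per_worker))
```

With the defaults, 100 findings run sequentially, 101 use threads, and 20_001 use the process pool.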
Scoring optimizations
- Plugin attribute cache (_build_plugin_cache) avoids repeated attribute lookups.
- Signature memo cache reuses repeated score computations for equivalent signal tuples.
- Process worker payloads are serialization-safe plain structures.
These are performance optimizations only; output semantics remain deterministic.
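To illustrate the signature memo idea, here is a small sketch. The SignatureMemo class and the tuple layout (cvss, epss, kev, exploit) are assumptions for illustration; the project's actual cache key may differ.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumed signature layout: (cvss, epss, kev, exploit).
Signature = Tuple[Optional[float], Optional[float], bool, bool]


class SignatureMemo:
    """Cache scores by signal tuple so equivalent findings are computed once."""

    def __init__(self, compute: Callable[[Signature], float]) -> None:
        self._compute = compute
        self._cache: Dict[Signature, float] = {}
        self.hits = 0

    def score(self, signature: Signature) -> float:
        if signature in self._cache:
            self.hits += 1
            return self._cache[signature]
        value = self._compute(signature)
        self._cache[signature] = value
        return value
```

Because the cached value depends only on the signal tuple, the result is the same whether it comes from the cache or from a fresh computation, which is what keeps the output deterministic.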
Whole-of-CVEs trace model
For findings that carry cve_analysis, ScoringPass no longer uses the display or authoritative CVE as the score source. Instead:
- display_cve remains presentation-oriented and is retained in the trace for operator continuity.
- primary_cve is the highest-impact contributor after sorting.
- contributors[*].raw_contribution shows the exact bounded impact of each CVE on the final finding score.
- contributors[*].aggregation_weight shows the decay weight applied to that CVE.
This preserves auditability without requiring downstream consumers to recompute the aggregation logic.
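The sketch below shows one way a trace with these fields could be assembled. The field names display_cve, primary_cve, contributors, raw_contribution, and aggregation_weight come from the description above; the "cve" and "raw_score" keys, the function name, and the tie-break on CVE id are assumptions.

```python
from typing import Any, Dict, Mapping


def build_score_trace(
    display_cve: str,
    previews: Mapping[str, float],
    decay: float,
    max_contributors: int,
) -> Dict[str, Any]:
    """previews maps a CVE id to its raw preview score."""
    # Sort by descending preview score; break ties on CVE id for determinism.
    ranked = sorted(previews.items(), key=lambda item: (-item[1], item[0]))
    contributors = []
    for position, (cve_id, raw_preview) in enumerate(ranked[:max_contributors]):
        weight = decay**position
        contributors.append(
            {
                "cve": cve_id,
                "aggregation_weight": weight,
                "raw_contribution": raw_preview * weight,
            }
        )
    return {
        "display_cve": display_cve,
        "primary_cve": contributors[0]["cve"] if contributors else None,
        "contributors": contributors,
        "raw_score": sum(c["raw_contribution"] for c in contributors),
    }
```

Note that display_cve and primary_cve can differ: in a finding displayed as "CVE-A" whose highest preview belongs to "CVE-B", the trace keeps "CVE-A" for display while "CVE-B" leads the contributor list.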
TopN configuration model
TopN policy is loaded from tn_triage.json and normalized into TNTriageConfig.
TopNConfig controls:
- rank_basis (raw or operational)
- decay vector
- k (derived from decay length)
- max_assets
- max_findings_per_asset
- include_global_top_findings
- global_top_findings_max
InferenceConfig controls:
- confidence thresholds (low < medium < high)
- public_service_ports
- allow_predicates
- rule set
Semantic validation enforces invariants and rejects invalid structures.
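A few of these invariants can be expressed as a short check. This is a sketch, not the code in TN_triage_semantics.py: validate_topn_config is a hypothetical name, and it covers only the decay profile (first weight 1.0, non-increasing), the derivation of k from the decay length, and the threshold ordering.

```python
from typing import Mapping, Sequence


def validate_topn_config(decay: Sequence[float], thresholds: Mapping[str, float]) -> int:
    """Validate the decay vector and confidence thresholds; return k."""
    if not decay:
        raise ValueError("decay must contain at least one weight")
    if decay[0] != 1.0:
        raise ValueError("decay[0] must be 1.0")
    if any(later > earlier for earlier, later in zip(decay, decay[1:])):
        raise ValueError("decay must be monotonic non-increasing")
    if not thresholds["low"] < thresholds["medium"] < thresholds["high"]:
        raise ValueError("confidence thresholds must satisfy low < medium < high")
    return len(decay)  # k is derived from the decay length
```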
Nmap context tiebreak
When nmap_ctx.port_tiebreak_enabled is true (default), TopN ranking adds an Nmap confirmation flag (nmap_hit for findings, nmap_confirmed for assets) to all three sort keys:
- Finding sort within asset: (-score, -nmap_hit, finding_id)
- Asset sort: (-score, -crit_high, -crit_rank, -scorable_count, -nmap_confirmed, asset_id)
- Global finding sort: (-score, -nmap_hit, asset_id, finding_id)
This promotes findings and assets with confirmed open ports within equal-score groups without altering numeric scores. The tiebreak is disabled when the pass returns an empty port index (no --nmap-ctx provided, or port_tiebreak_enabled: false).
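The per-asset finding sort can be reproduced with a plain tuple key. In this sketch the dictionary shape of a finding and the rank_findings helper are assumptions; only the key layout (-score, -nmap_hit, finding_id) is taken from the list above.

```python
from typing import Any, Dict, List


def rank_findings(findings: List[Dict[str, Any]], tiebreak_enabled: bool = True) -> List[Dict[str, Any]]:
    """Sort findings by descending score, then Nmap confirmation, then id."""

    def sort_key(finding: Dict[str, Any]):
        # With the tiebreak disabled, every finding gets the same flag value,
        # so ordering falls through to finding_id.
        nmap_hit = int(bool(finding.get("nmap_hit"))) if tiebreak_enabled else 0
        return (-finding["score"], -nmap_hit, finding["finding_id"])

    return sorted(findings, key=sort_key)
```

Given two findings that both score 8.0, the one with a confirmed open port sorts first when the tiebreak is enabled; when it is disabled, the lower finding_id wins. A higher-scoring finding always ranks above both, because the score is the first key element.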
See Nmap Context Deep Dive for full configuration details and operational guidance.
TopN ranking flow
TopN processing stages:
- Load scoring output.
- Build per-asset finding index (backed by PostEnrichmentIndex for O(1) lookups when available).
- Collect per-asset observations including IP, open ports, hostname, and criticality.
- Compute exposure inference per asset.
- Rank findings per asset using rank basis.
- Rank assets with top-k/decay weighting and criticality tie-breaking.
- Build optional global top findings.
- Trim to configured max assets and finding limits.
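The asset ranking stage can be approximated as follows. This page does not give the exact asset score formula, so the weighted sum of the top-k finding scores is an assumption, and the sort key is simplified to (-score, asset_id), omitting the criticality and Nmap terms. Function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def asset_score(finding_scores: Sequence[float], decay: Sequence[float]) -> float:
    """Assumed form: weight the top-k finding scores by the decay vector,
    where k is the length of the decay vector."""
    top_k = sorted(finding_scores, reverse=True)[: len(decay)]
    return sum(weight * score for weight, score in zip(decay, top_k))


def rank_assets(
    scores_by_asset: Dict[str, Sequence[float]],
    decay: Sequence[float],
    max_assets: int,
) -> List[Tuple[str, float]]:
    """Score every asset, sort by descending score, and trim to max_assets."""
    scored = [(asset_id, asset_score(scores, decay)) for asset_id, scores in scores_by_asset.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:max_assets]
```

With decay [1.0, 0.5, 0.25], an asset whose findings score 9.0, 6.0, and 4.0 gets 9.0 + 3.0 + 1.0 = 13.0 and outranks an asset with a single finding at 10.0, which shows how the decay profile trades off one severe finding against several moderate ones.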
Inference semantics
Inference rules are predicate-driven and weighted. Predicates can include port- and token-based evidence.
Confidence tiering is based on cumulative weighted evidence and configured thresholds.
Public service port configuration is central to externally exposed asset inference.
Supported predicate forms:
- ip_is_public matches assets with a routable IP address
- ip_is_private matches RFC 1918 / private IP addresses
- any_port_in_public_list matches if any open port appears in the public_service_ports list
- port_in:[p1,p2,...] matches if any open port is in the supplied list
- hostname_contains_any:[t1,t2,...] matches if the hostname contains any of the supplied tokens
- criticality_is:[extreme|high|medium|low] matches based on the asset's enriched criticality classification
Criticality is sourced from asset.criticality and is populated at index-build time via PostEnrichmentIndex.
Assets with extreme or high criticality receive additional exposure weighting from the built-in critical_asset_hint rule.
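The predicate forms and confidence tiering can be sketched as below. The predicate names come from the list above, but the parsing rules, the asset and rule dictionary shapes, the "none" tier, and the use of Python's ipaddress flags (whose is_private covers more than RFC 1918) are all assumptions.

```python
import ipaddress
import re
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def evaluate_predicate(predicate: str, asset: Mapping[str, Any], public_service_ports: Iterable[int]) -> bool:
    """Evaluate one predicate string against an asset observation."""
    name, _, arg = predicate.partition(":")
    values = [v.strip() for v in re.split(r"[,|]", arg.strip("[]")) if v.strip()]
    ip = ipaddress.ip_address(asset["ip"])
    ports = set(asset.get("open_ports", ()))
    if name == "ip_is_public":
        return ip.is_global
    if name == "ip_is_private":
        return ip.is_private  # broader than RFC 1918 (also loopback, link-local, ...)
    if name == "any_port_in_public_list":
        return bool(ports & set(public_service_ports))
    if name == "port_in":
        return bool(ports & {int(v) for v in values})
    if name == "hostname_contains_any":
        hostname = str(asset.get("hostname", "")).lower()
        return any(token.lower() in hostname for token in values)
    if name == "criticality_is":
        return asset.get("criticality") in values
    raise ValueError(f"unsupported predicate: {predicate}")


def infer_exposure(
    rules: List[Dict[str, Any]],
    asset: Mapping[str, Any],
    public_service_ports: Iterable[int],
    thresholds: Mapping[str, float],
) -> Tuple[float, str]:
    """Sum the weights of matching rules and map the total to a tier."""
    ports = list(public_service_ports)
    total = sum(r["weight"] for r in rules if evaluate_predicate(r["predicate"], asset, ports))
    for tier in ("high", "medium", "low"):
        if total >= thresholds[tier]:
            return total, tier
    return total, "none"  # hypothetical label for evidence below the low threshold
```

In this sketch each matching rule adds its weight once, so an asset with a public IP, a listed service port, and a suggestive hostname accumulates evidence from all three rules before the tier is chosen.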
TopN execution strategy
TopNPass uses process-pool parallelism when total findings exceed process_pool_threshold (default 20_000).
Parallel workers process serializable chunks and return ranked/inference payloads for merge.
Output contract
Scoring output includes:
- per-finding scored records
- per-asset score map
- coverage metrics and aggregate summaries
TopN output includes:
- ranked assets
- ranked findings by asset
- optional global top findings
- rank basis, decay, and k metadata
Derived outputs are append-only under versioned pass keys.
Tuning guidance
- Adjust scoring bands and weights only with representative dataset validation.
- Keep decay[0] == 1.0 and a monotonic non-increasing decay profile.
- Prefer rank_basis = operational for policy-normalized triage workflows.
- Increase process-pool thresholds only when profiling confirms overhead dominates.