Caching Deep Dive
This page describes how VulnParse-Pin caches enrichment feeds, validates cache integrity, applies TTL policy, and uses the NVD SQLite index for warm-run acceleration.
Scope
Primary implementation files:
src/vulnparse_pin/utils/feed_cache.pysrc/vulnparse_pin/utils/nvdcacher.pysrc/vulnparse_pin/app/runtime_helpers.pysrc/vulnparse_pin/app/bootstrap.pysrc/vulnparse_pin/resources/config.yaml
Cache architecture
VulnParse-Pin uses two cache systems:
- Feed cache files managed by
FeedCacheManager. - Optional NVD SQLite index managed by
NVDFeedCache.
The feed cache is authoritative for downloaded KEV, EPSS, Exploit-DB, and NVD feed archives. The SQLite index is a performance layer built from NVD feed data.
Feed cache internals (FeedCacheManager)
FeedCacheManager is instantiated in bootstrap via FeedCacheManager.from_ctx(...) and receives:
cache_dir- PFH instance (
ctx.pfh) for path-safe I/O - logger
- feed specs
- feed TTL policy (
FeedCachePolicy)
Feed identity and path model
For non-NVD feeds, each feed key resolves to:
- data file (for example
epss_cache.csv) - checksum sidecar (
.sha256) - metadata sidecar (
.meta.json) - integrity sidecar (
.integrity.json)
For NVD feeds, keys use the namespace:
nvd.modifiednvd.year.YYYY
Sidecar purpose
.sha256: canonical content digest for the data file..meta.json: timestamps and source/validation metadata (created_at,last_updated, source URL, validation status)..integrity.json: digest signature envelope (mode,digest,signature,updated_at).
Integrity modes for feed files
FeedCacheManager supports two integrity modes for .integrity.json:
sha256mode:signature == digesthmac-sha256mode: signature isHMAC(secret, digest)
The mode is controlled by environment variable presence.
HMAC environment variables
VP_FEED_CACHE_HMAC_KEY
- Used by
FeedCacheManager._feed_integrity_secret(). - If set, feed integrity sidecars are signed with HMAC-SHA256.
- If unset, feed integrity falls back to plain SHA256 signature mode.
VP_SQLITE_HMAC_KEY
- Used by
NVDFeedCache._sqlite_secret(). - If set, SQLite signature sidecar is HMAC-SHA256.
- If unset, SQLite signature mode remains plain SHA256.
Operational recommendation
In production, set both variables to high-entropy secrets and rotate with standard secret-management policy.
TTL behavior
TTL is derived from FeedCachePolicy created by build_feed_cache_policy(config).
Global structure (config.yaml)
feed_cache.defaults.ttl_hoursfeed_cache.ttl_hours.<feed_key>
Examples of feed keys in ttl_hours:
kevepssexploit_dbnvd_yearlynvd_modified
TTL semantics in FeedCacheManager.is_fresh()
ttl == 0: always stale.ttl < 0: never expires.ttl > 0: cache is fresh when age in hours is less than or equal to TTL.
age is calculated from .meta.json using last_updated (fallback created_at).
Failure cases
is_fresh() returns stale when:
- metadata file is missing
- metadata timestamp is missing
- timestamp parsing fails
Checksum and integrity lifecycle
ensure_feed_checksum(key, allow_regen=...) validates or regenerates feed state:
- Validate
.sha256against current data digest. - Validate
.integrity.jsondigest and signature. - If files are missing or invalid and
allow_regenis true, regenerate local state. - If hard mismatch and
allow_regenis false, fail fast.
This supports strict integrity workflows in offline mode and best-effort recovery when explicitly enabled.
NVD feed policy and plan
NVD settings are normalized by nvd_policy_from_config(config).
Policy fields include:
enabledttl_yearlyttl_modifiedstart_yearend_yearsqlite_enforce_permissionssqlite_max_age_hourssqlite_max_rows
nvd_feed_plan(config) emits feed plan entries:
- Modified feed (
nvd.modified) - Yearly feeds from
start_yearthroughend_year
NVD policy loader also supports legacy key fallbacks and bounds invalid ranges.
NVD SQLite index (NVDFeedCache)
NVDFeedCache initializes in bootstrap unless --no-nvd is set.
Index files
- SQLite DB:
nvd_cache.sqlite3 - Signature sidecar:
nvd_cache.sqlite3.sig.json
Security controls
- Signature verification at startup (
_sqlite_verify_signature). - Optional HMAC signatures via
VP_SQLITE_HMAC_KEY. - Permission checks/hardening via
_sqlite_harden_permissions. - CVE ID validation regex before lookup/upsert paths.
- Quarantine and reset on integrity failure (
_sqlite_quarantine_and_reset).
Quarantine behavior
On tamper or signature failure, the index and signature files are moved to timestamped .tampered.<UTCSTAMP> files, then a clean index is rebuilt.
Pruning policy
_sqlite_prune() enforces:
- max row age (
sqlite_max_age_hours) - max retained rows (
sqlite_max_rows)
This keeps warm cache size bounded over long runtimes.
NVD parsing and performance
NVD feed parsing uses:
ijsonstreaming when availablejson.loadfallback ifijsonis unavailable
Feed processing can run in parallel for multi-feed operations and is filtered by target CVEs and year ranges to reduce parse overhead.
CLI controls that impact cache behavior
--refresh-cache--allow_regen--no-kev--no-epss--no-exploit--kev-source online|offline--epss-source online|offline--exploit-source online|offline--kev-feed [PATH|URL]--epss-feed [PATH|URL]--no-nvd
Failure modes and expected behavior
- Missing cache in offline mode: fail with explicit missing-feed error.
- Corrupt feed sidecar: regenerated only when allowed by
--allow_regen. - SQLite signature mismatch: quarantine and rebuild path is triggered.
- Remote NVD metadata unavailable: local cache can still be used depending on freshness and mode.