Blueprints: Example Vulnerability Management Tooling Architecture From Reactive to Optimised

Christopher Clarkson
May 12

A companion reference for the CAXA Technologies Security Operations Series

Most organisations have the wrong tooling problem. They invest in enterprise platforms before the processes that would make those platforms useful are stable, or they stay on manual workflows long after volume has made them unworkable. The result is either sophisticated tooling operating at L2 capability, or an L3 programme constrained by L2 integration. Neither failure mode is uncommon, and neither is solved by buying more software.

This reference extends Episode 11's maturity model into tooling decisions. Episode 4's operating model established that technology is one dimension of three required to deliver the Five Pillars, and that tooling enables rather than replaces people and process. This companion answers the question that episode left open: what specifically should the technology layer be?

The structure follows the four maturity levels. Compliance evidence at each level is a consequence of the tooling decisions, not a selection criterion. AI-assisted vulnerability discovery tools are accelerating finding velocity and compressing the window between publication and working exploit; the programme's maturity level now determines whether it can respond at that speed.

The four levels:

L1 Reactive: Ad hoc use of whatever is licensed. The gap is process and integration, not technology.
L2 Defined and Documented: Scheduled scanning, a maintained asset inventory, structured finding management, and SLAs that are defined, even if they are enforced manually.
L3 Measured and Risk-Led: EPSS enrichment, CNAPP for cloud-native workloads, continuous SCA monitoring, and automated aggregation. Prioritisation is data-driven.
L4 Optimised and Continuous: Automation handles defined remediation classes. Evidence is generated continuously. Compliance frameworks become lagging indicators.

The tools named at each level are what this practice has found to work in production at that maturity level, chosen because they can be deployed without a large supporting project. They are not endorsements. Strong alternatives exist at every level and are noted where relevant. The integration requirement at each level is the load-bearing constraint; the specific product that satisfies it is a secondary choice.

L1 Reactive: a diagnostic, not a destination

L1 is a self-assessment baseline. Episode 11 defined what L1 looks like in programme terms. In tooling terms, L1 is characterised by ad hoc use of whatever is licensed.

Most L1 organisations have a scanner. The scanner is not the constraint. What is absent:

A scan cadence that runs without manual intervention
A CMDB maintained well enough to define consistent scan scope
A ticketing system connected to scanner output
SLAs, MTTR measurement, or exception records

Nessus Professional or Qualys VMDR may be licensed but not continuously operated. Windows Update or package manager patch checking may be the entirety of vulnerability coverage for some systems. The gap is not the technology; it is the process and integration that make technology produce consistent, useful output.

The signal that an organisation is ready to move to L2 is simple: someone has been assigned to own the programme, a scan cadence has been agreed, and there is a defined place findings will be tracked. The tooling requirements at L2 follow from those commitments.

L2 Defined and Documented: the minimum viable stack

Scanning

The right scanner at L2 depends on scale. For small estates, OpenVAS via Greenbone Community Edition is a credible free-tier option. It supports authenticated scanning, has no per-IP licence cost, and is adequate for quarterly cadence. Its limitations become relevant at L3: the community NVT feed has narrower coverage than the commercial Enterprise feed, there is no native EPSS integration, and cloud-native scanning requires a separate mechanism regardless.

For organisations expecting to reach L3, Tenable.io is the practical starting point. It scales without re-platforming, includes EPSS scoring natively, and provides cloud connectors for AWS, GCP, and Azure. The investment is justified if the programme is expected to grow.

Cloud asset discovery at L2 does not require an additional tool. AWS Config, Azure Resource Graph, and GCP Asset Inventory are native, free, and should be the source of truth for cloud assets from this level.

Asset inventory

A spreadsheet-based CMDB is the honest L2 tool. The required fields are minimal: hostname or IP, owner, tier (1/2/3), OS, last scan date. The asset tier column is load-bearing. It is the enrichment field that makes prioritisation tractable at L3. Organisations that do not maintain it consistently at L2 pay for that gap when they try to move up.
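A spreadsheet only does this job if the required fields are actually populated. The following is a minimal sketch of a completeness check over a CSV export, assuming hypothetical column names (hostname, owner, tier, os, last_scan_date) that mirror the fields listed above; adjust them to whatever the real sheet uses.

```python
import csv
import io

# Hypothetical column names mirroring the minimal L2 field list.
REQUIRED_FIELDS = ("hostname", "owner", "tier", "os", "last_scan_date")
VALID_TIERS = {"1", "2", "3"}

def validate_inventory(csv_text):
    """Return (row_number, problem) tuples for incomplete or invalid asset rows."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for field in REQUIRED_FIELDS:
            if not (row.get(field) or "").strip():
                problems.append((row_number, f"missing {field}"))
        tier = (row.get("tier") or "").strip()
        if tier and tier not in VALID_TIERS:
            problems.append((row_number, f"invalid tier {tier!r}"))
    return problems
```

Run against each export before a scan cycle, a check like this surfaces rows with no owner or an invalid tier while the gap is still cheap to close.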

Finding management

The choice at L2 is between JIRA and DefectDojo. Both work. DefectDojo is the better investment, and the case for deploying it from L2 is specific, not philosophical.

DefectDojo is an open-source application security vulnerability management platform (also available as a commercial SaaS). The feature that justifies deploying it from this level is risk acceptance with expiry: exceptions are granted with a defined expiry date, and the tooling enforces renewal or escalation. Without this, exceptions accumulate in email threads and no one can say with confidence what is still in force. I have seen this failure mode in multiple engagements. The exceptions are always there; the question is whether they are visible and owned.

DefectDojo also supports import from over 100 scanner formats natively: Nessus, Qualys, OpenVAS, Burp Suite, Semgrep, Trivy, Snyk, and many others. Adding a scanner to the programme at L3 does not require a new integration project.

Manual import from scanner to DefectDojo is acceptable at L2. Automating it comes at L3.

What L2 enables and what it produces

Scheduled scans with defined scope. Every finding tracked to closure. SLAs defined, enforced manually. Basic MTTR calculated from ticket open/close dates.

Compliance evidence produced: ISO 27001:2022 A.8.8 satisfiable with documented process, scan logs, and ticket records. NIS2 Article 21(2)(e) baseline satisfiable. PCI-DSS quarterly scan cadence (Req 11.3.1) satisfied for non-CDE systems; CDE requires authenticated scanning and structured exception management, achievable at L2 with the right configuration, but at the boundary.

The signal to move to L3

The queue grows faster than it clears. CVSS-only triage generates noise. Every finding arrives with the same urgency. The team knows some findings are being actively exploited but cannot say which without manual research. This is the constraint L3 tooling resolves.

L3 Measured and Risk-Led: the enrichment layer

The L3 investment is primarily enrichment and integration, not platform replacement. The scanner from L2 is likely the same scanner at L3. What changes is what feeds into it and what it feeds.

Scanning and EPSS enrichment

Tenable.io and Qualys VMDR both expose EPSS scores natively in their finding output. The FIRST EPSS API (api.first.org/epss) is available free for organisations running their own enrichment pipeline. Either approach produces the same result: a finding arrives at the triage layer already annotated with its exploitation likelihood, so human review is reserved for decisions requiring contextual judgement rather than sorting.

One constraint applies at every maturity level: EPSS is a trailing indicator calculated from historical exploitation data. A newly published CVE has no EPSS score at publication. For zero-day and recently disclosed findings (increasingly relevant as AI-assisted discovery compresses the window between vulnerability publication and working exploit), KEV escalation and scanner-assigned severity remain the primary triage signals until exploitation data accumulates.
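For teams running their own enrichment pipeline, the triage logic described above might be sketched as follows. This is illustrative only: the endpoint path and the response shape (a "data" list of objects with "cve" and "epss" keys) reflect the FIRST API as I understand it and should be checked against FIRST's documentation; the 0.1 threshold, the function names, and the caller-supplied KEV set are assumptions.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint path; verify against FIRST's API documentation.
EPSS_ENDPOINT = "https://api.first.org/data/v1/epss"

def epss_query_url(cve_ids):
    """Build a batch query URL using comma-separated CVE IDs."""
    query = urlencode({"cve": ",".join(cve_ids)}, safe=",")
    return f"{EPSS_ENDPOINT}?{query}"

def parse_epss(payload_text):
    """Map CVE ID -> EPSS probability from a JSON response body."""
    payload = json.loads(payload_text)
    return {row["cve"]: float(row["epss"]) for row in payload.get("data", [])}

def triage_signal(cve_id, epss_scores, kev_ids, scanner_severity, epss_threshold=0.1):
    """Pick a triage signal; KEV wins, then EPSS, then scanner severity."""
    if cve_id in kev_ids:
        return "escalate: listed in KEV"
    score = epss_scores.get(cve_id)
    if score is None:
        # No EPSS data yet (recently disclosed): fall back to scanner severity.
        return f"review: no EPSS yet, scanner severity {scanner_severity}"
    if score >= epss_threshold:
        return "prioritise: EPSS above threshold"
    return "queue: standard SLA"
```

The ordering is the design choice: KEV membership is checked first so that a missing or low EPSS score can never demote a finding that is known to be exploited.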

Tenable's Vulnerability Priority Rating (VPR) is an alternative composite signal. Tenable documents its eight drivers: Age of Vulnerability, Exploit Code Maturity, CVSSv3 Impact Score, Threat Intensity, Threat Recency, Threat Sources, Product Coverage, and CVSS Impact Prediction. EPSS is not one of them; VPR and EPSS are complementary, not substitutable. For programmes using Episode 6's composite model explicitly, raw EPSS from FIRST is the more transparent choice; VPR is reasonable for organisations preferring Tenable's threat intelligence-weighted signal.

Cloud-native scanning

For organisations running cloud-native workloads, a CNAPP is not optional at L3. It is the scanning mechanism for the majority of the estate. Agentless CNAPP (Wiz, Prisma Cloud) reads cloud provider APIs and container layers without deploying agents, providing vulnerability findings and, in higher product tiers, attack path analysis that maps how a discovered vulnerability connects to reachable blast radius. Operating three separate native cloud tools across a multi-cloud estate produces three finding feeds with no unified prioritisation. Across a multi-provider estate I have worked on, unifying those feeds into a single CNAPP was the prerequisite for everything else at L3; without it, there was no consistent way to apply asset tier to cloud findings.

CNAPP findings belong in the same prioritisation pipeline as traditional scanner output, classified against the same asset tiers and subject to the same SLAs. A separate console for cloud findings is an L2 configuration running alongside an L3 network scanning programme.

Continuous SCA monitoring: Dependency-Track

Episode 9 established Syft and Grype as the open-source SBOM and scanning stack. Dependency-Track (OWASP) extends what Syft produces, and the distinction from point-in-time SCA scanning matters.

Trivy and Grype find vulnerabilities when the pipeline runs. Dependency-Track finds vulnerabilities when they are published, against every SBOM it has ever ingested. The workflow: Syft generates a CycloneDX SBOM at build time, pushes it to Dependency-Track via API, and Dependency-Track continuously monitors it against NVD, GitHub Advisories, OSV, and other feeds. When a new CVE affects a component in the stored inventory, the alert fires without a re-scan.
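The push step in that workflow can be sketched as a single HTTP request. The example below builds, but does not send, a BOM upload using Dependency-Track's REST endpoint (PUT /api/v1/bom with a base64-encoded BOM and an X-Api-Key header), as I understand the documented API. The host name, project name, and helper function are hypothetical, and autoCreate assumes the API key has permission to create projects on upload.

```python
import base64
import json
import urllib.request

def build_bom_upload(base_url, api_key, project_name, project_version, sbom_bytes):
    """Build (but do not send) a Dependency-Track BOM upload request."""
    body = {
        "projectName": project_name,
        "projectVersion": project_version,
        "autoCreate": True,  # assumes the key may create projects on upload
        "bom": base64.b64encode(sbom_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/bom",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
    )
```

In a pipeline, the CycloneDX file Syft produced would be read as bytes and the resulting request passed to urllib.request.urlopen; keeping request construction separate from sending makes the step easy to test without a live server.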

For a Log4Shell-style event (a critical CVE published out of hours and affecting components across hundreds of applications), Dependency-Track answers the blast radius question across the full portfolio before anyone has triggered a scan. The alternative is running emergency scans across every application and waiting.

Dependency-Track v4.14.0 introduced EPSS scoring natively against each finding. Its policy framework supports blocking on KEV entries, EPSS above a defined threshold, or licence violations. Findings push to DefectDojo via API for aggregation with the rest of the finding pipeline.

Syft plus Dependency-Track plus DefectDojo is a credible open-source alternative to commercial SCA at L3 for organisations with the engineering capacity to operate it. The licence cost is near-zero; the operational cost is integration and maintenance.

DefectDojo as the aggregation hub

At L3 the tool count grows: network scanner, CNAPP, SAST, DAST, SCA tools, Dependency-Track. Each produces findings in its own format. Without an aggregation layer, the team operates multiple consoles and the ticketing system fills with duplicates.

When SAST, SCA, and DAST tools all flag the same vulnerable library, they produce three findings in three formats. DefectDojo deduplicates them to one record. This is not a minor efficiency gain. It is what keeps the ticket queue credible. Engineers stop trusting a queue that surfaces the same CVE six times from six tools; once that trust is lost, the queue stops being the programme.
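To make the idea concrete, here is a minimal sketch of deduplication by a composite key. It is not DefectDojo's algorithm, which is configurable per scanner type; the key used here (CVE, component, asset) and the Finding fields are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    cve: str
    component: str
    asset: str
    tool: str

def deduplicate(findings):
    """Collapse findings sharing (CVE, component, asset) into one record.

    Returns a dict mapping the key to the sorted list of reporting tools.
    """
    merged = {}
    for finding in findings:
        key = (finding.cve, finding.component.lower(), finding.asset.lower())
        merged.setdefault(key, []).append(finding.tool)
    return {key: sorted(set(tools)) for key, tools in merged.items()}
```

Retaining the list of reporting tools on the merged record preserves the evidence trail: the engineer sees one ticket, and the programme can still show which scanners detected the issue.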

DefectDojo at L3: all scanner outputs route to it, deduplication fires, asset tier and EPSS score are attached at ingestion before the JIRA ticket is created, and one enriched, deduplicated ticket reaches the engineer per finding. Risk acceptance lifecycle, including expiry and escalation, is managed in DefectDojo.

The asset tier enrichment is only as reliable as the CMDB feeding it. Episode 8 identified CMDB rot as an invisible failure mode: if an asset has been reclassified in the cloud console but not propagated to the CMDB, the scoring model tier-weights against incorrect context and the mis-prioritisation produces no obvious signal. The L3 enrichment pipeline requires continuous CMDB feed from cloud-native discovery (AWS Config, Azure Resource Graph, GCP Asset Inventory), not periodic manual export.
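One way to make CMDB rot visible is to compare the tier recorded in the CMDB with the tier observed by cloud-native discovery on every sync. The sketch below assumes both sources have already been reduced to simple asset-ID-to-tier mappings, and that the cloud side carries tier as a tag or label; both are assumptions, as are the names used.

```python
def tier_drift(cmdb_tiers, discovered_tiers):
    """Return {asset_id: (cmdb_tier, discovered_tier)} wherever the two disagree.

    An asset missing from the CMDB appears with a cmdb_tier of None.
    """
    drift = {}
    for asset_id, discovered_tier in discovered_tiers.items():
        cmdb_tier = cmdb_tiers.get(asset_id)
        if cmdb_tier != discovered_tier:
            drift[asset_id] = (cmdb_tier, discovered_tier)
    return drift
```

A non-empty result is the signal that would otherwise be missing: it flags the assets whose findings are currently being scored against stale context.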

DefectDojo records the finding-first-seen date from the scanner import, not the ticket-created date; accurate MTTR anchoring requires scanner outputs to be pushed in near-real-time rather than batched. The difference between when a scanner detected a finding and when someone created a ticket from it is latency that belongs in the MTTR number. MTTR calculated from DefectDojo's first-seen date is typically higher than MTTR from JIRA alone. That is the honest number, and the metric from Episode 5's measurement framework requires it.
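The gap between the two anchors is easy to demonstrate. The following sketch computes MTTR from either starting point over the same records; the field names (first_seen, ticket_created, closed) are hypothetical and stand in for whatever the scanner import and the ticketing export actually provide.

```python
from datetime import datetime
from statistics import mean

def mttr_days(records, start_field):
    """Mean days from records[start_field] to closure, over closed records only."""
    durations = [
        (record["closed"] - record[start_field]).total_seconds() / 86400
        for record in records
        if record.get("closed")
    ]
    return mean(durations) if durations else None
```

Calling it twice, once with "first_seen" and once with "ticket_created", quantifies how much detection-to-ticket latency a ticket-only MTTR has been hiding.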

What L3 enables and what it produces

Findings pre-filtered by EPSS before reaching human reviewers. Asset tier applied as a multiplier. KEV findings escalated automatically. MTTR tracked from first-seen date. Exception management structured with documented risk owners and review dates; vendor-dependency exceptions (where no patch exists or the fix is in a third-party release schedule outside the programme's control) are the primary driver of risk-accepted findings at this level and are managed through DefectDojo's expiry and escalation model. Continuous SCA monitoring against the full component inventory. Blast radius assessment for high-profile CVE events answerable in minutes.
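As an illustration of how those pieces combine, the sketch below pre-filters on EPSS, applies asset tier as a multiplier, and forces KEV findings to the top. The multiplier values and the 0.01 pre-filter are placeholders, not Episode 6's composite model; they exist only to show the mechanics.

```python
# Hypothetical weights: Tier 1 assets count three times as much as Tier 3.
TIER_MULTIPLIER = {1: 3.0, 2: 2.0, 3: 1.0}

def priority(epss, tier, in_kev):
    """Score a finding; KEV membership overrides the calculated score."""
    if in_kev:
        return float("inf")
    return epss * TIER_MULTIPLIER[tier]

def review_queue(findings, min_epss=0.01):
    """Drop low-EPSS, non-KEV findings, then sort the rest by priority."""
    kept = [f for f in findings if f["in_kev"] or f["epss"] >= min_epss]
    return sorted(
        kept,
        key=lambda f: priority(f["epss"], f["tier"], f["in_kev"]),
        reverse=True,
    )
```

With these placeholder weights, a Tier 1 finding at EPSS 0.3 outranks a Tier 3 finding at EPSS 0.5, which is the behaviour the tier multiplier is meant to produce.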

Compliance evidence produced: PCI-DSS Req 6.3.3, 11.3.1, 11.3.1.1, and 12.3.1 all satisfiable with L3 tooling correctly configured; QSA-ready artefacts are a natural output. ISO 27001:2022 A.8.8 substantially exceeded. DORA Article 8 ICT risk management requirements partially satisfied; full coverage requires integration with incident management. NIS2 Article 21(2)(e) satisfied. Episode 10 mapped all five frameworks in detail; that mapping does not need reproducing here.

The signal to move to L4

The programme runs well. The remaining friction is remediation speed on well-understood fix classes: OS patches on defined asset types, container image rebuilds on a new base image CVE, dependency patch-version bumps. The human review step is now the bottleneck for these, not the triage. When the team spends more time approving known-good patch classes than reviewing genuinely ambiguous findings, the L4 automation investment is justified.

L4 Optimised and Continuous: the automation layer

L4 does not replace L3. Every component of the L3 stack continues operating. L4 adds automation to the remediation and reporting layers for defined, well-understood work classes. As AI-assisted discovery tools increase finding velocity, the volume of well-understood remediation classes grows faster than human review can clear; the L4 investment becomes more justified with each increment in discovery speed.

Remediation automation

Ansible handles OS patch automation across defined asset classes. A pre-approved playbook runs for routine patches on Tier 2 and Tier 3 assets without a human change request. Tier 1 retains human approval. For cloud-native environments, AWS Systems Manager Patch Manager, Azure Update Manager, and GCP VM Manager provide equivalent capability at lower operational overhead for single-cloud estates; Ansible is the more consistent choice across multi-cloud.

Renovate Bot or GitHub Dependabot handles automated dependency update PRs. Both tools support configurable adoption delays (a minimum age a release must reach before it is eligible for merge), which implements Episode 9's cooldown discipline in tooling rather than bypassing it. Episode 9's cooldown applies based on package blast radius, not version type: the Axios incident that motivated the cooldown was a patch-version attack on a widely used, established package. At L4, Renovate and Dependabot are configured to enforce the adoption hold on packages with significant blast radius and auto-merge only after that window clears. Human change approval is reserved for Tier 1 assets and updates outside the defined safe class.
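The decision rule behind that configuration can be expressed in a few lines. This is a sketch of the logic only, not either tool's configuration syntax; the hold periods, the blast radius labels, and the function name are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical adoption holds, in days, keyed by package blast radius.
HOLD_DAYS = {"high": 7, "low": 0}

def eligible_for_automerge(published_at, now, blast_radius, asset_tier, in_safe_class=True):
    """Return True only when an update may merge without human approval."""
    if asset_tier == 1 or not in_safe_class:
        return False  # reserved for human change approval
    hold = timedelta(days=HOLD_DAYS[blast_radius])
    return now - published_at >= hold
```

The version type (major, minor, patch) is deliberately absent from the inputs, reflecting the point that the hold follows blast radius rather than version number.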

For IaC-managed infrastructure, Terraform-based automated remediation closes CSPM-surfaced misconfigurations where the fix is in managed code and the change is within a defined safe class.

Exception lifecycle automation

DefectDojo's risk acceptance expiry model, present from L2, becomes the automated exception lifecycle at L4. Exceptions approaching expiry trigger notifications to the documented risk owner. Expired exceptions without renewal escalate automatically. This is the operationalisation of Episode 9's compensating control lifecycle requirements: defined owner, review date, escalation trigger, enforced in tooling rather than process.
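The lifecycle reduces to a small state check evaluated on a schedule. The sketch below is independent of DefectDojo's own implementation; the 14-day notice window and the status labels are assumptions.

```python
from datetime import date, timedelta

def exception_status(expiry, today, notice_days=14):
    """Classify a risk acceptance as in_force, notify_owner, or escalate."""
    if today > expiry:
        return "escalate"  # expired without renewal
    if expiry - today <= timedelta(days=notice_days):
        return "notify_owner"  # inside the notice window
    return "in_force"
```

Because the function takes the current date as an argument rather than reading the clock, the same rule can be replayed against historical dates when an auditor asks what was in force on a given day.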

Dynamic SLA adjustment

Where EPSS scores increase materially against a finding already in the queue, the SLA should tighten without requiring a human to trigger the change. Dependency-Track policy alerts on EPSS threshold changes feed into DefectDojo or JIRA automation rules. The implementation requires a custom integration layer; it is feasible at L4 for organisations with the engineering capacity to build and maintain it.
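The core rule of such an integration layer might look like the sketch below. What counts as a material increase is a programme decision; the 0.2 jump and the 7-day tightened SLA are placeholder values, and the function name is hypothetical.

```python
def adjusted_sla_days(current_sla_days, old_epss, new_epss, jump=0.2, tightened_sla_days=7):
    """Tighten the SLA when EPSS rises by at least `jump`; never loosen it."""
    if new_epss - old_epss >= jump:
        return min(current_sla_days, tightened_sla_days)
    return current_sla_days
```

Taking the minimum means a finding already on a shorter SLA is left alone, so the rule can only accelerate remediation and never relaxes an existing commitment.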

What L4 produces

Predictable remediation classes handled without human review. Exception lifecycle enforced in tooling. Audit preparation measured in hours rather than weeks. EPSS movement triggers SLA review without manual intervention.

As Episode 11 noted, compliance frameworks become lagging indicators at L4. The programme produces evidence continuously; the audit finds what it needs when it arrives.

The stack at a glance

Capability	L1	L2	L3	L4
Asset inventory	None / ad hoc	Spreadsheet + cloud-native discovery	Automated CMDB + CSPM	Continuous; ephemeral assets tracked
Scanning	Ad hoc	Scheduled; authenticated (Nessus / OpenVAS / Tenable.io)	Continuous; cloud-native via CNAPP (Wiz / Prisma Cloud)	Continuous + runtime
SCA: point-in-time	None	None or advisory Trivy	Trivy / Snyk in CI with advisory gates	Snyk / Mend with blocking gates
SCA: continuous	None	None	Dependency-Track (SBOM inventory + new CVE alerts, EPSS-enriched)	Same; EPSS policy triggers automated SLA escalation
Finding aggregation	None	DefectDojo (recommended)	DefectDojo (hub for all scanner outputs; deduplication)	Same; exception expiry automated
EPSS integration	None	None	Scanner-native or FIRST API; Dependency-Track v4.14.0	Dynamic SLA adjustment on EPSS movement
Ticketing	Email / spreadsheet	JIRA or DefectDojo (manual import)	JIRA / ServiceNow via DefectDojo + scanner APIs	Same; exception lifecycle in DefectDojo
MTTR tracking	None	Manual (spreadsheet)	Automated by tier; first-seen from DefectDojo	Continuous; trend analysis
Remediation automation	None	None	None	Ansible / cloud-native patch managers / Renovate
Compliance evidence	None	ISO 27001 A.8.8 baseline; NIS2 baseline	Full QSA-ready artefacts across five frameworks	Continuous evidence pack; exceeds requirements

The integration gap is where programmes stall

The vulnerability management tooling decisions that produce the most improvement are rarely the headline platform purchases. Deploying DefectDojo at L2, before the tool count justifies a deduplication argument, is worth the investment because the exception expiry model prevents a class of risk posture failures that are otherwise invisible until an audit surfaces them. Deploying Dependency-Track at L3, before a Log4Shell-equivalent forces the question, means the blast radius assessment is ready when it matters.

The L2 to L3 transition stalls most commonly not because the right scanner is unavailable but because the integration between scanner output, EPSS enrichment, asset tier, and ticketing is manual and fragile. Fixing that pipeline (scanner API to DefectDojo, with asset tier from the CMDB attached at ingestion) is the work that moves an organisation from L2 to L3 more reliably than any platform upgrade.

The maturity level determines the tooling you need. The tooling question to answer at each level is not "what is the best product?" but "what is the integration that makes the process tractable at this scale?"