Pipeline Security Gates¶

Every piece of content published to the NovaTrek Architecture Portal passes through a series of automated security gates in the CI/CD pipeline. No content can reach production without passing all gates. This is fundamentally different from wiki-based platforms where content is published the moment an author clicks "Save."

For the complete evidence base including NIST, SLSA, and OWASP citations, see Research Results.

Fictional Domain

Everything on this portal is entirely fictional. NovaTrek Adventures is a completely fictitious company. All pipeline references describe the NovaTrek proof-of-concept implementation.

Gate Architecture¶

Author writes content
    │
    ▼
Feature branch (git push)
    │
    ├─── GitHub Push Protection ─── blocks commits containing secrets
    │
    ▼
Pull Request opened
    │
    ├─── Gate 1: YAML Metadata Validation
    ├─── Gate 2: Solution Folder Structure Validation
    ├─── Gate 3: Data Isolation Audit
    ├─── Gate 4: Portal Build (link validation)
    ├─── Gate 5: Confluence Dry-Run (mirror validation)
    ├─── Gate 6: PR Review Approval (human gate)
    │
    ▼
Merge to main (only if ALL gates pass)
    │
    ├─── Gate 7: Production Build
    ├─── Gate 8: Static Asset Integrity
    │
    ▼
Deploy to Azure Static Web Apps
    │
    ├─── Gate 9: Azure Platform Security (DDoS protection, TLS)
    │
    ▼
Content live on portal

Pre-Merge Gates (PR Phase)¶

These gates are evaluated on every pull request: Gates 1 through 5 run automatically and Gate 6 is a human approval. All must pass before the PR can be merged.

Gate 1 — YAML Metadata Validation¶

What it checks: All YAML files in architecture/metadata/ are syntactically valid and parseable.

Why it matters: Malformed YAML could cause generators to produce incorrect output or fail silently, leading to missing or corrupted content on the portal.

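Implementation: The validate-solution.yml workflow parses every YAML file with Python's yaml.safe_load(). Using safe_load rather than load matters: safe_load constructs only plain data types (mappings, lists, strings, numbers), so a crafted YAML tag cannot instantiate arbitrary Python objects during parsing.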

Blocks merge on failure: Yes.
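
The check described above can be sketched roughly as follows. This is a minimal illustration, not the contents of validate-solution.yml: the function name find_invalid_yaml and its injectable parse argument are hypothetical, and the default parser assumes PyYAML is installed.

```python
from pathlib import Path
from typing import Callable, Optional


def find_invalid_yaml(root: str, parse: Optional[Callable] = None) -> dict:
    """Return {relative_path: error_message} for each YAML file that fails to parse."""
    if parse is None:
        # Imported lazily so the sketch can also be exercised with a stub parser.
        import yaml  # PyYAML (assumed dependency)
        parse = yaml.safe_load  # safe_load builds plain data types only

    base = Path(root)
    failures = {}
    candidates = sorted(
        p for p in base.rglob("*")
        if p.is_file() and p.suffix in {".yaml", ".yml"}
    )
    for path in candidates:
        try:
            parse(path.read_text(encoding="utf-8"))
        except Exception as exc:  # yaml.YAMLError when PyYAML is the parser
            failures[path.relative_to(base).as_posix()] = str(exc)
    return failures
```

A CI step could call find_invalid_yaml("architecture/metadata") and exit with a non-zero status when the returned dictionary is not empty, which is what turns the check into a merge-blocking gate.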

Gate 2 — Solution Folder Structure Validation¶

What it checks: Every solution folder under architecture/solutions/ contains the required artifacts:

A master document (*-solution-design.md)
A capabilities mapping (3.solution/c.capabilities/capabilities.md)

Why it matters: Incomplete solutions could reference non-existent files, causing broken links on the portal or missing capability rollup data.

Blocks merge on failure: Yes.
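
A rough sketch of such a structure check is shown below. The file pattern and the capabilities path are taken from the list above; the function names are hypothetical, and the sketch assumes the master document sits at the top level of each solution folder, which this page does not state.

```python
from pathlib import Path

# Required artifacts as listed on this page.
MASTER_DOC_GLOB = "*-solution-design.md"
CAPABILITIES_PATH = Path("3.solution/c.capabilities/capabilities.md")


def missing_artifacts(solution_dir: Path) -> list:
    """Return human-readable problems for a single solution folder."""
    problems = []
    # Assumption: the master document lives directly inside the solution folder.
    if not any(solution_dir.glob(MASTER_DOC_GLOB)):
        problems.append(f"{solution_dir.name}: no master document ({MASTER_DOC_GLOB})")
    if not (solution_dir / CAPABILITIES_PATH).is_file():
        problems.append(f"{solution_dir.name}: missing {CAPABILITIES_PATH.as_posix()}")
    return problems


def validate_solutions(root: str) -> list:
    """Check every immediate subfolder of the solutions root."""
    problems = []
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        problems.extend(missing_artifacts(folder))
    return problems
```

Calling validate_solutions("architecture/solutions") returns an empty list when every folder is complete; any other result would fail the gate.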

Gate 3 — Data Isolation Audit¶

What it checks: Scans all tracked files for patterns that indicate corporate data leakage:

Real company names or internal system identifiers
Real domain names (only *.novatrek.example.com is permitted)
Corporate email patterns
Internal project codes or system names
API keys, tokens, or credentials in content files

Why it matters: The NovaTrek workspace is synthetic by design. Any real corporate data appearing in the repository represents a data leakage incident. This gate catches it before it reaches the published site.

Implementation: scripts/audit-data-isolation.sh — a custom shell script that runs regex pattern matching against all tracked files.

Blocks merge on failure: Yes.
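
The domain rule above lends itself to a short illustration. The following Python sketch is not the audit-data-isolation.sh script (which is a shell script and is not reproduced here); it only shows one way the "permitted domain" pattern could be expressed. The function names are hypothetical, the sketch looks only at URLs and email addresses, and it treats the bare novatrek.example.com host as permitted, which is an assumption.

```python
import re

ALLOWED_SUFFIX = "novatrek.example.com"  # the only permitted domain per this page
URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)
EMAIL_HOST = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def is_allowed(host: str) -> bool:
    """True for novatrek.example.com and any of its subdomains."""
    host = host.lower().rstrip(".")
    # Suffix match on a dot boundary, so novatrek.example.com.evil.com is rejected.
    return host == ALLOWED_SUFFIX or host.endswith("." + ALLOWED_SUFFIX)


def find_foreign_hosts(text: str) -> list:
    """Return (line_number, host) pairs for hosts outside the allowed domain."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for host in URL_HOST.findall(line) + EMAIL_HOST.findall(line):
            if not is_allowed(host):
                hits.append((lineno, host.lower().rstrip(".")))
    return hits
```

A real audit would also need an explicit allowlist for tooling hosts that legitimately appear in the repository, plus the other pattern families listed above.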

Gate 4 — Portal Build¶

What it checks: The full MkDocs site builds successfully, including:

All generators run (microservice pages, solution pages, capability pages, ticket pages)
All internal links resolve to existing pages
All referenced assets (SVGs, images) exist
MkDocs configuration is valid

Why it matters: A successful build proves that the content is internally consistent. Broken links, missing files, or configuration errors are caught before they reach production.

Blocks merge on failure: Yes.
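
To make "all internal links resolve" concrete, here is a simplified sketch of a relative-link check over a docs directory. MkDocs performs its own link validation during the build; this standalone version, with the hypothetical name broken_links, only illustrates the idea and handles inline Markdown links and images, not reference-style links.

```python
import re
from pathlib import Path

# Captures the target of inline Markdown links and images: [text](target)
LINK_TARGET = re.compile(r"\[[^\]]*\]\(([^)\s]+)")
HAS_SCHEME = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_links(docs_dir: str) -> list:
    """Return (source_file, target) pairs whose relative target does not exist."""
    base = Path(docs_dir)
    broken = []
    for page in sorted(base.rglob("*.md")):
        text = page.read_text(encoding="utf-8")
        for target in LINK_TARGET.findall(text):
            if HAS_SCHEME.match(target) or target.startswith("#"):
                continue  # external URL, mailto:, or same-page anchor
            # Drop any fragment or query string before checking the file system.
            path_part = target.split("#", 1)[0].split("?", 1)[0]
            if not (page.parent / path_part).exists():
                broken.append((page.relative_to(base).as_posix(), target))
    return broken
```

An empty result means every relative link and image reference points at a file that exists next to the page that cites it.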

Gate 5 — Confluence Dry-Run¶

What it checks: The Confluence mirror preparation script (confluence-prepare.py) runs successfully and the resulting Markdown passes mark --dry-run validation.

Why it matters: Even though Confluence is a read-only mirror, publishing failures there indicate content formatting issues that may also affect the primary portal.

Blocks merge on failure: Yes.

Gate 6 — PR Review Approval¶

What it checks: At least one designated reviewer has approved the pull request.

Why it matters: Automated gates catch structural and formatting issues but cannot evaluate content accuracy, architectural correctness, or appropriateness. The human review gate ensures that a second pair of eyes validates the substance of every change.

Configuration: GitHub branch protection rules on main require:

At least 1 approving review
Dismissal of stale approvals when new commits are pushed
No self-approval (the PR author cannot approve their own PR)

Blocks merge on failure: Yes.

Post-Merge Gates (Deploy Phase)¶

These gates run after the PR is merged to main, before content reaches production.

Gate 7 — Production Build¶

What it checks: The full site builds again from the merged main branch. This is not redundant: it catches semantic conflicts and timing issues where two PRs were each valid in isolation but break the build when combined.

Why it matters: Defense in depth. Even if a PR gate was somehow bypassed, the production build catches issues before deployment.

Blocks deployment on failure: Yes.

Gate 8 — Static Asset Integrity¶

What it checks: Non-Markdown assets (SVGs, OpenAPI specs, Swagger UI pages, staticwebapp.config.json) are correctly copied into the build output.

Why it matters: MkDocs renders only Markdown files. Other files are copied into site/ unchanged only when they sit inside the docs directory, so static assets kept or generated elsewhere must be copied explicitly. Missing assets could break diagrams, API documentation, or security headers.

Implementation: The generate-all.sh script and post-build cp commands handle this.

Blocks deployment on failure: Yes (missing staticwebapp.config.json would remove all security headers).
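
A post-build verification of this kind might look like the sketch below. It is an illustration only: the function name missing_assets is hypothetical, the list of expected files would come from the caller, and the sketch assumes staticwebapp.config.json is expected at the root of the build output.

```python
from pathlib import Path

# staticwebapp.config.json is named on this page; callers supply the rest.
ALWAYS_REQUIRED = ["staticwebapp.config.json"]


def missing_assets(site_dir: str, expected: list) -> list:
    """Return expected files that are absent or empty in the build output."""
    base = Path(site_dir)
    missing = []
    for rel in ALWAYS_REQUIRED + list(expected):
        target = base / rel
        # An empty file is treated as missing: a failed copy can leave one behind.
        if not target.is_file() or target.stat().st_size == 0:
            missing.append(rel)
    return missing
```

Running this against site/ after the copy step and failing the job on a non-empty result would stop a deployment that lacks its security header configuration.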

Gate 9 — Azure Platform Security¶

What it checks: Azure Static Web Apps provides platform-level protections:

DDoS protection (Azure-managed, included with the platform)
TLS termination (HTTPS only, managed certificates, TLS 1.2 minimum)
Global CDN (Azure Front Door edge nodes, reducing origin exposure)
Custom domain validation (prevents domain spoofing)
Staging environments (PR deployments go to isolated preview URLs, not production)

Why it matters: Even with perfect content security, the hosting platform must also be secure. Azure Static Web Apps is a managed platform with enterprise-grade security controls — the NovaTrek team does not manage web servers, load balancers, or TLS certificates.

Gate Comparison with Confluence¶

Gate	Docs-as-Code	Confluence Equivalent
Secret scanning	Automated, blocks push	Not available
YAML validation	Automated, blocks merge	Not applicable
Data isolation audit	Automated, blocks merge	Not available
Link validation	Automated, blocks merge	Not available
Pre-publish review	Required PR approval	Optional (page restrictions)
Build integrity	Automated, blocks deploy	Not applicable
Security headers	Version-controlled, gated	Atlassian-managed
Platform security	Azure (SOC 2, ISO 27001)	Atlassian (SOC 2, ISO 27001)

Key difference: Confluence has no automated gates between editing and publishing. Every control is either manual (page restrictions) or managed by Atlassian (platform security). The docs-as-code model provides seven automated gates (Gates 1 through 5, 7, and 8) plus a required human review (Gate 6), all of which must pass before content reaches production.

SLSA Framework Alignment¶

The Supply-chain Levels for Software Artifacts (SLSA) framework, originally proposed by Google and now maintained under the Open Source Security Foundation (OpenSSF), provides a widely referenced blueprint for securing CI/CD pipelines against supply chain attacks. The NovaTrek documentation pipeline aligns with SLSA Build Levels 1 through 3:

SLSA Level	Requirement	NovaTrek Implementation
Build L1	Fully scripted builds with provenance metadata	Entire build defined declaratively in GitHub Actions YAML
Build L2	Hosted platform with cryptographically signed provenance	Deployments run exclusively on GitHub-hosted runners; artifacts tied to source commits
Build L3	Hardened, ephemeral build environments	Each build runs on a clean, isolated runner that executes the MkDocs build, deploys, and is then destroyed

Source: SLSA Framework specification and JFrog SLSA analysis.

OWASP CI/CD Risk Mitigation¶

The OWASP Top 10 CI/CD Security Risks identifies key pipeline threat categories. The docs-as-code model addresses the most critical risks:

OWASP Risk	Risk Description	Docs-as-Code Mitigation
CICD-SEC-1	Insufficient Flow Control Mechanisms	Branch protection rules, required PR approvals, automated status checks
CICD-SEC-3	Dependency Chain Abuse	Snyk SCA blocks malicious packages; Dependabot automates updates
CICD-SEC-4	Poisoned Pipeline Execution (PPE)	Ephemeral runners, SLSA Level 2 provenance, immutable build environments
CICD-SEC-6	Insufficient Credential Hygiene	Workload Identity Federation (OIDC) eliminates long-lived deployment credentials

Source: OWASP CI/CD Security Cheat Sheet.

Snyk Integration¶

Snyk provides three distinct scanning capabilities, each deployed as a CI gate in the documentation pipeline:

Snyk Dependency Scan (`snyk test`)¶

What it checks: All Python packages in requirements-docs.txt against the Snyk vulnerability database.

Why it matters: MkDocs, pymdownx, and other build-time dependencies may contain vulnerabilities. Even though these packages only run at build time (not in production), a compromised build dependency could inject malicious content into the generated HTML.

Implementation:

- name: Snyk dependency scan
  uses: snyk/actions/python-3.12@master
  env:
    SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
  with:
    args: --severity-threshold=high --file=requirements-docs.txt

Blocks merge on failure: Yes (HIGH or CRITICAL severity).

Snyk Code Analysis (`snyk code test`)¶

What it checks: Static analysis of Python generator scripts in portal/scripts/ for security issues including:

Path traversal vulnerabilities (generators process file paths from YAML input)
Unsafe deserialization (generators parse YAML metadata)
Injection risks (generators produce HTML output)
Hardcoded secrets or credentials

Why it matters: The generator scripts are the boundary between untrusted input (YAML metadata, OpenAPI specs) and trusted output (published HTML). Security flaws in generators could allow a crafted YAML file to produce malicious portal content.

Blocks merge on failure: Yes.
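
As an illustration of the first item in the list above, the sketch below shows one common way a generator could guard against path traversal when a file path arrives from YAML input. It is not taken from the NovaTrek generators; the function name resolve_inside is hypothetical.

```python
from pathlib import Path


def resolve_inside(base_dir: str, untrusted: str) -> Path:
    """Resolve `untrusted` relative to `base_dir`, rejecting paths that escape it."""
    base = Path(base_dir).resolve()
    # Joining with an absolute path discards `base`, so that case is caught below too.
    candidate = (base / untrusted).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes {base}: {untrusted!r}") from None
    return candidate
```

Resolving both paths before comparing them means that ".." segments and symlinks are normalized first, so a value such as "specs/../../x" is rejected even though it starts inside the base directory.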

Snyk Infrastructure-as-Code Scan (`snyk iac test`)¶

What it checks: Infrastructure and configuration files for security misconfigurations:

staticwebapp.config.json — overly permissive CSP, missing security headers
infra/*.bicep — Azure resource misconfigurations
.github/workflows/*.yml — overly broad workflow permissions, missing pinned action versions

Why it matters: A misconfigured staticwebapp.config.json could silently remove all security headers from the production site. Snyk IaC catches these misconfigurations before they are deployed.

Implementation:

- name: Snyk IaC scan
  uses: snyk/actions/iac@master
  env:
    SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
  with:
    args: --severity-threshold=high
    file: portal/staticwebapp.config.json

Blocks merge on failure: Yes (HIGH or CRITICAL severity).

Continuous Monitoring¶

Beyond CI gates, Snyk's GitHub integration provides continuous monitoring:

New vulnerability alerts: If a CVE is published for a dependency that was clean at merge time, Snyk raises an alert and, where a fixed version exists, can open an automated PR with the upgrade
License compliance: Snyk can enforce that all dependencies use approved licenses (MIT, Apache-2.0, etc.)
Reporting dashboard: Security team gets a single-pane view of all vulnerability findings across the repository

This is a capability that Confluence cannot match — there is no way for an organization to scan Confluence's own dependencies or receive alerts when Confluence's build toolchain has a new vulnerability.

Secret Sprawl: The Scale of the Problem¶

The 2025 State of Secrets Sprawl report by GitGuardian quantifies the scale of credential exposure in modern software environments:

23.77 million new hardcoded secrets found in public repositories in 2024
25% year-over-year increase in secret exposure
58% of all detected leaks are generic secrets (API keys, passwords, connection strings)

While Confluence relies on authors to avoid pasting secrets (with no automated detection), the docs-as-code pipeline provides two layers of defense:

GitHub Push Protection: scans each push on the server side and blocks it when a commit contains a detected secret, before that commit reaches the remote repository history
GitHub Secret Scanning: continuously monitors the repository for secrets that bypass push protection, scanning for 200+ partner patterns plus custom organization-defined patterns
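
Both layers rest on pattern-based detection. The toy sketch below conveys the idea with two well-known formats; it is not how GitHub implements either feature, the function name find_secrets is hypothetical, and real scanners rely on far more patterns along with provider-side validation.

```python
import re

# Illustrative patterns only; production scanners cover many more formats.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list:
    """Return (line_number, pattern_name) for each line matching a known pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

A custom organization-defined pattern, as mentioned above, would simply be another entry in a table like PATTERNS.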

Adding More Gates¶

The pipeline is extensible. The following gates can be added with minimal effort:

Gate	Tool	Purpose
Markdown lint	markdownlint-cli	Enforce consistent formatting and catch common Markdown errors
Spell check	cspell	Catch typos and enforce terminology consistency
Accessibility check	pa11y-ci	Validate generated HTML meets WCAG guidelines
Link rot detection	lychee	Check external links still resolve (scheduled, not blocking)
Content policy check	Custom script	Enforce organization-specific content policies (e.g., no PII, no internal codenames)

Each gate is a step in the GitHub Actions workflow — a YAML file that is itself version-controlled, reviewed, and auditable.
