Skip to content

NetForge RL

  • Zero-Trust Identity


    Cryptographic token enforcement blocks Red agents from accessing secured subnets without valid Kerberos credentials.

    Architecture

  • Sim2Real Bridge


    Dual-mode hypervisor — MockHypervisor for fast training, DockerHypervisor for live Vulhub container execution.

    Sim2Real

  • NLP-SIEM Pipeline


    Real Windows Event XML + Sysmon logs encoded into 128-dim TF-IDF vectors injected directly into the Blue LSTM policy.

    NLP-SIEM

  • 32 Actions


    17 Red Team attack primitives and 15 Blue Team SOC responses, all mapped to real MITRE ATT&CK techniques.

    Action Reference


What is NetForge RL?

NetForge RL is a high-fidelity multi-agent reinforcement learning (MARL) cybersecurity environment designed for research-grade Sim2Real transfer. It is mathematically derived from the CAGE/CybORG challenge environment and evolved into a physically constrained network simulation where:

  • Red agents execute realistic APT kill chains — recon, exploit, privilege escalation, lateral movement, and impact
  • Blue agents operate a SOC under realistic POMDP conditions — partial observability, noisy telemetry, and business cost trade-offs
  • The environment enforces cryptographic ZTNA constraints, asynchronous tick timing, and stochastic SIEM telemetry

NetForge RL bridges the gap between clean RL toy environments and the messy, probabilistic world of real enterprise security operations.

graph TB
    subgraph RedTeam["🔴 Red Team"]
        RC[Red Commander<br/>Strategic Planning]
        RO[Red Operator<br/>Tactical Execution]
    end

    subgraph BlueTeam["🔵 Blue Team"]
        BC[Blue Commander<br/>SOC Director]
        BO[Blue Operator<br/>Analyst / Responder]
    end

    subgraph Engine["⚙️ NetForge Engine"]
        ENV[Async Tick Engine<br/>PettingZoo ParallelEnv]
        ZTNA[Zero-Trust Physics<br/>Token Enforcement]
        S2R[Sim2Real Bridge<br/>Mock / Docker]
        SIEM[SIEM Logger<br/>Windows Event XML]
        NLP[NLP Encoder<br/>TF-IDF → 128-dim]
    end

    subgraph Network["🌐 Simulated Corporate Network"]
        DMZ[DMZ Subnet<br/>10.0.0.0/24]
        CORP[Corporate Subnet<br/>10.0.2.0/24]
        SEC[Secure Subnet<br/>10.0.1.0/24 🔒]
    end

    RC --> ENV
    RO --> ENV
    BC --> ENV
    BO --> ENV
    ENV --> ZTNA --> Network
    ENV --> S2R
    S2R --> SIEM --> NLP
    NLP --> BC
    NLP --> BO

Key Design Principles

1. Physically Constrained

Actions are not instant boolean state flips. Every action has a duration, agents can be interrupted mid-operation, and the ZTNA layer physically prevents routing unless cryptographic tokens are held.

2. POMDP with Realistic Noise

Blue agents never see ground truth. They observe SIEM telemetry with realistic log latency, false positives injected by the Green noise agent, and NLP-encoded Windows Event XML that requires learning to distinguish true positives from background traffic.

3. Sim-to-Real Transfer Ready

The Sim2RealBridge lets you train at 1000 steps/second using MockHypervisor, then evaluate using live Vulhub Docker containers with real exploit payload execution — with a single config flag change.

4. Attack Economics

Both teams operate under finite budgets (agent_funds, agent_energy, agent_compute). Reckless Blue isolation triggers business_downtime_score penalties modelling real SLA fines. Red must balance speed against cost.