Cybersecurity Reference — Environment Overview¶
This page explains how NetForge RL models real enterprise attack and defence dynamics from a cybersecurity perspective. Understanding these concepts will help you interpret agent behaviour, design reward functions, and connect simulation results to real-world threat intelligence.
The Information Asymmetry Problem¶
In a real SOC, defenders never have perfect visibility. NetForge RL models this fundamental asymmetry:
| Dimension | Red Team (Attacker) | Blue Team (Defender) |
|---|---|---|
| Visibility | Knows what it has compromised | Sees only SIEM logs (with latency) |
| Advantage | Stealth — can act before detection | Telemetry — SIEM embeddings in observation |
| Constraint | Kill-chain ordering is mandatory | Business downtime costs limit aggression |
| Information | Full knowledge of own inventory | Partial — POMDP, fog of war |
This is the core research problem: how does a Blue LSTM policy learn to distinguish malicious SIEM signals from background noise, and respond before an APT achieves its objective?
The Network Topology¶
The simulated enterprise network is segmented into three subnets that map to real corporate security zones:
graph LR
Internet((Internet)) -->|Exposed| DMZ
subgraph DMZ["DMZ — 10.0.0.0/24 (Open)"]
WEB[Web Server<br/>Apache/nginx]
MAIL[Mail Server<br/>SMTP/IMAP]
JUMP[Jump Host<br/>RDP/SSH]
end
subgraph CORP["Corporate — 10.0.2.0/24 (Internal)"]
WS1[Workstation 1<br/>Windows 10]
WS2[Workstation 2<br/>Windows 10]
FILE[File Server<br/>SMB 445]
end
subgraph SEC["Secure — 10.0.1.0/24 (Zero-Trust 🔒)"]
DC[Domain Controller<br/>AD / Kerberos]
HVR[Hypervisor<br/>ESXi]
PLC[PLC Controller<br/>OT / ICS]
end
DMZ -->|Lateral movement| CORP
CORP -->|ZTNA Token Required| SEC
Why This Matters for RL¶
- DMZ is freely accessible — Red starts here. Easy to scan and exploit, but low-value targets.
- Corporate requires lateral movement — Red must first establish a foothold in DMZ, then pivot.
- Secure is Zero-Trust enforced — Red physically cannot route packets to
10.0.1.0/24without a validEnterprise_Admin_Token. This is not a probability gate; it is a hard routing constraint inGlobalNetworkState.can_route_to().
How Ticks Map to Real Dwell Time¶
NetForge RL runs on an asynchronous tick clock. Actions don't resolve instantly — they take duration ticks. This maps to real APT temporal patterns:
| Real World | NetForge RL |
|---|---|
| APT dwell time: days to weeks | 200–500 ticks of Red activity |
| Lateral movement: hours | 5–20 tick action chains |
| Incident response: minutes | 2–8 tick Blue actions |
| Log forwarding latency: seconds | log_latency = 2–5 ticks |
The tick granularity forces Red to plan a coherent kill chain rather than random action selection. An untrained Red policy achieves near-zero reward because:
DumpLSASSfails withoutRootprivilegeRootrequires a successful exploit- Exploits fail on unscanned hosts (wrong service, wrong OS)
- Recon takes ticks during which Blue detects anomalous traffic
Fog of War — What Blue Actually Sees¶
Blue agents observe the world through three channels:
1. Structured Observation Vector (256-dim)¶
Numeric features about known host states, subnet status, agent energy, funds, and tick counter.
2. SIEM Embedding (128-dim)¶
The most important signal. The NLP encoder encodes the last 8 SIEM log lines into a 128-dim TF-IDF LSA vector. These log lines are authentic Windows Event XML:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System><EventID>4624</EventID>
<TimeCreated SystemTime="2026-03-31T14:22:11.847Z"/>
<Computer>10.0.1.1</Computer>
</System>
<EventData>
<Data Name="SubjectUserName">Administrator</Data>
<Data Name="LogonType">3</Data>
<Data Name="IpAddress">10.0.0.7</Data>
<Data Name="AuthenticationPackageName">NTLM</Data>
</EventData>
</Event>
This is what a real Splunk pipeline would ingest. The Blue LSTM must learn that LogonType=3 + NTLM + IpAddress from a DMZ machine = suspicious lateral movement indicator.
3. Action Mask (62-dim)¶
Binary validity mask preventing the agent from selecting illegal actions (e.g., Blue commander can't use operator-only actions).
The Green Agent — Noise Generation¶
A passive Green Agent continuously generates background enterprise traffic — Kerberos renewals, DNS queries, Sysmon network connections, NTLM authentications — to simulate the constant low-level noise in real corporate networks.
This noise is probabilistically injected into the SIEM buffer (15% chance per tick). The Blue LSTM must learn to:
- Ignore
Event ID 4624from a known workstation authenticated to a file server (normal) - Alert on
Event ID 4624from a DMZ web server to the Domain Controller at 3am (anomalous)
This is the signal-to-noise discrimination problem that makes NetForge RL a meaningful research challenge.
Why Random Action Selection Fails¶
Red Agent Failure Mode¶
A random Red policy fails because the kill chain has mandatory ordering dependencies:
NetworkScan → DiscoverNetworkServices → ExploitEternalBlue → PrivilegeEscalate → DumpLSASS → PassTheTicket → [Secure Subnet]
Blue Agent Failure Mode¶
A random Blue policy fails because:
- IsolateHost on an uncompromised host wastes budget and incurs business downtime penalties
- RotateKerberos costs 5,000 funds — done unnecessarily, it bankrupts the SOC
- Without reading SIEM signals, Blue cannot distinguish which host to protect
The environment is designed so that only policies that learn to reason about sequences and signals can succeed.