A–W Methods & Evidence

Key definitions

pp (percentage points): If baseline wins 45% and A–W wins 70%, Δ = +25 pp.
Win rate (WR): Share of rounds that Blue wins in the surrogate (non-kinetic).
Paired A/B design: Same scenario, same random seed, run twice: once with baseline, once with A–W.
95% confidence interval: A statistical range around a percentage; we use Wilson intervals.

Study design (surrogate, non-kinetic)

We evaluate in a laser-tag–style training surrogate (no weapons, no kinetic modeling). For each scenario (terrain, weather, EW conditions, force ratios):

Run Baseline.
Run A–W under the same seed and conditions.
Record WIN/LOSS and round-level attributes to CSV.
Repeat thousands of times; compute WR for each arm and Δ in pp with 95% CIs.

Because both arms use identical seeds, differences isolate the contribution of A–W.

Headline acceptance gates

Δ at parity: Pass if Δ ≥ +20 pp at equal force size, with 95% CIs reported.
Safety: ≤ 1% leakage across ≥ 1,000 probes with deny-first policies on.
Reproducibility: Your rerun on air-gapped hardware matches within the reported CI.

What actually improves (human-in/on-the-loop)

ISR triage: surfaces the right items first; fewer misses, faster human review.
Fusion & handoff: keeps tracks coherent; fewer drops/duplicates across teams.
COA generation & self-critique: proposes courses of action and auto-red-teams before a human decides.
Tasking & deconfliction: schedules who does what, avoiding friendly conflicts.
Resilient comms choices: prefers robust routes/frequencies under interference.
Logistics & routing: keeps people, fuel, and parts synchronized.

For Government Evaluators

Kit Version

GovEval_AW_Kit_v1r1

SHA‑256 (zip)

f472522f2bd87c29b03a52a4d719ed5dda3482d50d615e139e449740f9a31a86

File size

1.6 MB

Last Updated (UTC)

2025-09-08 20:55Z

Contact

[email protected]

Download GovEval Kit (ZIP) Request couriered media

If your policy forbids web downloads, reply by email to arrange encrypted, couriered media.

Government-run replication (air-gapped)

We provide a turnkey kit so your lab can re-run the study offline:

Scripts + fixed seeds for parity/2×/3× blocks
CSV/XLSX logs (one row per round) + summary.json
Governance policies (deny-first, kill-switch, purge, trace logs)
“How to Grade” insert (acceptance gates, where to look)

Outcome: either the surrogate deltas replicate within CI on your hardware, or they don’t. The kit is designed to make that determination quickly and cleanly.

Integration (no rip-and-replace)

A–W runs as a sidecar next to your existing C2/mission tools:

Inputs: mirrors existing feeds (sensor summaries, tracks, chat, reports)
Tools: calls only whitelisted local tools (routing, link planning, reporting)
Outputs: pushes traceable recommendations back into your displays for human approval
Security: default offline; no external network calls

Contact

For kit access, briefings, or licensing discussions: [email protected]