AI-Enabled Wargaming at the U.S. Army Command and General Staff College: Its Implications for PME and Operational Planning
Wargaming remains a cornerstone of military planning, enabling commanders to test courses of action (COAs), anticipate adversary responses, and refine operational designs under compressed timelines. As articulated in Joint Publication (JP) 5-0, Joint Planning, wargaming synchronizes warfighting functions through action-reaction-counteraction cycles, exposing vulnerabilities and optimizing resource allocation. Yet traditional analog methods—reliant on manual adjudication and static maps—constrain iteration and depth, particularly in multi-domain scenarios against peer threats. The advent of artificial intelligence (AI) offers a transformative solution. Hybrid pipelines that pair ontology-augmented generation (OAG) with retrieval-augmented generation (RAG), such as those integrated into the Army’s Vantage platform, can adjudicate outcomes probabilistically while adhering to doctrinal constraints, accelerating decision cycles without sacrificing rigor.
Recent U.S. military experiments underscore this potential. The Air Force’s Decision Advantage Sprint exercises have employed AI to simulate human-machine teaming in wargames, reducing adjudication time from hours to minutes. Similarly, the Johns Hopkins Applied Physics Laboratory’s (JH APL) GenWar initiative uses large language models (LLMs) to automate scenario generation and replay, addressing the labor-intensive nature of traditional exercises. At the U.S. Army Command and General Staff College (CGSC), where faculty have led the Army’s integration of AI into military education, similar innovations culminated in a November 2025 wargame exercise in which AI not only amplified throughput but also fostered deeper doctrinal application among novice planners. This paper analyzes the CGSC event’s execution, outcomes, and enabling factors, drawing parallels to broader Army and Joint Force initiatives. It concludes with recommendations for scaling AI integration, emphasizing the ethical and operational imperatives in an era of accelerating great-power competition.
Background and Evolution of the Capability
The CGSC experiment built on a decade of doctrinal evolution and technological experimentation. JP 5-0, Joint Planning, establishes wargaming as integral to the Joint Planning Process (JPP), requiring integration across intelligence, fires, maneuver, protection, sustainment, information, and command-and-control (C2) functions. However, the publication’s emphasis on human judgment highlights limitations in analog execution: cognitive overload and incomplete exploration of branches and sequels. AI addresses these gaps by automating probabilistic adjudication, as explored in the Global Information Dominance Experiments (GIDE), where LLMs fused multi-domain data to inform real-time planning.
By fall 2025, the CGSC capability comprised five interdependent components: (1) an in-house, custom-built Vantage agent with a 128,000-token context window containing the full joint task force exercise scenario, relevant Joint Publications, enemy battle books, and missile-mathematics probability tables developed for multi-domain operations; (2) an alphanumeric grid-reference graphic for spatial reasoning, mitigating LLMs’ inherent weaknesses in geospatial analysis; (3) a rigorously formatted Excel synchronization matrix with a standardized first-column “handle” (Phase / Grid / Unit / Task / Purpose) to enable reliable data parsing; (4) pre-validated prompt sets and output templates enforcing structured responses (e.g., battle-damage assessments in tabular format); and (5) Google Drive as the collaborative information environment, ensuring accessibility for multinational participants.
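The third component merits a concrete illustration. The exercise’s matrix schema is not published, but a minimal sketch of how a standardized, slash-delimited first-column handle enables unambiguous parsing might look like the following; the field order comes from the description above, while the example row and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SyncRow:
    """One synchronization-matrix row keyed by its first-column handle."""
    phase: str
    grid: str
    unit: str
    task: str
    purpose: str

def parse_handle(handle: str) -> SyncRow:
    # The standardized handle is assumed here to be slash-delimited:
    # "Phase / Grid / Unit / Task / Purpose"
    parts = [field.strip() for field in handle.split("/")]
    if len(parts) != 5:
        raise ValueError(f"Malformed handle, expected 5 fields: {handle!r}")
    return SyncRow(*parts)

# Hypothetical row; the exercise's actual entries are not published.
row = parse_handle("II / F7 / 2-7 CAV / Seize OBJ IRON / Enable corps passage")
print(f"{row.unit} executes '{row.task}' at grid {row.grid}")
```

A rigid handle like this is what lets an agent map free-text staff inputs onto grid squares and unit records without ambiguity.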
Faculty preparation aligned with PME best practices outlined in JP 1, which stresses adaptive learning environments. Instructors completed formal faculty training the preceding week, focusing on ontology curation and hallucination mitigation. Students underwent a two-hour training block on prompt discipline—e.g., specifying “use missile-math tables for Pk calculations”—and human override protocols, drawing from lessons in the Army’s Artificial Intelligence Integration Center (AI2C) experiments. This preparation mirrored broader DoD efforts, such as the Air Force’s Shadow Operations Center-Nellis (ShOC-N) capstone events, which integrated AI for dynamic targeting in 2024-2025 wargames.
The evolution reflects a doctrinal shift toward human-AI teaming, an approach poised to transform operational planning and complex problem solving. JP 2-0, Joint Intelligence, advocates leveraging AI for intelligence preparation of the battlefield (IPB), including predictive modeling of adversary COAs (U.S. Joint Chiefs of Staff 2013). By embedding these principles, CGSC’s model transcended ad hoc tools, becoming a scalable framework for JPP augmentation that promises cognitive overmatch and enhanced warfighting capability for the operational force.
Execution on 12 November 2025
The exercise simulated a four-star joint task force (JTF) operation in an Indo-Pacific theater, tasked with defeating a peer adversary’s anti-access/area-denial (A2/AD) network and restoring maritime access amid contested logistics and cyber threats—scenarios echoing JP 3-0’s all-domain operations paradigm. Two staff groups of 16 officer-students each, including four international partners from coalition and allied nations, developed COAs over three days, then transitioned to wargaming on 12 November.
Together, the two groups completed nine full turns across primary and alternate COAs, plus one limited branch on chemical, biological, radiological, and nuclear (CBRN) contamination of a logistics hub, in fewer than three hours. Underscoring AI’s compression of decision loops, parallel analog planning teams completed no more than two turns in the same period, and many managed only one. Moreover, the agent ran hundreds of probabilistic Monte Carlo iterations within each turn, producing histograms of outcome probabilities, so the nine AI-enabled turns carried the analytic weight of roughly 900 analog turns.
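To make the Monte Carlo claim concrete, the sketch below shows one way a turn’s engagements could be re-sampled hundreds of times to produce a histogram of outcomes rather than a single adjudicated result; the Pk, engagement count, and iteration count are illustrative assumptions, not figures from the exercise.

```python
import random
from collections import Counter

PK = 0.25          # assumed single-engagement probability of kill
ENGAGEMENTS = 8    # assumed engagements adjudicated in one turn
ITERATIONS = 500   # re-samples per turn, the "hundreds" described above

def run_turn() -> int:
    """Simulate one turn and return the number of successful kills."""
    return sum(random.random() < PK for _ in range(ENGAGEMENTS))

histogram = Counter(run_turn() for _ in range(ITERATIONS))
for kills in sorted(histogram):
    share = histogram[kills] / ITERATIONS
    print(f"{kills} kills: {share:6.1%} {'#' * round(share * 40)}")
```

Each printed bar corresponds to one qualitative outcome band, which gives a staff a distribution to interrogate rather than a single die roll.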
The workflow remained disciplined and repeatable, aligning with JP 5-0’s action-reaction-counteraction cadence. Staff sections entered friendly tasks, purposes, and locations into the synchronization matrix using the standardized first-column format. The AI operator submitted these rows with a concise pre-turn prompt, incorporating grid references and commander’s critical information requirements (CCIRs). The agent returned a structured adjudication—typically 800–1,200 words—that included the following (a minimal schema sketch follows the list):
- a narrative summary linking actions to decisive points
- battle damage assessment tables with probabilistic outcomes (e.g., 25% mean Blue surface-ship attrition per engagement, std. dev. 5%)
- updated positional data on the grid
- sustainment status (e.g., fuel stocks at 30% capacity)
- intelligence discoveries with confidence intervals
- recommended CCIRs
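The exercise’s output templates are not reproduced here, but a minimal schema capturing the six elements above might look like the following; all field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class BDAEntry:
    """One row of the probabilistic battle-damage table."""
    target: str
    attrition_mean: float    # e.g., 0.25 for 25% mean attrition
    attrition_stdev: float   # e.g., 0.05

@dataclass
class TurnAdjudication:
    narrative: str                    # summary linking actions to decisive points
    bda: list[BDAEntry]               # battle-damage assessment with probabilities
    positions: dict[str, str]         # unit -> alphanumeric grid reference
    sustainment: dict[str, float]     # commodity -> fraction of capacity
    intel: list[tuple[str, float]]    # (finding, confidence between 0 and 1)
    recommended_ccirs: list[str] = field(default_factory=list)
```

Forcing the agent to fill a fixed structure like this is part of what makes a 60- to 90-second validation huddle feasible: officers can scan known fields instead of rereading free text.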
Furthermore, the officers conducted rapid validation huddles of 60–90 seconds, querying the agent for clarifications (e.g., “recalculate fires effects using JP 5-0 Annex C”). They corrected optimistic attrition estimates when needed—drawing on embedded missile-math tables—and initiated the next cycle. Turns averaged 15–25 minutes, enabling exploration of branches typically unattainable in analog settings.
International officers contributed diverse perspectives, with one initially skeptical of AI’s “probabilistic nature” versus dice-based simulations. By Turn 4, however, the group leveraged the agent’s outputs to refine multinational fires integration, validating the hybrid model’s efficacy in coalition environments.
Figure 1: Staff groups 14C-D developed a grid-reference guide for the DATE-P scenario, giving the AI agents a shared spatial frame for clear and accurate georeferencing.
Observed Outcomes and Comparative Advantage
Quantitative gains were clear. Although the students were all novices in AI-agent construction and context engineering, none having used AI professionally before the start of the academic year, they rapidly excelled. Throughput rose by a factor of five, and analytical volume grew accordingly: each turn generated 100-200 doctrinally referenced data points that would require hours of manual effort in an analog setting. Synchronization matrices reached unprecedented density and coherence, capturing interdependencies across warfighting functions with granularity rivaling theater-level tools like the Joint Planning and Execution Community System (JPEC). Second- and third-order effects in sustainment and protection, such as contested resupply shortfalls triggering C2 degradation, surfaced earlier and more precisely than in legacy exercises. Senior observers judged the final joint plans comparable to products from experienced theater-level headquarters, with COA comparisons yielding weighted scores (e.g., tempo: 8/10; sustainability: 7/10) that informed the commander’s decision making.
A pivotal moment occurred during Turn 6. The agent identified a critical fuel shortfall at an intermediate staging base that the Sustainment Planning Team had not yet recognized. This insight arose from the AI’s ability to track consumption rates across contested sea lines at a resolution unattainable by unaided officers under time pressure, echoing GIDE’s data-fusion successes. Comparative analysis with analog groups revealed not just speed but enhanced learning: AI-enabled participants identified 40% more risks and branches, fostering adaptive thinking aligned with JP 5-0’s emphasis on operational art.
These outcomes align with broader experiments. The Naval War College’s 2025 wargames used AI for optimization, reducing variables from thousands to manageable sets. Similarly, the Army War College’s revival of Free Kriegsspiel, a historical role-playing wargame, employed LLMs for ethical adjudication, preventing biases in unclassified scenarios.
The Indispensable Role of Human Oversight
Success required rigorous human control, reaffirming JP 2-0’s principle that AI augments, not supplants, intelligence professionals. Officers intervened repeatedly. They increased enemy lethality when the model underestimated peer missile salvo density. Referencing JP 3-12, Cyberspace Operations, they accounted for integrated cyber effects. They enforced “no-move-unless-ordered” rules on logistics nodes to maintain realism. They re-established context after token-limit transitions via primer prompts. And they validated probabilistic outcomes against embedded missile-math tables, which modeled probability of kill (Pk) at 10-38% for hard and soft targets per derived salvo equations.
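The derived salvo equations themselves are not published, but the standard independent-shot relationship gives a sense of how officers could sanity-check the agent’s attrition claims; the Pk band comes from the paragraph above, while the salvo sizes are illustrative.

```python
def salvo_kill_probability(pk: float, n: int) -> float:
    """Probability of at least one kill from n independent shots with per-shot Pk."""
    return 1.0 - (1.0 - pk) ** n

for pk in (0.10, 0.38):    # hard- and soft-target Pk band cited above
    for n in (1, 4, 8):    # assumed salvo sizes for illustration
        p = salvo_kill_probability(pk, n)
        print(f"Pk={pk:.2f}, salvo of {n}: P(>=1 kill) = {p:.2f}")
```

When the agent reported attrition far outside what such a check allows, the table-backed override described above was the remedy.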
These adjustments took little time yet preserved credibility, mitigating hallucinations—a risk highlighted in DoD AI ethics guidelines. Officers exercised judgment more often and at higher fidelity than in analog wargames, where exhaustion and time constraints typically force superficial analysis. This hybrid approach mirrors the Air Force’s human-machine teaming sprints, where overrides ensured doctrinal alignment.
Enabling Conditions
Five conditions were essential. First, structured data inputs came from the standardized synchronization matrix, enabling the LLM to parse inputs without ambiguity. Second, a spatial reference system used the grid overlay, addressing AI’s geospatial limitations as noted in RAND studies on IPB augmentation. Third, doctrinal and probabilistic guardrails resided in the ontology, preventing systematic bias per JP 5-0’s fidelity requirements. Fourth, a pre-validated prompt architecture ensured consistency, with templates mandating outputs like “confidence: High (75–100%)”. Finally, robust faculty and student preparation underpinned everything.
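The validated CGSC prompts remain internal, but a pre-turn template in their spirit might read as follows; the wording and field list are assumptions assembled from the conditions above, with the confidence bands and no-move-unless-ordered rule echoing the exercise’s own constraints.

```python
# Hypothetical pre-turn prompt template; the exact wording is invented
# for illustration and is not the validated CGSC prompt set.
PRE_TURN_PROMPT = """\
You are the wargame adjudicator. Ground every result in the attached
scenario, joint doctrine excerpts, and missile-math tables.

Adjudicate the friendly actions below for Phase {phase}, Turn {turn}:
{sync_rows}

Return, in order:
1. Narrative summary tied to decisive points.
2. BDA table with mean attrition and standard deviation per target.
3. Updated alphanumeric grid positions for every unit.
4. Sustainment status as percent of capacity.
5. Intelligence findings, each labeled confidence: High (75-100%),
   Medium (40-74%), or Low (0-39%).
6. Recommended CCIR updates.
Do not move any unit that was not explicitly ordered to move.
"""

prompt = PRE_TURN_PROMPT.format(phase="II", turn=6, sync_rows="...")
```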
Earlier iterations lacking these elements suffered context drift or unacceptable Blue bias, as documented in 2024 ERDC wargames. Integration with tools like Google Drive further ensured a “single source of truth,” facilitating RFIs and version control in multinational settings.
Institutional and Operational Implications
The 12 November exercise was the predictable result of systematic refinement across multiple cohorts, paralleling the Army’s AI2C playbook for battlefield intelligence. The capability can now transfer to any motivated faculty team or operational staff with three to five days of preparation, as evidenced by the U.S. Army War College’s remarkable advances with its LLM-based Kriegsspiel.
The Army faces a clear imperative. AI-enabled wargaming must become the default method in every CGSC planning exercise starting academic year 2026–2027. This mirrors the institutionalization of the Military Decision-Making Process three decades ago. Professional Military Education must teach every field-grade officer three core competencies:
- construction and curation of retrieval-augmented agents using platforms like Vantage (a minimal sketch follows this list)
- prompt and context engineering to embed doctrinal constraints
- rapid validation and override of AI outputs, including hallucination detection
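For the first competency, the sketch below shows the skeleton of retrieval-augmented generation: embed doctrinal excerpts, retrieve the most relevant ones for a query, and prepend them to the adjudication prompt. The toy embed() function exists only to keep the example self-contained; a platform like Vantage would supply a real embedding service, and the corpus snippets here are invented.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Illustrative corpus of curated doctrine and scenario snippets.
corpus = [
    "JP 5-0: wargaming follows an action-reaction-counteraction cycle.",
    "Missile-math table: Pk ranges 0.10-0.38 by target hardness.",
    "Enemy battle book: the A2/AD network prioritizes maritime chokepoints.",
]
corpus_vecs = np.stack([embed(doc) for doc in corpus])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k corpus snippets most similar to the query."""
    scores = corpus_vecs @ embed(query)
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

context = "\n".join(retrieve("adjudicate missile strikes on the surface group"))
prompt = f"Ground your adjudication in:\n{context}\n\nTurn inputs: ..."
print(prompt)
```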
Operational formations from brigade to joint task force need dedicated knowledge managers and prompt engineers, just as they resource operations research/systems analysts today—potentially via Transformation and Training Command’s (T2COM) modernization directive.
Inaction risks strategic disadvantages. Peer competitors, including the PLA, already integrate AI at scale for campaign planning. The U.S. Army has demonstrated a workable, doctrinally sound model, validated against JP standards. Delay in institutionalization would amount to a self-imposed handicap, ceding initiative in the cognitive domain.
Image 1: Staff groups 14C-D confer to review and validate the AI’s adjudication of the red reaction, 12 November 2025.
Expansion to corps-level capabilities, as recommended in Mad Scientist Laboratory podcasts, could extend the model to theater-level attrition simulation. Funding for dashboard interfaces, such as voice-activated overrides, would minimize friction, while operational security (OPSEC) protocols would keep unclassified ontologies secure. Ultimately, this paradigm shift aligns with the National Defense Strategy’s emphasis on innovation, preparing leaders for AI-permeated conflicts.
Implications for PME and Operational Planning
On 12 November 2025, the 32 officers and their faculty proved that AI-augmented wargaming has matured into an essential capability. They did not simply accelerate existing processes: they dramatically expanded the cognitive battlespace for planners, enabling deeper synchronization, earlier risk identification, and richer exploration of branches and sequels. The students and faculty developed the “software” solution at no additional cost to the Department of the Army by leveraging a current program of record, Vantage. The technical, doctrinal, and procedural solutions now exist in documented, replicable form, building on foundational experiments like those at ShOC-N and APL, and can be applied and refined by future military planning groups to enhance warfighting capability.
The remaining question centers on the Army’s willingness and capacity to resource and apply AI technologies in ways that enhance both educational delivery and operational application. In keeping with Chief of Staff of the Army General Randy George’s imperative to achieve “continuous transformation,” the Army must fully integrate AI into its staffs and teach leaders to build, constrain, and employ these systems with the same rigor applied to fire support coordination or air tasking orders. CGSC’s innovative employment of AI to enhance wargaming on 12 November 2025 shows that the tools are ready. The only constraint is the speed of institutional adoption, lest adversaries outpace us in the wargaming laboratories of tomorrow.