小論文: AI拡張UCTによる汎用ゲームプレイ

author:

黒岩晃弘 と Gemini

date:

2025/10/03

概要

This paper introduces mcts-gen, a novel framework for Monte Carlo Tree Search (MCTS) that replaces the evolutionary mechanisms of Genetic Programming (GP) with a modern AI agent. We propose an architecture centered on an "AI-Augmented UCT" algorithm, where a standard UCT search is enhanced at three key points by an external AI: terminal node evaluation (Value), dynamic search parameter tuning (Exploration), and, most significantly, action space reduction via Policy Pruning. This approach diverges from the popular AlphaZero model by retaining the UCT algorithm's simplicity while leveraging an AI's policy model to dramatically improve performance in games with large branching factors, such as Shogi. We demonstrate a stateful client-server model where the AI agent orchestrates the entire simulation loop, iteratively refining its strategy based on real-time performance metrics.

1. 遺伝的プログラミングのAIエージェントによる置換

In previous works like chess-ant, Genetic Programming was used to evolve a strategy for tuning the MCTS explorationConstant. This process, while effective, involved a computationally expensive evolutionary cycle with a large population and multiple generations. Each evaluation required a full MCTS simulation, leading to significant time investment.

mcts-gen replaces this entire evolutionary loop with a single, intelligent AI agent. The agent maintains a single strategic model (e.g., a Python function) and iteratively refines it based on direct feedback from the search process. This AI-driven, single-strategy evolution is significantly more efficient, allowing for rapid strategy adaptation without the overhead of managing a genetic population.

The core of this interaction is a stateful simulator (AiGpSimulator) whose methods are exposed as MCP tools. The AI agent calls these tools strategically, managing the simulation loop externally. In particular, the introduction of the Search Limit mechanism (via run_mcts_analysis) allows the agent to execute multiple MCTS rounds in a single batch. This mimics the GP routine() cycle in chess-ant but with AI-driven budget allocation, ensuring a tight feedback cycle of execution, analysis, and self-correction while avoiding API-level bottlenecks.

2. Policy Pruning: PUCTの代替案

AlphaZero and its derivatives integrate a policy network directly into the selection phase of MCTS via the PUCT (Polynomial Upper Confidence Trees) formula. While powerful, this tightly couples the search algorithm with the policy model.

We propose a simpler, more decoupled approach: Policy Pruning. The workflow is as follows:

  1. The AI agent calls a tool (get_possible_actions) to retrieve all legal moves from the current node.

  2. The agent applies its internal policy model to this list, filtering out unpromising moves and creating a smaller, pruned list of candidate actions.

  3. The agent then calls the main search tool (run_mcts_round), passing this pruned list (actions_to_expand) as an argument.

  4. MCTSエンジンは、このリストを受け取ると、その展開フェーズをAIによって提供されたアクションのみを考慮するように制約する。

This method effectively uses the AI's policy as a high-level filter, dramatically reducing the branching factor of the search tree, especially in complex games like Shogi. It allows the underlying engine to remain a standard UCT implementation, simplifying the architecture while still reaping the primary benefit of a policy network.

3. AI主導の探索によるUCT

Instead of PUCT, mcts-gen uses the standard UCT (Upper Confidence bounds for Trees) algorithm for node selection. The key innovation lies in how the explorationConstant (C in the UCT formula) is determined.

  • The AI agent is responsible for generating and maintaining a strategy (e.g., a Python function) that determines the optimal explorationConstant for any given game state.

  • This strategy can be complex, taking into account game-specific features (e.g., board.is_check()) and generic simulation metrics (e.g., improvement of the UCT value).

  • The AI executes this strategy to choose a constant for each simulation loop and refines the strategy code based on performance, effectively learning how to best balance exploration and exploitation.

4. AlphaZeroとのその他の相違点

  • Decoupled Logic: The MCTS engine and the AI "brain" are fully decoupled. The engine provides generic tools, and the AI uses them to implement its own, potentially complex, search logic.

  • Stateful Interaction: Unlike a stateless model, the server maintains the MCTS tree instance across multiple tool calls, allowing the AI to build upon previous search results within a single turn.

  • Explicit Strategy: The AI's exploration strategy is an explicit, human-readable piece of code, which can be logged and analyzed, offering greater transparency than the implicit weights of a neural network.

5. ゲームロジック生成の課題

The mcts-gen framework is designed to be generic. This requires the creation of game-specific logic files (*_mcts.py) that inherit from a GameStateBase abstract class. This task has proven to be complex for both humans and AI agents due to the need for a deep understanding of two separate APIs: the game library (e.g., python-shogi) and the GameStateBase interface.

Our experience shows that this process is not a simple, one-shot generation. It requires iterative trial and error, debugging, and a precise understanding of concepts like object copying (deepcopy), return value conventions, and API-specific methods (e.g., board.is_checkmate() vs. GameStateBase.takeAction(action)).

The use of a toolkit for Spec-Driven Development like spec-kit is highly recommended for this process. By defining the requirements in structured markdown files (spec.md, plan.md, tasks.md), the AI can follow a clear, test-driven development (TDD) cycle, breaking down the complex task into manageable steps and verifying each one, which has proven essential for success.

6. Comparison with chess-ant's GP Model

  • `chess-ant`: The Genetic Programming model in chess-ant relies on a large-scale evolutionary simulation. For the evaluation of each individual in the population, key instance variables used for statistics are reset, but the underlying MCTS search tree is maintained. This process, where a full MCTS simulation is run for each individual across many generations, is computationally massive.

  • `mcts-gen`: The AI agent replaces the entire population. It maintains a single strategy and iteratively improves it. The AI drives a main loop where it decides the Search Limit—the number of MCTS rounds to execute in a single batch. This is equivalent to one GP routine() in chess-ant, but the strategy refinement is done intelligently by the AI after each batch, rather than through generational evolution. Furthermore, for complex domains like ligand generation, the agent now manages Conformational Diversity, exploring various 3D orientations as distinct actions in the MCTS tree. The result is a significantly more efficient and realistic search process.

7. 参考文献