This repository is part of a Master’s thesis with full documentation available at https://tom.vincent-peters.de/master/.
KSquares is an implementation of the game squares
Rule based ai. Knows not to give away boxes, chooses chains at random.
Rule based ai. Like Easy, but opens shortest chain first.
Rule based ai. Like Medium, but is able to do Double Dealing to keep the game under control.
Alpha-Beta search algorithm.
Alpha-Beta search algorithm in a windows application that is run with wine.
MCTS with neural network. Can only play on 5x5 boards.
Alpha-Beta serach algorithm. Slightly unstable. Executed with wine.
Monte-Carlo Tree Search algorithm which uses Easy (A), Medium (B) or Hard (C) AI for the simulation step.
Interface for direct play against various neural networks. In the AI configuration dialog, you can select the network with the AlphaDots model
drop down list.
Like MCTS A,B,C but uses ConvNet for simulation.
You can find the code in src/aiConvNet.[h|cpp]
.
MCTS as described in the paper about AlphaZero
Relevant code files:
src/aiAlphaZeroMCTS.[h|cpp]
src/AlphaZeroMCTSNode.[h|cpp]
There is a test suite called alphazero
that tests if the MCTS actually finds new smart moves that the network would not have made.
This AI offers the following hyperparameters and configuration options:
--hp-mcts-iterations INT
number of MCTS iterations. The default value is 1500--hp-mcts-cpuct FLOAT
the C_puct value which controls exploration vs. exploitation. The default value is 10.0--hp-mcts-dirichlet-alpha FLOAT
the alpha parameter for the dirichlet noise that is applied to the children of the MCTS root node. Default: 0.03--hp-mcts-no-move-sequences
when this flag is set, the AI will not use move sequences but only single lines for each node in the Monte-Carlo tree. A move sequence can contain more than one line and represents a full move by a player. For example: In a situation where a player can capture a chain, the algorithm will generate a sequence where it does Double-Dealing on the chain and it will also generate other sequences where it fully captures the chain and then draws another available line. Move sequences reduce the search space by using knowledge specific to Dots and Boxes. Settings this flag will disable the use of this knowledge.--hp-mcts-use-probabilistic-final-move-selection
enables probabilistic final move selection instead of always selecting the highest rated move. MCTS generates a probability distribution over all direct children of the root node. Usually, the move with the highest probability is selected, ignoring the other moves with lower probabilities. By setting this flag, a decision is made according to the generated probability distribution.--hp-mcts-tau FLOAT
controls the probability distribution when selecting the final move. Defaults to 1. For lower values than 1.0 the distribution favors moves with higher probability while for values greater than 1.0 the probabilities equalize. Tau is only used when --hp-mcts-use-probabilistic-final-move-selection
is active.--hp-mcts-use-think-time
configures the algorithm to honor the think time. By default, the MCTS algorithm is only limited by the number of iterations. When set to use think times, it will still be limited by the number of iterations.You can also set most hyperparameters in the GUI:
You can run various test suites with CMAKE_BUILD_DIR/test/ksquarestest
. Call ksquarestest
without arguments to execute all available test suites. You can select individual test suites by passing their name as arguments. For example:
ksquarestest hardai berlekamp
Each test suite is based on QTest Options after --
are passed to QTest. The following example will list all tests in the hardai
test suite.
ksquarestest hardai -- -functions
The test suites are:
berlekamp
hardai
aiboard
alphazero
GSLTest
SharedPointer
For more details take a look at the tests page.
AlphaDots uses KSquares for
Please make sure to correctly configure AlphaDots in KSquares by entering the Alpha Dots directory in KSquares -> Settings -> Configure KSquares -> Computer Player.
Clone this git repository to get Alpha Dots:
git clone https://github.com/ofenrohr/alphaDots.git
You might also want to install tensorflow-gpu
in python for GPU acceleration.
You can generate training data by running ksquares with certain command line arguments. To explore some of the datasets with a graphical user interface, run this:
ksquares --show-generate
You can only have one board size per dataset. Models can be trained on many datasets of different size.
All dataset generators accept the following optional command line arguments:
--dataset-dest
Destination directory for the data (.npz file)--dataset-width
Board width in boxes--dataset-height
Board height in boxes--threads
Number of threadsStageOne / StageTwo data generator
Converts random Dots and Boxes games played by the Hard AI to images. One training example consists of two images:
All data is written to disk as actual .PNG images. Input and target images share the same sample UUID.
After generating the training data images, a python script needs to be run to convert the images to a numpy array.
mkdir firstTryDataset
ksquares --generate 1000 --dataset-generator firstTry --dataset-dest firstTryDataset
../alphaDots/datasetConverter/convert.py --dataset-dir firstTryDataset --output-file firstTry.npz --debug
The input and target images are no longer generated as individual image files but are sent directly to the python script instead. Communication is facilitated with zeroMQ which is used to send images that were serialized with the protobuf library.
This dataset also introduces weak (random) moves so that there are examples of early captures. 10% of all samples end with a random move. The target image is always made with the reaction of the Hard AI. Due to an oversight, the first version had a bug so that 90% of all samples were made with a random move. Models trained with the buggy version include StageOne and AlphaZeroV1,V2,V3.
ksquares --generate 1000 --dataset-generator stageOne --dataset-dest /mnt/DATA/stageOneDataset --dataset-width 7 --dataset-height 5
During the first phase of a Dots and Boxes game, there is a very large amount of acceptable lines. Usually the Hard AI just selects one of those many lines at random. The trained neural network does not know that there are many alternatives to the shown target image. This dataset aims to remedy this by putting all viable lines in one target image.
ksquares --generate 1000 --dataset-generator basicStrategy --dataset-width 7 --dataset-height 5
Despite the efforts in Basic Strategy, the trained models were not able to compete even with the Medium AI. The main innovation compared to the Easy AI is the ability to count chains - i.e. only handing over the chain with the least amount of squares. To do this, an AI needs to count the number of squares in each chain and decide accordingly.
Previous datasets were used as training data for convolutional neural networks, which lack the ability to count things. As a consequence, a new model is based on convolutional LSTM layers. That model needs another type of training data - only the last state of the game is not enough, it also needs all states leading up to it.
This dataset is made of full Dots and Boxes games, played by two Hard AIs.
ksquares --generate 1000 --dataset-generator LSTM
This dataset is like the Sequence dataset but with “Basic Strategy” target images.
ksquares --generate 1000 --dataset-generator LSTM2
The data is generated just like Stage One, but this dataset generator uses cnpy to directly write the .npz file. This removes the overhead of sending the data to a separate, single-threaded python process. As a result, this dataset generator can fully utilize the CPU and is much faster.
ksquares --generate 1000 --dataset-generator StageTwo --threads 8
Data is generated like in Stage Two but the policy is not saved as an image but as a vector of lines. StageThree additionally offers a value output for models like AlphaZeroV6 and up. The value is designed to approximate the chance of winning the game where 1 means winning and -1 means losing the game. It is calculated by playing the game to its end and then evaluating the number of captured boxes for each side as follows:
value = (OwnBoxes - EnemyBoxes) / (0.8 * TotalBoxes)
Create a Stage Three dataset by running:
ksquares --generate 1000 --dataset-generator StageThree --threads 8
This dataset generator comes in two flavors: with and without AlphaZero MCTS. The version with MCTS uses KSquare’s Hard AI and AlphaZero MCTS to generate data. As a result, it is possible to use this dataset generator to implement a simplified self-play loop.
Data is generated as follows:
In summary the Stage Four dataset generator uses AlphaZero MCTS for exactly one move to minimize computational cost while still providing means to improve upon the neural network.
The number of moves that are left before the AlphaZero MCTS calculates its move is determined randomly according to a gaussian normal distribution. The distribution is scaled according to the number of lines so that most samples are in the middle of the game. The following figure shows a histogram for 1.000.000 samples on a 3 x 3 board, which has 24 lines:
The value from the selected child node is used. In case of not using MCTS, the value is calculated as follows:
value = (OwnBoxes - EnemyBoxes) / TotalBoxes
Create a Stage Four dataset with one of the following commands:
ksquares --generate 16 --dataset-generator StageFour --threads 8 --gpu
ksquares --generate 1000 --dataset-generator StageFourNoMCTS --threads 8
The StageFour dataset generator offers two hyperparameters:
--hp-sigma-scale
scales the variance of the moves left distribution. The default is set to 0.125. Lower values result in more samples close to the mean value, while higher values result in more evenly distributed distribution.--hp-mean-scale
moves the mean value of the moves left distribution. The default is set to 0.5, so that the most examples are in the middle.The are two options to evaluate the trained models. The first option uses the native KSquares Dots and Boxes engine and displays all games while the second option is optimized for fast evaluation. In both cases, a selection of models will be evaluated by playing against the KSquares AIs Easy
, Medium
and Hard
.
Start the slow, GUI based model evaluation with:
ksquares --model-evaluation
Start the multi-threaded fast model evaluation with:
ksquares --fast-model-evaluation --threads 8
Both evaluation modes support the following optional arguments:
--models MODELS
By default, all models will be evaluated. If you are only interested in a specific subset, use this optional argument. You can get a list of all available models with ksquares --model-list
--opponent-models MODELS
This parameter selects the models to evaluate against. By default the built-in AIs Easy
, Medium
and Hard
are used.--dataset-width WIDTH
Board width measured in boxes.--dataset-height HEIGHT
Board height measured in boxes.--gpu
Allow the model server to use the GPU.--report-dir DIRECTORY
Output directory for the report. If this option is present, the model evaluation will save a report and quit immediately afterwards.--analyse-double-dealing
By setting this flag, the Double Dealing analysis module will count the instances of Double dealing that were executed by the models. Double Dealing moves by opponent models will not be counted.--no-quick-start
Disables the quick start functionality. By default the model evaluation initializes the board with a few random lines to save time.--think-time
Sets the time in milliseconds that an AI has to find a move. The AlphaZero AI will only honor think times if --hp-mcts-use-think-time
is set.All games are played on a 5x5 board. The 5x5 board size is reserved for evaluation. No model was ever trained on any dataset with 5x5 boards.
Model | Games | Wins vs. Easyin 100 games | Wins vs. Mediumin 100 games | Wins vs. Hardin 100 games | Errors |
---|---|---|---|---|---|
FirstTry | 300 | 1 | 0 | 0 | 0 |
StageOne 3x3 | 300 | 58 | 27 | 6 | 0 |
StageOne 5x5 | 300 | 0 | 1 | 0 | 0 |
BasicStrategy | 300 | 55 | 27 | 7 | 0 |
AlphaZeroV1 | 300 | 82 | 80 | 35 | 0 |
AlphaZeroV3 | 300 | 79 | 94 | 50 | 0 |
AlphaZeroV5 | 300 | 90 | 75 | 45 | 0 |
AlphaZeroV7 | 300 | 96 | 80 | 46 | 0 |
AlphaZeroV14 | 300 | 43 | 39 | 12 | 0 |
AlphaZeroV14_MCTS | 300 | 93 | 84 | 60 | 0 |
The button Save results as
will create an extensive report in markdown format. Here is an example model evaluation report that was rendered from the markdown file.
KSquares can operate a self-play loop which works as follows:
Each iteration uses the best network to generate new training data. By default, self-play starts with AlphaZeroV7
as the best network. The self-play mode supports two settings for the --data-generator
option: StageFour
, which is the default and StageFourNoMCTS
, which is useful for generating data with rule based AIs. After it generated the data, KSquares trains the target model on the generated data.
In the first iteration, KSquares creates the target model if it does not exist. Every training process is logged separately. To train the next iteration of a network, KSquares calls AlphaZeroV10.py with appropriate arguments. Training can be done on the CPU or GPU, depending on if --gpu-training
is set. If the --cumulative-training
flag is set, KSquares will use the network from the previous iteration as the starting point for training, even if it did not win in the evaluation. Without cumulative training, KSquares will base the training on the best network, that was also used to generate data. During self-play each version of the model will be saved separately with a _#iteration
suffix, while the normal model name references the best version.
After training, KSquares will evaluate the new model against the best model, that was used to generate data. If the new model wins at least half of the games, it becomes the best model and as a result it will be used to generate data in the next iteration. Depending on the AI configuration in the models.yaml
file, the evaluation will either use direct play (ConvNet
) or MCTS.
Self-Play in KSquares can be started as follows:
ksquares --self-play --initial-model AlphaZeroV7 --target-model MyNewModelV1
The following optional arguments will be considered by self-play:
--gpu
enables GPU acceleration of the data generator. Training the network will not use GPU acceleration, because it only takes a fraction of the overall time.--gpu-training
enable GPU acceleration for the training python script.--threads N
number of thread to use when generating data with the MCTS AI (StageFour dataset). Default: 4--iterations N
number of self-play iterations to run. Default: 50--iteration-size N
number of samples to generate per iteration--initial-model MODEL
the name of the model to start generating training data with. Default: alphaZeroV7
--target-model MODEL
the name of the model to improve in self-play. If the model does not exist, it will be created.--debug
print debug information for individual prediction requests. Settings this flag will slow things down considerably.--debug-mcts
print debug output about the AlphaZeroMCTS algorithm.--dataset-generator GENERATOR
select the dataset generator. Supported generators are:
StageFour
the StageFour dataset generatorStageFourNoMCTS
the StageFour dataset generator Hard AI instead of MCTS--dataset-ai NAME
sets the AI used by the StageFourNoMCTS
generator. Can be set to easy
, medium
or hard
. Default: hard
--epochs N
number of epochs in training.--board-sizes SIZES
sets the board sizes in boxes on which the self-play mode will operate. Default value: 4x3,5x4,6x5,7x5,8x8,14x7,14x14,10x9. Board lengths must be at least 2 and at most 20.--no-evaluation
disables the evaluation step.--games N
number of games in the evaluation step.--report-dir DIRECTORY
Destination directory for the evaluation reports. If this option is set, KSquares will quit immediately after it finished the last iteration.--cumulative-training
Do not discard networks if they did not win the evaluation.Please be aware that the self-play mode honors hyperparameters for the StageFour dataset generator and the AlphaZero MCTS AI.
It is possible to generate data very fast with the StageFourNoMCTS dataset generator and train a network on that data with KSquare’s self-play mode. StageFourNoMCTS data is generated purely with the Hard AI. So there is no improvement loop, just data generation and training with this setup.
ksquares --self-play --iteration-size 10000 --threads 8 --gpu-training --dataset-generator StageFourNoMCTS --initial-model AlphaZeroV11 --target-model AlphaZeroV11 --epochs 3 --no-evaluation
AlphaZeroV11 was trained for 38 iterations with a iteration size of 10,000 after it was created with the IPython Notebook.