The Sequential Probability Ratio Test, or SPRT, is a sequential hypothesis testing method. Generally regarded as the most important process in modern chess engine development, it is used by virtually all top engines today, including Stockfish.

Testing Methodology

Generally, an SPRT test is run on every proposed functional change. Some non-functional speedup changes can also be tested with SPRT, although a speedup script may take fewer resources than an SPRT test.

Two bounds, x and y, must be provided to conduct an SPRT test. The test keeps running until it can conclude, within a chosen degree of confidence, whether the change is more likely to alter the strength of the engine by y elo than by x elo. The choice of x and y is important, as it influences both the number of games needed and the elo gain a patch needs in order to pass the test.
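As a rough illustration of the stopping rule described above, the following Python sketch uses Wald's standard approximate thresholds for the log-likelihood ratio (LLR) between the hypotheses "elo = y" and "elo = x". The function names are illustrative and are not taken from any of the tools mentioned on this page; alpha and beta are the tolerated false-positive and false-negative rates.

```python
import math

def sprt_thresholds(alpha, beta):
    """Wald's approximate LLR thresholds (illustrative helper)."""
    lower = math.log(beta / (1.0 - alpha))   # at or below: accept "elo = x"
    upper = math.log((1.0 - beta) / alpha)   # at or above: accept "elo = y"
    return lower, upper

def sprt_decision(llr, alpha=0.05, beta=0.05):
    """Return the state of the test for the current LLR value."""
    lower, upper = sprt_thresholds(alpha, beta)
    if llr >= upper:
        return "pass"      # data favour the upper bound y
    if llr <= lower:
        return "fail"      # data favour the lower bound x
    return "continue"      # not enough evidence yet, keep playing games

if __name__ == "__main__":
    print(sprt_thresholds(0.05, 0.05))
    print(sprt_decision(1.2))
```

Bounds that are closer together make the LLR move more slowly per game, so the test needs more games before it crosses either threshold.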

Many tools can be used to perform SPRT tests; the most popular ones are OpenBench, cutechess-cli, and fast-chess.

Performing an SPRT Test

The following sections describe the procedure for local SPRT testing. For distributed SPRT testing, refer to OpenBench.

Preparing an SPRT Test

To perform an SPRT test on the command line, you need the following programs and resources:

A Tournament Manager that supports SPRT testing. fast-chess is recommended as it supports pentanomial statistics, which can make tests finish faster.
An opening book (.pgn or .epd, not .bin). Some opening books can be found in the Stockfish opening books repository. Balanced books such as 8moves_v3.pgn are recommended for weaker engines, while more biased books such as UHO_Lichess_4852_v1.epd and Pohl.epd are recommended for stronger engines to reduce the draw rate.

Performing an SPRT Test With Gainer Bounds

Say you want to introduce LMR to your UCI engine. Build a binary of your engine with LMR added, and a binary of your base engine (without LMR). We'll call the two binaries engine_LMR and engine_BASE respectively.

In your shell, run the following command:

.\fast-chess -engine cmd=[Path to engine_LMR] name=engine_LMR -engine cmd=[Path to engine_BASE] name=engine_BASE -each tc=8+0.08 -rounds 15000 -repeat -concurrency [Number of Available Threads] -recover -randomseed -openings file=[Path to Opening Book] format=[Opening book format (pgn or epd)] -sprt elo0=0 elo1=5 alpha=0.05 beta=0.05

Here, elo0 and elo1 represent the test bounds, while alpha and beta represent the accepted type I (false positive) and type II (false negative) error rates of the test. The test checks whether the engine with the proposed change is closer to 0 elo or to 5 elo relative to the base engine. Therefore, if the test passes, we can be reasonably certain that the change gains more than 0 elo.
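To make the role of elo0 and elo1 more concrete, the sketch below converts elo bounds to expected scores with the usual logistic elo model and computes a commonly used normal approximation of the log-likelihood ratio from win/draw/loss counts. This is an assumption-laden illustration, not the calculation fast-chess performs: fast-chess can use pentanomial (game-pair) statistics, and its elo model may differ. All function names are hypothetical.

```python
import math

def expected_score(elo):
    """Expected score for an elo difference under the logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def approx_llr(wins, draws, losses, elo0, elo1):
    """Normal approximation of the LLR from trinomial (W/D/L) results."""
    n = wins + draws + losses
    if n == 0:
        return 0.0
    w, d = wins / n, draws / n
    score = w + d / 2.0                      # mean score per game
    variance = (w + d / 4.0) - score ** 2    # per-game score variance
    if variance <= 0.0:
        return 0.0                           # all games had the same result
    var_of_mean = variance / n
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * var_of_mean)

if __name__ == "__main__":
    # Made-up counts, used only to show the call.
    print(expected_score(5))
    print(approx_llr(120, 100, 80, elo0=0, elo1=5))
```

The LLR is positive when the observed score is above the midpoint of the two expected scores and negative when it is below, which is what "closer to 0 elo or 5 elo" means in practice.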

Performing an SPRT Test With Non-regression Bounds

Say you want to remove the Countermove Heuristic from your UCI engine, as it takes a large amount of code and does not necessarily gain elo. To verify that the removal doesn't lose more elo than an accepted amount, a non-regression test is performed. Build a binary of your engine with the Countermove Heuristic removed, and a binary of your base engine (with the Countermove Heuristic still in place). We'll call the two binaries engine_NOCM and engine_BASE respectively.

.\fast-chess -engine cmd=[Path to engine_NOCM] name=engine_NOCM -engine cmd=[Path to engine_BASE] name=engine_BASE -each tc=8+0.08 -rounds 15000 -repeat -concurrency [Number of Available Threads] -recover -randomseed -openings file=[Path to Opening Book] format=[Opening book format (pgn or epd)] -sprt elo0=-5 elo1=0 alpha=0.05 beta=0.05

Here, the bounds [-5, 0] are a flipped version of the gainer bounds. This is equivalent to testing engine_BASE against engine_NOCM (notice the reversed order) with [0, 5] bounds. If the test fails, it means that the Countermove Heuristic is still worth more than an acceptable amount of elo, and therefore it should not be removed.
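The gainer and non-regression invocations differ only in the engine under test and the elo0/elo1 values. A small helper along the following lines can build either command; it is a sketch that only reuses the fast-chess options shown in the commands on this page, and the function name, paths, and thread count are placeholders chosen for illustration.

```python
def build_sprt_command(fastchess, new_engine, new_name, base_engine, base_name,
                       book, book_format, elo0, elo1, threads,
                       alpha=0.05, beta=0.05):
    """Build the argument list for a local SPRT run (illustrative helper)."""
    return [
        fastchess,
        "-engine", f"cmd={new_engine}", f"name={new_name}",
        "-engine", f"cmd={base_engine}", f"name={base_name}",
        "-each", "tc=8+0.08",
        "-rounds", "15000", "-repeat",
        "-concurrency", str(threads),
        "-recover", "-randomseed",
        "-openings", f"file={book}", f"format={book_format}",
        "-sprt", f"elo0={elo0}", f"elo1={elo1}",
        f"alpha={alpha}", f"beta={beta}",
    ]

if __name__ == "__main__":
    # Placeholder paths; replace them with your own binaries and book.
    gainer = build_sprt_command("./fast-chess", "./engine_LMR", "engine_LMR",
                                "./engine_BASE", "engine_BASE",
                                "book.epd", "epd", 0, 5, threads=4)
    non_regression = build_sprt_command("./fast-chess", "./engine_NOCM", "engine_NOCM",
                                        "./engine_BASE", "engine_BASE",
                                        "book.epd", "epd", -5, 0, threads=4)
    print(" ".join(gainer))
    print(" ".join(non_regression))
```

The resulting list can be passed to subprocess.run; keeping the bounds as parameters makes it harder to launch a simplification with gainer bounds by mistake.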

Common Bounds

A good choice of bounds balances the expected number of games and the lowest elo gain needed for passing. Due to a variety of factors, including elo compression, bounds are generally looser for weaker engines and tighter for stronger engines.

Gainer Bounds	Non-regression Bounds	Usage
[0.5, 2.5]	[-1.75, 0.25]	Stockfish LTC
[0, 2]	[-1.75, 0.25]	Stockfish STC
[0, 3]	[-3, 1]	Top 20 engines
[0, 5]	[-5, 0]	Top 100 engines
[0, 10]	[-10, 0]	All other engines
