Sequential Probability Ratio Test

From Chessprogramming wiki
Revision as of 05:25, 6 July 2024 by ShawnXu (talk | contribs) (ltc before stc)
Jump to: navigation, search

Sequential Probability Ratio Test, or SPRT, is a sequential hypothesis testing method. Generally regarded as the most important process in modern engine development, it is used by virtually all top engines today, including Stockfish.

Testing Methodology

Generally, SPRT is performed on every proposed functional change. Some non-functional speedup changes can also be performed using SPRT, although a speedup script may take less resources than a SPRT test.

Two bounds, x and y, must be provided to conduct an SPRT test. The test will keep going until it can guarantee, within a degree of confidence, whether or not a feature is more likely to change the strength of the by engine y elo than it is to change the strength of the engine by x elo. The choice of x and y is important as it both influences the number of tests to be run on, and the elo gain needed for a patch to pass a SPRT test.

There are many tools one can use to perform SPRT tests, the most popular ones are OpenBench, cutechess-cli, and fast-chess.

Performing an SPRT Test

The following sections describe the procedure for local SPRT testing. For distributed SPRT testing, refer to OpenBench.

Preparing an SPRT Test

In order to perform an SPRT test on the command line, there some programs and resources you would need.

Performing an SPRT Test With Gainer Bounds

Say you want to introduce LMR to your UCI engine. Build a binary of your engine with LMR added, and a binary of your base engine (without LMR). We'll call the two binarys engine_LMR and engine_BASE respectively.

In your shell, run the following command

.\fast-chess -engine cmd=[Path to engine_LMR] name=engine_LMR -engine cmd=[Path to engine_BASE] name=engine_BASE -each tc=8+0.08 -rounds 15000 -repeat -concurrency [Number of Available Threads] -recover -randomseed -openings file=[Path to Opening Book] format=[Opening book format (pgn or epd)] -sprt elo0=0 elo1=5 alpha=0.05 beta=0.05

Here, elo0 and elo1 represents the test bounds, while alpha and beta represents the expected alpha and beta errors of the test. The test verifies if the test with proposed changes are closer to 0 elo or 5 elo. Therefore, if the test passes, we can be reasonably certain that it brings more than 0 elo.

Performing an SPRT Test With Non-regression Bounds

Say you want to remove Countermove Heuristic from your UCI engine, as it takes a large amount of code and does not necessarily gain elo. To verify that the removal doesn't lose elo below an accepted amount, a non-regression test is performed. Build a binary of your engine with Countermvoe Heuristic removed, and a binary of your base engine (without removal of Countermove Heuristic). We'll call the two binarys engine_NOCM and engine_BASE respectively.

.\fast-chess -engine cmd=[Path to engine_NOCM] name=engine_NOCM -engine cmd=[Path to engine_BASE] name=engine_BASE -each tc=8+0.08 -rounds 15000 -repeat -concurrency [Number of Available Threads] -recover -randomseed -openings file=[Path to Opening Book] format=[Opening book format (pgn or epd)] -sprt elo0=-5 elo1=0 alpha=0.05 beta=0.05

Here, the bounds [-5, 0] are flipped version of the gainer bounds. This is equivalent to testing engine_BASE against engine_NOCM (notice the reverse order) with [0, 5] bounds. If the test fails, it means that Counter Moves is still worth more than an acceptable amount of elo, therefore it should not be simplified.

Common Bounds

A good choice of bounds balances the expected number of games and the lowest elo gain needed for passing. Due to a variety of factors, including elo compression, bounds are generally looser for weaker engines and tighter for stronger engines.

Gainer Bounds Non-regression Bounds Usage
[0.5, 2.5] [-1.75, 0.25] Stockfish LTC
[0, 2] [-1.75, 0.25] Stockfish STC
[0, 3] [-3, 1] Top 20 engines
[0, 5] [-5, 0] Top 100 engines
[0, 10] [-10, 0] All other engines