Zobrist Hashing

From Chessprogramming wiki
Jump to: navigation, search

Home * Search * Transposition Table * Zobrist Hashing

Zobrist Hashing,
a technique to transform a board position of arbitrary size into a number of a set length, with an equal distribution over all possible numbers, invented by Albert Zobrist [3]. In an early Usenet post in 1982, Tom Truscott mentioned Jim Gillogly's n-bit hashing technique [4], who apparently read Zobrist's paper early, and credits Zobrist in a 1997 rgcc post [5]. Zobrist Hashing is an instance of tabulation hashing [6], a method for constructing universal families of hash functions by combining table lookup with exclusive or operations. Zobrist Hashing was rediscovered by J. Lawrence Carter and Mark N. Wegman in 1977 [7] and studied in more detail by Mihai Pătrașcu and Mikkel Thorup in 2011 [8] [9].

The main purpose of Zobrist hash codes in chess programming is to get an almost unique index number for any chess position, with a very important requirement that two similar positions generate entirely different indices. These index numbers are used for faster and more space efficient Hash tables or databases, e.g. transposition tables and opening books.

Metamorphosis

Metamorphosis II 1.jpg
Metamorphosis II 2.jpg

M. C. Escher, Metamorphosis III, 1967-1968 [10]

Initialization

At program initialization, we generate an array of pseudorandom numbers [11] [12]:

This leaves us with an array with 781 (12*64 + 1 + 4 + 8) random numbers. Since pawns don't happen on first and eighth rank, one might be fine with 12*64 though. There are even proposals and implementations to use overlapping keys from unaligned access up to an array of only 12 numbers for every piece and to rotate that number by square [13] [14] .

Programs usually implement their own Pseudorandom number generator (PRNG), both for better quality random numbers than standard library functions, and also for reproducibility. This means that whatever platform the program is run on, it will use the exact same set of Zobrist keys. This is also useful for things like opening books, where the positions in the book can be stored by hash key and be used portably across machines, considering endianness.

Runtime

If we now want to get the Zobrist hash code of a certain position, we initialize the hash key by xoring all random numbers linked to the given feature, e.g. the initial position:

[Hash for White Rook on a1] xor [Hash for White Knight on b1] xor [Hash for White Bishop on c1] xor ... ( all pieces )
... xor [Hash for White king castling] xor [Hash for White queeb castling] xor ... ( all castling rights )

The fact that xor-operation is own inverse and can be undone by using the same xor-operation again, is often used by chess engines. It allows a fast incremental update of the hash key during make or unmake moves. E.g., for a White Knight that jumps from b1 to c3 capturing a Black Bishop, these operations are performed:

[Original Hash of position] xor [Hash for White Knight on b1] ... ( removing the knight from b1 )
... xor [Hash for Black Bishop on c3] ( removing the captured bishop from c3 )
... xor [Hash for White Knight on c3] ( placing the knight on the new square )
... xor [Hash for Black to move] ( change sides)

Collisions

Key collisions or type-1 errors are inherent in using Zobrist keys with far less bits than required to encode all reachable chess positions.

Theory

An important issue is the question of what size the hash keys should have. Smaller hash keys are faster and more space efficient, while larger ones reduce the risk of a hash collision. A collision occurs if two positions map the same key [15] . The dangers of which were well assessed by Robert Hyatt and Anthony Cozzie in their paper Hash Collisions Effect [16]. Usually 64bit are used as a standard size in modern chess programs.

Hash collisions demonstrate the birthday "paradox", which is to say the chance of collisions approaches certainty at around the square root of the number of possible keys, contrary to some people's expectations. You can expect to encounter a collision in a 32 bit hash when you have evaluated sqrt(2 ^ 32) == 2 ^ 16 or around 65 thousand positions. With a 64 bit hash, you can expect a collision after about 2 ^ 32 or 4 billion positions.

Praxis

Post by Jonathan Schaeffer [17] :

... I can speak from experience here. In the early versions of my chess program Phoenix, I generated my Zobrist hash numbers using my student id number as a seed, naively thinking the random numbers generated by this seed would be good enough. A few years later I put code in to detect when my 32-bit hash key matched the wrong position. To my surprise, there were lots of errors. I changed my seed to another number and the error rate dropped dramatically. With this better seed, it became very, very rare to see a hash error. All randomly generated numbers are not the same! 

Lack a True Integer Type

Some languages (such as JavaScript and Lua) only have a 64-bit floating point "Number" type. In JavaScript, this type breaks down into a 32 bit integer when bitwise operators are used. One way to get a 64 bit hash is to use two 32 bit numbers in parallel, as Garbochess-JS [18] does. Another, which p4wn used at one stage, is to use 47 or 48 bit additive hashes. 64 bit floating point numbers are true integers up to 53 bits, so it is possible to sum at least 32 (and on average close to 64) random 48 bit numbers, which was enough for p4wn's purposes. For additive Zobrist hashing, you add the number when placing a piece and subtract it when removing it, rather than using xor both ways. There is no difference in accuracy or speed, and 48 bit hashes give you collisions at around the 2 ^ 24 or 16 million point.

Linear Independence

The minimum and average Hamming Distance over all Zobrist keys was often considered as "quality"-measure of the keys. However, maximizing the minimal hamming distance leads to very poor Zobrist keys. As long the minimum hamming distance is greater zero, linear independence (that is a small subset of all keys doesn't xor to zero), is much more important than hamming distance as explained by Sven Reichard [19] :

Assume we associate a bitstring to every piece-square combination. That is what's usually done in chess programs; some codes are added for the side to move, castling rights, e.p. squares, etc. We obtain the code of a position by xor-ing the codes of all the pieces contained in it.

What we want to avoid is collisions at nodes close to the root. For nodes close to the leaves the cost of recomputing the score is smaller. Hence we want to avoid that:

x1^x2^...^xm = y1^y2^...^yn
for codes xi, yi and small number m and n, and xi not equal to yj

To translate that to a language that is more familiar - at least for people of a mathematical background - we consider the field F2 of two elements. The elements are 0 and 1, and we can add and multiply them as usual, with the additional rule that 1 + 1 = 0. This is really a field, just like the real or complex numbers, and we can do calculations as usual. Note that addition is just the exclusive or.

Now the codes or bitstrings become vectors over the field F2, and the bitwise exclusive or becomes componentwise addition, i.e., usual addition of vectors. All these vectors form the vector space F2^k, where k is the length of the vectors. Typically, k = 64.

So, what we want to avoid is an equation

x1 + x2 + ... + xm = y1 + y2 + ... + yn

or

x1 + x2 + ... + xm + y1 + y2 + ... + yn = 0

since in F2, subtraction is the same as addition. Remembering some linear algebra, this just means that we want the set x1,...,xm,y1,...,yn to be linearly independent.

This leads to the following criterion for picking a set of hashcodes: A set of vectors in F2^k is a good set of hash codes if each small subset of non-zero vectors is linearly independent. What is not clear here is the meaning of "small", but we want small to be as big as possible. In other words, we consider sets of size up to a certain size as small, and if we can make that size bigger, it is better, since this leads to unique codes deeper in the tree.

However what is clear is that this quality criterion does not depend on the base of the vector space. I.e., if we have a good set and multiply each vector by an invertible matrix (in other words, if we rotate the vectors), the obtained set will be just as good, since the rotation does not change the linear independence. The Hamming distance, on the other hand, is highly dependent on the vector space base. Take for example the vectors (1,0) and (0,1) in F2^2; they have Hamming distance 2. If we multiply both of them by

(1 1)
(0 1)

we get (1,1) and (0,1), which have Hamming distance 1. Actually we can change any distance to anything else (except for 0) by an appropriate matrix. Thus we try to approximate something that is independent from the base (the quality of our hash codes) by something that depends on it (the Hamming distance). Simple logic tells you that this approximation has to be real bad. An example where it doesn't work: It has been said that the Hamming distance shouldn't be to small or to big. So, vectors at a distance which is half the length should be ok, right? Let the length be 8 (I don't want to type too many 0's and 1's), and consider the vectors

11110000
11001100
00111100

They all have weight 4, their pairwise distance is 4, and yet they add up to 0. Just by looking at Hamming distances, you have no chance of detecting that.

Summarizing I can say that I see no connection between the quality of hash codes and their Hamming distance. Using a good RNG like the one provided in GNU's stdlib will yield good hash codes ( you can actually prove that), and so I will take the codes as they are supplied by rand() or random() without messing with them and thereby most likely make them worse.

See also

Publications

Forum Posts

1982 ...

1990 ...

Re: Hash tables - Clash!!! What happens next? by Jonathan Schaeffer, March 17, 1994

2000 ...

2005 ...

2010 ...

2015 ...

Re: On-the fly hash key generation? by Aleks Peshkov, CCC, January 13, 2016

2020 ...

External Links

References

  1. King Wen sequence, I Ching divination involves obtaining a Hexagram by random generation
  2. All of Cage's music since 1951 was composed using chance procedures, most commonly using the I Ching
  3. Albert Zobrist (1970). A New Hashing Method with Application for Game Playing. Technical Report #88, Computer Science Department, The University of Wisconsin, Madison, WI, USA. Reprinted (1990) in ICCA Journal, Vol. 13, No. 2, pdf
  4. compact representation of chess positions by Tom Truscott, net.chess, January 7, 1982
  5. Re: Hashing function for board positionsby Jim Gillogly, rgcc, May 12, 1997
  6. Re: Zobrist keys - measure of quality? by Rein Halbersma, CCC, February 24, 2015
  7. J. Lawrence Carter, Mark N. Wegman (1977). Universal classes of hash functions. STOC '77
  8. Mihai Pătrașcu, Mikkel Thorup (2011). The Power of Simple Tabulation Hashing. arXiv:1011.5200v2
  9. Tabulation hashing from Wikipedia
  10. Picture gallery "Recognition and Success 1955 - 1972" from The Official M.C. Escher Website
  11. RANDOM.ORG - Integer Generator
  12. The Marsaglia Random Number CDROM including the Diehard Battery of Tests by George Marsaglia
  13. Re: Zobrist key random numbers by Zach Wegner, CCC, January 22, 2009
  14. Overlapped Zobrist keys array by Stefano Gemma, CCC, October 06, 2009
  15. Hashkey collisions (typical numbers) by Renze Steenhuisen, CCC, April 07, 2004
  16. Robert Hyatt, Anthony Cozzie (2005). The Effect of Hash Signature Collisions in a Chess Program. ICGA Journal, Vol. 28, No. 3
  17. Re: Hash tables - Clash!!! What happens next? by Jonathan Schaeffer, March 17, 1994
  18. Garbochess-JS
  19. Re: About random numbers and hashing by Sven Reichard, CCC, December 05, 2001
  20. Mersenne twister from Wikipedia
  21. 64-bit KISS RNGs by George Marsaglia, comp.lang.fortran | Computer Group, February 28, 2009
  22. RKISS by Bob Jenkins

Up one Level