Compute the “signature” of one state under the current partitioning.
For each action, record (reward, next_partition_id).
Sorting these pairs gives us a fingerprint used to decide which states are equivalent.
This version exists specifically so that it can be called in parallel with rayon.
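
A minimal sketch of the signature computation described above, under assumed data layouts. The names `Signature`, `compute_signature`, `transitions`, and `partition_of` are hypothetical and not taken from the original code. Here `transitions[s]` is assumed to hold one (reward, next_state) pair per action, and rewards are assumed to be integers so the pairs can be sorted directly.

```rust
/// Hypothetical signature type: the sorted (reward, next_partition_id) pairs of one state.
type Signature = Vec<(i64, usize)>;

/// Computes the signature of `state` under the current partitioning.
///
/// Assumed layout (not from the original code):
/// - `transitions[s]` lists (reward, next_state) for each action available in state `s`
/// - `partition_of[s]` is the current partition id of state `s`
fn compute_signature(
    state: usize,
    transitions: &[Vec<(i64, usize)>],
    partition_of: &[usize],
) -> Signature {
    // For each action, record (reward, partition id of the successor state).
    let mut sig: Signature = transitions[state]
        .iter()
        .map(|&(reward, next)| (reward, partition_of[next]))
        .collect();
    // Sorting makes the fingerprint independent of action order.
    sig.sort_unstable();
    sig
}
```

Because this sketch only reads from shared slices and returns an owned value, it has no shared mutable state, which is the property that would let it be mapped over states from a parallel iterator.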
BFS to enumerate all reachable states from the initial game state.
Each unique state is assigned a unique index in the half-open range [0, N),
where N is the number of reachable states.
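
The enumeration can be sketched as a standard breadth-first search. This is an illustration under assumptions, not the original implementation: `enumerate_reachable` and the `successors` callback are hypothetical names, and the state type is only assumed to be clonable and hashable.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Enumerates all states reachable from `initial` in BFS order.
///
/// Returns the states in discovery order together with a map from each
/// state to its index, so that indices cover 0..N for N reachable states.
/// `successors` is a hypothetical callback yielding the states reachable in one step.
fn enumerate_reachable<S, F>(initial: S, mut successors: F) -> (Vec<S>, HashMap<S, usize>)
where
    S: Clone + Eq + Hash,
    F: FnMut(&S) -> Vec<S>,
{
    let mut states: Vec<S> = Vec::new();
    let mut index_of: HashMap<S, usize> = HashMap::new();
    let mut queue: VecDeque<S> = VecDeque::new();

    // The initial state always receives index 0.
    index_of.insert(initial.clone(), 0);
    states.push(initial.clone());
    queue.push_back(initial);

    while let Some(state) = queue.pop_front() {
        for next in successors(&state) {
            // Assign the next free index the first time a state is seen.
            if !index_of.contains_key(&next) {
                index_of.insert(next.clone(), states.len());
                states.push(next.clone());
                queue.push_back(next);
            }
        }
    }
    (states, index_of)
}
```

In this sketch the index of a state equals its position in the returned vector, so the largest assigned index is N - 1.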