Distribution of Sudoku Solution Times

Websudoku.com has a density plot of solution times. If you don't know what sudoku is, then you have a life, so not to worry. On the right is a copy of their plot of solution times for evil difficulty puzzles. The height of the density collapses toward zero as the time gets small, suggesting a very thin left tail. On the right, the height of the density goes to zero very slowly as the solution time gets large, suggesting a heavy right tail. The density reminds me of an inverse gamma density, and I realized I had probably never used an inverse gamma density to model real data before.

I measured two points on the plot: the mode, and the point to the right where the density crosses the bottom grey grid line. I got (.21, 1.46) for the mode and (.63, .18) for the point on the right. For scale, the right-hand end of the x-axis, at (90, 0), measures 2.21 from the origin, and the top of the y-axis measures 1.47. From this info, I fit gamma (red) and inverse gamma (green) densities, which are plotted on the left.
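For readers who want to try a two-point fit themselves, here is a minimal sketch in Python. It is not necessarily how the fit above was done: it assumes the 2.21 measurement corresponds to 90 minutes on the x-axis, and it matches only the location of the mode and the ratio of the two measured heights (so the y-axis scale is never needed). Under those assumptions both fits have closed forms. All function and constant names are made up for illustration.

```python
import math

# Ruler measurements quoted in the post (arbitrary plot units).
X_AXIS_END_UNITS = 2.21      # assumed to correspond to 90 minutes
X_AXIS_END_MINUTES = 90.0
MODE_UNITS = (0.21, 1.46)    # (horizontal position, height) at the mode
RIGHT_UNITS = (0.63, 0.18)   # (horizontal position, height) at the grid line


def to_minutes(x_units):
    """Convert a horizontal ruler measurement to minutes (assumed linear axis)."""
    return x_units * X_AXIS_END_MINUTES / X_AXIS_END_UNITS


def gamma_pdf(x, shape, rate):
    """Gamma density in the shape/rate parameterization, for x > 0."""
    return math.exp(shape * math.log(rate) + (shape - 1.0) * math.log(x)
                    - rate * x - math.lgamma(shape))


def inverse_gamma_pdf(x, shape, scale):
    """Inverse gamma density in the shape/scale parameterization, for x > 0."""
    return math.exp(shape * math.log(scale) - (shape + 1.0) * math.log(x)
                    - scale / x - math.lgamma(shape))


def fit_gamma(mode, x2, ratio):
    """Gamma (shape, rate) with the given mode and f(x2) / f(mode) == ratio.

    Uses mode = (shape - 1) / rate, which gives
    log(ratio) = (shape - 1) * (log(r) - r + 1) with r = x2 / mode.
    """
    r = x2 / mode
    shape = 1.0 + math.log(ratio) / (math.log(r) - r + 1.0)
    rate = (shape - 1.0) / mode
    return shape, rate


def fit_inverse_gamma(mode, x2, ratio):
    """Inverse gamma (shape, scale) with the given mode and height ratio.

    Uses mode = scale / (shape + 1), which gives
    log(ratio) = (shape + 1) * (1 - 1/r - log(r)) with r = x2 / mode.
    """
    r = x2 / mode
    shape = math.log(ratio) / (1.0 - 1.0 / r - math.log(r)) - 1.0
    scale = (shape + 1.0) * mode
    return shape, scale


if __name__ == "__main__":
    mode = to_minutes(MODE_UNITS[0])
    x2 = to_minutes(RIGHT_UNITS[0])
    ratio = RIGHT_UNITS[1] / MODE_UNITS[1]
    print("mode (min):", mode, " second point (min):", x2)
    print("gamma (shape, rate):", fit_gamma(mode, x2, ratio))
    print("inverse gamma (shape, scale):", fit_inverse_gamma(mode, x2, ratio))
```

Matching the height ratio rather than the absolute heights is a design choice: it sidesteps any question about how the plot's y-axis is scaled, at the cost of ignoring one of the measurements.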

The gamma does not look as much like the density on the right as the inverse gamma does. The gamma goes gently to zero and has positive probability at all points near zero, while the inverse gamma doesn't appear to have any probability mass below a certain point. The gamma right tail goes to zero sooner than the inverse gamma's, and the inverse gamma matches the solution times distribution a touch better on the right tail as well. I therefore declare the inverse gamma density the winner in this bake-off. Subject to the usual high percentage of mistakes that I normally make, I also checked left tail areas at 2 min 50 sec and 6 min 55 sec, which according to websudoku come in as top 1% and top 14% respectively. I got 2.2% and 20% from the gamma, and .012% and 11% from the inverse gamma. The inverse gamma matches websudoku's numbers perhaps a touch better, though hardly perfectly.
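A left tail check of this kind can be sketched with nothing more than numerical integration of the fitted density from 0 up to the time of interest. The Python below uses composite Simpson's rule and the standard library only. The parameter values in the usage section come from my own assumptions (2.21 ruler units = 90 minutes, and a fit that matches the mode location and the ratio of the two measured heights), so its output need not agree with the percentages quoted above. The helper names are hypothetical.

```python
import math


def gamma_pdf(x, shape, rate):
    """Gamma density (shape/rate); taken as 0 for x <= 0, which assumes shape > 1."""
    if x <= 0.0:
        return 0.0
    return math.exp(shape * math.log(rate) + (shape - 1.0) * math.log(x)
                    - rate * x - math.lgamma(shape))


def inverse_gamma_pdf(x, shape, scale):
    """Inverse gamma density (shape/scale); 0 for x <= 0."""
    if x <= 0.0:
        return 0.0
    return math.exp(shape * math.log(scale) - (shape + 1.0) * math.log(x)
                    - scale / x - math.lgamma(shape))


def left_tail(pdf, t, n=2000):
    """Approximate P(X <= t) by composite Simpson's rule on [0, t] (n even)."""
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    h = t / n
    total = pdf(0.0) + pdf(t)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * pdf(i * h)
    return total * h / 3.0


if __name__ == "__main__":
    # Assumed two-point fit: match the mode and the height ratio (see lead-in).
    mode = 0.21 * 90.0 / 2.21            # mode in minutes, assuming 2.21 units = 90 min
    r = 0.63 / 0.21                      # second point relative to the mode
    log_ratio = math.log(0.18 / 1.46)    # log of the height ratio
    g_shape = 1.0 + log_ratio / (math.log(r) - r + 1.0)
    g_rate = (g_shape - 1.0) / mode
    ig_shape = log_ratio / (1.0 - 1.0 / r - math.log(r)) - 1.0
    ig_scale = (ig_shape + 1.0) * mode

    for minutes, seconds in ((2, 50), (6, 55)):
        t = minutes + seconds / 60.0
        p_gamma = left_tail(lambda x: gamma_pdf(x, g_shape, g_rate), t)
        p_inv = left_tail(lambda x: inverse_gamma_pdf(x, ig_shape, ig_scale), t)
        print(f"{minutes}:{seconds:02d}  gamma {100 * p_gamma:.3f}%  "
              f"inverse gamma {100 * p_inv:.3f}%")
```

Simpson's rule is used here only because it needs no incomplete gamma function; with a statistics library available, the built-in CDFs would be the more natural choice.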

A real maven would have digitized the sudoku curve directly and copied it into the R plot. And one could consider other distributions like the log normal or the inverse Gaussian. Readers? 
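As a starting point for the lognormal suggestion, here is a hedged Python sketch that fits a lognormal to two measured points: the location of the mode, and the ratio of the density height at a second point to the height at the mode. For a lognormal the mode is exp(mu - sigma^2), and the log of the height ratio works out to -(log r)^2 / (2 sigma^2) with r the second point divided by the mode, so the fit is closed form. The conversion to minutes assumes 2.21 ruler units correspond to 90 minutes, and the function names are invented for this example.

```python
import math


def fit_lognormal(mode, x2, ratio):
    """Return (mu, sigma) of a lognormal with the given mode and
    f(x2) / f(mode) == ratio, where 0 < ratio < 1 and x2 != mode."""
    log_r = math.log(x2 / mode)
    sigma2 = log_r ** 2 / (-2.0 * math.log(ratio))
    mu = math.log(mode) + sigma2
    return mu, math.sqrt(sigma2)


def lognormal_pdf(x, mu, sigma):
    """Lognormal density for x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cdf(t, mu, sigma):
    """P(X <= t) for t > 0, via the standard normal CDF written with erfc."""
    z = (math.log(t) - mu) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


if __name__ == "__main__":
    # Measurements from the post, converted under the assumed 90-minute scale.
    mode = 0.21 * 90.0 / 2.21
    x2 = 0.63 * 90.0 / 2.21
    ratio = 0.18 / 1.46
    mu, sigma = fit_lognormal(mode, x2, ratio)
    print("lognormal (mu, sigma):", (mu, sigma))
    for minutes, seconds in ((2, 50), (6, 55)):
        t = minutes + seconds / 60.0
        print(f"{minutes}:{seconds:02d}  left tail {100 * lognormal_cdf(t, mu, sigma):.3f}%")
```

The printed left tail areas can then be set beside websudoku's top 1% and top 14% figures in the same way as for the other two densities.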

 
