Improved Chess Rating Comparisons Using Nonparametric Statistics

Many chess players wonder how different rating systems map to each other. A common objection is that the ratings can’t be mapped at all because they come from different pools of players and different time controls. You can also argue that playing over the board (OTB) is very different from playing online. These are valid points, but we can still look at how the ratings compare and give chess players a guide to where they stand in different rating pools. This post explains how we create our Rating Comparisons.

Gather Data Sources

First, we download the ratings of all players in our database: USCF, FIDE, Chess.com, and Lichess. Jesse pulls the Lichess and Chess.com ratings from their APIs, and I download the latest rating supplements for USCF and FIDE.

We will use the Chess.com blitz vs USCF ratings as an example for the remaining steps.

Subset The Data

Players In Both Rating Pools

Next, we find all players in our sources who have both a Chess.com username and a USCF ID. This is the widest possible pool of players eligible for the comparison.

Venn Diagram Of Comparison Set
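As a rough illustration of this step, here is a minimal Python sketch that keeps only players present in both pools. It assumes each player is a record with optional Chess.com and USCF identifiers; the field names (`chesscom_user`, `uscf_id`), the helper `in_both_pools`, and the sample records are all hypothetical, not our actual database schema.

```python
# Hypothetical player records; the field names are illustrative only
# and are not the schema of the database described in this post.
players = [
    {"name": "A", "chesscom_user": "alpha", "uscf_id": "12345678"},
    {"name": "B", "chesscom_user": "bravo", "uscf_id": None},
    {"name": "C", "chesscom_user": None, "uscf_id": "87654321"},
    {"name": "D", "chesscom_user": "delta", "uscf_id": "11223344"},
]


def in_both_pools(records, online_key="chesscom_user", otb_key="uscf_id"):
    """Keep only records that have an identifier in both rating systems."""
    return [r for r in records if r.get(online_key) and r.get(otb_key)]


both = in_both_pools(players)
print([r["name"] for r in both])  # ['A', 'D']
```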

Recent Non-Provisional Players

We don’t want to include players who have only played a handful of games, and we also want to exclude anyone who last played a long time ago. For online ratings, we handle both by keeping only players whose rating deviation (RD) is below 150. For more information on RD values and how ratings work, see the Chess Ratings post.
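A minimal sketch of the RD filter, assuming each online rating is stored together with its RD. Only the cutoff of 150 comes from this post; the usernames, ratings, dictionary layout, and the helper name `established` are made up for illustration.

```python
RD_CUTOFF = 150  # the threshold used in this post

# Hypothetical online ratings with their rating deviations (RD).
online = {
    "alpha": {"rating": 1550, "rd": 62},
    "delta": {"rating": 1210, "rd": 210},  # few or old games -> high RD
}


def established(ratings, cutoff=RD_CUTOFF):
    """Keep only players whose rating deviation is below the cutoff."""
    return {user: r for user, r in ratings.items() if r["rd"] < cutoff}


print(sorted(established(online)))  # ['alpha']
```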

No Outliers

The data will contain some errors and abnormalities that we need to check for. After hand-verifying some of the most egregious values, we are left with a fairly clean set of players to analyze.

Rating Comparison Outlier
Example Of An Outlier

Rank The Data

Now that we have a fairly clean set of players who are in both rating pools, we rank each rating list individually, from lowest to highest, to help remove the noise. Here’s an example of the input data and the ranked data. Notice how the 1100 and 1150 USCF values swap places so that both columns are ordered from low to high.

Chess.com Blitz    USCF
1000               900
1100               1150
1250               1100
Input Data

Chess.com Blitz    USCF
1000               900
1100               1100
1250               1150
Ranked Data
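In effect, ranking pairs the lowest Chess.com rating with the lowest USCF rating, the second lowest with the second lowest, and so on. This is a form of quantile (percentile-to-percentile) matching: no particular distribution is assumed for either rating pool. A minimal sketch that reproduces the two tables above; the function name `rank_pair` is hypothetical.

```python
def rank_pair(online, otb):
    """Sort each rating list independently and pair the values by rank.

    The i-th lowest online rating is matched with the i-th lowest OTB
    rating, regardless of which player each rating came from.
    """
    if len(online) != len(otb):
        raise ValueError("both lists must describe the same set of players")
    return list(zip(sorted(online), sorted(otb)))


chesscom_blitz = [1000, 1100, 1250]
uscf = [900, 1150, 1100]
for pair in rank_pair(chesscom_blitz, uscf):
    print(pair)
# (1000, 900)
# (1100, 1100)
# (1250, 1150)
```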

Ranking each pool separately gives us a very smooth line for mapping one rating system to the other. We also fit a second-order polynomial regression to this line, which we use later for the +/- values.

Example Of Both Rating Systems Ranked
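This post doesn’t say which tool fits the polynomial. As one possibility, here is a dependency-free least-squares fit of y = a + b*u + c*u^2 in plain Python. The helper name `fit_quadratic` and the centering and scaling of ratings (done only to keep the normal equations well conditioned) are choices made for this sketch; in practice a library routine such as NumPy’s `polyfit` does the same job.

```python
def fit_quadratic(xs, ys, scale=100.0):
    """Least-squares fit of a second-order polynomial y ~ a + b*u + c*u**2.

    Ratings are centered and scaled (u = (x - mean) / scale) so the
    normal equations stay well conditioned; returns a predict(x) function.
    """
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least 3 (x, y) pairs of equal length")
    mean = sum(xs) / n
    us = [(x - mean) / scale for x in xs]
    # Power sums for the normal equations: s[k] = sum(u**k), t[k] = sum(y * u**k).
    s = [sum(u ** k for u in us) for k in range(5)]
    t = [sum(y * u ** k for u, y in zip(us, ys)) for k in range(3)]
    # Augmented 3x4 matrix [A | t] with A[i][j] = s[i + j].
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))

    def predict(x):
        u = (x - mean) / scale
        return a + b * u + c * u * u

    return predict


# Sanity check on synthetic points that lie exactly on a quadratic.
xs = list(range(1000, 2001, 100))
ys = [0.0001 * x * x + 0.5 * x + 100 for x in xs]
predict = fit_quadratic(xs, ys)
print(round(predict(1550), 2))  # 1115.25
```

In the real pipeline, `xs` and `ys` would be the two columns of the ranked data rather than synthetic points.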

For the comparison table, we pick Chess.com blitz values every 50 to 100 points and look up the corresponding USCF rating in the ranked data.

Chess.com Look-up Values
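This post doesn’t specify how a table value that falls between two players’ ratings is handled. One reasonable assumption is linear interpolation between the neighboring ranked pairs, sketched below with the hypothetical helper `lookup` on the ranked data from the table above. With duplicate Chess.com ratings, this simple version returns the lowest paired USCF rating.

```python
from bisect import bisect_left


def lookup(ranked_pairs, online_rating):
    """Return the USCF rating paired with online_rating in the ranked data.

    Assumes ranked_pairs is sorted by online rating, as the ranked data is,
    and linearly interpolates between neighboring pairs.
    """
    xs = [x for x, _ in ranked_pairs]
    ys = [y for _, y in ranked_pairs]
    if not xs[0] <= online_rating <= xs[-1]:
        raise ValueError("rating is outside the range of the data")
    i = bisect_left(xs, online_rating)
    if xs[i] == online_rating:
        return ys[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (online_rating - x0) / (x1 - x0)


ranked = [(1000, 900), (1100, 1100), (1250, 1150)]  # ranked data from the table
table = {r: lookup(ranked, r) for r in range(1000, 1251, 50)}
print({r: round(v) for r, v in table.items()})
# {1000: 900, 1050: 1000, 1100: 1100, 1150: 1117, 1200: 1133, 1250: 1150}
```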

Standard Deviation

The final step is to figure out how certain we are of these predictions. We use the second-order polynomial regression equation from the ranked data to predict each player’s USCF rating from their Chess.com blitz rating.

Next, we take the difference between each player’s predicted and actual USCF ratings. Here’s an example of how the distribution of predicted minus actual can look. This particular histogram is for the blitz and bullet comparison, but it shows the typical shape: most values are centered on the predicted value, and the number of players in each bin decreases roughly symmetrically as we move left and right.
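A minimal sketch of this step together with the standard deviation taken from it. Here `predict` is any function mapping a Chess.com blitz rating to a predicted USCF rating (for example, the fitted polynomial), and `players` holds each player’s own unranked (Chess.com blitz, USCF) pair; both names and the helper `residual_sd` are hypothetical. `statistics.stdev` gives the sample standard deviation; `statistics.pstdev` would give the population version, and with a large comparison set the two are nearly identical.

```python
from statistics import stdev


def residual_sd(predict, players):
    """Standard deviation of predicted minus actual USCF ratings.

    players holds each player's own (Chess.com blitz, USCF) pair, i.e. the
    unranked data, since the goal is to measure individual prediction error.
    """
    residuals = [predict(online) - uscf for online, uscf in players]
    return stdev(residuals)


# Toy check: an identity "model" and three players off by -100, 0 and +100.
players = [(1500, 1600), (1500, 1500), (1500, 1400)]
print(residual_sd(lambda x: x, players))  # 100.0
```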

Taking the standard deviation of these predicted-minus-actual values gives us a sense of how spread out each comparison is. In the tables, I add a +/- value equal to one standard deviation. Here’s an example:

Sample Player
Photo by Mwtoews

If your Chess.com blitz rating is 1550 and you’re wondering what the equivalent USCF rating would be, our best guess is 1540. The standard deviation for this comparison is 260, so among players in our database who have both ratings, we’d expect about 68% of players with that Chess.com rating to have a USCF rating between 1540 - 260 = 1280 and 1540 + 260 = 1800. That’s a very wide range, which should be kept in mind when looking at these comparisons. Most of the rating comparisons have a standard deviation between 150 and 200.
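The 68% figure is the share of a normal distribution that lies within one standard deviation of its mean, so it holds only to the extent that the residuals are roughly bell-shaped, as in the histogram above. A quick check of the example’s numbers with Python’s standard library:

```python
from statistics import NormalDist

best_guess, sd = 1540, 260  # numbers from the example above
low, high = best_guess - sd, best_guess + sd
dist = NormalDist(mu=best_guess, sigma=sd)  # assumes roughly normal residuals
share = dist.cdf(high) - dist.cdf(low)
print(low, high, round(share, 3))  # 1280 1800 0.683
```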

This Post Has 6 Comments

  1. Name

    Thank you for the ratings update, it is epic!

  2. Matt Jensen

    Thanks!

  3. LaurentS

    Hello! Ideally, comparisons between online blitz and OTB ratings should take into account the age of the players in the comparison set, even in a very rough way (like under 20 vs. over 20). Young players tend to be much stronger in blitz and underrated OTB because they haven’t played much yet. I understand it’s very difficult to implement, but I wouldn’t be surprised if it considerably improved the accuracy of the model.

    1. Matt Jensen

      I really like that idea. We are planning to come out with a new survey in 2021 that will hopefully increase our sample size significantly.

  4. Cenk Yilmaz

    Hi, are you sure your updated dataset is being interpreted correctly? Since the start of the pandemic, players may have improved their online rating and skill level, but they haven’t had a chance to attend any tournaments. That means their OTB (FIDE or USCF) rating doesn’t reflect their skill level yet; they are underrated. I believe you should put such a disclaimer under the dataset to avoid any misunderstandings.

    1. Matt Jensen

      This is a very good point. I’m planning on adding the disclaimer after comparing to previous results. Thanks.
