Decomposing Pythagoras
The Pythagorean win expectancy model developed by Bill James remains one of the most celebrated results in sports analytics. Many authors have extended the model from its original use in baseball to other sports, and others have identified technical conditions on scoring under which win probability is equivalent to the Pythagorean model. However, no explanation has been offered for why different sports yield different results beyond "that's what the data say." This talk presents a theoretical analysis of the Pythagorean model. We first deduce an exact within-team equation relating win percentage to seasonal scoring records, and then reconcile this result mathematically with the Pythagorean model, which is cross-sectional across the teams in a league. We derive a complete decomposition of the Pythagorean coefficient γ in terms of the exact model, and show that γ captures two key quantities, the average points per game and the average margins of victory and defeat, that together explain why different sports yield different results. We demonstrate this decomposition using the past decade of seasonal results from baseball (MLB), basketball (NBA), football (NFL), and hockey (NHL), and show that the data reflect the properties deduced in our analysis.
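
For orientation, the Pythagorean model is usually written as win percentage = PF^γ / (PF^γ + PA^γ), where PF and PA are a team's points (or runs, or goals) scored and allowed over a season; James's original baseball formula used γ = 2. The Python sketch below is not the decomposition presented in the talk. It only illustrates the standard cross-sectional model and one common way to estimate γ for a league, using the equivalent log-ratio form log(W/L) = γ log(PF/PA). The function names, the input layout, and the choice of a least-squares fit through the origin are assumptions made for illustration.

```python
import math


def pythagorean_win_pct(points_for, points_against, gamma=2.0):
    """Pythagorean win expectancy: PF^gamma / (PF^gamma + PA^gamma)."""
    pf = points_for ** gamma
    pa = points_against ** gamma
    return pf / (pf + pa)


def fit_gamma(records):
    """Estimate gamma across the teams in one league.

    records: iterable of (wins, losses, points_for, points_against),
    one tuple per team-season (hypothetical layout). Wins, losses,
    and both scoring totals must be strictly positive.

    Uses the log-ratio form log(W/L) = gamma * log(PF/PA) and returns
    the least-squares slope of a line through the origin.
    """
    num = 0.0
    den = 0.0
    for wins, losses, pf, pa in records:
        x = math.log(pf / pa)        # log scoring ratio
        y = math.log(wins / losses)  # log win-loss ratio
        num += x * y
        den += x * x
    if den == 0.0:
        raise ValueError("every team has PF == PA; gamma is not identified")
    return num / den
```

Fitting `fit_gamma` separately to each league's team-seasons gives one estimate of γ per sport, which is the cross-sectional quantity the decomposition described above aims to explain.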