by Clifford Blau

Numerous analysts have developed formulas to predict how many runs will result from any given combination of singles, doubles, walks, outs, etc. These formulas are typically verified using seasonal team data, and most of them are very accurate by that standard. However, they are normally used to estimate how many runs result from the production of an individual batter. The main problem is that, because there is no way of determining exactly how many runs an individual is responsible for, the accuracy of the formulas for this purpose is difficult to validate. A related problem, highlighted by Phil Birnbaum in the May 1999 issue of By The Numbers, is that the range of offense represented by teams is much smaller than the range of production by individuals. Therefore, a formula may work very well for a player of average performance, but not for a very good or very bad hitter. Birnbaum developed a formula that works well over a wider range of player performance.

Some years ago, I did a small study to test the accuracy of runs formulas and, with the introduction of some new methods recently, I decided to revive and expand that study. I applied the formulas to pitchers' statistics. The advantage in this is that we know not only how many singles, walks, homers, etc. a pitcher allows, but also how many runs. Additionally, a pitcher's statistics offer a data set of approximately the same size as a batter's for a season. In theory, the typical error in the estimated runs will be proportionately greater for an individual than for a team, as luck will have fewer opportunities to even out. (While the runs formulas work well for team-seasons, they will not work well for individual games. This is because they assume an average number of uncounted events such as reaching on errors and lost baserunners. They also assume that counted events have a normal distribution, but even for team-seasons there will be some variance from the norm. For smaller samples such as an individual season or team-game, the error will be greater.) A possible drawback with this approach is that the number of runs charged to a pitcher may not reflect the runs deserved, due to the effect of partial innings and relief pitchers. Also, the normal range of performance is narrower for pitchers than for hitters. Few pitchers are good enough to allow under three runs per nine innings for a season, while those bad enough to yield more than seven don't usually pitch long.

I had originally used data from the 1984 and 1985 seasons, and expanded the study to encompass the 1986 and 1987 seasons, since I had data for those years. Fortunately, 1987 was a high-offense year, so there were several pitchers who allowed runs at a high rate. The formulas evaluated were Bill James' Runs Created (RC; the Technical Version from the 1988 Abstract), Paul Johnson's Estimated Runs Produced (ERP), Extrapolated Runs (XR; introduced in the 1999 Big Bad Baseball Annual), and Phil Birnbaum's Ugly Weights (UW). These formulas are detailed in the appendix. Since I was lacking double play data, I had to use a simplified version of XR, called Extrapolated Runs Reduced (XRR). Results are presented using the standard deviation of the difference between the predicted number of runs and the actual runs, the mean of that difference, and the square of the correlation produced by a linear regression equation (the coefficient of determination).

I selected 119 pitchers, representing a wide range of performance. These pitchers worked an average of 164.4 innings, with a range of 34.3 to 271.7. They allowed a mean of 80.3 runs. The results of this test are shown in Table 1:
       Standard Deviation   Mean Error   Coefficient of Determination
RC           9.19               5.2                 .968
ERP          7.7                0.5                 .966
XRR          7.80              -0.8                 .967
UW           8.15              -1.9                 .968

In order to test whether Ugly Weights outperformed the others for very good and very bad pitchers, I divided the sample into four nearly equal-sized groups, based on OPS. The very good group had these results, based on 29 pitchers with a mean of 193.9 innings and 57.6 runs allowed:
       Standard Deviation   Mean Error   Coefficient of Determination
RC           8.70               5.1                 .949
ERP          7.45               1.9                 .949
XRR          7.37               0.5                 .949
UW           6.78               1.6                 .957

And the thirty pitchers in the very bad group, with an average of 94.5 innings pitched and 69.6 runs allowed:
       Standard Deviation   Mean Error   Coefficient of Determination
RC           8.05               5.3                 .959
ERP          7.77              -1.2                 .963
XRR          7.80              -2.2                 .966
UW           6.41              -3.2                 .967
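
The split into four nearly equal groups by OPS can be sketched in a few lines of Python. This is only an illustration: the pitcher records below are made up, the function name is hypothetical, and the article does not say how the remainder was distributed. Giving the extra pitchers to the later groups is an assumption that happens to reproduce the 29 and 30 quoted for the best and worst groups.

```python
def split_by_key(records, key, n_groups=4):
    """Sort records by key and cut them into n_groups nearly equal slices.

    Any remainder goes to the later groups, so 119 records split 29/30/30/30.
    """
    ordered = sorted(records, key=key)
    base, extra = divmod(len(ordered), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        size = base + (1 if i >= n_groups - extra else 0)
        groups.append(ordered[start:start + size])
        start += size
    return groups


# Made-up pitchers: 119 distinct OPS-allowed values in scrambled order.
pitchers = [{"name": f"P{i}", "ops": 0.550 + 0.003 * ((i * 37) % 119)}
            for i in range(119)]

groups = split_by_key(pitchers, key=lambda p: p["ops"])
print([len(g) for g in groups])  # group sizes, lowest OPS allowed first
```

The first group holds the pitchers with the lowest OPS allowed (the "very good" group) and the last group holds the highest.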

Then I hit upon another means of testing these formulas. I went through the major league box scores for July and August of 1993, and compiled the data for high and low offense games into groups of about eighteen games each, to approximate a full-time player. After making nine of these groups for both good and bad offenses, I applied the formulas (using XR instead of XRR this time), and got the following:
1 711 .353 .430 .592 177 161 184 163 160
2 671 .341 .419 .559 145 143 158 146 143
3 663 .299 .375 .507 139 120 126 117 121
4 652 .314 .393 .472 124 118 124 118 117
5 657 .333 .422 .516 134 133 144 133 133
6 669 .344 .419 .538 136 136 150 137 137
7 673 .330 .397 .612 146 148 162 142 148
8 656 .316 .396 .543 138 132 142 128 130
9 671 .334 .389 .548 138 129 136 127 130
Standard Deviation of differences 9.3 10.3 10.3 9.4 
Coefficient of determination .76 .77 .74 .75

10 574 .202 .260 .282 43 40 41 42 40
11 586 .212 .274 .285 37 44 44 45 45
12 572 .187 .250 .260 36 34 37 34 34
13 578 .220 .282 .282 40 45 46 43 46
14 579 .197 .230 .282 35 33 36 35 31
15 577 .217 .267 .267 31 37 39 34 37
16 587 .233 .303 .334 49 63 61 60 61
17 579 .207 .271 .285 38 43 42 44 44
18 578 .218 .270 .334 43 51 50 52 52
Standard Deviation of differences 6.77 6.4 6.0 6.9
Coefficient of determination .71 .72 .79 .68

Overall standard deviations were: 8.13 8.6 8.4 8.3
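
The summary figures can be recomputed from the rows above. The Python sketch below takes the sixth column of each row as actual runs and the seventh as the first formula's estimate. That reading is an assumption, since the columns are not labeled. On that reading, the reported "standard deviation of differences" appears to be the root mean square of the differences (divided by the number of groups), not a deviation measured around the mean difference. The function names are my own.

```python
import math

# Columns 6 and 7 of the two tables above, groups 1-9 then groups 10-18.
# Assumption: column 6 is actual runs, column 7 is the first formula's estimate.
actual = [177, 145, 139, 124, 134, 136, 146, 138, 138,
          43, 37, 36, 40, 35, 31, 49, 38, 43]
estimate = [161, 143, 120, 118, 133, 136, 148, 132, 129,
            40, 44, 34, 45, 33, 37, 63, 43, 51]


def mean_error(pred, obs):
    """Average of (predicted - actual); positive means overprediction."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


def rms_difference(pred, obs):
    """Root mean square of (predicted - actual), dividing by the group count."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def r_squared(pred, obs):
    """Square of the Pearson correlation between predicted and actual runs."""
    n = len(obs)
    mean_p, mean_o = sum(pred) / n, sum(obs) / n
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    sxx = sum((p - mean_p) ** 2 for p in pred)
    syy = sum((o - mean_o) ** 2 for o in obs)
    return sxy ** 2 / (sxx * syy)


parts = {"high offense": slice(0, 9), "low offense": slice(9, 18),
         "all groups": slice(0, 18)}
for label, part in parts.items():
    p, o = estimate[part], actual[part]
    print(label, round(mean_error(p, o), 1),
          round(rms_difference(p, o), 2), round(r_squared(p, o), 2))
```

Run on these figures, the root mean square difference comes to about 9.3 for the high-offense groups, 6.77 for the low-offense groups, and 8.13 overall, with coefficients of determination of about .76 and .71, which agrees with the first value in each reported summary row. The mean error is negative for the high-offense groups and positive for the low-offense groups.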


All of the formulas tended to overpredict runs for the weak-hitting teams, and all but RC underpredicted for the high scorers. A possible problem with this portion of the study is that the groups may be biased due to being selected for the output (runs) rather than the input (hits, walks, etc.). However, I tried to select games based on the input, so hopefully this does not distort the results.


This study has presented two methods of validating runs formulas for individuals. Using both pitchers' opponents' batting statistics and groups of team game statistics, I found that Estimated Runs Produced performed best overall, although the differences among the four formulas were fairly small. All of the formulas correlate well with actual runs. However, the typical error in the estimates appears to be roughly 10%, which should be kept in mind when using them to evaluate hitters. It doesn't make sense to produce an estimate of runs created to the nearest .01 of a run when the formula is only accurate to the nearest 10 runs. Based on this study, Ugly Weights may be more useful for very good and very bad hitters. In the second part of the study, Runs Created was much more accurate in some cases, and much less in others, than the other three. Perhaps an examination of why that is so could lead to a more accurate formula. However, in order to achieve a significant increase in accuracy, data on baserunning, errors, and timely hitting would likely be necessary.
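
The "roughly 10%" figure can be checked against numbers already quoted: dividing each formula's standard deviation in Table 1 by the sample's mean of 80.3 runs allowed gives the typical error as a share of runs. A minimal Python sketch of that arithmetic, using only values from the article (the variable names are my own):

```python
# Table 1 standard deviations (in runs) and the sample's mean runs allowed.
table1_sd = {"RC": 9.19, "ERP": 7.7, "XRR": 7.80, "UW": 8.15}
mean_runs_allowed = 80.3

# Typical error expressed as a fraction of the average runs total.
relative_error = {name: sd / mean_runs_allowed for name, sd in table1_sd.items()}

for name, share in relative_error.items():
    print(f"{name}: {share:.1%}")
```

All four ratios fall between 9% and 12%.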


Runs Created: (H+BB+HBP-CS-GIDP)*(TB+.26(BB-IBB+HBP)+.52(SH+SF+SB))/(AB+BB+HBP+SH+SF)
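
As a sketch, Runs Created can be coded in Python by reading the formula as three factors: an on-base term times an advancement term, divided by an opportunities term. The function and variable names are my own, and the batting line is hypothetical, invented only to show the arithmetic.

```python
def runs_created(h, bb, hbp, cs, gidp, tb, ibb, sh, sf, sb, ab):
    """Technical Runs Created, read as (on_base * advancement) / opportunities."""
    on_base = h + bb + hbp - cs - gidp
    advancement = tb + 0.26 * (bb - ibb + hbp) + 0.52 * (sh + sf + sb)
    opportunities = ab + bb + hbp + sh + sf
    return on_base * advancement / opportunities


# Hypothetical batting line, invented for illustration only.
rc = runs_created(h=150, bb=50, hbp=5, cs=5, gidp=10, tb=235,
                  ibb=5, sh=2, sf=4, sb=10, ab=500)
print(round(rc, 1))
```

For this made-up line the three factors are 190, 256.32, and 561, giving about 86.8 runs.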

Estimated Runs Produced: (2*(TB+BB+HP)+H+SB-(.605*(AB+CS+GIDP-H)))*.16
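
A minimal Python sketch of Estimated Runs Produced, following the formula term by term. The function name and the batting line are hypothetical, chosen only to illustrate the calculation.

```python
def estimated_runs_produced(tb, bb, hp, h, sb, ab, cs, gidp):
    """ERP: credit for bases and times on base, less a charge per out, times .16."""
    credit = 2 * (tb + bb + hp) + h + sb
    out_charge = 0.605 * (ab + cs + gidp - h)
    return (credit - out_charge) * 0.16


# Hypothetical batting line, invented for illustration only.
erp = estimated_runs_produced(tb=235, bb=50, hp=5, h=150, sb=10,
                              ab=500, cs=5, gidp=10)
print(round(erp, 1))
```

Here the credit term is 740 and the out charge is 220.825, so the estimate is about 83.1 runs.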

Extrapolated Runs: .50(1B) + .72(2B) + 1.04(3B) + 1.44(HR) + .34(BB+HP-IBB) + .25(IBB) + .18(SB) - .32(CS) - .09(AB-H-K) - .098(K) - .37(GIDP) + .37(SF) + .04(SH)
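
Extrapolated Runs is a straight linear-weights sum, so a Python sketch is just the weighted total of each event. The function name and the batting line below are hypothetical, made up to show the arithmetic.

```python
def extrapolated_runs(singles, doubles, triples, hr, bb, hp, ibb,
                      sb, cs, ab, h, k, gidp, sf, sh):
    """XR: fixed run weight per event, with separate weights for strikeouts and other outs."""
    return (0.50 * singles + 0.72 * doubles + 1.04 * triples + 1.44 * hr
            + 0.34 * (bb + hp - ibb) + 0.25 * ibb
            + 0.18 * sb - 0.32 * cs
            - 0.09 * (ab - h - k) - 0.098 * k
            - 0.37 * gidp + 0.37 * sf + 0.04 * sh)


# Hypothetical batting line, invented for illustration only.
xr = extrapolated_runs(singles=100, doubles=30, triples=5, hr=15, bb=50, hp=5,
                       ibb=5, sb=10, cs=5, ab=500, h=150, k=80, gidp=10,
                       sf=4, sh=2)
print(round(xr, 1))
```

For this made-up line the total is about 82.6 runs.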

Ugly Weights: .46(1B) + .80(2B) + 1.02(3B) + 1.4(HR) + .33(BB) + .3(SB) - .5(CS) - [.687*ba - 1.188*ba^2 + .152*ip^2 - 1.288*iw*ba - .049*ba*ip + .271*ba*ip*iw + .459*iw - .552*iw^2 - .018]*(outs)

where ba=Batting Average, ip=Isolated Power (Slugging Average minus Batting Average) and iw=walks divided by at-bats
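
A Python sketch of Ugly Weights follows, with the quadratic terms in ba, ip, and iw computed as squares. The formula does not define "outs", so the example passes it in explicitly and uses at-bats minus hits, which is an assumption. The function name and the batting line are also hypothetical.

```python
def ugly_weights(singles, doubles, triples, hr, bb, sb, cs, ab, outs):
    """UW: fixed weights for positive events, and an out value that varies with ba, ip, iw."""
    h = singles + doubles + triples + hr
    tb = singles + 2 * doubles + 3 * triples + 4 * hr
    ba = h / ab            # batting average
    ip = tb / ab - ba      # isolated power: slugging average minus batting average
    iw = bb / ab           # walks divided by at-bats
    out_value = (0.687 * ba - 1.188 * ba ** 2 + 0.152 * ip ** 2
                 - 1.288 * iw * ba - 0.049 * ba * ip + 0.271 * ba * ip * iw
                 + 0.459 * iw - 0.552 * iw ** 2 - 0.018)
    positive = (0.46 * singles + 0.80 * doubles + 1.02 * triples + 1.4 * hr
                + 0.33 * bb + 0.3 * sb - 0.5 * cs)
    return positive - out_value * outs


# Hypothetical batting line; outs taken as AB - H (an assumption).
uw = ugly_weights(singles=100, doubles=30, triples=5, hr=15, bb=50,
                  sb=10, cs=5, ab=500, outs=500 - 150)
print(round(uw, 1))
```

For this made-up line (ba .300, ip .170, iw .100) the bracketed out value works out to roughly .086 runs per out, and the estimate to about 82.9 runs.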

Thanks to Cyril Morong and Phil Birnbaum for their comments on earlier versions of this article.

