Thanks to a recent Sports Illustrated article and Baseball Prospectus interview, I stumbled across the website for the Grupo Independiente para la Investigacion del Beisbol (GIIB), a group interested in applying sabermetric principles to Cuban baseball. Their website is very interesting, but it's in Spanish. In order to spread their very useful approach, I'm putting my loose translation here, in the hopes that (even if it's not 100% accurate) it will at least be an improvement over what you can get for free through something like Google Translate.
- All content is property of the GIIB and is not my own. I claim zero rights to it. If they get mad about this translation, they just have to contact me and I will absolutely take it down.
- I also claim zero responsibility for the accuracy of these translations. I am not a native Spanish speaker, but I did take Spanish in high school and am currently a level-17 Duolingo user, for whatever that's worth. Any missing Spanish knowledge (of which there is a lot) will be supplied by Google Translate.
- Because my understanding of the original is limited, the translations will probably not be exact. I hope to at least capture the spirit of the article, so others who don't know Spanish can still read the GIIB's research. Sentences or phrases I can't get a good handle on will be denoted by italics. You're welcome to leave corrections or other constructive feedback in the comments.
Is the OBP formula the most correct way to measure the probability that a batter reaches base? Is not the sacrifice bunt an opportunity to get on base? Why does the OBP formula ignore when a batter reaches base on an error?
OBP = (H+BB+HBP)/(AB+BB+HBP+SF)
H: Hits
BB: Base on Balls
HBP: Hit by Pitch
AB: At Bats
SF: Sacrifice Flies
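As a quick illustration, the traditional formula can be written as a small Python function. This is only a sketch: the function name `obp` and the season line passed to it are invented for the example and do not come from the GIIB article.

```python
def obp(h, bb, hbp, ab, sf):
    """Traditional on-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = ab + bb + hbp + sf
    if denominator == 0:
        raise ValueError("no plate appearances counted in the denominator")
    return (h + bb + hbp) / denominator


# Hypothetical season line, invented purely for illustration.
print(round(obp(h=150, bb=60, hbp=5, ab=500, sf=5), 3))  # 215 / 570 -> 0.377
```

Note that sacrifice bunts appear nowhere in the function, and a time reached on error only shows up as one more at bat in the denominator.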
The formula for OBP is too focused on the analysis of the individual hitter and not on what he really contributes to the team. One piece of evidence for this last statement is the fact that the OBP formula excludes the sacrifice bunt from the denominator. Suppose a batter comes up with runners on first and second with no outs, and grounds out to second, allowing the runners to advance. What is the difference between this at bat and a sacrifice bunt? Are the two actions not worth the same to their team? Then why does OBP make a distinction between them? Defenders of OBP argue that, because the sacrifice bunt is ordered by the bench, it should not be seen as an opportunity to get on base and therefore should be excluded from the denominator. But is this true? Are all sacrifice bunts ordered? Should OBP distinguish between sacrifice bunts put on by the manager and other sacrifices? Of course not; and even if we consider all sacrifices as ordered from the bench, is the sacrifice really not an opportunity to reach base?
Let's return to the previous situation: runners on first and second, no outs. Suppose the batter bunts the ball and reaches first safely, getting credit for a hit; is this bunt not an opportunity to get on base? In other words, if the bunt goes for a hit, it is a positive action, but if it advances the runners (which could also be accomplished by other means as shown above) and the batter is thrown out at first, it doesn't count as an opportunity to get on base? This is incongruous.
Now we analyze another important aspect of OBP: the exclusion of times the batter reached on an error.
Suppose a batter reaches on an error. The batter accomplished one of his goals (to get on base), and has become a runner and an opportunity for his team to score. Any runner represents a great opportunity for a team to manufacture runs, regardless of whether he has reached by error, hit, or walk. But the basic OBP formula doesn't see it this way. According to the basic OBP formula, reaching on an error is a negative action. When a batter reaches first on an error, according to this formula he is not credited with reaching first safely but is credited with an at bat. In other words, every time a batter reaches on an error, it is counted as a failed opportunity to reach first. This is difficult to understand.
Let's analyze this from a different perspective. For some, the error has nothing to do with the offensive player; it is true that the error is a bad defensive action, but does this mean the batter has no influence? What is the difference between a hit and an error? Subjective concepts: first, the positioning of the defensive player, and then whether the scorer considered there to be an opportunity for an out. Is a Texas leaguer to center field worth more than a shot to third that the fielder can't handle? In general, baseball rewards speed and placement and not force, but is this really important? Imagine your favorite team, losing by one in the ninth, with two outs and the tying run on third. The batter reaches on an error and the game is tied. Does it really matter how the game was tied? Would you rather lose?
It is true that errors are sometimes just bad defensive plays, where there is not a hard-hit ball or fast players to rush the defense and force them to make risky throws. But is the HBP not an error by the pitcher? The HBP is in most cases a mistake by the pitcher, a pitch that gets away from him; but it is nevertheless counted in the traditional OBP formula as a positive offensive action. We find it hard to understand how a HBP is worth more to the batter than reaching on an error.
We now return to the initial question: is the traditional OBP formula the best way to measure the probability of a batter reaching base? Briefly, OBP does not count when a batter reaches on an error and doesn't consider a sacrifice bunt as an opportunity to get on base. We therefore calculate OBP in an alternative way and call it gOBP.
gOBP = (H+BB+HBP+ROE)/(AB+BB+HBP+SF+SAC)
ROE: Reached on error [Trans. note: abbreviated EE in original]
SAC: Sacrifice hit
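To make the difference between the two formulas concrete, here is a minimal Python sketch that computes both for the same batter. The function names and the season line are hypothetical, made up for this example. It assumes, as described above, that times reached on error are already counted in AB, so ROE is added only to the numerator.

```python
def obp(h, bb, hbp, ab, sf):
    """Traditional OBP: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def gobp(h, bb, hbp, roe, ab, sf, sac):
    """gOBP: (H + BB + HBP + ROE) / (AB + BB + HBP + SF + SAC).

    Assumes ROE plate appearances are already included in AB, so they
    are added to the numerator only; SAC is added to the denominator.
    """
    return (h + bb + hbp + roe) / (ab + bb + hbp + sf + sac)


# Hypothetical season line, invented purely for illustration.
line = dict(h=150, bb=60, hbp=5, ab=500, sf=5)
print(round(obp(**line), 3))                   # 215 / 570 -> 0.377
print(round(gobp(roe=8, sac=10, **line), 3))   # 223 / 580 -> 0.384
```

For this made-up batter the eight times reached on error outweigh the ten extra sacrifice bunts in the denominator, so gOBP comes out higher than OBP. A batter with many sacrifices and few times reached on error would see the opposite.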
To capture the reality of the concepts we must move away from the moralistic way of thinking that still exists in baseball analysis. If we want to know the real probability that a batter reaches base, then we have to count all opportunities and all successful actions. Each turn at bat is an opportunity to reach base; and of course, every time the batter reaches first safely, it is a positive result for him. We exclude only interference or obstruction from this analysis, because the probability of these actions occurring in a baseball game is around 0.0027 [~0.27 percent]. That is to say, they are so rare as to be negligible.
For now, enough philosophical discussions about baseball. We will concentrate on mathematical tools that allow us to demonstrate which of these two statistics is more useful when analyzing the offensive prowess of a team.
For this we compiled statistics from the last 15 Cuban National Series. We calculated both variants of OBP and found the linear correlation of both with runs scored per game.
This is the scatterplot of the traditional OBP vs. runs per game. These statistics show a linear correlation of 0.93. We are not questioning the proven utility of the classic formula; rather, we are analyzing which formula is the most suitable.
We next show the scatterplot of gOBP with respect to runs per game.
These statistics show a linear correlation of 0.95.
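For readers who want to try this kind of comparison themselves, the sketch below computes a Pearson linear correlation in plain Python. It assumes that "linear correlation" in the article means the Pearson coefficient. The team figures are invented placeholders, not the GIIB's National Series data, so the printed values will not match the 0.93 and 0.95 reported above.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical team-seasons (placeholder numbers, NOT the GIIB data set).
team_obp = [0.340, 0.355, 0.362, 0.371, 0.380]
team_gobp = [0.348, 0.360, 0.371, 0.377, 0.391]
runs_per_game = [4.1, 4.6, 4.9, 5.0, 5.6]

print(f"OBP  vs. runs per game: {pearson(team_obp, runs_per_game):.3f}")
print(f"gOBP vs. runs per game: {pearson(team_gobp, runs_per_game):.3f}")
```

With real data, each list would hold one value per team-season, and the statistic with the coefficient closer to 1 tracks run scoring more closely.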
Although the difference is not very large, it cannot be denied that gOBP is slightly closer to reality than OBP, at least numerically. We should remember that correlation does not imply causation between variables. Still, we consider gOBP the optimal indicator to measure the probability that a batter will reach base. This is why STRIKE, as well as other projects that derive from it including StatsPlay, includes gOBP in its reports and recommends it over OBP to measure this concept.