BABIP (Batting Average on Balls in Play) is a simple statistic, but one that sabermetricians have analyzed extensively.
DIPS theory, an important sabermetric concept, indicates that pitchers have limited control over the outcomes of batted balls in play. A corollary to DIPS is that batters probably have more influence over their BABIP than pitchers do over the BABIP they allow. As a result, analysts and fans frequently compare a pitcher's BABIP to the league average BABIP and assume that the difference between the pitcher's results and the average is due to some combination of luck and defense. This can provide a starting point for predicting the direction of a pitcher's regression from season to season.
However, this isn't the end of the story. We know that pitchers have differing tendencies for fly ball and groundball rates, which affects BABIP (since groundballs generally produce more hits in play than fly balls). It's widely accepted that specialty pitches (e.g., knuckleballs or Mariano Rivera's cutter) can confound BABIP expectations. And it doesn't take much to expand that conjecture to other pitchers who seem to induce weaker than average contact. A better way of looking at BABIP is to view it as a 30 or 40 point range around the league average. Pitchers with really high or low BABIPs beyond that range probably have experienced unsustainable results. But it is also possible that pitchers' differing skills and pitch types will slant their BABIP result toward the high or low end of the normal range. Because BABIP is subject to quite a bit of random variation, it is difficult to distinguish pitcher-specific BABIP from luck or defense.
A recent article in the "Community Research" section of Fangraphs provides an interesting effort at illuminating pitcher BABIP. Steve S presents his results in "Projecting BABIP Using Batted Ball Data," and develops a formula for pitcher-specific "expected BABIP," or x-BABIP. (As you may know, x-BABIP formulas have been applied previously to hitters.) Later in this story, I will use that formula to calculate x-BABIP for Astros pitchers. The most interesting aspect of his article is that it shows a myriad of correlation coefficients between and among BABIP, batted ball types, PITCHf/x data, and pitch types. I wouldn't take these results as conclusive, but I think they show the direction of certain relationships and provide some leads for future hypotheses that might help explain how or why pitchers achieved a particular performance level.
Not surprisingly, line drive rates and infield fly rates are found to be important determinants of BABIP. However, line drive rates are less predictable, even if multiple years of data are used to predict a future line drive rate. The pitcher has some effect on the line drive rates; for example, groundball and strike out rates seem to be correlated with preventing line drives. But, in general, line drive rates probably reflect more year to year random fluctuation. Infield fly ball rates, on the other hand, appear to be more predictable and consistent from year to year. As the Hardball Times glossary states: For some pitchers, inducing infield flies may be a repeatable skill. And infield fly balls are an important outcome, since infield pops are the surest form of out other than a strike out.
Another thought-provoking observation from the article: different, and perhaps divergent, pitching skills are related to preventing line drives and inducing infield fly balls. Four-seam fastballs in the zone, combined with slower pitches to change speeds, appear to be the best recipe for inducing infield fly balls. Fastball movement has a closer relationship to infield fly ball rates than velocity does. Avoiding line drives, on the other hand, appears to depend on sinkers and two-seam fastballs, as well as higher velocity. To some extent, these two elements of run prevention profile as two different types of pitchers.
The Fangraphs article develops the following formula for a pitcher's expected BABIP: xBABIP = 0.4*LD% - 0.6*FB%*IFFB% + 0.235
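To see how the formula behaves, here is a minimal Python sketch of the calculation. It assumes the three rates are entered as fractions (20% as 0.20) and that IFFB% is measured per fly ball, as Fangraphs reports it. The function name and the sample inputs are hypothetical, chosen only for illustration; they are not any actual pitcher's numbers.

```python
def x_babip(ld_rate: float, fb_rate: float, iffb_rate: float) -> float:
    """Expected BABIP from the formula quoted above.

    All rates are fractions (0.20 means 20%). This is an assumption about
    units; the formula itself is taken as given from the article.
    """
    # More line drives raise expected BABIP.
    line_drive_term = 0.4 * ld_rate
    # FB% * IFFB% is the share of batted balls that are infield flies
    # (assuming IFFB% is per fly ball), which lowers expected BABIP.
    infield_fly_term = 0.6 * fb_rate * iffb_rate
    return line_drive_term - infield_fly_term + 0.235


# Hypothetical pitcher: 20% line drives, 35% fly balls, 10% of flies on the infield.
sample = x_babip(ld_rate=0.20, fb_rate=0.35, iffb_rate=0.10)
print(f"x-BABIP: {sample:.3f}")  # x-BABIP: 0.294
```

In this made-up case the line drive term adds 80 points, the infield fly term subtracts 21, and the constant supplies the remaining 235. Line drive rate has by far the larger effect on the result.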
I calculated the x-BABIP for some of the Astros pitchers likely to return next year in key roles. Comparing the pitchers' actual BABIP to x-BABIP provides some information regarding the likelihood of a positive or negative regression in the pitcher's performance next year. If the pitcher's actual BABIP exceeds x-BABIP, this may support an argument that the pitcher is likely to revert to a lower BABIP, and vice versa. However, if the worse than expected BABIP performance (actual BABIP above x-BABIP) is primarily due to poor defense (not an unreasonable possibility, since the Astros did not rank well on advanced defensive stats), then a reversion to x-BABIP may depend on an improved defense in 2013. The variance column is actual BABIP minus x-BABIP (i.e., a positive variance means that the pitcher is expected to revert to a lower BABIP).
The column titled "FDP Wins" is the potential impact of batted ball results and the timing of pitching outcomes during 2012, expressed on a WAR-type basis. I described Fielding Dependent Pitching Wins in a previous article. A negative FDP Wins means that the pitcher was hurt by the number and timing of balls in play, and is consistent with a positive variance between BABIP and x-BABIP.
Results and Predicted Direction
Three Astros starters (Norris, Lyles, and Harrell) posted some of the largest home vs. road ERA splits in the majors. For Norris and Lyles, the H/R split in BABIP was a major reason for the ERA differential. The BABIP splits: Norris (H) .263; (R) .326. Lyles (H) .284; (R) .319.
Norris is an extreme case for H/R splits. He had the worst road ERA (6.94) among qualified starters. Norris fell short of the innings threshold to rank as a qualified starter at home. But if he had qualified, Norris would have had the second best home ERA among MLB starters. That is amazing. Norris was almost the best major league starting pitcher at home and the worst starting pitcher on the road.
Norris has similarly extreme differentials between x-BABIP and actual BABIP for home and road, except in opposite directions. Norris' x-BABIP / actual BABIP: (H) .305 / .263; (R) .294 / .326. This would suggest that regression for both home and road BABIP will reduce the size of Norris' home and road ERA splits.
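The arithmetic behind that statement can be sketched in a few lines of Python, using the definition of variance given earlier (actual BABIP minus x-BABIP) and the Norris home and road figures quoted above. The function and variable names are hypothetical, and the "direction" label only restates the sign of the variance; it is not a projection.

```python
def babip_variance(actual: float, expected: float) -> float:
    """Actual BABIP minus x-BABIP, rounded to three decimals (BABIP 'points')."""
    return round(actual - expected, 3)


def expected_direction(variance: float) -> str:
    """A positive variance suggests reversion to a lower BABIP, and vice versa."""
    if variance > 0:
        return "lower"
    if variance < 0:
        return "higher"
    return "unchanged"


# Norris' 2012 splits as quoted in the text: (actual BABIP, x-BABIP).
norris_splits = {"home": (0.263, 0.305), "road": (0.326, 0.294)}

for split, (actual, expected) in norris_splits.items():
    variance = babip_variance(actual, expected)
    print(f"{split}: variance {variance:+.3f}, BABIP expected to move {expected_direction(variance)}")
```

The home variance comes out to -.042 and the road variance to +.032. The formula therefore points toward a higher BABIP at home and a lower one on the road, which would shrink the gap between the two splits.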
But we really don't know why one team has three starting pitchers with such large ERA home/road splits (2 to 5 runs variance). Does Minute Maid Park provide an unspecified ballpark advantage? Does the Astros defense play much better at home? We could probably come up with more lines of speculation. The bottom line is, we don't know.