Friday, May 30, 2008

Run Estimator Using Slash Stats

There are a lot of Run Estimators out there these days. Some of these are linear, such as Linear Weights and Extrapolated Runs. These estimators work by assigning a run value to each event (single, double, out, etc.) based on the average change in Run Expectancy when the event occurs. Each event count is multiplied by its coefficient, and the resulting sum is the number of runs created.
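As a minimal sketch of how a linear estimator adds things up, the Python snippet below uses only the run values quoted later in this post (single, home run, walk, and the initial out estimate). Doubles and triples are left out because their values aren't given here, and the names `RUN_VALUES` and `linear_runs`, along with the sample stat line, are made up for illustration.

```python
# Run values quoted in this post (2007 AL linear weights). Doubles and
# triples are omitted because the post does not list their values.
RUN_VALUES = {"single": 0.48, "home_run": 1.40, "walk": 0.32, "out": -0.29}


def linear_runs(event_counts, run_values=RUN_VALUES):
    """Sum (count * run value) over every event type. Hypothetical helper."""
    return sum(run_values[event] * count for event, count in event_counts.items())


# Made-up stat line, for illustration only.
sample = {"single": 100, "home_run": 20, "walk": 50, "out": 300}
print(round(linear_runs(sample), 1))
```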

For convenience, I wanted to make a similar formula that took just the slash stats (Batting Average/On Base Percentage/Slugging Percentage) and calculated runs produced per at bat. I broke the formula down into three parts: outs, hits, and walks. In order to figure out how much each of these counted, I used the linear weight values for the 2007 AL.

Figuring out how to calculate the value of hits was the most difficult part. Since not all hits have equal value, slugging percentage needs to be used. But each base doesn't have the same value. For example, a single is worth .48 runs for one base, while a home run is worth 1.4 runs for four bases, or .35 runs per base. To fix this, I ran a quadratic regression of runs per base against bases per hit for each type of hit. This way, the number of runs per base can be calculated from bases per hit (SLG/BA). Here's the equation:

Runs per base on hits = .022(SLG/BA)^2 - .152(SLG/BA) + .607

With this value found, the runs created by hits can be found by multiplying runs per base by SLG.
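Here's a short Python sketch of the hits component, assuming the regression coefficients above are used exactly as written. The function names are mine, not part of any library.

```python
def runs_per_base(slg, ba):
    """Quadratic fit from the post: runs per base as a function of bases per hit."""
    bases_per_hit = slg / ba  # undefined for a hitless player (ba == 0)
    return 0.022 * bases_per_hit**2 - 0.152 * bases_per_hit + 0.607


def hit_runs_per_ab(slg, ba):
    """Runs from hits per at bat: runs per base times bases per at bat (SLG)."""
    return runs_per_base(slg, ba) * slg


# Sanity check against the values quoted above: an all-singles hitter has
# one base per hit, an all-home-run hitter has four.
print(round(runs_per_base(0.300, 0.300), 3))  # all singles
print(round(runs_per_base(1.200, 0.300), 3))  # all home runs
```

The two printed values should land close to the .48 and .35 runs per base quoted above for singles and home runs.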

Finding the runs created by walks requires two things: walk rate and runs per walk. Since we know the value of a walk is .32 runs, we just need to find walk rate. Walk rate can be found easily using OBP and BA.

Walk Rate = (OBP-BA)/(1-BA)

So this rate multiplied by .32 calculates runs created by walks.
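The walk component is simple enough to sketch in a few lines of Python. The names `WALK_VALUE`, `walk_rate`, and `walk_runs` are illustrative, and the example uses the average slash line quoted in this post.

```python
WALK_VALUE = 0.32  # runs per walk, as quoted in the post


def walk_rate(obp, ba):
    """Walk rate recovered from OBP and BA, per the formula above."""
    return (obp - ba) / (1 - ba)


def walk_runs(obp, ba):
    """Runs created by walks: walk rate times runs per walk."""
    return WALK_VALUE * walk_rate(obp, ba)


# Average slash line quoted in this post: .271 BA, .338 OBP.
print(round(walk_rate(0.338, 0.271), 3))
print(round(walk_runs(0.338, 0.271), 4))
```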

The last step is to find the runs created by outs, which is a negative number. This negative number is what ultimately makes the average run production 0: above-average players create positive runs, while below-average players create negative runs because their outs outweigh their hits and walks. I originally calculated the runs per out to be -.29, but when everything was added up, an average player (.271/.338/.423) came out 10 runs above average per 648 AB. So I changed the value to -.3149, which allowed it to zero out.

Outs per at bat is simply 1-OBP.
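One way to arrive at an out value that zeroes out the average player is to solve for it directly: set hit runs plus walk runs plus (out value times outs) equal to zero. The post doesn't say exactly how the -.3149 figure was reached, so treat this Python sketch as one possible approach; the function name is made up.

```python
def balancing_out_value(ba, obp, slg, walk_value=0.32):
    """Out value that makes a player with this slash line come out to zero runs."""
    bases_per_hit = slg / ba
    hit_runs = (0.022 * bases_per_hit**2 - 0.152 * bases_per_hit + 0.607) * slg
    walk_runs = walk_value * (obp - ba) / (1 - ba)
    outs = 1 - obp
    # Solve hit_runs + walk_runs + out_value * outs = 0 for out_value.
    return -(hit_runs + walk_runs) / outs


# Average slash line from the post: .271/.338/.423
print(round(balancing_out_value(0.271, 0.338, 0.423), 4))
```

For the average line above, this comes out very close to the -.3149 used in the final formula.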

Adding all the components up gives this equation:

RUNS PRODUCED PER AT BAT

-0.3149*(1-OBP)
+ (0.022*(SLG/BA)^2 - 0.152*(SLG/BA) + 0.607)*SLG
+ 0.32*(OBP-BA)/(1-BA)
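For anyone who'd rather not type this into a spreadsheet, here's the whole formula as a small Python function. The name `runs_per_ab` and the `out_value` parameter are my own additions; the parameter is only there so the -.29 figure mentioned earlier can be compared against the final -.3149.

```python
def runs_per_ab(ba, obp, slg, out_value=-0.3149):
    """Runs produced per at bat from the three slash stats."""
    bases_per_hit = slg / ba  # undefined for a hitless player (ba == 0)
    hits = (0.022 * bases_per_hit**2 - 0.152 * bases_per_hit + 0.607) * slg
    walks = 0.32 * (obp - ba) / (1 - ba)
    outs = out_value * (1 - obp)
    return outs + hits + walks


# Average slash line from the post, scaled to 648 AB.
average = (0.271, 0.338, 0.423)
print(round(648 * runs_per_ab(*average), 1))                   # final out value
print(round(648 * runs_per_ab(*average, out_value=-0.29), 1))  # original out value
```

With the final out value the average player should come out at roughly zero, and with -.29 at roughly 10 runs above average, matching the adjustment described earlier.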


While the overall equation is complicated, the ease is in needing only three variables.

I compared this equation to the Linear Weights of AL starters with at least 350 PA. The root mean square deviation between the Linear Weights and my equation for these players was 2.9 runs. The biggest differences came with players who had OBPs that were relatively high or low. Since getting on base is such an important part of creating runs, I don't mind this flaw too much.
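For reference, root mean square deviation can be computed as in the Python sketch below. The two lists are placeholder numbers, not real player data, and the function name is my own.

```python
import math


def rmsd(estimates, references):
    """Root mean square deviation between two equal-length lists of run totals."""
    if len(estimates) != len(references) or not estimates:
        raise ValueError("need two non-empty lists of the same length")
    squared_errors = [(e - r) ** 2 for e, r in zip(estimates, references)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Placeholder values only: runs from the slash-stat formula vs. Linear Weights.
slash_runs = [12.0, -3.5, 20.1]
linear_weights_runs = [10.0, -1.5, 23.1]
print(round(rmsd(slash_runs, linear_weights_runs), 2))
```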