Tuesday, September 16, 2008

Can Tony Pena Jr Do It?

With just a couple weeks left in the regular season we can focus on all the races for titles: BA, HR, RBI, ERA, W, or whatever stat you prefer.

But under the radar is a man who is close to accomplishing something that is about as rare as it gets:

Tony Pena Jr has an OPS+ of 1 this year in 223 PA. That's right, 1. Brought to you by a .181 OBP and a .206 SLG.

If Pena can do just a little worse than that .181/.206, he can put his OPS+ under 0, which is quite rare for this many PA. How rare?

This rare:

Player            PA   OPS+   Year
Bill Bergen       250    -4   1911
Frank O'Rourke    216   -11   1912
Pat Rockett       157    -0   1978
Bill Dinneen      155    -5   1902
Togie Pittinger   153    -7   1902
Andy Anderson     152    -2   1949


Since 1901, just 6 players have had more than 150 PA with a negative OPS+. Rockett, I believe, is somewhere between 0 and -.5.

In KC's last 12 games, Pena has gone 1-12 with 0 BB and a SF for a .160 OPS. If he can replicate that 1-12 over the last 12 games, he can put his OPS+ at -1.5. The minimum he can do to get negative is go 0-4, which would get him to -.333. In order to tie the PA record, he needs 27 PA. If he can go 4-27 with 4 singles and no walks in that span, he can get to -1 and tie at 250 PA. Chances are that KC won't give him that opportunity, but it'll be interesting to (not) watch.
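For anyone who wants to check the arithmetic, here is a rough Python sketch of how OPS falls out of a batting line and why OPS+ can dip below zero. The OPS+ function uses the common unadjusted form, 100*(OBP/lgOBP + SLG/lgSLG - 1), and the league averages in it are placeholder assumptions, not the actual 2008 figures; the published number also includes a park adjustment, so this won't reproduce it exactly. The function names are my own.

```python
def ops(ab, h, total_bases, bb=0, hbp=0, sf=0):
    """Return (OBP, SLG, OPS) for a batting line."""
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = total_bases / ab
    return obp, slg, obp + slg


def ops_plus(obp, slg, lg_obp, lg_slg):
    """Unadjusted OPS+ (no park factor): 100 * (OBP/lgOBP + SLG/lgSLG - 1)."""
    return 100 * (obp / lg_obp + slg / lg_slg - 1)


# Pena's last 12 games: 1-for-12, 0 BB, 1 SF.
# The hit is treated as a single, which is what a .160 OPS implies.
obp, slg, total = ops(ab=12, h=1, total_bases=1, sf=1)
print(f"OBP {obp:.3f}  SLG {slg:.3f}  OPS {total:.3f}")

# Season line, using ASSUMED league averages (illustrative only).
LG_OBP, LG_SLG = 0.336, 0.420
print(round(ops_plus(0.181, 0.206, LG_OBP, LG_SLG), 1))
```

The takeaway is that OPS+ goes negative whenever OBP/lgOBP + SLG/lgSLG drops below 1, which takes a batter at roughly half the league rate in both categories.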

Tuesday, August 5, 2008

Profile: Felix Hernandez

A 22-year-old starter with a career xERA of 3.44 over 601.2 IP is not exactly a common occurrence. Felix Hernandez has produced those numbers since his debut at age 19 in 2005 by striking out batters (8.45 K/9) and inducing groundballs (58.5 GB%). Hernandez ranks fourth in PWAR over the last three years, and still has plenty of time to improve.

How does Hernandez do what he does? Let's take a look at some pitchfx data from his 2008 season.

Stuff

Velocity

Pitch       Hernandez (mph)   Average (mph)
Fastball    96.89             91
Curveball   83.15 (13)        77 (14)
Slider      89.53 (7)         84 (7)
Changeup    86.86 (10)        82 (9)

The numbers in parentheses are the difference between that pitch and the fastball. Hernandez clearly has elite velocity, and keeps that velocity up with all 4 of his pitches.

Movement



            Horiz Move (in)   Vert Move (in)    Move Angle
Pitch       Felix   Lg Avg    Felix   Lg Avg    Felix    Lg Avg
Fastball    -6.07   -6.2       6.68    8.9      -112.7   -60.8
Curveball    5.7     5.2      -5.47   -3.3       -81.8   -42.2
Slider       1.8     0.7      -0.38    3.7       -12.3   -88.7
Changeup    -6.05   -7.4       3.41    6         -36.2   -60.3


These numbers are from the point of view of the hitter, so negative horizontal break is toward a right-handed hitter. The vertical movement is compared to a theoretical pitch without spin, so positive numbers aren't pitches that literally rise, just pitches that don't drop as much.

First of all, Hernandez gets significantly above average movement both horizontally and vertically, but especially vertically. Notice the 4+ inches of vertical drop between Hernandez's slider and the average slider. That's a huge difference, making his slider almost as close to a curveball as it is to a slider in terms of movement. Of course, if you go back to the velocity table, his slider is closer to being a fastball in terms of velocity. This is quite a pitch. Compared to the average fastball, Felix's fastball breaks in to righties significantly more than his other pitches. The movement difference between his fastball and changeup is nearly entirely vertical, which is quite different from the league average. In fact, the average fastball and changeup have nearly the same angle of break. Felix's curveball is about what would be expected looking at his other pitches.

Results

FB: Fastball; SL: Slider; CB: Curveball; CU: Changeup

        Ball%            Called K%        Foul%            Swinging K%      InPlay%
Pitch   Lg Avg  Felix    Lg Avg  Felix    Lg Avg  Felix    Lg Avg  Felix    Lg Avg  Felix
FB      0.36    0.34     0.19    0.19     0.19    0.20     0.06    0.05     0.19    0.23
SL      0.36    0.37     0.14    0.19     0.17    0.12     0.13    0.21     0.20    0.11
CB      0.40    0.36     0.19    0.22     0.13    0.16     0.11    0.13     0.16    0.13
CU      0.40    0.36     0.11    0.18     0.14    0.15     0.13    0.12     0.21    0.19
All     0.37    0.35     0.17    0.19     0.17    0.17     0.09    0.10     0.19    0.19

The only pitch that is called a ball more than average, his slider, is either a called strike or swinging strike an extra 13% of the time. Hitters really can't put this pitch into play. Beyond that, Hernandez is simply a bit better, but better across the board for significant value.

Now let's look at what happens when the ball is hit into play (AVG and SLG include HR):

        AVG              BABIP            SLG              HR%
Pitch   Lg Avg  Felix    Lg Avg  Felix    Lg Avg  Felix    Lg Avg  Felix
FB      0.330   0.381    0.304   0.372    0.521   0.504    0.037   0.015
CB      0.310   0.486    0.290   0.441    0.471   0.811    0.029   0.081
SL      0.310   0.314    0.286   0.255    0.481   0.667    0.033   0.078
CU      0.319   0.162    0.295   0.139    0.502   0.270    0.035   0.027
All     0.323   0.362    0.298   0.342    0.506   0.532    0.035   0.030

Besides his dominating changeup, Hernandez's pitches don't fare too well when put into play. This may be just a product of the pitches that were recorded by pitchfx; his actual BABIP is .311, compared to the .342 listed in "All" on this chart. He has, however, avoided giving up homers with his fastball, which must be due to the above average sink he puts on it.

PitchType nipRuns nipR100 bipRuns bipR100 TOTAL runs100
FB -12.35 -1.35 17.46 6.47 3.76 0.32
CB -5.14 -2.04 7.53 20.35 0.35 0.12
SL -9.10 -2.23 4.36 8.55 -6.97 -1.52
CU -2.60 -1.62 -3.66 -9.89 -7.88 -3.98
All -29.18 -1.68 25.69 6.50 -10.73 -0.50

This table requires some explanation. First of all, "nip" is not in play, and "bip" is ball in play (including HR). Second of all, all the "Run" values are based on linear weights. The linear weights for balls and strikes come from here. I used a constant based on the average number of times reaching each count. R100 is runs per 100 pitches, which also comes from here.

Now onto the actual results. These numbers give a total picture of Hernandez's results using each pitch. Hernandez dominates when he keeps the ball out of play. All of his pitches save more than 1 run per 100 pitches when not put in play, quite an impressive accomplishment. The bipR100 values vary greatly, and some of that is probably due to the luck involved on balls in play. His slider is probably his best pitch, but the value of his changeup is clearly not all luck. That extra sink on his changeup seems to be helping Felix avoid hits, especially of the extra base variety.

Summary

Not surprisingly, Hernandez has 4 above average pitches in terms of movement, velocity, and results. His slider is unique, and has results to match its speed and movement. His impressive fastball is a great pitch, but it's his offspeed stuff that really sets him apart. Amazingly just 22 years old, Hernandez is probably the best pitcher in the game if you consider age, experience, talent, and results.

All pitchfx stats come from Josh Kalk's
http://www.baseball.bornbybits.com

Sunday, August 3, 2008

The Best Pitchers in the Major Leagues

After compiling a bunch of xERA numbers, I decided to take a look at the top pitchers in baseball right now. I ranked starters by Peripheral Wins Above Replacement (PWAR), which came from xRA and xIP numbers. I took each player's numbers over the last 3 calendar years (thanks fangraphs), and simply found wins above replacement per year over that three year span. This isn't a list of who has the most value over any amount of time or who has the most trade value, it's just who the best pitchers have been over the last 1095 days.

1. Johan Santana, 4.14 PWAR
8.96 K/9, 2.04 BB/9, 39.7 GB%
661 xIP, 3.39 xERA

There's a reason that this guy drew so much attention when he was on the trade market last winter. The GB% is below average, but those walks and strikeouts are fantastic. Santana is 29 and while he probably won't be posting ERAs below 3 anymore, he has a few more years to be included in the conversation on who the best pitcher in the majors is. While his K and BB numbers this year aren't quite what they used to be, his GB% is at a career high 42.7%.

2. Brandon Webb, 4.06 PWAR
7.26 K/9, 2.26 BB/9, 64.3 GB%
677 xIP, 3.46 xERA

When you're the best groundball pitcher in the majors and have 3.21 K/BB, you deserve to be mentioned among the elite. Webb will earn $5.5M this year, $6.5M next year, and has a club option for $8.5M in '10, making him one of the best values for a pitcher in his prime. Webb's PWAR suggests value of over $16M, and if he continues to induce that many GB with those K and BB numbers, he may see that kind of money if he hits the market in '11 at the age of 32.

3. CC Sabathia, 3.91 PWAR
8.22 K/9, 1.96 BB/9, 46.1 GB%
665 xIP, 3.49 xERA

Last year's Cy Young winner will make quite a lot of money this winter, and whatever team is able to sign him will certainly get some value out of that contract. As great as CC was last year, he's been just as good this year, with the highest K/9 of his career and the second best GB%. Sabathia has been fantastic since '06, with both an ERA and FIP under 3.30 in each of those years. Analysts often mention how Sabathia has "learned to pitch" in the past few years, a thought which is backed up by the numbers. Sabathia lowered his BB/9 each year from '01-'07, from a replacement-level 4.74 in his rookie year to an elite 1.38. Sabathia has also gone from throwing 66.3% fastballs in '05 to 55.2% fastballs this year, while raising his slider % from 15.3 to 25.2% in that time. This newfound ability to mix things up is the root of Sabathia's improved K and GB rates.

4. Felix Hernandez, 3.66 PWAR
8.15 K/9, 2.81 BB/9, 58.5 GB%
600.5 xIP, 3.44 xERA

The fact that a 22-year-old has posted these numbers over the last 3 years is pretty nuts. I'd like to do a complete profile of this pitcher who posted a 2.85 FIP over 85.1 IP at the age of 19, but for now I'll stick to the basics. Hernandez could be the best pitcher in the majors very soon, and he's still a few years away from his prime. Felix has been homer-lucky this year (7.8 HR/FB), and the grounders aren't quite what they used to be (51.1% in '08), but the strikeouts are encouraging (8.40 K/9). Hernandez is probably the hardest throwing starter in the majors, as his 95.3 mph average fastball over the last three years has been the best among MLB starters. Felix's career is just beginning, and it should be an entertaining one to watch.

5. Jake Peavy, 3.58 PWAR
9.39 K/9, 2.71 BB/9, 41.4 GB%
595 xIP, 3.45 xERA

Last year Peavy rode a 2.54 ERA to a 19-6 record and the NL Cy Young title. His FIP wasn't quite as low, although it also led the majors at an impressive 2.84. However, FIP is not immune to ballpark issues, which were definitely prevalent in Peavy's 5.8% HR/FB rate. This year, Peavy has kept his ERA below his peripherals in a different way, by stranding 84.5% of runners. While his BB/9 has improved, he isn't striking out quite as many batters as in the past couple years and his GB%, which was a career high 44.0% last year, is back down to 41.0%. Looking at his pitch selection from the past three years, the difference that stands out most is his use of offspeed pitches: he is throwing 27.6% offspeed this year compared to 33.1% offspeed in '07. In '06, when he had a 38 GB% and a 3.51 FIP, he threw just 25.5% offspeed. Peavy could get that GB% back to his '07 level if he started to throw more offspeed pitches. It may cost him a few walks, but those GB are the difference between the 3.16 xERA he sported in '07 and the 3.85 xERA he has this year.

Here's the rest of the top 10.

6. Dan Haren, 3.54 PWAR
7. Derek Lowe, 3.24 PWAR
8. Javier Vazquez, 3.14 PWAR
9. Josh Beckett, 2.97 PWAR
10. Scott Kazmir, 2.81 PWAR

Just for fun, I'll list my opinion on the top five for a few periods of time.

Best Pitcher for 2009
1. Brandon Webb
2. CC Sabathia
3. Johan Santana
4. Jake Peavy
5. Felix Hernandez

Best Pitcher for 2009-2011
1. CC Sabathia
2. Brandon Webb
3. Jake Peavy
4. Felix Hernandez
5. Johan Santana

Best Pitcher for 2009-2014
1. CC Sabathia
2. Felix Hernandez
3. Brandon Webb
4. Jake Peavy
5. Scott Kazmir



Stats came from the irreplaceable fangraphs.com

Friday, August 1, 2008

DIPS don't lie

Defense Independent Pitching Statistics are a very important part of baseball analysis. They take away the issues that are brought about by Wins, ERA, and other commonly used pitching stats. Here is my attempt to use a few stats to break defense away from pitching to reach xERA, the expected ERA based on things the pitcher can control.

The stats I used were IP, H, BB, K, LD%, GB%, and FB%. IP and H were basically used just to find a rough representation of BF. I did this because I was using Fangraphs' export feature to calculate for a long list of data. Here are the steps.

Outs = IP*3

Balls Hit (BH) = Outs - K + H
Basically Balls in Play plus HR. HR are usually separated but will be dealt with later.

xHits = ((.734*LD%) + (.245*GB%) + (.217*FB%))*BH
The coefficients come from the MLB batting averages on each batted-ball type.

xTotalBases = ((1.016*LD%) + (.267*GB%) + (.574*FB%))*BH
Same as xHits, except with SLG instead of BA. This is also where HR are dealt with, as the Slugging Percentages include HR, and HR are mostly a product of GB/FB ratio.

xBA = xHits/(Outs + H)
xOBP = (xHits+BB)/(Outs + H + BB)
xSLG = xTotalBases/(Outs + H)

Now that we have defense-independent BA/OBP/SLG, we can find xRunsAllowed (xRA). I decided to use GPA because of its simplicity and ability to be converted into Runs.

xGPA = (1.8*xOBP + xSLG)/4
xRA = (Outs+H+BB)*1.356*(xGPA^1.77)

Now that we have xRA, we just have to find out xIP, which is basically xOuts based on xOBP.

xIP = ((Outs+H+BB)*(1-xOBP))/3

Simple. Now xRA/9:

xRA/9 = xRA*9/xIP

Since this includes unearned runs, it won't be on the same scale as ERA. The difference between ERA and RA is about .360, so:

xERA = xRA/9 - .360

And that's about it.
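For anyone who would rather run this than work it out in a spreadsheet, here is a rough Python sketch of the steps above. It assumes the batted-ball rates are passed as fractions (0.19, not 19) and that IP is a true decimal rather than baseball notation (200.333, not 200.1); the function name and the sample line at the bottom are made up for illustration. xBA is skipped because it isn't needed to reach xERA.

```python
def xera(ip, h, bb, k, ld, gb, fb):
    """Expected ERA from IP, H, BB, K and batted-ball rates.

    ld, gb, fb are fractions of balls hit (an assumption about the input scale).
    """
    outs = ip * 3
    bh = outs - k + h                      # balls hit: balls in play plus HR

    x_hits = (0.734 * ld + 0.245 * gb + 0.217 * fb) * bh
    x_total_bases = (1.016 * ld + 0.267 * gb + 0.574 * fb) * bh

    at_bats = outs + h                     # rough AB
    batters_faced = outs + h + bb          # rough BF

    x_obp = (x_hits + bb) / batters_faced
    x_slg = x_total_bases / at_bats

    x_gpa = (1.8 * x_obp + x_slg) / 4
    x_ra = batters_faced * 1.356 * x_gpa ** 1.77
    x_ip = batters_faced * (1 - x_obp) / 3

    x_ra9 = x_ra * 9 / x_ip
    return x_ra9 - 0.360                   # shift from the RA scale to the ERA scale


# Hypothetical pitcher line, not a real season.
print(round(xera(ip=200, h=180, bb=50, k=180, ld=0.19, gb=0.45, fb=0.36), 2))
```

Feeding it an exported Fangraphs row is just a matter of converting the percentage columns to fractions first.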

Here are some results:

All qualified 2008 Pitchers

All qualified 2007 Pitchers

Sunday, June 8, 2008

RPA Numbers

After creating RPA (Run Production Average), I made some new alterations to make the stat easier to gauge. I also tested it to compare its usefulness to other run estimators. First I made a new form that is on the scale of batting average (.271 being average). I did this by making Wins (positive runs) and Losses (negative runs), then finding a Winning Percentage, which I then scaled to batting average.

Here is the RPA for all qualified batters from 2000-2007. Bonds had the best season in 2004, with a .421 RPA. Neifi Perez had the worst in 2002, with a horrific .209 (.236/.260/.303).

I also made RPA with a positional adjustment for 2007. Jorge Posada led baseball with a .342 RPAP, and Nick Punto was last among qualified with a .217.

I also calculated the Pearson's correlation in order to find out how well it predicted runs per game by a team. Here is the comparison. The R value is .9329, which, compared to other run estimators, is pretty good.
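For reference, Pearson's correlation is the covariance of the two series divided by the product of their standard deviations. A bare-bones Python sketch follows; the team numbers in it are invented purely to show the call and are not the data behind the .9329 figure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL team-level data: team RPA and actual runs per game.
team_rpa = [0.262, 0.268, 0.271, 0.275, 0.281]
runs_per_game = [4.3, 4.6, 4.8, 4.9, 5.3]
print(round(pearson_r(team_rpa, runs_per_game), 4))
```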

Friday, May 30, 2008

Run Estimator Using Slash Stats

There are a lot of Run Estimators out there these days. Some of these are linear, such as Linear Weights and Extrapolated Runs. These estimators work by assigning a run value to each event (single, double, out, etc.) based on the average change in Run Expectancy when the event occurs. Each event is multiplied by its coefficient, and the resulting sum is the number of runs created.

For convenience, I wanted to make a similar formula that took just the slash stats (Batting Average/On Base Percentage/Slugging Percentage) and calculated runs produced per at bat. I broke the formula down into three parts: outs, hits, and walks. In order to figure out how much each of these counted, I used the linear weight values for the 2007 AL.

Figuring out how to calculate the values of hits was the most difficult process. Since not all hits have equal value, slugging percentage needs to be used. But each base doesn't have the same value. For example, a single is worth .48 runs for one base, while a home run is worth 1.4 runs for four bases, or .35 runs per base. To fix this, I did a regression with the runs per base of each type of hit. This way, the number of runs per base can be calculated based on bases per hit (SLG/BA). Here's the equation:

Runs per base on hits = .022(SLG/BA)^2 - .152(SLG/BA) + .607

With this value found, the runs created by hits can now be found by multiplying runs per base by SLG.

Finding the runs created by walks required two things: walk rate and runs per walk. Since we know the value of a walk is .32 runs, then we just need to find walk rate. Walk rate can be found easily using OBP and BA.

Walk Rate = (OBP-BA)/(1-BA)

So this rate multiplied by .32 calculates runs created by walks.

The last step is to find the runs created by outs, which is a negative number. This negative number is what ultimately makes the average run production 0, with above average players creating positive runs and below average players creating negative runs because their outs outweigh their hits and walks. I originally calculated the runs per out to be -.29, but when everything added up an average player (.271/.338/.423) was 10 runs above average per 648 AB. So I changed the value to -.3149, which allowed it to zero out.

Outs per at bat is simply 1-OBP.

Adding all the components up brings this equation:

RUNS PRODUCED PER AT BAT

-0.3149*(1-OBP)
+ (0.022*(SLG/BA)^2 - 0.152*(SLG/BA) + 0.607)*SLG
+ 0.32*((OBP-BA)/(1-BA))


While the overall equation is complicated, the ease is in the need for only three variables.
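To make the three pieces easier to play with, here is a quick Python sketch of the equation. The function names are my own, and the out value is left as a parameter so the original -.29 can be compared against the adjusted -.3149.

```python
def runs_per_base(ba, slg):
    """Run value of each base gained on hits, from the regression on bases per hit."""
    bases_per_hit = slg / ba
    return 0.022 * bases_per_hit ** 2 - 0.152 * bases_per_hit + 0.607


def runs_per_ab(ba, obp, slg, out_value=-0.3149):
    """Runs produced per at bat relative to average, from the slash stats."""
    outs = out_value * (1 - obp)
    hits = runs_per_base(ba, slg) * slg
    walks = 0.32 * (obp - ba) / (1 - ba)
    return outs + hits + walks


# Average player (.271/.338/.423) over 648 AB with each out value.
adjusted = runs_per_ab(0.271, 0.338, 0.423)
original = runs_per_ab(0.271, 0.338, 0.423, out_value=-0.29)
print(f"out value -.3149: {adjusted * 648:+.1f} runs")
print(f"out value -.29:   {original * 648:+.1f} runs")
```

With -.3149 the average line comes out at essentially zero, while -.29 leaves it roughly 10 runs above average, which is the gap that prompted the change.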

I compared this equation to the Linear Weights of AL starters with 350 PA. The root mean square deviation between the Linear Weights and my equation for these values was 2.9 runs. The biggest differences came with players who had OBPs that were relatively high or low. Since getting on base is such an important part of creating runs, I don't mind this flaw too much.



Wednesday, March 5, 2008

Hardball Times Fielding Translations

The Hardball Times has fielding statistics based on the percentage of balls fielded within a fielder's zone (RZR), and plays made outside of the player's zone (OOZ). I translated these numbers into runs above league average for each position. I did this only with players who qualified for the batting title.

Here are the results.