Team Run Values
I know what you're thinking: "Doesn't this guy have a life?", "This is pathetic..." and so on. To the uninitiated, this is undoubtedly true; but it's my time to waste...plus I didn't request sleepless nights.
On the other hand, if you have an open mind and if you're curious to learn a little about the inner workings of softball and baseball, read on.
What are Batting Run Values?
Batting runs, or more correctly Linear Weight Run Values, were developed by Canadian George Lindsey in the late 1950s and early 1960s and expanded upon by Pete Palmer in the late 1970s and early 1980s. Palmer's work was presented in the classic book "The Hidden Game of Baseball." If you have a burning desire to earn your pocket protector, I suggest you hunt down the book and read it.
To grossly simplify Palmer's work, he expressed runs scored in baseball as a multivariate linear equation of the form: Runs = a1*x1 + a2*x2 + ... + an*xn, where the "a" terms are coefficients (the run value of each event), the "x" terms are variables (how often each event occurred) and Runs is the result. The task is to figure out which events are relevant to run scoring and then assign a weight, or run value, to each event. The events are common knowledge to any softball player: singles, doubles, triples, home runs, outs and so on.
Palmer's published equation is as follows: Runs = (0.46)1B + (0.80)2B + (1.02)3B + (1.40)HR + (0.33)(BB + HB) + (0.30)SB - (0.60)CS - (0.25)(AB - H) - (0.50)OOB. SB is stolen bases, CS is caught stealing, (AB - H) are batted outs, OOB are outs on base. Therefore, if you take daily, weekly, monthly, yearly or career event frequencies for a given player and multiply them by the event weights, you have Runs produced with respect to league average for that player. A league average player in this system produces 0 runs with respect to average--the floor or ceiling is dictated by league runs scored. A negative run value indicates below average offensive contribution and a positive run value indicates above average offensive contribution.
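As a rough illustration of how the equation is applied, here is a minimal Python sketch. The weights are the ones quoted above; the function name, the dictionary keys and the sample season line are all invented for the example and do not come from any real player.

```python
# Palmer's published linear weights, as quoted in the text above.
PALMER_WEIGHTS = {
    "1B": 0.46,
    "2B": 0.80,
    "3B": 1.02,
    "HR": 1.40,
    "BB_HB": 0.33,         # walks plus hit batsmen
    "SB": 0.30,
    "CS": -0.60,
    "BATTED_OUTS": -0.25,  # (AB - H)
    "OOB": -0.50,          # outs on base
}


def palmer_batting_runs(events):
    """Multiply each event count by its weight and sum; missing events count as zero."""
    return sum(weight * events.get(name, 0) for name, weight in PALMER_WEIGHTS.items())


# Hypothetical season line, made up purely for illustration.
sample = {
    "1B": 100, "2B": 30, "3B": 5, "HR": 20, "BB_HB": 60,
    "SB": 10, "CS": 5, "BATTED_OUTS": 400, "OOB": 3,
}
runs_vs_average = palmer_batting_runs(sample)
print(round(runs_vs_average, 1))
```

A positive result means the made-up player produced more runs than a league average hitter would have; a negative result means fewer.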
BaseRuns and the +1 technique
Palmer used a newfangled gadget called a computer to run multiple simulations on all games played since 1901 (up to the 1983 publication of the second edition of the book), along with game state analysis and his keen baseball acumen, to assign run values to each baseball event. The results were shocking to many fans and entrenched "baseball people" because they showed certain baseball truisms to be lacking...truth. For example, on AVERAGE there is not much value in stolen bases, and even less in sacrifice flies or sacrifice bunts. Some 30 years later, these findings still rile NL fans, or "true baseball" fans as they typically and emphatically call themselves.
This is not to say such events have no value, but their value is very situation dependent. How many times do we have to see Derek Jeter (a no doubt Hall of Famer and great player) sac bunt with no outs and runners on 1st and 2nd? It's a waste unless, maybe, it's the bottom of the 9th in a tie game and your desire is to increase your CHANCE of scoring...then again, given his recent proclivity for GIDPs, maybe it's not a bad idea.
The same is true for stolen bases. Using Palmer's weights above (+0.30 for a stolen base, -0.60 for a caught stealing), a player must be successful about two-thirds of the time just to break even! But in certain situations the stolen base value is much higher--e.g. the '04 ALCS when Francona inserted pinch runner Dave Roberts solely to steal bases.
Palmer derived run values from game simulations and game states because it made sense; he had substantial LEAGUE data and his values were league averages. Such data is not available to me, so I employed a different method: BaseRuns and +1.
BaseRuns is a dynamic run estimator developed by David Smyth in the late 1990s. It has the form: Runs = A*B/(B+C) + D, where A is the reach base factor, B is the advance base runner factor, C is the failure to advance base runner factor (outs), and D is home runs. A brief examination of the form shows logical harmony with the game: runners reach base (A) and a proportion of them, B/(B+C), come around to score. Home runs are treated as a separate entity because they are guaranteed runs. Some versions of BaseRuns place sac flies in the D term, but such a discussion is beyond the scope of this primer.
The Terms
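A = H + W + IBB + ROE + CI + OS + HB - HR - CA - DP
B = .781*S + 2.61*D + 4.28*T + 2.42*HR + .82*ROE + 1.433*OS + .034*(W + HB + CI) - .741*IW + .813*SB + .125*CA + 1.07*SH + 1.81*SF + .69*DP - .029*(AB - H - ROE - OS) - .086*K
C = AB - H - ROE - OS + SH + SF
D = HR
Inside the B term, S, D and T are singles, doubles and triples (that D is doubles, not the D term). SH is sacrifice hits, written here as SH to keep it distinct from S for singles.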
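For readers who would rather see the four terms in code than in a spreadsheet, here is a minimal Python sketch of the equation as listed above. It is only an illustration under a few assumptions: H is taken to be singles + doubles + triples + home runs, the sacrifice count is stored under the key "SH" so it cannot collide with "S" for singles, and the other keys simply reuse the abbreviations from the terms as written (including both IBB and IW). The function name and the sample team are made up.

```python
def base_runs(ev):
    """Estimate runs from a dict of event counts; events not supplied count as zero."""
    g = lambda key: ev.get(key, 0)

    hits = g("S") + g("D") + g("T") + g("HR")          # assumption: H = S + D + T + HR
    batted_outs = g("AB") - hits - g("ROE") - g("OS")  # the (AB - H - ROE - OS) group

    # A: runners reaching base (home runs are removed here and counted in D)
    a = (hits + g("W") + g("IBB") + g("ROE") + g("CI") + g("OS") + g("HB")
         - g("HR") - g("CA") - g("DP"))

    # B: advancement, using the weights listed in the text
    b = (0.781 * g("S") + 2.61 * g("D") + 4.28 * g("T") + 2.42 * g("HR")
         + 0.82 * g("ROE") + 1.433 * g("OS")
         + 0.034 * (g("W") + g("HB") + g("CI"))
         - 0.741 * g("IW") + 0.813 * g("SB") + 0.125 * g("CA")
         + 1.07 * g("SH") + 1.81 * g("SF") + 0.69 * g("DP")
         - 0.029 * batted_outs - 0.086 * g("K"))

    # C: outs, D: home runs
    c = batted_outs + g("SH") + g("SF")
    d = g("HR")

    if b + c == 0:  # no events at all; avoid dividing by zero
        return float(d)
    return a * b / (b + c) + d


# Hypothetical team line, invented for illustration only.
team = {"AB": 100, "S": 30, "HR": 5}
team_runs = base_runs(team)
print(round(team_runs, 2))
```

Keeping each term as its own variable mirrors the A, B, C, D layout, so swapping in a different set of B weights only touches one expression.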
The most complicated BaseRuns term is B because of its many weights. The weights listed in B above are mostly for baseball; however, using data collected over the years, I derived weights that have proven very reliable for softball at this level of play.
Because of the logical construct of BaseRuns, it has proven applicable across a wide range of run environments. Given its relative reliability, we may use the linear weights inherent to BaseRuns as they apply to our context; thus, I do not have to derive linear weights by other means where sufficient data is lacking.
How is this accomplished?
There are several methods to accomplish this task, including partial differentiation, but a very basic and intuitive approach is the +1 method. Simply put, if we want to find the run value of a single, we add one additional single to the team's totals, recalculate BaseRuns and note the difference in runs.
Adding a whole single may not give us our most accurate answer because its comparatively large size creates a large disturbance within the system. Therefore, instead of adding a whole single, we can add some fraction of a single--in our case one ten-millionth of a single--and divide the resulting change in runs by that fraction. This value was chosen because it is the smallest value Excel capably handles for this application. The difference in accuracy is not massive. Using a value of +1 produces a singles run value of 1.0904 while using a value of +0.0000001 produces a run value of 1.0882. Not a huge difference, but enough to merit the attention to detail--besides, it's the spreadsheet doing the work, not me. This process was used to derive the weights for each event in the Batting Runs table above.
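The +1 idea is easy to try outside a spreadsheet. The sketch below is a minimal Python illustration, not the actual workbook: it uses a pared-down BaseRuns with only hits, walks and at-bats (borrowing the baseball B weights listed earlier), and the team totals are invented, so the numbers it prints will not match the 1.0904 and 1.0882 quoted above. All function and variable names are hypothetical.

```python
def simple_base_runs(t):
    """Pared-down BaseRuns using only hits, walks and at-bats (an assumption for brevity)."""
    hits = t["S"] + t["D"] + t["T"] + t["HR"]
    outs = t["AB"] - hits
    a = hits + t["W"] - t["HR"]
    b = (0.781 * t["S"] + 2.61 * t["D"] + 4.28 * t["T"] + 2.42 * t["HR"]
         + 0.034 * t["W"] - 0.029 * outs)
    c = outs
    return a * b / (b + c) + t["HR"]


def plus_delta_value(team, event, delta):
    """Add `delta` of one event, re-run the estimator, and scale the change back to one event."""
    bumped = dict(team)
    bumped[event] += delta
    if event in ("S", "D", "T", "HR"):
        bumped["AB"] += delta  # a hit is also an at-bat, so the out count stays unchanged
    return (simple_base_runs(bumped) - simple_base_runs(team)) / delta


# Hypothetical team totals, invented for illustration only.
team = {"AB": 1000, "S": 300, "D": 80, "T": 10, "HR": 40, "W": 60}

full_step = plus_delta_value(team, "S", 1.0)   # the plain +1 method
tiny_step = plus_delta_value(team, "S", 1e-7)  # one ten-millionth of a single
print(full_step, tiny_step)
```

The two results should differ only slightly, which is the same pattern described above: the tiny step approximates the partial derivative, while the full step also picks up a little of the curvature in the equation.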
Why bother?
The most important thing to remember is that these SEASON AVERAGE values apply only to the specified team up to this point in the season. They may or may not be accurate on a game-to-game, inning-to-inning, or event-to-event basis. How can this be? Without delving into too much detail, it has to do with game states, run environments and situational leverage. Here is a game state example: a walk is worth more when issued with no outs than with two outs, because the team still has three outs' worth of opportunities to score the base runner. Conversely, a HR is worth more when hit with two outs than with no outs because there are fewer opportunities left to score a base runner; a HR is at least one guaranteed run.
Run Environments
In an extremely high run scoring environment, there is little difference between a HR and a single. The key to scoring in such an environment is just to reach base. In a very low run scoring environment, the opposite is true--there is a large difference between a single and a HR. The key to scoring is not just getting on base, but hitting for extra bases. As a brief aside, the BaseRuns (BR) equation accurately captures this effect; e.g. compare the Extra Base Value tables for DPH and Toyota of CP.
Consider the following examples: a slow pitch softball game where 40 or more runs are scored, and a Div I women's softball game where 3 or so runs are scored. For the slow pitch game, an out costs nearly a run! All a team needs to do is get on base and keep the wheel turning.
At the other extreme lies the Div I softball game with its very low run environment. The key is not just to reach base, but to hit for extra bases. Strangely, most Div I teams are constructed to exacerbate the situation, with lineups full of left-handed slap hitters trying to reach base by making contact with the pitch while seemingly halfway up the base line. The '09 University of Florida team with its season-long .543 Slugging rate and .220 Isolated Power rate (an extra base every 4.5 ABs) showed what a properly constructed lineup in a low run environment can do. OK, they didn't win the WCWS, but I'm inherently optimistic.
Linear weights also point to sensible in-game decisions. For example, if the average value of an extra base (EB) is around .20, and the value of a caught advancing (CA) is around -.65, it is imperative that the baserunner not get thrown out trying to take an extra base! For an accurate run value of a specific situation, a run expectancy table is necessary. Constructing such a table is a project for the future, I suppose.
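To put a rough number on that trade-off, the break-even success rate is the point where the expected gain equals the expected loss. The small Python sketch below uses the approximate values from the paragraph above (.20 gained, .65 lost); the function name is made up, and the result is only an average figure that ignores the base-out situation.

```python
def break_even_rate(gain, loss):
    """Success rate p where p * gain - (1 - p) * loss = 0; `loss` is given as a positive number."""
    return loss / (gain + loss)


# Approximate average values quoted in the text: +.20 for the extra base, -.65 for being caught.
eb_break_even = break_even_rate(0.20, 0.65)
print(f"{eb_break_even:.1%}")
```

With these inputs the runner needs to be safe roughly three times out of four before the extra base pays off on average.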
Most importantly, the linear weight values show the true average value of events--or player contributions--independent of specific situations or games. This matters when a neutral context for evaluating a player is desired.
Thank you for reading and there is more to come.
Things to do
Adjust the K weight in the B term of the BR equation. K's should be more costly because the failure to put the ball in play denies the benefit of an ROE. E's are very common at our level of play.
Develop a run expectancy table. Not likely to happen in the near term.
Introduce wOBA to the STATs section. This is a weighted OBA derived from linear weight event values. It is another way to gauge the quality of a player's offense, expressed in a familiar way on a familiar scale.
Credits:
Pete Palmer - Developed modern linear weights
John Thorn - Worked with Palmer and verbalized linear weights process.
Tom Tango - Brilliant baseball analyst/sabermetrician. Author of The Book. Recommended reading if any of this stuff interests you. Part time consultant for Seattle Mariners. www.tangotiger.net/
U.S. Patriot - Brilliant baseball analyst/sabermetrician. http://gosu02.tripod.com/id7.html
David Smyth - Developed BR equation