Monday, December 30, 2013

Site News: Movin' on up in 2014!

Last year, I set a New Year's resolution to do more analytics work, including 60 blog posts. I got through 15.

But it's not all bad! Two big pieces of news for 2014:
  • I will be attending the Sloan Sports Analytics Conference again this year.  I submitted an abstract to the research paper competition, which was accepted.  Unfortunately, the results of my research disproved my hypothesis, and the whole thing's come crashing down.  Ordinarily, that means you would see it repurposed here as a blog post but...
  • I've been hired as a contributing writer to Beyond the Box Score, "a saber-slanted baseball community", where I will be writing articles on a regular basis.
I'll keep this blog open for non-baseball stuff, but most of my writing will appear over there.

Best wishes to all my reader(s) for a happy and healthy 2014!

Friday, December 27, 2013

Sorting Through a Million Bags of M&Ms

As a kid, I used to sort bags of M&Ms by color. This was my first sort of data science project, and my parents' first clue that this one was a little off. Every once in a while, I'll revert to that habit (especially with those fun-size bags you get around Halloween), which led to a long, drawn-out discussion with a friend about the probability of getting a fun-size bag of Skittles with no purples.


So when a recent trip to the vending machine produced a free bag of M&Ms, I found myself asking a number of questions about the distribution of the different colors in a bag of M&Ms. A quick Google search produced no official statement from the company, except this one from 2008:


Our color blends were selected by conducting consumer preference tests, which indicate the assortment of colors that pleased the greatest number of people and created the most attractive overall effect.

On average, our mix of colors for M&M'S MILK CHOCOLATE CANDIES is 24% cyan blue, 20% orange, 16% green, 14% bright yellow, 13% red, 13% brown.

Each large production batch is blended to those ratios and mixed thoroughly. However, since the individual packages are filled by weight on high-speed equipment, and not by count, it is possible to have an unusual color distribution.

Well, we have two bags of M&Ms here. We can check to see whether these proportions are still accurate using a chi-square goodness of fit test. Since there are six colors, we will be looking at a distribution with five degrees of freedom.

These tables show the result of the chi-square calculation for each bag.

BAG 1        Red  Orange  Yellow   Green    Blue   Brown   Total
Observed       8      12       4      10      12       8      54
Expected    7.02    10.8    7.56    8.64   12.96    7.02      54
(O-E)^2/E  0.137   0.133   1.676   0.214   0.071   0.137   2.369

BAG 2        Red  Orange  Yellow   Green    Blue   Brown   Total
Observed       8       7       2      14      12      11      54
Expected    7.02    10.8    7.56    8.64   12.96    7.02      54
(O-E)^2/E  0.137   1.337   4.089   3.325   0.071   2.256   11.22

At a significance level of alpha = 0.05, the critical value is chi-square = 11.0705 for a distribution with 5 degrees of freedom. Bag 1 (2.369) is comfortably below that; Bag 2 (11.22) just barely exceeds it. This suggests we have one normal bag and one outlier. That's not especially conclusive evidence for or against the 2008 distribution, but the significant shortage of yellows makes me suspect the distribution has changed.
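
If you want to check my math, here's a minimal MATLAB sketch of the test (chi2inv needs the Statistics Toolbox; the rest is base MATLAB):

    % Chi-square goodness of fit against the 2008 mix
    % Column order: red, orange, yellow, green, blue, brown
    p        = [0.13 0.20 0.14 0.16 0.24 0.13];
    observed = [ 8 12  4 10 12  8 ;    % Bag 1
                 8  7  2 14 12 11 ];   % Bag 2
    expected = sum(observed, 2) * p;   % 54 candies per bag times each proportion
    chi2     = sum((observed - expected).^2 ./ expected, 2)   % [2.369; 11.22]
    critical = chi2inv(0.95, 5)                               % 11.0705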

We can also ask how unusual a bag with only two M&Ms of a given color really is. Unfortunately, this is non-trivial to solve theoretically. But we can estimate these probabilities by simulating a large number of bags of M&Ms. I built a MATLAB script (available on request) to simulate an arbitrary number of 1.69-oz M&M bags. For convenience, I assumed each bag had a consistent number of M&Ms (54). I then drew 54 random numbers uniformly distributed on the interval [0,1], splitting up the number line to match the 2008 proportions (i.e., a random number less than 0.13 meant a red M&M, a number between 0.13 and 0.33 meant orange, and so on). I repeated this process one million times, because "a million bags of M&Ms" sounded cool.


"A million bags of M&Ms isn't cool. You know what's cool? A billion bags of M&Ms."

Shut up, Justin.

Anyway, you end up with this graph showing the distribution for each color. It's discretized, of course, because 0.4 of an M&M is nonsensical. But, thanks to the central limit theorem, all of the distributions are approximately normal. Note that the red and brown curves are essentially right on top of each other*.

* - Apologies to those of you, like my advisor, who are red-green colorblind, and thought M&Ms came in blue, yellow, orange, and a couple shades of brown.


Now that we have this data set, we can answer a whole bunch of questions.

What are the odds that I get a bag with fewer than N yellows?
For this, we can make cumulative distribution functions based on that figure. So, if we assume the 2008 distribution is accurate, the probability of getting four or fewer yellows in a bag is approximately 11%. And if each bag represents an independent sample (which might not be true, depending on the manufacturing process), the probability of getting two consecutive bags with four or fewer yellows is 1.2%.
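
With the counts matrix from the simulation sketch above, these are one-liners (yellow is column 3 in my color ordering):

    pYellow = mean(counts(:, 3) <= 4)   % four or fewer yellows: about 0.11
    pYellow^2                           % two straight bags of that: about 0.012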

What are the odds that I get a bag with fewer than N of one color?
Here we have to use a different curve. For instance, a bag with only four yellows seems rare from the previous graph, but remember: that just deals with the probability that you have four or fewer yellows, or four or fewer reds. This question deals with the probability you have four or fewer of one color, regardless of which color it is. And now we see that the probability you have a bag with no more than four of one color is about 48%. For two consecutive bags (assuming independence), the probability is a still-reasonable 23%. So, while getting two bags with a small number of yellows is unusual, getting two bags with a small number of any color is pretty common.
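
In code, the only difference is taking a min across the columns before thresholding:

    pAny = mean(min(counts, [], 2) <= 4)   % some color at four or fewer: about 0.48
    pAny^2                                 % two straight bags: about 0.23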


What are the odds that I get a bag missing a color?
I'm sure the process of setting those percentages involves minimizing this possibility: if you were six, and your favorite color was red, you might get upset if you went through a whole bag of M&Ms with no reds. As a result, this is a pretty uncommon occurrence: in my data set, the odds were approximately 1-in-690.

What are the odds that I get a bag that's entirely one color?
This never happened in the million trials I ran. In fact, the greatest number of any single color in one bag was 30 blues (out of 54).

What are the odds that I get a bag with equal numbers of all colors?
You would think this wouldn't be too crazy, but in fact it's very rare. I estimate that the odds are about 1-in-42,000.
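
These last three questions all fall straight out of the same simulated counts matrix:

    pMissing = mean(any(counts == 0, 2))   % bag missing at least one color; ~1-in-690
    maxOne   = max(counts(:))              % most of any single color in one bag (30, in my run)
    pEqual   = mean(all(counts == 9, 2))   % exactly nine of each color; ~1-in-42,000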

What are the odds I get more blues than any other color?
Before I present these results, it's important to note that MATLAB's min and max functions don't deal with ties very well. Ideally, you'd have a function where a two-way tie counts as 0.5 for each color and a three-way tie as 1/3 each, but what actually happens is the left-most column gets 1 and everyone else gets zero. This means that red, orange, and yellow will be skewed a little high, and green, blue, and brown will be skewed a little low. But this is already a 1,000-word entry on candy-coated chocolate, so the min/max functions are good enough for me.
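
For the curious, the tie-splitting version wouldn't take much more code. A sketch that hands each of k tied colors a 1/k share of the bag:

    % Chance each color is (or ties for) the most common in a bag,
    % splitting k-way ties 1/k instead of crediting the left-most column
    rowMax = max(counts, [], 2);                    % biggest count in each bag
    isTop  = counts == repmat(rowMax, 1, 6);        % which colors tie for that max
    shares = isTop ./ repmat(sum(isTop, 2), 1, 6);  % 1/k credit per tied color
    pTop   = mean(shares)                           % per-color "winner" odds; blue is column 5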

Thursday, December 19, 2013

Bad Beats: Getting Back on the Bowl Prediction Horse

Last year I used the Sagarin pure points method and found the games with the biggest discrepancy between the spread Sagarin predicted and the actual spread. I charted this for all of last year's bowls and finished around .500.

This year, I'm using The Prediction Tracker, which aggregates predictions from a number of systems. If we treat the picks as independent and assume they follow a normal distribution (warning: this is probably a terrible set of assumptions, but roll with it), we can see how many standard deviations each posted spread is from the mean prediction. The lines furthest from the consensus are the best values.

Let's run through the Chick-fil-A Bowl (Duke-TAMU) as an example. There are 48 picks with a mean of TAMU -6.54 and a standard deviation of 4.2. The current spread at the LVH is TAMU -12.5, approximately 1.42 standard deviations away from the mean prediction. That high z-score suggests there's value in this line -- that TAMU is overvalued. And that's why I picked Duke.
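
The arithmetic, as a quick MATLAB sketch using the numbers above (negative spreads meaning TAMU favored):

    mu    = -6.54;               % mean of the 48 predicted spreads
    sigma = 4.2;                 % standard deviation of those predictions
    line  = -12.5;               % current spread at the LVH
    z     = (line - mu) / sigma  % about -1.42: the line sits well past the consensus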

For reference, I put all the scores in a Google Document so you can bet against them as you will.

There haven't been many updates here lately, but that doesn't mean I haven't been working. Expect another update soon with a bunch of news.

Friday, September 27, 2013

42 is the Answer.

Here is a list of every Red Sox pitcher who compiled 20 or more saves from 1997 through 2013.
- Jonathan Papelbon (219)
- Derek Lowe (85)
- Tom Gordon (68)
- Ugueth Urbina (49)
- Heathcliff Slocumb (48)
- Keith Foulke (47)
- Mike Timlin (27)
- Alfredo Aceves (27)
- Tim Wakefield (22)
- Koji Uehara (21)

Note that this list excludes several would-be closers who failed to get to 20 saves:
- Joel Hanrahan (4)
- Andrew Bailey (14)
- Mark Melancon (1)
- Daniel Bard (5)
- Curt Schilling (9)
- Rod Beck (6)
- Butch Henry (6)
- John "Way Back" Wasdin (3)
- Rich "El Guapo" Garces (5)

Here is the same list for the Yankees:
- Mariano Rivera (652)
- Rafael Soriano (44, but only because Rivera tore his ACL last year).

Happy trails, you magnificent bastard.

Seriously, spend some time on Rivera's Baseball Reference page. Look on his works, ye mighty, and despair.

Sunday, September 1, 2013

Media Rights and Market Size

Last week, Larry Granillo found an old article in Sponsor magazine ("the national weekly of TV and radio advertising") that listed the TV and radio deals each MLB team had in place for the upcoming 1961 season. The chart is here, and is pretty cool on its own. Larry did a good job breaking down who spent what where, but I was personally more interested in the "why". Several companies (like American Tobacco) sponsored teams in multiple markets; how did their advertising agency decide to pay $140,000 for a third of the Red Sox' rights, but $200,000 for a third of the Tigers' rights?

Monday, August 26, 2013

The Ballad of Jose Oquendo

Pictured: Jose Oquendo, probably not pitching.

Andrew Koo wrote an article for BP today about position players pitching in extra innings. He noted that 2013 marks the sixth straight year in which this has happened, and chalks it up to increasing reliever specialization: if all your relievers are used to going no more than 2 innings per outing, you're much more likely to run out of pitchers in an 18-inning game. I would add the diminishing run environment -- less offense means that tie games last longer.

But that's not what I'm here to write about. I'm here to write about Jose Oquendo.

On May 14, 1988, Oquendo threw four innings against the Atlanta Braves and picked up the loss in a 7-5, 19-inning affair. Absent any evidence to the contrary, I'm going to claim that Oquendo owns the record for most innings pitched in a single outing by a position player. Let's go through the box score for this and dive into the blissful craziness that was the end of this game.

Thursday, August 22, 2013

How Consistent is Fantasy Football Consistency?

The end of summer means the imminent start of the NFL season. And while players prepare with grueling workouts in 100-degree heat, fans are preparing by spending hours staring at fantasy football preview magazines and webpages and cheat sheets.

The problem facing the fantasy football player is one of prediction: which statistics from the previous season best predict value in the upcoming season? One such measure of performance is a player's consistency -- the variation in the number of points he scores from week to week. Pro Football Reference has previously shown that good teams should prefer more consistent lineups, while weaker teams should prefer less consistent lineups, on the theory that their best chance of winning involves a few "lightning in a bottle" weeks.

But how do you determine which players are consistent?
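
One natural starting point -- and this is my framing, not necessarily anyone else's -- is the spread of a player's weekly scores, via the standard deviation or the coefficient of variation. A sketch with made-up numbers:

    % One candidate consistency measure (the weekly points here are invented)
    weekly = [18.2 4.5 22.1 11.0 9.8 15.4 7.2 19.9];  % fantasy points by week
    sigma  = std(weekly);                             % raw week-to-week variation
    cv     = sigma / mean(weekly)                     % lower = more consistent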

Sunday, February 10, 2013

Optimizing Conference Scheduling for Tournament Selection: Part II, College Basketball

This is Part 2 of a two-part post about conference scheduling in college sports. I submitted a version of this for inclusion in this year's Evolution of Sport competition at the Sloan Sports Analytics Conference in March. Since they didn't accept it, I decided to post it here. Part 1 can be found here.

Since college football won't be very topical in March, I want to close this discussion with an example from college basketball. Now, the goal in college basketball is making a 68-team, single-elimination tournament. Of those 68 bids, 31 are reserved for the winners of the respective conference championships. The remaining 37 are handed out by a selection committee on the basis of each team's body of work during the season. Since no one can watch every game, the committee gets some help from the Ratings Percentage Index, or RPI.
Now, again, the exact formula for the RPI isn't spelled out -- there are various bonuses for winning on the road and so on -- but the RPI is estimated to be 50 percent strength of schedule. And wouldn't you know it: the RPI is very consistent from year to year, with a correlation coefficient of about 0.75.
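
For what it's worth, the commonly cited approximation of the formula (the NCAA's exact bonuses aren't published) is 25% winning percentage, 50% opponents' winning percentage, and 25% opponents' opponents' winning percentage -- in MATLAB terms:

    % Commonly cited RPI approximation; the real calculation adds unpublished bonuses
    rpi = @(wp, owp, oowp) 0.25*wp + 0.50*owp + 0.25*oowp;
    rpi(0.75, 0.55, 0.52)   % e.g., a 0.750 team with a middling schedule: 0.5925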


What's at stake? Well, if you miss the NCAAs, there's always the NIT, and I've seen teams estimate that they can clear about $25,000 to $50,000 from a good NIT run. But make the NCAAs, and Forbes estimates that your conference gets just under $2 million per game to distribute to its members. Remember that year VCU went to the Final Four? That brought in a little under $9 million for the Colonial Athletic Association.

So yes, scheduling is very important in college basketball, and I've read articles that suggest the Missouri Valley Conference punishes members that schedule soft out-of-conference games that bring down their strength of schedule. But could they be doing more?

Here we have an example of another mid-major conference, the Atlantic 10, from the 2010-2011 season. The previous year, 5 of the A-10's 14 teams won at least 19 games. That's usually good enough to be a borderline tournament team, but the A-10 only got 3 bids. In 2010-2011, the A-10 had 6 teams with at least 23 wins. And they got ... the exact same number of bids, for the exact same teams. Here you can see their conference strength of schedule as compared to their previous season's performance, and no surprise: winning doesn't correlate with the next season's strength of schedule. Note that each team played 16 conference games: you play everyone once, and then three teams twice. For the record, the teams in gold made the NCAAs in 2011, and the teams in blue missed out.


But if we take last year's performance into account, we see a drastic improvement in the average opponent's ranking for each of the top six schools – over a 60% improvement, in fact. And again, with the exception of the outlier of Fordham, strength of schedule matches up better with last year's play all around. Now, this won't necessarily guarantee you more tournament bids, but it'll improve your rankings and your public perception, and that might just be enough to get you off that dreaded bubble.


Now obviously this is not a finished product. The calculations and the sample schedules I used were strictly back-of-the-envelope stuff: there's no accounting for home-field advantage, which could be useful, and I'm almost positive the results I generated aren't optimal. And I'll admit, this won't work for every conference. You'll notice I didn't include any of the Mountain West Conference teams because there's no wiggle room: there are nine teams in the conference in football, so every team plays every other team once. That's it.

But if you're the commissioner of a college sports conference – and, with all the turmoil lately, you might be one and just not know it yet – consider all your options before you set that schedule. Take a good long look at that filet mignon. And then take a good long look at those cheeseburgers.

Thursday, February 7, 2013

Optimizing Conference Scheduling for Tournament Selection: Part I, College Football

This is Part 1 of a two-part post about conference scheduling in college sports. I submitted a version of this for inclusion in this year's Evolution of Sport competition at the Sloan Sports Analytics Conference in March. Since they didn't accept it, I decided to post it here. Part 2 is due Monday.

In 2008, the Boise State Broncos of the Western Athletic Conference were ranked 9th in the final BCS standings. That same year, the TCU Horned Frogs of the Mountain West Conference were ranked 11th. Now, in part because neither school was in one of the power conferences like the SEC, both teams were passed over for the most prestigious bowls, and met in the Poinsettia Bowl. Both teams earned a payout of $750,000.

The next season, Boise State finished 6th in the BCS standings, and TCU finished 4th. This time, they met in the Fiesta Bowl, one of the four games in the Bowl Championship Series, and earned a payout of $18 million each. Same teams, very similar regular seasons, 24 times more money. 24 times! That's the difference between a filet mignon with crab meat on top at Smith and Wollensky, and two cheeseburgers – no fries – at McDonald's.

And that's just the monetary benefits. That doesn't even count the national exposure for recruiting, or the increase in freshman applications that typically follows athletic program success.

So, naturally, if you work for a school like Boise State or a conference like the Mountain West, you want to know, "What can I do to improve my chances to get into the biggest bowl games and get that BCS money?" My talk will describe how conferences can improve their members' chances by stacking their conference strength of schedule.

Tuesday, February 5, 2013

Site News: One Month 'Til Sloan...

A little while ago, I wrote about an Evolution of Sport proposal that I submitted to the MIT Sloan Sports Analytics Conference. Today I found out that the proposal was not selected for this year.

But that's the extent of the bad news. I will still be at the conference in Boston this March, and will be tweeting all weekend (and maybe writing a recap once it ends). I also get a couple free blog posts out of it. I'll be posting the first half of my talk Friday, and the conclusion on Monday. For now, here's the abstract, which was at least competent enough to get me into the second round of the competition:

The championship of the NCAA Football Bowl Subdivision is determined by the formula established by the Bowl Championship Series (BCS). Under this system, schools affiliated with smaller conferences (e.g., Boise State and Tulane) are hindered by their strengths of schedule. The non-conference games played by these schools are typically determined years in advance, but the conference games are announced a few months before each season. The scheduling of these conference games is dominated by fixed divisions based on geography; as a result, schools from smaller conferences with BCS aspirations are often saddled with suboptimal conference schedules, forgoing games against superior in-conference opponents in order to play weaker schools in the same arbitrary division. To prevent this, we propose non-automatic qualifying conferences use a relegation-style system to assign teams to each division. In defense of this, we show that year-to-year records in major college football programs are strongly correlated, such that last year’s top teams are more likely to be this year’s top teams. We further demonstrate that such a system would have significantly improved the conference strength of schedules for several previous BCS contenders. This proposed system will therefore give schools from non-AQ conferences the greatest chance to make a major bowl game, resulting in greater revenues and exposure for the conference as a whole and each of its member institutions. We conclude by showing that similar schemes could be used to improve the reputation of teams in other sports, including the so-called “mid-major” conferences in NCAA Division I basketball.

Of course, I welcome any and all feedback on this topic.

Wednesday, January 30, 2013

Super Bowl Hype Drive: Prediction Aggregation

Back in grade school, I used to poll my friends the week before the Super Bowl, asking who they thought would win. (Okay, fine: in order to get a better sample size, I asked the people who weren't my friends, too.) And they performed well considering the sample: off the top of my head, they correctly called two of the three games I polled them on.

You may remember that, last March, I collected a few dozen brackets and graded them during the NCAA basketball tournament. Most of the brackets predicted that Kentucky would eventually win the championship, which they did. How accurate would a crowdsourced Super Bowl prediction be?

Tuesday, January 29, 2013

Super Bowl Hype Drive: New Orleans' Bad Mojo

This year marks the Super Bowl's return to New Orleans for the first time since 2002, when the Patriots upset the Rams on a last-second Adam Vinatieri field goal. But the Superdome has a reputation for hosting blowouts, even given the fact that many Super Bowls are one-sided affairs. Is this reputation deserved?

Thursday, January 24, 2013

Super Bowl Hype Drive: Room for Squares

The run-up to the Super Bowl leads to a lot of great traditions -- parades! weird bloggers at media day! the disfiguration of millions of chickens! -- but high among these is the office squares pool. You've probably seen them; they look something like this:


You sign your name to a square, and once all the squares are filled, someone picks row and column values. Say you end up with the square (3, 1). That means that, at the end of any of the four quarters*, you win money if the score is X3-Y1 -- e.g., 13-21, 3-31, 33-41, etc.

* - Actually, the fourth quarter one usually includes overtime periods too, so if you had Ravens 8, Broncos 5 as a square in the AFC divisional playoff game, you would've won once Justin Tucker made that kick in the second overtime.

Obviously, the odds of every combination aren't equal, and you know intuitively that multiples of 7 or 3 are more likely to come up. Wouldn't it be nice to know what your odds of winning were compared to someone who drew, say, 7-7?

To determine the odds, I used Pro Football Reference's Score Index to find the score by quarters of all playoff games dating back to the 1994-95 season, when the NFL added the two-point conversion. That gives a sample of 19 postseasons which (with this season included) encompasses 208 total games. Here are the total number of times each combination of numbers has occurred.

      0    1    2    3    4    5    6    7    8    9
0    62   16    6   66   65    6   25  112    7    7
1          5    3   13   20    6    7   27   10    3
2               0    3    4    1    4    9    3    1
3                   19   35    3   11   64    6    8
4                        20    8    9   43    9   11
5                             1    1    7    3    1
6                                  1   14    2    3
7                                      41    9    6
8                                            2    3
9                                                 1

Divide by the total number of quarters of football played (832), and you get percentages:

       0     1     2     3     4     5     6     7     8     9
0   7.5%  1.9%  0.7%  7.9%  7.8%  0.7%  3.0% 13.5%  0.8%  0.8%
1         0.6%  0.4%  1.6%  2.4%  0.7%  0.8%  3.2%  1.2%  0.4%
2               0.0%  0.4%  0.5%  0.1%  0.5%  1.1%  0.4%  0.1%
3                     2.3%  4.2%  0.4%  1.3%  7.7%  0.7%  1.0%
4                           2.4%  1.0%  1.1%  5.2%  1.1%  1.3%
5                                 0.1%  0.1%  0.8%  0.4%  0.1%
6                                       0.1%  1.7%  0.2%  0.4%
7                                             4.9%  1.1%  0.7%
8                                                   0.2%  0.4%
9                                                         0.1%
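
If you want to rebuild these tables yourself, the bookkeeping looks roughly like this (the scores matrix here is a stand-in; the real one has 832 rows pulled from the Score Index):

    % Tally last-digit pairs at the end of each quarter
    scores = [ 7 0; 14 3; 21 10; 28 13 ];   % cumulative [points points] per quarter; placeholder rows
    d = mod(scores, 10);                    % keep only the last digit of each score
    tally = zeros(10, 10);
    for k = 1:size(d, 1)
        lo = min(d(k, :)) + 1;              % fold (a,b) and (b,a) into one cell...
        hi = max(d(k, :)) + 1;              % ...shifting by 1 for MATLAB indexing
        tally(lo, hi) = tally(lo, hi) + 1;
    end
    pct = 100 * tally / size(d, 1)          % counts to percentages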

So, from the looks of things, (7, 0) is the best combination to get, right?

Not so fast. These numbers include ALL scores, both X0-Y7 and X7-Y0. In typical versions of this game, you only get one of the two squares. How can you determine which of those are more likely to occur? You can't really use home/away, since the playoffs include the neutral-site Super Bowl, and the home/away designation isn't meaningful for those games*. So let's just split it down the middle: if you have (7, 0) and someone else has (0, 7), 50% of the time you'll be on the right side of the pairing and 50% of the time you'll be on the wrong side.

* - Unless you really, REALLY like coin flips.

If you have (0, 0) -- or any of the other pairs along the diagonal -- you're in luck; there's no one else to split the odds with. That means that, overall, your best bet is that (0, 0) square, and THEN one of the (0, 7) or (7, 0) squares. The top five, by percentage:

Square Pct
0,0 7.5%
0,7 6.7%
7,7 4.9%
0,3 4.0%
0,4 3.9%

The bottom five, of course, remains the same, with (2, 2) as the kiss of death with zero occurrences.

Good luck, everyone. I for one will be rooting hard for a 42-12 final.

Monday, January 21, 2013

Super Bowl Hype Drive: Naming Wrongs

I'm only going to write this once: it's the Super Bowl, not the Superbowl. It is a bowl that is super. The name was ripped off from the Super Ball, a popular 60s children's toy (which seems ironic given how touchy the NFL is about Super Bowl-related copyright infringement). You'd expect this to be related to the rise of Twitter hashtags, but the pandemic predates the Fail Whale (which, incidentally, is never called the "Failwhale").

I don't get it. No one talks about the "Rosebowl" or the "Worldseries" or the "USOpen" or "Daytona500". Last year's game didn't feature the "Newengland Patriots" and "Newyork Giants". This year's game isn't in "Neworleans". The winner doesn't get the "Vincelombarditrophy".

So please, guys, it's two words, both capitalized. From now on, let's only use "Superbowl" if we're referring to Clark Kent's dishware (shipped with him from Krypton, of course, and able to serve heavy gravies in a single bound), or maybe a really excellent bird of prey.

...a superb owl? Anyone? No?

All right, fine.

Wednesday, January 16, 2013

How Much Is a Win Worth to an NBA Team?

Last month, I used J.C. Bradbury's free agent valuation method to determine how many wins the Red Sox expected Mike Napoli and Shane Victorino to contribute to the team in 2013. That worked fine, but suppose we want to build a similar model for the NBA. Again, we'll use the basic system Bradbury outlines in "The Baseball Economist" (ch. 13). Here, Bradbury found a relationship between revenue, wins, and the size of the city a franchise plays in.

All three of those variables are readily available. For city size, we'll use the population of the metropolitan statistical area (MSA) each team plays its home games in, as reported in the 2010 U.S. Census*. Revenue is available through Forbes' Business of Basketball listings. This data is almost exactly one year old -- suggesting that it covers the 2010-2011 season, and not the recent lockout-shortened 2011-2012 season. This is better for our purposes; I don't want the compressed schedules and reduced number of games to interfere with my results.

* - And the Canadian equivalent for Toronto, with the hope that the two have very similar methodologies.
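
The fit itself is just a multiple regression of revenue on wins and population. A minimal sketch with placeholder numbers (not the real Forbes or Census figures):

    % Bradbury-style valuation: regress revenue on wins and market size
    revenue = [226; 169; 139; 153];       % team revenue, $M (invented values)
    wins    = [ 62;  52;  44;  35];       % 2010-11 regular-season wins
    pop     = [19.0; 12.8; 9.5;  5.9];    % MSA population, millions
    X = [ones(size(wins)) wins pop];      % design matrix with an intercept
    b = X \ revenue;                      % least-squares coefficients
    marginalWin = b(2)                    % estimated revenue per extra win, $M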

Tuesday, January 8, 2013

Bad Beats: Sagarin's Predictor Rankings and the Bowl Games

I promised more updates, but that obviously hasn't happened. There are three reasons behind this: first, the million distractions of the holidays got in the way, as tends to happen; second, I've been working on my Evolution of Sport proposal for the upcoming Sloan Sports Analytics Conference, and third, it's harder than you'd think to type with these broken thumbs.

Maybe I should explain.

A coworker of mine is in a weekly football pool: for each NFL game, guess which team will cover the spread to win fame and fortune, etc. The pool keeps going into the playoffs, but since there aren't that many playoff games, he's also required to pick any six college bowl games against the spread. Last year, he told me, he used some good guessing and coin flipping to go 3-3. This year, he wanted to do better.