Wednesday, January 30, 2013

Super Bowl Hype Drive: Prediction Aggregation

Back in grade school, I used to poll my friends the week before the Super Bowl, asking who they thought would win. (Okay, fine: In order to get a better sample size, I asked the people who weren't my friends, too.) And they performed well considering the sample: off the top of my head, they correctly called two of the three games I polled them on.

You may remember that, last March, I collected a few dozen brackets and graded them during the NCAA basketball tournament. Most of the brackets predicted that Kentucky would eventually win the championship, which they did. How accurate would a crowdsourced Super Bowl prediction be?

Tuesday, January 29, 2013

Super Bowl Hype Drive: New Orleans' Bad Mojo

This year marks the Super Bowl's return to New Orleans for the first time since 2002, when the Patriots upset the Rams on a last-second Adam Vinatieri field goal. But the Superdome has a reputation for hosting blowouts, even given the fact that many Super Bowls are one-sided affairs. Is this reputation deserved?

Thursday, January 24, 2013

Super Bowl Hype Drive: Room for Squares

The run-up to the Super Bowl leads to a lot of great traditions -- parades! weird bloggers at media day! the disfiguration of millions of chickens! -- but high among these is the office squares pool. You've probably seen them; they look something like this:


You sign your name to a square, and once all the squares are filled, someone picks row and column values. Say you end up with the square (3, 1). That means that, at the end of any of the four quarters*, you win money if the score is X3-Y1 -- that is, if team X's score ends in 3 and team Y's ends in 1, as in 21-13, 31-3, 41-33, etc.

* - Actually, the fourth quarter one usually includes overtime periods too, so if you had Ravens 8, Broncos 5 as a square in the AFC divisional playoff game, you would've won once Justin Tucker made that kick in the second overtime.
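If you'd rather let a computer check your square, here is a minimal Python sketch of the rule described above. The function names `last_digits` and `square_wins` are my own, and I'm assuming the first number in a square belongs to the first team you pass in and the second number to the other team.

```python
def last_digits(score_x, score_y):
    """Return the final digit of each team's score as a (x, y) pair."""
    return (score_x % 10, score_y % 10)


def square_wins(square, score_x, score_y):
    """True if the square matches the final digits of both scores.

    Assumption: square[0] is team X's digit and square[1] is team Y's.
    """
    return last_digits(score_x, score_y) == square


# The footnote's example: Ravens 38, Broncos 35 lands on the (8, 5) square.
print(last_digits(38, 35))
# A (3, 1) square pays out when team X has 13 and team Y has 21.
print(square_wins((3, 1), 13, 21))
```

You would run the check once per quarter, using the cumulative score at the end of that quarter (and the final score, overtime included, for the fourth).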

Obviously, the odds of every combination aren't equal, and you know intuitively that multiples of 7 or 3 are more likely to come up. Wouldn't it be nice to know what your odds of winning were compared to someone who drew, say, 7-7?

To determine the odds, I used Pro Football Reference's Score Index to find the score by quarters of all playoff games dating back to the 1994-95 season, when the NFL added the two-point conversion. This gives you a 19-season sample that (with this season included) encompasses 208 total games. Here is the total number of times each combination of final digits has occurred at the end of a quarter.

      0    1    2    3    4    5    6    7    8    9
0    62   16    6   66   65    6   25  112    7    7
1     -    5    3   13   20    6    7   27   10    3
2     -    -    0    3    4    1    4    9    3    1
3     -    -    -   19   35    3   11   64    6    8
4     -    -    -    -   20    8    9   43    9   11
5     -    -    -    -    -    1    1    7    3    1
6     -    -    -    -    -    -    1   14    2    3
7     -    -    -    -    -    -    -   41    9    6
8     -    -    -    -    -    -    -    -    2    3
9     -    -    -    -    -    -    -    -    -    1

Divide by the total number of quarters of football played (832), and you get percentages:

        0      1      2      3      4      5      6      7      8      9
0    7.5%   1.9%   0.7%   7.9%   7.8%   0.7%   3.0%  13.5%   0.8%   0.8%
1       -   0.6%   0.4%   1.6%   2.4%   0.7%   0.8%   3.2%   1.2%   0.4%
2       -      -   0.0%   0.4%   0.5%   0.1%   0.5%   1.1%   0.4%   0.1%
3       -      -      -   2.3%   4.2%   0.4%   1.3%   7.7%   0.7%   1.0%
4       -      -      -      -   2.4%   1.0%   1.1%   5.2%   1.1%   1.3%
5       -      -      -      -      -   0.1%   0.1%   0.8%   0.4%   0.1%
6       -      -      -      -      -      -   0.1%   1.7%   0.2%   0.4%
7       -      -      -      -      -      -      -   4.9%   1.1%   0.7%
8       -      -      -      -      -      -      -      -   0.2%   0.4%
9       -      -      -      -      -      -      -      -      -   0.1%
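For anyone who wants to check the arithmetic, here is a rough Python sketch of the divide-by-832 step. The counts are copied from the first table; the names `ROWS`, `counts`, and `pct` are mine, and the layout (one list per row, starting at the diagonal) is just one convenient way to store a triangular table.

```python
# Upper-triangular counts copied from the table of occurrences:
# ROWS[i][k] is the count for the unordered digit pair (i, i + k).
ROWS = [
    [62, 16, 6, 66, 65, 6, 25, 112, 7, 7],
    [5, 3, 13, 20, 6, 7, 27, 10, 3],
    [0, 3, 4, 1, 4, 9, 3, 1],
    [19, 35, 3, 11, 64, 6, 8],
    [20, 8, 9, 43, 9, 11],
    [1, 1, 7, 3, 1],
    [1, 14, 2, 3],
    [41, 9, 6],
    [2, 3],
    [1],
]

# Flatten into a dict keyed by (low digit, high digit).
counts = {
    (i, i + k): n
    for i, row in enumerate(ROWS)
    for k, n in enumerate(row)
}

# Every quarter contributes exactly one pair, so the counts sum to the
# number of quarters played (208 games x 4 quarters).
total = sum(counts.values())

# Percentage of quarters ending on each unordered pair.
pct = {pair: 100 * n / total for pair, n in counts.items()}

print(total)
print(round(pct[(0, 7)], 1))
```

The sum of the counts is a handy sanity check on the data entry: it should come out to the same 832 used as the divisor.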

So, from the looks of things, (7, 0) is the best combination to get, right?

Not so fast. These numbers include ALL scores, both X0-Y7 and X7-Y0. In typical versions of this game, you only get one of the two squares. How can you determine which of those is more likely to occur? You can't really use home/away, since the playoffs include the neutral-site Super Bowl, and the home/away designation isn't meaningful for those games*. So let's just split it down the middle: if you have (7, 0) and someone else has (0, 7), 50% of the time you'll be on the right side of the pairing and 50% of the time you'll be on the wrong side.

* - Unless you really, REALLY like coin flips.

If you have (0, 0) -- or any of the other pairs along the diagonal -- you're in luck; there's no one else to split the odds with. That means that, overall, your best bet is that (0, 0) square, and THEN one of the (0, 7) or (7, 0) squares. The top five, by percentage:

Square Pct
0,0 7.5%
0,7 6.7%
7,7 4.9%
0,3 4.0%
0,4 3.9%
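The split-it-down-the-middle step can be sketched in a few lines of Python. To keep it short, I've copied only a handful of the most common pairs from the counts table rather than all 55; the names `PAIR_COUNTS` and `per_square_pct` are my own, and the 50/50 split is the assumption described above, not something the data proves.

```python
TOTAL_QUARTERS = 832  # 208 games x 4 quarters

# A subset of the unordered-pair counts from the table of occurrences.
PAIR_COUNTS = {
    (0, 0): 62,
    (0, 3): 66,
    (0, 4): 65,
    (0, 7): 112,
    (3, 4): 35,
    (3, 7): 64,
    (4, 7): 43,
    (7, 7): 41,
}


def per_square_pct(pair, count, total=TOTAL_QUARTERS):
    """Chance (in percent) that a single square hits in a given quarter.

    Diagonal pairs like (0, 0) map to one square, so they keep the full
    count. Off-diagonal pairs like (0, 7) are shared by two squares,
    (0, 7) and (7, 0), so each square is assumed to get half.
    """
    a, b = pair
    share = count if a == b else count / 2
    return 100 * share / total


# Rank the pairs by per-square odds, best first.
ranked = sorted(
    ((per_square_pct(pair, n), pair) for pair, n in PAIR_COUNTS.items()),
    reverse=True,
)
top_five = [(pair, round(p, 1)) for p, pair in ranked[:5]]

for pair, p in top_five:
    print(pair, f"{p}%")
```

With these counts, (3, 7) just misses the cut in sixth place, a hair behind (0, 4).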

The bottom five, of course, remains the same, with (2, 2) as the kiss of death with zero occurrences.

Good luck, everyone. I for one will be rooting hard for a 42-12 final.

Monday, January 21, 2013

Super Bowl Hype Drive: Naming Wrongs

I'm only going to write this once: it's the Super Bowl, not the Superbowl. It is a bowl that is super. The name was ripped off from the Super Ball, a popular 60s children's toy (which seems ironic given how touchy the NFL is about Super Bowl-related copyright infringement). You'd expect this to be related to the rise of Twitter hashtags, but the pandemic predates the Fail Whale (which, incidentally, is never called the "Failwhale").

I don't get it. No one talks about the "Rosebowl" or the "Worldseries" or the "USOpen" or "Daytona500". Last year's game didn't feature the "Newengland Patriots" and "Newyork Giants". This year's game isn't in "Neworleans". The winner doesn't get the "Vincelombarditrophy".

So please, guys, it's two words, both capitalized. From now on, let's only use "Superbowl" if we're referring to Clark Kent's dishware (shipped with him from Krypton, of course, and able to serve heavy gravies in a single bound), or maybe a really excellent bird of prey.

...a superb owl? Anyone? No?

All right, fine.

Wednesday, January 16, 2013

How Much Is a Win Worth to an NBA Team?

Last month, I used J.C. Bradbury's free agent valuation method to determine how many wins the Red Sox expected Mike Napoli and Shane Victorino to contribute to the team in 2013. That worked fine, but suppose we want to build a similar model for the NBA. Again, we'll use the basic system Bradbury outlines in "The Baseball Economist" (ch. 13). Here, Bradbury found a relationship between revenue, wins, and the size of the city a franchise plays in.

All three of those variables are readily available. For city size, we'll use the population of the metropolitan statistical area (MSA) each team plays its home games in, as reported in the 2010 U.S. Census*. Revenue is available through Forbes' Business of Basketball listings. This data is almost exactly one year old -- suggesting that it covers the 2010-2011 season, and not the recent lockout-shortened 2011-2012 season. This is better for our purposes; I don't want the compressed schedules and reduced number of games to interfere with my results.

* - And the Canadian equivalent for Toronto, with the hope that the two have very similar methodologies.

Tuesday, January 8, 2013

Bad Beats: Sagarin's Predictor Rankings and the Bowl Games

I promised more updates, but that obviously hasn't happened. There are three reasons behind this: first, the million distractions of the holidays got in the way, as tends to happen; second, I've been working on my Evolution of Sport proposal for the upcoming Sloan Sports Analytics Conference; and third, it's harder than you'd think to type with these broken thumbs.

Maybe I should explain.

A coworker of mine is in a weekly football pool: for each NFL game, guess which team will cover the spread to win fame and fortune, etc. The pool keeps going into the playoffs, but since there aren't that many playoff games, he's also required to pick any six college bowl games against the spread. Last year, he told me, he used some good guessing and coin flipping to go 3-3. This year, he wanted to do better.