Tuesday, March 18, 2008

Using Expected Values in an NCAA Bracket with Upset Points

Part 1

Our tournament bracket gives 1 point for every first round pick, 2 points for every second round pick, 3 points for third round, etc. But they also have "upset scoring." If you pick an underdog to win and they do, you get the normal point(s), plus bonus points worth the difference of the two seeds. For example, if you pick a 14 over a 3 in the first round and are right, you get (1 + (14 - 3)) = 12 points.
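The scoring rule is easy to get slightly wrong by hand, so here is a minimal sketch of it in Python. The function name `pick_points` and its parameters are my own, not anything from the pool's rules sheet, and it assumes the bonus only applies when the team you picked is the higher-numbered seed.

```python
def pick_points(round_number, picked_seed, opponent_seed):
    """Points for a correct pick under the pool rules described above."""
    # Base points: 1 in the first round, 2 in the second, and so on.
    base = round_number
    # Upset bonus: seed difference, only when the picked team is the underdog
    # (the larger seed number). Favorites get no bonus.
    bonus = max(picked_seed - opponent_seed, 0)
    return base + bonus


print(pick_points(1, 14, 3))  # 12
print(pick_points(1, 3, 14))  # 1
```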

I found a page on ESPN.com that lists the historical results for each tournament pairing for each round, ever since the tournament went to 64/65 teams. For example, of the 84 3-14 matchups, the 3 seed has won 70 (83.3%) and the 14 seed has won 14 (16.7%).

I used this to calculate the Expected Value of each seed and matchup. For example, the first round EV for the 3 seed is (1 point * 83.3% probability) = 0.833. The first round EV for the 14 seed is (12 * 16.7%) ~ 2.
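The same arithmetic as a short Python sketch, using the 3-14 numbers quoted above. The helper name `expected_value` is just an illustration.

```python
def expected_value(points_if_right, wins, games):
    """EV of a pick: points earned if correct times the historical win rate."""
    return points_if_right * wins / games


# First round, 3 seed vs 14 seed: 84 matchups, 70 wins for the 3, 14 for the 14.
ev_3 = expected_value(1, 70, 84)    # 1 point, no upset bonus
ev_14 = expected_value(12, 14, 84)  # 1 point + (14 - 3) bonus

print(round(ev_3, 3), round(ev_14, 3))  # 0.833 2.0
```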

Two questions:
1. How should I calculate the probabilities for the later rounds? For example, 3 seeds made it to the second round 70 times and won 39 times. But if they had won all their first round games, they would have made it to the second round 84 times. Should I use the probability 39/70 or 39/84 to calculate the EV?

2. How do I pick which seeds to advance? In my example above, the short-term solution is to pick the 14 over the 3 seed because 2 > 0.83. But looking down the line, no 14 seeds have made it past the Sweet 16, while several 3 seeds have won the tournament. How do I determine the best "long term" solution?

I have a spreadsheet with all my data if someone would like me to email it to them.

Part 2 (Answering My Own Questions)

1. I took everything out of 84. This makes the numbers look much more sensible. For example, an 8 seed has made the championship once, and they won that championship (Villanova in 1985). (1/1)*13 = 13 is way too high an EV. But (1/84)*13 = 0.15 looks much better.
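One way to see why 84 is the right denominator when the bracket is filled out before any games are played: a second round pick only pays off if the team also wins its first round game. A small sketch with the 3 seed numbers from my first question (variable names are mine):

```python
from fractions import Fraction

appearances = 84     # 3 seeds that started the tournament
reached_round2 = 70  # of those, how many won their first round game
won_round2 = 39      # of those, how many also won in the second round

p_reach = Fraction(reached_round2, appearances)           # 70/84
p_win_given_reach = Fraction(won_round2, reached_round2)  # 39/70
p_win_round2 = p_reach * p_win_given_reach                # chance the pick pays off

# Multiplying the two steps gives exactly the "out of 84" figure.
print(p_win_round2 == Fraction(won_round2, appearances))  # True
```

So 39/70 answers "how often does a 3 seed win once it is already in the second round," while 39/84 answers "how often does a 3 seed picked on day one win its second round game," which is the question the bracket is asking.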

2. The best thing would be to plug everything into Excel (which I already did) and fight with Solver for the next two days until the bracket is due. But I don't have Solver installed on my work machine, which is good because now I can do real work and not get fired. So I set up three baseline scenarios:
a) Picking all top seeds gives me an EV of 16 points for an entire regional.

b) I started from the Elite 8 and worked backwards. The 1 seed had the best EV for winning the Elite 8 game, so they were locked in to get to that point. The 1 and 2 had the best EV for winning the Sweet 16 games, so they're locked in.
In the round of 32, I then have to advance the 1 and 2. That leaves me to pick one team from the 3/14/6/11 set and one from the 4/13/5/12 set. For the round of 32, the 6 and 12 actually had the best EVs out of those sets.
For the round of 64, I had the 1, 2, 6, and 12 locked in. Of the remaining games, the 14, 13, 10, and 9 were better.
This method gave me a regional EV of 23 points.

c) I started from the beginning and worked forwards. Looking head-to-head, 1 vs 16, 2 vs 15, etc., here are the seeds that had better EVs than their first round opponents: 1, 2, 14, 13, 12, 11, 10, 9.
In the round of 32, I paired those seeds as in the actual tournament, and these EVs were better: 1, 10, 6, 12.
Between those pairs (in the Sweet 16), the 1 and 6 had the highest EVs.
Of course, after that the 1 seeds advanced.
This method gave me a regional EV of 24.
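For anyone who wants to try the forward method without a spreadsheet, here is a rough Python sketch of one way to read it. Everything in it is an assumption on my part: the function names, the `ev[round][seed]` table layout, and the rule that only teams picked in the previous round stay eligible. The example uses a made-up four-team field, and apart from the first round 3 and 14 seed values, the EVs are placeholders rather than my real data.

```python
def forward_greedy(ev, field):
    """Pick round by round, always taking the seed with the better EV.

    ev[r][seed] is the EV of picking `seed` to win round r (1-based).
    field lists the seeds in bracket order, so neighbors play each other.
    """
    picks = []
    round_number = 1
    while len(field) > 1:
        round_ev = ev[round_number]
        # Pair up neighbors and keep whichever has the higher EV this round.
        winners = [a if round_ev[a] >= round_ev[b] else b
                   for a, b in zip(field[0::2], field[1::2])]
        picks.append(winners)
        field = winners
        round_number += 1
    return picks


def bracket_ev(ev, picks):
    """Total EV of a set of picks: the sum of each pick's EV in its round."""
    return sum(ev[r][seed]
               for r, winners in enumerate(picks, start=1)
               for seed in winners)


# Placeholder EV table for a four-team pod (only 3 and 14 in round 1 are real).
example_ev = {
    1: {3: 0.83, 14: 2.0, 6: 0.7, 11: 1.9},
    2: {3: 1.0, 14: 0.5, 6: 1.2, 11: 0.9},
}
example_picks = forward_greedy(example_ev, [3, 14, 6, 11])
print(example_picks)  # [[14, 11], [11]]
print(bracket_ev(example_ev, example_picks))
```

Notice that in this toy example the greedy picks knock out the 6 seed, which had the best second round EV. That is the same short-term versus long-term tension from my second question.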

So now what? I'm going to try this method on my work bracket this year, but just as a trial run. I have four regionals and three methods. I'm going to use one method for each regional, plus do the fourth regional (the East, Notre Dame's bracket) "by feel." I'll let you know the results.