Dear Plus EV #3: Notes on data sets, power ratings, and free bets

Welcome to another instalment of Dear Plus EV! I’ll be answering questions I get from the community on the analytics and mathematics of betting. Got a question that you would like to have answered? DM me on Twitter @PlusEVAnalytics, or email me at [email protected]. Some questions may be lightly edited for clarity. Let’s get right to it!

Can you recommend a quick tool to estimate the probability distribution of a data set? I used to use something called EasyFit/StatAssist (which was recommended in Andrew Mack's books) but it doesn't seem to exist online anymore. Do any similar apps/sites exist or do I need to learn how to do this the old fashioned way? - Twitter @shrew123

When you're building a model that uses a probability distribution, there are hundreds of them to choose from...so I understand the desire for a quick way to help you pick one. A quick Google search on "distribution fitting tools" gives a bunch of results that look promising, but I have never used any of them myself. I've found that you can do most things in sports analytics with a very limited menu of distributions:

  • Binomial for counts of events where there is a finite upper bound (i.e. the number of times something happens where there is a fixed number of opportunities for that thing to happen), like # of wins in a season
  • Poisson for counts of events where there is no limit on the number of times it could happen, like # of goals scored in a game
  • Beta-Binomial and Negative Binomial for overdispersed versions of Binomial and Poisson respectively
  • Normal for quantities that are close to symmetrical, meaning that mean = median, like quarterback passing yards in a game
  • Gamma for quantities that are "right-skewed", meaning that mean > median, like running back rushing yards in a game
  • Beta for quantities that live in the space between 0 and 1 (probabilities, proportions, etc), like a team's theoretical winning percentage against a league-average opponent
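For the Poisson vs. Negative Binomial choice in the list above, a quick sanity check is to compare the sample variance to the sample mean, since a Poisson has variance equal to its mean. Here's a minimal sketch of that check. The function name, the 25% tolerance and both data sets are made up for illustration, and a real analysis would want far more than a dozen observations.

```python
from statistics import mean, variance


def suggest_count_distribution(counts, tolerance=0.25):
    """Suggest Poisson or Negative Binomial from the variance-to-mean ratio.

    The tolerance is an arbitrary illustrative cutoff, not a formal test.
    """
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    ratio = v / m         # close to 1 for Poisson-like data
    if ratio > 1 + tolerance:
        return "Negative Binomial", ratio
    return "Poisson", ratio


# Hypothetical goals-per-game samples, invented for this example
steady = [1, 3, 2, 5, 0, 3, 4, 2, 6, 1, 3, 2]
streaky = [0, 0, 9, 1, 0, 7, 0, 2, 0, 8, 1, 0]

for name, data in [("steady", steady), ("streaky", streaky)]:
    label, ratio = suggest_count_distribution(data)
    print(f"{name}: variance/mean = {ratio:.2f} -> {label}")
```

The same idea carries over to Binomial vs. Beta-Binomial, with the Binomial variance n*p*(1-p) as the benchmark instead of the mean.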

Generally, my plan A is to see if one of the choices above is a good match to the thing I'm trying to model. Plan B is to see if there's a simple transformation I can apply to the thing I'm trying to model to get it to fit one of the choices above. The most common transformations for this purpose are:

  • log(x) or its inverse y = exp(x), called a "log transform", useful to reduce/remove right-skew and transform positive numbers into positive and negative numbers
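  • log(x/(1-x)) or its inverse y=exp(x) / (1+exp(x)), called a "logit transform", useful to transform numbers between 0 and 1 into positive and negative numbers (the inverse maps any positive or negative number back into the range between 0 and 1)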
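Both transforms and their inverses are one-liners. A small sketch using only the standard library, with function names chosen just for this example:

```python
import math


def log_transform(x):
    """Map a positive number onto the whole real line."""
    return math.log(x)


def inverse_log(y):
    """Undo log_transform: map any real number back to a positive number."""
    return math.exp(y)


def logit(p):
    """Map a number strictly between 0 and 1 onto the whole real line."""
    return math.log(p / (1 - p))


def inverse_logit(y):
    """Undo logit: map any real number back into the range between 0 and 1."""
    return math.exp(y) / (1 + math.exp(y))


print(logit(0.5))                  # a 50% probability sits at zero on the logit scale
print(inverse_logit(logit(0.75)))  # round trip returns (approximately) 0.75
```

The usual workflow is to transform, fit one of the toolbox distributions on the transformed scale, then apply the inverse to get back to the original units.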

You'll notice that I've talked about symmetrical distributions and right-skewed distributions but what about left-skewed distributions? These would have mean < median, and I've done a lot of searching and found nothing great to deal with those, so I did what any good statistician does and made something up. I can't talk about my specific use case for left-skewed distributions because I'm bound by a confidentiality agreement, but to illustrate the method let's talk about a left-skewed distribution that's close to my actuarial heart: lifespans. According to current mortality tables, the median lifespan at birth for an American male is 79.5 years but the mean is only 75.7. To visualize the asymmetry, you can start from the median of 79.5 and move an equal distance (let's say 50 years) in each direction: dying at the age of 29.5, while not common, is tragically possible while living to 129.5 is not. 

So here's my own personal hack for left-skewed distributions to get them to fit into one of the boxes above...raise the value to the power of K, where K is selected to force the transformed mean to equal the transformed median. In mathematical terms, select K such that (E[X^K])^(1/K) = median(X). You can use a solver or just plain trial and error to see that for our lifespan data, K = 3.4 solves that equation. So, while the raw lifespan data is left-skewed and does not fit any of the distributions in our toolbox, (lifespan ^ 3.4) is somewhat symmetrical and could possibly, depending on the application, be approximated using a normal distribution.
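If trial and error isn't your thing, the equation for K can be handed to a simple bisection search. The sketch below is one way to do it, not necessarily how the K above was found. The `ages` list is an invented left-skewed sample, not real mortality data, and the search range of 1 to 50 is an assumption.

```python
from statistics import median


def power_mean(values, k):
    """(E[X^k])^(1/k), computed on median-scaled values to keep numbers small."""
    m = median(values)
    scaled = [(v / m) ** k for v in values]
    return m * (sum(scaled) / len(scaled)) ** (1 / k)


def solve_k(values, lo=1.0, hi=50.0, tol=1e-9):
    """Bisection search for K such that power_mean(values, K) equals the median.

    Relies on the power mean increasing with K for positive data.
    """
    target = median(values)
    if not (power_mean(values, lo) < target < power_mean(values, hi)):
        raise ValueError("no K in [lo, hi]; is the data left-skewed and positive?")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if power_mean(values, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical left-skewed sample (mean below median), for illustration only
ages = [35, 60, 68, 72, 75, 78, 80, 82, 84, 86, 88, 90, 93]
k = solve_k(ages)
print(f"K = {k:.2f}, power mean = {power_mean(ages, k):.2f}, median = {median(ages)}")
```

Once K is found, the values raised to the power of K are what you would try to fit with a symmetrical distribution.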

When originating a football power rating, what's the best way to put the model on a points basis so lines can be made between the two teams? Obviously the goal is to turn stats like yards per play or success rate into points, which can be done with regression. Is it that simple? - Twitter @BuckeyeSix 

Is it that simple? Yes...and no. Yes, my go-to method for converting one or more statistics into a game prediction would be some kind of regression. But, your basic-bitch linear regression model contains a built-in assumption that everything works in a linear way - which it doesn't necessarily in the real world. For example, a linear regression on NFL margins of victory would ignore the impact of "key numbers" and would predict games decided by 1, 2, 5 or (gulp) zero points with a higher frequency than what we actually see. A better model might be a logistic regression on wins and losses, the results of which can then be converted to point spreads using any of the publicly available money line to spread conversion calculators that are out there. 
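The step between a fitted logistic regression and one of those conversion calculators is turning a win probability into a money line. Here's a rough sketch of that step. The intercept and slope are hypothetical placeholders rather than fitted values, and the odds it produces are fair (no-vig) prices.

```python
import math


def win_probability(rating_diff, intercept, slope):
    """Logistic regression prediction for a given rating difference."""
    return 1 / (1 + math.exp(-(intercept + slope * rating_diff)))


def prob_to_american(p):
    """Fair (no-vig) American odds for a win probability strictly between 0 and 1."""
    if p >= 0.5:
        return -100 * p / (1 - p)
    return 100 * (1 - p) / p


def american_to_prob(odds):
    """Implied win probability of American odds, ignoring the vig."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


# Hypothetical coefficients, for illustration only
p = win_probability(rating_diff=3.0, intercept=0.0, slope=0.15)
print(f"win probability {p:.3f} -> fair money line {prob_to_american(p):.0f}")
```

The resulting money line is what you would feed into a money line to spread calculator.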

You also might get nonlinearity in your predictor variables. Is the difference between 4 and 5 yards per play equal (in terms of its impact on your game prediction) to the difference between 8 and 9 yards per play? Maybe yes, maybe no...but make sure you look at the data and make conscious choices for these things in your model instead of accepting the default assumption that everything is linear. Consider using something like a linear spline to build a model that better fits your data.
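To make the linear spline idea concrete: you add "hinge" terms that switch on above chosen knots, so the slope is allowed to change at each knot while the line stays connected. In the sketch below the knots (5 and 7 yards per play), the intercept and the coefficients are all invented for illustration. In practice the coefficients would come out of your regression.

```python
def hinge(x, knot):
    """Zero below the knot, then rises one-for-one above it."""
    return max(0.0, x - knot)


def spline_features(ypp, knots=(5.0, 7.0)):
    """Regression inputs for a linear spline in yards per play."""
    return [ypp] + [hinge(ypp, k) for k in knots]


def predict_points(ypp, intercept, coefs, knots=(5.0, 7.0)):
    """Piecewise-linear prediction: intercept plus coefficient * feature."""
    features = spline_features(ypp, knots)
    return intercept + sum(c * f for c, f in zip(coefs, features))


# Hypothetical coefficients: slope 6 below 5 ypp, 6 - 2 = 4 between 5 and 7,
# and 6 - 2 - 3 = 1 above 7
intercept, coefs = -10.0, [6.0, -2.0, -3.0]

gain_4_to_5 = predict_points(5, intercept, coefs) - predict_points(4, intercept, coefs)
gain_8_to_9 = predict_points(9, intercept, coefs) - predict_points(8, intercept, coefs)
print(gain_4_to_5, gain_8_to_9)  # the same one-yard improvement is worth less at the top end
```

The hinge columns go into an ordinary linear or logistic regression alongside everything else, so no special software is needed.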

What is the optimal strategy for two books that have up to $1k back in free play if your first bet loses? - Twitter @AgerWho 

Ah, the good old "risk-free bet" conundrum. Ok, ready to watch me dance around without giving a clear answer? Great, let's do it.

The reason I can't answer this definitively is that there is no universal definition for "optimal" in the context of this problem, compared to a question like: "what's the optimal strategy for when you should hit or stand in blackjack?" In blackjack, there is a clear objective and that's to win the hand. You make your hit/stand decisions in a way that gives you the maximum possible likelihood of winning the hand, and it's a pretty straightforward mathematical exercise to figure out what those decisions should be. It's called "basic strategy" and it's been a completely solved problem since the 1950s. It's simple because at the point where you have to make the decision, there are only two possible outcomes: you can win 1 unit or you can lose 1 unit.

With the free bet question, you're operating in a much broader and more complex space of possibilities. Do you want to try to win big? Are you happy to win small as long as you win period? What is your level of tolerance for risk? These things are individual preferences that would depend on your baseline level of wealth as well as your psychological / cultural attitude towards risk. So I can give you the extreme answers at each end of the spectrum and I can tell you what I personally would do, but "optimal" in this circumstance is in the eye of the beholder.

Let's start by defining the problem. As I discussed on 90 Degrees, I have a serious beef with this type of promo being marketed as a "risk free bet". To me, risk free means if you bet $1000 and you lose, you get refunded $1000 cash. But you don't - you get refunded $1000 in bet credits. But bet credits are almost as good as cash, right? They are absolutely not. Why? Let's look at what happens when you use them. If you place a $1000 cash bet on the Raiders money line at +120 odds and it wins, you get $2200 in cash - your $1000 wager back plus $1200 in winnings. If you place $1000 in bet credits on the Raiders money line at +120 odds and it wins, you get $1200 in cash - $1200 in winnings only. You don't get your $1000 wager back. You don't even get the bet credits back so you can use them again. After you use them once, they're gone. That sucks, and it sucks more in proportion to how big of a favorite you take with your credits. If you put them on the Chiefs money line at -500 and it wins, you get a lousy $200 AND you lose the $1000 in credits.
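The cash vs. credit difference is easy to put into code. A short sketch that reproduces the Raiders and Chiefs numbers above (the function names are just for this example):

```python
def winnings(stake, american_odds):
    """Profit on a winning bet at American odds, excluding the stake."""
    if american_odds > 0:
        return stake * american_odds / 100
    return stake * 100 / -american_odds


def cash_bet_return(stake, american_odds):
    """A winning cash bet returns the stake plus the winnings."""
    return stake + winnings(stake, american_odds)


def credit_bet_return(credit, american_odds):
    """A winning bet-credit wager returns the winnings only; the credit is gone."""
    return winnings(credit, american_odds)


print(cash_bet_return(1000, 120))     # Raiders +120 with cash
print(credit_bet_return(1000, 120))   # Raiders +120 with credits
print(credit_bet_return(1000, -500))  # Chiefs -500 with credits
```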

 


All of this leads to two principles of strategy that truly illustrate how perverse the "risk free bet" name is:

  • The promo gives you something for free, but only if your initial bet loses. If the initial bet wins, you get nothing extra. So, you maximize the EV of the promo by taking a big longshot on your initial bet, giving yourself the greatest likelihood of scoring the freebie.
  • If and when you lose your initial bet and get the bet credits, you maximize the EV by using them on, once again, a big longshot. Because you don't get back the amount of the bet, you want to shift the split of your payout between "amount of the bet" and "winnings" as much towards winnings as possible.

So, the EV-maximizing strategy for a risk-free bet is to take the biggest longshot you can find, then if you lose, to use the bet credits on the biggest longshot you can find. So, pretty much the exact opposite of "risk free". Is that the “optimal” strategy? Sure, if you’re rich enough that you can tolerate the likely loss of $1,000.
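To put rough numbers on the longshot argument, here's a minimal sketch under one big simplifying assumption: every price is fair (no vig), so a plain cash bet has an EV of exactly zero and all of the promo's value comes from the credits. Real prices include vig, which is typically larger on longshots, so treat the output as an upper bound rather than a forecast. The function names are illustrative.

```python
def credit_value(credit, p_win):
    """EV of a bet credit at fair odds.

    Fair winnings are credit * (1 - p) / p and arrive with probability p,
    so the expected cash is credit * (1 - p).
    """
    return credit * (1 - p_win)


def promo_ev(stake, p_first, p_second):
    """EV of the whole promo at fair odds.

    The first (cash) bet is worth zero on its own; the credit is only
    received when it loses, which happens with probability 1 - p_first.
    """
    return (1 - p_first) * credit_value(stake, p_second)


for p in (0.8, 0.5, 0.2, 0.1):
    print(f"win probability {p:.0%} on both bets -> promo EV ${promo_ev(1000, p, p):.0f}")
```

The EV climbs as both win probabilities shrink, which is exactly why the EV-maximizing play is a longshot followed by another longshot.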

Now let’s look at the other end of the spectrum. What if you want to truly do this risk-free, maximizing your guaranteed payout amount? If you have enough spare cash lying around and you can find a market with high enough limits, you can always take the longshots as I suggested above and then hedge the other side with a cash bet. You want to place your hedge at a different book, both because some books will void the promo if you bet both sides and because you’ll likely be able to find a better price on your hedge at a different book. 

If you don’t have hedge money and you want to play this in the absolute wimpiest possible way, you’d take the two books offering the promo and bet opposite sides of the same game at each book for $1000. Assuming -110 lines, that would leave you with $1909 cash at one book and a $1000 credit at the other book. You then take opposite sides of another game, one using the $1000 credit and the other using $476 cash (thank you, algebra!) so that no matter who wins, the second game pays you $909 cash: either $909 in winnings from the credit, or your $476 back plus $433 in winnings from the cash bet. So you end up with $1909 - $476 + $909 = $2342 cash, from your starting point of $2000. And that, my friends, is how to turn a “free $2000” with strings attached into a free $342.
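One way to check the arithmetic of the two-book play is to solve for the cash hedge that leaves you with the same final bankroll whichever side of the second game wins. The sketch below assumes -110 on every side and that a winning credit pays winnings only. The function names are made up for this example.

```python
def decimal_odds(american):
    """Total return per $1 staked (stake included) at American odds."""
    if american > 0:
        return 1 + american / 100
    return 1 + 100 / -american


def balanced_hedge(credit, credit_odds, hedge_odds):
    """Cash hedge whose total return equals the credit's winnings.

    Credit side wins: you collect credit * (decimal - 1) and lose the hedge.
    Hedge side wins: you collect hedge * decimal and the credit is gone.
    Setting the two collections equal gives the hedge size.
    """
    credit_winnings = credit * (decimal_odds(credit_odds) - 1)
    return credit_winnings / decimal_odds(hedge_odds)


stake = 1000
# Game 1: opposite sides at two books; one bet wins, the other becomes a credit
cash_after_game_1 = stake * decimal_odds(-110)

# Game 2: credit on one side, cash hedge on the other
hedge = balanced_hedge(stake, -110, -110)
final_if_credit_wins = cash_after_game_1 - hedge + stake * (decimal_odds(-110) - 1)
final_if_hedge_wins = cash_after_game_1 - hedge + hedge * decimal_odds(-110)

print(f"hedge ${hedge:.2f}")
print(f"final cash ${final_if_credit_wins:.2f} or ${final_if_hedge_wins:.2f}")
print(f"guaranteed profit ${final_if_hedge_wins - 2 * stake:.2f}")
```

Swapping in longer odds for the credit side and the matching price for the hedge shows how much more of the credit can be locked in when the credit goes on an underdog.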