You want to know a secret? The underlying secret to all media and entertainment? The peek behind the curtain that explains all you see in film, TV, music and more?
Here it is.
“Logarithmically distributed returns.”
Once you learn it you can’t forget it. Like how to do a magic trick, which is what I call it, my magic trick for the business of entertainment. I didn’t discover logarithmic distributions. I first read about them in Vogel’s Entertainment Industry Economics, the wonk bible of entertainment financial analysis. (Figure 4.8 in chapter 4 if you’re really curious.) I also assume it’s the theoretical underpinning of Anita Elberse’s Blockbusters, which I haven’t read. (Her book is one of those books that has been on my “to read list” for years.)
Unfortunately, I can’t just show you that a logarithmic distribution undergirds all of entertainment. As important as the “logarithm” part of the statement is, the “distribution” part is even more crucial. I don’t want to gloss over that. The value comes in not just seeing one chart, but seeing the value of distributions as a tool.
Today, I’m going to teach you about distributions. What they are and why you need them. This is a mini-statistics lesson to pair with my other mini-statistics lesson on why you can’t use data to pick TV series. I won’t use any equations, because they’re boring, but I’ll show you what the distributions look like. Then, tomorrow I’ll show you the ubiquity of the logarithmic distribution.
(As I recommended before, go pick up The Cartoon Guide to Statistics for the best reader on statistics. Learn them in a weekend. It’s way better than this very useful, but very technical Wikipedia page.)
Before we get to the “what” of distributions, let’s get to the “why”.
We live in a statistical distribution
A lot of news coverage on most issues (politics, sports, criminal justice, business) might mislead you on this point. The world seems like an either-or world. This or that. One or the other. Binary choices.
But it isn’t. It’s a distributional world. What that means is that most outcomes fall on a spectrum of possible outcomes. An election could be won by a thousand votes, a million votes or ten million votes. A team can win by fifty points, tie or lose by fifty, and everything in between. A blockbuster movie could earn a billion dollars or 100 thousand dollars or anywhere in between. A range of outcomes.
We often try to summarize our distributional life in “averages”. Let’s use an example to make it concrete. Since the NFL season just started, we’ll use that. I found the scoring margin of victories for all NFL games (2,668) going back to 2002 here. (The data set didn’t include ties.) If I calculated the mean average, I’d find that, on mean average, NFL teams won their games by 11.9 points. By median average, that number is 9. Of course, the mode, or most frequent scoring margin is 3 points, followed by 7 and 10.
Those numbers, though, aren’t very helpful. We know something about the data, but in general, we still don’t know what it looks like. Knowing what it looks like is a visual way of interpreting the data’s shape, size and characteristics. That’s where distributions come in. Here’s the above data in chart form:
A distribution, at its core, is a description of data, most frequently using a visualization to show you the percentage of outcomes. You could use tables too, but I’m a visual person. The key is that distributions come in lots of different shapes and sizes. Some fall into similar forms, but many are unique. Those shapes and sizes can have a huge impact on what the data means…impact that is lost in averages.
The Flaw of Averages
At its simplest, the flaw of averages is the old saying that a statistician drowned in a river with an average depth of 3 feet. See this cartoon from the San Jose Mercury News:
“Plans based on average conditions are wrong on average.”
Here’s an example of that in action. Say a manufacturing process has ten steps to it, and each has a 75 percent chance of staying on time. That’s pretty good, seventy-five percent. So, how often is the process delayed? Many would say, “Oh, only 25 percent of the time”. Actually, the result is that the process is almost always delayed! If any single late step delays the whole thing, the process only finishes on time when all ten steps do, and that happens less than 6% of the time. It ends up delayed 94% of the time.
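If you want to check that 94% figure yourself, here is a minimal sketch in Python. It assumes the ten steps are independent of one another and that one late step delays the whole process, which is how I read the example. The variable names are mine.

```python
# Assumption: the ten steps are independent, and one late step delays everything.
steps = 10
p_on_time = 0.75  # chance that any single step stays on time

# The whole process is on time only if every one of the steps is on time.
p_all_on_time = p_on_time ** steps
p_delayed = 1 - p_all_on_time

print(f"On time: {p_all_on_time:.1%}")
print(f"Delayed: {p_delayed:.1%}")
```

Try changing `steps` or `p_on_time` to see how quickly a chain of "pretty good" odds turns into a near-certain delay.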
Most businesses, academics and journalists rely on the “average” when it is usually phenomenally misleading. The reason is simple: it’s easy. You have a long column of data, and one Excel function returns you the median or mean average. To show the distribution, you have to set up an entire chart and make decisions about how to frame it. If you’re writing to publish on the web quickly, the average is easiest. Often, it’s the sexiest number too.
This has real world consequences. Have you ever seen a five year plan? Of course you have. A five year plan, 90% of the time, is a collection of estimates of the average performance of a firm. A CFO took the average revenue projected and subtracted the average costs projected. See where I’m going with this? Financial plans based on average conditions are wrong, on average.
If you’re reading carefully, you’ll notice I switched from an example of a data set in the first section (NFL scores) to predictions about future financial performance of firms. This is really the key learning point for distributions: Once we have a description of the real world (be it for sports or finance or entertainment or biology or anything) we can convert our “counts” of real world phenomena into “percentages”. Those percentages become probabilities when we use them to predict the future.
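To make the counts-to-percentages step concrete, here is a small sketch in Python. The margins below are made-up numbers for illustration only, not the real NFL data set, and the names are my own.

```python
from collections import Counter

# Made-up margins of victory, for illustration only (NOT the real NFL data).
margins = [3, 7, 3, 10, 14, 3, 7, 21, 1, 3]

# Step 1: count how often each outcome happened.
counts = Counter(margins)

# Step 2: divide by the total to turn counts into shares of all games.
total = len(margins)
shares = {margin: count / total for margin, count in sorted(counts.items())}

for margin, share in shares.items():
    print(f"won by {margin:>2}: {share:.0%}")
```

Read backwards, those shares describe the past. Read forwards, they are a rough estimate of the probability of each margin in a future game, assuming the future resembles the sample.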
The power of distributions is they help us predict the future, more accurately.
When I write about distributions today and tomorrow, I’ll use data set and probability examples interchangeably. Basically, if you’re describing data in the past, that’s a description of the data. If we use that to forecast the future, we’re in the realm of probabilities. Two sides of the same coin, the past and the future, split by the now.
Since predicting the future is tough—have I written about that yet?—we should use the best tools we have. And averages are poor tools compared to distributions.
Distribution Shape 1: Uniform Distribution
So let’s start with the simplest distribution: uniform. This means that in a scenario every outcome is equally likely. What’s the easiest one to show? Dice!
Quick, what is the average roll of a single die?
This is one of those brain tricks that I believe Daniel Kahneman and Amos Tversky used to show how behavioral economics works.
Did you say 3? A lot of people do. Take a look at the chart below, showing our first distribution, the uniform distribution:
I’m going to explain the axes so we’re on the same page. The left hand axis, the Y-axis, shows the probability of a specific outcome. The X-axis, the one running on the bottom, shows the potential outcomes. For a six-sided die, you have six outcomes, returning a 1 to 6. If you only had a coin, you have only two outcomes. If you’re playing Dungeons and Dragons and had a ten-sided die, you’d have ten outcomes. The more outcomes, the lower the odds of each one in a uniform distribution.
Mathematically speaking, this is a “discrete distribution” where you have a specific number of possible outcomes. You could also run a uniform distribution as a “continuous distribution”, where it has a range of infinite outcomes. In today’s article, I’m not going to dive deep into the differences between continuous and discrete probabilities, because I mainly want to show the shapes of different distributions, not how to calculate them. I used the dice example above because good real world examples of continuous uniform distributions are hard to find. (I went to my statistics textbook on my bookshelf, and it had an example about a pipe bursting, which wasn’t great. Yes, I keep my statistics textbook close at hand.)
Back to the brain teaser, most people just naturally think that since three is half of six (three plus three), it is the expected value of a die roll. It isn’t. The expected value is 3.5. Again, the “average” of 3.5 tells you hardly anything about rolling a die; the distribution says everything is equally likely.
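For anyone who wants to see where 3.5 comes from, here is a minimal sketch in Python, assuming a fair six-sided die. The expected value is just each face multiplied by its probability, added up.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so every face has the same probability.
faces = [1, 2, 3, 4, 5, 6]
p_each = Fraction(1, len(faces))  # 1/6 for every face: a uniform distribution

# Expected value: sum of (outcome * probability of that outcome).
expected_value = sum(face * p_each for face in faces)

print(f"Probability of each face: {p_each}")
print(f"Expected value of one roll: {float(expected_value)}")
```

Notice that the expected value is a number you can never actually roll, which is a nice reminder of how little the average says about the shape.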
Distribution Shape 2: Discrete Probabilities
What if we don’t have a uniform set of probabilities, but different probabilities for different outcomes? So we still have a limited (discrete) set of outcomes, but all sorts of different probabilities? To use the dice example, some board games skew the odds for rolls. So if you roll a six you “win” a prize, if you roll a 3-5 nothing happens, or if you roll a 1 or 2 you “lose”. Scenarios like this happen in certain cooperative or advanced board games like Eldritch Horror. Yes, I’m a nerd who has a stats textbook and plays board games. This outcome would look something like this:
This type of distribution is great for scenarios where you know all outcomes aren’t equally likely, but you may not have good data so have to make estimates. I did this for my Lucasfilm series in the section of feature film projections. I don’t have data that predicts how many future Star Wars films Lucasfilm will make, but I know all outcomes aren’t equal. Same with box office performance. So I made some assumptions. So here’s how that turned out.
I converted those percentages to the total box office as a percentage of initial price, which gets us a range of outcomes. (This should look familiar to fans of Nate Silver’s 538 website.)
The key for discrete probabilities is they still need to add up to 100%. Otherwise, you’re missing something. That said, you can quickly complicate it by having correlated variables and other interactions. Again, just know that discrete distributions can look all sorts of different ways, like my Star Wars example or the NFL scores above.
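Here is a short sketch in Python of the board game example from earlier in this section, including the "adds up to 100%" check. It assumes a fair six-sided die; the dictionary and variable names are mine.

```python
from fractions import Fraction

# Assumption: a fair six-sided die. The mapping follows the board game example:
# roll a 6 to "win", roll 3-5 and nothing happens, roll a 1 or 2 to "lose".
outcome_for_face = {
    1: "lose", 2: "lose",
    3: "nothing", 4: "nothing", 5: "nothing",
    6: "win",
}

# Add up the 1/6 chance of each face under the outcome it maps to.
probabilities = {}
for face, outcome in outcome_for_face.items():
    probabilities[outcome] = probabilities.get(outcome, 0) + Fraction(1, 6)

for outcome, p in probabilities.items():
    print(f"{outcome:>7}: {float(p):.1%}")

# The sanity check: a discrete distribution has to account for everything.
total = sum(probabilities.values())
print(f"Total: {float(total):.0%}")
```

If you build a table of estimated probabilities by hand, as I did for the film projections, the same total check is the first thing to run.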
Distribution Shape 3: Binomial Distribution
Regarding uniform distributions, there is really one even simpler than a six-sided die. It’s the simplest game of chance, and I would have put it first if it didn’t make such a great bridge to the next distribution. That’s the outcome of a single coin flip. In chart form, it looks like this:
A coin flip is a uniform distribution with two outcomes. Yes or no. Heads or tails. Odds or evens. So on. They’re “mutually exclusive”, meaning you can’t have them both occur at the same time. The name for this in statistics is “binomial”. Now you can alter binomials in two key ways and ask a lot of fun questions on those alterations. First, you can change the probability to anything above 0% and below 100%. Second, you can repeat the “experiment”, which is what you call a single coin toss, as many times as you like.
What if you take that outcome, and run the scenario multiple times? So you flip the coin twice, or three times or four times and so on? Well, you get a binomial distribution. This is a way to show how many times an outcome (say, heads) could come up across all those flips, and the probability of each count. It looks like this:
If that looks familiar, well hold on a moment. The key to remember right now is that this type of distribution is the “discrete” scenario where you have a limited number of tests. In the real world, with natural phenomena, you have a continuous range. And that looks different.
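I promised no equations, but if you would like to generate the binomial shape yourself, here is a minimal sketch in Python. The function name `binomial_pmf` and the choice of ten flips of a fair coin are my own assumptions for illustration.

```python
from math import comb

def binomial_pmf(n, p):
    """Probability of exactly k successes in n independent trials, for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Assumption: ten flips of a fair coin (p = 0.5), counting heads.
flips = 10
pmf = binomial_pmf(flips, 0.5)

# A crude text chart: one '#' per percentage point of probability.
for heads, prob in enumerate(pmf):
    print(f"{heads:>2} heads: {prob:6.1%} {'#' * round(prob * 100)}")
```

Change `p` away from 0.5 and the peak slides to one side; increase `flips` and the bars start to trace out a smoother hump.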
Distribution Shape 4: The Normal Distribution
You’ve heard of this one, haven’t you? The chart that shows a peak in the middle, that tapers out to its ends? Of course you have. It’s called normal because it is so widely taught, but as I was looking for information, I was reminded that technically this is a “Gaussian” distribution. Here’s a chart from the Wikipedia page that captures the ideal normal distribution.
However, for how common it is, it very rarely occurs perfectly in nature. The classic example is height. Here’s that ur-example:
The funny thing about showing real values is that you can see this isn’t a perfectly even normal distribution. And I pulled this from the US census (and then the link broke on me).
To explain, the x-values, along the bottom axis show the various heights we’ve measured. So we start at five foot four and continue to six feet four inches. The left hand axis, the y-value, is the output which in this case is the count of the sample of people. Or it would be, except in this case it’s already been converted to a percentage to show the population of America.
The results cluster around the middle of the range. So the vast majority of things are close together around the average. This is why height is such a good explanation for the normal distribution. The majority of men are around 5’10 in height, according to the above statistics. And the vast majority fall within about 4 inches of that, between 5’7 and 6’2. The people who are much, much taller, say 6’8 and above, are very, very rare.
The clustering around the mean average is what makes a normal distribution normal. As the Wikipedia chart two above shows, 68% of things fall within one “standard deviation” of the mean. Standard deviation is a measure of how much a data set is spread out, which I probably should have mentioned earlier. In a normal distribution, really rare examples start at “3 standard deviations” from the mean average. So if something is “5 standard deviations” from the mean, like say seven foot tall men, it’s really, really rare.
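If you want to see the 68% figure fall out of the curve, here is a minimal sketch using `NormalDist` from Python’s standard `statistics` module. It uses an idealized normal distribution with mean 0 and standard deviation 1, not the height data, and `share_within` is a helper name I made up.

```python
from statistics import NormalDist

# An idealized "standard" normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of outcomes within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3, 5):
    inside = share_within(k)
    print(f"within {k} standard deviation(s): {inside:.5%}, outside: {1 - inside:.5%}")
```

The "outside" column is the useful one for judging rarity: it shrinks very fast as you move away from the mean.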
Of course, if something isn’t normally distributed, those same conclusions don’t hold.
Which is a good time to revisit the caution I put at the start. I said that I would be using distributions to show both probabilities and descriptions of data, and height shows how they interact. If you know the historical height of a group of people (and it is statistically significant, which is another stats topic for another day) then you can use the outcomes in the sample group to form probabilities which you can use to predict outcomes.
In other words, given that we know that less than 0.1% of people are greater than seven feet tall in the American population (and we have a sample of hundreds of millions showing this) we know that the odds that any baby born will be seven feet tall are minuscule.
Distribution Shape 5: Variations on the Normal Distribution
The normal distribution can be tweaked in all sorts of ways. First, it can be either very skinny or very wide, as these charts show from Wikipedia.
Second, the distribution can lean one way or the other. It could lean right or left. Here’s two examples of that, again from Wikipedia.
The Most Important Distribution Shape for Entertainment: The Logarithmic Distribution
Well, I hope some of you are still with me. Cause here’s where the magic starts.
The final chart is for distributions that have variance that isn’t linear. It’s exponential. So the numbers at the top aren’t just multiples of the samples at the bottom, they’re orders of magnitude larger. I call this a “logarithmic” distribution because it increases by orders of magnitude, most often expressed in “base 10”.
(I say logarithmically distributed, even though technically that’s for discrete distributions. Also, some power-law distributions can turn into a normal distribution by adjusting the numbers to logarithms. Again, a lot of specifics that I won’t get into. I just want you to see the shape.)
Anyways, take a look at a logarithmic-exponential distribution and a Pareto distribution.
To explain one last time. The x-axis running left to right shows the various outcomes. This could be wealth owned. Or the population of cities. Or the value of oil reserves. Or the returns on owning various stocks. The y-axis is the probability of that occurring or the count of the observed phenomena in a sample. So most people (say 80%) have hardly any wealth. Or most stocks return very little money. But a few at the far right of the distribution have an inordinate amount of wealth. Or a few stocks have incredible returns (Apple, Amazon). Or a few countries have incredibly valuable oil fields (Saudi Arabia).
Or become massive blockbusters at the box office.
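To get a feel for how lopsided this shape is, here is a small simulation sketch in Python. It draws random values from a Pareto distribution using the standard library. The shape parameter of 1.16 is my assumption (it is the value commonly associated with the "80/20" rule), the seed is arbitrary, and none of this is real box office data.

```python
import random
from statistics import mean, median

random.seed(42)  # arbitrary seed so the simulation is repeatable

# Assumption: a Pareto shape of 1.16, the value often tied to the "80/20" rule.
alpha = 1.16
returns = sorted(random.paretovariate(alpha) for _ in range(100_000))

sample_mean = mean(returns)
sample_median = median(returns)

# How much of the total do the top 1% of outcomes account for?
top_1_percent = returns[-1000:]
top_share = sum(top_1_percent) / sum(returns)

print(f"Median outcome: {sample_median:.2f}")
print(f"Mean outcome:   {sample_mean:.2f}")
print(f"Share of the total held by the top 1%: {top_share:.0%}")
```

The gap between the mean and the median is the point: a handful of giant outcomes drag the "average" far away from what a typical draw looks like.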
Tomorrow, I’m going to show you a bunch of examples of this distribution, so that hopefully you never use the “average” in entertainment again.