Category: Data and Decision-Making

Read My Latest at Decider: “‘The Boys’ Is a Hit for Amazon, But What Does That Mean?”

Last week, I threw up a quick Twitter thread on The Boys and just turned it into a full article for Decider. (And it’s short for me, about 800 words.)

So take a read and share on social media! Appreciate it in advance.

Of course, judging whether a series is performing well or poorly is NOT simple. And as I found some new data sources, I had thoughts that got cut from the final article. (As always.) So here’s the rest of the story, including a broadcast comparison, how I think about managing messy data sets and the rest of Amazon Studios’ datecdotes.

Introduction – A BH90210 Comparison

Initially, I was going to compare The Boys to BH90210, the Beverly Hills, 90210 revival that was off to a good start last week. Here’s the Variety quote on its success:

Image 1 - Variety TV Rating

That’s good! Or is it bad? I mean, are 3.8 million viewers good? Honestly, with broadcast we can’t tell, since a show like Night Court used to get 20 million viewers in the 1980s, and The Big Bang Theory, the biggest show on broadcast in 2019, didn’t even get that for its finale. (Fine, it did with DVR viewing.) No, seriously, here are the ratings for Night Court:

Image 2 - Night Court Ratings

So which was bigger, BH90210 or The Boys? To Google Trends! Here’s the first look, and you could say, “Well, The Boys won”…

Image 3 - The Boys initial

But I told you Google Trends was finicky, didn’t I? The problem with a show like BH90210 is that the title is super generic and derivative of another series. So here it is with a few other variations on that title.

Image 4 - GTrends Updated

Add them all up, and BH90210 was more in the consciousness than The Boys. Whether that translates to more viewers, I can’t say. But it provides some “broadcast to streaming” context.

Comment on Amazon Datecdotes

One of my favorite parts of writing and researching this article was that it forced me to look up all of Amazon’s “datecdotes”, which I’d been meaning to do since their last earnings report, where they again touted Emmy success while steadfastly avoiding numbers a la Netflix.

Now, why wouldn’t Amazon tell us good news? Well, the pro-Amazon case is they have all sorts of good news but are hoarding it for some advantage. That’s frankly BS. My rule of thumb with all large organizations—from the government to any corporation—is they share good news and hide/bury the bad.

The most basic assumption is that Amazon’s overall numbers are much, much smaller than Netflix’s, so they avoid specifics. Because if they shared them, they would look bad. That’s simple logic.

Anyways, here’s my Amazon datecdotes table, a la Netflix. Notably, I left out two other sets of numbers for space in my Decider piece. First, the Reuters leak from last year had additional details for Transparent and Good Girls Revolt. Second, last fall Amazon touted its NFL viewership numbers for Thursday Night Football:

Image 5 - AMZN Datecdotes

Some quick notes. Take The Man in the High Castle: we still don’t really know what 8 million viewers means. Is that over the lifetime, up to the point in time Reuters got the leak? Or some shorter time period? With data, that distinction is really important.

Or take The Tick as a top five series for Amazon in 2017. That would worry me, given that, as the IMDb data shows, the series wasn’t even that popular. And it was in their top five? And now it and Sneaky Pete are off the platform? That would make me think the other series are much, much smaller than we imagine. (Sneaky Pete was also a Sony co-production. So the co-pro curse strikes again.)

Google Trends – The Boys Pessimistic Case


Introducing “Datecdotes”, When Streaming Companies Use Data to Win the PR Wars

Here are some fun stats. What do they tell us?

– Netflix over the summer had 80 million customer accounts watch one of their Netflix Original Romantic Comedies.

– Netflix had 20 million streams for The Christmas Chronicles over the last weekend.

– Amazon Prime/Video/Studios had 14.7 million total customers watch an NFL Thursday Night Football game.

– Snapchat had over 10 million viewers watch a Snapchat Original show this year.

At first blush, that’s a lot of data. And it’s big! You know, in terms of size, in that 80 million sure is a lot of people.

But let’s count the actual numbers released. One. Two. Three. Four.

Four numbers is not “big data”, in the data science sense. Data doesn’t get “big” until you reach the hundreds of thousands of data points. In fact, some data scientists would say data doesn’t really get big until you have millions of data points with many, many categorical variables.

Alas, if we really pause to think on the bare handful of data points above, we understand how little we’re being told. Take the journalism “Five W’s”: who, what, when, where and why. Most data can’t tell us the why (it’s implied), but in streaming video it can tell us the other four.

When streaming video companies release single data points, they usually only give us two of the five W’s. First, they give us the “who”: customer accounts, customers or monthly active users. And they give us the “where” in the broadest sense possible, in that they give us “global numbers”. But crucially, they always omit the “what”. How many minutes were viewed per person? The “when” is also usually implied but not explicitly stated, typically so the numbers look as large as possible. In the case of The Christmas Chronicles, they gave us the “what”, but left out the why.
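One way to make the five W’s test routine is to tag every released number by which W’s it actually states. The sketch below is a hypothetical illustration: the `Datecdote` class, the `missing_ws` helper and the tagging of the romantic-comedy stat are my own reading of the paragraphs above, not anything the companies publish.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Datecdote:
    """One released streaming statistic, tagged by which of the five W's it states.

    None means the release omits (or only vaguely implies) that W.
    Field names are a hypothetical scheme for illustration.
    """
    claim: str
    who: Optional[str] = None    # unit counted: accounts, customers, monthly active users
    what: Optional[str] = None   # what counts as "watching": starts, minutes, completions
    when: Optional[str] = None   # the measurement window
    where: Optional[str] = None  # geography
    why: Optional[str] = None    # rarely answerable from the data itself


def missing_ws(d: Datecdote) -> list[str]:
    """Return the W's that a release leaves out, in who/what/when/where/why order."""
    return [f.name for f in fields(d) if f.name != "claim" and getattr(d, f.name) is None]


rom_com = Datecdote(
    claim="80 million accounts watched a Netflix Original romantic comedy",
    who="customer accounts",
    when="over the summer (no exact dates)",
    where="global",
)
print(missing_ws(rom_com))  # ['what', 'why']
```

Even with a generous reading of “over the summer”, the release still never says what counts as watching.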

As a result, usually we can learn very little as competitors, observers or investors from these nuggets. A contrarian might say, look here, Entertainment Strategy Guy, you said in this very early article that you LOVE data. At least these companies are providing us some data.

Well, I’ll dust off a great quote from statistics to counter that:

“The plural of anecdotes is not data.”

Netflix, Amazon and Snapchat (just the three companies I’m picking on today; Twitter, Facebook, Twitch, Hulu and YouTube do this too) aren’t providing data; they’re giving us anecdotes. Selectively curated, data-based anecdotes, released in the hope (almost always granted) that unsuspecting and unquestioning news outlets will repeat them to boost the companies’ standing among customers, Wall Street and competitors.

And we always fall for it.

See, the companies above aren’t choosing between one or two data points. Or even a couple of dozen. These companies are literally choosing between millions of potential data points, which makes these numbers some of the most selective anecdotes you could possibly come across.

The analogy (and yes, it’s in the title) is the old saw about the iceberg: roughly 10% of the ice floats above the sea, while the other 90% sits below the water. This is how it feels when a streaming company drops their knowledge on us.


With streaming video, the numbers are even more extreme. They have millions of customers watching tens of thousands of videos, with a dozen or more categorical variables per interaction. We’re talking thousands of potential ways to meaningfully slice the data, and the companies pick one or two per quarter. Again, the plural of anecdotes isn’t data.
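The “thousands of potential ways” figure falls out of simple counting. A sketch, assuming exactly 12 categorical variables (the floor of “a dozen or more”) and a hypothetical release rate of two numbers per quarter: every non-empty subset of variables is a different way to group the data, and that ignores filtering on particular values, so it undercounts.

```python
from math import comb

# Assumption: 12 categorical variables per viewing interaction
# (title, genre, country, device, day of week, ...). The text says "a dozen or more".
num_variables = 12

# Each non-empty subset of variables is a distinct way to group ("slice") the data.
ways_to_slice = sum(comb(num_variables, k) for k in range(1, num_variables + 1))
print(ways_to_slice)  # 4095, i.e. 2**12 - 1

# Hypothetical disclosure rate: two data points per quarter, eight per year.
released_per_year = 2 * 4
share_disclosed = released_per_year / ways_to_slice
print(f"{share_disclosed:.2%}")  # 0.20%
```

Every extra variable doubles the count, so with “a dozen or more” the disclosed share only gets smaller.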


The line is so close to the top of the iceberg, it may as well not even be touching it. That’s how much data we don’t have access to.

I have a new name for this. Even if you have a data point, that still isn’t “data”. It’s an anecdote. It’s a “datecdote”, an anecdote of data. Interesting, but not enough to base decisions on.

Netflix, we’ve been told, isn’t an entertainment company, they’re a product company that leverages huge amounts of data to deliver us our entertainment. Maybe that’s true, for internal work. But when it comes to PR? Netflix isn’t a data company. They’re an anecdote company. They’re a datecdote company.

I’ve spent a lot of the last week polishing an article digging deep into the second most recent Netflix datecdote. My main conclusion is that as journalists, whether at conferences, on investor calls or when deciding whether to publish press releases, we need to push back. We need at least the five W’s, and we need comparisons to put these datecdotes in context. Without those, and this is controversial, we just shouldn’t publish their numbers. I’m realistic enough to know this won’t happen, but we’d know a lot more if it did.

Be-Twitch-ed: How On The Media Repeated a Bad Statistic and What We Can Learn From It

My favorite Chuck Klosterman rant is in his book Sex, Drugs and Cocoa Puffs about the phrase “apples to oranges”. In short, is anything actually more similar than apples and oranges? How is that a synonym for difference?

He finishes his rant with the line, “in every meaningful way, they’re virtually identical”. He’s right.

It’s a great line because when doing data analysis, this phrase comes up all the time. When you’re comparing two things, you need to keep as many variables the same as possible or it won’t be “apples-to-apples”. Even a small variable being off can make the conclusions drawn worthless. Ever since I first read Klosterman, I’ve tried to use the phrase “apples-to-hammers” instead, because those truly are different.

The media compares “apples-to-hammers” all the time. Or they’re just bad with numbers. If you want to hear plenty of examples—or just become a better educated news consumer—then you need to listen to WNYC’s On The Media (OTM). Of all the podcasts/shows on entertainment/media/communications, I’d rank it number one, just ahead of KCRW’s The Business.

To take just one example, OTM reviewed a book many years back called Sex, Drugs and Body Counts that describes how the media often over-inflates or abuses numbers when it comes to wars, crimes or deaths. I have a copy on my bookshelf. The moral of the segment is to beware of a journalist bringing huge, sometimes unbelievable, numbers to tell a sexy narrative. (It was so long ago that I can’t find a link to the original segment.)

So, ahem, I need to call out OTM specifically for doing the very thing they regularly decry. Last week, OTM had a great episode, “Twitch and Shout”, on Twitch and the future of live-streaming video. They announced this project in their newsletter a few weeks back:

“We wanna tell you about a little experiment that we’re working on here at OTM. Have you heard of Twitch? It’s like the live streaming version of YouTube — if YouTube were obsessed with videos gamers. (It is.)…Well, over 200 million people watch this stuff. That’s more than HBO, Netflix, ESPN, and Hulu combined.”

They repeated the same number at around minute 13 of last week’s episode:

“It is a streaming network that has more viewers than HBO, ESPN, Netflix, and Hulu combined.”

I bolded those two sections because they sound unbelievable. Twitch is a bigger business than HBO, Netflix, ESPN and Hulu combined. Can you believe it?

Well, I don’t. Because it isn’t true. And because the analogy isn’t apples-to-apples.

Without meaning to, On The Media provided me a great example of how “data”, or more precisely, “an interesting factoid” can be misinterpreted. Today, I’ll break down how OTM was led astray and how we should interrogate data better. Tomorrow, I’ll tackle some other thoughts from the episode and their business implications.

Where the Bad Fact Came From

Let’s start with the fact that OTM didn’t hire a consultancy to measure the number of viewers across all these different platforms. Instead, they likely started with the internet. In this case, OTM got their statistics from a website called “DOT Esports”, an e-sports news website. Here’s the key quote:

Which service has more viewers, Netflix or Twitch? Turns out it’s the latter. A new report reveals that more people watch online gaming videos than HBO, Netflix, ESPN, and Hulu all combined together.

The “new report” is key. It comes from SuperData Research, another company specializing in video games and e-sports. So we have to acknowledge right off the bat that both sources of this fact are heavily biased towards showing how large and influential their audience is. (No industry body or news source under-hypes its potential.) This is the exact same motivation Sex, Drugs and Body Counts described when non-profits or government agencies use big numbers to bolster their own importance.

It seems that after publishing this report—likely accompanied by a press release—this hard to believe fact was repeated on multiple gaming and entertainment websites. Then, these websites were quoted by at least some TV stations. These quotes were found by OTM and repeated without being challenged.

The Bad Fact Itself Isn’t Even True

Reread the quote above and then check out the DOT Esports headline:

“Report shows Twitch audience bigger than HBO’s and Netflix’s”

Note that DOT Esports isn’t saying that more people watch Twitch than watch HBO or Netflix, but that the total audience for “online gaming videos” is bigger than HBO, Netflix and the rest.

Indeed, as the chart on DOT Esports shows, saying Twitch has more viewers than HBO, Netflix and ESPN is just…wrong. Here’s my table version of DOT Esports’ chart from 2016:

Twitch Table

Even by the most generous measurement to Twitch, the statement is just false. A combined 325 million people subscribed to one of the four platforms mentioned above; in 2016 Twitch only had 185 million unique visitors. (I haven’t found 2017 unique visitors for Twitch or I’d report that. Given how well Twitch is doing, it’s strange they don’t release this information.)
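The arithmetic is worth spelling out, because it takes only two of the four platforms to sink the claim. This sketch uses the 2016 figures quoted from DOT Esports in this post, in millions, and takes them at face value (the apples-to-hammers sections that follow explain why they shouldn’t be compared directly). ESPN and Hulu aren’t broken out individually here, only implied by the combined total.

```python
# 2016 figures as quoted from DOT Esports, in millions.
twitch_unique_visitors = 185
hbo_subscribers = 130
netflix_subscribers = 93
combined_four_platforms = 325  # HBO + Netflix + ESPN + Hulu

# HBO and Netflix alone already exceed Twitch.
hbo_plus_netflix = hbo_subscribers + netflix_subscribers
print(hbo_plus_netflix, hbo_plus_netflix > twitch_unique_visitors)  # 223 True

# Implied ESPN + Hulu total, and Twitch as a share of the combined four.
espn_plus_hulu = combined_four_platforms - hbo_plus_netflix
print(espn_plus_hulu)  # 102
print(f"{twitch_unique_visitors / combined_four_platforms:.0%}")  # 57%
```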

Of course, in the last few paragraphs I’ve used visitors, people, views, uniques and subscribers interchangeably. And that finally gets us back to the introduction. Even if Twitch did have more people visiting it than HBO, Netflix, etc., the claim still wouldn’t hold, because the comparison isn’t apples-to-apples.

Apples-to-Hammers Comparison 1: Viewers aren’t views aren’t subscribers

You can see this really clearly in the DOT Esports article when they compare the numbers:

By the year’s end, 185 million people watched gaming videos on Twitch during 2016, with 517 million checking out videos on YouTube. In comparison, HBO had an estimated 130 million subscribers in 2016, with Netflix clocking in at 93 million.

Did you catch the sleight of hand in the above paragraph? The paragraph went from “people watched” Twitch to “subscribers”. Is there a difference? Oh heck yeah. And by the time it got to OTM, it had become “people” instead of either viewers or subscribers.

I spent a lot of time at my former employer fighting a losing battle to use terms clearly when it came to our customers. The difference between a stream and a viewer and a unique viewer and what not. This wasn’t an exercise in pedantry; it was vital to the business. I worked really hard so that we didn’t compare things apples-to-hammers and make bad decisions as a result.

So let’s provide a too-brief set of definitions. Basically, I’d define the key terms in, roughly, descending order of difficulty to achieve. A view is any time someone starts watching something. A viewer is the person watching. Does this mean a “viewer” can have multiple “views”? Yes, if they watch multiple videos or the same video multiple times. (So yes, someone could watch a YouTube video multiple times and rack up multiple views.)

A unique viewer count charts how many people tuned in over a given time period, without counting anyone twice. It’s basically saying, “over this time period, we’re only counting this person once, making them unique”. It doesn’t matter if they watch something multiple times; they’re still one “unique viewer”. Even if they only tune in for two seconds, they can still count as a “unique viewer” on some websites. This allows websites to measure the number of people showing up on a given day, leading to common metrics like “daily active users” or “monthly active users”. (Though even these can be averages, so it depends on how you measure them.)
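To see how far apart these counts can drift on the same activity, here is a minimal sketch over a hypothetical five-row viewing log. The log format, viewer names and the choice to average daily actives over the days in the log are all assumptions for illustration; real platforms define these windows in their own ways.

```python
from collections import defaultdict

# Hypothetical viewing log: (viewer_id, day, video_id).
# Each row is one "view": one time someone started watching something.
log = [
    ("ana", 1, "v1"), ("ana", 1, "v1"), ("ana", 2, "v2"),  # rewatches still count as views
    ("ben", 1, "v1"),
    ("cy", 3, "v3"),  # tunes in once and never returns, but is still a "unique viewer"
]

views = len(log)                                         # every start counts
unique_viewers = len({viewer for viewer, _, _ in log})  # each person counted once per period

# Daily active users: unique viewers within each day.
viewers_by_day = defaultdict(set)
for viewer, day, _ in log:
    viewers_by_day[day].add(viewer)
dau = {day: len(people) for day, people in sorted(viewers_by_day.items())}

# "Average DAU" over the three days in this log (one of several possible definitions).
avg_dau = sum(dau.values()) / len(dau)

print(views, unique_viewers, dau, avg_dau)
# 5 3 {1: 2, 2: 1, 3: 1} 1.3333333333333333
```

Same five rows, and the headline number can be 5, 3 or about 1.3 depending on which term you pick.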

The main problem is that while SuperData Research counted Twitch’s visitors over a year, it isn’t counting ESPN, HBO, Netflix and the rest the same way. Way more people watch HBO than subscribe to it; that’s basically a fact. Think about it: do you watch Game of Thrones in a group? Then your group would count as, say, “five viewers” but only “one subscriber”. This is why the comparison isn’t apples-to-apples. Of course, some people may subscribe to HBO or Netflix and never end up watching, even for an entire year. On the other hand, some kids may watch on their parents’ accounts without subscribing. On the downside for Twitch, who knows how many of Twitch’s unique visitors show up for one day and never return? (The latest daily active users figure for Twitch is 15 million people globally, according to their data.)
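The subscriber side cuts both ways, which a tiny hypothetical example makes concrete. The four accounts below are invented to mirror the cases in the paragraph above: a Game of Thrones watch party, a solo viewer, someone who pays but never watches, and a kid on a parent’s account.

```python
# Hypothetical paying accounts, each mapped to the people who actually watch on it.
households = {
    "acct_1": ["host", "guest_1", "guest_2", "guest_3", "guest_4"],  # watch party: 5 viewers, 1 subscriber
    "acct_2": ["solo"],
    "acct_3": [],                # pays all year, never watches
    "acct_4": ["parent", "kid"],  # kid watches on a parent's account
}

subscribers = len(households)
viewers = sum(len(people) for people in households.values())
print(subscribers, viewers)  # 4 8
```

A subscriber count and a viewer count of the same service can differ by a factor of two even in this toy case, which is why putting one next to Twitch’s unique-visitor count is apples-to-hammers.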

Apples-to-Hammers Comparison 2: Subscribers aren’t Unique Visitors, they’re better

That story is also illustrative because being a unique viewer is such a low-cost proposition. And again, this is HUGELY important. Twitch and YouTube have “you” as the product, a phrase popularized after Facebook’s Cambridge Analytica troubles. Twitch sells advertising, so the goal is to get as many total viewers as possible to sell against. (Indeed, they even have a profit motive to inflate all their viewership numbers as much as possible.)

Subscribers are paying per month. That’s far more valuable. So is getting 130 million people around the globe to pay you really comparable to 185 million who may tune in once? Not nearly as comparable as it seems. Twitch and YouTube don’t require a credit card to sign up, just an email address and a log-in. Actually, you don’t even need that to watch the videos, only to comment in the chat room. So again, subscribers aren’t anything like unique viewers or visitors.

Apples-to-Hammers Comparison 3: Geography also matters

This could be the biggest flaw in this analysis.

Twitch does not have geo-filtering, meaning its content is available globally, including in China.

ESPN is only available in the US. Hulu is only available in two countries. HBO is available pretty broadly, while also being heavily pirated, and it has content partnerships that keep its subscriber count lower, like one with Tencent. Netflix is excluded from China.

It’s worth repeating that last point. Until a recent crackdown in China on live-streaming video, Twitch was available in that billion-person country. Netflix has not launched in China because the government won’t let it. A lot of the success of streaming video games comes from other countries.

This isn’t to say that Twitch’s success in China, Korea and Japan (and other countries where gaming is hugely popular) isn’t noteworthy. It definitely is. But it doesn’t really make sense to compare the different services/platforms without holding this variable equal, right? Now, I would love to do a comparison between Twitch and HBO/Netflix/ESPN/Hulu, and for those four companies I could find U.S. subscriber counts, but here’s the thing about Twitch: they don’t release US viewership data.

In fact, one of the weirdest things about Twitch is how finicky it is about presenting its own data. If they were super confident in their data, they’d release a ton of it in table form for us to pore over. Instead, Twitch’s advertising site is a selection of data points pulled seemingly at random, placed next to unrelated images. My favorite is that they put “15 million daily active users” below a picture of the United States. Note, they didn’t say 15 million DAUs in the US, but they want you to think that, don’t they?

Beware the unbelievable factoid; it’s probably not believable

Is Twitch big? Yes. Is it growing? Yes. Is it new and different? Yes.

But just because the answer to all three is yes doesn’t make extreme claims, like Twitch being bigger than HBO, Netflix, ESPN and Hulu, necessary.

In general, the more a story defies belief, the more we should disbelieve it. Or at least ask where it comes from. The point of a sentence like the one driving this story is to make someone say, “Wow, I know tons of people who subscribe to HBO or Netflix (like myself!) but I’ve never watched Twitch! That must mean a lot of people are watching who I don’t know. I’m out of the loop.”

But you aren’t out of the loop. The statement, “the fact”, is wrong. But I don’t blame OTM too much. They put together a great product every week, and some “facts” are so widely distributed it’s hard to believe they aren’t true.

Why You Can’t Use Data to Predict Hit TV Series Either

A few weeks back, I explained why “small sample size” dooms any effort to use big data to predict box office performance of feature films. But what about TV shows? What about streaming services? Can’t they use advanced algorithms to predict success there?


In “No, Seriously, Why Don’t You Use Data to Make Movies?”, I explained in a “mini-statistics lesson” how small sample size and multiple variables combine to make forecasting very inaccurate for movies. Today, I want to take the lessons of that article and apply them to making TV shows in the streaming era.

Here are the key reasons why “big data” can’t solve making hit TV shows.

1. It’s also a data-poor environment.

To start, TV has long had fewer data points than feature films. Only recently did the number of scripted TV seasons pass the number of feature films (depending on how you count it). Currently, there are over 500 scripted TV series per year in the US. As I wrote last time, that’s still a small sample size.

2. It’s even smaller when you factor in returning series.

Most new “seasons” aren’t brand new, they’re returning seasons of TV series that have been on for several years. That kills your forecasting model.

Take Game of Thrones season 8. Yes, you could call “season 8” a unique data point to study. But with TV shows, to have an accurate model, you’d need to introduce a categorical variable, “has had a previously successful season”. The answer for Game of Thrones for that categorical variable is “Yes!” In other words, it’s super easy to predict that subsequent seasons of Game of Thrones and The Walking Dead will have high ratings because their previous seasons had high ratings. (Though not always.)

The challenge is predicting successful new shows, and that data set is much, much smaller than the 400 or so scripted seasons produced every year.

3. The number of categorical variables for a TV show at “pitch” is nearly infinite.

When a TV show is being pitched or is at the script stage, it has a huge number of categorical variables still in flux. Each of these could influence the final dependent variable, which is viewership (depending on whether you’re a network or a streaming platform, you could define this multiple ways).

Everything from the director who ultimately directs the first episode to the acting talent who signs on to the story plan for season one could impact the ratings. Even variables most studios don’t care about, like “who is the production manager?” or “can the showrunner manage a room of people?”, are categorical variables that could affect the final outcome. Without a large sample size, it’s just tough to predict anything. (And some of these variables are very, very difficult to quantify.)

Many good or great scripts or TV pitches become bad TV series, for a lot of reasons that have nothing to do with the script. This is why “algorithms” can’t predict things with high confidence. This explanation definitely applies to feature films too, though I didn’t mention it last time.
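A rough way to see the mismatch between variables and sample size: suppose, very conservatively and hypothetically, that each pitch-stage unknown (director, lead actor, showrunner’s room management, season-one story plan and so on) were only a yes/no variable, and that there were 20 of them. Set that against the post’s rough count of 500 scripted series a year, which is a generous upper bound since most of those are returning seasons.

```python
# Hypothetical: 20 pitch-stage unknowns, each simplified to a yes/no variable.
num_binary_variables = 20
possible_profiles = 2 ** num_binary_variables

# The post's rough count of US scripted series per year (generous: includes returning seasons).
new_series_per_year = 500
coverage = new_series_per_year / possible_profiles

print(possible_profiles)  # 1048576
print(f"{coverage:.3%}")  # 0.048%
```

Even if every series in a year landed in a different combination, more than 99.9% of combinations would never be observed once, and real variables have far more than two values.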

4. Most pitches/scripts/pilots will never get made. Hence no “dependent variable”.

Most claims to use advanced metrics or analytics or data to pick TV series utterly discount this key fact. Sure, you get thousands of pitches and scripts to read, but they don’t become TV series. Replace “dependent variable” with “performance” and you see the challenge. You have three scripts, and you pick one to become a pilot. The other two scripts don’t get made into TV shows. So can we use them in our equation to forecast success? No, because they don’t have a dependent variable that would let us use them as data. All you can say is that you didn’t make them into TV shows. But that’s not a data point.
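Here is the three-script example as a minimal, hypothetical sketch. The script names and the 4.2 million viewership figure are invented; the point is only that rows without a dependent variable drop out before any model ever sees them.

```python
from typing import Optional

# Hypothetical development slate: three scripts, one picked for a pilot.
# Viewership (the dependent variable) only exists for what actually got made.
slate: dict[str, Optional[float]] = {
    "script_a": 4.2,   # made; invented viewership, in millions
    "script_b": None,  # passed on: no outcome will ever exist
    "script_c": None,  # passed on: no outcome will ever exist
}

# Only rows with an observed outcome can be used to fit a forecasting model.
training_rows = {title: y for title, y in slate.items() if y is not None}
print(len(slate), len(training_rows))  # 3 1
```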

5. Finally, most of the time, you can only control your own decisions.

The best way to control a data-driven process is to own all the data. And for a TV studio or streaming service, that means understanding all the decisions that went into making a TV show. So, if you don’t make it yourself, well, you can’t really understand what decisions were made. So for a streaming service, that “n” is very, very small.

So let’s use Netflix as an example. They made what, eighty TV shows to date? (Not counting the international productions, that again are their own categorical variables.) So the maximum for their sample size is eighty. Break it down even further by separating kids shows from adult shows and previous IP versus new IP and then you can break it down by genre. You see where I am going with this. The “n” is dwindling rapidly.
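The shrinking “n” can be put in numbers. This sketch starts from the post’s rough count of eighty shows and applies the two splits named above (kids vs. adult, existing vs. new IP) plus a hypothetical six-way genre split; the genre list is an assumption for illustration.

```python
from itertools import product

# The post's rough count of Netflix original series (excluding international productions).
n_shows = 80

# Splits named in the text, plus a hypothetical six-way genre split.
audience = ["kids", "adult"]
source = ["existing IP", "new IP"]
genre = ["comedy", "drama", "crime", "sci-fi", "horror", "documentary"]  # hypothetical

cells = list(product(audience, source, genre))
print(len(cells), f"{n_shows / len(cells):.1f}")  # 24 3.3
```

And 3.3 shows per cell is the best case, assuming an even spread; a real slate is lumpy, so many cells would hold one show or none.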

What about all the customer viewing data they had from the TV shows on their platform? Well, it doesn’t give Netflix that much of an advantage over traditional networks. Even if traditional networks don’t have Netflix streaming data specifically, they have Nielsen TV viewing data and box office data. Netflix uses that data too.

Which isn’t to say Netflix doesn’t have tons of data or doesn’t use it a lot. But they don’t use it to “pick TV shows”; they use it broadly. The “data analysis” that Netflix does is pretty simple: it sees what is popular with its user base. So do traditional TV networks and studios. And what has Netflix learned? People like broad-based comedies and dramas featuring crime and/or police. It also knows some people like quirky comedies and others like arty shows. (Netflix’s key advantage is that it just pays a lot more for the same shows, with less opportunity to monetize. That’s a problem for another post.)

So is Netflix “using data” to decide on TV shows? Yes, but it isn’t that much better than the rest of the industry. Do they have an algorithm that tells them which shows will do well on their platform? Yes, but it is wrong a lot of the time.

No, Seriously, Why Don’t You Use Data to Make Movies?

If you want to know the “holy grail” for data scientists, I’ll tell you:

Predicting box office performance of movie scripts.

Here’s how it goes. An aspiring data scientist—ranging from bright undergraduate in computer science to a Ph.D. candidate in statistics to even tenured professors—looks for a new topic. They’re bored by analyzing mortgage applications and discover that no one is very good at predicting box office for movies. So they say to themselves, “I can do that.”

Sometimes they even create a model and/or publish papers. Then they go to the Hollywood studios and claim they can use an analysis of a script to predict box office success. Often this is touted alongside advanced analytics, machine learning, neural networks or similar jargon.

We shouldn’t shame these data scientists for trying, though. I mean, the executives at streaming services like Netflix and Amazon Studios/Prime/Video claim they also use complicated algorithms to predict which TV shows or movies will succeed. Both of those streaming video platforms are constantly asked about (and in turn release vague hints about) the data and algorithms they use to pick TV series.

I have also fielded those types of questions since I helped work on strategy at a streaming platform with tons of data, as I mentioned in my second post “Theme 1: It’s about decision-making, not data”. It typically went, “With all the customer viewing data, how did you use that to pick TV shows?” In my initial post, I specifically didn’t answer the question, but went off on a tangent.

But it is worth answering, because it will illuminate a common Entertainment Strategy Guy theme, “Be skeptical”. In this case, “Be skeptical” of the streaming services claiming they have esoteric data knowledge and the entertainment journalists who let them repeat this.

Of course, I don’t blame the executives per se for claiming they have complicated algorithms. I blame the journalists who repeat it without questioning it. These media members don’t probe that audacious statement. A quick push will reveal those statements to be a house of cards, if you will. (Wow, brutal pun.) In reality, Netflix/Amazon/Hulu/other streaming services and traditional studios don’t have enough data to actually use it to make these decisions.

So let’s push back, just a bit.
