I Tested Whether Screenwriters Are Using AI Right Now…

(Welcome to the Entertainment Strategy Guy, a newsletter on the entertainment industry and business strategy. I write a weekly Streaming Ratings Report and a bi-weekly strategy column, along with occasional deep dives into other topics, like today’s article. Please subscribe.)

As I wrote in the last streaming ratings report, I love getting null results. If every data pull results in an amazing takeaway, something is wrong. Same goes for unexpected results. If every piece of data analysis you do confirms your prior expectations, well, you’re probably not doing data analysis right!

Today, I have one such null result, and it’s actually delightful news (or it’s delightful if you’re on the “Humans should make art” side of things, as I am):

I can’t find any evidence that the highest quality screenplays in Hollywood are being written by LLMs. 

Sure, it’s a null result. And I can’t prove that no screenwriters are using LLMs, since you can’t prove a negative. But I can tell you that I couldn’t find evidence for it. And I tested this hypothesis four different ways. 

(Today’s article is free for all, but research like this takes time. Literally, my editor/researcher spent multiple days compiling the evidence and testing it; then writing the article, getting new ideas, exploring more research paths, questioning every assumption I’ve ever made, and editing it all took even more time. A lot of time. If you appreciate this work, please subscribe.)

Setting the Table

For a couple years now, I’ve been hearing from my readers and screenwriter friends that some screenwriters are using LLMs, but it was never definitive; it was always, “Other people are using it.” This especially came up during the WGA strike in 2023, leading to the writers and studios making an agreement that every movie and TV show had to be “authored” by a human, meaning an LLM couldn’t get credit for writing a script. 

If you asked me before last week, I would have guessed that there’s a pretty high chance professional screenwriters had started using LLMs to write parts of their scripts. After all, we know that other writers are using LLMs. College students are generating essays en masse via LLMs. ChatGPT usage literally goes up in September when school starts. In another example, Clarkesworld, the sci-fi and fantasy magazine, had to shut down online submissions in 2023 after being inundated with AI-generated slop short stories.

Then last year, I read a blog post by Kevin Drum about how scientists are using AI/LLMs to help them write their papers. How do we know? Well, LLMs tend to overuse the same words and, sure enough, you can see the increased word usage over the years:

That got me to thinking. Could I find the same trend in professionally written screenplays? Luckily, we have one such database: the Black List.

What’s the Black List, you ask? It’s a list of some of the best unproduced screenplays floating around Hollywood. I’ll let the Black List itself explain:

The Black List was compiled from the suggestions of more than 375 film executives, each of whom contributed the names of up to ten favorite feature film screenplays that were written in, or are somehow uniquely associated with, 2023 and will not have begun principal photography during this calendar year.

This year, scripts had to receive at least seven mentions to be included on the Black List.

It has been said many times, but it’s worth repeating:

The Black List is not a “best of” list.

It is, at best, a “most liked” list.

Modesty aside, I think the Black List is an excellent snapshot of the best screenwriters have to offer Hollywood. Just scan those past years; quite a few of these scripts got made and/or won awards. It’s a great barometer for how the Hollywood community defines great screenwriting.

So let’s get right to it. Are professional screenwriters using LLMs to write scripts?

Test #1: Word Frequency

First off, I checked screenplays that made the 2022 to 2024 Black List, comparing them to 2016 to 2020 Black List scripts. (These scripts are available online in Google Drive folders that you can find by searching Reddit. I wasn’t able to find scripts from the 2021 list.)

First, I compiled a list of words to test: the ten words Kevin Drum referenced from this article, plus more from an article I read on LLMs and short story writing. ChatGPT then recommended six more, so I asked it for even more, resulting in 41 words or phrases often overused by LLMs. Then, using an in-browser programming tool with an assist from an LLM, I built a word frequency dataset for every Black List screenplay (minus a handful of PDFs that had data processing issues, scattered evenly across the years).
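If you want to replicate this kind of word-count pass yourself, here’s a minimal sketch of the idea. The word list below is a hypothetical subset (the real list had 41 entries, which I’m not reproducing here), and the folder path is a placeholder, not my actual setup:

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical subset of the "LLM tell" words; the full list
# in the article had 41 words/phrases.
TELL_WORDS = ["delve", "tapestry", "intricate", "whispers", "shadows", "metropolis"]

def tell_word_counts(text: str) -> Counter:
    """Count how often each tell word appears in a script's text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return Counter({w: counts[w] for w in TELL_WORDS})

# Placeholder usage: run over a folder of scripts converted to plain text.
# for path in Path("blacklist_scripts").glob("*.txt"):
#     print(path.name, tell_word_counts(path.read_text()))
```

From there, it’s just a matter of tallying the counts per script and per year.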

(If it seems like I’m explaining my process a lot, I am! Every data journalist should do this! Show your work!)

Here are the results for words that showed up more than 20 times over this whole time period:

Bottom line: There’s just no pattern. Period. 

No matter how you cut the data (cutting out more scientific-sounding words, for example), there’s little evidence that screenwriters are using LLMs, at least as evidenced by the words that LLMs tend to use. I’m not even sure I could cherry-pick examples to “prove” screenwriters are using LLMs to help write scripts; no trend-lines match the introduction of better AI models that would have influenced 2023 or 2024 screenplays. 

Here’s another piece of evidence. Just look at the 2024 Black List screenplays by overused words compared to 2016’s scripts:

That’s basically a bell curve. (Technically, a left-skewed distribution.) A random distribution of words across the sample set, with no outliers. This is exactly what you’d expect to see if LLMs were NOT infecting the process.
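The comparison itself is simple: tally the overused words per script for each year, then see whether the later year’s distribution shifts upward. A sketch, with made-up per-script totals purely for illustration (not my actual data):

```python
import statistics

def summarize(per_script_totals):
    """Summary stats for one year's per-script tell-word totals."""
    return {
        "n": len(per_script_totals),
        "mean": statistics.mean(per_script_totals),
        "stdev": statistics.pstdev(per_script_totals),
    }

# Hypothetical per-script totals for two Black List years.
scores_2016 = [8, 11, 9, 12, 10, 7, 13, 9]
scores_2024 = [10, 9, 12, 8, 11, 10, 7, 13]

# If LLMs were creeping in, the 2024 summary should shift right
# relative to 2016; overlapping summaries match the null result.
print(summarize(scores_2016), summarize(scores_2024))
```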

In fact, if you look at the scripts that scored the highest, the words they overused (“whispers” and “shadows”) were used in normal ways, organic to the storytelling. (I checked.) If, for example, you’re writing a horror script about people hiding from a serial killer, you’re going to use the word “whispers” a lot.

Now, let’s go over some possible counter-arguments, to really play Devil’s advocate for this analysis being wrong:

  • Maybe LLMs overuse certain words in academic contexts or prose writing, but screenwriting is its own medium, so LLMs overuse different kinds of words.
  • Maybe a simple word frequency analysis doesn’t take into account the different ways words can be used, like “whispers” or “shadows”.
  • Maybe different verb tenses would change this analysis, like “flickers” vs. “flickering”.
  • Maybe these scripts were all written before 2022, and have floated around Hollywood for years. Sure, that may apply to some of these scripts, but by 2024, I’d guess that most of them were written post-AI boom. Also, most Black List scripts don’t have dates on them, but the ones that do are dated to 2024 and I found one from 2023.

But I tested 41 different words; some of them would pop if AI were involved. A phrase like “light pools” is the exact sort of phrase you’d expect to find, and I didn’t find it.

Trust me, when LLMs write screenplays, they definitely overuse these words. I know because I tested it. Researching this article, I found this excellent write-up in PC Mag, “I Used AI to Write a Screenplay. Here’s What I Got”. Let’s go to the first example:

Literally, in the first paragraph, this LLM tool used the word “metropolis”, an overused LLM word. It also uses “dazzling”, which wasn’t a word I tested but I’d guess is an overused LLM word.

Of course, I ran my own test, resulting in this LLM-generated sample:

INT. DETECTIVE’S OFFICE – NIGHT

A naked lightbulb hums overhead, swaying faintly like it’s been watching for years. The room is a coffin of shadows: battered filing cabinets, a coat rack that’s lost its shoulders, Venetian blinds choking the view of a city drowning in neon.

Literally, in the first paragraph, my LLM used a trigger word, “shadows”, and not in a natural way organic to the story, but in a metaphor! A metaphor that I should mention doesn’t actually make any sense. What’s a “coffin of shadows”?

Most scripts average ten overused LLM words per screenplay. In two quick examples, LLMs averaged one…in their first paragraph!

Now, I can’t prove that none of the Black List screenwriters have used an LLM once in the writing process. (And I’ll elaborate more on this later.) But I would expect that if even 25% of screenwriters were using LLMs, you’d see some sort of effect. One more counter-argument: maybe LLMs are writing some dialogue and action lines, but screenwriters are extensively rewriting that output. Maybe! I can’t prove they’re not, but it seems unlikely to me and not really different than just, you know, writing. 

Based on this test, I’d argue that Black List screenplays didn’t use LLMs.

Test #2: ChatGPT Analysis

I won’t lie. If I had found a connection up above, you’d be reading an entirely different article, and I probably wouldn’t have sought out much more evidence. But since my first result came up with nothing (Again, yay! I’m team “humans should create art”. Let’s automate the bad jobs, not the ones people dream of doing), I wanted to keep testing. 

So I turned to an LLM, asking it whether it could test screenplays for evidence of LLM usage. My LLM identified five red flags:

1. “Tell word” density
– Count how often certain high-literary, LLM-favored words/phrases appear, normalized per 10,000 words so longer scripts aren’t punished.

2. Repetitive phrasing
– Flag if exact sentence fragments (≥4 words) occur more than twice in a script without being an intentional callback (e.g., chorus lines).

3. Overly poetic action lines
– Check for high density of similes/metaphors in description lines (“like a…,” “as if…”), which can be a sign of text-model flourish.

4. Dialogue on-the-nose-ness
– Rough check: dialogue lines that are complete, grammatically correct sentences vs. ones with interruptions, ellipses, or incomplete thoughts.
– AI often writes dialogue with fewer conversational “glitches.”

5. Uniformity of style
– Compare the variance in sentence length across the script; LLM writing tends to have narrower variance.
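Ironically, at least two of these checks don’t need an LLM at all; they’re deterministic counts you can compute yourself and get the same answer every time. A rough sketch (the word list is a placeholder, not my full 41-word set):

```python
import re
import statistics

def tell_word_density(text, tell_words, per=10_000):
    """Red flag #1: tell-word hits, normalized per 10,000 words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = sum(tokens.count(w) for w in tell_words)
    return hits / max(len(tokens), 1) * per

def sentence_length_variance(text):
    """Red flag #5: variance of sentence lengths (in words).
    LLM prose tends toward a narrower spread than human prose."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pvariance(lengths) if len(lengths) > 1 else 0.0
```

Which makes what happened next all the more frustrating.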

So I ran that analysis, and it made three HUGE mistakes:

  • First, I ran the same analysis, twice, on the same dataset, and it gave me two completely different results.
  • Second, my LLM couldn’t output how it calculated the “AI score” it gave each script.
  • Finally, it identified 2016 Black List scripts as having the same likelihood of being written by AI as the 2024 Black List scripts.

The LLM guessed that Free Guy, the 2021 Ryan Reynolds action comedy, could have been written by an LLM. You should be ashamed, Matt Lieberman! (Sarcasm.) Since LLMs didn’t exist when that script made the 2016 Black List, Free Guy couldn’t have been written by AI. Ergo, this wasn’t great analysis.

Also, let me re-emphasize, you shouldn’t be able to run the same analysis twice and get two different results.

Bottom line: We can’t use mainstream LLMs to identify LLM screenplays, mainly due to hallucinations and inability to replicate results.

Test #3: Online LLM Detectors

Now, to be fair, one script did get flagged twice by my LLM as being much more likely than any other script to be written by AI. So I figured I’d test it. I went to four different online LLM detectors—Grammarly, GPTZero, Scribbr, QuillBot—to see if that script was written by LLMs and…they said it wasn’t written by AI.

How confident am I in this analysis? Not as confident as the other tests. I reached out to Franklin Leonard (who runs both the Black List and an accompanying site where screenwriters can submit scripts), and he told me, “I haven’t seen any tools that can reliably diagnose whether writing has been written with the aid of AI.” 

Also, I tested a screenplay, not an essay, so perhaps the AI/LLM detectors aren’t attuned to this medium.

Test #4: Analyzing the one potentially AI Script

My LLM offered to analyze the one offending script to point out evidence that it was written by an LLM. So I took it up on its offer. The first piece of evidence—overusing LLM words—resulted in my LLM claiming the script used words that it didn’t, a.k.a. a hallucination. So I forced my LLM to specify the words it used and…all of the overused words made story-sense except for one. 

On point #2, that script did use repeat phrases like “If X were Y, then…” like “If looks were currency, he’d be a millionaire,” multiple times. (That’s a made-up example, not from the script.) And on point #3, checking for similes, yeah, to be honest, this AI-flagged screenplay overused similes throughout, and many of them are incredibly cliched. And it repeats the exact same similes a couple of times. Those are huge red flags.

But on the other two points, my LLM basically repeated the same lines it had already flagged as problematic from points #2 and #3, which were partially the same anyway. In other words, my LLM kept using the same bad sentences over and over as evidence of the script being AI-written.

The end verdict? It seems like this writer loves writing over-the-top descriptions and hyperbolic metaphors and similes…which basically makes him an early-2010s screenwriter. My editor/researcher, a former script reader, doesn’t think this script’s writing style was unusual at all.

But he couldn’t say the same for the LLM-produced sample. Take the first four paragraphs from that:

A naked lightbulb hums overhead, swaying faintly like it’s been watching for years. The room is a coffin of shadows: battered filing cabinets, a coat rack that’s lost its shoulders, Venetian blinds choking the view of a city drowning in neon.

On the desk: a half-empty bottle of rye, an ashtray loaded with last week’s regrets, and a revolver that hasn’t forgotten what it’s for.

At the desk sits FRANK MALONE (late 40s), private detective. His suit has been pressed more by the years than by an iron. The scar that bisects his cheek tells one story; his eyes tell every other.

Frank drags on a cigarette, exhales slow. The smoke curls like it’s searching for an exit. He opens a drawer, pulls out a yellowed photograph: a woman, smiling in black and white. He stares at it until the silence grows teeth.

There’s literally a simile, metaphor or personification in every sentence. Every one! And many of them don’t make sense. How can a coat rack lose its shoulders? What does that even mean? I asked my LLM to explain it to me, and it literally broke. 

And you can see that the sample uses repetitive phrasing, plus every paragraph is the same size. Basically, my LLM-written scripts clearly failed every test the LLM proposed in a way the possibly-AI-written Black List screenplay did not. 

The example from PC Mag isn’t as bad, but it has many of the same problems, like repeated phrases; two different characters have a metaphor “in their eye,” for example.

LLMs Are Not Elite at Screenwriting Right Now

As I always write about when I write about AI, you need to specify what you’re discussing, either competency or ethics. My focus today is on competency. (Ethical issues like the future of work or copyright infringement are a whole other thing.)

This is just a look at the most elite screenwriting production. And as far as I can tell, there’s no evidence, right now, that elite screenwriters are using LLMs. It’s safe to say that LLMs didn’t “pass” this test. (Which, again, is great news for screenwriters!)

Now, do I think some screenwriters are using LLMs at some steps in the process? Absolutely! But let’s define who we’re talking about…

  • On the one hand, some amateur screenwriters are almost certainly using LLMs to write their screenplays. Being blunt, there might be millions of Americans out there who want to write screenplays, because they have a great idea for a movie or dream of Hollywood stardom, but most of them are terrible writers. In the past, that was a dealbreaker. Today, with ChatGPT, they can shortcut the process. With all the evidence that college students, scientists and short story writers are using LLMs, I have no doubt that aspiring, amateur screenwriters are certainly using it. If just a fraction of aspiring screenwriters are using LLMs, tens of thousands of people are. Related: the Nicholls screenwriting competition capped the number of entries this year. Hmmm. 
  • On the other hand, professional screenwriters might be using LLMs for some steps in the process. They’re probably using LLMs to outline scripts, look at story beats, maybe get feedback, and definitely help with spelling and proofreading mistakes. If I were a screenwriter, would I get story feedback from an LLM? Maybe, but I’d take any LLM-generated feedback with a grain of salt. (As Franklin Leonard told Movie Maker magazine last year, the Black List website forbids its readers from using any AI tools in evaluating screenplays.) And I think outlining is best done by hand to help put the story in your brain, but sure, I’m sure some screenwriters are using these tools in that way. (Since the WGA had about 5,000 members report earnings last year, with maybe another 5,000 on the cusp of breaking in or falling out, I’d estimate the size of this group at about 10,000 people. Or the top 1-5% of screenwriters.)

My editor/researcher worked as a frontline script reader for a major studio for over ten years and did freelance screenplay consulting. Based on the AI-generated scripts he read, he’d place LLM-generated screenplays right between these two groups. 

  • LLMs write better than new screenwriters. Take that example from my LLM. It’s not good, but it’s better than what a brand-new screenwriter can produce, because LLMs don’t make basic writing/usage/clarity mistakes. If LLMs know how to do anything, it’s ordering English-language words properly; that’s literally what they’ve been trained to do.
  • But LLMs are worse than professional screenwriters, as the examples above clearly show, due to the cliches and overly-dramatic prose.

I also want to point out that this process—using ChatGPT to write a first draft—is NOT efficient. Recently, studies have come out asking LLM users to estimate the time they saved using ChatGPT, versus how much time they actually saved, and they didn’t gain any time. In this case, the same thing applies. Sure, ChatGPT can instantly output 1,000 words of grammatically flawless screenwriting, but fixing the opening paragraphs alone (like, say, not using a simile or metaphor in literally every sentence) is going to require so much rewriting that, after a certain point, you actually lose time using an LLM. True, you can then ask ChatGPT to strip out the similes—I did—but what’s left is lifeless, and the details no longer make sense.

It’d be quicker to write the damn thing in the first place.

The Entertainment Strategy Guy

Former strategy and business development guy at a major streaming company. But I like writing more than sending email, so I launched this website to share what I know.