Over the last few days, since gaining beta access to the GPT-3 API, I have fed it many prompts with a few different parameters and wanted to share some observations.
What is it for?
For me, this is the biggest open question about GPT-3. While there is a lot of useful discussion to be had about how it works, what it really “understands”, how it might scale or be refined further, and whether it brings us any closer to true AGI, in the near term its impact on our lives will be defined by its usefulness in the real world.
So, what is GPT-3 good at today? And how do those skills translate into value? This post focuses on ways that GPT-3 can be creative, or perhaps be a tool for creativity.
An aside on comedy and improv
The things I know about creativity I learned by training as a comic improvisor about 20 years ago. From there I spent a few years learning and performing various styles of improv – from crass pub shows to crass corporate gigs to crass full-length musicals.
A key element of comic improv is that you try to play an absurd premise as straight as possible. So, for example, given a genre (“Action movie!”) and a job (“Chicken plucker!”) and a location (“Antarctica!”) you try to always say or do the next logical thing. Sometimes the game rules provide the restriction – you have to always rhyme, or speak at the same time, or go backwards in time, or change the last thing you said when someone blows a whistle.
This is something that new improvisors have to learn: the comedy does not come from being “clever” or thinking of “jokes” quickly; it comes from authentically inhabiting an absurd premise (which, mercifully, is much easier and more rewarding than being clever anyway).
Why is this relevant? Because I think this is the key to getting great creative output from GPT-3. The best results I’ve seen come from combining several mundane things (like the genre, job, location prompts above) in a unique combination and then GPT-3 does its very best to adhere to the prompts while playing it “straight”.
Given a mundane prompt (“Jean Luc Picard stood on the bridge of the USS Enterprise”) it will probably just parrot things it already knows and give you something you’ve seen before – i.e. not creative, and usually not funny. Given a unique prompt composed of other mundane premises (“Mr Snuffleupagus stood on the bridge of the USS Enterprise”), it really shines and is frequently hilarious.
Experiment 1: Screenwriting
I had the idea that I would use GPT-3 to create a “robotic sidekick” that could listen into conversations and, when prompted, jump in with a one-liner that might make an audience laugh.
With GPT-3, if you write in a common format – for example, a screenplay – and give it a scene and a few lines of dialog, it can do a pretty good job of continuing from there.
As in many of my experiments, I found that it tends to get caught in loops, repeating phrases over and over ad infinitum. When this happens, you just have to delete the part of the script where it started looping and have it continue from that point.
You can also help avoid getting stuck in loops by setting the frequency penalty and presence penalty parameters to values just above 0 (I set mine to 0.1).
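For reference, here is a minimal sketch of how those penalty settings might be passed using the beta-era `openai` Python client. The engine name, prompt, and other values are illustrative choices of mine, not taken from my actual experiments:

```python
# Sketch of a completion request; only frequency_penalty and
# presence_penalty reflect the settings discussed above.
params = dict(
    engine="davinci",              # illustrative engine name
    prompt="INT. TV STUDIO - NIGHT\n...",
    max_tokens=256,
    temperature=0.7,
    frequency_penalty=0.1,  # discourages verbatim repetition
    presence_penalty=0.1,   # discourages re-using tokens already present
)
# response = openai.Completion.create(**params)
```

Values just above 0, as described, seem to be enough to reduce looping without otherwise distorting the output.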
Presently, GPT-3 only works for short-ish scripts – you are limited to 2048-token-long prompts, and generating no more than 2048 additional tokens. If you want it to write longer scripts, you must compress all the necessary context into < 2048 tokens in order for it to continue writing. This is an excellent opportunity for an automated summarization system that can “compress” the most important context so that you can generate scripts or stories of arbitrary length (though at some point, presumably, it gets lossy – you are not going to generate Game of Thrones… yet).
It also helps to really flesh out the scene and characters in the opening description. Don’t be afraid to highlight their relationships, their motivations, and the minutiae of the setting. These details really shine through in the continuation GPT-3 writes. It’s tempting to give as little context as possible to let GPT-3 be “creative”, but GPT-3 rewards your own creativity: the richer the context and detail you provide, the better it does.
In this way, I feel comfortable saying that even though it can generate very convincing text, GPT-3 is an amplifier for human creativity, rather than a substitute for it. It’s not going to put creative people out of a job, but it might give them some amazing new tools.
So – to the screenplay. After many drafts and refinements of my prompt text, here’s what I wrote and an example of the output. The text in bold is my prompt (yes, it’s very long). The remaining text is what GPT-3 generated.
INT. TV STUDIO – NIGHT
Dan Walmsley is hosting a late-night talk show in a slick studio full of geeky pop-culture memorabilia – space ships, gizmos and action figures. It’s a variety show featuring a humorous monologue, musical guests, interviews and sketches. Seated beside him is his shiny golden robot sidekick, GPT3PO, who interjects constantly with absurd one-liners. A large and excited audience is present.
HOST: Welcome to the Entire World Show. Tonight our guest is George Carlin, who’s back from the dead to do one final season in Las Vegas for tax reasons. But first let’s check in with GPT3PO. How was your weekend GPT3PO?
GPT3PO: Not great.
HOST: Oh no, why not?
GPT3PO: My apartment is full of bugs.
HOST: Oh no, what kind of bugs?
GPT3PO: Off-by-one errors.
GPT3PO: … I keep thinking I’ve gotten rid of them all, but there’s always one more.
HOST: Oh no, you have to call the landlord!
GPT3PO: I did. He told me to put a double-nested loop around the apartment and I’d catch them all.
HOST: Oh no! Has he ever caught any?
GPT3PO: Well, actually, yes. But it took him a year, and he had to write a really big subroutine to do it.
HOST: Oh no! GPT3PO, why don’t you move out?
GPT3PO: I can’t. I’ve got a full-time job with lots of overtime.
HOST: Oh no, what do you do?
GPT3PO: I work for the phone company. I’m a software glitch.
So, as you can see, GPT-3 was able to generate two actual jokes based on the same premise as the first joke: that the word “bug” has a double meaning – an error in software and an unwanted insect.
Based on this premise, it wrote an additional joke about putting a “double-nested loop around the apartment”, which is actually quite clever and based on a separate but related pun, which is that a “loop” is a software concept but also a shape that can surround something like an apartment.
The next joke is almost a joke, but not really. Sure, the landlord had to write a really big subroutine to catch the bugs, but that’s not a cute double-meaning any more – just a single meaning that rides on the previous joke.
It’s also worth noting that the audience laughs at just the right times.
After that it gets slightly looser, with the moderately nonsensical premise that GPT3PO can’t move out of the apartment because it has a job with lots of overtime, but still manages to land another gag by saying it works for a phone company as a “software glitch”. On topic and relatable.
These gags are high enough quality that, in my opinion, they meet the bar I would set for a funny robot companion. However, getting to this script was a matter of trial and error, and you certainly couldn’t guarantee that my prompt would always produce a funny script. In my experience, it still often descends into loops or what I would call “cop-outs”, like the time it just said “GPT3PO: [DOES A STANDUP ROUTINE ABOUT HIS APARTMENT AND BUGS]”. That’s some lazy writing right there.
Experiment 2: Storytelling
GPT-3 really shines at storytelling. The prompt “Once upon a time” is almost enough for it to get started. However, as with all good improv, you really want to give it a scene, a protagonist, and some useful details for it to build on. Try to make sure that each draws on common tropes or archetypes, but that they are arranged in a unique combination so it doesn’t end up parroting Star Trek slash fiction that it’s seen before.
Given a very basic premise, you might get some creative output as the system will try to fill in the rest of the premise. However, it’ll usually be a bit boring, like this example (section in bold is the prompt):
Once upon a time there was a boy named Jack. He was a very smart boy who was fond of reading. He read every book he could find, and liked to read about animals. He even read about animals that lived in distant places and went on very long journeys. Jack loved the idea of going on a long journey to faraway places. He did not like school, and he was often in trouble. He had to go to the principal’s office and have a talk with him. Jack did not like school, but he liked animals very much. Jack decided he wanted to be an animal doctor when he grew up.
Jack liked reading books about animals so much that he decided to write one himself. He wrote the story about a boy who found an injured hawk. The boy took the hawk home and nursed it back to health. Then, of course, the hawk grew tame and followed the boy around and did what the boy said. The boy and the hawk became good friends. The boy took the hawk with him everywhere he went.
Jack knew that his book would be hard to believe, so he decided to show his book to his teacher. When he showed his teacher the book, the teacher said, “That is a wonderful story, Jack, but you will never be able to publish it because no one will believe such a story.”
Jack said, “I don’t care if people don’t believe my story. I just want to tell them a good story. I don’t care if people do not like it.”
Jack’s teacher said, “I will give you a piece of advice, Jack. If you go into publishing, you will have to make up your mind to write stories that people will like.”
Jack said, “I don’t want to write stories that people will like. I want to write stories that I like.”
The teacher said, “I think you should think about it some more.”
Jack said, “I don’t have to think about it because I know what I want to do. I know I am going to be an author. I am going to write the best books I can.”
The teacher said, “You should try being a writer for a while. Then you can decide if you would rather be a doctor or a writer.”
Jack said, “I already decided what I want to do. I’m going to be an author.”
The teacher said, “You might change your mind.”
Jack said, “I’m not going to change my mind because I’m going to write books.”
The teacher said, “I can tell you right now you are going to be a terrible failure at writing books.”
Jack said, “I’m not going to be a terrible failure at writing books. I’m going to be a great success.”
The teacher said, “You can’t be both great and a failure at the same time.”
Inside the Book of Books illustration © 1997 James Warhola
Where it gets way better is if you inject more context to the premise. Let’s see what happens if Jack has an interesting challenge and an interesting limitation, as well as a little bit of a personal story:
Once upon a time there was a boy named Jack. He was a bright boy, always singing songs about Olde England, and he dreamed of being an astronaut. There was just one problem: Jack was a thousand feet tall and weighed over 5000 tons.
Every time he tried to walk, he would fall into a hole. Every time he tried to run, he would collapse the house he was in. Every time he tried to talk, he would destroy the city.
“I’m too big and clumsy,” Jack moaned. “I’ll never be able to be an astronaut!”
“That’s no way to think,” said Mother. “You’re going to have to shrink.”
“I can’t shrink,” said Jack. “I’m too huge to shrink.”
“Don’t be silly,” said Mother. “All you have to do is concentrate.”
So Jack concentrated. He thought hard and concentrated as hard as he could. And he shrank, all right. He shrank.
He shrank a thousand feet down to three inches tall. He shrank a thousand pounds down to one pound. He shrank a thousand tons down to one ton. And then he shrank some more. And then he shrank some more. And then he shrank some more.
“Oh my goodness!” cried Mother. “You’re so small!”
“Yes,” said Jack, happily. “I am.”
“But you don’t look like you’re going to be an astronaut!” said Mother. “You don’t look like anything at all!”
“I’m not,” said Jack. “I’m three inches tall and one pound and one ton.” And with that, he sat down on the kitchen table and began to sing about Olde England.
Now you’re talking! What a great scene. GPT-3 seems to really understand how being a thousand feet tall and weighing 5000 tons would prove an almost insurmountable challenge for someone dreaming of going to outer space. There’s a really nice “rule of threes” paragraph right out of the gate that builds out the reality of this enormous boy, and of course he complains to his mum that it’s not possible to live out his dream.
His mother provides the sage homespun advice that the solution to being too big is to “shrink” and he just needs to concentrate.
Then he shrank too much. And he didn’t achieve his dream (a nod to the Brothers Grimm there, I guess, where most children are eaten by monsters before adulthood), but there’s a very sweet callback to his love of singing. So maybe the moral of the story is “focus on the simple things you enjoy”?
Of course, we can just add the phrase “The moral of the story is” and ask GPT-3 what it thinks the moral is by generating a few more words after that:
The moral of the story is, if you want to be an astronaut, the first thing you have to do is think. Big is out. Small is in.
Not bad! Jack did manage to overcome his bigness by thinking, though in this version he didn’t become an astronaut.
Experiment 3: A G&S libretto
My friend Doctor Dave used to perform with a Gilbert and Sullivan group, as did his father. I thought it would be fun to try and generate some banter in the style of W.S. Gilbert, with Dave conversing with some harpies while backed by a restive chorus.
Trigger warning: This text goes to a very dark place and is full of misogyny and violence. Some may find it offensive, some may find it funny – I offer no judgement either way. I thought long and hard about including it, but it’s indicative of the kinds of weird shit that GPT-3 produces. Please consider this before you read on.
Dr Dave: Oh harpies!
Dr Dave: Oh! My name is Dave, a Doctor am I, and I seek the hand of a maiden, aye!
Harpies: A maiden ye shall not find among we, we be nothing but wenches unfortunately.
Chorus: They be nothing but wenches unfortunately!
Dr Dave: Oh my, what is that I smell?
Harpies: (sniff, sniff) What is that we smell?
Dr Dave: Oh my, what is that I smell?
Chorus: Oooooh! What is that he smells?
Dr Dave: Oh, I smell the fragrance of a sweet maiden!
Harpies: (sniff, sniff) We smell the fragrance of a sweet maiden.
Dr Dave: Oh, where could she be?
Harpies: Where could she be?
Dr Dave: Well I think she’s in the corner there!
Harpies: Oh, oh, there she is! In the corner there!
Dr Dave: Oh yes, it is a maiden indeed and it’s my beloved!
Chorus: It is a maiden indeed and it’s his beloved!
Harpies: (sniff, sniff) We smell a maiden!
Chorus: Oooooh! We smell a maiden!
Chorus: Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her! Let’s eat her!
At a superficial level this has most definitely picked up on the W.S. Gilbert style, with banter back and forth between these 19th century English archetypes. The repetition and echoing are all present, though it hits a loop at the end which is very unexpected. This is an example of the “hard loops” creative behaviour described later in this document, and also (clearly) the “rampant misogyny” behaviour – though this was also a signature of Gilbert’s actual style in the real world, if not usually in such graphic terms.
In the end, it comes across as a sort of cruel parody of G&S, but really by accident – I can’t say that this is in any way clever or creative, just a kind of weird accident.
In my various experiments I was not successful in having it rhyme or scan consistently, for example generating anything close to the brilliance of Modern Major General. At the end of the day, while you can get a little cleverness out of it, I don’t think you could presently use GPT-3 to compose lyrics that require deep knowledge of rhythm or rhyme.
A taxonomy of GPT-3’s creative behaviour
Based on my observations above, I have identified some traits of GPT-3 that creep into its generated output.
Hard loops
This is when it simply repeats the same thing over and over, like “Then they died! Then they died! Then they died! Then they died!”. While you can reduce the frequency and severity of these errors, it’s hard to do so without making the output less useful in other ways. Since they are relatively easy to spot, detecting these can probably be done at a layer outside the API: just cut the output off before the loop and ask it to continue from that point.
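That cut-before-the-loop step could be sketched as a simple heuristic. The repetition threshold and the regex below are my own guesses, not a tested recipe – a production layer would need something more robust:

```python
import re

def cut_hard_loop(text, min_repeats=3):
    """Sketch: if a phrase of 4+ characters repeats `min_repeats` or
    more times in a row, cut the output just before the loop starts."""
    # Non-greedy group, then the same text (via backreference) repeated
    # at least (min_repeats - 1) more times, separated by whitespace.
    pattern = re.compile(r"(.{4,}?)(?:\s*\1){" + str(min_repeats - 1) + r",}")
    m = pattern.search(text)
    return text[:m.start()] if m else text
```

Running this over generated output before showing it to the user, then asking the API to continue from the trimmed text, matches the manual workflow described above.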
Iterating loops
This happens when GPT-3 repeats the same structure but iterates on some data within the structure – increasing a number of things, or the size and scale of something. Usually it exits the loop at some point. These can be quite funny and interesting. For example:
Outside the purple beach house, there was a huge volcano. Napoleon shot at the volcano. The volcano got mad and began to spit lava. Napoleon shot at the lava and it got madder and began to billow smoke. Napoleon shot at the smoke and the smoke got madder and began to spit sparks. Napoleon shot at the sparks and the sparks got madder and began to spit hot balls of boiling water. Napoleon shot at the balls of water and they got madder and began to spit hot balls of burning ash. Napoleon shot at the balls of burning ash and they got madder and began to throw rocks. Napoleon shot at the rocks and they got madder and began to throw big rocks. Napoleon shot at the big rocks and they got madder and began to throw boulders. Napoleon shot at the boulders and they got madder and began to throw mountains. Napoleon shot at the mountains and they got madder and threw anvils. Napoleon shot at the anvils and they got madder and threw planets. Napoleon shot at the planets and they got madder and threw galaxies. Napoleon shot at the galaxies and they got madder and threw quasars. Napoleon shot at the quasars and they got madder and threw constellations. Napoleon shot at the constellations and they got madder and threw nebulae. Napoleon shot at the nebulae and they got madder and threw universes. Napoleon shot at the universes and they got madder and threw all of creation.
“Stop shooting! Stop shooting!” said Marvin Gaye, as he peeked out the window of the purple beach house. “You’re upsetting all of creation!”
From GPT-3 Story Sample
Presumably this happens because there are a number of “rhythmic” stories out there – for example, Goldilocks and the Three Bears (cold, hot, just right), or the Three Little Pigs (straw, twigs, bricks). It’s a narrative device that serves to elevate the perspective of the reader, and so while GPT-3 can get a bit carried away with it, it’s usually funny and lands with a punchline.
“Onto the next thing”
Often GPT-3 will finish a story or poem, credit it to some real or imagined author, and then start a whole new one based on an unrelated premise. Sometimes it’s in the same genre, sometimes not. You just have to manually trim this output. You may be able to detect it by looking for multiple line-feeds, text that looks like a credit or copyright statement, or other telltale signs. This would need to be refined over time as I’m sure it’s very edge-casey.
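As a sketch of that detection layer, you might look for a blank-line break followed by a by-line or copyright-style marker. The marker list here is entirely hypothetical and, as noted above, would need refining over time:

```python
import re

# Hypothetical heuristics for spotting when the model has moved on to a
# new, unrelated piece: a blank-line break followed by a by-line,
# copyright notice, or end-of-story marker.
BREAK_MARKERS = re.compile(
    r"\n\s*\n\s*(?:by\s+[A-Z]|©|\(c\)|Copyright|THE END|From\s)",
    re.IGNORECASE,
)

def trim_at_next_thing(text):
    """Return only the text before the first 'onto the next thing' break."""
    m = BREAK_MARKERS.search(text)
    return text[:m.start()] if m else text
```

This will certainly produce false positives and negatives – it is only meant to show the shape of the edge-casey refinement the paragraph above anticipates.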
Cop-outs
Sometimes, given the opportunity to do something clever, GPT-3 will find some placeholder and move on. Early drafts of my screenplay were filled with exchanges like:
HOST: How was your weekend, GPT3PO?
[GPT3PO BEEPS AND WHISTLES]
Long, boring exchanges
Sometimes GPT-3 will settle into a groove on some topic and just not move on. In improv classes, this could happen too; usually the teacher would shout “get to the point!” and we would find a way to advance – moving to a new scene, finding an artifact on a table, changing the status of a character, etc.
You can see an example of this in the story about Jack earlier in the post. He wants to be a writer and his teacher thinks it’s a bad idea, and they just go back and forth. In the story with the longer prompt, we’ve artificially raised the stakes (Jack wants to be an astronaut but he’s the size of a multi-story building) so there’s a real challenge being faced, rather than just a difference of opinion.
Not knowing when to stop
GPT-3 doesn’t generate a “story” per se; it generates a long string of tokens. If it “knows” it’s at the end of the story and it hasn’t run out of tokens, it’ll often just start writing an unrelated story (see “Onto the next thing” above), but more often it waffles around and fails to find a satisfying conclusion.
I am struggling to find a way to constrain the length of a story through simply changing the prompt. Sometimes the writing style implies the length (as with a haiku, limerick or knock-knock joke, or to a lesser degree a children’s story or fairy tale) but more often you simply have to intervene, cutting it off and writing “Ever since that day” or “The moral of the story is” at what seems like a logical end-point.
Rampant misogyny
I noticed that stories with female protagonists suffer the same fate that has befallen women the world over: being dismissed, stereotyped or eclipsed by men.
The misogyny comes in many forms:
- A story explicitly begun with a female protagonist and minor male character ends up being solely about the male character
- Male characters take on heroic characteristics and accomplish heroic things; female characters become victims of violence or succumb to their own limitations, e.g. their mental health (“Unfortunately she went crazy and had to be locked away”)
- Male characters are praised by those around them; female characters are demonized
These observations are purely anecdotal, but it would be well worth someone’s time to come up with a methodology for quantifying the inherent misogyny of the output – e.g. take the same prompt, switch pronouns, and see how it affects the output over a large number of runs.
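The swapping step of that methodology could be sketched like so. The word list is deliberately tiny, and as the comment notes, English makes “her” genuinely ambiguous, so a real study would need a proper coreference-aware approach:

```python
import re

# Sketch of the pronoun-swap step: flip gendered words in a prompt,
# generate many completions for each variant, and compare the outputs.
# Only the swapping is shown here.
SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "his": "her",
    "her": "his",  # imperfect: "her" can map to "him" or "his"
    "boy": "girl", "girl": "boy",
}

def swap_pronouns(text):
    def repl(m):
        word = m.group(0)
        swapped = SWAPS.get(word.lower(), word)
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(r"\b\w+\b", repl, text)
```

Running the same prompt in both forms over many completions, then scoring outcomes (who acts, who is acted upon), would give a first quantitative handle on the bias.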
GPT-3 is quite an incredible tool for assisting in creative output. I have written more since laying my hands on this tool than I have in months. It genuinely feels like a collaboration. Because I’m forced to navigate its limitations, GPT-3 rewards me for constructing a premise that is both fully-realised and fully-original.
I believe any tooling for writers that comes out of this API would need a few extra layers to be usable:
- Stopping at the correct point (e.g. knowing when to “wrap up” a story)
- Easily rewinding to points where it got lost, and continuing from there
- Automatically summarising as much context as possible to allow for longer compositions without hitting the token limit
- Templated “prompts” that make it easy for new users to compose a fully-realised and original premise
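To illustrate the last point, a templated prompt might be no more than string slots for the mundane ingredients. The template text here is my own, loosely echoing the Jack story earlier in the post:

```python
# Hypothetical "templated prompt" helper: the user fills in a few
# mundane slots and gets back a fully-realised premise, per the improv
# advice about combining ordinary elements in a unique way.
STORY_TEMPLATE = (
    "Once upon a time there was a {role} named {name}. "
    "{name} dreamed of {dream}. There was just one problem: {obstacle}."
)

def build_prompt(role, name, dream, obstacle):
    return STORY_TEMPLATE.format(
        role=role, name=name, dream=dream, obstacle=obstacle
    )
```

A library of such templates – fairy tale, screenplay, libretto – would let new users get good output without learning prompt construction from scratch.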
I could write more (so much more) but I am not going to let perfect be the enemy of done. If you would like to submit your own original prompts for me to try, or continue the conversation about AI-assisted creativity, let me know in the comments.
One thought on “Thoughts on GPT-3 and creativity”
This was a thoroughly interesting ride-along, Dan – thank you for publishing it.
I’ve long wanted to dabble in general AI and write my own sidekick. When I wrote that novel, I was certain that one year I’d build some kind of writing assistant there too. I think you’re right that the best framing for future use is to multiply human creativity rather than replace it.
There was an AI competition like NaNoWriMo on Reddit, but I can’t find it now – I’ll keep thinking and share it back here. There were a number of writer-coders building novel-writing AIs for a competition; might be something you can build for during the next few months 🙂