An Introduction to AI Story Generation

1. What is Automated Story Generation?

Automated story generation is the use of an intelligent system to produce a fictional story from a minimal set of inputs. Let’s tease this apart.

  • Narrative: The recounting of a sequence of events that have a continuant subject and constitute a whole (Prince, 1987). An event describes some change in the state of the world. A “continuant subject” means there is some relationship between the events—it is about something and not a random list of unrelated events. What “relates” events is not entirely clear but I’ll get to that later.
  • Story: A story is a narrative with certain properties that an audience comes to expect. All stories are narratives, but not all narratives are stories. Unfortunately, I cannot point to a specific set of criteria that makes people regard a narrative as a story. One strong contender, however, is a structuring of events intended to have a particular effect on an audience.
  • Plot: A plot is the outline of main incidents in a narrative.

2. Why Study Automated Story Generation?

We can look at this question from a few angles. The first is applications. Aside from the grand challenge of an AI system that can write a book that people would want to read, storytelling appears in many places in society.

  • Human-AI coordination: there are times when it is easier to communicate via narrative. For example, communicating via vignettes helps with coordination because it sets expectations against which to gauge the appropriateness of behavior. Humans often find it easier to explain via vignettes, and are often able to more easily process complex procedural information via vignettes.
  • Human-AI rapport: Telling and listening to stories is also a way that humans build rapport.
  • Explainable AI: Explanations can help humans understand what an AI system does. For sequential decision making tasks (e.g. robotics) this might entail a temporal component to the explanation resembling a story.
  • Computer games: many computer games feature stories or plots, which can be generated or customized. Going beyond linear plots, interactive stories are those in which the user assumes the role of a character in a story and is able to change the story with their actions. To be able to respond to novel user actions requires the ability to adapt or re-write the plot.
  • Training and education: inquiry-based learning puts learners in the role of experts and scenarios can be generated to meet pedagogical needs (similar to interactive stories above).

3. Narratology and Narrative Psychology

Before diving into technology, let’s look at some of the things we can learn from narratology and narrative psychology. Narratology is a humanistic field that concerns itself with the study of narratives. Narrative psychology is a branch of psychology that looks at what happens in the human mind when people read stories.

  • Fabula: The fabula of a narrative is an enumeration of all the events that occur in the story world between the time the story begins and the time the story ends. The events in the fabula are temporally sequenced in the order that they occur, which is not necessarily the same order in which they are told. Most notably, the events in the fabula might not all exist in the final telling of the narrative; some events might need to be inferred from what is actually told. For example: “John departs his house. Three hours later John arrives at the White House. John mutters about the traffic jam.” The fabula clearly contains the events “John departs house”, “John arrives at the White House”, and “John mutters”. We might infer that John also drove a car and was stuck in a traffic jam — an event that was not explicitly mentioned and that, furthermore, would have happened between “depart” and “arrive” rather than afterward, when the first clue is given.
  • Sjuzhet: The sjuzhet of a narrative is a subset of the fabula that is presented via narration to the audience. It is not required to be told in chronological order, allowing for achronological tellings such as flash forward, flashback, ellipses (gaps in time), interleaving, achrony (randomization), etc. A minimal sketch contrasting fabula and sjuzhet follows this list.
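To make the distinction concrete, here is a minimal sketch in Python that represents the John example above as a fabula and a sjuzhet. The event representation is my own illustration, not a formalism from narratology.

```python
from dataclasses import dataclass

@dataclass
class Event:
    time: int          # position in story-world chronology
    text: str

# Fabula: every event in chronological story-world order,
# including the inferred traffic-jam event that is never narrated.
fabula = [
    Event(0, "John departs his house."),
    Event(1, "John is stuck in a traffic jam."),   # inferred, never told
    Event(2, "John arrives at the White House."),
    Event(3, "John mutters about the traffic jam."),
]

# Sjuzhet: the subset of events actually narrated, in telling order.
# A flashback, for example, would place a later event earlier in this list.
sjuzhet = [fabula[0], fabula[2], fabula[3]]

for event in sjuzhet:
    print(event.text)
```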

4. Non-Learning Story Generation Approaches

Let’s get into technologies. This cannot be exhaustive, so I have attempted to create some broad classes and give some examples of each. This section looks at non-machine-learning based approaches. Non-learning systems dominated much of the history of automated story generation. They can produce good plots, though they place less emphasis on natural language output. The key defining feature of these techniques — for the most part — is the reliance on knowledge bases containing hand-coded knowledge structures.

4.1. Story Grammars

Computational grammars were designed to decide whether an input sequence would be accepted by a machine. Grammars can be reversed to make generative systems. The earliest known story generator (Grimes 1960) used a hand-crafted grammar. The details are largely lost to history.

Figure: The earliest known story generated by a grammar-based story generation system (1960).
Figure: The Rumelhart story grammar.
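To make the generative use of a grammar concrete, here is a minimal sketch of a hand-crafted story grammar expanded by random choice. The rules are invented for illustration; they are not the Grimes or Rumelhart grammars.

```python
import random

# A toy story grammar: each non-terminal maps to a list of possible
# expansions; plain strings are terminals. Invented for illustration.
GRAMMAR = {
    "STORY":   [["SETTING", "COMPLICATION", "RESOLUTION"]],
    "SETTING": [["A knight lived in a quiet village."],
                ["A fox made its den at the edge of the forest."]],
    "COMPLICATION": [["One day a dragon stole the harvest."],
                     ["One day a hunter set a trap nearby."]],
    "RESOLUTION": [["The hero drove the threat away."],
                   ["The hero fled to a safer land."]],
}

def expand(symbol):
    """Recursively expand a symbol; anything not in GRAMMAR is a terminal."""
    if symbol not in GRAMMAR:
        return [symbol]
    production = random.choice(GRAMMAR[symbol])
    result = []
    for sym in production:
        result.extend(expand(sym))
    return result

print(" ".join(expand("STORY")))
```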

4.2. Story Planners

Story planners start with the premise that the story generation process is a goal-driven process and apply some form of symbolic planner to the problem of generating a fabula. The plan is the story.

Figure: A story generated by the Tale Spin system.
Figure: Mis-spun tales generated by the Tale Spin system.
Figure: A plot fragment schema from the Universe system.
Figure: A story generated by the Universe system.
Figure: An action schema for a POCL planner.
Figure: A story plan generated by a POCL planner from Riedl and Young (2010).
Figure: A story plan generated by Fabulist. The orange bubbles show actions that are part of goal hierarchies.
Figure: A story generated by Fabulist corresponding to the above plan data structure.
Figure: An example CPOCL plan with character conflict and un-executed actions.
Figure: A story generated by CPOCL.
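As a rough illustration of the representations a story planner works with, here is a minimal sketch of STRIPS-style action schemas and a naive forward search from an initial world state to an author-specified goal state. The domain is invented, and real systems such as Tale Spin, Universe, and POCL planners are considerably more sophisticated.

```python
from collections import deque

# STRIPS-like action schemas (already grounded, for simplicity).
# Invented domain for illustration only.
ACTIONS = {
    "travel(knight, castle)": {
        "pre": {"at(knight, village)"},
        "add": {"at(knight, castle)"},
        "del": {"at(knight, village)"},
    },
    "steal(dragon, crown)": {
        "pre": {"at(dragon, castle)", "has(king, crown)"},
        "add": {"has(dragon, crown)"},
        "del": {"has(king, crown)"},
    },
    "slay(knight, dragon)": {
        "pre": {"at(knight, castle)", "at(dragon, castle)"},
        "add": {"dead(dragon)"},
        "del": set(),
    },
}

def plan(initial, goal):
    """Breadth-first forward search; the plan found is the fabula."""
    frontier = deque([(frozenset(initial), [])])
    visited = {frozenset(initial)}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for name, action in ACTIONS.items():
            if action["pre"] <= state:
                new_state = frozenset((state - action["del"]) | action["add"])
                if new_state not in visited:
                    visited.add(new_state)
                    frontier.append((new_state, steps + [name]))
    return None

initial = {"at(knight, village)", "at(dragon, castle)", "has(king, crown)"}
goal = {"dead(dragon)"}
print(plan(initial, goal))  # ['travel(knight, castle)', 'slay(knight, dragon)']
```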

4.3. Case Based Reasoning

Case based reasoning is a theory of intelligence based on the idea that most reasoning is not done from first principles but instead adapts memories of solutions to related problems to new contexts. When a problem is encountered, the agent retrieves a solution to an older related problem, applies the old solution to the new problem, adapts the old solution to better fit the needs of the current problem, and then stores the new solution.

Figure: A story generated by the Minstrel system.
Figure: Case library for the ProtoPropp system.
Figure: A story generated by the Mexica system. Regular text was generated during the engagement phase. Text in italics was generated during the reflection phase.
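Here is a minimal sketch of that retrieve-adapt-store cycle applied to story generation. The case library, similarity measure, and adaptation rule are all invented stand-ins; systems like Minstrel use far richer adaptation knowledge.

```python
# A minimal sketch of the case-based reasoning cycle for story generation.
case_library = [
    {"problem": {"hero": "knight", "goal": "rescue"},
     "solution": ["knight rides to the tower", "knight frees the prisoner"]},
    {"problem": {"hero": "wizard", "goal": "find"},
     "solution": ["wizard consults the old maps", "wizard finds the amulet"]},
]

def similarity(p1, p2):
    """Count matching problem features (a stand-in for real similarity)."""
    return sum(1 for k in p1 if p2.get(k) == p1[k])

def generate(problem):
    # Retrieve the most similar past case.
    case = max(case_library, key=lambda c: similarity(problem, c["problem"]))
    # Adapt: substitute the new protagonist into the old solution.
    old_hero, new_hero = case["problem"]["hero"], problem["hero"]
    solution = [step.replace(old_hero, new_hero) for step in case["solution"]]
    # Store the new case for future reuse.
    case_library.append({"problem": problem, "solution": solution})
    return solution

print(generate({"hero": "princess", "goal": "rescue"}))
```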

4.4. Character-Based Simulation

The above approaches can be thought of as author-centric (Riedl 2004): the story generator assumes the role of a singular author responsible for plotting out all the actions and events of all the characters. Character-based simulation is character-centric instead: each character is driven by its own agent, which selects actions in pursuit of that character's goals, and the story emerges from the interactions between the agents.

Figure: The hierarchical task network for character agents in a story simulation.
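As a rough sketch of the kind of knowledge such a simulation relies on, the following shows a hierarchical task network being decomposed into primitive actions for one character. The tasks and methods are invented for illustration.

```python
# A minimal sketch of hierarchical task network (HTN) decomposition for a
# character agent. Task names and methods are invented for illustration.
METHODS = {
    # abstract task -> alternative decompositions (ordered lists of subtasks)
    "get_food": [["hunt"], ["go_to_market", "buy_food"]],
    "hunt":     [["track_prey", "catch_prey"]],
}

def decompose(task):
    """Expand an abstract task into primitive actions, always taking the
    first method; a real HTN planner would check preconditions and
    backtrack among alternatives."""
    if task not in METHODS:            # primitive action: execute as-is
        return [task]
    actions = []
    for subtask in METHODS[task][0]:
        actions.extend(decompose(subtask))
    return actions

# Each character agent decomposes its own top-level task; the story emerges
# from interleaving the characters' resulting action sequences.
print("wolf:", decompose("get_food"))  # -> ['track_prey', 'catch_prey']
```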

5. Machine Learning Story Generation Approaches

In this section we explore machine learning approaches that do not use neural networks.

Figure: A plot graph learned by Scheherazade for going on a date to a movie theatre.
Figure: A story generated by Scheherazade for the plot graph above.
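The following is a minimal sketch of what a plot graph is and how a story can be sampled from one: events with precedence constraints, emitted in any order that respects the constraints. The events are invented and only loosely inspired by the movie-date example; this is not Scheherazade's learning procedure.

```python
import random

# A toy plot graph: each event maps to the events that must precede it.
precedes = {
    "drive to theater": [],
    "buy tickets":      ["drive to theater"],
    "buy popcorn":      ["buy tickets"],
    "find seats":       ["buy tickets"],
    "watch movie":      ["find seats"],
}

def sample_story(graph):
    """Emit events in any order consistent with the precedence constraints."""
    emitted, story = set(), []
    while len(story) < len(graph):
        ready = [e for e in graph
                 if e not in emitted and all(p in emitted for p in graph[e])]
        event = random.choice(ready)
        emitted.add(event)
        story.append(event)
    return story

print(sample_story(precedes))
```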

6. Neural Story Generation Approaches

The past few years have seen a steady improvement in the capabilities of neural networks for text. The literature on neural network based story generation techniques is growing rapidly, which requires me to focus on just a few of the systems and works that I found notable at the time of writing.

6.1. Neural Language Models

A language model learns the probability of a token (or sequence of tokens) based on a history of previously occurring tokens. The model is trained on a particular corpus of text. Text can be generated by sampling from the language model. Starting with a given prompt, the language model will provide one or more tokens that continue the text. The prompt plus the continuation can be input into the language model to get the next continuation, and so on. Training a language model on a corpus of stories means the language model will attempt to emulate what it has learned from that corpus. Thus sampling from a language model trained on a story corpus tends to produce text that looks like a story.

Figure: The generation loop for Martin et al. (2018).
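Here is a minimal sketch of that sample-and-extend loop using a pretrained GPT-2 model via the HuggingFace transformers library (assuming the library is installed). It illustrates the generic loop described above, not the pipeline of any particular paper.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

story = "The dragon circled the village, looking for"
for _ in range(3):                      # extend the story three times
    input_ids = tokenizer.encode(story, return_tensors="pt")
    output = model.generate(
        input_ids,
        max_new_tokens=30,              # length of each continuation
        do_sample=True,                 # sample instead of greedy decoding
        top_p=0.9,                      # nucleus sampling
        pad_token_id=tokenizer.eos_token_id,
    )
    story = tokenizer.decode(output[0], skip_special_tokens=True)

print(story)
```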

6.2. Controllable Neural Story Generation

One of the main limitations of neural language models is that they generate tokens based on a sequence of previous tokens. Since they are backward-looking instead of forward-looking, there is no guarantee that the neural network will generate a text that is coherent or drives to a particular point or goal. Furthermore, the longer the story gets, the more of the earlier context is forgotten (either because it falls outside of a window of allowable history or because neural attention mechanisms prefer recency). This makes neural language model based story generation systems “fancy babblers” — the stories tend to have a stream-of-consciousness feel to them. Large-scale pre-trained transformers such as GPT-2, GPT-3, BART, and others have helped with some of the “fancy babbling” issues by allowing for larger context windows, but the problem is not completely resolved. As language models they cannot address the problem of looking forward to ensure they are building toward something in the future, except by accident.

Figure: Fine-tuning the event2event neural network from the Martin et al. (2018) framework.
Figure: Fine-tuning reward is calculated by analyzing how close verbs are to each other in the corpus.
Figure: Stories generated by the hierarchical fusion model.
Figure: Stories generated by the plan-and-write system.
Figure: Illustration of inputs and outputs of the PlotMachines system.
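One simple way to make decoding controllable is to bias the model toward a goal at every step. The following minimal sketch boosts the logits of a handful of goal-related tokens while sampling from GPT-2; it illustrates the general idea of controllable decoding, not the specific methods of Martin et al., plan-and-write, or PlotMachines.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

goal_words = [" castle", " king", " sword"]          # steer toward these
goal_ids = [tokenizer.encode(w)[0] for w in goal_words]

text = "The knight woke before dawn and"
input_ids = tokenizer.encode(text, return_tensors="pt")

for _ in range(30):                                  # generate 30 tokens
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]      # next-token logits
    logits[goal_ids] += 4.0                          # bias toward the goal
    probs = torch.softmax(logits, dim=-1)
    next_id = torch.multinomial(probs, num_samples=1)
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```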

6.3. Neuro-Symbolic Generation

One of the issues with neural language models is that the hidden state of the neural network (whether a recurrent neural network or a transformer) only represents what is needed to make likely word choices based on a prior context history of word tokens. The “state” of the neural network is unlikely to be the same as the mental model that a reader is constructing about the world, focusing on characters, objects, places, goals, and causes. The shift from symbolic systems to neural language models shifted the focus from modeling the reader to modeling the corpus. This makes sense because data in the form of story corpora is readily available but data in the form of the mental models readers form is not. Assuming the theories about how reader mental models can be represented symbolically are correct, can we build neurosymbolic systems that combine the advantages of neural language models with the advantages of symbolic models? Neural language models gave us a certain robustness to a very large space of inputs and outputs by operating in language instead of limited symbol spaces. But neural language model based story generation also resulted in a step backward from the perspective of story coherence. Symbolic systems, on the other hand, excelled at coherence through logical and graphical constraints, but at the expense of limited symbol spaces.

Figure: The neurosymbolic architecture by Martin. The World Engine maintains a set of propositions about the story world.
Figure: The CAST pipeline.
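Here is a minimal sketch of the general neurosymbolic pattern: keep a symbolic set of propositions about the story world and reject language-model continuations that contradict it. The candidate generator and consistency check below are trivial stand-ins of my own; real systems such as Martin's World Engine and CAST use much richer extraction and inference.

```python
# A minimal sketch of the neurosymbolic idea: track symbolic facts about the
# story world and filter out continuations that contradict them.
world_state = {"alive(dragon)"}          # propositions believed true so far

def propose_continuations(story):
    """Stand-in for sampling candidate sentences from a language model."""
    return [
        "The villagers mourned the dragon, which had died years ago.",
        "The dragon roared and attacked the village.",
    ]

def violates_state(sentence, state):
    """Toy consistency check: a living dragon cannot already be dead."""
    return "alive(dragon)" in state and "had died" in sentence

story = "A dragon lived in the mountains above the village."
for candidate in propose_continuations(story):
    if not violates_state(candidate, world_state):
        story += " " + candidate
        break

print(story)
```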

6.4. Other Neural Approaches

Directly sampling continuations from a language model is not the only plausible way of using neural networks to generate stories. One might imagine search-like algorithms that use neural networks as resources for making decisions.

Figure: The graph built by C2PO. 1) the initial event. 2) the final event. 3) the event found that bridges the forward and backward sub-graphs.
Figure: Example stories generated by C2PO.
Figure: Stories generated via question-answering. The bold text is given input.
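Here is a minimal sketch of that idea in the spirit of C2PO's bidirectional search: grow an event chain forward from a start event and backward from an end event until the two frontiers meet. The infer_next and infer_prev functions stand in for neural inference (for example, a commonsense model proposing what plausibly happens next or before); here they are hard-coded toy lookups.

```python
# Toy successor/predecessor knowledge standing in for a neural model.
NEXT = {"hero leaves home": ["hero travels the road"],
        "hero travels the road": ["hero reaches the castle"]}
PREV = {"hero defeats the villain": ["hero reaches the castle"]}

def infer_next(event):
    return NEXT.get(event, [])

def infer_prev(event):
    return PREV.get(event, [])

def bridge(start, end, max_steps=5):
    """Expand forward from the start and backward from the end until the
    frontiers share an event that bridges the two sub-graphs."""
    forward, backward = [start], [end]
    for _ in range(max_steps):
        forward += [e for f in forward for e in infer_next(f)
                    if e not in forward]
        backward += [e for b in backward for e in infer_prev(b)
                     if e not in backward]
        meeting = set(forward) & set(backward)
        if meeting:
            return forward, backward, meeting.pop()
    return forward, backward, None

print(bridge("hero leaves home", "hero defeats the villain"))
```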

7. Conclusions

The field of automated story generation has gone through many phase shifts, perhaps none more significant than the phase shift from non-learning story generation systems to machine learning based story generation systems (neural networks in particular). Symbolic story generation systems were capable of generating reasonably long and coherent stories. These systems derived much of their power from well-formed knowledge bases. But these knowledge bases had to be structured by hand, which limited what the systems could generate. When we shifted to neural networks, we gained the power of neural networks to acquire and make use of knowledge from corpora. Suddenly, we could build story generation systems that could generate a larger space of stories about a greater range of topics. But we also set aside a lot of what was known about the psychology of readers and the ability to reason over rich knowledge structures to achieve story coherence. Even increasing the size of neural language models has only delayed the inevitability of coherence collapse in stories generated by neural networks.
