The growing ability of generative models to create realistic text/audio/images/video has sparked much excitement and concern, and led to valuable work on anticipating/mitigating/preventing related risks. But I think we're still pretty clueless about the key GM use cases (thread).
The simplest* use cases to think about are ones where a GM generates media that a human could otherwise create, just faster/cheaper. Examples include voice for Duplex customer service bots or generating text w/ a language model.

*still not that simple
For stuff like that (let's call them substitution use cases), it's ~straightforward to think about possible upsides/downsides - imagine something getting cheaper and faster.

A second class of use cases sounds very different, but isn't as different as it sounds: augmentation.
With augmentation, humans and GMs divide up labor in some way, e.g. GM generates multiple candidates + human selects among them. In practice today, many substitution-sounding use cases are really augmentation, due to edge cases/need for human-in-the-loop in some contexts etc.
Autocomplete is an example of augmentation.

People talk about augmentation a lot, but existing use cases are often about doing an existing X more easily, vs. doing things that weren't at all possible before.
What these two use case classes have in common is that they're both about generating *new* content.

A third class might be called transformation, where the aim is to take existing media and transform it, either within the same modality (e.g. text-->text) or across modalities.
Some of this is happening already, and the boundaries are blurry, e.g. you might call auto-generation of captions for a video "transformation" (audio-->text), and depending on whether there's a human in the loop, it might also be thought of as substitution or augmentation.
Summarization might also be an example of this (transforming the length of text while preserving as much of the meaning as possible).

But I could see transformation having as big an impact as de novo content generation, or more, since it could enable extreme customization.
An Internet flooded with machine-generated, novel content threatens a sense of shared truth, whereas routine transformation of media threatens a sense of shared experience.
Imagine e.g. Netflix taking commands to remove scenes that would scare you specifically, or to substitute you for the protagonist. Or Spotify remixing all of the songs you listen to in the style of your favorite DJ, or Kindle rendering books with footnotes written just for you.
In some sense, mass transforming of media is less scary than an ecosystem flooded with machine-generated content: at least the content was originally from humans* and we're all *kind* of consuming the same thing.

*well, maybe. These aren't mutually exclusive categories, again.
And people can always choose not to customize their content. But currently we're largely accepting (if not seeking out) mass customization of a different kind, namely customization of *which* content we are exposed to via feeds.
It's hard to say which classes or specific use cases will spread the most.

But it seems safe to say that the era of almost-exclusively-human-generated and almost-never-individually-customized media will not last much longer, absent consumer or policy demand for it to continue. /Fin
You can follow @Miles_Brundage.