Some takeaways from @openai's impressive recent progress, including GPT-3, CLIP, and DALL·E:

[THREAD]

👇1/
1) The raw power of dataset design.

These models aren't radically new in their architectures or training algorithms.

Instead, their impressive quality comes largely from carefully training existing models at scale on the large, diverse datasets that OpenAI designed and collected.

2/
Why does diverse data matter? Robustness.

Can't generalize out-of-domain? You might be able to make most things in-domain by training on the internet

But this power comes w/ a price: the internet has some extremely dark corners (and these datasets have been kept private)

3/
As @sh_reya puts it, the "data-ing" is often more important than the modeling

And @openai put *painstaking* effort into the data-ing for these models.

4/
2) The promise of "impact teams."

Teams that are moderately large, well-resourced, and laser-focused on an ambitious objective can accomplish a lot

5/
The @openai teams are multidisciplinary—different members work on algorithms, data collection, infra, evaluation, policy/fairness, etc

This is hard in academia, not just b/c of resources but also incentives—academia often doesn't assign credit well to members of large teams

6/
3) Soul-searching for academic researchers?

A lot of people around me are asking: what can I do in my PhD that will matter?

@chrmanning has a useful observation—we don't expect AeroAstro PhD students to build the next airliner

7/
I'm also optimistic here. I think there's a lot of opportunity for impact in academia, including advancing:
- efficiency
- equity
- domain-agnosticity
- safety + alignment
- evaluation
- theory
…and many other as-yet-undiscovered directions!

8/
4) We're entering a less open era of research.

Companies don't want to release these models, both because of genuine safety concerns and, potentially, because of their bottom line

9/
Models are locked behind APIs, datasets are kept internal, and the public may only get to see a polished (but restricted) demo + blog post

10/
Limiting API access has safety benefits, but it can also confer an extra advantage on the well-connected: established researchers or those with large Twitter followings

11/
Even when papers are published, important details are missing (e.g. key details of GPT-3's architecture or the data collection process)

It's becoming increasingly hard to study/improve these methods—just as they're edging closer and closer to widespread productionization.

12/
Ultimately, no single lab can do this alone.

We need smart new frameworks and mutual trust to overcome coordination challenges and ensure positive outcomes for society

🌉13/13