🧵 1/n A quick primer on GPT-3 for anyone who's heard about it but doesn't know what it is.

Why? GPT is a game-changer in AI with the potential to disrupt a huge number of fields, and it may point the way toward truly generalized AI problem solvers.
2/ GPT is a series of language-based machine learning models built by @OpenAI. The goal of language models is essentially text generation: look at a sentence → predict the next word(s).
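To make "predict the next word(s)" concrete, here's a minimal sketch using GPT-2 (GPT-3's openly released predecessor) via Hugging Face's transformers library — the library choice & prompt are my own illustration, not something the thread prescribes:

```python
# Minimal next-word(s) prediction with GPT-2 via Hugging Face transformers.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "GPT is a series of language models built by"
inputs = tokenizer(prompt, return_tensors="pt")

# Extend the prompt by a few tokens (greedy decoding).
outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(outputs[0]))
```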
3/ The premise behind the GPT models: how much data & computing power can you throw at an unsupervised deep learning model? What are the performance limits before you start getting diminishing returns?
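For a feel of the scaling question, OpenAI's scaling-law work (Kaplan et al., 2020) found that test loss falls as a smooth power law in model size. A toy illustration of that shape — the constants are roughly the paper's fits, but treat this as illustrative only:

```python
# Toy power-law scaling trend: loss(N) = (N_c / N) ** alpha.
# Constants roughly follow Kaplan et al. (2020); illustrative only.
def loss(n_params, n_c=8.8e13, alpha=0.076):
    return (n_c / n_params) ** alpha

for n in [1.5e9, 1.75e11]:  # GPT-2 (~1.5B) vs GPT-3 (175B) parameters
    print(f"{n:.1e} params -> predicted loss ~ {loss(n):.2f}")
```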
4/ To do this, you have to design a model & build physical infrastructure so that these huge inputs of data + computing power are possible. This is much easier said than done.
5/ Whereas many models are *specific* (translation, chatbots, etc.), GPT is *generalized*: it takes in a very broad set of data & learns general patterns from it. It's unsupervised (or self-supervised), i.e. it needs no labeled data, unlike domain-specific ML models such as image recognition.
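What "self-supervised" means in practice: the training labels are just the input shifted by one token, so no human annotation is required. A tiny sketch (the whitespace split is a stand-in for a real subword tokenizer):

```python
# Self-supervision in a language model: the "labels" are simply the
# input sequence shifted by one token -- no human annotation needed.
text = "the cat sat on the mat"
tokens = text.split()  # stand-in for a real subword tokenizer

inputs = tokens[:-1]   # model sees:     the cat sat on the
targets = tokens[1:]   # model predicts:     cat sat on the mat
for x, y in zip(inputs, targets):
    print(f"given ...{x!r} -> predict {y!r}")
```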
7/ In fact, results for GPT-2 were so impressive that OpenAI initially withheld the full model to prevent bad actors from using it maliciously (breaking an unwritten norm in the usually open ML research community).
9/ As to the importance of hardware infrastructure: GPT-3 was trained on Microsoft's latest supercomputer, with 285,000+ CPU cores, 10,000 GPUs, and 400Gb/s of network connectivity per GPU server. Estimated training cost: ~$12M.
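A back-of-envelope check on why that hardware was needed, using the ~3,640 petaflop/s-days of training compute reported in the GPT-3 paper. The per-GPU throughput below is my assumption, purely for illustration:

```python
# Back-of-envelope: why GPT-3 needed a supercomputer.
# The GPT-3 paper reports ~3,640 petaflop/s-days of training compute.
total_flops = 3640 * 1e15 * 86400            # ~3.1e23 FLOPs

# Assume 10,000 GPUs at an effective ~30 TFLOP/s each
# (the utilization figure is an assumption, for illustration only).
cluster_flops = 10_000 * 30e12
days = total_flops / cluster_flops / 86400
print(f"~{days:.0f} days on the cluster")   # on the order of weeks
```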
10/ The generalization of the model is the real killer here. The SAME model can output customer service chats, music/movie recs, legal docs, summaries of sporting events, poetry, code functions, and complex descriptions from simple text prompts.
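Roughly what that looks like in practice: "few-shot" prompting. These example prompts are my own illustration; each would be sent to the SAME model, and you'd read the answer off its continuation:

```python
# Same weights, different tasks -- behavior is steered by the prompt alone.
translation = (
    "English: cheese\nFrench: fromage\n"
    "English: house\nFrench: maison\n"
    "English: cat\nFrench:"          # model should continue " chat"
)
code_gen = (
    "# Python function that squares a number\n"
    "def square(x):"                 # model should continue the body
)
for p in (translation, code_gen):
    print(p, "\n---")
```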
12/ For a bunch of amazing examples of what people are doing with this, check out this thread. https://twitter.com/xuenay/status/1283312640199196673
14/ GPT-3 isn't perfect of course — it has failure modes & wouldn't fully pass the Turing test. http://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html
15/ What will GPT-5 or its equivalent look like? Knowledge is encoded in language & GPT is learning to "understand" how the world works by learning human communication patterns. This is pretty epic & will have far-reaching implications.

/end