
Extrapolating the spectacular performance of GPT3 into the future suggests that the answer to life, the universe and everything is just 4.398 trillion parameters.

Jun 10, 2020 · 8:26 PM UTC
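(The figure appears to be the Hitchhiker's Guide joke carried out in binary: 2^42 parameters is roughly 4.398 trillion, which is what the "42" replies below are riffing on. A quick check:)

```python
# Sanity check: 2**42 parameters is ~4.398 trillion -- "42" in exponent form.
print(2 ** 42)                    # 4398046511104
print(round(2 ** 42 / 1e12, 3))   # 4.398
```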

Replying to @geoffreyhinton
Could convolving dense layers across the masked self-attention input layers (and hidden input layers) in the decoders help keep GPT-N's weight count in check while allowing for larger context windows? Similarly: convolving masked self-attention across large input context windows.
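(A minimal PyTorch sketch of what this reply seems to be gesturing at, assuming it means applying one shared dense projection convolutionally along the sequence after masked self-attention; the block name and hyperparameters are illustrative, not from any GPT release. The point is that the convolution's parameter count stays fixed no matter how long the context window grows.)

```python
import torch
import torch.nn as nn

class ConvFeedForwardDecoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, kernel_size=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # The convolution reuses one (d_ff x d_model x kernel_size) weight tensor
        # at every position, so its parameter count is independent of context length.
        self.ff = nn.Sequential(
            nn.ConstantPad1d((kernel_size - 1, 0), 0.0),  # left-pad only: stay causal
            nn.Conv1d(d_model, d_ff, kernel_size),
            nn.ReLU(),
            nn.Conv1d(d_ff, d_model, kernel_size=1),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Masked (causal) self-attention: position i may only attend to positions <= i.
        seq_len = x.size(1)
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        attn_out, _ = self.attn(x, x, x, attn_mask=causal_mask)
        x = self.norm1(x + attn_out)
        # Conv1d expects (batch, channels, seq), so transpose around the conv stack.
        ff_out = self.ff(x.transpose(1, 2)).transpose(1, 2)
        return self.norm2(x + ff_out)

block = ConvFeedForwardDecoderBlock()
tokens = torch.randn(2, 1024, 512)                 # (batch, context window, d_model)
print(block(tokens).shape)                         # torch.Size([2, 1024, 512])
print(sum(p.numel() for p in block.parameters()))  # same count for any context length
```

(The left-only padding keeps the convolution causal, so no position sees future tokens through the feed-forward path.)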
Replying to @geoffreyhinton
Sheesh--when did 42 become 4.398 trillion? Deep #AI hypes up everything!
Replying to @geoffreyhinton
In light of your noticing this 3 years ago, why the change of heart last month?
Replying to @geoffreyhinton
And all this time I thought it was just 42 parameters.
Replying to @geoffreyhinton
Sounds like a comfy way to Utopia. How many training tokens, though?
Before or after distillation?
GPT-4, it seems, will have slightly higher computational requirements than GPT-3.
Replying to @geoffreyhinton
My calculations estimated 4.399 trillion... I guess we're close enough and can run with 4.39... ;)