“We are about to train models that are 10 times larger than the cutting edge GPT-4 and then 100 times larger than GPT-4. That’s what things look like over the next 18 months.”
“We are about to train models that are 10 times larger than the cutting edge GPT-4 and then 100 times larger than GPT-4. That’s what things look like over the next 18 months.”
Apparently Inflection AI have bought 22,000 H100 GPUs. The H100 has approximately 4x the compute for transformers as the A100. GPT4 is rumored to be 10x larger than GPT3. GPT3 takes approximately 34 days to train on 1024 A100 GPUs.
So with 22,000*4/1024=85.9375x more compute, they could easily do 10x GPT4 size in 1-2 months. Getting to 100x the size would be feasible but likely they’re banking on the claimed speedup of 3x from FlashAttention-2, which would result in about 6 months of training.
It’s crazy that these scales and timelines seem plausible.