Is GPT-4 a mixture model with 8×220 billion parameters? This rumor went viral today

Source: Heart of the Machine

George Hotz: Apple aside, most companies keep secrets not to hide some breakthrough technology, but to hide something "not so cool."

"GPT-4 has as many as 100 trillion parameters." Many people will still remember this "blockbuster" news that flooded social media at the beginning of this year, along with a chart that spread virally.

But OpenAI CEO Sam Altman soon stepped in to debunk it, confirming the news was fake and saying, "The rumors about GPT-4 are ridiculous. I don't even know where they came from."

Many people believed and spread such rumors because the AI community has kept scaling up model parameter counts in recent years. The Switch Transformer, released by Google in January 2021, pushed large-model parameter counts to 1.6 trillion, and many institutions have since launched their own trillion-parameter models. Against this backdrop, people had every reason to believe that GPT-4 would be a huge model with trillions of parameters, and that 100 trillion was not impossible.

Although Sam Altman's denial ruled out one wrong answer, the OpenAI team behind him has remained tight-lipped about GPT-4's real parameter count; even the official GPT-4 technical report disclosed nothing.

Recently, this mystery may have been cracked by the "genius hacker" George Hotz.

George Hotz is famous for cracking the iPhone at the age of 17 and hacking the Sony PS3 at 21. He now heads comma.ai, a company that develops driver-assistance systems.

He was recently interviewed on an AI podcast called Latent Space. In the interview, he claimed that GPT-4 is actually a mixture model: specifically, an ensemble of 8 expert models, each with 220 billion parameters (slightly more than GPT-3's 175 billion), trained on different data and task distributions.

After the podcast aired, PyTorch creator Soumith Chintala said he seemed to have heard the same rumor; many people may have heard it, but only George Hotz said it in public.

"Mixture models are what you consider when you have run out of other options," George Hotz quipped. "Mixture models come about because you can't make a single model bigger than 220 billion parameters. They want the model to be better, but simply training it for longer yields diminishing returns, so they adopted eight expert models to improve performance." As for how this mixture model works internally, George Hotz did not elaborate.
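Since he did not elaborate, the following is a minimal sketch of how a standard top-k mixture-of-experts layer routes tokens, shown purely to illustrate the general technique being alluded to. All sizes are toy values, and nothing here is a claim about GPT-4's actual implementation:

```python
# Minimal sketch of a top-k mixture-of-experts (MoE) layer, for illustration
# only. This shows the general technique, NOT GPT-4's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # A learned router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = self.router(x)                         # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Each token is processed only by its chosen experts, so per-token
        # compute scales with top_k, not with the total number of experts.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(MoELayer()(tokens).shape)  # torch.Size([10, 64])
```

The key property is that each token activates only its top-k experts, so total parameter count can grow with the number of experts while per-token compute does not.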

Why is OpenAI so secretive about this? In George Hotz's view, Apple aside, most companies keep secrets not to hide some breakthrough technology, but to hide something "not so cool": they don't want others to know that "if you just spend 8 times the money, you can get this model too."

As for future trends, he believes people will train smaller models and improve performance through extended fine-tuning and the discovery of various tricks. He noted that training results have improved significantly compared with the past even though compute resources have not changed, which shows that better training methods have played a big role.

George Hotz's GPT-4 "revelation" has since spread widely on Twitter.

Some people were inspired by it and claimed they would train a LLaMA ensemble to compete with GPT-4.

Others noted that if GPT-4 really is, as George Hotz says, a mixture of 8 expert models with 220 billion parameters each, it is hard to imagine how expensive its inference must be.
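For a rough sense of that cost, here is some back-of-envelope arithmetic based purely on the rumored figures (8 experts at 220 billion parameters each); the fp16 memory estimate is an illustrative assumption, not a disclosed number:

```python
# Back-of-envelope arithmetic for the rumored configuration.
# The 8 x 220B figures come from Hotz's unverified claim; the
# 2-bytes-per-parameter (fp16) storage cost is our own assumption.
n_experts = 8
params_per_expert = 220e9
total_params = n_experts * params_per_expert   # 1.76 trillion parameters
fp16_bytes = total_params * 2                  # 2 bytes per fp16 weight

print(f"total parameters: {total_params / 1e12:.2f}T")    # 1.76T
print(f"fp16 weights alone: {fp16_bytes / 1e12:.2f} TB")  # ~3.52 TB
```

Even before counting activations, KV caches, or the redundancy needed for serving throughput, merely holding the rumored weights would take several terabytes of accelerator memory, which is why the inference cost drew so much attention.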

It should be pointed out that since George Hotz did not cite a source, we currently cannot judge whether the above claim is correct.
