On May 22, Gao Wen, an academician of the Chinese Academy of Engineering (CAE) and director of Pengcheng Laboratory, said at the 2023 Bay Area Artificial Intelligence Sub-Forum that the laboratory would use its computing power, together with data from open channels, to train a large-model base open to the whole of society for researchers and entrepreneurs to use.
"Building a large model is not something you can do on a whim or by shouting slogans. It requires computing power, and it takes billions or tens of billions in investment to produce something decent," Gao Wen said, adding that China's large models currently face challenges of varying degrees in computing power, algorithms, and data, which need to be tackled one by one.

Before large models, training an AI model was usually feasible on a single machine with one or a few cards, with training times ranging from hours to a few days. Now, completing the training of a large model with hundreds of billions of parameters requires distributed training on large clusters with hundreds of servers and thousands of GPU/XPU cards, and the training cycle stretches to months. Training the 175-billion-parameter GPT-3 on 300 billion tokens would take about 32 years on a single A100 at its half-precision peak performance, or about 34 days on 1,024 A100s at 45% resource utilization. Even setting time aside, a single A100 cannot train a model at the hundred-billion-parameter scale, because the model parameters alone exceed the memory capacity of a single card.

Running large-model training in a distributed environment cuts the training time from decades on one card down to days, but it requires breaking through the compute wall, the memory wall, the communication wall, and other challenges so that all resources in the cluster are fully utilized, accelerating training and shortening the cycle. As model parameters grow, the corresponding cluster grows as well, and these three walls grow higher with it. Moreover, during long training runs on large clusters, equipment failures can occur and may degrade or interrupt the training process.
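As a rough cross-check of those figures, the sketch below applies the common "training FLOPs ≈ 6 × parameters × tokens" rule of thumb and the A100's published half-precision peak. It is an illustrative back-of-envelope only; the article's 34-day figure presumably folds in overheads beyond the plain 45% utilization assumed here.

```python
# Back-of-envelope estimate of GPT-3 training cost, using the common
# "training FLOPs ~= 6 * parameters * tokens" rule of thumb.
params = 175e9          # GPT-3 parameters
tokens = 300e9          # training tokens
flops_needed = 6 * params * tokens            # ~3.15e23 FLOPs

a100_fp16_peak = 312e12                       # A100 half-precision peak, FLOP/s

single_card_years = flops_needed / a100_fp16_peak / (3600 * 24 * 365)
print(f"1 x A100 at peak:         ~{single_card_years:.0f} years")

cluster_cards = 1024
utilization = 0.45                            # assumed cluster utilization
cluster_days = flops_needed / (cluster_cards * a100_fp16_peak * utilization) / (3600 * 24)
print(f"1024 x A100 at 45% util.: ~{cluster_days:.0f} days")

# Memory wall: the FP16 weights alone need ~350 GB, far beyond one card's
# 40/80 GB -- before counting gradients, optimizer states, and activations.
weights_gb = params * 2 / 1e9
print(f"FP16 weights alone:       ~{weights_gb:.0f} GB")
```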
02 | AI Big Base Accelerates Big Model Training

Combining years of technical accumulation and engineering practice in AI and large models, Baidu launched its full-stack, self-developed AI infrastructure, the AI Big Base, at the end of 2022. It comprises a three-layer technology stack of chip, framework, and model, with key self-developed technologies and leading products at each layer: the Kunlun chip, PaddlePaddle (Feizhu), and the Wenxin large model, respectively. On top of this three-layer stack, Baidu AI Cloud launched two AI engineering platforms, the AI Zhongtai (AI middle platform) and the Baidu Baige AI heterogeneous computing platform, which improve efficiency at the development and resource levels, break through the three walls, and speed up the training process. Among them, the AI Zhongtai relies on the AI framework to provide parallel strategies and an optimized environment for large-model training, covering the entire training lifecycle. The AI Big Base integrates and optimizes the technology stacks of each layer, completing the integration of cloud and intelligence technologies, and can achieve end-to-end optimization and acceleration of large-model training.
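To make the idea of "parallel strategies" concrete, below is a minimal data-parallel training sketch using PaddlePaddle's public collective (fleet) API. It is an illustrative example only, not Baidu's internal AI Big Base implementation; real hundred-billion-parameter training additionally relies on tensor, pipeline, and sharded-parameter parallelism.

```python
# Minimal data-parallel training sketch with PaddlePaddle's fleet API.
# Launch with: python -m paddle.distributed.launch --gpus "0,1,2,3" train.py
import paddle
import paddle.nn as nn
from paddle.distributed import fleet

def main():
    fleet.init(is_collective=True)            # initialize multi-card collective training

    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))
    optimizer = paddle.optimizer.AdamW(learning_rate=1e-4, parameters=model.parameters())

    # Wrap model and optimizer so gradients are synchronized across cards.
    optimizer = fleet.distributed_optimizer(optimizer)
    model = fleet.distributed_model(model)

    loss_fn = nn.CrossEntropyLoss()
    for step in range(10):
        x = paddle.randn([32, 1024])              # stand-in batch
        y = paddle.randint(0, 10, shape=[32])     # stand-in labels
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        optimizer.clear_grad()

if __name__ == "__main__":
    main()
```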
03 | More Than 20 Enterprises in China Have Entered the Big Model Track

From the release of Baidu's "ERNIE Bot" and Alibaba's "Tongyi Qianwen" to the launch of 360's "Red Boy", SenseTime's "Ririxin", NetEase's "Yuyan", iFlytek's "Spark", and Kunlun Wanwei's "Tiangong", and then to the announcements of Tencent's "Hunyuan", JD's "ChatJ", Huawei's "Pangu", and more, internet giants and technology companies have been flexing their muscles; no one wants to fall behind in this big-model battle. In this frenzy, large-model development has moved from "general-purpose" to "vertical". Computing power, large-scale data, and expensive talent have become barriers that keep most enterprises out of general-purpose large models, but the demand for deep customization and broad scenario coverage has spurred the development of vertical large models in China. In the past two months, many small and medium-sized enterprises with accumulated user data in industries such as healthcare, finance, education, and painting have begun training and adapting their own vertical models on top of domestic and foreign large-model "bases" (see the fine-tuning sketch below). At the same time, companies that have released general-purpose large models have also launched models targeting specific industries. If the general-purpose large model marks the early stage of large-model development, the application of vertical scenarios can be seen as its "midfield battle": applications and scenarios take the lead, pushing the rapid development of large models in vertical domains and delivering value in different industries.
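For a sense of how such adaptation typically looks in practice, here is a minimal sketch of parameter-efficient fine-tuning with LoRA on top of an open-source base model, using the Hugging Face transformers and peft libraries. The base model name and data are stand-ins; this is not any particular company's pipeline.

```python
# Minimal LoRA fine-tuning sketch for adapting an open-source base model
# to a vertical domain. Model name and data are illustrative stand-ins.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base_model = "bigscience/bloom-560m"   # stand-in open-source "base"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA trains small low-rank adapters instead of all base weights,
# so a vertical model can be adapted on modest hardware.
lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # attention projection in BLOOM-style models
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()       # typically well under 1% of parameters are trainable

# In-house domain data (e.g. medical or financial Q&A) would then be fed
# through a standard causal-LM training loop or transformers.Trainer.
```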
04 | Jingtai Viewpoint: Differentiation Has Emerged, and Opportunity Lies Within It

Currently, the domestic big-model race comprises three main categories: companies that benchmark GPT with general-purpose large models and focus on the foundation layer; companies that train vertical large models on top of open-source large models and focus on vertical industries; and pure application companies that focus on specific applications. The cost and resources required to train a domain (vertical) model are far lower than building a general-purpose model from scratch. From a business-logic perspective, therefore, most companies lack the ability to build general-purpose large models: the giants are better suited to building general-purpose models, while companies with rich accumulations of scenario data are better suited to building vertical models. Vertical models focus on deeply addressing industry needs, meaning that enterprises train "industry-version GPTs" tailored to their own fields of expertise; the content such models generate better matches specific vertical scenarios and is of higher quality. Vertical models are already being applied in scenarios such as finance, healthcare, and trading. For example, Bloomberg built BloombergGPT, a finance-specific model, on its rich financial data resources, retraining it on the GPT-3 framework. Beyond the two categories of model makers above, there is also a class of companies on the domestic big-model entrepreneurship track that specializes purely in application development: they have no model R&D team and instead call the interfaces of existing large models to build and operate products, as in the sketch below.
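For illustration, an application-layer product calling a hosted large-model interface can be as simple as the sketch below, shown here with OpenAI's Python SDK; domestic providers expose comparable REST/SDK interfaces, and the prompt and product context are hypothetical.

```python
# Illustrative application-layer call to a hosted large model
# (OpenAI Python SDK >= 1.0; the prompt/product context is hypothetical).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "You are a customer-service assistant for a hypothetical insurance product."},
        {"role": "user",
         "content": "What does the basic plan cover?"},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```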