Internet Society of China released: 2023 Global Generative AI Industry Research Report

Source: Internet Society of China

On May 19, 2023, at the "World Intelligent Technology Innovation Cooperation Summit" of the Seventh World Intelligence Conference, the "Global Generative AI Industry Map 2023" and the "2023 Global Generative AI Industry Research Report" were released. Both were prepared jointly by the Tianjin Artificial Intelligence Society, Zhiding Technology, and Zhiding Think Tank under the guidance of the Internet Society of China and the China Software Industry Association. They are intended as a reference to help government departments, industry practitioners, educators, and the public better understand the state of generative AI development worldwide.

Image credit: Generated by Unbounded AI tools

As a frontier field of artificial intelligence, generative AI has become the hottest technology topic in the world. In 2022, OpenAI released ChatGPT, and generative AI achieved an important breakthrough at the model application level. ChatGPT passed 100 million monthly active users in just two months, making it the fastest-growing consumer application in history. Many technology companies around the world have increased their R&D investment in generative AI, have steadily released significant results in technology, products, and applications, and continue to drive the innovation and commercialization of artificial intelligence.

In this context, under the guidance of the Internet Society of China and the China Software Industry Association, the Tianjin Artificial Intelligence Society, Zhiding Technology, and Zhiding Think Tank jointly released the "2023 Global Generative AI Industry Research Report". Taking a global perspective, the report surveys the industry overview, infrastructure, algorithm models, scenario applications, and opportunities and challenges of generative AI. It presents a comprehensive picture of the industry's development and serves as a reference to help government departments, industry practitioners, educators, and the public better understand generative AI.

01 Overview of Generative AI Industry

1.1 Generative AI concept and content generation stage

Generative AI is a new mode of content production, following professionally generated content (PGC) and user-generated content (UGC), in which artificial intelligence technology generates content automatically.

Generative AI automatically generates and creates text, audio, image, video and cross-modal information based on massive training data and large-scale pre-trained models. Since OpenAI released ChatGPT in 2022, a global wave of generative AI has erupted, and many technology companies have launched generative AI models, products, and related underlying infrastructure and services.

1.2 Driving forces for the development of generative AI industry

In recent years, the volume of global data has continued to grow. IDC predicts that it will reach 175 ZB by 2025, providing massive data resources for training artificial intelligence models. The introduction of high-performance AI chips provides important computing power for large-scale pre-trained models. As algorithms have continued to develop, models such as Transformer, BERT, LaMDA, and ChatGPT have been rapidly iterated and optimized. Driven by data, computing power, and models, the global generative AI industry has developed rapidly, and related scenarios and applications have multiplied.

02 Generative AI Infrastructure

2.1 AI high-performance chips provide computing power support for generative AI training

Artificial intelligence has moved from the era of deep learning into the era of large models. The parameter counts of large-scale pre-trained models have grown exponentially, which requires the support of high-performance computing power.

At present, training a large-scale pre-trained model requires 10 to 100 times the computing power needed in the past. Mainstream generative AI model training widely uses Nvidia Tensor Core GPU chips. For example, Microsoft spent hundreds of millions of dollars to purchase tens of thousands of Nvidia A100 chips to help OpenAI build ChatGPT.

2.2 AI computing clusters provide large-scale computing resources for generative AI training

AI computing clusters can provide computing power at scale, raise the utilization of computing resources, improve data storage and processing capabilities, and make the training and inference of large AI models more efficient.

Typical AI computing clusters at present include Nvidia DGX SuperPOD, Baidu Intelligent Cloud's high-performance computing cluster EHC, and Tencent's new-generation high-performance computing cluster HCC. This computing infrastructure continues to supply powerful computing resources for generative AI training, further lowering the threshold and cost of model training and helping generative AI models reach deployment.

2.3 AI cloud service provides platform support for generative AI model development

The development of pre-trained artificial intelligence models creates strong demand for cloud services. AI cloud services provide artificial intelligence development modules and, through a variety of service models, reduce developers' costs, shorten product development cycles, and bring AI capabilities to model development.

A typical case is Amazon SageMaker, which provides image analysis, speech processing, natural language understanding, and other related services, so that users can build functional applications without needing to understand the underlying parameters and algorithms.

Baidu PaddlePaddle EasyDL, a zero-threshold AI development platform, provides functions such as image classification, object detection, text classification, sound classification, and video classification. It offers one-stop automated training and lowers the threshold for custom AI development.

03 Generative AI Algorithm Model

3.1 Development history of global generative AI models

3.2 Mainstream models for language generation: OpenAI GPT-1 to GPT-4

Since 2018, OpenAI has successively released a series of generative pre-trained models: GPT-1, GPT-2, GPT-3, ChatGPT, and GPT-4. The GPT-1 model is based on the Transformer architecture and retains only the decoder part of that architecture.

The GPT-2 model drops the supervised fine-tuning stage used in GPT-1.

The GPT-3 model moves away from GPT-2's zero-shot setting and adopts few-shot learning, in which a small number of examples are supplied for a specific task. ChatGPT uses RLHF (reinforcement learning from human feedback) to improve the ability to adjust the model's output.
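To make the zero-shot versus few-shot distinction concrete, the sketch below assembles a few-shot prompt as plain text: a task description, a handful of worked examples, and then the new input. The function name `build_few_shot_prompt`, the "Input:/Output:" layout, and the sentiment examples are all hypothetical choices made for illustration; the report does not prescribe a prompt format, and no model is called here.

```python
def build_few_shot_prompt(task, examples, query):
    """Assemble a plain-text prompt: task line, worked examples, then the new input."""
    lines = [task, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final "Output:" is left empty for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical sentiment-classification examples (made up for this sketch).
examples = [
    ("The film was wonderful.", "positive"),
    ("The service was slow and rude.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each input as positive or negative.",
    examples,
    "The battery lasts all day.",
)
print(prompt)
```

Passing an empty `examples` list yields the zero-shot form of the same prompt, which is one way to compare the two settings on an identical task.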

The GPT-4 model, released in 2023, has stronger multimodal capabilities. It accepts combined image and text input and generates text responses, and it can classify and analyze visual elements and extract their implicit semantics, showing excellent response ability.

3.3 Mainstream models for language generation: Google Transformer to PaLM-E

In 2017, Google released the landmark Transformer model, whose decoder module became the core element of the GPT models. By relying on the attention mechanism, the model allows computation to be parallelized at a larger scale, which significantly shortens training time and makes large-scale AI models practical. The BERT and LaMDA models have brought continued improvements in information extraction capability and safety.
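The attention mechanism mentioned above can be sketched in a few lines. The example below is a minimal, pure-Python version of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, under simplifying assumptions that are not from the report: a single head, no learned projection matrices, no masking, and made-up toy embeddings. Production implementations use batched matrix operations on accelerators instead of Python loops.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for softmax(Q K^T / sqrt(d_k)) V.

    queries: list of vectors of length d_k
    keys:    list of vectors of length d_k
    values:  list of vectors, one per key
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Each query is scored against every key without depending on the
        # other queries, so all rows could be computed in parallel.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy self-attention: three tokens with 2-dimensional embeddings (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Because no query row depends on another, the loop over queries is the part that maps onto parallel hardware, which is the property the paragraph above credits for shorter training times.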

The newly launched PaLM-E model has strong generalization and transfer capabilities. It can process multimodal data such as language, vision, and touch.

3.4 Mainstream model for image generation: Diffusion Model

Research on diffusion models can be traced back to 2015. The Denoising Diffusion Probabilistic Model (DDPM), proposed in 2020, demonstrated the power of the approach and drove its further development. The model consists of two processes: the forward process, also called the diffusion process, and the reverse process. A diffusion model learns by progressively adding Gaussian noise to training images until they are destroyed, then working out how to reverse that noising process. The learned denoising procedure is then used to synthesize new images from random inputs.

The advantages of the diffusion model are that the generated images are of higher quality and that no adversarial training is required. The model also delivers significantly better image generation results while requiring less data.
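The forward (diffusion) process described above has a closed form in DDPM: a noisy sample at step t can be drawn directly as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, where eps is standard Gaussian noise and alpha_bar_t is the running product of (1 - beta_s). The sketch below illustrates only this forward step in pure Python. The linear schedule endpoints, the 1000-step horizon, the four-pixel "image", and all function names are assumptions chosen for illustration; the reverse process needs a trained neural network and is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, linearly spaced (illustrative values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I).

    x0 is a flat list of pixel values; t is a 1-based step index.
    """
    a_bar = alpha_bars[t - 1]
    signal, noise = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


rng = random.Random(0)  # fixed seed so repeated runs draw the same noise
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
image = [0.8, -0.3, 0.5, 0.1]  # toy "image": four pixels scaled to [-1, 1]
for t in (1, 250, 1000):
    noisy = forward_diffuse(image, t, alpha_bars, rng)
    print(t, round(alpha_bars[t - 1], 4), [round(v, 2) for v in noisy])
```

As t grows, alpha_bar_t shrinks toward zero, so the signal term fades and the sample approaches pure Gaussian noise. That endpoint is what lets a trained reverse process start from random inputs.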

04 Generative AI Scenario Application

4.1 Overview of Typical Global Generative AI Applications

4.2 Generative AI Scenario Application—Text Generation

Text generation is applied mainly in four areas: content continuation, text style transfer, summary and title generation, and full-text generation. Related directions such as personalized text generation and real-time text interaction have broad prospects.

Generally speaking, text generation based on NLP technology is one of the earlier applications of generative AI. World-renowned technology companies have successively launched text generation tools. Products from Microsoft, Xmind, and others already have application cases in copywriting, data analysis, presentations, mind mapping, and other areas.

4.3 Generative AI Scenario Application - Image Generation

The technical scenarios of image generation fall into three groups: image attribute editing, partial image generation and modification, and end-to-end image generation. The first two are deployed as image editing tools, while end-to-end image generation corresponds to two major deployment scenarios: creative image generation and functional image generation.

At present, image editing tools are widely used and related products are relatively abundant. Creative image generation mostly appears in forms such as NFTs, while functional images are mostly marketing posters and interfaces, logos, model photos, and user avatars.

4.4 Generative AI Scenario Application—Audio Generation

Audio generation is already common in daily life. Its applications can be divided into speech synthesis and music creation, and speech synthesis in turn covers text-to-speech (TTS) and voice cloning.

TTS technology is relatively mature but still lacks emotional expression. Voice cloning is highly significant for film, animation, and other industries and deserves attention. Music creation can be further subdivided into several directions, including lyrics, composition, arrangement, recording, and mixing, and the creation process relies mainly on the Transformer model.

4.5 Generative AI Scenario Application—Video Generation

Video generation is expected to become a scenario with medium-to-high potential in cross-modal generation. It mainly covers three areas: video attribute editing, automatic video editing, and partial video generation.

Video attribute editing is already widely used in video creation and has greatly improved the efficiency of video editing. Automatic video editing is still mainly at the technical trial stage. Partial video generation is similar in principle to image generation: the video is cut into frames, and each frame is then processed as an image. At this stage, the technology focuses on improving the accuracy of modifications and on making them in real time.
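The frame-by-frame idea described above can be sketched as a small pipeline: treat the video as a sequence of frames, apply an image-level edit to each one, and reassemble the sequence. Everything in this sketch is a hypothetical stand-in. Frames are small grayscale grids of made-up values, `edit_region` is a simple brightness change where a real system would use a generative model, and no video decoding library is involved.

```python
def edit_region(frame, top, left, height, width, delta):
    """Return a copy of a grayscale frame with one rectangle brightened by delta (clamped to 0..255)."""
    edited = [row[:] for row in frame]
    for r in range(top, min(top + height, len(frame))):
        for c in range(left, min(left + width, len(frame[r]))):
            edited[r][c] = max(0, min(255, frame[r][c] + delta))
    return edited


def edit_video(frames, edit_frame):
    """Apply the same image-level edit to every frame and reassemble the sequence."""
    return [edit_frame(frame) for frame in frames]


# Toy clip: three frames of 3 rows by 4 columns, with made-up pixel values.
clip = [[[10 * (i + 1)] * 4 for _ in range(3)] for i in range(3)]
edited_clip = edit_video(clip, lambda f: edit_region(f, 0, 0, 2, 2, 100))
print(edited_clip[0])
```

Editing frames independently is simple but ignores consistency between neighbouring frames, which is one reason modification accuracy remains a focus for this kind of generation.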

4.6 Generative AI Scenario Application—Digital Human

Digital humans are composites of multiple human characteristics that exist in the non-physical world (such as in pictures, videos, live broadcasts, and VR). They represent a transition from modalities with low information density, such as text and audio, to modalities with higher information density, such as images, video, and real-time interaction. In the future, video and even the metaverse will be important application scenarios for digital humans.

In the field of generative AI, digital human generation can be divided into digital human video generation and real-time digital human interaction. Video generation is currently one of the most widely used areas, while real-time interaction is mostly used in visual intelligent customer service and places more emphasis on real-time interactive features.

05 Generative AI Opportunities and Challenges

5.1 In the era of generative AI, administrative work is highly replaceable, and "prompt engineer" is expected to become a new profession

Generative AI brings both challenges and opportunities for employment. On the one hand, it will drive the intelligent upgrading of jobs, and some jobs will be replaced. According to a Goldman Sachs analysis, the intelligent automation capabilities of generative AI can greatly improve work efficiency and reduce operating costs. Traditional jobs in the United States and Europe will be affected by AI automation to varying degrees, and generative AI could replace a quarter of jobs.

On the other hand, generative AI will also create new jobs. A "prompt engineer" uses natural language prompts to interact with AI in order to obtain information or create works. In addition, fields related to artificial intelligence will generate a large number of new jobs.

5.2 The copyright of generative AI works is mainly distributed between software owners and users

Generative AI is in essence an application of machine learning. During the learning stage, a model inevitably uses large datasets for training. However, the copyright ownership of what the trained model produces remains contested.

Since only legal subjects can hold rights, the copyright in generative AI works can be held only by those who contributed to generating the work. The relevant parties include software developers, software owners, and users (these roles may overlap). AI software developers have already been compensated through software copyright, so the copyright in generative AI works is mainly distributed between software owners and users.
