 
Author: Wangshu, postdoctoral researcher, Tencent Research Institute
As early as 2001, researchers pointed out that games hold great potential as a domain for realizing or creating human-level AI [1]. As a starting point for artificial intelligence research, games offer complex and diverse task scenarios that give AI research breadth, depth and flexibility.
At present, with the rapid development of generative AI and decision-making AI, the resonance between games and artificial intelligence is becoming more obvious. At GDC 2024 (Game Developers Conference 2024), AI was a focus of the conference, with 64 AI-themed talks accounting for 8% of the program. In the field of generative AI, 62% of game industry respondents are using AI tools to make game content [2]. In the field of decision-making AI, the Google DeepMind team followed AlphaStar with the generalist agent SIMA (Scalable Instructable Multiworld Agent), which can perform more than 600 tasks in various 3D game worlds according to human natural language instructions.
A technical testing ground:
General-purpose AI agents built on game environments
Games provide clear metrics for decision-making AI. Being able to evaluate decision-making AI against the clear, quantified rules of a game addresses the shortage of AI research scenarios and greatly improves the efficiency of iterating and testing the technology. At present, most decision-making AI research teams, including OpenAI and DeepMind, have chosen games as training scenarios. They are committed to building general-purpose agents across different types of game scenarios and, through them, to creating artificial general intelligence.
On March 13, 2024, the Google DeepMind team released an AI agent named SIMA (Scalable Instructable Multiworld Agent), which can execute more than 600 tasks in various 3D game worlds. Its strong natural language understanding and transfer learning abilities have led many researchers to see SIMA's arrival as the “ChatGPT moment” for agents.
In its technical report, DeepMind explains SIMA's basic principles and technical path, defining it as a generalist game agent that is scalable and instructable across multiple 3D virtual worlds. The DeepMind team selected nine popular 3D online games and four 3D scenes built with the Unity engine as SIMA's training environments. During training, the agent continuously observes the game images on screen, combines them with the players' instructions for operations in the game, and then produces keyboard and mouse outputs to control the in-game character to perform various operations [3].
Figure 1 Overview of the SIMA agent project
The SIMA project is an important milestone for the DeepMind team in the field of artificial general intelligence (AGI). From the Go programs AlphaGo and AlphaZero, to AlphaStar built on the game “StarCraft II”, and now to SIMA, built on large language models, DeepMind has consistently tested and researched general intelligence in game environments. In DeepMind's view, the decision-making and action capabilities trained in game environments are expected to transfer to the real world, offering new ideas and new practice for AGI research.
Before SIMA's release, the industry had already produced several research projects on generalist game agents. Two representative works are Gato, published by DeepMind, and MineDojo, released by NVIDIA.
Gato was released by the DeepMind team in November 2022. It can play Atari games and control a real robot arm to stack blocks. Gato uses a GPT-style large language model architecture. Its training data include images, text, robot arm joint data, and other multimodal datasets [4]. In a March 2023 study, Microsoft pointed out that large models that combine multimodal information, such as Gato, are likely to give rise to an early form of intelligence [5].
Figure 2 Gato, created by DeepMind
Similar to Gato is MineDojo, jointly created by NVIDIA, Caltech, Stanford and other research institutions on top of the game “Minecraft”. MineDojo uses “Minecraft” player videos (YouTube), the encyclopedia (Wiki) and the user community (Reddit) as training data to train a generalist agent that can complete a variety of tasks in “Minecraft” based on text prompts. MineDojo can not only complete simple programmatic tasks but also carry out creative tasks from a brief description, such as building a library as described [6].
Figure 3 MineDojo capability model
The two projects, Gato and MineDojo, correspond to two different approaches in AI research: solving a sufficient number of tasks, or solving one sufficiently complex task. However, both studies have limitations. MineDojo is a specialized agent for one specific game: it can complete various tasks within a single game but lacks transfer learning ability. Gato has some transfer learning ability, but its main application environments are 2D games rather than 3D game environments, which leaves a large gap with real-world scenes.
At present, training general-purpose AI agents in game environments has become an industry consensus. In his TED AI 2023 talk, NVIDIA senior scientist Jim Fan proposed the concept of the Foundation Agent. He believes that an agent can be generalized into a “foundation model” across multiple environments, and that training such a model is also inseparable from game environments [7]. In China, Tencent has led the establishment of “Kaiwu” (Enlightenment), a research platform for multi-agent AI and complex decision-making. Relying on the core strengths of Tencent AI Lab and “Honor of Kings” in algorithms, computing power and experimental scenarios, it provides academic researchers and algorithm developers with a leading domestic platform for application exploration.
New breakthrough:
SIMA achieves an effective fusion of large language models
and AI agent training
SIMA combines large language models with agent training to achieve a breakthrough in the decision-making ability and generalization of AI agents. SIMA not only understands various 3D game environments better and performs tasks in them according to human instructions, as a human would, but also far exceeds other agents in decision-making efficiency and ability, showing decision-making ability similar to that of humans [8]. DeepMind founder and CEO Demis Hassabis put it bluntly in an interview: “The field that combines large language models, AI agent training and game environments has huge development prospects. DeepMind will continue to increase research investment in this field in the future.” [9] Overall, SIMA's characteristics and breakthroughs are mainly reflected in the following aspects:
First, SIMA is trained in game environments, but pays more attention to the consistency between the agent's behavior and the instructions it receives. In the DeepMind team's view, “Games are an important testing ground for artificial intelligence (AI) systems. Like the real world, games are also rich learning environments with responsive, real-time settings and ever-changing goals.” Like the game agents previously released by the DeepMind team, SIMA observed behavioral data from a large number of human players during training. The difference is that SIMA is not trained to defeat human players or to get high scores in a game, but to learn to follow natural language instructions from humans in various game environments and to behave in ways consistent with those instructions.
Second, SIMA combines large language models with agent training and adopts a unified, human-like interface. “Learning language and learning environments are complementary. Learning natural language can improve the understanding of general representations and abstract concepts and improve learning efficiency.” Compared with previous agents based on game environments, SIMA introduces a large language model into training, and the entire training process follows a language-first principle, with all training behaviors directly driven by natural language. In other words, SIMA needs neither access to a game's source code nor a custom API. It requires only two inputs: the images on screen and the natural language instructions provided by the user. It then uses keyboard and mouse outputs to control the in-game character to carry out these instructions. In terms of interaction, SIMA uses a unified, human-like interface through which humans can directly issue natural language instructions to SIMA (see Figure 4).
Figure 4 SIMA agent architecture
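The input/output contract described above (screen images plus a natural language instruction in, keyboard and mouse actions out, with no access to game source code or custom APIs) can be illustrated with a minimal sketch. Everything here is hypothetical: `Frame`, `Action`, `InstructableAgent`, `KeywordAgent` and `run_episode` are names invented for illustration and are not taken from DeepMind's report, and the toy keyword policy merely stands in for a learned model.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence, Tuple

# Hypothetical types for illustration only; not part of any DeepMind API.
Frame = Sequence[Sequence[Tuple[int, int, int]]]  # rows of RGB pixels


@dataclass(frozen=True)
class Action:
    """A keyboard-and-mouse action, the only output channel of the agent."""
    keys: Tuple[str, ...] = ()
    mouse_dx: int = 0
    mouse_dy: int = 0
    mouse_click: bool = False


class InstructableAgent(Protocol):
    """Anything that maps (screen image, instruction) to an action."""

    def act(self, frame: Frame, instruction: str) -> Action: ...


class KeywordAgent:
    """Toy stand-in for a learned policy: ignores pixels, maps keywords to keys."""

    KEYMAP = {"forward": "w", "left": "a", "back": "s", "right": "d", "jump": "space"}

    def act(self, frame: Frame, instruction: str) -> Action:
        words = instruction.lower().split()
        keys = tuple(key for word, key in self.KEYMAP.items() if word in words)
        return Action(keys=keys)


def run_episode(agent: InstructableAgent, frames: Sequence[Frame],
                instruction: str) -> List[Action]:
    """Feed each observed frame and the fixed instruction to the agent."""
    return [agent.act(frame, instruction) for frame in frames]


if __name__ == "__main__":
    frames = [[[(0, 0, 0)]], [[(255, 255, 255)]]]  # two dummy 1x1 screens
    for action in run_episode(KeywordAgent(), frames, "Turn left and jump"):
        print(action)
```

Because the agent sees only pixels and text and emits only key presses and mouse movements, the same interface applies unchanged to any game a human could play with a keyboard and mouse, which is the design choice the paragraph above attributes to SIMA.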
Third, SIMA generalizes well and maintains a high level of ability across different virtual scenarios. According to the DeepMind team, SIMA has been evaluated across 600 basic skills, covering navigation (such as “turn left”), object interaction (“climb the ladder”), and menu use, and has shown a higher level of performance than comparable agents. DeepMind researchers evaluated SIMA's ability to complete nearly 1,500 specific in-game tasks according to instructions, some of them assessed by human evaluators. The results showed that, regardless of the game environment, SIMA's performance far exceeded that of comparable agents (see Figure 5).
Figure 5 Performance comparison of multiple agents in different environments
New application scenarios:
AI assists game creation
and improves content creation efficiency
Games have become a testing ground and incubator for general-purpose AI agents, continuously driving the iteration of decision-making AI technology. At the same time, as generative AI technologies such as Stable Diffusion and Transformer mature, AI has also begun to assist content creation in games and the broader cultural industries. It helps generate digital assets such as images, text, audio, video and NPCs, improves product R&D efficiency, and further lowers the threshold for producing interactive content.
At the application level, generative AI models have become a powerful assistant for game developers. According to the “2024 Unity Game Industry Report”, 71% of game studios said that their R&D and operating efficiency improved after adopting AI technology. Generative AI both speeds up content production and effectively reduces communication costs between workers at different stages of production.
On the content production side, generative AI has been widely used in text generation, 2D art creation, code generation and checking, and level design generation. Before AI tools entered the game art workflow, a game artist needed about a week to complete a high-quality illustration. With generative AI tools such as Stable Diffusion, that time is shortened to one day.
Figure 6 Drawing process based on AIGC tools
Generative AI also has huge potential for reducing communication costs between different types of workers. For example, in game production, and especially when a game's art style is being developed and selected, communication between game designers and artists often takes a great deal of time. Generative AI tools can help designers present their ideas quickly and greatly reduce communication costs.
At the tool level, as generative AI improves production and development efficiency, game companies have begun to integrate it into their own content production tools. In June 2023, NVIDIA, a maker of gaming chips, released NVIDIA ACE for Games, an AI tool platform that lets game developers build and deploy customized speech, dialogue and animation AI models in games, greatly improving the efficiency of game content production. At GDC 2024, NVIDIA and Inworld jointly announced Covert Protocol, a new digital human technology demo in which AI-driven game NPCs can interact with players in real time and generate gameplay based on those interactions [10].
Figure 7 Covert Protocol technology demo released by NVIDIA
The game engine companies Unity and Unreal have also released new products based on generative AI. In July 2023, Unity released two new AI-based products, Sentis and Muse, which reportedly can increase the efficiency of traditional content creation tenfold. Unreal has also integrated a large number of AIGC tools into its engine. For example, MetaHuman Creator, a digital human tool, uses AI technology to accelerate the creation of high-quality characters and large-scale scenes.
Game production companies are also fully embracing AI technology, using AI to empower content production tools and continuously improve content R&D efficiency. Take Tencent as an example: at GDC 2024, Tencent AI Lab released “GiiNEX”, its self-developed AI engine for the full game life cycle. Drawing on Tencent's self-developed generative AI and decision-making AI models, the engine targets areas such as AI-driven NPCs, scene production and content generation, where it provides a variety of AIGC capabilities including 3D graphics, animation, cities and music. With the help of the GiiNEX engine, a city modeling task that originally took 5 days now takes only 25 minutes, an efficiency gain of about 100 times [11].
Figure 8 Architecture of GiiNEX, Tencent's game AI engine
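The “100 times” figure quoted for GiiNEX depends on how long a “day” is taken to be, which the article does not specify. As a rough arithmetic check, the sketch below compares two readings: five 8-hour working days (an assumption) and five 24-hour calendar days. The function and constant names are illustrative only.

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """Return how many times faster the new process is than the old one."""
    if after_minutes <= 0:
        raise ValueError("after_minutes must be positive")
    return before_minutes / after_minutes


DAYS_BEFORE = 5          # from the article: the task originally took 5 days
MINUTES_AFTER = 25       # from the article: the task now takes 25 minutes
WORKDAY_HOURS = 8        # assumption: a "day" means an 8-hour working day
CALENDAR_DAY_HOURS = 24  # alternative reading: a full calendar day

working_minutes = DAYS_BEFORE * WORKDAY_HOURS * 60        # 2400 minutes
calendar_minutes = DAYS_BEFORE * CALENDAR_DAY_HOURS * 60  # 7200 minutes

working_day_speedup = speedup(working_minutes, MINUTES_AFTER)
calendar_day_speedup = speedup(calendar_minutes, MINUTES_AFTER)

if __name__ == "__main__":
    print(f"8-hour working days: {working_day_speedup:.0f}x")
    print(f"24-hour calendar days: {calendar_day_speedup:.0f}x")
```

Under the working-day assumption the ratio is 96, which rounds to the reported figure of roughly 100 times. Under the calendar-day reading it would be 288.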
Conclusion
Since the Dartmouth Conference in 1956, early computer scientists have defined AI as “making a machine behave in ways that would be called intelligent if a human were so behaving” [12], and almost all later AI research has followed the path of “simulating” human intelligence, trying to create artificial intelligence that can listen, see, speak, think, learn, and act, in order to strengthen its ability to perceive and understand the real world and to make decisions and act on its own.
Today, artificial intelligence research still follows the path and goal of simulating humans. If large generative AI models represented by ChatGPT and Sora have enhanced AI's “perception” and “cognition” of things and completed the first step toward artificial general intelligence, then decision-making AI models, which learn through machine learning to make suitable “choices” in complex and diverse game environments, give AI the ability to “act”, making independent decisions according to information about itself and its environment. This is a crucial step toward artificial general intelligence.
Although current AI research still has a long way to go before achieving AGI, the combination of generative AI and decision-making AI undoubtedly opens up new possibilities for realizing it, and the role of games as a testing ground for training AI has become more important. We can see that combining large language models with AI agents has already produced generalist game agents like SIMA, which can not only make effective decisions in a given environment but also continue to learn and adapt to unknown environments, completing various complex tasks according to natural language instructions and showing human-like intelligence. In the future, as training environments continue to grow, generalist game agents may come to understand and carry out more complex, higher-level language instructions. We also look forward to the day when general-purpose agents, having passed the test of the small world of games, step onto the vast stage of the wider world and serve the many industries of human society.
Thanks to Cao Jianfeng, Liu Lin, Wang Peng and others for their guidance during the writing of this article!








