How far is GPT-4 from becoming “Skynet”? Microsoft’s latest research suggests it is beginning to take shape

In many sci-fi movies, we see AI systems or intelligent robots that can think independently and carry out tasks on their own.

For example, "2001: A Space Odyssey" features HAL 9000, a superintelligent computer that manages the astronauts' mission; "The Terminator" gave us Skynet, a self-learning artificial intelligence system built to control US nuclear weapons and defense systems in order to keep the country safe.

AI systems like these, which can think and reason like humans and have a wide range of cognitive skills and abilities, are called AGI (Artificial General Intelligence).

The intelligence of an AGI is not limited to specific fields or tasks. It also covers reasoning, planning, problem solving, abstract thinking, understanding complex ideas, learning quickly, and learning from experience.

For example, although AlphaGo dominates the game of Go, it is not AGI. By contrast, WALL-E in the film "WALL-E" comes much closer to what people mean by AGI.

The concept of AGI has existed in the field of artificial intelligence for decades, and many researchers have been trying to realize AGI by developing new algorithms, models and methods. How far are we from achieving AGI?

A paper recently released by Microsoft Research argues that OpenAI's latest large language model, GPT-4, already shows an early form of AGI.

In the paper's words, the generality of GPT-4's capabilities, with many abilities spanning a wide range of domains, and its human-level or better performance on a wide range of tasks allow us to safely say that GPT-4 is an important step towards AGI.

The Spark of Artificial Intelligence

The Microsoft Research paper runs to 154 pages and is packed with the tests the researchers ran on GPT-4.

▲ Picture from: YouTube@AI Explained

Because the paper is so long, the YouTube creator AI Explained has picked out and condensed its key points. Let us follow his walkthrough to get an intuitive sense of what GPT-4 can do.

Note that the Microsoft researchers had access to the model during GPT-4's early development and experimented with it for about six months.

They used the unrestricted development version, not the final release that now has safety restrictions applied, so the conclusions in the paper apply only to the original GPT-4 model.

Let's get down to business. The paper points out that an important new ability of GPT-4 is using tools correctly with little instruction and no demonstrations, such as using a calculator. The GPT-3.5 version of ChatGPT (hereinafter the old version of ChatGPT) cannot do this.

Prompt: There is a river flowing from left to right, beside the river is a desert with pyramids, there are 4 buttons at the bottom of the screen, the colors are green, blue, brown and red

The researchers found that GPT-4 can be combined with Stable Diffusion to produce a detailed image from a text prompt, with the objects arranged as the prompt describes, which makes it more practical to use.

An important difference between humans and other animals is that humans discover and use tools, and AI is now slowly evolving in the same direction.

The researchers also had GPT-4 take mock software engineering exams on LeetCode.

Counting the best result out of five attempts, GPT-4 scored 86.4%, 60%, and 14.3% on the easy, medium, and hard problems, respectively.
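Scoring by the best of five attempts is close to the metric usually called pass@k. The sketch below shows one simple way to compute it, assuming a problem counts as solved when any of its first k attempts passes. The problem names and attempt logs are made up for illustration and are not data from the paper.

```python
from typing import Dict, List


def pass_at_k(results: Dict[str, List[bool]], k: int) -> float:
    """Fraction of problems solved by at least one of the first k attempts."""
    if not results:
        raise ValueError("no problems given")
    solved = sum(1 for attempts in results.values() if any(attempts[:k]))
    return solved / len(results)


# Hypothetical attempt logs: True means the submission passed all tests.
attempts = {
    "two-sum": [True, True, False, True, True],
    "lru-cache": [False, False, True, False, False],
    "median-arrays": [False, False, False, False, False],
    "word-ladder": [False, True, False, False, False],
}

print(f"pass@1 = {pass_at_k(attempts, 1):.2f}")  # only the first attempt counts
print(f"pass@5 = {pass_at_k(attempts, 5):.2f}")  # any of five attempts counts
```

With these invented logs, giving the model five tries instead of one raises the score from one problem in four to three in four, which is why a best-of-five figure should not be compared directly with a single-attempt figure.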

The paper modestly states that GPT-4's coding ability is close to human level. So how do humans actually perform?

LeetCode's data shows that the average human scores on the easy, medium, and hard problems are 72.2%, 38.7%, and 7%, respectively. And that is after excluding users who could not answer a single question.

By this measure, GPT-4 is already better at programming than many software engineers.

GPT-4 can not only handle ordinary programming work, but is also capable of complex 3D game development.

The paper mentions that GPT-4, working zero-shot, used JavaScript to generate an obstacle avoidance game demo in HTML.

With a little polishing, this demo could be turned into a finished game. When the researchers gave the old version of ChatGPT the same prompt, it replied that it could not do it.

To test its reasoning skills, the researchers gave it a question from the 2022 International Mathematical Olympiad.

▲ You can also challenge it~

Since GPT-4's training data only extends to 2021 (although it is a development version, it is still not connected to the Internet), the answer to this question is not in its training data, so it has to be worked out entirely through mathematical and logical reasoning.

GPT-4 followed a correct line of reasoning, but its final answer contained an error. The researchers described this as a basic calculation mistake (like a person who divides instead of multiplying during an exam), while ChatGPT could only produce a logically incoherent answer and performed far worse.

When asked estimation questions such as "how many golf balls can fit in a swimming pool", GPT-4 can also reason its way to an answer step by step.
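To make the golf ball question concrete, here is a minimal sketch of the kind of step-by-step estimate such a question calls for. It is not GPT-4's answer from the paper. The pool dimensions (25 m x 10 m x 2 m) and the packing fraction of 0.64 for randomly packed spheres are assumptions chosen for illustration; the ball diameter of 42.67 mm is the minimum allowed by the rules of golf.

```python
import math


def fermi_golf_balls(pool_volume_m3: float,
                     ball_diameter_m: float = 0.04267,
                     packing_fraction: float = 0.64) -> int:
    """Estimate how many balls fit: usable volume divided by one ball's volume."""
    ball_volume = (4.0 / 3.0) * math.pi * (ball_diameter_m / 2.0) ** 3
    return int(pool_volume_m3 * packing_fraction / ball_volume)


# Assumed pool: 25 m long, 10 m wide, 2 m deep.
pool_volume = 25 * 10 * 2
estimate = fermi_golf_balls(pool_volume)
print(f"roughly {estimate:,} golf balls")
```

Under these assumptions the estimate lands in the millions. The exact figure matters less than the chain of reasoning: pick dimensions, compute volumes, and correct for the gaps between spheres.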

The researchers then found that GPT-4 can call the APIs of other applications to retrieve a user's emails, calendar, coordinates, and so on, and use them to handle assistant tasks such as ordering meals, booking tickets, and replying to emails.

This is already reflected in the ChatGPT plugins recently announced by OpenAI. The GPT-4 model can do far more than generate text. Combined with other applications' APIs, it can act as something closer to a complete system.

The researchers also found a capability that is easy to overlook: GPT-4 can build a mental model of humans.
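The idea of a model driving other applications can be sketched as a small dispatcher: the model emits a structured request, and ordinary code runs the matching function. Everything below is hypothetical, including the JSON format, the tool names, and the fake calendar data. It is not the ChatGPT plugin interface or the setup used in the paper.

```python
import json
import operator
from typing import Any, Callable, Dict, List

# Arithmetic operators the toy calculator supports.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def calculator(expression: str) -> float:
    """Evaluate a simple 'a op b' expression such as '123 * 456'."""
    left, op, right = expression.split()
    return OPS[op](float(left), float(right))


def get_calendar_events(date: str) -> List[str]:
    """Look up events in a made-up, in-memory calendar."""
    fake_calendar = {"2023-04-01": ["Lunch with Alice at 12:00"]}
    return fake_calendar.get(date, [])


# Registry mapping tool names to the functions that implement them.
TOOLS: Dict[str, Callable[..., Any]] = {
    "calculator": calculator,
    "get_calendar_events": get_calendar_events,
}


def dispatch(model_output: str) -> Any:
    """Parse a JSON tool request (assumed format) and run the named tool."""
    call = json.loads(model_output)
    name = call["tool"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Stand-ins for text a model might emit when it decides to use a tool.
print(dispatch('{"tool": "calculator", "arguments": {"expression": "123 * 456"}}'))
print(dispatch('{"tool": "get_calendar_events", "arguments": {"date": "2023-04-01"}}'))
```

In a real assistant, the result of each call would be fed back to the model so it can decide on the next step. The sketch stops at the dispatch step to stay short.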

The researchers set up a scene for it, and GPT-4 analyzed the psychological processes of people in the scene and the corresponding actions.

In other words, GPT-4 can interpret the connection between human behavior and psychology like humans, rather than simply seeing the action itself, which is a great advancement for AI.

One More Thing?

The paper is divided into ten chapters, covering GPT-4's multimodal capabilities (related to generating visual content), its ability to generate and understand code, its mathematical abilities, its interaction with the world, its interaction with humans, and its discriminative capabilities, as well as GPT-4's limitations, social impact, and future directions.

The paper works through GPT-4's capabilities methodically, layer by layer. It attracted widespread attention as soon as it was released.

Interestingly, some netizens found that the authors had left some information hidden in the comments of the paper's LaTeX source code.

▲ Judging from the comments, DV-3 likely stands for Davinci 3

For example, the internal name of GPT-4 is actually DV-3. It was also listed as the "third author" of the paper. Perhaps out of privacy concerns, the authors deliberately hid this.

Netizens also found that the authors were not very clear about the actual cost of GPT-4, and seemed to mistakenly refer to GPT-4 as a text-only model rather than a multimodal one.

The part of the paper dealing with toxic content was also removed before publication, perhaps to avoid unnecessary negative impact on OpenAI.

In short, if you are interested in what GPT-4 can do, what its current limitations are, or how AI is progressing, this paper is a good way to learn more about the most powerful large language model available.

The original address is here: https://arxiv.org/pdf/2303.12712.pdf

Enjoy it.
