What happens when ChatGPT is installed on the Boston Dynamics robot dog?

Boston Dynamics' "big yellow dog," Spot, is arguably the archetypal internet-famous robot.

Spot can patrol, carry bricks, and dance. Since its debut, it has drawn the attention of robot enthusiasts all over the world. Who could resist a robot dog with such agile movements, an endearingly earnest posture, and so much charm?

After years of development, being cute is no longer Spot's "main business." According to Boston Dynamics, Spot can now help humans complete tasks in specific scenarios, such as inspecting instruments on ocean-going ships, surveying complex terrain, and assisting in rescue work.

What would happen if you paired Spot's agile body with a brain as smart as ChatGPT?

Artificial intelligence expert Santiago Valdarrama has built exactly that: a Spot with the "strongest brain."

Using ChatGPT to greatly simplify human-robot interaction

Santiago shared a video on Twitter of himself interacting with a modified Spot, possibly the first robot dog that can talk and chat.

As the demonstration video shows, this is far more than bolting "Siri" onto Spot. When it answers a question, its body sways with the content and tone of the sentence, as if WALL-E from science fiction had stepped into reality.

When you ask a simple yes-or-no question, it answers with body language, nodding or shaking its head, instead of speaking. Clearly, this Spot is much more than a robot with a built-in smart speaker.

With ChatGPT connected, the biggest change is that Spot can understand human speech and communicate with users in natural language.

Santiago demonstrated one scenario. He told Spot that the room was too crowded and that it was in the way, and asked it to back up a little. As soon as he finished speaking, Spot understood what he meant and took a few steps back.

Doesn't it look like someone ordering robots around in a science fiction movie?

In the past, operating Spot required a large, drone-style remote controller or a computer for entering complex commands. Now ChatGPT gives Spot strong natural language understanding, so you can interact with the robot simply by speaking.

In this process, ChatGPT acts as a translator between humans and robots: it turns human speech into instructions the machine can understand, and then expresses the robot's feedback as physical actions or spoken language.
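The translator role described above can be sketched as a small validation layer between the model's reply and the robot. This is only an illustration under stated assumptions, not Santiago's implementation: the JSON reply format, the action names, and the `parse_model_reply` helper are all hypothetical, and no real ChatGPT or Spot SDK call is made.

```python
import json
from dataclasses import dataclass

# Hypothetical command vocabulary for illustration; not taken from the Spot SDK.
ALLOWED_ACTIONS = {"move", "turn", "speak", "nod", "shake_head"}


@dataclass
class RobotCommand:
    action: str
    value: float = 0.0  # meters for "move", degrees for "turn"
    text: str = ""      # sentence to speak for "speak"


def parse_model_reply(reply: str) -> RobotCommand:
    """Turn the model's JSON reply into a validated command.

    Assumes the model was instructed to answer with a JSON object such as
    {"action": "move", "value": -0.5}. Anything else is rejected so that
    free-form text never reaches the robot.
    """
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    action = data.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return RobotCommand(
        action=action,
        value=float(data.get("value", 0.0)),
        text=str(data.get("text", "")),
    )


if __name__ == "__main__":
    # "Back up a little" might come back from the model as a negative move.
    print(parse_model_reply('{"action": "move", "value": -0.5}'))
```

Restricting the model to a fixed vocabulary is one plausible way to keep free-form language on the human side and structured commands on the robot side.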

Santiago explained that they fed Spot's files into ChatGPT and described the structure of the files and how to read them, which made voice dialogue with Spot, and voice control of it, possible.

The interaction between the operator and Spot is greatly simplified. People can ask it directly, "How much battery do you have left?" and Spot answers aloud: Google's text-to-speech technology speaks ChatGPT's reply through Spot's "mouth."

Spot (or rather the built-in ChatGPT) answers according to the actual situation. For example, when you ask what task it should complete next, it answers from the configured task list, which largely prevents ChatGPT from fabricating facts.
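One common way to keep a model's answers tied to real data is to place that data in the prompt and tell the model to use nothing else. The sketch below shows the idea; the function name, the prompt wording, and the state fields (`battery_percent`, `tasks`) are assumptions for illustration and are not taken from the demo.

```python
def build_grounded_prompt(question: str, battery_percent: int, tasks: list[str]) -> str:
    """Build a prompt that restricts the model to the robot's known state."""
    # Number the tasks so the model can refer to "the next task" unambiguously.
    task_lines = "\n".join(f"{i}. {task}" for i, task in enumerate(tasks, start=1))
    return (
        "Answer using only the robot state below. "
        "If the answer is not in the state, say you do not know.\n"
        f"Battery: {battery_percent}%\n"
        f"Task list:\n{task_lines}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    prompt = build_grounded_prompt(
        "What is your next task?",
        battery_percent=80,
        tasks=["Inspect gauge", "Return to dock"],
    )
    print(prompt)
```

The resulting string would then be sent to the language model; that call is left out here because the article does not describe it.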

When the operator gives commands such as turning 90 degrees or moving forward 1 meter, Spot uses its internal sensors and positioning system to carry them out accurately, and does not lose control because its "brain is too developed."
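To make the geometry of such commands concrete, here is a worked sketch of the target pose after "turn 90 degrees" and "move forward 1 meter" in a flat 2D frame. It does not show how Spot's positioning system works; `Pose2D`, `turn`, and `move_forward` are hypothetical names used only for this arithmetic.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    x: float        # meters
    y: float        # meters
    heading: float  # degrees, counterclockwise from the +x axis


def turn(pose: Pose2D, degrees: float) -> Pose2D:
    """Rotate in place; positive values turn counterclockwise."""
    return Pose2D(pose.x, pose.y, (pose.heading + degrees) % 360.0)


def move_forward(pose: Pose2D, meters: float) -> Pose2D:
    """Translate along the current heading."""
    rad = math.radians(pose.heading)
    return Pose2D(
        pose.x + meters * math.cos(rad),
        pose.y + meters * math.sin(rad),
        pose.heading,
    )


if __name__ == "__main__":
    start = Pose2D(0.0, 0.0, 0.0)
    target = move_forward(turn(start, 90.0), 1.0)
    # x is zero up to floating-point rounding, y is 1 meter.
    print(round(target.x, 6), round(target.y, 6), target.heading)
```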

Interestingly, when you ask it "Who are you?", it answers "I am OpenAI," rather than saying it is the robot dog Spot.

Santiago's company, Levatas, is an AI company that cooperates with Boston Dynamics to help companies explore how to use robots to solve practical problems.

Santiago believes that the greatest practical value of putting ChatGPT on Spot is that it turns complex data only technicians could handle into natural language anyone can understand.

Every time a robot performs a task, an operator has to enter a lengthy set of instructions. After the job is done, the robot generates a large amount of data, and only specialist technicians can diagnose problems from it.

Now, with ChatGPT, a sentence or two gets it done.

As the barrier to operating robots falls, their use cases will multiply.

The potential of large AI models should not be underestimated

The "most powerful brain" version of Spot was not built overnight. A month earlier, Santiago released a video introducing a Spot that could "understand human speech," which used Whisper, another important OpenAI model.

In this "first edition" of the smart Spot, Santiago explained the principles in more detail:

Whisper converts speech to text efficiently and in real time, with impressive accuracy and speed. By combining Whisper with Spot's SDK, the system extracts keywords from human speech and then sends commands to Spot through the SDK.
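The keyword step can be sketched as follows, assuming the speech has already been transcribed to text. The phrase table, the command names, and `extract_commands` are hypothetical; the real pipeline would pass the resulting commands to the Spot SDK, which is not called here.

```python
import re

# Hypothetical mapping from spoken phrases to command names.
KEYWORD_COMMANDS = [
    (("undock", "leave the dock"), "undock"),
    (("stand", "get up"), "stand"),
    (("sit",), "sit"),
    (("inspect", "check"), "inspect"),
]


def extract_commands(transcript: str) -> list[str]:
    """Return the commands mentioned in a transcript, in spoken order."""
    # Lowercase, drop punctuation, and collapse whitespace before matching.
    text = re.sub(r"[^a-z\s]", " ", transcript.lower())
    text = " ".join(text.split())
    found = []
    for phrases, command in KEYWORD_COMMANDS:
        for phrase in phrases:
            match = re.search(rf"\b{re.escape(phrase)}\b", text)
            if match:
                found.append((match.start(), command))
                break  # one hit per command is enough
    # Sort by position so commands run in the order they were spoken.
    return [command for _, command in sorted(found)]


if __name__ == "__main__":
    print(extract_commands("Spot, leave the dock, get up and check the meter."))
```

Matching whole words avoids false hits such as "sit" inside "situation". A keyword table like this is simple but brittle, which is one reason a language model is attractive for the later version.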

With a single sentence, you can make Spot leave its charging dock, stand up, and check whether a gauge shows a problem, which greatly reduces the effort required of human operators.

Santiago's work offers a useful angle on a widely discussed question: what is the significance of a large language model such as ChatGPT?

At first, people thought of ChatGPT as a pure text-generation AI. It had relatively strong natural language understanding and could write articles and reports. It was not entirely reliable, but it was still impressive.

Later, people discovered that, given appropriate instructions, ChatGPT can write code or process text in place of humans, much like a computer driven by natural language.

After OpenAI released plugins, ChatGPT could be combined with many internet applications, folding cross-platform operations into a conversation and becoming a new gateway to the internet.

Microsoft's Copilot has sparked people's imagination about the next stage of human-computer interaction: the graphical interface is not always the most reasonable paradigm, and many operations we are used to can be replaced by dialogue.

When we go back to Spot, we can clearly see the significance of the large language model: to simplify human-computer interaction and give robots more advanced intelligence.

Whether ChatGPT serves as a new internet portal or as a remote control for industrial robots that can learn, analyze problems, and perform high-precision operations, it essentially turns complex instruction sets into natural language, reducing operating costs.

This empowerment could let future industrial robots be more than tools that execute commands, and bring them closer to human-level intelligence.

Just as the graphical interface turned complex command lines into intuitive icons, the complex business of clicking through icons is now becoming a simple sentence. We are moving from the graphical user interface to a new stage: the natural language user interface.

At this stage, large language models will become a key technology for industrial robots, simplifying human-computer interaction, improving production efficiency, and further advancing technological development.

In some cases, language models may even outperform humans at understanding language. Arvind Narayanan, a professor at Princeton University, described a personal example on his blog.

Narayanan set up a voice interface to ChatGPT for his daughter, who is almost four. Like all children, she is full of curiosity and often asks ChatGPT all sorts of questions.

To his surprise, when he told ChatGPT it was talking to a child, ChatGPT became very good at expressing empathy.

Daughter: "What happens when the lights are turned off?"

ChatGPT: "With the lights turned off, it will be very dark and can be a little scary. But don't worry! There are many things that can help you feel safe and comfortable in the dark."

"Bloomberg" columnist Parmy Olson pointed out that Microsoft and Google are busy putting large models into search engines, but fail to see that ChatGPT is better suited to being an emotional companion than a provider of facts.

Olson also had ChatGPT take an emotional intelligence test, and it outscored both Olson and her colleagues. Precisely because ChatGPT is trained on text from the internet, its data contains a lot of misinformation, but that same training also makes it better at imitating empathy.

These cases show that the ability of large language models to understand natural language has great practical potential.

With such strong comprehension, large language models can bring about another real innovation in human-computer interaction. They also open up new possibilities for the development of robots.

Large language models allow robots to better understand and process instructions, and to learn and adapt to new tasks and environments faster.

In its cover article "Dark Factory," "The New Yorker" noted that a common difficulty for industrial robots today is designing an "end effector" that works like a human hand, so that the robot can grasp objects of various sizes and shapes with varying amounts of force.

If this technical problem is solved, robots can do more delicate work, and the degree of automation in many fields will be greatly improved. For example, the picking of various fruits can be automated, and Foxconn's assembly line no longer needs so many workers.

Future industrial robots should not only be a tool for command execution, but also have an intelligence level comparable to that of the human brain, with the ability to learn, analyze problems, and perform high-precision operations.

On the industrial production line, "muscular" industrial robots are more flexible and efficient, able to better deal with various production problems and improve production efficiency and quality. For example, in the field of automobile manufacturing, large language models can endow robots with stronger intelligence and cognitive capabilities, enabling them to better complete diverse tasks.

In the field of medical robotics, robots can communicate with doctors and patients through natural language processing technology to provide better medical services.

Large language models have given the robot industry a powerful brain and opened up more general application scenarios for robots. They are likely to become a core technology of the fourth industrial revolution. The "voice version" of Spot is the first spark of this technological change.

Source: Ai Faner