The AI version of Siri may take over your phone

Siri has fallen behind.

According to The New York Times, Apple software chief Craig Federighi and machine-learning executive John Giannandrea reached this conclusion after spending weeks testing ChatGPT last year, and decided to give the 13-year-old voice assistant a major overhaul.

With less than two weeks to go before the WWDC developer conference, technology reporter Mark Gurman brought the latest news on the AI version of Siri. It seems this voice assistant, long mocked as "artificial stupidity" rather than artificial intelligence, will indeed get a major update.

The bad news: some of its biggest features may not arrive this year.

Even so, Gurman says Apple still regards iOS 18 as its most important upgrade in history. WWDC24 kicks off at 1 a.m. on June 11, Beijing time, and APPSO will bring you the latest reports from Apple Park, so stay tuned.


The "app control" feature: arriving late, but worth the wait

According to reports, with AI support, Siri will be able to go further in "controlling apps," precisely invoking specific functions inside an application.

For example, you could ask Siri to move files from one folder to another, open a specific news article, or even summarize that article.

Although you can already use Siri to send text messages and even WeChat messages, the AI-overhauled Siri will go further: it can analyze how people use their devices and learn to automate more and more operations. Apple plans to support "hundreds" of commands in its own apps.

▲ Siri has supported sending WeChat messages since iOS 10

It sounds great, but Gurman says the feature will initially be limited to Apple's own apps and will not launch this year; at the earliest, it may arrive in a later iOS 18 update next year.

There are also reports that only the iPhone 15 Pro with its A17 Pro chip and Macs with M1 or later will support the richer on-device AI features, leaving older models out.

At first, the new Siri may only understand and execute one command at a time, but it is expected to support chained commands in the future. For example, you could ask it to summarize a recorded meeting (another feature expected in iOS 18) and then email the summary to a colleague, even adding a few lines of text along the way, with the whole sequence completed in a single sentence.

The new Siri is also expected to work like the other AI features in iOS 18: a routing system will decide, based on the computing power a task requires, whether it can be completed locally on the device or needs to run in the cloud.
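Apple has not published how this routing works, but the idea can be pictured as a simple dispatch policy: estimate a task's compute cost and privacy sensitivity, then choose where it runs. The names and thresholds below are invented purely for illustration.

```python
# Hypothetical sketch of an on-device vs. cloud dispatcher.
# Apple has not detailed its routing; every name and threshold
# here is invented for illustration only.

from dataclasses import dataclass

@dataclass
class AITask:
    name: str
    estimated_gflops: float       # rough compute estimate for the task
    needs_private_data: bool = False

ON_DEVICE_BUDGET_GFLOPS = 50.0    # made-up capability ceiling for the local chip

def route(task: AITask) -> str:
    """Decide where a task should run."""
    # Privacy-sensitive tasks stay local even if slower.
    if task.needs_private_data:
        return "on-device"
    # Cheap tasks run locally; anything heavier goes to the cloud.
    if task.estimated_gflops <= ON_DEVICE_BUDGET_GFLOPS:
        return "on-device"
    return "cloud"

print(route(AITask("summarize a notification", 10.0)))        # on-device
print(route(AITask("long-document summarization", 400.0)))    # cloud
```

In practice the decision would involve model size, battery state, and connectivity, but the shape of the logic — a per-task triage between local silicon and server — is the same.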

If we had to sum up Siri's development over the past 13 years, "brought into the world but never raised" is probably the most fitting description.

At the iPhone 4s launch event in 2011, Siri made its debut and stunned the audience and the world. The demo went like this: you could ask it the weather in a city, or how a certain stock was performing, and with a single sentence set a reminder that would pop up automatically when you left the office. It was a remarkably cool, futuristic feature at the time.

Yet 13 years later, those same functions still seem to be Siri's most-used scenarios. Even the home control and Shortcuts support added later are mostly simple on/off switches. In truth, Siri has never made a qualitative leap.

Even when challenged by later voice assistants such as Google Assistant, Microsoft Cortana, and Samsung Bixby, and even as China's Xiaomi "Xiao Ai" grew more and more useful, Apple showed little urgency; only with the emergence of ChatGPT did it realize that Siri had fallen behind the times.

▲ Xiao Ai was connected to a large language model last year

Apple does tout Siri improvements at its events every now and then: it has become smarter, understands more instructions, can do more things. But in many cases, understanding a request does not mean it can be done, and being able to do it does not mean it is done well.

For example, tell Siri you want to take a selfie, and it will dutifully open the Camera app and switch to the front camera — and then nothing more happens. You still have to reach out and tap the shutter yourself. Sometimes Siri will only jump to the Camera app without even switching to the front camera.

Samsung's Bixby, by contrast, automatically starts a selfie countdown, so the whole process needs no manual input at all.

Apple's stock Camera app already has a countdown timer, so this flow hardly requires any AI; Apple simply never polished the experience.

Siri’s rival is Shortcuts

Have you ever used the "Shortcuts" feature?

After Apple acquired Workflow, the product was integrated into the iPhone as "Shortcuts" in iOS 12, and arrived on the Mac in macOS 12.

▲ Shortcuts has been integrated across Apple's ecosystem

The feature enables many advanced iOS tricks, such as one-tap clock-in on DingTalk, one-tap conversion of Live Photos and videos into GIFs, and even automations like "automatically turn off the alarm on holidays."

But the feature is not friendly to novices. To create a new shortcut, the user has to pick operation modules in an interface that resembles visual scripting and wire them together with conditions like "if… then." A shortcut like the "holiday alarm" involves multiple logical judgments and automated actions; even following a tutorial step by step, it is easy to get wrong.
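Stripped of the drag-and-drop interface, the "holiday alarm" shortcut boils down to a short chain of conditionals. A rough sketch of that logic in Python — the holiday list is a placeholder; a real automation would query the system calendar:

```python
# Sketch of the decision logic inside a "holiday alarm" shortcut.
# The HOLIDAYS set is a stand-in; a real shortcut would read the
# system calendar rather than a hard-coded list.

import datetime

HOLIDAYS = {datetime.date(2024, 6, 10)}  # e.g. Dragon Boat Festival 2024

def should_ring(today: datetime.date) -> bool:
    # "if it is a public holiday, then skip the alarm" ...
    if today in HOLIDAYS:
        return False
    # ... "if it is a weekend, then skip the alarm" ...
    if today.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    # ... otherwise the alarm fires as usual.
    return True

print(should_ring(datetime.date(2024, 6, 10)))  # False: holiday
print(should_ring(datetime.date(2024, 6, 11)))  # True: ordinary Tuesday
```

Three branches is all it takes — yet assembling exactly this in the Shortcuts editor, with the right calendar queries and comparisons, is where most users stumble.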

Although Apple provides a gallery of ready-made shortcuts, most of them are simple and rarely address users' real pain points.

It is similar to the "Good Lock" module on Samsung Galaxy phones: very powerful customization, but a high barrier to entry.

One of the most important abilities of large AI models is understanding natural language and reasoning over it. In other words, if a user tells a large-model AI, "I'm off work, help me check in," the AI knows this means "open DingTalk" and then "clock in," instead of replying that it "didn't quite catch that."

Today's Siri is not entirely without this ability. Asking Siri to remind you to buy a birthday cake for your family when you leave the office was demonstrated back at the iPhone 4s launch; behind it is the same process of understanding the user's words and converting them into operations.

▲ Many of the functions shown at Siri's debut are still its main capabilities today

Siri backed by large AI models should be able to do far more than that. Users describe their complex needs in natural language; Siri understands, converts the request into the logic of a script, and executes the steps itself. That is a truly "quick" command, with no complicated programming for the user to face.
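In spirit, this is a pipeline from an utterance to an ordered list of executable steps. A toy sketch of that pipeline — the intent table and action names are invented, and the whole point of a large model is to replace this brittle keyword matching with genuine language understanding:

```python
# Toy sketch: natural language in, a sequence of app operations out.
# The INTENTS table and action names are invented for illustration;
# a real assistant would use a language model, not keyword matching.

INTENTS = {
    "check in": ["open DingTalk", "tap clock-in"],
    "summarize meeting": ["open Voice Memos", "transcribe recording",
                          "generate summary"],
}

def plan(utterance: str) -> list[str]:
    """Map an utterance to an ordered list of operations."""
    steps: list[str] = []
    for phrase, actions in INTENTS.items():
        if phrase in utterance.lower():
            steps.extend(actions)
    # Fall back to asking the user when nothing matches.
    return steps or ["ask user to clarify"]

print(plan("I'm off work, help me check in"))
# ['open DingTalk', 'tap clock-in']
```

The gap between this sketch and a real AI Siri is exactly the gap between matching a phrase and understanding an intention — but the output contract, a sequence of operations the system then performs, is the same.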

▲ ChatGPT taught me how to build a holiday alarm with iOS Shortcuts. The logic is clear, but the result doesn't seem very usable.

Beyond letting users automate operations more naturally, you can also expect Siri to become a more "proactive" assistant.

If you have used an iPhone long enough, you will notice that it sometimes pops up suggestions on its own. For example, when you put in Bluetooth earphones, it suggests opening NetEase Cloud Music, because that matches your usage pattern; or when you charge overnight, the phone slows the charging rate to preserve the battery while still finishing before you wake up, because it has learned that you habitually plug in for a long stretch before bed.

These are the results of machine learning, and exactly the kind of AI Apple has long been working on. Modern people spend hours on their phones every day — shopping, eating, working — so naturally the phone comes to know you inside out.

Imagine Apple's strong contextual awareness combined with far more capable automation: the AI version of Siri could really become a true "personal assistant," anticipating things before you even ask and arranging everything according to your needs.

For example, from the flight you booked, it could automatically check the weather at your destination and set an alarm in advance. Based on your travel habits, real-time traffic, and the estimated journey time, it could call a taxi for you ahead of schedule; at the airport it pops up your boarding pass and checks you in through the airline's app; and once you arrive, it opens Dianping's recommended restaurants. A personal assistant and tour guide in one.

Of course, realizing such a seamless flow would require developers and Apple to meet each other halfway. But AI is advancing faster than we imagined; perhaps in the future, AI will simply imitate human operations directly.

The UI we humans understand, AI is learning too

Although the new Siri's intelligent operations will initially support only Apple's own apps, I prefer to believe this is just the starting point — or the midpoint — of Apple's roadmap for an AI Siri, not the end.

I believe Apple's ultimate AI goal is a scene like this: you wake up in the morning, say "Siri," and ask it to open the WeChat public account "Aifaner" and read the latest article aloud — listening to Aifaner's morning report without lifting a finger.

▲ An iPhone concept case from years ago: the idea was to personify Siri and free your hands with voice

Shortcuts can operate third-party apps mainly because Apple opened up an API: third-party developers can expose operations inside their apps as modules that Shortcuts can execute.
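The mechanism resembles a registry: each app declares the operations it is willing to expose, and the system can invoke only what was declared. A minimal sketch of that pattern — the names and the sample pickup code are invented; on iOS the real mechanism is Apple's intents / App Intents frameworks:

```python
# Minimal sketch of an "exposed operations" registry. All names and
# values are invented; on iOS the real mechanism is Apple's
# intents / App Intents frameworks.

registry: dict[str, dict] = {}

def expose(app: str, action: str):
    """Decorator an app uses to publish one of its operations."""
    def wrap(fn):
        registry.setdefault(app, {})[action] = fn
        return fn
    return wrap

# A hypothetical app opts in to exactly one action.
@expose("Cainiao", "show_pickup_code")
def show_pickup_code() -> str:
    return "pickup code: 8-2-1024"   # placeholder value

def invoke(app: str, action: str) -> str:
    ops = registry.get(app, {})
    if action not in ops:
        # If the developer never exposed the action, the assistant
        # simply cannot perform it -- the limitation described below.
        return f"{app} does not expose '{action}'"
    return ops[action]()

print(invoke("Cainiao", "show_pickup_code"))   # works: it was declared
print(invoke("Cainiao", "track_package"))      # fails: never exposed
```

The design choice is deliberate: the assistant can only ever call into the surface the developer opted to publish, which is both a safety guarantee and, as the next paragraph notes, a hard ceiling on what Siri can do.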

But this depends on whether the app's developer is willing to provide the relevant modules and operations. If the Cainiao app does not expose an action for displaying the pickup code, then no matter how smart Siri is, it cannot open Cainiao and show the code on its own.

What if we went a step further and let the AI directly understand what a pickup code is and where it lives in the app, then open it on its own when asked?

This may sound a bit too sci-fi, but the industry is already making related attempts.

At the Microsoft Build 2024 developer conference last week, Microsoft ran a live demo: Copilot, powered by GPT-4o, could watch the screen in real time and guide a player through Minecraft.

In the demo, Copilot guided the player through crafting a sword in smooth, natural language, even with a hint of emotion. It could identify the items in the player's inventory and point out which materials were missing — like a veteran walking you through the game.

This shows that AI assistants are no longer just question-and-answer text bots, or tools that only code and crunch data in the background: they are beginning to truly understand the UI that we humans see, and to know how we operate it.

The "AI hardware" Rabbit R1, which had its moment of hype earlier this year, essentially abandons the conventional interface and handles every service entirely through an AI voice assistant. Rabbit claims its "Large Action Model" (LAM) technology can, after understanding a user's instruction, imitate human operations on its servers and complete the task directly in the relevant web pages and apps.

▲ Rabbit R1 claims to use voice to achieve cross-application and cross-platform operations.

Although the Rabbit R1's real-world performance falls far short of the scene the company painted, the vision itself is appealing. And the excellent visual understanding of models like GPT-4o makes a future in which AI takes over these operations for humans feel genuinely close.

As a company with enormous pull among developers, Apple has no need to copy the startup Rabbit's wholesale use of a "LAM." It can open up the relevant interfaces and provide SDKs, letting major third-party developers build native support for AI operations into their apps and delivering a more mature, stable voice-driven experience.

Apple's own research shows it does have this idea: building on Apple's UI design guidelines, Siri could more easily understand everything on the iPhone's screen.

▲ Apple is also studying how to make large models understand the UI

Its technology may lag behind rivals', but Apple's lead in user numbers and ecosystem building can become a formidable advantage.

9to5Mac, previewing Apple's upcoming Siri and AI updates at WWDC, put it this way:

We may not see anything particularly revolutionary, but incorporating AI into systems and apps used by millions of people every day is a revolution in itself.

Compared with trendy hardware like the Rabbit R1, the smartphone everyone already owns may be the best vehicle for AI.

Users don't need to know they are using an AI feature — but by the time they ask Siri to plan a trip and book the flights, AI will already be profoundly changing their lives.
