Large models "downsize" to mobile phones: Wall-Facing Intelligence launches MiniCPM, a high-performance "little steel cannon"

Large models open a new era, and that era must be AI-native.

So said Li Dahai, co-founder and CEO of Wall-Facing Intelligence. In his view, the era of large models calls for AI-native applications, and hardware that runs large models on-device is the native hardware.

This afternoon, Wall-Facing Intelligence officially released MiniCPM, its flagship 2B on-device large model, bringing a new player to the on-device large model battlefield.

Small size, big power: the 2B "little steel cannon" goes on-device

We previously reported that Mistral AI, the "European OpenAI", released the compact Mistral-7B model. Although only 7B in size, its performance and efficiency have been widely praised.

MiniCPM, at only 2B, surpasses Mistral-7B in average scores on many mainstream Chinese and English evaluation lists. Its capabilities are also more comprehensive: on benchmarks such as C-Eval, CMMLU, and MMLU, its scores exceed those of Microsoft's star model Phi-2.

On average scores across English lists, MiniCPM significantly surpasses models of the same or larger scale, and is even comparable to models at the 13B, 30B, and 40B scale. On MT-Bench, the evaluation set closest to human judgment, MiniCPM can even compete with Claude 2: a true 2B "performance cannon".

So how does Wall-Facing Intelligence make a small model do a big model's job?

  • Computing power: efficient infrastructure throughout the pipeline, with 10x inference acceleration and 90% cost reduction;
  • Algorithm: the "model wind tunnel" uses small-scale experiments to predict large-scale behavior, finding efficient training configurations so that model capabilities take shape quickly;
  • Data: a modern data factory forms a closed loop from data governance to multi-dimensional evaluation, pulling model versions through rapid iteration.

MiniCPM also performs well on fundamental capabilities such as generation. At the launch, Li Dahai showed that MiniCPM not only accurately knows the elevations of Mount Huangshan and Mount Tai and can calculate the difference between them, but can even write and optimize code.

As competition among large models intensifies in 2024, the addition of multimodal capabilities is pushing artificial intelligence into a "synaesthesia" era. MiniCPM, billed as having the strongest multimodal capability in its class, also brings multimodality to mobile phones: it can accurately identify wilderness dangers such as poisonous mushrooms and venomous snakes.

Li Dahai demonstrated MiniCPM in practical use on site. With airplane mode turned on, he asked about self-rescue measures after accidentally eating a poisonous mushroom or getting lost in the wild; MiniCPM's answers were more practical than empty clichés. If you run into poor signal in the wild, these suggestions, given by MiniCPM entirely offline, could be a real help.

As scale soars in this technological competition, cost is the invisible axis of competitiveness for large models. Beyond its strong performance, MiniCPM's inference cost is only 1% of Mistral Medium's.

As an on-device large model, MiniCPM already runs on mainstream international phone brands and terminal CPU chips; even older phones can run it. Judged by throughput, however, "running" is merely running, and actual performance still has room to improve.

Building on its accumulated work in large models, Li Dahai also officially announced at the press conference that Wall-Facing Intelligence will open-source further, to "let large models fly into thousands of households." The open-source addresses (including the technical report) are as follows:
MiniCPM GitHub: https://github.com/OpenBMB/MiniCPM
OmniLMM GitHub: https://github.com/OpenBMB/OmniLMM
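As a rough illustration of what the open-source release makes possible, the sketch below loads a MiniCPM checkpoint with Hugging Face `transformers`. The model ID `openbmb/MiniCPM-2B-sft-bf16` and the `<用户>…<AI>` single-turn chat format are assumptions inferred from the OpenBMB repositories, not details confirmed by this article; check the MiniCPM README for the actual usage.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in MiniCPM's single-turn chat format (assumed)."""
    return f"<用户>{user_message}<AI>"


def chat(question: str, model_id: str = "openbmb/MiniCPM-2B-sft-bf16") -> str:
    """Generate a reply from a (hypothetical) MiniCPM checkpoint.

    Imported lazily because loading downloads roughly 2B parameters.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Calling `chat("黄山和泰山的海拔相差多少？")` would download the weights on first use, so this is practical only on a machine with enough memory; on phones, the team ships quantized builds instead.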

At the end of the press conference, Li Dahai also demonstrated the real-time multimodal interaction of the OmniLMM-12B model. Much like Google's earlier Gemini demo of multimodal capabilities, they had the model play a "guessing game" of rock, paper, scissors. The results showed that the smoothness, accuracy, and latency of its responses were all within an acceptable range.

Logical reasoning is another highlight. Given a picture with no text description, it can accurately infer the picture's meaning from small details such as clothing, sunglasses, and a guide cane, fully demonstrating its combined ability to "see and think."

Internet of Agents

"If Agent capabilities are used in end-side models, they can better serve specific scenarios and create more value. I think these two directions can support each other and produce some wonderful chemical reactions."

So said Zeng Guoyang, CTO of Wall-Facing Intelligence, on the relationship between on-device large models and agents. At the press conference, Li Dahai also reiterated the dual-engine strategy of "large model + agent"; in his view, the release of MiniCPM still serves that strategy.

In fact, as large models seek to be integrated into real-world scenarios, the AI Agent has become a key approach. Wall-Facing Intelligence was one of the first companies to propose the Agent concept, and its executives have even asserted that the future will be a world of Agents, where everything is an Agent.

Imagine making porridge: you simply put the ingredients into the rice cooker, wait a moment, and a steaming pot of porridge emerges. Throughout the process, the rice cooker's built-in agent automatically adjusts temperature and heat, and cooking completes without manual intervention.

At last year’s Yunqi Conference, Li Dahai said, “Large model + Agent will bring a new round of great technological changes.”

At the time, he compared large models to a car's engine, which provides the power. To build a complete car, however, you also need the steering, the chassis, and everything else.

Similarly, he believes that on top of the large-model engine, a series of higher-level technologies, such as memory and tool use, must be layered to open up broader applications and imagination. The AI Agent is the entity that carries these capabilities.

Since its inception, Wall-Facing Intelligence has planned its technical route and implementation direction around "large model + agent", successively launching its AI-agent "troika": XAgent, AgentVerse, and ChatDev.

All three products are driven by large models and represent cutting-edge innovations and applications of AI Agents, aiming to connect large models with real environments. Among them, XAgent is a large-model-driven agent application framework, AgentVerse is a general-purpose platform for agents, and ChatDev is a multi-agent collaborative development framework.
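To make the division of labor concrete, here is a minimal, self-contained sketch of multi-agent collaboration in the role-playing spirit of ChatDev. Every name in it (the `Agent` class, `collaborate`, the canned replies) is a hypothetical stand-in, not the actual XAgent, AgentVerse, or ChatDev API; a real framework would back each agent's `respond` with a large-model call.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    role: str
    respond: callable = None          # stand-in for a large-model backend
    memory: list = field(default_factory=list)

    def receive(self, message: str) -> str:
        self.memory.append(message)   # remember what was said to us
        return self.respond(message)


def collaborate(agents, task: str, rounds: int = 1):
    """Pass the task around the agents in order, collecting a transcript."""
    transcript = [("user", task)]
    message = task
    for _ in range(rounds):
        for agent in agents:
            message = agent.receive(message)
            transcript.append((agent.role, message))
    return transcript


# Two toy roles: one writes, one reviews. Replies are canned strings here.
coder = Agent("coder", respond=lambda m: f"code for: {m}")
reviewer = Agent("reviewer", respond=lambda m: f"review of: {m}")
log = collaborate([coder, reviewer], "write a CLI todo app")
for role, msg in log:
    print(f"{role}: {msg}")
```

The design point the troika shares is exactly this shape: agents with distinct roles exchanging messages in a fixed protocol, with the large model supplying each reply.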

Specific to the implementation direction of ToB, AI Agent is expected to play multiple roles within the enterprise and reshape the enterprise's operational processes and organizational structure. These AI Agents can perform various tasks, similar to traditional enterprise employees, thereby reducing costs and increasing efficiency.

For consumer applications (ToC), AI Agent may appear in the form of an intelligent assistant to provide users with personalized and convenient services. These intelligent assistants can understand and predict user needs and provide help and suggestions in real time, thereby improving user experience and quality of life.

So what should the future of AI Agent look like?

The vision and concept of Wall-Facing Intelligence is "Internet of Agents", which is to allow AI Agents to connect everything in the world and realize the transformation from "Internet of Everything" to "Intelligence of Everything".

This concept was proposed by Liu Zhiyuan, a tenured associate professor in the Department of Computer Science at Tsinghua University and founder of Wall-Facing Intelligence. He said confidently in a public speech:

Looking to the future, more people, devices, and objects can be connected through large-model-driven intelligent platforms, advancing the Internet of Everything toward the Intelligence of Everything and entering a new era of human-computer interaction on the Internet of Agents (IoA). We will usher in the second emergence of artificial intelligence.

In the imagination of this expert, who has worked in computing for decades, the second emergence of artificial intelligence means linking individual agents, each with its own strengths, into complex group intelligence that exhibits even more powerful emergent behavior.

In a recent online interview between Li Dahai and Kevin Kelly, the author of "Out of Control", Kevin Kelly also expressed similar views. He believes that collaboration between AIs has huge potential, and in the future an ecosystem composed of hundreds or even thousands of different AIs will be formed, releasing amazing power.

In fact, swarm intelligence, as a phenomenon commonly found in nature and society, refers to the ability of a group composed of many simple individuals to exhibit a high level of intelligent behavior. In nature, this intelligent behavior manifests itself in various forms, such as the collective actions of ant colonies, bee swarms, and fish schools.

When foraging or returning to the nest, ants release pheromones to guide their fellows toward the shortest path. When migrating, fish schools closely coordinate their positions and movements to form a protective swimming pattern, allowing the whole group to evade predators and save energy.
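The pheromone mechanism described above is the intuition behind ant colony optimization. The toy model below is a deterministic mean-field sketch (all parameters are illustrative, not drawn from any specific system): the fraction of ants on each route deposits pheromone in inverse proportion to the route's length, and pheromone evaporates, so the short route is reinforced and the colony converges on it.

```python
def simulate(short_len=1.0, long_len=2.0, steps=500, evaporation=0.01):
    """Mean-field toy of two routes to a food source.

    Shorter trips mean more round trips per unit time, so the short
    route receives pheromone faster; evaporation keeps levels bounded.
    """
    pheromone = {"short": 1.0, "long": 1.0}
    lengths = {"short": short_len, "long": long_len}
    for _ in range(steps):
        total = sum(pheromone.values())
        for route in pheromone:
            share = pheromone[route] / total        # fraction of ants choosing route
            pheromone[route] += share / lengths[route]  # deposit: shorter -> more
            pheromone[route] *= 1 - evaporation         # pheromone evaporates
    return pheromone


p = simulate()
print(p["short"] > p["long"])  # → True: the colony converges on the short route
```

The positive feedback here (more pheromone attracts more ants, which deposit more pheromone) is the same loop that lets real colonies find shortest paths without any central coordination.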

Li Dahai also borrowed a famous line from "The Three-Body Problem" to set a New Year's resolution: to make AGI arrive "faster", echoing Wall-Facing Intelligence's vision of the "intelligence of all things."

The beauty of life is being obsessed with something. Life is too short, don’t do frivolous things.


Ai Faner | Original link