Baichuan Intelligence Releases Baichuan 3, a Large Model with Over 100 Billion Parameters, Claiming It Surpasses GPT-4 on Chinese Benchmarks

"Tsinghua-based" large-model startup companies are recruiting again.

On January 29, Baichuan Intelligence, founded by Sogou founder Wang Xiaochuan (a Tsinghua University graduate), officially released Baichuan 3, a large language model with over 100 billion parameters. The model not only performs well on multiple authoritative benchmarks, but also surpasses GPT-4 on Chinese metrics.

According to the reported results, Baichuan 3 reaches roughly 90% of GPT-4's level on English benchmarks such as MMLU. On Chinese benchmarks such as CMMLU and GAOKAO, it leads the field, surpassing GPT-3.5 by a wide margin and outperforming GPT-4 across the board.

On mathematics and code benchmarks, as well as alignment evaluations such as MT-Bench and IFEval, Baichuan 3 surpasses models such as GPT-3.5 and Claude and sits at the front of the industry, trailing only GPT-4 by a small margin.

AI plus medicine is a key application area for large models. Medical problems are complex and fast-changing, medical knowledge is updated rapidly, and accuracy requirements are high, so a model must demonstrate strong understanding and decision-making across text, images, and audio.

Baichuan Intelligence therefore regards medicine as the "crown jewel" of large models.

Baichuan 3 underwent extensive training and optimization in the medical field, with notable results. Its performance on Chinese medical tasks such as MCMLE, MedExam, and CMExam exceeds GPT-4's, and its performance on English medical tasks such as USMLE and MedMCQA is close to GPT-4's level, earning it the claimed title of the Chinese model with the strongest medical capabilities.

According to official disclosures, to strengthen this capability, Baichuan 3 built a medical dataset of more than 100 billion tokens during the pre-training stage, covering medical knowledge from theory to practice to ensure professional depth in the medical field.

At the inference stage, Baichuan Intelligence optimizes prompts for medical knowledge by describing the task precisely and selecting in-context samples appropriately, leading the model to output more accurate and logically structured reasoning steps.
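The prompting approach described above — a precise task description plus carefully chosen samples — can be sketched as a few-shot prompt builder. This is an illustrative sketch only: Baichuan has not published its prompting code, and the helper names (`build_medical_prompt`, `EXEMPLARS`) and sample questions are hypothetical.

```python
# Hypothetical few-shot prompt construction for a medical QA task.
# The two levers the article mentions: (1) a precise task description,
# (2) a small set of carefully selected in-context exemplars.

EXEMPLARS = [
    {
        "question": "Which vitamin deficiency causes scurvy?",
        "answer": "Vitamin C (ascorbic acid) deficiency causes scurvy.",
    },
    {
        "question": "What is the first-line treatment for anaphylaxis?",
        "answer": "Intramuscular epinephrine is the first-line treatment.",
    },
]

TASK_DESCRIPTION = (
    "You are a careful medical assistant. Answer the question "
    "concisely and state the key reasoning step."
)

def build_medical_prompt(question: str, exemplars=EXEMPLARS) -> str:
    """Assemble a few-shot prompt: task description first, then the
    exemplars as Q/A pairs, then the new question awaiting an answer."""
    parts = [TASK_DESCRIPTION, ""]
    for ex in exemplars:
        parts.append(f"Q: {ex['question']}")
        parts.append(f"A: {ex['answer']}")
        parts.append("")
    parts.append(f"Q: {question}")
    parts.append("A:")
    return "\n".join(parts)

print(build_medical_prompt("Which organ produces insulin?"))
```

The resulting string would be sent to the model as-is; the quality of the exemplars, not just their count, is what the article credits for the more accurate outputs.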

Semantic understanding and text generation are the most fundamental capabilities of large models and can be regarded as core pillars of artificial intelligence. Wang Xiaochuan has pointed out that language is the boundary of the human cognitive world, and that at the level of technical implementation, perceiving and understanding language is harder than perceiving images and video.

In his view, Newton's three laws of motion abstracted the workings of the universe into mathematical expressions, a major leap for human cognition. The same applies to today's large models: mastering the laws behind language means mastering knowledge itself, along with human thinking, communication, and culture.

A language model is like an atomic bomb: it may ignite a hydrogen bomb. There will be better models in the future, and building them is what we need to do next.

Baichuan 3's strong Chinese-language capability is one of its biggest highlights. Even with demanding genres such as Song ci (classical Chinese lyric poetry), with their intricate forms, tight structures, and rich rhyme schemes, Baichuan 3 can produce works with coherent content, neat parallelism, and harmonious rhymes.

Officially, Baichuan 3 combines RLHF and RLAIF with iterative reinforcement learning to significantly improve poetry generation. Compared with other top models in the industry, the usability of its poetic output is claimed to be more than five times higher, and its literary quality is said to comprehensively surpass GPT-4.


For example, in the official demonstration, Baichuan 3 was asked to compose a ci poem about OpenAI's "palace intrigue" (the boardroom drama). Its mix of Chinese and English references is quite entertaining.

"To the tune of 'Calming the Storm': Recalling Past Upheavals"
Looking back on the past and its shifting winds, the field of AI is booming. Altman suffers the pain of parting; who can help? Nadella lends a hand.
What of it, when former peers become rivals? Worldly grievances scatter to the sky. Let us create brilliance together and move forward hand in hand, without regret, down an endless road ahead.

It is worth mentioning that Baichuan Intelligence's open-source philosophy differs from that of other large-model companies, as was evident in Wang Xiaochuan's speech at the Baichuan 2 launch event.

The era in which Llama 2 defined open-source models is over. We can now obtain open-source models that are friendlier and more capable than Llama, which can help support the development of China's entire ecosystem.

Although Llama 2 is billed as open source, in practice it serves English-language use. In contrast, the Baichuan 2 series is fully open to Chinese users and is provided free for bilingual Chinese-English use.

Baichuan Intelligence has successively open-sourced four low-cost-to-deploy models — Baichuan-7B, Baichuan-13B, Baichuan2-7B, and Baichuan2-13B — all supporting both Chinese and English.

In addition, when asked by the media how the company manages to advance open-source and commercial closed-source models in parallel while iterating quickly, Baichuan Intelligence co-founder Chen Weipeng said this is thanks to the team's deep experience in search technology, which can be quickly transferred to large-model R&D.

From a technical perspective, search and large models share many technical foundations. In the key data-processing stage of model training, for example, the team drew on its search experience to screen and optimize data, filtering out duplicates and raising quality, thereby providing high-quality data support for the model.
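The data screening mentioned above can be illustrated with a minimal, hypothetical duplicate-removal pass; Baichuan's actual pipeline is not public, and this sketch shows only one common first step — dropping exact duplicates via a content hash after light normalization — before any fuzzier near-duplicate detection.

```python
# Hypothetical exact-deduplication pass for a pretraining corpus.
# Documents are normalized (lowercased, whitespace collapsed) and
# hashed; only the first occurrence of each hash is kept.

import hashlib

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different
    copies of the same document hash identically."""
    return " ".join(text.lower().split())

def dedup(docs):
    """Keep the first occurrence of each distinct document."""
    seen = set()
    kept = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept

corpus = [
    "Baichuan 3 is a large language model.",
    "Baichuan 3  is a large language model.",  # extra space: a duplicate
    "Search and large models share technical foundations.",
]
print(len(dedup(corpus)))  # → 2
```

Production pipelines typically layer near-duplicate methods (e.g. MinHash) on top of a pass like this, but exact hashing alone already removes a large share of crawl redundancy cheaply.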

In September last year, when talking about the gap between domestic large models and ChatGPT, Wang Xiaochuan made this judgment:

GPT-4 keeps improving, and they recently made a splash by launching voice and image capabilities. In terms of time, we think it may take two or three years to approach GPT-4's current level.

Of course, amid fierce competition among large models, staying at the technology-exploration stage is not enough. Baichuan Intelligence's next step is to accelerate turning its technology into application scenarios.

Wang Xiaochuan has mentioned "super applications" more than once in public, even predicting that several super applications will emerge in China this year — and this may become the next battleground for large models.


Source: Ai Faner