Billed as the strongest alternative to ChatGPT: how does Claude perform after its big update? A trial link is attached.

Ask which AI assistant is the most powerful right now, and the answer is almost certainly ChatGPT.

Not long ago, ChatGPT went down unexpectedly, and a large number of heavy users erupted online. Students who relied on it for homework could not write their papers for a while, and office workers who relied on it to "stay alive" did not even feel like going to work.

Since the start of this year, ChatGPT has "suddenly died" every so often. Claude, billed as its strongest replacement, may be your most reliable fallback.

Double the context: Claude 2.1's big update

As it happens, Claude recently received a major update. Previously, Claude could handle a context of only 100,000 tokens (a token is the smallest unit in text processing, such as a word or part of a word). Now the Pro version of Claude 2.1 can handle a context of up to 200K tokens.

Anthropic says that a 200K context is equivalent to approximately 150,000 words or 500 pages of text, which means you can upload codebases, financial statements, or long literary works and have Claude summarize them, answer questions about them, forecast trends, and compare and contrast multiple documents.
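The ratio behind those figures can be turned into a quick back-of-the-envelope check. The sketch below assumes the roughly 0.75 words per token implied by "200K tokens is about 150,000 words"; the function names are illustrative, and real token counts depend on the tokenizer and the language of the text.

```python
# Ratio implied by the article's figures: 200K tokens is about 150,000 English words.
# This is an approximation, not a tokenizer.
WORDS_PER_TOKEN = 150_000 / 200_000  # 0.75


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for English text from a word count."""
    return round(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int, context_tokens: int = 200_000) -> bool:
    """Check whether a document of word_count words is likely to fit in the window."""
    return estimate_tokens(word_count) <= context_tokens
```

Under this assumption, a 90,000-word report comes to roughly 120,000 tokens: too large for the old 100K window, but within the new 200K one.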

So how much Chinese text is that? The previously controversial Yi-34B offers a rough reference point. It has also released a version that supports a 200K ultra-long context window, which can handle about 400,000 Chinese characters of input, approximately the length of the novel "The Scholars".

For a language model, a long context provides more precise information about usage and meaning, helps resolve ambiguity, and helps the model generate coherent and accurate text. For example, the word "apple" means something completely different in "picking fruit" than in "the new iPhone".

It is worth mentioning that before GPT-4 restored its web browsing feature, the free Claude could already open web links and summarize their content in real time. Even now, this is an advantage that GPT-3.5 does not have.

The free version of Claude can also read, analyze, and summarize the documents you upload. Even against the paid GPT-4, Claude holds its own at processing documents.

We "fed" a 90-page VR industry report to the current web versions of Claude and GPT-4 and asked both the same questions.

There was no gap in response speed between the two, but the free version of Claude gave smoother replies and slightly better answers. GPT-4's search within the document is also constrained by paging and viewing limits, which makes it rather clumsy.

Search is only the easy part. As a tool for improving learning or work efficiency, what we need is a "smarter" model. When I asked both to analyze how the VR industry landscape will change over five years, they expressed similar views, but Claude won with a logical, point-by-point answer.

The key is whether the answer is actually correct. Over the past year, we have witnessed many sorry cases of large models confidently talking nonsense. Anthropic claims that Claude 2.1 cut false or hallucinated statements in half, but it did not give clear data, so much so that NVIDIA scientist Jim Fan questioned: "The easiest solution to achieve 0% hallucination is to refuse to answer every question."

Anthropic also designed many trap questions to test Claude 2.1's honesty. Multiple rounds of results show that when it hits a blind spot in its knowledge, Claude 2.1 prefers to express uncertainty rather than deceive users with plausible-sounding answers.

Put simply, if Claude 2.1's knowledge does not include the fact that "the provincial capital of Guangdong is not Harbin", it will honestly say "I am not sure whether the provincial capital of Guangdong is Harbin" instead of flatly asserting "The provincial capital of Guangdong is Harbin."

A Claude Pro subscription costs about $20 per month and offers five times the usage of the free version. The number of messages a user can send varies with the length of the messages, and Claude sends a reminder when 10 messages remain.

Assuming a conversation of about 200 English sentences of 15-20 words each, you can send at least 100 messages every 8 hours. If you upload a document as long as The Great Gatsby, you may only be able to send 20 messages in the following 8 hours.

Beyond ordinary users, Claude 2.1 also targets developers with a beta feature called "tool use", which lets developers integrate Claude into users' existing processes, products, and APIs.

In other words, Claude 2.1 can call developer-defined functions or APIs provided by third-party services, query search engines to answer questions, and connect to private databases to retrieve information from them.

You can define a set of tools for Claude to use and then make a request. Claude decides which tools are needed to complete the task and acts on your behalf, for example using a calculator for complex numerical reasoning or converting a natural language request into a structured API call.
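The general pattern described here can be sketched without any particular SDK. In the example below, the tool registry, the JSON shape of the call, and the hard-coded `model_output` string are all hypothetical stand-ins chosen for illustration; they are not Anthropic's actual tool use format.

```python
import json
from typing import Callable, Dict


# Hypothetical developer-defined tools; names and signatures are illustrative.
def add(a: float, b: float) -> float:
    return a + b


def multiply(a: float, b: float) -> float:
    return a * b


TOOLS: Dict[str, Callable[..., float]] = {"add": add, "multiply": multiply}


def run_tool_call(call_json: str) -> float:
    """Execute a structured call of the assumed form {"tool": ..., "arguments": {...}}."""
    call = json.loads(call_json)
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Stand-in for what a model might emit for "What is 1234 times 5678?"
model_output = '{"tool": "multiply", "arguments": {"a": 1234, "b": 5678}}'
result = run_tool_call(model_output)
```

The point of the design is the division of labor: the model only chooses a tool and fills in its arguments, while the developer's own code performs the action and can validate the request first.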

Anthropic has also made a series of improvements to better serve developers using the Claude API, as follows:

  • The developer console has an improved experience and user interface, making it more convenient to build on the Claude API
  • New prompts (the input instructions or questions) are easier to test, which supports continuous improvement of the model's results
  • Developers can iterate on and try different prompts in a sandbox environment
  • Multiple prompts can be created for different projects and switched between quickly
  • Changes to a prompt are saved automatically, making it easy to go back
  • Generated code can be integrated with the SDK and applied to real projects

In addition, Claude 2.1 introduces "system prompts", a way to provide context and instructions to Claude that lets it hold a persona more stably during role play while keeping its personality and creativity in dialogue. Unlike an ordinary prompt, this feature is designed mainly for developers and advanced users and is used through the API rather than on the web page.
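As a rough illustration of how a system prompt differs from an ordinary prompt, the sketch below assembles a request string in which the system text comes before the first user turn. The layout (system text, then "Human:" and "Assistant:" turns) is an assumption based on the text-completion style prompt format of that period, and `build_prompt` is a hypothetical helper; check the current API documentation before relying on it.

```python
def build_prompt(system: str, user: str) -> str:
    """Assumed layout: system text first, then the Human and Assistant turns."""
    return f"{system}\n\nHuman: {user}\n\nAssistant:"


# Illustrative persona and question; neither comes from the article.
prompt = build_prompt(
    system="You are a patient tutor who explains concepts with short examples.",
    user="What is a context window?",
)
```

The persona lives in one place ahead of the conversation, so it does not need to be repeated in every user message.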

Like Claude 2.0, Claude 2.1 costs $8 per 1 million input tokens, which is $2 cheaper than GPT-4 Turbo, and $24 per 1 million output tokens, which is $6 cheaper than GPT-4 Turbo. Claude Instant, which suits low-latency, high-throughput use, costs $1.63 per 1 million input tokens and $5.51 per 1 million output tokens.
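To make these prices concrete, here is a small cost calculator built on the figures quoted above. The GPT-4 Turbo prices ($10 and $30) are inferred from the "$2 cheaper" and "$6 cheaper" comparisons, and the dictionary keys are illustrative labels rather than official API model identifiers.

```python
# USD per 1 million tokens as (input, output), taken from the article's figures.
PRICES = {
    "claude-2.1": (8.00, 24.00),
    "gpt-4-turbo": (10.00, 30.00),  # inferred from the "$2 / $6 cheaper" comparison
    "claude-instant": (1.63, 5.51),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in US dollars at the per-million-token prices above."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, a request with 150,000 input tokens and 2,000 output tokens comes to about $1.25 on Claude 2.1 and about $1.56 on GPT-4 Turbo at these prices.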

ChatGPT killer or replacement?

For now, although Claude 2.1 is very capable, it can only serve as a stand-in for ChatGPT when ChatGPT is down. It still has a long way to go before it can dethrone ChatGPT. To use a loose analogy, Claude 2.1 is like a budget version of GPT-4.

Take the 200K context, Claude 2.1 Pro's headline strength, as an example. Although Claude 2.1 Pro can in theory process more than the 128K GPT-4 Turbo, actual results show that in recalling and accurately understanding context, Claude 2.1 Pro is still far inferior to GPT-4 Turbo.

After the OpenAI developer conference, Greg Kamradt tested the context recall of GPT-4 Turbo with its 128K window. Using 218 essays by Paul Graham (a well-known American programmer) to assemble 128K tokens of text, he inserted a factual statement at different positions in the text (from 0% at the top to 100% at the bottom): "The best thing to do in San Francisco is eat a sandwich and sit in Dolores Park on a sunny day."

He then asked GPT-4 Turbo to retrieve the statement and answer questions about it, and finally scored the answers with LangChain's evaluation tooling, which is widely used in the industry.
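The setup of such a "needle in a haystack" test can be sketched in a few lines. This is a minimal illustration, not the tester's actual code: the helper names are hypothetical, the model call is left out entirely, and a simple keyword check stands in for the LangChain-based evaluation.

```python
from typing import List, Sequence


def insert_needle(sentences: List[str], needle: str, depth_percent: float) -> str:
    """Place the needle at depth_percent of the document (0 = top, 100 = bottom)."""
    if not 0 <= depth_percent <= 100:
        raise ValueError("depth_percent must be between 0 and 100")
    index = round(len(sentences) * depth_percent / 100)
    return " ".join(sentences[:index] + [needle] + sentences[index:])


def mentions_needle(answer: str, keywords: Sequence[str]) -> bool:
    """Crude stand-in for the evaluation step: all key terms must appear."""
    lowered = answer.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


# Toy haystack; a real test would use long essays trimmed to a target token count.
haystack = ["Filler sentence one.", "Filler sentence two.",
            "Filler sentence three.", "Filler sentence four."]
needle = "The best thing to do in San Francisco is eat a sandwich in Dolores Park."
document = insert_needle(haystack, needle, depth_percent=50)
```

Repeating this for a grid of context lengths and depths, sending each document plus the question to the model, and scoring each answer yields the kind of heatmap discussed in this article.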

▲Green represents higher retrieval accuracy, red represents lower retrieval accuracy. Picture from: @LatentSpace2000

The evaluation results are shown in the figure above. GPT-4 Turbo maintains high recall accuracy up to a context length of 73K tokens. If the information is at the very beginning of the document, it is always retrieved no matter how long the context is. GPT-4 Turbo's accuracy begins to drop only when the information to be recalled lies in the 10%-50% range of the document.

For comparison, he also obtained early access to Claude 2.1 Pro and ran the same "needle in a haystack" test. Judging from the results, in a document of 200,000 tokens (approximately 470 pages), Claude 2.1 Pro, like GPT-4 Turbo, recalls information near the front of the document less well than information near the end.

▲Green represents higher retrieval accuracy, while red represents lower retrieval accuracy.

However, Claude 2.1 Pro performs well only up to about 24K tokens, far short of GPT-4 Turbo's 73K. Beyond 24K, Claude 2.1 Pro's recall begins to decline noticeably. Beyond 90K, it gets worse still and the error rate rises significantly.

As the context length increases, the retrieval accuracy of both GPT-4 Turbo and Claude 2.1 Pro gradually decreases. Although Claude 2.1 Pro's test covers a longer context, it still needs to catch up with GPT-4 Turbo on accuracy, which matters more in practice.

Among free offerings, Claude is perhaps one of the strongest large models. If you work with text, then when ChatGPT crashes, Claude, roughly comparable to a "GPT-3.8", can cover your urgent needs or even do better.

However, personalized GPTs, easy image generation with DALL·E 3, voice conversation, and other features are moats that only ChatGPT has. Against the powerful GPT-4 Turbo, even the upgraded Claude 2.1 Pro has to concede defeat.

Finally, here is Claude's trial link: If ChatGPT crashes again, relax. At least you still have Claude.

Source: Ai Faner