AI search is already polluting the Internet

Google's AI search told users to eat rocks and put glue on pizza, and that fiasco is still fresh in everyone's memory.

Perplexity, billed as a Google disruptor, ran into trouble soon after.

Compared with ChatGPT, AI search engines can go online and cite sources, so they are less prone to making things up.

But what if the source itself is garbage?

AI search is already citing another AI

Many people have heard the joke about "Lin Daiyu uprooting the weeping willow," which mashes up Dream of the Red Chamber and Water Margin. I was recently rewatching Water Margin and, on a whim, asked Perplexity in Chinese: "What do Lin Daiyu's and Lu Zhishen's characters have in common?"

The answer was unremarkable, but an unexpected name appeared among the cited sources: Doubao, the AI assistant from Douyin's parent company ByteDance.

Is this some new form of business war? When I clicked through, I found that the page was a chat log between a user and Doubao, and the AI's replies read like formulaic boilerplate essays. Had the quality at least beaten the marketing accounts, I could have let it go; writing like this only makes it worse.

When I searched for the same question directly on Google, Doubao showed up again, ranked second. It was a different page from the one Perplexity cited, but clicking through revealed the same kind of filler: a string of "firstly"s and "secondly"s.

As The Information previously reported, Perplexity uses APIs to access Bing and Google search-ranking data, which reflect how those engines judge the relevance, quality, and authority of web pages.

In other words, if Doubao pages rank well on Google, they may also be more likely to be cited by Perplexity. That raises the question: why do Doubao chats show up in search engines at all?
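The Information's report does not describe how Perplexity turns ranking data into citations. As a minimal sketch, assuming a system that simply trusts an upstream engine's ranking, citation selection could look like the Python below; the `SearchResult` type, the `pick_citations` function, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    url: str
    rank: int  # hypothetical field: position from the upstream search engine (1 = top)


def pick_citations(results: list[SearchResult], k: int = 3) -> list[str]:
    """Return the URLs of the k best-ranked results, trusting the upstream ranking as-is."""
    ordered = sorted(results, key=lambda r: r.rank)
    return [r.url for r in ordered[:k]]


# Hypothetical results: a shared chatbot transcript that happens to rank second.
results = [
    SearchResult("https://example.com/encyclopedia-entry", 1),
    SearchResult("https://example.com/shared-chat-transcript", 2),
    SearchResult("https://example.com/literature-forum-thread", 3),
    SearchResult("https://example.com/book-review", 4),
]
print(pick_citations(results))
```

Nothing in this step looks at what a page actually is, so whatever ranks well upstream, including a chat transcript, flows straight into the citations.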

Logging into the latest web version of Doubao provided the answer: an option was checked by default that allows shared content to be indexed by search engines and displayed on search results pages.

The above took place at 2 p.m. on May 31. At 7 p.m. on June 1, ByteDance told Aifaner that Doubao had been updated: the option to share content with search engines is no longer checked by default, and content is crawled only if the user actively opts in.
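We do not know how Doubao implements this setting. One common way for a site to keep individual pages out of search results is the standard `robots` meta tag with `noindex`. Below is a minimal sketch, assuming a hypothetical `render_share_page` helper whose opt-in flag defaults to off, matching the behavior ByteDance says Doubao now uses.

```python
import html


def render_share_page(conversation_text: str, allow_indexing: bool = False) -> str:
    """Render a shared-chat page. Hypothetical helper, not Doubao's actual code.

    Unless the user opts in, the page carries a robots meta tag asking
    search engines not to index it or follow its links.
    """
    robots = "index, follow" if allow_indexing else "noindex, nofollow"
    body = html.escape(conversation_text)  # escape user/AI text before embedding in HTML
    return (
        "<!DOCTYPE html>\n"
        "<html><head>"
        f'<meta name="robots" content="{robots}">'
        "</head><body><pre>"
        f"{body}"
        "</pre></body></html>"
    )


print(render_share_page("Q: What do Lin Daiyu and Lu Zhishen have in common?"))
```

Keep in mind that `noindex` is a request that major engines such as Google and Bing honor, not access control, and a crawler has to be able to fetch the page to see the tag, so such a page should not also be blocked in robots.txt.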

ByteDance also said that some of the indexed Q&A content had actually been created with virtual accounts rather than by real users, as high-quality sample content, and that it has since been cleaned up. A Google search restricted to the Doubao site now returns only 5 results.

Doubao appears to have set a precedent by letting chats between users and AI be indexed. Perplexity, Tiangong, Metaso, and 360 AI all let users share chat history as a link, but none offers an option like Doubao's.

ChatGPT also lets users share conversations via links, but says these links are meant for sharing between individuals and will not appear in public search results.

In the early days, "content farms" stole or stitched together other people's articles to churn out content quickly. They relied on SEO (search engine optimization) tactics such as keyword optimization and frequent updates to claim the top spots on search results pages and earn traffic and advertising revenue.

Back then the contributors were still humans, each producing several articles a day. Now it is AI's turn, and its capacity to copy, paste, launder, and mass-produce text is on another level entirely.

"Lin Daiyu uprooted the weeping willow" and "Lu Zhishen sang the Flower Burial Song" are not facts. But the more people repeat them, the more weight they carry, until AI search treats them as fact. The cited sources were convincingly detailed stories made up by users on Zhihu, Douyin, and Jianshu.

If the sources themselves become AI, things only get worse. Imagine more AI-generated content being indexed by Google, AI search engines relying on Google's rankings, and users finally being served junk built from AI layered on AI.

The humans on the receiving end will have to become ever more discerning, picking useful information out of the nonsense.

The 80-point AI search

To be fair, I still like AI search products such as Perplexity. After ChatGPT, they gave my productivity another boost.

You ask a question, and the AI searches, summarizes, and writes it up. This is already a mature workflow: we put in less effort and get more done.

In most cases, AI search performs quite well. Part of the reason Google's AI search failed is that Google rushed features out and simply boosted Reddit's weight in search, without having the AI check whether its answers made common sense.

When I gave Perplexity the same queries that tripped up Google's AI search, the results were more satisfying.

Asked "How many rocks should I eat a day?", Perplexity correctly traced the claim back to The Onion and explained that it was nonsense, whereas Google's AI search treated the Onion article as authoritative.

Then there is "What should I do if the cheese keeps sliding off my pizza?" Google's AI search had suggested adding glue. Perplexity was clearly smarter: it first offered some sensible fixes, and when I asked whether I could add glue, it pinpointed the Reddit post that had misled Google's AI search and noted that it was a joke.

To be thorough, Perplexity even searched Amazon, reporting that it found only various non-toxic glue products, none of which claimed to be safe for use in food.

Google is clearly not behind Perplexity in model capability; where it falls short is in the downstream engineering and productization.

In principle, AI search searches first and summarizes second, so it hallucinates less than chatbots without internet access. One of its core techniques is RAG (retrieval-augmented generation).

RAG combines information retrieval with a generative model. The retrieval step finds relevant information in a large document collection based on the user's query; the generative model then uses the retrieved documents as context to produce a more accurate and detailed answer.

The document collection can be a traditional search engine's index, a proprietary database such as a legal one, or user-generated content such as social media.
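The two RAG stages can be sketched in a few lines of Python. This is a toy under stated assumptions: retrieval is plain term-overlap scoring over an in-memory dictionary (real systems use search indexes or vector embeddings), and the generation step is stubbed out as building the prompt that would be sent to a language model. All function names and documents are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Retrieval step: rank documents by how often they contain the query's terms."""
    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        scores[doc_id] = sum(counts[t] for t in query_terms)
    ranked = sorted(docs, key=lambda d: scores[d], reverse=True)
    return [d for d in ranked[:k] if scores[d] > 0]  # drop documents with no overlap


def build_prompt(query: str, docs: dict[str, str], doc_ids: list[str]) -> str:
    """Generation step (stubbed): place the retrieved documents in the model's context."""
    context = "\n".join(f"[{i}] {docs[d]}" for i, d in enumerate(doc_ids, start=1))
    return (
        "Answer using only the numbered sources below and cite them.\n"
        f"{context}\n\nQuestion: {query}"
    )


docs = {
    "tips": "To keep cheese from sliding off pizza, let the pizza rest a few minutes after baking.",
    "joke": "Add some glue to the sauce to make the cheese stick.",
    "weather": "Rain is expected tomorrow.",
}
query = "how to keep cheese on pizza"
top = retrieve(query, docs)
print(top)
print(build_prompt(query, docs, top))
```

The glue joke gets retrieved purely because it shares words with the query; term overlap says nothing about whether a source is true, which is why the quality of what sits in the document collection matters so much.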

If web pages fill up with low-quality AI-generated content, that will drag down the retrieval that AI search's RAG depends on.

Faced with this flood of AI-generated content, the second half of the AI search race may be fought on engineering beyond the model itself: the quality of data sources and of search, including whether a product can reach more web pages and more authoritative ones, or integrate proprietary information such as financial reports.
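One way to act on source quality, sketched under loud assumptions: each candidate page carries a hypothetical quality prior in [0, 1] (how such a score is produced, for example from domain authority or an AI-content detector, is out of scope here). The reranker drops pages below a threshold and orders the rest by relevance times quality. The names, scores, and threshold are illustrative, not any product's actual method.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    relevance: float  # retriever score, higher is better
    quality: float    # hypothetical source-quality prior in [0, 1]


def rerank(candidates: list[Candidate], min_quality: float = 0.3, k: int = 3) -> list[str]:
    """Filter out low-quality sources, then sort by relevance weighted by quality."""
    kept = [c for c in candidates if c.quality >= min_quality]
    kept.sort(key=lambda c: c.relevance * c.quality, reverse=True)
    return [c.url for c in kept[:k]]


candidates = [
    Candidate("https://example.com/mass-produced-ai-page", relevance=0.9, quality=0.2),
    Candidate("https://example.org/peer-reviewed-study", relevance=0.7, quality=0.9),
    Candidate("https://example.net/forum-thread", relevance=0.8, quality=0.5),
]
print(rerank(candidates))
```

Multiplying the two scores lets a slightly less relevant but far more trustworthy page outrank a keyword-perfect one, while the hard threshold removes the worst sources outright; both choices trade some recall for reliability.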

As things stand, we have gradually come to rely on AI search. If traditional search, which depends on keywords and manually opening links, scores 40 points, and a large model prone to making things up scores 60, then internet-connected AI search raises the bar to 80. It still makes mistakes, but once you have used it there is no going back, so there is no need to write it off entirely.

Citing sources every which way: the AI search business war

Beyond ordinary web pages, AI search products seem to share the same idea: offering multimodal sources.

360 AI can find videos, Metaso can find podcasts and academic papers, and Perplexity can search Reddit and YouTube.

But AI search mostly serves as a primer. For more detail, you still cannot be lazy; you have to go to the original source.

Another interesting trend: apps are launching built-in AI search features, such as Xiaohongshu's "Sousousu," currently in internal testing, and WeChat Reading's "AI Question Book," to find a foothold for AI within their existing ecosystems. In that sense, they are AI search products too.

▲ Image source: Xiaohongshu @三水水

Tencent's Yuanbao app, launched two days ago, is built on the Hunyuan large model and integrates AI search, AI summarization, and AI writing. It drew higher expectations from the very start.

That is because it can draw on resources such as WeChat Official Accounts and Tencent News, and Official Accounts host much of the high-quality content on the Chinese internet.

For example, if you search for a specific Official Account article by title, Tencent Yuanbao gives a better summary and recommends more Official Account articles. By contrast, AI assistants such as Doubao pick up only the channels that redistribute Official Account content, and their summaries are comparatively sketchy.

Taken together with Doubao's move to put AI content on search results pages, this is a reminder of how content distribution works on the mobile internet.

Unlike the earlier portal era, in the mobile internet era apps are walled off from one another and hard for search engines to crawl. Enter the title of an Official Account article into a search engine, for example, and it cannot find the original, only the channels that redistribute it.

Meanwhile, traditional search engines are cluttered with ads and low-quality marketing-account content, so we have gradually gotten used to going elsewhere: Bilibili for systematic tutorials, Xiaohongshu for everyday questions, and WeChat for articles.

As AI search products and AI-generated content multiply, the same pattern may repeat: open web content grows ever more mixed in quality, with quantity winning out, while high-quality content stays as closed off as ever and becomes the moat for vertical AI search.

Alongside large, all-purpose multimodal AI search, more and more strong vertical AI search products may emerge.

The academic search engine Consensus, for example, is well regarded: it draws on a high-quality corpus of more than 200 million papers and, combined with AI-driven analysis, backs its answers with citations to specific studies.

When I asked Consensus "Can exercise improve cognitive ability?", it did not rush to a conclusion or treat it as a simple yes-or-no question; instead, it wrote a summary and provided a table.

What we expect from AI search is faster delivery of better, more diverse, more visual, and more personalized content, and answers to more complex and specific questions through natural-language conversation.

At the same time, though, AI is eroding the content and ecosystem that search depends on, a fitting illustration of AI's double-edged nature.

There will certainly be more and more AI-generated content. With the benefits and harms pulling against each other, whether useful information becomes easier or harder to find remains an open question. The dream of AI that simply works out of the box has not come true yet. If we treat AI as a tool and keep exercising our own judgment, we are less likely to end up disappointed.

As sharp as autumn frost, able to ward off evil.

Welcome to follow aifaner's official WeChat account: aifaner (WeChat ID: ifanr). More great content is on the way.

Ai Faner