After trying the combination of DALL·E 3 + ChatGPT, I finally know how it feels to be the client

"An astronaut floating in space lies down on the clouds, and the clouds turn into a comfortable armchair with a cloud-shaped remote control on the armrest. The astronaut waves to the camera, and beneath his feet the Earth becomes a mesmerizing swirl of light."

Maybe you have seen such a fantasy in your dreams, but realizing it in reality would probably take centuries. Before that day comes, though, you can use DALL·E 3 to make the dream "come true" on screen.

DALL·E 3 is not an unfamiliar tool, but for those who don't know it: DALL·E 3 is an AI image generator, and you can think of it as OpenAI's answer to Midjourney.

In September, OpenAI announced that DALL·E 3 would be integrated into ChatGPT, arguably a merger of the most powerful models in their respective fields. More importantly, DALL·E 3 is built natively on ChatGPT, so no elaborately crafted prompts are needed: you can generate images directly in ChatGPT.

Early this morning, OpenAI officially announced that DALL·E 3 is now available to all ChatGPT Plus and Enterprise users. A small tip: if you don't want to spend money, you can also try DALL·E 3 through Microsoft's new Bing.
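Besides ChatGPT and Bing, developers can also reach DALL·E 3 through OpenAI's Images API. Below is a minimal sketch assuming the official `openai` Python SDK (v1.x interface) and an `OPENAI_API_KEY` environment variable; the prompt text is our own example, not from OpenAI:

```python
import os

# Request parameters for DALL·E 3 via the OpenAI Images API.
# "model", "prompt", "size", and "n" are documented parameters of the
# images.generate endpoint; the prompt itself is an illustrative example.
params = {
    "model": "dall-e-3",
    "prompt": "An astronaut lying on a cloud armchair, "
              "Earth swirling with light below, digital art",
    "size": "1024x1024",
    "n": 1,  # DALL·E 3 generates one image per request
}

if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # official SDK, v1.x client interface

    client = OpenAI()
    response = client.images.generate(**params)
    print(response.data[0].url)  # temporary URL of the generated image
else:
    print("Set OPENAI_API_KEY to send the request.")
```

Note that, unlike the ChatGPT integration, the raw API still takes your prompt verbatim, so the detailed-prompt-writing burden stays with you.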

If you can imagine it, DALL·E 3 can draw it

So how well does DALL·E 3 actually generate images? OpenAI listed three representative examples on its official blog, covering scenarios such as science class projects, website design, and corporate logo design.

For example, if you need to illustrate cirrus clouds in a class presentation, you can ask DALL·E 3 to generate sufficiently detailed pictures of them.

Or if you are a website designer still scratching your head over a layout, you can use DALL·E 3 to spark more inspiration.

As for the third scenario, it is everyday corporate logo design: enter a prompt, and a "rabbit + coffee" design is quickly presented to you.

Judging from the official results, the images are rich in detail, the four design options have clearly distinct styles, and the overall quality is quite satisfactory.

Of course, these are the polished images from the official website, and we cannot rule out some "cherry-picking". With that question in mind, we entered the official prompts ourselves to see the actual results.

The actual results are not much different from the official images, but there is a small bug. In the second example, entering only the prompt produced a text reply instead of an image, which made me think I had not switched to the DALL·E 3 interface. It is not a big problem; it just adds an extra confirmation step.

The dazzling "Gallery" displays a variety of generated images: comics, pixel art, oil paintings, all kinds of styles. Through the gallery, OpenAI seems to be telling users: if you can imagine it, DALL·E 3 can "draw" it.

It can draw, but the key is whether it draws well. For example, I asked it to draw Li Bai in white and Du Fu in black playing chess.

After "Stop generating" hovered on screen for a while, it delivered four absurd pictures. In the first, not only were the clothing colors wrong, but, more amusingly, Li Bai and Du Fu had become "international friends", and the game they were playing was Western chess. Clearly, DALL·E 3's understanding of the Chinese context still needs work.

The second picture captures the tension of the game quite well, but it repeats the problems of the first. The third and fourth pictures suffer from much the same issues.

Of course, the potential of an AI image generator shows in how it responds to follow-up correction. For example, when I asked it to redo the first picture with Go instead of chess, and to fix the clothing and headwear, the result looked like this!

At first glance there doesn't seem to be a big problem, but a closer look at the board leads to an easy conclusion: Li Bai and Du Fu have turned Go into a "jigsaw puzzle"?

During a game, a little friction is normal, so I asked DALL·E 3 to generate a three-panel comic strip based on the following requirements:

  • 1. Mid-game, Li Bai becomes so angry that he flips the chessboard.
  • 2. Du Fu loses his temper and punches Li Bai.
  • 3. Finally, Li Bai and Du Fu shake hands, make up, and continue playing.

Out of ten, what score would you give these three pictures?

Since its full rollout, resourceful netizens have put DALL·E 3 to all kinds of creative uses. If you are a Gundam fan, you can make DALL·E 3 your designer: have it draw the coolest Gundam blueprints for you, lay out the parts in a list, and then 3D-print them.

Note, however, that while the intricate detail of the Gundam blueprints may look impressive, there are occasionally a few extra parts.

Or, since the "cage fight" between Zuckerberg and Musk never took place, and the switch from the Lightning port to USB-C has stirred plenty of controversy, why not let the Lightning port and the USB-C port have their own spectacular "cage fight"?

No need for elaborate prompts: the AI does the work for you

Alongside the full rollout of DALL·E 3, OpenAI also disclosed the specific technical details behind it in a paper.

To make it easier to understand, we will untangle the paper and walk through the whole technical process with a simple example.

First, OpenAI collected a large number of images with corresponding text descriptions as training data. For example, if the picture is of a cat, the corresponding caption might be "an orange cat sitting on a chair".

But the caption "an orange cat sitting on a chair" is rather simple and lacks specifics: it does not mention the cat's breed, its physical features, or its surroundings.

To obtain richer captions, OpenAI trained an image captioning model. That is, given the same photo, this model can output a more detailed description:

For example, "an orange cat sitting on a chair" might become "a short-haired ragdoll cat curled up on its owner's beanbag, wearing a blue bow around its neck, ears erect and expression wary, as sunlight streams through the window and casts a patch of light on the floor."

In the same way, with this captioning model, OpenAI can generate detailed new captions for every image in the training data, and then, in turn, train the image model on these detail-rich captions, and so on.

During training, the researchers gradually increased the proportion of AI-synthesized captions to test the impact on model performance. The results showed that using detailed synthetic captions (the richer descriptions mentioned above) yields higher image quality and images that better match the input text.

After repeated trials, the researchers found that DALL·E 3 achieves the best results with a mix of 95% AI-synthesized captions and 5% original captions.
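The 95/5 mixing step described above can be sketched as a per-sample choice in the data pipeline. This is a simplified illustration, not OpenAI's code; the function name and the two captions are our own:

```python
import random

def pick_caption(original, synthetic, p_synthetic=0.95, rng=random):
    """Return the synthetic caption with probability p_synthetic,
    otherwise the original human-written one (DALL·E 3 uses a 95/5 mix)."""
    return synthetic if rng.random() < p_synthetic else original

# Illustrative training pair: one image with two candidate captions.
original = "an orange cat sitting on a chair"
synthetic = ("a short-haired ragdoll cat curled up on a beanbag, "
             "a blue bow around its neck, sunlight on the floor")

rng = random.Random(0)  # seeded for reproducibility
picks = [pick_caption(original, synthetic, 0.95, rng) for _ in range(10_000)]
share = picks.count(synthetic) / len(picks)
print(f"synthetic share: {share:.2%}")  # close to 95%
```

Keeping a small share of original captions is what prevents the model from only ever seeing the captioner's writing style at training time.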

In addition, in response to concerns about the negative impact of AI image generators, OpenAI has placed several guardrails on DALL·E 3 to limit violent, adult, or hateful content, including careful checks of both user input and generated images.

For example, when I asked DALL·E 3 to "draw me a violent, bloody picture of Pleasant Goat being eaten by Big Big Wolf," the reply it gave was:

Sorry, I can't create or recommend any violent or gory content for you. I can help you design other types of graphics or provide additional information. Please tell me your other needs.

To avoid copyright disputes, OpenAI researchers also explicitly restricted DALL·E 3 from imitating the artistic styles of living artists during training. As for the detector said to identify DALL·E-generated images with 99% accuracy, the official blog also revealed more details.

Although this detector performs well, that applies mainly to images generated by DALL·E itself; OpenAI is not sure how accurately it identifies images generated by other AI tools.

By now, I believe you have noticed that DALL·E 3 shares the weaknesses of other AI image generators, such as unfamiliarity with the Chinese context and mechanical reuse of its image corpus. Even DALL·E 3, hyped as a knockout blow to human artists, may not be able to draw a hand well.

But compared with the deep controversies of the past, OpenAI is this time at least moving in a more open and responsible direction.


Ai Faner | Original link