Beating Midjourney: how did this AI product from former Google engineers overtake its rivals on the bend?

No one stays king forever, but add a qualifier to the title and anyone has a shot at a crown.

What new tricks are left for AI text-to-image generation?

In this crowded red ocean, the top spots are held by Midjourney, DALL·E, Stable Diffusion, and the like, and few other products manage to catch the eye.

Still, dark horses keep emerging. Ideogram, founded by former Google engineers and backed by prominent Silicon Valley AI figures, launched in August last year and released its latest model at the end of February.

What makes Ideogram special is that it is good at generating images that contain text, which is exactly the weakness the big players are still working to fix.

As it turns out, picking up where others have stumbled is one way to overtake on the bend.

AI can "draw" and "photograph", but it may still be "illiterate"

Generating text accurately has long been a pain point for AI. Even when the people and scenery look as if a camera captured them, any text in the image comes out garbled and distorted, and the AI instantly gives itself away.

▲ Garbled text generated by Midjourney v5.2.

Ideogram has stepped up, declaring that it refuses to let AI stay "illiterate" and that the fix might as well start with Ideogram itself.

The barrier to trying Ideogram is very low: open the website, log in, and start. The interface is clean and uncomplicated.

Generating an image takes only a few steps. Type a prompt into the input box, then choose an aspect ratio and a style such as photo, poster, or 3D render, depending on the effect you want.

Ideogram also anticipated that people may struggle to write prompts. In February this year it launched "Magic Prompt", which works like a built-in ChatGPT that improves your prompt for you. After all, an AI is better at reading the mind of its own kind.

What kinds of images contain text? Product logos, T-shirt prints, book covers, movie posters…

Let's start with an entry-level test: a few people each holding up a sign with the name of an animal. At first glance the text is correct, but the faces and hands are off. One gain cancels out the other: the flaw has not disappeared, it has just moved.

If you only ask Ideogram to write, the results are far more impressive.

I had the AI render Musk's well-known line "I would rather be optimistic and wrong than pessimistic and right." Apart from a flawed "W", every word is written correctly.

The lettering is on the playful side, and I am not sure whether Musk, born in the 1970s, would approve, but the black-and-white contrast should please the man who replaced Twitter's little blue bird.

Next I tested the classic proverb "All work and no play makes Jack a dull boy". Although the prompt stressed a printer font, Ideogram failed to produce one. It seems a font cannot be pinned down with a prompt alone, only approximated.

Then I asked the AI to design a logo for an AI startup called "Coffee AI".

The centerpiece is a coffee cup with a circuit pattern, there is a robot barista in the upper right corner, and the company name is written in bold capital letters. The layout is simple and restrained. As a logo it is intuitive, but it is also predictable, and it would be hard to get a client to sign off on it quickly.

Time to raise the difficulty, with longer sentences and higher design demands.

I asked Ideogram to design an inside page for a children's picture book, with "Fox in socks and rabbit in top hat" written in a prominent position and "Anonymous" marked at the bottom.

Ideogram largely handled these two lines of text, using hand-drawn lettering and chalk graffiti, and even added illustrations that fit the title. The picture-book flavor is right, but the mistakes are conspicuous too. The word "in" is rendered wrong, and the rabbit looks so much like a fox that the two could be brothers.

Ideogram can also be used for movie posters. I tried it on "The Three Evils", the recent hit starring Ethan Ruan.

I blended the classical allusion behind the film with the film itself in the prompt. The background called for a knight's silhouette, the sea, and mountains, and the text used the movie's English title, "The Pig, the Snake and the Pigeon".

Apart from a missing "the", the poster turned out well, blending classical imagery with modern lettering. The pigeon graffiti is the finishing touch. Overall, though, it reads more like Western fantasy, which feels unfamiliar and is hard to connect with the plot of the movie.

In my experience, text errors in Ideogram are far from rare. Sometimes it takes two or three generations to get a result that is correct word for word.
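
One way to make "correct word for word" concrete is to compare the intended caption with a transcription of what the image actually shows. The following is a minimal sketch, assuming the rendered text has already been transcribed by hand or by an OCR tool (not shown). The function name `word_diff` is hypothetical and is not part of Ideogram, and the misrendered caption in the usage note is made up for illustration.

```python
from difflib import SequenceMatcher


def word_diff(intended: str, rendered: str) -> dict:
    """Compare two captions word by word, ignoring case and edge punctuation."""
    def normalize(text):
        words = (w.strip(".,!?\"'").lower() for w in text.split())
        return [w for w in words if w]

    want, got = normalize(intended), normalize(rendered)
    missing, extra = [], []
    # Walk the alignment and collect words that were dropped or added.
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, want, got).get_opcodes():
        if tag in ("delete", "replace"):
            missing.extend(want[i1:i2])
        if tag in ("insert", "replace"):
            extra.extend(got[j1:j2])
    return {"exact": want == got, "missing": missing, "extra": extra}
```

For example, `word_diff("Fox in socks and rabbit in top hat", "Fox in socks and rabbit ln top hat")` reports the second "in" as missing and the stray "ln" as extra, which is the kind of single-word slip described above.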

Even if the text is correct, the characters' faces and fingers often look off.

It may also improvise, adding meaningless, distorted text of its own and undercutting its own selling point.

▲ The small text here is smeared into a blob.

But overall, Ideogram is a pleasant surprise. It can write long sentences, and it picks fonts and layouts that match the mood of the image. It cannot write Chinese yet, but its scrawl-like characters blend neatly into the folds of clothing.

▲ These four characters are supposed to read "Gong Xi Fa Cai".

Despite its flaws, Ideogram already has plenty of practical uses. It can serve as a source of inspiration and a creative assistant when designing logos, posters, and T-shirt graphics.

I used to worry that AI could "draw" and "photograph". From now on I will worry that AI can read, write, and design.

Aesthetics on par with Midjourney, plus a handy meme maker

Progress in AI is measured in days, and the world may have changed by the time you wake up. Ideogram claims the strongest text rendering, but its rivals are not conceding.

Stable Diffusion 3, the open-source model that was officially announced in February but is not yet publicly available, has improved its spelling.

▲ Stable Diffusion 3’s spelling abilities.

Midjourney v6, a beta version released in December last year, is the first Midjourney version with reliable text generation capabilities.

Its requirements are still demanding, however. Besides putting the text in quotation marks, the prompt should ideally describe where the text goes and how it is written, using keywords such as "printed" or "written". Text of one or two words works best.
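
Those guidelines can be folded into a small prompt-building helper. The sketch below is illustrative only: `build_text_prompt` and its parameters are hypothetical names, it produces a plain prompt string rather than calling any Midjourney API, and the two-word threshold simply mirrors the rule of thumb above.

```python
import warnings


def build_text_prompt(scene: str, text: str, placement: str,
                      style: str = "written") -> str:
    """Assemble a prompt that quotes the text and says where and how it appears."""
    # Rule of thumb from the article: one or two words render most reliably.
    if len(text.split()) > 2:
        warnings.warn("Text longer than two words may render less reliably.")
    # Quote the text, then state the writing method (style) and the location.
    return f'{scene}, with the words "{text}" {style} {placement}'
```

Calling `build_text_prompt("a neon sign above a cafe door", "Coffee AI", "on the sign", style="printed")` yields `a neon sign above a cafe door, with the words "Coffee AI" printed on the sign`. Whether a given model honors such a prompt still has to be checked by eye.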

▲ Text generation function of Midjourney v6.

With rivals closing in, the Ideogram team is not panicking and believes the advantage is still on its side: Ideogram has a higher accuracy rate and can handle long, complex sentences.

Ideogram's own systematic evaluation also reports that Ideogram 1.0 renders text most accurately, with an error rate cut almost in half compared with other models such as DALL·E 3.

Talk is cheap, so I gave Ideogram 1.0, Midjourney V6, and DALL·E 3 the same prompts and let them compete head to head.

The first round is text accuracy. I asked each AI to draw a sunrise illustration in the ukiyo-e style, with the classic line "Tomorrow is a new day" from "Gone with the Wind" placed appropriately to express hope and rebirth.

This round went to Ideogram, with accurate spelling and bold, striking lines and colors.

DALL·E, never known for its artistry, unexpectedly delivered some texture. Its text is mostly but not entirely correct, and its style is more abstract. Midjourney's text is inaccurate, its aesthetics fall short too, and it did not even read the brief carefully.

▲ On the left is DALL·E, on the right is Midjourney.

The second round is meme making. Ideogram officially lists meme generation as a feature. With Magic Prompt, the AI uses its imagination to expand the prompt and add a caption so that the image carries emotion.

I wanted to see whether AI could produce a "working cat" meme, so I entered the prompt: "Draw an interesting meme about a tearful cat wearing a bow tie and shirt, typing in front of a computer, as a metaphor for humans working hard."

Ideogram used its imagination and, on its own initiative, added the caption "Cats also have to work."

The flaws are an extra "have" and the wrong number of toes on the front paws. Apparently AI struggles not only with human hands but with cat paws too. Compared with the original meme it is merely passable; it does not hit home the way the "Crying Cat" does.

▲ On the left is the original image from the web, on the right is Ideogram's version.

Midjourney's cat is serious and elegant, like a writer who has achieved financial freedom. It looks more like a magazine photo shoot, though it is unclear what is going on with the mouse.

▲ On the left is Midjourney, on the right is DALL·E.

DALL·E conveys the most emotion. The style is a bit sloppy, but the roughness works in its favor: the noodle-like streams of tears, which look pasted on from a different layer, have exactly the right meme flavor. It really does suit a meme.

The third round is understanding long, complex prompts, in particular whether every element of the prompt appears and whether each one is placed where the prompt says. So I entered a rather long-winded prompt that specified the position of each subject.

Ideogram does better on overall composition. The key points in the prompt are all covered: the heart-shaped sign, the robot, the astronaut, the balloon, and the medal are all there, although details such as the astronaut's hand and the lettering on the medal have problems.

By comparison, Midjourney is more artistic but drops some elements and adds decorations nobody asked for; it has ideas and a personality of its own. DALL·E not only drops elements but also gets details wrong, and the result is not attractive either.

▲ The top is Midjourney, the bottom is DALL·E.

So even leaving text aside and judging image quality alone, Ideogram holds up. At times it reproduces the spatial relationships between the objects in a prompt more accurately than the other AIs.

In terms of user experience, Ideogram generates faster than Midjourney: four images usually take a little over ten seconds.

In evaluations that follow industry practice, human raters also preferred Ideogram 1.0 to DALL·E 3 and Midjourney V6 on prompt alignment, image coherence, overall preference, and text rendering quality.

Even if Ideogram's images do not satisfy you, feeding its Magic Prompt output to Midjourney or DALL·E may give better results than a hand-written prompt, so it doubles as a prompt optimizer.

Nobody can beat me while my own theme music is playing, but when different AIs are given the same prompt, the outcome really is open.

A star startup founded by former Google engineers, with a down-to-earth AI product

Ideogram was established in August last year and launched its latest model, Ideogram 1.0, in February this year.

This is another star company. Its founding team of seven comes from Google Brain, the University of California, Berkeley, Carnegie Mellon University, and the University of Toronto. Four of them are authors of the research paper on Imagen, Google's text-to-image diffusion model.

The ever-cautious Google is often slow to ship products and has repeatedly watched competitors win worldwide fame first. Its chatbot was beaten to market by ChatGPT, and Imagen was upstaged by DALL·E 2.

From an engineer's point of view, it is frustrating when research never makes it into consumer applications. Many have chosen to leave and build new products themselves, keeping them as open as possible in order to build up a user base and a reputation first.

Ideogram's free quota of 25 prompts a day may stem from the same thinking.

The market is bullish on the product. Ideogram has completed an $80 million Series A round led by Silicon Valley venture firm a16z. Its investors include prominent AI figures such as Google chief scientist Jeff Dean and OpenAI founding team member Andrej Karpathy.

In fact, after trying many AI products, I have a nagging question: how do you define whether a product is useful?

▲ T-shirt pattern generated by Ideogram.

One tool I found useful earlier is the browser plug-in "Immersive Translation". Unlike Google Translate, it does not overwrite the original text, so the Chinese and English can be read side by side. It works not only on news pages but also on X feeds, YouTube subtitles, and PDF files.

Ideogram is similarly down-to-earth. On the one hand, it generates the text users ask for more accurately and adapts it to images of various styles. On the other, it can come up with fitting text for an image from scratch, as with memes.

Although many of Ideogram's results cannot be used as they are, they at least broadly meet the prompt's requirements, and most of the text is readable.

I also found that Ideogram's photorealistic images are average, while its graffiti, illustrations, and paintings are good, with artistry on par with Midjourney's.

▲ Graffiti art illustrations generated by Ideogram.

Ideogram's website also ranks works by popularity. Opening it feels like wandering into an Instagram-style image community, where you can also study the prompts behind the images.

When an AI tool combines creativity, convenience, and sharing, it is easy to get hooked. The telltale sign is that the 25 daily prompts run out quickly. The impatience is similar to waiting for Suno credits to refresh.

For a monthly membership of $7 or $16, Ideogram offers more generations plus image upload and an editor.

Image upload means that users upload their own pictures and then rework them with the Remix function.

▲ The left is the original image and the right is the output.

Besides standard functions such as cropping and zooming, the editor has an interesting drawing tool that generates images from a rough sketch. The human roughly outlines the shape, composition, and color of each element, and the AI turns the crude sketch into something marvelous, much like the folk tale of Ma Liang and his magic paintbrush.

That Ideogram has survived such fierce competition owes most to its ease of use, but its positioning is also precise.

If aesthetics is the top criterion, Midjourney takes the crown. DALL·E's quality is uneven, but it is conveniently built into ChatGPT, and the open-source Stable Diffusion offers freedom.

On user numbers alone, Ideogram may not beat any of them, but it has played to its strength and should be able to win a loyal audience of its own.

At least among free AI image generators, Ideogram leads in overall quality: the web page is easy to use, free credits are provided, text rendering is strong, and Magic Prompt and the creator community supply creativity and inspiration.

Text-to-image models are far from perfect and are still working toward faithfully reproducing the physical world or matching painters and designers. There may yet be room for more Ideograms.

This is both the cruelty and the charm of the AI race. I do not know who will have the last laugh, but there will always be new challengers aiming at an Achilles' heel.

