Last week, the developers added a new command to Midjourney — /describe. Here is an in-depth look at what it is, how it works, and what you need it for.
In a nutshell
/describe makes Midjourney "go backward." Instead of the usual text-to-image process, you upload an image, and Midjourney analyzes it and "guesses" the prompt.
We already did a study on a similar tool, CLIP Interrogator 2. It's a free online tool built with a focus on Stable Diffusion prompts, but as that study showed, it also works great with Midjourney. It generates wordy, convoluted, and often nonsensical prompts, yet the results are stunningly close to the original images.
However, now we have a dedicated instrument—for Midjourney and by Midjourney.
futuristic portrait by Gareth Pugh --v 5
paper women neos rex, in the style of sci-fi inspired futurism, monochrome portraits, marvel comics, sony alpha a1, faceted forms, medieval-inspired, strong facial expression --v 5
a close up of a person wearing a costume, an ambient occlusion render, inspired by Marek Okon, digital art, diamond plated superhero, futuristic woman portrait, vektroid album cover, minimal design armor style, sculpture of cate blanchett, fantasy character photo, angry female alien, holly herndon origami statue, nvidia and behance, perfect android girl --v 5
In this study, we compare the two—the built-in designated function and the external tool created for another text-to-image AI.
Quick facts
It's super easy to use: type /describe, hit Enter, and upload an image. Midjourney analyzes your input and returns four guessed prompts in different styles. With CLIP Interrogator, you go to its Hugging Face page and simply drag your image into the designated field.
blueprint holographic design of futuristic Midlibrary --v 5
technology research and industry background concept stock photo, in the style of fairy academia, illuminated interiors, hyper-realistic sci-fi, studyplace, light azure, bibliopunk, scientific diagrams --v 5
a room filled with lots of blue lights, futuristic scientific laboratory, futuristic laboratory, in futuristic laboratory room, futuristic space ship interrior, sci - fi interior, inside a futuristic army base, futuristic chemistry lab, science fiction scene, surreal sci fi set design, futuristic room, futuristic government chambers, futuristic setting, futuristic production facility, 3 d render of a scifi spaceport --v 5
If Midjourney finds an artist's style in the initial picture and adds their name to a prompt, that name becomes a link to Google Search! However, it doesn't do the same (for now) with artistic techniques, art movements, and other style modifiers.
Each prompt is linked with a numbered Generate button to quickly send the desired prompt to work. If Remix mode is turned on, Midjourney allows you to adjust the prompt before submitting it.
There are no additional parameters available for /describe at the moment. Meanwhile, CLIP Interrogator offers three modes (Best, Classical, and Fast) and the Best Mode Max Flavors setting (the number of keywords and expressions at which the AI will stop analyzing an image), which ranges from 2 to 24. For this test, we will use a mid-range value of 12.
Midjourney's /describe is blazing fast, taking only a few seconds per image. That's another difference from CLIP Interrogator, which needs anywhere from 25–30 seconds to a couple of minutes, depending on the number of flavors and the complexity of the input image.
ROUND ONE:
Midjourney Generations
For the first test, I fed Midjourney its own generations to see how it "decodes" itself. The same images went to CLIP Interrogator.
deer in the magical forest of elves by Ryohei Hase --v 5
an digital painting of deer in woodland, in the style of alexander jansson, primordial creatures, white and bronze, aleksi briclot, whimsical character design, james paick, baroque animals --v 5
a painting of a deer in a forest, by Bastien L. Deharme, by WLOP, fantasy art behance, beautiful digital artwork, by Jesper Ejsing, beautiful fantasy art, god of the forest, forest spirit, by Yang J, anthropomorphic deer, wojtek fus, realistic fantasy illustration, fantasy digital painting --v 5
In short, both models "speak the AI language." It naturally doesn't sound human, and in many cases doesn't seem to make sense. But when you "speak" this language back to Midjourney, this happens:
unimaginable portrait of a female, with extraordinary mask, in style of Vivienne Westwood --v 5
a close up of a person wearing a mask, a character portrait, baroque, erwin olaf, old lady cyborg merchant, jean-sebastien rossbach, stefan koidl inspired, full face and body portrait, in a baroque style, metallic skin, photoshop render, album --v 5
a lady is wearing a costume with a gold mask, in the style of realistic hyper-detailed portraits, sven nordqvist, gabriel metsu, dark silver and light cyan, alessio albi, twisted characters, spectacular show of ages --v 5
And even when Midjourney and/or CLIP Interrogator "miss," the outcome might be a beautiful artistic discovery!
Tyrus Wong's painting depicting closeup portrait of spring flowers witch by William Morris and Charles Angrand --v 5
chinese women and daffodils, 2014, in the style of atmospheric and dreamy, teal and gold, dreamlike portraiture, tender depiction of nature, layered imagery, referential painting, soft and dreamy --v 5
a painting of a woman surrounded by flowers, inspired by Lin Liang, trending on cg society, figurative art, tyrus wong, daffodils, yellow and blue and cyan, by tom bagshaw, vietnamese woman, song dynasty, she --v 5
Both models are very potent in what they are designed to do, often showing equally fantastic results.
the close up of magical mushroom plant, with colorful flowers around, plants and greenery, view from below, in the forgotten forest by Albrecht Durer --v 5
a painting titled'mystery' shows different colors of mushrooms, in the style of northern renaissance, highly detailed foliage, wimmelbilder, vignetting, light red and dark gray, flower and nature motifs --v 5
a painting of mushrooms and other plants in a forest, by Maria Sibylla Merian, magic realism, jean-sebastien rossbach, 256x256, esao andrews and yoshitaka amano, niels otto møller, discovered photo, trending ,, mid 1 9 th century, by Joseph binder --v 5
However, if you do it long enough, you notice that CLIP Interrogator's prompts return slightly better, more interesting, and more detailed results—especially with complex source images.
Tsutomu Nihei's illustration depicting intricate biopunk mask by Kris Kuksi --v 5
an abstract drawing of a head with a spider, in the style of stephan martinière, detailed costumes, daniel arsham, made of vines, chromepunk, yanjun cheng, multi-layered figures --v 5
a black and white drawing of a robot, a detailed drawing, by Todd Lockwood, synthetic maw, david kassan, intricate oil details, visible head, pastel, 4k. detailed drawing, bo xun ling, intricate wiring, azathoth, yanjun chengt, ellen jewett, no type --v 5
I would say that CLIP Interrogator does a slightly better job in this round, at least with Midjourney generations as source images, which might be a rather special case. How about better-known images? Will /describe and CLIP Interrogator recognize famous visuals?
ROUND TWO:
FAMOUS ARTWORKS
In the next test, I picked several famous works of art—from classical paintings to iconic cinema scenes—and let both models do their magic.
Niko Pirosmani. Fisherman in a Red Shirt (1908)
a painting of a man holding a fishing bucket, in the style of dark orange and black, rural china, folkloric realism, naive childlike, qajar art, life-size figures, 1918–1939 (interwar) --v 5
a painting of a man with a bucket and a fish, farmer, inspired by Moïse Kisling, joseph todorovitch ”, anton, inspired by Max Pechstein, contemporary art, folk art, boy, inspired by Marianne von Werefkin, in style of henri rousseau, inspired by José Malhoa, in style of niko pirosmani --v 5
And the very first test brings an interesting observation. CLIP Interrogator recognized the author of the original painting—the great Georgian artist Niko Pirosmani. However, Midjourney's output with CLIP Interrogator's prompt is farther away from the source image. In this case, it's because MJ doesn't know the style of Pirosmani. And thus, /describe's prompt is more faithful to the original image.
Here is a double task: a great architectural chef-d'oeuvre—Pompidou Centre by architects Renzo Piano, Richard Rogers, Peter Rice, Gianfranco Franchini, Su Rogers, and Mike Davies—photographed by the talented photographer Nisian Hughes.
Pompidou Centre by Renzo Piano, Richard Rogers, Peter Rice, Gianfranco Franchini, Su Rogers, Mike Davies. Photograph by Nisian Hughes
paris, france 3d printed building from aerial video stock footage & royaltyfree footage, in the style of turquoise and indigo, felipe pantone, sarah sze, majestic ports, red and azure, dark sky-blue and light yellow, street art sensibilities --v 5
an aerial view of a city with lots of tall buildings, a hyperrealistic painting, paris school, blue and red color scheme, apartment complex made of tubes, courtesy of centre pompidou, video, abcdefghijklmnopqrstuvwxyz, colorful building, cars parked underneath, technology, splendid haussmann architecture, chromatic, mario --v 5
Tim Walker's is one of my favorite photographic styles in Midjourney. I like it almost as much as I admire his original, real-life style!
Dame of Thrones. Kristen McMenamy by Tim Walker (2012)
a man dressed in black posing for a picture, by Elmyr de Hory, white cyborg fashion shot, tilda swinton, 2003, fashion editorial photography, award winning costume design, gandalf as a woman, patrick westwood style, with shoulder pads, arms extended, style of kieran yanner, pointed hoods, tetsuya nomura --v 5
a woman in an outfit with gloves and a mask, in the style of asymmetrical geometry, gothcore, associated press photo, mote kei, androgynous, elongated forms, mallgoth --v 5
What about movies? In the case of the iconic scene from Wong Kar-wai's classic, both models delivered beautiful results (though neither is too close to the original). And Midjourney even recognized the director!
Chungking Express. Director: Wong Kar-wai, cinematography: Christopher Doyle and Andrew Lau (1994)
i think she's crying, in the style of wong kar-wai, imaginative prison scenes, yellow and white, mirror rooms, stockphoto, rumiko takahashi, traincore --v 5
a couple of people sitting on top of an escalator, a picture, inspired by Zhang Xiaogang, eating spaghetti from a bowl, ((yellow magic orchestra)), admiring her own reflection, high resolution movie still, siamese twins, 2004, huang yuxing and aya takano, [[fantasy]], jingna zhang, elevator, competition winning --v 5
It's mind-bending what the two are capable of, as in this (challenging) case with a fragment from one of art history's most famous illustrated books: the Voynich Manuscript.
"Voynich Manuscript" fragment (carbon-dated to the early 15th century)
Original illustration fragment
a large sketch of a dragon titled le monde romin à l'origine médicienne à chihon, 1537, in the style of light green and red, alchemical symbolism, assemblage art, ottoman art, mesmerizing optical illusions, otherworldly beings --v 5
a close up of a drawing of a dragon, an album cover, by Johannes Martini, occult diagram, orrery, ffffound, color illustration, monster, real image --v 5
And here is a more modern classic that generated one of the most fabulous CLIP Interrogator prompts (or prompts of any kind) in this study. Yes, with emoji. :)
"Rick and Morty." Justin Roiland, Dan Harmon (2013–ongoing)
rick and mort in a space vehicle, in the style of dark brown and light aquamarine, lively facial expressions, animated exuberance --v 5
rick rick rick rick rick rick rick rick rick rick rick rick rick rick rick rick rick rick, a cartoon, from rick and morty, photograph credit: ap, delorean, in a crashed spaceship, getty images, (smoke), 🦩🪐🐞👩🏻🦳, chicago, intense emotion, 4 d, may) --v 5
And another animation: this time, a legendary anime scene (spoiler: the results are unsettling... o____O). What is it with CLIP Interrogator and animated films??
"Spirited Away." Hayao Miyazaki (2001)
the no face, girl sitting on a train with two people, in the style of anime influenced, dark emerald and light cyan, movie still, light red and light gray, sparse and simple, chilling creatures, uniformly staged images --v 5
spirited spirited spirited spirited spirited spirited spirited spirited spirited spirited spirited spirited spirited spirited spirited spirited spirited spirited, a picture, by Miyazaki, no faces visible, a person standing in front of a, sitting, iconic film character, arasaka, gelbooru anime image, pareidolia, kano), on ship, diy, son, distant full body view --v 5
Trying to decode another classical painting, a heart-breaking masterpiece by August Friedrich Schenck, both models returned results that are strikingly similar to the original, just not in the details (notice, though, how bizarrely alike both creatures are). What is stunning is how both contestants captured the original's tone and atmosphere. Perfect hits.
August Friedrich Schenck. Anguish (c. 1878)
a painting of a sheep being writhes by crows, in the style of frieke janssens, shwedoff, mort künstler, 19th century, contrast of scale, symbolic overload, poignant --v 5
a painting of a sheep surrounded by crows, by Petrus Van der Velden, tumblr, alexey egorov, sorrow, by jim bush and ed repka, dore, 2 0 2 4, jean-sebastien rossbach, mourning family, polar, victor einrich --v 5
This round showed that both Midjourney's /describe and CLIP Interrogator know the classics from very different domains of art. However, knowing the source material and indicating it in the prompts doesn't always mean that Midjourney will return the results you expect. :)
ROUND THREE:
My Photographs
For a visual artist, /describe and CLIP Interrogator present a mind-blowing opportunity—to look at their work with an AI's eyes (pun not intended).
From "Faces of A.Picolo" series (2012). Yes, it's the same series Francis D.'s portrait comes from ;)
man in camouflage shirt on a dark background, in the style of native american, first nations, and alaska native art, cinematic lighting, feminine portraiture, new american documentary photography, hyper-realistic details, frostpunk, asymmetrical framing --v 5
a close up of a person wearing a jacket, a portrait, flickr, shin hanga, portrait of a navy seal soldier, skilled warrior of the apache, yuli ban, long chin, 2 0 2 2 photo, jon kuo, adam driver, nepal, benjamin vnuk, artists portrait --v 5
Both are perfect for close-ups, but how will our competitors deal with a medium shot and a busier portrait? To test this, here is a self-portrait from lo-o-ong ago.
Andrei Kovalev by Andrei Kovalev (2011)
the room is full of decorations, in the style of sacha goldberger, studio portrait, cluttered, studio light, neil gaiman, industrial photography, alex russell flint --v 5
man standing with camera in his hand, in the style of handcrafted objects, dramatic portraits, portraits with soft lighting, east village art, juxtaposition of objects, troubadour style, tabletop photography --v 5
CLIP Interrogator is closer stylistically. But boy, do I love /describe's interpretation of myself!
Wider shots work well when they are minimalist enough and are not loaded with details and action.
Jimsher whisky advertising (2016)
a man stands on the side of a mountain, in the style of olivier valsecchi, klaus wittmann, neo-traditionalist, elisabeth sonrel, light black, neo-classical symmetry, anglocore --v5
a man standing on top of a mountain, an album cover, by Andrei Kolkoutine, wearing black overcoat, anna kovalevskaya, volcano in background, portrait mode photo, hasan piker, portrait full body, portrait of mélenchon, 4k high res, monk, andes, peter capaldi, morning light --v 5
However, more complex photographs can confuse both AIs easily.
Poster art for "Baron Munchausen" theatrical play in Pyotr Fomenko Workshop Theatre. Director: Sergey Diachkovsky, actor: Karen Badalov (2018)
two people playing an instrument and sitting next to someone, in the style of characterful animal portraits, theatrical lighting, fantastical contraptions, clever wit, historical reproductions, tabletop photography, charming character illustrations --v 5
a man standing next to a teddy bear holding a sheet of paper, a portrait, cg society contest winner, vanitas, band playing instruments, new cats movie, studio shot, magic and steam - punk inspired, mixture between an! owl and wolf, triumphant pose, press photo, gregoire and manon, from below, costume, harmony of --v 5
Finally, let's see what happens if we include a couple of clearly visible close-ups in an input photograph.
Key visual for "The Zoo Story," a play at Mikhail Tumanishvili Theatre. Director: Mamuka Tkemaladze. Actors: Malkhaz Abuladze and Nikusha Tserediani (2017)
a man stands on the side of a mountain, in the style of olivier valsecchi, klaus wittmann, neo-traditionalist, elisabeth sonrel, light black, neo-classical symmetry, anglocore --v5
a couple of men standing next to each other, a portrait, by Edi Rama, antipodeans, promo shot, anomalisa, on black background, wintermute, stanly kubrick, catalog photo, production photo, heavy vignette!, greg rutkowski and edgar maxence, artur bordello, with wart, actors --v 5
I can't stop being amazed by what Midjourney V5 does with hands now!
Overall, both /describe and CLIP Interrogator deliver great results across every category. And with the V5 Alpha's focus on insane photorealism, I will definitely recommend them as a creative exploration (and professional!) tool not only to all my fellow photographers, but to all my fellow visual artists out there.
The Possibilities
To conclude this study, I want to quickly go over some of the superpowers that /describe and its counterpart present us with. Or: what can we use them for?
LEARNING NEW PROMPTING STRATEGIES AND EXPANDING YOUR MJ VOCABULARY
Midjourney mainly uses words it knows and can visually interpret to describe pictures. So in most cases, if /describe uses a word, an expression, or a style modifier, we can use it in our own prompts.
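One way to turn this into a habit is to collect the prompts /describe gives you and count which modifiers keep coming back. Here is a minimal Python sketch of that idea. The function names are mine (hypothetical helpers, not part of Midjourney), and it assumes prompts are comma-separated, that parameters such as `--v 5` come at the end, and that an "in the style of" prefix can be dropped. The two sample prompts are /describe outputs from Round Three.

```python
from collections import Counter

STYLE_PREFIX = "in the style of "  # assumption: treat this prefix as noise


def split_prompt(prompt: str) -> list[str]:
    """Split a prompt into lowercase modifiers, ignoring trailing parameters."""
    text = prompt.split(" --", 1)[0]  # drop everything from the first " --"
    modifiers = []
    for part in text.split(","):
        part = part.strip().lower()
        if part.startswith(STYLE_PREFIX):
            part = part[len(STYLE_PREFIX):]
        if part:
            modifiers.append(part)
    return modifiers


def modifier_counts(prompts: list[str]) -> Counter:
    """Count in how many prompts each modifier appears."""
    counts = Counter()
    for prompt in prompts:
        counts.update(set(split_prompt(prompt)))
    return counts


prompts = [
    "man standing with camera in his hand, in the style of handcrafted objects, "
    "dramatic portraits, portraits with soft lighting, east village art, "
    "juxtaposition of objects, troubadour style, tabletop photography --v 5",
    "two people playing an instrument and sitting next to someone, in the style of "
    "characterful animal portraits, theatrical lighting, fantastical contraptions, "
    "clever wit, historical reproductions, tabletop photography, "
    "charming character illustrations --v 5",
]

counts = modifier_counts(prompts)
recurring = [modifier for modifier, n in counts.items() if n > 1]
print(recurring)
```

Modifiers that recur across unrelated images are good candidates to try in your own prompts. The first item of each list is usually the subject description, so you may want to skip it.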
EXPLORING NEW ARTISTS
The same goes for the artists Midjourney "recognizes" (frequently, by mistake :)) in the input images. Every name in an output prompt means (in MOST cases) that MJ knows the artist.
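If you want to gather those names quickly, here is a rough Python sketch. It assumes the CLIP Interrogator habit of writing "by Name" or "inspired by Name" as separate comma-separated items; the helper name is hypothetical, and names that appear without such a prefix (like "wojtek fus" in the sample) are not caught. The sample prompt is the CLIP Interrogator output from Round One.

```python
PREFIXES = ("inspired by ", "by ")  # assumption: how artist credits are phrased


def artist_candidates(prompt: str) -> list[str]:
    """Return names that follow 'by' or 'inspired by' in a comma-separated prompt."""
    text = prompt.split(" --", 1)[0]  # ignore trailing parameters such as --v 5
    names = []
    for part in text.split(","):
        part = part.strip()
        for prefix in PREFIXES:
            if part.lower().startswith(prefix):
                name = part[len(prefix):].strip()
                if name and name not in names:
                    names.append(name)
                break
    return names


clip_prompt = (
    "a painting of a deer in a forest, by Bastien L. Deharme, by WLOP, "
    "fantasy art behance, beautiful digital artwork, by Jesper Ejsing, "
    "beautiful fantasy art, god of the forest, forest spirit, by Yang J, "
    "anthropomorphic deer, wojtek fus, realistic fantasy illustration, "
    "fantasy digital painting --v 5"
)

print(artist_candidates(clip_prompt))
```

Treat the result as a list of leads to test, not as proof that Midjourney knows each style.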
REVERSE-ENGINEERING PROMPTS
Midlibrary stands for sharing knowledge—know-how, insights, tips and tricks, and, naturally, prompts—in our studies.
/describe, CLIP Interrogator, and similar tools have made reverse-engineering prompts super accessible. It is now easier than ever to learn style modifiers and prompting ideas from any AI-generated image. And my hope is that this will somehow make people less secretive about their prompts. ;)
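Keep in mind that a reverse-engineered prompt rarely repeats the original wording. As a small illustration, the Python sketch below compares the deer prompt from Round One with the two guesses and lists the words they share. It is only a crude word-overlap check with a tiny, hand-picked stopword list (both are my assumptions), and it says nothing about how similar the resulting images look.

```python
import re

STOPWORDS = {"a", "an", "and", "by", "in", "of", "the"}  # assumption: minimal list


def content_words(prompt: str) -> set[str]:
    """Lowercase words of a prompt, without parameters and stopwords."""
    text = prompt.split(" --", 1)[0].lower()
    return set(re.findall(r"[a-z0-9]+", text)) - STOPWORDS


def shared_words(original: str, guess: str) -> set[str]:
    """Words that appear in both the original prompt and the guessed one."""
    return content_words(original) & content_words(guess)


original = "deer in the magical forest of elves by Ryohei Hase --v 5"
describe_guess = (
    "an digital painting of deer in woodland, in the style of alexander jansson, "
    "primordial creatures, white and bronze, aleksi briclot, "
    "whimsical character design, james paick, baroque animals --v 5"
)
clip_guess = (
    "a painting of a deer in a forest, by Bastien L. Deharme, by WLOP, "
    "fantasy art behance, beautiful digital artwork, by Jesper Ejsing, "
    "beautiful fantasy art, god of the forest, forest spirit, by Yang J, "
    "anthropomorphic deer, wojtek fus, realistic fantasy illustration, "
    "fantasy digital painting --v 5"
)

print(sorted(shared_words(original, describe_guess)))  # ['deer']
print(sorted(shared_words(original, clip_guess)))      # ['deer', 'forest']
```

Neither guess recovers the original artist's name, so what you learn from a decoded prompt is mostly new vocabulary, not the exact recipe.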
FINALLY, IT IS ONE OF THE BEST TOOLS FOR REMIXING YOUR OWN WORKS
For a creator, /describe and CLIP Interrogator are creative goldmines! If you want to see your visual artwork reinterpreted by an AI, look at it from a new perspective, or get inspired by new and unique ways of developing your work further, these instruments are a must-have addition to your toolbox.
And the winner is... *drumroll*
The /describe command (and Image-to-Text AI tools in general) is an absolute gift to a Midjourney artist. It allows you to go deeper, do more, extract new prompting strategies and style modifiers, and learn unexplored artists' names.
As to our (in no way serious) competition: after "feeding" 100+ images to both models, my experience is that CLIP Interrogator does a slightly better job than /describe. It would often decode a bit more, generating more detailed and peculiar prompts.
"Faces Of Bagrationi 1882" advertising campaign (2020)
a person in the jungle with colorful lights on her, in the style of kodak aerochrome, unreal engine 5, romanticized femininity, futuristic victorian, cinematic sets, 32k uhd, made of mist --v 5
a woman standing in the middle of a lush green forest, by Emma Andijewska, romanticism, purple volumetric lighting, jungle around him, film stills, julian ope, prince, purple - tinted, lut, neon jungle, hannibal, detailed picture, ji-min, fauna, luts, tyler, flume --v 5
But in many cases, this difference is marginal. And undoubtedly, /describe, which has only just been introduced, will grow and develop.
Both models work great with simple source material, but often struggle with complex images. And don't even get me started on the prompts they generate. The prompting purist/minimalist in me weeps! >____<
But—magically!—they work! And provide incredible opportunities to any AI artist (and any visual artist, for that matter). So be sure to try those amazing tools out. <3