Deconstruct: How Midjourney hit $200M with no VC funding using AI growth loops
How 40 employees bootstrapped a $200M AI image generation startup
Have you ever said no to a VC willing to hand you a blank check?
Well, David Holz, founder of Midjourney, says no on a daily basis.

VCs: “Do you want millions of dollars?”
David Holz: “Naw.”
Midjourney is an anomaly. As an AI image generation startup, it has generated over $200M in revenue with just 40 employees (as of 2023) and no VC funding. Ever. And with Midjourney 6, they’re pushing state-of-the-art image generation models. It’s no wonder VCs are clamouring to invest.

Midjourney was built on Discord, a Slack-meets-Reddit community app mainly for gamers. In fact, there wasn’t even a proper web app. Midjourney.com was a site full of Matrix-looking numbers that just redirected to the Discord channel where all the generations happened. Even after rolling out a web app, they still let users generate on Discord today.

But to understand Midjourney’s meteoric rise, you need to understand how they’ve used Discord for peer-to-peer onboarding, adopted multi-generation UX, built the generate-edit-data loop, and changed how they take feedback from users.
Deconstruct is a series of deep dives where I break down the growth and product strategies behind high-growth AI products.
I dissect the psychology & mechanics behind the new behavioural and growth loops AI startups are leveraging to grow.
What Midjourney learnt from video games: Peer-to-peer onboarding
All GenAI products face an education gap.
For example, new Midjourney users needed to learn how to prompt before they could generate images - and Midjourney’s prompting was harder to pick up than that of other AI tools.

Imagine trying to understand what this prompt means as a new user.
However, in Midjourney, you can see everyone’s generated images - including all their edits and mistakes.

At first, you think it's terrible UX. It's messy. The feed is spammed. It’s hard to find your own generations.
But then, you do what humans do - watch others, copy, and learn. You start to figure out how to prompt, what works, and the specific phrases that get better results.
By making you prompt in a Discord channel alongside other users’ generations, Midjourney forces you to learn from others.
It’s peer-to-peer onboarding, and it’s free.
And in doing so, you learn how to prompt without ever reading a guide - or one of those extremely skippable (and punchable) product pop-ups.

In Discord, you can see the breadcrumbs of a user’s prompts.
And as others get better and learn, so do you. You start to pick up common phrases that make amazing images like “hyperrealistic”, “cinematic lighting”, and “4K”.
I call this “peer-to-peer” onboarding, and the Discord UX is perfect for this.
Don’t know how to get a certain aesthetic? Browse Discord.
Need inspiration? Browse Discord.
Got a question? Browse Discord.

✨ Important: Peer-to-peer onboarding needs a live feed.
Static feeds don’t work as well because the user still needs to scroll to browse other users’ generations.
Live feeds, on the other hand, take advantage of default bias - users are constantly exposed to new prompt-generations by default, without needing to lift a finger. YouTube’s Autoplay uses similar psychology.
We can compare Midjourney’s Discord live feed to Pika.Art’s static feed to see the difference. See how fast you drop off on a static feed.
Midjourney wasn’t the first to do this.
Video games have notoriously steep learning curves. From controls to tips/tricks to an entire game economy, there’s no way you can teach that in an onboarding funnel or a 3-part email series.
So they turn to their user base to do the legwork.
Look at any popular video game, there is often a subreddit of loyal gamers sharing tips, tricks, and easter eggs. For example, r/FortNiteBr has over 4M redditors.
And before Reddit, there was GameFAQs. And before GameFAQs, there were forums. And in each platform, players would teach other players - making 10-page game guides, FAQs, answering each others’ questions, and even coding full game mods for the community.
Peer-to-peer UX is a virtuous onboarding loop. It worked before, and it works for Midjourney today.

If you remember this, you were probably really cool back in the day - but probably also a cheater. 👀
How multi-generation UX is defining GenAI
One of the most difficult problems with GenAI products is predicting what the user wants from a single prompt and producing the correct output.
For example, the user might not prompt well, the model might misunderstand the intent, or the user might simply get unlucky due to the model’s inherent randomness - leading to a disappointing outcome.

Example of a bad generation. Good on you AI for not seeing species though.
Prompt: wallpaper of a tiny astronaut floating in space next to the moon, mysterious and eerie --ar 7:4
But what if you generated multiple? So for every prompt, you generate 4 images instead of just 1. At least one of them is bound to be right - right?
That’s why every Midjourney prompt generates at least 4 low-res images.
By providing multiple generations per prompt, Midjourney’s AI model doesn’t need 100% accuracy. It just needs to be right 1 in 4 times, or 25% of the time.
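To put rough numbers on it (my own back-of-the-envelope assumption, not Midjourney’s internal metrics): if each image independently satisfies the user with some probability p, showing four at once dramatically raises the odds that at least one lands.

```python
# Back-of-the-envelope sketch with assumed numbers, not Midjourney's actual hit rates.
def hit_rate(p: float, n: int = 4) -> float:
    """Probability that at least one of n independent generations satisfies the user."""
    return 1 - (1 - p) ** n

print(hit_rate(0.25, n=1))  # 0.25 - a 25%-accurate model showing a single image
print(hit_rate(0.25, n=4))  # ~0.68 - the same model satisfies ~68% of prompts with 4 images
```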

ChatGPT does this too.
Ever wonder why most GPT replies are bullets?

Bullet points are the multi-generation UX for text models. Instead of generating 1 answer, it generates multiple answers. And in this case, at least 1 of the 5 bullets will yield a satisfying answer.
Same with Perplexity, Opusclip, Suno AI.

Models provide multiple outputs because at least one of them will be valuable.
However, video GenAI apps like RunwayML and Pika.Art struggle with this because it’s a lot more expensive to generate 4 video clips than 4 images. Waiting a minute for a video to generate, only for it to be way off from your intended prompt, is never a great user experience.
Most AI users get around this by generating images first, then feeding them into an image-to-video AI.
However, as of December 2024, Sora’s UX gets it right. Every Sora prompt can generate up to 4 videos.

This. This is what VC money can buy.
But multi-generation UX is critical for another reason. By providing 4 images, it also allows for one of Midjourney’s most powerful growth loops…
Generate → Edit → Data Loop
Every Midjourney prompt generates 4 images.
Underneath each generation, there are 8 buttons - 4 buttons to upscale one of the images and 4 buttons to generate variations of an image.
Each action gives a positive signal (upscale) or a neutral/negative signal (variations) to the model. This data feedback loop allows the model to optimize for better images in the future.

U = Upscale, V = Variation
And that’s the brilliance of the Midjourney UI.
Midjourney captures the behavioural data of their users and uses it to improve their model - and therefore their product - which brings in more users.
This is the triforce of AI models. The perfect loop. The loop to rule them all.

Loops are powerful because they apply leverage - making more output from your inputs. For example, with this loop, just increasing users will grow 3 things - your user base, training data, and product - all at the same time.
This is used by Netflix, Spotify, Google Ads, you name it:



This data loop has worked for recommendation-centric products. But the same now applies to AI products. Better data leads to better prediction of what to generate.
Midjourney has many “edit” behaviours that capture high-signal actions from users:
Upscale (positive signal)
Remaster (positive signal)
Export (positive signal)
Aspect ratio (slight positive signal)
Vary Region (slight positive signal)
Variation (slight negative signal)
Capturing these edits is vital to the growth loop: better data leads to better model predictions and happier users.
This is why there’s a trend of AI startups trying to “capture” the edits in the process (e.g. RunwayML, Canva). It also makes for a smoother workflow.
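As a rough illustration (entirely my own sketch, not Midjourney’s pipeline), here’s the kind of mapping a team could use to turn those edit actions into weak labels for fine-tuning or reranking:

```python
# Hypothetical signal weights - my assumption of how edit actions could be scored,
# not Midjourney's actual training pipeline.
SIGNAL_WEIGHTS = {
    "upscale": 1.0,       # user committed to this image: strong positive
    "remaster": 1.0,
    "export": 1.0,
    "aspect_ratio": 0.3,  # kept the image, tweaked the framing: mild positive
    "vary_region": 0.3,
    "variation": -0.3,    # close, but not quite right: mild negative
}

def label_event(prompt: str, image_id: str, action: str) -> dict:
    """Turn a raw UI event into a weakly-labelled (prompt, image, signal) example."""
    return {
        "prompt": prompt,
        "image_id": image_id,
        "signal": SIGNAL_WEIGHTS.get(action, 0.0),  # unknown actions carry no signal
    }

print(label_event("tiny astronaut floating in space", "img_0042", "upscale"))
```

Even a coarse weighting like this is enough to turn everyday UI clicks into preference data.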
Showing user edits is vital for engagement.
While most AI products capture edits, few actually show the edits that users make.
Showing edits to new users does two things:
Teaches new users how to edit their own prompts to get better outcomes.
Sets expectations for new users. The most common drop-off point for a new user of an AI tool is after their first few (often mediocre) prompts: they notice their results are worse than the polished work they’ve seen elsewhere and dismiss the tool as “not that good yet”. Showing other users’ edits resets their expectations and shows that it takes a combination of multiple prompts and post-editing to get a great image.
To be honest, I don’t think this UX problem has been solved yet. Discord’s live feed sort of does this but I’d imagine some type of version control would be a better way of showing this to new users.
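If I were to sketch that “version control for prompts” idea (purely my interpretation, not an existing Midjourney feature), it might look like a simple chain of revisions a newcomer could replay:

```python
# A rough sketch of prompt version control - my interpretation, not a real Midjourney feature.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PromptRevision:
    prompt: str
    action: str                                # e.g. "initial", "variation", "upscale"
    image_id: str
    parent: Optional["PromptRevision"] = None  # link back to the prompt it was edited from

    def lineage(self) -> list[str]:
        """Walk back to the original prompt so a newcomer can replay every step."""
        chain, node = [], self
        while node is not None:
            chain.append(f"{node.action}: {node.prompt}")
            node = node.parent
        return list(reversed(chain))

root = PromptRevision("tiny astronaut floating in space", "initial", "img_001")
edit = PromptRevision("tiny astronaut floating in space, cinematic lighting --ar 7:4",
                      "variation", "img_002", parent=root)
print("\n".join(edit.lineage()))
```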
Discord for a shorter product-user feedback loop
Bug in the model? File a Discord ticket.
Have a UX suggestion? Tag a Discord mod.
Need customer support? Message in the Discord.
Traditionally, bug reports, user interviews, and customer support would go through 3-4 different tools that we all hate using, like Zendesk, Fullstory, a Slack bot, and email.
It’d take weeks to set up a user interview. Weeks for a user to provide more details on a bug report via email.
For an industry that raves about feedback, the tools we use make feedback a chore.
But Midjourney was different.
They saw that they could use Discord as their user feedback/support/testing tool - enabling users to provide quick, seamless, rapid feedback.
Imagine: users could tag you in their image generation, engineers could see the prompt, see the logs, test a fix, and ask the user to try it again - all in a day, and all in one tool.
But this could only exist because the Midjourney product team bought into using Discord as their primary user research and testing tool. And they did. Rapid iteration and feedback let them work out bugs, fine-tune the model, and much more.
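For a sense of how cheap this is to set up, here’s a minimal sketch (my assumption of the approach, not Midjourney’s actual bot) of a Discord bot that forwards bug reports straight to an engineering channel using the discord.py library:

```python
# Minimal sketch using discord.py - illustrative only; the channel ID, trigger word,
# and token are placeholders, and this is not Midjourney's actual implementation.
import discord

intents = discord.Intents.default()
intents.message_content = True          # required to read message text
client = discord.Client(intents=intents)

TRIAGE_CHANNEL_ID = 123456789           # hypothetical channel where engineers triage reports

@client.event
async def on_message(message: discord.Message):
    if message.author.bot:
        return
    # Treat any message that mentions the bot and contains "bug" as a report.
    if client.user in message.mentions and "bug" in message.content.lower():
        triage = client.get_channel(TRIAGE_CHANNEL_ID)
        await triage.send(
            f"Bug report from {message.author}: {message.content}\n"
            f"Jump to message: {message.jump_url}"
        )

client.run("YOUR_BOT_TOKEN")            # placeholder token
```

An engineer reading that channel sees the exact prompt and can reply in the same thread - which is the whole point: the feedback loop lives where the users already are.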
✨ GenAI products depend on rapid user feedback loops.
Unlike traditional apps, the outputs of AI models are variable.
Meaning - you’ll never get the exact same reply from ChatGPT, given the same prompt. And you’ll likely get a lot more “wrong” responses and errors.
As a result, it’s critical to have a rapid way to collect, analyze, and fix poor responses from the model, because there will be 10x more unintended outputs than in a traditional app.
Or you end up like Google’s Gemini. (Too soon? Too soon. 👀)
A startup powered by flywheels and feedback loops.
Where other startups have subscribed to throwing the kitchen sink (and several million dollars) at marketing and growth, Midjourney experimented with new paradigms of product and growth.
Where others would chalk it up to chance, luck, and first-mover advantage, I see the reasons Midjourney worked so well on Discord. Even as Midjourney has moved to a web app, they’ve kept a “chat” feature that preserves several of the paradigms we talked about: live feeds, multi-generation UX, and peer-to-peer onboarding.

Midjourney’s chat feature on its new web app.
However, I’m still shocked that they’ve been able to create such powerful flywheels with such a small team. Midjourney is a testament that you don’t need more money to build in this era of AI - you just need the right leverage, the right loops, and the right people.