
Vibe code death match

I've been running an experiment with the current breed of AI app builders (vibe coding tools), trying to replicate some real-world screens I had previously built in Bubble.io. I've learned a few things along the way that might save you time when you go looking for a new tool.

Bubble.io is a great low-code app builder, but it isn't a good AI-powered app builder, so there has always been an opportunity for something else to disrupt my short-lived love affair (workflow) with it.

I tried out the following tools:

  • Cursor
  • FigmaMake
  • Lovable
  • Replit
  • V0

I set out to make the same prototype in all of them. The prototype was, as mentioned, something that already existed, so I knew what the result should look like. This matters because I'm not working in a greenfield situation; I'm working within an existing product, and things need to work in an OOUX way. From here on out I'm calling this the 'Mind's Eye' test: how closely does the AI app builder realise my intended design outcome?

Each app builder was expected to:

  • use generative AI (an OpenAI API key was provided where necessary; see the sketch after this list) to:
    • dynamically stub data based on initial user inputs
    • create new data based on instructions and user inputs
  • provide a means to publish the prototype securely
  • incorporate the visual design of the SmartRecruiters product via one of:
    • Figma files
    • Figma MCP
    • Static images
    • Some other Figma integration
  • be usable with testing and behavioural analytics tools like Maze, Google Analytics, FullStory, etc.
  • interact with external public APIs where needed (such as the OpenAI and SmartRecruiters APIs)
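
To make the generative AI expectation concrete, here is a minimal sketch (mine, not any builder's output) of what 'dynamically stub data based on initial user inputs' looks like against the OpenAI chat completions API. The Candidate shape and the prompt are illustrative assumptions, and the model name is just a placeholder.

```ts
// Ask the model to invent realistic records from a user's input.
// Assumes OPENAI_API_KEY is set server-side; types and prompt are hypothetical.
type Candidate = { name: string; role: string; stage: string };

async function stubCandidates(companyName: string): Promise<Candidate[]> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // placeholder; any chat-capable model works
      messages: [
        {
          role: "user",
          content: `Return a JSON array of 5 plausible job candidates for ${companyName}, each with name, role and stage fields. JSON only, no prose.`,
        },
      ],
    }),
  });
  const data = await res.json();
  // A sketch: a real app would validate the JSON before trusting it.
  return JSON.parse(data.choices[0].message.content);
}
```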

I then considered the following as initial measures of what good looks like:

  • how many of the above capabilities does it have
  • how easy is it to use
    • does it incorporate point-and-shoot editing
  • is it fast to render a prompt
  • how expensive is it to run at the kind of scale a medium-sized design team might see
  • how helpful is the product in guiding users towards optimum outcomes
  • how easy is it to make subsequent prototypes without repeating much of the foundational prompting
  • how well does the prototype do across the various levels of prototype fidelity:
    • visual
    • content
    • data
    • user journey
    • micro interaction
  • is it conducive to collaboration
  • does it achieve a result close to my Mind's Eye

Having built the same app a few times now, I quickly learnt to throw in a few more quality measures:

  • will it test its code
  • will it let you know when it's done
  • how much control does it give you, and at what cost

They kinda all work the same.

This was my first big takeaway: they're pretty much all using the same underlying AI models, Claude being the main culprit here. While outcomes differ quite a lot, you can tell the underlying models are the same because they all share a similar feature set. One giveaway was the maximum prompt size being identical across half of the app builders. Another was that none of them wrote unit tests or tested their code before declaring the prompt's instructions complete. They all seem to accept a new prompt while the last is still running, which routinely leads to failures. They all returned pretty similar UIs, and sometimes (I'm yet to explain this other than conspiratorially) added identical functionality that one of the other builders had created earlier and that was never included in any prompt.
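
For the non-engineers: this is the sort of trivial unit test I mean, which none of the builders wrote for themselves. A hedged sketch using Vitest; formatSalary is a hypothetical helper of the kind these tools generate.

```ts
// Two basic checks any builder could have generated alongside its code.
// formatSalary is a made-up example helper, not from any real tool.
import { describe, it, expect } from "vitest";
import { formatSalary } from "./formatSalary";

describe("formatSalary", () => {
  it("renders a rounded figure with a currency symbol", () => {
    expect(formatSalary(85000, "EUR")).toBe("€85,000");
  });

  it("does not crash on missing input", () => {
    expect(() => formatSalary(undefined as any, "EUR")).not.toThrow();
  });
});
```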

Where they differ then is in some under-the-hood starting prompts and, of course, the UX.

For once, I need a bell notification.

I hate bell notifications; I have to turn them off on my phone, which means I miss genuine notifications. Here, however, all but one app builder stayed silent when it finished my prompt. That leaves you with two choices: sit and watch it write its code, sometimes for a minute, sometimes five, and waste hours over the day just sitting there watching. Or go away, do something else while the prompt runs for its unknown duration, and come back later. If you are like me, the latter option quickly means you lose track of the app builder, and an hour later you come back from the thing you were doing only to learn it finished after 5 minutes and has made no progress since. Both are extremely inefficient ways of working, and easily fixed with a visual and audible notification to indicate completion.
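
The fix really is that small. A sketch using standard browser APIs, assuming a placeholder sound file; nothing here is specific to any of the builders.

```ts
// Fire a sound and a desktop notification when a prompt run finishes.
async function notifyDone(message: string): Promise<void> {
  // Audible cue; "/sounds/ding.mp3" is a placeholder asset.
  new Audio("/sounds/ding.mp3").play().catch(() => {});

  // Visual cue that works even when the tab is in the background.
  if (Notification.permission !== "granted") {
    await Notification.requestPermission();
  }
  if (Notification.permission === "granted") {
    new Notification("Prompt finished", { body: message });
  }
}

// e.g. await notifyDone("Your prototype build completed after 4m 32s.");
```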

The trade-off between control and agency

These app builders all let you look at and edit the code. Total control, right? Well, absolutely none for a non-engineer. Most of us non-technical folks barely know what an API is, let alone understand the nuances of someone else's code. A couple of the app builders, FigmaMake and V0, provide a minimal set of controls for changing things like border radius and background colours, but these are still very limited in terms of direct control via a WYSIWYG interface. Instead, they rely almost entirely on giving the user agency to influence what the AI builds. This is the key difference between these AI app builders, which are agency-first (agency-only, really), and a low-code platform like Bubble.io, which is a control-only system (its AI builder isn't worth the airtime right now). The truth is, non-technical people need something somewhere in the middle.

Demo accounts suck

Most of the platforms have tight limits on how many prompts you can use for free; not enough to prove their value, not even close. And one credit doesn't equal one desired outcome; it equals one attempt at the desired outcome. For Lovable, a credit is merely a fixed number of tokens (the size of the request and response), so a single prompt can consume multiple credits. The worse the platform is, the more credits you burn and the more you pay; paradoxically, AI businesses are motivated to make their product worse in this way. For Lovable and Cursor I couldn't get anywhere near close without a paid account, which fortunately I was able to secure.
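
To illustrate the credit maths with invented numbers (not Lovable's real pricing): when a credit is a fixed token allowance, a failed attempt costs exactly as much as a successful one.

```ts
// Illustrative only: how token-based "credits" make cost scale with
// failed attempts. All numbers are assumptions, not real pricing.
const TOKENS_PER_CREDIT = 10_000; // assumed conversion rate

function creditsForPrompt(promptTokens: number, responseTokens: number): number {
  return Math.ceil((promptTokens + responseTokens) / TOKENS_PER_CREDIT);
}

// One attempt at a big change might cost 3 credits...
const oneAttempt = creditsForPrompt(4_000, 22_000); // -> 3

// ...but if it takes four attempts to land the change, that's 12 credits
// for one desired outcome. Worse platform, more attempts, more money.
console.log(`one outcome cost ${oneAttempt * 4} credits`);
```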

You spend much of your time fixing bugs.

This ties into the fact that none of these builders test their code before launch, leaving you to do it via UAT (which is not ideal on many levels). The number of 'fix' buttons I had to press, particularly with Cursor, was wildly frustrating.

 

How did all the builders do?

The best

For me, FigmaMake was the best of the bunch. I didn't want Figma to do well; they already have a stranglehold on my designer world, but here we are. FigmaMake was easy to use and had some nice ways to give the AI context, like pasting frames directly from Figma Design into Figma Make. It also had some nice point-and-edit controls for simple properties, and pointing and clicking let me show the AI exactly which part of the app I was asking to change. Where that wasn't available, describing the part I wanted to change became very repetitive. It was reliably fast, and its visual UI most closely matched the product's design language.

The good 

V0 by Vercel came second; it had similar capabilities and worked well. There is, of course, the Vercel IDE available as well, though I don't think there is any interplay between the two. I'd had limited success with Vercel itself, so it was quite good to see V0 working so well for me.

The ugly

Cursor is more of a developer-friendly, IDE-based app builder, but it's not strong enough to replace developers' everyday workflows; I am told GitHub Copilot is better. And for designers, it doesn't take care of things like publishing and securing keys in a dead simple way. It relies on you knowing how to publish an app to a hosting provider, how to use vaults and databases, and being able to create accounts for all of these.
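
For context, this is the kind of plumbing the other tools hide and Cursor leaves to you: a tiny server-side proxy so an API key never ships to the browser. A minimal sketch assuming Node 18+ and Express; the route and model name are placeholders.

```ts
// A minimal proxy that keeps the OpenAI key server-side. The browser
// calls /api/generate; the key lives in an environment variable
// (populated from your host's secret store, never committed to code).
import express from "express";

const app = express();
app.use(express.json());

app.post("/api/generate", async (req, res) => {
  const upstream = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`, // never in client code
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // placeholder model
      messages: [{ role: "user", content: req.body.prompt }],
    }),
  });
  res.status(upstream.status).json(await upstream.json());
});

app.listen(3000);
```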

I found Lovable frustrating to use. Its demo account is limited, and even the paid account chews through credits so quickly that you have to prioritise what you want to achieve. For sure, you couldn't build a full application with it in a month's worth of pro credits.

The bad

And Replit? Forget it. It's similar to all the rest, except it has one fatal flaw: it re-interprets everything on every prompt. You may be happy with a design and want to change just the button, so you say 'change the button to do x, y, z'. Then the page reloads and the entire interface has changed! It means lots of negative prompts are needed, and every prompt is on a knife-edge. Unusable.

 


End.