The Recipe Is the App — In Software 3.0

Keeping Pixshop sharp means staying honest about what the best AI can actually do right now — not what shipped six months ago, not what the benchmarks say, but what works in practice. I track it closely: papers, demos, open-source releases, anything that moves the frontier. When something meaningful ships, I want to know whether it changes what Pixshop should be doing.

Last week, Andrej Karpathy (co-founder of OpenAI, former head of AI at Tesla, and now founder of Eureka Labs) gave a talk at Sequoia AI Ascent 2026. Near the end, he pulled up menugen.app, a side project he had built a few months earlier, and called the entire thing spurious under the Software 3.0 paradigm.

It confirmed the bet at the core of Pixshop: as models grow stronger, you do not need to fine-tune a model on dozens of your photos. One selfie is enough, and the recipe handles the rest.

What Andrej said

The talk covers what Karpathy calls the Software 1.0/2.0/3.0 progression: from hand-coded rules, to trained weights, to models that reason directly over raw data. Menugen.app is his live example of where that progression leads. Here is how he described seeing the Software 3.0 version:

"And then I saw the software 3.0 version of this which blew my mind, which is literally just take your photo, give it to Gemini, and say use Nanobanana to overlay the things onto the menu. Nanobanana basically returned an image that is exactly the picture of the menu that I took, but it actually put into the pixels, it rendered the different things in the menu. And this blew my mind because actually all of my menu gen is spurious. It's working in the old paradigm that the app shouldn't exist. The software 3.0 paradigm is a lot more raw. Your neural network is doing more and more of the work, and your prompt or context is just the image, and the output is an image, and there's no need to have any of the app in between."

Verifying it

I handed it off to my agent team. Ten minutes later, I had a working reproduction. The setup is simpler than it sounds: write a recipe — a structured specification in natural language, about 20 lines — describing exactly what to preserve and what to produce. Keep every dish name and price as written. Render a realistic food photo beside each dish. Do not invent dishes that are not on the menu. Maintain a warm restaurant aesthetic. That recipe, plus the menu photo, goes into a single model call.
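A minimal sketch of that setup in Python, for readers who want to see its shape. The four rules are the ones listed above; the header wording, the function names, and the `call_model` interface are assumptions made for illustration, not the code used in the reproduction. `call_model` stands in for whichever image-capable model API you use.

```python
from typing import Callable, List

# The four constraints come from the paragraph above. A real recipe
# (about 20 lines) would carry more detail than this.
RECIPE_RULES: List[str] = [
    "Keep every dish name and price exactly as written.",
    "Render a realistic food photo beside each dish.",
    "Do not invent dishes that are not on the menu.",
    "Maintain a warm restaurant aesthetic.",
]

# Hypothetical model interface: (recipe_text, image_bytes) -> image_bytes.
ModelCall = Callable[[str, bytes], bytes]


def build_recipe(rules: List[str]) -> str:
    """Join the rules into one natural-language specification."""
    header = "You are given a photo of a restaurant menu. Return one image."
    body = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return f"{header}\n{body}"


def render_menu(menu_photo: bytes, call_model: ModelCall) -> bytes:
    """One model call: recipe plus photo in, finished image out."""
    return call_model(build_recipe(RECIPE_RULES), menu_photo)
```

Everything that would normally be pipeline code lives in the text of `RECIPE_RULES`. Changing the output means editing a sentence, not a function.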

Here is the original menu:

[Image: original restaurant menu photo]
Input: a restaurant menu, plain text, no photography.

What came back

Below is the output — one model call, nothing else:

[Image: model-generated visual menu with food thumbnails beside each dish, styled typography, and a warm restaurant aesthetic]
Output: one structured recipe, one model call. No image database. No OCR. No template engine.

The model read the menu, identified every dish, generated food photography, made layout decisions, and returned a fully designed result. No database. No OCR step. No template engine. No compositing layer. Credit to Andrej for the idea.

Thinking differently

The natural instinct when building software is decomposition: break the problem into clean steps, solve each step well, chain them together. For something like menugen, that means OCR to pull the text, a generative model to produce food images, a layout engine to composite the result. Each piece has a defined responsibility. The engineering feels tractable.

Software 3.0 makes that instinct a trap. The model already understands the full problem — it can read the menu, visualize the dishes, and design the layout in one pass. The constraint is not capability; it is specification. A vague recipe produces generic output. A precise recipe — one that describes exactly what to preserve, what to generate, and what constraints matter — unlocks what the model actually knows how to do. The engineering work is not building the pipeline. It is learning how to ask.

Figuring out a good recipe is iterative. You run it, see where the output falls short, add precision in that area, re-run. The question to ask at every step is: is this falling short because the model cannot do it, or because I did not tell it clearly enough what I wanted? With a capable model, the answer is almost always the latter.

What makes this compelling over time is compounding. When the underlying model gets stronger — which happens on a roughly quarterly cadence — every recipe built on top of it gets better results automatically. A pipeline requires active maintenance to improve. A better model on the same recipe is free improvement. The craft of writing good recipes accumulates in a way that pipeline maintenance does not.

Pixshop was already built this way

When you upload a selfie to Pixshop, there is no face-detection step followed by a re-lighting model followed by a background compositor. Your photo goes in. A recipe — carefully written and iterated across hundreds of real outputs — tells the model exactly what to preserve and what to transform. One model call. Your photo comes out.

This was an architectural choice made early, not a retrofit. The recipe is version-controlled, independently iterable, and decoupled from the model underneath. When a stronger model arrives, the same recipes run on it unchanged. No re-engineering required.
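One way to picture that decoupling is the Python sketch below. It is an illustration under assumptions, not Pixshop's actual code: the `Recipe` fields, the `headshot` example and its text, and the two stand-in models are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: (recipe_text, photo_bytes) -> image_bytes.
ModelCall = Callable[[str, bytes], bytes]


@dataclass(frozen=True)
class Recipe:
    """Plain text plus a version string, so it can live in version control."""
    name: str
    version: str  # bumped whenever the text changes
    text: str


def run(recipe: Recipe, photo: bytes, model: ModelCall) -> bytes:
    """Single call. The recipe knows nothing about which model runs it."""
    return model(recipe.text, photo)


# Stand-in models for illustration only; real ones would call a provider.
def model_v1(text: str, photo: bytes) -> bytes:
    return b"v1:" + photo


def model_v2(text: str, photo: bytes) -> bytes:
    return b"v2:" + photo


# Illustrative recipe; the name, version, and text are made up.
headshot = Recipe(
    name="headshot",
    version="1.4.0",
    text="Preserve the face exactly. Replace the background.",
)
```

Calling `run(headshot, photo, model_v1)` and `run(headshot, photo, model_v2)` uses the same recipe object both times. The model is an argument, so upgrading it changes one call site and no recipe text.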

That is why Pixshop works from a single selfie. The older approach to AI photo tools asks for ten, fifteen, even twenty photos — that is the training data needed to fine-tune a model on your face when the base model is not strong enough to generalize from one shot. In Software 3.0, the base model is strong enough. A well-written recipe, one photo, one call. Karpathy's demo gave a name to that direction.

What this means if you use AI photo tools

A pipeline-based tool has a quality ceiling set by its weakest step. A recipe-based tool has a quality ceiling set by the model — and model quality has been compounding steadily for years. Pipelines do not compound on their own.

It also changes what reliability looks like. Pipelines tend to be brittle outside the specific cases they were designed for. A model reading a recipe can handle variation, because it is understanding context rather than matching a template.

Try it at Pixshop — upload a selfie and see what a well-written recipe does with it. Three free credits. No card required.
