We Invented Image Slicing Again
GPT-4o can generate a product page as an image, then generate the imagemap coordinates itself, which means we have arrived somewhere either brilliant or cursed.
We did invent image slicing again. The regression from semantic HTML to generated screenshots-as-UI is real and ongoing. A decade of web standards progress, casually discarded because the model can draw a button faster than it can code one.
Anybody remember slicing PNGs?
You'd take a Photoshop comp, cut it into thirty-seven individual GIFs, reassemble them in a table with zero borders and zero cellspacing, and call it a website. The whole thing was a single image pretending to be a document. It was insane. We spent a decade escaping it. Semantic HTML, CSS layouts, component libraries, design systems, the entire apparatus of modern web development: all of it was an argument against the sliced image.
We're back.
The loop I watched happen today: ask GPT to generate HTML for a product page, get perfectly adequate HTML for a fictional boutique selling hats to squirrels, feed that HTML into a fresh GPT-4o instance, ask it to render the page visually. It does. Then somebody points out — correctly — that you don't need the HTML step at all.
Just ask for the product page. Get an image. Then ask a vision-language model (VLM) to generate the <map> coordinates.
<img src="squirrel-boutique.png" usemap="#squirrelMap" />
<map name="squirrelMap">
<area shape="rect" coords="50,260,350,660" href="acorn-cap.html" alt="The Acorn Cap" />
<area shape="rect" coords="375,260,675,660" href="top-nut-top-hat.html" alt="Top Nut Top Hat" />
...
</map>
This is imagemap. This is 1995. The model looked at its own rendered output, measured the product cards, and handed back pixel coordinates.
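To make that step concrete, here is a minimal Python sketch of turning measured boxes into image map markup. It assumes the VLM hands back one labeled rectangle per product card. The `Box` type, the `build_image_map` helper, and the box format are all hypothetical, invented for illustration, and not the pipeline that was actually used.

```python
from __future__ import annotations

from dataclasses import dataclass
from html import escape


@dataclass
class Box:
    """One clickable region in image pixels (hypothetical VLM output format)."""
    left: int
    top: int
    right: int
    bottom: int
    href: str
    alt: str


def build_image_map(image_src: str, map_name: str, boxes: list[Box]) -> str:
    """Render an <img> plus a client-side <map> with one rect <area> per box."""
    lines = [
        f'<img src="{escape(image_src)}" usemap="#{escape(map_name)}" />',
        f'<map name="{escape(map_name)}">',
    ]
    for box in boxes:
        # rect coords are left,top,right,bottom, matching the snippet above
        coords = f"{box.left},{box.top},{box.right},{box.bottom}"
        lines.append(
            f'<area shape="rect" coords="{coords}" '
            f'href="{escape(box.href)}" alt="{escape(box.alt)}" />'
        )
    lines.append("</map>")
    return "\n".join(lines)


# The two product cards from the squirrel boutique example.
boxes = [
    Box(50, 260, 350, 660, "acorn-cap.html", "The Acorn Cap"),
    Box(375, 260, 675, 660, "top-nut-top-hat.html", "Top Nut Top Hat"),
]
markup = build_image_map("squirrel-boutique.png", "squirrelMap", boxes)
print(markup)
```

Escaping the `href` and `alt` values matters here because they would come from model output, not from a template someone wrote by hand.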
The part that's hard to shake: it generated all the downstream pages too. The whole site — every product detail page, every click target — fell out of the pipe. No component library. No CSS variables. No build step. Just a model reasoning spatially about an image it made.
There's a version of this that's a parlor trick and a version that's the whole game.
The parlor trick version is just funny. We went around an enormous loop — years of web standards, accessibility advocacy, progressive enhancement discourse, the long war against tables and spacer GIFs — and landed back at "it's a picture with hotspots." The Acorn Cap, $24.99, coords="50,260,350,660". Cute.
The other version is what happens when image generation gets fast enough to be real-time. Right now there's latency: you generate an image, you wait, you get coordinates, you wait. But those gaps only move in one direction, and it is down. The async step becomes a frame rate. And once it's a frame rate, the interaction model changes entirely.
You stop sending UI components. You send rendered moments. The model decides what the interface looks like — not from a component tree but from its own understanding of what the interface should look like, right now, for this user, for this task. It renders. You click. It re-renders. The whole thing is omnimodal — text, image, gesture, voice — with no layer separation between "logic" and "display" because there's only one thing happening.
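The render, click, re-render cycle can be sketched as a hit test inside a loop. This is a toy under loud assumptions: `render_moment` stands in for a real image model call and only looks up canned hotspots, and the `Area` type, the `FRAMES` table, and the back-link coordinates are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Area(NamedTuple):
    """A rectangular hotspot in frame pixels, plus the state it leads to."""
    left: int
    top: int
    right: int
    bottom: int
    target: str


def hit_test(areas: list[Area], x: int, y: int) -> Optional[str]:
    """Return the target of the first area containing (x, y), or None."""
    for area in areas:
        if area.left <= x <= area.right and area.top <= y <= area.bottom:
            return area.target
    return None


# Stand-in for the model: each state maps to the hotspots of its rendered frame.
FRAMES = {
    "home": [
        Area(50, 260, 350, 660, "acorn-cap"),
        Area(375, 260, 675, 660, "top-nut-top-hat"),
    ],
    "acorn-cap": [Area(0, 0, 100, 40, "home")],        # hypothetical back link
    "top-nut-top-hat": [Area(0, 0, 100, 40, "home")],  # hypothetical back link
}


def render_moment(state: str) -> list[Area]:
    """Pretend to render a frame; a real system would generate an image here."""
    return FRAMES[state]


def run(clicks: list[tuple[int, int]], state: str = "home") -> list[str]:
    """Render, take a click, move to the clicked target, and render again."""
    visited = [state]
    for x, y in clicks:
        target = hit_test(render_moment(state), x, y)
        if target is not None:  # clicks outside every hotspot are ignored
            state = target
            visited.append(state)
    return visited


history = run([(100, 300), (10, 10), (500, 400), (900, 900)])
print(history)
```

The loop has no component tree and no separate routing layer. Each state is simply whatever the renderer returns for it, which is the property the paragraph above is pointing at.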
That's a diffusion OS. That's not a UI toolkit, it's a medium.
Nobody planned the squirrel hats. The boutique was a test case: absurd enough to be obviously fake, concrete enough to have real UI requirements (products, prices, cart buttons, a footer with a copyright notice and a contact email, squirrelystyles@example.com). The model produced a complete working example without being told what "complete" meant.
The imagemap angle came from someone who remembered that <map> exists, which is a thing that fewer and fewer people do every year, which makes the whole episode feel slightly like receiving a transmission from 1998 via a model trained on the entire archive of human decisions about how to put rectangles on screens.
We kept all of it. Even the part where it spelled "visually" as "visuallt" and nobody fixed it because it was ready to roll.