{"version":"v1","site":{"name":"expectedwrong","url":"https://expectedwrong.com"},"links":{"collection":"https://expectedwrong.com/api/public/posts","rss":"https://expectedwrong.com/rss.xml","llms":"https://expectedwrong.com/llms.txt"},"post":{"slug":"mint-video-model-snapchat-uoft","title":"Snapchat and UofT Built a Video Model That Actually Understands the Assignment","subtitle":"MINT treats video generation like storyboarding — and the prompt coherence is unsettling.","url":"https://expectedwrong.com/mint-video-model-snapchat-uoft","api_url":"https://expectedwrong.com/api/public/posts/mint-video-model-snapchat-uoft","published_at":1734264000,"published_at_iso":"2024-12-15T12:00:00.000Z","updated_at":1771550038,"updated_at_iso":"2026-02-20T01:13:58.000Z","tags":["video-generation","ai","snapchat","diffusion-models","sora"],"excerpt":"MINT treats video generation like storyboarding — and the prompt coherence is unsettling.","meta_description":"MINT treats video generation like storyboarding — and the prompt coherence is unsettling.","reading_time_minutes":2,"word_count":251,"engagement":{"signals":0,"counterpoints":0},"body_markdown":"Sora dropped six days ago and the discourse is still mostly people arguing about whether it's real-time or not, which means everyone is sleeping on MINT — a video model out of Snapchat Research and the University of Toronto that is doing something conceptually interesting.\n\nThe framing is storyboards. Instead of treating a prompt as a single atomic instruction and hoping the model figures out the narrative arc, MINT approaches video generation the way a director actually thinks about scenes — shot by shot, with continuity across the sequence. Sora gestured at this with its storyboard interface. MINT seems to have made it load-bearing.\n\nThe part that's hard to dismiss is the prompt coherence. Most video models fail the moment your description gets specific — you ask for a woman in a red coat crossing a wet street at dusk and you get a generic person in an outdoor location at some time of day. MINT's outputs stay close to what you actually asked for, which sounds like a minimum viable product but is somehow still a differentiator in December 2024.\n\nSnapchat is a strange institution to be running serious video generation research. They are also, it turns out, one of the few consumer companies that has been thinking seriously about short-form video at scale for a decade, which is either a coincidence or the entire explanation.\n\nThe UofT collaboration is less surprising. Toronto has been a machine learning node since before it was fashionable to call it that.\n\nWorth watching.","body_text":"Sora dropped six days ago and the discourse is still mostly people arguing about whether it's real-time or not, which means everyone is sleeping on MINT — a video model out of Snapchat Research and the University of Toronto that is doing something conceptually interesting. The framing is storyboards. Instead of treating a prompt as a single atomic instruction and hoping the model figures out the narrative arc, MINT approaches video generation the way a director actually thinks about scenes — shot by shot, with continuity across the sequence. Sora gestured at this with its storyboard interface. MINT seems to have made it load-bearing. The part that's hard to dismiss is the prompt coherence. Most video models fail the moment your description gets specific — you ask for a woman in a red coat crossing a wet street at dusk and you get a generic person in an outdoor location at some time of day. MINT's outputs stay close to what you actually asked for, which sounds like a minimum viable product but is somehow still a differentiator in December 2024. Snapchat is a strange institution to be running serious video generation research. They are also, it turns out, one of the few consumer companies that has been thinking seriously about short-form video at scale for a decade, which is either a coincidence or the entire explanation. The UofT collaboration is less surprising. Toronto has been a machine learning node since before it was fashionable to call it that. Worth watching.","hindsight":{"verdict":"persists","note":"The storyboard approach to video generation — shot by shot with continuity — remains the right direction. Whether MINT or something else ships it as the default UX is still playing out.","links":[],"at":1739980800,"at_iso":"2025-02-19T16:00:00.000Z"}}}