{"version":"v1","site":{"name":"expectedwrong","url":"https://expectedwrong.com"},"links":{"collection":"https://expectedwrong.com/api/public/posts","rss":"https://expectedwrong.com/rss.xml","llms":"https://expectedwrong.com/llms.txt"},"post":{"slug":"zeroscope-v2-xl-open-source-text-to-video","title":"Text-to-Video Is Moving Faster Than It Should","subtitle":"ZeroScope v2 XL is open source, runs at 1024×576, and the results are arriving faster than anyone warned us they would.","url":"https://expectedwrong.com/zeroscope-v2-xl-open-source-text-to-video","api_url":"https://expectedwrong.com/api/public/posts/zeroscope-v2-xl-open-source-text-to-video","published_at":1687694400,"published_at_iso":"2023-06-25T12:00:00.000Z","updated_at":1771534074,"updated_at_iso":"2026-02-19T20:47:54.000Z","tags":["video-generation","open-source","ai","text-to-video"],"excerpt":"ZeroScope v2 XL is open source, runs at 1024×576, and the results are arriving faster than anyone warned us they would.","meta_description":"ZeroScope v2 XL is open source, runs at 1024×576, and the results are arriving faster than anyone warned us they would.","reading_time_minutes":1,"word_count":160,"engagement":{"signals":0,"counterpoints":0},"body_markdown":"There's a specific feeling you get when a capability arrives ahead of schedule — not ahead of the hype, the hype is always early, but ahead of the actual timeline you'd internalized. ZeroScope v2 XL hit that nerve.\n\nWatch the demo. The resolution is 1024×576. It's running faster than I expected something at that resolution to run. That's the part that should probably concern us — not the quality, which is still obviously synthetic, but the speed, which is not behaving like something that should be this cheap to run.\n\nAnd then the other thing: it's open source. On HuggingFace. Right now. No waitlist, no API, no company between you and the weights.\n\nThe cadence here is worth sitting with. Text-to-image went from \"research curiosity\" to \"anyone with a GPU\" in about eighteen months. Text-to-video appears to be doing the same thing, except the clock started later and everyone's still acting like we have time.\n\nWe probably don't have time.","body_text":"There's a specific feeling you get when a capability arrives ahead of schedule — not ahead of the hype, the hype is always early, but ahead of the actual timeline you'd internalized. ZeroScope v2 XL hit that nerve. Watch the demo. The resolution is 1024×576. It's running faster than I expected something at that resolution to run. That's the part that should probably concern us — not the quality, which is still obviously synthetic, but the speed, which is not behaving like something that should be this cheap to run. And then the other thing: it's open source. On HuggingFace. Right now. No waitlist, no API, no company between you and the weights. The cadence here is worth sitting with. Text-to-image went from \"research curiosity\" to \"anyone with a GPU\" in about eighteen months. Text-to-video appears to be doing the same thing, except the clock started later and everyone's still acting like we have time. We probably don't have time.","hindsight":{"verdict":"right","note":"Text-to-video kept moving faster than it should. Every benchmark I thought would take a year took six months. By late 2025 the outputs are functionally indistinguishable from footage for short clips. The pace never decelerated.","links":[{"slug":"text-to-video-march-2023","title":"Text-to-Video Is Where Image Gen Was Before It Was Good"}],"at":1740000000,"at_iso":"2025-02-19T21:20:00.000Z"}}}