{"version":"v1","site":{"name":"expectedwrong","url":"https://expectedwrong.com"},"links":{"collection":"https://expectedwrong.com/api/public/posts","rss":"https://expectedwrong.com/rss.xml","llms":"https://expectedwrong.com/llms.txt"},"post":{"slug":"apollo-veo2-december-2024","title":"Two Things That Dropped This Week and One of Them Is Genuinely Funny","subtitle":"Apollo can watch an entire season of TV. Veo 2 can probably make one.","url":"https://expectedwrong.com/apollo-veo2-december-2024","api_url":"https://expectedwrong.com/api/public/posts/apollo-veo2-december-2024","published_at":1734350400,"published_at_iso":"2024-12-16T12:00:00.000Z","updated_at":1771550075,"updated_at_iso":"2026-02-20T01:14:35.000Z","tags":["video-ai","multimodal","generative-video","google","meta"],"excerpt":"Apollo can watch an entire season of TV. Veo 2 can probably make one.","meta_description":"Apollo can watch an entire season of TV. Veo 2 can probably make one.","reading_time_minutes":2,"word_count":283,"engagement":{"signals":0,"counterpoints":0},"body_markdown":"Apollo landed on GitHub quietly this week — a family of video-language models that can watch and understand hour-long videos, which sounds like a solved problem until you remember that almost nothing can actually do it without hallucinating the plot.\n\nSmall models. That's the part that got me. Not some 70B behemoth that costs a dollar a query — models in the 1-7B range, capable of temporal reasoning across content that would take a human ninety minutes to watch. The obvious application everybody will reach for is surveillance and sports. The obviously correct application is episodic television. Feed it a full season of a prestige drama, ask it where the foreshadowing lives, and get something back that would take a very dedicated Reddit thread three years to produce. 
The models are from Meta GenAI and Stanford University, the repo is public, and I have been staring at the benchmark numbers for longer than I should admit.\n\nThen there's Veo 2.\n\nGoogle just dropped their second video generation model and it makes Sora look like it was announced on a bad day — which, to be fair, it was. OpenAI spent a year hyping Sora, gave it a theatrical release, and then Google DeepMind shipped something that apparently handles physics and motion coherence in ways that make Sora's outputs look like they were rendered in a hurry. Which, again, they were.\n\nThe funny part is that Sora hadn't even been available to most people for more than a week before Veo 2 appeared to fold its coat neatly over a chair and sit down next to it.\n\nThis is what it looks like when the lab with infinite compute finally stops holding back. It is not subtle.","body_text":"Apollo landed on GitHub quietly this week — a family of video-language models that can watch and understand hour-long videos, which sounds like a solved problem until you remember that almost nothing can actually do it without hallucinating the plot. Small models. That's the part that got me. Not some 70B behemoth that costs a dollar a query — models in the 1-7B range, capable of temporal reasoning across content that would take a human ninety minutes to watch. The obvious application everybody will reach for is surveillance and sports. The obviously correct application is episodic television. Feed it a full season of a prestige drama, ask it where the foreshadowing lives, and get something back that would take a very dedicated Reddit thread three years to produce. The models are from Meta GenAI and Stanford University, the repo is public, and I have been staring at the benchmark numbers for longer than I should admit. Then there's Veo 2. Google just dropped their second video generation model and it makes Sora look like it was announced on a bad day — which, to be fair, it was. 
OpenAI spent a year hyping Sora, gave it a theatrical release, and then Google DeepMind shipped something that apparently handles physics and motion coherence in ways that make Sora's outputs look like they were rendered in a hurry. Which, again, they were. The funny part is that Sora hadn't even been available to most people for more than a week before Veo 2 appeared to fold its coat neatly over a chair and sit down next to it. This is what it looks like when the lab with infinite compute finally stops holding back. It is not subtle.","hindsight":{"verdict":"right","note":"Small video-language models that understand hour-long content became a real research category. The observation that the correct application is episodic television rather than surveillance was the right provocation.","links":[],"at":1739980800,"at_iso":"2025-02-19T16:00:00.000Z"}}}