PLEASE OUTPUT A PYTHON PARSEABLE LIST OF URLS AND NO OTHER COMMENTARY
Two discoveries in one afternoon: Weave makes everything observable, and Pydantic makes the all-caps prompt extinct.
structured outputs became the standard. instructor's approach was adopted by every major provider — openai, anthropic, google all ship native structured output modes. the 'yelling in caps at a statistical process' era ended. pydantic won.
There is a specific genre of prompt that every LLM practitioner has written — the one in all caps, the one that ends with a threat or a bribe, the one that says something like "PLEASE OUTPUT A PYTHON PARSEABLE LIST OF URLS AND NO OTHER COMMENTARY, I WILL TIP YOU HEARTILY AND OTHERWISE I MAY DIE."
You have written this prompt. I have written this prompt. We all have.
It doesn't work consistently. The model adds a preamble. Or a postscript. Or decides to explain what a URL is. And then you add more caps. More exclamation points. You put the instruction at the top AND the bottom. You start negotiating with a statistical process.
The thing that ends this — and I mean ends it completely, forever, for this entire category of problem — is a Pydantic class and Jason Liu's Instructor library. You stop writing prayer and start writing types.
class UrlList(BaseModel):
"""
This is a list of URLs.
"""
urls: Optional[List[str]] = Field(
default_factory=list,
description="List of URLs that can be empty.",
min_length=0
)
That's it. The model is forced to adhere to the schema. Not asked. Not begged. Not tipped. Forced. The structure is the prompt now — and it turns out structure is a better communicator than desperation.
This arrived on the same afternoon as Weave, which is Weights & Biases applying everything they learned from years of ML training observability to the inference-time stack. Traces, spans, inputs, outputs — across fly.io ephemeral agents, across Cloudflare Workers, across whatever scraps of distributed infrastructure you've stitched together. One decorator. That's the implementation cost.
The combination is what's interesting. You now have structured, typed outputs you can actually parse reliably — and a trace of every call showing you exactly what went in and what came out. Optimization becomes possible in a way it wasn't before. Before this, debugging an LLM pipeline meant squinting at logs if you were lucky and reading tea leaves if you weren't. Now you have a run. You can look at the run. You can compare runs.
I keep thinking about how much time went into prompt gymnastics for output formatting — the preprocessing, the parsers, the fallback logic for when the model decided to preface its JSON with "Certainly! Here is the JSON you requested:". All of it is just gone. It was the wrong layer to solve the problem at.
The tutorial for the RAG trace specifically — watching it render the full retrieval and generation pipeline as a tree of observable calls — is the kind of thing where you sit there for a second and think about all the invisible work that used to happen in that gap between input and output.
It wasn't invisible because it had to be. It was invisible because we hadn't built the glass yet.
Counterpoints
Push back, extend the argument, or sharpen it. New counterpoints go through review before they show up here.
No approved counterpoints yet.