Agents want flexible schemas
The NoSQL comeback nobody saw coming
My friend Matt Holden likes to say that LLMs are fuzzy compilers. I like this analogy. It points toward the profound shift that is happening in software.
The thing about computers is that they can do anything you wish, so long as you specify your wish in exacting detail. However, LLMs relieve this constraint. An LLM can extrapolate what you mean (more or less) from just a few words. So computers can do vibes now. And it seems that once you create an AI that is smart enough to interpret natural language, you incidentally create an AI that is smart enough to solve a wide range of other problems. Throw an LLM at a new domain, and it will usually find some way to make itself useful.
However, a problem arises at the boundary layer between these two paradigms of code and vibes. It’s an impedance mismatch that is most obvious at the data layer:
Code wants fixed schemas: Code crashes when it encounters something unfamiliar. You deal with this by validating all inputs and ensuring they conform to a rigid data schema. Schemas narrow the domain so that your code can gracefully handle all possible branches of the (reduced) state space.
LLMs want text: LLMs, on the other hand, are able to orient to new domains and recover from errors. Pinning down an LLM with fixed schemas just limits its potential. You want the LLM to be able to invent its own domain models on the fly, and since LLMs think in tokens, you’re usually better off just saving text to the database.
There are, however, certain kinds of data that exist in the liminal space between vibes and code:
Markup languages like XML and HTML
Text-based data formats like YAML, JSON, and CSV
DSLs like spreadsheet formulas or SQL
Hybrid authoring formats like Markdown with frontmatter
These formats were designed to bridge the boundary between computers (code) and humans (vibes). They are text-based and they tend to be forgiving of errors. Most of them evolved out of things like wikis, or tools for thought, or static site generators—places where humans and computers need to cooperate. And it feels significant to me that they are finding new uses in AI. (Does AI converge toward knowledge management?)
One format jumps out as almost universally useful: Markdown with frontmatter. It’s a retread of the headers pattern we see in packets, email, and HTTP: a block of structured data, followed by free text. The combination offers something for everyone. Code can leverage the structured data, while AI gets natural language. This feels like the closest thing to an AI-native data format.
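To make the headers pattern concrete, here is a minimal sketch of splitting a document into structured data and free text. It assumes frontmatter is delimited by `---` lines and holds only flat `key: value` pairs; the names `FrontmatterDoc` and `parseFrontmatter` are made up for illustration, and a real harness would more likely hand the header block to a proper YAML parser.

```typescript
// Structured data for code, free text for the model.
// `FrontmatterDoc` and `parseFrontmatter` are hypothetical names.
type FrontmatterDoc = {
  data: Record<string, string>;
  content: string;
};

// Split a document into its frontmatter block and its body.
// Assumption: flat `key: value` pairs between two `---` lines, not full YAML.
function parseFrontmatter(source: string): FrontmatterDoc {
  const lines = source.split("\n");
  if (lines.length === 0 || lines[0]?.trim() !== "---") {
    return { data: {}, content: source }; // no header block: all text
  }
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) {
    return { data: {}, content: source }; // unterminated header: treat as text
  }
  const data: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const sep = line.indexOf(":");
    if (sep === -1) continue; // skip lines that are not key: value
    data[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
  }
  return { data, content: lines.slice(end + 1).join("\n").trim() };
}

const sampleDoc = parseFrontmatter(
  ["---", "type: note/v1", "title: Chords", "---", "A chord is three or more notes."].join("\n")
);
```

Code can branch on `sampleDoc.data`, while `sampleDoc.content` goes to the model untouched.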
I’ve been experimenting with variations of the frontmatter pattern in agent harnesses. In SQL, I’ve been leveraging jsonb features to save the structured data. So you end up with something like…
create table docs (
  id text not null,
  type text,
  data jsonb not null default '{}'::jsonb,
  content text not null default '',
  -- etc
)
A few notes on what I’ve learned from these experiments:
Markdown with data is flexible enough to describe skills and agents. If your harness loads skills and agents from the same document store that agents write to, then agents can upgrade their own skills and create new agents.
Identifying documents by path (e.g. music/music-theory/chords) has turned out to be surprisingly useful. Paths give agents useful hints about content. They also offer useful access patterns. You can make prefix lookups (music/music-theory/*) quite fast in SQL (index with text_pattern_ops in Postgres). LLMs are good at navigating directories if you give them a few tools. They get a lot of training in code harnesses.
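A rough sketch of what a prefix lookup tool might build, assuming the `docs` table above with the path stored in `id`. The helper names and the index name are hypothetical. Whether Postgres actually uses the index for a bound parameter depends on how the statement is planned, so treat this as a starting point and check the query plan.

```typescript
// A parameterized query: SQL text plus bound values.
type PrefixQuery = { text: string; values: string[] };

// Index that lets Postgres serve `like 'prefix%'` on ids from a btree.
// `docs_id_prefix` is a made-up index name.
const createPrefixIndex =
  "create index docs_id_prefix on docs (id text_pattern_ops)";

// Escape LIKE wildcards so a path such as `my_notes/100%` matches literally.
// Backslash is the default LIKE escape character in Postgres.
function escapeLike(value: string): string {
  return value.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

// Build a query for every doc under a directory-like prefix.
function listByPrefix(prefix: string): PrefixQuery {
  const dir = prefix.endsWith("/") ? prefix : prefix + "/";
  return {
    text: "select id, type from docs where id like $1 order by id",
    values: [escapeLike(dir) + "%"],
  };
}

const theoryQuery = listByPrefix("music/music-theory");
```

Appending the trailing slash keeps `music/music-theory` from also matching a sibling like `music/music-theory-archive`.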
It’s a good idea to index content for plain text search (FTS5 in SQLite, tsvector in Postgres). This gives your agents another powerful access pattern.
Code wants schemas to be relatively stable, so you’ll want some form of schema validation for the data. Easy answer: assign a type discriminator to every doc, and validate on write.
If you implement the schema validation logic with JSON Schema, you can save the schemas themselves to your document store. Now agents can define new domain models on the fly as they encounter new problem spaces.
I give my agents tools to list doc schemas. It turns out that the agents are pretty smart about looking up an appropriate document type before saving.
It’s a good idea to let schema validation throw an informative error in your write_doc tool. The LLM is smart enough to fix its mistakes and try again.
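The validate-on-write loop from the last few notes can be sketched like this. The schema format here is a deliberately tiny stand-in for JSON Schema (a list of fields with a primitive type), and `schemaRegistry`, `docStore`, `validateDoc`, and `writeDoc` are all hypothetical names. The part worth copying is the shape of the error message: it names the doc, every offending field, and the known types.

```typescript
// Tiny stand-in for JSON Schema: fields with a primitive type.
type FieldSpec = { name: string; type: "string" | "number" | "boolean"; required: boolean };
type DocSchema = { fields: FieldSpec[] };
type StoredDoc = { id: string; type: string; data: Record<string, unknown>; content: string };

// Schemas are keyed by the doc's type discriminator.
const schemaRegistry = new Map<string, DocSchema>();
schemaRegistry.set("observation/v1", {
  fields: [
    { name: "metric", type: "string", required: true },
    { name: "value", type: "number", required: true },
    { name: "unit", type: "string", required: false },
  ],
});
const docStore = new Map<string, StoredDoc>();

// Collect every problem so the agent can fix them all in one retry.
function validateDoc(doc: StoredDoc): string[] {
  const schema = schemaRegistry.get(doc.type);
  if (!schema) {
    const known = Array.from(schemaRegistry.keys()).join(", ");
    return [`unknown type "${doc.type}"; known types: ${known}`];
  }
  const problems: string[] = [];
  for (const spec of schema.fields) {
    const value = doc.data[spec.name];
    if (value === undefined) {
      if (spec.required) problems.push(`missing required field "${spec.name}" (${spec.type})`);
    } else if (typeof value !== spec.type) {
      problems.push(`field "${spec.name}" should be ${spec.type}, got ${typeof value}`);
    }
  }
  return problems;
}

// Validate, then save. The thrown message is what the harness would hand
// back to the model as the tool result.
function writeDoc(doc: StoredDoc): StoredDoc {
  const problems = validateDoc(doc);
  if (problems.length > 0) {
    throw new Error(`write_doc rejected ${doc.id}: ${problems.join("; ")}`);
  }
  docStore.set(doc.id, doc);
  return doc;
}
```

A failed write never reaches the store, so the agent can correct the doc and call the tool again without cleanup.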
Schema migration remains tricky. It’s common for NoSQL records to have a schema and a version field to facilitate migration. However, I prefer to define version in the type (e.g. “agent/v1”), and define migrations as a function of type-to-type. This gives me clean type discriminators in TypeScript.
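A small sketch of what version-in-the-type can look like. The fields on these agent docs are invented for illustration; only the “agent/v1” naming comes from the text above.

```typescript
// Two versions of a hypothetical agent doc. The version lives in the type
// discriminator instead of a separate version field.
type AgentV1 = { type: "agent/v1"; data: { name: string; prompt: string } };
type AgentV2 = { type: "agent/v2"; data: { name: string; prompt: string; tools: string[] } };
type AgentDoc = AgentV1 | AgentV2;

// A migration is a plain function from one type to the next.
function migrateAgentV1toV2(doc: AgentV1): AgentV2 {
  return { type: "agent/v2", data: { ...doc.data, tools: [] } };
}

// Upgrade any known version to the latest. The switch narrows on `type`,
// and the `never` assignment makes the compiler flag a version that was
// added to the union but not handled here.
function upgradeAgent(doc: AgentDoc): AgentV2 {
  switch (doc.type) {
    case "agent/v1":
      return migrateAgentV1toV2(doc);
    case "agent/v2":
      return doc;
    default: {
      const unhandled: never = doc;
      throw new Error(`no migration for ${JSON.stringify(unhandled)}`);
    }
  }
}

const upgradedAgent = upgradeAgent({
  type: "agent/v1",
  data: { name: "librarian", prompt: "Keep the wiki tidy." },
});
```

Because each version is its own member of the union, code that handles `AgentV2` never has to check a version number at runtime.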
If you implement versioning for the document store, you have a Karpathy-style LLM wiki, but better, because the agents can save structured data. I’ve been playing with bolting Couch-like versioning on top of SQL in this vibe-coded clone of CouchDB on SQLite.
Versioning also gives you a way to resolve conflicts that arise when multiple agents share the same LLM wiki. Conflict resolution is a deep rabbit hole, but a simple trick is to make the LLM re-read the existing version of a document if the version it is working from is out of date. It’s like solving CRDTs with a sledgehammer, but it works.
CouchDB’s conflict resolution strategy is simple, and I think it has nice properties for LLMs. The database surfaces replica conflicts on read. The application (not the database) resolves the conflicts by reading them in and making a new merge commit. The idea is that the application knows the most about the domain model, so it is best positioned to resolve conflicts.
While conflicts between replicas are possible, CouchDB rejects writes from clients when a write lags behind head. This avoids the most common source of conflicts, simple timing issues. The client must pull the latest, rebase, and try again. This might feel like extra ceremony, but it’s the read-before-write pattern that we want to force on LLMs anyway, and it gets us multi-agent collaboration over shared docs.
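Here is a minimal in-memory sketch of that reject-and-retry loop. It is not CouchDB’s API: `rev` is an integer counter chosen for simplicity (CouchDB uses opaque revision strings), and `putDoc`, `readDoc`, and `editWithRetry` are hypothetical names. The `edit` callback stands in for the LLM call that produces new content from the latest version.

```typescript
type VersionedDoc = { id: string; rev: number; content: string };
type PutResult =
  | { ok: true; doc: VersionedDoc }
  | { ok: false; reason: "conflict"; head: VersionedDoc };

const versionedStore = new Map<string, VersionedDoc>();

// A doc that does not exist yet reads as rev 0 with empty content.
function readDoc(id: string): VersionedDoc {
  return versionedStore.get(id) ?? { id, rev: 0, content: "" };
}

// Accept a write only if it was based on the current head revision.
function putDoc(id: string, baseRev: number, content: string): PutResult {
  const head = readDoc(id);
  if (baseRev !== head.rev) {
    return { ok: false, reason: "conflict", head };
  }
  const doc: VersionedDoc = { id, rev: head.rev + 1, content };
  versionedStore.set(id, doc);
  return { ok: true, doc };
}

// Read-before-write loop: on conflict, redo the edit against the new head.
function editWithRetry(
  id: string,
  edit: (current: string) => string,
  maxAttempts = 3
): VersionedDoc {
  let base = readDoc(id);
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = putDoc(id, base.rev, edit(base.content));
    if (result.ok) return result.doc;
    base = result.head; // someone else wrote first: re-read and try again
  }
  throw new Error(`gave up on ${id} after ${maxAttempts} attempts`);
}
```

Returning the head along with the conflict saves the agent a round trip, since the rejected write already tells it what to re-read.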
Don’t forget that you can build specialized tools backed by the document store. For example, we have a time series tool in the Deep Future harness. Each time series observation is saved as a separate document, but the tool allows for bulk writes of hundreds of observations, and takes care of gory details such as generating ULIDs for each document.
Since documents can store arbitrary JSON, you can actually persist UI component state to documents. In part of our Deep Future codebase, we’re playing with Lit components that persist their state to docs in the store. A service worker caches the latest doc revisions for every UI component. Subscribe the service worker to an SSE feed of doc changes, and you have a simple sync engine. We get to skip the ceremony of hooks and stores. Plus, component state gets all of the features we built for docs, including wiki-like versioning, schema validation, schema migration, and full-text search.
Pairing all of this with generative UI can be incredibly powerful, and alleviates one of the downsides of flexible schemas. Even with all of that schema validation and migration, we can run into edge cases connecting soft schemas to hard code. But the agent can act as a flexible mediator between data and UI. Give the agent a skill that instructs it on how to plug data into HTML attributes, and it will jam the round peg into the square hole.
NoSQL is back!
How about a few links?
Tim Kellogg has a great roundup of agent memory patterns.
A good overview of the agent harness patterns leaked from the Claude Code source. A lot of these are actor patterns, rediscovered.
Classical inheritance is weird and complicated. How did we end up with this pervasive OOP pattern? It turns out inheritance was a performance hack. Oh.
Dotcom capacity swaps. I was fuzzy on the specifics of the Dotcom Crash, but this article clears it up. Step one: raise money. Step two: lend that money to other companies so they can turn around and buy your product, thus manufacturing demand. Step three: sell each other rights to your networks. Do we need rights to these networks? Don’t worry about it! We’re booking the revenue today and amortizing the cost over twenty years. Hilarious. Hacker mindset is SV’s greatest strength (and weakness).
Playing to your outs. A MtG deep cut. How do you play a game that you know you’re bound to lose? Very differently. You leverage everything you’ve got and do things that would otherwise be very reckless.



