Agents want flexible schemas
The NoSQL comeback nobody saw coming
My friend Matt Holden says that LLMs are fuzzy compilers. I like this analogy. It points toward the profound shift that is happening in software.
The thing about computers is that they can do anything you wish, so long as you specify your wish in exacting detail. However, LLMs relieve this constraint. An LLM can extrapolate what you mean (more or less) from just a few words, so computers can do vibes now. And it seems that once you create an AI smart enough to interpret natural language, you incidentally have an AI that is smart enough to solve a wide range of other problems, too. Throw an LLM at a new domain, and it will usually find some way to make itself useful.
However, a problem arises at the boundary layer between these two paradigms of code and vibes. It’s an impedance mismatch that is most obvious at the data layer:
Code wants fixed schemas: Code crashes when it encounters something unfamiliar. You deal with this by validating all inputs and ensuring they conform to a rigid data schema. Schemas narrow the domain so that your code can gracefully handle all possible branches of the (reduced) state space.
LLMs want text: LLMs, on the other hand, are able to orient to new domains and recover from errors. Pinning down an LLM with fixed schemas just limits its potential. You want the LLM to be able to invent its own domain models on the fly, and since LLMs think in tokens, you’re usually better off just saving text to the database.
There are, however, certain kinds of data that exist in the liminal space between vibes and code:
Markup languages like XML and HTML
Text-based data formats like YAML, JSON, and CSV
DSLs like spreadsheet formulas or SQL
Hybrid authoring formats like Markdown with frontmatter
These formats were designed to bridge the boundary between computers (code) and humans (vibes). They are text-based, and they tend to be forgiving of errors. Most of them evolved out of things like wikis, tools for thought, or static site generators—places where humans and computers need to cooperate. And it feels significant to me that they are finding new uses in AI. (Does AI converge toward knowledge management?)
One format jumps out as almost universally useful: Markdown with frontmatter. It’s a retread of the headers pattern we see in packets, email, and HTTP: a block of structured data, followed by free text. The combination offers something for everyone. Code can leverage the structured data, while AI gets natural language. This feels like the closest thing to an AI-native data format.
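To make the headers-then-text shape concrete, here is a minimal TypeScript sketch that splits a document into structured data and free text. The `splitFrontmatter` name and the flat `key: value` header are assumptions for illustration. Real frontmatter is usually YAML and would call for a proper YAML parser.

```typescript
// Hypothetical shape: structured data for code, free text for the model.
interface ParsedDoc {
  data: Record<string, string>;
  content: string;
}

// Split "---\nkey: value\n---\nbody" into header fields and body text.
// Assumption: the header is flat `key: value` lines, not full YAML.
function splitFrontmatter(source: string): ParsedDoc {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (match === null) {
    // No header block: treat the whole document as free text.
    return { data: {}, content: source };
  }
  const header = match[1] ?? "";
  const body = match[2] ?? "";
  const data: Record<string, string> = {};
  for (const line of header.split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip lines that are not `key: value`
    data[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { data, content: body };
}
```

Code can branch on `data.type` while the model reads `content` as-is.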
I’ve been experimenting with variations of the frontmatter pattern in agent harnesses, and not just limited to Markdown files. In SQL, for example, you can leverage jsonb features to the same effect, with something like…
create table docs (
id text not null,
type text,
data jsonb not null default '{}'::jsonb,
content text not null default '',
-- etc
)
A few notes on what I’ve learned from these experiments:
Markdown with data is flexible enough to describe skills and agents. If your harness loads skills and agents from the same document store that agents write to, then agents can upgrade their own skills and create new agents.
Identifying documents by path (e.g. music/music-theory/chords) has turned out to be surprisingly useful. Paths give agents hints about the content at that path. They also offer useful access patterns: you can make prefix lookups (music/music-theory/*) quite fast in SQL (in Postgres, an index with text_pattern_ops). LLMs are good at navigating directories if you give them a few tools. They get a lot of training for this!
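A prefix lookup can be sketched as a left-anchored LIKE query, which is the kind of pattern a text_pattern_ops index is meant to serve. This assumes the `id` column of the `docs` table holds the path and that the database driver accepts Postgres-style `$1` placeholders. The `prefixQuery` and `escapeLike` helpers are hypothetical names.

```typescript
interface SqlQuery {
  text: string;
  values: string[];
}

// Escape LIKE wildcards so a literal "%" or "_" in a path is not treated as a pattern.
function escapeLike(literal: string): string {
  return literal.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

// Build a left-anchored LIKE query for everything under a path prefix.
// Assumption: `docs.id` stores the document path, e.g. "music/music-theory/chords".
function prefixQuery(prefix: string): SqlQuery {
  const base = prefix.endsWith("/") ? prefix : prefix + "/";
  return {
    text: "select id, type from docs where id like $1 escape '\\' order by id",
    values: [escapeLike(base) + "%"],
  };
}
```

Calling `prefixQuery("music/music-theory")` yields the parameter `music/music-theory/%`, so an agent tool that lists a directory only needs to pass the path through.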
It is a good idea to build a plain text search index for content (FTS5 in SQLite, tsvector in Postgres). This gives your agents another powerful access pattern.
Code wants schemas to be relatively stable, so you’ll want some form of schema validation for the data. Easy answer: assign a type discriminator to every doc, and validate on write.
If you implement the schema validation logic with JSONSchema, you can save the schemas themselves to your document store. Now agents can define new domain models on the fly as they encounter new problem spaces.
It’s a good idea to let schema validation throw an informative error in your write_doc tool. The LLM is smart enough to fix its mistakes and try again.
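Here is a rough sketch of validate-on-write with a type discriminator. To stay self-contained it uses a tiny hand-rolled field check as a stand-in for a real JSON Schema validator. The `schemaRegistry`, the example types, and the in-memory `savedDocs` map are all assumptions for illustration, not the actual harness.

```typescript
// Stand-in for JSON Schema: each doc type lists required fields and their primitive types.
type FieldType = "string" | "number" | "boolean";
type SimpleSchema = Record<string, FieldType>;

// Hypothetical schema registry keyed by type discriminator.
const schemaRegistry: Record<string, SimpleSchema> = {
  "note/v1": { title: "string" },
  "datapoint/v1": { series: "string", value: "number" },
};

interface DocInput {
  id: string;
  type: string;
  data: Record<string, unknown>;
  content: string;
}

// In-memory stand-in for the document store.
const savedDocs = new Map<string, DocInput>();

// Validate on write. The error text is written for the model to read and act on.
function writeDoc(doc: DocInput): void {
  const schema = schemaRegistry[doc.type];
  if (schema === undefined) {
    const known = Object.keys(schemaRegistry).join(", ");
    throw new Error(`Unknown type "${doc.type}". Known types: ${known}.`);
  }
  const problems: string[] = [];
  for (const field of Object.keys(schema)) {
    const expected = schema[field];
    const actual = typeof doc.data[field];
    if (actual !== expected) {
      problems.push(`data.${field} should be ${expected}, got ${actual}`);
    }
  }
  if (problems.length > 0) {
    throw new Error(`Invalid "${doc.type}" doc: ${problems.join("; ")}.`);
  }
  savedDocs.set(doc.id, doc);
}
```

Returning the thrown message as the tool result gives the model the field name, the expected type, and the list of known types, which is usually enough for it to correct the call.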
I give my agents tools to list doc schemas. It turns out that agents are pretty smart about looking for an appropriate document type before saving.
Schema migration remains tricky with flexible schemas. It is common for NoSQL records to have a version field to facilitate migration. However, I prefer to encode the version in the schema type itself (e.g. “agent/v1”), and define migrations as a function of type-to-type. This gives me clean type discriminators in TypeScript.
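A minimal sketch of what type-to-type migration can look like in TypeScript. The `agent/v1` discriminator comes from the example above. The `agent/v2` type and the `prompt`, `instructions`, and `tools` fields are hypothetical, invented only to show the shape.

```typescript
// Hypothetical versioned doc types. The version lives in the type discriminator.
interface AgentV1 {
  type: "agent/v1";
  prompt: string;
}

interface AgentV2 {
  type: "agent/v2";
  instructions: string;
  tools: string[];
}

type AnyAgent = AgentV1 | AgentV2;

// A migration is a plain function from one type to the next.
function agentV1toV2(doc: AgentV1): AgentV2 {
  return { type: "agent/v2", instructions: doc.prompt, tools: [] };
}

// Upgrade any stored version to the latest. The switch narrows on `type`,
// so the compiler checks that every version is handled.
function upgradeAgent(doc: AnyAgent): AgentV2 {
  switch (doc.type) {
    case "agent/v1":
      return agentV1toV2(doc);
    case "agent/v2":
      return doc;
  }
}
```

Adding a new version means adding one interface, one migration function, and one `case`. A missing `case` shows up as a compile error rather than a runtime surprise.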
If you implement version history for the document store, you will have a Karpathy-style LLM wiki, but better, because the agents can also save structured data.
Versioning also gives you a way to resolve conflicts that arise when multiple agents edit the same wiki page.
I think CouchDB’s conflict resolution strategy has nice properties for LLMs. I’ve been playing with bolting Couch-like versioning on top of SQL, and it’s surprisingly simple to implement. The gist is that the database surfaces conflicts on read. The application (not the database) resolves the conflicts by reading them in and making a new merge commit. The application knows the most about the domain model, so it is best positioned to resolve conflicts.
While conflicts between replicas are possible, CouchDB does reject writes from clients when a write is based on a stale version. This avoids the most common source of conflicts, simple timing issues, and it turns out to be a perfect fit for LLMs. If the LLM tries to write a stale version, it is forced to read the latest, rebase, and try again. This is sort of solving CRDTs with a sledgehammer, but it works. LLMs are great at merging new changes on top of a document.
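The stale-write rule can be sketched with an in-memory store. This is a simplification under stated assumptions: it uses an integer revision counter where CouchDB uses opaque revision strings, and the `putDoc` name and `wikiPages` map are hypothetical.

```typescript
interface StoredDoc {
  rev: number; // simplified revision counter
  content: string;
}

// In-memory stand-in for the document store.
const wikiPages = new Map<string, StoredDoc>();

// Reject a write unless it is based on the latest revision.
// `baseRev` is the revision the writer last read (0 for a new doc).
// Returns the new revision on success.
function putDoc(id: string, baseRev: number, content: string): number {
  const current = wikiPages.get(id);
  const currentRev = current === undefined ? 0 : current.rev;
  if (baseRev !== currentRev) {
    throw new Error(
      `Stale write to "${id}": based on rev ${baseRev}, latest is rev ${currentRev}. ` +
        "Read the latest version, reapply your change, and try again."
    );
  }
  const nextRev = currentRev + 1;
  wikiPages.set(id, { rev: nextRev, content });
  return nextRev;
}
```

If two agents both read rev 1, the first write succeeds and becomes rev 2, while the second is rejected with a message telling the agent to re-read and rebase. The merge itself is left to the agent.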
Don’t forget: you can build specialized tools backed by the document store. For example, we have a time series tool in the Deep Future harness. Each datapoint is saved as a separate document, but the tool lets you bulk-write hundreds of datapoints at once, and takes care of the gory details of generating ULIDs for each document.
Since documents can store arbitrary JSON, you can actually persist UI component state to documents. In part of our Deep Future codebase, we’re playing with Lit components that persist their state to docs in the store. A service worker caches the latest doc revisions for every UI component. Subscribe to an SSE feed of doc changes, and you have a simple sync engine that lets you skip the ceremony of hooks and stores. Plus, component state gets all of the first-class features of docs, including wiki-like versioning, schema validation, schema migration, and full-text search.
Pairing all of this with generative UI can be incredibly powerful, and alleviates one of the downsides of flexible schemas. Even with all of that schema validation and migration, we can run into edge cases connecting soft schemas to hard code. But the agent can act as a flexible mediator between data and UI. Give the agent a skill that instructs it on how to plug data into HTML attributes, and it will jam the round peg into the square hole.
NoSQL is back!
How about a few links?
Tim Kellogg has a great roundup of agent memory patterns.
A good overview of the agent harness patterns leaked from the Claude Code source. A lot of these are actor patterns, rediscovered.
Classical inheritance is weird and complicated. How did we end up with this pervasive OOP pattern? It turns out inheritance was a performance hack. Oh.
Dotcom capacity swaps. I was fuzzy on the specifics of the Dotcom Crash, but this article clears it up. Step one: raise money. Step two: lend that money to other companies so they can turn around and buy your product, thus manufacturing demand. Step three: sell each other rights to your networks. Do we need rights to these networks? Don’t worry about it! We’re booking the revenue today and amortizing the cost over twenty years. Hilarious. Hacker mindset is SV’s greatest strength (and weakness).
Playing to your outs. A MtG deep cut. How do you play a game that you know you’re bound to lose? Very differently. You leverage everything you’ve got and do things that would otherwise be very reckless.



