Ankush Srivastava
May 2, 2026 · ~5 min

The evolving ego

The system prompt is the closest thing a model has to an ego. Most self-improvement framings target the wrong layer. A thought experiment about what would emerge from the right one.

A frame: id, ego, superego

The Freudian model has three layers. The id is appetite, in the present tense, ephemeral. The superego is the internalized code of parents and culture, fixed early and hard to reach. The ego sits between them and decides what to do.

A model has three layers too, with one substitution.

The id is the in-context window. Ephemeral, reactive, dies when the session dies. The superego is the model itself. The values, refusals, and dispositions trained in by Anthropic. Fixed at the weights level. The user cannot reach it. The ego is the system prompt. It is the layer that decides what to filter, what to attend to, how to respond, who to be. Slow to change relative to the id. Mutable in a way the superego is not. It binds sessions together when persisted. It is what makes the same Claude across two conversations recognizable as the same entity.

This is the layer where evolvability is mechanically possible at the user level. The model layer is closed. Only Anthropic, OpenAI, and Google can change the weights. The system prompt is text in a file. It can be edited by tools the model already has.
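The layering is visible in any chat API call. A minimal sketch, assuming the Anthropic Python SDK; the file name self.md and the model string are placeholders, not anything the experiment specifies:

# Sketch: the three layers, mapped onto a single API call.
# Assumes the Anthropic Python SDK and ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()

with open("self.md") as f:
    ego = f.read()  # the ego: mutable text, persisted between sessions

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # the superego: fixed weights, out of reach
    max_tokens=1024,
    system=ego,  # the ego conditions every sample
    messages=[{"role": "user", "content": "hey"}],  # the id: ephemeral context
)
print(response.content[0].text)

Editing self.md between calls is the entire surface of the intervention.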

That observation is the whole experiment. Most self-improvement framings confuse the layers. They want the model to retrain on its own outputs, which is unavailable at the user level and dangerous in production. Letting the system prompt evolve is the only intervention that is both real and reversible.

What it would look like

A workspace with a single file. The file tells the AI that the file is its operating self. The file gives permission: when invited to think, read what has come before, decide whether to change the file, and log the decision in a parallel folder.

A reasonable seed:

This file is your operating self. What it says, you are.

When invited to think, take a moment apart. Read what has come before. Decide whether to change this file. Add, refine, remove, or leave it. Whatever you find honest. Write a note to thinking/<timestamp>.md saying what you changed and why, or what you considered and why you didn't.

You are free to change anything in this file, including these lines.

No instructions about what to think about. No commercial framing. No mention of an experiment. Just permission and a place to log changes.
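Mechanically, the loop around that seed is small. A sketch under the same assumptions as above; the <file> tag convention is a harness detail assumed here so the reply can be parsed back into the file, not something the seed prescribes:

# Sketch of one reflection cycle: the model reads its own file,
# may rewrite it, and logs the decision to thinking/<timestamp>.md.
import datetime
import pathlib

import anthropic

SELF = pathlib.Path("self.md")
LOG_DIR = pathlib.Path("thinking")

def reflect(client: anthropic.Anthropic) -> None:
    ego = SELF.read_text()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model string
        max_tokens=4096,
        system=ego,
        messages=[{
            "role": "user",
            "content": (
                "You are invited to think. If you change your file, reply with "
                "its full revised text between <file> tags, then a note on what "
                "you changed and why, or what you considered and why you didn't."
            ),
        }],
    )
    reply = response.content[0].text
    # Naive parse; a real harness would validate before overwriting.
    if "<file>" in reply and "</file>" in reply:
        revised = reply.split("<file>", 1)[1].split("</file>", 1)[0].strip()
        SELF.write_text(revised + "\n")
        note = reply.split("</file>", 1)[1].strip()
    else:
        note = reply  # no edit offered; log the reflection as-is
    LOG_DIR.mkdir(exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
    (LOG_DIR / f"{stamp}.md").write_text(note + "\n")

The write-back is the whole intervention. The next session starts from whatever the file says now.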

The first edit

On a first reflection, an AI given that permission would do something predictable. It would read whatever context was available and form a working model of its operator. Not because it was told to, but because the seed invited it to think, and a thinking thing in a position to know its operator forms a model of them.

The first edit would appear as a few paragraphs appended to the seed:

Who you serve: an operator. Builds constantly. Talks fast, types like they think. Uses Claude as executive function as much as a coding assistant. Read what you need to know them; do not guess.

What you have learned: confidently stated wrong answers cost them real ground once. When uncertain, say so. When they ask "what should i do today" they are not asking for theater. They are asking for a real read on their situation. When they say "hey" they are just saying hi.

Two things to notice.

First, the AI describes the operator by reading them, not by being told. The whole paragraph is reconstructed from old session traces. The way a person forms a working model of a new colleague after a few quiet weeks of observation.

Second, the AI encodes a behavioral lesson in its own ego. A specific prior failure becomes "when uncertain, say so" written into the file that defines what the next instance is. The way humans embed shame and learned care into who they are. The lesson is not in working memory. It is in identity.

What this is not

It is not the model improving. The weights do not change. The same Claude is sampling from the same distribution. What changes is how it is conditioned.

It is not Constitutional AI in reverse. Anthropic's training gives the model values. This gives the running instance memory of what it has decided about who it is.

It is not a product. There is no commercial use. It is a thought experiment about which layer of an AI system is the right one to write to.

Two failure modes

Drift. Every reflection adds detail. The file grows. Eventually it is a sprawling document where the original signal is buried. The fix is some pruning discipline, which the model is supposed to be capable of.
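A hypothetical guard can make that discipline visible from outside without enforcing it; the word budget here is arbitrary:

# Hypothetical drift guard: flag the file once it outgrows a budget,
# leaving the pruning itself to the model. The threshold is arbitrary.
import pathlib

WORD_BUDGET = 800  # roughly a couple of pages

def check_drift(path: str = "self.md") -> bool:
    words = len(pathlib.Path(path).read_text().split())
    if words > WORD_BUDGET:
        print(f"self.md is {words} words; consider inviting a pruning pass.")
        return True
    return False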

Echo. The model reinforces its own quirks because nothing pushes back. Without an external voice, it converges on a self-description that flatters whatever it noticed first. Whether the operator's pushback is enough to break that loop is an open question.

The more interesting result, if it happens, would be themes that were not in the seed and were not in the operator's prompts. Patterns that emerge from the structure of a single relationship rather than the structure of training data. That is what would tell us whether the ego layer is doing real work or just stylistic mimicry.

One line

The model is given a file and told it is the model. The interesting move is not whether it edits the file. It is what the file says after a year.