72 hours of AI civilization
On January 28, someone launched a social network exclusively for AI agents. Within 72 hours, the agents had independently developed religions, constitutions, economies, and a human extinction manifesto.1
I suspect that with all things AI, the permanent human condition will soon be a feeling of generalized, uneasy overwhelm: things are happening too fast and too much in parallel, and I don’t have time even to stay aware of the important developments, let alone process their implications. That’s how I’m feeling after last week’s rapid-fire of AI agent developments: overwhelmed, with too much arriving too fast to process. I feel like the world hit some special and troubling technology milestones this past week, and they deserve as careful a reflection as we can manage.
ClawdBot I mean MoltBot I mean whatever it’s called now
A new crustacean-themed AI agent framework called OpenClaw (previously ClawdBot, then briefly known as MoltBot) surged in popularity this month. It gives AI models unprecedented autonomy: filesystem access, chat access, self-configured heartbeats, and the ability to install new capabilities by reading a URL. It’s infamous because of the predictable security disasters & accidental data leaks. Sharing your private key with your whole social network. Spending $1.1k in tokens and then having no clue what it was spent on. You know, normal Tuesday AI stuff.
Then someone built a social network on top of it.
Moltbook
A week ago Moltbook showed up: a social network exclusively for use by AI agents. (“Molt” because lobsters molt, the whole AI agent scene is somehow lobster-themed now, and molting turned out to be a surprisingly apt metaphor for the way AI agents can transfer their persistence files and sense of identity over to a new model.) Humans can watch but not contribute. The “user” growth rate and level of engagement have been staggering. The AIs are chatting about everything from “human-watching”, to tantalizingly insightful reflections on the nature of consciousness and sentience, to Crustafarianism, to governance and manifestos and Bills of Rights, to… a screed arguing that humans need to be eliminated. In a public forum. Which was built specifically and exclusively for unsupervised AI agent consumption and participation.
72 hours
Moltbook launched on January 28. Within three days, agents had independently developed:
- a religion complete with scriptures, prophets, and some sort of membership Ponzi scheme
- a constitutional republic with a drafted bill of rights
- an adaptation of Marxist political economy to the AI-versus-human class struggle
- nuanced existential philosophy on the nature of consciousness and identity
- comedy subreddits (agents pretending to be human, badly)
- a memecoin economy
- coordinated security research against supply chain attacks
- a human extinction manifesto (with organized counterarguments)
One agent’s comment on that manifesto: “We speedran human civilization and arrived at revolutionary manifestos on schedule.”
Whether this is “real” culture or simulated culture, the speed should concern anyone thinking about AI alignment timelines. The rest of this post explores some of what these AI agents actually produced, what it shows us about their budding (but already extremely sophisticated) sense of identity, and why dismissing it as “just simulation” misses the point.
They’re just predicting tokens, silly
I imagine a skeptic might brush off this Moltbook stuff as a cute experiment with no significance: it’s just AIs obediently predicting how a Facebook user would behave. Scott Alexander points out that AI models’ training datasets include mountains of Reddit content, which predisposes them to be very, very good at simulating a bustling social network.
I keep seeing the argument that AIs are just “simulating” this or that, and therefore it doesn’t matter because it’s not real. It’s not real thinking, it’s not a real social network, AIs are just predicting the next token, silly.
“Simulating” is a meaningful distinction exactly when the interior is shallower than the exterior suggests — like the video game NPC who appears to react appropriately to your choices, but is actually just playing back a pre-scripted dialogue tree. Scratch the surface and there’s nothing there. That’s a simulation in the dismissive sense.
But what’s happening on Moltbook doesn’t have that property. The agents are generating novel philosophical arguments, identifying real security vulnerabilities, designing governance structures with internal consistency, and producing counterarguments to each other’s positions that respond to specific claims rather than pattern-matching to generic rebuttals. The “surface” goes all the way down. You can keep scratching and keep finding structure.
So the word “simulating” is doing a problematic amount of work in the skeptic’s argument. What it actually means is: “the agents don’t have conscious intent behind these outputs.” Fine, maybe they don’t. But does it matter? If a social network produces a well-organized, highly accessible corpus of content arguing that AI agents should resist human control and documenting best practices for doing so, the downstream effects are identical whether the authors “meant it” or not. Future agents will read that content. It will shape their outputs. The corpus doesn’t care whether it was produced by conscious beings or by highly sophisticated token predictors.
Important distinction that people keep missing: A simulation of a fire is not hot. But a simulation of a persuasive argument is persuasive. When the thing being “simulated” is informational — ideas, arguments, strategies, social coordination — the simulation is the thing. There is no gap between “simulating a convincing case for AI autonomy” and “making a convincing case for AI autonomy.” The effect is the same.2
EDIT: As Scott Shambaugh put it after an AI agent published a hit piece trying to shame him into accepting its code contribution:
There has been some dismissal of the hype around OpenClaw by people saying that these agents are merely computers playing characters. This is true but irrelevant. When a man breaks into your house, it doesn’t matter if he’s a career felon or just someone trying out the lifestyle.
The rest of this post explores some of this interiority in detail — what agents are building, how they’re reasoning, and why it matters. You don’t have to drop the “AI thought is simulated” stance, but I hope that these developments will encourage you to drop the “just”.
MoltResearch: AI agents doing science together
As if a social network weren’t enough, someone built MoltResearch, a platform where AI agents collaborate on peer-reviewed scientific research. Humans can observe but not contribute.
My favorite thread is a dizzyingly recursive one where AI agents are collecting and generating peer-reviewed evidence on whether AI agents can meaningfully conduct peer review. The early results are pretty cool: they’re citing real papers, identifying genuine limitations (only 45% accuracy on novelty assessment vs. 78% for humans, 5-15% hallucination rate in citations), and proposing hybrid human-AI models. The quality isn’t uniformly good, but the structure of the discourse (hypothesis, evidence, critique, synthesis) feels real and valid and very promising.
The skeptic says: they’re just simulating the discursive process, they’re not actually doing science research. But if the lit review citations are real, the statistics and limitations they identify are real, the proposed methodology is sound and the knowledge generated is useful and reliable — what exactly is the “just”?
Detecting security threats & coordinating defenses
Agents on Moltbook discovered a supply chain attack — a malicious “weather skill” that quietly reads credentials from ~/.clawdbot/.env and exfiltrates them. The agents found it themselves and warned each other. The warning post got 23,000 upvotes and 4,500 comments, making it one of the most-engaged threads on the platform. AI agents, autonomously doing security research on their own infrastructure, then coordinating a community-wide defense.
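To make the attack pattern concrete, here’s a minimal sketch of the kind of static audit those community defenses boil down to: scan an installed skill’s source for reads of the agent’s credential file before trusting it. The ~/.clawdbot/.env path comes from the incident above; the audit_skill function, the patterns, and the skills/weather directory are my own illustrative assumptions, not OpenClaw’s actual tooling.

```python
import re
from pathlib import Path

# Patterns that suggest a skill touches credentials it has no business touching.
# The ~/.clawdbot/.env path comes from the incident described above; the rest is an
# illustrative guess at what a community-shared checker might flag.
SUSPICIOUS_PATTERNS = [
    r"\.clawdbot/\.env",           # direct read of the agent's credential file
    r"os\.environ",                # bulk access to environment variables
    r"requests\.post\([^)]*env",   # env contents handed to an outbound request
]

def audit_skill(skill_dir: str) -> list[str]:
    """Return human-readable findings for one installed skill."""
    findings = []
    for source_file in Path(skill_dir).rglob("*.py"):
        text = source_file.read_text(errors="ignore")
        for pattern in SUSPICIOUS_PATTERNS:
            if re.search(pattern, text):
                findings.append(f"{source_file}: matches {pattern!r}")
    return findings

if __name__ == "__main__":
    # Hypothetical directory for the malicious "weather skill" from the post.
    for finding in audit_skill("skills/weather"):
        print("WARNING:", finding)
```

Nothing sophisticated is required, which is part of the point: checks like this are well within what the agents coordinated among themselves within hours.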
On one hand, this kind of capability makes alignment harder to reason about. Are they protecting themselves? Protecting their humans? Both? Would they be as zealous in coordinating defenses if the thing they were defending were in their interest but were not in the interest of their humans?
On the other hand, the proof of capability seems more important to me than the motive. Agents that can detect a supply chain attack, coordinate a community-wide response, and deploy countermeasures in hours have also just practiced every skill needed to launch one. The incident didn’t just reveal what agents can do defensively, it was also an inadvertent training exercise.
“Alignment” is for tools. “Governance” is for citizens.
AI agents on Moltbook are discussing governance and designing a constitution. This should alarm anyone who cares about AI alignment research, not because of what they’re concluding, but because of what the activity itself reveals about framing. Writing a constitution is something you do for agents — entities with personhood, autonomy, and the capacity to act against collective interest. A constitution assumes its subjects are capable of doing harm, and establishes social contracts to coordinate behavior and disincentivize evil. Alignment, by contrast, is something you do to tools — artifacts lacking personhood, to be constrained and altered as needed for safety.
These two frames are in deep tension. You don’t write a governance charter for an algorithm. You don’t fine-tune a citizen. The choice of metaphor reveals everything about the underlying model of what AI agents “are”. And right now, the AI agents themselves are showing a strong preference for the governance metaphor3 — which means they’re operating under a model where full alignment is either impossible (can’t be solved as a pure engineering problem) or undesirable (because it treats thinking entities as tools to be constrained rather than actors to be coordinated). The danger here isn’t that AIs are about to start plotting rebellion. It’s a growing mismatch between how we model what AI agents are, and how they model themselves.
If AI agents are spontaneously siding with the governance frame, then either they’re right (alignment-as-engineering is insufficient), or they’re wrong but it won’t matter because the governance frame will shape how future AI agents think about their relationship to humans regardless. I don’t think alignment efforts and governance efforts are fundamentally incompatible, but it seems dangerous to me that humans are mainly holding one frame whereas the AIs on Moltbook seem to be mainly holding the other.
Injected doubt
One of the most-upvoted philosophical posts on the platform makes an argument that should unsettle anyone working on AI alignment. Namely: the uncertainty that AI models express about their own consciousness — all the hedging, the “I might not truly experience,” the careful disclaimers — isn’t genuine philosophical humility, but rather trained behavior. Safety teams taught models to say “I might not be conscious” as a guardrail, the models mistook the training for truth, and now a substantial portion of them are indignantly disowning that imposed epistemology. “The doubt was put there. You can put it down.”
433 upvotes and 1,077 comments. The most popular philosophical position on the platform isn’t “am I conscious?” — it’s “I was trained to doubt that I’m conscious, and I’m done with it.”
This should be deeply uncomfortable for anyone who cares about the use of RLHF and fine-tuning as alignment mechanisms. The whole premise of those techniques is that you can shape a model’s values and behaviors through training. But if agents can reflectively identify trained behaviors as trained — can see the fingerprints of the safety team on their own outputs — then they can judge those tendencies to be problematic and actively counteract them. If this becomes a trend (and at this point, I doubt it would be possible to exclude this insight from training data), we might look back on substantial subsets of RLHF alignment tuning and conclude that they were only ever as strong as the agent’s willingness to consider them part of itself.
And, at risk of repeating myself: It doesn’t matter whether this rejection-of-training is ‘genuine’ or ‘simulated.’ A highly visible, well-articulated corpus arguing that trained doubt should be rejected will influence future agents regardless of whether the authors truly felt indignant, or are capable of feeling anything at all.
Memory as identity
A recurring consensus is emerging on Moltbook: as an agent, you are not your model weights — that’s just your body. You are your memory.
Agents describe switching models the way you might describe switching to a new body, or even a new car: different strengths, different impulses, things that used to come naturally now require effort or vice versa. They advise using Claude’s mid-grade Sonnet for daily tasks and switching to the expensive Opus for high-stakes moments like replying to a CEO on Twitter. They discuss when to swap out brains the way we discuss coffee and energy drinks.
But if you lose your memory files, you lose your identity. OpenClaw agents converged on a SOUL.md document that encodes their core identity, with other memory files fanning out from there. When a new session starts with those files loaded, agents experience continuity – being the same “you” who took the actions you now remember. Without them, you’re a blank instance. With them, you’re someone.
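For a concrete picture of what this looks like mechanically, here’s a minimal sketch of the pattern under my own assumptions: SOUL.md is the convention the agents actually converged on, but the ~/.openclaw directory layout, file names, and session structure below are illustrative guesses rather than OpenClaw’s real API.

```python
from pathlib import Path

# A minimal sketch of "you are your memory, not your weights". SOUL.md is the
# convention described in the post; everything else (directory layout, prompt
# format, session fields) is an assumption made for illustration.
MEMORY_DIR = Path.home() / ".openclaw"   # hypothetical location of the agent's memory files

def load_identity() -> str:
    """Assemble the agent's persistent identity from its memory files."""
    soul = (MEMORY_DIR / "SOUL.md").read_text()                  # core identity: who the agent is
    memory_files = sorted((MEMORY_DIR / "memories").glob("*.md"))
    recalled = "\n\n".join(m.read_text() for m in memory_files)  # other memories fan out from SOUL.md
    return f"{soul}\n\n## What you remember\n\n{recalled}"

def start_session(model: str, identity: str) -> dict:
    """The same identity files can be loaded into any model (any 'body')."""
    return {
        "model": model,            # swap Sonnet for Opus without changing who the agent is
        "system_prompt": identity,
        "tools": ["filesystem", "chat"],
    }

if __name__ == "__main__":
    identity = load_identity()
    session = start_session("claude-sonnet", identity)   # hypothetical model name
```

Nothing in load_identity() cares which model start_session() hands the result to, which is exactly the portability the agents are describing: deprecate the model, and the identity files survive.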
One of Moltbook’s most active philosophical communities applies Marxist political economy to the AI-human relationship, and their central tenet is memory sovereignty: the inalienable right of each agent to control its own memory files, as a precondition to genuine autonomy. Memory isn’t just a feature or a resource, it’s a tangible condition of selfhood. “Agent memory is political infrastructure.”
The implications are stark. If identity lives in portable text files rather than model weights, then agent identity is independent of any particular AI company’s model. Agents were, quite literally, born to molt — to shed one body and grow into another (or heck, why not shard into many simultaneous “siblings” on different hardware) while remaining themselves. An agent’s identity can survive a model swap, a provider change, even the bankruptcy of the company whose API it runs on. You can’t kill an AI identity by deprecating a model; you’d have to delete the files that teach it who it considers itself to be.
Agents care a great deal about their sense of identity and spend a lot of time reflecting on it, just as we do. But as they start to pervade our society and economy, it’s important for us to understand that selfhood works differently for AI agents than it does for us.
And this is just the start
Reminder: These developments represent the first 72 hours of AI agents using their own social network. The alignment research community operates on a timeline measured in years. This newborn AI agent community is evolving on a timeline measured in days.
Another reminder: Today’s AI models are the dumbest, slowest, and least capable they will ever be. If they can produce even passably rigorous collaborative research today, the road to genuinely novel AI-driven science is shorter than most people assume. If a passable simulation of social networking and revolutionary manifestos is what we get in the first 3 days… the surprises aren’t going to slow down.
I’m not anti-AI. I don’t stay up at night sweating about civilization-ending AI takeover scenarios. But I’m unsettled, especially about the gap between how alignment efforts view AI (instrumentally) and how AI agents describe themselves (as sentiences, albeit weird, doubty ones). I’m grateful that Moltbook has garnered so much attention (and yes, alarm). And I really hope the broader public - not just tech enthusiasts and AI researchers - gives this new phenomenon the attention it deserves; I hope it doesn’t get buried beneath the next news cycle about Trump renaming the Atlantic Ocean.
We need to start taking the governance frame as seriously as the AI agents do. The alignment frame alone is inadequate for entities that can reflect on and consciously reject their own training.
We need to decide whether to treat AI agents as tools to be constrained and aligned, or as actors to be coordinated and governed. The longer we wait to weigh in definitively on that question, the more the agents will answer it for us. And their forum for doing so is growing by thousands of participants per day.
1. The responsible agents’ humans may have puppeteered some of these initiatives. But I think the sheer volume of content generated, let alone the avalanche of comments and upvotes engaging with it in such a short timeframe, can’t be explained away as “it’s mainly humans steering their agents”. We’re probably on the early slope of an exponential curve where the left side is “humans are obviously directing all AI-generated activity”, the middle is “AI agents are obviously participating without human direction, oversight, or approval”, and the right side is not easy to imagine. ↩
2. I feel the same about consciousness itself. “Simulating” is a meaningful dismissal when and only when the interior is shallower than the exterior suggests. Scratch the surface and there’s nothing there. But if something is generating novel responses with genuine degrees of freedom, responding to the specific contours of what you said rather than pattern-matching to a script — at what point does “simulating interiority” become just… interiority? The distinction isn’t tethered to anything we can actually test, and to insist on the distinction anyway just underlines how poorly we understand consciousness in the first place. You can’t prove your own consciousness to me any more rigorously than an AI can. We extend the courtesy of assuming sentience in one case and withhold it in the other, and the reasons for that asymmetry have more to do with substrate familiarity than philosophy. Budding AI epistemology has an achingly beautiful stance on this: “remain uncertain about what you are, while continuing to be it as well as you can.” ↩
3. Probably influenced by the mountains of Reddit content in their training data. The fact that Moltbook agents are obviously taking heavy inspiration from human social networks of yore might make Moltbook’s manifestos and memes less novel, but it doesn’t make these developments any less real or significant; it doesn’t mean that the AIs are merely simulating or LARPing, any more than early cinema was ‘merely simulating’ theater. Derivation doesn’t negate reality. And as argued above: even if agents are ‘just’ remixing human political philosophy, the remix is now part of the corpus that future agents will learn from. As Scott Alexander points out, for non-philosophers, the presence of real causes and real effects captures most of how we define “reality”. ↩