How to Connect AI to a 28-Year-Old Game — The Technical Challenges of Anima

In the previous post, I talked about the motivation and direction behind the Anima project. This time I want to get into the technical side — what problems actually need to be solved to make an AI agent live in a 28-year-old MMORPG.

The short answer: the hard part wasn't the AI. It was the interface with the game.

Four Layers

An Anima agent's internal structure is divided into four layers, from bottom to top.

The Connection layer communicates with the UO server over TCP. Ultima Online uses its own binary protocol — big-endian, variable-length packets, and Huffman-compressed data in the server-to-client direction.

Even just logging in involves six steps — seed transmission, account authentication, server selection, redirect, game login, character selection. This layer's job is to parse all that binary data and transform it into events the layers above can understand.
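As a sketch of what that parsing involves, here is a minimal framer that splits a raw byte stream into packets. The packet IDs and the fixed-length table are illustrative assumptions for this example, not Anima's actual tables:

```python
import struct

# Sketch of variable-length packet framing, assuming the common UO convention:
# a 1-byte packet ID, then (for variable-length packets) a big-endian 2-byte
# total length. The entries in FIXED_LENGTHS are illustrative.
FIXED_LENGTHS = {0x06: 5, 0x6C: 19}  # packet ID -> total size in bytes

def split_packets(buf: bytes) -> tuple[list[tuple[int, bytes]], bytes]:
    """Split a byte stream into (packet_id, frame) pairs; return leftover bytes."""
    packets, i = [], 0
    while i < len(buf):
        pid = buf[i]
        if pid in FIXED_LENGTHS:
            size = FIXED_LENGTHS[pid]
        else:
            if i + 3 > len(buf):
                break  # length header not fully received yet
            (size,) = struct.unpack(">H", buf[i + 1 : i + 3])  # big-endian length
        if i + size > len(buf):
            break  # wait for more bytes from the socket
        packets.append((pid, buf[i : i + size]))
        i += size
    return packets, buf[i:]
```

Whatever survives framing gets decoded into typed events; the leftover bytes are prepended to the next socket read.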

The Perception layer interprets the 50+ packet types the server sends and maintains the world model. Nearby monsters and NPCs, items on the ground, my HP and mana, skill values, equipment status, inventory, journal messages — all tracked in real time. This is the agent's "eyes that see the world."

The Brain layer uses perceptual information to decide what to do next. Currently it uses a priority-based planner. I'll cover the structure in detail below.

The Action layer executes decisions as actual in-game actions. It navigates using A* pathfinding, assembles packets to use items, and responds to Gumps (in-game UI menus).

Each layer is independent. The Connection layer has no interest in decisions; the Brain layer knows nothing about packet formats. This separation means each layer can be tested independently, and replacing the Brain wholesale later won't affect the other layers.

Five Packets for One Swing of a Pickaxe

Here's what mining ore looks like at the packet level.

  1. The client sends a 0x06 DoubleClick packet. The pickaxe's 4-byte serial number is packed in.
  2. The server enters target cursor mode.
  3. The client sends a 0x6C TargetResponse packet with the rock tile's coordinates (x, y, z).
  4. The server plays the mining animation. Wait 3 seconds.
  5. The server sends a 0xAE UnicodeSpeech packet with the result. "You dig some iron ore" or "There is no metal here to mine."
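Steps 1 and 3 above reduce to a few struct.pack calls. The 0x6C field layout below follows my reading of common open-source server implementations and is a sketch; it may not match the wire format byte for byte:

```python
import struct

def double_click(serial: int) -> bytes:
    # 0x06 DoubleClick: packet ID + 4-byte big-endian object serial (5 bytes total)
    return struct.pack(">BI", 0x06, serial)

def target_ground(x: int, y: int, z: int, cursor_id: int = 0) -> bytes:
    # 0x6C TargetResponse, ground-target form. Field order here (type, cursor id,
    # flags, serial, x, y, z, graphic) is a simplified sketch of the 19-byte packet.
    return struct.pack(">BBIBIHHhH", 0x6C, 1, cursor_id, 0, 0, x, y, z, 0)
```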

Smelting is different again. Double-click the ore, designate the forge tile with the target cursor, wait 2 seconds. Crafting opens a 0xB0 OpenGump packet when you double-click the tool; you parse the layout to calculate the button ID, then respond with 0xB1 GumpResponse. The button ID formula is 1 + type + (index * 7) — type is the crafting category, index is the item's position within that category. I reverse-engineered this by tracing the internals of the ServUO server.
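The button ID formula is trivial to encode once you know it, and impossible to guess without the reverse-engineering. A hypothetical helper:

```python
def craft_button_id(category: int, index: int) -> int:
    # Button ID formula recovered from ServUO's craft gump:
    # 1 + type + (index * 7), where `category` is the crafting category
    # and `index` is the item's position within that category.
    return 1 + category + index * 7
```

For example, the third item (index 2) in category 2 maps to button 17.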

Throughout all of this, timing is the critical factor. The UO server's tick rate is around 250ms. Send a packet too fast and the server is still processing the previous request when the next one arrives. Too slow and the agent just stands there.

The existing UO bot frameworks have simple solutions. Razor Enhanced forces a 350–650ms delay between packets. ClassicAssist holds a global lock that blocks the next packet until the server responds. Anima currently uses asyncio.sleep(0.3), and this is a source of problems.

The Problem of Server and Client Seeing Different Worlds

The thorniest problem in Anima's development has been state synchronization.

The agent's world model and the server's actual state fall out of step. An item disappears on the server while the client still thinks it exists. An NPC has moved, but the agent walks to the old position. The agent thinks it has 50 ore in the backpack when there are really 30, and tries to craft.

Here's a concrete scenario:

t=0ms:   Agent double-clicks ore to request smelting
t=150ms: Agent double-clicks the next ore (before the server has responded)
t=200ms: Server finishes processing the first request; the ore is converted to ingots
t=400ms: Server receives the second request — but that ore is already gone
         → Smelting fails. Agent is in a "why isn't this working?" state

When the packet interval is shorter than a server tick (roughly 250ms), this happens. The world the client sees and the world the server has diverge on the scale of hundreds of milliseconds. This isn't a bug. It's an intrinsic problem of networked games.

The fix is a combination of three things. First, raise the minimum packet interval to 0.4 seconds — two server ticks' worth of breathing room. Second, switch Gump responses from a fixed sleep to event-driven — wait for the server to actually open the Gump before responding, with a timeout. Third, force an inventory resync every 5 minutes by reopening the backpack. This is a time-tested method that EasyUO bots have used for decades.
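The second fix, the event-driven Gump wait, might look like this in outline, assuming an event bus with subscribe/unsubscribe like the one used elsewhere in Anima. The function name and topic string are illustrative:

```python
import asyncio

async def wait_for_gump(ctx, timeout: float = 3.0):
    """Wait until the server actually opens a Gump, or give up after `timeout`."""
    opened = asyncio.Event()
    gump = {}

    def _on_gump(_topic, data):
        gump.update(data)
        opened.set()

    sub = ctx.bus.subscribe("gump.opened", _on_gump)
    try:
        await asyncio.wait_for(opened.wait(), timeout)
        return gump      # safe to send the 0xB1 GumpResponse now
    except asyncio.TimeoutError:
        return None      # server never opened the Gump; caller retries or aborts
    finally:
        ctx.bus.unsubscribe(sub)
```

Compared with a fixed sleep, this responds the moment the Gump arrives and fails loudly when it doesn't.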

The 90-8-2 Rule of Decision-Making

What happens if you delegate every decision to an LLM? The answer is simple. Too slow to use.

A single agent makes around 100 decisions per hour. Fifty agents means 5,000 decisions an hour. Routing every decision through an LLM would require 1.4 LLM calls per second — and even a 3B model takes at least 100ms to respond. The agent freezes waiting for a decision, and in-game it looks like an NPC just standing there blankly.
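The arithmetic is worth spelling out:

```python
decisions_per_hour = 50 * 100              # 50 agents, ~100 decisions each per hour
calls_per_second = decisions_per_hour / 3600  # ~1.39 LLM calls every second
```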

Anima uses the 90-8-2 rule.

90% is rule-based. If HP drops below 30%, flee or drink a potion. If weight exceeds 85%, go smelt. Prioritize smelting when ore is present. Make tools if there are none. Decisions like these take 0ms and cost nothing. In code, it looks like this:

async def tick(ctx):
    # Priority 1: Survival
    if ctx.self_state.hits < ctx.self_state.hits_max * 0.3:
        return flee_or_heal()

    # Priority 2: Weight management
    if get_weight(ctx) > 0.85:
        return smelt_ore()

    # Priorities 3–7: Procedure execution
    for proc in [craft, sell, bank, mine]:
        if proc.can_start(ctx):
            return await proc.run(ctx)

8% is handled by a small local LLM. A lightweight 3–4B parameter model (gemma3:4b, etc.) running on Ollama. Response time around 100ms, API cost zero. Simple judgments like "should I pick up this item?" or "a monster appeared — should I run?"

Only 2% goes to a large LLM. An 8–12B model, 1–3 seconds to respond. Rare decisions like multi-turn conversations, a decision to join a guild, or a major long-term strategy change.

Why this split? Most decisions inside the game are repetitive. Mining, smelting, crafting, selling. The loop almost never needs an LLM. Where LLMs shine is in exceptional situations — unexpected conversations, conflict scenarios, strategic decisions. By cutting 90% of the cost, the interesting behavior emerges in that 8% and 2%.
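In sketch form, the routing reduces to a tiered classifier. The tier names and decision sets below are illustrative stand-ins, not Anima's actual tables:

```python
from enum import Enum

class Tier(Enum):
    RULES = "rules"        # ~90%: 0 ms, deterministic, free
    SMALL_LLM = "small"    # ~8%: local 3-4B model, ~100 ms
    LARGE_LLM = "large"    # ~2%: 8-12B model, 1-3 s

# Hypothetical decision categories for illustration
ROUTINE = {"mine", "smelt", "craft", "sell", "flee_low_hp"}
SIMPLE_JUDGMENT = {"pickup_item", "evade_monster"}

def route(decision: str) -> Tier:
    if decision in ROUTINE:
        return Tier.RULES
    if decision in SIMPLE_JUDGMENT:
        return Tier.SMALL_LLM
    # conversations, guild decisions, long-term strategy changes
    return Tier.LARGE_LLM
```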

Why Rules Instead of Reinforcement Learning?

When you hear "AI agent," reinforcement learning (RL) often comes to mind first. Why not use Q-learning to learn optimal behavior?

There are practical reasons for choosing rules first.

Learning speed. Q-learning needs thousands of state transitions to converge. One mining action in UO takes 3 seconds. Learning just "how to mine ore" could take weeks. Rules work the moment you write them.

Reward signal. RL needs a clear reward signal. In UO, rewards are delayed and sparse. Mine ore → smelt it → craft something → sell it — only then does gold arrive as a reward. Over a process that takes hours, it's hard to attribute which actions contributed to that reward.

Exploration safety. Q-learning does random exploration early on — clicking anywhere, walking in strange directions, repeating meaningless actions. Ten agents doing random exploration simultaneously puts load on the server. Rule-based systems execute only valid actions, always.

Interpretability. When an agent behaves strangely, trying to find the cause by looking at a Q-table is nearly impossible. If the answer to "why did it go to the tavern instead of the mine?" lies in a subtle difference in Q-values, debugging is a nightmare. Rules are if-then statements; the cause is explicit.

In practice, RL is less common in game AI than you might think. RimWorld runs pawn AI with an XML-based ThinkTree. Dwarf Fortress uses a combination of GOAP and behavior trees. There are good reasons why rule-based approaches have been the validated standard for decades.

That said, RL isn't abandoned. The Phase 3 roadmap plans to introduce Q-learning based on weeks of log data accumulated by the rule-based system. By then, the state space will be well-understood, reward signals will be defined, and offline training followed by safe deployment will be possible.

The Procedure as the Unit of Work

Enough design philosophy. Back to the concrete implementation.

The unit of behavior in Anima is a Procedure. Things like mine_ore, smelt_ore, craft_blacksmith, sell_to_vendor, bank_deposit. Each procedure is a state machine with a start condition, an execution path, and hints for what to do next after success or failure.

When a procedure finishes, it returns a ProcedureResult with a next_suggestion field. If mining succeeds, it suggests "mine_ore" — itself — again. The meaning: "there's still ore in this vein, keep going." The planner prioritizes this hint on the next tick.

from dataclasses import dataclass

@dataclass
class ProcedureResult:
    success: bool
    reason: FailureReason | None = None   # why it failed (BLOCKED, MISSING_RESOURCE, ...)
    message: str = ""
    next_suggestion: str | None = None    # hint the planner prioritizes next tick

Then how does the mine → smelt → craft → sell workflow transition happen? That's handled not by next_suggestion but by the planner's priority rules. Keep mining and eventually weight exceeds 85%. At that moment, priority rule 2, "weight management," fires and the smelting procedure starts. After smelting, there are ingots, so the crafting condition is met. After crafting, the selling condition is met. Procedures manage their own continuity; the planner manages state-driven transitions. A clean division of responsibility.

Failure handling also lives in this structure. If smelting is attempted but the agent isn't near a forge, it returns FailureReason.BLOCKED and suggests "find_forge." If a tool breaks, MISSING_RESOURCE and "make_tools." The procedure diagnoses its own failure and proposes a recovery path.
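Putting the two mechanisms together, a planner tick that honors next_suggestion before falling back to the priority scan could look roughly like this. The procedure registry and API are simplified for illustration:

```python
async def planner_tick(ctx, procedures: dict, last_result=None):
    """One planner tick: suggestion first, then the normal priority scan."""
    # A suggestion from the previous tick gets first shot, if it can start
    if last_result and last_result.next_suggestion:
        proc = procedures.get(last_result.next_suggestion)
        if proc and proc.can_start(ctx):
            return await proc.run(ctx)
    # Otherwise fall back to the priority-ordered scan
    for proc in procedures.values():
        if proc.can_start(ctx):
            return await proc.run(ctx)
    return None  # nothing runnable this tick
```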

Event Bus and Subscriptions

How an agent knows the outcome of its actions is also an interesting problem.

After the agent starts mining and waits 3 seconds, the server reports the result in an unusual way. There's no dedicated "mining result" packet. Instead, a 0xAE UnicodeSpeech packet — a chat message — delivers "You dig some iron ore" or "There is no metal here to mine."

Anima handles this with an event bus subscription pattern.

_mine_flags = {"depleted": False, "los_fail": False}
 
def _check_speech(_topic, data):
    text = data.get("text", "")
    if "no metal here" in text.lower():
        _mine_flags["depleted"] = True
 
sub = ctx.bus.subscribe("avatar.speech_heard", _check_speech)
await asyncio.sleep(3.0)  # Wait for mining animation
ctx.bus.unsubscribe(sub)

When the mining procedure starts, it subscribes to speech events and monitors server messages while waiting 3 seconds. If "there is no metal here" arrives, the tile is marked as depleted and the agent moves to the next one. More reactive than polling, and avoids unnecessary inventory scans.

Getting to Phase 2

Phase 1 — the basic survival loop — is nearly complete. When a single agent can repeat the mine-smelt-craft-sell cycle for 4 or more hours without human intervention, Phase 1 is done. Next is Phase 2: rule documentation.

The current problem is that all behavior rules are hardcoded. Rules like "smelt when weight hits 85%" are baked into Python code. To teach Grimm the miner a new behavior, code has to change. Adding Bjorn the woodcutter requires more code changes. This doesn't scale.

Phase 2's goal is a structure where new behaviors can be added without touching code. There are specific things needed for that.

Externalizing the rules engine. The hardcoded priority rules need to move into YAML documents. "If weight exceeds 85%, go smelt" written in config, not code. For condition expression evaluation, a lightweight expression engine like CEL (Common Expression Language) is planned. This way, a new profession — woodcutter, fisherman, tailor — can be defined with a single YAML profile.

Linking personas to rules. Currently the 8 personas are defined in YAML, but this is still just personality configuration. In Phase 2, a persona should be the behavior rules themselves. Grimm the miner's persona file specifies an activity ratio of "mining 60%, crafting 20%, selling 10%, banking 10%" — the actual planner needs to read and follow that. Skill locks, gear preferences, movement radius should all come from the persona.

Automatically expanding location knowledge. Right now, "Minoc mine is here, Britain forge is there" is hardcoded. In Phase 2, agents should automatically record places they discover as they roam, and share that information with other agents. Information like "there's a good ore vein over that way" should propagate between agents.

Learning from failure patterns. Weeks of action logs from Phase 1 will exist — what situations caused procedure failures, which locations were efficient. A system that analyzes this data and automatically adjusts rules is needed. Not by directly editing code, but by tuning the parameters in rule documents.

When Phase 2 is complete, someone who isn't an engineer will be able to design new agent behaviors using only documents. Code becomes the engine; content becomes documents. Phase 3 — autonomous agents combining reinforcement learning and LLMs — only becomes possible on top of this foundation.

Technically complex, but what I want in the end is simple. AI living naturally inside a 28-year-old game world. Packets, timing, and state synchronization — those are just the plumbing for that goal.