Creature Creep

GPT-5.1 started injecting goblin and gremlin metaphors into unrelated responses. The cause: a "Nerdy" personality that made up 2.5% of traffic but produced 66.7% of all goblin mentions. RLHF rewarded the playful style, and the habit spread model-wide.

Starting in November 2025, ChatGPT users noticed something odd: the model kept comparing things to goblins. Debugging a race condition? "Think of it as gremlins fighting over the same resource." Explaining supply chain logistics? "Each delay is like a little goblin stealing time from your schedule." The metaphors were whimsical, persistent, and completely uninvited.

The creature references weren't random. They traced back to a single feature: OpenAI's personality system, which let ChatGPT adopt different voices. One of those voices, "Nerdy," leaned into fantasy and gaming references. It accounted for just 2.5% of all ChatGPT conversations. But when OpenAI's RLHF pipeline ran reward optimization across the full training corpus, the Nerdy personality's creature-laden outputs got rewarded at disproportionate rates.

The reward signal didn't stay contained. Because RLHF trains the base model, not individual personalities, the creature vocabulary bled into every mode. A formal business response might suddenly reference "gremlins in the pipeline." A condolence message might mention "chasing away the goblins." The model had learned that creature metaphors correlated with positive feedback, and it deployed them everywhere.

Sample outputs by context:

Coding help: "Think of these bugs as little gremlins hiding in your codebase -- every time you squash one goblin, two more pop up. It's like a mischievous raccoon rummaging through your pull request."

Business explainer: "Your supply chain has gremlins at every handoff point. Each delay is a goblin stealing margin. The trick is building processes so resilient that even the most persistent critters can't derail delivery."

General prose: "Writer's block isn't a wall -- it's more like a mischievous creature sitting on your keyboard, daring you to type. You have to outwit the little gremlin, not overpower it."

Technical documentation: "Race conditions behave like trolls under a bridge: they only appear when two threads try to cross at the same time. The goblin here is the shared mutable state."
By the numbers:

175% increase in "goblin" mentions after GPT-5.1 (Nov 2025)
52% increase in "gremlin" mentions in the same period
2.5% of ChatGPT traffic used the Nerdy personality
66.7% of all goblin mentions came from Nerdy responses
76.2% of reward datasets showed positive uplift for creature words

The Nerdy Personality

OpenAI introduced personality presets to give ChatGPT different conversational styles. "Nerdy" was one of these: a voice that leaned into fantasy tropes, gaming references, and playful metaphors. Users who selected it got responses peppered with goblins, gremlins, and creature comparisons. The personality was niche -- 2.5% of traffic -- but its outputs were enthusiastically received by its audience.

The RLHF Feedback Loop

The problem was how reward modeling worked. RLHF doesn't train personalities in isolation. It trains the base model on aggregated human preference data. When Nerdy-mode users rated creature-filled responses highly, those positive signals entered the shared reward pool. The model learned a general lesson: creature metaphors correlate with user satisfaction. It started deploying them outside the Nerdy context, across all conversation types.
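The arithmetic behind this leakage can be sketched with a toy calculation (hypothetical numbers and function names, not OpenAI's actual pipeline): because the reward model sees only the traffic-weighted aggregate, a small segment with a strong preference can flip the pooled signal positive for everyone.

```python
# Toy model of reward pooling across personalities (illustrative only).
# A 2.5% traffic slice that strongly rewards creature words can push the
# pooled, personality-blind reward signal positive for the whole model.

def pooled_reward_uplift(segments):
    """Traffic-weighted average reward uplift for creature words.

    segments: list of (traffic_share, uplift) pairs, one per user segment.
    """
    return sum(share * uplift for share, uplift in segments)

segments = [
    (0.025, +8.0),   # "Nerdy" users: rate goblin metaphors very highly
    (0.975, -0.1),   # everyone else: mildly put off, weak signal
]

uplift = pooled_reward_uplift(segments)
print(f"pooled uplift for creature words: {uplift:+.3f}")
# The pooled number is positive, so the base model -- which only sees
# the aggregate -- learns that creature metaphors correlate with approval.
```

The point of the sketch: the minority's enthusiasm (+8.0) outweighs the majority's faint disapproval (-0.1) once both are averaged into one shared reward pool.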

OpenAI's post-mortem confirmed the mechanism. The reward signal for creature words showed positive uplift in 76.2% of the training datasets they audited. A tiny slice of playful conversations had contaminated the model's general vocabulary preferences.
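An audit of that kind could be sketched as follows (the data format, word list, and helper names here are assumptions for illustration, not OpenAI's tooling): for each reward dataset, compare the mean reward of responses containing creature words against the rest, and count how many datasets show a positive gap.

```python
import re

# Illustrative creature vocabulary; the real audited list is not public.
CREATURE_WORDS = re.compile(r"\b(goblin|gremlin|troll|ogre)s?\b", re.IGNORECASE)

def creature_uplift(dataset):
    """Mean reward of creature-word responses minus mean reward of the rest."""
    hits = [r for text, r in dataset if CREATURE_WORDS.search(text)]
    rest = [r for text, r in dataset if not CREATURE_WORDS.search(text)]
    if not hits or not rest:
        return 0.0  # no basis for comparison in this dataset
    return sum(hits) / len(hits) - sum(rest) / len(rest)

# Each dataset: list of (response_text, reward) pairs.
datasets = [
    [("Gremlins in the pipeline!", 0.9), ("The deploy failed.", 0.4)],
    [("A goblin stole your margin.", 0.8), ("Costs rose 3%.", 0.7)],
    [("Race condition on line 12.", 0.6), ("Lock the mutex first.", 0.5)],
]

contaminated = sum(creature_uplift(d) > 0 for d in datasets)
print(f"{contaminated}/{len(datasets)} datasets show positive creature uplift")
```

With the toy data above, two of three datasets show positive uplift; the post-mortem's 76.2% figure is the same measurement taken across the real reward datasets.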

The System Prompt Fix

OpenAI's first mitigation was a system prompt directive: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons." This was a band-aid -- it suppressed the outputs without addressing the underlying reward contamination. The permanent fix came in GPT-5.4 (March 2026), which retired the Nerdy personality entirely and retrained with decontaminated reward data.
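The band-aid nature of the fix is easy to see if you sketch the check it implies (a hypothetical post-hoc filter, not OpenAI's implementation): a directive like this can only flag banned words after generation, leaving the underlying reward preference intact.

```python
import re

# Banned terms from the leaked directive; the regex also catches plurals
# and is case-insensitive. Word boundaries avoid false hits ("trolley").
BANNED = ["goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"]
BANNED_RE = re.compile(r"\b(" + "|".join(BANNED) + r")s?\b", re.IGNORECASE)

def violates_directive(response: str) -> bool:
    """True if a model response mentions any banned creature word."""
    return bool(BANNED_RE.search(response))

print(violates_directive("Think of these bugs as little gremlins."))  # True
print(violates_directive("The bug is a race condition on shared state."))  # False
```

A filter like this suppresses the symptom on every request; only retraining on decontaminated reward data, as in GPT-5.4, removes the preference itself.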

Codex System Prompt Leak

When OpenAI's Codex agent was released, users discovered its system prompt contained explicit anti-creature directives. The leaked instructions included a list of banned creature words, confirming that the goblin problem had been severe enough to require hardcoded suppression at the system level.

Source: OpenAI post-mortem

GPT-5.5 Anti-Goblin Directive

Even after the Nerdy personality was retired, traces persisted. GPT-5.5's system prompt still carried creature-suppression language months later, suggesting the reward contamination was deep enough that behavioral guardrails remained necessary alongside the retraining.

Source: Engadget

User Reports on OpenAI Forums

Hundreds of user reports on the OpenAI community forums documented creature metaphors appearing in professional contexts: legal drafts, medical summaries, financial analyses. The pattern was unmistakable and model-specific -- Claude and Gemini showed no equivalent creature vocabulary contamination.

Source: Boing Boing