Nearly every production AI prompt today runs through two separate LLM inference passes: one to think, one to police. Meta's Llama Guard (a 7–8B-parameter model), Anthropic's Constitutional Classifiers, OpenAI's Moderation API, and Google's ShieldGemma each add a complete additional inference pass.
Our estimate: +40–100% compute overhead per query, derived from the cost of running a guard model in series with the main model, which scales with the guard-to-main parameter ratio.
At 2.5B ChatGPT prompts/day alone, this is already a
civilization-scale energy problem.
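The 40–100% range falls out of a back-of-envelope calculation: transformer inference costs roughly 2 × (active parameters) FLOPs per token, so a serial guard pass adds compute roughly in proportion to the guard-to-main parameter ratio. The model sizes below are illustrative assumptions for the sketch, not measured figures:

```python
# Back-of-envelope guard-model overhead, assuming inference FLOPs
# scale linearly with active parameter count (~2 * params per token).
# Model sizes are hypothetical examples, not vendor-reported numbers.

def guard_overhead(main_params_b: float, guard_params_b: float,
                   guard_passes: int = 1) -> float:
    """Fraction of extra compute from running a guard model
    guard_passes times per query alongside the main model."""
    return guard_passes * guard_params_b / main_params_b

# An 8B guard screening an 8B chat model: 100% overhead.
print(f"{guard_overhead(8, 8):.0%}")    # 100%

# The same 8B guard in front of a 20B main model: 40% overhead.
print(f"{guard_overhead(20, 8):.0%}")   # 40%
```

Larger main models dilute the guard's relative cost, which is why the estimate spans a range rather than a single figure.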