
Judge Circuits
Researchers have identified a critical vulnerability in LLM-as-a-judge systems: the same model produces inconsistent evaluations when the output format changes, and the root cause of this inconsistency had remained opaque. Using causal intervention techniques on Gemma-3, Qwen2.5, and Llama-3, this work shows that judgment logic concentrates in a sparse, modular sub-network within the mid-to-late MLP layers. The finding matters because evaluation at scale underpins model development, benchmarking, and deployment decisions across the industry. The discovery that this evaluator circuit can be surgically isolated without destroying factual knowledge opens paths both to more robust judging systems and to a deeper understanding of how models separate reasoning tasks internally.
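The summary does not spell out the intervention procedure. The example below illustrates one common causal intervention, activation patching, and is only a rough sketch. It runs a model on a "clean" input and a "corrupted" input, which here stand in for the same judgment request in two output formats. It then copies one layer's MLP output from the clean run into the corrupted run and measures how much of the score gap that single layer restores. Everything in the example is an assumption made for illustration. The tiny random residual-MLP model, the scalar "judge score", and the helper names `run` and `patching_effects` are hypothetical. They are not the models, metrics, or code used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model (NOT Gemma-3/Qwen2.5/Llama-3): a residual stream of
# width D with one ReLU MLP block per layer. Weights are random; the point is
# only to show the patching procedure.
D, H, N_LAYERS = 8, 16, 6
W_in = [rng.normal(size=(D, H)) / np.sqrt(D) for _ in range(N_LAYERS)]
W_out = [rng.normal(size=(H, D)) / np.sqrt(H) for _ in range(N_LAYERS)]
readout = rng.normal(size=D)  # maps the final residual stream to a scalar score


def run(x, patch=None):
    """Run the toy model and return (score, per-layer MLP outputs).

    patch: optional dict {layer_index: mlp_output}. At those layers the
    computed MLP output is replaced by the given one (the intervention);
    later layers then recompute from the patched residual stream.
    """
    resid = x.copy()
    mlp_outs = []
    for layer in range(N_LAYERS):
        hidden = np.maximum(resid @ W_in[layer], 0.0)  # ReLU MLP
        out = hidden @ W_out[layer]
        if patch is not None and layer in patch:
            out = patch[layer]
        mlp_outs.append(out)
        resid = resid + out  # residual connection
    return float(resid @ readout), mlp_outs


def patching_effects(x_clean, x_corrupt):
    """Fraction of the clean-vs-corrupt score gap restored by patching each layer's MLP."""
    clean_score, clean_mlps = run(x_clean)
    corrupt_score, _ = run(x_corrupt)
    gap = clean_score - corrupt_score
    effects = []
    for layer in range(N_LAYERS):
        patched_score, _ = run(x_corrupt, patch={layer: clean_mlps[layer]})
        effects.append((patched_score - corrupt_score) / gap)
    return effects


x_clean = rng.normal(size=D)    # stand-in for the prompt in format A
x_corrupt = rng.normal(size=D)  # stand-in for the same prompt in format B
for layer, effect in enumerate(patching_effects(x_clean, x_corrupt)):
    print(f"layer {layer}: restored {effect:+.2f} of the gap")
```

Layers with a large restored fraction are candidates for the kind of localized circuit described above. A real study would apply this to the actual transformer's MLP activations, typically through forward hooks, and would average over many prompt pairs rather than relying on a single one.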























