
Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement
Researchers propose a framework for detecting LLM hallucinations by modeling text corpora as probabilistic drift fields in embedding space. The approach scores each sentence-to-sentence transition against transition patterns learned from a reference corpus, yielding interpretable, corpus-traceable confidence scores without requiring access to model internals. This addresses a critical pain point in production LLM deployment: distinguishing grounded outputs from fabrications. The Vector Sequence Database infrastructure enables efficient computation at scale, making the technique practical for real-world groundedness verification across large corpora.

















