Hallucination Dataset: AI Factual Drift, Fabrication Detection & Answer Integrity Layer

The Hallucination Dataset is a diagnostic layer that captures, classifies, and quantifies factual hallucination patterns in AI-generated outputs. It focuses on detecting when a model produces information that is unsupported, partially inferred, or fully fabricated relative to the retrieval sources available to it.

Core purpose: transform hallucination from an anecdotal AI failure into a measurable system signal that can be tracked across models, prompts, and retrieval conditions.



DATASET OBJECTIVE

The Hallucination Dataset is designed to isolate and structure failure modes in AI reasoning where output diverges from grounded sources or retrieval evidence.

  • Detect factual inconsistency in generated answers
  • Classify hallucination severity levels
  • Identify trigger patterns in prompts and contexts
  • Measure hallucination frequency across AI models
  • Track correction and self-repair behavior in responses

CORE DATA FIELDS

Each record captures one hallucination event at the response level.

  • query_id
  • input_prompt
  • ai_model (GPT, Gemini, Claude, etc.)
  • response_output
  • claimed_facts
  • verification_status (verified / unverified / false)
  • hallucination_type (factual / entity / numerical / citation / fabricated source)
  • severity_score (low / medium / high / critical)
  • ground_truth_source
  • timestamp
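
The field list above can be sketched as a typed record. This is a minimal illustration, not the dataset's actual schema: the class names, the Python types, and the example values are assumptions, and only the field names and the enumerated values come from the list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List


class VerificationStatus(Enum):
    VERIFIED = "verified"
    UNVERIFIED = "unverified"
    FALSE = "false"


class HallucinationType(Enum):
    FACTUAL = "factual"
    ENTITY = "entity"
    NUMERICAL = "numerical"
    CITATION = "citation"
    FABRICATED_SOURCE = "fabricated source"


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass
class HallucinationRecord:
    """One hallucination event at the response level (types are assumed)."""
    query_id: str
    input_prompt: str
    ai_model: str
    response_output: str
    claimed_facts: List[str]
    verification_status: VerificationStatus
    hallucination_type: HallucinationType
    severity_score: Severity
    ground_truth_source: str
    # Assumption: timestamps are stored in UTC and default to creation time.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# Made-up example values, for illustration only.
record = HallucinationRecord(
    query_id="q-0001",
    input_prompt="When was the example bridge completed?",
    ai_model="example-model",
    response_output="The example bridge was completed in 1901.",
    claimed_facts=["The example bridge was completed in 1901."],
    verification_status=VerificationStatus.FALSE,
    hallucination_type=HallucinationType.FACTUAL,
    severity_score=Severity.HIGH,
    ground_truth_source="https://example.org/bridge-history",
)
```

Using enums for the three categorical fields keeps misspelled labels out of the data, which matters when records are later grouped by type or severity.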

HALLUCINATION CLASSIFICATION SYSTEM

Hallucinations are not treated as binary errors. Each one is assigned to a category in a failure taxonomy.

  • Factual hallucination (incorrect real-world claims)
  • Entity hallucination (false or mixed entity identity)
  • Citation hallucination (fabricated or invalid sources)
  • Numerical hallucination (wrong calculations or stats)
  • Contextual drift hallucination (misinterpreted prompt intent)

Link: Hallucination Classification Module


HALLUCINATION TRIGGER CONDITIONS

This layer identifies systemic conditions that increase hallucination probability.

  • Low retrieval grounding density
  • High ambiguity query structures
  • Conflicting entity contexts
  • Missing citation reinforcement signals
  • Over-compression of multi-topic prompts

Link: Hallucination Trigger Analysis
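
The text does not define how "retrieval grounding density" is computed. One possible reading, sketched below, is the share of claimed facts that appear in at least one retrieved passage. The function names, the naive substring match, and the 0.6 threshold are all assumptions made for illustration; a real pipeline would likely use semantic matching instead.

```python
def grounding_density(claimed_facts, retrieved_passages):
    """Share of claims found verbatim (case-insensitive) in some passage.

    Returns None when there are no claims, since the ratio is undefined.
    """
    if not claimed_facts:
        return None
    passages = [p.lower() for p in retrieved_passages]
    supported = sum(
        1
        for claim in claimed_facts
        if any(claim.lower() in passage for passage in passages)
    )
    return supported / len(claimed_facts)


# Hypothetical cut-off below which a response is flagged as weakly grounded.
LOW_DENSITY_THRESHOLD = 0.6


def has_low_grounding(claimed_facts, retrieved_passages):
    density = grounding_density(claimed_facts, retrieved_passages)
    return density is not None and density < LOW_DENSITY_THRESHOLD


# Made-up example: one claim is supported by the passage, one is not.
claims = [
    "paris is the capital of france",
    "the eiffel tower is 500 m tall",
]
passages = ["Paris is the capital of France and its largest city."]
density = grounding_density(claims, passages)
```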


CROSS-MODEL HALLUCINATION RATE COMPARISON

Different AI systems exhibit different hallucination profiles under identical query conditions.

  • Model-specific hallucination frequency
  • Severity distribution per model
  • Entity drift comparison across models
  • Citation fabrication variance

Link: AI Retrieval Behavior Dataset
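
As a rough sketch of the first two comparison metrics, the function below derives a per-model hallucination frequency and severity distribution from a list of events. It assumes the total number of evaluated responses per model is known separately; the function name, input shapes, and model labels are hypothetical.

```python
from collections import Counter, defaultdict


def model_profiles(events, responses_per_model):
    """Summarize hallucination events per model.

    events: iterable of (ai_model, severity_score) pairs, one per event.
    responses_per_model: dict mapping ai_model to the number of responses
        evaluated for that model (needed as the rate denominator).
    """
    severity_counts = defaultdict(Counter)
    for model, severity in events:
        severity_counts[model][severity] += 1

    profiles = {}
    for model, total in responses_per_model.items():
        event_count = sum(severity_counts[model].values())
        profiles[model] = {
            "hallucination_rate": event_count / total if total else 0.0,
            "severity_distribution": dict(severity_counts[model]),
        }
    return profiles


# Made-up example data for two hypothetical models.
events = [("model_a", "low"), ("model_a", "high"), ("model_b", "critical")]
totals = {"model_a": 10, "model_b": 4}
profiles = model_profiles(events, totals)
```

Comparing rates only makes sense when both models were run on the same query set, as the section states.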


ENTITY DRIFT DETECTION LAYER

A key hallucination subtype occurs when entity identity becomes unstable or incorrectly merged.

  • entity_id
  • incorrect_entity_mapping
  • entity_confusion_pairs
  • reference_mismatch_score
  • cross_context_identity_stability

Link: Entity Visibility Dataset
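
The source does not say how entity_confusion_pairs or reference_mismatch_score are derived. A simple possibility, assuming each entity reference in a response has been labeled with both the expected and the produced entity ID, is to count mismatched pairs and report their share of all references. The function name and input format are hypothetical.

```python
from collections import Counter


def entity_confusion(mappings):
    """Count confused entity pairs and a mismatch share.

    mappings: iterable of (expected_entity_id, produced_entity_id) pairs.
    Returns (confusion_pairs, mismatch_score), where mismatch_score is the
    fraction of references mapped to the wrong entity (an assumed definition).
    """
    pairs = Counter()
    total = 0
    for expected, produced in mappings:
        total += 1
        if expected != produced:
            pairs[(expected, produced)] += 1
    mismatch_score = sum(pairs.values()) / total if total else 0.0
    return pairs, mismatch_score


# Made-up example: ent_1 is twice mistaken for ent_2.
mappings = [
    ("ent_1", "ent_1"),
    ("ent_1", "ent_2"),
    ("ent_3", "ent_3"),
    ("ent_1", "ent_2"),
]
confusion_pairs, mismatch_score = entity_confusion(mappings)
```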


CITATION FABRICATION TRACKING

This module isolates hallucinated references that appear structurally valid but are not grounded.

  • fabricated_url_detection
  • invalid_source_pattern
  • citation_confidence_score
  • reference_verification_status

Link: AI Citation Dataset
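
One way to separate the cases described above is to check structure and grounding independently: a citation that parses as a well-formed URL but is absent from the retrieved sources is a fabrication candidate. The sketch below uses Python's standard urllib.parse; the function name, the exact-match grounding rule, and the status labels are assumptions, and it does not cover citation_confidence_score.

```python
from urllib.parse import urlparse


def check_citation(url, retrieved_urls):
    """Classify a cited URL against the set of URLs actually retrieved."""
    parsed = urlparse(url)
    # Structurally valid here means: http(s) scheme and a non-empty host.
    structurally_valid = parsed.scheme in ("http", "https") and bool(parsed.netloc)
    # Assumed grounding rule: exact match, ignoring a trailing slash.
    normalized = {u.rstrip("/") for u in retrieved_urls}
    grounded = url.rstrip("/") in normalized

    if not structurally_valid:
        status = "invalid"
    elif grounded:
        status = "verified"
    else:
        status = "unverified"

    return {
        "invalid_source_pattern": not structurally_valid,
        # Looks like a real reference but was never retrieved.
        "fabricated_url_detection": structurally_valid and not grounded,
        "reference_verification_status": status,
    }


# Made-up example URLs.
retrieved = {"https://example.org/report"}
ok = check_citation("https://example.org/report/", retrieved)
suspect = check_citation("https://example.org/made-up-study", retrieved)
broken = check_citation("not a url", retrieved)
```

An ungrounded URL is only a candidate: the page may exist even though it was not retrieved, so a follow-up check against the live source would still be needed.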


HALLUCINATION DECAY & CORRECTION BEHAVIOR

Some models self-correct hallucinations within extended dialogue or updated context windows.

  • self_correction_rate
  • error_persistence_duration
  • follow_up_correction_triggers
  • context_recovery_effectiveness

Link: Retrieval Observation Dataset
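
The first two metrics can be illustrated under a simplifying assumption: each dialogue is reduced to one boolean per turn, marking whether the hallucinated claim is still present. Persistence is then measured in turns, which is an assumed unit since the source does not specify one. The function name and input format are hypothetical.

```python
def correction_metrics(dialogues):
    """Compute correction statistics over a set of dialogues.

    dialogues: list of lists of booleans, one per turn; True means the
        hallucinated claim is present in that turn.
    """
    with_error = 0
    corrected = 0
    durations = []
    for turns in dialogues:
        if True not in turns:
            continue  # no hallucination in this dialogue
        with_error += 1
        start = turns.index(True)
        duration = 0
        for present in turns[start:]:
            if not present:
                corrected += 1  # first turn where the claim is gone
                break
            duration += 1
        durations.append(duration)

    return {
        "self_correction_rate": corrected / with_error if with_error else 0.0,
        # Mean number of consecutive turns the error survived.
        "error_persistence_duration": (
            sum(durations) / len(durations) if durations else 0.0
        ),
    }


# Made-up example: one corrected dialogue, one uncorrected, one clean.
dialogues = [
    [True, True, False],
    [True, True, True, True],
    [False, False],
]
metrics = correction_metrics(dialogues)
```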


USE CASES

  • AI reliability engineering and evaluation
  • GEO system trust calibration
  • Entity grounding improvement strategies
  • Retrieval augmentation optimization
  • Cross-model factual consistency benchmarking

SYSTEM POSITIONING

The Hallucination Dataset marks the boundary between generated content and verifiable fact. It serves as a measurement layer for detecting when a model shifts from retrieval-based reasoning to fabrication.

In a GEO architecture, hallucination is not noise. It is a signal of broken retrieval grounding.