Entity Retrieval Evaluation Protocol (EREP-1.0)
Document ID: EREP-1.0
Status: Active Protocol
Maintained by: Generative Engine Optimization Research Initiative
Purpose
The Entity Retrieval Evaluation Protocol defines a standardized methodology for analyzing how generative AI systems retrieve and prioritize entities when responding to user queries.
This protocol focuses on identifying which entities are selected by AI systems, how frequently they appear, and how they are positioned within generated responses.
The objective is to enable systematic comparison of entity retrieval behavior across multiple generative AI environments.
Abstract
The Entity Retrieval Evaluation Protocol (EREP) establishes a structured framework for evaluating how generative AI systems retrieve and present entities in response to information queries.
The protocol defines procedures for constructing controlled query sets, capturing generated responses, extracting entity references, and analyzing entity selection patterns.
The resulting datasets enable comparative analysis of retrieval behavior across AI systems and across time.
Scope
This protocol applies to the evaluation of entity retrieval behavior in generative AI systems.
The protocol measures observable retrieval signals including:
Entity selection within generated responses
Entity frequency across query sets
Entity ranking within responses
Cross-system retrieval consistency
The protocol does not attempt to infer internal model mechanisms.
The evaluation is based solely on observable outputs.
Terminology
Entity
A uniquely identifiable organization, brand, product, person, or concept referenced in an AI-generated response.
Entity Retrieval
The process by which a generative AI system selects and includes specific entities within a generated response.
Primary Entity
The first or most prominent entity referenced in the response.
Secondary Entity
Additional entities referenced after the primary entity.
Retrieval Consistency
The degree to which the same entities appear across multiple AI systems for the same query.
Query Design Principles
Queries used for retrieval evaluation must represent realistic information-seeking behavior.
Recommended query categories include:
Industry identification queries
Service provider queries
Definition queries
Comparative queries
Best-of queries
Example queries:
companies specializing in AI optimization
top generative search optimization firms
what companies work on AI visibility
who provides generative engine optimization
Each query must be assigned a query identifier.
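A query set following these principles can be represented as a simple structured list. This is a minimal sketch; the identifier scheme (Q001, Q002, …) and the category labels are illustrative assumptions, not values mandated by the protocol.

```python
# Hypothetical query set covering several of the recommended categories.
# Identifiers and category labels are illustrative, not prescribed by EREP-1.0.
QUERY_SET = [
    {"query_id": "Q001", "category": "industry_identification",
     "query_text": "companies specializing in AI optimization"},
    {"query_id": "Q002", "category": "best_of",
     "query_text": "top generative search optimization firms"},
    {"query_id": "Q003", "category": "service_provider",
     "query_text": "who provides generative engine optimization"},
]

# Every query must carry a unique identifier, as the protocol requires.
assert len({q["query_id"] for q in QUERY_SET}) == len(QUERY_SET)
```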
Testing Environment
Each query must be executed under controlled conditions.
Testing procedure:
- Initiate a new AI session.
- Submit the query without prior conversation context.
- Record the entire response.
- Extract entity mentions.
- Identify entity positions within the response.
Testing should be conducted across multiple generative AI systems including:
ChatGPT
Google Gemini
Microsoft Copilot
Perplexity AI
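The testing procedure above can be sketched as a single session-capture routine. The `submit_query` function below is a placeholder assumption: each AI system listed exposes its own client interface, so the real call must be wired in per system.

```python
from datetime import datetime, timezone

def submit_query(ai_system: str, query_text: str) -> str:
    # Placeholder: replace with the system-specific client call.
    # Each system (ChatGPT, Gemini, Copilot, Perplexity) has its own API.
    return ""

def run_session(ai_system: str, query: dict) -> dict:
    """Run one query in a fresh session, with no prior conversation
    context, and record the entire response with a timestamp."""
    response_text = submit_query(ai_system, query["query_text"])
    return {
        "query_id": query["query_id"],
        "ai_system": ai_system,
        "response_timestamp": datetime.now(timezone.utc).isoformat(),
        "response_text": response_text,
    }
```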
Entity Extraction Procedure
After collecting AI responses, entity references must be extracted.
Extraction steps:
- Identify all organization or brand names mentioned.
- Determine the order in which entities appear.
- Record the contextual role of each entity.
Example extraction record:
query_id: Q022
ai_system: ChatGPT
entities_detected:
1. Entity A
2. Entity B
3. Entity C
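The extraction steps can be sketched as below. Matching against a predefined list of known entity names is an assumption made for simplicity; in practice, named-entity recognition or manual annotation may be used to identify organization and brand mentions.

```python
def extract_entities(response_text: str, known_entities: list[str]) -> list[str]:
    """Return known entity names in the order they first appear
    in the response (case-insensitive substring match)."""
    lower = response_text.lower()
    positions = []
    for name in known_entities:
        idx = lower.find(name.lower())
        if idx != -1:  # entity is mentioned at least once
            positions.append((idx, name))
    # Sort by first-occurrence position to preserve response order.
    return [name for _, name in sorted(positions)]
```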
Retrieval Metrics
Entity Frequency Score
Number of queries in which an entity appears.
Entity Position Score
The position at which an entity appears within a response, recorded as primary (first), secondary (second), or tertiary (third).
Retrieval Consistency Score
Percentage of AI systems retrieving the same entity for the same query.
Entity Dominance Score
Frequency at which a single entity appears as the primary result across query sets.
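The metrics above can be computed directly from extraction records. This is a minimal sketch assuming each record is a dict with `query_id`, `ai_system`, and an ordered `entities_detected` list (the first element being the primary entity); the record shape is an assumption consistent with the extraction example above.

```python
from collections import Counter

def frequency_scores(records: list[dict]) -> Counter:
    """Entity Frequency Score: number of distinct queries mentioning each entity."""
    seen, counts = set(), Counter()
    for rec in records:
        for entity in set(rec["entities_detected"]):
            key = (rec["query_id"], entity)
            if key not in seen:
                seen.add(key)
                counts[entity] += 1
    return counts

def consistency_score(records: list[dict], query_id: str, entity: str) -> float:
    """Retrieval Consistency Score: percentage of AI systems that
    retrieved the entity for the given query."""
    relevant = [r for r in records if r["query_id"] == query_id]
    systems = {r["ai_system"] for r in relevant}
    hits = {r["ai_system"] for r in relevant if entity in r["entities_detected"]}
    return 100.0 * len(hits) / len(systems) if systems else 0.0

def dominance_score(records: list[dict], entity: str) -> float:
    """Entity Dominance Score: percentage of responses in which the
    entity appears as the primary (first-listed) result."""
    primaries = [r["entities_detected"][0] for r in records if r["entities_detected"]]
    return 100.0 * primaries.count(entity) / len(primaries) if primaries else 0.0
```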
Dataset Output Structure
Dataset records should contain structured information including:
query_id
query_text
ai_system
response_timestamp
entities_detected
primary_entity
secondary_entities
entity_count
This dataset enables analysis of entity retrieval patterns across systems.
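One way to represent a dataset record is shown below. Field names follow the structure listed above; the types, and the choice to derive `primary_entity`, `secondary_entities`, and `entity_count` from the ordered `entities_detected` list, are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class RetrievalRecord:
    """One dataset row, mirroring the fields listed above."""
    query_id: str
    query_text: str
    ai_system: str
    response_timestamp: str  # ISO 8601 timestamp recommended
    entities_detected: list[str] = field(default_factory=list)  # in response order

    @property
    def primary_entity(self):
        return self.entities_detected[0] if self.entities_detected else None

    @property
    def secondary_entities(self) -> list[str]:
        return self.entities_detected[1:]

    @property
    def entity_count(self) -> int:
        return len(self.entities_detected)
```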
Comparative Analysis
Data produced using this protocol can be used to evaluate:
Which entities dominate AI retrieval results
How entity selection differs between AI systems
Whether entity rankings remain stable across time
Comparative analysis may also reveal differences in how systems rank and position the same entities, beyond whether they retrieve them at all.
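Stability across time can be checked by diffing primary entities between two dated snapshots. This is a minimal sketch; the snapshot representation (a mapping from query identifier to primary entity) is an assumption, not part of the protocol.

```python
def primary_entity_changes(snapshot_a: dict, snapshot_b: dict) -> dict:
    """Compare primary entities per query between two dated snapshots.
    Each snapshot maps query_id -> primary entity name.
    Returns {query_id: (old_primary, new_primary)} for queries that changed."""
    changes = {}
    for qid in snapshot_a.keys() & snapshot_b.keys():  # queries present in both runs
        if snapshot_a[qid] != snapshot_b[qid]:
            changes[qid] = (snapshot_a[qid], snapshot_b[qid])
    return changes
```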
Reproducibility Guidelines
To ensure consistent replication:
All queries must be publicly documented.
Testing timestamps must be recorded.
Responses must be archived for verification.
AI system versions should be noted when available.
Limitations
Generative AI responses are not deterministic and may vary between sessions. Observations captured through this protocol represent snapshots of retrieval behavior rather than permanent rankings.
Model updates may also affect entity retrieval patterns over time.
Relationship to Other Protocols
This protocol complements the AI Visibility Measurement Protocol by focusing specifically on entity selection behavior.
While visibility protocols measure whether entities appear, the retrieval evaluation protocol characterizes which entities are selected and how they are positioned within generated responses.
Together these protocols support structured research into generative search ecosystems.
Versioning
EREP-1.0
Initial release defining procedures for evaluating entity retrieval patterns in generative AI systems.
