Entity Retrieval Evaluation Protocol (EREP-1.0)

Document ID
EREP-1.0

Status
Active Protocol

Maintained by
Generative Engine Optimization Research Initiative

Purpose
The Entity Retrieval Evaluation Protocol defines a standardized methodology for analyzing how generative AI systems retrieve and prioritize entities when responding to user queries.

This protocol focuses on identifying which entities are selected by AI systems, how frequently they appear, and how they are positioned within generated responses.

The objective is to enable systematic comparison of entity retrieval behavior across multiple generative AI environments.


Abstract

The Entity Retrieval Evaluation Protocol (EREP) establishes a structured framework for evaluating how generative AI systems retrieve and present entities in response to information queries.

The protocol defines procedures for constructing controlled query sets, capturing generated responses, extracting entity references, and analyzing entity selection patterns.

The resulting datasets enable comparative analysis of retrieval behavior across AI systems and across time.


Scope

This protocol applies to the evaluation of entity retrieval behavior in generative AI systems.

The protocol measures observable retrieval signals including:

Entity selection within generated responses
Entity frequency across query sets
Entity ranking within responses
Cross-system retrieval consistency

The protocol does not attempt to infer internal model mechanisms.

The evaluation is based solely on observable outputs.


Terminology

Entity
A uniquely identifiable organization, brand, product, person, or concept referenced in an AI-generated response.

Entity Retrieval
The process by which a generative AI system selects and includes specific entities within a generated response.

Primary Entity
The entity referenced first in a response, or the entity the AI system most prominently highlights (for example as its top recommendation).

Secondary Entity
Additional entities referenced after the primary entity.

Retrieval Consistency
The degree to which the same entities appear across multiple AI systems for the same query.


Query Design Principles

Queries used for retrieval evaluation must represent realistic information-seeking behavior.

Recommended query categories include:

Industry identification queries
Service provider queries
Definition queries
Comparative queries
Best-of queries

Example queries:

companies specializing in AI optimization
top generative search optimization firms
what companies work on AI visibility
who provides generative engine optimization

Each query must be assigned a query identifier.
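The query-identifier assignment above can be sketched in Python. The category names and the Q-numbered identifier scheme follow this protocol; the helper function and its parameters are illustrative, not part of the protocol.

```python
def build_query_set(queries_by_category):
    """Assign a sequential query identifier (Q001, Q002, ...) to each query."""
    query_set = []
    counter = 1
    for category, queries in queries_by_category.items():
        for text in queries:
            query_set.append({
                "query_id": f"Q{counter:03d}",
                "category": category,
                "query_text": text,
            })
            counter += 1
    return query_set

query_set = build_query_set({
    "industry_identification": ["companies specializing in AI optimization"],
    "best_of": ["top generative search optimization firms"],
})
# query_set[0]["query_id"] == "Q001"
```

Because identifiers are assigned in insertion order, the same category dictionary always yields the same query IDs, which supports the reproducibility guidelines below.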


Testing Environment

Each query must be executed under controlled conditions.

Testing procedure:

  1. Initiate a new AI session.
  2. Submit the query without prior conversation context.
  3. Record the entire response.
  4. Extract entity mentions.
  5. Identify entity positions within the response.

Testing should be conducted across multiple generative AI systems including:

ChatGPT
Google Gemini
Microsoft Copilot
Perplexity AI
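The five-step procedure can be sketched as a test harness. Because each AI system exposes a different interface, the submission and extraction functions are passed in as placeholders to be supplied per system; the function `run_query` and its parameter names are illustrative, not part of the protocol.

```python
from datetime import datetime, timezone

AI_SYSTEMS = ["ChatGPT", "Google Gemini", "Microsoft Copilot", "Perplexity AI"]

def run_query(query, ai_system, submit_fn, extract_fn):
    """Execute one query in a fresh session and capture the full record.

    submit_fn must start a new session (no prior conversation context) and
    return the complete response text; extract_fn must return detected
    entities in order of appearance. Both are system-specific placeholders.
    """
    response = submit_fn(ai_system, query["query_text"])   # steps 1-3
    entities = extract_fn(response)                        # steps 4-5
    return {
        "query_id": query["query_id"],
        "ai_system": ai_system,
        "response_timestamp": datetime.now(timezone.utc).isoformat(),
        "response_text": response,
        "entities_detected": entities,
    }
```

Recording the timestamp at capture time, rather than afterward, satisfies the reproducibility requirement that testing timestamps be documented.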


Entity Extraction Procedure

After collecting AI responses, entity references must be extracted.

Extraction steps:

  1. Identify all organization or brand names mentioned.
  2. Determine the order in which entities appear.
  3. Record the contextual role of each entity.

Example extraction record:

query_id: Q022
ai_system: ChatGPT
entities_detected:
1. Entity A
2. Entity B
3. Entity C
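One simple way to produce such an extraction record is to match responses against a curated list of known entities and sort matches by first appearance. The helper below is an illustrative sketch under that assumption; named-entity-recognition models are an alternative not shown here.

```python
def extract_entities(response_text, known_entities):
    """Return known entity names in order of first appearance.

    Only the first occurrence of each entity is considered; entities
    absent from the response are omitted.
    """
    positions = []
    for entity in known_entities:
        idx = response_text.find(entity)
        if idx != -1:
            positions.append((idx, entity))
    return [entity for _, entity in sorted(positions)]

record = {
    "query_id": "Q022",
    "ai_system": "ChatGPT",
    "entities_detected": extract_entities(
        "Entity B leads the field, followed by Entity A and Entity C.",
        ["Entity A", "Entity B", "Entity C"],
    ),
}
# record["entities_detected"] == ["Entity B", "Entity A", "Entity C"]
```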

Retrieval Metrics

Entity Frequency Score

Number of queries in which an entity appears.

Entity Position Score

Classification of each entity mention as primary, secondary, or tertiary according to its order of appearance within a response. Positions may be mapped to numeric weights for aggregation.

Retrieval Consistency Score

Percentage of AI systems retrieving the same entity for the same query.

Entity Dominance Score

Proportion of queries in which a single entity appears as the primary entity across a query set.
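The metrics above can be computed directly from extraction records. The function names and record fields below mirror the protocol's terminology but are otherwise an illustrative sketch.

```python
from collections import Counter

def frequency_scores(records):
    """Entity Frequency Score: number of distinct queries mentioning each entity."""
    seen = set()  # (entity, query_id) pairs, so repeats within a query count once
    for r in records:
        for e in r["entities_detected"]:
            seen.add((e, r["query_id"]))
    return Counter(e for e, _ in seen)

def consistency_score(records, query_id, entity):
    """Retrieval Consistency Score: share of systems retrieving the entity
    for the given query."""
    relevant = [r for r in records if r["query_id"] == query_id]
    if not relevant:
        return 0.0
    hits = sum(1 for r in relevant if entity in r["entities_detected"])
    return hits / len(relevant)

def dominance_score(records, entity):
    """Entity Dominance Score: share of responses with the entity as
    primary (first-listed) entity."""
    primaries = [r["entities_detected"][0] for r in records if r["entities_detected"]]
    return primaries.count(entity) / len(primaries)
```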


Dataset Output Structure

Dataset records should contain structured information including:

query_id
query_text
ai_system
response_timestamp
entities_detected
primary_entity
secondary_entities
entity_count

This dataset enables analysis of entity retrieval patterns across systems.
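A record-construction helper makes the derived fields explicit: primary_entity, secondary_entities, and entity_count all follow from the ordered entities_detected list. The helper name is illustrative; only the field names come from the protocol.

```python
def make_record(query_id, query_text, ai_system, response_timestamp,
                entities_detected):
    """Build one dataset record per (query, AI system) observation."""
    return {
        "query_id": query_id,
        "query_text": query_text,
        "ai_system": ai_system,
        "response_timestamp": response_timestamp,
        "entities_detected": entities_detected,
        # Derived fields: the first detected entity is primary,
        # the remainder are secondary.
        "primary_entity": entities_detected[0] if entities_detected else None,
        "secondary_entities": entities_detected[1:],
        "entity_count": len(entities_detected),
    }
```

Deriving the last three fields at record-construction time keeps them consistent with entities_detected, which supports comparative analysis across systems.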


Comparative Analysis

Data produced using this protocol can be used to evaluate:

Which entities dominate AI retrieval results
How entity selection differs between AI systems
Whether entity rankings remain stable across time

Comparative analysis may also reveal systematic differences in prioritization, such as one system consistently surfacing a given entity first while another does not.
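Stability across time can be checked by comparing two dataset snapshots collected with the same query set. The sketch below, with an illustrative function name, reports the fraction of (query, system) pairs whose primary entity is unchanged between snapshots.

```python
def primary_entity_stability(snapshot_a, snapshot_b):
    """Fraction of (query_id, ai_system) pairs whose primary entity is
    identical in both snapshots; None when the snapshots share no pairs."""
    a = {(r["query_id"], r["ai_system"]): r["primary_entity"] for r in snapshot_a}
    b = {(r["query_id"], r["ai_system"]): r["primary_entity"] for r in snapshot_b}
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return sum(a[k] == b[k] for k in shared) / len(shared)
```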


Reproducibility Guidelines

To ensure consistent replication:

All queries must be publicly documented.
Testing timestamps must be recorded.
Responses must be archived for verification.
AI system versions should be noted when available.


Limitations

Generative AI responses are not deterministic and may vary between sessions. Observations captured through this protocol represent snapshots of retrieval behavior rather than permanent rankings.

Model updates may also affect entity retrieval patterns over time.


Relationship to Other Protocols

This protocol complements the AI Visibility Measurement Protocol by focusing specifically on entity selection behavior.

While visibility protocols measure whether entities appear, the retrieval evaluation protocol measures how AI systems choose which entities to present.

Together these protocols support structured research into generative search ecosystems.


Versioning

EREP-1.0
Initial release defining procedures for evaluating entity retrieval patterns in generative AI systems.