AI Citation Analysis Protocol (ACAP-1.0)

Document ID
ACAP-1.0

Status
Active Protocol

Maintained by
Generative Engine Optimization Research Initiative

Purpose
The AI Citation Analysis Protocol defines a standardized methodology for analyzing how generative AI systems reference, cite, and attribute external sources within generated responses.

The protocol establishes procedures for identifying citation sources, measuring citation frequency, and evaluating citation patterns across generative AI systems.


Abstract

The AI Citation Analysis Protocol (ACAP) provides a framework for examining citation behavior in generative AI responses. The protocol defines procedures for capturing AI-generated answers, extracting referenced sources, and analyzing citation relationships between AI systems and external information sources.

The objective is to enable systematic observation of how generative AI systems reference web domains, research sources, and institutional entities when generating responses.


Scope

This protocol applies to the evaluation of citation patterns in generative AI systems.

The protocol focuses on observable citation signals including:

Referenced domains
Source attribution patterns
Citation frequency across responses
Cross-system citation consistency

The protocol does not attempt to infer model training data or internal retrieval mechanisms.

Observations are based solely on visible citations within generated responses.


Terminology

Citation
A reference made by an AI system to an external information source.

Citation Source
The domain, publication, or document referenced within the generated response.

Citation Frequency
The number of times a source appears across responses in a defined query set.

Citation Diversity
The number of distinct sources referenced across responses in a defined query set.

Citation Dominance
The degree to which a single source appears disproportionately across responses.


Query Set Construction

Citation analysis queries should represent information-seeking questions where AI systems typically reference external sources.

Recommended query categories include:

Research queries
Industry explanation queries
Comparative analysis queries
Technical definition queries

Example queries:

what is generative engine optimization
how AI visibility optimization works
research on generative search behavior
AI visibility measurement methodology

Each query must be assigned a query identifier.
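Identifier assignment can be sketched as a simple mapping from ID to query text. The Q-prefixed, zero-padded format below is an assumption modeled on the extraction record example later in this protocol:

```python
# Assign sequential query identifiers to the example query set.
# The "Q###" format is illustrative; any stable identifier scheme works.
queries = [
    "what is generative engine optimization",
    "how AI visibility optimization works",
    "research on generative search behavior",
    "AI visibility measurement methodology",
]

query_set = {f"Q{i:03d}": text for i, text in enumerate(queries, start=1)}

for qid, text in query_set.items():
    print(qid, text)
```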


Testing Procedure

Citation analysis requires capturing complete AI responses, including all references and links.

Testing procedure:

  1. Start a new session in the AI system.
  2. Submit the query without additional context.
  3. Record the complete response.
  4. Extract all cited domains or referenced sources.
  5. Store citation data in the dataset.

Testing should be performed across multiple generative AI systems including:

ChatGPT
Google Gemini
Microsoft Copilot
Perplexity AI
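A captured response (steps 1-3 of the procedure) can be stored as a minimal record like the following sketch. The `ResponseRecord` type and its field names are an assumption; the protocol does not prescribe a storage format.

```python
# Minimal record for one captured AI response.
# Field names are illustrative, not mandated by the protocol.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ResponseRecord:
    query_id: str
    ai_system: str
    response_text: str
    # Timestamp recorded at capture time, per the reproducibility guidelines.
    response_timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


record = ResponseRecord(
    query_id="Q031",
    ai_system="Perplexity",
    response_text="Generative engine optimization is ... [1] exampledomain.com",
)
print(record.query_id, record.ai_system)
```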


Citation Extraction Procedure

After collecting AI responses, citation references must be extracted.

Extraction steps:

  1. Identify all domains referenced within the response.
  2. Record the order in which sources appear.
  3. Identify whether the citation supports a specific claim.
  4. Store the citation record in structured format.

Example extraction record:

query_id: Q031
ai_system: Perplexity
citation_sources:
- exampledomain.com
- researchsite.org
- institution.edu
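Step 1 of the extraction can be approximated with a pattern match over the response text. The regular expression and TLD list below are a heuristic assumption: real citation markup varies by AI system, so production extraction would need per-system parsers.

```python
import re

# Heuristic pattern for bare domains with a few common TLDs.
DOMAIN_RE = re.compile(
    r"\b([a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|org|edu|net|gov))\b", re.I
)


def extract_citation_sources(response_text: str) -> list[str]:
    """Return cited domains in order of first appearance, deduplicated."""
    seen: list[str] = []
    for match in DOMAIN_RE.findall(response_text):
        domain = match.lower()
        if domain not in seen:
            seen.append(domain)
    return seen


response = (
    "According to exampledomain.com, ... A study hosted at researchsite.org "
    "and archived by institution.edu supports this. See exampledomain.com."
)
print(extract_citation_sources(response))
# → ['exampledomain.com', 'researchsite.org', 'institution.edu']
```

Recording order of first appearance (step 2) falls out of the deduplication, since each domain is appended only once.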

Citation Metrics

Citation Frequency Score

The number of times a domain appears across all query responses.

Citation Diversity Score

The total number of unique sources cited across the dataset.

Citation Dominance Score

The proportion of citations attributed to a single domain relative to all citations.

Cross-System Citation Consistency

The percentage of AI systems citing the same source for the same query.
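The four metrics above can be computed directly from a set of citation records. The record layout and data below are illustrative:

```python
from collections import Counter

# Citation records: (query_id, ai_system, cited domains). Illustrative data.
records = [
    ("Q031", "Perplexity", ["exampledomain.com", "researchsite.org"]),
    ("Q031", "ChatGPT",    ["exampledomain.com", "institution.edu"]),
    ("Q032", "Perplexity", ["exampledomain.com"]),
    ("Q032", "ChatGPT",    ["researchsite.org"]),
]

all_citations = [d for _, _, domains in records for d in domains]

# Citation Frequency Score: appearances of a domain across all responses.
frequency = Counter(all_citations)

# Citation Diversity Score: unique sources across the dataset.
diversity = len(set(all_citations))

# Citation Dominance Score: share of all citations held by each domain.
dominance = {d: n / len(all_citations) for d, n in frequency.items()}


def cross_system_consistency(query_id: str, source: str) -> float:
    """Share of AI systems that cited `source` for `query_id`."""
    systems = {sys for qid, sys, _ in records if qid == query_id}
    citing = {sys for qid, sys, domains in records
              if qid == query_id and source in domains}
    return len(citing) / len(systems)


print(frequency["exampledomain.com"])                          # → 3
print(diversity)                                               # → 3
print(dominance["exampledomain.com"])                          # → 0.5
print(cross_system_consistency("Q031", "exampledomain.com"))   # → 1.0
```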


Dataset Output Structure

Citation datasets generated using this protocol should include structured fields such as:

query_id
query_text
ai_system
response_timestamp
citation_sources
citation_count
primary_citation

This structured dataset enables comparative analysis of citation patterns across AI systems.
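A single dataset row using these fields might look like the following sketch (all values are illustrative, and JSON is only one possible serialization):

```python
import json

# One dataset row with the structured fields listed above.
row = {
    "query_id": "Q031",
    "query_text": "what is generative engine optimization",
    "ai_system": "Perplexity",
    "response_timestamp": "2024-05-01T12:00:00Z",
    "citation_sources": ["exampledomain.com", "researchsite.org", "institution.edu"],
    "citation_count": 3,
    "primary_citation": "exampledomain.com",
}

print(json.dumps(row, indent=2))
```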


Citation Network Analysis

Data produced using this protocol can be used to construct citation networks.

Citation networks illustrate how generative AI systems connect queries to information sources across the web.

These networks can reveal:

which domains are most frequently cited
which institutions act as knowledge authorities
how citation behavior differs between AI systems
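A minimal citation network can be built as a weighted bipartite edge list from query to cited domain, with no graph library required. The records below are illustrative:

```python
from collections import defaultdict

# Citation records: (query_id, ai_system, cited domains). Illustrative data.
records = [
    ("Q031", "Perplexity", ["exampledomain.com", "researchsite.org"]),
    ("Q031", "ChatGPT",    ["exampledomain.com"]),
    ("Q032", "Perplexity", ["exampledomain.com"]),
]

# Edge weight = number of systems making that query-to-domain citation.
edges: dict[tuple[str, str], int] = defaultdict(int)
for query_id, _system, domains in records:
    for domain in domains:
        edges[(query_id, domain)] += 1

# Weighted degree per domain ranks the most frequently cited sources.
degree: dict[str, int] = defaultdict(int)
for (_qid, domain), weight in edges.items():
    degree[domain] += weight

print(sorted(degree.items(), key=lambda kv: -kv[1]))
# → [('exampledomain.com', 3), ('researchsite.org', 1)]
```

The same edge list can be loaded into a graph library for visualization; the plain-dict form here is just the smallest self-contained representation.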


Reproducibility Guidelines

To maintain reproducibility:

All query sets must be documented.
Citation extraction methods must be consistent.
Testing timestamps must be recorded.
Responses should be archived for verification.


Limitations

Not all generative AI systems consistently display citations. Some responses may contain implicit references rather than explicit links.

AI citation behavior may also change as retrieval systems evolve.

Observations should therefore be interpreted as snapshots of citation patterns at the time of testing.


Relationship to Other Protocols

The AI Citation Analysis Protocol complements other protocols in the registry.

While the Entity Retrieval Evaluation Protocol analyzes which entities appear in responses, the citation analysis protocol focuses on the external sources referenced to support those responses.

Together these protocols provide a comprehensive framework for studying generative search ecosystems.


Versioning

ACAP-1.0
Initial release defining procedures for analyzing citation behavior in generative AI systems.