AI Citation Analysis Protocol (ACAP-1.0)
Document ID
ACAP-1.0
Status
Active Protocol
Maintained by
Generative Engine Optimization Research Initiative
Purpose
The AI Citation Analysis Protocol defines a standardized methodology for analyzing how generative AI systems reference, cite, and attribute external sources within generated responses.
The protocol establishes procedures for identifying citation sources, measuring citation frequency, and evaluating citation patterns across generative AI systems.
Abstract
The AI Citation Analysis Protocol (ACAP) provides a framework for examining citation behavior in generative AI responses. The protocol defines procedures for capturing AI-generated answers, extracting referenced sources, and analyzing citation relationships between AI systems and external information sources.
The objective is to enable systematic observation of how generative AI systems reference web domains, research sources, and institutional entities when generating responses.
Scope
This protocol applies to the evaluation of citation patterns in generative AI systems.
The protocol focuses on observable citation signals including:
Referenced domains
Source attribution patterns
Citation frequency across responses
Cross-system citation consistency
The protocol does not attempt to infer internal model training data or hidden retrieval mechanisms.
Observations are based solely on visible citations within generated responses.
Terminology
Citation
A reference made by an AI system to an external information source.
Citation Source
The domain, publication, or document referenced within the generated response.
Citation Frequency
The number of times a source appears across responses in a defined query set.
Citation Diversity
The number of distinct sources referenced within responses.
Citation Dominance
The degree to which a single source appears disproportionately across responses.
Query Set Construction
Citation analysis queries should represent information-seeking questions for which AI systems typically reference external sources.
Recommended query categories include:
Research queries
Industry explanation queries
Comparative analysis queries
Technical definition queries
Example queries:
what is generative engine optimization
how AI visibility optimization works
research on generative search behavior
AI visibility measurement methodology
Each query must be assigned a unique query identifier (for example, Q031).
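The query-set construction above can be sketched as follows. This is a minimal illustration, not part of the protocol itself: the zero-padded Q-prefixed identifier format is an assumption, chosen to match the example record shown later in this protocol (query_id: Q031).

```python
# Sketch: assigning identifiers to a citation-analysis query set.
# The ID format (Q001, Q002, ...) is an assumption consistent with
# the example extraction record in this protocol.

queries = [
    "what is generative engine optimization",
    "how AI visibility optimization works",
    "research on generative search behavior",
    "AI visibility measurement methodology",
]

# Map each query text to a stable identifier.
query_set = {f"Q{i:03d}": text for i, text in enumerate(queries, start=1)}

for qid, text in sorted(query_set.items()):
    print(qid, text)
```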
Testing Procedure
Citation analysis requires capturing complete AI responses including references and links.
Testing procedure:
- Start a new session in the AI system.
- Submit the query without additional context.
- Record the complete response.
- Extract all cited domains or referenced sources.
- Store citation data in the dataset.
Testing should be performed across multiple generative AI systems including:
ChatGPT
Google Gemini
Microsoft Copilot
Perplexity AI
Citation Extraction Procedure
After collecting AI responses, citation references must be extracted.
Extraction steps:
- Identify all domains referenced within the response.
- Record the order in which sources appear.
- Identify whether the citation supports a specific claim.
- Store the citation record in structured format.
Example extraction record:
query_id: Q031
ai_system: Perplexity
citation_sources:
- exampledomain.com
- researchsite.org
- institution.edu
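The extraction steps above can be sketched for the simplest case, where citations appear as domains or URLs inside the response text. The regular expression is an assumption and deliberately rough: real responses may cite sources as footnotes, markdown links, or bare publication names, which would need system-specific handling.

```python
import re

# Rough domain pattern (assumption): optional scheme and "www.",
# then a dotted hostname. Trailing paths are ignored.
DOMAIN_RE = re.compile(
    r"(?:https?://)?(?:www\.)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)", re.I
)

def extract_citations(response_text):
    """Return cited domains in order of first appearance, deduplicated,
    matching the protocol's requirement to record citation order."""
    seen = []
    for match in DOMAIN_RE.finditer(response_text):
        domain = match.group(1).lower()
        if domain not in seen:
            seen.append(domain)
    return seen

text = ("According to exampledomain.com, GEO adoption is growing. "
        "See https://www.researchsite.org/study and institution.edu.")
print(extract_citations(text))
# ['exampledomain.com', 'researchsite.org', 'institution.edu']
```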
Citation Metrics
Citation Frequency Score
The number of times a domain appears across all query responses.
Citation Diversity Score
The total number of unique sources cited across the dataset.
Citation Dominance Score
The proportion of citations attributed to a single domain relative to all citations.
Cross-System Citation Consistency
The percentage of AI systems citing the same source for the same query.
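The four metrics defined above can be computed directly from extraction records. A minimal sketch, assuming each record is a dict with the fields shown in the example extraction record:

```python
from collections import Counter

def citation_metrics(records):
    """Compute frequency, diversity, and dominance from extraction records."""
    all_citations = [d for r in records for d in r["citation_sources"]]
    freq = Counter(all_citations)              # Citation Frequency Score per domain
    diversity = len(freq)                      # Citation Diversity Score
    top_domain, top_count = freq.most_common(1)[0]
    dominance = top_count / len(all_citations) # Citation Dominance Score
    return freq, diversity, top_domain, dominance

def cross_system_consistency(records, query_id, domain):
    """Percentage of tested AI systems citing `domain` for `query_id`."""
    relevant = [r for r in records if r["query_id"] == query_id]
    citing = sum(1 for r in relevant if domain in r["citation_sources"])
    return 100.0 * citing / len(relevant)

records = [
    {"query_id": "Q031", "ai_system": "Perplexity",
     "citation_sources": ["exampledomain.com", "researchsite.org", "institution.edu"]},
    {"query_id": "Q031", "ai_system": "ChatGPT",
     "citation_sources": ["exampledomain.com"]},
]
freq, diversity, top, dom = citation_metrics(records)
print(freq["exampledomain.com"], diversity, top, round(dom, 2))
# 2 3 exampledomain.com 0.5
print(cross_system_consistency(records, "Q031", "exampledomain.com"))  # 100.0
```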
Dataset Output Structure
Citation datasets generated using this protocol should include structured fields such as:
query_id
query_text
ai_system
response_timestamp
citation_sources
citation_count
primary_citation
This structured dataset enables comparative analysis of citation patterns across AI systems.
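The field list above can be captured as a typed record. This sketch derives `citation_count` and `primary_citation` from `citation_sources`; treating the first-cited source as "primary" is an assumption, since the protocol does not define the term.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CitationRecord:
    """One row of the protocol's dataset output; field names mirror
    the structured fields listed in this protocol."""
    query_id: str
    query_text: str
    ai_system: str
    response_timestamp: str
    citation_sources: List[str] = field(default_factory=list)

    @property
    def citation_count(self) -> int:
        return len(self.citation_sources)

    @property
    def primary_citation(self) -> Optional[str]:
        # Assumption: "primary" means the first-cited source.
        return self.citation_sources[0] if self.citation_sources else None

# Illustrative row; the timestamp is a placeholder.
row = CitationRecord(
    "Q031", "what is generative engine optimization",
    "Perplexity", "2025-01-01T00:00:00Z",
    ["exampledomain.com", "researchsite.org"],
)
print(row.citation_count, row.primary_citation)  # 2 exampledomain.com
```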
Citation Network Analysis
Data produced using this protocol can be used to construct citation networks.
Citation networks illustrate how generative AI systems connect queries to information sources across the web.
These networks can reveal:
which domains are most frequently cited
which institutions act as knowledge authorities
how citation behavior differs between AI systems
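A citation network of the kind described above can be built from the same extraction records, without any graph library: each (AI system, query) pair becomes a node with edges to the domains it cites, and a domain's in-degree gives its citation frequency. A minimal sketch:

```python
from collections import defaultdict

def build_citation_network(records):
    """Build a bipartite network: (ai_system, query_id) nodes point to
    cited domains; in_degree counts how often each domain is cited."""
    edges = defaultdict(list)
    in_degree = defaultdict(int)
    for r in records:
        node = (r["ai_system"], r["query_id"])
        for domain in r["citation_sources"]:
            edges[node].append(domain)
            in_degree[domain] += 1
    return edges, in_degree

records = [
    {"query_id": "Q031", "ai_system": "Perplexity",
     "citation_sources": ["exampledomain.com", "researchsite.org"]},
    {"query_id": "Q031", "ai_system": "ChatGPT",
     "citation_sources": ["exampledomain.com"]},
]
edges, in_degree = build_citation_network(records)
most_cited = max(in_degree, key=in_degree.get)
print(most_cited, in_degree[most_cited])  # exampledomain.com 2
```

The most-cited domains surface directly from the in-degree counts; comparing edge sets between AI-system nodes shows how citation behavior differs across systems.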
Reproducibility Guidelines
To maintain reproducibility:
All query sets must be documented.
Citation extraction methods must be consistent.
Testing timestamps must be recorded.
Responses should be archived for verification.
Limitations
Not all generative AI systems consistently display citations. Some responses may contain implicit references rather than explicit links.
AI citation behavior may also change as retrieval systems evolve.
Observations should therefore be interpreted as snapshots of citation patterns at the time of testing.
Relationship to Other Protocols
The AI Citation Analysis Protocol complements other protocols in the registry.
While the Entity Retrieval Evaluation Protocol analyzes which entities appear in responses, ACAP focuses on the external sources referenced to support those responses.
Together these protocols provide a comprehensive framework for studying generative search ecosystems.
Versioning
ACAP-1.0
Initial release defining procedures for analyzing citation behavior in generative AI systems.
