
AI Training Data Public Disclosure Template Mandated by European Commission

AI Providers to Publicly Disclose Training Data Details: European Commission Mandates Transparent Template for General-Purpose AI Models, Promoting Copyright and Data Protection Compliance.

Under the EU's AI Act, the European Commission has published a mandatory template that providers of General-Purpose AI (GPAI) models must use for the public summary of their training data. The obligation applies from August 2, 2025. The template is meant to make the use of training data in GPAI models transparent and to hold providers accountable for it.

The template, published by the European Commission's AI Office, requires GPAI providers to publicly disclose a "sufficiently detailed summary" of the data used to train their models. The summary should cover all data used in the training process, including pre-training, fine-tuning, and alignment stages.

The template consists of three major sections: General Information, List of Data Sources, and Data Processing Aspects.

In the General Information section, providers must identify themselves and the model, and disclose the modalities covered, estimates of the training data size, and the data's language and demographic coverage.

The List of Data Sources section requires providers to identify large training datasets, specify where their data comes from, give details of any web-scraped content, report any use of user-generated data, and disclose any synthetic data generated by other AI models.

The final section, Data Processing Aspects, requires providers to explain how they processed the data, how they complied with copyright law, and which content moderation measures they applied. Providers are encouraged to add any other relevant information about their data processing practices to increase transparency.
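
To make the three-part structure concrete, here is a minimal sketch of how a provider might model the summary internally before filling in the official form. It is not the Commission's format: the class and field names below (GeneralInformation, DataSources, DataProcessing, TrainingDataSummary, unanswered) are hypothetical, and the completeness check simply treats the three narrative items described for the Data Processing Aspects section as required and the extra information as optional.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GeneralInformation:
    """Section 1: provider and model identification plus overall data characteristics."""
    provider: str
    model: str
    modalities: List[str]        # e.g. ["text", "audio"]
    size_estimate: str           # an estimate such as a range, not an exact count
    languages: List[str]


@dataclass
class DataSources:
    """Section 2: where the training data came from."""
    large_datasets: List[str] = field(default_factory=list)
    licensed_datasets: List[str] = field(default_factory=list)
    top_scraped_domains: List[str] = field(default_factory=list)
    uses_user_data: bool = False
    uses_synthetic_data: bool = False


@dataclass
class DataProcessing:
    """Section 3: processing steps, copyright compliance and content moderation."""
    processing_steps: str = ""
    copyright_measures: str = ""
    moderation_measures: str = ""
    additional_info: str = ""    # optional: encouraged, not required


@dataclass
class TrainingDataSummary:
    general: GeneralInformation
    sources: DataSources
    processing: DataProcessing

    def unanswered(self) -> List[str]:
        """Names of the required Section 3 narrative fields that are still blank."""
        required = ("processing_steps", "copyright_measures", "moderation_measures")
        return [name for name in required
                if not getattr(self.processing, name).strip()]
```

A summary whose Data Processing section only describes copyright measures would report `processing_steps` and `moderation_measures` as still unanswered.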

The template balances transparency against the protection of providers' trade secrets by setting a minimum baseline of disclosure. For licensed datasets, for example, providers must confirm the licensing agreements and the data modalities (e.g., text, audio). For crawled or scraped data, they must disclose more, including a list of the top 10% of internet domains ranked by the size of content scraped from them (5% for SMEs).
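
The domain list is essentially a ranking exercise over crawl logs. The sketch below shows one way a provider might compute it, assuming a hypothetical input of (URL, content size in bytes) pairs. It sums content size per domain, ranks domains by that total, and keeps the top 10% (or 5% for an SME). Rounding up, so that at least one domain is always listed, is an assumption of this sketch rather than a rule taken from the template, and any finer points of how the official template measures size are not modeled.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlparse


def top_domains(pages: Iterable[Tuple[str, int]], sme: bool = False) -> List[str]:
    """Rank domains by total scraped content size and keep the top share.

    pages: (url, size_in_bytes) pairs from a crawl log (hypothetical input format).
    sme:   report the top 5% of domains instead of the top 10%.
    """
    size_by_domain = Counter()
    for url, size in pages:
        host = urlparse(url).hostname or "<no-host>"
        size_by_domain[host] += size

    percent = 5 if sme else 10
    n = len(size_by_domain)
    # Integer ceiling of n * percent / 100, so at least one domain is listed
    # whenever anything was scraped (the rounding rule is an assumption).
    k = (n * percent + 99) // 100
    return [domain for domain, _ in size_by_domain.most_common(k)]
```

With 20 scraped domains, for instance, a large provider would list the 2 largest and an SME only the largest one.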

The requirement to publish a summary of the training data applies to all GPAI models, including those released under open-source licenses. Entities that significantly modify an existing GPAI model must use the template to report only the training content used for that modification.

Noncompliance with these disclosure requirements can result in fines of up to €15 million or 3% of global annual revenue, whichever is greater. Starting August 2, 2026, the AI Office may verify compliance and issue corrective measures. The European Commission will not perform content-level audits but can act upon complaints or "qualified alerts" issued by the scientific panel.
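
The "whichever is greater" rule is simple arithmetic, sketched below with a hypothetical helper name. It computes only the statutory ceiling described above, not a prediction of any fine actually imposed, and works in whole euros to avoid floating-point rounding.

```python
def max_gpai_fine_eur(global_annual_revenue_eur: int) -> int:
    """Ceiling on a fine for breaching the disclosure rules, per the figures above:
    EUR 15 million or 3% of global annual revenue, whichever is greater."""
    fixed_cap = 15_000_000
    revenue_cap = global_annual_revenue_eur * 3 // 100  # 3%, rounded down to whole euros
    return max(fixed_cap, revenue_cap)
```

Because 3% of €500 million is exactly €15 million, the revenue-based cap only becomes the larger of the two for companies with global annual revenue above €500 million; a company with €1 billion in revenue faces a ceiling of €30 million.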

If a model is trained further after it has been placed on the market, the provider must update its summary every six months, or sooner if the update constitutes a material change. Providers of GPAI models with systemic risk face stricter requirements, including model evaluations, risk mitigation, incident reporting, and cybersecurity measures.
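
A provider tracking this obligation might compute the next deadline as in the sketch below. The helper names are hypothetical, and two points are assumptions not spelled out above: the six-month clock runs from the last published summary, and a material change makes the update due on the day the change occurs. Month arithmetic clamps to the last day of the month, so a summary published on August 31 would next be due on the last day of February.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def next_update_due(last_published: date,
                    material_change_on: Optional[date] = None) -> date:
    """Hypothetical helper: date by which the next summary update is due.

    Assumptions: the six-month clock runs from the last published summary, and a
    material change makes the update due on the day the change occurs.
    """
    routine = add_months(last_published, 6)
    if material_change_on is not None and material_change_on < routine:
        return material_change_on
    return routine
```

Under these assumptions, a summary published on August 2, 2025 would next be due by February 2, 2026, unless a material change occurred earlier.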

This template is part of the EU’s broader AI governance framework, aiming to ensure transparency and enable stakeholders such as copyright holders, data subjects, and downstream users to exercise their rights under EU law with regard to training data used in GPAI models.

  1. Under the EU's AI Act, providers of General-Purpose AI (GPAI) models must comply with copyright law and describe their data processing methods, particularly in the Data Processing Aspects section of the public summary template.
  2. The template, which applies from August 2, 2025, requires providers to disclose how they comply with copyright law, reflecting the central role of copyright in making the use of GPAI training data transparent and accountable.
