The AI Act and public information on training data for general-purpose AI models

The European Union’s Artificial Intelligence Act, Regulation (EU) 2024/1689 (AI Act), which entered into force on 1 August 2024, establishes a comprehensive framework for the development and deployment of artificial intelligence within the EU. A key provision, Article 53(1)(d), requires providers of general-purpose AI (GPAI) models to “draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office.” The requirement is meant to make the data behind these models more transparent. As Recital 107 explains, it should also help parties with legitimate interests, including copyright holders, exercise and enforce their rights.

Current Status of Article 53(1)(d) Implementation

As of February 2025, implementation of Article 53(1)(d) is under way. The European AI Office is developing the standardized template that GPAI model providers must use for their training data summaries. This work is closely linked to the drafting of the General-Purpose AI Code of Practice, which will offer detailed guidance on complying with the AI Act’s GPAI obligations, including its transparency obligations. The drafting process involves nearly 1,000 stakeholders, including industry representatives, civil society organizations, academic experts, and EU Member State officials. The goal is to finalize the Code of Practice by April 2025. That is ahead of 2 August 2025, when the AI Act’s obligations for GPAI model providers begin to apply.

Key Considerations in Developing the Training Data Summary Template

The creation of the training data summary template under Article 53(1)(d) involves balancing several critical factors:

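  1. Transparency vs. Intellectual Property Rights: While the AI Act emphasizes transparency, it also recognizes the need to protect trade secrets and confidential business information. Recital 107 states that the summary should be generally comprehensive in scope rather than technically detailed. For example, it can list the main data collections or sets used to train the model, such as large private or public databases or data archives, and give a narrative explanation of other data sources. The template is therefore expected to give meaningful insight into training data without compelling providers to disclose proprietary information that could compromise their competitive advantage.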
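  2. Scope of Information: The public summary concerns the content used for training. It should be kept distinct from the technical documentation required under Article 53(1)(a) and Annex XI. That documentation covers a general description of the model (intended tasks, acceptable use policies, release date, distribution methods, architecture, and licensing). It also covers details of the model’s design, training process, data sources, and known or estimated energy consumption. It must be provided to the AI Office and national competent authorities on request rather than published. The training data summary is therefore the principal publicly available account of what a model was trained on.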
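  3. Open-Source Models: Article 53(2) exempts some providers from the technical documentation obligations in Article 53(1)(a) and (b). It covers models released under a free and open-source licence whose parameters (including weights), model architecture information, and usage information are publicly available. The exemption does not extend to the copyright policy in Article 53(1)(c) or to the training data summary in Article 53(1)(d), so open-source providers must still publish a summary. It also does not apply to GPAI models with systemic risk, which must meet all obligations regardless of their open-source status.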

Challenges and Industry Response

The AI Act’s stringent transparency requirements have elicited mixed reactions from the AI industry. Some stakeholders express concern that detailed disclosure mandates could inadvertently expose sensitive information, potentially stifling innovation and competitiveness. Conversely, proponents argue that such transparency is essential for building public trust and ensuring ethical AI development. The ongoing development of the Code of Practice seeks to address these concerns by providing clear, balanced guidelines that uphold transparency without compromising proprietary interests.

Next Steps

With the GPAI obligations applying from 2 August 2025, the coming months are critical for finalizing the General-Purpose AI Code of Practice and the associated training data summary template. Providers of GPAI models should engage with the drafting process so that their perspectives are considered. They should also begin preparing to meet the forthcoming transparency obligations. Under Article 111(3), models placed on the market before 2 August 2025 have until 2 August 2027 to comply, but models placed on the market from that date must comply from the outset. Staying informed and involved will be key to navigating the evolving regulatory landscape and complying with Article 53(1)(d) of the AI Act.