Open Source Licenses & Copyright: Key Considerations for Software and AI Development

Open Source Copyright for AI & software

As open-source software and artificial intelligence continue to reshape the technology landscape, understanding how licenses and copyright impact your work has never been more critical. Whether you’re creating original code, reusing open-source libraries, training AI models, or sharing datasets, the legal choices you make influence collaboration, distribution, and commercial potential.

Notably, the recent U.S. AI Action Plan unveiled in July 2025 underscores the strategic importance of AI innovation, open-source and open-weight AI models, and high-quality data as national assets, reinforcing the need for developers to stay informed on licensing practices that enable responsible, secure, and competitive AI development nationwide. (For a quick overview of the U.S. AI Action Plan read our blog article here)

This article sheds some light on the complexities of open-source licensing and copyright for both software and AI developers.

Open-source software and AI artifacts are governed by licenses specifying how code, data, and models may be used, modified, and shared. These licenses fall into two broad categories:

  1. Permissive licenses, which allow almost any use with minimal obligations, and
  2. Copyleft licenses, which require that derivatives remain open under the same terms.

Popular permissive licenses include MIT, Apache 2.0, and BSD; popular copyleft licenses include GNU GPL v2/v3 and the Mozilla Public License (MPL).

For non-code assets like datasets or models, Creative Commons (CC) licenses (e.g., CC BY, CC BY-SA, CC0) are often used, alongside data-specific licenses such as CDLA-Permissive-2.0 and ODC-By. Each license type imposes different requirements for attribution, redistribution, and commercial use, which are especially important when integrating code or content into proprietary products or using it to train AI models.

Permissive Licenses

  • MIT License: Extremely permissive, allows reuse in proprietary products with minimal requirements (primarily attribution).
  • Apache License 2.0: Also permissive, but adds an explicit patent grant for protection against patent disputes, which is a key distinction from MIT/BSD licenses.
    Note: Both gpt-oss-20b and gpt-oss-120b have been released under this licence.
  • BSD License: Similar to MIT, very permissive, and often used in academia/industry for maximum flexibility and clarity regarding endorsements.

Copyleft and Weak Copyleft Licenses

  • GNU GPL: Strong copyleft, requiring all derivative works to remain open-source under GPL.
  • Mozilla Public License 2.0 (MPL): Weak copyleft, applying copyleft requirements only to modified files, so code can be combined with proprietary elements.

Model-Specific and Data/Model Licenses for AI

Traditional open source licenses were designed for code and are not always well-suited for AI model weights or data. For contemporary AI models, model-specific licenses are emerging:

  • OpenRAIL (Responsible AI License): Used by models like Stable Diffusion and BLOOM, RAIL licenses add ethical restrictions (e.g., prohibiting misuse or harmful applications).

    Note: these are not OSI-compliant and their enforceability varies.
  • OpenMDW (Model, Data & Weights) License: Introduced in 2025 as a comprehensive, permissive license that covers model code, weights, datasets, documentation, and associated software (“Model Materials”). OpenMDW aims to bring clarity and consistency across all model elements by providing a single license purpose-built for AI, with explicit grants over copyright, patent, database, and trade secret rights, facilitating broad unrestricted use and removing the need to combine multiple licenses.

Data-specific licenses:

  • CDLA-Permissive-2.0 for datasets
  • ODC-By for databases.

Creative Commons (for Data and Models)

  • CC0: Public domain dedication, no restrictions.
  • CC BY: Requires attribution.
  • CC BY-SA: Requires derivatives to be licensed under the same terms.
  • CC BY-NC: Restricts commercial use.
  • CC BY-ND: No derivative works allowed.

Note: Creative Commons licenses are not recommended for distributing software code as they lack essential clauses related to patent rights, source availability, and modification/distribution. Instead, CC may be used for datasets, documentation, or other non-code materials.

Comparing License Features

  • Permissive licenses (MIT, BSD, Apache) allow proprietary use with minimal obligations. Apache 2.0 stands out with its explicit patent protection.
  • Copyleft licenses (GPL, MPL) guarantee continued openness in derivatives. GPL ensures all downstream works remain free.
  • Model-specific licenses (OpenMDW, OpenRAIL) provide tailored coverage for AI models and artifacts, adding clarity, responsible-use clauses, and, in the case of OpenMDW, a comprehensive, OSI-aligned approach for both code and non-code components.

Attribution and inclusion of license notices are universal for all recognized FOSS (Free and Open Source Software)/licenses and generally required in downstream uses.

Choosing a License: Advantages and Disadvantages by Project Type

  • Commercial Software: Prefer permissive licenses for maximum compatibility with proprietary/commercial deployment.
  • Internal Tools: License choice less critical unless external distribution is intended.
  • AI Development: Permissive licenses still dominate, but emerging model-specific licenses (OpenMDW, OpenRAIL) are gaining traction for their comprehensive coverage and responsible-use conditions. Pure copyleft may deter adoption, especially in industry.
  • Community Projects: Copyleft ensures community reciprocity; permissive licenses attract broader (including corporate) participation.

Copyright in the EU vs US

  • Authorship & Originality: EU requires “own intellectual creation;” US has a lower threshold of “creativity.”
  • Default Ownership: In the EU, employers generally hold economic rights for employee-created software; in the US, the “work-for-hire” doctrine applies—but contract and national nuances exist.
  • AI-Generated Works: Both regions currently require human authorship for copyright protection. The EU has introduced exceptions for text/data mining, with rights-holders able to opt out for AI training; the US relies on fair use doctrine, providing a more open but less certain landscape.
  • Duration: Both typically grant copyright for life plus 70 years (subject to some variations for corporate works in the US).
  • Moral Rights: Stronger protection in the EU (e.g., right to attribution/integrity) than in the US, where such rights are mainly limited to certain works and can often be waived.

AI Model Training and Data: Cross-Jurisdictional Issues

EU: Explicit copyright exceptions for AI/data mining, subject to rights-holder opt-out.

US: Broader (but less clearly defined) fair use doctrine governs legal status for model/data training.

Patent Rights: Especially relevant in the US; Apache 2.0 and OpenMDW specifically grant patent rights, reducing litigation risk for implementers.

Conclusion

When choosing an open-source license for software or AI models, consider the type of asset (code, data, model, documentation), legal risks (including patents), project goals, and jurisdictional issues affecting data/model training.

For modern AI releases, model-specific licenses like OpenMDW increasingly provide the best of both worlds: broad permissive use, clear rights coverage, and simplicity by reducing the need for multiple layered licenses.

Disclaimer:
The content of this blog is provided for general informational purposes only and does not constitute legal advice. While we strive to ensure that the information is accurate and up to date, it may not reflect the most current legal developments or the specific circumstances of your organization. Readers should not act upon any information contained in this blog without first seeking professional legal counsel. No attorney-client relationship is created through your use of or reliance on the information provided herei