
Lesson 1: Using Openly Licensed Works Published by Others

Open licensing concepts

In the world of research and education, open licensing has become increasingly important for enabling the free exchange of ideas, promoting collaboration, and accelerating scientific progress. But what do we mean by “open licensing”?

In essence, open licensing refers to a set of principles and legal frameworks that grant permissions for others to freely access, use, modify, and share creative works or products. This concept applies to various types of outputs, including software, research data, scholarly publications, and educational materials.

Figure: logos of licensing organizations.

Open-source software is perhaps the most well-known example of open licensing. Open-source licenses, such as the MIT, BSD, and Apache licenses, allow users to access the source code of a software program, study how it works, modify it to suit their needs, and redistribute the modified versions.

Similarly, the open data movement advocates for making research data openly available and reusable, often under licenses like the Open Data Commons licenses or Creative Commons Zero public domain dedication. This enables other researchers to validate findings, combine datasets, and build upon existing work.

Open content licenses, like the Creative Commons suite of licenses, are commonly used for scholarly publications, educational resources, media files, and other creative works. These licenses grant varying levels of permissions for reusing, redistributing, and modifying the licensed content.

The principles of open licensing are deeply rooted in the academic traditions of sharing knowledge, enabling peer review and scrutiny, and fostering collaborative advancement of human understanding. By embracing open licensing, researchers and educators can:

  1. Increase the transparency and reproducibility of their work, a cornerstone of the scientific method.
  2. Facilitate the dissemination of their findings and educational materials to a wider audience, amplifying their impact.
  3. Build upon the work of others, accelerating the pace of discovery and innovation.
  4. Collaborate more effectively with colleagues across institutions and disciplines, breaking down silos and fostering interdisciplinary research.

Open licensing is not just a technological or legal construct; it represents a cultural shift towards openness, collaboration, and the democratization of knowledge. As we go deeper into this topic, you will learn the practical aspects of using openly licensed works, and what to consider when licensing your own research outputs and collaborating with others in open-source projects.

Understanding license types

Understanding the basic differences between license types is essential for researchers and educators who want to make informed decisions about which licenses to use for their own projects and how to properly use and attribute third-party software in their work. By familiarizing yourself with the key characteristics and implications of each license type, you can ensure that you are using and sharing software in a way that aligns with your goals and values, while also respecting the rights and intentions of the original creators.

The three main categories of licenses are: open-source permissive, open-source copyleft, and proprietary.

Open-source permissive licenses
Open-source permissive licenses, such as the MIT, BSD, and Apache licenses, are designed to grant users broad freedoms to use, modify, and distribute the licensed software. These licenses typically only require that the original copyright notice and license text be included in any copies or derivative works. Permissive licenses place minimal restrictions on how the software can be used, making them popular choices for academic and research projects. They allow for easy integration with other software projects, including proprietary ones, making them ideal for promoting widespread adoption and collaboration.
Open-source copyleft licenses
Open-source copyleft licenses, like the GNU General Public License (GPL), GNU Lesser General Public License (LGPL), and Mozilla Public License (MPL), are designed to make sure that the freedoms granted by the license are preserved in any derivative works. Copyleft licenses require that modified versions of the licensed software, when distributed, also be released under the same license terms. The scope of this requirement varies: the GPL applies it to the whole derivative work, while the LGPL and MPL apply it more narrowly (roughly, to the library itself or to the modified files). This “share-alike” requirement ensures that the software and its derivatives remain open and accessible to all. Copyleft licenses are often chosen by developers who want their work to remain open and freely available in perpetuity.
Proprietary licenses
Proprietary licenses are the most restrictive type of license. They typically grant users the right to use the software only for its intended purpose, often limiting the ability to copy, modify, or redistribute the software. Proprietary licenses are commonly used for commercial software products, where the source code is kept closed and the software is distributed only in binary form. Examples of proprietary licenses include the End-User License Agreements (EULAs) that accompany most commercial software packages.
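
The categories above can be sketched as a small lookup table. This is a minimal illustration, assuming licenses are named by their SPDX identifiers; the table `LICENSE_CATEGORIES` and the function `categorize` are hypothetical names introduced here, and the table covers only the licenses mentioned in this lesson. Anything not in the table, including proprietary terms, is reported as unknown so that it gets read by a person.

```python
# Hand-picked sample of SPDX license identifiers mapped to the categories
# described above. This is an illustrative table, not a complete list.
LICENSE_CATEGORIES = {
    "MIT": "permissive",
    "BSD-3-Clause": "permissive",
    "Apache-2.0": "permissive",
    "GPL-3.0-or-later": "copyleft",
    "LGPL-3.0-or-later": "copyleft",
    "MPL-2.0": "copyleft",
}


def categorize(license_id: str) -> str:
    """Return 'permissive', 'copyleft', or 'unknown' for a license identifier."""
    return LICENSE_CATEGORIES.get(license_id, "unknown")


if __name__ == "__main__":
    # "LicenseRef-Vendor-EULA" is a made-up identifier standing in for a
    # proprietary license; it is not in the table, so it is flagged for review.
    for name in ["MIT", "GPL-3.0-or-later", "LicenseRef-Vendor-EULA"]:
        print(f"{name}: {categorize(name)}")
```

A lookup like this only tells you which family a license belongs to; the actual obligations still come from the license text itself.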

Composing works under different licenses

As a researcher or educator you’ll likely find yourself wanting to use code, data, or content from multiple sources in your projects. But before you start mixing and matching, it’s crucial to understand how different licenses can work together (or not!) and the implications of combining them.

What is license compatibility?

Some licenses play nicely with others, while others have strict requirements that can limit their compatibility. When combining pieces of code together, you will want to check whether the terms of their different licenses allow them to be used together in the same project.

For example, let’s say you want to use a Python library licensed under the permissive MIT license in your research project. You also want to include a dataset licensed under the Creative Commons Attribution (CC BY) license. In this case, you’re in luck! The MIT and CC BY licenses are generally compatible, meaning you can use them together without any legal hiccups.

License directionality

Now, let’s talk about license directionality. This concept refers to how a license behaves differently depending on whether it’s applied to code, data, or content that is being incorporated into your project (upstream) or to the resulting work you create (downstream).

Imagine you’re building a research software tool that analyzes social media data. You plan to use a sentiment analysis library licensed under the GNU General Public License (GPL), which is a copyleft license. If you incorporate this GPL-licensed code into your project, the terms of the GPL require that your entire project also be licensed under the GPL when you distribute it. In this case, the GPL license is said to be “viral” in the downstream direction, affecting the licensing of your derived work.

Similarly, if you’re using a dataset licensed under the Creative Commons Attribution-ShareAlike (CC BY-SA) license, you’re required to distribute any derivative works under the same license terms. However, this requirement only applies to the dataset and any modifications you make to it, not to your entire project. The CC BY-SA license is considered “viral” only for the dataset, not for the larger work it’s incorporated into.
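
The downstream effect described for code can be sketched roughly as follows. This is a simplified model under stated assumptions: it covers only code components that are combined and distributed together, it treats every listed permissive license as able to flow into a GPL-3.0-or-later project, and it refuses to guess about anything else. The names `PERMISSIVE`, `COPYLEFT`, and `required_project_license` are hypothetical and exist only for this illustration.

```python
from typing import Iterable, Optional

# Assumption: a tiny, hand-picked set of licenses, named by SPDX identifier.
PERMISSIVE = frozenset({"MIT", "BSD-3-Clause", "Apache-2.0"})
COPYLEFT = frozenset({"GPL-3.0-or-later"})


def required_project_license(component_licenses: Iterable[str]) -> Optional[str]:
    """Return the license the distributed project must carry, if any.

    Returns None when every component is permissive (the project license is
    then a free choice). Raises ValueError for licenses this sketch does not
    model, so that they are reviewed by a person instead.
    """
    found = set(component_licenses)
    unknown = found - PERMISSIVE - COPYLEFT
    if unknown:
        raise ValueError(f"Review these licenses by hand: {sorted(unknown)}")
    copyleft = found & COPYLEFT
    if copyleft:
        # A copyleft component "flows downstream" to the whole project.
        return sorted(copyleft)[0]
    return None


if __name__ == "__main__":
    print(required_project_license(["MIT", "BSD-3-Clause"]))
    print(required_project_license(["MIT", "GPL-3.0-or-later"]))
```

Data and content licenses such as CC BY-SA are deliberately left out of the table, because, as noted above, their share-alike terms attach to the dataset and its adaptations and not to the surrounding project.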

License directionality. Schematic under CC-BY from Morin et al. (2012), Public Library of Science (PLoS) (n.d.). doi:10.1371/journal.pcbi.1002598.g002

Combining code, data, and content under different licenses

Now, let’s put it all together. When you’re combining code, data, and content from different sources in your research or educational projects, it’s essential to consider the compatibility and directionality of their respective licenses.

Say you’re creating an online course that includes a mix of your own content, images licensed under Creative Commons, and code examples from various open-source projects. You’ll need to review the licenses of each component to ensure that they are compatible with each other and with the license you choose for your overall course materials. A handy tool to help you with this task is the Creative Commons License Compatibility Chart.

Understanding license compatibility and directionality is key to making informed decisions when using and combining licensed works in your research and educational projects. Take the time to review the licenses of the components you use, and always confirm that you are using them legally and ethically.

Examples of OSS for researchers and their license terms

Python, NumPy, R, Biopython, LAMMPS, NWChem, Quantum ESPRESSO, QGIS...
Python
Python is a popular open-source programming language widely used in scientific computing, data analysis, and machine learning. It is released under the Python Software Foundation License, a permissive software license approved by the OSI. Thus, all of the Python source code can be reused and redistributed for any use, including commercial. The only condition is that the PSF’s License Agreement and PSF’s notice of copyright be retained in any derivative works.
NumPy
NumPy is a fundamental package for scientific computing in Python, providing support for multi-dimensional arrays and matrices, along with a collection of mathematical functions. It is under the BSD 3-Clause License, which is a permissive license that allows users to freely use, modify, and distribute the software, with minimal requirements for attribution and no restrictions on commercial use.
R
R is a programming language and environment for statistical computing and graphics. It is licensed under the GNU General Public License (GPL) version 2 or later. The GPL is a copyleft license, meaning that any derivative works must also be distributed under the same license terms. You can use, modify, and distribute R, but if you distribute a modified version, it must also be released under the GPL.
Biopython
Biopython is a set of tools for computational biology and bioinformatics written in Python. It is released under the Biopython License Agreement, but many files are dual-licensed under the BSD 3-Clause License. These are both permissive licenses allowing all uses and redistribution, provided that the copyright notice is retained. The Biopython License, however, is not OSI-approved.
LAMMPS
The Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is a software package for high-performance classical molecular dynamics simulations for materials science applications. It is distributed under the terms of the GNU General Public License version 2 (GPLv2), so any distributed software that incorporates LAMMPS source code must also be released under the GPL.
NWChem
NWChem is a computational chemistry software package for both quantum chemical and molecular dynamics simulations, distributed under the terms of the OSI-approved Educational Community License, ECL 2.0. The ECL is a permissive license that allows free use, modification, and distribution of NWChem, including for commercial purposes, as long as the copyright notice and disclaimer are retained.
Quantum ESPRESSO
Quantum ESPRESSO is a software suite for electronic-structure calculations and materials modeling at the nanoscale, with contributions from many research groups around the world. It is distributed under the GNU General Public License (GPL), a copyleft license requiring any modifications or derivative works to also be open-source under the GPL.
QGIS
QGIS is a popular open-source geographic information system (GIS) used for mapping and spatial data analysis. It is licensed under the GNU General Public License (GPL) version 2 or later. As with R above, any modifications or derivative works of QGIS that you distribute must also be released as open source under the GPL.
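
When a project depends on several of these packages, it can help to keep a small license inventory. The sketch below records the packages above with the licenses and categories as stated in this lesson, then groups them by category. The names `PACKAGES` and `group_by_category` are hypothetical, and the entries should be re-checked against each project's current license file before you rely on them.

```python
from collections import defaultdict

# (package, license as described in this lesson, category)
PACKAGES = [
    ("Python", "Python Software Foundation License", "permissive"),
    ("NumPy", "BSD 3-Clause", "permissive"),
    ("R", "GPL version 2 or later", "copyleft"),
    ("Biopython", "Biopython License Agreement / BSD 3-Clause", "permissive"),
    ("LAMMPS", "GPL version 2", "copyleft"),
    ("NWChem", "Educational Community License 2.0", "permissive"),
    ("Quantum ESPRESSO", "GPL", "copyleft"),
    ("QGIS", "GPL version 2 or later", "copyleft"),
]


def group_by_category(packages):
    """Group package names by license category, keeping the input order."""
    groups = defaultdict(list)
    for name, _license_name, category in packages:
        groups[category].append(name)
    return dict(groups)


if __name__ == "__main__":
    for category, names in group_by_category(PACKAGES).items():
        print(f"{category}: {', '.join(names)}")
```

For Python packages installed locally, the standard-library function `importlib.metadata.metadata()` returns the metadata a package declares about itself, which often includes a license field. Treat that as a hint and confirm it against the project's own license file.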

Giving credit where credit is due: citing openly licensed works

As you’ve seen already, openly licensed software, data, and content play a key role in enabling reproducible, collaborative, and innovative research. So it’s not enough to use and build on these resources; we also have an obligation to properly credit and cite them in our work.

Citing openly licensed works is important because:

  • It gives recognition to the creators who have generously made their work available for others to use and build upon.
  • It helps others discover and access the resources you’ve used, promoting further collaboration and reuse.
  • It enhances the transparency and reproducibility of your research by clearly documenting the tools and data you’ve relied on.

In academia, citing software and data is becoming increasingly common and accepted, just as citing research articles and books has long been the norm. Many open-source software projects provide clear guidelines on how to cite them in research publications, often including a suggested citation format or even a DOI (Digital Object Identifier) for stable referencing.
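
One common way for a project to publish its preferred citation is a `CITATION.cff` file in the Citation File Format. The sketch below builds a minimal file of that kind as plain text. It is an illustration under assumptions: the function `make_citation_cff` and the example values are hypothetical, only a handful of fields are written, and the values are assumed not to contain double quotes (no escaping is done).

```python
def make_citation_cff(title, authors, version, date_released, doi=None):
    """Build the text of a minimal CITATION.cff file.

    authors is a list of (given_names, family_names) tuples;
    date_released is a string in YYYY-MM-DD form.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        "authors:",
    ]
    for given, family in authors:
        lines.append(f'  - given-names: "{given}"')
        lines.append(f'    family-names: "{family}"')
    lines.append(f'version: "{version}"')
    lines.append(f"date-released: {date_released}")
    if doi:
        # Only written when the project actually has a DOI.
        lines.append(f'doi: "{doi}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values for a made-up package.
    print(make_citation_cff(
        title="Example Analysis Toolkit",
        authors=[("Ada", "Example")],
        version="1.0.0",
        date_released="2024-01-15",
    ))
```

When you are the one citing, the direction is reversed: look for a file like this in the project you used and follow whatever it specifies.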

Leading by example

As researchers and educators, we have an opportunity to lead by example in properly crediting and citing the openly licensed works we use. By consistently providing attribution and references for the software, data, and content we rely on, we can help to establish this practice as a community norm and expectation.

Moreover, by introducing our students and colleagues to resources like the Journal of Open Source Software (JOSS) and the importance of software citation, we can help to foster a new generation of researchers who understand and value the role of openly licensed works in advancing scientific discovery.

So, as we move forward in exploring the use and creation of openly licensed works, let us remember to give credit where credit is due, and to actively participate in building a culture of recognition and collaboration within the open research community.

Activity: exploring open-source software in your field

The goal of this activity is to help you gain hands-on experience in identifying, evaluating, and properly crediting open-source software relevant to your research or educational work.

  1. Think of a research topic or educational need in your field that could potentially be addressed using open-source software.
  2. Search for open-source software packages, libraries, or tools that could be used to tackle this problem or meet this need. You can use resources such as GitHub, GitLab, JOSS, or other domain-specific repositories to find relevant software.
  3. Select one or two open-source software packages that seem most promising for your purposes. For each package, do the following:
     a. Identify the license under which the software is released. Is it a permissive license (e.g., MIT, BSD) or a copyleft license (e.g., GPL)? What are the key terms and conditions of the license?
     b. Investigate whether the software has been published in JOSS or another peer-reviewed venue. If so, read the associated article to gain a deeper understanding of the software’s purpose, functionality, and potential applications.
     c. Determine how to properly cite the software in a research publication or educational material. Look for a CITATION file, README, or other documentation that provides guidance on citation format.
  4. Take some notes of your findings for each software package, including:
     a. The name and purpose of the software
     b. The license under which it is released and its key terms
     c. Whether it has been published in JOSS or another peer-reviewed venue
     d. How to properly cite the software in your work
  5. Share your summary with your colleagues or classmates, and discuss how you might incorporate these openly licensed tools into your research or educational work.
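
If you have downloaded or cloned a package, the file lookup in step 3 can be partly automated. The following is a minimal sketch, assuming the project keeps its license, citation, and README files in the top-level directory under conventional names; the function name `find_license_and_citation_files` is hypothetical, and projects that store these files elsewhere will need a manual look.

```python
from pathlib import Path


def find_license_and_citation_files(repo_dir):
    """List top-level files whose names suggest license, citation, or README info."""
    found = {"license": [], "citation": [], "readme": []}
    for path in sorted(Path(repo_dir).iterdir()):
        if not path.is_file():
            continue  # only top-level files are checked in this sketch
        name = path.name.upper()
        if name.startswith(("LICENSE", "LICENCE", "COPYING")):
            found["license"].append(path.name)
        elif name.startswith("CITATION"):
            found["citation"].append(path.name)
        elif name.startswith("README"):
            found["readme"].append(path.name)
    return found


if __name__ == "__main__":
    # Scan the current directory as an example.
    for kind, names in find_license_and_citation_files(".").items():
        print(f"{kind}: {names if names else 'none found'}")
```

An empty result does not mean the software is unlicensed or uncitable; it only means the conventional file names were not found, so check the project's documentation and website as well.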

By engaging in this activity, you’ll gain practical experience in navigating the landscape of open-source software, understanding licensing terms, and properly crediting the works you use. These skills will serve you well as you continue to explore the use and creation of openly licensed resources in your work.

Sources

Portions of this lesson are based on materials from the presentation by Barba (2017), which drew from Morin et al. (2012).

References
  1. Morin, A., Urban, J., & Sliz, P. (2012). A quick guide to software licensing for the scientist-programmer. PLOS Computational Biology, 8(7), e1002598. doi:10.1371/journal.pcbi.1002598
  2. Public Library of Science (PLoS). (n.d.). doi:10.1371/journal.pcbi.1002598.g002
  3. Barba, L. A. (2017). A short lecture on Open Licensing. Presentation on Figshare. doi:10.6084/m9.figshare.4516892.v1