Artificial intelligence is becoming more deeply embedded in our world, enabling all sorts of new possibilities but also raising new risks and dangers. Data privacy, embedded biases, and false information, whether intentional or unintentional, are all major concerns.
One possible approach to some of these problems is being studied at the Allen Institute for AI (AI2) in Seattle. Building on its recently announced AI2 Open Language Model (OLMo), the institute has launched the Impact License Project (ImpACT), a family of licenses designed to help reduce risks and promote transparency in the field of artificial intelligence.
The project is named for and designed to implement AI2’s core values of impact, accountability, collaboration, and transparency.
“We view OLMo as a platform,” said Jesse Dodge, an AI research scientist working closely with the AI2 legal team. “So, we are doing our best to be open about all of the steps that we’re taking when building this large open model. That includes doing things like releasing the data that was used to train it.”
Initially designed to be used specifically with the OLMo open model, the ImpACT licenses take a number of novel approaches.
The license restrictions are risk-based and therefore artifact-agnostic. Because the licenses rely on risk assessment rather than being segmented by artifact or artifact type (the data and the models to be licensed), they can be readily applied to any derivative downstream models and applications. To make this work, the risk categories of the licenses — low, medium, and high — are assigned based on an assessment completed by a multidisciplinary group of lawyers, ethicists, and scientists.
Large models have taken the world by storm during the past few years. These include large language models and large multimodal models, which are neural networks at their core. A model is defined by the configuration of its nodes, or neurons, together with the values that interconnect those nodes, known as its weights and biases.
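As a rough illustration of what those parameters look like in practice, here is a minimal PyTorch sketch (purely illustrative, and unrelated to OLMo itself) in which a model's weights and biases are simply named tensors:

```python
from torch import nn

# A toy network: the "model" is this configuration of layers plus the
# numeric values stored in each layer's weight and bias tensors.
model = nn.Sequential(
    nn.Linear(16, 8),  # 16 inputs -> 8 hidden neurons
    nn.ReLU(),
    nn.Linear(8, 2),   # 8 hidden neurons -> 2 outputs
)

# These learned parameters are what an "open" model release makes available.
for name, param in model.named_parameters():
    print(name, tuple(param.shape))  # e.g. "0.weight (8, 16)", "0.bias (8,)"
```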
The data — or corpus — on which the models are trained is also critical to their function and is increasingly being treated as proprietary IP by corporations. Because it is typically scraped from the public web, this data often incorporates copyrighted material created by others. The resulting corpus is often a vast collection of information that is further modified to make it more suitable for its intended application and for public consumption. This can include the elimination of NSFW text and images, a process that requires human oversight and can itself raise many ethical issues.
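As a deliberately simplified, hypothetical illustration of what one such curation step could look like (the blocklist terms and the filter are invented for this example, not drawn from any real pipeline):

```python
# A naive keyword filter: drop any document that contains a blocked term.
# Real corpus-cleaning pipelines are far more involved and, as noted above,
# still require human oversight.
BLOCKLIST = {"example_nsfw_term", "another_blocked_term"}

def keep_document(text: str) -> bool:
    """Return True if the document passes this naive keyword filter."""
    words = set(text.lower().split())
    return not (words & BLOCKLIST)

corpus = ["a clean document", "a document with example_nsfw_term in it"]
filtered = [doc for doc in corpus if keep_document(doc)]
print(filtered)  # -> ['a clean document']
```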
Much of the information needed to make the ImpACT license assessments comes from Derivative Impact Reports that are completed in the process of receiving a license. These good-faith self-reporting documents disclose details about the artifacts. The reports are intended to work similarly to model cards or dataset cards and include details about intended uses, funding sources, energy consumption, and data provenance for the derivatives.
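To give a sense of what such a report might cover, here is a hypothetical sketch; the field names are illustrative and do not reflect AI2's actual report format:

```python
# Hypothetical sketch of the kind of metadata a Derivative Impact Report
# might capture; field names are illustrative, not AI2's actual schema.
derivative_impact_report = {
    "artifact_name": "example-derivative-model",
    "derived_from": "OLMo",
    "intended_uses": ["research on text summarization"],
    "funding_sources": ["internal research budget"],
    "energy_consumption": "estimated GPU-hours used for fine-tuning",
    "data_provenance": ["publicly available web text", "licensed corpora"],
}
```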
This transparency will allow researchers, developers, users, and the community at large to better understand what they are working with.
“I do think what’s so innovative about this whole project are the safety measures and decision points we’ve built in throughout the development process,” said Jennifer Dumas, general counsel at AI2 and project lead.
While the ImpACT licenses are designed to be used for AI artifacts like data and models, they aren’t intended for software and source code. These will continue to be licensed under existing licensing schemes.
The ImpACT licenses begin with the idea that the possible applications of an artifact are separate from what kind of artifact it is. For instance, both public data and health data can be classified as datasets, but their potential for harmful uses is significantly different. Under existing licensing schemes, these two datasets might be released and classified similarly, but under ImpACT, they're treated differently.
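A minimal sketch of that idea, with invented tier names and an assumed upstream risk assessment, might look like this:

```python
# Illustrative only: the license tier follows an assessed risk level,
# not the artifact's type. Tier names here are invented for this sketch.
def pick_license(risk_level: str) -> str:
    """Map an assessed risk level to a hypothetical license tier."""
    tiers = {
        "low": "ImpACT Low-Risk",
        "medium": "ImpACT Medium-Risk",
        "high": "ImpACT High-Risk",
    }
    return tiers[risk_level]

# Two artifacts of the same type can land in different tiers:
print(pick_license("low"))   # e.g., a dataset of public web text
print(pick_license("high"))  # e.g., a dataset containing health records
```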
Another novel feature of these licenses is that the open nature of the information facilitates oversight by the community. The licenses enable public reporting of violators and impose disclosure requirements around intended uses and project inputs. Such a crowdsourced approach can help ensure that community values are respected and help the concept scale.
Building in a liability exemption for the information provided in these good-faith public disclosures should help incentivize public reporting. This is important as a way of ensuring alignment with community values.
This raises other considerations. Because many of the criteria being examined come down to values and value judgments, it’s important to recognize that what works for one group or community may not work for another.
“I would love for somebody to take this license and modify it for their own use,” Dumas said. “To share and to use their own value judgments.”
This new approach to managing AI comes at a time when the world is growing increasingly concerned about its risks and challenges. Last month, the Biden administration met with many of the major players in AI, including Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI. The White House said it secured voluntary commitments from these companies to help ensure safe, secure, and transparent development, but details about how this is going to be achieved remained scarce.
AI2’s approach is still in its early stages, but it at least seems to be looking toward more concrete solutions to the problems we face with AI. By recognizing the importance of open approaches and of broadly representing human values through community oversight, it feels like a step in the right direction.
“For the first time in the history of AI, massive developments are occurring in the for-profit system but aren’t being shared with the research community,” said Dodge. “The research community will continue to make advances that will benefit the for-profit system, but not the other way around. Part of our motivation here is to be as open as possible to try to swing the pendulum in the other direction.”
The institute acknowledges that the initial idea for the licenses came from Alvaro Herrasti, a researcher on AI2’s PRIOR team. Herrasti first pitched the idea of a “software license for the common good” at AI2’s 2022 Hackathon event.
“As we looked at the question of licensing, we sort of flipped how we thought about it,” said Dumas. “I think the license-by-risk philosophy is something that can also be used outside OLMo, possibly even outside of AI.”
Further details about how the AI2 ImpACT licenses work can be found in a recent AI2 blog post and in the summary of the ImpACT license legal text.