California Legislature Passes Generative AI Training Data Transparency Bill


On August 27, 2024, the California legislature passed Assembly Bill 2013 (AB 2013), a measure aimed at enhancing transparency in AI training and development. If signed into law by Governor Gavin Newsom, developers of generative AI systems or services that are made available to Californians would be required to disclose significant information about the data used to train those systems or services. This, in turn, may raise novel compliance burdens for AI providers as well as unique challenges for customers in interpreting the information.

To What Does AB 2013 Apply?
The bill’s requirements apply only to “generative AI,” a term that the bill defines as “artificial intelligence that can generate derived synthetic content, such as text, images, video, and audio, that emulates the structure and characteristics of the artificial intelligence’s training data [emphasis added].”

Put differently, generative AI is a system that takes in data, learns from it, and then creates new content that mimics the patterns and characteristics of the data it was trained on. Generative AI systems or services impacted by AB 2013 could include, for instance, conversational product searches, shopping assistants, chatbot support features, and virtual try-on features.

“Artificial intelligence,” more broadly, means an “engineered or machine-based system that varies in its level of autonomy and that can, for explicit or implicit objectives, infer from the input it receives how to generate outputs that can influence physical or virtual environments.”

Because AB 2013 applies to generative AI and not artificial intelligence more broadly, it does not apply to simple symbolic (or “traditional”) AI, because those systems merely follow pre-programmed rules or instructions without learning from data or generating synthetic content.

To Whom Does AB 2013 Apply?
If you or your business builds or modifies generative AI tools that other people use, AB 2013 would likely apply to you. The legislation covers “developers,” which the bill defines as a “person, partnership, state or local government agency, or corporation that designs, codes, produces, or substantially modifies an artificial intelligence system or service for use by members of the public.” Because of this broad definition, any service provider engaged in retraining or fine-tuning an existing generative AI model could also be in scope.

Developers will be required to comply with the bill whether or not the generative AI tool is provided for a fee.

On the other hand, AB 2013 would not apply to generative AI technology (1) whose sole purpose is to ensure security and integrity; (2) whose sole purpose is the operation of aircraft in the national airspace; or (3) developed for national security, military or defense purposes that is made available only to a federal entity.

What Does AB 2013 Require Developers to Do?
AB 2013 requires developers to post on their websites a high-level summary of the datasets used to train the generative AI system or service. The information required to be published includes, but is not limited to:

  1. The sources or owners of the datasets.
  2. A description of how the datasets further the intended purpose of the artificial intelligence system or service.
  3. The number of data points included in the datasets, which may be in general ranges, and with estimated figures for dynamic datasets.
  4. A description of the types of data points within the datasets.
  5. Whether the datasets include any data protected by copyright, trademark or patent, or whether the datasets are entirely in the public domain.
  6. Whether the datasets were purchased or licensed by the developer.
  7. Whether the datasets include personal information, as defined in the California Consumer Privacy Act (CCPA).
  8. Whether the datasets include aggregate consumer information, as defined in the CCPA.
  9. Whether there was any cleaning, processing or other modification to the datasets by the developer, including the intended purpose of those efforts in relation to the artificial intelligence system or service.
  10. The time period during which the data in the datasets were collected, including a notice if the data collection is ongoing.
  11. The dates the datasets were first used during the development of the artificial intelligence system or service.
  12. Whether the generative AI system or service used or continuously uses synthetic data generation in its development.
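For developers planning compliance, the enumerated items above can be organized as a structured disclosure record. The sketch below is purely illustrative: the field names, values, and the split of item 5 into two fields are our own assumptions, since the bill prescribes the required content but not any particular format.

```python
# Hypothetical training-data disclosure record loosely tracking AB 2013's
# enumerated items. All field names and values are illustrative only.
dataset_disclosure = {
    "sources_or_owners": ["Example Web Corpus (hypothetical)"],          # item 1
    "purpose_description": "General language modeling for a chatbot.",   # item 2
    "data_point_count": "approximately 1-10 billion documents",          # item 3 (ranges permitted)
    "data_point_types": ["text"],                                        # item 4
    "includes_ip_protected_data": True,    # item 5: copyright/trademark/patent
    "entirely_public_domain": False,       # item 5 (continued)
    "purchased_or_licensed": True,                                       # item 6
    "includes_personal_information": False,        # item 7, per CCPA definition
    "includes_aggregate_consumer_info": False,     # item 8, per CCPA definition
    "cleaning_or_processing": "Deduplication and profanity filtering.",  # item 9
    "collection_period": {"start": "2020-01", "end": "2023-12",
                          "ongoing": False},                             # item 10
    "first_used_date": "2024-02",                                        # item 11
    "uses_synthetic_data_generation": False,                             # item 12
}

# A simple completeness check before publishing: no enumerated item
# should be left unanswered (None).
missing = [k for k, v in dataset_disclosure.items() if v is None]
assert not missing, f"Incomplete disclosure: {missing}"
```

A machine-readable record like this is one way to keep the public summary, internal data governance inventory, and any contractual reporting obligations in sync, though nothing in the bill requires it.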

When Does AB 2013 Take Effect?
The legislation would apply to generative AI systems or services made available to Californians on or after January 1, 2022. Developers would need to post the required information on or before January 1, 2026, and thereafter before such a system or service (or a substantial modification thereto) is made publicly available to Californians. Because the transparency obligations are backward looking, companies would need to disclose by January 1, 2026, what data was previously used to train any generative AI service already in operation.

Emerging Themes
A theme surrounding emerging AI regulations (and related AI governance processes) is ensuring that generative AI systems afford users sufficient transparency as to how content is generated. The policy principle is that transparency regarding the origins of training data enables users to identify and address issues such as bias and problematic outputs, and to assess the quality of the materials informing those outputs. Transparency in the training data purportedly also aims to allay privacy and intellectual property concerns and to allow consumers to better evaluate, and have confidence in, AI systems or services.

AB 2013 apparently seeks to achieve these objectives by enabling AI users to make more informed decisions about which AI systems to use and how to interpret their outputs. The legislation raises questions about the appropriate level of detail needed to meaningfully inform users without overwhelming them or compromising competitive advantages. Both developers and consumers of generative AI services will need to carefully evaluate what information should be specifically disclosed and how that information should be interpreted—especially where training data is complex or sourced from multiple origins.

Another potential issue is that AB 2013 does not explicitly permit developers to exclude information from their disclosures that may be protected as a trade secret (but also does not contain any provision overriding existing trade secret protections generally available under California law). This is significant, as a common pushback from vendors when negotiating reporting and transparency requirements relates to the proprietary nature of model weightings, logic, and other training data specifics.

Broader Regulatory Context
Following Colorado, which enacted the first comprehensive AI legislation in the U.S., and other states that passed AI laws in 2024, the California legislature passed 38 AI bills in its 2024 session.

Other AI-related bills that are still pending Gov. Newsom’s signature include:

  • SB 1047 (“Safe and Secure Innovation for Frontier Artificial Intelligence Act”), a sweeping and controversial bill aimed at preventing disastrous events caused by large AI models and requiring the addition of certain safety guardrails;
  • AB 2885, defining artificial intelligence as “an engineered or machine-based system that varies in its level of autonomy and that can, for explicit or implicit objectives, infer from the input it receives how to generate outputs that can influence physical or virtual environments,” setting the scope for California’s AI-related regulatory efforts;
  • SB 942 (“California AI Transparency Act”), which obligates large providers of generative AI to embed disclosures in AI-generated content and to include tools that enable users to detect AI-generated content.

Our article about the Colorado, Tennessee and Utah AI laws can be found here.

What Comes Next?
Gov. Newsom has until September 30 to sign or veto AB 2013. If the legislation becomes law, developers will face a host of new compliance obligations and, likely, the challenges that follow from them. AI customers, on the other hand, will need to devise contractual obligations for AI providers to ensure compliance, as well as practical strategies for consuming and interpreting the information provided pursuant to the law. These practical challenges are only likely to increase as a patchwork of state-level AI regulations emerges in the United States.
