TITLE 15.3. Copyrighted Materials Used for Artificial Intelligence Training
3115.
For the purposes of this title, the following definitions apply:
(a) “Artificial intelligence” or “AI” means an engineered or machine-based system that varies in its level of autonomy and that can, for explicit or implicit objectives, infer from the input it receives how to generate outputs that can influence physical or virtual environments.
(b) (1) “Covered material” means either of the following:
(A) A material registered, preregistered, or indexed with the United States Copyright Office pursuant to the federal Copyright Act of 1976, Public Law 94-553 (17 U.S.C. Sec. 101 et seq.).
(B) A sound recording fixed before February 15, 1972, enforceable under the Classics Protection and Access Act, Title II of Public Law 115-264 (17 U.S.C. Secs. 301(c) and 1401).
(2) “Covered material” does not mean a material that is in the public domain.
(c) “Rights owner” means the owner of a covered material.
(d) “Developer” means a business, person, partnership, corporation, or other entity that designs, codes, produces, or substantially modifies a GenAI model and that does either of the following:
(1) Uses the GenAI model commercially in California.
(2) Makes the GenAI model available to Californians for reasonably foreseeable commercial use.
(e) “Generative artificial intelligence” or “GenAI” means an artificial intelligence system that can generate derived synthetic content, including text, images, video, and audio, that emulates the structure and characteristics of the system’s training data.
3115.5.
(a) A developer shall make available on its internet website a mechanism allowing a rights owner to submit a request for information about the developer’s use of the rights owner’s covered materials. The mechanism made available pursuant to this subdivision shall allow a rights owner to provide the developer with all of the following:
(1) Documentation sufficient to establish the rights owner’s identity.
(2) The physical or electronic signature of the rights owner or a third party authorized to act on their behalf.
(3) Registration, preregistration, or index numbers and fingerprints for one or more of the rights owner’s covered materials.
(4) (A) Except as provided in subparagraph (B), any additional information specified by the developer that is reasonably necessary to comply with this chapter.
(B) A rights owner shall not be required to transmit a copy of a covered material in a form suitable for training, fine-tuning, or otherwise developing a GenAI model to a developer in order to receive information about the developer’s use of covered materials under this chapter.
(b) A developer shall document and retain any requests received from rights owners under this chapter for as long as the developer uses the GenAI model commercially in California or makes the GenAI model available to Californians for reasonably foreseeable commercial use, whichever is longer, plus five years.
3116.
(a) Within 30 days of receiving a request for information from a rights owner under this chapter, a developer shall do both of the following:
(1) Assess whether the developer used the rights owner’s covered materials to develop the GenAI model. The assessment shall be all of the following:
(A) Designed to identify exact or substantially similar copies of the covered materials in training datasets or other records maintained by the developer, including through the use of approximate content fingerprints or functionally equivalent technical measures where appropriate.
(B) Robust to minor variations in covered materials, including changes in file format, resolution, cropping, resizing, excerpting, or other modifications that do not change the expressive meaning or functional content of the materials.
(C) Appropriate to the format of the covered material, including text, images, audio, video, or other protected works.
(D) Applied to all training datasets and other records maintained by the developer that are reasonably likely to contain information related to the request.
(E) Conducted in good faith and in a manner reasonably calculated to produce accurate and complete results.
(2) Provide the rights owner with a list of covered materials identified pursuant to this subdivision.
(b) A developer’s collection, use, retention, and sharing of information from a rights owner pursuant to this section shall be reasonably necessary and proportionate to achieve the purposes for which the information was collected and processed, or for another disclosed purpose that is compatible with the context in which the information was collected, and not further processed in a manner that is incompatible with those purposes.
(c) Each day after the 30-day period described in subdivision (a) that a developer fails to provide a rights owner with the information required under this title constitutes a discrete violation.
(d) A developer shall not be required to respond to a request that is either of the following:
(1) Not accompanied by documentation sufficient to establish the rights owner’s identity.
(2) Made in violation of Section 3116.5.
3116.5.
(a) A rights owner, or any person acting on their behalf, shall not submit more than one request per calendar quarter to the same developer concerning the same GenAI model, unless the subsequent request includes material new information not available to the rights owner at the time of the prior request.
(b) A request submitted pursuant to this section may pertain to multiple covered materials.
3117.
A rights owner that has complied in good faith with Section 3116.5 and that is not provided with the information as required by this title may bring a civil action against the developer for any of the following:
(a) One thousand dollars ($1,000) per violation or actual damages, whichever is greater.
(b) Injunctive or declaratory relief.
(c) Reasonable attorney’s costs and fees.
(d) Any other relief the court deems appropriate.
3117.5.
This title shall not apply to a GenAI model that is any of the following:
(a) Trained exclusively using datasets the developer makes publicly available at no cost to users of the developer’s internet website.
(b) Trained exclusively using datasets a third party makes publicly available at no cost to users, as disclosed by the developer pursuant to Section 3111.
(c) Developed and used solely by universities or government entities exclusively for noncommercial academic or governmental research.
(d) Not trained using covered materials.
(e) Trained exclusively using covered materials for which the developer is the rights owner.
(f) Trained exclusively using covered materials the developer licensed for the disclosed purpose of training a GenAI model.
(g) Developed and used exclusively for the operation of aircraft in the national airspace.
3118.
Nothing in this chapter shall be construed to impose liability on the provider of a telecommunications service, information service, or cable service, as those terms are defined in Section 153 of Title 47 of the United States Code, for content provided by another person, to the extent the provider is not acting as the developer of a GenAI model.