Q. Does ChatGPT contain copies of the text it was trained on? Do AI image generators contain copies of the images they were trained on?

  Jan 07, 2025

No, these models don’t store the texts or images they were trained on as files they can look up. During training, a model adjusts a large set of numerical values (often called weights or parameters) so that they capture statistical patterns and relationships in the training data. When you give it a prompt, the model uses those learned patterns to generate new text or images rather than retrieving a saved original.

In rare instances, a generated text or image may closely resemble, or even reproduce, something from the training data. This is not a deliberate feature but an unintended side effect of how the model learns. Researchers are actively working on techniques to reduce such verbatim copying so that outputs reflect the learned patterns rather than specific training examples.
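For readers who want to see the idea in miniature, the sketch below trains a toy "bigram" model in Python. It is an illustration only and is far simpler than ChatGPT or any image generator: the three training sentences, the function names `train_bigram_model` and `generate`, and the `<start>` and `<end>` markers are all made up for this example. The model keeps only counts of which word follows which, then generates sentences by sampling from those counts.

```python
import random
from collections import defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows another word.

    Only the counts are kept; the sentences themselves are not stored.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        words = ["<start>"] + sentence.split() + ["<end>"]
        for prev_word, next_word in zip(words, words[1:]):
            counts[prev_word][next_word] += 1
    return counts


def generate(counts, rng, max_words=20):
    """Build a sentence one word at a time, sampling each next word
    in proportion to how often it followed the current word."""
    word = "<start>"
    output = []
    for _ in range(max_words):
        followers = counts[word]
        choices = list(followers.keys())
        weights = list(followers.values())
        word = rng.choices(choices, weights=weights, k=1)[0]
        if word == "<end>":
            break
        output.append(word)
    return " ".join(output)


# Hypothetical training data, invented for this illustration.
training_sentences = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

model = train_bigram_model(training_sentences)
rng = random.Random(0)  # fixed seed so repeated runs give the same samples
samples = [generate(model, rng) for _ in range(5)]
for sample in samples:
    print(sample)
```

Because each word is chosen from learned counts, the toy model can produce sentences that never appeared in its training data, such as a mix of two training sentences. With so little data it can also happen to reproduce a training sentence word for word, which is a small-scale analogue of the rare verbatim resemblance described above.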

Adapted from "FAQs about generative AI" by Nicole Hennig, University of Arizona Libraries. Licensed under CC BY 4.0.


