---
title: "AI That Creates Commercial Images Raises All Kinds of Thorny Legal Issues | TechCrunch"
ShowToc: true
date: "2022-12-04"
author: "Luis Beckel"
---
DALL-E 2 was “trained” on about 650 million image-text pairs scraped from the Internet, learning from that dataset the relationships between images and the words used to describe them. But while OpenAI filters the training images for specific content (e.g. pornography and duplicates) and has implemented additional filters at the API level, for example for prominent public figures, the company admits that the system can sometimes generate images that include trademarked logos or characters. “OpenAI will evaluate different approaches to handling potential copyright and trademark issues, which may include licensing such generations as part of ‘fair use’ or similar concepts, filtering certain types of content, and working directly with copyright [and] trademark owners on these issues,” the company wrote in an analysis published ahead of the DALL-E 2 beta on Wednesday.
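To give a sense of what “learning the relationships between images and the words used to describe them” means in practice, here is a minimal, hypothetical sketch of contrastive image-text training, the general recipe behind the CLIP component that DALL-E 2 builds on. The encoders, dimensions and data below are toy stand-ins for illustration only, not anything from OpenAI’s actual pipeline.

```python
# Toy, hypothetical sketch of contrastive image-text training (CLIP-style).
# Encoders, sizes and data are placeholders, not OpenAI's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyImageTextModel(nn.Module):
    def __init__(self, embed_dim=64, vocab_size=1000, image_feat_dim=512):
        super().__init__()
        # Stand-ins for real image/text encoders (e.g. a ViT and a Transformer).
        self.image_proj = nn.Linear(image_feat_dim, embed_dim)
        self.text_embed = nn.EmbeddingBag(vocab_size, embed_dim)
        self.logit_scale = nn.Parameter(torch.tensor(2.0))

    def forward(self, image_feats, text_tokens):
        # Project both modalities into a shared space and L2-normalize.
        img = F.normalize(self.image_proj(image_feats), dim=-1)
        txt = F.normalize(self.text_embed(text_tokens), dim=-1)
        return img, txt

def contrastive_loss(img, txt, logit_scale):
    # Each image should match its own caption, and no one else's (and vice versa).
    logits = logit_scale.exp() * img @ txt.t()
    targets = torch.arange(img.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

if __name__ == "__main__":
    model = ToyImageTextModel()
    # Fake batch of 8 "image features" and 8 tokenized "captions".
    image_feats = torch.randn(8, 512)
    text_tokens = torch.randint(0, 1000, (8, 16))
    img, txt = model(image_feats, text_tokens)
    loss = contrastive_loss(img, txt, model.logit_scale)
    loss.backward()
    print(f"toy contrastive loss: {loss.item():.3f}")
```

Each image in a batch is pulled toward its own caption and pushed away from everyone else’s; scaled up to hundreds of millions of pairs, that is how a model ends up associating brand and character names with the way those brands and characters look, trademarked or not.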
It’s not just a DALL-E 2 problem. As the AI community builds open-source implementations of DALL-E 2 and its predecessor, DALL-E, both free and paid services are launching on top of models trained on less carefully filtered data sets. One of them, Pixelz.ai, which launched an image-generating app this week powered by a custom DALL-E model, makes it easy to create images showing various Pokémon and Disney characters from films like Guardians of the Galaxy and Frozen.

When contacted for comment, the Pixelz.ai team told TechCrunch that it filters the model’s training data for profanity, hate speech and “illegal activities,” and blocks users from requesting such images at generation time. The company also said it plans to add a reporting feature that will let users submit images that violate its terms of service to a group of human moderators. When it comes to intellectual property (IP), however, Pixelz.ai leaves users “in charge” of how they use or distribute the images they create, gray area or not. “We discourage copyright infringement in both our dataset and our platform’s terms of service,” the team told TechCrunch. “That said, we provide an open-text input, and people will always find creative ways to abuse a platform.”
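As a rough illustration of what blocking users from requesting such images at generation time can look like, here is a minimal, hypothetical prompt filter. The blocklist, function names and matching strategy are assumptions made for this sketch; neither Pixelz.ai nor OpenAI has published its actual moderation code.

```python
# Hypothetical sketch of a blocklist-based prompt filter of the kind Pixelz.ai
# describes. The terms, names and matching logic are illustrative assumptions,
# not any vendor's real implementation.
import re

# Toy blocklist; a production system would use far larger curated lists,
# multiple languages and, most likely, ML classifiers as well.
BLOCKED_TERMS = {"some_slur", "some_illegal_activity"}

def is_prompt_allowed(prompt: str) -> bool:
    """Return False if the prompt contains any blocked term (whole-word match)."""
    words = set(re.findall(r"[a-z0-9_']+", prompt.lower()))
    return words.isdisjoint(BLOCKED_TERMS)

def handle_generation_request(prompt: str) -> str:
    if not is_prompt_allowed(prompt):
        return "Request rejected: prompt violates the content policy."
    return f"Queued for generation: {prompt!r}"

if __name__ == "__main__":
    print(handle_generation_request("a watercolor fox in a forest"))
    print(handle_generation_request("some_illegal_activity how-to"))
```

Real moderation pipelines are considerably more involved (multilingual term lists, learned classifiers, post-generation image review), which is partly why the Pixelz.ai team concedes that people “will always find creative ways to abuse a platform.”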
[Image: Rocket Raccoon from Disney/Marvel’s Guardians of the Galaxy, created by Pixelz.ai’s system.]

Bradley J. Hulbert, a founding partner at the law firm MBHB and an expert in intellectual property law, believes image-generating systems are problematic from a copyright perspective in several ways. He noted that artwork “demonstrably derived” from a “protected work” (that is, a copyrighted character) has generally been found by courts to be infringing, even if additional elements are added. (Think of an image of a Disney princess walking through a New York neighborhood.) To be protected from copyright claims, the work must be “transformative”: changed to such an extent that the IP is no longer recognizable.

“If a Disney princess is recognizable in an image created by DALL-E 2, we can safely assume that Walt Disney Co. will likely claim that the DALL-E 2 image is a derivative work and infringes its copyright on the Disney princess likeness,” Hulbert told TechCrunch via email. “A substantial transformation is also a factor considered in determining whether a copy constitutes ‘fair use.’ But, again, to the extent that a Disney princess is recognizable in a later work, assume that Disney will claim that the later work is copyright infringement.”

Of course, the battle between IP owners and alleged infringers is nothing new, and the Internet has merely acted as an accelerator. In 2020, Warner Bros. Entertainment, which owns the rights to film depictions of the Harry Potter universe, removed some fan art from social media platforms including Instagram and Etsy. A year earlier, Disney and Lucasfilm asked Giphy to remove “Baby Yoda” GIFs. But image-generating AI threatens to greatly escalate the problem by lowering the barrier to entry. The plight of big corporations is unlikely to garner much sympathy (nor should it), and their attempts to enforce intellectual property rights often fail in the court of public opinion. On the other hand, AI-generated artwork that infringes on, say, an independent artist’s characters could threaten that artist’s livelihood.

The other thorny legal issue with systems like DALL-E 2 concerns the content of the training data sets themselves. Did companies like OpenAI violate copyright law by using copyrighted images and artwork to develop their systems? The question has already been raised around Copilot, the commercial code-generation tool jointly developed by OpenAI and GitHub. But unlike Copilot, which was trained on code that GitHub may have the right to use for this purpose under its terms of service (according to one legal analysis), systems like DALL-E 2 were trained on images scraped from countless public websites.

Ladies and gentlemen, I got my invitation to DALL-E 2! 😁😁 Here are some shots of Homer Simpson in Stranger Things before I start tweeting the amazing stuff #dalle2 pic.twitter.com/PHPI6n9yJk — limb0wl 🦉👾 (@limb0wl) July 5, 2022

As Dave Gershgorn points out in a recent feature for The Verge, there is no direct U.S. legal precedent establishing that publicly available training data is fair use. One potentially relevant case involves a Lithuanian company called Planner 5D. In 2020, the company sued Meta (then Facebook) for allegedly stealing thousands of files from its Planner 5D software, which had been made available through a partnership with Princeton to contestants in the 2019 Scene Understanding and Modeling challenge for computer vision researchers. Planner 5D claimed that Princeton, Meta and Oculus, Meta’s VR-focused hardware and software division, could have profited commercially from the training data obtained from it. The case isn’t scheduled to go to trial until March 2023, but last April the U.S. district judge overseeing it rejected then-Facebook’s and Princeton’s motions to dismiss Planner 5D’s claims.

Not surprisingly, rights holders are not swayed by the fair-use argument. A spokesperson for Getty Images told IEEE Spectrum that there are “big questions” to be answered about “rights in images and the people, places and objects within the images that [models like DALL-E 2] were trained on.” Association of Illustrators CEO Rachel Hill, who was also quoted in the piece, raised the issue of compensation for the images used in training data.

Hulbert thinks it’s unlikely a judge would consider the copies of copyrighted works in training data sets to be fair use, at least in the case of commercial systems like DALL-E 2. And he doesn’t think it’s out of the question for IP owners to come after companies like OpenAI at some point and require them to license the images used to train their systems. “Copying … constitutes an infringement of the copyright of the original creators. And infringers are liable to copyright holders for damages,” he added. “[If] DALL-E (or DALL-E 2) and its collaborators are making a copy of a protected work, and the copy is not authorized by the copyright owner nor fair use, the copying constitutes copyright infringement.”

Interestingly, the UK is exploring legislation that would remove the current requirement that systems trained through text and data mining, such as DALL-E 2, be used strictly for non-commercial purposes. While copyright holders could still demand payment under the proposed regime by putting their works behind a paywall, the change would make the UK’s policy one of the most liberal in the world. The US seems unlikely to follow suit, given the lobbying power of IP owners there. But time will tell.