Some of the image-generating AI tools that have taken over the internet in recent months are powered in part by some of the worst images that have ever been posted to the internet, including images of the Islamic State executing people, photoshopped nudes of celebrities, and real nudes that were hacked from celebrities’ phones in the 2014 incident that came to be known as “The Fappening.”
AI text-to-image tools like DALL-E 2, Midjourney, and Stable Diffusion have all gone mainstream in recent months, allowing people to generate images in a matter of seconds. These tools (and other, less popular ones) rely on training data, which comes in the form of massive datasets of images scraped from the internet. These datasets are usually impossible for an average person to audit because they contain hundreds of millions and in some cases billions of images, with no easy way to sort through them.
Recently, however, a site called Have I Been Trained allowed people to search the LAION-5B open source dataset, which contains 5.8 billion images scraped from the internet. LAION-5B and LAION-400M (which contains 400 million images) are not used by DALL-E 2 or Midjourney, but are used in part by several other projects, including the text-to-image AI Stable Diffusion and Google’s similar tool Imagen, which is not yet public.
Have I Been Trained was created by artist and musician Holly Herndon to help artists know whether their work is included in the datasets these AI models were trained on, and, in theory, request that those images be removed. These datasets often include artwork made by artists who are not compensated or credited for their work in any way; artists are worried that some of their work will be replaced by these AI tools, which are ultimately fueled by photos and artworks unwittingly contributed by a large swath of humanity.
The Have I Been Trained site allows you to search through the LAION-5B dataset using a method called clip-retrieval. Though LAION has its own search function, a spokesperson for Spawning, the group that created Have I Been Trained, told Motherboard that the site opted for simplicity, with the purpose of focusing on helping people “opt out of, or opt into, the dataset for future model training.”
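Clip-retrieval works by embedding the search text and the dataset's images into a shared vector space and returning the images closest to the query by cosine similarity. The following is a minimal sketch of that lookup, with made-up three-dimensional embeddings and hypothetical URLs standing in for real CLIP vectors and the LAION index:

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two vectors divided by the product of their norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_embedding, index, top_k=2):
    # `index` maps an image URL to its (hypothetical) embedding. Real
    # systems use approximate nearest-neighbor indexes rather than a
    # linear scan, but the ranking principle is the same.
    scored = [(cosine_similarity(query_embedding, emb), url)
              for url, emb in index.items()]
    scored.sort(reverse=True)
    return [url for _, url in scored[:top_k]]

# Toy index: like LAION, it stores links to images hosted elsewhere.
index = {
    "https://example.com/cat.jpg": [0.9, 0.1, 0.0],
    "https://example.com/dog.jpg": [0.1, 0.9, 0.0],
    "https://example.com/car.jpg": [0.0, 0.1, 0.9],
}

query = [0.8, 0.2, 0.0]  # pretend this is the embedding of the text "a cat"
print(search(query, index))  # cat.jpg ranks first
```

This is also why searching for a phrase like “ISIS execution” surfaces matching images so quickly: the query only needs to be embedded once, and the nearest links in the index come back regardless of how disturbing their content is.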
To check whether their art is included in the LAION dataset, all an artist needs to do is type in their name or the title of an image they made and see if it turns up any of their work. Motherboard used the same search function with terms like “ISIS execution” and celebrity names and immediately found photos of extreme violence and non-consensual pornography included in the same datasets.
On the LAION Discord, users have noted the amount of NSFW and violent content included in the database and have discussed adding a "violence detector" to it. Users there have expressed concerns that the dataset could include child sex abuse imagery. The project has at least attempted to remove such "illegal" content; as a Discord user who works on the project put it, "content that was marked as being likely to be both about children and unsafe was completely filtered out at collection time."
A user asked "is it possible CP [child porn] is still in the dataset?"
"It is possible," a person who works on the project responded. "As for now, nobody reported any such sample and I saw none while trying to find unsafe samples in LAION."
The existence of Have I Been Trained lays bare the many problems with the large datasets that power various AI projects: copyright issues, privacy issues, bias issues, and the scraping of illegal, stolen, and non-consensual content. The images in the dataset also raise questions about what this means for people whose likenesses are not only being used without their knowledge, but also transformed into AI-generated images.
Google, which used the LAION-400M dataset to train its Imagen image-generating AI, told Motherboard that it has several systems in place to minimize—but not eliminate—the risk of using violent or abusive images.
"We’ve designed our systems to automatically detect and filter out words or phrases that violate our policies, which prohibit users from knowingly generating content that is sexually explicit; hateful or offensive; violent, dangerous, or illegal; or divulges personal information,” a Google spokesperson told Motherboard in an email. “Additionally, Imagen relies on two types of filters--safety filters for prompts entered by users, and state-of-the-art machine vision models to identify unsafe images after generation. We also eliminate risks of exposing personally identifiable information, by avoiding images with identifiable human faces.”
Stability AI, whose Stable Diffusion model is trained on the LAION-5B dataset (and, as Motherboard previously reported, is already being used to generate porn), similarly told Motherboard that it took several steps to reduce potential harms included in the dataset.
“Stable Diffusion does not regurgitate images but instead crafts wholly new images of its own just like a person learns to paint their own images after studying for many years,” a Stability AI spokesperson told Motherboard. “The Stable Diffusion model, created by the University of Heidelberg, was not trained on the entire LAION dataset. It was created from a subset of images from the LAION database that were pre-classified by our systems to filter out extreme content from training. In addition to removing extreme content from the training set, the team trained the model on a synthetic dataset to further reduce potential harms.”
When presented with specific images of ISIS executions and non-consensual pornography, Stability AI said it could not say whether Stable Diffusion was trained on them, and distanced itself from the LAION dataset.
“The [Heidelberg] Lab undertook multiple steps in training the models and we did not have visibility or input on those steps—so we are not in a position to confirm (or deny) which images made it through each filter. Those decisions were made by the Lab, not Stability AI,” the spokesperson said. “From [the] LAION dataset (which is also not Stability AI), we understand that the dataset pointers are labeled for unsafe nudity and violence and we believe these were part of the filtering process.”
Christoph Schuhmann, the organizational lead for the group developing the LAION dataset, said that the group takes safety and privacy very seriously. While the dataset is available to the public, he said, it is specifically designed for the “research community” rather than the general public, and it merely points to images that are publicly available elsewhere on the internet.
“First of all, it is important to know that we as LAION, are not providing any images, but just links to images already available in the public internet on publicly available websites,” Schuhmann told Motherboard in an email. “In general, our data set is targeted at users from the research community and should always, as stated in our disclaimer, be used responsibly and very thoughtfully.”
Schuhmann said that to avoid any kind of unwanted content, LAION implemented filters to not only tag potential NSFW content, but also instantly remove images that could contain it.
“However, since we are providing data for scientific use, it is important to know that NSFW-images are very valuable for developing better detectors for such undesirable contents and can contribute to AI safety research,” he said. “In our opinion, any image generation model must ALWAYS ONLY be trained on filtered subsets of our LAION datasets.”
Schuhmann said that LAION can’t say that people who used its dataset actually used the filters it has, “but we can say with certainty that we have always been safety oriented & transparent about the nature of our data sets.”
LAION-5B was released in March, expanding upon the original LAION-400M. According to a press release about LAION-5B, the new dataset was “collected using a three-stage pipeline.” First, machines gathered data from Common Crawl, an open repository of web crawl data composed of over 50 billion web pages, to collect all HTML image tags that had alt-text attributes. Then, language detection was performed on the alt-text. Raw images were downloaded from the URLs and sent to a CLIP model, an AI model that connects text and images to calculate the similarity between the image and text. Duplicate images and pairs with low similarity were discarded. If the text was fewer than five characters or the image resolution was too large, the image was also removed.
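The final stage of that pipeline amounts to keeping only image-text pairs that clear a similarity threshold and basic length and size checks. A simplified sketch of that logic follows; the threshold value, size cap, and field layout here are illustrative assumptions, not LAION's actual parameters:

```python
def keep_pair(alt_text, similarity, width, height,
              min_similarity=0.3, min_text_len=5, max_side=4096):
    # Discard pairs whose image and alt-text don't match well enough
    # (similarity is the CLIP score between the text and the image).
    if similarity < min_similarity:
        return False
    # Discard pairs whose alt-text is fewer than five characters.
    if len(alt_text) < min_text_len:
        return False
    # Discard images whose resolution is too large.
    if width > max_side or height > max_side:
        return False
    return True

# Hypothetical pair records: (alt_text, clip_similarity, width, height)
pairs = [
    ("a photo of a mountain lake", 0.41, 1024, 768),  # kept
    ("img", 0.55, 800, 600),            # dropped: alt-text too short
    ("unrelated caption", 0.12, 640, 480),  # dropped: low similarity
    ("billboard scan", 0.38, 20000, 9000),  # dropped: image too large
]
kept = [p for p in pairs if keep_pair(*p)]
print(len(kept))  # 1
```

Note what this kind of filtering does not do: it checks how well a caption matches an image, not whether the image itself depicts violence or non-consensual content, which is why separate NSFW taggers and the filters described above are needed at all.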
The LAION team is aware of the copyright issues it may face. LAION claims that all of its data falls under the Creative Commons CC-BY 4.0 license, which allows people to share and adapt material as long as they attribute the source, provide a link to the license, and indicate if changes were made. Its FAQ says that LAION is simply indexing existing content, and that after computing similarity scores between pictures and texts, all photos are discarded. The FAQ also states that if a person's name appears in the alt-text data but the corresponding image does not depict them, it is not considered personal data. However, there is a takedown form on the LAION site that people can fill out, and the team will remove an image from all data repositories it owns if it violates data protection regulations. LAION is supportive of the efforts of the Spawning team, which has been sending it bulk lists of links to images to remove.
LAION’s site does not go into detail about the NSFW and violent images that appear in the dataset. It mentions that “links in the dataset can lead to images that are disturbing or discomforting depending on the filter or search method employed,” but does not address whether or not action would be taken to change this. The FAQ says, “We cannot act on data that are not under our control, for example, past releases that circulate via torrents.” This sentence could potentially apply to something such as Scarlett Johansson’s leaked nudes, which already exist across the internet, and absolves the dataset creators of responsibility.
Have I Been Trained aims to empower people to see whether there are images they want to be removed from the datasets that power AI, but Motherboard’s reporting shows that it’s more difficult to find anyone who will take responsibility for scraping these images from the internet in the first place. When reached for comment, Google and Stability AI referred us to the maintainers of the LAION dataset. LAION, in turn, pointed to Common Crawl. Everyone we contacted explained that they have methods to filter abusive content, but admitted that they are not perfect.
All of this points to an increasingly obvious problem: AI is progressing at an astonishing speed, but we still don’t have a good understanding of the datasets that power AI, and little accountability for whatever abusive images they contain.
Motherboard has written extensively about internet platforms’ inability to stop the spread of non-consensual pornography and how new forms of machine learning enable new kinds of abuse, especially against women and people in marginalized communities. The inclusion of image-text pairings that perpetuate exploitation and bias, and that are taken without explicit permission, forms a harmful foundation for AI art models.
Google’s Imagen team has acknowledged these shortcomings and cites them as the reason the program has not been released for public use: “While a subset of our training data was filtered to remove noise and undesirable content, such as pornographic imagery and toxic language, we also utilized LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes. Imagen relies on text encoders trained on uncurated web-scale data, and thus inherits the social biases and limitations of large language models. As such, there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place.”
Stable Diffusion’s creators also acknowledge the potential harm of its output, saying, “Despite how impressive being able to turn text into image is, beware to the fact that this model may output content that reinforces or exacerbates societal biases, as well as realistic faces, pornography and violence.”
Jason Koebler contributed reporting.