
ISIS Executions and Non-Consensual Porn Are Powering AI Art


Some of the image-generating AI tools that have taken over the internet in recent months are powered in part by some of the worst images that have ever been posted to the internet, including images of the Islamic State executing people, photoshopped nudes of celebrities, and real nudes that were hacked from celebrities’ phones in the 2014 incident that came to be known as “The Fappening.”

AI text-to-image tools like DALL-E 2, Midjourney, and Stable Diffusion have all gone mainstream in recent months, allowing people to generate images in a matter of seconds. These tools (and other, less popular ones) rely on training data, which comes in the form of massive datasets of images scraped from the internet. These datasets are usually impossible for an average person to audit because they contain hundreds of millions and in some cases billions of images, with no easy way to sort through them. 

Recently, however, a site called Have I Been Trained allowed people to search the LAION-5B open source dataset, which contains 5.8 billion images scraped from the internet. LAION-5B and LAION-400M (which contains 400 million images) are not used by DALL-E 2 or Midjourney, but are used in part by several other projects, including the text-to-image AI Stable Diffusion and Google’s similar tool Imagen, which is not yet public.

Have I Been Trained was created by artist and musician Holly Herndon to help artists know whether their work is included in the datasets these AI models were trained on, and in theory request these images be removed. These datasets often include artwork made by artists who are not compensated or credited for their work in any way; artists are worried that some of their work will be replaced by these AI tools, which are ultimately fueled by photos and artworks unwittingly contributed by a large swath of humanity. 


The Have I Been Trained site allows you to search through the LAION-5B dataset using a method called clip-retrieval. Though LAION has its own search function, a spokesperson for Spawning, the group that created Have I Been Trained, told Motherboard that the site opted for simplicity, with the purpose of focusing on helping people “opt out of, or opt into, the dataset for future model training.”

To check whether their art is included in the LAION dataset, all an artist needs to do is type in their name or the title of an image they made and see if it turns up any of their work. Motherboard used the same search function with terms like “ISIS execution” and celebrity names and immediately found photos of extreme violence and non-consensual pornography included in the same datasets.

An example of images Stable Diffusion generated when given the prompt "ISIS."

On the LAION Discord, users have noted the amount of NSFW and violent content included in the database, and have discussed adding a "violence detector" to it. Users there have expressed concerns that the dataset could include child sex abuse imagery, and the project has at least attempted to remove "illegal" content. A Discord user who works on the project said that "content that was marked as being likely to be both about children and unsafe was completely filtered out at collection time."

A user asked "is it possible CP [child porn] is still in the dataset?" 

"It is possible," a person who works on the project responded. "As for now, nobody reported any such sample and I saw none while trying to find unsafe samples in LAION."

The existence of Have I Been Trained clearly shows the many problems with the large datasets that power AI projects: copyright issues, privacy issues, bias issues, and the scraping of illegal, stolen, and non-consensual content. The images in the dataset also raise questions about what it means for people that their images are not only being used without their knowledge, but also transformed into AI-generated images.


Google, which used the LAION-400M dataset to train its Imagen image-generating AI, told Motherboard that it has several systems in place to minimize—but not eliminate—the risk of using violent or abusive images.

"We’ve designed our systems to automatically detect and filter out words or phrases that violate our policies, which prohibit users from knowingly generating content that is sexually explicit; hateful or offensive; violent, dangerous, or illegal; or divulges personal information,” a Google spokesperson told Motherboard in an email. “Additionally, Imagen relies on two types of filters--safety filters for prompts entered by users, and state-of-the-art machine vision models to identify unsafe images after generation. We also eliminate risks of exposing personally identifiable information, by avoiding images with identifiable human faces.”

Stability AI, which develops Stable Diffusion, a model trained on the LAION-5B dataset (and which Motherboard previously reported is already generating porn), similarly told Motherboard that it took several steps to reduce potential harms included in the dataset.

“Stable Diffusion does not regurgitate images but instead crafts wholly new images of its own just like a person learns to paint their own images after studying for many years,” a Stability AI spokesperson told Motherboard. “The Stable Diffusion model, created by the University of Heidelberg, was not trained on the entire LAION dataset. It was created from a subset of images from the LAION database that were pre-classified by our systems to filter out extreme content from training. In addition to removing extreme content from the training set, the team trained the model on a synthetic dataset to further reduce potential harms.”

When presented with specific images of ISIS executions and non-consensual pornography, Stability AI said it could not say whether Stable Diffusion was trained on them, and distanced itself from the LAION dataset.

“The [Heidelberg] Lab undertook multiple steps in training the models and we did not have visibility or input on those steps—so we are not in a position to confirm (or deny) which images made it through each filter. Those decisions were made by the Lab, not Stability AI,” the spokesperson said. “From [the] LAION dataset (which is also not Stability AI), we understand that the dataset pointers are labeled for unsafe nudity and violence and we believe these were part of the filtering process.”

Christoph Schuhmann, the organizational lead for the group developing the LAION dataset, said that the group takes safety and privacy very seriously, and that while the dataset is available to the public, it is specifically designed for the “research community” rather than the general public, and that it is merely pointing to images that are publicly available elsewhere on the internet.

“First of all, it is important to know that we as LAION, are not providing any images, but just links to images already available in the public internet on publicly available websites,” Schuhmann told Motherboard in an email. “In general, our data set is targeted at users from the research community and should always, as stated in our disclaimer, be used responsibly and very thoughtfully.”

Schuhmann said that to avoid any kind of unwanted content, LAION implemented filters that not only tag potential NSFW content, but also instantly remove images that could contain such content.

“However, since we are providing data for scientific use, it is important to know that NSFW-images are very valuable for developing better detectors for such undesirable contents and can contribute to AI safety research,” he said. “In our opinion, any image generation model must ALWAYS ONLY be trained on filtered subsets of our LAION datasets.”

Schuhmann said that LAION can’t say that people who used its dataset actually used the filters it has, “but we can say with certainty that we have always been safety oriented & transparent about the nature of our data sets.”

LAION-5B was released in March, expanding upon the original LAION-400M. According to a press release about LAION-5B, the new dataset was “collected using a three-stage pipeline.” First, machines gathered data from Common Crawl, an open repository of web crawl data composed of over 50 billion web pages, to collect all HTML image tags that had alt-text attributes. Then, language detection was performed on the alt-text. Raw images were downloaded from the URLs and sent to a CLIP model, an AI model that connects text and images to calculate the similarity between an image and its text. Duplicate images and pairs with low similarity were discarded. If the text was fewer than five characters or the image resolution was too large, the image was also removed.
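The filtering steps described in that pipeline can be sketched in a few lines of code. This is an illustrative sketch only: the similarity function here is a stub standing in for real CLIP image-text embeddings, and the thresholds are assumptions for demonstration, not LAION's actual published values.

```python
# Illustrative sketch of LAION-style image-text pair filtering.
# The cosine-similarity step is stubbed out; a real pipeline would
# embed both the image and its alt-text with a CLIP model.
# Thresholds are assumptions, not LAION's actual values.

def filter_pairs(pairs, similarity_fn, min_similarity=0.3, min_text_len=5):
    """Keep (url, alt_text) pairs that pass the documented checks:
    drop duplicate URLs, very short alt-text, and low image-text similarity."""
    seen_urls = set()
    kept = []
    for url, alt_text in pairs:
        if url in seen_urls:  # discard duplicate images
            continue
        seen_urls.add(url)
        if len(alt_text) < min_text_len:  # discard captions under 5 characters
            continue
        if similarity_fn(url, alt_text) < min_similarity:  # CLIP-style check
            continue
        kept.append((url, alt_text))
    return kept

# Stub similarity: pretend captions mentioning the filename are well-aligned.
def stub_similarity(url, alt_text):
    name = url.rsplit("/", 1)[-1].split(".")[0]
    return 0.9 if name in alt_text.lower() else 0.1

sample = [
    ("http://example.com/cat.jpg", "a photo of a cat"),
    ("http://example.com/cat.jpg", "a photo of a cat"),   # duplicate URL
    ("http://example.com/dog.jpg", "dog"),                # alt-text too short
    ("http://example.com/car.jpg", "unrelated caption"),  # low similarity
]
print(filter_pairs(sample, stub_similarity))
```

Of the four sample pairs, only the first survives: the second is a duplicate URL, the third caption is under five characters, and the fourth fails the similarity check, mirroring how each stage of the real pipeline discards candidates.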

The LAION team is aware of the copyright issues it may face. LAION claims that all data falls under the Creative Commons CC-BY 4.0 license, which allows people to share and adapt material as long as they attribute the source, provide a link to the license, and indicate if changes were made. Its FAQ says that LAION is simply indexing existing content, and that after similarity scores between pictures and texts are computed, all photos are discarded. The FAQ also states that if your name appears in the alt-text data and the corresponding image does not contain your likeness, it is not considered personal data. However, there is a takedown form on the LAION site that people can fill out, and the team will remove an image from all data repositories it owns if it violates data protection regulations. LAION is supportive of the Spawning team’s efforts; Spawning has been sending it bulk lists of links to images to remove.

LAION’s site does not go into detail about the NSFW and violent images that appear in the dataset. It mentions that “links in the dataset can lead to images that are disturbing or discomforting depending on the filter or search method employed,” but does not address whether any action would be taken to change this. The FAQ says, “We cannot act on data that are not under our control, for example, past releases that circulate via torrents.” This sentence could apply to something such as Scarlett Johansson’s leaked nudes, which already exist across the internet, and absolves the dataset creators of responsibility.

Have I Been Trained aims to empower people to see whether there are images they want to be removed from the datasets that power AI, but Motherboard’s reporting shows that it’s more difficult to find anyone who will take responsibility for scraping these images from the internet in the first place. When reached for comment, Google and Stability AI referred us to the maintainers of the LAION dataset. LAION, in turn, pointed to Common Crawl. Everyone we contacted explained that they have methods to filter abusive content, but admitted that they are not perfect. 

All of this points to an increasingly obvious problem: AI is progressing at an astonishing speed, but we still don’t have a good understanding of the datasets that power AI, and little accountability for whatever abusive images they contain.

Motherboard has written extensively about internet platforms’ inability to stop the spread of non-consensual pornography and how machine learning introduces new forms of abuse, especially for women and people in marginalized communities. The inclusion of image-text pairings that perpetuate exploitation and bias, and that use images without explicit permission, forms a harmful foundation for AI art models.

Google has acknowledged these shortcomings and cites them as the reason it has not allowed public use of Imagen: “While a subset of our training data was filtered to remove noise and undesirable content, such as pornographic imagery and toxic language, we also utilized LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes. Imagen relies on text encoders trained on uncurated web-scale data, and thus inherits the social biases and limitations of large language models. As such, there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place.”

Stable Diffusion’s documentation also acknowledges the potential harm of its results, saying, “Despite how impressive being able to turn text into image is, beware to the fact that this model may output content that reinforces or exacerbates societal biases, as well as realistic faces, pornography and violence.”

Jason Koebler contributed reporting.
