Wikimedia explains combating AI in HAI seminar
In a Wednesday talk with the Stanford Institute for Human-Centered Artificial Intelligence (HAI) and Stanford Data Science (SDS), Wikimedia Enterprise principal product manager Chris Petrillo outlined strategies to combat the rise of bot traffic and emphasized the importance of human-driven contribution despite AI’s growing role.
During the talk, “Wikimedia: Wikipedia in the Age of AI and Bots,” attendees got a glimpse of Wikipedia’s back-end strategies and policies for combating the automation and data laundering that AI has brought.
Wikipedia’s universality and widespread use as one of the 10 most-visited websites worldwide motivated HAI to feature it at the seminar.
“Understanding that Wikipedia in particular is something that all of us use, and in the age of AI, it’s a source of information that is being used by companies making chatbots and it also still remains as one of the few remaining places on the internet that all of us go to,” said Patrick Hynes, a senior manager of research communities at Stanford HAI.
Petrillo said that page views have decreased with the advent of generative AI. Studies show that many internet users prefer reading the Gemini summary that now appears after a quick Google search to manually clicking a link. Conversely, bot traffic to websites has increased as bots scrape data for display on their operators’ own sites. Petrillo said there are three million references to Wikipedia content on Google Scholar, and emphasized the impact of Wikipedia on content everywhere from LLMs to common products to TikTok.
“We’ve seen pretty major spikes associated with specific news events,” Petrillo said. “People will rush to Wikipedia and expect to see the content, but bots will do the same thing, where they’ll try to take all this content.”
Although generative AI became widespread in 2023, Wikipedia has employed bots since the early 2000s. For instance, Rambot debuted in 2002 as one of the first bot editors on Wikipedia, and eight years later ClueBot NG, a bot using an artificial neural network and Bayesian statistics, made its appearance. With the rise of generative AI in 2023, however, Wikipedia laid out an AI policy to address an influx of draft edits generated by ChatGPT.
“AI has always existed on Wikipedia since its inception,” Petrillo said. “We’ve also developed more robust policies around general access, instituting things like high-level rate limits [restrictions that limit the number of user requests] that are extremely permissive but allow us to better protect and safeguard our infrastructure over time.”
Currently, Wikipedia is adopting several strategies to accommodate these shifts. To uphold its mission of disseminating free knowledge for all, Wikipedia is looking for other tech and commercial partnerships to advance this vision. Petrillo’s team is also using commercial APIs to manage traffic and commercial access on pages.
Users’ preference for AI summaries over webpages brings to light another issue: inherent bias in AI and algorithms. To Petrillo, a solution could be reviving Wikipedia as a source for human-driven contributions through open-forum discussion among editors on Wikipedia pages.
“Paradoxically, some of the most reliable and accurate pages on Wikipedia come from controversial topics because there’s more attention given to those topics, and there’s more editors that are working overtime to figure out how to represent different viewpoints on those pages,” Petrillo said.
According to Petrillo, human-driven contribution thrives through the editorial process, in which one begins as a reader and progresses to providing input on others’ work as an editor. However, he is noticing a shift in this pathway, as many editors now aggregate information from AI or external sources and then make edits on pages.
“At every single stage of large language model development and commercialization, Wikimedia content has been found,” Petrillo said.
Beyond behind-the-scenes concerns at Wikipedia, Petrillo stressed the need to acknowledge Wikipedia’s readers as well. Petrillo suggested including short-form video content on Wikipedia to improve site engagement.
“And then the other question is, what about other audiences?” Petrillo said. “How do we reach new audiences and continue our mission of free knowledge in a meaningful way?”
Wikipedia’s diverse online audience was reflected in the diversity of backgrounds represented in the seminar, which drew faculty, students and community members.
“I’m here on a business trip, and I want to run an AI [startup/initiative], and so I found this event and attended this,” event attendee Karen Yamamoto said.
While some audience members favored the use of AI and bots, Petrillo ultimately promoted a human-centric view of Wikipedia.
“As more people participate, the quality of the encyclopedia improves. Consensus is reached more readily, or debates happen more readily to build better content,” Petrillo said.
The post Wikimedia explains combating AI in HAI seminar appeared first on The Stanford Daily.