Study finds AI trains itself on Canadian journalism but doesn't always give credit
A pair of studies from the Quebec-based Centre for Media, Technology and Democracy has found that AI companies not only use Canadian journalism to train their chatbots, but that those bots rarely reveal their sources, with worrisome consequences for the future of the craft.
In the first study, researchers at McGill University tested four major AI models on 2,267 real Canadian news stories in both English and French — 18,134 queries in all — to measure what models have absorbed from their training data and whether they attribute it.
In the second, they enabled web searching and asked the same models about 140 specific recent articles from seven Canadian outlets, measuring whether AI models produce viable substitutes for current journalism, and whether they credit their sources.
“When asked about Canadian news events drawn from their training data, ChatGPT, Gemini, Claude and Grok provide no source attribution 82 per cent of the time,” the researchers wrote in a policy brief titled AI News Audit: AI, Canadian Journalism, and Paths for Policy Action.
When given web access and asked about specific recent articles, the same models covered enough of the original reporting to substitute for the source in between 54 per cent and 81 per cent of cases, the study found.
“Models linked to Canadian news sites in 29 per cent to 69 per cent of responses,” researchers wrote, “but named the originating outlet in the response text in only one per cent to 16 per cent of cases.”
They added: “When we named the outlet and asked the same models for citations, attribution rates reached 74 to 97 per cent.”
They also noted that AI models are becoming an important channel through which the public encounters news, but that the AIs rarely send readers back to the source.
“Links provide a pathway back to the source,” they wrote. “A consumer who clicks through can reach the newsroom. But a consumer who simply reads the response will not know which newsroom produced the reporting.”
Researchers found the outlets that receive the most AI visibility are a handful of large, free, nationally prominent organizations including CBC, CTV and the Globe and Mail.
“Paywalled and regional outlets, including many that do substantial original reporting, fall well below proportional representation,” they added. The Toronto Star, for instance, received 11 named-as-source mentions across over 18,000 responses. The Montreal Gazette received one.
French-language journalism faced a compounded version of this problem.
“French stories were absorbed into training data at rates comparable to English ones, but French outlets appeared in citations only 10 per cent of the time,” researchers wrote.
“Radio-Canada and La Presse dominate the small number of French citations that do appear. The Journal de Montreal, one of Quebec’s most widely read newsrooms, is nearly invisible to AI systems. Their content is ingested. Their contribution is erased.”
The report argues the implications for the future of journalism are dire.
“AI companies have built commercial products that depend, in significant part, on the reporting that Canadian journalists produce,” the report says. “They have done so without compensation, without attribution, and without any obligation to sustain the infrastructure they are drawing from. The result is a system that accelerates the economic decline of the journalism it relies on.”
The report makes several recommendations. It notes that Bill C-18, the Online News Act, “established the principle that technology companies profiting from the work of Canadian journalists should enter into a fair process to determine the value of this exchange.”
It says those principles should also apply to AI companies. “But the Act’s definitional architecture, built around entities that index and display news content, does not capture companies that absorb and synthesize it. The question is whether and how C-18’s scope should be extended to a fundamentally different form of intermediation. That is not a simple amendment.”
It also points to the Copyright Act as “the other potential lever,” but notes: “The Act’s fair dealing doctrine is limited to enumerated purposes, and whether large-scale commercial AI training constitutes ‘research’ has never been tested in Canadian courts.”
Under the heading of “next steps,” the paper recommends reforming the Online News and Copyright Acts to deal with AI systems; enacting statutory licensing and international frameworks; and creating attribution standards for AI answers to news questions.
“The 74 to 97 per cent attribution rate we observed when users named the outlet and asked for citations suggests that the technical capacity for meaningful source identification already exists,” the report says.
It concludes: “In the absence of deliberate policy choices, the terms of AI companies’ relationship to Canadian journalism are being set by corporate design decisions made outside Canadian jurisdiction. The evidence we present here makes the scale of that relationship visible. What democratic institutions do with that evidence is a political choice, not a technical one.”
The report hasn’t gone unnoticed in political circles. Federal Culture Minister Marc Miller said Tuesday that he had seen the report, and that the government must have a serious conversation about the use of news by AI.
“This is about people paying their fair share,” he said. “Having the news cannibalized and regurgitated undermines the spirit of the use of that news in the first place and the purpose for which it’s used, and we have to have a serious conversation with the platforms that purport to use it, including AI shops.”
The paper was written by Taylor Owen and Aengus Bridgman, founding director and associate director of research, respectively, at the Centre for Media, Technology and Democracy at McGill University.