Inside OpenAI’s New Experiment: Using Real Jobs to Measure AI vs Humans
OpenAI is taking an unusually direct route to see how well its AI can handle real jobs.
The AI juggernaut is asking third-party contractors to upload real work they completed in previous or current jobs as part of a new effort to test how well its AI systems perform on real-world tasks. The details were first reported by WIRED, which reviewed internal documents and presentations tied to the project.
According to the report, the initiative is being carried out in partnership with the training data firm Handshake AI. Contractors are asked to describe tasks they handled at work and upload the actual files they produced, such as documents, spreadsheets, presentations, images, or code repositories.
The goal is to create a clear “human baseline” that OpenAI can compare directly with the output of its latest AI models.
The project ties into OpenAI’s broader push to measure AI performance against skilled human workers across different industries. In September, the company launched a new evaluation process designed to compare AI systems with professionals doing economically valuable work — an effort OpenAI considers central to its long-term goal of building artificial general intelligence (AGI).
One confidential OpenAI document reviewed by WIRED explains the thinking behind the project.
“We’ve hired folks across occupations to help collect real-world tasks modeled off those you’ve done in your full-time jobs, so we can measure how well AI models perform on those tasks,” the document reads, as quoted by WIRED.
Contractors are encouraged to turn completed long or complex assignments into structured tasks that AI systems can attempt to replicate.
What OpenAI is asking for
An OpenAI presentation seen by WIRED says contractors should upload actual work files, not summaries. Examples listed include Word documents, PDFs, PowerPoint slides, Excel files, images, and software repositories.
Each submission has two parts:
- The task request, such as instructions from a manager or colleague
- The task deliverable, meaning the real work produced in response
The company stresses repeatedly that these should be tasks the contractor has “actually done” on the job. OpenAI also allows fabricated examples, but only if they realistically reflect how someone would respond in a real workplace scenario.
The ‘superstar’ scrubbing tool
OpenAI isn’t asking for trade secrets — at least not intentionally. The company has repeatedly emphasized that contractors must remove any confidential or personally identifiable information (PII) before clicking the upload button.
To help with this “data cleaning,” OpenAI has even pointed workers toward a specific ChatGPT tool. According to WIRED, a document mentions a tool called “Superstar Scrubbing” that provides guidance on deleting sensitive data before files are handed over.
Contractors are reminded to “remove or anonymize any: personal information, proprietary or confidential data, material nonpublic information (e.g., internal strategy, unreleased product details).”
A legal minefield?
Despite the scrubbing tool, legal experts are sounding the alarm. The strategy relies almost entirely on each individual contractor’s ability to spot what is and isn’t a corporate secret.
Evan Brown, an intellectual property lawyer with Neal & McDevitt, told WIRED that this approach is a massive gamble for the AI giant.
“The AI lab is putting a lot of trust in its contractors to decide what is and isn’t confidential,” Brown said. “If they do let something slip through, are the AI labs really taking the time to determine what is and isn’t a trade secret? It seems to me that the AI lab is putting itself at great risk.”
Beyond the risk to OpenAI, contractors themselves could be in hot water. Even if a document is “scrubbed,” sharing it could still violate non-disclosure agreements (NDAs) signed with previous employers who actually own the intellectual property.
A growing industry around high-quality AI data
The documents reviewed by WIRED highlight a wider trend across the AI industry. Companies such as OpenAI, Anthropic, and Google are increasingly relying on large networks of skilled contractors to generate high-quality training data that goes beyond what’s publicly available online.
As AI models aim to automate more complex office and enterprise work, demand for realistic, professional-grade data has grown. That shift has helped create a fast-growing training-data market, with firms like Handshake AI and Surge positioning themselves as key players.
OpenAI has also explored other ways to source real company data, including acquiring information from businesses that have shut down — an idea that was ultimately not pursued due to concerns about fully removing personal information, according to WIRED.
The post Inside OpenAI’s New Experiment: Using Real Jobs to Measure AI vs Humans appeared first on eWEEK.