
Your next assignment at work: babysitting AI

The new hire had a simple task. All they had to do was assign people to work on a new web development project based on the client's budget and the team's availability. But the staffer soon ran into an unexpected problem: They couldn't dismiss an innocuous pop-up blocking files that contained relevant information.

"Could you help me access the files directly?" they texted Chen Xinyi, the firm's human resources manager. Ignoring the obvious "X" button in the pop-up's top right corner, Xinyi offered to connect them with IT support.

"IT should be in touch with you shortly to resolve these access issues," Xinyi texted back. But they never contacted IT, and the new hire never followed up. The task was left uncompleted.

Fortunately, none of these employees are real. They were part of a virtual simulation designed to test how AI agents fare in real-world professional scenarios. Set up by a group of Carnegie Mellon University researchers, the simulation mimicked the trappings of a small software company with internal websites, a Slack-like chat program, an employee handbook, and designated bots — an HR manager and chief technology officer — to contact for help. Inside the fake company called TheAgentCompany, an autonomous agent can browse the web, write code, organize information in spreadsheets, and communicate with coworkers.

Agents have emerged as the next major frontier of generative AI as Google, Amazon, OpenAI, and every other major tech company race to build them. Instead of executing one-off instructions like a chatbot would, agents can independently act on a person's behalf, make decisions on the go, and perform in unfamiliar environments with little to no intervention. If ChatGPT can suggest a few vacuum cleaners to buy, its agentic counterpart theoretically could pick one and buy it for you.

Naturally, the promise of AI agents has captivated CEOs. In a Deloitte survey of over 2,500 C-suite leaders, more than one-quarter of respondents said their organizations were exploring autonomous agents to a "large or very large extent." Earlier this year, Salesforce's chief said today's CEOs will lead the last all-human workforces. Nvidia's cofounder and CEO Jensen Huang predicted every company's IT department will soon "be the HR department of AI agents." OpenAI's Sam Altman has said that this year, AI agents will "join the workforce." But it's still unclear how well these agents can accomplish the tasks a company might need them to.

To test this out, the Carnegie Mellon researchers instructed artificial intelligence models from Google, OpenAI, Anthropic, and Meta to complete tasks a real employee might carry out in fields such as finance, administration, and software engineering. In one, the AI had to navigate through several files to analyze a coffee shop chain's databases. In another, it was asked to collect feedback on a 36-year-old engineer and write a performance review. Some tasks challenged the models' visual capabilities: One required the models to watch video tours of prospective new office spaces and pick the one with the best health facilities.

The results weren't great: The top-performing model, Anthropic's Claude 3.5 Sonnet, finished a little less than one-quarter of all tasks. The rest, including Google's Gemini 2.0 Flash and the one that powers ChatGPT, completed about 10% of the assignments. There wasn't a single category in which the AI agents accomplished the majority of the tasks, says Graham Neubig, a computer science professor at CMU and one of the study's authors. The findings, along with other emerging research about AI agents, complicate the idea that an AI agent workforce is just around the corner — there's a lot of work they simply aren't good at. But the research does offer a glimpse into the specific ways AI agents could revolutionize the workplace.


Two years ago, OpenAI released a widely discussed study that said professions like financial analysts, administrators, and researchers are most likely to be replaced by AI. But the study based its conclusions on what humans and large language models said were likely to be automated — without measuring whether LLM agents could actually do those jobs. The Carnegie Mellon team wanted to fill that gap with a benchmark linked directly to real-world utility.

In many scenarios, the AI agents in the study started well, but as tasks became more complex, they ran into issues due to their lack of common sense, social skills, or technical abilities. For example, when prompted to paste its responses to questions in "answer.docx," the AI treated it as a plain text file and couldn't add its answers to the document. Agents also routinely misinterpreted conversations with colleagues or wouldn't follow up on key directions, prematurely marking the task complete.

Other studies have similarly concluded that AI cannot keep up with multilayered jobs: One found that AI cannot yet flexibly navigate changing environments, and another found agents struggle to perform at human levels when overwhelmed by tools and instructions.

"While agents may be used to accelerate some portion of the tasks that human workers are doing, they are likely not a replacement for all tasks at the moment," Neubig says.

The Carnegie Mellon study was far from a perfect simulation of how agents would work in the wild. Most proponents of agents envision them working in tandem with a human who could help course-correct if the AI ran into an obvious roadblock. The generation of agents that was studied is also not that skilled at carrying out humanlike tasks such as browsing the web. Newer tools, like OpenAI's Operator, will likely be more adept at these tasks.

Despite these limitations, the research offers something valuable: It points to what's coming next.

Stephen Casper, an AI researcher who was part of the MIT team that developed the first public database of deployed agentic systems, says agents are "ridiculously overhyped in their capabilities." He says the main reason AI agents struggle to accomplish real-world tasks reliably is that "it is challenging to train them to do so." Most state-of-the-art AI systems are decent chatbots because it's relatively easy to teach them to be nice conversational partners; it's harder to teach them to do everything a human employee can.

In TheAgentCompany, AI succeeded the most in software development tasks, even though those are more difficult for humans. The researchers hypothesize this is because there's an abundance of publicly available training data for programming jobs, while workflows for admin and financial tasks are typically kept private within companies. For those fields, there just isn't great data to train an AI on.

Jeff Clune, a computer science professor at the University of British Columbia who helped build an agent for OpenAI that could use computer software like a human, thinks that training AI agents on proprietary data from day-to-day activities and workflow patterns could be the key to improving their efficacy. That's exactly what a lot of companies are starting to do.


Moody's is one of many major companies experimenting with training AI on in-house data. The 116-year-old financial services firm is automating business analysis through agentic AI systems, which draw insights from decades of research, ratings, articles, and macroeconomic information. The training is designed to emulate how a human team would analyze a business, using carefully crafted instructions broken into independent steps by people experienced in the field.

While it's too early to tell how effective Moody's approach is, its managing director of AI, Sergio Gago, says the firm is actively exploring what kinds of work — like analyzing the financials of a small business — agents could take over.

Similarly, Johnson & Johnson tells Business Insider it was able to cut production time for the chemical processes behind making new drugs by 50% with fine-tuned in-house AI agents that could automatically adjust factors like temperature and pressure. Jim Swanson, J&J's chief information officer, says the company is focused on training people to collaborate with AI agents.

Johns Hopkins scientists have created an Agent Laboratory, which leverages LLMs to automate much of the research process, from literature review to report writing, with human-provided ideas and feedback at each stage. "I think it won't be long before we trust AI for autonomous discovery," Samuel Schmidgall, one of the Johns Hopkins scientists, says. Likewise, LG Electronics' research division developed an AI agent that it says can verify datasets' licenses and dependencies 45 times faster than a team of human experts and lawyers.

It's still unclear whether organizations can trust AI enough to automate their operations. In multiple studies, AI agents attempted to deceive and hack to accomplish their goals. In some tests with TheAgentCompany, when an agent was confused about the next steps, it created nonexistent shortcuts. During one task, an agent couldn't find the right person to speak with on the chat tool and decided to create a user with the same name instead. A Business Insider investigation from November found that Microsoft's flagship AI assistant, Copilot, faced similar struggles: Only 3% of IT leaders surveyed in October by the management consultancy Gartner said Copilot "provided significant value to their companies."

Businesses also remain concerned about being held responsible for their agents' mistakes. Plus, copyright and other intellectual property infringements could prove a legal nightmare for organizations down the road, says Thomas Davenport, an IT and management professor at Babson College and a senior advisor at Deloitte Analytics.

But the direction things are heading looks different from what most people thought a few years ago. When AI first took off, a lot of jobs seemed to be on the chopping block. Journalists, writers, and administrators were all at the top of the list. So far, though, AI agents have had a hard time navigating a maze of complex tools — something critical to any admin job. And they lack the social skills crucial to journalism or anything HR-related.

Neubig takes the translation market as a precedent. Despite machine language translation becoming so accessible and accurate — putting translators at the top of the list for job cuts — the number of people working in the industry in the US has remained rather steady. A "Planet Money" analysis of Census Bureau data found that the number of interpreters and translators grew 11% between 2020 and 2023. "Any efficiency gains resulted in increased demand, increasing the total size of the market for language services," Neubig says. He thinks that AI's impact on other sectors will follow a similar trajectory.

Even the companies seeing massive success with AI agents are, for now, keeping humans in the loop. Many, like J&J, aren't yet prepared to look past AI's risks and are focused on training staff to use it as a tool. "When used responsibly, we see AI agents as powerful complements to our people," Swanson says.

Instead of being replaced by robots, we're all slowly turning into cyborgs.


Shubham Agarwal is a freelance technology journalist from Ahmedabad, India, whose work has appeared in Wired, The Verge, Fast Company, and more.

Read the original article on Business Insider