Last spring, when Karim Lakhani began testing how ChatGPT affected the work of elite business consultants, he thought they’d be delighted by the tool. In a preliminary study of two dozen workers, the language bot had helped them finish two hours’ worth of tasks in 20 minutes.
“I assumed they, like me, would think, ‘Great! I can do so much more!’” said Dr. Lakhani, a professor at the Harvard Business School.
Instead, the consultants had feelings of unease. They appreciated that they had done better work in less time. But ChatGPT’s quick work threatened their sense of themselves as high-skilled workers, and some feared relying on it too much. “They were really worried and felt like this was going to denigrate them and be sort of empty calories for their brain,” Dr. Lakhani said.
After these preliminary tests, Dr. Lakhani and his colleagues devised a larger, controlled experiment to measure how ChatGPT would affect more than 750 white-collar workers. That study, which is under review at a scientific journal, indicated sharply mixed results in the consultants’ work product. ChatGPT greatly improved the speed and quality of work on a brainstorming task, but it led many consultants astray when doing more analytical work.
The study also detailed workers’ varied feelings about the tool. One participant compared it to the fire Prometheus stole from the gods to help mortals. Another told Dr. Lakhani’s colleague Fabrizio Dell’Acqua that ChatGPT felt like junk food — hard to resist, easy to consume but ultimately bad for the consumer.
In the near future, language bots like OpenAI’s ChatGPT, Meta’s Llama and Google’s Gemini are expected to take on many white-collar tasks, like copywriting, preparing legal briefs and drafting letters of recommendation. The study is one of the first to show how the technology might affect real office work and the workers who do it.
“It’s a well-designed study, particularly in a nascent area like this,” said Maryam Alavi, a professor at the Scheller College of Business at the Georgia Institute of Technology who was not involved in the experiments. Dr. Alavi, who has studied the impact of new digital technology on workers and organizations, also noted that the study “really points out how much more we need to learn.”
The study recruited management consultants from Boston Consulting Group, one of the world’s largest management-consulting firms. The company had barred its consultants from using A.I. bots in their work.
“We wanted it to involve a large set of real workers working on real tasks,” said François Candelon, a managing director of the company who helped design the experiments.
The volunteers were split into two groups, each of which worked on a different management-consulting problem. Within each group, some consultants used ChatGPT after 30 minutes of training, some used it with no instructions and some did not use it.
One of the tasks was to brainstorm about a new type of shoe, sketch a business plan for making it and write about it persuasively. Some researchers had believed only humans could perform such creative tasks.
They were wrong. The consultants who used ChatGPT produced work that independent evaluators rated about 40 percent better on average. In fact, people who simply cut and pasted ChatGPT’s output were rated more highly than colleagues who blended its work with their own thoughts. And the A.I.-assisted consultants were more than 20 percent faster.
Studies this year of ChatGPT in legal analysis and white-collar writing chores have found that the bot helps lower-performing people more than it does the most skilled. Dr. Lakhani and his colleagues found the same effect in their study.
On a task that required reasoning based on evidence, however, ChatGPT was not helpful at all. In this group, volunteers were asked to advise a corporation that had been invented for the study. They needed to interpret data from spreadsheets and relate it to mock transcripts of interviews with executives.
Here, ChatGPT lulled employees into trusting it too much. Unaided humans had the correct answer 85 percent of the time. People who used ChatGPT without training scored just over 70 percent. Those who had been trained did even worse, getting the answer only 60 percent of the time.
In interviews conducted after the experiment, “people told us they neglected to check because it’s so polished, it looks so right,” said Hila Lifshitz-Assaf, a management professor at Warwick Business School in Britain.
Many consultants said that ChatGPT made them uneasy about how the tool would change their profession and even their sense of themselves. Nearly three out of four participants told the researchers that they worried ChatGPT use would cause their own creative muscles to atrophy, said Mr. Candelon of Boston Consulting Group.
“If you haven’t had an existential crisis about this tool, then you haven’t used it very much yet,” said another co-author, Ethan Mollick, a management professor at the Wharton School at the University of Pennsylvania.