Big announcements from OpenAI and Salesforce highlight the tech industry’s push to give bots more decision-making power
Big announcements from OpenAI and Salesforce on Thursday showed that, despite worries about the technology’s limits, the tech sector wants to give bots greater decision-making power.
Why it matters: Increasing the autonomy and reasoning power of genAI could boost productivity, but it also raises new risks.
OpenAI on Thursday unveiled o1 (formerly code-named Strawberry), a new model that pauses to weigh several possible responses before it begins answering a query.
OpenAI says the result is much better at handling intricate questions, particularly in math, physics, and coding.
Salesforce, meanwhile, unveiled Agentforce, an initiative to move beyond using genAI as a copilot that boosts human productivity and toward autonomous AI agents that are allowed to work independently, within certain constraints.
Zoom in: Early users say these more powerful AI systems are starting to produce results.
Thomson Reuters, whose CoCounsel legal division had early access to o1, says it has seen the new model perform better on tasks that require deeper analysis and strict adherence to guidelines and to the data in specific documents.
CoCounsel product head Jake Heller told Axios that the model’s meticulous attention to detail and thorough thinking enable it to handle tasks correctly “where we have seen every other model so far fail.”
Heller said that although answers take longer, “professionals want the most thorough, detailed, and accurate answer — and they would much rather wait for it than get something wrong and quick.”
Wiley, which has been using an early version of Agentforce, says the technology is letting it respond to more queries without involving people.
During a Salesforce event on Thursday, Kevin Quigley, a senior manager at Wiley, said, “When you compare the agent to our old chatbot, we’ve seen an over 40% increase in our case resolution.”
What they’re saying: Salesforce executives and other business leaders say that strict limits on the scope and authority of AI agents’ decision-making are essential to keeping the systems safe.
“You don’t want to just give AI unlimited agency,” Paula Goldman, chief ethical and humane use officer at Salesforce, told Axios. “It should be constructed using a set of boundaries, benchmarks, and tried-and-true procedures. If you don’t, you’re exposing your business to significant danger and won’t receive the kind of outcomes you want.”
While using AI agents for low-stakes tasks makes sense, EqualAI CEO Miriam Vogel warned: “We do not want to go into AI agents prematurely in areas where advice might affect someone’s benefits, safety, etc.” Doing so, Vogel said, “is inviting liability and potential harms.”
Dorit Zilbershot, vice president of platform and AI innovation at ServiceNow, which announced its own AI agent push earlier this week, told Axios: “We feel that’s going to be a revolution with AI agents having access to the enterprise data and having this intelligence with their reasoning and the planning capabilities.”
“But we know that with that power comes a lot of responsibility,” she said. One of ServiceNow’s main safeguards is requiring human approval for every action an AI agent proposes. Once businesses are satisfied that an agent is behaving appropriately, they can choose to let it operate autonomously, Zilbershot said.
Yes, but: There’s a risk that autonomous bots could simply end up battling one another.
Even if bias and hallucinations are managed to a reasonable level, “many proposed AI agent use cases don’t make sense because they will set up arms-race conditions that drive up costs for everyone, but only benefit the arms dealers,” Phil Libin, co-founder and former CEO of Evernote, told Axios.
“LLMs are incomplete, but they can be an important part of a system that has other, non-LLM ways of grounding them to reality and values,” Libin said.
Between the lines: Hugging Face CEO Clement Delangue argues that even OpenAI’s use of the word “thinking” to describe what happens before o1 responds is inaccurate.
Delangue said on X that “an AI system is not ‘thinking,’ it is ‘processing.'” “Giving the false impression that technology systems are human is just cheap snake oil and marketing to fool you into thinking it’s more clever than it is.”
The bottom line: Experts advise that the industry address AI’s bias and its tendency to fabricate facts before granting it more autonomy.