Mind and Iron: OpenAI finally releases a new GPT. Will it change everything?
Also, Google AI becomes our street buddy. And the future of dating is...AI concierges?
Hello and welcome back to Mind and Iron. I'm Steve Zeitchik, veteran of The Washington Post and Los Angeles Times and lead scientist at this newsy geodesic dome.
The future is coming at us hard, with tech companies devising new products and features that seem exotic in the way Instagram and Netflix once seemed exotic. We make sense of them and their impact, cutting through the hype to tell you what you need to know. Important work, but alas not free work. Please consider pledging a few dollars to our mission.
The big news in future-land this week: OpenAI is finally coming out with a new GPT (the model that powers ChatGPT and possibly many tech interactions hence). Salivatory anticipation has been the mood in the 14 months since the company’s last GPT drop. So what’s the reaction now? What exactly will it change?
Not to be outdone, Google — perpetual laggard in the AI race — emerged a day later showcasing its own vision of a machine-intelligence future, with a demo of an employee walking through Google’s London offices accompanied by an “AI Agent.” They’ve got big plans for us too.
Finally, AI dating concierges: could they be a thing? And what on earth are AI dating concierges? The visionary founder of Bumble thinks these digital advance men are coming. We’ll hear from the executive — at least we think it’s the executive.
First, the future-world quote of the week.
“her”
—OpenAI chief Sam Altman, perhaps overestimating what his new GPT is capable of
Let's get to the messy business of building the future.
IronSupplement
Everything you do — and don’t — need to know in future-world this week
OpenAI says they’ve brought the “Her” movie to life; Google as your new walking buddy; could we soon put the data in dating?
1. ON MONDAY OPENAI RELEASED A NEW GPT FOR THE FIRST TIME IN 14 MONTHS.
You can expend a lot of words analyzing what the model, called GPT-4o, can and can't do. Or you can capture it with three X posts.
On one side you have Gary Marcus, one of the foremost experts on AI out there. He believes this new version has some slick features but doesn't change the game in a meaningful way — it still makes a lot of the mistakes its predecessors did, it has the same underlying approach to solving problems its predecessors did, it fails to transform our lives in any way its predecessors didn't. Mostly its innovation lies in being “multimodal” — wonkspeak for the ability to recognize your voice and some objects and video.
On the other side we have a low-profile hypester who says GPT-4o augurs a Spike Jonze-y world of intelligent and autonomous human-like creatures. I know which side I’m with. Done and done.
But OK, we'll still spill a few words on how to think about this.
Some background first:
For the past year OpenAI has dangled — and fanboys have thirsted for — what has come to be called GPT-5. Why would this be significant given that there have already been four GPTs, you reasonably ask? Why would this matter when movie sequels tend to get bad after the third installment and tech iterations show only incremental improvement as the numbers tick higher? (The first few iPhones: fantastic. The iPhone 15: better charging.)
Because GPT-5 has promised a breakthrough, not just in how much the AI can handle but the very nature of what it handles.
All the GPTs until now (the latest of which also power ChatGPT) have basically repackaged what humans have already done. Elegant stuff. And impressive displays of processing power. But not society-shaking. These large language models, or LLMs, can write a crisp thank-you note because they have access to a million thank-you notes that have come before; they can generate a decent replica of a still-life painting because they've ingested every canvas Monet and his many imitators ever created.
But complex reasoning and analysis — the kind of deep problem-solving so innate to humans even your doofus brother-in-law does it — has eluded GPT and all rival LLM products. (LLMs are the relatively new AI approach that relies on absorbing massive amounts of existing text and images to synthesize something new at your request — it's an extremely sophisticated version of the text-prediction feature on your phone.)
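If that text-prediction analogy feels abstract, here's a toy sketch of the underlying idea: a bare-bones bigram model that "writes" by sampling whichever word tended to follow the current one in its training text. (A deliberately crude illustration of the prediction principle, not how OpenAI builds anything; the tiny corpus and the function names are invented for the example.)

```python
from collections import Counter, defaultdict
import random

# Toy bigram "language model": for each word, count what followed it
# in the training text, then generate by sampling from those counts.
# The phone-autocomplete idea in miniature -- not OpenAI's actual method.
corpus = "thank you so much for the lovely gift thank you for thinking of me".split()

counts = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    counts[prev_word][next_word] += 1

def predict_next(word):
    """Sample a next word in proportion to how often it followed `word`."""
    followers = counts[word]
    if not followers:  # dead end: this word never appeared mid-text
        return None
    words, weights = zip(*followers.items())
    return random.choices(words, weights=weights)[0]

# Generate "new" text -- really just a recombination of the old text.
word = "thank"
for _ in range(20):  # cap the output length
    if word is None:
        break
    print(word, end=" ")
    word = predict_next(word)
```

Scale that counting trick up to trillions of words and billions of parameters and you get fluent thank-you notes. But nothing in the mechanism itself is doing the multi-step reasoning Marcus is talking about.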
If a machine could do that — if it could reason like a human, assessing numerous variables and reaching meaningful conclusions — it would truly be transformative. Then you can combine that ability with all the raw calculation power to...solve climate change? become a true human-like friend? propose the first viable plan for Middle East peace? All of it?
That would be what scientists call AGI, or artificial general intelligence — a computer that can think like a human but move at the hyperspeed of a machine. (It could ultimately lead to something even more profound — an exponential multiplication of itself, or "superintelligence," in the thinker Nick Bostrom's famous coinage about a machine that surpasses all human capabilities. Superintelligence basically makes a machine into a god — it’s what Scarlett Johansson was describing at the end of "Her." At that point the machine is potentially capable of either destroying worlds or building them. But one sci-fi development at a time.)
Anyway for all the talk about GPT-5 and how it will put us on the threshold of AGI, there's a question whether large language models are inherently capable of this at all. Of whether this AI approach of simply ingesting massive amounts of data, no matter how souped up the training model, is indeed capable of giving us this AGI-type reasoning. Or whether, instead, a new approach is required in order to break through.
Marcus has been sounding the alarm for the latter. He’s not alone.
A few weeks ago we talked to one of the pioneers of AI, Ray Perrault, who literally helped lay the foundation for Siri. Perrault just oversaw Stanford’s annual AI Index, and he agreed fundamentally with Marcus — we're already near the limit of what LLMs can do. AGI isn't possible with a large language model.
As he persuasively put it:
“There seem to be in the mill maybe one more generation of models out of OpenAI and Google and maybe Meta that will cost an astronomical amount of money and will do somewhat better. But my guess is it still won't do hard math and planning problems. How do you get to the next stage? That’s the $64,000 technical question.” To get anywhere close to AGI, he said, we’ll have to go beyond LLMs to a new approach — or at least creatively connect them to one of the older ones.
Yet that hasn't stopped Altman from strongly hinting otherwise. He regularly drops references to AGI as if it's a given in the coming years with the LLMs his company specializes in. In fact, a few minutes after he livestreamed the announcement of GPT-4o he went with this.
That, as the kids say, is a vibe. Anyone who fires up ChatGPT and expects it to have anything close to the human-like mental nimbleness of Johansson's character is in for more of a letdown than those high-waisted pants.
(Also, as a stan of that movie I am compelled to note the title may in fact be referring to Joaquin Phoenix’s very human ex-wife, played by Rooney Mara. Freudian slip?)
On its site OpenAI made almost as big a boast as Altman did on X, proclaiming that GPT-4o "can reason across audio, video and text in real time."
This is a cheat. The statement implies that the first part — the "reasoning" — is the innovation. It's not. There's no evidence that this new GPT can reason any better than previous GPTs. What's new is the second clause — the audio and video. 4o can understand it when you speak or show it video, and it can increasingly respond in kind. That's an improvement, but to the usability, not the underlying mission. It would be like telling someone you can make their 2007 Chevy Blazer fly — and then trotting out a keyless start. Nice addition. But it won't change how they use the thing.
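To make "multimodal" concrete, here is roughly what the new input pipe looks like from a developer's chair: a minimal sketch using the official OpenAI Python SDK, assuming the openai package is installed and an API key is set in the environment, with a placeholder image URL standing in for a real photo. The novelty lives in what you can hand the model, not in what the model does with it.

```python
# Minimal sketch: sending GPT-4o an image alongside text via the
# official OpenAI Python SDK. Assumes `pip install openai` and an
# OPENAI_API_KEY in the environment; the URL below is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What object is in this photo?"},
            # Hypothetical placeholder -- swap in a real image URL.
            {"type": "image_url", "image_url": {"url": "https://example.com/mug.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

Same chat endpoint, same predicted-text answer coming back, just a richer way of posing the question. The keyless start, not the flying Blazer.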
As for the “real time” point, no large language model is ever up to date in "real time," for the simple reason that it takes (real) time to train a model. Another way it ain't ScarJo.
OpenAI also showed off some conversation-based translation skills, which is nifty but not limited to OpenAI and may still be a little stilted and error-prone.
Altman was more sober in a blog post, focusing on ease-of-use more than anything else. But what might be most striking in all this is, as Marcus notes, what wasn’t said — there was no mention of GPT-5, which suggests they're not getting closer to AGI and in fact may want us to forget about it for the time being.
I texted Perrault on Tuesday to see if he thought GPT-4o in fact — far from lighting a path to AGI — showed how distant we are from it. He basically agreed.
“The addition of voice is nice but doesn't solve any fundamental issues,” he wrote succinctly.
All of this, in other words, just makes OpenAI's GPTs easier to use, but doesn't change what we use them for.
The reality may have been summed up best by Wharton professor and AI author Ethan Mollick.
Good for the company, not so transformative for society.
But take note: this doesn't mean AI isn’t hurtling us into new ways of interacting and living — it is. As the tech gets easier to use with additions like this voice feature, LLMs will find their way into more of our apps, more of our routines, more of our consciousness. Don't be surprised if we're talking to them a lot more in the coming months and years to learn about and interact with the world (or to talk with someone who speaks a foreign language).
But when it comes to the reasoning tasks we want done — never mind the fresh ideas we want devised — this week's announcement didn't get us to that long-desired mistress. It showed us how far from, well, her we really are.
2. HAL, KITT FROM “KNIGHT RIDER,” ZIGGY FROM “QUANTUM LEAP” — pop culture has been manifesting AI assistants for decades.
Sometimes they’re helpful, sometimes they’re wisecracky, sometimes they’re helpful and wisecracky, sometimes they seem helpful and wisecracky until they read your lips and decide to disconnect the life-support system sustaining a crew of astronauts. (That seems unlikely in our scenario.)
In pretty much all of these visions, the assistant shares a set of basic traits that would be helpful to us. They’re nimble in understanding language, human in conveying thoughts and superlatively analytical in assessing situations, especially compared to the humans.
It’s easy to start seeing the faint outlines of that vision in Google’s newly released demo of its AI assistant, which is running on its Gemini model and is part of what it’s calling “Project Astra.” Though in modern tech terms we’d really just kind of call it a smart search engine with some audio-video recognition abilities; it falls short on a number of the above traits, especially that last analytical one.
Check out this short video Google put out this week as part of a presentation at the Shoreline Amphitheatre near its offices in Northern California.
The demo is clearly meant to showcase the range of ways AI assistants might help the early-adopter audience: it can recognize speech and objects, it has a rudimentary understanding of code, it can come up with some limited creative ideas. (In use-case terms this could be segmented into fun home uses, fun work uses and fun lark-y uses.)
What’s missing, of course, is a lot of action — could it reliably do high-level coding, orchestrate a home makeover or engage in truly original creativity? Or can it just give you the showroom floor sample of all of these? (Never mind plan a trip, execute a job assignment or aid with a life decision.) That’s what we might actually need an assistant for, and there’s nothing about large language models in general or Project Astra specifically that yet suggests it can do this.
As I watched this video I imagined a semi-helpful voice accompanying me on a walk around a city (paired with XR glasses, a much better form factor than a phone) and enjoying its commentary but hardly leaning on it to, say, buy tickets to a show that fit a certain set of criteria or find a doctor who met my exact needs. For that I'd require human skills both in my search and in what it turned up (e.g., I’d want to glance at reviews before taking action). These assistant-like tasks have consequences for us, and shutting ourselves inside an AI walled garden cuts us off from a lot of fruit that grows outside it.
(I will say I was impressed with the pseudo-reasoning behind the “Schrödinger’s Cat” moment. It’s a parlor trick, but a cool one.)
The Shoreline presentation also saw Google execs emphasize the company’s increasing pivot to the “AI overview” results in its search engine, which you may have already started to see when you innocently type in a search query (and then have to scroll down after it clearly missed the point of your question).
The Platformer newsletter had an excellent piece on the toll this could take on the health of many sites (never mind on our knowledge and information-ferreting skills) as traffic to them starts to dry up.
My first reaction is — “Google, Twitter, the old Candystand gaming site, why do all good things on the Internet have to go away?” But the truth may be muckier. The Internet has long been about minimizing the barriers between us and information — from message boards and scattered Web sites eliminating the need to seek out a print encyclopedia, to search engines giving us a way to quickly find those Web sites, to, now, AI eliminating the step of going to those Web sites at all. We’d have about as much luck fighting gravity as fighting that trend.
That said, there’s something ironic about Google skimming off a whole ecosystem of knowledge that, while they empowered it for years with their search, really wasn’t created by them and isn’t theirs in any meaningful way. They can do what they want; it just feels icky.
Actually they can’t do what they want, and it remains to be seen what legal challenges await to stop them. We are already standing by for the unfolding of the New York Times copyright lawsuit against OpenAI for plundering its articles — a lawsuit joined by a slew of other media companies. I imagine some of the Web’s biggest publishers will initiate their own action here too, as traffic goes down when the AI Overview answers go up.
I’m also hopeful Google will work out the kinks. They’re not small.
When I wanted, for example, to clarify for the reference above what anti-human actions HAL perpetrated in “2001” — a movie I’ve seen twice but whose plot details I’d forgotten — I typed “what does hal do in 2001” into a Google search box, which prompted the AI Overview to pop up this decidedly unhelpful answer.
“In 2001: A Space Odyssey, HAL (Heuristically Programmed Algorithmic Computer) is a sentient computer that controls the Discovery One spacecraft's systems and interacts with the crew. HAL is responsible for the spacecraft's mechanical and life support systems, and has ‘eyes’ placed around the ship.”
Um, thanks?
Fortunately I could just scroll down to the human-created Web sites — notably, Wikipedia — which told me what I really needed to know in about three seconds; I used human intuition to bypass the supposedly intelligent machine and reach the human-written answer.
Because as much as Google executives, whether with these overviews or Astra Assistants, think they’re helping us, they’re also removing a human component that could in many cases better serve our needs; they’re turning Google from a tool to reach the human into a human-less end unto itself.
I don’t mean to say machine intelligence will never be able to help with stuff like this — with understanding what we need to enhance our knowledge of the world, or even becoming an equal partner as we move around the office or neighborhood.
But it is fair to wonder why there’s such a rush to release products that can’t really do this when we already have perfectly good tools that can.
3. FEW LIFE REALMS IN THE PAST DECADE HAVE BEEN AS INFLUENCED BY TECHNOLOGY AS DATING.
Some 80 million American adults — one in three — date by swiping, according to a Pew study, logging in to a code-heavy platform to see what’s out there. There an algorithm chooses their pool of prospective mates like a matriarch at a cotillion ball, making literally true the idea that we trust the machine to decide what’s best for us.
But there is room for tech to have even more of a role, and last week Bumble founder Whitney Wolfe Herd described one way that could happen. We might soon employ “AI concierges” as one of our dating stratagems — a kind of cannon-fodder tactic in which the first wave of soldiers dies so those behind can live. Only in this case, it's the machines that are dying. On bad dates.
She was light on details, but Wolfe Herd seems to believe that after matching we’ll send our AI to meet the partner (‘s AI), they’ll converse and report back. If we each like what they say, we’ll then move on to the next step (endlessly texting before having one video call and never speaking again).
“There is a world where your dating concierge could go and date for you, with another dating concierge,” Wolfe Herd said at the Bloomberg Technology Summit last week.
(The nature of their interaction would be unclear; it would be weird to be sitting at the bar eavesdropping on two AIs awkwardly asking each other about how many siblings they have.)
On the surface Wolfe Herd’s idea can seem appealing. “We’ve all been on the bad dates. This eliminates (some of) them,” we say to ourselves.
And the truth is, as funny as an AI concierge sounds, it doesn’t really mark that big a change from what we do now with algorithms. The code already eliminates a bunch of bad matches. Now we have the concierge come along and discard a whole fresh slew.
What actually doesn’t work is the effectiveness part — namely, there isn’t any. For all the ways this can seem more efficient — let the AI do the work while you’re off doing something better! — what would almost certainly happen is the opposite: an already crowded and clouded dating pool made that much more packed and murky. Back when these sites were taking off last decade there was briefly the illusion that expanding the available selection would help people find a partner that much more quickly.
But anyone who’s used these platforms will tell you how abjectly untrue that is. The algorithm is wonky, it can be manipulated, the sheer amount of choice is psychically overwhelming. Even dating prospects that might hold promise become lost under the skyscraper pile of detritus, or we simply don’t have the energy for them because we’re so burned out from all the other encounters. Rather than taking the pre-digital reality of a small number of intimate, high-potential dates and adding to them, dating apps have just turned most dates into something superficial and forgettable.
What the dating-app era has given us is the romantic equivalent of induced traffic demand — build another dating lane and it just fills up; you’re no more likely to get where you want to go. That same Pew survey about everyone using dating apps found that only one in ten partnered adults met that way, which sums up the effectiveness right there.
Not for nothing are in-person dating meetups continually on the rise — they’re simply more manageable.
Given all this it might seem tempting to have the AI Concierge pick up the load, but all that’s really doing is expanding the pool even further (since now you can plausibly go on even more dates). That’s great for the dating-app companies, who get paid when usage goes up. It’s bad for us, whose odds of finding a partner go down.
So let them add their AI concierges, their digital wingmen, their algorithmic yentas. We’ll be able to meet a lot more people. They’re just not any more likely to be our person.
4. FINALLY THIS WEEK, I’D BE REMISS IF I DIDN’T CLOSE WITH SOME OTHER OPENAI NEWS — THE DEPARTURE OF TWO SIGNIFICANT SAFETY-MINDED EXECUTIVES AT THE COMPANY.
On Tuesday morning chief scientist and engineering-brains-behind-the-operation Ilya Sutskever announced he was leaving the company. You may remember Sutskever as the man who clashed and then reconciled with Altman during the great boardroom drama of November 2023 that was centered on safety issues.
Sutskever was widely seen as one of the biggest safety bulwarks at an OpenAI that lately has not always exhibited an interest in same. Sutskever said he was "confident that OpenAI will build AGI that is both safe and beneficial," which is the kind of comment that takes away your confidence.
(Altman wrote a quick hit on Sutskever upon the news he was leaving, paying him pro forma homage. A wag on X asked ChatGPT to do the same thing in Altman’s tone and the results were eerily similar. Is it that AI is getting better at sounding like a human? Or that Altman, in giving perfunctory thanks to a frenemy, sounds like an AI?)
That night, an even bigger surprise: Sutskever’s colleague on the team overseeing safety, Jan Leike, posted on X, “I resigned,” with nothing further. The team, known as “superalignment,” was charged with answering the question “How do we ensure AI systems much smarter than humans follow human intent?”
Given Altman’s track record of full-steam-ahead and given this duo’s mission of slowing down that train, not exactly heartening news.
The Mind and Iron Totally Scientific Apocalypse Score
Every week we bring you the TSAS — the TOTALLY SCIENTIFIC APOCALYPSE SCORE (tm). It’s a barometer of the biggest future-world news of the week, from a sink-to-our-doom -5 or -6 to a life-is-great +5 or +6 the other way. Last year ended with a score of -21.5 — gulp. Can 2024 do better? So far it’s been pretty good. This week it’s a…bit of a nosedive.
OPENAI DOESN’T SEEM AS CLOSE TO AGI AS WE THOUGHT: +2
GOOGLE’S PIVOT TO AI IS MARGINALIZING SOME VALUABLE HUMAN ELEMENTS: -2.5
AI DATING CONCIERGES: Now with 33 percent more ghosting! -2
OPENAI SAFETY EXODUS: -3.5