Mind and Iron: AI is hitting a wall its champions are in denial about
A leading Stanford report with an eye-opening message. Also, shine on you crazy lab diamond?
Hi and welcome back to another sparkling episode of Mind and Iron. I'm Steve Zeitchik, veteran of The Washington Post and Los Angeles Times and lead merchant of this journalism jewelry shop.
If you’re new here — welcome. Every Thursday at Mind and Iron we strive to give you the important news on AI, technology and the future with a gaze toward the human — ie, the stuff Big Tech doesn’t always want us to look at. (You can browse our archive here.)
Companies make a lot of promises that our lives will soon be easier, more efficient and more enjoyable. So we test their claims. Please consider pledging a subscription so we can keep doing that.
We will be away next week, but back strong the week after. In the meantime, some chewy grist for you today.
This week the Stanford Institute for Human-Centered Artificial Intelligence released its annual report. The “AI Index,” as it’s known, is one of the most comprehensive snapshots of our machine-intelligence state-of-play, put together by some of the great research names in the biz. One big finding stood out. We tell you what it is and talk to one of the lead researchers to break it down.
Also, a fun one we've been anticipating for a while: the lab-grown diamond. Food tech is changing how we eat; transit innovations are overhauling how we move. But lab-grown diamonds may trump them both — they're transforming how we perceive value.
Finally, the White House has installed a revered humanist sort to oversee its new AI Safety department. So why are some tech types panicking?
First, the future-world quote of the week.
“There are things that Large Language Models simply shouldn’t be doing. And the question is ‘does the model know that?’”
—AI pioneer Ray Perrault, one of the backers of Stanford’s 2024 AI Index, on how the tech is being misused
Let's get to the messy business of building the future.
IronSupplement
Everything you do — and don’t — need to know in future-world this week
The limits of our AI; the diamonds that will change jewelry; the White House's worry wunderkind
1. SINCE CHATGPT CAME ON THE SCENE NEARLY 18 MONTHS AGO, CONSUMER AI HAS SEEMED TO MOVE IN ONE DIRECTION: UP.
Chatbots got better, image-generation kept improving, and in February we even got a taste of AI-driven video.
But one of the leading experts on AI says don't be fooled by these snazzy facades. There are fundamental limits to what the current generation of AI — based on so-called “large language models” — can really do. And we’re fast approaching them.
In short, AI will still dazzle us. But it may not be thinking for us in the way the hype cycle has conditioned us to believe.
Or in the words of Stanford University’s newly released AI Index, "AI has surpassed human performance on several benchmarks, including image classification, visual reasoning, and English understanding. Yet, it trails behind on more complex tasks like multitask language comprehension and visual commonsense reasoning.”
Working with Stanford, the AI pioneer Ray Perrault just co-chaired the Index’s steering committee. The Index not only conducts its own research but looks at dozens of other studies, rolling them into one giant meta-ball of knowledge.
This year, for instance, it found that 52 percent of people globally are now nervous about AI (up from 39 percent last year); funding specifically for generative AI has octupled to more than $25 billion; and the U.S. is now producing nearly twice as many AI models as China and Europe combined.
But among its most notable conclusions is an anti-growth one: the large language models that have fueled the AI boom are near a ceiling of what they can do. Making strides isn’t just a matter of more time and investment — it’s a matter of needing a whole new (non-LLM) approach.
One analogy: A medicine that controls symptoms of a particular disease can be made marginally more powerful. But barring a research breakthrough, it won’t be able to cure the disease. A ceiling exists here, Perrault and other researchers say. And its rupture will require more than just a steady hammer.
“There seem to be in the mill maybe one more generation of models out of OpenAI and Google and maybe Meta that will cost an astronomical amount of money and will do somewhat better,” said Perrault when I talked to him this week. “But my guess is it still won't do hard math and planning problems. How do you get to the next stage? That’s the $64,000 technical question.”
Perrault has been at this for decades — he helped create the AI that laid the basis for Siri — and currently works as a computer scientist at SRI, the independent nonprofit institute originally created by Stanford.
The reason he says this ceiling is happening is that large language models and the generative AI systems they power are very good at ingesting large amounts of data and producing something based on the next most likely event. This is why cover letters, images and basic coding are within their reach; the system has seen these permutations so many times it can come up with a reasonable facsimile of what they should look like.
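To make the "next most likely event" idea concrete, here's a toy next-word predictor built on raw bigram counts — a vastly simplified, illustrative stand-in for what an LLM does at billion-parameter scale. The tiny corpus and function names are invented for this sketch:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which word follows each word, and how often, across the corpus."""
    nxt = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            nxt[a][b] += 1
    return nxt

def predict_next(model, word):
    """Return the single most likely next word, or None if we've never seen it."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]

# A make-believe pile of cover letters — the kind of highly repetitive
# text where "predict the likeliest continuation" works well.
corpus = [
    "dear hiring manager i am writing to apply",
    "dear hiring manager i am excited to apply",
    "dear hiring manager i am writing to express",
]
model = train_bigram(corpus)
# "am" is followed by "writing" twice and "excited" once, so "writing" wins
```

An LLM does something far richer than counting pairs, but the failure mode is the same in kind: the model can only echo continuations its data has made statistically likely, which is exactly why boilerplate is easy and novel reasoning is not.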
Here’s an example of the progress Midjourney’s image model has made with Harry Potter over the last two years, per the report.
But LLMs can’t problem-solve in the way human intuition can. That’s why the systems play chess — never mind AlphaGo — badly relative to a grandmaster. They’re simply not able to do the complex reasoning required to be one.
Or take autonomous driving. The AI in those vehicles is decidedly not of the LLM variety. When a hundred variables converge to determine whether and when you should accelerate, swerve, hit your horn or slam on the brakes in a tricky traffic situation, you need humans’ complex reasoning — System 2, according to Daniel Kahneman’s famous rubric, in which System 1 is quick, near-automatic thinking and System 2 is deep, analytical, effortful thinking. Simply being able to quickly recall every potential near-miss car accident of all time — a System 1 move — won’t help much at a moment when complex decisions must be made.
And any hope that LLMs can learn from mistakes and adjust is proving futile. On grade-school math, for example — a popular test for basic reasoning — a University of Illinois at Urbana-Champaign study found that the error rate climbed from 4.5% to 8.5% to 11% as the model repeatedly tried to self-correct.
“It is generally understood that LLMs like GPT-4 have reasoning limitations and can sometimes produce hallucinations. One proposed solution to such issues is self-correction,” the Index said. But LLMs may not “in fact be capable of this kind of self-correction.”
Not for nothing did the report find that the number of “AI incidents” — that is, misuse and mistakes — has tripled from five years ago (though of course this is in part because AI is used a lot more). LLMs are being deployed in some situations where they simply shouldn’t be (see Perrault’s quote at the top of the newsletter).
All of this is why skeptics like Gary Marcus (who regular readers will know from our past interviews) have been so vehement about current AI limitations. They don’t say LLMs are a parlor trick, exactly. But they do say LLMs can amount to a kind of advanced regurgitation that, while helpful in plenty of situations, isn’t close to reason as we know it (or, in some cases, need it).
To solve this, Perrault says, LLMs may need to be combined with an older AI method known as a “symbolic” approach, in which a system is not scanning and synthesizing data but hard-programmed (by humans) on what path to take based on symbolic representations — e.g., “If you see a stop sign, brake.”
That system has its own limitations — it doesn’t allow for the kind of flexibility and dynamism that leads to, say, ChatGPT pulling off a passable Shakespeare. But it does use a kind of intelligence that LLMs don’t. Perrault thinks LLMs may need to be connected with the symbolic approach, at least for a little while, before a new system can emerge that engages in the kind of decision-making that would be truly transformative.
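As a minimal sketch of what such a hybrid could look like — and this is purely illustrative, not drawn from any real driving or AI system — imagine a layer of human-written symbolic rules that takes priority, with a statistical model's best guess as the fallback:

```python
# Toy hybrid decision-maker. The rule table is the "symbolic" part:
# explicit, human-authored, and guaranteed to fire when its condition
# matches. The statistical guess stands in for an LLM-style prediction.
RULES = {
    "stop_sign": "brake",
    "red_light": "brake",
    "green_light": "proceed",
}

def decide(observation, statistical_guess):
    """A symbolic rule wins whenever one applies; otherwise defer to the model."""
    return RULES.get(observation, statistical_guess)

decide("stop_sign", "accelerate")   # the hard rule overrides the model: "brake"
decide("clear_road", "accelerate")  # no rule applies, so the model's guess stands
```

The trade-off Perrault describes is visible even in this toy: the rules are reliable but brittle (they only cover situations someone thought to write down), while the statistical side is flexible but unguaranteed.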
“LLMs are an amazing approximation of a System 1, and that’s incredible and awe-inspiring; we shouldn’t forget that,” he said. “But how do we connect to a System 2? How do we train a System 2?” He added, “I think that’s the big question and I don’t see how we’re going to get much past the current stage before making significant progress on this question.”
[Btw, the reason there’s only a little more juice to be squeezed out of the LLM orange (and it will take a lot more money) is simple: the existing models have been trained on so much data that, to show improvements, you’re going to have to scavenge ever-more obscure corners. And processing is already so powerful that you’re going to have to spend a lot of resources to make models even slightly faster and thus able to show even modest improvement.]
Now, all of this doesn’t mean AI won’t be a significant part of our existence in the coming years. LLMs can still be applied to many aspects of life they haven’t been. And they can improve the areas they’re already in — one reason that certain media deployments will grow, for instance. The idea of an AI ceiling is a little deceptive — it’s low for new kinds of thinking, not for applications of the kinds of thinking it’s already good at.
But Perrault’s perspective does give the lie to the idea that AI — OpenAI’s hype to the contrary — will be taking over a lot of advanced human decision-making anytime soon.
Indeed, there have been plenty of blithe predictions of computers reaching the holy grail of human-level artificial general intelligence, or AGI, in the relatively near term. Watch how casually it’s tossed around as almost a given by 2030 in this older video with Elon Musk and OpenAI president Greg Brockman. That prospect looks a lot dimmer in the current light.
A major impact on our lives? Sure. An evolution into another form of life? Let’s hit pause on that robot button.
2. IF YOU’VE EVER PASSED THROUGH NEW YORK’S DIAMOND DISTRICT — or just been around someone who got engaged — you’ve been exposed to the idea that diamonds are worth a lot and everyone should want them.
You also likely have heard about lab-grown or synthetic diamonds — made by a process that creates in a room over a couple of months what takes the earth hundreds of millions of years — which raises the uncomfortable question of why the hell the crushed carbon is worth anything in the first place.
I’ve been fascinated by lab-grown diamonds for some time, because they lay bare not just one but several essential questions about technology and our world: 1) Can humans truly duplicate what nature produces (see under: lab-grown meats)? 2) Is there some inherent philosophical meaning in seeking out the natural version? 3) Finally, and most practically, will a longstanding global business worth $100 billion be toppled by upstarts? That is, can a supply chain that arcs from mining in Botswana to cutting and polishing in Surat, India, to selling in New York be utterly disrupted?
First, some history: If you've seen Jason Kohn's excellent 2022 Showtime documentary “Nothing Lasts Forever,” you know how insanely inflated the price of diamonds has been since long before some crafty lab-coat types figured out how to produce them.
A diamond isn't actually that rare. But back in the day, mining giant DeBeers manipulated supply so they appeared to be — and then marketed the rarity as a sign of everlasting love. The result all these decades later is not just massively inflated prices for run-of-the-mill stones but an entire economy — no, ideology — based on size and clarity. Rants about the wedding-industrial complex are not this Substack’s province. But when you've been thus red-pilled, the scam becomes clear, as it were.
Now, lab-grown diamonds have existed in some form since the middle of the last century. But it’s only in the last couple of decades that the tech began to really improve, and only in the last few years that sales really began to increase, in part thanks to the FTC’s declaration in 2018 that lab diamonds were no less “real” than the mined kind. You need the most high-end equipment to be able to tell that a diamond isn’t mined — and even then it’s a close call.
(Lab-grown diamonds are created by several methods, including “HPHT,” which uses machines to re-create the earth’s heat and pressure, and “CVD,” which cultivates a “diamond seed” with a carbon-filled gas. These things are actually grown, like tomatoes, which just seems weird to contemplate.)
So people began buying lab-grown diamonds, from startups with names like Gemesis, Vrai, Diamante and Apollo Diamond, run in many cases by opportunistic physicists and geologists. By 2021 lab diamonds’ market share had doubled compared to 2017.
Still, as a percentage of the diamond-sales market, lab-growns have remained small, hovering at only about 4 percent. Consumers have either been largely unaware of the stuff or have been convinced that price alone is what makes diamonds desirable — a kind of reverse Homer-Simpson-ordering-champagne. The legacy industry has also embarked on anti-lab campaigns, emphasizing how those diamonds are not mined from the hollows of the earth (and also don't involve mass worker exploitation — but that's a separate issue). This has kept lab demand relatively low.
But lately we’ve seen a cultural shift, subtle but perceptible. And it may push us into a world where people buy lab-grown diamonds as often as the mined kind — and, ultimately perhaps, even stop making diamonds such a vaunted status symbol at all.
In recent months you may have been seeing more ads for lab-grown diamonds, like a campaign launched last summer by the jewelry giant Pandora. An M&Ms Super Bowl ad in February played off lab-grown diamonds (the candies were crushed to make the gems).
Forbes a few weeks ago ran a Consumer Reports-ish guide on which ones you’d want, rating them on quality, value and environmental grounds. (Some lab-grown diamonds burn more fossil fuel than others.)
Luxury bellwether The Robb Report just published a story that noted that “LGDs have become ubiquitous in America, where it’s no longer uncommon to find women shopping for groceries adorned in 4 carats of colorless diamond studs.”
Yeah, you’re starting to get the sense that the age of mined diamonds is passing.
I called Kohn to ask him where he thinks we are in the trajectory of this longstanding business/hustle giving way to tech-enabled competitors.
"If the natural-diamond industry’s marketing was very smart for a very long time, what we're seeing is that these systems are not able to maintain parity with the lab-grown industry," he said. "The lab industry is very savvy at slowly eating away at them."
He said he thinks mined diamonds could go very niche — centered on the actually rare and high-end — though he also wonders if, long-term at least, it might come back around. “It’s possible in 20 years the next generation is told that ‘diamonds don't have to come from a lab, they come from the earth’ and they say ‘wait, they can?’ and it sets the whole thing in motion again,” he said, in part “because ultimately it's not about where diamonds come from — it's about myths we want to believe in.”
For now, though, we seem to be living through a shift. Even Brides Magazine ran a piece last month saying “it's hard to find too many negative attributes associated with lab-grown diamonds: they're nearly identical to natural stones and come in at a less expensive price point.” And Brides — which just blared the headline “At Their French-Inspired Wedding in Texas, This Couple Came Up With the Ultimate First Look Compromise” — is hardly some tech-forward bastion of vegan progressivism.
Lightbox, DeBeers’ own (head-spinning) lab-grown subsidiary, last year first announced, then several months later walked back, a plan to sell lab-grown diamond rings. Presumably some internal conflict over the if-you-can’t-beat-them strategy.
So how to feel about all this? First, it’s hard to argue with the market. If people want lab-grown diamonds over the mined kind, then why not? Let them buy those pieces, and let them drive down the prices of mined diamonds, which were artificially driven up to begin with. It was purely a question of supply-and-demand that created diamonds’ value in the first place, so it feels only right for the same dynamic to undo that value.
In fact, there’s something kind of beautifully democratic about this — a capitalist corrective of a sort. If DeBeers drove prices up with a monopoly, how perfect to have them undermined by an open market.
The Atlantic ran a piece this winter saying that the lab-diamond takeover will be limited because luxury is based on the idea of people wanting to spend a lot of money, which will keep the miners in business for a long time.
But that implies a fixed notion of luxury that history doesn’t really bear out. Time, technology and environmental awareness have a funny way of transmuting what we think of as desirable. In the 1980s and 1990s, a mink coat was the ultimate symbol of high fashion. Now? Not so much.
Plus Kohn’s documentary suggests that as much as five percent of supposedly mined diamonds are actually synthetic anyway! So in a world where people know they could spend small sums of money to look like they spent a lot, the legacy industry is asking people to spend a lot on something that actually may be worth much less. It’s a big lift.
That’s the economics. But I also think something philosophical abides in this whole lab-mined diamond tension. Because this is another case — we have so many these days — of an original competing with a simulation.
Normally the humanist view would take the side of the original — of the human-mined diamond. A "technology thinks it can replicate the real, but there's no replacement for the organic earthy original” sort of deal.
Yet that doesn't compute in this instance. The "human" version is basically just some wasteful spending to enrich a conglomerate and exploit some workers. In such a case, maybe the lab-grown version is the better approach. No, in such a case the lab-grown version is DEFINITELY the better approach.
The legacy industry posits that the way mined diamonds form gives them an innate advantage. But does process really matter for a commodity entirely based on perception? The industry has also tried to sell the idea that there’s something inherently wonderful about owning a stone that’s been around for a billion years. Maybe so. Those rocks on the side of that hiking trail make the same case.
No, it’s hard to see lab-grown diamonds as anything other than an uncommonly pure case of technological improvement. The mined versions will be around as long as someone wants them. But in any sense — economic, environmental, philosophical — why would you?
3. IT’S FUNNY TO READ SOME OF THE HEADLINES ABOUT PAUL CHRISTIANO.
Christiano this week was named to an important AI safety role in a new U.S. Commerce Department unit. That quickly set off some welcoming — and some hand-wringing.
Here’s what went down.
Christiano is an AI wunderkind. After getting his undergrad degree at MIT and his Ph.D. at Cal, he led an OpenAI research team, all before he was 30. While at OpenAI he pioneered something called Reinforcement Learning from Human Feedback, or RLHF, which is a fancy way of saying that humans rate a model’s outputs and those ratings are fed back in to steer the model’s behavior.
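Real RLHF trains a separate reward model on human preferences and then fine-tunes the main model with reinforcement learning (typically PPO). As an extreme simplification of just the core feedback idea — human judgments reweighting what a system favors — here's a toy in Python; every name and number is invented for illustration:

```python
import random

# Three stand-in outputs a model might produce for some prompt.
candidates = ["helpful answer", "off-topic rant", "harmful advice"]
weights = {c: 1.0 for c in candidates}  # the model starts indifferent

# Pretend human raters gave thumbs-up/thumbs-down style scores.
human_feedback = {
    "helpful answer": +1.0,
    "off-topic rant": -0.5,
    "harmful advice": -1.0,
}

# Nudge each candidate's weight toward what humans preferred,
# flooring at a small positive value so sampling stays valid.
for c, score in human_feedback.items():
    weights[c] = max(0.01, weights[c] + score)

def sample():
    """Pick an output with probability proportional to its (updated) weight."""
    return random.choices(candidates, weights=[weights[c] for c in candidates])[0]
```

After the update, "helpful answer" carries weight 2.0 against 0.5 and 0.01, so the toy overwhelmingly favors the output humans liked — a crude shadow of how RLHF shifts a real model's behavior without retraining it from scratch.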
For the past few years, Christiano has been running the Alignment Research Center, a nonprofit he founded. (Get used to that first word — it’s basically about making sure AI released to the world is there to serve human needs first.)
Cut to this week. The AI Safety Institute was created as part of President Biden’s executive order last autumn, falling under the Commerce Department’s National Institute of Standards and Technology. The Commerce Department named Christiano head of AI safety at said institute, where he’ll apply RLHF methods to test new AI models. (He is one of a handful of hires the department announced.)
Christiano is a pretty acclaimed guy — Vox named him to their Future Perfect 50 just a few months ago. Seems like a no-brainer for a government AI safety role.
But that hasn’t stopped some outlets from going full pearl-clutch about an “AI doomer” getting the post. A VentureBeat article last month even breathlessly cited an “internal crisis” if Christiano were to get the job and said the move had “sparked outrage” among NIST employees allegedly worried that Christiano could…do something they don’t like? Do something that was too safe? Do something doom-y? I couldn’t tell you. The piece cited exactly two anonymous sources and offered exactly zero quotes.
Meanwhile, former National Security Advisor Susan Rice said Christiano "is precisely the caliber of expert we need now at this critical new institution." So you can decide who to throw in with.
As we’ve noted before, the “doomer” tag is often another smokescreen from Big Tech so we don’t pay attention to the actual risks their products pose. In Christiano’s case, the label comes in part from a crypto podcast interview in which, asked about the odds of an “Eliezer Yudkowsky doom scenario,” he said he thought without AGI it was between 10 and 20 percent. So there’s an 80-90 percent chance nothing like that would happen. Sounds like a pretty radical Eeyore type to me.
Also, I mean, he’s a regulator in charge of safety. He’s there to worry about worst-case scenarios. It would be a lot weirder if an administration appointed someone to run a safety department who never thought there was anything to worry about. It would be like appointing someone who thought planes landed themselves to lead the FAA.
In actually concerning news, the National Institute of Standards and Technology is currently looking at a significant budget shortfall and needs money from Congress (which has cut its funding) to research and implement its measures. So yes, a department meant to ensure the machines don’t run amok may not be able to do its job thanks to humans run amok.
Here’s hoping Christiano is given the rope to carry out his mission. Non-doomerishly.
[Axios]
The Mind and Iron Totally Scientific Apocalypse Score
Every week we bring you the TSAS — the TOTALLY SCIENTIFIC APOCALYPSE SCORE (tm). It’s a barometer of the biggest future-world news of the week, from a sink-to-our-doom -5 or -6 to a life-is-great +5 or +6 the other way. Last year ended with a score of -21.5 — gulp. Can 2024 do better? So far it’s been pretty good. This week continues the trend.
AI’s UPPER LIMITS MAY SOON BE REACHED: A mixed bag, but maybe slow is good? +2.5
LAB-GROWN DIAMONDS ARE SLOWLY NUDGING OUT THE MINED KIND: +3
AN ACCLAIMED AI SAFETY MIND NOW HAS THE U.S. GOVERNMENT AT HIS BACK: +4