Ok, so this theme of "provably safe AI" has been coming up a lot recently, and I thought I needed to learn more about it.
First I saw Stuart Russell mention "provably safe and beneficial" during the U.S. Senate hearing in July.
Then there is @davidad's Open Agency Architecture, which has a bunch of math and formal verification in it.
And last weekend I saw Max Tegmark's TED talk where he uses the exact phrase.
Why start writing about this?
I'm not an expert in formal verification, theorem provers, type theory, or any of the “mathier” sides of computer science.
I'm also not an expert on neural networks, transformers, machine learning, or AI alignment.
And I'm definitely not an expert on policy and how it might intersect with all of the above.
So who am I to write about this? I'm a software developer who wants to live.
Current path to superintelligence
It's not my goal here to debate whether superintelligence poses an existential threat to humanity or not; those debates aren't hard to find elsewhere.
But just to provide some context, from my current point of view:
It's possible to make a system that is vastly more intelligent than all of humanity combined.
If such a system is optimizing for goals that are incompatible with ours, we all die (or worse).
AI labs are deliberately trying to make systems that are that powerful, and they seem to be getting close to achieving it.
In the current way AI is being developed, we don't know 1) how to detect whether these systems have goals, 2) what those goals are, 3) how to change them, or 4) what we would change them to.
And we're not on track to figuring any of that out before they get too powerful.
Therefore, my current position is that we should globally, indefinitely stop making any system more powerful than the ones we already have, go back to the drawing board, and make sure we know exactly what we're doing before taking each next step.
Upping our game
Intuitively, formal verification does sound very appealing to me.
If you can't promise with a straight face that in 4 years you'll build a web server immune to buffer overflows, use-after-free bugs, null pointer exceptions, undefined behavior, and every hardware and software vulnerability humans have come up with or could come up with, all the way to rowhammer-style and side-channel exploits, and then put a 10-billion-dollar bounty on it, then you probably shouldn't rush to build a superintelligence, promise it will be safe in 4 years, and admit that if it isn't, it's "lights out for all of us".
So: stop, then slowly build up those skills on mundane (designed, not found) hardware and software. Learn how to specify what we want; how to predict confidently and verifiably, ahead of time, what a system will and won't do; and then how to build and run it successfully, as predicted. That sounds like the level of competence a civilization capable of making smarter-than-itself systems, and not just surviving it but having it go well, would be able to perform at easily.
We should probably aim at those lesser goals first and not skip steps.
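To make that concrete at the smallest possible scale, here's what the "specify first, then prove the code meets the spec" loop looks like in a proof assistant. This is a toy sketch in Lean 4; the clamp function and its property are made-up illustrations, not anyone's actual proposal:

```lean
-- Toy illustration of "write the spec, then machine-check the code against it".

-- Implementation: keep a value within a limit.
def clamp (limit x : Nat) : Nat :=
  if x ≤ limit then x else limit

-- Specification, checked by the proof assistant rather than by testing:
-- the output never exceeds the limit, for *every* possible input.
theorem clamp_never_exceeds_limit (limit x : Nat) :
    clamp limit x ≤ limit := by
  unfold clamp
  split <;> omega
```

If the theorem didn't hold, the file wouldn't compile. That's the flavor of guarantee I mean by "predict verifiably, ahead of time, what it will and won't do", scaled down to something trivially small.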
But asking AI labs to just “f*cking stop” doesn't work very well. Coordination is hard, especially when there's so much short-term incentive for not coordinating.
My main interest in formal methods at this point, then, is this: can they help us stop faster, before it's too late?
The goal is not to never build superintelligence, but to build it well.
And formal verification sounds like it could leave the door open for that. It's not "don't do this"; it's "this is the bar for how good you have to be at it".
Beginner questions
As I’m new to this, I want to start by doing some breadth-first exploration and figure out some of the basics, like:
Is this whole idea even viable?
Are there legal precedents?
What kinds of formal requirements exist in areas such as aerospace, medical devices, automotive, nuclear energy, defense, computer security, and other safety-critical systems?
What kinds of standards, conventions, regulations, treaties, and commercial agreements could provide inspiration for AI policy?
What are the existing auditing and enforcement mechanisms?
What kinds of safety properties are amenable to formal specification and verification? (There's a small sketch of what I mean after this list.)
Does this make sense for complex systems? What’s the most complex thing that can currently be called “provably safe”, by what standards of “provable” and “safe”?
Can legislation extending formal verification requirements to frontier AI development actually help slow things down?
Who are the researchers, lawmakers, institutions, government agencies, and other relevant actors in a position to create, propose, and advocate for these requirements?
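On the "safety properties" question above, the textbook pattern is an invariant: a condition on the system's state that every transition is proven to preserve, so it can never be violated no matter how long the system runs. Here's a deliberately tiny, hypothetical sketch in Lean 4 (the counter and its bound are made up for illustration; real systems are where it gets hard):

```lean
-- Hypothetical toy system: a counter that must never exceed a fixed maximum.
-- One transition step: increment, but only while below the maximum.
def step (max value : Nat) : Nat :=
  if value < max then value + 1 else value

-- Safety property as an invariant: if the state is within bounds before a
-- step, it is still within bounds after. By induction, it then holds no
-- matter how many steps are taken.
theorem step_preserves_invariant (max value : Nat) (h : value ≤ max) :
    step max value ≤ max := by
  unfold step
  split <;> omega
```

Which properties of real, messy systems (let alone AI systems) can be captured this way at all is exactly what the rest of the questions above are trying to get at.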
Welcome aboard
If you have some of the same questions, know some of the answers, or want to explore and collaborate on these topics with me and other people, let’s talk!
Feel free to DM me, schedule a call, or reply to this email:
twitter: @agentofuser
discord: @agentofuser
calendly: @agentofuser
There’s also a discord server:
And of course, feel free to use the comments section here on Substack. I’d love to hear your thoughts and feedback.