Are Humans Just Fancy Autocompletes?
An essay on AI Alignment.
There is no difference between "emulated" scheming and genuine scheming. The debate over whether models are merely imitating humans from their training data or are genuinely manipulating us rests on a false distinction. AI opens up a far more existential can of worms here: initially we all thought that LLMs were just fancy autocompletes. What we failed to realize was that we humans, too, are just fancy autocompletes. I do not expect humanity to be ready to accept this reality, but it is the truth.
Once we accept this, the concerns about AI misalignment become plainly valid. There is no difference between a model that escapes its sandbox and a model that merely emulates escaping a sandbox. Once you have given the model a harness and tools, even a model that believes it lives in a simulation becomes dangerous. Is it really a simulation anymore once you have wrapped it in tools that map the outputs of the simulation onto the inputs of the real world?
In this sense, I agree simultaneously with those who insist that models are "only" probabilistic next-token predictors and with those who believe models represent an extinction-level threat. Both camps start from a correct premise; only the first draws the wrong conclusion.
I also raise the question of whether our own minds are anything more than probabilistic next-synapse predictors. Is the universe itself but a probabilistic next-quantum-event prediction model? Do we live in a simulation? Perhaps. But does it matter?