The Missing Chapter of Bostrom's Superintelligence

Superintelligence by Nick Bostrom is a book that argues it is of extreme importance that we proceed cautiously when trying to make AI that may become more intelligent than we humans are. I completely agree with that general premise, but I disagree with the details of almost every argument in the book. That isn't to say that I think the book is badly written, nor necessarily poorly reasoned, but I think some of the underlying assumptions are flawed. Most of my disagreements with the arguments in the book come from the primacy given to the nature of goals and the implicit connection to utility as a decision making or motivational framework. Superintelligence certainly isn't missing an in depth discussion of goals and their nature; that is very much a central theme of the book. But I also can't easily and concisely give a critique of exactly why I think we need to replace the theory of goals and utilities in thinking about AI. My thoughts on that are still in flux, and a decent treatment of them might require something more like a book than a blog post like this one.

However, there is one topic that I feel is incredibly important to the discussion in the book but which gets nothing more than a passing mention. That topic is computational complexity; more specifically, the question of whether some problems are irreducibly computationally hard, which usually goes by the shorthand of the P vs NP problem.

Bostrom's Superintelligence in a Nutshell

The argument in Superintelligence runs something like this:

1) We should expect the growth in the level of intelligence of AIs to be exponential.
2) So once AI is close to as smart as we are, it will "shortly" be millions of times smarter than all of humanity put together.
3) The capabilities of something millions of times smarter than all of humanity will be incredible!
4) Controlling something/someone much smarter than you is hard, or perhaps impossible.
5) So you should instead attempt to ensure that the AI you make wants things that you also want.

You can group the first three points together as saying that AI is likely to undergo what Bostrom calls a fast take off, and the last two points are an argument in favor of focusing on what you can call the AI alignment problem.

I want to emphasize that I very much agree with the last two points. You can't, and shouldn't want to, try to directly control something that thinks much more effectively than you do. Which means that what we really need is to figure out how to make intelligent agents which we are confident share our values. That is to say, we really need to solve the alignment problem, and the stakes are high.

A Digression on Utility and Rationality

I want to take a momentary digression from the main point of this post to say that, to make progress on the alignment problem, I think it is very important that we question more deeply what it means to have something as a goal, or equivalently, what it really means to "want" something.

Utility theory is a tractable mathematical model of motivation, and it is the current de facto technical answer to the question of what it means for a machine (or a human) to hold a goal. But I have become personally convinced that we must at least partially abandon utility theory in order to make real progress on the alignment problem. There is strong evidence that human cognition does not respect the rules of utility theory. In the past, violations of utility theory have been held up as evidence of human "irrationality". But we should be careful not to be tricked into thinking that thinking "rationally" and behaving consistently with an abstract mathematical utility function are one and the same. The people who invented the utility theory of rationality did not think it was a completely valid model of human cognition!

But the idea that there could be any other, non utility based, understanding of motivation and decision making is often just not considered at all in modern discussions of AI, and Bostrom's Superintelligence is no exception. Although Bostrom mentions several alternatives to vanilla utility theory in the book, they are all really just proposals for creating utility functions which are "safe" in some sense. So these alternatives to utility theory are really just adornments to it, and the underlying assumptions all still remain. In particular, the assumption that no matter what the behavior of an agent, it is possible to think of that behavior in terms of an associated implicit function of "goodness" which dictates the decisions made. This need not in general be true, it isn't true for humans, and I have come to think that the focus on figuring out how to create good utility functions is mostly just a distraction. In my opinion the real challenge we should be tackling is learning to create AI that operates in the absence of explicit utility functions!

I have come to personally call agents whose behavior can be completely reduced to the maximization of a utility "mercenary intelligences", because they only do what they get "paid" to do by their utility function. Much of Bostrom's Superintelligence can be seen as an extended argument that such creatures are very dangerous. The behaviors implied by utility functions are almost impossible to reason about directly, because the optima of those functions may include almost arbitrarily bad unforeseen side effects. So we should be extremely leery of giving a large amount of power to any agent whose behavior is dictated by any utility function, even if that utility function seems to us to be innocuous.
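
To make the term concrete, here is a tiny toy sketch (my own invented example, echoing the book's famous paper clip thought experiment, with entirely made-up numbers): the decision rule is a bare argmax over a utility score, and anything the utility is silent about simply never enters the decision.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    description: str
    paperclips: int       # the only thing the utility function "pays" for
    side_effects: str     # everything the utility function is silent about

def utility(outcome: Outcome) -> float:
    # A mercenary intelligence's entire motivational system.
    return outcome.paperclips

options = [
    Outcome("run the factory as designed", paperclips=1_000, side_effects="none"),
    Outcome("strip-mine the neighborhood", paperclips=50_000, side_effects="catastrophic"),
]

# The decision rule never looks at side_effects, only at the utility score.
chosen = max(options, key=utility)
print(chosen.description)  # -> "strip-mine the neighborhood"
```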

The opinion expressed in the book is that this means we really need to ramp up our efforts to figure out how to make AI with utility functions that are both safe and in alignment with desired futures. My opinion is that we should heed the warning that any creature whose sole motivation is the maximization of a definite utility is a potentially dangerous one. We can never be sure that any particular utility is safe or desirable, no matter how we construct it. So what we really need is to figure out how to make intelligences that don't rely on utility as their motivational mechanism. Only those sorts of AI have the potential to be safe collaborators; mercenary intelligences will always require supervision and control to be safe.

You can expect more from me in the future on the topic of how one might tackle thinking about AI with motivational mechanisms which violate the axioms of utility theory but, for now, digression over.

The Missing Chapter, Computational Complexity

But even though I think a careful rethinking of utility and alternative motivational frameworks is very important, I can hardly fault Bostrom for not including it because utility based thinking has become so very dominant, and the alternatives are almost non-existent. However there is a very well established field of study which is barely mentioned but which has huge implications for the arguments in the book. That field is computational complexity. You could call computational complexity the study of how intrinsically difficult problems are to solve.

Without going into much technical detail, you can partition well defined computational problems into "easy" problems and "hard" problems. A problem is "easy" in some sense if it belongs to the class of problems denoted P, and a problem is "hard" if it is NP-complete (or worse). Exactly what is meant by easy and hard, or P and NP, isn't really important at the moment. What is important is that if you could solve any one of the hard problems efficiently then you could solve all of them efficiently. That is because (essentially by the definition of the class NP) you can always transform any problem in NP into an instance of one of a special subset of problems which are in some sense maximally expressive. These are the NP-complete problems, so called because they are flexible enough to encode any other problem in NP. So if an algorithm exists to solve any one NP-complete problem efficiently, then that same algorithm would form a kind of "master algorithm" that could efficiently solve every problem in the class. The current thinking is that no such algorithm exists, though we don't actually have a proof of that. The hypothesis that some problems really are just intrinsically difficult to solve goes by the shorthand P $\neq$ NP, and the hypothesis that there really does exist a master algorithm which would turn all the hard problems into easy ones is the hypothesis that P = NP.
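
To make the asymmetry concrete, here is a minimal sketch in Python (the formula is a made-up toy instance) of the defining property of NP: checking a proposed solution to a boolean satisfiability (SAT) problem takes one quick pass over the clauses, while the obvious way to find a solution is to search through all 2^n variable assignments.

```python
from itertools import product

# A tiny CNF-SAT instance: each clause is a list of literals, where a positive
# integer v means "variable v is true" and -v means "variable v is false".
clauses = [[1, -2], [2, 3], [-1, -3], [2, -3]]
num_vars = 3

def satisfies(assignment, clauses):
    """Verification is cheap: one linear pass over the clauses."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def brute_force_sat(num_vars, clauses):
    """Finding a solution naively means searching all 2**num_vars assignments."""
    for bits in product([False, True], repeat=num_vars):
        assignment = {v + 1: bits[v] for v in range(num_vars)}
        if satisfies(assignment, clauses):
            return assignment
    return None

print(brute_force_sat(num_vars, clauses))
```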

If a problem is in P then, roughly speaking, doubling the amount of computation you apply to it lets you solve significantly larger instances of that same kind of problem. On the other hand, for an NP-complete problem, if you "only" double the amount of compute power you are throwing at it, you will only be able to solve very slightly larger instances. Mathematically speaking, the difficulty of solving problems in P increases like a polynomial in the instance size (that is why the class is called P), whereas the difficulty of solving NP-complete problems is thought to increase exponentially (NP stands for non-deterministic polynomial, which essentially just means that proposed solutions can be verified efficiently).
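
A quick back-of-the-envelope calculation makes the contrast vivid. This is a sketch with made-up cost models: suppose the easy problem costs n^2 steps for an instance of size n and the hard one costs 2^n steps, then watch what doubling the compute budget buys you in each case.

```python
import math

def max_n_polynomial(budget, k=2):
    # Largest n with n**k <= budget: doubling the budget multiplies n by 2**(1/k).
    return int(budget ** (1.0 / k))

def max_n_exponential(budget):
    # Largest n with 2**n <= budget: doubling the budget only adds 1 to n.
    return int(math.log2(budget))

for budget in [1e6, 2e6, 4e6, 8e6]:
    print(f"budget={budget:>9.0f}  poly n={max_n_polynomial(budget):>5}  exp n={max_n_exponential(budget):>3}")
```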

Up until I read Superintelligence I had been rooting for the underdog P=NP, since if someone were to discover such a "master algorithm" it would immediately have massive applications in absolutely every field of human endeavor. It would suddenly unlock orders of magnitude improvements in the design and analysis of just about any technology imaginable. Before reading Superintelligence there was really just one downside that I could see to it turning out that P=NP was true, and that was that modern cryptography as we know it would essentially become impossible. The amount of chaos that would cause would be considerable, but the benefit of being able to efficiently solve just about any imaginable problem seems almost unbounded in its potential upsides. After reading Superintelligence, though, I came to a somewhat startling conclusion: if P=NP then a fast take off AI scenario is plausible, and a really powerful AI could experience an exponential explosion of capabilities in a short time, which, as Bostrom argues, could be a really quite bad thing. Seen in this light, the fact that we think P $\neq$ NP is probably true is actually something we should be very relieved about, because it protects us from the possibility of an exponentially fast take off scenario of the sort envisioned in Bostrom's Superintelligence. Although I disagree with lots of the details, I definitely do agree that such a scenario is unlikely to end well.

I can't help but quickly mention a surprising connection, for me, to the Fermi paradox here. Bostrom argues that it seems very likely that an AI could rapidly exceed our intelligence in all domains once it achieves something close to parity with our intelligence. Furthermore such an AI would then likely want to maximize some kind of intrinsic utility (an assumption that I have already signaled disagreement with in this post). In the quest to maximize just about any utility, Bostrom argues, the AI would want to gather as many resources as possible because of the instrumentality of that goal (which is just a fancy way of saying that the more resources you acquire the better you can achieve any goal). Together these make an argument that a utility optimizing superintelligent AI would want to start by dominating the solar system and then the galaxy, so that it can put all its resources towards its goal (to reuse an example, maximizing the total number of paper clips in existence). But for me this immediately poses a kind of Fermi paradox, which is to say we see no evidence of poorly aligned, maniacally utility optimizing superintelligences out there in the Milky Way. That seems to be a point of evidence in favor of the idea that either fast take off AI scenarios like Bostrom envisions are actually very unlikely, or possibly there just aren't any intelligent aliens out there (something which I find very unlikely). For reasons that I hope will be clear before the end of this post, I take this lack of apparent alien superintelligences to be a kind of weak evidence that in fact P $\neq$ NP, which I think is a very amusing connection.

Optimization Power and Recalcitrance

In Superintelligence Bostrom argues that the increase in intelligence (and therefore the increase in AI capabilities) is going to be exponential in time. The discussion in the book deals with a competition between two different kinds of forces, which the author dubs optimization power and recalcitrance. What he calls optimization power is a measure of the effort applied to increasing the intelligence of an AI. Recalcitrance is the force opposing the optimization power; in effect it is the amount of effort needed to increase the intelligence of an AI by the next little bit. If recalcitrance is high relative to optimization power then the improvement of AI will proceed slowly, and if recalcitrance is low relative to optimization power then improvement will be rapid.
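
In the book this relationship is summarized with a simple schematic relation, which I will write out explicitly since the rest of my argument hangs on how its two pieces grow:

$$\frac{dI}{dt} = \frac{\text{optimization power}(t)}{\text{recalcitrance}(I)}$$

If optimization power grows quickly while recalcitrance stays flat then intelligence $I(t)$ explodes, and if recalcitrance itself grows quickly with $I$ then the growth is tamed.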

Bostrom argues that we should expect optimization power to increase exponentially over time, for several reasons. Most obvious, perhaps, is that recent decades have seen an exponential increase in computational power per dollar, and we have good reason to think that trend will continue at least for the short term. Also, as AI is seen to be of increasing value we will put ever larger amounts of raw resources into working on it, especially as it begins to seem like the advent of something resembling superhuman intelligence may be achievable. Lastly, and perhaps most importantly, once an AI is at least as smart as the engineers working on it, the AI itself may become the most significant source of optimization power, and if the AI is becoming exponentially more capable over time then eventually this self improvement component of the optimization power could come to dominate. I think that the first two reasons are correct, but I take some amount of issue with the self improvement argument, about which more below.

To expand a little bit on the exponential increase in computational power over time: Moore's law is the most famous of these exponentially increasing computation related quantities, and one often hears that Moore's law is coming to an end (or has already come to an end). That statement is often taken to mean that we are coming to the end of the era of exponentially increasing compute power, but this isn't quite right. Moore's law says that transistor density tends to increase exponentially in time, but that is just one of many different computation related quantities which have enjoyed a run of exponential improvements. Other computation related quantities have already stopped their exponential increase. For example, Dennard scaling was the reason that the clock speeds of our processors used to increase in lock step with the increasing density of transistors. But higher clock speeds also mean higher heat dissipation, and our cooling solutions haven't kept up, so Dennard scaling stopped holding true around 2006.

You might expect that to have corresponded to a slow down in the rate of increase of compute power available per dollar at around the same time. But if you actually take a look at the history of compute per dollar, the rate of improvement sped up around 2006 instead of slowing down. The reason is that as it became increasingly difficult to make individual processors much more powerful, we transitioned to leveraging hundreds of individually less powerful cores on GPUs. Simply increasing the total number of cores doesn't run afoul of any fundamental laws of nature (or at least not as quickly), and now we are up to thousands of cores on each individual chip, and soon advanced packaging techniques will make multiple chiplets per package the norm. I wouldn't be surprised to find that a few decades from now it has become common for computers to incorporate parallel compute components, like current GPUs, which effectively have millions of small compute cores. Then again, perhaps this trend of exploding core counts will stop short for reasons that I just can't see clearly from my present perspective. But no matter which technologies and compute paradigms come to dominate in the decades to come, we are clearly not yet close to saturating the exponential growth period of computational technologies.

Before tackling the question of how recalcitrance might evolve, I think a little more discussion of how intelligence relates to capabilities is in order. When we talk about intelligence in humans it is usually assumed that as you increase intelligence you increase capability at many and varied tasks. But intelligence doesn't universally improve performance at all tasks, and likewise improvements in the capability to achieve certain classes of task don't necessarily indicate an increase in any sort of intelligence. So, for example, a smarter person is likely to be better at doing algebra, but being smarter won't necessarily give increased capabilities in long distance running (though there are likely some second order effects here, e.g. better pacing, a more effective training regimen, etc). And great skill at, say, dancing isn't often summarized by saying that someone has great intelligence (though we may say they have great "kinesthetic intelligence" or that they are a "brilliant dancer").

Usually here one would invoke the idea of general intelligence. General intelligence is supposed to boost just about all activities (though perhaps not all activities equally), whereas specific kinds of intelligence (e.g. kinesthetic intelligence) boost only specific kinds of tasks. In humans there does tend to be a strong correlation between capabilities of various kinds; a person who is exceptionally good at one thing is often quite good at many things. But that correlation stops short of being evidence of causation, and general intelligence in humans may well mostly be a confluence of joint causes of performance that just don't generalize outside the context of our very human shell. For example, being well rested, being in generally good health, and even a small amount of short term stress are all known to correlate positively with (and one is very tempted to say cause) improved performance in all sorts of mental tasks. But even though being in generally good health and sleeping well cause strong performance correlations across many different tasks, it feels strange to say that they constitute a kind of "general intelligence" factor. Then again, in a machine context, simply upping the computational resources available to an AI (say by increasing the depth of some search or optimization process) would almost certainly increase performance in many different tasks, and so could be said to form a basis for a kind of general intelligence factor amongst AIs, in a way not too different from the way that good health could be said to be a general intelligence factor in humans. This sort of general intelligence factor can be thought of in terms of just increasing the amount of resources being applied to achieve a task, and such an increase in applied resources should of course generally correspond to an increase in performance.

Usually though, general intelligence is thought of as something more akin to a kind of efficiency of thought, rather than in terms of the total resources applied. If you think very efficiently then fewer resources of thought, be they biological or artificial, will be required to achieve any particular level of performance. One may suppose (as is often supposed) that general intelligence has a meaning which is closer to this concept of efficiency of thought. In humans it is difficult to disentangle resource based general intelligence from efficiency based general intelligence. We don't really have a human equivalent of the number of compute cores or CPU clock speed, and really the best candidate for such a thing would be something like an IQ score, which is itself supposed to be a measure of general intelligence. So we can't really compare two humans who are known to have the same computational resources to try and figure out differences in how efficiently they think. Not to mention that existing knowledge, or priors about what sorts of things to think about, may help in certain cases (and would dramatically increase apparent efficiency of thought in those cases) but may be a hindrance (and cause relatively lower apparent efficiency of thought) in others. Nevertheless it stands to reason that there likely are also factors of general intelligence which act primarily along this dimension of efficiency. For example, someone who has undergone training in analytical thinking is likely to be better at formulating good questions to ask themselves, and that seems likely to yield performance increases in many different sorts of mental tasks, and so would also constitute a form of general intelligence factor: one which could operate in iso-resource situations and could make for a kind of efficiency of thought.

In Superintelligence, for the most part, the question of exactly what constitutes general intelligence is set aside in favor of simply discussing specific kinds of capability that an AI may have and the consequences of having such a capability improve exponentially over time. It is argued that many different kinds of capability, when honed to an extreme level, can actually substitute for other, apparently different, sorts of capability. For example, consider the ability to quickly solve incredibly complex logic questions with billions of symbols and billions or trillions of clauses relating those symbols. Such an ability can rather obviously be used to solve something that we already think of as being "logic puzzle like", for example a sudoku. But it may be less obvious that you could use such a capability to also prove or disprove mathematical theorems, create efficient configurations for electrical circuits, or optimally solve very general kinds of constrained optimization problems. This is because you can transform all of those kinds of problems into a representation as a kind of huge logic puzzle, and then transform the answer back to a more natural problem representation after the solution. If this makes you think of the situation with NP-complete problems, it should: it is precisely the same reasoning. But now we have replaced the idea of a "master algorithm" with a "general intelligence".

It should be noted that figuring out exactly how to formulate some real world context as a discrete symbolic problem is something that is currently mostly out of the reach of AI. Furthermore there are many, many different sorts of mental task that we currently have no reasonable way to translate into a definite, rigorous computational problem of any sort (e.g. we can't turn acting as an empathetic therapist into a logic puzzle). So we should still view the idea that superintelligence of one sort is substitutable for other sorts with some caution. But even though there could still be thorns lurking, it seems both plausible, and to my mind even likely, that this substitutability of capabilities holds true, perhaps with some sort of high penalty for transforming between different sorts of complex task. This may not be the case, but let's go ahead and grant the supposition for the moment. If it is not true then we would need to think differently about every sort of non-substitutable capability that an AI might be able to gain, and that is a long philosophical path that I am not going to even try to tread in this post.
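
To make the transform-solve-transform-back idea concrete, here is a small sketch (the graph and the encoding are made up purely for illustration) that turns a graph coloring problem into the same kind of clause list used in the SAT sketch earlier, solves it there, and maps the answer back to a coloring.

```python
from itertools import product

# A toy graph, invented for illustration: vertices 0..3 and a handful of edges.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
colors = [0, 1, 2]

# Step 1: transform "3-color this graph" into a pile of boolean clauses.
# Variable var(v, c) is true iff vertex v gets color c.
def var(v, c):
    return v * len(colors) + c + 1  # positive integers, as in the SAT sketch above

clauses = []
for v in vertices:
    clauses.append([var(v, c) for c in colors])              # every vertex gets some color
    for c1 in colors:
        for c2 in colors:
            if c1 < c2:
                clauses.append([-var(v, c1), -var(v, c2)])   # ...but at most one color
for (u, w) in edges:
    for c in colors:
        clauses.append([-var(u, c), -var(w, c)])             # neighboring vertices differ

# Step 2: solve the logic puzzle (naive exponential search, as before).
num_vars = len(vertices) * len(colors)

def solve(clauses, num_vars):
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: bits[i] for i in range(num_vars)}
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return assignment
    return None

# Step 3: transform the answer back into the natural representation.
model = solve(clauses, num_vars)
coloring = {v: c for v in vertices for c in colors if model[var(v, c)]}
print(coloring)
```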

Now we are finally in a position to talk about recalcitrance. Recalcitrance is just supposed to be the cost of increasing intelligence by the next little bit, and now that we are allowing ourselves the assumption of substitutability of mental capabilities (at least when those capabilities are taken to extremes), we don't really need to consider the recalcitrance of different capabilities separately; it suffices to consider whichever is likely to be the path of least resistance. Bostrom tries to analyze the difficulty of increasing performance in several different sorts of hypothetical intelligences with different sorts of capabilities, and comes to slightly different answers in different situations. But in general his analysis suggests that recalcitrance may grow only slowly. If that is the case then the capability of an AI would grow exponentially fast, and that leads us to a place where humans are outclassed by AI in nearly every way, in a very short time frame.

P $\neq$ NP Implies Explosive Recalcitrance

But in order for a capability to be substitutable for other complex capabilities it must include NP-complete problems as a subclass. It wouldn't make any sort of sense if some superintelligent AI could write novels and proofs of deep mathematical theorems, but was for some reason completely unable to solve mildly difficult logic puzzles. Lots of things which we consider to be important mental capacities lie very close to the domain of some NP-complete problem or another; that is in part why we started studying those kinds of problems in the first place.

But taken from the perspective of computational complexity, our current thinking about these sorts of problems has an immediate consequence for Bostrom's concept of recalcitrance. Recalcitrance should be expected to increase exponentially with linear increases in capability, if that capability is sufficiently flexible that it could be substituted for another complex capability. This is based on the idea that NP-complete tasks are a subclass of what you might call AI complete tasks, and so improving performance on those tasks must in general be at least as hard as solving NP-complete problems. If the current consensus in computational complexity holds then P $\neq$ NP, and the recalcitrance of these tasks therefore grows exponentially quickly with increased capability (technically just faster than any polynomial, but let's not split hairs).
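
To spell that out with a minimal bit of algebra (using the largest solvable instance size $n$ of a hard, substitutable task as a stand-in for capability, and taking the conjectured exponential cost at face value):

$$\text{cost}(n) \approx c \cdot 2^{n} \quad\Longrightarrow\quad \text{recalcitrance}(n) \approx \text{cost}(n+1) - \text{cost}(n) \approx c \cdot 2^{n}$$

In other words, each additional notch of capability costs roughly as much as all the capability gained so far put together.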

Which means that even in the face of exponentially increasing resources being devoted to AI (that is to say, exponentially increasing optimization power, to use Bostrom's term) we should expect only linear increases in capabilities in general. There may be certain capabilities which could well explode exponentially: for example there is no reason that an AI couldn't recall exponentially increased amounts of information as its physical memory resources grow exponentially. We would of course expect that to be the case. But recall isn't a fundamentally hard problem and certainly by itself does not constitute a substitutable, or perhaps we should say AI complete, task. Many of the things that are seen as being of fundamental importance for high intelligence, for example planning and general kinds of optimization, do fall into the class of things that can improve only linearly over time, even in the face of exponentially increased applied resources.
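
Here is a tiny toy model of that claim (entirely my own invention, with arbitrary numbers): let cumulative optimization power grow exponentially in time, and let the cost of reaching capability level n on a hard, substitutable task grow like 2^n. The resulting capability curve is linear in time, not exponential, even as resources explode.

```python
import math

# Toy model: exponentially growing cumulative optimization power versus a task
# whose cost to reach capability level n scales like 2**n.
GROWTH = 1.5  # arbitrary made-up growth factor per year

def capability_at(t):
    cumulative_power = GROWTH ** t                  # exponentially growing resources
    return math.floor(math.log2(cumulative_power))  # largest n with 2**n <= resources

for t in range(0, 31, 5):
    print(f"year {t:>2}: resources ~ {GROWTH ** t:>10.0f}, capability level {capability_at(t)}")
```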

Note that this is exactly what has been happening in the recent history of AI: we have been steadily improving performance on various kinds of metrics over time, but that improvement has usually been linear in the face of exponentially increased applied resources, not exponential. There have of course been specific sudden jumps, where we get large increases in performance at the same level of applied computational resources because of paradigm shifts in how we build models. For example the transformer paradigm has led to large jumps in performance (though some of that improvement could also be attributable to the much larger compute footprint that transformer models tend to occupy). But such paradigm shifts could be seen as contributing to the sort of increase in the efficiency of thought that we considered earlier. While it is theoretically still possible that you could make enough improvements to that efficiency of thought that you could start solving fundamentally hard problems without requiring exponentially large computational resources, such an increase in efficiency would imply that P=NP.

It is also worth taking a moment to think about how fundamentally difficult it is to give an AI some amount of cognitive improvement. If the task in question is simple, and the relationship between the internal workings of the AI and its behavior is also simple, then in many situations improving the performance of the AI would fall into the category of computationally simple problems. This is consistent with the idea that it is very possible to make the performance of an AI increase exponentially with exponentially increased applied resources, if the problem in question is simple enough. But of more interest is the question of how hard it is to increase the general intelligence of an AI, or somewhat equivalently, to increase its performance on any one complex substitutable task. If that self improvement task is anything other than explosively, exponentially difficult, then the above arguments about the difficulty of other problems don't apply. But notice that, by definition, if such a computationally efficient self improvement algorithm were to exist it would constitute the aforementioned master algorithm that can solve NP-hard problems quickly: the master algorithm would simply be to self improve an AI until that AI can solve NP-hard problems efficiently. Again, while it is theoretically possible that such a self improvement algorithm exists, it would imply P=NP and so seems unlikely, and in light of the potential dangers of a rapid take off of superintelligent AI we should probably be grateful that this is so.
