4 general intelligence is not that definite
- In Note 1, I claimed that “I will get some remotely definitive
understanding of problem-solving” is sorta nonsense like “I will
solve/[find a grand theor[y/em] for] math” or “I will conceive of the
ultimate technology”. One could object to this by saying “look, I’m not
trying to understand intelligence-the-infinite-thing; I’m trying to
understand intelligence as it already exists in
humans/humanity/any-one-particular-mind-that’s-sorta-generally-intelligent,
which is surely a finite thing, and so we can hope to pretty completely
understand it?”. I think this is still confused; here’s my response:
- Intelligence in
humans/humanity/any-reasonable-mind-that’s-sorta-smart will already be a
very rich thing. Humans+humanity+evolution has already done very much searching for structures of thinking, and has already found and put to use a great variety of important ones.
- [Humans are]/[humanity is] self-reprogramming. (Human
self-reprogramming needn’t involve surgery or whatever.) A central
example: [humans think]/[humanity thinks] in language(s), and
humans/humanity made language(s) — in particular, we made each word in
(each) language. Humanity is amassing an arsenal of
mathematical concepts and theorems and methods and tricks. We make
tools, some of which are clearly [parts
of]/[used in]/[playing a role in] thinking, and all of which have been involved in us doing very
much in the world. We learn how to think about various things and in
various ways; when doing research, one thinks about how to think about
something better all the time. I’m writing these notes in large part to
restructure my thinking (and hopefully that of some others) around
thinking and alignment (as opposed to like, idk, just
stating my yes/no answers to some previously well-specified questions
(though doing this sort of thing could also totally be a part of
improving thinking)).
- Anyway, yes, it’s probably reasonable to say that humanity-now has some finite (but probably “big”) specification. (Moreover, I’m pretty sure that there is a 1000-line python program such that running that program on a 2024 laptop with internet access would start a “process” which would fairly quickly take over the world and lead to some sort of technologically advanced future (like, with most of the compute used being on the laptop until pretty late in the process leading up to takeover).) Unfortunately, understanding a thing is generally much harder than specifying it. Like, consider the humble cube. Is it obvious to you that its symmetry group (including rotations only, i.e., only things you can actually do with a solid physical cube) is \(S_4\), the permutation group on \(4\) elements? (There’s a short sketch checking this just below.) Or compare knowing the weights of a neural net to understanding it.
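- A short sketch (assuming numpy is available) checking the cube claim: it enumerates the 24 rotations of the cube and confirms that their action on the cube’s 4 space diagonals realizes all 24 permutations of those diagonals, i.e., all of \(S_4\):

```python
# Sketch: the rotation group of the cube, acting on its 4 space diagonals,
# realizes every permutation in S_4 (24 rotations, 24 distinct induced permutations).
from itertools import permutations, product
import numpy as np

# Each space diagonal is represented by one endpoint (the other is its negation).
diagonals = [np.array(v) for v in [(1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1)]]

# The cube's rotations are exactly the signed 3x3 permutation matrices with det +1.
rotations = []
for perm in permutations(range(3)):
    for signs in product([1, -1], repeat=3):
        M = np.zeros((3, 3))
        for i, (j, s) in enumerate(zip(perm, signs)):
            M[i, j] = s
        if round(np.linalg.det(M)) == 1:
            rotations.append(M)

def induced_permutation(M):
    # Which diagonal does each diagonal get sent to? (A diagonal equals its negation.)
    images = []
    for d in diagonals:
        Md = M @ d
        images.append(next(k for k, e in enumerate(diagonals)
                           if np.array_equal(Md, e) or np.array_equal(Md, -e)))
    return tuple(images)

distinct = {induced_permutation(M) for M in rotations}
print(len(rotations), len(distinct))  # 24 24  -> all of S_4 is realized
```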
- The gap between specifying a thing and understanding it is
especially big when the thing is indefinitely growing, indefinitely
self-[reworking/reprogramming/improving] (as [humans are]/[humanity
is]).
- Obviously, the size of the gap between the ability to specify a
thing and understanding it depends on what we want from that
understanding — on what we want to do with it. If all we wanted from
this “understanding” was to, say, be able to print a specification of
the thing, then there would not be any gap between “understanding” the
thing and having a specification of it. Unfortunately, when we speak of
understanding intelligence, especially in the context of alignment, we
usually want to understand it in its long-term unfolding, and then there’s a massive gap — for an example, consider the gap between having in front of you the positions and momenta of atoms when evolution got started vs knowing what “evolution will be
up to” in billions of years.
- And even in this cursed reference class of comprehending
growing/structure-gaining things in their indefinite unfolding,
comprehending a thinking thing in its unfolding has a good claim to
being particularly cursed still, because thinking things have a tendency
to be doing their best to “run off to infinity” — they are actively
(though not always so explicitly) looking for new better ways to think
and new thinking-structures to incorporate.
- Relatedly: one could try to conceive of the ability to solve
problems in general as some sort of binary-ish property that a system
might have or might not have, and I think this is confused as well.
- I think it makes much more sense to talk loosely about a scale of
intelligence/understanding/capability-to-understand/capability/skill, at
least compared to talking of a binary-ish property of general
problem-solving. While this also has limitations, I’ll accept it for now
when criticizing viewing general intelligence as a binary-ish thing.
(I’m also going to accept something like the scalar view more broadly
for these notes, actually.)
- Given such a scale of intelligence, we could talk of whether a
system has reached some threshold in intelligence, or some threshold in
its pace of gaining intelligence. We could maybe talk of whether a
system has developed some certain amount of technology (for thinking),
or whether its ability to develop technology has reached a certain
level.
- We could talk of whether it has put to use in/for its
doing/thinking/fooming some certain types of
structures.
- But it seems hard to make a principled choice of threshold or of
structures to require. Like, there’s an ongoing big foom which keeps
finding/discovering/inventing/gaining new (thinking-)structures (and
isn’t anywhere close to being done — the history of thought is only just
getting started). Where (presumably before
humanity-now) would be a roughly principled place to draw a line?
- One could again try to go meta when drawing a line here, saying it’s
this capacity to incorporate novel structures itself which makes for an
intelligent thing. But this will itself again be a rich developing
thing, not a definite thing. In fact, it is not even a different thing
from [the thought dealing (supposedly) with object-level matters] for
which we just struggled to draw a line above. It’s not like we think in
one way “usually”, and in some completely different way when making new
mathematical concepts or inventing new technologies, say — the thinking
involved in each is quite similar. (Really, our ordinary thought
involves a great deal of “looking at itself”/reflection anyway — for
instance, think of a mathematician who is looking at a failed proof
attempt (which is sorta a reified line of thought) to try to fix it, or
think of someone trying to find a clearer way to express some idea, or
think of someone looking for tensions in their understanding of
something, or think of someone critiquing a view.)
- One (imo) unfortunate line of thinking which gets to thinking of general intelligence as some definite thing starts from noticing that there are nice uncomputable things like Kolmogorov complexity, Solomonoff induction (or some other kind of ideal Bayesianism), and AIXI (or some other kind of ideal expected utility maximization), and then goes on to think that it makes sense to talk of “computable approximations” of these as some definite things, perhaps imagining some actual mind already possessing/being a “computable approximation” of such an uncomputable thing.
- I think this is like thinking some theorem is “an approximate grand
formula for math”.
- It is also like thinking that a human mathematician proving theorems
is doing some “computable approximation” of searching through all
proofs. A human mathematician is really “made of” many
structures/[structural ideas].
- More generally, the actual mind will have a lot of structure which
is not remotely well-described by saying it’s a computable approximation
of an infinite thing. (But also, I don’t mean to say that it is
universally inappropriate to draw any analogy between any actual thing
and any of these infinitary things — there are surely contexts in which
such an analogy is appropriate.)
- For another example of this, an “approximate universal prediction
algorithm” being used to predict weather data could look like humans
emerging from evolution and doing philosophy and physics and inventing
computers and doing programming and machine learning, in large part by
virtue of thinking and talking to each other in language which is itself
made of very many hard-won discoveries/inventions (e.g., there are some
associated to each word), eventually making good weather simulations or
whatever — there’s very much going on here.
- Thinking of some practical string compression algorithm as a
computable approximation to Kolmogorov compression is another example in
the broader cluster. Your practical string compression algorithm will be
“using some finite collection of ideas” for compressing strings, which
is an infinitesimal fraction of “the infinitely many ideas which are
used for Kolmogorov compression”.
- One more (imo) mistake in this vicinity: that one could have a
system impressively doing math/science/tech/philosophy which has some
fixed “structure”, with only “content” being filled in, such that one is
able to understand how it works pretty well by knowing/understanding
this fixed structure. Here’s one example of a system kinda “doing” math
which has a given structure and only has “content” being “learned”: you
have a formal language, some given axioms, and some simple-to-specify
and simple-to-understand algorithm for assigning truth values to more sentences by making
deductions starting from the given axioms.
Here’s a second example of a system with a fixed structure, with only
content being filled in: you have a “pre-determined world modeling
apparatus” which is to populate a “world model” with entities (maybe
having mechanisms for positing both types of things and also particular
things) or whatever, maybe with some bayesianism involved. Could some
such thing do impressive work while being understandable?
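- Here’s a toy concretization of that first example (just a sketch; the particular “axioms” and rule names below are made up for illustration): a fixed, trivially-understandable forward-chaining procedure over propositional Horn clauses, where all the “content” lives in whatever axioms and rules you hand it.

```python
# Toy "fixed structure, content filled in" system: naive forward chaining over
# propositional Horn clauses. The deduction algorithm is simple to specify and
# simple to understand; only the axioms/rules ("content") vary.
def forward_chain(facts, rules):
    """facts: set of atoms taken as axioms; rules: list of (premises, conclusion)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and all(p in derived for p in premises):
                derived.add(conclusion)
                changed = True
    return derived

# Made-up illustrative content:
axioms = {"zero_is_a_number"}
rules = [
    ({"zero_is_a_number"}, "one_is_a_number"),
    ({"one_is_a_number"}, "two_is_a_number"),
]
print(forward_chain(axioms, rules))
```

The point of the toy: the base procedure here is fully understood, and it is also hopeless as a mathematician — which is roughly the dichotomy claimed in the next item.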
- I think that at least to a very good approximation, there are only
the following two possibilities here: either (a) the system will not be
getting anywhere (unless given much more compute than could fit in our
galaxy) — like, if it is supposed to be a system doing math, it will not
actually produce a proof of any interesting open problem or basically
any theorem from any human math textbook (without us pretty much giving
it the proof) — or (b) you don’t actually understand the working of the
system, maybe wrongly thinking you do because you confuse understanding the low-level structure of the system with understanding how it works in a fuller sense. Consider (again) the difference between knowing the
initial state and transition laws of a universe and understanding the
life that arises in it (supposing that life indeed arises in it), or the
difference between knowing the architecture of a computer + the code of
an AI-making algorithm run on it and understanding the AI that emerges.
It is surely possible for something that does impressive things in math
to arise on an understood substrate; my claim is that if this happens,
you won’t be understanding this thing doing impressive math (despite
understanding its substrate).
- Let us focus on systems doing math, because (in this context) it is easier to think about systems doing math than about systems doing
science/tech/philosophy, and because if my claim is true for math, it’d
be profoundly weird for it to be false for any of these other fields. So, could there be such a
well-understood system doing math?
- There is the following fundamental issue: to get very far (in a
reasonable amount of time/compute), the system will need to effectively
be finding/discovering/inventing better ways to think, but if it does
that, us understanding the given base structure does not get us anywhere
close to understanding the system with all its built structure. The
system will only do impressive things (reasonably quickly) if it can
make use of radical novelty, if it can think in genuinely new ways, if
it can essentially thoroughly reorganize \(\approx\)any aspect of its thinking. If you
genuinely manage to force a system to only think using/with certain
“ideas/structures”, it will be crippled.
- A response: “sure, the system will have to come up with, like,
radically new mathematical objects, but maybe the system could keep
thinking about the objects the same way forever?”. My response to this
response: there will probably need to be many kinds of structure-finding; rich structure will need to participate in these
radically new good mathematical objects being found; you will want to
think in terms of the objects, not merely about them (well, to really
think remotely well about them, you will need to think in terms of them,
anyway); to the extent that you can make a
system that supposedly “only invents new objects” work, it will already
be open to thinking radically differently just using this one route you
gave it for thinking differently; like, any thing of this kind that
stands a chance will be largely “made of” the “objects” it is inventing
and so not understandable solely by knowing a specification of some
fixed base apparatus [and again, your own thinking is not just “operating on” your concepts, but also made (in part) of your concepts]. [One could try to see this picture in terms of the
(constitutive) ideas involved in thinking being “compute multipliers”,
with anything that gets very far in not too much compute needing to find
many compute multipliers for itself.] I guess a core
intuition/(hypo)thesis here is that it’d be profoundly
“unnatural”/“bizarre” for thinking not to be a rich,
developing, technological sort of thing, just like
doing more broadly. Like, there are many technologies
which make up a technological system that can support various doings,
and there are similarly many thinking-technologies which make up a
thinking-technological system which is good for various thinkings;
furthermore, the development of (thinking-)technologies is itself again
a rich technological thing — really, it should be the same (kind of)
thing as the system for supposedly object-level thought.
- In particular, if you try to identify science-relevant structures in
human thinking and make a system out of some explicit versions of those,
you either get a system open-endedly searching for better structures
(for which understanding the initial backbone does not grant you an understanding of the system), or you get an enfeebled shadow of human
thought that doesn’t get anywhere.
- This self-reprogramming on many/all levels that is (?\(\approx\))required to make the system work
needn’t involve being explicitly able to change any important property
one has. For instance, humans are pretty wildly
(self-)reprogrammable, even though there are many properties of, say,
our neural reward systems which we cannot alter (yet) — but we can, for
example, create contexts for ourselves in which different things end up
being rewarded by these systems (like, if you enroll at a school, you
might be getting more reward for learning; if you take a game seriously
or set out to solve some problem, your reward system will be boosting
stuff that helps you do well in that game or solve the problem); a
second example: while we would probably want to keep our thinking close
to our natural language for a long time, we can build wild ways to think
about mathematical questions (or reorganize our thinking about some
mathematical questions) while staying “inside/[adjacent to]” natural
language; a third example: while you’d probably struggle to visualize 4-dimensional scenes, you might still be able to figure out what shape gets made if you hang a 4-dimensional hypercube from one vertex and intersect it with a “horizontal” hyperplane through its center (a small numerical sketch of this puzzle follows this item).
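- A small Python sketch of that hypercube puzzle (a sketch, under the reading that hanging from a vertex makes the main-diagonal direction \((1,1,1,1)\) “vertical”, so that the “horizontal” hyperplane through the center of \([0,1]^4\) is \(x_1+x_2+x_3+x_4=2\)):

```python
# Cross-section of the 4-cube [0,1]^4 with the hyperplane x1+x2+x3+x4 = 2
# (the "horizontal" hyperplane through the center when the cube hangs from a
# vertex along the main diagonal). Its vertices turn out to be exactly the
# hypercube vertices lying on that hyperplane.
from itertools import product, combinations
import math

section = [v for v in product([0, 1], repeat=4) if sum(v) == 2]
print(len(section))  # 6 vertices

dists = sorted(round(math.dist(a, b), 6) for a, b in combinations(section, 2))
print(dists)  # 12 pairs at distance sqrt(2) and 3 pairs at distance 2:
              # the edge/diagonal pattern of a regular octahedron
```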
- Are these arguments strong enough that we should think that this
kind of thing is not ever going to be remotely competitive? I think
that’s plausible. Should they make us think that there is no set of ideas which would get us some such crisp system which proves Fermat’s Last Theorem with no more compute than fits in this galaxy (without us
handing it the proof)? Idk, maybe — very universally quantified
statements are scary to assert (well, because they are unlikely to be
true). But minimally, it is very difficult.
- Anyway, if I were forced to give an eleven-word answer to “how does
thinking work?”, I’d say “one finds good components
for thinking, and puts them to use”.
- But this finding of good components and putting them to use is not
some definite finite thing; it is still an infinitely rich thing; there
is a real infinitude of structure to employ to do this well. A human is
doing this much better than a Jupiter-sized computer doing some naive
program search, say.
- I’m dissatisfied with “one finds good components for thinking, and
puts them to use” potentially giving the false impression that [what I’m
pointing to must involve being conscious of the fact that one is looking
for components or putting them to use], which is really a very rare
feature among instances in the class I have in mind. Such explicit
self-awareness is rare even among instances of finding good components
for thinking which involve a lot of thought; here are some examples of
thinking about how to think:
- a mathematician coming up with a good mathematical concept;
- seeing a need to talk about something and coining a word for
it;
- a philosopher trying to clarify/re-engineer a concept, eg by seeing
which more precise definition could accord with the concept having some
desired “inferential role”;
- noticing and resolving tensions in one’s views;
- discovering/inventing/developing the scientific method;
inventing/developing p-values; improving peer review;
- discussing what kinds of evidence could help with some particular
scientific question;
- inventing writing; inventing textbooks;
- the varied thought that is upstream of a professional poker player thinking
the way they do when playing poker;
- asking oneself “was that a reasonable inference?”, “what auxiliary
construction would help with this mathematical problem?”, “which
techniques could work here?”, “what is the main idea of this proof?”,
“is this a good way to model the situation?”, “can I explain that
clearly?”, “what caused me to be confused about that?”, “why did I spend
so long pursuing this bad idea?”, “how could I have figured that out
faster?”, “which question are we asking, more precisely?”, “why are we
interested in this question?”, “what is this analogous to?”, “what
should I read to understand this better?”, “who would have good thoughts
on this?”.
- I prefer “one finds good components for thinking, and puts them to
use” over other common ways to say something similar that I can think of
— here are some: “(recursive) self-improvement”, “self-reprogramming”, “learning”, and maybe even “creativity” and “originality”. I do also like
“one thinks about how to think, and then thinks that way”.
- Even though intelligence isn’t that much of a natural kind, I think it makes a lot of sense for us to pay a great deal of attention to the creation of an artificial system which is smarter than humans/humanity. In that sense, there is a
highly natural threshold of general intelligence which we should indeed
be concerned about. I’ll say more about this in Note 8. (Having said
that thinking can only be infinitesimally understood and isn’t even that
much of a definite thing, let me talk about it for 20 more notes
:).)
onward to Note 5!