4 general intelligence is not that definite
- In Note 1, I claimed that “I will get some remotely definitive
understanding of problem-solving” is sorta nonsense like “I will
solve/[find a grand theor[y/em] for] math” or “I will conceive of the
ultimate technology”. One could object to this by saying “look, I’m not
trying to understand intelligence-the-infinite-thing; I’m trying to
understand intelligence as it already exists in
humans/humanity/any-one-particular-mind-that’s-sorta-generally-intelligent,
which is surely a finite thing, and so we can hope to pretty completely
understand it?”. I think this is still confused; here’s my response:
- Intelligence in
humans/humanity/any-reasonable-mind-that’s-sorta-smart will already be a
very rich thing. Humans+humanity+evolution has already done very much searching for structures of thinking, and has already found and put to use a great variety of important ones.
- [Humans are]/[humanity is] self-reprogramming. (Human
self-reprogramming needn’t involve surgery or whatever.) A central
example: [humans think]/[humanity thinks] in language(s), and
humans/humanity made language(s) — in particular, we made each word in
(each) language. Humanity is amassing an arsenal of
mathematical concepts and theorems and methods and tricks. We make
tools, some of which are clearly [parts
of]/[used in]/[playing a role in] thinking, and all of which have been involved in us doing very
much in the world. We learn how to think about various things and in
various ways; when doing research, one thinks about how to think about
something better all the time. I’m writing these notes in large part to
restructure my thinking (and hopefully that of some others) around
thinking and alignment (as opposed to like, idk, just
stating my yes/no answers to some previously well-specified questions
(though doing this sort of thing could also totally be a part of
improving thinking)).
- Anyway, yes, it’s probably reasonable to say that humanity-now has some finite (but probably “big”) specification. (Moreover, I’m pretty sure that there is a 1000-line python program such that running that program on a 2024 laptop with internet access would start a “process” which would fairly quickly take over the world and lead to some sort of technologically advanced future (like, with most of the compute used being on the laptop until pretty late in the process leading up to takeover).) Unfortunately, understanding a thing is generally much harder than specifying it. Like, consider the humble cube. Is it obvious to you that its symmetry group (including rotations only, i.e., only things you can actually do with a solid physical cube) is \(S_4\), the permutation group on \(4\) elements? (There’s a short sketch checking this just below.) Or compare knowing the weights of a neural net to understanding it.
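- A short sketch (assuming numpy is available) checking the cube claim: it enumerates the 24 rotations of the cube and confirms that their action on the cube’s 4 space diagonals realizes all 24 permutations of those diagonals, i.e., all of \(S_4\):

```python
# Sketch: the rotation group of the cube, acting on its 4 space diagonals,
# realizes every permutation in S_4 (24 rotations, 24 distinct induced permutations).
from itertools import permutations, product
import numpy as np

# Each space diagonal is represented by one endpoint (the other is its negation).
diagonals = [np.array(v) for v in [(1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1)]]

# The cube's rotations are exactly the signed 3x3 permutation matrices with det +1.
rotations = []
for perm in permutations(range(3)):
    for signs in product([1, -1], repeat=3):
        M = np.zeros((3, 3))
        for i, (j, s) in enumerate(zip(perm, signs)):
            M[i, j] = s
        if round(np.linalg.det(M)) == 1:
            rotations.append(M)

def induced_permutation(M):
    # Which diagonal does each diagonal get sent to? (A diagonal equals its negation.)
    images = []
    for d in diagonals:
        Md = M @ d
        images.append(next(k for k, e in enumerate(diagonals)
                           if np.array_equal(Md, e) or np.array_equal(Md, -e)))
    return tuple(images)

distinct = {induced_permutation(M) for M in rotations}
print(len(rotations), len(distinct))  # 24 24  -> all of S_4 is realized
```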
- The gap between specifying a thing and understanding it is
especially big when the thing is indefinitely growing, indefinitely
self-[reworking/reprogramming/improving] (as [humans are]/[humanity
is]).
- Obviously, the size of the gap between the ability to specify a
thing and understanding it depends on what we want from that
understanding — on what we want to do with it. If all we wanted from
this “understanding” was to, say, be able to print a specification of
the thing, then there would not be any gap between “understanding” the
thing and having a specification of it. Unfortunately, when we speak of
understanding intelligence, especially in the context of alignment, we
usually want to understand it in its long-term unfolding, and then there’s a massive gap — for an example, consider the gap between having in front of you the positions and momenta of atoms when evolution got started vs knowing what “evolution will be
up to” in billions of years.
- And even in this cursed reference class of comprehending
growing/structure-gaining things in their indefinite unfolding,
comprehending a thinking thing in its unfolding has a good claim to
being particularly cursed still, because thinking things have a tendency
to be doing their best to “run off to infinity” — they are actively
(though not always so explicitly) looking for new better ways to think
and new thinking-structures to incorporate.
- Relatedly: one could try to conceive of the ability to solve
problems in general as some sort of binary-ish property that a system
might have or might not have, and I think this is confused as well.
- I think it makes much more sense to talk loosely about a scale of
intelligence/understanding/capability-to-understand/capability/skill, at
least compared to talking of a binary-ish property of general
problem-solving. While this also has limitations, I’ll accept it for now
when criticizing viewing general intelligence as a binary-ish thing.
(I’m also going to accept something like the scalar view more broadly
for these notes, actually.)
- Given such a scale of intelligence, we could talk of whether a
system has reached some threshold in intelligence, or some threshold in
its pace of gaining intelligence. We could maybe talk of whether a
system has developed some certain amount of technology (for thinking),
or whether its ability to develop technology has reached a certain
level.
- We could talk of whether it has put to use in/for its
doing/thinking/fooming some certain types of
structures.
- But it seems hard to make a principled choice of threshold or of
structures to require. Like, there’s an ongoing big foom which keeps
finding/discovering/inventing/gaining new (thinking-)structures (and
isn’t anywhere close to being done — the history of thought is only just
getting started). Where (presumably before
humanity-now) would be a roughly principled place to draw a line?
- One could again try to go meta when drawing a line here, saying it’s
this capacity to incorporate novel structures itself which makes for an
intelligent thing. But this will itself again be a rich developing
thing, not a definite thing. In fact, it is not even a different thing
from [the thought dealing (supposedly) with object-level matters] for
which we just struggled to draw a line above. It’s not like we think in
one way “usually”, and in some completely different way when making new
mathematical concepts or inventing new technologies, say — the thinking
involved in each is quite similar. (Really, our ordinary thought
involves a great deal of “looking at itself”/reflection anyway — for
instance, think of a mathematician who is looking at a failed proof
attempt (which is sorta a reified line of thought) to try to fix it, or
think of someone trying to find a clearer way to express some idea, or
think of someone looking for tensions in their understanding of
something, or think of someone critiquing a view.)
- One (imo) unfortunate line of thinking which gets to thinking of general intelligence as some definite thing starts from noticing that there are nice uncomputable things like Kolmogorov complexity, Solomonoff induction (or some other kind of ideal Bayesianism), and AIXI (or some other kind of ideal expected utility maximization), and then goes on to think that it makes sense to talk of “computable approximations” of these as some definite things, perhaps imagining some actual mind already possessing/being a “computable approximation” of such an uncomputable thing.
- I think this is like thinking some theorem is “an approximate grand
formula for math”.
- It is also like thinking that a human mathematician proving theorems
is doing some “computable approximation” of searching through all
proofs. A human mathematician is really “made of” many
structures/[structural ideas].
- More generally, the actual mind will have a lot of structure which
is not remotely well-described by saying it’s a computable approximation
of an infinite thing. (But also, I don’t mean to say that it is
universally inappropriate to draw any analogy between any actual thing
and any of these infinitary things — there are surely contexts in which
such an analogy is appropriate.)
- For another example of this, an “approximate universal prediction
algorithm” being used to predict weather data could look like humans
emerging from evolution and doing philosophy and physics and inventing
computers and doing programming and machine learning, in large part by
virtue of thinking and talking to each other in language which is itself
made of very many hard-won discoveries/inventions (e.g., there are some
associated to each word), eventually making good weather simulations or
whatever — there’s very much going on here.
- Thinking of some practical string compression algorithm as a
computable approximation to Kolmogorov compression is another example in
the broader cluster. Your practical string compression algorithm will be
“using some finite collection of ideas” for compressing strings, which
is an infinitesimal fraction of “the infinitely many ideas which are
used for Kolmogorov compression”.
- One more (imo) mistake in this vicinity: that one could have a
system impressively doing math/science/tech/philosophy which has some
fixed “structure”, with only “content” being filled in, such that one is
able to understand how it works pretty well by knowing/understanding
this fixed structure. Here’s one example of a system kinda “doing” math
which has a given structure and only has “content” being “learned”: you
have a formal language, some given axioms, and some simple-to-specify
and simple-to-understand algorithm for assigning truth values to more sentences by making
deductions starting from the given axioms.
Here’s a second example of a system with a fixed structure, with only
content being filled in: you have a “pre-determined world modeling
apparatus” which is to populate a “world model” with entities (maybe
having mechanisms for positing both types of things and also particular
things) or whatever, maybe with some bayesianism involved. Could some
such thing do impressive work while being understandable?
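- Here’s a toy concretization of that first example (just a sketch; the particular “axioms” and rule names below are made up for illustration): a fixed, trivially-understandable forward-chaining procedure over propositional Horn clauses, where all the “content” lives in whatever axioms and rules you hand it.

```python
# Toy "fixed structure, content filled in" system: naive forward chaining over
# propositional Horn clauses. The deduction algorithm is simple to specify and
# simple to understand; only the axioms/rules ("content") vary.
def forward_chain(facts, rules):
    """facts: set of atoms taken as axioms; rules: list of (premises, conclusion)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and all(p in derived for p in premises):
                derived.add(conclusion)
                changed = True
    return derived

# Made-up illustrative content:
axioms = {"zero_is_a_number"}
rules = [
    ({"zero_is_a_number"}, "one_is_a_number"),
    ({"one_is_a_number"}, "two_is_a_number"),
]
print(forward_chain(axioms, rules))
```

The point of the toy: the base procedure here is fully understood, and it is also hopeless as a mathematician — which is roughly the dichotomy claimed in the next item.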
- I think that at least to a very good approximation, there are only
the following two possibilities here: either (a) the system will not be
getting anywhere (unless given much more compute than could fit in our
galaxy) — like, if it is supposed to be a system doing math, it will not
actually produce a proof of any interesting open problem or basically
any theorem from any human math textbook (without us pretty much giving
it the proof) — or (b) you don’t actually understand the working of the
system, maybe wrongly thinking you do because you confuse understanding the low-level structure of the system with understanding how it works in a fuller sense. Consider (again) the difference between knowing the
initial state and transition laws of a universe and understanding the
life that arises in it (supposing that life indeed arises in it), or the
difference between knowing the architecture of a computer + the code of
an AI-making algorithm run on it and understanding the AI that emerges.
It is surely possible for something that does impressive things in math
to arise on an understood substrate; my claim is that if this happens,
you won’t be understanding this thing doing impressive math (despite
understanding its substrate).
- Let us focus on systems doing math, because (in this context) it is easier to think about systems doing math than about systems doing
science/tech/philosophy, and because if my claim is true for math, it’d
be profoundly weird for it to be false for any of these other fields. So, could there be such a
well-understood system doing math?
- There is the following fundamental issue: to get very far (in a
reasonable amount of time/compute), the system will need to effectively
be finding/discovering/inventing better ways to think, but if it does
that, us understanding the given base structure does not get us anywhere
close to understanding the system with all its built structure. The
system will only do impressive things (reasonably quickly) if it can
make use of radical novelty, if it can think in genuinely new ways, if
it can essentially thoroughly reorganize \(\approx\)any aspect of its thinking. If you
genuinely manage to force a system to only think using/with certain
“ideas/structures”, it will be crippled.
- A response: “sure, the system will have to come up with, like,
radically new mathematical objects, but maybe the system could keep
thinking about the objects the same way forever?”. My response to this
response: there will probably need to be many kinds of structure-finding; rich structure will need to participate in these
radically new good mathematical objects being found; you will want to
think in terms of the objects, not merely about them (well, to really
think remotely well about them, you will need to think in terms of them,
anyway); to the extent that you can make a
system that supposedly “only invents new objects” work, it will already
be open to thinking radically differently just using this one route you
gave it for thinking differently; like, any thing of this kind that
stands a chance will be largely “made of” the “objects” it is inventing
and so not understandable solely by knowing a specification of some
fixed base apparatus [and again, your own thinking is not just “operating on” your concepts, but also made (in part) of your concepts]. [One could try to see this picture in terms of the
(constitutive) ideas involved in thinking being “compute multipliers”,
with anything that gets very far in not too much compute needing to find
many compute multipliers for itself.] I guess a core
intuition/(hypo)thesis here is that it’d be profoundly
“unnatural”/“bizarre” for thinking not to be a rich,
developing, technological sort of thing, just like
doing more broadly. Like, there are many technologies
which make up a technological system that can support various doings,
and there are similarly many thinking-technologies which make up a
thinking-technological system which is good for various thinkings;
furthermore, the development of (thinking-)technologies is itself again
a rich technological thing — really, it should be the same (kind of)
thing as the system for supposedly object-level thought.
- In particular, if you try to identify science-relevant structures in
human thinking and make a system out of some explicit versions of those,
you either get a system open-endedly searching for better structures
(for which understanding the initial backbone does not grant you an understanding of the system), or you get an enfeebled shadow of human
thought that doesn’t get anywhere.
- This self-reprogramming on many/all levels that is (?\(\approx\))required to make the system work
needn’t involve being explicitly able to change any important property
one has. For instance, humans are pretty wildly
(self-)reprogrammable, even though there are many properties of, say,
our neural reward systems which we cannot alter (yet) — but we can, for
example, create contexts for ourselves in which different things end up
being rewarded by these systems (like, if you enroll at a school, you
might be getting more reward for learning; if you take a game seriously
or set out to solve some problem, your reward system will be boosting
stuff that helps you do well in that game or solve the problem); a
second example: while we would probably want to keep our thinking close
to our natural language for a long time, we can build wild ways to think
about mathematical questions (or reorganize our thinking about some
mathematical questions) while staying “inside/[adjacent to]” natural
language; a third example: while you’d probably struggle to visualize 4-dimensional scenes, you might still be able to figure out what shape gets made if you hang a 4-dimensional hypercube from one vertex and intersect it with a “horizontal” hyperplane through its center (a small numerical sketch of this puzzle follows this item).
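- A small Python sketch of that hypercube puzzle (a sketch, under the reading that hanging from a vertex makes the main-diagonal direction \((1,1,1,1)\) “vertical”, so that the “horizontal” hyperplane through the center of \([0,1]^4\) is \(x_1+x_2+x_3+x_4=2\)):

```python
# Cross-section of the 4-cube [0,1]^4 with the hyperplane x1+x2+x3+x4 = 2
# (the "horizontal" hyperplane through the center when the cube hangs from a
# vertex along the main diagonal). Its vertices turn out to be exactly the
# hypercube vertices lying on that hyperplane.
from itertools import product, combinations
import math

section = [v for v in product([0, 1], repeat=4) if sum(v) == 2]
print(len(section))  # 6 vertices

dists = sorted(round(math.dist(a, b), 6) for a, b in combinations(section, 2))
print(dists)  # 12 pairs at distance sqrt(2) and 3 pairs at distance 2:
              # the edge/diagonal pattern of a regular octahedron
```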
- Are these arguments strong enough that we should think that this
kind of thing is not ever going to be remotely competitive? I think
that’s plausible. Should they make us think that there is no set of ideas which would get us some such crisp system which proves Fermat’s Last Theorem with no more compute than fits in this galaxy (without us
handing it the proof)? Idk, maybe — very universally quantified
statements are scary to assert (well, because they are unlikely to be
true). But minimally, it is very difficult.
- Anyway, if I were forced to give an eleven-word answer to “how does
thinking work?”, I’d say “one finds good components
for thinking, and puts them to use”.
- But this finding of good components and putting them to use is not
some definite finite thing; it is still an infinitely rich thing; there
is a real infinitude of structure to employ to do this well. A human is
doing this much better than a Jupiter-sized computer doing some naive
program search, say.
- I’m dissatisfied with “one finds good components for thinking, and
puts them to use” potentially giving the false impression that [what I’m
pointing to must involve being conscious of the fact that one is looking
for components or putting them to use], which is really a very rare
feature among instances in the class I have in mind. Such explicit
self-awareness is rare even among instances of finding good components
for thinking which involve a lot of thought; here are some examples of
thinking about how to think:
- a mathematician coming up with a good mathematical concept;
- seeing a need to talk about something and coining a word for
it;
- a philosopher trying to clarify/re-engineer a concept, eg by seeing
which more precise definition could accord with the concept having some
desired “inferential role”;
- noticing and resolving tensions in one’s views;
- discovering/inventing/developing the scientific method;
inventing/developing p-values; improving peer review;
- discussing what kinds of evidence could help with some particular
scientific question;
- inventing writing; inventing textbooks;
- the varied thought that is upstream of a professional poker player thinking
the way they do when playing poker;
- asking oneself “was that a reasonable inference?”, “what auxiliary
construction would help with this mathematical problem?”, “which
techniques could work here?”, “what is the main idea of this proof?”,
“is this a good way to model the situation?”, “can I explain that
clearly?”, “what caused me to be confused about that?”, “why did I spend
so long pursuing this bad idea?”, “how could I have figured that out
faster?”, “which question are we asking, more precisely?”, “why are we
interested in this question?”, “what is this analogous to?”, “what
should I read to understand this better?”, “who would have good thoughts
on this?”.
- I prefer “one finds good components for thinking, and puts them to
use” over other common ways to say something similar that I can think of
— here are some: “(recursive) self-improvement”, “self-reprogramming”, “learning”, and maybe even “creativity” and “originality”. I do also like
“one thinks about how to think, and then thinks that way”.
- Even though intelligence isn’t that much of a natural kind, I think it makes a lot of sense for us to pay a great deal of attention to the creation of an artificial system which is smarter than humans/humanity. In that sense, there is a
highly natural threshold of general intelligence which we should indeed
be concerned about. I’ll say more about this in Note 8. (Having said
that thinking can only be infinitesimally understood and isn’t even that
much of a definite thing, let me talk about it for 20 more notes
:).)
onward to Note 5!