{
"feeds": [
{
"name": "wolfram",
"url": "https://writings.stephenwolfram.com/feed/",
"folder": ""
},
{
"name": "xkcd",
"url": "https://xkcd.com/rss.xml ",
"folder": ""
},
{
"name": "korben.info",
"url": "https://www.korben.info/feed",
"folder": ""
}
],
"updateTime": 0,
"filtered": [
{
"name": "Favorites",
"read": true,
"unread": true,
"filterTags": [],
"filterFolders": [],
"filterFeeds": [],
"ignoreTags": [],
"ignoreFeeds": [],
"ignoreFolders": [],
"favorites": true,
"sortOrder": "ALPHABET_NORMAL"
},
{
"name": "read",
"sortOrder": "DATE_NEWEST",
"filterFeeds": [],
"filterFolders": [],
"filterTags": [],
"ignoreFolders": [],
"ignoreFeeds": [],
"ignoreTags": [],
"read": true
},
{
"name": "unread",
"sortOrder": "DATE_NEWEST",
"filterFeeds": [],
"filterFolders": [],
"filterTags": [],
"ignoreFolders": [],
"ignoreFeeds": [],
"ignoreTags": [],
"unread": true
}
],
"saveLocation": "default",
"displayStyle": "cards",
"saveLocationFolder": "",
"items": [
{
"title": "xkcd.com",
"subtitle": "",
"link": "https://xkcd.com/",
"image": null,
"description": "xkcd.com: A webcomic of romance and math humor.",
"items": [
{
"title": "What If We Had Bigger Brains? Imagining Minds beyond Ours",
"description": "Cats Don’t Talk We humans have perhaps 100 billion neurons in our brains. But what if we had many more? Or what if the AIs we built effectively had many more? What kinds of things might then become possible? At 100 billion neurons, we know, for example, that compositional language of the kind we humans […]",
"content": "
We humans have perhaps 100 billion neurons in our brains. But what if we had many more? Or what if the AIs we built effectively had many more? What kinds of things might then become possible? At 100 billion neurons, we know, for example, that compositional language of the kind we humans use is possible. At the 100 million or so neurons of a cat, it doesn’t seem to be. But what would become possible with 100 trillion neurons? And is it even something we could imagine understanding?
\nMy purpose here is to start exploring such questions, informed by what we’ve seen in recent years in neural nets and LLMs, as well as by what we now know about the fundamental nature of computation, and about neuroscience and the operation of actual brains (like the one that’s writing this, imaged here):
\nOne suggestive point is that as artificial neural nets have gotten bigger, they seem to have successively passed a sequence of thresholds in capability:
\nSo what’s next? No doubt there’ll be things like humanoid robotic control that have close analogs in what we humans already do. But what if we go far beyond the ~10^14 connections that our human brains have? What qualitatively new kinds of capabilities might there then be?
\nIf this was about “computation in general” then there wouldn’t really be much to talk about. The Principle of Computational Equivalence implies that beyond some low threshold computational systems can generically produce behavior that corresponds to computation that’s as sophisticated as it can ever be. And indeed that’s the kind of thing we see both in lots of abstract settings, and in the natural world.
\nBut the point here is that we’re not dealing with “computation in general”. We’re dealing with the kinds of computations that brains fundamentally do. And the essence of these seems to have to do with taking in large amounts of sensory data and then coming up with what amount to decisions about what to do next.
\nIt’s not obvious that there’d be any reasonable way to do this. The world at large is full of computational irreducibility—where the only general way to work out what will happen in a system is just to run the underlying rules for that system step by step and see what comes out:
\nAnd, yes, there are plenty of questions and issues for which there’s essentially no choice but to do this irreducible computation—just as there are plenty of cases where LLMs need to call on our Wolfram Language computation system to get computations done. But brains, for the things most important to them, somehow seem to routinely manage to “jump ahead” without in effect simulating every detail. And what makes this possible is the fundamental fact that within any system that shows overall computational irreducibility there must inevitably be an infinite number of “pockets of computational reducibility”, in effect associated with “simplifying features” of the behavior of the system.
\nIt’s these “pockets of reducibility” that brains exploit to be able to successfully “navigate” the world for their purposes in spite of its “background” of computational irreducibility. And in these terms things like the progress of science (and technology) can basically be thought of as the identification of progressively more pockets of computational reducibility. And we can then imagine that the capabilities of bigger brains could revolve around being able to “hold in mind” more of these pockets of computational reducibility.
\nWe can think of brains as fundamentally serving to “compress” the complexity of the world, and extract from it just certain features—associated with pockets of reducibility—that we care about. And for us a key manifestation of this is the idea of concepts, and of language that uses them. At the level of raw sensory input we might see many detailed images of some category of thing—but language lets us describe them all just in terms of one particular symbolic concept (say “rock”).
\nIn a rough first approximation, we can imagine that there’s a direct correspondence between concepts and words in our language. And it’s then notable that human languages all tend to have perhaps 30,000 common words (or word-like constructs). So is that scale the result of the size of our brains? And could bigger brains perhaps deal with many more words, say millions or more?
\n“What could all those words be about?” we might ask. After all, our everyday experience makes it seem like our current 30,000 words are quite sufficient to describe the world as it is. But in some sense this is circular: we’ve invented the words we have because they’re what we need to describe the aspects of the world we care about, and want to talk about. There will always be more features of, say, the natural world that we could talk about. It’s just that we haven’t chosen to engage with them. (For example, we could perfectly well invent words for all the detailed patterns of clouds in the sky, but those patterns are not something we currently feel the need to talk in detail about.)
\nBut given our current set of words or concepts, is there “closure” to it? Can we successfully operate in a “self-consistent slice of concept space” or will we always find ourselves needing new concepts? We might think of new concepts as being associated with intellectual progress that we choose to pursue or not. But insofar as the “operation of the world” is computationally irreducible it’s basically inevitable that we’ll eventually be confronted with things that cannot be described by our current concepts.
\nSo why is it that the number of concepts (or words) isn’t just always increasing? A fundamental reason is abstraction. Abstraction takes collections of potentially large numbers of specific things (“tiger”, “lion”, …) and allows them to be described “abstractly” in terms of a more general thing (say, “big cats”). And abstraction is useful if it’s possible to make collective statements about those general things (“all big cats have…”), in effect providing a consistent “higher-level” way of thinking about things.
\nIf we imagine concepts as being associated with particular pockets of reducibility, the phenomenon of abstraction is then a reflection of the existence of networks of these pockets. And, yes, such networks can themselves show computational irreducibility, which can then have its own pockets of reducibility, etc.
\nSo what about (artificial) neural nets? It’s routine to “look inside” these, and for example see the possible patterns of activation at a given layer based on a range of possible (“real-world”) inputs. We can then think of these patterns of activation as forming points in a “feature space”. And typically we’ll be able to see clusters of these points, which we can potentially identify as “emergent concepts” that we can view as having been “discovered” by the neural net (or rather, its training). Normally there won’t be existing words in human languages that correspond to most of these concepts. They represent pockets of reducibility, but not ones that we’ve identified, and that are captured by our typical 30,000 or so words. And, yes, even in today’s neural nets, there can easily be millions of “emergent concepts”.
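\nAs a rough illustrative sketch of that last point (everything here, from the blob positions to the cluster count, is made up for illustration, not taken from the essay): one can cluster the activation vectors a network produces for many inputs, and treat each well-separated cluster as a candidate “emergent concept”:

```python
import random
import math

random.seed(0)

def make_blob(cx, cy, n=50):
    """Pretend 'activation vectors' from a hidden layer: a tight
    cluster of 2-D points around a center in feature space."""
    return [(cx + random.gauss(0, 0.1), cy + random.gauss(0, 0.1))
            for _ in range(n)]

# Two kinds of input produce two well-separated blobs of activations.
activations = make_blob(0.0, 0.0) + make_blob(5.0, 5.0)

def kmeans(points, k=2, steps=20):
    """Plain k-means: alternately assign each point to its nearest
    centroid, then move each centroid to its cluster's mean."""
    centroids = [points[0], points[-1]]  # one seed from each region
    for _ in range(steps):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            cluster = [p for p, lab in zip(points, labels) if lab == j]
            if cluster:
                centroids[j] = (sum(x for x, _ in cluster) / len(cluster),
                                sum(y for _, y in cluster) / len(cluster))
    return labels

labels = kmeans(activations)
# Each recovered cluster plays the role of one "emergent concept".
```

With more neurons, and so a higher-dimensional feature space, the number of such separable clusters, and hence of candidate concepts, can grow far beyond our usual vocabulary.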
\nBut will these be useful abstractions or concepts, or merely “incidental examples of compression” not connected to anything else? The construction of neural nets implies that a pattern of “emergent concepts” at one layer will necessarily feed into the next layer. But the question is really whether the concept can somehow be useful “independently”—not just at this particular place in the neural net.
\nAnd indeed the most obvious everyday use for words and concepts—and language in general—is for communication: for “transferring thoughts” from one mind to another. Within a brain (or a neural net) there are all kinds of complicated patterns of activity, different in each brain (or each neural net). But a fundamental role that concepts, words and language play is to define a way to “package up” certain features of that activity in a form that can be robustly transported between minds, somehow inducing “comparable thoughts” in all of them.
\nThe transfer from one mind to another can never be precise: in going from the pattern of activity in one brain (or neural net) to the pattern of activity in another, there’ll always be translation involved. But—at least up to a point—one can expect that the “more that’s said” the more faithful a translation can be.
\nBut what if there’s a bigger brain, with more “emergent concepts” inside? Then to communicate about them at a certain level of precision we might need to use more words—if not a fundamentally richer form of language. And, yes, while dogs seem to understand isolated words (“sit”, “fetch”, …), we, with our larger brains, can deal with compositional language in which we can in effect construct an infinite range of meanings by combining words into phrases, sentences, etc.
\nAt least as we currently imagine it, language defines a certain model of the world, based on some finite collection of primitives (words, concepts, etc.). The existence of computational irreducibility tells us that such a model can never be complete. Instead, the model has to “approximate things” based on the “network of pockets of reducibility” that the primitives in the language effectively define. And insofar as a bigger brain might in essence be able to make use of a larger network of pockets of reducibility, it can then potentially support a more precise model of the world.
\nAnd it could then be that if we look at such a brain and what it does, it will inevitably seem closer to the kind of “incomprehensible and irreducible computation” that’s characteristic of so many abstract systems, and systems in nature. But it could also be that in being a “brain-like construct” it’d necessarily tap into computational reducibility in such a way that—with the formalism and abstraction we’ve built—we’d still meaningfully be able to talk about what it can do.
\nAt the outset we might have thought any attempt for us to “understand minds beyond ours” would be like asking a cat to understand algebra. But somehow the universality of the concepts of computation that we now know—with their ability to address the deepest foundations of physics and other fields—makes it seem more plausible we might now be in a position to meaningfully discuss minds beyond ours. Or at least to discuss the rather more concrete question of what brains like ours, but bigger than ours, might be able to do.
\nAs we’ve mentioned, at least in a rough approximation, the role of brains is to turn large amounts of sensory input into small numbers of decisions about what to do. But how does this happen?
\nHuman brains continually receive input from a few million “sensors”, mostly associated with photoreceptors in our eyes and touch receptors in our skin. This input is processed by a total of about 100 billion neurons, each responding in a few milliseconds, and mostly organized into a handful of layers. There are altogether perhaps 100 trillion connections between neurons, many quite long range. At any given moment, a few percent of neurons (i.e. perhaps a billion) are firing. But in the end, all that activity seems to feed into particular structures in the lower part of the brain that in effect “take a majority vote” a few times a second to determine what to do next—in particular with the few hundred “actuators” our bodies have.
\nThis basic picture seems to be more or less the same in all higher animals. The total number of neurons scales roughly with the number of “input sensors” (or, in a first approximation, the surface area of the animal—i.e. volume^(2/3)—which determines the number of touch sensors). The fraction of brain volume that consists of connections (“white matter”) as opposed to main parts of neurons (“gray matter”) increases as a power of the number of neurons. The largest brains—like ours—have a roughly nested pattern of folds that presumably reduce average connection lengths. Different parts of our brains have characteristic functions (e.g. motor control, handling input from our eyes, generation of language, etc.), although there seems to be enough universality that other parts can usually learn to take over if necessary. And in terms of overall performance, animals with smaller brains generally seem to react more quickly to stimuli.
\nSo what was it that made brains originally arise in biological evolution? Perhaps it had to do with giving animals a way to decide where to go next as they moved around. (Plants, which don’t move around, don’t have brains.) And perhaps it’s because animals can’t “go in more than one direction at once” that brains seem to have the fundamental feature of generating a single stream of decisions. And, yes, this is probably why we have a single thread of “conscious experience”, rather than a whole collection of experiences associated with the activities of all our neurons. And no doubt it’s also what we leverage in the construction of language—and in communicating through a one-dimensional sequence of tokens.
\nIt’s notable how similar our description of brains is to the basic operation of large language models: an LLM processes input from its “context window” by feeding it through large numbers of artificial neurons organized in layers—ultimately taking something like a majority vote to decide what token to generate next. There are differences, however, most notably that whereas brains routinely intersperse learning and thinking, current LLMs separate training from operation, in effect “learning first” and “thinking later”.
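\nThe “majority vote” at the end of an LLM’s layers can be caricatured as a softmax over scores for candidate next tokens (the vocabulary and scores below are placeholders, not from any real model):

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities; subtract the max first
    for numerical stability before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores the network assigns to four candidate next tokens.
vocab = ["cat", "dog", "rock", "the"]
logits = [2.0, 1.0, 0.1, 3.5]

probs = softmax(logits)
next_token = vocab[probs.index(max(probs))]  # greedy "majority vote"
# next_token is "the", the highest-scoring candidate
```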
\nBut almost certainly the core capabilities of both brains and neural nets don’t depend much on the details of their biological or architectural structure. It matters that there are many inputs and few outputs. It matters that there’s irreducible computation inside. It matters that the systems are trained on the world as it is. And, finally, it matters how “big” they are, in effect relative to the “number of relevant features of the world”.
\nIn artificial neural nets, and presumably also in brains, memory is encoded in the strengths (or “weights”) of connections between neurons. And at least in neural nets it seems that the number of tokens (of textual data) that can reasonably be “remembered” is a few times the number of weights. (With current methods, the number of computational operations of training needed to achieve this is roughly the product of the total number of weights and the total number of tokens.) If there are too few weights, what happens is that the “memory” gets fuzzy, with details of the fuzziness reflecting details of the structure of the network.
\nBut what’s crucial—for both neural nets and brains—is not so much to remember specifics of training data, but rather to just “do something reasonable” for a wide range of inputs, regardless of whether they’re in the training data. Or, in other words, to generalize appropriately from training data.
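\nTaking the two rules of thumb above at face value, the arithmetic for a hypothetical network looks like this (the constants are placeholders, not measured values):

```python
# Rule of thumb 1: tokens "remembered" is a few times the number of weights.
# Rule of thumb 2: training operations scale like weights * tokens.
weights = 10**11            # a hypothetical 100-billion-weight network
tokens_per_weight = 2       # placeholder constant for "a few times"

memorizable_tokens = tokens_per_weight * weights   # 2 * 10**11 tokens
training_ops = weights * memorizable_tokens        # 2 * 10**22 operations
```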
\nBut what is “appropriate generalization”? As a practical matter, it tends to be “generalization that aligns with what we humans would do”. And it’s then a remarkable fact that artificial neural nets with fairly simple architectures can successfully do generalizations in a way that’s roughly aligned with human brains. So why does this work? Presumably it’s because there are universal features of “brain-like systems” that are close enough between human brains and neural nets. And once again it’s important to emphasize that what’s happening in both cases seems distinctly weaker than “general computation”.
\nA feature of “general computation” is that it can potentially involve unbounded amounts of time and storage space. But both brains and typical neural nets have just a fixed number of neurons. And although both brains and LLMs in effect have an “outer loop” that can “recycle” output to input, it’s limited.
\nAnd at least when it comes to brains, a key feature associated with this is the limit on “working memory”, i.e. memory that can readily be both read and written “in the course of a computation”. Bigger and more developed brains typically seem to support larger amounts of working memory. Adult humans can remember perhaps 5 or 7 “chunks” of data in working memory; for young children, and other animals, it’s less. Size of working memory (as we’ll discuss later) seems to be important in things like language capabilities. And the fact that it’s limited is no doubt one reason we can’t generally “run code in our brains”.
\nAs we try to reflect on what our brains do, we’re most aware of our stream of conscious thought. But that represents just a tiny fraction of all our neural activity. Most of the activity is much less like “thought” and much more like typical processes in nature, with lots of elements seemingly “doing their own thing”. We might think of this as an “ocean of unconscious neural activity”, from which a “thread of consensus thought” is derived. Usually—much like in an artificial neural net—it’s difficult to find much regularity in that “unconscious activity”. Though when one trains oneself enough to get to the point of being able to “do something without thinking about it”, that presumably happens by organizing some part of that activity.
\nThere’s always a question of what kinds of things we can learn. We can’t overcome computational irreducibility. But how broadly can we handle what’s computationally reducible? Artificial neural nets show a certain genericity in their operation: although some specific architectures are more efficient than others, it doesn’t seem to matter much whether the input they’re fed is images or text or numbers, or whatever. And for our brains it’s probably the same—though what we’ve normally experienced, and learned from, are the specific kinds of input that come from our eyes, ears, etc. And from these, we’ve ended up recognizing certain types of regularities—that we’ve then used to guide our actions, set up our environment, etc.
\nAnd, yes, this plugs into certain pockets of computational reducibility in the world. But there’s always further one could go. And how that might work with brains bigger than ours is at the core of what we’re trying to discuss here.
\nAt some level we can view our brains as serving to take the complexity of the world and extract from it a compressed representation that our finite minds can handle. But what is the structure of that representation? A central aspect of it is that it ignores many details of the original input (like particular configurations of pixels). Or, in other words, it effectively equivalences many different inputs together.
\nBut how then do we describe that equivalence class? Implementationally, say in a neural net, the equivalence class might correspond to an attractor to which many different initial conditions all evolve. In terms of the detailed pattern of activity in the neural net the attractor will typically be very hard to describe. But on a larger scale we can potentially just think of it as some kind of robust construct that represents a class of things—or what in terms of our process of thought we might describe as a “concept”.
\nAt the lowest level there’s all sorts of complicated neural activity in our brains—most of it mired in computational irreducibility. But the “thin thread of conscious experience” that we extract from this we can for many purposes treat as being made up of higher-level “units of thought”, or essentially “discrete concepts”.
\nAnd, yes, it’s certainly our typical human experience that robust constructs—and particularly ones from which other constructs can be built—will be discrete. In principle one can imagine that there could be things like “robust continuous spaces of concepts” (“cat and dog and everything in between”). But we don’t have anything like the computational paradigm that shows us a consistent universal way that such things could fit together (there’s no robust analog of computation theory for real numbers, for example). And somehow the success of the computational paradigm—potentially all the way down to the foundations of the physical universe—doesn’t seem to leave much room for anything else.
\nSo, OK, let’s imagine that we can represent our thread of conscious experience in terms of concepts. Well, that’s close to saying that we’re using language. We’re “packaging up” the details of our neural activity into “robust elements” which we can think of as concepts—and which are represented in language essentially by words. And not only does this “packaging” into language give a robust way for different brains to communicate; it also gives a single brain a robust way to “remember” and “redeploy” thoughts.
\nWithin one brain one could imagine that one might be able to remember and “think” directly in terms of detailed low-level neural patterns. But no doubt the “neural environment” inside a brain is continually changing (not least because of its stream of sensory input). And so the only way to successfully “preserve a thought” across time is presumably to “package it up” in terms of robust elements, or essentially in terms of language. In other words, if we’re going to be able to consistently “think a particular thought” we probably have to formulate it in terms of something robust—like concepts.
\nBut, OK, individual concepts are one thing. But language—or at least human language—is based on putting together concepts in structured ways. One might take a noun (“cat”) and qualify it with an adjective (“black”) to form a phrase that’s in effect a finer-grained version of the concept represented by the noun. And in a rough approximation one can think of language as formed from trees of nested phrases like this. And insofar as the phrases are independent in their structure (i.e. “context free”), we can parse such language by recursively understanding each phrase in turn—with the constraint that we can’t do it if the nesting goes too deep for us to hold the necessary stack of intermediate steps in our working memory.
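\nA minimal sketch of that constraint (the bracket notation and the particular limit of 7 are illustrative stand-ins, not a model of real parsing): walk a nested “phrase tree” with an explicit depth counter, and fail when the nesting exceeds “working memory”:

```python
WORKING_MEMORY = 7  # stand-in for the ~5-7 "chunks" a human can hold

def parse_depth(sentence, limit=WORKING_MEMORY):
    """Return the maximum bracket-nesting depth of a phrase,
    failing (like a human parser) if the limit is exceeded."""
    depth = max_depth = 0
    for ch in sentence:
        if ch == "[":
            depth += 1
            max_depth = max(max_depth, depth)
            if depth > limit:
                raise ValueError("nesting exceeds working memory")
        elif ch == "]":
            depth -= 1
    return max_depth

parse_depth("[the [black [cat]]]")     # fine: depth 3
# parse_depth("[" * 10 + "x" + "]" * 10) would raise: depth 10 > 7
```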
\nAn important feature of ordinary human language is that it’s ultimately presented in a sequential way. Even though it may consist of a nested tree of phrases, the words that are the leaves of that tree are spoken or written in a one-dimensional sequence. And, yes, the fact that this is how it works is surely closely connected to the fact that our brains construct a single thread of conscious experience.
\nIn the actuality of the few thousand human languages currently in use, there is considerable superficial diversity, but also considerable fundamental commonality. For example, the same parts of speech (noun, verb, etc.) typically show up, as do concepts like “subject” and “object”. But the details of how words are put together, and how things are indicated, can be fairly different. Sometimes nouns have case endings; sometimes there are separate prepositions. Sometimes verb tenses are indicated by annotating the verb; sometimes with extra words. And sometimes, for example, what would usually be whole phrases can be smooshed together into single words.
\nIt’s not clear to what extent commonalities between languages are the result of shared history, and to what extent they’re consequences either of the particulars of our human sensory experience of the world, or the particular construction of our brains. It’s not too hard to get something like concepts to emerge in experiments on training neural nets to pass data through a “bottleneck” that simulates a “mind-to-mind communication channel”. But how compositionality or grammatical structure might emerge is not clear.
\nOK, but so what might change if we had bigger brains? If neural nets are a guide, one obvious thing is that we should be able to deal directly with a larger number of “distinct concepts”, or words. So what consequences would this have? Presumably one’s language would get “grammatically shallower”, in the sense that what would otherwise have had to be said with nested phrases could now be said with individual words. And presumably this would tend to lead to “faster communication”, requiring fewer words. But it would likely also lead to more rigid communication, with less ability to tweak shades of meaning, say by changing just a few words in a phrase. (And it would presumably also require longer training, to learn what all the words mean.)
\nIn a sense we have a preview of what it’s like to have more words whenever we deal with specialized versions of existing language, aimed say at particular technical fields. There are additional words of “jargon” available, that make certain things “faster to say” (but require longer to learn). And with that jargon comes a certain rigidity, in saying easily only what the jargon says, and not something slightly different.
\nSo how else could language be different with a bigger brain? With larger working memory, one could presumably have more deeply nested phrases. But what about more sophisticated grammatical structures, say ones that aren’t “context free”, in the sense that different nested phrases can’t be parsed separately? My guess is that this quickly devolves into requiring arbitrary computation—and runs into computational irreducibility. In principle it’s perfectly possible to have any program as the “message” one communicates. But if one has to run the program to “determine its meaning”, that’s in general going to involve computational irreducibility.
\nAnd the point is that with our assumptions about what “brain-like systems” do, that’s something that’s out of scope. Yes, one can construct a system (even with neurons) that can do it. But not with the “single thread of decisions from sensory input” workflow that seems characteristic of brains. (There are finer gradations one could consider—like languages that are context sensitive but don’t require general computation. But the Principle of Computational Equivalence strongly suggests that the separation between nested context-free systems and ones associated with arbitrary computation is very thin, and there doesn’t seem to be any particular reason to expect that the capabilities of a bigger brain would land right there.)
\nSaid another way: the Principle of Computational Equivalence says it’s easy to have a system that can deal with arbitrary computation. It’s just that such a system is not “brain like” in its behavior; it’s more like a typical system we see in nature.
\nOK, but what other “additional features” can one imagine, for even roughly “brain-like” systems? One possibility is to go beyond the idea of a single thread of experience, and to consider a multiway system in which threads of experience can branch and merge. And, yes, this is what we imagine happens at a low level in the physical universe, particularly in connection with quantum mechanics. And indeed it’s perfectly possible to imagine, for example, a “quantum-like” LLM system in which one generates a graph of different textual sequences. But just “scaling up the number of neurons” in a brain, without changing the overall architecture, won’t get to this. We have to have a different, multiway architecture. Where we have a “graph of consciousness” rather than a “stream of consciousness”, and where, in effect, we’re “thinking a graph of thoughts”, notably with thoughts themselves being able to branch and merge.
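\nA toy version of such a multiway system, with made-up string-rewriting rules standing in for “thoughts”: at each step every applicable rewrite is performed at every matching position, so a single state branches into a whole set of successors:

```python
def multiway_step(strings, rules):
    """Apply every rule at every matching position in every string,
    collecting all distinct successors -- a branching of 'thoughts'."""
    out = set()
    for s in strings:
        for lhs, rhs in rules:
            start = s.find(lhs)
            while start != -1:
                out.add(s[:start] + rhs + s[start + len(lhs):])
                start = s.find(lhs, start + 1)
    return out

rules = [("A", "AB"), ("B", "A")]
gen0 = {"A"}
gen1 = multiway_step(gen0, rules)   # one successor: {"AB"}
gen2 = multiway_step(gen1, rules)   # branches: {"ABB", "AA"}
```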
\nIn our practical use of language, it’s most often communicated in spoken or written form—effectively as a one-dimensional sequence of tokens. But in math, for example, it’s common to have a certain amount of 2D structure, and in general there are also all sorts of specialized (usually technical) diagrammatic representations in use, often based on using graphs and networks—as we’ll discuss in more detail below.
\nBut what about general pictures? Normally it’s difficult for us to produce these. But in generative AI systems it’s basically easy. So could we then imagine directly “communicating mental images” from one mind to another? Maybe as a practical matter some neural implant in our brain could aggregate neural signals from which a displayed image could be generated. But is there in fact something coherent that could be extracted from our brains in this way? Perhaps that can only happen after “consensus is formed”, and we’ve reduced things to a much thinner “thread of experience”. Or, in other words, perhaps the only robust way for us to “think about images” is in effect to reduce them to discrete concepts and language-like representations.
\nBut perhaps if we “had the hardware” to display images directly from our minds it’d be a different story. And it’s sobering to imagine that perhaps the reason cats and dogs don’t appear to have compositional language is just that they don’t “have the hardware” to talk like we do (and it’s too laborious for them to “type with their paws”, etc.). And, by analogy, that if we “had the hardware” for displaying images, we’d discover we could also “think very differently”.
\nOf course, in some small ways we do have the ability to “directly communicate with images”, for example in our use of gestures and body language. Right now, these seem like largely ancillary forms of communication. But, yes, it’s conceivable that with bigger brains, they could be more.
\nAnd when it comes to other animals the story can be different. Cuttlefish are notable for dynamically producing elaborate patterns on their skin—giving them in a sense the hardware to “communicate in pictures”. But so far as one can tell, they produce just a small number of distinct patterns—and certainly nothing like a “pictorial generalization of compositional language”. (In principle one could imagine that “generalized cuttlefish” could do things like “dynamically run cellular automata on their skin”, just like all sorts of animals “statically” do in the process of growth or development. But to decode such patterns—and thereby in a sense enable “communicating in programs”—would typically require irreducible amounts of computation that are beyond the capabilities of any standard brain-like system.)
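\nThe cellular automaton aside can be made concrete with a minimal 1D elementary-automaton step function (rule 30 is used here as a standard example of simple rules producing complex behavior):

```python
def ca_step(cells, rule=30):
    """One step of an elementary cellular automaton on a row of 0/1
    cells (fixed 0 boundaries). The rule number's binary digits are
    the lookup table indexed by the 3-cell neighborhood value."""
    padded = [0] + cells + [0]
    return [
        (rule >> (padded[i] * 4 + padded[i + 1] * 2 + padded[i + 2])) & 1
        for i in range(len(cells))
    ]

row = [0] * 5 + [1] + [0] * 5      # a single "on" cell in the middle
for _ in range(3):
    row = ca_step(row)
```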
\nWe humans have raw inputs coming into our brains from a few million sensors distributed across our usual senses of touch, sight, hearing, taste and smell (together with balance, temperature, hunger, etc.). In most cases the detailed sensor inputs are not independent; in a typical visual scene, for example, neighboring pixels are highly correlated. And it doesn’t seem to take many layers of neurons in our brains to distill our typical sensory experience from pure pieces of “raw data” to what we might view as “more independent features”.
\nOf course there’ll usually be much more in the raw data than just those features. But the “features” typically correspond to aspects of the data that we’ve “learned are useful to us”—normally connected to pockets of computational reducibility that exist in the environment in which we operate. Are the features we pick out all we’ll ever need? In the end, we typically want to derive a small stream of decisions or actions from all the data that comes in. But how many “intermediate features” do we need to get “good” decisions or actions?
\nThat really depends on two things. First, what our decisions and actions are like. And second, what our raw data is like. Early in the history of our species, everything was just about “indigenous human experience”: what the natural world is like, and what we can do with our bodies. But as soon as we were dealing with technology, that changed. And in today’s world we’re constantly exposed, for example, to visual input that comes not from the natural world, but, say, from digital displays.
\nAnd, yes, we often try to arrange our “user experience” to align with what’s familiar from the natural world (say by having objects that stay unchanged when they’re moved across the screen). But it doesn’t have to be that way. And indeed it’s easy—even with simple programs—to generate for example visual images very different from what we’re used to. And in many such cases, it’s very hard for us to “tell what’s going on” in the image. Sometimes it’ll just “look too complicated”. Sometimes it’ll seem like it has pieces we should recognize, but we don’t:
\nWhen it’s “just too complicated”, that’s often a reflection of computational irreducibility. But when there are pieces we might “think we should recognize”, that can be a reflection of pockets of reducibility we’re just not familiar with. If we imagine a space of possible images—as we can readily produce with generative AI—there will be some that correspond to concepts (and words) we’re familiar with. But the vast majority will effectively lie in “interconcept space”: places where we could have concepts, but don’t, at least yet:
\nSo what could bigger brains do with all this? Potentially they could handle more features, and more concepts. Full computational irreducibility will always in effect ultimately overpower them. But when it comes to handling pockets of reducibility, they’ll presumably be able to deal with more of them. So in the end, it’s very much as one might expect: a bigger brain should be able to track more things going on, “see more details”, etc.
\nBrains of our size seem like they are in effect sufficient for “indigenous human experience”. But with technology in the picture, it’s perfectly possible to “overload” them. (Needless to say, technology—in the form of filtering, data analysis, etc.—can also reduce that overload, in effect taking raw input and bringing our actual experience of it closer to something “indigenous”.)
\nIt’s worth pointing out that while two brains of a given size might be able to “deal with the same number of features or concepts”, those features or concepts might be different. One brain might have learned to talk about the world in terms of one set of primitives (such as certain basic colors); another in terms of a different set of primitives. But if both brains are sampling “indigenous human experience” in similar environments one can expect that it should be possible to translate between these descriptions—just as it is generally possible to translate between things said in different human languages.
\nBut what if the brains are effectively sampling “different slices of reality”? What if one’s using technology to convert different physical phenomena to forms (like images) that we can “indigenously” handle? Perhaps we’re sensing different electromagnetic frequencies; perhaps we’re sensing molecular or chemical properties; perhaps we’re sensing something like fluid motion. The kinds of features that will be “useful” may be quite different in these different modalities. Indeed, even something as seemingly basic as the notion of an “object” may not be so relevant if our sensory experience is effectively of continuous fluid motion.
\nBut in the end, what’s “useful” will depend on what we can do. And once again, it depends on whether we’re dealing with “pure humans” (who can’t, for example, move like octopuses) or with humans “augmented by technology”. And here we start to see an issue that relates to the basic capabilities of our brains.
\nAs “pure humans”, we have certain “actuators” (basically in the form of muscles) that we can “indigenously” operate. But with technology it’s perfectly possible for us to use quite different actuators in quite different configurations. And as a practical matter, with brains like ours, we may not be able to make them work.
\nFor example, while humans can control helicopters, they never managed to control quadcopters—at least not until digital flight controllers could do most of the work. In a sense there were just too many degrees of freedom for brains like ours to deal with. Should bigger brains be able to do more? One would think so. And indeed one could imagine testing this with artificial neural nets. In millipedes, for example, their actual brains seem to support only a couple of patterns of motion of their legs (roughly, same phase vs. opposite phase). But one could imagine that with a bigger brain, all sorts of other patterns would become possible.
\nUltimately, there are two issues at stake here. The first is having a brain be able to “independently address” enough actuators, or in effect enough degrees of freedom. The second is having a brain be able to control those degrees of freedom. And for example with mechanical degrees of freedom there are again essentially issues of computational irreducibility. Looking at the space of possible configurations—say of millipede legs—does one effectively just have to trace the path to find out if, and how, one can get from one configuration to another? Or are there instead pockets of reducibility, associated with regularities in the space of configurations, that let one “jump ahead” and figure this out without tracing all the steps? It’s those pockets of reducibility that brains can potentially make use of.
\nWhen it comes to our everyday “indigenous” experience of the world, we are used to certain kinds of computational reducibility, associated for example with familiar natural laws, say about motion of objects. But what if we were dealing with different experiences, associated with different senses?
\nFor example, imagine (as with dogs) that our sense of smell was better developed than our sense of sight—as reflected by more nerves coming into our brains from our noses than our eyes. Our description of the world would then be quite different, based for example not on geometry revealed by the line-of-sight arrival of light, but instead by the delivery of odors through fluid motion and diffusion—not to mention the probably-several-hundred-dimensional space of odors, compared to the red, green, blue space of colors. Once again there would be features that could be identified, and “concepts” that could be defined. But those might only be useful in an environment “built for smell” rather than one “built for sight”.
\nAnd in the end, how many concepts would be useful? I don’t think we have any way to know. But it certainly seems as if one can be a successful “smell-based animal” with a smaller brain (presumably supporting fewer concepts) than one needs as a successful “sight-based animal”.
\nOne feature of “natural senses” is that they tend to be spatially localized: an animal basically senses things only where it is. (We’ll discuss the case of social organisms later.) But what if we had access to a distributed array of sensors—say associated with IoT devices? The “effective laws of nature” that one could perceive would then be different. Maybe there would be regularities that could be captured by a small number of concepts, but it seems more likely that the story would be more complicated, and that in effect one would “need a bigger brain” to be able to keep track of what’s going on, and make use of whatever pockets of reducibility might exist.
\nThere are somewhat similar issues if one imagines changing the timescales for sensory input. Our perception of space, for example, depends on the fact that light travels fast enough that in the milliseconds it takes our brain to register the input, we’ve already received light from everything that’s around us. But if our brains operated a million times faster (as digital electronics does) we’d instead be registering individual photons. And while our brains might aggregate these to something like what we ordinarily perceive, there may be all sorts of other (e.g. quantum optics) effects that would be more obvious.
\nThe more abstractly we try to think, the harder it seems to get. But would it get easier if we had bigger brains? And might there perhaps be fundamentally higher levels of abstraction that we could reach—but only if we had bigger brains?
\nAs a way to approach such questions, let’s begin by talking a bit about the history of the phenomenon of abstraction. We might already say that basic perception involves some abstraction, capturing as it does a filtered version of the world as it actually is. But perhaps we reach a different level when we start to ask “what if?” questions, and to imagine how things in the world could be different than they are.
\nBut somehow when it comes to us humans, it seems as if the greatest early leap in abstraction was the invention of language, and the explicit delineation of concepts that could be quite far from our direct experience. The earliest written records tend to be rather matter-of-fact, mostly recording as they do events and transactions. But already there are plenty of signs of abstraction. Numbers independent of what they count. Things that should happen in the future. The concept of money.
\nThere seems to be a certain pattern to the development of abstraction. One notices that some category of things one sees many times can be considered similar, and then one “packages these up” into a concept, often described by a word. And in many cases, there’s a certain kind of self-amplification: once one has a word for something (as a modern example, say “blog”), it becomes easier for us to think about the thing, and we tend to see it or make it more often in the world around us. But what really makes abstraction take off is when we start building a whole tower of it, with one abstract concept recursively being based on others.
\nHistorically this began quite slowly. And perhaps it was seen first in theology. There were glimmerings of it in things like early (syllogistic) logic, in which one started to be able to talk about the form of arguments, independent of their particulars. And then there was mathematics, where computations could be done just in terms of numbers, independent of where those numbers came from. And, yes, while there were tables of “raw computational results”, numbers were usually discussed in terms of what they were numbers of. And indeed when it came to things like measures of weight, it took until surprisingly modern times for there to be an absolute, abstract notion of weight, independent of whether it was a weight of figs or of wool.
\nThe development of algebra in the early modern period can be considered an important step forward in abstraction. Now there were formulas that could be manipulated abstractly, without even knowing what particular numbers x stood for. But it would probably be fair to say that there was a major acceleration in abstraction in the 19th century—with the development of formal systems that could be discussed in “purely symbolic form” independent of what they might (or might not) “actually represent”.
\nAnd it was from this tradition that modern notions of computation emerged (and indeed particularly ones associated with symbolic computation that I personally have extensively used). But the most obvious area in which towers of abstraction have been built is mathematics. One might start with numbers (that could count things). But soon one’s on to variables, functions, spaces of functions, category theory—and a zillion other constructs that abstractly build on each other.
\nThe great value of abstraction is that it allows one to think about large classes of things all at once, instead of each separately. But how do those abstract concepts fit together? The issue is that often it’s in a way that’s very remote from anything about which we have direct experience from our raw perception of the world. Yes, we can define concepts about transfinite numbers or higher categories. But they don’t immediately relate to anything we’re familiar with from our everyday experience.
\nAs a practical matter one can often get a sense of how high something is on the tower of abstraction by seeing how much one has to explain to build up to it from “raw experiential concepts”. Just sometimes it turns out that, once one hears about a certain seemingly “highly abstract” concept, one can actually explain it surprisingly simply, without going through the whole historical chain that led to it. (A notable example of this is the concept of universal computation—which arose remarkably late in human intellectual history, but is now quite easy to explain, albeit particularly given its actual widespread embodiment in technology.) But the more common case is that there’s no choice but to explain a whole tower of concepts.
\nAt least in my experience, however, when one actually thinks about “highly abstract” things, one does it by making analogies to more familiar, more concrete things. The analogies may not be perfect, but they provide scaffolding which allows our brains to take what would otherwise be quite inaccessible steps.
\nAt some level any abstraction is a reflection of a pocket of computational reducibility. Because if a useful abstraction can be defined, what it means is that it’s possible to say something in a “summarized” or reduced way, in effect “jumping ahead”, without going through all the computational steps or engaging with all the details. And one can then think of towers of abstraction as being like networks of pockets of computational reducibility. But, yes, it can be hard to navigate these.
\nUnderneath, there’s lots of computational irreducibility. And if one is prepared to “go through all the steps” one can often “get to an answer” without all the “conceptual difficulty” of complex abstractions. But while computers can often readily “go through all the steps”, brains can’t. And that’s in a sense why we have to use abstraction. But inevitably, even if we’re using abstraction, and the pockets of computational reducibility associated with it, there’ll be shadows of the computational irreducibility underneath. And in particular, if we try to “explore everything”, our network of pockets of reducibility will inevitably “get complicated”, and ultimately also be mired in computational irreducibility, albeit with “higher-level” constructs than in the computational irreducibility underneath.
\nNo finite brain will ever be able to “go all the way”, but it starts to seem likely that a bigger brain will be able to “reach further” in the network of abstraction. But what will it find there? How does the character of abstraction change when we take it further? We’ll be able to discuss this a bit more concretely when we talk about computational language below. But perhaps the main thing to say now is that—at least in my experience—most higher abstractions don’t feel as if they’re “structurally different” once one understands them. In other words, most of the time, it seems as if the same patterns of thought and reasoning that one’s applied in many other places can be applied there too, just to different kinds of constructs.
\nSometimes, though, there seem to be exceptions. Shocks to intuition that seem to separate what one’s now thinking about from anything one’s thought before. And, for example, for me this happened when I started looking broadly at the computational universe. I had always assumed that simple rules would lead to simple behavior. But many years ago I discovered that in the computational universe this isn’t true (hence computational irreducibility). And this led to a whole different paradigm for thinking about things.
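\nThe canonical example here is the rule 30 cellular automaton: its update rule fits on one line, yet the pattern it generates from a single black cell shows no evident regularity. A minimal sketch in Python (the rule itself, left XOR (center OR right), is the standard rule 30 definition; the grid width and step count here are arbitrary choices for illustration):

```python
def rule30_step(cells):
    """One step of the rule 30 cellular automaton (cyclic boundary):
    each cell's new value is left_neighbor XOR (cell OR right_neighbor)."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n])
            for i in range(n)]

# Evolve from a single black cell and display the rows
row = [0] * 31
row[15] = 1
for _ in range(15):
    print("".join("█" if c else " " for c in row))
    row = rule30_step(row)
```

Despite the triviality of the rule, the resulting triangle of cells is complex enough that its center column has been used as a practical randomness generator.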
\nIt feels a bit like in metamathematics. Where one can imagine one type of abstraction associated with different constructs out of which to form theorems. But where somehow there’s another level associated with different ways to build new theorems, or indeed whole spaces of theorems. Or to build proofs from proofs, or proofs from proofs of proofs, etc. But the remarkable thing is that there seems to be an ultimate construct that encompasses it all: the ruliad.
\nWe can describe the ruliad as the entangled limit of all possible computations. But we can also describe it as the limit of all possible abstractions. And it seems to lie underneath all physical reality, as well as all possible mathematics, etc. But, we might ask, how do brains relate to it?
\nInevitably, it’s full of computational irreducibility. And looked at as a whole, brains can’t get far with it. But the key idea is to think about how brains as they are—with all their various features and limitations—will “parse” it. And what I’ve argued is that what “brains as they are” will perceive about the ruliad are the core laws of physics (and mathematics) as we know them. In other words, it’s because brains are the way they are that we perceive the laws of physics that we perceive.
\nWould it be different for bigger brains? Not if they’re the “same kind of brains”. Because what seems to matter for the core laws of physics are really just two properties of observers. First, that they’re computationally bounded. And second, that they believe they are persistent in time, and have a single thread of experience through time. And both of these seem to be core features of what makes brains “brain-like”, rather than just arbitrary computational systems.
\nIt’s a remarkable thing that just these features are sufficient to make core laws of physics inevitable. But if we want to understand more about the physics we’ve constructed—and the laws we’ve deduced—we probably have to understand more about what we’re like as observers. And indeed, as I’ve argued elsewhere, even our physical scale (much bigger than molecules, much smaller than the whole universe) is for example important in giving us the particular experience (and laws) of physics that we have.
\nWould this be different with bigger brains? Perhaps a little. But anything that something brain-like can do pales in comparison to the computational irreducibility that exists in the ruliad and in the natural world. Nevertheless, with every new pocket of computational reducibility that’s reached we get some new abstraction about the world, or in effect, some new law about how the world works.
\nAnd as a practical matter, each such abstraction can allow us to build a whole collection of new ways of thinking about the world, and making things in the world. It’s challenging to trace this arc. Because in a sense it’ll all be about “things we never thought to think about before”. Goals we might define for ourselves that are built on a tower of abstraction, far away from what we might think of as “indigenous human goals”.
\nIt’s important to realize that there won’t just be one tower of abstraction that can be built. There’ll inevitably be an infinite network of pockets of computational reducibility, with each path leading to a different specific tower of abstraction. And indeed the abstractions we have pursued reflect the particular arc of human intellectual history. Bigger brains—or AIs—have many possible directions they can go, each one defining a different path of history.
\nOne question to ask is to what extent reaching higher levels of abstraction is a matter of education, and to what extent it requires additional intrinsic capabilities of a brain. It is, I suspect, a mixture. Sometimes it’s really just a question of knowing “where that pocket of reducibility is”, which is something we can learn from education. But sometimes it’s a question of navigating a network of pockets, which may only be possible when brains reach a certain level of “computational ability”.
\nThere’s another thing to discuss, related to education. And that’s the fact that over time, more and more “distinct pieces of knowledge” get built up in our civilization. There was perhaps a time in history when a brain of our size could realistically commit to memory at least the basics of much of that knowledge. But today that time has long passed. Yes, abstraction in effect compresses what one needs to know. But the continual addition of new and seemingly important knowledge, across countless specialties, makes it impossible for brains of our size to keep up.
\nPlenty of that knowledge is, though, quite siloed in different areas. But sometimes there are “grand analogies” to make—say pulling an idea from relativity theory and applying it to biological evolution. In a sense such analogies reveal new abstractions—but to make them requires knowledge that spans many different areas. And that’s a place where bigger brains—or AIs—can potentially do something that’s in a fundamental way “beyond us”.
\nWill there always be such “grand analogies” to make? The general growth of knowledge is inevitably a computationally irreducible process. And within it there will inevitably be pockets of reducibility. But how often in practice will one actually encounter “long-range connections” across “knowledge space”? As a specific example one can look at metamathematics, where such connections are manifest in theorems that link seemingly different areas of mathematics. And this example leads one to realize that at some deep level grand analogies are in a sense inevitable. In the context of the ruliad, one can think of different domains of knowledge as corresponding to different parts. But the nature of the ruliad—encompassing as it does everything that is computationally possible—inevitably imbues it with a certain homogeneity, which implies that (as the Principle of Computational Equivalence might suggest) there must ultimately be a correspondence between different areas. In practice, though, this correspondence may be at a very “atomic” (or “formal”) level, far below the kinds of descriptions (based on pockets of reducibility) that we imagine brains normally use.
\nBut, OK, will it always take an “expanding brain” to keep up with the “expanding knowledge” we have? Computational irreducibility guarantees that there’ll always in principle be “new knowledge” to be had—separated from what’s come before by irreducible amounts of computation. But then there’s the question of whether in the end we’ll care about it. After all, it could be that the knowledge we can add is so abstruse that it will never affect any practical decisions we have to make. And, yes, to some extent that’s true (which is why only some tiny fraction of the Earth’s population will care about what I’m writing here). But another consequence of computational irreducibility is that there will always be “surprises”—and those can eventually “push into focus” even what at first seems like arbitrarily obscure knowledge.
\nLanguage in general—and compositional language in particular—is arguably the greatest invention of our species. But is it somehow “the top”—the highest possible representation of things? Or if, for example, we had bigger brains, is there something beyond it that we could reach?
\nWell, in some very formal sense, yes, compositional language (at least in idealized form) is “the top”. Because—at least if it’s allowed to include utterances of any length—it can in principle encode arbitrary, universal computations. But this really isn’t true in any useful sense—and indeed to apply ordinary compositional language in this way would require doing computationally irreducible computations.
\nSo we return to the question of what might in practice lie beyond ordinary human language. I wondered about this for a long time. But in the end I realized that the most important clue is in a sense right in front of me: the concept of computational language, that I’ve spent much of my life exploring.
\nIt’s worth saying at the outset that the way computational language plays out for computers and for brains is somewhat different, and in some respects complementary. In computers you might specify something as a Wolfram Language symbolic expression, and then the “main action” is to evaluate this expression, potentially running a long computation to find out what the expression evaluates to.
\nBrains aren’t set up to do long computations like this. For them a Wolfram Language expression is something to use in effect as a “representation of a thought”. (And, yes, that’s an important distinction between the computational language concept of Wolfram Language, and standard “programming languages”, which are intended purely as a way to tell a computer what to do, not a way to represent thoughts.)
\nSo what kinds of thoughts can we readily represent in our computational language? There are ones involving explicit numbers, or mathematical expressions. There are ones involving cities and chemicals, and other real-world entities. But then there are higher-level ones, that in effect describe more abstract structures.
\nFor example, there’s NestList, which gives the result of nesting any operation, here named f:
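\nFor readers who don’t use Wolfram Language: NestList[f, x, n] returns the list {x, f[x], f[f[x]], …}, with f applied up to n times. A minimal Python analogue of this higher-order construct (a sketch for illustration, not Wolfram’s implementation):

```python
def nest_list(f, x, n):
    """Analogue of Wolfram Language's NestList: return
    [x, f(x), f(f(x)), ..., f applied n times to x]."""
    results = [x]
    for _ in range(n):
        x = f(x)
        results.append(x)
    return results

print(nest_list(lambda k: 2 * k, 1, 5))  # [1, 2, 4, 8, 16, 32]
```

Note that nest_list takes a function f as an argument: it operates on operations, not on data.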
\nAt the outset, it’s not obvious that this would be a useful thing to do. But in fact it’s a very successful abstraction: there are lots of functions f for which one wants to do this.
\nIn the development of ordinary human language, words tend to get introduced when they’re useful, or, in other words, when they express things one often wants to express. But somehow in human language the words one gets tend to be more concrete. Maybe they describe something that directly happens to objects in the world. Maybe they describe our impression of a human mental state. Yes, one can make rather vague statements like “I’m going to do something to someone”. But human language doesn’t normally “go meta”, doing things like NestList where one’s saying that one wants to take some “direct statement” and in effect “work with the statement”. In some sense, human language tends to “work with data”, applying a simple analog of code to it. Our computational language can “work with code” as “raw material”.
\nOne can think about this as a “higher-order function”: a function that operates not on data, but on functions. And one can keep going, dealing with functions that operate on functions that operate on functions, and so on. And at every level one is increasing the generality—and abstraction—at which one is working. There may be many specific functions (a bit analogous to verbs) that operate on data (a bit analogous to nouns). But when we talk about operating on functions themselves we can potentially have just a single function (like NestList) that operates, quite generally, on many functions. In ordinary language, we might call such things “metaverbs”, but they aren’t something that commonly occurs.
\nBut what makes them possible in computational language? Well, it’s taking the computational paradigm seriously, and representing everything in computational terms: objects, actions, etc. In Wolfram Language, it’s that we can represent everything as a symbolic expression. Arrays of numbers (or countries, or whatever) are symbolic expressions. Graphics are symbolic expressions. Programs are symbolic expressions. And so on.
\nAnd given this uniformity of representation it becomes feasible—and natural—to do higher-order operations, that in effect manipulate symbolic structure without being concerned about what the structure might represent. At some level we can view this as leading to the ultimate abstraction embodied in the ruliad, where in a sense “everything is pure structure”. But in practice in Wolfram Language we try to “anchor” what we’re doing to known concepts from ordinary human language—so that we use names for things (like NestList) that are derived from common English words.
\nIn some formal sense this isn’t necessary. Everything can be “purely structural”, as it is not only in the ruliad but also in constructs like combinators, where, say, the operation of addition can be represented by:
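\nA taste of this kind of “purely structural” representation is the Church encoding, in which numbers, and addition itself, are built from nothing but nested function application. A minimal sketch in Python (using lambdas as a stand-in for true combinators, which dispense even with variable names; this is the standard Church encoding, not the specific combinator expression discussed above):

```python
# A Church numeral n is a function that applies f to x exactly n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# Addition is itself a purely structural operation on these functions:
# apply f m times, then n more times.
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(church):
    """Decode a Church numeral back to an ordinary integer."""
    return church(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)
print(to_int(add(two)(three)))  # prints 5
```

Everything here, including the numbers, is “just functions”, which is precisely what makes such representations so hard for brains to anchor to familiar concepts.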
\nCombinators have been around for more than a century. But they are almost impenetrably difficult for most humans to understand. Somehow they involve too much “pure abstraction”, not anchored to concepts we “have a sense of” in our brains.
\nIt’s been interesting for me to observe over the years what it’s taken for people (including myself) to come to terms with the kind of higher-order constructs that exist in the Wolfram Language. The typical pattern is that over the course of months or years one gets used to lots of specific cases. And only after that is one able—often in the end rather quickly—to “get to the next level” and start to use some generalized, higher-order construct. But normally one can in effect only “go one level at a time”. After one groks one level of abstraction, that seems to have to “settle” for a while before one can go on to the next one.
\nSomehow it seems as if one is gradually “feeling out” a certain amount of computational irreducibility, to learn about a new pocket of reducibility, that one can eventually use to “think in terms of”.
\nCould “having a bigger brain” speed this up? Maybe it’d be useful to be able to remember more cases, and perhaps get more into “working memory”. But I rather suspect that combinators, for example, are in some sense fundamentally beyond all brain-like systems. It’s much as the Principle of Computational Equivalence suggests: one quickly “ascends” to things that are as computationally sophisticated as anything—and therefore inevitably involve computational irreducibility. There are only certain specific setups that remain within the computationally bounded domain that brain-like systems can deal with.
\nOf course, even though they can’t directly “run code in their brains”, humans—and LLMs—can perfectly well use Wolfram Language as a tool, getting it to actually run computations. And this means they can readily “observe phenomena” that are computationally irreducible. And indeed in the end it’s very much the same kind of thing to observe such phenomena in the abstract computational universe as in the “real” physical universe. And the point is that in both cases, brain-like systems will pull out only certain features, essentially corresponding to pockets of computational reducibility.
\nHow do things like higher-order functions relate to this? At this point it’s not completely clear. Presumably in at least some sense there are hierarchies of higher-order functions that capture certain kinds of regularities that can be thought of as associated with networks of computational reducibility. And it’s conceivable that category theory and its higher-order generalizations are relevant here. In category theory one imagines applying sequences of functions (“morphisms”) and it’s a foundational assumption that the effect of any sequence of functions can also be represented by just a single function—which seems tantamount to saying that one can always “jump ahead”, or in other words, that everything one’s dealing with is computationally reducible. Higher-order category theory then effectively extends this to higher-order functions, but always with what seem like assumptions of computational reducibility.
\nAnd, yes, this all seems highly abstract, and difficult to understand. But does it really need to be, or is there some way to “bring it down” to a level that’s close to everyday human thinking? It’s not clear. But in a sense the core art of computational language design (that I’ve practiced so assiduously for nearly half a century) is precisely to take things that at first might seem abstruse, and somehow cast them into an accessible form. And, yes, this is something that’s about as intellectually challenging as anything—because in a sense it involves continually trying to “figure out what’s really going on”, and in effect “drilling down” to get to the foundations of everything.
\nBut, OK, when one gets there, how simple will things be? Part of that depends on how much computational irreducibility is left when one reaches what one considers to be “the foundations”. And part in a sense depends on the extent to which one can “find a bridge” between the foundations and something that’s familiar. Of course, what’s “familiar” can change. And indeed over the four decades that I’ve been developing the Wolfram Language quite a few things (particularly in areas like functional programming) that at first seemed abstruse and unfamiliar have begun to seem more familiar. And, yes, it’s taken the collective development and dissemination of the relevant ideas to achieve that. But now it “just takes education”; it doesn’t “take a bigger brain” to deal with these things.
\nOne of the core features of the Wolfram Language is that it represents everything as a symbolic expression. And, yes, symbolic expressions are formally able to represent any kind of computational structure. But beyond that, the important point is that they’re somehow set up to be a match for how brains work.
\nAnd in particular, symbolic expressions can be thought of “grammatically” as consisting of nested functions that form a tree-like structure; effectively a more precise version of the typical kind of grammar that we find in human language. And, yes, just as we manage to understand and generate human language with a limited working memory, so (at least at the grammatical level) we can do the same thing with computational language. In other words, in dealing with Wolfram Language we’re leveraging our faculties with human language. And that’s why Wolfram Language can serve as such an effective bridge between the way we think about things, and what’s computationally possible.
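\nAs a rough illustration of this "nested functions forming a tree" idea, here is a toy encoding of symbolic expressions as nested tuples (a hypothetical mini-representation for illustration, not the Wolfram Language's actual internals):

```python
# Toy symbolic expressions as nested tuples: (head, arg1, arg2, ...).
# A hypothetical mini-encoding for illustration; atoms are plain strings.
def depth(expr):
    """Nesting depth of an expression tree (atoms count as depth 1)."""
    if not isinstance(expr, tuple):
        return 1
    return 1 + max(depth(arg) for arg in expr[1:])

def heads(expr):
    """The set of 'function heads' appearing anywhere in the expression."""
    if not isinstance(expr, tuple):
        return set()
    return {expr[0]}.union(*(heads(arg) for arg in expr[1:]))

# f[g[x], y] as a tree:
e = ("f", ("g", "x"), "y")
assert depth(e) == 3
assert heads(e) == {"f", "g"}
print(depth(e), sorted(heads(e)))  # → 3 ['f', 'g']
```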
\nBut symbolic expressions represented as trees aren’t the only conceivable structures. It’s also possible to have symbolic expressions where the elements are nodes on a graph, and the graph can even have loops in it. Or one can go further, and start talking, for example, about the hypergraphs that appear in our Physics Project. But the point is that brain-like systems have a hard time processing such structures. Because to keep track of what’s going on they in a sense have to keep track of multiple “threads of thought”. And that’s not something individual brain-like systems as we currently envision them can do.
\nAs we’ve discussed several times here, it seems to be a key feature of brains that they create a single “thread of experience”. But what would it be like to have multiple threads? Well, we actually have a very familiar example of that: what happens when we have a whole collection of people (or other animals).
\nOne could imagine that biological evolution might have produced animals whose brains maintain multiple simultaneous threads of experience. But somehow it has ended up instead restricting each animal to just one thread of experience—and getting multiple threads by having multiple animals. (Conceivably creatures like octopuses may actually in some sense support multiple threads within one organism.)
\nWithin a single brain it seems important to always “come to a single, definite conclusion”—say to determine where an animal will “move next”. But what about in a collection of organisms? Well, there’s still some kind of coordination that will be important to the fitness of the whole population—perhaps even something as direct as moving together as a herd or flock. And in a sense, just as all those different neuron firings in one brain get collected to determine a “final conclusion for what to do”, so similarly the conclusions of many different brains have to be collected to determine a coordinated outcome.
\nBut how can a coordinated outcome arise? Well, there has to be communication of some sort between organisms. Sometimes it’s rather passive (just watch what your neighbor in a herd or flock does). Sometimes it’s something more elaborate and active—like language. But is that the best one can do? One might imagine that there could be some kind of “telepathic coordination”, in which the raw pattern of neuron firings is communicated from one brain to another. But as we’ve argued, such communication cannot be expected to be robust. To achieve robustness, one must “package up” all the internal details into some standardized form of communication (words, roars, calls, etc.) that one can expect can be “faithfully unpacked” and in effect “understood” by other, suitably similar brains.
\nBut it’s important to realize that the very possibility of such standardized communication in effect requires coordination. Because somehow what goes on in one brain has to be aligned with what goes on in another. And indeed the way that’s maintained is precisely through continual communication.
\nSo, OK, how might bigger brains affect this? One possibility is that they might enable more complex social structures. There are plenty of animals with fairly small brains that successfully form “all do the same thing” flocks, herds and the like. But the larger brains of primates seem to allow more complex “tribal” structures. Could having a bigger brain let one successfully maintain a larger social structure, in effect remembering and handling larger numbers of social connections? Or could the actual forms of these connections be more complex? While human social connections seem to be at least roughly captured by social networks represented as ordinary graphs, maybe bigger brains would for example routinely require hypergraphs.
\nBut in general we can say that language—or standardized communication of some form—is deeply connected to the existence of a “coherent society”. For without being able to exchange something like language there’s no way to align the members of a potential society. And without coherence between members something like language won’t be useful.
\nAs in so many other situations, one can expect that the detailed interactions between members of a society will show all sorts of computational irreducibility. And insofar as one can identify “the will of society” (or, for that matter, the “tide of history”), it represents a pocket of computational reducibility in the system.
\nIn human society there is a considerable tendency (though it’s often not successful) to try to maintain a single “thread of society”, in which, at some level, everyone is supposed to act more or less the same. And certainly that’s an important simplifying feature in allowing brains like ours to “navigate the social world”. Could bigger brains do something more sophisticated? As in other areas, one can imagine a whole network of regularities (or pockets of reducibility) in the structure of society, perhaps connected to a whole tower of “higher-order social abstractions”, that only brains bigger than ours can comfortably deal with. (“Just being friends” might be a story for the “small brained”. With bigger brains one might instead have patterns of dependence and connectivity that can only be represented in complicated graph theoretic ways.)
\nWe humans have a tremendous tendency to think—or at least hope—that our minds are somehow “at the top” of what’s possible. But with what we know now about computation and how it operates in the natural world it’s pretty clear this isn’t true. And indeed it seems as if it’s precisely a limitation in the “computational architecture” of our minds—and brains—that leads to that most cherished feature of our existence that we characterize as “conscious experience”.
\nIn the natural world at large, computation is in some sense happening quite uniformly, everywhere. But our brains seem to be set up to do computation in a more directed and more limited way—taking in large amounts of sensory data, but then filtering it down to a small stream of actions to take. And, yes, one can remove this “limitation”. And while the result may lead to more computation getting done, it doesn’t lead to something that’s “a mind like ours”.
\nAnd indeed in what we’ve done here, we’ve tended to be very conservative in how we imagine “extending our minds”. We’ve mostly just considered what might happen if our brains were scaled up to have more neurons, while basically maintaining the same structure. (And, yes, animals physically bigger than us already have larger brains—as did Neanderthals—but what we really need to look at is size of brain relative to size of the animal, or, in effect “amount of brain for a given amount of sensory input”.)
\nA certain amount about what happens with different scales of brains is already fairly clear from looking at different kinds of animals, and at things like their apparent lack of human-like language. But now that we have artificial neural nets that do remarkably human-like things we’re in a position to get a more systematic sense of what different scales of “brains” can do. And indeed we’ve seen a sequence of “capability thresholds” passed as neural nets get larger.
\nSo what will bigger brains be able to do? What’s fairly straightforward is that they’ll presumably be able to take in larger amounts of sensory input, and generate larger amounts of output. (And, yes, the sensory input could come from existing modalities, or new ones, and the outputs could go to existing “actuators”, or new ones.) As a practical matter, the more “data” that has to be processed for a brain to “come to a decision” and generate an output, the slower it’ll probably be. But as brains get bigger, so presumably will the size of their working memory—as well as the number of distinct “concepts” they can “distinguish” and “remember”.
\nIf the same overall architecture is maintained, there’ll still be just a single “thread of experience”, associated with a single “thread of communication”, or a single “stream of tokens”. At the size of brains we have, we can deal with compositional language in which “concepts” (represented, basically, as words) can have at least a certain depth of qualifiers (corresponding, say, to adjectival phrases). As brain size increases, we can expect both more “raw concepts”—so that fewer qualifiers are needed—and more working memory to deal with more deeply nested qualifiers.
\nBut is there something qualitatively different that can happen with bigger brains? Computational language (and particularly my experience with the Wolfram Language) gives some indications, the most notable of which is the idea of “going meta” and using “higher-order constructs”. Instead of, say, operating directly on “raw concepts” with (say, “verb-like”) “functions”, we can imagine higher-order functions that operate on functions themselves. And, yes, this is something of which we see powerful examples in the Wolfram Language. But it feels as if we could somehow go further—and make this more routine—if our brains in a sense had “more capacity”.
\nTo “go meta” and “use higher-order constructs” is in effect a story of abstraction—and of taking many disparate things and abstracting to the point where one can “talk about them all together”. The world at large is full of complexity—and computational irreducibility. But in essence what makes “minds like ours” possible is that there are pockets of computational reducibility to be found. And those pockets of reducibility are closely related to being able to successfully do abstraction. And as we build up towers of abstraction we are in effect navigating through networks of pockets of computational reducibility.
\nThe progress of knowledge—and the fact that we’re educated about it—lets us get to a certain level of abstraction. And, one suspects, the more capacity there is in a brain, the further it will be able to go.
\nBut where will it “want to go”? The world at large—full as it is with computational irreducibility, along with infinite numbers of pockets of reducibility—leaves infinite possibilities. And it is largely the coincidence of our particular history that defines the path we have taken.
\nWe often identify our “sense of purpose” with the path we will take. And perhaps the definiteness of our belief in purpose is related to the particular feature of brains that leads us to concentrate “everything we’re thinking” down into just a single stream of decisions and action.
\nAnd, yes, as we’ve discussed, one could in principle imagine “multiway minds” with multiple “threads of consciousness” operating at once. But we humans (and individual animals in general) don’t seem to have those. Of course, in collections of humans (or other animals) there are still inevitably multiple “threads of consciousness”—and it’s things like language that “knit together” those threads to, for example, make a coherent society.
\nQuite what that “knitting” looks like might change as we scale up the size of brains. And so, for example, with bigger brains we might be able to deal with “higher-order social structures” that would seem alien and incomprehensible to us today.
\nSo what would it be like to interact with a “bigger brain”? Inside, that brain might effectively use many more words and concepts than we know. But presumably it could generate at least a rough (“explain-like-I’m-5”) approximation that we’d be able to understand. There might well be all sorts of abstractions and “higher-order constructs” that we are basically blind to. And, yes, one is reminded of something like a dog listening to a human conversation about philosophy—and catching only the occasional “sit” or “fetch” word.
\nAs we’ve discussed several times here, if we remove our restriction to “brain-like” operation (and in particular to deriving a small stream of decisions from large amounts of sensory input) we’re thrown into the domain of general computation, where computational irreducibility is rampant, and we can’t in general expect to say much about what’s going on. But if we maintain “brain-like operation”, we’re instead in effect navigating through “networks of computational reducibility”, and we can expect to talk about things like concepts, language and towers of abstraction.
\nFrom a foundational point of view, we can imagine any mind as in effect being at a particular place in the ruliad. When minds communicate, they are effectively exchanging the rulial analog of particles—robust concepts that are somehow unchanged as they propagate within the ruliad. So what would happen if we had bigger brains? In a sense it’s a surprisingly “mechanical” story: a bigger brain—encompassing more concepts, etc.—in effect just occupies a larger region of rulial space. And the presence of abstraction—perhaps learned from a whole arc of intellectual history—can lead to more expansion in rulial space.
\nAnd in the end it seems that “minds beyond ours” can be characterized by how large the regions of the ruliad they occupy are. (Such minds are, in some very literal rulial sense, more “broad minded”.) So what is the limit of all this? Ultimately, it’s a “mind” that spans the whole ruliad, and in effect incorporates all possible computations. But in some fundamental sense this is not a mind like ours, not least because by “being everything” it “becomes nothing”—and one can no longer identify it as having a coherent “thread of individual existence”.
\nAnd, yes, the overall thrust of what we’ve been saying applies just as well to “AI minds” as to biological ones. If we remove restrictions like being set up to generate the next token, we’ll be left with a neural net that’s just “doing computation”, with no obvious “mind-like purpose” in sight. But if we make neural nets do typical “brain-like” tasks, then we can expect that they too will find and navigate pockets of reducibility. We may well not recognize what they’re doing. But insofar as we can, then inevitably we’ll mostly be sampling the parts of “minds beyond ours” that are aligned with “minds like ours”. And it’ll take progress in our whole human intellectual edifice to be able to fully appreciate what it is that minds beyond ours can do.
\nThanks for recent discussions about topics covered here in particular to Richard Assar, Joscha Bach, Kovas Boguta, Thomas Dullien, Dugan Hammock, Christopher Lord, Fred Meinberg, Nora Popescu, Philip Rosedale, Terry Sejnowski, Hikari Sorensen, and James Wiles.
\n", "category": "Artificial Intelligence", "link": "https://writings.stephenwolfram.com/2025/05/what-if-we-had-bigger-brains-imagining-minds-beyond-ours/", "creator": "Stephen Wolfram", "pubDate": "Wed, 21 May 2025 14:28:31 +0000", "enclosure": "", "enclosureType": "", "image": "", "id": "", "language": "en", "folder": "", "feed": "wolfram", "read": false, "favorite": false, "created": false, "tags": [], "hash": "2841357beeb72f8b939e88b179422b99", "highlights": [] }, { "title": "What Can We Learn about Engineering and Innovation from Half a Century of the Game of Life Cellular Automaton?", "description": "Things are invented. Things are discovered. And somehow there’s an arc of progress that’s formed. But are there what amount to “laws of innovation” that govern that arc of progress?
\nThere are some exponential and other laws that purport to at least measure overall quantitative aspects of progress (number of transistors on a chip; number of papers published in a year; etc.). But what about all the disparate innovations that make up the arc of progress? Do we have a systematic way to study those?
\nWe can look at the plans for different kinds of bicycles or rockets or microprocessors. And over the course of years we’ll see the results of successive innovations. But most of the time those innovations won’t stay within one particular domain—say shapes of bicycle frames. Rather they’ll keep on pulling in innovations from other domains—say, new materials or new manufacturing techniques. But if we want to get closer to the study of the pure phenomenon of innovation we need a case where—preferably over a long period of time—everything that happens can be described in a uniform way within a single narrowly defined framework.
\nWell, some time ago I realized that, actually, yes, there is such a case—and I’ve even personally been following it for about half a century. It’s the effort to build “engineering” structures within the Game of Life cellular automaton. They might serve as clocks, wires, logic gates, or things that generate digits of π. But the point is that they’re all just patterns of bits. So when we talk about innovation in this case, we’re talking about the rather pure question of how patterns of bits get invented, or discovered.
\nAs a long-time serious researcher of the science of cellular automata (and of what they generically do), I must say I’ve long been frustrated by how specific, whimsical and “non-scientific” the things people do with the Game of Life have often seemed to me to be. But what I now realize is that all that detail and all that hard work have now created what amounts to a unique dataset of engineering innovation. And my goal here is to do what one can call “metaengineering”—and to study in effect what happened in that process of engineering over the nearly six decades since the Game of Life was invented.
\nWe’ll see in rather pure form many phenomena that are at least anecdotally familiar from our overall experience of progress and innovation. Most of the time, the first step is to identify an objective: some purpose one can describe and wants to achieve. (Much more rarely, one instead observes something that happens, then realizes there’s a way one can meaningfully make use of it.) But starting from an objective, one either takes components one has, and puts human effort into arranging them to “invent” something that will achieve the objective—or in effect (usually at least somewhat systematically, and automatically) one searches to try to “discover” new ways to achieve the objective.
\nAs we explore what’s been done with the Game of Life we’ll see occasional sudden advances—together with much larger amounts of incremental progress. We’ll see towers of technology being built, and we’ll see old, rather simple technology being used to achieve new objectives. But most of all, we’ll see an interplay between what gets discovered by searching possibilities—and what gets invented by explicit human effort.
\nThe Principle of Computational Equivalence implies that there is, in a sense, infinite richness to what a computational system like the Game of Life can ultimately do—and it’s the role of science to explore this richness in all its breadth. But when it comes to engineering and technology the crucial question is what we choose to make the system do—and what paths we follow to get there. Inevitably, some of this is determined by the underlying computational structure of the system. But much of it is a reflection of how we, as humans, do things, and the patterns of choices we make. And that’s what we’ll be able to study—at quite large scale—by looking at the nearly six decades of work on the Game of Life.
\nHow similar are the results of such “purposeful engineering” to the results of “blind” adaptive evolution of the kind that occurs in biology? I recently explored adaptive evolution (as it happens, using cellular automata as a model) and saw that it can routinely deliver what seem like “sequences of new ideas”. But now in the example of the Game of Life we have what we can explicitly identify as “sequences of new ideas”. And so we’re in a position to compare the results of human effort (aided, in many cases, by systematic search) with what we can “automatically” do by the algorithmic process of adaptive evolution.
\nIn the end, we can think of the set of things that we can in principle engineer as being laid out in a kind of “metaengineering space”, much as we can think of mathematical theorems we can prove as being laid out in metamathematical space. In the mathematical case (notwithstanding some of my own work) the vast majority of theorems have historically been found purely by human effort. But, as we’ll see below, in Game-of-Life engineering it’s been a mixture of human effort and fairly automated exploration of metaengineering space. Though—much like in traditional mathematics—we’re still in a sense always only pursuing objectives we’ve already conceptualized. And in this way what we’re doing is very different from what I’ve done for so long in studying the science (or, as I would now say, the ruliology) of what computational systems like cellular automata (of which the Game of Life is an example) do “in the wild”, when they’re unconstrained by objectives we’re trying to achieve with them.
\nHere’s a typical example of what it looks like to run the Game of Life:
\nThere’s a lot of complicated—and hard to understand—stuff going on here. But there are still some recognizable structures—like the “blinkers” that alternate on successive steps
\nand the “gliders” that steadily move across the screen:
\nSeeing these structures might make one think that one should be able to “do engineering” in the Game of Life, setting up patterns that can ultimately do all sorts of things. And indeed our main subject here is the actual development of such engineering over the past nearly six decades since the introduction of the Game of Life.
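\nThe update rule behind all of this is simple to state: a live cell survives with exactly 2 or 3 live neighbors, and a dead cell becomes live with exactly 3. Here is a minimal Python sketch of that rule (illustrative only, not code from the article), checking the two structures just mentioned—the period-2 blinker, and the glider, which repeats its shape every 4 steps shifted one cell diagonally:

```python
# Minimal Game of Life step on a sparse set of live-cell coordinates.
# (An illustrative sketch, not code from the article.)
from itertools import product

def life_step(cells):
    """One update: a cell is live next step iff it has 3 live neighbors,
    or it is live now and has exactly 2."""
    counts = {}
    for (r, c) in cells:
        for dr, dc in product((-1, 0, 1), repeat=2):
            if (dr, dc) != (0, 0):
                key = (r + dr, c + dc)
                counts[key] = counts.get(key, 0) + 1
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in cells)}

# The blinker alternates with period 2:
blinker = {(0, 0), (0, 1), (0, 2)}
assert life_step(life_step(blinker)) == blinker

# The glider repeats its shape every 4 steps, shifted one cell in each direction:
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
g = glider
for _ in range(4):
    g = life_step(g)
assert g == {(r + 1, c + 1) for (r, c) in glider}
print("blinker: period 2; glider: period 4, displacement (1, 1)")
```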
\nWhat we’ll be concentrating on is essentially the “technology” of the Game of Life: how we take the “raw material” that the Game of Life provides, and make from it “meaningful engineering structures”.
\nBut what about the science of the Game of Life? What can we say about what the Game of Life “naturally does”, independent of “useful” structures we create in it? The vast majority of the effort that’s been put into the Game of Life over the past half century hasn’t been about this. But this type of fundamental question is central to what one asks in what I now call ruliology—a kind of science that I’ve been energetically pursuing since the early 1980s.
\nRuliology looks in general at classes of systems, rather than at the kind of specifics that have typically been explored in the Game of Life. And within ruliology, the Game of Life is in a sense nothing special; it’s just one of many “class 4” 2D cellular automata (in my numbering scheme, it’s the 2-color 9-neighbor cellular automaton with outer totalistic code 224).
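\nThat code number can be checked directly. Assuming the standard outer totalistic convention—each rule case f(s, a), with s the sum of the 8 outer neighbors and a the center cell, contributes 2^(2s+a) to the code when f(s, a) = 1—a short sketch recovers 224 for Conway’s rule:

```python
# Wolfram's outer totalistic code for a 2-color, 9-neighbor cellular automaton
# (assuming the standard convention): each rule case f(s, a) -- s = sum of the
# 8 outer neighbors, a = center cell -- contributes 2**(2*s + a) when f(s, a) = 1.
def outer_totalistic_code(birth, survive):
    """Code for a Life-like rule given its birth/survival neighbor sums."""
    code = 0
    for s in range(9):
        if s in birth:        # dead center (a = 0) becomes live
            code += 2 ** (2 * s + 0)
        if s in survive:      # live center (a = 1) stays live
            code += 2 ** (2 * s + 1)
    return code

# Conway's Game of Life is "birth on 3, survival on 2 or 3" (B3/S23):
print(outer_totalistic_code({3}, {2, 3}))  # → 224
```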
\nMy own investigations of cellular automata have particularly focused on 1D rather than 2D examples. And I think that’s been crucial to many of the scientific discoveries I’ve made. Because somehow one learns so much more by being able to see at a glance the history of a system, rather than just seeing frames in a video go by. With a class 4 2D rule like the Game of Life, one can begin to approach this by including “trails” of what’s previously happened, and we’ll often use this kind of visualization in what follows:
\nWe can get a more complete view of history by looking at the whole (2+1)-dimensional “spacetime history”—though then we’re confronted with 3D forms that are often somewhat difficult for our human visual system to parse:
\nBut taking a slice through this 3D form we get “silhouette” pictures that turn out to look remarkably similar to what I generated in large quantities starting in the early 1980s across many 1D cellular automata:
\nSuch pictures—with their complex forms—highlight the computational irreducibility that’s close at hand even in the Game of Life. And indeed it’s the presence of such computational irreducibility that ultimately makes possible the richness of engineering that can be done in the Game of Life. But in actually doing that engineering—and in setting up structures and processes that behave in understandable and “technologically useful” ways—we need to keep the computational irreducibility “bottled up”. And in the end, we can think of the path of engineering innovation in the Game of Life as like an effort to navigate through an ocean of computational irreducibility, finding “islands of reducibility” that achieve the purposes we want.
\nMost of the structures of “engineering interest” in the Game of Life are somehow persistent. The simplest are structures that just remain constant, some small examples being:
\nAnd, yes, structures in the Game of Life have been given all sorts of (usually whimsical) names, which I’ll use here. (And, in that vein, structures in the Game of Life that remain constant are normally called “still lifes”.)
\nBeyond structures that just remain constant, there are “oscillators” that produce periodic patterns:
\nWe’ll be discussing oscillators at much greater length below, but here are a few examples (where now we’re including a visualization that shows “trails”):
\nNext in our inventory of classes of structures come “gliders” (or in general “spaceships”): structures that repeat periodically but move when they do so. A classic example is the basic glider, which takes on the same form every 4 steps—after moving 1 cell horizontally and 1 cell vertically:
\nHere are a few small examples of such “spaceship”-style structures:
\nStill lifes, oscillators and spaceships are most of what one sees in the “ash” that survives from typical random initial conditions. And for example the end result (after 1103 steps) from the evolution we saw in the previous section consists of:
\nThe structures we’ve seen so far were all found not long after the Game of Life was invented; indeed, pretty much as soon as it was simulated on a computer. But one feature that they all share is that they don’t systematically grow; they always return to the same number of black cells. And so one of the early surprises (in 1970) was the discovery of a “glider gun” that shoots out a glider every 30 steps forever:
\nSomething that gives a sense of progress that’s been made in Game-of-Life “technology” is that a “more efficient” glider gun—with period 15—was discovered, but only in 2024, 54 years after the previous one:
\nAnother kind of structure that was quickly discovered in the early history of the Game of Life is a “puffer”—a “spaceship” that “leaves debris behind” (in this case every 128 steps):
\nBut given these kinds of “components”, what can one build? Something constructed very early was the “breeder”, which uses streams of gliders to create glider guns, which themselves then generate streams of gliders:
\nThe original pattern covers about a quarter million cells (with 4060 being black). Running it for 1000 steps we see it builds up a triangle containing a quadratically increasing number of gliders:
\nOK, but knowing that it’s in principle possible to “fill a growing region of space”, is there a more efficient way to do it? The surprisingly simple answer, as discovered in 1993, is yes:
\nSo what other kinds of things can be built in the Game of Life? Lots—even from the simple structures we’ve seen so far. For example, here’s a pattern that was constructed to compute the primes
\nemitting a “lightweight spaceship” at step 100 + 120n only if n is prime. It’s a little more obvious how this works when it’s viewed “in spacetime”; in effect it’s running a sieve in which all multiples of all numbers are instantiated as streams of gliders, which knock out spaceships generated at non-prime positions:
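\nThe sieve logic described here can be caricatured in a few lines (an analogy sketch only—the actual pattern does this with colliding glider and spaceship streams): a position n “survives” only if no stream corresponding to a proper multiple of some k ≥ 2 knocks it out, leaving exactly the primes.

```python
# Toy model of the sieve logic in the prime-computing pattern (an analogy
# sketch only): position n "survives" unless a stream for a proper multiple
# of some k >= 2 knocks it out -- leaving exactly the primes.
def surviving_positions(limit):
    knocked_out = set()
    for k in range(2, limit + 1):
        for m in range(2 * k, limit + 1, k):   # streams knock out proper multiples
            knocked_out.add(m)
    return [n for n in range(2, limit + 1) if n not in knocked_out]

print(surviving_positions(30))  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```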
\nIf we look at the original pattern here, it’s just made up of a collection of rather simple structures:
\nAnd indeed structures like these have been used to build all sorts of things, including for example Turing machine emulators—and also an emulator for the Game of Life itself, with this 499×499 pattern corresponding to a single emulated Life cell:
\nBoth these last two patterns were constructed in the 1990s—from components that had been known since the early 1970s. And—as we can see—they’re large (and complicated). But do they need to be so large? One of the lessons of the Principle of Computational Equivalence is that in the computational universe there’s almost always a way to “do just as much, but with much less”. And indeed in the Game of Life many, many discoveries along these lines have been made in the past few decades.
\nAs we’ll see, often (but not always) these discoveries built on “new devices” and “new mechanisms” that were identified in the intervening years. A long series of such “devices” and “mechanisms” involved handling “signals” associated with streams of gliders. For example, the “glider pusher” (from 1993) has the somewhat subtle (but useful) effect of “pushing” a glider by one cell when it goes past:
\nAnother example (actually already known in 1971, and based on the period-15 “pentadecathlon” oscillator) is a glider reflector:
\nBut a feature of this glider pusher and glider reflector is that they work only when both the glider and the stationary object are in a particular phase with respect to their periods. And this makes it very tricky to build larger structures out of these that operate correctly (and in many cases it wouldn’t be possible but for the commensurability of the period 30 of the original glider gun, and the period 15 of the glider reflector).
\nCould glider pushing and glider reflection be done more robustly? The answer turns out to be yes. Though it wasn’t until 2020 that the “bandersnatch” was created—a completely static structure that “pushes” gliders independent of their phase:
\nMeanwhile, in 2013 the “snark” had been created—which served as a phase-independent glider reflector:
\nOne theme—to which we’ll return later—is that after certain functionality was first built in the Game of Life, there followed many “optimizations”, achieving that functionality more robustly, with smaller patterns, etc. An important methodology has revolved around so-called “hasslers”, which in effect allow one to “mine” small pieces of computational irreducibility, by providing “harnesses” that “rein in” behavior, typically returning patterns to their original states after they’ve done what one wants them to do.
\nSo, for example, here’s a hassler (found, as it happens, just on February 8, 2025!) that “harnesses” the first pattern we looked at above (that didn’t stabilize for 1103 steps) into an oscillator with period 80:
And based on it (indeed, later that same day) the most-compact-ever “spaceship gun” was constructed:
\nWe’ve talked about some of what it’s been possible to build in the Game of Life over the years. Now I want to talk about how that happened, or, in other words, the “arc of progress” in the Game of Life. And as a first indication of this, we can plot the number of new Life structures that have been identified each year (or, more specifically, the number of structures deemed significant enough to name, and to record in the LifeWiki database or its predecessors):
\nThere’s an immediate impression of several waves of activity. And we can break this down into activity around various common categories of structures:
\nFor oscillators we see fairly continuous activity for five decades, but with rapid acceleration recently. For “spaceships” and “guns” we see a long dry spell from the early 1970s to the 1990s, followed by fairly consistent activity since. And for conduits and reflectors we see almost nothing until sudden peaks of activity, in the mid-1990s and mid-2010s respectively.
\nBut what was actually done to find all these structures? There have basically been two methods: construction and search. Construction is a story of “explicit engineering”—and of using human thought to build up what one wants. Search, on the other hand, is a story of automation—and of taking algorithmically generated (usually large) collections of possible patterns, and testing them to find ones that do what one wants. Particularly in more recent times it’s also become common to interleave these methods, for example using construction to build a framework, and then using search to find specific patterns that implement some feature of that framework.
\nWhen one uses construction, it’s like “inventing” a structure, and when one uses search, it’s like “discovering” it. So how much of each is being done in practice? Text mining descriptions of recently recorded structures the result is as follows—suggesting that, at least in recent times, search (i.e. “discovery”) has become the dominant methodology for finding new structures:
Soon after the Game of Life was invented, it was being run on computers—and people were trying to classify the things it could do. Still lifes and simple oscillators showed up immediately. And then—evolving from the (“R pentomino”) initial condition that we used at the beginning here—after 69 steps something unexpected showed up. Amid complicated behavior that was hard to describe there was a simple free-standing structure that just systematically moved—a “glider”:
Some other moving structures (dubbed “spaceships”) were also observed. But the question arose: could there be a structure that would somehow systematically grow forever? Finding one involved a mixture of “discovery” and “invention”. In running from the (“R pentomino”) initial condition lots of things happen. But at step 785 it was noticed that the following structure appeared:
For a while this structure (dubbed the “queen bee”) behaves in a fairly orderly way—producing two stable “beehive” structures (visible here as vertical columns). But then it “decays” into more complicated behavior:
\nBut could this “discovered” behavior be “stabilized”? The answer was that, yes, if a “queen bee” was combined with two “blocks” it would just repeatedly “shuttle” back and forth:
\nWhat about two “queen bees”? Now whenever these collided there was a side effect: a glider was generated—with the result that the whole structure became a glider gun repeatedly producing gliders forever:
\nThe glider gun was the first major example of a structure in the Game of Life that was found—at least in part—by construction. And within a year of it being found in November 1970, two more guns—with very similar methods of operation—had been found:
\nBut then the well ran dry—and no further gun was found until 1990. Pretty much the same thing happened with spaceships: four were found in 1970, but no more were found until 1989. As we’ll discuss later, it was in a sense a quintessential story of computational irreducibility: there was no way to predict (or “construct”) what spaceships would exist; one just had to do the computation (i.e. search) to find out.
\nIt was, however, easier to have incremental success with oscillators—and (as we’ll see) pretty much every year an oscillator with some new period was found, essentially always by search. Some periods were “long holdouts” (for example the first period-19 oscillator was found only in 2023), once again reflecting the effects of computational irreducibility.
\nGlider guns provided a source of “signals” for Life engineering. But what could one do with these signals? An important idea—that first showed up in the “breeder” in 1971—was “glider synthesis”: the concept that combinations of gliders could produce other structures. So, for example, it was found that three carefully-arranged gliders could generate a period-15 (“pentadecathlon”) oscillator:
\nIt was also soon found that 8 gliders could make the original glider gun (the breeder made glider guns by a slightly more ornate method). And eventually there developed the conjecture that any structure that could be synthesized from gliders would need at most 15 gliders, carefully arranged at positions whose values effectively encoded the object to be constructed.
By the end of the 1970s a group of committed Life enthusiasts remained, but there was something of a feeling that “the low-hanging fruit had been picked”, and it wasn’t clear where to go next. But after a somewhat slow decade, work on the Game of Life picked up substantially towards the end of the 1980s. Perhaps my own work on cellular automata (and particularly the identification of class 4 cellular automata, of which the Game of Life is a 2D example) had something to do with it. And no doubt it also helped that the fairly widespread availability of faster (“workstation class”) computers now made it possible for more people to do large-scale systematic searches. In addition, when the web arrived in the early 1990s it let people much more readily share results—and had the effect of greatly expanding and organizing the community of Life enthusiasts.
In the 1990s—along with more powerful searches that found new spaceships and guns—there was a burst of activity in constructing elaborate “machines” out of existing known structures. The idea was to start from a known type of “machine” (say a Turing machine), then to construct a Life implementation of it. The constructions were made particularly ornate by the need to make the phases of gliders, guns, etc. appropriately correspond. Needless to say, any Life configuration can be thought of as doing some computation. But the “machines” that were constructed were ones whose “purpose” and “functionality” were already well established in general computation, independent of the Game of Life.
If the 1990s saw a push towards “construction” in the Game of Life, the first decade of the 2000s saw a great expansion of search. Increasingly powerful cloud and distributed computing allowed “censuses” to be created of structures emerging from billions, then trillions of initial conditions. Mostly what was emphasized was finding new instances of existing categories of objects, like oscillators and spaceships. There were particular challenges, like (as we’ll discuss below) finding oscillators of any period (finally completely solved in 2023), or finding spaceships with different patterns of motion. Searches did yield what in censuses were usually called “objects with unusual growth”, but mostly these were not viewed as being of “engineering utility”, and so were not extensively studied (even though from the point of view of the “science of the Game of Life” they are, for example, perhaps the most revealing examples of computational irreducibility).
\nAs had happened throughout the history of the Game of Life, some of the most notable new structures were created (sometimes over a long period of time) by a mixture of construction and search. For example, the “stably-reflect-gliders-without-regard-to-phase” snark—finally obtained in 2013—was the result of using parts of the (ultimately unstable) “simple-structures” construction from around 1998
\nand combining them with a hard-to-explain-why-it-works “still life” found by search:
\nAnother example was the “Sir Robin knightship”—a spaceship that moves like a chess knight 2 cells down and 1 across. In 2017 a spaceship search found a structure that in 6 steps has many elements that make a knight move—but then subsequently “falls apart”:
\nBut the next year a carefully orchestrated search was able to “find a tail” that “adds a fix” to this—and successfully produces a final “perfect knightship”:
\nBy the way, the idea that one can take something that “almost works” and find a way to “fix it” is one that’s appeared repeatedly in the engineering history of the Game of Life. At the outset, it’s far from obvious that such a strategy would be viable. But the fact that it is seems to be similar to the story of why both biological evolution and machine learning are viable—which, as I’ve recently discussed, can be viewed as yet another consequence of the phenomenon of computational irreducibility.
One thing that’s happened many times in the history of the Game of Life is that at some point some category of structure—like a conduit—is identified, and named. But then it’s realized that an instance of the same category had actually been found much earlier—though, lacking the clarity of the later instance, its significance wasn’t recognized. For example, in 1995 the “Herschel conduit”—which moves a Herschel from one position to another (here in 64 steps)—was discovered (by a search):
But then it was realized that—if looked at correctly—a similar phenomenon had actually already been seen in 1972, in the form of a structure that in effect takes a Herschel, if one is present, and “moves it” (in 28 steps) to a different position (albeit with a certain amount of “containable” other activity):
Looking at the plots above of the number of new structures found per year, we see the largest peak after 2020. And, yes, it seems that during the pandemic people spent more time on the Game of Life—in particular trying to fill in tables of structures of particular types, for example one for each possible period.
\nBut what about the human side of engineering in the Game of Life? The activity brought in people from many different backgrounds. And particularly in earlier years, they often operated quite independently, and with very different methods (some not even using a computer). But if we look at all “recorded structures” we can look at how many structures in total different people contributed, and when they made these contributions:
\nNeedless to say—given that we’re dealing with an almost-60-year span—different people tend to show up as active in different periods. Looking at everyone, there’s a roughly exponential distribution to the number of (named) structures they’ve contributed. (Though note that several of the top contributors shown here found parametrized collections of structures and then recorded many instances.)
\nAs a first example of systematic “innovation history” in the Game of Life let’s talk about oscillators. Here are the periods of oscillators that were found up to 1980:
\nAs of 1980, many periods were missing. But in fact all periods are possible—though it wasn’t until 2023 that they were all filled in:
\nAnd if we plot the number of distinct periods (say below 60) found by a given year, we can get a first sense of the “arc of progress” in “oscillator technology” in the Game of Life:
\nFinding an oscillator of a given period is one thing. But how about the smallest oscillator of that period? We can be fairly certain that not all of these are known, even for periods below 30. But here’s a plot that shows when the progressive “smallest so far” oscillators were found for a given period (red indicates the first instance of a given period; blue the best result to date):
\nAnd here’s the corresponding plot for all periods up to 100:
\nBut what about the actual reduction in size that’s achieved? Here’s a plot for each oscillator period showing the sequence of sizes found—in effect the “arc of engineering optimization” that’s achieved for that period:
\nSo what are the actual patterns associated with these various oscillators? Here are some results (including timelines of when the patterns were found):
\nBut how were these all found? The period-2 “blinker” was very obvious—showing up in evolution from almost any random initial condition. Some other oscillators were also easily found by looking at the evolution of particular, simple initial conditions. For example, a line of 10 black cells after 3 steps gives the period-15 “pentadecathlon”. Similarly, the period-3 “pulsar” emerges from a pair of length-5 blocks after 22 steps:
\nMany early oscillators were found by iterative experimentation, often starting with stable “still life” configurations, then perturbing them slightly, as in this period-4 case:
Another common strategy for finding oscillators (that we’ll discuss more below) was to take an “unstable” configuration, then to “stabilize” it by putting “robust” still lifes such as the “block” or the “eater” around it—yielding results like:
For periods that can be formed as LCMs of smaller periods one “construction-oriented” strategy has been to take oscillators with appropriate smaller periods, and combine them, as in:
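The arithmetic behind this combination strategy can be sketched directly (the component periods here are illustrative, not taken from any particular recorded oscillator):

```python
from math import lcm

# Two non-interacting oscillators placed side by side return to their
# joint initial state only when both have completed whole cycles,
# i.e. with period lcm(p, q).
def combined_period(*periods):
    return lcm(*periods)

# e.g. a period-4 oscillator next to a period-6 one yields period 12:
assert combined_period(4, 6) == 12
# and a period-57 oscillator could be assembled from period-3 and
# period-19 components:
assert combined_period(3, 19) == 57
```

Note that this only works for periods that factor into already-achieved periods; prime periods like 19 had to be found by other means.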
\nIn general, many different strategies have been used, as indicated for example by the sequence of period-3 oscillators that have been recorded over the years (where “smallest-so-far” cases are highlighted):
\nBy the mid-1990s oscillators of many periods had been found. But there were still holdouts, like period 19 and for example pretty much all periods between 61 and 70 (except, as it happens, 66). At the time, though, all sorts of complicated constructions—say of prime generators—were nevertheless being done. And in 1996 it was figured out that one could in effect always “build a machine” (using only structures that had already been found two decades earlier) that would serve as an oscillator of any (sufficiently large) period (here 67)—effectively by “sending a signal around a loop of appropriate size”:
\nBut by the 2010s, with large numbers of fast computers becoming available, there was again an emphasis on pure random search. A handful of highly efficient programs were developed, that could be run on anyone’s machine. In a typical case, a search might consist of starting, say, from a trillion randomly chosen initial conditions (or “soups”), identifying new structures that emerge, then seeing whether these act, for example, as oscillators. Typically any new discovery was immediately reported in online forums—leading to variations of it being tried, and new follow-on results often being reported within hours or days.
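The “test whether a new structure acts as an oscillator” step can be sketched in a few lines (a bare-bones illustration, not the highly optimized search programs actually used):

```python
from collections import Counter

def life_step(cells):
    """One step of the Game of Life on a set of live-cell coordinates."""
    counts = Counter((x + dx, y + dy)
                     for (x, y) in cells
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # A cell is live next step if it has 3 live neighbors, or 2 and is live now.
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in cells)}

def oscillator_period(cells, max_steps=1000):
    """Period if the pattern returns exactly to its initial cell set
    within max_steps; None otherwise (e.g. if it moves, grows, or dies)."""
    start = frozenset(cells)
    cur = start
    for t in range(1, max_steps + 1):
        cur = frozenset(life_step(cur))
        if cur == start:
            return t
    return None

# The "blinker" is a period-2 oscillator; a 2x2 "block" is a still life:
assert oscillator_period({(0, 0), (1, 0), (2, 0)}) == 2
assert oscillator_period({(0, 0), (0, 1), (1, 0), (1, 1)}) == 1
```

A real soup search would first wait for the evolution of a random region to settle, separate out the resulting structures, and only then apply a period check like this one.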
\nMany of the random searches started just from 16×16 regions of randomly chosen cells (or larger regions with symmetries imposed). And in a typical manifestation of computational irreducibility, many surprisingly small and “random-looking” (at least up to symmetries) results were found. So, for example, here’s the sequence of recorded period-16 oscillators with smaller-than-before cases highlighted:
\nUp through the 1990s results were typically found by a mixture of construction and small-scale search. But in 2016, results from large-scale random searches (sometimes symmetrical, sometimes not) started to appear.
\nThe contrast between construction and search could be dramatic, like here for period 57:
\nOne might wonder whether there could actually be a systematic, purely algorithmic way to find, say, possible oscillators of a given period. And indeed for one-dimensional cellular automata (as I noted in 1984), it turns out that there is. Say one considers blocks of cells of width w. Which block can follow which other is determined by a de Bruijn graph, or equivalently, a finite state machine. If one is going to have a pattern with period p, all blocks that appear in it must also be periodic with period p. But such blocks just form a subgraph of the overall de Bruijn graph, or equivalently, form another, smaller, finite state machine. And then all patterns with period p must correspond to paths through this subgraph. But how long are the blocks one has to consider?
In 1D cellular automata, it turns out that there’s an upper bound of 2^(2p). But for 2D cellular automata—like the Game of Life—there is in general no such upper bound, a fact related to the undecidability of the 2D tiling problem. And the result is that there’s no complete, systematic algorithm to find oscillators in a general 2D cellular automaton, or presumably in the Game of Life.
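To make the 1D case concrete, here’s a brute-force check for temporally periodic configurations of an elementary cellular automaton on a small cyclic lattice (a sketch of the phenomenon, not the de Bruijn construction itself; rule 170, which just shifts cells, is used because its periods are easy to verify by hand):

```python
def ca_step(state, rule):
    """One step of an elementary (3-neighbor, 2-color) CA on a cyclic lattice."""
    n = len(state)
    return tuple((rule >> ((state[(i - 1) % n] << 2)
                           | (state[i] << 1)
                           | state[(i + 1) % n])) & 1
                 for i in range(n))

def temporal_period(state, rule, max_steps=100):
    """Smallest p > 0 with CA^p(state) == state, or None if the state
    isn't strictly periodic within max_steps."""
    cur = state
    for t in range(1, max_steps + 1):
        cur = ca_step(cur, rule)
        if cur == state:
            return t
    return None

# Rule 170 maps each cell to its right neighbor, i.e. it shifts the
# whole configuration left by one cell per step:
assert ca_step((1, 0, 0, 1, 0, 0), 170) == (0, 0, 1, 0, 0, 1)
# So a configuration with cyclic-shift symmetry 3 has temporal period 3:
assert temporal_period((1, 0, 0, 1, 0, 0), 170) == 3
```

For a rule like 110 the same brute force works, but—unlike the de Bruijn approach—it can only confirm periodicity for configurations it happens to enumerate.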
\nBut—as was actually already realized in the mid-1990s—it’s still possible to use algorithmic methods to “fill in” pieces of patterns. The idea is to define part of a pattern of a given period, then use this as a constraint on filling in the rest of it, finding “solutions” that satisfy the constraint using SAT-solving techniques. In practice, this approach has more often been used for spaceships than for oscillators (not least because it’s only practical for small periods). But one feature of it is that it can generate fairly large patterns with a given period.
\nYet another method that’s been tried has been to generate oscillators by colliding gliders in many possible ways. But while this is definitely useful if one’s interested in what can be made using gliders, it doesn’t seem to have, for example, allowed people to find much in the way of interesting new oscillators.
\nIn traditional engineering a key strategy is modularity. Rather than trying to build something “all in one go”, the idea is to build a collection of independent subsystems, from which the whole system can then be assembled. But how does this work in the Game of Life? We might imagine that to identify the modular parts of a system, we’d have to know the “process” by which the system was put together, and the “intent” involved. But because in the Game of Life we’re ultimately just dealing with pure patterns of bits we can in effect just as well “come in at the end” and algorithmically figure out what pieces are operating as separate, modular parts.
So how can we do this? Basically what we want to find out is which parts of a pattern “operate independently” at a given step, in the sense that these parts don’t have any overlap in the cells they affect. Given that in the rules for the Game of Life a particular cell can affect any of the 9 cells in its neighborhood, we can say that black cells can only have “overlapping effects” if they are at most 2 cell units apart in each coordinate direction. So then we can draw a “nearest neighbor graph” that shows which cells are connected in this sense:
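This decomposition can be computed directly from a set of live cells (a sketch; the distance-2 linking criterion follows from the 3×3 neighborhood of the rule, as just described):

```python
def modular_parts(cells):
    """Split live cells into connected components, linking two cells
    whenever they can affect a common cell, i.e. whenever they are at
    most 2 apart in each coordinate (since each cell influences only
    its own 3x3 neighborhood)."""
    remaining = set(cells)
    parts = []
    while remaining:
        frontier = [remaining.pop()]
        comp = set(frontier)
        while frontier:
            x, y = frontier.pop()
            near = {(a, b) for (a, b) in remaining
                    if abs(a - x) <= 2 and abs(b - y) <= 2}
            remaining -= near
            comp |= near
            frontier.extend(near)
        parts.append(comp)
    return parts

# Two 2x2 blocks far apart form two independent parts; cells within
# interaction range merge into one part:
blocks = {(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)}
assert len(modular_parts(blocks)) == 2
assert len(modular_parts({(0, 0), (2, 2), (4, 4)})) == 1
```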
But what about the whole evolution? We can draw what amounts to a causal graph that shows the causal connections between the “independent modular parts” that exist at each step:
\nAnd given this, we can summarize the “modular structure” of this particular oscillator by the causal graph:
\nUltimately all that matters in the “overall operation” of the oscillator is the partial ordering defined by this graph. Parts that appear “horizontally separated” (or, more precisely, in antichains, or in physics terminology, spacelike separated) can be generated independently and in parallel. But parts that follow each other in the partial order need to be generated in that order (i.e. in physics terms, they are timelike separated).
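One way to read this parallelism off a causal graph is to assign each part to a “generation” given by the longest chain of causal predecessors leading to it; parts in the same generation are then mutually spacelike. Here’s a small sketch on a hypothetical four-node diamond graph (the node names and edges are purely illustrative):

```python
def generations(nodes, edges):
    """Layer a DAG by longest predecessor chain. Any two nodes in the
    same layer are incomparable in the partial order (an antichain),
    since every edge strictly increases the layer number."""
    preds = {n: [] for n in nodes}
    for a, b in edges:
        preds[b].append(a)
    layer = {}
    def depth(n):
        if n not in layer:
            layer[n] = 1 + max((depth(p) for p in preds[n]), default=-1)
        return layer[n]
    out = {}
    for n in nodes:
        out.setdefault(depth(n), set()).add(n)
    return out

# A diamond-shaped causal graph: B and C are spacelike separated,
# so they can be generated independently and in parallel.
g = generations("ABCD", [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")])
assert g == {0: {"A"}, 1: {"B", "C"}, 2: {"D"}}
```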
\nAs another example, let’s look at graphs for the various oscillators of period 16 that we showed above:
\nWhat we see is that the early period-16 oscillators were quite modular, and had many parts that in effect operated independently. But the later, smaller ones were not so modular. And indeed the last one shown here had no parts that could operate independently; the whole pattern had to be taken together at each step.
\nAnd indeed, what we’ll often see is that the more optimized a structure is, the less modular it tends to be. If we’re going to construct something “by hand” we usually need to assemble it in parts, because that’s what allows us to “understand what we’re doing”. But if, for example, we just find a structure in a search, there’s no reason for it to be “understandable”, and there’s no reason for it to be particularly modular.
\nDifferent steps in a given oscillator can involve different numbers of modular parts. But as a simple way to assess the “modularity” of an oscillator, we can just ask for the average number of parts over the course of one period. So as an example, here are the results for period-30 oscillators:
\nLater, we’ll discuss how we can use the level of modularity to assess whether a pattern is likely to have been found by a search or by construction. But for now, this shows how the modularity index has varied over the years for the best known progressively smaller oscillators of a given period—with the main conclusion being that as the oscillators get optimized for size, so also their modularity index tends to decrease:
\nOscillators are structures that cycle but do not move. “Gliders” and, more generally, “spaceships” are structures that move every time they cycle. When the Game of Life was first introduced, four examples of these (all of period 4) were found almost immediately (the last one being the result of trying to extend the one before it):
\nWithin a couple of years, experimentation had revealed two variants, with periods 12 and 20 respectively, involving additional structures:
But after that, for nearly two decades, no more spaceships were found. In 1989, however, a systematic method for searching was invented, and in the years since, a steady stream of new spaceships has been found. A variety of different periods have been seen
\nas well as a variety of speeds (and three different angles):
\nThe forms of these spaceships are quite diverse:
\nSome are “tightly integrated”, while some have many “modular pieces”, as revealed by their causal graphs:
\nPeriod-96 spaceships provide an interesting example of the “arc of progress” in the Game of Life. Back in 1971, a systematic enumeration of small polyominoes was done, looking for one that could “reproduce itself”. While no polyomino on its own seemed to do this, a case was found where part of the pattern produced after 48 steps seemed to reappear repeatedly every 48 steps thereafter:
\nOne might expect this repeated behavior to continue forever. But in a typical manifestation of computational irreducibility, it doesn’t, instead stopping its “regeneration” after 24 cycles, and then reaching a steady state (apart from “radiated” gliders) after 3911 steps:
\nBut from an engineering point of view this kind of complexity was just viewed as a nuisance, and efforts were made to “tame” and avoid it.
\nAdding just one still-life block to the so-called “switch engine”
\nproduces a structure that keeps generating a “periodic wake” forever:
\nBut can this somehow be “refactored” as a “pure spaceship” that doesn’t “leave anything behind”? In 1991 it was discovered that, yes, there was an arrangement of 13 switch engines that could successfully “clean up behind themselves”, to produce a structure that would act as a spaceship with period 96:
\nBut could this be made simpler? It took many years—and tests of many different configurations—but in the end it was found that just 2 switch engines were sufficient:
\nLooking at the final pattern in spacetime gives a definite impression of “narrowly contained complexity”:
\nWhat about the causal graphs? Basically these just decrease in “width” (i.e. number of independent modular parts) as the number of engines decreases:
\nLike many other things in Game-of-Life engineering, both search and construction have been used to find spaceships. As an extreme example of construction let’s talk about the case of spaceships with speed 31/240. In 2013, an analog of the switch engine above was found—which “eats” blocks 31 cells apart every 240 steps:
\nBut could this be turned into a “self-sufficient” spaceship? A year later an almost absurdly large (934852×290482) pattern was constructed that did this—by using streams of gliders and spaceships (together with dynamically assembled glider guns) to create appropriate blocks in front, and remove them behind (along with all the “construction equipment” that was used):
By 2016, a pattern with about 700× smaller area had been constructed. And now, just a few weeks ago, a pattern 1300× smaller in area (11974×45755) was constructed:
\nAnd while this is still huge, it’s still made of modular pieces that operate in an “understandable” way. No doubt there’s a much smaller pattern that operates as a spaceship of the same speed, but—computational irreducibility being what it is—we have no idea how large the pattern might be, or how we might efficiently search for it.
\nWhat can one engineer in the Game of Life? A crucial moment in the development of Game-of-Life engineering was the discovery of the original glider gun in 1970. And what was particularly important about the glider gun is that it was a first example of something that could be thought of as a “signal generator”—that one could imagine would allow one to implement electrical-engineering-style “devices” in the Game of Life.
\nThe original glider gun produces gliders every 30 steps, in a sense defining a “clock speed” of 1/30 for any “circuit” driven by it. Within a year after the original glider gun, two other “slower” glider guns had also been discovered
\nboth working on similar principles, as suggested by their causal graphs:
It wasn’t until 1990 that any additional “guns” were found. And in the years since, a sequence of guns has been found, with a rather wide range of distinct periods:
\nSome of the guns found have very long periods:
But as part of the effort to do constructions in the 1990s, a gun was constructed that had overall period 210, but which interwove multiple glider streams to ultimately produce gliders every 14 steps (which is the maximum rate possible while avoiding interference of successive gliders):
\nOver the years, a whole variety of different glider guns have been found. Some are in effect “thoroughly controlled” constructions. Others are more based on some complex process that is reined in to the point where it just produces a stream of gliders and nothing more:
\nAn example of a somewhat surprising glider gun—with the shortest “true period” known—was found in 2024:
\nThe causal graph for this glider gun shows a mixture of irreducible “search-found” parts, together with a collection of “well-known” small modular parts:
\nBy the way, in 2013 it was actually found possible to extend the construction for oscillators of any period to a construction for guns of any period (or at least any period above 78):
\nIn addition to having streams of gliders, it’s also sometimes been found useful to have streams of other “spaceships”. Very early on, it was already known that one could create small spaceships by colliding gliders:
\nBut by the mid-1990s it had been found that direct “spaceship guns” could also be made—and over the years smaller and smaller “optimized” versions have been found:
\nThe last of these—from just last month—has a surprisingly simple structure, being built from components that were already known 30 years ago, and having a causal graph that shows very modular construction:
\nWe’ve talked about some of the history of how specific patterns in the Game of Life were found. But what about the overall “flow of engineering progress”? And, in particular, when something new is found, how much does it build on what has been found before? In real-world engineering, things like patent citations potentially give one an indication of this. But in the Game of Life one can approach the question much more systematically and directly, just asking what configurations of bits from older patterns are used in newer ones.
\nAs we discussed above, given a pattern such as
\nwe can pick out its “modular parts”, here rotated to canonical orientations:
\nThen we can see if these parts correspond to (any phase of) previously known patterns, which in this case they all do:
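Matching a part against previously known patterns requires a representation that’s invariant under translation, rotation, and reflection. One simple sketch (this is an illustrative scheme, not necessarily the one LifeWiki tooling uses): take the lexicographically smallest normalized cell set over the 8 symmetries of the square:

```python
def canonical_form(cells):
    """Canonical representative of a cell pattern, invariant under
    translation and the 8 rotations/reflections of the square."""
    def normalize(cs):
        # Shift so the bounding box starts at the origin, then sort.
        mx = min(x for x, _ in cs)
        my = min(y for _, y in cs)
        return tuple(sorted((x - mx, y - my) for x, y in cs))
    cur = list(cells)
    variants = []
    for _ in range(4):
        cur = [(y, -x) for x, y in cur]                        # rotate 90 degrees
        variants.append(normalize(cur))
        variants.append(normalize([(-x, y) for x, y in cur]))  # plus a reflection
    return min(variants)

# A horizontal and a vertical blinker reduce to the same canonical form,
# so they would match the same database entry:
assert canonical_form({(0, 0), (1, 0), (2, 0)}) == \
       canonical_form({(5, 5), (5, 6), (5, 7)})
```

Checking a part against the database then amounts to computing the canonical form of each of its phases and doing a dictionary lookup.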
\nSo now for all structures in the database we can ask what parts they involve. Here’s a plot of the overall frequencies of these parts:
\nIt’s notable that the highest-ranked part is a so-called “eater” that’s often used in constructions, but occurs only quite infrequently in evolution from random initial conditions. It’s also notable that (for no particularly obvious reason) the frequency of the nth most common structure is roughly 1/n.
\nSo when were the various structures that appear here first found? As this picture shows, most—but not all—were found very early in the history of the Game of Life:
\nIn other words, most of the parts used in structures from any time in the history of the Game of Life come from very early in its history. Or, in effect, structures typically go “back to basics” in the parts they use.
\nHere’s a more detailed picture, showing the relative amount of use of each part in structures from each year:
\nThere are definite “fashions” to be seen here, with some structures “coming into fashion” for a while (sometimes, but not always, right after they were first found), and then dropping out.
\nOne might perhaps imagine that smaller parts (i.e. ones with smaller areas) would be more popular than larger ones. But plotting areas of parts against their rank, we see that there are some large parts that are quite common, and some small ones that are rare:
\nWe’ve seen that many of the most popular parts overall are ones that were found early in the history of the Game of Life. But plenty of distinct modular parts were also found much later. This shows the number of distinct new modular parts found across all patterns in successive years:
\nNormalizing by the number of new patterns found each year, we see a general gradual increase in the relative number of new modular parts, presumably reflecting the greater use of search in finding patterns, or components of patterns:
\nBut how important have these later-found modular parts been? This shows the total rate at which modular parts found in a given year were subsequently used—and what we see, once again, is that parts found early are overwhelmingly the ones that are subsequently used:
\nA somewhat complementary way to look at this is to ask of all patterns found in a given year, how many are “purely de novo”, in the sense that they use no previously found modular parts (as indicated in red), and how many use previously found parts:
\nA cumulative version of this makes it clear that in early years most patterns are purely de novo, but later on, there’s an increasing amount of “reuse” of previously found parts—or, in other words, in later years the “engineering history” is increasingly important:
\nIt should be said, however, that if one wants the full story of “what’s being used” it’s a bit more nuanced. Because here we’re always treating each modular part of each pattern as a separate entity, so that we consider any given pattern to “depend” only on base modular parts. But “really” it could depend on another whole structure, itself built of many modular parts. And in what we’re doing here, we’re not tracking that hierarchy of dependencies. Were we to do so, we would likely be able to see more complex “technology stacks” in the Game of Life. But instead we’re always “going down to the primitives”. (If we were dealing with electronics it’d be like asking “What are the transistors and capacitors that are being used?”, rather than “What is the caching architecture, or how is the floating point unit set up?”)
\nOK, but in terms of “base modular parts” a simple question to ask is how many get used in each pattern. This shows the number of (base) modular parts in patterns found in each year:
\nThere are always a certain number of patterns that just consist of a single modular part—and, as we saw above, that was more common earlier in the history of the Game of Life. But now we also see that there have been an increasing number of patterns that use many modular parts—typically reflecting a higher degree of “construction” (rather than search) going on.
\nBy the way, for comparison, these plots show the total areas and the numbers of (black) cells in patterns found in each year; both show increases early on, but more or less level off by the 1990s:
\nBut, OK, if we look across all patterns in the database, how many parts do they end up using? Here’s the overall distribution:
\nAt least for a certain range of numbers of parts, this falls roughly exponentially, reflecting the idea that it’s been exponentially less likely for people to come up with (or find) patterns that have progressively larger numbers of distinct modular parts.
\nHow has this changed over time? This shows a cumulative plot of the relative frequencies with which different numbers of modular parts appear in patterns up to a given year
\nindicating that over time the distribution of the number of modular parts has gotten progressively broader—or, in other words, as we’ve seen in other ways above, more patterns make use of larger numbers of modular parts.
\nWe’ve been looking at all the patterns that have been found. But we can also ask, say, just about oscillators. And then we can ask, for example, which oscillators (with which periods) contain which others, as in:
\nAnd looking at all known oscillators we can see how common different “oscillator primitives” are in building up other oscillators:
\nWe can also ask in which year “oscillator primitives” at different ranks were found. Unlike in the case of all structures above, we now see that some oscillator primitives that were found only quite recently appear at fairly high ranks—reflecting the fact that in this case, once a primitive has been found, it’s often immediately useful in making oscillators that have multiples of its period:
\nWe can think of almost everything we’ve talked about so far as being aimed at creating structures (like “clocks” and “wires”) that are recognizably useful for building traditional “machine-like” engineering systems. But a different possible objective is to find patterns that have some feature we can recognize, whether with obvious immediate “utility” or not. And as one example of this we can think about finding so-called “die hard” patterns that live as long as possible before dying out.
\nThe phenomenon of computational irreducibility tells us that even given a particular pattern we can’t in general “know in advance” how long it’s going to take to die out (or if it ultimately dies out at all). So it’s inevitable that the problem of finding ultimate die-hard patterns can be unboundedly difficult, just like analogous problems for other computational systems (such as finding so-called “busy beavers” in Turing machines).
\nBut in practice one can use both search and construction techniques to find patterns that at least live a long time (even if not the very longest possible time). And as an example, here’s a very simple pattern (found by search) that lives for 132 steps before dying out (the “puff” at the end on the left is a reflection of how we’re showing “trails”; all the actual cells are zero at that point):
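Computational irreducibility means that in practice the only general way to determine such a lifetime is simply to run the pattern until it dies out (or one gives up). Here's a minimal Python sketch of that, representing a pattern as a set of live-cell coordinates; this is purely illustrative, not the optimized search code used to find the results quoted here:

```python
from collections import Counter

def step(cells):
    """One Game-of-Life generation on a set of live (x, y) cells."""
    counts = Counter((x + dx, y + dy) for (x, y) in cells
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # Alive next step: exactly 3 live neighbors, or 2 plus currently alive
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in cells)}

def lifetime(cells, max_steps=1000):
    """Steps until the pattern dies out, or None if it survives the cap."""
    for t in range(1, max_steps + 1):
        cells = step(cells)
        if not cells:
            return t
    return None

# The classic 7-cell "diehard" methuselah, widely reported to vanish
# after 130 generations (coordinates transcribed from its usual form)
diehard = {(6, 0), (0, 1), (1, 1), (1, 2), (5, 2), (6, 2), (7, 2)}
print(lifetime(diehard))
```

Applied to random 16×16 seeds, a `lifetime` function like this is the primitive a brute-force die-hard search would loop over.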
\nSearching nearly 10^16 randomly chosen 16×16 patterns (out of a total of ≈ 10^77 possible such patterns), the longest lifetime found is 1413 steps—achieved with a rather random-looking initial pattern:
\nBut is this the best one can do? Well, no. Just consider a block and a spaceship n cells apart. It’ll take 2n steps for them to collide, and if the phases are right, annihilate each other:
\nSo by picking the separation n to be large enough, we can make this configuration “live as long as we want”. But what if we limit the size of the initial pattern, say to 32×32? In 2022 the following pattern was constructed:
\nAnd this pattern is carefully set up so that after 30,274 steps, everything lines up and it dies out, as we can see in the (vertically foreshortened) spacetime diagram on the left:
\nAnd, yes, the construction here clearly goes much further than search was able to reach. But can we go yet further? In 2023 a 116×86 pattern was constructed
\nthat was proved to eventually die out, but only after the absurdly large number of 17↑↑↑3 steps (probably even much larger than the number of emes in the ruliad), as given by:
\nor
\nThere are some definite rough ways in which technology development parallels biological evolution. Both involve the concept of trying out possibilities and building on ones that work. But technology development has always ultimately been driven by human effort, whereas biological evolution is, in effect, a “blind” process, based on the natural selection of random mutations. So what happens if we try to apply something like biological evolution to the Game of Life? As an example, let’s look at adaptive evolution that’s trying to maximize finite lifetime based on making a sequence of random point mutations within an initially random 16×16 pattern. Most of those mutations don’t give patterns with larger (finite) lifetimes, but occasionally there’s a “breakthrough” and the lifetime achieved so far jumps up:
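A minimal version of this kind of adaptive evolution is easy to sketch in Python. The settings here are illustrative assumptions, not those used for the plots above: an 8×8 region rather than 16×16 (so the demo runs quickly), a 200-step cap on lifetimes (patterns still alive at the cap get fitness 0), and acceptance of neutral as well as improving mutations:

```python
import random
from collections import Counter

def step(cells):
    counts = Counter((x + dx, y + dy) for (x, y) in cells
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in cells)}

def finite_lifetime(cells, max_steps=200):
    """Fitness: steps until extinction; 0 if still alive at the cap."""
    for t in range(1, max_steps + 1):
        cells = step(cells)
        if not cells:
            return t
    return 0

def mutate(cells, n):
    """Flip one randomly chosen cell inside the n-by-n region."""
    flip = (random.randrange(n), random.randrange(n))
    return cells ^ {flip}

random.seed(1)
n = 8  # assumed region size for this demo; the text uses 16x16
pattern = {(x, y) for x in range(n) for y in range(n) if random.random() < 0.5}
best = finite_lifetime(pattern)
history = [best]
for _ in range(150):
    candidate = mutate(pattern, n)
    fitness = finite_lifetime(candidate)
    if fitness >= best:              # accept improvements and neutral moves
        pattern, best = candidate, fitness
        if fitness > history[-1]:
            history.append(fitness)  # a "breakthrough"
print(history)  # lifetimes achieved at successive breakthroughs
```

Most mutations change nothing or make things worse; the `history` list records only the occasional jumps, mirroring the staircase-like plots above.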
\nThe actual behaviors corresponding to the breakthroughs in this case are:
\nAnd here are some other outcomes from adaptive evolution:
\nIn almost all cases, a limited number of steps of adaptive evolution do succeed in generating patterns with fairly long finite lifetimes. But the behavior we see typically shows no “readily understandable mechanisms”—and no obviously separable modular parts. And instead—just like in my recent studies of both biological evolution and machine learning—what we get are basically “lumps of irreducible computation” that “just happen” to show what we’re looking for (here, long lifetime).
\nLet’s say we’re presented with an array of cells that’s an initial condition for the Game of Life. Can we tell “where it came from”? Is it “just arbitrary” (or “random”)? Or was it “set up for a purpose”? And if it was “set up for a purpose”, was it “invented” (and “constructed”) for that purpose, or was it just “discovered” (say by a search) to fulfill that purpose?
\nWhether one’s dealing with archaeology, evolutionary biology, forensic science, the identification of alien intelligence or, for that matter, theology, the question of whether something “was set up for a purpose” is a philosophically fraught one. Any behavior one sees one can potentially explain either in terms of the mechanism that produces it, or in terms of what it “achieves”. Things get a little clearer if we have a particular language for describing both mechanisms and purposes. Then we can ask questions like: “Is the behavior we care about more succinctly described in terms of its mechanism or its purpose?” So, for example, “It behaves as a period-15 glider gun” might be an adequate purpose-oriented description, that’s much shorter than a mechanism-oriented description in terms of arrangements of cells.
\nBut what is the appropriate “lexicon of purposes” for the Game of Life? In effect, that’s a core question for Game-of-Life engineering. Because what engineering—and technology in general—is ultimately about is taking whatever raw material is available (whether from the physical world, or from the Game of Life) and somehow fashioning it into something that aligns with human purposes. But then we’re back to what counts as a valid human purpose. How deeply does the purpose have to connect in to everything we do? Is it, for example, enough for something to “look nice”, or is that not “utilitarian enough”? There aren’t absolute answers to these questions. And indeed the answers can change over time, as new uses for things are discovered (or invented).
\nBut for the Game of Life we can start with some of the “purposes” we’ve discussed here—like “be an oscillator of a certain period”, “reflect gliders”, “generate the primes” or even just “die after as long as possible”. Let’s say we just start enumerating possible initial patterns, either randomly, or exhaustively. How often will we come across patterns that “achieve one of these purposes”? And will it “only achieve that purpose” or will it also “do extra stuff” that “seems irrelevant”?
\nAs an example, consider enumerating all possible 3×3 patterns of cells. There are altogether 2^9 = 512 of them. Some immediately become period-2 objects.
Other patterns can take a while to “become period 2”, but then at least give “pure period-2 objects”. And for example this one can be interpreted as being the smallest precursor, and taking the least time, to reach the period-2 object it produces:
\nThere are other cases that “get to the same place” but seem to “wander around” doing so, and therefore don’t seem as convincing as having been “created for the purpose of making a period-2 oscillator”:
\nThen there are much more egregious cases. Like
\nwhich after 173 steps gives
\nbut only after going through all sorts of complicated intermediate behavior
\nthat definitely doesn’t make it look like it’s going “straight to its purpose” (unless perhaps its purpose is to produce that final pattern from the smallest initial precursor, etc.).
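The enumeration described above is small enough to carry out directly. Here's an illustrative Python sketch that evolves each of the 512 seeds and records the period of whatever it eventually settles into, with period 1 covering still lifes and patterns that die out; the 250-step cap is an assumption, and seeds that haven't exactly repeated by then (like the R pentomino, which fits in 3×3, or anything that emits gliders) are left unclassified as None:

```python
from collections import Counter
from itertools import product

def step(cells):
    counts = Counter((x + dx, y + dy) for (x, y) in cells
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in cells)}

def eventual_period(cells, max_steps=250):
    """Period of the cycle the pattern falls into (1 = still life or dead),
    or None if no exact repeat is seen within max_steps."""
    seen = {frozenset(cells): 0}
    for t in range(1, max_steps + 1):
        cells = step(cells)
        key = frozenset(cells)
        if key in seen:
            return t - seen[key]
        seen[key] = t
    return None

tally = Counter()
for bits in product((0, 1), repeat=9):
    seed = {(i % 3, i // 3) for i, b in enumerate(bits) if b}
    tally[eventual_period(seed)] += 1
print(tally)
```

Note that because states are compared at absolute positions, a lone escaping glider never repeats and so also lands in the None bucket, which is the honest answer for a bounded check like this.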
\nBut, OK. Let’s imagine we have a pattern that “goes straight to” some “recognizable purpose” (like being an oscillator of a certain period). The next question is: was that pattern explicitly constructed with an understanding of how it would achieve its purpose, or was it instead “blindly found” by some kind of search?
\nAs an example, let’s look at some period-9 oscillators:
\nOne like
\nseems like it must have been constructed out of “existing parts”, while one like
\nseems like it could only plausibly have been found by a search.
\nSpacetime views don’t tell us much in these particular cases:
\nBut causal graphs are much more revealing:
\nThey show that in the first case there are lots of “factored modular parts”, while in the second case there’s basically just one “irreducible blob” with no obvious separable parts. And we can view this as an immediate signal for “how human” each pattern is. In a sense it’s a reflection of the computational boundedness of our minds. When there are factored modular parts that interact fairly rarely and each behave in a fairly simple way, it’s realistic for us to “get our minds around” what’s going on. But when there’s just an “irreducible blob of activity” we’d have to compute too much and keep too much in mind at once for us to be able to really “understand what’s going on” and for example produce a human-level narrative explanation of it.
\nIf we find a pattern by search, however, we don’t really have to “understand it”; it’s just something we computationally “discover out there in the computational universe” that “happens” to do what we want. And, indeed, as in the example here, it often does what it does in a quite minimal (if incomprehensible) way. Something that’s found by human effort is much less likely to be minimal; in effect it’s at least somewhat “optimized for comprehensibility” rather than for minimality or ease of being found by search. And indeed it will often be far too big (e.g. in terms of number of cells) for any pure exhaustive or random search to plausibly find it—even though the “human-level narrative” for it might be quite short.
\nHere are the causal graphs for all the period-9 oscillators from above:
\nSome we can see can readily be broken down into multiple rarely interacting distinct components; others can’t be decomposed in this kind of way. And in a first approximation, the “decomposable” ones seem to be precisely those that were somehow “constructed by human effort”, while the non-decomposable ones seem to be those that were “discovered by searches”.
\nTypically, the way the “constructions” are done is to start with some collection of known parts, then, by trial and error (sometimes computer assisted) see how these can be fit together to get something that does what one wants. Searches, on the other hand, typically operate on “raw” configurations of cells, blindly going through a large number of possible configurations, at every stage automatically testing whether one’s got something that does what one wants.
\nAnd in the end these different strategies reveal themselves in the character of the final patterns they produce, and in the causal graphs that represent these patterns and their behavior.
\nIn engineering as it’s traditionally been practiced, the main emphasis tends to be on figuring out plans, and then constructing things based on those plans. Typically one starts from components one has, then tries to figure out how to combine them to incrementally build up what one wants.
\nAnd, as we’ve discussed, this is also a way of developing technology in the Game of Life. But as we’ve discussed at length, it’s not the only way. Another way is just to search for whole pieces of technology one wants.
\nTraditional intuition might make one assume this would be hopeless. But the repeated lesson of my discoveries about simple programs—as well as what’s been done with the Game of Life—is that actually it’s often not hopeless at all, and instead it’s very powerful.
\nYes, what you get is not likely to be readily “understandable”. But it is likely to be minimal and potentially quite optimal for whatever it is that it does. I’ve often talked of this approach as “mining from the computational universe”. And over the course of many years I’ve had success with it in all sorts of disparate areas. And now, here, we’ve seen in the Game of Life a particularly clean example where search is used alongside construction in developing technology.
\nIt’s a feature of things produced by construction that they are “born understandable”. In effect, they are computationally reducible enough that we can “fit them in our finite minds” and “understand them”. But things found by search don’t have this feature. And most of the time the behavior they’ll show will be full of computational irreducibility.
\nIn both biological evolution and machine learning my recent investigations suggest that most of what we’re seeing are “lumps of irreducible computation” found at random that just “happen to achieve the necessary objectives”. This hasn’t been something familiar in traditional engineering, but it’s something tremendously powerful. And from the examples we’ve seen here in the Game of Life it’s clear that it can often achieve things that seem completely inaccessible by traditional methods based on explicit construction.
\nAt first we might assume that irreducible computation is too unruly and unpredictable to be useful in achieving “understandable objectives”. But if we find just the right piece of irreducible computation then it’ll achieve the objective we want, often in a very minimal way. And the point is that the computational universe is in a sense big enough that we’ll usually be able to find that “right piece of irreducible computation”.
\nOne thing we see in Game-of-Life engineering is something that’s in a sense a compromise between irreducible computation and predictable construction. The basic idea is to take something that’s computationally irreducible, and to “put it in a cage” that constrains it to do what one wants. The computational irreducibility is in a sense the “spark” in the system; the cage provides the control we need to harness that spark in a way that meets our objectives.
\nLet’s look at some examples. As our “spark” we’ll use the R pentomino that we discussed at the very beginning. On its own, this generates all sorts of complex behavior—that for the most part doesn’t align with typical objectives we might define (though as a “side show” it does happen to generate gliders). But the idea is to put constraints on the R pentomino to make it “useful”.
Here’s a case where we’ve tried to “build a road” for the R pentomino to go down:
\nAnd looking at this every 18 steps we see that, at least for a while, the R pentomino has indeed moved down the road. But it’s also generated something of an “explosion”, and eventually this explosion catches up, and the R pentomino is destroyed.
\nSo can we maintain enough control to let the R pentomino survive? The answer is yes. And here, for example, is a period-12 oscillator, “powered” by an R pentomino at its center:
\nWithout the R pentomino, the structure we’ve set up cycles with period 6:
\nAnd when we insert the R pentomino this structure “keeps it under control”—so that the only effect it ultimately has is to double the period, to 12.
\nHere’s a more dramatic example. Start with a static configuration of four so-called “eaters”:
\nNow insert two R pentominoes. They’ll start doing their thing, generating what seems like quite random behavior. But the “cage” defined by the “eaters” limits what can happen, and in the end what emerges is an oscillator—that has period 129:
\nWhat else can one “make R pentominoes do”? Well, with appropriate harnesses, they can for example be used to “power” oscillators with many different periods:
\n“Be an oscillator of a certain period” is in a sense a simple objective. But what about more complex objectives? Of course, any pattern of cells in the Game of Life will do something. But the question is whether that something aligns with technological objectives we have.
\nGenerically, things in the Game of Life will behave in computationally irreducible ways. And it’s this very fact that gives such richness to what can be done with the Game of Life. But can the computational irreducibility be controlled—and harnessed for technological purposes? In a sense that is the core challenge of engineering in both the Game of Life, and in the real world. (It’s also rather directly the challenge we face in making use of the computational power of AI, but still adequately aligning it with human objectives.)
\nAs we look at the arc of technological development in the Game of Life we see over the course of half a century all sorts of different advances being made. But will there be an end to this? Will we eventually run out of inventions and discoveries? The underlying presence of computational irreducibility makes it clear that we will not. The only thing that might end is the set of objectives we’re trying to meet. We now know how to make oscillators of any period. And unless we insist on for example finding the smallest oscillator of a given period, we can consider the problem of finding oscillators solved, with nothing more to discover.
\nIn the real world nature and the evolution of the universe inevitably confront us with new issues, which lead to new objectives. In the Game of Life—as in any other abstract area, like mathematics—the issue of defining new objectives is up to us. Computational irreducibility leads to infinite diversity and richness of what’s possible. The issue for us is to figure out what direction we want to go. And the story of engineering and technology in the Game of Life gives us, in effect, a simple model for the issues we confront in other areas of technology, like AI.
\nI’m not sure if I made the right decision back in 1981. I had come up with a very simple class of systems and was doing computer experiments on them, and was starting to get some interesting results. And when I mentioned what I was doing to a group of (then young) computer scientists they said “Oh, those things you’re studying are called cellular automata”. Well, actually, the cellular automata they were talking about were 2D systems while mine were 1D. And though that might seem like a technical difference, it has a big effect on one’s impression of what’s going on—because in 1D one can readily see “spacetime histories” that give an immediate sense of the “whole behavior of the system”, while in 2D one basically can’t.
\nI wondered what to call my models. I toyed with the term “polymones”—as a modernized nod to Leibniz’s monads. But in the end I decided that I should stick with a simpler connection to history, and just call my models, like their 2D analogs, “cellular automata”. In many ways I’m happy with that decision. Though one of its downsides has been a certain amount of conceptual confusion—more than anything centered around the Game of Life.
\nPeople often know that the Game of Life is an example of a cellular automaton. And they also know that within the Game of Life lots of structures (like gliders and glider guns) can be set up to do particular things. Meanwhile, they hear about my discoveries about the generation of complexity in cellular automata (like rule 30). And somehow they conflate these things—leading to all too many books etc. that show pictures of simple gliders in the Game of Life and say “Look at all this complexity!”
\nAt some level it’s a confusion between science and engineering. My efforts around cellular automata have centered on empirical science questions like “What does this cellular automaton do if you run it?” But—as I’ve discussed at length above—most of what’s been done with the Game of Life has centered instead on questions of engineering, like “What recognizable (or useful) structures can you build in the system?” It’s a different objective, with different results. And, in particular, by asking to “engineer understandable technology” one’s specifically eschewing the phenomenon of computational irreducibility—and the whole story of the emergence of complexity that’s been so central to my own scientific work on cellular automata and so much else.
\nMany times over the years, people would show me things they’d been able to build in the Game of Life—and I really wouldn’t know what to make of them. Yes, they seemed like impressive hacks. But what was the big picture? Was this just fun, or was there some broader intellectual point? Well, finally, not long ago I realized: this is not a story of science, it’s a story about the arc of engineering, or what one can call “metaengineering”.
\nAnd back in 2018, in connection with the upcoming 50th anniversary of the Game of Life, I decided to see what I could figure out about this. But I wasn’t satisfied with how far I got, and other priorities interceded. So—beyond one small comment that ended up in a 2020 New York Times article—I didn’t write anything about what I’d done. And the project languished. Until now. When somehow my long-time interest in “alien engineering”, combined with my recent results about biological evolution coalesced into a feeling that it was time to finally figure out what we could learn from all that effort that’s been put into the Game of Life.
\nIn a sense this brings closure to a very long-running story for me. The first time I heard about the Game of Life was in 1973. I was an early teenager then, and I’d just gotten access to a computer. By today’s standards the computer (an Elliott 903C) was a primitive one: the size of a desk, programmed with paper tape, with only 24 kilobytes of memory. I was interested in using it for things like writing a simulator for the physics of idealized gas molecules. But other kids who had access to the computer were instead more interested (much as many kids might be today) in writing games. Someone wrote a “Hunt the Wumpus” game. And someone else wrote a program for the “Game of Life”. The configurations of cells at each generation were printed out on a teleprinter. And for some reason people were particularly taken with the “Cheshire cat” configuration, in which all that was left at the end (as in Alice in Wonderland) was a “smile”. At the time, I absolutely didn’t see the point of any of this. I was interested in science, not games, and the Game of Life pretty much lost me at “Game”.
\nFor a number of years I didn’t have any further contact with the Game of Life. But then I met Bill Gosper, who I later learned had in 1970 discovered the glider gun in the Game of Life. I met Gosper first “online” (yes, even in 1978 that was a thing, at least if you used the MIT-MC computer through the ARPANET)—then in person in 1979. And in 1980 I visited him at Xerox PARC, where he described himself as part of the “entertainment division” and gave me strange math formulas printed on a not-yet-out-of-the-lab color laser printer
\nand also showed me a bitmapped display (complete with GUI) with lots of pixels dancing around that he enthusiastically explained were showing the Game of Life. Knowing what I know now, I would have been excited by what I saw. But at the time, it didn’t really register.
\nStill, in 1981, having started my big investigation of 1D cellular automata, and having made the connection to the 2D case of the Game of Life, I started wondering whether there was something “scientifically useful” that I could glean from all the effort I knew (particularly from Gosper) had been put into Life. It didn’t help that almost none of the output of that effort had been published. And in those days before the web, personal contact was pretty much the only way to get unpublished material. One of my larger “finds” was from a friend of mine from Oxford who passed on “lab notebook pages” he’d got from someone who was enumerating outcomes from different Game-of-Life initial configurations:
\nAnd from material like this, as well as my own simulations, I came up with some tentative “scientific conclusions”, which I summarized in 1982 in a paragraph in my first big paper about cellular automata:
\nBut then, at the beginning of 1983, as part of my continuing effort to do science on cellular automata, I made a discovery. Among all cellular automata there seemed to be four basic classes of behavior, with class 4 being characterized by the presence of localized structures, sometimes just periodic, and sometimes moving:
\nI immediately recognized the analogy to the Game of Life, and to oscillators and gliders there. And indeed this analogy was part of what “tipped me off” to thinking about the ubiquitous computational capabilities of cellular automata, and to the phenomenon of computational irreducibility.
\nMeanwhile, in March 1983, I co-organized what was effectively the first-ever conference on cellular automata (held at Los Alamos)—and one of the people I invited was Gosper. He announced his Hashlife algorithm (which was crucial to future Life research) there, and came bearing gifts: printouts for me of Life, that I annotated, and still have in my archives:
\nI asked Gosper to do some “more scientific” experiments for me—for example starting from a region of randomness, then seeing what happened:
\nBut Gosper really wasn’t interested in what I saw as being science; he wanted to do engineering, and make constructions—like this one he gave me, showing two glider guns exchanging streams of gliders (why would one care, I wondered):
\nI’d mostly studied 1D cellular automata—where I’d discovered a lot by systematically looking at their behavior “laid out in spacetime”. But in early 1984 I resolved to also systematically check out 2D cellular automata. And mostly the resounding conclusion was that their basic behavior was very similar to 1D. Out of all the rules we studied, the Game of Life didn’t particularly stand out. But—mostly to provide a familiar comparison point—I included pictures of it in the paper we wrote:
\nAnd we also went to the trouble of making a 3D “spacetime” picture of the Game of Life on a Cray supercomputer—though it was too small to show anything terribly interesting:
\nIt had been a column in Scientific American in 1970 that had first propelled the Game of Life to public prominence—and that had also launched the first great Life engineering challenge of finding a glider gun. And in both 1984 and 1985 a successor to that very same column ran stories about my 1D cellular automata. And in 1985, in collaboration with Scientific American, I thought it would be fun and interesting to reprise the 1970 glider gun challenge, but now for 1D class 4 cellular automata:
\nMany people participated. And my main conclusion was: yes, it seemed like one could do the same kinds of engineering in typical 1D class 4 cellular automata as one could in the Game of Life. But this was all several years before the web, and the kind of online community that has driven so much Game of Life engineering in modern times wasn’t yet able to form.
\nMeanwhile, by the next year, I was starting the development of Mathematica and what’s now the Wolfram Language, and for a few years didn’t have much time to think about cellular automata. But in 1987 when Gosper got involved in making pre-release demos of Mathematica he once again excitedly told me about his discoveries in the Game of Life, and gave me pictures like:
\nIt was in 1992 that the Game of Life once again appeared in my life. I had recently embarked on what would become the 10-year project of writing my book A New Kind of Science. I was working on one of the rather few “I already have this figured out” sections in the book—and I wanted to compare class 4 behavior in 1D and 2D. How was I to display the Game of Life, especially in a static book? Equipped with what’s now the Wolfram Language it was easy to come up with visualizations—looking “out” into a spacetime slice with more distant cells “in a fog”, as well as “down” into a fog of successive states:
\n\nAnd, yes, it was immediately striking how similar the spacetime slice looked to my pictures of 1D class 4 cellular automata. And when I wrote a note for the end of the book about Life, the correspondence became even more obvious. I’d always seen the glider gun as a movie. But in a spacetime slice it “made much more sense”, and looked incredibly similar to analogous structures in 1D class 4 cellular automata:
\nIn A New Kind of Science I put a lot of effort into historical notes. And as a part of such a note on “History of cellular automata” I had a paragraph about the Game of Life:
\nI first met John Conway in September 1983 (at a conference in the south of France). As I would tell his biographer many years later, my relationship with Conway was complicated from the start. We were both drawn to systems defined by very simple rules, but what we found interesting about them was very different. I wanted to understand the big picture and to explore science-oriented questions (and what I would now call ruliology). Conway, on the other hand, was interested in specific, often whimsically presented results—and in questions that could be couched as mathematical theorems.
\nIn my conversations with Conway, the Game of Life would sometimes come up, but Conway never seemed too interested in talking about it. In 2001, though, when I was writing my note about the history of 2D cellular automata, I spent several hours specifically asking Conway about the Game of Life and its history. At first Conway told me the standard origin story that Life had arisen as a kind of game. A bit later he said he’d at the time just been hired as a logic professor, and had wanted to use Life as a simple way to enumerate the recursive functions. In the end, it was hard to disentangle true recollections from false (or “elaborated”) ones. And, notably, when asked directly about the origin of the specific rules of Life, he was evasive. Of course, none of that should detract from Conway’s achievement in the concept of the Game of Life, and in the definition of the hacker-like culture around it—the fruits of which have now allowed me to do what I’ve done here.
\nFor many years after the publication of A New Kind of Science in 2002, I didn’t actively engage with the Game of Life—though I would hear from Life enthusiasts with some frequency, and from none more than Gosper, from whom I received hundreds of messages about Life, a typical example from 2017 concerning
\nand saying:
\nNovelty is mediated by the sporadic glider gas (which forms very sparse beams), sporadic debris (forming sparse lines), and is hidden in sporadic defects in the denser beams and lines. At this scale, each screen pixel represents 262144 × 262144 Life cells. Thus very sparse lines, e.g. density 10^-5, appear solid, while being very nearly transparent to gliders.
\nAfter 3.4G, (sparse) new glider beams are still fading up. The beams repeatedly strafe the x and y axis stalagmites.
\nI suspect this will (very) eventually lead to a positive density of switch-engines, and thus quadratic population growth.
⋮
\nFinally, around 4.2G, an eater1 (fish hook):
\nDepending on background novelty radiation, there ought to be one of these every few billion, all lying on a line through the origin.
⋮
\nWith much help from Tom R, I slogged to 18G, with *zero* new nonmovers in the 4th quadrant, causing me to propose a mechanism that precluded future new ones. But then Andrew Trevorrow fired up his Big Mac (TM), ran 60G, and found three new nonmovers! They are, respectively, a mirror image(!) of the 1st eater, and two blinkers, in phase, but not aligned with the origin. I.e., all four are "oners", or at least will lie on different trash trails.
\nI’m still waiting for one of these to sprout switch-engines and begin quadratic growth. But here’s a puzzle: Doesn’t the gas of sparse gliders (actually glider packets) in the diagonal strips athwart the 1st quadrant already reveal (small coefficient) quadratic growth? Which will *eventually* dominate? The area of the strips is increasing quadratically. Their density *appears* to be at least holding, but possibly along only one axis. I don’t see where quadratically many gliders could arise. They’re being manufactured at a (roughly) fixed rate. Imagine the above picture in the distant future. Where is the amplification that will keep those strips full? --Bill