diff --git a/.obsidian/community-plugins.json b/.obsidian/community-plugins.json
index 5737bf04..a2f31287 100644
--- a/.obsidian/community-plugins.json
+++ b/.obsidian/community-plugins.json
@@ -19,5 +19,18 @@
"extended-graph",
"mysnippets-plugin",
"obsidian-pandoc-reference-list",
- "obsidian-share-as-gist"
+ "obsidian-share-as-gist",
+ "obsidian-excalidraw-plugin",
+ "txt-as-md-obsidian",
+ "text-snippets-obsidian",
+ "obsidian-tasks-plugin",
+ "tag-wrangler",
+ "obsidian-plugin-toc",
+ "share-note",
+ "obsidian-shellcommands",
+ "obsidian-rollover-daily-todos",
+ "qmd-as-md-obsidian",
+ "number-headings-obsidian",
+ "note-aliases",
+ "nldates-obsidian"
]
\ No newline at end of file
diff --git a/.obsidian/plugins/rss-reader/data.json b/.obsidian/plugins/rss-reader/data.json
index a74278d7..94881108 100644
--- a/.obsidian/plugins/rss-reader/data.json
+++ b/.obsidian/plugins/rss-reader/data.json
@@ -66,16 +66,16 @@
"description": "xkcd.com: A webcomic of romance and math humor.",
"items": [
{
- "title": "The Story Continues: Announcing Version 14 of Wolfram Language and Mathematica",
- "description": "Version 14.0 of Wolfram Language and Mathematica is available immediately both on the desktop and in the cloud. See also more detailed information on Version 13.1, Version 13.2 and Version 13.3. Building Something Greater and Greater… for 35 Years and Counting Today we celebrate a new waypoint on our journey of nearly four decades with […]",
- "content": "
Version 14.0 of Wolfram Language and Mathematica is available immediately both on the desktop and in the cloud. See also more detailed information on Version 13.1, Version 13.2 and Version 13.3.
\nToday we celebrate a new waypoint on our journey of nearly four decades with the release of Version 14.0 of Wolfram Language and Mathematica. Over the two years since we released Version 13.0 we’ve been steadily delivering the fruits of our research and development in .1 releases every six months. Today we’re aggregating these—and more—into Version 14.0.
\nIt’s been more than 35 years now since we released Version 1.0. And all those years we’ve been continuing to build a taller and taller tower of capabilities, progressively expanding the scope of our vision and the breadth of our computational coverage of the world:
\n
Version 1.0 had 554 built-in functions; in Version 14.0 there are 6602. And behind each of those functions is a story. Sometimes it’s a story of creating a superalgorithm that encapsulates decades of algorithmic development. Sometimes it’s a story of painstakingly curating data that’s never been assembled before. Sometimes it’s a story of drilling down to the essence of something to invent new approaches and new functions that can capture it.
\nAnd from all these pieces we’ve been steadily building the coherent whole that is today’s Wolfram Language. In the arc of intellectual history it defines a broad, new, computational paradigm for formalizing the world. And at a practical level it provides a superpower for implementing computational thinking—and enabling “computational X” for all fields X.
\nTo us it’s profoundly satisfying to see what has been done over the past three decades with everything we’ve built so far. So many discoveries, so many inventions, so much achieved, so much learned. And seeing this helps drive forward our efforts to tackle still more, and to continue to push every boundary we can with our R&D, and to deliver the results in new versions of our system.
\nOur R&D portfolio is broad. From projects that get completed within months of their conception, to projects that rely on years (and sometimes even decades) of systematic development. And key to everything we do is leveraging what we have already done—often taking what in earlier years was a pinnacle of technical achievement, and now using it as a routine building block to reach a level that could barely even be imagined before. And beyond practical technology, we’re also continually going further and further in leveraging what’s now the vast conceptual framework that we’ve been building all these years—and progressively encapsulating it in the design of the Wolfram Language.
\nWe’ve worked hard all these years not only to create ideas and technology, but also to craft a practical and sustainable ecosystem in which we can systematically do this now and into the long-term future. And we continue to innovate in these areas, broadening the delivery of what we’ve built in new and different ways, and through new and different channels. And in the past five years we’ve also been able to open up our core design process to the world—regularly livestreaming what we’re doing in a uniquely open way.
\nAnd indeed over the past several years the seeds of essentially everything we’re delivering today in Version 14.0 have been openly shared with the world, and represent an achievement not only for our internal teams but also for the many people who have participated in and commented on our livestreams.
\nPart of what Version 14.0 is about is continuing to expand the domain of our computational language, and our computational formalization of the world. But Version 14.0 is also about streamlining and polishing the functionality we’ve already defined. Throughout the system there are things we’ve made more efficient, more robust and more convenient. And, yes, in complex software, bugs of many kinds are a theoretical and practical inevitability. And in Version 14.0 we’ve fixed nearly 10,000 bugs, the majority found by our increasingly sophisticated internal software testing methods.
\nEven after all the work we’ve put into the Wolfram Language over the past several decades, there’s still yet another challenge: how to let people know just what the Wolfram Language can do. Back when we released Version 1.0 I was able to write a book of manageable size that could pretty much explain the whole system. But for Version 14.0—with all the functionality it contains—one would need a book with perhaps 200,000 pages.
\nAnd at this point nobody (even me!) immediately knows everything the Wolfram Language does. Of course one of our great achievements has been to maintain across all that functionality a tightly coherent and consistent design that results in there ultimately being only a small set of fundamental principles to learn. But at the vast scale of the Wolfram Language as it exists today, knowing what’s possible—and what can now be formulated in computational terms—is inevitably very challenging. And all too often when I show people what’s possible, I’ll get the response “I had no idea the Wolfram Language could do that!”
\nSo in the past few years we’ve put increasing emphasis into building large-scale mechanisms to explain the Wolfram Language to people. It begins at a very fine-grained level, with “just-in-time information” provided, for example, through suggestions made when you type. Then for each function (or other construct in the language) there are pages that explain the function, with extensive examples. And now, increasingly, we’re adding “just-in-time learning material” that leverages the concreteness of the functions to provide self-contained explanations of the broader context of what they do.
\nBy the way, in modern times we need to explain the Wolfram Language not just to humans, but also to AIs—and our very extensive documentation and examples have proved extremely valuable in training LLMs to use the Wolfram Language. And for AIs we’re providing a variety of tools—like immediate computable access to documentation, and computable error handling. And with our Chat Notebook technology there’s also a new “on ramp” for creating Wolfram Language code from linguistic (or visual, etc.) input.
\nBut what about the bigger picture of the Wolfram Language? For both people and AIs it’s important to be able to explain things at a higher level, and we’ve been doing more and more in this direction. For more than 30 years we’ve had “guide pages” that summarize specific functionality in particular areas. Now we’re adding “core area pages” that give a broader picture of large areas of functionality—each one in effect covering what might otherwise be a whole product on its own, if it wasn’t just an integrated part of the Wolfram Language:
\nBut we’re going even much further, building whole courses and books that provide modern hands-on Wolfram-Language-enabled introductions to a broad range of areas. We’ve now covered the material of many standard college courses (and quite a lot besides), in a new and very effective “computational” way, that allows immediate, practical engagement with concepts:
\nAll these courses involve not only lectures and notebooks but also auto-graded exercises, as well as official certifications. And we have a regular calendar of everyone-gets-together-at-the-same-time instructor-led peer Study Groups about these courses. And, yes, our Wolfram U operation is now emerging as a significant educational entity, with many thousands of students at any given time.
\nIn addition to whole courses, we have “miniseries” of lectures about specific topics:
\n\nAnd we also have courses—and books—about the Wolfram Language itself, like my Elementary Introduction to the Wolfram Language, which came out in a third edition this year (and has an associated course, online version, etc.):
\n\nIn a somewhat different direction, we’ve expanded our Wolfram Summer School to add a Wolfram Winter School, and we’ve greatly expanded our Wolfram High School Summer Research Program, adding year-round programs, middle-school programs, etc.—including the new “Computational Adventures” weekly activity program.
\nAnd then there’s livestreaming. We’ve been doing weekly “R&D livestreams” with our development team (and sometimes also external guests). And I myself have also been doing a lot of livestreaming (232 hours of it in 2023 alone)—some of it design reviews of Wolfram Language functionality, and some of it answering questions, technical and other.
\nThe list of ways we’re getting the word out about the Wolfram Language goes on. There’s Wolfram Community, which is full of interesting contributions and has ever-increasing readership. There are sites like Wolfram Challenges. There are our Wolfram Technology Conferences. And lots more.
\nWe’ve put immense effort into building the whole Wolfram technology stack over the past four decades. And even as we continue to aggressively build it, we’re putting more and more effort into telling the world about just what’s in it, and helping people (and AIs) to make the most effective use of it. But in a sense, everything we’re doing is just a seed for what the wider community of Wolfram Language users are doing, and can do. Spreading the power of the Wolfram Language to more and more people and areas.
\nThe machine learning superfunctions Classify and Predict first appeared in Wolfram Language in 2014 (Version 10). By the next year there were starting to be functions like ImageIdentify and LanguageIdentify, and within a couple of years we’d introduced our whole neural net framework and Neural Net Repository. Included in that were a variety of neural nets for language modeling, that allowed us to build out functions like SpeechRecognize and an experimental version of FindTextualAnswer. But—like everyone else—we were taken by surprise at the end of 2022 by ChatGPT and its remarkable capabilities.
\nVery quickly we realized that a major new use case—and market—had arrived for Wolfram|Alpha and Wolfram Language. For now it was not only humans who’d need the tools we’d built; it was also AIs. By March 2023 we’d worked with OpenAI to use our Wolfram Cloud technology to deliver a plugin to ChatGPT that allows it to call Wolfram|Alpha and Wolfram Language. LLMs like ChatGPT provide remarkable new capabilities in reproducing human language, basic human thinking and general commonsense knowledge. But—like unaided humans—they’re not set up to deal with detailed computation or precise knowledge. For that, like humans, they have to use formalism and tools. And the remarkable thing is that the formalism and tools we’ve built in Wolfram Language (and Wolfram|Alpha) are basically a broad, perfect fit for what they need.
\nWe created the Wolfram Language to provide a bridge from what humans think about to what computation can express and implement. And now that’s what the AIs can use as well. The Wolfram Language provides a medium not only for humans to “think computationally” but also for AIs to do so. And we’ve been steadily doing the engineering to let AIs call on Wolfram Language as easily as possible.
\nBut in addition to LLMs using Wolfram Language, there’s also now the possibility of Wolfram Language using LLMs. And already in June 2023 (Version 13.3) we released a major collection of LLM-based capabilities in Wolfram Language. One category is LLM functions, that effectively use LLMs as “internal algorithms” for operations in Wolfram Language:
\nIn typical Wolfram Language fashion, we have a symbolic representation for LLMs: LLMConfiguration[…] represents an LLM with its various parameters, promptings, etc. And in the past few months we’ve been steadily adding connections to the full range of popular LLMs, making Wolfram Language a unique hub not only for LLM usage, but also for studying the performance—and science—of LLMs.
\nYou can define your own LLM functions in Wolfram Language. But there’s also the Wolfram Prompt Repository that plays a similar role for LLM functions as the Wolfram Function Repository does for ordinary Wolfram Language functions. There’s a public Prompt Repository that so far has several hundred curated prompts. But it’s also possible for anyone to post their prompts in the Wolfram Cloud and make them publicly (or privately) accessible. The prompts can define personas (“talk like a [stereotypical] pirate”). They can define AI-oriented functions (“write it with emoji”). And they can define modifiers that affect the form of output (“haiku style”).
\n\nIn addition to calling LLMs “programmatically” within Wolfram Language, there’s the new concept (first introduced in Version 13.3) of “Chat Notebooks”. Chat Notebooks represent a new kind of user interface, that combines the graphical, computational and document features of traditional Wolfram Notebooks with the new linguistic interface capabilities brought to us by LLMs.
\nThe basic idea of a Chat Notebook—as introduced in Version 13.3, and now extended in Version 14.0—is that you can have “chat cells” (requested by typing ‘) whose content gets sent not to the Wolfram kernel, but instead to an LLM:
\nYou can use “function prompts”—say from the Wolfram Prompt Repository—directly in a Chat Notebook:
\nAnd as of Version 14.0 you can also knit Wolfram Language computations directly into your “conversation” with the LLM:
\n(You type \\ to insert Wolfram Language, very much like the way you can use <* … *> to insert Wolfram Language into external evaluation cells.)
\nOne thing about Chat Notebooks is that—as their name suggests—they really are centered around “chatting”, and around having a sequential interaction with an LLM. In an ordinary notebook, it doesn’t matter where in the notebook each Wolfram Language evaluation is requested; all that’s relevant is the order in which the Wolfram kernel does the evaluations. But in a Chat Notebook the “LLM evaluations” are always part of a “chat” that’s explicitly laid out in the notebook.
\nA key part of Chat Notebooks is the concept of a chat block: type ~ and you get a separator in the notebook that “starts a new chat”:
\nChat Notebooks—with all their typical Wolfram Notebook editing, structuring, automation, etc. capabilities—are very powerful just as “LLM interfaces”. But there’s another dimension as well, enabled by LLMs being able to call Wolfram Language as a tool.
\nAt one level, Chat Notebooks provide an “on ramp” for using Wolfram Language. Wolfram|Alpha—and even more so, Wolfram|Alpha Notebook Edition—let you ask questions in natural language, then have the questions translated into Wolfram Language, and answers computed. But in Chat Notebooks you can go beyond asking specific questions. Instead, through the LLM, you can just “start chatting” about what you want to do, then have Wolfram Language code generated, and executed:
\nThe workflow is typically as follows. First, you have to conceptualize in computational terms what you want. (And, yes, that step requires computational thinking—which is a very important skill that too few people have so far learned.) Then you tell the LLM what you want, and it’ll try to write Wolfram Language code to achieve it. It’ll typically run the code for you (but you can also always do it yourself)—and you can see whether you got what you wanted. But what’s crucial is that Wolfram Language is intended to be read not only by computers but also by humans. And particularly since LLMs actually usually seem to manage to write pretty good Wolfram Language code, you can expect to read what they wrote, and see if it’s what you wanted. If it is, you can take that code, and use it as a “solid building block” for whatever larger system you might be trying to set up. Otherwise, you can either fix it yourself, or try chatting with the LLM to get it to do it.
\nOne of the things we see in the example above is the LLM—within the Chat Notebook—making a “tool call”, here to a Wolfram Language evaluator. In the Wolfram Language there’s now a whole mechanism for defining tools for LLMs—with each tool being represented by an LLMTool symbolic object. In Version 14.0 there’s an experimental version of the new Wolfram LLM Tool Repository with some predefined tools:
\n\nIn a default Chat Notebook, the LLM has access to some default tools, which include not only the Wolfram Language evaluator, but also things like Wolfram documentation search and Wolfram|Alpha query. And it’s common to see the LLM go back and forth trying to write “code that works”, and for example sometimes having to “resort” (much like humans do) to reading the documentation.
\nSomething that’s new in Version 14.0 is experimental access to multimodal LLMs that can take images as well as text as input. And when this capability is enabled, it allows the LLM to “look at pictures from the code it generated”, see if they’re what was asked for, and potentially correct itself:
\nThe deep integration of images into Wolfram Language—and Wolfram Notebooks—yields all sorts of possibilities for multimodal LLMs. Here we’re giving a plot as an image and asking the LLM how to reproduce it:
\nAnother direction for multimodal LLMs is to take data (in the hundreds of formats accepted by Wolfram Language) and use the LLM to guide its visualization and analysis in the Wolfram Language. Here’s an example that starts from a file data.csv in the current directory on your computer:
\nOne thing that’s very nice about using Wolfram Language directly is that everything you do (well, unless you use RandomInteger, etc.) is completely reproducible; do the same computation twice and you’ll get the same result. That’s not true with LLMs (at least right now). And so when one uses LLMs it feels like something more ephemeral and fleeting than using Wolfram Language. One has to grab any good results one gets—because one might never be able to reproduce them. Yes, it’s very helpful that one can store everything in a Chat Notebook, even if one can’t rerun it and get the same results. But the more “permanent” use of LLM results tends to be “offline”. Use an LLM “up front” to figure something out, then just use the result it gave.
\nOne unexpected application of LLMs for us has been in suggesting names of functions. With the LLM’s “experience” of what people talk about, it’s in a good position to suggest functions that people might find useful. And, yes, when it writes code it has a habit of hallucinating such functions. But in Version 14.0 we’ve actually added one function—DigitSum—that was suggested to us by LLMs. And in a similar vein, we can expect LLMs to be useful in making connections to external databases, functions, etc. The LLM “reads the documentation”, and tries to write Wolfram Language “glue” code—which then can be reviewed, checked, etc., and if it’s right, can be used henceforth.
\nThen there’s data curation, which is a field that—through Wolfram|Alpha and many of our other efforts—we’ve become extremely expert at over the past couple of decades. How much can LLMs help with that? They certainly don’t “solve the whole problem”, but integrating them with the tools we already have has allowed us over the past year to speed up some of our data curation pipelines by factors of two or more.
\nIf we look at the whole stack of technology and content that’s in the modern Wolfram Language, the overwhelming majority of it isn’t helped by LLMs, and isn’t likely to be. But there are many—sometimes unexpected—corners where LLMs can dramatically improve heuristics or otherwise solve problems. And in Version 14.0 there are starting to be a wide variety of “LLM inside” functions.
\nAn example is TextSummarize, which is a function we’ve considered adding for many versions—but now, thanks to LLMs, can finally implement to a useful level:
\nThe main LLMs that we’re using right now are based on external services. But we’re building capabilities to allow us to run LLMs in local Wolfram Language installations as soon as that’s technically feasible. And one capability that’s actually part of our mainline machine learning effort is NetExternalObject—a way of representing symbolically an externally defined neural net that can be run inside Wolfram Language. NetExternalObject allows you, for example, to take any network in ONNX form and effectively treat it as a component in a Wolfram Language neural net. Here’s a network for image depth estimation—that we’re here importing from an external repository (though in this case there’s actually a similar network already in the Wolfram Neural Net Repository):
\nNow we can apply this imported network to an image that’s been encoded with our built-in image encoder—then we’re taking the result and visualizing it:
\nIt’s often very convenient to be able to run networks locally, but it can sometimes take quite high-end hardware to do so. For example, there’s now a function in the Wolfram Function Repository that does image synthesis entirely locally—but to run it, you do need a GPU with at least 8 GB of VRAM:
\nBy the way, based on LLM principles (and ideas like transformers) there’ve been other related advances in machine learning that have been strengthening a whole range of Wolfram Language areas—with one example being image segmentation, where ImageSegmentationComponents now provides robust “content-sensitive” segmentation:
\nWhen Mathematica 1.0 was released in 1988, it was a “wow” that, yes, now one could routinely do integrals symbolically by computer. And it wasn’t long before we got to the point—first with indefinite integrals, and later with definite integrals—where what’s now the Wolfram Language could do integrals better than any human. So did that mean we were “finished” with calculus? Well, no. First there were differential equations, and partial differential equations. And it took a decade to get symbolic ODEs to a beyond-human level. And with symbolic PDEs it took until just a few years ago. Somewhere along the way we built out discrete calculus, asymptotic expansions and integral transforms. And we also implemented lots of specific features needed for applications like statistics, probability, signal processing and control theory. But even now there are still frontiers.
\nAnd in Version 14 there are significant advances around calculus. One category concerns the structure of answers. Yes, one can have a formula that correctly represents the solution to a differential equation. But is it in the best, simplest or most useful form? Well, in Version 14 we’ve worked hard to make sure it is—often dramatically reducing the size of expressions that get generated.
\nAnother advance has to do with expanding the range of “pre-packaged” calculus operations. We’ve been able to do derivatives ever since Version 1.0. But in Version 14 we’ve added implicit differentiation. And, yes, one can give a basic definition for this easily enough using ordinary differentiation and equation solving. But by adding an explicit ImplicitD we’re packaging all that up—and handling the tricky corner cases—so that it becomes routine to use implicit differentiation wherever you want:
\nAnother category of pre-packaged calculus operations new in Version 14 are ones for vector-based integration. These were always possible to do in a “do-it-yourself” mode. But in Version 14 they are now streamlined built-in functions—that, by the way, also cover corner cases, etc. And what made them possible is actually a development in another area: our decade-long project to add geometric computation to Wolfram Language—which gave us a natural way to describe geometric constructs such as curves and surfaces:
\nRelated functionality new in Version 14 is ContourIntegrate:
\nFunctions like ContourIntegrate just “get the answer”. But if one’s learning or exploring calculus it’s often also useful to be able to do things in a more step-by-step way. In Version 14 you can start with an inactive integral
\nand explicitly do operations like changing variables:
\nSometimes actual answers get expressed in inactive form, particularly as infinite sums:
\nAnd now in Version 14 the function TruncateSum lets you take such a sum and generate a truncated “approximation”:
\nFunctions like D and Integrate—as well as LineIntegrate and SurfaceIntegrate—are, in a sense, “classic calculus”, taught and used for more than three centuries. But in Version 14 we also support what we can think of as “emerging” calculus operations, like fractional differentiation:
\nWhat are the primitives from which we can best build our conception of computation? That’s at some level the question I’ve been asking for more than four decades, and what’s determined the functions and structures at the core of the Wolfram Language.
\nAnd as the years go by, and we see more and more of what’s possible, we recognize and invent new primitives that will be useful. And, yes, the world—and the ways people interact with computers—change too, opening up new possibilities and bringing new understanding of things. Oh, and this year there are LLMs which can “get the intellectual sense of the world” and suggest new functions that can fit into the framework we’ve created with the Wolfram Language. (And, by the way, there’ve also been lots of great suggestions made by the audiences of our design review livestreams.)
\nOne new construct added in Version 13.1—and that I personally have found very useful—is Threaded. When a function is listable—as Plus is—the top levels of lists get combined:
\nBut sometimes you want one list to be “threaded into” the other at the lowest level, not the highest. And now there’s a way to specify that, using Threaded:
\nIn a sense, Threaded is part of a new wave of symbolic constructs that have “ambient effects” on lists. One very simple example (introduced in 2015) is Nothing:
\nAnother, introduced in 2020, is Splice:
\nAn old chestnut of Wolfram Language design concerns the way infinite evaluation loops are handled. And in Version 13.2 we introduced the symbolic construct TerminatedEvaluation to provide better definition of how out-of-control evaluations have been terminated:
\nIn a curious connection, in the computational representation of physics in our recent Physics Project, the direct analog of nonterminating evaluations is what makes possible the seemingly unending universe in which we live.
\nBut what is actually going on “inside an evaluation”, terminating or not? I’ve always wanted a good representation of this. And in fact back in Version 2.0 we introduced Trace for this purpose:
\nBut just how much detail of what the evaluator does should one show? Back in Version 2.0 we introduced the option TraceOriginal that traces every path followed by the evaluator:
\nBut often this is way too much. And in Version 14.0 we’ve introduced the new setting TraceOriginal→Automatic, which doesn’t include in its output evaluations that don’t do anything:
\nThis may seem pedantic, but when one has an expression of any substantial size, it’s a crucial piece of pruning. So, for example, here’s a graphical representation of a simple arithmetic evaluation, with TraceOriginal→True:
\nAnd here’s the corresponding “pruned” version, with TraceOriginal→Automatic:
\n(And, yes, the structures of these graphs are closely related to things like the causal graphs we construct in our Physics Project.)
\nIn the effort to add computational primitives to the Wolfram Language, two new entrants in Version 14.0 are Comap and ComapApply. The function Map takes a function f and “maps it” over a list:
\nComap does the “mathematically co-” version of this, taking a list of functions and “comapping” them onto a single argument:
\nWhy is this useful? As an example, one might want to apply three different statistical functions to a single list. And now it’s easy to do that, using Comap:
\nBy the way, as with Map, there’s also an operator form for Comap:
\nComap works well when the functions it’s dealing with take just one argument. If one has functions that take multiple arguments, ComapApply is what one typically wants:
\nTalking of “co-like” functions, a new function added in Version 13.2 is PositionSmallest. Min gives the smallest element in a list; PositionSmallest instead says where the smallest elements are:
\nOne of the important objectives in the Wolfram Language is to have as much as possible “just work”. When we released Version 1.0 strings could be assumed just to contain ordinary ASCII characters, or perhaps to have an external character encoding defined. And, yes, it could be messy not to know “within the string itself” what characters were supposed to be there. And by the time of Version 3.0 in 1996 we’d become contributors to, and early adopters of, Unicode, which provided a standard encoding for “16-bits’-worth” of characters. And for many years this served us well. But in time—and particularly with the growth of emoji—16 bits wasn’t enough to encode all the characters people wanted to use. So a few years ago we began rolling out support for 32-bit Unicode, and in Version 13.1 we integrated it into notebooks—in effect making strings something much richer than before:
\nAnd, yes, you can use Unicode everywhere now:
\nBack when Version 1.0 was released, a megabyte was a lot of memory. But 35 years later we routinely deal with gigabytes. And one of the things that makes practical is computation with video. We first introduced Video experimentally in Version 12.1 in 2020. And over the past three years we’ve been systematically broadening and strengthening our ability to deal with video in Wolfram Language. Probably the single most important advance is that things around video now—as much as possible—“just work”, without “creaking” under the strain of handling such large amounts of data.
\nWe can directly capture video into notebooks, and we can robustly play video anywhere within a notebook. We’ve also added options for where to store the video so that it’s conveniently accessible to you and anyone else you want to give access to it.
\nThere’s lots of complexity in the encoding of video—and we now robustly and transparently support more than 500 codecs. We also do lots of convenient things automatically, like rotating portrait-mode videos—and being able to apply image processing operations like ImageCrop across whole videos. In every version, we’ve been further optimizing the speed of some video operation or another.
\nBut a particularly big focus has been on video generators: programmatic ways to produce videos and animations. One basic example is AnimationVideo, which produces the same kind of output as Animate, but as a Video object that can either be displayed directly in a notebook, or exported in MP4 or some other format:
\nAnimationVideo is based on computing each frame in a video by evaluating an expression. Another class of video generators takes an existing visual construct, and simply “tours” it. TourVideo “tours” images, graphics and geo graphics; Tour3DVideo (new in Version 14.0) tours 3D geometry:
\nA very powerful capability in Wolfram Language is being able to apply arbitrary functions to videos. One example of how this can be done is VideoFrameMap, which maps a function across frames of a video, and which was made efficient in Version 13.2:
\nAnd although Wolfram Language isn’t intended as an interactive video editing system, we’ve made sure that it’s possible to do streamlined programmatic video editing in the language, and for example in Version 14.0 we’ve added things like transition effects in VideoJoin and timed overlays in OverlayVideo.
\nWith every new version of Wolfram Language we add new capabilities to extend yet further the domain of the language. But we also put a lot of effort into something less immediately visible: making existing capabilities faster, stronger and sleeker.
\nAnd in Version 14 two areas where we can see some examples of all these are dates and quantities. We introduced the notion of symbolic dates (DateObject, etc.) nearly a decade ago. And over the years since then we’ve built many things on this structure. And in the process of doing this it’s become clear that there are certain flows and paths that are particularly common and convenient. At the beginning what mattered most was just to make sure that the relevant functionality existed. But over time we’ve been able to see what should be streamlined and optimized, and we’ve steadily been doing that.
\nIn addition, as we’ve worked towards new and different applications, we’ve seen “corners” that need to be filled in. So, for example, astronomy is an area we’ve significantly developed in Version 14, and supporting astronomy has required adding several new “high-precision” time capabilities, such as the TimeSystem option, as well as new astronomy-oriented calendar systems. Another example concerns date arithmetic. What should happen if you want to add a month to January 30? Where should you land? Different kinds of business applications and contracts make different assumptions—and so we added a Method option to functions like DatePlus to handle this. Meanwhile, having realized that date arithmetic is involved in the “inner loop” of certain computations, we optimized it—achieving a more than 100x speedup in Version 14.0.
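The "add a month to January 30" question above has more than one defensible answer. As a rough sketch in Python (not Wolfram Language), here are two possible conventions. The function names are hypothetical, and these two rules are only illustrative assumptions; they are not claimed to be the choices offered by the Method option of DatePlus.

```python
import calendar
from datetime import date, timedelta

def add_month_clamp(d: date) -> date:
    """Add one month; if the day does not exist, clamp to the last day of the target month."""
    year = d.year + (d.month // 12)
    month = d.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

def add_month_rollover(d: date) -> date:
    """Add one month; if the day does not exist, spill the extra days into the following month."""
    year = d.year + (d.month // 12)
    month = d.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    overflow = max(0, d.day - last_day)
    return date(year, month, min(d.day, last_day)) + timedelta(days=overflow)

start = date(2024, 1, 30)
clamped = add_month_clamp(start)     # lands on the last day of February
rolled = add_month_rollover(start)   # spills over into March
```

The two results differ by a day or more depending on the year, which is why a contract or business application has to state which convention it assumes.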
\nWolfram|Alpha has been able to deal with units ever since it was first launched in 2009—now more than 10,000 of them. And in 2012 we introduced Quantity to represent quantities with units in the Wolfram Language. And over the past decade we’ve been steadily smoothing out a whole series of complicated gotchas and issues with units. For example, what does .
At first our priority with Quantity was to get it working as broadly as possible, and to integrate it as widely as possible into computations, visualizations, etc. across the system. But as its capabilities have expanded, so have its uses, repeatedly driving the need to optimize its operation for particular common cases. And indeed between Version 13 and Version 14 we’ve dramatically sped up many things related to Quantity, often by factors of 1000 or more.
\nTalking of speedups, another example—made possible by new algorithms operating on multithreaded CPUs—concerns polynomials. We’ve worked with polynomials in Wolfram Language since Version 1, but in Version 13.2 there was a dramatic speedup of up to 1000x on operations like polynomial factoring.
\nIn addition, a new algorithm in Version 14.0 dramatically speeds up numerical solutions to polynomial and transcendental equations and, together with the new MaxRoots option, allows us, for example, to pick off a few roots from a degree-one-million polynomial
\nor to find roots of a transcendental equation that we could not even attempt before without pre-specifying bounds on their values:
\nAnother “old” piece of functionality with recent enhancement concerns mathematical functions. Ever since Version 1.0 we’ve set up mathematical functions so that they can be computed to arbitrary precision:
\nBut in recent versions we’ve wanted to be “more precise about precision”, and to be able to rigorously compute just what range of outputs is possible given the range of values provided as input:
\nBut every function for which we do this effectively requires a new theorem, and we’ve been steadily increasing the number of functions covered—now more than 130—so that this “just works” when you need to use it in a computation.
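To make "range of outputs given a range of inputs" concrete, here is a very small Python sketch for a single function, squaring, using exact rational arithmetic. It is only an illustration of the idea under simple assumptions. It says nothing about how the Wolfram Language handles its 130+ functions, and `square_interval` is a hypothetical name.

```python
from fractions import Fraction

def square_interval(lo: Fraction, hi: Fraction) -> tuple[Fraction, Fraction]:
    """Exact range of x**2 for x in the closed interval [lo, hi]."""
    if lo > hi:
        raise ValueError("lower bound exceeds upper bound")
    candidates = (lo * lo, hi * hi)
    # If the interval straddles zero, the minimum of x**2 is 0, not an endpoint value.
    lower = Fraction(0) if lo <= 0 <= hi else min(candidates)
    return lower, max(candidates)

result = square_interval(Fraction(-1, 2), Fraction(3, 2))
```

Even this toy case needs a small argument about where the minimum sits, which hints at why each supported function "effectively requires a new theorem".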
\nTrees are useful. We first introduced them as basic objects in the Wolfram Language only in Version 12.3. But now that they’re there, we’re discovering more and more places they can be used. And to support that, we’ve been adding more and more capabilities to them.
\nOne area that’s advanced significantly since Version 13 is the rendering of trees. We tightened up the general graphic design, but, more importantly, we introduced many new options for how rendering should be done.
\nFor example, here’s a random tree where we’ve specified that for all nodes only 3 children should be explicitly displayed: the others are elided away:
\nHere we’re adding several options to define the rendering of the tree:
\nBy default, the branches in trees are labeled with integers, just like parts in an expression. But in Version 13.1 we added support for named branches defined by associations:
\nOur original conception of trees was very centered around having elements one would explicitly address, and that could have “payloads” attached. But what became clear is that there were applications where all that mattered was the structure of the tree, not anything about its elements. So we added UnlabeledTree to create “pure trees”:
\nTrees are useful because many kinds of structures are basically trees. And since Version 13 we’ve added capabilities for converting trees to and from various kinds of structures. For example, here’s a simple Dataset object:
\nYou can use ExpressionTree to convert this to a tree:
\nAnd TreeExpression to convert it back:
\nWe’ve also added capabilities for converting to and from JSON and XML, as well as for representing file directory structures as trees:
\nIn Version 1.0 we had integers, rational numbers and real numbers. In Version 3.0 we added algebraic numbers (represented implicitly by Root)—and a dozen years later we added algebraic number fields and transcendental roots. For Version 14 we’ve now added another (long-awaited) “number-related” construct: finite fields.
\nHere’s our symbolic representation of the field of integers modulo 7:
\nAnd now here’s a specific element of that field
\nwhich we can immediately compute with:
\nBut what’s really important about what we’ve done with finite fields is that we’ve fully integrated them into other functions in the system. So, for example, we can factor a polynomial whose coefficients are in a finite field:
\nWe can also do things like find solutions to equations over finite fields. So here, for example, is a point on a Fermat curve over the finite field GF(173):
\nAnd here is a power of a matrix with elements over the same finite field:
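For readers who want to see what arithmetic in the field of integers modulo 7 looks like in practice, here is a minimal Python sketch. It is an illustration only, not the Wolfram Language representation, and the `ff_*` helper names are hypothetical.

```python
P = 7  # modulus of the field of integers modulo 7 discussed above

def ff_add(a: int, b: int) -> int:
    return (a + b) % P

def ff_mul(a: int, b: int) -> int:
    return (a * b) % P

def ff_inv(a: int) -> int:
    """Multiplicative inverse modulo P (pow with exponent -1 needs Python 3.8+)."""
    if a % P == 0:
        raise ZeroDivisionError("0 has no inverse in a field")
    return pow(a, -1, P)

def ff_div(a: int, b: int) -> int:
    return ff_mul(a, ff_inv(b))

total = ff_add(3, 5)      # 8 reduced modulo 7
product = ff_mul(3, 5)    # 15 reduced modulo 7
quotient = ff_div(3, 5)   # 3 times the inverse of 5
```

Every nonzero element has an inverse because 7 is prime, which is what makes this a field rather than just a ring.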
\nA major new capability added since Version 13 is astro computation. It begins with being able to compute to high precision the positions of things like planets. Even knowing what one means by “position” is complicated, though—with lots of different coordinate systems to deal with. By default AstroPosition gives the position in the sky at the current time from your Here location:
\nBut one can instead ask about a different coordinate system, like global galactic coordinates:
\nAnd now here’s a plot of the distance between Saturn and Jupiter over a 50-year period:
\nIn direct analogy to GeoGraphics, we’ve added AstroGraphics, here showing a patch of sky around the current position of Saturn:
\nAnd this now shows the sequence of positions for Saturn over the course of a couple of years—yes, including retrograde motion:
\nThere are many styling options for AstroGraphics. Here we’re adding a background of the “galactic sky”:
\nAnd here we’re including renderings for constellations (and, yes, we had an artist draw them):
\nSomething specifically new in Version 14.0 has to do with extended handling of solar eclipses. We always try to deliver new functionality as fast as we can. But in this case there was a very specific deadline: the total solar eclipse visible from the US on April 8, 2024. We’ve had the ability to do global computations about solar eclipses for some time (actually since shortly before the 2017 eclipse). But now we can also do detailed local computations right in the Wolfram Language.
\nSo, for example, here’s a somewhat detailed overall map of the April 8, 2024, eclipse:
\nNow here’s a plot of the magnitude of the eclipse over a few hours, complete with a little “rampart” associated with the period of totality:
\nAnd here’s a map of the region of totality every minute just after the moment of maximum eclipse:
\nWe first introduced computable data on biological organisms back when Wolfram|Alpha was released in 2009. But in Version 14—following several years of work—we’ve dramatically broadened and deepened the computable data we have about biological organisms.
\nSo for example here’s how we can figure out what species have cheetahs as predators:
\nAnd here are pictures of these:
\nHere’s a map of countries where cheetahs have been seen (in the wild):
\nWe now have data—curated from a great many sources—on more than a million species of animals, as well as most of the plants, fungi, bacteria, viruses and archaea that have been described. And for animals, for example, we have nearly 200 properties that are extensively filled in. Some are taxonomic properties:
\nSome are physical properties:
\nSome are genetic properties:
\nSome are ecological properties (yes, the cheetah is not the apex predator):
\nIt’s useful to be able to get properties of individual species, but the real power of our curated computable data shows up when one does larger-scale analyses. Like here’s a plot of the lengths of genomes for organisms with the longest ones across our collection of organisms:
\nOr here’s a histogram of the genome lengths for organisms in the human gut microbiome:
\nAnd here’s a scatterplot of the lifespans of birds against their weights:
\nFollowing the idea that cheetahs aren’t apex predators, this is a graph of what’s “above” them in the food chain:
\nWe began the process of introducing chemical computation into the Wolfram Language in Version 12.0, and by Version 13 we had good coverage of atoms, molecules, bonds and functional groups. Now in Version 14 we’ve added coverage of chemical formulas, amounts of chemicals—and chemical reactions.
\nHere’s a chemical formula, that basically just gives a “count of atoms”:
\nNow here are specific molecules with that formula:
\nLet’s pick one of these molecules:
\nNow in Version 14 we have a way to represent a certain quantity of molecules of a given type—here 1 gram of methylcyclopentane:
\nChemicalConvert can convert to a different specification of quantity, here moles:
\nAnd here a count of molecules:
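The arithmetic behind these conversions is short enough to spell out. The following Python sketch is for illustration only and is not ChemicalConvert itself. It assumes the formula C6H12 for methylcyclopentane and rounded standard atomic weights for carbon and hydrogen.

```python
AVOGADRO = 6.02214076e23                  # molecules per mole (exact SI value)
ATOMIC_MASS = {"C": 12.011, "H": 1.008}   # g/mol, rounded standard atomic weights (assumed)

formula = {"C": 6, "H": 12}               # methylcyclopentane, C6H12
molar_mass = sum(ATOMIC_MASS[el] * n for el, n in formula.items())  # g/mol

mass_g = 1.0                              # the 1 gram from the text
moles = mass_g / molar_mass
molecules = moles * AVOGADRO
```

With these inputs the molar mass comes out at about 84.16 g/mol, so one gram is roughly 0.0119 mol, or on the order of 7 × 10^21 molecules.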
\nBut now the bigger story is that in Version 14 we can represent not just individual types of molecules, and quantities of molecules, but also chemical reactions. Here we give a “sloppy” unbalanced representation of a reaction, and ReactionBalance gives us the balanced version:
\nAnd now we can extract the formulas for the reactants:
\nWe can also give a chemical reaction in terms of molecules:
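Balancing a reaction means finding whole-number coefficients so that every element appears equally often on both sides. As a rough illustration, here is a brute-force Python sketch for a hypothetical reaction, H2 + O2 -> H2O, which is not taken from the original post. This is a naive search for small coefficients and is not a description of how ReactionBalance works.

```python
from itertools import product

# Hypothetical example reaction (an assumption for this sketch): H2 + O2 -> H2O
reactants = [{"H": 2}, {"O": 2}]
products = [{"H": 2, "O": 1}]

def atom_totals(species, coeffs):
    """Total count of each element for the given species and coefficients."""
    totals = {}
    for formula, c in zip(species, coeffs):
        for element, n in formula.items():
            totals[element] = totals.get(element, 0) + c * n
    return totals

def balance(reactants, products, max_coeff=10):
    """Smallest positive integer coefficients (by sum) that balance the reaction, or None."""
    n_r = len(reactants)
    solutions = [
        coeffs
        for coeffs in product(range(1, max_coeff + 1), repeat=n_r + len(products))
        if atom_totals(reactants, coeffs[:n_r]) == atom_totals(products, coeffs[n_r:])
    ]
    return min(solutions, key=sum) if solutions else None

coefficients = balance(reactants, products)  # (H2, O2, H2O)
```

For this example the search returns the familiar 2 H2 + O2 -> 2 H2O. A real implementation would solve the underlying linear system instead of enumerating candidates.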
\nBut with our symbolic representation of molecules and reactions, there’s now a big thing we can do: represent classes of reactions as “pattern reactions”, and work with them using the same kinds of concepts as we use in working with patterns for general expressions. So, for example, here’s a symbolic representation of the hydrohalogenation reaction:
\nNow we can apply this pattern reaction to particular molecules:
\nHere’s a more elaborate example, in this case entered using a SMARTS string:
\nHere we’re applying the reaction just once:
\nAnd now we’re doing it repeatedly
\nin this case generating longer and longer molecules (which in this case happen to be polypeptides):
\nEvery minute of every day, new data is being added to the Wolfram Knowledgebase. Much of it is coming automatically from real-time feeds. But we also have a very large-scale ongoing curation effort with humans in the loop. We’ve built sophisticated (Wolfram Language) automation for our data curation pipeline over the years—and this year we’ve been able to increase efficiency in some areas by using LLM technology. But it’s hard to do curation right, and our long-term experience is that to do so ultimately requires human experts being in the loop, which we have.
\nSo what’s new since Version 13.0? 291,842 new notable current and historical people; 264,467 music works; 118,538 music albums; 104,024 named stars; and so on. Sometimes the addition of an entity is driven by the new availability of reliable data; often it’s driven by the need to use that entity in some other piece of functionality (e.g. stars to render in AstroGraphics). But more than just adding entities there’s the issue of filling in values of properties of existing entities. And here again we’re always making progress, sometimes integrating newly available large-scale secondary data sources, and sometimes doing direct curation ourselves from primary sources.
\nA recent example where we needed to do direct curation was in data on alcoholic beverages. We have very extensive data on hundreds of thousands of types of foods and drinks. But none of our large-scale sources included data on alcoholic beverages. So that’s an area where we need to go to primary sources (in this case typically the original producers of products) and curate everything for ourselves.
\nSo, for example, we can now ask for something like the distribution of flavors of different varieties of vodka (actually, personally, not being a consumer of such things, I had no idea vodka even had flavors…):
\nBut beyond filling out entities and properties of existing types, we’ve also steadily been adding new entity types. One recent example is geological formations, 13,706 of them:
\nSo now, for example, we can specify where T. rex have been found
\nand we can show those regions on a map:
\nPDEs are hard. It’s hard to solve them. And it’s hard to even specify what exactly you want to solve. But we’ve been on a multi-decade mission to “consumerize” PDEs and make them easier to work with. Many things go into this. You need to be able to easily specify elaborate geometries. You need to be able to easily define mathematically complicated boundary conditions. You need to have a streamlined way to set up the complicated equations that come out of underlying physics. Then you have to—as automatically as possible—do the sophisticated numerical analysis to efficiently solve the equations. But that’s not all. You also often need to visualize your solution, compute other things from it, or run optimizations of parameters over it.
\nIt’s a deep use of what we’ve built with Wolfram Language—touching many parts of the system. And the result is something unique: a truly streamlined and integrated way to handle PDEs. One’s not dealing with some (usually very expensive) “just for PDEs” package; what we now have is a “consumerized” way to handle PDEs whenever they’re needed—for engineering, science, or whatever. And, yes, being able to connect machine learning, or image computation, or curated data, or data science, or real-time sensor feeds, or parallel computing, or, for that matter, Wolfram Notebooks, to PDEs just makes them so much more valuable.
\nWe’ve had “basic, raw NDSolve” since 1991. But what’s taken decades to build is all the structure around that to let one conveniently set up, and efficiently solve, real-world PDEs, and connect them into everything else. It’s taken developing a whole tower of underlying algorithmic capabilities such as our more-flexible-and-integrated-than-ever-before industrial-strength computational geometry and finite element methods. But beyond that it’s taken creating a language for specifying real-world PDEs. And here the symbolic nature of the Wolfram Language, together with our whole design framework, has made possible something unique that has allowed us to dramatically simplify and consumerize the use of PDEs.
\nIt’s all about providing symbolic “construction kits” for PDEs and their boundary conditions. We started this about five years ago, progressively covering more and more application areas. In Version 14 we’ve particularly focused on solid mechanics, fluid mechanics, electromagnetics and (one-particle) quantum mechanics.
\nHere’s an example from solid mechanics. First, we define the variables we’re dealing with (displacement and underlying coordinates):
\nNext, we specify the parameters we want to use to describe the solid material we’re going to work with:
\nNow we can actually set up our PDE—using symbolic PDE specifications like SolidMechanicsPDEComponent—here for the deformation of a solid object pulled on one side:
\nAnd, yes, “underneath”, these simple symbolic specifications turn into a complicated “raw” PDE:
\nNow we are ready to actually solve our PDE in a particular region, i.e. for an object with a particular shape:
\nAnd now we can visualize the result, which shows how our object stretches when it’s pulled on:
\nThe way we’ve set things up, the material for our object is an idealization of something like rubber. But in the Wolfram Language we now have ways to specify all sorts of detailed properties of materials. So, for example, we can add reinforcement as a unit vector in a particular direction (say in practice with fibers) to our material:
\nThen we can rerun what we did before
\nbut now we get a slightly different result:
\nAnother major PDE domain that’s new in Version 14.0 is fluid flow. Let’s do a 2D example. Our variables are 2D velocity and pressure:
\nNow we can set up our fluid system in a particular region, with no-slip conditions on all walls except at the top where we assume fluid is flowing from left to right. The only parameter needed is the Reynolds number. And instead of just solving our PDEs for a single Reynolds number, let’s create a parametric solver that can take any specified Reynolds number:
\nNow here’s the result for Reynolds number 100:
\nBut with the way we’ve set things up, we can as well generate a whole video as a function of Reynolds number (and, yes, the Parallelize speeds things up by generating different frames in parallel):
\nMuch of our work in PDEs involves catering to the complexities of real-world engineering situations. But in Version 14.0 we’re also adding features to support “pure physics”, and in particular to support quantum mechanics done with the Schrödinger equation. So here, for example, is the 2D 1-particle Schrödinger equation (with ):
Here’s the region we’re going to be solving over—showing explicit discretization:
\nNow we can solve the equation, adding in some boundary conditions:
\nAnd now we get to visualize a Gaussian wave packet scattering around a barrier:
\nSystems engineering is a big field, but it’s one where the structure and capabilities of the Wolfram Language provide unique advantages—that over the past decade have allowed us to build out rather complete industrial-strength support for modeling, analysis and control design for a wide range of types of systems. It’s all an integrated part of the Wolfram Language, accessible through the computational and interface structure of the language. But it’s also integrated with our separate Wolfram System Modeler product, that provides a GUI-based workflow for system modeling and exploration.
\nShared with System Modeler are large collections of domain-specific modeling libraries. And, for example, since Version 13, we’ve added libraries in areas such as battery engineering, hydraulic engineering and aircraft engineering—as well as educational libraries for mechanical engineering, thermal engineering, digital electronics, and biology. (We’ve also added libraries for areas such as business and public policy simulation.)
\nA typical workflow for systems engineering begins with the setting up of a model. The model can be built from scratch, or assembled from components in model libraries—either visually in Wolfram System Modeler, or programmatically in the Wolfram Language. For example, here’s a model of an electric motor that’s turning a load through a flexible shaft:
\nOnce one’s got a model, one can then simulate it. Here’s an example where we’ve set one parameter of our model (the moment of inertia of the load), and we’re computing the values of two others as a function of time:
\nA new capability in Version 14.0 is being able to see the effect of uncertainty in parameters (or initial values, etc.) on the behavior of a system. So here, as an example, we’re saying the value of the parameter is not definite, but is instead distributed according to a normal distribution—then we’re seeing the distribution of output results:
\nThe motor with flexible shaft that we’re looking at can be thought of as a “multidomain system”, combining electrical and mechanical components. But the Wolfram Language (and Wolfram System Modeler) can also handle “mixed systems”, combining analog and digital (i.e. continuous and discrete) components. Here’s a fairly sophisticated example from the world of control systems: a helicopter model connected in a closed loop to a digital control system:
\nThis whole model system can be represented symbolically just by:
\nAnd now we compute the input-output response of the model:
\nHere’s specifically the output response:
\nBut now we can “drill in” and see specific subsystem responses, here of the zero-order hold device (labeled ZOH above)—complete with its little digital steps:
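The "little digital steps" come from what a zero-order hold does: it samples a continuous signal at fixed intervals and holds each sample until the next one. Here is a minimal Python sketch of that behavior. It is an illustration only, with a hypothetical function name and a made-up test signal, and it is not the helicopter model from the text.

```python
import math

def zero_order_hold(signal, sample_period: float, t: float) -> float:
    """Value at time t of a zero-order hold: the signal at the most recent sampling instant."""
    last_sample_time = math.floor(t / sample_period) * sample_period
    return signal(last_sample_time)

def ramp_squared(t: float) -> float:
    """Made-up continuous test signal."""
    return t * t

held_early = zero_order_hold(ramp_squared, 0.5, 0.49)  # still holding the sample taken at t = 0
held_mid = zero_order_hold(ramp_squared, 0.5, 1.2)     # holding the sample taken at t = 1.0
held_late = zero_order_hold(ramp_squared, 0.5, 1.5)    # a new sample is taken exactly at t = 1.5
```

Between sampling instants the output stays flat, and it jumps only when a new sample is taken, which produces the staircase shape.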
\nBut what if we want to design the control systems ourselves? Well, in Version 14 we can now apply all our Wolfram Language control systems design functionality to arbitrary system models. Here’s an example of a simple model, in this case in chemical engineering (a continuously stirred tank):
\nNow we can take this model and design an LQG controller for it—then assemble a whole closed-loop system for it:
\nNow we can simulate the closed-loop system—and see that the controller succeeds in bringing the final value to 0:
\nGraphics have always been an important part of the story of the Wolfram Language, and for more than three decades we’ve been progressively enhancing and updating their appearance and functionality—sometimes with help from advances in hardware (e.g. GPU) capabilities.
\nSince Version 13 we’ve added a variety of “decorative” (or “annotative”) effects in 2D graphics. One example (useful for putting captions on things) is Haloing:
\nAnother example is DropShadowing:
\nAll of these are specified symbolically, and can be used throughout the system (e.g. in hover effects, etc). And, yes, there are many detailed parameters you can set:
\nA significant new capability in Version 14.0 is convenient texture mapping. We’ve had low-level polygon-by-polygon textures for a decade and a half. But now in Version 14.0 we’ve made it straightforward to map textures onto whole surfaces. Here’s an example wrapping a texture onto a sphere:
\nAnd here’s wrapping the same texture onto a more complicated surface:
\nA significant subtlety is that there are many ways to map what amount to “texture coordinate patches” onto surfaces. The documentation illustrates new, named cases:
\nAnd now here’s what happens with stereographic projection onto a sphere:
\nHere’s an example of “surface texture” for the planet Venus
\nand here it’s been mapped onto a sphere, which can be rotated:
\nHere’s a “flowerified” bunny:
\nThings like texture mapping help make graphics visually compelling. Since Version 13 we’ve also added a variety of “live visualization” capabilities that automatically “bring visualizations to life”. For example, any plot now by default has a “coordinate mouseover”:
\nAs usual, there are lots of ways to control such “highlighting” effects:
\nOne might say it’s been two thousand years in the making. But four years ago (Version 12) we began to introduce a computable version of Euclid-style synthetic geometry.
\nThe idea is to specify geometric scenes symbolically by giving a collection of (potentially implicit) constraints:
\nWe can then generate a random instance of geometry consistent with the constraints—and in Version 14 we’ve considerably enhanced our ability to make sure that geometry will be “typical” and non-degenerate:
\nBut now a new feature of Version 14 is that we can find values of geometric quantities that are determined by the constraints:
\nHere’s a slightly more complicated case:
\nAnd here we’re now solving for the areas of two triangles in the figure:
\nWe’ve always been able to give explicit styles for particular elements of a scene:
\nNow one of the new features in Version 14 is being able to give general “geometric styling rules”, here just assigning random colors to each element:
\nOur goal with Wolfram Language is to make it as easy as possible to express oneself computationally. And a big part of achieving that is the coherent design of the language itself. But there’s another part as well, which is being able to enter the Wolfram Language input one wants (say in a notebook) as easily as possible. And with every new version we make enhancements to this.
\nOne area that’s been in continuous development is interactive syntax highlighting. We first added syntax highlighting nearly two decades ago—and over time we’ve progressively made it more and more sophisticated, responding both as you type, and as code gets executed. Some highlighting has always had obvious meaning. But particularly highlighting that is dynamic and based on cursor position has sometimes been harder to interpret. And in Version 14—leveraging the brighter color palettes that have become the norm in recent years—we’ve tuned our dynamic highlighting so it’s easier to quickly tell “where you are” within the structure of an expression:
\nOn the subject of “knowing what one has”, another enhancement (added in Version 13.2) is differentiated frame coloring for different kinds of visual objects in notebooks. Is that thing one has a graphic? Or an image? Or a graph? Now one can tell from the color of the frame when one selects it:
\nAn important aspect of the Wolfram Language is that the names of built-in functions are spelled out enough that it’s easy to tell what they do. But often the names are therefore necessarily quite long, and so it’s important to be able to autocomplete them when one’s typing. In 13.3 we added the notion of “fuzzy autocompletion” that not only “completes to the end” a name one’s typing, but also can fill in intermediate letters, change capitalization, etc. Thus, for example, just typing lll brings up an autocompletion menu that begins with ListLogLogPlot:
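One simple way to think about this kind of matching is as a case-insensitive subsequence test: the typed letters must appear in the name in order, with anything in between. The Python sketch below illustrates only that idea. It is an assumption about the general technique, not a description of the actual autocompletion algorithm or its ranking, and `is_fuzzy_match` is a hypothetical name.

```python
def is_fuzzy_match(typed: str, name: str) -> bool:
    """True if the typed letters occur in order (case-insensitively) within name."""
    remaining = iter(name.lower())
    # "ch in remaining" consumes the iterator up to the match, so order is enforced.
    return all(ch in remaining for ch in typed.lower())

names = ["ListLogLogPlot", "ListPlot", "LogLogPlot", "Plot"]  # small sample of built-in names
matches = [n for n in names if is_fuzzy_match("lll", n)]
```

With this sample, `lll` matches ListLogLogPlot and LogLogPlot but not ListPlot or Plot. A real menu also has to rank the candidates, which this sketch does not attempt.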
\nA major user interface update that first appeared in Version 13.1 (and has been enhanced in subsequent versions) is a default toolbar for every notebook:
\nThe toolbar provides immediate access to evaluation controls, cell formatting and various kinds of input (like inline cells, hyperlinks, drawing canvas, etc.), as well as to things like cloud publishing, documentation search and “chat” (i.e. LLM) settings.
Much of the time, it’s useful to have the toolbar displayed in any notebook you’re working with. But on the left-hand side there’s a tiny control that lets you minimize the toolbar:
In 14.0 there’s a Preferences setting that makes the toolbar come up minimized in any new notebook you create—and this in effect gives you the best of both worlds: you have immediate access to the toolbar, but your notebooks don’t have anything “extra” that might distract from their content.
\nAnother thing that’s advanced since Version 13 is the handling of “summary” forms of output in notebooks. A basic example is what happens if you generate a very large result. By default only a summary of the result is actually displayed. But now there’s a bar at the bottom that gives various options for how to handle the actual output:
\nBy default, the output is only stored in your current kernel session. But by pressing the Iconize button you get an iconized form that will appear directly in your notebook (or one that can be copied anywhere) and that “has the whole output inside”. There’s also a Store full expression in notebook button, which will “invisibly” store the output expression “behind” the summary display.
\nIf the expression is stored in the notebook, then it’ll be persistent across kernel sessions. Otherwise, well, you won’t be able to get to it in a different kernel session; the only thing you’ll have is the summary display:
\nIt’s a similar story for large “computational objects”. Like here’s a Nearest function with a million data points:
\nBy default, the data is just something that exists in your current kernel session. But now there’s a menu that lets you save the data in various persistent locations:
\nThere are many ways to run the Wolfram Language. Even in Version 1.0 we had the notion of remote kernels: the notebook front end running on one machine (in those days essentially always a Mac, or a NeXT), and the kernel running on a different machine (in those days sometimes even connected by phone lines). But a decade ago came a major step forward: the Wolfram Cloud.
\nThere are really two distinct ways in which the cloud is used. The first is in delivering a notebook experience similar to our longtime desktop experience, but running purely in a browser. And the second is in delivering APIs and other programmatically accessed capabilities—notably, even at the beginning, a decade ago, through things like APIFunction.
\nThe Wolfram Cloud has been the target of intense development now for nearly 15 years. Alongside it have also come Wolfram Application Server and Wolfram Web Engine, which provide more streamlined support specifically for APIs (without things like user management, etc., but with things like clustering).
\nAll of these—but particularly the Wolfram Cloud—have become core technology capabilities for us, supporting many of our other activities. So, for example, the Wolfram Function Repository and Wolfram Paclet Repository are both based on the Wolfram Cloud (and in fact this is true of our whole resource system). And when we came to build the Wolfram plugin for ChatGPT earlier this year, using the Wolfram Cloud allowed us to have the plugin deployed within a matter of days.
\nSince Version 13 there have been quite a few very different applications of the Wolfram Cloud. One is for the function ARPublish, which takes 3D geometry and puts it in the Wolfram Cloud with appropriate metadata to allow phones to get augmented-reality versions from a QR code of a cloud URL:
\nOn the Cloud Notebook side, there’s been a steady increase in usage, notably of embedded Cloud Notebooks, which have for example become common on Wolfram Community, and are used all over the Wolfram Demonstrations Project. Our goal all along has been to make Cloud Notebooks be as easy to use as simple webpages, but to have the depth of capabilities that we’ve developed in notebooks over the past 35 years. We achieved this some years ago for fairly small notebooks, but in the past couple of years we’ve been going progressively further in handling even multi-hundred-megabyte notebooks. It’s a complicated story of caching, refreshing—and dodging the vicissitudes of web browsers. But at this point the vast majority of notebooks can be seamlessly deployed to the cloud, and will display as immediately as simple webpages.
\nIt’s been possible to call external code from Wolfram Language ever since Version 1.0. But in Version 14 there are important advances in the extent and ease with which external code can be integrated. The overall goal is to be able to use all the power and coherence of the Wolfram Language even when some part of a computation is done in external code. And in Version 14 we’ve done a lot to streamline and automate the process by which external code can be integrated into the language.
\nOnce something is integrated into the Wolfram Language it just becomes, for example, a function that can be used just like any other Wolfram Language function. But what’s underneath is necessarily quite different for different kinds of external code. There’s one setup for interpreted languages like Python. There’s another for C-like compiled languages and dynamic libraries. (And then there are others for external processes, APIs, and what amount to “importable code specifications”, say for neural networks.)
\nLet’s start with Python. We’ve had ExternalEvaluate for evaluating Python code since 2018. But when you actually come to use Python there are all these dependencies and libraries to deal with. And, yes, that’s one of the places where the incredible advantages of the Wolfram Language and its coherent design are painfully evident. But in Version 14.0 we now have a way to encapsulate all that Python complexity, so that we can deliver Python functionality within Wolfram Language, hiding all the messiness of Python dependencies, and even the versioning of Python itself.
\nAs an example, let’s say we want to make a Wolfram Language function Emojize that uses the Python function emojize within the emoji Python library. Here’s how we can do that:
\nAnd now you can just call Emojize in the Wolfram Language and—under the hood—it’ll run Python code:
\nThe way this works is that the first time you call Emojize, a Python environment with all the right features is created, then is cached for subsequent uses. And what’s important is that the Wolfram Language specification of Emojize is completely system independent (or as system independent as it can be, given vicissitudes of Python implementations). So that means that you can, for example, deploy Emojize in the Wolfram Function Repository just like you would deploy something written purely in Wolfram Language.
\nThere’s very different engineering involved in calling C-compatible functions in dynamic libraries. But in Version 13.3 we also made this very streamlined using the function ForeignFunctionLoad. There’s all sorts of complexity associated with converting to and from native C data types, managing memory for data structures, etc. But we’ve now got very clean ways to do this in Wolfram Language.
\nAs an example, here’s how one sets up a “foreign function” call to a function RAND_bytes in the OpenSSL library:
\nInside this, we’re using Wolfram Language compiler technology to specify the native C types that will be used in the foreign function. But now we can package this all up into a Wolfram Language function:
\nAnd we can call this function just like any other Wolfram Language function:
\nInternally, all sorts of complicated things are going on. For example, we’re allocating a raw memory buffer that’s then getting fed to our C function. But when we do that memory allocation we’re creating a symbolic structure that defines it as a “managed object”:
\nAnd now when this object is no longer being used, the memory associated with it will be automatically freed.
\nAnd, yes, with both Python and C there’s quite a bit of complexity underneath. But the good news is that in Version 14 we’ve basically been able to automate handling it. And the result is that what gets exposed is pure, simple Wolfram Language.
\nBut there’s another big piece to this. Within particular Python or C libraries there are often elaborate definitions of data structures that are specific to that library. And so to use these libraries one has to dive into all the—potentially idiosyncratic—complexities of those definitions. But in the Wolfram Language we have consistent symbolic representations for things, whether they’re images, or dates or types of chemicals. When you first hook up an external library you have to map its data structures to these. But once that’s done, anyone can use what’s been built, and seamlessly integrate with other things they’re doing, perhaps even calling other external code. In effect what’s happening is that one’s leveraging the whole design framework of the Wolfram Language, and applying that even when one’s using underlying implementations that aren’t based on the Wolfram Language.
\nA single line (or less) of Wolfram Language code can do a lot. But one of the remarkable things about the language is that it’s fundamentally scalable: good both for very short programs and very long programs. And since Version 13 there’ve been several advances in handling very long programs. One of them concerns “code editing”.
\nStandard Wolfram Notebooks work very well for exploratory, expository and many other forms of work. And it’s certainly possible to write large amounts of code in standard notebooks (and, for example, I personally do it). But when one’s doing “software-engineering-style work” it’s both more convenient and more familiar to use what amounts to a pure code editor, largely separate from code execution and exposition. And this is why we have the “package editor”, accessible from File > New > Package/Script. You’re still operating in the notebook environment, with all its sophisticated capabilities. But things have been “skinned” to provide a much more textual “code experience”—both in terms of editing, and in terms of what actually gets saved in .wl files.
\nHere’s a typical example of the package editor in action (in this case applied to our GitLink package):
\nSeveral things are immediately evident. First, it’s very line oriented. Lines (of code) are numbered, and don’t break except at explicit newlines. There are headings just like in ordinary notebooks, but when the file is saved, they’re stored as comments with a certain stylized structure:
\n
It’s still perfectly possible to run code in the package editor, but the output won’t get saved in the .wl file:
\nOne thing that’s changed since Version 13 is that the toolbar is much enhanced. And for example there’s now “smart search” that is aware of code structure:
\nYou can also ask to go to a line number—and you’ll immediately see whatever lines of code are nearby:
\nIn addition to code editing, another set of features new since Version 13 of importance to serious developers concerns automated testing. The main advance is the introduction of a fully symbolic testing framework, in which individual tests are represented as symbolic objects
\nand can be manipulated in symbolic form, then run using functions like TestEvaluate and TestReport:
\nIn Version 14.0 there’s another new testing function—IntermediateTest—that lets you insert what amount to checkpoints inside larger tests:
\nEvaluating this test, we see that the intermediate tests were also run:
\nThe Wolfram Function Repository has been a big success. We introduced it in 2019 as a way to make specific, individual contributed functions available in the Wolfram Language. And now there are more than 2900 such functions in the Repository.
\nThe nearly 7000 functions that constitute the Wolfram Language as it is today have been painstakingly developed over the past three and a half decades, always mindful of creating a coherent whole with consistent design principles. And now in a sense the success of the Function Repository is one of the dividends of all that effort. Because it’s the coherence and consistency of the underlying language and its design principles that make it feasible to just add one function at a time, and have it really work. You want to add a function to do some very specific operation that combines images and graphs. Well, there’s a consistent representation of both images and graphs in the Wolfram Language, which you can leverage. And by following the principles of the Wolfram Language—like for the naming of functions—you can create a function that’ll be easy for Wolfram Language users to understand and use.
\nUsing the Wolfram Function Repository is a remarkably seamless process. If you know the function’s name, you can just call it using ResourceFunction; the function will be loaded if it’s needed, and then it’ll just run:
\nIf there’s an update available for the function, it’ll give you a message, but run the old version anyway. The message has a button that lets you load in the update; then you can rerun your input and use the new version. (If you’re writing code where you want to “burn in” a particular version of a function, you can just use the ResourceVersion option of ResourceFunction.)
\nIf you want your code to look more elegant, just evaluate the ResourceFunction object
\nand use the formatted version:
\nAnd, by the way, pressing the + then gives you more information about the function:
\n
An important feature of functions in the Function Repository is that they all have documentation pages—that are organized pretty much like the pages for built-in functions:
\nBut how does one create a Function Repository entry? Just go to File > New > Repository Item > Function Repository Item and you’ll get a Definition Notebook:
\nWe’ve optimized this to be as easy to fill in as possible, minimizing boilerplate and automatically checking for correctness and consistency whenever possible. And the result is that it’s perfectly realistic to create a simple Function Repository item in under an hour—with the main time spent being in the writing of good expository examples.
\nWhen you press Submit to Repository your function gets sent to the Wolfram Function Repository review team, whose mandate is to ensure that functions in the repository do what they say they do, work in a way that is consistent with general Wolfram Language design principles, have good names, and are adequately documented. Except for very specialized functions, the goal is to finish reviews within a week (and sometimes considerably sooner)—and to publish functions as soon as they are ready.
\nThere’s a digest of new (and updated) functions in the Function Repository that gets sent out every Friday—and makes for interesting reading (you can subscribe here):
\n
The Wolfram Function Repository is a curated public resource that can be accessed from any Wolfram Language system (and, by the way, the source code for every function is available—just press the Source Notebook button). But there’s another important use case for the infrastructure of the Function Repository: privately deployed “resource functions”.
\nIt all works through the Wolfram Cloud. You use the exact same Definition Notebook, but now instead of submitting to the public Wolfram Function Repository, you just deploy your function to the Wolfram Cloud. You can make it private so that only you, or some specific group, can access it. Or you can make it public, so anyone who knows its URL can immediately access and use it in their Wolfram Language system.
\nThis turns out to be a tremendously useful mechanism, both for group projects, and for creating published material. In a sense it’s a very lightweight but robust way to distribute code—packaged into functions that can immediately be used. (By the way, to find the functions you’ve published from your Wolfram Cloud account, just go to the DeployedResources folder in the cloud file browser.)
\n(For organizations that want to manage their own function repository, it’s worth mentioning that the whole Wolfram Function Repository mechanism—including the infrastructure for doing reviews, etc.—is also available in a private form through the Wolfram Enterprise Private Cloud.)
\nSo what’s in the public Wolfram Function Repository? There are a lot of “specialty functions” intended for specific “niche” purposes—but very useful if they’re what you want:
\nThere are functions that add various kinds of visualizations:
\nSome functions set up user interfaces:
\nSome functions link to external services:
\nSome functions provide simple utilities:
\nThere are also functions that are being explored for potential inclusion in the core system:
\nThere are also lots of “leading-edge” functions, added as part of research or exploratory development. And for example in pieces I write (including this one), I make a point of having all pictures and other output be backed by “click-to-copy” code that reproduces them—and this code quite often contains functions either from the public Wolfram Function Repository or from (publicly accessible) private deployments.
\nPaclets are a technology we’ve used for more than a decade and a half to distribute updated functionality to Wolfram Language systems in the field. In Version 13 we began the process of providing tools for anyone to create paclets. And since Version 13 we’ve introduced the Wolfram Language Paclet Repository as a centralized repository for paclets:
\n\nWhat is a paclet? It’s a collection of Wolfram Language functionality—including function definitions, documentation, external libraries, stylesheets, palettes and more—that can be distributed as a unit, and immediately deployed in any Wolfram Language system.
\nThe Paclet Repository is a centralized place where anyone can publish paclets for public distribution. So how does this relate to the Wolfram Function Repository? They are interestingly complementary—with different optimization and different setups. The Function Repository is more lightweight, the Paclet Repository more flexible. The Function Repository is for making available individual new functions, that independently fit into the whole existing structure of the Wolfram Language. The Paclet Repository is for making available larger-scale pieces of functionality, that can define a whole framework and environment of their own.
\nThe Function Repository is also fully curated, with every function being reviewed by our team before it is posted. The Paclet Repository is an immediate-deployment system, without pre-publication review. In the Function Repository every function is specified just by its name—and our review team is responsible for ensuring that names are well chosen and have no conflicts. In the Paclet Repository, every contributor gets their own namespace, and all their functions and other material live inside that namespace. So, for example, I contributed the function RandomHypergraph to the Function Repository, which can be accessed just as ResourceFunction[\"RandomHypergraph\"]. But if I had put this function in a paclet in the Paclet Repository, it would have to be accessed as something like PacletSymbol[\"StephenWolfram/Hypergraphs\", \"RandomHypergraph\"].
\nPacletSymbol, by the way, is a convenient way of “deep accessing” individual functions inside a paclet. PacletSymbol temporarily installs (and loads) a paclet so that you can access a particular symbol in it. But more often one wants to permanently install a paclet (using PacletInstall), then explicitly load its contents (using Needs) whenever one wants to have its symbols available. (All the various ancillary elements, like documentation, stylesheets, etc. in a paclet get set up when it is installed.)
\nWhat does a paclet look like in the Paclet Repository? Every paclet has a home page that typically includes an overall summary, a guide to the functions in the paclet, and some overall examples of the paclet:
\nIndividual functions typically have their own documentation pages:
\n
Just like in the main Wolfram Language documentation, there can be a whole hierarchy of guide pages, and there can be things like tutorials.
\nNotice that in examples in paclet documentation, one often sees constructs like . These represent symbols in the paclet, presented in forms like PacletSymbol[\"WolframChemistry/ProteinVisualization\", \"AmidePlanePlot\"] that allow these symbols to be accessed in a “standalone” way. If you directly evaluate such a form, by the way, it’ll force (temporary) installation of the paclet, then return the actual, raw symbol that appears in the paclet:
So how does one create a paclet suitable for submission to the Paclet Repository? You can do it purely programmatically, or you can start from File > New > Repository Item > Paclet Repository Item, which launches what amounts to a whole paclet creation IDE. The first step is to specify where you want to assemble your paclet. You give some basic information
\nthen a Paclet Resource Definition Notebook is created, from which you can give function definitions, set up documentation pages, specify what you want your paclet’s home page to be like, etc.:
\nThere are lots of sophisticated tools that let you create full-featured paclets with the same kind of breadth and depth of capabilities that you find in the Wolfram Language itself. For example, Documentation Tools lets you construct full-featured documentation pages (function pages, guide pages, tutorials, …):
\nOnce you’ve assembled a paclet, you can check it, build it, deploy it privately—or submit it to the Paclet Repository. And once you submit it, it will automatically get set up on the Paclet Repository servers, and within just a few minutes the pages you’ve created describing your paclet will show up on the Paclet Repository website.
\nSo what’s in the Paclet Repository so far? There’s a lot of good and very serious stuff, contributed both by teams at our company and by members of the broader Wolfram Language community. In fact, many of the 134 paclets now in the Paclet Repository have enough in them that there’s a whole piece like this that one could write about them.
\nOne category of things you’ll find in the Paclet Repository are snapshots of our ongoing internal development projects—many of which will eventually become built-in parts of the Wolfram Language. A good example of this is our LLM and Chat Notebook functionality, whose rapid development and deployment over the past year was made possible by the use of the Paclet Repository. Another example, representing ongoing work from our chemistry team (AKA WolframChemistry in the Paclet Repository) is the ChemistryFunctions paclet, which contains functions like:
\nAnd, yes, this is interactive:
\nOr, also from WolframChemistry:
\nAnother “development snapshot” is DiffTools—a paclet for making and viewing diffs between strings, cells, notebooks, etc.:
\nA major paclet is QuantumFramework—which provides the functionality for our Wolfram Quantum Framework
\n\nand delivers broad support for quantum computing (with at least a few connections to multiway systems and our Physics Project):
\nTalking of our Physics Project, there are over 200 functions supporting it that are in the Wolfram Function Repository. But there are also paclets, like WolframInstitute/Hypergraph:
\nAn example of an externally contributed package is Automata—with more than 250 functions for doing computations related to finite automata:
\nAnother contributed paclet is FunctionalParsers, which goes from a symbolic parser specification to an actual parser, here being used in a reverse mode to generate random “sentences”:
\nPhi4Tools is a more specialized paclet, for working with Feynman diagrams in field theory:
And, as another example, here’s MaXrd, for crystallography and x-ray scattering:
\nAs just one more example, there’s the Organizer paclet—a utility paclet for making and manipulating organizer notebooks. But unlike the other paclets we’ve seen here, it doesn’t expose any Wolfram Language functions; instead, when you install it, it puts a palette in your Palettes list:
\nAs of today, Version 14 is finished, and out in the world. So what’s next? We have lots of projects underway—some already with years of development behind them. Some extend and strengthen what’s already in the Wolfram Language; some take it in new directions.
\nOne major focus is broadening and streamlining the deployment of the language: unifying the way it’s delivered and installed on computers, packaging it so it can be efficiently integrated into other standalone applications, etc.
\nAnother major focus is expanding the handling of very large amounts of data by the Wolfram Language—and seamlessly integrating out-of-core and lazy processing.
\nThen of course there’s algorithmic development. Some is “classical”, directly building on the towers of functionality we’ve developed over the decades. Some is more “AI based”. We’ve been creating heuristic algorithms and meta-algorithms ever since Version 1.0—increasingly using methods from machine learning. How far will neural net methods go? We don’t know yet. We’re routinely using them in things like algorithm selection. But to what extent can they help in the heart of algorithms?
\nI’m reminded of something we did back in 1987 in developing Version 1.0. There was a long tradition in numerical analysis of painstakingly deriving series approximations for particular cases of mathematical functions. But we wanted to be able to compute hundreds of different functions to arbitrary precision for any complex values of their arguments. So how did we do it? We generalized from series to rational approximations—and then, in a very “machine-learning-esque” way—we spent months of CPU time systematically optimizing these approximations. Well, we’ve been trying to do the same kind of thing again—though now over more ambitious domains—and now using not rational functions but large neural nets as our basis.
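As a rough illustration of why moving from series to rational approximations can pay off, here is a minimal sketch comparing a degree-4 Taylor polynomial of exp with the standard [2/2] Pade approximant, which uses the same number of matched coefficients. This is not the procedure used for Version 1.0; the function names are hypothetical and chosen only for this example.

```python
import math

def taylor_exp(x):
    # Degree-4 Taylor polynomial of exp(x) around 0.
    return 1 + x + x**2 / 2 + x**3 / 6 + x**4 / 24

def pade_exp(x):
    # Standard [2/2] Pade approximant of exp(x); it agrees with the
    # Taylor series through x**4 but is a ratio of two quadratics.
    numerator = 1 + x / 2 + x**2 / 12
    denominator = 1 - x / 2 + x**2 / 12
    return numerator / denominator

def errors(x):
    # Absolute errors of (Taylor, Pade) measured against math.exp.
    exact = math.exp(x)
    return abs(taylor_exp(x) - exact), abs(pade_exp(x) - exact)
```

At x = 1 the rational form is off by roughly 0.004 against roughly 0.010 for the polynomial, and the gap is wider at x = -1. The advantage is not universal (the two happen to coincide at x = 2), which is part of why systematic optimization over many cases is needed.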
\nWe’ve also been exploring using neural nets to “control” precise algorithms, in effect making heuristic choices which either guide or can be validated by the precise algorithms. So far, none of what we’ve produced has outperformed our existing methods, but it seems plausible that fairly soon it will.
\nWe’re doing a lot with various aspects of metaprogramming. There’s the project of getting LLMs to help in the construction of Wolfram Language code, in giving comments on it, and in analyzing what went wrong if the code didn’t do what one expected. Then there’s code annotation, where LLMs may help in doing things like predicting the most likely type for something. And there’s code compilation. We’ve been working for many years on a full-scale compiler for the Wolfram Language, and in every version what we have becomes progressively more capable. We’ve been doing some level of automatic compilation in particular cases (particularly ones involving numerical computation) for more than 30 years. And eventually full-scale automatic compilation will be possible for everything. But as of now some of the biggest payoffs from our compiler technology have been for our internal development, where we can now get optimal down-to-the-metal performance simply by compiling (albeit carefully written) Wolfram Language code.
One of the big lessons of the surprising success of LLMs is that there’s potentially more structure in meaningful human language than we thought. I’ve long been interested in creating what I’ve called a “symbolic discourse language” that gives a computational representation of everyday discourse. The LLMs haven’t explicitly done that. But they encourage the idea that it should be possible, and they also provide practical help in doing it. And whether the goal is to be able to represent narrative text, or contracts, or textual specifications, it’s a matter of extending the computational language we’ve built to encompass more kinds of concepts and structures.
\nThere are typically several kinds of drivers for our continued development efforts. Sometimes it’s a question of continuing to build a tower of capabilities in some known direction (like, for example, solving PDEs). Sometimes the tower we’ve built suddenly lets us see new possibilities. Sometimes when we actually use what we’ve built we realize there’s an obvious way to polish or extend it—or to “double down” on something that we can now see is valuable. And then there are cases where things happening in the technology world suddenly open up new possibilities—like LLMs have recently done, and perhaps XR will eventually do. And finally there are cases where new science-related insights suggest new directions.
\nI had assumed that our Physics Project would at best have practical applications only centuries hence. But in fact it’s become clear that the correspondence it’s defined between physics and computation gives us quite immediate new ways to think about aspects of practical computation. And indeed we’re now actively exploring how to use this to define a new level of parallel and distributed computation in the Wolfram Language, as well as to represent symbolically not only the results of computations but also the ongoing process of computation.
\nOne might think that after nearly four decades of intense development there wouldn’t be anything left to do in developing the Wolfram Language. But in fact at every level we reach, there’s ever more that becomes possible, and ever more that we can see might be possible. And indeed this moment is a particularly fertile one, with an unprecedentedly broad waterfront of possibilities. Version 14 is an important and satisfying waypoint. But there are wonderful things ahead, as we continue our long-term mission to make the computational paradigm achieve its potential, and to build our computational language to help that happen.
\n\n\n\n
\n", - "category": "Big Picture", - "link": "https://writings.stephenwolfram.com/2024/01/the-story-continues-announcing-version-14-of-wolfram-language-and-mathematica/", + "title": "What If We Had Bigger Brains? Imagining Minds beyond Ours", + "description": "We humans have perhaps 100 billion neurons in our brains. But what if we had many more? Or what if the AIs we built effectively had many more? What kinds of things might then become possible? At 100 billion neurons, we know, for example, that compositional language of the kind we humans use is possible. At the 100 million or so neurons of a cat, it doesn’t seem to be. But what would become possible with 100 trillion neurons? And is it even something we could imagine understanding?
\nMy purpose here is to start exploring such questions, informed by what we’ve seen in recent years in neural nets and LLMs, as well as by what we now know about the fundamental nature of computation, and about neuroscience and the operation of actual brains (like the one that’s writing this, imaged here):
\nOne suggestive point is that as artificial neural nets have gotten bigger, they seem to have successively passed a sequence of thresholds in capability:
\nSo what’s next? No doubt there’ll be things like humanoid robotic control that have close analogs in what we humans already do. But what if we go far beyond the ~10^14 connections that our human brains have? What qualitatively new kinds of capabilities might there then be?
\nIf this were about “computation in general” then there wouldn’t really be much to talk about. The Principle of Computational Equivalence implies that beyond some low threshold computational systems can generically produce behavior that corresponds to computation that’s as sophisticated as it can ever be. And indeed that’s the kind of thing we see both in lots of abstract settings, and in the natural world.
\nBut the point here is that we’re not dealing with “computation in general”. We’re dealing with the kinds of computations that brains fundamentally do. And the essence of these seems to have to do with taking in large amounts of sensory data and then coming up with what amount to decisions about what to do next.
\nIt’s not obvious that there’d be any reasonable way to do this. The world at large is full of computational irreducibility—where the only general way to work out what will happen in a system is just to run the underlying rules for that system step by step and see what comes out:
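To make the idea of running the underlying rules step by step concrete, here is a minimal sketch of an elementary cellular automaton, using rule 30 as the example system. The choice of rule, the fixed zero boundaries and the function names are assumptions made for this illustration, not something taken from the text.

```python
# Rule 30: each new cell depends on its left, centre and right neighbours.
RULE = 30

def step(cells):
    # cells is a list of 0/1 values; cells beyond the edges are treated as 0
    # (an assumption made to keep the sketch finite).
    padded = [0] + cells + [0]
    out = []
    for i in range(1, len(padded) - 1):
        neighbourhood = (padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1]
        # Bit number `neighbourhood` of the rule number gives the new cell.
        out.append((RULE >> neighbourhood) & 1)
    return out

def run(width, steps):
    # Start from a single 1 in the middle and apply the rule repeatedly.
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(steps):
        cells = step(cells)
        history.append(cells)
    return history
```

Calling run(7, 2) yields the rows 0001000, 0011100 and 0110010. The sketch reaches row n only by computing every earlier row in turn, which is the step-by-step character described above.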
\nAnd, yes, there are plenty of questions and issues for which there’s essentially no choice but to do this irreducible computation—just as there are plenty of cases where LLMs need to call on our Wolfram Language computation system to get computations done. But brains, for the things most important to them, somehow seem to routinely manage to “jump ahead” without in effect simulating every detail. And what makes this possible is the fundamental fact that within any system that shows overall computational irreducibility there must inevitably be an infinite number of “pockets of computational reducibility”, in effect associated with “simplifying features” of the behavior of the system.
\nIt’s these “pockets of reducibility” that brains exploit to be able to successfully “navigate” the world for their purposes in spite of its “background” of computational irreducibility. And in these terms things like the progress of science (and technology) can basically be thought of as the identification of progressively more pockets of computational reducibility. And we can then imagine that the capabilities of bigger brains could revolve around being able to “hold in mind” more of these pockets of computational reducibility.
\nWe can think of brains as fundamentally serving to “compress” the complexity of the world, and extract from it just certain features—associated with pockets of reducibility—that we care about. And for us a key manifestation of this is the idea of concepts, and of language that uses them. At the level of raw sensory input we might see many detailed images of some category of thing—but language lets us describe them all just in terms of one particular symbolic concept (say “rock”).
\nIn a rough first approximation, we can imagine that there’s a direct correspondence between concepts and words in our language. And it’s then notable that human languages all tend to have perhaps 30,000 common words (or word-like constructs). So is that scale the result of the size of our brains? And could bigger brains perhaps deal with many more words, say millions or more?
\n“What could all those words be about?” we might ask. After all, our everyday experience makes it seem like our current 30,000 words are quite sufficient to describe the world as it is. But in some sense this is circular: we’ve invented the words we have because they’re what we need to describe the aspects of the world we care about, and want to talk about. There will always be more features of, say, the natural world that we could talk about. It’s just that we haven’t chosen to engage with them. (For example, we could perfectly well invent words for all the detailed patterns of clouds in the sky, but those patterns are not something we currently feel the need to talk in detail about.)
\nBut given our current set of words or concepts, is there “closure” to it? Can we successfully operate in a “self-consistent slice of concept space” or will we always find ourselves needing new concepts? We might think of new concepts as being associated with intellectual progress that we choose to pursue or not. But insofar as the “operation of the world” is computationally irreducible it’s basically inevitable that we’ll eventually be confronted with things that cannot be described by our current concepts.
\nSo why is it that the number of concepts (or words) isn’t just always increasing? A fundamental reason is abstraction. Abstraction takes collections of potentially large numbers of specific things (“tiger”, “lion”, …) and allows them to be described “abstractly” in terms of a more general thing (say, “big cats”). And abstraction is useful if it’s possible to make collective statements about those general things (“all big cats have…”), in effect providing a consistent “higher-level” way of thinking about things.
\nIf we imagine concepts as being associated with particular pockets of reducibility, the phenomenon of abstraction is then a reflection of the existence of networks of these pockets. And, yes, such networks can themselves show computational irreducibility, which can then have its own pockets of reducibility, etc.
\nSo what about (artificial) neural nets? It’s routine to “look inside” these, and for example see the possible patterns of activation at a given layer based on a range of possible (“real-world”) inputs. We can then think of these patterns of activation as forming points in a “feature space”. And typically we’ll be able to see clusters of these points, which we can potentially identify as “emergent concepts” that we can view as having been “discovered” by the neural net (or rather, its training). Normally there won’t be existing words in human languages that correspond to most of these concepts. They represent pockets of reducibility, but not ones that we’ve identified and captured in our typical 30,000 or so words. And, yes, even in today’s neural nets, there can easily be millions of “emergent concepts”.
\nBut will these be useful abstractions or concepts, or merely “incidental examples of compression” not connected to anything else? The construction of neural nets implies that a pattern of “emergent concepts” at one layer will necessarily feed into the next layer. But the question is really whether the concept can somehow be useful “independently”—not just at this particular place in the neural net.
\nAnd indeed the most obvious everyday use for words and concepts—and language in general—is for communication: for “transferring thoughts” from one mind to another. Within a brain (or a neural net) there are all kinds of complicated patterns of activity, different in each brain (or each neural net). But a fundamental role that concepts, words and language play is to define a way to “package up” certain features of that activity in a form that can be robustly transported between minds, somehow inducing “comparable thoughts” in all of them.
\nThe transfer from one mind to another can never be precise: in going from the pattern of activity in one brain (or neural net) to the pattern of activity in another, there’ll always be translation involved. But—at least up to a point—one can expect that the “more that’s said” the more faithful a translation can be.
\nBut what if there’s a bigger brain, with more “emergent concepts” inside? Then to communicate about them at a certain level of precision we might need to use more words—if not a fundamentally richer form of language. And, yes, while dogs seem to understand isolated words (“sit”, “fetch”, …), we, with our larger brains, can deal with compositional language in which we can in effect construct an infinite range of meanings by combining words into phrases, sentences, etc.
\nAt least as we currently imagine it, language defines a certain model of the world, based on some finite collection of primitives (words, concepts, etc.). The existence of computational irreducibility tells us that such a model can never be complete. Instead, the model has to “approximate things” based on the “network of pockets of reducibility” that the primitives in the language effectively define. And insofar as a bigger brain might in essence be able to make use of a larger network of pockets of reducibility, it can then potentially support a more precise model of the world.
\nAnd it could then be that if we look at such a brain and what it does, it will inevitably seem closer to the kind of “incomprehensible and irreducible computation” that’s characteristic of so many abstract systems, and systems in nature. But it could also be that in being a “brain-like construct” it’d necessarily tap into computational reducibility in such a way that—with the formalism and abstraction we’ve built—we’d still meaningfully be able to talk about what it can do.
\nAt the outset we might have thought any attempt for us to “understand minds beyond ours” would be like asking a cat to understand algebra. But somehow the universality of the concepts of computation that we now know—with their ability to address the deepest foundations of physics and other fields—makes it seem more plausible we might now be in a position to meaningfully discuss minds beyond ours. Or at least to discuss the rather more concrete question of what brains like ours, but bigger than ours, might be able to do.
\nAs we’ve mentioned, at least in a rough approximation, the role of brains is to turn large amounts of sensory input into small numbers of decisions about what to do. But how does this happen?
\nHuman brains continually receive input from a few million “sensors”, mostly associated with photoreceptors in our eyes and touch receptors in our skin. This input is processed by a total of about 100 billion neurons, each responding in a few milliseconds, and mostly organized into a handful of layers. There are altogether perhaps 100 trillion connections between neurons, many quite long range. At any given moment, a few percent of neurons (i.e. perhaps a billion) are firing. But in the end, all that activity seems to feed into particular structures in the lower part of the brain that in effect “take a majority vote” a few times a second to determine what to do next—in particular with the few hundred “actuators” our bodies have.
\nThis basic picture seems to be more or less the same in all higher animals. The total number of neurons scales roughly with the number of “input sensors” (or, in a first approximation, the surface area of the animal—i.e. volume^(2/3)—which determines the number of touch sensors). The fraction of brain volume that consists of connections (“white matter”) as opposed to main parts of neurons (“gray matter”) increases as a power of the number of neurons. The largest brains—like ours—have a roughly nested pattern of folds that presumably reduce average connection lengths. Different parts of our brains have characteristic functions (e.g. motor control, handling input from our eyes, generation of language, etc.), although there seems to be enough universality that other parts can usually learn to take over if necessary. And in terms of overall performance, animals with smaller brains generally seem to react more quickly to stimuli.
\nSo what was it that made brains originally arise in biological evolution? Perhaps it had to do with giving animals a way to decide where to go next as they moved around. (Plants, which don’t move around, don’t have brains.) And perhaps it’s because animals can’t “go in more than one direction at once” that brains seem to have the fundamental feature of generating a single stream of decisions. And, yes, this is probably why we have a single thread of “conscious experience”, rather than a whole collection of experiences associated with the activities of all our neurons. And no doubt it’s also what we leverage in the construction of language—and in communicating through a one-dimensional sequence of tokens.
\nIt’s notable how similar our description of brains is to the basic operation of large language models: an LLM processes input from its “context window” by feeding it through large numbers of artificial neurons organized in layers—ultimately taking something like a majority vote to decide what token to generate next. There are differences, however, most notably that whereas brains routinely intersperse learning and thinking, current LLMs separate training from operation, in effect “learning first” and “thinking later”.
\nBut almost certainly the core capabilities of both brains and neural nets don’t depend much on the details of their biological or architectural structure. It matters that there are many inputs and few outputs. It matters that there’s irreducible computation inside. It matters that the systems are trained on the world as it is. And, finally, it matters how “big” they are, in effect relative to the “number of relevant features of the world”.
\nIn artificial neural nets, and presumably also in brains, memory is encoded in the strengths (or “weights”) of connections between neurons. And at least in neural nets it seems that the number of tokens (of textual data) that can reasonably be “remembered” is a few times the number of weights. (With current methods, the number of computational operations of training needed to achieve this is roughly the product of the total number of weights and the total number of tokens.) If there are too few weights, what happens is that the “memory” gets fuzzy, with details of the fuzziness reflecting details of the structure of the network.
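To make these rough proportionalities concrete, here is a minimal Python sketch. The function names and the factor of 3 standing in for “a few times” are illustrative assumptions, not figures from any particular model; only the two proportionalities themselves come from the paragraph above.

```python
# Back-of-envelope estimates for the two rough scaling statements:
#   memorizable tokens  ~ (a few) x (number of weights)
#   training operations ~ (number of weights) x (number of tokens)
# TOKENS_PER_WEIGHT = 3 is an assumed stand-in for "a few times".

TOKENS_PER_WEIGHT = 3  # hypothetical constant, chosen only for illustration


def memorizable_tokens(num_weights, tokens_per_weight=TOKENS_PER_WEIGHT):
    """Rough number of training tokens a net of this size could 'remember'."""
    return num_weights * tokens_per_weight


def training_operations(num_weights, num_tokens):
    """Rough operation count for training: product of weights and tokens."""
    return num_weights * num_tokens


if __name__ == "__main__":
    weights = 10**9  # a hypothetical net with a billion weights
    tokens = memorizable_tokens(weights)
    ops = training_operations(weights, tokens)
    print(f"weights: {weights:.1e}")
    print(f"tokens 'remembered': {tokens:.1e}")
    print(f"training operations: {ops:.1e}")
```

Under these assumptions the operation count grows quadratically with the number of weights whenever the token count is scaled up in proportion to the weights.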
\nBut what’s crucial—for both neural nets and brains—is not so much to remember specifics of training data, but rather to just “do something reasonable” for a wide range of inputs, regardless of whether they’re in the training data. Or, in other words, to generalize appropriately from training data.
\nBut what is “appropriate generalization”? As a practical matter, it tends to be “generalization that aligns with what we humans would do”. And it’s then a remarkable fact that artificial neural nets with fairly simple architectures can successfully do generalizations in a way that’s roughly aligned with human brains. So why does this work? Presumably it’s because there are universal features of “brain-like systems” that are close enough between human brains and neural nets. And once again it’s important to emphasize that what’s happening in both cases seems distinctly weaker than “general computation”.
\nA feature of “general computation” is that it can potentially involve unbounded amounts of time and storage space. But both brains and typical neural nets have just a fixed number of neurons. And although both brains and LLMs in effect have an “outer loop” that can “recycle” output to input, it’s limited.
\nAnd at least when it comes to brains, a key feature associated with this is the limit on “working memory”, i.e. memory that can readily be both read and written “in the course of a computation”. Bigger and more developed brains typically seem to support larger amounts of working memory. Adult humans can remember perhaps 5 or 7 “chunks” of data in working memory; for young children, and other animals, it’s less. Size of working memory (as we’ll discuss later) seems to be important in things like language capabilities. And the fact that it’s limited is no doubt one reason we can’t generally “run code in our brains”.
\nAs we try to reflect on what our brains do, we’re most aware of our stream of conscious thought. But that represents just a tiny fraction of all our neural activity. Most of the activity is much less like “thought” and much more like typical processes in nature, with lots of elements seemingly “doing their own thing”. We might think of this as an “ocean of unconscious neural activity”, from which a “thread of consensus thought” is derived. Usually—much like in an artificial neural net—it’s difficult to find much regularity in that “unconscious activity”. Though when one trains oneself enough to get to the point of being able to “do something without thinking about it”, that presumably happens by organizing some part of that activity.
\nThere’s always a question of what kinds of things we can learn. We can’t overcome computational irreducibility. But how broadly can we handle what’s computationally reducible? Artificial neural nets show a certain genericity in their operation: although some specific architectures are more efficient than others, it doesn’t seem to matter much whether the input they’re fed is images or text or numbers, or whatever. And for our brains it’s probably the same—though what we’ve normally experienced, and learned from, are the specific kinds of input that come from our eyes, ears, etc. And from these, we’ve ended up recognizing certain types of regularities—that we’ve then used to guide our actions, set up our environment, etc.
\nAnd, yes, this plugs into certain pockets of computational reducibility in the world. But there’s always further one could go. And how that might work with brains bigger than ours is at the core of what we’re trying to discuss here.
\nAt some level we can view our brains as serving to take the complexity of the world and extract from it a compressed representation that our finite minds can handle. But what is the structure of that representation? A central aspect of it is that it ignores many details of the original input (like particular configurations of pixels). Or, in other words, it effectively equivalences many different inputs together.
\nBut how then do we describe that equivalence class? Implementationally, say in a neural net, the equivalence class might correspond to an attractor to which many different initial conditions all evolve. In terms of the detailed pattern of activity in the neural net the attractor will typically be very hard to describe. But on a larger scale we can potentially just think of it as some kind of robust construct that represents a class of things—or what in terms of our process of thought we might describe as a “concept”.
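One way to see an attractor acting as an equivalence class is with a tiny Hopfield-style network, sketched below in plain Python. This is a swapped-in textbook construction chosen for illustration, not something the essay specifies: the two stored patterns, the Hebbian weight rule and the synchronous update are all assumptions. Each stored pattern plays the role of a “concept”, and every input that differs from it by one flipped unit settles onto the same pattern.

```python
# Minimal Hopfield-style sketch: several different inputs evolve to the same
# attractor, which acts as a robust representative of a class of inputs.
# PATTERNS and the update rule are illustrative assumptions.

PATTERNS = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]


def train(patterns):
    """Hebbian weights: sum of outer products of patterns, zero diagonal."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def step(weights, state):
    """Synchronously update every unit to the sign of its input field."""
    new_state = []
    for i, row in enumerate(weights):
        field = sum(w * s for w, s in zip(row, state))
        if field > 0:
            new_state.append(1)
        elif field < 0:
            new_state.append(-1)
        else:
            new_state.append(state[i])  # leave the unit unchanged on a tie
    return new_state


def settle(weights, state, max_steps=20):
    """Iterate until the state stops changing (or max_steps is reached)."""
    state = list(state)
    for _ in range(max_steps):
        nxt = step(weights, state)
        if nxt == state:
            return nxt
        state = nxt
    return state


def flip(pattern, index):
    """Return a copy of pattern with one unit's sign flipped."""
    noisy = list(pattern)
    noisy[index] = -noisy[index]
    return noisy


if __name__ == "__main__":
    w = train(PATTERNS)
    for k in range(len(PATTERNS[0])):
        print(flip(PATTERNS[0], k), "->", settle(w, flip(PATTERNS[0], k)))
```

The detailed weights are not especially illuminating to look at, but the attractors themselves are simple, which is the sense in which the larger-scale description is easier than the low-level one.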
\nAt the lowest level there’s all sorts of complicated neural activity in our brains—most of it mired in computational irreducibility. But the “thin thread of conscious experience” that we extract from this we can for many purposes treat as being made up of higher-level “units of thought”, or essentially “discrete concepts”.
\nAnd, yes, it’s certainly our typical human experience that robust constructs—and particularly ones from which other constructs can be built—will be discrete. In principle one can imagine that there could be things like “robust continuous spaces of concepts” (“cat and dog and everything in between”). But we don’t have anything like the computational paradigm that shows us a consistent universal way that such things could fit together (there’s no robust analog of computation theory for real numbers, for example). And somehow the success of the computational paradigm—potentially all the way down to the foundations of the physical universe—doesn’t seem to leave much room for anything else.
\nSo, OK, let’s imagine that we can represent our thread of conscious experience in terms of concepts. Well, that’s close to saying that we’re using language. We’re “packaging up” the details of our neural activity into “robust elements” which we can think of as concepts—and which are represented in language essentially by words. And not only does this “packaging” into language give a robust way for different brains to communicate; it also gives a single brain a robust way to “remember” and “redeploy” thoughts.
\nWithin one brain one could imagine that one might be able to remember and “think” directly in terms of detailed low-level neural patterns. But no doubt the “neural environment” inside a brain is continually changing (not least because of its stream of sensory input). And so the only way to successfully “preserve a thought” across time is presumably to “package it up” in terms of robust elements, or essentially in terms of language. In other words, if we’re going to be able to consistently “think a particular thought” we probably have to formulate it in terms of something robust—like concepts.
\nBut, OK, individual concepts are one thing. But language—or at least human language—is based on putting together concepts in structured ways. One might take a noun (“cat”) and qualify it with an adjective (“black”) to form a phrase that’s in effect a finer-grained version of the concept represented by the noun. And in a rough approximation one can think of language as formed from trees of nested phrases like this. And insofar as the phrases are independent in their structure (i.e. “context free”), we can parse such language by recursively understanding each phrase in turn—with the constraint that we can’t do it if the nesting goes too deep for us to hold the necessary stack of intermediate steps in our working memory.
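The interaction between nesting depth and working memory can be sketched with a small bracket parser in Python. Everything here is an illustrative assumption: phrases are marked with explicit parentheses rather than inferred from grammar, and `max_depth` is a hypothetical stand-in for the number of partially finished phrases that can be held at once.

```python
# Sketch: parsing nested ("context free") phrases with a bounded stack.
# The stack holds phrases that have been opened but not yet closed; its
# allowed size (max_depth) plays the role of working memory.


class WorkingMemoryExceeded(Exception):
    """Raised when nesting goes deeper than the allowed stack size."""


def tokenize(text):
    """Split text into words and standalone parenthesis tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens, max_depth):
    """Build nested lists from bracketed tokens, limiting open phrases."""
    stack = [[]]  # stack[0] collects the top-level result
    for tok in tokens:
        if tok == "(":
            open_phrases = len(stack) - 1
            if open_phrases >= max_depth:
                raise WorkingMemoryExceeded(
                    f"nesting deeper than {max_depth} open phrases"
                )
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("closing bracket without an open phrase")
            phrase = stack.pop()
            stack[-1].append(phrase)  # finished phrase joins its parent
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("phrase left unclosed")
    return stack[0]


if __name__ == "__main__":
    sentence = tokenize("(the (black cat)) (sat)")
    print(parse(sentence, max_depth=2))
    try:
        parse(sentence, max_depth=1)
    except WorkingMemoryExceeded as err:
        print("failed:", err)
```

Because each phrase is handled independently of its context, the only resource that grows with nesting is the stack, which is why a cap on the stack translates directly into a cap on parsable depth.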
\nAn important feature of ordinary human language is that it’s ultimately presented in a sequential way. Even though it may consist of a nested tree of phrases, the words that are the leaves of that tree are spoken or written in a one-dimensional sequence. And, yes, the fact that this is how it works is surely closely connected to the fact that our brains construct a single thread of conscious experience.
\nIn the actuality of the few thousand human languages currently in use, there is considerable superficial diversity, but also considerable fundamental commonality. For example, the same parts of speech (noun, verb, etc.) typically show up, as do concepts like “subject” and “object”. But the details of how words are put together, and how things are indicated, can be fairly different. Sometimes nouns have case endings; sometimes there are separate prepositions. Sometimes verb tenses are indicated by annotating the verb; sometimes with extra words. And sometimes, for example, what would usually be whole phrases can be smooshed together into single words.
\nIt’s not clear to what extent commonalities between languages are the result of shared history, and to what extent they’re consequences either of the particulars of our human sensory experience of the world, or the particular construction of our brains. It’s not too hard to get something like concepts to emerge in experiments on training neural nets to pass data through a “bottleneck” that simulates a “mind-to-mind communication channel”. But how compositionality or grammatical structure might emerge is not clear.
\nOK, but so what might change if we had bigger brains? If neural nets are a guide, one obvious thing is that we should be able to deal directly with a larger number of “distinct concepts”, or words. So what consequences would this have? Presumably one’s language would get “grammatically shallower”, in the sense that what would otherwise have had to be said with nested phrases could now be said with individual words. And presumably this would tend to lead to “faster communication”, requiring fewer words. But it would likely also lead to more rigid communication, with less ability to tweak shades of meaning, say by changing just a few words in a phrase. (And it would presumably also require longer training, to learn what all the words mean.)
\nIn a sense we have a preview of what it’s like to have more words whenever we deal with specialized versions of existing language, aimed say at particular technical fields. There are additional words of “jargon” available, that make certain things “faster to say” (but require longer to learn). And with that jargon comes a certain rigidity, in saying easily only what the jargon says, and not something slightly different.
\nSo how else could language be different with a bigger brain? With larger working memory, one could presumably have more deeply nested phrases. But what about more sophisticated grammatical structures, say ones that aren’t “context free”, in the sense that different nested phrases can’t be parsed separately? My guess is that this quickly devolves into requiring arbitrary computation—and runs into computational irreducibility. In principle it’s perfectly possible to have any program as the “message” one communicates. But if one has to run the program to “determine its meaning”, that’s in general going to involve computational irreducibility.
\nAnd the point is that with our assumptions about what “brain-like systems” do, that’s something that’s out of scope. Yes, one can construct a system (even with neurons) that can do it. But not with the “single thread of decisions from sensory input” workflow that seems characteristic of brains. (There are finer gradations one could consider—like languages that are context sensitive but don’t require general computation. But the Principle of Computational Equivalence strongly suggests that the separation between nested context-free systems and ones associated with arbitrary computation is very thin, and there doesn’t seem to be any particular reason to expect that the capabilities of a bigger brain would land right there.)
\nSaid another way: the Principle of Computational Equivalence says it’s easy to have a system that can deal with arbitrary computation. It’s just that such a system is not “brain like” in its behavior; it’s more like a typical system we see in nature.
\nOK, but what other “additional features” can one imagine, for even roughly “brain-like” systems? One possibility is to go beyond the idea of a single thread of experience, and to consider a multiway system in which threads of experience can branch and merge. And, yes, this is what we imagine happens at a low level in the physical universe, particularly in connection with quantum mechanics. And indeed it’s perfectly possible to imagine, for example, a “quantum-like” LLM system in which one generates a graph of different textual sequences. But just “scaling up the number of neurons” in a brain, without changing the overall architecture, won’t get to this. We have to have a different, multiway architecture. Where we have a “graph of consciousness” rather than a “stream of consciousness”, and where, in effect, we’re “thinking a graph of thoughts”, notably with thoughts themselves being able to branch and merge.
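As a toy picture of branching and merging, here is a minimal Python sketch of a multiway string-rewriting system. The rules `A -> AB` and `B -> A`, the starting string and the function names are all assumptions picked for illustration; the only point is that different branches can arrive at the same string, so the result is a graph rather than a single sequence.

```python
# Sketch of a multiway system: apply every rule at every possible position,
# follow all resulting branches, and record where branches merge.
from collections import defaultdict

RULES = [("A", "AB"), ("B", "A")]  # hypothetical rewrite rules


def successors(state, rules):
    """All strings reachable by one rule application at one position."""
    out = set()
    for lhs, rhs in rules:
        start = state.find(lhs)
        while start != -1:
            out.add(state[:start] + rhs + state[start + len(lhs):])
            start = state.find(lhs, start + 1)
    return out


def multiway_graph(initial, rules, steps):
    """Return (edges, frontier) after evolving all branches for `steps`."""
    edges = set()
    frontier = {initial}
    for _ in range(steps):
        next_frontier = set()
        for state in frontier:
            for nxt in successors(state, rules):
                edges.add((state, nxt))
                next_frontier.add(nxt)
        frontier = next_frontier
    return edges, frontier


def merge_points(edges):
    """States reached from more than one distinct predecessor."""
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)
    return {state: preds for state, preds in parents.items() if len(preds) > 1}


if __name__ == "__main__":
    edges, frontier = multiway_graph("A", RULES, steps=3)
    print("states after 3 steps:", sorted(frontier))
    for state, preds in sorted(merge_points(edges).items()):
        print(state, "is reached from", sorted(preds))
```

A single-threaded system would keep only one successor at each step; keeping all of them, and identifying equal strings, is what turns the sequence into a graph.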
\nIn our practical use of language, it’s most often communicated in spoken or written form—effectively as a one-dimensional sequence of tokens. But in math, for example, it’s common to have a certain amount of 2D structure, and in general there are also all sorts of specialized (usually technical) diagrammatic representations in use, often based on using graphs and networks—as we’ll discuss in more detail below.
\nBut what about general pictures? Normally it’s difficult for us to produce these. But in generative AI systems it’s basically easy. So could we then imagine directly “communicating mental images” from one mind to another? Maybe as a practical matter some neural implant in our brain could aggregate neural signals from which a displayed image could be generated. But is there in fact something coherent that could be extracted from our brains in this way? Perhaps that can only happen after “consensus is formed”, and we’ve reduced things to a much thinner “thread of experience”. Or, in other words, perhaps the only robust way for us to “think about images” is in effect to reduce them to discrete concepts and language-like representations.
\nBut perhaps if we “had the hardware” to display images directly from our minds it’d be a different story. And it’s sobering to imagine that perhaps the reason cats and dogs don’t appear to have compositional language is just that they don’t “have the hardware” to talk like we do (and it’s too laborious for them to “type with their paws”, etc.). And, by analogy, that if we “had the hardware” for displaying images, we’d discover we could also “think very differently”.
\nOf course, in some small ways we do have the ability to “directly communicate with images”, for example in our use of gestures and body language. Right now, these seem like largely ancillary forms of communication. But, yes, it’s conceivable that with bigger brains, they could be more.
\nAnd when it comes to other animals the story can be different. Cuttlefish are notable for dynamically producing elaborate patterns on their skin—giving them in a sense the hardware to “communicate in pictures”. But so far as one can tell, they produce just a small number of distinct patterns—and certainly nothing like a “pictorial generalization of compositional language”. (In principle one could imagine that “generalized cuttlefish” could do things like “dynamically run cellular automata on their skin”, just like all sorts of animals “statically” do in the process of growth or development. But to decode such patterns—and thereby in a sense enable “communicating in programs”—would typically require irreducible amounts of computation that are beyond the capabilities of any standard brain-like system.)
\nWe humans have raw inputs coming into our brains from a few million sensors distributed across our usual senses of touch, sight, hearing, taste and smell (together with balance, temperature, hunger, etc.). In most cases the detailed sensor inputs are not independent; in a typical visual scene, for example, neighboring pixels are highly correlated. And it doesn’t seem to take many layers of neurons in our brains to distill our typical sensory experience from pure pieces of “raw data” to what we might view as “more independent features”.
\nOf course there’ll usually be much more in the raw data than just those features. But the “features” typically correspond to aspects of the data that we’ve “learned are useful to us”—normally connected to pockets of computational reducibility that exist in the environment in which we operate. Are the features we pick out all we’ll ever need? In the end, we typically want to derive a small stream of decisions or actions from all the data that comes in. But how many “intermediate features” do we need to get “good” decisions or actions?
\nThat really depends on two things. First, what our decisions and actions are like. And second, what our raw data is like. Early in the history of our species, everything was just about “indigenous human experience”: what the natural world is like, and what we can do with our bodies. But as soon as we were dealing with technology, that changed. And in today’s world we’re constantly exposed, for example, to visual input that comes not from the natural world, but, say, from digital displays.
\nAnd, yes, we often try to arrange our “user experience” to align with what’s familiar from the natural world (say by having objects that stay unchanged when they’re moved across the screen). But it doesn’t have to be that way. And indeed it’s easy—even with simple programs—to generate for example visual images very different from what we’re used to. And in many such cases, it’s very hard for us to “tell what’s going on” in the image. Sometimes it’ll just “look too complicated”. Sometimes it’ll seem like it has pieces we should recognize, but we don’t.
\nWhen it’s “just too complicated”, that’s often a reflection of computational irreducibility. But when there are pieces we might “think we should recognize”, that can be a reflection of pockets of reducibility we’re just not familiar with. If we imagine a space of possible images—as we can readily produce with generative AI—there will be some that correspond to concepts (and words) we’re familiar with. But the vast majority will effectively lie in “interconcept space”: places where we could have concepts, but don’t, at least yet.
\nSo what could bigger brains do with all this? Potentially they could handle more features, and more concepts. Full computational irreducibility will always in effect ultimately overpower them. But when it comes to handling pockets of reducibility, they’ll presumably be able to deal with more of them. So in the end, it’s very much as one might expect: a bigger brain should be able to track more things going on, “see more details”, etc.
\nBrains of our size seem like they are in effect sufficient for “indigenous human experience”. But with technology in the picture, it’s perfectly possible to “overload” them. (Needless to say, technology—in the form of filtering, data analysis, etc.—can also reduce that overload, in effect taking raw input and bringing our actual experience of it closer to something “indigenous”.)
\nIt’s worth pointing out that while two brains of a given size might be able to “deal with the same number of features or concepts”, those features or concepts might be different. One brain might have learned to talk about the world in terms of one set of primitives (such as certain basic colors); another in terms of a different set of primitives. But if both brains are sampling “indigenous human experience” in similar environments one can expect that it should be possible to translate between these descriptions—just as it is generally possible to translate between things said in different human languages.
\nBut what if the brains are effectively sampling “different slices of reality”? What if one’s using technology to convert different physical phenomena to forms (like images) that we can “indigenously” handle? Perhaps we’re sensing different electromagnetic frequencies; perhaps we’re sensing molecular or chemical properties; perhaps we’re sensing something like fluid motion. The kinds of features that will be “useful” may be quite different in these different modalities. Indeed, even something as seemingly basic as the notion of an “object” may not be so relevant if our sensory experience is effectively of continuous fluid motion.
\nBut in the end, what’s “useful” will depend on what we can do. And once again, it depends on whether we’re dealing with “pure humans” (who can’t, for example, move like octopuses) or with humans “augmented by technology”. And here we start to see an issue that relates to the basic capabilities of our brains.
\nAs “pure humans”, we have certain “actuators” (basically in the form of muscles) that we can “indigenously” operate. But with technology it’s perfectly possible for us to use quite different actuators in quite different configurations. And as a practical matter, with brains like ours, we may not be able to make them work.
\nFor example, while humans can control helicopters, they never managed to control quadcopters—at least not until digital flight controllers could do most of the work. In a sense there were just too many degrees of freedom for brains like ours to deal with. Should bigger brains be able to do more? One would think so. And indeed one could imagine testing this with artificial neural nets. In millipedes, for example, their actual brains seem to support only a couple of patterns of motion of their legs (roughly, same phase vs. opposite phase). But one could imagine that with a bigger brain, all sorts of other patterns would become possible.
\nUltimately, there are two issues at stake here. The first is having a brain be able to “independently address” enough actuators, or in effect enough degrees of freedom. The second is having a brain be able to control those degrees of freedom. And for example with mechanical degrees of freedom there are again essentially issues of computational irreducibility. Looking at the space of possible configurations—say of millipede legs—does one effectively just have to trace the path to find out if, and how, one can get from one configuration to another? Or are there instead pockets of reducibility, associated with regularities in the space of configurations, that let one “jump ahead” and figure this out without tracing all the steps? It’s those pockets of reducibility that brains can potentially make use of.
\nWhen it comes to our everyday “indigenous” experience of the world, we are used to certain kinds of computational reducibility, associated for example with familiar natural laws, say about motion of objects. But what if we were dealing with different experiences, associated with different senses?
\nFor example, imagine (as with dogs) that our sense of smell was better developed than our sense of sight—as reflected by more nerves coming into our brains from our noses than our eyes. Our description of the world would then be quite different, based for example not on geometry revealed by the line-of-sight arrival of light, but instead on the delivery of odors through fluid motion and diffusion—not to mention the probably-several-hundred-dimensional space of odors, compared to the red, green, blue space of colors. Once again there would be features that could be identified, and “concepts” that could be defined. But those might only be useful in an environment “built for smell” rather than one “built for sight”.
\nAnd in the end, how many concepts would be useful? I don’t think we have any way to know. But it certainly seems as if one can be a successful “smell-based animal” with a smaller brain (presumably supporting fewer concepts) than one needs as a successful “sight-based animal”.
\nOne feature of “natural senses” is that they tend to be spatially localized: an animal basically senses things only where it is. (We’ll discuss the case of social organisms later.) But what if we had access to a distributed array of sensors—say associated with IoT devices? The “effective laws of nature” that one could perceive would then be different. Maybe there would be regularities that could be captured by a small number of concepts, but it seems more likely that the story would be more complicated, and that in effect one would “need a bigger brain” to be able to keep track of what’s going on, and make use of whatever pockets of reducibility might exist.
\nThere are somewhat similar issues if one imagines changing the timescales for sensory input. Our perception of space, for example, depends on the fact that light travels fast enough that in the milliseconds it takes our brain to register the input, we’ve already received light from everything that’s around us. But if our brains operated a million times faster (as digital electronics does) we’d instead be registering individual photons. And while our brains might aggregate these to something like what we ordinarily perceive, there may be all sorts of other (e.g. quantum optics) effects that would be more obvious.
\nThe more abstractly we try to think, the harder it seems to get. But would it get easier if we had bigger brains? And might there perhaps be fundamentally higher levels of abstraction that we could reach, but only if we had bigger brains?
\nAs a way to approach such questions, let’s begin by talking a bit about the history of the phenomenon of abstraction. We might already say that basic perception involves some abstraction, capturing as it does a filtered version of the world as it actually is. But perhaps we reach a different level when we start to ask “what if?” questions, and to imagine how things in the world could be different than they are.
\nBut somehow when it comes to us humans, it seems as if the greatest early leap in abstraction was the invention of language, and the explicit delineation of concepts that could be quite far from our direct experience. The earliest written records tend to be rather matter of fact, mostly recording as they do events and transactions. But already there are plenty of signs of abstraction. Numbers independent of what they count. Things that should happen in the future. The concept of money.
\nThere seems to be a certain pattern to the development of abstraction. One notices that some category of things one sees many times can be considered similar, then one “packages these up” into a concept, often described by a word. And in many cases, there’s a certain kind of self amplification: once one has a word for something (as a modern example, say “blog”), it becomes easier for us to think about the thing, and we tend to see it or make it more often in the world around us. But what really makes abstraction take off is when we start building a whole tower of it, with one abstract concept recursively being based on others.
\nHistorically this began quite slowly. And perhaps it was seen first in theology. There were glimmerings of it in things like early (syllogistic) logic, in which one started to be able to talk about the form of arguments, independent of their particulars. And then there was mathematics, where computations could be done just in terms of numbers, independent of where those numbers came from. And, yes, while there were tables of “raw computational results”, numbers were usually discussed in terms of what they were numbers of. And indeed when it came to things like measures of weight, it took until surprisingly modern times for there to be an absolute, abstract notion of weight, independent of whether it was a weight of figs or of wool.
\nThe development of algebra in the early modern period can be considered an important step forward in abstraction. Now there were formulas that could be manipulated abstractly, without even knowing what particular numbers x stood for. But it would probably be fair to say that there was a major acceleration in abstraction in the 19th century—with the development of formal systems that could be discussed in “purely symbolic form” independent of what they might (or might not) “actually represent”.
\nAnd it was from this tradition that modern notions of computation emerged (and indeed particularly ones associated with symbolic computation that I personally have extensively used). But the most obvious area in which towers of abstraction have been built is mathematics. One might start with numbers (that could count things). But soon one’s on to variables, functions, spaces of functions, category theory—and a zillion other constructs that abstractly build on each other.
\nThe great value of abstraction is that it allows one to think about large classes of things all at once, instead of each separately. But how do those abstract concepts fit together? The issue is that often it’s in a way that’s very remote from anything about which we have direct experience from our raw perception of the world. Yes, we can define concepts about transfinite numbers or higher categories. But they don’t immediately relate to anything we’re familiar with from our everyday experience.
\nAs a practical matter one can often get a sense of how high something is on the tower of abstraction by seeing how much one has to explain to build up to it from “raw experiential concepts”. Just sometimes it turns out that, once one hears about a certain seemingly “highly abstract” concept, one can explain it surprisingly simply, without going through the whole historical chain that led to it. (A notable example of this is the concept of universal computation, which arose remarkably late in human intellectual history, but is now quite easy to explain, albeit particularly given its widespread embodiment in technology.) But the more common case is that there’s no choice but to explain a whole tower of concepts.
\nAt least in my experience, however, when one actually thinks about “highly abstract” things, one does it by making analogies to more familiar, more concrete things. The analogies may not be perfect, but they provide scaffolding which allows our brains to take what would otherwise be quite inaccessible steps.
\nAt some level any abstraction is a reflection of a pocket of computational reducibility. Because if a useful abstraction can be defined, what it means is that it’s possible to say something in a “summarized” or reduced way, in effect “jumping ahead”, without going through all the computational steps or engaging with all the details. And one can then think of towers of abstraction as being like networks of pockets of computational reducibility. But, yes, it can be hard to navigate these.
\nUnderneath, there’s lots of computational irreducibility. And if one is prepared to “go through all the steps” one can often “get to an answer” without all the “conceptual difficulty” of complex abstractions. But while computers can often readily “go through all the steps”, brains can’t. And that’s in a sense why we have to use abstraction. But inevitably, even if we’re using abstraction, and the pockets of computational reducibility associated with it, there’ll be shadows of the computational irreducibility underneath. And in particular, if we try to “explore everything”, our network of pockets of reducibility will inevitably “get complicated”, and ultimately also be mired in computational irreducibility, albeit with “higher-level” constructs than in the computational irreducibility underneath.
\nNo finite brain will ever be able to “go all the way”, but it starts to seem likely that a bigger brain will be able to “reach further” in the network of abstraction. But what will it find there? How does the character of abstraction change when we take it further? We’ll be able to discuss this a bit more concretely when we talk about computational language below. But perhaps the main thing to say now is that—at least in my experience—most higher abstractions don’t feel as if they’re “structurally different” once one understands them. In other words, most of the time, it seems as if the same patterns of thought and reasoning that one’s applied in many other places can be applied there too, just to different kinds of constructs.
\nSometimes, though, there seem to be exceptions. Shocks to intuition that seem to separate what one’s now thinking about from anything one’s thought before. And, for example, for me this happened when I started looking broadly at the computational universe. I had always assumed that simple rules would lead to simple behavior. But many years ago I discovered that in the computational universe this isn’t true (hence computational irreducibility). And this led to a whole different paradigm for thinking about things.
\nIt feels a bit like in metamathematics. Where one can imagine one type of abstraction associated with different constructs out of which to form theorems. But where somehow there’s another level associated with different ways to build new theorems, or indeed whole spaces of theorems. Or to build proofs from proofs, or proofs from proofs of proofs, etc. But the remarkable thing is that there seems to be an ultimate construct that encompasses it all: the ruliad.
\nWe can describe the ruliad as the entangled limit of all possible computations. But we can also describe it as the limit of all possible abstractions. And it seems to lie underneath all physical reality, as well as all possible mathematics, etc. But, we might ask, how do brains relate to it?
\nInevitably, it’s full of computational irreducibility. And looked at as a whole, brains can’t get far with it. But the key idea is to think about how brains as they are—with all their various features and limitations—will “parse” it. And what I’ve argued is that what “brains as they are” will perceive about the ruliad are the core laws of physics (and mathematics) as we know them. In other words, it’s because brains are the way they are that we perceive the laws of physics that we perceive.
\nWould it be different for bigger brains? Not if they’re the “same kind of brains”. Because what seems to matter for the core laws of physics are really just two properties of observers. First, that they’re computationally bounded. And second, that they believe they are persistent in time, and have a single thread of experience through time. And both of these seem to be core features of what makes brains “brain-like”, rather than just arbitrary computational systems.
\nIt’s a remarkable thing that just these features are sufficient to make core laws of physics inevitable. But if we want to understand more about the physics we’ve constructed—and the laws we’ve deduced—we probably have to understand more about what we’re like as observers. And indeed, as I’ve argued elsewhere, even our physical scale (much bigger than molecules, much smaller than the whole universe) is for example important in giving us the particular experience (and laws) of physics that we have.
\nWould this be different with bigger brains? Perhaps a little. But anything that something brain-like can do pales in comparison to the computational irreducibility that exists in the ruliad and in the natural world. Nevertheless, with every new pocket of computational reducibility that’s reached we get some new abstraction about the world, or in effect, some new law about how the world works.
\nAnd as a practical matter, each such abstraction can allow us to build a whole collection of new ways of thinking about the world, and making things in the world. It’s challenging to trace this arc. Because in a sense it’ll all be about “things we never thought to think about before”. Goals we might define for ourselves that are built on a tower of abstraction, far away from what we might think of as “indigenous human goals”.
\nIt’s important to realize that there won’t just be one tower of abstraction that can be built. There’ll inevitably be an infinite network of pockets of computational reducibility, with each path leading to a different specific tower of abstraction. And indeed the abstractions we have pursued reflect the particular arc of human intellectual history. Bigger brains—or AIs—have many possible directions they can go, each one defining a different path of history.
\nOne question to ask is to what extent reaching higher levels of abstraction is a matter of education, and to what extent it requires additional intrinsic capabilities of a brain. It is, I suspect, a mixture. Sometimes it’s really just a question of knowing “where that pocket of reducibility is”, which is something we can learn from education. But sometimes it’s a question of navigating a network of pockets, which may only be possible when brains reach a certain level of “computational ability”.
\nThere’s another thing to discuss, related to education. And that’s the fact that over time, more and more “distinct pieces of knowledge” get built up in our civilization. There was perhaps a time in history when a brain of our size could realistically commit to memory at least the basics of much of that knowledge. But today that time has long passed. Yes, abstraction in effect compresses what one needs to know. But the continual addition of new and seemingly important knowledge, across countless specialties, makes it impossible for brains of our size to keep up.
\nPlenty of that knowledge is, though, quite siloed in different areas. But sometimes there are “grand analogies” to make—say pulling an idea from relativity theory and applying it to biological evolution. In a sense such analogies reveal new abstractions—but to make them requires knowledge that spans many different areas. And that’s a place where bigger brains—or AIs—can potentially do something that’s in a fundamental way “beyond us”.
\nWill there always be such “grand analogies” to make? The general growth of knowledge is inevitably a computationally irreducible process. And within it there will inevitably be pockets of reducibility. But how often in practice will one actually encounter “long-range connections” across “knowledge space”? As a specific example one can look at metamathematics, where such connections are manifest in theorems that link seemingly different areas of mathematics. And this example leads one to realize that at some deep level grand analogies are in a sense inevitable. In the context of the ruliad, one can think of different domains of knowledge as corresponding to different parts. But the nature of the ruliad—encompassing as it does everything that is computationally possible—inevitably imbues it with a certain homogeneity, which implies that (as the Principle of Computational Equivalence might suggest) there must ultimately be a correspondence between different areas. In practice, though, this correspondence may be at a very “atomic” (or “formal”) level, far below the kinds of descriptions (based on pockets of reducibility) that we imagine brains normally use.
\nBut, OK, will it always take an “expanding brain” to keep up with the “expanding knowledge” we have? Computational irreducibility guarantees that there’ll always in principle be “new knowledge” to be had—separated from what’s come before by irreducible amounts of computation. But then there’s the question of whether in the end we’ll care about it. After all, it could be that the knowledge we can add is so abstruse that it will never affect any practical decisions we have to make. And, yes, to some extent that’s true (which is why only some tiny fraction of the Earth’s population will care about what I’m writing here). But another consequence of computational irreducibility is that there will always be “surprises”—and those can eventually “push into focus” even what at first seems like arbitrarily obscure knowledge.
\nLanguage in general—and compositional language in particular—is arguably the greatest invention of our species. But is it somehow “the top”—the highest possible representation of things? Or if, for example, we had bigger brains, is there something beyond it that we could reach?
\nWell, in some very formal sense, yes, compositional language (at least in idealized form) is “the top”. Because—at least if it’s allowed to include utterances of any length—then in some sense it can in principle encode arbitrary, universal computations. But this really isn’t true in any useful sense—and indeed to apply ordinary compositional language in this way would require doing computationally irreducible computations.
\nSo we return to the question of what might in practice lie beyond ordinary human language. I wondered about this for a long time. But in the end I realized that the most important clue was in a sense right in front of me: the concept of computational language, which I’ve spent much of my life exploring.
\nIt’s worth saying at the outset that the way computational language plays out for computers and for brains is somewhat different, and in some respects complementary. In computers you might specify something as a Wolfram Language symbolic expression, and then the “main action” is to evaluate this expression, potentially running a long computation to find out what the expression evaluates to.
\nBrains aren’t set up to do long computations like this. For them a Wolfram Language expression is something to use in effect as a “representation of a thought”. (And, yes, that’s an important distinction between the computational language concept of Wolfram Language, and standard “programming languages”, which are intended purely as a way to tell a computer what to do, not a way to represent thoughts.)
\nSo what kinds of thoughts can we readily represent in our computational language? There are ones involving explicit numbers, or mathematical expressions. There are ones involving cities and chemicals, and other real-world entities. But then there are higher-level ones, that in effect describe more abstract structures.
\nFor example, there’s NestList, which gives the list of results obtained by repeatedly nesting any operation, here named f, starting from a given expression.
\nAt the outset, it’s not obvious that this would be a useful thing to do. But in fact it’s a very successful abstraction: there are lots of functions f for which one wants to do this.
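\nAs a rough illustration of the idea: in Wolfram Language, NestList[f, x, 3] gives {x, f[x], f[f[x]], f[f[f[x]]]}. The sketch below mimics that behavior in Python, used here only as a stand-in. The name `nest_list` and the sample operations are assumptions made for this example, not anything from Wolfram Language itself.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def nest_list(f: Callable[[T], T], x: T, n: int) -> List[T]:
    """Return [x, f(x), f(f(x)), ...], applying f up to n times (n + 1 entries).

    Hypothetical Python analog of NestList[f, x, n]; not a Wolfram Language API.
    """
    results = [x]
    for _ in range(n):
        # Each new entry is f applied to the previous entry.
        results.append(f(results[-1]))
    return results


# The same higher-order function works for quite different operations f.
doubled = nest_list(lambda v: 2 * v, 1, 4)
appended = nest_list(lambda s: s + "a", "", 3)
```

\nThe point of the sketch is that `nest_list` knows nothing about what f does. It only captures the pattern of "doing f again and again", which is what makes it reusable across many different f.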
\nIn the development of ordinary human language, words tend to get introduced when they’re useful, or, in other words, when they express things one often wants to express. But somehow in human language the words one gets tend to be more concrete. Maybe they describe something that directly happens to objects in the world. Maybe they describe our impression of a human mental state. Yes, one can make rather vague statements like “I’m going to do something to someone”. But human language doesn’t normally “go meta”, doing things like NestList where one’s saying that one wants to take some “direct statement” and in effect “work with the statement”. In some sense, human language tends to “work with data”, applying a simple analog of code to it. Our computational language can “work with code” as “raw material”.
\nOne can think about this as a “higher-order function”: a function that operates not on data, but on functions. And one can keep going, dealing with functions that operate on functions that operate on functions, and so on. And at every level one is increasing the generality—and abstraction—at which one is working. There may be many specific functions (a bit analogous to verbs) that operate on data (a bit analogous to nouns). But when we talk about operating on functions themselves we can potentially have just a single function (like NestList) that operates, quite generally, on many functions. In ordinary language, we might call such things “metaverbs”, but they aren’t something that commonly occurs.
\nBut what makes them possible in computational language? Well, it’s taking the computational paradigm seriously, and representing everything in computational terms: objects, actions, etc. In Wolfram Language, it’s that we can represent everything as a symbolic expression. Arrays of numbers (or countries, or whatever) are symbolic expressions. Graphics are symbolic expressions. Programs are symbolic expressions. And so on.
\nAnd given this uniformity of representation it becomes feasible—and natural—to do higher-order operations, that in effect manipulate symbolic structure without being concerned about what the structure might represent. At some level we can view this as leading to the ultimate abstraction embodied in the ruliad, where in a sense “everything is pure structure”. But in practice in Wolfram Language we try to “anchor” what we’re doing to known concepts from ordinary human language—so that we use names for things (like NestList) that are derived from common English words.
\nIn some formal sense this isn’t necessary. Everything can be “purely structural”, as it is not only in the ruliad but also in constructs like combinators, where, say, the operation of addition can be represented purely as a nested expression of combinators.
\nCombinators have been around for more than a century. But they are almost impenetrably difficult for most humans to understand. Somehow they involve too much “pure abstraction”, not anchored to concepts we “have a sense of” in our brains.
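\nTo give a small taste of what “pure structure” looks like, here is a minimal sketch of the two standard combinators S and K, written as curried Python functions. This is not the addition expression referred to above (which is not reproduced here). It only shows the much simpler standard fact that S K K behaves as the identity. Python is used purely as a stand-in.

```python
# Curried combinators: each takes its arguments one at a time.
# S x y z = x z (y z)      K x y = x
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x

# S K K z  ->  K z (K z)  ->  z, so S K K acts as the identity.
I = S(K)(K)

identity_result = I(42)
constant_result = K("kept")("discarded")
```

\nEven in this tiny case, nothing in the expression S(K)(K) suggests “identity” until one traces through the applications, which hints at why larger combinator expressions are so hard to read.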
\nIt’s been interesting for me to observe over the years what it’s taken for people (including myself) to come to terms with the kind of higher-order constructs that exist in the Wolfram Language. The typical pattern is that over the course of months or years one gets used to lots of specific cases. And only after that is one able—often in the end rather quickly—to “get to the next level” and start to use some generalized, higher-order construct. But normally one can in effect only “go one level at a time”. After one groks one level of abstraction, that seems to have to “settle” for a while before one can go on to the next one.
\nSomehow it seems as if one is gradually “feeling out” a certain amount of computational irreducibility, in order to learn about a new pocket of reducibility that one can eventually “think in terms of”.
\nCould “having a bigger brain” speed this up? Maybe it’d be useful to be able to remember more cases, and perhaps get more into “working memory”. But I rather suspect that combinators, for example, are in some sense fundamentally beyond all brain-like systems. It’s much as the Principle of Computational Equivalence suggests: one quickly “ascends” to things that are as computationally sophisticated as anything—and therefore inevitably involve computational irreducibility. There are only certain specific setups that remain within the computationally bounded domain that brain-like systems can deal with.
\nOf course, even though they can’t directly “run code in their brains”, humans—and LLMs—can perfectly well use Wolfram Language as a tool, getting it to actually run computations. And this means they can readily “observe phenomena” that are computationally irreducible. And indeed in the end it’s very much the same kind of thing observing such phenomena in the abstract computational universe, and in the “real” physical universe. And the point is that in both cases, brain-like systems will pull out only certain features, essentially corresponding to pockets of computational reducibility.
\nHow do things like higher-order functions relate to this? At this point it’s not completely clear. Presumably in at least some sense there are hierarchies of higher-order functions that capture certain kinds of regularities that can be thought of as associated with networks of computational reducibility. And it’s conceivable that category theory and its higher-order generalizations are relevant here. In category theory one imagines applying sequences of functions (“morphisms”) and it’s a foundational assumption that the effect of any sequence of functions can also be represented by just a single function—which seems tantamount to saying that one can always “jump ahead”, or in other words, that everything one’s dealing with is computationally reducible. Higher-order category theory then effectively extends this to higher-order functions, but always with what seem like assumptions of computational reducibility.
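\nThe idea that a sequence of functions can be represented by a single function can be sketched with ordinary function composition. The helper name `compose` and the three sample functions below are assumptions made for illustration, and Python is used only as a stand-in.

```python
from typing import Any, Callable


def compose(*fs: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Package a sequence of functions as one function, applied left to right."""
    def composed(x: Any) -> Any:
        for f in fs:
            x = f(x)
        return x
    return composed


inc = lambda v: v + 1
double = lambda v: 2 * v
square = lambda v: v * v

pipeline = compose(inc, double, square)
direct = pipeline(3)

# Grouping does not matter: composition is associative.
left_grouped = compose(compose(inc, double), square)(3)
right_grouped = compose(inc, compose(double, square))(3)
```

\nNote that this sketch only shows the “packaging” of a sequence as one function. In this Python version the composed function still runs every step internally, so it says nothing by itself about whether a genuine shortcut exists.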
\nAnd, yes, this all seems highly abstract, and difficult to understand. But does it really need to be, or is there some way to “bring it down” to a level that’s close to everyday human thinking? It’s not clear. But in a sense the core art of computational language design (that I’ve practiced so assiduously for nearly half a century) is precisely to take things that at first might seem abstruse, and somehow cast them into an accessible form. And, yes, this is something that’s about as intellectually challenging as anything—because in a sense it involves continually trying to “figure out what’s really going on”, and in effect “drilling down” to get to the foundations of everything.
\nBut, OK, when one gets there, how simple will things be? Part of that depends on how much computational irreducibility is left when one reaches what one considers to be “the foundations”. And part in a sense depends on the extent to which one can “find a bridge” between the foundations and something that’s familiar. Of course, what’s “familiar” can change. And indeed over the four decades that I’ve been developing the Wolfram Language quite a few things (particularly in areas like functional programming) that at first seemed abstruse and unfamiliar have begun to seem more familiar. And, yes, it’s taken the collective development and dissemination of the relevant ideas to achieve that. But now it “just takes education”; it doesn’t “take a bigger brain” to deal with these things.
\nOne of the core features of the Wolfram Language is that it represents everything as a symbolic expression. And, yes, symbolic expressions are formally able to represent any kind of computational structure. But beyond that, the important point is that they’re somehow set up to be a match for how brains work.
\nAnd in particular, symbolic expressions can be thought of “grammatically” as consisting of nested functions that form a tree-like structure; effectively a more precise version of the typical kind of grammar that we find in human language. And, yes, just as we manage to understand and generate human language with a limited working memory, so (at least at the grammatical level) we can do the same thing with computational language. In other words, in dealing with Wolfram Language we’re leveraging our faculties with human language. And that’s why Wolfram Language can serve as such an effective bridge between the way we think about things, and what’s computationally possible.
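\nThe tree-like “grammar” of a symbolic expression can be sketched with nested tuples. The representation (a tuple whose first element is the head), the helper names `to_string` and `depth`, and the convention that atoms have depth 0 are all assumptions made for this example. They are not how Wolfram Language is implemented, and Python is used only as a stand-in.

```python
from typing import Any, Tuple, Union

# An expression is either an atom (a string or number) or a tuple
# (head, arg1, arg2, ...). This mirrors the head[arg1, arg2, ...] form.
Expr = Union[str, int, float, Tuple[Any, ...]]

# 2 x + x^2, written as nested heads
expr: Expr = ("Plus", ("Times", 2, "x"), ("Power", "x", 2))


def to_string(e: Expr) -> str:
    """Render the tree in head[arg, ...] notation."""
    if not isinstance(e, tuple):
        return str(e)
    head, *args = e
    return f"{head}[{', '.join(to_string(a) for a in args)}]"


def depth(e: Expr) -> int:
    """Nesting depth under this sketch's convention: atoms have depth 0."""
    if not isinstance(e, tuple):
        return 0
    return 1 + max((depth(a) for a in e[1:]), default=0)


rendered = to_string(expr)
nesting = depth(expr)
```

\nBoth helpers walk the tree one branch at a time, much as one reads a nested phrase in a sentence, without needing to hold the whole structure in view at once.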
\nBut symbolic expressions represented as trees aren’t the only conceivable structures. It’s also possible to have symbolic expressions where the elements are nodes on a graph, and the graph can even have loops in it. Or one can go further, and start talking, for example, about the hypergraphs that appear in our Physics Project. But the point is that brain-like systems have a hard time processing such structures. Because to keep track of what’s going on they in a sense have to keep track of multiple “threads of thought”. And that’s not something individual brain-like systems as we currently envision them can do.
\nAs we’ve discussed several times here, it seems to be a key feature of brains that they create a single “thread of experience”. But what would it be like to have multiple threads? Well, we actually have a very familiar example of that: what happens when we have a whole collection of people (or other animals).
\nOne could imagine that biological evolution might have produced animals whose brains maintain multiple simultaneous threads of experience. But somehow it has ended up instead restricting each animal to just one thread of experience—and getting multiple threads by having multiple animals. (Conceivably creatures like octopuses may actually in some sense support multiple threads within one organism.)
\nWithin a single brain it seems important to always “come to a single, definite conclusion”—say to determine where an animal will “move next”. But what about in a collection of organisms? Well, there’s still some kind of coordination that will be important to the fitness of the whole population—perhaps even something as direct as moving together as a herd or flock. And in a sense, just as all those different neuron firings in one brain get collected to determine a “final conclusion for what to do”, so similarly the conclusions of many different brains have to be collected to determine a coordinated outcome.
\nBut how can a coordinated outcome arise? Well, there has to be communication of some sort between organisms. Sometimes it’s rather passive (just watch what your neighbor in a herd or flock does). Sometimes it’s something more elaborate and active—like language. But is that the best one can do? One might imagine that there could be some kind of “telepathic coordination”, in which the raw pattern of neuron firings is communicated from one brain to another. But as we’ve argued, such communication cannot be expected to be robust. To achieve robustness, one must “package up” all the internal details into some standardized form of communication (words, roars, calls, etc.) that one can expect can be “faithfully unpacked” and in effect “understood” by other, suitably similar brains.
\nBut it’s important to realize that the very possibility of such standardized communication in effect requires coordination. Because somehow what goes on in one brain has to be aligned with what goes on in another. And indeed the way that’s maintained is precisely through continual communication.
\nSo, OK, how might bigger brains affect this? One possibility is that they might enable more complex social structures. There are plenty of animals with fairly small brains that successfully form “all do the same thing” flocks, herds and the like. But the larger brains of primates seem to allow more complex “tribal” structures. Could having a bigger brain let one successfully maintain a larger social structure, in effect remembering and handling larger numbers of social connections? Or could the actual forms of these connections be more complex? While human social connections seem to be at least roughly captured by social networks represented as ordinary graphs, maybe bigger brains would for example routinely require hypergraphs.
\nBut in general we can say that language—or standardized communication of some form—is deeply connected to the existence of a “coherent society”. For without being able to exchange something like language there’s no way to align the members of a potential society. And without coherence between members something like language won’t be useful.
\nAs in so many other situations, one can expect that the detailed interactions between members of a society will show all sorts of computational irreducibility. And insofar as one can identify “the will of society” (or, for that matter, the “tide of history”), it represents a pocket of computational reducibility in the system.
\nIn human society there is a considerable tendency (though it’s often not successful) to try to maintain a single “thread of society”, in which, at some level, everyone is supposed to act more or less the same. And certainly that’s an important simplifying feature in allowing brains like ours to “navigate the social world”. Could bigger brains do something more sophisticated? As in other areas, one can imagine a whole network of regularities (or pockets of reducibility) in the structure of society, perhaps connected to a whole tower of “higher-order social abstractions”, that only brains bigger than ours can comfortably deal with. (“Just being friends” might be a story for the “small brained”. With bigger brains one might instead have patterns of dependence and connectivity that can only be represented in complicated graph theoretic ways.)
\nWe humans have a tremendous tendency to think—or at least hope—that our minds are somehow “at the top” of what’s possible. But with what we know now about computation and how it operates in the natural world it’s pretty clear this isn’t true. And indeed it seems as if it’s precisely a limitation in the “computational architecture” of our minds—and brains—that leads to that most cherished feature of our existence that we characterize as “conscious experience”.
\nIn the natural world at large, computation is in some sense happening quite uniformly, everywhere. But our brains seem to be set up to do computation in a more directed and more limited way—taking in large amounts of sensory data, but then filtering it down to a small stream of actions to take. And, yes, one can remove this “limitation”. And while the result may lead to more computation getting done, it doesn’t lead to something that’s “a mind like ours”.
\nAnd indeed in what we’ve done here, we’ve tended to be very conservative in how we imagine “extending our minds”. We’ve mostly just considered what might happen if our brains were scaled up to have more neurons, while basically maintaining the same structure. (And, yes, animals physically bigger than us already have larger brains—as did Neanderthals—but what we really need to look at is size of brain relative to size of the animal, or, in effect “amount of brain for a given amount of sensory input”.)
\nA certain amount about what happens with different scales of brains is already fairly clear from looking at different kinds of animals, and at things like their apparent lack of human-like language. But now that we have artificial neural nets that do remarkably human-like things we’re in a position to get a more systematic sense of what different scales of “brains” can do. And indeed we’ve seen a sequence of “capability thresholds” passed as neural nets get larger.
\nSo what will bigger brains be able to do? What’s fairly straightforward is that they’ll presumably be able to take larger amounts of sensory input, and generate larger amounts of output. (And, yes, the sensory input could come from existing modalities, or new ones, and the outputs could go to existing “actuators”, or new ones.) As a practical matter, the more “data” that has to be processed for a brain to “come to a decision” and generate an output, the slower it’ll probably be. But as brains get bigger, so presumably will the size of their working memory—as well as the number of distinct “concepts” they can “distinguish” and “remember”.
\nIf the same overall architecture is maintained, there’ll still be just a single “thread of experience”, associated with a single “thread of communication”, or a single “stream of tokens”. At the size of brains we have, we can deal with compositional language in which “concepts” (represented, basically, as words) can have at least a certain depth of qualifiers (corresponding, say, to adjectival phrases). As brain size increases, we can expect both more “raw concepts” (allowing fewer qualifiers) and more working memory to deal with more deeply nested qualifiers.
\nBut is there something qualitatively different that can happen with bigger brains? Computational language (and particularly my experience with the Wolfram Language) gives some indications, the most notable of which is the idea of “going meta” and using “higher-order constructs”. Instead of, say, operating directly on “raw concepts” with (say, “verb-like”) “functions”, we can imagine higher-order functions that operate on functions themselves. And, yes, this is something of which we see powerful examples in the Wolfram Language. But it feels as if we could somehow go further—and make this more routine—if our brains in a sense had “more capacity”.
\nTo “go meta” and “use higher-order constructs” is in effect a story of abstraction—and of taking many disparate things and abstracting to the point where one can “talk about them all together”. The world at large is full of complexity—and computational irreducibility. But in essence what makes “minds like ours” possible is that there are pockets of computational reducibility to be found. And those pockets of reducibility are closely related to being able to successfully do abstraction. And as we build up towers of abstraction we are in effect navigating through networks of pockets of computational reducibility.
\nThe progress of knowledge—and the fact that we’re educated about it—lets us get to a certain level of abstraction. And, one suspects, the more capacity there is in a brain, the further it will be able to go.
\nBut where will it “want to go”? The world at large, full as it is of computational irreducibility, along with infinite numbers of pockets of reducibility, leaves infinite possibilities. And it is largely the coincidence of our particular history that defines the path we have taken.
\nWe often identify our “sense of purpose” with the path we will take. And perhaps the definiteness of our belief in purpose is related to the particular feature of brains that leads us to concentrate “everything we’re thinking” down into just a single stream of decisions and action.
\nAnd, yes, as we’ve discussed, one could in principle imagine “multiway minds” with multiple “threads of consciousness” operating at once. But we humans (and individual animals in general) don’t seem to have those. Of course, in collections of humans (or other animals) there are still inevitably multiple “threads of consciousness” —and it’s things like language that “knit together” those threads to, for example, make a coherent society.
\nQuite what that “knitting” looks like might change as we scale up the size of brains. And so, for example, with bigger brains we might be able to deal with “higher-order social structures” that would seem alien and incomprehensible to us today.
\nSo what would it be like to interact with a “bigger brain”? Inside, that brain might effectively use many more words and concepts than we know. But presumably it could generate at least a rough (“explain-like-I’m-5”) approximation that we’d be able to understand. There might well be all sorts of abstractions and “higher-order constructs” that we are basically blind to. And, yes, one is reminded of something like a dog listening to a human conversation about philosophy—and catching only the occasional “sit” or “fetch” word.
\nAs we’ve discussed several times here, if we remove our restriction to “brain-like” operation (and in particular to deriving a small stream of decisions from large amounts of sensory input) we’re thrown into the domain of general computation, where computational irreducibility is rampant, and we can’t in general expect to say much about what’s going on. But if we maintain “brain-like operation”, we’re instead in effect navigating through “networks of computational reducibility”, and we can expect to talk about things like concepts, language and towers of abstraction.
\nFrom a foundational point of view, we can imagine any mind as in effect being at a particular place in the ruliad. When minds communicate, they are effectively exchanging the rulial analog of particles—robust concepts that are somehow unchanged as they propagate within the ruliad. So what would happen if we had bigger brains? In a sense it’s a surprisingly “mechanical” story: a bigger brain—encompassing more concepts, etc.—in effect just occupies a larger region of rulial space. And the presence of abstraction—perhaps learned from a whole arc of intellectual history—can lead to more expansion in rulial space.
\nAnd in the end it seems that “minds beyond ours” can be characterized by how large the regions of the ruliad they occupy are. (Such minds are, in some very literal rulial sense, more “broad minded”.) So what is the limit of all this? Ultimately, it’s a “mind” that spans the whole ruliad, and in effect incorporates all possible computations. But in some fundamental sense this is not a mind like ours, not least because by “being everything” it “becomes nothing”—and one can no longer identify it as having a coherent “thread of individual existence”.
\nAnd, yes, the overall thrust of what we’ve been saying applies just as well to “AI minds” as to biological ones. If we remove restrictions like being set up to generate the next token, we’ll be left with a neural net that’s just “doing computation”, with no obvious “mind-like purpose” in sight. But if we make neural nets do typical “brain-like” tasks, then we can expect that they too will find and navigate pockets of reducibility. We may well not recognize what they’re doing. But insofar as we can, then inevitably we’ll mostly be sampling the parts of “minds beyond ours” that are aligned with “minds like ours”. And it’ll take progress in our whole human intellectual edifice to be able to fully appreciate what it is that minds beyond ours can do.
\nThanks for recent discussions about topics covered here go in particular to Richard Assar, Joscha Bach, Kovas Boguta, Thomas Dullien, Dugan Hammock, Christopher Lord, Fred Meinberg, Nora Popescu, Philip Rosedale, Terry Sejnowski, Hikari Sorensen, and James Wiles.
\n", + "category": "Artificial Intelligence", + "link": "https://writings.stephenwolfram.com/2025/05/what-if-we-had-bigger-brains-imagining-minds-beyond-ours/", "creator": "Stephen Wolfram", - "pubDate": "Tue, 09 Jan 2024 22:33:01 +0000", - "enclosure": "https://content.wolfram.com/sites/43/2024/01/stream-plot-small.mp4", - "enclosureType": "video/mp4", - "image": "https://content.wolfram.com/sites/43/2024/01/stream-plot-small.mp4", + "pubDate": "Wed, 21 May 2025 14:28:31 +0000", + "enclosure": "", + "enclosureType": "", + "image": "", "id": "", "language": "en", "folder": "", @@ -84,7 +84,249 @@ "favorite": false, "created": false, "tags": [], - "hash": "8e9ed31ddb65ef517482505f1b29daef", + "hash": "2841357beeb72f8b939e88b179422b99", + "highlights": [] + }, + { + "title": "What Can We Learn about Engineering and Innovation from Half a Century of the Game of Life Cellular Automaton?", + "description": "Things are invented. Things are discovered. And somehow there’s an arc of progress that’s formed. But are there what amount to “laws of innovation” that govern that arc of progress?
\nThere are some exponential and other laws that purport to at least measure overall quantitative aspects of progress (number of transistors on a chip; number of papers published in a year; etc.). But what about all the disparate innovations that make up the arc of progress? Do we have a systematic way to study those?
\nWe can look at the plans for different kinds of bicycles or rockets or microprocessors. And over the course of years we’ll see the results of successive innovations. But most of the time those innovations won’t stay within one particular domain—say shapes of bicycle frames. Rather they’ll keep on pulling in innovations from other domains—say, new materials or new manufacturing techniques. But if we want to get closer to the study of the pure phenomenon of innovation we need a case where—preferably over a long period of time—everything that happens can be described in a uniform way within a single narrowly defined framework.
\nWell, some time ago I realized that, actually, yes, there is such a case—and I’ve even personally been following it for about half a century. It’s the effort to build “engineering” structures within the Game of Life cellular automaton. They might serve as clocks, wires, logic gates, or things that generate digits of π. But the point is that they’re all just patterns of bits. So when we talk about innovation in this case, we’re talking about the rather pure question of how patterns of bits get invented, or discovered.
\nAs a long-time serious researcher of the science of cellular automata (and of what they generically do), I must say I’ve long been frustrated by how specific, whimsical and “non-scientific” the things people do with the Game of Life have often seemed to me to be. But what I now realize is that all that detail and all that hard work have now created what amounts to a unique dataset of engineering innovation. And my goal here is to do what one can call “metaengineering”—and to study in effect what happened in that process of engineering over the nearly six decades since the Game of Life was invented.
\nWe’ll see in rather pure form many phenomena that are at least anecdotally familiar from our overall experience of progress and innovation. Most of the time, the first step is to identify an objective: some purpose one can describe and wants to achieve. (Much more rarely, one instead observes something that happens, then realizes there’s a way one can meaningfully make use of it.) But starting from an objective, one either takes components one has, and puts human effort into arranging them to “invent” something that will achieve the objective—or in effect (usually at least somewhat systematically, and automatically) one searches to try to “discover” new ways to achieve the objective.
\nAs we explore what’s been done with the Game of Life we’ll see occasional sudden advances—together with much larger amounts of incremental progress. We’ll see towers of technology being built, and we’ll see old, rather simple technology being used to achieve new objectives. But most of all, we’ll see an interplay between what gets discovered by searching possibilities—and what gets invented by explicit human effort.
\nThe Principle of Computational Equivalence implies that there is, in a sense, infinite richness to what a computational system like the Game of Life can ultimately do—and it’s the role of science to explore this richness in all its breadth. But when it comes to engineering and technology the crucial question is what we choose to make the system do—and what paths we follow to get there. Inevitably, some of this is determined by the underlying computational structure of the system. But much of it is a reflection of how we, as humans, do things, and the patterns of choices we make. And that’s what we’ll be able to study—at quite large scale—by looking at the nearly six decades of work on the Game of Life.
\nHow similar are the results of such “purposeful engineering” to the results of “blind” adaptive evolution of the kind that occurs in biology? I recently explored adaptive evolution (as it happens, using cellular automata as a model) and saw that it can routinely deliver what seem like “sequences of new ideas”. But now in the example of the Game of Life we have what we can explicitly identify as “sequences of new ideas”. And so we’re in a position to compare the results of human effort (aided, in many cases, by systematic search) with what we can “automatically” do by the algorithmic process of adaptive evolution.
\nIn the end, we can think of the set of things that we can in principle engineer as being laid out in a kind of “metaengineering space”, much as we can think of mathematical theorems we can prove as being laid out in metamathematical space. In the mathematical case (notwithstanding some of my own work) the vast majority of theorems have historically been found purely by human effort. But, as we’ll see below, in Game-of-Life engineering it’s been a mixture of human effort and fairly automated exploration of metaengineering space. Though, much like in traditional mathematics, we’re still in a sense always only pursuing objectives we’ve already conceptualized. And in this way what we’re doing is very different from what I’ve done for so long in studying the science (or, as I would now say, the ruliology) of what computational systems like cellular automata (of which the Game of Life is an example) do “in the wild”, when they’re unconstrained by objectives we’re trying to achieve with them.
\nHere’s a typical example of what it looks like to run the Game of Life:
\nThere’s a lot of complicated—and hard to understand—stuff going on here. But there are still some recognizable structures—like the “blinkers” that alternate on successive steps
\nand the “gliders” that steadily move across the screen:
\nSeeing these structures might make one think that one should be able to “do engineering” in the Game of Life, setting up patterns that can ultimately do all sorts of things. And indeed our main subject here is the actual development of such engineering over the past nearly six decades since the introduction of the Game of Life.
\nWhat we’ll be concentrating on is essentially the “technology” of the Game of Life: how we take the “raw material” that the Game of Life provides, and make from it “meaningful engineering structures”.
\nBut what about the science of the Game of Life? What can we say about what the Game of Life “naturally does”, independent of “useful” structures we create in it? The vast majority of the effort that’s been put into the Game of Life over the past half century hasn’t been about this. But this type of fundamental question is central to what one asks in what I now call ruliology—a kind of science that I’ve been energetically pursuing since the early 1980s.
\nRuliology looks in general at classes of systems, rather than at the kind of specifics that have typically been explored in the Game of Life. And within ruliology, the Game of Life is in a sense nothing special; it’s just one of many “class 4” 2D cellular automata (in my numbering scheme, it’s the 2-color 9-neighbor cellular automaton with outer totalistic code 224).
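To make "outer totalistic code 224" concrete, here is a minimal Python sketch. It assumes the usual convention for such codes: the new value of a cell is bit number (2 × number of live neighbors + current cell value) of the code. The names `new_value` and `step` are just illustrative, and the set-of-coordinates representation is a convenience for the sketch, not something taken from the text.

```python
from collections import Counter

CODE = 224  # outer totalistic code quoted in the text


def new_value(center, neighbors, code=CODE):
    """Next cell value: bit (2*neighbors + center) of the code (assumed convention)."""
    return (code >> (2 * neighbors + int(center))) & 1


def step(live):
    """Advance a set of live (x, y) cells by one generation on an unbounded grid."""
    counts = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell with no live neighbors uses bit 0 or bit 1 of the code, both 0 for 224,
    # so only cells that appear in `counts` can be alive on the next step.
    return {cell for cell, n in counts.items() if new_value(cell in live, n)}


# Decode the code into the familiar birth/survival neighbor counts.
births = [n for n in range(9) if new_value(0, n)]
survivals = [n for n in range(9) if new_value(1, n)]
print(births, survivals)  # [3] [2, 3]
```

Since 224 = 2^5 + 2^6 + 2^7, the only set bits are 5 (live cell, 2 neighbors), 6 (dead cell, 3 neighbors) and 7 (live cell, 3 neighbors), which is exactly the standard "born with 3, survives with 2 or 3" statement of the rule.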
\nMy own investigations of cellular automata have particularly focused on 1D rather than 2D examples. And I think that’s been crucial to many of the scientific discoveries I’ve made. Because somehow one learns so much more by being able to see at a glance the history of a system, rather than just seeing frames in a video go by. With a class 4 2D rule like the Game of Life, one can begin to approach this by including “trails” of what’s previously happened, and we’ll often use this kind of visualization in what follows:
\nWe can get a more complete view of history by looking at the whole (2+1)-dimensional “spacetime history”—though then we’re confronted with 3D forms that are often somewhat difficult for our human visual system to parse:
\nBut taking a slice through this 3D form we get “silhouette” pictures that turn out to look remarkably similar to what I generated in large quantities starting in the early 1980s across many 1D cellular automata:
\nSuch pictures—with their complex forms—highlight the computational irreducibility that’s close at hand even in the Game of Life. And indeed it’s the presence of such computational irreducibility that ultimately makes possible the richness of engineering that can be done in the Game of Life. But in actually doing that engineering—and in setting up structures and processes that behave in understandable and “technologically useful” ways—we need to keep the computational irreducibility “bottled up”. And in the end, we can think of the path of engineering innovation in the Game of Life as like an effort to navigate through an ocean of computational irreducibility, finding “islands of reducibility” that achieve the purposes we want.
\nMost of the structures of “engineering interest” in the Game of Life are somehow persistent. The simplest are structures that just remain constant, some small examples being:
\nAnd, yes, structures in the Game of Life have been given all sorts of (usually whimsical) names, which I’ll use here. (And, in that vein, structures in the Game of Life that remain constant are normally called “still lifes”.)
\nBeyond structures that just remain constant, there are “oscillators” that produce periodic patterns:
\nWe’ll be discussing oscillators at much greater length below, but here are a few examples (where now we’re including a visualization that shows “trails”):
\nNext in our inventory of classes of structures come “gliders” (or in general “spaceships”): structures that repeat periodically but move when they do so. A classic example is the basic glider, which takes on the same form every 4 steps—after moving 1 cell horizontally and 1 cell vertically:
\nHere are a few small examples of such “spaceship”-style structures:
\nStill lifes, oscillators and spaceships are most of what one sees in the “ash” that survives from typical random initial conditions. And for example the end result (after 1103 steps) from the evolution we saw in the previous section consists of:
\nThe structures we’ve seen so far were all found not long after the Game of Life was invented; indeed, pretty much as soon as it was simulated on a computer. But one feature that they all share is that they don’t systematically grow; they always return to the same number of black cells. And so one of the early surprises (in 1970) was the discovery of a “glider gun” that shoots out a glider every 30 steps forever:
\nSomething that gives a sense of progress that’s been made in Game-of-Life “technology” is that a “more efficient” glider gun—with period 15—was discovered, but only in 2024, 54 years after the previous one:
\nAnother kind of structure that was quickly discovered in the early history of the Game of Life is a “puffer”—a “spaceship” that “leaves debris behind” (in this case every 128 steps):
\nBut given these kinds of “components”, what can one build? Something constructed very early was the “breeder”, which uses streams of gliders to create glider guns, which themselves then generate streams of gliders:
\nThe original pattern covers about a quarter million cells (with 4060 being black). Running it for 1000 steps we see it builds up a triangle containing a quadratically increasing number of gliders:
\nOK, but knowing that it’s in principle possible to “fill a growing region of space”, is there a more efficient way to do it? The surprisingly simple answer, as discovered in 1993, is yes:
\nSo what other kinds of things can be built in the Game of Life? Lots—even from the simple structures we’ve seen so far. For example, here’s a pattern that was constructed to compute the primes
\nemitting a “lightweight spaceship” at step 100 + 120n only if n is prime. It’s a little more obvious how this works when it’s viewed “in spacetime”; in effect it’s running a sieve in which all multiples of all numbers are instantiated as streams of gliders, which knock out spaceships generated at non-prime positions:
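As a plain-software analogy of the sieve just described (not a simulation of the Life pattern itself), the following Python sketch lets every number k act as a "stream" that knocks out its proper multiples, and then reports the steps 100 + 120n at which a spaceship would survive. The function name `emission_steps` and its parameters are hypothetical, introduced only for this illustration.

```python
def emission_steps(limit, base=100, spacing=120):
    """Steps at which a spaceship survives, for candidate n in 2..limit."""
    survives = {n: True for n in range(2, limit + 1)}
    for k in range(2, limit + 1):
        # The "stream" for k knocks out every proper multiple 2k, 3k, ...
        for multiple in range(2 * k, limit + 1, k):
            survives[multiple] = False
    return [base + spacing * n for n in range(2, limit + 1) if survives[n]]


print(emission_steps(12))  # [340, 460, 700, 940, 1420]
```

Note that, as in the description, streams are created for all numbers and not only for primes; that is redundant as an algorithm, but it avoids any need to decide in advance which streams matter.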
\nIf we look at the original pattern here, it’s just made up of a collection of rather simple structures:
\nAnd indeed structures like these have been used to build all sorts of things, including for example Turing machine emulators—and also an emulator for the Game of Life itself, with this 499×499 pattern corresponding to a single emulated Life cell:
\nBoth these last two patterns were constructed in the 1990s—from components that had been known since the early 1970s. And—as we can see—they’re large (and complicated). But do they need to be so large? One of the lessons of the Principle of Computational Equivalence is that in the computational universe there’s almost always a way to “do just as much, but with much less”. And indeed in the Game of Life many, many discoveries along these lines have been made in the past few decades.
\nAs we’ll see, often (but not always) these discoveries built on “new devices” and “new mechanisms” that were identified in the intervening years. A long series of such “devices” and “mechanisms” involved handling “signals” associated with streams of gliders. For example, the “glider pusher” (from 1993) has the somewhat subtle (but useful) effect of “pushing” a glider by one cell when it goes past:
\nAnother example (actually already known in 1971, and based on the period-15 “pentadecathlon” oscillator) is a glider reflector:
\nBut a feature of this glider pusher and glider reflector is that they work only when both the glider and the stationary object are in a particular phase with respect to their periods. And this makes it very tricky to build larger structures out of these that operate correctly (and in many cases it wouldn’t be possible but for the commensurability of the period 30 of the original glider gun, and the period 15 of the glider reflector).
\nCould glider pushing and glider reflection be done more robustly? The answer turns out to be yes. Though it wasn’t until 2020 that the “bandersnatch” was created—a completely static structure that “pushes” gliders independent of their phase:
\nMeanwhile, in 2013 the “snark” had been created—which served as a phase-independent glider reflector:
\nOne theme—to which we’ll return later—is that after certain functionality was first built in the Game of Life, there followed many “optimizations”, achieving that functionality more robustly, with smaller patterns, etc. An important methodology has revolved around so-called “hasslers”, which in effect allow one to “mine” small pieces of computational irreducibility, by providing “harnesses” that “rein in” behavior, typically returning patterns to their original states after they’ve done what one wants them to do.
\nSo, for example, here’s a hassler (found, as it happens just on February 8, 2025!) that “harnesses” the first pattern we looked at above (that didn’t stabilize for 1103 steps) into an oscillator with period 80:
\nAnd based on this (indeed, later that same day) the most-compact-ever “spaceship gun” was constructed:
\nWe’ve talked about some of what it’s been possible to build in the Game of Life over the years. Now I want to talk about how that happened, or, in other words, the “arc of progress” in the Game of Life. And as a first indication of this, we can plot the number of new Life structures that have been identified each year (or, more specifically, the number of structures deemed significant enough to name, and to record in the LifeWiki database or its predecessors):
\nThere’s an immediate impression of several waves of activity. And we can break this down into activity around various common categories of structures:
\nFor oscillators we see fairly continuous activity for five decades, but with rapid acceleration recently. For “spaceships” and “guns” we see a long dry spell from the early 1970s to the 1990s, followed by fairly consistent activity since. And for conduits and reflectors we see almost nothing until sudden peaks of activity, in the mid-1990s and mid-2010s respectively.
\nBut what was actually done to find all these structures? There have basically been two methods: construction and search. Construction is a story of “explicit engineering”—and of using human thought to build up what one wants. Search, on the other hand, is a story of automation—and of taking algorithmically generated (usually large) collections of possible patterns, and testing them to find ones that do what one wants. Particularly in more recent times it’s also become common to interleave these methods, for example using construction to build a framework, and then using search to find specific patterns that implement some feature of that framework.
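To give a feel for what "search" means at the very smallest scale, here is a toy Python sketch that enumerates every pattern in a 3×3 box and keeps those that return to their own shape (possibly shifted) within a few steps. It is far simpler than the searches actually used by the Life community, and the names `classify` and `search`, along with the 8-step cutoff, are assumptions made for the illustration.

```python
from collections import Counter
from itertools import product


def step(live):
    """One Game of Life generation on a set of live (x, y) cells."""
    counts = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in live)}


def normalize(live):
    """Shift a pattern so its bounding box starts at (0, 0); also return the shift."""
    x0 = min(x for x, _ in live)
    y0 = min(y for _, y in live)
    return frozenset((x - x0, y - y0) for x, y in live), (x0, y0)


def classify(live, max_period=8):
    """Return (period, (dx, dy)) if the pattern recurs within max_period steps, else None."""
    if not live:
        return None
    start_shape, start_origin = normalize(live)
    current = set(live)
    for period in range(1, max_period + 1):
        current = step(current)
        if not current:
            return None  # the pattern died out
        shape, origin = normalize(current)
        if shape == start_shape:
            return period, (origin[0] - start_origin[0], origin[1] - start_origin[1])
    return None


def search(size=3, max_period=8):
    """Test every non-empty pattern in a size x size box; keep the ones that recur."""
    cells = list(product(range(size), repeat=2))
    found = {}
    for mask in range(1, 2 ** len(cells)):
        pattern = frozenset(c for i, c in enumerate(cells) if (mask >> i) & 1)
        result = classify(pattern, max_period)
        if result is not None:
            found[pattern] = result
    return found


found = search()
glider = frozenset({(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)})
print(found[glider])  # (4, (1, 1)): same shape every 4 steps, shifted one cell diagonally
```

A result with period 1 and zero displacement is a still life, a longer period with zero displacement is an oscillator, and a nonzero displacement is a spaceship. The point of the sketch is only that nothing here "designs" the glider; it simply turns up when all the candidates are tested.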
\nWhen one uses construction, it’s like “inventing” a structure, and when one uses search, it’s like “discovering” it. So how much of each is being done in practice? Text mining the descriptions of recently recorded structures gives the following result, which suggests that, at least in recent times, search (i.e. “discovery”) has become the dominant methodology for finding new structures:
\nWhen the Game of Life was being invented, it wasn’t long before it was being run on computers—and people were trying to classify the things it could do. Still lifes and simple oscillators showed up immediately. And then—evolving from the (“R pentomino”) initial condition that we used at the beginning here—after 69 steps something unexpected showed up. In between complicated behavior that was hard to describe was a simple free-standing structure that just systematically moved—a “glider”:
Some other moving structures (dubbed “spaceships”) were also observed. But the question arose: could there be a structure that would somehow systematically grow forever? To find it involved a mixture of “discovery” and “invention”. In running from the (“R pentomino”) initial condition lots of things happen. But at step 785 it was noticed that there appeared the following structure:
For a while this structure (dubbed the “queen bee”) behaves in a fairly orderly way—producing two stable “beehive” structures (visible here as vertical columns). But then it “decays” into more complicated behavior:
\nBut could this “discovered” behavior be “stabilized”? The answer was that, yes, if a “queen bee” was combined with two “blocks” it would just repeatedly “shuttle” back and forth:
\nWhat about two “queen bees”? Now whenever these collided there was a side effect: a glider was generated—with the result that the whole structure became a glider gun repeatedly producing gliders forever:
\nThe glider gun was the first major example of a structure in the Game of Life that was found—at least in part—by construction. And within a year of it being found in November 1970, two more guns—with very similar methods of operation—had been found:
\nBut then the well ran dry—and no further gun was found until 1990. Pretty much the same thing happened with spaceships: four were found in 1970, but no more were found until 1989. As we’ll discuss later, it was in a sense a quintessential story of computational irreducibility: there was no way to predict (or “construct”) what spaceships would exist; one just had to do the computation (i.e. search) to find out.
\nIt was, however, easier to have incremental success with oscillators—and (as we’ll see) pretty much every year an oscillator with some new period was found, essentially always by search. Some periods were “long holdouts” (for example the first period-19 oscillator was found only in 2023), once again reflecting the effects of computational irreducibility.
\nGlider guns provided a source of “signals” for Life engineering. But what could one do with these signals? An important idea—that first showed up in the “breeder” in 1971—was “glider synthesis”: the concept that combinations of gliders could produce other structures. So, for example, it was found that three carefully-arranged gliders could generate a period-15 (“pentadecathlon”) oscillator:
\nIt was also soon found that 8 gliders could make the original glider gun (the breeder made glider guns by a slightly more ornate method). And eventually there developed the conjecture that any structure that could be synthesized from gliders would need at most 15 gliders, carefully arranged at positions whose values effectively encoded the object to be constructed.
\nBy the end of the 1970s a group of committed Life enthusiasts remained, but there was something of a feeling that “the low-hanging fruit had been picked”, and it wasn’t clear where to go next. But after a somewhat slow decade, work on the Game of Life picked up substantially towards the end of the 1980s. Perhaps my own work on cellular automata (and particularly the identification of class 4 cellular automata, of which the Game of Life is a 2D example) had something to do with it. And no doubt it also helped that the fairly widespread availability of faster (“workstation class”) computers now made it possible for more people to do large-scale systematic searches. In addition, when the web arrived in the early 1990s it let people much more readily share results, and had the effect of greatly expanding and organizing the community of Life enthusiasts.
\nIn the 1990s—along with more powerful searches that found new spaceships and guns—there was a burst of activity in constructing elaborate “machines” out of existing known structures. The idea was to start from a known type of “machine” (say a Turing machine), then to construct a Life implementation of it. The constructions were made particularly ornate by the need to make the phases of gliders, guns, etc. appropriately correspond. Needless to say, any Life configuration can be thought of as doing some computation. But the “machines” that were constructed were ones whose “purpose” and “functionality” was already well established in general computation, independent of the Game of Life.
\nIf the 1990s saw a push towards “construction” in the Game of Life, the first decade of the 2000s saw a great expansion of search. Increasingly powerful cloud and distributed computing allowed “censuses” to be created of structures emerging from billions, then trillions of initial conditions. Mostly what was emphasized was finding new instances of existing categories of objects, like oscillators and spaceships. There were particular challenges, like (as we’ll discuss below) finding oscillators of any period (finally completely solved in 2023), or finding spaceships with different patterns of motion. Searches did yield what in censuses were usually called “objects with unusual growth”, but mostly these were not viewed as being of “engineering utility”, and so were not extensively studied (even though from the point of view of the “science of the Game of Life” they are, for example, perhaps the most revealing examples of computational irreducibility).
\nAs had happened throughout the history of the Game of Life, some of the most notable new structures were created (sometimes over a long period of time) by a mixture of construction and search. For example, the “stably-reflect-gliders-without-regard-to-phase” snark—finally obtained in 2013—was the result of using parts of the (ultimately unstable) “simple-structures” construction from around 1998
\nand combining them with a hard-to-explain-why-it-works “still life” found by search:
\nAnother example was the “Sir Robin knightship”—a spaceship that moves like a chess knight 2 cells down and 1 across. In 2017 a spaceship search found a structure that in 6 steps has many elements that make a knight move—but then subsequently “falls apart”:
\nBut the next year a carefully orchestrated search was able to “find a tail” that “adds a fix” to this—and successfully produces a final “perfect knightship”:
\nBy the way, the idea that one can take something that “almost works” and find a way to “fix it” is one that’s appeared repeatedly in the engineering history of the Game of Life. At the outset, it’s far from obvious that such a strategy would be viable. But the fact that it is seems to be similar to the story of why both biological evolution and machine learning are viable—which, as I’ve recently discussed, can be viewed as yet another consequence of the phenomenon of computational irreducibility.
\nOne thing that’s happened many times in the history of the Game of Life is that at some point some category of structure—like a conduit—is identified, and named. But then it’s realized that actually there was something that could be seen as an instance of the same category of structure found much earlier, though without the clarity of the later instance, its significance wasn’t recognized. For example, in 1995 the “Herschel conduit” that moves a Herschel from one position to another (here in 64 steps) was discovered (by a search):
But then it was realized that—if looked at correctly—a similar phenomenon had actually already been seen in 1972, in the form of a structure that in effect takes a particular pattern, if it is present, and “moves it” (in 28 steps) to a different position (albeit with a certain amount of “containable” other activity):
Looking at the plots above of the number of new structures found per year we see the largest peak after 2020. And, yes, it seems that during the pandemic people spent more time on the Game of Life—in particular trying to fill in tables of structures of particular types, for example, with each possible period.
\nBut what about the human side of engineering in the Game of Life? The activity brought in people from many different backgrounds. And particularly in earlier years, they often operated quite independently, and with very different methods (some not even using a computer). But if we look at all “recorded structures” we can look at how many structures in total different people contributed, and when they made these contributions:
\nNeedless to say—given that we’re dealing with an almost-60-year span—different people tend to show up as active in different periods. Looking at everyone, there’s a roughly exponential distribution to the number of (named) structures they’ve contributed. (Though note that several of the top contributors shown here found parametrized collections of structures and then recorded many instances.)
\nAs a first example of systematic “innovation history” in the Game of Life let’s talk about oscillators. Here are the periods of oscillators that were found up to 1980:
\nAs of 1980, many periods were missing. But in fact all periods are possible—though it wasn’t until 2023 that they were all filled in:
\nAnd if we plot the number of distinct periods (say below 60) found by a given year, we can get a first sense of the “arc of progress” in “oscillator technology” in the Game of Life:
\nFinding an oscillator of a given period is one thing. But how about the smallest oscillator of that period? We can be fairly certain that not all of these are known, even for periods below 30. But here’s a plot that shows when the progressive “smallest so far” oscillators were found for a given period (red indicates the first instance of a given period; blue the best result to date):
\nAnd here’s the corresponding plot for all periods up to 100:
\nBut what about the actual reduction in size that’s achieved? Here’s a plot for each oscillator period showing the sequence of sizes found—in effect the “arc of engineering optimization” that’s achieved for that period:
\nSo what are the actual patterns associated with these various oscillators? Here are some results (including timelines of when the patterns were found):
\nBut how were these all found? The period-2 “blinker” was very obvious—showing up in evolution from almost any random initial condition. Some other oscillators were also easily found by looking at the evolution of particular, simple initial conditions. For example, a line of 10 black cells after 3 steps gives the period-15 “pentadecathlon”. Similarly, the period-3 “pulsar” emerges from a pair of length-5 blocks after 22 steps:
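The statement about the line of 10 cells can be checked with a small simulation. What follows is a minimal sketch in Python (not the language of the original figures); the names `life_step` and `settle` are illustrative rather than taken from any library. It evolves a pattern on an unbounded grid and reports the period of the cycle it eventually enters.

```python
from collections import Counter

def life_step(cells):
    """Apply one step of the Game of Life to a set of live (x, y) cells."""
    counts = Counter((x + dx, y + dy)
                     for x, y in cells
                     for dx in (-1, 0, 1)
                     for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # A cell is live next step with 3 live neighbors, or with 2 if already live.
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in cells)}

def settle(cells, max_steps=1000):
    """Return (transient, period) once an exact state repeats, else None.

    This only detects still lifes and oscillators; a pattern that emits
    spaceships never repeats exactly and yields None.
    """
    seen = {}
    state = frozenset(cells)
    for t in range(max_steps + 1):
        if state in seen:
            return seen[state], t - seen[state]
        seen[state] = t
        state = frozenset(life_step(state))
    return None

blinker = {(0, 0), (1, 0), (2, 0)}
row_of_10 = {(x, 0) for x in range(10)}

print(settle(blinker)[1])    # period 2
print(settle(row_of_10)[1])  # period 15 (the pentadecathlon)
```

Detecting exact repetition is enough for oscillators because they stay in place; spaceships would need a comparison up to translation.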
\nMany early oscillators were found by iterative experimentation, often starting with stable “still life” configurations, then perturbing them slightly, as in this period-4 case:
\nAnother common strategy for finding oscillators (that we’ll discuss more below) was to take an “unstable” configuration, then to “stabilize” it by putting “robust” still lifes such as the “block” or the “eater” around it—yielding results like:
For periods that can be formed as LCMs of smaller periods one “construction-oriented” strategy has been to take oscillators with appropriate smaller periods, and combine them, as in:
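The arithmetic behind this strategy is simple: if non-interacting oscillators are placed side by side, the combined pattern returns to its starting state only when every component does, which happens at the least common multiple of their periods. A minimal sketch, assuming an illustrative set of base periods (2, 3, 4 and 15, all mentioned above) and hypothetical helper names:

```python
from itertools import combinations
from math import lcm  # available from Python 3.9

def combined_period(periods):
    """Period of a pattern made of non-interacting oscillators."""
    return lcm(*periods)

def reachable_periods(base_periods, limit):
    """All periods up to `limit` obtainable by combining base oscillators."""
    base = sorted(set(base_periods))
    found = set()
    for size in range(1, len(base) + 1):
        for combo in combinations(base, size):
            period = combined_period(combo)
            if period <= limit:
                found.add(period)
    return found

# Illustrative base set only, not a historical record of what was known when.
print(sorted(reachable_periods({2, 3, 4, 15}, 100)))
# [2, 3, 4, 6, 12, 15, 30, 60]
```

Note that this only ever yields periods whose prime-power factors already occur in the base set, which is why prime periods such as 19 cannot be reached this way.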
\nIn general, many different strategies have been used, as indicated for example by the sequence of period-3 oscillators that have been recorded over the years (where “smallest-so-far” cases are highlighted):
\nBy the mid-1990s oscillators of many periods had been found. But there were still holdouts, like period 19 and for example pretty much all periods between 61 and 70 (except, as it happens, 66). At the time, though, all sorts of complicated constructions—say of prime generators—were nevertheless being done. And in 1996 it was figured out that one could in effect always “build a machine” (using only structures that had already been found two decades earlier) that would serve as an oscillator of any (sufficiently large) period (here 67)—effectively by “sending a signal around a loop of appropriate size”:
\nBut by the 2010s, with large numbers of fast computers becoming available, there was again an emphasis on pure random search. A handful of highly efficient programs were developed, that could be run on anyone’s machine. In a typical case, a search might consist of starting, say, from a trillion randomly chosen initial conditions (or “soups”), identifying new structures that emerge, then seeing whether these act, for example, as oscillators. Typically any new discovery was immediately reported in online forums—leading to variations of it being tried, and new follow-on results often being reported within hours or days.
\nMany of the random searches started just from 16×16 regions of randomly chosen cells (or larger regions with symmetries imposed). And in a typical manifestation of computational irreducibility, many surprisingly small and “random-looking” (at least up to symmetries) results were found. So, for example, here’s the sequence of recorded period-16 oscillators with smaller-than-before cases highlighted:
\nUp through the 1990s results were typically found by a mixture of construction and small-scale search. But in 2016, results from large-scale random searches (sometimes symmetrical, sometimes not) started to appear.
\nThe contrast between construction and search could be dramatic, like here for period 57:
\nOne might wonder whether there could actually be a systematic, purely algorithmic way to find, say, possible oscillators of a given period. And indeed for one-dimensional cellular automata (as I noted in 1984), it turns out that there is. Say one considers blocks of cells of width w. Which block can follow which other is determined by a de Bruijn graph, or equivalently, a finite state machine. If one is going to have a pattern with period p, all blocks that appear in it must also be periodic with period p. But such blocks just form a subgraph of the overall de Bruijn graph, or equivalently, form another, smaller, finite state machine. And then all patterns with period p must correspond to paths through this subgraph. But how long are the blocks one has to consider?
\nIn 1D cellular automata, it turns out that there’s an upper bound of 2^(2p). But for 2D cellular automata—like the Game of Life—there is in general no such upper bound, a fact related to the undecidability of the 2D tiling problem. And the result is that there’s no complete, systematic algorithm to find oscillators in a general 2D cellular automaton, or presumably in the Game of Life.
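The one-dimensional construction can be made concrete with a short sketch. It assumes elementary (two-color, nearest-neighbor) rules in the standard rule numbering, and the function names are illustrative. A window of width 2p+1 is kept only if running the rule for p steps returns its center cell; each kept window is an edge between overlapping blocks of width 2p, so the graph has at most 2^(2p) nodes. Closed walks of length n in this graph then correspond to cyclic arrangements of n cells that repeat after p steps (so the temporal period divides p).

```python
from itertools import product

def ca_step(cells, rule):
    """One step of an elementary CA on a finite tuple (result is 2 cells shorter)."""
    return tuple((rule >> (4 * a + 2 * b + c)) & 1
                 for a, b, c in zip(cells, cells[1:], cells[2:]))

def periodic_subgraph(rule, p):
    """Edges of the de Bruijn subgraph consistent with repeating after p steps."""
    edges = {}
    for window in product((0, 1), repeat=2 * p + 1):
        cells = window
        for _ in range(p):
            cells = ca_step(cells, rule)
        if cells == (window[p],):  # center cell returns to its value
            edges.setdefault(window[:-1], []).append(window[1:])
    return edges

def count_cyclic_patterns(edges, n):
    """Count closed walks of length n (cyclic patterns, rotations counted separately)."""
    total = 0
    for start in edges:
        counts = {start: 1}
        for _ in range(n):
            nxt = {}
            for node, c in counts.items():
                for target in edges.get(node, ()):
                    nxt[target] = nxt.get(target, 0) + c
            counts = nxt
        total += counts.get(start, 0)
    return total

# Rule 90 (each cell becomes left XOR right): fixed points on a ring of 3 cells.
print(count_cyclic_patterns(periodic_subgraph(90, 1), 3))  # 4: 000, 011, 101, 110
```

No analogous finite graph exists in general for two dimensions, which is the point of the undecidability remark above.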
\nBut—as was actually already realized in the mid-1990s—it’s still possible to use algorithmic methods to “fill in” pieces of patterns. The idea is to define part of a pattern of a given period, then use this as a constraint on filling in the rest of it, finding “solutions” that satisfy the constraint using SAT-solving techniques. In practice, this approach has more often been used for spaceships than for oscillators (not least because it’s only practical for small periods). But one feature of it is that it can generate fairly large patterns with a given period.
\nYet another method that’s been tried has been to generate oscillators by colliding gliders in many possible ways. But while this is definitely useful if one’s interested in what can be made using gliders, it doesn’t seem to have, for example, allowed people to find much in the way of interesting new oscillators.
\nIn traditional engineering a key strategy is modularity. Rather than trying to build something “all in one go”, the idea is to build a collection of independent subsystems, from which the whole system can then be assembled. But how does this work in the Game of Life? We might imagine that to identify the modular parts of a system, we’d have to know the “process” by which the system was put together, and the “intent” involved. But because in the Game of Life we’re ultimately just dealing with pure patterns of bits we can in effect just as well “come in at the end” and algorithmically figure out what pieces are operating as separate, modular parts.
\nSo how can we do this? Basically what we want to find out is which parts of a pattern “operate independently” at a given step, in the sense that these parts don’t have any overlap in the cells they affect. Given that in the rules for the Game of Life a particular cell can affect any of the 9 cells in its neighborhood, we can say that black cells can only have “overlapping effects” if they are at most √8 cell units apart (i.e. at most 2 cells apart in each direction). So then we can draw a “nearest neighbor graph” that shows which cells are connected in this sense:
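As a rough illustration of this grouping, here is a minimal sketch that splits a set of live cells into connected parts. It assumes two cells are linked exactly when their 3×3 neighborhoods overlap, that is, when they differ by at most 2 in each coordinate; `modular_parts` is an illustrative name.

```python
def modular_parts(cells, reach=2):
    """Split live cells into groups whose neighborhoods overlap.

    Two cells are linked when both coordinate offsets are at most `reach`;
    parts are the connected components of that relation.
    """
    remaining = set(cells)
    parts = []
    while remaining:
        start = remaining.pop()
        part, frontier = {start}, [start]
        while frontier:
            x, y = frontier.pop()
            for dx in range(-reach, reach + 1):
                for dy in range(-reach, reach + 1):
                    neighbor = (x + dx, y + dy)
                    if neighbor in remaining:
                        remaining.remove(neighbor)
                        part.add(neighbor)
                        frontier.append(neighbor)
        parts.append(part)
    return parts

two_blinkers = {(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)}
print(len(modular_parts(two_blinkers)))       # 2
print(len(modular_parts({(0, 0), (2, 2)})))   # 1 (both can affect cell (1, 1))
```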
But what about the whole evolution? We can draw what amounts to a causal graph that shows the causal connections between the “independent modular parts” that exist at each step:
\nAnd given this, we can summarize the “modular structure” of this particular oscillator by the causal graph:
\nUltimately all that matters in the “overall operation” of the oscillator is the partial ordering defined by this graph. Parts that appear “horizontally separated” (or, more precisely, in antichains, or in physics terminology, spacelike separated) can be generated independently and in parallel. But parts that follow each other in the partial order need to be generated in that order (i.e. in physics terms, they are timelike separated).
\nAs another example, let’s look at graphs for the various oscillators of period 16 that we showed above:
\nWhat we see is that the early period-16 oscillators were quite modular, and had many parts that in effect operated independently. But the later, smaller ones were not so modular. And indeed the last one shown here had no parts that could operate independently; the whole pattern had to be taken together at each step.
\nAnd indeed, what we’ll often see is that the more optimized a structure is, the less modular it tends to be. If we’re going to construct something “by hand” we usually need to assemble it in parts, because that’s what allows us to “understand what we’re doing”. But if, for example, we just find a structure in a search, there’s no reason for it to be “understandable”, and there’s no reason for it to be particularly modular.
\nDifferent steps in a given oscillator can involve different numbers of modular parts. But as a simple way to assess the “modularity” of an oscillator, we can just ask for the average number of parts over the course of one period. So as an example, here are the results for period-30 oscillators:
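This averaging can be sketched in a few lines, assuming the period is already known and that parts are defined by overlapping 3×3 neighborhoods (cells linked when they differ by at most 2 in each coordinate). The names `count_parts` and `modularity_index` are illustrative.

```python
from collections import Counter

def life_step(cells):
    """Apply one step of the Game of Life to a set of live (x, y) cells."""
    counts = Counter((x + dx, y + dy) for x, y in cells
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in cells)}

def count_parts(cells):
    """Number of groups of live cells whose 3x3 neighborhoods overlap."""
    remaining, parts = set(cells), 0
    while remaining:
        frontier = [remaining.pop()]
        parts += 1
        while frontier:
            x, y = frontier.pop()
            near = {(x + dx, y + dy) for dx in range(-2, 3) for dy in range(-2, 3)}
            linked = near & remaining
            remaining -= linked
            frontier.extend(linked)
    return parts

def modularity_index(cells, period):
    """Average number of independent parts over one full period."""
    state, total = set(cells), 0
    for _ in range(period):
        total += count_parts(state)
        state = life_step(state)
    return total / period

# Two blinkers separated by a gap of one empty column: linked while
# horizontal (1 part), separate while vertical (2 parts).
close_pair = {(0, 0), (1, 0), (2, 0), (4, 0), (5, 0), (6, 0)}
print(modularity_index(close_pair, 2))  # 1.5
```

The example shows why an average is used: the same oscillator can count as one part in some phases and as several in others.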
\nLater, we’ll discuss how we can use the level of modularity to assess whether a pattern is likely to have been found by a search or by construction. But for now, this shows how the modularity index has varied over the years for the best known progressively smaller oscillators of a given period—with the main conclusion being that as the oscillators get optimized for size, so also their modularity index tends to decrease:
\nOscillators are structures that cycle but do not move. “Gliders” and, more generally, “spaceships” are structures that move every time they cycle. When the Game of Life was first introduced, four examples of these (all of period 4) were found almost immediately (the last one being the result of trying to extend the one before it):
\nWithin a couple of years, experimentation had revealed two variants, with periods 12 and 20 respectively, involving additional structures:
\nBut after that, for nearly two decades, no more spaceships were found. In 1989, however, a systematic method for searching was invented, and in the years since, a steady stream of new spaceships have been found. A variety of different periods have been seen
\nas well as a variety of speeds (and three different angles):
\nThe forms of these spaceships are quite diverse:
\nSome are “tightly integrated”, while some have many “modular pieces”, as revealed by their causal graphs:
\nPeriod-96 spaceships provide an interesting example of the “arc of progress” in the Game of Life. Back in 1971, a systematic enumeration of small polyominoes was done, looking for one that could “reproduce itself”. While no polyomino on its own seemed to do this, a case was found where part of the pattern produced after 48 steps seemed to reappear repeatedly every 48 steps thereafter:
\nOne might expect this repeated behavior to continue forever. But in a typical manifestation of computational irreducibility, it doesn’t, instead stopping its “regeneration” after 24 cycles, and then reaching a steady state (apart from “radiated” gliders) after 3911 steps:
\nBut from an engineering point of view this kind of complexity was just viewed as a nuisance, and efforts were made to “tame” and avoid it.
\nAdding just one still-life block to the so-called “switch engine”
\nproduces a structure that keeps generating a “periodic wake” forever:
\nBut can this somehow be “refactored” as a “pure spaceship” that doesn’t “leave anything behind”? In 1991 it was discovered that, yes, there was an arrangement of 13 switch engines that could successfully “clean up behind themselves”, to produce a structure that would act as a spaceship with period 96:
\nBut could this be made simpler? It took many years—and tests of many different configurations—but in the end it was found that just 2 switch engines were sufficient:
\nLooking at the final pattern in spacetime gives a definite impression of “narrowly contained complexity”:
\nWhat about the causal graphs? Basically these just decrease in “width” (i.e. number of independent modular parts) as the number of engines decreases:
\nLike many other things in Game-of-Life engineering, both search and construction have been used to find spaceships. As an extreme example of construction let’s talk about the case of spaceships with speed 31/240. In 2013, an analog of the switch engine above was found—which “eats” blocks 31 cells apart every 240 steps:
\nBut could this be turned into a “self-sufficient” spaceship? A year later an almost absurdly large (934852×290482) pattern was constructed that did this—by using streams of gliders and spaceships (together with dynamically assembled glider guns) to create appropriate blocks in front, and remove them behind (along with all the “construction equipment” that was used):
\n\nBy 2016, a pattern with about 700× less area had been constructed. And now, just a few weeks ago, a pattern with 1300× less area (11974×45755) was constructed:
\nAnd while this is still huge, it’s still made of modular pieces that operate in an “understandable” way. No doubt there’s a much smaller pattern that operates as a spaceship of the same speed, but—computational irreducibility being what it is—we have no idea how large the pattern might be, or how we might efficiently search for it.
\nWhat can one engineer in the Game of Life? A crucial moment in the development of Game-of-Life engineering was the discovery of the original glider gun in 1970. And what was particularly important about the glider gun is that it was a first example of something that could be thought of as a “signal generator”—that one could imagine would allow one to implement electrical-engineering-style “devices” in the Game of Life.
\nThe original glider gun produces gliders every 30 steps, in a sense defining a “clock speed” of 1/30 for any “circuit” driven by it. Within a year after the original glider gun, two other “slower” glider guns had also been discovered
\nboth working on similar principles, as suggested by their causal graphs:
\nIt wasn’t until 1990 that any additional “guns” were found. And in the years since, a sequence of guns have been found, with a rather wide range of distinct periods:
\nSome of the guns found have very long periods:
\nBut as part of the effort to do constructions in the 1990s a gun was constructed that had overall period 210, but which interwove multiple glider streams to ultimately produce gliders every 14 steps (which is the maximum rate possible, while avoiding interference of successive gliders):
\nOver the years, a whole variety of different glider guns have been found. Some are in effect “thoroughly controlled” constructions. Others are more based on some complex process that is reined in to the point where it just produces a stream of gliders and nothing more:
\nAn example of a somewhat surprising glider gun—with the shortest “true period” known—was found in 2024:
\nThe causal graph for this glider gun shows a mixture of irreducible “search-found” parts, together with a collection of “well-known” small modular parts:
\nBy the way, in 2013 it was actually found possible to extend the construction for oscillators of any period to a construction for guns of any period (or at least any period above 78):
\nIn addition to having streams of gliders, it’s also sometimes been found useful to have streams of other “spaceships”. Very early on, it was already known that one could create small spaceships by colliding gliders:
\nBut by the mid-1990s it had been found that direct “spaceship guns” could also be made—and over the years smaller and smaller “optimized” versions have been found:
\nThe last of these—from just last month—has a surprisingly simple structure, being built from components that were already known 30 years ago, and having a causal graph that shows very modular construction:
\nWe’ve talked about some of the history of how specific patterns in the Game of Life were found. But what about the overall “flow of engineering progress”? And, in particular, when something new is found, how much does it build on what has been found before? In real-world engineering, things like patent citations potentially give one an indication of this. But in the Game of Life one can approach the question much more systematically and directly, just asking what configurations of bits from older patterns are used in newer ones.
\nAs we discussed above, given a pattern such as
\nwe can pick out its “modular parts”, here rotated to canonical orientations:
\nThen we can see if these parts correspond to (any phase of) previously known patterns, which in this case they all do:
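One way to make this matching concrete is to reduce every part to a canonical form that ignores position, rotation and reflection, and then compare it against every phase of each known pattern. The sketch below is an assumption about how such a lookup could work, not a description of the actual analysis; the names and the tiny `known` table are illustrative.

```python
from collections import Counter

def life_step(cells):
    """Apply one step of the Game of Life to a set of live (x, y) cells."""
    counts = Counter((x + dx, y + dy) for x, y in cells
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in cells)}

def normalize(points):
    """Translate so the bounding box starts at (0, 0); return a sorted tuple."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    return tuple(sorted((x - min_x, y - min_y) for x, y in points))

def canonical_form(cells):
    """Smallest normalized variant over the 8 rotations and reflections."""
    points = list(cells)
    variants = []
    for _ in range(4):
        points = [(-y, x) for x, y in points]                    # rotate 90 degrees
        variants.append(normalize(points))
        variants.append(normalize([(-x, y) for x, y in points]))  # mirror image
    return min(variants)

def identify(part, known):
    """Name of the known pattern with some phase matching `part`, else None."""
    target = canonical_form(part)
    for name, (cells, period) in known.items():
        state = set(cells)
        for _ in range(period):
            if canonical_form(state) == target:
                return name
            state = life_step(state)
    return None

block = {(0, 0), (1, 0), (0, 1), (1, 1)}
toad = {(1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1)}
known = {"block": (block, 1), "toad": (toad, 2)}  # illustrative table only

print(identify({(7, 3), (8, 3), (7, 4), (8, 4)}, known))  # block
```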
\nSo now for all structures in the database we can ask what parts they involve. Here’s a plot of the overall frequencies of these parts:
\nIt’s notable that the highest-ranked part is a so-called “eater” that’s often used in constructions, but occurs only quite infrequently in evolution from random initial conditions. It’s also notable that (for no particularly obvious reason) the frequency of the nth most common structure is roughly 1/n.
\nSo when were the various structures that appear here first found? As this picture shows, most—but not all—were found very early in the history of the Game of Life:
\nIn other words, most of the parts used in structures from any time in the history of the Game of Life come from very early in its history. Or, in effect, structures typically go “back to basics” in the parts they use.
\nHere’s a more detailed picture, showing the relative amount of use of each part in structures from each year:
\nThere are definite “fashions” to be seen here, with some structures “coming into fashion” for a while (sometimes, but not always, right after they were first found), and then dropping out.
\nOne might perhaps imagine that smaller parts (i.e. ones with smaller areas) would be more popular than larger ones. But plotting areas of parts against their rank, we see that there are some large parts that are quite common, and some small ones that are rare:
\nWe’ve seen that many of the most popular parts overall are ones that were found early in the history of the Game of Life. But plenty of distinct modular parts were also found much later. This shows the number of distinct new modular parts found across all patterns in successive years:
\nNormalizing by the number of new patterns found each year, we see a general gradual increase in the relative number of new modular parts, presumably reflecting the greater use of search in finding patterns, or components of patterns:
\nBut how important have these later-found modular parts been? This shows the total rate at which modular parts found in a given year were subsequently used—and what we see, once again, is that parts found early are overwhelmingly the ones that are subsequently used:
\nA somewhat complementary way to look at this is to ask of all patterns found in a given year, how many are “purely de novo”, in the sense that they use no previously found modular parts (as indicated in red), and how many use previously found parts:
\nA cumulative version of this makes it clear that in early years most patterns are purely de novo, but later on, there’s an increasing amount of “reuse” of previously found parts—or, in other words, in later years the “engineering history” is increasingly important:
\nIt should be said, however, that if one wants the full story of “what’s being used” it’s a bit more nuanced. Because here we’re always treating each modular part of each pattern as a separate entity, so that we consider any given pattern to “depend” only on base modular parts. But “really” it could depend on another whole structure, itself built of many modular parts. And in what we’re doing here, we’re not tracking that hierarchy of dependencies. Were we to do so, we would likely be able to see more complex “technology stacks” in the Game of Life. But instead we’re always “going down to the primitives”. (If we were dealing with electronics it’d be like asking “What are the transistors and capacitors that are being used?”, rather than “What is the caching architecture, or how is the floating point unit set up?”)
\nOK, but in terms of “base modular parts” a simple question to ask is how many get used in each pattern. This shows the number of (base) modular parts in patterns found in each year:
\nThere are always a certain number of patterns that just consist of a single modular part—and, as we saw above, that was more common earlier in the history of the Game of Life. But now we also see that there have been an increasing number of patterns that use many modular parts—typically reflecting a higher degree of “construction” (rather than search) going on.
\nBy the way, for comparison, these plots show the total areas and the numbers of (black) cells in patterns found in each year; both show increases early on, but more or less level off by the 1990s:
\nBut, OK, if we look across all patterns in the database, how many parts do they end up using? Here’s the overall distribution:
\nAt least for a certain range of numbers of parts, this falls roughly exponentially, reflecting the idea that it’s been exponentially less likely for people to come up with (or find) patterns that have progressively larger numbers of distinct modular parts.
\nHow has this changed over time? This shows a cumulative plot of the relative frequencies with which different numbers of modular parts appear in patterns up to a given year
\nindicating that over time the distribution of the number of modular parts has gotten progressively broader—or, in other words, as we’ve seen in other ways above, more patterns make use of larger numbers of modular parts.
\nWe’ve been looking at all the patterns that have been found. But we can also ask, say, just about oscillators. And then we can ask, for example, which oscillators (with which periods) contain which others, as in:
\nAnd looking at all known oscillators we can see how common different “oscillator primitives” are in building up other oscillators:
\nWe can also ask in which year “oscillator primitives” at different ranks were found. Unlike in the case of all structures above, we now see that some oscillator primitives that were found only quite recently appear at fairly high ranks—reflecting the fact that in this case, once a primitive has been found, it’s often immediately useful in making oscillators that have multiples of its period:
\nWe can think of almost everything we’ve talked about so far as being aimed at creating structures (like “clocks” and “wires”) that are recognizably useful for building traditional “machine-like” engineering systems. But a different possible objective is to find patterns that have some feature we can recognize, whether with obvious immediate “utility” or not. And as one example of this we can think about finding so-called “die hard” patterns that live as long as possible before dying out.
\nThe phenomenon of computational irreducibility tells us that even given a particular pattern we can’t in general “know in advance” how long it’s going to take to die out (or if it ultimately dies out at all). So it’s inevitable that the problem of finding ultimate die-hard patterns can be unboundedly difficult, just like analogous problems for other computational systems (such as finding so-called “busy beavers” in Turing machines).
\nBut in practice one can use both search and construction techniques to find patterns that at least live a long time (even if not the very longest possible time). And as an example, here’s a very simple pattern (found by search) that lives for 132 steps before dying out (the “puff” at the end on the left is a reflection of how we’re showing “trails”; all the actual cells are zero at that point):
\nSearching nearly 10^16 randomly chosen 16×16 patterns (out of a total of ≈ 10^77 possible such patterns), the longest lifetime found is 1413 steps—achieved with a rather random-looking initial pattern:
\nBut is this the best one can do? Well, no. Just consider a block and a spaceship n cells apart. It’ll take 2n steps for them to collide, and if the phases are right, annihilate each other:
\nSo by picking the separation n to be large enough, we can make this configuration “live as long as we want”. But what if we limit the size of the initial pattern, say to 32×32? In 2022 the following pattern was constructed:
\nAnd this pattern is carefully set up so that after 30,274 steps, everything lines up and it dies out, as we can see in the (vertically foreshortened) spacetime diagram on the left:
\nAnd, yes, the construction here clearly goes much further than search was able to reach. But can we go yet further? In 2023 a 116×86 pattern was constructed
\nthat it was proved eventually dies out, but only after the absurdly large number of 17↑↑↑3 steps (probably even much larger than the number of emes in the ruliad), as given by:
\nor
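For orientation, the number 17↑↑↑3 can be unpacked using the standard definition of Knuth’s up-arrow notation (this is the general definition, not a reconstruction of the original figures). One arrow is exponentiation, two arrows build a power tower, and three arrows iterate the tower construction:

```latex
\[
a \uparrow b = a^{b}, \qquad
a \uparrow\uparrow b = \underbrace{a^{a^{\cdot^{\cdot^{\cdot^{a}}}}}}_{b \text{ copies of } a}, \qquad
a \uparrow\uparrow\uparrow b = \underbrace{a \uparrow\uparrow \bigl(a \uparrow\uparrow (\cdots \uparrow\uparrow a)\bigr)}_{b \text{ copies of } a}
\]
\[
17 \uparrow\uparrow\uparrow 3 \;=\; 17 \uparrow\uparrow \bigl(17 \uparrow\uparrow 17\bigr)
\]
```

In words: first form a power tower of seventeen 17s, then use that result as the height of a second power tower of 17s.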
\nThere are some definite rough ways in which technology development parallels biological evolution. Both involve the concept of trying out possibilities and building on ones that work. But technology development has always ultimately been driven by human effort, whereas biological evolution is, in effect, a “blind” process, based on the natural selection of random mutations. So what happens if we try to apply something like biological evolution to the Game of Life? As an example, let’s look at adaptive evolution that’s trying to maximize finite lifetime based on making a sequence of random point mutations within an initially random 16×16 pattern. Most of those mutations don’t give patterns with larger (finite) lifetimes, but occasionally there’s a “breakthrough” and the lifetime achieved so far jumps up:
\nThe actual behaviors corresponding to the breakthroughs in this case are:
\nAnd here are some other outcomes from adaptive evolution:
\nIn almost all cases, a limited number of steps of adaptive evolution do succeed in generating patterns with fairly long finite lifetimes. But the behavior we see typically shows no “readily understandable mechanisms”—and no obviously separable modular parts. And instead—just like in my recent studies of both biological evolution and machine learning—what we get are basically “lumps of irreducible computation” that “just happen” to show what we’re looking for (here, long lifetime).
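The adaptive evolution procedure described above can be sketched as a simple hill climb. This is a minimal illustration under stated assumptions, not the procedure actually used for the figures: patterns that have not died out within `max_steps` are given fitness 0 (treated as not having a finite lifetime), mutations that leave fitness unchanged are accepted, and all names and parameter values are illustrative.

```python
import random
from collections import Counter

def life_step(cells):
    """Apply one step of the Game of Life to a set of live (x, y) cells."""
    counts = Counter((x + dx, y + dy) for x, y in cells
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in cells)}

def lifetime(cells, max_steps):
    """Steps until no live cells remain, or None if still alive at the cap."""
    state = set(cells)
    for t in range(max_steps + 1):
        if not state:
            return t
        state = life_step(state)
    return None

def adaptive_evolution(size, mutations, max_steps, seed):
    """Hill-climb on finite lifetime using single-cell flips."""
    rng = random.Random(seed)
    pattern = {(x, y) for x in range(size) for y in range(size)
               if rng.random() < 0.5}

    def fitness(p):
        return lifetime(p, max_steps) or 0  # non-dying patterns score 0

    best = fitness(pattern)
    history = [best]
    for _ in range(mutations):
        cell = (rng.randrange(size), rng.randrange(size))
        candidate = pattern ^ {cell}        # flip one cell
        score = fitness(candidate)
        if score >= best:                   # accept improvements and neutral moves
            pattern, best = candidate, score
        history.append(best)
    return pattern, history

pattern, history = adaptive_evolution(size=6, mutations=30, max_steps=100, seed=1)
print(history[0], history[-1])  # lifetime before and after; never decreases
```

Jumps in `history` correspond to the “breakthroughs” mentioned above; nothing in the procedure favors modular or understandable patterns.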
\nLet’s say we’re presented with an array of cells that’s an initial condition for the Game of Life. Can we tell “where it came from”? Is it “just arbitrary” (or “random”)? Or was it “set up for a purpose”? And if it was “set up for a purpose”, was it “invented” (and “constructed”) for that purpose, or was it just “discovered” (say by a search) to fulfill that purpose?
\nWhether one’s dealing with archaeology, evolutionary biology, forensic science, the identification of alien intelligence or, for that matter, theology, the question of whether something “was set up for a purpose” is a philosophically fraught one. Any behavior one sees one can potentially explain either in terms of the mechanism that produces it, or in terms of what it “achieves”. Things get a little clearer if we have a particular language for describing both mechanisms and purposes. Then we can ask questions like: “Is the behavior we care about more succinctly described in terms of its mechanism or its purpose?” So, for example, “It behaves as a period-15 glider gun” might be an adequate purpose-oriented description, that’s much shorter than a mechanism-oriented description in terms of arrangements of cells.
\nBut what is the appropriate “lexicon of purposes” for the Game of Life? In effect, that’s a core question for Game-of-Life engineering. Because what engineering—and technology in general—is ultimately about is taking whatever raw material is available (whether from the physical world, or from the Game of Life) and somehow fashioning it into something that aligns with human purposes. But then we’re back to what counts as a valid human purpose. How deeply does the purpose have to connect in to everything we do? Is it, for example, enough for something to “look nice”, or is that not “utilitarian enough”? There aren’t absolute answers to these questions. And indeed the answers can change over time, as new uses for things are discovered (or invented).
\nBut for the Game of Life we can start with some of the “purposes” we’ve discussed here—like “be an oscillator of a certain period”, “reflect gliders”, “generate the primes” or even just “die after as long as possible”. Let’s say we just start enumerating possible initial patterns, either randomly, or exhaustively. How often will we come across patterns that “achieve one of these purposes”? And will it “only achieve that purpose” or will it also “do extra stuff” that “seems irrelevant”?
\nAs an example, consider enumerating all possible 3×3 patterns of cells. There are altogether
Other patterns can take a while to “become period 2”, but then at least give “pure period-2 objects”. And for example this one can be interpreted as being the smallest precursor, and taking the least time, to reach the period-2 object it produces:
\nThere are other cases that “get to the same place” but seem to “wander around” doing so, and therefore don’t seem as convincing as having been “created for the purpose of making a period-2 oscillator”:
\nThen there are much more egregious cases. Like
\nwhich after 173 steps gives
\nbut only after going through all sorts of complicated intermediate behavior
\nthat definitely doesn’t make it look like it’s going “straight to its purpose” (unless perhaps its purpose is to produce that final pattern from the smallest initial precursor, etc.).
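The kind of enumeration discussed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used to produce the results described here: the function names (`step`, `settle`, `all_3x3_patterns`, `tally_periods`) and the step cap are assumptions introduced for the example. It runs each of the 2^9 = 512 possible 3×3 patterns on an unbounded grid and records how long it takes before an exact configuration repeats, and with what period.

```python
from collections import Counter
from itertools import product


def step(cells):
    """Advance a set of live (row, col) cells by one Game of Life generation."""
    # Count, for every cell adjacent to a live cell, how many live neighbors it has.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in cells
        for dr, dc in product((-1, 0, 1), repeat=2)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return frozenset(
        cell for cell, n in counts.items()
        if n == 3 or (n == 2 and cell in cells)
    )


def settle(cells, max_steps=200):
    """Return (transient, period) once an exact state repeats, else None.

    Patterns that die out show up as period 1 (the empty state repeats).
    Patterns that emit gliders never repeat exactly, so they return None.
    """
    state = frozenset(cells)
    seen = {}
    for t in range(max_steps + 1):
        if state in seen:
            return seen[state], t - seen[state]
        seen[state] = t
        state = step(state)
    return None


def all_3x3_patterns():
    """Yield every one of the 2**9 ways to fill a 3x3 block of cells."""
    positions = list(product(range(3), repeat=2))
    for bits in product((0, 1), repeat=9):
        yield frozenset(p for p, b in zip(positions, bits) if b)


def tally_periods(max_steps=200):
    """Count how many 3x3 patterns end up with each period (None = unresolved)."""
    tally = Counter()
    for pattern in all_3x3_patterns():
        result = settle(pattern, max_steps)
        tally[result[1] if result else None] += 1
    return tally
```

Calling `tally_periods()` gives a `Counter` keyed by period, with `None` collecting the patterns that had not exactly repeated within the cap. The transient length returned by `settle` is one crude measure of whether a pattern “goes straight to” its final behavior or “wanders around” first.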
\nBut, OK. Let’s imagine we have a pattern that “goes straight to” some “recognizable purpose” (like being an oscillator of a certain period). The next question is: was that pattern explicitly constructed with an understanding of how it would achieve its purpose, or was it instead “blindly found” by some kind of search?
\nAs an example, let’s look at some period-9 oscillators:
\nOne like
\nseems like it must have been constructed out of “existing parts”, while one like
\nseems like it could only plausibly have been found by a search.
\nSpacetime views don’t tell us much in these particular cases:
\nBut causal graphs are much more revealing:
\nThey show that in the first case there are lots of “factored modular parts”, while in the second case there’s basically just one “irreducible blob” with no obvious separable parts. And we can view this as an immediate signal for “how human” each pattern is. In a sense it’s a reflection of the computational boundedness of our minds. When there are factored modular parts that interact fairly rarely and each behave in a fairly simple way, it’s realistic for us to “get our minds around” what’s going on. But when there’s just an “irreducible blob of activity” we’d have to compute too much and keep too much in mind at once for us to be able to really “understand what’s going on” and for example produce a human-level narrative explanation of it.
\nIf we find a pattern by search, however, we don’t really have to “understand it”; it’s just something we computationally “discover out there in the computational universe” that “happens” to do what we want. And, indeed, as in the example here, it often does what it does in a quite minimal (if incomprehensible) way. Something that’s found by human effort is much less likely to be minimal; in effect it’s at least somewhat “optimized for comprehensibility” rather than for minimality or ease of being found by search. And indeed it will often be far too big (e.g. in terms of number of cells) for any pure exhaustive or random search to plausibly find it—even though the “human-level narrative” for it might be quite short.
\nHere are the causal graphs for all the period-9 oscillators from above:
\nSome we can see can readily be broken down into multiple rarely interacting distinct components; others can’t be decomposed in this kind of way. And in a first approximation, the “decomposable” ones seem to be precisely those that were somehow “constructed by human effort”, while the non-decomposable ones seem to be those that were “discovered by searches”.
\nTypically, the way the “constructions” are done is to start with some collection of known parts, then, by trial and error (sometimes computer assisted) see how these can be fit together to get something that does what one wants. Searches, on the other hand, typically operate on “raw” configurations of cells, blindly going through a large number of possible configurations, at every stage automatically testing whether one’s got something that does what one wants.
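As a rough illustration of the “blind search” strategy just described, here is a minimal Python sketch that generates random raw configurations of cells and automatically tests each one against an objective (here, “settle into an oscillator of a given period”). The function names, the 50% fill density, the grid size and the step cap are all assumptions made for the example, and real Life searches use far more sophisticated programs.

```python
import random
from collections import Counter
from itertools import product


def step(cells):
    """Advance a set of live (row, col) cells by one Game of Life generation."""
    counts = Counter(
        (r + dr, c + dc)
        for r, c in cells
        for dr, dc in product((-1, 0, 1), repeat=2)
        if (dr, dc) != (0, 0)
    )
    return frozenset(
        cell for cell, n in counts.items()
        if n == 3 or (n == 2 and cell in cells)
    )


def settles_to_period(cells, period, max_steps=100):
    """True if the pattern enters a non-empty exact cycle of the given period."""
    state = frozenset(cells)
    seen = {}
    for t in range(max_steps + 1):
        if state in seen:
            # The first repeat gives the minimal period of the cycle.
            return bool(state) and t - seen[state] == period
        seen[state] = t
        state = step(state)
    return False


def random_search(period, size=4, tries=500, seed=0, max_steps=100):
    """Blindly sample size-by-size patterns and keep those that meet the objective."""
    rng = random.Random(seed)
    hits = []
    for _ in range(tries):
        candidate = frozenset(
            (r, c)
            for r in range(size)
            for c in range(size)
            if rng.random() < 0.5
        )
        if settles_to_period(candidate, period, max_steps):
            hits.append(candidate)
    return hits
```

Nothing in `random_search` “understands” why a hit works. The test is applied mechanically to each candidate, which is exactly what distinguishes this approach from assembling an oscillator out of known parts.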
\nAnd in the end these different strategies reveal themselves in the character of the final patterns they produce, and in the causal graphs that represent these patterns and their behavior.
\nIn engineering as it’s traditionally been practiced, the main emphasis tends to be on figuring out plans, and then constructing things based on those plans. Typically one starts from components one has, then tries to figure out how to combine them to incrementally build up what one wants.
\nAnd, as we’ve discussed, this is also a way of developing technology in the Game of Life. But as we’ve discussed at length, it’s not the only way. Another way is just to search for whole pieces of technology one wants.
\nTraditional intuition might make one assume this would be hopeless. But the repeated lesson of my discoveries about simple programs—as well as what’s been done with the Game of Life—is that actually it’s often not hopeless at all, and instead it’s very powerful.
\nYes, what you get is not likely to be readily “understandable”. But it is likely to be minimal and potentially quite optimal for whatever it is that it does. I’ve often talked of this approach as “mining from the computational universe”. And over the course of many years I’ve had success with it in all sorts of disparate areas. And now, here, we’ve seen in the Game of Life a particularly clean example where search is used alongside construction in developing technology.
\nIt’s a feature of things produced by construction that they are “born understandable”. In effect, they are computationally reducible enough that we can “fit them in our finite minds” and “understand them”. But things found by search don’t have this feature. And most of the time the behavior they’ll show will be full of computational irreducibility.
\nIn both biological evolution and machine learning my recent investigations suggest that most of what we’re seeing are “lumps of irreducible computation” found at random that just “happen to achieve the necessary objectives”. This hasn’t been something familiar in traditional engineering, but it’s something tremendously powerful. And from the examples we’ve seen here in the Game of Life it’s clear that it can often achieve things that seem completely inaccessible by traditional methods based on explicit construction.
\nAt first we might assume that irreducible computation is too unruly and unpredictable to be useful in achieving “understandable objectives”. But if we find just the right piece of irreducible computation then it’ll achieve the objective we want, often in a very minimal way. And the point is that the computational universe is in a sense big enough that we’ll usually be able to find that “right piece of irreducible computation”.
\nOne thing we see in Game-of-Life engineering is something that’s in a sense a compromise between irreducible computation and predictable construction. The basic idea is to take something that’s computationally irreducible, and to “put it in a cage” that constrains it to do what one wants. The computational irreducibility is in a sense the “spark” in the system; the cage provides the control we need to harness that spark in a way that meets our objectives.
\nLet’s look at some examples. As our “spark” we’ll use the R pentomino that we discussed at the very beginning. On its own, this generates all sorts of complex behavior—that for the most part doesn’t align with typical objectives we might define (though as a “side show” it does happen to generate gliders). But the idea is to put constraints on the R pentomino to make it “useful”.
Here’s a case where we’ve tried to “build a road” for the R pentomino to go down:
\nAnd looking at this every 18 steps we see that, at least for a while, the R pentomino has indeed moved down the road. But it’s also generated something of an “explosion”, and eventually this explosion catches up, and the R pentomino is destroyed.
\nSo can we maintain enough control to let the R pentomino survive? The answer is yes. And here, for example, is a period-12 oscillator, “powered” by an R pentomino at its center:
\nWithout the R pentomino, the structure we’ve set up cycles with period 6:
\nAnd when we insert the R pentomino this structure “keeps it under control”—so that the only effect it ultimately has is to double the period, to 12.
\nHere’s a more dramatic example. Start with a static configuration of four so-called “eaters”:
\nNow insert two R pentominoes. They’ll start doing their thing, generating what seems like quite random behavior. But the “cage” defined by the “eaters” limits what can happen, and in the end what emerges is an oscillator—that has period 129:
\nWhat else can one “make R pentominoes do”? Well, with appropriate harnesses, they can for example be used to “power” oscillators with many different periods:
\n“Be an oscillator of a certain period” is in a sense a simple objective. But what about more complex objectives? Of course, any pattern of cells in the Game of Life will do something. But the question is whether that something aligns with technological objectives we have.
\nGenerically, things in the Game of Life will behave in computationally irreducible ways. And it’s this very fact that gives such richness to what can be done with the Game of Life. But can the computational irreducibility be controlled—and harnessed for technological purposes? In a sense that is the core challenge of engineering both in the Game of Life and in the real world. (It’s also rather directly the challenge we face in making use of the computational power of AI, while still adequately aligning it with human objectives.)
\nAs we look at the arc of technological development in the Game of Life we see over the course of half a century all sorts of different advances being made. But will there be an end to this? Will we eventually run out of inventions and discoveries? The underlying presence of computational irreducibility makes it clear that we will not. The only thing that might end is the set of objectives we’re trying to meet. We now know how to make oscillators of any period. And unless we insist on for example finding the smallest oscillator of a given period, we can consider the problem of finding oscillators solved, with nothing more to discover.
\nIn the real world nature and the evolution of the universe inevitably confront us with new issues, which lead to new objectives. In the Game of Life—as in any other abstract area, like mathematics—the issue of defining new objectives is up to us. Computational irreducibility leads to infinite diversity and richness of what’s possible. The issue for us is to figure out what direction we want to go. And the story of engineering and technology in the Game of Life gives us, in effect, a simple model for the issues we confront in other areas of technology, like AI.
\nI’m not sure if I made the right decision back in 1981. I had come up with a very simple class of systems and was doing computer experiments on them, and was starting to get some interesting results. And when I mentioned what I was doing to a group of (then young) computer scientists they said “Oh, those things you’re studying are called cellular automata”. Well, actually, the cellular automata they were talking about were 2D systems while mine were 1D. And though that might seem like a technical difference, it has a big effect on one’s impression of what’s going on—because in 1D one can readily see “spacetime histories” that give an immediate sense of the “whole behavior of the system”, while in 2D one basically can’t.
\nI wondered what to call my models. I toyed with the term “polymones”—as a modernized nod to Leibniz’s monads. But in the end I decided that I should stick with a simpler connection to history, and just call my models, like their 2D analogs, “cellular automata”. In many ways I’m happy with that decision. Though one of its downsides has been a certain amount of conceptual confusion—more than anything centered around the Game of Life.
\nPeople often know that the Game of Life is an example of a cellular automaton. And they also know that within the Game of Life lots of structures (like gliders and glider guns) can be set up to do particular things. Meanwhile, they hear about my discoveries about the generation of complexity in cellular automata (like rule 30). And somehow they conflate these things—leading to all too many books etc. that show pictures of simple gliders in the Game of Life and say “Look at all this complexity!”
\nAt some level it’s a confusion between science and engineering. My efforts around cellular automata have centered on empirical science questions like “What does this cellular automaton do if you run it?” But—as I’ve discussed at length above—most of what’s been done with the Game of Life has centered instead on questions of engineering, like “What recognizable (or useful) structures can you build in the system?” It’s a different objective, with different results. And, in particular, by asking to “engineer understandable technology” one’s specifically eschewing the phenomenon of computational irreducibility—and the whole story of the emergence of complexity that’s been so central to my own scientific work on cellular automata and so much else.
\nMany times over the years, people would show me things they’d been able to build in the Game of Life—and I really wouldn’t know what to make of them. Yes, they seemed like impressive hacks. But what was the big picture? Was this just fun, or was there some broader intellectual point? Well, finally, not long ago I realized: this is not a story of science, it’s a story about the arc of engineering, or what one can call “metaengineering”.
\nAnd back in 2018, in connection with the upcoming 50th anniversary of the Game of Life, I decided to see what I could figure out about this. But I wasn’t satisfied with how far I got, and other priorities intervened. So—beyond one small comment that ended up in a 2020 New York Times article—I didn’t write anything about what I’d done. And the project languished. Until now, when somehow my long-time interest in “alien engineering”, combined with my recent results about biological evolution, coalesced into a feeling that it was time to finally figure out what we could learn from all that effort that’s been put into the Game of Life.
\nIn a sense this brings closure to a very long-running story for me. The first time I heard about the Game of Life was in 1973. I was an early teenager then, and I’d just gotten access to a computer. By today’s standards the computer (an Elliott 903C) was a primitive one: the size of a desk, programmed with paper tape, with only 24 kilobytes of memory. I was interested in using it for things like writing a simulator for the physics of idealized gas molecules. But other kids who had access to the computer were instead more interested (much as many kids might be today) in writing games. Someone wrote a “Hunt the Wumpus” game. And someone else wrote a program for the “Game of Life”. The configurations of cells at each generation were printed out on a teleprinter. And for some reason people were particularly taken with the “Cheshire cat” configuration, in which all that was left at the end (as in Alice in Wonderland) was a “smile”. At the time, I absolutely didn’t see the point of any of this. I was interested in science, not games, and the Game of Life pretty much lost me at “Game”.
\nFor a number of years I didn’t have any further contact with the Game of Life. But then I met Bill Gosper, who I later learned had in 1970 discovered the glider gun in the Game of Life. I met Gosper first “online” (yes, even in 1978 that was a thing, at least if you used the MIT-MC computer through the ARPANET)—then in person in 1979. And in 1980 I visited him at Xerox PARC, where he described himself as part of the “entertainment division” and gave me strange math formulas printed on a not-yet-out-of-the-lab color laser printer
\nand also showed me a bitmapped display (complete with GUI) with lots of pixels dancing around that he enthusiastically explained were showing the Game of Life. Knowing what I know now, I would have been excited by what I saw. But at the time, it didn’t really register.
\nStill, in 1981, having started my big investigation of 1D cellular automata, and having made the connection to the 2D case of the Game of Life, I started wondering whether there was something “scientifically useful” that I could glean from all the effort I knew (particularly from Gosper) had been put into Life. It didn’t help that almost none of the output of that effort had been published. And in those days before the web, personal contact was pretty much the only way to get unpublished material. One of my larger “finds” was from a friend of mine from Oxford who passed on “lab notebook pages” he’d got from someone who was enumerating outcomes from different Game-of-Life initial configurations:
\nAnd from material like this, as well as my own simulations, I came up with some tentative “scientific conclusions”, which I summarized in 1982 in a paragraph in my first big paper about cellular automata:
\nBut then, at the beginning of 1983, as part of my continuing effort to do science on cellular automata, I made a discovery. Among all cellular automata there seemed to be four basic classes of behavior, with class 4 being characterized by the presence of localized structures, sometimes just periodic, and sometimes moving:
\nI immediately recognized the analogy to the Game of Life, and to oscillators and gliders there. And indeed this analogy was part of what “tipped me off” to thinking about the ubiquitous computational capabilities of cellular automata, and to the phenomenon of computational irreducibility.
\nMeanwhile, in March 1983, I co-organized what was effectively the first-ever conference on cellular automata (held at Los Alamos)—and one of the people I invited was Gosper. He announced his Hashlife algorithm (which was crucial to future Life research) there, and came bearing gifts: printouts for me of Life, that I annotated, and still have in my archives:
\nI asked Gosper to do some “more scientific” experiments for me—for example starting from a region of randomness, then seeing what happened:
\nBut Gosper really wasn’t interested in what I saw as being science; he wanted to do engineering, and make constructions—like this one he gave me, showing two glider guns exchanging streams of gliders (why would one care, I wondered):
\nI’d mostly studied 1D cellular automata—where I’d discovered a lot by systematically looking at their behavior “laid out in spacetime”. But in early 1984 I resolved to also systematically check out 2D cellular automata. And mostly the resounding conclusion was that their basic behavior was very similar to 1D. Out of all the rules we studied, the Game of Life didn’t particularly stand out. But—mostly to provide a familiar comparison point—I included pictures of it in the paper we wrote:
\nAnd we also went to the trouble of making a 3D “spacetime” picture of the Game of Life on a Cray supercomputer—though it was too small to show anything terribly interesting:
\nIt had been a column in Scientific American in 1970 that had first propelled the Game of Life to public prominence—and that had also launched the first great Life engineering challenge of finding a glider gun. And in both 1984 and 1985 a successor to that very same column ran stories about my 1D cellular automata. And in 1985, in collaboration with Scientific American, I thought it would be fun and interesting to reprise the 1970 glider gun challenge, but now for 1D class 4 cellular automata:
\nMany people participated. And my main conclusion was: yes, it seemed like one could do the same kinds of engineering in typical 1D class 4 cellular automata as one could in the Game of Life. But this was all several years before the web, and the kind of online community that has driven so much Game of Life engineering in modern times wasn’t yet able to form.
\nMeanwhile, by the next year, I was starting the development of Mathematica and what’s now the Wolfram Language, and for a few years didn’t have much time to think about cellular automata. But in 1987 when Gosper got involved in making pre-release demos of Mathematica he once again excitedly told me about his discoveries in the Game of Life, and gave me pictures like:
\nIt was in 1992 that the Game of Life once again appeared in my life. I had recently embarked on what would become the 10-year project of writing my book A New Kind of Science. I was working on one of the rather few “I already have this figured out” sections in the book—and I wanted to compare class 4 behavior in 1D and 2D. How was I to display the Game of Life, especially in a static book? Equipped with what’s now the Wolfram Language it was easy to come up with visualizations—looking “out” into a spacetime slice with more distant cells “in a fog”, as well as “down” into a fog of successive states:
\n\nAnd, yes, it was immediately striking how similar the spacetime slice looked to my pictures of 1D class 4 cellular automata. And when I wrote a note for the end of the book about Life, the correspondence became even more obvious. I’d always seen the glider gun as a movie. But in a spacetime slice it “made much more sense”, and looked incredibly similar to analogous structures in 1D class 4 cellular automata:
\nIn A New Kind of Science I put a lot of effort into historical notes. And as a part of such a note on “History of cellular automata” I had a paragraph about the Game of Life:
\nI first met John Conway in September 1983 (at a conference in the south of France). As I would tell his biographer many years later, my relationship with Conway was complicated from the start. We were both drawn to systems defined by very simple rules, but what we found interesting about them was very different. I wanted to understand the big picture and to explore science-oriented questions (and what I would now call ruliology). Conway, on the other hand, was interested in specific, often whimsically presented results—and in questions that could be couched as mathematical theorems.
\nIn my conversations with Conway, the Game of Life would sometimes come up, but Conway never seemed too interested in talking about it. In 2001, though, when I was writing my note about the history of 2D cellular automata, I spent several hours specifically asking Conway about the Game of Life and its history. At first Conway told me the standard origin story that Life had arisen as a kind of game. A bit later he said he’d at the time just been hired as a logic professor, and had wanted to use Life as a simple way to enumerate the recursive functions. In the end, it was hard to disentangle true recollections from false (or “elaborated”) ones. And, notably, when asked directly about the origin of the specific rules of Life, he was evasive. Of course, none of that should detract from Conway’s achievement in the concept of the Game of Life, and in the definition of the hacker-like culture around it—the fruits of which have now allowed me to do what I’ve done here.
\nFor many years after the publication of A New Kind of Science in 2002, I didn’t actively engage with the Game of Life—though I would hear from Life enthusiasts with some frequency, none more often than Gosper, who sent me hundreds of messages about Life, a typical example from 2017 concerning
\nand saying:
\n\nNovelty is mediated by the sporadic glider gas (which forms very sparse
\nbeams), sporadic debris (forming sparse lines), and is hidden in sporadic
\ndefects in the denser beams and lines. At this scale, each screen pixel
\nrepresents 262144 x 262144 Life cells. Thus very sparse lines, e.g. density
\n10^-5, appear solid, while being very nearly transparent to gliders.
After 3.4G, (sparse) new glider beams are still fading up. The beams
\nrepeatedly strafe the x and y axis stalagmites.
I suspect this will (very) eventually lead to a positive density of
\nswitch-engines, and thus quadratic population growth.
⋮
\nFinally, around 4.2G, an eater1 (fish hook):
\nDepending on background novelty radiation, there ought to be one of
\nthese every few billion, all lying on a line through the origin.
⋮
\nWith much help from Tom R, I slogged to 18G, with *zero* new nonmovers
\nin the 4th quadrant, causing me to propose a mechanism that precluded
\nfuture new ones. But then Andrew Trevorrow fired up his Big Mac (TM),
\nran 60G, and found three new nonmovers! They are, respectively, a mirror
\nimage(!) of the 1st eater, and two blinkers, in phase, but not aligned with
\nthe origin. I.e., all four are "oners'", or at least will lie on different
\ntrash trails.
I’m still waiting for one of these to sprout switch-engines and begin quadratic
\ngrowth. But here’s a puzzle: Doesn’t the gas of sparse gliders (actually glider
\npackets) in the diagonal strips athwart the 1st quadrant already reveal (small
\ncoefficient) quadratic growth? Which will *eventually* dominate? The area of the
\nstrips is increasing quadratically. Their density *appears* to be at least holding,
\nbut possibly along only one axis. I don’t see where quadratically many gliders could
\narise. They’re being manufactured at a (roughly) fixed rate. Imagine the above
\npicture in the distant future. Where is the amplification that will keep those
\nstrips full? ‐‐Bill