Intelligence and Combinatorial Complexity
Ok, now that all the non-math-nerds are gone, we can talk.
I’ve been working for some time now with a team inside Microsoft on various ideas in the direction of more stable, longer-running agents based on LLMs - agents that use memory and other techniques to keep context and perform more complex tasks, or even just maintain relationships with people (which is pretty wild). We have some good results and some mixed ones - stability turns out to be a hard thing to achieve naively, though some really interesting new design capabilities open up when you start treating cognition and memory as explicit parts of the design.
One counterintuitive thing we have found is that you can get “more” intelligence from multiple agents working together, or from some kind of state-machine-like flow, than from a single agent - even when they are all using the same base model. Why aren’t you getting “all” of the intelligence of the base model in any single inference? How can you get more with just a different kind of inference agent - a working memory, a prefrontal-cortex analog, or some other monitor?
But, as so often when working with these systems, thinking about human patterns is at least helpful. In this case, we show the same pattern ourselves: you can write a paragraph or solve a math problem, for example, and then go back through in “proofreader” mode or “checker” mode and find errors and improve your own work. Sure, there is some kind of limit eventually (you’re not going to proofread your way into General Relativity unless you’re Einstein), but it’s not zero - we all get some benefit from going back over our work. So it shouldn’t surprise us too much that this works with something like LLM inference too.
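To make that concrete, here’s a minimal sketch of the draft-then-check loop. The `complete()` function is a stand-in for whatever chat-completion API you’re using, and the prompts are purely illustrative - this is the shape of the pattern, not our actual system.

```python
# Minimal sketch of the "proofreader mode" pattern: the same base model,
# called in different roles, often beats a single pass.

def complete(system: str, user: str) -> str:
    # Stand-in: wire this to your LLM provider of choice.
    raise NotImplementedError

def draft_then_check(task: str, max_rounds: int = 2) -> str:
    answer = complete("You are a careful problem solver.", task)
    for _ in range(max_rounds):
        critique = complete(
            "You are a strict proofreader. List concrete errors, or reply OK.",
            f"Task:\n{task}\n\nDraft answer:\n{answer}",
        )
        if critique.strip() == "OK":
            break  # the checker found nothing left to fix
        answer = complete(
            "Revise the draft to fix every issue the proofreader raised.",
            f"Task:\n{task}\n\nDraft:\n{answer}\n\nIssues:\n{critique}",
        )
    return answer
```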
A guess (and it’s just that) is that there is something like the idea of combinatorial complexity at work here. Intelligence requires some very large number “N” of distinct behaviors and ideas, and you could try to get there by enumerating and hard-coding each one individually.
It’s hard to get to a high “N” count of behaviors and ideas that way - but much easier to get to that high N count combinatorially, with a good set of composable primitives. And language is really good for the compositional part of that equation - it’s easy to combine these pieces in surprising ways, and the result is much less brittle than it might otherwise be, so the system can even “improvise” fairly well.
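The arithmetic behind that intuition is worth seeing once. These numbers are made up purely for illustration:

```python
# Illustrative only: hand-writing rules grows the behavior space linearly,
# while composing a small set of primitives grows it exponentially.

rules = 200              # behaviors you could afford to hand-code, one rule each
primitives = 50          # distinct composable skills or ideas
chain_length = 4         # primitives combined per behavior

print(rules)                       # 200 hand-coded behaviors
print(primitives ** chain_length)  # 6,250,000 distinct compositions
```

Same small vocabulary of pieces, but the reachable “N” count is orders of magnitude higher - and language is the glue that makes those compositions cheap to form.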
Of course, anything at this level of complexity probably comes with its own problems - hallucinations, cult beliefs, lies, etc. It’s possible that that just comes with the territory, that anything above a certain level of complexity will have the same problems, just like it’s hard to weed things like cancer or mutations out of complex biological systems. Maybe it’s a numbers game like entropy is - there are just higher and higher odds you’ll land on a bad square as the complexity or dimensionality goes up.
Base models will continue to get better, and it’s hard to know how much will get solved that way versus by higher-level architectures like what I described above. But that math, and our own inferred neural infrastructure, does point in the direction that there will always be value in adding code, state machines, other perspectives, etc. on top of the base models, no matter how rich they get. It’s time to think about what the fundamentals of that kind of programming and tooling need to be - how we do testing, regression, monitoring, experiments, tracing, and so on.
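As a gesture at what that kind of programming might look like, here’s a minimal sketch of an agent written as an explicit state machine. The states and handler shape are hypothetical, but making the flow explicit is exactly what makes testing, regression, and tracing tractable:

```python
# Sketch: an agent loop as an explicit state machine. Explicit transitions
# give you something to log (tracing), replay (regression), and assert on
# (testing). States and handler signature are hypothetical placeholders.

from enum import Enum, auto

class State(Enum):
    PLAN = auto()
    ACT = auto()
    CHECK = auto()
    DONE = auto()

def run_agent(task: str, handlers: dict, max_steps: int = 20) -> list:
    state, trace = State.PLAN, []
    for _ in range(max_steps):
        if state is State.DONE:
            break
        state_before = state
        state, note = handlers[state](task)  # each handler wraps an LLM call
        trace.append((state_before.name, state.name, note))  # transition log
    return trace  # the trace is what you monitor, diff, and test across runs
```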
AI Engineer | Generative AI expert | Data Scientist @ Kanayma LLC
Strategic Synthesis
The RAG series essentially provides the engineering playbook for building the type of combinatorially complex, composable intelligence systems Schillace describes theoretically. Where he identifies the principles - compositional primitives, multi-agent coordination, iterative refinement - my articles provide the implementation strategies, evaluation methods, and production practices. The series demonstrates that RAG systems are not just information retrieval tools, but foundational architectures for building the kind of adaptable, composable AI systems that can handle real-world combinatorial complexity without requiring exhaustive pre-programming of edge cases. This makes the series highly complementary to Schillace's insights, bridging the gap between theoretical understanding of AI intelligence and practical implementation of complex, production-ready systems.
The Hallucination Problem
Schillace suggests hallucinations may be inevitable in complex systems, "like entropy." The series provides practical approaches to managing this:
- Day 16 (Citation and Attribution) builds transparency to detect and mitigate hallucinations
- Day 19 (Factual Consistency) implements verification mechanisms
- Day 7 (RAG Failure Analysis) systematically approaches debugging these emergent problems

Future Architecture Implications
The Day 30 (Future of RAG) discussion of end-to-end optimization and learned sparse retrieval directly implements Schillace's vision of composable, adaptive intelligence systems. The progression from modular pipelines to end-to-end optimization mirrors his argument about moving beyond hard-coded approaches to truly combinatorial intelligence.
State Machines and Memory
Schillace mentions state-machine-like flows and working memory as key to more sophisticated agents. The series addresses this extensively:
- Day 13 (Temporal RAG) handles time-sensitive information and maintains temporal consistency - essentially implementing working memory for information systems
- Day 21 (Multilingual RAG) manages cross-language state and context
- Day 28 (Agentic RAG) explicitly implements memory and context management for autonomous information seeking

Production Complexity Challenges
Schillace notes that high complexity "probably comes with its own problems" and emphasizes the need for "testing, regression, monitoring, experiments, tracing." My Week 4 directly addresses this:
- Day 24 (RAG Evaluation Frameworks) tackles the testing challenge for complex, emergent behaviors
- Day 25 (A/B Testing) provides experimental frameworks for complex systems
- Day 27 (Observability) implements the tracing and monitoring Schillace calls for
- Day 26 (Security and Privacy) addresses the security vulnerabilities that emerge from system complexity
2dThe "Proofreader Mode" Pattern Schillace's observation about getting better results through iterative refinement directly maps to several RAG techniques: Day 19 (Factual Consistency) implements verification mechanisms that act as "checker modes" for generated content Day 25 (A/B Testing) provides systematic approaches to iteratively improve RAG systems Day 27 (Observability) enables the monitoring and debugging equivalent of "going back over our work"
Combinatorial Complexity Management
Schillace argues that high-complexity behaviors emerge from "composable primitives" rather than hard-coded edge cases. The RAG series provides concrete implementations of this principle:

Compositional Building Blocks:
- Day 8 (Hybrid Search Strategies) combines dense and sparse retrieval primitives to handle exponentially more query types than either approach alone (see the sketch after this list)
- Day 12 (Cross-Modal RAG) demonstrates how combining text, image, and code retrieval primitives enables vastly more complex information synthesis
- Day 29 (GraphRAG) shows how graph-based knowledge representation creates combinatorial reasoning capabilities through relationship traversal

Avoiding Hard-Coded Solutions:
- Day 9 (Query Expansion) uses techniques like HyDE to dynamically adapt to query variations rather than pre-programming responses
- Day 11 (Semantic Caching) creates adaptive caching that learns patterns rather than relying on static rules
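For a flavor of what combining retrieval primitives looks like, here is a minimal hybrid-search sketch using reciprocal rank fusion (RRF). The retrievers are stubs, and RRF is one common fusion choice rather than necessarily the one Day 8 uses:

```python
# Minimal hybrid-search sketch: fuse dense (vector) and sparse (keyword)
# rankings with reciprocal rank fusion. Retriever outputs below are stubs.

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked doc-id lists, rewarding docs that rank
    highly in any individual list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Two retrieval primitives partially disagree; fusion surfaces the docs
# that both consider relevant.
dense_hits = ["doc3", "doc1", "doc7"]   # from an embedding index (stub)
sparse_hits = ["doc1", "doc9", "doc3"]  # from BM25/keyword search (stub)
print(rrf_fuse([dense_hits, sparse_hits]))  # doc1 and doc3 rise to the top
```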