FAQs and Summary: Rat-scale, near Real-time Cortical Simulations

November 14, 2007 By dmodha

Since we are receiving a number of questions about our recent paper on rat-scale cortical simulation, Anatomy of a Cortical Simulator, at the Supercomputing 2007 Conference, I have decided to post a list of FAQs which I hope will serve as a summary of the paper.

Abstract

Insights into the brain’s high-level computational principles will lead to novel cognitive systems, computing architectures, programming paradigms, and numerous practical applications. An important step towards this end is the study of large networks of cortical spiking neurons.

We have built a cortical simulator, C2, incorporating several algorithmic enhancements to optimize simulation scale and time: computationally efficient simulation of neurons in a clock-driven and of synapses in an event-driven fashion; memory-efficient representation of the simulation state; and communication-efficient message exchanges.

Using phenomenological, single-compartment models of spiking neurons and synapses with spike-timing dependent plasticity, we represented a rat-scale cortical model (55 million neurons, 442 billion synapses) in the 8 TB memory of a 32,768-processor BlueGene/L. With 1 millisecond resolution for neuronal dynamics and 1-20 millisecond axonal delays, C2 can simulate 1 second of model time in 9 seconds per hertz of average neuronal firing rate.

In summary, by combining state-of-the-art hardware with innovative algorithms and software design, we simultaneously achieved unprecedented time-to-solution on an unprecedented problem size.

FAQ: How does rat-scale compare to previous results on mouse-scale?

The rat-scale model (55 million neurons, 442 billion synapses) is about 3.5 times bigger than our previous mouse-scale model (16 million neurons, 128 billion synapses) and about eight times bigger than (almost) half-mouse-scale models (8 million neurons, 50 billion synapses).

FAQ: What is the essence of a cortical simulation?

The term “neuron” was coined by Heinrich Wilhelm Gottfried von Waldeyer-Hartz in 1891 to capture the discrete information-processing units of the brain. The junction between two neurons was termed a “synapse” by Sir Charles Sherrington in 1897. Information flows in only one direction through a synapse; thus we speak of a “pre-synaptic” and a “post-synaptic” neuron. Neurons, when activated by sufficient input received via synapses, emit “spikes” that are delivered to those synapses that the neuron is pre-synaptic to. Neurons can be either “excitatory” or “inhibitory.”

The essence of a cortical simulation is to put together neurons connected via an interconnection network (namely, the neuroanatomy), and to understand the information processing capability of such networks. The essence of an efficient cortical simulator, C2, is as follows:

1. For every clock step (say 1 ms):
    a. For every neuron:
        i. Update the state of the neuron.
        ii. If the neuron fires, generate an event for each synapse that
            the neuron is pre-synaptic to and post-synaptic to.
2. For every synapse:
    When it receives a pre- or post-synaptic event,
    update its state and, if necessary, the state of its post-synaptic neuron.

Thus, neurons are simulated in a “clock-driven” fashion whereas synapses are simulated in an “event-driven” fashion.
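
To make the clock-driven/event-driven split concrete, here is a minimal Python sketch. It is purely illustrative and not C2's actual code (C2 is a massively parallel, MPI-based simulator; every constant and name below is invented): neuron state advances on every 1 ms tick, while a synapse is touched only when a spike event reaches it.

import random
from collections import defaultdict

AXONAL_DELAY_MS = 5   # illustrative fixed delay (C2 uses 1-20 ms delays)
N = 1000              # toy network size (C2: 55 million neurons)

class Neuron:
    """Toy single-compartment neuron; C2 uses phenomenological
    spiking-neuron models from the literature."""
    def __init__(self):
        self.v = 0.0                         # membrane state, arbitrary units

    def step(self, drive):
        """Clock-driven update: called once per 1 ms tick for every neuron."""
        self.v += drive - 0.1 * self.v       # toy leaky integration
        if self.v >= 1.0:                    # threshold crossing
            self.v = 0.0
            return True                      # the neuron fired
        return False

neurons = [Neuron() for _ in range(N)]
# Synapses keyed by pre-synaptic neuron: (post-synaptic target, weight).
synapses = {i: [(random.randrange(N), 0.1) for _ in range(100)]
            for i in range(N)}
events = defaultdict(list)    # delivery tick -> [(target, weight), ...]

for t in range(1000):         # 1 second of model time at 1 ms resolution
    # Event-driven side: synapses are touched only when a spike arrives.
    drive = defaultdict(float)
    for target, weight in events.pop(t, []):
        drive[target] += weight
    # Clock-driven side: every neuron is updated every tick.
    for i, nrn in enumerate(neurons):
        background = random.uniform(0.0, 0.25)   # keeps the toy net active
        if nrn.step(drive[i] + background):
            for target, weight in synapses[i]:   # one event per synapse
                events[t + AXONAL_DELAY_MS].append((target, weight))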

FAQ: What were the technical challenges in building C2?

As a first step toward cognitive computation, an interesting question is whether one can simulate a mammalian-scale cortical model in near real-time on an existing computer system. And, if so, what are the memory, computation, and communication costs of achieving such a simulation?

Memory: To achieve near real-time simulation, the state of all neurons and synapses must fit in the random access memory of the system. Since synapses far outnumber neurons, the total available memory divided by the number of bytes per synapse limits the number of synapses that can be modeled. We need to store state for 442 billion synapses and 55 million neurons, the latter being negligible in comparison to the former.
 
Communication: Let us assume that, on average, each neuron fires once a second. Each neuron connects to 8,000 other neurons and, hence, each neuron would generate 8,000 spikes (“messages”) per second. This amounts to a total of roughly 440 billion messages per second.

Computation: Let us assume that, on average, each neuron fires once a second. In this case, on average, each synapse would be activated twice: once when its pre-synaptic neuron fires and once when its post-synaptic neuron fires. This amounts to roughly 880 billion synaptic updates per second. Let us assume that the state of each neuron is updated every millisecond. This amounts to 55 billion neuronal updates per second. Once again, synapses dominate the computational cost.

The key observation is that synapses dominate all three costs!

Let us now take a state-of-the-art supercomputer, a BlueGene/L with 32,768 processors, 256 megabytes of memory per processor (a total of 8 terabytes), and 1.05 gigabytes per second of in/out communication bandwidth per node. To meet the above three constraints, if one can design data structures and algorithms that require no more than 16 bytes of storage per synapse, 175 Flops per synapse per second, and 66 bytes per spike message, then one can hope for a rat-scale, near real-time simulation. Can such a software infrastructure be put together?
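
For readers who want to check the arithmetic, the short sketch below recomputes the costs and budgets from the figures quoted above (figures rounded; the per-synapse budget depends on whether one counts binary or decimal terabytes):

# Back-of-the-envelope budgets for rat-scale, near real-time simulation.
neurons   = 55e6                       # rat-scale model
syn_per_n = 8000                       # synapses per neuron
synapses  = neurons * syn_per_n        # ~4.4e11 synapses
fire_hz   = 1.0                        # assumed average firing rate

memory_bytes = 8 * 2**40               # 8 TB of BlueGene/L memory
print(memory_bytes / synapses)         # ~20 bytes available per synapse;
                                       # a 16-byte design leaves headroom

messages_per_s = neurons * fire_hz * syn_per_n
print(messages_per_s / 1e9)            # ~440 billion spike messages per second

syn_updates_per_s = 2 * synapses * fire_hz   # each synapse touched twice
print(syn_updates_per_s / 1e9)         # ~880 billion synaptic updates per second
neuron_updates_per_s = neurons * 1000  # one update per neuron per millisecond
print(neuron_updates_per_s / 1e9)      # 55 billion neuronal updates per second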

This is exactly the challenge that our paper addresses.

Specifically, we have designed and implemented a massively parallel cortical simulator, C2, that runs on distributed memory multiprocessors and incorporates several algorithmic enhancements: (a) a computationally efficient way to simulate neurons in a clock-driven (“synchronous”) and synapses in an event-driven (“asynchronous”) fashion; (b) a memory-efficient representation to compactly represent the state of the simulation; (c) a communication-efficient way to minimize the number of messages sent by aggregating them in several ways and by mapping message exchanges between processors onto judiciously chosen MPI primitives for synchronization.
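
As a purely hypothetical illustration of the aggregation idea in (c), the sketch below batches all spikes bound for the same destination processor into a single buffer, so each spike is sent once per destination processor rather than once per synapse. The names here (aggregate_spikes, neuron_to_ranks) are invented; C2's real communication layer rides on MPI primitives.

from collections import defaultdict

def aggregate_spikes(fired, neuron_to_ranks):
    """Batch spikes by destination processor: each firing neuron's id is
    placed once in the outbox of every processor that hosts at least one
    of its post-synaptic targets, instead of one message per synapse."""
    outboxes = defaultdict(list)
    for neuron_id in fired:
        for rank in neuron_to_ranks[neuron_id]:
            outboxes[rank].append(neuron_id)
    return outboxes          # one message exchange per destination processor

# Example: neuron 0's targets live on processors 1 and 2; neuron 7's on 2.
print(aggregate_spikes([0, 7], {0: {1, 2}, 7: {2}}))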

Furthermore, the simulator incorporates (a) carefully selected, computationally efficient models of phenomenological spiking neurons from the literature; (b) carefully selected models of spike-timing dependent synaptic plasticity for synaptic updates; (c) axonal delays; (d) 80% excitatory neurons and 20% inhibitory neurons; and (e) a certain random graph of neuronal interconnectivity.
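
The exact plasticity model and constants are specified in the paper; as a generic illustration of the spike-timing dependent plasticity in (b), the sketch below implements the widely used exponential STDP window. All constants are invented and not necessarily those used in C2.

import math

# Generic exponential STDP (illustrative constants, not C2's actual values).
A_PLUS, A_MINUS = 0.01, 0.012    # potentiation / depression amplitudes
TAU_MS = 20.0                    # time constant of the learning window

def stdp_dw(t_pre_ms, t_post_ms):
    """Weight change as a function of pre/post spike timing.
    Pre before post -> potentiation; post before pre -> depression."""
    dt = t_post_ms - t_pre_ms
    if dt > 0:
        return A_PLUS * math.exp(-dt / TAU_MS)
    return -A_MINUS * math.exp(dt / TAU_MS)

print(stdp_dw(10.0, 15.0))   # pre fires 5 ms before post: positive change
print(stdp_dw(15.0, 10.0))   # post fires 5 ms before pre: negative change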

FAQ: What is the significance of the results achieved so far?

On a historical note, in 1956 a team of IBM researchers simulated 512 neurons (N. Rochester, J. H. Holland, L. H. Haibt, and W. L. Duda, “Tests on a Cell Assembly Theory of the Action of the Brain, Using a Large Digital Computer,” IRE Transactions on Information Theory, IT-2, pp. 80-93, September 1956).

Our results represent a judicious intersection between computer science, which defines the region of feasibility in terms of the computing resources available today, and neuroscience, which defines the region of desirability in terms of the biological details that one would like to add. At any given point in time, to achieve a particular scale of simulation at a particular simulation speed, one must balance feasibility against desirability. Thus, our results demonstrate that a non-empty intersection between these two regions exists today at rat-scale, at near real-time, and at a certain complexity of simulation. This intersection will continue to expand over time. As more biological richness is added, correspondingly more resources will be required to accommodate the model in memory and to maintain reasonable simulation times.

The value of the current simulator is in the fact that it permits almost interactive, large-scale simulation, and, hence, allows us to explore a wide space of parameters in trying to uncover (“guess”) the function of the cerebral cortex. Furthermore, understanding and harnessing dynamics of such large-scale networks is a tremendously exciting frontier. We hope that C2 will become the linear accelerator of cognitive computing.

FAQ: Is it a Rat Brain?

No.

The rat cerebral cortex is itself a remarkable wonder of nature: it has a surface area of only 6 square cm and a thickness of roughly 1.5-2 mm, consumes minimal power, and yet hides untold secrets, not to mention a richness of neurons and synapses that certainly dwarfs the relatively simple phenomenological models we can simulate today. Philosophically, any simulation is always an approximation (a kind of “cartoon”) based on certain assumptions. A biophysically realistic simulation is NOT the focus of our work.

Our focus is on simulating only those details that lead us towards insights into the brain’s high-level computational principles. Elucidation of such high-level principles will lead, we hope, to novel cognitive systems, computing architectures, programming paradigms, and numerous practical applications.

So, no, it is not a rat brain, and it most certainly does not sniff cheese yet! But, it is rat-scale, and it does consume a lot of processing cycles and power!!

FAQ: What will it take to achieve human-scale cortical simulations?

Before discussing this question, we must agree upon the complexity of neurons and synapses to be simulated. Let us fix these two as described in our SC07 paper.

The human cortex has about 22 billion neurons, which is roughly a factor of 400 more than our rat-scale model with 55 million neurons. We used a BlueGene/L with 92 TF and 8 TB to carry out rat-scale simulations in near real-time. So, by naïve extrapolation, one would require a machine with a computation capacity of at least 36.8 PF and a memory capacity of 3.2 PB. Furthermore, assuming that there are 8,000 synapses per neuron, that neurons fire at an average rate of 1 Hz, and that each spike message can be communicated in, say, 66 bytes, one would need an aggregate communication bandwidth of roughly 2 PB/s.
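
As a quick check, the naïve 400x extrapolation for compute and memory works out as follows (a back-of-the-envelope sketch in decimal units; the communication estimate additionally depends on message size and aggregation, so it is omitted here):

# Naive 400x extrapolation from rat-scale to human-scale.
human_neurons = 22e9
rat_neurons   = 55e6
scale = human_neurons / rat_neurons       # factor of 400

rat_flops  = 92e12                        # 92 TF used for rat-scale
rat_memory = 8e12                         # 8 TB of memory
print(scale * rat_flops / 1e15, "PF")     # -> 36.8 PF of compute
print(scale * rat_memory / 1e15, "PB")    # -> 3.2 PB of memory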
  
Thus, even at the complexity of synapses and neurons that we have used, scaling cortical simulations to these levels will require tremendous advances along all three metrics: memory, communication, and computation. Furthermore, power consumption and space requirements will become major technological obstacles that must be overcome. Finally, as the complexity of the synapses and neurons is increased many-fold, even more resources will be required. Inevitably, along with advances in hardware, significant further innovation in software infrastructure will be required to use the available hardware resources effectively.

FAQ: What can the brain teach us about new computing architectures?

The cortex is an analog, asynchronous, parallel, biophysical, fault-tolerant, and distributed-memory machine. C2 represents one logical abstraction of the cortex that is suitable for simulation on modern distributed memory multiprocessors. Computation and memory are fully distributed in the cortex, whereas in C2 each processor houses and processes several neurons and synapses. Communication is implemented in the cortex via targeted physical wiring, whereas in C2 it is implemented in software by message passing on top of an underlying general-purpose communication infrastructure. Unlike the cortex, C2 uses discrete simulation time steps and synchronizes all processors at every step. In light of these observations, the search for new (perhaps non-von Neumann) computer architectures that truly mimic the brain remains open. However, we believe that the detailed design of the simulator and the analysis of the results presented in this paper may offer one angle of attack on this quest.

