Moments ago, IBM and LBNL presented the next milestone towards fulfilling the vision of the DARPA SyNAPSE program at Supercomputing 2012.
TITLE:
Compass: A scalable simulator for an architecture for Cognitive Computing
AUTHORS:
Robert Preissl
Theodore M. Wong
Pallab Datta
Myron D. Flickner
Raghavendra Singh
Steven K. Esser
Emmett McQuinn*
Rathinakumar Appuswamy*
William P. Risk
Horst D. Simon (LBNL)
Dharmendra S. Modha
* Since submitting the camera-ready copy, these colleagues contributed significantly to visualization and dynamics.
ABSTRACT:
Inspired by the function, power, and volume of the organic brain, IBM is developing TrueNorth, a novel modular, scalable, non-von Neumann, ultra-low power, cognitive computing architecture. TrueNorth consists of a scalable network of neurosynaptic cores, with each core containing neurons, dendrites, synapses, and axons. To set sail for TrueNorth, IBM developed Compass, a multi-threaded, massively parallel functional simulator and a parallel compiler that maps a network of long-distance pathways in the macaque monkey brain to TrueNorth.
IBM and LBNL demonstrated near-perfect weak scaling on a 16 rack IBM Blue Gene/Q (262,144 processor cores, 256 TB memory), achieving an unprecedented scale of 256 million neurosynaptic cores containing 65 billion neurons and 16 trillion synapses running only 388× slower than real time with an average spiking rate of 8.1 Hz. By using emerging PGAS communication primitives, IBM also demonstrated 2× better real-time performance over MPI primitives on a 4 rack Blue Gene/P (16,384 processor cores, 16 TB memory). Here is the PDF of the final paper.
NEW NEWS: Since submitting the camera-ready copy, using 96 Blue Gene/Q racks of the Lawrence Livermore National Lab Sequoia supercomputer (1,572,864 processor cores, 1.5 PB memory, 98,304 MPI processes, and 6,291,456 threads), IBM and LBNL achieved an unprecedented scale of 2.084 billion neurosynaptic cores containing 53×10¹⁰ neurons and 1.37×10¹⁴ synapses running only 1542× slower than real time. Here is the PDF of the IBM Research Report, RJ 10502.
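As a rough sanity check on the two runs quoted above, the short sketch below divides the reported neuron and synapse totals by the reported number of neurosynaptic cores, and converts the slowdown factors into wall-clock time. It uses only figures from the text; the dictionary keys and the function names `per_core` and `wall_clock_seconds` are illustrative choices, not part of Compass.

```python
# Figures quoted in the text; the labels and function names are illustrative only.
runs = {
    "16-rack Blue Gene/Q": {
        "cores": 256e6, "neurons": 65e9, "synapses": 16e12, "slowdown": 388,
    },
    "96-rack Sequoia": {
        "cores": 2.084e9, "neurons": 53e10, "synapses": 1.37e14, "slowdown": 1542,
    },
}


def per_core(run):
    """Return (neurons per core, synapses per core) for one run."""
    return run["neurons"] / run["cores"], run["synapses"] / run["cores"]


def wall_clock_seconds(run, simulated_seconds=1.0):
    """Wall-clock seconds needed to simulate the given span of model time."""
    return run["slowdown"] * simulated_seconds


for name, run in runs.items():
    neurons, synapses = per_core(run)
    minutes = wall_clock_seconds(run) / 60
    print(f"{name}: {neurons:.0f} neurons/core, {synapses:.0f} synapses/core, "
          f"{minutes:.1f} min of wall clock per simulated second")
```

Both runs come out at roughly 250 neurons per core and on the order of 60,000 to 66,000 synapses per core, given the rounding in the quoted totals.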
SIGNIFICANCE:
The ultimate vision of the DARPA SyNAPSE program is to build a cognitive computing architecture with 10¹⁰ neurons and 10¹⁴ synapses. “The vision for the Systems of Neuromorphic Adaptive Plastic Scalable Electronics (SyNAPSE) program is to develop electronic neuromorphic machine technology that scales to biological levels.” For reference, the DARPA SyNAPSE BAA from 2008 is here. This DARPA SyNAPSE metric was probably inspired by the following: Gordon Shepherd, in The Synaptic Organization of the Brain, estimates the number of synapses in the human brain as 0.6×10¹⁴, and Christof Koch, in Biophysics of Computation: Information Processing in Single Neurons, estimates it as 2.4×10¹⁴.
CLARIFICATION:
We have not built a biologically realistic simulation of the complete human brain. Rather, we have simulated a novel modular, scalable, non-von Neumann, ultra-low power, cognitive computing architecture at the scale of the DARPA SyNAPSE metric of 10¹⁴ synapses that, in turn, is inspired by the number of synapses in the human brain. Computation (“neurons”), memory (“synapses”), and communication (“axons”, “dendrites”) are mathematically abstracted away from biological detail towards the engineering goals of maximizing function (utility, applications) and minimizing cost (power, area, delay) and the design complexity of hardware implementation.
PAPER STATUS:
Per SC12 website: “The SC12 Technical Papers program received 472 submissions covering a wide variety of research topics in high performance computing. We followed a rigorous peer review process with a newly introduced author rebuttal period, careful management of conflicts, and four reviews per submission (in most cases). At a two-day face-to-face committee meeting on June 25-26 in Salt Lake City, over 100 technical paper committee members discussed every paper and finalized the selections. At the conclusion of the meeting, SC12 Technical Papers had accepted 100 papers, reflecting an acceptance rate of 21 percent.” Six of the 100 accepted papers, including this one, were selected as finalists for the Best Paper Award.
PERSPECTIVE:
Over the last six years, powered by Blue Gene/L, Blue Gene/P, and Blue Gene/Q, and with support from DARPA and DOE / NNSA / LLNL, the simulations have scaled from 4,096 processor cores and 1 TB of main memory in February 2007, to 8,192 processors and 4 TB of main memory in July 2007, to 32,768 processor cores and 8 TB of main memory in November 2007, to 147,456 processor cores and 144 TB of main memory in November 2009, to 262,144 processor cores and 256 TB of main memory in April 2012, and finally to 1,572,864 processor cores and 1.5 PB of main memory in October 2012.
Previously, we have demonstrated a neurosynaptic core and some of its applications. We have also compiled the largest long-distance wiring diagram in the monkey brain. Now, imagine a network with over 2 billion of these neurosynaptic cores that are divided into 77 brain-inspired regions with probabilistic intra-region (“gray matter”) connectivity and monkey-brain-inspired inter-region (“white matter”) connectivity. The new paper simulates the dynamics of such a network on the Top #2 supercomputer, LLNL’s Sequoia, and drives the dynamics to a self-critical state. This fulfills a core vision of the DARPA SyNAPSE project to bring together nanotechnology, neuroscience, and supercomputing to lay the foundation of a novel cognitive computing architecture that complements today’s von Neumann machines.
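To make the region-based wiring idea concrete, here is a toy sketch of how cores grouped into regions might pick targets: with some probability a core connects within its own region (“gray matter”), otherwise it picks another region according to a weight matrix (“white matter”). This is not the Compass compiler; the function name `build_toy_network`, the probability `p_gray`, and the weight matrix are all hypothetical and chosen only for illustration.

```python
import random


def build_toy_network(num_regions, cores_per_region, p_gray, white_matter, seed=0):
    """Map each core id to one target core id (toy model, hypothetical parameters).

    p_gray       -- assumed probability of an intra-region ("gray matter") target
    white_matter -- assumed num_regions x num_regions matrix of non-negative
                    inter-region ("white matter") weights, zero on the diagonal
    """
    rng = random.Random(seed)
    targets = {}
    for core in range(num_regions * cores_per_region):
        src_region = core // cores_per_region
        if rng.random() < p_gray:
            dst_region = src_region  # stay inside the source region
        else:
            # Choose a destination region in proportion to the row weights.
            dst_region = rng.choices(
                range(num_regions), weights=white_matter[src_region], k=1
            )[0]
        # Pick a core uniformly at random inside the destination region.
        targets[core] = dst_region * cores_per_region + rng.randrange(cores_per_region)
    return targets


# Tiny example: 3 regions of 4 cores each, made-up weights.
toy_weights = [
    [0, 2, 1],
    [2, 0, 1],
    [1, 1, 0],
]
network = build_toy_network(3, 4, p_gray=0.6, white_matter=toy_weights)
for core, target in network.items():
    print(f"core {core} (region {core // 4}) -> core {target} (region {target // 4})")
```

The network described in the text has 77 regions and over 2 billion cores; the same structure applies, but the real inter-region weights come from the monkey-brain wiring diagram rather than a made-up matrix.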
APPLICATIONS OF COMPASS:
The Compass simulator is an all-purpose “swiss-army knife” to pursue novel architectures, algorithms, and applications. Compass is indispensable for (a) verifying TrueNorth correctness via regression testing, (b) studying TrueNorth dynamics, (c) benchmarking inter-core communication topologies, (d) demonstrating applications in vision, audition, real-time motor control, and sensor integration, (e) estimating power consumption, and (f) hypothesis testing, verification, and iteration regarding neural codes and function. We have used Compass to demonstrate numerous applications of the TrueNorth architecture, such as optic flow, attention mechanisms, image and audio classification, multi-modal image-audio classification, character recognition, robotic navigation, and spatio-temporal feature extraction. These applications will be published separately.
THANKS:
IBM would like to thank DARPA, DARPA DSO, SyNAPSE Program Manager: Dr. Gill A. Pratt, and Former SyNAPSE Program Manager: Dr. Todd Hylton. The research reported in this presentation was sponsored by Defense Advanced Research Projects Agency, Defense Sciences Office (DSO), Program: Systems of Neuromorphic Adaptive Plastic Scalable Electronics (SyNAPSE), Issued by DARPA/CMO under Contract No. HR0011-09-C-0002.
IBM and LBNL would like to thank Michel McCoy and Tom Spelce for access to the Sequoia Blue Gene/Q supercomputer at Lawrence Livermore National Laboratory and the DOE NNSA Advanced Simulation and Computing Program for time on Sequoia. Lawrence Livermore National Laboratory is operated by Lawrence Livermore National Security, LLC, for the U.S. Department of Energy, National Nuclear Security Administration under Contract DE-AC52-07NA27344.
The authors are indebted to Fred Mintzer for access to IBM Blue Gene/P and Blue Gene/Q at the IBM T.J. Watson Research Center and to George Fax, Kerry Kaliszewski, Andrew Schram, Faith W. Sell, and Steven M. Westerbeck for access to IBM Rochester Blue Gene/Q, without which this paper would have been impossible.
The authors thank Filipp Akopyan, Rodrigo Alvarez-Icaza, John Arthur, Andrew Cassidy, Daniel Friedman, Subu Iyer, Bryan Jackson, Rajit Manohar, Paul Merolla, and Jun Sawada for their collaboration on the TrueNorth architecture, and our university partners Stefano Fusi, Rajit Manohar, Ashutosh Saxena, and Giulio Tononi as well as their research teams for their feedback on the Compass simulator.
Finally, the authors would like to thank David Peyton for his expert editorial assistance in revising the manuscript.
TYPOS:
Section III, Listing 1: move “threadAggregate( remoteBuf, remoteBufAgg)” immediately below the line “if ( threadID == 0 ) {”.
Section V.C.: “80/20” should be “20/80”.
COMPASS ALGORITHM:
Each MPI process executes the following algorithmic flow with 1 master thread and 64 slave threads.
For the flagship 10¹⁴ run, there were 98,304 MPI processes and 6,291,456 threads.
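The process and thread totals for the flagship run can be related to the 1,572,864 processor cores quoted for Sequoia with a few integer divisions. The sketch below uses only those three figures from this page; the variable names are illustrative.

```python
# Totals quoted for the 96-rack Sequoia run.
processor_cores = 1_572_864
mpi_processes = 98_304
threads = 6_291_456

# divmod returns (quotient, remainder); a zero remainder means the totals divide evenly.
threads_per_process, rem_a = divmod(threads, mpi_processes)
cores_per_process, rem_b = divmod(processor_cores, mpi_processes)
threads_per_core, rem_c = divmod(threads, processor_cores)

print(f"threads per MPI process: {threads_per_process} (remainder {rem_a})")
print(f"processor cores per MPI process: {cores_per_process} (remainder {rem_b})")
print(f"threads per processor core: {threads_per_core} (remainder {rem_c})")
```

The totals divide evenly: 64 threads per MPI process, 16 processor cores per MPI process, and 4 threads per processor core.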