Archives for 2012
Council of Scientific Society Presidents
The Council of Scientific Society Presidents is an organization of presidents, presidents-elect, and recent past presidents of about sixty scientific federations and societies whose combined membership numbers over 1.4 million scientists and science educators.
On December 8, 2012, at the CSSP annual meeting in Washington, DC, I presented SyNAPSE in the session entitled “Frontiers of 21st Century Science”. The other speakers included Kavli Prize winner Professor Mildred Dresselhaus and HHMI scientist Professor Gregory Hannon.
10^14

Dharmendra S. Modha, Myron D. Flickner, Emmett McQuinn, Steven K. Esser, Robert Preissl, Pallab Datta, Horst D. Simon, Rathinakumar Appuswamy, Theodore M. Wong, William P. Risk
(Photo Credit: Hita Bambhania-Modha)
Moments ago, IBM and LBNL presented the next milestone towards fulfilling the vision of the DARPA SyNAPSE program at Supercomputing 2012.
TITLE:
Compass: A scalable simulator for an architecture for Cognitive Computing
AUTHORS:
Robert Preissl
Theodore M. Wong
Pallab Datta
Myron D. Flickner
Raghavendra Singh
Steven K. Esser
Emmett McQuinn*
Rathinakumar Appuswamy*
William P. Risk
Horst D. Simon (LBNL)
Dharmendra S. Modha
*Since submitting the camera-ready copy, these colleagues contributed significantly to visualization and dynamics.
ABSTRACT:
Inspired by the function, power, and volume of the organic brain, IBM is developing TrueNorth, a novel modular, scalable, non-von Neumann, ultra-low power, cognitive computing architecture. TrueNorth consists of a scalable network of neurosynaptic cores, with each core containing neurons, dendrites, synapses, and axons. To set sail for TrueNorth, IBM developed Compass, a multi-threaded, massively parallel functional simulator and a parallel compiler that maps a network of long-distance pathways in the macaque monkey brain to TrueNorth.
IBM and LBNL demonstrated near-perfect weak scaling on a 16-rack IBM Blue Gene/Q (262,144 processor cores, 256 TB memory), achieving an unprecedented scale of 256 million neurosynaptic cores containing 65 billion neurons and 16 trillion synapses running only 388× slower than real time with an average spiking rate of 8.1 Hz. By using emerging PGAS communication primitives, IBM also demonstrated 2× better real-time performance over MPI primitives on a 4-rack Blue Gene/P (16,384 processor cores, 16 TB memory). Here is the PDF of the final paper.
NEW NEWS: Since submitting the camera-ready copy, using 96 Blue Gene/Q racks of the Lawrence Livermore National Laboratory Sequoia supercomputer (1,572,864 processor cores, 1.5 PB memory, 98,304 MPI processes, and 6,291,456 threads), IBM and LBNL achieved an unprecedented scale of 2.084 billion neurosynaptic cores containing 53×10^10 neurons and 1.37×10^14 synapses running only 1542× slower than real time. Here is the PDF of IBM Research Report RJ 10502.
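As a quick back-of-the-envelope check using only the numbers quoted above, the flagship run averages out to roughly

\[
\frac{53\times 10^{10}\ \text{neurons}}{2.084\times 10^{9}\ \text{cores}} \approx 254\ \text{neurons per core},
\qquad
\frac{1.37\times 10^{14}\ \text{synapses}}{2.084\times 10^{9}\ \text{cores}} \approx 6.6\times 10^{4}\ \text{synapses per core},
\]

which is consistent with the 256-neuron neurosynaptic core described elsewhere on this page.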
SIGNIFICANCE:
The ultimate vision of the DARPA SyNAPSE program is to build a cognitive computing architecture with 10^10 neurons and 10^14 synapses. “The vision for the Systems of Neuromorphic Adaptive Plastic Scalable Electronics (SyNAPSE) program is to develop electronic neuromorphic machine technology that scales to biological levels.” For reference, the DARPA SyNAPSE BAA from 2008 is here. This DARPA SyNAPSE metric was probably inspired by the following: Gordon Shepherd, in The Synaptic Organization of the Brain, estimates the number of synapses in the human brain as 0.6×10^14, and Christof Koch, in Biophysics of Computation: Information Processing in Single Neurons, estimates the number of synapses in the human brain as 2.4×10^14.
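Simple arithmetic on the stated metric gives the implied average fan-out:

\[
\frac{10^{14}\ \text{synapses}}{10^{10}\ \text{neurons}} = 10^{4}\ \text{synapses per neuron}.
\]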
CLARIFICATION:
We have not built a biologically realistic simulation of the complete human brain. Rather, we have simulated a novel modular, scalable, non-von Neumann, ultra-low power, cognitive computing architecture at the scale of the DARPA SyNAPSE metric of 10^14 synapses, which, in turn, is inspired by the number of synapses in the human brain. Computation (“neurons”), memory (“synapses”), and communication (“axons”, “dendrites”) are mathematically abstracted away from biological detail towards the engineering goals of maximizing function (utility, applications) and minimizing cost (power, area, delay) and design complexity of hardware implementation.
PAPER STATUS:
Per SC12 website: “The SC12 Technical Papers program received 472 submissions covering a wide variety of research topics in high performance computing. We followed a rigorous peer review process with a newly introduced author rebuttal period, careful management of conflicts, and four reviews per submission (in most cases). At a two-day face-to-face committee meeting on June 25-26 in Salt Lake City, over 100 technical paper committee members discussed every paper and finalized the selections. At the conclusion of the meeting, SC12 Technical Papers had accepted 100 papers, reflecting an acceptance rate of 21 percent.” Six of the 100 accepted papers, including this one, were selected as finalists for the Best Paper Award.
PERSPECTIVE:
Over the last six years, powered by Blue Gene/L, Blue Gene/P, and Blue Gene/Q, and with support from DARPA and DOE / NNSA / LLNL, the simulations have scaled from 4,096 processor cores and 1 TB main memory in February 2007, to 8,192 processor cores and 4 TB main memory in July 2007, to 32,768 processor cores and 8 TB main memory in November 2007, to 147,456 processor cores and 144 TB main memory in November 2009, to 262,144 processor cores and 256 TB main memory in April 2012, and, finally, to 1,572,864 processor cores and 1.5 PB main memory in October 2012.
Previously, we have demonstrated a neurosynaptic core and some of its applications. We have also compiled the largest long-distance wiring diagram of the monkey brain. Now, imagine a network with over 2 billion of these neurosynaptic cores, divided into 77 brain-inspired regions with probabilistic intra-region (“gray matter”) connectivity and monkey-brain-inspired inter-region (“white matter”) connectivity. The new paper simulates the dynamics of such a network on the world’s #2 supercomputer, LLNL’s Sequoia, and drives the dynamics to a self-critical state. This fulfills a core vision of the DARPA SyNAPSE program: to bring together nanotechnology, neuroscience, and supercomputing to lay the foundation of a novel cognitive computing architecture that complements today’s von Neumann machines.
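To make that two-level structure concrete, here is a hypothetical sketch (not the Compass compiler) of one way to wire cores into regions: random intra-region “gray matter” links, plus inter-region “white matter” links drawn from a region-to-region strength matrix. The 77-region count comes from the text above; the function names, the 16 trial targets per core, and the use of matrix entries as probabilities are illustrative assumptions.

#include <cstdint>
#include <random>
#include <vector>

struct Edge { std::uint64_t src, dst; };

std::vector<Edge> wireNetwork(std::uint64_t coresPerRegion,
                              const std::vector<std::vector<double>>& whiteMatter, // 77x77, entries assumed in [0,1]
                              double grayMatterProb,
                              std::mt19937_64& rng) {
  const std::size_t numRegions = whiteMatter.size(); // 77 in the text
  std::uniform_real_distribution<double> coin(0.0, 1.0);
  std::uniform_int_distribution<std::uint64_t> pick(0, coresPerRegion - 1);
  std::vector<Edge> edges;
  for (std::size_t r = 0; r < numRegions; ++r) {
    for (std::uint64_t c = 0; c < coresPerRegion; ++c) {
      const std::uint64_t src = r * coresPerRegion + c;
      // Gray matter: a few random targets within the same region.
      for (int k = 0; k < 16; ++k)
        if (coin(rng) < grayMatterProb)
          edges.push_back({src, r * coresPerRegion + pick(rng)});
      // White matter: targets in other regions, weighted by the wiring diagram.
      for (std::size_t s = 0; s < numRegions; ++s)
        if (s != r && coin(rng) < whiteMatter[r][s])
          edges.push_back({src, s * coresPerRegion + pick(rng)});
    }
  }
  return edges;
}

The real compiler described in the paper, which maps the macaque long-distance pathways onto TrueNorth, is of course far more involved than this toy generator.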
APPLICATIONS OF COMPASS:
The Compass simulator is an all-purpose “Swiss Army knife” for pursuing novel architectures, algorithms, and applications. Compass is indispensable for (a) verifying TrueNorth correctness via regression testing, (b) studying TrueNorth dynamics, (c) benchmarking inter-core communication topologies, (d) demonstrating applications in vision, audition, real-time motor control, and sensor integration, (e) estimating power consumption, and (f) testing, verifying, and iterating hypotheses regarding neural codes and function. We have used Compass to demonstrate numerous applications of the TrueNorth architecture, such as optic flow, attention mechanisms, image and audio classification, multi-modal image-audio classification, character recognition, robotic navigation, and spatio-temporal feature extraction. These applications will be published separately.
THANKS:
IBM would like to thank DARPA, DARPA DSO, SyNAPSE Program Manager Dr. Gill A. Pratt, and former SyNAPSE Program Manager Dr. Todd Hylton. The research reported in this presentation was sponsored by the Defense Advanced Research Projects Agency, Defense Sciences Office (DSO), Program: Systems of Neuromorphic Adaptive Plastic Scalable Electronics (SyNAPSE), issued by DARPA/CMO under Contract No. HR0011-09-C-0002.
IBM and LBNL would like to thank Michel McCoy and Tom Spelce for access to the Sequoia Blue Gene/Q supercomputer at Lawrence Livermore National Laboratory and the DOE NNSA Advanced Simulation and Computing Program for time on Sequoia. Lawrence Livermore National Laboratory is operated by Lawrence Livermore National Security, LLC, for the U.S. Department of Energy, National Nuclear Security Administration under Contract DE-AC52-07NA27344.
The authors are indebted to Fred Mintzer for access to IBM Blue Gene/P and Blue Gene/Q at the IBM T.J. Watson Research Center and to George Fax, Kerry Kaliszewski, Andrew Schram, Faith W. Sell, and Steven M. Westerbeck for access to IBM Rochester Blue Gene/Q, without which this paper would have been impossible.
The authors thank Filipp Akopyan, Rodrigo Alvarez-Icaza, John Arthur, Andrew Cassidy, Daniel Friedman, Subu Iyer, Bryan Jackson, Rajit Manohar, Paul Merolla, and Jun Sawada for their collaboration on the TrueNorth architecture, and our university partners Stefano Fusi, Rajit Manohar, Ashutosh Saxena, and Giulio Tononi as well as their research teams for their feedback on the Compass simulator.
Finally, the authors would like to thank David Peyton for his expert editorial assistance in revising the manuscript.
TYPOS:
Section III, Listing 1: move “threadAggregate( remoteBuf, remoteBufAgg)” immediately below the line “if ( threadID == 0 ) {”.
Section V.C: “80/20” should be “20/80”.
COMPASS ALGORITHM:
Each MPI process executes the following algorithmic flow with 1 master thread and 64 slave threads.
For the flagship 10^14 run, there were 98,304 MPI processes and 6,291,456 threads.
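The flow itself appeared as a figure in the original post. As a rough stand-in, here is a minimal sketch, not the Compass source, of one plausible shape for the per-process loop: worker threads integrate the locally owned cores and buffer spikes, the buffers bound for other processes are aggregated, and a single thread performs the exchange. It uses OpenMP worker threads plus the serial region as the “master”, rather than the explicit 1 master / 64 slave thread organization described above; the helper functions are placeholders with illustrative names.

#include <omp.h>
#include <vector>

struct Spike { int targetCore; int targetAxon; };

// Placeholder: advance this thread's share of the local cores by one tick,
// appending generated spikes to localOut (same process) or remoteOut
// (other MPI processes). The real neuron dynamics of Compass happen here.
static void updateLocalCores(int /*threadId*/, int /*numThreads*/,
                             std::vector<Spike>& /*localOut*/,
                             std::vector<Spike>& /*remoteOut*/) {}

// Placeholder: exchange remotely destined spikes with all other MPI
// processes (an MPI exchange in the real simulator).
static void exchangeSpikes(const std::vector<Spike>& /*outgoing*/,
                           std::vector<Spike>& /*incoming*/) {}

// Placeholder: enqueue spikes onto the axons of cores owned by this process.
static void deliverSpikes(const std::vector<Spike>& /*spikes*/) {}

void simulate(int ticks, int numThreads) {
  std::vector<std::vector<Spike>> localBuf(numThreads), remoteBuf(numThreads);
  std::vector<Spike> outgoing, incoming;
  for (int t = 0; t < ticks; ++t) {
    // Worker threads: integrate neurons and record spikes in per-thread buffers.
    #pragma omp parallel num_threads(numThreads)
    {
      const int id = omp_get_thread_num();
      updateLocalCores(id, numThreads, localBuf[id], remoteBuf[id]);
    }
    // Aggregate per-thread remote buffers into a single outgoing packet and
    // let one (master) thread exchange it with the other MPI processes.
    for (int id = 0; id < numThreads; ++id)
      outgoing.insert(outgoing.end(), remoteBuf[id].begin(), remoteBuf[id].end());
    exchangeSpikes(outgoing, incoming);
    // Deliver all spikes that target cores on this process before the next tick.
    for (int id = 0; id < numThreads; ++id) deliverSpikes(localBuf[id]);
    deliverSpikes(incoming);
    for (int id = 0; id < numThreads; ++id) { localBuf[id].clear(); remoteBuf[id].clear(); }
    outgoing.clear();
    incoming.clear();
  }
}

In the flagship configuration quoted above, numThreads would be 64 per process across 98,304 MPI processes.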

Building Block of a Programmable Neuromorphic Substrate: A Digital Neurosynaptic Core
Last week, the IBM–Cornell SyNAPSE team published the following paper:
Citation: John V. Arthur, Paul A. Merolla, Filipp Akopyan, Rodrigo Alvarez-Icaza, Andrew Cassidy, Shyamal Chandra, Steven K. Esser, Nabil Imam, William Risk, Daniel Rubin, Rajit Manohar, and Dharmendra S. Modha, "Building Block of a Programmable Neuromorphic Substrate: A Digital Neurosynaptic Core", International Joint Conference on Neural Networks, June 2012.
Abstract: The grand challenge of neuromorphic computation is to develop a flexible brain-like architecture capable of a wide array of real-time applications, while striving towards the ultra-low power consumption and compact size of biological neural systems. To this end, we fabricated a key building block of a modular neuromorphic architecture, a neurosynaptic core. Our implementation consists of 256 integrate-and-fire neurons and a 1,024×256 SRAM crossbar memory for synapses that fits in 4.2 mm^2 using a 45 nm SOI process and consumes just 45 pJ per spike. The core is fully configurable in terms of neuron parameters, axon types, and synapse states, and its fully digital implementation achieves one-to-one correspondence with software simulation models. One-to-one correspondence allows us to introduce an abstract neural programming model for our chip, a contract guaranteeing that any application developed in software functions identically in hardware. This contract allows us to rapidly test and map applications from control, machine vision, and classification. To demonstrate, we present four test cases: (i) a robot driving in a virtual environment, (ii) the classic game of pong, (iii) visual digit recognition, and (iv) an autoassociative memory.
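The one-to-one correspondence mentioned above means the fully digital core can be mirrored exactly by a software model. As a rough illustration of what such a model looks like, here is a hypothetical sketch (not the published specification): 256 integrate-and-fire neurons behind a 1,024×256 binary crossbar, with each axon assigned a type that selects a signed per-neuron weight. The axon-type count, leak handling, and reset rule are assumptions.

#include <array>
#include <bitset>
#include <cstdint>

constexpr int kAxons = 1024;
constexpr int kNeurons = 256;
constexpr int kAxonTypes = 4; // assumption; the paper only says "axon types"

struct NeurosynapticCoreModel {
  std::array<std::bitset<kNeurons>, kAxons> crossbar{};        // crossbar[i][j]: axon i connects to neuron j
  std::array<std::uint8_t, kAxons> axonType{};                 // type assigned to each axon (0..kAxonTypes-1)
  std::array<std::array<int, kAxonTypes>, kNeurons> weight{};  // signed weight per (neuron, axon type)
  std::array<int, kNeurons> leak{};                            // per-neuron leak subtracted each tick
  std::array<int, kNeurons> threshold{};                       // firing threshold
  std::array<int, kNeurons> potential{};                       // membrane potential state

  // One tick: integrate incoming axon spikes through the crossbar, apply
  // leak, fire any neuron at or above threshold, and reset it to zero.
  std::bitset<kNeurons> tick(const std::bitset<kAxons>& axonSpikes) {
    std::bitset<kNeurons> fired;
    for (int i = 0; i < kAxons; ++i)
      if (axonSpikes[i])
        for (int j = 0; j < kNeurons; ++j)
          if (crossbar[i][j])
            potential[j] += weight[j][axonType[i]];
    for (int j = 0; j < kNeurons; ++j) {
      potential[j] -= leak[j];
      if (potential[j] >= threshold[j]) { fired[j] = true; potential[j] = 0; }
    }
    return fired;
  }
};

Because both sides are deterministic and digital, an application developed against a model of this kind can behave identically on the chip, which is the “contract” the abstract refers to.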
Implementation of olfactory bulb glomerular-layer computations in a digital neurosynaptic core
Today, the Cornell–IBM SyNAPSE team published the following paper:
Citation: Imam N, Cleland TA, Manohar R, Merolla PA, Arthur JV, Akopyan F and Modha DS (2012) Implementation of olfactory bulb glomerular-layer computations in a digital neurosynaptic core. Front. Neurosci. 6:83. doi: 10.3389/fnins.2012.00083
Abstract: We present a biomimetic system that captures essential functional properties of the glomerular layer of the mammalian olfactory bulb, specifically including its capacity to decorrelate similar odor representations without foreknowledge of the statistical distributions of analyte features. Our system is based on a digital neuromorphic chip consisting of 256 leaky-integrate-and-fire neurons, 1024 × 256 crossbar synapses, and address-event representation communication circuits. The neural circuits configured in the chip reflect established connections among mitral cells, periglomerular cells, external tufted cells, and superficial short-axon cells within the olfactory bulb, and accept input from convergent sets of sensors configured as olfactory sensory neurons. This configuration generates functional transformations comparable to those observed in the glomerular layer of the mammalian olfactory bulb. Our circuits, consuming only 45 pJ of active power per spike with a power supply of 0.85 V, can be used as the first stage of processing in low-power artificial chemical sensing devices inspired by natural olfactory systems.