June 23, 2017

A Scale-Out Scale-up Synaptic Supercomputer (NS16e-4)

Today, the Air Force Research Lab (AFRL) and IBM announce the development of a new Scale-out, Scale-up Synaptic Supercomputer (NS16e-4) that builds on the previous NS16e system for LLNL. Over the last six years, IBM has expanded the number of neurons per system from 256 to more than 64 million, an 800 percent annual increase!

Don't miss the story in TechCrunch and a video.

Enclosed below is a perspective from my colleagues.

Guest Blog by Bill Risk, Camillo Sassano, Mike DeBole, Ben Shaw, Aaron Cox, and Kevin Schultz.

The IBM TrueNorth Neurosynaptic System NS16e-4 is the latest hardware innovation designed to exploit the capabilities of the TrueNorth chip. Through a combination of “scaling out” and “scaling up,” we now have increased the size of TrueNorth-based neurosynaptic systems by 64x from the original single-chip systems (Figure 1).

Figure 1. Scale-out and scale-up of systems based on the TrueNorth chip.

The NS1e board included a single TrueNorth chip and was designed to jump-start learning about and using the TrueNorth system. The first “scale out” step, the NS1e-16, put sixteen of these boards in a single enclosure, which permitted running multiple jobs in parallel. The first “scale up” system, the NS16e, exploited the built-in ability of the TrueNorth chip to tile seamlessly and communicate directly with other TrueNorth chips. Although the NS1e-16 and the NS16e offered the same number of neurons and synapses, the NS1e-16 was essentially 16 separate 1-million-neuron systems working in parallel, while the NS16e was a single 16-million-neuron system, allowing the exploration of substantially larger neurosynaptic networks. The NS16e-4 is the next step in this evolution, bringing four NS16e systems together in parallel to provide 64 million neurons and 16 billion synapses in a single enclosure.
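The neuron and synapse totals quoted above follow from simple multiplication. Here is a rough sketch of that arithmetic. The per-chip figures (4096 cores, 256 neurons per core, a 256 x 256 synaptic crossbar per core) come from public descriptions of the TrueNorth chip rather than from this post, so treat them as assumptions; "million" and "billion" are used loosely for powers of two.

```python
# Per-chip figures for TrueNorth (assumed from public chip descriptions,
# not stated in this post): 4096 cores, 256 neurons per core,
# and a 256 x 256 synaptic crossbar per core.
CORES_PER_CHIP = 4096
NEURONS_PER_CORE = 256
SYNAPSES_PER_CORE = 256 * 256


def capacity(chips):
    """Return (neurons, synapses) for a system with the given chip count."""
    cores = chips * CORES_PER_CHIP
    return cores * NEURONS_PER_CORE, cores * SYNAPSES_PER_CORE


systems = {"NS1e": 1, "NS16e": 16, "NS16e-4": 4 * 16}
for name, chips in systems.items():
    neurons, synapses = capacity(chips)
    print(f"{name}: {chips} chips, {neurons:,} neurons, {synapses:,} synapses")
```

The NS1e-16 has the same totals as the NS16e, but spread across 16 independent single-chip boards rather than one tiled array.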

As announced today, the U.S. Air Force Research Lab has ordered the first NS16e-4 system. To deliver this system, we needed to devise a way to put four NS16e systems in the same enclosure, subject to the following constraints:

  • It must fit in a 4U-high (7”) by 29” deep enclosure that can be mounted in a standard equipment rack
  • All related components required by the system—power supplies, cabling, connectors, etc., (other than a separate 2U server that acts as a gateway for the system)—must be contained within the same 4U space
  • It must be possible to easily remove each NS16e sub-system so that it can be serviced, transferred to a different identical enclosure, or used independently as a standalone system

Given the size of the previous enclosure, it was not feasible to simply put four NS16e systems in a 4U-high box and meet these constraints. Instead, we first had to redesign some aspects of the NS16e circuit boards to permit a more compact form factor. In concert, we redesigned the NS16e case to match the new form factor, while retaining the original design’s signature angular shapes.

This smaller form factor allowed us to consider several different ways that four NS16e sub-systems could be efficiently placed in the allotted space. The one we settled on places them in a unique V-shaped arrangement in a drawer (Figure 2a), which, when extended, provides easy access to the individual NS16e sub-systems. Empty spaces under the V provide room to route cables and move air for ventilation. A docking structure (not visible in Figure 2) holds the NS16e sub-systems in place when in use, provides power and signal connections, and permits them to be released and removed when necessary. This arrangement provides a view of all 64 TrueNorth chips. A transparent window and interior accent lighting permit the chips to be seen even when the drawer is closed (Figure 2b). Multiple NS16e-4 systems can be placed in the same rack; the novel front panel shape creates an intriguing 3D geometric pattern (Figure 3).

Figure 2a. NS16e-4 system with drawer open.

Figure 2b. NS16e-4 system with drawer closed.

Meeting all the technical requirements called for creative industrial design. We designed for utility, but were pleased that a measure of beauty and elegance emerged through the design process. We now look forward to building and delivering it!

Figure 3. Two 4U-high NS16e-4 systems stacked above a 2U-high server.

The following timeline provides context for today's milestone in terms of the continued evolution of our project.

Illustration Credit: William Risk

November 13, 2016

Breaking News: Supercomputing 2016 Paper -- TrueNorth Ecosystem for Brain-Inspired Computing

Guest Blog by Jun Sawada, Brian Taba, Pallab Datta, and Ben Shaw.

This week, in collaboration with Lawrence Livermore National Laboratory, U.S. Air Force Research Laboratory, and U.S. Army Research Laboratory, IBM Research is publishing the newest paper describing the TrueNorth ecosystem at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC16). SC16 is one of the most prestigious conferences in the HPC (high performance computing) field. Please download the paper below.

Download Paper

This paper describes the result of a multi-year effort to create development tools and an ecosystem around the TrueNorth neurosynaptic chip. Today, we have an entire stack of hardware, firmware, software, training algorithms and applications, with active users at many university, corporate and government laboratories implementing neural computation through deep-learning and convolutional networks. The diagram below, a figure from the paper, shows the entire ecosystem stack based on the TrueNorth neurosynaptic chip.


No chip will attract significant usage without a good hardware and software ecosystem. Typically when a chip maker comes out with a new processor, they can rely on existing development tools, software and chip sets, so that it is seldom necessary to build an entirely new ecosystem from scratch. However, TrueNorth was a totally new architecture for neurosynaptic computation which required building many new tools from the ground up. We have worked over the past several years to build an ecosystem, and today the toolset allows users to create deep-learning network applications for TrueNorth systems with little more than the click of a button. The paper describes these tools and the ecosystem that empowers users of the TrueNorth neurosynaptic chip.

TrueNorth Hardware

The paper describes the TrueNorth-based hardware systems we have developed: (a) a mobile evaluation board (NS1e), (b) a scale-out parallel neuromorphic server (NS1e-16), and (c) a scale-up system for running larger neural networks (NS16e). The diagram below shows each system and illustrates its internal architecture.

Hardware Schematics

The schematics show TrueNorth chips in green, FPGA programmable logic in blue, CPUs in orange, and the network in purple. In each system, control software running on the CPU talks to the TrueNorth chips with the help of programmable logic in the FPGA. For details, please see the paper.

The NS1e system is an index card-sized, mobile system consisting of a single TrueNorth chip and a Xilinx Zynq-7000 system-on-a-chip. Although it is a tiny system, it has proven to be our workhorse. Many of our university partners use the NS1e to run and test neural networks they create for the TrueNorth architecture.


The NS1e-16 is a scale-out system running many neurosynaptic chips in parallel. It is a collection of 16 NS1e single-chip systems with an Ethernet backbone, as shown in the diagram above. When a request to execute a neural network job arrives, the gateway machine automatically picks an available NS1e and dispatches the job to it. The NS1e-16 is really a small neural network data center in a box.
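To make the dispatch idea concrete, here is a toy sketch of a gateway that hands each incoming job to any idle board and queues jobs when every board is busy. The `Gateway` class, its method names, and the queueing behavior are all hypothetical; the post does not describe how the real gateway software is implemented.

```python
from collections import deque


class Gateway:
    """Toy gateway that hands each job to any idle board (hypothetical)."""

    def __init__(self, num_boards=16):
        self.idle = deque(range(num_boards))   # board ids that are free
        self.running = {}                      # job name -> board id
        self.waiting = deque()                 # jobs queued for a free board

    def submit(self, job):
        """Dispatch a job to an idle board, or queue it if all are busy."""
        if self.idle:
            board = self.idle.popleft()
            self.running[job] = board
            return board
        self.waiting.append(job)
        return None

    def finish(self, job):
        """Release the board used by `job` and start a waiting job, if any."""
        board = self.running.pop(job)
        if self.waiting:
            self.running[self.waiting.popleft()] = board
        else:
            self.idle.append(board)


gateway = Gateway(num_boards=2)
print(gateway.submit("job-a"))   # 0
print(gateway.submit("job-b"))   # 1
print(gateway.submit("job-c"))   # None: queued until a board frees up
gateway.finish("job-a")
print(gateway.running["job-c"])  # 0: the freed board is reused
```

Because every board is an independent single-chip system, the gateway never needs to split one network across boards; that is the job of the scale-up NS16e.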


Finally, the NS16e, the scale-up system, has a tightly connected 16-chip TrueNorth array. It can run a neural network using up to 16 million neurons and 4 billion synapses. This system can run image recognition tasks (CIFAR 10, CIFAR 100) with near state-of-the-art accuracy at over 1000 frames per second.


Work Flow of Software Ecosystem

To design applications for TrueNorth, we have built a rich stack of development tools, such as our Eedn framework for developing energy-efficient deep neuromorphic networks. Eedn generates convolutional neural networks (CNNs) that run natively on TrueNorth hardware to achieve near state-of-the-art classification accuracy in real time at very low power.

The figure below sketches a typical workflow for developing streaming Eedn applications like the gesture-recognition application we showed at the CVPR Industry Expo, or the television remote-control application we developed in collaboration with Samsung.


The runtime flow is shown on the top row (a)-(f). A sensor (a) produces a stream of input data. It may be a spiking sensor such as a Dynamic Vision Sensor. The original sensor data is often transformed into more specialized features (b) by cropping, filtering, or other transformation, before being encoded as a stream of input spikes (c) and sent to the TrueNorth chip (d). The output of TrueNorth is decoded (e) and sent to a downstream application (f).
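As a rough illustration of stages (a) through (f), the sketch below wires together a crop step, a simple rate-coding spike encoder, a stand-in for the chip, and a decoder. Every function here is hypothetical, including `fake_chip`, which only counts spikes; none of this is the actual TrueNorth runtime API, and the real encoders and decoders are described in the paper.

```python
def crop(frame, top, left, size):
    """(b) Feature extraction: keep a size x size window of the frame."""
    return [row[left:left + size] for row in frame[top:top + size]]


def encode_spikes(features, ticks=4, max_value=255):
    """(c) Rate-code each pixel: brighter pixels spike on more ticks."""
    flat = [v for row in features for v in row]
    return [[1 if v * ticks >= (t + 1) * max_value else 0 for v in flat]
            for t in range(ticks)]


def fake_chip(spike_train):
    """(d) Stand-in for the chip: count input spikes per tick."""
    return [sum(tick) for tick in spike_train]


def decode(output_counts, threshold):
    """(e) Decode: report 'active' if enough spikes were seen overall."""
    return "active" if sum(output_counts) >= threshold else "idle"


frame = [[0, 0, 0, 0],
         [0, 255, 128, 0],
         [0, 64, 255, 0],
         [0, 0, 0, 0]]                       # (a) one sensor frame
features = crop(frame, top=1, left=1, size=2)
spikes = encode_spikes(features)
label = decode(fake_chip(spikes), threshold=8)
print(label)                                 # (f) hand off to the application
```

With a spiking sensor such as a Dynamic Vision Sensor, step (c) would be largely unnecessary because the sensor already emits events.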

The flow for designing an Eedn application is shown on the bottom row (g)-(l). The fundamental task is to configure the TrueNorth chip with a set of network parameters. Starting from a dataset (g), the Eedn trainer (h) uses a GPU to train a CNN within the constraints of the TrueNorth architecture. The trained neural network is built into a TrueNorth model (i) and then mapped to hardware (j). Finally, simulation (k) and analysis tools (l) are used to improve the quality of the generated neural network.

Core Placement Problem

A TrueNorth model built at step (i) is a purely logical representation of a network of neurosynaptic cores. However, to configure actual hardware, every logical core in the network must be mapped to a unique physical (X, Y) location on a TrueNorth chip by a process called placement (j). Placement optimization is a critical issue, especially for large multi-chip networks on the NS16e.

TrueNorth uses a dimension-order router, in which spikes first travel horizontally (east-west) and then vertically (north-south) between their source and destination. The figure below shows an example of routing spikes within and across TrueNorth chips.
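Dimension-order routing is easy to sketch in a few lines. The function below treats cores as points on one flat (x, y) grid and ignores chip boundaries, which is a simplifying assumption; the `route` helper is illustrative and not part of any TrueNorth tool.

```python
def route(src, dst):
    """Return the list of (x, y) hops from src to dst, X first then Y."""
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:          # horizontal (east-west) leg
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:          # vertical (north-south) leg
        y += step_y
        path.append((x, y))
    return path


print(route((0, 0), (2, 1)))            # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(len(route((3, 3), (1, 5))) - 1)   # 4 hops
```

Because the route is fully determined by the source and destination, the hop count is always the Manhattan distance between the two cores.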


The objective of placement is to minimize the sum of all path lengths from source neurons to destination neurons, reduce the overall active power of the system, and improve throughput. It implicitly attempts to pack tightly connected cores on the same chip. This problem is known to be NP-hard. We developed a new NeuroSynaptic Core Placement (NSCP) algorithm that maps the neurosynaptic cores efficiently onto the hardware substrate. The algorithm places the input neuron layer first, by collocating cores that process neighboring regions of the input signal. It then iteratively places cores in each consecutive network layer.
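The sketch below shows one way to score a candidate placement: total Manhattan path length plus the number of connections that cross a chip boundary. It is only a cost function for comparing placements, not the NSCP algorithm itself, and the function names, the tiny 4-core network, and the `chip_size` parameter are made up for illustration.

```python
def wire_length(placement, edges):
    """Sum of Manhattan distances over all (source, destination) core pairs."""
    total = 0
    for src, dst in edges:
        (x1, y1), (x2, y2) = placement[src], placement[dst]
        total += abs(x1 - x2) + abs(y1 - y2)
    return total


def chip_crossings(placement, edges, chip_size):
    """Count edges whose endpoints land on different chips."""
    def chip_of(core):
        x, y = placement[core]
        return (x // chip_size, y // chip_size)
    return sum(1 for src, dst in edges if chip_of(src) != chip_of(dst))


# Four logical cores in a chain: a -> b -> c -> d.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
scattered = {"a": (0, 0), "b": (3, 3), "c": (0, 3), "d": (3, 0)}
packed = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}

for name, placement in (("scattered", scattered), ("packed", packed)):
    print(name, wire_length(placement, edges),
          chip_crossings(placement, edges, chip_size=2))
```

The packed placement keeps connected cores adjacent and on one chip, which is the behavior the objective rewards.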

Lastly, a movie generated by a graph analysis tool shows how a multi-layer convolutional neural network is placed on a 16-chip TrueNorth array. The visualization depicts core-to-core communication as edges, and illustrates how different layers of a multi-layer network are actually placed. Early-layer computation is done locally, while information is shared between chips in higher layers. Some images from the movie are enclosed below.

Placement Viz 1

Placement Viz 2

Placement Viz 3

Our Partners

As we build out the TrueNorth platform, tool chain and software ecosystem, we rely on a number of distinguished labs and other ecosystem partners to illuminate and help explore the space of TrueNorth's application potential.

Since we held our first TrueNorth Boot Camp in August 2015 with a core group of 65 committed early adopters, our User Community has expanded to include over 150 members from 25 Universities, 7 Government labs and 3 National labs, as well as 8 corporate research centers around the world.

TrueNorth's extremely low power profile and guaranteed real-time operation make it a natural fit for applications ranging from mobile/embedded devices to HPC and supercomputing. Table 1 from our paper lists various research areas currently under study by our ecosystem partners. The range and variety convey some idea of the space of TrueNorth application potential our partners have begun to explore.

TrueNorth Partners

Table of Our TrueNorth Ecosystem Partners

In May, we held a Reunion of our BootCamp community to showcase some of the amazing things that have been accomplished in just a few months. In addition to algorithmic and toolchain developments from our team, 16 posters and stage presentations were delivered by ecosystem partners. These projects illustrate a range of research topics and are a testament to the motivation and energy of our pioneering TrueNorth developers.

Three of our partners, Army Research Lab, Air Force Research Lab and Lawrence Livermore National Lab, contributed sections to the Supercomputing paper; each application showcases a different TrueNorth system.

Army Research Lab prototyped a computational offloading scheme to illustrate how TrueNorth's low power profile might enable computation at the point of data collection. Using the single-chip NS1e board and an Android tablet, ARL researchers created a demonstration system that allows visitors to their lab to handwrite arithmetic expressions on the tablet, with handwriting streamed to the NS1e for character recognition and recognized characters sent back to the tablet for arithmetic calculation. Of course, the point here is not to make a handwriting calculator; it is to show how TrueNorth's low power and real-time pattern recognition might be deployed at the point of data collection to reduce latency, complexity and transmission bandwidth, as well as back-end data storage requirements in distributed systems.

Tablet Calculator

Tablet Handwriting Calculator based on TrueNorth

Air Force Research Lab contributed another prototype application utilizing a TrueNorth scale-out system to perform a data-parallel text extraction and recognition task. In this application, an image of a document is segmented into individual characters that are streamed to AFRL's NS1e-16 TrueNorth system for parallel character recognition. Classification results are then sent to an inference-based natural language model to reconstruct words and sentences. This system is able to process 16,000 characters per second--about six times more characters than are in this section so far. AFRL plans to eventually implement the word and sentence inference algorithms on TrueNorth as well.

Document Processor

Parallel recognition of handwritten documents.

Lawrence Livermore National Lab has taken delivery of a sixteen-chip scale-up system to explore the potential of post-von Neumann computation through larger neural models and more complex algorithms enabled by the native tiling characteristics of the TrueNorth chip. For the Supercomputing paper, they contributed a single-chip application performing in-situ process monitoring in an additive manufacturing process. LLNL trained a TrueNorth network to recognize 7 classes related to track weld quality in welds produced by a selective laser melting machine. Real-time weld quality determination allows for closed-loop process improvement and immediate rejection of defective parts. This is one of several applications LLNL is developing to showcase TrueNorth as a scalable platform for low-power, real-time inference.


Manufacturing defects detected by TrueNorth classifier

It is a great honor for our team to collaborate with the many and varied members of our partner ecosystem. We are inspired by their energy, ingenuity and passion, as we set out to explore the potential of TrueNorth together!

November 08, 2016

2016 Scientist of the Year By R&D Magazine


This is the 51st anniversary of the award. Previous awardees include John Bardeen, Nobel Prize; Tim Berners-Lee, director of the World Wide Web Consortium; Steve Chu, Nobel Prize; Ralph Gomory, Former IBM SVP of Science & Technology; Leroy Hood, Human Genome Pioneer; Bill Joy, founder of Sun Microsystems; Kary Mullis, Nobel Prize; Calvin Quate, Atomic force microscopy; Justin Rattner, Former Director of Intel Labs; J. Craig Venter, Human Genome Pioneer; and Stephen Wolfram, Founder & CEO of Wolfram Research.

First and foremost, I am most grateful to my colleagues, whose collaboration and dedication have been the key to our shared success. I am grateful to IBM Research, IBM, U.S. Department of Defense (Defense Advanced Research Projects Agency, Air Force Research Lab, Army Research Lab) and U.S. Department of Energy (Lawrence Livermore National Lab and Lawrence Berkeley National Lab) for their support of this work over the last 12 years. I am grateful to 150+ researchers at 40+ universities who are experimenting with the TrueNorth Ecosystem. I am grateful to my mentors. I am grateful to my alma maters IIT Bombay and UCSD and my teachers at both schools. I am grateful to my family and friends for their love and support.

The cover picture is credited to Hita Bambhania-Modha. Her professional Facebook page https://www.facebook.com/hita.life/ just launched. And don’t miss her website http://www.hita.life.

September 23, 2016

Breaking News: Energy-Efficient Neuromorphic Classifiers

A paper entitled "Energy-Efficient Neuromorphic Classifiers" was published this week in Neural Computation by Daniel Martí, Mattia Rigotti, Mingoo Seok, and Stefano Fusi.

Here is the abstract:

Neuromorphic engineering combines the architectural and computational principles of systems neuroscience with semiconductor electronics, with the aim of building efficient and compact devices that mimic the synaptic and neural machinery of the brain. The energy consumptions promised by neuromorphic engineering are extremely low, comparable to those of the nervous system. Until now, however, the neuromorphic approach has been restricted to relatively simple circuits and specialized functions, thereby obfuscating a direct comparison of their energy consumption to that used by conventional von Neumann digital machines solving real-world tasks. Here we show that a recent technology developed by IBM can be leveraged to realize neuromorphic circuits that operate as classifiers of complex real-world stimuli. Specifically, we provide a set of general prescriptions to enable the practical implementation of neural architectures that compete with state-of-the-art classifiers. We also show that the energy consumption of these architectures, realized on the IBM chip, is typically two or more orders of magnitude lower than that of conventional digital machines implementing classifiers with comparable performance. Moreover, the spike-based dynamics display a trade-off between integration time and accuracy, which naturally translates into algorithms that can be flexibly deployed for either fast and approximate classifications, or more accurate classifications at the mere expense of longer running times and higher energy costs. This work finally proves that the neuromorphic approach can be efficiently used in real-world applications and has significant advantages over conventional digital devices when energy consumption is considered.

September 21, 2016

Deep learning inference possible in embedded systems thanks to TrueNorth

See the beautiful blog by Caroline Vespi on our PNAS paper.

September 20, 2016

Breaking News: Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing

Guest Blog by Steven K. Esser

We are excited to announce that our work demonstrating convolutional networks for energy-efficient neuromorphic hardware just appeared in the peer-reviewed Proceedings of the National Academy of Sciences (PNAS) of the United States of America. Here is an open-access link. In this work, we show near state-of-the-art accuracy on 8 datasets, while running at between 1200 and 2600 frames per second and using between 25 and 275 mW on the TrueNorth chip. Our approach adapts deep convolutional neural networks, today's state-of-the-art approach for machine perception, to perform classification tasks on neuromorphic hardware, such as TrueNorth. This work overcomes the challenge of learning a network that uses low precision synapses, spiking neurons and limited fan-in, constraints that allow for energy-efficient neuromorphic hardware design (including TrueNorth) but that are not imposed in conventional deep learning.

Low precision synapses make it possible to co-locate processing and memory in hardware, dramatically reducing the energy needed for data movement. Employing spiking neurons (which have only a binary state) further reduces data traffic and computation, leading to additional energy savings. However, contemporary convolutional networks use high precision (at least 32-bit) neurons and synapses. This high precision representation allows for very small changes to the neural network during learning, facilitating incremental improvements of the response to a given input during training. In this work, we introduce two constraints to the learning rule to bridge deep learning to energy-efficient neuromorphic hardware. First, we maintain a high precision "shadow" synaptic weight during training, but map this weight to a trinary value (-1, 0 or 1) implementable in hardware for operation as well as, critically, to determine how the network should change during the training itself. Second, we employ spiking neurons during both training and operation, but use the behavior of a high precision neuron to predict how the spiking neuron should change as network weights change.
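The shadow-weight idea can be illustrated with a single linear neuron. In this minimal sketch, the forward pass and the error are computed with trinary weights, while the update is applied to the high-precision shadow copy. The threshold of 0.5, the clipping range, the squared-error style update, and the function names are all assumptions made for the example; the actual training rule, including the treatment of spiking neurons, is described in the PNAS paper.

```python
def to_trinary(shadow, threshold=0.5):
    """Map a high-precision shadow weight to -1, 0 or 1."""
    if shadow > threshold:
        return 1
    if shadow < -threshold:
        return -1
    return 0


def train_step(shadow_weights, inputs, target, lr=0.25):
    """One update: forward pass with trinary weights, update shadow weights."""
    deployed = [to_trinary(w) for w in shadow_weights]
    output = sum(w * x for w, x in zip(deployed, inputs))
    error = output - target
    # The error comes from the trinary forward pass but is applied to the
    # high-precision shadow copy, so small changes can accumulate over steps.
    return [min(1.0, max(-1.0, w - lr * error * x))
            for w, x in zip(shadow_weights, inputs)]


shadow = [0.25, -0.25, 0.375]
inputs, target = [1.0, 1.0, 0.0], 1.0
for _ in range(3):
    shadow = train_step(shadow, inputs, target)
print(shadow)                           # [0.75, 0.25, 0.375]
print([to_trinary(w) for w in shadow])  # [1, 0, 0]
```

Note that the first update changes no deployed weight at all; only the shadow values move. Without the high-precision copy, that small step would be lost to rounding.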

Limiting neuron fan-in (the number of inputs each neuron can receive) allows for a crossbar design in hardware that reduces spike traffic, again leading to lower energy consumption. However, in deep learning no explicit restriction is placed on inputs per neuron, which allows networks to scale while still remaining integrated. In this work, we enforce this connectivity constraint by partitioning convolutional filters into multiple groups. We maintain network integration by interspersing layers with large topographic coverage but small feature coverage with those with small topographic coverage but large feature coverage. Essentially, this allows the network to learn complicated spatial features, and then to learn useful mixtures of those features.
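To see how grouping filters bounds fan-in, the sketch below computes the inputs per neuron for a grouped convolution and finds the smallest number of groups that stays under a limit. The limit of 128 and the layer shapes are illustrative values chosen for the example, not figures from this post, and the helper names are hypothetical.

```python
def fan_in(kernel_h, kernel_w, in_channels, groups):
    """Inputs per output neuron for a grouped convolution."""
    if in_channels % groups:
        raise ValueError("in_channels must be divisible by groups")
    return kernel_h * kernel_w * (in_channels // groups)


def min_groups(kernel_h, kernel_w, in_channels, limit):
    """Smallest group count that divides in_channels and respects the limit."""
    for groups in range(1, in_channels + 1):
        if (in_channels % groups == 0
                and fan_in(kernel_h, kernel_w, in_channels, groups) <= limit):
            return groups
    raise ValueError("kernel alone exceeds the fan-in limit")


LIMIT = 128  # illustrative fan-in limit, not a figure from this post
print(min_groups(3, 3, 256, LIMIT))   # 32 groups: 3x3 kernel, 8 features each
print(min_groups(1, 1, 256, LIMIT))   # 2 groups: 1x1 kernel, 128 features each
```

The two calls mirror the alternation described above: a 3x3 layer sees a wide spatial patch but few features per group, while a 1x1 layer sees a single location but mixes many features.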

Our results show that the operational and structural differences between neuromorphic computing and deep learning are not fundamental and indeed, that near state-of-the-art performance can be achieved in a low-power system by merging the two. In a previous post, we mentioned one example of an application running convolutional networks on TrueNorth, trained with our method, to perform real-time gesture recognition.

The algorithm is now in the hands of over 130 users at over 40 institutions, and we are excited to discover what else might now be possible!

September 01, 2016

Career Opportunity: Brain-inspired Computing

Apply here.

August 23, 2016

Samsung turns IBM's brain-like chip into a digital eye

On the occasion of the 30th Anniversary of IBM Research - Almaden, Samsung, Air Force Research Lab, and Lawrence Livermore National Lab presented their results using the TrueNorth Ecosystem. Here is an article written by a reporter who was in the audience. A video of the session will be posted soon.

During the same event, Professor Tobi Delbruck presented the Misha Mahowald Prize to the IBM TrueNorth Team. See here.

Misha Mahowald Prize

August 12, 2016

TrueNorth Accepted into the Computer History Museum

On the occasion of the 30th Anniversary of IBM Research - Almaden, TrueNorth was accepted into the Computer History Museum. Here is the IBM press release.



This weblog is licensed under a Creative Commons License.
The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.