Archives for 2016

Breaking News: Supercomputing 2016 Paper — TrueNorth Ecosystem for Brain-Inspired Computing

November 13, 2016 By dmodha

Guest Blog by Jun Sawada, Brian Taba, Pallab Datta, and Ben Shaw.

This week, in collaboration with Lawrence Livermore National Laboratory, U.S. Air Force Research Laboratory, and U.S. Army Research Laboratory, IBM Research is publishing the newest paper describing the TrueNorth ecosystem in the International Conference for High Performance Computing, Networking, Storage and Analysis (SC16). SC16 is one of the most prestigious conferences in the HPC (high performance computing) field. Please download the paper below.

Download Paper

This paper describes the result of multi-year effort to create development tools and an ecosystem around the TrueNorth neurosynaptic chip. Today, we have an entire stack of hardware, firmware, software, training algorithms and applications, with active users at many university, corporate and government laboratories implementing neural computation through deep-learning and convolution networks. The diagram below, a figure from the paper, shows the entire ecosystem stack based on the TrueNorth neurosynaptic chip.

Stack

No chip will attract significant usage without a good hardware and software ecosystem. Typically when a chip maker comes out with a new processor, they can rely on existing development tools, software and chip sets, so that it is seldom necessary to build an entirely new ecosystem from scratch. However, TrueNorth was a totally new architecture for neurosynaptic computation which required building many new tools from the ground up. We have worked over the past several years to build an ecosystem, and today the toolset allows users to create deep-learning network applications for TrueNorth systems with little more than the click of a button. The paper describes these tools and the ecosystem that empowers users of the TrueNorth neurosynaptic chip.

TrueNorth Hardware
The paper describes the TrueNorth-based hardware systems we have developed: (a) a mobile evaluation board (NS1e), (b) a scale-out parallel neuromorphic server (NS1e-16), and (c) a scale-up system for running larger neural network (NS16e). The diagram below shows each system and illustrates its internal architecture.

Hardware Schematics

The schematics show TrueNorth chips in green, FPGA programmable logic in blue, CPU’s in orange, and network in purple. Each system runs with control software on CPU talking to TrueNorth chips with the help of programmable logic in FPGA. For details, please see the paper.

The NS1e system is an index card-sized, mobile system consisting of a single TrueNorth chip and a Xilinx Zynq-7000 system-on-a-chip. Although it is a tiny system it has proven to be our workhorse. Many of our university partners use NS1e to run and test neural networks they create for the TrueNorth architecture.

NS1e

The NS1e-16 is a scale-out system running many neurosynaptic chips in parallel. It is a collection of 16 NS1e single-chip systems with an Ethernet backbone, as shown in the diagram above. When a request to execute a neural network job comes, the gateway machine picks an available NS1e automatically and dispatches the job to it. NS1e-16 is really a small neural network data-center in a box.

NS1e-16

Finally NS16e, the scale-up system, has a tightly connected 16-chip TrueNorth array. It can run a neural network using up to 16 million neurons and 4 billion synapses. This system can run image recognition tasks (CIFAR 10, CIFAR 100) with near state-of-the-art accuracy at over 1000 frames per second.

NS16e

Work Flow of Software Ecosystem
To design applications for TrueNorth, we have built a rich stack of development tools, such as our Eedn framework for developing energy-efficient deep neuromorphic networks. This generates convolutional neural networks (CNNs) that run natively in TrueNorth hardware to achieve near-state-of-art classification accuracy in real-time at very low power.

Figure below sketches a typical workflow for developing streaming Eedn applications like the gesture-recognition application we showed at the CVPR Industry Expo, or the television remote-control application we developed in collaboration with Samsung.

Workflow

The runtime flow is shown on the top row (a)-(f). A sensor (a) produces a stream of input data. It may be a spiking sensor such as a Dynamic Vision Sensor. The original sensor data is often transformed into more specialized features (b) by cropping, filtering, or other transformation, before being encoded as a stream of input spikes (c), and sent to TrueNorth chip (d). The output of TrueNorth is decoded (e) and sent to downstream application (f).

Flow for designing an Eedn application is shown on the bottom row (g)-(l). The fundamental task is to configure the TrueNorth chip with a set of network parameters. Starting from a dataset (g), the Eedn trainer (h) uses a GPU to train a CNN within the constraints of the TrueNorth architecture. The trained neural network is built into a TrueNorth model (i) and then mapped to hardware (j). Finally, it uses simulation (k) and analysis tools (l) to improve the quality of the neural network generation.

Core Placement Problem
A TrueNorth model build at (i) is a purely logical representation of a network of neurosynaptic cores. However, to configure actual hardware, every logical core in the network must be mapped to a unique physical (X, Y) location on a TrueNorth chip by a process called placement (j)). Placement optimization is a critical issue especially for large multi-chip networks for the NS16e.

TrueNorth uses a dimension-order router, in which spikes first travel horizontally (east-west) and then vertically (north-south) between their source and destination. Figure below shows an example of routing spikes within and across TrueNorth chips.

Routing

The objective of the placement is to minimize the sum of all the paths from source neurons to destination neurons, reduce the overall active power of the system, and improve the throughput. It implicitly attempts to pack all the tightly connected cores on the same chip. This problem is typically known to be NP-Hard. We developed a new NeuroSynaptic Core Placement (NSCP) Algorithm that maps the neurosynaptic cores efficiently onto the hardware substrate. The algorithm places the input neuron layer first, by collocating cores that process neighboring regions of the input signal. It then iteratively places cores in each consecutive network layer.

Lastly, a movie generated by a graph analysis tool shows how a multi-layer convolution neural network is placed on a 16-chip TrueNorth array. The visualization depicts core to core communication as edges, and illustrates how different layers of a multi-layer network are actually placed. Early layer computation is done locally while the information is shared between chips in higher layers. Some images from the movies are enclosed below.

Placement Viz 1

Placement Viz 2

Placement Viz 3

Our Partners
As we build out the TrueNorth platform, tool chain and software ecosystem, we rely on a number of distinguished labs and other ecosystem partners to illuminate and help explore the space of TrueNorth’s application potential.

Since we held our first TrueNorth Boot Camp in August 2015 with a core group of 65 committed early adopters, our User Community has expanded to include over 150 members from 25 Universities, 7 Government labs and 3 National labs, as well as 8 corporate research centers around the world.

TrueNorth’s extremely low power profile and guaranteed real-time operation make it a natural fit for applications ranging from mobile/embedded devices to HPC and supercomputing. Table 1 from our paper lists various research areas currently under study by our ecosystem partners. The range and variety convey some idea of the space of TrueNorth application potential our partners have begun to explore.

TrueNorth Partners
Table of Our TrueNorth Ecosystem Partners

In May, we held a Reunion of our BootCamp community to showcase some of the amazing things that have been accomplished in just a few months. In addition to algorithmic and toolchain developments from our team, 16 posters and stage presentations were delivered by ecosystem partners. These projects illustrate a range of research topics and are a testament to the motivation and energy of our pioneering TrueNorth developers.

Three of our partners, Army Research Lab, Air Force Research Lab and Lawrence Livermore National Lab, contributed sections to the Supercomputing paper, each application showcases a different TrueNorth system.

Army Research Lab prototyped a computational offloading scheme to illustrate how TrueNorth’s low power profile might enable computation at the point of data collection. Using the single-chip NS1e board and an android tablet, ARL researchers created a demonstration system that allows visitors to their lab to hand write arithmetic expressions on the tablet, with handwriting streamed to the NS1e for character recognition and recognized characters sent back to the tablet for arithmetic calculation. Of course the point here is not to make a handwriting calculator, it is to show how TrueNorth’s low power and real time pattern recognition might be deployed at the point of data collection to reduce latency, complexity and transmission bandwidth, as well as back end data storage requirements in distributed systems.

Tablet Calculator
Tablet Handwriting Calculator based on TrueNorth

Air Force Research Lab contributed another prototype application utilizing a TrueNorth scale-out system to perform a data-parallel text extraction and recognition task. In this application, an image of a document is segmented into individual characters that are streamed to AFRL’s NS1e-16 TrueNorth system for parallel character recognition. Classification results are then sent to an inference-based natural language model to reconstruct words and sentences. This system is able to process 16,000 characters per second–about six times more characters than are in this section so far. AFRL plans to eventually implement the word and sentence inference algorithms on TrueNorth as well.

Document Processor
Parallel hand-written documents recognition.

Lawrence Livermore National Lab has taken delivery of a sixteen-chip scale up system to explore the potential of post-von Neumann computation through larger neural models and more complex algorithms enabled by the native tiling characteristics of the TrueNorth chip. For the Supercomputing paper, they contributed a single-chip application performing in-situ process monitoring in an additive manufacturing process. LLNL trained a TrueNorth network to recognize 7 classes related to track weld quality in welds produced by a selective laser melting machine. Real-time weld quality determination allows for closed loop process improvement and immediate rejection of defective parts. This is one of several applications LLNL is developing to showcase TrueNorth as a scalable platform for low-power, real time inference.

SLM

Manufacturing defects detected by TrueNorth classifier

It is a great honor for our team to collaborate with the many and varied members of our partner ecosystem. We are inspired by their energy, ingenuity and passion, as we set out to explore the potential of TrueNorth together!

2016 Scientist of the Year By R&D Magazine

November 8, 2016 By dmodha

Cover

This is the 51st Anniversary of the Award. Previous awardees include John Bardeen, Nobel Prize; Tim Berners-Lee, director of the World Wide Web Consortium; Steve Chu, Nobel Prize; Ralph Gomory, Former IBM SVP of Science & Technology; Leroy Hood, Human Genome Pioneer; Bill Joy, founder of Sun Microsystems; Kary Mullis, Nobel Prize; Calvin Quate, Atomic force microscopy; Justin Rattner, Former Director of Intel Labs; J. Craig Venter, Human Genome Pioneer; and Stephen Wolfram, Founder & CEO of Wolfram Research.

First and foremost, I am most grateful to my colleagues whose collaboration and dedication has been the key to our shared success. I am grateful to IBM Research, IBM, U.S. Department of Defense (Defense Advanced Research Projects Agency, Air Force Research Lab, Army Research Lab) and U.S. Department of Energy (Lawrence Livermore National Lab and Lawrence Berkeley National Lab) for their support of this work over the last 12 years. I am grateful to 150+ researchers at 40+ universities who are experimenting with TrueNorth Ecosystem. I am grateful to my mentors. I am grateful to my alma maters IIT Bombay and UCSD and my teachers at both schools. I am grateful to my family and friends for their love and support.

The cover picture is credit to Hita Bambhania-Modha. Her professional Facebook page https://www.facebook.com/hita.life/ just launched. And, don’t miss her website http://www.hita.life.

Breaking News: Energy-Efficient Neuromorphic Classifiers

September 23, 2016 By dmodha

A paper entitled “Energy-Efficient Neuromorphic Classifiers” was published this week in Neural Computation by Daniel Martí, Mattia Rigotti, Mingoo Seok, and Stefano Fusi.

Here is the abstract:

Neuromorphic engineering combines the architectural and computational principles of systems neuroscience with semiconductor electronics, with the aim of building efficient and compact devices that mimic the synaptic and neural machinery of the brain. The energy consumptions promised by neuromorphic engineering are extremely low, comparable to those of the nervous system. Until now, however, the neuromorphic approach has been restricted to relatively simple circuits and specialized functions, thereby obfuscating a direct comparison of their energy consumption to that used by conventional von Neumann digital machines solving real-world tasks. Here we show that a recent technology developed by IBM can be leveraged to realize neuromorphic circuits that operate as classifiers of complex real-world stimuli. Specifically, we provide a set of general prescriptions to enable the practical implementation of neural architectures that compete with state-of-the-art classifiers. We also show that the energy consumption of these architectures, realized on the IBM chip, is typically two or more orders of magnitude lower than that of conventional digital machines implementing classifiers with comparable performance. Moreover, the spike-based dynamics display a trade-off between integration time and accuracy, which naturally translates into algorithms that can be flexibly deployed for either fast and approximate classifications, or more accurate classifications at the mere expense of longer running times and higher energy costs. This work finally proves that the neuromorphic approach can be efficiently used in real-world applications and has significant advantages over conventional digital devices when energy consumption is considered.

Deep learning inference possible in embedded systems thanks to TrueNorth

September 21, 2016 By dmodha

See the beautiful blog by Caroline Vespi on our PNAS paper “Convolutional networks for fast, energy-efficient neuromorphic computing“.

Breaking News: Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing

September 20, 2016 By dmodha

Guest Blog by Steven K. Esser

We are excited to announce that our work demonstrating convolutional networks for energy-efficient neuromorphic hardware just appeared in the peer-reviewed Proceedings of the National Academy of Sciences (PNAS) of the United States of America. Here is an open-access link. In this work, we show near state-of-the-art accuracy on 8 datasets, while running at between 1200 and 2600 frames per second and using between 25 and 275 mW running on the TrueNorth chip. Our approach adapts deep convolutional neural networks, today’s state-of-the-art approach for machine perception, to perform classification tasks on neuromorphic hardware, such as TrueNorth. This work overcomes the challenge of learning a network that uses low precision synapses, spiking neurons and limited-fan in, constraints that allow for energy-efficient neuromorphic hardware design (including TrueNorth) but that are not imposed in conventional deep learning.

Low precision synapses make it possible to co-locate processing and memory in hardware, dramatically reducing the energy needed for data movement. Employing spiking neurons (which have only a binary state) further reduces data traffic and computation, leading to additional energy savings. However, contemporary convolutional networks use high precision (at least 32-bit) neurons and synapses. This high precision representation allows for very small changes to the neural network during learning, facilitating incremental improvements of the response to a given input during training. In this work, we introduce two constraints to the learning rule to bridge deep learning to energy-efficient neuromorphic hardware. First, we maintain a high precision “shadow” synaptic weight during training, but map this weight to a trinary value (-1, 0 or 1) implementable in hardware for operation as well as, critically, to determine how the network should change during the training itself. Second, we employ spiking neurons during both training and operation, but use the behavior of a high precision neuron to predict how the spiking neuron should change as network weights change.

Limiting neuron fan-in (the number of inputs each neuron can receive) allows for a crossbar design in hardware that reduces spike traffic, again leading to lower energy consumption. However, in deep learning no explicit restriction is placed on inputs per neuron, which allows networks to scale while still remaining integrated. In this work, we enforce this connectivity constraint by partitioning convolutional filters into multiple groups. We maintain network integration by interspersing layers with large topographic coverage but small feature coverage with those with small topographic coverage but large feature coverage. Essentially, this allows the network to learn complicated spatial features, and then to learn useful mixtures of those features.

Our results show that the operational and structural differences between neuromorphic computing and deep learning are not fundamental and indeed, that near state-of-the-art performance can be achieved in a low-power system by merging the two. In a previous post, we mentioned one example of an application running convolutional networks on TrueNorth, trained with our method, to perform real-time gesture recognition.

The algorithm is now in the hands of over 130 users at over 40 institutions, and we are excited to discover what else might now be possible!