Please see a blog post by Brian Taba on our CVPR 2017 paper “A Low Power, Fully Event-Based Gesture Recognition System”. Here is a PDF of the paper “A Low Power, Fully Event-Based Gesture Recognition System”. A video of the system in action follows:
Photo Credit: Hita Bambhania-Modha
On June 17, 2017, Dean Al Pisano invited me to present the Keynote Speech at the Ring Ceremony at UCSD’s Jacobs School of Engineering. Enclosed is a transcript of my remarks.
Congratulations class of 2017!
I am honored to share this pivotal day in your life
with you, your families, and your friends.
And I want to thank Dean Pisano for inviting me here,
as well as the distinguished faculty and my UCSD mentors
who have all helped shape who I am today.
As you graduate from the Engineering school,
there is a blank canvas in front of you.
The space of that canvas is the Earth and its vicinity,
and the time of the canvas is your individual life span.
On this blank canvas,
we engineer, not only, devices, materials, systems, structures, and processes,
but we also engineer
our own careers and lives,
so as to manifest strength, utility, and beauty.
The recipe for success, I believe, is three-fold:
- first, identify external gradients towards your inner purpose;
- second, capitalize on inherent opportunities presented by these gradients
using your most authentic self;
- and, lastly, engineer the means for exploiting these gradients
the chief villain in our lives,
namely, the 2nd Law of thermodynamics.
First, let us talk about gradients.
A gradient or an imbalance
is simply a difference across a distance.
For example, think of differences in
temperature, pressure, chemical concentration, voltage, incomes, etc.
The gradients are the sources of opportunity.
When Yoda from Star Wars said, “Feel the force”,
he meant, “Feel the gradients.”
A water wheel
converts the energy gradient
of water flowing from high to low
into useful work.
Similarly, our intent is to harness the external gradients
that exist in the society and the universe—
social, economic, political, technological, physical gradients—
to create beneficial structures
and to manifest constructive complexity.
Unlocking and harnessing these gradients
requires us to apply
the infinite and inexhaustible tools of
creativity, awareness, and imagination
while leading us to discovering and extending
the frontiers of mathematics, science, and technology in the process.
In my case,
the gradient that led to the notion of brain-inspired computers
was the observation
that there was a billion-fold disparity between the function, the size, the energy, and the speed of the brain as compared to today’s computers.
Second, let us talk about purpose.
Discovery of external frontiers,
first and foremost,
starts with the internal discovery
of our own authentic self.
From this place of inner integrity,
we pick problems of universal importance
and establish audacious goals to solve them
while matching these goals to our specific individual gifts.
We then work backwards from the end goals
and chart a course to achieve these goals.
As facts change,
we never compromise on the destination
but continually revise the path.
In any situation,
we do not react
but rather we consciously act
because there is always room for creative response.
In every moment,
in every interaction,
in every relationship,
we bring all the positivity of our entire existence to bear —
and then we do it again
To truly win,
we put not just our skin in the game—
rather, we put our soul in the game.
While it’s important to strive to succeed at work,
it’s equally important to maintain work-life balance
and choose an inner state of happiness
despite life’s paradoxes and challenges.
And, regardless of success or failure,
we win, personally, by finishing what we start.
all of you have demonstrated
that you are winners!
Third, let us talk about the villain.
The 2nd Law of thermodynamics essentially says that if a hot room is connected to a cold room then over time the temperature difference will disappear.
So, the 2nd Law of thermodynamics
serves to efface all gradients over time
leaving increased entropy.
Left to its own un-engineered devices,
the 2nd Law will produce only heat and waste.
It is not possible,
to fight or defy the second law
at a global, macroscopic level,
but within the confines of local space and time
it is indeed possible
to engineer means
by which gradients produce useful work.
had to purposefully do the hard work
of inventing and perfecting
to exploit the potential energy of water
that otherwise would have remained stagnant.
The 2nd Law will have its way eventually.
The waterwheel, for example, requires maintenance
to keep running
and ultimately will decay and descend into ruin.
But while it lasts,
it will enhance human life
and serve as a stepping stone
to greater progress.
This is the eternal essence of engineering.
This is why
fighting the 2nd Law is
in my mind,
symbolizes our resolve
to courageously stand up
to the 2nd Law in all its manifestations.
So, in conclusion,
that we meet the 2nd Law of thermodynamics,
let us rub our magic rings,
and let us look at the 2nd Law in the eye,
you are dealing
with a graduate
of UCSD’s Jacobs School of Engineering!
Congratulations again, my friends,
and I wish you the very best of luck!
Today, the Air Force Research Lab (AFRL) and IBM announce the development of a new Scale-out, Scale-up Synaptic Supercomputer (NS16e-4) that builds on the previous NS16e system for LLNL. Over the last six years, IBM has expanded the number of neurons per system from 256 to more than 64 million – roughly an eightfold increase every year for six years!
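The yearly growth implied by these two neuron counts can be checked with a short calculation. The sketch assumes growth compounds once per year over exactly six periods, which is one reading of “over six years” and not something the announcement spells out.

```python
# Neuron counts quoted in the announcement above.
start_neurons = 256
end_neurons = 64_000_000
years = 6  # assumption: six yearly compounding periods

total_factor = end_neurons / start_neurons    # overall growth factor
annual_factor = total_factor ** (1 / years)   # geometric mean growth per year

print(f"overall growth: {total_factor:,.0f}x")
print(f"per-year growth: about {annual_factor:.1f}x")
```

The per-year factor comes out just under 8, since 8 to the sixth power is 262,144, slightly more than the overall factor of 250,000.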
Enclosed below is a perspective from my colleagues.
Guest Blog by Bill Risk, Camillo Sassano, Mike DeBole, Ben Shaw, Aaron Cox, and Kevin Schultz.
The IBM TrueNorth Neurosynaptic System NS16e-4 is the latest hardware innovation designed to exploit the capabilities of the TrueNorth chip. Through a combination of “scaling out” and “scaling up,” we now have increased the size of TrueNorth-based neurosynaptic systems by 64x from the original single-chip systems (Figure 1).
Figure 1. Scale-out and scale-up of systems based on the TrueNorth chip.
The NS1e board included a single TrueNorth chip and was designed to jump-start learning about and using the TrueNorth system. The first “scale out” step, the NS1e-16, put sixteen of these boards in a single enclosure, which permitted running multiple jobs in parallel. The first “scale up” system, the NS16e, exploited the built-in ability of the TrueNorth chip to tile seamlessly and communicate directly with other TrueNorth chips. Although the NS1e-16 and the NS16e offered the same number of neurons and synapses, the NS1e-16 was essentially 16 separate 1-million-neuron systems working in parallel, whereas the NS16e was a single 16-million-neuron system, allowing the exploration of substantially larger neurosynaptic networks. The NS16e-4 is the next step in this evolution, bringing four NS16e systems together in parallel to provide 64 million neurons and 16 billion synapses in a single enclosure.
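As a rough cross-check of these totals, the sketch below multiplies out per-chip capacity. The per-core figures (4,096 cores per chip, 256 neurons per core, and a 256 x 256 synapse crossbar per core) come from IBM's published descriptions of the TrueNorth chip rather than from this post, so treat them as assumptions here.

```python
# Per-chip organization (assumed from published TrueNorth descriptions, not this post).
CORES_PER_CHIP = 4096
NEURONS_PER_CORE = 256
SYNAPSES_PER_CORE = 256 * 256   # one crossbar: 256 inputs x 256 neurons

def capacity(num_chips):
    """Return (neurons, synapses) for a system built from num_chips chips."""
    neurons = num_chips * CORES_PER_CHIP * NEURONS_PER_CORE
    synapses = num_chips * CORES_PER_CHIP * SYNAPSES_PER_CORE
    return neurons, synapses

systems = {"NS1e": 1, "NS16e": 16, "NS16e-4": 64}
for name, chips in systems.items():
    neurons, synapses = capacity(chips)
    print(f"{name:8s} {chips:3d} chips  {neurons:>13,d} neurons  {synapses:>15,d} synapses")
```

Under these assumptions the 64-chip NS16e-4 comes to 67,108,864 neurons and 17,179,869,184 synapses, which matches the “64 million” and “16 billion” in the text when counted in powers of two.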
As announced today, the U.S. Air Force Research Lab has ordered the first NS16e-4 system. To deliver this system, we needed to devise a way to put four NS16e systems in the same enclosure, subject to the following constraints:
- It must fit in a 4U-high (7”) by 29” deep enclosure that can be mounted in a standard equipment rack
- All related components required by the system—power supplies, cabling, connectors, etc., (other than a separate 2U server that acts as a gateway for the system)—must be contained within the same 4U space
- It must be possible to easily remove each NS16e sub-system so that it can be serviced, transferred to a different identical enclosure, or used independently as a standalone system
Given the size of the previous enclosure, it was not feasible to simply put four NS16e systems in a 4U-high box and meet these constraints. Instead, we first had to redesign some aspects of the NS16e circuit boards to permit a more compact form factor. In concert, we redesigned the NS16e case to match the new form factor, while retaining the original design’s signature angular shapes.
This smaller form factor allowed us to consider several different ways that four NS16e sub-systems could be efficiently placed in the allotted space. The one we settled on places them in a unique V-shaped arrangement in a drawer (Figure 2a), which, when extended, provides easy access to the individual NS16e sub-systems. Empty spaces under the V provide room to route cables and move air for ventilation. A docking structure (not visible in Figure 2) holds the NS16e sub-systems in place when in use, provides power and signal connections, and permits them to be released and removed when necessary. This arrangement provides a view of all 64 TrueNorth chips. A transparent window and interior accent lighting permit the chips to be seen even when the drawer is closed (Figure 2b). Multiple NS16e-4 systems can be placed in the same rack; the novel front panel shape creates an intriguing 3D geometric pattern (Figure 3).
Figure 2a. NS16e-4 system with drawer open.
Figure 2b. NS16e-4 system with drawer closed.
Meeting all the technical requirements called for creative industrial design. We designed for utility, but were pleased that a measure of beauty and elegance emerged through the design process. We now look forward to building and delivering it!
Figure 3. Two 4U-high NS16e-4 systems stacked above a 2U-high server.
The following timeline provides context for today’s milestone in terms of the continued evolution of our project.
Illustration Credit: William Risk
Guest Blog by Jun Sawada, Brian Taba, Pallab Datta, and Ben Shaw.
This week, in collaboration with Lawrence Livermore National Laboratory, U.S. Air Force Research Laboratory, and U.S. Army Research Laboratory, IBM Research is publishing the newest paper describing the TrueNorth ecosystem at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC16). SC16 is one of the most prestigious conferences in the HPC (high performance computing) field. Please download the paper below.
This paper describes the result of a multi-year effort to create development tools and an ecosystem around the TrueNorth neurosynaptic chip. Today, we have an entire stack of hardware, firmware, software, training algorithms and applications, with active users at many university, corporate and government laboratories implementing neural computation through deep-learning and convolutional networks. The diagram below, a figure from the paper, shows the entire ecosystem stack based on the TrueNorth neurosynaptic chip.
No chip will attract significant usage without a good hardware and software ecosystem. Typically when a chip maker comes out with a new processor, they can rely on existing development tools, software and chip sets, so that it is seldom necessary to build an entirely new ecosystem from scratch. However, TrueNorth was a totally new architecture for neurosynaptic computation which required building many new tools from the ground up. We have worked over the past several years to build an ecosystem, and today the toolset allows users to create deep-learning network applications for TrueNorth systems with little more than the click of a button. The paper describes these tools and the ecosystem that empowers users of the TrueNorth neurosynaptic chip.
The paper describes the TrueNorth-based hardware systems we have developed: (a) a mobile evaluation board (NS1e), (b) a scale-out parallel neuromorphic server (NS1e-16), and (c) a scale-up system for running larger neural networks (NS16e). The diagram below shows each system and illustrates its internal architecture.
The schematics show TrueNorth chips in green, FPGA programmable logic in blue, CPUs in orange, and the network in purple. In each system, control software running on a CPU talks to the TrueNorth chips with the help of programmable logic in an FPGA. For details, please see the paper.
The NS1e system is an index-card-sized mobile system consisting of a single TrueNorth chip and a Xilinx Zynq-7000 system-on-a-chip. Although it is a tiny system, it has proven to be our workhorse. Many of our university partners use the NS1e to run and test neural networks they create for the TrueNorth architecture.
The NS1e-16 is a scale-out system running many neurosynaptic chips in parallel. It is a collection of 16 NS1e single-chip systems with an Ethernet backbone, as shown in the diagram above. When a request to execute a neural network job arrives, the gateway machine automatically picks an available NS1e and dispatches the job to it. The NS1e-16 is really a small neural network data center in a box.
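The dispatch behavior described above can be pictured with a toy model. This is only a sketch of the idea of handing each job to any idle board. The `Gateway` class, its method names, and the first-in-first-out choice of idle board are all hypothetical and are not taken from the actual gateway software.

```python
from collections import deque

class Gateway:
    """Toy stand-in for the gateway machine (hypothetical, not IBM's software)."""

    def __init__(self, num_boards=16):
        self.free = deque(range(num_boards))  # ids of idle NS1e boards
        self.running = {}                     # board id -> job name

    def dispatch(self, job):
        """Give the job to any idle board; return its id, or None if all are busy."""
        if not self.free:
            return None
        board = self.free.popleft()
        self.running[board] = job
        return board

    def finish(self, board):
        """Mark a board idle again once its job completes."""
        self.running.pop(board)
        self.free.append(board)

gateway = Gateway()
assigned = [gateway.dispatch(f"job-{i}") for i in range(17)]
print(assigned)  # sixteen board ids, then None for the job that found no idle board
```

A real gateway would presumably queue the seventeenth job rather than reject it; the sketch returns `None` only to keep the example short.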
Finally, the NS16e, the scale-up system, has a tightly connected 16-chip TrueNorth array. It can run a neural network using up to 16 million neurons and 4 billion synapses. This system can run image recognition tasks (CIFAR-10, CIFAR-100) with near state-of-the-art accuracy at over 1000 frames per second.
Work Flow of Software Ecosystem
To design applications for TrueNorth, we have built a rich stack of development tools, such as our Eedn framework for developing energy-efficient deep neuromorphic networks. Eedn generates convolutional neural networks (CNNs) that run natively on TrueNorth hardware to achieve near-state-of-the-art classification accuracy in real time at very low power.
The figure below sketches a typical workflow for developing streaming Eedn applications like the gesture-recognition application we showed at the CVPR Industry Expo, or the television remote-control application we developed in collaboration with Samsung.
The runtime flow is shown on the top row (a)-(f). A sensor (a) produces a stream of input data. It may be a spiking sensor such as a Dynamic Vision Sensor. The original sensor data is often transformed into more specialized features (b) by cropping, filtering, or other transformations, before being encoded as a stream of input spikes (c) and sent to the TrueNorth chip (d). The output of TrueNorth is decoded (e) and sent to a downstream application (f).
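To make stages (b) through (e) concrete, here is a toy version of the runtime flow. Everything in it is an illustrative assumption: the function names, the simple rate code, and especially `stand_in_chip`, which is a placeholder rule and not a model of how TrueNorth computes.

```python
import math

def extract_features(frame, top, left, size):
    """(b) Crop a square patch from a 2-D frame given as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]

def encode_spikes(features, ticks):
    """(c) Rate-code each value in [0, 1]: a value v yields floor(ticks * v) spikes."""
    flat = [v for row in features for v in row]
    return [[int(math.floor((t + 1) * v) > math.floor(t * v)) for v in flat]
            for t in range(ticks)]

def stand_in_chip(spike_frames, num_outputs):
    """(d) Placeholder for the chip: output k spikes on a tick if any input
    whose index satisfies index % num_outputs == k spiked on that tick."""
    out = []
    for frame in spike_frames:
        tick = [0] * num_outputs
        for i, spike in enumerate(frame):
            if spike:
                tick[i % num_outputs] = 1
        out.append(tick)
    return out

def decode(output_frames):
    """(e) Return the index of the output neuron that spiked most often."""
    counts = [sum(column) for column in zip(*output_frames)]
    return counts.index(max(counts))

frame = [[0, 0,    0,   0],
         [0, 0.25, 1.0, 0],
         [0, 0.25, 1.0, 0],
         [0, 0,    0,   0]]
spikes = encode_spikes(extract_features(frame, top=1, left=1, size=2), ticks=4)
label = decode(stand_in_chip(spikes, num_outputs=2))
print(label)
```

In a real deployment, stage (d) would be the configured TrueNorth network and stage (f) would consume `label`.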
The flow for designing an Eedn application is shown on the bottom row (g)-(l). The fundamental task is to configure the TrueNorth chip with a set of network parameters. Starting from a dataset (g), the Eedn trainer (h) uses a GPU to train a CNN within the constraints of the TrueNorth architecture. The trained neural network is built into a TrueNorth model (i) and then mapped to hardware (j). Finally, simulation (k) and analysis tools (l) are used to improve the quality of the neural network generation.
Core Placement Problem
A TrueNorth model built at (i) is a purely logical representation of a network of neurosynaptic cores. However, to configure actual hardware, every logical core in the network must be mapped to a unique physical (X, Y) location on a TrueNorth chip by a process called placement (j). Placement optimization is a critical issue, especially for large multi-chip networks on the NS16e.
TrueNorth uses a dimension-order router, in which spikes first travel horizontally (east-west) and then vertically (north-south) between their source and destination. The figure below shows an example of routing spikes within and across TrueNorth chips.
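A minimal sketch of dimension-order routing follows. It assumes cores are addressed by global (x, y) coordinates across a tiled array and that each chip is a 64 x 64 grid of cores; the grid size comes from published TrueNorth descriptions rather than from this post, and the function names are hypothetical.

```python
CHIP_SIDE = 64  # assumption: each chip holds a 64 x 64 grid of cores

def route(src, dst):
    """Return the cores a spike visits from src to dst: all X hops first, then Y."""
    (x, y), (dst_x, dst_y) = src, dst
    path = [(x, y)]
    step = 1 if dst_x >= x else -1
    while x != dst_x:          # horizontal (east-west) leg
        x += step
        path.append((x, y))
    step = 1 if dst_y >= y else -1
    while y != dst_y:          # vertical (north-south) leg
        y += step
        path.append((x, y))
    return path

def chip_crossings(path):
    """Count hops that move a spike from one chip to a neighboring chip."""
    chips = [(x // CHIP_SIDE, y // CHIP_SIDE) for x, y in path]
    return sum(1 for a, b in zip(chips, chips[1:]) if a != b)

path = route((62, 0), (65, 1))
print(path)                  # [(62, 0), (63, 0), (64, 0), (65, 0), (65, 1)]
print(len(path) - 1)         # 4 hops, equal to the Manhattan distance
print(chip_crossings(path))  # 1 chip boundary crossed, between x = 63 and x = 64
```

Because the route is fully determined by source and destination, the hop count always equals the Manhattan distance between the two cores.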
The objective of placement is to minimize the sum of all path lengths from source neurons to destination neurons, reduce the overall active power of the system, and improve throughput. It implicitly attempts to pack tightly connected cores on the same chip. Problems of this kind are typically NP-hard. We developed a new NeuroSynaptic Core Placement (NSCP) algorithm that maps the neurosynaptic cores efficiently onto the hardware substrate. The algorithm places the input neuron layer first, collocating cores that process neighboring regions of the input signal. It then iteratively places cores in each consecutive network layer.
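The flavor of layer-by-layer placement can be shown with a small greedy heuristic. This is not the NSCP algorithm, whose details are in the paper. It is an illustrative stand-in that places layers in order and puts each core in the free slot closest to its already-placed sources, with total Manhattan distance as an assumed proxy for path length. All names are hypothetical.

```python
from itertools import product

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def wire_length(placement, edges):
    """Total hop count over all (source core, destination core) edges."""
    return sum(manhattan(placement[s], placement[d]) for s, d in edges)

def greedy_place(layers, edges, width, height):
    """Place layers in order; each core takes the free slot nearest its
    already-placed sources. Illustrative only, not the NSCP algorithm."""
    free = [(x, y) for y, x in product(range(height), range(width))]  # row-major
    placement = {}
    for layer in layers:
        for core in layer:
            sources = [placement[s] for s, d in edges if d == core and s in placement]
            best = min(free, key=lambda loc: sum(manhattan(loc, p) for p in sources))
            free.remove(best)
            placement[core] = best
    return placement

layers = [["a0", "a1"], ["b0", "b1"]]
edges = [("a0", "b0"), ("a1", "b0"), ("a1", "b1")]

placed = greedy_place(layers, edges, width=3, height=2)
naive = {"a0": (0, 0), "a1": (1, 0), "b0": (2, 0), "b1": (0, 1)}  # plain row-major fill

print(wire_length(placed, edges))  # 4
print(wire_length(naive, edges))   # 5
```

Input-layer cores have no placed sources, so they simply fill slots in row-major order, which keeps neighbors in the list next to each other on the grid.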
Lastly, a movie generated by a graph analysis tool shows how a multi-layer convolutional neural network is placed on a 16-chip TrueNorth array. The visualization depicts core-to-core communication as edges, and illustrates how different layers of a multi-layer network are actually placed. Early-layer computation is done locally, while information is shared between chips in higher layers. Some images from the movie are enclosed below.
As we build out the TrueNorth platform, tool chain and software ecosystem, we rely on a number of distinguished labs and other ecosystem partners to illuminate and help explore the space of TrueNorth’s application potential.
Since we held our first TrueNorth Boot Camp in August 2015 with a core group of 65 committed early adopters, our User Community has expanded to include over 150 members from 25 universities, 7 government labs and 3 national labs, as well as 8 corporate research centers around the world.
TrueNorth’s extremely low power profile and guaranteed real-time operation make it a natural fit for applications ranging from mobile/embedded devices to HPC and supercomputing. Table 1 from our paper lists various research areas currently under study by our ecosystem partners. The range and variety convey some idea of the space of TrueNorth application potential our partners have begun to explore.
Table of Our TrueNorth Ecosystem Partners
In May, we held a Reunion of our BootCamp community to showcase some of the amazing things that have been accomplished in just a few months. In addition to algorithmic and toolchain developments from our team, 16 posters and stage presentations were delivered by ecosystem partners. These projects illustrate a range of research topics and are a testament to the motivation and energy of our pioneering TrueNorth developers.
Three of our partners, Army Research Lab, Air Force Research Lab and Lawrence Livermore National Lab, contributed sections to the Supercomputing paper. Each application showcases a different TrueNorth system.
Army Research Lab prototyped a computational offloading scheme to illustrate how TrueNorth’s low power profile might enable computation at the point of data collection. Using the single-chip NS1e board and an Android tablet, ARL researchers created a demonstration system that allows visitors to their lab to handwrite arithmetic expressions on the tablet, with handwriting streamed to the NS1e for character recognition and recognized characters sent back to the tablet for arithmetic calculation. Of course, the point here is not to make a handwriting calculator; it is to show how TrueNorth’s low power and real-time pattern recognition might be deployed at the point of data collection to reduce latency, complexity and transmission bandwidth, as well as back-end data storage requirements in distributed systems.
Tablet Handwriting Calculator based on TrueNorth
Air Force Research Lab contributed another prototype application utilizing a TrueNorth scale-out system to perform a data-parallel text extraction and recognition task. In this application, an image of a document is segmented into individual characters that are streamed to AFRL’s NS1e-16 TrueNorth system for parallel character recognition. Classification results are then sent to an inference-based natural language model to reconstruct words and sentences. This system is able to process 16,000 characters per second–about six times more characters than are in this section so far. AFRL plans to eventually implement the word and sentence inference algorithms on TrueNorth as well.
Parallel recognition of handwritten documents.
Lawrence Livermore National Lab has taken delivery of a sixteen-chip scale-up system to explore the potential of post-von Neumann computation through larger neural models and more complex algorithms enabled by the native tiling characteristics of the TrueNorth chip. For the Supercomputing paper, they contributed a single-chip application performing in-situ process monitoring in an additive manufacturing process. LLNL trained a TrueNorth network to recognize 7 classes related to track weld quality in welds produced by a selective laser melting machine. Real-time weld quality determination allows for closed-loop process improvement and immediate rejection of defective parts. This is one of several applications LLNL is developing to showcase TrueNorth as a scalable platform for low-power, real-time inference.
Manufacturing defects detected by TrueNorth classifier
It is a great honor for our team to collaborate with the many and varied members of our partner ecosystem. We are inspired by their energy, ingenuity and passion, as we set out to explore the potential of TrueNorth together!
This is the 51st Anniversary of the Award. Previous awardees include John Bardeen, Nobel Prize; Tim Berners-Lee, director of the World Wide Web Consortium; Steve Chu, Nobel Prize; Ralph Gomory, Former IBM SVP of Science & Technology; Leroy Hood, Human Genome Pioneer; Bill Joy, founder of Sun Microsystems; Kary Mullis, Nobel Prize; Calvin Quate, Atomic force microscopy; Justin Rattner, Former Director of Intel Labs; J. Craig Venter, Human Genome Pioneer; and Stephen Wolfram, Founder & CEO of Wolfram Research.
First and foremost, I am most grateful to my colleagues, whose collaboration and dedication have been the key to our shared success. I am grateful to IBM Research, IBM, the U.S. Department of Defense (Defense Advanced Research Projects Agency, Air Force Research Lab, Army Research Lab) and the U.S. Department of Energy (Lawrence Livermore National Lab and Lawrence Berkeley National Lab) for their support of this work over the last 12 years. I am grateful to the 150+ researchers at 40+ universities who are experimenting with the TrueNorth Ecosystem. I am grateful to my mentors. I am grateful to my alma maters IIT Bombay and UCSD and my teachers at both schools. I am grateful to my family and friends for their love and support.
The cover picture is credited to Hita Bambhania-Modha. Her professional Facebook page https://www.facebook.com/hita.life/ just launched. And don’t miss her website http://www.hita.life.