Archives for 2025
EE Times Interview by Sunny Bains
Sunny Bains interviewed me for the Brains and Machines podcast. The conversation captures our journey through DARPA SyNAPSE, TrueNorth, and NorthPole. Listen here.
SiLQ: Simple Large Language Model Quantization-Aware Training
Thrilled to share the latest work from the IBM Research NorthPole team, pushing the cutting edge of quantized large language model performance. In a recent paper, we introduce a new quantization recipe and apply it to 8-billion-parameter Granite and Llama models. We demonstrate these models with 8-bit activations and KV cache and 4-bit weights, showing minimal accuracy degradation across three leaderboards spanning 20 distinct tasks.
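To make the recipe concrete: quantization-aware training typically simulates low-precision arithmetic during training by inserting quantize-dequantize ("fake quantization") operations into the forward pass. The sketch below is a minimal, generic illustration in PyTorch, not the SiLQ implementation; the tensor shapes are hypothetical, and the paper's actual recipe (scale handling, which layers and tensors are quantized, cache treatment) is more involved.

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int) -> torch.Tensor:
    """Symmetric per-tensor quantize-dequantize with a straight-through
    estimator, so gradients flow through the non-differentiable rounding."""
    qmax = 2 ** (num_bits - 1) - 1            # 7 for 4-bit, 127 for 8-bit
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    x_q = (x / scale).round().clamp(-qmax - 1, qmax) * scale
    # Forward pass uses the quantized value; backward treats it as identity.
    return x + (x_q - x).detach()

# Illustrative W4A8 linear layer: 4-bit weights, 8-bit activations.
weight = torch.randn(4096, 4096, requires_grad=True)  # hypothetical shape
activations = torch.randn(1, 4096)
output = fake_quantize(activations, 8) @ fake_quantize(weight, 4).t()
```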
Our method is accurate, outperforming all prior published quantization methods on the models and precisions examined; simple, reusing existing training code once quantization and knowledge distillation are added; and relatively low-cost, reusing existing training data or publicly available datasets while increasing the total training budget by less than 0.1%. We believe this will be a powerful enabling tool for deploying models on ultra-low-latency inference accelerators like NorthPole, greatly enhancing the performance of latency-critical applications such as interactive dialog and agentic workflows.
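On the knowledge-distillation side, the usual pattern is to add a loss term that pulls the quantized student's output distribution toward that of a frozen full-precision teacher, which is what lets an existing training loop be reused with only small changes. Again, this is a generic sketch rather than the paper's exact loss; `teacher`, `quantized_student`, and `batch` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 1.0):
    """KL divergence between the full-precision teacher's and the quantized
    student's next-token distributions."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.log_softmax(teacher_logits / t, dim=-1),
        log_target=True,
        reduction="batchmean",
    ) * (t * t)

# Inside an otherwise unchanged training loop (sketch):
# with torch.no_grad():
#     teacher_logits = teacher(batch)        # frozen full-precision model
# student_logits = quantized_student(batch)  # QAT model from the sketch above
# loss = distillation_loss(student_logits, teacher_logits)
# loss.backward(); optimizer.step()
```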
The paper, written with co-authors Jeffrey McKinstry, Deepika Bablani, Rathinakumar Appuswamy, and Dharmendra Modha, can be found here.