# Memory-Based Neuromorphic Hardware for Advanced Neural Network Models

D.B. Strukov, UC Santa Barbara

<u>Acknowledgments</u>: G. Adam, F. Alibart, M. Bavandpour, B. Chakrabarti, N. Do, J. Edwards, M. Graziano, X. Guo, B. Hoskins, I. Kataeva, M. Klachko, H. Kim, K. Likharev, M.R. Mahmoodi, F. Merrikh Bayat, H. Nili, M. Prezioso, S. Sahay, A. Vincent



## Part I. Introduction (Brief overview of neurocomputing, mixed-signal hardware for simpler models)

# **NEURAL NETWORK MODELS**



g/neural-network-zoo/

recommendation, analytics applications etc.)

## STATE-OF-THE-ART (ALEXNET) DEEP LEARNING HARDWARE: CUSTOM DIGITAL CIRCUITS



**Eyeriss** 

Y.H. Chen et al., ISSCC'16



B. Moons et al., ISSCC'17



UNPU

J. Lee et al., ISSCC'18





B. Zimmer et al., VLSI Symp.'19

|                                       | Eyeriss (2016) | Envision (2017)                       | UNPU (2018)                                        | NVidia (2019)  |
|---------------------------------------|----------------|---------------------------------------|----------------------------------------------------|----------------|
| Technology (nm)                       | 65 LP CMOS     | 28 UTBB FD-SOI                        | 65                                                 | 16             |
| Peak performance [GOPS]               | 67             | (1 to 4) × 10<sup>2</sup>             | 1382 (4) /345 (16)                                 | 4010 (8 bit)   |
| Active area [mm<sup>2</sup>]          | 12.25          | 1.87                                  | 16                                                 | 3.1            |
| Filter size                           | 1-1024         | any                                   | any                                                | any            |
| Precision [b]                         | 16             | 4-16                                  | 1-16                                               | 8              |
| Power [mW] @ frame rate [fps]         | 278 @ 34.7*    | 44 @ 47*                              | 297 @ ? *                                          | ?              |
| Min/max energy efficiency<br>[TOps/J] | 0.15 - 0.35    | 0.26 – 10<br>(~ <b>2.5</b> for 4-bit) | 50.6 (1 bit) / <b>11.6</b> (4 bit) / 3.08 (16 bit) | ~ 9.09 (8 bit) |

- Saturating improvements for purely digital implementations
- Biology is five to six orders of magnitude more energy efficient

\* AlexNet convolutional layers only

## BASIC IDEA FOR RADICAL IMPROVEMENT: ANALOG VECTOR-BY-MATRIX MULTIPLICATION

### <u>VMM</u>:



#### Analog VMM: using the Ohm & Kirchhoff laws



## Features:

- physical-level, in-memory, fast, very energy-efficient
- proposed by Widrow in the 1960s, popularized by Carver Mead and students (Caltech) in the 1980s
- no dense adjustable-conductance crosspoint devices until recently
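The Ohm and Kirchhoff picture above can be sketched numerically. This is only a minimal idealized model, assuming perfectly linear crosspoint devices and no wire resistance; the function name `analog_vmm` and the example values are illustrative, not taken from the slides.

```python
import numpy as np

def analog_vmm(voltages, conductances):
    """Idealized crossbar vector-by-matrix multiplication.

    voltages:     shape (rows,), input voltages applied to the row wires [V]
    conductances: shape (rows, cols), crosspoint conductances [S]
    returns:      shape (cols,), currents collected on the column wires [A]
    """
    # Ohm's law: each crosspoint device carries I_ij = V_i * G_ij.
    branch_currents = voltages[:, None] * conductances
    # Kirchhoff's current law: each column wire sums the currents of its devices.
    return branch_currents.sum(axis=0)

# Hypothetical 2x2 example: the column currents equal the product V @ G.
v = np.array([0.1, 0.2])                  # volts
g = np.array([[1e-6, 2e-6],
              [3e-6, 4e-6]])              # siemens
column_currents = analog_vmm(v, g)
```

The multiplication and the summation both happen in the memory array itself, which is why the approach is described as physical-level and in-memory.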

## **UC SANTA BARBARA'S MEMRISTORS**

#### • 64 × 64 crossbar circuit





H. Kim et al. arXiv 2019

Background work: M. Prezioso et al., Nature 521, 61, 2015; M. Prezioso et al., IEDM'15, p. 17.4.1, 2015; F. Merrikh Bayat et al., Nature Comm., 2018

#### **Details:**

- Al<sub>2</sub>O<sub>3</sub>/TiO<sub>2-x</sub> active bilayer by reactive sputtering
- ~250 nm wide lines, passive (0T1R) integration
- CMP/dry etching and TiN/Al electrodes for higher conductance
- Higher as-fabricated film conductance  $\rightarrow$  low forming voltage  $\rightarrow$  very uniform I-Vs

>250x/10,000x better memristor/memory cell density compared to 1T1R work from HPL/UMass collaboration at comparable complexity

## **ANALOG APPLICATION DEMO WITH 4K-DEVICE CROSSBAR**

#### Conductance tuning



#### MNIST classification

H. Kim et al. arXiv 2019



- Ex-situ-trained 64-10 single-layer perceptron with emulated neuron functionality
- Measured fidelity very close to simulation (within 1.5% for the highest-fidelity case)

## **NEUROCOMPUTING BASED ON FLOATING GATE DEVICES: ARCHITECTURE AND CHIP DEMO**




Multilayer perceptron circuit



Area breakdown and chip layout





F. Merrikh Bayat et al., TNNLS'18, X. Guo et al., IEDM'17

**High-level architecture**

#### **Classifier features:**

- 28x28 B/W input, 10-class output
- >100,000 NOR flash synapses, 64 hidden-layer **CMOS** neurons
- 180-nm process with eFlash
- Differential implementation of synaptic weights
- High voltage circuitry for weight import
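A differential implementation of weights can be illustrated as follows. This is a hedged sketch, not the chip's actual mapping: it assumes each signed weight is represented by a pair of non-negative cell strengths (called `g_pos` and `g_neg` here) within a hypothetical range `[g_min, g_max]`, and that the neuron reads the difference of the two column outputs.

```python
import numpy as np

def to_differential(weights, g_min=1e-6, g_max=1e-4):
    """Map signed weights to a pair of non-negative cell strengths.

    g_min and g_max are illustrative bounds, not values from the slides.
    Returns (g_pos, g_neg, scale) with g_pos - g_neg == scale * weights.
    """
    scale = (g_max - g_min) / np.max(np.abs(weights))
    g_pos = g_min + scale * np.clip(weights, 0.0, None)   # carries positive weights
    g_neg = g_min + scale * np.clip(-weights, 0.0, None)  # carries negative weights
    return g_pos, g_neg, scale

def differential_output(inputs, g_pos, g_neg):
    """Difference of the two column outputs; the common g_min offset cancels."""
    return inputs @ g_pos - inputs @ g_neg

# Hypothetical example with a small signed weight matrix.
rng = np.random.default_rng(0)
w = rng.uniform(-1.0, 1.0, size=(4, 3))
x = rng.uniform(0.0, 1.0, size=4)
g_pos, g_neg, scale = to_differential(w)
y = differential_output(x, g_pos, g_neg)   # proportional to x @ w
```

The design choice is that both cells of a pair stay within the programmable range while their difference spans positive and negative weights.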

## NEUROCOMPUTING BASED ON FLOATING GATE DEVICES: EXPERIMENTAL RESULTS



#### **Experimentally measured performance for MNIST:**

- 94.65% experimental fidelity (96.5% theoretical)
- < 1-µs latency, < 20 nJ energy per pattern (both can be improved with better neuron design)
- Much better speed and energy efficiency than purely digital circuits at comparable MNIST fidelity (6 orders of magnitude better energy-delay compared to IBM TrueNorth)
- Reproducible, temperature insensitive, no change in performance after 7 months



Latency (one pattern)

**Power consumption** 



F. Merrikh Bayat et al., TNNLS'18, X. Guo et al., IEDM'17


## Part II. Hardware for Stochastic Neural Networks

## **STOCHASTIC NEUROCOMPUTING**

### Molecular-level operations in the brain are stochastic, e.g.



[Figure: chemical synapse, with labels for the axon terminal, synaptic vesicle, neurotransmitter, neurotransmitter transporter, voltage-gated Ca++ channel, synaptic cleft, receptor, postsynaptic density, and dendrite. Image source: Wikipedia]

Stochastic (binary) neuron



- Stochastic networks
  - Boltzmann machines
  - Restricted Boltzmann machines
  - Stochastic Hopfield networks
  - Deep belief networks
  - Bayesian networks

#### Need efficient implementations of both dot-product and stochastic functionality

Neurotransmitter release at synaptic cleft

...

## STOCHASTIC DOT PRODUCT CIRCUIT



- Rely on intrinsic (from memory cells) and/or externally injected noise
- Sigmoid slope (computing temperature) controlled by the applied voltage ($V_{ON}$)
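The behavior described above can be imitated in software. The sketch below is an assumption-laden stand-in, not the circuit itself: it adds Gaussian noise of standard deviation `noise_sigma` to an ideal dot product and thresholds the result, so the firing probability follows a sigmoid-like curve whose slope is set by the signal-to-noise ratio. In the hardware the slope is set by the applied voltage; here `noise_sigma` plays the role of the computing temperature.

```python
import numpy as np

def firing_probability(inputs, weights, noise_sigma, trials=20000, seed=0):
    """Estimate P(output = 1) for stochastic binary neurons.

    Each trial computes the dot product, adds zero-mean Gaussian noise
    (standing in for intrinsic or injected noise), and thresholds at zero.
    """
    rng = np.random.default_rng(seed)
    dot = np.atleast_1d(inputs @ weights)
    noise = rng.normal(0.0, noise_sigma, size=(trials, dot.size))
    # Fraction of trials in which the noisy pre-activation is positive.
    return np.mean(dot + noise > 0.0, axis=0)

# Hypothetical example: neuron 0 sees a pre-activation of 1.0, neuron 1 sees 0.0.
x = np.array([1.0, 1.0])
w = np.array([[0.5, 0.0],
              [0.5, 0.0]])
p_low_noise = firing_probability(x, w, noise_sigma=0.5)   # steep sigmoid
p_high_noise = firing_probability(x, w, noise_sigma=2.0)  # shallow sigmoid
```

With zero pre-activation the neuron fires about half the time regardless of the noise level, while for a positive pre-activation less noise pushes the firing probability closer to one.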

[Plot legend: $I_{max}$ (µA) = 5.8, 2.78, 1.1, 0.275]

## NEUROOPTIMIZATION WITH STOCHASTIC HOPFIELD NETWORKS

Hopfield network with annealing



#### Color background highlights circuitry for:

- Baseline Hopfield neural network
- Stochastic annealing
- Adjustable energy function annealing
- Chaotic annealing
- $\times$ = scaling, $\Sigma$ = summing

M. Mahmoodi et al., *submitted* April 2019; Background work: L. Gao et al., in: *Proc. NanoArch'13*, New York, NY, July 2013; X. Guo et al., *Frontiers in Neuroscience* **9**, art. 488, Dec. 2015
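As a software illustration of stochastic annealing in a Hopfield network, the sketch below applies it to a small graph bipartitioning problem. It is not the circuit from the slides: the energy function, the balance penalty `alpha`, the geometric temperature schedule, and the sigmoid update rule are all assumptions chosen for a minimal example, and only the stochastic variant of annealing is shown.

```python
import numpy as np

def partition_weights(adj, alpha=1.0):
    """Hopfield weights for bipartitioning: reward uncut edges, penalize imbalance."""
    n = adj.shape[0]
    return adj - alpha * (np.ones((n, n)) - np.eye(n))

def energy(weights, state):
    """Hopfield energy E = -1/2 * s^T W s for a +/-1 state vector."""
    return -0.5 * state @ weights @ state

def cut_size(adj, state):
    """Number of edges whose endpoints fall in different partitions."""
    return int(round(0.25 * np.sum(adj * (1.0 - np.outer(state, state)))))

def anneal(weights, t_start=5.0, t_end=0.05, sweeps=200, seed=0):
    """Asynchronous stochastic updates while the temperature is lowered."""
    rng = np.random.default_rng(seed)
    n = weights.shape[0]
    state = rng.choice([-1.0, 1.0], size=n)
    for temp in np.geomspace(t_start, t_end, sweeps):
        for i in rng.permutation(n):
            field = weights[i] @ state
            # Stochastic binary neuron: sigmoid firing probability at this temperature.
            p_up = 1.0 / (1.0 + np.exp(-2.0 * field / temp))
            state[i] = 1.0 if rng.random() < p_up else -1.0
    return state

# Hypothetical graph: two 4-node cliques joined by a single bridge edge (3-4).
adj = np.zeros((8, 8))
for group in (range(0, 4), range(4, 8)):
    for i in group:
        for j in group:
            if i != j:
                adj[i, j] = 1.0
adj[3, 4] = adj[4, 3] = 1.0

weights = partition_weights(adj)
solution = anneal(weights)
print("cut size:", cut_size(adj, solution), "energy:", energy(weights, solution))
```

At high temperature the neurons flip almost randomly and can escape local minima; as the temperature drops, the updates approach the deterministic baseline Hopfield dynamics.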

Graph partitioning problem



#### Experimental results



#### Comparison with other approaches

Adapted from arXiv:1903.11194

|                         | Conventional |      | Emerging technology  |                     | Our work  |           |
|-------------------------|--------------|------|----------------------|---------------------|-----------|-----------|
|                         | CPU          | GPU  | D-Wave               | <b>Fiber optics</b> | Memristor | NOR flash |
| Time to solution (µs)   | 220          | 10   | 10<sup>10</sup>      | 600                 | 3         | 10        |
| Energy to solution (µJ) | 4000         | 2500 | 250×10 <sup>12</sup> | ?                   | 0.2       | 0.6       |

## Part III. Neuromorphic Hardware for Spiking Neural Networks

## **SPIKING NEURAL NETWORKS**



- Information encoded in timing of spikes (rate vs. temporal)
  - coordinated processing of spatial-temporal information
  - believed to be more energy efficient
- Local learning rules for synaptic weight update $\rightarrow$ suitable for online training and HW friendly
- More biologically plausible (but so far outperformed by firing-rate ANNs in virtually all machine learning tasks)

## **SPIKE-TIMING-DEPENDENT PLASTICITY**

### Main idea

### STDP in cultured hippocampal neurons



# STDP is an essential feature of spiking neural networks

Figure 7. Critical window for the induction of synaptic potentiation and depression. The percentage change in the EPSC amplitude at 20–30 min after the repetitive correlated spiking (60 pulses at 1 Hz) was plotted against the spike timing. Spike timing was defined by the time interval ( $\Delta t$ ) between the onset of the EPSP and the peak of the postsynaptic action potential during each cycle of repetitive stimulation, as illustrated by the traces above. For this analysis, we included only synapses with initial EPSC amplitude of <500 pA, and all EPSPs were subthreshold for data associated with negatively correlated spiking. Calibration: 50 mV, 10 msec.

#### Bi and Poo, Annual Rev. Neurosci. 2001. 24:139-66
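The critical window in the figure is often approximated by a pair of exponentials. The sketch below uses that common textbook form as an assumption; the amplitudes and time constants are illustrative placeholders and are not fitted to the Bi and Poo data. Here $\Delta t$ is taken as postsynaptic spike time minus presynaptic spike time.

```python
import numpy as np

def stdp_weight_change(delta_t_ms, a_plus=1.0, a_minus=0.5,
                       tau_plus_ms=20.0, tau_minus_ms=20.0):
    """Exponential STDP window (illustrative parameters).

    delta_t_ms > 0: pre before post, potentiation that decays with |delta_t|.
    delta_t_ms < 0: post before pre, depression that decays with |delta_t|.
    """
    delta_t_ms = np.asarray(delta_t_ms, dtype=float)
    potentiation = a_plus * np.exp(-delta_t_ms / tau_plus_ms)
    depression = -a_minus * np.exp(delta_t_ms / tau_minus_ms)
    return np.where(delta_t_ms > 0, potentiation,
                    np.where(delta_t_ms < 0, depression, 0.0))

# Evaluate the window on a few spike-timing differences (in ms).
timings = np.array([-40.0, -10.0, 0.0, 10.0, 40.0])
changes = stdp_weight_change(timings)
```

Because the update depends only on the relative timing of the two spikes at one synapse, the rule is local, which is what makes it attractive for hardware.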


## STATE-OF-THE-ART DIGITAL SPIKING NEUROMORPHIC HARDWARE

Intel Loihi 2018



14 nm, 128 M synapses, 128 K neurons, 2.07 B transistors, on-chip learning

IBM TrueNorth 2014



28 nm, 256 M synapses, 1 M neurons, 5.4 B transistors, inference only

## EXPERIMENTAL DEMONSTRATION OF STDP BASED ON MEMRISTIVE CROSSBAR CIRCUITS



#### Voltage across memristor

[Plot axis: $\Delta t$ (ms), from -60 to 60]

## COINCIDENCE DETECTION BY PASSIVE MEMRISTOR-BASED SPIKING NEURAL NETWORK



Main result: demonstration of unsupervised learning of spike coincidence (i.e., spiking on synchronous input) via the STDP mechanism

## **CHALLENGES FOR SPIKING NEURAL NETWORKS**



- More severe impact of device-to-device (d2d) and cycle-to-cycle (c2c) variations than for ex-situ-trained systems
- Require higher switching endurance

## **FUTURE WORK AND CHALLENGES**

### Important future work

- Improving memristor technology, monolithic integration with CMOS, e.g. 3D CMOL



memristor crossbar add-on (synapses and interconnect)

CMOS stack (neurons and other functions)

- >1000x better energy-delay than purely digital systems (experiment for smaller-scale, simulation for larger-scale systems)
- 10<sup>13</sup> synapses per cm<sup>2</sup> for 100-layer 10-nm memristive crossbar circuits (~30x less compared to human brain)
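The density figure can be sanity-checked with simple arithmetic. The sketch below assumes that "10-nm" refers to the crossbar half-pitch F and that each crosspoint occupies the ideal 4F² footprint; both are assumptions, since the slide does not state how the number was derived.

```python
def crossbar_density_per_cm2(half_pitch_nm, layers, cell_area_in_f2=4.0):
    """Crosspoint devices per cm^2 for a stacked passive crossbar.

    cell_area_in_f2 = 4 is the ideal footprint of a passive crosspoint (assumed).
    """
    half_pitch_cm = half_pitch_nm * 1e-7            # 1 nm = 1e-7 cm
    cell_area_cm2 = cell_area_in_f2 * half_pitch_cm ** 2
    return layers / cell_area_cm2

density = crossbar_density_per_cm2(half_pitch_nm=10.0, layers=100)
print(f"{density:.2e} devices per cm^2")
```

Under these assumptions the estimate comes out at 2.5 × 10<sup>13</sup> devices per cm<sup>2</sup>, the same order of magnitude as the figure quoted above.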

### Challenges

- Device yield, device-to-device and cycle-to-cycle variations in I-V, switching endurance (but more tolerant to defects than logic)



- Economic and confidence barriers
- Lack of algorithms for higher intelligence

# **THANK YOU!**