ISCA paper preprint about Google's Tensor Processing Unit

Blog post by +Norm Jouppi:

Last June at Google I/O, +Sundar Pichai showed an example of a new type of custom ASIC that Google had developed to accelerate machine learning workloads, called a Tensor Processing Unit (TPU), but didn't give many details. The TPU runs large neural networks very efficiently and with low latency across many Google products, including Search, Photos, and Translate, and it also powered the AlphaGo system used in the match against Lee Sedol in Korea last March. Each chip offers 92 trillion operations per second (TOPS) within a modest power budget. I'm happy to announce that we now have a detailed paper, "In-Datacenter Performance Analysis of a Tensor Processing Unit", that will appear at this year's International Symposium on Computer Architecture (ISCA) in Toronto in June. Today we've published a preprint of the paper and a companion blog post, and +David Patterson will give a talk about the TPU at the Computer History Museum in Mountain View this afternoon (sadly, no more space is available).

Various news articles:
Hacker News discussion:

It is important to examine the long latency tails of systems, even when they appear fast (I take this seriously even when on vacation).
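Tails matter even when a system looks fast because of fan-out: when one user request must wait for replies from many servers, a rare slow server turns into a common slow request. Below is a minimal Python sketch of that back-of-the-envelope reasoning, similar in spirit to the example in "The Tail at Scale". It assumes a hypothetical two-valued latency model (10 ms normally, 1000 ms on 1% of requests) and independent servers; the constants and function names are illustrative only.

```python
import random
import statistics

# Hypothetical latency model, chosen only for illustration: a leaf server
# usually answers in 10 ms but takes 1000 ms on 1% of requests.
FAST_MS = 10.0
SLOW_MS = 1000.0
P_SLOW = 0.01


def leaf_latency_ms(rng: random.Random) -> float:
    """Sample one leaf server's response time under the toy model above."""
    return SLOW_MS if rng.random() < P_SLOW else FAST_MS


def fanout_latency_ms(rng: random.Random, fanout: int) -> float:
    """A root request that waits for all `fanout` leaves is as slow as the slowest one."""
    return max(leaf_latency_ms(rng) for _ in range(fanout))


def slow_fraction_analytic(p_slow: float, fanout: int) -> float:
    """P(at least one of `fanout` independent leaves is slow) = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p_slow) ** fanout


def main() -> None:
    rng = random.Random(42)
    trials = 10_000
    leaf = [leaf_latency_ms(rng) for _ in range(trials)]
    root = [fanout_latency_ms(rng, 100) for _ in range(trials)]
    print(f"leaf median: {statistics.median(leaf)} ms")
    print(f"root median (fan-out 100): {statistics.median(root)} ms")
    simulated = sum(x >= SLOW_MS for x in root) / trials
    print(f"simulated slow fraction: {simulated:.3f}")
    print(f"analytic slow fraction:  {slow_fraction_analytic(P_SLOW, 100):.3f}")


if __name__ == "__main__":
    main()
```

Under these assumptions the median single-server latency stays at 10 ms, while the median latency of a request that fans out to 100 servers jumps to 1000 ms, because about 63% of such requests hit at least one slow server (1 - 0.99^100 ≈ 0.634). Real latency distributions are not two-valued and servers are rarely independent, so treat this only as intuition for why the tail, not the median, often sets user-visible latency.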

+Luiz André Barroso and I wrote an article called "The Tail at Scale" about managing latency variability in large-scale distributed systems. The article was just published in this month's Communications of the ACM (CACM):
