This week, we continue our deep dives into Google's excellent papers for 2011 with a closer look at “The Impact of Memory Subsystem Resource Sharing on Datacenter Applications,” by Lingjia Tang, Jason Mars, Neil Vachharajani, Mary-Lou Soffa, and Robert Hundt. Here, Robert shares some key findings from their paper that can help improve efficiency in data centers:

One of the major challenges facing the research community today is understanding the emerging class of applications and web services that live in massive-scale data center platforms like those used by Google. To continue innovating in both the processor architecture and the system software components that are essential to data center platforms, a better understanding of how these types of workloads interact with the underlying memory systems of modern servers is critical. Before this work, the conventional wisdom in the research community was that sharing of caches and other memory subsystem components does not have a significant impact on contemporary multithreaded workloads. These conclusions were drawn using academic benchmark workloads, namely the PARSEC suite, which is widely used by researchers around the world. However, these popular benchmark suites (SPEC, PARSEC, etc.) do not resemble emerging data center application workloads, and, as this work shows, the conclusions drawn from them do not apply to those workloads.

In this work, the authors expose key memory characteristics of the emerging workloads that Google creates, and show how to enhance system software to take advantage of these characteristics to improve efficiency in data centers. Contrary to the common wisdom, the authors find that across several key data center applications, including websearch, there is both a sizable benefit from properly sharing micro-architectural resources on a single machine (such as on-chip caches and bandwidth to memory) and a potential degradation from sharing them improperly. The performance variability between the worst and the optimal sharing patterns is significant for these workloads (up to 40%). In addition, the authors discovered that the best thread-to-core mapping for a given application depends not only on the application’s sharing and memory characteristics, but also, dynamically, on the characteristics of other applications running on the same machine at the same time.
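To make the idea of a "sharing pattern" concrete, here is a minimal Python sketch that enumerates the distinct ways a group of threads can be placed on a machine. The topology below (two sockets, each with two caches shared by a pair of cores) is a hypothetical example chosen for illustration, and the names `TOPOLOGY`, `sharing_signature`, and `distinct_patterns` are made up for this sketch; none of them come from the paper.

```python
from itertools import combinations

# Hypothetical machine: core id -> (socket id, shared-cache id).
# Two sockets, two caches per socket, two cores per cache.
TOPOLOGY = {
    0: (0, 0), 1: (0, 0), 2: (0, 1), 3: (0, 1),
    4: (1, 2), 5: (1, 2), 6: (1, 3), 7: (1, 3),
}


def sharing_signature(cores):
    """Summarize a placement as (sockets used, caches used)."""
    sockets = {TOPOLOGY[core][0] for core in cores}
    caches = {TOPOLOGY[core][1] for core in cores}
    return (len(sockets), len(caches))


def distinct_patterns(num_threads):
    """Return one representative core set per distinct sharing signature."""
    patterns = {}
    for cores in combinations(sorted(TOPOLOGY), num_threads):
        # Keep only the first core set seen for each signature.
        patterns.setdefault(sharing_signature(cores), cores)
    return patterns


if __name__ == "__main__":
    for signature, cores in sorted(distinct_patterns(4).items()):
        print(signature, cores)
```

The signature is deliberately coarse: it only counts how many sockets and caches a placement touches. Even so, it shows that four threads on this assumed machine can be packed onto one socket, split across two caches on different sockets, or spread over three or four caches, and each choice trades cache sharing against cache and memory-bandwidth contention differently.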

The authors then present a heuristic-based algorithm that leverages knowledge of an application’s sharing characteristics to predict the optimal thread-to-core mapping for the application’s threads, both when it is running alone and when it is running alongside other applications. They also present an adaptive approach that uses a competition heuristic to learn the best-performing mapping online. By employing an adaptive thread-to-core mapper, the authors improved the performance of the data center applications by up to 22% over the status quo, a sharing-agnostic thread-to-core mapping, and achieved performance within 3% of optimal.
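The adaptive approach can be pictured as a small tournament among candidate mappings. The Python sketch below is only an illustration of that general idea, not the authors' algorithm: the function `compete`, the `measure` callback, and the scores are all assumptions made for this example. In a real system, `measure` might pin threads (for instance with `os.sched_setaffinity` on Linux) and read a performance metric over a short interval; here it is replaced by a table of made-up numbers so the example runs anywhere.

```python
from typing import Callable, Dict, Sequence, Tuple

Mapping = Tuple[int, ...]  # Mapping[i] is the core that thread i is pinned to


def compete(
    candidates: Sequence[Mapping],
    measure: Callable[[Mapping], float],
    rounds: int = 3,
) -> Tuple[Mapping, Dict[Mapping, float]]:
    """Try every candidate for `rounds` intervals; return the winner and mean scores."""
    totals = {mapping: 0.0 for mapping in candidates}
    for _ in range(rounds):
        # Interleave the candidates so a temporary change in load
        # is spread across all of them instead of penalizing one.
        for mapping in candidates:
            totals[mapping] += measure(mapping)
    means = {mapping: total / rounds for mapping, total in totals.items()}
    winner = max(candidates, key=lambda mapping: means[mapping])
    return winner, means


# Made-up scores standing in for real measurements (higher is better).
FAKE_SCORES = {
    (0, 1, 2, 3): 1.00,
    (0, 1, 4, 5): 1.25,
    (0, 2, 4, 6): 0.75,
}

winner, means = compete(list(FAKE_SCORES), FAKE_SCORES.__getitem__)
print("best mapping:", winner)
```

Because the competition re-measures instead of predicting, rerunning it periodically would let the chosen mapping follow changes in the co-running applications, which is the property the adaptive approach relies on.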