PIM-CACHED

From UBC Wiki

Overview

Memcached is used by large data-centre operators such as Facebook and Twitter to cache key-value pairs on dedicated servers. Memcached servers act as a cache layer in front of the main storage servers and thereby reduce the load on them. The servers do not communicate with each other; each one returns a value on a get(key) request and updates or sets a value on an update(key, value) request. The goal of this project is to reduce the TCO (total cost of ownership) of Memcached by using UPMEM machines as a set of Memcached servers. You can learn more about the motivation of this project here.
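The cache-layer behaviour described above can be illustrated with a small sketch. It is only a toy model: the class name LookAsideCache, the dictionary standing in for the storage servers, and the choice to update the cache on every write are assumptions made for illustration, not how Memcached or this project implements it.

```python
class LookAsideCache:
    """Toy cache layer in front of a slower backing store (illustrative only)."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # stands in for the storage servers
        self.cache = {}                     # stands in for one Memcached server
        self.hits = 0
        self.misses = 0

    def get(self, key):
        # Serve from the cache when possible; otherwise fall back to storage.
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.backing_store.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later requests
        return value

    def update(self, key, value):
        # Assumption: write to storage and refresh the cached copy.
        self.backing_store[key] = value
        self.cache[key] = value


store = {"user:1": "alice"}
cache = LookAsideCache(store)
first = cache.get("user:1")   # miss: fetched from the backing store
second = cache.get("user:1")  # hit: served from the cache
```

Only the first request reaches the backing store, which is the load reduction the overview refers to.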

Milestones

M1

The objective of M1 is to establish the baseline throughput of a Memcached server. The benchmarks and their configurations are listed below (more to be added):

  • Benchmark 1: One instance of Memcached server (Recommended)
    • Memory: 4, 16, 32 GB
    • Get:set ratio: 100:0 to 70:30 (100:0 is the easiest and is viable for some workloads; 70:30 is more realistic for most workloads)
    • Threads: nproc (2-8)
    • Key/value size distribution: Fixed, similar to Facebook's ETC distribution
    • Target miss rate: <20%
    • Window size: 4-5K
  • Benchmark 2: n instances of Memcached servers, where n = total memory / 64 MB
    • Memory: 64 MB per instance
    • Other configurations are the same as in Benchmark 1
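As a worked example of the instance count in Benchmark 2, the sketch below computes n = total memory / 64 MB. It assumes that the total memory takes the same values as in Benchmark 1 (4, 16, 32 GB) and that the units are binary (1 GB = 1024 MB); both are assumptions, and the function name instance_count is hypothetical.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def instance_count(total_memory_bytes, per_instance_bytes=64 * MIB):
    """Number of 64 MB Memcached instances that fit in the total memory."""
    return total_memory_bytes // per_instance_bytes


# Assumed total memory sizes, taken from the Benchmark 1 list.
counts = {gib: instance_count(gib * GIB) for gib in (4, 16, 32)}
print(counts)  # {4: 64, 16: 256, 32: 512}
```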

M2

Design and implement an in-memory hash table inside a DPU.
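To make the idea concrete, here is a minimal sketch of a fixed-capacity hash table with linear probing. It is written in Python for readability and is not the project's design: the class name FixedHashTable, the lack of deletion and eviction, and the probing scheme are all assumptions. A real DPU implementation would have to fit the DPU's memory and programming model.

```python
class FixedHashTable:
    """Fixed-capacity hash table with linear probing (no deletion, no resizing)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity  # each slot holds (key, value) or None
        self.size = 0

    def _probe(self, key):
        """Yield every slot index once, starting at the key's home slot."""
        start = hash(key) % self.capacity
        for offset in range(self.capacity):
            yield (start + offset) % self.capacity

    def set(self, key, value):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:
                # Empty slot: insert a new entry.
                self.slots[index] = (key, value)
                self.size += 1
                return True
            if slot[0] == key:
                # Existing key: overwrite the value in place.
                self.slots[index] = (key, value)
                return True
        return False  # table is full; a real cache would evict an entry here

    def get(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:
                return None  # reached an empty slot, so the key is absent
            if slot[0] == key:
                return slot[1]
        return None  # scanned the whole table without finding the key


table = FixedHashTable(capacity=8)
table.set(b"key-1", b"value-1")
value = table.get(b"key-1")
```

A fixed capacity is used because a cache of bounded size cannot grow; when set returns False, a cache would normally pick an entry to evict rather than reject the request.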

M3

Develop scheduling policies to divide get/set requests across multiple DPUs.
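One simple policy, shown here only as a possible starting point, is to hash each key to a DPU so that all requests for the same key land on the same DPU. The function names dpu_for_key and partition, the request tuple layout, and the use of CRC32 are assumptions for illustration; the project may choose a different policy.

```python
import zlib


def dpu_for_key(key: bytes, num_dpus: int) -> int:
    """Map a key to a DPU index with a stable hash (CRC32 modulo DPU count)."""
    return zlib.crc32(key) % num_dpus


def partition(requests, num_dpus):
    """Group (op, key, value) requests into one batch per DPU."""
    batches = [[] for _ in range(num_dpus)]
    for op, key, value in requests:
        batches[dpu_for_key(key, num_dpus)].append((op, key, value))
    return batches


requests = [
    ("set", b"user:1", b"alice"),
    ("get", b"user:1", None),
    ("get", b"user:2", None),
]
batches = partition(requests, num_dpus=4)
```

Static hashing keeps the get and the set for a key on the same DPU, but it does not balance load when some keys are much hotter than others, which is where more elaborate policies would come in.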

M4

Engineer techniques for access to DPUs via RDMA.

M5

Set up a representative Memcache workload in a three-tier setup using tools such as Mcrouter. Evaluate performance, power consumption and TCO.

M6

Write up and publish the results.

Benchmarks

Memaslap

To set up Memaslap, follow the instructions here. To change the configuration, consult the Memaslap documentation. The default configuration for Memaslap is as follows:

  • get:set ratio: 9:1
  • Key size: Fixed, 64 bytes
  • Value size: Fixed, 1024 bytes
  • Concurrency: 16
  • # Connections: 1 per concurrency
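The request mix that these defaults describe can be approximated with a short generator. This is not Memaslap's implementation: the function name generate_requests, the uniform key choice, the key_space parameter, and the zero-filled values are assumptions made for illustration.

```python
import random


def generate_requests(count, get_fraction=0.9, key_size=64, value_size=1024,
                      key_space=1000, seed=0):
    """Generate (op, key, value) tuples with a fixed get:set ratio and sizes."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    requests = []
    for _ in range(count):
        key_id = rng.randrange(key_space)
        # Zero-pad the key id so that every key is exactly key_size bytes.
        key = str(key_id).zfill(key_size).encode("ascii")
        if rng.random() < get_fraction:
            requests.append(("get", key, None))
        else:
            requests.append(("set", key, bytes(value_size)))
    return requests


requests = generate_requests(10_000)
get_share = sum(1 for op, _, _ in requests if op == "get") / len(requests)
```

With get_fraction=0.9 the generated stream approximates the 9:1 get:set ratio; changing key_size and value_size to 30 and 200 and get_fraction to 1.0 would approximate the Memcache-perf defaults listed further down.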

Memcache-perf

This benchmark is a Memcached load generator designed for high request rates. Using this tool, we can generate Facebook's ETC workload, which is described in the SIGMETRICS '12 paper. The default configuration for Memcache-perf is:

  • get:set ratio: 1:0
  • Key size: Fixed, 30 bytes
  • Value size: Fixed, 200 bytes
  • Concurrency: 1
  • # Connections: 1 per concurrency

YCSB

YCSB is a popular key-value store (KVS) workload generator. It has predefined workloads for measuring the performance of Memcached. The configuration of the benchmark depends on the workload you run. For example, Workload C has the following properties:

  • get:set ratio: 1:0
  • Field size: 10 fields x 100 bytes
  • Concurrency: 1
  • # Connections: 1 per concurrency

Related Work

The papers below are related to the PIM-CACHED project. Some are more relevant than others; the relevance ranking is shown in the last column, with *** indicating the most relevant and * the least relevant.

Year | Conference | Title & Link | Rel
2011 | SOSP | SILT: A Memory-Efficient, High-Performance Key-Value Store | *
2011 | CSAIL Tech Report | CPHASH: A Cache-Partitioned Hash Table | **
2013 | NSDI | Scaling Memcache at Facebook | ***
2014 | OSDI | FaRM: Fast Remote Memory | ***
2014 | SIGCOMM | Using RDMA efficiently for key-value services | ***
2014 | - | Introducing mcrouter: A memcached protocol router for scaling memcached deployments | -
2015 | SoCC | MemcachedGPU: Scaling-up Scale-out Key-value Stores | *
2016 | - | Consistent Hashing with Bounded Loads | -
2016 | VLDB | Bluecache: a scalable distributed flash-based key-value store | ?
2017 | SOSP | KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC | **
2017 | SOSP | NetCache: Balancing Key-Value Stores with Fast In-Network Caching | **
2020 | NSDI | FileMR: Rethinking RDMA Networking for Scalable Persistent Memory | **
2020 | EuroSys | StRoM: Smart Remote Memory | **
2020 | USENIX ATC | Disaggregating Persistent Memory and Controlling Them Remotely: An Exploration of Passive Disaggregated Key-Value Stores | ***
2020 | OSDI | AIFM: High-Performance, Application-Integrated Far Memory | **
2021 | ASPLOS | Rethinking Software Runtimes For Disaggregated Memory | *
2021 | USENIX ATC | Avocado: A Secure In-Memory Distributed Storage System | *
2021 | USENIX ATC | Improving Performance of Flash Based Key-Value Stores Using Storage Class Memory as a Volatile Memory Extension | **

Useful Links

https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158773267967200%7D&path=%2Fnotes%2Fnote%2F&_rdr

https://github.com/fbmarc/facebook-memcached-old [outdated]

https://github.com/facebook/mcrouter