Memcached is used in large data centres, such as those run by Facebook and Twitter, to cache keys and values on dedicated servers. Memcached servers act as a cache layer in front of the main storage servers, which reduces the load on those storage servers. The Memcached servers don't communicate with each other; each one retrieves a value on a get(key) request and updates or sets a value on an update(key, value) request. The goal of this project is to reduce Memcached's TCO (total cost of ownership) by using UPMEM machines as a set of Memcached servers. You can learn more about the motivation of this project here.
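To make the cache-layer role concrete, here is a minimal sketch in Python of a read served through a cache in front of a storage server. It is an illustration only, not Memcached's implementation: the names `CacheServer` and `lookaside_read` are hypothetical, and the storage server is assumed to be a plain dictionary.

```python
class CacheServer:
    """Toy stand-in for one Memcached server: an independent key-value store."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        # Return the cached value, or None on a cache miss.
        return self._items.get(key)

    def update(self, key, value):
        # Set or overwrite the value stored under this key.
        self._items[key] = value


def lookaside_read(cache, storage, key):
    """Serve a read from the cache, falling back to storage on a miss."""
    value = cache.get(key)
    if value is None:
        value = storage[key]      # slow path: read from the storage server
        cache.update(key, value)  # populate the cache for later reads
    return value


storage = {"user:1": "alice"}   # hypothetical storage-server contents
cache = CacheServer()
lookaside_read(cache, storage, "user:1")  # miss: loads from storage
lookaside_read(cache, storage, "user:1")  # hit: storage is not touched
```

Only the first read of a key reaches the storage server in this sketch, which is the load reduction the cache layer is meant to provide.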



The objective of M1 is to find the baseline throughput of a Memcached server. The benchmarks and their configurations are listed below (more to be added):

  • Benchmark 1: One instance of Memcached server (Recommended)
    • Memory: 4, 16, 32 GB
    • Get:set ratio: 100:0 to 70:30 (100:0 is the easiest case and is viable for some workloads; 70:30 is more realistic for most workloads)
    • Threads: nproc (2-8)
    • Key/value size distribution: fixed, or similar to Facebook's ETC distribution
    • Target miss rate: <20%
    • Window size: 4-5K
  • Benchmark 2: n instances of Memcached server, where n = total memory / 64 MB
    • Memory: 64 MB per instance
    • Other configs are similar to Benchmark 1
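As a quick worked example of the Benchmark 2 instance count, the sketch below computes n = total memory / 64 MB. It assumes that the total memory takes the same values as in Benchmark 1 (4, 16 and 32 GB) and that 1 GB = 1024 MB; neither assumption is stated in the list above, and the function name is hypothetical.

```python
MB_PER_GB = 1024  # assumption: binary units


def instance_count(total_memory_mb, per_instance_mb=64):
    """Number of 64 MB Memcached instances that fit in the total memory."""
    return total_memory_mb // per_instance_mb


for total_gb in (4, 16, 32):
    n = instance_count(total_gb * MB_PER_GB)
    print(f"{total_gb} GB total -> n = {n} instances")
```

Under these assumptions the three totals give 64, 256 and 512 instances respectively.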


Design and implement an in-memory hash table inside a DPU


Design scheduling policies to divide get/set requests across multiple DPUs
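One simple policy that could serve as a starting point is static hash partitioning, where each key is always routed to the same DPU. The sketch below is only an illustration written in host-side Python; it is not the project's chosen policy and it does not use the UPMEM SDK. The function names and the `(op, key, value)` request tuple are hypothetical.

```python
import zlib


def dpu_for_key(key: bytes, num_dpus: int) -> int:
    """Map a key to a DPU index with a stable hash (CRC32 here)."""
    return zlib.crc32(key) % num_dpus


def partition(requests, num_dpus):
    """Group (op, key, value) requests into one batch per DPU."""
    batches = [[] for _ in range(num_dpus)]
    for op, key, value in requests:
        batches[dpu_for_key(key, num_dpus)].append((op, key, value))
    return batches


requests = [
    ("set", b"user:1", b"alice"),
    ("get", b"user:1", None),
    ("get", b"user:2", None),
]
batches = partition(requests, num_dpus=4)
```

Because the mapping depends only on the key, a get for a key always lands on the DPU that handled the earlier set for it. A static policy like this ignores load imbalance caused by hot keys, which is one of the things a real scheduling policy would need to address.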


Engineer techniques for access to DPUs via RDMA.


Set up a representative Memcached workload in a three-tier setup using tools such as Mcrouter. Evaluate performance, power consumption and TCO.


Write up and publish the results.



To set up Memaslap, follow the instructions here. If you wish to change the configuration, see the Memaslap documentation. The default configuration for Memaslap is as follows:

  • get:set ratio: 9:1
  • Key size: Fixed, 64 bytes
  • Value size: Fixed, 1024 bytes
  • Concurrency: 16
  • # Connections: 1 per concurrency
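To illustrate what these defaults mean for the request stream, the sketch below generates a synthetic sequence of requests with a 9:1 get:set ratio, 64-byte keys and 1024-byte values. It is not Memaslap and does not reproduce its key distribution or connection handling; the function name, the `key_space` parameter and the fixed seed are assumptions made for the example.

```python
import random


def make_requests(n, get_fraction=0.9, key_size=64, value_size=1024,
                  key_space=1000, seed=0):
    """Generate n (op, key, value) tuples with a fixed get:set mix."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    requests = []
    for _ in range(n):
        # Zero-padded decimal key of exactly key_size bytes.
        key = f"{rng.randrange(key_space):0{key_size}d}".encode()
        if rng.random() < get_fraction:
            requests.append(("get", key, None))
        else:
            requests.append(("set", key, b"x" * value_size))
    return requests


requests = make_requests(10_000)
gets = sum(1 for op, _, _ in requests if op == "get")
print(f"get share: {gets / len(requests):.2f}")
```

Changing `get_fraction` to 1.0 or 0.7 gives the 100:0 and 70:30 mixes mentioned for the baseline benchmarks.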


Memcache-perf is a Memcached load generator designed for high request rates. Using this tool, we can generate Facebook's ETC workload, which is described in the SIGMETRICS’12 paper. The default configuration for Memcache-perf is:

  • get:set ratio: 1:0
  • Key size: Fixed, 30 bytes
  • Value size: Fixed, 200 bytes
  • Concurrency: 1
  • # Connections: 1 per concurrency


YCSB is a popular key-value store (KVS) workload generator. It has predefined workloads that can be used to measure the performance of Memcached. The configuration of the benchmark depends on the workload you run. For example, Workload C has the following properties:

  • get:set ratio: 1:0
  • Field size: 10 fields x 100 bytes
  • Concurrency: 1
  • # Connections: 1 per concurrency

