PIM-ORC

Overview

PIM-ORC is an ORC reader implemented on UPMEM DPUs. It is integrated with Trino (formerly known as PrestoSQL). The goal of the project is to speed up query execution by offloading ORC parsing to UPMEM machines.

From previous projects, we learned that Snappy compression and decompression can be accelerated using DPUs [1]. ORC files can be compressed in blocks using the Snappy algorithm. You can read more on the ORC format here.
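To make the block structure concrete, here is a minimal sketch of how a compressed ORC stream splits into chunks. It relies on the 3-byte chunk header described in the ORC specification: a little-endian value equal to the chunk length times two, plus one if the chunk is stored uncompressed. The function names `parse_chunk_header` and `iter_chunks` are hypothetical and are not part of PIM-ORC or Trino; the sketch only locates chunk boundaries and does not decompress anything.

```python
def parse_chunk_header(header: bytes) -> tuple[int, bool]:
    """Decode a 3-byte ORC compression chunk header.

    Hypothetical helper for illustration; not PIM-ORC's API.
    Returns (chunk_length_in_bytes, is_original).
    """
    if len(header) != 3:
        raise ValueError("ORC chunk header must be exactly 3 bytes")
    value = int.from_bytes(header, "little")
    is_original = bool(value & 1)  # low bit set: chunk is stored uncompressed
    length = value >> 1            # remaining 23 bits: chunk length in bytes
    return length, is_original


def iter_chunks(stream: bytes):
    """Yield (payload, is_original) for each chunk in a compressed stream."""
    pos = 0
    while pos < len(stream):
        length, is_original = parse_chunk_header(stream[pos:pos + 3])
        pos += 3
        payload = stream[pos:pos + length]
        if len(payload) != length:
            raise ValueError("truncated chunk")
        yield payload, is_original
        pos += length
```

Because every chunk carries its own length, the chunks of a stream can be located without decompressing any of them, and each compressed payload can then be passed to a Snappy decompressor on its own.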

Setup

There are currently two systems that use PIM-ORC. The first is a simple multi-threaded ORC reader that aggregates all the first columns and prints the result. To set up and compile this module, follow the instructions here.

The second system integrated with PIM-ORC is Trino, a distributed SQL query engine used for data analytics. PIM-ORC is used in trino-orc, which is Trino's implementation of an optimized ORC reader. To compile and run PIM-trino, follow the instructions on the GitHub page here.