
PlantD

Performance, Latency ANalysis and Testing for Data pipelines

PlantD is a harness for measuring the performance of data pipelines during and after development. PlantD collects a standard suite of metrics and produces visualizations to help you develop data pipelines and choose among pipeline architectures, configurations, and business use cases.


Concepts

To use PlantD to measure a data pipeline, configure it with the following information:

Your Endpoint

How to reach your pipeline-under-test: a description of the pipeline you want to measure, including at least the IP address and port number to which data should be sent, and tags that uniquely identify your pipeline's resources on your cloud provider.
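As an illustration only, the endpoint information above could be captured and sanity-checked like this. The class name EndpointConfig, the address() method, and the key/value form of the tags are hypothetical and are not PlantD's actual configuration format; the sketch just shows the minimum the page says an endpoint needs (IP address, port, identifying tags).

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class EndpointConfig:
    """Hypothetical endpoint description; not PlantD's actual configuration format."""

    ip: str
    port: int
    # Key/value tags assumed to identify the pipeline's resources at the cloud provider.
    tags: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # ip_address() raises ValueError for anything that is not IPv4 or IPv6.
        ipaddress.ip_address(self.ip)
        if not 0 < self.port < 65536:
            raise ValueError(f"port out of range: {self.port}")
        if not self.tags:
            raise ValueError("at least one tag is needed to identify cloud resources")

    def address(self) -> str:
        """Return a host:port string, bracketing IPv6 hosts as URLs do."""
        host = f"[{self.ip}]" if ipaddress.ip_address(self.ip).version == 6 else self.ip
        return f"{host}:{self.port}"
```

For example, EndpointConfig("10.0.0.5", 8080, {"project": "pipeline-under-test"}).address() returns "10.0.0.5:8080". Validating up front means a typo in the address fails immediately instead of partway through an experiment.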

Your Data Schema

The data schema your pipeline requires as input: which data items are fed into the pipeline, their data format, and their allowable values. From this schema, PlantD generates a dataset, a quantity of synthetic data that conforms to the schema, for use in testing.
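To make the idea of schema-driven data generation concrete, here is a minimal sketch. The schema format (a dict of field specs with types, ranges, and allowed values) and the field names are hypothetical; PlantD's real schema format is not described on this page. The point is only that each generated record respects the declared types and allowable values.

```python
import random
import string

# Hypothetical schema: field name -> spec. Not PlantD's actual schema format.
SCHEMA = {
    "device_id": {"type": "str", "length": 8},
    "speed_kmh": {"type": "float", "min": 0.0, "max": 200.0},
    "gear": {"type": "choice", "values": ["P", "R", "N", "D"]},
    "odometer": {"type": "int", "min": 0, "max": 999_999},
}


def generate_record(schema: dict, rng: random.Random) -> dict:
    """Generate one fake record whose fields respect the schema's types and ranges."""
    record = {}
    for name, spec in schema.items():
        kind = spec["type"]
        if kind == "int":
            record[name] = rng.randint(spec["min"], spec["max"])  # inclusive bounds
        elif kind == "float":
            record[name] = rng.uniform(spec["min"], spec["max"])
        elif kind == "str":
            alphabet = string.ascii_uppercase + string.digits
            record[name] = "".join(rng.choices(alphabet, k=spec["length"]))
        elif kind == "choice":
            record[name] = rng.choice(spec["values"])
        else:
            raise ValueError(f"unsupported type {kind!r} for field {name!r}")
    return record


def generate_dataset(schema: dict, n: int, seed: int = 0) -> list[dict]:
    """Generate n records; a fixed seed makes the dataset reproducible across runs."""
    rng = random.Random(seed)
    return [generate_record(schema, rng) for _ in range(n)]
```

Seeding the generator is a deliberate choice here: comparing two pipeline configurations is fairer when both receive the same synthetic dataset.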

A Load Pattern

How fast and for how long should experimental data be fed to your pipeline? For example: 100 records per second steadily for 5 minutes, then ramping up over 1 minute to 200 records per second, staying steady for 10 minutes, then ramping down to 0 over a 2 minute span.
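The example load pattern above is piecewise linear, so it can be written as a list of segments, each with a duration and a start and end rate. The sketch below uses that representation (an assumption for illustration, not PlantD's own load-pattern format) to compute the target rate at any moment and the total number of records sent, which is the area under the rate curve.

```python
# Hypothetical representation: (duration_s, start_rate, end_rate) per segment,
# with rates in records per second varying linearly within each segment.
Segment = tuple[float, float, float]

EXAMPLE_PATTERN: list[Segment] = [
    (300, 100, 100),  # steady at 100 records/s for 5 minutes
    (60, 100, 200),   # ramp up to 200 records/s over 1 minute
    (600, 200, 200),  # steady at 200 records/s for 10 minutes
    (120, 200, 0),    # ramp down to 0 over 2 minutes
]


def rate_at(pattern: list[Segment], t: float) -> float:
    """Target send rate (records/s) at t seconds after the experiment starts."""
    if t < 0:
        return 0.0
    start = 0.0
    for duration, r0, r1 in pattern:
        if t < start + duration:
            frac = (t - start) / duration
            return r0 + (r1 - r0) * frac
        start += duration
    return 0.0  # after the last segment, nothing more is sent


def total_records(pattern: list[Segment]) -> float:
    """Records sent over the whole pattern, i.e. the area under the rate curve."""
    # Each linear segment is a trapezoid: duration * average of its endpoint rates.
    return sum(duration * (r0 + r1) / 2 for duration, r0, r1 in pattern)
```

For the example pattern, the four segments contribute 30,000 + 9,000 + 120,000 + 12,000 records, so total_records(EXAMPLE_PATTERN) is 171,000 records over 18 minutes, and rate_at(EXAMPLE_PATTERN, 330) is 150 records per second, halfway up the first ramp.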

Your Experiment

PlantD's load generator sends data to your pipeline following the load pattern, and PlantD collects three metrics: cost, latency, and throughput.
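As a rough illustration of two of these metrics, the sketch below computes end-to-end latency and throughput from per-record timestamps. It assumes you can match each record's send time to the time it left the pipeline by a record ID; how PlantD actually instruments pipelines, and how it measures cost, are not shown here. The percentile uses the nearest-rank method, one common choice among several.

```python
import math
import statistics


def latency_stats(sent: dict[str, float], received: dict[str, float]) -> dict[str, float]:
    """End-to-end latency in seconds for records seen at both ends of the pipeline.

    `sent` and `received` map a record ID to a timestamp in seconds. Records that
    never came out of the pipeline are counted as lost, not given a latency.
    """
    latencies = sorted(received[k] - sent[k] for k in sent if k in received)
    if not latencies:
        raise ValueError("no records made it through the pipeline")

    def pct(p: float) -> float:
        # Nearest-rank percentile: smallest sample with at least p% of samples <= it.
        rank = max(1, math.ceil(p / 100 * len(latencies)))
        return latencies[rank - 1]

    return {
        "mean": statistics.fmean(latencies),
        "p50": pct(50),
        "p99": pct(99),
        "max": latencies[-1],
        "lost": float(len(sent) - len(latencies)),
    }


def throughput(received: dict[str, float], start: float, end: float) -> float:
    """Records per second leaving the pipeline during the window [start, end)."""
    if end <= start:
        raise ValueError("end must be after start")
    count = sum(1 for t in received.values() if start <= t < end)
    return count / (end - start)
```

Reporting tail percentiles alongside the mean matters for pipelines: a small fraction of slow records can hide behind a healthy average.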

Prerequisites

A Data Pipeline

PlantD can measure the performance of many kinds of data pipelines. The pipeline can be implemented on premises, in a commercial cloud, or on a Kubernetes cluster. Your pipeline should not be in production: PlantD's load generator sends synthetic data to your pipeline, so real traffic would both interfere with the experiment and be contaminated by the synthetic data. If you just want to try PlantD out, we provide a toy data pipeline for experimentation.

A Kubernetes Cluster

We recommend running PlantD on a Kubernetes cluster. Most commercial cloud providers offer easy ways to set up such clusters. If you just want to experiment, you can run a small cluster on your local machine using minikube.

About Us

PlantD is maintained by CMU's TEEL Labs, and funded by Honda's 99P Labs. Get in touch with us if you have questions or comments. PlantD is a work in progress; we're eager to find out how you're using PlantD, and how we can improve it.


TEEL Labs

TEEL is a research group at Carnegie Mellon University, led by Professor Majd Sakr.

Email: teel@andrew.cmu.edu


99P Labs

99P Labs is a research group at Honda Research Institute USA, Inc.

Email: support@99plabs.com