# Test Pipeline (Optional)

## Introduction
This test pipeline is designed to evaluate and validate the functionality of the benchmarking tool PlantD. It simulates an e-commerce data-processing scenario so that you can test the service's performance and error-handling capabilities.
## Phases

The test pipeline includes three phases.
### Extract Phase
- Acts as an entry point to the pipeline
- Receives data in the form of zip files
- Extracts the CSV files and publishes their contents to a Kafka topic (sketched below)
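The repository does not show the service code here, but a minimal sketch may help clarify the extract step. This assumes a Python service using the `kafka-python` client; the broker address `kafka:9092` and the topic name `raw-records` are placeholders, not names taken from this repository.

```python
# Illustrative sketch of the extract step, not the actual service code.
# Assumes kafka-python and a broker reachable at kafka:9092 (placeholder).
import io
import zipfile

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="kafka:9092")

def extract(zip_bytes: bytes, topic: str = "raw-records") -> None:
    """Unpack a zip payload and publish each CSV row to Kafka."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".csv"):
                continue
            for line in archive.read(name).decode("utf-8").splitlines():
                if line.strip():                      # skip blank lines
                    producer.send(topic, value=line.encode("utf-8"))
    producer.flush()                                  # block until all rows are delivered
```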
### Transform Phase

- Validates the data, dropping corrupt or incomplete records
- Normalizes product names, prices, and other fields for consistency (sketched below)
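A minimal sketch of the transform step, under the same assumptions as above; the two-field `(product_name, price)` record layout and the `clean-records` topic are illustrative placeholders.

```python
# Illustrative sketch of the transform step, not the actual service code.
# Assumes kafka-python and a placeholder CSV layout of (product_name, price).
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer("raw-records", bootstrap_servers="kafka:9092")
producer = KafkaProducer(bootstrap_servers="kafka:9092")

def normalize(fields):
    """Trim and title-case the product name; format the price consistently."""
    name, price = fields
    return [name.strip().title(), f"{float(price):.2f}"]

for message in consumer:
    fields = message.value.decode("utf-8").split(",")
    if len(fields) != 2 or not all(f.strip() for f in fields):
        continue                      # drop corrupt or incomplete records
    try:
        clean = normalize(fields)
    except ValueError:
        continue                      # drop records with unparsable prices
    producer.send("clean-records", value=",".join(clean).encode("utf-8"))
```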
### Load Phase

- Loads the cleaned, aggregated, and enriched data into a SQL database for further analytics (sketched below)
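A minimal sketch of the load step, assuming PyMySQL against the MariaDB instance deployed by the pipeline; the host, credentials, database, and table names are placeholders.

```python
# Illustrative sketch of the load step, not the actual service code.
# Assumes PyMySQL; host/user/password/database/table are placeholders.
import pymysql
from kafka import KafkaConsumer

conn = pymysql.connect(host="mariadb", user="plantd",
                       password="plantd", database="ecommerce")
consumer = KafkaConsumer("clean-records", bootstrap_servers="kafka:9092")

with conn.cursor() as cursor:
    for message in consumer:
        name, price = message.value.decode("utf-8").split(",")
        cursor.execute(
            "INSERT INTO products (name, price) VALUES (%s, %s)",
            (name, float(price)),
        )
        conn.commit()                 # one commit per record keeps the sketch simple
```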
## Usage

- Clone the repository:

  ```sh
  git clone https://github.com/CarnegieMellon-PlantD/plantd-test-pipeline
  ```

- Navigate to the project directory:

  ```sh
  cd plantd-test-pipeline
  ```
## Deploying the pipeline

Deploy the pipeline by applying the YAMLs under the `k8s` folder in a Kubernetes cluster.

```sh
# All pipeline resources are deployed in a dedicated namespace
kubectl create ns test-pipeline

# Apply the pipeline YAMLs to deploy the microservices, MariaDB, Kafka, and ZooKeeper
kubectl apply -f k8s/
```
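Optionally, before moving on, you may want to verify that the components came up:

```sh
# All pods in the namespace should eventually reach the Running state
kubectl get pods -n test-pipeline
```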
## Running the experiment

In the cluster that has PlantD deployed, use the YAMLs under the `plantd_yamls` directory to set up an experiment. This directory contains the Kubernetes CRs needed for the experiment. The Pipeline resource depends on the DataSet, and the Experiment depends on the LoadPattern and the Pipeline. Hence, you will need to apply the CRs in the following order (consolidated into a single command sequence after the list):
- schema (`kubectl apply -f schema.yaml`)
- load pattern (`kubectl apply -f loadpattern.yaml`)
- dataset (`kubectl apply -f dataset.yaml`)
- pipeline (`kubectl apply -f pipeline.yaml`)
- experiment (`kubectl apply -f experiment.yaml`)
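Equivalently, run the same commands as one sequence from inside the `plantd_yamls` directory:

```sh
# Apply the CRs in dependency order
kubectl apply -f schema.yaml
kubectl apply -f loadpattern.yaml
kubectl apply -f dataset.yaml
kubectl apply -f pipeline.yaml
kubectl apply -f experiment.yaml
```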