Data Analysis in Rust

Rust is a modern systems programming language known for its performance and memory safety. In the age of data and AI, can we also use Rust for data analysis and data science tasks?

In this tutorial we will explore some of the crates (packages) available in Rust for data science. Below is a list of libraries:

  • Polars: similar to Python's pandas, with both a Python and a Rust implementation
  • DataFusion
  • Batista
  • Ndarray
  • Linfa: a Rust equivalent of scikit-learn
  • Smartcore: a Rust equivalent of MLPack
  • Plotly.rs
  • Plotlars

Let us see how to use Polars for data analysis in Rust.

Data Analysis in Rust Using Polars and Plotly

Rust, with its strong focus on performance, safety, and concurrency, has become an attractive choice for data analysis tasks. Two powerful libraries, Polars and Plotly, can be leveraged to perform high-performance data analysis and visualization in Rust. This blog post will guide you through using these libraries to read, manipulate, and visualize data.

Setting Up Your Rust Project

To start, you need to set up a new Rust project using Cargo, Rust’s package manager.

cargo new data_analysis
cd data_analysis

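Add the necessary dependencies to your Cargo.toml file. The code in this post uses Polars APIs such as CsvReadOptions and Column together with the lazy API, so it needs a recent Polars release with the lazy feature enabled. The version numbers below are one combination assumed to match those APIs; adjust them to the releases you actually use:

[dependencies]
polars = { version = "0.46", features = ["lazy"] }
plotly = "0.8"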

Reading and Manipulating Data with Polars

Polars is a high-performance DataFrame library that allows you to efficiently read, manipulate, and analyze large datasets.

Reading a CSV File

Just like pandas, Polars can read several file formats. Here is how you can read a CSV file using Polars:

use polars::prelude::*;

fn main() {
    let df = CsvReadOptions::default()
        .try_into_reader_with_file_path(Some("diamonds.csv".into()))
        .unwrap()
        .finish()
        .unwrap();

    // Preview the last few rows of the DataFrame
    println!("{}", df.tail(Some(4)));
}
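
To make concrete what a CSV reader hands back and which rows a call like tail(Some(4)) selects, here is a minimal plain-Rust sketch that does not use Polars at all. The Table type and the parse_csv and tail helpers are hypothetical names introduced only for illustration, and the parser simply splits on commas, so it assumes there are no quoted fields.

```rust
/// A tiny in-memory table: a header row plus data rows of string cells.
#[derive(Debug, PartialEq)]
pub struct Table {
    pub header: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

/// Split one CSV line into trimmed cells.
/// Assumption: no quoted fields or embedded commas (a real reader handles those).
fn split_row(line: &str) -> Vec<String> {
    line.split(',').map(|cell| cell.trim().to_string()).collect()
}

/// Parse CSV text: the first non-empty line is the header, the rest are rows.
pub fn parse_csv(text: &str) -> Table {
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());
    let header = lines.next().map(split_row).unwrap_or_default();
    let rows = lines.map(split_row).collect();
    Table { header, rows }
}

/// Return the last `n` rows, or all rows if there are fewer than `n`.
pub fn tail(rows: &[Vec<String>], n: usize) -> &[Vec<String>] {
    let start = rows.len().saturating_sub(n);
    &rows[start..]
}
```

Calling tail(&table.rows, 4) on a parsed table returns a borrowed slice of the last four rows without copying them. A real DataFrame additionally infers a type for every column, which this sketch leaves out.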

DataFrame Operations

Polars provides a variety of methods to inspect and manipulate your DataFrame:

// Get the shape of the DataFrame
println!("{:?}", df.shape());

// Get the size (rows * columns)
println!("{:?}", df.size());

// Select specific columns
let sv: Vec<&Column> = df.columns(["color", "clarity"]).expect("Failed to Work");
println!("{:?}", sv);

// Select columns using the `select` method
let result = df.clone().select(["cut", "price"]);
println!("{:?}", result);

// Count null values
let null_count = df.null_count();
println!("{}", null_count);
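
The inspection calls above can be pictured on a plain column-oriented table. The sketch below uses only the standard library; Col, shape, size, and null_count here are hypothetical stand-ins for illustration, not the Polars functions themselves, and the sketch assumes that size means rows times columns.

```rust
/// A column of optional integers; `None` stands for a null value.
pub type Col = Vec<Option<i64>>;

/// (rows, columns) of a column-oriented table.
/// Assumption: every column has the same length.
pub fn shape(columns: &[Col]) -> (usize, usize) {
    let rows = columns.first().map_or(0, |c| c.len());
    (rows, columns.len())
}

/// Total number of cells, i.e. rows * columns.
pub fn size(columns: &[Col]) -> usize {
    let (rows, cols) = shape(columns);
    rows * cols
}

/// Number of nulls in each column, in column order.
pub fn null_count(columns: &[Col]) -> Vec<usize> {
    columns
        .iter()
        .map(|c| c.iter().filter(|v| v.is_none()).count())
        .collect()
}
```

For a table with three rows and two columns, shape gives (3, 2) and size gives 6, while null_count reports one number per column.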

Handling Missing Values

You can handle missing values (NaN/Null) by filling them with specific values:

let df2 = df!(
    "col1" => [0.5, 1.0, 1.5, 2.0, 2.5],
    "col2" => [Some(1), None, Some(3), None, Some(5)],
).expect("Failed");

let fill_literal_df = df2
    .clone()
    .lazy()
    .with_column(col("col2").fill_null(3))
    .collect().expect("Failed");
println!("{}", fill_literal_df);
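
As a rough picture of what filling nulls with a literal does, here is a plain-Rust sketch on a slice of Option values. The helper names fill_null_literal and fill_null_forward are hypothetical; the second one shows forward fill, a common alternative strategy that is included only for comparison.

```rust
/// Replace every null (`None`) with a fixed literal value.
pub fn fill_null_literal(col: &[Option<i32>], value: i32) -> Vec<i32> {
    col.iter().map(|v| v.unwrap_or(value)).collect()
}

/// Replace each null with the most recent non-null value (forward fill).
/// Leading nulls have nothing to copy from, so they stay `None`.
pub fn fill_null_forward(col: &[Option<i32>]) -> Vec<Option<i32>> {
    let mut last: Option<i32> = None;
    col.iter()
        .map(|v| {
            if v.is_some() {
                last = *v; // remember the latest non-null value
            }
            last
        })
        .collect()
}
```

With the col2 values from the example above, fill_null_literal(&col2, 3) produces a column with no nulls left, which is why its element type can drop the Option wrapper. Forward fill keeps the Option because the first entries may still be null.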

GroupBy and Aggregate Operations

Polars supports advanced operations like grouping and aggregating data:

let grouped_diamonds = df.clone().lazy().group_by(["cut"]).agg([
    col("price").sum().alias("total_price"),
    col("price").mean().alias("average_price"),
    col("price").count().alias("counts"),
]).collect().expect("Failed");
println!("{:?}", grouped_diamonds);
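
To see what the three aggregations compute for each group, the following standard-library sketch groups (cut, price) pairs by hand. PriceStats and group_prices are hypothetical names for illustration, and the sketch assumes there are no null prices.

```rust
use std::collections::BTreeMap;

/// Per-group aggregates matching the three expressions in the query above.
#[derive(Debug, PartialEq)]
pub struct PriceStats {
    pub total_price: f64,
    pub average_price: f64,
    pub counts: usize,
}

/// Group (cut, price) rows by cut and compute sum, mean, and count per group.
/// A BTreeMap keeps the groups sorted by key so the output order is stable.
pub fn group_prices(rows: &[(&str, f64)]) -> BTreeMap<String, PriceStats> {
    // First pass: accumulate the running sum and row count for each group.
    let mut acc: BTreeMap<String, (f64, usize)> = BTreeMap::new();
    for (cut, price) in rows {
        let entry = acc.entry(cut.to_string()).or_insert((0.0, 0));
        entry.0 += *price;
        entry.1 += 1;
    }
    // Second pass: derive the mean from the sum and count.
    acc.into_iter()
        .map(|(cut, (sum, n))| {
            let stats = PriceStats {
                total_price: sum,
                average_price: sum / n as f64,
                counts: n,
            };
            (cut, stats)
        })
        .collect()
}
```

The sketch returns groups sorted by key. A hash-based group-by generally makes no such ordering promise, so sort the result explicitly if the row order matters.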

Visualizing Data with Plotly

Plotly is a powerful data visualization library that integrates well with Rust. Here’s how you can use it to visualize your data.

Setting Up Plotly

Ensure you have the plotly crate included in your Cargo.toml file.

Creating a Bar Chart

Here’s an example of creating a bar chart using data from your DataFrame:

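use plotly::{Bar, Plot};
use polars::prelude::*;

let dataset = LazyCsvReader::new("diamonds.csv").finish().unwrap().collect().unwrap();

// Collect the `cut` column as owned strings; `flatten` skips null entries
let x: Vec<String> = dataset.column("cut").unwrap().str().unwrap()
    .into_iter().flatten().map(|s| s.to_string()).collect();

// Cast `price` to Float64 first, since it may have been parsed as an integer column
let price = dataset.column("price").unwrap().cast(&DataType::Float64).unwrap();
let y: Vec<f64> = price.f64().unwrap().into_iter().flatten().collect();

let trace = Bar::new(x, y).show_legend(true).opacity(0.5);
let mut plot = Plot::new();
plot.add_trace(trace);
plot.show();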

This code reads the cut and price columns from the DataFrame, converts them into vectors, and then creates a bar chart using Plotly.
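
One caveat when converting columns to vectors: dropping nulls from each column separately can leave the two vectors with different lengths or shifted rows if the nulls sit in different positions. A minimal plain-Rust sketch of dropping nulls pairwise instead is shown here; paired_non_null is a hypothetical helper, not part of Polars or Plotly.

```rust
/// Keep only the rows where both the category and the value are present,
/// so the two output vectors stay aligned index by index.
pub fn paired_non_null(
    cuts: &[Option<&str>],
    prices: &[Option<f64>],
) -> (Vec<String>, Vec<f64>) {
    cuts.iter()
        .zip(prices.iter())
        .filter_map(|(cut, price)| match (cut, price) {
            // Both cells present: keep the row.
            (Some(c), Some(p)) => Some((c.to_string(), *p)),
            // Either cell is null: drop the whole row.
            _ => None,
        })
        .unzip()
}
```

The resulting pair of vectors can be passed as the x and y arguments of a bar trace, and their lengths always match.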

Customizing Plots

Plotly allows extensive customization of your plots, including titles, axis labels, and legends.

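use plotly::common::Title;
use plotly::layout::{Axis, Layout};

// Assumes a plotly release where Title::new takes the text;
// some newer releases use Title::with_text("...") instead
let layout = Layout::new()
    .title(Title::new("Diamond Prices by Cut"))
    .x_axis(Axis::new().title(Title::new("Cut")))
    .y_axis(Axis::new().title(Title::new("Price")));

// Build a fresh plot; `trace` must not already have been moved into another plot
let mut plot = Plot::new();
plot.set_layout(layout);
plot.add_trace(trace);
plot.show();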

Conclusion

Rust, combined with Polars and Plotly, offers a robust and efficient environment for data analysis and visualization. Here are the key points covered:

  • Reading and Manipulating Data: Use Polars to read CSV files, select columns, handle missing values, and perform group-by and aggregate operations.
  • Visualizing Data: Use Plotly to create interactive and customizable plots such as bar charts, line charts, and more.
  • Customization: Customize your plots with titles, axis labels, and legends to effectively communicate your data insights.

By leveraging these libraries, you can build high-performance data analysis applications in Rust that are both efficient and visually appealing.

Example Code

Here is the complete example code used in this blog post:

use polars::prelude::*;
use plotly::{Plot, Bar};

fn main() {
    // Read A CSV File in Polars
    let df = CsvReadOptions::default()
        .try_into_reader_with_file_path(Some("diamonds.csv".into()))
        .unwrap()
        .finish()
        .unwrap();

    // Preview
    println!("{}", df.tail(Some(4)));

    // Shape
    println!("{:?}", df.shape());

    // Get the size (rows * columns)
    println!("{:?}", df.size());

    // Selection of Columns
    let sv: Vec<&Column> = df.columns(["color", "clarity"]).expect("Failed to Work");
    println!("{:?}", sv);

    // Select using the Select
    let result = df
        .clone()
        .select(["cut", "price"]);
    println!("{:?}", result);

    // Missing Nan and Null
    let null_count = df.null_count();
    println!("{}", null_count);

    // Fill NaN /Null
    let df2 = df!(
        "col1" => [0.5, 1.0, 1.5, 2.0, 2.5],
        "col2" => [Some(1), None, Some(3), None, Some(5)],
    ).expect("Failed");

    let fill_literal_df = df2
        .clone()
        .lazy()
        .with_column(col("col2").fill_null(3))
        .collect().expect("Failed");
    println!("{}", fill_literal_df);

    // Unique Values or Class Distribution
    let class_distr = df
        .clone()
        .lazy()
        .select([col("cut").n_unique().alias("n_unique"),])
        .collect().expect("Failed");
    println!("{}", class_distr);

    // Filter Groupby
    let grouped_diamonds = df.clone().lazy().group_by(["cut"]).agg([
        col("price").sum().alias("total_price"),
        col("price").mean().alias("average_price"),
        col("price").count().alias("counts"),
    ]).collect().expect("Failed");
    println!("{:?}", grouped_diamonds);

    // Concat Two DF
    let df_v1 = df!(
        "a" => &[1],
        "b" => &[3],
        "c" => &[4],
    ).unwrap();

    let df_v2 = df!(
        "a" => &[5],
        "b" => &[6], // illustrative value
        "c" => &[7], // illustrative value
    ).unwrap();

    // Vertical Concat
    let df_vertical = concat([df_v1.clone().lazy(), df_v2.clone().lazy()],
        UnionArgs::default())
        .unwrap()
        .collect().expect("Failed to concat");
    println!("{}", &df_vertical);

   
}

This example demonstrates a comprehensive workflow for data analysis and visualization in Rust, making it a powerful tool for any data scientist or analyst.


Thanks for your attention

Jesus Saves

Jesse E.Agbe (JCharis)
