Rust is a modern systems programming language known for its performance and memory safety. In the age of data and AI, can we use Rust for data analysis and data science tasks?
In this tutorial we will explore some of the crates (packages) available in Rust for data science. Below is a list of libraries:
- Polars: a DataFrame library similar to Python's pandas. It is written in Rust and also has Python bindings.
- DataFusion
- Ballista
- Ndarray
- Linfa: a Rust equivalent of scikit-learn
- Smartcore: a Rust equivalent of MLPack
- Plotly.rs
- Plotlars
Let us see how to use Polars for data analysis in Rust.
Data Analysis in Rust Using Polars and Plotly
Rust, with its strong focus on performance, safety, and concurrency, has become an attractive choice for data analysis tasks. Two powerful libraries, Polars and Plotly, can be leveraged to perform high-performance data analysis and visualization in Rust. This blog post will guide you through using these libraries to read, manipulate, and visualize data.
Setting Up Your Rust Project
To start, you need to set up a new Rust project using Cargo, Rust’s package manager.
cargo new data_analysis
cd data_analysis
Add the necessary dependencies to your Cargo.toml file:
[dependencies]
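# The snippets below use CsvReadOptions, Column, and the lazy API, so they need
# a recent Polars release with the "lazy" feature. The exact version numbers
# here are an assumption; adjust them to the releases you are building against.
polars = { version = "0.44", features = ["lazy"] }
plotly = "0.8"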
Reading and Manipulating Data with Polars
Polars is a high-performance DataFrame library that allows you to efficiently read, manipulate, and analyze large datasets.
Reading a CSV File
Just like pandas, Polars can read several file formats. Here is how you can read a CSV file using Polars:
use polars::prelude::*;
fn main() {
let df = CsvReadOptions::default()
.try_into_reader_with_file_path(Some("diamonds.csv".into()))
.unwrap()
.finish()
.unwrap();
// Preview the last few rows of the DataFrame
println!("{}", df.tail(Some(4)));
}
DataFrame Operations
Polars provides a variety of methods to inspect and manipulate your DataFrame:
// Get the shape of the DataFrame
println!("{:?}", df.shape());
// Get the total number of values (rows × columns)
println!("{:?}", df.size());
// Select specific columns
let sv: Vec<&Column> = df.columns(["color", "clarity"]).expect("Failed to Work");
println!("{:?}", sv);
// Select columns using the `select` method
let result = df.clone().select(["cut", "price"]);
println!("{:?}", result);
// Count null values
let null_count = df.null_count();
println!("{}", null_count);
Handling Missing Values
You can handle missing values (nulls) by filling them with a specific value:
let df2 = df!(
"col1" => [0.5, 1.0, 1.5, 2.0, 2.5],
"col2" => [Some(1), None, Some(3), None, Some(5)],
).expect("Failed");
let fill_literal_df = df2
.clone()
.lazy()
.with_column(col("col2").fill_null(3))
.collect().expect("Failed");
println!("{}", fill_literal_df);
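To make the effect of filling nulls concrete, here is a minimal plain-Rust sketch of the same idea using only the standard library. It is not Polars code: `fill_missing` is a hypothetical helper written for illustration, and it returns a plain `Vec<i32>`, whereas Polars gives back a DataFrame column.

```rust
/// Replace every missing entry with `fill`, keeping present values unchanged.
/// (Hypothetical helper for illustration; not part of Polars.)
fn fill_missing(column: &[Option<i32>], fill: i32) -> Vec<i32> {
    column.iter().map(|value| value.unwrap_or(fill)).collect()
}

fn main() {
    // Same values as "col2" in the snippet above.
    let col2 = [Some(1), None, Some(3), None, Some(5)];
    let filled = fill_missing(&col2, 3);
    println!("{:?}", filled); // prints [1, 3, 3, 3, 5]
}
```

Both `None` entries become 3, which is what the `fill_null` expression above is expected to do for `col2`.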
GroupBy and Aggregate Operations
Polars supports advanced operations like grouping and aggregating data:
let grouped_diamonds = df.clone().lazy().group_by(["cut"]).agg([
col("price").sum().alias("total_price"),
col("price").mean().alias("average_price"),
col("price").count().alias("counts"),
]).collect().expect("Failed");
println!("{:?}", grouped_diamonds);
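If it helps to see what this aggregation computes, the following is a plain-Rust sketch of the same group-by logic using only the standard library. It is not how Polars implements it. The `Summary` struct, the `summarize_by_cut` function, and the sample rows are all made up for illustration.

```rust
use std::collections::BTreeMap;

/// Per-group result holding the three aggregates from the query above.
#[derive(Debug, PartialEq)]
struct Summary {
    total_price: i64,
    average_price: f64,
    counts: usize,
}

/// Group `(cut, price)` rows by cut and compute sum, mean, and count.
fn summarize_by_cut(rows: &[(&str, i64)]) -> BTreeMap<String, Summary> {
    // First pass: accumulate a running sum and count per group.
    let mut acc: BTreeMap<String, (i64, usize)> = BTreeMap::new();
    for (cut, price) in rows {
        let entry = acc.entry(cut.to_string()).or_insert((0, 0));
        entry.0 += price;
        entry.1 += 1;
    }
    // Second pass: derive the mean from the sum and the count.
    acc.into_iter()
        .map(|(cut, (total, n))| {
            let summary = Summary {
                total_price: total,
                average_price: total as f64 / n as f64,
                counts: n,
            };
            (cut, summary)
        })
        .collect()
}

fn main() {
    // Made-up rows, not taken from diamonds.csv.
    let rows = [("Ideal", 300), ("Premium", 500), ("Ideal", 500)];
    for (cut, summary) in summarize_by_cut(&rows) {
        println!("{cut}: {summary:?}");
    }
}
```

The `BTreeMap` keeps groups sorted by key. A Polars `group_by` result may come back in a different row order, so sort it explicitly if order matters.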
Visualizing Data with Plotly
Plotly is a powerful data visualization library that integrates well with Rust. Here’s how you can use it to visualize your data.
Setting Up Plotly
Ensure you have the plotly crate included in your Cargo.toml file.
Creating a Bar Chart
Here’s an example of creating a bar chart using data from your DataFrame:
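use plotly::{Bar, Plot};
use polars::prelude::*;
let dataset = LazyCsvReader::new("diamonds.csv").finish().unwrap().collect().unwrap();
// Convert the "cut" column to owned strings, skipping any nulls
let x: Vec<String> = dataset.column("cut").unwrap().str().unwrap().into_iter().flatten().map(String::from).collect();
// Cast "price" to f64 first (it may be parsed as an integer column), then skip any nulls
let price = dataset.column("price").unwrap().cast(&DataType::Float64).unwrap();
let y: Vec<f64> = price.f64().unwrap().into_iter().flatten().collect();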
let trace = Bar::new(x, y).show_legend(true).opacity(0.5);
let mut plot = Plot::new();
plot.add_trace(trace);
plot.show();
This code reads the cut and price columns from the DataFrame, converts them into vectors, and then creates a bar chart using Plotly.
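One caveat with converting each column separately: if the two columns have nulls in different rows, dropping nulls column by column can leave the x and y vectors misaligned. The plain-Rust sketch below shows the problem and one way around it, using standard-library types only. `paired_non_null` is a hypothetical helper and the sample values are made up.

```rust
/// Keep only the rows where both the label and the value are present,
/// so the two output vectors stay aligned index by index.
/// (Hypothetical helper for illustration; not part of Polars or Plotly.)
fn paired_non_null(labels: &[Option<&str>], values: &[Option<f64>]) -> (Vec<String>, Vec<f64>) {
    labels
        .iter()
        .zip(values.iter())
        .filter_map(|(label, value)| match (label, value) {
            (Some(l), Some(v)) => Some((l.to_string(), *v)),
            _ => None,
        })
        .unzip()
}

fn main() {
    // Made-up columns with nulls in different rows.
    let cuts = [Some("Ideal"), None, Some("Good")];
    let prices = [Some(326.0), Some(327.0), None];

    // Dropping nulls per column keeps two items on each side,
    // but "Good" now lines up with 327.0, which came from another row.
    let naive_x: Vec<&str> = cuts.iter().flatten().copied().collect();
    let naive_y: Vec<f64> = prices.iter().flatten().copied().collect();
    println!("{:?} {:?}", naive_x, naive_y);

    // Dropping incomplete rows keeps every remaining pair intact.
    let (x, y) = paired_non_null(&cuts, &prices);
    println!("{:?} {:?}", x, y);
}
```

When neither column contains nulls, as the snippet above assumes, both approaches give the same vectors.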
Customizing Plots
Plotly allows extensive customization of your plots, including titles, axis labels, and legends.
use plotly::common::Title;
use plotly::layout::{Axis, Layout};
let mut plot = Plot::new();
let layout = Layout::new()
.title(Title::new("Diamond Prices by Cut"))
.x_axis(Axis::new().title(Title::new("Cut")))
.y_axis(Axis::new().title(Title::new("Price")));
plot.set_layout(layout);
plot.add_trace(trace);
plot.show();
Conclusion
Rust, combined with Polars and Plotly, offers a robust and efficient environment for data analysis and visualization. Here are the key points covered:
- Reading and Manipulating Data: Use Polars to read CSV files, select columns, handle missing values, and perform group-by and aggregate operations.
- Visualizing Data: Use Plotly to create interactive and customizable plots such as bar charts, line charts, and more.
- Customization: Customize your plots with titles, axis labels, and legends to effectively communicate your data insights.
By leveraging these libraries, you can build high-performance data analysis applications in Rust that are both efficient and visually appealing.
Example Code
Here is the complete example code used in this blog post:
use polars::prelude::*;
use plotly::{Plot, Bar};
fn main() {
// Read A CSV File in Polars
let df = CsvReadOptions::default()
.try_into_reader_with_file_path(Some("diamonds.csv".into()))
.unwrap()
.finish()
.unwrap();
// Preview
println!("{}", df.tail(Some(4)));
// Shape
println!("{:?}", df.shape());
// Get the total number of values (rows × columns)
println!("{:?}", df.size());
// Selection of Columns
let sv: Vec<&Column> = df.columns(["color", "clarity"]).expect("Failed to Work");
println!("{:?}", sv);
// Select using the Select
let result = df
.clone()
.select(["cut", "price"]);
println!("{:?}", result);
// Missing Nan and Null
let null_count = df.null_count();
println!("{}", null_count);
// Fill NaN /Null
let df2 = df!(
"col1" => [0.5, 1.0, 1.5, 2.0, 2.5],
"col2" => [Some(1), None, Some(3), None, Some(5)],
).expect("Failed");
let fill_literal_df = df2
.clone()
.lazy()
.with_column(col("col2").fill_null(3))
.collect().expect("Failed");
println!("{}", fill_literal_df);
// Unique Values or Class Distribution
let class_distr = df
.clone()
.lazy()
.select([col("cut").n_unique().alias("n_unique"),])
.collect().expect("Failed");
println!("{}", class_distr);
// Filter Groupby
let grouped_diamonds = df.clone().lazy().group_by(["cut"]).agg([
col("price").sum().alias("total_price"),
col("price").mean().alias("average_price"),
col("price").count().alias("counts"),
]).collect().expect("Failed");
println!("{:?}", grouped_diamonds);
// Concat Two DF
let df_v1 = df!(
"a" => &[1],
"b" => &[3],
"c" => &[4],
).unwrap();
let df_v2 = df!(
"a" => &[5],
"b" => &[6], // illustrative value; the original was lost
"c" => &[7], // illustrative value; the original was lost
).unwrap();
// Vertical Concat
let df_vertical = concat([df_v1.clone().lazy(), df_v2.clone().lazy()],
UnionArgs::default())
.unwrap()
.collect().expect("Failed to concat");
println!("{}", &df_vertical);
}
This example demonstrates a comprehensive workflow for data analysis in Rust, showing that it can be a powerful tool for any data scientist or analyst.
You can check out the video tutorials below
Thanks for your attention
Jesus Saves
Jesse E.Agbe (JCharis)