
Building A Fake Data Generator App with Streamlit

Data is everywhere, but when building a website or testing a product you often need quick sample data to test-drive it. This is where the Faker library comes into play. The idea behind Faker is simple: generate random data for specific fields such as names, addresses, and emails. The library exists in several languages under similar names, such as Faker.js for JavaScript, PHP’s Faker, Perl’s Data::Faker, and Ruby’s Faker.

In this tutorial we will build a web application using Streamlit and Faker. The app generates fake data that we can download as CSV or JSON and use for our own tasks.

Let us see the requirements. To build this app we need just three packages:

  • streamlit: for our web app
  • pandas: for previewing the data and converting it into CSV and JSON
  • faker: for generating our fake data

Installation of Packages

You can install these packages using pip:

pip install streamlit pandas faker

Now let us look at the basic structure of our app. It has two main sections. The first section generates a basic, simple profile, and the second generates a customizable, field-specific profile where you can select which fields you need.

Let us check the code

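# Core Pkgs
import streamlit as st
import streamlit.components.v1 as stc
# Data Pkgs
import pandas as pd
from faker import Faker
from faker.config import AVAILABLE_LOCALES
# Utils
import base64
import time

fake = Faker()
# Timestamp used in the name of the downloaded file
timestr = time.strftime("%Y%m%d-%H%M%S")
# Locale codes offered in the sidebar. This list is used later but never
# defined in the listing, so taking Faker's available locales is an assumption.
localized_providers = sorted(AVAILABLE_LOCALES)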

We will create individual functions to make our work simpler.

# Fxn to Download Into A Specified Format
def make_downloadable_df_format(data, format_type="csv"):
    if format_type == "csv":
        datafile = data.to_csv(index=False)
    elif format_type == "json":
        datafile = data.to_json()
    else:
        raise ValueError(f"Unsupported format: {format_type}")
    b64 = base64.b64encode(datafile.encode()).decode()  # B64 encoding
    st.markdown("### ** Download File  📩 ** ")
    new_filename = "fake_dataset_{}.{}".format(timestr, format_type)
    # Embed the encoded file in a data-URI link so the browser downloads it on click
    href = f'<a href="data:file/{format_type};base64,{b64}" download="{new_filename}">Click Here!</a>'
    st.markdown(href, unsafe_allow_html=True)

# Generate A Simple Profile
def generate_profile(number, random_seed=200):
    Faker.seed(random_seed)
    data = [fake.simple_profile() for _ in range(number)]
    df = pd.DataFrame(data)
    return df

# Generate A Customized Profile Per Locality
def generate_locale_profile(number, locale, random_seed=200):
    locale_fake = Faker(locale)
    Faker.seed(random_seed)
    data = [locale_fake.simple_profile() for _ in range(number)]
    df = pd.DataFrame(data)
    return df

Next we will create individual inputs to receive data from the front end using Streamlit widgets and then pass them to our functions. Let us see the code for that:

number_to_gen = st.sidebar.number_input("Number", 10, 5000)
locale = st.sidebar.multiselect("Select Locale", localized_providers, default=["en_US"])
dataformat = st.sidebar.selectbox("Save Data As", ["csv", "json"])

The received data will then be passed to our generate_locale_profile() function. Below is the code within our main function.

def main():
    st.title("Fake Data Generator")
    menu = ["Home", "Customize", "About"]

    choice = st.sidebar.selectbox("Menu", menu)
    if choice == "Home":
        st.subheader("Home")
        number_to_gen = st.sidebar.number_input("Number", 10, 5000)
        locale = st.sidebar.multiselect("Select Locale", localized_providers, default=["en_US"])
        dataformat = st.sidebar.selectbox("Save Data As", ["csv", "json"])

        df = generate_locale_profile(number_to_gen, locale)
        st.dataframe(df)

        # Quick summary of how many profiles were generated per sex
        st.write(df['sex'].value_counts())
        # st.expander was named st.beta_expander in older Streamlit releases
        with st.expander("📩: Download"):
            make_downloadable_df_format(df, dataformat)

    elif choice == "Customize":
        st.subheader("Select Your Fields")

        locale = st.sidebar.multiselect("Select Locale", localized_providers, default=["en_US"])

        profile_options_list = ['username', 'name', 'sex', 'address', 'mail', 'birthdate', 'job', 'company', 'ssn', 'residence', 'current_location', 'blood_group', 'website']
        profile_fields = st.sidebar.multiselect("Fields", profile_options_list, default=['username'])

        number_to_gen = st.sidebar.number_input("Number", 10, 10000)
        dataformat = st.sidebar.selectbox("Save Data As", ["csv", "json"])

        # Build only the selected fields for each generated profile
        custom_fake = Faker(locale)
        data = [custom_fake.profile(fields=profile_fields) for _ in range(number_to_gen)]
        df = pd.DataFrame(data)

        st.dataframe(df)

        with st.expander("🔍: View JSON "):
            st.json(data)

        with st.expander("📩: Download"):
            make_downloadable_df_format(df, dataformat)

    else:
        st.subheader("About")
        st.success("Built with Streamlit")
        st.info("Jesus Saves @JCharisTech")
        st.text("By Jesse E.Agbe(JCharis)")

if __name__ == '__main__':
    main()
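The download step in make_downloadable_df_format() base64-encodes the file contents before handing them to st.markdown. One common way to turn that into a clickable download is a data-URI link, and the sketch below assumes that approach. It runs without Streamlit, and the sample DataFrame and file name are made up for illustration.

```python
import base64
import pandas as pd

# Tiny stand-in dataset (hypothetical values)
df = pd.DataFrame({"name": ["Ada", "Linus"], "sex": ["F", "M"]})

csv_text = df.to_csv(index=False)
b64 = base64.b64encode(csv_text.encode()).decode()

filename = "fake_dataset_example.csv"  # hypothetical file name
href = f'<a href="data:file/csv;base64,{b64}" download="{filename}">Click Here!</a>'
print(href)

# Decoding the payload gives back the original CSV text
decoded = base64.b64decode(b64).decode()
print(decoded == csv_text)
```

Because the whole file is embedded in the page, this works best for small datasets; for very large ones the link itself becomes large.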

We have seen how easy it is to build something cool using Streamlit and the Python Faker library.

You can also check out the video tutorial below and the code from here.

Thanks For Your Time

Jesus Saves

By Jesse E.Agbe(JCharis)