One of the cool things about building web applications is the ability to either upload or download files from the web app.
In this tutorial we will be exploring streamlit file upload feature. With this feature you can upload any type of files and use them in your app. To work with the file uploads you will have to use the st.file_uploader() function.
Let us see how the st.file_uploader() functions works.
First of all let us explore the various features of the st.file_uploader()
- Ability to specify the type of file you want to allow(type=[]): This feature is quite useful as it gives you a form of security out of the box with little code. Hence unspecified file types are disallowed and restricted when the user uploads a file.
- Ability to receive multiple files: (accept_multiple_files=True): With this feature you can accept multiple file as well as even select multiple files and upload them.
- Ability to specify the limit of files to upload : By default the maximum limit is 200mb but Streamlit allows you to change the limit.
These are the commonest features out of the box but there are some other nice features that needs explanation.
Let us see how the file_uploader() works.
The UploadedFile Class
Any file uploaded will be seen under the UploadedFile Class which in this context is a subclass of ByteIO . This is ‘file-like ‘ so in case you are working with it – you are to treat it like how you will treat a file. Therefore you will have to use the right file processing library or package to process or read your uploaded file.
Hence if it is an image you will have to use PIL or OpenCV ,etc.
Filetype and Their Respective Packages
Image(PNG,JPEG,JPG): PIL,OpenCV,Scikit-Image
CSV: Pandas, DataTable,CSV
PDF: pyPDF2,pdfplumber,etc
Docx: Docx2Txt,python-docx,textract,
Getting the FIlename and Filesize
With the UploadedFile Class you can get the original filename, filesize and filetype. Doing a dir(uploaded_file) will show all the attributes and methods of this class. Hence you can get the original details.
For the original filename,filesize and filetype you can use the code below
uploaded_file = st.file_uploader("Upload Files",type=['png','jpeg']) if uploaded_file is not None: file_details = {"FileName":uploaded_file.name,"FileType":uploaded_file.type,"FileSize":uploaded_file.size} st.write(file_details)
This is the same thing that shows just below the drag and drop section when you upload a file.
Reading Each File Type
To make it easy I have built a simple file_upload app and you can check the code below
import streamlit as st
import streamlit.components.v1 as stc
# File Processing Pkgs
import pandas as pd
import docx2txt
from PIL import Image
from PyPDF2 import PdfFileReader
import pdfplumber
def read_pdf(file):
pdfReader = PdfFileReader(file)
count = pdfReader.numPages
all_page_text = ""
for i in range(count):
page = pdfReader.getPage(i)
all_page_text += page.extractText()
return all_page_text
def read_pdf_with_pdfplumber(file):
with pdfplumber.open(file) as pdf:
page = pdf.pages[0]
return page.extract_text()
# import fitz # this is pymupdf
# def read_pdf_with_fitz(file):
# with fitz.open(file) as doc:
# text = ""
# for page in doc:
# text += page.getText()
# return text
# Fxn
@st.cache
def load_image(image_file):
img = Image.open(image_file)
return img
def main():
st.title("File Upload Tutorial")
menu = ["Home","Dataset","DocumentFiles","About"]
choice = st.sidebar.selectbox("Menu",menu)
if choice == "Home":
st.subheader("Home")
image_file = st.file_uploader("Upload Image",type=['png','jpeg','jpg'])
if image_file is not None:
# To See Details
# st.write(type(image_file))
# st.write(dir(image_file))
file_details = {"Filename":image_file.name,"FileType":image_file.type,"FileSize":image_file.size}
st.write(file_details)
img = load_image(image_file)
st.image(img,width=250,height=250)
elif choice == "Dataset":
st.subheader("Dataset")
data_file = st.file_uploader("Upload CSV",type=['csv'])
if st.button("Process"):
if data_file is not None:
file_details = {"Filename":data_file.name,"FileType":data_file.type,"FileSize":data_file.size}
st.write(file_details)
df = pd.read_csv(data_file)
st.dataframe(df)
elif choice == "DocumentFiles":
st.subheader("DocumentFiles")
docx_file = st.file_uploader("Upload File",type=['txt','docx','pdf'])
if st.button("Process"):
if docx_file is not None:
file_details = {"Filename":docx_file.name,"FileType":docx_file.type,"FileSize":docx_file.size}
st.write(file_details)
# Check File Type
if docx_file.type == "text/plain":
# raw_text = docx_file.read() # read as bytes
# st.write(raw_text)
# st.text(raw_text) # fails
st.text(str(docx_file.read(),"utf-8")) # empty
raw_text = str(docx_file.read(),"utf-8") # works with st.text and st.write,used for futher processing
# st.text(raw_text) # Works
st.write(raw_text) # works
elif docx_file.type == "application/pdf":
# raw_text = read_pdf(docx_file)
# st.write(raw_text)
try:
with pdfplumber.open(docx_file) as pdf:
page = pdf.pages[0]
st.write(page.extract_text())
except:
st.write("None")
elif docx_file.type == "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
# Use the right file processor ( Docx,Docx2Text,etc)
raw_text = docx2txt.process(docx_file) # Parse in the uploadFile Class directory
st.write(raw_text)
else:
st.subheader("About")
st.info("Built with Streamlit")
st.info("Jesus Saves @JCharisTech")
st.text("Jesse E.Agbe(JCharis)")
if __name__ == '__main__':
main()
You can also check out the video tutorial below
Thanks For Your Time
Jesus Saves
By Jesse E.Agbe(JCharis)
Hi Jesse,
Thanks for this. Is there a way to add delimiters to CSV files? When I use:
data_file = st.file_uploader(‘Weekly Sales Data’,type=[‘csv’,’txt’,’xlsx’])
if data_file:
if data_file.name[-3:] == ‘csv’:
df_data = pd.read_csv(data_file, delimiter=’\|’)
elif data_file.name[-3:] == ‘txt’:
df_data = pd.read_csv(data_file, delimiter=’\|’)
else:
df_data = pd.read_excel(data_file)
It gives me ValueError: I/O operation on closed file. But then try:
data_file = st.file_uploader(‘Weekly Sales Data’,type=[‘csv’,’txt’,’xlsx’])
if data_file:
if data_file.name[-3:] == ‘csv’:
df_data = pd.read_csv(io.StringIO(data_file.read().decode(‘utf-8′)), delimiter=’\|’)
elif data_file.name[-3:] == ‘txt’:
df_data = pd.read_csv(io.StringIO(data_file.read().decode(‘utf-8′)), delimiter=’\|’)
else:
df_data = pd.read_excel(data_file)
It reads the file fine, but if I change any parameters but leave the file in the upload it says the same thing.
So I added:
if data_file:
del data_file
under. It doesn’t work.
Any advice?
Hello Peter, thanks for reaching out.
The issue you are having is strange for me, it is supposed to work directly without even using the IO. Please which version of streamlit are you using?
About the condition, the UploadedFile class has a
.type
attribute that you can use for the condition eg.if data_file is not None:
if data_file.type == 'csv':
df = pd.read_csv(data_file)
elif data_file.type == 'txt':
df = pd.read_csv(data_file,delimiter='\t')
...
Hope this helps