Working with File Uploads in Streamlit Python

One of the cool things about building web applications is the ability to either upload or download files from the web app.

In this tutorial we will be exploring streamlit file upload feature. With this feature you can upload any type of files and use them in your app. To work with the file uploads you will have to use the st.file_uploader() function.

Let us see how the st.file_uploader() functions works.

First of all let us explore the various features of the st.file_uploader()

Ability to specify the type of file you want to allow(type=[]): This feature is quite useful as it gives you a form of security out of the box with little code. Hence unspecified file types are disallowed and restricted when the user uploads a file.
Ability to receive multiple files: (accept_multiple_files=True): With this feature you can accept multiple file as well as even select multiple files and upload them.
Ability to specify the limit of files to upload : By default the maximum limit is 200mb but Streamlit allows you to change the limit.

These are the commonest features out of the box but there are some other nice features that needs explanation.

Let us see how the file_uploader() works.

The UploadedFile Class

Any file uploaded will be seen under the UploadedFile Class which in this context is a subclass of ByteIO . This is ‘file-like ‘ so in case you are working with it – you are to treat it like how you will treat a file. Therefore you will have to use the right file processing library or package to process or read your uploaded file.

Hence if it is an image you will have to use PIL or OpenCV ,etc.

Filetype and Their Respective Packages

Image(PNG,JPEG,JPG): PIL,OpenCV,Scikit-Image

CSV: Pandas, DataTable,CSV

PDF: pyPDF2,pdfplumber,etc

Docx: Docx2Txt,python-docx,textract,

Getting the FIlename and Filesize

With the UploadedFile Class you can get the original filename, filesize and filetype. Doing a dir(uploaded_file) will show all the attributes and methods of this class. Hence you can get the original details.

For the original filename,filesize and filetype you can use the code below

uploaded_file = st.file_uploader("Upload Files",type=['png','jpeg'])
if uploaded_file is not None:
    file_details = {"FileName":uploaded_file.name,"FileType":uploaded_file.type,"FileSize":uploaded_file.size}
     st.write(file_details)

This is the same thing that shows just below the drag and drop section when you upload a file.

Reading Each File Type

To make it easy I have built a simple file_upload app and you can check the code below

import streamlit as st
import streamlit.components.v1 as stc

# File Processing Pkgs
import pandas as pd
import docx2txt
from PIL import Image 
from PyPDF2 import PdfFileReader
import pdfplumber


def read_pdf(file):
	pdfReader = PdfFileReader(file)
	count = pdfReader.numPages
	all_page_text = ""
	for i in range(count):
		page = pdfReader.getPage(i)
		all_page_text += page.extractText()

	return all_page_text

def read_pdf_with_pdfplumber(file):
	with pdfplumber.open(file) as pdf:
	    page = pdf.pages[0]
	    return page.extract_text()

# import fitz  # this is pymupdf

# def read_pdf_with_fitz(file):
# 	with fitz.open(file) as doc:
# 		text = ""
# 		for page in doc:
# 			text += page.getText()
# 		return text 

# Fxn
@st.cache
def load_image(image_file):
	img = Image.open(image_file)
	return img 



def main():
	st.title("File Upload Tutorial")

	menu = ["Home","Dataset","DocumentFiles","About"]
	choice = st.sidebar.selectbox("Menu",menu)

	if choice == "Home":
		st.subheader("Home")
		image_file = st.file_uploader("Upload Image",type=['png','jpeg','jpg'])
		if image_file is not None:
		
			# To See Details
			# st.write(type(image_file))
			# st.write(dir(image_file))
			file_details = {"Filename":image_file.name,"FileType":image_file.type,"FileSize":image_file.size}
			st.write(file_details)

			img = load_image(image_file)
			st.image(img,width=250,height=250)


	elif choice == "Dataset":
		st.subheader("Dataset")
		data_file = st.file_uploader("Upload CSV",type=['csv'])
		if st.button("Process"):
			if data_file is not None:
				file_details = {"Filename":data_file.name,"FileType":data_file.type,"FileSize":data_file.size}
				st.write(file_details)

				df = pd.read_csv(data_file)
				st.dataframe(df)

	elif choice == "DocumentFiles":
		st.subheader("DocumentFiles")
		docx_file = st.file_uploader("Upload File",type=['txt','docx','pdf'])
		if st.button("Process"):
			if docx_file is not None:
				file_details = {"Filename":docx_file.name,"FileType":docx_file.type,"FileSize":docx_file.size}
				st.write(file_details)
				# Check File Type
				if docx_file.type == "text/plain":
					# raw_text = docx_file.read() # read as bytes
					# st.write(raw_text)
					# st.text(raw_text) # fails
					st.text(str(docx_file.read(),"utf-8")) # empty
					raw_text = str(docx_file.read(),"utf-8") # works with st.text and st.write,used for futher processing
					# st.text(raw_text) # Works
					st.write(raw_text) # works
				elif docx_file.type == "application/pdf":
					# raw_text = read_pdf(docx_file)
					# st.write(raw_text)
					try:
						with pdfplumber.open(docx_file) as pdf:
						    page = pdf.pages[0]
						    st.write(page.extract_text())
					except:
						st.write("None")
					    
					
				elif docx_file.type == "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
				# Use the right file processor ( Docx,Docx2Text,etc)
					raw_text = docx2txt.process(docx_file) # Parse in the uploadFile Class directory
					st.write(raw_text)

	else:
		st.subheader("About")
		st.info("Built with Streamlit")
		st.info("Jesus Saves @JCharisTech")
		st.text("Jesse E.Agbe(JCharis)")



if __name__ == '__main__':
	main()

You can also check out the video tutorial below

Thanks For Your Time

Jesus Saves

By Jesse E.Agbe(JCharis)

2 thoughts on “Working with File Uploads in Streamlit Python”

Peter Vlok
March 12, 2021 at 9:55 am

Hi Jesse,

Thanks for this. Is there a way to add delimiters to CSV files? When I use:

data_file = st.file_uploader(‘Weekly Sales Data’,type=[‘csv’,’txt’,’xlsx’])
if data_file:
if data_file.name[-3:] == ‘csv’:
df_data = pd.read_csv(data_file, delimiter=’\|’)
elif data_file.name[-3:] == ‘txt’:
df_data = pd.read_csv(data_file, delimiter=’\|’)
else:
df_data = pd.read_excel(data_file)

It gives me ValueError: I/O operation on closed file. But then try:

data_file = st.file_uploader(‘Weekly Sales Data’,type=[‘csv’,’txt’,’xlsx’])
if data_file:
if data_file.name[-3:] == ‘csv’:
df_data = pd.read_csv(io.StringIO(data_file.read().decode(‘utf-8′)), delimiter=’\|’)
elif data_file.name[-3:] == ‘txt’:
df_data = pd.read_csv(io.StringIO(data_file.read().decode(‘utf-8′)), delimiter=’\|’)
else:
df_data = pd.read_excel(data_file)

It reads the file fine, but if I change any parameters but leave the file in the upload it says the same thing.

So I added:

if data_file:
del data_file

under. It doesn’t work.

Any advice?

1. jesse_jcharis
  March 17, 2021 at 12:51 pm
  
  Hello Peter, thanks for reaching out.
  The issue you are having is strange for me, it is supposed to work directly without even using the IO. Please which version of streamlit are you using?
  About the condition, the UploadedFile class has a .type attribute that you can use for the condition eg.
  if data_file is not None: if data_file.type == 'csv': df = pd.read_csv(data_file) elif data_file.type == 'txt': df = pd.read_csv(data_file,delimiter='\t') ...
  
  Hope this helps

The UploadedFile Class

Filetype and Their Respective Packages

Reading Each File Type

2 thoughts on “Working with File Uploads in Streamlit Python”

Leave a Comment Cancel Reply