Go, also known as Golang, is a simple yet powerful programming language created around 2009 and backed by Google. Go is a fast, statically typed, compiled language that feels like a dynamically typed, interpreted language.
With many features built into it, it is a great tool.
In this series of tutorials and posts we will be experimenting with Golang as a tool for performing NLP and Data Science. As most of us know, Python is currently the de facto language for Data Science and NLP due to its simplicity and the numerous libraries and resources available.
So where does Golang fit in the realm of Data Science? Can we even use Go for Data Science? Can we use Go for Natural Language Processing? (You can check out the upcoming video tutorials and courses on Go4DataScience.)
The answer is definitely yes! We can use Go for Data Science. However, when using any tool you should ask yourself:
- How efficient am I becoming at solving the problem or meeting the need?
- How much time am I saving or wasting?
- What resources am I under-using or abusing?
Go has several advantages: it has built-in concurrency, cross-compilation for several operating systems, and several web frameworks. It also has a clean, simple but robust syntax.
Let us classify the areas that Golang is great for, based on certain features it comes with inherently.
Concurrency
According to Wikipedia, concurrency has the benefit of increasing program throughput: the parallel execution of a concurrent program allows the number of tasks completed in a given time to increase proportionally to the number of processors, according to Gustafson's law. In short, this improves performance when multitasking, multithreading, etc.
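Here is a minimal sketch of what this looks like in Go. The program splits a slice of numbers into chunks and sums each chunk in its own goroutine. It uses `sync.WaitGroup` to wait for every goroutine to finish, then combines the partial sums. The helper name `parallelSum` and the chunking scheme are illustrative assumptions, not a standard library API.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelSum splits data into at most `workers` chunks, sums each chunk
// in its own goroutine, and adds up the partial results.
// (Hypothetical helper for illustration.)
func parallelSum(data []float64, workers int) float64 {
	if workers < 1 {
		workers = 1
	}
	chunkSize := (len(data) + workers - 1) / workers // ceiling division
	partials := make([]float64, workers)             // one slot per goroutine

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunkSize
		if start >= len(data) {
			break // fewer chunks than workers
		}
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		wg.Add(1)
		go func(idx int, chunk []float64) {
			defer wg.Done()
			for _, v := range chunk {
				partials[idx] += v
			}
		}(w, data[start:end])
	}
	wg.Wait() // block until every goroutine has finished

	total := 0.0
	for _, p := range partials {
		total += p
	}
	return total
}

func main() {
	data := make([]float64, 1000)
	for i := range data {
		data[i] = float64(i + 1)
	}
	fmt.Printf("sum = %.0f\n", parallelSum(data, 4)) // sum = 500500
}
```

Each goroutine writes only to its own slot in `partials`, so no mutex is needed. `wg.Wait()` guarantees that all writes have finished before the final loop reads them.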
Golang's concurrency makes it a great choice for building cloud native applications. Hence, when building cloud native apps and web apps that must handle many simultaneous requests and tasks, Go is a great option.
This is because a Go server can respond to multiple requests at the same time instead of one after another. It handles each request in a lightweight goroutine scheduled by the Go runtime.
Examples of popular cloud native apps built in Golang include:
Cross Compilation For Multiple Platforms
Go has a very important feature that makes it super useful for building tools: the ability to build cross-platform binaries. With this feature you can create utilities and distribute them with ease.
Moreover, building binaries in Golang requires little to no external dependencies, so you can create tools and share them without worrying whether they will work on a different OS. You can build for Windows, Linux, macOS, FreeBSD and even Android.
HashiCorp, a DevOps and cloud-oriented company, builds almost all of its tools with Go because of this feature. Tools like Terraform, Vault, Nomad and Waypoint were built with Golang and can be used everywhere.
Portable and Quick Building of Binaries
Building on the feature above (portable binaries built with ease), you can create simple, useful CLI tools that are easy to distribute.
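For example, a tiny NLP command-line tool needs nothing beyond the standard library. The sketch below is a hypothetical `ngrams` tool. It uses the `flag` package to read an `-n` option and prints the word n-grams of the text passed as arguments. The tool name, the flag and the whitespace-only tokenization are illustrative assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// ngrams returns the contiguous word n-grams of text, joined by spaces.
// Words are split on whitespace only (no punctuation handling).
func ngrams(text string, n int) []string {
	words := strings.Fields(text)
	if n < 1 || n > len(words) {
		return nil
	}
	out := make([]string, 0, len(words)-n+1)
	for i := 0; i+n <= len(words); i++ {
		out = append(out, strings.Join(words[i:i+n], " "))
	}
	return out
}

func main() {
	n := flag.Int("n", 2, "size of each n-gram")
	flag.Parse()

	// Treat all remaining command-line arguments as the input text.
	text := strings.Join(flag.Args(), " ")
	if text == "" {
		fmt.Fprintln(os.Stderr, "usage: ngrams [-n N] TEXT...")
		return // nothing to process
	}
	for _, g := range ngrams(text, *n) {
		fmt.Println(g)
	}
}
```

After `go build -o ngrams`, running `./ngrams -n 2 go is fun` prints `go is` and `is fun` on separate lines. The same source can be cross compiled for every platform your users need.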
Networking & HTTP
Go comes with a powerful, full-featured HTTP library (net/http) that allows anyone to build web applications without any external libraries. Moreover, there are several web application frameworks in Go that follow a similar API. This makes it easy to build web apps, and these apps can leverage Golang's performance and concurrency.
Memory Safety, Static Typing & C-like Features
Go can be used to build (or rebuild) much of the ageing infrastructure the internet runs on. Its static typing and memory safety can serve as a safe foundation for building the next generation of internet infrastructure.
Now what about Data Science and NLP?
For NLP, Go can be used in many places. A simple example is building portable NLP tools and desktop apps.
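A typical building block for such tools is a term-frequency counter. The sketch below lowercases the text and splits it into words with `strings.FieldsFunc` and the `unicode` package. It then returns the most frequent terms. The function name `topTerms` and the simple tokenization rule (runs of letters and digits) are assumptions, not a standard Go NLP API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// termCount pairs a term with how often it occurs.
type termCount struct {
	Term  string
	Count int
}

// topTerms returns the k most frequent lowercase terms in text.
// A "term" is any run of letters or digits; everything else separates terms.
// Ties are broken alphabetically so the output is deterministic.
func topTerms(text string, k int) []termCount {
	isSep := func(r rune) bool { return !unicode.IsLetter(r) && !unicode.IsDigit(r) }
	counts := make(map[string]int)
	for _, w := range strings.FieldsFunc(strings.ToLower(text), isSep) {
		counts[w]++
	}

	terms := make([]termCount, 0, len(counts))
	for t, c := range counts {
		terms = append(terms, termCount{t, c})
	}
	sort.Slice(terms, func(i, j int) bool {
		if terms[i].Count != terms[j].Count {
			return terms[i].Count > terms[j].Count // most frequent first
		}
		return terms[i].Term < terms[j].Term
	})
	if k >= 0 && k < len(terms) {
		terms = terms[:k]
	}
	return terms
}

func main() {
	text := "Go is fast. Go is simple, and Go is fun!"
	for _, tc := range topTerms(text, 3) {
		fmt.Printf("%s: %d\n", tc.Term, tc.Count)
	}
}
```

With the sample sentence, the program prints `go: 3`, `is: 3` and `and: 1`. Compiled once per platform, a helper like this can ship inside a portable NLP tool with no runtime to install.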
Given the features above, where does Golang fit best in the data science workflow? From my limited observation, Golang can be useful in:
- Scalable Infrastructure: using its concurrency and networking features
- Data Science Web Applications: using the numerous web frameworks
- Easy distribution of portable offline models and NLP tools
- Cross compilation, which makes deployment easy
- Unknown field
But there are a few big limitations to using Golang for Data Science.
Where Is Golang Not Suitable for Data Science (as of Now)?
- Exploratory Data Analysis & Data Cleaning
- Data Visualization & Plots
There are no simple ways to do exploratory data analysis and data cleaning in Golang. Unlike Python, an interpreted language with a REPL and several easy-to-use libraries and packages such as Pandas, NumPy and SciPy, Golang lacks such powerful libraries. Although there are a few packages such as Gonum, Gota, DataFrame and qframe, they are cumbersome for the beginner.
Just reading a CSV file can take more than three lines of code. You first have to open the file using os.Open (or read it with ioutil.ReadFile, now os.ReadFile) before parsing and processing the CSV data.
Moreover, in Data Science we want to save time, especially on time-consuming tasks such as data cleaning and munging. For now, using Go for this task will cost you more time than it saves. Hence the best option is to use Python or Julia to wrangle and clean the data, and then continue from there with Go.
The second limitation is in data visualization: it takes more code to plot a simple bar chart in Go than in Python. Moreover, there are not many materials and resources to make it easier or better for the data scientist.
I hope that in the coming years there will be many powerful and versatile Golang libraries and resources to make Data Science fun and fast.
Apart from these two limitations, Go is great and can be used for Data Science.
Check out the upcoming video tutorials and posts on Go4DataScience as we experiment together on using Golang for Data Science.
Thanks for your time!
By Jesse E.Agbe (JCharis)