NeatBio

NeatBio is yet another simple bioinformatics tool, built for basic DNA sequence analysis and protein analysis.

It was built to show how simple custom functions can be used for bioinformatics, and how those functions can then be converted into a reusable tool via packaging.

Unlike well-tested bioinformatics libraries such as BioPython, Biotite and Scikit-Bio, NeatBio has certain minor limitations, as it was not intended to replace or compete with the libraries mentioned above.

Why NeatBio?

  • NeatBio is yet another bioinformatics library alongside powerful and popular bioinformatics libraries such as BioPython, Scikit-Bio and Biotite.
  • It is meant to complement these powerful libraries in a simple way.
  • It was built in the BioInformatics with Python Course as an educational step on how to create custom functions for sequence analysis. It was then converted into a simple Python library.

NeatBio is part of the NEAT project, which is maintained by @jcharis. Contributors are welcome.

Features

  • Handling Sequences(DNA,RNA,Protein)
  • Protein Synthesis
  • Sequence Similarity
  • Kmers Generation and Kmer Distance
  • Probable Back Translation of Amino Acids
  • Reading FASTA files
  • Sequence Alignment

Getting Started

Installation

  • Using Pip
pip install neatbio

Usage

NeatBio is meant to be used for sequence analysis in bioinformatics and computational biology. NeatBio comes with 2 main classes for creating and handling sequences:

  • Sequence: For creating DNA and RNA sequences.
  • ProteinSeq: For creating and working with protein sequences.

There is also the sequtils subpackage, which offers several utilities for working with sequences and is accessed as neatbio.sequtils. For sequence alignment, which is still experimental, there is neatbio.alignments for global and local alignments.

Handling Sequences

  • NeatBio lets you analyze sequences to gain more insight into them.
  • You can create a simple Sequence object for working with DNA and RNA.
  • You can also perform basic sequence operations such as transcription, translation and complement, as well as find, count and index.
>>> import neatbio as nt
>>> seq1 = nt.Sequence('ATGCATTGA')
>>> seq1.transcribe()
'AUGCAUUGA'
>>> mrna = seq1.transcribe()
>>> mrna.back_transcribe()
'ATGCATTGA'
>>> 

>>> seq1.translate()
'MH*'
>>> seq1.translate
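
To make the operations above concrete, here is a minimal pure-Python sketch of transcription, complement and translation. It is not NeatBio's implementation: the function names and the compact codon-table construction (the standard genetic code, NCBI translation table 1) are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), written compactly.
# Codons are enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Return the mRNA for a DNA coding strand (T becomes U)."""
    return dna.upper().replace("T", "U")


def back_transcribe(rna):
    """Return the DNA coding strand for an mRNA (U becomes T)."""
    return rna.upper().replace("U", "T")


def complement(dna):
    """Return the base-by-base complement of a DNA sequence."""
    return dna.upper().translate(str.maketrans("ATGC", "TACG"))


def translate(dna):
    """Translate DNA codon by codon; trailing bases that do not fill a codon are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(transcribe("ATGCATTGA"))  # AUGCAUUGA
print(translate("ATGCATTGA"))   # MH*
```

The sketch works on plain strings, whereas NeatBio wraps the sequence in a Sequence object.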

DNA Composition – GC and AT Content and Frequency

With neatbio you can get the GC and AT content as the gc and at attributes, and the frequency of each using the gc_frequency and at_frequency methods respectively. There are also gc_content and at_content functions that you can import from the sequtils subpackage.

>>> import neatbio as nt
>>> seq1 = nt.Sequence('ATGCATTGA')
>>> seq1.gc
33.33333333333333
>>> seq1.gc_frequency()
3
>>> seq1.at
66.66666666666666
>>> seq1.at_frequency()
6
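
As a rough illustration of what these numbers mean, the sketch below computes the same quantities on a plain string. The function names mirror the ones mentioned above, but the bodies are an assumption about how such a calculation could be written, not NeatBio's own code.

```python
def gc_frequency(seq):
    """Count the G and C bases in a sequence."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def at_frequency(seq):
    """Count the A and T bases in a sequence."""
    seq = seq.upper()
    return seq.count("A") + seq.count("T")


def gc_content(seq):
    """Percentage of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return gc_frequency(seq) / len(seq) * 100


def at_content(seq):
    """Percentage of bases that are A or T (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return at_frequency(seq) / len(seq) * 100


dna = "ATGCATTGA"
print(gc_frequency(dna), at_frequency(dna))  # 3 6
print(round(gc_content(dna), 2), round(at_content(dna), 2))
```

For a sequence containing only A, T, G and C, the two percentages add up to 100.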

Working with Proteins

You can also create a ProteinSeq object to work with proteins only. This keeps the handling of DNA and RNA sequences separate from that of protein sequences. It also makes it possible to back-translate any protein sequence.

  • Note that the back_translate() function returns a probable sequence, not the exact original, because multiple codons can encode the same amino acid.
>>> protein1 = nt.ProteinSeq('MIT')
>>> protein1
ProteinSeq(seq='MIT')
>>> protein1 = nt.ProteinSeq('MIT')
>>> protein1.back_translate()
Sequence(seq='ATGATAACT')
>>> protein1.aromaticity()
0.0
>>> protein1.get_amino_acid_percentage()
{'A': 0.0, 'C': 0.0, 'D': 0.0, 'E': 0.0, 'F': 0.0, 'G': 0.0, 'H': 0.0, 'I': 0.3333333333333333, 'K': 0.0, 'L': 0.0, 'M': 0.3333333333333333, 'N': 0.0, 'P': 0.0, 'Q': 0.0, 'R': 0.0, 'S': 0.0, 'T': 0.3333333333333333, 'V': 0.0, 'W': 0.0, 'Y': 0.0, '*': 0.0, '-': 0.0}
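
The note about back-translation can be illustrated with a small sketch. It assumes the standard genetic code and a deliberately naive rule (take the first codon listed for each amino acid), so it may pick different codons than NeatBio does. Both the rule and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group the codons by the amino acid they encode.
CODONS_FOR = {}
for codon, amino_acid in CODON_TABLE.items():
    CODONS_FOR.setdefault(amino_acid, []).append(codon)


def back_translate(protein):
    """Return one probable DNA sequence by taking the first codon for each residue."""
    return "".join(CODONS_FOR[residue][0] for residue in protein.upper())


def translate(dna):
    """Translate DNA codon by codon (length assumed to be a multiple of 3)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(CODONS_FOR["I"])  # isoleucine has three codons, so the choice is ambiguous
candidate = back_translate("MIT")
print(candidate, translate(candidate))
```

Translating the candidate always gives back the original protein, even though the candidate is only one of several DNA sequences that could have produced it.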

Convert 3 Letter Amino Acid to 1 and vice versa

You can also convert a 3-letter amino acid code such as Ala or Leu to its single-letter code, and vice versa. Finally, if you want the full name of an amino acid, you can supply its 3-letter code.

>>> from neatbio.sequtils import convert_3to1,convert_1to3,get_acid_name
>>> convert_3to1('Ala')
'A'
>>> convert_1to3('L')
'Leu'

>>> get_acid_name('Ala')
'Alanine'
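
Conversions like these are essentially dictionary lookups. The sketch below shows one way to write them for the 20 standard amino acids. The names three_to_one and one_to_three are hypothetical and are not the NeatBio functions.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
# Invert the table for the opposite direction.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def three_to_one(code):
    """Convert a three-letter code such as 'Ala' (any letter case) to one letter."""
    return THREE_TO_ONE[code.capitalize()]


def one_to_three(code):
    """Convert a one-letter code such as 'L' (any letter case) to three letters."""
    return ONE_TO_THREE[code.upper()]


print(three_to_one("Ala"))  # A
print(one_to_three("L"))    # Leu
```

An unknown code raises a KeyError in this sketch.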

Generate DotPlot

A dot plot is a graphical method for comparing two biological sequences and identifying regions of close similarity between them. It places one sequence on the x-axis and the other on the y-axis of a plot, and marks every position where the two match. neatbio can generate a dot plot for two sequences.

>>> import neatbio as nt 
>>> import neatbio.sequtils as utils
>>> seq1 = nt.Sequence('AGTCGTACT')
>>> seq2 = nt.Sequence('AGGCGCACT')
>>> 
>>> utils.dotplot(seq1,seq2)
 |AGGCGCACT
-----------
A|■     ■  
G| ■■ ■    
T|        ■
C|   ■ ■ ■ 
G| ■■ ■    
T|        ■
A|■     ■  
C|   ■ ■ ■ 
T|        ■
>>> 
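
The layout above can be reproduced with a few lines of plain Python. This is a sketch under the assumption that a cell is marked whenever the two characters are identical. The helper name dotplot_lines is hypothetical and is not part of NeatBio.

```python
def dotplot_lines(seq1, seq2, mark="■"):
    """Build a text dot plot: seq2 runs across the top, seq1 down the side."""
    lines = [" |" + seq2, "-" * (len(seq2) + 2)]
    for a in seq1:
        # Mark every column whose character equals this row's character.
        row = "".join(mark if a == b else " " for b in seq2)
        lines.append(a + "|" + row)
    return lines


for line in dotplot_lines("AGTCGTACT", "AGGCGCACT"):
    print(line)
```

Runs of marks along a diagonal indicate stretches where the two sequences agree position by position.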

Reading FASTA Files

Although there are several biological data file formats, only FASTA files can be read for now. The read_fasta function fetches data from a FASTA file and returns a dictionary with the header and the sequence record.

>>> import neatbio as nt 
>>> file1 = nt.read_fasta('sequence.fasta')
>>> file1['seqRecord']
....
>>> file1['header']

>>> seq1 = nt.Sequence(file1['seqRecord'])
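
For readers unfamiliar with the format, a FASTA record is a header line starting with ">" followed by one or more lines of sequence. The sketch below parses the first record into a dictionary using the same two keys shown above. It is an illustrative assumption about how such a reader might work, not NeatBio's implementation, and the name read_first_fasta_record is hypothetical.

```python
import os
import tempfile


def read_first_fasta_record(path):
    """Return {'header': ..., 'seqRecord': ...} for the first record in a FASTA file."""
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    break  # a second record starts here; stop reading
                header = line[1:]
            elif header is not None:
                chunks.append(line)  # sequence may span several lines
    return {"header": header, "seqRecord": "".join(chunks)}


# Write a tiny example file so the sketch can be run as-is.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">demo_seq example record\nATGCAT\nTGA\n")
record = read_first_fasta_record(tmp.name)
os.remove(tmp.name)

print(record["header"])
print(record["seqRecord"])
```

The sequence lines are joined into a single string, which is the form a Sequence object expects.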

API Reference

NeatBio consists of a main core package and two main subpackages.

Core Package

Subpackages

Thanks For Using NeatBio

Let us know about any bugs you find and any ways we can improve it.

Jesus Saves