NeatBio is yet another simple bioinformatics tool for basic DNA sequence analysis and protein analysis.
It was built to show how simple custom functions can be used to do bioinformatics, and how those functions can then be turned into a reusable tool through packaging.
Unlike well-tested bioinformatics libraries such as Biopython, Biotite and scikit-bio, NeatBio has some minor limitations, as it was not intended to replace or compete with them.
Why NeatBio?
- NeatBio is yet another bioinformatics library alongside powerful and popular libraries such as Biopython, scikit-bio and Biotite.
- It is meant to complement these libraries in a simple way.
- It was built in the BioInformatics with Python Course as an educational exercise in creating custom functions for sequence analysis, and was then converted into a simple Python library.
NeatBio is part of the NEAT project, which is maintained by @jcharis; contributors are gladly welcomed.
Features
- Handling Sequences (DNA, RNA, Protein)
- Protein Synthesis
- Sequence Similarity
- Kmers Generation and Kmer Distance
- Probable Back Translation of Amino Acids
- Reading FASTA files
- Sequence Alignment
Getting Started
Installation
- Using Pip
pip install neatbio
Usage
NeatBio is meant to be used for sequence analysis in bioinformatics and computational biology. It comes with two main classes for creating and handling sequences:
- Sequence: for creating DNA and RNA sequences.
- ProteinSeq: for creating and working with protein sequences.
There is also the sequtils subpackage, accessed as neatbio.sequtils, which offers several utilities for working with sequences. For sequence alignment, which is still experimental, the neatbio.alignments subpackage provides global and local alignments.
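To illustrate what a global alignment computes, here is a minimal Needleman-Wunsch sketch in plain Python. It is not NeatBio's neatbio.alignments implementation: the function name needleman_wunsch and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment sketch (hypothetical helper, not NeatBio's API).

    Returns (score, aligned_a, aligned_b) for one optimal alignment.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # match/mismatch
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

For example, needleman_wunsch("AC", "A") returns (0, "AC", "A-"). A local alignment (Smith-Waterman) fills the same kind of table but floors every cell at zero and traces back from the highest-scoring cell instead of the corner.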
Handling Sequences
- NeatBio lets you analyze sequences for more insight.
- You can create a simple Sequence object for working with DNA and RNA.
- You can also perform basic sequence operations such as transcription, translation and complement, as well as find, count and index.
>>> import neatbio as nt
>>> seq1 = nt.Sequence('ATGCATTGA')
>>> seq1.transcribe()
'AUGCAUUGA'
>>> mrna = seq1.transcribe()
>>> mrna.back_transcribe()
'ATGCATTGA'
>>> seq1.translate()
'MH*'
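The operations above follow simple rules: transcription swaps T for U, the complement pairs A with T and G with C, and translation reads the sequence three bases at a time through the standard genetic code. The sketch below shows these rules in plain Python on ordinary strings; it is an illustration, not NeatBio's implementation, and all function names in it are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering of codons
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """DNA -> mRNA: replace thymine (T) with uracil (U)."""
    return dna.replace("T", "U")


def back_transcribe(rna):
    """mRNA -> DNA: replace uracil (U) with thymine (T)."""
    return rna.replace("U", "T")


def complement(dna):
    """Pair each base with its partner: A<->T, G<->C."""
    return dna.translate(str.maketrans("ATGC", "TACG"))


def reverse_complement(dna):
    """Complement read in the opposite (5'->3') direction."""
    return complement(dna)[::-1]


def translate(dna):
    """Translate codon by codon; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


print(transcribe("ATGCATTGA"))  # AUGCAUUGA
print(translate("ATGCATTGA"))   # MH*
```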
DNA Composition – GC and AT Content and Frequency
With NeatBio you can get the GC and AT content (as percentages) through the gc and at attributes, and the number of G/C and A/T bases through the gc_frequency() and at_frequency() methods respectively. There are also gc_content and at_content functions that you can import from the sequtils subpackage.
>>> import neatbio as nt
>>> seq1 = nt.Sequence('ATGCATTGA')
>>> seq1.gc
33.33333333333333
>>> seq1.gc_frequency()
3
>>> seq1.at
66.66666666666666
>>> seq1.at_frequency()
6
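These values are simple ratios: GC content is the number of G and C bases divided by the sequence length, times 100. The plain-Python sketch below assumes that "frequency" in NeatBio means the raw base count, which matches the 3 and 6 shown above for ATGCATTGA; the function names are hypothetical and not the sequtils API.

```python
def gc_count(seq):
    """Number of G and C bases (case-insensitive)."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def at_count(seq):
    """Number of A and T bases (case-insensitive)."""
    seq = seq.upper()
    return seq.count("A") + seq.count("T")


def gc_percent(seq):
    """GC content as a percentage of the sequence length."""
    return gc_count(seq) / len(seq) * 100


def at_percent(seq):
    """AT content as a percentage of the sequence length."""
    return at_count(seq) / len(seq) * 100


print(gc_count("ATGCATTGA"), at_count("ATGCATTGA"))  # 3 6
```

For ATGCATTGA, gc_percent gives roughly 33.33 and at_percent roughly 66.67.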
Working with Proteins
You can also create a ProteinSeq object to work with protein sequences only. This keeps the handling of DNA and RNA sequences separate from that of protein sequences, and a ProteinSeq object lets you back-translate any protein sequence.
- Note that the back_translate() function returns a probable DNA sequence, not an exact back-translation, because multiple codons can encode the same amino acid.
>>> protein1 = nt.ProteinSeq('MIT')
>>> protein1
ProteinSeq(seq='MIT')
>>> protein1.back_translate()
Sequence(seq='ATGATAACT')
>>> protein1.aromaticity()
0.0
>>> protein1.get_amino_acid_percentage()
{'A': 0.0, 'C': 0.0, 'D': 0.0, 'E': 0.0, 'F': 0.0, 'G': 0.0, 'H': 0.0, 'I': 0.3333333333333333, 'K': 0.0, 'L': 0.0, 'M': 0.3333333333333333, 'N': 0.0, 'P': 0.0, 'Q': 0.0, 'R': 0.0, 'S': 0.0, 'T': 0.3333333333333333, 'V': 0.0, 'W': 0.0, 'Y': 0.0, '*': 0.0, '-': 0.0}
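To see why back-translation is only "probable", the sketch below picks one codon per amino acid from the standard genetic code: the first one in the conventional TCAG ordering. That choice is an assumption for illustration, so it returns ATGATTACT for MIT where NeatBio returned ATGATAACT; both are valid, since ATT and ATA both encode isoleucine. The sketch also computes amino acid fractions and aromaticity, taken here as the combined fraction of F, W and Y (the definition Biopython uses); we assume NeatBio's aromaticity() follows it. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code in TCAG order ('*' = stop codon)
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Keep only the first codon seen for each amino acid (an arbitrary choice)
BACK_TABLE = {}
for codon, aa in CODON_TABLE.items():
    BACK_TABLE.setdefault(aa, codon)


def back_translate(protein):
    """Return one possible DNA sequence encoding the protein."""
    return "".join(BACK_TABLE[aa] for aa in protein.upper())


def amino_acid_fractions(protein):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    protein = protein.upper()
    return {aa: protein.count(aa) / len(protein) for aa in "ACDEFGHIKLMNPQRSTVWY"}


def aromaticity(protein):
    """Combined fraction of the aromatic residues F, W and Y."""
    protein = protein.upper()
    return sum(protein.count(aa) for aa in "FWY") / len(protein)


print(back_translate("MIT"))  # ATGATTACT
print(aromaticity("MIT"))     # 0.0
```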
Convert 3 Letter Amino Acid to 1 and vice versa
You can convert a three-letter amino acid code such as Ala or Leu to its one-letter code, and vice versa. If you want the full name of an amino acid, you can supply its three-letter code to get_acid_name.
>>> from neatbio.sequtils import convert_3to1,convert_1to3,get_acid_name
>>> convert_3to1('Ala')
'A'
>>> convert_1to3('L')
'Leu'
>>> get_acid_name('Ala')
'Alanine'
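These conversions are table lookups. The sketch below uses a single dictionary for the 20 standard amino acids; the names three_to_one, one_to_three and acid_name are hypothetical stand-ins, not NeatBio's functions.

```python
# Three-letter code -> (one-letter code, full name) for the 20 standard amino acids
AMINO_ACID_CODES = {
    "Ala": ("A", "Alanine"),       "Arg": ("R", "Arginine"),
    "Asn": ("N", "Asparagine"),    "Asp": ("D", "Aspartic acid"),
    "Cys": ("C", "Cysteine"),      "Gln": ("Q", "Glutamine"),
    "Glu": ("E", "Glutamic acid"), "Gly": ("G", "Glycine"),
    "His": ("H", "Histidine"),     "Ile": ("I", "Isoleucine"),
    "Leu": ("L", "Leucine"),       "Lys": ("K", "Lysine"),
    "Met": ("M", "Methionine"),    "Phe": ("F", "Phenylalanine"),
    "Pro": ("P", "Proline"),       "Ser": ("S", "Serine"),
    "Thr": ("T", "Threonine"),     "Trp": ("W", "Tryptophan"),
    "Tyr": ("Y", "Tyrosine"),      "Val": ("V", "Valine"),
}
# Reverse lookup: one-letter code -> three-letter code
ONE_TO_THREE = {one: three for three, (one, _) in AMINO_ACID_CODES.items()}


def three_to_one(code):
    """'Ala' (any case) -> 'A'."""
    return AMINO_ACID_CODES[code.capitalize()][0]


def one_to_three(letter):
    """'L' (any case) -> 'Leu'."""
    return ONE_TO_THREE[letter.upper()]


def acid_name(code):
    """'Ala' (any case) -> 'Alanine'."""
    return AMINO_ACID_CODES[code.capitalize()][1]
```

Unknown codes raise a KeyError in this sketch.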
Generate DotPlot
A dot plot is a graphical method for comparing two biological sequences and identifying regions of close similarity between them. It places one sequence along the x-axis and the other along the y-axis, and marks a dot wherever the two sequences match. NeatBio lets you generate a dot plot between two sequences.
>>> import neatbio as nt
>>> import neatbio.sequtils as utils
>>> seq1 = nt.Sequence('AGTCGTACT')
>>> seq2 = nt.Sequence('AGGCGCACT')
>>> utils.dotplot(seq1,seq2)
 |AGGCGCACT
-----------
A|■     ■
G| ■■ ■
T|        ■
C|   ■ ■ ■
G| ■■ ■
T|        ■
A|■     ■
C|   ■ ■ ■
T|        ■
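A text dot plot like the one above can be produced by comparing every base of the first sequence with every base of the second. The sketch below works on plain strings and uses a hypothetical dotplot_lines helper; it is an illustration of the idea, not NeatBio's dotplot source.

```python
def dotplot_lines(seq1, seq2, mark="\u25a0", blank=" "):
    """Build a text dot plot: seq1 down the side, seq2 across the top."""
    lines = [" |" + seq2, "-" * (2 + len(seq2))]
    for base in seq1:
        # Mark every column where the base in seq2 equals this row's base
        row = "".join(mark if base == other else blank for other in seq2)
        lines.append(base + "|" + row)
    return lines


def dotplot(seq1, seq2):
    print("\n".join(dotplot_lines(seq1, seq2)))


dotplot("AGTCGTACT", "AGGCGCACT")
```

Comparing single bases produces a lot of noise on real sequences; a common refinement is to compare windows of k bases and mark a cell only when the window mismatches stay under a threshold.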
Reading FASTA Files
Although there are several biological data file formats, only FASTA is supported for now. The read_fasta function reads a FASTA file and returns a dictionary with the header and the sequence record.
>>> import neatbio as nt
>>> file1 = nt.read_fasta('sequence.fasta')
>>> file1['seqRecord']
....
>>> file1['header']
>>> seq1 = nt.Sequence(file1['seqRecord'])
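A FASTA file is a series of records, each starting with a header line that begins with ">" followed by one or more sequence lines. The sketch below parses every record into a dictionary with 'header' and 'seqRecord' keys, mirroring the keys used above. It is an assumption-based illustration: parse_fasta and read_fasta_records are hypothetical names, and whether NeatBio keeps the ">" or reads more than one record is not assumed here.

```python
def parse_fasta(lines):
    """Parse FASTA lines into a list of {'header': ..., 'seqRecord': ...} dicts.

    Blank lines are skipped; lines before the first header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append({"header": header, "seqRecord": "".join(chunks)})
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append({"header": header, "seqRecord": "".join(chunks)})
    return records


def read_fasta_records(path):
    """Read all records from a FASTA file on disk."""
    with open(path) as handle:
        return parse_fasta(handle)
```

Each seqRecord string could then be passed to nt.Sequence, as in the example above.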
API Reference
NeatBio consists of a core package and two subpackages.
Core Package
Subpackages
- neatbio.sequtils
- neatbio.alignments
Thanks For Using NeatBio
Let us know about any bugs and ways we can improve it.
Jesus Saves