In this tutorial we will analyse the genome sequence of the novel coronavirus SARS-CoV-2 (2019-nCoV), the virus that causes Covid19, using a bioinformatics package for Python called BioPython.
We start with a short introduction to BioPython and then move on to translating the Covid19 genome into protein sequences.
Installation
pip install biopython
Applications of BioPython
- Sequence analysis (DNA, RNA)
- Transcription and translation of DNA (protein synthesis)
- Querying and accessing bioinformatics databases
- Entrez, BLAST, GenBank, etc.
- 3D Structure analysis
Task
- Analysis of Covid19 genome
BioPython Crash Course
In [1]:
# Load the package
import Bio
In [3]:
# Check the Attributes
dir(Bio)
Out[3]:
['BiopythonDeprecationWarning', 'BiopythonExperimentalWarning', 'BiopythonParserWarning', 'BiopythonWarning', 'MissingExternalDependencyError', 'MissingPythonDependencyError', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_parent_dir', 'os', 'warnings']
Sequence Analysis
- DNA and RNA Sequence
- A Adenine
- C Cytosine
- G Guanine
- T Thymine
- U Uracil (RNA only, in place of Thymine)
- Protein Sequence Analysis
In [4]:
# Working with Sequence
from Bio.Seq import Seq
In [5]:
dir(Seq)
Out[5]:
['__add__', '__class__', '__contains__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__imul__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__module__', '__mul__', '__ne__', '__new__', '__radd__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_get_seq_str_and_check_alphabet', 'back_transcribe', 'complement', 'count', 'count_overlap', 'encode', 'endswith', 'find', 'index', 'join', 'lower', 'lstrip', 'reverse_complement', 'rfind', 'rindex', 'rsplit', 'rstrip', 'split', 'startswith', 'strip', 'tomutable', 'transcribe', 'translate', 'ungap', 'upper']
In [6]:
# Create a General DNA sequence
mydna = Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA')
In [7]:
mydna
Out[7]:
Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGT...AAA')
In [8]:
mydna.alphabet
Out[8]:
Alphabet()
In [9]:
# Convert Sequence to String
# Method 1
str(mydna)
Out[9]:
'ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA'
In [10]:
type(mydna)
Out[10]:
Bio.Seq.Seq
Alphabet Types
- generic_dna/rna
- generic_protein
- IUPACUnambiguousDNA, which provides just the basic letters
- IUPACAmbiguousDNA, which adds ambiguity letters for every possible situation
Usefulness of Specifying the Type of Sequence or Alphabet
- It tells us what type of information the Seq object contains.
- It constrains which letters the sequence may contain.
- It acts as a means of type checking.
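The type-checking idea can be illustrated without the Alphabet classes. The following is a minimal plain-Python sketch, not BioPython's own implementation; the helper name `is_valid_sequence` and the two letter sets are assumptions made for illustration only.

```python
# Plain-Python sketch of alphabet-based validation (not BioPython's own code).
# The helper name and the letter sets below are illustrative assumptions.
DNA_LETTERS = set("ACGT")  # unambiguous DNA letters
RNA_LETTERS = set("ACGU")  # unambiguous RNA letters


def is_valid_sequence(sequence, letters):
    """Return True if every character of `sequence` belongs to `letters`."""
    return set(sequence.upper()) <= letters


dna_text = "ATTAAAGGTTTATACC"
rna_text = "AGGCUCUCGUA"

print(is_valid_sequence(dna_text, DNA_LETTERS))  # DNA string checked against DNA letters
print(is_valid_sequence(rna_text, DNA_LETTERS))  # contains U, so it is not valid DNA
print(is_valid_sequence(rna_text, RNA_LETTERS))  # the same string is valid RNA
```

A check like this only constrains the letters; it cannot tell whether an all-ACG string was meant as DNA or RNA, which is one reason an explicit label on the sequence is useful.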
In [11]:
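# Create a Specific Sequence (DNA,RNA,Protein)
# Note: Bio.Alphabet was removed in Biopython 1.78, so the alphabet cells
# in this tutorial need an older Biopython release
from Bio.Alphabet import generic_dna, generic_rna, generic_protein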
In [12]:
# Create a DNA
dna1 = Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA',generic_dna)
In [13]:
# Check the Type of Sequence
dna1.alphabet
Out[13]:
DNAAlphabet()
In [14]:
# Create an RNA
rna1 = Seq('AGGCUCUCGUA',generic_rna)
In [15]:
rna1.alphabet
Out[15]:
RNAAlphabet()
In [16]:
# Method 2 Using IUPAC
from Bio.Alphabet import IUPAC
In [17]:
dna2 = Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA',IUPAC.unambiguous_dna)
In [18]:
dna2.alphabet
Out[18]:
IUPACUnambiguousDNA()
Sequence Manipulation
- Indexing/Slicing
- Join 2 Sequences
- Find a Codon in a sequence
- Count the number of Nucleotides
In [19]:
dna_seq = Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA',generic_dna)
In [20]:
# Slicing
dna_seq[0:3]
Out[20]:
Seq('ATT', DNAAlphabet())
In [21]:
# Adding Sequence
dna_seq2 = Seq('AGCGCTTCGAGA',generic_dna)
In [22]:
dna_seq[0:3] + dna_seq2[4:]
Out[22]:
Seq('ATTCTTCGAGA', DNAAlphabet())
In [23]:
# Count the number of G nucleotides in a sequence
dna_seq.count('G')
Out[23]:
8
In [24]:
# Count the non-overlapping occurrences of the codon GGT in a sequence
dna_seq.count('GGT')
Out[24]:
2
In [25]:
# Find the index/position of the first G nucleotide in a sequence
dna_seq.find('G')
Out[25]:
6
In [71]:
# Count the occurrences of GGT in a sequence, including overlapping matches
dna_seq.count_overlap('GGT')
Out[71]:
2
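For GGT both counts are 2, so the difference between the two methods is not visible here. The sketch below uses plain Python strings rather than Seq objects to show where the counts diverge; `count_overlapping` is a hypothetical helper written for illustration, not BioPython's implementation of `count_overlap`.

```python
def count_overlapping(text, pattern):
    """Count occurrences of `pattern` in `text`, allowing matches to overlap."""
    total = 0
    start = text.find(pattern)
    while start != -1:
        total += 1
        # Step forward by one position only, so the next match may overlap this one
        start = text.find(pattern, start + 1)
    return total


toy = "AAAA"
print(toy.count("AA"))               # non-overlapping count: 2
print(count_overlapping(toy, "AA"))  # overlapping count: 3
```

In "AAAA" the pattern AA starts at positions 0, 1 and 2, but a non-overlapping scan skips past each match and only sees positions 0 and 2.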
In [72]:
seq1 = Seq('ATGATCTCGTAA')
In [73]:
# Complement
seq1.complement()
Out[73]:
Seq('TACTAGAGCATT')
In [77]:
# Reverse complement (the complement read backwards)
seq1.reverse_complement()
Out[77]:
Seq('TTACGAGATCAT')
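To make clear what these two methods compute, here is a plain-Python sketch working on ordinary strings. It is an illustration of base pairing under the assumption of unambiguous upper-case DNA, not BioPython's code; the function names are chosen only to mirror the methods above.

```python
# Base-pairing rules for unambiguous DNA: A<->T and C<->G
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def complement(dna):
    """Swap every base for its pairing partner, keeping the order."""
    return dna.translate(COMPLEMENT_TABLE)


def reverse_complement(dna):
    """Complement the strand and read it backwards."""
    return complement(dna)[::-1]


seq_text = "ATGATCTCGTAA"
print(complement(seq_text))          # TACTAGAGCATT
print(reverse_complement(seq_text))  # TTACGAGATCAT
```

Both results agree with the outputs of `seq1.complement()` and `seq1.reverse_complement()` shown above.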
In [74]:
# Transcribe DNA to mRNA
seq1.transcribe()
Out[74]:
Seq('AUGAUCUCGUAA', RNAAlphabet())
In [76]:
seq1.transcribe().translate()
Out[76]:
Seq('MIS*', HasStopCodon(ExtendedIUPACProtein(), '*'))
In [75]:
# Translate DNA directly to protein
seq1.translate()
Out[75]:
Seq('MIS*', HasStopCodon(ExtendedIUPACProtein(), '*'))
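The following plain-Python sketch shows what transcription and translation do under the hood, using the standard genetic code. It is a simplified illustration, not BioPython's implementation: the names `CODON_TABLE`, `transcribe` and `translate` are defined here for the example, and ambiguous letters are not handled.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA: replace every T with U."""
    return dna.replace("T", "U")


def translate(sequence):
    """Translate DNA or mRNA codon by codon; '*' marks a stop codon."""
    dna = sequence.replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


seq_text = "ATGATCTCGTAA"
print(transcribe(seq_text))             # AUGAUCUCGUAA
print(translate(seq_text))              # MIS*
print(translate(transcribe(seq_text)))  # MIS*
```

Translating the DNA directly and translating its mRNA give the same protein, which is why `seq1.translate()` and `seq1.transcribe().translate()` return the same result above.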
Protein Synthesis
In [26]:
# Transcription
# DNA to mRNA
dna_seq
Out[26]:
Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGT...AAA', DNAAlphabet())
In [27]:
mrna = dna_seq.transcribe()
In [28]:
# Changes the Thymine to Uracil
mrna
Out[28]:
Seq('AUUAAAGGUUUAUACCUUCCCAGGUAACAAACCAACCAACUUUCGAUCUCUUGU...AAA', RNAAlphabet())
In [29]:
# Translation
# mRNA to Protein
# DNA to Protein
dna_seq.translate()
/usr/local/lib/python3.7/dist-packages/Bio/Seq.py:2859: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future. BiopythonWarning,
Out[29]:
Seq('IKGLYLPR*QTNQLSISCRSVL*', HasStopCodon(ExtendedIUPACProtein(), '*'))
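The warning appears because the sequence length is not a multiple of three, so the last codon is incomplete. One way to follow the warning's advice is to trim the trailing nucleotides before translating. The sketch below does this on a plain string; the same slicing should also work on a Seq object, though that is an assumption not run in this tutorial.

```python
dna_text = "ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA"

# Number of nucleotides left over after the last complete codon
leftover = len(dna_text) % 3

# Keep only whole codons; when leftover is 0 the sequence is unchanged
trimmed = dna_text[: len(dna_text) - leftover]

print(len(dna_text), leftover, len(trimmed))
```

The trimmed sequence has a length divisible by three, so every position belongs to a complete codon.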
In [30]:
dir(mrna)
Out[30]:
['__add__', '__class__', '__contains__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__imul__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__module__', '__mul__', '__ne__', '__new__', '__radd__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_data', '_get_seq_str_and_check_alphabet', 'alphabet', 'back_transcribe', 'complement', 'count', 'count_overlap', 'encode', 'endswith', 'find', 'index', 'join', 'lower', 'lstrip', 'reverse_complement', 'rfind', 'rindex', 'rsplit', 'rstrip', 'split', 'startswith', 'strip', 'tomutable', 'transcribe', 'translate', 'ungap', 'upper']
In [31]:
# Translate mRNA to Protein/Amino Acid
mrna.translate()
Out[31]:
Seq('IKGLYLPR*QTNQLSISCRSVL*', HasStopCodon(ExtendedIUPACProtein(), '*'))
In [70]:
# Translate mRNA to Protein/Amino Acid
# Change the symbol for the stop codon
mrna.translate(stop_symbol='@')
Out[70]:
Seq('IKGLYLPR@QTNQLSISCRSVL@', HasStopCodon(ExtendedIUPACProtein(), '@'))
In [32]:
# Back Transcribe mRNA to DNA
mrna.back_transcribe()
Out[32]:
Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGT...AAA', DNAAlphabet())
In [33]:
# View the CodonTable
from Bio.Data import CodonTable
In [35]:
dir(CodonTable)
Out[35]:
['Alphabet', 'AmbiguousCodonTable', 'AmbiguousForwardTable', 'CodonTable', 'IUPAC', 'IUPACData', 'NCBICodonTable', 'NCBICodonTableDNA', 'NCBICodonTableRNA', 'TranslationError', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'ambiguous_dna_by_id', 'ambiguous_dna_by_name', 'ambiguous_generic_by_id', 'ambiguous_generic_by_name', 'ambiguous_rna_by_id', 'ambiguous_rna_by_name', 'generic_by_id', 'generic_by_name', 'list_ambiguous_codons', 'list_possible_proteins', 'make_back_table', 'register_ncbi_table', 'standard_dna_table', 'standard_rna_table', 'unambiguous_dna_by_id', 'unambiguous_dna_by_name', 'unambiguous_rna_by_id', 'unambiguous_rna_by_name']
In [36]:
# CodonTable for DNA
print(CodonTable.unambiguous_dna_by_name['Standard'])
Table 1 Standard, SGC0

  |  T      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
T | TTT F   | TCT S   | TAT Y   | TGT C   | T
T | TTC F   | TCC S   | TAC Y   | TGC C   | C
T | TTA L   | TCA S   | TAA Stop| TGA Stop| A
T | TTG L(s)| TCG S   | TAG Stop| TGG W   | G
--+---------+---------+---------+---------+--
C | CTT L   | CCT P   | CAT H   | CGT R   | T
C | CTC L   | CCC P   | CAC H   | CGC R   | C
C | CTA L   | CCA P   | CAA Q   | CGA R   | A
C | CTG L(s)| CCG P   | CAG Q   | CGG R   | G
--+---------+---------+---------+---------+--
A | ATT I   | ACT T   | AAT N   | AGT S   | T
A | ATC I   | ACC T   | AAC N   | AGC S   | C
A | ATA I   | ACA T   | AAA K   | AGA R   | A
A | ATG M(s)| ACG T   | AAG K   | AGG R   | G
--+---------+---------+---------+---------+--
G | GTT V   | GCT A   | GAT D   | GGT G   | T
G | GTC V   | GCC A   | GAC D   | GGC G   | C
G | GTA V   | GCA A   | GAA E   | GGA G   | A
G | GTG V   | GCG A   | GAG E   | GGG G   | G
--+---------+---------+---------+---------+--
In [37]:
# CodonTable for RNA
print(CodonTable.unambiguous_rna_by_name['Standard'])
Table 1 Standard, SGC0

  |  U      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
U | UUU F   | UCU S   | UAU Y   | UGU C   | U
U | UUC F   | UCC S   | UAC Y   | UGC C   | C
U | UUA L   | UCA S   | UAA Stop| UGA Stop| A
U | UUG L(s)| UCG S   | UAG Stop| UGG W   | G
--+---------+---------+---------+---------+--
C | CUU L   | CCU P   | CAU H   | CGU R   | U
C | CUC L   | CCC P   | CAC H   | CGC R   | C
C | CUA L   | CCA P   | CAA Q   | CGA R   | A
C | CUG L(s)| CCG P   | CAG Q   | CGG R   | G
--+---------+---------+---------+---------+--
A | AUU I   | ACU T   | AAU N   | AGU S   | U
A | AUC I   | ACC T   | AAC N   | AGC S   | C
A | AUA I   | ACA T   | AAA K   | AGA R   | A
A | AUG M(s)| ACG T   | AAG K   | AGG R   | G
--+---------+---------+---------+---------+--
G | GUU V   | GCU A   | GAU D   | GGU G   | U
G | GUC V   | GCC A   | GAC D   | GGC G   | C
G | GUA V   | GCA A   | GAA E   | GGA G   | A
G | GUG V   | GCG A   | GAG E   | GGG G   | G
--+---------+---------+---------+---------+--
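The table also shows that the genetic code is redundant: most amino acids are encoded by more than one codon. As a rough plain-Python sketch (building the standard table by hand rather than reading it from `CodonTable`, and using the variable name `codons_by_amino_acid` purely for illustration), we can group the 64 DNA codons by the amino acid they encode.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid letter ('*' for stop) to the list of codons encoding it
codons_by_amino_acid = defaultdict(list)
for codon_letters, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    codons_by_amino_acid[amino_acid].append("".join(codon_letters))

print(codons_by_amino_acid["*"])  # the three stop codons: ['TAA', 'TAG', 'TGA']
print(codons_by_amino_acid["M"])  # methionine has a single codon: ['ATG']
print(codons_by_amino_acid["L"])  # leucine is encoded by six codons
```

The stop codons listed here are the entries marked Stop in the DNA table above.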
Analysing Covid19
The genome sequence is available in several formats, such as GenBank and FASTA. To get the file, search the NCBI database for the accession MN908947.3 and download it in FASTA format; here it is saved as Covid_sequence.fasta.
In [47]:
from Bio import SeqIO
In [51]:
# Load the file
for record in SeqIO.parse("Covid_sequence.fasta", "fasta"):
    print(record.id)
    print(record.name)
    print(record.description)
MN908947.3
MN908947.3
MN908947.3 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
In [52]:
# Load the file
for record in SeqIO.parse("Covid_sequence.fasta", "fasta"):
    print(record)
ID: MN908947.3
Name: MN908947.3
Description: MN908947.3 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
Number of features: 0
Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGT...AAA', SingleLetterAlphabet())
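To see where the id, description and sequence come from, it helps to look at the FASTA layout itself: a header line starting with ">" followed by lines of sequence. The sketch below is a minimal hand-written parser for illustration only; `parse_fasta` is a hypothetical helper, not how `SeqIO.parse` is implemented, and the sample text contains only the first 70 nucleotides shown in this tutorial rather than the full genome.

```python
import io


def parse_fasta(handle):
    """Yield (identifier, description, sequence) for each record in a FASTA handle."""
    header, chunks = None, []
    for raw_line in handle:
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                yield (header.split() or [""])[0], header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield (header.split() or [""])[0], header, "".join(chunks)


# Truncated sample: the real file holds the complete genome
sample = io.StringIO(
    ">MN908947.3 Severe acute respiratory syndrome coronavirus 2 "
    "isolate Wuhan-Hu-1, complete genome\n"
    "ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAA\n"
    "CCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA\n"
)

records = list(parse_fasta(sample))
identifier, description, sequence = records[0]
print(identifier)
print(description)
print(sequence)
```

As in the SeqIO output above, the identifier is the first word of the header and the description is the whole header line.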
In [57]:
# Read the sequence record in the file
ncov_dna_record = SeqIO.read("Covid_sequence.fasta","fasta")
In [61]:
type(ncov_dna_record)
Out[61]:
Bio.SeqRecord.SeqRecord
In [62]:
ncov_dna_record
Out[62]:
SeqRecord(seq=Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGT...AAA', SingleLetterAlphabet()), id='MN908947.3', name='MN908947.3', description='MN908947.3 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome', dbxrefs=[])
In [63]:
ncov_dna = ncov_dna_record.seq
In [64]:
# Display the Nucleotides
ncov_dna
Out[64]:
Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGT...AAA', SingleLetterAlphabet())
In [65]:
# Length of our sequence
len(ncov_dna)
Out[65]:
29903
In [66]:
# Transcribe (DNA to mRNA)
ncov_mRNA = ncov_dna.transcribe()
In [67]:
# Changes Thymine to Uracil
ncov_mRNA
Out[67]:
Seq('AUUAAAGGUUUAUACCUUCCCAGGUAACAAACCAACCAACUUUCGAUCUCUUGU...AAA', RNAAlphabet())
In [68]:
# Translate to Protein/Amino Acids (mRNA to AA)
ncov_protein = ncov_mRNA.translate()
In [69]:
ncov_protein
Out[69]:
Seq('IKGLYLPR*QTNQLSISCRSVL*TNFKICVAVTRLHA*CTHAV*LITNYCR*QD...KKK', HasStopCodon(ExtendedIUPACProtein(), '*'))
In [78]:
# Length of Protein/Amino Acids
len(ncov_protein)
Out[78]:
9967
In [80]:
# Sanity check: each codon is 3 nucleotides, so the protein length should be about len(mRNA)/3
len(ncov_mRNA)/3
Out[80]:
9967.666666666666
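The fractional result means the genome length is not an exact multiple of three. A short arithmetic sketch with `divmod`, using the length reported above, makes the whole codons and the leftover nucleotides explicit.

```python
genome_length = 29903  # len(ncov_dna) reported above

# Whole codons that fit in the genome, and nucleotides left over at the end
full_codons, leftover_nucleotides = divmod(genome_length, 3)

print(full_codons)           # 9967
print(leftover_nucleotides)  # 2
```

The 9967 whole codons match the length of the translated sequence, and the two leftover nucleotides at the end are not translated.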
In [81]:
# Split the translation at the stop codons ('*') to get the candidate amino acid chains
ncov_amino_acids = ncov_protein.split('*')
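Splitting at every stop symbol produces many very short or even empty fragments, because consecutive stop codons leave nothing between them. As a sketch on a plain string (the short translation from the crash course, not the full genome), we can drop the empty pieces and sort the rest by length; the `MIN_LENGTH` threshold is an arbitrary value chosen for illustration.

```python
translated = "IKGLYLPR*QTNQLSISCRSVL*"

# Split at stop symbols and discard empty fragments
fragments = [piece for piece in translated.split("*") if piece]

# Longest fragments first
longest_first = sorted(fragments, key=len, reverse=True)

# Keep only fragments at or above an arbitrary illustrative length
MIN_LENGTH = 10
candidates = [piece for piece in longest_first if len(piece) >= MIN_LENGTH]

print(fragments)   # ['IKGLYLPR', 'QTNQLSISCRSVL']
print(candidates)  # ['QTNQLSISCRSVL']
```

The same filtering idea can be applied to the list produced from the full genome, where most fragments are only a few amino acids long.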
In [82]:
ncov_amino_acids
Out[82]:
[Seq('IKGLYLPR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QTNQLSISCRSVL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TNFKICVAVTRLHA', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CTHAV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LITNYCR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QDTSNSSIFCRLLTVSSVLQPIISTSRFRPGVTER', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('DGEPCPWFQRENTRPTQFACFTGSRRARTWLWRLRGGGLIRGTSTS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RWHLWLSRS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KRRFAST', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TALCVHQTFGCSNCTSWSCYG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('AGSRTRRHSVRS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('W', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('DTWCPCPSCGRNTSGLPQGSSS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ER', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RSWWP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LRRRSKVI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LRRRAWH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RFSRKLEH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('T', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QWCYP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('THA', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('A', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RRGIHSLCR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QLLWP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WLPS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RPSSTCW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SFMHFVRTTGLY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('H', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('EGCILLP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('T', HasStopCodon(ExtendedIUPACProtein(), 
'*')), Seq('A', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NCLVHGTF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KEL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IADTF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('N', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IGKEI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HLQWGMSKFCISLKFHNQDYSTKG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KEKA', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WLYG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NSICLSSCVTK', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('MQPNVPFNSHEV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SLW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NFMADGRFC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SHLRILWH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('EFD', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RRCHYLWLLTPKCCC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NLLSSMSQFRSRT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('A', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SCRIP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IWLENHSS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GWSHYCLWRLCVLLCWLP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QVCLLGSTC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('R', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HRL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PYRCCWRRFRRS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QPS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NTPKRESQHQYCW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('T', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RDRHYFGIFFCFHKCFCGNCERFGL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SIQTNC', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ILW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SYKRKS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KRCLEYW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TEINTESSLCICIRGCSCCTINFLPHS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NCSKFCACFTEGRYNNTRWNFTVFTETH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CYDVHI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FGY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QSSCNGLHYRWCCSVDFAVAN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HLWHCL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KTQTRP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LA', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('REV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GRCRVS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RRLGNC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IYLNLCL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NCRWTNCHLCKGN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GECSDIL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ACK', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IFGFVC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LYHYWWS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('T', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SLEFR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NICHALKGIVQKVC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IQRRNWPTHASKSPKRNYLLRGRNTSHRSVNRGSCLENW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FTTIRTTY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SSIGWYTSLY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RAYVARNQRHRKVLCPCT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YDGNKQYLHTQRRCTNKGYFW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), 
Seq('HCDRSARLQECEYHF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('T', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KD', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ST', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('EVLCLYS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TRYRSK', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VRLCCGRCCHKNFATSI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ITYTTGH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VEYGYILLI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('V', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IGFTYVLFFLPSR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('G', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RRR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RRRV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('AINSI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VWY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('R', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LPR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TFGIWCHFCCSST', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RRARRRLVR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('STNCWSTRRQ', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GQSDNYYSNNC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GSTSIRDGTYTSCSDY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SE', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WLFKTY', HasStopCodon(ExtendedIUPACProtein(), '*')), 
Seq('QCIH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KCRHCGRS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KGKTNSGC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CSQCLP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TWRRCCRSLK', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QCHAS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('I', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LHSY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WTT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SGW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LCFKRTQSC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TLSSCCRPKC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RHSTS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ECL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SARSSTCTIIISWYFWC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PYTFFKSLCRYCSHKCLLSCL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KSL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QTCFKLFGNEE', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KAS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TKDR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('DS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RGS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('AIYN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('K', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TFS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TEKTR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ENQSLC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RSYNNSGRN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VPHRKLVTLY', HasStopCodon(ExtendedIUPACProtein(), 
'*')), Seq('H', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WQSSSRFCHSC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('H', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HHFLKERCSIYSG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CCSRGCFNCCGYTY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KGWWHY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NASESFEKSANRQLYNHLPGSGFKWLHCRGGKDSA', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KCLLHSTIYYL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('EARNSWNCFLEFARNACTCRRNTQINACLCGN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SHSFNYTA', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('I', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NTRGCG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LWC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ILLLHQ', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NNCSVTYQHT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RSK', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NSCYNATWLCNTWLKFGRSCSVYEISQSASYSFCFFT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CCYSV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WLSYFFF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RTFY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NHLTCWFL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RLVLFWTIYTTRYRIS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ER', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KCILH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SYHIPPRW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SYHL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QS', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('DTSFFERSEDY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GVYNSRQH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PPHASCGHVNDIWTTVWSNLFGWS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NKTS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('R', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NILCFT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HSTC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VLPHN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('S', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FSG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VHVSIKSH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KVEIPTS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WFNFY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('MGR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QLLSCHCIVNTPTNRVEV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('STCSTRCLLQSKGW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LLCTYLSLL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('DSR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('C', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RNNELLVSTCQFRFLQKSLERGV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NLWTTADNP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GCRSCYVHGHTFL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ERCSDTLYVW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TSYKISSTTGVTFCYDVSTTCSV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('T', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('AWYIYLC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VHW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LPVWSL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TYNF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RNFVLHRRCFTYKVLRIQRSYYGCFLQRKQLHNNHKTSYL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IGWCCLYRN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('P', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VGQLL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ERQFLFHRATN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SCTKPTISKRKLR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VCM', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YQIC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FKPVNWL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ETCFKRA', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SYIFP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LKW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CGGY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TLHTLF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ERS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IVT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TYCLAC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QCN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SHV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TKYLVYTLSLEHKTS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NIKFV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CTEVRGRAGNG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SCLRRSKTSL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RSSGKSYHTERRS', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('V', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CENYRSCRRHYT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TSK', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FKNYRRGWPHRSNGCLCRQF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SYY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ET', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('II', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SIRFENPCYSWFSCC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CPLGYYS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('AFS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QSC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YNY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HSYTVFKPCLY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LYALFLYFIATIVYFY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KYKF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('N', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SIYADYYSKEYC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ECR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ILSRGFI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LFEVT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TDKYYNLVFTIKCLPRFFNLLNRCFRCFNV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FRHAFLLYWLQRRLFELY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CHYCNLLYWFYTL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CLS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WFRFFRHLSFFRNYTNYHFIF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('MGFNCFWLSCRVVFGIYSFH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VFLCTWIGCNHAIVFQLFCSTFY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FLAYVVNN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SCTNGPDFSYG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NVHLLCIILLCMEKLCACCRRL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FINLYDVLQT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SNKSRMYNYC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KVLLCLC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RLLQTTQLELC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YILCW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YIY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SCERLVTTV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KTNKSY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PVFLHR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CYSEEWFHPSLL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SWSKDL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KTFSLSFC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LRQPES', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('H', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RFIAY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CYSF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('W', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IKM', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RIICKISVCLLQSAYVSTYTVTRSGISV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CGSCS', HasStopCodon(ExtendedIUPACProtein(), 
'*')), Seq('NV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CLR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YVFINF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RTNGKTQNTSCNCRS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TCKECVLRQCLIYFYFSSSARVC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FRCRN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RCC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('MS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IVTSI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HRSYWR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LYAHL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KHDTP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PWCLY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CASY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CAGSKKSQHCFDMER', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RFHVIV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TTTKTNT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KE', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LTF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VDMCNY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TSC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CCNNKDST', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LVEAVN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SYTCVPFCCCYFLFNNTCSCHV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LFK', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NHRIQGY', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WWCHS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HSIYRYLFC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QTC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HMV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PAWW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QSLPIDCCSHNKRSGFCRAWFAWHDITHN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('W', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LFAFLT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CSW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HLLHTIKTYRVH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LCNISLCFGC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('MYNF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RCFW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ASTILL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YQCTRRFCCL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KFTP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HTLCAHGWLYYSIS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HLP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RFC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SGNNF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ARHL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KIRSWCLCIY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('W', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('MGT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('Q', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LLQIFTRSFLWCRCCKFTY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YVYTTNSTYWCFGHISIYSSWWYCSYRSNMPCLLFYEV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KSFW', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IQSCSCL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YFTIPYVIHCTLFNTSLLILTWCLFCYLLVLDILSY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CFFFSTYSVDGYVHTFSTFLDNNCLYHLYFHKAFLLVL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LPKETCSL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WCFL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RSCAVHLFVK', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RNVSKVA', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CAITSYAI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ILSSL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VQVF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WSNGYN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LQRSCLLSSRKGSQ', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LQ', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LRF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CSLPTTTNLYHLSCFAEWF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KNGIPIW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('S', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GLYGTSNLWYNYT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RSLA', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RSLLSKTCDLHL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RHA', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('P', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RFTHS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('V', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('S', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FLGTGW', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CSTQGYWTFYAKLCT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('A', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('G', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YSQS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('DT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('V', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VCSHSTRTDFFSVSLLQWFTIWCLPMCYEAQFHY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GFIP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WFMW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CWF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HRL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LCLFLLHAPYGITNWSSCWHRLRR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LLWTFC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QANSTSSWYGHNYYS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CFSLVVRCCYKWRQVVSQSIYHNS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PCGYEVQL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TSNTRPC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HTRTSFCSNWNCRFRYVCFIKRITAKWYEWTYHIG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CFIRR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IYTF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TMLRCYFPKCSEKNNQGYTPLVVTHNFDFTFSFSPEYSMVFVLFFV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KCLFTFCYGYYCYVCFCNDVCQT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ACISLFVFVTFSCHCSLF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YGLYAC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LGDAYYDMVGYG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('Y', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FVWF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('AKRLCYVCISCSVTNPYDSKNCV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ESVDTYECLDTRL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SLLW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CFRSSHFHVGSYNLCYF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LLRCSYNCHVFGQRYCFYVC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VLPYFLHNW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YTSVYNASLLFLRLFLYLLLWPLLFTQPLL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TDSWCL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LLSFYTGV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IYEFTGTTPTQE', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HRCLQTQH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IVGCWWQTLYQSSHCTV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NVRCKVHISSLTLSFATTQSRIII', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IVGSMCPVTQ', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HSLS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RYY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KNGFTTFCFAFHAGCCRHKQAL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RNAGQQGNLTSYSLRV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FPSIICSFCYCSRSL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('AGCC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('W', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SCS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KVEEVFECG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('I', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('I', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('P', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CSHAT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VGKDG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SSYDPNV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('I', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GQEGKSY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CYADNAFHYA', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KVG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CTQQHYQQCKRWLCSLEHNTSYNSSQTNGCHTRL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KYV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WYNIYLCISIVGNPTGCRCR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NCST', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('N', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YGQFT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FSMASYCNSFKGQFCCQITE', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('A', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SCCTTTDVLCCRYYTNCLH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QCVSLLQHNKGR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VCTCTVIRFTGFEMG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('E', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WNWYYLYRTGTTL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VCYRHT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SEVFILY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RIKQPK', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RYGTW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FSCHSTSTSW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CNRSACQFNCIIFLCFCCRCC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SLQRLSS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WGTTNH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LC', HasStopCodon(ExtendedIUPACProtein(), 
'*')), Seq('DVVYTHWYWSGNNSYTGSQYGSRILWWCIVLSVLPLPHRSSKS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RIL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LKR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VCTNTYNLC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PCGFYT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KHSLYRLRYVERLWL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('STPRTHASVS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CTIVFKRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQ...VNN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TNNVCFSCFIATSL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SVC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SYNQNSITPCIH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FFHTWCLLP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QSFQILSFTFNSGLVLTFLFQCYLVPCYTCLWDQWY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('EV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PCPTI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WCLFCFH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('EV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HNKRLDFWYYFRFEDPVPTYC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CCY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ISIL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SIFGCLLPQKQQKLDGK', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VQSLF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CE', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LHF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ICLSAFSYGP', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RKTG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FQKS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GICV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('EY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WLF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NIF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('AHAY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FSA', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SPSGFFGFRTIGRFANRY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VSNFTCFT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KLFDSW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FFFRLDSWCCSLLCGLSST', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('DFSIKI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KWNHYRCCRLCT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PSLRNKVYVEILHCRKRNLSNF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SPTNRIYC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YYKLVPFW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RHQICICLCLEQEENQQLCC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LFCPI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FRIIFHF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VLWSVSY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IK', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SLLY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CLCRFICN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('R', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SQTNRSRANWKDC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ITR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FYRLRYSLEF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GWW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LPV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('EV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SQTF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ERYFN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NLSGR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HTL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LLLSFTIIWFPTH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WCWLPTIQSSSTFF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TSTCTSNCLWT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KVY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KQMCQFQLQWFNRHRCSY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('V', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QKVSAFPTIWQRHC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CCP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('STDT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('DS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HYTMFFWWCQCYNTRNKYF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PGCCSLSGC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LHRSPCCYSCRSTYSYLACLFYRF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CFSNTCRLFNRG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TCQQLI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('V', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HTHWCRYMR', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LSDSD', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FSSAGT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SIHHCLHYVTWCRKFSCLL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LYCHTHKFYY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CYHRNSTSVYDQDISRLYNVHLW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('MQQSFVAIWQFLYTIKPCFNWNSC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TRQKHPRSFCTSQTNLQNTTN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RFWWF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FFTNITRSIKTKQEVIY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RSTFQQSDTCRCWLHQTIW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LPW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YCC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RPHLCTKV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RPYCFATFAHR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NDCSIHFCTVSGYNHFWLDLWCRCCITNTICYANGL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('V', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WYWSYTECSL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('EPKIDCQPI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CYWQNSRLTFFHSKCTWKTSRCGQPKCTSFKHAC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LQFWCNFKCFK', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YPFTS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('G', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SAN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VDHRQTSKFADICDSTIN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SCRNQSFC', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SCCY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NVRVCTWTIKKS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FLWKGLSSYVLPSVSTSWCSLLACDLCPCTRKELHNCSCHLS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WKSTLSS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RCLCFKWHTLVCNTKEFL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TTNHYYRQHICVW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CCNRNCQQHSL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SFAT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IRLIQGGVR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ESYITRC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HLWH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CFSCKHSKRN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PPQ', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GCQEFK', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ISHRSPRTWKV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('AVYKMAMVHLARFYSWLDCHSNGDNYALLYDQLL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LSQGLLFLWILLQI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RRL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ASAQRSQITLHINELMDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQA...VPL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('AQADEYELMYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNI...LLV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ILY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FFCLEL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PWQIPTVLLPLKSLKSSLNNGT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('VSYSLHGFVFYNLPMPTGIGFCI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FSSGCYGQ', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LVLCLLLFTE', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IGSPVELLSQWLVL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('A', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CGSATSLLLSDCLRVRVPCGHSIQKLTFFSTCHSMALF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PDRF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KVNS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SEL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SFVDIFVLLDTI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('DAVTSRTCLKKSLLLHHERFLITNWELRSV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QVTQVLLHTVATGLATIN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TQTIPVAVTILLCLYSK', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QQMFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLTENKYSQL...EID', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('TNMKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNSPFHPLADN...KTE', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LNFH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LTSICAF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PFCYSLF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LCLLSFGSHLNCKIIMKLVTPKRT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NFLFS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ESSQL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LHFTKNVVYSHVLNINHM', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LMTRVLFTSILNGILE', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ELENQHL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LNCAWMRLVLNHPFSTSISVIIQFPVYLLQLIARNLNWVVL', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CVVRSMKTF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SIMTFVLF', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('ISSKRTN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NV', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('WTPKSAKCTPHYVWWTLRFNWQ', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PEWRTQWGAIKTTSAPRFTQ', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('YCVLVHRSHSTWQGRP', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('IPSRTRRSN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('HQ', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QSR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('PNWLLPKSYQTNSWW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('R', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NERSQSKMVFLLPRNWARSWTSLWC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QRRHHMGCN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GSLEYTKRSHWHPQSC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QCCNRATTSSRNNIAKRLLRRREQRRQSSLFSFLIT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SQQFKKFNSRQQ', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GNFSC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NGWQWR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('CCSCFAAA', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QIEPA', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('EQNVW', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RPTTTRPNCH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('EICC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('GF', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('EASAKTYCH', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SIQCNTSFRQTWSRTNPRKFWGPGTNQTRN', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LQTLAANCTICPQRFSVLRNVAHWHGSHTFGNVVDLHRCHQIG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('QRSKFQRSSHFAE', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('AY', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('RIQNIPTNRA', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KGQKEEG', 
HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('NSSLTAETEETANCDSSSCCRFG', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('FLQTIATIHEQC', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LNSGLNSCRPHKADGLYKRFRFSVYDI', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('STLVQNEFS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LHSTSRCS', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('SHIAIFNQCVTLGRT', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KSHHIFTEATRSTIECTVNNARESCLYGRALMCKINFSSAIPM', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('LLRRMTKKKKKKKKKK', HasStopCodon(ExtendedIUPACProtein(), '*'))]
In [86]:
# Print each protein fragment on its own line
for i in ncov_amino_acids:
    print(i)
IKGLYLPR QTNQLSISCRSVL TNFKICVAVTRLHA CTHAV LITNYCR QDTSNSSIFCRLLTVSSVLQPIISTSRFRPGVTER DGEPCPWFQRENTRPTQFACFTGSRRARTWLWRLRGGGLIRGTSTS RWHLWLSRS KRRFAST TALCVHQTFGCSNCTSWSCYG AGSRTRRHSVRS W DTWCPCPSCGRNTSGLPQGSSS ER RSWWP LRRRSKVI LRRRAWH SL RFSRKLEH T QWCYP THA A RRGIHSLCR QLLWP WLPS VH RPSSTCW SFMHFVRTTGLY H EGCILLP T A NCLVHGTF KEL IADTF N IGKEI HLQWGMSKFCISLKFHNQDYSTKG KEKA WLYG NSICLSSCVTK MQPNVPFNSHEV SLW NFMADGRFC SHLRILWH EFD RRCHYLWLLTPKCCC NLLSSMSQFRSRT A SCRIP IWLENHSS GWSHYCLWRLCVLLCWLP QVCLLGSTC R HRL PYRCCWRRFRRS QPS NTPKRESQHQYCW L T RDRHYFGIFFCFHKCFCGNCERFGL SIQTNC ILW F SYKRKS KRCLEYW TEINTESSLCICIRGCSCCTINFLPHS NCSKFCACFTEGRYNNTRWNFTVFTETH CYDVHI FGY QSSCNGLHYRWCCSVDFAVAN HLWHCL KTQTRP LA REV GRCRVS RRLGNC IYLNLCL NCRWTNCHLCKGN GECSDIL ACK IFGFVC LYHYWWS T SLEFR NICHALKGIVQKVC IQRRNWPTHASKSPKRNYLLRGRNTSHRSVNRGSCLENW FTTIRTTY SC SSIGWYTSLY RAYVARNQRHRKVLCPCT YDGNKQYLHTQRRCTNKGYFW HCDRSARLQECEYHF T KD ST EVLCLYS TRYRSK VRLCCGRCCHKNFATSI ITYTTGH FR VEYGYILLI VW V IGFTYVLFFLPSR G RRR L RRRV AINSI VWY R LPR TFGIWCHFCCSST RRARRRLVR STNCWSTRRQ GQSDNYYSNNC GSTSIRDGTYTSCSDY SE F WLFKTY QCIH KCRHCGRS KGKTNSGC CSQCLP TWRRCCRSLK GY QCHAS I LHSY WTT SGW LCFKRTQSC TLSSCCRPKC QR RHSTS ECL KF SARSSTCTIIISWYFWC PYTFFKSLCRYCSHKCLLSCL KSL QTCFKLFGNEE KAS TKDR DS RGS AIYN K TFS TEKTR ENQSLC RSYNNSGRN VPHRKLVTLY H WQSSSRFCHSC H HHFLKERCSIYSG CCSRGCFNCCGYTY KGWWHY NASESFEKSANRQLYNHLPGSGFKWLHCRGGKDSA KV KCLLHSTIYYL EARNSWNCFLEFARNACTCRRNTQINACLCGN SHSFNYTA I GY NTRGCG LWC ILLLHQ NNCSVTYQHT RSK NSCYNATWLCNTWLKFGRSCSVYEISQSASYSFCFFT CCYSV WLSYFFF NT RTFY NHLTCWFL RLVLFWTIYTTRYRIS ER KCILH SYHIPPRW SYHL QS DTSFFERSEDY GVYNSRQH PPHASCGHVNDIWTTVWSNLFGWS CY NKTS FT R NILCFT HSTC GF VLPHN S FSG VHVSIKSH KVEIPTS WFNFY MGR QLLSCHCIVNTPTNRVEV STCSTRCLLQSKGW SC LLCTYLSLL DSR VR C RNNELLVSTCQFRFLQKSLERGV NLWTTADNP GCRSCYVHGHTFL TI ERCSDTLYVW TSYKISSTTGVTFCYDVSTTCSV T AWYIYLC VHW LPVWSL TYNF RNFVLHRRCFTYKVLRIQRSYYGCFLQRKQLHNNHKTSYL IGWCCLYRN P VGQLL ERQFLFHRATN SCTKPTISKRKLR F VCM 
YQIC FKPVNWL ETCFKRA SYIFP LKW CGGY L TLHTLF ERS IVT TYCLAC QCN SHV TKYLVYTLSLEHKTS NIKFV CTEVRGRAGNG SCLRRSKTSL RSSGKSYHTERRS V CENYRSCRRHYT TSK FKNYRRGWPHRSNGCLCRQF SYY ET II SIRFENPCYSWFSCC CPLGYYS LC AFS QSC YNY HSYTVFKPCLY LYALFLYFIATIVYFY KYKF N SIYADYYSKEYC ECR ILSRGFI LFEVT FF TDKYYNLVFTIKCLPRFFNLLNRCFRCFNV FRHAFLLYWLQRRLFELY CHYCNLLYWFYTL CLS WFRFFRHLSFFRNYTNYHFIF MGFNCFWLSCRVVFGIYSFH VFLCTWIGCNHAIVFQLFCSTFY FLAYVVNN SCTNGPDFSYG NVHLLCIILLCMEKLCACCRRL FINLYDVLQT SNKSRMYNYC WC KVLLCLC WR RLLQTTQLELC L YILCW YIY SCERLVTTV KTNKSY PVFLHR CYSEEWFHPSLL SWSKDL KTFSLSFC LRQPES H RFIAY CYSF W IKM RIICKISVCLLQSAYVSTYTVTRSGISV CW CGSCS NV CLR YVFINF RTNGKTQNTSCNCRS TCKECVLRQCLIYFYFSSSARVC FRCRN RCC MS IVTSI HRSYWR L LYAHL QS KHDTP PWCLY L CASY CAGSKKSQHCFDMER RFHVIV TTTKTNT CC KE LTF VDMCNY TSC CCNNKDST GW NC LVEAVN SYTCVPFCCCYFLFNNTCSCHV TY LFK NHRIQGY WWCHS HSIYRYLFC QTC F HMV PAWW LY QSLPIDCCSHNKRSGFCRAWFAWHDITHN W LFAFLT SF CSW HLLHTIKTYRVH LCNISLCFGC MYNF RCFW ASTILL YQCTRRFCCL KFTP HTLCAHGWLYYSIS HLP RFC SGNNF F VL ARHL KIRSWCLCIY W MGT Q LLQIFTRSFLWCRCCKFTY YVYTTNSTYWCFGHISIYSSWWYCSYRSNMPCLLFYEV KSFW IQSCSCL YFTIPYVIHCTLFNTSLLILTWCLFCYLLVLDILSY CFFFSTYSVDGYVHTFSTFLDNNCLYHLYFHKAFLLVL LPKETCSL WCFL YF RSCAVHLFVK RNVSKVA CAITSYAI ILSSL VQVF WSNGYN LQRSCLLSSRKGSQ LQ LRF CSLPTTTNLYHLSCFAEWF KNGIPIW S GLYGTSNLWYNYT RSLA RSLLSKTCDLHL RHA P L RFTHS V S FLGTGW CSTQGYWTFYAKLCT A G YSQS DT V VCSHSTRTDFFSVSLLQWFTIWCLPMCYEAQFHY GFIP WFMW CWF HRL LCLFLLHAPYGITNWSSCWHRLRR LLWTFC QANSTSSWYGHNYYS CFSLVVRCCYKWRQVVSQSIYHNS L PCGYEVQL TSNTRPC HTRTSFCSNWNCRFRYVCFIKRITAKWYEWTYHIG CFIRR IYTF CC TMLRCYFPKCSEKNNQGYTPLVVTHNFDFTFSFSPEYSMVFVLFFV KCLFTFCYGYYCYVCFCNDVCQT ACISLFVFVTFSCHCSLF YGLYAC LGDAYYDMVGYG Y FVWF AKRLCYVCISCSVTNPYDSKNCV WC ESVDTYECLDTRL SLLW CFRSSHFHVGSYNLCYF LLRCSYNCHVFGQRYCFYVC VLPYFLHNW YTSVYNASLLFLRLFLYLLLWPLLFTQPLL TDSWCL LLSFYTGV IYEFTGTTPTQE HRCLQTQH IVGCWWQTLYQSSHCTV NVRCKVHISSLTLSFATTQSRIII IVGSMCPVTQ HSLS RYY SL KNGFTTFCFAFHAGCCRHKQAL RNAGQQGNLTSYSLRV 
FPSIICSFCYCSRSL AGCC W F SCS KVEEVFECG I I P CSHAT VGKDG SSYDPNV TG I GQEGKSY CYADNAFHYA KVG CTQQHYQQCKRWLCSLEHNTSYNSSQTNGCHTRL HI KYV WYNIYLCISIVGNPTGCRCR NCST N YGQFT FSMASYCNSFKGQFCCQITE A SCCTTTDVLCCRYYTNCLH QCVSLLQHNKGR VCTCTVIRFTGFEMG IP E WNWYYLYRTGTTL VCYRHT RS SEVFILY RIKQPK RYGTW FSCHSTSTSW CNRSACQFNCIIFLCFCCRCC SLQRLSS WGTTNH LC DVVYTHWYWSGNNSYTGSQYGSRILWWCIVLSVLPLPHRSSKS RIL LKR VCTNTYNLC PCGFYT KHSLYRLRYVERLWL L STPRTHASVS CTIVFKRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQAVGACVLCNSQTSLRCGACIRRPFLCCKCCYDHVISTSHKLVLSVNPYVCNAPGCDVTDVTQLYLGGMSYYCKSHKPPISFPLCANGQVFGLYKNTCVGSDNVTDFNAIATCDWTNAGDYILANTCTERLKLFAAETLKATEETFKLSYGIATVREVLSDRELHLSWEVGKPRPPLNRNYVFTGYRVTKNSKVQIGEYTFEKGDYGDAVVYRGTTTYKLNVGDYFVLTSHTVMPLSAPTLVPQEHYVRITGLYPTLNISDEFSSNVANYQKVGMQKYSTLQGPPGTGKSHFAIGLALYYPSARIVYTACSHAAVDALCEKALKYLPIDKCSRIIPARARVECFDKFKVNSTLEQYVFCTVNALPETTADIVVFDEISMATNYDLSVVNARLRAKHYVYIGDPAQLPAPRTLLTKGTLEPEYFNSVCRLMKTIGPDMFLGTCRRCPAEIVDTVSALVYDNKLKAHKDKSAQCFKMFYKGVITHDVSSAINRPQIGVVREFLTRNPAWRKAVFISPYNSQNAVASKILGLPTQTVDSSQGSEYDYVIFTQTTETAHSCNVNRFNVAITRAKVGILCIMSDRDLYDKLQFTSLEIPRRNVATLQAENVTGLFKDCSKVITGLHPTQAPTHLSVD
TKFKTEGLCVDIPGIPKDMTYRRLISMMGFKMNYQVNGYPNMFITREEAIRHVRAWIGFDVEGCHATREAVGTNLPLQLGFSTGVNLVAVPTGYVDTPNNTDFSRVSAKPPPGDQFKHLIPLMYKGLPWNVVRIKIVQMLSDTLKNLSDRVVFVLWAHGFELTSMKYFVKIGPERTCCLCDRRATCFSTASDTYACWHHSIGFDYVYNPFMIDVQQWGFTGNLQSNHDLYCQVHGNAHVASCDAIMTRCLAVHECFVKRVDWTIEYPIIGDELKINAACRKVQHMVVKAALLADKFPVLHDIGNPKAIKCVPQADVEWKFYDAQPCSDKAYKIEELFYSYATHSDKFTDGVCLFWNCNVDRYPANSIVCRFDTRVLSNLNLPGCDGGSLYVNKHAFHTPAFDKSAFVNLKQLPFFYYSDSPCESHGKQVVSDIDYVPLKSATCITRCNLGGAVCRHHANEYRLYLDAYNMMISAGFSLWVYKQFDTYNLWNTFTRLQSLENVAFNVVNKGHFDGQQGEVPVSIINNTVYTKVDGVDVELFENKTTLPVNVAFELWAKRNIKPVPEVKILNNLGVDIAANTVIWDYKRDAPAHISTIGVCSMTDIAKKPTETICAPLTVFFDGRVDGQVDLFRNARNGVLITEGSVKGLQPSVGPKQASLNGVTLIGEAVKTQFNYYKKVDGVVQQLPETYFTQSRNLQEFKPRSQMEIDFLELAMDEFIERYKLEGYAFEHIVYGDFSHSQLGGLHLLIGLAKRFKESPFELEDFIPMDSTVKNYFITDAQTGSSKCVCSVIDLLLDDFVEIIKSQDLSVVSKVVKVTIDYTEISFMLWCKDGHVETFYPKLQSSQAWQPGVAMPNLYKMQRMLLEKCDLQNYGDSATLPKGIMMNVAKYTQLCQYLNTLTLAVPYNMRVIHFGAGSDKGVAPGTAVLRQWLPTGTLLVDSDLNDFVSDADSTLIGDCATVHTANKWDLIISDMYDPKTKNVTKENDSKEGFFTYICGFIQQKLALGGSVAIKITEHSWNADLYKLMGHFAWWTAFVTNVNASSSEAFLIGCNYLGKPREQIDGYVMHANYIFWRNTNPIQLSSYSLFDMSKFPLKLRGTAVMSLKEGQINDMILSLLSKGRLIIRENNRVVISSDVLVNN TNNVCFSCFIATSL SVC SYNQNSITPCIH FFHTWCLLP QSFQILSFTFNSGLVLTFLFQCYLVPCYTCLWDQWY EV PCPTI WCLFCFH EV HNKRLDFWYYFRFEDPVPTYC RY CCY SL ISIL SIFGCLLPQKQQKLDGK VQSLF CE LHF ICLSAFSYGP RKTG FQKS GICV EY WLF NIF AHAY FSA SPSGFFGFRTIGRFANRY HH VSNFTCFT KLFDSW FFFRLDSWCCSLLCGLSST DFSIKI KWNHYRCCRLCT PSLRNKVYVEILHCRKRNLSNF L SPTNRIYC IS YYKLVPFW SF RHQICICLCLEQEENQQLCC LFCPI FRIIFHF VLWSVSY IK SLLY CLCRFICN R SQTNRSRANWKDC L L ITR FYRLRYSLEF QS F GWW L LPV IV EV SQTF ERYFN NLSGR HTL WC RF LLLSFTIIWFPTH WCWLPTIQSSSTFF TSTCTSNCLWT KVY FG KQMCQFQLQWFNRHRCSY V QKVSAFPTIWQRHC HY CCP STDT DS HYTMFFWWCQCYNTRNKYF PGCCSLSGC LHRSPCCYSCRSTYSYLACLFYRF CFSNTCRLFNRG TCQQLI V HTHWCRYMR LSDSD FSSAGT CS SIHHCLHYVTWCRKFSCLL LYCHTHKFYY CYHRNSTSVYDQDISRLYNVHLW FN MQQSFVAIWQFLYTIKPCFNWNSC TRQKHPRSFCTSQTNLQNTTN RFWWF FFTNITRSIKTKQEVIY RSTFQQSDTCRCWLHQTIW LPW YCC RPHLCTKV 
RPYCFATFAHR NDCSIHFCTVSGYNHFWLDLWCRCCITNTICYANGL V WYWSYTECSL EPKIDCQPI CYWQNSRLTFFHSKCTWKTSRCGQPKCTSFKHAC TT LQFWCNFKCFK YPFTS QS G SAN VDHRQTSKFADICDSTIN SCRNQSFC SCCY NVRVCTWTIKKS FLWKGLSSYVLPSVSTSWCSLLACDLCPCTRKELHNCSCHLS WKSTLSS RCLCFKWHTLVCNTKEFL TTNHYYRQHICVW L CCNRNCQQHSL SFAT IRLIQGGVR IF ESYITRC FR HLWH CFSCKHSKRN PPQ GCQEFK ISHRSPRTWKV AVYKMAMVHLARFYSWLDCHSNGDNYALLYDQLL LSQGLLFLWILLQI RRL ASAQRSQITLHINELMDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFGWLIVGVALLAVFQSASKIITLKKRWQLALSKGVHFVCNLLLLFVTVYSHLLLVAAGLEAPFLYLYALVYFLQSINFVRIIMRLWLCWKCRSKNPLLYDANYFLCWHTNCYDYCIPYNSVTSSIVITSGDGTTSPISEHDYQIGGYTEKWESGVKDCVVLHSYFTSDYYQLYSTQLSTDTGVEHVTFFIYNKIVDEPEEHVQIHTIDGSSGVVNPVMEPIYDEPTTTTSVPL AQADEYELMYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV TN ILY FFCLEL F PWQIPTVLLPLKSLKSSLNNGT VSYSLHGFVFYNLPMPTGIGFCI LS FSSGCYGQ L LVLCLLLFTE IGSPVELLSQWLVL A CGSATSLLLSDCLRVRVPCGHSIQKLTFFSTCHSMALF PDRF KVNS SEL SFVDIFVLLDTI DAVTSRTCLKKSLLLHHERFLITNWELRSV QVTQVLLHTVATGLATIN TQTIPVAVTILLCLYSK QQMFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLTENKYSQLDEEQPMEID TNMKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNSPFHPLADNKFALTCFSTQFAFACPDGVKHVYQLRARSVSPKLFIRQEEVQELYSPIFLIVAAIVFITLCFTLKRKTE LNFH LTSICAF PFCYSLF LCLLSFGSHLNCKIIMKLVTPKRT NFLFS ESSQL LHFTKNVVYSHVLNINHM LMTRVLFTSILNGILE ELENQHL LNCAWMRLVLNHPFSTSISVIIQFPVYLLQLIARNLNWVVL CVVRSMKTF SIMTFVLF ISSKRTN NV WTPKSAKCTPHYVWWTLRFNWQ PEWRTQWGAIKTTSAPRFTQ YCVLVHRSHSTWQGRP IPSRTRRSN HQ QSR PNWLLPKSYQTNSWW R NERSQSKMVFLLPRNWARSWTSLWC QRRHHMGCN GSLEYTKRSHWHPQSC QCCNRATTSSRNNIAKRLLRRREQRRQSSLFSFLIT SQQFKKFNSRQQ GNFSC NGWQWR CCSCFAAA QIEPA EQNVW RPTTTRPNCH EICC GF EASAKTYCH SIQCNTSFRQTWSRTNPRKFWGPGTNQTRN LQTLAANCTICPQRFSVLRNVAHWHGSHTFGNVVDLHRCHQIG QRSKFQRSSHFAE AY RIQNIPTNRA KGQKEEG NSSLTAETEETANCDSSSCCRFG FLQTIATIHEQC LNSGLNSCRPHKADGLYKRFRFSVYDI STLVQNEFS LHSTSRCS L SHIAIFNQCVTLGRT KSHHIFTEATRSTIECTVNNARESCLYGRALMCKINFSSAIPM F LLRRMTKKKKKKKKKK
In [89]:
ncov_aa = [str(i) for i in ncov_amino_acids]
In [90]:
ncov_aa
Out[90]:
['IKGLYLPR', 'QTNQLSISCRSVL', 'TNFKICVAVTRLHA', 'CTHAV', 'LITNYCR', 'QDTSNSSIFCRLLTVSSVLQPIISTSRFRPGVTER', 'DGEPCPWFQRENTRPTQFACFTGSRRARTWLWRLRGGGLIRGTSTS', 'RWHLWLSRS', 'KRRFAST', 'TALCVHQTFGCSNCTSWSCYG', 'AGSRTRRHSVRS', 'W', 'DTWCPCPSCGRNTSGLPQGSSS', 'ER', '', 'RSWWP', 'LRRRSKVI', 'LRRRAWH', 'SL', 'RFSRKLEH', 'T', 'QWCYP', 'THA', 'A', 'RRGIHSLCR', 'QLLWP', 'WLPS', 'VH', 'RPSSTCW', 'SFMHFVRTTGLY', 'H', 'EGCILLP', 'T', 'A', 'NCLVHGTF', 'KEL', 'IADTF', 'N', 'IGKEI', 'HLQWGMSKFCISLKFHNQDYSTKG', 'KEKA', 'WLYG', 'NSICLSSCVTK', 'MQPNVPFNSHEV', 'SLW', 'NFMADGRFC', 'SHLRILWH', 'EFD', 'RRCHYLWLLTPKCCC', 'NLLSSMSQFRSRT', 'A', 'SCRIP', '', 'IWLENHSS', 'GWSHYCLWRLCVLLCWLP', 'QVCLLGSTC', 'R', 'HRL', 'PYRCCWRRFRRS', '', 'QPS', 'NTPKRESQHQYCW', 'L', 'T', '', 'RDRHYFGIFFCFHKCFCGNCERFGL', 'SIQTNC', 'ILW', 'F', 'SYKRKS', 'KRCLEYW', 'TEINTESSLCICIRGCSCCTINFLPHS', 'NCSKFCACFTEGRYNNTRWNFTVFTETH', 'CYDVHI', 'FGY', 'QSSCNGLHYRWCCSVDFAVAN', 'HLWHCL', 'KTQTRP', 'LA', 'REV', 'GRCRVS', 'RRLGNC', 'IYLNLCL', 'NCRWTNCHLCKGN', 'GECSDIL', 'ACK', 'IFGFVC', 'LYHYWWS', 'T', 'SLEFR', 'NICHALKGIVQKVC', 'IQRRNWPTHASKSPKRNYLLRGRNTSHRSVNRGSCLENW', 'FTTIRTTY', '', 'SC', 'SSIGWYTSLY', 'RAYVARNQRHRKVLCPCT', 'YDGNKQYLHTQRRCTNKGYFW', '', 'HCDRSARLQECEYHF', 'T', '', 'KD', '', 'ST', '', 'EVLCLYS', 'TRYRSK', 'VRLCCGRCCHKNFATSI', 'ITYTTGH', 'FR', 'VEYGYILLI', '', 'VW', 'V', 'IGFTYVLFFLPSR', 'G', 'RRR', 'L', 'RRRV', 'AINSI', 'VWY', 'R', 'LPR', 'TFGIWCHFCCSST', 'RRARRRLVR', '', '', 'STNCWSTRRQ', 'GQSDNYYSNNC', 'GSTSIRDGTYTSCSDY', 'SE', 'F', 'WLFKTY', 'QCIH', 'KCRHCGRS', 'KGKTNSGC', 'CSQCLP', 'TWRRCCRSLK', 'GY', 'QCHAS', 'I', '', 'LHSY', 'WTT', 'SGW', 'LCFKRTQSC', 'TLSSCCRPKC', 'QR', 'RHSTS', 'ECL', 'KF', 'SARSSTCTIIISWYFWC', 'PYTFFKSLCRYCSHKCLLSCL', '', 'KSL', 'QTCFKLFGNEE', 'KAS', 'TKDR', 'DS', 'RGS', 'AIYN', 'K', 'TFS', 'TEKTR', '', 'ENQSLC', 'RSYNNSGRN', 'VPHRKLVTLY', 'H', 'WQSSSRFCHSC', '', 'H', 'HHFLKERCSIYSG', 'CCSRGCFNCCGYTY', 'KGWWHY', 'NASESFEKSANRQLYNHLPGSGFKWLHCRGGKDSA', 'KV', 'KCLLHSTIYYL', '', 
'EARNSWNCFLEFARNACTCRRNTQINACLCGN', 'SHSFNYTA', 'I', 'GY', 'NTRGCG', 'LWC', 'ILLLHQ', 'NNCSVTYQHT', 'RSK', 'NSCYNATWLCNTWLKFGRSCSVYEISQSASYSFCFFT', 'CCYSV', 'WLSYFFF', 'NT', 'RTFY', 'NHLTCWFL', 'RLVLFWTIYTTRYRIS', 'ER', '', 'KCILH', '', 'SYHIPPRW', 'SYHL', 'QS', 'DTSFFERSEDY', 'GVYNSRQH', 'PPHASCGHVNDIWTTVWSNLFGWS', 'CY', 'NKTS', 'FT', 'R', 'NILCFT', '', '', 'HSTC', 'GF', 'VLPHN', 'S', 'FSG', 'VHVSIKSH', 'KVEIPTS', 'WFNFY', 'MGR', 'QLLSCHCIVNTPTNRVEV', 'STCSTRCLLQSKGW', 'SC', 'LLCTYLSLL', '', 'DSR', 'VR', 'C', 'RNNELLVSTCQFRFLQKSLERGV', 'NLWTTADNP', 'GCRSCYVHGHTFL', 'TI', 'ERCSDTLYVW', 'TSYKISSTTGVTFCYDVSTTCSV', 'T', 'AWYIYLC', '', 'VHW', 'LPVWSL', 'TYNF', 'RNFVLHRRCFTYKVLRIQRSYYGCFLQRKQLHNNHKTSYL', 'IGWCCLYRN', 'P', 'VGQLL', 'ERQFLFHRATN', 'SCTKPTISKRKLR', 'F', 'VCM', '', 'YQIC', '', 'FKPVNWL', 'ETCFKRA', 'SYIFP', 'LKW', 'CGGY', 'L', 'TLHTLF', 'ERS', 'IVT', 'TYCLAC', 'QCN', '', 'SHV', 'TKYLVYTLSLEHKTS', 'NIKFV', 'CTEVRGRAGNG', 'SCLRRSKTSL', 'RSSGKSYHTERRS', 'V', 'CENYRSCRRHYT', 'TSK', '', 'FKNYRRGWPHRSNGCLCRQF', 'SYY', 'ET', '', 'II', 'SIRFENPCYSWFSCC', '', 'CPLGYYS', 'LC', 'AFS', 'QSC', 'YNY', 'HSYTVFKPCLY', 'LYALFLYFIATIVYFY', 'KYKF', 'N', 'SIYADYYSKEYC', 'ECR', 'ILSRGFI', 'LFEVT', 'FF', 'TDKYYNLVFTIKCLPRFFNLLNRCFRCFNV', 'FRHAFLLYWLQRRLFELY', 'CHYCNLLYWFYTL', 'CLS', 'WFRFFRHLSFFRNYTNYHFIF', 'MGFNCFWLSCRVVFGIYSFH', 'VFLCTWIGCNHAIVFQLFCSTFY', '', 'FLAYVVNN', 'SCTNGPDFSYG', 'NVHLLCIILLCMEKLCACCRRL', 'FINLYDVLQT', '', 'SNKSRMYNYC', 'WC', 'KVLLCLC', 'WR', 'RLLQTTQLELC', 'L', 'YILCW', 'YIY', '', '', 'SCERLVTTV', 'KTNKSY', 'PVFLHR', '', 'CYSEEWFHPSLL', '', 'SWSKDL', 'KTFSLSFC', 'LRQPES', '', 'H', 'RFIAY', 'CYSF', 'W', 'IKM', 'RIICKISVCLLQSAYVSTYTVTRSGISV', 'CW', '', 'CGSCS', 'NV', 'CLR', 'YVFINF', 'RTNGKTQNTSCNCRS', 'TCKECVLRQCLIYFYFSSSARVC', 'FRCRN', 'RCC', 'MS', 'IVTSI', 'HRSYWR', 'L', '', 'LYAHL', 'QS', 'KHDTP', 'PWCLY', 'L', 'CASY', 'CAGSKKSQHCFDMER', 'RFHVIV', 'TTTKTNT', 'CC', 'KE', 'LTF', 'VDMCNY', 'TSC', 'CCNNKDST', 'GW', 'NC', '', 'LVEAVN', 
'SYTCVPFCCCYFLFNNTCSCHV', 'TY', 'LFK', 'NHRIQGY', 'WWCHS', 'HSIYRYLFC', 'QTC', 'F', 'HMV', 'PAWW', 'LY', '', 'QSLPIDCCSHNKRSGFCRAWFAWHDITHN', 'W', 'LFAFLT', 'SF', 'CSW', 'HLLHTIKTYRVH', 'LCNISLCFGC', 'MYNF', 'RCFW', 'ASTILL', 'YQCTRRFCCL', 'KFTP', 'HTLCAHGWLYYSIS', 'HLP', 'RFC', 'SGNNF', 'F', 'VL', 'ARHL', 'KIRSWCLCIY', 'W', 'MGT', 'Q', 'LLQIFTRSFLWCRCCKFTY', 'YVYTTNSTYWCFGHISIYSSWWYCSYRSNMPCLLFYEV', 'KSFW', 'IQSCSCL', 'YFTIPYVIHCTLFNTSLLILTWCLFCYLLVLDILSY', '', 'CFFFSTYSVDGYVHTFSTFLDNNCLYHLYFHKAFLLVL', '', 'LPKETCSL', 'WCFL', 'YF', 'RSCAVHLFVK', 'RNVSKVA', '', 'CAITSYAI', '', 'ILSSL', '', 'VQVF', 'WSNGYN', 'LQRSCLLSSRKGSQ', 'LQ', 'LRF', 'CSLPTTTNLYHLSCFAEWF', 'KNGIPIW', 'S', 'GLYGTSNLWYNYT', 'RSLA', '', 'RSLLSKTCDLHL', 'RHA', 'P', 'L', 'RFTHS', 'V', 'S', 'FLGTGW', 'CSTQGYWTFYAKLCT', 'A', 'G', 'YSQS', 'DT', 'V', 'VCSHSTRTDFFSVSLLQWFTIWCLPMCYEAQFHY', 'GFIP', 'WFMW', 'CWF', 'HRL', 'LCLFLLHAPYGITNWSSCWHRLRR', 'LLWTFC', 'QANSTSSWYGHNYYS', 'CFSLVVRCCYKWRQVVSQSIYHNS', '', 'L', 'PCGYEVQL', 'TSNTRPC', 'HTRTSFCSNWNCRFRYVCFIKRITAKWYEWTYHIG', 'CFIRR', 'IYTF', 'CC', 'TMLRCYFPKCSEKNNQGYTPLVVTHNFDFTFSFSPEYSMVFVLFFV', 'KCLFTFCYGYYCYVCFCNDVCQT', 'ACISLFVFVTFSCHCSLF', 'YGLYAC', 'LGDAYYDMVGYG', 'Y', 'FVWF', 'AKRLCYVCISCSVTNPYDSKNCV', '', 'WC', 'ESVDTYECLDTRL', 'SLLW', 'CFRSSHFHVGSYNLCYF', 'LLRCSYNCHVFGQRYCFYVC', 'VLPYFLHNW', 'YTSVYNASLLFLRLFLYLLLWPLLFTQPLL', 'TDSWCL', 'LLSFYTGV', 'IYEFTGTTPTQE', 'HRCLQTQH', 'IVGCWWQTLYQSSHCTV', 'NVRCKVHISSLTLSFATTQSRIII', 'IVGSMCPVTQ', 'HSLS', 'RYY', 'SL', 'KNGFTTFCFAFHAGCCRHKQAL', 'RNAGQQGNLTSYSLRV', 'FPSIICSFCYCSRSL', 'AGCC', 'W', 'F', 'SCS', 'KVEEVFECG', 'I', 'I', 'P', 'CSHAT', 'VGKDG', 'SSYDPNV', 'TG', 'I', 'GQEGKSY', 'CYADNAFHYA', 'KVG', '', 'CTQQHYQQCKRWLCSLEHNTSYNSSQTNGCHTRL', 'HI', 'KYV', 'WYNIYLCISIVGNPTGCRCR', '', 'NCST', '', 'N', 'YGQFT', 'FSMASYCNSFKGQFCCQITE', '', 'A', 'SCCTTTDVLCCRYYTNCLH', '', 'QCVSLLQHNKGR', 'VCTCTVIRFTGFEMG', 'IP', 'E', 'WNWYYLYRTGTTL', 'VCYRHT', 'RS', 'SEVFILY', 'RIKQPK', 'RYGTW', 'FSCHSTSTSW', 
'CNRSACQFNCIIFLCFCCRCC', 'SLQRLSS', 'WGTTNH', 'LC', 'DVVYTHWYWSGNNSYTGSQYGSRILWWCIVLSVLPLPHRSSKS', 'RIL', 'LKR', 'VCTNTYNLC', '', 'PCGFYT', 'KHSLYRLRYVERLWL', 'L', 'STPRTHASVS', 'CTIVFKRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQAVGACVLCNSQTSLRCGACIRRPFLCCKCCYDHVISTSHKLVLSVNPYVCNAPGCDVTDVTQLYLGGMSYYCKSHKPPISFPLCANGQVFGLYKNTCVGSDNVTDFNAIATCDWTNAGDYILANTCTERLKLFAAETLKATEETFKLSYGIATVREVLSDRELHLSWEVGKPRPPLNRNYVFTGYRVTKNSKVQIGEYTFEKGDYGDAVVYRGTTTYKLNVGDYFVLTSHTVMPLSAPTLVPQEHYVRITGLYPTLNISDEFSSNVANYQKVGMQKYSTLQGPPGTGKSHFAIGLALYYPSARIVYTACSHAAVDALCEKALKYLPIDKCSRIIPARARVECFDKFKVNSTLEQYVFCTVNALPETTADIVVFDEISMATNYDLSVVNARLRAKHYVYIGDPAQLPAPRTLLTKGTLEPEYFNSVCRLMKTIGPDMFLGTCRRCPAEIVDTVSALVYDNKLKAHKDKSAQCFKMFYKGVITHDVSSAINRPQIGVVREFLTRNPAWRKAVFISPYNSQNAVASKILGLPTQTVDSSQGSEYDYVIFTQTTETAHSCNVNRFNVAITRAKVGILCIMSDRDLYDKLQFTSLEIPRRNVATLQAENVTGLFKDCSKVITGLHPTQAPTHLSVDTKFKTEGLCVDIPGIPKDMTYRRLISMMGFKMNYQVNGYPNMFITREEAIRHVRAWIGFDVEGCHATREAVGTNLPLQLGFSTGVNLVAVPTGYVDTPNNTDFSRVSAKPPPGDQFKHLIPLMYKGLPWNVVRIKIVQMLSDTLKNLSDRVVFVLWAHGFELTSMKYFVKIGPERTCCLCDRRATCFSTASDTYACWHHSIGFDYVYNPFMIDVQQWGFTGNLQSNHDLYCQVHGNAHVASCDAIMTRCLAVHECFVKRVD
WTIEYPIIGDELKINAACRKVQHMVVKAALLADKFPVLHDIGNPKAIKCVPQADVEWKFYDAQPCSDKAYKIEELFYSYATHSDKFTDGVCLFWNCNVDRYPANSIVCRFDTRVLSNLNLPGCDGGSLYVNKHAFHTPAFDKSAFVNLKQLPFFYYSDSPCESHGKQVVSDIDYVPLKSATCITRCNLGGAVCRHHANEYRLYLDAYNMMISAGFSLWVYKQFDTYNLWNTFTRLQSLENVAFNVVNKGHFDGQQGEVPVSIINNTVYTKVDGVDVELFENKTTLPVNVAFELWAKRNIKPVPEVKILNNLGVDIAANTVIWDYKRDAPAHISTIGVCSMTDIAKKPTETICAPLTVFFDGRVDGQVDLFRNARNGVLITEGSVKGLQPSVGPKQASLNGVTLIGEAVKTQFNYYKKVDGVVQQLPETYFTQSRNLQEFKPRSQMEIDFLELAMDEFIERYKLEGYAFEHIVYGDFSHSQLGGLHLLIGLAKRFKESPFELEDFIPMDSTVKNYFITDAQTGSSKCVCSVIDLLLDDFVEIIKSQDLSVVSKVVKVTIDYTEISFMLWCKDGHVETFYPKLQSSQAWQPGVAMPNLYKMQRMLLEKCDLQNYGDSATLPKGIMMNVAKYTQLCQYLNTLTLAVPYNMRVIHFGAGSDKGVAPGTAVLRQWLPTGTLLVDSDLNDFVSDADSTLIGDCATVHTANKWDLIISDMYDPKTKNVTKENDSKEGFFTYICGFIQQKLALGGSVAIKITEHSWNADLYKLMGHFAWWTAFVTNVNASSSEAFLIGCNYLGKPREQIDGYVMHANYIFWRNTNPIQLSSYSLFDMSKFPLKLRGTAVMSLKEGQINDMILSLLSKGRLIIRENNRVVISSDVLVNN', 'TNNVCFSCFIATSL', 'SVC', 'SYNQNSITPCIH', 'FFHTWCLLP', 'QSFQILSFTFNSGLVLTFLFQCYLVPCYTCLWDQWY', 'EV', '', 'PCPTI', '', 'WCLFCFH', 'EV', 'HNKRLDFWYYFRFEDPVPTYC', '', 'RY', 'CCY', 'SL', 'ISIL', '', 'SIFGCLLPQKQQKLDGK', 'VQSLF', 'CE', 'LHF', 'ICLSAFSYGP', 'RKTG', 'FQKS', 'GICV', 'EY', 'WLF', 'NIF', 'AHAY', 'FSA', 'SPSGFFGFRTIGRFANRY', 'HH', 'VSNFTCFT', 'KLFDSW', 'FFFRLDSWCCSLLCGLSST', 'DFSIKI', '', 'KWNHYRCCRLCT', 'PSLRNKVYVEILHCRKRNLSNF', 'L', 'SPTNRIYC', 'IS', 'YYKLVPFW', 'SF', 'RHQICICLCLEQEENQQLCC', 'LFCPI', 'FRIIFHF', 'VLWSVSY', 'IK', 'SLLY', 'CLCRFICN', 'R', '', 'SQTNRSRANWKDC', 'L', 'L', 'ITR', 'FYRLRYSLEF', 'QS', 'F', 'GWW', 'L', 'LPV', 'IV', 'EV', 'SQTF', 'ERYFN', 'NLSGR', 'HTL', 'WC', 'RF', 'LLLSFTIIWFPTH', 'WCWLPTIQSSSTFF', 'TSTCTSNCLWT', 'KVY', 'FG', 'KQMCQFQLQWFNRHRCSY', 'V', 'QKVSAFPTIWQRHC', 'HY', 'CCP', 'STDT', 'DS', 'HYTMFFWWCQCYNTRNKYF', 'PGCCSLSGC', 'LHRSPCCYSCRSTYSYLACLFYRF', 'CFSNTCRLFNRG', 'TCQQLI', 'V', 'HTHWCRYMR', 'LSDSD', 'FSSAGT', 'CS', 'SIHHCLHYVTWCRKFSCLL', '', 'LYCHTHKFYY', 'CYHRNSTSVYDQDISRLYNVHLW', 'FN', 'MQQSFVAIWQFLYTIKPCFNWNSC', 'TRQKHPRSFCTSQTNLQNTTN', 'RFWWF', 
'FFTNITRSIKTKQEVIY', 'RSTFQQSDTCRCWLHQTIW', 'LPW', 'YCC', 'RPHLCTKV', 'RPYCFATFAHR', 'NDCSIHFCTVSGYNHFWLDLWCRCCITNTICYANGL', 'V', 'WYWSYTECSL', 'EPKIDCQPI', '', 'CYWQNSRLTFFHSKCTWKTSRCGQPKCTSFKHAC', 'TT', 'LQFWCNFKCFK', 'YPFTS', 'QS', 'G', 'SAN', '', 'VDHRQTSKFADICDSTIN', 'SCRNQSFC', 'SCCY', 'NVRVCTWTIKKS', 'FLWKGLSSYVLPSVSTSWCSLLACDLCPCTRKELHNCSCHLS', 'WKSTLSS', 'RCLCFKWHTLVCNTKEFL', 'TTNHYYRQHICVW', 'L', 'CCNRNCQQHSL', 'SFAT', 'IRLIQGGVR', 'IF', 'ESYITRC', 'FR', 'HLWH', 'CFSCKHSKRN', 'PPQ', 'GCQEFK', 'ISHRSPRTWKV', 'AVYKMAMVHLARFYSWLDCHSNGDNYALLYDQLL', 'LSQGLLFLWILLQI', '', 'RRL', 'ASAQRSQITLHINELMDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFGWLIVGVALLAVFQSASKIITLKKRWQLALSKGVHFVCNLLLLFVTVYSHLLLVAAGLEAPFLYLYALVYFLQSINFVRIIMRLWLCWKCRSKNPLLYDANYFLCWHTNCYDYCIPYNSVTSSIVITSGDGTTSPISEHDYQIGGYTEKWESGVKDCVVLHSYFTSDYYQLYSTQLSTDTGVEHVTFFIYNKIVDEPEEHVQIHTIDGSSGVVNPVMEPIYDEPTTTTSVPL', 'AQADEYELMYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV', 'TN', 'ILY', 'FFCLEL', 'F', 'PWQIPTVLLPLKSLKSSLNNGT', '', '', 'VSYSLHGFVFYNLPMPTGIGFCI', 'LS', 'FSSGCYGQ', 'L', 'LVLCLLLFTE', 'IGSPVELLSQWLVL', 'A', 'CGSATSLLLSDCLRVRVPCGHSIQKLTFFSTCHSMALF', 'PDRF', 'KVNS', 'SEL', 'SFVDIFVLLDTI', 'DAVTSRTCLKKSLLLHHERFLITNWELRSV', 'QVTQVLLHTVATGLATIN', 'TQTIPVAVTILLCLYSK', 'QQMFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLTENKYSQLDEEQPMEID', 'TNMKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNSPFHPLADNKFALTCFSTQFAFACPDGVKHVYQLRARSVSPKLFIRQEEVQELYSPIFLIVAAIVFITLCFTLKRKTE', 'LNFH', 'LTSICAF', 'PFCYSLF', 'LCLLSFGSHLNCKIIMKLVTPKRT', 'NFLFS', 'ESSQL', 'LHFTKNVVYSHVLNINHM', 'LMTRVLFTSILNGILE', 'ELENQHL', 'LNCAWMRLVLNHPFSTSISVIIQFPVYLLQLIARNLNWVVL', 'CVVRSMKTF', 'SIMTFVLF', 'ISSKRTN', 'NV', '', 'WTPKSAKCTPHYVWWTLRFNWQ', 'PEWRTQWGAIKTTSAPRFTQ', 'YCVLVHRSHSTWQGRP', 'IPSRTRRSN', 'HQ', 'QSR', 'PNWLLPKSYQTNSWW', 'R', 'NERSQSKMVFLLPRNWARSWTSLWC', 'QRRHHMGCN', 'GSLEYTKRSHWHPQSC', 'QCCNRATTSSRNNIAKRLLRRREQRRQSSLFSFLIT', 'SQQFKKFNSRQQ', 'GNFSC', 'NGWQWR', 'CCSCFAAA', 'QIEPA', 'EQNVW', 'RPTTTRPNCH', 
'EICC', 'GF', 'EASAKTYCH', 'SIQCNTSFRQTWSRTNPRKFWGPGTNQTRN', 'LQTLAANCTICPQRFSVLRNVAHWHGSHTFGNVVDLHRCHQIG', 'QRSKFQRSSHFAE', 'AY', 'RIQNIPTNRA', 'KGQKEEG', '', 'NSSLTAETEETANCDSSSCCRFG', 'FLQTIATIHEQC', 'LNSGLNSCRPHKADGLYKRFRFSVYDI', 'STLVQNEFS', 'LHSTSRCS', 'L', 'SHIAIFNQCVTLGRT', 'KSHHIFTEATRSTIECTVNNARESCLYGRALMCKINFSSAIPM', 'F', '', 'LLRRMTKKKKKKKKKK']
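The empty strings ('') in the list above are what you get when two stop codons sit next to each other. Here is a minimal sketch of that behaviour, assuming the list was built by splitting the translated protein on the '*' stop symbol. The names toy_protein, fragments and non_empty are made up for this illustration.

```python
# Toy protein string; '*' marks a stop codon, as in the translated output.
toy_protein = "IKGLYLPR*QTNQ**ER*"

# Splitting on '*' gives one fragment per stop codon, including empty ones
# wherever two stops are adjacent or the string ends with a stop.
fragments = toy_protein.split("*")
print(fragments)

# Keep only the real peptide stretches by dropping the empty fragments.
non_empty = [frag for frag in fragments if frag]
print(non_empty)
```

Filtering the empty entries first keeps later length statistics from being padded with zero-length rows.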
In [83]:
# heapq.nlargest returns the n largest items; plain strings compare alphabetically, so this is not yet the longest sequence
import heapq
In [91]:
heapq.nlargest(10,ncov_aa)
Out[91]:
['YYKLVPFW', 'YVYTTNSTYWCFGHISIYSSWWYCSYRSNMPCLLFYEV', 'YVFINF', 'YTSVYNASLLFLRLFLYLLLWPLLFTQPLL', 'YSQS', 'YQIC', 'YQCTRRFCCL', 'YPFTS', 'YNY', 'YIY']
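Notice that the result above is ordered alphabetically (everything starts with 'Y'), because strings are compared character by character. If the goal is the longest fragments, heapq.nlargest accepts a key function. A small sketch on a hand-picked sample list (the list sample is only for illustration):

```python
import heapq

# A few fragments taken from the output above, plus an empty one.
sample = [
    "YYKLVPFW",
    "A",
    "DGEPCPWFQRENTRPTQFACFTGSRRARTWLWRLRGGGLIRGTSTS",
    "QTNQLSISCRSVL",
    "",
]

# Default comparison: alphabetical order of the strings.
by_alpha = heapq.nlargest(2, sample)

# With key=len the comparison is on sequence length instead.
by_length = heapq.nlargest(2, sample, key=len)

print(by_alpha)
print(by_length)
```

On the real data the same call would be heapq.nlargest(10, ncov_aa, key=len).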
In [94]:
for i in ncov_amino_acids:
    if len(i) > 20:
        print(i)
QDTSNSSIFCRLLTVSSVLQPIISTSRFRPGVTER DGEPCPWFQRENTRPTQFACFTGSRRARTWLWRLRGGGLIRGTSTS TALCVHQTFGCSNCTSWSCYG DTWCPCPSCGRNTSGLPQGSSS HLQWGMSKFCISLKFHNQDYSTKG RDRHYFGIFFCFHKCFCGNCERFGL TEINTESSLCICIRGCSCCTINFLPHS NCSKFCACFTEGRYNNTRWNFTVFTETH QSSCNGLHYRWCCSVDFAVAN IQRRNWPTHASKSPKRNYLLRGRNTSHRSVNRGSCLENW YDGNKQYLHTQRRCTNKGYFW PYTFFKSLCRYCSHKCLLSCL NASESFEKSANRQLYNHLPGSGFKWLHCRGGKDSA EARNSWNCFLEFARNACTCRRNTQINACLCGN NSCYNATWLCNTWLKFGRSCSVYEISQSASYSFCFFT PPHASCGHVNDIWTTVWSNLFGWS RNNELLVSTCQFRFLQKSLERGV TSYKISSTTGVTFCYDVSTTCSV RNFVLHRRCFTYKVLRIQRSYYGCFLQRKQLHNNHKTSYL TDKYYNLVFTIKCLPRFFNLLNRCFRCFNV WFRFFRHLSFFRNYTNYHFIF VFLCTWIGCNHAIVFQLFCSTFY NVHLLCIILLCMEKLCACCRRL RIICKISVCLLQSAYVSTYTVTRSGISV TCKECVLRQCLIYFYFSSSARVC SYTCVPFCCCYFLFNNTCSCHV QSLPIDCCSHNKRSGFCRAWFAWHDITHN YVYTTNSTYWCFGHISIYSSWWYCSYRSNMPCLLFYEV YFTIPYVIHCTLFNTSLLILTWCLFCYLLVLDILSY CFFFSTYSVDGYVHTFSTFLDNNCLYHLYFHKAFLLVL VCSHSTRTDFFSVSLLQWFTIWCLPMCYEAQFHY LCLFLLHAPYGITNWSSCWHRLRR CFSLVVRCCYKWRQVVSQSIYHNS HTRTSFCSNWNCRFRYVCFIKRITAKWYEWTYHIG TMLRCYFPKCSEKNNQGYTPLVVTHNFDFTFSFSPEYSMVFVLFFV KCLFTFCYGYYCYVCFCNDVCQT AKRLCYVCISCSVTNPYDSKNCV YTSVYNASLLFLRLFLYLLLWPLLFTQPLL NVRCKVHISSLTLSFATTQSRIII KNGFTTFCFAFHAGCCRHKQAL CTQQHYQQCKRWLCSLEHNTSYNSSQTNGCHTRL CNRSACQFNCIIFLCFCCRCC DVVYTHWYWSGNNSYTGSQYGSRILWWCIVLSVLPLPHRSSKS 
CTIVFKRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQAVGACVLCNSQTSLRCGACIRRPFLCCKCCYDHVISTSHKLVLSVNPYVCNAPGCDVTDVTQLYLGGMSYYCKSHKPPISFPLCANGQVFGLYKNTCVGSDNVTDFNAIATCDWTNAGDYILANTCTERLKLFAAETLKATEETFKLSYGIATVREVLSDRELHLSWEVGKPRPPLNRNYVFTGYRVTKNSKVQIGEYTFEKGDYGDAVVYRGTTTYKLNVGDYFVLTSHTVMPLSAPTLVPQEHYVRITGLYPTLNISDEFSSNVANYQKVGMQKYSTLQGPPGTGKSHFAIGLALYYPSARIVYTACSHAAVDALCEKALKYLPIDKCSRIIPARARVECFDKFKVNSTLEQYVFCTVNALPETTADIVVFDEISMATNYDLSVVNARLRAKHYVYIGDPAQLPAPRTLLTKGTLEPEYFNSVCRLMKTIGPDMFLGTCRRCPAEIVDTVSALVYDNKLKAHKDKSAQCFKMFYKGVITHDVSSAINRPQIGVVREFLTRNPAWRKAVFISPYNSQNAVASKILGLPTQTVDSSQGSEYDYVIFTQTTETAHSCNVNRFNVAITRAKVGILCIMSDRDLYDKLQFTSLEIPRRNVATLQAENVTGLFKDCSKVITGLHPTQAPTHLSVDTKFKTEGLCVDIPGIPKDMTYRRLISMMGFKMNYQVNGYPNMFITREEAIRHVRAWIGFDVEGCHATREAVGTNLPLQLGFSTGVNLVAVPTGYVDTPNNTDFSRVSAKPPPGDQFKHLIPLMYKGLPWNVVRIKIVQMLSDTLKNLSDRVVFVLWAHGFELTSMKYFVKIGPERTCCLCDRRATCFSTASDTYACWHHSIGFDYVYNPFMIDVQQWGFTGNLQSNHDLYCQVHGNAHVASCDAIMTRCLAVHECFVKRVDWTIEYPIIGDELKINAACRKVQHMVVKAALLADKFPVLHDIGNPKAIKCVPQADVEWKFYDAQPCSDKAYKIEELFYSYATHSDKFTDGVCLFWNCNVDRYPANSIVCRFDTRVLSNLNLPGCDGGSLYVNKHAFHTPAFDKSAFVNLKQLPFFYYSDSPCESHGKQVVSDIDYVPLKS
ATCITRCNLGGAVCRHHANEYRLYLDAYNMMISAGFSLWVYKQFDTYNLWNTFTRLQSLENVAFNVVNKGHFDGQQGEVPVSIINNTVYTKVDGVDVELFENKTTLPVNVAFELWAKRNIKPVPEVKILNNLGVDIAANTVIWDYKRDAPAHISTIGVCSMTDIAKKPTETICAPLTVFFDGRVDGQVDLFRNARNGVLITEGSVKGLQPSVGPKQASLNGVTLIGEAVKTQFNYYKKVDGVVQQLPETYFTQSRNLQEFKPRSQMEIDFLELAMDEFIERYKLEGYAFEHIVYGDFSHSQLGGLHLLIGLAKRFKESPFELEDFIPMDSTVKNYFITDAQTGSSKCVCSVIDLLLDDFVEIIKSQDLSVVSKVVKVTIDYTEISFMLWCKDGHVETFYPKLQSSQAWQPGVAMPNLYKMQRMLLEKCDLQNYGDSATLPKGIMMNVAKYTQLCQYLNTLTLAVPYNMRVIHFGAGSDKGVAPGTAVLRQWLPTGTLLVDSDLNDFVSDADSTLIGDCATVHTANKWDLIISDMYDPKTKNVTKENDSKEGFFTYICGFIQQKLALGGSVAIKITEHSWNADLYKLMGHFAWWTAFVTNVNASSSEAFLIGCNYLGKPREQIDGYVMHANYIFWRNTNPIQLSSYSLFDMSKFPLKLRGTAVMSLKEGQINDMILSLLSKGRLIIRENNRVVISSDVLVNN QSFQILSFTFNSGLVLTFLFQCYLVPCYTCLWDQWY HNKRLDFWYYFRFEDPVPTYC PSLRNKVYVEILHCRKRNLSNF LHRSPCCYSCRSTYSYLACLFYRF CYHRNSTSVYDQDISRLYNVHLW MQQSFVAIWQFLYTIKPCFNWNSC TRQKHPRSFCTSQTNLQNTTN NDCSIHFCTVSGYNHFWLDLWCRCCITNTICYANGL CYWQNSRLTFFHSKCTWKTSRCGQPKCTSFKHAC FLWKGLSSYVLPSVSTSWCSLLACDLCPCTRKELHNCSCHLS AVYKMAMVHLARFYSWLDCHSNGDNYALLYDQLL ASAQRSQITLHINELMDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFGWLIVGVALLAVFQSASKIITLKKRWQLALSKGVHFVCNLLLLFVTVYSHLLLVAAGLEAPFLYLYALVYFLQSINFVRIIMRLWLCWKCRSKNPLLYDANYFLCWHTNCYDYCIPYNSVTSSIVITSGDGTTSPISEHDYQIGGYTEKWESGVKDCVVLHSYFTSDYYQLYSTQLSTDTGVEHVTFFIYNKIVDEPEEHVQIHTIDGSSGVVNPVMEPIYDEPTTTTSVPL AQADEYELMYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV PWQIPTVLLPLKSLKSSLNNGT VSYSLHGFVFYNLPMPTGIGFCI CGSATSLLLSDCLRVRVPCGHSIQKLTFFSTCHSMALF DAVTSRTCLKKSLLLHHERFLITNWELRSV QQMFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLTENKYSQLDEEQPMEID TNMKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNSPFHPLADNKFALTCFSTQFAFACPDGVKHVYQLRARSVSPKLFIRQEEVQELYSPIFLIVAAIVFITLCFTLKRKTE LCLLSFGSHLNCKIIMKLVTPKRT LNCAWMRLVLNHPFSTSISVIIQFPVYLLQLIARNLNWVVL WTPKSAKCTPHYVWWTLRFNWQ NERSQSKMVFLLPRNWARSWTSLWC QCCNRATTSSRNNIAKRLLRRREQRRQSSLFSFLIT SIQCNTSFRQTWSRTNPRKFWGPGTNQTRN LQTLAANCTICPQRFSVLRNVAHWHGSHTFGNVVDLHRCHQIG NSSLTAETEETANCDSSSCCRFG LNSGLNSCRPHKADGLYKRFRFSVYDI 
KSHHIFTEATRSTIECTVNNARESCLYGRALMCKINFSSAIPM
In [95]:
# Place our Amino Acids into a DataFrame
import pandas as pd
In [100]:
df = pd.DataFrame({'amino_acids':ncov_aa})
In [101]:
df.head()
Out[101]:
 | amino_acids |
---|---|
0 | IKGLYLPR |
1 | QTNQLSISCRSVL |
2 | TNFKICVAVTRLHA |
3 | CTHAV |
4 | LITNYCR |
In [102]:
df['count'] = df['amino_acids'].apply(len)
In [103]:
df.head()
Out[103]:
 | amino_acids | count |
---|---|---|
0 | IKGLYLPR | 8 |
1 | QTNQLSISCRSVL | 13 |
2 | TNFKICVAVTRLHA | 14 |
3 | CTHAV | 5 |
4 | LITNYCR | 7 |
In [106]:
# Find the longest amino acid sequences (top 20 by length)
df['count'].nlargest(20)
Out[106]:
548    2701
694     290
719     123
695      83
718      63
6        46
464      46
539      43
758      43
771      43
674      42
729      41
242      40
91       39
405      38
410      38
710      38
189      37
408      36
553      36
Name: count, dtype: int64
In [111]:
df.nlargest(20,'count')
Out[111]:
 | amino_acids | count |
---|---|---|
548 | CTIVFKRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFL… | 2701 |
694 | ASAQRSQITLHINELMDLFMRIFTIGTVTLKQGEIKDATPSDFVRA… | 290 |
719 | TNMKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNS… | 123 |
695 | AQADEYELMYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALR… | 83 |
718 | QQMFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSL… | 63 |
6 | DGEPCPWFQRENTRPTQFACFTGSRRARTWLWRLRGGGLIRGTSTS | 46 |
464 | TMLRCYFPKCSEKNNQGYTPLVVTHNFDFTFSFSPEYSMVFVLFFV | 46 |
539 | DVVYTHWYWSGNNSYTGSQYGSRILWWCIVLSVLPLPHRSSKS | 43 |
758 | LQTLAANCTICPQRFSVLRNVAHWHGSHTFGNVVDLHRCHQIG | 43 |
771 | KSHHIFTEATRSTIECTVNNARESCLYGRALMCKINFSSAIPM | 43 |
674 | FLWKGLSSYVLPSVSTSWCSLLACDLCPCTRKELHNCSCHLS | 42 |
729 | LNCAWMRLVLNHPFSTSISVIIQFPVYLLQLIARNLNWVVL | 41 |
242 | RNFVLHRRCFTYKVLRIQRSYYGCFLQRKQLHNNHKTSYL | 40 |
91 | IQRRNWPTHASKSPKRNYLLRGRNTSHRSVNRGSCLENW | 39 |
405 | YVYTTNSTYWCFGHISIYSSWWYCSYRSNMPCLLFYEV | 38 |
410 | CFFFSTYSVDGYVHTFSTFLDNNCLYHLYFHKAFLLVL | 38 |
710 | CGSATSLLLSDCLRVRVPCGHSIQKLTFFSTCHSMALF | 38 |
189 | NSCYNATWLCNTWLKFGRSCSVYEISQSASYSFCFFT | 37 |
408 | YFTIPYVIHCTLFNTSLLILTWCLFCYLLVLDILSY | 36 |
553 | QSFQILSFTFNSGLVLTFLFQCYLVPCYTCLWDQWY | 36 |
In [113]:
# Most Frequent Amino Acid: first look at the full translated protein string
print(ncov_protein)
IKGLYLPR*QTNQLSISCRSVL*TNFKICVAVTRLHA*CTHAV*LITNYCR*QDTSNSSIFCRLLTVSSVLQPIISTSRFRPGVTER*DGEPCPWFQRENTRPTQFACFTGSRRARTWLWRLRGGGLIRGTSTS*RWHLWLSRS*KRRFAST*TALCVHQTFGCSNCTSWSCYG*AGSRTRRHSVRS*W*DTWCPCPSCGRNTSGLPQGSSS*ER**RSWWP*LRRRSKVI*LRRRAWH*SL*RFSRKLEH*T*QWCYP*THA*A*RRGIHSLCR*QLLWP*WLPS*VH*RPSSTCW*SFMHFVRTTGLY*H*EGCILLP*T*A*NCLVHGTF*KEL*IADTF*N*IGKEI*HLQWGMSKFCISLKFHNQDYSTKG*KEKA*WLYG*NSICLSSCVTK*MQPNVPFNSHEV*SLW*NFMADGRFC*SHLRILWH*EFD*RRCHYLWLLTPKCCC*NLLSSMSQFRSRT*A*SCRIP**IWLENHSS*GWSHYCLWRLCVLLCWLP*QVCLLGSTC*R*HRL*PYRCCWRRFRRS**QPS*NTPKRESQHQYCW*L*T**RDRHYFGIFFCFHKCFCGNCERFGL*SIQTNC*ILW*F*SYKRKS*KRCLEYW*TEINTESSLCICIRGCSCCTINFLPHS*NCSKFCACFTEGRYNNTRWNFTVFTETH*CYDVHI*FGY*QSSCNGLHYRWCCSVDFAVAN*HLWHCL*KTQTRP*LA*REV*GRCRVS*RRLGNC*IYLNLCL*NCRWTNCHLCKGN*GECSDIL*ACK*IFGFVC*LYHYWWS*T*SLEFR*NICHALKGIVQKVC*IQRRNWPTHASKSPKRNYLLRGRNTSHRSVNRGSCLENW*FTTIRTTY**SC*SSIGWYTSLY*RAYVARNQRHRKVLCPCT*YDGNKQYLHTQRRCTNKGYFW**HCDRSARLQECEYHF*T**KD**ST**EVLCLYS*TRYRSK*VRLCCGRCCHKNFATSI*ITYTTGH*FR*VEYGYILLI**VW*V*IGFTYVLFFLPSR*G*RRR*L*RRRV*AINSI*VWY*R*LPR*TFGIWCHFCCSST*RRARRRLVR***STNCWSTRRQ*GQSDNYYSNNC*GSTSIRDGTYTSCSDY*SE*F*WLFKTY*QCIH*KCRHCGRS*KGKTNSGC*CSQCLP*TWRRCCRSLK*GY*QCHAS*I**LHSY*WTT*SGW*LCFKRTQSC*TLSSCCRPKC*QR*RHSTS*ECL*KF*SARSSTCTIIISWYFWC*PYTFFKSLCRYCSHKCLLSCL**KSL*QTCFKLFGNEE*KAS*TKDR*DS*RGS*AIYN*K*TFS*TEKTR**ENQSLC*RSYNNSGRN*VPHRKLVTLY*H*WQSSSRFCHSC**H*HHFLKERCSIYSG*CCSRGCFNCCGYTY*KGWWHY*NASESFEKSANRQLYNHLPGSGFKWLHCRGGKDSA*KV*KCLLHSTIYYL**EARNSWNCFLEFARNACTCRRNTQINACLCGN*SHSFNYTA*I*GY*NTRGCG*LWC*ILLLHQ*NNCSVTYQHT*RSK*NSCYNATWLCNTWLKFGRSCSVYEISQSASYSFCFFT*CCYSV*WLSYFFF*NT*RTFY*NHLTCWFL*RLVLFWTIYTTRYRIS*ER**KCILH**SYHIPPRW*SYHL*QS*DTSFFERSEDY*GVYNSRQH*PPHASCGHVNDIWTTVWSNLFGWS*CY*NKTS*FT*R*NILCFT***HSTC*GF*VLPHN*S*FSG*VHVSIKSH*KVEIPTS*WFNFY*MGR*QLLSCHCIVNTPTNRVEV*STCSTRCLLQSKGW*SC*LLCTYLSLL**DSR*VR*C*RNNELLVSTCQFRFLQKSLERGV*NLWTTADNP*GCRSCYVHGHTFL*TI*ERCSDTLYVW*TSYKISSTTGVTFCYDVSTTCSV*T*AWYIYLC**VHW*LPVWSL*TYNF*RNFVLHRRCFTYKVLRIQRSYYGCFLQRKQLHNNHKTSYL*IGWCCLYRN*P*VGQLL*ERQFLFHRATN
*SCTKPTISKRKLR*F*VCM**YQIC**FKPVNWL*ETCFKRA*SYIFP*LKW*CGGY*L*TLHTLF*ERS*IVT*TYCLAC*QCN**SHV*TKYLVYTLSLEHKTS*NIKFV*CTEVRGRAGNG*SCLRRSKTSL*RSSGKSYHTERRS*V*CENYRSCRRHYT*TSK**FKNYRRGWPHRSNGCLCRQF*SYY*ET**II*SIRFENPCYSWFSCC**CPLGYYS*LC*AFS*QSC*YNY*HSYTVFKPCLY*LYALFLYFIATIVYFY*KYKF*N*SIYADYYSKEYC*ECR*ILSRGFI*LFEVT*FF*TDKYYNLVFTIKCLPRFFNLLNRCFRCFNV*FRHAFLLYWLQRRLFELY*CHYCNLLYWFYTL*CLS*WFRFFRHLSFFRNYTNYHFIF*MGFNCFWLSCRVVFGIYSFH*VFLCTWIGCNHAIVFQLFCSTFY**FLAYVVNN*SCTNGPDFSYG*NVHLLCIILLCMEKLCACCRRL*FINLYDVLQT**SNKSRMYNYC*WC*KVLLCLC*WR*RLLQTTQLELC*L*YILCW*YIY***SCERLVTTV*KTNKSY*PVFLHR**CYSEEWFHPSLL**SWSKDL*KTFSLSFC*LRQPES**H*RFIAY*CYSF*W*IKM*RIICKISVCLLQSAYVSTYTVTRSGISV*CW**CGSCS*NV*CLR*YVFINF*RTNGKTQNTSCNCRS*TCKECVLRQCLIYFYFSSSARVC*FRCRN*RCC*MS*IVTSI*HRSYWR*L**LYAHL*QS*KHDTP*PWCLY*L*CASY*CAGSKKSQHCFDMER*RFHVIV*TTTKTNT*CC*KE*LTF*VDMCNY*TSC*CCNNKDST*GW*NC**LVEAVN*SYTCVPFCCCYFLFNNTCSCHV*TY*LFK*NHRIQGY*WWCHS*HSIYRYLFC*QTC*F*HMV*PAWW*LY**QSLPIDCCSHNKRSGFCRAWFAWHDITHN*W*LFAFLT*SF*CSW*HLLHTIKTYRVH*LCNISLCFGC*MYNF*RCFW*ASTILL*YQCTRRFCCL*KFTP*HTLCAHGWLYYSIS*HLP*RFC*SGNNF*F*VL*ARHL*KIRSWCLCIY*W*MGT*Q*LLQIFTRSFLWCRCCKFTY*YVYTTNSTYWCFGHISIYSSWWYCSYRSNMPCLLFYEV*KSFW*IQSCSCL*YFTIPYVIHCTLFNTSLLILTWCLFCYLLVLDILSY**CFFFSTYSVDGYVHTFSTFLDNNCLYHLYFHKAFLLVL**LPKETCSL*WCFL*YF*RSCAVHLFVK*RNVSKVA**CAITSYAI**ILSSL**VQVF*WSNGYN*LQRSCLLSSRKGSQ*LQ*LRF*CSLPTTTNLYHLSCFAEWF*KNGIPIW*S*GLYGTSNLWYNYT*RSLA**RSLLSKTCDLHL*RHA*P*L*RFTHS*V*S*FLGTGW*CSTQGYWTFYAKLCT*A*G*YSQS*DT*V*VCSHSTRTDFFSVSLLQWFTIWCLPMCYEAQFHY*GFIP*WFMW*CWF*HRL*LCLFLLHAPYGITNWSSCWHRLRR*LLWTFC*QANSTSSWYGHNYYS*CFSLVVRCCYKWRQVVSQSIYHNS**L*PCGYEVQL*TSNTRPC*HTRTSFCSNWNCRFRYVCFIKRITAKWYEWTYHIG*CFIRR*IYTF*CC*TMLRCYFPKCSEKNNQGYTPLVVTHNFDFTFSFSPEYSMVFVLFFV*KCLFTFCYGYYCYVCFCNDVCQT*ACISLFVFVTFSCHCSLF*YGLYAC*LGDAYYDMVGYG*Y*FVWF*AKRLCYVCISCSVTNPYDSKNCV**WC*ESVDTYECLDTRL*SLLW*CFRSSHFHVGSYNLCYF*LLRCSYNCHVFGQRYCFYVC*VLPYFLHNW*YTSVYNASLLFLRLFLYLLLWPLLFTQPLL*TDSWCL*LLSFYTGV*IYEFTGTTPTQE*HRCLQTQH*IVGCWWQTLYQSSHCTV*NVRCKVHISSLTLSFATTQSRIII*IVGSMCPVTQ*HSLS*RYY*SL*KNG
FTTFCFAFHAGCCRHKQAL*RNAGQQGNLTSYSLRV*FPSIICSFCYCSRSL*AGCC*W*F*SCS*KVEEVFECG*I*I*P*CSHAT*VGKDG*SSYDPNV*TG*I*GQEGKSY*CYADNAFHYA*KVG**CTQQHYQQCKRWLCSLEHNTSYNSSQTNGCHTRL*HI*KYV*WYNIYLCISIVGNPTGCRCR**NCST**N*YGQFT*FSMASYCNSFKGQFCCQITE**A*SCCTTTDVLCCRYYTNCLH**QCVSLLQHNKGR*VCTCTVIRFTGFEMG*IP*E*WNWYYLYRTGTTL*VCYRHT*RS*SEVFILY*RIKQPK*RYGTW*FSCHSTSTSW*CNRSACQFNCIIFLCFCCRCC*SLQRLSS*WGTTNH*LC*DVVYTHWYWSGNNSYTGSQYGSRILWWCIVLSVLPLPHRSSKS*RIL*LKR*VCTNTYNLC**PCGFYT*KHSLYRLRYVERLWL*L*STPRTHASVS*CTIVFKRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQAVGACVLCNSQTSLRCGACIRRPFLCCKCCYDHVISTSHKLVLSVNPYVCNAPGCDVTDVTQLYLGGMSYYCKSHKPPISFPLCANGQVFGLYKNTCVGSDNVTDFNAIATCDWTNAGDYILANTCTERLKLFAAETLKATEETFKLSYGIATVREVLSDRELHLSWEVGKPRPPLNRNYVFTGYRVTKNSKVQIGEYTFEKGDYGDAVVYRGTTTYKLNVGDYFVLTSHTVMPLSAPTLVPQEHYVRITGLYPTLNISDEFSSNVANYQKVGMQKYSTLQGPPGTGKSHFAIGLALYYPSARIVYTACSHAAVDALCEKALKYLPIDKCSRIIPARARVECFDKFKVNSTLEQYVFCTVNALPETTADIVVFDEISMATNYDLSVVNARLRAKHYVYIGDPAQLPAPRTLLTKGTLEPEYFNSVCRLMKTIGPDMFLGTCRRCPAEIVDTVSALVYDNKLKAHKDKSAQCFKMFYKGVITHDVSSAINRPQIGVVREFLTRNPAWRKAVFISPYNSQNAVASKILGLPTQTVDSSQGSEYDYVIFTQTTETAHSCNVNRFNVAITRAKVGILCIMSDRDLYDKLQFT
SLEIPRRNVATLQAENVTGLFKDCSKVITGLHPTQAPTHLSVDTKFKTEGLCVDIPGIPKDMTYRRLISMMGFKMNYQVNGYPNMFITREEAIRHVRAWIGFDVEGCHATREAVGTNLPLQLGFSTGVNLVAVPTGYVDTPNNTDFSRVSAKPPPGDQFKHLIPLMYKGLPWNVVRIKIVQMLSDTLKNLSDRVVFVLWAHGFELTSMKYFVKIGPERTCCLCDRRATCFSTASDTYACWHHSIGFDYVYNPFMIDVQQWGFTGNLQSNHDLYCQVHGNAHVASCDAIMTRCLAVHECFVKRVDWTIEYPIIGDELKINAACRKVQHMVVKAALLADKFPVLHDIGNPKAIKCVPQADVEWKFYDAQPCSDKAYKIEELFYSYATHSDKFTDGVCLFWNCNVDRYPANSIVCRFDTRVLSNLNLPGCDGGSLYVNKHAFHTPAFDKSAFVNLKQLPFFYYSDSPCESHGKQVVSDIDYVPLKSATCITRCNLGGAVCRHHANEYRLYLDAYNMMISAGFSLWVYKQFDTYNLWNTFTRLQSLENVAFNVVNKGHFDGQQGEVPVSIINNTVYTKVDGVDVELFENKTTLPVNVAFELWAKRNIKPVPEVKILNNLGVDIAANTVIWDYKRDAPAHISTIGVCSMTDIAKKPTETICAPLTVFFDGRVDGQVDLFRNARNGVLITEGSVKGLQPSVGPKQASLNGVTLIGEAVKTQFNYYKKVDGVVQQLPETYFTQSRNLQEFKPRSQMEIDFLELAMDEFIERYKLEGYAFEHIVYGDFSHSQLGGLHLLIGLAKRFKESPFELEDFIPMDSTVKNYFITDAQTGSSKCVCSVIDLLLDDFVEIIKSQDLSVVSKVVKVTIDYTEISFMLWCKDGHVETFYPKLQSSQAWQPGVAMPNLYKMQRMLLEKCDLQNYGDSATLPKGIMMNVAKYTQLCQYLNTLTLAVPYNMRVIHFGAGSDKGVAPGTAVLRQWLPTGTLLVDSDLNDFVSDADSTLIGDCATVHTANKWDLIISDMYDPKTKNVTKENDSKEGFFTYICGFIQQKLALGGSVAIKITEHSWNADLYKLMGHFAWWTAFVTNVNASSSEAFLIGCNYLGKPREQIDGYVMHANYIFWRNTNPIQLSSYSLFDMSKFPLKLRGTAVMSLKEGQINDMILSLLSKGRLIIRENNRVVISSDVLVNN*TNNVCFSCFIATSL*SVC*SYNQNSITPCIH*FFHTWCLLP*QSFQILSFTFNSGLVLTFLFQCYLVPCYTCLWDQWY*EV**PCPTI**WCLFCFH*EV*HNKRLDFWYYFRFEDPVPTYC**RY*CCY*SL*ISIL**SIFGCLLPQKQQKLDGK*VQSLF*CE*LHF*ICLSAFSYGP*RKTG*FQKS*GICV*EY*WLF*NIF*AHAY*FSA*SPSGFFGFRTIGRFANRY*HH*VSNFTCFT*KLFDSW*FFFRLDSWCCSLLCGLSST*DFSIKI**KWNHYRCCRLCT*PSLRNKVYVEILHCRKRNLSNF*L*SPTNRIYC*IS*YYKLVPFW*SF*RHQICICLCLEQEENQQLCC*LFCPI*FRIIFHF*VLWSVSY*IK*SLLY*CLCRFICN*R**SQTNRSRANWKDC*L*L*ITR*FYRLRYSLEF*QS*F*GWW*L*LPV*IV*EV*SQTF*ERYFN*NLSGR*HTL*WC*RF*LLLSFTIIWFPTH*WCWLPTIQSSSTFF*TSTCTSNCLWT*KVY*FG*KQMCQFQLQWFNRHRCSY*V*QKVSAFPTIWQRHC*HY*CCP*STDT*DS*HYTMFFWWCQCYNTRNKYF*PGCCSLSGC*LHRSPCCYSCRSTYSYLACLFYRF*CFSNTCRLFNRG*TCQQLI*V*HTHWCRYMR*LSDSD*FSSAGT*CS*SIHHCLHYVTWCRKFSCLL**LYCHTHKFYY*CYHRNSTSVYDQDISRLYNVHLW*FN*MQQSFVAIWQFLYTIKPCFNWNSC*TRQKHPRSFCTSQTNLQNTTN*RFWWF*FFTNITRSIKTK
QEVIY*RSTFQQSDTCRCWLHQTIW*LPW*YCC*RPHLCTKV*RPYCFATFAHR*NDCSIHFCTVSGYNHFWLDLWCRCCITNTICYANGL*V*WYWSYTECSL*EPKIDCQPI**CYWQNSRLTFFHSKCTWKTSRCGQPKCTSFKHAC*TT*LQFWCNFKCFK*YPFTS*QS*G*SAN**VDHRQTSKFADICDSTIN*SCRNQSFC*SCCY*NVRVCTWTIKKS*FLWKGLSSYVLPSVSTSWCSLLACDLCPCTRKELHNCSCHLS*WKSTLSS*RCLCFKWHTLVCNTKEFL*TTNHYYRQHICVW*L*CCNRNCQQHSL*SFAT*IRLIQGGVR*IF*ESYITRC*FR*HLWH*CFSCKHSKRN*PPQ*GCQEFK*ISHRSPRTWKV*AVYKMAMVHLARFYSWLDCHSNGDNYALLYDQLL*LSQGLLFLWILLQI**RRL*ASAQRSQITLHINELMDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFGWLIVGVALLAVFQSASKIITLKKRWQLALSKGVHFVCNLLLLFVTVYSHLLLVAAGLEAPFLYLYALVYFLQSINFVRIIMRLWLCWKCRSKNPLLYDANYFLCWHTNCYDYCIPYNSVTSSIVITSGDGTTSPISEHDYQIGGYTEKWESGVKDCVVLHSYFTSDYYQLYSTQLSTDTGVEHVTFFIYNKIVDEPEEHVQIHTIDGSSGVVNPVMEPIYDEPTTTTSVPL*AQADEYELMYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV*TN*ILY*FFCLEL*F*PWQIPTVLLPLKSLKSSLNNGT***VSYSLHGFVFYNLPMPTGIGFCI*LS*FSSGCYGQ*L*LVLCLLLFTE*IGSPVELLSQWLVL*A*CGSATSLLLSDCLRVRVPCGHSIQKLTFFSTCHSMALF*PDRF*KVNS*SEL*SFVDIFVLLDTI*DAVTSRTCLKKSLLLHHERFLITNWELRSV*QVTQVLLHTVATGLATIN*TQTIPVAVTILLCLYSK*QQMFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLTENKYSQLDEEQPMEID*TNMKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNSPFHPLADNKFALTCFSTQFAFACPDGVKHVYQLRARSVSPKLFIRQEEVQELYSPIFLIVAAIVFITLCFTLKRKTE*LNFH*LTSICAF*PFCYSLF*LCLLSFGSHLNCKIIMKLVTPKRT*NFLFS*ESSQL*LHFTKNVVYSHVLNINHM*LMTRVLFTSILNGILE*ELENQHL*LNCAWMRLVLNHPFSTSISVIIQFPVYLLQLIARNLNWVVL*CVVRSMKTF*SIMTFVLF*ISSKRTN*NV**WTPKSAKCTPHYVWWTLRFNWQ*PEWRTQWGAIKTTSAPRFTQ*YCVLVHRSHSTWQGRP*IPSRTRRSN*HQ*QSR*PNWLLPKSYQTNSWW*R*NERSQSKMVFLLPRNWARSWTSLWC*QRRHHMGCN*GSLEYTKRSHWHPQSC*QCCNRATTSSRNNIAKRLLRRREQRRQSSLFSFLIT*SQQFKKFNSRQQ*GNFSC*NGWQWR*CCSCFAAA*QIEPA*EQNVW*RPTTTRPNCH*EICC*GF*EASAKTYCH*SIQCNTSFRQTWSRTNPRKFWGPGTNQTRN*LQTLAANCTICPQRFSVLRNVAHWHGSHTFGNVVDLHRCHQIG*QRSKFQRSSHFAE*AY*RIQNIPTNRA*KGQKEEG**NSSLTAETEETANCDSSSCCRFG*FLQTIATIHEQC*LNSGLNSCRPHKADGLYKRFRFSVYDI*STLVQNEFS*LHSTSRCS*L*SHIAIFNQCVTLGRT*KSHHIFTEATRSTIECTVNNARESCLYGRALMCKINFSSAIPM*F**LLRRMTKKKKKKKKKK
In [114]:
from collections import Counter
In [115]:
Counter(ncov_protein).most_common(10)
Out[115]:
[('L', 886), ('S', 810), ('*', 774), ('T', 679), ('C', 635), ('F', 593), ('R', 558), ('V', 548), ('Y', 505), ('N', 472)]
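Note that '*' shows up in third place above. It is the stop-codon symbol, not an amino acid, so it is worth dropping before ranking residues. A minimal sketch on a toy string (toy_protein is a made-up example; on the real data you would pass str(ncov_protein)):

```python
from collections import Counter

# Toy translated sequence; '*' marks stop codons.
toy_protein = "LLS*TLS*L"

counts = Counter(toy_protein)
counts.pop("*", None)  # remove the stop symbol if present

# Total number of real residues, used to turn counts into fractions.
total = sum(counts.values())
top_two = counts.most_common(2)

for residue, n in top_two:
    print(residue, n, f"{n / total:.1%}")
```

Reporting fractions next to raw counts makes it easier to compare sequences of different lengths.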
3D Structure of Covid
- File Format
- pdb: PDBParser() (legacy format)
- cif: MMCIFParser() (more recent format)
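Since the parser depends on the file format, one option is to pick it from the file extension. The sketch below is only an illustration: the names `PARSER_BY_SUFFIX`, `parser_name_for` and `load_structure` are our own, and the mapping covers just the two formats listed above.

```python
from pathlib import Path

# Assumed mapping from file extension to the Bio.PDB parser class name
PARSER_BY_SUFFIX = {".pdb": "PDBParser", ".cif": "MMCIFParser"}


def parser_name_for(path):
    """Return the name of the Bio.PDB parser class matching the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in PARSER_BY_SUFFIX:
        raise ValueError(f"unsupported structure file extension: {suffix!r}")
    return PARSER_BY_SUFFIX[suffix]


def load_structure(structure_id, path):
    """Parse a structure file with the parser chosen by parser_name_for()."""
    import Bio.PDB  # imported lazily so the helper above works without Biopython

    parser_cls = getattr(Bio.PDB, parser_name_for(path))
    # QUIET=True suppresses parser warnings about irregularities in the file
    return parser_cls(QUIET=True).get_structure(structure_id, path)


print(parser_name_for("mmdb_6LU7.pdb"))
```

With Biopython installed and the file on disk, `load_structure("mmdb_6LU7", "mmdb_6LU7.pdb")` should return the same kind of `Structure` object as calling `PDBParser()` directly.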
Links
- https://www.ncbi.nlm.nih.gov/Structure/pdb/6LU7
- Protein Data Bank
Pkgs
- pip install nglview
- pip install py3Dmol
- pip install pytraj
- jupyter-nbextension enable nglview --py --sys-prefix
- nglview enable
- jupyter-labextension install @jupyter-widgets/jupyterlab-manager
- jupyter-labextension install nglview-js-widgets
In [116]:
from Bio.PDB import PDBParser,MMCIFParser
In [117]:
# Reading a PDB File
parser = PDBParser()
structure = parser.get_structure("mmdb_6LU7", "mmdb_6LU7.pdb")
In [118]:
structure
Out[118]:
<Structure id=mmdb_6LU7>
In [163]:
# Chains in the Protein Structure
model = structure[0]
In [165]:
for chain in model:
    print(f'chain {chain},chain_ID: {chain.id}')
chain <Chain id=A>,chain_ID: A
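Printing objects one by one gets long quickly, so it can help to summarise the hierarchy first. Here is a minimal sketch that counts residues and atoms per chain; the function names `summarize` and `main` are our own, and the code only assumes that iterating a structure yields models, a model yields chains, a chain yields residues and a residue yields atoms, as Bio.PDB objects do.

```python
def summarize(structure):
    """Map (model_id, chain_id) to (number of residues, number of atoms)."""
    summary = {}
    for model in structure:
        for chain in model:
            residues = list(chain)
            n_atoms = sum(len(list(residue)) for residue in residues)
            summary[(model.id, chain.id)] = (len(residues), n_atoms)
    return summary


def main(path="mmdb_6LU7.pdb"):
    """Print the summary for a PDB file (needs Biopython and the file on disk)."""
    from Bio.PDB import PDBParser  # imported here so summarize() has no hard dependency

    structure = PDBParser(QUIET=True).get_structure("mmdb_6LU7", path)
    for (model_id, chain_id), (n_res, n_atoms) in summarize(structure).items():
        print(f"model {model_id}, chain {chain_id}: {n_res} residues, {n_atoms} atoms")
```

Calling `main()` in the notebook should print one line per chain in each model instead of one line per atom.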
In [168]:
# Check the atoms
for model in structure:
    print(model)
    for chain in model:
        print(chain)
        for residue in chain:
            for atom in residue:
                print(atom)
<Model id=0> <Chain id=A> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom 
N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom NE1> <Atom CE2> <Atom CE3> <Atom CZ2> <Atom CZ3> <Atom CH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom ND1> <Atom CD2> <Atom CE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom 
O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom ND1> <Atom CD2> <Atom CE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> 
<Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom ND1> <Atom CD2> <Atom CE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> 
<Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> 
<Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> 
<Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom ND1> <Atom CD2> <Atom CE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom ND1> <Atom CD2> <Atom CE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom ND1> <Atom CD2> <Atom CE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> 
<Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom NE1> <Atom CE2> <Atom CE3> <Atom CZ2> <Atom CZ3> <Atom CH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom 
C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom NE1> <Atom CE2> <Atom CE3> <Atom CZ2> <Atom CZ3> <Atom CH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> 
<Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom ND1> <Atom CD2> <Atom CE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> 
<Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom 
CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Model id=1> <Chain id=C> <Atom C4> <Atom C5> <Atom C6> <Atom O1> <Atom N2> <Atom C3> <Atom C41> <Atom O42> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom C19> <Atom C20> <Atom C21> <Atom C22> <Atom C25> <Atom C26> <Atom C27> <Atom C28> <Atom N6> <Atom C29> <Atom O8> <Atom N5> <Atom O7> <Atom C> <Atom O> <Atom C1> <Atom C2> <Atom C3> <Atom C4> <Atom C5> <Atom C6> <Model id=2> <Chain id=A> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> 
<Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> 
<Atom CD1> <Atom CD2> <Atom NE1> <Atom CE2> <Atom CE3> <Atom CZ2> <Atom CZ3> <Atom CH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom ND1> <Atom CD2> <Atom CE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> 
<Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom ND1> <Atom CD2> <Atom CE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom ND1> <Atom CD2> <Atom CE1> <Atom NE2> <Atom N> <Atom 
CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> 
<Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom 
CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom 
CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom ND1> <Atom CD2> <Atom CE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom ND1> <Atom CD2> <Atom CE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom ND1> <Atom CD2> <Atom CE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> 
<Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom NE1> <Atom CE2> <Atom CE3> <Atom CZ2> <Atom CZ3> <Atom CH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> 
<Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom NE1> <Atom CE2> <Atom CE3> <Atom CZ2> <Atom CZ3> <Atom CH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> 
<Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom OH> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom ND1> <Atom CD2> <Atom CE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom 
CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom CE> <Atom NZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom SD> <Atom CE> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom ND2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom CD1> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom OE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> 
<Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom OD1> <Atom OD2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom NE> <Atom CZ> <Atom NH1> <Atom NH2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom SG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom OG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom CE1> <Atom CE2> <Atom CZ> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD> <Atom OE1> <Atom NE2> <Model id=3> <Chain id=C> <Atom C4> <Atom C5> <Atom C6> <Atom O1> <Atom N2> <Atom C3> <Atom C41> <Atom O42> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG1> <Atom CG2> <Atom N> <Atom CA> <Atom C> <Atom O> <Atom CB> <Atom CG> <Atom CD1> <Atom CD2> <Atom C19> <Atom C20> <Atom C21> <Atom C22> <Atom C25> <Atom C26> <Atom C27> <Atom C28> <Atom N6> <Atom C29> <Atom O8> <Atom N5> <Atom O7> <Atom C> <Atom O> <Atom C1> <Atom C2> <Atom C3> <Atom C4> <Atom C5> <Atom C6>
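Printing every atom gives a listing that is hard to read, so a tally of atom names can be a useful summary. The sketch below is only an illustration: `count_atom_names` and `DemoAtom` are hypothetical names that are not part of BioPython, and the function only assumes that each atom offers `get_name()`, as Bio.PDB atoms do.

```python
from collections import Counter


def count_atom_names(atoms):
    """Count how often each atom name occurs in an iterable of atoms.

    Each item only needs a get_name() method, as Bio.PDB atoms have.
    """
    return Counter(atom.get_name() for atom in atoms)


class DemoAtom:
    """Hypothetical stand-in for a Bio.PDB atom, used only for this demo."""

    def __init__(self, name):
        self.name = name

    def get_name(self):
        return self.name


# Two backbone units plus one side-chain carbon, as a tiny example
demo_atoms = [DemoAtom(n) for n in ("N", "CA", "C", "O", "CB", "N", "CA", "C", "O")]
demo_counts = count_atom_names(demo_atoms)
print(demo_counts.most_common())

# With the structure parsed earlier, the call would look like:
# count_atom_names(structure.get_atoms())
```

Because every standard residue has backbone atoms N, CA, C and O, the CA count gives a rough idea of how many residues the structure holds.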
Visualizing the 3D structure
- nglview
- py3Dmol
- pytraj
- squiggle
In [153]:
# View our 3D Structure
import nglview as nv
In [154]:
nv.demo()
NGLWidget()
In [155]:
view = nv.show_biopython(structure)
In [156]:
view
NGLWidget()
In [124]:
import py3Dmol
In [125]:
view1 = py3Dmol.view(query='pdb:6LU7')
In [126]:
view1.setStyle({'cartoon':{'color':'spectrum'}})
view1
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
Out[126]:
<py3Dmol.view at 0x7ff36e221a10>
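If the widget does not render in JupyterLab, one possible workaround is to write a small standalone HTML page and open it in a browser. The sketch below uses only the standard library and does not call py3Dmol. It assumes the 3Dmol.js embedding convention (a `viewer_3Dmoljs` div with `data-pdb` and `data-style` attributes) and the script URL shown, so check both against the 3Dmol.js documentation. `build_viewer_page` and `save_viewer_page` are hypothetical helper names.

```python
from html import escape


def build_viewer_page(pdb_id, style="cartoon:color=spectrum"):
    """Return an HTML page that asks 3Dmol.js to fetch and draw a PDB entry.

    Assumption: 3Dmol.js picks up any div with class 'viewer_3Dmoljs'
    and reads the data-pdb and data-style attributes.
    """
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{escape(pdb_id)}</title>
<script src="https://3Dmol.org/build/3Dmol-min.js"></script>
</head>
<body>
<div class="viewer_3Dmoljs"
     style="height: 500px; width: 700px; position: relative;"
     data-pdb="{escape(pdb_id)}" data-style="{escape(style)}"></div>
</body>
</html>
"""


def save_viewer_page(path, pdb_id):
    """Write the page to disk so it can be opened in a browser."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_viewer_page(pdb_id))


page = build_viewer_page("6LU7")
print(page)

# To save it, for example: save_viewer_page("6LU7_view.html", "6LU7")
```

The style string mirrors the `{'cartoon': {'color': 'spectrum'}}` setting used in the py3Dmol cell above.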
In [129]:
dir(py3Dmol.view)
Out[129]:
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_make_html', '_repr_html_', 'getModel', 'insert', 'model', 'png', 'show', 'update']
In [157]:
view.render_image()
Image(value=b'', width='99%')
In [158]:
view._display_image()
Out[158]:
In [147]:
import pytraj as pt
In [148]:
# Load file
ncov_traj = pt.load("mmdb_6LU7.pdb")
In [159]:
view3 = nv.show_pytraj(ncov_traj)
In [160]:
view3
NGLWidget(max_frame=1)
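Before relying on the widget, it can help to see how many models, chains and atoms the PDB file holds. The sketch below is a rough check using only the standard library. It assumes the fixed-column PDB layout (record name in columns 1 to 6, chain identifier in column 22), and `summarize_pdb_lines` and `make_line` are hypothetical helpers written for this example.

```python
from collections import Counter


def summarize_pdb_lines(lines):
    """Tally MODEL, ATOM and HETATM records and collect chain IDs."""
    records = Counter()
    chains = set()
    for line in lines:
        record = line[:6].strip()  # columns 1-6 hold the record name
        if not record:
            continue
        records[record] += 1
        # Column 22 (index 21) holds the chain ID on coordinate records
        if record in ("ATOM", "HETATM") and len(line) > 21:
            chains.add(line[21])
    return {
        "models": records["MODEL"],
        "atoms": records["ATOM"],
        "hetatms": records["HETATM"],
        "chains": sorted(chains),
    }


def make_line(record, chain):
    """Build a minimal demo line with the chain ID in column 22."""
    return record.ljust(21) + chain


demo_lines = [
    "MODEL        1",
    make_line("ATOM", "A"),
    make_line("ATOM", "A"),
    make_line("HETATM", "C"),
    "ENDMDL",
]
demo_summary = summarize_pdb_lines(demo_lines)
print(demo_summary)

# With the file used in this tutorial, the call would look like:
# with open("mmdb_6LU7.pdb") as handle:
#     print(summarize_pdb_lines(handle))
```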
In [161]:
view3.render_image()
Image(value=b'', width='99%')
In [162]:
view3._display_image()
Out[162]:
To conclude, we have seen how to use BioPython and how to analyse the coronavirus DNA sequence.
You can check out the video tutorials
Thanks For Your Attention
Jesus Saves
By Jesse E. Agbe (JCharis)