Protein Sequence Analysis of Covid19 using BioPython

In this tutorial we will analyse the genome sequence of the coronavirus SARS-CoV-2 (2019-nCoV, the virus behind Covid19) using a wonderful bioinformatics package called BioPython.

We start with a short introduction to BioPython and then move on to the protein analysis of the Covid19 genome.

Installation

pip install biopython

Applications of BioPython

  • Sequence analysis (DNA, RNA)
  • Transcription and translation of DNA (protein synthesis)
  • Querying and accessing bioinformatics databases
    • Entrez, BLAST, GenBank, etc.
  • 3D structure analysis

Task

  • Analysis of Covid19 genome

BioPython Crash Course

In [1]:
# Load the package
import Bio
In [3]:
# Check the Attributes
dir(Bio)
Out[3]:
['BiopythonDeprecationWarning',
 'BiopythonExperimentalWarning',
 'BiopythonParserWarning',
 'BiopythonWarning',
 'MissingExternalDependencyError',
 'MissingPythonDependencyError',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_parent_dir',
 'os',
 'warnings']

Sequence Analysis

  • DNA and RNA Sequence
    • A Adenine
    • C Cytosine
    • G Guanine
    • T Thymine
    • U Uracil (RNA only, in place of Thymine)
  • Protein Sequence Analysis
In [4]:
# Working with Sequence
from Bio.Seq import Seq
In [5]:
dir(Seq)
Out[5]:
['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__mul__',
 '__ne__',
 '__new__',
 '__radd__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_get_seq_str_and_check_alphabet',
 'back_transcribe',
 'complement',
 'count',
 'count_overlap',
 'encode',
 'endswith',
 'find',
 'index',
 'join',
 'lower',
 'lstrip',
 'reverse_complement',
 'rfind',
 'rindex',
 'rsplit',
 'rstrip',
 'split',
 'startswith',
 'strip',
 'tomutable',
 'transcribe',
 'translate',
 'ungap',
 'upper']
In [6]:
# Create a General DNA sequence
mydna = Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA')
In [7]:
mydna
Out[7]:
Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGT...AAA')
In [8]:
mydna.alphabet
Out[8]:
Alphabet()
In [9]:
# Convert Sequence to String
# Method 1
str(mydna)
Out[9]:
'ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA'
In [10]:
type(mydna)
Out[10]:
Bio.Seq.Seq

Alphabet Types

  • generic_dna / generic_rna
  • generic_protein
  • IUPACUnambiguousDNA, which provides just the basic letters
  • IUPACAmbiguousDNA, which also provides ambiguity letters for every possible situation

Usefulness of Specifying the Type of Sequence or Alphabet

  • Tells us what type of information the Seq object contains.
  • Acts as a means of constraining the information.
  • Serves as a means of type checking.
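
The type-checking idea can be illustrated without Biopython at all. The following is only a plain-Python sketch of the concept, not Biopython's own implementation; the names DNA_LETTERS, RNA_LETTERS, invalid_letters and is_valid are made up for this example.

```python
# Plain-Python sketch of alphabet-style checking (not Biopython's own code).
# The set and function names below are hypothetical, chosen for illustration.
DNA_LETTERS = set("ACGT")
RNA_LETTERS = set("ACGU")


def invalid_letters(sequence, allowed):
    """Return the sorted letters of `sequence` that are not in `allowed`."""
    return sorted(set(sequence.upper()) - allowed)


def is_valid(sequence, allowed):
    """Return True when every letter of `sequence` belongs to `allowed`."""
    return not invalid_letters(sequence, allowed)


print(is_valid("ATTAAAGGTT", DNA_LETTERS))          # a DNA fragment checked as DNA
print(is_valid("AGGCUCUCGUA", DNA_LETTERS))         # an RNA fragment checked as DNA
print(invalid_letters("AGGCUCUCGUA", DNA_LETTERS))  # which letters break the rule
```

A check like this catches, for example, an RNA string that was accidentally passed where DNA was expected.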
In [11]:
# Create a specific sequence type (DNA, RNA, protein)
# Note: Bio.Alphabet was removed in Biopython 1.78, so this import needs an older release
from Bio.Alphabet import generic_dna,generic_rna,generic_protein
In [12]:
# Create a DNA
dna1 = Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA',generic_dna)
In [13]:
# Check the Type of Sequence
dna1.alphabet
Out[13]:
DNAAlphabet()
In [14]:
# Create a RNA
rna1 = Seq('AGGCUCUCGUA',generic_rna)
In [15]:
rna1.alphabet
Out[15]:
RNAAlphabet()
In [16]:
# Method 2 Using IUPAC
from Bio.Alphabet import IUPAC
In [17]:
dna2 = Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA',IUPAC.unambiguous_dna)
In [18]:
dna2.alphabet
Out[18]:
IUPACUnambiguousDNA()

Sequence Manipulation

  • Indexing/Slicing
  • Join 2 Sequences
  • Find a Codon in a sequence
  • Count the number of Nucleotides
In [19]:
dna_seq = Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA',generic_dna)
In [20]:
# Slicing
dna_seq[0:3]
Out[20]:
Seq('ATT', DNAAlphabet())
In [21]:
# Adding Sequence
dna_seq2 = Seq('AGCGCTTCGAGA',generic_dna)
In [22]:
dna_seq[0:3] + dna_seq2[4:]
Out[22]:
Seq('ATTCTTCGAGA', DNAAlphabet())
In [23]:
# Count the number of G nucleotides in the sequence
dna_seq.count('G')
Out[23]:
8
In [24]:
# Count how many times the codon GGT occurs in the sequence
dna_seq.count('GGT')
Out[24]:
2
In [25]:
# Find the index/position of the first G nucleotide in the sequence
dna_seq.find('G')
Out[25]:
6
In [71]:
# Count occurrences of GGT, including matches that overlap each other
dna_seq.count_overlap('GGT')
Out[71]:
2
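
Both counts are 2 for GGT because this motif cannot overlap itself in our sequence. The difference only shows up with repetitive motifs. The sketch below uses plain Python strings to illustrate the two counting rules; the helper name count_overlapping is hypothetical and this is not Biopython's implementation.

```python
def count_overlapping(sequence, sub):
    """Count occurrences of `sub` in `sequence`, allowing overlapping matches."""
    count = 0
    start = sequence.find(sub)
    while start != -1:
        count += 1
        # Restart the search one position later so overlapping hits are seen.
        start = sequence.find(sub, start + 1)
    return count


dna = "ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA"

# str.count skips past each match, so it only sees non-overlapping hits.
print(dna.count("GGT"), count_overlapping(dna, "GGT"))
print("AAAA".count("AA"), count_overlapping("AAAA", "AA"))
```

For "AAAA" the non-overlapping count of "AA" is 2 while the overlapping count is 3.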
In [72]:
seq1 = Seq('ATGATCTCGTAA')
In [73]:
# Complement
seq1.complement()
Out[73]:
Seq('TACTAGAGCATT')
In [77]:
# Reverse complement (the complement read in reverse order)
seq1.reverse_complement()
Out[77]:
Seq('TTACGAGATCAT')
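
To see what these two methods do, here is a plain-Python sketch of base pairing for unambiguous DNA letters only. It is an illustration under that assumption, not Biopython's code, and the function names are made up.

```python
# Watson-Crick pairing for the four unambiguous DNA letters: A<->T and C<->G.
PAIRING = str.maketrans("ACGT", "TGCA")


def complement(dna):
    """Swap every base for its pairing partner."""
    return dna.translate(PAIRING)


def reverse_complement(dna):
    """Complement the sequence, then read it backwards."""
    return complement(dna)[::-1]


seq1 = "ATGATCTCGTAA"
print(complement(seq1))          # TACTAGAGCATT
print(reverse_complement(seq1))  # TTACGAGATCAT
```

The printed strings match the Seq outputs above.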
In [74]:
# Transcribe DNA to mRNA
seq1.transcribe()
Out[74]:
Seq('AUGAUCUCGUAA', RNAAlphabet())
In [76]:
seq1.transcribe().translate()
Out[76]:
Seq('MIS*', HasStopCodon(ExtendedIUPACProtein(), '*'))
In [75]:
# Translate DNA directly to protein
seq1.translate()
Out[75]:
Seq('MIS*', HasStopCodon(ExtendedIUPACProtein(), '*'))
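
Translation reads the sequence three letters at a time and looks each codon up in a codon table. A minimal plain-Python sketch of that lookup is shown below, assuming the standard genetic code and unambiguous DNA letters. The names CODON_TO_AA and translate_dna are hypothetical and this is not how Biopython implements translate().

```python
from itertools import product

# Standard genetic code, listed in the usual T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_dna(dna, stop_symbol="*"):
    """Translate complete codons; a trailing partial codon is ignored."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TO_AA[dna[i:i + 3]]
        protein.append(stop_symbol if aa == "*" else aa)
    return "".join(protein)


print(translate_dna("ATGATCTCGTAA"))  # MIS*
```

ATG, ATC, TCG and TAA map to M, I, S and a stop, which is the MIS* seen above.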

Protein Synthesis


In [26]:
# Transcription
# DNA to mRNA
dna_seq
Out[26]:
Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGT...AAA', DNAAlphabet())
In [27]:
mrna = dna_seq.transcribe()
In [28]:
# Transcription changes Thymine (T) to Uracil (U)
mrna
Out[28]:
Seq('AUUAAAGGUUUAUACCUUCCCAGGUAACAAACCAACCAACUUUCGAUCUCUUGU...AAA', RNAAlphabet())
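
Since the DNA here is treated as the coding strand, transcription amounts to replacing T with U, and back-transcription is the reverse replacement. The sketch below shows that idea on plain strings; the function names are made up for illustration.

```python
def transcribe(dna):
    """Coding-strand DNA to mRNA: every T becomes U."""
    return dna.replace("T", "U")


def back_transcribe(rna):
    """mRNA back to coding-strand DNA: every U becomes T."""
    return rna.replace("U", "T")


dna = "ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA"
mrna = transcribe(dna)

print(mrna[:20])                     # first 20 letters of the mRNA
print(back_transcribe(mrna) == dna)  # the round trip restores the DNA
```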
In [29]:
# Translation (mRNA to protein)
# translate() also accepts DNA directly, so here we go from DNA to protein
dna_seq.translate()
/usr/local/lib/python3.7/dist-packages/Bio/Seq.py:2859: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  BiopythonWarning,
Out[29]:
Seq('IKGLYLPR*QTNQLSISCRSVL*', HasStopCodon(ExtendedIUPACProtein(), '*'))
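
The warning appears because our sequence has 70 letters, which is not a multiple of three. One way to follow the warning's advice is to trim the sequence to whole codons first. This is a small sketch on a plain string (the helper name trim_to_codons is hypothetical); the same slicing should work on a Seq object before calling translate().

```python
def trim_to_codons(sequence):
    """Drop trailing letters so the length becomes a multiple of three."""
    usable = len(sequence) - len(sequence) % 3
    return sequence[:usable]


dna = "ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA"
trimmed = trim_to_codons(dna)

print(len(dna), len(dna) % 3)   # 70 letters, 1 left over
print(len(trimmed))             # 69 letters, i.e. 23 complete codons
```

The 23 complete codons correspond to the 23 symbols in the translated output above.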
In [30]:
dir(mrna)
Out[30]:
['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__mul__',
 '__ne__',
 '__new__',
 '__radd__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_data',
 '_get_seq_str_and_check_alphabet',
 'alphabet',
 'back_transcribe',
 'complement',
 'count',
 'count_overlap',
 'encode',
 'endswith',
 'find',
 'index',
 'join',
 'lower',
 'lstrip',
 'reverse_complement',
 'rfind',
 'rindex',
 'rsplit',
 'rstrip',
 'split',
 'startswith',
 'strip',
 'tomutable',
 'transcribe',
 'translate',
 'ungap',
 'upper']
In [31]:
# Translate mRNA to Protein/Amino Acid
mrna.translate()
Out[31]:
Seq('IKGLYLPR*QTNQLSISCRSVL*', HasStopCodon(ExtendedIUPACProtein(), '*'))
In [70]:
# Translate mRNA to Protein/Amino Acid
# Change the symbol for the stop codon
mrna.translate(stop_symbol='@')
Out[70]:
Seq('IKGLYLPR@QTNQLSISCRSVL@', HasStopCodon(ExtendedIUPACProtein(), '@'))
In [32]:
# Back Transcribe mRNA to DNA
mrna.back_transcribe()
Out[32]:
Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGT...AAA', DNAAlphabet())

In [33]:
# View the CodonTable
from Bio.Data import CodonTable
In [35]:
dir(CodonTable)
Out[35]:
['Alphabet',
 'AmbiguousCodonTable',
 'AmbiguousForwardTable',
 'CodonTable',
 'IUPAC',
 'IUPACData',
 'NCBICodonTable',
 'NCBICodonTableDNA',
 'NCBICodonTableRNA',
 'TranslationError',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'ambiguous_dna_by_id',
 'ambiguous_dna_by_name',
 'ambiguous_generic_by_id',
 'ambiguous_generic_by_name',
 'ambiguous_rna_by_id',
 'ambiguous_rna_by_name',
 'generic_by_id',
 'generic_by_name',
 'list_ambiguous_codons',
 'list_possible_proteins',
 'make_back_table',
 'register_ncbi_table',
 'standard_dna_table',
 'standard_rna_table',
 'unambiguous_dna_by_id',
 'unambiguous_dna_by_name',
 'unambiguous_rna_by_id',
 'unambiguous_rna_by_name']
In [36]:
# CodonTable for DNA
print(CodonTable.unambiguous_dna_by_name['Standard'])
Table 1 Standard, SGC0

  |  T      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
T | TTT F   | TCT S   | TAT Y   | TGT C   | T
T | TTC F   | TCC S   | TAC Y   | TGC C   | C
T | TTA L   | TCA S   | TAA Stop| TGA Stop| A
T | TTG L(s)| TCG S   | TAG Stop| TGG W   | G
--+---------+---------+---------+---------+--
C | CTT L   | CCT P   | CAT H   | CGT R   | T
C | CTC L   | CCC P   | CAC H   | CGC R   | C
C | CTA L   | CCA P   | CAA Q   | CGA R   | A
C | CTG L(s)| CCG P   | CAG Q   | CGG R   | G
--+---------+---------+---------+---------+--
A | ATT I   | ACT T   | AAT N   | AGT S   | T
A | ATC I   | ACC T   | AAC N   | AGC S   | C
A | ATA I   | ACA T   | AAA K   | AGA R   | A
A | ATG M(s)| ACG T   | AAG K   | AGG R   | G
--+---------+---------+---------+---------+--
G | GTT V   | GCT A   | GAT D   | GGT G   | T
G | GTC V   | GCC A   | GAC D   | GGC G   | C
G | GTA V   | GCA A   | GAA E   | GGA G   | A
G | GTG V   | GCG A   | GAG E   | GGG G   | G
--+---------+---------+---------+---------+--
In [37]:
# CodonTable for RNA
print(CodonTable.unambiguous_rna_by_name['Standard'])
Table 1 Standard, SGC0

  |  U      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
U | UUU F   | UCU S   | UAU Y   | UGU C   | U
U | UUC F   | UCC S   | UAC Y   | UGC C   | C
U | UUA L   | UCA S   | UAA Stop| UGA Stop| A
U | UUG L(s)| UCG S   | UAG Stop| UGG W   | G
--+---------+---------+---------+---------+--
C | CUU L   | CCU P   | CAU H   | CGU R   | U
C | CUC L   | CCC P   | CAC H   | CGC R   | C
C | CUA L   | CCA P   | CAA Q   | CGA R   | A
C | CUG L(s)| CCG P   | CAG Q   | CGG R   | G
--+---------+---------+---------+---------+--
A | AUU I   | ACU T   | AAU N   | AGU S   | U
A | AUC I   | ACC T   | AAC N   | AGC S   | C
A | AUA I   | ACA T   | AAA K   | AGA R   | A
A | AUG M(s)| ACG T   | AAG K   | AGG R   | G
--+---------+---------+---------+---------+--
G | GUU V   | GCU A   | GAU D   | GGU G   | U
G | GUC V   | GCC A   | GAC D   | GGC G   | C
G | GUA V   | GCA A   | GAA E   | GGA G   | A
G | GUG V   | GCG A   | GAG E   | GGG G   | G
--+---------+---------+---------+---------+--
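
The table shows that several codons can code for the same amino acid. As a rough illustration, the sketch below rebuilds the standard DNA table in plain Python and inverts it. The variable names are made up, and in practice you would read the table from CodonTable instead.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in the same T, C, A, G order as the printed table.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Invert the codon -> amino acid mapping ('*' marks a stop codon).
aa_to_codons = defaultdict(list)
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    aa_to_codons[aa].append("".join(codon))

print(aa_to_codons["L"])  # Leucine is encoded by six codons
print(aa_to_codons["M"])  # Methionine has a single codon, ATG
print(aa_to_codons["*"])  # the three stop codons
```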

Analysing Covid 19

You can find the sequence in several formats, such as GenBank and FASTA. To get the file, search the NCBI database for the genome record (the accession used in the cells below is MN908947.3) and download it in FASTA format.
In [47]:
from Bio import SeqIO
In [51]:
# Load the file
for record in SeqIO.parse("Covid_sequence.fasta", "fasta"):
    print(record.id)
    print(record.name)
    print(record.description)
MN908947.3
MN908947.3
MN908947.3 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
In [52]:
# Load the file
for record in SeqIO.parse("Covid_sequence.fasta", "fasta"):
    print(record)
ID: MN908947.3
Name: MN908947.3
Description: MN908947.3 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
Number of features: 0
Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGT...AAA', SingleLetterAlphabet())
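
A FASTA file is plain text: a header line starting with ">" followed by one or more lines of sequence. To show what SeqIO.parse extracts, here is a minimal plain-Python reader. It is a simplified sketch, not Biopython's parser, and the two records in it are made-up toy data rather than the real genome file.

```python
import io


def _build(header, chunks):
    """Turn a header line and its sequence lines into (id, description, sequence)."""
    parts = header.split(None, 1)
    identifier = parts[0] if parts else ""
    return identifier, header, "".join(chunks)


def parse_fasta(handle):
    """Yield (id, description, sequence) for each record in a FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _build(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield _build(header, chunks)


# Toy input (hypothetical records, not the Covid19 file).
toy_file = io.StringIO(
    ">toy1 first made-up record\n"
    "ATTAAAGG\n"
    "TTTATACC\n"
    ">toy2 second made-up record\n"
    "ATGATCTCGTAA\n"
)
records = list(parse_fasta(toy_file))
for identifier, description, sequence in records:
    print(identifier, len(sequence), description)
```

As in the output above, the description keeps the whole header line, so it starts with the record id.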
In [57]:
# Read the sequence record in the file
ncov_dna_record = SeqIO.read("Covid_sequence.fasta","fasta")
In [61]:
type(ncov_dna_record)
Out[61]:
Bio.SeqRecord.SeqRecord
In [62]:
ncov_dna_record
Out[62]:
SeqRecord(seq=Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGT...AAA', SingleLetterAlphabet()), id='MN908947.3', name='MN908947.3', description='MN908947.3 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome', dbxrefs=[])
In [63]:
ncov_dna = ncov_dna_record.seq
In [64]:
# Display the Nucleotides
ncov_dna
Out[64]:
Seq('ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGT...AAA', SingleLetterAlphabet())
In [65]:
# Length of our sequence
len(ncov_dna)
Out[65]:
29903
In [66]:
# Transcribe (DNA to mRNA)
ncov_mRNA = ncov_dna.transcribe()
In [67]:
# Changes Thymine to Uracil
ncov_mRNA
Out[67]:
Seq('AUUAAAGGUUUAUACCUUCCCAGGUAACAAACCAACCAACUUUCGAUCUCUUGU...AAA', RNAAlphabet())
In [68]:
# Translate to Protein/Amino Acids (mRNA to AA)
ncov_protein = ncov_mRNA.translate()
In [69]:
ncov_protein
Out[69]:
Seq('IKGLYLPR*QTNQLSISCRSVL*TNFKICVAVTRLHA*CTHAV*LITNYCR*QD...KKK', HasStopCodon(ExtendedIUPACProtein(), '*'))
In [78]:
# Length of Protein/Amino Acids
len(ncov_protein)
Out[78]:
9967
In [80]:
# Sanity check: a codon is 3 nucleotides, so the protein length should be the sequence length divided by 3 (rounded down)
len(ncov_mRNA)/3
Out[80]:
9967.666666666666
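
The fractional part means the genome length is not a multiple of three. The same check can be written with divmod, which gives the number of complete codons and the leftover nucleotides in one step:

```python
genome_length = 29903  # len(ncov_dna) from the cell above

# divmod returns (quotient, remainder) for integer division.
full_codons, leftover = divmod(genome_length, 3)

print(full_codons)  # complete codons, matching len(ncov_protein)
print(leftover)     # nucleotides left over at the end of the sequence
```

So 9967 codons are translated and the last 2 nucleotides are left as a partial codon.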
In [81]:
# Split the translation at every stop codon ('*') to get the candidate amino acid chains
ncov_amino_acids = ncov_protein.split('*')
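
Splitting at '*' works just like splitting a Python string, which also explains the empty entries in the result: two stop codons in a row leave nothing between them. The sketch below uses the visible start of the translated sequence as a plain string. The helper split_at_stops and its min_length cut-off are assumptions for illustration, not part of the original analysis.

```python
def split_at_stops(protein, min_length=1):
    """Split at '*' and keep only the chains with at least `min_length` residues."""
    return [chain for chain in protein.split("*") if len(chain) >= min_length]


# Visible start of the translated genome shown earlier, as a plain string.
prefix = "IKGLYLPR*QTNQLSISCRSVL*TNFKICVAVTRLHA*CTHAV*LITNYCR"

print(prefix.split("*"))       # every chain between stop codons
print("ER**RSWWP".split("*"))  # consecutive stops give an empty chain

# Keep only chains of 10 or more residues (arbitrary example threshold).
chains = split_at_stops(prefix, min_length=10)
longest = max(chains, key=len)
print(chains, longest)
```

Filtering by length like this is one simple way to drop the very short fragments before looking for the longer chains.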
In [82]:
ncov_amino_acids
Out[82]:
[Seq('IKGLYLPR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QTNQLSISCRSVL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TNFKICVAVTRLHA', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CTHAV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LITNYCR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QDTSNSSIFCRLLTVSSVLQPIISTSRFRPGVTER', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('DGEPCPWFQRENTRPTQFACFTGSRRARTWLWRLRGGGLIRGTSTS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RWHLWLSRS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KRRFAST', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TALCVHQTFGCSNCTSWSCYG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('AGSRTRRHSVRS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('W', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('DTWCPCPSCGRNTSGLPQGSSS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ER', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RSWWP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LRRRSKVI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LRRRAWH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RFSRKLEH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('T', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QWCYP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('THA', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('A', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RRGIHSLCR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QLLWP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WLPS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RPSSTCW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SFMHFVRTTGLY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('H', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('EGCILLP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('T', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('A', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NCLVHGTF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KEL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IADTF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('N', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IGKEI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HLQWGMSKFCISLKFHNQDYSTKG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KEKA', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WLYG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NSICLSSCVTK', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('MQPNVPFNSHEV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SLW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NFMADGRFC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SHLRILWH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('EFD', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RRCHYLWLLTPKCCC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NLLSSMSQFRSRT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('A', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SCRIP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IWLENHSS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GWSHYCLWRLCVLLCWLP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QVCLLGSTC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('R', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HRL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PYRCCWRRFRRS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QPS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NTPKRESQHQYCW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('T', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RDRHYFGIFFCFHKCFCGNCERFGL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SIQTNC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ILW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SYKRKS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KRCLEYW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TEINTESSLCICIRGCSCCTINFLPHS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NCSKFCACFTEGRYNNTRWNFTVFTETH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CYDVHI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FGY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QSSCNGLHYRWCCSVDFAVAN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HLWHCL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KTQTRP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LA', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('REV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GRCRVS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RRLGNC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IYLNLCL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NCRWTNCHLCKGN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GECSDIL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ACK', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IFGFVC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LYHYWWS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('T', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SLEFR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NICHALKGIVQKVC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IQRRNWPTHASKSPKRNYLLRGRNTSHRSVNRGSCLENW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FTTIRTTY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SSIGWYTSLY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RAYVARNQRHRKVLCPCT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YDGNKQYLHTQRRCTNKGYFW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HCDRSARLQECEYHF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('T', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KD', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ST', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('EVLCLYS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TRYRSK', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VRLCCGRCCHKNFATSI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ITYTTGH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VEYGYILLI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('V', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IGFTYVLFFLPSR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('G', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RRR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RRRV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('AINSI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VWY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('R', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LPR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TFGIWCHFCCSST', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RRARRRLVR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('STNCWSTRRQ', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GQSDNYYSNNC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GSTSIRDGTYTSCSDY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SE', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WLFKTY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QCIH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KCRHCGRS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KGKTNSGC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CSQCLP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TWRRCCRSLK', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QCHAS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('I', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LHSY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WTT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SGW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LCFKRTQSC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TLSSCCRPKC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RHSTS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ECL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SARSSTCTIIISWYFWC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PYTFFKSLCRYCSHKCLLSCL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KSL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QTCFKLFGNEE', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KAS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TKDR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('DS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RGS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('AIYN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('K', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TFS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TEKTR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ENQSLC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RSYNNSGRN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VPHRKLVTLY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('H', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WQSSSRFCHSC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('H', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HHFLKERCSIYSG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CCSRGCFNCCGYTY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KGWWHY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NASESFEKSANRQLYNHLPGSGFKWLHCRGGKDSA', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KCLLHSTIYYL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('EARNSWNCFLEFARNACTCRRNTQINACLCGN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SHSFNYTA', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('I', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NTRGCG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LWC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ILLLHQ', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NNCSVTYQHT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RSK', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NSCYNATWLCNTWLKFGRSCSVYEISQSASYSFCFFT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CCYSV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WLSYFFF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RTFY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NHLTCWFL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RLVLFWTIYTTRYRIS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ER', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KCILH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SYHIPPRW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SYHL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('DTSFFERSEDY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GVYNSRQH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PPHASCGHVNDIWTTVWSNLFGWS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NKTS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('R', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NILCFT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HSTC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VLPHN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('S', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FSG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VHVSIKSH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KVEIPTS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WFNFY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('MGR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QLLSCHCIVNTPTNRVEV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('STCSTRCLLQSKGW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LLCTYLSLL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('DSR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('C', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RNNELLVSTCQFRFLQKSLERGV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NLWTTADNP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GCRSCYVHGHTFL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ERCSDTLYVW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TSYKISSTTGVTFCYDVSTTCSV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('T', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('AWYIYLC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VHW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LPVWSL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TYNF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RNFVLHRRCFTYKVLRIQRSYYGCFLQRKQLHNNHKTSYL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IGWCCLYRN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('P', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VGQLL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ERQFLFHRATN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SCTKPTISKRKLR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VCM', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YQIC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FKPVNWL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ETCFKRA', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SYIFP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LKW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CGGY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TLHTLF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ERS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IVT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TYCLAC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QCN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SHV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TKYLVYTLSLEHKTS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NIKFV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CTEVRGRAGNG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SCLRRSKTSL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RSSGKSYHTERRS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('V', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CENYRSCRRHYT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TSK', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FKNYRRGWPHRSNGCLCRQF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SYY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ET', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('II', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SIRFENPCYSWFSCC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CPLGYYS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('AFS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QSC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YNY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HSYTVFKPCLY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LYALFLYFIATIVYFY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KYKF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('N', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SIYADYYSKEYC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ECR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ILSRGFI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LFEVT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TDKYYNLVFTIKCLPRFFNLLNRCFRCFNV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FRHAFLLYWLQRRLFELY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CHYCNLLYWFYTL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CLS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WFRFFRHLSFFRNYTNYHFIF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('MGFNCFWLSCRVVFGIYSFH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VFLCTWIGCNHAIVFQLFCSTFY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FLAYVVNN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SCTNGPDFSYG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NVHLLCIILLCMEKLCACCRRL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FINLYDVLQT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SNKSRMYNYC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KVLLCLC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RLLQTTQLELC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YILCW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YIY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SCERLVTTV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KTNKSY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PVFLHR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CYSEEWFHPSLL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SWSKDL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KTFSLSFC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LRQPES', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('H', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RFIAY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CYSF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('W', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IKM', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RIICKISVCLLQSAYVSTYTVTRSGISV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CGSCS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CLR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YVFINF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RTNGKTQNTSCNCRS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TCKECVLRQCLIYFYFSSSARVC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FRCRN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RCC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('MS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IVTSI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HRSYWR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LYAHL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KHDTP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PWCLY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CASY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CAGSKKSQHCFDMER', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RFHVIV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TTTKTNT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KE', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LTF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VDMCNY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TSC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CCNNKDST', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LVEAVN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SYTCVPFCCCYFLFNNTCSCHV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LFK', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NHRIQGY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WWCHS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HSIYRYLFC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QTC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HMV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PAWW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QSLPIDCCSHNKRSGFCRAWFAWHDITHN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('W', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LFAFLT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CSW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HLLHTIKTYRVH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LCNISLCFGC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('MYNF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RCFW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ASTILL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YQCTRRFCCL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KFTP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HTLCAHGWLYYSIS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HLP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RFC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SGNNF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ARHL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KIRSWCLCIY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('W', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('MGT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('Q', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LLQIFTRSFLWCRCCKFTY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YVYTTNSTYWCFGHISIYSSWWYCSYRSNMPCLLFYEV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KSFW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IQSCSCL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YFTIPYVIHCTLFNTSLLILTWCLFCYLLVLDILSY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CFFFSTYSVDGYVHTFSTFLDNNCLYHLYFHKAFLLVL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LPKETCSL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WCFL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RSCAVHLFVK', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RNVSKVA', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CAITSYAI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ILSSL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VQVF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WSNGYN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LQRSCLLSSRKGSQ', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LQ', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LRF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CSLPTTTNLYHLSCFAEWF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KNGIPIW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('S', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GLYGTSNLWYNYT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RSLA', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RSLLSKTCDLHL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RHA', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('P', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RFTHS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('V', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('S', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FLGTGW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CSTQGYWTFYAKLCT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('A', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('G', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YSQS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('DT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('V', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VCSHSTRTDFFSVSLLQWFTIWCLPMCYEAQFHY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GFIP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WFMW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CWF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HRL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LCLFLLHAPYGITNWSSCWHRLRR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LLWTFC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QANSTSSWYGHNYYS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CFSLVVRCCYKWRQVVSQSIYHNS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PCGYEVQL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TSNTRPC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HTRTSFCSNWNCRFRYVCFIKRITAKWYEWTYHIG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CFIRR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IYTF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TMLRCYFPKCSEKNNQGYTPLVVTHNFDFTFSFSPEYSMVFVLFFV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KCLFTFCYGYYCYVCFCNDVCQT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ACISLFVFVTFSCHCSLF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YGLYAC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LGDAYYDMVGYG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('Y', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FVWF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('AKRLCYVCISCSVTNPYDSKNCV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ESVDTYECLDTRL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SLLW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CFRSSHFHVGSYNLCYF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LLRCSYNCHVFGQRYCFYVC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VLPYFLHNW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YTSVYNASLLFLRLFLYLLLWPLLFTQPLL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TDSWCL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LLSFYTGV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IYEFTGTTPTQE', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HRCLQTQH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IVGCWWQTLYQSSHCTV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NVRCKVHISSLTLSFATTQSRIII', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IVGSMCPVTQ', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HSLS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RYY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KNGFTTFCFAFHAGCCRHKQAL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RNAGQQGNLTSYSLRV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FPSIICSFCYCSRSL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('AGCC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('W', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SCS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KVEEVFECG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('I', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('I', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('P', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CSHAT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VGKDG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SSYDPNV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('I', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GQEGKSY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CYADNAFHYA', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KVG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CTQQHYQQCKRWLCSLEHNTSYNSSQTNGCHTRL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KYV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WYNIYLCISIVGNPTGCRCR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NCST', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('N', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YGQFT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FSMASYCNSFKGQFCCQITE', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('A', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SCCTTTDVLCCRYYTNCLH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QCVSLLQHNKGR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VCTCTVIRFTGFEMG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('E', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WNWYYLYRTGTTL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VCYRHT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SEVFILY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RIKQPK', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RYGTW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FSCHSTSTSW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CNRSACQFNCIIFLCFCCRCC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SLQRLSS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WGTTNH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('DVVYTHWYWSGNNSYTGSQYGSRILWWCIVLSVLPLPHRSSKS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RIL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LKR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VCTNTYNLC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PCGFYT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KHSLYRLRYVERLWL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('STPRTHASVS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CTIVFKRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQ...VNN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TNNVCFSCFIATSL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SVC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SYNQNSITPCIH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FFHTWCLLP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QSFQILSFTFNSGLVLTFLFQCYLVPCYTCLWDQWY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('EV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PCPTI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WCLFCFH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('EV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HNKRLDFWYYFRFEDPVPTYC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CCY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ISIL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SIFGCLLPQKQQKLDGK', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VQSLF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CE', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LHF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ICLSAFSYGP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RKTG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FQKS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GICV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('EY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WLF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NIF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('AHAY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FSA', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SPSGFFGFRTIGRFANRY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VSNFTCFT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KLFDSW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FFFRLDSWCCSLLCGLSST', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('DFSIKI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KWNHYRCCRLCT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PSLRNKVYVEILHCRKRNLSNF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SPTNRIYC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YYKLVPFW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RHQICICLCLEQEENQQLCC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LFCPI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FRIIFHF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VLWSVSY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IK', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SLLY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CLCRFICN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('R', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SQTNRSRANWKDC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ITR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FYRLRYSLEF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GWW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LPV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('EV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SQTF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ERYFN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NLSGR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HTL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LLLSFTIIWFPTH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WCWLPTIQSSSTFF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TSTCTSNCLWT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KVY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KQMCQFQLQWFNRHRCSY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('V', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QKVSAFPTIWQRHC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CCP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('STDT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('DS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HYTMFFWWCQCYNTRNKYF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PGCCSLSGC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LHRSPCCYSCRSTYSYLACLFYRF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CFSNTCRLFNRG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TCQQLI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('V', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HTHWCRYMR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LSDSD', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FSSAGT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SIHHCLHYVTWCRKFSCLL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LYCHTHKFYY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CYHRNSTSVYDQDISRLYNVHLW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('MQQSFVAIWQFLYTIKPCFNWNSC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TRQKHPRSFCTSQTNLQNTTN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RFWWF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FFTNITRSIKTKQEVIY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RSTFQQSDTCRCWLHQTIW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LPW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YCC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RPHLCTKV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RPYCFATFAHR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NDCSIHFCTVSGYNHFWLDLWCRCCITNTICYANGL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('V', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WYWSYTECSL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('EPKIDCQPI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CYWQNSRLTFFHSKCTWKTSRCGQPKCTSFKHAC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LQFWCNFKCFK', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YPFTS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('G', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SAN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VDHRQTSKFADICDSTIN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SCRNQSFC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SCCY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NVRVCTWTIKKS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FLWKGLSSYVLPSVSTSWCSLLACDLCPCTRKELHNCSCHLS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WKSTLSS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RCLCFKWHTLVCNTKEFL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TTNHYYRQHICVW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CCNRNCQQHSL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SFAT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IRLIQGGVR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ESYITRC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HLWH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CFSCKHSKRN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PPQ', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GCQEFK', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ISHRSPRTWKV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('AVYKMAMVHLARFYSWLDCHSNGDNYALLYDQLL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LSQGLLFLWILLQI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RRL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ASAQRSQITLHINELMDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQA...VPL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('AQADEYELMYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNI...LLV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ILY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FFCLEL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PWQIPTVLLPLKSLKSSLNNGT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('VSYSLHGFVFYNLPMPTGIGFCI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FSSGCYGQ', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LVLCLLLFTE', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IGSPVELLSQWLVL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('A', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CGSATSLLLSDCLRVRVPCGHSIQKLTFFSTCHSMALF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PDRF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KVNS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SEL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SFVDIFVLLDTI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('DAVTSRTCLKKSLLLHHERFLITNWELRSV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QVTQVLLHTVATGLATIN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TQTIPVAVTILLCLYSK', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QQMFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLTENKYSQL...EID', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('TNMKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNSPFHPLADN...KTE', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LNFH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LTSICAF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PFCYSLF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LCLLSFGSHLNCKIIMKLVTPKRT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NFLFS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ESSQL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LHFTKNVVYSHVLNINHM', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LMTRVLFTSILNGILE', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ELENQHL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LNCAWMRLVLNHPFSTSISVIIQFPVYLLQLIARNLNWVVL', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CVVRSMKTF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SIMTFVLF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('ISSKRTN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NV', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('WTPKSAKCTPHYVWWTLRFNWQ', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PEWRTQWGAIKTTSAPRFTQ', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('YCVLVHRSHSTWQGRP', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('IPSRTRRSN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('HQ', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QSR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('PNWLLPKSYQTNSWW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('R', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NERSQSKMVFLLPRNWARSWTSLWC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QRRHHMGCN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GSLEYTKRSHWHPQSC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QCCNRATTSSRNNIAKRLLRRREQRRQSSLFSFLIT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SQQFKKFNSRQQ', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GNFSC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NGWQWR', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('CCSCFAAA', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QIEPA', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('EQNVW', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RPTTTRPNCH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('EICC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('GF', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('EASAKTYCH', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SIQCNTSFRQTWSRTNPRKFWGPGTNQTRN', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LQTLAANCTICPQRFSVLRNVAHWHGSHTFGNVVDLHRCHQIG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('QRSKFQRSSHFAE', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('AY', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('RIQNIPTNRA', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KGQKEEG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('NSSLTAETEETANCDSSSCCRFG', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('FLQTIATIHEQC', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LNSGLNSCRPHKADGLYKRFRFSVYDI', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('STLVQNEFS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LHSTSRCS', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('SHIAIFNQCVTLGRT', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('KSHHIFTEATRSTIECTVNNARESCLYGRALMCKINFSSAIPM', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('F', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('', HasStopCodon(ExtendedIUPACProtein(), '*')),
 Seq('LLRRMTKKKKKKKKKK', HasStopCodon(ExtendedIUPACProtein(), '*'))]
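Many entries in the list above are empty or only a few residues long, so it can help to keep only the longer fragments before inspecting them. The following is a minimal sketch, not part of the original notebook: the `fragments` sample, the helper name `keep_long_fragments`, and the cutoff of 20 residues are assumptions chosen for illustration. It uses plain strings so it runs without BioPython; `Seq` objects should work the same way, since they support `len()` and `str()`.

```python
# Hypothetical stand-in for a few entries copied from the list above.
# In the notebook these would be Seq objects rather than plain strings.
fragments = ["ECR", "ILSRGFI", "", "TDKYYNLVFTIKCLPRFFNLLNRCFRCFNV", "FF", ""]


def keep_long_fragments(frags, min_length=20):
    """Return fragments with at least min_length residues, longest first.

    min_length=20 is an illustrative cutoff, not a value from the tutorial.
    """
    kept = [str(f) for f in frags if len(f) >= min_length]
    return sorted(kept, key=len, reverse=True)


long_fragments = keep_long_fragments(fragments)
print(long_fragments)
```

In the notebook, the same helper could be called as `keep_long_fragments(ncov_amino_acids)`, assuming `ncov_amino_acids` is the list shown above.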
In [86]:
# Print each amino-acid fragment as a plain string, one per line
for i in ncov_amino_acids:
    print(i)
IKGLYLPR
QTNQLSISCRSVL
TNFKICVAVTRLHA
CTHAV
LITNYCR
QDTSNSSIFCRLLTVSSVLQPIISTSRFRPGVTER
DGEPCPWFQRENTRPTQFACFTGSRRARTWLWRLRGGGLIRGTSTS
RWHLWLSRS
KRRFAST
TALCVHQTFGCSNCTSWSCYG
AGSRTRRHSVRS
W
DTWCPCPSCGRNTSGLPQGSSS
ER

RSWWP
LRRRSKVI
LRRRAWH
SL
RFSRKLEH
T
QWCYP
THA
A
RRGIHSLCR
QLLWP
WLPS
VH
RPSSTCW
SFMHFVRTTGLY
H
EGCILLP
T
A
NCLVHGTF
KEL
IADTF
N
IGKEI
HLQWGMSKFCISLKFHNQDYSTKG
KEKA
WLYG
NSICLSSCVTK
MQPNVPFNSHEV
SLW
NFMADGRFC
SHLRILWH
EFD
RRCHYLWLLTPKCCC
NLLSSMSQFRSRT
A
SCRIP

IWLENHSS
GWSHYCLWRLCVLLCWLP
QVCLLGSTC
R
HRL
PYRCCWRRFRRS

QPS
NTPKRESQHQYCW
L
T

RDRHYFGIFFCFHKCFCGNCERFGL
SIQTNC
ILW
F
SYKRKS
KRCLEYW
TEINTESSLCICIRGCSCCTINFLPHS
NCSKFCACFTEGRYNNTRWNFTVFTETH
CYDVHI
FGY
QSSCNGLHYRWCCSVDFAVAN
HLWHCL
KTQTRP
LA
REV
GRCRVS
RRLGNC
IYLNLCL
NCRWTNCHLCKGN
GECSDIL
ACK
IFGFVC
LYHYWWS
T
SLEFR
NICHALKGIVQKVC
IQRRNWPTHASKSPKRNYLLRGRNTSHRSVNRGSCLENW
FTTIRTTY

SC
SSIGWYTSLY
RAYVARNQRHRKVLCPCT
YDGNKQYLHTQRRCTNKGYFW

HCDRSARLQECEYHF
T

KD

ST

EVLCLYS
TRYRSK
VRLCCGRCCHKNFATSI
ITYTTGH
FR
VEYGYILLI

VW
V
IGFTYVLFFLPSR
G
RRR
L
RRRV
AINSI
VWY
R
LPR
TFGIWCHFCCSST
RRARRRLVR


STNCWSTRRQ
GQSDNYYSNNC
GSTSIRDGTYTSCSDY
SE
F
WLFKTY
QCIH
KCRHCGRS
KGKTNSGC
CSQCLP
TWRRCCRSLK
GY
QCHAS
I

LHSY
WTT
SGW
LCFKRTQSC
TLSSCCRPKC
QR
RHSTS
ECL
KF
SARSSTCTIIISWYFWC
PYTFFKSLCRYCSHKCLLSCL

KSL
QTCFKLFGNEE
KAS
TKDR
DS
RGS
AIYN
K
TFS
TEKTR

ENQSLC
RSYNNSGRN
VPHRKLVTLY
H
WQSSSRFCHSC

H
HHFLKERCSIYSG
CCSRGCFNCCGYTY
KGWWHY
NASESFEKSANRQLYNHLPGSGFKWLHCRGGKDSA
KV
KCLLHSTIYYL

EARNSWNCFLEFARNACTCRRNTQINACLCGN
SHSFNYTA
I
GY
NTRGCG
LWC
ILLLHQ
NNCSVTYQHT
RSK
NSCYNATWLCNTWLKFGRSCSVYEISQSASYSFCFFT
CCYSV
WLSYFFF
NT
RTFY
NHLTCWFL
RLVLFWTIYTTRYRIS
ER

KCILH

SYHIPPRW
SYHL
QS
DTSFFERSEDY
GVYNSRQH
PPHASCGHVNDIWTTVWSNLFGWS
CY
NKTS
FT
R
NILCFT


HSTC
GF
VLPHN
S
FSG
VHVSIKSH
KVEIPTS
WFNFY
MGR
QLLSCHCIVNTPTNRVEV
STCSTRCLLQSKGW
SC
LLCTYLSLL

DSR
VR
C
RNNELLVSTCQFRFLQKSLERGV
NLWTTADNP
GCRSCYVHGHTFL
TI
ERCSDTLYVW
TSYKISSTTGVTFCYDVSTTCSV
T
AWYIYLC

VHW
LPVWSL
TYNF
RNFVLHRRCFTYKVLRIQRSYYGCFLQRKQLHNNHKTSYL
IGWCCLYRN
P
VGQLL
ERQFLFHRATN
SCTKPTISKRKLR
F
VCM

YQIC

FKPVNWL
ETCFKRA
SYIFP
LKW
CGGY
L
TLHTLF
ERS
IVT
TYCLAC
QCN

SHV
TKYLVYTLSLEHKTS
NIKFV
CTEVRGRAGNG
SCLRRSKTSL
RSSGKSYHTERRS
V
CENYRSCRRHYT
TSK

FKNYRRGWPHRSNGCLCRQF
SYY
ET

II
SIRFENPCYSWFSCC

CPLGYYS
LC
AFS
QSC
YNY
HSYTVFKPCLY
LYALFLYFIATIVYFY
KYKF
N
SIYADYYSKEYC
ECR
ILSRGFI
LFEVT
FF
TDKYYNLVFTIKCLPRFFNLLNRCFRCFNV
FRHAFLLYWLQRRLFELY
CHYCNLLYWFYTL
CLS
WFRFFRHLSFFRNYTNYHFIF
MGFNCFWLSCRVVFGIYSFH
VFLCTWIGCNHAIVFQLFCSTFY

FLAYVVNN
SCTNGPDFSYG
NVHLLCIILLCMEKLCACCRRL
FINLYDVLQT

SNKSRMYNYC
WC
KVLLCLC
WR
RLLQTTQLELC
L
YILCW
YIY


SCERLVTTV
KTNKSY
PVFLHR

CYSEEWFHPSLL

SWSKDL
KTFSLSFC
LRQPES

H
RFIAY
CYSF
W
IKM
RIICKISVCLLQSAYVSTYTVTRSGISV
CW

CGSCS
NV
CLR
YVFINF
RTNGKTQNTSCNCRS
TCKECVLRQCLIYFYFSSSARVC
FRCRN
RCC
MS
IVTSI
HRSYWR
L

LYAHL
QS
KHDTP
PWCLY
L
CASY
CAGSKKSQHCFDMER
RFHVIV
TTTKTNT
CC
KE
LTF
VDMCNY
TSC
CCNNKDST
GW
NC

LVEAVN
SYTCVPFCCCYFLFNNTCSCHV
TY
LFK
NHRIQGY
WWCHS
HSIYRYLFC
QTC
F
HMV
PAWW
LY

QSLPIDCCSHNKRSGFCRAWFAWHDITHN
W
LFAFLT
SF
CSW
HLLHTIKTYRVH
LCNISLCFGC
MYNF
RCFW
ASTILL
YQCTRRFCCL
KFTP
HTLCAHGWLYYSIS
HLP
RFC
SGNNF
F
VL
ARHL
KIRSWCLCIY
W
MGT
Q
LLQIFTRSFLWCRCCKFTY
YVYTTNSTYWCFGHISIYSSWWYCSYRSNMPCLLFYEV
KSFW
IQSCSCL
YFTIPYVIHCTLFNTSLLILTWCLFCYLLVLDILSY

CFFFSTYSVDGYVHTFSTFLDNNCLYHLYFHKAFLLVL

LPKETCSL
WCFL
YF
RSCAVHLFVK
RNVSKVA

CAITSYAI

ILSSL

VQVF
WSNGYN
LQRSCLLSSRKGSQ
LQ
LRF
CSLPTTTNLYHLSCFAEWF
KNGIPIW
S
GLYGTSNLWYNYT
RSLA

RSLLSKTCDLHL
RHA
P
L
RFTHS
V
S
FLGTGW
CSTQGYWTFYAKLCT
A
G
YSQS
DT
V
VCSHSTRTDFFSVSLLQWFTIWCLPMCYEAQFHY
GFIP
WFMW
CWF
HRL
LCLFLLHAPYGITNWSSCWHRLRR
LLWTFC
QANSTSSWYGHNYYS
CFSLVVRCCYKWRQVVSQSIYHNS

L
PCGYEVQL
TSNTRPC
HTRTSFCSNWNCRFRYVCFIKRITAKWYEWTYHIG
CFIRR
IYTF
CC
TMLRCYFPKCSEKNNQGYTPLVVTHNFDFTFSFSPEYSMVFVLFFV
KCLFTFCYGYYCYVCFCNDVCQT
ACISLFVFVTFSCHCSLF
YGLYAC
LGDAYYDMVGYG
Y
FVWF
AKRLCYVCISCSVTNPYDSKNCV

WC
ESVDTYECLDTRL
SLLW
CFRSSHFHVGSYNLCYF
LLRCSYNCHVFGQRYCFYVC
VLPYFLHNW
YTSVYNASLLFLRLFLYLLLWPLLFTQPLL
TDSWCL
LLSFYTGV
IYEFTGTTPTQE
HRCLQTQH
IVGCWWQTLYQSSHCTV
NVRCKVHISSLTLSFATTQSRIII
IVGSMCPVTQ
HSLS
RYY
SL
KNGFTTFCFAFHAGCCRHKQAL
RNAGQQGNLTSYSLRV
FPSIICSFCYCSRSL
AGCC
W
F
SCS
KVEEVFECG
I
I
P
CSHAT
VGKDG
SSYDPNV
TG
I
GQEGKSY
CYADNAFHYA
KVG

CTQQHYQQCKRWLCSLEHNTSYNSSQTNGCHTRL
HI
KYV
WYNIYLCISIVGNPTGCRCR

NCST

N
YGQFT
FSMASYCNSFKGQFCCQITE

A
SCCTTTDVLCCRYYTNCLH

QCVSLLQHNKGR
VCTCTVIRFTGFEMG
IP
E
WNWYYLYRTGTTL
VCYRHT
RS
SEVFILY
RIKQPK
RYGTW
FSCHSTSTSW
CNRSACQFNCIIFLCFCCRCC
SLQRLSS
WGTTNH
LC
DVVYTHWYWSGNNSYTGSQYGSRILWWCIVLSVLPLPHRSSKS
RIL
LKR
VCTNTYNLC

PCGFYT
KHSLYRLRYVERLWL
L
STPRTHASVS
CTIVFKRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQAVGACVLCNSQTSLRCGACIRRPFLCCKCCYDHVISTSHKLVLSVNPYVCNAPGCDVTDVTQLYLGGMSYYCKSHKPPISFPLCANGQVFGLYKNTCVGSDNVTDFNAIATCDWTNAGDYILANTCTERLKLFAAETLKATEETFKLSYGIATVREVLSDRELHLSWEVGKPRPPLNRNYVFTGYRVTKNSKVQIGEYTFEKGDYGDAVVYRGTTTYKLNVGDYFVLTSHTVMPLSAPTLVPQEHYVRITGLYPTLNISDEFSSNVANYQKVGMQKYSTLQGPPGTGKSHFAIGLALYYPSARIVYTACSHAAVDALCEKALKYLPIDKCSRIIPARARVECFDKFKVNSTLEQYVFCTVNALPETTADIVVFDEISMATNYDLSVVNARLRAKHYVYIGDPAQLPAPRTLLTKGTLEPEYFNSVCRLMKTIGPDMFLGTCRRCPAEIVDTVSALVYDNKLKAHKDKSAQCFKMFYKGVITHDVSSAINRPQIGVVREFLTRNPAWRKAVFISPYNSQNAVASKILGLPTQTVDSSQGSEYDYVIFTQTTETAHSCNVNRFNVAITRAKVGILCIMSDRDLYDKLQFTSLEIPRRNVATLQAENVTGLFKDCSKVITGLHPTQAPTHLSVDTKFKTEGLCVDIPGIPKDMTYRRLISMMGFKMNYQVNGYPNMFITREEAIRHVRAWIGFDVEGCHATREAVGTNLPLQLGFSTGVNLVAVPTGYVDTPNNTDFSRVSAKPPPGDQFKHLIPLMYKGLPWNVVRIKIVQMLSDTLKNLSDRVVFVLWAHGFELTSMKYFVKIGPERTCCLCDRRATCFSTASDTYACWHHSIGFDYVYNPFMIDVQQWGFTGNLQSNHDLYCQVHGNAHVASCDAIMTRCLAVHECFVKRVDWTIEYPIIGDELKINAACRKVQHMVVKAALLADKFPVLHDIGNPKAIKCVPQADVEWKFYDAQPCSDKAYKIEELFYSYATHSDKFTDGVCLFWNCNVDRYPANSIVCRFDTRVLSNLNLPGCDGGSLYVNKHAFHTPAFDKSAFVNLKQLPFFYYSDSPCESHGKQVVSDIDYVPLKS
ATCITRCNLGGAVCRHHANEYRLYLDAYNMMISAGFSLWVYKQFDTYNLWNTFTRLQSLENVAFNVVNKGHFDGQQGEVPVSIINNTVYTKVDGVDVELFENKTTLPVNVAFELWAKRNIKPVPEVKILNNLGVDIAANTVIWDYKRDAPAHISTIGVCSMTDIAKKPTETICAPLTVFFDGRVDGQVDLFRNARNGVLITEGSVKGLQPSVGPKQASLNGVTLIGEAVKTQFNYYKKVDGVVQQLPETYFTQSRNLQEFKPRSQMEIDFLELAMDEFIERYKLEGYAFEHIVYGDFSHSQLGGLHLLIGLAKRFKESPFELEDFIPMDSTVKNYFITDAQTGSSKCVCSVIDLLLDDFVEIIKSQDLSVVSKVVKVTIDYTEISFMLWCKDGHVETFYPKLQSSQAWQPGVAMPNLYKMQRMLLEKCDLQNYGDSATLPKGIMMNVAKYTQLCQYLNTLTLAVPYNMRVIHFGAGSDKGVAPGTAVLRQWLPTGTLLVDSDLNDFVSDADSTLIGDCATVHTANKWDLIISDMYDPKTKNVTKENDSKEGFFTYICGFIQQKLALGGSVAIKITEHSWNADLYKLMGHFAWWTAFVTNVNASSSEAFLIGCNYLGKPREQIDGYVMHANYIFWRNTNPIQLSSYSLFDMSKFPLKLRGTAVMSLKEGQINDMILSLLSKGRLIIRENNRVVISSDVLVNN
TNNVCFSCFIATSL
SVC
SYNQNSITPCIH
FFHTWCLLP
QSFQILSFTFNSGLVLTFLFQCYLVPCYTCLWDQWY
EV

PCPTI

WCLFCFH
EV
HNKRLDFWYYFRFEDPVPTYC

RY
CCY
SL
ISIL

SIFGCLLPQKQQKLDGK
VQSLF
CE
LHF
ICLSAFSYGP
RKTG
FQKS
GICV
EY
WLF
NIF
AHAY
FSA
SPSGFFGFRTIGRFANRY
HH
VSNFTCFT
KLFDSW
FFFRLDSWCCSLLCGLSST
DFSIKI

KWNHYRCCRLCT
PSLRNKVYVEILHCRKRNLSNF
L
SPTNRIYC
IS
YYKLVPFW
SF
RHQICICLCLEQEENQQLCC
LFCPI
FRIIFHF
VLWSVSY
IK
SLLY
CLCRFICN
R

SQTNRSRANWKDC
L
L
ITR
FYRLRYSLEF
QS
F
GWW
L
LPV
IV
EV
SQTF
ERYFN
NLSGR
HTL
WC
RF
LLLSFTIIWFPTH
WCWLPTIQSSSTFF
TSTCTSNCLWT
KVY
FG
KQMCQFQLQWFNRHRCSY
V
QKVSAFPTIWQRHC
HY
CCP
STDT
DS
HYTMFFWWCQCYNTRNKYF
PGCCSLSGC
LHRSPCCYSCRSTYSYLACLFYRF
CFSNTCRLFNRG
TCQQLI
V
HTHWCRYMR
LSDSD
FSSAGT
CS
SIHHCLHYVTWCRKFSCLL

LYCHTHKFYY
CYHRNSTSVYDQDISRLYNVHLW
FN
MQQSFVAIWQFLYTIKPCFNWNSC
TRQKHPRSFCTSQTNLQNTTN
RFWWF
FFTNITRSIKTKQEVIY
RSTFQQSDTCRCWLHQTIW
LPW
YCC
RPHLCTKV
RPYCFATFAHR
NDCSIHFCTVSGYNHFWLDLWCRCCITNTICYANGL
V
WYWSYTECSL
EPKIDCQPI

CYWQNSRLTFFHSKCTWKTSRCGQPKCTSFKHAC
TT
LQFWCNFKCFK
YPFTS
QS
G
SAN

VDHRQTSKFADICDSTIN
SCRNQSFC
SCCY
NVRVCTWTIKKS
FLWKGLSSYVLPSVSTSWCSLLACDLCPCTRKELHNCSCHLS
WKSTLSS
RCLCFKWHTLVCNTKEFL
TTNHYYRQHICVW
L
CCNRNCQQHSL
SFAT
IRLIQGGVR
IF
ESYITRC
FR
HLWH
CFSCKHSKRN
PPQ
GCQEFK
ISHRSPRTWKV
AVYKMAMVHLARFYSWLDCHSNGDNYALLYDQLL
LSQGLLFLWILLQI

RRL
ASAQRSQITLHINELMDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFGWLIVGVALLAVFQSASKIITLKKRWQLALSKGVHFVCNLLLLFVTVYSHLLLVAAGLEAPFLYLYALVYFLQSINFVRIIMRLWLCWKCRSKNPLLYDANYFLCWHTNCYDYCIPYNSVTSSIVITSGDGTTSPISEHDYQIGGYTEKWESGVKDCVVLHSYFTSDYYQLYSTQLSTDTGVEHVTFFIYNKIVDEPEEHVQIHTIDGSSGVVNPVMEPIYDEPTTTTSVPL
AQADEYELMYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV
TN
ILY
FFCLEL
F
PWQIPTVLLPLKSLKSSLNNGT


VSYSLHGFVFYNLPMPTGIGFCI
LS
FSSGCYGQ
L
LVLCLLLFTE
IGSPVELLSQWLVL
A
CGSATSLLLSDCLRVRVPCGHSIQKLTFFSTCHSMALF
PDRF
KVNS
SEL
SFVDIFVLLDTI
DAVTSRTCLKKSLLLHHERFLITNWELRSV
QVTQVLLHTVATGLATIN
TQTIPVAVTILLCLYSK
QQMFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLTENKYSQLDEEQPMEID
TNMKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNSPFHPLADNKFALTCFSTQFAFACPDGVKHVYQLRARSVSPKLFIRQEEVQELYSPIFLIVAAIVFITLCFTLKRKTE
LNFH
LTSICAF
PFCYSLF
LCLLSFGSHLNCKIIMKLVTPKRT
NFLFS
ESSQL
LHFTKNVVYSHVLNINHM
LMTRVLFTSILNGILE
ELENQHL
LNCAWMRLVLNHPFSTSISVIIQFPVYLLQLIARNLNWVVL
CVVRSMKTF
SIMTFVLF
ISSKRTN
NV

WTPKSAKCTPHYVWWTLRFNWQ
PEWRTQWGAIKTTSAPRFTQ
YCVLVHRSHSTWQGRP
IPSRTRRSN
HQ
QSR
PNWLLPKSYQTNSWW
R
NERSQSKMVFLLPRNWARSWTSLWC
QRRHHMGCN
GSLEYTKRSHWHPQSC
QCCNRATTSSRNNIAKRLLRRREQRRQSSLFSFLIT
SQQFKKFNSRQQ
GNFSC
NGWQWR
CCSCFAAA
QIEPA
EQNVW
RPTTTRPNCH
EICC
GF
EASAKTYCH
SIQCNTSFRQTWSRTNPRKFWGPGTNQTRN
LQTLAANCTICPQRFSVLRNVAHWHGSHTFGNVVDLHRCHQIG
QRSKFQRSSHFAE
AY
RIQNIPTNRA
KGQKEEG

NSSLTAETEETANCDSSSCCRFG
FLQTIATIHEQC
LNSGLNSCRPHKADGLYKRFRFSVYDI
STLVQNEFS
LHSTSRCS
L
SHIAIFNQCVTLGRT
KSHHIFTEATRSTIECTVNNARESCLYGRALMCKINFSSAIPM
F

LLRRMTKKKKKKKKKK
In [89]:
# Convert each fragment to a plain Python string
ncov_aa = [str(i) for i in ncov_amino_acids]
In [90]:
ncov_aa
Out[90]:
['IKGLYLPR',
 'QTNQLSISCRSVL',
 'TNFKICVAVTRLHA',
 'CTHAV',
 'LITNYCR',
 'QDTSNSSIFCRLLTVSSVLQPIISTSRFRPGVTER',
 'DGEPCPWFQRENTRPTQFACFTGSRRARTWLWRLRGGGLIRGTSTS',
 'RWHLWLSRS',
 'KRRFAST',
 'TALCVHQTFGCSNCTSWSCYG',
 'AGSRTRRHSVRS',
 'W',
 'DTWCPCPSCGRNTSGLPQGSSS',
 'ER',
 '',
 'RSWWP',
 'LRRRSKVI',
 'LRRRAWH',
 'SL',
 'RFSRKLEH',
 'T',
 'QWCYP',
 'THA',
 'A',
 'RRGIHSLCR',
 'QLLWP',
 'WLPS',
 'VH',
 'RPSSTCW',
 'SFMHFVRTTGLY',
 'H',
 'EGCILLP',
 'T',
 'A',
 'NCLVHGTF',
 'KEL',
 'IADTF',
 'N',
 'IGKEI',
 'HLQWGMSKFCISLKFHNQDYSTKG',
 'KEKA',
 'WLYG',
 'NSICLSSCVTK',
 'MQPNVPFNSHEV',
 'SLW',
 'NFMADGRFC',
 'SHLRILWH',
 'EFD',
 'RRCHYLWLLTPKCCC',
 'NLLSSMSQFRSRT',
 'A',
 'SCRIP',
 '',
 'IWLENHSS',
 'GWSHYCLWRLCVLLCWLP',
 'QVCLLGSTC',
 'R',
 'HRL',
 'PYRCCWRRFRRS',
 '',
 'QPS',
 'NTPKRESQHQYCW',
 'L',
 'T',
 '',
 'RDRHYFGIFFCFHKCFCGNCERFGL',
 'SIQTNC',
 'ILW',
 'F',
 'SYKRKS',
 'KRCLEYW',
 'TEINTESSLCICIRGCSCCTINFLPHS',
 'NCSKFCACFTEGRYNNTRWNFTVFTETH',
 'CYDVHI',
 'FGY',
 'QSSCNGLHYRWCCSVDFAVAN',
 'HLWHCL',
 'KTQTRP',
 'LA',
 'REV',
 'GRCRVS',
 'RRLGNC',
 'IYLNLCL',
 'NCRWTNCHLCKGN',
 'GECSDIL',
 'ACK',
 'IFGFVC',
 'LYHYWWS',
 'T',
 'SLEFR',
 'NICHALKGIVQKVC',
 'IQRRNWPTHASKSPKRNYLLRGRNTSHRSVNRGSCLENW',
 'FTTIRTTY',
 '',
 'SC',
 'SSIGWYTSLY',
 'RAYVARNQRHRKVLCPCT',
 'YDGNKQYLHTQRRCTNKGYFW',
 '',
 'HCDRSARLQECEYHF',
 'T',
 '',
 'KD',
 '',
 'ST',
 '',
 'EVLCLYS',
 'TRYRSK',
 'VRLCCGRCCHKNFATSI',
 'ITYTTGH',
 'FR',
 'VEYGYILLI',
 '',
 'VW',
 'V',
 'IGFTYVLFFLPSR',
 'G',
 'RRR',
 'L',
 'RRRV',
 'AINSI',
 'VWY',
 'R',
 'LPR',
 'TFGIWCHFCCSST',
 'RRARRRLVR',
 '',
 '',
 'STNCWSTRRQ',
 'GQSDNYYSNNC',
 'GSTSIRDGTYTSCSDY',
 'SE',
 'F',
 'WLFKTY',
 'QCIH',
 'KCRHCGRS',
 'KGKTNSGC',
 'CSQCLP',
 'TWRRCCRSLK',
 'GY',
 'QCHAS',
 'I',
 '',
 'LHSY',
 'WTT',
 'SGW',
 'LCFKRTQSC',
 'TLSSCCRPKC',
 'QR',
 'RHSTS',
 'ECL',
 'KF',
 'SARSSTCTIIISWYFWC',
 'PYTFFKSLCRYCSHKCLLSCL',
 '',
 'KSL',
 'QTCFKLFGNEE',
 'KAS',
 'TKDR',
 'DS',
 'RGS',
 'AIYN',
 'K',
 'TFS',
 'TEKTR',
 '',
 'ENQSLC',
 'RSYNNSGRN',
 'VPHRKLVTLY',
 'H',
 'WQSSSRFCHSC',
 '',
 'H',
 'HHFLKERCSIYSG',
 'CCSRGCFNCCGYTY',
 'KGWWHY',
 'NASESFEKSANRQLYNHLPGSGFKWLHCRGGKDSA',
 'KV',
 'KCLLHSTIYYL',
 '',
 'EARNSWNCFLEFARNACTCRRNTQINACLCGN',
 'SHSFNYTA',
 'I',
 'GY',
 'NTRGCG',
 'LWC',
 'ILLLHQ',
 'NNCSVTYQHT',
 'RSK',
 'NSCYNATWLCNTWLKFGRSCSVYEISQSASYSFCFFT',
 'CCYSV',
 'WLSYFFF',
 'NT',
 'RTFY',
 'NHLTCWFL',
 'RLVLFWTIYTTRYRIS',
 'ER',
 '',
 'KCILH',
 '',
 'SYHIPPRW',
 'SYHL',
 'QS',
 'DTSFFERSEDY',
 'GVYNSRQH',
 'PPHASCGHVNDIWTTVWSNLFGWS',
 'CY',
 'NKTS',
 'FT',
 'R',
 'NILCFT',
 '',
 '',
 'HSTC',
 'GF',
 'VLPHN',
 'S',
 'FSG',
 'VHVSIKSH',
 'KVEIPTS',
 'WFNFY',
 'MGR',
 'QLLSCHCIVNTPTNRVEV',
 'STCSTRCLLQSKGW',
 'SC',
 'LLCTYLSLL',
 '',
 'DSR',
 'VR',
 'C',
 'RNNELLVSTCQFRFLQKSLERGV',
 'NLWTTADNP',
 'GCRSCYVHGHTFL',
 'TI',
 'ERCSDTLYVW',
 'TSYKISSTTGVTFCYDVSTTCSV',
 'T',
 'AWYIYLC',
 '',
 'VHW',
 'LPVWSL',
 'TYNF',
 'RNFVLHRRCFTYKVLRIQRSYYGCFLQRKQLHNNHKTSYL',
 'IGWCCLYRN',
 'P',
 'VGQLL',
 'ERQFLFHRATN',
 'SCTKPTISKRKLR',
 'F',
 'VCM',
 '',
 'YQIC',
 '',
 'FKPVNWL',
 'ETCFKRA',
 'SYIFP',
 'LKW',
 'CGGY',
 'L',
 'TLHTLF',
 'ERS',
 'IVT',
 'TYCLAC',
 'QCN',
 '',
 'SHV',
 'TKYLVYTLSLEHKTS',
 'NIKFV',
 'CTEVRGRAGNG',
 'SCLRRSKTSL',
 'RSSGKSYHTERRS',
 'V',
 'CENYRSCRRHYT',
 'TSK',
 '',
 'FKNYRRGWPHRSNGCLCRQF',
 'SYY',
 'ET',
 '',
 'II',
 'SIRFENPCYSWFSCC',
 '',
 'CPLGYYS',
 'LC',
 'AFS',
 'QSC',
 'YNY',
 'HSYTVFKPCLY',
 'LYALFLYFIATIVYFY',
 'KYKF',
 'N',
 'SIYADYYSKEYC',
 'ECR',
 'ILSRGFI',
 'LFEVT',
 'FF',
 'TDKYYNLVFTIKCLPRFFNLLNRCFRCFNV',
 'FRHAFLLYWLQRRLFELY',
 'CHYCNLLYWFYTL',
 'CLS',
 'WFRFFRHLSFFRNYTNYHFIF',
 'MGFNCFWLSCRVVFGIYSFH',
 'VFLCTWIGCNHAIVFQLFCSTFY',
 '',
 'FLAYVVNN',
 'SCTNGPDFSYG',
 'NVHLLCIILLCMEKLCACCRRL',
 'FINLYDVLQT',
 '',
 'SNKSRMYNYC',
 'WC',
 'KVLLCLC',
 'WR',
 'RLLQTTQLELC',
 'L',
 'YILCW',
 'YIY',
 '',
 '',
 'SCERLVTTV',
 'KTNKSY',
 'PVFLHR',
 '',
 'CYSEEWFHPSLL',
 '',
 'SWSKDL',
 'KTFSLSFC',
 'LRQPES',
 '',
 'H',
 'RFIAY',
 'CYSF',
 'W',
 'IKM',
 'RIICKISVCLLQSAYVSTYTVTRSGISV',
 'CW',
 '',
 'CGSCS',
 'NV',
 'CLR',
 'YVFINF',
 'RTNGKTQNTSCNCRS',
 'TCKECVLRQCLIYFYFSSSARVC',
 'FRCRN',
 'RCC',
 'MS',
 'IVTSI',
 'HRSYWR',
 'L',
 '',
 'LYAHL',
 'QS',
 'KHDTP',
 'PWCLY',
 'L',
 'CASY',
 'CAGSKKSQHCFDMER',
 'RFHVIV',
 'TTTKTNT',
 'CC',
 'KE',
 'LTF',
 'VDMCNY',
 'TSC',
 'CCNNKDST',
 'GW',
 'NC',
 '',
 'LVEAVN',
 'SYTCVPFCCCYFLFNNTCSCHV',
 'TY',
 'LFK',
 'NHRIQGY',
 'WWCHS',
 'HSIYRYLFC',
 'QTC',
 'F',
 'HMV',
 'PAWW',
 'LY',
 '',
 'QSLPIDCCSHNKRSGFCRAWFAWHDITHN',
 'W',
 'LFAFLT',
 'SF',
 'CSW',
 'HLLHTIKTYRVH',
 'LCNISLCFGC',
 'MYNF',
 'RCFW',
 'ASTILL',
 'YQCTRRFCCL',
 'KFTP',
 'HTLCAHGWLYYSIS',
 'HLP',
 'RFC',
 'SGNNF',
 'F',
 'VL',
 'ARHL',
 'KIRSWCLCIY',
 'W',
 'MGT',
 'Q',
 'LLQIFTRSFLWCRCCKFTY',
 'YVYTTNSTYWCFGHISIYSSWWYCSYRSNMPCLLFYEV',
 'KSFW',
 'IQSCSCL',
 'YFTIPYVIHCTLFNTSLLILTWCLFCYLLVLDILSY',
 '',
 'CFFFSTYSVDGYVHTFSTFLDNNCLYHLYFHKAFLLVL',
 '',
 'LPKETCSL',
 'WCFL',
 'YF',
 'RSCAVHLFVK',
 'RNVSKVA',
 '',
 'CAITSYAI',
 '',
 'ILSSL',
 '',
 'VQVF',
 'WSNGYN',
 'LQRSCLLSSRKGSQ',
 'LQ',
 'LRF',
 'CSLPTTTNLYHLSCFAEWF',
 'KNGIPIW',
 'S',
 'GLYGTSNLWYNYT',
 'RSLA',
 '',
 'RSLLSKTCDLHL',
 'RHA',
 'P',
 'L',
 'RFTHS',
 'V',
 'S',
 'FLGTGW',
 'CSTQGYWTFYAKLCT',
 'A',
 'G',
 'YSQS',
 'DT',
 'V',
 'VCSHSTRTDFFSVSLLQWFTIWCLPMCYEAQFHY',
 'GFIP',
 'WFMW',
 'CWF',
 'HRL',
 'LCLFLLHAPYGITNWSSCWHRLRR',
 'LLWTFC',
 'QANSTSSWYGHNYYS',
 'CFSLVVRCCYKWRQVVSQSIYHNS',
 '',
 'L',
 'PCGYEVQL',
 'TSNTRPC',
 'HTRTSFCSNWNCRFRYVCFIKRITAKWYEWTYHIG',
 'CFIRR',
 'IYTF',
 'CC',
 'TMLRCYFPKCSEKNNQGYTPLVVTHNFDFTFSFSPEYSMVFVLFFV',
 'KCLFTFCYGYYCYVCFCNDVCQT',
 'ACISLFVFVTFSCHCSLF',
 'YGLYAC',
 'LGDAYYDMVGYG',
 'Y',
 'FVWF',
 'AKRLCYVCISCSVTNPYDSKNCV',
 '',
 'WC',
 'ESVDTYECLDTRL',
 'SLLW',
 'CFRSSHFHVGSYNLCYF',
 'LLRCSYNCHVFGQRYCFYVC',
 'VLPYFLHNW',
 'YTSVYNASLLFLRLFLYLLLWPLLFTQPLL',
 'TDSWCL',
 'LLSFYTGV',
 'IYEFTGTTPTQE',
 'HRCLQTQH',
 'IVGCWWQTLYQSSHCTV',
 'NVRCKVHISSLTLSFATTQSRIII',
 'IVGSMCPVTQ',
 'HSLS',
 'RYY',
 'SL',
 'KNGFTTFCFAFHAGCCRHKQAL',
 'RNAGQQGNLTSYSLRV',
 'FPSIICSFCYCSRSL',
 'AGCC',
 'W',
 'F',
 'SCS',
 'KVEEVFECG',
 'I',
 'I',
 'P',
 'CSHAT',
 'VGKDG',
 'SSYDPNV',
 'TG',
 'I',
 'GQEGKSY',
 'CYADNAFHYA',
 'KVG',
 '',
 'CTQQHYQQCKRWLCSLEHNTSYNSSQTNGCHTRL',
 'HI',
 'KYV',
 'WYNIYLCISIVGNPTGCRCR',
 '',
 'NCST',
 '',
 'N',
 'YGQFT',
 'FSMASYCNSFKGQFCCQITE',
 '',
 'A',
 'SCCTTTDVLCCRYYTNCLH',
 '',
 'QCVSLLQHNKGR',
 'VCTCTVIRFTGFEMG',
 'IP',
 'E',
 'WNWYYLYRTGTTL',
 'VCYRHT',
 'RS',
 'SEVFILY',
 'RIKQPK',
 'RYGTW',
 'FSCHSTSTSW',
 'CNRSACQFNCIIFLCFCCRCC',
 'SLQRLSS',
 'WGTTNH',
 'LC',
 'DVVYTHWYWSGNNSYTGSQYGSRILWWCIVLSVLPLPHRSSKS',
 'RIL',
 'LKR',
 'VCTNTYNLC',
 '',
 'PCGFYT',
 'KHSLYRLRYVERLWL',
 'L',
 'STPRTHASVS',
 'CTIVFKRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQAVGACVLCNSQTSLRCGACIRRPFLCCKCCYDHVISTSHKLVLSVNPYVCNAPGCDVTDVTQLYLGGMSYYCKSHKPPISFPLCANGQVFGLYKNTCVGSDNVTDFNAIATCDWTNAGDYILANTCTERLKLFAAETLKATEETFKLSYGIATVREVLSDRELHLSWEVGKPRPPLNRNYVFTGYRVTKNSKVQIGEYTFEKGDYGDAVVYRGTTTYKLNVGDYFVLTSHTVMPLSAPTLVPQEHYVRITGLYPTLNISDEFSSNVANYQKVGMQKYSTLQGPPGTGKSHFAIGLALYYPSARIVYTACSHAAVDALCEKALKYLPIDKCSRIIPARARVECFDKFKVNSTLEQYVFCTVNALPETTADIVVFDEISMATNYDLSVVNARLRAKHYVYIGDPAQLPAPRTLLTKGTLEPEYFNSVCRLMKTIGPDMFLGTCRRCPAEIVDTVSALVYDNKLKAHKDKSAQCFKMFYKGVITHDVSSAINRPQIGVVREFLTRNPAWRKAVFISPYNSQNAVASKILGLPTQTVDSSQGSEYDYVIFTQTTETAHSCNVNRFNVAITRAKVGILCIMSDRDLYDKLQFTSLEIPRRNVATLQAENVTGLFKDCSKVITGLHPTQAPTHLSVDTKFKTEGLCVDIPGIPKDMTYRRLISMMGFKMNYQVNGYPNMFITREEAIRHVRAWIGFDVEGCHATREAVGTNLPLQLGFSTGVNLVAVPTGYVDTPNNTDFSRVSAKPPPGDQFKHLIPLMYKGLPWNVVRIKIVQMLSDTLKNLSDRVVFVLWAHGFELTSMKYFVKIGPERTCCLCDRRATCFSTASDTYACWHHSIGFDYVYNPFMIDVQQWGFTGNLQSNHDLYCQVHGNAHVASCDAIMTRCLAVHECFVKRVDWTIEYPIIGDELKINAACRKVQHMVVKAALLADKFPVLHDIGNPKAIKCVPQADVEWKFYDAQPCSDKAYKIEELFYSYATHSDKFTDGVCLFWNCNVDRYPANSIVCRFDTRVLSNLNLPGCDGGSLYVNKHAFHTPAFDKSAFVNLKQLPFFYYSDSPCESHGKQVVSDIDYVPL
KSATCITRCNLGGAVCRHHANEYRLYLDAYNMMISAGFSLWVYKQFDTYNLWNTFTRLQSLENVAFNVVNKGHFDGQQGEVPVSIINNTVYTKVDGVDVELFENKTTLPVNVAFELWAKRNIKPVPEVKILNNLGVDIAANTVIWDYKRDAPAHISTIGVCSMTDIAKKPTETICAPLTVFFDGRVDGQVDLFRNARNGVLITEGSVKGLQPSVGPKQASLNGVTLIGEAVKTQFNYYKKVDGVVQQLPETYFTQSRNLQEFKPRSQMEIDFLELAMDEFIERYKLEGYAFEHIVYGDFSHSQLGGLHLLIGLAKRFKESPFELEDFIPMDSTVKNYFITDAQTGSSKCVCSVIDLLLDDFVEIIKSQDLSVVSKVVKVTIDYTEISFMLWCKDGHVETFYPKLQSSQAWQPGVAMPNLYKMQRMLLEKCDLQNYGDSATLPKGIMMNVAKYTQLCQYLNTLTLAVPYNMRVIHFGAGSDKGVAPGTAVLRQWLPTGTLLVDSDLNDFVSDADSTLIGDCATVHTANKWDLIISDMYDPKTKNVTKENDSKEGFFTYICGFIQQKLALGGSVAIKITEHSWNADLYKLMGHFAWWTAFVTNVNASSSEAFLIGCNYLGKPREQIDGYVMHANYIFWRNTNPIQLSSYSLFDMSKFPLKLRGTAVMSLKEGQINDMILSLLSKGRLIIRENNRVVISSDVLVNN',
 'TNNVCFSCFIATSL',
 'SVC',
 'SYNQNSITPCIH',
 'FFHTWCLLP',
 'QSFQILSFTFNSGLVLTFLFQCYLVPCYTCLWDQWY',
 'EV',
 '',
 'PCPTI',
 '',
 'WCLFCFH',
 'EV',
 'HNKRLDFWYYFRFEDPVPTYC',
 '',
 'RY',
 'CCY',
 'SL',
 'ISIL',
 '',
 'SIFGCLLPQKQQKLDGK',
 'VQSLF',
 'CE',
 'LHF',
 'ICLSAFSYGP',
 'RKTG',
 'FQKS',
 'GICV',
 'EY',
 'WLF',
 'NIF',
 'AHAY',
 'FSA',
 'SPSGFFGFRTIGRFANRY',
 'HH',
 'VSNFTCFT',
 'KLFDSW',
 'FFFRLDSWCCSLLCGLSST',
 'DFSIKI',
 '',
 'KWNHYRCCRLCT',
 'PSLRNKVYVEILHCRKRNLSNF',
 'L',
 'SPTNRIYC',
 'IS',
 'YYKLVPFW',
 'SF',
 'RHQICICLCLEQEENQQLCC',
 'LFCPI',
 'FRIIFHF',
 'VLWSVSY',
 'IK',
 'SLLY',
 'CLCRFICN',
 'R',
 '',
 'SQTNRSRANWKDC',
 'L',
 'L',
 'ITR',
 'FYRLRYSLEF',
 'QS',
 'F',
 'GWW',
 'L',
 'LPV',
 'IV',
 'EV',
 'SQTF',
 'ERYFN',
 'NLSGR',
 'HTL',
 'WC',
 'RF',
 'LLLSFTIIWFPTH',
 'WCWLPTIQSSSTFF',
 'TSTCTSNCLWT',
 'KVY',
 'FG',
 'KQMCQFQLQWFNRHRCSY',
 'V',
 'QKVSAFPTIWQRHC',
 'HY',
 'CCP',
 'STDT',
 'DS',
 'HYTMFFWWCQCYNTRNKYF',
 'PGCCSLSGC',
 'LHRSPCCYSCRSTYSYLACLFYRF',
 'CFSNTCRLFNRG',
 'TCQQLI',
 'V',
 'HTHWCRYMR',
 'LSDSD',
 'FSSAGT',
 'CS',
 'SIHHCLHYVTWCRKFSCLL',
 '',
 'LYCHTHKFYY',
 'CYHRNSTSVYDQDISRLYNVHLW',
 'FN',
 'MQQSFVAIWQFLYTIKPCFNWNSC',
 'TRQKHPRSFCTSQTNLQNTTN',
 'RFWWF',
 'FFTNITRSIKTKQEVIY',
 'RSTFQQSDTCRCWLHQTIW',
 'LPW',
 'YCC',
 'RPHLCTKV',
 'RPYCFATFAHR',
 'NDCSIHFCTVSGYNHFWLDLWCRCCITNTICYANGL',
 'V',
 'WYWSYTECSL',
 'EPKIDCQPI',
 '',
 'CYWQNSRLTFFHSKCTWKTSRCGQPKCTSFKHAC',
 'TT',
 'LQFWCNFKCFK',
 'YPFTS',
 'QS',
 'G',
 'SAN',
 '',
 'VDHRQTSKFADICDSTIN',
 'SCRNQSFC',
 'SCCY',
 'NVRVCTWTIKKS',
 'FLWKGLSSYVLPSVSTSWCSLLACDLCPCTRKELHNCSCHLS',
 'WKSTLSS',
 'RCLCFKWHTLVCNTKEFL',
 'TTNHYYRQHICVW',
 'L',
 'CCNRNCQQHSL',
 'SFAT',
 'IRLIQGGVR',
 'IF',
 'ESYITRC',
 'FR',
 'HLWH',
 'CFSCKHSKRN',
 'PPQ',
 'GCQEFK',
 'ISHRSPRTWKV',
 'AVYKMAMVHLARFYSWLDCHSNGDNYALLYDQLL',
 'LSQGLLFLWILLQI',
 '',
 'RRL',
 'ASAQRSQITLHINELMDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFGWLIVGVALLAVFQSASKIITLKKRWQLALSKGVHFVCNLLLLFVTVYSHLLLVAAGLEAPFLYLYALVYFLQSINFVRIIMRLWLCWKCRSKNPLLYDANYFLCWHTNCYDYCIPYNSVTSSIVITSGDGTTSPISEHDYQIGGYTEKWESGVKDCVVLHSYFTSDYYQLYSTQLSTDTGVEHVTFFIYNKIVDEPEEHVQIHTIDGSSGVVNPVMEPIYDEPTTTTSVPL',
 'AQADEYELMYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV',
 'TN',
 'ILY',
 'FFCLEL',
 'F',
 'PWQIPTVLLPLKSLKSSLNNGT',
 '',
 '',
 'VSYSLHGFVFYNLPMPTGIGFCI',
 'LS',
 'FSSGCYGQ',
 'L',
 'LVLCLLLFTE',
 'IGSPVELLSQWLVL',
 'A',
 'CGSATSLLLSDCLRVRVPCGHSIQKLTFFSTCHSMALF',
 'PDRF',
 'KVNS',
 'SEL',
 'SFVDIFVLLDTI',
 'DAVTSRTCLKKSLLLHHERFLITNWELRSV',
 'QVTQVLLHTVATGLATIN',
 'TQTIPVAVTILLCLYSK',
 'QQMFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLTENKYSQLDEEQPMEID',
 'TNMKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNSPFHPLADNKFALTCFSTQFAFACPDGVKHVYQLRARSVSPKLFIRQEEVQELYSPIFLIVAAIVFITLCFTLKRKTE',
 'LNFH',
 'LTSICAF',
 'PFCYSLF',
 'LCLLSFGSHLNCKIIMKLVTPKRT',
 'NFLFS',
 'ESSQL',
 'LHFTKNVVYSHVLNINHM',
 'LMTRVLFTSILNGILE',
 'ELENQHL',
 'LNCAWMRLVLNHPFSTSISVIIQFPVYLLQLIARNLNWVVL',
 'CVVRSMKTF',
 'SIMTFVLF',
 'ISSKRTN',
 'NV',
 '',
 'WTPKSAKCTPHYVWWTLRFNWQ',
 'PEWRTQWGAIKTTSAPRFTQ',
 'YCVLVHRSHSTWQGRP',
 'IPSRTRRSN',
 'HQ',
 'QSR',
 'PNWLLPKSYQTNSWW',
 'R',
 'NERSQSKMVFLLPRNWARSWTSLWC',
 'QRRHHMGCN',
 'GSLEYTKRSHWHPQSC',
 'QCCNRATTSSRNNIAKRLLRRREQRRQSSLFSFLIT',
 'SQQFKKFNSRQQ',
 'GNFSC',
 'NGWQWR',
 'CCSCFAAA',
 'QIEPA',
 'EQNVW',
 'RPTTTRPNCH',
 'EICC',
 'GF',
 'EASAKTYCH',
 'SIQCNTSFRQTWSRTNPRKFWGPGTNQTRN',
 'LQTLAANCTICPQRFSVLRNVAHWHGSHTFGNVVDLHRCHQIG',
 'QRSKFQRSSHFAE',
 'AY',
 'RIQNIPTNRA',
 'KGQKEEG',
 '',
 'NSSLTAETEETANCDSSSCCRFG',
 'FLQTIATIHEQC',
 'LNSGLNSCRPHKADGLYKRFRFSVYDI',
 'STLVQNEFS',
 'LHSTSRCS',
 'L',
 'SHIAIFNQCVTLGRT',
 'KSHHIFTEATRSTIECTVNNARESCLYGRALMCKINFSSAIPM',
 'F',
 '',
 'LLRRMTKKKKKKKKKK']
In [83]:
# Find the largest amino acid fragments
import heapq
In [91]:
# Note: without key=len, strings are compared alphabetically, not by length
heapq.nlargest(10, ncov_aa)
Out[91]:
['YYKLVPFW',
 'YVYTTNSTYWCFGHISIYSSWWYCSYRSNMPCLLFYEV',
 'YVFINF',
 'YTSVYNASLLFLRLFLYLLLWPLLFTQPLL',
 'YSQS',
 'YQIC',
 'YQCTRRFCCL',
 'YPFTS',
 'YNY',
 'YIY']
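Every result above starts with `Y` because `heapq.nlargest` ranks plain strings alphabetically unless it is given a `key`. Below is a minimal sketch of ranking by length instead. The list `fragments` is a made-up stand-in for `ncov_aa`, not data from the genome.

```python
import heapq

# Hypothetical stand-in for ncov_aa: a few short peptide strings
fragments = ["IKGLYLPR", "QTNQLSISCRSVL", "W", "", "TNFKICVAVTRLHA", "YIY"]

# Default comparison is lexicographic, so "YIY" ranks first despite being short
alphabetical_top = heapq.nlargest(2, fragments)

# key=len ranks the strings by sequence length instead
longest_top = heapq.nlargest(2, fragments, key=len)

print(alphabetical_top)
print(longest_top)
```

With `key=len`, the same call on `ncov_aa` should return the longest fragments rather than the alphabetically last ones.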
In [94]:
# Print only the fragments longer than 20 amino acids
for i in ncov_amino_acids:
    if len(i) > 20:
        print(i)
QDTSNSSIFCRLLTVSSVLQPIISTSRFRPGVTER
DGEPCPWFQRENTRPTQFACFTGSRRARTWLWRLRGGGLIRGTSTS
TALCVHQTFGCSNCTSWSCYG
DTWCPCPSCGRNTSGLPQGSSS
HLQWGMSKFCISLKFHNQDYSTKG
RDRHYFGIFFCFHKCFCGNCERFGL
TEINTESSLCICIRGCSCCTINFLPHS
NCSKFCACFTEGRYNNTRWNFTVFTETH
QSSCNGLHYRWCCSVDFAVAN
IQRRNWPTHASKSPKRNYLLRGRNTSHRSVNRGSCLENW
YDGNKQYLHTQRRCTNKGYFW
PYTFFKSLCRYCSHKCLLSCL
NASESFEKSANRQLYNHLPGSGFKWLHCRGGKDSA
EARNSWNCFLEFARNACTCRRNTQINACLCGN
NSCYNATWLCNTWLKFGRSCSVYEISQSASYSFCFFT
PPHASCGHVNDIWTTVWSNLFGWS
RNNELLVSTCQFRFLQKSLERGV
TSYKISSTTGVTFCYDVSTTCSV
RNFVLHRRCFTYKVLRIQRSYYGCFLQRKQLHNNHKTSYL
TDKYYNLVFTIKCLPRFFNLLNRCFRCFNV
WFRFFRHLSFFRNYTNYHFIF
VFLCTWIGCNHAIVFQLFCSTFY
NVHLLCIILLCMEKLCACCRRL
RIICKISVCLLQSAYVSTYTVTRSGISV
TCKECVLRQCLIYFYFSSSARVC
SYTCVPFCCCYFLFNNTCSCHV
QSLPIDCCSHNKRSGFCRAWFAWHDITHN
YVYTTNSTYWCFGHISIYSSWWYCSYRSNMPCLLFYEV
YFTIPYVIHCTLFNTSLLILTWCLFCYLLVLDILSY
CFFFSTYSVDGYVHTFSTFLDNNCLYHLYFHKAFLLVL
VCSHSTRTDFFSVSLLQWFTIWCLPMCYEAQFHY
LCLFLLHAPYGITNWSSCWHRLRR
CFSLVVRCCYKWRQVVSQSIYHNS
HTRTSFCSNWNCRFRYVCFIKRITAKWYEWTYHIG
TMLRCYFPKCSEKNNQGYTPLVVTHNFDFTFSFSPEYSMVFVLFFV
KCLFTFCYGYYCYVCFCNDVCQT
AKRLCYVCISCSVTNPYDSKNCV
YTSVYNASLLFLRLFLYLLLWPLLFTQPLL
NVRCKVHISSLTLSFATTQSRIII
KNGFTTFCFAFHAGCCRHKQAL
CTQQHYQQCKRWLCSLEHNTSYNSSQTNGCHTRL
CNRSACQFNCIIFLCFCCRCC
DVVYTHWYWSGNNSYTGSQYGSRILWWCIVLSVLPLPHRSSKS
CTIVFKRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQAVGACVLCNSQTSLRCGACIRRPFLCCKCCYDHVISTSHKLVLSVNPYVCNAPGCDVTDVTQLYLGGMSYYCKSHKPPISFPLCANGQVFGLYKNTCVGSDNVTDFNAIATCDWTNAGDYILANTCTERLKLFAAETLKATEETFKLSYGIATVREVLSDRELHLSWEVGKPRPPLNRNYVFTGYRVTKNSKVQIGEYTFEKGDYGDAVVYRGTTTYKLNVGDYFVLTSHTVMPLSAPTLVPQEHYVRITGLYPTLNISDEFSSNVANYQKVGMQKYSTLQGPPGTGKSHFAIGLALYYPSARIVYTACSHAAVDALCEKALKYLPIDKCSRIIPARARVECFDKFKVNSTLEQYVFCTVNALPETTADIVVFDEISMATNYDLSVVNARLRAKHYVYIGDPAQLPAPRTLLTKGTLEPEYFNSVCRLMKTIGPDMFLGTCRRCPAEIVDTVSALVYDNKLKAHKDKSAQCFKMFYKGVITHDVSSAINRPQIGVVREFLTRNPAWRKAVFISPYNSQNAVASKILGLPTQTVDSSQGSEYDYVIFTQTTETAHSCNVNRFNVAITRAKVGILCIMSDRDLYDKLQFTSLEIPRRNVATLQAENVTGLFKDCSKVITGLHPTQAPTHLSVDTKFKTEGLCVDIPGIPKDMTYRRLISMMGFKMNYQVNGYPNMFITREEAIRHVRAWIGFDVEGCHATREAVGTNLPLQLGFSTGVNLVAVPTGYVDTPNNTDFSRVSAKPPPGDQFKHLIPLMYKGLPWNVVRIKIVQMLSDTLKNLSDRVVFVLWAHGFELTSMKYFVKIGPERTCCLCDRRATCFSTASDTYACWHHSIGFDYVYNPFMIDVQQWGFTGNLQSNHDLYCQVHGNAHVASCDAIMTRCLAVHECFVKRVDWTIEYPIIGDELKINAACRKVQHMVVKAALLADKFPVLHDIGNPKAIKCVPQADVEWKFYDAQPCSDKAYKIEELFYSYATHSDKFTDGVCLFWNCNVDRYPANSIVCRFDTRVLSNLNLPGCDGGSLYVNKHAFHTPAFDKSAFVNLKQLPFFYYSDSPCESHGKQVVSDIDYVPLKS
ATCITRCNLGGAVCRHHANEYRLYLDAYNMMISAGFSLWVYKQFDTYNLWNTFTRLQSLENVAFNVVNKGHFDGQQGEVPVSIINNTVYTKVDGVDVELFENKTTLPVNVAFELWAKRNIKPVPEVKILNNLGVDIAANTVIWDYKRDAPAHISTIGVCSMTDIAKKPTETICAPLTVFFDGRVDGQVDLFRNARNGVLITEGSVKGLQPSVGPKQASLNGVTLIGEAVKTQFNYYKKVDGVVQQLPETYFTQSRNLQEFKPRSQMEIDFLELAMDEFIERYKLEGYAFEHIVYGDFSHSQLGGLHLLIGLAKRFKESPFELEDFIPMDSTVKNYFITDAQTGSSKCVCSVIDLLLDDFVEIIKSQDLSVVSKVVKVTIDYTEISFMLWCKDGHVETFYPKLQSSQAWQPGVAMPNLYKMQRMLLEKCDLQNYGDSATLPKGIMMNVAKYTQLCQYLNTLTLAVPYNMRVIHFGAGSDKGVAPGTAVLRQWLPTGTLLVDSDLNDFVSDADSTLIGDCATVHTANKWDLIISDMYDPKTKNVTKENDSKEGFFTYICGFIQQKLALGGSVAIKITEHSWNADLYKLMGHFAWWTAFVTNVNASSSEAFLIGCNYLGKPREQIDGYVMHANYIFWRNTNPIQLSSYSLFDMSKFPLKLRGTAVMSLKEGQINDMILSLLSKGRLIIRENNRVVISSDVLVNN
QSFQILSFTFNSGLVLTFLFQCYLVPCYTCLWDQWY
HNKRLDFWYYFRFEDPVPTYC
PSLRNKVYVEILHCRKRNLSNF
LHRSPCCYSCRSTYSYLACLFYRF
CYHRNSTSVYDQDISRLYNVHLW
MQQSFVAIWQFLYTIKPCFNWNSC
TRQKHPRSFCTSQTNLQNTTN
NDCSIHFCTVSGYNHFWLDLWCRCCITNTICYANGL
CYWQNSRLTFFHSKCTWKTSRCGQPKCTSFKHAC
FLWKGLSSYVLPSVSTSWCSLLACDLCPCTRKELHNCSCHLS
AVYKMAMVHLARFYSWLDCHSNGDNYALLYDQLL
ASAQRSQITLHINELMDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFGWLIVGVALLAVFQSASKIITLKKRWQLALSKGVHFVCNLLLLFVTVYSHLLLVAAGLEAPFLYLYALVYFLQSINFVRIIMRLWLCWKCRSKNPLLYDANYFLCWHTNCYDYCIPYNSVTSSIVITSGDGTTSPISEHDYQIGGYTEKWESGVKDCVVLHSYFTSDYYQLYSTQLSTDTGVEHVTFFIYNKIVDEPEEHVQIHTIDGSSGVVNPVMEPIYDEPTTTTSVPL
AQADEYELMYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV
PWQIPTVLLPLKSLKSSLNNGT
VSYSLHGFVFYNLPMPTGIGFCI
CGSATSLLLSDCLRVRVPCGHSIQKLTFFSTCHSMALF
DAVTSRTCLKKSLLLHHERFLITNWELRSV
QQMFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLTENKYSQLDEEQPMEID
TNMKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNSPFHPLADNKFALTCFSTQFAFACPDGVKHVYQLRARSVSPKLFIRQEEVQELYSPIFLIVAAIVFITLCFTLKRKTE
LCLLSFGSHLNCKIIMKLVTPKRT
LNCAWMRLVLNHPFSTSISVIIQFPVYLLQLIARNLNWVVL
WTPKSAKCTPHYVWWTLRFNWQ
NERSQSKMVFLLPRNWARSWTSLWC
QCCNRATTSSRNNIAKRLLRRREQRRQSSLFSFLIT
SIQCNTSFRQTWSRTNPRKFWGPGTNQTRN
LQTLAANCTICPQRFSVLRNVAHWHGSHTFGNVVDLHRCHQIG
NSSLTAETEETANCDSSSCCRFG
LNSGLNSCRPHKADGLYKRFRFSVYDI
KSHHIFTEATRSTIECTVNNARESCLYGRALMCKINFSSAIPM
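The same length filter can be written as a list comprehension, which keeps the result for later steps instead of only printing it. This is a minimal sketch on a toy string. `toy_protein` and the other names are made up for illustration, and it assumes the fragments were produced by splitting the translated protein at `*`, which the empty strings in the list above suggest.

```python
# Toy translated sequence; '*' marks a stop codon, as in Biopython's translate()
toy_protein = "MKV*AAAAAGGGGGCCCCCDDDDDEEEEE**LLLLLKKKKKMMMMMNNNNNPP*Q"

# Split at stop codons to get candidate fragments (an empty string appears
# wherever two stop codons are adjacent)
toy_fragments = toy_protein.split("*")

# Keep only fragments longer than 20 residues
long_fragments = [frag for frag in toy_fragments if len(frag) > 20]

for frag in long_fragments:
    print(len(frag), frag)
```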
In [95]:
# Place our Amino Acids into a DataFrame
import pandas as pd
In [100]:
df = pd.DataFrame({'amino_acids':ncov_aa})
In [101]:
df.head()
Out[101]:
   amino_acids
0  IKGLYLPR
1  QTNQLSISCRSVL
2  TNFKICVAVTRLHA
3  CTHAV
4  LITNYCR
In [102]:
# Add a column holding the length of each amino acid fragment
df['count'] = df['amino_acids'].apply(len)
In [103]:
df.head()
Out[103]:
   amino_acids     count
0  IKGLYLPR            8
1  QTNQLSISCRSVL      13
2  TNFKICVAVTRLHA     14
3  CTHAV               5
4  LITNYCR             7
In [106]:
# Find the lengths of the 20 longest amino acid sequences
df['count'].nlargest(20)
Out[106]:
548    2701
694     290
719     123
695      83
718      63
6        46
464      46
539      43
758      43
771      43
674      42
729      41
242      40
91       39
405      38
410      38
710      38
189      37
408      36
553      36
Name: count, dtype: int64
In [111]:
df.nlargest(20,'count')
Out[111]:
     amino_acids                                      count
548  CTIVFKRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFL…   2701
694  ASAQRSQITLHINELMDLFMRIFTIGTVTLKQGEIKDATPSDFVRA…    290
719  TNMKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNS…    123
695  AQADEYELMYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALR…     83
718  QQMFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSL…     63
6    DGEPCPWFQRENTRPTQFACFTGSRRARTWLWRLRGGGLIRGTSTS      46
464  TMLRCYFPKCSEKNNQGYTPLVVTHNFDFTFSFSPEYSMVFVLFFV      46
539  DVVYTHWYWSGNNSYTGSQYGSRILWWCIVLSVLPLPHRSSKS         43
758  LQTLAANCTICPQRFSVLRNVAHWHGSHTFGNVVDLHRCHQIG         43
771  KSHHIFTEATRSTIECTVNNARESCLYGRALMCKINFSSAIPM         43
674  FLWKGLSSYVLPSVSTSWCSLLACDLCPCTRKELHNCSCHLS          42
729  LNCAWMRLVLNHPFSTSISVIIQFPVYLLQLIARNLNWVVL           41
242  RNFVLHRRCFTYKVLRIQRSYYGCFLQRKQLHNNHKTSYL            40
91   IQRRNWPTHASKSPKRNYLLRGRNTSHRSVNRGSCLENW             39
405  YVYTTNSTYWCFGHISIYSSWWYCSYRSNMPCLLFYEV              38
410  CFFFSTYSVDGYVHTFSTFLDNNCLYHLYFHKAFLLVL              38
710  CGSATSLLLSDCLRVRVPCGHSIQKLTFFSTCHSMALF              38
189  NSCYNATWLCNTWLKFGRSCSVYEISQSASYSFCFFT               37
408  YFTIPYVIHCTLFNTSLLILTWCLFCYLLVLDILSY                36
553  QSFQILSFTFNSGLVLTFLFQCYLVPCYTCLWDQWY                36
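The DataFrame steps above (build the frame, add a length column, take the longest rows) can be tried end to end on a toy list. This is only a sketch. The sequences in `toy_aa` are made up, `length` is a hypothetical column name used here in place of `count`, and dropping empty fragments is an optional extra step that the cells above do not perform.

```python
import pandas as pd

# Made-up fragments, including an empty string like those produced by '**'
toy_aa = ["IKGLYLPR", "", "QTNQLSISCRSVL", "W", "TNFKICVAVTRLHA"]

toy_df = pd.DataFrame({"amino_acids": toy_aa})

# .str.len() is the vectorised equivalent of .apply(len)
toy_df["length"] = toy_df["amino_acids"].str.len()

# Optional: drop empty fragments before ranking
toy_df = toy_df[toy_df["length"] > 0]

# Two longest fragments, keeping their original row labels
top2 = toy_df.nlargest(2, "length")
print(top2)
```

Keeping the original row labels makes it possible to look a fragment up in the full list afterwards, for example `ncov_aa[548]` for the longest one above.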
In [113]:
# Most Frequent Amino Acid
# First, print the full translated protein ('*' marks a stop codon)
print(ncov_protein)
IKGLYLPR*QTNQLSISCRSVL*TNFKICVAVTRLHA*CTHAV*LITNYCR*QDTSNSSIFCRLLTVSSVLQPIISTSRFRPGVTER*DGEPCPWFQRENTRPTQFACFTGSRRARTWLWRLRGGGLIRGTSTS*RWHLWLSRS*KRRFAST*TALCVHQTFGCSNCTSWSCYG*AGSRTRRHSVRS*W*DTWCPCPSCGRNTSGLPQGSSS*ER**RSWWP*LRRRSKVI*LRRRAWH*SL*RFSRKLEH*T*QWCYP*THA*A*RRGIHSLCR*QLLWP*WLPS*VH*RPSSTCW*SFMHFVRTTGLY*H*EGCILLP*T*A*NCLVHGTF*KEL*IADTF*N*IGKEI*HLQWGMSKFCISLKFHNQDYSTKG*KEKA*WLYG*NSICLSSCVTK*MQPNVPFNSHEV*SLW*NFMADGRFC*SHLRILWH*EFD*RRCHYLWLLTPKCCC*NLLSSMSQFRSRT*A*SCRIP**IWLENHSS*GWSHYCLWRLCVLLCWLP*QVCLLGSTC*R*HRL*PYRCCWRRFRRS**QPS*NTPKRESQHQYCW*L*T**RDRHYFGIFFCFHKCFCGNCERFGL*SIQTNC*ILW*F*SYKRKS*KRCLEYW*TEINTESSLCICIRGCSCCTINFLPHS*NCSKFCACFTEGRYNNTRWNFTVFTETH*CYDVHI*FGY*QSSCNGLHYRWCCSVDFAVAN*HLWHCL*KTQTRP*LA*REV*GRCRVS*RRLGNC*IYLNLCL*NCRWTNCHLCKGN*GECSDIL*ACK*IFGFVC*LYHYWWS*T*SLEFR*NICHALKGIVQKVC*IQRRNWPTHASKSPKRNYLLRGRNTSHRSVNRGSCLENW*FTTIRTTY**SC*SSIGWYTSLY*RAYVARNQRHRKVLCPCT*YDGNKQYLHTQRRCTNKGYFW**HCDRSARLQECEYHF*T**KD**ST**EVLCLYS*TRYRSK*VRLCCGRCCHKNFATSI*ITYTTGH*FR*VEYGYILLI**VW*V*IGFTYVLFFLPSR*G*RRR*L*RRRV*AINSI*VWY*R*LPR*TFGIWCHFCCSST*RRARRRLVR***STNCWSTRRQ*GQSDNYYSNNC*GSTSIRDGTYTSCSDY*SE*F*WLFKTY*QCIH*KCRHCGRS*KGKTNSGC*CSQCLP*TWRRCCRSLK*GY*QCHAS*I**LHSY*WTT*SGW*LCFKRTQSC*TLSSCCRPKC*QR*RHSTS*ECL*KF*SARSSTCTIIISWYFWC*PYTFFKSLCRYCSHKCLLSCL**KSL*QTCFKLFGNEE*KAS*TKDR*DS*RGS*AIYN*K*TFS*TEKTR**ENQSLC*RSYNNSGRN*VPHRKLVTLY*H*WQSSSRFCHSC**H*HHFLKERCSIYSG*CCSRGCFNCCGYTY*KGWWHY*NASESFEKSANRQLYNHLPGSGFKWLHCRGGKDSA*KV*KCLLHSTIYYL**EARNSWNCFLEFARNACTCRRNTQINACLCGN*SHSFNYTA*I*GY*NTRGCG*LWC*ILLLHQ*NNCSVTYQHT*RSK*NSCYNATWLCNTWLKFGRSCSVYEISQSASYSFCFFT*CCYSV*WLSYFFF*NT*RTFY*NHLTCWFL*RLVLFWTIYTTRYRIS*ER**KCILH**SYHIPPRW*SYHL*QS*DTSFFERSEDY*GVYNSRQH*PPHASCGHVNDIWTTVWSNLFGWS*CY*NKTS*FT*R*NILCFT***HSTC*GF*VLPHN*S*FSG*VHVSIKSH*KVEIPTS*WFNFY*MGR*QLLSCHCIVNTPTNRVEV*STCSTRCLLQSKGW*SC*LLCTYLSLL**DSR*VR*C*RNNELLVSTCQFRFLQKSLERGV*NLWTTADNP*GCRSCYVHGHTFL*TI*ERCSDTLYVW*TSYKISSTTGVTFCYDVSTTCSV*T*AWYIYLC**VHW*LPVWSL*TYNF*RNFVLHRRCFTYKVLRIQRSYYGCFLQRKQLHNNHKTSYL*IGWCCLYRN*P*VGQLL*ERQFLFHRATN
*SCTKPTISKRKLR*F*VCM**YQIC**FKPVNWL*ETCFKRA*SYIFP*LKW*CGGY*L*TLHTLF*ERS*IVT*TYCLAC*QCN**SHV*TKYLVYTLSLEHKTS*NIKFV*CTEVRGRAGNG*SCLRRSKTSL*RSSGKSYHTERRS*V*CENYRSCRRHYT*TSK**FKNYRRGWPHRSNGCLCRQF*SYY*ET**II*SIRFENPCYSWFSCC**CPLGYYS*LC*AFS*QSC*YNY*HSYTVFKPCLY*LYALFLYFIATIVYFY*KYKF*N*SIYADYYSKEYC*ECR*ILSRGFI*LFEVT*FF*TDKYYNLVFTIKCLPRFFNLLNRCFRCFNV*FRHAFLLYWLQRRLFELY*CHYCNLLYWFYTL*CLS*WFRFFRHLSFFRNYTNYHFIF*MGFNCFWLSCRVVFGIYSFH*VFLCTWIGCNHAIVFQLFCSTFY**FLAYVVNN*SCTNGPDFSYG*NVHLLCIILLCMEKLCACCRRL*FINLYDVLQT**SNKSRMYNYC*WC*KVLLCLC*WR*RLLQTTQLELC*L*YILCW*YIY***SCERLVTTV*KTNKSY*PVFLHR**CYSEEWFHPSLL**SWSKDL*KTFSLSFC*LRQPES**H*RFIAY*CYSF*W*IKM*RIICKISVCLLQSAYVSTYTVTRSGISV*CW**CGSCS*NV*CLR*YVFINF*RTNGKTQNTSCNCRS*TCKECVLRQCLIYFYFSSSARVC*FRCRN*RCC*MS*IVTSI*HRSYWR*L**LYAHL*QS*KHDTP*PWCLY*L*CASY*CAGSKKSQHCFDMER*RFHVIV*TTTKTNT*CC*KE*LTF*VDMCNY*TSC*CCNNKDST*GW*NC**LVEAVN*SYTCVPFCCCYFLFNNTCSCHV*TY*LFK*NHRIQGY*WWCHS*HSIYRYLFC*QTC*F*HMV*PAWW*LY**QSLPIDCCSHNKRSGFCRAWFAWHDITHN*W*LFAFLT*SF*CSW*HLLHTIKTYRVH*LCNISLCFGC*MYNF*RCFW*ASTILL*YQCTRRFCCL*KFTP*HTLCAHGWLYYSIS*HLP*RFC*SGNNF*F*VL*ARHL*KIRSWCLCIY*W*MGT*Q*LLQIFTRSFLWCRCCKFTY*YVYTTNSTYWCFGHISIYSSWWYCSYRSNMPCLLFYEV*KSFW*IQSCSCL*YFTIPYVIHCTLFNTSLLILTWCLFCYLLVLDILSY**CFFFSTYSVDGYVHTFSTFLDNNCLYHLYFHKAFLLVL**LPKETCSL*WCFL*YF*RSCAVHLFVK*RNVSKVA**CAITSYAI**ILSSL**VQVF*WSNGYN*LQRSCLLSSRKGSQ*LQ*LRF*CSLPTTTNLYHLSCFAEWF*KNGIPIW*S*GLYGTSNLWYNYT*RSLA**RSLLSKTCDLHL*RHA*P*L*RFTHS*V*S*FLGTGW*CSTQGYWTFYAKLCT*A*G*YSQS*DT*V*VCSHSTRTDFFSVSLLQWFTIWCLPMCYEAQFHY*GFIP*WFMW*CWF*HRL*LCLFLLHAPYGITNWSSCWHRLRR*LLWTFC*QANSTSSWYGHNYYS*CFSLVVRCCYKWRQVVSQSIYHNS**L*PCGYEVQL*TSNTRPC*HTRTSFCSNWNCRFRYVCFIKRITAKWYEWTYHIG*CFIRR*IYTF*CC*TMLRCYFPKCSEKNNQGYTPLVVTHNFDFTFSFSPEYSMVFVLFFV*KCLFTFCYGYYCYVCFCNDVCQT*ACISLFVFVTFSCHCSLF*YGLYAC*LGDAYYDMVGYG*Y*FVWF*AKRLCYVCISCSVTNPYDSKNCV**WC*ESVDTYECLDTRL*SLLW*CFRSSHFHVGSYNLCYF*LLRCSYNCHVFGQRYCFYVC*VLPYFLHNW*YTSVYNASLLFLRLFLYLLLWPLLFTQPLL*TDSWCL*LLSFYTGV*IYEFTGTTPTQE*HRCLQTQH*IVGCWWQTLYQSSHCTV*NVRCKVHISSLTLSFATTQSRIII*IVGSMCPVTQ*HSLS*RYY*SL*KNG
FTTFCFAFHAGCCRHKQAL*RNAGQQGNLTSYSLRV*FPSIICSFCYCSRSL*AGCC*W*F*SCS*KVEEVFECG*I*I*P*CSHAT*VGKDG*SSYDPNV*TG*I*GQEGKSY*CYADNAFHYA*KVG**CTQQHYQQCKRWLCSLEHNTSYNSSQTNGCHTRL*HI*KYV*WYNIYLCISIVGNPTGCRCR**NCST**N*YGQFT*FSMASYCNSFKGQFCCQITE**A*SCCTTTDVLCCRYYTNCLH**QCVSLLQHNKGR*VCTCTVIRFTGFEMG*IP*E*WNWYYLYRTGTTL*VCYRHT*RS*SEVFILY*RIKQPK*RYGTW*FSCHSTSTSW*CNRSACQFNCIIFLCFCCRCC*SLQRLSS*WGTTNH*LC*DVVYTHWYWSGNNSYTGSQYGSRILWWCIVLSVLPLPHRSSKS*RIL*LKR*VCTNTYNLC**PCGFYT*KHSLYRLRYVERLWL*L*STPRTHASVS*CTIVFKRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQAVGACVLCNSQTSLRCGACIRRPFLCCKCCYDHVISTSHKLVLSVNPYVCNAPGCDVTDVTQLYLGGMSYYCKSHKPPISFPLCANGQVFGLYKNTCVGSDNVTDFNAIATCDWTNAGDYILANTCTERLKLFAAETLKATEETFKLSYGIATVREVLSDRELHLSWEVGKPRPPLNRNYVFTGYRVTKNSKVQIGEYTFEKGDYGDAVVYRGTTTYKLNVGDYFVLTSHTVMPLSAPTLVPQEHYVRITGLYPTLNISDEFSSNVANYQKVGMQKYSTLQGPPGTGKSHFAIGLALYYPSARIVYTACSHAAVDALCEKALKYLPIDKCSRIIPARARVECFDKFKVNSTLEQYVFCTVNALPETTADIVVFDEISMATNYDLSVVNARLRAKHYVYIGDPAQLPAPRTLLTKGTLEPEYFNSVCRLMKTIGPDMFLGTCRRCPAEIVDTVSALVYDNKLKAHKDKSAQCFKMFYKGVITHDVSSAINRPQIGVVREFLTRNPAWRKAVFISPYNSQNAVASKILGLPTQTVDSSQGSEYDYVIFTQTTETAHSCNVNRFNVAITRAKVGILCIMSDRDLYDKLQFT
SLEIPRRNVATLQAENVTGLFKDCSKVITGLHPTQAPTHLSVDTKFKTEGLCVDIPGIPKDMTYRRLISMMGFKMNYQVNGYPNMFITREEAIRHVRAWIGFDVEGCHATREAVGTNLPLQLGFSTGVNLVAVPTGYVDTPNNTDFSRVSAKPPPGDQFKHLIPLMYKGLPWNVVRIKIVQMLSDTLKNLSDRVVFVLWAHGFELTSMKYFVKIGPERTCCLCDRRATCFSTASDTYACWHHSIGFDYVYNPFMIDVQQWGFTGNLQSNHDLYCQVHGNAHVASCDAIMTRCLAVHECFVKRVDWTIEYPIIGDELKINAACRKVQHMVVKAALLADKFPVLHDIGNPKAIKCVPQADVEWKFYDAQPCSDKAYKIEELFYSYATHSDKFTDGVCLFWNCNVDRYPANSIVCRFDTRVLSNLNLPGCDGGSLYVNKHAFHTPAFDKSAFVNLKQLPFFYYSDSPCESHGKQVVSDIDYVPLKSATCITRCNLGGAVCRHHANEYRLYLDAYNMMISAGFSLWVYKQFDTYNLWNTFTRLQSLENVAFNVVNKGHFDGQQGEVPVSIINNTVYTKVDGVDVELFENKTTLPVNVAFELWAKRNIKPVPEVKILNNLGVDIAANTVIWDYKRDAPAHISTIGVCSMTDIAKKPTETICAPLTVFFDGRVDGQVDLFRNARNGVLITEGSVKGLQPSVGPKQASLNGVTLIGEAVKTQFNYYKKVDGVVQQLPETYFTQSRNLQEFKPRSQMEIDFLELAMDEFIERYKLEGYAFEHIVYGDFSHSQLGGLHLLIGLAKRFKESPFELEDFIPMDSTVKNYFITDAQTGSSKCVCSVIDLLLDDFVEIIKSQDLSVVSKVVKVTIDYTEISFMLWCKDGHVETFYPKLQSSQAWQPGVAMPNLYKMQRMLLEKCDLQNYGDSATLPKGIMMNVAKYTQLCQYLNTLTLAVPYNMRVIHFGAGSDKGVAPGTAVLRQWLPTGTLLVDSDLNDFVSDADSTLIGDCATVHTANKWDLIISDMYDPKTKNVTKENDSKEGFFTYICGFIQQKLALGGSVAIKITEHSWNADLYKLMGHFAWWTAFVTNVNASSSEAFLIGCNYLGKPREQIDGYVMHANYIFWRNTNPIQLSSYSLFDMSKFPLKLRGTAVMSLKEGQINDMILSLLSKGRLIIRENNRVVISSDVLVNN*TNNVCFSCFIATSL*SVC*SYNQNSITPCIH*FFHTWCLLP*QSFQILSFTFNSGLVLTFLFQCYLVPCYTCLWDQWY*EV**PCPTI**WCLFCFH*EV*HNKRLDFWYYFRFEDPVPTYC**RY*CCY*SL*ISIL**SIFGCLLPQKQQKLDGK*VQSLF*CE*LHF*ICLSAFSYGP*RKTG*FQKS*GICV*EY*WLF*NIF*AHAY*FSA*SPSGFFGFRTIGRFANRY*HH*VSNFTCFT*KLFDSW*FFFRLDSWCCSLLCGLSST*DFSIKI**KWNHYRCCRLCT*PSLRNKVYVEILHCRKRNLSNF*L*SPTNRIYC*IS*YYKLVPFW*SF*RHQICICLCLEQEENQQLCC*LFCPI*FRIIFHF*VLWSVSY*IK*SLLY*CLCRFICN*R**SQTNRSRANWKDC*L*L*ITR*FYRLRYSLEF*QS*F*GWW*L*LPV*IV*EV*SQTF*ERYFN*NLSGR*HTL*WC*RF*LLLSFTIIWFPTH*WCWLPTIQSSSTFF*TSTCTSNCLWT*KVY*FG*KQMCQFQLQWFNRHRCSY*V*QKVSAFPTIWQRHC*HY*CCP*STDT*DS*HYTMFFWWCQCYNTRNKYF*PGCCSLSGC*LHRSPCCYSCRSTYSYLACLFYRF*CFSNTCRLFNRG*TCQQLI*V*HTHWCRYMR*LSDSD*FSSAGT*CS*SIHHCLHYVTWCRKFSCLL**LYCHTHKFYY*CYHRNSTSVYDQDISRLYNVHLW*FN*MQQSFVAIWQFLYTIKPCFNWNSC*TRQKHPRSFCTSQTNLQNTTN*RFWWF*FFTNITRSIKTK
QEVIY*RSTFQQSDTCRCWLHQTIW*LPW*YCC*RPHLCTKV*RPYCFATFAHR*NDCSIHFCTVSGYNHFWLDLWCRCCITNTICYANGL*V*WYWSYTECSL*EPKIDCQPI**CYWQNSRLTFFHSKCTWKTSRCGQPKCTSFKHAC*TT*LQFWCNFKCFK*YPFTS*QS*G*SAN**VDHRQTSKFADICDSTIN*SCRNQSFC*SCCY*NVRVCTWTIKKS*FLWKGLSSYVLPSVSTSWCSLLACDLCPCTRKELHNCSCHLS*WKSTLSS*RCLCFKWHTLVCNTKEFL*TTNHYYRQHICVW*L*CCNRNCQQHSL*SFAT*IRLIQGGVR*IF*ESYITRC*FR*HLWH*CFSCKHSKRN*PPQ*GCQEFK*ISHRSPRTWKV*AVYKMAMVHLARFYSWLDCHSNGDNYALLYDQLL*LSQGLLFLWILLQI**RRL*ASAQRSQITLHINELMDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFGWLIVGVALLAVFQSASKIITLKKRWQLALSKGVHFVCNLLLLFVTVYSHLLLVAAGLEAPFLYLYALVYFLQSINFVRIIMRLWLCWKCRSKNPLLYDANYFLCWHTNCYDYCIPYNSVTSSIVITSGDGTTSPISEHDYQIGGYTEKWESGVKDCVVLHSYFTSDYYQLYSTQLSTDTGVEHVTFFIYNKIVDEPEEHVQIHTIDGSSGVVNPVMEPIYDEPTTTTSVPL*AQADEYELMYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV*TN*ILY*FFCLEL*F*PWQIPTVLLPLKSLKSSLNNGT***VSYSLHGFVFYNLPMPTGIGFCI*LS*FSSGCYGQ*L*LVLCLLLFTE*IGSPVELLSQWLVL*A*CGSATSLLLSDCLRVRVPCGHSIQKLTFFSTCHSMALF*PDRF*KVNS*SEL*SFVDIFVLLDTI*DAVTSRTCLKKSLLLHHERFLITNWELRSV*QVTQVLLHTVATGLATIN*TQTIPVAVTILLCLYSK*QQMFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLTENKYSQLDEEQPMEID*TNMKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNSPFHPLADNKFALTCFSTQFAFACPDGVKHVYQLRARSVSPKLFIRQEEVQELYSPIFLIVAAIVFITLCFTLKRKTE*LNFH*LTSICAF*PFCYSLF*LCLLSFGSHLNCKIIMKLVTPKRT*NFLFS*ESSQL*LHFTKNVVYSHVLNINHM*LMTRVLFTSILNGILE*ELENQHL*LNCAWMRLVLNHPFSTSISVIIQFPVYLLQLIARNLNWVVL*CVVRSMKTF*SIMTFVLF*ISSKRTN*NV**WTPKSAKCTPHYVWWTLRFNWQ*PEWRTQWGAIKTTSAPRFTQ*YCVLVHRSHSTWQGRP*IPSRTRRSN*HQ*QSR*PNWLLPKSYQTNSWW*R*NERSQSKMVFLLPRNWARSWTSLWC*QRRHHMGCN*GSLEYTKRSHWHPQSC*QCCNRATTSSRNNIAKRLLRRREQRRQSSLFSFLIT*SQQFKKFNSRQQ*GNFSC*NGWQWR*CCSCFAAA*QIEPA*EQNVW*RPTTTRPNCH*EICC*GF*EASAKTYCH*SIQCNTSFRQTWSRTNPRKFWGPGTNQTRN*LQTLAANCTICPQRFSVLRNVAHWHGSHTFGNVVDLHRCHQIG*QRSKFQRSSHFAE*AY*RIQNIPTNRA*KGQKEEG**NSSLTAETEETANCDSSSCCRFG*FLQTIATIHEQC*LNSGLNSCRPHKADGLYKRFRFSVYDI*STLVQNEFS*LHSTSRCS*L*SHIAIFNQCVTLGRT*KSHHIFTEATRSTIECTVNNARESCLYGRALMCKINFSSAIPM*F**LLRRMTKKKKKKKKKK
In [114]:
from collections import Counter
In [115]:
Counter(ncov_protein).most_common(10)
Out[115]:
[('L', 886),
 ('S', 810),
 ('*', 774),
 ('T', 679),
 ('C', 635),
 ('F', 593),
 ('R', 558),
 ('V', 548),
 ('Y', 505),
 ('N', 472)]
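
The third most common symbol in the count above, `*`, is not an amino acid: it is the symbol Biopython's translation writes for a stop codon. Below is a minimal sketch of how the stop symbols could be left out before computing residue frequencies. It assumes a short made-up string in place of the real translation; the names `toy_protein` and `residue_frequencies` are illustrative and not part of the notebook.

```python
from collections import Counter

# Hypothetical toy translation; in the notebook this would be str(ncov_protein).
toy_protein = "MKT*LLS*AVL*"


def residue_frequencies(protein, stop_symbol="*"):
    """Return (counts, fractions) of residues, ignoring stop symbols."""
    residues = [aa for aa in protein if aa != stop_symbol]
    counts = Counter(residues)
    total = len(residues)
    # Fraction of each residue among all non-stop residues.
    fractions = {aa: n / total for aa, n in counts.items()}
    return counts, fractions


counts, fractions = residue_frequencies(toy_protein)
print(counts.most_common(3))
```

Running the same helper on `str(ncov_protein)` should give the ranking above with the `*` entry removed.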

3D Structure of Covid

  • File Format
    • pdb: PDBParser() (legacy format)
    • cif: MMCIFParser() (newer mmCIF format)
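
Since the two formats need different parsers, one option is to pick the parser from the file extension. The following is a sketch under assumptions: the helper names `parser_name_for` and `load_structure` and the extension table are hypothetical, and only `PDBParser`, `MMCIFParser` and their `get_structure` method come from Bio.PDB.

```python
from pathlib import Path

# Map file extensions to the name of the Bio.PDB parser class that reads them.
PARSER_BY_EXTENSION = {
    ".pdb": "PDBParser",
    ".ent": "PDBParser",
    ".cif": "MMCIFParser",
}


def parser_name_for(path):
    """Return the Bio.PDB parser class name suited to the file at `path`."""
    ext = Path(path).suffix.lower()
    try:
        return PARSER_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"Unsupported structure file extension: {ext!r}")


def load_structure(structure_id, path):
    """Parse a structure file with the parser matching its extension."""
    # Imported here so the helper above can be used without Biopython installed.
    import Bio.PDB

    parser_cls = getattr(Bio.PDB, parser_name_for(path))
    parser = parser_cls(QUIET=True)  # QUIET=True suppresses parser warnings
    return parser.get_structure(structure_id, path)
```

With a local copy of the file, `load_structure("mmdb_6LU7", "mmdb_6LU7.pdb")` would return the same kind of `Structure` object as calling `PDBParser` directly.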

Packages

  • pip install nglview
  • pip install py3Dmol
  • pip install pytraj
  • jupyter-nbextension enable nglview --py --sys-prefix
  • nglview enable
  • jupyter-labextension install @jupyter-widgets/jupyterlab-manager
  • jupyter-labextension install nglview-js-widgets
In [116]:
from Bio.PDB import PDBParser,MMCIFParser
In [117]:
# Reading a PDB File
parser = PDBParser()
structure = parser.get_structure("mmdb_6LU7", "mmdb_6LU7.pdb")
In [118]:
structure
Out[118]:
<Structure id=mmdb_6LU7>
In [163]:
# Chains in the Protein Structure
model = structure[0]
In [165]:
for chain in model:
    print(f'chain {chain},chain_ID: {chain.id}')
chain <Chain id=A>,chain_ID: A
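
For larger structures, a per-chain summary is often easier to read than printing each object. The sketch below assumes only that the structure can be iterated as model, chain, residue, atom and that models and chains have an `id` attribute, as Bio.PDB objects do; the function name `summarize_structure` is hypothetical.

```python
def summarize_structure(structure):
    """Return one (model_id, chain_id, n_residues, n_atoms) tuple per chain."""
    rows = []
    for model in structure:
        for chain in model:
            residues = list(chain)
            # Each residue iterates over its atoms.
            n_atoms = sum(len(list(residue)) for residue in residues)
            rows.append((model.id, chain.id, len(residues), n_atoms))
    return rows
```

Calling `summarize_structure(structure)` on the structure parsed above gives one row per chain in every model, not only in `structure[0]`.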
In [168]:
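# Check the atoms: walk the hierarchy structure > model > chain > residue > atom
# (note: this loop rebinds the name `model` defined in the earlier cell)
for model in structure:
    print(model)
    for chain in model:
        print(chain)
        for residue in chain:
            for atom in residue:
                print(atom)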
<Model id=0>
<Chain id=A>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom NE1>
<Atom CE2>
<Atom CE3>
<Atom CZ2>
<Atom CZ3>
<Atom CH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom ND1>
<Atom CD2>
<Atom CE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom ND1>
<Atom CD2>
<Atom CE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom ND1>
<Atom CD2>
<Atom CE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom ND1>
<Atom CD2>
<Atom CE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom ND1>
<Atom CD2>
<Atom CE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom ND1>
<Atom CD2>
<Atom CE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom NE1>
<Atom CE2>
<Atom CE3>
<Atom CZ2>
<Atom CZ3>
<Atom CH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom NE1>
<Atom CE2>
<Atom CE3>
<Atom CZ2>
<Atom CZ3>
<Atom CH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom ND1>
<Atom CD2>
<Atom CE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Model id=1>
<Chain id=C>
<Atom C4>
<Atom C5>
<Atom C6>
<Atom O1>
<Atom N2>
<Atom C3>
<Atom C41>
<Atom O42>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom C19>
<Atom C20>
<Atom C21>
<Atom C22>
<Atom C25>
<Atom C26>
<Atom C27>
<Atom C28>
<Atom N6>
<Atom C29>
<Atom O8>
<Atom N5>
<Atom O7>
<Atom C>
<Atom O>
<Atom C1>
<Atom C2>
<Atom C3>
<Atom C4>
<Atom C5>
<Atom C6>
<Model id=2>
<Chain id=A>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom NE1>
<Atom CE2>
<Atom CE3>
<Atom CZ2>
<Atom CZ3>
<Atom CH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom ND1>
<Atom CD2>
<Atom CE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom ND1>
<Atom CD2>
<Atom CE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom ND1>
<Atom CD2>
<Atom CE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom ND1>
<Atom CD2>
<Atom CE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom ND1>
<Atom CD2>
<Atom CE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom ND1>
<Atom CD2>
<Atom CE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom NE1>
<Atom CE2>
<Atom CE3>
<Atom CZ2>
<Atom CZ3>
<Atom CH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom NE1>
<Atom CE2>
<Atom CE3>
<Atom CZ2>
<Atom CZ3>
<Atom CH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom OH>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom ND1>
<Atom CD2>
<Atom CE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom CE>
<Atom NZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom SD>
<Atom CE>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom ND2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom CD1>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom OE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom OD1>
<Atom OD2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom NE>
<Atom CZ>
<Atom NH1>
<Atom NH2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom SG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom OG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom CE1>
<Atom CE2>
<Atom CZ>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD>
<Atom OE1>
<Atom NE2>
<Model id=3>
<Chain id=C>
<Atom C4>
<Atom C5>
<Atom C6>
<Atom O1>
<Atom N2>
<Atom C3>
<Atom C41>
<Atom O42>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG1>
<Atom CG2>
<Atom N>
<Atom CA>
<Atom C>
<Atom O>
<Atom CB>
<Atom CG>
<Atom CD1>
<Atom CD2>
<Atom C19>
<Atom C20>
<Atom C21>
<Atom C22>
<Atom C25>
<Atom C26>
<Atom C27>
<Atom C28>
<Atom N6>
<Atom C29>
<Atom O8>
<Atom N5>
<Atom O7>
<Atom C>
<Atom O>
<Atom C1>
<Atom C2>
<Atom C3>
<Atom C4>
<Atom C5>
<Atom C6>
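The listing above prints one line per atom, so residue boundaries only show up as the repeating N, CA, C, O backbone pattern. A more compact summary groups the atoms by chain and residue. The sketch below does this with the standard library only, reading the fixed-column ATOM/HETATM records of a PDB file. The sample records, the `LIG` residue name and the function name `group_atoms` are made up for illustration and are not taken from 6LU7.

```python
from collections import Counter, defaultdict

# Hypothetical PDB records, truncated after the residue number (column 26).
SAMPLE_PDB = "\n".join((
    "ATOM      1  N   ALA A   1",
    "ATOM      2  CA  ALA A   1",
    "ATOM      3  C   ALA A   1",
    "ATOM      4  O   ALA A   1",
    "ATOM      5  CB  ALA A   1",
    "ATOM      6  N   GLY A   2",
    "ATOM      7  CA  GLY A   2",
    "ATOM      8  C   GLY A   2",
    "ATOM      9  O   GLY A   2",
    "HETATM   10  C4  LIG C   1",
    "HETATM   11  O1  LIG C   1",
))


def group_atoms(pdb_text):
    """Map (chain id, residue number, residue name) to a list of atom names."""
    residues = defaultdict(list)
    for line in pdb_text.splitlines():
        # Only ATOM and HETATM records describe atoms.
        if line[:6].strip() not in ("ATOM", "HETATM"):
            continue
        atom_name = line[12:16].strip()   # columns 13-16
        res_name = line[17:20].strip()    # columns 18-20
        chain_id = line[21]               # column 22
        res_seq = int(line[22:26])        # columns 23-26
        residues[(chain_id, res_seq, res_name)].append(atom_name)
    return dict(residues)


residues = group_atoms(SAMPLE_PDB)

# Count atoms per chain instead of printing every atom.
atoms_per_chain = Counter()
for (chain_id, _, _), names in residues.items():
    atoms_per_chain[chain_id] += len(names)

for key, names in residues.items():
    print(key, names)
print(dict(atoms_per_chain))
```

To try it on a real file, pass the text of the downloaded PDB file to `group_atoms` instead of `SAMPLE_PDB`.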

Visualizing the 3D structure

  • nglview
  • py3Dmol
  • pytraj
  • squiggle
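
Each of these viewers is a separate package, so it can help to check which ones are installed before running the cells that follow. This is a minimal sketch using only the standard library. The import names in `VIEWERS` are assumed to match the package names listed above, and `available_viewers` is a hypothetical helper, not part of any of these libraries.

```python
from importlib.util import find_spec

# Assumed import names for the four tools listed above.
VIEWERS = ("nglview", "py3Dmol", "pytraj", "squiggle")


def available_viewers(names=VIEWERS):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


for name, found in available_viewers().items():
    print(f"{name:10s} {'installed' if found else 'missing'}")
```

A module reported as missing would need to be installed before the matching cells can run.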
In [153]:
# View our 3D Structure
import nglview as nv
In [154]:
nv.demo()
NGLWidget()
In [155]:
view = nv.show_biopython(structure)
In [156]:
view
NGLWidget()
In [124]:
import py3Dmol
In [125]:
# Create a viewer for PDB entry 6LU7, looked up by its PDB ID
view1 = py3Dmol.view(query='pdb:6LU7')
In [126]:
# Draw the structure as a cartoon using the 'spectrum' colour scheme
view1.setStyle({'cartoon':{'color':'spectrum'}})
view1

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol

Out[126]:
<py3Dmol.view at 0x7ff36e221a10>
In [129]:
dir(py3Dmol.view)
Out[129]:
['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_make_html',
 '_repr_html_',
 'getModel',
 'insert',
 'model',
 'png',
 'show',
 'update']
In [157]:
view.render_image()
Image(value=b'', width='99%')
In [158]:
view._display_image()
Out[158]:
In [147]:
import pytraj as pt
In [148]:
# Load file
ncov_traj = pt.load("mmdb_6LU7.pdb")
In [159]:
view3 = nv.show_pytraj(ncov_traj)
In [160]:
view3
NGLWidget(max_frame=1)
In [161]:
view3.render_image()
Image(value=b'', width='99%')
In [162]:
view3._display_image()
Out[162]:

To conclude, we have seen how to use BioPython and how to analyse the coronavirus DNA sequence.

You can also check out the video tutorials.

Thanks For Your Attention

Jesus Saves

By Jesse E. Agbe (JCharis)
