2015 bioinformatics python_io_wim_vancriekinge
FBW
20-10-2015
Wim Van Criekinge
Bioinformatics.be
Overview
What is Python ?
Why Python 4 Bioinformatics ?
How to Python
IDE: Eclipse & PyDev / Athena
Code Sharing: Git(hub)
Strings
Regular expressions
Python
• Programming languages are overrated
– If you are going into bioinformatics you will probably
learn/need several
– If you know one you know 90% of a second
• Choice does matter but it matters far less than people think it
does
• Why Python?
– Lets you start useful programs asap
– Built-in libraries, plus packages such as BioPython
– Free, most platforms, widely (scientifically) used
• Versus Perl?
– Incredibly similar
– Consistent syntax, indentation
Version 2.7 and 3.4 on athena.ugent.be
Where is the workspace ?
GitHub: Hosted GIT
• Largest open source git hosting site
• Public and private options
• User-centric rather than project-centric
• http://github.ugent.be (use your UGent
login and password)
– Accept invitation from Bioinformatics-I-2015
URI:
– https://github.ugent.be/Bioinformatics-I-2015/Python.git
Run Install.py (is BioPython installed ?)
import platform
# pip.get_installed_distributions() was removed in pip 10;
# importlib.metadata (Python 3.8+) lists installed packages instead.
from importlib import metadata
print("Python " + platform.python_version() + " installed packages:")
installed_packages_list = sorted("%s==%s" % (d.metadata["Name"], d.version)
                                 for d in metadata.distributions())
print(*installed_packages_list, sep="\n")
Control Structures
if condition:
    statements
[elif condition:
    statements] ...
else:
    statements
while condition:
    statements
for var in sequence:
    statements
break
continue
Lists
• Flexible arrays, not Lisp-like linked
lists
• a = [99, "bottles of beer", ["on", "the", "wall"]]
• Same operators as for strings
• a+b, a*3, a[0], a[-1], a[1:], len(a)
• Item and slice assignment
• a[0] = 98
• a[1:2] = ["bottles", "of", "beer"]
-> [98, "bottles", "of", "beer", ["on", "the", "wall"]]
• del a[-1] # -> [98, "bottles", "of", "beer"]
Dictionaries
• Hash tables, "associative arrays"
• d = {"duck": "eend", "water": "water"}
• Lookup:
• d["duck"] -> "eend"
• d["back"] # raises KeyError exception
• Delete, insert, overwrite:
• del d["water"] # {"duck": "eend"}
• d["back"] = "rug" # {"duck": "eend", "back": "rug"}
• d["duck"] = "duik" # {"duck": "duik", "back": "rug"}
Regex.py
import re
text = 'abbaaabbbbaaaaa'
pattern = 'ab'
for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    print('Found "%s" at %d:%d' % (text[s:e], s, e))
Find the answer in ultimate-sequence.txt ?
>ultimate-sequence
ACTCGTTATGATATTTTTTTTGAACGTGAAAATACT
TTTCGTGCTATGGAAGGACTCGTTATCGTGAAGT
TGAACGTTCTGAATGTATGCCTCTTGAAATGGA
AAATACTCATTGTTTATCTGAAATTTGAATGGGA
ATTTTATCTACAATGTTTTATTCTTACAGAACAT
TAAATTGTGTTATGTTTCATTTCACATTTTAGTA
GTTTTTTCAGTGAAAGCTTGAAAACCACCAAGA
AGAAAAGCTGGTATGCGTAGCTATGTATATATA
AAATTAGATTTTCCACAAAAAATGATCTGATAA
ACCTTCTCTGTTGGCTCCAAGTATAAGTACGAAA
AGAAATACGTTCCCAAGAATTAGCTTCATGAGT
AAGAAGAAAAGCTGGTATGCGTAGCTATGTATA
TATAAAATTAGATTTTCCACAAAAAATGATCTG
ATAA
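Before searching the sequence, the file has to be loaded into a single string. The following is a minimal sketch, assuming ultimate-sequence.txt holds one FASTA record as shown above; the helper names read_fasta and find_all are illustrative and not part of the course material.

```python
import re


def read_fasta(path):
    """Return (header, sequence) for a single-record FASTA file."""
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                header = line[1:]  # drop the leading '>'
            else:
                chunks.append(line.upper())
    return header, "".join(chunks)


def find_all(pattern, sequence):
    """Return the 0-based start of every non-overlapping regex match."""
    return [match.start() for match in re.finditer(pattern, sequence)]
```

For example, find_all("ATG", sequence) would list candidate start codons once the sequence has been read with read_fasta("ultimate-sequence.txt").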
Question 2
AA1 = {
'UUU':'F','UUC':'F','UUA':'L','UUG':'L','UCU':'S','UCC':'S','UCA':'S','UCG':'S',
'UAU':'Y','UAC':'Y','UAA':'*','UAG':'*','UGU':'C','UGC':'C','UGA':'*','UGG':'W',
'CUU':'L','CUC':'L','CUA':'L','CUG':'L','CCU':'P','CCC':'P','CCA':'P','CCG':'P',
'CAU':'H','CAC':'H','CAA':'Q','CAG':'Q','CGU':'R','CGC':'R','CGA':'R','CGG':'R',
'AUU':'I','AUC':'I','AUA':'I','AUG':'M','ACU':'T','ACC':'T','ACA':'T','ACG':'T',
'AAU':'N','AAC':'N','AAA':'K','AAG':'K','AGU':'S','AGC':'S','AGA':'R','AGG':'R',
'GUU':'V','GUC':'V','GUA':'V','GUG':'V','GCU':'A','GCC':'A','GCA':'A','GCG':'A',
'GAU':'D','GAC':'D','GAA':'E','GAG':'E','GGU':'G','GGC':'G','GGA':'G','GGG':'G' }
Hint: Use Dictionaries
Hint 2: Translations
Python way:
tab = str.maketrans("ACGU","UGCA")
sequence = sequence.translate(tab)[::-1]
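The two hints can be combined into a small translation pipeline. This is a sketch under stated assumptions: the codon table is rebuilt compactly from the standard genetic code (it should match the AA1 dictionary), and the function names transcribe, reverse_complement and translate are made up for illustration.

```python
# Standard genetic code in UCAG order; equivalent to a 64-entry codon dictionary.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def transcribe(dna):
    """Turn a DNA string into RNA so it matches the U-based codon table."""
    return dna.upper().replace("T", "U")


def reverse_complement(rna):
    """Complement each base, then reverse the string (the 'Python way' hint)."""
    table = str.maketrans("ACGU", "UGCA")
    return rna.translate(table)[::-1]


def translate(rna, stop_at_stop=True):
    """Translate codon by codon; unknown codons become 'X' (an assumption)."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE.get(rna[i:i + 3], "X")
        if amino_acid == "*" and stop_at_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)
```

Translating all six reading frames then amounts to calling translate on rna[0:], rna[1:], rna[2:] and on the same three slices of reverse_complement(rna).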
Reading Files
name = open("filename")
– opens the given file for reading, and returns a file object
name.read() - file's entire contents as a string
name.readline() - next line from file as a string
name.readlines() - file's contents as a list of lines
– the lines from a file object can also be read using a for loop
>>> f = open("hours.txt")
>>> f.read()
'123 Susan 12.5 8.1 7.6 3.2\n456 Brad 4.0 11.6 6.5 2.7 12\n789 Jenn 8.0 8.0 8.0 8.0 7.5\n'
File Input Template
• A template for reading files in Python:
name = open("filename")
for line in name:
    statements
>>> input = open("hours.txt")
>>> for line in input:
...     print(line.strip()) # strip() removes \n
123 Susan 12.5 8.1 7.6 3.2
456 Brad 4.0 11.6 6.5 2.7 12
789 Jenn 8.0 8.0 8.0 8.0 7.5
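Once each line is available, it usually needs to be split into fields. The sketch below assumes the hours.txt layout shown above (an id, a name, then hour values separated by whitespace); total_hours is a hypothetical helper name.

```python
def total_hours(lines):
    """Map each name to the sum of the hour values on its line."""
    totals = {}
    for line in lines:
        fields = line.split()  # split() also discards the trailing newline
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        name = fields[1]
        totals[name] = sum(float(value) for value in fields[2:])
    return totals


sample = [
    "123 Susan 12.5 8.1 7.6 3.2\n",
    "456 Brad 4.0 11.6 6.5 2.7 12\n",
    "789 Jenn 8.0 8.0 8.0 8.0 7.5\n",
]
for name, hours in total_hours(sample).items():
    print(name, round(hours, 1))
```

Because a file object yields its lines in a for loop, an open file can be passed in place of the sample list.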
Writing Files
name = open("filename", "w")
name = open("filename", "a")
– opens file for write (deletes previous contents), or
– opens file for append (new data goes after previous data)
name.write(str) - writes the given string to the file
name.close() - saves file once writing is done
>>> out = open("output.txt", "w")
>>> out.write("Hello, world!\n")
>>> out.write("How are you?")
>>> out.close()
>>> open("output.txt").read()
'Hello, world!\nHow are you?'
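An alternative to calling close() by hand is the with statement, which closes the file even if an error occurs. The sketch below shows write and append modes side by side; write_lines and read_text are illustrative names, and the demo writes to a temporary directory rather than the working directory.

```python
import os
import tempfile


def write_lines(path, lines, mode="w"):
    """Write each string to path on its own line; the file closes on exit."""
    with open(path, mode) as out:
        for line in lines:
            out.write(line + "\n")


def read_text(path):
    """Return the whole file as one string."""
    with open(path) as handle:
        return handle.read()


demo_path = os.path.join(tempfile.mkdtemp(), "output.txt")
write_lines(demo_path, ["Hello, world!"])           # "w" replaces old contents
write_lines(demo_path, ["How are you?"], mode="a")  # "a" adds after them
print(repr(read_text(demo_path)))
```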
Question 3. Swiss-Knife.py
• Using a database as input ! Parse
the entire Swiss-Prot collection
– How many entries are there ?
– Average protein length (in aa) and
molecular weight (MW)
– Relative frequency of amino acids
• Compare to the ones used to construct
the PAM scoring matrices of 1978 and
1991
Question 3: Getting the database
Uniprot_sprot.dat.gz – 528 MB
(on GitHub under Files)
Unzipped 2.92 GB !
http://www.ebi.ac.uk/uniprot/download-center
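Since the unzipped file is close to 3 GB, it may be simpler to stream the compressed file record by record than to unzip and load it whole. The following is a minimal sketch, assuming the usual Swiss-Prot flat-file layout (an "ID" line opens a record, sequence lines follow the "SQ" line, and "//" closes the record); parse_swissprot and summarize are hypothetical names, and the molecular weight on the SQ line is not parsed here.

```python
import gzip


def parse_swissprot(handle):
    """Yield (entry_name, sequence) for each record in a flat-file handle."""
    entry_name = None
    in_sequence = False
    chunks = []
    for line in handle:
        if line.startswith("ID   "):
            entry_name = line.split()[1]
        elif line.startswith("SQ   "):
            in_sequence = True
        elif line.startswith("//"):
            yield entry_name, "".join(chunks)
            entry_name, in_sequence, chunks = None, False, []
        elif in_sequence:
            # Sequence lines are split into blocks by spaces; join them.
            chunks.append(line.replace(" ", "").strip())


def summarize(path):
    """Return (number_of_entries, average_length) for a gzipped flat file."""
    count = 0
    total_length = 0
    with gzip.open(path, "rt") as handle:  # "rt" decodes to text line by line
        for _name, sequence in parse_swissprot(handle):
            count += 1
            total_length += len(sequence)
    average = total_length / count if count else 0.0
    return count, average
```

Because parse_swissprot is a generator, only one record is held in memory at a time.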
Amino acid frequencies
1978 1991
L 0.085 0.091
A 0.087 0.077
G 0.089 0.074
S 0.070 0.069
V 0.065 0.066
E 0.050 0.062
T 0.058 0.059
K 0.081 0.059
I 0.037 0.053
D 0.047 0.052
R 0.041 0.051
P 0.051 0.051
N 0.040 0.043
Q 0.038 0.041
F 0.040 0.040
Y 0.030 0.032
M 0.015 0.024
H 0.034 0.023
C 0.033 0.020
W 0.010 0.014
Second step: Frequencies of Occurrence
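Relative frequencies can be computed by counting residues over all sequences and dividing by the total. The sketch below uses collections.Counter and copies the 1991 column from the table above for comparison; the function names are illustrative assumptions.

```python
from collections import Counter

# 1991 column of the table above, keyed by one-letter amino acid code.
FREQ_1991 = {
    "L": 0.091, "A": 0.077, "G": 0.074, "S": 0.069, "V": 0.066,
    "E": 0.062, "T": 0.059, "K": 0.059, "I": 0.053, "D": 0.052,
    "R": 0.051, "P": 0.051, "N": 0.043, "Q": 0.041, "F": 0.040,
    "Y": 0.032, "M": 0.024, "H": 0.023, "C": 0.020, "W": 0.014,
}


def relative_frequencies(sequences):
    """Return {residue: fraction of all residues} over an iterable of strings."""
    counts = Counter()
    for sequence in sequences:
        counts.update(sequence)  # counts every character of the string
    total = sum(counts.values())
    if total == 0:
        return {}
    return {residue: n / total for residue, n in counts.items()}


def compare(observed, reference=FREQ_1991):
    """Return (residue, observed, reference) rows, highest observed first."""
    rows = [(aa, observed.get(aa, 0.0), ref) for aa, ref in reference.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Passing a generator of sequences, rather than a list, keeps memory use flat when the input is the full database.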
Extra Questions
• How many records have a sequence of length 260?
• What are the first 20 residues of 143X_MAIZE?
• What is the identifier for the record with the
shortest sequence? Is there more than one record
with that length?
• What is the identifier for the record with the
longest sequence? Is there more than one record
with that length?
• How many contain the subsequence "ARRA"?
• How many contain the substring "KCIP-1" in the
description?
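Most of these questions reduce to a filter or a min/max over the parsed records. The sketch below assumes each record has already been reduced to an (identifier, description, sequence) tuple; the three demo records are invented for illustration and are not real Swiss-Prot entries.

```python
def count_with_length(records, length):
    """Count records whose sequence has exactly the given length."""
    return sum(1 for _id, _desc, seq in records if len(seq) == length)


def extremes(records, pick=min):
    """Return (length, identifiers) for the shortest (min) or longest (max)."""
    target = pick(len(seq) for _id, _desc, seq in records)
    return target, [rid for rid, _desc, seq in records if len(seq) == target]


def count_containing(records, text, field=2):
    """Count records with text in the sequence (field=2) or description (1)."""
    return sum(1 for record in records if text in record[field])


# Hypothetical demo data, not taken from the database.
records = [
    ("DEMO1", "Hypothetical protein KCIP-1 homolog", "MARRAKL"),
    ("DEMO2", "Hypothetical protein", "MKL"),
    ("DEMO3", "Another hypothetical protein", "MAL"),
]
shortest_length, shortest_ids = extremes(records, min)
print(shortest_length, shortest_ids)
```

Returning a list of identifiers answers the "more than one record with that length?" part directly.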
Question 4
• Program your own prosite parser !
• Download prosite pattern database
(prosite.dat)
• Automatically generate >2000 search
patterns, and search in sequence set
from question 1
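The core of such a parser is turning a PROSITE pattern into a regular expression. The following is a rough sketch, assuming the common pattern syntax (elements separated by "-", x for any residue, [..] for allowed and {..} for forbidden residues, (n) or (n,m) for repetition, < and > as terminal anchors); prosite_to_regex is a hypothetical name, and rarer forms such as ">" inside square brackets are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Convert the text of a PROSITE pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")  # patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]  # N-terminal anchor
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # C-terminal anchor
        repeat = ""
        if element.endswith(")"):
            # Split off a repetition such as (3) or (2,4).
            element, _, count = element[:-1].partition("(")
            repeat = "{" + count + "}"
        if element == "x":
            core = "."
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element  # single residue or [..] set, already valid regex
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
match = re.search(regex, "AANGSAA")
print(regex, match.start() if match else None)
```

Applied to each pattern line of prosite.dat, this would yield one compiled regex per pattern to run against the sequence set.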
