
Predikin and PredikinDB: tools to predict protein kinase peptide specificity

  • 1. Outline of talk
      - Introduction to protein kinases
      - Prediction of substrate specificity
      - Predikin and PredikinDB
      - Evaluation
      Neil Saunders, School of Molecular and Microbial Sciences, University of Queensland
  • 2. Introduction to protein kinases
      Biochemistry: protein-OH + ATP → protein-OPi + ADP (catalysed by a kinase)
      - Two major (eukaryotic) types: (1) Ser/Thr; (2) Tyr
      - ~2% of human genes encode a protein kinase
      - At least 30-50% of human proteins are phosphorylated
      - Kinases regulate essentially every cellular process
  • 3. Complex signalling networks
      How do protein kinases find their targets?
  • 4. Kinase specificity – substrate recruitment
      Substrate recruitment: any process that brings substrate to kinase
      - docking
      - binding to scaffolding protein(s)
      - colocalisation
      - coregulation
      [Figure panels: Docking interactions; Colocalisation (LOCATE: calcium/calmodulin-dependent protein kinase IV)]
      Remenyi et al. (2006) Docking interactions in protein kinase and phosphatase networks. Curr Opin Struct Biol 16: 676-685
  • 5. Kinase specificity - peptide specificity
      Amino acid frequency in substrate sequences at X{7}[ST]X{7} sites
      [Figure panels: CK-2, PKA, MAPK]
  • 6. Structural basis for peptide specificity
      Substrate heptapeptide binding to protein kinase A
      [Figure panels: PKA surface + heptapeptide RRASIHD; schematic of heptapeptide + PKA SDRs]
  • 7. Accurate location of key residues using HMMER
      HMM consensus aligned to snf1p (residues 55-306):

      HMM       *->YellkklGkGaFGkVylardkktgrlvAiKvik..........eril
      snf1p  55    YQIVKTLGEGSFGKVKLAYHTTTGQKVALKIINkkvlaksdmqGRIE  101

      HMM          rEikiLkk.dHPNIVkLydvfed.dklylVmEyceGdlGdLfdllkkrgr
      snf1p 102    REISYLRLlRHPHIIKLYDVIKSkDEIIMVIEYAGN---ELFDYIVQRDK  148

      HMM          rglrkvlsE.earfyfrQilsaLeYLHsqgIiHRDLKPeNiLLds..hvK
      snf1p 149    ------MSEqEARRFFQQIISAVEYCHRHKIVHRDLKPENLLLDEhlNVK  192

      HMM          laDFGlArql......ttfvGTpeYmAPEvl...gYgkpavDiWSlGcil
      snf1p 193    IADFGLSNIMtdgnflKTSCGSPNYAAPEVIsgkLYAGPEVDVWSCGVIL  242

      HMM          yElltGkpPFp..qldlifkkig..........SpeakdLikklLvkdPe
      snf1p 243    YVMLCRRLPFDdeSIPVLFKNISngvytlpkflSPGAAGLIKRMLIVNPL  292

      HMM          kRlta.eaLedeldikaHPff<-*
      snf1p 293    NRISIhEIMQ-------DDWF  306

      GkG, AiK, GdL, DFG and APE are the anchor positions.
      Substrate heptapeptide, positions -3 to +3: X X X [ST] X X X
  • 8. Predikin: components
      - PredikinDB: database of phosphorylation sites
      - Predikin.pm: Perl module to process kinases
      - Web server
  • 9. Why not phospho.ELM?
      A phospho.ELM entry (http://phospho.elm.eu.org):

      +--------+-------------+----------+------+---------+--------------+--------+------------------------+
      | acc    | sequence    | position | code | pmids   | kinases      | source | entry_date             |
      +--------+-------------+----------+------+---------+--------------+--------+------------------------+
      | P04083 | AMVSEFLK... | 20       | Y    | 2457390 | Abl;Src;EGFR | LTP    | 2004-12-31 00:00:00+01 |
      +--------+-------------+----------+------+---------+--------------+--------+------------------------+

      Problems
      - Incorrect/missing accession numbers
      - Phosphoresidues not at given positions
      - Multiple kinase entries per substrate
      - Inconsistent names for kinase families
      - No way to link kinase name with kinase sequence
      phospho.ELM is derived from SwissProt entries:
      FT   MOD_RES  26  26  Phosphoserine (by PKC).
  • 10. PredikinDB construction
      Substrate UniProt entry

      ID   IF2A_MOUSE
      AC   Q6ZWX6; Q3TIQ0;
      OS   Mus musculus (Mouse).
      FT   MOD_RES  49  49  Phosphoserine (by HRI) (By similarity).
      FT   MOD_RES  52  52  Phosphoserine (by EIF2AK3, GCN2, HRI and PKR).

      Entries in table_kinases that match kinase name and species

      Q9Z2R9  EIF2AK1  Eif2ak1; Hri
      Q9Z2B5  EIF2AK3  Eif2ak3; Pek; Perk
      Q9QZ05  EIF2AK4  Eif2ak4; Gcn2; Kiaa1338
      Q03963  EIF2AK2  Eif2ak2; Pkr; Prkr; Tik

      Entry in table_psites

      substrate_ac  residue  posn  hepta    conf  kinase_name  kinase_ac
      Q6ZWX6        S        49    ILLSELS  2     HRI          Q9Z2R9
      Q6ZWX6        S        52    SELSRRR  1     EIF2AK3      Q9Z2B5
      Q6ZWX6        S        52    SELSRRR  1     GCN2         Q9QZ05
      Q6ZWX6        S        52    SELSRRR  1     HRI          Q9Z2R9
      Q6ZWX6        S        52    SELSRRR  1     PKR          Q03963

      PredikinDB links phosphorylation sites to their specific kinase sequences
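The construction step on slide 10 can be sketched as follows. This is a minimal illustration in Python rather than the Perl of Predikin.pm, and it is not Predikin's own parser: the regular expression, the function names and the rule that "By similarity" maps to conf = 2 are all assumptions inferred from the single example on the slide.

```python
import re

# Assumed layout of a UniProt feature line as shown on the slide; this
# pattern is illustrative and is not taken from Predikin itself.
MOD_RES = re.compile(
    r"^FT\s+MOD_RES\s+(\d+)\s+\d+\s+"
    r"Phospho(serine|threonine|tyrosine)\s+\(by ([^)]+)\)(.*)$"
)
RESIDUE_CODE = {"serine": "S", "threonine": "T", "tyrosine": "Y"}

# Kinase rows copied from the slide: (accession, gene name, synonyms).
KINASES = [
    ("Q9Z2R9", "EIF2AK1", ["Eif2ak1", "Hri"]),
    ("Q9Z2B5", "EIF2AK3", ["Eif2ak3", "Pek", "Perk"]),
    ("Q9QZ05", "EIF2AK4", ["Eif2ak4", "Gcn2", "Kiaa1338"]),
    ("Q03963", "EIF2AK2", ["Eif2ak2", "Pkr", "Prkr", "Tik"]),
]


def parse_mod_res(line):
    """Return (position, residue, kinase names, by_similarity), or None."""
    match = MOD_RES.match(line)
    if match is None:
        return None
    # Kinase names are separated by commas and a final "and".
    names = [n.strip() for n in re.split(r",|\band\b", match.group(3))]
    names = [n for n in names if n]
    by_similarity = "by similarity" in match.group(4).lower()
    return int(match.group(1)), RESIDUE_CODE[match.group(2)], names, by_similarity


def find_kinase_ac(name, kinases=KINASES):
    """Case-insensitive match of a kinase name against gene name or synonyms."""
    wanted = name.lower()
    for accession, gene, synonyms in kinases:
        if wanted == gene.lower() or wanted in [s.lower() for s in synonyms]:
            return accession
    return None


def psite_rows(substrate_ac, line):
    """Build one (substrate, residue, position, conf, kinase, kinase_ac) row per kinase."""
    parsed = parse_mod_res(line)
    if parsed is None:
        return []
    position, residue, names, by_similarity = parsed
    conf = 2 if by_similarity else 1  # assumption inferred from the slide's example
    return [
        (substrate_ac, residue, position, conf, name, find_kinase_ac(name))
        for name in names
    ]


rows = psite_rows(
    "Q6ZWX6",
    "FT   MOD_RES  52  52  Phosphoserine (by EIF2AK3, GCN2, HRI and PKR).",
)
```

Each kinase named in a MOD_RES line becomes its own row, which is what lets a site be joined to one specific kinase sequence rather than to a family name. The heptapeptide column is omitted here because it needs the substrate sequence.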
  • 11. PredikinDB – table schema
      - table_kinases: kinase_ac (AC), kinase_id (ID), domain, domain_seq, kinase_type, kinase_name (GN Name), kinase_syn (GN Synonyms), panther_name, panther_ac, panther_evalue, ksd_name, ksd_ac, ksd_evalue, species (OS), kingdom (OC), + 38 SDR-related residues
      - table_psites: ID, substrate_ac (AC), residue (MOD_RES), position (MOD_RES), hepta, confidence (MOD_RES), kinase_name (MOD_RES), kinase_ac (AC)
      - table_substrates: substrate_ac (AC), substrate_id (ID), species (OS), kingdom (OC)
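To show how the three tables relate, here is a small sketch using an in-memory SQLite database from Python. PredikinDB itself runs on MySQL; the column types, the key constraints and the reduced column set are assumptions made for illustration, and the rows are those shown on slide 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced, assumed versions of the three tables (types and keys are guesses).
conn.executescript(
    """
    CREATE TABLE table_kinases (
        kinase_ac   TEXT PRIMARY KEY,
        kinase_name TEXT,
        kinase_syn  TEXT
    );
    CREATE TABLE table_substrates (
        substrate_ac TEXT PRIMARY KEY,
        substrate_id TEXT,
        species      TEXT
    );
    CREATE TABLE table_psites (
        id           INTEGER PRIMARY KEY,
        substrate_ac TEXT REFERENCES table_substrates (substrate_ac),
        residue      TEXT,
        position     INTEGER,
        hepta        TEXT,
        confidence   INTEGER,
        kinase_name  TEXT,
        kinase_ac    TEXT REFERENCES table_kinases (kinase_ac)
    );
    """
)

conn.executemany(
    "INSERT INTO table_kinases VALUES (?, ?, ?)",
    [
        ("Q9Z2R9", "EIF2AK1", "Eif2ak1;Hri"),
        ("Q9Z2B5", "EIF2AK3", "Eif2ak3;Pek;Perk"),
        ("Q9QZ05", "EIF2AK4", "Eif2ak4;Gcn2;Kiaa1338"),
        ("Q03963", "EIF2AK2", "Eif2ak2;Pkr;Prkr;Tik"),
    ],
)
conn.execute(
    "INSERT INTO table_substrates VALUES (?, ?, ?)",
    ("Q6ZWX6", "IF2A_MOUSE", "Mus musculus"),
)
conn.executemany(
    "INSERT INTO table_psites "
    "(substrate_ac, residue, position, hepta, confidence, kinase_name, kinase_ac) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Q6ZWX6", "S", 49, "ILLSELS", 2, "HRI", "Q9Z2R9"),
        ("Q6ZWX6", "S", 52, "SELSRRR", 1, "EIF2AK3", "Q9Z2B5"),
        ("Q6ZWX6", "S", 52, "SELSRRR", 1, "GCN2", "Q9QZ05"),
        ("Q6ZWX6", "S", 52, "SELSRRR", 1, "HRI", "Q9Z2R9"),
        ("Q6ZWX6", "S", 52, "SELSRRR", 1, "PKR", "Q03963"),
    ],
)

# Join each phosphorylation site to the kinase entry that modifies it.
query = """
    SELECT p.position, p.hepta, p.kinase_name, k.kinase_ac, k.kinase_syn
    FROM table_psites AS p
    JOIN table_kinases AS k ON k.kinase_ac = p.kinase_ac
    WHERE p.substrate_ac = ? AND p.position = ?
    ORDER BY p.id
"""
site_52 = conn.execute(query, ("Q6ZWX6", 52)).fetchall()
```

The join on kinase_ac is the link the deck emphasises: a site row carries the accession of a specific kinase, so the kinase's sequence-derived columns are one join away.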
  • 12. The Predikin Perl module
      External tools
      - HMMER + HMM libraries
      - pantherScore
      - DisEMBL, TMHMM (filters)
      Bioperl libraries (http://www.bioperl.org)
      Processing steps (flowchart)
      - protein kinase sequence
      - find catalytic domains
      - assign kinase type
      - locate SDRs
      - assign KSD family
      - assign PANTHER family
      - find substrate XXX[STY]XXX
      - make kinase scoring matrix
      - score XXX[STY]XXX sites
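The "find substrate XXX[STY]XXX" step amounts to sliding a seven-residue window over the substrate. A minimal Python sketch follows (the module itself is Perl). The function name, the `offset` parameter and the decision to skip sites within three residues of either terminus are assumptions, not documented Predikin behaviour.

```python
def find_heptapeptides(sequence, residues="STY", flank=3, offset=0):
    """Return (position, heptapeptide) for each candidate phosphorylation site.

    Positions are 1-based; `offset` shifts them when `sequence` is a fragment
    that does not start at residue 1. Sites too close to either end to have a
    full window are skipped (an assumption made for this sketch).
    """
    sites = []
    for index, amino_acid in enumerate(sequence):
        has_full_window = index >= flank and index + flank < len(sequence)
        if amino_acid in residues and has_full_window:
            window = sequence[index - flank:index + flank + 1]
            sites.append((index + 1 + offset, window))
    return sites


# Fragment of IF2A_MOUSE reconstructed from the two heptapeptides on slide 10;
# it starts at residue 46, hence offset=45.
sites = find_heptapeptides("ILLSELSRRR", offset=45)
```

A plain loop is used instead of a regular expression because candidate windows overlap, and a simple regex scan would skip overlapping matches.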
  • 13. Scoring matrices: SDR method
      Query kinase: GEL+1 = E, GEL+3 = F, GEL+4 = S, Type = Ser/Thr
      SQL query for heptapeptide position -3:

          select hepta from psites, kinases
          where kinase_type = 'Ser/Thr'
          and psites.kinase_ac = kinases.kinase_ac
          and GELp1 rlike '[DEN]'
          and GELp3 rlike '[FWY]'
          and GELp4 rlike '[ANST]'

      Heptapeptides: QFSTVKG, EQFSTVK, RSVSEAA, RSGSSPN, RHDSGLD, RRMSDEF, ARGSFDA
      Repeat for positions -2 to +3 and corresponding SDRs
      Frequency matrix → PWM (weights) matrix → score substrates
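The last line of slide 13 (frequency matrix, then weight matrix, then scoring) can be sketched in Python as below. The slides do not give the weighting formula, so the log-odds weights with a pseudocount and a uniform background are an assumption, as are the function names. The sketch also builds every column from one peptide set, whereas Predikin retrieves a separate set per position.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Heptapeptides returned by the example query on the slide.
HEPTAPEPTIDES = [
    "QFSTVKG", "EQFSTVK", "RSVSEAA", "RSGSSPN",
    "RHDSGLD", "RRMSDEF", "ARGSFDA",
]


def frequency_matrix(peptides):
    """Count residues at each of the seven positions (-3 to +3)."""
    return [Counter(peptide[i] for peptide in peptides) for i in range(7)]


def weight_matrix(freqs, pseudocount=1.0):
    """Convert counts to log2-odds weights against a uniform background.

    The pseudocount and the uniform background are assumptions; the slides
    only say that a PWM is derived from the frequency matrix.
    """
    background = 1.0 / len(AMINO_ACIDS)
    weights = []
    for column in freqs:
        total = sum(column.values()) + pseudocount * len(AMINO_ACIDS)
        weights.append({
            aa: math.log2((column[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return weights


def score(weights, heptapeptide):
    """Sum the position-specific weights of a candidate heptapeptide."""
    return sum(column[aa] for column, aa in zip(weights, heptapeptide))


freqs = frequency_matrix(HEPTAPEPTIDES)
weights = weight_matrix(freqs)
```

With these seven peptides, arginine dominates position -3, so a candidate with R at -3 scores higher than one with a residue never seen there. The scores are raw log-odds sums and are not on the same scale as the scores reported by the web server.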
  • 14. Scoring matrices: filters and cutoffs

      Residue  Phosphosites  Disordered (1)   TM helix (2)
      S        24 637        23 081           16
      T         5 405         4 898            5
      Y         4 285         3 318           12
      Total    34 327        31 297 (91.2%)   33 (0.1%)

      (1) Most sites disordered (DisEMBL prediction)
      (2) Most sites not in TM helix (TMHMM prediction)
  • 15. Evaluation of Predikin
      A brief area under ROC curve (AROC) primer
      [Figure: scores of annotated and unannotated sites with TP, FP, TN and FN regions, and the resulting ROC curve]
      Outline of evaluation procedure
      - Obtain kinase-substrate pairs from PredikinDB
      - Construct scoring matrix for kinase (don't include its substrates)
      - Score all XXX[ST]XXX sites in corresponding substrate
      - Label sites as 1 (known, annotated) or 0 (unknown, unannotated)
      - Generate AROC values using R package ROCR
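The talk computes AROC with the R package ROCR. For readers who want to see what the number means, here is a small Python sketch. It uses the standard equivalence between the area under the ROC curve and the probability that a randomly chosen annotated site outscores a randomly chosen unannotated one, with ties counted as one half. The function name and the example scores are illustrative only.

```python
def area_under_roc(scores, labels):
    """Area under the ROC curve from site scores and 1/0 annotation labels.

    Computed as the fraction of (annotated, unannotated) pairs in which the
    annotated site has the higher score; tied pairs count as one half.
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one annotated and one unannotated site")
    wins = 0.0
    for positive in positives:
        for negative in negatives:
            if positive > negative:
                wins += 1.0
            elif positive == negative:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up scores for four sites, two annotated (1) and two unannotated (0).
example = area_under_roc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

A value of 1.0 means every annotated site outranks every unannotated one, and 0.5 is what random scoring would give. The pairwise loop is quadratic, which is fine for a sketch but slow for thousands of sites.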
  • 16. Evaluation of comparable methods
      Comparison with existing methods is not easy
      - Existing tools take a substrate and score sites based on a kinase family
      - Predikin takes kinase(s) + substrate(s) and scores sites based on kinase sequence
      Problems to solve
      - Determine the kinase families common to other tools
      - Relate families to kinase sequences in PredikinDB
      - Submit corresponding substrates to each server (no API, standalone tools, web services...)
      - Collate scored XXX[ST]XXX sites common to all methods
      - Format data for AROC analysis
      Example submission using HTML::Form (NetPhosK)

          use LWP::UserAgent;
          use HTML::Form;

          # get the form ($url holds the address of the submission page)
          my $ua       = LWP::UserAgent->new;
          my $response = $ua->get($url);
          my @forms    = HTML::Form->parse($response);

          # set the values
          $forms[0]->value('SEQSUB', "myfile.fa");
          $forms[0]->value('threshold', '0.00');

          # submit the form
          my $output = $ua->request($forms[0]->click);

          # parse output
  • 17. Evaluation results
      - Predikin performance equals or exceeds that of existing methods
      - Performance may depend on type of kinase

      Ser/Thr kinase substrates
      Method      Score  Predikin SDR score  Sites used (+/-)
      NetPhosK    0.93   0.90                75/5334
      KinasePhos  0.80   0.88                76/5663
      GPS         0.88   0.87                72/5307
      PPSP        0.92   0.87                75/3778
      Scansite    0.96   0.87                55/2936

      CMGC kinase substrates
      Method      Score  Predikin SDR score  Sites used (+/-)
      NetPhosK    0.62   0.96                211/9146
      KinasePhos  0.94   0.96                211/9106
      GPS         0.93   0.96                211/9146
      PPSP        0.94   0.96                208/8158
      Scansite    0.95   0.97                175/5039
  • 18. Usage cases
      Substrates for CLA4
      - A PAK/STE-20 kinase in S. cerevisiae
      - Phosphorylates own activation loop T727?
      - Evidence for this in literature

      kinase      substrate                score
      CLA4     1  CLA4      727  KRATMVG   92.93
      CLA4     1  YOL113W   541  KRATMVG   92.93
      CLA4     1  YHL021C   129  KGSSFVS   91.87
      CLA4     1  YKR010C   527  KRNSITE   91.70
      CLA4     1  YNL049C   526  RATSFFG   90.14
      CLA4     1  YDL056W   477  KRKSTTP   88.70
      CLA4     1  YOL157C   527  KLFSFTK   88.25
      CLA4     1  YBR198C   157  RAYSMLK   87.71
      CLA4     1  YML076C   878  HRESMTG   87.62
      CLA4     1  YOR181W   619  KRKTKVG   87.37

      Kinases for acetyl CoA carboxylase
      - Known phosphorylation site on S80
      - Phosphorylated in AMPK knockout mice
      - Suggested alternate kinases: IKK α/β
      - Experimental evidence (Bruce Kemp)

      kinase           substrate            score
      NP_001547     1  COA1   80  SSMSGLH   85.49
      NP_001269     1  COA1   80  SSMSGLH   85.49
      XP_042066     1  COA1   80  SSMSGLH   75.77
      XP_001128827  1  COA1   80  SSMSGLH   75.77
      NP_001013725  1  COA1   80  SSMSGLH   74.72
      NP_004064     1  COA1   80  SSMSGLH   73.84
      NP_006613     1  COA1   80  SSMSGLH   73.84
      NP_001778     1  COA1   80  SSMSGLH   72.21
      XP_001128005  1  COA1   80  SSMSGLH   72.21
      NP_277021     1  COA1   80  SSMSGLH   72.21
  • 19. The Predikin webserver: implementation
      http://predikin.biosci.uq.edu.au
      [Architecture diagram components]
      - Client (browser)
      - Apache Server
      - PHP
      - perl.so
      - Predikin.pm
      - MySQL (PredikinDB)
      - External tools: DisEMBL, TMHMM, BLAST, pantherScore, HMMER
  • 20. The Predikin webserver: screenshots
      [Screenshot: Kinase sequence submission]
  • 21. The Predikin webserver: screenshots
      [Screenshot: Frequency and weight matrices]
  • 22. The Predikin webserver: screenshots
      [Screenshot: Scored sites]
  • 23. Acknowledgements
      Funding & advice (UQ)
      - Bostjan Kobe (ARC, NHMRC)
      - Thomas Huber
      Testing
      - Bruce Kemp (SVI Melbourne)
      - Brenda Andrews (U. Toronto)
      Predikin 1.0 (UQ)
      - Ross Brinkworth
      - Robert Breinl
      General
      - Kobe Lab