Using the MMDB C++
library from Python
CRBM
Background
CCP4 has traditionally developed and maintained programs for macromolecular crystallography, mostly in Fortran. We recognised a need for object-oriented programming, particularly to handle more complex experimental data. Hence the development of two C++ libraries:
• Clipper, for experimental data, by Kevin Cowtan
• MMDB (macromolecular database), by Eugene Krissinel
CCP4mg
The CCP4mg project began after the library project. We want to use the libraries and integrate them with other scientific methods being developed in C++, but we recognise the advantages of Python for rapid coding and of the Python libraries (and thanks to Warren and Michel for demonstrating that MG (molecular graphics) in Python will work!).
SWIG
SWIG automatically generates code to export a C/C++ interface to Python (and other scripting languages). We had some problems initially, particularly with exporting overloaded method names; these were solved in SWIG version 1.3.17 and later.
Our build currently auto-generates wrappers for all of MMDB. The result is a huge file, and generating it is the slow step in building the program. (Solution: we need to be more discerning about what we interface.)
C++-Python Interface Issues
It is not efficient to pass large quantities of data through this interface, so any functionality that requires looping over all atoms (or residues) is written in C++. (Should we just export the whole data structure in one go?)
In our code, Python does not access the underlying data. It is a puppet-master which usually deals with pointers to the model, handles to selection sets and a few individual atom/residue/chain pointers.
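The puppet-master arrangement can be illustrated with a small pure-Python mock. In this sketch the hypothetical `CoreModel` class stands in for the C++ side. It owns the coordinates and counts every call that crosses the (simulated) language boundary. None of these names are MMDB's API. The first function drives the per-atom loop from Python. The second only passes a selection handle and lets the core do the loop.

```python
class CoreModel:
    """Hypothetical stand-in for the C++ side, which owns all the atom data."""

    def __init__(self, coords):
        self._coords = list(coords)  # (x, y, z) tuples kept on the "C++" side
        self._selections = {}        # selection handle -> list of atom indices
        self._next_handle = 1
        self.crossings = 0           # calls made across the language boundary

    def n_atoms(self):
        self.crossings += 1
        return len(self._coords)

    def get_coord(self, i):
        # Fine-grained accessor: one boundary crossing per atom.
        self.crossings += 1
        return self._coords[i]

    def select_all(self):
        # Returns an opaque handle rather than the selected atoms themselves.
        self.crossings += 1
        handle = self._next_handle
        self._next_handle += 1
        self._selections[handle] = list(range(len(self._coords)))
        return handle

    def centre_of_selection(self, handle):
        # The loop over atoms runs inside the core: one crossing in total.
        self.crossings += 1
        idx = self._selections[handle]
        return tuple(sum(self._coords[i][k] for i in idx) / len(idx)
                     for k in range(3))


def centre_per_atom(core):
    """Python drives the per-atom loop: one crossing per atom."""
    n = core.n_atoms()
    sums = [0.0, 0.0, 0.0]
    for i in range(n):
        for k, value in enumerate(core.get_coord(i)):
            sums[k] += value
    return tuple(s / n for s in sums)


def centre_via_handle(core):
    """Python acts as puppet-master: it only passes a selection handle around."""
    return core.centre_of_selection(core.select_all())


if __name__ == "__main__":
    core = CoreModel([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)])
    print(centre_per_atom(core), core.crossings)    # (1.0, 1.0, 0.0) 4
    core.crossings = 0
    print(centre_via_handle(core), core.crossings)  # (1.0, 1.0, 0.0) 2
```

The crossing count is the point of the comparison. With thousands of atoms, the first pattern makes thousands of calls through the wrapper. The second makes two, whatever the size of the structure.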
MMDB
MMDB is heavily used by the Macromolecular Structure Database group at the European Bioinformatics Institute to handle deposited data, which may be in PDB or mmCIF format.
Freely available: www.ccp4.ac.uk, www.ebi.ac.uk/~keb/cldoc
Ccp4 mmdb-python
MMDB Functionality
• Read/write PDB, mmCIF and binary formats
• A large number of methods to ‘surf’ the data structure
• Methods to safely edit the data structure
• Tools to select sets of atoms (these are brilliant!)
• Handling of additional generic user-defined data
• Structure analysis methods
Python code example – list chain IDs and residue names
# Assumes the SWIG-generated MMDB wrapper module has already been imported.
# molHnd is an instance of the CMMDBManager class (a molecule)
molHnd = CMMDBManager()
# Read a PDB file; RC is the return code
RC = molHnd.ReadCoordFile('mydata.pdb')
# Get a table of the chains in the molecule (first argument: model number 1)
chainTable = newPPCChain()
nChains = intp()
molHnd.GetChainTable(1, chainTable, nChains)
# Loop over all chains and print the chain ID
for ic in range(nChains.value()):
    pc = CChainPtr(getPCChain(chainTable, ic))
    print('Chain', pc.GetChainID())
    # Get a table of the residues in this chain
    resTable = newPPCResidue()
    nRes = intp()
    pc.GetResidueTable(resTable, nRes)
    # Loop over the residues and print out the name and sequence number
    for ir in range(nRes.value()):
        pr = CResiduePtr(getPCResidue(resTable, ir))
        print('  Residue', pr.name, pr.seqNum)
... and similarly for atoms.
Comments on the Code Example
There are many ways of navigating the data hierarchy; the example shows just one of them.
There are a few lines of code here that handle the C++-Python interface (the intp() counters, the newPPCChain()/newPPCResidue() tables and the CChainPtr()/CResiduePtr() casts), which presumably would not be necessary in a pure Python implementation.
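For comparison, here is roughly what the same traversal might look like if the hierarchy were held in plain Python objects. The `Model`, `Chain`, `Residue` and `Atom` classes are hypothetical and written only for this illustration. Their `chainID` and `seqNum` attribute names simply echo the MMDB example. With native lists there is no need for counters, table allocation or pointer casts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Atom:
    name: str


@dataclass
class Residue:
    name: str
    seqNum: int
    atoms: List[Atom] = field(default_factory=list)


@dataclass
class Chain:
    chainID: str
    residues: List[Residue] = field(default_factory=list)


@dataclass
class Model:
    number: int
    chains: List[Chain] = field(default_factory=list)


def list_chains_and_residues(model):
    """Return the lines the MMDB example prints, using native Python lists."""
    lines = []
    for chain in model.chains:
        lines.append(f"Chain {chain.chainID}")
        for res in chain.residues:
            lines.append(f"  Residue {res.name} {res.seqNum}")
    return lines


if __name__ == "__main__":
    model = Model(1, [Chain("A", [Residue("MET", 1), Residue("LYS", 2)])])
    for line in list_chains_and_residues(model):
        print(line)
```

The trade-off is speed. Looping over every atom in pure Python is typically much slower than the equivalent loop in C++.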
Comments for CRBM
I may be going off on the wrong track, but here’s my two pennies’ worth.
• CCP4 is (mostly) writing scientific methods in C++ and not Python, so should we be involved in CRBM? One C in CCP4 is for ‘Collaborative’, so in principle we are interested.
• The useful things people in CRBM might want to share are scientific methods, but these are (usually) closely tied to the underlying data structures, which makes sharing tricky. (As a not completely reformed Fortran programmer, I cannot resist pointing out that this is at odds with the usual ‘reusable methods’ hype for OO.)
Comments - continued
• If I understood correctly, one idea put forward by Michel was some standardization of the interface to the underlying data structures.
• Alternatively, we need a mechanism to move data between different data structures. The old-fashioned way is via a file.
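The file route can be sketched as follows. Two different in-memory representations exchange data through PDB-style ATOM records. One is a nested dictionary keyed by chain and residue; the other is a flat list of atom tuples. This is a minimal sketch with hypothetical helper names. It reads and writes only a subset of the fixed PDB columns (serial, atom name, residue name, chain, residue number and coordinates). A real exchange would go through a full PDB or mmCIF reader and writer, such as MMDB's.

```python
import io


def format_atom(serial, name, res_name, chain_id, res_seq, x, y, z):
    """Format one PDB ATOM record (columns 1-54 only, a simplification)."""
    # Atom names shorter than four characters conventionally start in column 14.
    padded = name if len(name) == 4 else " " + name
    return (f"ATOM  {serial:5d} {padded:<4s} {res_name:>3s} {chain_id:1s}"
            f"{res_seq:4d}    {x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atom(line):
    """Read back the fields written by format_atom, using PDB column positions."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def write_nested(structure, stream):
    """Write a {chain: {(seq, resname): {atom: xyz}}} structure as ATOM records."""
    serial = 1
    for chain_id, residues in structure.items():
        for (res_seq, res_name), atoms in residues.items():
            for atom_name, (x, y, z) in atoms.items():
                stream.write(format_atom(serial, atom_name, res_name,
                                         chain_id, res_seq, x, y, z) + "\n")
                serial += 1


def read_flat(stream):
    """Read ATOM records into a flat list of (chain, seq, resname, atom, x, y, z)."""
    atoms = []
    for line in stream:
        if line.startswith("ATOM  "):
            rec = parse_atom(line)
            atoms.append((rec["chain_id"], rec["res_seq"], rec["res_name"],
                          rec["name"], *rec["xyz"]))
    return atoms


if __name__ == "__main__":
    nested = {"A": {(27, "ALA"): {"N": (1.0, 2.0, 3.0), "CA": (2.5, 2.0, 3.0)}}}
    buffer = io.StringIO()  # stands in for a file on disk
    write_nested(nested, buffer)
    buffer.seek(0)
    for atom in read_flat(buffer):
        print(atom)
```

Neither representation needs to know about the other; both only need to agree on the file format.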
Comments - continued
Something I would like to see standardized is the naming syntax for atoms, residues, etc. For example, the MMDB/CCP4 syntax for the unique identifier of an atom is
/1/A/27/CA
i.e. the CA atom of residue 27 of chain A of (NMR) model 1. The NMR model number is usually omitted.
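A parser for the four-part identifier above might look like this sketch. It is a hypothetical illustration, not MMDB's own selection parser. It assumes that when the model number is omitted, the identifier is written with three fields (for example `A/27/CA`). It also ignores any further qualifiers the real syntax may allow.

```python
from typing import NamedTuple, Optional


class AtomID(NamedTuple):
    model: Optional[int]  # None when the NMR model number is omitted
    chain: str
    res_seq: int
    atom: str


def parse_atom_id(cid):
    """Parse '/1/A/27/CA' (model given) or 'A/27/CA' (model omitted; assumed form)."""
    if cid.startswith("/"):
        parts = cid[1:].split("/")
        if len(parts) != 4:
            raise ValueError(f"expected /model/chain/residue/atom, got {cid!r}")
        model, chain, res_seq, atom = parts
        return AtomID(int(model), chain, int(res_seq), atom)
    parts = cid.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected chain/residue/atom, got {cid!r}")
    chain, res_seq, atom = parts
    return AtomID(None, chain, int(res_seq), atom)


def format_atom_id(aid):
    """Inverse of parse_atom_id: rebuild the identifier string."""
    core = f"{aid.chain}/{aid.res_seq}/{aid.atom}"
    return core if aid.model is None else f"/{aid.model}/{core}"


if __name__ == "__main__":
    print(parse_atom_id("/1/A/27/CA"))
    # AtomID(model=1, chain='A', res_seq=27, atom='CA')
```

A shared, round-trippable identifier like this is the kind of thing that could be standardized independently of any particular data structure.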