Dr. Sven Nahnsen/Dr. Marius Codrea,
Quantitative Biology Center (QBiC)
Data Management for Quantitative Biology
Lecture 5: Database systems (continued)
LIMS and E-lab books
Many database designs & concepts
http://dataconomy.com/wp-content/uploads/2014/07/fig2large.jpg
Databases
DB = "A database is an organized collection of data" http://en.wikipedia.org/wiki/Database
DB = DB + data model for the application at hand (business logic) + implementation
DB = DB + database management system (DBMS). Software that enables CRUD:
• Create entries
• Read (retrieve)
• Update / edit
• Delete
DB = DB + Administration (user privileges, monitoring)
Selected database systems
I. Relational databases
MySQL
II. NoSQL databases
MongoDB
Specific characteristics MongoDB vs MySQL
More details here: http://db-engines.com/en/system/MongoDB%3BMySQL
System property        MongoDB              MySQL
Initial release        2009                 1995
Current release        3.0.2, April 2015    5.6.24, April 2015
Triggers               No                   Yes
MapReduce              Yes                  No
Foreign keys           No                   Yes
Transaction concepts   No                   ACID*
*A database transaction must be Atomic, Consistent, Isolated and Durable.
Terminology - Relational databases
[Figure: Mice table and Samples table, annotated with fields, records (Record 1, Record 6), the primary key of each table, and the foreign key in Samples that references Mice.Mouse_number]
● The values of the primary key uniquely identify the rows of the table
● The foreign key links each row of the host table to exactly 1 record in the referenced table
Relational databases (Normalization)
Mice table
Samples table
Samples are RELATED to mice
1:N one-to-many relationship
Foreign Keys
CREATE TABLE samples (
  Sample_ID SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  Mouse_number SMALLINT UNSIGNED NOT NULL,
  Timepoint VARCHAR(15) NOT NULL,
  PRIMARY KEY (Sample_ID),
  FOREIGN KEY (Mouse_number) 
        REFERENCES mice(Mouse_number)
        ON DELETE CASCADE
 )ENGINE=InnoDB DEFAULT CHARSET=utf8;
Queries
SELECT Gender, Treatment, COUNT(*) AS count_per_gender FROM mice
 GROUP BY Gender, Treatment;
“How many males and how many females per treatment?”
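The counting query can be tried without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL; the four mice and the simplified column types are made-up illustration data, not the lecture's table.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows are made-up illustration data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mice ("
    " Mouse_number INTEGER PRIMARY KEY,"
    " Gender TEXT NOT NULL,"
    " Treatment TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO mice (Mouse_number, Gender, Treatment) VALUES (?, ?, ?)",
    [
        (1, "Male", "Vitamin A"),
        (2, "Female", "Vitamin A"),
        (3, "Male", "Vitamin A"),
        (4, "Female", "Vitamin B"),
    ],
)

# Select only the grouping columns plus the aggregate, one row per group.
counts = conn.execute(
    "SELECT Gender, Treatment, COUNT(*) AS count_per_gender"
    " FROM mice GROUP BY Gender, Treatment"
    " ORDER BY Treatment, Gender"
).fetchall()

for gender, treatment, n in counts:
    print(treatment, gender, n)
```

Selecting only the grouped columns and the aggregate keeps every output column well defined for each group.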
JOIN queries
SELECT Sample_ID, Treatment, Timepoint, mice.Mouse_number 
from 
samples join mice 
on samples.Mouse_number = mice.Mouse_number 
where mice.Mouse_number=2; 
“What samples do I have from mouse number 2?”
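The JOIN can be reproduced in the same way. This is a sketch under stated assumptions: SQLite replaces MySQL, the column types are simplified, and the two mice and three samples are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mice (
  Mouse_number INTEGER PRIMARY KEY,
  Treatment TEXT NOT NULL
);
CREATE TABLE samples (
  Sample_ID INTEGER PRIMARY KEY,
  Mouse_number INTEGER NOT NULL REFERENCES mice(Mouse_number),
  Timepoint TEXT NOT NULL
);
INSERT INTO mice VALUES (1, 'Vitamin A'), (2, 'Vitamin B');
INSERT INTO samples VALUES (1, 1, '5h'), (2, 2, '5h'), (3, 2, '24h');
""")

# Same JOIN as on the slide; the mouse number is passed as a bound parameter.
rows = conn.execute(
    "SELECT Sample_ID, Treatment, Timepoint, mice.Mouse_number"
    " FROM samples JOIN mice ON samples.Mouse_number = mice.Mouse_number"
    " WHERE mice.Mouse_number = ?"
    " ORDER BY Sample_ID",
    (2,),
).fetchall()

for row in rows:
    print(row)
```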
Relational „facts“
1. Rigid schema (once the structure is defined, it may be difficult to adjust)
2. Normalization introduces/requires additional tables, joins, indices and it scatters data
3. Each field in each record has a single value of a pre-defined type
Mice table
Samples table
1:N one-to-many relationship
Relational „facts“ 1
Generalization to other Projects/Experiments in the lab?
Rigid schema (once the structure is defined, it may be difficult to adjust)
Organisms table
Samples table
BROKEN 1:N one-to-many relationship
Relational „facts“ 1
Mice table
Deleted relationship
CREATE TABLE samples (
  Sample_ID SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  Mouse_number SMALLINT UNSIGNED NOT NULL,
  Timepoint VARCHAR(15) NOT NULL,
  PRIMARY KEY (Sample_ID),
  FOREIGN KEY (Mouse_number) 
        REFERENCES mice(Mouse_number)
        ON DELETE CASCADE
 )ENGINE=InnoDB DEFAULT CHARSET=utf8;
Organisms table
Projects table
Relational „facts“ 1
Projects table
Relational „facts“ 1
Users table
Many users can be involved in many projects. With many roles?
Projects_Users table
Projects table
Relational „facts“ 2
Users table
Normalization introduces/requires additional tables, joins, indices and scatters data
Projects_Users table
CREATE INDEX usr 
ON Projects_Users (User_ID);
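A many-to-many relationship with roles can be sketched as a join table plus an index. The example below is an assumption-laden illustration: SQLite stands in for MySQL, the Role column and all project and user names are hypothetical, and the composite primary key is only one possible design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (Project_ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE users (User_ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE projects_users (
  Project_ID INTEGER NOT NULL REFERENCES projects(Project_ID),
  User_ID INTEGER NOT NULL REFERENCES users(User_ID),
  Role TEXT NOT NULL,
  PRIMARY KEY (Project_ID, User_ID, Role)
);
CREATE INDEX usr ON projects_users (User_ID);
INSERT INTO projects VALUES (1, 'Mouse study'), (2, 'Vaccine study');
INSERT INTO users VALUES (1, 'Hans'), (2, 'Anna');
INSERT INTO projects_users VALUES
  (1, 1, 'PI'), (2, 1, 'Analyst'), (2, 2, 'PI');
""")

# Two joins are needed to answer "which projects does user 1 work on, and as what?"
memberships = conn.execute(
    "SELECT users.Name, projects.Name, Role"
    " FROM projects_users"
    " JOIN users ON users.User_ID = projects_users.User_ID"
    " JOIN projects ON projects.Project_ID = projects_users.Project_ID"
    " WHERE users.User_ID = 1"
    " ORDER BY projects.Project_ID"
).fetchall()

print(memberships)
```

The query touches three tables to answer one question, which is the scattering of data that normalization brings with it.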
Relational „facts“ 3
Each field in each record has a single value of a pre-defined type
Primary Key Field 1 Field 2 Field 3
A 2-D map (tuples)
Relational „facts“ 3
A single value ?!?
What if a person has 2 affiliations and thus 2 addresses, 2 phone
numbers, etc?
Normalization? Again?
NoSQL
DB = "A database is an organized collection of data" http://en.wikipedia.org/wiki/Database
● Can we allow for “some” heterogeneity of the data?
● Can the records be highly similar but not necessarily identical? (e.g., most of the users having just 1 phone number but others more?)
MongoDB is a document-oriented DB
{
Mouse_number: "1",
Gender: "Male",
Age: 3,
Treatment: "Vitamin A"
}
Field:value pairs
Document ~ Record
http://www.mongodb.org/
MongoDB Documents
{
Mouse_number: "1",
Gender: "Male",
Age: 3,
Treatment: "Vitamin A"
}
Field:value pairs
Documents are stored as BSON (binary JSON)
Closely resemble structures in programming languages (key-value association)
Each field can be
● NULL
● Single value (integer, string, etc)
● An array of many values
● Other embedded documents
● A reference to another document
MongoDB Collections
Documents are stored in
Collections
Collection ~ Table
{
Mouse_number: "6",
Gender: "Female",
Age: 2,
Treatment: "Vitamin B"
}
Different representation – The challenge remains the same:
Model the relationships between data
[Figure: diagram relating Organisms, Projects, Users, Affiliations and Samples through 1:N and N:N relationships]
Design & operational mechanisms
MySQL
● Primary Key
● Foreign Key
● Join Tables
MongoDB
● Unique ID
● References
● Embedding
MongoDB – Field types
{
_id: <ObjectID1>,
Username: {
first_name: "Hans",
last_name: "Meyer"
},
Gender: "Male",
Age: 30,
Phones: ["+490777", "+350777"],
Affiliations_id: <UUID_affiliation>
}
Users document
● Unique ID (_id)
● embedded document (Username)
● array (Phones)
● reference (Affiliations_id)
MongoDB – Field types
● Unique ID
_id: <ObjectID1>
Acts as a primary key
ObjectId is a 12-byte BSON type, constructed using:
● a 4-byte value representing the seconds since the Unix epoch,
● a 3-byte machine identifier,
● a 2-byte process id, and
● a 3-byte counter, starting with a random value.
http://docs.mongodb.org/manual/reference/object-id/
ObjectId("507f1f77bcf86cd799439011")
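The 12-byte layout described above can be made concrete by splitting the example value into its four parts. This is a sketch in plain Python, not the MongoDB driver; the function name `split_object_id` is hypothetical, and the machine and process parts are kept as hex strings because only the timestamp and counter are read as big-endian integers here.

```python
from datetime import datetime, timezone

def split_object_id(hex_string):
    """Split a 24-character hex ObjectId into the four parts listed on the slide."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 12:
        raise ValueError("an ObjectId has exactly 12 bytes")
    return {
        "timestamp": int.from_bytes(raw[0:4], "big"),  # seconds since the Unix epoch
        "machine": raw[4:7].hex(),                     # 3-byte machine identifier
        "process_id": raw[7:9].hex(),                  # 2-byte process id
        "counter": int.from_bytes(raw[9:12], "big"),   # 3-byte counter
    }

parts = split_object_id("507f1f77bcf86cd799439011")
created = datetime.fromtimestamp(parts["timestamp"], tz=timezone.utc)
print(created.isoformat(), parts)
```

Because the leading bytes encode the creation time, ObjectIds generated later sort after earlier ones at one-second resolution.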
MongoDB – Field types
● array
Phones: ["+490777", "+350777"]
● Upon indexing, each value in the array is in the index
● Query for ANY matching value
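"Query for ANY matching value" means that a document matches as soon as one element of the array equals the search value; in the mongo shell the filter would be `{Phones: "+350777"}`. The sketch below mimics that behaviour with plain Python lists and dicts instead of a database; the second user and the helper `find_by_phone` are hypothetical.

```python
# Plain Python stand-in for a MongoDB collection; no database driver involved.
users = [
    {"_id": 1, "Username": {"first_name": "Hans", "last_name": "Meyer"},
     "Phones": ["+490777", "+350777"]},
    {"_id": 2, "Username": {"first_name": "Anna", "last_name": "Schmidt"},
     "Phones": ["+491234"]},
]

def find_by_phone(collection, phone):
    """Return the documents whose Phones array contains `phone`."""
    return [doc for doc in collection if phone in doc.get("Phones", [])]

matches = find_by_phone(users, "+350777")
print([doc["_id"] for doc in matches])
```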
MongoDB – Field types
{
_id: <ObjectID1>,
Username: {
first_name: "Hans",
last_name: "Meyer"
},
Gender: "Male"
}
● embedded document
● Pre-joined data?
● Can be indexed
● Query at any level on any field
MongoDB – Field types
{
_id: <ObjectID1>,
Username: {
first_name: "Hans",
last_name: "Meyer"
},
Gender: "Male",
Age: 30,
Phones: ["+490777", "+350777"],
Affiliations_id: <UUID_affiliation>
}
Users document
● reference
Affiliations document
{
_id: <UUID_affiliation>,
Name: "My lab",
Address: "Tübingen"
}
Where is the catch?
● "In MongoDB, write operations are atomic at the document level, and
no single write operation can atomically affect more than one
document or more than one collection."
● OK, then references (normalized model) are not really Foreign
Keys that the DB engine resolves. "Client-side applications must
issue follow-up queries to resolve the references".
(see next slide)
● “A denormalized data model with embedded data combines all
related data for a represented entity in a single document. This
facilitates atomic write operations since a single write operation can
insert or update the data for an entity.”
● OK, denormalize. The maximum document size is 16MB.
http://docs.mongodb.org/manual/
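The statement that client-side applications must issue follow-up queries can be illustrated with two lookups. This is a minimal sketch with Python dicts standing in for the Users and Affiliations collections; the IDs `user-1` and `aff-1` and the helper name are made up.

```python
# Two "collections" modelled as dicts keyed by _id; IDs and names are made up.
users = {
    "user-1": {
        "_id": "user-1",
        "Username": {"first_name": "Hans", "last_name": "Meyer"},
        "Affiliations_id": "aff-1",
    },
}
affiliations = {
    "aff-1": {"_id": "aff-1", "Name": "My lab", "Address": "Tübingen"},
}

def find_user_with_affiliation(user_id):
    """Resolve the reference on the client side with a second lookup."""
    user = users[user_id]                                    # query 1
    affiliation = affiliations.get(user["Affiliations_id"])  # follow-up query 2
    return {**user, "Affiliation": affiliation}

resolved = find_user_with_affiliation("user-1")
print(resolved["Affiliation"]["Name"])
```

Nothing guarantees that the referenced affiliation still exists when the second lookup runs, which is the difference from a foreign key enforced by the database engine.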
Foreign key „ON DELETE CASCADE“
“Mouse number 3 went wrong. Let's just delete it.”
SELECT * from samples;
DELETE from mice where 
Mouse_number = 3;
SELECT * from samples;
Where have these two
samples gone?
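The disappearing samples can be reproduced in a few lines. The sketch below assumes SQLite instead of MySQL (SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`) and uses three invented mice, two of whose samples belong to mouse number 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;  -- SQLite enforces foreign keys only when enabled
CREATE TABLE mice (Mouse_number INTEGER PRIMARY KEY);
CREATE TABLE samples (
  Sample_ID INTEGER PRIMARY KEY,
  Mouse_number INTEGER NOT NULL
    REFERENCES mice(Mouse_number) ON DELETE CASCADE,
  Timepoint TEXT NOT NULL
);
INSERT INTO mice VALUES (1), (2), (3);
INSERT INTO samples VALUES (1, 1, '5h'), (2, 3, '5h'), (3, 3, '24h');
""")

before = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]

# Deleting the mouse silently removes every sample that references it.
conn.execute("DELETE FROM mice WHERE Mouse_number = 3")

after = conn.execute(
    "SELECT Sample_ID, Mouse_number, Timepoint FROM samples"
).fetchall()
print(before, after)
```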
The key challenge
Find the right structure of the documents (references and embedded
documents) that best fit
● the requirements of the application (queries, updates), i.e. the data usage
● the performance of the database engine
Model the relationships between data
[Figure: diagram relating Organisms, Projects, Users, Affiliations and Samples through 1:N and N:N relationships]
Model the relationships between data 1:N
Organisms 1 : N Samples
Organisms hold Sample_ids: [ ]
OR?
Samples hold Organism_id:
● Organism embeds multiple samples
Depends on the most frequent question:
● What samples do I have from Organism X?
● Where did Sample Y come from?
● How many samples? Could the 16MB limit be reached?
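How the choice of layout shifts the work between the two questions can be sketched with plain Python data. The example compares a referencing layout (each sample stores an Organism_id) with an embedding layout (each organism stores its samples); all IDs and the helper names are hypothetical.

```python
# Layout A: each sample references its organism (Organism_id).
samples = [
    {"_id": 10, "Organism_id": 4, "Timepoint": "5h"},
    {"_id": 11, "Organism_id": 4, "Timepoint": "24h"},
    {"_id": 12, "Organism_id": 14, "Timepoint": "5h"},
]

# Layout B: each organism embeds its samples.
organisms = [
    {"_id": 4, "Samples": [{"_id": 10, "Timepoint": "5h"},
                           {"_id": 11, "Timepoint": "24h"}]},
    {"_id": 14, "Samples": [{"_id": 12, "Timepoint": "5h"}]},
]

def samples_of_organism_by_reference(organism_id):
    """Layout A: scan the samples for a matching Organism_id."""
    return [s["_id"] for s in samples if s["Organism_id"] == organism_id]

def samples_of_organism_embedded(organism_id):
    """Layout B: one organism document already contains the answer."""
    for organism in organisms:
        if organism["_id"] == organism_id:
            return [s["_id"] for s in organism["Samples"]]
    return []

def organism_of_sample_by_reference(sample_id):
    """Layout A: the sample document itself names its organism."""
    for s in samples:
        if s["_id"] == sample_id:
            return s["Organism_id"]
    return None

def organism_of_sample_embedded(sample_id):
    """Layout B: search inside every organism's embedded samples."""
    for organism in organisms:
        if any(s["_id"] == sample_id for s in organism["Samples"]):
            return organism["_id"]
    return None

print(samples_of_organism_embedded(4), organism_of_sample_by_reference(12))
```

Both layouts give the same answers; they differ in which question is answered by reading a single document and in how large a single document can grow.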
Relational „facts“ 3
Each field in each record has a single value of a pre-defined type
Primary Key Field 1 Field 2 Field 3
A 2-D map (tuples)
MongoDB
Nested documents
_id Field 1 Field 2 Field 3
Queries
{
_id: 4,
Project_ID: 2,
Species: "human",
Gender: "",
Age: 30,
Treatment: "Vaccine A"
}
Organisms
db.organisms.insert(
  {
  Project_ID: 2,
  Species: "human",
  Gender: "",
  Age: 30,
  Treatment: "Vaccine A"
  }
)
 
Queries
{
_id: 4,
Project_ID: 2,
Species: "human",
Gender: "",
Age: 30,
Treatment: "Vaccine A"
}
Organisms
db.organisms.find(
  { Project_ID: { $eq: 2 } }
)

SELECT * from organisms
WHERE Project_ID = 2;
Queries
{
_id: 4,
Project_ID: 2,
Species: "human",
Gender: "",
Age: 30,
Treatment: "Vaccine A"
}
Organisms
db.organisms.find(
  { $and: [ {Species: /^h/},
            {Age: {$gt: 20}} ]
})

SELECT * from organisms 
WHERE Species like 'h%' AND 
Age > 20;
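When translating between LIKE patterns and regular expressions, anchoring matters: `LIKE 'h%'` only matches values that start with "h", which corresponds to the anchored pattern `^h`, whereas an unanchored pattern such as `h.*` matches an "h" anywhere in the value. The sketch below shows the difference in plain Python; the three organisms and the helper name are made up.

```python
import re

organisms = [
    {"_id": 4, "Species": "human", "Age": 30},
    {"_id": 5, "Species": "chimpanzee", "Age": 25},
    {"_id": 6, "Species": "horse", "Age": 12},
]

starts_with_h = re.compile(r"^h")  # counterpart of LIKE 'h%'

def find_h_species_older_than(collection, min_age):
    """Both conditions must hold, like $and in MongoDB or AND in SQL."""
    return [
        doc for doc in collection
        if starts_with_h.search(doc["Species"]) and doc["Age"] > min_age
    ]

anchored = find_h_species_older_than(organisms, 20)

# An unanchored pattern also matches "chimpanzee" because it contains an "h".
unanchored = [
    doc for doc in organisms
    if re.search(r"h.*", doc["Species"]) and doc["Age"] > 20
]

print([d["_id"] for d in anchored], [d["_id"] for d in unanchored])
```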
Schema flexibility
{
_id: 4,
Project_ID: 2,
Species: "human",
Gender: "",
Age: 30,
Treatment: "Vaccine A"
}
Organisms
{
_id: 14,
Project_ID: 5,
Species: "human",
Gender: "Female",
Age: 10,
Genetic_background: "WT"
}
Data IS the schema!
Queries
{
_id: 4,
Project_ID: 2,
Species: "human",
Gender: "",
Age: 30,
Treatment: "Vaccine A"
}
Organisms
db.organisms.find(
 { Genetic_background:
   { $exists: true }
 })

SELECT ???
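Asking whether a field exists only makes sense when documents in one collection may carry different fields. The sketch below imitates `$exists: true` with a key-membership test on Python dicts; it is an illustration, not the MongoDB driver, and the helper name is hypothetical.

```python
organisms = [
    {"_id": 4, "Project_ID": 2, "Species": "human", "Gender": "",
     "Age": 30, "Treatment": "Vaccine A"},
    {"_id": 14, "Project_ID": 5, "Species": "human", "Gender": "Female",
     "Age": 10, "Genetic_background": "WT"},
]

def find_where_field_exists(collection, field):
    """Keep the documents that carry the field at all, whatever its value."""
    return [doc for doc in collection if field in doc]

with_background = find_where_field_exists(organisms, "Genetic_background")
with_gender = find_where_field_exists(organisms, "Gender")
print([d["_id"] for d in with_background], [d["_id"] for d in with_gender])
```

An empty string still counts as an existing field. In a relational table every row has every column, so the closest counterpart is a test on the column's value rather than on its presence.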
Model the relationships between data 1:N
Organisms 1 : N Samples
Organisms hold Sample_ids: [ ]
OR?
Samples hold Organism_id:
● Organism embeds multiple samples
MongoDB
Nested documents
_id Field 1 Field 2 Field 3
Queries
{
_id: 4,
Project_ID: 2,
Species: "human",
Gender: "",
Age: 30,
Samples: [ { _id: 10, Timepoint: "5h" },
           { _id: 11, Timepoint: "24h" }
],
Treatment: "Vaccine A"
}
Organisms
db.organisms.find(
 { '_id': 4,
   'Samples._id': 11
 }
)
db.organisms.find(
 { '_id': 4,
   'Samples.Timepoint': '5h'
 }
)
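Dot notation such as `Samples._id` reaches into the embedded documents of an array and matches if any element satisfies the condition. The following sketch imitates that with plain Python; the helpers `matches_embedded` and `find` are hypothetical. It also shows that the value type matters: the number 4 and the string "4" are different values.

```python
organisms = [
    {"_id": 4, "Project_ID": 2, "Species": "human", "Gender": "", "Age": 30,
     "Samples": [{"_id": 10, "Timepoint": "5h"},
                 {"_id": 11, "Timepoint": "24h"}],
     "Treatment": "Vaccine A"},
]

def matches_embedded(doc, array_field, key, value):
    """True if any embedded document in doc[array_field] has key == value."""
    return any(item.get(key) == value for item in doc.get(array_field, []))

def find(collection, organism_id, key, value):
    """Match on the top-level _id and on a field of the embedded samples."""
    return [
        doc for doc in collection
        if doc["_id"] == organism_id
        and matches_embedded(doc, "Samples", key, value)
    ]

by_sample_id = find(organisms, 4, "_id", 11)
by_timepoint = find(organisms, 4, "Timepoint", "5h")
wrong_type = find(organisms, "4", "_id", 11)  # string "4" does not equal number 4
print(len(by_sample_id), len(by_timepoint), len(wrong_type))
```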
Summary
● Database design requires technical and substantial domain-specific knowledge
● Normalization
● Indices
MySQL
● Primary Key
● Foreign Key
● Join Tables
MongoDB
● Unique ID
● References
● Embedding
Hint: http://en.wikipedia.org/wiki/Category:Web_application_frameworks
Laboratory information management system (LIMS)
[Figure: diagram relating Organisms, Projects, Users, Affiliations and Samples through 1:N and N:N relationships]
An underlying data structure of a simple LIMS design
LIMS definition
http://en.wikipedia.org/wiki/Laboratory_information_management_system
„A Laboratory Information Management System (LIMS), sometimes
referred to as a Laboratory Information System (LIS) or Laboratory
Management System (LMS), is a software-based laboratory and
information management system that offers a set of key features that
support a modern laboratory's operations.“
LIMS properties and functionality
http://en.wikipedia.org/wiki/Laboratory_information_management_system
● Metadata of any sample entering the laboratory
● Tracking of processes throughout sample treatment and preparation; scheduling of the sample and the associated analytical workload
● Quality control associated with the sample and the utilized equipment and inventory
● Inspection, approval, and compilation of the sample data for reporting and/or further analysis
Advantages of LIMS
● Fewer transcription errors
● Faster sample processing
● Real-time control of data and metadata
● Reproducibility of experimental processes
● Direct electronic reporting to clients
● Despite many advantages,...
Disadvantages of LIMS
● Customization of LIMS
● Interface is required
● Adequate validation to ensure data quality
With a good LIMS in place, we can consider
Electronic Laboratory Notebooks
Electronic laboratory notebooks (ELN)
http://en.wikipedia.org/wiki/Electronic_lab_notebook
An electronic lab notebook (also known as electronic laboratory notebook, or ELN) is a computer program designed to replace paper laboratory notebooks. Lab notebooks in general are used by scientists, engineers and technicians to document research, experiments and procedures performed in a laboratory. A lab notebook is often maintained to be a legal document and may be used in a court of law as evidence.
Prominent use-case: review process
http://rushthecourt.net/mag/wp-content/uploads/2010/09/Three-Ring-Binders.jpg
● You submit a paper
● A review process of several months is not unlikely
● Reviewers ask for a more detailed description of the experiments you did two years back
Traditional Paper Lab Books
ELN, a survey
Journal of Laboratory Automation 18(3) 229–234, 2012 Society for Laboratory Automation and Screening
DOI: 10.1177/2211068212471834
Examples of ELN software
Practical issues
● Lab technicians “have only two hands”
● Labs are often not equipped with desktop PCs
● Data security of ELNs poses challenges
● Scientists are classically reluctant adopters
● There is activation energy required to change work habits
● In academic science there is no formal obligation
● Establishment requires stringent modeling (see previous slides on
databases) or significant investments into existing tools
Mobile application of ELNs
Nature Methods 8, 541–543 (2011) doi:10.1038/nmeth.1631
● Handwriting capture technology
● All functionality as on paper
● Sketch and manipulate equations
● Draw figures
● All notes can be linked, reordered, archived, edited, tagged, annotated and bundled in virtual 'notebooks' representing different projects
Easy solutions
Evernote as lab notebook
Journal of Laboratory Automation 18(3) 229–234, 2012 Society for Laboratory Automation and Screening
DOI: 10.1177/2211068212471834
