2012 03 08_dbi

FBW
21-02-2008

RELOADED
2

Wim Van Criekinge

Three Basic Data Types

• Scalars - $
• Arrays of scalars - @
• Associative arrays of
scalars, or hashes - %

• [m]/PATTERN/[g][i][o]
• s/PATTERN/PATTERN/[g][i][e][o]
• tr/PATTERNLIST/PATTERNLIST/[c][d][s]

The ‘structure’ of a Hash

• An array looks something like this:
@array =
Index   0       1       2
Value   'val1'  'val2'  'val3'

• A hash looks something like this:
%phone =
Key (name)  Rob       Matt      Joe_A
Value       353-7236  353-7122  555-1212
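
The two layouts above can be tried directly in Perl. This is a minimal sketch that reuses the values from the slide; the loop over sorted keys is an addition for illustration, since hashes do not keep their keys in any fixed order.

```perl
use strict;
use warnings;

# An array is indexed by position, starting at 0.
my @array = ('val1', 'val2', 'val3');

# A hash is indexed by key; here the keys are names.
my %phone = (
    Rob   => '353-7236',
    Matt  => '353-7122',
    Joe_A => '555-1212',
);

print "Second array element: $array[1]\n";
print "Matt's number: $phone{Matt}\n";

# Hash keys have no guaranteed order, so sort them for stable output.
for my $name (sort keys %phone) {
    print "$name => $phone{$name}\n";
}
```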

Subroutine

$a=5;
$b=9;
$sum=Optellen($a,$b);
print "The SUM is $sum\n";
sub Optellen
{
$d=$_[0];
$e=$_[1];
#alternatively we could do this: my($d, $e)=@_;
my($answer)=$d+$e;
return $answer;
}

Overview

• Advanced data structures in Perl
• Object-oriented Programming in Perl
• BioPerl is a large collection of Perl
software for bioinformatics
• Motivation:
– Simple extension: “multi-line parsing”
is more difficult than expected
• Goal: to make software modular,
easier to maintain, more reliable, and
easier to reuse

Multi-line parsing
use strict;
use Bio::SeqIO;

my $filename="sw.txt";
my $sequence_object;

my $seqio = Bio::SeqIO -> new (
'-format' => 'swiss',
'-file' => $filename
);

while ($sequence_object = $seqio -> next_seq) {
my $sequentie = $sequence_object-> seq();
print $sequentie."\n";
}

Perl OO

• A class is a package

• An object is a reference to a data
structure (usually a hash) in a class

• A method is a subroutine in the class
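
The three statements above fit in a few lines of code. The sketch below is illustrative only: the class name Sequentie and its fields id and seq are hypothetical, and the class is defined in the same file as the script rather than in a separate module.

```perl
use strict;
use warnings;

package Sequentie;    # hypothetical class name, for illustration only

# Constructor: build a hash, then bless the reference into the class.
sub new {
    my ($class, %args) = @_;
    my $self = { id => $args{id}, seq => $args{seq} };
    return bless $self, $class;
}

# Methods are ordinary subroutines that receive the object first.
sub seq        { my ($self) = @_; return $self->{seq}; }
sub seq_length { my ($self) = @_; return length $self->{seq}; }

package main;

my $object = Sequentie->new(id => 'demo1', seq => 'ACGTAC');
print ref($object), " holds ", $object->seq,
      " (", $object->seq_length, " bases)\n";
```

Calling ref() on the object returns the package name, which is the same information Data::Dumper shows when it prints a blessed reference.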

Perl Classes

• Modules/Packages
– A Perl module is a file that uses a package
declaration
– Packages provide a separate namespace for
different parts of program
– A namespace protects the variables of one part of
a program from unwanted modification by
another part of the program
– The module must always have a last line that
evaluates to true, e.g. 1;
– The module must be in a “known” directory
(one listed in @INC, e.g. via the PERL5LIB environment variable)
• Eg … site/lib/bio/Sequentie.pm
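
A short sketch of the namespace point, assuming two made-up packages named Alpha and Beta that live in one file: each package has its own $count, and changing one leaves the other untouched.

```perl
use strict;
use warnings;

# Two hypothetical packages, each with its own $count variable.
{
    package Alpha;
    our $count = 1;
}
{
    package Beta;
    our $count = 10;
    $count++;    # changes $Beta::count only
}

package main;

# From outside, a package variable is reached by its fully qualified name.
print "Alpha: $Alpha::count, Beta: $Beta::count\n";
```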

Installation on Windows (ActiveState)

• Using PPM shell to install BioPerl
– Get the number of the BioPerl repository:
– PPM>repository
– Set the BioPerl repository, find BioPerl, install
BioPerl:
– PPM>repository set <BioPerl repository number>
– PPM>search *
– PPM>install <BioPerl package number>
• Download BioPerl in archive form from
– http://www.BioPerl.org/Core/Latest/index.shtml
– Use WinZip to uncompress and install

Directory Structure

• BioPerl directory structure organization:
– Bio/ BioPerl modules
– models/ UML for BioPerl classes
– t/ Perl built-in tests
– t/data/ Data files used for the tests
– scripts/ Reusable scripts that use BioPerl
– scripts/contributed/ Contributed scripts not
necessarily integrated into BioPerl.
– doc/ "How To" files and the FAQ as XML

Live.pl

#!e:\Perl\bin\perl.exe -w
# script for looping over genbank entries, printing out name
use Bio::DB::GenBank;
use Data::Dumper;

$gb = Bio::DB::GenBank->new();

$sequence_object = $gb->get_Seq_by_id('MUSIGHBA1');
print Dumper ($sequence_object);

$seq1_id = $sequence_object->display_id();
$seq1_s = $sequence_object->seq();
print "seq1 display id is $seq1_id \n";
print "seq1 sequence is $seq1_s \n";

File converter

#!/opt/perl/bin/perl -w
#genbank_to_fasta.pl
use Bio::SeqIO;
my $input = Bio::SeqIO->new('-file' => $ARGV[0],
                            '-format' => 'GenBank');
my $output = Bio::SeqIO->new('-file' => '>output.fasta',
                             '-format' => 'Fasta');

while (my $seq = $input->next_seq()){
$output->write_seq($seq);
}

• Bptutorial.pl

• It includes the written tutorial as well
as runnable scripts

• 2 ESSENTIAL TOOLS
– Data::Dumper to find out what class you're
in
– perl bptutorial.pl 100 Bio::Seq to find the
available methods for that class

Exercise 1

Run Needleman-Wunsch-monte-carlo.pl

– my $MATCH = 1; # +1 for letters that match
– my $MISMATCH = -1; # -1 for letters that mismatch
– my $GAP = -1; # -1 for any gap

→ Score (-64)

→ Score = f($MATCH,$MISMATCH,$GAP)

f?
Implement convergence criteria
Store the results in a database, make graphs in Excel

Objectives
• Start MySQL and learn how to use the MySQL
Reference Manual

• Create a database

• Change (activate) a database

• Create tables using MySQL

• Create and run SQL commands in MySQL

Objectives (continued)
• Identify and use data types to define columns in
tables

• Understand and use nulls

• Add rows to tables

• View table data

• Correct errors in a database

Successor to MySQL-Front
• MySQL-Front was once one of
the most popular MySQL
management applications. What
phpMyAdmin is for web applications,
MySQL-Front was for the
desktop. Unfortunately the
author could not or would not continue
the project, and it was
halted.
• In early April 2006 the original
author decided to make the latest
source code of MySQL-Front
available under the
name HeidiSQL, and the first beta is …

Starting MySQL
• Windows XP
– Click Start button
– Point to All Programs
– Point to MySQL on menu
– Point to MySQL Server 4.1
– Click MySQL Command Line Client
• Must enter password in Command Line Client
window

Obtaining Help in MySQL
• Type \h at the mysql> prompt

• Type “help” followed by name of command

– help contents

– help union

Creating a Database

• Must create a database before creating tables
• Use CREATE DATABASE command
• Include database name

Changing the Default Database
• Default database: database to which all
subsequent commands pertain
• USE command, followed by database name:
– Changes the default database
– Execute at the start of every session

Creating a Table

• Describe the layout of each table in the
database

• Use CREATE TABLE command

• TABLE is followed by the table name

• Follow this with the names and data types of the
columns in the table

• Data types define type and size of data
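
As a rough illustration of the CREATE TABLE layout described above, the sketch below sends one CREATE TABLE statement from Perl. Several things here are assumptions rather than part of the slides: it uses DBI with the DBD::SQLite driver and an in-memory database so that it runs without a MySQL server, and the table rep with its columns is invented. SQLite accepts these type names but does not enforce sizes the way MySQL does.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; ':memory:' needs no server or file.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0 });

# Hypothetical table: CREATE TABLE, the table name, then each column
# name followed by its data type.
$dbh->do(q{
    CREATE TABLE rep (
        rep_num    CHAR(2),
        last_name  VARCHAR(15),
        hire_date  DATE,
        commission DECIMAL(7,2),
        zip        INT
    )
});

# Ask the driver which columns the new table has.
my $sth = $dbh->prepare('SELECT * FROM rep');
$sth->execute();
my @columns = @{ $sth->{NAME} };
$sth->finish();

print join(', ', @columns), "\n";
$dbh->disconnect();
```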

Table and Column Name Restrictions

• Names cannot exceed 18 characters

• Must start with a letter

• Can contain letters, numbers, and underscores
(_)

• Cannot contain spaces

Entering Commands in MySQL
• Commands are free-format; no rules stating specific
words in specific positions

• Press ENTER to move to the next line in a
command

• Indicate the end of a command by typing a
semicolon

• Commands are not case sensitive

Editing SQL Commands
• Statement history: stores most recently used
command

• Editing commands:
– Use arrow keys to move up, down, left, and right
– Use Ctrl+A to move to beginning of line
– Use Ctrl+E to move to end of line
– Use Backspace and Delete keys

Editing MySQL Commands
• Press Up arrow key to go to top line

• Press Enter key to move to next line if line is correct

• Use Right and Left arrow keys to move to location of
error

• Press ENTER key when line is correct

• If Enter is not pressed on a line, line not part of the
revised command

Dropping a Table
• Can correct errors by dropping (deleting) a table and
starting over

• Useful when table is created before errors are
discovered

• The DROP TABLE command is followed by the name of the table to be dropped
and a semicolon

• Any data in table also deleted

Data Types

• For each table column, type of data must be
defined
• Common data types:
– CHAR(n)
– VARCHAR(n)
– DATE
– DECIMAL(p,q)
– INT
– SMALLINT

Nulls
• A special value to represent situation when actual
value is not known for a column

• Can specify whether to allow nulls in the individual
columns

• Should not allow nulls for primary key columns

Implementation of Nulls
• Use NOT NULL clause in CREATE TABLE
command to exclude the use of nulls in a column

• Default is to allow null values

• If a column is defined as NOT NULL, system will
reject any attempt to store a null value there

Adding Rows to a Table

• INSERT command:
– INSERT INTO followed by table name
– VALUES command followed by specific values in
parentheses
– Values for character columns in single quotation
marks

Modifying the INSERT Command

• To add new rows modify previous INSERT command

• Use same editing techniques as those used to
correct errors

The INSERT Command with Nulls
• Use a special format of INSERT command to enter a
null value in a table

• Identify the names of the columns that accept non-
null values, then list only the non-null values after
the VALUES command

The INSERT Command with Nulls

• Enter only non-null values
• Precisely indicate values you are entering by listing
the columns
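
The NOT NULL clause and the column-list form of INSERT can be seen together in a small experiment. This is a sketch under assumptions: it uses DBI with DBD::SQLite and an in-memory database instead of MySQL, and the customer table and its values are made up for the example.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; the database lives in memory only.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0 });

# Hypothetical table: two columns reject nulls, one allows them.
$dbh->do(q{
    CREATE TABLE customer (
        customer_num  CHAR(3)     NOT NULL,
        customer_name VARCHAR(35) NOT NULL,
        city          VARCHAR(15)
    )
});

# List only the columns that receive values; city is left null.
$dbh->do(q{INSERT INTO customer (customer_num, customer_name)
           VALUES ('148', 'Example Store')});

my ($city) = $dbh->selectrow_array(
    q{SELECT city FROM customer WHERE customer_num = '148'});
print 'city is ', (defined $city ? $city : 'NULL'), "\n";

# Leaving out a NOT NULL column is rejected; RaiseError turns that into die().
my $accepted = eval {
    $dbh->do(q{INSERT INTO customer (customer_num) VALUES ('282')});
    1;
};
print 'second insert ', ($accepted ? 'accepted' : 'rejected'), "\n";

my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM customer');
$dbh->disconnect();
```

DBI hands a database null back to Perl as undef, which is why the script tests defined($city).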

Viewing Table Data
• Use SELECT command to display all the rows and
columns in a table

• SELECT * FROM followed by the name of the table

• Ends with a semicolon

Correcting Errors In the Database

• UPDATE command is used to update a value in a
table

• DELETE command allows you to delete a record

• INSERT command allows you to add a record

Correcting Errors in the Database
• UPDATE: change the value in a table
• DELETE: delete a row from a table
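
A minimal sketch of correcting errors with UPDATE and DELETE. It assumes DBI with DBD::SQLite and an in-memory database rather than MySQL, and the customer table with its deliberately wrong rows is invented for the example.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; the database lives in memory only.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0 });

# Hypothetical table with one misspelled value and one unwanted row.
$dbh->do('CREATE TABLE customer (customer_num CHAR(3), city VARCHAR(15))');
$dbh->do(q{INSERT INTO customer VALUES ('148', 'Fillmor')});
$dbh->do(q{INSERT INTO customer VALUES ('999', 'Nowhere')});

# UPDATE changes a value in the rows matched by WHERE.
my $updated = $dbh->do(
    q{UPDATE customer SET city = 'Fillmore' WHERE customer_num = '148'});

# DELETE removes the rows matched by WHERE.
my $deleted = $dbh->do(
    q{DELETE FROM customer WHERE customer_num = '999'});

my $rows = $dbh->selectall_arrayref('SELECT customer_num, city FROM customer');
print join(':', @$_), "\n" for @$rows;
$dbh->disconnect();
```

Without a WHERE clause both commands act on every row of the table, so it is worth running the matching SELECT first.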

Saving SQL Commands
• Allows you to use commands again without retyping
• Different methods for each SQL implementation you
are using
– Oracle SQL*Plus and SQL*Plus Worksheet use a
script file
– Access saves queries as objects
– MySQL uses an editor to save text files

Saving SQL Commands
• Script file:
– File containing SQL commands
– Use a text editor or word processor to create
– Save with a .txt file name extension
– Run in MySQL:
• SOURCE file name
• \. file name
– Include full path if file is in folder other than default

Creating the Remaining Database Tables

• Execute appropriate CREATE TABLE and INSERT
commands

• Save these commands to a secondary storage
device

Summary
• Use MySQL Command Line Client window to enter
commands
• Type \h or help to obtain help at the mysql> prompt
• Use MySQL Reference Manual for more detailed
help

Summary (continued)
• Use the CREATE DATABASE command to create a
database

• Use the USE command to change the default
database

• Use the CREATE TABLE command to create tables

• Use the DROP TABLE command to delete a table

Summary (continued)
• CHAR, VARCHAR, DATE, DECIMAL, INT and
SMALLINT data types
• Use INSERT command to add rows
• Use NOT NULL clause to identify columns that cannot
have a null value
• Use SELECT command to view data in a table

Summary (continued)
• Use UPDATE command to change the value in a
column
• Use DELETE command to delete a row
• Use SHOW COLUMNS command to display a
table’s structure

use DBI;

my $dbh = DBI->connect('dbi:mysql:guestdb', 'root', '')
    || die "Database connection not made: $DBI::errstr";

my $sth = $dbh->prepare('SELECT * FROM demo');
$sth->execute();
while (my @row = $sth->fetchrow_array) {
    print join(":",@row),"\n";
}
$sth->finish();

$dbh->disconnect();

The Players

• Perl – a programming language
• DBMS – software to manage data storage
• SQL – a language to talk to a DBMS
• DBI – Perl extensions to send SQL to a
DBMS
• DBD – software DBI uses for specific DBMSs
• $dbh – a DBI object for coarse-grained
access
• $sth – a DBI object for fine-grained access

• What is DBI ?

• DBI is a DataBase Interface
– It is the way Perl talks to Databases
• DBI is a module by Tim Bunce
• DBI is a community of modules & developers

• What is an interface ?

• The overlap where two phenomena affect
each other
• A point at which independent systems interact
• A boundary across which two systems
communicate

• A Sample Interface (the bedrock of DBI)

[Diagram: Fred sends Wilma a message on a bone, carried by Dino]

• Characteristics of the DINO interface

• Separation of knowledge
– Fred doesn’t need to know how to find Wilma
– Dino doesn’t need to know how to read
• Generalizability
– Fred can send any message
– Fred can communicate with anyone

• The DBI interface

[Diagram: Perl sends SQL to the DBMS, carried by DBI]

• Characteristics of the DBI interface

• Separation of knowledge
– You don’t need to know how to connect
– DBI doesn’t need to know SQL

• Generalizability
– You can send any SQL
– You can communicate with any DBMS

• The ingredients of a DBI App
– 1: A perl script that uses DBI
– 2: A DBMS
– 3: SQL statements

Outline of a basic DBI script

Set the Perl Environment
Connect to a DBMS
Perform data-affecting SQL instructions
Perform data-returning SQL requests
Disconnect from the DBMS

• $dbh = DataBase Handle
• Done by DBI
– Connect
• Done by $dbh, The Database Handle
– Perform SQL instructions
– Perform SQL request
– Disconnect

• Set the Perl Environment
– use warnings;
– use strict;
– use DBI;

• Connect to a DBMS

my $dbh = DBI->connect('dbi:DBM:');

$dbh is a Database Handle:
an object created by DBI to handle access to
this specific connection

• Perform data-affecting Instructions

• $dbh->do($sql_string);

• $dbh->do("INSERT INTO geography
VALUES ('Nepal','Asia')");

• Perform data-returning requests

• my @row = $dbh->selectrow_array($sql_string);

• Disconnect from DBMS

• $dbh->disconnect()

A complete script

use strict;
use warnings;
use DBI;

my $dbh=DBI->connect("dbi:mysql:test","root","");
$dbh->do("CREATE TABLE geography (country Text, region Text)");
$dbh->do("INSERT INTO geography VALUES ('Nepal','Asia')");
$dbh->do("INSERT INTO geography VALUES ('Portugal','Europe')");
print $dbh->selectrow_array("SELECT * FROM geography");
$dbh->disconnect;

• The script output

• Only one row
• No separation of the fields
• No metadata

• Improvements

• DBI
– Connect to DBMS
– Creates a database handle ($dbh)
• $dbh
– Provides coarse-grained access to the DBMS
– Creates a statement handle ($sth)
• $sth
– Provides fine-grained access to the DBMS

• Life-cycle of a statement handle ($sth)

• Prepare
– Creates the handle, sends SQL to the DBMS to
be analyzed and optimized
• Execute
– Instructs the DBMS to perform operations
• Fetch
– Brings data from the DBMS into a script

• Life-cycle of a statement handle ($sth)

• my $sth = $dbh->prepare($sql_string);

• $sth->execute();

• print $sth->fetchrow_array();

• Fetching rows in a loop – the snippet

my $sth=$dbh->prepare("SELECT * FROM geography");
$sth->execute();
while (my @row = $sth->fetchrow_array){
print join(":",@row),"\n";
}

• Output
– Nepal:Asia
– Portugal:Europe

• All data retrieved
• Columns separated
• Rows separated
• Still no metadata

• Finding Metadata – Handle Attributes

• $handle->{$key}=$value;
• print $handle->{$key};

• $dbh->{RaiseError}=1;
• print $dbh->{RaiseError};
• my $column_names = $sth->{NAME};

• Finding Metadata with $sth->{NAME}

my $sth=$dbh->prepare("SELECT * FROM geography");
$sth->execute();
my @column_names=@{$sth->{NAME}};
my $num_cols = scalar @column_names;
print join ":",@column_names;
print "(there are $num_cols columns)";
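
The fetch loop and the NAME attribute can be combined into one runnable script that prints a header line followed by the data. This is a sketch under assumptions: it uses DBD::SQLite with an in-memory database instead of the MySQL test database from the slides, and it adds an ORDER BY clause so the row order is predictable.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; the database lives in memory only.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0 });

$dbh->do('CREATE TABLE geography (country TEXT, region TEXT)');
$dbh->do(q{INSERT INTO geography VALUES ('Nepal', 'Asia')});
$dbh->do(q{INSERT INTO geography VALUES ('Portugal', 'Europe')});

# prepare, then execute, then fetch: the life cycle of a statement handle.
my $sth = $dbh->prepare('SELECT * FROM geography ORDER BY country');
$sth->execute();

# Metadata: the column names come from the statement handle.
my @lines = ( join(':', @{ $sth->{NAME} }) );

# Data: one array per row until the result set is exhausted.
while (my @row = $sth->fetchrow_array) {
    push @lines, join(':', @row);
}

print "$_\n" for @lines;
$dbh->disconnect();
```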

• Errors

• $dbh->do("Junk");
• print "I Got here!";

• Checking Errors with RaiseError

• my $dbh=DBI->connect(...);

• $dbh->{RaiseError}=1;
• $dbh->do("Junk");
• print "Here ?";

Number of rows affected

$rows=$dbh->do("DELETE FROM user WHERE age < 42");

# undef = error
# 3 = 3 rows affected
# 0E0 = no error; no rows affected
# -1 = unknown
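
The return values of do() and the effect of RaiseError can be observed in one small script. This is a sketch under assumptions: it uses DBD::SQLite with an in-memory database, the rows in the user table are invented, and eval is used to catch the die() that RaiseError triggers.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; the database lives in memory only.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0 });

# Hypothetical data: two users under 42, one over.
$dbh->do('CREATE TABLE user (name TEXT, age INTEGER)');
$dbh->do(q{INSERT INTO user VALUES ('Ann', 30)});
$dbh->do(q{INSERT INTO user VALUES ('Bob', 41)});
$dbh->do(q{INSERT INTO user VALUES ('Cy', 50)});

# do() returns the number of rows affected.
my $rows = $dbh->do('DELETE FROM user WHERE age < 42');

# No matching rows: the result is still true, but numerically zero.
my $none = $dbh->do('DELETE FROM user WHERE age < 0');

# With RaiseError set, invalid SQL dies instead of failing silently.
my $survived = eval { $dbh->do('Junk'); 1 };
my $error    = $@;

print "deleted $rows rows\n";
print "second delete returned $none\n";
print $survived ? "Here ?\n" : "stopped by RaiseError\n";
$dbh->disconnect();
```

Because a zero-row result is still a true value, a test such as `if ($rows)` separates success from failure, while `$rows == 0` detects that nothing matched.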

• Summary so far

• DBI
– connect($data_source)

• $dbh
– do($sql_instruction)
– prepare($sql_request)
– disconnect()
– {RaiseError}

• $sth
– execute()
– fetchrow_array()
– {NAME}

• A Deeper look at connection

[Diagram: Perl talks to DBI; DBI uses DBD #1 to reach MySQL and DBD #2 to reach Oracle]

• DBDs- Database Drivers

• DRIVER         DBMS
• DBD::DBM       DBM
• DBD::Pg        PostgreSQL
• DBD::mysql     MySQL
• DBD::Oracle    Oracle
• DBD::ODBC      MS Access, MS SQL Server
• …

• Variation in DBDs & DBMSs

• Driver-specific connection parameters
• Driver-specific attributes and methods
• SQL implementation
• Optimization Plans

• Driver-specific connection parameters: driver name, user
name and password

my $dbh = DBI->connect(
"DBI:$driver:",
"root",
"password",
{
RaiseError => 1,
PrintError => 0,
AutoCommit => 1,
}
);

finish() – fetchus interruptus

while (my @row=$sth->fetchrow_array){
last if $row[0] eq $some_conditions;
}

$sth->finish();

• Alternate fetches

• my @row=$sth->fetchrow_array();
– print $row[1];
• my $row=$sth->fetchrow_arrayref();
– print $row->[1];
• my $row=$sth->fetchrow_hashref();
– print $row->{region};
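
The three fetch styles can be compared side by side on one result set. This is a sketch under assumptions: it uses DBD::SQLite with an in-memory database, and the three geography rows are example data. Each fetch call advances to the next row, so the three calls return three different rows.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; the database lives in memory only.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0 });

$dbh->do('CREATE TABLE geography (country TEXT, region TEXT)');
$dbh->do(q{INSERT INTO geography VALUES ('Nepal', 'Asia')});
$dbh->do(q{INSERT INTO geography VALUES ('Peru', 'South America')});
$dbh->do(q{INSERT INTO geography VALUES ('Portugal', 'Europe')});

my $sth = $dbh->prepare(
    'SELECT country, region FROM geography ORDER BY country');
$sth->execute();

# Row 1 as a list: a private copy of the values.
my @row = $sth->fetchrow_array();
my $first_region = $row[1];

# Row 2 as an array reference. Copy the value right away, because DBI
# may reuse the same underlying array for the next fetch.
my $aref = $sth->fetchrow_arrayref();
my $second_region = $aref->[1];

# Row 3 as a hash reference keyed by column name.
my $href = $sth->fetchrow_hashref();
my $third_region = $href->{region};

$sth->finish();
print "$first_region, $second_region, $third_region\n";
$dbh->disconnect();
```

The array-reference form avoids copying each row and the hash-reference form is the most readable; which matters more depends on the size of the result set.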

• Placeholders !

• my $sth = $dbh->prepare("SELECT name
FROM user WHERE country = ? AND city = ?
AND age > ?");
• $sth->execute('Venezuela','Caracas',21);
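
A runnable version of the placeholder query, as a sketch under assumptions: it uses DBD::SQLite with an in-memory database, and the user rows are invented so that exactly one of them matches. The values passed to execute() are bound to the question marks in order, so they never need to be quoted or pasted into the SQL string.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; the database lives in memory only.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0 });

$dbh->do('CREATE TABLE user (name TEXT, country TEXT, city TEXT, age INTEGER)');

# One prepared INSERT, executed once per row with different values.
my $insert = $dbh->prepare('INSERT INTO user VALUES (?, ?, ?, ?)');
$insert->execute(@$_) for (
    [ 'Ana',  'Venezuela', 'Caracas',   30 ],
    [ 'Luis', 'Venezuela', 'Caracas',   19 ],
    [ 'Eva',  'Venezuela', 'Maracaibo', 40 ],
);

# The query from the slide: three placeholders, three bound values.
my $sth = $dbh->prepare(
    'SELECT name FROM user WHERE country = ? AND city = ? AND age > ?');
$sth->execute('Venezuela', 'Caracas', 21);

my @names;
while (my ($name) = $sth->fetchrow_array) {
    push @names, $name;
}

print "Matches: @names\n";
$dbh->disconnect();
```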

• DBDs that don’t need a separate DBMS

• DBD::CSV, DBD::Excel

• DBD::Amazon DBD::Google
use DBI;
my $dbh = DBI->connect("dbi:Google:", $KEY);
my $sth = $dbh->prepare(qq[ SELECT title, URL FROM google WHERE q = "perl" ]);
while (my $r = $sth->fetchrow_hashref) { ...

Step1: Getting Drivers
Essential for SQL Querying

• A driver is a piece of software that lets your
operating system talk to a database
– Installed drivers visible in ODBC manager
• “data connectivity” tool
• Each database engine (Oracle, MySQL, etc)
requires its own driver
– Generally must be installed by user
• Drivers are needed by Data Source Name
tool and querying programs
• Require (simple) installation

MySQL Driver: Needed to Query MySQL Databases

• Windows: download MySQL
Connector/ODBC 3.51
• Must be installed for direct querying
using e.g. Excel
– Not necessary if you are using the MySQL
Query Browser

Exercise 2

Fetch a sequence by adapting live.pl and do remote blast using 3
different scoring matrices (summarize results) and perform
“controls” using an adaptation of shuffle …

• Rat versus mouse RBP
• Rat versus bacterial lipocalin

Parsing BLAST Using BPlite, BPpsilite, and BPbl2seq

• Similar to Search and SearchIO in
basic functionality
• However:
– Older and will likely be phased out in the
near future
– Substantially limited advanced
functionality compared to Search and
SearchIO
– Important to know about because many
legacy scripts use these objects and
may need to be converted

Parse BLAST output
#bioperl_blast_parse.pl
# program prints out query, and all hits with scores for each blast result
use Bio::SearchIO;

my $record = Bio::SearchIO->new(-format => 'blast', -file => $ARGV[0]);

while (my $result = $record->next_result){
print ">", $result->query_name, " ", $result->query_description, "\n";
my $seen = 0;
while (my $hit = $result->next_hit){
print "\t", $hit->name, "\t", $hit->bits, "\t", $hit->significance, "\n";
$seen++ }
if ($seen == 0 ) { print "No Hits Found\n" }
}

Parse BLAST in a little more detail
#bioperl_blast_parse_hsp.pl
# program prints out query, and all hsps with scores for each blast result
use Bio::SearchIO;
my $record = Bio::SearchIO->new(-format => 'blast', -file => $ARGV[0]);
while (my $result = $record->next_result){
print ">", $result->query_name, " ", $result->query_description, "\n";
my $seen = 0;
while (my $hit = $result->next_hit){
$seen++;
while (my $hsp = $hit->next_hsp){
print "\t", $hit->name, " has an HSP with an evalue of: ",
$hsp->evalue, "\n";}
}
if ($seen == 0 ) { print "No Hits Found\n" }
}

Shuffle
#!/usr/bin/perl -w
use strict;

my ($def, @seq) = <>;
print $def;
chomp @seq;
@seq = split(//, join("", @seq));
my $count = 0;
while (@seq) {
my $index = rand(@seq);
my $base = splice(@seq, $index, 1);
print $base;
print "\n" if ++$count % 60 == 0;
}
print "\n" unless $count % 60 == 0;

Searching for Sequence Similarity

• BLAST with BioPerl
• Parsing Blast and FASTA Reports
– Search and SearchIO
– BPLite, BPpsilite, BPbl2seq
• Parsing HMM Reports
• Standalone BioPerl BLAST

Remote Execution of BLAST
• BioPerl has built in capability of running BLAST jobs remotely
using RemoteBlast.pm
• Runs these jobs at NCBI automatically
– NCBI has dynamic configurations (server side) to “always” be up and
ready
– Automatically updated for new BioPerl Releases
• Convenient for independent researchers who do not have
access to huge computing resources
• Quick submission of Blast jobs without tying up local
resources (especially if working from standalone workstation)
• Legal Restrictions!!!

Example of Remote Blast
A script to run a remote blast would be something like the following skeleton:

$remote_blast = Bio::Tools::Run::RemoteBlast->new( '-prog' => 'blastp',
'-data' => 'ecoli', '-expect' => '1e-10' );
$r = $remote_blast->submit_blast("t/data/ecolitst.fa");
while (@rids = $remote_blast->each_rid ) {
foreach $rid ( @rids ) { $rc = $remote_blast->retrieve_blast($rid); }
}

In this example we are running a blastp (protein-protein comparison) against the
ecoli database with an e-value threshold of 1e-10. The sequences that are
being compared are located in the file “t/data/ecolitst.fa”.

Example
It is important to note that all command line options that fall under the blastall
umbrella are available under RemoteBlast.pm.

For example you can change some parameters of the remote job.

Consider the following example:

$Bio::Tools::Run::RemoteBlast::HEADER{'MATRIX_NAME'} = 'BLOSUM25';

This basically allows you to change the matrix used to BLOSUM 25, rather
than the default of BLOSUM 62.

Parsing BLAST and FASTA Reports
• Main BioPerl objects in 1.2 are
Search.pm/SearchIO.pm
– SearchIO is more robust and the preferred choice (will be
continued to be supported in future releases)
• Support parsing of BLAST XML reports and other
• Also allow the ability to parse HMMER reports
• Will continue to grow and provide functionality for
parsing all types of reports. This way multiple report
types can be handled by simply creating multiple
instantiations of the SearchIO object.

Parsing Blast Reports

• One of the strengths of BioPerl is its ability to
parse complex data structures, such as a BLAST
report.
• Unfortunately, there is a bit of arcane
terminology.
• Also, you have to ‘think like bioperl’, in order
to figure out the syntax.
• This next script might get you started

Sample Script to Read and Parse BLAST Report

# Get the report
$searchio = new Bio::SearchIO (-format => 'blast', -file => $blast_report);
$result = $searchio->next_result;
# Get info about the entire report
$result->database_name;
$algorithm_type = $result->algorithm;
# get info about the first hit
$hit = $result->next_hit;
$hit_name = $hit->name ;
# get info about the first hsp of the first hit
$hsp = $hit->next_hsp;
$hsp_start = $hsp->query->start;
