Two great open source
databases: a comparison

                 Josh Berkus
                 PostgreSQL Core Team
                 HP Tux Talks, June 26, 2008
Who is Josh?
●   PostgreSQL Core Team
    –   10 years involvement with the project
    –   Large database performance, tuning, project press
        & corporate relations, user groups
●   Database geek
    –   15 years database application development
    –   MS SQL, MySQL, Oracle, others
●   Open Source guru
    –   OpenOffice.org, LedgerSMB, Bricolage, OpenBRR,
        OSCON, more
Topics
●   Sound Bite
●   History
●   Most Common Uses
●   Features
●   Performance
●   Summary
Topics (mostly about PostgreSQL)
●   Sound Bite
●   History
●   Most Common Uses
●   Features
●   Performance
●   Summary
Sound Bite

MySQL:
    "The most popular open source database"
    "The web database"

PostgreSQL:
    "The world's most advanced open source database"
    "The open source Oracle"
History of MySQL
●   MySQL Server development started in 1994,
    marketed by TCX DataKonsult AB
●   MySQL AB founded in 1995 by Michael
    “Monty” Widenius, David Axmark and Allan
    Larsson
●   Server development based on requirements
    for practical production use: few features, but
    fast and stable
●   Frequent releases with small changes
●   Easy to install and use (15-minute rule)
History of PostgreSQL
• 1986: POSTGRES at the University of California, Berkeley
  > Michael Stonebraker project
  > Successor to INGRES
• 1994: first commercialized
  > as Illustra (later merged into Informix)
• 1995: open-sourced
  > Ported to SQL
  > PostgreSQL Global Development Group formed
• 1997: ported to Japanese, supported in Japan
• 1999: first full-time developers & corporate support
• 2004: native Windows support
• 2006: supported by Sun
Development History

MySQL: Designed by/for Application Developers

PostgreSQL: Designed by/for Database Administrators
Development Priorities
                   (historically)

PostgreSQL                  MySQL
1.Data integrity            1.Ease-of-use
2.Security                  2.Performance
3.Reliability               3.Programmer Features
4.Standards                 4.Reliability
5.DB Features               5.DB Features
6.Performance               6.Data integrity
7.Ease-of-use               7.Security
8.Programmer Features       8.Standards
Development Direction
                (a simplification)

[Diagram: MySQL and PostgreSQL positioned between two poles, "Simple, Easy to Use, Fast" and "Features, Security, Standards"]
Community

MySQL: Owned by one company with user community

PostgreSQL: Community-owned with many companies involved
●   Core MySQL is 100% owned by Sun/MySQL
●   90% of MySQL developers work for Sun
    –   except for the many storage engines
●   MySQL has a large user community
    –   many thousands active worldwide
    –   many partners in other open source groups
●   Sun/MySQL contributes to other OSS projects
    –   PHP especially
●   PostgreSQL has a large distributed
    developer and user community
●   Not owned by any one company
    –   dozens of companies and individuals
        contribute code
    –   est. over 200 developers in 14 time
        zones
●   "Community Owned"
    –   supported by 5 different non-profits
PostgreSQL Community Map

[Diagram: the groups that make up the community]
●   Core
●   Committers
●   Hackers
●   Projects
●   Companies
●   Advocacy
●   User Groups and National Groups
●   Foundations
Most Common Uses

MySQL                   PostgreSQL
●   Web sites           ●   ERP
●   CRM                 ●   Data Warehouse
●   Logging             ●   Geographic
●   OEM applications    ●   Web sites
●   Telecom (cluster)   ●   OEM applications
●   Network tools       ●   Network tools
●   Data Warehouse      ●   CRM
Releases

MySQL                               PostgreSQL
●   Feature-based releases          ●   Time-based releases
    –   new features in minor           –   no new features in
        releases                            minor releases
    –   every 1-3 years                 –   every year
         ●   3.23: 2000                      ●   7.4: 2003
         ●   4.0: 2003                       ●   8.0: 2004
         ●   4.1: 2004                       ●   8.1: 2005
         ●   5.0: 2005                       ●   8.2: 2006
         ●   5.1: 2008?                      ●   8.3: 2008
Features (MySQL)
Storage Engines
●   Pluggable "Storage Engines" allow MySQL to
    behave like a variety of different databases
    –   Telecom DB: MySQL Cluster
    –   Non-transactional: MyISAM
    –   Transactional: InnoDB
    –   Compressed: Archive
    –   In-Memory: Memory
    –   Write-only: Blackhole
Programmer Features
●   Excellent drivers for all languages
    –   including JDBC4
●   PHP
    –   high-performance drivers & special syntax
    –   Native driver
●   MySQL Proxy
3rd Party Support
●   Most open source web projects default to
    MySQL
    –   many use only MySQL
    –   primary relational database for most top 25 web
        sites
●   Hundreds of vendors support MySQL
    –   more than 50% of multi-database products
    –   many "MySQL Partners"
MySQL Scale-Out
●   Simple Replication makes it (relatively) simple
    to scale out
    –   used by Google, Yahoo
    –   load-balance reads on slaves
    –   being supplanted by memcached

[Diagram: requests pass through a load balancer; writes go to the master, reads are spread across two slaves]
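
The read/write split described on this slide can be sketched in a few lines of Python. This is only an illustration of the routing idea, not MySQL's replication API: the `ReplicatedPool` class, the host names, and the rule of classifying a statement by its first keyword are all assumptions made for the example.

```python
import itertools


class ReplicatedPool:
    """Hypothetical router: writes go to the master, reads rotate over slaves."""

    # Assumption for this sketch: a statement is a read if it starts with one of these.
    READ_KEYWORDS = ("SELECT", "SHOW")

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._readers = itertools.cycle(slaves or [master])

    def route(self, sql):
        """Return the host that should run this statement."""
        words = sql.split()
        keyword = words[0].upper() if words else ""
        if keyword in self.READ_KEYWORDS:
            return next(self._readers)
        return self.master


pool = ReplicatedPool("master.db.example", ["slave1.db.example", "slave2.db.example"])
for statement in ("SELECT * FROM users", "UPDATE users SET name = 'x'", "select 1"):
    print(pool.route(statement), "<-", statement)
```

Standard MySQL replication is asynchronous, so a read routed to a slave may not yet see a write that was just committed on the master; an application that needs to read its own writes would have to send those reads to the master as well.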
Simplicity
●   Easy to set up
    –   "15 minute rule"
    –   everything included
●   Easy to administrate
    –   programmer-administered
    –   most installations don't need tuning
●   Easy Replication
    –   very simple master-slave & multimaster replication
Features (PostgreSQL)
Migrateability
●   Closest to proprietary enterprise DBs
●   Automatic migration from Informix
    –   Informix is 50% PostgreSQL
●   Relatively easy migration from Oracle
    –   easiest of any OSS database
    –   puts migration cost within affordable range
    –   tools for data integration
●   SQL Server, DB2 harder
    –   but easier than MySQL
Security


"... by default, PostgreSQL is
the most security-aware
database available ..."

    Database Hacker's Handbook
(based on a comparison of PostgreSQL,
  MySQL, Oracle, DB2 and SQL Server)
Security
●   Authentication
    –   multiple methods: login, SSL, Kerberos, more
    –   host-based authentication
●   Logging
    –   log output is highly configurable and supports user
        auditing
●   Permissions model
    –   SQL ROLES supported, including nested roles
    –   multiple settable permissions on all database
        objects
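
Host-based authentication in PostgreSQL is configured in `pg_hba.conf`, where the first record that matches a connection decides the authentication method. The Python sketch below models only that first-match rule. It is a simplification under stated assumptions: the `HbaRule` type and `pick_method` function are hypothetical names, and real records also carry a connection type and method options that are ignored here.

```python
import ipaddress
from typing import NamedTuple, Optional


class HbaRule(NamedTuple):
    """Simplified stand-in for one pg_hba.conf record (hypothetical type)."""
    database: str  # database name, or "all"
    user: str      # role name, or "all"
    network: str   # client address range in CIDR notation
    method: str    # e.g. "trust", "md5", "reject"


def pick_method(rules, database, user, client_ip) -> Optional[str]:
    """Return the method of the first matching rule, or None if nothing matches."""
    addr = ipaddress.ip_address(client_ip)
    for rule in rules:
        if rule.database not in ("all", database):
            continue
        if rule.user not in ("all", user):
            continue
        if addr in ipaddress.ip_network(rule.network):
            return rule.method
    return None  # no record matched: the connection would be refused


rules = [
    HbaRule("all", "all", "127.0.0.1/32", "trust"),
    HbaRule("sales", "reporting", "10.0.0.0/8", "md5"),
    HbaRule("all", "all", "0.0.0.0/0", "reject"),
]
print(pick_method(rules, "sales", "reporting", "10.1.2.3"))
print(pick_method(rules, "sales", "alice", "10.1.2.3"))
```

Because evaluation stops at the first match, the order of records matters: a broad rule placed before a narrow one hides the narrow one.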
Security
●   Clean code
    –   only one security patch per two months
    –   community patches usually out in less than 72 hours
    –   only one exploit in the field in the last four years
●   DB Auditing
    –   PostgreSQL supports highly configurable triggers
        and other DB automation
    –   No “auditing toolkit” out yet
Transaction Support
●   "Bulletproof" ACID thanks to MVCC
    –   possibly best of any RDBMS
●   Transactional DDL
    –   apply schema changes in a transaction
    –   great for change management
         ●   including agile development
●   Savepoints
    –   spec-compliant "subtransactions"
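
Transactional DDL and savepoints are easiest to see by running them. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in so that it runs without a server; that substitution is an assumption of this example, not something from the talk. The SQL statements themselves (`BEGIN`, `CREATE TABLE`, `ROLLBACK`, `SAVEPOINT`, `ROLLBACK TO SAVEPOINT`, `COMMIT`) are also valid in PostgreSQL. The helper names are hypothetical.

```python
import sqlite3


def connect():
    # isolation_level=None turns off the module's implicit transactions,
    # so every BEGIN / COMMIT / ROLLBACK below is explicit.
    return sqlite3.connect(":memory:", isolation_level=None)


def table_exists(conn, name):
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def ddl_rollback_demo(conn):
    """Create a table inside a transaction, then roll the schema change back."""
    conn.execute("BEGIN")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    seen_inside = table_exists(conn, "accounts")
    conn.execute("ROLLBACK")
    return seen_inside, table_exists(conn, "accounts")


def savepoint_demo(conn):
    """Undo only the work done after a savepoint, then commit the rest."""
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("BEGIN")
    conn.execute("INSERT INTO accounts VALUES (1, 100)")
    conn.execute("SAVEPOINT before_second_insert")
    conn.execute("INSERT INTO accounts VALUES (2, 200)")
    conn.execute("ROLLBACK TO SAVEPOINT before_second_insert")
    conn.execute("COMMIT")
    return conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()


conn = connect()
print(ddl_rollback_demo(conn))  # (True, False)
print(savepoint_demo(conn))     # [(1, 100)]
```

The first result shows why transactional DDL helps change management: a failed migration script leaves no half-applied schema behind. The second shows a savepoint acting as a subtransaction, where only the second insert is discarded.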
BI/DW Features
●   Large database management features
    –   tablespaces, table partitioning
    –   automatic large field/row compression
●   Powerful query planner & executor
    –   complex queries with nested subselects, outer joins
        and calculated fields
    –   large many-table joins with multiple join types
●   Data mining features
    –   full text indexing and regex support
    –   embed external language DM modules
BI/DW Features
select a12.DAY_OF_WEEK_NBR AS DAY_OF_WEEK_NBR,
   max(TO_CHAR(a12.DATE_DESC ,'Day')) AS CustCol_6,
   a11.DATE_ID AS DATE_ID,
   max(a12.DATE_DESC) AS DATE_DESC,
   a11.FI_ID AS FI_ID,
   max(a13.FI_NAME) AS FI_NAME,
   a12.WEEK_YEAR_ID AS WEEK_YEAR_ID,
   max(a14.SHORT_WEEK_DESC) AS SHORT_WEEK_DESC,
   sum (session_count) AS WJXBFS1,
   sum ( a11_count ) AS WJXBFS2
from   ( SELECT DATE_ID, FI_ID,
              count(distinct SESSION_ID) as session_count,
              COUNT(*) as a11_count
       FROM edata.WEB_SITE_ACTIVITY_FA
       WHERE DATE_ID in (2291, 2292, 2293, 2294, 2295)
       GROUP BY DATE_ID, FI_ID )
       a11
   join    edata.DATE_LU a12 on   (a11.DATE_ID = a12.DATE_ID)
   join    edata.DIM_FI a13 on    (a11.FI_ID = a13.FI_ID)
   join    edata.WEEK_LU a14 on   (a12.WEEK_YEAR_ID = a14.WEEK_YEAR_ID)
group by a12.DAY_OF_WEEK_NBR,
   a11.DATE_ID,
   a11.FI_ID,
   a12.WEEK_YEAR_ID
Extensibility
●   Create your own database objects
    –   almost any db object can be extended easily:
         ●   functions
         ●   types
         ●   operators
         ●   aggregates
         ●   pseudo-tables
    –   user-created objects are (usually) first class objects
●   everything is a function
    –   12 different function languages
Extensibility
CREATE OR REPLACE FUNCTION _choose_random_text (
     thestate _random_text,
     newvalue TEXT )
 RETURNS _random_text AS $f$
 DECLARE result _random_text;
 BEGIN
     result.runcount := COALESCE(thestate.runcount, 0) + 1;
     IF random() < ( 1::FLOAT / result.runcount::FLOAT ) THEN
          result.choice := newvalue;
     ELSE
          result.choice := thestate.choice;
     END IF;
     RETURN result;
 END; $f$ LANGUAGE plpgsql;

 CREATE AGGREGATE random_agg(
     BASETYPE = text,
     SFUNC = _choose_random_text,
     STYPE = _random_text,
     FINALFUNC = _exit_random_text
 );
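
The state function above keeps a running count and replaces the current choice with probability 1/runcount, which gives every input row the same chance of being the final pick. A minimal Python sketch of the same logic follows. The names `RandomTextState`, `choose_random_text`, and `random_agg` mirror the SQL but are written for this example, and returning `state.choice` at the end is an assumption about what `_exit_random_text` does, since the slide does not show that function.

```python
import random
from typing import Iterable, NamedTuple, Optional


class RandomTextState(NamedTuple):
    """Counterpart of the _random_text composite type: rows seen so far, current pick."""
    runcount: int
    choice: Optional[str]


def choose_random_text(state, newvalue, rng=random):
    """State transition: keep the new value with probability 1 / runcount."""
    runcount = state.runcount + 1
    if rng.random() < 1.0 / runcount:
        return RandomTextState(runcount, newvalue)
    return RandomTextState(runcount, state.choice)


def random_agg(values: Iterable[str], rng=random) -> Optional[str]:
    """Fold the transition function over the input, as the aggregate would."""
    state = RandomTextState(0, None)
    for value in values:
        state = choose_random_text(state, value, rng)
    # Assumed behaviour of the final function: hand back the chosen value.
    return state.choice


print(random_agg(["red", "green", "blue"]))
```

The first row is always accepted because `random()` is below 1, the second replaces it half the time, the third a third of the time, and so on. One pass over the data is enough, and the row count does not need to be known in advance.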
Special Data
●   Base Types:                 ●   Exotic types
    –   char, varchar               –   geometric: polygon, line
    –   large text                  –   GIS (through PostGIS)
    –   numeric                     –   crypto
    –   integers                    –   ISN & ISBN
    –   floats                      –   XML
    –   time, date, timestamp       –   network: INET, CIDR
    –   bytea (for binary)          –   arrays
                                    –   full text index
                                    –   genome
Special Data: GIS
Special Data: genomics
●   BLASTgres, Unison Protein Database
Procedural Languages
●   Use the language you prefer, inside the
    database:
    –   SQL                   –   Java
    –   PL/pgSQL              –   shell
    –   C                     –   R
    –   C++                   –   PHP
    –   Perl                  –   Ruby
    –   Python                –   Tcl

●   In beta now: PSM, Lua
PL/pgSQL
create or replace function _set_self_paths ( )
returns trigger as $f$
declare parrec RECORD;
     has_kids BOOLEAN;
begin
     --prevent setting order_by too high
     EXECUTE 'SELECT * FROM ' || TG_RELNAME || ' WHERE id = ' || CAST(NEW.parent as TEXT)
          INTO parrec;
     IF parrec.id is not null THEN
          NEW.path := parrec.path || (NEW.id::TEXT);
          NEW.order_path := parrec.order_path || to_char(NEW.order_by, 'FM0000');
          NEW.show_path := parrec.show_path || ' / ' || NEW.name;
     ELSE
          NEW.path := text2ltree(NEW.id::TEXT);
          NEW.order_path := text2ltree(to_char(NEW.order_by, 'FM0000'));
          NEW.show_path := NEW.name;
     END IF;
RETURN NEW;
end;
$f$ language plpgsql;
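
The trigger builds three materialized paths for a row from its parent's paths. The Python sketch below restates that logic outside the database so the resulting values are easy to inspect. The `Node` type and `set_self_paths` signature are hypothetical, and ltree concatenation is modelled as joining labels with a dot.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    path: str        # ltree-style path of ids, e.g. "1.7"
    order_path: str  # ltree-style path of zero-padded sort keys
    show_path: str   # human-readable breadcrumb


def set_self_paths(node_id: int, order_by: int, name: str,
                   parent: Optional[Node]) -> Node:
    """Compute a row's paths from its parent's, as the trigger does for NEW."""
    # Counterpart of to_char(order_by, 'FM0000') for values in 0..9999.
    order_label = f"{order_by:04d}"
    if parent is not None:
        return Node(
            path=f"{parent.path}.{node_id}",
            order_path=f"{parent.order_path}.{order_label}",
            show_path=f"{parent.show_path} / {name}",
        )
    # No parent row found: this row is a root.
    return Node(str(node_id), order_label, name)


root = set_self_paths(1, 3, "Home", None)
child = set_self_paths(7, 12, "Docs", root)
print(root)
print(child)
```

Zero-padding the sort key keeps sibling order correct when paths are compared as text, so sorting on `order_path` returns the tree in display order.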
PL/Perl
CREATE FUNCTION "if_strip_numeric"
(text,smallint) RETURNS text AS $f$
my($the_text, $cutoff) = @_;
$the_text =~ s/[^0-9]//g;
if ( $cutoff > 0 ) {
  $the_text =
    ( substr $the_text, 0, $cutoff );
}
return $the_text;
$f$ LANGUAGE plperl IMMUTABLE STRICT;
PL/R

create or replace function statsum(text)
returns summarytup as '
 sql<-paste("select id_val from sample_numeric_data ",
           "where ia_id=''", arg1, "''", sep="")
 rs <- pg.spi.exec(sql)
 rng <- range(rs[,1])
 return(data.frame(mean = mean(rs[,1]),
  stddev = sd(rs[,1]), min = rng[1], max = rng[2],
  range = rng[2] - rng[1], count = length(rs[,1])))
' language 'plr';
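
For readers who do not use R, the summary tuple that `statsum` returns can be sketched in plain Python. This covers only the statistics, not the database query: the function below takes the values directly, and its name and dictionary keys simply follow the PL/R example. `statistics.stdev` is the sample standard deviation, which matches R's `sd`.

```python
import statistics


def statsum(values):
    """Return the same summary fields as the PL/R function, for a list of numbers."""
    lo, hi = min(values), max(values)
    return {
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": lo,
        "max": hi,
        "range": hi - lo,
        "count": len(values),
    }


print(statsum([1, 2, 3]))
```

Like R's `sd`, the sample standard deviation needs at least two values; `statistics.stdev` raises `StatisticsError` for shorter input.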
PL/Java
/**
 * Update a modification time when the row is updated.
 */
static void moddatetime(TriggerData td)
throws SQLException
{
  if(td.isFiredForStatement())
    throw new TriggerException(td, "can't process STATEMENT events");

  if(td.isFiredAfter())
    throw new TriggerException(td, "must be fired before event");

  if(!td.isFiredByUpdate())
    throw new TriggerException(td, "can only process UPDATE events");

  ResultSet _new = td.getNew();
  String[] args = td.getArguments();
  if(args.length != 1)
    throw new TriggerException(td, "one argument was expected");

  _new.updateTimestamp(args[0], new Timestamp(System.currentTimeMillis()));
}
And Others ...
HAI
    CAN HAS DATABUKKIT?
    I HAS A RESULT
    I HAS A RECORD
    GIMMEH RESULT OUTTA DATABUKKIT "SELECT field FROM mytable"
    IZ RESULT NOOB?
         YARLY
             BYES "SUMWUNZ IN YR PGSQL STEELIN YR DATA"
    KTHX
    IM IN YR LOOP
         GIMMEH RECORD OUTTA RESULT
         VISIBLE RECORD!!FIELD
         IZ RESULT NOOB? KTHXBYE
    IM OUTTA YR LOOP
KTHXBYE
Hackability
●   Clean, easy to read code
●   Modular interfaces with clean separation of layers
●   #1 most hacked up database
    –   Yahoo, Greenplum, Paraccel, Netezza, Truviso ....
Performance

MySQL: Better with simple queries and 2-core machines

PostgreSQL: Better with complex queries and multi-core machines
Benchmarks
      –   SpecJAppserver 2004, as of July 2007

[Charts: "J2EE Throughput" and "Acquisition Cost Comparison" (cost in US dollars), each comparing MySQL, PostgreSQL and Oracle]
Essential Performance
1)Every application performs best with the
  database for which it was designed.
2)Performance benchmarks for databases are
  constantly increasing.
3)Top databases are close enough that you can
  pick the one which suits you best.
Questions?
●   e-mail: josh@postgresql.org
●   IRC: irc.freenode.net, #postgresql
●   blog: blogs.ittoolbox.com/database/soup




Special thanks to:
Giuseppe Maxia and Harrison Fisk of MySQL for information about MySQL

This talk is copyright 2008 Josh Berkus, and is licensed under the Creative Commons Attribution license
