Designing Information Structures for Performance and Reliability: Key Elements to Maximizing DB Server Performance. Bryan Randol, IT/Systems Manager.
Designing Information Structures for Performance and Reliability: Discussion Outline
DAY 1: Hardware Performance: Systematic Tuning Concepts; CPU; Memory Architecture and Front-Side Bus (FSB); Data Flow Concepts; Disk Considerations; RAID.
DAY 2: Database Performance: OLAP vs. OLTP; GreenPlum vs. PostgreSQL; PostgreSQL Concepts and Performance Tweaking; PSA v.1 – GreenPlum on AOpen mini-PCs "dbnode1-dbnode6"; PSA v.2 – Tyan Transport w/PostgreSQL; PSA v.3 – Current PSA Implementation, DELL PowerEdge 2950 w/PostgreSQL 8.3.
I. Database Server Performance: Hardware & Operating System Considerations
DAY 1: Hardware Performance
Designing Information Structures for Performance and Reliability: Discussion Outline
Systematic tuning essentially follows these five steps:
1. Assess the problem and establish numeric values that categorize acceptable behavior. (Know the system's specifications and set realistic goals.)
2. Measure the performance of the system before modification. (Benchmark)
3. Identify the part of the system that is critical for improving performance: the "bottleneck". (Analyze)
4. Modify that part of the system to remove the bottleneck. (Upgrade/Tweak)
5. Measure the performance of the system after modification. (Benchmark)
Repeat steps 3-5 as needed. (Continuous Improvement)
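For the benchmarking steps, PostgreSQL ships a simple load generator, pgbench (a contrib module in the 8.x series), which can supply the before/after numbers. A minimal sketch, assuming a scratch database named mydb:

    # Populate pgbench's test tables (scale factor 10 = roughly 1M rows)
    pgbench -i -s 10 mydb
    # Run 8 concurrent clients for 1000 transactions each; note the reported TPS
    pgbench -c 8 -t 1000 mydb

Run the same command before and after each tweak so the numbers stay comparable.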
I. Database Server Performance: Data Flow Concepts
DB files are stored in the filesystem on disk in blocks. A "job" is requested, initiating a "process thread"; associated files are read into memory "pages". Memory pages are read into the CPU's cache as needed. "Page-outs" to disk occur to make space as needed; "page-ins" from disk are what slow down performance. Once in CPU cache, jobs are processed in threads per CPU (or "core").
I. Database Server Performance: Hardware & Operating System Considerations
Server Performance Considerations: CPU
Each CPU has at least one core; each core processes jobs (threads) sequentially based on the job's priority. Higher-priority jobs get more CPU time. Multi-threaded jobs are distributed evenly across all cores ("parallelized").
Internal Clock Speed: operations the CPU can process internally per second, in MHz, as advertised.
External Clock Speed: speed at which the CPU interacts with the rest of the system, also known as the front-side bus (FSB).
Memory Clock Speed: speed at which RAM is given requests for data.
Important PostgreSQL Performance Note: PostgreSQL uses a multi-process model, meaning each database connection has its own Unix process. Because of this, all multi-CPU operating systems can spread multiple database connections among the available CPUs. However, if only a single database connection is active, it can only use one CPU; PostgreSQL does not use multi-threading to allow a single process to use multiple CPUs.
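The one-process-per-connection model is visible from the shell; a quick check (backend process titles vary slightly by version and platform):

    # Each client connection appears as its own "postgres: user database host" backend
    ps -ef | grep '[p]ostgres:'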
I. Database Server Performance: Hardware & Operating System Considerations
Server Performance Considerations: Memory Architecture and FSB (Front Side Bus)
On Intel-based computers the CPU interfaces with memory through the "North Bridge" memory controller, across the FSB (Front Side Bus). FSB speed and the NorthBridge MMU (memory management unit) drastically affect the server's performance, as they determine how fast data can be fed into the CPU from memory.
Unless special care is taken, a database server running even a simple sequential scan on a table will spend 95% of its cycles waiting for memory to be accessed. This memory access bottleneck is even more difficult to avoid in more complex database operations such as sorting, aggregation, and join, which exhibit a random access pattern. Database algorithms and data structures should therefore be designed and optimized for memory access from the outset.
I. Database Server Performance: Hardware & Operating System Considerations
Intel "Xeon" based systems: Memory Access Challenges
The FSB is a fixed frequency and requires a separate chip to access memory; newer processors will run at the same fixed FSB speed. Memory access is delayed by passing through the separate controller chip. Both processors share the same Front Side Bus, effectively halving each processor's bandwidth to memory and stalling one processor while the other is accessing memory or I/O. All processor-to-system I/O and control must use this one path. One interleaved memory bank serves both processors, again effectively halving each processor's bandwidth to memory: half the bandwidth of a two-memory-bank architecture. All program access to graphics, PCI(e), PCI-X, or other I/O must pass through this bottleneck.
I. Database Server Performance: Hardware & Operating System Considerations
Multiprocessing Memory Access Approaches
Intel Xeon Multiprocessing ("1st Gen."): the FSB cuts bandwidth per CPU; the NorthBridge controller produces overhead; UMA (Uniform Memory Access) means access to memory banks is "uniform".
AMD Multiprocessing ("HyperTransport"): the memory controller is integrated on the CPU, eliminating the separate FSB; NUMA (Non-Uniform Memory Access) means latency to each memory bank varies.
I. Database Server Performance: Hardware & Operating System Considerations
Intel "Harpertown" Xeon Improvements
DELL PowerEdge 2950 III (2 x Xeon E5405 = 8 cores): 4 cores/CPU plus a faster FSB (>= 1333MHz). Northbridge controller bandwidth increased to 21.3GB/s reads from memory and 10.7GB/s writes into memory: 32GB/s overall bandwidth.
DELL PowerEdge 1950 (2 x Xeon E5405 = 8 cores)
I. Database Server Performance: Hardware & Operating System Considerations
Disk Considerations (secondary storage):
1. Seek Time/Rotational Delay: how fast the read/write head is positioned appropriately for reading/writing, and how fast the addressed area is placed under the read/write head for data transfer. SATA (Serial Advanced Technology Attachment) drives are cheap and come in sizes up to 2.5TB, typically maxing out at 7,200 RPM (the "Velociraptor" is the exception at 10,000 RPM). SAS (Serial Attached SCSI) drives are twice as fast (15,000 RPM) and typically twice as expensive, with roughly 1/5 the max capacity of SATA (~450GB).
2. Bandwidth/Throughput (Transfer Time): the raw rate at which data is transferred from disk into memory. This can be aggregated using RAID, which is discussed later. SATA-I bandwidth is 1.5Gb/s, which translates into ~150MB/s real speed. SATA-II and SAS bandwidth is 3Gb/s, which translates into ~300MB/s real speed.
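A crude way to sanity-check a drive's (or array's) sequential read throughput under Linux; a sketch, with the device name as an assumption for your system:

    # Flush the page cache so the test hits the disk rather than RAM
    sync; echo 3 > /proc/sys/vm/drop_caches
    # Read 1GB sequentially; dd prints the effective MB/s when it finishes
    dd if=/dev/sda of=/dev/null bs=1M count=1024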
I. Database Server Performance: Hardware & Operating System Considerations
Disk Considerations (secondary storage):
3. Buffer/Cache: disks contain intelligent controllers, read cache, and write cache. When you ask for a given piece of data, the disk locates the data and sends it back to the motherboard. It also reads the rest of the track and caches this data on the assumption that you will want the next piece of data on the disk. This data is stored locally in its read cache; if, sometime later, you request the next piece of data and it is in the read cache, the disk can deliver it with almost no delay. Write-back cache improves performance because a write to the high-speed cache is faster than writes to normal RAM or disk; this cache aids in addressing the disk-to-memory subsystem bottleneck. Most good drives feature a 32MB buffer cache.
I. Database Server Performance: Hardware & Operating System Considerations
Disk Considerations:
4. Track Data Density: defines how much information can be stored on a given track. The higher the track data density, the more information the disk can store. If a disk can store more data on one track, it does not have to move the head to the next track as often. This means that the higher the recording density, the lower the chances that the head will have to be moved to the next track to get the required data.
I. Database Server Performance: Hardware & Operating System Considerations
Disk Considerations:
5. RAID (n = number of drives in array): "Redundant Array of Inexpensive Disks". Pools disks together to aggregate their throughput by "striping" data in segments across each disk; also provides fault-tolerance. A software-RAID sketch follows this list.
- RAID0 "Striping" (capacity n): fastest due to no parity; raw cumulative speed. A single drive failure causes the entire array to fail: "all-or-none".
- RAID1 "Mirroring" (capacity n/2): each drive is mirrored; speed and capacity are half of RAID0, and an even number of disks is required so they can be divided into pairs. The entire source or mirror set can go bad before data is jeopardized.
- RAID5 "Striping w/Parity" (capacity n-1): fast, with one drive's worth of capacity devoted to parity for fault-tolerance. Only one drive can fail without losing the array.
- RAID6 "Striping w/Dual Parity" (capacity n-2): fast, with two drives' worth of capacity devoted to parity. Two drives can fail without losing the array.
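For illustration, a Linux software-RAID (mdadm) sketch of the capacity arithmetic; the device names are assumptions:

    # Build a RAID5 set from six disks: usable capacity is (n-1) = 5 drives' worth
    mdadm --create /dev/md0 --level=5 --raid-devices=6 /dev/sd[b-g]
    # Inspect geometry, state, and usable size
    mdadm --detail /dev/md0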
I. Database Server Performance: Hardware & Operating System Considerations
Disk Considerations:
6. RAID controller: the device responsible for managing the disk drives in an array. It stores the RAID configuration while also providing additional disk cache, and it offloads costly checksum routines from the CPU in parity-driven RAID configurations (e.g. RAID5 and RAID6). The type of internal and external interface dramatically impacts the overall I/O performance of the array. The internal bus interface should be PCIe v2.0 (500 MB/s per-lane throughput); the most common cards are x2, x4, and x8 "lanes", providing 1GB/s, 2GB/s, and 4GB/s of throughput respectively.
I. Database Server Performance: Hardware & Operating System Considerations
Filesystem Considerations
As an easy performance boost with no downside, make sure the filesystem on which your database is kept is mounted "noatime", which turns off access-time bookkeeping.
XFS is a 64-bit filesystem and supports a maximum filesystem size of 8 binary exabytes minus one byte (on 32-bit Linux systems, XFS is "limited" to 16 binary terabytes). Journal updates in XFS are performed asynchronously to avoid a performance penalty. Files and directories in XFS can span allocation groups, and each allocation group manages its own inode tables (unlike EXT3/EXT2), providing scalability and parallelism: multiple threads and processes can perform I/O operations on the same filesystem simultaneously. On a RAID array, a "stripe unit" can be specified within XFS at creation time; this maximizes throughput by aligning allocations with RAID stripe sizes. XFS also provides a 64-bit sparse address space for each file, which allows both very large file sizes and holes within files for which no disk space is allocated.
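As a concrete sketch of both tweaks (the stripe values are assumptions matching a 6-disk RAID5 with a 64KB stripe element; adjust to your controller, and substitute your own device and mount point):

    # su = RAID stripe element size, sw = number of data spindles (6 disks - 1 parity)
    mkfs.xfs -d su=64k,sw=5 /dev/sdb1
    # Mount with access-time bookkeeping disabled
    mount -o noatime /dev/sdb1 /var/lib/pgsql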
I. Database Server Performance: Hardware & Operating System Considerations
Takeaways from Hardware Performance Concepts:
Keep relevant data closest to the CPU, in memory, once it has been read from disk. More memory reduces the need for costly "page-in" operations from disk by reducing the need to "page-out" data to make space for new data.
Memory bus speed is still much slower than CPU bus speeds, often becoming a bottleneck as CPU speeds increase. It's important to have the fastest memory speed and FSB that your chipset will support.
More CPU cores allow you to parallelize workloads. A multithreaded database takes advantage of multi-processing by distributing a query into several threads across multiple CPUs, drastically increasing the query's efficiency while reducing its process time.
Faster disks with high bandwidth and low seek times maximize read performance into memory for CPUs to process complex queries. OLAP databases benefit from this because they scan large datasets frequently.
Using RAID allows you to aggregate disk I/O by striping data across several spindles, drastically decreasing the time it takes to read data into memory and write back onto the disks during commits, while also providing massive storage space, redundancy, and fault-tolerance.
I. Database Server Performance: Hardware & Operating System Considerations
DAY 2: Database Performance
II. Software & Application Considerations: OLAP and OLTP
OLAP (Online Analytical Processing): provides the big picture, supports analysis, needs aggregate data, evaluates all datasets quickly, uses a multidimensional model.
- DB size is typically 100GB to several TB (even petabytes).
- Mostly read-only operations, lots of scans, complex queries.
- Benefits from multi-threading, parallel processing, and fast drives with high read throughput/low seek times.
- Key performance metrics: query throughput/response time.
OLTP (Online Transactional Processing): provides a detailed audit, supports operations, needs detailed data, finds one dataset quickly, uses a relational model.
- DB size is typically < 100GB.
- Short, atomic transactions; heavy emphasis on lightning-fast writes.
- Key performance metrics: transaction throughput, availability.
II. Software & Application Considerations: OLAP and OLTP
Database Types: OLAP (Online Analytical Processing)
OLAP databases should only receive historical business data and remain isolated from OLTP (transactional) databases: summaries, not transactions. Data in an OLAP database never changes; OLTP data constantly changes.
OLAP databases typically contain fewer tables, arranged into a "star" or "snowflake" schema. The central table in a star schema is called the "fact table"; the leaf tables are called "dimension tables", and the facts within a dimension table are called "members". The joins between the dimension and fact tables allow you to browse through the facts across any number of dimensions.
The simple design of the star schema makes queries easier to write, and they run faster. An OLTP database could involve dozens of tables, making query design complicated; the resulting query could take hours to run.
OLAP databases make heavy use of indexes because they help find records in less time. In contrast, OLTP databases avoid them because they lengthen the process of inserting data.
II. Software & Application Considerations: OLAP and OLTP
Database Types: OLAP (Online Analytical Processing)
The process by which OLAP databases are populated is called Extract, Transform, and Load (ETL). No direct data entries are made into an OLAP database, only summarized bulk ETL transactions.
A cube aggregates the facts at each level of each dimension in a given OLAP schema. Because the cube contains all of the data in an aggregated form, it seems to know the answers to queries in advance. This arrangement of data into cubes overcomes a limitation of relational databases.
II. Software & Application Considerations: OLAP and OLTP
OLAP (Online Analytical Processing): What happens during a query?
1. The client statement is issued.
2. The database server processes the query by locating extents.
3. The data is found on disk.
4. Results are sent through the database server to the client.
II. Software & Application Considerations: PostgreSQL Query Flow
PostgreSQL: The Path of a Query
1. Connection from the application
2. Parsing stage
3. Rewrite stage
4. Cost comparison and plan/optimization stage
5. Execution stage
6. Result
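Stages 4 and 5 can be observed directly: EXPLAIN prints the plan the optimizer chose with its cost estimates, and EXPLAIN ANALYZE also runs the executor and reports actual row counts and timings. A sketch against a hypothetical sales table:

    -- Plan only: planner/optimizer output with estimated costs
    EXPLAIN SELECT * FROM sales WHERE sale_date >= DATE '2009-01-01';
    -- Plan plus real per-node execution times
    EXPLAIN ANALYZE SELECT * FROM sales WHERE sale_date >= DATE '2009-01-01';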
II. Software & Application Considerations: OLAP and OLTP
GreenPlum and PostgreSQL: Of the open-source database options, PostgreSQL is the most robust object-relational database management system. GreenPlum is a commercial DBMS based on PostgreSQL, adding enterprise (OLAP) oriented enhancements and promising the following features:
- Economical petabyte scaling
- Massively parallel query execution
- Unified analytical processing
- Shared-nothing, massively parallel processing architecture
- Fault tolerance
- Linear scalability
- "In-database" compression, 3-10x disk space reduction, with corresponding I/O improvement
The license was $20,000 every 6 months ($40,000/yr.). It's important to note that PostgreSQL is free and can be modified to perform similarly to GreenPlum; we did just that with our PSA server reconstruction project.
II. Software & Application Considerations: PostgreSQL Tweaks
PostgreSQL tweaks explained:
PostgreSQL is tweaked through a configuration file called "postgresql.conf". This flat file contains several dozen parameters, which the master PostgreSQL service, "postmaster", reads at startup. Changes made to this file require the "postgresql" service to be bounced (restarted) via the command, as root: "service postgresql restart".
Corresponding "postgresql.conf" parameters affecting query performance:
- Maximum Connections (max_connections): determines the maximum number of concurrent connections to the database server. Keep in mind that this figure effectively multiplies work_mem, since each connection can claim its own working memory.
- Shared Buffers (shared_buffers): determines how much memory is dedicated to PostgreSQL for caching data. If you have a system with 1GB or more of RAM, a reasonable starting value for shared_buffers is 1/4 of the memory in your system.
- Working Memory (work_mem): if you do a lot of complex sorts and have a lot of memory, increasing the work_mem parameter allows PostgreSQL to do larger in-memory sorts which, unsurprisingly, will be faster than disk-based equivalents.
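Pulling those together, a hypothetical postgresql.conf starting point for a dedicated box with 16GB of RAM (illustrative values, not prescriptive; the actual PSA settings appear in the case study later):

    # postgresql.conf -- illustrative starting values for ~16GB RAM
    max_connections = 25        # each connection can claim its own work_mem
    shared_buffers = 4096MB     # ~1/4 of physical memory
    work_mem = 64MB             # per sort/hash; keep max_connections * work_mem bounded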

II. Software & Application Considerations: PostgreSQL Tweaks
PostgreSQL tweaks explained: Shared Buffers
PostgreSQL does not directly change information on disk. Instead, it requests that data be read into the PostgreSQL shared buffer cache. PostgreSQL backends then read/write these blocks, and finally flush them back to disk.
Backends that need to access tables first look for the needed blocks in this cache. If they are already there, processing can continue right away. If not, an operating system request is made to load the blocks, either from the kernel disk buffer cache or from disk; these can be expensive operations.
The default PostgreSQL configuration allocates 1000 shared buffers of 8 kilobytes each. Increasing the number of buffers makes it more likely that backends will find the information they need in the cache, avoiding an expensive operating system request...up to a limit. The change can be made with a postmaster command-line flag or by changing the value of shared_buffers in postgresql.conf.
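One way to judge whether the cache is doing its job is the statistics collector's per-database hit ratio; a sketch using the standard pg_stat_database view:

    -- Fraction of block requests served from shared buffers rather than the OS/disk
    SELECT datname,
           blks_hit::float / NULLIF(blks_hit + blks_read, 0) AS cache_hit_ratio
    FROM pg_stat_database;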
II. Software & Application Considerations: PostgreSQL Tweaks
PostgreSQL tweaks explained: Shared Buffers, "How much is too much?"
Setting shared_buffers too high results in expensive "paging", which severely degrades the database's performance. If everything doesn't fit in RAM, the kernel starts forcing memory pages out to a disk area called swap, moving pages that have not been used recently; this operation is called a swap pageout. Pageouts are not a problem because they happen during periods of inactivity. What is bad is when these pages have to be brought back in from swap, meaning an old page that was moved out to swap has to be moved back into RAM: a swap pagein. This is bad because while the page is moved from swap, the program is suspended until the pagein completes.
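Swap pageins are easy to watch for from the shell; a sketch using vmstat, whose si/so columns show KB swapped in/out per second:

    # Sample every 5 seconds; sustained nonzero "si" under query load suggests
    # shared_buffers/work_mem are oversized for this machine
    vmstat 5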
II. Software & Application Considerations: PostgreSQL Tweaks
PostgreSQL tweaks explained: Horizontal "Range" Partitioning
Also known as "sharding", this involves putting different rows into different tables for improved manageability and performance. Benefits of partitioning include:
- Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for the leading columns of indexes, reducing index size and making it more likely that the heavily used parts of the indexes fit in memory.
- When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of a sequential scan of that partition instead of using an index and random-access reads scattered across the whole table.
- Seldom-used data can be migrated to cheaper and slower storage media.
II. Software & Application Considerations: PostgreSQL Tweaks
PostgreSQL tweaks explained: Partitioning (cont.)
The benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server. The following forms of partitioning can be implemented in PostgreSQL (a sketch follows):
- Range Partitioning (aka "Horizontal"): the table is partitioned into "ranges" defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example, one might partition by date ranges, or by ranges of identifiers for particular business objects.
- List Partitioning: the table is partitioned by explicitly listing which key values appear in each partition.
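In the PostgreSQL 8.x series, range partitioning is built from table inheritance plus CHECK constraints, which the planner can prune when constraint_exclusion is on. A minimal sketch with hypothetical table and column names:

    -- Parent table holds no rows itself; children carry non-overlapping ranges
    CREATE TABLE measurements (
        id      bigint,
        logdate date NOT NULL,
        value   numeric
    );

    CREATE TABLE measurements_2009q1 (
        CHECK (logdate >= DATE '2009-01-01' AND logdate < DATE '2009-04-01')
    ) INHERITS (measurements);

    CREATE TABLE measurements_2009q2 (
        CHECK (logdate >= DATE '2009-04-01' AND logdate < DATE '2009-07-01')
    ) INHERITS (measurements);

    -- With constraint_exclusion = on, this scans only the 2009q2 child:
    SELECT * FROM measurements WHERE logdate = DATE '2009-05-15';

Rows must also be routed to the correct child on INSERT, typically with a trigger or rule on the parent.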
II. Software & Application Considerations: PostgreSQL Tweaks
PostgreSQL tweaks explained: VACUUM
VACUUM helps ensure the database stays ACID: Atomic, Consistent, Isolated, Durable.
PostgreSQL uses MVCC (Multi-Version Concurrency Control), eliminating read locks on records by allowing several versions of data to exist in a database. VACUUM removes old versions of this multi-versioned data in base tables from the database; these old versions waste space once a commit is made. To keep a PostgreSQL database performing well, you must ensure VACUUM is run correctly. AUTOVACUUM suffices for our query-based, low-transaction database, keeping dead space to a minimum.
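A sketch of running it by hand on a hypothetical table (AUTOVACUUM covers the routine case, as noted above):

    -- Reclaim dead row versions left behind by MVCC and refresh planner statistics
    VACUUM ANALYZE measurements;
    -- VERBOSE reports how many dead row versions were removed
    VACUUM VERBOSE measurements;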
III. PSA Server Case Studies: AOpen mini-PCs + GreenPlum
PSA Server (v1): "dbnode1 – dbnode6"
Originally, PSA was hosted on GreenPlum using 6 AOpen mini-PC nodes. Performance was slow: disk I/O was roughly 90MB/s (realized), and Sysco's weekly reports took roughly 15 minutes. Database volume was constantly around 90% capacity, forcing Mike to manually delete tables; space was at a premium.
Licensing with GreenPlum was expensive ($20,000/6 months, $40,000/yr.), and the system didn't deliver performance as promised (in either PSA or NewOps). NewOps' performance should have been significantly better given its more robust hardware (12 x DELL PowerEdge 2950s).
Since GreenPlum is based on PostgreSQL, it made sense to leverage the underlying free open-source code and scrap the proprietary distributed DB solution, opting for a standalone server with enhanced space and I/O. Migrating existing tables to PostgreSQL required very little modification. The mini-PCs we used to cluster GreenPlum were limited in capacity and scalability; each box was sealed and didn't allow for expansion.
Mini-PC details: AOpen MP965-D; Intel® Core™2 Duo CPU T7300 @ 2GHz; 3.24GB memory; bus speed 800MHz; 150GB SATA drive.
III. PSA Server Case Studies: TYAN Transport + PostgreSQL
PSA Server (v2): "sentrana-psa-dw"
This is our second-generation PSA box, this time using PostgreSQL 8.3 instead of GreenPlum. Formerly used as a testing box at the colo, named "econ.sentrana.com", it consists of a basic Tyan Transport GX28 (B2881) commodity chassis with a Tyan Thunder K8SR (S2881) motherboard, 2 Dual-Core AMD Opteron 270s @ 1000MHz w/ 2MB L2 cache, 8GB memory, and 4 SATA-I drive bays (SATA-II drives are backwards compatible and fit in these bays, but run at SATA-I speed).
Filesystem: EXT3 (4KB block size = kernel page size)
Storage configuration: 4 drive bays = 1 OS drive + 3 RAID5 DB drives @ SATA-I speed (150MB/s)
Read performance: ~76.75MB/s
III. PSA Server Case Studies: DELL PowerEdge 2950 + PostgreSQL
PSA Server (v3): "psa-dw-2950"
This is our third (and current) generation PSA box, still using PostgreSQL; only the server platform has evolved, to a DELL PowerEdge 2950 with dual Xeon quad-core processors @ 2.5GHz, 16GB DDR memory, a 1333MHz FSB, and 6 SATA-II/SAS drive bays configured via a PCIe PERC6/i integrated RAID controller. Formerly used as one of the NewOps DBNodes with GreenPlum, this box was rebuilt from the OS out, using Ubuntu 8.10 Linux as the OS serving PostgreSQL 8.3 as the DB system.
Filesystem: XFS (4KB block size = kernel page size)
Storage configuration: 6 x 1TB drives @ 7.2K RPM (300MB/s SATA-II speed) in a single RAID5 array, ~5TB actual storage space (5 drive spindles used for data, 1 for RAID5 parity)
Read performance: ~507MB/s
III. PSA Server Case Studies: DELL PowerEdge 2950 + PostgreSQL
PSA Server (v3): "psa-dw-2950", postgresql.conf settings:
max_connections = 25
shared_buffers = 4096MB (sets the amount of memory the database server uses for shared memory buffers; 1/4 of total physical memory)
temp_buffers = 1024MB (sets the maximum number of temporary buffers used by each database session)
work_mem = 4096MB (specifies the amount of memory to be used by internal sort operations and hash tables before switching to temporary disk files; too high = paging will occur, too low = writing to tempdb)
maintenance_work_mem = 256MB
random_page_cost = 2.0 (query planner constant, stating that the cost of using disks is 2.0)
effective_cache_size = 12288MB (query planner constant)
constraint_exclusion = on (query planner uses table constraints to optimize queries, e.g. partitioned tables)
1725 Eye St. NW, Suite 900, Washington DC 20006. OFFICE 202.507.4480 | FAX 866.597.3285 | WEB sentrana.com

Editor's Notes

  • #4: Keep relevant data closest to the CPU in memory once it has been read from disk. More memory reduces the need for costly "page-in" operations from disk by reducing the need to "page-out" data to make space for new data. Memory bus speed is still much slower than CPU bus speeds, often becoming a bottleneck as CPU speeds increase. It's important to have the fastest memory speed and FSB that your chipset will support. More CPU cores allow you to parallelize workloads. A multithreaded database takes advantage of multi-processing by distributing a query into several threads across multiple CPUs, drastically increasing the query's efficiency while reducing its process time. Faster disks with high bandwidth and low seek times maximize read performance into memory for CPUs to process complex queries. OLAP databases benefit from this because they scan large datasets frequently. Using RAID allows you to aggregate disk I/O by striping data across several spindles, drastically decreasing the time it takes to read data into memory and write back onto the disks during commits, while also providing massive storage space, redundancy and fault-tolerance.
  • #5: Set realistic goals, know the hardware’s expected limitations....Measure current performance....Analyze the results (research upgrades and possible performance problems)Modify the system..Benchmark again....Repeat as needed
  • #6: Client issues a query across the network ....Database server searches cache and memory for database extents...if they’re not found in memory, they’re located on disk...Disks then seek out the blocks containing the database extents and begin loading the data into memory...Memory pages are then fed into CPU’s cache and ultimately into the CPU for processing...Results are found and sent back to the client
  • #7: Each CPU has several cores. Internal Clock Speed: processes per second in MHz or GHz (advertised). External Clock Speed: speed at which the FSB is accessed (typical bottleneck). Memory Clock Speed: speed at which RAM is given requests for data (another bottleneck). PostgreSQL is multi-process, one Unix process per DB connection. A single connection can only use one CPU; it is not multithreaded.
  • #8: CPU speed has increased roughly 70% each year; memory speed hasn't kept up. DDR memory (double data rate) sends data to the CPU on both the rising and falling edges of the clock cycle, doubling throughput...still a bottleneck. The memory tradeoff is typically speed for capacity: faster = less capacity/more expensive. The further out from the CPU you go, the slower and higher-capacity the storage. Also, disk is the only permanent storage...it also holds swap.
  • #9: 1st Generation Xeon multiprocessing bottlenecks....still bottlenecks today, but less so...Shared FSB between processors....halves bandwidth to memory....second processor competes for FSB bandwidth...Memory access delayed between controller and memory bankBandwidth between I/O controller (Southbridge) and memory controller (Northbridge) congested...as is bandwidth from Northbridge to the expansion slots.
  • #10: Here you see how each Intel processor shares a common FSB bandwidth, dividing the bandwidth per CPU. Access to memory must be at the reduced bandwidth, through the northbridge memory controller, and into the memory banks.AMD’s approach places a Northbridge controller directly on each processor, so there’s no external chipsets to deal with. Each processor features three point-to-point HyperTransport links, delivering 3.2 GB/s of bandwidth in each direction (6.4GB/s full duplex). So AMD’s scalability was better in the earlier days of Xeon multiprocessing.
  • #11: “Second Generation” (Harpertown) Xeon Processors E5200/E5400:==========================================Each CPU has a clock speed of 2GHz, 12MB of L2 cache, and a FSB of 1333MHz (1600MHz max on other models). The read bandwidth for each DDR2 667-MHz memory channel is 5.325 GB/s which gives a total read bandwidth of 21.3 GB/s for four memory channelsWrite memory bandwidth through the same four channels is 10.7 GB/s write memory bandwidth for the same four memory channels. Overall Effective bandwidth to memory is then 32 GB/s ... 21.3 GB/s read and 10.7 GB/s write.5500-series "Nehalem-EP" (Gainestown) adds: (December 2008)=================================Integrated memory controller supporting 2-3 DDR3 memory channelsPoint to Point processor interconnect called “QuickPath” (like AMD’s HyperTransport), bypassing FSBHyperthreading, doubling each core
  • #12: There’s 3 delays associated with reading or writing to a hard drive:Seek Time, Rotational Delay, and Transfer TimeSeek Time is the time it takes for the drive’s read/write head to be physically moved into the correct place for the data being sought.Rotational Delay is the time required for the addressed area of the disk to rotate into a position where it is accessible by the read/write head….typically measured in milliseconds.Transfer Time is the time it takes to transfer data from the disk through the read/write head, across the storage bus, into memory for processing by the CPU.Seek Time/Rotational Delay is heavily influenced by the disk’s rotational speed (RPMs), data location on the actual platters, how many platters the disk has, and the diameter of the platters.Generally speaking, the faster a disk spins, the lower its seek times will be. Also, the further outside the circumference of the platter data is located, the faster it will be sought and lower it’s rotational delay will be.Bandwidth/Throughput (Transfer Time): Once data is located, this is the raw throughput rate at which data is transferred from disk into memory. This can be aggregated using RAID, which will be discussed later.SATA-I bandwidth is 1Gb/s which translates into ~ 150MB/s real speed.SATA-II and SAS bandwidth is 3Gb/s, which translates into ~ 300MB/s real speed.Generally speaking, the higher the data density of the platter, the more data will be sent through the read/write head per block…resulting in higher throughput and lower transfer times.
  • #13: Buffer/Cache: write-back cache. Data normally written to disk by the CPU is first written into the disk's cache. This allows for higher write performance, with the risk that data stored in cache isn't flushed to disk before a power loss. During idle machine cycles, the data are written from the cache into memory or onto disk. Write-back caches improve performance because a write to the high-speed cache is faster than to normal RAM or disk; this cache aids in addressing the disk-to-memory subsystem bottleneck. I've enabled write-back caching on all of our RAID arrays. RAID will be discussed later.
  • #14: 4. Track Data Density :Defines how much information can be stored on a given track. The higher the track data density, the more information the disk can store on one track. If a disk can store more data on one track it does not have to move the head to the next track as often. This means that the higher the recording density the lower the chances are that the head will have to be moved to the next track to get the required data.
  • #15: RAID: (n = number of drives in array) “Redundant Array of Inexpensive Disks”. RAID systems improve performance by allowing the controller to exploit the capabilities of multiple hard disks to get around performance-limiting mechanical issues that plague individual hard disks. Different RAID implementations improve performance in different ways and to different degrees, but all improve it in some way. RAID0 “Striping” (n) : Fastest due to no parity…raw cumulative speed. Single drive failure causes the entire array to fail. “All-or-none” RAID1 “Mirroring” (n/2): Each drive is mirrored, speed and capacity is ½ of RAID0, requires even number of disks in order to be divided. Entire source or mirror array can go bad before data is jeopardized.RAID5 “Striping w/Parity” (n – 1): Fast, with a drive set aside for fault-tolerance. Only one drive can fail before the array is lost.RAID6 “Striping with dual Parity” (n -2): Fast, with 2 drives set aside for fault tolerance. Two drives can fail before the array is lost.
  • #16: Normal PCI's bandwidth is 132MB/s. AGP 8x is 2.1GB/s. PCI Express outperforms PCI significantly: PCIe is bidirectional/full-duplex, allowing data to flow in both directions simultaneously (doubling throughput). PCIe 1x = 500MB/s (250MB/s each way), PCIe 2x = 1GB/s (500MB/s each way), PCIe 4x = 2GB/s (1GB/s each way), PCIe 8x = 4GB/s (2GB/s each way), PCIe 16x = 8GB/s (4GB/s each way), and even PCIe 32x = 16GB/s (8GB/s each way). So, to open the internal bottleneck, you want to use PCIe, not plain PCI, which is old and slow in comparison. AGP is also obsolete due to PCIe's introduction; graphics cards now use this interface as well. Regular PCI is a bottleneck in modern computers. All of our 2950 servers have PERC6/i RAID controllers built in; the "i" means "integrated" on the motherboard. I found that our throughput was significantly slower than what we expected, despite having 6 SATA-II drives, even in RAID0. The settings we selected for the RAID virtual drives were: stripe element size 64KB; read policy: Adaptive Read-Ahead (to optimize large read operations); write policy: Write Back. We were seeing read speeds in RAID5 of approximately 150-225MB/s across 4 drives, which we knew was way too slow given the hardware. After rebuilding the array several times and searching around on the Internet, I came across DELL's PERC firmware update site, which showed that a newer release was available, v.6.2.0-0013, promising "performance enhancements including significant improvements in random-write performance, multi-threaded write performance, and reduction in maximum and average I/O response times." I couldn't flash the PERC controllers without a floppy, so I had to create a Linux-based FreeDOS bootable CD with the updated PERC firmware in a subdirectory, allowing me to successfully flash the controller's BIOS. Later, I discovered that DELL's OpenManage CD provides a tool to handle BIOS updates; however, I wasn't able to get this working, so the FreeDOS solution worked out. I also dug around and found that I could set filesystem read-ahead parameters through "hdparm" in Linux, telling the OS to read ahead 2048 blocks whenever a read operation was performed; I set this in /etc/rc.local to persist after a reboot (see the sketch below). Once the PERC controller was flashed and Linux filesystem read-ahead was set, performance increased dramatically: we're now seeing just over 500MB/s reads in RAID5/6. This significantly reduces the time it takes to load tables into memory for complex queries, thereby reducing overall query execution time. Performance is now on par with GreenPlum without having to pay $40,000/year in licensing.
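A sketch of that read-ahead tweak (the device name is an assumption; the line goes in /etc/rc.local to survive reboots):

    # Set filesystem read-ahead to 2048 sectors on the array device
    hdparm -a 2048 /dev/sda
    # blockdev exposes the same knob
    blockdev --setra 2048 /dev/sda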
  • #18: Keep relevant data closest to the CPU, in memory, once it has been read from disk. More memory reduces the need for costly "page-in" operations from disk by reducing the need to "page-out" data to make space for new data. Memory bus speed is still much slower than CPU speeds and often becomes a bottleneck as CPUs get faster, so it's important to have the fastest memory speed and FSB that your chipset will support. More CPU cores allow you to parallelize workloads: a multithreaded database takes advantage of multi-processing by distributing a query into several threads across multiple CPUs, drastically increasing the query's efficiency while reducing its processing time. Faster disks with high bandwidth and low seek times maximize read performance into memory for CPUs to process complex queries; OLAP databases benefit from this because they scan large datasets frequently. Using RAID lets you aggregate disk I/O by striping data across several spindles, drastically decreasing the time it takes to read data into memory and write back to the disks during commits, while also providing massive storage space, redundancy, and fault tolerance.
  • #20: Database Types (see the query sketch after this list):
OLAP (Online Analytical Processing): Provides the big picture, supports analysis, needs aggregate data, evaluates entire datasets quickly, uses a multidimensional model. DB size is typically 100GB to several TB (even petabytes). Mostly read-only operations: lots of scans and complex queries. Benefits from multi-threading, parallel processing, and fast drives with high read throughput and low seek times. Key performance metrics: query throughput and response time.
OLTP (Online Transactional Processing): Provides a detailed audit trail, supports operations, needs detailed data, finds one dataset quickly, uses a relational model. DB size is typically < 100GB. Short, atomic transactions, both read and write. Key performance metrics: transaction throughput and availability.
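As a concrete illustration of the two workloads (table and column names are hypothetical), an OLAP-style query aggregates across a whole dataset, while an OLTP-style statement touches one record:
    -- OLAP: scan-heavy aggregate over a large fact table
    SELECT region, avg(amount) FROM sales GROUP BY region;
    -- OLTP: short, atomic, single-record update
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 42;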
  • #21: Database Types: The time and expense involved in retrieving answers from databases mean that a lot of business-intelligence information goes unused. The reason: most operational databases (OLTP) are designed to store your data, not to help you analyze it. The solution: an online analytical processing (OLAP) database, a specialized database designed to help you extract business-intelligence information from your data.
  • #24: A connection from an application program to the PostgreSQL server has to be established. The parser stage checks the query transmitted by the application program for correct syntax and creates a query tree. The rewrite system takes the query tree created by the parser stage and looks for any rules (stored in the system catalogs) to apply to it, performing the transformations given in the rule bodies. The planner/optimizer takes the (rewritten) query tree and creates a query plan that will be the input to the executor. It does so by first creating all possible paths leading to the same result; for example, if there is an index on a relation to be scanned, there are two paths for the scan: a simple sequential scan, or a scan using the index. The cost of executing each path is then estimated, and the cheapest path is chosen. The executor recursively steps through the plan tree and retrieves rows in the way represented by the plan. The executor makes use of the storage system while scanning relations, performs sorts and joins, evaluates qualifications, and finally hands back the rows derived.
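You can watch the planner make exactly this choice with EXPLAIN; the table and column names below are hypothetical:
    -- shows the plan chosen (sequential scan vs. index scan) plus actual run time
    EXPLAIN ANALYZE
    SELECT * FROM events WHERE event_time > now() - interval '1 day';
If an index on event_time exists and the predicate is selective, the planner will typically prefer the index scan; otherwise it falls back to the sequential scan.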
  • #25: GreenPlum and PostgreSQL: We found that, despite the claims above, GreenPlum was overpriced, slow, and problematic. Furthermore, our GreenPlum PSA database grew to exceed the hardware we had in place, forcing us to manually delete old tables on a constant basis. To replace GreenPlum while keeping the table structures already in place, we opted for PostgreSQL, aware that it isn't pre-optimized for OLAP/data-warehouse applications. The mindset was that we could tweak PostgreSQL to match the actual performance we saw from GreenPlum without paying for an expensive license. Doing so requires an understanding of PostgreSQL's query-execution characteristics and its configuration-file concepts.
  • #26: max_connections sets the maximum number of client connections per server. Several performance parameters use max_connections as part of their sizing formula when tweaking PostgreSQL.
shared_buffers: As the name implies, this is the shared memory PostgreSQL may use; set it too high and you risk paging.
work_mem (working memory): You need to consider what you set max_connections to in order to size this parameter correctly. This is a setting where data-warehouse systems, whose users submit very large queries, can readily make use of many gigabytes of memory. The size applies to each and every sort done by each user, and complex queries can use multiple work_mem sort buffers. Set it to 50MB with 30 users submitting queries, and you are soon using 1.5GB of real memory; furthermore, if a query involves merge sorts of 8 tables, that requires 8 times work_mem.
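A sketch of that sizing arithmetic as it would appear in postgresql.conf; the values are the hypothetical ones from the example above, not recommendations:
    max_connections = 30
    work_mem = 50MB          # applied per sort, per connection
    # worst case: 30 connections x 50MB = 1.5GB of real memory,
    # and one query merge-sorting 8 tables can consume 8 x work_mem on its own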
  • #29: Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits (see the sketch after this list):
Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. Partitioning substitutes for the leading columns of indexes, reducing index size and making it more likely that the heavily used parts of the indexes fit in memory.
When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of a sequential scan of that partition instead of using an index with random-access reads scattered across the whole table.
Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. ALTER TABLE is far faster than a bulk operation, and it entirely avoids the VACUUM overhead caused by a bulk DELETE.
Seldom-used data can be migrated to cheaper and slower storage media.
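A minimal sketch of range partitioning as it was done in the PostgreSQL 8.x series, using table inheritance plus CHECK constraints (declarative partitioning arrived in much later releases); all table and column names are hypothetical:
    -- parent table holds no rows itself
    CREATE TABLE measurements (logdate date NOT NULL, reading int);
    -- one child table per month, constrained so the planner can skip it
    CREATE TABLE measurements_200906 (
        CHECK (logdate >= DATE '2009-06-01' AND logdate < DATE '2009-07-01')
    ) INHERITS (measurements);
    -- retiring a month is fast DDL, not a bulk DELETE (no VACUUM overhead)
    DROP TABLE measurements_200906;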
  • #31: Vacuuming ensures the databases remain ACID: Atomic, Consistent, Isolated, and Durable.
Atomicity: guarantees that either all of the tasks of a transaction are performed, or none.
Consistency: only valid data will ever be written to the database.
Isolation: other operations cannot access or see the data in an intermediate state during a transaction.
Durability: once the user has been notified of the transaction's success, the transaction will persist and not be undone, thus surviving a system failure.
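In practice, routine vacuuming looks like this (a minimal example; the autovacuum daemon can also handle this automatically in PostgreSQL 8.3 and later):
    -- reclaims space from dead rows and refreshes the planner's statistics
    VACUUM ANALYZE;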
  • #32: Disk I/O was roughly 90MB/s. The database volume was constantly at around 90% capacity, forcing Mike to manually delete tables; space was at a premium. Licensing with GreenPlum was expensive ($20,000/6 months, i.e. $40,000/yr.), so it made sense to leverage the underlying free open-source code and scrap the proprietary distributed DB solution.
  • #33: Performance challenges with this server:
Limited capacity; drives were small and slow (150MB/s).
The RAID controller's SATA-I interface didn't recognize higher-capacity SATA-II drives, despite SATA-II's backwards compatibility.
No floppy drive or USB boot capability, making it difficult to flash the controller and BIOS for SATA-II backwards compatibility.
No PCIe expansion bays, only PCI-X, ruling out high-performance external enclosures.
PostgreSQL requires significant configuration tweaks to realize decent performance.
PostgreSQL isn't multithreaded: a single query process (regardless of its complexity) uses only 1 CPU core.
  • #34: This is our third (and current) generation PSA box:
DELL PowerEdge 2950 with dual Quad-Core Xeon processors @ 2.5GHz.
16GB DDR memory @ 667MHz bus speed (x2, DDR); 1333MHz FSB.
6 SATA-II 1TB 7200RPM drives, configured in a single 5TB RAID5 array: 1 drive can fail, throughput is across 5 spindles, and effective formatted capacity is roughly 4.5TB.
Drive setup: used the PERC6/i BIOS menu for RAID configuration (hardware RAID, transparent to the OS); battery backup is enabled for write caching.
Virtual Drive 1 (physical drives 1 and 2): RAID1 for the system (1TB), 64KB stripe element size, Write-Back enabled.
Virtual Drive 2 (physical drives 3, 4, 5, 6): RAID5 for PostgreSQL data (3TB), 64KB stripe element size, Write-Back enabled.
I/O performance: Read: 507MB/s; Write: 401MB/s.
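One way to reproduce sequential-throughput numbers like those above (a sketch; the file path is an assumption, and the direct-I/O flags bypass the page cache so the disks themselves are measured):
    # sequential write test: 4GB of zeroes in 1MB blocks
    dd if=/dev/zero of=/data/ddtest bs=1M count=4096 oflag=direct
    # sequential read test of the same file
    dd if=/data/ddtest of=/dev/null bs=1M iflag=direct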
  • #35: Through a process of elimination and online research (Google and the PostgreSQL forums), we have settled on the above settings in the PSA server's configuration file:
max_connections = 25. Mike confirmed that only 10-15 PSA connections are ever really active at any given time, so this setting allows for spikes while remaining conservative enough not to inflate work_mem, since work_mem uses max_connections in its memory-allocation formula.
shared_buffers = 4096MB. This number comes from best practices: ¼ of total physical memory (16GB).
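As those settings would appear in postgresql.conf (an excerpt reconstructed from the values above):
    max_connections = 25       # 10-15 active in practice; headroom for spikes
    shared_buffers = 4096MB    # best-practice 1/4 of the 16GB of physical memory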