Exam 70-432: Microsoft SQL Server 2008—
Implementation and Maintenance
OBJECTIVE                                                LOCATION IN BOOK

INSTALLING AND CONFIGURING SQL SERVER 2008

Configure additional SQL Server components.              Chapter 1, Lessons 3 and 4
                                                        Chapter 5, Lessons 1, 2 and 3

Configure SQL Server instances.                          Chapter 1, Lesson 3

Configure SQL Server services.                           Chapter 1, Lesson 3

Install SQL Server 2008 and related services.           Chapter 1, Lesson 3
                                                        Chapter 5, Lessons 1, 2 and 3

Implement database mail.                                Chapter 1, Lesson 4

Configure full-text indexing.                            Chapter 5, Lessons 1, 2 and 3

MAINTAINING SQL SERVER INSTANCES

Manage SQL Server Agent jobs.                           Chapter 10, Lesson 2

Manage SQL Server Agent alerts.                         Chapter 10, Lesson 4

Manage SQL Server Agent operators.                      Chapter 10, Lesson 3

Implement the declarative management framework (DMF).   Chapter 8, Lessons 1 and 2

Back up a SQL Server environment.                       Chapter 9, Lessons 1, 2 and 3

MANAGING SQL SERVER SECURITY

Manage logins and server roles.                         Chapter 11, Lesson 3

Manage users and database roles.                        Chapter 11, Lesson 3

Manage SQL Server instance permissions.                 Chapter 11, Lesson 4

Manage database permissions.                            Chapter 11, Lesson 4

Manage schema permissions and object permissions.       Chapter 11, Lesson 4

Audit SQL Server instances.                             Chapter 11, Lesson 5

Manage transparent data encryption.                     Chapter 11, Lesson 6

Configure surface area.                                  Chapter 8, Lessons 1, 2 and 3
                                                        Chapter 11, Lesson 2

MAINTAINING A SQL SERVER DATABASE

Back up databases.                                       Chapter 2, Lesson 1
                                                         Chapter 9, Lesson 1

Restore databases.                                       Chapter 9, Lessons 2 and 3

Manage and configure databases.                          Chapter 2, Lessons 2, 3 and 4

Manage database snapshots.                               Chapter 9, Lesson 3

Maintain database integrity.                             Chapter 2, Lesson 4

Maintain a database by using maintenance plans.          Chapter 9, Lesson 1

PERFORMING DATA MANAGEMENT TASKS

Import and export data.                                  Chapter 7, Lessons 1, 2, 3 and 4

Manage data partitions.                                  Chapter 6, Lessons 1, 2, 3 and 4

Implement data compression.                              Chapter 3, Lesson 1

Maintain indexes.                                        Chapter 4, Lesson 3
                                                         Chapter 5, Lessons 1, 2 and 3

Manage collations.                                       Chapter 2, Lesson 3

MONITORING AND TROUBLESHOOTING SQL SERVER

Identify SQL Server service problems.                    Chapter 12, Lesson 4

Identify concurrency problems.                           Chapter 12, Lesson 2

Identify SQL Agent job execution problems.               Chapter 10, Lesson 1

Locate error information.                                Chapter 12, Lesson 1

OPTIMIZING SQL SERVER PERFORMANCE

Implement Resource Governor.                             Chapter 13, Lesson 6

Use the Database Engine Tuning Advisor.                  Chapter 13, Lesson 4

Collect trace data by using SQL Server Profiler.         Chapter 12, Lesson 2

Collect performance data by using Dynamic Management Views (DMVs).  Chapter 13, Lesson 5

Collect performance data by using System Monitor.        Chapter 12, Lesson 1

Use Performance Studio.                                  Chapter 13, Lesson 7

IMPLEMENTING HIGH AVAILABILITY

Implement database mirroring.                            Chapter 15, Lessons 1, 2 and 3

Implement a SQL Server clustered instance.               Chapter 14, Lessons 1 and 2

Implement log shipping.                                  Chapter 16, Lessons 1, 2 and 3

Implement replication.                                   Chapter 17, Lessons 1, 2 and 3




Exam objectives: The exam objectives listed here are current as of this book’s publication date. Exam objectives are
subject to change at any time without prior notice and at Microsoft’s sole discretion. Please visit the Microsoft Learning
Web site for the most current listing of exam objectives: http://www.microsoft.com/learning/mcp/.
PUBLISHED BY
Microsoft Press
A Division of Microsoft Corporation
One Microsoft Way
Redmond, Washington 98052-6399
Copyright © 2009 by Mike Hotek
All rights reserved. No part of the contents of this book may be reproduced or transmitted in any form or by any means
without the written permission of the publisher.
Library of Congress Control Number: 2008940530

Printed and bound in the United States of America.

1 2 3 4 5 6 7 8 9 QWT 4 3 2 1 0 9

Distributed in Canada by H.B. Fenn and Company Ltd.

A CIP catalogue record for this book is available from the British Library.

Microsoft Press books are available through booksellers and distributors worldwide. For further information about
international editions, contact your local Microsoft Corporation office or contact Microsoft Press International directly at
fax (425) 936-7329. Visit our Web site at www.microsoft.com/mspress. Send comments to mspinput@microsoft.com.

Microsoft, Microsoft Press, Excel, IntelliSense, Internet Explorer, MSDN, MSN, SharePoint, Silverlight, SQL Server, Visual
Studio, Windows, and Windows Server are either registered trademarks or trademarks of the Microsoft group of companies.
Other product and company names mentioned herein may be the trademarks of their respective owners.

The example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted
herein are fictitious. No association with any real company, organization, product, domain name, e-mail address, logo,
person, place, or event is intended or should be inferred.

This book expresses the author’s views and opinions. The information contained in this book is provided without any
express, statutory, or implied warranties. Neither the author, Microsoft Corporation, nor its resellers or distributors will
be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.

Acquisitions Editor: Ken Jones
Developmental Editor: Laura Sackerman
Project Editor: Denise Bankaitis
Editorial Production: S4Carlisle Publishing Services
Technical Reviewers: Steve Kass and Umachandar Jayachandran; Technical Review services provided by
                     Content Master, a member of CM Group, Ltd.
Cover: Tom Draper Design




Body Part No. X15-24083
To Genilyn,
My compass in a storm and the light to show me the way home
Contents at a Glance

             Introduction                                xxvii

CHAPTER 1    Installing and Configuring SQL Server 2008      1
CHAPTER 2    Database Configuration and Maintenance         37
CHAPTER 3    Tables                                        61
CHAPTER 4    Designing SQL Server Indexes                  85
CHAPTER 5    Full Text Indexing                           111
CHAPTER 6    Distributing and Partitioning Data           135
CHAPTER 7    Importing and Exporting Data                 161
CHAPTER 8    Designing Policy Based Management            177
CHAPTER 9    Backing Up and Restoring a Database          197
CHAPTER 10   Automating SQL Server                        233
CHAPTER 11   Designing SQL Server Security                251
CHAPTER 12   Monitoring Microsoft SQL Server              307
CHAPTER 13   Optimizing Performance                       367
CHAPTER 14   Failover Clustering                          407
CHAPTER 15   Database Mirroring                           451
CHAPTER 16   Log Shipping                                 483
CHAPTER 17   Replication                                  513


             Glossary                                    553
             Answers                                     561
             Index                                       599
Contents

            Introduction                                                                                              xxvii

Chapter 1   Installing and Configuring SQL Server 2008                                                                         1
            Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

            Lesson 1: Determining Hardware and Software Requirements . . . . . . . . . . 3
                     Minimum Hardware Requirements                                                                             3
                     Supported Operating Systems                                                                               4
                     Software Requirements                                                                                     5
                     Lesson Summary                                                                                            6
                     Lesson Review                                                                                             6

            Lesson 2: Selecting SQL Server Editions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
                     SQL Server Services                                                                                       8
                     SQL Server Editions                                                                                     11
                     Lesson Summary                                                                                          15
                     Lesson Review                                                                                           15

            Lesson 3: Installing and Configuring SQL Server Instances . . . . . . . . . . . . . 17
                     Service Accounts                                                                                        17
                     Collation Sequences                                                                                     18
                     Authentication Modes                                                                                    18
                     SQL Server Instances                                                                                    19
                     SQL Server Configuration Manager                                                                         19
                     Lesson Summary                                                                                          27
                     Lesson Review                                                                                           27

            Lesson 4: Configuring Database Mail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
                     Database Mail                                                                                           28




            What do you think of this book? We want to hear from you!
            Microsoft is interested in hearing your feedback so we can continually improve our
            books and learning resources for you. To participate in a brief online survey, please visit:

                                                        www.microsoft.com/learning/booksurvey/
                     Lesson Summary                                                                                          31
                     Lesson Review                                                                                           31

                              Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
                                       Chapter Summary                                                                                         33
                                       Key Terms                                                                                               33
                                       Case Scenario                                                                                           33

                              Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
                                       Installing SQL Server                                                                                  34
                                       Managing SQL Server Services                                                                           34

                              Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35


                  Chapter 2   Database Configuration and Maintenance                                                                           37
                              Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

                              Lesson 1: Configuring Files and Filegroups . . . . . . . . . . . . . . . . . . . . . . . . . . 39

                              Files and Filegroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

                              Transaction Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

                              FILESTREAM data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

                              tempdb Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
                                       Lesson Summary                                                                                          45
                                       Lesson Review                                                                                           45

                              Lesson 2: Configuring Database Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

                              Database Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
                                       Recovery Options                                                                                       46
                                       Auto Options                                                                                           48
                                       Change Tracking                                                                                         50
                                       Access                                                                                                  50
                                       Parameterization                                                                                        51

                              Collation Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
                                       Lesson Summary                                                                                          53
                                       Lesson Review                                                                                           53

                              Lesson 3: Maintaining Database Integrity . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

                              Database Integrity Checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
                                       Lesson Summary                                                                                          56
                                        Lesson Review                                                                                           56

            Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
                     Chapter Summary                                                                                         57
                     Key Terms                                                                                               57
                     Case Scenario                                                                                           57

            Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
                     Configuring Databases                                                                                    59

            Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60


Chapter 3   Tables                                                                                                          61
            Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

            Lesson 1: Creating Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
                     Schemas                                                                                                 63
                     Data Types                                                                                             64
                     Column Properties                                                                                       69
                     Computed Columns                                                                                        72
                     Row and Page Compression                                                                                72
                     Creating Tables                                                                                         73
                     Lesson Summary                                                                                          75
                     Lesson Review                                                                                           76

            Lesson 2: Implementing Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
                     Primary Keys                                                                                           77
                     Foreign Keys                                                                                           77
                     Unique Constraints                                                                                      78
                     Default Constraints                                                                                     78
                     Check Constraints                                                                                       78
                     Lesson Summary                                                                                         80
                     Lesson Review                                                                                          80

            Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
                     Chapter Summary                                                                                         81
                     Key Terms                                                                                               81
                     Case Scenario                                                                                           81

            Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
                     Creating Tables                                                                                         82
                     Creating Constraints                                                                                    83

                           Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83


               Chapter 4   Designing SQL Server Indexes                                                                                   85
                           Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

                           Lesson 1: Index Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
                                    Index Structure                                                                                         87
                                    Balanced Trees (B-Trees)                                                                               88
                                    Index Levels                                                                                           89
                                    Lesson Summary                                                                                          91
                                    Lesson Review                                                                                           91

                           Lesson 2: Designing Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
                                    Clustered Indexes                                                                                       93
                                    Nonclustered Indexes                                                                                    95
                                    Index Options                                                                                           97
                                    XML Indexes                                                                                            99
                                    Spatial Indexes                                                                                        99
                                    Lesson Summary                                                                                        102
                                    Lesson Review                                                                                         102

                           Lesson 3: Maintaining Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .104
                                    Index Management and Maintenance                                                                     104
                                    Disabling an Index                                                                                    105
                                    Lesson Summary                                                                                        107
                                    Lesson Review                                                                                         107

                           Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .108
                                    Chapter Summary                                                                                      108
                                    Key Terms                                                                                            108
                                    Case Scenario                                                                                        108

                           Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
                                    Creating Indexes                                                                                      110

                           Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

               Chapter 5   Full Text Indexing                                                                                           111
                           Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
            Lesson 1: Creating and Populating Full Text Indexes . . . . . . . . . . . . . . . . 113
                     Full Text Catalogs                                                                                  113
                     Full Text Indexes                                                                                   114
                     Change Tracking                                                                                     115
                     Language, Word Breakers, and Stemmers                                                               116
                     Lesson Summary                                                                                      118
                     Lesson Review                                                                                       118

            Lesson 2: Querying Full Text Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .120
                     FREETEXT                                                                                            120
                     CONTAINS                                                                                            121
                     Lesson Summary                                                                                      125
                     Lesson Review                                                                                       126

            Lesson 3: Managing Full Text Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .127
                     Thesaurus                                                                                           127
                     Stop Lists                                                                                          128
                     Populate Full Text Indexes                                                                          128
                     Lesson Summary                                                                                      131
                     Lesson Review                                                                                       131

            Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .132
                     Chapter Summary                                                                                     132
                     Key Terms                                                                                           132
                     Case Scenario                                                                                       132

            Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
                     Create a Full Text Index                                                                            133
                     Query a Full Text Index                                                                             133
                     Create a Thesaurus File                                                                             134
                     Create a Stop List                                                                                  134

            Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .134


Chapter 6   Distributing and Partitioning Data                                                                         135
            Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

            Lesson 1: Creating a Partition Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
                     Partition Functions                                                                                 137
                     Lesson Summary                                                                                      141
                                       Lesson Review                                                                                        141

                             Lesson 2: Creating a Partition Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
                                      Partition Schemes                                                                                    142
                                      Lesson Summary                                                                                      144
                                      Lesson Review                                                                                       144

                             Lesson 3: Creating Partitioned Tables and Indexes . . . . . . . . . . . . . . . . . . .146
                                      Creating a Partitioned Table                                                                         146
                                      Creating a Partitioned Index                                                                         147
                                      Lesson Summary                                                                                       149
                                      Lesson Review                                                                                        149

                             Lesson 4: Managing Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .150
                                      Split and Merge Operators                                                                            150
                                      Altering a Partition Scheme                                                                          150
                                      Index Alignment                                                                                      151
                                      Switch Operator                                                                                      151
                                      Lesson Summary                                                                                       156
                                      Lesson Review                                                                                        156

                             Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
                                      Chapter Summary                                                                                      157
                                      Key Terms                                                                                            157
                                      Case Scenario                                                                                        157

                             Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .160
                                      Partitioning                                                                                         160
                             Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .160


                 Chapter 7   Importing and Exporting Data                                                                                161
                             Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

                             Lesson 1: Importing and Exporting Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

                             Bulk Copy Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

                             The BULK INSERT command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
                                      The SQL Server Import and Export Wizard                                                              166
                                      Lesson Summary                                                                                       172
                                      Lesson Review                                                                                        172

            Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
                      Chapter Summary                                                                                             173
                      Key Terms                                                                                                   173
                      Case Scenario                                                                                               173

            Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
                      Import and Export Data                                                                                      175

            Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176


Chapter 8   Designing Policy Based Management                                                                                    177
            Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

            Lesson 1: Designing Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

            Facets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

            Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .180

            Policy Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .180

            Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

            Policy Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

            Policy Compliance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .181
                      Central Management Server                                                                                   183

            Import and Export Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .183
                      Lesson Summary                                                                                              191
                      Lesson Review                                                                                               191

            Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
                      Chapter Summary                                                                                             193
                      Key Terms                                                                                                   193
                      Case Scenario                                                                                               193

            Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .196
                      Implement Policy Based Management                                                                           196

            Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .196


Chapter 9   Backing Up and Restoring a Database                                                                                  197
            Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197

            Lesson 1: Backing Up Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .199

                 Backup Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .199
                          Full Backups                                                                                         200
                          Transaction Log Backups                                                                               203
                          Differential Backups                                                                                 204
                          Filegroup Backups                                                                                    205

                 Partial Backups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .205

                 Page Corruption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .206

                 Maintenance Plans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .206
                 Certificates and Master Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .207

                 Validating a Backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .209
                          Lesson Summary                                                                                        211
                          Lesson Review                                                                                         211

                 Lesson 2: Restoring Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .212

                 Transaction Log Internals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .212

                 Database Restores. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
                          Restoring a Full Backup                                                                               214
                          Restoring a Differential Backup                                                                       216
                          Restoring a Transaction Log Backup                                                                    216
                          Online Restores                                                                                       217
                          Restore a Corrupt Page                                                                                217
                          Restoring with Media Errors                                                                           218
                          Lesson Summary                                                                                        222
                          Lesson Review                                                                                         222
                 Lesson 3: Database Snapshots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

                 Creating a Database Snapshot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
                          Copy-On-Write Technology                                                                              224

                 Reverting Data Using a Database Snapshot . . . . . . . . . . . . . . . . . . . . . . . . . 225
                          Lesson Summary                                                                                        227
                          Lesson Review                                                                                         227

                 Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .228
                          Chapter Summary                                                                                       228
                          Key Terms                                                                                             228
                          Case Scenario                                                                                         229
           Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
                     Backing Up a Database                                                                                    231
                     Restoring a Database                                                                                     231

           Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231


Chapter 10 Automating SQL Server                                                                                            233
           Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

           Lesson 1: Creating Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .234
                    Job Steps                                                                                                234

           Job Schedules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

           Job History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

           Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .236
                    Lesson Summary                                                                                           240
                    Lesson Review                                                                                            240

           Lesson 2: Creating Alerts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242

           SQL Server Agent Alerts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
                    Lesson Summary                                                                                           245
                    Lesson Review                                                                                            245

           Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .246
                    Chapter Summary                                                                                          246
                    Key Terms                                                                                                246
                    Case Scenario                                                                                            246

           Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .248
                    Create Jobs                                                                                              248
                    Create Alerts                                                                                            248

           Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249


Chapter 11 Designing SQL Server Security                                                                                    251
           Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251

           Lesson 1: TCP Endpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
                    Endpoint Types and Payloads                                                                              252
                    Endpoint Access                                                                                          253

                 TCP Endpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
                          TCP Protocol Arguments                                                                                253
                          Database Mirroring and Service Broker Common Arguments                                               254
                          Database Mirroring–Specific Arguments                                                                  255
                          Service Broker–Specific Arguments                                                                      255
                          Lesson Summary                                                                                        257
                          Lesson Review                                                                                         257

                 Lesson 2: Configuring the SQL Server
                    Surface Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
                          Surface Area Configuration                                                                             259
                          Lesson Summary                                                                                        261
                          Lesson Review                                                                                         262

                 Lesson 3: Creating Principals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .263
                          Logins                                                                                                263
                          Fixed Server Roles                                                                                    265
                          Database Users                                                                                        266
                          Loginless Users                                                                                       266
                          Fixed Database Roles                                                                                  267
                          User Database Roles                                                                                   268
                          Lesson Summary                                                                                        269
                          Lesson Review                                                                                         270

                 Lesson 4: Managing Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .271
                          Securables                                                                                            272
                          Permissions                                                                                           272
                          Metadata Security                                                                                     273
                          Ownership Chains                                                                                      274
                          Impersonation                                                                                         275
                          Master Keys                                                                                           275
                          Certificates                                                                                           276
                          Signatures                                                                                           277
                          Lesson Summary                                                                                        283
                          Lesson Review                                                                                         283

                 Lesson 5: Auditing SQL Server Instances. . . . . . . . . . . . . . . . . . . . . . . . . . . .285
                          DDL Triggers                                                                                          285
                          Audit Specifications                                                                                   286
                     C2 Auditing                                                                                         288
                    Lesson Summary                                                                                      290
                    Lesson Review                                                                                       291

           Lesson 6: Encrypting Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .292
                    Data Encryption                                                                                     292
                    Hash Algorithms                                                                                     293
                    Symmetric Keys                                                                                      294
                    Certificates and Asymmetric Keys                                                                     294
                    Transparent Data Encryption                                                                         294
                    Encryption Key Management                                                                           296
                    Lesson Summary                                                                                     300
                    Lesson Review                                                                                      300

           Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .302
                    Chapter Summary                                                                                     302
                    Key Terms                                                                                           302
                    Case Scenario: Designing SQL Server Security                                                        303

           Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .304
                    Manage Logins and Server Roles                                                                     304
                    Manage Users and Database Roles                                                                     305
                    Manage SQL Server Instance Permissions                                                              305
                    Manage Database Permissions                                                                         305
                    Manage Schema Permissions and Object Permissions                                                    305
                    Audit SQL Server Instances                                                                          305
                    Manage Transparent Data Encryption                                                                  305

           Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .305

Chapter 12 Monitoring Microsoft SQL Server                                                                            307
           Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .307

           Lesson 1: Working with System Monitor. . . . . . . . . . . . . . . . . . . . . . . . . . . .309

           System Monitor Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .309

           Capturing Counter Logs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310

           Performance Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
                    Lesson Summary                                                                                      315
                    Lesson Review                                                                                       315
                    Lesson 2: Working with the SQL Server Profiler . . . . . . . . . . . . . . . . . . . . . 317

                   Defining a Trace. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317

                   Specifying Trace Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .320

                   Selecting Data Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322

                   Applying Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323

                   Managing Traces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324

                   Correlating Performance and Monitoring Data . . . . . . . . . . . . . . . . . . . . . 325
                             Lesson Summary                                                                                              330
                             Lesson Review                                                                                               331

                   Lesson 3: Diagnosing Database Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332

                   SQL Server Logs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332

                   Database Space Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .334
                             Lesson Summary                                                                                              338
                             Lesson Review                                                                                               339

                   Lesson 4: Diagnosing Service Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .340

                   Finding Service Startup Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .340
                             Configuration Manager                                                                                        340
                             Lesson Summary                                                                                              349
                             Lesson Review                                                                                               349

                   Lesson 5: Diagnosing Hardware Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351

                   Disk Drives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
                   Memory and Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
                             Lesson Summary                                                                                              353
                             Lesson Review                                                                                               353

                   Lesson 6: Resolving Blocking and Deadlocking Issues . . . . . . . . . . . . . . . . 355

                   Locks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355

                   Transaction Isolation Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356

                   Blocked Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357

                   Deadlocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .358
                             Lesson Summary                                                                                              362
                             Lesson Review                                                                                               362

Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .363
                    Chapter Summary                                                                                     363
                    Key Terms                                                                                           363
                    Case Scenario                                                                                       363

           Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .366
                    Creating a Trace Using SQL Server Profiler to
                    Diagnose Performance and Deadlock Issues                                                            366
                    Create a Counter Log Using System Monitor to
                    Diagnose Performance, Deadlock, and System Issues                                                   366

           Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .366


Chapter 13 Optimizing Performance                                                                                     367
           Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .367

           Lesson 1: Using the Database Engine Tuning Advisor . . . . . . . . . . . . . . . .369

           Database Engine Tuning Advisor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .369
                    Lesson Summary                                                                                      375
                    Lesson Review                                                                                       375

           Lesson 2: Working with Resource Governor. . . . . . . . . . . . . . . . . . . . . . . . . 376

           Resource Governor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
                    Lesson Summary                                                                                      386
                    Lesson Review                                                                                       386

           Lesson 3: Using Dynamic Management Views and Functions . . . . . . . . .387

           DMV Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .387
                    Database Statistics                                                                                 388
                    Query Statistics                                                                                    389
                    Disk Subsystem Statistics                                                                           390
                    Hardware Resources                                                                                  391
                    Lesson Summary                                                                                      393
                    Lesson Review                                                                                       394

           Lesson 4: Working with the Performance Data Warehouse . . . . . . . . . . .395

           Performance Data Warehouse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .395
                    Lesson Summary                                                                                     400
                    Lesson Review                                                                                      400

Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .402
                                     Chapter Summary                                                                                     402
                                     Key Terms                                                                                           402
                                     Case Scenario                                                                                       403

                            Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .405
                                     Using the Performance Data Warehouse to Gather Data
                                     for Performance Optimization                                                                        405
                                     Using Database Engine Tuning Advisor to Gather Data
                                     for Performance Optimization                                                                        405
                                     Using Dynamic Management Views to Gather Data
                                     for Performance Optimization                                                                        405
                            Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .406

                Chapter 14 Failover Clustering                                                                                         407
                            Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .407
                            Lesson 1: Designing Windows Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
                                     Windows Cluster Components                                                                          410
                                     Types of Clusters                                                                                   412
                                     Security Configuration                                                                               413
                                     Disk Configuration                                                                                   413
                                     Network Configuration                                                                                414
                                     Cluster Resources                                                                                   415
                                     Cluster Groups                                                                                      416
                                     Lesson Summary                                                                                      428
                                     Lesson Review                                                                                       429
                            Lesson 2: Designing SQL Server 2008 Failover Cluster Instances . . . . . . .430
                                     Terminology                                                                                         431
                                     Failover Cluster Instance Components                                                                431
                                     Health Checks                                                                                       433
                                     Cluster Failover                                                                                    433
                                     Lesson Summary                                                                                     444
                                     Lesson Review                                                                                      444
                            Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .445
                                     Chapter Summary                                                                                     445
                                     Key Terms                                                                                           445
                                     Case Scenario                                                                                       445
Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .448
                    Windows Clustering                                                                                  448
                    SQL Server Failover Clustering                                                                      449
           Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .449

Chapter 15 Database Mirroring                                                                                          451
           Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451

           Lesson 1: Overview of Database Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . 452
                    Database Mirroring Roles                                                                             452
                    Principal Role                                                                                       453
                    Mirror Role                                                                                          453
                    Witness Server                                                                                       453
                    Database Mirroring Endpoints                                                                         454
                    Operating Modes                                                                                      455
                    Caching                                                                                              458
                    Transparent Client Redirect                                                                          458
                    Database Mirroring Threading                                                                         459
                    Database Snapshots                                                                                   459
                    Lesson Summary                                                                                       462
                    Lesson Review                                                                                        462

           Lesson 2: Initializing Database Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . .464
                    Recovery Model                                                                                       465
                    Backup and Restore                                                                                   465
                    Copy System Objects                                                                                 466
                    Lesson Summary                                                                                       469
                    Lesson Review                                                                                        469

           Lesson 3: Designing Failover and Failback Strategies . . . . . . . . . . . . . . . . . 471
                    Designing Mirroring Session Failover                                                                 471
                    Designing Mirroring Session Failback                                                                 472
                    Lesson Summary                                                                                       474
                    Lesson Review                                                                                        475

           Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
                    Chapter Summary                                                                                      476
                    Key Terms                                                                                            476
                    Case Scenario                                                                                        477
Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .480
                                      Establishing Database Mirroring                                                                        480
                                      Creating a Database Snapshot Against a Database
                                      Mirror                                                                                                 480

                             Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .481



                  Chapter 16 Log Shipping                                                                                                   483
                             Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .483

                             Lesson 1: Overview of Log Shipping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .484
                                      Log Shipping Scenarios                                                                                 484
                                      Log Shipping Components                                                                                485
                                      Types of Log Shipping                                                                                  487
                                      Lesson Summary                                                                                         487
                                      Lesson Review                                                                                          488

                             Lesson 2: Initializing Log Shipping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .489
                                      Log Shipping Initialization                                                                            489
                                      Lesson Summary                                                                                         499
                                      Lesson Review                                                                                          499

                             Lesson 3: Designing Failover and Failback Strategies . . . . . . . . . . . . . . . . . . . .500
                                      Log Shipping Failover                                                                                  501
                                      Log Shipping Failback                                                                                  502
                                      Lesson Summary                                                                                         505
                                      Lesson Review                                                                                          505

                             Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .506
                                      Chapter Summary                                                                                        506
                                      Key Terms                                                                                              506
                                      Case Scenario                                                                                          507

                             Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
                                      Initiating Log Shipping                                                                                 511
                                      Failover and Failback Log Shipping                                                                      512

                             Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512



Chapter 17 Replication                                                                                                 513
           Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
           Lesson 1: Overview of Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
                    Replication Components                                                                               514
                    Replication Roles                                                                                    515
                    Replication Topologies                                                                               516
                    Replication Agents                                                                                   517
                    Agent Profiles                                                                                        518
                    Replication Methods                                                                                  519
                    Data Conflicts                                                                                        521
                    Lesson Summary                                                                                       524
                    Lesson Review                                                                                        525
           Lesson 2: Transactional Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
                    Change Tracking                                                                                      526
                    Transactional Options                                                                                528
                    Transactional Architectures                                                                          530
                    Monitoring                                                                                           532
                    Validation                                                                                           532
                    Lesson Summary                                                                                       536
                    Lesson Review                                                                                        537
           Lesson 3: Merge Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .538
                    Change Tracking                                                                                      538
                    Validation                                                                                           541
                    Lesson Summary                                                                                       543
                    Lesson Review                                                                                        543
           Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .545
                    Chapter Summary                                                                                      545
                    Key Terms                                                                                            545
                    Case Scenario                                                                                       546
           Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 550
                    Transactional Replication                                                                            550
                    Merge Replication                                                                                    551
           Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552



Glossary   553
                  Answers    561
                  Index      599




Acknowledgements

Thank you to all my readers over the past decade or so; it's hard to believe that this will be
the seventh SQL Server book I've written, and it would not be possible without you. I'd
like to thank my editorial team at Microsoft Press, Denise Bankaitis and Laura Sackerman.
I would especially like to thank Ken Jones, who has gone through five books with me and has
proved to be an invaluable asset to Microsoft Press. Thank you to Rozanne Whalen, who has
now tech-edited three books for me. I don’t know how she does it, but Susan McClung’s word
wizardry has transformed my writing into the volume you hold in your hands. That all of this
content is coherent is a testament to the many hours of hard work put in by Rozanne, Susan,
and the rest of the editing team.




Introduction
This training kit is designed for information technology (IT) professionals who plan to take
the Microsoft Certified Technology Specialist (MCTS) Exam 70-432, as well as database
administrators (DBAs) who need to know how to implement, manage, and troubleshoot
Microsoft SQL Server 2008 instances. It’s assumed that before using this training kit, you
already have a working knowledge of Microsoft Windows and SQL Server 2008, and you have
experience with SQL Server or another database platform.
   By using this training kit, you learn how to do the following:
      Install and configure SQL Server 2008
      Create and implement database objects
      Implement high availability and disaster recovery
      Secure instances, databases, and database objects
      Monitor and troubleshoot SQL Server instances



Using the CD and DVD
A companion CD and an evaluation software DVD are included with this training kit.
The companion CD contains the following:
      Practice tests You can reinforce your understanding of how to implement and
      maintain SQL Server 2008 databases by using electronic practice tests that you can
      customize to meet your needs from the pool of Lesson Review questions in this book.
      Alternatively, you can practice for the 70-432 certification exam by using tests created
      from a pool of about 200 realistic exam questions, which will give you enough different
      practice tests to ensure that you’re prepared.
      Practice files Not all exercises incorporate code, but for each exercise that has code,
      you find one or more files in a folder for the corresponding chapter on the companion
      CD. You can either type the code from the book or open the corresponding code file in
      a query window.
      eBook   An electronic version (eBook) of this training kit is included for use at times
      when you don’t want to carry the printed book with you. The eBook is in Portable
      Document Format (PDF), and you can view it by using Adobe Acrobat or Adobe Reader.
      You can use the eBook to cut and paste code as you work through the exercises.
      Sample chapters   Sample chapters from other Microsoft Press titles on SQL Server
      2008. These chapters are in PDF format.



Evaluation software    The evaluation software DVD contains a 180-day evaluation
                             edition of SQL Server 2008 in case you want to use it instead of a full version of SQL
                             Server 2008 to complete the exercises in this book.

                        Digital Content for Digital Book Readers: If you bought a digital-only edition of this book, you can
                        enjoy select content from the print edition’s companion CD.
                        Visit http://go.microsoft.com/fwlink/?LinkId=139187 to get your downloadable content. This content
                        is always up-to-date and available to all readers.




                  How to Install the Practice Tests
                  To install the practice test software from the companion CD to your hard disk, perform the
                  following steps:
                        1.   Insert the companion CD into your CD-ROM drive and accept the license agreement
                             that appears onscreen. A CD menu appears.


                        NOTE   ALTERNATIVE INSTALLATION INSTRUCTIONS IF AUTORUN IS DISABLED
                        If the CD menu or the license agreement doesn’t appear, AutoRun might be disabled
                        on your computer. Refer to the Readme.txt file on the companion CD for alternative
                        installation instructions.


                        2.   Click Practice Tests and follow the instructions on the screen.


                  How to Use the Practice Tests
                  To start the practice test software, follow these steps:
                        1.   Click Start and select All Programs, Microsoft Press Training Kit Exam Prep. A window
                             appears that shows all the Microsoft Press training kit exam prep suites that are installed
                             on your computer.
                        2.   Double-click the lesson review or practice test that you want to use.

                  Lesson Review Options
                  When you start a lesson review, the Custom Mode dialog box appears, enabling you to
                  configure your test. You can click OK to accept the defaults, or you can customize the number
                  of questions you want, the way the practice test software works, which exam objectives you
                  want the questions to relate to, and whether you want your lesson review to be timed. If you
                  are retaking a test, you can select whether you want to see all the questions again or only
                  those questions you previously skipped or answered incorrectly.




After you click OK, your lesson review starts. You can take the test by performing the
following steps:
  1.   Answer the questions and use the Next, Previous, and Go To buttons to move from
       question to question.
  2.   After you answer an individual question, if you want to see which answers are correct,
       along with an explanation of each correct answer, click Explanation.
  3.   If you would rather wait until the end of the test to see how you did, answer all the
       questions and then click Score Test. You see a summary of the exam objectives that
       you chose and the percentage of questions you got right overall and per objective.
       You can print a copy of your test, review your answers, or retake the test.

Practice Test Options
When you start a practice test, you can choose whether to take the test in Certification Mode,
Study Mode, or Custom Mode.
       Certification Mode     Closely resembles the experience of taking a certification exam.
       The test has a set number of questions, it is timed, and you cannot pause and restart
       the timer.
       Study Mode    Creates an untimed test in which you can review the correct answers
       and the explanations after you answer each question.
       Custom Mode      Gives you full control over the test options so that you can customize
       them as you like.
   In all modes, the user interface that you see when taking the test is basically the same,
but different options are enabled or disabled, depending on the mode. The main options are
discussed in the previous section, “Lesson Review Options.”
   When you review your answer to an individual practice test question, a “References”
section is provided. This section lists the location in the training kit where you can find the
information that relates to that question, and it provides links to other sources of information.
After you click Test Results to score your entire practice test, you can click the Learning Plan
tab to see a list of references for every objective.


How to Uninstall the Practice Tests
To uninstall the practice test software for a training kit, use the Add Or Remove Programs
option (Windows XP or Windows Server 2003) or the Programs And Features option (Windows
Vista or Windows Server 2008) in Control Panel.




Microsoft Certified Professional Program
               Microsoft certifications provide the best method to prove your command of current Microsoft
               products and technologies. The exams and corresponding certifications are developed to
               validate your mastery of critical competencies as you design and develop or implement and
               support solutions with Microsoft products and technologies. Computer professionals who
               become Microsoft-certified are recognized as experts and are sought after industry-wide.
               Certification brings a variety of benefits to the individual and to employers and organizations.


                     MORE INFO       LIST OF MICROSOFT CERTIFICATIONS
                     For a full list of Microsoft certifications, go to http://www.microsoft.com/learning/mcp/
                     default.mspx.




               Technical Support
               Every effort has been made to ensure the accuracy of this book and the contents of the
               companion CD. If you have comments, questions, or ideas regarding this book or the
               companion CD, please send them to Microsoft Press by using either of the following methods:
                     E-mail
                     • tkinput@microsoft.com

                      Postal Mail:
                     • Microsoft Press
                      Attn: MCTS Self-Paced Training Kit (Exam 70-432): Microsoft SQL Server 2008 Implementation
                      and Maintenance Editor
                      One Microsoft Way
                      Redmond, WA, 98052-6399

                   For additional support information regarding this book and the companion CD (including
               answers to commonly asked questions about installation and use), visit the Microsoft Press Technical
               Support Web site at http://www.microsoft.com/learning/support/books. To connect directly to the
               Microsoft Knowledge Base and enter a query, visit http://support.microsoft.com/search. For support
               information regarding Microsoft software, please connect to http://support.microsoft.com.




Evaluation Edition Software
The 180-day evaluation edition provided with this training kit is not the full retail product
and is provided only for the purposes of training and evaluation. Microsoft and Microsoft
Technical Support do not support this evaluation edition.
    Information about any issues relating to the use of this evaluation edition
with this training kit is posted in the Support section of the Microsoft Press Web site
(http://www.microsoft.com/learning/support/books/ ). For information about ordering the
full version of any Microsoft software, please call Microsoft Sales at (800) 426-9400 or
visit http://www.microsoft.com.




CHAPTER 1


Installing and Configuring
SQL Server 2008
This chapter will prepare you to install Microsoft SQL Server instances. You will learn
about the capabilities of each SQL Server edition as well as the hardware requirements
to install SQL Server. At the end of this chapter, you will be able to configure services and
SQL Server components. You will also learn how to configure Database Mail, which will be
used for a variety of notification tasks.


Exam objectives in this chapter:
    Install SQL Server 2008 and related services.
    Configure SQL Server instances.
    Configure SQL Server services.
    Configure additional SQL Server components.
    Implement Database Mail.

Lessons in this chapter:
    Lesson 1: Determining Hardware and Software Requirements         3

    Lesson 2: Selecting SQL Server Editions 8
    Lesson 3: Installing and Configuring SQL Server Instances    17

    Lesson 4: Configuring Database Mail      28



Before You Begin
To complete the lessons in this chapter, you must have both of the following:
       A machine that meets or exceeds the minimum hardware and software requirements
       as outlined in Lesson 1
       SQL Server 2008 installation media




REAL WORLD
                Michael Hotek



                SQL Server 2008 is not simply a database, but is instead a complete database
                    platform consisting of numerous services and hundreds of capabilities. All too
                frequently, organizations simply “point and click” to install SQL Server and then
                start loading data. Prior to installing, you need to determine how the SQL Server
                computer is going to be used, as well as the hardware resources required.

                Not too long ago, I was working with a company that just installed servers running
                SQL Server and depended upon being able to change configurations as they went.
                Unfortunately, no one did the homework for a new application the company was
                deploying. SQL Server was installed, and the DBA team deployed the database
                structure and started to load data. Suddenly, the load procedures aborted and the
                database was no longer accessible. They had undersized the disk drives and had
                run out of space during the load process. After they allocated more disk space
                and started the load process again, they encountered another error, which made
                SQL Server unavailable. Although they had allocated additional disk space to the
                database, Tempdb had now run out of space. After multiple retries, they finally got
                the data loaded, only to find out that the design specifications called for replication,
                service broker, and CLR capabilities.

                After installing replication support and configuring service broker and the CLR
                routines, the system went into production, 16 days behind schedule. In less than
                one day, all the users were complaining about slow response times. The DBA team
                planned to have only 20 concurrent users in the application, the maximum number
                they had ever seen before; yet more than 2,000 people were trying to use the new
                application. The single processor machine with 2 GB of RAM was insufficient to
                handle 2,000 concurrent users attempting to access more than 400 GB of data.

                After taking the application off-line, buying new hardware, and redeploying the
                system, the new application went back online, 43 days behind their scheduled date.
                Most of the users had moved on to other systems deployed by competitors. The
                company wasted millions of dollars of holiday advertising due to lack of planning at
                both the installation and deployment stages.




Lesson 1: Determining Hardware and Software
Requirements
SQL Server 2008 has very minimal hardware and software requirements. This lesson explains
the minimum hardware requirements along with operating system versions and additional
software necessary to run SQL Server 2008 instances.


   IMPORTANT   MINIMUM HARDWARE REQUIREMENTS
   This lesson outlines the minimum requirements for installing SQL Server. Production
   systems usually require significantly more hardware to meet performance and capacity
   expectations. You need to apply the knowledge from subsequent chapters in this book to
   help you determine the memory, disk storage, and processor requirements that may be
   required by a given application.




      After this lesson, you will be able to:
          Verify minimum hardware requirements
          Verify operating system support
          Verify additional software required

      Estimated lesson time: 20 minutes



Minimum Hardware Requirements
SQL Server 2005 had a variety of requirements that depended upon the edition of SQL Server
as well as whether it was a 32-bit or 64-bit version. SQL Server 2008 simplifies the minimum
hardware requirements for a SQL Server instance.
   The minimum hardware requirements are listed in Table 1-1.

TABLE 1-1 Hardware Requirements

 REQUIREMENT               32-BIT                        64-BIT

 Processor                 Pentium III or higher         Itanium, Opteron, Athlon, or Xeon/
                                                         Pentium with EM64T support
 Processor Speed           1.0 gigahertz (GHz) or        1.6 GHz or higher
                           higher
 Memory                    512 megabytes (MB)            512 MB

  The amount of disk space consumed by the installation depends upon the services
and utilities that are installed. To determine the amount of disk space required, please
refer to the SQL Server Books Online article, “Hardware and Software Requirements for
         Installing SQL Server 2008,” at http://technet.microsoft.com/en-us/library/ms143506.aspx.

   IMPORTANT   ADDITIONAL HARDWARE COMPONENTS
            SQL Server Books Online lists a mouse, CD/DVD drive, and monitor with at least 1024 x 768
            resolution as requirements for installation. However, it is possible to install SQL Server to a
            computer that does not have any of these devices attached, which is very common within
            a server environment. A CD/DVD drive is required only if you are installing from a disk.
            A monitor is required only if you are using the graphical tools.



         Supported Operating Systems
         SQL Server 2008 is supported on 32-bit and 64-bit versions of Microsoft Windows. The 64-bit
         version of SQL Server can install only to a 64-bit version of Windows. The 32-bit version of
         SQL Server can be installed to either a 32-bit version of Windows or to a 64-bit version of
Windows with Windows on Windows 64 (WOW64) enabled.
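   If you need to confirm which platform an instance ended up on after installation, a quick
query works on any edition; this is a minimal sketch that assumes nothing beyond an open
query window:

-- Edition includes the platform, for example "Standard Edition (64-bit)".
SELECT SERVERPROPERTY('Edition') AS Edition,
       SERVERPROPERTY('ProductVersion') AS ProductVersion;
-- The @@VERSION banner also reports the operating system and platform details.
SELECT @@VERSION AS VersionBanner;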
            The operating systems supported for all editions of SQL Server are:
                Windows Server 2008 Standard or higher
                Windows Server 2003 Standard SP2 or higher
            The operating systems supported for SQL Server Developer, Evaluation, and Express are:
                Windows XP Professional SP2 or higher
                Windows Vista Home Basic or higher
            SQL Server Express is also supported on:
                Windows XP Home Edition SP2 or higher
                Windows XP Home Reduced Media Edition
                Windows XP Tablet Edition SP2 or higher
                Windows XP Media Center 2002 SP2 or higher
                Windows XP Professional Reduced Media Edition
                Windows XP Professional Embedded Edition Feature Pack 2007 SP2
                Windows XP Professional Embedded Edition for Point of Service SP2
                Windows Server 2003 Small Business Server Standard Edition R2 or higher


            EXAM TIP
            SQL Server 2008 is not supported on Windows Server 2008 Server Core. Windows Server
            2008 Server Core is not supported because the .NET Framework is not supported on Server
            Core. SQL Server 2008 relies on .NET Framework capabilities to support FILESTREAM,
            SPATIAL, and DATE data types, along with several additional features.




Software Requirements
SQL Server 2008 requires .NET Framework 3.5. Although the installation routine installs the
required versions of the .NET Framework, you need to have Windows Installer 4.5 on the
computer prior to the installation of SQL Server.


   IMPORTANT   .NET FRAMEWORK
   .NET Framework 2.0 includes Windows Installer 3.1, so if you have .NET Framework 2.0
   already installed, you meet the minimum requirements. However, to minimize the amount
   of time required for installation, it is recommended that you install all versions of the .NET
   Framework through version 3.5 on the machine prior to installing SQL Server.


   The SQL Server setup routine also requires:
       Microsoft Data Access Components (MDAC) 2.8 SP1 or higher
       Shared Memory, Named Pipes, or TCP/IP networking support
       Internet Explorer 6 SP1 or higher


        Quick Check
        1. What edition of Windows Server 2008 is not supported for SQL Server 2008
          installations?

       2. Which operating systems are supported for all editions of SQL Server?

       Quick Check Answers
        1. Windows Server 2008 Server Core is not supported for SQL Server 2008
          installations.

        2. Windows Server 2003 Standard SP2 or higher, and Windows Server 2008 Standard
           or higher.




 PRACTICE   Verify Minimum Requirements

In the following practices, you verify that your machine meets the minimum hardware,
operating system, and supporting software requirements for a SQL Server installation.

PRACTICE 1   Verify Hardware and Operating System Requirements
In this practice, you verify that your computer meets the minimum hardware and operating
system requirements to install SQL Server 2008.
  1.   Click Start, right-click My Computer, and select Properties.
  2.   On the General tab under System, verify that your operating system meets the
       minimum requirements.


3.   On the General tab under Computer, verify that your computer meets the minimum
                 hardware requirements.

PRACTICE 2   Verify Supporting Software Requirements
         In this practice, you verify that you have the appropriate supporting software installed.
           1.    Click Start, and then select Control Panel.
           2.    Double-click Add/Remove Programs.
            3.   Verify that you have the minimum versions of Windows Internet Explorer and the .NET
                 Framework installed by performing the following steps:
                 a. Click Start, and then select Run.
                 b. Enter regedit in the text box.
                  c.   When the Registry Editor opens, browse through the navigation pane to
                       HKEY_LOCAL_MACHINE\Software\Microsoft\DataAccess.
           4.    Verify the MDAC version in the FullInstallVer key.
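
           If you prefer the command line, the same value can be read without opening the
           Registry Editor; this one-liner is a convenience check, not a step from the book:

           rem Displays the MDAC version stored in the FullInstallVer value.
           reg query "HKLM\Software\Microsoft\DataAccess" /v FullInstallVer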


         Lesson Summary
                 SQL Server 2008 is supported on both 32-bit and 64-bit operating systems.
                 You can install all editions of SQL Server 2008 on either Windows Server 2003 Standard
                 Edition SP2 and higher or Windows Server 2008 Standard and higher.
                 You cannot install SQL Server 2008 on Windows Server 2008 Server Core.


         Lesson Review
         The following questions are intended to reinforce key information presented in Lesson 1,
         “Determining Hardware and Software Requirements.” The questions are also available on the
         companion CD if you prefer to review them in electronic form.


   NOTE   ANSWERS
            Answers to these questions and explanations of why each answer choice is right or wrong
            are located in the “Answers” section at the end of the book.


           1.    You are deploying a new server within Wide World Importers that will be running
                 a SQL Server 2008 instance in support of a new application. Because of the feature
                 support that is needed, you will be installing SQL Server 2008 Enterprise. Which
                 operating systems will support your installation? (Choose all that apply.)
                 A. Windows 2000 Server Enterprise SP4 or higher
                 B. Windows Server 2003 Enterprise
                 C. Windows Server 2003 Enterprise SP2
                 D. Windows Server 2008 Enterprise


2.   You are deploying SQL Server 2008 Express in support of a new Web-based application
     that will enable customers to order directly from Coho Vineyards. Which operating
     system does NOT support your installation?
     A. Windows XP Home Edition SP2
     B. Windows Server 2008 Server Core
     C. Windows Server 2003 Enterprise SP2
     D. Windows XP Tablet Edition SP2




Lesson 2: Selecting SQL Server Editions
SQL Server 2008 is available in several editions, ranging from editions designed for mobile,
embedded applications with a very small footprint to editions designed to handle petabytes
of data being manipulated by millions of concurrent users. This lesson explains the services
         available within the SQL Server 2008 database platform and the differences between the SQL
         Server editions.


                After this lesson, you will be able to:
                   Understand the differences between SQL Server 2008 Enterprise, Workgroup,
                   Standard, and Express
                   Understand the role of each service that ships within the SQL Server 2008 data
                   platform

                Estimated lesson time: 20 minutes



         SQL Server Services
         SQL Server 2008 is much more than a simple database used to store data. Within the SQL
         Server 2008 data platform are several services that can be used to build any conceivable
         application within an organization.
   Within the core database engine, you will find services to store, manipulate, back up, and
         restore data. The core database engine also contains advanced security capabilities to protect
         your investments, along with services to ensure maximum availability. Your data infrastructure
         can be extended to handle unstructured text along with synchronizing multiple copies of a
         database. Many of these capabilities are discussed in subsequent chapters in this book.

         Service Broker
         Service Broker was introduced in SQL Server 2005 to provide a message queuing system
         integrated into the SQL Server data platform. Based on user-defined messages and processing
actions, you can use Service Broker to provide asynchronous data processing capabilities. Not
only is Service Broker a capable message queuing system, it also supports advanced business
process orchestration, handling data processing across a myriad of platforms without requiring
the user to wait for the process to complete or affecting the user in any other way.
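   As a rough sketch of the programming model, the following T-SQL builds a one-way
conversation and sends a message; the message type, contract, queue, and service names are
illustrative placeholders, not objects from this book:

-- Assumes Service Broker is enabled in the current database (new databases
-- enable it by default; otherwise use ALTER DATABASE ... SET ENABLE_BROKER).
CREATE MESSAGE TYPE [//Demo/Request] VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT [//Demo/Contract] ([//Demo/Request] SENT BY INITIATOR);
CREATE QUEUE InitiatorQueue;
CREATE QUEUE RequestQueue;
CREATE SERVICE [//Demo/InitiatorService] ON QUEUE InitiatorQueue ([//Demo/Contract]);
CREATE SERVICE [//Demo/RequestService] ON QUEUE RequestQueue ([//Demo/Contract]);
GO
-- Send a message; the sender returns immediately rather than waiting
-- for the message to be processed.
DECLARE @handle UNIQUEIDENTIFIER;
BEGIN DIALOG CONVERSATION @handle
    FROM SERVICE [//Demo/InitiatorService]
    TO SERVICE N'//Demo/RequestService'
    ON CONTRACT [//Demo/Contract]
    WITH ENCRYPTION = OFF;
SEND ON CONVERSATION @handle
    MESSAGE TYPE [//Demo/Request] (N'<Order id="1" />');

   A background activation procedure, or any later session, can then RECEIVE from
RequestQueue on its own schedule, which is what makes the processing asynchronous.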

         SQL Server Integration Services
         SQL Server Integration Services (SSIS) features all the enterprise class capabilities that you can
         find in Extract, Transform, and Load (ETL) applications while also allowing organizations to
         build applications that can manage databases and system resources, respond to database and
         system events, and even interact with users.


SSIS has a variety of tasks to enable packages to upload or download files from File Transfer
Protocol (FTP) sites, manipulate files in directories, import files into databases, or export
data to files. SSIS can also execute applications, interact with Web services, send and receive
messages from Microsoft Message Queue (MSMQ), and respond to Windows Management
Instrumentation (WMI) events. Containers allow SSIS to execute entire tasks and workflows
within a loop with a variety of inputs from a simple counter to files in a directory or across
the results of a query. Specialized tasks are included to copy SQL Server objects around
an environment as well as manage database backups, re-indexing, and other maintenance
operations. If SSIS does not ship with a task already designed to meet your needs, you can
write your own processes using the Visual Studio Tools for Applications or even design your
own custom tasks that can be registered and utilized within SSIS.
   Precedence constraints allow you to configure the most complicated operational
workflows, where processing can be routed based on whether a component succeeds, fails, or
simply completes execution. In addition to the static routing based on completion status, you
can combine expressions to make workflow paths conditional. Event handlers allow you to
execute entire workflows in response to events that occur at a package or task level, such as
automatically executing a workflow to move a file to a directory when it cannot be processed,
log the details of the error, and send an e-mail to an administrator.
   Package configurations enable developers to expose internal properties of a package such
that the properties can be modified for the various environments in which a package will
be executing. By exposing properties in a configuration, administrators have a simple way
of reconfiguring a package, such as changing database server names or directories, without
needing to edit the package.
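   For instance, an administrator can run a package and override an exposed property at
execution time with the dtexec utility; in this hedged sketch, the package path and variable
name are hypothetical:

rem Run the package, overriding a variable that a configuration exposes.
dtexec /F "C:\Packages\LoadSales.dtsx" /SET \Package.Variables[User::SourceServer].Value;PROD01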
    Beyond the workflow tasks, SSIS ships with extensive data movement and manipulation
components. Although it is possible for you to simply move data from one location to another
within a data flow task, you can also apply a wide variety of operations to the data as it moves
through the engine. You can scrub invalid data, perform extensive calculations, and convert
data types as the data moves through a pipeline. Inbound data flows can be split to multiple
destinations based on a condition. The data flow task has the capability to perform data
lookups against sources to either validate inbound data or include additional information as
the data is sent to a destination. Fuzzy lookups and fuzzy grouping can be applied to allow
very flexible matching and grouping capabilities beyond simple wildcards. Multiple inbound
data flows can be combined to be sent to a single destination. Just as multiple inbound flows
can be combined, you can also take a single data flow and broadcast to multiple destinations.
Within an SSIS data flow task, you can also remap characters, pivot or unpivot data sets,
calculate aggregates, sort data, perform data sampling, and perform text mining. If SSIS
does not have a data adapter capable of handling the format of your data source or data
destination or does not have a transform capable of the logic that you need to perform, a
script component is included that allows you to bring the entire capabilities of Visual Studio
Tools for Applications to bear on your data.




SQL Server Reporting Services
          Organizations of all sizes need to have access to the vast quantities of data stored throughout
          the enterprise in a consistent and standardized manner. Although it would be nice to expect
          everyone to know how to write queries against data sources to obtain the data that is
          needed or to have developers available to write user interfaces for all the data needs, most
          organizations do not have the resources. Therefore, tools need to be available to create
          standardized reports that are made available throughout the organization, as well as providing
          the ability for users to build reports on an ad hoc basis.
             SQL Server Reporting Services (SSRS) fills the data delivery gap by providing a flexible
          platform for designing reports as well as distributing data throughout an organization. The IT
          department can build complex reports rapidly, which are deployed to one or more portals that
          can be accessed based on flexible security rules. The IT department can also design and publish
          report models that allow users to build their own reports without needing to understand the
          underlying complexities of a database. Reports built by IT as well as by users can be deployed
          to a centralized reporting portal that allows members of the organization to access the
          information they need to do their jobs.
              Users can access reports that are either generated on the fly or displayed from
           cached data that is refreshed on a schedule. Users can also configure subscriptions to
           a report, which allow SSRS to set up a schedule to execute the report and then send it
           to users on their preferred distribution channel, formatted to their specifications. For
           example, a sales manager can create a subscription to a daily sales report such that the
           report is generated at midnight after all sales activity is completed, rendered as a
           Portable Document Format (PDF) file, and dropped in his e-mail inbox for review in
           the morning.
              SSRS ships with two main components: a report server and a report designer.
              The report server is responsible for hosting all the reports and applying security. When
          reports are requested, the report server is responsible for connecting to the underlying data
          sources, gathering data, and rendering the report into the final output. Rendering a report
          is accomplished either on demand from a user or through a scheduled task which allows the
          report to be run during off-peak hours.
             For the report server to have anything to deliver to users, reports must first be created. The
          report designer is responsible for all the activities involved in creating and debugging reports.
          Components are included that allow users to create both simple tabular or matrix reports
          and more complex reports with multiple levels of subreports, nested reports, charts, linked
          reports, and links to external resources. Within your reports, you can embed calculations and
          functions, combine tables, and even vary the report output based on the user accessing the
          report. The report designer is also responsible for designing reporting models that provide
          a powerful semantic layer which masks the complexities of a data source from users so that
          they can focus on building reports.




SQL Server Analysis Services
As the volume of data within an organization explodes, you need to deploy tools that allow
users to make business decisions on a near-real-time basis. Users can no longer wait for IT
to design reports for the hundreds of questions that might be asked by a single user. At the
same time, IT does not have the resources to provide the hundreds of reports necessary to
allow people to manage a business.
   SQL Server Analysis Services (SSAS) was created to fill the gap between the data needs
of business users and the ability of IT to provide data. SSAS encompasses two components:
Online Analytical Processing (OLAP) and Data Mining.
    The OLAP engine allows you to deploy, query, and manage cubes that have been designed
in Business Intelligence Development Studio (BIDS). You can include multiple dimensions and
multiple hierarchies within a dimension, and choose a variety of options such as which attributes
are available for display and how members are sorted. Measures can be designed as simple
additive elements or can employ complex, user-defined aggregation schemes. Key
Performance Indicators (KPIs) can be added to provide visual cues for users on the state
of a business entity. Cubes can contain perspectives which define a subset of data within a single
cube to simplify viewing. The built-in metadata layer allows you to specify language translations
at any level within a cube so that users can browse data in their native language.
   The Data Mining engine extends business analysis to allow users to find patterns and make
predictions. Utilizing any one of the several mining algorithms that ship with SQL Server, businesses
can examine data trends over time, determine what factors influence buying decisions, or even
reconfigure a shopping experience based on buying patterns to maximize the potential of a sale.


   MORE INFO      SQL SERVER SERVICES
   For a detailed discussion of each feature available within the SQL Server 2008 data
   platform, please refer to the book Microsoft SQL Server 2008 Step by Step (Microsoft Press,
   2008), which provides overview chapters on every SQL Server 2008 feature.



SQL Server Editions
SQL Server 2008 is available in the following editions:
       Enterprise Designed for the largest organizations and those needing to leverage the
       full power of the SQL Server 2008 platform.
       Standard     Designed for small and midsized organizations that do not need all the
       capabilities available in SQL Server 2008 Enterprise.
       Workgroup      Suitable for small departmental projects with a limited set of features.
       Express   A freely redistributable version of SQL Server that is designed to handle
       the needs of embedded applications as well as the basic data storage needs for
       server-based applications, such as Web applications with a small number of users.




Compact      Designed as an embedded database.
                  Developer    Designed for use by developers in creating SQL Server applications.
                  SQL Server 2008 Developer has all the features and capabilities of SQL Server 2008
                  Enterprise, except that it is not licensed for use in a production environment.
                  Evaluation Designed to allow organizations to evaluate SQL Server 2008. SQL Server
                  2008 Evaluation has all the features and capabilities of SQL Server 2008 Enterprise,
                  except that it is not licensed for use in a production environment and it expires
                  after 180 days.
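
   If you want to confirm which edition and build a given instance is running, you can
query the built-in SERVERPROPERTY function. The following is a minimal sketch; the
property names are standard, and the values returned vary by installation:

   SELECT SERVERPROPERTY('Edition')        AS Edition,        -- e.g., Standard Edition
          SERVERPROPERTY('ProductVersion') AS ProductVersion, -- e.g., 10.0.1600.22
          SERVERPROPERTY('ProductLevel')   AS ProductLevel;   -- e.g., RTM, SP1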


              NOTE   SQL SERVER EDITIONS
             The Developer Edition of SQL Server is designed for developers to create new SQL Server
             applications. The Evaluation Edition of SQL Server is designed to allow organizations to
             evaluate the features available in SQL Server. Both the Developer and Evaluation editions
             contain the same functionality as the Enterprise Edition of SQL Server; the only exception
             being that the Developer and Evaluation editions are not licensed to run in a production
             environment. For the purposes of this book, we will discuss only the editions that can be
             deployed into production environments: Express, Workgroup, Standard, and Enterprise.


             The main differences between the SQL Server editions are in the hardware and feature set
          that is supported. The tables below provide a basic overview of the differences between the
          editions in the various areas.

          TABLE 1-2 Hardware Support

            HARDWARE               STANDARD            WORKGROUP          EXPRESS        COMPACT

            # of CPUs              4                   4                  1              1
            Database size          Unlimited           Unlimited          4 GB           4 GB
            RAM                    Unlimited           Unlimited          1 GB           1 GB


          TABLE 1-3 Database Engine Support

            FEATURE                       STANDARD            WORKGROUP       EXPRESS        COMPACT

            SQL Server Management         Yes                 Yes             Separate       No
            Studio                                                            download
            Full Text Search              Yes                 Yes             Advanced       No
                                                                              services
                                                                              only
            Partitioning                  No                  No              No             No
            Parallel Operations           No                  No              No             No




TABLE 1-3 Database Engine Support (continued)

 FEATURE                       STANDARD        WORKGROUP           EXPRESS          COMPACT

 Multiple Instances            No              No                  No               No
 Database Snapshots            No              No                  No               No
 Scalable Shared               No              No                  No               No
 Databases
 Indexed Views                 No              No                  No               No
 Log Compression               No              No                  No               No
 Clustering                    2 nodes         No                  No               No
 Database Mirroring            Single-thread   No                  No               No
 Online Operations             No              No                  No               No
 Resource Governor             No              No                  No               No
 Backup Compression            No              No                  No               No
 Hot Add Memory/CPU            No              No                  No               No
 Data Encryption               Limited         Limited             Limited          Password-
                                                                                    based only
 Change Data Capture           No              No                  No               No
 Data Compression              No              No                  No               No
 Policy-Based                  Yes             No                  No               No
 Management
 Performance Data              Yes             No                  No               No
 Collection
 CLR                           Yes             Yes                 Yes              No
 XML                           Native          Native              Native           Stored as
                                                                                    text
 Spatial data                  Yes             Yes                 Yes              No
 Stored procedures,            Yes             Yes                 Yes              No
 triggers, and views
 Merge Replication             Yes             Yes                 Subscriber       Subscriber
                                                                   only             only
 Transactional Replication     Yes             Subscriber          Subscriber       No
                                               only                only




TABLE 1-4 SSIS Support

            FEATURE                        STANDARD          WORKGROUP         EXPRESS        COMPACT

            Import/Export Wizard           Yes               Yes               N/A            N/A
            Package Designer               Yes               Yes               N/A            N/A
            Data Mining                    No                No                N/A            N/A
            Fuzzy grouping/lookup          No                No                N/A            N/A
            Term extraction/lookup         No                No                N/A            N/A
            OLAP processing                No                No                N/A            N/A


          TABLE 1-5 SSRS Support

            FEATURE                            STANDARD       WORKGROUP        EXPRESS              COMPACT

            Microsoft Office Integration        Yes            Yes              Advanced             N/A
                                                                               services only
            Report Builder                     Yes            Yes              Advanced             N/A
                                                                               services only
            Scale-out reporting                No             No               No                   N/A
            Data-driven subscriptions          No             No               No                   N/A


          TABLE 1-6 SSAS—OLAP Support

            FEATURE                              STANDARD          WORKGROUP        EXPRESS     COMPACT

            Linked measures/dimensions           No                No               N/A         N/A
            Perspectives                         No                No               N/A         N/A
            Partitioned cubes                    No                No               N/A         N/A


          TABLE 1-7 SSAS—Data Mining Support

            FEATURE                                   STANDARD          WORKGROUP    EXPRESS        COMPACT

            Time series                               No                No           N/A            N/A
            Parallel processing and prediction        No                No           N/A            N/A
            Advanced mining algorithms                No                No           N/A            N/A


             EXAM TIP
             For the exam, you need to understand the basic design goals for each edition of SQL
             Server. You also need to know the feature set, memory, and processor support differences
             between the editions.




       Quick Check
        1.  Which editions support the entire feature set available within the SQL Server data
            platform? Of these editions, which editions are not licensed for production use?

       2. Which editions of SQL Server are designed as storage engines for embedded
          applications with limited hardware and feature support?

       Quick Check Answers
        1.  Enterprise, Developer, and Evaluation editions have the entire set of features
            available within the SQL Server 2008 data platform. Developer and Evaluation
            editions are not licensed for use in a production environment.

       2. Express and Compact editions are designed as storage engines for embedded
          applications and support only a single CPU, up to 1 GB of RAM, and a maximum
          database size of 4 GB.




Lesson Summary
       SQL Server 2008 is available in Enterprise, Standard, Workgroup, Express, and Compact
       editions for use in a production environment.
       In addition to the core database engine technologies, SQL Server 2008 Enterprise
       supports Service Broker for asynchronous processing.


Lesson Review
The following questions are intended to reinforce key information presented in Lesson 2,
“Selecting SQL Server Editions.” The questions are also available on the companion CD if you
prefer to review them in electronic form.

   NOTE   ANSWERS
   Answers to these questions and explanations of why each answer choice is right or wrong
   are located in the “Answers” section at the end of the book.

  1.   Margie’s Travel is opening a new division to offer online travel bookings to their
       customers. Managers expect the traffic volume to increase rapidly, to the point where
       hundreds of users will be browsing offerings and booking travel at any given time.
       Management would also like to synchronize multiple copies of the database of travel
       bookings to support both online and face-to-face operations. Which editions of SQL
       Server 2008 would be appropriate for Margie’s Travel to deploy for their new online
       presence? (Choose all that apply.)
       A. Express
       B. Standard
       C. Enterprise
       D. Compact
2.   Margie’s Travel decided to minimize the cost and deploy SQL Server 2008 Standard to
                 support the new online division. After a successful launch, managers are having a hard
                 time managing business operations and need to deploy advanced analytics. A new
                 server running SQL Server will be installed. Which edition of SQL Server needs to be
                 installed on the new server to support the necessary data analytics?
                 A. SQL Server 2008 Standard
                 B. SQL Server 2008 Express with Advanced Services
                 C. SQL Server 2008 Workgroup
                 D. SQL Server 2008 Enterprise




Lesson 3: Installing and Configuring
SQL Server Instances
In this lesson, you learn how to create and configure service accounts that will be used to run
the SQL Server services that you choose to install. You also learn about the authentication
mode and collation settings that are specified when an instance is installed. Finally, you learn
how to configure and manage SQL Server services following installation.


       After this lesson, you will be able to:
          Create service accounts
           Install a SQL Server 2008 instance
          Understand collation sequences
          Understand authentication modes
          Install sample databases
          Configure a SQL Server instance

      Estimated lesson time: 40 minutes



Service Accounts
All the core SQL Server components run as services. To configure each component properly,
you need to create several service accounts prior to installation. You need dedicated service
accounts for the following components:
       Database Engine
       SQL Server Agent
   The service account that is utilized for each SQL Server service not only allows SQL Server
to provide data and scheduling services to applications but also defines a security boundary.
The SQL Server engine requires access to many resources on a computer, such as memory,
processors, disk space, and networking. However, the SQL Server service is still running within
the security framework provided by Windows. SQL Server is able to access only the Windows
resources for which the service account has been granted permissions.


   NOTE   SQL SERVER SECURITY
   SQL Server security is discussed in more detail in Chapter 11, “Designing SQL Server
   Security.”




              NOTE   OPERATING SYSTEM VERSION
             I am using Windows XP Professional SP2 for all exercises in this book. You will need to
             make appropriate adjustments for the Windows version that you are using. In addition, if
             your machine is a member of a domain, your service accounts should be domain accounts,
             not local accounts, when installing SQL Server 2008 in any operational environment.



          Collation Sequences
          Collation sequences control how SQL Server treats character data for storage, retrieval, sorting,
          and comparison operations. SQL Server 2008 allows you to specify a collation sequence to
          support any language currently used around the world.
             Collation sequences can be specified at the instance, database, table, and column levels.
           The only mandatory collation sequence is the one defined at the instance level; it serves
           as the default for all other levels unless it is specifically overridden.
               A collation sequence defines the character set that is supported, including case sensitivity,
           accent sensitivity, and kana sensitivity. For example, if you use the collation sequence
           SQL_Latin1_General_CP1_CI_AI, you get support for a Western European character set that is
           case-insensitive and accent-insensitive. SQL_Latin1_General_CP1_CI_AI treats e, E, è, é, ê, and ë
           as the same character for sorting and comparison operations, whereas a case-sensitive (CS),
           accent-sensitive (AS) French collation sequence treats each as a different character.


              NOTE   COLLATION SEQUENCES
             You will learn more about collation sequences in Chapter 2, “Database Configuration and
             Maintenance.”



          Authentication Modes
          One of the instance configuration options you need to set during installation is the authentication
          mode that SQL Server uses to control the types of logins allowed. You can set the authentication
          mode for SQL Server to either:
                 Windows Only (integrated security)
                 Windows and SQL Server (mixed mode)
             When SQL Server is configured with Windows-only authentication, you can use only
          Windows accounts to log in to the SQL Server instance. When SQL Server is configured in
          mixed mode, you can use either Windows accounts or SQL Server–created accounts to log in
          to the SQL Server instance.
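
              After installation, you can verify which authentication mode an instance is running
           under; one way, sketched here, is through the SERVERPROPERTY function:

              -- Returns 1 for Windows Only (integrated security), 0 for mixed mode
              SELECT SERVERPROPERTY('IsIntegratedSecurityOnly') AS WindowsOnlyAuthentication;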


              NOTE   LOGINS
             Logins are discussed in more detail in Chapter 11.


SQL Server Instances
SQL Server instances define the container for all operations you perform within SQL Server.
Each instance contains its own set of databases, security credentials, configuration settings,
Windows services, and other SQL Server objects.
   SQL Server 2008 supports the installation of up to 50 instances of SQL Server on a single
machine. You can install one instance as the default instance along with up to 49 additional
named instances, or you can install 50 named instances with no default.
    When you connect to a default instance of SQL Server, you use the name of the machine to
which the instance is installed. When connecting to a named instance, you use the combination
of the machine name and instance name, such as <machinename>\<instancename>.
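
   Once connected, you can confirm which instance you are working with. As a quick sketch,
SERVERPROPERTY('InstanceName') returns NULL when you are connected to a default instance
and returns the instance name when you are connected to a named instance:

   SELECT SERVERPROPERTY('MachineName')  AS MachineName,
          SERVERPROPERTY('InstanceName') AS InstanceName,  -- NULL for the default instance
          @@SERVERNAME                   AS ServerName;    -- machine\instance for a named instance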
  The primary reasons for installing more than one instance of SQL Server on a single
machine are:
      You need instances for quality assurance testing or development.
      You need to support multiple service pack or patch levels.
      You have different groups of administrators who are allowed to access only a subset of
      databases within your organization.
      You need to support multiple sets of SQL Server configuration options.


    NOTE   MULTIPLE SQL SERVER INSTANCES
   Only SQL Server 2008 Enterprise supports the installation of multiple instances on a single
   machine.



   EXAM TIP
   You will need to know how collation sequences affect the way SQL Server stores and
   handles character data.



SQL Server Configuration Manager
Shown in Figure 1-1, SQL Server Configuration Manager is responsible for managing SQL
Server services and protocols. The primary tasks that you will perform are
      Starting, stopping, pausing, and restarting a service
      Changing service accounts and service account passwords
      Managing the start-up mode of a service
      Configuring service start-up parameters




FIGURE 1-1 List of services within SQL Server Configuration Manager


             After you have completed the initial installation and configuration of your SQL Server services,
          the primary action that you will perform within SQL Server Configuration Manager is to change
          service account passwords periodically. When changing service account passwords, you no
          longer have to restart the SQL Server instance for the new credential settings to take effect.


              CAUTION   WINDOWS SERVICE CONTROL APPLET
             The Windows Service Control applet also has entries for SQL Server services and allows you to
             change service accounts and passwords. You should never change service accounts or service
             account passwords using the Windows Service Control applet. SQL Server Configuration
             Manager needs to be used, because it includes the code to regenerate the service master key
             that is critical to the operation of SQL Server services.


            Although you can start, stop, pause, and restart SQL Server services, SQL Server has extensive
          management features that should ensure that you rarely, if ever, need to shut down or restart a
          SQL Server service.
             SQL Server Configuration Manager also allows you to configure the communications
          protocols available to client connections. In addition to configuring protocol-specific
          arguments, you can control whether communications are required to be encrypted or whether
          an instance responds to an enumeration request, as shown in Figure 1-2.



FIGURE 1-2 The Protocol Properties dialog box




   NOTE   ENUMERATION REQUESTS
   Applications can broadcast a special command, called an enumeration request, across a
   network to locate any servers running SQL Server that are on the network. Although being
   able to enumerate servers running SQL Server is valuable in development and testing
   environments where instances can appear, disappear, and be rebuilt on a relatively frequent
   basis, enumeration is not desirable in a production environment. By disabling enumeration
   responses (setting the Hide Instance option to Yes), you prevent someone from using discovery
   techniques to locate servers running SQL Server for a possible attack.




        Quick Check
        1.  Which edition of SQL Server supports installing more than one instance of SQL
            Server on a machine?

       2. What are the authentication modes that SQL Server can be configured with?

       Quick Check Answers
        1.  Only SQL Server Enterprise supports multiple instances on the same machine.

       2. You can configure SQL Server to operate under either Windows Only or Windows
          And SQL Server authentication modes.




PRACTICE   Installing and Configuring a SQL Server Instance

          In the following practices, you create service accounts, install a default SQL Server instance, and install
          the AdventureWorks sample database that will be used throughout the remainder of this book.

           PRACTICE 1   Creating Service Accounts
          In this practice, you create a service account that will be used to run your SQL Server service.
            1.    Click Start, right-click My Computer, and select Manage.
            2.    Expand Local Users And Groups and select Users.
             3.   Right-click in the right-hand pane and select New User.
            4.    Specify SQL2008TK432DE in the User Name field, supply a strong password, clear the
                  User Must Change Password At Next Logon check box, and select the Password Never
                  Expires check box.
             5.   Repeat steps 3 and 4 to create another account named SQL2008TK432SQLAgent.

           PRACTICE 2   Install a SQL Server Instance
          In this practice, you install an instance of SQL Server.
            1.    Start the SQL Server installation routine.
            2.    If you have not already installed the prerequisites for SQL Server 2008, the setup
                  routine installs the necessary components.
             3.   After the prerequisites have been installed, you see the main installation window, as
                  shown in Figure 1-3.




                  FIGURE 1-3 Main SQL Server installation screen

4.   Click the New SQL Server stand-alone Installation link to start the SQL Server installation.
5.   Installation executes a system configuration check. When the check completes successfully,
     your screen should look similar to Figure 1-4.




     FIGURE 1-4 The system configuration check upon successful completion

6.   Click OK. Select the SQL Server edition that you want to install. Click Next.
7.   Click Next. Select the features, as shown in Figure 1-5, and then click Next.




     FIGURE 1-5 The Feature Selection dialog box


8.   Review the disk space requirements, and then click Next.
             9.   On the Instance Configuration page, verify that Default Instance is selected and click Next.
           10.    Enter the name of the service accounts that you created in Practice 1. When complete,
                  the page should look similar to Figure 1-6.




                  FIGURE 1-6 The Specifying Service Accounts tab of the Server Configuration dialog box

           11.    Click the Collation tab to review the collation sequence set for the Database Engine.
                  Make any adjustments that you think are necessary according to the language support
                  that you require. Click Next.
            12.   On the Database Engine Configuration page, select Mixed Mode (SQL Server Authentication
                  And Windows Authentication) and set a password. Click Add Current User to add the Windows
                  account that you are running the installation under as an administrator within SQL Server. Click
                  Add to add any other Windows accounts that you want as administrators within SQL Server.
           13.    Click the Data Directories tab to review the settings.

                  MORE INFO     DATABASES AND DIRECTORIES
                  You will learn more about creating databases, database files, and directories in Chapter 2.


           14.    Click the FILESTREAM tab, and then select the Enable FILESTREAM for Transact-SQL
                  Access check box and the Enable FILESTREAM For File I/O Streaming Access check box.
                  Leave the Windows share name set to the default of MSSQLSERVER. Click Next.

                  MORE INFO     FILESTREAM DATA TYPE
                  You will learn more about the FILESTREAM data type and creating tables in Chapter 3,
                  “Tables.”


15.   Select the options of your choice on the Error And Usage Reporting page. Click Next.
 16.   Review the information on the Ready To Install page. When you are satisfied, click Install.
 17.   SQL Server starts the installation routines for the various options that you have
       specified and displays progress reports.

PRACTICE 3   Install the AdventureWorks Sample Database
In this practice, you install the AdventureWorks sample database, which will be used throughout
this book to demonstrate SQL Server 2008 capabilities.

        NOTE   SAMPLE DATABASES
        SQL Server 2008 does not ship with any sample databases. You need to download the
        AdventureWorks2008 and AdventureWorksDW2008 databases from the CodePlex Web
        site (http://www.codeplex.com).


  1.   Open Internet Explorer and go to http://www.codeplex.com/MSFTDBProdSamples. Click
       the Releases tab.
  2.   Scroll to the bottom of the page and download the AdventureWorks2008*.msi file to
       your local machine.

        IMPORTANT   INSTALLATION ROUTINE
       The CodePlex site contains installation routines for 32-bit, x64, and IA64 platforms.
       Download the msi file that is appropriate to your operating system.


  3.   Run the installation routines for both downloads and use the default extract location.
  4.   Click Start, and then select All Programs, Microsoft SQL Server 2008, and SQL Server
       Management Studio.
  5.   If it is not already entered, specify the name of the machine on which you installed your
       SQL Server instance in the previous practice. Click Connect. Your screen should look like Figure 1-7.
  6.   Click New Query. Enter the following code and then click Execute:
        EXEC sp_configure 'filestream_access_level', 2;
        GO
        RECONFIGURE
        GO
        RESTORE DATABASE AdventureWorks
        FROM DISK = 'C:\Program Files\Microsoft SQL Server\100\Tools\Samples\AdventureWorks2008.bak'
        WITH RECOVERY;
        GO
        RESTORE DATABASE AdventureWorksDW
        FROM DISK = 'C:\Program Files\Microsoft SQL Server\100\Tools\Samples\AdventureWorksDW2008.bak'
        WITH RECOVERY;
        GO




FIGURE 1-7 The SQL Server Management Studio screen

             7.   When you expand the Database node, your screen should look similar to Figure 1-8.




                  FIGURE 1-8 Installing the AdventureWorks sample database




Lesson Summary
       The SQL Server engine and SQL Server Agent run as services and need to have service
       accounts created with the appropriate Windows permissions to be able to access
       needed Windows resources.
       SQL Server can be configured to run under either the Windows Only or the Windows
       And SQL Server authentication mode.
       The collation sequence controls how SQL Server stores and manages character-based
       data.
       SQL Server supports up to 50 instances installed on a single machine; however, only
       the Enterprise Edition has multi-instance support.


Lesson Review
The following questions are intended to reinforce key information presented in Lesson 3,
“Installing and Configuring SQL Server Instances.” The questions are also available on the
companion CD if you prefer to review them in electronic form.



   NOTE   ANSWERS
   Answers to these questions and explanations of why each answer choice is right or wrong
   are located in the “Answers” section at the end of the book.


  1.   Wide World Importers will be using the new FILESTREAM data type to store scanned
       images of shipping manifests. Which command must be executed against the SQL
       Server instance before FILESTREAM data can be stored?
       A. ALTER DATABASE
       B. DBCC
       C. sp_configure
       D. sp_filestream_configure
  2.   Contoso has implemented a new policy that requires the passwords on all service
       accounts to be changed every 30 days. Which tool should the Contoso database
       administrators use to change the service account passwords so that SQL Server services
       comply with the new policy?
       A. Windows Service Control applet
       B. SQL Server Management Studio
        C. SQL Server Configuration Manager
       D. SQL Server Surface Area Configuration Manager




Lesson 4: Configuring Database Mail
          Database Mail provides a notification capability to SQL Server instances. In this lesson, you
          learn about the features of Database Mail and how to configure Database Mail within a SQL
          Server 2008 instance.


                 After this lesson, you will be able to:
                    Configure Database Mail
                    Send messages using Database Mail

                 Estimated lesson time: 30 minutes



          Database Mail
          Database Mail enables a computer running SQL Server to send outbound mail messages.
          Although messages can contain the results of queries, Database Mail is primarily used to send
          alert messages to administrators to notify them of performance conditions or changes that
          have been made to objects.
              Database Mail was added in SQL Server 2005 as a replacement for SQL Mail. The reasons
           for the replacement were very simple:
                 Remove the dependency on Microsoft Mail Application Programming Interface (MAPI)
                 Simplify configuration and management
                 Provide a fast, reliable way to send mail messages
             Database Mail uses the Simple Mail Transfer Protocol (SMTP) relay service that is available
          on all Windows machines to transmit mail messages. When a mail send is initiated, the
          message along with all of the message properties is logged into a table in the Msdb database.
          On a periodic basis, a background task that is managed by SQL Server Agent executes. When
          the mail send process executes, all messages within the send queue that have not yet been
          forwarded are picked up and sent using the appropriate mail profile.
             Profiles form the core element within Database Mail. A given profile can contain multiple
          e-mail accounts to provide a failover capability in the event a specific mail server is unavailable.
          Mail accounts define all the properties associated to a specific e-mail account such as
          e-mail address, reply to e-mail address, mail server name, port number, and authentication
          credentials.
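
              Although the practice at the end of this lesson uses the Database Mail Configuration
           Wizard, you can also create accounts and profiles with the Database Mail stored procedures
           in the msdb database. A minimal sketch; the account name, profile name, addresses, and
           mail server shown here are hypothetical:

              EXEC msdb.dbo.sysmail_add_account_sp
                  @account_name    = N'OpsAccount',            -- hypothetical account name
                  @email_address   = N'sqlserver@contoso.com',
                  @display_name    = N'SQL Server Notifications',
                  @mailserver_name = N'smtp.contoso.com';      -- hypothetical SMTP server

              EXEC msdb.dbo.sysmail_add_profile_sp
                  @profile_name = N'OpsProfile';               -- hypothetical profile name

              -- Associate the account with the profile; if a profile contains multiple
              -- accounts, lower sequence numbers are attempted first, which is how a
              -- profile provides failover across mail servers
              EXEC msdb.dbo.sysmail_add_profileaccount_sp
                  @profile_name    = N'OpsProfile',
                  @account_name    = N'OpsAccount',
                  @sequence_number = 1;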
             You can secure access to a mail profile to restrict the user’s ability to send mail through
          a given profile. When a profile is created, you can configure the profile to be either a public
          or private profile. A public profile can be accessed by any user with the ability to send mail.
          A private profile can be accessed only by those users who have been granted access to the
          mail profile explicitly.




In addition to configuring a mail profile as either public or private, you can designate a
mail profile to be the default. When sending mail, if a mail profile is not specified, SQL Server
uses the mail profile designated as the default to send the message.
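
   Messages are sent with the sp_send_dbmail stored procedure in msdb. A short sketch;
the profile name and recipient address are hypothetical:

   -- Send through an explicitly named profile
   EXEC msdb.dbo.sp_send_dbmail
       @profile_name = N'OpsProfile',             -- hypothetical profile name
       @recipients   = N'dba@contoso.com',
       @subject      = N'Nightly job status',
       @body         = N'The nightly maintenance jobs completed successfully.';

   -- Omit @profile_name and SQL Server sends through the default mail profile
   EXEC msdb.dbo.sp_send_dbmail
       @recipients = N'dba@contoso.com',
       @subject    = N'Test message',
       @body       = N'Sent through the default profile.';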


    TIP   SENDING MAIL
   Database Mail utilizes the services of SQL Server Agent to send messages as a background
   process. If SQL Server Agent is not running, messages will accumulate in a queue within the
   Msdb database.




          Quick Check
          1.  What are the two basic components of Database Mail?

         2. What are the two types of mail profiles that can be created?

         Quick Check Answers
          1.  Database Mail uses mail profiles which can contain one or more mail accounts.

         2. Mail profiles can be configured as either public or private.



 PRACTICE   Configuring Database Mail

In this practice, you configure Database Mail and send a test mail message.
  1.     Open SQL Server Management Studio, connect to your SQL Server instance, and click
         New Query to open a new query window and execute the following code to enable
         the Database Mail feature:

         EXEC sp_configure 'Database Mail XPs',1
         GO
         RECONFIGURE WITH OVERRIDE
         GO

  2.     Within the Object Explorer, open the Management node, right-click Database Mail, and
         select Configure Database Mail.
  3.     Click Next on the Welcome screen.
  4.     Select Set Up Database Mail By Performing The Following Tasks and click Next.
  5.     Specify a name for your profile and click the Add button to specify settings for a mail
         account.
  6.     Fill in the Account Name, E-mail Address, Display Name, Reply E-mail, and Server
         Name fields.
  7.     Select the appropriate SMTP Authentication mode for your organization and, if using
         Basic authentication, specify the username and password. Your settings should look
         similar to Figure 1-9.

FIGURE 1-9 Configuring Database Mail


                   NOTE   DATABASE MAIL SETTINGS
                  I am using my Internet e-mail account and have purposely left the Server Name, User
                  Name, and Password out of Figure 1-9. You need to specify at least the Server Name if
                  you are using an internal mail server.


            8.    Click OK and then click Next.
             9.   Select the check box in the Public column next to the profile you just created and set
                  this profile to Yes in the Default Profile column. Click Next.
           10.    Review the settings on the Configure System Parameters page. Click Next.
           11.    Click OK and then click Next. Click Finish.
           12.    The final page should show that all four configuration steps completed successfully.
                  Click Close.
           13.    Right-click Database Mail and select Send Test E-mail.
           14.    Select the Database Mail profile you just created, enter an e-mail address in the To line,
                  and click Send Test E-Mail, as shown in Figure 1-10.
           15.    Go to your e-mail client and verify that you have received the test mail message.

                   TIP   SENDING MAIL
                  Database Mail utilizes the services of SQL Server Agent to send messages as a background
                  process. If SQL Server Agent is not running, messages accumulate in a queue within the
                  Msdb database.




FIGURE 1-10   Sending a test mail message



Lesson Summary
       Database Mail is used to send mail messages from a SQL Server instance.
       To send mail messages, SQL Server Agent must be running.
       A mail profile can contain one or more mail accounts.
       You can create either public or private mail profiles.
       The mail profile designated as the default profile will be used to send mail messages if
       a profile is not specified.


Lesson Review
The following questions are intended to reinforce key information presented in Lesson 4,
“Configuring Database Mail.” The questions are also available on the companion CD if you
prefer to review them in electronic form.


  NOTE   ANSWERS
  Answers to these questions and explanations of why each answer choice is right or wrong
  are located in the “Answers” section at the end of the book.

  1.   As part of the implementation of the new Web-based booking system at Margie’s
       Travel, customers should receive notices when a travel booking has been successfully
       saved. What technologies or features can the developers at Margie’s Travel use to
       implement notifications? (Choose all that apply.)
       A. Notification Services
       B. Database Mail
       C. Microsoft Visual Studio.NET code libraries
       D. Activity Monitor
2.   The developers at Margie’s Travel have decided to utilize Database Mail to send
                 messages to their customers. The ability to send mail messages through a given profile
                 needs to be restricted, but it must not require an approved user to specify a mail
                 profile when sending messages. What settings need to be configured to meet these
                 requirements? (Choose all that apply.)
                 A. Set the mail profile to public.
                 B. Set the mail profile to private.
                 C. Set the mail profile to private and grant access to approved users.
                 D. Designate the mail profile as the default.




Chapter Review
To practice and reinforce the skills you learned in this chapter further, you can:
       Review the chapter summary.
       Review the list of key terms introduced in this chapter.
       Complete the case scenario. This scenario sets up a real-world situation involving the
       topics of this chapter and asks you to create solutions.
       Complete the suggested practices.
       Take a practice test.


Chapter Summary
       SQL Server 2008 is available in Enterprise, Standard, Workgroup, Express, and Compact
       editions.
       SQL Server runs as a service within Windows and requires a service account to be
       assigned during installation.
       You can configure an instance for Windows Only or Windows And SQL Server
       authentication modes.
       SQL Server Configuration Manager is used to manage any SQL Server services.
       Database Mail can be enabled and configured on a SQL Server instance to allow users
       and applications to send mail messages.


Key Terms
Do you know what these key terms mean? You can check your answers by looking up the
terms in the glossary at the end of the book.
       Collation Sequence
       Database Mail
       Data Mining
       Mail profile


Case Scenario
In the following case scenario, you apply what you’ve learned in this chapter. You can find
answers to these questions in the “Answers” section at the end of this book.

Case Scenario: Defining a SQL Server Infrastructure
Wide World Importers is implementing a new set of applications to manage several lines of
business. Within the corporate data center, they need the ability to store large volumes of
data that can be accessed from anywhere in the world.


Several business managers need access to operational reports that cover the current workload
        of their employees, along with new and pending customer requests. The same business managers
        also need to be able to access large volumes of historical data to spot trends and optimize their
        staffing and inventory levels.
           A large sales force makes customer calls all over the world and needs access to data on
        the customers that a sales rep is servicing, along with potential prospects. The data for the
        sales force needs to be available even when the salespeople are not connected to the Internet
        or the corporate network. Periodically, sales reps will connect to the corporate network and
        synchronize their data with the corporate databases.
           A variety of Windows applications have been created with Visual Studio.NET and all data
        access is performed using stored procedures. The same set of applications are deployed for
        users connecting directly to the corporate database server as well as for sales reps connecting
        to their own local database servers.
           Answer the following questions:
          1.     What edition of SQL Server 2008 should be installed on the laptops of the sales force
                 to minimize the cost?
          2.     What edition of SQL Server 2008 should be installed within the corporate data
                 center?
  3.     What SQL Server services need to be installed to meet the needs of the business
         managers?
          4.     What versions of Windows need to be installed on the corporate database server?



        Suggested Practices
        To help you master the exam objectives presented in this chapter, complete the following
        tasks.


        Installing SQL Server
                 Practice: Install SQL Server Instances   Install two more SQL Server 2008 database
                 engine instances.


        Managing SQL Server Services
                 Practice: Manage a SQL Server Instance        Change the service account of one of your
                 installed instances.
                 Change the service account password for one of your installed instances.




Take a Practice Test
The practice tests on this book’s companion CD offer many options. For example, you can test
yourself on just one exam objective, or you can test yourself on all the 70-432 certification
exam content. You can set up the test so that it closely simulates the experience of taking
a certification exam, or you can set it up in study mode so that you can look at the correct
answers and explanations after you answer each question.


   MORE INFO      PRACTICE TESTS
   For details about all the practice test options available, see the section “How to Use the
   Practice Tests,” in the Introduction to this book.




CHAPTER 2


Database Configuration
and Maintenance
The configuration choices that you make for a database affect its performance, scalability,
and management. In this chapter, you learn how to design the file and filegroup storage
structures underneath a database. You learn how to configure database options and
recovery models. You will also learn how to check and manage the integrity of a database.


Exam objectives in this chapter:
    Back up databases.
    Manage and configure databases.
    Maintain database integrity.
    Manage collations.

Lessons in this chapter:
    Lesson 1: Configuring Files and Filegroups

    Lesson 2: Configuring Database Options

    Lesson 3: Maintaining Database Integrity



Before You Begin
To complete the lessons in this chapter, you must have:
       Microsoft SQL Server 2008 installed
       The AdventureWorks database installed within the instance




REAL WORLD
                 Michael Hotek



                  I have worked on millions of databases across thousands of customers during the
                  portion of my career where I have worked with SQL Server. In all that time, I have
                  come up with many best practices while at the same time creating many arguments
                 among the “purists.” All my recommendations and approaches to architecting
                 and managing SQL Servers come from a pragmatic, real-world perspective that,
                 although rooted in a deep knowledge of SQL Server, hardware, networking, and
                 many other components, rarely matches up with the perfect world theory.

                 Designing the disk structures that underlie a database is one of the cases where
                 I deviate from a lot of the theoretical processes and computations that you will find
                 published. Although you can find entire white papers and even sections of training
                 classes devoted to teaching you how to calculate disk transfer and random vs.
                 sequential writes, I have never encountered an environment where I had the time or
                 luxury to run those calculations prior to implementing a system.

                 It is really nice that there are formulas to calculate the disk transfer of a given disk
                 configuration, and you can also apply statistical methods to further refine those
                 calculations based on the random vs. sequential I/O of a system. However, all the
                 time spent doing the calculations is worthless unless you also know the required
                 read and write capacity of the databases you are going to place on that disk
                 subsystem. Additionally, unless you are buying a new storage system, dedicated to
                 a specific application, you will have a very difficult time architecting the disk storage
                 underneath a database according to all the theories.

                 The challenge in achieving optimal performance is to separate the transaction logs
                 from data files so that you can isolate disk I/O. The transaction log is the key to
                 high-performance write operations, because the maximum transaction rate is bound
                 by the write capacity to the transaction log file. After taking care of the transaction
                 log, you need to add enough files and filegroups to achieve enough disk throughput
                 to handle the read/write activity. However, the most important component of
                 performance is to write applications with efficient code that accesses only the
                 minimum amount of data necessary to accomplish the business task.




Lesson 1: Configuring Files and Filegroups
Data within a database is stored on disk in one or more data files. Prior to being written to the
data file(s), every transaction is written to a transaction log file. In this lesson, you learn how
to design the data files underneath a database, group the files into filegroups to link physical
storage into a database, and manage the transaction log. You also learn how to configure the
tempdb database for optimal performance.


       After this lesson, you will be able to:
          Create filegroups
          Add files to filegroups
           Work with FILESTREAM data
          Configure the transaction log

      Estimated lesson time: 20 minutes



Files and Filegroups
Although storing all your data in memory would provide extremely fast access, you would
lose everything after the machine was shut down. To protect your data, it has to be persisted
to disk. Underneath each database is one or more files for persisting your data.
   SQL Server uses two different types of files—data and transaction log files. Data files are
responsible for the long-term storage of all the data within a database. Transaction log files,
discussed in more detail later in this lesson, are responsible for storing all the transactions that
are executed against a database.
    Instead of defining the storage of objects directly to a data file, SQL Server provides an
abstraction layer for more flexibility called a filegroup. A filegroup is a logical structure,
defined within a database, that maps the database and the objects contained within it to the
data files on disk. Filegroups can contain more than one data file.
   All objects that contain data (tables, indexes, and indexed views) have an ON clause that
you can use when you create the object to specify the filegroup where SQL Server stores it.
As data is written to the objects, SQL Server uses the filegroup definition to determine on
which file(s) it should store the data.
   At the time that a file is added to a database, you specify the initial size of the file. You can
also specify a maximum size for the file, as well as whether SQL Server automatically increases
the size of the file when it is full of data. If you specify automatic growth, you can specify
whether the file size increases based on a percentage of the current size or whether the file
size increases at a fixed amount that you define.
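
   The file and filegroup definitions come together in the CREATE DATABASE statement. The
following is a minimal sketch; the database name, filegroup name, and file paths are
hypothetical:

   CREATE DATABASE Sales
   ON PRIMARY
       ( NAME = Sales_Primary, FILENAME = 'C:\SQLData\Sales_Primary.mdf',
         SIZE = 100MB, MAXSIZE = 500MB, FILEGROWTH = 50MB ),
   FILEGROUP SalesData
       ( NAME = Sales_Data1, FILENAME = 'C:\SQLData\Sales_Data1.ndf',
         SIZE = 200MB, FILEGROWTH = 10% ),
       ( NAME = Sales_Data2, FILENAME = 'C:\SQLData\Sales_Data2.ndf',
         SIZE = 200MB, FILEGROWTH = 10% )
   LOG ON
       ( NAME = Sales_Log, FILENAME = 'D:\SQLLogs\Sales_Log.ldf',
         SIZE = 100MB, FILEGROWTH = 50MB );
   GO

   -- The ON clause places a table on a specific filegroup
   CREATE TABLE dbo.Orders (OrderID int NOT NULL) ON SalesData;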




Unless a filegroup has only a single file, you do not know in which file a specific row of data
          is stored. When writing to files, SQL Server uses a proportional fill algorithm. The proportional
          fill algorithm is designed to ensure that all files within a filegroup reach the maximum defined
          capacity at the same time. For example, if you had a data file that was 10 gigabytes (GB) and
          a data file that was 1 GB, SQL Server writes ten rows to the 10 GB file for every one row that is
          written to the 1 GB file.
             The proportional fill algorithm is designed to allow a resize operation to occur at a filegroup
          level. In other words, all files within a filegroup expand at the same time.



                 File Extensions


                 SQL Server uses three file extensions: .mdf, .ndf, and .ldf. Unfortunately, many
                    people have placed a lot of emphasis and meaning on these three extensions,
                 where no meaning was ever intended. Just like Microsoft Office Word documents
                 have a .doc or .docx extension, and Microsoft Office Excel files have an .xls or .xlsx
                 extension, the extension is nothing more than a naming convention. I could just
                 as easily create a Word document with an extension of .bob, or even no extension,
                 without changing the fact that it is still a Word document or preventing the ability
                 of Word to open and manipulate the file.

                 A file with an .mdf extension is usually the first data file that is created within a
                 database, generally is associated with the primary filegroup, and usually is considered
                 the primary data file, which contains all the system objects necessary to a database.
                 The .ndf extension is generally used for all other data files underneath a database,
                 regardless of the filegroup to which the file is associated. The .ldf extension generally
                 is used for transaction logs.

                 The file extensions that you see for SQL Server are nothing more than naming
                 conventions. SQL Server does not care what the file extensions are or even if the
                 files have extensions. If you really wanted to, you could use an .ldf extension for
                 the primary data file, just as you could use an .mdf extension for a transaction log
                 file. Although the use of file extensions in this way does not affect SQL Server, it
                 generally could cause confusion among the other database administrators (DBAs) in
                 your organization. To avoid this confusion, it is recommended that you use the .mdf,
                 .ndf, and .ldf naming conventions commonly used across the SQL Server industry,
                 but do not forget that this is just a naming convention and has absolutely no effect
                 on SQL Server itself.


   All data manipulation within SQL Server occurs in memory within a set of buffers. If you are
adding new data to a database, the new data is first written to a memory buffer, then written
to the transaction log, and finally persisted to a data file via a background process called
checkpointing. When you modify or delete an existing row, if the row does not already exist



in memory, SQL Server first reads the data off disk before making the modification. Similarly, if
you are reading data that has not yet been loaded into a memory buffer, SQL Server must read
it out of the data files on disk.
   If you could always ensure that the machine hosting your databases had enough memory to
hold all the data within your databases, SQL Server could simply read all the data off disk into
memory buffers upon startup to improve performance. However, databases are almost always
much larger than the memory capacity on any machine, so SQL Server retrieves data from disk only
on an as-needed basis. If SQL Server does not have enough room in memory for the data being
read in, the least recently used buffers are emptied to make room for newly requested data.
  Because accessing a disk drive is much slower than accessing memory, the data file design
underneath a database can have an impact on performance.
    The first layer of design is within the disk subsystem. As the number of disk drives within
a volume increases, the read and write throughput for SQL Server increases. However, there
is an upper limit on the disk input/output (I/O), which is based upon the capacity of the
redundant array of independent disks (RAID) controller, host bus adapter (HBA), and disk
bus. So you cannot fix a disk I/O bottleneck by continually adding more disk drives. Although
entire 200+ page white papers have been written on random vs. sequential writes, transfer
speeds, rotational speeds, calculations of raw disk read/write speeds, and other topics,
the process of designing the disk subsystem is reduced to ensuring that you have enough
disks along with appropriately sized controllers and disk caches to deliver the read/write
throughput required by your database.
   If it were simply a matter of the number of disks, there would be far fewer disk I/O
bottlenecks in systems. But there is a second layer of data file design: determining how many
data files you need and the location of each data file.
    SQL Server creates a thread for each file underneath a database. As you increase the
number of files underneath a database, SQL Server creates more threads that can be used
to read and write data. However, you cannot just create a database with thousands of files
to increase its number of threads. This is because each thread consumes memory, taking
away space for data to be cached, and even if you could write to all the threads at the same
time, you would then saturate the physical disks behind the data files. In addition, managing
thousands of data files underneath a database is extremely cumbersome, and if a large
percentage of the files need to expand at the same time, you could create enough activity to
halt the flow of data within the database.
    Due to the competing factors and the simple fact that in the real world, few DBAs have the
time to spend running complex byte transfer rate calculations or even to design the disk layer
based on a precise knowledge of the data throughput required, designing the data layer is an
iterative approach.
   Designing the data layer of a database begins with the database creation. When you
create a database, it should have three files and two filegroups. You should have a file with an
.mdf extension within a filegroup named PRIMARY, a file with an .ndf extension in a filegroup
with any name that you choose, and the transaction log with an .ldf extension.


   NOTE   FILE EXTENSIONS
             As stated in the sidebar “File Extensions,” earlier in this chapter, file extensions are nothing
             more than naming conventions. They do not convey any special capabilities.


              Besides being the logical definition for one or more files that defines the storage boundary
          for an object, filegroups have a property called DEFAULT. The purpose of the DEFAULT property
          is to define the filegroup where SQL Server places objects if you do not specify the ON clause
          during object creation.
              When the database is created, the primary filegroup is marked as the default filegroup.
          After you create the database, you should mark the second filegroup as the default
          filegroup. By changing the default filegroup, you ensure that any objects you create are
          not accidentally placed on the primary filegroup and that only the system objects for the
          database reside on the primary filegroup. You change the default filegroup by using the
          following command:

          ALTER DATABASE <database name> MODIFY FILEGROUP <filegroup name> DEFAULT

             The main reason not to place any of your objects on the primary filegroup is to provide
          as much isolation in the I/O as possible. The data in the system objects does not change as
          frequently as data in your objects. By minimizing the write activity to the primary data file,
          you reduce the possibility of introducing corruption due to hardware failures. In addition,
          because the state of the primary filegroup also determines the state of the database,
          you can increase the availability of the database by minimizing the changes made to the
          primary filegroup.
             Following the initial creation of the database, you add filegroups as needed to separate
          the storage of objects within the database. You also add files to filegroups to increase the disk
          I/O available to the objects stored on the filegroup, thereby reducing disk bottlenecks.



          Transaction Logs
          When SQL Server acknowledges that a transaction has been committed, SQL Server must
          ensure that the change is hardened to persistent storage. Although all writes occur through
          memory buffers, persistence is guaranteed by requiring that all changes are written to the
          transaction log prior to a commit being issued. In addition, the writes to the transaction log
          must occur directly to disk.
             Because every change made to a database must be written directly to disk, the disk storage
          architecture underneath your transaction log is the most important decision affecting the
          maximum transaction throughput that you can achieve.
             SQL Server writes sequentially to the transaction log but does not read from the log except
          during a restart recovery. Because SQL Server randomly reads and writes to the data files
          underneath a database, by isolating the transaction log to a dedicated set of disks you ensure
          that the disk heads do not have to move all over the disk and move in a mostly linear manner.

EXAM TIP
   The maximum transaction throughput for any database is bound by the amount of data
   per second that SQL Server can write to the transaction log.




       Benchmarks


       Benchmark disclosures are the best source of information when designing the
       disk storage for optimal performance. Many organizations and the press place
       great emphasis on various benchmarks. However, a careful study reveals that,
       by itself, SQL Server doesn't have as large an impact on the overall numbers
       as you are led to believe. The transaction processing engine within SQL Server is
       extremely efficient and has a fixed contribution to transaction throughput, but the
       real key to maximizing the transaction rate is in the disk storage. Given the same
       disk configuration, a 7,200 RPM drive delivers about 50 percent of the SQL Server
       transaction rate of a 15,000 RPM drive. Having 100 disks underneath a transaction
       log generally doubles the transaction rate of having only 50 disks. In addition,
       one of the tricks used in benchmarks is to partition a disk such that all the SQL
       Server data is written to the outside half or less of the disk platter, because based
       on physics, as the read/write head of a disk moves toward the edge of a circular
       object, the velocity increases, thereby spinning a larger segment of the disk platter
       underneath the drive head per unit of time.




FILESTREAM data
Although the volume of data within organizations has been exploding, leading the way in this
data explosion is unstructured data. To tackle the problem of storing, managing, and combining
the large volumes of unstructured data with the structured data in your databases, SQL
Server 2008 introduced FILESTREAM.
    The FILESTREAM feature allows you to associate files with a database. The files are stored
in a folder on the operating system, but are linked directly into a database where the files can
be backed up, restored, full-text-indexed, and combined with other structured data.
   Although the details of FILESTREAM are covered in more detail in Chapter 3, “Tables,”
and Chapter 5, “Full Text Indexing,” to store FILESTREAM data within a database, you need
to specify where the data will be stored. You define the location for FILESTREAM data in
a database by designating a filegroup within the database to be used for storage with
the CONTAINS FILESTREAM property. The FILENAME property defined for a FILESTREAM
filegroup specifies the path to a folder. The initial part of the folder path definition must exist;
however, the last folder in the path defined cannot exist and is created automatically. After
the FILESTREAM folder has been created, a filestream.hdr file is created in the folder, which is
a system file used to manage the files subsequently written to the folder.
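   As an illustrative sketch, the following commands add a FILESTREAM filegroup to an
existing database; the database name and folder path are assumptions. Note that c:\test
must already exist, whereas the Documents folder must not:

-- Designate a filegroup for FILESTREAM storage
ALTER DATABASE TK432
ADD FILEGROUP DocumentGroup CONTAINS FILESTREAM
GO

-- SQL Server creates the Documents folder and the filestream.hdr file within it
ALTER DATABASE TK432
ADD FILE (NAME = N'DocumentFiles', FILENAME = N'c:\test\Documents')
TO FILEGROUP DocumentGroup
GO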

tempdb Database
          Because the tempdb database is much more heavily used than in previous versions, special
          care needs to be taken in how you design the storage underneath tempdb.
             In addition to temporary objects, SQL Server uses tempdb for worktables used in
          grouping/sorting operations, worktables to support cursors, the version store supporting
          snapshot isolation level, and overflow for table variables. You can also cause index build
          operations to use space in tempdb.
             Due to the potential for heavy write activity, you should move tempdb to a set of disks
          separated from your databases and any backup files. To spread out the disk I/O, you might
          consider adding additional files to tempdb.


   NOTE   MULTIPLE tempdb FILES
   A common practice for tempdb is to create one file per processor. The one file per
   processor guideline is with respect to what SQL Server considers a processor, not the
   physical processor, which could have multiple cores as well as hyperthreading.
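   As a sketch, the following commands relocate the tempdb files and add a second data
file; the paths are assumptions, and the file moves take effect only after the instance is
restarted:

-- tempdev and templog are the default logical names of the tempdb files
ALTER DATABASE tempdb
MODIFY FILE (NAME = N'tempdev', FILENAME = N'e:\tempdb\tempdb.mdf')
GO
ALTER DATABASE tempdb
MODIFY FILE (NAME = N'templog', FILENAME = N'e:\tempdb\templog.ldf')
GO

-- Add a second data file to spread out the disk I/O
ALTER DATABASE tempdb
ADD FILE (NAME = N'tempdev2', FILENAME = N'e:\tempdb\tempdb2.ndf', SIZE = 8MB)
GO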




       Quick Check
       1. What are the types of files that you create for databases and what are the
          commonly used file extensions?

                 2. What is the purpose of the transaction log?

       Quick Check Answers
       1. You can create data and log files for a database. Data files commonly have either
          an .mdf or .ndf extension, whereas log files have an .ldf extension.

                 2. The transaction log records every change that occurs within a database to persist
                    all transactions to disk.




           PR ACTICE       Creating Databases

          In this practice, you create a database with multiple files that is enabled for FILESTREAM storage.
            1.   Execute the following code to create a database:

        CREATE DATABASE TK432 ON PRIMARY
        ( NAME = N'TK432_Data', FILENAME = N'c:\test\TK432.mdf' ,
            SIZE = 8MB , MAXSIZE = UNLIMITED, FILEGROWTH = 16MB ),
        FILEGROUP FG1
        ( NAME = N'TK432_Data2', FILENAME = N'c:\test\TK432.ndf' ,
            SIZE = 8MB , MAXSIZE = UNLIMITED, FILEGROWTH = 16MB ),
        FILEGROUP Documents CONTAINS FILESTREAM DEFAULT
        ( NAME = N'Documents', FILENAME = N'c:\test\TK432Documents' )
        LOG ON
        ( NAME = N'TK432_Log', FILENAME = N'c:\test\TK432.ldf' ,
            SIZE = 8MB , MAXSIZE = 2048GB , FILEGROWTH = 16MB )
        GO

  2.   Execute the following code to change the default filegroup:

       ALTER DATABASE TK432
       MODIFY FILEGROUP FG1
       DEFAULT
       GO



Lesson Summary
       You can define one or more data and log files for the physical storage of a database.
       Data files are associated to a filegroup within a database.
       Filegroups provide the logical storage container for objects within a database.
       Files can be stored using the new FILESTREAM capabilities.


Lesson Review
The following question is intended to reinforce key information presented in Lesson 1,
“Configuring Files and Filegroups.” The question is also available on the companion CD if you
prefer to review it in electronic form.


   NOTE   ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or
   incorrect are located in the "Answers" section at the end of the book.


  1.   You have a reference database named OrderHistory, which should not allow any data
       to be modified. How can you ensure, with the least amount of effort, that users can
       only read data from the database?
       A. Add all database users to the db_datareader role.
       B. Create views for all the tables and grant select permission only on the views to
            database users.
       C. Set the database to READ_ONLY.
       D. Grant select permission on the database to all users and revoke insert, update, and
            delete permissions from all users on the database.




Lesson 2: Configuring Database Options
Every database has a set of options that control its behavior and capabilities. In this lesson,
you learn how to set the recovery model that determines the backups available for a
database, configure the options that control automatic actions, change tracking, access, and
parameterization, and override the collation sequence defined for the instance.


                 After this lesson, you will be able to:
                    Set the database recovery model
                    Configure database options
                    Manage collation sequences
                    Check and maintain database consistency

                 Estimated lesson time: 20 minutes



          Database Options
          A database has numerous options that control a variety of behaviors. These options are
          broken down into several categories, including the following:
                 Recovery
                 Auto options
                 Change tracking
                 Access
                 Parameterization


          Recovery Options
          The recovery options determine the behavior of the transaction log and how damaged pages
          are handled.

          Recovery Models
          Every database within a SQL Server instance has a property setting called the recovery model.
          The recovery model determines the types of backups you can perform against a database.
          The recovery models available in SQL Server 2008 are:
                 Full
                 Bulk-logged
                 Simple




THE FULL RECOVERY MODEL
When a database is in the Full recovery model, all changes made, using both data manipulation
language (DML) and data definition language (DDL), are logged to the transaction log. Because
all changes are recorded in the transaction log, it is possible to recover a database in the Full
recovery model to a given point in time so that data loss can be minimized or eliminated if you
should need to recover from a disaster. Changes are retained in the transaction log indefinitely
and are removed only by executing a transaction log backup.


   BEST PRACTICES   RECOVERY MODELS
   Every production database that accepts transactions should be set to the Full recovery
   model. By placing the database in the Full recovery model, you can maximize the restore
   options that are possible.


THE BULK-LOGGED RECOVERY MODEL
Certain operations are designed to manipulate large amounts of data. However, the
overhead of logging to the transaction log can have a detrimental impact on performance.
The Bulk-logged recovery model allows certain operations to be executed with minimal
logging. When a minimally logged operation is performed, SQL Server does not log
every row changed but instead logs only the extents, thereby reducing the overhead and
improving performance. The operations that are performed in a minimally logged manner
with the database set in the Bulk-logged recovery model are:
        BCP
        BULK INSERT
        SELECT...INTO
        CREATE INDEX
        ALTER INDEX...REBUILD
   Because the Bulk-logged recovery model does not log every change to the transaction
log, you cannot recover a database to a point in time that falls within an interval during
which a minimally logged operation executed.

THE SIMPLE RECOVERY MODEL
The third recovery model is Simple. A database in the Simple recovery model logs operations to
the transaction log exactly as the Full recovery model does. However, each time the database
checkpoint process executes, the committed portion of the transaction log is discarded. A
database in the Simple recovery model cannot be recovered to a point in time because it is not
possible to issue a transaction log backup for a database in the Simple recovery model.
   Because the recovery model is a property of a database, you set the recovery model by
using the ALTER DATABASE command as follows:

ALTER DATABASE database_name
SET RECOVERY { FULL | BULK_LOGGED | SIMPLE }



The backup types available for each recovery model are shown in Table 2-1.

           TABLE 2-1 Backup Types Available for Each Recovery Model

           RECOVERY MODEL   FULL   DIFFERENTIAL   TRAN LOG

           Full             Yes    Yes            Yes
           Bulk-logged      Yes    Yes            Yes (but no point-in-time recovery within
                                                  a minimally logged interval)
           Simple           Yes    Yes            No



                     EXAM TIP
                     You need to know which types of backups are possible for each recovery model.



          Damaged Pages
          It is possible to damage data pages during a write to disk if you have a power failure or
          failures in disk subsystem components during the write operation. If the write operation fails
          to complete, you can have an incomplete page in the database that cannot be read. Because
          the damage happens to a page on disk, the only time that you see a result of the damage is
          when SQL Server attempts to read the page off disk.
               Without page verification, SQL Server does not check for damaged pages and could
           take the database off-line if a damaged page is encountered. The PAGE_VERIFY
           CHECKSUM option can be enabled, which allows you to discover and log damaged pages.
          When pages are written to disk, a checksum for the page is calculated and stored in the page
          header. When SQL Server reads a page from disk, a checksum is calculated and compared
          to the checksum stored in the page header. If a damaged page is encountered, an 824 error
          is returned to the calling application and logged to the SQL Server error log and Windows
          Application Event log, and the ID of the damaged page is logged to the suspect_pages table
          in the msdb database.
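              A minimal sketch of enabling checksum verification and reviewing logged damage
           follows, using the AdventureWorks sample database for illustration:

           -- Enable checksum page verification
           ALTER DATABASE AdventureWorks
           SET PAGE_VERIFY CHECKSUM
           GO

           -- Review any damaged pages that SQL Server has encountered
           SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
           FROM msdb.dbo.suspect_pages
           GO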
             In SQL Server 2005, the only way to fix a damaged page was to execute a page restore,
          which is discussed in Chapter 9, “Backing Up and Restoring a Database.” In addition to a
          page restore, if the database is participating in a database mirroring session, SQL Server 2008
          automatically replaces the page with a copy of the page from the mirror. When Database
          Mirroring automatically fixes a corrupt page, an entry is logged and can be reviewed with the
          sys.dm_db_mirroring_auto_page_repair view.


          Auto Options
          There are five options for a database that enable certain actions to occur automatically:
                             AUTO_CLOSE
                             AUTO_SHRINK

        AUTO_CREATE_STATISTICS
        AUTO_UPDATE_STATISTICS
        AUTO_UPDATE_STATISTICS_ASYNC
   Each database within an instance requires a variety of resources, the most significant of
which is a set of memory buffers. Each open database requires several bytes of memory and
any queries against the database populate the data and query caches. If the AUTO_CLOSE
option is enabled, when the last connection to a database is closed, SQL Server shuts down
the database and releases all resources related to the database. When a new connection is
made to the database, SQL Server starts up the database and begins allocating resources.
   By default, AUTO_CLOSE is disabled. Unless you have severe memory pressure, you should
not enable a database for AUTO_CLOSE. In addition, a database that is frequently accessed
should not be set to AUTO_CLOSE because it would cause a severe degradation in performance.
This is because you would never be able to use the data and query caches adequately.
   Data files can be set to grow automatically when additional space is needed. Although most
operations to increase space affect the database on a long-term basis, some space increases
are needed only on a temporary basis. If the AUTO_SHRINK option is enabled, SQL Server
periodically checks the space utilization of data and transaction log files. If the space checking
algorithm finds a data file that has more than 25 percent free space, the file automatically
shrinks to reclaim disk space.
    Expanding a database file is a very expensive operation. Shrinking a database file is also
an expensive operation. If the size of a database file increased during normal operations, it
is very likely that if the file shrinks, the operation would recur and increase the database file
again. The only operations that cause one-time space utilization changes to database files are
administrative processes that create and rebuild indexes, archive data, or load data. Because
the growth of database files is so expensive, it is recommended to leave the AUTO_SHRINK
option disabled and manually shrink files only when necessary.
   Statistics allow the Query Optimizer to build more efficient query plans. If the AUTO_
CREATE_STATISTICS option is enabled, SQL Server automatically creates statistics that are
missing during the optimization phase of query processing. Although the creation of statistics
incurs some overhead, the benefit to query performance is worth the overhead cost for SQL
Server to create statistics automatically when necessary.
    Statistics capture the relative distribution of values in one or more columns of a table.
After the database has been in production for a while, normal database changes do not
appreciably change the statistics distribution in general. However, mass changes to the data
or dramatic shifts in business processes can suddenly introduce significant skew into the data.
If the statistics are not updated to reflect the distribution shift, the Optimizer could select an
inefficient query plan.
   Databases have two options that allow SQL Server to update out-of-date statistics
automatically. The AUTO_UPDATE_STATISTICS option updates out-of-date statistics
during query optimization. If you choose to enable AUTO_UPDATE_STATISTICS, a second



option, AUTO_UPDATE_STATISTICS_ASYNC, controls whether statistics are updated during
          query optimization or if query optimization continues while the statistics are updated
          asynchronously.
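              Each of the auto options is set with ALTER DATABASE. The following sketch applies
           the settings generally recommended in this lesson; the database name is an assumption:

           ALTER DATABASE AdventureWorks SET AUTO_CLOSE OFF
           GO
           ALTER DATABASE AdventureWorks SET AUTO_SHRINK OFF
           GO
           ALTER DATABASE AdventureWorks SET AUTO_CREATE_STATISTICS ON
           GO
           ALTER DATABASE AdventureWorks SET AUTO_UPDATE_STATISTICS ON
           GO
           -- Update out-of-date statistics in the background rather than during optimization
           ALTER DATABASE AdventureWorks SET AUTO_UPDATE_STATISTICS_ASYNC ON
           GO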


          Change Tracking
          One of the challenges for any multiuser system is to ensure that the changes of one user do
           not accidentally overwrite the changes of another. To prevent the changes of multiple users
           from overriding each other, applications are usually built with mechanisms to determine
           whether a row has changed between the time it was read and the time it is written back to
           the database. The tracking mechanisms usually involve columns with either a datetime or
           timestamp data type and might also include an entire versioning system.
             SQL Server 2008 introduces a new feature implemented through the CHANGE_TRACKING
          database option. Change tracking is a lightweight mechanism that associates a version with
          each row in a table that has been enabled for change tracking. Each time the row is changed,
          the version number is incremented. Instead of building systems to avoid changes from multiple
          users overriding each other, applications need only compare the row version to determine if a
          change has occurred to the row between when the row was read and written.
             After change tracking has been enabled for the database, you can choose which tables
          within a database that change tracking information should be captured for. Over time,
          change tracking information accumulates in the database, so you can also specify how long
          tracking information is retained through the CHANGE_RETENTION option and whether
          tracking information should be automatically cleaned up with the AUTO_CLEANUP option.
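              As a sketch, you enable change tracking at the database level and then for each table
           of interest; the database and table names are assumptions, and the table must have a
           primary key:

           -- Enable change tracking, retaining tracking data for two days
           ALTER DATABASE AdventureWorks
           SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)
           GO

           -- Capture change tracking information for a specific table
           ALTER TABLE Sales.SalesOrderHeader
           ENABLE CHANGE_TRACKING
           GO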


          Access
          Access to a database can be controlled through several options.
              The status of a database can be explicitly set to ONLINE, OFFLINE, or EMERGENCY. When
          a database is in an ONLINE state, you can perform all operations that would otherwise be
          possible. A database that is in an OFFLINE state is inaccessible. A database in an EMERGENCY
           state can be accessed only by a member of the sysadmin role, and the only command
          allowed to be executed is SELECT.
              You can control the ability to modify data for an online database by setting the database
          to either READ_ONLY or READ_WRITE. A database in READ_ONLY mode cannot be written to.
          In addition, when a database is placed in READ_ONLY mode, SQL Server removes any transaction
          log file that is specified for the database. Changing a database from READ_ONLY to READ_WRITE
          causes SQL Server to re-create the transaction log file.
              User access to a database can be controlled through the SINGLE_USER, RESTRICTED_USER,
          and MULTI_USER options. When a database is in SINGLE_USER mode, only a single user is
          allowed to access the database. A database set to RESTRICTED_USER only allows access to
          members of the db_owner, dbcreator, and sysadmin roles.




   If multiple users are using the database when you change the mode to SINGLE_USER, or
if users are connected that conflict with the allowed set for RESTRICTED_USER, the ALTER
DATABASE command is blocked until all the nonallowed users disconnect. Instead of waiting
for users to complete operations and disconnect from the database, you can specify a
ROLLBACK action to terminate connections forcibly. The ROLLBACK IMMEDIATE option
forcibly rolls back any open transactions and disconnects any nonallowed users. You can
allow users to complete transactions and exit the database by using the ROLLBACK AFTER
<number of seconds> option, which waits for the specified number of seconds before rolling
back transactions and disconnecting users.
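   For example, the following sketch restricts access while giving open transactions
30 seconds to finish, and then returns the database to normal operation; the database name
is an assumption:

-- Allow only a single connection after rolling back stragglers
ALTER DATABASE AdventureWorks
SET SINGLE_USER WITH ROLLBACK AFTER 30
GO

-- Return the database to normal multiuser operation
ALTER DATABASE AdventureWorks
SET MULTI_USER
GO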
   The normal operational mode for most databases is ONLINE, READ_WRITE, and MULTI_USER.


Parameterization
One of the “hot button” topics in application development is whether to parameterize calls
to the database. When a database call is parameterized, the values are passed as variables.
You can find just as many articles advocating for both sides. Regardless of the debate,
applications gain a significant benefit when database calls are parameterized.
   SQL Server caches the query plan for every query that is executed. Unless there is pressure
on the query cache that forces a query plan from the cache, every query executed since
the instance started is in the query cache. When a query is executed, SQL Server parses and
compiles the query. The query is then compared to the query cache using a string-matching
algorithm. If a match is found, SQL Server retrieves the plan that has already been generated
and executes the query.
   A query that is parameterized has a much higher probability of being matched because
the query string does not change even when the values being used vary. Therefore,
parameterized queries can reuse cached query plans more frequently and avoid the time
required to build a query plan.
   Because not all applications parameterize calls to the database, you can force SQL Server
to parameterize every query for a given database by setting the PARAMETERIZATION
FORCED database option.
   The default setting for a database is not to force parameterization. The reuse of query plans
provides a benefit so long as the query plan being reused is the most efficient path through
the data. For tables where there is significant data skew, one value produces an efficient
query plan, whereas another value causes a different query plan to be created. In addition,
applications see the effect of parameterization only if the majority of database calls have an
extremely short duration.
   So long as the majority of your database calls have a very short duration and the query
plans generated do not change depending upon the parameters passed, you could see a
performance boost by forcing parameterization.
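   You enable and disable forced parameterization with ALTER DATABASE, as in the
following sketch; the database name is an assumption:

-- Force SQL Server to parameterize every query against the database
ALTER DATABASE AdventureWorks SET PARAMETERIZATION FORCED
GO

-- Return to the default behavior
ALTER DATABASE AdventureWorks SET PARAMETERIZATION SIMPLE
GO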




Collation Sequences
          SQL Server has the capability to store character data that spans every possible written language.
          However, not every language follows the same rules for sorting or data comparisons. SQL Server
          allows you to define the rules for comparison, sorting, case sensitivity, and accent sensitivity
          through the specification of a collation sequence.
             When you install SQL Server, you specify a default collation sequence that is used for all
          databases, tables, and columns. You can override the default collation sequence at each level.
          The collation sequence for an instance can be overridden at a database level by specifying
          the COLLATE clause in either the CREATE DATABASE or ALTER DATABASE command.
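              For example, the following sketch overrides the instance collation for a new database
           and then changes it; the database name and collations are purely illustrative:

           -- Create a database with a case-sensitive, accent-sensitive French collation
           CREATE DATABASE SalesFR
           COLLATE French_CS_AS
           GO

           -- Change the collation of an existing database (requires exclusive access)
           ALTER DATABASE SalesFR
           COLLATE French_CI_AS
           GO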


       Quick Check
       1. How do you restrict database access to members of the db_owner role and
          terminate all active transactions and connections at the same time?

                 2. What backups can be executed for a database in each of the recovery models?

       Quick Check Answers
       1. You would execute the following command: ALTER DATABASE <database name>
          SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE.

       2. You can create full, differential, and file/filegroup backups in the Simple recovery
          model. The Bulk-logged recovery model allows you to execute all types of backups,
          but you cannot restore a database to a point in time during an interval when a
          minimally logged transaction is executing. All types of backups can be executed
          in the Full recovery model.




           PR ACTICE        Changing the Database Recovery Model

          In this practice, you change the recovery model of the AdventureWorks database to FULL to
          ensure that you can recover from a failure to a point in time.
            1.   Execute the following code:
                 ALTER DATABASE AdventureWorks
                       SET RECOVERY FULL
                 GO

  2.   Right-click the AdventureWorks database, select Properties, and select the Options tab
       to view the recovery model and make sure that it is set to Full.




Lesson Summary
       You can set the recovery model for a database to Full, Bulk-logged, or Simple.
       You can back up transaction logs for a database in the Full or Bulk-logged recovery
       model.
       The AUTO_SHRINK option shrinks a database file when there is more than 25 percent
       of free space in the file.
       You can track and log damaged pages by enabling the PAGE_VERIFY CHECKSUM option.


Lesson Review
The following question is intended to reinforce key information presented in Lesson 2,
“Configuring Database Options.” The question is also available on the companion CD if you
prefer to review it in electronic form.


   NOTE   ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or
   incorrect are located in the "Answers" section at the end of the book.


  1.   You are the database administrator at Blue Yonder Airlines and are primarily
       responsible for the Reservations database, which runs on a server running SQL Server
       2008. In addition to customers booking flights through the company’s Web site, flights
       can be booked with several partners. Once an hour, the Reservations database receives
       multiple files from partners, which are then loaded into the database using the Bulk
       Copy Program (BCP) utility. You need to ensure that you can recover the database
       to any point in time while also maximizing the performance of import routines. How
       would you configure the database to meet business requirements?
       A. Enable AUTO_SHRINK
       B. Set PARAMETERIZATION FORCED on the database
       C. Configure the database in the Bulk-logged recovery model
       D. Configure the database in the Full recovery model




Lesson 3: Maintaining Database Integrity
          In a perfect world, everything that you save to disk storage would always write correctly, read
          correctly, and never have any problems. Unfortunately, your SQL Server databases live in an
          imperfect world where things do go wrong. Although this occurs very rarely, data within your
          database can become corrupted if there is a failure in the disk storage system as SQL Server is
          writing to a page. Data pages are 8 kilobytes (KB) in size, but SQL Server divides a page into
          16 blocks of 512 bytes apiece when performing write operations. If SQL Server begins writing
          blocks on a page and the disk system fails in the middle of the write process, only a portion
          of the page is written successfully, producing a problem called a torn page. In this lesson, you
          learn how to detect and correct corruption errors in your database.


                 After this lesson, you will be able to:
                    Check a database for integrity
                    Use DMVs to diagnose corruption issues

                 Estimated lesson time: 20 minutes



          Database Integrity Checks
          As you learned in Lesson 2, databases have an option called PAGE_VERIFY. The page
          verification can be set to either TORN_PAGE_DETECTION or CHECKSUM. The PAGE_VERIFY
          TORN_PAGE_DETECTION option exists for backwards compatibility and should not be used.
          When the PAGE_VERIFY CHECKSUM option is enabled, SQL Server calculates a checksum for
          the page prior to the write. Each time a page is read off disk, a checksum is recalculated and
          compared to the checksum written to the page. If the checksums do not match, the page has
          been corrupted.
             When SQL Server encounters a corrupt page, an error is thrown, the command attempting
          to access the corrupt page is aborted, and an entry is written into the suspect_pages table in
          the msdb database.


   BEST PRACTICES   PAGE VERIFICATION
   You should enable the PAGE_VERIFY CHECKSUM option on every production database.


             Although page verification can detect and log corrupted pages, the page must be read
          off disk to trigger the verification check. Data is normally read off disk when users and
          applications access data, but instead of having a user receive an error message, it is much
          better for you to proactively find corruption and fix the problem by using a backup before
          the user has a process aborted.




You can force SQL Server to read every page from disk and check the integrity by executing
the DBCC CHECKDB command. The generic syntax of DBCC CHECKDB is:

DBCC CHECKDB [( 'database_name' | database_id | 0
    [ , NOINDEX | { REPAIR_ALLOW_DATA_LOSS | REPAIR_FAST
    | REPAIR_REBUILD } ] )]
    [ WITH {[ ALL_ERRORMSGS ] [ , [ NO_INFOMSGS ] ] [ , [ TABLOCK ] ]
            [ , [ ESTIMATEONLY ] ] [ , [ PHYSICAL_ONLY ] ] | [ , [ DATA_PURITY ] ] } ]

   When DBCC CHECKDB is executed, SQL Server performs all the following actions:
       Checks page allocation within the database
       Checks the structural integrity of all tables and indexed views
       Calculates a checksum for every data and index page to compare against the stored
       checksum
       Validates the contents of every indexed view
       Checks the database catalog
       Validates Service Broker data within the database
   To accomplish these checks, DBCC CHECKDB executes the following commands:
       DBCC CHECKALLOC, to check the page allocation of the database
       DBCC CHECKCATALOG, to check the database catalog
       DBCC CHECKTABLE, for each table and view in the database to check the structural
       integrity
   Any errors encountered are output so that you can fix the problems. If an integrity error is
found in an index, you should drop and re-create the index. If an integrity error is found in a
table, you need to use your most recent backups to repair the damaged pages.
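   If you need to narrow a check to a single object rather than the entire database, you can
run the underlying command yourself, as in this sketch; the table name is an assumption:

-- Check the structural integrity of one table and its indexes
DBCC CHECKTABLE ('Sales.SalesOrderHeader') WITH NO_INFOMSGS, ALL_ERRORMSGS
GO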


   NOTE   DATABASE MIRRORING
   If the database is participating in Database Mirroring, SQL Server attempts to retrieve a
   copy of the page from the mirror. If the page can be retrieved from the mirror and has
   the correct page contents, the page is replaced automatically on the principal without
   requiring any intervention. When SQL Server replaces a corrupt page from the mirror, an
   entry is written into the sys.dm_db_mirroring_auto_page_repair view.




       Quick Check
       1. Which option should be enabled for all production databases?
       2. What checks does DBCC CHECKDB perform?




       Quick Check Answers
       1. You should set the PAGE_VERIFY CHECKSUM option for all production databases.

                 2. DBCC CHECKDB checks the logical and physical integrity of every table, index,
                      and indexed view within the database, along with the contents of every indexed
                      view, page allocations, Service Broker data, and database catalog.




           PR ACTICE        Checking Database Integrity

          In this practice, you check the integrity of the AdventureWorks database.
            1.   Execute the following code:

                 DBCC CHECKDB ('AdventureWorks') WITH NO_INFOMSGS, ALL_ERRORMSGS
                 GO

            2.   Review the results.


          Lesson Summary
                 The PAGE_VERIFY CHECKSUM option should be enabled for every production database
                 to detect any structural integrity errors.
                 When a corrupt page is encountered, the page is logged to the suspect_pages table in
                 the msdb database. If a database is participating in a Database Mirroring session, SQL
                 Server automatically retrieves a copy of the page from the mirror, replaces the page on
                 the principal, and logs an entry in the sys.dm_db_mirroring_auto_page_repair view.
                 DBCC CHECKDB is used to check the logical and physical consistency of a database.


          Lesson Review
          The following question is intended to reinforce key information presented in Lesson 3,
          “Maintaining Database Integrity.” The question is also available on the companion CD if you
          prefer to review it in electronic form.

             NOTE
                E     ANSWERS
             Answers to this question and an explanation of why each answer choice is correct or
             incorrect is located in the “Answers” section at the end of the book.


            1.   Which commands are executed when you run the DBCC CHECKDB command? (Check
                 all that apply.)
                 A. DBCC CHECKTABLE
                 B. DBCC CHECKIDENT
                 C. DBCC CHECKCATALOG
                 D. DBCC FREEPROCCACHE

Chapter Review
To practice and reinforce the skills you learned in this chapter further, you can perform the
following tasks:
       Review the chapter summary.
       Review the list of key terms introduced in this chapter.
       Complete the case scenario. This scenario sets up a real-world situation involving the
       topics in this chapter and asks you to create a solution.
       Complete the suggested practices.
       Take a practice test.


Chapter Summary
       Databases can be configured with the Full, Bulk-logged, or Simple recovery model.
       The recovery model of the database determines the backups that can be created, as
       well as limitations on the recovery options that can be performed.
       You can set a collation sequence for a database that overrides the collation sequence
       defined for the instance.


Key Terms
Do you know what these key terms mean? You can check your answers by looking up the
terms in the glossary at the end of the book.
       Corrupt page
       Filegroup
       Recovery model


Case Scenario
In the following case scenario, you apply what you’ve learned in this chapter. You can find
answers to these questions in the “Answers” section at the end of this book.

Case Scenario: Configuring Databases for Coho Vineyard

BACKGROUND

Company Overview
Coho Vineyard was founded in 1947 as a local, family-run winery. Due to the award-winning
wines it has produced over the last several decades, Coho Vineyards has experienced
significant growth. To continue expanding, several existing wineries were acquired over the
years. Today, the company owns 16 wineries; 9 wineries are in Washington, Oregon, and
California, and the remaining 7 wineries are located in Wisconsin and Michigan. The wineries


employ 532 people, 162 of whom work in the central office that houses servers critical to the
        business. The company has 122 salespeople who travel around the world and need access to
        up-to-date inventory availability.

        Planned Changes
        Until now, each of the 16 wineries owned by Coho Vineyard has run a separate Web site locally
        on the premises. Coho Vineyard wants to consolidate the Web presence of these wineries so
that Web visitors can purchase products from all 16 wineries from a single online store. All data
associated with this Web site will be stored in databases in the central office.
            When the data is consolidated at the central office, merge replication will be used to
        deliver data to the salespeople as well as to allow them to enter orders. To meet the needs of
        the salespeople until the consolidation project is completed, inventory data at each winery is
        sent to the central office at the end of each day. Merge replication has been implemented to
        allow salespeople to maintain local copies of customer, inventory, and order data.

        EXISTING DATA ENVIRONMENT

        Databases
        Each winery presently maintains its own database to store all business information. At the
        end of each month, this information is brought to the central office and transferred into the
        databases shown in Table 2-2.

        TABLE 2-2 Coho Vineyard Databases

         DATABASE                SIZE

         Customer                180 megabytes (MB)
         Accounting              500 MB
         HR                      100 MB
         Inventory               250 MB
         Promotions              80 MB


           After the database consolidation project is complete, a new database named Order will
        serve as a data store to the new Web store. As part of their daily work, employees also will
        connect periodically to the Order database using a new in-house Web application.
          The HR database contains sensitive data and is protected using Transparent Data
        Encryption (TDE). In addition, data in the Salary table is encrypted using a certificate.

        Database Servers
        A single server named DB1 contains all the databases at the central office. DB1 is running SQL
        Server 2008 Enterprise on Windows Server 2003 Enterprise.




Business Requirements
You need to design an archiving solution for the Customer and Order databases. Your archival
strategy should allow the Customer data to be saved for six years.
   To prepare the Order database for archiving procedures, you create a partitioned table
named Order.Sales. Order.Sales includes two partitions. Partition 1 includes sales activity
for the current month. Partition 2 is used to store sales activity for the previous month.
Orders placed before the previous month should be moved to another partitioned table
named Order.Archive. Partition 1 of Order.Archive includes all archived data. Partition 2
remains empty.
   A process needs to be created to load the inventory data from each of the 16 wineries by
4 A.M. daily.
   Four large customers submit orders using Coho Vineyards Extensible Markup Language
(XML) schema for Electronic Data Interchange (EDI) transactions. The EDI files arrive by 5 P.M.
and need to be parsed and loaded into the Customer, Accounting, and Inventory databases,
which each contain tables relevant to placing an order. The EDI import routine is currently a
single-threaded C++ application that takes between three and six hours to process the files.
You need to finish the EDI process by 5:30 P.M. to meet your Service Level Agreement (SLA)
with the customers. After the consolidation project has finished, the EDI routine loads all data
into the new Order database.
   You need to back up all databases at all locations. You can lose a maximum of five minutes
of data under a worst-case scenario. The Customer, Account, Inventory, Promotions, and Order
databases can be off-line for a maximum of 20 minutes in the event of a disaster. Data older
than six months in the Customer and Order databases can be off-line for up to 12 hours in the
event of a disaster.
   Answer the following questions.
  1.   How should you configure the databases for maximum performance?
  2.   How should the databases be configured to meet recovery obligations?



Suggested Practices
To help you master the exam objectives presented in this chapter, complete the following
tasks.


Configuring Databases
        Practice 1   Create a database that can store FILESTREAM data.
       Practice 2 Change the recovery model and observe the effects on backup and
       restore options.




Practice 3 Change the database state to READ_ONLY and observe the effect on the
                 transaction log file.
                 Practice 4  Create multiple connections to a database, change the access to RESTRICTED_
                 USER, and specify the ROLLBACK IMMEDIATE option. Observe the effects.



        Take a Practice Test
        The practice tests on this book’s companion CD offer many options. For example, you can test
        yourself on just one exam objective, or you can test yourself on all the 70-432 certification
        exam content. You can set up the test so that it closely simulates the experience of taking
        a certification exam, or you can set it up in study mode so that you can look at the correct
        answers and explanations after you answer each question.


           MORE INFO          PRACTICE TESTS
           For details about all the practice test options available, see the section entitled “How to
           Use the Practice Tests,” in the Introduction to this book.




CHAPTER 3


Tables
Tables form the core of your databases by defining the structure that is used to store your
data. A database without tables would have very little use within a business application.
In this chapter, you learn how to create efficient tables that can perform well under a variety
of conditions while also enforcing the rules of your business.


Exam objective in this chapter:
    Implement data compression

Lessons in this chapter:
    Lesson 1: Creating Tables

    Lesson 2: Implementing Constraints



Before You Begin
To complete the lessons in this chapter, you must have
       An instance of SQL Server 2008 installed
       The AdventureWorks sample database loaded in your instance



       REAL WORLD
       Michael Hotek



       Almost 20 years ago, I started working with SQL Server. Of course, back then
       it was Sybase SQL Server, which became the basis of the first version of
       Microsoft SQL Server. During that time, I’ve dealt with millions of databases across
       thousands of companies. I’ve also taught SQL Server to over 50,000 people. During
       that time, I’ve been amazed at the complicated lengths many people go to design
       a database or teach someone how to design a database.

       Interestingly enough, it turns out that regardless of whether you are designing
       a relational database or a data warehouse, the entirety of the field of database
       design can be found in a single statement—“Put stuff where it belongs.” Yes, those


hundreds of thousands of pages that have been published about relational database
                 or data warehouse design can really be encompassed in a single sentence, coupled
                 with the logic that we all possess by the time we are asked to design a database.

                 If you designed all your databases around this single sentence, not only would you
                 need very little time to design a database, but you would also produce the database
                 that best met your company’s needs. When you design a database, you aren’t actually
                 designing the entire database in one step. You are designing one table at a time for
                 the data that you need to store.

                 Tables follow some very basic rules—columns define a group of data that you need
                 to store, and you add one row to the table for each unique group of information.
                 The columns that you define represent the distinct pieces of information that you
                 need to work with inside your database, such as a city, product name, first name,
                 last name, or price.

                 If you were to design a database to store orders placed by customers for products,
                 you have already defined three core tables—customers, orders, and products. For a
                 customer to place an order, you want to know who the customer is. So your customer
                 table would have one or more columns for a name, depending upon whether you
                 wanted to work with first name separately from last name and if you wanted to store
                 an honorific such as Rev., Mr., or Mrs.

                 If your customers are placing orders, you probably want to ship the orders to the
                 customers, so you would have one or more columns to store the address. You could
                 validly place the column(s) for the customer’s address into the customer table. This
                 structure would work well if you allowed only a single address for a customer. If your
                 customers wanted more flexibility to store multiple addresses, you would create a
                 new table to store the addresses and then link addresses back to the customers. The
                 customer address table would be created because you can add an unlimited number
                 of rows to a table, whereas the number of columns is finite.

                 If you were to continue the process, you would have quickly defined dozens or even
                 hundreds of tables that would allow you to store the data required by your business
                 application. You would have also created your database without ever having to think
                 about first, second, or third normal forms, star schemas, snowflake schemas, or any
                 other type of database construct. You would have created your database by “putting
       stuff where it belongs.” Then, after you have defined the database structure, all that is
       left to do is determine what type of data each column stores and whether the column is
       required, and you have a database that can be used by an application.




Lesson 1: Creating Tables
Tables form the most granular building blocks of applications, defining the structure of the
data that can be stored. When designing tables, your task is to create the tables that can
store the data required by your business applications, while at the same time minimizing the
amount of disk and memory being used. In this lesson, you learn about the trade-offs that
you need to make in the definition of a table to handle your business data while minimizing
the resources consumed by the data.


      After this lesson, you will be able to:
          Create schemas
          Select appropriate data types
          Apply column properties to enforce business requirements
          Add computations to a table
          Define storage properties that reduce the amount of space consumed by a row
          or page

      Estimated lesson time: 40 minutes


Schemas
In addition to being a security structure (which you learn more about in Chapter 11, “SQL
Server Security”), a schema provides a convenient mechanism to group objects together
within a database. A schema is also the container that owns all objects within a database.
   You manage each database that is created within an instance separately in terms of disk
consumption, transactions, and memory resources. If your application currently accesses
multiple databases or you are creating an application with multiple databases that do not
need to be stored on separate instances for increased capacity, you should combine the
objects into a single database and use schemas to separate groups of objects.
   The simplest syntax to create a schema is:

CREATE SCHEMA <schema name> AUTHORIZATION <owner name>
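
   For example, the following statement creates a schema named Sales that is owned by the
dbo principal (the schema and owner names here are only illustrative):

CREATE SCHEMA Sales AUTHORIZATION dbo
GO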



   NOTE   CREATING SCHEMAS
   The CREATE SCHEMA statement supports the creation of a schema along with the creation
   of tables and views and the assignment of permissions in a single statement. Creating code
   within SQL Server is not an obfuscation exercise, nor is it an exercise in trying to figure out
   the fewest statements you can construct to achieve your goals. Someone else usually has to
   maintain your code, and taking a few extra steps to create a more maintainable script
   is advisable. Therefore, it is recommended that you do not create tables and views or assign
   permissions within a CREATE SCHEMA statement. Any CREATE SCHEMA statement that is
   executed must be in a separate batch.


Data Types
          Although not typically referred to as constraints, the data type for a column is the most
          fundamental constraint that you can specify for a table. Your choice of data type restricts the
          range of possible values while defining the maximum amount of space that will be consumed
          for the column within a row.
              The choice of data type is also the most fundamental performance decision you will ever
          make for a database. You need to select a data type that can store the data required by the
          business, but your data type should not consume a single byte of storage more than necessary.
          Although it might seem strange to worry about something that sounds as trivial as 1 byte, when
          you have millions or billions of rows of data in a table, a single wasted byte per row adds up to
          a significant amount of disk storage. More importantly, each wasted byte also wastes your most
          precious commodity: memory on the server, because all data must pass through memory before
          an application can use the data.

          Numeric Data Types
          Nine numeric data types ship with SQL Server 2008, and they are used to store integer,
          monetary, and decimal-based numbers. Table 3-1 lists the numeric data types available, along
          with the range of values and storage space required for each.

          TABLE 3-1 Numeric Data Types

 DATA TYPE               RANGE OF VALUES                              STORAGE SPACE

 TINYINT                 0 to 255                                     1 byte
 SMALLINT                -32,768 to 32,767                            2 bytes
 INT                     -2^31 to 2^31 - 1                            4 bytes
 BIGINT                  -2^63 to 2^63 - 1                            8 bytes
 DECIMAL(P,S) and        -10^38 + 1 to 10^38 - 1                      5 to 17 bytes
 NUMERIC(P,S)
 SMALLMONEY              -214,748.3648 to 214,748.3647                4 bytes
 MONEY                   -922,337,203,685,477.5808 to                 8 bytes
                         922,337,203,685,477.5807
 REAL                    -3.40E+38 to -1.18E-38, 0, and               4 bytes
                         1.18E-38 to 3.40E+38
 FLOAT(N)                -1.79E+308 to -2.23E-308, 0, and             4 bytes or 8 bytes
                         2.23E-308 to 1.79E+308



   NOTE   NUMERIC AND DECIMAL DATA TYPES
   The data types NUMERIC and DECIMAL are exactly equivalent. Both data types still exist
   within SQL Server for backwards compatibility purposes.




The MONEY and SMALLMONEY data types are designed specifically to store monetary
values with a maximum of four decimal places.
   The FLOAT data type takes an optional parameter, N, which is the number of bits used
to store the mantissa. If N is defined between 1 and 24, then
a FLOAT consumes 4 bytes of storage. If N is defined between 25 and 53, then a
FLOAT consumes 8 bytes of storage.


   NOTE   NUMERIC PRECISION
   FLOAT and REAL data types are classified as approximate numerics, or floating point numbers.
   The value stored within a FLOAT or REAL column depends upon the processor architecture that
   is used. Moving a database from a server with an Intel chipset to one with an AMD chipset, or
   vice versa, can produce different results in these columns. If you are utilizing FLOAT and REAL
   due to the range of values supported, you must account for compounding error factors in any
   calculation that you perform.



Decimal Data Types
Decimal data types have two parameters—precision and scale. The precision indicates the
total number of digits that can be stored both to the left and to the right of the decimal.
The scale indicates the maximum number of digits to the right of the decimal point. For
example, assigning a column the DECIMAL(8,3) data type allows SQL Server to store a total of
eight digits in the column, with three of the digits to the right of the decimal point or values
between -99999.999 and 99999.999.
   The storage space consumed by a decimal data type depends on the defined precision, as
shown in Table 3-2.

TABLE 3-2 Decimal and Numeric Data Type Storage

 PRECISION                            STORAGE SPACE

 1 to 9                               5 bytes
 10 to 19                             9 bytes
 20 to 28                             13 bytes
 29 to 38                             17 bytes
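
   The following sketch shows precision and scale in action with the DECIMAL(8,3) example
above (the variable name is illustrative):

DECLARE @Price DECIMAL(8,3)
SET @Price = 99999.999      -- the maximum value for DECIMAL(8,3)
SELECT @Price
-- SET @Price = 999999.99 would fail with an arithmetic overflow error,
-- because six digits to the left of the decimal exceed the five allowed
GO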



Character Data Types
SQL Server 2008 has four data types for storing character data, with the choice of which one
to use depending upon whether you have fixed- or variable-length values and whether you
want to store Unicode or non-Unicode data. Table 3-3 shows the storage space consumed by
character data types.




TABLE 3-3 Character Data Types

            DATA TYPE         STORAGE SPACE

            CHAR(n)           Non-Unicode, 1 byte per character defined by n, up to a maximum of
                              8,000 bytes
            VARCHAR(n)        Non-Unicode, 1 byte per character stored up to a maximum of
                              8,000 bytes
            NCHAR(n)          Unicode, 2 bytes per character defined by n, up to a maximum of
                              4,000 bytes
            NVARCHAR(n)       Unicode, 2 bytes per character stored up to a maximum of 4,000 bytes

             You can substitute the number of characters with the keyword MAX, such as VARCHAR(MAX).
          A VARCHAR(MAX) or NVARCHAR(MAX) data type allows you to store up to 2 gigabytes (GB)
          of data.

          Date and Time Data
          One of the biggest recent advances in SQL Server greatly expands the data types to store
          dates and times, as shown in Table 3-4.

          TABLE 3-4 Date and Time Data Types

            DATA TYPE             RANGE OF VALUES              ACCURACY             STORAGE SPACE

            SMALLDATETIME         01/01/1900 to 06/06/2079     1 minute             4 bytes
            DATETIME              01/01/1753 to 12/31/9999     0.00333 seconds      8 bytes
            DATETIME2             01/01/0001 to 12/31/9999     100 nanoseconds      6 to 8 bytes
            DATETIMEOFFSET        01/01/0001 to 12/31/9999     100 nanoseconds      8 to 10 bytes
            DATE                  01/01/0001 to 12/31/9999     1 day                3 bytes
            TIME                  00:00:00.0000000 to          100 nanoseconds      3 to 5 bytes
                                  23:59:59.9999999

              SMALLDATETIME and DATETIME data types store a date and a time together as a single
          value and have existed for several versions of SQL Server. The range of values stored for
          a DATETIME data type was rather limited for historical applications, so SQL Server 2008
          introduced a DATETIME2 data type that provides better precision than either SMALLDATETIME
          or DATETIME, along with a much larger range of values.
   The DATETIMEOFFSET data type allows you to store a time zone offset for applications that
need to localize dates and times.
    The most sought-after data type additions are DATE and TIME. You can now store data
as either just a date or just a time, thereby eliminating many of the parsing and comparison
issues that developers faced in previous versions of SQL Server.
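
   As a quick sketch of the new types, the following batch stores just a date, just a time,
and a date and time with a time zone offset (the variable names and values are illustrative):

DECLARE @OrderDate DATE = '2008-07-15'
DECLARE @OrderTime TIME = '14:30:00.1234567'
DECLARE @OrderMoment DATETIMEOFFSET = '2008-07-15 14:30:00.1234567 -05:00'
SELECT @OrderDate, @OrderTime, @OrderMoment
GO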



Binary Data
Binary data is stored in a set of three data types, which are listed in Table 3-5.

TABLE 3-5 Binary Data Types

 DATA TYPE                    RANGE OF VALUES                    STORAGE SPACE

 BIT                          Null, 0, and 1                     1 bit
 BINARY                       Fixed-length binary data           Up to 8,000 bytes
 VARBINARY                    Variable-length binary data        Up to 8,000 bytes

   Similar to the variable-length character data types, you can apply the MAX keyword to
the VARBINARY data type to allow the storage of up to 2 GB of data while supporting all the
programming functions available for manipulating binary data.

XML Data Type
The XML data type allows you to store and manipulate Extensible Markup Language (XML)
documents natively. When storing XML documents, you are limited to a maximum of 2 GB,
as well as a maximum of 128 levels within a document. Although you could store an XML
document in a character column, the XML data type natively understands the structure of
XML data and the meaning of XML tags within the document.
   Because the XML data type natively understands an XML structure, you can apply
additional validation to the XML column, which restricts the documents that can be stored
based on one or more XML schemas.
    XML schemas are stored within SQL Server in a structure called a schema collection. Schema
collections can contain one or more XML schemas. When a schema collection is applied to
an XML column, the only documents allowed to be stored within the XML column must first
validate to the associated XML schema collection.
   The following command creates an XML schema collection:

CREATE XML SCHEMA COLLECTION ProductAttributes AS
'<xsd:schema xmlns:schema="PowerTools" xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes"
 targetNamespace="PowerTools" elementFormDefault="qualified">


  <xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes"
schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" />


 <xsd:element name="dbo.PowerTools">
    <xsd:complexType>
       <xsd:sequence>
         <xsd:element name="Category">
           <xsd:simpleType>
              <xsd:restriction base="sqltypes:varchar" sqltypes:localeId="1033"



                                                                         Lesson 1: Creating Tables   CHAPTER 3   67
sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType
                                IgnoreWidth" sqltypes:sqlSortId="52">
                              <xsd:maxLength value="30" />
                            </xsd:restriction>
                       </xsd:simpleType>
                     </xsd:element>
                     <xsd:element name="Amperage">
                       <xsd:simpleType>
                            <xsd:restriction base="sqltypes:decimal">
                              <xsd:totalDigits value="3" />
                              <xsd:fractionDigits value="1" />
                            </xsd:restriction>
                       </xsd:simpleType>
                     </xsd:element>
                     <xsd:element name="Voltage">
                       <xsd:simpleType>
                            <xsd:restriction base="sqltypes:char" sqltypes:localeId="1033"
                                 sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType
                                 IgnoreWidth" sqltypes:sqlSortId="52">
                              <xsd:maxLength value="7" />
                            </xsd:restriction>
                       </xsd:simpleType>
                     </xsd:element>
                   </xsd:sequence>
                 </xsd:complexType>
            </xsd:element>
          </xsd:schema>'
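
   After the schema collection exists, you can associate it with an XML column so that only
documents that validate against the collection can be stored. A minimal sketch, assuming the
ProductAttributes collection created above (the table and column names are illustrative):

CREATE TABLE dbo.ProductDocument
(ProductDocumentID    INT    IDENTITY(1,1),
Attributes            XML (ProductAttributes))
GO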



          Spatial Data Types
          SQL Server 2008 supports two data types to store spatial data: GEOMETRY and GEOGRAPHY.
          Both spatial data types are implemented by using the Common Language Runtime (CLR)
          capabilities that were introduced in SQL Server 2005. Geometric data is based on Euclidian
          geometry and is used to store points, lines, curves, and polygons. Geographic data is based
          on an ellipsoid and is used to store data such as latitudes and longitudes.
             You define spatial columns in a table using either the GEOMETRY or GEOGRAPHY data
          types. When values are stored in a spatial column, you have to create an instance using one
          of several spatial functions specific to the type of data being stored. A GEOMETRY column
          can contain one of seven different geometric objects with each coordinate in the definition
          separated by a space, as shown in Table 3-6.
             The Multi* instances define multiple geometric shapes within a single instance. The
          GeometryCollection allows multiple shapes to be combined into a single column to represent a
          complex shape. When the object is instantiated by storing the object within a column defined
          as either GEOMETRY or GEOGRAPHY, the data and the definition of the object instance are
          stored within the column. Because the type of object and the coordinate data values are
          inseparable, it is possible to store multiple different types of objects in a single column.

TABLE 3-6 Geometry Data Type Definitions

 INSTANCE                 DESCRIPTION

 Point                    Has x and y coordinates, with optional elevation and measure
                          values.
 LineString               A series of points that defines the start, end, and any bends in the
                          line, with optional elevation and measure values.
 Polygon                  A surface defined as a sequence of points that defines an exterior
                          boundary, along with zero or more interior rings. A polygon has at
                          least three distinct points.
 GeometryCollection       Contains one or more instances of other geometry shapes, such as
                          a Point and a LineString.
 MultiPolygon             Contains the coordinates for multiple Polygons.
 MultiLineString          Contains the coordinates of multiple LineStrings.
 MultiPoint               Contains the coordinates of multiple Points.

   Geographic data is stored as latitude and longitude points. The only restriction on geographic
data is that an instance, as well as any comparison, cannot span more than a single hemisphere.
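
   As a sketch of instantiating spatial values, you might create a point with the
STGeomFromText() method and store it in a GEOMETRY column (the table, column, and
coordinate values are illustrative):

CREATE TABLE dbo.Location
(LocationID    INT    IDENTITY(1,1),
Position       GEOMETRY)
GO

INSERT INTO dbo.Location (Position)
VALUES (geometry::STGeomFromText('POINT(3 4)', 0))    -- 0 is the spatial reference ID
GO

   Geographic data works the same way, except that the well-known text coordinates are
longitude and latitude, for example geography::STGeomFromText('POINT(-122.34 47.65)', 4326).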

HIERARCHYID Data Type
The HIERARCHYID data type is used to organize hierarchical data, such as organization
charts, bills of materials, and flowcharts. The HIERARCHYID stores a position within a tree
hierarchy. By employing a HIERARCHYID, you can quickly locate nodes within a hierarchy as
well as move data between nodes within the structure.
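
   A brief sketch of working with HIERARCHYID methods (the variable names are illustrative):

DECLARE @Root HIERARCHYID = hierarchyid::GetRoot()
-- GetDescendant(NULL, NULL) generates the first child of a node
DECLARE @FirstChild HIERARCHYID = @Root.GetDescendant(NULL, NULL)
SELECT @Root.ToString(), @FirstChild.ToString()    -- returns / and /1/
GO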


Column Properties
The seven properties that you can apply to a column are: nullability, COLLATE, IDENTITY,
ROWGUIDCOL, FILESTREAM, NOT FOR REPLICATION, and SPARSE.

Nullability
You can specify whether a column allows nulls by specifying NULL or NOT NULL for the column
properties. Just as with every command you execute, you should always specify explicitly
each option that you want, especially when you are creating objects. If you do not specify the
nullability option, SQL Server uses the default option when creating a table, which could produce
unexpected results. In addition, the default option is not guaranteed to be the same for each
database because you can modify this by changing the ANSI_NULL_DEFAULT database property.

COLLATE
Collation sequences control the way characters in various languages are handled. When you
install an instance of SQL Server, you specify the default collation sequence for the instance. You
can set the COLLATE property of a database to override the instance collation sequence, which

SQL Server then applies as the default collation sequence for objects within the database. Just
          as you can override the default collation sequence at a database level, you can also override the
          collation sequence for an entire table or an individual column.
             By specifying the COLLATE option for a character-based column, you can set
          language-specific behavior for the column.

          IDENTITY
Identities are used to provide a value for a column automatically when data is inserted. You
cannot update a column with the identity property. Columns with any numeric data type,
except FLOAT and REAL, can accept an identity property. When you define an identity, you
specify a seed value and an increment to be applied for each subsequently inserted row. You
can have only a single identity column in a table.
             Identity columns frequently are unique, but they do not have to be. To make an identity
          column unique, you must apply a constraint to the column, which you will learn about in
          Lesson 2, “Implementing Constraints.”
             Although SQL Server automatically provides the next value in the sequence, you can insert
          a value into an identity column explicitly by using the SET IDENTITY_INSERT <table name>
          ON command. You can also change the next value generated by modifying the seed using
          the DBCC CHECKIDENT command.
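
   A minimal sketch of both techniques against a hypothetical dbo.Customer table with an
identity column named CustomerID:

SET IDENTITY_INSERT dbo.Customer ON
-- An explicit column list is required while IDENTITY_INSERT is ON
INSERT INTO dbo.Customer (CustomerID, LastName)
VALUES (1000, 'Smith')
SET IDENTITY_INSERT dbo.Customer OFF
GO

-- Change the seed so that the next generated value continues from 2000
DBCC CHECKIDENT ('dbo.Customer', RESEED, 2000)
GO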

          ROWGUIDCOL
          The ROWGUIDCOL property is used mainly by merge replication to designate a column that is
          used to identify rows uniquely across databases. The ROWGUIDCOL property is used to ensure
          that only a single column of this type exists and that the column has a UNIQUEIDENTIFIER
          data type.

          FILESTREAM
          Databases are designed to store well-structured, discrete data. As the variety of data within
          an organization expands, organizations need to be able to consolidate data of all formats
          within a single storage architecture. SQL Server has the ability to store all the various data
          within an organization, the majority of which exist as documents, spreadsheets, and other
          types of files.
             Prior to SQL Server 2008, you had to extract the contents of a file to store it in a
          VARBINARY(MAX), VARCHAR(MAX), or NVARCHAR(MAX) data type. However, you were
          limited to storing only 2 GB of data within a large data type. To work around this restriction,
          many organizations stored the filename within SQL Server and maintained the file on the
          operating system. The main issue with storing the file outside the database is that it was very
          easy to move, delete, or rename a file without making a corresponding update to the database.
            SQL Server 2008 introduces a new property for a column called FILESTREAM. FILESTREAM
          combines the best of both worlds. Binary large objects (BLOBs) stored in a FILESTREAM column



are controlled and maintained by SQL Server; however, the data resides in a file on the operating
system. By storing the data on the file system outside of the database, you are no longer
restricted to the 2-GB limit on BLOBs. In addition, when you back up the database, all the files
are backed up at the same time, ensuring that the state of each file remains synchronized with
the database.
   You apply the FILESTREAM property to columns with a VARBINARY(MAX) data type. The
column within the table maintains a 16-byte identifier for the file. SQL Server manages the
access to the files stored on the operating system.


   EXAM TIP
   A FILEGROUP designated for FILESTREAM storage is off-line and inaccessible within
   a Database Snapshot. In addition, you cannot implement Database Mirroring against a
   database containing data stored with FILESTREAM.



NOT FOR REPLICATION
The NOT FOR REPLICATION option is used for a column that is defined with the IDENTITY
property. When you define an identity, you specify the starting value (the seed) and an increment
to be applied to generate the next value. If you explicitly insert a value into an identity column,
SQL Server automatically reseeds the column. If the table is participating in replication, you
do not want to reseed the identity column each time data is synchronized. By applying the
NOT FOR REPLICATION option, SQL Server does not reseed the identity column when the
replication engine is applying changes.

SPARSE
Designed to optimize storage space for columns with a large percentage of NULLs, the option
to designate a column as sparse is new in SQL Server 2008. To designate a column as SPARSE,
the column must allow NULLs. When a NULL is stored in a column designated as SPARSE,
no storage space is consumed. However, non-NULL values require 4 bytes of storage space
in addition to the normal space consumed by the data type. Unless you have a high enough
percentage of rows containing a NULL to offset the increased storage required for non-NULL
values, you should not designate a column as SPARSE.
   You cannot apply the SPARSE property to
       Columns with the ROWGUIDCOL or IDENTITY property
       TEXT, NTEXT, IMAGE, TIMESTAMP, GEOMETRY, GEOGRAPHY, or user-defined data types
       A VARBINARY(MAX) with the FILESTREAM property
       A computed column of a column with a rule or default bound to it
       Columns that are part of either a clustered index or a primary key
       A column within an ALTER TABLE statement




   NOTE   ROW SIZE LIMITATION
   If the maximum size of a row in your table exceeds 4,009 bytes, you cannot issue an ALTER
   statement to either change a column to SPARSE or add an additional SPARSE column.
   During the ALTER, each row is recomputed by writing a second copy of the row on the
   same data page. Because two copies of a row that exceed 4,009 bytes would exceed the
   8,018 bytes allowed per page, the ALTER TABLE statement fails.
                 The only workarounds to this storage design issue are the following:
      Reduce the data within a row so that the maximum row size is less than
      4,009 bytes
                    Create a new table, copy all the data to the new table, drop the old table, and
                    then rename the newly created table
                    Export the data, truncate the existing table, make the changes, and import the data
                    back into the table


          Computed Columns
Computed columns allow you to add columns to a table that, instead of being populated
with data, are calculated based on other columns in the row. For example, you might have a
          subtotal and shipping amount in your table that would allow you to create a calculated column
          for the grand total which automatically changes if the subtotal or shipping amount changes.
              When you create a computed column, only the definition of the calculation is stored. If you
          use the computed column within any data manipulation language (DML) statement, the value
          is calculated at the time of execution. If you do not want to incur the overhead of making
          the calculation at runtime, you can specify the PERSISTED property. If a computed column is
          PERSISTED, SQL Server stores the result of the calculation in the row and updates the value
          anytime data that the calculation relies upon is changed.
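
   For example, a persisted grand total might look like the following sketch (the table and
column names are illustrative):

CREATE TABLE dbo.OrderSummary
(SubTotal       MONEY    NOT NULL,
ShippingAmt     MONEY    NOT NULL,
GrandTotal      AS (SubTotal + ShippingAmt) PERSISTED)
GO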


          Row and Page Compression
          SQL Server 2008 now allows you to compress rows and pages for tables that do not have a
          SPARSE column, as well as for indexes and indexed views.
             Row-level compression allows you to compress individual rows to fit more rows on a page,
          which in turn reduces the amount of storage space for the table because you don’t need to
          store as many pages on a disk. Because you can uncompress the data at any time and the
          uncompress operation must always succeed, you cannot use compression to store more than
          8,060 bytes in a single row.
              Page compression reduces only the amount of disk storage required because the entire page
          is compressed. When SQL Server applies page compression to a heap (a table without a clustered
          index), it compresses only the pages that currently exist in the table. SQL Server compresses
new data added to a heap only if you use the BULK INSERT or INSERT INTO. . .WITH (TABLOCK)
statements. Pages that are added to the table using either BCP or an INSERT that does not specify


a table lock hint are not compressed. To compress any newly added, uncompressed pages, you
need to execute an ALTER TABLE. . .REBUILD statement with the PAGE compression option.
   The compression setting for a table does not pass to any nonclustered indexes or indexed
views created against the table. You need to specify compression for each nonclustered index or
indexed view that you want to be compressed. If the table is partitioned, which you learn about
in Chapter 6, “Distributing and Partitioning Data,” you can apply compression at a partition level.
   VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) store data in specialized structures
outside the row. In addition, VARBINARY(MAX) with the FILESTREAM option stores documents in
a directory external to the database. Any data stored outside the row cannot be compressed.
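
   For example, the following sketch applies page compression to an existing heap and
recompresses any pages that were added without a table lock hint (the table name is
illustrative):

ALTER TABLE dbo.OrderHistory
    REBUILD WITH (DATA_COMPRESSION = PAGE)
GO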


Creating Tables
A portion of the general syntax for creating a table is

CREATE TABLE
    [ database_name . [ schema_name ] . | schema_name . ] table_name
         ( { <column_definition> | <computed_column_definition>
                  | <column_set_definition> }
         [ <table_constraint> ] [ ,...n ] )
    [ ON { partition_scheme_name ( partition_column_name ) | filegroup
         | "default" } ]
    [ TEXTIMAGE_ON { filegroup | "default" } ]
    [ FILESTREAM_ON { partition_scheme_name | filegroup
         | "default" } ]
    [ WITH ( <table_option> [ ,...n ] ) ][ ; ]


<column_definition> ::=
column_name <data_type>
    [ FILESTREAM ]
    [ COLLATE collation_name ]
    [ NULL | NOT NULL ]
    [ [ CONSTRAINT constraint_name ] DEFAULT constant_expression ]
      | [ IDENTITY [ ( seed ,increment ) ] [ NOT FOR REPLICATION ] ]
    [ ROWGUIDCOL ] [ <column_constraint> [ ...n ] ]        [ SPARSE ]


<data type> ::=
[ type_schema_name . ] type_name
    [ ( precision [ , scale ] | max |
         [ { CONTENT | DOCUMENT } ] xml_schema_collection ) ]


<computed_column_definition> ::=
column_name AS computed_column_expression
[ PERSISTED [ NOT NULL ] ]


<column_set_definition> ::=
column_set_name XML COLUMN_SET FOR ALL_SPARSE_COLUMNS


<table_option> ::=
          {   DATA_COMPRESSION = { NONE | ROW | PAGE }
                 [ ON PARTITIONS ( { <partition_number_expression> | <range> } [ , ...n ] ) ]}

             A standard table in SQL Server 2008 can have 1,024 columns. However, by using the new
          column set definition in conjunction with the new sparse column capabilities, you can create
          a table with as many as 30,000 columns. Tables that exceed 1,024 columns by using a column
          set definition are referred to as wide tables, but the data stored in any row still cannot exceed
          8,019 bytes unless you have a VARCHAR(MAX), NVARCHAR(MAX), or VARBINARY(MAX)
          column defined for the table.
            In addition to persistent tables that you create within a database, you can also create four
          additional table structures that are transient.
             Temporary tables are stored in the tempdb database and can be either local or global.
          A local temporary table is designated by a name beginning with a # symbol and is visible
          only within the connection that created it. A global temporary table is designated by a name
          beginning with a ## symbol and is visible across all connections to the instance. Both global
          and local temporary tables are dropped automatically when the connection that created the
          temporary tables is terminated.
             Table variables can be created to pass sets of data within objects such as stored procedures
          and functions. Table variables can be populated with INSERT, UPDATE, DELETE, or MERGE
statements and even participate in JOIN statements like any other table. A table variable is
backed by tempdb storage, is visible only within the batch or module that declared the variable,
and is deallocated after the code which declared the variable completes.
  A new feature in SQL Server 2008 is the user-defined table type, which allows you to declare
function and stored procedure parameters that pass sets of data between objects.
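
   A minimal sketch of a table type used as a table-valued parameter (all object names are
illustrative):

CREATE TYPE dbo.OrderIDList AS TABLE
(OrderID    INT    NOT NULL)
GO

CREATE PROCEDURE dbo.ProcessOrders
    @Orders dbo.OrderIDList READONLY    -- table-valued parameters must be READONLY
AS
SELECT OrderID FROM @Orders
GO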


       Quick Check
                 1 . How do you design a database?

                 2. What are three new options that you can configure for columns, rows, or pages
                    within a table?

                 Quick Check Answers
       1 . The ruling principle for designing a database is “Put stuff where it belongs.”
                    If the need is to store multiple rows of information that link back to a single
                    entity, you need a separate table for those rows. Otherwise, each table defines a
                    major object for which you want to store data and the columns within the table
                    define the specific data that you want to store.

                 2. You can designate columns as SPARSE to optimize the storage of NULLs. You can
                    apply the FILESTREAM property to a VARBINARY(MAX) column to enable the
                                       M
                    storage of documents in a directory on the operating system that exceed 2 GB.
                    Rows can be compressed to fit more rows on a page. Pages can be compressed to
                    reduce the amount of storage space required for the table, index, or indexed view.


 PRACTICE         Creating Tables

In this practice, you create a schema to store a set of tables. You also add constraints and
configure the storage options for rows and pages within a table.
  1.   Execute the following code to create the test schema in the AdventureWorks database:

       USE AdventureWorks
       GO


       CREATE SCHEMA test AUTHORIZATION dbo
       GO

  2.   Execute the following code to create a table with an IDENTITY and a SPARSE column:

       CREATE TABLE test.Customer
       (CustomerID       INT         IDENTITY(1,1),
       LastName          VARCHAR(50) NOT NULL,
       FirstName         VARCHAR(50) NOT NULL,
       CreditLine        MONEY       SPARSE NULL,
       CreationDate      DATE        NOT NULL)
       GO

  3.   Execute the following code to create a table with a computed column and row
       compression:

       CREATE TABLE test.OrderHeader
       (OrderID          INT         IDENTITY(1,1),
       CustomerID        INT         NOT NULL,
       OrderDate         DATE        NOT NULL,
       OrderTime         TIME        NOT NULL,
       SubTotal          MONEY       NOT NULL,
       ShippingAmt       MONEY       NOT NULL,
       OrderTotal        AS (SubTotal + ShippingAmt))
       WITH (DATA_COMPRESSION = ROW)
       GO



Lesson Summary
       Schemas allow you to group related objects together as well as provide a security
       container for objects.
       The most important decision you can make when designing a table is the data type of
       a column.
       You can use a column set definition along with sparse columns to create tables with up
       to 30,000 columns.
       Tables, indexes, and indexed views can be compressed using either row or page
       compression; however, compression is not compatible with sparse columns.


Lesson Review
          The following question is intended to reinforce key information presented in Lesson 1,
          “Creating Tables.” The question is also available on the companion CD if you prefer to review
          it in electronic form.


   NOTE   ANSWERS
             The answer to this question and an explanation of why the answer choice is right or wrong
             is located in the “Answers” section at the end of the book.


            1.   Which options are not compatible with row or page compression? (Choose two. Each
                 forms a separate answer.)
                 A. A column with a VARCHAR(MAX) data type
                 B. A sparse column
                 C. A table with a column set
                 D. A VARBINARY(MAX) column with the FILESTREAM property




Lesson 2: Implementing Constraints
You use constraints to enforce business rules as well as consistency in data. In this lesson, you
learn about constraints and how to implement each type of constraint within your database.


      After this lesson, you will be able to:
          Create a primary key
          Create a foreign key
          Create a unique constraint
          Implement a default constraint
          Apply a check constraint

      Estimated lesson time: 40 minutes



Primary Keys
You can have only a single primary key constraint defined for a table. The primary key defines
the column(s) that uniquely identify every row in the table. You must specify all columns
within the primary key as NOT NULL.
   When you create a primary key, you also designate whether the primary key is clustered or
nonclustered. A clustered primary key, the default SQL Server behavior, causes SQL Server to
store the table in sorted order according to the primary key.


   EXAM TIP
   The default option for a primary key is clustered. When a clustered primary key is created
   on a table that is compressed, the compression option is applied to the primary key when
   the table is rebuilt.



Foreign Keys
You use foreign keys to implement referential integrity between tables within your database.
By creating foreign keys, you can ensure that related tables cannot contain invalid, orphaned
rows. Foreign keys create what is referred to as a parent-child relationship between two tables
and ensure that a value cannot be written to the child table that does not already exist in the
parent table. For example, it would not make any sense to have an order for a customer who
does not exist.
   To create a foreign key between two tables, the parent table must have a primary key,
which the child table references. In addition, the data types between the parent
column(s) and child column(s) must be compatible. If you have a multicolumn primary key, all
the columns from the parent primary key must exist in the child table to define a foreign key.


   CAUTION   CASCADING
             One of the options for a foreign key is CASCADE. You can configure a foreign key such that
             modifications of the parent table are cascaded to the child table. For example, when you
             delete a customer, SQL Server also deletes all the customer’s associated orders. Cascading
             is an extremely bad idea. It is very common to have foreign keys defined between all
   the tables within a database. If you were to issue a DELETE statement without a WHERE
   clause against the wrong table, you could eliminate every row, in every table within your
             database, very quickly. By leaving the CASCADE option off for a foreign key, if you attempt
             to delete a parent row that is referenced, you get an error.



          Unique Constraints
          Unique constraints allow you to define a column or columns for which the values must be
          unique within the table. Duplicate entries are not allowed. For example, you might want to
          ensure that you do not have any duplicate customer names in your database. Although a
          unique constraint is similar to a primary key, a unique constraint allows NULLs.


             EXAM TIP
             Although a NULL does not equal another NULL and NULLs cannot be compared, a unique
             constraint treats a NULL as it does any other data value. If the unique constraint is defined
             on a single column, then a single row within the table is allowed to have a NULL within
             that column. If the unique constraint is defined across more than one column, then you
             can store NULLs within the columns so long as you do not produce a duplicate across the
             combination of NULLs and actual data values.



          Default Constraints
          Default constraints allow you to specify a value that is written to the column if the application
          does not supply a value. Default constraints apply only to new rows added with an INSERT,
          BCP, or BULK INSERT statement. You can define default constraints for either NULL or NOT
          NULL columns. If a column has a default constraint and an application passes in a NULL for
          the column, SQL Server writes a NULL to the column instead of the default value. SQL Server
          writes the default value to the column only if the application does not specify the column in
          the INSERT statement.
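
   The following sketch illustrates the behavior, assuming a hypothetical Customer table whose
CreationDate column allows NULLs and has a default of GETDATE():

-- CreationDate is omitted, so the default value is written
INSERT INTO dbo.Customer (LastName)
VALUES ('Smith')

-- CreationDate is supplied explicitly, so NULL is written instead of the default
INSERT INTO dbo.Customer (LastName, CreationDate)
VALUES ('Jones', NULL)
GO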


          Check Constraints
          Check constraints limit the range of values within a column. Check constraints can be created
          at a column level and are not allowed to reference any other column in the table. Table-level
          check constraints can reference any column within a table, but they are not allowed to
          reference columns in other tables.


The evaluation of a check constraint must return a value of true or false. Any other state
for the evaluation is not allowed. Data that passes the check constraint is allowed into the
table or column, whereas data that does not pass the check constraint is rejected, and an
error is returned to the application.
   Check constraints can utilize simple comparisons, such as >, <, >=, <=, <>, and =. You can
create more complex check constraints by combining multiple tests using AND, OR, and NOT. Check
constraints can also use the wildcards % and _ to perform pattern matching.
For example, you could create the following check constraint to enforce a valid format for a U.S.
social security number that consists of three digits, a dash, two digits, a dash, and four digits:

CHECK (Column1 LIKE '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]')
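
   A table-level check constraint can compare columns within the same row. For example, the
following sketch (assuming a hypothetical OrderHeader table with OrderDate and ShipDate
columns) ensures that an order cannot ship before it was placed:

ALTER TABLE dbo.OrderHeader
    ADD CONSTRAINT ck_shipdate CHECK (ShipDate IS NULL OR ShipDate >= OrderDate)
GO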




       Quick Check
       1 . What is the difference between a primary key and a unique constraint?

       2. What restrictions does the parent table have when creating a foreign key?

       Quick Check Answers
       1 . A primary key does not allow NULLs, and a table can have only one primary key. A
            unique constraint allows NULLs, and a table can have multiple unique constraints.

       2. The parent table must have a primary key that is used to define the relationship
            between the parent and child tables. In addition, if the parent’s primary key is
            defined on multiple columns, all the columns must exist in the child table for the
            foreign key to be created.




 PRACTICE         Implement Constraints

In this practice, you add constraints to the tables that you created in Lesson 1.
  1.   Execute the following code to add primary keys to the Customer and OrderHeader tables:

       ALTER TABLE test.Customer
             ADD CONSTRAINT pk_customer PRIMARY KEY CLUSTERED (CustomerID)
       GO


       ALTER TABLE test.OrderHeader
             ADD CONSTRAINT pk_orderheader PRIMARY KEY CLUSTERED (OrderID)
       GO

  2.   Execute the following code to add a foreign key between the Customer and
       OrderHeader tables:

       ALTER TABLE test.OrderHeader
             ADD CONSTRAINT fk_orderheadertocustomer FOREIGN KEY (CustomerID)
                 REFERENCES test.Customer (CustomerID)
       GO


3.   Execute the following code to implement defaults for the CreationDate and OrderDate
                  columns:

                  ALTER TABLE test.Customer
                       ADD CONSTRAINT df_creationdate DEFAULT (GETDATE()) FOR CreationDate
                  GO


                  ALTER TABLE test.OrderHeader
                       ADD CONSTRAINT df_orderdate DEFAULT (GETDATE()) FOR OrderDate
                  GO

            4.    Execute the following code to add a check constraint to the SubTotal column:

                  ALTER TABLE test.OrderHeader
                       ADD CONSTRAINT ck_subtotal CHECK (SubTotal > 0)
                  GO



          Lesson Summary
                  A primary key defines the column(s) that uniquely identify each row in a table.
                  Foreign keys are used to enforce referential integrity between tables.
                  Default constraints provide a value when the application does not specify a value for a
                  column.
                  Check constraints limit the acceptable values for a column.


          Lesson Review
          The following question is intended to reinforce key information presented in Lesson 2,
          “Implementing Constraints.” The question is also available on the companion CD if you prefer
          to review it in electronic form.


   NOTE   ANSWERS
             The answer to this question and an explanation of why each answer choice is right or
             wrong is located in the “Answers” section at the end of the book.


            1.    Columns with which properties cannot be sparse columns? (Choose two. Each forms a
                  separate answer.)
                  A. FILESTREAM
                  B. NULL
                  C. NOT FOR REPLICATION
                  D. COLLATE




Chapter Review
To practice and reinforce the skills you learned in this chapter further, you can
       Review the chapter summary.
       Review the list of key terms introduced in this chapter.
       Complete the case scenario. This scenario sets up a real-world situation involving the
       topics of this chapter and asks you to create solutions.
       Complete the suggested practices.
       Take a practice test.


Chapter Summary
       Tables form the foundation of every database that you create, with the choice of data
       types for each column being the most important performance decision that you make.
       You can designate columns as SPARSE to optimize the storage when a large number of
       rows contain a NULL for a column.
       Row and page compression can conserve storage space and improve data-processing
       performance.
       Primary keys should be created on tables to identify each row within a table uniquely.


Key Terms
Do you know what these key terms mean? You can check your answers by looking up the
terms in the glossary at the end of the book.
       Check constraint
       Default constraint
       FILESTREAM
       Foreign key
       Identity column
       Primary key
       Schema
       Schema collection
       Sparse column
       Unique constraint


Case Scenario
In the following case scenario, you apply what you’ve learned in this chapter. You can find
answers to these questions in the “Answers” section at the end of this book.



Case Scenario: Performing Data Management Tasks
        Wide World Importers is implementing a new set of applications to manage several lines of
        business. Within the corporate data center, they need the ability to store large volumes of
        data that can be accessed from anywhere in the world.
          Several business managers need access to operational reports that cover the current
        workload of their employees along with new and pending customer requests. The same
        business managers also need access to large volumes of historical data to spot trends and
        optimize their staffing and inventory levels.
           Business managers want to eliminate all the product manuals that are included with their
        products and instead direct users to the company Web site. Users should be able to browse
        for manuals based on the product, or search for text within a manual. The sales force also
        would like to enhance the company Web site to allow product descriptions to be created and
        searched in multiple languages.
           A large sales force makes customer calls all over the world and needs access to data on
        the customers that a sales rep is servicing, along with potential prospects. The data for the
        sales force needs to be available even when they are not connected to the Internet or the
        corporate network. Periodically, sales reps connect to the corporate network and synchronize
        their data with the corporate databases.
   A variety of Windows applications have been created with Microsoft Visual Studio .NET,
        and all data access is performed using stored procedures. The same set of applications is
        deployed for users connecting directly to the corporate database server, as well as for sales
        reps connecting to their own local database servers.
          1.     How should you design the tables to allow product manuals to be stored within the
                 database?
          2.     How should you design the table to hold product descriptions in multiple languages?
          3.     How should you design the tables so that you can assign customers to sales reps while
                 also ensuring that a customer cannot be assigned to a sales rep that does not exist?



        Suggested Practices
        To help you master the exam objectives presented in this chapter, complete the following tasks.


        Creating Tables
                 Practice 1   Insert a row into the Customer table and review the value of the
                 CustomerID. Change the seed, the increment for the identity column, or both. Insert
                 additional rows into the Customer table and review the value(s) for the CustomerID.
                 Did you get the CustomerID values that you expected?




Creating Constraints
       Practice 1   Attempt to insert a customer without specifying a last name. Did you
       receive the result you expected?
       Practice 2    Try to insert an order with an invalid CustomerID. What happens?
       Practice 3 When you insert a new order or a new customer, what do you get for the
       CreationDate or OrderDate?
       Practice 4    Attempt to insert an order with a negative subtotal. What happens?



Take a Practice Test
The practice tests on this book’s companion CD offer many options. For example, you can test
yourself on just one exam objective, or you can test yourself on all the 70-432 certification
exam content. You can set up the test so that it closely simulates the experience of taking
a certification exam, or you can set it up in study mode so that you can look at the correct
answers and explanations after you answer each question.


   MORE INFO        PRACTICE TESTS
   For details about all the practice test options available, see the section “How to Use the
   Practice Tests,” in the Introduction to this book.




CHAPTER 4


Designing SQL Server Indexes
In Chapter 3, “Tables,” you learned about the key considerations that go into designing a
flexible and high-performing database. After you have an optimal table design, you need
to design efficient indexes to effectively query any data that is stored. In this chapter, you
learn about the internal architecture of an index, as well as how to construct clustered,
nonclustered, Extensible Markup Language (XML), and spatial indexes. You will then learn
how to manage and maintain the indexes to ensure peak performance.


Exam objective in this chapter:
    Maintain indexes.

Lessons in this chapter:
    Lesson 1: Index Architecture

    Lesson 2: Designing Indexes

    Lesson 3: Maintaining Indexes



Before You Begin
To complete the lessons in this chapter, you must have:
       Microsoft SQL Server 2008 installed
       The AdventureWorks database installed within the instance




REAL WORLD
                 Michael Hotek



                  One of my customers had a moderate-sized data warehouse environment
                      that was used to drive many company pricing and product decisions. Once
                 a month, they would receive data from several source systems. The most recent
                 data would be combined with all the previous data on a staging server. After they
                 imported the data, they would execute several processes to compute aggregates,
                 derive tables, transform data into fact and dimension tables, and denormalize data
                 to be used for subsequent query activity.

                 The fundamental business problem was performance and data availability. To begin
                 improving the situation, they completed a multi-month project to replace all the
                 servers, networking, and storage area network (SAN) storage at a cost of over $1 million.

                 When all the new hardware was in place, the processing run dropped to between
                 12 and 16 days. Some of the processes took 12 to 18 hours. One of their consultants
                 analyzed the databases and determined that many of the indexes were severely
                 fragmented. Over the course of about two weeks, they defragmented the indexes,
                 as well as adding several more indexes. In the process, they told the customer
                 that performance was going to be improved by the changes that were being
                 implemented. The customer was happy that they were “finally getting help” and that
                 their problems were “SQL Server’s fault.”

                 During the next monthly run, however, there was very little improvement to the
                 processing routines.

                 Analysis determined that the indexes were almost completely fragmented again.
                 What the consultant failed to account for was the fact that the processing routines
                 manipulated almost the entire contents of every table within the database.
                 No matter how much effort was put into defragmenting indexes, the way the
                 processing routines were written, SQL Server was not going to take advantage of
                 many indexes and the indexes just added overhead to many of the routines.

                 Further analysis found a host of problems. Data types used in joins did not match,
                 GROUP BY clauses were added to dozens of queries that did not contain an aggregate,
                 temp tables were being filled with millions of rows of data and then never used,
                 temp tables with tens or hundreds of millions of rows were created to generate other
                 temp tables which generated other temp tables, INSERT...SELECT statements had
                 ORDER BY clauses, joins were being performed on calculations, table designs were not
                 efficient, and the list went on and on and on.

                 The moral of the story is that although indexes are designed to improve the
                 performance of data retrieval operations, indexes alone cannot overcome inefficient
                 code or inefficient table designs.



Lesson 1: Index Architecture
Indexes are designed so that you can find the information you are looking for within a
vast volume of data by needing to perform only a very small number of read operations.
In this lesson, you learn about the internal structure of an index, as well as how SQL Server
builds and manipulates indexes. Armed with this structural information, you can make better
decisions on the number, type, and definition of the indexes that you choose to create.


      After this lesson, you will be able to:
          Understand how a B-tree is built and maintained
          Understand why SQL Server uses a B-tree structure for indexes

      Estimated lesson time: 20 minutes



Index Structure
SQL Server does not need to have indexes on a table to retrieve data. A table can simply be
scanned to find the piece of data that is requested. However, the amount of time to find a
piece of data is directly proportional to the amount of data in the table. Because users want
to store increasing amounts of data in a table as well as have consistent query performance,
regardless of the data volume, you need to employ indexes to satisfy the needs of the
applications that all businesses are built upon.
   Indexes are not a new concept; we use them every day. At the back of this book, you
will find an index in printed form. If you want to read about clustering, you can find the
information two different ways. You could open this book, start at page 1, and scan each
page until you reached Chapter 14, “Failover Clustering,” and located the specific information
that you needed. You could also open the index at the back of the book, locate the clustering
entry, and then go to the corresponding page in the book. Either would accomplish your
goal, but using the index allows you to locate the information you want by looking at the
smallest number of pages possible.
   An index is useful only if it can provide a means to find data very quickly regardless of
the volume of data that is stored. Take a look at the index at the back of this book. The index
contains only a very small sampling of the words in the book, so it provides a much more
compact way to search for information. The index is organized alphabetically, a natural way
for humans to work with words, which enables you to eliminate a large percentage of the
pages in the book to find the information you need. In addition, it enables you to scan down
to the term you are searching for; after you find the word, you know that you don’t have to
look any further. SQL Server organizes indexes in a very similar manner.




Balanced Trees (B-Trees)
          The structure that SQL Server uses to build and maintain indexes is called a balanced tree
          (B-tree). An example of a B-tree is shown in Figure 4-1.




           FIGURE 4-1 B-tree structure (a single root page at the top, an optional intermediate level in
           the middle, and the leaf level at the bottom)


             A B-tree is constructed of a root node that contains a single page of data, one or more
          optional intermediate level pages, and one or more optional leaf level pages. The core
          concept of a B-tree can be found in the first word of the name: balanced. A B-tree is always
          symmetrical, with the same number of pages on both the left and right halves at each level.
             The leaf-level pages contain entries sorted in the order that you specified. The data at the
          leaf level contains every combination of values within the column(s) that are being indexed.
          The number of index rows on a page is determined by the storage space required by the
          columns that are defined in the index.


              NOTE   INDEX ENTRY STORAGE
              Pages in SQL Server can store up to 8,060 bytes of data. So an index created on a column
              with an INT data type can store 2,015 values on a single page within the index, whereas
              an index based on a column with a datetime2 data type can store only about half as many
              values per page, or 1,007 values per page.


             The root and intermediate levels of the index are constructed by taking the first entry from
          every page in the level below, along with a pointer to the page where the data value came
          from, as shown in Figure 4-2.
             A query scans the root page until it finds a page that contains the value being searched
          on. It then uses the page pointer to hop to the next level and scan the rows in that page
          until it finds a page that contains the data being searched for. It then repeats the process
          with subsequent levels until it reaches the leaf level of the index. At this point, the query has
          located the required data.



   Root page:           City1 | City62 | City121 | City190

   Intermediate pages:  [City1, City34]   [City62, City93]   [City121, City150]   [City190, City220]

   Leaf pages:          [City1 ... City27]    [City34 ... City61]    [City62 ... City89]    [City93 ... City120]
                        [City121 ... City148] [City150 ... City177]  [City190 ... City217]  [City220 ... City247]

FIGURE 4-2 Constructing intermediate and root levels


   For example, if you were looking for City132 in the B-tree depicted in Figure 4-2, the
query starts at the root level and scans the rows. Because City132 falls between City121 and
City190, SQL Server calculates that City132 could possibly be found on the page that starts
with City121. SQL Server then moves to the intermediate-level page beginning with City121.
Upon scanning the page, SQL Server again determines that City132 lies between City121 and
City150, so SQL Server moves to the leaf-level page starting with City121 and scans that page
until City132 is located. Because this is the leaf-level page, there aren’t any more pages to
search for the data required. If City132 did not exist in the table, SQL Server would not find an
entry for City132. As soon as it read the entry for City133, it would determine that the value
for City132 could not possibly be contained farther down the page, and the query returns
with no results found. You should note that from the structure shown here, SQL Server has to
read only a maximum of three pages to locate any city within the database.
   This is what it means to have a balanced tree. Every search that SQL Server performs
always travels through the same number of levels in the index, as well as the same number of
pages in the index, to locate the piece of data you want.


Index Levels
The number of levels in an index and the number of pages within each level of an index are
determined by simple mathematics. A data page in SQL Server is 8,192 bytes in size, which
can be used to store up to 8,060 bytes of actual user data. Based on the number of bytes
required to store an index key, determined by the data type, you can calculate the number of
rows per page that are stored by using simple division.
   The following example describes not only how an index is built, but also the size calculations
for an index. It gives you an idea of how valuable it can be to use an index to find data within
very large tables, as well as explain why the amount of time to find a piece of data does not
vary much even if the size of a database increases dramatically. Of course, the amount of time
needed to locate data also depends upon writing efficient queries.
   If you build an index on an INT column, each row in the table will require 4 bytes of
storage in the index.


If the table contains only 1,200 rows of data, you need 4,800 bytes of storage. Because all
          the entries would fit on a single page of data, the index would have a single page that would
          be the root page as well as the leaf page. In fact, you could store 2,015 rows in the table and
          still allocate only a single page to the index.
             As soon as you add the 2,016th row, however, all the entries can no longer fit on a single
          page, so two additional pages are allocated to the index in a process called page splitting.
          The existing root page is pushed down the structure to become a leaf-level page. SQL Server
          takes half of the data on the index page and moves it to one of the newly allocated pages.
          The other new page is allocated at the top of the index structure to become the new root
          page. The final step in the process is to take the first entry on each of the leaf-level pages
          and write the entries to the newly created root page. You are now left with an index with a
          root page and two leaf-level pages. This index does not need an intermediate level created
          because the root page can contain all the values at the beginning of the leaf-level pages. At
          this point, locating any row in the table requires scanning exactly two pages in the index.


              NOTE   PAGE SPLITS
             Keep in mind that rows on an index page are maintained in sorted order, so SQL Server
             always writes any new entries into the correct sorted location when page splitting. This
             can cause rows to move between pages, and page splits can occur at any level within the
             storage structure.


              You can continue to add rows to the table without affecting the number of levels in
          the index until you reach 4,060,225 rows. You then have 2,015 leaf-level pages with 2,015
          entries apiece. The root page has 2,015 entries corresponding to the first row on each of the
           leaf-level pages. Therefore, for SQL Server to find any row within the 4,060,225 rows in the
          table, it would require reading exactly two pages. When the 4,060,226th row of data is added
          to the table, another page needs to be allocated to the index at the leaf level, but the root
          page cannot hold 2,016 entries because that would make it exceed the 8,060 bytes that
          are allowed. So SQL Server goes through a page split process. The previous root-level page
          now becomes an intermediate-level page, with a second page allocated at the intermediate
          level. The former root page undergoes a page split to move half of the entries to the newly
          allocated intermediate-level page, and the first entry on each of the two intermediate-level
          pages is written to the newly allocated root page.
              SQL Server next needs to add a level to the index when it must write the
           8,181,353,376th row of data to the table—2,015 entries on the root page corresponding to
           2,015 pages on the intermediate level, each of which has 2,015 entries corresponding to
           2,015 pages at the leaf level, each holding 2,015 rows, plus one extra row that will not fit.
              As you can see, this type of structure enables SQL Server to locate rows in extremely large
           tables very quickly. In this example, finding a row in the table with a little over 4 million rows
           requires SQL Server to read only two pages of data, and the table could grow to more than
           8 billion rows before SQL Server would ever need to read more than three pages to find any row.
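
              Although you rarely need to work these numbers out by hand, you can verify the depth
           and per-level page counts of a real index with the sys.dm_db_index_physical_stats dynamic
           management function. The following is a minimal sketch; it assumes the AdventureWorks
           sample database used in the practices in this chapter.

           SELECT  index_id,
                   index_level,     -- 0 = leaf level; higher numbers are closer to the root
                   index_depth,     -- total number of levels in the B-tree
                   page_count       -- number of pages at this level
           FROM sys.dm_db_index_physical_stats
                   (DB_ID(N'AdventureWorks'),         -- database
                    OBJECT_ID(N'Person.Address'),     -- table
                    NULL, NULL,                       -- all indexes, all partitions
                    'DETAILED')                       -- one row per level of each index
           ORDER BY index_id, index_level;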



EXAM TIP
   If you are creating an index on a sparse column, you should use a filtered index to create
   the most compact and efficient index possible.
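
   As a quick illustration of this tip, the following sketch pairs a sparse column with a
   filtered index; the table and column names are hypothetical.

   -- DiscontinuedDate is NULL for most rows, so it is declared SPARSE, and the
   -- filtered index stores entries only for the rows that actually have a value.
   CREATE TABLE dbo.ProductArchive
   (ProductID         INT        NOT NULL PRIMARY KEY,
    DiscontinuedDate  DATETIME2  SPARSE NULL);

   CREATE NONCLUSTERED INDEX idx_discontinued
       ON dbo.ProductArchive (DiscontinuedDate)
       WHERE DiscontinuedDate IS NOT NULL;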



        Quick Check
        1. What type of structure does SQL Server use to construct an index?

       2. What are the three types of pages within an index?

       Quick Check Answers
        1. SQL Server uses a B-tree structure for indexes.

       2. An index can contain root, intermediate, and leaf pages. An index has a single
          root page defined at the top of the index structure. An index can have one or
          more levels of intermediate pages, but it is optional. The leaf pages are the
          lowest-level page within an index.




Lesson Summary
       SQL Server creates an index using a B-tree structure.
       Each index has a single root-level page; if all the entries do not fit on a single page,
       SQL Server adds pages to the index at the intermediate and leaf levels.


Lesson Review
The following question is intended to reinforce key information presented in Lesson 1,
“Index Architecture.” The question is also available on the companion CD if you prefer to
review it in electronic form.


   NOTE   ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or
   incorrect are located in the “Answers” section at the end of the book.


  1.   Fabrikam stores product information in the following table:

       CREATE TABLE Products.Product
       (ProductID             INT               IDENTITY(1,1),
       ProductName            VARCHAR(30)       NOT NULL,
       SKU                    CHAR(8)           NOT NULL,
       Cost                   MONEY             NOT NULL,
       ListPrice              MONEY             NOT NULL,
       ShortDescription       VARCHAR(200)      NOT NULL,
       LongDescription        VARCHAR(MAX)      NULL,
       CONSTRAINT pk_product PRIMARY KEY CLUSTERED (ProductID))

The table is queried either by ProductID, ProductName, or SKU. The application
                 displays ProductName, SKU, ListPrice, and ShortDescription. The ProductID is also
                 returned to facilitate any subsequent operations. Several thousand new products were
                 recently added, and you are now experiencing performance degradation. Which index should you
                 implement to provide the greatest improvement in query performance?
                 A. CREATE NONCLUSTERED INDEX idx_product ON Products.Product (ProductID,
                     ProductName, SKU)
                 B. CREATE NONCLUSTERED INDEX idx_product ON Products.Product (ProductName)
                 C. CREATE NONCLUSTERED INDEX idx_product ON Products.Product (ProductName)
                     INCLUDE (SKU, ListPrice, ShortDescription, ProductID)
                 D. CREATE NONCLUSTERED INDEX idx_product ON Products.Product (ProductName,
                     SKU, ProductID, ListPrice, ShortDescription)




Lesson 2: Designing Indexes
Indexes enable you to effectively query large amounts of data within a database. In this
lesson, you learn how to create clustered and nonclustered indexes, as well as why each type
of index is useful. You learn how to create filtered indexes and indexes with included columns
to expand the number of queries that can be covered by indexes. Finally, you learn how to
create XML and spatial indexes to improve search capabilities for XML documents and spatial
applications.


      After this lesson, you will be able to:
          Create clustered indexes
          Create nonclustered indexes
          Understand forwarding pointers
          Create filtered indexes
          Specify included columns for a nonclustered index
          Create XML indexes
          Create spatial indexes

      Estimated lesson time: 30 minutes



Clustered Indexes
You can define an index by using one or more columns in the table, called the index key, with
the following restrictions:
       You can define an index with a maximum of 16 columns.
       The maximum size of the index key is 900 bytes.
   The column(s) defined for the clustered index are referred to as the clustering key. A clustered
index is special because it causes SQL Server to arrange the data in the table according to the
clustering key. Because a table cannot be sorted more than one way, you can define only one
clustered index on a table.
   Clustered indexes provide a sort order for the storage of data within a table. However, a
clustered index does not provide a physical sort order. A clustered index does not physically
store the data on disk in a sorted order because doing so creates a large amount of disk input/
output (I/O) for page split operations. Instead, a clustered index ensures that the page chain of
the index is sorted logically, allowing SQL Server to traverse directly down the page chain to
locate data. As SQL Server traverses the clustered index page chain, each row of data is read in
clustering key order.
   Because the leaf level of a clustered index is the row of data in the table, when SQL Server
traverses the clustered index down to the leaf level, it has retrieved the data. No additional
reads are required to locate the required data.


In general, every table should have a clustered index. One of the main purposes of a clustered
          index is to eliminate forwarding pointers.



                   Forwarding Pointers


                    A table without a clustered index is referred to as a heap. When you have a heap,
                        page chains are not stored in sorted order. SQL Server allocates pages and
                   stores data as data is written to the table. The nonclustered indexes are built on
                   the data that is stored, with the leaf level of the indexes containing a pointer to the
                   location of the row in the table’s data pages.

                    If a subsequent modification forces SQL Server to move a row, such as when an
                    updated row no longer fits on its data page, SQL Server does not update the
                    nonclustered indexes with the new location of the row. Instead, SQL Server creates
                    a forwarding pointer on the data page pointing to the new location of the row.

                   Although the presence of a handful of forwarding pointers for a table is generally not a
                   concern, having a large number of forwarding pointers can cause severe performance
                   degradation. If a forwarding pointer did not exist, SQL Server would traverse the
                   nonclustered index and then need to perform only one additional operation to retrieve
                   data from the row. However, if a forwarding pointer is encountered, SQL Server needs
                   to perform an additional operation to gather data from the forwarded row and then
                   return back to continue reading down the page chain. In severe cases, you could
                   observe SQL Server requiring 10 to 15 times the number of read operations as the
                   number of rows returned by the query.
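
                    You can check how many forwarded records a heap contains with the
                    sys.dm_db_index_physical_stats function. A sketch; dbo.StagingImport is a
                    hypothetical heap, and forwarded_record_count is reported only in DETAILED mode:

                    -- index_id = 0 selects the heap itself
                    SELECT forwarded_record_count, page_count
                    FROM sys.dm_db_index_physical_stats
                            (DB_ID(), OBJECT_ID(N'dbo.StagingImport'), 0, NULL, 'DETAILED');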




              The general syntax for creating a relational index is as follows:

           CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX index_name
                  ON <object> ( column [ ASC | DESC ] [ ,...n ] )
                  [ INCLUDE ( column_name [ ,...n ] ) ]
                  [ WHERE <filter_predicate> ]
                  [ WITH ( <relational_index_option> [ ,...n ] ) ]
                  [ ON { partition_scheme_name ( column_name ) | filegroup_name | default } ]
                  [ FILESTREAM_ON { filestream_filegroup_name | partition_scheme_name | "NULL" } ]
           [ ; ]



               NOTE   FILESTREAM DATA
               The FILESTREAM_ON clause is used when clustered indexes are created on a table containing
               FILESTREAM data. If you specify a different filegroup in the FILESTREAM_ON clause than
               where the FILESTREAM data is currently located, all the FILESTREAM data will be moved to the
               newly specified filegroup during the creation of the clustered index.


    You may recall that the table creation scripts we used in Chapter 3 included the keyword CLUSTERED
in the specification of a primary key. Although a primary key is a constraint, SQL Server physically
implements a primary key as an index. Because the default option for a primary key is clustered,
SQL Server creates a clustered index for a primary key unless you specify otherwise. Likewise, a
unique constraint is physically implemented as a unique index. Because a primary key is always
unique, SQL Server physically implements each primary key as a unique, clustered index unless
the primary key is specified as nonclustered.
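
   If you want the single clustered index on a column other than the primary key, you can declare
the primary key as nonclustered and create the clustered index separately. A minimal sketch,
using a hypothetical table:

CREATE TABLE dbo.OrderDetail
(OrderDetailID  INT        IDENTITY(1,1) NOT NULL,
 OrderDate      DATETIME2  NOT NULL,
 CustomerID     INT        NOT NULL,
 -- the primary key is enforced with a unique nonclustered index
 CONSTRAINT pk_orderdetail PRIMARY KEY NONCLUSTERED (OrderDetailID));

-- the clustered index goes to a column frequently used in range queries
CREATE CLUSTERED INDEX idx_orderdate ON dbo.OrderDetail (OrderDate);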
    As we already discussed in Chapter 3, the ON clause specifies the filegroup that the index
is created on. However, because the leaf level of a clustered index is the row of data in the
table, the table and clustered index are always stored on the same filegroup.


   MORE INFO      PARTITION SCHEMES
   Partition schemes will be discussed in detail in Chapter 6, “Distributing and Partitioning
   Data.”



Nonclustered Indexes
The other type of relational index that you can create is a nonclustered index. Nonclustered
indexes do not impose a sort order on the table, so you can create multiple nonclustered
indexes on a table. Nonclustered indexes have the same restrictions as a clustered index—
they can have a maximum of 900 bytes in the index key and a maximum of 16 columns. In
addition, a table is limited to a maximum of 999 nonclustered indexes.
   The leaf level of a nonclustered index contains a pointer to the data you require. If a clustered
index exists on the table, the leaf level of the nonclustered index points at the clustering key. If a
clustered index does not exist on the table, the leaf level of the nonclustered index points at the
row of data in the table. Either way, when SQL Server traverses a nonclustered index to the leaf
level, one additional operation is required to locate data within a table row.

Index Maintenance
At first glance, you might think that you should just create dozens or hundreds of indexes
against a table to satisfy any possible query. An index is a B-tree structure that consists of all
the entries from the table corresponding to the index key. Values within an index are stored
on index pages according to the sort order specified for the index.
   When a new row is added to the table, before the operation can complete, SQL Server
must add the value from this new row to the correct location within the index. Each time SQL
Server writes to the table it must also perform a write operation to any affected index.
   If the leaf-level index page does not have room for the new value, SQL Server has to
perform a page split and write half the rows from the full page to a newly allocated page. If
this also causes an intermediate-level index page to overflow, a page split occurs at that level
as well. If the new row also causes the root page to overflow, the root page is split into a new
intermediate level, creating a new root page.


Indexes can improve query performance, but each index created also causes performance
          degradation on all INSERT, UPDATE, DELETE, BULK INSERT, and BCP operations. Therefore,
          you need to balance the number of indexes carefully for optimal operations. As a general
          rule of thumb, if you have five or more indexes on a table designed for Online Transaction
          Processing (OLTP) operations, you probably need to reevaluate why those indexes exist. Tables
          designed for read operations or data warehouse types of queries usually have many more
          indexes because write operations to a data warehouse typically occur via administratively
          controlled batch operations during off-peak hours.

          Covering Indexes
          When an index is built, every value in the index key is loaded into the index. In effect, each
          index is a mini-table containing all the values corresponding to just the columns in the index
          key. Therefore, it is possible for a query to be entirely satisfied by using the data in the index.
          An index that is constructed such that SQL Server can completely satisfy queries by reading
          only the index is called a covering index.
              If you can construct covering indexes for frequently accessed data, you can improve the
           response time for queries by avoiding additional reads from the underlying table. You can
           also potentially increase concurrency by having queries access the data from an index
           while changes that do not write to the index are being made to the underlying table.
             SQL Server is also capable of using more than one index for a given query. If two indexes
          have at least one column in common, SQL Server can join the two indexes to satisfy a query.
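
              For example, the following sketch (using the AdventureWorks Person.Person table from
           this chapter's practices) builds an index whose key contains every column the query
           references, so the query can be satisfied without touching the table's data pages:

           CREATE NONCLUSTERED INDEX idx_person_name
               ON Person.Person (LastName, FirstName);

           -- Every referenced column is in the index key, so this query is covered:
           SELECT LastName, FirstName
           FROM Person.Person
           WHERE LastName = N'Hotek';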

          Included Columns
          Clearly, indexes are a good thing to have in your database and covering indexes provide
          even greater value to queries. However, you are limited to 16 columns and 900 bytes for the
           index key of an index. These limitations effectively rule out columns with large data types
           that would otherwise be useful in a covering index, where they would spare a query from
           pulling the data from the underlying table.
              Indexes can be created using the optional INCLUDE clause. Included columns become part
          of the index at only the leaf level. Values from included columns do not appear in the root
          or intermediate levels of an index and do not count against the 900-byte limit for an index.
          Therefore, you can construct covering indexes that can have more than 16 columns and
          900 bytes by using the INCLUDE clause.

          Distribution Statistics
          The component that is responsible for determining whether an index should even be used to
          satisfy a query is called the query optimizer. The query optimizer decides whether or not to
          use an index based on the distribution statistics that are stored for the index.
             When an index is created, SQL Server generates a structure called a histogram that stores
          information about the relative distribution of data values within a column. The degree to which
          values in the column allow you to locate small sets of data is referred to as the selectivity of the


index. As the number of unique values within a column increases, the selectivity of an index
increases. The query optimizer chooses the most selective indexes to satisfy a query because
a highly selective index allows the query processor to eliminate a very large portion of the
table so as to access the least amount of data necessary to satisfy your query. Indexes with low
selectivity and a low percentage of unique values are not considered by the query optimizer,
but they still incur an overhead for write operations.
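
   You can inspect the header, density information, and histogram that the query optimizer
uses with DBCC SHOW_STATISTICS. A quick sketch, assuming the idx_city index created in this
lesson's practice:

DBCC SHOW_STATISTICS ('Person.Address', 'idx_city');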

Filtered Indexes
An index key can have a significant skew in its data values, where a large percentage of the
table contains duplicate values confined to a narrow range of the overall set of values. If a
query were executed that retrieved data from the highly selective portion of the table, it is
likely that subsequent queries executed against the low-selectivity range would use the same
index, even though doing so is clearly inappropriate.
   To handle the cases where significant skew exists in the data, SQL Server 2008 allows you
to create filtered indexes. A filtered index is simply an index with a WHERE clause. Only the
index keys matching the WHERE clause are added to the index, allowing you to build indexes
that focus on the highly selective portions of the table while allowing SQL Server to choose
another method for the less selective range.
   Filtered indexes have the following restrictions:
       They must be a nonclustered index.
       They cannot be created on computed columns.
       Columns cannot undergo implicit or explicit data type conversion.


Index Options
Several options can be specified during the creation of an index. The most important of these
is FILLFACTOR. When an index page is full and SQL Server needs to write an entry to the
page, a page split must occur. The result of the page split is two index pages which are only
half full. If page splits frequently occur within the index, you can quickly have a large number
of index pages that contain only a partial set of data. In the same manner as files on disk,
indexes become fragmented due to page splitting. Highly fragmented indexes require a large
number of read operations to locate the information requested.
    To control the rate at which page splits occur, you can specify a fill factor for the index.
FILLFACTOR specifies the percentage of free space that should be left on the leaf level of
an index during creation or rebuild. By leaving space on the leaf level, you can write a small
number of rows to a leaf-level page before a page split is required, thereby slowing the rate
of fragmentation for an index.
    FILLFACTOR applies only to the leaf level of the index. Intermediate-level pages
(if applicable) and the root page are filled to near capacity. SQL Server reserves only enough
space on intermediate-level page(s) and the root page for approximately one additional row
to be added. However, if you are going to be introducing large numbers of leaf-level pages,


which in turn will cause page splits on the intermediate level(s) and potentially the root page,
          you can use the PAD_INDEX option. The PAD_INDEX option causes the FILLFACTOR to be
          applied to the intermediate-level page(s) and the root page of an index.
             During the creation of an index, all the data values for the index key are read. After these
          values are read, SQL Server creates a series of internal work tables to sort the values prior to
          building the B-tree structure. By default, the work tables are created in the same database as
          the index. If you do not want to consume space in the database where the index is created,
          you can specify the SORT_IN_TEMPDB option, which causes the work tables for sort operations
          to be generated in the tempdb database.
             Both clustered and nonclustered indexes can be designated as unique. After an index has
          been specified as such, you cannot place duplicate entries within it. If you attempt to insert
          duplicate values, you receive an error and the transaction is disallowed. By default, multi-row
          inserts where even one row produces a duplicate have the entire transaction rolled back. If
          you want to allow any rows to be inserted that do not produce duplicate values and reject
          only the rows that cause duplicates in a multi-row insert operation, you can specify the
          IGNORE_DUP_KEY option. With the IGNORE_DUP_KEY option enabled, rows that produce
          duplicate values generate a warning message, and only those rows are rejected.
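
              The following sketch combines several of these options; it reuses the Products.Product
           table from the Lesson 1 review question, so treat the names as assumptions. FILLFACTOR = 80
           leaves 20 percent free space on each leaf-level page, PAD_INDEX applies that fill factor
           to the intermediate level(s) as well, SORT_IN_TEMPDB places the sort work tables in tempdb,
           and IGNORE_DUP_KEY (valid only for unique indexes) rejects just the duplicate rows of a
           multi-row insert instead of rolling back the entire statement.

           CREATE UNIQUE NONCLUSTERED INDEX idx_product_sku
               ON Products.Product (SKU)
               WITH (FILLFACTOR = 80,
                     PAD_INDEX = ON,
                     SORT_IN_TEMPDB = ON,
                     IGNORE_DUP_KEY = ON);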

          Online Index Creation
          When an index is built, all the values in the index key need to be read and used to construct
          the index. The process of reading all the values and building the index does not occur
          instantly. So, it is possible for the data to change within the index key. SQL Server controls the
          data changes in a table to ensure data consistency during the build of the index according to
          the creation option specified. Indexes can be created either online or off-line. When an index
          is created using the WITH ONLINE = OFF option, SQL Server locks the entire table, preventing
          any changes until the index is created. When an index is created using the ONLINE = ON
          option, SQL Server allows changes to the table during the creation of the index by using the
          version store within the tempdb database.
             You control the creation of an index by using the WITH ONLINE = ON | OFF option. The
          default is ONLINE = OFF. When you build a clustered index off-line, the table is locked and does
          not allow select statements or data modifications. If you build a nonclustered index off-line,
          a shared table lock is acquired, which allows select statements but not data modification.
             During an online index creation, the underlying table or view can be accessed by queries
           and data modification statements. When an index is created online, the row versioning
           functionality within SQL Server is used to ensure that the index can be built without
          conflicting with other operations on the table. Online index creation is available only in SQL
          Server 2008 Enterprise.
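
              A minimal sketch of an online build, using the AdventureWorks Person.Address table from
           this chapter's practices (the column name is an assumption about the sample database):

           CREATE NONCLUSTERED INDEX idx_postalcode
               ON Person.Address (PostalCode)
               WITH (ONLINE = ON);   -- table remains available for reads and writes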


             EXAM TIP
             Online operations such as online index creation/rebuild or online restore are available only
             in SQL Server 2008 Enterprise.



XML Indexes
An XML data type can contain up to 2 gigabytes (GB) of data in a single column. Although
the XML data has a structure that can be queried, SQL Server needs to scan the data structure
to locate data within an XML document. To improve the performance of queries against XML
data, you can create a special type of index called an XML index.
   There are two different types of XML indexes: primary and secondary.
   A primary XML index is built against all the nodes within the XML column. The primary
XML index is also tied to the table by maintaining a link to the corresponding row in the
clustered index. Therefore, a clustered index is required before you can create a primary XML
index.
   After a primary XML index has been created, you can create additional secondary indexes.
Secondary indexes can be created on PATH, VALUE, or PROPERTY. A primary XML index is
first required, because secondary XML indexes are built against the data contained within the
primary XML index.
   Secondary XML indexes created FOR PATH are built on the PATH and NODE values of the
primary XML index. A PATH XML index is used to optimize queries searching for a path within
an XML document. Indexes created FOR VALUE are built against the PATH and VALUE of the
primary XML index and are used to search for values within XML documents. Indexes created
FOR PROPERTY are created using the primary key, node, and path. Property XML indexes are
used to return data efficiently from an XML column along with additional columns from the
table.
   The generic syntax for creating an XML index is:

CREATE [ PRIMARY ] XML INDEX index_name
    ON <object> ( xml_column_name )
    [ USING XML INDEX xml_index_name
        [ FOR { VALUE | PATH | PROPERTY } ] ]
    [ WITH ( <xml_index_option> [ ,...n ] ) ][ ; ]
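
   A short sketch using a hypothetical table; the primary XML index requires a clustered
primary key on the table, and the secondary PATH index is then built from the primary
XML index:

CREATE TABLE dbo.ProductManual
(ManualID  INT  IDENTITY(1,1),
 Manual    XML  NOT NULL,
 CONSTRAINT pk_productmanual PRIMARY KEY CLUSTERED (ManualID));

CREATE PRIMARY XML INDEX pxml_manual
    ON dbo.ProductManual (Manual);

-- optimizes path-based queries, such as checking for /manual/chapter nodes
CREATE XML INDEX sxml_manual_path
    ON dbo.ProductManual (Manual)
    USING XML INDEX pxml_manual FOR PATH;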



Spatial Indexes
Spatial indexes are created against a spatial column that is typed as either geometry or
geography.

CREATE SPATIAL INDEX index_name
  ON <object> ( spatial_column_name )
    {[ USING <geometry_grid_tessellation> ]
          WITH ( <bounding_box>
                 [ [,] <tessellation_parameters> [ ,...n ] ]
                 [ [,] <spatial_index_option> [ ,...n ] ] )
     | [ USING <geography_grid_tessellation> ]
          [ WITH ( [ <tessellation_parameters> [ ,...n ] ]
                    [ [,] <spatial_index_option> [ ,...n ] ] ) ]
    }   [ ON { filegroup_name | "default" } ];


               Spatial data is defined using a two-dimensional coordinate system. Indexes are built using
            B-trees, which are a linear structure. Therefore, to index spatial data, SQL Server must
            transform the two-dimensional space of spatial data into a linear chain. The decomposition
            is accomplished through a process known as tessellation.
                If you are indexing a geography data type, SQL Server maps the ellipsoid data to a
            two-dimensional, non-Euclidean space. The surface of the ellipsoid is first divided into
           hemispheres. Each hemisphere is then projected onto a quadrilateral pyramid. Each pyramid
           is then flattened into a two-dimensional plane. The planes representing the upper and lower
           hemispheres are then joined at the edge. The final process for indexing geography data is to
           apply tessellation.
               Prior to tessellation, SQL Server constructs a four-level, uniform, hierarchical decomposition
           of the represented space. Level 1 is the top level of the hierarchy. Level 2 decomposes each
           cell in the Level 1 grid into a grid of equal dimension. Level 3 decomposes each cell in the
           Level 2 grid into a grid of equal dimension. Likewise, Level 4 decomposes each cell in Level 3
           into a grid of equal dimension, as shown in Figure 4-3.


            FIGURE 4-3 A four-level, uniform grid hierarchy (each cell in the Level 1 grid decomposes into
            a Level 2 grid, each Level 2 cell into a Level 3 grid, and each Level 3 cell into a Level 4 grid)


              The grid at each level in the hierarchy is numbered using the Hilbert space-filling curve.

           Tessellation
            After the four-level grid hierarchy is constructed, each row of spatial data is read and
            plotted onto the grid. Beginning at level 1, the tessellation process plots the spatial object
            onto the set of grid cells that the object touches. The set of touched cells is then recorded into
           the index.
              Very small objects touch only a small number of cells within the grid hierarchy whereas
           large objects can touch a very large number of cells. To limit the size of the index without
           losing accuracy, tessellation applies a set of rules to determine the final output that is written
           into the index:


       Covering   If an object completely covers a cell, the cell is not tessellated.
       Cells-per-object   This rule enforces the CELLS_PER_OBJECT parameter for the spatial
       index for levels 2, 3, and 4 of the grid hierarchy.
       Deepest cell   Records only the lowest-level cells that have been tessellated.
    If a cell is not tessellated, no information is recorded for any subsequent levels of the grid
hierarchy that correspond to the nontessellated cell. As long as the cells-per-object rule has not
been exceeded and the object does not completely cover a cell, the cell is tessellated. When a
cell is tessellated, the portion of the object contained within the cell is plotted against the grid
in the next level of the hierarchy, where the tessellation rules are applied to the set of grid cells.
   Tessellation continues across each cell in the grid and then through each subsequent level
of the grid hierarchy. After the process reaches either the cells-per-object limit or level 4 of
the hierarchy, tessellation finishes. The cells that were tessellated at the lowest level define the
key of the spatial index for the row.

Bounding Box
When indexing geometry data, you need to define one additional option for a spatial index:
the bounding box. A B-tree is a finite, linear structure that has a clearly defined beginning
and end. Because a geometric plane is infinite, you cannot define a B-tree against all possible
two-dimensional space. The BOUNDING_BOX parameter defines the maximum and minimum
x, y coordinates that are considered when constructing the grid hierarchy and tessellating the
rows of geometry data.
   Any objects or portions of an object that fall outside the bounding box are not considered
or counted for the index. When you choose the limits of the bounding box, you need to select
values that encompass the majority of the objects that you want to index within the table.
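
   A brief sketch of a geometry spatial index with a bounding box; the table, column, and
coordinate limits are hypothetical:

CREATE SPATIAL INDEX sidx_floorplan
    ON dbo.FloorPlan (PlanOutline)
    USING GEOMETRY_GRID
    WITH (BOUNDING_BOX = (0, 0, 500, 200),   -- (xmin, ymin, xmax, ymax)
          GRIDS = (LOW, LOW, MEDIUM, HIGH),
          CELLS_PER_OBJECT = 16);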


   MORE INFO

    For more information on spatial indexes and the tessellation process, please refer to the SQL
    Server Books Online article “Spatial Indexing Overview,” at http://technet.microsoft.com/
    en-us/library/bb964712.aspx.




        Quick Check
        1. What is the difference between a clustered and a nonclustered index?

       2. How does the FILLFACTOR option affect the way an index is built?

       Quick Check Answers
       1. A clustered index imposes a sort order on the data pages in the table. A nonclustered
          index does not impose a sort order.

        2. The FILLFACTOR option reserves space on the leaf level of the index. The
           intermediate levels and the root page are affected only if PAD_INDEX is also specified.



PRACTICE        Creating Indexes

           In this practice, you create indexes for the AdventureWorks database.
             1.    Execute the following code to create a nonclustered index on the Person.Address table:

                   CREATE NONCLUSTERED INDEX idx_city ON Person.Address(City) INCLUDE (AddressLine1)

             2.    Execute the following code to create a filtered index on the Person.Address table:

                   CREATE NONCLUSTERED INDEX idx_city2 ON Person.Address(City)
                   INCLUDE (AddressLine1, AddressLine2)
                   WHERE AddressLine2 IS NOT NULL

              3.   Execute the following code to create a spatial index on the Person.Address table:

                   CREATE SPATIAL INDEX sidx_spatiallocation
                      ON Person.Address(SpatialLocation)
                      USING GEOGRAPHY_GRID
                      WITH (GRIDS = (MEDIUM, LOW, MEDIUM, HIGH ),
                       CELLS_PER_OBJECT = 64);



           Lesson Summary
                   Clustered indexes specify a sort order for data pages in a table.
                    You can create up to 999 nonclustered indexes on a table.
                   Nonclustered indexes can include columns in the leaf level of the index to cover more
                   queries.
                   You can specify a WHERE clause for a nonclustered index to limit the data set that the
                   index is built upon.
                    You can create a primary XML index plus three types of secondary XML indexes: PATH, VALUE, and PROPERTY.
                   Spatial indexes can be defined for either geography or geometry data types.
                   If you are indexing a geometry data type, the BOUNDING_BOX parameter is required
                   to provide limits to the two-dimensional plane.


           Lesson Review
           The following question is intended to reinforce key information presented in Lesson 2,
           “Designing Indexes.” The question is also available on the companion CD if you prefer to
           review it in electronic form.


               NOTE   ANSWERS
               Answers to this question and an explanation of why each answer choice is correct or
               incorrect are located in the “Answers” section at the end of the book.




1.   You are the Database Administrator at a retail company that supplies blanks and kits
     to pen turners. You are designing a database to store characteristics of the products
     offered. Each product has a variety of characteristics, but not all products have the
     same set of characteristics. You are planning the index strategy for the database. The
     most common query will be the following:
     SELECT a.ProductName, b.ProductType, b.WoodSpecies, b.Color
     FROM Products a INNER JOIN ProductAttributes b ON a.ProductID = b.ProductID
      WHERE b.Color = 'X'

     Not all products have a Color attribute. Which index strategy would be the most
     efficient?
     A. A nonclustered index on Color
     B. A nonclustered index on Color that includes the ProductType and WoodSpecies
         columns
     C. A filtered, nonclustered index on Color
     D. A filtered, nonclustered index on Color that includes the ProductType and
         WoodSpecies columns




Lesson 3: Maintaining Indexes
            Over time, data changes will cause indexes to become fragmented. To keep query operations
            as efficient as possible, you need to keep fragmentation to a minimum.
           In this lesson, you will learn how to control the rate of fragmentation as well as how to
           remove fragmentation from an index.


                  After this lesson, you will be able to:
                     Rebuild indexes to remove fragmentation
                     Disable an index

                  Estimated lesson time: 30 minutes


           Index Management and Maintenance
           Because the data within an index is stored in sorted order, over time, values can move around
           within the index due to either page splits or changes in the values. To manage the fragmentation
           of an index over time, you need to perform periodic maintenance.

           Index Fragmentation
           Files on an operating system can become fragmented over time due to repeated write
           operations. Although indexes can become fragmented, index fragmentation is a bit different
           from file fragmentation.
              When an index is built, all the values from the index key are written in sorted order onto
           pages within the index. If a row is removed from the table, SQL Server needs to remove
           the corresponding entry from the index. The removal of the value creates a “hole” on the
           index page. SQL Server does not reclaim the space left behind because the cost of finding
           and reusing a hole in an index is prohibitive. If a value in the table that an index is based on
           changes, SQL Server must move the index entry to the appropriate location, which leaves
           behind another hole. When index pages fill up and require a page split, you get additional
            fragmentation of the index. Over time, the indexes on a table that undergoes large amounts
            of data changes become fragmented.
              To control the rate of fragmentation of an index, you can use an index option called the fill
           factor. You can also use the ALTER INDEX statement to remove the fragmentation.
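
               Before choosing a remedy, you can measure fragmentation with the
            sys.dm_db_index_physical_stats function. A sketch against the AdventureWorks
            Person.Address table; the thresholds in the comments are the commonly cited Books
            Online guideline, not hard rules:

            SELECT index_id, avg_fragmentation_in_percent, page_count
            FROM sys.dm_db_index_physical_stats
                    (DB_ID(N'AdventureWorks'), OBJECT_ID(N'Person.Address'),
                     NULL, NULL, 'LIMITED');
            -- roughly 5 to 30 percent fragmentation: REORGANIZE
            -- over roughly 30 percent fragmentation: REBUILD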

           FILLFACTOR
           The FILLFACTOR option for an index determines the percentage of free space that is reserved
           on each leaf-level page of the index when an index is created or rebuilt. The free space
           reserved leaves room on the page for additional values to be added, thereby reducing the
           rate at which page splits occur. The FILLFACTOR is represented as a percentage full. For
           example, a fill factor of 75 means that 25 percent of the space on each leaf-level page is left
           empty to accommodate future values.
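
               For example, the following statement (reusing an index from this lesson's practice)
            rebuilds an index and leaves 25 percent free space on each leaf-level page:

            ALTER INDEX IX_Person_LastName_FirstName_MiddleName
                ON Person.Person
                REBUILD WITH (FILLFACTOR = 75);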

Defragmenting an Index
Because SQL Server does not reclaim space, you must periodically reclaim the empty space in
an index to preserve the performance benefits of an index. You defragment an index by using
the ALTER INDEX statement, as shown here:
ALTER INDEX { index_name | ALL }
    ON <object>
    { REBUILD
        [ [ WITH ( <rebuild_index_option> [ ,...n ] ) ]
          | [ PARTITION = partition_number
                [ WITH ( <single_partition_rebuild_index_option> [ ,...n ] ) ] ] ]
      | DISABLE
      | REORGANIZE
            [ PARTITION = partition_number ]
            [ WITH ( LOB_COMPACTION = { ON | OFF } ) ]
      | SET ( <set_index_option> [ ,...n ] )
    }
[ ; ]

   When you defragment an index, you can use either the REBUILD or REORGANIZE options.
    The REBUILD option rebuilds all levels of the index and leaves all pages filled according
to the FILLFACTOR setting of an index. If you rebuild the clustered index, only the clustered
index is rebuilt. However, rebuilding the clustered index with the ALL keyword also rebuilds
all nonclustered indexes on the table. The rebuild of an index effectively re-creates the entire
B-tree structure, so unless you specify the ONLINE option, a shared table lock is acquired,
preventing any changes until the rebuild operation completes.
   The REORGANIZE option removes fragmentation only at the leaf level. Intermediate-level
pages and the root page are not defragmented during a reorganize. REORGANIZE is always
an online operation that does not incur any long-term blocking.


Disabling an Index
An index can be disabled by using the ALTER INDEX statement as follows:

ALTER INDEX { index_name | ALL }
    ON <object>
    DISABLE [ ; ]

   When an index is disabled, the definition remains in the system catalog but is no longer
used. SQL Server does not maintain the index as data in the table changes, and the index
cannot be used to satisfy queries. If a clustered index is disabled, the entire table becomes
inaccessible.
   To enable an index, it must be rebuilt to regenerate and populate the B-tree structure. You
can accomplish this by using the following command:

ALTER INDEX { index_name | ALL }
    ON <object>
    REBUILD [ ; ]



                   Quick Check
                    1. What is the difference between the REBUILD and REORGANIZE options of ALTER
                      INDEX?

                   2. What happens when an index is disabled?

                   Quick Check Answers
                    1. REBUILD defragments all levels of an index. REORGANIZE defragments only the
                      leaf level of the index.

                   2. An index that is disabled is no longer used by the optimizer. In addition, as data
                      changes in the table, any disabled index is not maintained.




             PRACTICE        Maintaining Indexes

           In this practice, you defragment, disable, and re-enable indexes.
             1.    Execute the following query to rebuild all the indexes on the Person.Address table:

                   ALTER INDEX ALL
                   ON Person.Address
                   REBUILD

             2.    Execute the following query to reorganize an index on the Person.Person table:

                   ALTER INDEX IX_Person_LastName_FirstName_MiddleName
                   ON Person.Person
                   REORGANIZE

              3.   Execute the following query to disable the clustered index on the Person.Address table:

                   ALTER INDEX PK_Address_AddressID
                   ON Person.Address
                   DISABLE

             4.    Execute the following query to verify that the Person.Address table is not accessible:

                   SELECT * FROM Person.Address

              5.   Execute the following query to re-enable the clustered index on the Person.Address table:

                   ALTER INDEX PK_Address_AddressID
                   ON Person.Address
                   REBUILD

             6.    Execute the following query to verify that you can now access the Person.Address
                   table:

                   SELECT * FROM Person.Address




Lesson Summary
       You can defragment indexes using either the REBUILD or REORGANIZE options.
       The REBUILD option defragments all levels of an index. Unless the ONLINE option is
        specified, a REBUILD acquires a shared table lock and blocks any data modifications.
       The REORGANIZE option defragments only the leaf level of an index and does not
       cause blocking.
       You can disable an index to exclude the index from consideration by the optimizer
       or any index maintenance due to data changes. If the clustered index is disabled, the
       entire table becomes inaccessible.


Lesson Review
The following question is intended to reinforce key information presented in Lesson 3,
“Maintaining Indexes.” The question is also available on the companion CD if you prefer to
review it in electronic form.


   NOTE   ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or
   incorrect are located in the “Answers” section at the end of the book.


  1.   You are in charge of building the process that loads approximately 150 GB of data into
       the enterprise data warehouse every month. Every table in your data warehouse has at
       least eight indexes to support data analysis routines. You want to load the data directly
       into the tables as quickly as possible. Which operation provides the best performance
       improvement with the least amount of administrative effort?
       A. Use a BULK INSERT command.
       B. Drop and re-create the indexes.
       C. Disable and enable the indexes.
       D. Use Integration Services to import the data.




Chapter Review
         To practice and reinforce the skills you learned in this chapter further, you can perform the
         following tasks:
                  Review the chapter summary.
                  Review the list of key terms introduced in this chapter.
                  Complete the case scenario. The scenario sets up a real-world situation involving the
                  topics in this chapter and asks you to create a solution.
                  Complete the suggested practices.
                  Take a practice test.


         Chapter Summary
                  You can create a single clustered index on a table, which imposes a sort order on the
                  index pages.
                   You can create up to 999 nonclustered indexes on a table. Nonclustered indexes can
                  be filtered and can include additional columns in the leaf level of the index.
                  A spatial index is constructed by mapping the spatial objects to a four-level uniform
                  grid and applying tessellation rules.
                  You can defragment indexes by using either the REBUILD or REORGANIZE option of
                  the ALTER INDEX statement.


Key Terms
Do you know what these key terms mean? You can check your answers by looking up the terms in the glossary at the end of the book.
- Balanced tree (B-tree)  A symmetric, linear structure used to construct an index. A B-tree provides a compact structure that enables searching very large volumes of data with a small number of read operations.
- Clustered index  An index that imposes a sort order on the pages within the index. A table can have only one clustered index.
- Covering index  An index that allows a query to be satisfied by using only the entries within the index.
- Nonclustered index  An index that does not impose a sort order on the data pages within the table. You can have up to 999 nonclustered indexes on a table.
- Tessellation  The process that is used to construct a spatial index. Tessellation counts the cells that a spatial object touches within the four-level grid hierarchy.


         Case Scenario
         In the following case scenario, you apply what you have learned about designing indexes. You
         can find answers to these questions in the “Answers” section at the end of this book.

Case Scenario: Performing Data Management Tasks
Wide World Importers is implementing a new set of applications to manage several areas of
their business. Within the corporate data center, they need the ability to store large volumes
of data that can be accessed from anywhere in the world.
  Several business managers need access to operational reports that cover the current
workload of their employees along with new and pending customer requests. The same
business managers need to be able to access large volumes of historical data to spot trends
and optimize their staffing and inventory levels.
    Business managers want to eliminate all the product manuals that are included with their
products and instead direct users to the company Web site. Users should be able to browse
for manuals based on product or search for text within a manual. The sales force also would
like to enhance the company Web site to allow product descriptions to be created and
searched in multiple languages.
   A large sales force makes customer calls all over the world and needs access to data on
the customers that a sales rep is serving along with potential prospects. The data for the sales
force needs to be available even when the sales reps are not connected to the Internet or the
corporate network. Periodically, sales reps connect to the corporate network and synchronize
their data with the corporate databases.
   Some of the main tables within the database are listed in Table 4-1.

TABLE 4-1 Tables in the Wide World Importers Database

 TABLE                       PURPOSE

 Customer                    Contains the name of a customer along with their credit line,
                             account number, Web site login, and password.
 CustomerAddress             Contains one or more addresses for each customer. An address
                             can have up to three lines in addition to the city, state/province,
                             postal code, country, and the latitude/longitude of the address.
                             One address line, city, and country are required.
 CustomerContact             Contains one or more rows per customer to store contact
                             information such as phone number, cell phone, e-mail address,
                             and fax number.
 SalesPerson                 Contains a list of the employees assigned to sales along with the
                             territory each one is assigned to, commission rate, and sales quota.
 Product                     Contains a ProductID, SKU, inventory on hand, minimum stock
                             amount, product cost, and standard price.
 CustomerOrder               Contains the orders placed by a customer along with the sales
                             person of record for the order, order date, a flag indicating
                             whether the order is shipped, and the grand total of the order.
 CustomerOrderDetail         Contains the line items for each order placed.
 CustomerSalesPerson         Links each customer to a salesperson.


A variety of Microsoft Windows applications have been created with Microsoft Visual Studio.NET, and all data access is performed using stored procedures. The same set of applications is deployed for users connecting directly to the corporate database server, as well as for sales reps connecting to their own local database servers.
   Some of the common queries are
- Search for customers by name, city, or salesperson
- Search for the customers who ordered a specific product
- Search for all orders that have not yet shipped
- Find the shipping address for a customer
- Find all of the products that have been ordered in a given month
- Find all customers within a given distance from a salesperson's current location
         What should you do to ensure efficient query operations in the database?


Suggested Practices
To help you master the exam objectives presented in this chapter, complete the following tasks.


Creating Indexes
- Practice 1  Add primary keys to all tables in your database that do not have one.
- Practice 2  Find all the tables in your database that do not have a clustered index, and create a clustered index or change the primary key to be clustered.
- Practice 3  Create covering indexes for frequently executed queries that are not currently being satisfied entirely by an index.
- Practice 4  Change an index that is used to satisfy queries accessing only a subset of rows to a filtered index to foster more efficient operations.



         Take a Practice Test
         The practice tests on this book’s companion CD offer many options. For example, you can test
         yourself on just one exam objective, or you can test yourself on all the 70-432 certification
         exam content. You can set up the test so that it closely simulates the experience of taking
         a certification exam, or you can set it up in study mode so that you can look at the correct
         answers and explanations after you answer each question.


            MORE INFO          PRACTICE TESTS
            For details about all the practice test options available, see the section “How to Use the
            Practice Tests,” in the Introduction to this book.




CHAPTER 5


Full Text Indexing
In this chapter, you will learn how to create and manage full text indexes so your applications can efficiently work with unstructured data stored in FILESTREAM, XML, and large character columns.


Exam objectives in this chapter:
- Install SQL Server 2008 and related services
- Configure additional SQL Server components
- Configure full text indexing
- Maintain indexes

Lessons in this chapter:
- Lesson 1: Creating and Populating Full Text Indexes
- Lesson 2: Querying Full Text Data
- Lesson 3: Managing Full Text Indexes


Before You Begin
To complete the lessons in this chapter, you must have
- The AdventureWorks sample database installed


       REAL WORLD
       Michael Hotek



      One of the companies that I worked with sold a large number of complex
      products. Each product shipped with one or more manuals, and some of the
      manuals could be thousands of pages, spanning multiple volumes. To save paper
      and ink costs, the company eliminated all printed manuals in 2002.

       Manuals were produced using Microsoft Office Word and then rendered to a PDF
       to be loaded on the company’s Web site. Customers could then visit the company
       Web site to access the relevant manual. Although customers were satisfied with


the term index at the back of each manual, after the manuals were loaded to the
                  company Web site, customers expected to be able to search across the manuals to
                  locate the information needed.

                  To solve the problem, the company hired a consulting firm to build a searchable
                  index for the manuals. After spending two months developing a proof of concept
                  which indexed 15 manuals at a cost of over $40,000, the company faced a project
                  estimate of over $8 million to index all the existing manuals, along with an
                  estimated $1 million per year for maintaining the code and indexing any new
                  manuals or changes to existing manuals. The project was brought to our attention
                  within a larger budget meeting.

      Without making a big announcement, we built a simple database that took
      advantage of the FILESTREAM capabilities of Microsoft SQL Server 2008. Over
      the weekend, we loaded all 20,000+ Word documents into a VARBINARY column
      designated for FILESTREAM using the Win32 API features. We then created and
      populated a full text index across all the documents. The document load and
      creation of the full text index required approximately 32 hours to complete. We
      also had one of our ASP developers create a quick page mockup that allowed us to
      submit searches through a browser.

                  On Monday morning, we called a meeting with the project team working on the
                  document indexing project and presented our solution. Not only was management
                  stunned that we had managed to index all the manuals in a weekend, but they could
                  not believe that our search quality was significantly better than the home-grown
                  solution, would automatically update with any changes, and was ready to go into
                  production immediately. Within two hours of leaving the meeting, we had the
                  manual search capability live on the company’s Web site at a savings of over
                  $8 million, not to mention the $1 million per year in maintenance fees the vendor
                  was proposing.




Lesson 1: Creating and Populating Full Text Indexes
SQL Server 2008 allows you to build indexes that give you the ability to query large volumes
of unstructured data rapidly. In this lesson, you learn how to create full text catalogs and full
text indexes.


      After this lesson, you will be able to:
      - Create a full text catalog
      - Create a full text index
      - Configure the population mode for a full text index
      - Manage full text index population

      Estimated lesson time: 20 minutes



Full Text Catalogs
The first step in building a full text index is to create a storage structure. Unlike relational
indexes, full text indexes have a unique internal structure that is maintained within a separate
storage format called a full text catalog. Each full text catalog contains one or more full text
indexes.
   The generic syntax for creating a full text catalog is
CREATE FULLTEXT CATALOG catalog_name
     [ON FILEGROUP filegroup ]
     [IN PATH 'rootpath']
     [WITH <catalog_option>]
     [AS DEFAULT]
     [AUTHORIZATION owner_name ]


<catalog_option>::=
     ACCENT_SENSITIVITY = {ON|OFF}



   EXAM TIP
   Prior to SQL Server 2008, you associated the full text catalog with a filegroup only for
   backup purposes, with the entire contents of the catalog maintained in a directory
   structure on the operating system. In SQL Server 2008, Microsoft eliminated the external
   file structure, and the contents of a full text catalog are now stored within the database.


   The FILEGROUP clause specifies the filegroup that you want to use to store any full text
indexes created within the full text catalog. The IN PATH clause has been deprecated and
should no longer be used because full text indexes are now stored within the database.



ACCENT_SENSITIVITY allows you to configure whether the full text engine considers accent
           marks when building or querying a full text index. If you change the ACCENT_SENSITIVITY
           option, you need to rebuild all the full text indexes within the catalog.
              The AS DEFAULT clause works the same as the DEFAULT option for a filegroup. If you do
           not specify a catalog name when creating a full text index, SQL Server creates the index
           within the catalog that is marked as the default full text catalog.
             The AUTHORIZATION option specifies the owner of the full text catalog. The specified
           owner must have TAKE OWNERSHIP permission on the full text catalog.
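
   For example, a minimal sketch that combines these options might look like the following; DocumentsFTC and DocumentsFG are illustrative names, and the filegroup must already exist in the database:

CREATE FULLTEXT CATALOG DocumentsFTC
     ON FILEGROUP DocumentsFG
     WITH ACCENT_SENSITIVITY = OFF
     AS DEFAULT
GO

   Because AS DEFAULT is specified, any full text index created without naming a catalog would be placed in DocumentsFTC.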


   NOTE   FILEGROUP PLACEMENT
   Although it is possible to store a full text catalog on a filegroup that also contains relational
   data, it is recommended that you create a separate filegroup for full text indexes to separate
   the input/output (I/O) against the full text catalog from that of the relational data.



           Full Text Indexes
           After you create the full text catalog, you can create the full text indexes that are the basis for
           searching unstructured data.
               You can create full text indexes on columns that are CHAR/VARCHAR, XML, and
           VARBINARY data types. If you create a full text index on a CHAR/VARCHAR column, the
           full text engine can parse the data directly and build an appropriate index. Full text indexes
           built on XML columns load a special processor that can understand and parse an Extensible
           Markup Language (XML) document so that you are indexing the content of the XML
           document and not the XML tags within the document.
              The most common use of a VARBINARY(MAX) column is to store documents using the
           new FILESTREAM capabilities in SQL Server 2008. Although the full text engine can build an
           index directly on documents that you create, thereby avoiding costly conversion processes,
           the engine needs to employ specialized assemblies designed for the various types of
           documents that you want to store. When you process a VARBINARY(MAX) column, you also
           need to specify a column that designates the type of document so that the full text parser
           can load the appropriate assembly. SQL Server 2008 ships with 50 filters that allow processing
           of a variety of document types such as Hypertext Markup Language (HTML), Word, Microsoft
           Office PowerPoint, and Microsoft Office Excel.
              The full text indexing engine uses helper services such as word breakers and stemmers
           that are language-specific to build indexes. The first task of building an efficient index on
           unstructured data is to build a list of words within the data being indexed. Word breakers are
           assemblies that locate breaks between words to build a list of words to be indexed. Because
           verbs can have multiple forms, such as past, present, and future tense, stemmers conjugate
           verbs so that your queries can locate information even across multiple verb tenses. Languages
           are used to specify the particular word breaker and stemmer to apply to the column because
           languages conjugate verbs and even break words differently.


The list of words is filtered through a list of common words called stop words. You specify
stop words such that your index does not become polluted with large volumes of words that
you would not normally search upon. For example, the, a, and an are considered stop words
for the English language, whereas le and la are stop words for the French language.
    You can create full text indexes on multiple columns; however, you can create only a single
full text index on a table or indexed view. The generic syntax for creating a full text index is
CREATE FULLTEXT INDEX ON table_name
        [ ( { column_name
               [ TYPE COLUMN type_column_name ]
               [ LANGUAGE language_term ]
          } [ ,...n]
              ) ]
      KEY INDEX index_name
          [ ON <catalog_filegroup_option> ]
          [ WITH [ ( ] <with_option> [ ,...n] [ ) ] ]
[;]


<catalog_filegroup_option>::=
  {fulltext_catalog_name
  | ( fulltext_catalog_name, FILEGROUP filegroup_name )
  | ( FILEGROUP filegroup_name, fulltext_catalog_name )
  | ( FILEGROUP filegroup_name )}


<with_option>::=
  {CHANGE_TRACKING [ = ] { MANUAL | AUTO | OFF [, NO POPULATION ] }
  | STOPLIST [ = ] { OFF | SYSTEM | stoplist_name }}

    The TYPE COLUMN parameter designates the column that contains the filter type that the
full text index engine should utilize when processing a VARBINARY(MAX) column. You use
the LANGUAGE parameter to specify the language of the data being indexed. The KEY INDEX
parameter is the single column within the table or indexed view that uniquely identifies a row.


Change Tracking
The CHANGE_TRACKING option for a full text index determines how SQL Server maintains the
index when the underlying data changes.
    When you specify either MANUAL or AUTO, SQL Server maintains a list of changes to the
indexed data. When set to MANUAL, you are responsible for periodically propagating the
changes into the full text index. When set to AUTO, SQL Server automatically updates the full
text index as the data is modified. Unlike a relational index, population of a full text index
is not an immediate process because the data has to be submitted to the indexing engine,
which then applies word breakers, stemmers, language files, and stop lists before merging the
changes into the index.




When CHANGE_TRACKING is set to OFF, SQL Server does not maintain a list of changes to the underlying data. Therefore, if you want to update the index to reflect the data currently in the indexed column, you must repopulate the index completely. With CHANGE_TRACKING turned off, you can also specify the NO POPULATION option, which allows you to create the full text index without populating it upon initial creation.
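
   A minimal sketch of this approach, reusing the table, key index, and catalog names from the practice at the end of this lesson:

CREATE FULLTEXT INDEX ON Production.ProductDescription(Description)
     KEY INDEX PK_ProductDescription_ProductDescriptionID
     ON ProductsFTC
     WITH CHANGE_TRACKING = OFF, NO POPULATION
GO

-- Populate the index on demand, for example after a large data load completes
ALTER FULLTEXT INDEX ON Production.ProductDescription
     START FULL POPULATION
GO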


           Language, Word Breakers, and Stemmers
           The language specification is a key component in building an effective full text index.
           Although you could simply use a single word breaker for all your data, when the data spans
           multiple languages, you can have unexpected results. For example, the English language
           breaks words with a space, whereas languages such as German and French can combine
           words. If a word breaker recognized only white space between words as breaks, the full text
           index would meet your needs only if all data stored were English.
    The language specification is used to control the specific word breaker and stemmer that the full text indexing engine loads. The selected word breaker and stemmer apply to the entire full text index; unlike the filter, which is chosen row by row based on the type column of a VARBINARY column, they cannot change dynamically. However, you do not have to split column data based on each specific language. Although words may differ, you can group many languages into a small set of general language families, and each word breaker can handle words that span a narrow group of languages.
             For example, you might store data that spans various Western European languages such as
           English, German, French, and Spanish. You could use a single language to index the column that
           would appropriately break the words for the index. When you have data spanning languages,
           you should specify a language setting for the most complicated language. For example, the
           German word breaker can also break English, Spanish, and French correctly, whereas the English
           word breaker would have trouble with some of the language elements of German.
              When the languages vary widely such as Arabic, Chinese, English, and Icelandic, you should
           split the data into separate columns based on language. Otherwise, you will not be able to
           break all words validly and build a full text index that behaves as you expect.
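
   A sketch of the split-column design, against a hypothetical table that stores one description column per language group; all object names are assumptions, and the locale IDs are 1031 for German, 1025 for Arabic, and 2052 for Simplified Chinese:

CREATE FULLTEXT INDEX ON dbo.ProductDescriptionIntl
     (DescriptionWestern LANGUAGE 1031,  -- German word breaker covers related Western European text
      DescriptionArabic LANGUAGE 1025,
      DescriptionChinese LANGUAGE 2052)
     KEY INDEX PK_ProductDescriptionIntl_RowID
     ON ProductsFTC
GO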
               SQL Server 2008 ships with 50 language-specific word breakers/stemmers. Support is also
           included for you to register and use third-party word breakers and stemmers within SQL Server. For
           example, Turkish, Danish, and Polish are third-party word breakers that ship with SQL Server 2008.
               Word breakers locate and tokenize word boundaries within text. The full text index then
           aggregates each token to build distribution statistics for searching. In addition, word breakers
           recognize proximity within the data set and build the proximity into the full text statistics. The
           ability to search based on word proximity is a unique characteristic of full text indexes that
           allows for compound search criteria that take into account how words relate to each other.
              SQL Server uses stemmers to allow a full text index to search on all inflectional forms of a
           search term, such as drive, drove, driven, and driving. Stemming is language-specific. Although
           you could employ a German word breaker to tokenize English, the German stemmer cannot
           process English.


       Quick Check
       1. Before you can create a full text index, which structure do you need to create?
       2. What do you need to specify to create a full text index on documents stored
          within a FILESTREAM?

       Quick Check Answers
       1. Full text indexes are contained within full text catalogs. Therefore, you must
          create a full text catalog prior to creating a full text index.
       2. SQL Server stores the documents within a VARBINARY(MAX) column with the
          FILESTREAM property enabled. In order to create a full text index, you need to
          also specify a type column that designates what type of document is stored in
          the VARBINARY(MAX) column to load the appropriate filter for the word breaker
          to use.




 PRACTICE   Creating Full Text Indexes

In the following practices, you create a full text catalog along with a full text index.

PRACTICE 1   Create a Full Text Catalog
In this practice, you create a full text catalog.
  1.   Execute the following code to add a filegroup and a file to the AdventureWorks
       database for use with full text indexing:

       ALTER DATABASE AdventureWorks
             ADD FILEGROUP AWFullTextFG
       GO

       ALTER DATABASE AdventureWorks
             ADD FILE (NAME = N'AdventureWorksFT',
                  FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\AdventureWorksFT.ndf')
             TO FILEGROUP AWFullTextFG
       GO

  2.   Execute the following code to create the AdventureWorks full text catalog:

       USE AdventureWorks
       GO
       CREATE FULLTEXT CATALOG ProductsFTC
             ON FILEGROUP AWFullTextFG
       GO




PRACTICE 2   Create a Full Text Index
           In this practice, you create a full text index.
  1.   Execute the following code to create a full text index for the product description:

                   CREATE FULLTEXT INDEX ON Production.ProductDescription(Description)
                        KEY INDEX PK_ProductDescription_ProductDescriptionID
                        ON ProductsFTC
                        WITH CHANGE_TRACKING = AUTO
                   GO

             2.    Expand the Storage, Full Text Catalogs node underneath the AdventureWorks database,
                   right-click the ProductsFTC catalog, and select Properties.
              3.   Select the Tables/Views page to review the full text index you just created.
             4.    Review the full text indexes within the AW2008FullTextCatalog that ships with the
                   AdventureWorks database.


Lesson Summary
- Before creating a full text index, you must create a full text catalog that is mapped to a filegroup to contain one or more full text indexes.
- You can create a full text index on CHAR/VARCHAR, XML, and VARBINARY columns. If you create a full text index on a VARBINARY(MAX) column, you must specify the column for the TYPE COLUMN parameter so that the full text engine loads the appropriate filter for parsing.
- The LANGUAGE setting controls the word breaker and stemmer that SQL Server loads to tokenize and build inflectional forms for the index.
- Although a word breaker can be used against different, closely related languages with acceptable results, stemmers are specific to the language that is selected.
- The CHANGE_TRACKING option controls whether SQL Server tracks changes to underlying columns, as well as whether changes are populated automatically into the index.


           Lesson Review
           The following questions are intended to reinforce key information presented in Lesson 1,
           “Creating and Populating Full Text Indexes.” The questions are also available on the
           companion CD if you prefer to review them in electronic form.


   NOTE   ANSWERS
   Answers to these questions and explanations of why each answer choice is right or wrong
   are located in the "Answers" section at the end of the book.




1.   You are the database administrator at your company. You need to enable the sales
     support team to perform fuzzy searches on product descriptions. Which actions do
     you need to perform to satisfy user needs with the least amount of effort? (Choose
     two. Each forms part of the correct answer.)
     A. Create a full text catalog specifying the filegroup for backup purposes and the root
         path to store the contents of the catalog on the file system.
     B. Create a full text catalog and specify the filegroup to store the contents of the
         catalog.
     C. Create a full text index on the table of product descriptions for the description
         column and specify NO POPULATION.
     D. Create a full text index on the table of product descriptions for the description
         column and specify CHANGE_TRACKING AUTO.

2.   You want to configure your full text indexes such that SQL Server migrates changes
     into the index as quickly as possible with the minimum amount of administrator effort.
     Which command should you execute?
     A. ALTER FULLTEXT INDEX ON <table_name> START FULL POPULATION
     B. ALTER FULLTEXT INDEX ON <table_name> START INCREMENTAL POPULATION
     C. ALTER FULLTEXT INDEX ON <table_name> SET CHANGE_TRACKING AUTO
     D. ALTER FULLTEXT INDEX ON <table_name> START UPDATE POPULATION




Lesson 2: Querying Full Text Data
           SQL Server provides two commands to query full text data: CONTAINS and FREETEXT.
           There are two additional commands that produce a result set with additional columns of
           information: CONTAINSTABLE and FREETEXTTABLE.
   The main difference among the four commands is that CONTAINS and FREETEXT return a True/False value used to restrict a result set, whereas CONTAINSTABLE and FREETEXTTABLE return a result set that can be used to extend query functionality.
   When you include a full text predicate in a query, SQL Server hands off the full text search term to the full text indexing engine, which applies a word breaker to tokenize the search argument. Based on the tokenization of the search term, distribution statistics are returned to the optimizer, which then merges the full text portion of the query with the relational portion to build a query plan.


        After this lesson, you will be able to:
        - Query unstructured data through a full text index

        Estimated lesson time: 20 minutes



           FREETEXT
           FREETEXT queries are the most basic form of a full text search. The generic syntax for a
           FREETEXT query is

           FREETEXT ( { column_name | (column_list) | * }
                        , 'freetext_string' [ , LANGUAGE language_term ] )

                An example of a FREETEXT query is

           SELECT ProductDescriptionID, Description
           FROM Production.ProductDescription
           WHERE FREETEXT(Description,N'bike')
           GO

              The LANGUAGE parameter allows you to specify the word breaker and stemmer that are
           employed to evaluate the input search argument. Although you might have used the German
           language to build the full text index, you can specify an English-language parameter to
           employ an English-specific word breaker for the query.
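
   For instance, the following variation on the earlier query explicitly requests the English word breaker for the search argument; the LANGUAGE term may be a language name or a locale ID:

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE FREETEXT(Description, N'bike', LANGUAGE N'English')
GO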

     NOTE   LANGUAGE PARAMETERS
     Although you can employ a different language for a query than an index was built upon,
     you will not automatically improve the accuracy of a full text search. Because stemmers are
     language-specific, if the index was built with a language different from the language specified
     in the query, you will not be able to find any inflectional forms of words with your query.



EXAM TIP
     All search terms used with full text are Unicode strings. If you pass in a non-Unicode
     string, the query still works, but it is much less efficient because the optimizer cannot use
     parameter sniffing to evaluate distribution statistics on the full text index. Make certain
     that all terms you pass in for full text search are always typed as Unicode for maximum
     performance.


   FREETEXTTABLE returns a result set with additional information that ranks the results according to how close the match was to the original search term. The generic syntax for FREETEXTTABLE is

FREETEXTTABLE (table , { column_name | (column_list) | * }
             , 'freetext_string'
       [ ,LANGUAGE language_term ]
       [ ,top_n_by_rank ] )

     The same query expressed with FREETEXTTABLE is as follows:

SELECT a.ProductDescriptionID, a.Description, b.*
FROM Production.ProductDescription a
      INNER JOIN FREETEXTTABLE(Production.ProductDescription,
          Description,N'bike') b ON a.ProductDescriptionID = b.[Key]
ORDER BY b.[Rank]
GO



CONTAINS
For queries that require greater flexibility, you would use the CONTAINS predicate, which allows you to
- Search word forms
- Search for word proximity
- Provide relative weighting to terms

     The generic syntax for CONTAINS is
CONTAINS
        ( { column_name | (column_list) | * }
             , '< contains_search_condition >'
     [ , LANGUAGE language_term ]          )


< contains_search_condition > ::=
      { < simple_term > | < prefix_term > | < generation_term >
      | < proximity_term > | < weighted_term > }
      | { ( < contains_search_condition > )
      [ { < AND > | < AND NOT > | < OR > } ]
      < contains_search_condition > [ ...n ] }



< simple_term > ::=
      word | " phrase "

< prefix_term > ::=
      { "word *" | "phrase *" }

< generation_term > ::=
      FORMSOF ( { INFLECTIONAL | THESAURUS } , < simple_term > [ ,...n ] )

< proximity_term > ::=
      { < simple_term > | < prefix_term > }
      { { NEAR | ~ }
      { < simple_term > | < prefix_term > } } [ ...n ]

< weighted_term > ::=
      ISABOUT ( { { < simple_term > | < prefix_term > | < generation_term >
      | < proximity_term > }
      [ WEIGHT ( weight_value ) ] } [ ,...n ] )

   Search terms can be used either for exact matches or as prefixes. The following query returns the products with an exact match on the word bike. Although the query looks almost exactly the same as the FREETEXT version, the CONTAINS query returns two fewer rows due to the exact matching, as follows:

           SELECT ProductDescriptionID, Description
           FROM Production.ProductDescription
           WHERE CONTAINS(Description,N'bike')
           GO

              If you want to perform a basic wildcard search for words prefixed by a search term, you
           can execute the following query:

           SELECT ProductDescriptionID, Description
           FROM Production.ProductDescription
           WHERE CONTAINS(Description,N'"bike*"')
           GO

              If you compare the previous results to the FREETEXT query, you will see that each returns
           the same set of rows. With CONTAINS, you have to specify explicitly that you want to perform
           fuzzy searching, which includes word prefixes, but FREETEXT defaults to fuzzy searching.
   In those cases where you want to search on word variants, you can use the FORMSOF, INFLECTIONAL, and THESAURUS options. INFLECTIONAL causes the full text engine to consider word stems. For example, searching on driven also searches on drive, driving, and drove. The THESAURUS option produces synonyms for the search term. An example of searching on word variants is as follows:

           SELECT ProductDescriptionID, Description
           FROM Production.ProductDescription
           WHERE CONTAINS(Description,N' FORMSOF (INFLECTIONAL,ride) ')
           GO



SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description,N' FORMSOF (THESAURUS,metal) ')
GO



   NOTE   THESAURUS FILES
   A thesaurus file exists for each supported language. All thesaurus files are XML files
   stored in the FTDATA directory underneath your default SQL Server installation path.
   The thesaurus files are not populated, so to perform synonym searches, you need to
   populate the thesaurus files. You will learn about thesaurus files in Lesson 3, "Managing
   Full Text Indexes."


   Because full text indexes are built against unstructured data, the index stores the proximity
of one word to another in addition to indexing the words found within the data. Proximity
searching is accomplished by using the NEAR keyword. Although you can perform proximity
and weighted proximity searches using CONTAINS, these types of searches generally are
performed using CONTAINSTABLE to use the RANK value that is calculated.
   The following query returns all rows where bike is near performance. The rank value is
affected by the distance between the two words:

SELECT a.ProductDescriptionID, a.Description, b.*
FROM Production.ProductDescription a INNER JOIN
      CONTAINSTABLE(Production.ProductDescription, Description,
           N'bike NEAR performance') b ON a.ProductDescriptionID = b.[Key]
ORDER BY b.[Rank]
GO

   The following query returns the top 10 rows by rank according to the weighted averages
of the words performance, comfortable, smooth, safe, and competition:

SELECT a.ProductDescriptionID, a.Description, b.*
FROM Production.ProductDescription a INNER JOIN
      CONTAINSTABLE(Production.ProductDescription, Description,
           N'ISABOUT (performance WEIGHT (.8), comfortable WEIGHT (.6),
           smooth WEIGHT (.2) , safe WEIGHT (.5), competition WEIGHT (.5))', 10)
           b ON a.ProductDescriptionID = b.[Key]
ORDER BY b.[Rank] DESC
GO




         Quick Check
         1. Which predicate performs fuzzy searching by default?
         2. Which predicate is used to perform proximity and synonym searches?




         Quick Check Answers
         1. The FREETEXT and FREETEXTTABLE predicates perform wildcard searches by default.
         2. CONTAINS and CONTAINSTABLE are used for proximity, thesaurus, and
            inflectional searches.




 PRACTICE   Querying with a Full Text Index

In the following practice, you execute several queries to compare the results of CONTAINS, CONTAINSTABLE, FREETEXT, and FREETEXTTABLE.
             1.    Execute the following query and review the contents of the Description column:

                   SELECT Description
                   FROM Production.ProductDescription
                   GO

             2.    Execute the following query and review the rows that are returned:

                   SELECT ProductDescriptionID, Description
                   FROM Production.ProductDescription
                   WHERE FREETEXT(Description,N'bike')
                   GO

              3.   Execute the following query and review the rows that are returned:

                   SELECT a.ProductDescriptionID, a.Description, b.*
                   FROM Production.ProductDescription a
                         INNER JOIN FREETEXTTABLE(Production.ProductDescription,
                             Description,N'bike') b ON a.ProductDescriptionID = b.[Key]
                   ORDER BY b.[Rank]
                   GO

             4.    Execute the following query and review the rows that are returned:

                   SELECT ProductDescriptionID, Description
                   FROM Production.ProductDescription
                   WHERE CONTAINS(Description,N'bike')
                   GO

              5.   Execute the following query and review the rows that are returned:

                   SELECT ProductDescriptionID, Description
                   FROM Production.ProductDescription
                   WHERE CONTAINS(Description,N'"bike*"')
                   GO




6.   Execute the following query and review the rows that are returned:

      SELECT ProductDescriptionID, Description
      FROM Production.ProductDescription
      WHERE CONTAINS(Description,N' FORMSOF (INFLECTIONAL,ride) ')
      GO

 7.   Execute the following query and note that 0 rows are returned because you haven’t
      populated a thesaurus file yet:

      SELECT ProductDescriptionID, Description
      FROM Production.ProductDescription
      WHERE CONTAINS(Description,N' FORMSOF (THESAURUS,metal) ')
      GO

 8.   Execute the following query and review the rows that are returned:

      SELECT a.ProductDescriptionID, a.Description, b.*
      FROM Production.ProductDescription a INNER JOIN
           CONTAINSTABLE(Production.ProductDescription, Description,
               N'bike NEAR performance') b ON a.ProductDescriptionID = b.[Key]
      ORDER BY b.[Rank]
      GO

 9.   Execute the following query and review the rows that are returned:

      SELECT a.ProductDescriptionID, a.Description, b.*
      FROM Production.ProductDescription a INNER JOIN
           CONTAINSTABLE(Production.ProductDescription, Description,
               N'ISABOUT (performance WEIGHT (.8), comfortable WEIGHT (.6),
               smooth WEIGHT (.2) , safe WEIGHT (.5), competition WEIGHT (.5))', 10)
               b ON a.ProductDescriptionID = b.[Key]
      ORDER BY b.[Rank] DESC
      GO



Lesson Summary
- The FREETEXT and CONTAINS predicates return a value of True or False, which you can then use in a query similar to an EXISTS clause to restrict a result set.
- The FREETEXTTABLE and CONTAINSTABLE predicates return a result set that includes a ranking column that tells you how closely a row matched the search term.
- FREETEXT and FREETEXTTABLE perform wildcard searches by default.
- CONTAINS and CONTAINSTABLE can perform wildcard searches along with proximity, word form, and synonym searches.




Lesson Review
           The following questions are intended to reinforce key information presented in Lesson 2,
           “Querying Full Text Data.” The questions are also available on the companion CD if you prefer
           to review them in electronic form.


   NOTE   ANSWERS
   Answers to these questions and explanations of why each answer choice is right or wrong
   are located in the "Answers" section at the end of the book.


             1.   You want to search for two terms based on proximity within a row. Which full text
                  predicates can be used to perform proximity searches? (Choose two. Each forms a
                  separate answer.)
                  A. CONTAINS
                  B. FREETEXT
                  C. CONTAINSTABLE
                  D. FREETEXTTABLE
             2.   You want to perform a proximity search based on a weighting value for the search
                  arguments. Which options for the CONTAINSTABLE predicate should you use?
                  A. FORMSOF with the THESAURUS keyword
                  B. FORMSOF with the INFLECTIONAL keyword
                  C. ISABOUT
                  D. ISABOUT with the WEIGHT keyword




Lesson 3: Managing Full Text Indexes
Although you could derive a significant amount of benefit from just the creation and automatic population of a full text index, you can increase the index's usefulness by creating thesaurus files and building stop lists that filter common words out of the index. In this lesson, you learn how to create and use a thesaurus file, how to build a stop list, and how to repopulate a full text index.


        After this lesson, you will be able to:
        - Maintain thesaurus files
        - Create and manage stop lists
        - Manage the population of full text indexes

        Estimated lesson time: 15 minutes



Thesaurus
You use a thesaurus file to enable full text queries to retrieve rows that match the search
argument along with synonyms of a search argument. A thesaurus is a language-specific
XML file that is stored in the FTDATA directory. After you define it, SQL Server uses the
language-specific thesaurus automatically for FREETEXT and FREETEXTTABLE queries. The
thesaurus is used only for CONTAINS and CONTAINSTABLE queries when you specify the
FORMSOF THESAURUS option.
   A thesaurus can contain expansion sets or replacement sets. A replacement set defines
a term or terms that are replaced within the search argument prior to the word breaker
tokenizing the argument list. An expansion set defines a set of terms that are used to expand
upon a search argument. When an expansion set is used, a match on any term within the
expansion set causes SQL Server to retrieve the row.
   The basic structure of a thesaurus file is

<XML ID="Microsoft Search Thesaurus">
<!--   Commented out
    <thesaurus xmlns="x-schema:tsSchema.xml">
    <diacritics_sensitive>0</diacritics_sensitive>
         <expansion>
             <sub>Internet Explorer</sub>
             <sub>IE</sub>
             <sub>IE5</sub>
         </expansion>
         <replacement>
             <pat>NT5</pat>
             <pat>W2K</pat>
             <sub>Windows 2000</sub>
         </replacement>



<expansion>
                          <sub>run</sub>
                          <sub>jog</sub>
                      </expansion>
                  </thesaurus>
           -->
           </XML>

              The diacritics setting specifies whether the thesaurus is accent-sensitive.
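
   As a sketch of how a replacement set behaves, assume the sample entries above have been uncommented and saved: after you reload the thesaurus file, a FREETEXT search for a pattern term is replaced by its substitution before the word breaker runs. The table and column below are hypothetical:

-- Reload the U.S. English thesaurus file (1033 is the U.S. English locale ID)
EXEC sys.sp_fulltext_load_thesaurus_file 1033;
GO

-- With the NT5/W2K replacement set in place, this query matches rows
-- containing "Windows 2000" even though the search term was "W2K"
SELECT ArticleID, Body
FROM dbo.KnowledgeBase   -- hypothetical table with a full text index on Body
WHERE FREETEXT(Body, N'W2K');
GO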


Stop Lists
Stop lists, known in previous versions of SQL Server as noise word files, are used to exclude words that you do not want included in a full text index. You exclude commonly occurring words from an index so that valid, targeted results can be returned for validly formed searches. Although you might want to search for "Microsoft" across the entire Internet, if your search is being executed across Microsoft product documentation, not only would you likely return every product document that exists within the indexed set, but the designer of the system would also consider such a search request invalid.
   The common stop words for each language, such as the, a, and an, are already accounted for by the full text indexing engine. In addition to the common words, an administrator can add words that are specific to your organization and that are likely to appear frequently within the data you want to index. When a stop word is included as a search argument or encountered within data being indexed, the word breaker categorizes the term as uninteresting and removes it. If the arguments that you submitted within a full text predicate are all stop words, then the query returns no results without ever accessing the data.
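
   As a sketch, you can seed a custom stop list from the system-supplied stop words and then add organization-specific terms; the stop list name and the word added here are illustrative:

CREATE FULLTEXT STOPLIST CorporateStopList FROM SYSTEM STOPLIST;
GO

ALTER FULLTEXT STOPLIST CorporateStopList ADD 'widget' LANGUAGE 1033;
GO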


              EXAM TIP
              Although many features of SQL Server operate the same from one version to another,
              others are enhanced or changed. You can assume that you will have questions on an
              exam which are designed to test whether you know the change in behavior for the new
              version. In SQL Server 2005 and prior versions, you configured noise word files that were
              in the FTDATA directory. In SQL Server 2008, you configure stop lists that are contained
              within a database in SQL Server. It is very likely that if you have a question concerning
              the configuration of stop words, the available answers will include both the SQL Server
              2005 and SQL Server 2008 behaviors and any of the SQL Server 2005 behaviors would be
              incorrect answers.



           Populate Full Text Indexes
           Full text indexes can be populated manually, either on demand or on a schedule, or automatically
           as data underneath the index changes. You can also stop, pause, or resume the population of an
           index to control resource utilization when making large volumes of changes to a full text index.


   The options for populating a full text index are
- FULL  Reprocesses every row from the underlying data to rebuild the full text index completely.
- INCREMENTAL  Processes only the rows that have changed since the last population; requires a timestamp column on the table.
- UPDATE  Processes any changes since the last time the index was updated; requires that the CHANGE_TRACKING option is enabled for the index and set to MANUAL.
   To initiate population of a full text index, you execute the ALTER FULLTEXT INDEX statement.
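
   The following sketch shows each population variant against the ProductDescription index from Lesson 1; which variant is valid depends on the index's CHANGE_TRACKING setting and, for INCREMENTAL, on the presence of a timestamp column:

-- Rebuild the index completely from the underlying data
ALTER FULLTEXT INDEX ON Production.ProductDescription START FULL POPULATION
GO

-- Process only rows changed since the last population (requires a timestamp column)
ALTER FULLTEXT INDEX ON Production.ProductDescription START INCREMENTAL POPULATION
GO

-- Apply tracked changes when CHANGE_TRACKING is set to MANUAL
ALTER FULLTEXT INDEX ON Production.ProductDescription START UPDATE POPULATION
GO

-- Pause and later resume a full population in progress to control resource utilization
ALTER FULLTEXT INDEX ON Production.ProductDescription PAUSE POPULATION
GO
ALTER FULLTEXT INDEX ON Production.ProductDescription RESUME POPULATION
GO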


        Quick Check
        1. Which type of file enables searching based on synonyms?
        2. What do you configure to exclude words from your index and search arguments?

        Quick Check Answers
        1. A thesaurus file allows you to configure synonyms for search arguments.
        2. A stop list contains the list of words that you want excluded from a full text
           index as well as from search arguments.




 PRACTICE   Manage Full Text Indexes

In the following practices, you populate a thesaurus file and compare the search results. You also build a stop list to filter common words from search arguments and the full text index.

PRACTICE 1   Populate a Thesaurus
In this practice, you populate a thesaurus file.
  1.   Execute the following query and verify that you do not return any rows:
       SELECT ProductDescriptionID, Description
       FROM Production.ProductDescription
       WHERE CONTAINS(Description,N' FORMSOF (THESAURUS,metal) ')
       GO

  2.   Open the Tsenu.xml file (U.S. English) located at Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\FTData.
  3.   Change the contents of the file to the following:

       <XML ID="Microsoft Search Thesaurus">
             <thesaurus xmlns="x-schema:tsSchema.xml">
            <diacritics_sensitive>0</diacritics_sensitive>




<expansion>
                                 <sub>metal</sub>
                                 <sub>steel</sub>
                                 <sub>aluminum</sub>
                                 <sub>alloy</sub>
                            </expansion>
                        </thesaurus>
                   </XML>

  4.   Reload the thesaurus file by executing the following (1033 specifies the U.S. English language):

                   USE AdventureWorks
                   GO
                   EXEC sys.sp_fulltext_load_thesaurus_file 1033;
                   GO

              5.   Execute the following query, verify that you now receive 33 rows of data, and compare
                   the rows returned to what you expect based on your thesaurus entry:

                   SELECT ProductDescriptionID, Description
                   FROM Production.ProductDescription
                   WHERE CONTAINS(Description,N' FORMSOF (THESAURUS,metal) ')
                   GO


PRACTICE 2   Build a Stop List
           In this practice, you build a stop list and then compare the results of queries.


   NOTE   COMMAND DELIMITERS
   A semicolon is the Transact-SQL delimiter for a command. In most cases, you do not have
   to specify a command delimiter explicitly. Some commands, however, such as CREATE
   FULLTEXT STOPLIST, require you to specify a command delimiter explicitly for the
   command to execute successfully.


             1.    Execute the following query against the AdventureWorks database and review the
                   16 rows returned:

                   SELECT ProductDescriptionID, Description
                   FROM Production.ProductDescription
                   WHERE CONTAINS(Description,N'"bike*"')
                   GO

             2.    Create a new stop list by executing the following command:

                   CREATE FULLTEXT STOPLIST ProductStopList;
                   GO




3.   Add the word bike to the stop list by executing the following command:

       ALTER FULLTEXT STOPLIST ProductStopList ADD 'bike' LANGUAGE 1033;
       GO

  4.   Associate the stop list to the full text index on the ProductDescription table as follows:

       ALTER FULLTEXT INDEX ON Production.ProductDescription
            SET STOPLIST ProductStopList
       GO

  5.   Execute the following query against the AdventureWorks database and review the
       results:

       SELECT ProductDescriptionID, Description
       FROM Production.ProductDescription
       WHERE CONTAINS(Description,N'"bike*"')
       GO



Lesson Summary
- You manage thesaurus files by editing the language-specific file that is contained within the FTDATA directory for the instance.
- You use the CREATE FULLTEXT STOPLIST and ALTER FULLTEXT STOPLIST commands to build a list of stop words to be excluded from search arguments and the full text index.
- Once a stop list has been built, you can use the ALTER FULLTEXT INDEX command to associate a stop list with a full text index.


Lesson Review
The following questions are intended to reinforce key information presented in Lesson 3,
“Managing Full Text Indexes.” The questions are also available on the companion CD if you
prefer to review them in electronic form.


   NOTE   ANSWERS
   Answers to these questions and explanations of why each answer choice is right or wrong
   are located in the "Answers" section at the end of the book.


  1.   You have a list of words that should be excluded from search arguments. Which action
       should you perform in SQL Server 2008 to meet your requirements with the least
       amount of effort?
       A. Create a stop list and associate the stop list to the full text index.
       B. Create a noise word file and associate the noise word file to the full text index.
       C. Populate a thesaurus file and associate the thesaurus file to the full text index.
       D. Parse the inbound query and remove any common words from the search
            arguments.

Chapter Review
To practice and reinforce the skills you learned in this chapter further, you can
- Review the chapter summary.
- Review the list of key terms introduced in this chapter.
- Complete the case scenario. This scenario sets up a real-world situation involving the topics of this chapter and asks you to create solutions.
- Complete the suggested practices.
- Take a practice test.


Chapter Summary
- Full text indexes can be created against CHAR/VARCHAR, XML, and VARBINARY columns.
- When you full text index a VARBINARY column, you must specify the filter to be used by the word breaker to interpret the document content.
- Thesaurus files allow you to specify a list of synonyms or word replacements for search terms.
- Stop lists exclude a list of words from search arguments and a full text index.


Key Terms
Do you know what these key terms mean? You can check your answers by looking up the terms in the glossary at the end of the book.
- Full text catalog
- Full text filter
- Full text index
- Stemmer
- Stop list
- Thesaurus file
- Word breaker


         Case Scenario
         In the following case scenario, you apply what you’ve learned in this chapter. You can find
         answers to these questions in the “Answers” section at the end of this book.




Case Scenario: Installing and Configuring SQL Server 2008
Wide World Importers is implementing a new set of applications to manage several lines
of business. They need the ability to store large volumes of data within the corporate data
center that can be accessed from anywhere in the world.
   Several business managers need access to operational reports that cover the current
workload of their employees along with new and pending customer requests. They also need
to be able to access large volumes of historical data to spot trends and optimize their staffing
and inventory levels.
   Business managers want to eliminate all the product manuals that are included with their
products and instead direct users to the company Web site. Users should be able to browse
for manuals based on product names or search for text within a manual. The sales force also
would like to enhance the company Web site to allow product descriptions to be created and
searched in multiple languages.
   A large sales force makes customer calls all over the world and needs access to data on
the customers a sales rep is serving, along with potential prospects. The data for the sales
force needs to be available even when the sales reps are not connected to the Internet or the
corporate network. Periodically, sales reps connect to the corporate network and synchronize
their data with the corporate databases.
   A variety of Microsoft Windows applications have been created with Microsoft Visual
Studio .NET, and all data access is performed using stored procedures. The same set of
applications is deployed for users connecting directly to the corporate database server as
well as for sales reps connecting to their own local database servers.
  1.   What features of SQL Server 2008 should be used to store the product manuals?
  2.   What should you configure to allow users to perform searches against a product
       manual?
  3.   To provide the best possible results for searches, which objects should be configured?



Suggested Practices
To help you master the exam objectives presented in this chapter, complete the following
tasks.


Create a Full Text Index
       Create a full text index against a large character data type.


Query a Full Text Index
       Perform various queries using CONTAINS, CONTAINSTABLE, FREETEXT, and
       FREETEXTTABLE against the data you created for a full text index and compare the
       results to what you expect to return.


Create a Thesaurus File
                  Populate a thesaurus file to provide word replacements or synonyms and execute
                  additional queries to review the effect.


         Create a Stop List
                  Create a stop list to exclude common words from your searches and verify the effect
                  when you attempt to utilize excluded words.



         Take a Practice Test
         The practice tests on this book’s companion CD offer many options. For example, you can test
         yourself on just one exam objective, or you can test yourself on all the 70-432 certification
         exam content. You can set up the test so that it closely simulates the experience of taking
         a certification exam, or you can set it up in study mode so that you can look at the correct
         answers and explanations after you answer each question.


            MORE INFO       PRACTICE TESTS
            For details about all the practice test options available, see the section “How to Use the
            Practice Tests,” in the Introduction to this book.




CHAPTER 6


Distributing and Partitioning
Data
Table partitioning was introduced in Microsoft SQL Server 2005 as a means to split
large tables across multiple storage structures. Previously, objects were restricted to a
single filegroup that could contain multiple files. However, the placement of data within a
filegroup was still determined by SQL Server.
   Table partitioning allows tables, indexes, and indexed views to be created on multiple
filegroups while also allowing the database administrator (DBA) to specify which portion of
the object will be stored on a specific filegroup.
   The process for partitioning a table, index, or indexed view is as follows:
  1.   Create a partition function.
  2.   Create a partition scheme mapped to a partition function.
  3.   Create the table, index, or indexed view on the partition scheme.


Exam objective in this chapter:
    Manage data partitions

Lessons in this chapter:
    Lesson 1: Creating a Partition Function

    Lesson 2: Creating a Partition Scheme

    Lesson 3: Partitioning Tables and Indexes

    Lesson 4: Managing Partitions



Before You Begin
To complete the lessons in this chapter, you must have
       An instance of SQL Server 2008 installed using the Enterprise, Developer, or Evaluation edition.




REAL WORLD
                  Michael Hotek



                  One of my customers was having some severe contention issues on their
                  production servers running SQL Server. The contention was so severe at
                  times that their customers could not log in to schedule payments, check account
                  balances, or perform any other actions. The contention issue was tracked down to
                  the archive routines that were mandated by a newly created SOX data retention
                  policy. No matter what they tried, the DBAs could not reduce the overhead of
                  the archive process enough to keep it from affecting customers. The daily archive
                  routines would take 3 to 4 hours to execute, and weekly archives of auditing data
                  could take as much as 22 hours.

                  To solve the contention issues, we partitioned all the tables covered by the SOX data
                  retention policy. Then we implemented a new process utilizing the SPLIT, MERGE,
                  and SWITCH capabilities of partitioning to move segments of data from the OLTP
                  tables into a set of staging tables. Not only did the time required to complete the
                  archive routines reduce from hours to less than 5 seconds, we also eliminated all the
                  data contention against the tables.




Lesson 1: Creating a Partition Function
A partition function defines the set of boundary points for which data will be partitioned.
In this lesson, you learn how to perform the first step in partitioning, which is creating a
partition function.


       After this lesson, you will be able to:
             Create a partition function

       Estimated lesson time: 15 minutes



Partition Functions
A partition function defines the boundary points that will be used to split data across a
partition scheme. Figure 6-1 shows an example of a basic partitioned table.

FIGURE 6-1 A partitioned table (rows of a table are mapped through the boundary points of a partition
function to the filegroups of a partition scheme)


   An example of a partition function is
CREATE PARTITION FUNCTION
mypartfunction (int)
AS RANGE LEFT
FOR VALUES (10,20,30,40,50,60)

   Each partition function requires a name and a data type. The data type defines the limits
of the boundary points that can be applied and must span the same data range or less than
the data type of a column in a table, index, or indexed view to which you want to apply the
partition function.




The data type for a partition function can be any native SQL Server data type, except text,
ntext, image, varbinary(max), timestamp, xml, and varchar(max). You also cannot use Transact-SQL
user-defined data types or Common Language Runtime (CLR) data types. Imprecise data types,
such as real, and computed columns must be persisted. Any columns that are used to partition
must be deterministic.
              The AS clause allows you to specify whether the partition function you are creating is
           RANGE LEFT or RANGE RIGHT. The LEFT and RIGHT parameters define which partition will
           include a boundary point.
               The FOR VALUES clause is used to specify the boundary points for the partition function.
           If the partition function is created as RANGE LEFT, then the boundary point is included in the
           left partition. If the partition function is created as RANGE RIGHT, then the boundary point is
           included in the right partition.
              A partition function always maps the entire range of data; therefore, no gaps are present.
           You cannot specify duplicate boundary points. This ensures that any value stored in a column
always evaluates to a single partition. Null values are always stored in the leftmost partition
unless you explicitly specify null as a boundary point and use the RANGE RIGHT syntax, in
which case nulls are stored in the rightmost partition.
              Because the entire range of values is always mapped for a partition function, the result is
           the creation of one more partition than you have defined boundary points. Table 6-1 shows
           how the following partition function is defined in SQL Server:

           CREATE PARTITION FUNCTION
           mypartfunction (int)
           AS RANGE LEFT
           FOR VALUES (10,20,30,40,50,60)


           TABLE 6-1 Range Left Partition Function

             PARTITION NUMBER             MINIMUM VALUE          MAXIMUM VALUE

             1                            -∞                     10
             2                            11                     20
             3                            21                     30
             4                            31                     40
             5                            41                     50
             6                            51                     60
             7                            61                     +∞



     NOTE   CODE REUSE
                 The definition of a partition function does not provide a clause for an object, column, or
                 storage. This means that a partition function is a stand-alone object that you can apply to
                 multiple tables, indexes, or indexed views if you choose.

Table 6-2 shows how the partitions change when the partition function is defined as
RANGE RIGHT instead, as follows:

CREATE PARTITION FUNCTION
mypartfunction (int)
AS RANGE RIGHT
FOR VALUES (10,20,30,40,50,60)


TABLE 6-2 Range Right Partition Function

 PARTITION NUMBER             MINIMUM VALUE           MAXIMUM VALUE

 1                            -∞                      9
 2                            10                      19
 3                            20                      29
 4                            30                      39
 5                            40                      49
 6                            50                      59
 7                            60                      +∞


   You can have a maximum of 1,000 partitions for an object; therefore, you are allowed to
specify a maximum of 999 boundary points.


     EXAM TIP
     You can partition an existing object after it has been populated with data. To partition
     an existing table, you need to drop the clustered index and re-create the clustered index
     on the partition scheme. To partition an existing index or indexed view, drop the index
     and re-create the index on a partition scheme. You will want to be very careful when
     partitioning existing objects that already contain data, because implementing the partition
     will cause a significant amount of disk input/output (I/O).
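
   As an illustration, a minimal sketch of the drop-and-re-create approach (the table dbo.Sales,
   its primary key pk_sales, and the partition scheme salesscheme are hypothetical names):

   --dbo.Sales, pk_sales, and salesscheme are hypothetical names
   ALTER TABLE dbo.Sales DROP CONSTRAINT pk_sales;
   GO
   ALTER TABLE dbo.Sales ADD CONSTRAINT pk_sales
       PRIMARY KEY CLUSTERED (SaleDate, SaleID)
       ON salesscheme(SaleDate);
   GO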



         Quick Check
         1. What data types cannot be used with partition functions?

         2. What is the maximum number of partitions allowed for a table?

         3. What is the maximum number of boundary points allowed for a partition function?

         Quick Check Answers
         1. You cannot use text, ntext, image, timestamp, xml, varbinary(max), varchar(max),
            or any CLR data types.

         2. The maximum number of partitions for a table is 1,000.

         3. The maximum number of boundary points for a partition function is 999.


PRACTICE   Creating a Partition Function

In this practice, you create a database to use while learning partitioning, and then create a partition
function in the newly created database.
  1.    In Microsoft Windows Explorer, create a directory called C:\Test if one does not already
        exist.
             2.    Open a new query window in SQL Server Management Studio (SSMS).
              3.   Execute the following statement to create a test database:

        --Create a database with multiple filegroups.
        USE master
        GO
        CREATE DATABASE partitiontest
        ON PRIMARY
             (NAME = primary_data, FILENAME = 'c:\test\db.mdf', SIZE = 2MB),
        FILEGROUP FG1
             (NAME = FG1_data, FILENAME = 'c:\test\FG1.ndf', SIZE = 2MB),
        FILEGROUP FG2
             (NAME = FG2_data, FILENAME = 'c:\test\FG2.ndf', SIZE = 2MB),
        FILEGROUP FG3
             (NAME = FG3_data, FILENAME = 'c:\test\FG3.ndf', SIZE = 2MB),
        FILEGROUP FG4
             (NAME = FG4_data, FILENAME = 'c:\test\FG4.ndf', SIZE = 2MB),
        FILEGROUP FG5
             (NAME = FG5_data, FILENAME = 'c:\test\FG5.ndf', SIZE = 2MB),
        FILEGROUP FG6
             (NAME = FG6_data, FILENAME = 'c:\test\FG6.ndf', SIZE = 2MB),
        FILEGROUP FG7
             (NAME = FG7_data, FILENAME = 'c:\test\FG7.ndf', SIZE = 2MB),
        FILEGROUP FG8
             (NAME = FG8_data, FILENAME = 'c:\test\FG8.ndf', SIZE = 2MB),
        FILEGROUP FG9
             (NAME = FG9_data, FILENAME = 'c:\test\FG9.ndf', SIZE = 2MB),
        FILEGROUP FG10
             (NAME = FG10_data, FILENAME = 'c:\test\FG10.ndf', SIZE = 2MB),
        FILEGROUP FG11
             (NAME = FG11_data, FILENAME = 'c:\test\FG11.ndf', SIZE = 2MB),
        FILEGROUP FG12
             (NAME = FG12_data, FILENAME = 'c:\test\FG12.ndf', SIZE = 2MB),
        FILEGROUP FG13
             (NAME = FG13_data, FILENAME = 'c:\test\FG13.ndf', SIZE = 2MB)
        LOG ON
             (NAME = db_log, FILENAME = 'c:\test\log.ndf', SIZE = 2MB, FILEGROWTH = 10% );
                   GO
                   USE partitiontest
                   GO


4.   Create a partition function with boundary points for each month as follows:

       --Create a partition function with boundary points for each month
       CREATE PARTITION FUNCTION partfunc (datetime) AS
       RANGE RIGHT FOR VALUES ('1/1/2005','2/1/2005','3/1/2005','4/1/2005','5/1/2005',
             '6/1/2005','7/1/2005','8/1/2005','9/1/2005','10/1/2005','11/1/2005',
             '12/1/2005')
       GO

  5.   Execute the following command to view the results of step 4:

       SELECT * FROM sys.partition_range_values;




Lesson Summary
       A partition function defines the boundary points for a set of partitions.
       You can create a partition function as either RANGE LEFT or RANGE RIGHT.
       You can utilize any data type except text, ntext, image, timestamp, varbinary(max),
       varchar(max), xml, or CLR data types.


Lesson Review
The following question is intended to reinforce key information presented in Lesson 1,
“Creating a Partition Function.” The question is also available on the companion CD if you
prefer to review it in electronic form.


   NOTE   ANSWERS
   The answer to this question and an explanation of why each answer choice is right or
   wrong is located in the “Answers” section at the end of the book.


  1.   Contoso has a very high-volume transaction system. There is not enough memory on
       the database server to hold the active data set, so a very high number of read and
       write operations are hitting the disk drives directly. After adding several additional
       indexes, the performance still does not meet expectations. Unfortunately, the DBAs
       cannot find any more candidates for additional indexes. There isn’t enough money in
       the budget for additional memory, additional servers, or a server with more capacity.
       However, a new storage area network (SAN) has recently been implemented. What
       technology can Contoso use to increase performance?
       A. Log shipping
       B. Replication
       C. Partitioning
       D. Database mirroring




Lesson 2: Creating a Partition Scheme
           A partition scheme defines the storage structures and collection of filegroups that you
           want to use with a given partition function. In this lesson, you learn how to create partition
           schemes to map a partition to a filegroup.


                  After this lesson, you will be able to:
                     Create a partition scheme

                  Estimated lesson time: 10 minutes



           Partition Schemes
           Partition schemes provide an alternate definition for storage. You define a partition scheme
           to encompass one or more filegroups. The generic syntax for creating a partition scheme is

           CREATE PARTITION SCHEME partition_scheme_name
           AS PARTITION partition_function_name
           [ ALL ] TO ( { file_group_name | [ PRIMARY ] } [ ,...n ] )

              Three examples of partition schemes are as follows:

           CREATE PARTITION SCHEME mypartscheme AS PARTITION mypartfunction TO (Filegroup1,
           Filegroup2, Filegroup3, Filegroup4, Filegroup5, Filegroup6, Filegroup7)


           CREATE PARTITION SCHEME mypartscheme AS PARTITION mypartfunction TO (Filegroup1,
           Filegroup1, Filegroup2, Filegroup2, Filegroup3)


           CREATE PARTITION SCHEME mypartscheme AS PARTITION mypartfunction ALL TO (Filegroup1)

              Each partition scheme must have a name that conforms to the rules for identifiers. You use the AS
           PARTITION clause to specify the name of the partition function that you want to map to the partition
           scheme. The TO clause specifies the list of filegroups that are included in the partition scheme.


   IMPORTANT   FILEGROUPS
   Any filegroup specified in the CREATE PARTITION SCHEME statement must already exist in
   the database.


              A partition scheme must be defined in such a way as to contain a filegroup for each
           partition that is created by the partition function mapped to the partition scheme. SQL Server
           2008 allows the use of the ALL keyword, as shown previously, which allows you to create all
           partitions defined by the partition function within a single filegroup. If you do not use the ALL
           keyword, the partition scheme must contain at least one filegroup for each partition defined
           within the partition function. For example, a partition function with six boundary points (seven
           partitions) must be mapped to a partition scheme with at least seven filegroups defined. If more

filegroups are included in the partition scheme than there are partitions, any excess filegroups
will not be used to store data unless explicitly specified by using the ALTER PARTITION SCHEME
command.


     EXAM TIP
     If you specify the ALL keyword when creating a partition scheme, you can specify a
     maximum of one filegroup.


   Table 6-3 shows how a partition function and partition scheme are mapped to specific
filegroups, as the following code shows:

CREATE PARTITION FUNCTION
mypartfunction (int)
AS RANGE LEFT
FOR VALUES (10,20,30,40,50,60);
GO
CREATE PARTITION SCHEME mypartscheme AS PARTITION mypartfunction TO (Filegroup1,
Filegroup2, Filegroup2, Filegroup4, Filegroup5, Filegroup6, Filegroup7);
GO


TABLE 6-3 Partition Function Mapped to a Partition Scheme

 FILEGROUP          PARTITION NUMBER           MINIMUM VALUE           MAXIMUM VALUE

 Filegroup1         1                          -∞                      10
 Filegroup2         2                          11                      20
 Filegroup2         3                          21                      30
 Filegroup4         4                          31                      40
 Filegroup5         5                          41                      50
 Filegroup6         6                          51                      60
 Filegroup7         7                          61                      +∞



         Quick Check
         1. How many filegroups can you specify if you use the ALL keyword when defining
            a partition scheme?

         2. Can you create a new filegroup at the same time that you are creating a partition
            scheme?

         Quick Check Answers
         1. You can specify exactly one filegroup when using the ALL keyword.

         2. No. Any filegroups that you specify in the CREATE PARTITION SCHEME
            statement must already exist in the database.


PRACTICE   Create a Partition Scheme

           In this practice, you create a partition scheme mapped to the partition function from the
           previous exercise.
             1.   Open a new query window in SSMS and change context to the Partitiontest
                  database.
             2.   Execute the following statement to create a partition scheme mapped to the partition
                  function:

                  CREATE PARTITION SCHEME partscheme AS
                  PARTITION partfunc TO
                  ([FG1], [FG2], [FG3], [FG4], [FG5], [FG6], [FG7], [FG8], [FG9], [FG10], [FG11],
                  [FG12], [FG13])
                  GO
                  --View the partition scheme
                  SELECT * FROM sys.partition_schemes;



           Lesson Summary
                  A partition scheme is a storage definition containing a collection of filegroups.
                  If you specify the ALL keyword, the partition scheme allows only a single filegroup to
                  be specified.
                  If you do not specify the ALL keyword, you must specify enough filegroups to map all
                  the partitions created by the partition function.


           Lesson Review
           The following question is intended to reinforce key information presented in Lesson 2,
           “Creating a Partition Scheme.” The question is also available on the companion CD if you
           prefer to review it in electronic form.


   NOTE   ANSWERS
              The answer to the question and an explanation of why each answer choice is right or wrong
              is located in the “Answers” section at the end of the book.


             1.   Margie’s Travel wants to keep orders in their online transaction processing database
                  for a maximum of 30 days from the date an order is placed. The orders table contains
                  a column called OrderDate that contains the date an order was placed. How can the
                  DBAs at Margie’s Travel move orders that are older than 30 days from the orders
                  table with the least amount of impact on user transactions? (Choose two. Each answer
                  represents a part of the solution.)




A. Use the SWITCH operator to move data partitions containing data that is older
    than 30 days.
B. Create a stored procedure that deletes any orders that are older than 30 days.
C. Partition the order table using the partition function defined for a datetime data
    type using the OrderDate column.
D. Create a job to delete orders that are older than 30 days.




Lesson 3: Creating Partitioned Tables and Indexes
           The final step in creating a partitioned table or index is to create the table or index on a
           partition scheme through a partitioning column. In this lesson, you learn how to create
           partitioned tables and indexes.


        After this lesson, you will be able to:
                      Create a partitioned table
                      Create a partitioned index

                   Estimated lesson time: 10 minutes



           Creating a Partitioned Table
           Creating a partitioned table, index, or indexed view is very similar to creating a nonpartitioned
           table, index, or indexed view. Every object that you create has an ON clause that specifies
           where SQL Server should store the object. The ON clause is routinely omitted, causing SQL
           Server to create objects on the default filegroup. Because a partition scheme is just a definition
           for storage, partitioning a table, index, or indexed view is a very straightforward process.
                An example of a partitioned table follows:

           CREATE TABLE Employee (EmployeeID             int          NOT NULL,
                                        FirstName        varchar(50) NOT NULL,
                                        LastName         varchar(50) NOT NULL)
           ON mypartscheme(EmployeeID);
           GO

              The key is the ON clause. Instead of specifying a filegroup on which to create the
           table, you specify a partition scheme. You have already defined the partition scheme with
           a mapping to a partition function. So you need to specify the column in the table, the
partitioning key, to which the partition function will be applied. In the previous example, we
           created a table named Employee and used the EmployeeID column to partition the table
           based on the definition of the partition function that was mapped to the partition scheme on
           which the table is stored. Table 6-4 shows how the data is partitioned in the Employee table,
           as shown in the following code:

           CREATE PARTITION FUNCTION
           mypartfunction (int)
           AS RANGE LEFT
           FOR VALUES (10,20,30,40,50,60);
           GO
           CREATE PARTITION SCHEME mypartscheme AS PARTITION mypartfunction TO (Filegroup1,
           Filegroup2, Filegroup3, Filegroup4, Filegroup5, Filegroup6, Filegroup7);
           GO



CREATE TABLE Employee (EmployeeID                int            NOT NULL,
                             FirstName           varchar(50) NOT NULL,
                             LastName            varchar(50) NOT NULL)
ON mypartscheme(EmployeeID);
GO



TABLE 6-4 Partition Function Mapped to a Partition Scheme

 FILEGROUP         PARTITION NUMBER         MINIMUM EMPLOYEEID              MAXIMUM EMPLOYEEID

 Filegroup1        1                        -∞                              10
 Filegroup2        2                        11                              20
 Filegroup3        3                        21                              30
 Filegroup4        4                        31                              40
 Filegroup5        5                        41                              50
 Filegroup6        6                        51                              60
 Filegroup7        7                        61                              +∞


   The partitioning key that is specified must match the data type, length, and precision
of the partition function. If the partitioning key is a computed column, the computed
column must be PERSISTED.
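
   For example, a sketch of partitioning on a persisted computed column (the table name and
   the yearscheme partition scheme are hypothetical; yearscheme is assumed to be mapped to an
   int partition function):

   --dbo.OrderHistory and yearscheme are hypothetical names
   CREATE TABLE dbo.OrderHistory (
       OrderID     int         NOT NULL,
       OrderDate   datetime    NOT NULL,
       --The computed partitioning key must be marked PERSISTED
       OrderYear   AS DATEPART(year, OrderDate) PERSISTED
   ) ON yearscheme(OrderYear);
   GO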

     NOTE   PARTIAL BACKUP AND RESTORE
     Partitioning has an interesting management effect on your tables and indexes. Based on
     the definition of the partition function and partition scheme, it is possible to determine the
     set of rows which are contained in a given filegroup. By using this information, it is possible
     to back up and restore a portion of a table as well as to manipulate the data in a portion of
     a table without affecting any other part of the table.



Creating a Partitioned Index
Similar to creating a partitioned table, you partition an index by specifying a partition scheme
in the ON clause, as in the following code example:

CREATE NONCLUSTERED INDEX idx_employeefirstname
       ON dbo.Employee(FirstName) ON mypartscheme(EmployeeID);
GO

   When specifying the partitioning key for an index, you are not limited to the columns
on which the index is defined. As you learned in Chapter 4, “Designing SQL Server Indexing,”
an index can have an optional INCLUDE clause. When you create an index on a partitioned
table, SQL Server automatically includes the partitioning key in the definition of each index,
thereby allowing you to partition an index the same way as the table is partitioned.


                   Quick Check
                   1. What property must be set to use a computed column as a partitioning key?

                   2. Which clause of the CREATE TABLE or CREATE INDEX statements is used to
                      partition the object?

                   Quick Check Answers
                   1. A computed column must be PERSISTED.

                   2. The ON clause is used to specify the storage structure, filegroup, or partition
                      scheme, for the table or index.




PRACTICE   Partitioning a Table

           In this practice, we create a partitioned table using the partition function and partition
           scheme you created in previous exercises.
             1.    Open a new query window in SSMS and change context to the Partitiontest database.
             2.    Create an orders table on the partition scheme, as follows:

                   CREATE TABLE dbo.orders (
                         OrderID          int        identity(1,1),
                         OrderDate        datetime NOT NULL,
                         OrderAmount      money      NOT NULL
                        CONSTRAINT pk_orders PRIMARY KEY CLUSTERED (OrderDate,OrderID))
                   ON partscheme(OrderDate)
                   GO

              3.   Populate some data into the orders table by executing the following code:

                   SET NOCOUNT ON
                   DECLARE @month      int,
                               @day    int


                   SET @month = 1
                   SET @day = 1


                   WHILE @month <= 12
                   BEGIN
                         WHILE @day <= 28
                         BEGIN
                               INSERT dbo.orders (OrderDate, OrderAmount)
                               SELECT cast(@month as varchar(2)) + '/' + cast(@day as varchar(2))
                                   + '/2005', @day * 20


                               SET @day = @day + 1
                         END

SET @day = 1
             SET @month = @month + 1
       END
       GO

  4.   View the basic data distribution by executing the following:

       SELECT * FROM sys.partitions
       WHERE object_id = OBJECT_ID('dbo.orders')
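
        You can also use the $PARTITION function to count the rows that fall into each
        partition (a hedged example using the partfunc function created earlier):

        SELECT $PARTITION.partfunc(OrderDate) AS partition_number,
               count(*) AS row_count
        FROM dbo.orders
        GROUP BY $PARTITION.partfunc(OrderDate)
        ORDER BY partition_number;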



Lesson Summary
       The ON clause is used to specify the storage structure, filegroup, or partition scheme
       to store a table or index.
       The partitioning key must match the data type, length, and precision of the partition
       function.
       A computed column used as a partitioning key must be persisted.


Lesson Review
The following question is intended to reinforce key information presented in Lesson 3,
“Creating Partitioned Tables and Indexes.” The question is also available on the companion CD
if you prefer to review it in electronic form.


   NOTE      ANSWERS
   The answer to the question and an explanation of why each answer choice is right or wrong
   is located in the “Answers” section at the end of the book.


  1.   Wide World Importers has a very large and active data warehouse that is required to
       be accessible to users 24 hours a day, 7 days a week. The DBA team needs to load new
       sets of data on a weekly basis to support business operations. Inserting large volumes
       of data would affect users unacceptably. Which feature should be used to minimize the
       impact while still handling the weekly data loads?
       A. Transactional replication
       B. The SWITCH operator within partitioning
       C. Database mirroring
       D. Database snapshots




Lesson 4: Managing Partitions
           After you partition a table or index, SQL Server automatically stores the data according to the
           definition of your partition function and partition scheme. Over time, the partitioning needs
           of your data can change. In this lesson, you learn how to change the definition of a partition
           function, partition scheme, and manage partitions within a database.


       After this lesson, you will be able to:
                     Add and remove boundary points from a partition function
                     Add filegroups to a partition scheme
                     Designate a filegroup to be used for the next partition created
                     Move partitions between tables

                  Estimated lesson time: 20 minutes



           Split and Merge Operators
           With data constantly changing, partitions are rarely static. Two operators are available to
           manage the boundary point definitions—SPLIT and MERGE.
              The SPLIT operator introduces a new boundary point into a partition function. MERGE
           eliminates a boundary point from a partition function. The general syntax is as follows:

           ALTER PARTITION FUNCTION partition_function_name()
           {SPLIT RANGE ( boundary_value )
             | MERGE RANGE ( boundary_value ) } [ ; ]

               You must be very careful when using the SPLIT and MERGE operators. You are either
           adding or removing an entire partition from the partition function. Data is not being removed
           from the table with these operators, only the partition. Because a partition can reside only
           in a single filegroup, a SPLIT or MERGE could cause a significant amount of disk I/O as SQL
           Server relocates rows on the disk.
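
   For example, removing the 50 boundary point from the mypartfunction function shown earlier
merges the two adjacent partitions into one (a sketch; rows from the dropped partition are moved
into the remaining partition):

ALTER PARTITION FUNCTION mypartfunction()
MERGE RANGE (50);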


           Altering a Partition Scheme
           You can add filegroups to an existing partition scheme to create more storage space for a
           partitioned table. The general syntax is as follows:

           ALTER PARTITION SCHEME partition_scheme_name
           NEXT USED [ filegroup_name ] [ ; ]

              The NEXT USED clause has two purposes:
             1.   It adds a new filegroup to the partition scheme, if the specified filegroup is not already
                  part of the partition scheme.
             2.   It marks the NEXT USED property for a filegroup.

The filegroup that is marked with the NEXT USED flag is the filegroup that contains the
next partition that is created when a SPLIT operation is executed.
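
   For example, a sketch that adds a hypothetical Filegroup8 to the mypartscheme scheme, marks
it as next used, and then splits a new partition into it:

--Filegroup8 and the boundary value 70 are illustrative
ALTER PARTITION SCHEME mypartscheme
NEXT USED Filegroup8;
GO
ALTER PARTITION FUNCTION mypartfunction()
SPLIT RANGE (70);
GO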


Index Alignment
You can partition a table and its associated indexes differently. The only requirement is
that you must partition the clustered index and the table the same way because SQL Server
cannot store the clustered index in a structure separate from the table.
   However, if a table and all its indexes are partitioned using the same partition function,
they are said to be aligned. If a table and all its indexes use the same partition function and
the same partition scheme, the storage is aligned as well. A basic diagram of a storage-aligned
table is shown in Figure 6-2.



FIGURE 6-2 Storage alignment (a partitioned index and partitioned table stored on the same filegroups)


   By aligning the storage, rows in a table along with the indexes dependent upon the
rows are stored in the same filegroups. This ensures that if a single partition is backed up or
restored, the data and corresponding indexes are kept together as a single unit.
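
   For instance, a single filegroup of the partitiontest database used in this chapter's practices
could be backed up on its own (a sketch; the backup path is hypothetical):

--The backup path is hypothetical
BACKUP DATABASE partitiontest
FILEGROUP = 'FG2'
TO DISK = 'c:\test\fg2.bak';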


Switch Operator
At this point, partitioning is probably about as clear as mud. After all, the purpose of
partitioning is to split a table and its associated indexes into multiple storage structures. The
purpose of each operator is to manage the multiple storage structures. However, partitioning
allows advanced data management features that go well beyond simply storing a portion of a
table in a filegroup. To understand the effect, we must take a step back and look at the basic
layout of data within SQL Server.

SQL Server stores data on pages in a doubly linked list. To locate and access data, SQL
           Server performs the following basic process:
             1.    Resolve the table name to an object ID.
             2.    Locate the entry for the object ID in sys.indexes to extract the first page for the
                   object.
              3.   Read the first page of the object.
             4.    Using the Next Page and Previous Page entries on each data page, walk the page chain
                   to locate the data required.
              The first page in an object does not have a previous page; therefore, the entry value is set
           to 0:0. The last page of the object does not have a next page entry, so the entry value is set
           to 0:0. When a value of 0:0 for the next page is located, SQL Server does not have to read any
           further.
              What does the page chain structure have to do with partitioning? When a table is
           partitioned, the data is physically sorted, split into sections, and stored in filegroups. So from the
           perspective of the page chain, SQL Server finds the first page of the object in partition 1; walks
           the page chain; reaches the last page in partition 1, which points to the first page in partition 2;
           and continues through the rest of the table. By creating a physical ordering of the data, a very
           interesting possibility becomes available.
              If you were to modify the page pointer on the last page of partition 1 to have a value of
           0:0 for the next page, SQL Server would not read past that point, and it would cause data
           to “disappear” from the table. There would not be any blocking or deadlocking because a
           simple, metadata-only operation had occurred to update the page pointer. The basic idea for
           a metadata operation is shown in Figure 6-3.
             It would be nice to be able to simply discard a portion of a table. However, SQL Server
           does not allow you to throw away data. This is where the SWITCH operator comes in.
           The basic idea is that SWITCH allows you to exchange partitions between tables in a
           perfectly scalable manner with no locking, blocking, or deadlocking.
             SWITCH has several requirements to ensure that the operation is perfectly scalable. The
           most important requirements are the following:
                   The data and index for the source and target tables must be aligned.
                   Source and target tables must have the same structure.
                   Data cannot be moved from one filegroup to another.
                   Two partitions with data cannot be exchanged.
                   The target partition must be empty.
        The source or target table cannot participate in replication.
                   The source or target tables cannot have full text indexes or a FILESTREAM data type
                   defined.




FIGURE 6-3 Doubly linked list (a metadata edit sets a next-page pointer to 0:0, ending the page chain at
the partition boundary)

   By meeting these requirements, you can accomplish an effect similar to Figure 6-4.

FIGURE 6-4 Switching a partition (partition 4 of Table1 is exchanged with the empty partition 4 of Table2)
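
   The statement depicted in Figure 6-4 would look similar to the following sketch (Table1 and
Table2 are hypothetical tables with aligned structures and storage):

--Table1 and Table2 are hypothetical aligned tables
ALTER TABLE dbo.Table1
SWITCH PARTITION 4 TO dbo.Table2 PARTITION 4;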
EXAM TIP
              For the 70-432 exam, you need to know that in a SWITCH operation, you cannot move data
              from one filegroup to another or exchange two partitions with data.




                   Quick Check
                   1. Which operators are used to add or remove boundary points from a partition
                      function?

                   2. Which operator is used to move partitions between tables?

                   Quick Check Answers
                   1. The SPLIT operator is used to introduce a new boundary point. The MERGE
                      operator is used to remove a boundary point.

                   2. The SWITCH operator is used to move partitions between tables.




PRACTICE   Sliding Window Scenario

           In this practice, we use the SPLIT, MERGE, and SWITCH operators to remove data from a table
           so that it can be archived without affecting query performance on the operational table.
In the Lesson 3 practice, you set up the orders table with 12 months of order data. In this
           practice, using the SPLIT operation, we create a new partition for January 2006. Using the
           SWITCH function, we remove the partition for January 2005 so that it can be archived. Using
           the MERGE function, we eliminate the boundary point for January 2005.
              The data in the orders table should look like the following:
           TABLE 6-5 Orders Table Data Distribution

             FILEGROUP          MINIMUM DATE            MAXIMUM DATE

             FG1                -∞                      12/31/2004
             FG2                1/1/2005                1/31/2005
             FG3                2/1/2005                2/28/2005
             FG4                3/1/2005                3/31/2005
             FG5                4/1/2005                4/30/2005
             FG6               5/1/2005                 5/31/2005
             FG7                6/1/2005                6/30/2005
             FG8               7/1/2005                 7/31/2005
             FG9                8/1/2005                8/31/2005




 FG10                9/1/2005                 9/30/2005
 FG11                10/1/2005                10/31/2005
 FG12                11/1/2005                11/30/2005
 FG13                12/1/2005                +∞


  1.    Alter the partition scheme to set the NEXT USED flag on FG1 as follows:

        ALTER PARTITION SCHEME partscheme
        NEXT USED [FG1];
        GO

  2.    Introduce a new boundary point for January 2006 as follows:

        ALTER PARTITION FUNCTION partfunc()
        SPLIT RANGE ('1/1/2006');
        GO

  3.    Create an archive table for the January 2005 orders as follows:

        CREATE TABLE dbo.ordersarchive (
              OrderID         int          NOT NULL,
              OrderDate       datetime NOT NULL
                   CONSTRAINT ck_orderdate CHECK (OrderDate<'2/1/2005'),
              OrderAmount     money        NOT NULL
             CONSTRAINT pk_ordersarchive PRIMARY KEY CLUSTERED (OrderDate,OrderID)
              )
        ON FG2
        GO

  4.    Use the SWITCH operator to detach the January 2005 partition from orders and attach
        it to ordersarchive as follows:

        ALTER TABLE dbo.orders
        SWITCH PARTITION 2 TO dbo.ordersarchive
        GO

  5.    Remove the boundary point for January 2005 as follows:

        ALTER PARTITION FUNCTION partfunc()
        MERGE RANGE ('1/1/2005');
        GO

  6.    Verify the contents of the orders and ordersarchive tables.
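
        A quick way to verify is to compare row counts in the two tables (a hedged example):

        SELECT count(*) AS orders_rows FROM dbo.orders;
        SELECT count(*) AS archive_rows FROM dbo.ordersarchive;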




Lesson Summary
                  SPLIT is used to introduce a new boundary point to a partition function.
                  MERGE is used to remove a boundary point from a partition function.
                  SWITCH is used to move partitions between tables.


           Lesson Review
           The following question is intended to reinforce key information presented in Lesson 4,
           “Managing Partitions.” The question is also available on the companion CD if you prefer to
           review it in electronic form.


   NOTE   ANSWERS
              The answer to this question and an explanation of why each answer choice is right or
              wrong is located in the “Answers” section at the end of the book.


             1.   Contoso Limited has a very high-volume order entry system. Management has
                  determined that orders should be maintained in the operational system for a
                  maximum of six months before being archived. After data is archived from the table,
                  it is loaded into the data warehouse. The data load occurs once per month. Which
                  technology is the most appropriate choice for archiving data from the order entry
                  system?
                  A. Database mirroring
                  B. Transactional replication
                  C. Database snapshots
                  D. Partitioning




Chapter Review
To practice and reinforce the skills you learned in this chapter further, you can
       Review the chapter summary.
       Review the list of key terms introduced in this chapter.
       Complete the case scenario. This scenario sets up a real-world situation involving the
       topics of this chapter and asks you to create solutions.
       Complete the suggested practices.
       Take a practice test.


Chapter Summary
       Partitioning allows you to divide a table or index into multiple filegroups.
       Tables and indexes are partitioned horizontally, based on rows, by specifying a
       partitioning column.
       To create a partitioned table or index, you need to perform the following actions:

       •   Create a partition function.

       •   Create a partition scheme mapped to the partition function.

       •   Create a table or index on the partition scheme.
       The $PARTITION function allows you to limit queries to a specific partition.
       You use the SPLIT function to add a new boundary point and hence a partition.
       You use the MERGE function to remove a boundary point and hence a partition.
       You use the SWITCH function to move a partition of data between tables.


Key Terms
Do you know what these key terms mean? You can check your answers by looking up the
terms in the glossary at the end of the book.
       Index alignment
       Partition function
       Partitioning key
       Partition scheme


Case Scenario
In the following case scenario, you apply what you’ve learned in this chapter. You can find
answers to these questions in the “Answers” section at the end of this book.




Case Scenario: Building a SQL Server Infrastructure for Coho Vineyard
         BACKGROUND
         Company Overview
         Coho Vineyard was founded in 1947 as a local, family-run winery. Due to the award-winning
         wines produced over the last several decades, Coho Vineyards has experienced significant
         growth. Today, the company owns 12 wineries spread across the upper midwestern United
         States and employs 400 people, 74 of whom work in the central office that houses servers
         critical to the business.

         Planned Changes
         Until now, each of the 12 wineries owned by Coho Vineyard has run a separate Web site locally
         on the premises. Coho Vineyard wants to consolidate the Web presence of these wineries so
         that Web visitors can purchase products from all 12 wineries from a single online store. All
         data associated with this Web site will be stored in databases in the central office.

         EXISTING DATA ENVIRONMENT
         Databases
         Each winery presently maintains its own database to store all business information. At the
         end of each month, this information is brought to the central office and transferred into the
         databases shown in Table 6-6.

         TABLE 6-6 Coho Vineyard Databases

          DATABASE                                SIZE

          Customer                                180 MB
          Accounting                              500 MB
          HR                                      100 MB
          Inventory                               250 MB
          Promotions                              80 MB

            After the database consolidation project is complete, a new database named Order will
         serve as a back end to the new Web store. As part of their daily work, employees also will
         connect periodically to the Order database using a new in-house Web application.

         Database Servers
         A single server named DB1 contains all the databases at the central office. DB1 is running SQL
         Server 2008, Enterprise Edition on Windows Server 2003, Enterprise Edition.

         Business Requirements
         PERFORMANCE
         You need to design an archiving solution for the Customer and Order databases. Your archival
         strategy should allow the Customer data to be saved for six years.

To prepare the Order database for archiving procedures, you create a partitioned table
named Order.Sales. Order.Sales includes two partitions. Partition 1 will include sales activity
for the current month. Partition 2 will be used to store sales activity for the previous month.
Orders placed before the previous month should be moved to another partitioned table
named Order.Archive. Partition 1 of Order.Archive will include all archived data. Partition 2
will remain empty.
   These two tables and their partitions are illustrated in Figure 6-5.

                      Order.Sales

        Partition 1                  Partition 2




       This month’s                 Last month’s
           data                         data




                  Order.Archive

        Partition 1                  Partition 2




    All data before                   (Empty)
      last month




FIGURE 6-5 Partitions at Coho Vineyards


   SPLIT, MERGE, and SWITCH operations will be used to move the data among these partitions.
   Answer the following questions.
  1.    Which of the following methods allows you to meet the archival requirements for the
        Customer database with the least amount of administrative overhead?
        A. Import all Customer data into a Microsoft Office Excel spreadsheet. Save the
             spreadsheets on disk for six years.
        B. Perform a monthly backup of all Customer data to tape. Save the backups in a
             secure location for six years.
        C. Create a new database named ArchiveData and use database replication to
             migrate the Customer data into the new database every month. Keep all data in
             the ArchiveData database for at least six years.
        D. Do not copy the Customer data to another location. Simply save all data on the
             Customer database for at least six years.

2.     How should you move Partition 2 of the Order.Sales table to the Order.Archive table?
                  A. Use a SPLIT operation to move data to Partition 1 on Order.Archive.
                  B. Use a SPLIT operation to move data to Partition 2 on Order.Archive.
                  C. Use a SWITCH operation to move data to Partition 1 on Order.Archive.
                  D. Use a SWITCH operation to move data to Partition 2 on Order.Archive.
           3.     Which of the following two partitions must be located on the same filegroup?
                  A. Partitions 1 and 2 on Order.Sales
                  B. Partitions 1 and 2 on Order.Archive
                  C. Partition 2 on Order.Sales and Partition 2 on Order.Archive
                  D. Partition 1 on Order.Sales and Partition 1 on Order.Archive



         Suggested Practices
         To help you master the exam objectives presented in this chapter, complete the following
         tasks.


         Partitioning
         For this task, you practice partitioning tables and using the SPLIT, MERGE, and SWITCH
         operators to archive data as well as load data.
                  Practice 1   Create a partitioned table and practice adding filegroups as well as
                  splitting and merging partitions.
                  Practice 2 Using the partitioned table in Practice 1, create an archive table to use
                  with the SWITCH operator to remove data.
                  Practice 3 Using the archive table created in Practice 2, use the SWITCH operator to
                  append the data to another table.



         Take a Practice Test
         The practice tests on this book’s companion CD offer many options. For example, you can test
         yourself on just one exam objective, or you can test yourself on all the 70-432 certification
         exam content. You can set up the test so that it closely simulates the experience of taking
         a certification exam, or you can set it up in study mode so that you can look at the correct
         answers and explanations after you answer each question.


            MORE INFO          PRACTICE TESTS
            For details about all the practice test options available, see the section “How to Use the
            Practice Tests,” in the Introduction to this book.



CHAPTER 7


Importing and Exporting
Data
Most applications are designed to allow users to manipulate individual pieces of data.
However, there are times when you need to import or export large volumes of data.
When importing a large volume of data, issuing individual INSERT statements is not very
efficient. Likewise, when exporting data, it is not very efficient to return a result set to an
application, which then has to write the rows out to a file or some other destination. In this chapter, you will
learn about the features that SQL Server has available to efficiently import as well as export
large volumes of data while consuming minimal resources.


Exam objective in this chapter:
   Import and export data.

Lesson in this chapter:
   Lesson 1: Importing and Exporting Data     163



Before You Begin
To complete the lessons in this chapter, you must have:
      Microsoft SQL Server 2008 installed
      The AdventureWorks database installed within the instance


       REAL WORLD
       Michael Hotek



A couple of years ago, I was working with a customer that had an entire division
           of the company focused on fulfilling orders for partners. Once per day,
      partners would upload files to our FTP server with their orders. The files would
      be parsed; loaded into a database; and then routed through the picking, packing,
      shipping, and invoicing process. Unfortunately, the process could take anywhere



from two hours to seven hours to import the orders of each partner, but the
business needed all partners' files imported within an hour to meet the agreements
                  with their customers. It was very common to have 30 or more files sitting in a folder
                  waiting to be processed. Furthermore, only about 5 percent of the partners were
                  even allowed to upload orders in bulk because the system could not handle any
                  additional load.

                  The system that was built to import the orders was composed of about a dozen C++
                  applications, in excess of 30 folders spanning three servers, and a small amount of
code within the database where the orders were imported. The applications had been
written over a decade ago, and more than 90 percent of the work they performed was
moving files between directories; the only purpose of those directories was to
isolate files during processing. Further research uncovered code to manage multiple
                  applications attempting to access the same file, a situation that was actually created
                  by the way that the whole system was put together.

After the files finally reached the stage where processing that the business
actually cared about took place, we found an application that read one row at
a time from the file and processed it. For each row that was processed, the
                  application executed in excess of 100 queries to validate product codes, inventory
                  on hand, price levels, and several other business rules.

                  We rewrote the entire system to use the bulk import capabilities of SQL Server.
                  The first phase of the rewrite replaced all the C++ code as well as the entire folder
                  structure with a single folder, one stored procedure to BCP the files, and one stored
                  procedure to process everything after the file was imported. A subsequent phase
replaced the BCP routine with an SSIS package that was capable of processing multiple
files in parallel and more flexible in dealing with multiple data formats.

                  At the completion of phase 1, the import routine was outrunning the ability of
                  partners to upload files. Within one minute of the file being delivered, all the data
                  was imported into SQL Server and processed, and the orders were at the warehouse
                  queued up for packing. After we finished phase 2, we were able to extend the
                  order upload service to the other 95 percent of the partners, and even the largest
                  partner’s order files were processed and acknowledged within 15 seconds of the file
                  being delivered. The direct result to the business was not only the retention of their
                  existing customers, but an increase in their customer base, which fueled a business
                  increase of more than 400 percent within six months of implementing the new
                  system.




Lesson 1: Importing and Exporting Data
The BULK INSERT command and Bulk Copy Program (BCP) are used to provide limited import
and export capabilities. In this lesson, you learn how to use the BCP utility to export as well as
import data. You also learn how to import data using the BULK INSERT command and the SQL
Server Import and Export Wizard available within SQL Server Management Studio (SSMS).


      After this lesson, you will be able to:
          Export data to a file using BCP
          Import data from a file using BCP
          Import data from a file using BULK INSERT
          Import and export data using the SQL Server Import and Export Wizard

      Estimated lesson time: 20 minutes



Bulk Copy Program
BCP.exe, known simply as BCP, is arguably the oldest piece of code within SQL Server. I can
remember using BCP to import and export data before Microsoft even licensed the first
version of SQL Server from Sybase. Although it has been enhanced over the more than two
decades that I have been using it, the original syntax, purpose, and performance have not
changed. Simply put, BCP is the most efficient way to import well-defined data from files
into SQL Server as well as to export tables to a file.
    BCP is designed as a very fast, lightweight solution for importing and exporting data. BCP
is also designed to handle well-formatted data. If you need to perform transformations or to
perform error-handling routines during the import/export process, you should be using SQL
Server Integration Services (SSIS) for your import/export processes.
   If you are exporting data using BCP, the account that BCP is running under needs only
SELECT permissions on the table or view. If you are importing data, the account that BCP is
running under needs SELECT, INSERT, and ALTER TABLE permissions.
   BCP is a utility that you execute from the command line and has the following syntax:

bcp {[[database_name.][owner].]{table_name | view_name} | "query"}
    {in | out | queryout | format} data_file
    [-mmax_errors] [-fformat_file] [-x] [-eerr_file]
    [-Ffirst_row] [-Llast_row] [-bbatch_size]
    [-n] [-c] [-w] [-N] [-V (60 | 65 | 70 | 80)] [-6]
    [-q] [-C { ACP | OEM | RAW | code_page } ] [-tfield_term]
    [-rrow_term] [-iinput_file] [-ooutput_file] [-apacket_size]
    [-Sserver_name[\instance_name]] [-Ulogin_id] [-Ppassword]
    [-T] [-v] [-R] [-k] [-E] [-h"hint [,...n]"]



Although BCP has many options, also known as command-line switches, you most commonly
           use a small set of them.


   CAUTION   CASE SENSITIVITY
   All command-line switches for BCP are case-sensitive. For example, you use –e to specify an
   error file, yet –E tells BCP to preserve identity values during an import.


              Below are three examples of common BCP commands:

bcp AdventureWorks.HumanResources.Department out c:\test\department.txt -n -SHOTEK -T


bcp AdventureWorks.HumanResources.Department in c:\test\department.txt -c
       -SHOTEK -U<login> -P<password>


           bcp "SELECT Name, GroupName FROM HumanResources.Department" queryout
                  c:testdepartment.txt -n -SHOTEK –T



   NOTE   EXPORT SOURCES
   All the discussions for importing and exporting data in this chapter refer to a table for
   simplicity. You can export from tables or views. You can also import data into a view so
   long as the view meets the requirements for executing an INSERT statement.


   The first argument specifies the table, view, or query that BCP operates upon. The second
argument can be set to in, out, or queryout. When this argument is set to in, the BCP command
imports the entire contents of the specified file into the specified table. If it is set to out, BCP
exports the entire contents of the table into the specified file. If you want to export only a
subset of a table or the result set of a query, you can replace the name of the table with a
query delimited by double quotes and then specify the queryout parameter. As the name
implies, the queryout option allows you only to export data. The third argument specifies the
file that is the source or target of the BCP command.


              EXAM TIP
              The exam tests you on whether you know which import/export option is most appropriate
              to a given situation.


              Following the BCP arguments are a set of command-line switches that you can specify
           in any order, but most database administrators (DBAs) follow the convention of specifying
           switches in the same order as they appear in the general syntax listing.
              The –n and –c switches are mutually exclusive. The –n switch specifies that the data in
           the file is in the native format of SQL Server. The –c switch specifies that data in the file is in
a character format. If you are exporting data that will be imported into another SQL Server



instance, you should use the –n switch because it provides better performance by allowing
SQL Server to dump data in the internal storage format that SQL Server uses on the data
pages. If you need to move the file between database platforms or to other non–SQL Server
systems, you should use the –c switch, which converts the data from the native storage
format of SQL Server into a standard character-based format. The switch that is specified
when importing data from a file is dictated by the format of the file that you receive, because
BCP cannot convert data between storage formats during an import.
    When you execute Transact-SQL (T-SQL) commands, you don’t have to specify the
instance or database context because the connection properties already encapsulate
the connection information. Because BCP is an application, it does not have any database
or instance context. Therefore, you have to specify the connection information to use. The
–S switch specifies the instance name to connect to. You can log in to an instance using either
SQL Server or Microsoft Windows credentials. The –T switch designates a trusted connection
and BCP uses the Windows credentials of the account that is executing the BCP command to
connect. You can use a SQL Server login by specifying the –U and –P switches. –U specifies
the login name and –P specifies the password to use.


   NOTE   ENFORCING CHECK CONSTRAINTS AND TRIGGERS
   When you import data into a table using BCP, triggers and check constraints are disabled
   by default. If you want to enforce check constraints and fire triggers during the import,
   you need to use the –h switch. If you enforce triggers and check constraints during an
   import (rather than letting BCP disable them), you do not need ALTER TABLE permissions.
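
   For example, a command along the following lines (the instance name and file path are
placeholders) imports a file while checking constraints and firing triggers, so only SELECT
and INSERT permissions are required:

bcp AdventureWorks.HumanResources.Department in c:\test\department.txt -c
    -S<instance name> -T -h "CHECK_CONSTRAINTS, FIRE_TRIGGERS"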




The BULK INSERT Command
One of the drawbacks of the BCP utility is that it is a command-line program. The BULK INSERT
command has many of the same options as BCP and behaves almost identically, except for
the following two differences:
      BULK INSERT cannot export data.
      BULK INSERT is a T-SQL command and does not need to specify the instance name or
      login credentials.
   The general syntax for BULK INSERT is:

BULK INSERT
   [ database_name . [ schema_name ] . | schema_name . ] [ table_name | view_name ]
      FROM 'data_file'
      [ WITH
     (
   [ [ , ] BATCHSIZE = batch_size ]
   [ [ , ] CHECK_CONSTRAINTS ]
   [ [ , ] CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' } ]
   [ [ , ] DATAFILETYPE = { 'char' | 'native' | 'widechar' | 'widenative' } ]
   [ [ , ] FIELDTERMINATOR = 'field_terminator' ]
   [ [ , ] FIRSTROW = first_row ]
   [ [ , ] FIRE_TRIGGERS ]
   [ [ , ] FORMATFILE = 'format_file_path' ]
   [ [ , ] KEEPIDENTITY ]
   [ [ , ] KEEPNULLS ]
   [ [ , ] KILOBYTES_PER_BATCH = kilobytes_per_batch ]
   [ [ , ] LASTROW = last_row ]
   [ [ , ] MAXERRORS = max_errors ]
   [ [ , ] ORDER ( { column [ ASC | DESC ] } [ ,...n ] ) ]
   [ [ , ] ROWS_PER_BATCH = rows_per_batch ]
   [ [ , ] ROWTERMINATOR = 'row_terminator' ]
   [ [ , ] TABLOCK ]
   [ [ , ] ERRORFILE = 'file_name' ]
     ) ]
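
   To make the syntax concrete, the following is a minimal sketch rather than a tested recipe:
the file path is an assumption, and the terminators must match those used when the file was
created (the values shown correspond to the defaults of a BCP character-mode export):

BULK INSERT AdventureWorks.HumanResources.Department
   FROM 'c:\test\department.txt'
   WITH
      (DATAFILETYPE = 'char',       -- character-format file, as produced by bcp -c
       FIELDTERMINATOR = '\t',      -- tab between columns (the bcp -c default)
       ROWTERMINATOR = '\n',        -- newline between rows (the bcp -c default)
       TABLOCK);                    -- take a table lock for faster bulk loading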



           The SQL Server Import and Export Wizard
           BCP and the BULK INSERT command provide a simple, lightweight means to import and
           export data through the use of files. If you want to import and export data directly between
           source and destination, as well as apply transformations and error-handling routines, you can
           use the capabilities of SSIS to build packages.
              The SQL Server Import and Export Wizard provides a subset of the SSIS capabilities within
           SSMS to enable administrators to move data between a source and destination. You access
           the wizard by right-clicking a database within Object Explorer, selecting Tasks, and then
           selecting either Import Data or Export Data.
              Although BCP and BULK INSERT use files, the Import and Export Wizard can use any data
           source that is recognized by SSIS, such as Microsoft Office Excel, Microsoft Office Access, or
           Extensible Markup Language (XML) files. In addition, the Import and Export Wizard supports
           any data source or destination for which you have an Object Linking and Embedding
           Database (OLE DB) provider. BCP and BULK INSERT require a SQL Server instance to be either
           the source or target for the data, but the Import and Export Wizard does not require a SQL
           Server instance to be either the source or destination. Finally, the Import and Export Wizard
can move data from multiple tables or files in a single operation, whereas BCP and BULK
INSERT can operate only against a single table, view, or query.


              MORE INFO     INTEGRATION SERVICES
              The full capabilities of SSIS packages are beyond the scope of this book. For an overview
              of SSIS capabilities, please refer to Microsoft SQL Server 2008 Step by Step by Mike Hotek
              (Microsoft Press, 2008).




       Quick Check
       1.    What are the data formats that BCP supports and the command-line switches for
                     each format?

                  2. Which parameter do you specify to export data using a query?

                  3. The Import and Export Wizard is based on which feature of SQL Server?

                  4. Which sources and destinations is the Import and Export Wizard capable of using?




Quick Check Answers
       1.   BCP can work with data in either a character or native format. The –c switch
            designates character mode, while the –n switch is used for native mode.

       2.   The queryout parameter is used to export the result set of a query.

       3.   The Import and Export Wizard uses a subset of the SSIS feature set.

       4.   You can define any source or destination for which you have an OLE DB provider.




 PRACTICE       Exporting Data

In these practices, you export the contents of the HumanResources.Department table to both
native and character-based files. You also use the Import and Export Wizard to export the
entire contents of the AdventureWorks database.

PRACTICE 1      Export Data Using BCP
In this practice, you export the contents of the HumanResources.Department table.
  1.   Open a command-prompt window and execute the following command:

bcp AdventureWorks.HumanResources.Department out c:\test\department.txt -c
        -S<instance name> -T

  2.   Open the file generated in Notepad and inspect the results.
  3.   Execute the following command from the command prompt:

bcp AdventureWorks.HumanResources.Department out c:\test\department.bcp -n
        -S<instance name> -T

  4.   Open the file generated in Notepad and inspect the results.

PRACTICE 2      Exporting Tables
In this practice, you export the data from the AdventureWorks database to a new database
named AdventureWorksTest.
  1.   In SSMS, execute the following command from a query window:

       CREATE DATABASE AdventureWorksTest

  2.   In Object Explorer, right-click the AdventureWorks database, select Tasks, and then
       select Export Data.
  3.   Click Next when the Welcome To SQL Server Import And Export Wizard page appears.
  4.   Specify the AdventureWorks database as the source, as shown here, and click Next.




5.   Specify the AdventureWorksTest database as the destination, as shown here, and
                   click Next.




6.   Select Copy Data From One Or More Tables Or Views, and click Next.
7.   Select all the tables in the AdventureWorks database, as shown here.




In the Source list, select the AWBuildVersion table, as shown here, and click Edit
     Mappings.




9.   Inspect the options that are available as the data is moved from source to destination.
                   Click Cancel, and then click Next.

When SQL Server moves data using SSIS, it translates the data types to .NET data
types because SSIS is based on C#.NET. C#.NET does not have a data type equivalent
for the hierarchyid, geography, or geometry data types. Therefore, you cannot use the
Import and Export Wizard or SSIS if you need to work with these three data types.
                   If one or more of these data types are present, you see the error message shown
                   here. Click Back.




            11.    Clear the [HumanResources].[Employee], [Person].[Address], [Production].[Document],
                   and [Production].[ProductDocument] tables check boxes, and click Next.

            12.    Verify that the Run Immediately check box is selected, as shown here, and
                   click Next.

            13.    Review the actions to be performed, as shown here. When all actions have been
                   performed, click Finish.




Lesson Summary
                  BCP is a lightweight, command-line utility that allows you to import and export data.
                  The BCP utility is not designed to provide data transformation or error-handling
                  routines.
                  In addition to exporting the entire contents of a table or view, you can export the
                  results of a query by using the queryout argument for the BCP utility.
                  BULK INSERT is a T-SQL command you can use only to import data.
                  The Import and Export Wizard, based on a subset of SSIS, allows you to move data
                  directly between a source and destination without requiring the use of a file.


           Lesson Review
           The following questions are intended to reinforce key information presented in this lesson.
           The questions are also available on the companion CD if you prefer to review them in
           electronic form.


   NOTE   ANSWERS
   Answers to these questions and explanations of why each answer choice is correct or
   incorrect are located in the “Answers” section at the end of the book.


             1.   You want to import data into the Orders table. The table has triggers and check
                  constraints that you want to be checked to guarantee integrity. You choose to use
                  the BCP utility and specify the -h “CHECK_CONSTRAINTS, FIRE_TRIGGERS” hint to
                  accomplish your task. Which of the following permissions must be in place?
                  A. SELECT permission on the Orders table
                  B. ALTER TABLE on the Orders table
                  C. INSERT permission on the Orders table
                  D. A member of the bulkadmin role
             2.   You are performing a migration on the Order database at Contoso from Oracle to SQL
                  Server. The Order database contains several hundred tables. The CustomerAddress
                  table has an XML column named AddressBook. What is the most efficient, least
                  intrusive way to move the data to the new SQL Server database?
                  A. Move the Order database from Oracle to SQL Server using replication.
                  B. Unload the data using Oracle utilities and load the data into SQL Server using BCP.
                  C. Move the Order database using the Import and Export Wizard.
                  D. Move the data from Oracle to SQL Server using the OPENROWSET function.




Chapter Review
To practice and reinforce the skills you learned in this chapter further, you can perform the
following tasks:
       Review the chapter summary.
       Review the list of key terms introduced in this chapter.
       Complete the case scenario. The scenario sets up a real-world situation involving the
       topics in this chapter and asks you to create a solution.
       Complete the suggested practices.
       Take a practice test.


Chapter Summary
       BCP is a program that allows you to import data from a file into a table as well as
       export data from a table to a file.
       BULK INSERT is a T-SQL command that allows you to import data from a file into
       a table.
       The Import and Export Wizard uses a subset of the SSIS feature set to move data
       between a source and destination.


Key Terms
Do you know what these key terms mean? You can check your answers by looking up the
terms in the glossary at the end of the book.
       Bulk Copy Program (BCP)
       BULK INSERT


Case Scenario
In the following case scenario, you apply what you’ve learned in this chapter. You can find
answers to these questions in the “Answers” section at the end of this book.

Case Scenario: Designing an Import Strategy for Coho Vineyard

BACKGROUND

Company Overview
Coho Vineyard was founded in 1947 as a local, family-run winery. Due to the award-winning
wines it has produced over the last several decades, Coho Vineyards has experienced
significant growth. To continue expanding, several existing wineries were acquired over the
years. Today, the company owns 16 wineries; 9 wineries are in Washington, Oregon, and
California, and the remaining 7 wineries are located in Wisconsin and Michigan. The wineries


employ 532 people, 162 of whom work in the central office that houses servers critical to the
         business. The company has 122 salespeople who travel around the world and need access to
         up-to-date inventory availability.

         PLANNED CHANGES
         Until now, each of the 16 wineries owned by Coho Vineyard has run a separate Web site
         locally on the premises. Coho Vineyard wants to consolidate the Web presence of these
         wineries so that Web visitors can purchase products from all 16 wineries from a single online
         store. All data associated with this Web site will be stored in databases in the central office.
            When the data is consolidated at the central office, merge replication will be used to
         deliver data to the salespeople as well as to allow them to enter orders. To meet the needs of
         the salespeople until the consolidation project is completed, inventory data at each winery is
         sent to the central office at the end of each day.

         EXISTING DATA ENVIRONMENT

         Databases
         Each winery presently maintains its own database to store all business information. At the
         end of each month, this information is brought to the central office and transferred into the
         databases shown in Table 7-1.

         TABLE 7-1 Coho Vineyard Databases

          DATABASE                SIZE

          Customer                180 megabytes (MB)
          Accounting              500 MB
          HR                      100 MB
          Inventory               250 MB
          Promotions              80 MB


   After the database consolidation project is complete, a new database named Order
will serve as the data store for the new Web store. As part of their daily work, employees
         also will connect periodically to the Order database using a new in-house Web
         application.
            The HR database contains sensitive data and is protected using Transparent Data Encryption
         (TDE). In addition, data in the Salary table is encrypted using a certificate.

         Database Servers
         A single server named DB1 contains all the databases at the central office. DB1 is running SQL
         Server 2008 Enterprise on Windows Server 2003 Enterprise.




Business Requirements
You need to design an archiving solution for the Customer and Order databases. Your archival
strategy should allow the Customer data to be saved for six years.
   To prepare the Order database for archiving procedures, you create a partitioned table
named Order.Sales. Order.Sales includes two partitions. Partition 1 includes sales activity for
the current month. Partition 2 is used to store sales activity for the previous month. Orders
placed before the previous month should be moved to another partitioned table named
Order.Archive. Partition 1 of Order.Archive includes all archived data. Partition 2 remains empty.
   A process needs to be created to load the inventory data from each of the 16 wineries
by 4 A.M. daily.
   Four large customers submit orders using Coho Vineyard’s Extensible Markup Language
(XML) schema for Electronic Data Interchange (EDI) transactions. The EDI files arrive by 5 P.M.
and need to be parsed and loaded into the Customer, Accounting, and Inventory databases,
which each contain tables relevant to placing an order. The EDI import routine is currently a
single threaded C++ application that takes between three and six hours to process the files.
You need to finish the EDI process by 5:30 P.M. to meet your Service Level Agreement (SLA)
with the customers. After the consolidation project has finished, the EDI routine loads all data
into the new Order database.
   You need to back up all databases at all locations. You can lose a maximum of five minutes
of data under a worst-case scenario. The Customer, Accounting, Inventory, Promotions, and Order
databases can be off-line for a maximum of 20 minutes in the event of a disaster. Data older
than six months in the Customer and Order databases can be off-line for up to 12 hours in the
event of a disaster.
   Answer the following questions.
  1.   What method should be used to move the data from each winery into the central
       database?
  2.   What method would provide the most flexible way to handle all the EDI submissions?



Suggested Practices
To help you master the exam objectives presented in this chapter, complete the following tasks.


Import and Export Data
       Practice 1      Use BCP in character mode to export the contents of a table to a file.
       Practice 2      Use BULK INSERT to import the contents of the file generated in Practice 1
       into a table.
       Practice 3 Learn SSIS. The SSIS platform has capabilities to accomplish any import/
       export or data manipulation process that you need for your environment.



Take a Practice Test
         The practice tests on this book’s companion CD offer many options. For example, you
         can test yourself on just one exam objective, or you can test yourself on all the 70-432
         certification exam content. You can set up the test so that it closely simulates the experience
         of taking a certification exam, or you can set it up in study mode so that you can look at the
         correct answers and explanations after you answer each question.


            MORE INFO     PRACTICE TESTS
            For details about all the practice test options available, see the section entitled “How to
            Use the Practice Tests,” in the Introduction to this book.




CHAPTER 8


Designing Policy Based
Management
Prior to Microsoft SQL Server 2008, you performed configuration management of an
environment by using a conglomeration of documents, scripts, and manual checking. The
configuration options, naming conventions, and allowed feature set were outlined in one or
more documents. To enforce your standards, you would have had to connect to each instance
and execute scripts that needed to be maintained and updated with new versions and service
packs. In this chapter, you learn about the new Policy Based Management framework that
allows you to check and enforce policy compliance across your entire SQL Server infrastructure.


Exam objectives in this chapter:
    Implement the declarative management framework (DMF).
    Configure surface area.

Lesson in this chapter:
    Lesson 1: Designing Policies   179



Before You Begin
To complete the lessons in this chapter, you must have:
       SQL Server 2008 installed
       The AdventureWorks database installed within the instance


       REAL WORLD
       Michael Hotek



Managing a single server running SQL Server, or even a small group of them,
one at a time, has always been reasonably straightforward. However, when
       you needed to uniformly manage an entire SQL Server environment or a large
       group of instances, you had to either write a large amount of custom code or
       purchase additional products.


One customer I work with has an environment with more than 5,000 SQL Server
                  instances. Prior to the release of SQL Server 2008, two DBAs were required to
                  manage the almost 50,000 lines of code that checked instances for compliance to
                  corporate policies. They devoted more than 70 hours each week to maintaining the
                  code and checking systems.

                  After deploying SQL Server 2008, they started to convert all their code to policies.
                  After the conversion was completed, they estimate that less than 1,000 lines of
                  custom logic remained. By using the central management features to check and
                  enforce policies across the environment, they should be able to save over 3,000
                  hours of management and maintenance time per year.




Lesson 1: Designing Policies
SQL Server 2008 introduces a new feature called Policy Based Management, also known as
the declarative management framework (DMF), that tackles the problem of standardizing
your SQL Server instances. Although Policy Based Management can be used just to
alert an administrator when an object is out of compliance, depending upon the type of
policy, you can also enforce compliance by preventing changes that would violate a policy.
  Policy Based Management introduces the following new objects that are used to design
and check for compliance:
      Facets
      Conditions
      Policies
      Policy targets
      Policy categories


      After this lesson, you will be able to:
         Create conditions
         Define policies
         Specify targets for policy checking
         Configure policy categories
         Check for policy compliance
         Import and export policies

      Estimated lesson time: 30 minutes



Facets
Facets are the core object upon which your standards are built. Facets define the type of
object or option to be checked, such as database, Surface Area, and login. SQL Server ships
with 74 facets, implemented as .NET assemblies, each with a unique set of properties.
   All the objects for Policy Based Management are stored within the msdb database. You
can get a list of the facets available by querying the dbo.syspolicy_management_facets table.
Unfortunately, unless you want to write code to interact with Server Management Objects
(SMOs), the only way to get a list of facet properties is to open each facet in SQL Server
Management Studio (SSMS), one at a time, and view the list of properties.
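
   For example, a query along the following lines lists the available facets (a sketch; only the
name column is shown here, so verify the full column list against your instance):

SELECT name
FROM msdb.dbo.syspolicy_management_facets
ORDER BY name;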




Conditions
           When you define a WHERE clause for a data manipulation language (DML) statement, you
           set a condition for the DML statement that defines the set of rows that meet your specific
           inclusion criteria. Within the Policy Based Management framework, conditions are the
           equivalent of a WHERE clause that defines the criteria needing to be checked.
               You define the conditions that you want to check or enforce for a policy by defining
           criteria for the properties of a facet. Just like a WHERE clause, a condition can be defined by
           one or more facet properties, and a single facet property can be checked for multiple criteria.
           The comparison operators that can be used are restricted by the data type of the property.
           For example, a property of type string can be checked with =, <>, LIKE, NOT LIKE, IN, or NOT
           IN, whereas a boolean type can only be checked for = and <>.
              If a condition that you want to check for a facet does not have a specific property that
           can be used, you can use the advanced editor to define complex conditions that compare
           multiple properties and incorporate functions. For example, you can check that every table
has a primary key, or that a table with a single index has that index clustered. Unfortunately, if you
           define a condition using the advanced editor, a policy that incorporates the condition must
           be executed manually and cannot be scheduled.
               Conditions are checked in a single step. You cannot have a condition pull a list of objects,
           iterate across the list of objects, and then apply subsequent checks. To work within the
Policy Based Management framework, conditions need to return a True or False value.
           Therefore, when building complex conditions with the advanced editor, you cannot return
           a list of objects that do not meet your criteria. You have to define the condition such that
           if any object does not meet your criteria, a value of False is returned.
              Although you can check many properties of a facet within a single condition, a single condition
           can’t be defined for multiple facets. For example, you can check all 10 of the properties for the
           Surface Area Configuration facet in a single condition, but you have to define a second condition
           to check a property of the Surface Area Configuration for Analysis Services.



           Policy Targets
           Conditions are the foundation for policies. However, you don’t always want to check policies
           across every object available, such as every database in an instance or every index within
           every database. Conditions can also be used to specify the objects to compare the condition
           against, called policy targeting or target sets.
              You can target a policy at the server level, such as instances that are SQL Server 2005 or
           SQL Server 2008. You can also target a policy at the database level, such as all user databases
           or all system databases.




Policies
Policies are created for a single condition and set to either enforce or check compliance. The
execution mode can be set as follows:
       On demand      Evaluates the policy when directly executed by a user
       On change, prevent Creates data definition language (DDL) triggers to prevent a
       change that violates the policy
       On change, log only    Checks the policy automatically when a change is made using
       the event notification infrastructure
       On schedule     Creates a SQL Server Agent job to check the policy on a defined schedule
   If a policy contains a condition that was defined using the advanced editor, the only available
execution mode is On demand.
    To use the On change, prevent and On change, log only execution modes, the policy must
target instances running SQL Server 2005 and above. The On change, log only execution mode
uses the event notification infrastructure that is available only for SQL Server 2005 and later. The
On change, prevent execution mode depends on DDL triggers to prevent a change that is not
in compliance with the policy and are available only for SQL Server 2005 and later. In addition,
you can set a policy to On change, prevent only if it is possible for a DDL trigger to prevent the
change. For example, you could prevent the creation of an object that violated your naming
conventions, but you could not enforce a policy that all databases have to be in the Full recovery
model because the ALTER DATABASE command executes outside the context of a transaction.


Policy Categories
Policy categories can be used to group one or more policies into a single compliance unit. If
not specified, all policies belong to the DEFAULT category. To check or enforce policies, you
create a subscription to one or more policy categories.
   Subscription occurs at two levels—instance and database. A member of the sysadmin role
can subscribe an instance to a policy category. Once subscribed, the owner of each database
within the instance can subscribe their database to a policy category.
    Each policy category has a Mandate property that applies to databases. When a policy category
is set to Mandate and a sysadmin subscribes the instance to a policy category, all databases that
meet the target set are controlled by the policies within the policy category. A policy subscription
to a policy category set to Mandate cannot be overridden by a database owner.


Policy Compliance
Because you cannot set all policies to enforce compliance, you need to check manually, on a
regular basis, the policies that cannot be enforced. You view policies that apply to an instance by
right-clicking the name of the instance within Object Explorer and selecting Policies, View.


You can check policies that apply to an instance by right-clicking the name of the instance
           within Object Explorer and selecting Policies, Evaluate.
              You can check all policies within an instance, as shown in Figure 8-1, by right-clicking the
           Policies node and selecting Evaluate.




           FIGURE 8-1 Evaluate policies


              By clicking Evaluate, you execute the policies and review the results, as shown in Figure 8-2.




           FIGURE 8-2 Policy check results
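
   Evaluation results are also recorded in msdb, so you can review them with a query. The
following is a sketch; the two views exist in msdb, but verify the exact column names against
your instance before relying on them:

SELECT p.name AS policy_name, h.start_date, h.result
FROM msdb.dbo.syspolicy_policies AS p
    INNER JOIN msdb.dbo.syspolicy_policy_execution_history AS h
        ON p.policy_id = h.policy_id
ORDER BY h.start_date DESC;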


EXAM TIP
   Defining a condition to be used as a policy target is a critical component of well-defined
   policies. A policy fails during a check if the object does not conform to the criteria, as well
   as when the property does not exist. For example, attempting to check that the Web Assistant
   is disabled against a SQL Server 2008 instance fails because the feature does not exist.



Central Management Server
Policy Based Management would be limited to SQL Server 2008 and be very tedious if you
had to do any of the following:
       Duplicate policies on every instance
       Create subscriptions to each instance in your environment individually
       Check compliance for each instance individually
    Within the Registered Servers pane in SSMS, you can configure a Central Management Server.
Underneath the Central Management Server, you can create multiple levels of folders, and register
instances into the appropriate folder. After you have the Central Management Server structure set
up in SSMS, you can evaluate polices against a specific instance, folder, or all instances underneath
the Central Management Server. Figure 8-3 shows an example of a Central Management Server.




FIGURE 8-3 Central Management Server




Import and Export Policies
Policies and conditions can be exported to files as well as imported from files. SQL Server
ships with 53 policies that are located in the Microsoft SQL Server\100\Tools\Policies folder.
There are 50 policies for the database engine, 2 policies for Reporting Services, and 1 policy
for Analysis Services. The CodePlex site (http://www.codeplex.com) has additional policies that
you can download and import.

You can import policies within the Registered Servers pane or the Object Explorer. Within
           Object Explorer, you can right-click the Policies node underneath Policy Management and
           select Import Policy. Within Registered Servers, you can right-click the Central Management
           Server or any folder or instance underneath the Central Management Server and select Import
           Policies. If you import policies from the Central Management Server, the policies are imported
           to every instance defined underneath the Central Management Server, but not to the Central
           Management Server itself. Likewise, right-clicking a folder imports the policies to all instances
           within the folder hierarchy. To import policies to the Central Management Server, you must
           connect to the instance within Object Explorer and import from the Policies node.


       Quick Check
       1.   What are the five objects that are used within Policy Based Management?

                  2. What are the allowed execution modes for a policy?

                  3. Which object has a property that allows you to mandate checking for all
                     databases on an instance?

                  4. How many facets can be checked within a single condition?

                  5. How many conditions can be checked within a single policy?

       Quick Check Answers
       1.   The objects that are used with Policy Based Management are facets, conditions,
            policies, policy targets, and policy categories.

       2.   The policy execution modes are On demand, On schedule, On change, log only,
            and On change, prevent.

       3.   Policy categories allow you to mandate checking of all databases within an
            instance.

       4.   A condition can be defined on only one facet.

       5.   A policy can check only a single condition.



 PRACTICE       Defining Policies and Checking for Compliance

           In these practices, you define and check several policies for your environment.

PRACTICE 1      Create a Condition
           In this practice, you create a condition for the following:
                  Check that a database does not have the auto shrink or auto close properties set.
                  Check that CLR, OLE Automation, Ad Hoc Remote Queries, and SQL Mail are all disabled.
                  Check that a database is not in the Simple recovery model.
                  Check that all tables have a primary key.


1.   In Object Explorer, expand the Policy Management node within the Management node.
2.   Right-click the Conditions node and select New Condition.
3.   Configure the condition as shown here. Click OK when you are done.




4.   Right-click the Conditions node again, select New Condition, and configure the condition
     as shown here. Click OK.




5.   Right-click the Conditions node, select New Condition, and configure this third condition
     as shown here. Click OK when you are finished.
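
      Because the dialog-box settings appear only in screen shots, the following sketch
      summarizes one plausible configuration for the three conditions above. The condition
      names are assumptions, and the facet and property names follow the SQL Server 2008
      facet definitions, so verify them in your instance:

      Condition: Database Auto Options    Facet: Database
         Expression: @AutoShrink = False AND @AutoClose = False
      Condition: Surface Area             Facet: Surface Area Configuration
         Expression: @ClrIntegrationEnabled = False AND @OleAutomationEnabled = False
            AND @AdHocRemoteQueriesEnabled = False AND @SqlMailEnabled = False
      Condition: Recovery Model           Facet: Database
         Expression: @RecoveryModel != Simple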


6.    Right-click the Conditions node and select New Condition. Select the Table facet, click
                   the ellipsis button next to the Field column to display the Advanced Edit dialog box,
                   enter the following code in the Cell Value text box, and click OK:

                   IsNull(ExecuteSql('Numeric', 'SELECT 1 FROM sys.tables a INNER JOIN sys.indexes b
                      ON a.object_id = b.object_id WHERE b.is_primary_key = 1
                       AND a.name = @@ObjectName AND a.schema_id = SCHEMA_ID(@@SchemaName)'), 0)

              7.   Configure the Name, Operator, and Value as shown here, and then click OK.




PRACTICE 2      Create a Condition for a Target Set
           In this practice, you create a condition to target all SQL Server 2005 and later instances, along
           with a condition to target all user databases that are online.


1.   Right-click the Conditions node, select New Condition, and configure the condition as
       shown here. Click OK.




  2.   Right-click the Conditions node, select New Condition, and configure the condition as
       shown here. Click OK when you are done.




PRACTICE 3      Create a Policy
In this practice, you create policies that use the conditions you just created to do the following:
       Check that a database does not have the auto shrink or auto close properties set.
       Check that CLR, OLE Automation, Ad Hoc Remote Queries, and SQL Mail are all disabled.


Check that a database is not in the Simple recovery model.
                   Check that all tables have a primary key.
              1.   Right-click the Policies node, select New Policy, and configure the policy as shown
                   here. Click OK.




              2.   Right-click the Policies node, select New Policy, and configure this second policy as
                   shown here. Click OK.




3.   Right-click the Policies node, select New Policy, and configure the policy as shown
     here. Click OK.




4.   Right-click the Policies node, select New Policy, and configure the last policy as shown
     here. Click OK.




PRACTICE 4      Create a Policy Category
           In this practice, you create two policy categories for the policies that you created.
              1.   Right-click Policy Management, select Manage Categories, and create the categories as
                   shown here. Click OK.




              2.   In SSMS, in the console tree, expand the Policies folder. Right-click the Check For Auto
                   Shrink And Auto Close Policy, select Properties, click the Description tab, and change
                   the category to Database Best Practices. Click OK.
              3.   Right-click the Check For Simple Recovery Model Policy, select Properties, select the
                   Description tab, and change the category to Database Best Practices. Click OK.
              4.   Right-click the Check For Surface Area Configuration Policy, select Properties, click the
                   Description tab, and change the category to Instance Surface Area Best Practices. Click OK.
              5.   Right-click the Check Tables For Primary Key Policy, select Properties, select the
                   Description tab, and change the category to Database Best Practices. Click OK.

PRACTICE 5      Import Policies
           In this practice, you import the policies that ship with SQL Server.
              1.   Right-click the Policies node underneath Policy Management and select Import Policy.
Click the ellipsis button next to the Files To Import text box, navigate to the Microsoft
SQL Server\100\Tools\Policies\DatabaseEngine\1033 folder, select all the files in the
                   folder, as shown here, and click Open.




3.   Select the Replace Duplicates With Items Imported check box, select Preserve Policy
       State On Import, and click OK.
  4.   Take the time to browse the policies and conditions that were created during the
       import.


Lesson Summary
You can build policies to check conditions across any version of SQL Server.
       Policies can enforce a single condition and each condition can be based on a single
       facet.
       Policy categories allow you to group policies together for compliance checking.
       A policy category can be set with the Mandate property, which requires the policy to
       be checked against all databases within an instance.


Lesson Review
The following question is intended to reinforce key information presented in this
lesson. The question is also available on the companion CD if you prefer to review it in
electronic form.


   NOTE   ANSWERS
   Answers to this question and an explanation of why each answer choice is correct
   or incorrect are located in the “Answers” section at the end of the book.




1.   You have defined several policies that you want applied to all databases within an
     instance. How do you ensure, with the least amount of administrative effort, that a
     database owner cannot avoid the policy check?
                  A. Create a condition that checks all databases.
                  B. Add the policy to a user-defined policy category and set the Mandate property.
                  C. Add the policy to the default policy category.
                  D. Check the policies manually against the instance.




Chapter Review
To practice and reinforce the skills you learned in this chapter further, you can perform the
following tasks:
       Review the chapter summary.
       Review the list of key terms introduced in this chapter.
       Complete the case scenario. The scenario sets up a real-world situation involving the
       topics in this chapter and asks you to create a solution.
       Complete the suggested practices.
       Take a practice test.


Chapter Summary
       Facets are the .NET assemblies that define the set of properties for an object upon
       which conditions are built.
A condition can be defined for a single facet, and a policy can check only a single
condition.
       Policies can be checked manually or automatically. Automatic policy checking can be
       performed on a scheduled basis or by using the event notification infrastructure.
A database owner can subscribe a database to one or more policy categories; however,
a policy that belongs to a policy category set with the Mandate property requires
checking against all databases.


Key Terms
Do you know what these key terms mean? You can check your answers by looking up the
terms in the glossary at the end of the book.
       Condition
       Facet
       Policy category
       Policy target


Case Scenario
In the following case scenario, you apply what you’ve learned in this chapter. You can find
answers to these questions in the “Answers” section at the end of this book.




Case Scenario: Designing a Management Strategy for Coho Vineyard
         BACKGROUND
         Company Overview
         Coho Vineyard was founded in 1947 as a local, family-run winery. Due to the award-winning
         wines it has produced over the last several decades, Coho Vineyards has experienced
         significant growth. To continue expanding, several existing wineries were acquired over the
         years. Today, the company owns 16 wineries; 9 wineries are in Washington, Oregon, and
         California, and the remaining 7 wineries are located in Wisconsin and Michigan. The wineries
         employ 532 people, 162 of whom work in the central office that houses servers critical to the
         business. The company has 122 salespeople who travel around the world and need access to
         up-to-date inventory availability.

PLANNED CHANGES
         Until now, each of the 16 wineries owned by Coho Vineyard has run a separate Web site
         locally on the premises. Coho Vineyard wants to consolidate the Web presence of these
         wineries so that Web visitors can purchase products from all 16 wineries from a single online
         store. All data associated with this Web site will be stored in databases in the central office.
            When the data is consolidated at the central office, merge replication will be used to
         deliver data to the salespeople as well as to allow them to enter orders. To meet the needs of
         the salespeople until the consolidation project is completed, inventory data at each winery is
         sent to the central office at the end of each day.
            Management wants to ensure that you cannot execute stored procedures written in C#.NET
         or use the OPENROWSET or OPENDATASOURCE command.

         EXISTING DATA ENVIRONMENT
         Databases
         Each winery presently maintains its own database to store all business information. At the
         end of each month, this information is brought to the central office and transferred into the
         databases shown in Table 8-1.

         TABLE 8-1 Coho Vineyard Databases

          DATABASE            SIZE

          Customer            180 megabytes (MB)
          Accounting          500 MB
          HR                  100 MB
          Inventory           250 MB
          Promotions          80 MB




After the database consolidation project is complete, a new database named Order will
serve as the data store for the new Web store. As part of their daily work, employees also
will connect periodically to the Order database using a new in-house Web application.
  The HR database contains sensitive data and is protected using Transparent Data
Encryption (TDE). In addition, data in the Salary table is encrypted using a certificate.

Database Servers
A single server named DB1 contains all the databases at the central office. DB1 is running SQL
Server 2008 Enterprise on Windows Server 2003 Enterprise.

Business Requirements
You need to design an archiving solution for the Customer and Order databases. Your archival
strategy should allow the Customer data to be saved for six years.
   To prepare the Order database for archiving procedures, you create a partitioned table
named Order.Sales. Order.Sales includes two partitions. Partition 1 includes sales activity for
the current month. Partition 2 is used to store sales activity for the previous month. Orders
placed before the previous month will be moved to another partitioned table named Order.
Archive. Partition 1 of Order.Archive includes all archived data. Partition 2 remains empty.
   A process needs to be created to load the inventory data from each of the 16 wineries by
4 A.M. daily.
   Four large customers submit orders using Coho Vineyard’s Extensible Markup Language
(XML) schema for Electronic Data Interchange (EDI) transactions. The EDI files arrive by 5 P.M.
and need to be parsed and loaded into the Customer, Accounting, and Inventory databases,
which each contain tables relevant to placing an order. The EDI import routine is currently a
single-threaded C++ application that takes between three and six hours to process the files.
You need to finish the EDI process by 5:30 P.M. to meet your Service Level Agreement (SLA)
with the customers. After the consolidation project finishes, the EDI routine loads all data into
the new Order database.
   You need to back up all databases at all locations. All production databases are required
to be configured with the Full recovery model. You can lose a maximum of five minutes of
data under a worst-case scenario. The Customer, Accounting, Inventory, Promotions, and Order
databases can be off-line for a maximum of 20 minutes in the event of a disaster. Data older
than six months in the Customer and Order databases can be off-line for up to 12 hours in the
event of a disaster.
   Answer the following question.
       What policies would you implement to check and enforce the business requirements
       for Coho Vineyard?




Suggested Practices
         To help you master the exam objectives presented in this chapter, complete the following tasks.


         Implement Policy Based Management
                  Practice 1 Configure a policy to check the surface area configuration for all your SQL
                  Server instances.
                  Practice 2    Configure a policy to check the last time a database was successfully
                  backed up.
                  Practice 3    Configure policies to check the membership of the sysadmin and db_owner
                  roles.
                  Practice 4   Configure a policy to ensure that databases are not set to either auto
                  shrink or auto close.
                  Practice 5 Based on the policies that ship with SQL Server 2008, decide which policies
                  apply to your environment and implement the policy checks.



         Take a Practice Test
         The practice tests on this book’s companion CD offer many options. For example, you can test
         yourself on just one exam objective, or you can test yourself on all the 70-432 certification
         exam content. You can set up the test so that it closely simulates the experience of taking
         a certification exam, or you can set it up in study mode so that you can look at the correct
         answers and explanations after you answer each question.


            MORE INFO          PRACTICE TESTS
            For details about all the practice test options available, see the section entitled “How to
            Use the Practice Tests,” in the Introduction to this book.




CHAPTER 9


Backing up and Restoring
a Database
Along with security, the other fundamental task of a database administrator (DBA) is to
ensure that data can be recovered in the event of a disaster. Unless you can protect the
data, the thousands of features in Microsoft SQL Server 2008 to build high-performance
and scalable applications cannot be used to run a business. In this chapter, you learn about
the capabilities of the backup and restore engine, as well as procedures for recovering from
various disaster scenarios.


Exam objectives in this chapter:
   Back up a SQL Server environment.
   Back up databases.
   Restore databases.
   Manage database snapshots.
   Maintain a database by using maintenance plans.

Lessons in this chapter:
   Lesson 1: Backing up Databases     199

   Lesson 2: Restoring Databases    212

   Lesson 3: Database Snapshots     223



Before You Begin
To complete the lessons in this chapter, you must have:
      SQL Server 2008 installed
      The AdventureWorks database installed within the instance




REAL WORLD
                  Michael Hotek



                  Several years ago, I was called into a company to help them recover from a major
                      disaster that required relocation to new facilities. A new office and data center
                  were already established, and it was my job to get the database servers online with
                  all the databases recovered.

                  I was given a large number of documents and a detailed description of the backup
                  procedures that had been in place for the databases and the standby servers where
                  redundant copies of the data were online in the event of a failure on a primary.
                  Everything looked to be in order, so I got on a plane and arrived at the new data
                  center a few hours later.

                  Only after getting to the new data center and starting to gather everything
                  together did I learn the bad news: the documentation was completely worthless.
                  The company had standby servers that were maintained by log shipping, and the
                  standby servers were in the same data center as the primary servers. Although
                  having the standby servers in the same data center would not be a recommended
                  solution, it poses a bit of a problem when the previous data center is under 17 feet
                  of water. To make matters worse, even though everything was backed up to tape,
                  the tapes were stored in a filing cabinet…in the same data center that was 17 feet
                  underwater.

                  The only usable backups that we had were a set of tapes from about two months
                  before, when one of the new DBAs started a project to store backup tapes off-site.
                  The project hadn’t gone anywhere due to funding issues. Fortunately, the first step
                  in the off-site storage project had been to simply have the on-call DBAs take the
                  previous day’s tapes home with them, and this new DBA had forgotten that he had
                  taken a set of tapes home with him two months before.

                  A backup strategy, standby servers, and multiple copies of a database are all
                  best practices for database management. But you have to filter in just a little bit
                  of common sense. If the purpose of backups is to protect you from a disaster, you
                  probably don’t want the backup tapes stored anywhere near the primary machines
                  that could encounter a disaster.




Lesson 1: Backing up Databases
Database backups form the backbone upon which every disaster recovery plan is built.
Backup strategies are designed in the opposite order of the material presented in this chapter.
You start with the recovery requirements and procedures and then figure out which types of
backups best meet your recovery needs. Unless you are devising recovery-oriented backup
strategies, it is unlikely that you will ever meet your disaster recovery requirements. However,
it is very difficult to teach data recovery without first having backups, so in this lesson you
learn about the various backup types and how to create backups to support your databases
prior to learning about recovering databases.


      After this lesson, you will be able to:
          Create full, differential, and transaction log backups
          Create maintenance plans

      Estimated lesson time: 20 minutes




      Backup Security


      All backups execute under the security context of the SQL Server service account.
             Although you could grant read/write access on the backup directory directly
      to the SQL Server service account, you should instead grant read/write access to the
      Windows group SQLServerMSSQLUser$<machine_name>$<instance_name>, which
      contains the SQL Server service account.

      A member of the sysadmin server role can back up any database in an instance and
      members of the db_owner database role can back up their databases. You can also
      add a user to the db_backupoperator fixed database role to allow the user to back
      up a database while preventing any other access to the database.




Backup Types
SQL Server 2008 allows you to create four different types of backups:
      Full
      Differential
      Transaction log
      Filegroup




Full Backups
           A full backup captures all pages within a database that contain data. Pages that do not
           contain data are not included in the backup. Therefore, a backup is never larger, and in
           most cases is smaller, than the database for which it is created. A full backup is the basis for
           recovering a database and must exist before you can use a differential or transaction log
           backup.
               Because it is more common to back up a database than to restore one, the backup engine
           is optimized for the backup process. When a backup is initiated, the backup engine grabs
           pages from the data files as quickly as possible, without regard to the order of pages. Because
           the backup process is not concerned with the ordering of pages, multiple threads can be used
           to write pages to your backup device. The limiting factor for the speed of a backup is the
           performance of the device where the backup is being written.
              A backup can be executed concurrently with other database operations. Because changes can
           be made to the database while a backup is running, SQL Server needs to be able to accommodate
           the changes while also ensuring that backups are consistent for restore purposes. To ensure
           both concurrent access and backup consistency, SQL Server performs the steps of the backup
           procedure as follows:
             1.     Locks the database, blocking all transactions
             2.     Places a mark in the transaction log
              3.    Releases the database lock
             4.     Extracts all pages in the data files and writes them to the backup device
              5.    Locks the database, blocking all transactions
             6.     Places a mark in the transaction log
              7.    Releases the database lock
             8.     Extracts the portion of the log between the marks and appends it to the backup
              The only operations that are not allowed during a full backup are
                    Adding or removing a database file
                    Shrinking a database
              The generic syntax to back up a database is
           BACKUP DATABASE { database_name | @database_name_var }
             TO <backup_device> [ ,...n ]
             [ <MIRROR TO clause> ] [ next-mirror-to ]
             [ WITH { DIFFERENTIAL | <general_WITH_options> [ ,...n ] } ]


           <backup_device>::=      {   { logical_device_name | @logical_device_name_var }
            | { DISK | TAPE } =
                   { 'physical_device_name' | @physical_device_name_var } }




<MIRROR TO clause>::= MIRROR TO <backup_device> [ ,...n ]


<general_WITH_options> [ ,...n ]::=
--Backup Set Options
        COPY_ONLY | { COMPRESSION | NO_COMPRESSION }
 | DESCRIPTION = { 'text' | @text_variable }
 | NAME = { backup_set_name | @backup_set_name_var }
 | PASSWORD = { password | @password_variable }
 | { EXPIREDATE = { 'date' | @date_var }
           | RETAINDAYS = { days | @days_var } }


--Media Set Options
     { NOINIT | INIT }     | { NOSKIP | SKIP }     | { NOFORMAT | FORMAT }
 | MEDIADESCRIPTION = { 'text' | @text_variable }
 | MEDIANAME = { media_name | @media_name_variable }
 | MEDIAPASSWORD = { mediapassword | @mediapassword_variable }
 | BLOCKSIZE = { blocksize | @blocksize_variable }


--Error Management Options
     { NO_CHECKSUM | CHECKSUM }
 | { STOP_ON_ERROR | CONTINUE_AFTER_ERROR }

   The only two parameters required for a backup are the name of the database and the
backup device. When you specify a disk backup device, a directory and a file name can
be specified. If a directory is not specified, SQL Server performs a backup to disk and writes
the file to the default backup directory configured for the instance. Although most backups
are written to a single disk file or a single tape device, you can specify up to 64 backup
devices. When you specify more than one backup device, SQL Server stripes the backup
across each of the devices specified.


     NOTE   RESTORING A STRIPED BACKUP
     When SQL Server stripes a backup across multiple devices, all the devices are required
     for a successful restore. SQL Server does not provide any redundancy or fault tolerance within
     the stripe set. A stripe set is used strictly for performance purposes.


     An example of a striped backup is

BACKUP DATABASE AdventureWorks
      TO DISK = 'AdventureWorks_1.bak', DISK = 'AdventureWorks_2.bak'
GO

   One of the maxims of disaster recovery is that you can’t have enough copies of your
backups. The MIRROR TO clause provides a built-in capability to create up to four copies of a
backup in a single operation. When you include the MIRROR TO clause, SQL Server retrieves
the page once from the database and writes a copy of the page to each backup mirror.



During a restore operation, you can use any of the mirrors. Mirrored backups have a small
           number of requirements:
                  All backup devices must be of the same media type.
                  Each mirror must have the same number of backup devices.
                  WITH FORMAT must be specified in the backup command.
              If you back up to tape, you must mirror to tape. If you back up to disk, you must mirror to
           disk. You can’t back up to tape and mirror to disk, or vice versa. In addition, you must mirror
           to the same number of devices as you back up to; for example, if you back up to 64 disk
           devices, then you must also mirror to 64 disk devices.
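   For example, a minimal mirrored backup to two disk devices (the directory paths here are
illustrative, not part of this book's practice files) looks like this:

BACKUP DATABASE AdventureWorks
    TO DISK = 'C:\backups\AdventureWorks_primary.bak'
    MIRROR TO DISK = 'D:\backups\AdventureWorks_mirror.bak'
    WITH FORMAT
GO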
              The limiting factor for backup performance is the speed of the backup device. By
compressing a backup, you reduce the amount of data written to the backup device and
therefore the time the backup takes. There is a cost in processing overhead when compressing
           a backup. Although an uncompressed backup usually consumes very few processing resources,
           compression can consume a noticeable amount of processing resources. A backup normally
           compresses between 4:1 and 10:1.


               BEST PRACTICE   DECREASING BACKUP TIMES
              The overhead of compression is always worth it. The amount of time saved for a
              compressed backup far exceeds the overhead associated with the compression operation.
              Fortunately, SQL Server has a configuration option called backup compression default
              that you can set to always have backups compressed regardless of whether you explicitly
              specify compression. Unfortunately, compression is available only in SQL Server 2008
              Enterprise.
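   As a sketch, you can turn the option on instance-wide with sp_configure; the value 1
enables compression by default:

-- Enable 'backup compression default' so backups compress unless NO_COMPRESSION is specified
EXEC sp_configure 'backup compression default', 1;
RECONFIGURE;
GO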


             A single backup device can contain multiple backups. The INIT/NOINIT options of a
           BACKUP command control whether an existing backup file is overwritten or appended to.
           When you specify NOINIT and you are backing up to a file that already exists, SQL Server
           appends the new backup to the end of the file. If you specify INIT and the file already exists,
           SQL Server overwrites the file with the contents of the new backup.


               BEST PRACTICE   AVOIDING BACKUP PROBLEMS
              To avoid confusion, it is recommended that you use a unique naming scheme that employs
              a date and time in the file name so that you can tell when a backup was taken based on the
              name of the backup file. Because backups are taken to reduce your risk of data loss, it is
              also never a good idea to include multiple backups in a single file.
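   One possible way to follow this naming advice, sketched here with an assumed backup
directory, is to build the file name from the current date and time:

-- Produces a name such as AdventureWorks_20080701_143000.bak (the directory is an assumption)
DECLARE @file nvarchar(260);
SET @file = 'C:\backups\AdventureWorks_'
    + CONVERT(nvarchar(8), GETDATE(), 112) + '_'
    + REPLACE(CONVERT(nvarchar(8), GETDATE(), 108), ':', '') + '.bak';
BACKUP DATABASE AdventureWorks TO DISK = @file WITH INIT;
GO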


              When CHECKSUM is specified, SQL Server verifies the page checksum, if it exists, before
           writing the page to the backup. In addition, a checksum is calculated for the entire backup
           that can be used to determine if the backup has been corrupted. The default behavior for
           errors encountered during a backup is STOP_ON_ERROR. If an invalid page checksum is




encountered during a backup, the backup terminates with an error. To continue past the
error and back up as many pages as possible, you can specify the CONTINUE_AFTER_ERROR
option.
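   For example, a backup that validates page checksums but continues past any damaged
pages (the path is illustrative) can be written as:

BACKUP DATABASE AdventureWorks
    TO DISK = 'C:\backups\AdventureWorks_checked.bak'
    WITH CHECKSUM, CONTINUE_AFTER_ERROR
GO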


   NOTE   IDENTIFYING BAD PAGES
   It is recommended that you specify the CHECKSUM option to catch bad pages as early as
   possible. You do not want to encounter any surprises when you need to use the backup to
   restore a database.



Transaction Log Backups
Every change made to a database has an entry made to the transaction log. Each log record
is assigned a unique number internally called the Log Sequence Number (LSN). The LSN is an
integer value that starts at 1 when the database is created and increments to infinity. An LSN
is never reused for a database and always increments. Essentially, an LSN provides a sequence
number for every change made to a database.
   The contents of a transaction log are broken down into two basic parts—active and
inactive. The inactive portion of the transaction log contains all the changes that have been
committed to the database. The active portion of the log contains all the changes that have
not yet been committed. When a transaction log backup is executed, SQL Server begins
with the lowest LSN in the transaction log and writes each successive transaction log
record into the backup. As soon as SQL Server reaches the first LSN that has not yet been
committed (that is, the oldest open transaction), the transaction log backup completes. The
portion of the transaction log that has been backed up is then removed, allowing the space
to be reused.
   Based on the sequence number, it is possible to restore one transaction log backup
after another to recover a database to any point in time by simply following the chain of
transactions as identified by the LSN.
   Because transaction log backups are intended to be restored one after another, the
restrictions on transaction log backups depend on having the entire sequence of LSNs intact.
Any action that creates a gap in the LSN sequence prevents any subsequent transaction log
backup from being executed. If an LSN gap is introduced, you must create a full backup
before you can start backing up the transaction log.
   A transaction log backup works just like an incremental backup in Microsoft Windows.
A transaction log backup gathers all committed transactions in the log since the last
transaction log backup. However, because a transaction log backup contains only
the transactions that have been issued against the database, you need a starting point
for the transaction log chain.
   Before you can issue a transaction log backup, you must execute a full backup. After
the first backup, so long as the transaction log chain is not interrupted, you can restore
the database to any point in time. Additional full backups can be created to have a more


recent starting point for a restore operation. Regardless of the number of full backups that
           you create, so long as you haven’t introduced a gap in the LSN chain, you can start with
           any full backup and restore every transaction log from that point forward to recover a
           database.
              The general syntax for creating a transaction log backup is:

           BACKUP LOG { database_name | @database_name_var }
             TO <backup_device> [ ,...n ]
             [ <MIRROR TO clause> ] [ next-mirror-to ]
             [ WITH { <general_WITH_options> | <log-specific_optionspec> } [ ,...n ] ][;]
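
   As a minimal sketch (the path is illustrative), a transaction log backup to disk is simply:

BACKUP LOG AdventureWorks
    TO DISK = 'C:\backups\AdventureWorks_log.trn'
GO

   Remember that the database must be in the Full or Bulk-logged recovery model and a full
backup must already exist before this command succeeds.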



           Differential Backups
           A differential backup captures all extents that have changed since the last full backup. The
           primary purpose of a differential backup is to reduce the number of transaction log backups
           that need to be restored. A differential backup has to be applied to a full backup and can’t
           exist until a full backup has been created.
              SQL Server tracks each extent that has been changed following a full backup using a
           special page in the header of a database called the Differential Change Map (DCM). A full
           backup zeroes out the contents of the DCM. As changes are made to extents within the
           database, SQL Server sets the bit corresponding to the extent to 1. When a differential
           backup is executed, SQL Server reads the contents of the DCM to find all the extents that
           have changed since the last full backup.
               A differential backup is not the same as an incremental backup. A transaction log backup
           is an incremental backup because it captures any changes that have occurred since the last
           transaction log backup. A differential backup contains all pages changed since the last full
           backup. For example, if you were to take a full backup at midnight and a differential backup
           every four hours, both the 4 A.M. backup and the 8 A.M. backup would contain all the changes
           made to the database since midnight.
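   A minimal differential backup, assuming a full backup already exists (the path is
illustrative), looks like this:

BACKUP DATABASE AdventureWorks
    TO DISK = 'C:\backups\AdventureWorks.dif'
    WITH DIFFERENTIAL
GO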



                  The COPY_ONLY Option


                   One of the options that can be specified for any backup type is COPY_ONLY.
                       Each backup executed against a database has an effect on the starting point
                  for a recovery and which backups can be used. Differential backups contain all
                  extents that have changed since the last full backup, so every full backup executed
                  changes the starting point that a differential backup is based upon. When a
                  transaction log backup is executed, the transactions that have been backed up are
                  removed from the transaction log.

                   On occasion, you need a backup to create a database for a development or
                   test environment. You want to have the most recent set of data, but you do not want to



affect the backup set in the production environment. The COPY_ONLY option
       allows you to create a backup that can be used to create the development
       or test environment, but it does not affect the database state or set of backups
       in production. A full backup with the COPY_ONLY option does not reset the
       differential change map page and therefore has no impact on differential
       backups. A transaction log backup with the COPY_ONLY option does not remove
       transactions from the transaction log.
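
   As a sketch of both variants (the paths are illustrative), copy-only backups look like this:

-- A full backup that does not reset the differential change map
BACKUP DATABASE AdventureWorks
    TO DISK = 'C:\backups\AdventureWorks_copy.bak'
    WITH COPY_ONLY
GO

-- A log backup that does not remove transactions from the log
BACKUP LOG AdventureWorks
    TO DISK = 'C:\backups\AdventureWorks_copy.trn'
    WITH COPY_ONLY
GO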




Filegroup Backups
Although full backups capture all the used pages across the entire database, a full backup of
a large database can consume a significant amount of space and time. If you need to reduce
the footprint of a backup, you can rely on file or filegroup backups instead. As the name
implies, a file/filegroup backup allows you to target a portion of a database to be backed up.


   CAUTION   BACKING UP INDIVIDUAL FILES
   Although it is possible to back up a file, it is recommended that your backups are only
   as granular as the filegroup level. A filegroup is a storage boundary, and when you have
   multiple files within a filegroup, SQL Server stores data across all the files. However, with
   respect to the table, index, or partition, the distribution of data across the files is essentially
   random. Therefore, to recover a database, you need all the files underneath a filegroup to
   be in exactly the same state.


   Filegroup backups can be used in conjunction with differential and transaction log
backups to recover a portion of the database in the event of a failure. In addition, so long
as you do not need to restore the primary filegroup and you are running SQL Server 2008
Enterprise, the database can remain online and accessible to applications during the restore
operation. Only the portion of the database being restored is off-line.


   EXAM TIP
   You need to know how to perform each type of backup that can be executed. Backing up
   and restoring databases and the entire SQL Server environment is a major focus of the exam.




Partial Backups
Filegroups can be marked as read-only. No changes can be made to the objects that are
stored on a read-only filegroup. Because the purpose of backups is to capture changes
so that you can reconstruct a database to the most current state during a recovery operation,
backing up filegroups that cannot change unnecessarily consumes space within the backup.


To reduce the size of a backup to only the filegroups that can change, you can perform a
           partial backup. Partial backups are performed by specifying the READ_WRITE_FILEGROUPS
           option as follows:

           BACKUP DATABASE database_name READ_WRITE_FILEGROUPS
           [,<file_filegroup_list>] TO <backup_device>

              When a partial backup is executed, SQL Server backs up the primary filegroup, all read/write
           filegroups, and any explicitly specified read-only filegroups.
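   For example, a partial backup that also includes an assumed read-only filegroup named
Archive (both the filegroup name and the path are illustrative) could be written as:

BACKUP DATABASE AdventureWorks READ_WRITE_FILEGROUPS,
    FILEGROUP = 'Archive'  -- an assumed read-only filegroup; omit this line to skip read-only filegroups
    TO DISK = 'C:\backups\AdventureWorks_partial.bak'
GO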


           Page Corruption
           Hopefully you will never have to deal with corruption within a database. Unfortunately, hardware
           components fail, especially drive controllers and disk drives. Prior to a complete failure, drive
           controllers or disk drives can introduce corruption to data pages by performing incomplete writes.
              Prior to SQL Server 2005, when SQL Server encountered a corrupt page, you could potentially
           have the entire instance go off-line. SQL Server 2005 introduced the ability to quarantine corrupt
pages while allowing the database to remain online. After you execute the following
command, SQL Server detects and quarantines corrupt pages:

           ALTER DATABASE <dbname> SET PAGE_VERIFY CHECKSUM

              When SQL Server writes a page to disk, a checksum is calculated for the page. When you
           enable page verification, each time a page is read from disk, SQL Server computes a new
           checksum and compares it to the checksum stored on the page. If the checksums do not
           match, SQL Server returns an error and logs the page into a table in the msdb database.
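   The msdb table that stores these entries is suspect_pages; a quick way to review what has
been logged is:

SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;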


               NOTE   REPAIRING A CORRUPT PAGE
              If the database is participating in a Database Mirroring session, a copy of the corrupt page
              is retrieved from the mirror. If the page on the mirror is intact, the corrupt page is repaired
              automatically with the page retrieved from the mirror.


               Although corrupt pages can be quarantined, SQL Server has a mechanism in
            place to protect your database from massive corruption. You are limited to a total of 1,000
           corrupt pages in a database. When you reach the corrupt page limit, SQL Server takes the
           database off-line and places it in a suspect state to protect it from further damage.


           Maintenance Plans
           Maintenance plans provide a mechanism to graphically create job workflows that support
           common administrative functions such as backup, re-indexing, and space management.
              Tasks that are supported by maintenance plans are:
                  Backing up of databases and transaction logs
                  Shrinking databases

Re-indexing
      Updating of statistics
      Performing consistency checks
   The most common tasks performed by maintenance plans are database backups. Instead
of writing the code to back up a database, you can configure a maintenance plan to perform
the backup operations that you need to support your disaster recovery requirements.


   NOTE   EXECUTING MAINTENANCE PLANS
   Maintenance plans are based upon the tasks within SQL Server Integration Services (SSIS). So,
   when a maintenance plan executes, it first loads the SSIS engine. Then the .NET Framework
   interprets the tasks within the package, constructs the necessary backup statements, and
   executes the code generated.




Certificates and Master Keys
You always have a service master key for each instance. You could also have database master
keys and certificates. Certificates and master keys need to be backed up to ensure a complete
recovery of your instance.
    A service master key is created automatically the first time that an instance is started.
The service master key is regenerated each time that you change the SQL Server service
account or service account password. The first action that you should take after an instance
is started is to back up the service master key. You should also back up the service master
key immediately following a change to the service account or service account password.
The generic syntax to back up a service master key is:

BACKUP SERVICE MASTER KEY TO FILE = 'path_to_file'
    ENCRYPTION BY PASSWORD = 'password'

   Database master keys (DMKs) are created prior to the creation of a certificate, symmetric
key, or asymmetric key. As explained in Chapter 11, “Designing SQL Server Security,” a DMK is
the root of the encryption hierarchy in a database. To ensure that you can access certificates,
asymmetric keys, and symmetric keys within a database, you need to have a backup of the
DMK. Immediately following the creation of a DMK, you should create a backup of the DMK.
The generic syntax to back up a DMK is:

BACKUP MASTER KEY TO FILE = 'path_to_file'
    ENCRYPTION BY PASSWORD = 'password'

   Before you can back up a DMK, it must be open. By default, a DMK is encrypted with the
service master key. If the DMK is encrypted only with a password, you must first open the
DMK by using the following command:

USE <database name>;
OPEN MASTER KEY DECRYPTION BY PASSWORD = '<SpecifyStrongPasswordHere>';


Certificates are used to encrypt data as well as digitally sign code modules. Although you
           could create a new certificate to replace the digital signature in the event of the loss of a
           certificate, you must have the original certificate to access any data that was encrypted with
           the certificate. Certificates have both a public and a private key. You can back up just the pub-
           lic key by using the following command:

           BACKUP CERTIFICATE certname TO FILE = 'path_to_file'

              However, if you restore a backup of a certificate containing only the public key, SQL Server
           generates a new private key. Unfortunately, the private key is the important component of a
           certificate that is used to encrypt/decrypt data within SQL Server. Therefore, you need to ensure
           that both the public and private keys are backed up for a certificate. Just like master keys, you
           should back up a certificate immediately after creation by using the following command:

           BACKUP CERTIFICATE certname TO FILE = 'path_to_file'
                  [ WITH PRIVATE KEY
                    ( FILE = 'path_to_private_key_file' ,
                      ENCRYPTION BY PASSWORD = 'encryption_password'
                      [ , DECRYPTION BY PASSWORD = 'decryption_password' ]         )     ]




                    Backup Storage


                     To restore databases after a disaster, you need to be able to access your
                        backups. Because disasters can encompass an entire site, all backups should
                    be stored off-site. However, backups that are rotated to an off-site storage facility
                    pose a security risk because you are moving data to another location that is
                    outside the security controls of Active Directory and your network. Therefore, you
                    need to take appropriate physical security measures to ensure that your backups
                    are safe.

                    Master keys and certificates impose an additional constraint on off-site storage. It
                    is common for an organization to have a single off-site backup vendor that collects
                    and stores all the corporate backups. If someone were to steal the backup of a
                    database that contained encrypted data, he or she would not be able to access
                    the encrypted data without also having access to the master keys and certificates.
                    Therefore, although you need to back up certificates and master keys, the backups
                    of your master keys and certificates should never be stored in the same location as
                    the databases with which they are associated.




              MORE INFO         MASTER KEYS
              Chapter 11 has additional information about the use and management of certificates and
              master keys.



Validating a Backup
Because backups are your insurance policy for a database, you need to ensure that the backups
created are valid and usable. To validate a backup, execute the following command:

RESTORE VERIFYONLY FROM <backup device>

   When a backup is validated, SQL Server performs the following checks:
       Calculates a checksum for the backup and compares it to the checksum stored in the
       backup file
       Verifies that the header of the backup is correctly written and valid
       Traverses the page chain to ensure that all pages are contained in the database and can
       be located


        Quick Check
        1. What are the four types of backups?

       2. How can you detect and log corrupt pages?

       Quick Check Answers
       1. You can execute full, differential, transaction log, and file/filegroup backups. A full
            backup is required before you can create a differential or transaction log backup.

       2. Execute ALTER DATABASE <database name> SET PAGE_VERIFY CHECKSUM.




 PRACTICE   Backing up Databases

In this practice, you create full, differential, and transaction log backups.


   NOTE

   Before performing the following exercises, verify that the AdventureWorks database is set
   to the Full recovery model.


PRACTICE 1   Create a Compressed, Mirrored, Full Backup
In this practice, you create a compressed backup for the AdventureWorks database, mirror
the backup for redundancy, and validate page checksums.
  1.   Execute the following code to back up the AdventureWorks database:

        BACKUP DATABASE AdventureWorks
              TO DISK = 'C:\test\AdventureWorks_1.bak'
              MIRROR TO DISK = 'C:\test\AdventureWorks_2.bak'
              WITH COMPRESSION, INIT, FORMAT, CHECKSUM, STOP_ON_ERROR
        GO


PR ACTICE 2      Create a Transaction Log Backup
           In this practice, you create a pair of transaction log backups for the AdventureWorks database.
             1.   Execute the following code to modify data and perform the first transaction log
                  backup:

                  USE AdventureWorks
                  GO


                  INSERT INTO HumanResources.Department
                  (Name, GroupName)
                  VALUES('Test1', 'Research and Development')
                  GO


                   BACKUP LOG AdventureWorks
                   TO DISK = 'C:\test\AdventureWorks_1.trn'
                   WITH COMPRESSION, INIT, CHECKSUM, STOP_ON_ERROR
                   GO

             2.   Execute the following code to make another data modification and perform a second
                  transaction log backup:

                  INSERT INTO HumanResources.Department
                  (Name, GroupName)
                  VALUES('Test2', 'Research and Development')
                  GO


                   BACKUP LOG AdventureWorks
                   TO DISK = 'C:\test\AdventureWorks_2.trn'
                   WITH COMPRESSION, INIT, CHECKSUM, STOP_ON_ERROR
                   GO


PRACTICE 3   Create a Differential Backup
           In this practice, you create a differential backup for the AdventureWorks database.
  1.   Execute the following code to create another transaction:

                  USE AdventureWorks
                  GO


                  INSERT INTO HumanResources.Department
                  (Name, GroupName)
                  VALUES('Test3', 'Research and Development')
                  GO

             2.   Execute the following code to create a differential backup:

                  BACKUP DATABASE AdventureWorks
                        TO DISK = 'C:\test\AdventureWorks_1.dif'
                        MIRROR TO DISK = 'C:\test\AdventureWorks_2.dif'
            WITH DIFFERENTIAL, COMPRESSION, INIT, FORMAT, CHECKSUM, STOP_ON_ERROR
       GO



Lesson Summary
       Full backups are the starting point for every backup procedure and recovery process.
       A full backup contains only the pages within the database that have been used.
       Differential backups contain all pages that have changed since the last full backup and
       are used to reduce the number of transaction log backups that need to be applied.
       Transaction log backups contain all the changes that have occurred since the last
       transaction log backup.
       To execute a transaction log backup, the database must be in either the Full or
       Bulk-logged recovery model, a full backup must have been executed, and the
       transaction log must not have been truncated since the last full backup.
       You can back up only the filegroups that accept changes by using the READ_WRITE_
       FILEGROUPS option of the BACKUP DATABASE command.


Lesson Review
The following question is intended to reinforce key information presented in Lesson 1,
“Backing up Databases.” The question is also available on the companion CD if you prefer to
review it in electronic form.


   NOTE   ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or
   incorrect are located in the “Answers” section at the end of the book.


  1.   You are the database administrator for Fabrikam. The Orders database is critical
       to company operations and is set to the Full recovery model. You are running full
       backups daily at 1 A.M., differential backups every four hours beginning at 5 A.M., and
       transaction log backups every five minutes. If the Orders database were to become
       damaged and go off-line, what is the first step in the restore process?
       A. Restore the most recent full backup with the NORECOVERY option.
       B. Restore the most recent differential backup with the NORECOVERY option.
       C. Back up the transaction log with the NO_TRUNCATE option.
       D. Back up the transaction log with the TRUNCATE_ONLY option.




Lesson 2: Restoring Databases
           In everyday life, you purchase various types of insurance. You hope that you never have to use
           the insurance, but in the event of a disaster, an insurance policy provides financial protection.
           Backups are the insurance policy for your data. You hope that you never need to use your
           backups, but in the event of a disaster, backups allow you to recover your data and continue
           business operations. In this lesson, you learn how to use your backups to recover your SQL
           Server environment. In addition, because the recovery of a database depends upon the state
           of the transaction log, you also learn a little about the internals of a transaction log.


       After this lesson, you will be able to:
                     Restore databases

                  Estimated lesson time: 20 minutes



           Transaction Log Internals
           A transaction log is one or more files associated with a database that tracks every modification
           made to either data or objects within the database. Transaction logs do not span databases;
therefore, a business transaction that is executed across multiple databases is implemented
           physically as a separate transaction within each affected database. The transaction log is also
           required to store enough information to allow SQL Server to recover a database when the
           instance is restarted.
              The key piece of information within a transaction log is the LSN. The LSN starts at 0 when
           the database is created and increments to infinity. The LSN always moves forward, never
           repeats, and cannot be reset to a previous value. Each operation that affects the state of the
           database increments the LSN.
              Each storage unit within a database tracks the LSN of the last modification made to the
           storage structure. At a database level, the LSN of the last change in the database is stored in
           the header of the master data file. At a data file level, the LSN of the last change to a page
           within the file is stored in the header of the data file. Each data page within a database also
           records the LSN of the last change for the data page.
              All data changes occur within buffers in memory. When a change is made, the
           corresponding buffer is modified and a record is added to the transaction log. A modified
           page in the buffer pool is referred to as a dirty page. Every dirty page tracks the LSN in the
           transaction log that corresponds to the change that modified the page in the buffer pool.
           When SQL Server executes a checkpoint, all dirty pages in the buffer pool are written out
           to the data files.
             During the checkpoint process, SQL Server compares the LSN of the dirty page in the
           buffer pool to the LSN of the data page on disk. If the LSN of the data page on disk is equal


to or less than the LSN of the dirty page in the buffer pool as well as equal to or less than the
LSN for the data file, the page on disk is overwritten with the page from the buffer pool. If the
LSN of the dirty page is greater than the LSN of the page on disk or of the data file containing the page,
the page in the buffer pool is overwritten by the page on disk. When the checkpoint process
finishes writing dirty pages to the data files, the largest LSN written to each file is written
into the header of the file. In addition, the largest LSN written across the entire checkpoint
process is written to the header of the master data file. SQL Server ensures that the LSN for
every page within a file is equal to or less than the LSN for the file and that the LSN for every
file within a database is equal to or less than the LSN for the database. The final step in the
process is to clear the flag on each dirty page affected by the checkpoint that designates the
page has changed.
    When a SQL Server is started, every database undergoes a process called restart recovery.
Restart recovery runs in two phases—REDO and UNDO. During the REDO phase, all
committed transactions in the transaction log are flushed to disk. The REDO phase uses the
same basic logic as the checkpoint process. If the LSN stored on the page is less than or equal
to the LSN of the log record being written to the page, the change is written. Otherwise, it is
skipped as having already been hardened to disk. At the completion of the REDO phase, the
UNDO phase starts. The UNDO phase moves through the transaction log and invalidates any
transaction in the log that is still open, ensuring that an uncommitted transaction cannot be
written out to disk. At the completion of the UNDO phase, the database undergoes a process
that is referred to as rolling forward. When a database is rolled forward, SQL Server reads the
last LSN recorded in the transaction log, increments the LSN, and writes the new LSN into
the header of every data file within the database, ensuring that transactions older than the
roll-forward point cannot be written to the data files.
   Every backup that is created stores the minimum and maximum LSN for the database,
which corresponds to the backup taken. Because a full backup contains the portion of the
transaction log that was generated while the backup is running, a full backup is consistent
as of the time that the full backup completes and stores only the last LSN used within the
backup. Differential and transaction log backups record the database LSN at the beginning
of the backup operation, as well as the LSN at the end of the backup operation.
   Because the LSN is always moving forward, SQL Server only has to compare the current
LSN to the LSN(s) recorded for the backup to determine if a backup can be applied to a
database. If the backup contains the next LSN in the sequence, then the backup can be
restored. If the backup does not contain the next LSN in the sequence, an error is generated
and the restore process terminates without applying any changes.
   A full backup or a filegroup backup is required to begin a restore sequence, and then
additional differential and transaction log backups can be applied. However, to restore
additional differential or transaction log backups, the database or filegroup must be in a
restoring state. Any attempt to restore a differential or transaction log backup to a database
or filegroup that is not in a restoring state results in an error.
   Over the years, many people have been incorrectly told that a differential or transaction log
backup cannot be restored to a database that is recovered because at the end of the restore


process, the LSN is rolled forward and is no longer compatible with any of the transaction log
           or differential backups. SQL Server does not reject the differential or transaction log backup
           in this case due to the LSN. The differential and transaction log backup are specific to a full or
           filegroup backup. A database that is recovered can have transactions executed, which would
           make the database state incompatible with the differential or transaction log backup. Because
           transactions cannot be executed against a database or filegroup that is in a recovering state,
SQL Server only has to verify whether the database or filegroup is in a recovering state to proceed
           with the secondary check for LSN compatibility.


                MORE INFO

                 For more information about how SQL Server processes transactions, as well as the structure
                 of data files and transaction logs, please refer to Microsoft SQL Server 2008 Internals
                 by Kalen Delaney (Microsoft Press, 2009).




           Database Restores
           All restore sequences begin with either a full backup or filegroup backup. When restoring
           backups, you have the option to terminate the restore process at any point and make the
           database available for transactions. After the database or filegroup being restored has been
           brought online, you can’t apply any additional differential or transaction log backups to the
           database.


           Restoring a Full Backup
           The generic syntax for restoring a full backup is:

           RESTORE DATABASE { database_name | @database_name_var }
            [ FROM <backup_device> [ ,...n ] ]
            [ WITH         {[ RECOVERY | NORECOVERY |
                        STANDBY = {standby_file_name | @standby_file_name_var }             ]
                  | ,   <general_WITH_options> [ ,...n ]
                    | , <replication_WITH_option>
                | , <change_data_capture_WITH_option>
                    | , <service_broker_WITH options>
                    | , <point_in_time_WITH_options—RESTORE_DATABASE>
                    } [ ,...n ]
            ]


           <general_WITH_options> [ ,...n ]::=
           --Restore Operation Options
                MOVE 'logical_file_name_in_backup' TO 'operating_system_file_name'
                         [ ,...n ]   | REPLACE   | RESTART   | RESTRICTED_USER




When a RESTORE command is issued, if the database does not already exist within the instance,
SQL Server creates the database along with all files underneath the database. During this process,
each file is created and sized to match the file sizes at the time the backup was created. After it
creates the files, SQL Server begins restoring each database page from the backup.


   TIP   RESTORING AN EXISTING DATABASE
    The creation and sizing of all files associated with a database can consume a significant
   amount of time. If the database already exists, you should just restore over the top of the
   existing database, as described next.


   The REPLACE option is used to force the restore over the top of an existing database.
    Because it is much more common to back up a database than it is to restore a database,
the backup process is optimized to complete in the shortest amount of time. To accomplish
the shortest duration backup, SQL Server pulls pages into the backup regardless of the page
order. However, when restoring a database, the pages must be placed back into the database
in sequential order. Within each file, SQL Server must locate page 1, then page 2, then page 3,
etc. As a general rule of thumb, a restore operation will take approximately 30 percent longer
than the duration of the backup being restored.
   The first pages within a database store the structural information about the database such
as the list of pages allocated to the database. After the restore process has restored the first
page in the database, anything currently residing on disk is invalidated. If you are restoring
over the top of an existing database and the restore process aborts, you can no longer
access anything in the database prior to the restore operation. If you are restoring a file or a
filegroup, only that portion of the database being restored is affected.
    If you want the database to be online and accessible for transactions after the RESTORE
operation has completed, you need to specify the RECOVERY option. When a RESTORE is
issued with the NORECOVERY option, the restore completes, but the database is left in a
RECOVERING state such that subsequent differential and/or transaction log backups can be
applied. The STANDBY option can be used to allow you to issue SELECT statements against
the database while still issuing additional differential and/or transaction log restores. If
you restore a database with the STANDBY option, an additional file is created to make the
database consistent as of the last restore that was applied.
   The file system on the machine that you are restoring the database to might not match the
machine where the backup was taken, or you may want to change the location of database
files during the restore. The MOVE option provides the ability to change the location of one
or more data files when the database is restored.
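   As a sketch, you can list the logical file names stored in a backup and then relocate the
files during the restore. The logical names and target paths below are assumptions; verify
them with RESTORE FILELISTONLY first:

-- Inspect the logical file names stored in the backup
RESTORE FILELISTONLY FROM DISK = 'C:\backups\AdventureWorks_1.bak';

-- Restore while relocating the data and log files (names and paths are assumptions)
RESTORE DATABASE AdventureWorks
    FROM DISK = 'C:\backups\AdventureWorks_1.bak'
    WITH MOVE 'AdventureWorks_Data' TO 'D:\data\AdventureWorks_Data.mdf',
         MOVE 'AdventureWorks_Log' TO 'E:\logs\AdventureWorks_Log.ldf',
         REPLACE, RECOVERY;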

Restore Paths
Before restoring the database, you need to first take an inventory of the backups you have
created and the state of the database each backup applies to. Table 9-1 provides a basic
overview of the database state with respect to each backup that was created in Lesson 1.



TABLE 9-1 Database Modifications

             BACKUP CREATED               DATA MODIFICATION

             Full backup
                                          Insert Test1
             Log backup
                                          Insert Test2
             Log backup
                                          Insert Test3
             Differential backup


               Regardless of the point to which you want to restore the database, you have to restore the
           full backup first. If you wanted to restore the database only up to the point where the Test1
           department was added, you would then restore the first transaction log backup and recover
           the database. The Test2 and Test3 departments would be lost. Similarly, if you want to restore
           the database to the point before the Test3 department was added, you would restore the full
           backup and then the first and second transaction log backups before recovering the database.
               If you wanted to restore the database without losing any data, you only need to restore
           the full backup and then the differential backup because the differential also contains all the
           changes captured by each of the transaction log backups. What would happen, though, if you
           restored the full backup and only then found out that the differential could not be used due
           to damage to the backup? You could restore the two transaction log backups, but you would
           irrevocably lose the Test3 department that was inserted.
               To provide the greatest flexibility for a restore, the first step in any restore operation is to
           issue a transaction log backup against the original database. Obviously, if the entire original
           database no longer exists, you do not have the option to take a final transaction log backup
           before beginning restore operations. However, so long as the transaction log is intact and
           the master database still has an entry for the damaged database, you are allowed to issue
           a BACKUP LOG command against the database, even if all the data files are damaged or
           missing. The step in the restore process where you first take a final transaction log backup is
           referred to as backing up the tail of the log.
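
               A tail-log backup is an ordinary BACKUP LOG command. A minimal sketch, assuming the
            data files are damaged but the log file is intact (the backup path is illustrative):

            BACKUP LOG AdventureWorks
            TO DISK = 'c:\test\AdventureWorks_tail.trn'
            WITH NO_TRUNCATE
            GO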


           Restoring a Differential Backup
           A differential restore uses the same command syntax as a full database restore. When the full
           backup has been restored, you can then restore the most recent differential backup.
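
               For example, a sketch of the full-plus-differential restore sequence (the backup
            paths are illustrative):

            RESTORE DATABASE AdventureWorks
                FROM DISK = 'c:\test\AdventureWorks_1.bak'
                WITH NORECOVERY
            GO

            RESTORE DATABASE AdventureWorks
                FROM DISK = 'c:\test\AdventureWorks_1.dif'
                WITH RECOVERY
            GO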


           Restoring a Transaction Log Backup
           The generic syntax for restoring a transaction log backup is:

RESTORE LOG { database_name | @database_name_var }
 [ <file_or_filegroup_or_pages> [ ,...n ] ]
 [ FROM <backup_device> [ ,...n ] ]
 [ WITH
   { [ RECOVERY | NORECOVERY
       | STANDBY = { standby_file_name | @standby_file_name_var } ]
     | , <general_WITH_options> [ ,...n ]
     | , <replication_WITH_option>
     | , <point_in_time_WITH_options—RESTORE_LOG>
   } [ ,...n ] ]

<point_in_time_WITH_options—RESTORE_LOG>::=
 { STOPAT = { 'datetime' | @datetime_var }
 | STOPATMARK = { 'mark_name' | 'lsn:lsn_number' }
                [ AFTER 'datetime' ]
 | STOPBEFOREMARK = { 'mark_name' | 'lsn:lsn_number' }
                [ AFTER 'datetime' ]
 }

   There are times when you need to restore a database but do not want to recover every
transaction that was issued. When restoring a transaction log, you can have SQL Server replay
only a portion of the log by performing what is referred to as a point-in-time restore. The
STOPAT option allows you to specify a date and time to which SQL Server restores. The
STOPATMARK and STOPBEFOREMARK options allow you to specify either an LSN or a
transaction log mark to use as the stopping point in the restore operation.
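
   For example, a sketch of a point-in-time restore that stops log replay at 2:30 P.M. (the
datetime and backup path are illustrative):

RESTORE LOG AdventureWorks
    FROM DISK = 'c:\test\AdventureWorks_1.trn'
    WITH STOPAT = '2008-06-01 14:30:00', RECOVERY
GO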


Online Restores
A database has a state that governs whether it can be accessed and what operations can
be performed. For example, a database in an ONLINE state allows transactions and any
other operations to be executed, but a database in an EMERGENCY state allows only SELECT
operations to be executed by a member of the db_owner database role.
Each filegroup within a database can have a state. While one filegroup can be in a RESTORING
state and not be accessible, another filegroup can be in an ONLINE state and accept transactions.
The state of the database equals the state of the filegroup designated as PRIMARY.
     SQL Server 2008 Enterprise allows you to perform restore operations while the database is
still online and accessible. However, because a full backup affects the entire database, the state
of the database is the state of the primary filegroup, and a database that is restoring is not
accessible. To perform an online restore operation, you must perform a file or filegroup restore.
In addition, you cannot be restoring the primary filegroup or a file within the primary filegroup.
   A filegroup restore that affects only a portion of the database is referred to as a partial restore.
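
   A minimal sketch of an online filegroup restore, assuming SQL Server 2008 Enterprise and
a secondary filegroup named Archive (the filegroup name and backup paths are illustrative):

RESTORE DATABASE AdventureWorks
    FILEGROUP = 'Archive'
    FROM DISK = 'c:\test\AdventureWorks_fg.bak'
    WITH NORECOVERY
GO

-- Apply subsequent transaction log backups, then recover the filegroup:
RESTORE DATABASE AdventureWorks WITH RECOVERY
GO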


Restore a Corrupt Page
Page corruption occurs when the contents of a page are not consistent. Page contents can
become inconsistent when the page checksum does not match the contents of the page
or a row is only partially written to the page. Page corruption usually occurs when a disk
controller begins to fail.
   If the page corruption occurs in an index, you do not need to perform a restore to fix the
corrupted page. Dropping and re-creating the index removes the corruption.

However, if the corruption occurs within a page of data within a table or the primary key,
           you need to perform a restore to fix the corruption issue. In addition to being able to restore
           filegroups, you can also restore an individual page in the database.
              Page restore has several requirements:
                    The database must be in either the Full or Bulk-logged recovery model.
                   You must be able to create a transaction log backup.
                   A page restore can apply only to a read/write filegroup.
                   You must have a valid full, file, or filegroup backup available.
                   The page restore cannot be executed at the same time as any other restore operation.
               All editions of SQL Server 2008 allow you to restore one or more pages while the database
           is off-line. SQL Server 2008 Enterprise allows you to restore a page while the database, as well
           as the filegroup containing the corrupt page, remain online. However, any operations that
           attempt to access the page(s) during a restore receive an error and fail.
              The syntax to restore a page is:
           RESTORE DATABASE database_name
              PAGE = 'file:page [ ,...n ]' [ ,...n ]
              FROM <backup_device> [ ,...n ]
           WITH NORECOVERY

              The procedure to restore a corrupt page is as follows:
             1.    Retrieve the PageID of the damaged page.
             2.    Using the most recent full, file, or filegroup backup, execute the following command:

                   RESTORE DATABASE database_name
                      PAGE = 'file:page [ ,...n ]' [ ,...n ]
                      FROM <backup_device> [ ,...n ]
                   WITH NORECOVERY

              3.   Restore any differential backups with the NORECOVERY option.
             4.    Restore any additional transaction log backups with the NORECOVERY option.
              5.   Create a transaction log backup.
             6.    Restore the transaction log backup from step #5 using the WITH RECOVERY option.
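
               Putting these steps together, a hedged sketch of a single-page restore; the file:page
            ID and backup paths are illustrative, and damaged pages can be located by querying
            msdb.dbo.suspect_pages:

                    -- Step 1: identify the damaged page (file_id and page_id)
                    SELECT database_id, file_id, page_id, event_type
                    FROM msdb.dbo.suspect_pages
                    GO

                    -- Steps 2 through 4: restore the page, then any differential and
                    -- transaction log backups, all WITH NORECOVERY
                    RESTORE DATABASE AdventureWorks
                       PAGE = '1:57'
                       FROM DISK = 'c:\test\AdventureWorks_1.bak'
                    WITH NORECOVERY
                    GO

                    -- Steps 5 and 6: back up the tail of the log, then restore it WITH RECOVERY
                    BACKUP LOG AdventureWorks TO DISK = 'c:\test\AdventureWorks_tail.trn'
                    GO
                    RESTORE LOG AdventureWorks
                       FROM DISK = 'c:\test\AdventureWorks_tail.trn'
                    WITH RECOVERY
                    GO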


              EXAM TIP
              You need to know the steps to restore a page to a database.



           Restoring with Media Errors
           As noted previously in this lesson, because pages are restored in sequential order, as soon as the
           first page has been restored to a database, anything that previously existed is no longer valid. If
           a problem with the backup media was subsequently encountered and the restore aborted, you


would be left with an invalid database that could not be used. If the only copy of your database
was the copy that you just overwrote with a restore or if you had only a single backup of the
database, not only would you have an invalid database, you would have lost all of your data.
    SQL Server has the ability to continue a restore operation even if the backup media
is damaged. When it encounters an unreadable section of the backup file, SQL Server can
skip past the damaged section and restore as much of the rest of the database as possible.
This feature is referred to as best effort restore.
    After the restore operation completes, the database is set to EMERGENCY mode. An
administrator then has to connect to the database and determine if it is viable. If the database
is deemed to be valid and viable, the administrator can change the database status to
ONLINE. If the database is not viable, you at least have the option to read as much data as
possible from the database while it is in EMERGENCY mode.
  To restore from backup media that has been damaged, you need to specify the
CONTINUE_AFTER_ERROR option for a RESTORE DATABASE or RESTORE LOG command.
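
   For example, a sketch of a best effort restore, followed by bringing the database back
online once you have verified that it is viable (the backup path is illustrative):

RESTORE DATABASE AdventureWorks
    FROM DISK = 'c:\test\AdventureWorks_1.bak'
    WITH CONTINUE_AFTER_ERROR
GO

-- After inspecting the database in EMERGENCY mode:
ALTER DATABASE AdventureWorks SET ONLINE
GO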


       Quick Check
       1. Which recovery model always allows you to restore to the point of failure so
          long as you can back up the tail of the log?

       2. What is the first operation that should be performed for any restore operation?

       Quick Check Answers
       1. The Full recovery model.

       2. Back up the tail of the log.




 PR ACTICE        Restoring a Database

In these practices, you restore the AdventureWorks database using the backups that were
created in the Lesson 1 practice.

PR ACTICE 1        Purposely Damage a Database
In this practice, you purposely damage the AdventureWorks database such that a restore is
required to be able to access your data.
  1.   Execute the following query to insert a new row into the AdventureWorks database:

       USE AdventureWorks
       GO


       INSERT INTO HumanResources.Department
       (Name, GroupName)
       VALUES('Test4', 'Research and Development')
       GO


2.    Open SQL Server Configuration Manager and stop the SQL Server service for your SQL
                   Server 2008 instance.
              3.   Open Windows Explorer and delete the AdventureWorks.mdf file. Make certain that
                   you do not delete the AdventureWorks_1.ldf file.
             4.    In SQL Server Configuration Manager, start the SQL Server service for your SQL Server
                   2008 instance.
              5.   Reconnect to the instance with SSMS.
             6.    Observe that although the entry for the AdventureWorks database still exists, the
                   database is completely inaccessible because the data file no longer exists.

           PR ACTICE 2       Restore a Full Backup
           In this practice, you restore the AdventureWorks database from the full backup you created in
           the Lesson 1 practice.


              NOTE

              Restoring a database requires exclusive access. Prior to executing any restore operation, you
              need to ensure that you do not have any connections open to the AdventureWorks database.


             1.    The first step in a restore process is to back up the tail of the log. Open a new query
                   window and execute the following code:

                    BACKUP LOG AdventureWorks
                    TO DISK = 'c:\test\AdventureWorks_3.trn'
                    WITH COMPRESSION, INIT, NO_TRUNCATE
                    GO

             2.    Now that you have captured the tail of the log, execute the following code to restore
                   the full backup:

                    RESTORE DATABASE AdventureWorks
                         FROM DISK = 'c:\test\AdventureWorks_1.bak'
                         WITH STANDBY = 'c:\test\AdventureWorks.stn'
                    GO

              3.   Verify that you can read the data but not modify it.
             4.    Verify that the departments that were added are missing from the database.

           PR ACTICE 3       Restore a Differential Backup
           In this practice, you restore the differential backup to the AdventureWorks database and
           following verification, you bring the database online.
             1.    Execute the following code to restore the differential backup:
                    RESTORE DATABASE AdventureWorks
                         FROM DISK = 'c:\test\AdventureWorks_1.dif'
                         WITH STANDBY = 'c:\test\AdventureWorks.stn'
                    GO


2.   Verify that the Test1, Test2, and Test3 departments exist.
  3.   Execute the following code to recover the database and terminate the restore process:

       RESTORE DATABASE AdventureWorks
       WITH RECOVERY
       GO


PR ACTICE 4      Restore a Transaction Log Backup
In this practice, you restore the AdventureWorks database using the three transaction log
backups that you previously created.
  1.   Execute the following code to restore the full backup of the AdventureWorks database:

        RESTORE DATABASE AdventureWorks
             FROM DISK = 'c:\test\AdventureWorks_1.bak'
             WITH STANDBY = 'c:\test\AdventureWorks.stn',
                 REPLACE
        GO

  2.   Verify that data is missing from the Department table.
  3.   Execute the following code to restore the first transaction log backup:

        RESTORE LOG AdventureWorks
             FROM DISK = 'c:\test\AdventureWorks_1.trn'
             WITH STANDBY = 'c:\test\AdventureWorks.stn'
        GO

  4.   Verify that the Test1 department now exists.
  5.   Execute the following code to restore the second transaction log backup:

        RESTORE LOG AdventureWorks
             FROM DISK = 'c:\test\AdventureWorks_2.trn'
             WITH STANDBY = 'c:\test\AdventureWorks.stn'
        GO

  6.   Verify that the Test2 department now exists.
  7.   Execute the following code to restore the third transaction log backup:

        RESTORE LOG AdventureWorks
             FROM DISK = 'c:\test\AdventureWorks_3.trn'
             WITH STANDBY = 'c:\test\AdventureWorks.stn'
        GO

  8.   Verify that the Test3 and Test4 departments exist.
  9.   Execute the following code to recover the database for access and transactions:

       RESTORE DATABASE AdventureWorks
       WITH RECOVERY
       GO




Lesson Summary
                  The first step in any restore procedure is to back up the tail of the log.
                  A restore sequence begins with either a full, file, or filegroup restore.
                  If you execute a restore with the NORECOVERY option, you can apply subsequent
                  differential and transaction log backups.
                  If you execute a restore with the RECOVERY option, the database is recovered and you
                  cannot restore any additional backups to the database.
                  You can specify the STANDBY option during a restore process if you need to read the
                  contents of the database while still being able to restore additional differential and
                  transaction log backups.
                  You can execute a transaction log backup so long as the transaction log file is intact,
                  even if all data files for the database are damaged.
                   A restore can continue past damage to backup media when you specify the
                   CONTINUE_AFTER_ERROR option. If errors are encountered, the database is left in
                   EMERGENCY mode following the restore.


           Lesson Review
           The following question is intended to reinforce key information presented in Lesson 2,
           “Restoring Databases.” The question is also available on the companion CD if you prefer to
            review it in electronic form.


               NOTE  ANSWERS
               Answers to this question and an explanation of why each answer choice is correct or
               incorrect are located in the “Answers” section at the end of the book.


             1.   The server that the Customers database is running on fails and needs to be replaced.
                  You build a new server and install SQL Server 2008. When you built the new server,
                  you decided that instead of configuring the new server exactly like the old one, you
                  implement a new drive letter and folder structure for data and log files. Which option
                  do you need to use when you restore the Customers database to the new server?
                  A. NORECOVERY
                  B. CONTINUE_AFTER_ERROR
                  C. MOVE
                  D. PARTIAL




Lesson 3: Database Snapshots
The Database Snapshots feature was introduced in SQL Server 2005 to provide users a
method to create read-only copies of data rapidly. In this lesson, you learn how to create
a database snapshot, as well as how to use a database snapshot to revert data or a database
to a previous point in time.


   NOTE  DATABASE SNAPSHOT
   Database Snapshot is available only in SQL Server 2008 Enterprise.



   CAUTION  MANAGING FILESTREAM DATA
   Database Snapshot is not compatible with FILESTREAM. If you create a Database Snapshot
   against a database with FILESTREAM data, the FILESTREAM filegroup is disabled and not
   accessible.




      After this lesson, you will be able to:
            Create a Database Snapshot
            Revert data or a database from a Database Snapshot

      Estimated lesson time: 20 minutes



Creating a Database Snapshot
The creation of a Database Snapshot is very similar to the creation of any database. To create
a Database Snapshot, you use the CREATE DATABASE command with the AS SNAPSHOT OF
clause. Because a Database Snapshot is a point-in-time, read-only copy of a database, you
don’t specify a transaction log.
   The requirements to create a Database Snapshot are:
         You must include an entry for each data file specified in the source database.
         The logical name of each file must match the name in the source database exactly.
   The generic syntax to create a Database Snapshot is:
CREATE DATABASE database_snapshot_name
    ON
          (NAME = logical_file_name,
          FILENAME = 'os_file_name') [ ,...n ]
    AS SNAPSHOT OF source_database_name




The restrictions on a Database Snapshot are:
                  You can’t back up, restore, or detach a Database Snapshot.
                  The Database Snapshot must exist on the same instance as the source database.
                   Full-text indexes are not supported.
                  FILESTREAM is not supported, and any FILESTREAM data is inaccessible through the
                  Database Snapshot.
                  You can’t create a Database Snapshot against a system database.
                  You can’t drop, restore, or detach a source database that has a Database Snapshot
                  created against it.
                  You can’t reference filegroups that are off-line, defunct, or restoring.
               When a Database Snapshot is created, SQL Server doesn’t allocate space on disk equivalent
           to the current size of the data files in the source database. Instead, SQL Server takes advantage
           of an operating system feature called sparse files. A sparse file is essentially an entry in the file
           allocation table and consumes almost no space on disk. As data is added to the file, the file
           automatically grows on disk. By using sparse files, the creation time for a Database Snapshot is
           independent of the size of the source database.
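
               You can observe this behavior by comparing the snapshot's logical size to its actual
            footprint on disk. One way to do so, assuming a snapshot named AdventureWorksSnap, is
            the sys.dm_io_virtual_file_stats dynamic management function:

            SELECT DB_NAME(database_id) AS snapshot_name, file_id, size_on_disk_bytes
            FROM sys.dm_io_virtual_file_stats(DB_ID('AdventureWorksSnap'), NULL)
            GO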
              Accessing a Database Snapshot from an application perspective is very simple. A Database
           Snapshot looks and acts like a read-only database to any queries being issued. Therefore, you
           can issue a SELECT statement against a Database Snapshot and use the Database Snapshot
           just as you would any other database.
             At the time of creation, a Database Snapshot doesn’t contain any data. The instant a
           Database Snapshot is created, you can issue SELECT statements against the Database Snapshot.
           SQL Server uses the source database to retrieve data that hasn’t changed since you created the
           Database Snapshot.


           Copy-On-Write Technology
           Because a Database Snapshot has to retain the state of the data in the source database at the
           instant the Database Snapshot was created, SQL Server needs a mechanism to manage any
           changes that occur within the source database. The mechanism SQL Server uses is known as
           Copy-On-Write.
              Remember that data within SQL Server is stored on pages; there are eight pages in an
           extent, and SQL Server reads and writes extents. The first time a modification to a data page
           within an extent occurs, SQL Server copies the before image of the extent to the Database
           Snapshot. When SELECT statements are issued against the Database Snapshot, SQL Server
           retrieves data from the Database Snapshot for any data that has changed while still pulling
           data from the source database for any extents that have not changed.
              By writing the before image of the extent the first time a change is made, SQL Server allows
           changes to occur against the source database while also ensuring that any queries against the
           Database Snapshot do not reflect any changes after the Database Snapshot was created.



After the initial change has been made to a page within an extent and SQL Server writes
the extent to the Database Snapshot, any subsequent changes to the extent are ignored by
the Copy-On-Write feature.
   Because you can create multiple Database Snapshots against a source database, the before
image of an extent is written to each Database Snapshot that has not already received a copy
of the extent.


   TIP  DATABASE SNAPSHOT MAXIMUM SIZE
   Because SQL Server maintains the Database Snapshot at the point in time that the
   Database Snapshot was created, the maximum size of the Database Snapshot is the amount
   of data that existed in the source database at the time of creation.




Reverting Data Using a Database Snapshot
Because a Database Snapshot contains all the data in the source database at the time of
creation of the Database Snapshot, you can use the Database Snapshot to return data in the
source database to the state contained in the Database Snapshot. In extreme cases, you can
use the Database Snapshot to return the entire contents of the source database to the state
of the Database Snapshot, for example, when you need to discard every change that happened
within the database since the Database Snapshot was created.
   A database revert is a special category of restoring data that can be performed when you
have a Database Snapshot created.
   If you need to revert only a row or a portion of a database, you can use an INSERT,
UPDATE, DELETE, or MERGE statement. SQL Server also allows you to revert the entire
database using the Database Snapshot, if necessary. When you use the Database Snapshot to
revert the entire database, the source database goes back to exactly the way it looked at the
time the Database Snapshot was created. Any transactions that had been issued against the
source database are lost.
   The syntax to revert a database from a Database Snapshot is:

RESTORE DATABASE <database_name> FROM DATABASE_SNAPSHOT = <database_snapshot_name>

   When you revert a source database there are several restrictions:
         Only a single Database Snapshot can exist for the source database.
         Full-text catalogs on the source database must be dropped and then re-created after
         the revert completes.
         Because the transaction log is rebuilt, the transaction log chain is broken.
         Both the source database and Database Snapshot are off-line during the revert
         process.
         The source database cannot be enabled for FILESTREAM.
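
   For example, a sketch of a full database revert, assuming AdventureWorksSnap is the only
Database Snapshot that exists against AdventureWorks:

RESTORE DATABASE AdventureWorks
FROM DATABASE_SNAPSHOT = 'AdventureWorksSnap'
GO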


EXAM TIP
              You need to know that FILESTREAM is not compatible with Database Snapshots. Although
              you can create a Database Snapshot against a database enabled for FILESTREAM, you
              cannot use the Database Snapshot as a source for a RESTORE DATABASE operation.




                    Quick Check
                    1. Which two features are incompatible with Database Snapshots?

                    2. Prior to reverting a database using a Database Snapshot, what must you do?

                    Quick Check Answers
                    1. FILESTREAM and full-text indexes

                    2. You must drop all Database Snapshots except the Database Snapshot being used
                       as the source for the RESTORE command.




            PR ACTICE         Creating a Database Snapshot

           In the following practice, you create a Database Snapshot against the AdventureWorks database.
             1.    Execute the following code:

                    CREATE DATABASE AdventureWorksSnap ON
                    (NAME = N'AdventureWorks2008_Data', FILENAME = N'c:\test\AdventureWorks.ds'),
                    (NAME = N'AdventureWorksFT', FILENAME = N'c:\test\AdventureWorksFT.ds')
                    AS SNAPSHOT OF AdventureWorks
                    GO

              2.    Execute the following code to compare the structures of the source database and the
                    Database Snapshot. Note the value in the source_database_id column of
                    master.sys.databases:

                   SELECT * FROM AdventureWorks.sys.database_files
                   SELECT * FROM AdventureWorksSnap.sys.database_files
                   SELECT * FROM master.sys.databases
                   GO

              3.   Expand the Database Snapshots node in Object Explorer to view the new Database
                   Snapshot that you just created.
             4.    Execute a SELECT statement against the Database Snapshot and compare the results to
                   the AdventureWorks database.
              5.   Make a change to the data and compare the results between the Database Snapshot
                   and the AdventureWorks database.



Lesson Summary
        A Database Snapshot is a point-in-time, read-only copy of a database.
       The Database Snapshots feature is not compatible with FILESTREAM or full-text indexes.
       You can revert a database from a Database Snapshot.


Lesson Review
The following question is intended to reinforce key information presented in Lesson 3,
“Database Snapshots.” The question is also available on the companion CD if you prefer to
review it in electronic form.


   NOTE  ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or
   incorrect are located in the “Answers” section at the end of the book.


  1.   A Database Snapshot can be created against which database? (Choose all that apply.
       Each answer is a complete solution.)
       A. master
       B. A database with full text indexes
       C. A database with FILESTREAM data
       D. distribution




Chapter Review
         To practice and reinforce the skills you learned in this chapter further, you can perform the
         following tasks:
                  Review the chapter summary.
                  Review the list of key terms introduced in this chapter.
                  Complete the case scenario. The scenario sets up a real-world situation involving the
                  topics in this chapter and asks you to create a solution.
                  Complete the suggested practices.
                  Take a practice test.


         Chapter Summary
                  Backups are the insurance policy for your data. Although you hope that you never have
                  to use a backup, in the event of a disaster, backups allow you to recover your databases
                  and continue business operations.
                  The first operation that should be performed during a restore process is to back up the
                  tail of the log.
                  Every restore begins with a full, file, or filegroup backup.
                  You can create transaction log backups for a database that is in the Full or Bulk-logged
                  recovery model.
                  You can restore to a point in time using a transaction log backup; however, you cannot
                  restore to a point in time during which a minimally logged transaction was executing.


         Key Terms
         Do you know what these key terms mean? You can check your answers by looking up the
         terms in the glossary at the end of the book.
                  Database revert
                  Differential backup
                  Full backup
                  Log Sequence Number (LSN)
                  Online restore
                  Page corruption
                  Partial backup
                  Partial restore
                  Tail backup
                  Transaction log backup
                  Transaction log chain

Case Scenario
In the following case scenario, you apply what you’ve learned in this chapter. You can find
answers to these questions in the “Answers” section at the end of this book.

Case Scenario: Designing a Backup Strategy for Coho Vineyard
BACKGROUND
Company Overview
Coho Vineyard was founded in 1947 as a local, family-run winery. Due to the award-winning
wines it has produced over the last several decades, Coho Vineyards has experienced
significant growth. To continue expanding, the company acquired several additional wineries
over the years. Today, the company owns 16 wineries; 9 wineries are in Washington, Oregon,
and California, and the remaining 7 wineries are located in Wisconsin and Michigan. The
wineries employ 532 people, 162 of whom work in the central office that houses servers
critical to the business. The company has 122 salespeople who travel around the world and
need access to up-to-date inventory information.

Planned Changes
Until now, each of the 16 wineries owned by Coho Vineyard has run a separate Web site
locally on the premises. Coho Vineyard wants to consolidate the Web presence of these
wineries so that Web visitors can purchase products from all 16 wineries from a single online
store. All data associated with this Web site will be stored in databases in the central office.
   After the data is consolidated at the central office, merge replication will be used to deliver
data to the salespeople and allow them to enter orders. To meet the needs of the salespeople
until the consolidation project is completed, inventory data at each winery will be sent to the
central office at the end of each day.

EXISTING DATA ENVIRONMENT

Databases
Each winery presently maintains its own database to store all business information. At the
end of each month, this information is brought to the central office and transferred into the
databases shown in Table 9-2.

TABLE 9-2 Coho Vineyard Databases

 DATABASE             SIZE

 Customer             180 megabytes (MB)
 Accounting           500 MB
 HR                   100 MB
 Inventory            250 MB
 Promotions           80 MB



After the database consolidation project is complete, a new database named Order
         will serve as a data store to the new Web store. As part of their daily work, employees also
         will connect periodically to the Order database using a new in-house Web application.
           The HR database contains sensitive data and is protected using Transparent Data
         Encryption (TDE). In addition, data in the Salary table is encrypted using a certificate.

         Database Servers
         A single server named DB1 contains all the databases at the central office. DB1 is running SQL
         Server 2008 Enterprise on Windows Server 2003 Enterprise.

         Business Requirements
         You need to design an archiving solution for the Customer and Order databases. Your archival
         strategy should allow the Customer data to be saved for six years.
            To prepare the Order database for archiving procedures, you create a partitioned table
         named Order.Sales. Order.Sales includes two partitions. Partition 1 includes sales activity for
         the current month. Partition 2 is used to store sales activity for the previous month. Orders
         placed before the previous month should be moved to another partitioned table named
         Order.Archive. Partition 1 of Order.Archive includes all archived data. Partition 2 remains
         empty. The archive data should reside in a different filegroup than the actively used data.
            A process needs to be created to load the inventory data from each of the 16 wineries by
         4 A.M. daily.
            Four large customers submit orders using Coho Vineyards Extensible Markup Language
         (XML) schema for Electronic Data Interchange (EDI) transactions. The EDI files arrive by 5 P.M.
         and need to be parsed and loaded into the Customer, Accounting, and Inventory databases,
         which contain tables relevant to placing an order. The EDI import routine is currently a single
         threaded C++ application that takes between three and six hours to process the files. You
         need to finish the EDI process by 5:30 P.M. to meet your Service Level Agreement (SLA) with
         the customers. After the consolidation project has finished, the EDI routine will load all data
         into the new Order database.
            You need to back up all databases at all locations. All production databases are required
         to be configured with the Full recovery model. You can lose a maximum of five minutes of
         data under a worst-case scenario. The Customer, Account, Inventory, Promotions, and Order
         databases can be off-line for a maximum of 20 minutes in the event of a disaster. Data older
         than two months in the Customer and Order databases can be off-line for up to 12 hours in
         the event of a disaster.
            Answer the following questions.
           1.     What backups do you need for the Account, Inventory, and Promotions databases?
           2.     What backup do you need for the Customer and Order databases?
           3.     What backup do you need for the HR database?




Suggested Practices
To help you master the exam objectives presented in this chapter, complete the following tasks.


Backing up a Database
       Practice 1 Create a certificate. Create a table that contains data encrypted by the
       certificate. Back up the certificate along with the private key.
       Practice 2 Create a database with multiple filegroups. Back up the entire database
       using filegroup, differential, and transaction log backups.


Restoring a Database
       Practice 1 Restore a certificate and the private key from a backup. Verify that you can
       decrypt the data in your table using the restored certificate.
       Practice 2  Practice restoring the database to different points in time using the
       filegroup, differential, and transaction log backups that you created in the “Backing up
       a Database” practice.


Take a Practice Test
The practice tests on this book’s companion CD offer many options. For example, you can test
yourself on just one exam objective, or you can test yourself on all the 70-432 certification
exam content. You can set up the test so that it closely simulates the experience of taking
a certification exam, or you can set it up in study mode so that you can look at the correct
answers and explanations after you answer each question.


   MORE INFO        PRACTICE TESTS
   For details about all the practice test options available, see the section entitled “How to
   Use the Practice Tests,” in the Introduction to this book.




CHAPTER 10


Automating SQL Server
To ensure that your data is protected, you need to create backups frequently. In addition,
you need to run various database maintenance routines, such as reindexing databases,
shrinking files, and expanding databases. In this chapter, you learn how to create and
schedule jobs within SQL Server Agent. You also learn how to configure alerts to notify you
of issues that need attention or to execute routines to fix problems before an outage occurs.


Exam objectives in this chapter:
    Manage SQL Server Agent jobs.
    Manage SQL Server Agent alerts.
    Manage SQL Server Agent operators.
    Identify SQL Agent job execution problems.

Lessons in this chapter:
    Lesson 1: Creating Jobs   234

    Lesson 2: Creating Alerts 242



Before You Begin
To complete the lessons in this chapter, you must have:
       Microsoft SQL Server 2008 installed.
       The AdventureWorks database installed within the instance.




Lesson 1: Creating Jobs
            SQL Server Agent provides a scheduling engine for SQL Server. Without SQL Server Agent,
            you would either have to install a separate scheduling engine, or administrators would have
            to remember to execute jobs at various times throughout the day.
               Jobs provide the execution container that allows you to package together one or more
            steps in a process that needs to execute. Although many jobs that you create have only a
            single task, SQL Server allows you to create jobs composed of multiple tasks for which you
            can configure various actions depending on whether the tasks succeeded or failed. Each task
            or unit of work to be performed is contained within a job step.


                   After this lesson, you will be able to:
                      Create jobs
                      Create operators
                      View job status and history

                   Estimated lesson time: 20 minutes



            Job Steps
            Job steps are the execution elements within a job. The types of job steps that can be executed are:
                   Transact-SQL (T-SQL)
                   Replication tasks
                   Operating system tasks or executable files
                   Analysis Services tasks
                   Integration Services packages
                   ActiveX scripts
               Like any executable code, each job step runs under a security context. The default security
            context for a job step corresponds to the login that is set as the owner of the job. You can
            also override the security context by specifying a proxy account that the SQL Server Agent
            uses for the job step based on credentials assigned to the proxy account.
               In addition to the commands to execute, a job step can be configured with:
                   Logging
                   Notification to an operator
                   Retry settings that specify the number of times to retry a step as well as the number of
                   minutes between retries
                   Control flow logic




The control flow options allow you to specify an action based on either success or failure
as follows:
      Quit job reporting success
      Quit job reporting failure
      Go to next step
      Go to a specific step number
   Logging can be directed to a file that is overwritten each time the step executes or you
can append to an existing file. You can also log step output to a table, although this is not
generally recommended due to the extra overhead of logging to a table versus logging to a
text file.
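
   Jobs and steps can also be created in T-SQL through the SQL Server Agent stored
procedures in msdb. A minimal sketch (the job name, command, and output path are
illustrative, and the command assumes the dbo.asp_reindex procedure created in the
practice at the end of this lesson):

USE msdb
GO
EXEC dbo.sp_add_job @job_name = N'Nightly Re-index'

EXEC dbo.sp_add_jobstep
    @job_name = N'Nightly Re-index',
    @step_name = N'Re-index AdventureWorks',
    @subsystem = N'TSQL',
    @database_name = N'AdventureWorks',
    @command = N'EXEC dbo.asp_reindex @database = ''AdventureWorks'', @fragpercent = 30',
    @on_success_action = 1,              -- quit the job reporting success
    @on_fail_action = 2,                 -- quit the job reporting failure
    @retry_attempts = 1,
    @retry_interval = 5,                 -- minutes between retries
    @output_file_name = N'C:\Test\NightlyReindex.txt'

EXEC dbo.sp_add_jobserver @job_name = N'Nightly Re-index'
GO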


   BEST PRACTICE  LOGGING JOB STEPS
   Every job step that you create should be configured to log to a file. The most common
   way to configure logging is to create a new log file in the first step of a job, and then each
   subsequent job step appends information to the log file.




Job Schedules
After you have added one or more steps to your job, you are ready to specify a schedule.
Schedules are defined and stored as independent objects, allowing you to define a single
schedule that can be applied to multiple jobs.
   A job schedule can be created through either the Manage Schedules dialog box or during
the creation of a job. Some of the properties that you can set for a schedule are:
      Frequency type; for example, daily, weekly, or monthly
      Recurrence within a daily, weekly, or monthly recurrence; for example every third day
      of every second month for a monthly frequency type
      Recurrence within a day on a minute or hourly basis
      Start and stop times
      Start and end date for the schedule to be valid
  For example, you could create a schedule to execute the first Monday of every third month
and then every 15 minutes between the hours of 3:00 A.M. and 7:00 P.M.
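
   As a simpler hedged sketch, the following T-SQL creates a shared schedule that runs every
15 minutes between 3:00 A.M. and 7:00 P.M. each day and attaches it to an existing job (the
names are illustrative):

USE msdb
GO
EXEC dbo.sp_add_schedule
    @schedule_name = N'Every 15 Minutes 3AM-7PM',
    @freq_type = 4,                 -- daily
    @freq_interval = 1,             -- every day
    @freq_subday_type = 4,          -- subday unit: minutes
    @freq_subday_interval = 15,     -- every 15 minutes
    @active_start_time = 30000,     -- 03:00:00 (HHMMSS)
    @active_end_time = 190000       -- 19:00:00 (HHMMSS)

EXEC dbo.sp_attach_schedule
    @job_name = N'Nightly Re-index',
    @schedule_name = N'Every 15 Minutes 3AM-7PM'
GO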



Job History
When a job is executed, any errors or messages are sent to the log file or table that is
specified for each step, allowing you to review the log file in the event of a job execution
error.



In addition to any logging configured for a job step, each time a job executes, SQL Server
            logs information into the dbo.sysjobhistory table in the msdb database for each job step that
            is executed within the job. Some of the information that is recorded is:
                   Job step
                   Status
                   Execution date and time
                   Duration
                   If an error occurs, the number, severity, and text of the last error message generated
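
               A hedged sketch of querying this history directly; the column names come from
            msdb.dbo.sysjobhistory and msdb.dbo.sysjobs:

            SELECT j.name AS job_name, h.step_id, h.step_name, h.run_status,
                   h.run_date, h.run_time, h.run_duration, h.message
            FROM msdb.dbo.sysjobhistory AS h
                INNER JOIN msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
            ORDER BY h.instance_id DESC
            GO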


               EXAM TIP
               You need to know where to find information to diagnose the cause of an error in a job
               step.




            Operators
            An operator is an alias for a person, group, or device. Operators are used to send notifications
            when jobs fail or an alert is generated. For each operator, you specify a name along with contact
            information such as an e-mail address, pager number, or NET SEND address. In addition, you can
            designate which day(s) an operator is available and the start and end time of a workday.
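
               Operators can likewise be created in T-SQL. A minimal sketch (the operator name,
            e-mail address, and availability window are illustrative):

            EXEC msdb.dbo.sp_add_operator
                @name = N'DBA Team',
                @email_address = N'dba@contoso.com',
                @weekday_pager_start_time = 80000,    -- 08:00:00 (HHMMSS)
                @weekday_pager_end_time = 180000      -- 18:00:00 (HHMMSS)
            GO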


               NOTE  UNDERSTANDING THE STANDARD WORKWEEK
               The start and end time of a workday is based on the U.S. standard workweek of Monday–
               Friday and does not accommodate any other workweek definition.




                    Quick Check
                    1. If a job fails, where can you look to diagnose the problem?

                    2. What types of job steps can be executed?

                    Quick Check Answers
                    1. The first place to look is in the job history, which can be accessed from SQL
                        Server Management Studio (SSMS) by right-clicking a job and selecting View
                        History. You can also look in the logging files that are configured for each job
                        step. In some cases, you might find additional information in the Microsoft
                        Windows event logs.

                    2. You can create jobs that execute T-SQL, ActiveX scripts, operating system
                        commands, or executable files. You can also configure specific tasks for
                        replication, Analysis Services, and Integration Services.



PR ACTICE      Creating Jobs and Operators

In these practices, you define an operator for SQL Server to notify. You also create a job to
reindex the AdventureWorks database.

PR ACTICE 1     Create an Operator
In this practice, you create an operator that will be subsequently used to send notifications for
jobs and alerts.
  1.   In SQL Server Management Studio, expand the SQL Server Agent node, right-click
       Operators, and select New Operator.
  2.   Give the operator a name and specify an e-mail address, as shown here.




  3.   Click OK and review the operator that was just created.

PR ACTICE 2     Create a Job
In this practice, you create a job to reindex the AdventureWorks database.
  1.   Execute the following code in the AdventureWorks database to create a stored
       procedure to reindex tables:

       CREATE PROCEDURE dbo.asp_reindex @database SYSNAME, @fragpercent INT
       AS
       DECLARE @cmd         NVARCHAR(max),
               @table       SYSNAME,
               @schema      SYSNAME

--Using a cursor for demonstration purposes.
                   --Could also do this with a table variable and a WHILE loop
                   DECLARE curtable CURSOR FOR
                         SELECT DISTINCT OBJECT_SCHEMA_NAME(object_id, database_id) SchemaName,
                             OBJECT_NAME(object_id,database_id) TableName
                         FROM sys.dm_db_index_physical_stats (DB_ID(@database),NULL,NULL,NULL,'SAMPLED')
                         WHERE avg_fragmentation_in_percent >= @fragpercent
                   FOR READ ONLY


                   OPEN curtable
                   FETCH curtable INTO @schema, @table


                   WHILE @@FETCH_STATUS = 0
                   BEGIN
                         SET @cmd = 'ALTER INDEX ALL ON ' + @database + '.' + @schema + '.' + @table
                             + ' REBUILD WITH (ONLINE = ON)'


                         --Try ONLINE build first, if failure, change to OFFLINE build.
                         BEGIN TRY
                             EXEC sp_executesql @cmd
                         END TRY
                         BEGIN CATCH
                             BEGIN
                                   SET @cmd = 'ALTER INDEX ALL ON ' + @database + '.' + @schema + '.'
                                       + @table + ' REBUILD WITH (ONLINE = OFF)'


                                   EXEC sp_executesql @cmd
                             END
                         END CATCH
                         FETCH curtable INTO @schema, @table
                   END


                   CLOSE curtable
                   DEALLOCATE curtable
                   GO

              2.   Below SQL Server Agent, right-click the Jobs node and select New Job.
              3.   Give your new job a name, set the owner to sa, select Database Maintenance for the
                   job category, and add a description, as shown here.




4.   Select the Steps page and click New to open the New Job Step dialog box.
5.   Specify a name for the step, select Transact-SQL for the step type, leave Run As blank,
     enter the Database name as AdventureWorks, and enter the SQL command shown
     here for the reindex procedure you created in the previous practice.




               6.    Select the Advanced page and specify an output file of C:\Test\Dailymaintenance.txt
                     to log to. Click OK.
               7.   Select the Schedules page and click New to define a new daily schedule, as shown
                    here, and click OK to close the New Job Schedule dialog box.




              8.    Click OK to save the new job and close the New Job dialog box.
               9.   Expand Jobs. Right-click the Re-index Databases job and select Start. Upon completion
                    of the job, review the job execution history and the logging file.


            Lesson Summary
                    An operator is an alias for a person, group, or device that you want to be the
                    target of notifications.
                    A job can be created that contains multiple steps with control flow dependency,
                    logging, and one or more schedules.


            Lesson Review
            The following question is intended to reinforce key information presented in Lesson 1,
            “Creating Jobs.” The question is also available on the companion CD if you prefer to review it
            in electronic form.




NOTE  ANSWERS
Answers to this question and an explanation of why each answer choice is correct or
incorrect are located in the “Answers” section at the end of the book.


1.   Where would you look to retrieve a list of jobs that have failed?
     A. The Windows event log
     B. The job history in SSMS
     C. The SQL Server Agent error log
     D. The SQL Server error log




Lesson 2: Creating Alerts
            Alerts provide the capability to send notifications or perform actions based upon events or
            conditions that occur either within the SQL Server instance or on the machine hosting your
            instance.


                   After this lesson, you will be able to:
                      Create alerts

                   Estimated lesson time: 20 minutes



            SQL Server Agent Alerts
            Alerts can be configured as one of the following three types:
                   A SQL Server event
                   A Performance Condition alert
                   A Windows Management Instrumentation (WMI) event
                An alert is raised for a SQL Server event based on either an error number or an error
            severity level. In addition, you can restrict the alert to a specific database or a specific text
            string within an error message. When a SQL Server event alert is created, the SQL Server
            Agent scans the Windows Application event log to look for matches to the event criteria that
            you have defined. For example, you could fire an alert on an error severity of 22 to notify an
            operator that a table is suspect.
               Performance Condition alerts are defined against System Monitor counters. When the
            alert is defined, you specify the object, counter, and instance that you want to monitor
            along with specifying a condition for the value of the counter and whether the alert should
            be fired when the counter is greater than, less than, or equal to your specified value. For
            example, you could fire an alert to notify you when the amount of free disk space falls below
            15 percent.
                An alert for a WMI event allows you to send alerts based on events that occur on the
            server hosting your SQL Server instance. Anytime an event occurs on the machine (for
            example, the network card is disconnected, a file is created, a file is deleted, or the registry is
            written to), a WMI event is raised within Windows. A WMI alert sets up a listener to the WMI
            infrastructure to fire the alert when the Windows event occurs.
               Each alert can be configured with a response. The responses that are available are:
                   Execute job
                   Notify operator




By specifying a job to execute when an alert is raised, you can configure your environment to
trap and attempt to fix errors on an automated basis, eliminating the need for an administrator
to respond to routine events.
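
   Alerts and their responses can also be defined in T-SQL. A hedged sketch of a severity-22
alert that notifies an operator by e-mail; the alert and operator names are illustrative, and
the operator must already exist:

USE msdb
GO
EXEC dbo.sp_add_alert
    @name = N'Severity 22 Errors',
    @severity = 22,
    @include_event_description_in = 1    -- include the error text in the e-mail

EXEC dbo.sp_add_notification
    @alert_name = N'Severity 22 Errors',
    @operator_name = N'DBA Team',
    @notification_method = 1             -- 1 = e-mail
GO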


   EXAM TIP
   You need to know the types of alerts that can be defined in SQL Server and the response
   criteria that can be specified for an alert.




       Quick Check
       1. What are the three types of alerts that can be created?

       2. What are the two response actions that can be configured for an alert?

       Quick Check Answers
       1. You can create alerts on performance counters, SQL Server errors, and WMI
          queries.

       2. You can have an alert send a notification or execute a job in response to the alert
          condition.



 PR ACTICE       Creating Alerts

In this practice, you create alerts to send notifications when you are running out of transaction
log space and when a Level 22 error occurs.

PR ACTICE 1      Create a Performance Condition Alert
In this practice, you create an alert to send a notification when the percentage of the transaction
log space used for the AdventureWorks database exceeds 90 percent.
  1.   In SQL Server Management Studio, below SQL Server Agent, right-click Alerts and
       select New Alert.
  2.   Give your alert a name, and from the Type drop-down list, select SQL Server
       Performance Condition Alert.
  3.   From the Object drop-down list, select the SQLServer:Databases object. From the
       Counter drop-down list, select Percent Log Used. Select the AdventureWorks instance
       from the Instance drop-down list, and set the alert for when the counter rises above 90
       by selecting Rises Above from the Alert If Counter drop-down list and entering 90 into
       the Value text box.




4.    Select the Response page, select the Notify Operators check box, and select the check
                    boxes for the notification options for your operator.
               5.   Select the Options page and select the E-mail check box to include the alert error text.
                    Click OK.

            PR ACTICE 2      Create a SQL Server Event Alert
            In this practice, you create a SQL Server Event alert.
              1.    Right-click Alerts and select New Alert.
              2.    Give your alert a name by entering it into the Name text box, and select SQL Server
                    Event Alert from the Type drop-down list.
              3.    Specify All Databases and an error severity of 22, as shown here.
              4.    Select the Response page, select the Notify Operators check box, and select the
                    notification options for your operator.
               5.   Select the Options page and select the E-mail check box to include the alert error text.
                    Click OK.




Lesson Summary
       Alerts enable you to notify operators as well as execute jobs to fix problems when an
       event occurs.


Lesson Review
The following question is intended to reinforce key information presented in Lesson 2,
“Creating Alerts.” The question is also available on the companion CD if you prefer to review it
in electronic form.


   NOTE  ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or
   incorrect are located in the “Answers” section at the end of the book.


  1.   Your Orders database crashed last night, and you have determined that the crash was
       caused by a data file running out of space. What tool do you use to send a notification
       to an administrator as well as expand the data file before it runs out of space?
       A. SQL Server Agent
       B. System Monitor
       C. Event Viewer
       D. Network Monitor

Chapter Review
          To practice and reinforce the skills you learned in this chapter further, you can perform the
          following tasks:
                   Review the chapter summary.
                   Review the list of key terms introduced in this chapter.
                   Complete the case scenario. This scenario sets up a real-world situation involving the
                   topics in this chapter and asks you to create a solution.
                   Complete the suggested practices.
                   Take a practice test.


          Chapter Summary
                   SQL Server Agent provides a scheduling engine that can be used to execute jobs.
                   Jobs can have one or more steps with basic control flow dependencies, logging,
                   notification, and one or more execution schedules.
                   Operators are used to encapsulate the settings used to send a notification to a person,
                   group, or device.


          Key Terms
          Do you know what these key terms mean? You can check your answers by looking up the
          terms in the glossary at the end of the book.
                   Alert
                   Job step
                   Operator


          Case Scenario
          In the following case scenario, you apply what you’ve learned in this chapter. You can find
          answers to these questions in the “Answers” section at the end of this book.

          Case Scenario: Designing an Automation Strategy for Coho Vineyard

          BACKGROUND

          Company Overview
Coho Vineyard was founded in 1947 as a local, family-run winery. Due to the award-winning
wines it has produced over the last several decades, Coho Vineyard has experienced
significant growth. To continue expanding, the company acquired several existing wineries
over the years. Today, the company owns 16 wineries; 9 wineries are in Washington, Oregon, and



California, and the remaining 7 wineries are located in Wisconsin and Michigan. The wineries
employ 532 people, 162 of whom work in the central office that houses servers critical to the
business. The company has 122 salespeople who travel around the world and need access to
up-to-date inventory availability.

Planned Changes
Until now, each of the 16 wineries owned by Coho Vineyard has run a separate Web site
locally on the premises. Coho Vineyard wants to consolidate the Web presence of these
wineries so that Web visitors can purchase products from all 16 wineries from a single online
store. All data associated with this Web site will be stored in databases in the central office.
   When the data is consolidated at the central office, merge replication will be used to
deliver data to the salespeople as well as to allow them to enter orders. To meet the needs of
the salespeople until the consolidation project is completed, inventory data at each winery is
sent to the central office at the end of each day.

EXISTING DATA ENVIRONMENT

Databases
Each winery presently maintains its own database to store all business information. At the
end of each month, this information is brought to the central office and transferred into the
databases, shown in Table 10-1.

TABLE 10-1 Coho Vineyard Databases

 DATABASE                     SIZE

 Customer                     180 megabytes (MB)
 Accounting                   500 MB
 HR                           100 MB
 Inventory                    250 MB
 Promotions                   80 MB


   After the database consolidation project is complete, a new database named Order
will serve as the data store for the new Web store. As part of their daily work, employees
will also connect periodically to the Order database using a new in-house Web
application.
  The HR database contains sensitive data and is protected using Transparent Data
Encryption (TDE). In addition, data in the Salary table is encrypted using a certificate.

Database Servers
A single server named DB1 contains all the databases at the central office. DB1 is running SQL
Server 2008 Enterprise on Windows Server 2003, Enterprise edition.




Business Requirements
          You need to design an archiving solution for the Customer and Order databases. Your archival
          strategy should allow the Customer data to be saved for six years.
             To prepare the Order database for archiving procedures, you create a partitioned table
          named Order.Sales. Order.Sales includes two partitions. Partition 1 includes sales activity for
          the current month. Partition 2 is used to store sales activity for the previous month. Orders
          placed before the previous month should be moved to another partitioned table named Order.
          Archive. Partition 1 of Order.Archive includes all archived data. Partition 2 remains empty.
             A process needs to be created to load the inventory data from each of the 16 wineries by
          4 A.M. daily.
   Four large customers submit orders using Coho Vineyard's Extensible Markup Language
          (XML) schema for Electronic Data Interchange (EDI) transactions. The EDI files arrive by 5 P.M.
          and need to be parsed and loaded into the Customer, Accounting, and Inventory databases,
          which each contain tables relevant to placing an order. The EDI import routine is currently a
          single-threaded C++ application that takes between three and six hours to process the files.
          You need to finish the EDI process by 5:30 P.M. to meet your Service Level Agreement (SLA)
          with the customers. After the consolidation project has finished, the EDI routine loads all data
          into the new Order database.
             You need to back up all databases at all locations. You can lose a maximum of five minutes
of data under a worst-case scenario. The Customer, Accounting, Inventory, Promotions, and Order
          databases can be off-line for a maximum of 20 minutes in the event of a disaster. Data older
          than six months in the Customer and Order databases can be off-line for up to 12 hours in the
          event of a disaster.
             Answer the following question.
            1.     What would you configure to ensure that administrative processes were automated?



          Suggested Practices
          To help you master the exam objectives presented in this chapter, complete the following
          tasks.


          Create Jobs
                   Practice  Create jobs to execute full, differential, and transaction log backups within
                   your environment.


          Create Alerts
Practice   Create an alert that is raised when the disk drive that your data files are on
has less than 15 percent free space available.



Take a Practice Test
The practice tests on this book’s companion CD offer many options. For example, you can test
yourself on just one exam objective, or you can test yourself on all the 70-432 certification
exam content. You can set up the test so that it closely simulates the experience of taking
a certification exam, or you can set it up in study mode so that you can look at the correct
answers and explanations after you answer each question.


  MORE INFO     PRACTICE TESTS
  For details about all the practice test options available, see the section entitled “How to
  Use the Practice Tests,” in the Introduction to this book.




CHAPTER 11


Designing SQL Server
Security
Designing a solid security system requires implementation of a layered approach.
This process is called “defense in depth.” This chapter explains how to configure each
layer within the Microsoft SQL Server security infrastructure to help prevent unauthorized
access to data.


Exam objectives in this chapter:
   Manage logins and server roles.
   Manage users and database roles.
   Manage SQL Server instance permissions.
   Manage database permissions.
   Manage schema permissions and object permissions.
   Audit SQL Server instances.
   Manage transparent data encryption.
   Configure surface area.

Lessons in this chapter:
   Lesson 1: TCP Endpoints
   Lesson 2: Configuring the SQL Server Surface Area
   Lesson 3: Creating Principals
   Lesson 4: Managing Permissions
   Lesson 5: Auditing SQL Server Instances
   Lesson 6: Encrypting Data



Before You Begin
To complete the lessons in this chapter, you must have an instance of SQL Server 2008
installed with the AdventureWorks sample database.


Lesson 1: TCP Endpoints
Endpoints control the capability to connect to an instance of SQL Server and dictate which
communication methods are acceptable. Acting much like firewalls on the network, endpoints
are a layer of security at the border between applications and your SQL Server instance.
This lesson provides a basic overview of the endpoint architecture in SQL Server 2008.


   After this lesson, you will be able to:
      Understand the role of endpoints in securing a SQL Server 2008 instance

   Estimated lesson time: 15 minutes



            Endpoint Types and Payloads
An endpoint has two basic parts: a transport and a payload. The transport can be either
TCP or HTTP. The payload defines the basic category of traffic that is allowed and can be
SOAP, TSQL, SERVICE_BROKER, or DATABASE_MIRRORING.
               Table 11-1 lists the valid combinations of endpoint transport and endpoint payload.

            TABLE 11-1 Endpoint Transport and Payload

              TRANSPORT                   PAYLOAD

              TCP                         TSQL
              TCP                         SERVICE_BROKER
              TCP                         DATABASE_MIRRORING
              HTTP                        SOAP


               By combining an endpoint transport and payload, SQL Server can filter acceptable traffic
            before a command even reaches the SQL Server instance. For example, suppose you have an
            endpoint defined as TCP with a payload of TSQL. If any application attempted to send HTTP,
            SERVICE_BROKER, or DATABASE_MIRRORING traffic through the endpoint, the connection
            would be denied without needing to authenticate the request.
               This process is very similar to the way firewalls work on a network. Network
            administrators configure firewalls to allow traffic on only a specific set of TCP and
            UDP ports. Any request attempting to use a port that is blocked is rejected at the
            firewall. Endpoints act in the same manner by rejecting requests that are not properly
            formatted based on the endpoint definition.




Endpoint Access
Even if traffic going to the endpoint matches the correct transport and payload, a connection
is still not allowed unless access has been granted on the endpoint. Endpoint access has
two layers.
   The first layer of access security is determined by the endpoint state. An endpoint can have
one of three states: STARTED, STOPPED, and DISABLED. The three states of an endpoint react
as follows:
      STARTED     The endpoint is actively listening for connections and will reply to an
      application.
      STOPPED        The endpoint is actively listening but returns a connection error to an
      application.
      DISABLED    The endpoint does not listen and does not respond to any connection that
      is attempted.
  The second layer of security is permission to connect to the endpoint. An application
must have a login created in SQL Server that has the CONNECT permission granted on the
endpoint before the connection is allowed through the endpoint.
   You might be wondering about all the effort involved just to create a connection to an
instance of SQL Server before the user is even authenticated. In prior versions of SQL Server,
any application could connect to a server running SQL Server and transmit any type of
request. No attempt was made to ensure that applications had to transmit validly formed
requests, so hacking into a server running SQL Server was much easier to accomplish.
SQL Server 2008 ensures that only valid requests can be submitted by a valid user before a
request is scheduled within the engine. Administrators also have a master switch to shut off
access immediately: if they believe someone is attempting to compromise a server running
SQL Server, they can set the state of the endpoint being used to DISABLED.
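   The following minimal example shows both layers in T-SQL. The endpoint name Mirroring
and the login AppLogin are placeholder names, not objects that ship with SQL Server:

GRANT CONNECT ON ENDPOINT::[Mirroring] TO AppLogin;

--Master switch: immediately shut off all access through the endpoint
ALTER ENDPOINT [Mirroring] STATE = DISABLED;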


TCP Endpoints
You can create Transmission Control Protocol (TCP) endpoints with three different payloads:
TSQL, DATABASE_MIRRORING, and SERVICE_BROKER.


TCP Protocol Arguments
You can configure TCP endpoints to listen on specific Internet Protocol (IP) addresses and
port numbers. The two arguments you can specify that are universal for all TCP endpoints are
the following:
      LISTENER_PORT
      LISTENER_IP
   LISTENER_PORT is required. The TCP for TSQL endpoint that is created for each instance
during installation is already configured for TCP port 1433 or the alternate port number for
the instance.

BEST PRACTICES      PORT NUMBERS
               Because port 5022 is the default TCP port number for a DATABASE_MIRRORING endpoint,
               and 1433 is the default TCP port for a TSQL endpoint, you might want to specify a different
               port number. Not using the default port number helps to foil potential hackers—or at
               least makes their job more difficult—by requiring them to use a port scanner instead of
               just blindly connecting to port 1433 or 5022 for a denial of service (DoS) attack or other
               hacking attack.


               The LISTENER_IP argument is an optional argument that can provide a very powerful
            security layer for some types of applications. You can specify a specific IP address for the
            endpoint to listen on. The default setting is ALL, which means that the endpoint listens for
            connections sent to any valid IP address configured on the machine. However, if you want
            to limit connection requests to a specific network interface card (NIC), you can specify a
            LISTENER_IP argument. When you specify an IP address, the endpoint listens for requests
            sent only to the IP address specified.
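   As a minimal illustration, the following statement creates a TSQL endpoint bound to a
single address; the endpoint name, port number, and IP address are arbitrary values chosen
for this example:

CREATE ENDPOINT [DedicatedTSQL]
AS TCP (LISTENER_PORT = 5500, LISTENER_IP = (192.168.1.50))
FOR TSQL();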


               EXAM TIP
               TSQL endpoints do not have any additional configuration options beyond the universal
               TCP settings.



            Database Mirroring and Service Broker Common
            Arguments
            Database Mirroring and Service Broker endpoints provide options to specify the
            authentication method and the encryption setting. You can use either Microsoft
            Windows–based authentication or certificates. You specify Windows-based authentication
            by selecting the NTLM, KERBEROS, or NEGOTIATE option. The NEGOTIATE option causes the
            instances to select the authentication method dynamically. You can set up certificate-based
            authentication by using a certificate from a trusted authority or by generating your own
            Windows certificate.


               BEST PRACTICES      AUTHENTICATION
               When all Database Mirroring and Service Broker instances reside within a single domain
               or across trusted domains, you should use Windows authentication. When instances span
               non-trusted domains, you should use certificate-based authentication.


               SQL Server can encrypt all communications between endpoints, and you can specify which
            encryption algorithm to use for the communications. The default algorithm is RC4, but you
            can specify the much stronger Advanced Encryption Standard (AES) algorithm.




BEST PRACTICES     ENCRYPTION
   Use RC4 for minimal encryption strength and best performance. Use AES if you require
   strong encryption, but note that this algorithm requires more calculation overhead and will
   affect performance.




Database Mirroring–Specific Arguments
Database Mirroring endpoints include a third argument related to the role within the Database
Mirroring session.


   EXAM TIP
   You can specify only one TCP endpoint with a payload of DATABASE_MIRRORING for each
   instance of SQL Server.


   You can specify that an endpoint is a PARTNER, WITNESS, or ALL. An endpoint specified as
PARTNER can participate only as the principal or as the mirror. An endpoint specified as WITNESS
can participate only as a witness. An endpoint specified as ALL can function in any role.


   NOTE    ENDPOINTS ON EXPRESS EDITION
   If you are creating a Database Mirroring endpoint on SQL Server 2008 Express, it supports
   only a role of WITNESS.


   The following T-SQL example shows how to create a Database Mirroring endpoint:

CREATE ENDPOINT [Mirroring]
AS TCP (LISTENER_PORT = 5022)
FOR DATABASE_MIRRORING (ROLE = PARTNER, ENCRYPTION = REQUIRED);
ALTER ENDPOINT [Mirroring] STATE = STARTED;

   This code creates an endpoint to service Database Mirroring sessions on port 5022,
responding to requests from all valid IP addresses. The ROLE = PARTNER option specifies
that databases hosted on this SQL Server instance can participate only as a principal or
mirror. Because no algorithm is specified, the traffic is encrypted with the default RC4
algorithm.



Service Broker–Specific Arguments
In addition to authentication modes and encryption, Service Broker endpoints implement
arguments related to message forwarding.
   The MESSAGE_FORWARDING option enables messages destined for a different broker
instance to be forwarded to a specified forwarding address. The options are ENABLED and
DISABLED. If the MESSAGE_FORWARDING option is set to ENABLED, you can also specify



the MESSAGE_FORWARD_SIZE, which specifies the maximum amount of storage to
allocate for forwarded messages.
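   The following minimal sketch combines these arguments; the endpoint name and port
number are arbitrary, and MESSAGE_FORWARD_SIZE is specified in megabytes:

CREATE ENDPOINT [BrokerEndpoint]
AS TCP (LISTENER_PORT = 4022)
FOR SERVICE_BROKER (AUTHENTICATION = WINDOWS NEGOTIATE,
    ENCRYPTION = REQUIRED ALGORITHM AES,
    MESSAGE_FORWARDING = ENABLED,
    MESSAGE_FORWARD_SIZE = 10);
ALTER ENDPOINT [BrokerEndpoint] STATE = STARTED;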
               Although a complete discussion of Service Broker is beyond the scope of this book, a short
            overview is necessary to explain this behavior. Service Broker instances process messages
            by executing stored procedures to perform work in an asynchronous manner. Each Service
            Broker instance is configured to process messages of a particular format. However, it is
            possible to have many Service Broker instances configured in an environment, each of which
            processes different types of messages. By employing message forwarding, administrators
            can balance the load on Service Broker instances more easily, without requiring changes to
            applications.


   NOTE   ENCRYPTION
               The communication encryption for endpoints is coded to understand the source and
               destination of the traffic. If the communication occurs entirely within the SQL Server
               instance, the traffic is not encrypted because it would introduce unnecessary overhead
               in the communications. This is especially important with Service Broker, in which many
               messages are exchanged between queues within a single instance. Traffic is encrypted only
               when data will be transmitted outside the SQL Server instance.




   Quick Check
      1. What are the two parts of an endpoint?

      2. What are the three states of an endpoint, and what is the difference between
         each state?

      3. What authority must be granted before an endpoint allows a connection request?

      4. What types of authentication are available for Service Broker and Database
         Mirroring endpoints?

      5. What are the two universal arguments for TCP endpoints?

   Quick Check Answers
      1. An endpoint has a transport defined as either TCP or HTTP and has a payload
         defined as TSQL, SERVICE_BROKER, DATABASE_MIRRORING, or SOAP.

      2. The three states are STARTED, STOPPED, and DISABLED. An endpoint that is
         STARTED listens for and allows connections. An endpoint that is STOPPED listens
         for connection requests and returns an error message. An endpoint that is
         DISABLED does not respond to any request.

      3. To allow a connection request, the login that is being used must have been
         granted the CONNECT permission on the endpoint.




      4. NTLM or Kerberos authentication can be specified. You can also specify an
         option of NEGOTIATE, which causes the specific authentication method to be
         negotiated between the application and the endpoint.

      5. You are required to specify a port for the endpoint to listen on. If you want,
         you can configure an IP address that restricts the endpoint to respond to traffic
         coming only from the specified IP address.




 PRACTICE   Inspecting Existing Endpoints

In this practice, you query several catalog views to gather information about endpoints
configured in your environment.
  1.   Start SQL Server Management Studio (SSMS) and connect to your instance. Open a
       new query window and execute the following batch:

       SELECT * FROM sys.endpoints
       SELECT * FROM sys.tcp_endpoints
       SELECT * FROM sys.http_endpoints
       SELECT * FROM sys.database_mirroring_endpoints
       SELECT * FROM sys.service_broker_endpoints

  2.   Inspect the results for the data that is returned from each of the catalog views.


Lesson Summary
       Endpoints in SQL Server act much like firewalls by filtering out any traffic that does
       not meet allowed formats.
       Each endpoint has a transport that is defined as either TCP or HTTP.
       Endpoints have a second part called the payload, which is defined as TSQL, DATABASE_
       MIRRORING, SERVICE_BROKER, or SOAP.
       TSQL endpoints are configured during installation to listen on the port number
       specified for the instance.
       Service Broker and Database Mirroring endpoints can specify an authentication method
       and can encrypt all traffic by using an algorithm that you specify.


Lesson Review
You can use the following questions to test your knowledge of the information presented in
Lesson 1, “TCP Endpoints.” The questions are also available on the companion CD if you prefer
to review them in electronic form.




NOTE     ANSWERS
               Answers to these questions and explanations of why each answer choice is right or wrong
               are located in the “Answers” section at the end of the book.


               1.   You are the database administrator at A. Datum Corporation. Users are complaining
                    that applications cannot connect to the SQL Server. You have verified all the application
                    settings and you can connect to the server from your desktop using SSMS, but the
                    users’ applications keep returning an “Access denied” error message. What could be
                    the problem?
                    A. The TCP endpoint for TSQL is DISABLED.
                    B. The TCP endpoint for TSQL is STOPPED.
                    C. Remote connections are not enabled.
                    D. Users do not have CONNECT permissions on the endpoint.
               2.   You have configured a Database Mirroring session within your environment.
                    The Principal and Mirror endpoints were created successfully with a ROLE setting
                    of PARTNER and then started. You have verified that you can connect to and
                    authenticate to each endpoint. However, Database Mirroring fails to configure
                    properly. What might be the problem?
                    A. The authentication mode is set to NTLM.
                    B. The authentication mode is set to NEGOTIATE.
                    C. The encryption setting is different on each endpoint.
                    D. The encryption is set to AES on each endpoint.




Lesson 2: Configuring the SQL Server Surface Area
Given enough time, anyone can eventually beat any security implementation. The purpose
of security is to provide enough barriers such that the effort required to break into a system
exceeds the benefit received. In this lesson, you learn how to configure your instances to
expose the minimum number of attack points possible by minimizing the feature set that is
enabled within SQL Server.


   After this lesson, you will be able to:
      Enable and disable SQL Server features

   Estimated lesson time: 15 minutes



Surface Area Configuration
One of the most frequent ways that attackers gain access to a system is through features that
have been enabled but are rarely used. SQL Server now disables every feature not required
for the operation of the database engine.
   At the time of installation, you can decide to force users to authenticate to an instance
using only Windows credentials. If the authentication mode for your instance is set to
Windows only, you have disabled users’ ability to use SQL Server native logins.
   For many years, SQL Server did not have any issues with viruses, mainly because no one
wrote a virus specifically attacking SQL Server. Several years ago, the first SQL Server–specific
attack, the Slammer worm, wreaked havoc on organizations around the world within a few
hours of release. The main issue with Slammer was not that it targeted SQL Server, but that
administrators had an extremely difficult time containing the worm because thousands of
copies of SQL Server were installed across an organization and were not under administrative
control. Many of the SQL Server instances were Microsoft SQL Server Desktop Engine (MSDE)
installations, used as a local data store for many applications. Unfortunately, every SQL Server
instance, regardless of edition, allowed open connections from any source, when in fact few
instances required the ability to connect remotely.
   Beginning with SQL Server 2005, you can determine whether the instance accepts remote
connections by configuring the network protocols for remote access. By default, editions that
normally do not need to allow remote connections, such as SQL Server Express, only have the
Shared Memory network provider enabled. If you want to be able to connect to a SQL Server
instance remotely, the Transmission Control Protocol/Internet Protocol (TCP/IP) network
provider must be enabled.
   The biggest potential risk to an instance is through the use of features that expose an
external interface or ad hoc execution capability. The two features with the greatest risk are
OPENROWSET/OPENDATASOURCE and OLE Automation procedures.



You enable and disable SQL Server features by using sp_configure. The features that you
            should have disabled unless you need the specific functionality are the following:
                   Ad Hoc Distributed Queries
                   CLR Enabled
                   Cross Database Ownership Chaining (CDOC)
                   Database Mail
                   External Key Management
                   Filestream Access Level
                   OLE Automation Procedures
                   Remote Admin Connections
                   SQL Mail extended stored procedures (XPs)
                   xp_cmdshell
               OLE Automation procedures exist within SQL Server to provide some basic interoperability
            features for previous versions. With the inclusion of Common Language Runtime (CLR) in
            SQL Server 2005, any applications that need the services of Object Linking and Embedding
            (OLE) automation should be rewritten as Visual Basic .NET or C#.NET assemblies. The main
            advantage of a CLR routine is that the routine runs within a protected memory space and
            cannot corrupt the SQL Server memory stack, which is possible with OLE automation.
               SQL Mail XPs exist for backwards compatibility and were deprecated in SQL Server 2005.
            You should not be using SQL Mail, and if you have applications still using SQL Mail functionality,
            they need to be rewritten before the next version of SQL Server ships.
               OPENROWSET and OPENDATASOURCE expose you to attack by allowing applications
            to embed security credentials into code that spawns a connection to another instance from
            within SQL Server. If you need the ability to execute queries across instances, you should be
using linked servers, which allow Windows credentials to be passed between machines.
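   As a minimal sketch, the linked server name SRV2 is a placeholder, and the four-part
query assumes the AdventureWorks database exists on the remote instance:

EXEC sp_addlinkedserver @server = N'SRV2', @srvproduct = N'SQL Server';

--Four-part name: server.database.schema.object
SELECT TOP (10) * FROM SRV2.AdventureWorks.Person.Address;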
               CDOC allows you to transfer execution authority across databases. When enabled, the
            owner of the database containing the object being called effectively cedes control to another
            database owner. In Lesson 4, “Managing Permissions,” you will learn about signatures, which
            provide better control over security while still allowing procedures, functions, and triggers to
            access objects across databases.


               EXAM TIP
               SQL Server 2005 provided a utility called the Surface Area Configuration Manager, which
               does not exist in SQL Server 2008. The functionality that was provided by the Surface Area
               Configuration for Connections is now accomplished using the SQL Server Configuration
               Manager. The functionality provided by the Surface Area Configuration for Features did
               not change; the GUI interface to sp_configure was just removed.




       Quick Check
       1. How do you configure an instance so that only local connections are allowed?

       2. What do you use to enable or disable features for an instance?

       Quick Check Answers
       1. The TCP/IP provider enables connections to be created to the instance remotely. By
            disabling the TCP/IP provider, you can create only local connections to the instance.

       2. The sp_configure system stored procedure is used to enable or disable features.




 PRACTICE   Configuring the Surface Area

In this practice, you disable several features and check the configuration options that are set
for the instance.
  1.   Execute the following code to turn on the ability to view all the configuration options
       for an instance:
       EXEC sp_configure 'show advanced options',1
       GO
       RECONFIGURE WITH OVERRIDE
       GO
       EXEC sp_configure
       GO

  2.   Execute the following code to turn off ad hoc distributed queries, CDOC, CLR, OLE
       automation procedures, SQL Mail XPs, and xp_cmdshell:
       EXEC sp_configure 'Ad Hoc Distributed Queries',0
       EXEC sp_configure 'clr enabled',0
       EXEC sp_configure 'cross db ownership chaining',0
       EXEC sp_configure 'Ole Automation Procedures',0
       EXEC sp_configure 'SQL Mail XPs',0
       EXEC sp_configure 'xp_cmdshell',0
       GO
       RECONFIGURE WITH OVERRIDE
       GO



Lesson Summary
       The first surface area configuration decision that you make occurs during installation,
       when you decide whether to force all login access to the instance to use Windows-only
       credentials.
       You should disable the TCP/IP provider for any instance that you do not want remote
       connections.
       The sp_configure system stored procedure is used to enable or disable features.

Lesson Review
            The following question is intended to reinforce key information presented in Lesson 2,
            “Configuring the SQL Server Surface Area.” The question is also available on the companion
            CD if you prefer to review it in electronic form.


               NOTE    ANSWERS
               The answer to this question and an explanation of why each answer choice is right or
               wrong is located in the “Answers” section at the end of the book.


              1.   Which tool would you use to enable or disable SQL Server features?
                   A. SQL Server Configuration Manager
                   B. The sp_configure tool
                   C. SQL Server Surface Area Configuration Manager
                   D. SQL Server Installation Center




Lesson 3: Creating Principals
Principals are the means by which you authenticate and are identified within an instance or
database. Principals are broken down into two major categories: logins/users and groups,
which exist at both the instance and the database level.


   After this lesson, you will be able to:
      Create a login
      Manage server role membership
      Create database users and roles
      Manage database role membership
      Create a loginless user

   Estimated lesson time: 40 minutes



Logins
To gain access to an instance, a user has to authenticate by supplying credentials for SQL
Server to validate. You create logins for an instance to allow a user to authenticate. Logins
within SQL Server 2008 can be five different types:
       Standard SQL Server login
       Windows login
       Windows group
       Certificate
       Asymmetric key
   A standard SQL Server login is created by a database administrator (DBA) and configured
with a name and password, which must be supplied by a user to authenticate successfully. The
login is stored inside the master database and assigned a local security identifier (SID) within
SQL Server.
   A SQL Server login can also be mapped to either a Windows login or a Windows group.
When adding a Windows login or Windows group, SQL Server stores the name of the login
or group along with the corresponding Windows SID. When a user logs in to the instance
using Windows credentials, SQL Server makes a call to the Windows security application
programming interface (API) to validate the account, retrieve the SID, and then compare the
SID to those stored within the master database to verify whether the Windows account has
access to the instance.




EXAM TIP
               You can create SQL Server logins mapped to certificates or asymmetric keys. However, a
               login mapped to a certificate or asymmetric key does not provide a means to authenticate
               to the instance. Logins mapped to certificates and asymmetric keys are used internally as a
               security container.


               The generic syntax for creating a login is:

            CREATE LOGIN loginName { WITH <option_list1> | FROM <sources> }


            <option_list1> ::=
                   PASSWORD = { 'password' | hashed_password HASHED } [ MUST_CHANGE ]
                   [ , <option_list2> [ ,... ] ]


            <option_list2> ::=
                   SID = sid
                   | DEFAULT_DATABASE = database
                   | DEFAULT_LANGUAGE = language
                   | CHECK_EXPIRATION = { ON | OFF}
                   | CHECK_POLICY = { ON | OFF}
                   | CREDENTIAL = credential_name


            <sources> ::=
                   WINDOWS [ WITH <windows_options> [ ,... ] ]
                   | CERTIFICATE certname
                   | ASYMMETRIC KEY asym_key_name


            <windows_options> ::=
                   DEFAULT_DATABASE = database
                   | DEFAULT_LANGUAGE = language

   When the CHECK_POLICY option (the default and recommended setting) is enabled, SQL
Server 2008 enforces the Windows password policy settings when you create a SQL Server
login. CHECK_EXPIRATION applies the Windows password expiration policy to the login. When
CHECK_EXPIRATION is enabled, each time the login is used to authenticate to an instance,
SQL Server checks whether the password has expired and prompts the user to change the
password if necessary.
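   As a minimal illustration of the syntax (the login name is a placeholder, and you should
substitute a strong password), a standard SQL Server login might be created as follows:

CREATE LOGIN AppUser
WITH PASSWORD = '<EnterStrongPasswordHere>',
    DEFAULT_DATABASE = AdventureWorks,
    CHECK_POLICY = ON,
    CHECK_EXPIRATION = ON;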
               Using Windows groups provides the greatest flexibility for managing security access. You
            simply add or remove accounts from the group to control access to a SQL Server instance.
            A DBA is also isolated from the details of people joining and leaving companies or moving to
            different groups within an organization. The DBA can then focus on defining groups based
            on permission profiles and leave the mechanics of adding and removing user accounts to
            standard business processes within your company.




When you create a SQL Server login, you can specify a SID for the account explicitly. You
will not normally use the capability to specify a SID; however, when you need to copy SQL
Server logins from one instance to another, being able to specify the SID allows you to map
logins appropriately to any restored databases.
   An account protection mechanism in Windows causes an account to be locked out when
the correct password is not provided within a specified number of attempts. Every account in
Windows can be locked out due to failed login attempts, except the administrator account.
The administrator account cannot be locked out because if you locked out the administrator,
you would not have a way of logging into the system and fixing anything. Just like the
administrator account in Windows, the sa account cannot be locked out due to failed login
attempts, making the sa account a prime target for brute force attacks. System administrators
defeat brute force attacks on the administrator account by renaming the account. You can
also rename the sa account to protect an instance from brute force attacks.
   When you are performing maintenance on a database, such as deploying new code
or changing the database structure, you need to ensure that users are not accessing the
database in the meantime. One way to prevent access is to revoke permissions from a login;
however, you then have to be able to reestablish the permissions afterward. You can prevent
access while keeping the permissions for a login intact by disabling the login. You can disable
the login by executing the following code:

ALTER LOGIN <loginname> DISABLE




Fixed Server Roles
Roles in SQL Server provide the same functionality as groups within Windows. Roles provide
a convenient way to group multiple users with the same permissions. Permissions are assigned
to the role, instead of individual users. Users then gain the required set of permissions by
having their account added to the appropriate role.
   SQL Server ships with a set of instance-level roles. The instance-level roles are referred to
as fixed server roles, because you cannot modify the permissions on the role. You also cannot
create additional roles at an instance level.
   The server roles that ship with SQL Server are shown in Table 11-2.


TABLE 11-2 Fixed Server Roles

 ROLE                   MEMBERS CAN

 bulkadmin              Administer BCP and Bulk Insert operations
 dbcreator              Create databases
 diskadmin              Manage disk resources
 processadmin           Manage connections and start or pause an instance
 securityadmin          Create, alter, and drop logins, but can’t change passwords



              serveradmin            Perform the same actions as diskadmin and processadmin, plus manage
                                     endpoints, change instance settings, and shut down the instance
              setupadmin             Manage linked servers
              sysadmin               Perform any action within the instance. Members cannot be
                                     prevented from accessing any object or performing any action.
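
   One way to manage fixed server role membership in SQL Server 2008 is with the
sp_addsrvrolemember and sp_dropsrvrolemember system stored procedures. A minimal sketch,
assuming a login named Test already exists:

EXEC sp_addsrvrolemember 'Test', 'dbcreator';
EXEC sp_dropsrvrolemember 'Test', 'dbcreator';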


            Database Users
            SQL Server security works on the principle of “no access by default.” If you haven’t explicitly
            been granted permission, you cannot perform an action. You grant access to a database
            by adding a login to the database as a user by executing the CREATE USER command. CREATE
            USER has the following general syntax:

            CREATE USER user_name
       [ { { FOR | FROM }
           { LOGIN login_name
             | CERTIFICATE cert_name
             | ASYMMETRIC KEY asym_key_name }
         | WITHOUT LOGIN } ]
       [ WITH DEFAULT_SCHEMA = schema_name ]

    The SID of the login is mapped to the database user to provide an access path after a
user has authenticated to the instance. When a user changes context to a database, SQL
Server looks up the SID for the login, and if the SID has been added to the database, the user
is allowed to access the database. However, just because a user can access a database does
not mean that the user can access any objects within the database; the user still needs
permissions granted on the database objects.
               You can create a database user mapped to a certificate or asymmetric key. Database users
            mapped to certificates or asymmetric keys do not provide access to the database for any
            login. Certificate- and asymmetric key–mapped users are a security structure internal to the
            database. One of the applications of this structure, signatures, will be covered in Lesson 4,
            “Managing Permissions,” later in this chapter.


            Loginless Users
It is possible to create a user in the database that is not associated with a login, referred
to as a loginless user.
               Prior to SQL Server 2005, if you wanted to allow users to access a database only when a
            specific application was being used, you used an application role. You created the application
            role with a password, and assigned permissions to the application role. Users would then
            specify the password for the application role to gain access to the database under the
            application role’s security context. Unfortunately, when you connected with the application


role, SQL Server no longer knew the user issuing commands, which created a problem for
auditing activity.
   Loginless users were added to replace application roles. Users still authenticate to the
instance using their own credentials. The user’s login needs access to the database. After SQL
Server changes the user’s context to the database, the user impersonates the loginless user
to gain necessary permissions. Because the user is authenticating to the instance using his or
her own credentials, SQL Server can still audit activity to an individual login even though the
login is impersonating a loginless user.


   EXAM TIP
   Loginless users are designed to replace application roles. Loginless users also provide a
   much better audit trail than an application role because each user must authenticate to the
   instance using their own credentials instead of using a generic account.
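
   A minimal sketch of the impersonation flow follows; TestUser is the loginless user created
in the practice later in this lesson, and the caller needs the IMPERSONATE permission on
that user:

EXECUTE AS USER = 'TestUser';   --Switch the security context to the loginless user
SELECT USER_NAME();             --Returns TestUser
REVERT;                         --Return to the original security context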



Fixed Database Roles
Just as you have fixed roles at an instance level, SQL Server provides a set of fixed roles at a
database level, as shown in Table 11-3.

TABLE 11-3 Fixed Database Roles

 ROLE                        MEMBERS CAN

 db_accessadmin              Add or remove users in the database
 db_backupoperator           Back up the database but cannot restore a database or view any
                             information in the database
 db_datareader               Issue SELECT against all tables, views, and functions within the
                             database
 db_datawriter               Issue INSERT, UPDATE, DELETE, and MERGE against all tables
                             within the database. Members of this role must also be members
                             of the db_datareader role.
 db_ddladmin                 Execute data definition language (DDL) statements
 db_denydatareader           Prevent SELECT against all tables, views, and functions within the
                             database
 db_denydatawriter           Prevent INSERT, UPDATE, DELETE, and MERGE against all tables
                             within the database
 db_owner                    Owner of the database that has full control over the database
                             and all objects contained within the database
 db_securityadmin            Manage the membership of roles and associated permissions, but
                             cannot manage membership for the db_owner role
 public                      Default group in every database that all users belong to


User Database Roles
            Instead of managing permissions for each account, all modern operating systems allow you
            to define groups of users that all have the same permissions. All system administrators need
to do is manage the members of a group instead of the potentially hundreds or thousands
            of individual permissions.
               SQL Server uses the same security management principles that administrators have applied to
            Windows domains by providing the ability to create database roles. A database role is a principal
            within the database that contains one or more database users. Permissions are assigned to the
            database role. Although you can assign permissions directly to a user, it is recommended that
            you create database roles, add users to a role, and then grant permissions to the role.
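   A minimal sketch of that pattern follows; the role name SalesReaders is a placeholder, and
Test is the database user created in the practice later in this lesson:

CREATE ROLE SalesReaders;
EXEC sp_addrolemember 'SalesReaders', 'Test';
GRANT SELECT ON SCHEMA::Sales TO SalesReaders;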


   Quick Check
      1. Which logins cannot be used to authenticate to an instance?

      2. What database principal was created as a replacement for an application role?

   Quick Check Answers
      1. You cannot use logins that are mapped to a certificate or asymmetric key to
         authenticate to an instance.

      2. Loginless users are the replacement for an application role.




 PRACTICE   Creating Logins and Database Users

            In this practice, you create several logins, add the logins as users in the AdventureWorks
            database, and then create a loginless user within the AdventureWorks database.
              1.   Click Start, right-click My Computer, and select Manage.
              2.   Right-click the Users node underneath Local Users and Groups and select New User.
                   Create a Windows account named TestAccount. Close Computer Management.
              3.   Execute the following code to add the Windows account as a login to your instance,
                   replacing <computer name> with the name of the machine on which you are running
                   SQL Server:

                   --Brackets are required due to the rules for identifiers
       CREATE LOGIN [<computer name>\TestAccount] FROM WINDOWS
                   GO

              4.   Execute the following code to create two SQL Server native logins, replacing
                   <EnterStrongPasswordHere> with a strong password, and add the accounts as users
                   to the AdventureWorks database:

                   CREATE LOGIN Test WITH PASSWORD = '<EnterStrongPasswordHere>'
                   CREATE LOGIN Test2 WITH PASSWORD = '<EnterStrongPasswordHere>'
                   GO


USE AdventureWorks
      GO
      CREATE USER Test FOR LOGIN Test
      CREATE USER Test2 FOR LOGIN Test2
      GO

 5.   Execute the following code to create a loginless user in the AdventureWorks database:

      USE AdventureWorks
      GO
      CREATE USER TestUser WITHOUT LOGIN
      GO

 6.   Execute the following code to review the endpoints along with the instance and
      database principals:

      --Instance level principals.
      SELECT * FROM sys.asymmetric_keys
      SELECT * FROM sys.certificates
      SELECT * FROM sys.credentials
      SELECT * FROM sys.linked_logins
      SELECT * FROM sys.remote_logins
      SELECT * FROM sys.server_principals
      SELECT * FROM sys.server_role_members
      SELECT * FROM sys.sql_logins
      SELECT * FROM sys.endpoints
      GO


      --Database level principals.
      SELECT * FROM sys.database_principals
      SELECT * FROM sys.database_role_members
      GO

 7.   Execute the following code to rename the sa account:

      ALTER LOGIN sa WITH NAME = MySaAccount
      GO




Lesson Summary
      You can create SQL Server native logins or map Windows accounts to a SQL Server
      login.
      Logins can be mapped to certificates or asymmetric keys, but logins mapped to
      certificates or asymmetric keys cannot be used to authenticate to an instance.
      Since the sa account cannot be locked out, you should rename the account using the
      ALTER LOGIN command.




Members of the sysadmin role can perform any action within the instance and cannot
                   be prevented from executing any command. Members of the db_owner role can
                   perform any action within the given database and cannot be prevented from executing
                   any command within the database.
       Loginless users, created as a replacement for application roles, are users in a database
       that are not mapped to a login.


            Lesson Review
            The following questions are intended to reinforce key information presented in Lesson 3,
            “Creating Principals.” The questions are also available on the companion CD if you prefer to
            review them in electronic form.


   NOTE   ANSWERS
               Answers to these questions and explanations of why each answer choice is right or wrong
               are located in the “Answers” section at the end of the book.


              1.   Wide World Importers has a Windows Server 2003 domain and all the servers running
                   SQL Server are running on Windows Server 2003 Enterprise edition. The SQL Server
                   instance is configured for Windows-only authentication. Database roles have been
                   created for each group of permissions within a database. Logins are added to the
                   database roles. The DBAs want to move the security assignment of users to the owners
                   of each application without giving up control of the accounts or permissions inside
                   the SQL Server instance. How can the DBAs accomplish their goals? (Choose two. Each
                   answer represents part of the solution.)
                   A. Have the Windows administrator allow application owners to manage the
                       Windows groups associated to their applications.
                   B. Add the logins for application owners to the securityadmin role.
                   C. Map SQL Server logins to the Windows group corresponding to each application.
                   D. Add the logins for application owners to the sysadmin role.
              2.   Tina needs to be able to back up databases on an instance without also having the
                   authority to restore or access the contents of the database. How would you accomplish
                   this business requirement with the least amount of effort?
                   A. Add Tina to the diskadmin role.
                   B. Add Tina to the db_owner role.
                   C. Add Tina to the db_backupoperator role.
                   D. Add Tina to the sysadmin role.




Lesson 4: Managing Permissions
SQL Server denies access by default. Therefore, to access any object or perform any action,
you must be granted permission. In this lesson, you learn how to manage permissions on
the objects within an instance or database, which are called securables. You learn how to
impersonate a user to verify that permissions are set properly. Finally, you learn how to create
and manage master keys so that you can use signatures to elevate permissions only when
code is executing.


   After this lesson, you will be able to:
      Assign permissions to a user
      Control permissions based on a scope
      Understand the effects of metadata security
      Work with ownership chains
      Impersonate a login or user
      Create and manage master keys
      Create signatures and sign modules

   Estimated lesson time: 20 minutes




      Administrative Accounts


   Administrative accounts hold a special position within the SQL Server security
   structure. Accounts that are considered administrative accounts are:

      Members of the sysadmin fixed server role
      Members of the db_owner fixed database role
      The sa account

   In addition, members of the sysadmin role are members of the db_owner role in
   every database within the instance.

      You can prevent an account from performing an action by removing the
      corresponding permission. You cannot limit the permissions of an administrative
      account. Although you can execute commands to remove permissions, the
      command does not have any effect because SQL Server does not check permissions
      for an administrative account.




Securables
Permissions are meaningless without something to apply them to and someone to use
them. Permissions work in concert with securables and principals: you GRANT/REVOKE/
DENY <permissions> ON <securables> TO <principals>.
               Securables are the objects on which you grant permissions. Every object within SQL Server,
            including the entire instance, is a securable. Securables also can be nested inside other
            securables. For example, an instance contains databases; databases contain schemas; and
            schemas contain tables, views, procedures, functions, and so on.

            Schemas
No object created within a database can exist without an owner. All objects must have an
owner because objects cannot spontaneously come into existence; rather, they must be created
by someone. In addition, for any account to access an object, permission has to be assigned,
and you need at least one user with the authority to manage permissions on an object.
   Because objects ultimately have to be owned by a user, you can create a management
problem when you need to remove a user from a database. If database users directly owned
objects, it would not be possible to drop a user unless you reassigned the objects to a different
owner, and reassigning an object to a different owner would change the name of the object.
               Schemas provide the containers that own all objects within a database and in turn,
            a schema is owned by a database user. By introducing a schema between users and
            objects, you can drop a user from the database without affecting the name of an object or
            applications that use the object. Schemas are the only objects that are directly owned by a
            database user, so to drop a user that owns a schema, you must first change the ownership of
            the schema to another user.
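               For example, a minimal sketch of removing a schema-owning user (the schema and user
            names are illustrative):

            ALTER AUTHORIZATION ON SCHEMA::Sales TO NewOwner
            GO
            DROP USER FormerOwner
            GO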


            Permissions
            Permissions provide the authority for principals to perform actions within an instance or
            database. Some permissions apply to a statement such as INSERT, UPDATE, and SELECT, and
            other permissions apply to an action such as ALTER TRACE, and still others encompass a
            broad scope of authority such as CONTROL.
               You add permissions to an object with the GRANT statement. Access to an object is
            prevented with the DENY statement. To access an object, permission must be granted explicitly.
            Each time you issue a GRANT statement, SQL Server places an entry in a security table for the
            corresponding permission granted. Each time you issue a DENY, an entry is placed in a security
            table for the DENY. Because a DENY overrides any other permission, a DENY overrides a GRANT.
                The REVOKE statement removes permission entries for the object referenced. For example,
            if you issue a GRANT SELECT ON Person.Address TO Test, you can remove the access by
            executing REVOKE SELECT ON Person.Address FROM Test. Similarly, if you issue DENY SELECT
            ON Person.Address TO Test, you can remove the DENY by executing REVOKE SELECT ON
            Person.Address FROM Test.


You can also grant permissions at multiple levels; for example, you might grant SELECT
permission on the AdventureWorks database, the Person schema, and directly to the Person.
Address table. To prevent the user from accessing the Person.Address table, you can then
issue three REVOKE statements—database, schema, and table—to remove the SELECT access
on the table.
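   As a sketch of this layered behavior, the following statements (assuming the AdventureWorks
sample database and a user named Test) grant SELECT at all three scopes and then remove it:

GRANT SELECT ON DATABASE::AdventureWorks TO Test
GRANT SELECT ON SCHEMA::Person TO Test
GRANT SELECT ON Person.Address TO Test
GO
--All three entries must be revoked before the user loses SELECT access to the table
REVOKE SELECT ON DATABASE::AdventureWorks FROM Test
REVOKE SELECT ON SCHEMA::Person FROM Test
REVOKE SELECT ON Person.Address FROM Test
GO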

Permission Scope
Prior to SQL Server 2005, you granted all permissions directly to objects within a database.
SQL Server 2005 and later define multiple scopes to which you can assign permissions.
A securable can be a database, a schema, or an object. Because you grant permissions on a
securable, you can assign permissions at any of these scopes. Granting permission on
a database causes the permission to be granted implicitly to all schemas within the database
and thereby to all objects within all schemas. Granting permission on a schema causes the
permission to be granted implicitly to all objects within a schema.


   IMPORTANT   PERMISSION SCOPE
   You can assign permissions to any securable. By using higher-level containers such as
   databases and schemas, you can assign permissions very flexibly. Although you can assign
   all permissions directly to the lowest-level objects, if a user needs the same permission to
   access all objects in a schema or database, you can replace dozens or even thousands of
   separate permissions by granting the permission on the schema or database instead.


   A schema is the first layer of security within a database that you should plan for and take
advantage of. A schema should represent a functional grouping within an application, such
as Customers, Products, Inventory, and HumanResources. You then create objects within the
corresponding schema and grant permissions on the schemas to provide security access to an
application.
    For example, if you want to grant SELECT, INSERT, UPDATE, and DELETE permissions on
all tables and views within a database, you can accomplish the permission assignment three
different ways:
       Grant permissions on each table and view
       Grant permissions on each schema within the database
       Grant permissions on the database


Metadata Security
In your everyday life, you take it for granted that many things are hidden from you because
you don’t have the authority to use them. It should not be a surprise to find out that
SQL Server follows the same principle of “out of sight, out of mind.” SQL Server secures all
the metadata within the system such that you can view only the objects within an instance or
database on which you have permissions to perform an action.



If you need to allow users to view metadata in a database, you can execute the
            following code:

            GRANT VIEW DEFINITION TO <user>

               If you grant VIEW ANY DEFINITION to a login, the login can view metadata for any object
            within the instance. The VIEW ANY DATABASE permission allows a login to see the existence
            of databases within the instance, even databases to which the login has no access rights. For a
            login to see execution statistics, such as those in sys.dm_exec_requests, you need to grant
            VIEW SERVER STATE to the login.
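               For example, the following statements grant these instance-level permissions to a login
            (SomeLogin is a placeholder name; server-scoped permissions are granted from the master
            database):

            GRANT VIEW ANY DEFINITION TO SomeLogin
            GRANT VIEW ANY DATABASE TO SomeLogin
            GRANT VIEW SERVER STATE TO SomeLogin
            GO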


            Ownership Chains
            Each object within a database has an owner associated to it—the schema owner. You can
            also build objects that reference other objects within a database, such as stored procedures
            that call functions which issue SELECT statements against views that are based on tables. The
            owner of each object that is referenced in a calling stack forms an ownership chain as the
             code transits from one object to the next within the calling stack. So long as the object
             and every object that it references have the same owner, you have an intact ownership
             chain. SQL Server checks your permissions on the object at the top of the calling stack, as
             well as each time the object owner changes within the calling stack.
               By using ownership chains, stored procedures become your most powerful security
            mechanism within your database. Applications can be built to call stored procedures, which
            can accomplish all the data manipulation required by the application. However, users are
            never granted direct access to the underlying tables; therefore, the only actions that can be
            performed are the actions allowed by the stored procedure.


               BEST PRACTICES   APPLICATION APIS
               It is interesting to note that many developers seem to argue about calling stored
               procedures in their applications. Instead, they want to embed SQL directly into the
               application. But none of the developers you work with would ever think of writing an
               application as just a bunch of embedded code. Rather, developers spend large amounts
               of time constructing objects that have interfaces and then building applications by
               connecting objects via their interfaces. This development style allows multiple developers
               to work on a complex application, even when dependent code has not been completed.
               Stored procedures perform the same function as the APIs that developers use within
               every application. A stored procedure is nothing more than an API to the database, which
               means that developers do not even need to know the structure of the database.


               If you have a calling stack with different object owners and the user has not been granted
            permission for each object within the calling stack where the owner changes, you have
            produced a broken ownership chain. It is a common misconception that a broken ownership
            chain represents a design flaw in your database. There are situations, such as auditing, where



you want to break the ownership chain deliberately to ensure that users cannot access any of
the code used to audit actions or any of the audit data that is stored. However, to bridge the
gap created by a broken ownership chain, you need to use signatures, which will be discussed
at the end of this lesson.


   IMPORTANT   OBJECT OWNER
   Although schemas contain all objects within a database, SQL Server considers the owner of
   the schema to be the owner of every object within the schema when determining ownership
   chains.



Impersonation
You can impersonate another principal to execute commands in a specific user context.
To impersonate another principal, you must have the IMPERSONATE permission granted to
your account on the principal that you want to impersonate. If IMPERSONATE permission
is assigned on a login, you can impersonate the login and execute under that principal’s
authority in any database to which the principal has access. If IMPERSONATE permission
is assigned on a database user, you can execute under the user’s context only within that
database.
   You accomplish impersonation by using the EXECUTE AS statement as follows:

{ EXEC | EXECUTE } AS <context_specification>


<context_specification>::=
{ LOGIN | USER } = 'name'
    [ WITH { NO REVERT | COOKIE INTO @varbinary_variable } ]
| CALLER



   EXAM TIP
   To create a schema owned by another database principal, the user creating the schema
   must have IMPERSONATE permission on the principal being designated as the schema
   owner.


   So long as you have not specified the NO REVERT clause for EXECUTE AS, you can return
to the previous execution context by executing a REVERT.
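   For example, a minimal impersonation round-trip, assuming a database user named Test:

EXECUTE AS USER = 'Test'
SELECT SUSER_SNAME(), USER_NAME()    --Impersonated context
REVERT
SELECT SUSER_SNAME(), USER_NAME()    --Original context
GO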


Master Keys
Master keys provide the basis for the encryption hierarchy within SQL Server and are
also required before you can create a certificate or asymmetric key. You have a single
service master key for the entire instance along with a database master key within each
database.



Service Master Key
            Each instance of SQL Server has a service master key that is generated automatically the first time
             the instance is started. Service master keys are symmetric keys generated from the local machine
             key and encrypted by the Windows Data Protection API using the SQL Server service account credentials.
               The generation and encryption process ensures that the service master key can be
            decrypted only by the service account under which it was created or by a principal with access
            to the service account credentials. By default the service master key is used to encrypt any
            database master key that is created within the instance.

            Database Master Key
            A database master key must be generated explicitly using the following command:
                CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword>'
               Each database has a different master key, ensuring that a user with access to decrypt
            data in one database cannot also decrypt data in another database without being granted
            permission to do so.
               The database master key is used to protect any certificates, symmetric keys, or asymmetric
            keys that are stored within a database. The database master key is encrypted using Triple DES
            and the user-supplied password. A copy of the database master key is also encrypted using the
            service master key such that automatic decryption can be accomplished within the instance.
                When you make a request to decrypt data, the service master key is used to decrypt the
             database master key, which is then used to decrypt a certificate, symmetric key, or asymmetric
             key, which in turn is used to decrypt the data.
               The reason this hierarchy is important is that you must be careful when moving backups
            containing encrypted data between SQL Server instances. To restore and be able to decrypt data
             successfully, you must also back up the database master key and then restore the database
             master key on the other instance. To perform this process, you need to use the OPEN MASTER
            KEY, BACKUP MASTER KEY, RESTORE MASTER KEY, and CLOSE MASTER KEY commands.
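                As a sketch of moving a database master key between instances (the file path and
             passwords are placeholders):

             --On the source instance
             OPEN MASTER KEY DECRYPTION BY PASSWORD = '<MasterKeyPassword>'
             BACKUP MASTER KEY TO FILE = 'C:\Backups\dbmasterkey.key'
                  ENCRYPTION BY PASSWORD = '<BackupPassword>'
             CLOSE MASTER KEY
             GO

             --On the destination instance, after restoring the database
             RESTORE MASTER KEY FROM FILE = 'C:\Backups\dbmasterkey.key'
                  DECRYPTION BY PASSWORD = '<BackupPassword>'
                  ENCRYPTION BY PASSWORD = '<MasterKeyPassword>'
             GO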


               IMPORTANT   MASTER KEYS
               You will learn more about data encryption in Lesson 6, “Encrypting Data.” However, a
               database master key is required to create a certificate that is the basis of a signature.



            Certificates
            Certificates are keys based on the X.509 standard that are used to authenticate the credentials
            of the entity supplying the certificate. You can create either public or private certificates.
            A public certificate is essentially a file that is supplied by a certificate authority that validates
            the entity using the certificate. Private certificates are generated by and used to protect data
            within an organization. For example, the public certificate used by your bank’s Web site is
             used to prove the bank’s Web site is valid as well as to encrypt the data transmitted between
             your browser and the bank’s servers.

To create a self-signed certificate in SQL Server, you use the following command:

CREATE CERTIFICATE certificate_name [ AUTHORIZATION user_name ]
    { FROM <existing_keys> | <generate_new_keys> }
    [ ACTIVE FOR BEGIN_DIALOG =      { ON | OFF } ]


 <existing_keys> ::=
    ASSEMBLY assembly_name     | {
        [ EXECUTABLE ] FILE = 'path_to_file'
        [ WITH PRIVATE KEY ( <private_key_options> ) ]         }


<generate_new_keys> ::=
    [ ENCRYPTION BY PASSWORD = 'password']
    WITH SUBJECT = 'certificate_subject_name'
    [ , <date_options> [ ,...n ] ]


<private_key_options> ::=
    FILE = 'path_to_private_key'
    [ , DECRYPTION BY PASSWORD = 'password' ]
    [ , ENCRYPTION BY PASSWORD = 'password' ]


<date_options> ::=
    START_DATE = 'mm/dd/yyyy' | EXPIRY_DATE = 'mm/dd/yyyy'



Signatures
Signatures allow you to elevate a user’s permissions, with the restriction that the
elevation occurs only while the user is executing a specific piece of code.
   You can add a digital signature to a module—stored procedures, functions, triggers, and
assemblies—by using the ADD SIGNATURE command. The process to sign code digitally to
manage permissions is as follows:
  1.   Create a database master key.
  2.   Create a certificate in the database.
  3.   Create a user mapped to the certificate.
  4.   Assign permissions on an object or objects to the user.
  5.   Execute ADD SIGNATURE on the module by the certificate.
  One of the most useful places to employ a signature is to bridge the gap in a broken
ownership chain.
   For example, you could construct logic to audit user actions in the database and ensure
that users cannot access the audit data directly by implementing a broken ownership chain.
The code that logs the user activity could then be digitally signed, allowing the audit action
to occur within the context of a user’s transaction while still preventing the user from directly
accessing the audit tables.
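   A compressed sketch of steps 2 through 5 follows; the certificate, user, and object names
are illustrative, and Practice 3 at the end of this lesson walks through a complete example:

CREATE CERTIFICATE SigningCert WITH SUBJECT = 'Module signing certificate'
GO
CREATE USER SigningCertUser FROM CERTIFICATE SigningCert
GO
--Grant the certificate-mapped user access to the protected object
GRANT SELECT ON Audit.ActionLog TO SigningCertUser
GO
--Sign the module that is allowed to use that access
ADD SIGNATURE TO dbo.usp_LogAction BY CERTIFICATE SigningCert
GO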


                   Quick Check
                   1. How are principals, securables, and permissions related?

                   2. What is an ownership chain, and how can you have a broken ownership chain?

                   Quick Check Answers
                   1. You can GRANT, REVOKE, or DENY permissions ON a securable TO a principal.

                   2. An ownership chain applies to objects that reference other objects within a
                        database. The owner of the schema that contains the object is considered the
                        owner of the object. SQL Server checks permissions for the first object that you
                        access, as well as each time the owner changes within the calling stack. The chain
                        of object owners within a calling stack is called an ownership chain. You have a
                        broken ownership chain when the object owner changes within a calling stack
                        and you have not been granted sufficient permissions to continue accessing
                        objects within the call stack.




              PRACTICE    Managing Permissions

            In the following practices, you view the effect of metadata security within a database as you
            grant permissions on objects at various scopes. You also investigate ownership chains and use
            signatures to allow access to an object through a stored procedure while not being able to
            access the same object directly.

             PRACTICE 1    Assigning Object Permissions
            In this practice, you assign object permissions at various scopes and view the effect on metadata
            security by using impersonation.
              1.   Execute the following code to verify your user context:
                   SELECT SUSER_SNAME(), USER_NAME()
                   GO

              2.   Change context to the AdventureWorks database and view the list of objects:
                   USE AdventureWorks
                   GO


                   --View the list of objects in the database
                   SELECT * FROM sys.objects
                   GO

              3.   Impersonate the Test user and view the list of objects:

                   EXECUTE AS USER = 'Test'
                   GO
                   SELECT SUSER_SNAME(), USER_NAME()
                   GO

SELECT * FROM sys.objects
     GO
     REVERT
     GO
     SELECT SUSER_SNAME(), USER_NAME()
     GO

4.   Grant SELECT permission on the Production.Document table to the Test user and view
     the results:

     GRANT SELECT ON Production.Document TO Test
     GO


     EXECUTE AS USER = 'Test'
     GO
     SELECT * FROM sys.objects
     SELECT DocumentNode, Title, FileName FROM Production.Document
     REVERT
     GO

5.   Grant SELECT permission on the Production schema and view the results:

     --Schema scoped permission
     GRANT SELECT ON SCHEMA::Production TO Test
     GO
     EXECUTE AS USER = 'Test'
     GO
     SELECT * FROM sys.objects
     REVERT
     GO

6.   Grant SELECT permission on the entire AdventureWorks database and view the results.
     Notice that even though the user now has SELECT permission on the entire database,
     there are still objects that are visible only to the database owner:

     GRANT SELECT ON DATABASE::AdventureWorks TO Test
     GO
     EXECUTE AS USER = 'Test'
     GO
     SELECT * FROM sys.objects
     REVERT
     GO

7.   Remove the ability to view object metadata and review the results:

     DENY VIEW DEFINITION TO Test
     GO
     EXECUTE AS USER = 'Test'
     GO




SELECT * FROM sys.objects
                    SELECT DocumentNode, Title, FileName FROM Production.Document
                    REVERT
                    GO

              8.    Restore the ability to view object metadata:

                    REVOKE VIEW DEFINITION FROM Test
                    GO
                    EXECUTE AS USER = 'Test'
                    GO
                    SELECT * FROM sys.objects
                    REVERT
                    GO

               9.   Remove SELECT permission from the database. Notice that the user can still view the
                    contents of the Production schema:

                    REVOKE SELECT ON DATABASE::AdventureWorks FROM Test
                    GO
                    EXECUTE AS USER = 'Test'
                    GO
                    SELECT * FROM sys.objects
                    REVERT
                    GO

             10.    Remove SELECT permission on the schema. Notice that the user can still view the
                    Production.Document table and objects directly associated to the table:

                    REVOKE SELECT ON SCHEMA::Production FROM Test
                    GO
                    EXECUTE AS USER = 'Test'
                    GO
                    SELECT * FROM sys.objects
                    REVERT
                    GO

             11.    Remove SELECT permission on the table. Notice that you have finally removed the Test
                    user’s access to the Production.Document table:

                    REVOKE SELECT ON Production.Document FROM Test
                    GO
                    EXECUTE AS USER = 'Test'
                    GO
                    SELECT * FROM sys.objects
                    REVERT
                    GO




PRACTICE 2    Creating and Managing Master Keys
In this practice, you create a database master key along with a database user based on a
certificate. You also learn how to back up a certificate.

  1.   Create a master key in the AdventureWorks database:

       USE AdventureWorks
       GO
       CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<EnterStrongPasswordHere>'
       GO

  2.   Back up the database master key and store it in a secure location away from your
       database backups:

       OPEN MASTER KEY DECRYPTION BY PASSWORD = '<EnterStrongPasswordHere>'


        BACKUP MASTER KEY TO FILE =
             'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Backup\awmasterkey.key'
             ENCRYPTION BY PASSWORD = '<EnterStrongPasswordHere>'
       GO

  3.   Create a certificate:

       CREATE CERTIFICATE TestCert WITH SUBJECT = 'Test Certificate'
       GO

  4.   Back up the certificate and store it in a secure location away from your database
       backups:

        BACKUP CERTIFICATE TestCert TO FILE =
             'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Backup\testcert.cer'
       GO

  5.   Create a database user mapped to the certificate:

       CREATE USER CertUser FROM CERTIFICATE TestCert
       GO


PRACTICE 3    Adding a Signature to a Module to Bridge an Ownership Chain
In this practice, you purposely implement a broken ownership chain and then use signatures
to access objects to which your account would not normally have access.

  1.   Create a schema and test objects that create a broken ownership chain:

       CREATE SCHEMA SignatureTest AUTHORIZATION Test2
       GO
       CREATE TABLE SignatureTest.TestTable




(ID       INT         IDENTITY(1,1),
                    Col1      VARCHAR(10) NOT NULL)
                    GO
                    INSERT INTO SignatureTest.TestTable
                    (Col1)
                    VALUES ('Row1'), ('Row2')
                    GO
                    --Create a procedure to access the test table
                    CREATE PROCEDURE SignatureTest.asp_Proc1
                    AS
                          SELECT ID, Col1 FROM SignatureTest.TestTable
                    GO

                    --Create a stored procedure call stack
                    CREATE PROCEDURE dbo.asp_SignatureTest
                    AS
                          EXEC SignatureTest.asp_Proc1
                    GO
                    --Grant execute permissions on the outer stored procedure
                    GRANT EXECUTE ON dbo.asp_SignatureTest TO Test
                    GO

              2.    Test the stored procedure:

                    EXECUTE AS USER = 'Test'
                    EXEC dbo.asp_SignatureTest
                    REVERT
                    GO

              3.    Grant execute permissions to the certificate-mapped user to the inner stored
                    procedure and add a digital signature to the outer procedure:

                    GRANT EXECUTE ON SignatureTest.asp_Proc1 TO CertUser
                    GO
                    --Sign the procedure with the certificate
                    ADD SIGNATURE TO dbo.asp_SignatureTest BY CERTIFICATE TestCert
                    GO

              4.    Test the procedure execution:

                    EXECUTE AS USER = 'Test'
                    EXEC dbo.asp_SignatureTest
                    REVERT
                    GO

               5.   Verify the user cannot execute the inner stored procedure directly:

                    EXECUTE AS USER = 'Test'
                    EXEC SignatureTest.asp_Proc1
                    REVERT
                    GO


6.   Verify the user cannot access the table directly:

       EXECUTE AS USER = 'Test'
       SELECT ID, Col1 FROM SignatureTest.TestTable
       REVERT
       GO

  7.   Verify that you cannot impersonate the user mapped to the certificate:

       EXECUTE AS USER = 'CertUser'
       GO




Lesson Summary
       You GRANT permissions ON a securable TO a principal.
       An instance, a database, and a schema are all securables. Assigning a permission at
       a database or schema scope applies to all objects contained within the database or
       schema.
       All metadata within SQL Server is secured. If you have not been granted permission on
       an object, you do not even see the object.
       You can impersonate a login or database user with the EXECUTE AS statement.
       You cannot impersonate a principal that has been mapped to a certificate or
       asymmetric key.
       A service master key is created when the instance is first started. A database master key
       must be created explicitly within each database and is required to create a certificate,
       an asymmetric key, or a symmetric key.
       Digital signatures can be applied to a code module through the ADD SIGNATURE
       statement to provide a means to escalate permissions only when you execute a
       specified module without allowing direct access to the underlying objects.


Lesson Review
The following questions are intended to reinforce key information presented in Lesson 4,
“Managing Permissions.” The questions are also available on the companion CD if you prefer to
review them in electronic form.


   NOTE   ANSWERS
   Answers to these questions and explanations of why each answer choice is right or wrong
   are located in the “Answers” section at the end of the book.




1.   Wide World Importers has just implemented a new order inquiry system. All users with
                   access to the database need to be able to issue a SELECT statement against any table
                   within the database. How can you accomplish this functionality with the least amount
                   of effort?
                   A. Add the users to the db_datawriter database role.
                   B. Grant the users SELECT permission on every table in the database.
                   C. Grant the users SELECT permission on the database.
                   D. Grant the users SELECT permission on every schema in the database.
              2.   Which statement prevents users from viewing metadata about objects in a single
                   database, even if the user has access to the objects?
                   A. DENY VIEW DEFINITION
                   B. DENY VIEW ANY DEFINITION
                   C. DENY VIEW SERVER STATE
                   D. REVOKE VIEW DEFINITION




Lesson 5: Auditing SQL Server Instances
After granting the minimum permissions required for the completion of a task, you will deal with
the second security principle—“Trust, but verify.” In this lesson, you learn about the auditing
capabilities available within SQL Server 2008.


      After this lesson, you will be able to:
          Create DDL triggers
          Configure instance and database audit specifications
          Implement C2 auditing

      Estimated lesson time: 30 minutes


DDL Triggers
In addition to CREATE, DROP, and ALTER actions, DDL triggers allow you to trap and
respond to login events. You can scope DDL triggers at either an instance or a database level.
   The generic syntax for creating a DDL trigger is as follows:

CREATE TRIGGER trigger_name
ON { ALL SERVER | DATABASE }
[ WITH <ddl_trigger_option> [ ,...n ] ]
{ FOR | AFTER } { event_type | event_group } [ ,...n ]
AS { sql_statement   [ ; ] [ ,...n ] |
    EXTERNAL NAME < method specifier >     [ ; ] }


   The syntax for a trigger on a LOGON event (a logon trigger) is as follows:
CREATE TRIGGER trigger_name
ON ALL SERVER
[ WITH <logon_trigger_option> [ ,...n ] ]
{ FOR | AFTER } LOGON
AS { sql_statement   [ ; ] [ ,...n ] |
    EXTERNAL NAME < method specifier >     [ ; ] }

   You use the ON clause to scope a trigger as either instance-level (ON ALL SERVER) or
database-level (ON DATABASE). You specify the DDL event or event group that the trigger fires
upon within the FOR clause.
   DDL triggers fire within the context of the DDL statement being executed. In addition to
obtaining information about the command that was executed, DDL triggers allow you to
prevent many DDL actions. If you execute a ROLLBACK TRANSACTION within the DDL trigger,
the DDL statement that was executed rolls back because almost every DDL statement is
transactional and automatically executes within the context of a transaction.
  Not all DDL statements execute within the context of a transaction. ALTER DATABASE can
make changes to the database, but it can also make changes to the file structure underneath


the database. Because the Windows operating system is not transactional, you cannot roll
            back an action against the file system. To provide consistent behavior, the ALTER DATABASE
            command executes outside the scope of a transaction. You can still fire a DDL trigger ON
            ALTER DATABASE; however, the trigger is only able to audit, not prevent.


               EXAM TIP
               An important feature of DDL triggers is the ability to roll back an action. The Policy-Based
               Management Framework creates DDL triggers for all policies that you configure to prevent
               an out-of-compliance situation.


               SQL Server provides a grouping mechanism for all DDL events within an instance. You
            could create a DDL trigger to fire for the CREATE, DROP, or ALTER of a table or you could
            specify the corresponding event group—DDL_TABLE_EVENTS.


               MORE INFO       EVENT GROUPS
               For more information about event groups, please refer to the Books Online article “DDL
               Event Groups” at http://technet.microsoft.com/en-us/library/bb510452.aspx.


               Within the execution context of the DDL trigger, you have access to a special function,
            EVENTDATA(), that provides information about the DDL action. EVENTDATA() returns an
            Extensible Markup Language (XML) document with a structure that depends upon the event.


               MORE INFO       EVENTDATA() SCHEMAS
               The XML schema available for an event is documented at http://schemas.microsoft.com/
               sqlserver/2006/11/eventdata.
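               As a sketch, the following database-scoped DDL trigger (the trigger name is illustrative)
            uses EVENTDATA() to capture the command text and then rolls back any attempt to drop
            a table:

            CREATE TRIGGER trg_PreventDropTable
            ON DATABASE
            FOR DROP_TABLE
            AS
                 SELECT EVENTDATA().value(
                      '(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]','nvarchar(max)')
                 ROLLBACK TRANSACTION
            GO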



            Audit Specifications
            Prior to SQL Server 2008, you had to use multiple features to perform the full array of
            auditing for an instance. DDL triggers would audit DDL changes; data manipulation language
            (DML) triggers would audit data changes at the cost of increasing transaction times; SQL
            Trace would audit SELECT statements.
               SQL Server 2008 combines all the auditing capabilities into an audit specification. Audit
            specifications begin with a server-level audit object that defines the logging location for the
            audit trail. You then create server and database audit specifications tied to the audit object.
               The general syntax for creating a server audit object is:

            CREATE SERVER AUDIT audit_name
                   TO { [ FILE (<file_options> [, ...n]) ] |
                       APPLICATION_LOG | SECURITY_LOG }
       [ WITH ( <audit_options> [, ...n] ) ] [ ; ]




<file_options>::=
{FILEPATH = 'os_file_path'
    [, MAXSIZE = { max_size { MB | GB | TB } | UNLIMITED } ]
    [, MAX_ROLLOVER_FILES = integer ]
    [, RESERVE_DISK_SPACE = { ON | OFF } ] }


<audit_options>::=
{    [   QUEUE_DELAY = integer ]
    [, ON_FAILURE = { CONTINUE | SHUTDOWN } ]
    [, AUDIT_GUID = uniqueidentifier ]}

   If you specify a file to log an audit trail to, you can specify the maximum size of a single audit
file, as well as how many rollover files should be retained on the operating system. In addition,
you can preallocate disk space for the audit log instead of having the file grow as audit rows are
added.
   Logging messages occurs either synchronously or asynchronously. When QUEUE_DELAY = 0,
audit records are sent to the audit log synchronously with the transaction. If you specify a delay
time (in milliseconds), audit records can be accumulated, but they still must be written within
the specified interval.
   The ON_FAILURE action controls how the instance behaves if audit records cannot
be written. The default option is CONTINUE, which allows the instance to continue
running and processing transactions. If you specify a value of SHUTDOWN and an audit
record cannot be written to the log within the specified QUEUE_DELAY interval, the
instance is shut down.
   After a server audit object has been established, you can add one or more specifications to
the audit. If you want to audit actions that occur at an instance level, you create a server audit
specification with the following general syntax:

CREATE SERVER AUDIT SPECIFICATION audit_specification_name
FOR SERVER AUDIT audit_name
{ { ADD ( { audit_action_group_name } ) } [, ...n]
    [ WITH ( STATE = { ON | OFF } ) ]}[ ; ]

   If you want to audit events specific to a database, you create a database audit specification
with the following general syntax:

CREATE DATABASE AUDIT SPECIFICATION audit_specification_name
{ [ FOR SERVER AUDIT audit_name ]
         [ { ADD ( { <audit_action_specification> |
             audit_action_group_name } )
      } [, ...n] ]
    [ WITH ( STATE = { ON | OFF } ) ]}[ ; ]


<audit_action_specification>::=
{ action [ ,...n ] ON [ class :: ] securable BY principal [ ,...n ] }




MORE INFO     AUDIT EVENTS
               For a list of the event classes and event groups that can be audited, please refer to the SQL
               Server Books Online article, “SQL Server Audit Action Groups and Actions,” at
               http://technet.microsoft.com/en-us/library/cc280663.aspx.



            C2 Auditing
            C2 auditing is a U.S. Department of Defense audit specification that can be enabled by
            executing the following code:

            sp_configure 'c2 audit mode', 1
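
                Because 'c2 audit mode' is an advanced option, a fuller sketch typically looks like the
             following; the setting takes effect only after the instance is restarted:

             EXEC sp_configure 'show advanced options', 1
             RECONFIGURE
             EXEC sp_configure 'c2 audit mode', 1
             RECONFIGURE
             --Restart the instance for C2 auditing to take effect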

               C2 auditing has been superseded by the Common Criteria specification developed by the
            European Union. Whether you are complying with C2 or Common Criteria, with respect to
            SQL Server, the audit result is essentially the same. You need to audit every successful and
            unsuccessful attempt to access a database object.
               When C2 auditing is enabled, an audit log file is written to the default data directory
            with a rollover size of 200 megabytes (MB). SQL Server continues to generate rollover files
            until you run out of disk space, thereby causing the instance to shut down. With C2 auditing
            enabled, the audit records are required to be written. If the system is too busy, user requests
            are aborted to free up resources to write the audit trail.


               CAUTION   AUDITING IMPACT
               You must be very careful when implementing C2 auditing. Be sure to check that a lower
               level of auditing does not meet your requirements. When you enable C2 auditing, you
               have made the decision that the audit is more important than a transaction. If the system
               becomes too busy, SQL Server aborts a user transaction to write audit information.




                    Quick Check
                    1. Which object can be used to audit as well as prevent most object changes?

                   2. Which object is required before you can create a server or database audit
                      specification?

                   Quick Check Answers
                    1. DDL triggers can audit any DDL command. If the DDL command executes within
                      a transaction, a DDL trigger can be used to roll back the DDL and prevent the
                      change from occurring.

                   2. You must create a server audit object before a server or database audit
                      specification can be created.




PRACTICE    Creating a Database Audit Specification

In this practice, you create a database audit specification to audit a SELECT, INSERT, UPDATE,
or DELETE statement that a user with db_owner permission executes against the confidential
data contained in the payroll history table.
  1.   Execute the following code to create the server audit object:

       USE MASTER
       GO
       CREATE SERVER AUDIT RestrictedAccessAudit
            TO APPLICATION_LOG
            WITH ( QUEUE_DELAY = 1000,    ON_FAILURE = CONTINUE);
       GO

  2.   Execute the following code to create the database audit specification:

       USE AdventureWorks
       GO
       CREATE DATABASE AUDIT SPECIFICATION EmployeePayrollAccess
       FOR SERVER AUDIT RestrictedAccessAudit
            ADD (SELECT, INSERT, UPDATE, DELETE
                    ON HumanResources.EmployeePayHistory
                    BY dbo)
            WITH (STATE = ON);
       GO

  3.   Execute the following code to enable the audit:

       USE MASTER
       GO
       ALTER SERVER AUDIT RestrictedAccessAudit
       WITH (STATE = ON);
       GO

  4.   Expand the Security node in Object Explorer and review the server audit object named
       RestrictedAccessAudit underneath the Audits node.
  5.   Expand the Security node and then the Database Audit Specifications node in Object
       Explorer underneath the AdventureWorks database and review the properties of the
       database audit specification named EmployeePayrollAccess that you created.
  6.   Execute the following code to test the database audit:

       USE AdventureWorks
       GO
       SELECT * FROM HumanResources.EmployeePayHistory
       GO

  7.   Right-click the server audit object and select View Audit Logs to review the results of
        the audit.



8.    Disable the server audit by executing the following code:

                    USE MASTER
                    GO
                    ALTER SERVER AUDIT RestrictedAccessAudit
                    WITH (STATE = OFF);
                    GO

               9.   Review the audit log and note that disabling the audit has been logged.


            Lesson Summary
                    DDL triggers can be created to fire when specific DDL events or events within a group
                    are executed.
                    If the DDL event executes within the context of a transaction, you can use a DDL
                    trigger to prevent the action from occurring.
                    CREATE SERVER AUDIT creates an instance of an audit object.
                    After you create an audit object, you can hook server and database audit specifications
                    to the audit object in order to centrally manage auditing.


Lesson Review
The following questions are intended to reinforce key information presented in Lesson 5,
“Auditing SQL Server Instances.” The questions are also available on the companion CD if you
prefer to review them in electronic form.


   NOTE   ANSWERS
   Answers to these questions and explanations of why each answer choice is right or wrong
   are located in the “Answers” section at the end of the book.


  1.   The Human Resources (HR) director at Contoso needs to ensure that only authorized
       users are accessing employee pay records. What do you need to implement to satisfy
       these auditing needs?
       A. Database audit specification
       B. A DDL trigger
       C. A DML trigger
       D. Server audit specification
  2.   The database administrators at Fabrikam have implemented log shipping for the
       Orders database. To ensure that log shipping cannot break, you need to prevent
       anyone from changing the recovery model of the database to Simple. How can you
       accomplish this task?
       A. A DDL trigger.
       B. A DML trigger.
       C. You can’t prevent the change of the recovery model.
       D. Server audit specification.




Lesson 6: Encrypting Data
            Data that must remain confidential, even from a user that has SELECT permission on a table,
            should be encrypted. In this lesson you learn about the encryption infrastructure provided by
            SQL Server 2008 and how to apply encryption to your data.


                    After this lesson, you will be able to:
                      Encrypt data using a hash algorithm
                      Encrypt data using symmetric keys
                      Encrypt data using asymmetric keys or certificates
                      Enable transparent database encryption

                   Estimated lesson time: 30 minutes



            Data Encryption
            Data that needs to remain confidential within the database (such as credit card numbers)
            should be encrypted. After it’s encrypted, the data cannot be read without having the proper
            credentials. In addition, encrypted columns cannot be used as search arguments or as columns
            within an index because each action would defeat the purpose of encrypting the data.
               Columns can be encrypted using a hash, passphrase, symmetric key, asymmetric key, or
            a certificate. Symmetric keys are commonly used since a symmetric key provides the best
            balance between securing data and performance. Asymmetric keys and certificates provide
            the strongest encryption and decryption method.



                   Preventing Access to Objects and Data


                    Securing a database is not an exercise in guaranteeing that objects can’t be
                    accessed. If an asset is valuable enough and enough time is available, an attacker
                    can always get to it. Security is an exercise in making a system cost more to break
                    into than the reward an attacker would gain from the attempt.

                    In addition, administrators have full control over a system for a reason: to provide
                    the authority to manage the system. Permissions are not checked for a user with
                   administrative access, so you can’t restrict the actions an administrator can
                   take. An administrator has access to the internal structures necessary to retrieve
                    any encryption keys that may be in use. Therefore, it is impossible to prevent
                    access by an administrator. SQL Server has a powerful, multilayered security
                   infrastructure; it is not a digital rights management system.




Securing objects also reduces the functionality available for those objects.
      In particular, an encrypted column can’t be indexed, and you can’t search on
      the contents of an encrypted column. One of the dumbest articles I’ve found
      published explained how you could search on an encrypted column. The solution
      to the problem involved placing a column in the table that contained the
      unencrypted data so that you could search on the column while also retaining the
      encrypted column. If you are going to store the data in an unencrypted format,
      it is pointless to encrypt the data as well, especially within the same table. In
      addition, as soon as you allow a user to search on encrypted data, you have just
      enabled an attacker to use very simple dictionary attacks to reverse-engineer your
      encrypted information.




Hash Algorithms
Encryption algorithms are either one-way or two-way. Two-way algorithms allow you to
encrypt and decrypt data; one-way algorithms, such as hash algorithms, only encrypt
data, without any ability to decrypt.


   IMPORTANT   TRANSMITTING AND STORING PASSWORDS
   It is a common misconception that passwords are sent to SQL Server in plaintext and that
   SQL Server decrypts the password stored to verify if the submitted password matches. SQL
   Server uses an MD5 hash to handle passwords. When a password is specified for an object,
   SQL Server applies an MD5 hash and stores the hash value. When you specify a password to
   access an object, the password is hashed using the same MD5 hash, the hashed password
   is transmitted in a secure channel, and the hash value transmitted is compared to the hash
   value stored. Even an administrator who is running a trace cannot access the password.


   SQL Server allows you to specify five different hash algorithms—SHA, SHA1, MD2, MD4,
and MD5. MD5 is the algorithm of choice because it provides stronger encryption than the
other algorithms. Hash algorithms are also platform-agnostic. You could hash a value within
PHP on a Linux system and receive the same value as if you hashed the same value within SQL
Server, so long as you used the same algorithm.
   Hash algorithms are vulnerable to brute force attacks. If the range of values that you
are seeking to encrypt is small, an attacker can easily generate all the possible hashes for
the range of possible values. After generating these hashes, the attacker needs to compare
the hash values to find a match and thus reverse-engineer your data. For example, birth
dates, salaries, and credit card numbers would not be good choices to encrypt using a hash
algorithm.




Salting a Hash
            So long as the range of possible values is small, a hash algorithm is very easy to defeat with a
            brute force attack. However, you can increase the complexity for an attacker dramatically by
            implementing an encryption technique called salting.
                A salt is a string of one or more characters that is added to the value before hashing.
             Even adding a single character can defeat most attacks, so long as you are adding a salt that
            actually increases the complexity. For example, you could append a zero to the end of a salary,
            but you would not have increased the complexity nor made it more difficult for an attacker
            to break with a brute force attack. However, if you were to use a single letter as the salt value,
            you have made the brute force attack significantly more difficult. Even if the attacker knew
            you were adding only a single letter, the English language provides 52 additional possibilities
            (uppercase and lowercase letters). In addition, you could have added the character at either
            the beginning or end, turning a simple problem into 104 different possibilities for each
            possible salary value. If you account for the fact that you could have inserted the letter
            anywhere within the salary, the range of possibilities for each salary value would require more
            effort than just about any attacker would be willing to expend.
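                As a sketch of the technique, the following salts a value before hashing (the values are
             illustrative; a real implementation would store a per-row salt alongside the hash):

             DECLARE @Salary VARCHAR(10) = '85000'
             DECLARE @Salt CHAR(1) = 'k'
             --Hash the salted value; an attacker must now also guess the salt
             --and its position within the value
             SELECT HashBytes('MD5', @Salt + @Salary)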


            Symmetric Keys
            Symmetric keys utilize a single key for both encryption and decryption. Because only a single
            key is needed to encrypt and decrypt data, symmetric key encryption is not as strong as
            asymmetric key or certificate-based encryption. However, symmetric keys provide the best
            possible performance for routine use of encrypted data.


            Certificates and Asymmetric Keys
            Certificates and asymmetric keys are based on the X.509 standard and are essentially equivalent
            in their application. Asymmetric keys are generated by a key server within an organization and
            cannot be backed up or moved from one system to another. Certificates can be backed up to
            and restored from a file, allowing you to move databases that are encrypted while being able to
            re-create the certificate to access your data.


            Transparent Data Encryption
            Passphrases and encryption keys can be used by an application to encrypt data deliberately.
            However, to use the data, you must apply special routines to decrypt data. Although
            encrypting selective data is possible and manageable, encrypting the entire contents of a
            database is generally prohibitive.
               Unless the data is encrypted, an attacker can read data directly from the database files on
            disk. Although the information is not easy to read, data is stored on disk in a plaintext format
            that can be viewed within any text editor.
               To prevent the theft of data as it resides on disk or within a backup, SQL Server 2008
            introduces Transparent Data Encryption (TDE). TDE provides real-time encryption and


decryption services to ensure that data within the files and backups is encrypted. In addition,
SQL Server transparently encrypts and decrypts the data so that applications do not have to
be recoded to take advantage of the encryption.



       REAL WORLD
       Michael Hotek



        We’ve all seen the news articles over the last several years regarding data
        thefts from a variety of organizations. As is human nature, we all assume that
        such a theft could not possibly happen to us.

       I was recently working with a major bank that was struggling with the implementation
       of increased levels of security. They had encrypted sensitive data within columns.
       However, under increasing regulatory scrutiny, a third-party audit identified the
       backups as a weak point in the security implementation, even though sensitive data
       was encrypted. The database still contained confidential data and the table structures
       also provided useful information for an attacker. The business decided that the entire
       contents of the backup needed to be encrypted.

       After a three-month evaluation period, they had narrowed the list down to four
       vendors who had solutions to meet their needs. Unfortunately, only one vendor could
       encrypt the data before it was written to disk, leaving the backup still vulnerable.
       In addition, the company was looking at a very large software expenditure to
       purchase the necessary licenses. Because the company’s SQL Server licenses were
       under software assurance, we upgraded the databases to SQL Server 2008 over the
       weekend and implemented transparent data encryption. Not only are the backups
       now encrypted, but the data and log files are as well.

       The company saved a significant amount of money in the process. Three weeks
       after implementation, a backup tape went missing. The tape was eventually found
       and determined to simply have been mislabeled. However, had an actual data theft
       occurred, the company would still have been protected.


   TDE works by using an encryption key stored within the database boot record. The TDE
key is encrypted by using a certificate within the master database. In the event that an
attacker steals your data or log files, or more likely a backup of your database, the contents of
the database can’t be accessed without the certificate stored within the master database.
   The process of implementing TDE on a database is as follows:
  1.   Create a database master key in the master database.
  2.   Create a certificate in the master database.




3.   Create a database encryption key in the target database using the certificate in the
                   master database.
              4.   Alter the database and enable encryption.
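                A minimal sketch of these four steps, assuming the AdventureWorks database and a
             certificate name (TDECert) chosen for illustration:

             USE master
             GO
             CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<EnterStrongPasswordHere>'
             GO
             CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE certificate'
             GO
             USE AdventureWorks
             GO
             CREATE DATABASE ENCRYPTION KEY
                  WITH ALGORITHM = AES_128
                  ENCRYPTION BY SERVER CERTIFICATE TDECert
             GO
             ALTER DATABASE AdventureWorks SET ENCRYPTION ON
             GO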


               EXAM TIP
               You must back up the certificate used for TDE and store the backup in a safe location. After
               you encrypt it, you cannot access your data without the certificate.



            Encryption Key Management
            Although SQL Server provides a variety of encryption methods, each method must be enabled
            and managed within an instance. SQL Server 2008 provides the capability through Extensible
            Key Management (EKM) to integrate with enterprise key management systems. Keys can be
            maintained in a central location within an enterprise and exported for use within SQL Server. By
            registering a key management provider to SQL Server, an instance can take advantage of all the
            advanced features of hardware and software key management solutions, such as key rotation.


                    Quick Check
                    1. What object is required to implement TDE?

                   2. What do you need to do to a hash algorithm to increase the complexity when
                      the range of possible encryption values is small?

                   Quick Check Answers
                    1. You must create a certificate in the master database that is used to encrypt the
                      database encryption key.

                   2. If the range of possible values to encrypt is small, you need to salt the hash value
                      in order to defeat brute force attacks.




              PRACTICE    Encrypting Data

            In the following practices, you apply multiple forms of encryption keys to encrypt data for an
            application, as well as apply TDE to the AdventureWorks database.

             PRACTICE 1    Hashing Data
             In this practice, you compare hash algorithms for encrypting data.
              1.   Execute the following code and compare the results for each hash algorithm:

                   DECLARE @Hash varchar(100)
                   SELECT @Hash = 'Encrypted Text'




SELECT HashBytes('MD5', @Hash)
       SELECT @Hash = 'Encrypted Text'
       SELECT HashBytes('SHA', @Hash)

  2.   Execute the following code and note that the hash algorithm is case-sensitive:

       DECLARE @Hash varchar(100)
       SELECT @Hash = 'encrypted text'
       SELECT HashBytes('SHA1', @Hash)
       SELECT @Hash = 'ENCRYPTED TEXT'
       SELECT HashBytes('SHA1', @Hash)


PRACTICE 2    Encrypting Data with a Passphrase
In this practice, you use a passphrase to encrypt data.
  1.   Execute the following code and compare the results of the passphrase encryption:

       DECLARE @EncryptedText     VARBINARY(80)
       SELECT @EncryptedText =
            EncryptByPassphrase('<EnterStrongPasswordHere>','Encrypted Text')
       SELECT @EncryptedText,
            CAST(DecryptByPassPhrase('<EnterStrongPasswordHere>',@EncryptedText)
            AS VARCHAR(MAX))
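
   EncryptByPassphrase also accepts an optional authenticator that binds the ciphertext to an
additional value, such as the primary key of the row; decryption returns NULL unless the same
authenticator is supplied. A sketch (the authenticator value 42 is illustrative):

        DECLARE @EncryptedText     VARBINARY(80)
        SELECT @EncryptedText =
             EncryptByPassphrase('<EnterStrongPasswordHere>','Encrypted Text',
                  1, CAST(42 AS sysname))
        -- Succeeds only when the same authenticator is supplied
        SELECT CAST(DecryptByPassphrase('<EnterStrongPasswordHere>',@EncryptedText,
                  1, CAST(42 AS sysname)) AS VARCHAR(MAX))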


PRACTICE 3      Encrypting Data with a Symmetric Key
In this practice, you create a symmetric key to encrypt data.
  1.   Execute the following code in the AdventureWorks database to create a symmetric key:

       CREATE SYMMETRIC KEY TestSymmetricKey WITH ALGORITHM = RC4
            ENCRYPTION BY PASSWORD = '<EnterStrongPasswordHere>'


       SELECT * FROM sys.symmetric_keys

  2.   Execute the following code to open the symmetric key:

       OPEN SYMMETRIC KEY TestSymmetricKey
            DECRYPTION BY PASSWORD = '<EnterStrongPasswordHere>'

  3.   Execute the following code to view the data encrypted with the symmetric key:

       DECLARE @EncryptedText     VARBINARY(80)
       SELECT @EncryptedText =
            EncryptByKey(Key_GUID('TestSymmetricKey'),'Encrypted Text')
       SELECT @EncryptedText, CAST(DecryptByKey(@EncryptedText) AS VARCHAR(30))

  4.   Execute the following code to close the symmetric key:

       CLOSE SYMMETRIC KEY TestSymmetricKey
       GO
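
   Note that DecryptByKey locates the correct key from metadata embedded in the ciphertext,
but only if that key is open in your session. Once the key is closed, decryption returns NULL
rather than raising an error, as this quick sketch shows:

        DECLARE @EncryptedText     VARBINARY(80)
        OPEN SYMMETRIC KEY TestSymmetricKey
             DECRYPTION BY PASSWORD = '<EnterStrongPasswordHere>'
        SELECT @EncryptedText = EncryptByKey(Key_GUID('TestSymmetricKey'),'Encrypted Text')
        CLOSE SYMMETRIC KEY TestSymmetricKey
        -- Returns NULL because the key is no longer open
        SELECT CAST(DecryptByKey(@EncryptedText) AS VARCHAR(30))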




PRACTICE 4        Encrypting Data with a Certificate
            In this practice, you create and use a certificate to encrypt data so that users cannot view data
            they do not have permission to access.
              1.   Execute the following code to create a test table, two users, and permissions:
                   CREATE TABLE dbo.CertificateEncryption
                   (ID           INT               IDENTITY(1,1),
                   SalesRep      VARCHAR(30)       NOT NULL,
                   SalesLead     VARBINARY(500)    NOT NULL)
                   GO


                   CREATE USER SalesRep1 WITHOUT LOGIN
                   GO


                   CREATE USER SalesRep2 WITHOUT LOGIN
                   GO


                   GRANT SELECT, INSERT ON dbo.CertificateEncryption TO SalesRep1
                   GRANT SELECT, INSERT ON dbo.CertificateEncryption TO SalesRep2
                   GO

              2.   Create a certificate for each user as follows:

                   CREATE CERTIFICATE SalesRep1Cert AUTHORIZATION SalesRep1
                         WITH SUBJECT = 'SalesRep 1 certificate'
                   GO


                   CREATE CERTIFICATE SalesRep2Cert AUTHORIZATION SalesRep2
                         WITH SUBJECT = 'SalesRep 2 certificate'
                   GO


                   SELECT * FROM sys.certificates
                   GO

              3.   Insert data for each user as follows:

                   EXECUTE AS USER='SalesRep1'
                   GO
                   INSERT INTO dbo.CertificateEncryption
                   (SalesRep, SalesLead)
                   VALUES('SalesRep1',EncryptByCert(Cert_ID('SalesRep1Cert'), 'Fabrikam'))
                   REVERT
                   GO


                   EXECUTE AS USER='SalesRep2'
                   GO
                   INSERT INTO dbo.CertificateEncryption




       (SalesRep, SalesLead)
       VALUES('SalesRep2',EncryptByCert(Cert_ID('SalesRep2Cert'), 'Contoso'))
       REVERT
       GO

  4.   Review the contents of the table directly, and then as each user. Note that each
       user can decrypt only the rows encrypted with his or her own certificate; the
       other rows return NULL:

       SELECT ID, SalesRep, SalesLead
       FROM dbo.CertificateEncryption
       GO


       EXECUTE AS USER='SalesRep1'
       GO
       SELECT ID, SalesRep, SalesLead,
            CAST(DecryptByCert(Cert_Id('SalesRep1Cert'), SalesLead)
                AS VARCHAR(MAX))
       FROM dbo.CertificateEncryption
       REVERT
       GO


       EXECUTE AS USER='SalesRep2'
       GO
       SELECT ID, SalesRep, SalesLead,
            CAST(DecryptByCert(Cert_Id('SalesRep2Cert'), SalesLead)
                AS VARCHAR(MAX))
       FROM dbo.CertificateEncryption
       REVERT
       GO


PRACTICE 5      Implementing TDE
In this practice, you implement TDE for the AdventureWorks database.
  1.   Create a master key and certificate in the master database as follows:

       USE master
       GO
       CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<EnterStrongPasswordHere>'
       GO
       CREATE CERTIFICATE ServerCert WITH SUBJECT = 'My Server Cert for TDE'
       GO

  2.   Back up the certificate and private key to a file to ensure recoverability as follows:

        BACKUP CERTIFICATE ServerCert
        TO FILE = 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Backup\servercert.cer'
        WITH PRIVATE KEY (
             FILE = 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Backup\servercert.key',
             ENCRYPTION BY PASSWORD = '<EnterStrongPasswordHere>')




  3.   Create a database encryption key for the AdventureWorks database as follows:

                   USE AdventureWorks
                   GO


                   CREATE DATABASE ENCRYPTION KEY
                   WITH ALGORITHM = AES_128
                   ENCRYPTION BY SERVER CERTIFICATE ServerCert
                   GO

              4.   Enable encryption for the AdventureWorks database:

                   ALTER DATABASE AdventureWorks
                   SET ENCRYPTION ON
                   GO
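
   Encryption proceeds in the background after the ALTER DATABASE statement completes. You can
verify progress by querying the sys.dm_database_encryption_keys DMV; note that tempdb is
encrypted automatically as soon as any database on the instance enables TDE:

        SELECT DB_NAME(database_id) AS database_name, encryption_state
        FROM sys.dm_database_encryption_keys
        -- encryption_state: 2 = encryption in progress, 3 = encrypted
        GO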



            Lesson Summary
                   Data can be encrypted within tables using a hash algorithm, a passphrase, a symmetric
                   key, an asymmetric key, or a certificate.
                   A hash algorithm should be used with a salt value unless the range of values being
                   encrypted is large enough to defeat a brute force attack.
                   TDE is used to encrypt “data at rest.” The contents of the data and transaction log,
                   along with any backups, are encrypted by the engine.
                   If you implement TDE, make certain that you have a backup of the certificate along
                   with the private key; otherwise, you will not be able to restore a backup.


            Lesson Review
            The following questions are intended to reinforce key information presented in Lesson 6,
            “Encrypting Data.” The questions are also available on the companion CD if you prefer to
            review them in electronic form.


               NOTE     ANSWERS
               Answers to these questions and explanations of why each answer choice is right or wrong
               are located in the “Answers” section at the end of the book.


              1.   The DBAs at Woodgrove Bank manage several sensitive databases containing credit
                   card and customer information. They need to encrypt the entire contents of the
                   database so that an attacker cannot read information off the disk. How can they meet
                   their requirement with the least amount of effort?
                   A. Create a certificate in the database that is used to encrypt the data.
                   B. Create a database encryption key and enable the database for encryption.




     C. Create a symmetric key in the database that is used to encrypt the data.
     D. Create an asymmetric key in the database that is used to encrypt the data.
2.   The DBAs at Woodgrove Bank manage several sensitive databases containing credit
     card and customer information. Due to recent data thefts at other banks that have
     made headlines, the business wants to ensure that all data within backups is encrypted.
     How can they accomplish the encryption requirement without needing to change
     applications?
     A. Create a certificate in the database that is used to encrypt the data.
     B. Create a symmetric key in the database that is used to encrypt the data.
     C. Create a database encryption key and enable the database for encryption.
     D. Create an asymmetric key in the database that is used to encrypt the data.




Chapter Review
          To practice and reinforce the skills you learned in this chapter further, you can perform the
          following tasks:
                   Review the chapter summary.
                   Review the list of key terms introduced in this chapter.
                   Complete the case scenario. This scenario sets up a real-world situation involving the
                   topics of this chapter and asks you to create solutions.
                   Complete the suggested practices.
                   Take a practice test.


          Chapter Summary
                   Endpoints provide the first layer of security within SQL Server. By providing a barrier
                   that is very similar to a firewall, endpoints ensure that only valid connections with valid
                   traffic can gain access to your SQL Server instance.
                   Endpoints can be created for either TCP or HTTP protocols. TCP endpoints can have
                   payloads for TSQL, DATABASE_MIRRORING, or SERVICE_BROKER. HTTP endpoints can
                   have a payload of SOAP.
                   HTTP endpoints enable stored procedures and functions to be exposed and consumed
                   as a Web service; in effect, this enables your SQL Server to act as a registered Web service.


          Key Terms
          Do you know what these key terms mean? You can check your answers by looking up the
          terms in the glossary at the end of the book.
                   Asymmetric key
                   Certificate
                   Database audit specification
                   Database master key
                   DDL trigger
                   Fixed database role
                   Fixed server role
                   Hash algorithm
                   Impersonation
                   Loginless user
                   Ownership chain
                   Principal



       Salting
      Securable
      Server audit
      Server audit specification
      Service master key
      Signature
      Symmetric key
      TCP endpoint


Case Scenario: Designing SQL Server Security
In the following case scenario, you apply what you’ve learned in this chapter. You can find
answers to these questions in the “Answers” section at the end of this book.

Case Scenario: Securing Coho Vineyard

BACKGROUND

Company Overview
Coho Vineyard was founded in 1965 as a local, family-run winery. Due to the award-winning
wines produced over the last several decades, Coho Vineyard has experienced significant
growth. Today, the company owns 12 wineries spread across California and Washington State.
Coho employs 400 people, 74 of whom work in the central office that houses servers critical
to the business operations.

Planned Changes
In 2008, Coho Vineyard finally integrated the operations of all 12 vineyards, providing a
centralized database platform that is accessed from a variety of Web-based applications.
An audit has determined that Coho Vineyard essentially does not have any security
implemented within their databases, relying instead on the applications having complete
access via the sa account.
   The sa account is to be used only in an emergency by members of the DBA team. At all
other times, the DBAs are expected to use their own Windows credentials when accessing
an instance. The tables containing customer information (especially credit card numbers) are
required to be encrypted to pass an upcoming Payment Card Industry (PCI) audit. All actions
performed by a sysadmin or database owner are required to be audited and logged to the
Windows Application Event Log.

EXISTING DATA ENVIRONMENT
Databases
Table 11-4 shows the databases at Coho Vineyards.




TABLE 11-4 Coho Vineyard Databases

           DATABASE                       SIZE

           Customer                       180 MB
           Accounting                     500 MB
           HR                             100 MB
           Inventory                      250 MB
           Promotions                     80 MB


          Database Servers
          A single server named DB1 contains all the databases at the central office. DB1 is running SQL
          Server 2008 Enterprise edition on Windows Server 2003 Enterprise edition.

          Business Requirements
          GENERAL REQUIREMENTS
          The Web-based applications need to be able to read and write data to the corresponding
          database. The HR database needs to be restricted to the members of the HR department, and
          any access to the HR database from an administrative user must be audited.

          Technical Requirements
          SECURITY
          All traffic to and from DB1 must be encrypted. The SQL Server configuration must minimize
          the server’s attack surface while still meeting all the business and technical requirements.
             All client computers at the central office must be updated automatically with
           Microsoft Update.
            1.     How do you configure the instance to provide access to the applications without
                   giving the applications sysadmin authority?
            2.     How do you ensure that credit card numbers are encrypted within the database?
            3.     What would you implement to audit access to the HR database?



          Suggested Practices
          To help you master the exam objectives presented in this chapter, complete the following
          practice tasks.


          Manage Logins and Server Roles
                   Practice 1   Create a SQL Server login and a Windows login for an instance.
                   Practice 2   Disable the login to prevent access to the instance.

Manage Users and Database Roles
      Practice 1    Add a user to a database mapped to a login.
      Practice 2    Create a database role and add a user to the role.


Manage SQL Server Instance Permissions
      Practice     Add a login to the appropriate fixed server role for the permissions you want
      to grant.


Manage Database Permissions
      Practice 1    Add a user to a fixed database role.
      Practice 2    Grant permissions at a database scope and view the effects.


Manage Schema Permissions and Object Permissions
      Practice 1    Grant permissions at a schema scope and view the effects.
      Practice 2    Revoke permissions at an object level and view the effects.


Audit SQL Server Instances
      Practice 1    Create a server audit object.
      Practice 2 Add server audit and database audit specifications to the server audit and
      test the auditing.


Manage Transparent Data Encryption
      Practice 1    Configure a database for TDE.
      Practice 2 Restore the database to another instance to practice restoring the
      encryption keys that are necessary to access the backup.


Take a Practice Test
The practice tests on this book’s companion CD offer many options. For example, you can test
yourself on just one exam objective, or you can test yourself on all the 70-432 certification
exam content. You can set up the test so that it closely simulates the experience of taking
a certification exam, or you can set it up in study mode so that you can look at the correct
answers and explanations after you answer each question.


  MORE INFO        PRACTICE TESTS
  For details about all the practice test options available, see the section “How to Use the
  Practice Tests,” in the Introduction to this book.



CHAPTER 12


Monitoring Microsoft
SQL Server
In a perfect world, you would be able to install Microsoft SQL Server and deploy databases
without incident, applications would have perfect performance, and nothing would ever
go wrong. Unfortunately, we don't live in a perfect world. Hardware fails. Application
performance degrades. Transactions block each other. Changes to the environment cause
outages. In this chapter, you learn how to monitor your SQL Server environment and
diagnose problems.


Exam objectives in this chapter:
    Collect performance data by using System Monitor.
    Collect trace data by using SQL Server Profiler.
    Identify SQL Server service problems.
    Identify concurrency problems.
    Locate error information.

Lessons in this chapter:
    Lesson 1: Working with System Monitor
    Lesson 2: Working with the SQL Server Profiler
    Lesson 3: Diagnosing Database Failures
    Lesson 4: Diagnosing Service Failures
    Lesson 5: Diagnosing Hardware Failures
    Lesson 6: Resolving Blocking and Deadlocking Issues


Before You Begin
To complete the lessons in this chapter, you must have:
       SQL Server 2008 installed
       The AdventureWorks database installed within the instance


REAL WORLD
                   Michael Hotek



                   Over the years, development platforms and database servers have become more
                   sophisticated, loaded with features and graphical interfaces. The graphical
                   interfaces are supposed to allow applications to be developed in less time by
                   shielding the developer from all the low-level details. Being shielded from the details
                   has pros and cons. On the plus side, you don’t have to worry about learning all the
                   architectural details of a feature or how the feature interacts with hardware or other
                   components. On the negative side, you don’t have to worry about learning all the
                   architectural details of a feature or how the feature interacts with hardware or other
                   components.

                   Into this mix, the development tools have debuggers that too many people think
                   are the only answer to finding and fixing a problem. It is becoming increasingly
                   common that unless the debugger shows the precise line of code where an error
                   occurred as well as explaining what the exact error is, a developer cannot find the
                   problem.

                   This is a serious problem for a SQL Server environment. SQL Server interacts with
                   many hardware components and in many different ways. SQL Server also has many
                   interrelated internal components that affect each other. Into that mix, you
                   introduce your applications with the data structures, database code, and multiple
                   concurrent users. One poorly written query executed by a single person can cause
                   problems for other users that can cascade through multiple SQL Server and hardware
                   components. The real trick for a DBA is to be able to monitor a SQL Server, spot the
                   problem areas, and institute changes before applications are impacted. In those cases
                   where a DBA cannot avoid a problem, a methodical process coupled with knowledge
                   of all the low-level details between SQL Server and hardware is the only way to find
                   and fix an issue rapidly.




Lesson 1: Working with System Monitor
System Monitor, commonly referred to as PerfMon, is a Microsoft Windows utility that allows
you to capture statistical information about the hardware environment, operating system,
and any applications that expose properties and counters. In this lesson, you learn how to
use System Monitor to gather counters into counter logs, which can be used to troubleshoot
system and performance issues.


      After this lesson, you will be able to:
          Select performance counters
          Create a counter log

      Estimated lesson time: 20 minutes



System Monitor Overview
System Monitor uses a polling architecture to capture and log numeric data exposed by
applications. The applications are responsible for updating the counters which are exposed
to System Monitor. An administrator chooses the counters to capture for analysis and the
interval to gather data. System Monitor then uses the definition supplied for the counters and
polling interval to gather only the counters desired on the interval defined.


   NOTE    FINDING SYSTEM MONITOR
   With every new version of Windows, it seems that you have to hunt for the tools that you
   use every day. They are either moved to a different location, or the name is changed. System
   Monitor is no different. Everyone that I have ever dealt with in this industry calls this utility
   PerfMon because you start the program with a file called PerfMon.exe. By the time Microsoft
   released Windows XP and Windows Server 2003, it had been called Performance, Performance
   Monitor, and System Monitor. Windows Vista and Windows Server 2008 renamed it yet again,
   as will the upcoming Windows 7. Regardless of the name within your particular version of
   Windows, it is officially called System Monitor by Microsoft, you are still using PerfMon.exe,
   and the rest of the world simply calls it PerfMon.


   Because the only data allowed by System Monitor is numeric and processes are not being
executed to calculate the values as data is gathered, the overhead for System Monitor is almost
nonexistent, regardless of the number of counters being captured. Although you want to minimize
the number of counters being captured to avoid being overwhelmed with data, capturing one
counter or 100 counters does not affect system performance.
   Counters are organized into a three-level hierarchy: object, counter, and instance.
An object is a component, application, or subsystem within an application. For example,
Processor, Network Interface, and SQLServer:Databases are all objects. One or more counters

are specified within an object, and every object has to have at least one counter. For example,
            within the SQLServer:Databases object, you have counters for active transactions, data file size,
            transactions/sec, and the percent of the transaction log space currently in use. A counter can
            have zero or more instances. For example, the System object has a Processor Queue Length
            counter that does not have any instances, whereas the counter that captures the percentage
            of the log space used within the SQLServer:Databases object has an instance for each
            database as well as a cumulative instance named _Total.
               When you define counters to capture, you can specify criteria at any level within the
            hierarchy. If you decide to capture an entire object, System Monitor gathers data for every
            counter within the object, as well as for each instance available within the counter. If you do
            not want to capture everything, you can alternatively capture data for a subset of counters
            within an object as well as for a subset of instances within a counter.
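
   Although System Monitor polls counters from outside the process, SQL Server also exposes its
own counter objects through the sys.dm_os_performance_counters dynamic management view, which
is convenient for inspecting a single counter instance ad hoc. For example (the database name
is illustrative):

        SELECT object_name, counter_name, instance_name, cntr_value
        FROM sys.dm_os_performance_counters
        WHERE counter_name = 'Percent Log Used'
             AND instance_name = 'AdventureWorks'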


               EXAM TIP
               For the exam, you need to know what various performance counters are for, as well as how to
               select the appropriate counter(s) to capture to diagnose problems within an instance.



            Capturing Counter Logs
            When first using System Monitor, most people start the program, add objects and counters,
            and view the results in the graphical display. However, the graphical display does not allow
            you to save the information to a file for later analysis and only provides about a two-minute
            snapshot of the counters on a system.
               To capture data for analysis, you need to configure a counter log, which runs in the
            background when no one is logged on to the machine. Depending upon your operating
            system, you configure a counter log from a variety of places. Figure 12-1 is from Windows XP
            and Windows Server 2003.




            FIGURE 12-1 System Monitor counter logs

In System Monitor, by right-clicking Counter Logs, selecting New Log Settings, and giving
the counter log a name, you see the counter log definition window shown in Figure 12-2.




FIGURE 12-2 Defining counter log properties

   Clicking Add Objects allows you to specify the objects that you want to capture. Add
Counters allows you to specify individual counters within an object as well as individual
instances within a counter.

   BEST PRACTICES      CAPTURING COUNTER LOGS
   Viewing counters in the graphical interface provided by System Monitor is useful only for
   looking at the immediate state of a system. It is much more useful to capture counters to
   a log to be used for later analysis. When setting up counter logs, it is recommended that
   you select counter objects instead of individual counters to ensure you have captured
   everything necessary for analysis.

   The sample interval determines how frequently Windows gathers data for the counters
specified in the log. The default setting is every 15 seconds and is the most common setting
for routine analysis, establishing a baseline, and for long-term trend analysis. If you need to
analyze a problem that is recurring but has a very short duration, you want to decrease the
polling interval.

   BEST PRACTICES      SPECIFYING AN ACCOUNT FOR THE COUNTER LOG
   At the bottom of the counter log definition screen, you can specify the security credentials
   that the counter log runs under. You should always configure a counter log to run under
   a specific account with sufficient permissions to access the applications for which you are
   gathering counters. Failure to define a specific account is the most common cause of a
   counter log failing to start. The next most common causes are an expired password,
   a locked-out account, or a disabled account.

On the Log Files tab, shown in Figure 12-3, you can define the name and format of the
            counter log.




            FIGURE 12-3 Defining log file properties




            Performance Counters
            Although you can potentially capture thousands of counters and tens of thousands of counter
            instances, there is a small set of common counters that can be used to troubleshoot a variety
            of problems.
               An individual counter is generally used in conjunction with other counters plus additional
            information about your environment to diagnose a problem. Individual counters and groups
            of counters can direct you to an area of the system that might need further investigation
            but does not directly indicate a problem on the system. However, three counters indicate a
            system problem on their own:
                   System:Processor Queue Length
                   Network Interface:Output Queue Length
                   Physical Disk:Avg. Disk Queue Length
               When the processor, network interface, or disk becomes overwhelmed with activity, processes
            need to wait for resources to be freed up. Each thread that has to wait for a resource increments
            the corresponding queue length counter. For example, a processor queue length of eight indicates
            that there is insufficient processor capacity on the machine and that eight requests are waiting in
            a queue for a processor core to become available. Although the queue length can be greater than
            zero for very short durations, having any queue length greater than zero on a routine or extended
            basis means that you have a hardware bottleneck that affects application performance.


MORE INFO     DIAGNOSING PROBLEMS WITH PERFORMANCE COUNTERS
   Lesson 3, “Diagnosing Database Failures,” Lesson 5, “Diagnosing Hardware Failures,” and
   Lesson 6, “Resolving Blocking and Deadlocking Issues,” provide additional detail on dozens
   of counters that are used to diagnose a variety of SQL Server, hardware, and Windows issues.




        Quick Check
        1. For what items can you capture data with System Monitor?

       2. What types of data can System Monitor capture?

       3. What are the three counters that, by themselves, indicate a system problem?

       Quick Check Answers
        1. You can capture objects, counters, and counter instances.

       2. System Monitor captures numeric data for performance counters that are
          defined for hardware or software components.

       3. System:Processor Queue Length, Network Interface:Output Queue Length, and
          Physical Disk:Avg. Disk Queue Length.




 PR ACTICE       Creating a Counter Log

In this practice, you create a counter log to use as a performance baseline and to
troubleshoot a variety of system errors.
  1.   Start System Monitor and create a new counter log.
  2.   On the General tab of the dialog box for the new counter log, click Add Objects.
       The Add Objects dialog box appears, as shown here.




3.   Add the following objects: Memory, Network Interface, PhysicalDisk, Processor,
                    SQLServer:BufferManager, SQLServer:Databases, SQLServer:Exec Statistics,
                    SQLServer:General Statistics, SQLServer:Latches, SQLServer:Locks, SQLServer:Memory
                    Manager, SQLServer:PlanCache, SQLServer:SQL Statistics, and System. Click Close.




              4.    Specify an interval of 15 seconds as well as an account under which to run the
                    counter log.
               5.   Click the Log Files tab, as shown here, and inspect the default settings.




6.   Click OK to save the counter log. Start the counter log by right-clicking the log and
       selecting Start.




Lesson Summary
       System Monitor is used to capture numeric statistics about hardware and software
       components.
       Counters are organized into a three-level hierarchy: counter object, counter, and
       counter instance.
       A counter object must have at least one counter.
       A counter can have zero or more instances.
       You capture counter logs with System Monitor to perform analysis.


Lesson Review
The following question is intended to reinforce key information presented in Lesson 1,
“Working with System Monitor.” The question is also available on the companion CD if you
prefer to review it in electronic form.


  NOTE     ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or
   incorrect are located in the “Answers” section at the end of the book.




1.   What does the System:Processor Queue Length counter measure?
                   A. The number of system requests waiting for a processor
                   B. The number of SQL Server requests waiting for a processor
                   C. The number of processors actively performing work
                   D. The amount of time that a processor is in use




Lesson 2: Working with the SQL Server Profiler
SQL Server is built upon an event subsystem, SQL Trace, which is exposed for administrators
to capture information associated with more than 200 events that can occur within an instance. SQL
Server Profiler is the graphical tool that provides the most common interface to the SQL Trace
subsystem. You can use the data captured to monitor an instance for errors or concurrency
issues. You can also use Profiler to capture data that is used to optimize the performance of
queries being executed against the environment. In this lesson, you learn how to create a
variety of Profiler traces that can be used to diagnose errors and improve the performance
and stability of your applications.


      After this lesson, you will be able to:
          Create a Profiler trace
          Select trace events
          Filter traces

      Estimated lesson time: 20 minutes



Defining a Trace
Although you can define a trace using Transact-SQL (T-SQL), it is more common to use SQL
Server Profiler to define the trace. You can start SQL Server Profiler from the SQL Server 2008
Performance Tools menu. After Profiler starts, you select File, New Trace, and then connect to
an instance to begin configuring a trace, as shown in Figure 12-4.




FIGURE 12-4 Creating a new trace

You can specify several general properties for a trace, such as the name, template, stop
            time, and whether to save the trace data. Every trace is required to have a name and at least
            one event.
               Profiler ships with several templates that have events, data columns, and filters already
            defined, so you can use them as a starting point. Table 12-1 lists the general purpose of each
            template.

            TABLE 12-1 Profiler Trace Templates

              TEMPLATE             PURPOSE

              Blank                An empty trace; allows you to create an entire trace from scratch.
              SP_Counts            Captures each stored procedure executed so that you can determine
                                   how many times each procedure is executed.
              Standard             The most common template to start with; captures stored procedure
                                   and ad hoc SQL being executed along with performance statistics for
                                   each procedure and batch. Every login and logout is also captured.
              TSQL                 Captures a list of all the stored procedures and ad hoc SQL batches that
                                   are executed, but does not include any performance statistics.
              TSQL_Duration        Captures the duration of every stored procedure and ad hoc SQL batch
                                   that is executed.
              TSQL_Grouped         Captures every login and logout along with the stored procedures and
                                   ad hoc SQL batches that are executed. Includes information to identify
                                   the application and user executing the request, but does not include
                                   any performance data.
              TSQL_Locks           Captures blocking and deadlock information such as blocked processes,
                                   deadlock chains, deadlock graphs, lock escalation, and lock timeouts.
                                   This template also captures every stored procedure, each command
                                   within a stored procedure, and every ad hoc SQL request.
              TSQL_Replay          Captures the stored procedures and ad hoc SQL batches executed
                                   against the instance in a format that allows you to replay the trace
                                   against a test system. This template is commonly used to perform load
                                   and regression tests.
              TSQL_SPs             Captures performance data for all ad hoc SQL batches, stored
                                   procedures, and each statement inside a stored procedure. Every login
                                   and logout is also captured.
              Tuning               Captures basic performance information for ad hoc SQL batches, stored
                                   procedures, and each statement inside a stored procedure.


               By default, when a trace is started using Profiler, all the events appear in a grid within the
            interface. You can additionally save the trace data to a table, a file, or both.




   CAUTION     RETURNING RESULTS INTO A GRID
   Many organizations install the client tools on the server when SQL Server is installed.
   Although the installation of tools provides utilities for querying and troubleshooting
   an instance from the server console, you have to account for the overhead of the tools.
   Profiler can capture a very large number of events in a short amount of time; when
   loaded into the grid within Profiler, these events can require a large amount of memory.
   Although grids can present information in an easy-to-understand format, a grid has much
   more overhead than a text-based format.


    If you save a trace to a file, you can specify an upper size limit on the trace file to keep the
file from growing out of control. In addition, you can enable a file rollover. If you enable file
rollover, after a trace file has reached the maximum size, the file is closed, and a new file is
opened. If you specify a maximum size without file rollover, Profiler stops capturing events
after the maximum file size has been reached.


   BEST PRACTICES       MAXIMUM TRACE FILE SIZE
   Setting the maximum file size to 50 megabytes (MB) provides a good trade-off for
   managing the size and number of trace files. A 50-MB file is small enough to copy or move
   quickly across any network while also containing a large enough set of events within a
   single file for analysis.


    When you specify a table to save the trace output to, Profiler creates a connection and
streams all events captured to the target table. If you configure a maximum number of rows,
Profiler stops capturing events once that maximum has been reached.


   BEST PRACTICES       LOGGING A TRACE TO A FILE
   Although SQL Server is optimized to handle large volumes of data changes, the SQL Trace
   application programming interface (API) can produce enough events to overwhelm even
   a server running SQL Server. You should never log trace events to the same instance for
   which you are capturing events. Because you can possibly overwhelm a server running SQL
   Server when you are logging events, the best logging solution is to log trace events to a
   file and then later import the file into the server for analysis.


   The trace stop time allows you to start a trace that runs for a given duration. After the stop
time has been reached, based on the server clock, the trace is stopped automatically.


   NOTE    CAPTURING ANALYSIS SERVICES EVENTS
   Profiler can also be used to capture events for Analysis Services.




Specifying Trace Events
            SQL Trace exposes more than 200 events that can be captured. The most important step in
            configuring a trace is selecting the set of events that you need to monitor for the various
            situations that occur in an operational environment.
               Events are classified into 21 event groups, some of which contain more than 40 events. The
            event groups available are listed in Table 12-2.

            TABLE 12-2 SQL Trace Event Groups

              EVENT GROUP            PURPOSE

              Broker                 13 events for Service Broker messages, queues, and conversations
              CLR                    1 event for the loading of a Common Language Runtime (CLR)
                                     assembly
              Cursors                7 events for the creation, access, and disposal of cursors
              Database               6 events for data/log file grow/shrink as well as Database Mirroring
                                     state changes
              Deprecation            2 events to notify when a deprecated feature is used within the
                                     instance
              Errors and             16 events for errors, warnings, and information messages being
              Warnings               logged, including events to detect suspect pages, blocked
                                     processes, and missing column statistics.
              Full Text              3 events to track the progress of a full text index crawl
              Locks                  9 events for lock acquisition, escalation, release, and deadlocks
              OLEDB                  5 events for distributed queries and remote stored procedure calls
              Objects                3 events that track when an object is created, altered, or dropped
              Performance            14 events that allow you to capture show plans, use of plan guides,
                                     and parallelism. This event group also allows you to capture full text
                                     queries.
              Progress Report        1 event for online index creation progress
              Query                  4 events to track the parameters, subscriptions, and templates for
              Notifications           query notifications
              Scans                  2 events to track when a table or index is scanned
              Security Audit         44 events to track the use of permissions, impersonation, changes
                                     to security objects, management actions taken on objects, start/
                                     stop of an instance, and backup/restore of a database.
              Server                 3 events for mounting a tape, change to the server memory, and
                                     closing a trace file





 Sessions               3 events for existing connections when the trace starts as well as
                        tracking the execution of logon triggers and resource governor
                        classifier functions
 Stored Procedures      12 events for the execution of a stored procedure, cache usage,
                        recompilation, and statements within a stored procedure
 Transactions           13 events for the begin, save, commit, and rollback of transactions
 TSQL                   9 events for the execution of ad hoc T-SQL or XQuery calls. Events for
                        an entire SQL batch as well as each statement within a batch
 User Configurable       10 events that you can configure with SQL Trace


   The most commonly used event groups are Locks, Performance, Security Audit, Stored
Procedures, and TSQL. The Stored Procedure and TSQL event groups are commonly
captured along with events from the Performance group to baseline and troubleshoot
query performance. The Security Audit event group is used to define auditing quickly
across a variety of security events, although the new audit specification feature provides
more secure and flexible auditing capabilities. Events from the Locks event group are
commonly used to troubleshoot concurrency issues.

   EXAM TIP
   You need to know which events are used to solve various problems, such as resolving
   deadlocks, diagnosing blocking, or troubleshooting stored procedure performance.


   Although most events return a small amount of data, some can have a significant payload
on a very busy instance. The events to be very careful with are:
        Performance | Showplan *
        Stored Procedures | SP:StmtCompleted
        Stored Procedures | SP:StmtStarting
        TSQL | StmtCompleted
        TSQL | StmtStarting
   This group of events should be included in a trace only in conjunction with a very restrictive
filter that limits the trace to a single object or query string.
    The Showplan Statistics Profile, Showplan Text, and Showplan XML events return varying
amounts of data depending upon the complexity of a query. Complex queries can return
a large query plan and functions and stored procedures can return multiple query plans to
cover the statements within the function or procedure. Showplan XML events return the most
data of all events in the Performance event group because not only is the showplan and
statistical information returned, but the information is formatted with all the XML tags for the
showplan XML schema.

The StmtCompleted and StmtStarting events produce the highest volume of events in
            any trace. Stored procedures and T-SQL batches can contain a large number of statements
            to be executed. You can capture performance data on the entire stored procedure with the
            RPC:Completed event and an entire batch with the SQL:BatchCompleted event. However,
            if you need to troubleshoot performance for an individual statement, you can use the
            StmtCompleted and StmtStarting events, which send event data back for every statement that
            is executed within a stored procedure or T-SQL batch.



            Selecting Data Columns
            After you have determined which events you want to capture, you then need to determine
            which columns of data to return. Although you could select all 64 possible data columns to
            return for a trace event, your trace data is more useful if you capture only the information
            necessary to your purposes. For example, you could return the transaction sequence number
            for an RPC:Completed event, but if all the stored procedures you are analyzing do not change
            any data, then the transaction sequence number consumes space without providing any value.
            Additionally, not all data columns are valid for all trace events; for example, Reads, Writes, CPU,
            and Duration are not valid for the RPC:Starting or SQL:BatchStarting events.
               When you are trying to baseline performance, you can capture the following events:
                   RPC:Starting
                   RPC:Completed
                   SQL:BatchStarting
                   SQL:BatchCompleted
               You will find that the *Starting events capture almost all the same information
            as the *Completed events. However, if you trace the *Completed events, you can also capture
            performance statistics with the Reads, Writes, CPU, and Duration columns. Therefore, it is very
            unlikely that you would trace the SQL:BatchStarting or RPC:Starting event because they cannot
            provide any performance data necessary for a baseline or to troubleshoot performance issues.
               You can capture the DatabaseID as well as the DatabaseName that corresponds to the event
            captured. However, because the DatabaseName is much more descriptive and user-friendly
            than the DatabaseID, you should leave the DatabaseID out of your trace definition.
                The ApplicationName, NTUserName, LoginName, ClientProcessID, SPID, HostName, LoginSID,
            NTDomainName, and SessionLoginName provide context for who is executing a command
            and where the command is originating from. The SessionLoginName always displays the name
            of the login that was used to connect to the SQL Server instance, whereas the LoginName
            column accounts for any EXECUTE AS statements and displays the current user context for the
            command. The ApplicationName is empty unless the application property of the connection
            string has been set when a connection attempt is made to the instance. The NTUserName
            and NTDomainName reflect the Windows account for the connection, regardless of whether
            the connection used a Windows or SQL Server login to connect. The HostName is particularly
            useful in finding rogue processes in an environment because it lists the name of the machine

that a command originated from. For example, you could use the HostName column to find
commands being executed from a development machine against a production instance due to
an incorrect configuration of an application pool.
   The StartTime and EndTime columns record the time boundaries for an event. The StartTime
and EndTime columns are especially useful when you need to correlate trace data with
information from other systems.
    The ObjectName, ServerName, and DatabaseName columns are useful for analysis
operations. For example, the ObjectName column for the RPC:Completed event lists the name
of the stored procedure executed so that you can easily locate all calls to a specific stored
procedure that might be causing problems in your environment. Because it is possible to save
trace data to a table, a common practice is to have a single instance within your environment
where you import all your traces. By including the ServerName and DatabaseName columns,
you can easily separate trace data between multiple instances/databases while still only needing
a single table for storage.



Applying Filters
To target trace data, you can add one or more filters to a trace. A filter is essentially a WHERE
clause that is applied to the event data returned by the SQL Trace API. Filters are defined on
data columns and allow you to specify multiple criteria to be applied, as shown in Figure 12-5.




FIGURE 12-5 Specifying trace filters


    Data columns that contain character data allow filters to be defined on a text string using
LIKE or NOT LIKE that can contain one or more wildcard characters. Time-based data columns
allow you to specify greater than or less than. Numeric-based data columns allow you to
specify equals, not equal to, greater than or equal, and less than or equal. Binary data columns
cannot be filtered.

Multiple filters for a single data column are treated as OR conditions. Filters across multiple
            data columns are treated as AND conditions.


            Managing Traces
            You can start, stop, and pause a trace. After a trace has been started, the SQL Trace API
            returns events that match the trace definition while discarding any events that do not match
            trace filter criteria. When a trace is stopped, all event collection terminates, and if the trace is
            subsequently started again, all previous trace data is cleared from the Profiler display. If you
            want to suspend data collection temporarily, you can pause a trace. Upon resumption of a
            trace, subsequent events are appended to the bottom of the Profiler display.
             SQL Server Profiler is an application that allows you to graphically define settings that are
             translated into stored procedure calls to create and manage traces. The trace modules that
            ship with SQL Server 2008 are listed in Table 12-3.

            TABLE 12-3 SQL Trace Modules

              MODULE                      PURPOSE

              sp_trace_create             A stored procedure that creates a new trace object. Equivalent
                                          to the definition on the General tab of the New Trace dialog in
                                          Profiler.
              sp_trace_generateevent      A stored procedure that allows you to define your own trace
                                          event.
              sp_trace_setevent           A stored procedure that adds a data column for an event to be
                                          captured by a trace. You need to call sp_trace_setevent once
                                          for each data column being captured for an event. Equivalent
                                          to the event and data column selection grid in the Events
                                          Selection tab of the New Trace dialog in Profiler.
              sp_trace_setfilter           A stored procedure that adds a filter to a trace. Equivalent to
                                          the Edit Filter dialog box in Profiler.
              sp_trace_setstatus          A stored procedure that starts, stops, and closes a trace.
                                          A status of 0 stops a trace. A status of 1 starts a trace. A status
                                          of 2 closes a trace and removes the trace definition from the
                                          instance.
              fn_trace_geteventinfo       A function that returns the events and data columns being
                                          captured by a trace.
              fn_trace_getfilterinfo       A function that returns the filters applied to a specified trace.
              fn_trace_getinfo            A function that returns status and property information about
                                          all traces defined in the instance.
              fn_trace_gettable           A function that reads one or more trace files and returns the
                                          contents as a result set. Commonly used to import trace files
                                          into a table.


   BEST PRACTICES      RUNNING TRACES AUTOMATICALLY
   Although you can use Profiler to gather quick traces against an instance, it is much more
   common to set up and manage traces using code. By running traces via code, you eliminate
   all the overhead of a graphical tool while also providing a means of unattended tracing
   through the use of jobs within SQL Server Agent.
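
   As a sketch of what such code looks like, the following defines a minimal server-side trace
   that writes 50-MB rollover files, captures the TextData and Duration columns for the
   SQL:BatchCompleted event (event ID 12), filters the trace to a single database, and then
   starts it; the file path and filter value are placeholders:

        DECLARE @TraceID int, @MaxFileSize bigint, @On bit
        SELECT @MaxFileSize = 50, @On = 1
        -- Option 2 = TRACE_FILE_ROLLOVER; the .trc extension is appended automatically
        EXEC sp_trace_create @TraceID OUTPUT, 2, N'C:\test\TK432Trace', @MaxFileSize
        EXEC sp_trace_setevent @TraceID, 12, 1, @On    -- SQL:BatchCompleted, TextData
        EXEC sp_trace_setevent @TraceID, 12, 13, @On   -- SQL:BatchCompleted, Duration
        EXEC sp_trace_setfilter @TraceID, 35, 0, 0, N'AdventureWorks'  -- DatabaseName equals
        EXEC sp_trace_setstatus @TraceID, 1            -- start the trace
        -- Later: EXEC sp_trace_setstatus @TraceID, 0 stops the trace, and
        -- EXEC sp_trace_setstatus @TraceID, 2 closes and removes the definition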




Correlating Performance and Monitoring Data
SQL Trace is used to capture information that occurs within a SQL Server instance. System
Monitor is used to capture performance counters that provide a picture of the hardware
resources and other components running on the server. SQL Server cannot run without access
to hardware resources. The state of hardware resources affects how well a server running SQL
Server functions. For example, a query might be running slowly, but Profiler tells you only how
slowly the query is running. However, by adding in performance counters, you might find that
queries are running slowly because you have insufficient processing resources.
   Although you might be able to diagnose an issue using only System Monitor or SQL Trace,
using the two sets of data together provides context to any analysis. The challenge is to bring
the two sets of data together in a coherent way that enables efficient analysis. You can save a
trace to a table, just as you can save a counter log to a table. After both data sets are saved to
a table, you can execute a variety of queries to correlate the information together.
    If you don’t want to write queries to correlate the data sets, you can perform this action
in a much simpler manner. Profiler allows you to view a trace file with a counter log. Not only
can you view the two data sets in the same screen, Profiler also keeps the two synchronized
together. As you scroll down a trace, an indicator shows the counter values at the point when
the query was executed.


   CAUTION     CORRELATING A COUNTER LOG TO A TRACE FILE
   You can correlate a counter log with a trace file only if you have captured the StartTime
   data column in the trace.




        Quick Check
       1. What are the three items that you define within a trace?

      2. Which events are commonly used to establish a performance baseline?

       Quick Check Answers
      1. You define events, data columns, and filters within a trace.

      2. The RPC:Completed and SQL:BatchCompleted events are used to establish a
          performance baseline.


PRACTICE        Creating Traces

            In this practice, you create traces to audit security, establish a performance baseline, and
            troubleshoot a deadlock. You also import a trace into a table.

            PRACTICE 1       Creating a security auditing trace
            In this practice, you configure a trace to audit security.
              1.    Start Profiler. Select File, New Trace, and connect to your instance.
              2.    Specify a trace name, template, and options to save to a file, as shown here.




               3.   Click the Events Selection tab.
              4.    Select the Show All Events and Show All Columns check boxes.
               5.   Right-click the Security Audit event category and then choose Select Event Category,
                    as shown here.




              6.    Click Run.



7.   Start SQL Server Management Studio (SSMS), connect to the instance, and perform
       various security actions, such as creating a login, creating a database user, and creating
       an object.
  8.   Observe the effects of these actions within the Profiler event grid, shown here.




  9.   Stop and close the trace.

PRACTICE 2      Establishing a performance baseline
In this practice, you configure a trace to establish a performance baseline for stored
procedures and ad hoc SQL execution.
  1.   If necessary, start Profiler. Select File, New Trace, and connect to your instance.
  2.   Specify a trace name, template, and options to save to a file, as shown here.




  3.   Click the Events Selection tab.




4.    Below Security Audit, clear the Audit Login, Audit Logout, ExistingConnection, and
                    SQL:BatchStarting event check boxes.
               5.   Select the Showplan XML event check box. The Showplan XML event is added for
                    demonstration purposes only; normally, you would not capture the XML showplan for
                    a performance baseline trace.
              6.    Select the TextData, NTUserName, LoginName, CPU, Reads, Writes, Duration, SPID,
                    StartTime, EndTime, BinaryData, DatabaseName, ServerName, and ObjectName
                    columns, as shown here.




               7.   Click Column Filters and specify a filter on the AdventureWorks database, as shown
                    here.




              8.    Click Run.




9.   Execute several queries and stored procedures against the AdventureWorks database,
       and then observe the results in Profiler, as shown here.




PRACTICE 3      Importing a trace file
In this practice, you import a trace file into a table for further analysis.
  1.   Open a query window and execute the following command:

        SELECT * INTO dbo.TK432BaselineTrace
        FROM fn_trace_gettable('C:\test\TK432 Performance Baseline.trc', default);
        GO

  2.   Execute the following query and inspect the results:

       SELECT * FROM dbo.TK432BaselineTrace
       GO


PRACTICE 4      Correlating SQL Trace and System Monitor Data
In this practice, you correlate the System Monitor counter log created in Lesson 1 with the
Profiler trace created in Practice 2 of this lesson.
  1.   Stop the counter log that you created in Lesson 1.
  2.   Start Profiler. Select File, Open, and Trace File. Select the performance baseline trace
       file that you created in Practice 2 of this lesson.
  3.   Select File, Import Performance Data. Select the counter log that you created in Lesson 1.
  4.   In the Performance Counters Limit Dialog window, shown on the following page, select
       Network Interface:Output Queue Length, Processor:% Processor Time, System:Processor
       Queue Length, and SQLServer:Buffer Manager:Page life expectancy. Click OK.


5.   Scroll down the Profiler trace in the top pane and observe the changes that occur
                    within the performance counter graph and grid, as shown here.




            Lesson Summary
                    Profiler is the utility that allows you to interact graphically with the SQL Trace API.
                    SQL Trace exposes events that can be captured to audit actions, monitor the
                    operational state of an instance, baseline queries, and troubleshoot performance
                    problems.



You can specify the columns of data that you want to capture for a given event.
       Trace output can be limited by applying filters.


Lesson Review
The following question is intended to reinforce key information presented in Lesson 2,
“Working with the SQL Server Profiler.” The question is also available on the companion CD
if you prefer to review it in electronic form.


   NOTE    ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or
   incorrect are located in the “Answers” section at the end of the book.


  1.   You are trying to troubleshoot a performance issue at Fabrikam. At about 15 minutes
       past the hour, on a recurring basis, query performance declines for about 1 minute
       before application performance returns to normal. What tools can you use to diagnose
       the cause of the performance problems? (Choose all that apply.)
       A. System Monitor
       B. Database Engine Tuning Advisor
       C. Resource Governor
       D. Profiler




Lesson 3: Diagnosing Database Failures
            In this lesson, you learn how to view and filter logs used in diagnosing errors. You also learn
            how to deal with database space issues, which are the most common causes of errors.


                   After this lesson, you will be able to:
                      Work with SQL Server log files
                      Diagnose out-of-space issues

                   Estimated lesson time: 20 minutes




            SQL Server Logs
            Error and informational messages related to your server running SQL Server can be found in:
                   Windows Event logs
                   SQL Server error logs
                   SQL Server Agent logs
                   Database mail logs
               Windows Event logs are commonly viewed using the Windows Event Viewer. Information
            that affects your SQL Server installation can be found in three different event logs:
                   System Event log Contains system error and information messages primarily related
                   to hardware and operating system components.
                   Application Event log The primary source of SQL Server information that contains
                   all the error and informational messages for an instance, including service start/stop/
                   pause messages.
                   Security Event log If you have enabled auditing of login/logout events, each successful
                   connect and disconnect from an instance is logged in this event log.
               The SQL Server error log is a text file on disk that can be opened and viewed by any text
             editor such as Notepad. The current SQL Server error log is named errorlog (without any file
             extension) and is located in the MSSQL10.<instance>\MSSQL\LOG directory. You can also
            retrieve the contents of the current error log as a result set by executing the system extended
            stored procedure sys.xp_readerrorlog.
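   For example, the following returns the current error log (file 0) for the Database Engine (1),
filtered to entries that contain a search string. The parameters of this procedure are
undocumented, so treat them as a convention rather than a contract:

   EXEC sys.xp_readerrorlog 0, 1, N'error';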
               The SQL Server error log contains startup information such as the version (including
            service pack) of SQL Server and Windows; Windows process ID; authentication mode; number
            of processors; instance configuration parameters; and messages for each database that
            was opened, recovered, and successfully started. The error log also contains informational
            messages for major events within the instance, such as traces starting and stopping or




database backup/restores. However, the main purpose of the error log is to log error messages
such as database corruption, stack dumps, insufficient resources, and hardware failures.
   The SQL Server error log also contains any messages created with a RAISERROR command
that specifies the WITH LOG parameter. Messages raised WITH LOG are written to both the
SQL Server error log and the Windows Application Event log; severity levels of 19 or higher
can be raised only WITH LOG.
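   For example, the following writes a message to both logs (the message text and severity
are illustrative):

   RAISERROR (N'Nightly load failed for database %s.', 16, 1, N'AdventureWorks')
   WITH LOG;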
   Errors and informational messages related to SQL Server Agent are found in the SQL
Server Agent log file named Sqlagent.out. The database mail log is contained in the
dbo.sysmail_log table in the msdb database.
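   For example, a quick way to review recent Database Mail activity and errors:

   SELECT log_id, event_type, log_date, description
   FROM msdb.dbo.sysmail_log
   ORDER BY log_date DESC;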
   You could use the Windows Event Viewer to view event logs, Notepad to view SQL Server
and SQL Server Agent logs, and T-SQL for database mail logs. Instead of having to open logs
using multiple tools, SSMS has a Log File Viewer that allows you to view the various error and
event logs in a single interface, shown in Figure 12-6.




FIGURE 12-6 Viewing error and event logs


   The Log File Viewer provides an integrated view of all the logs that you specify based on
the date and time of each event. By integrating all the logs that you specify, you can correlate
information across multiple logs to diagnose an error; for example, you can view any system
or application log entries that occurred around the same time that you encountered a SQL
Server error.
   In addition to viewing multiple files interleaved in a single interface, the Log File Viewer
also allows you to search, filter, and export information in one or more log files.


Database Space Issues
            The most common errors that you encounter deal with running out of space for either the
            data or log files.
                When you run out of space in the transaction log, all write activity to the database stops.
            You can still read data from the database, but any operation that attempts a write rolls back
            and returns error 9002 to the application, as well as writing the error to the SQL Server error
            log and Windows Application Event log. If a transaction log fills up, you can perform the
            following actions:
                   Back up the transaction log.
                   Add disk space to the volume that the transaction log file is on.
                   Move the log to a volume with more space.
                   Increase the size of the transaction log file.
                   Add another log file on a disk volume that has space.
               The first action that you should perform is to execute a transaction log backup. However, a
            transaction log backup might not free up enough space within the log to be reused if any of
            the following occurs:
                   The database is participating in replication and the distribution database is not
                   available.
                   You have an open transaction.
                   Database Mirroring is paused.
                   The mirror database is behind the principal.
                   A log scan is running.
                   SQL Server cannot increase the size of a file quickly enough.


               NOTE    INCREASING TRANSACTION LOG SPACE
               If a transaction log backup does not free up enough space in the log to be reused, then
               you need to add space to the transaction log by increasing the disk space available. The
               most common way to increase the disk space available is to add a second log file to the
               database on a disk volume that has free space.
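
                As a minimal sketch of the first two remedies in the preceding list, the following
             assumes the AdventureWorks database uses the FULL recovery model; the file paths and
             sizes are illustrative:

                -- Back up the transaction log so that log space can be reused
                BACKUP LOG AdventureWorks
                TO DISK = N'C:\test\AdventureWorks_log.trn';

                -- If that does not free enough space, add a second log file on a
                -- disk volume that has free space
                ALTER DATABASE AdventureWorks
                ADD LOG FILE
                (
                    NAME = N'AdventureWorks_log2',
                    FILENAME = N'D:\SQLLogs\AdventureWorks_log2.ldf',
                    SIZE = 512MB,
                    FILEGROWTH = 256MB
                );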


                If a database runs out of disk space, an 1101 or 1105 error is raised. So long as the database
            is out of space, you cannot insert any new data. You can increase the space available to a
            database by adding a file to the appropriate filegroup on a disk volume that has additional
            space available.
               You might be tempted to add a new filegroup with files on a disk volume with space, but
            this does not solve your space problem. All your data is being written to tables within the
            database. A table is stored in either a single filegroup or mapped to a partition function that
            spreads the table across multiple filegroups. When you add a new filegroup, none of the
            existing tables or partition schemes can use the new filegroup to store data being written

to the existing objects in the database. Files have to be added to existing filegroups in the
database to increase the space available and eliminate the 1101/1105 errors.
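   For example, a sketch of adding a file to an existing filegroup on a volume with free space
(the file name, path, and sizes are illustrative):

   ALTER DATABASE AdventureWorks
   ADD FILE
   (
       NAME = N'AdventureWorks_data2',
       FILENAME = N'E:\SQLData\AdventureWorks_data2.ndf',
       SIZE = 1GB,
       FILEGROWTH = 512MB
   )
   TO FILEGROUP [PRIMARY];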


   TIP    FILE GROWTH
   Before adding files to filegroups, you should first check if the data files have the auto-grow
   feature disabled. If auto-grow is disabled and the disk volume still has space, you can
   increase the space available just by increasing the size of the existing files.
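
   For example, a sketch of growing an existing file in place; the logical file name and new
   size are illustrative, and the new size must be larger than the current size:

      ALTER DATABASE AdventureWorks
      MODIFY FILE (NAME = N'AdventureWorks_Data', SIZE = 4GB);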


   The tempdb database is a special case that needs to be closely watched. Running out of
space in tempdb causes serious problems for a variety of processes. In addition to the local and
global temporary tables that any user can create, tempdb storage is used for the following:
         Work tables for GROUP BY, ORDER BY, and UNION queries
         Work tables for cursors and spool operations
         Work tables for creating/rebuilding indexes that specify the SORT_IN_TEMPDB option
         Work files for hash operations
    Tempdb storage is also used for the version store. The version store is used to store row
versions for the following:
         Online index creation
         Online index rebuild
         Transactions running under snapshot isolation level
         Transactions running against a database with the read committed snapshot property
         enabled
         Multiple Active Result Sets (MARS)
     If tempdb runs out of space, every database on the instance can be affected. In severe
cases, all your applications running against the instance could cease to function. In addition
to the 1101, 1105, and 9002 errors, tempdb raises 3958, 3959, 3966, and 3967 errors when
space issues for the version store are encountered.
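   You can watch how much of tempdb the version store is currently consuming by using the
sys.dm_db_file_space_usage DMV; for example:

   SELECT SUM(version_store_reserved_page_count) * 8 / 1024.0 AS version_store_mb
   FROM tempdb.sys.dm_db_file_space_usage;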


   EXAM TIP
   For the exam, you should know the common error codes for space issues, what each error
   code means, and how to fix the problems that generated the errors.



   BEST PRACTICES       OUT-OF-SPACE ERRORS
   Running out of space is a serious issue that should be avoided at all costs. Most database
   administrators (DBAs) react to problems. Instead, you should be proactively managing your
   instances to ensure that you do not run out of space. You can proactively manage space
   by creating alerts to notify an operator when the amount of free space has fallen below a
   specified threshold, usually 10 percent to 15 percent. You can also be notified immediately
   if a space error occurs by creating alerts in SQL Server Agent.
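   Such alerts can also be scripted. The following sketch creates an alert for error 9002 and
   an e-mail notification; it assumes an operator named DBA Team already exists:

      EXEC msdb.dbo.sp_add_alert
          @name = N'Transaction Log Full',
          @message_id = 9002,
          @delay_between_responses = 60;    -- seconds

      EXEC msdb.dbo.sp_add_notification
          @alert_name = N'Transaction Log Full',
          @operator_name = N'DBA Team',
          @notification_method = 1;         -- 1 = e-mail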


                    Quick Check
                    1. What are the main places to find error and informational messages about the
                       database engine?

                    2. What are the three error codes that are raised when you run out of space for a
                       database?

                    Quick Check Answers
                    1. You use the SQL Server error log and Windows Application Event log for messages
                       about the database engine. If you are auditing logins and logouts from the
                       instance, you use the SQL Server error log and Windows Security log. The Windows
                       System Event log can also provide hardware and operating system information
                       useful to troubleshooting a database engine issue.

                    2. You receive a 9002 error when you run out of transaction log space, and either
                       an 1101 or 1105 error when the data files are out of space.




             PR ACTICE          Creating Failure Alerts

            In this practice, you create SQL Server Agent alerts for a variety of database errors.


            PR ACTICE 1         Version Store Alerts
            In this practice, you create alerts for version store errors in tempdb.
              1.    In Object Explorer, expand the SQL Server Agent node.
              2.    Right-click the Alerts node and select New Alert.
               3.   Name your alert Full Version Store.
              4.    Select the tempdb database.
               5.   Select Error Number and specify 3959 for the error number, as shown in Figure 12-7.
              6.    Select the Response page and select an operator to notify by e-mail.
               7.   Select the Options page, select the E-mail check box, and set the Delay Between
                    Responses option to 1 minute, as shown in Figure 12-8.
              8.    Click OK.
               9.   Repeat the process to create an alert for error 3967 named Forced Version Store
                    Shrink.
             10.    Repeat the process to create an alert for error 3958 named Row Version Not Found.
             11.    Repeat the process to create an alert for error 3966. Because alert names must be
                    unique and there are two error numbers that indicate the same problem, add an
                    increment to the name of the second alert, Row Version Not Found2.



FIGURE 12-7 Specifying error number




FIGURE 12-8 Specifying delay between responses




PRACTICE 2         Log File Alerts
            In this practice, you create an alert for when the transaction log is full.
              1.    Right-click the Alerts node and select New Alert.
              2.    Name your alert Transaction Log Full.
               3.   Select <All Databases>.
              4.    Select Error Number and specify 9002 for the error number.
               5.   Select the Response page and select an operator to notify by e-mail.
              6.    Select the Options page, select the E-mail check box, and set the Delay Between
                    Responses option to 1 minute.
               7.   Click OK.

            PRACTICE 3         Data File Alerts
            In this practice, you create an alert for when a database is out of space.
              1.    Right-click the Alerts node and select New Alert.
              2.    Name your alert Database Full.
               3.   Select <All Databases>.
              4.    Select Error Number and specify 1101 for the error number.
               5.   Select the Response page and select an operator to notify by e-mail.
              6.    Select the Options page, select the E-mail check box, and set the Delay Between
                    Responses option to 1 minute.
               7.   Click OK.
              8.    Repeat the process to create an alert for error 1105 named Database Full2.


            Lesson Summary
                    The SQL Server error log contains configuration information upon instance startup,
                    errors, stack dumps, and informational messages about your instance.
                    The Windows Application Event Log contains service start/stop messages, major event
                    informational messages, errors, and anything from a RAISERROR command that uses
                    the WITH LOG parameter (required for severity levels of 19 or higher).
                    The Log File Viewer allows you to view error and event logs combined into a single list
                    in chronological order. The Log File Viewer also allows you to filter and search logs.
                    An 1101 or 1105 error occurs when a database runs out of space.
                    A 9002 error occurs when a transaction log is full.
                    When the version store encounters space issues, you could receive 3958, 3959, 3966,
                    and 3967 errors.




You can configure alerts in SQL Server Agent to notify you when any space-related
       errors occur.
       You can configure performance condition alerts in SQL Server agent to notify you
       when storage space is getting low.


Lesson Review
The following question is intended to reinforce key information presented in Lesson 3,
“Diagnosing Database Failures.” The question is also available on the companion CD if you
prefer to review it in electronic form.


   NOTE    ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or
   incorrect are located in the “Answers” section at the end of the book.


  1.   Which types of SQL Server events are logged to the Windows Application Event log?
       (Choose all that apply.)
       A. Stack dumps
       B. Startup configuration messages
       C. Job failures
       D. Killed processes




Lesson 4: Diagnosing Service Failures
            SQL Server and SQL Server Agent run as services within Windows. In addition to SQL Server
            failing due to hardware or software issues, you can also have startup failures. In this lesson,
            you learn how to diagnose and fix the most common issues when a service fails to start.


                   After this lesson, you will be able to:
                      Locate information about a service startup failure
                      Diagnose the cause of a service startup failure
                      Fix common causes of service startup failures

                   Estimated lesson time: 20 minutes



            Finding Service Startup Failures
            You can use two utilities to view the state and configuration of SQL Server services—the
            Windows Services console and SQL Server Configuration Manager.
               As we discussed in Chapter 11, “Designing SQL Server Security,” every instance has a
            service master key at the root of the encryption hierarchy. The service master key is encrypted
            using the SQL Server service account and service account password. The Windows Services
            console does not contain the code necessary to encrypt a service master key; therefore, you
            should never use the Windows Services console to change the service account or service
            account password.


            Configuration Manager
            SQL Server Configuration Manager is installed on every machine that is running an instance
            of SQL Server 2008, and it cannot be removed. Unlike other SQL Server utilities, SQL
            Server Configuration Manager cannot be used to manage instances on multiple machines.
            The purpose of SQL Server Configuration Manager is to configure and manage the SQL
            Server services, network configuration, and native client. From the main console, shown in
            Figure 12-9, you can view the state, startup mode, and service account, as well as start, stop,
            pause, and restart a service.


               BEST PRACTICES      MANAGING SQL SERVER SERVICES
               It is always much easier to remember and enforce administration policies that do not have
               a lot of exceptions. You can safely change some of the options for SQL Server services
               using the Windows Services console; however, others cannot be safely changed this way.
               Therefore, most environments that I have worked in dictate that all changes to a SQL
               Server service must be made using the SQL Server Configuration Manager.



FIGURE 12-9 SQL Server Configuration Manager




       Remote Desktop


       Many organizations are now running SQL Server on dedicated machines.
       Unfortunately, many of those organizations have also implemented a policy
        under the guise of “separation of duties.” Under such a policy, SQL Server DBAs are
        supposed to manage only SQL Server, whereas Windows System Administrators are
        supposed to manage the Windows operating system and hardware that SQL Server
        is running on.

       What the organizations have not accounted for in implementing this policy is that
       SQL Server is intimately tied to the operating system and hardware. When a machine
       running SQL Server is operating normally, DBAs don’t need access to the operating
       system or hardware, except to check the contents of and the space available on disk
       volumes. When a machine running SQL Server is not operating normally, DBAs need
       access to the operating system, hardware, and diagnostic tools.

       When you need to troubleshoot service startup, you need to be able to use Remote
       Desktop to connect to the console of the machine that SQL Server is running on to
       access SQL Server Configuration Manager. You also need access to disk volumes,
       Windows Event Viewer, and error logs for troubleshooting. If a DBA cannot remote



into a server running SQL Server, he or she must find someone with access and
                   then attempt to troubleshoot service startup failures over the phone, hoping that
                   the person on the other end of the line is looking at the correct information and
                   performing the actions that he or she is relaying. In many cases, fixing a service
                   startup problem when the DBA does not have remote access to the machine turns a
                   problem that might take two to three minutes to fix into a major outage that might
                   take hours before it is finally solved.




               If you cannot connect to your SQL Server instance, the first thing that you need to check is
            whether SQL Server is running. If the service is in a starting state, SQL Server is in the process
            of starting up and should be available in a short amount of time.
               If the service is in a stopped state, the SQL Server instance is shut down due to any of the following:
                   The service is set to manual or disabled startup mode and the machine was rebooted.
                   The service was shut down by someone.
                   SQL Server encountered a critical error and shut down the service.
               The first place you should look when the SQL Server service is in a stopped state is either
            the SQL Server error log or Windows Application Event log. If there were critical errors and
            the SQL Server instance was stopped as a result, both error logs contain additional diagnostic
            messages that can be used to troubleshoot hardware problems, which are covered in Lesson 5,
            “Diagnosing Hardware Failures.”
               If the machine was rebooted, both logs show a message that states “SQL Server is
            terminating because of a system shutdown.” If someone shut down the service, then both
            logs show a message that states “SQL Server is terminating in response to a ‘stop’ request
            from Service Control Manager.” So long as you do not see any additional hardware errors in
             the logs, all you have to do is start the service again.
                After the service has been successfully restarted, you need to connect and verify that
            all the databases are online and accessible, and that applications can connect successfully.
            Following establishment of normal operations, you need to find out who shut the service
            down or rebooted the machine to ensure that the problem does not happen again.
               The normal startup mode for SQL Server and SQL Server Agent services on a stand-alone
            machine is Automatic. If the services were set to either Manual or Disabled, you must find
            out who made the change and why, and ensure that changes do not affect operations. You
            can change the startup mode from the Service tab by right-clicking the service and selecting
            Properties, as shown in Figure 12-10.


               NOTE    STARTUP MODE
               A service with a startup mode of Disabled shows as Other in the Start Mode column in the
               SQL Server Configuration Manager main window.



FIGURE 12-10 Service account startup mode




   CAUTION     SQL SERVER CLUSTERED INSTANCES
   The start mode for all SQL Server services in a clustered installation should be set to
   Manual. Do not change the start mode for any services in a clustered installation. The
   services are controlled by the cluster service, and setting any of the services to an
   Automatic start mode creates significant problems on a cluster when a node restarts.



Service Accounts
To start SQL Server, the SQL Server service account needs to have several permissions,
including the following:
       Read and Write access to the folder(s) where the data/log files for system databases are
       stored
       Read and Write permissions on SQL Server registry keys
       Log On As A Service authority
       Sysadmin authority inside the SQL Server instance
    If the SQL Server service starts and then immediately shuts down, you are most likely
dealing with an issue with the service account. The first step should be to check if any of the
following conditions exist for the service account:
       Account deleted
       Account locked out
       Account disabled
       Password expired


If the service account has been deleted, then you need to have your System Administrator
            re-create the account and then reset the startup account on the Log On tab of the SQL Server
            service Properties dialog box, shown in Figure 12-11.




            FIGURE 12-11 Service account logon


               If the account was disabled, your System Administrator must re-enable the account
            before you can restart the SQL Server service. Following a restart, you should work with your
            System Administrator to figure out how the service account got deleted or disabled and put a
            process in place to ensure that it doesn’t happen again.
                If the password has expired, your System Administrator should disable password expiration
            for the account. When password expiration has been disabled, you should change the password,
            update the password for the service account in the Log On tab, and then restart the service.
                If the service account was locked out, you should immediately change the password.
            Before unlocking the service account, your network and system administrators need to run
            a detailed security diagnostic across the network. Because a service account should never
             be used for anything except to run the associated service, a locked-out service account
             means that someone attempted to use the account for other purposes in violation of security
             policies. You need to isolate the person or application that attempted to use the service
             account and make sure that the security violation is addressed before you unlock the account
             and start the SQL Server service back up. Although it might sound counterintuitive to keep
             a critical SQL Server system off-line until you isolate the cause of a service account lockout,
             if the person or application had managed to knowingly or unknowingly crack the password
             after the account was locked out, starting SQL Server back up before isolating the security
             violation gives the person access to all the data hosted on the instance.
                If all the service account checks pass, then you are probably dealing with a permission
            issue.


When SQL Server starts up, the master database is first brought online. After the master
database is brought online, the tempdb database is re-created. The creation of the tempdb
database causes a write to occur to the folder(s) that store the data and log files for the
tempdb database. If SQL Server cannot re-create the tempdb database, the service shuts down.
The most common causes of SQL Server not being able to re-create the tempdb database are
the following:
       The SQL Server service account does not have sufficient permissions.
       The folder does not exist or is unavailable.
   If the folder does not exist, then you either need to create the folder or get the disk volume
for tempdb back online before restarting the SQL Server instance. If you cannot get the disk
volume back online or the tempdb creation problems were due to a configuration error, you
can start the SQL Server in single-user mode using the –m startup parameter, change the
location of tempdb, and then restart the instance normally.
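   A sketch of relocating tempdb follows; tempdev and templog are the default logical file
names, the target paths are illustrative, and the change takes effect the next time the
instance starts:

   -- From a command prompt, start the instance in single-user mode first,
   -- for example:  net start MSSQLSERVER /m
   ALTER DATABASE tempdb
   MODIFY FILE (NAME = tempdev, FILENAME = N'D:\SQLData\tempdb.mdf');
   ALTER DATABASE tempdb
   MODIFY FILE (NAME = templog, FILENAME = N'D:\SQLLogs\templog.ldf');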
  If the folder is available, you should look for the following sequence of events in the
Windows Application Event Log:
       Service startup
       Device activation or file not found error
       Service shutdown
   This sequence of events indicates that the SQL Server service account does not have sufficient
permissions to access the data or log file(s) for either master or tempdb. If you rename the data
and log files for tempdb, restart the instance, and you see a file named Tempdb.mdf with a size
of 0 where the Tempdb.mdf file should be located, then the problem is with the tempdb data/log
files. After you fix the permission issues, SQL Server starts normally.


   NOTE    DEVICE ACTIVATION ERRORS
   Anytime you see a device activation error, SQL Server could not access a data or log file
   for a database. Device activation errors for master and tempdb prevent the instance from
   starting, whereas device activation errors for any other database only make the database
   unavailable. You should always investigate any device activation error because you either
   have a failing disk subsystem or an administrator is improperly shutting down a storage
   system while it is being used.



Startup Parameters
If you encounter a device activation error related to the master database, the sequence of
events that you see in the Windows Application Event Log is as follows:
       Service startup
       Device activation or file not found error
       Service shutdown



The most common causes of a device activation error during SQL Server startup are the
            following:
                   The service account does not have sufficient permissions.
                   The storage system is off-line.
                   The startup parameters were changed improperly.
               You should first check that the storage system is online, the folder(s) containing the master
            database files exist, the master data and log files exist, and the service account has permissions
            to the folder(s). If you do not see any problems with the permissions, files, and folders, then
            you should check the service startup options, as shown in Figure 12-12.




            FIGURE 12-12 Startup parameters


              Startup parameters for the service can be directly entered by clicking the drop-down list
            and typing in the settings, as shown in Figure 12-13.




             FIGURE 12-13 Entering startup parameters


                The –d parameter lists the location and file name of the master data file. The –l parameter
             lists the location and file name of the master transaction log file. The –e parameter lists the
             name and location of the SQL Server error log. One of the most common changes that you
             make to the startup parameters is adding trace flags through the use of the –T parameter.
             However, unless you are very careful and fight your normal typing instincts, you can create an
             error that prevents the instance from starting. If you look very closely at Figure 12-13, startup
             parameters are separated by a semicolon (;). However, your normal instinct is to place a space
             after the semicolon. If you introduce a space after a semicolon, SQL Server interprets the
             semicolon and everything after it as part of the previous parameter. As a result, instead of the
             master transaction log being named <directory>\mastlog.ldf, SQL Server determines that the
             name of the master transaction log file is <directory>\mastlog.ldf; -T<trace flag>. So if you
             have just changed the startup parameters and the instance does not start, the first thing you
             should always look for is a space that does not belong.
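                For example, a correctly formed parameter string looks like the following (paths
             shortened for illustration; note that there is no space after either semicolon):

                -dC:\Data\master.mdf;-lC:\Data\mastlog.ldf;-eC:\Logs\ERRORLOG;-T1222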


   NOTE    EXPERIENCED SQL SERVER DBAS
   Besides the additional years in the job of DBA, the biggest thing that separates someone
   with experience from someone without experience is that the experienced person has
   managed to survive things that have gone wrong. I’ll never admit many of the things
   that I’ve managed to survive over the decades. By following many of the best practices,
   sidebars, notes, and cautions that you will find in this book, I hope that you can avoid many
   of the mistakes I and many others have made over the years.


   If the folder or folders containing the data and log files for the master database are
accessible and the files exist, you should first check that the SQL Server service account
has read/write permissions to the folder(s). If the SQL Server service account has sufficient
permissions and you are still receiving device activation errors on the master database files,
then you have a corrupt master database that can be repaired only by running SQL Server
setup. If you have to repair a corrupt SQL Server installation, you should start SQL Server
setup, select the Maintenance page, and then start the Repair Wizard.


   EXAM TIP
   For the exam, you should focus on the most common error scenarios, most of which deal
   with security permissions. You need to know which utilities to use to troubleshoot the
   errors. You should also know how to rebuild a master database in SQL Server 2008, which
   is accessed using the new Installation Center even though it still uses setup. Favorite
   questions for the exam test whether you know the “new” or “improved” way of performing
   an action in the current SQL Server version vs. the method(s) from the previous version.



   CAUTION     SYSTEM DATABASES
   SQL Server has four system databases on every instance: master, model, msdb, and tempdb.
   If you have configured replication, you also see a database named distribution. SQL Server
   has a sixth system database, first introduced in SQL Server 2005, named mssqlsystemresource.
   The mssqlsystemresource database contains most of the stored procedure, function, DMV,
   and other code that ships with SQL Server. The mssqlsystemresource database is critical to
   SQL Server operations; if it is missing or damaged, the instance cannot start. Unfortunately,
   this database is hidden, so you need to look for device activation errors related to the
   mssqlsystemresource database as well as the master and tempdb databases.

                    Quick Check
                    1. What does a device activation error mean?

                    2. Errors in which three databases prevent SQL Server from starting?

                    Quick Check Answers
                    1. A device activation error means that SQL Server either cannot find or cannot
                       access a data or log file.

                    2. Errors in the master, tempdb, and mssqlsystemresource databases can prevent
                       an instance from starting. Errors in all other databases just make the problem
                       database inaccessible until you can fix the problem.




             PRACTICE          Troubleshooting Service Startup Errors

            In this practice, you fix a startup problem related to SQL Server not being able to find the
            master database files.

            PRACTICE 1         Changing the Startup Folder
            In this practice, you change the startup folder for the master database file.
              1.    Open SQL Server Configuration Manager and select SQL Server Services.
              2.    Stop the SQL Server service for your instance.
               3.   Open Windows Explorer and navigate to the folder that contains master.mdf
                    and mastlog.ldf for your instance (commonly found in Microsoft SQL Server\
                    MSSQL10.<instance>\MSSQL\Data).
              4.    Rename master.mdf to master.mdf2.
               5.   Rename mastlog.ldf to mastlog.ldf2.
              6.    Attempt to start your SQL Server instance.
               7.   Inspect the Windows Application Event log and the SQL Server error log, as
                    shown here.

            PRACTICE 2         Correcting Unavailable Devices
            In this practice, you fix the device activation errors introduced by the change in Practice 1.
              1.    Right-click the SQL Server service in the SQL Server Configuration Manager and select
                    Properties.
              2.    Select the Advanced tab.
               3.   Change the file names for the master data and log file parameters.
              4.    Click OK.
               5.   Start the instance.



6.   Inspect the SQL Server error log and the Windows Application Event Log for a normal
       startup sequence.


Lesson Summary
       SQL Server Configuration Manager is used to configure and manage services, protocols,
       and the SQL Native Client.
       The most common cause of service startup errors is permissions.
       A device activation error indicates that SQL Server either cannot find or cannot write to
       a data or log file.
       When changing the startup parameters for the SQL Server service, a semicolon separates
       parameters, and you need to make certain that you do not introduce a space following
       a semicolon.


Lesson Review
The following question is intended to reinforce key information presented in Lesson 4,
“Diagnosing Service Failures.” The question is also available on the companion CD if you
prefer to review it in electronic form.


   NOTE    ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or
   incorrect are located in the “Answers” section at the end of the book.




1.   You are the DBA at Blue Yonder Airlines and the phone rings. The main ticket booking
                   application has just gone off-line and cannot be reconnected to the database. You
                   attempt to connect to the SQL Server and find that it is unreachable. You find that the
                   service has stopped, and upon inspecting the error logs, you find a large number of
                   device activation errors. What is the most likely cause of the problem?
                   A. Someone deleted the ticketing database files.
                   B. The disk storage system underneath the master or tempdb databases went off-line.
                   C. The disk storage system underneath the ticket booking database went off-line.
                   D. The SQL Server service account was locked out.




Lesson 5: Diagnosing Hardware Failures
Although most failures you encounter with SQL Server are related to organizational processes
or security, all hardware fails eventually. In this lesson, you learn how to diagnose the causes
of hardware problems so that you can replace the appropriate components.


       After this lesson, you will be able to:
          Diagnose failures due to hardware errors

      Estimated lesson time: 20 minutes



Disk Drives
Disk drives are one of the last remaining hardware components with moving parts. Because
disk drives contain very small parts with very stringent clearance tolerances between
components and also subject components to extremely high velocities and mechanical
stresses, the most common hardware failures occur in your disk storage.
   Having a single disk fail generally isn’t a problem because all your databases should be
stored on a disk system with some redundant array of inexpensive disks (RAID) level that
provides a spare. However, a failure of multiple disks can take an entire disk volume off-line,
making your databases or the entire instance unavailable.
   The first indication that you have of a failure in the disk system is errors logged to the
Windows System Event Log or within the logging system for your Storage Area Network
(SAN) or Network Attached Storage (NAS) array. If the errors get severe enough for a volume
or an entire array to go off-line, you immediately begin seeing device activation errors in the
SQL Server error log as well as the Windows Application Event log.
   When you encounter device activation errors and have determined that the storage
system where the data or log files are stored is unavailable, you should notify the storage
administrator for your organization.
   If the storage for your databases is locally attached, you can use the Disk Management
folder within the Computer Management Console to determine the state of disk volumes as
shown in Figure 12-14. Errors in locally attached storage can be diagnosed as well as possibly
fixed by using the CHKDSK command-line utility.


   CAUTION     SAN AND NAS ARRAYS
   When your databases are stored on SAN or NAS arrays, you should always use the
   specialized utilities that ship with your storage array to diagnose and repair any disk errors.




FIGURE 12-14 Managing locally attached disks




            Memory and Processors
             It is quite rare for a stick of random access memory (RAM) or a processor to simply fail.
            More likely, you begin receiving sporadic errors that occur on a seemingly random basis and
            the SQL Server instance might stay running without showing any errors at all. You might also
            see stack dumps in the SQL Server error log or folder where the SQL Server error log is stored.
               When SQL Server encounters a severe error, a stack dump is generated. If the error is
            recoverable, SQL Server continues to run. If the error is severe enough, the instance shuts
            down. Because it is very rare to be on the console of the machine running SQL Server, you do
            not see any blue screens when a STOP error occurs. Therefore, the first indication that you
            normally have for memory or processor issues is when a stack dump is generated.
               You should have an alert sent to an operator if a stack dump entry ever appears in the SQL
            Server error log, Windows Application Event log, or Windows System Event log, or if a file
            with a .mdmp extension is created in the folder that stores the SQL Server error log.
               To diagnose a memory or processor problem, you should use the diagnostic utilities that
            the vendor ships with the hardware.


               EXAM TIP
               For the exam, you need to know the most common errors related to the failure of
               hardware components.



        Quick Check
        1. What errors do you see if your disk storage goes off-line underneath a database?

       2. What errors do you see if there is a fault in either the memory or processor?

       Quick Check Answers
        1. When a disk volume that databases are stored on goes off-line, SQL Server begins
          logging device activation errors.

        2. If you are encountering memory or processor problems, you see STOP errors.
           If there is a memory error encountered when the computer is booting, you
           see a POST error. Both STOP and POST errors are accompanied by a blue screen
           with additional diagnostic information.




Lesson Summary
       A severe failure of the disk system that takes a storage volume off-line logs device
       activation errors and the affected databases become inaccessible.
       Memory and processor errors are usually intermittent and generally occur in
       conjunction with a stack dump being generated.


Lesson Review
The following question is intended to reinforce key information presented in Lesson 5,
“Diagnosing Hardware Failures.” The question is also available on the companion CD if you
prefer to review it in electronic form.


   NOTE    ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or
   incorrect are located in the “Answers” section at the end of the book.


  1.   Humongous Insurance has hired you to evaluate its SQL Server infrastructure, make
       recommendations, and manage projects to improve the production environment.
       During your evaluation, you have learned that database files are stored across dozens
       of different drives and mount points. Data and log files are mixed with backup files.
       System databases exist on the same disk drives as user databases. Your first project is
       to move all the system databases to separate drives from the user databases. You will
       also be moving the tempdb database to dedicated storage on each instance because
       many very poorly written queries move massive quantities of data through tempdb.
       The on-call DBA is performing the maintenance and has tested the procedures several
       times in a lab environment. You receive a call that the first instance, following the




database moves, does not start. You have verified service account permissions for the
                   folder containing the master database files, that the master database files are in the
                   correct location, and the startup parameters are correct. What is the most likely cause
                   of the problem?
                   A. The master database is corrupted.
                    B. The mssqlsystemresource database is corrupted.
                   C. The service account does not have permissions to the folder containing the tempdb
                       database files.
                   D. You have a bad memory module in the server.




Lesson 6: Resolving Blocking and Deadlocking Issues
As the saying goes, “SQL Server would run perfectly, if it weren’t for the users.” However, if
there weren’t any users to work with the data, the databases that you are managing would
be worthless. Also, if you could only read data from a database, you wouldn’t have to worry
about multiple users trying to work with the same data. Of course, a read-only database
would not have any way to get data into the database in the first place. Because you need
to have data in a database that is accessible to and can be manipulated by multiple users,
a mechanism must be in place to manage concurrent access to maintain data consistency.
In this lesson, you learn how the SQL Server locking mechanism manages access and how to
troubleshoot processes that collide, producing blocking and deadlocking.


   MORE INFO       LOCKING, BLOCKING, AND DEADLOCKING
   For a detailed discussion of locking, blocking, deadlocking, and isolation levels, please
   refer to Microsoft SQL Server 2008 Internals by Kalen Delaney (Microsoft Press, 2009).




       After this lesson, you will be able to:
          Find blocked processes
          Kill a process
          View a deadlock graph

      Estimated lesson time: 20 minutes



Locks
SQL Server uses a locking mechanism to maintain data consistency for multiuser access.
An internal process called the Lock Manager determines the appropriate lock to acquire and
how long to retain the lock, and arbitrates when processes are allowed to modify data such
that reads are always consistent.
    SQL Server has seven different locking modes and three different lock types. In most
situations, you deal with only the following three locking modes:
       Shared
       Exclusive
       Update
   The three different lock types that can be acquired are row, page, and table. Locks can be
scoped to a session, transaction, or cursor.
   A shared lock is acquired for read operations to prevent the data being read from changing
during the read. Because read operations cannot introduce data inconsistencies, you can have
multiple shared locks on the same resource at the same time. An exclusive lock is acquired on

a resource that is being modified and is held until the modification is complete. As the name
            implies, you can have only one exclusive lock on a resource at a time, and all other processes
            needing to access the resource must wait until the exclusive lock has been released. An update
             lock is a hybrid of a shared and an exclusive lock. Update locks are acquired for updates, as
             well as for any action that requires SQL Server to first locate the piece of data to be modified.
             An update lock behaves as a shared lock while SQL Server locates the piece of data that needs
             to be modified; the lock is then converted to an exclusive lock while the data is being changed.
               Each lock mode can be acquired against a row, page, or table. The Lock Manager determines
            the type of lock to acquire based on a very aggressive resource threshold, commonly referred
            to as the two percent rule, which is designed to minimize the number of locks needing to be
            acquired and managed, because each lock acquired also consumes memory. If SQL Server
            determines that more than two percent of the rows on a page will need to be accessed, a page
            lock will be acquired. Likewise, if more than two percent of the pages in a table will need to be
            accessed, a table lock will be acquired.
               The Lock Manager uses distribution statistics, also used by the Query Optimizer, to determine
            which type of lock to acquire. Because distribution statistics are not always accurate or don’t
            always exist, the Lock Manager has a mechanism called lock escalation that allows a lock to be
            promoted to another type. SQL Server can escalate a row lock to a table lock, or a page lock to
            a table lock.
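
   If escalation to a table lock causes excessive blocking on a heavily used table, SQL Server 2008
also lets you control escalation per table through the LOCK_ESCALATION option of ALTER TABLE.
A brief sketch, using the AdventureWorks Production.Product table purely as an example:

ALTER TABLE Production.Product SET (LOCK_ESCALATION = DISABLE)

   Valid settings are TABLE (the default), AUTO (which allows escalation to the partition level for
partitioned tables), and DISABLE.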


   NOTE    LOCK ESCALATION
               It is a very common misconception that SQL Server promotes row locks to page locks. Row
               locks are promoted only to table locks.




            Transaction Isolation Levels
            Isolation levels affect the way SQL Server handles transactions, as well as the duration of locks
            acquired. SQL Server has five isolation levels, which are described in Table 12-4.


            TABLE 12-4 Transaction Isolation Levels

              ISOLATION LEVEL               DESCRIPTION

              READ UNCOMMITTED              Data can be read that has not been committed. Although
                                            an exclusive lock still blocks another exclusive lock, any read
                                            operations ignore an exclusive lock.
              READ COMMITTED                This is the default isolation level for SQL Server. An exclusive
                                            lock blocks both shared as well as exclusive locks. A shared lock
                                            blocks an exclusive lock. Shared locks are released as soon as
                                            the data has been read.



 REPEATABLE READ                Exclusive locks block both shared and exclusive locks. Shared
                                locks block exclusive locks. Shared locks are held for the
                                duration of the transaction.
 SERIALIZABLE                   Has all the restrictions of the REPEATABLE READ isolation level. In
                                addition, you cannot insert a new row within the keyset range
                                currently locked by the transaction. Locks are held for the
                                duration of the transaction.
 SNAPSHOT                       Uses the row versioning feature to keep shared and exclusive
                                locks from blocking each other while maintaining data
                                consistency. A read operation retrieves data from the version of
                                the row prior to the start of a data modification operation.
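
   Most of these isolation levels are selected per session with SET TRANSACTION ISOLATION
LEVEL. SNAPSHOT is the exception in that the database must first be enabled for row versioning.
A short sketch, assuming the AdventureWorks sample database:

ALTER DATABASE AdventureWorks SET ALLOW_SNAPSHOT_ISOLATION ON
GO
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
GO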




Blocked Processes
The Lock Manager is based on a first in, first out (FIFO) algorithm. Each process that executes
a command needs to acquire a lock. The locks being requested are queued up by the Lock
Manager in the order that the request is made. So long as the requested resource does
not have a lock or has a lock that does not conflict with the lock being requested, the Lock
Manager grants the lock request. If a locking conflict occurs, such as a request to acquire a
shared lock on a row that is exclusively locked by another session, the request is not granted
and the Lock Manager holds the request in the locking queue, along with every other lock
request for that resource, until the competing lock is released.
   Blocking is the term used when a situation occurs that produces competing locks on the
same resource. The second process is blocked from acquiring the lock until the first process
releases the competing lock. A process that is blocked stops executing until the necessary
locks can be acquired.
   Although blocking is a normal occurrence within any database that allows data manipulation
by multiple users, you have a problem if the blocking is severe or lasts for a long time.
    Anyone living in a large metropolitan area has first-hand experience with blocking. At rush
hour each day, a flood of vehicles attempts to use a single resource—a road. So long as driving
conflicts do not occur, every vehicle rapidly travels the road and completes its route. However,
if an accident occurs that suddenly closes every lane on the road, all the traffic stops, and
people begin to get angry. Traffic cannot start flowing again until the accident is cleared out of
the way; at that point, traffic flow eventually returns to normal. The same process occurs within
a database. If blocking occurs for a long time or continuous blocking occurs across processes,
the performance of an application suffers.




You can determine whether a process is blocked by using the sys.dm_exec_requests view.
            A process that is blocked will show a nonzero number in the blocking_session_id column.
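
   For example, the following query (a sketch that uses only documented columns of
sys.dm_exec_requests) returns every request that is currently blocked, along with the session
that is blocking it:

SELECT session_id, blocking_session_id, wait_type, wait_time, wait_resource
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0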
               If you determine that a process is causing contention within your database, a member of
            the sysadmin fixed server role can terminate the process forcibly by executing the following
            command (where SPID is the system process ID of the blocking session):

            KILL <spid>

               When a process is killed:
                    Any open transaction is rolled back.
                    A message is returned to the client.
                    An entry is placed in the SQL Server error log.
                    An entry is placed in the Windows Application Event Log.
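
   If the killed process has a large transaction to roll back, termination is not instantaneous. You
can check the progress of the rollback by reissuing KILL with the STATUSONLY option (the SPID
shown here is only an example):

KILL 53 WITH STATUSONLY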



            Deadlocks
            SQL Server maintains an orderly flow of transactions, even through blocking operations.
            A blocked process waits until a competing lock is released before execution resumes.
            However, it is possible to create a situation in which blocks can never be resolved. When
two processes block each other in such a way that neither process can proceed, you have
            created a deadlock.
               A deadlock requires at least two processes and each process must be performing an action
            that modifies data. Because you can have multiple shared locks acquired on a single resource at
            the same time, it is not possible to produce a deadlock with a process that only retrieves data.
               A deadlock is a transient issue that occurs through the following sequence:
              1.    SPID1 acquires an exclusive lock on RowA.
              2.    SPID2 acquires an exclusive lock on RowC.
               3.   SPID1 attempts to acquire a shared lock on RowC and is blocked by the exclusive lock.
              4.    SPID2 attempts to acquire a shared lock on RowA and is also blocked by the exclusive
                    lock.
               Because both processes have to wait on the other process to release an exclusive lock before
            they can complete, the Lock Manager has an impossible situation. When this occurs, the Lock
            Manager detects the deadlock and chooses one of the processes to be killed automatically.
                Unfortunately, the process that is chosen as the deadlock victim is the one that has the
            least amount of accumulated time within SQL Server. So, you can have a critical process that
            is chosen as the deadlock victim purely because you do not keep the connection open and
            execute multiple queries on the session. You cannot change the deadlock victim selection
            algorithm.




When a deadlock is detected, a 1205 error message is returned to the client and the
deadlock is recorded in the SQL Server error log. In addition, you can use Profiler to capture a
deadlock trace, which allows you to inspect the cause of the deadlock graphically.


   BEST PRACTICES    HANDLING DEADLOCKS
   Deadlocks are transient situations basically caused by bad timing. If the queries that the
   two sessions were running completed in less time, the deadlock might have been avoided.
   Likewise, if one process had been started a small amount of time later, the deadlock
   might never have occurred. Because a deadlock is a transient locking conflict state, your
   applications should be coded to detect a 1205 error and then immediately reissue the
   transaction because there is a very strong possibility that the process will not deadlock the
   second time the command is executed.
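
   One way to code this retry logic is with a TRY...CATCH block that traps error 1205 and reissues
   the work. The following is a minimal sketch; dbo.ProcessOrder is a hypothetical procedure
   standing in for the transaction your application runs:

   DECLARE @retry INT
   SET @retry = 1
   WHILE @retry <= 3
   BEGIN
       BEGIN TRY
           EXEC dbo.ProcessOrder    -- hypothetical procedure wrapping the transaction
           BREAK                    -- succeeded; exit the retry loop
       END TRY
       BEGIN CATCH
           IF ERROR_NUMBER() = 1205 -- this session was chosen as the deadlock victim
               SET @retry = @retry + 1
           ELSE
           BEGIN
               DECLARE @msg NVARCHAR(2048)
               SET @msg = ERROR_MESSAGE()
               RAISERROR(@msg, 16, 1) -- not a deadlock; surface the original error text
               BREAK
           END
       END CATCH
   END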



   EXAM TIP
   For the exam, you need to know the locks that can be acquired and how lock escalation
   can lead to either blocking or deadlocking issues. If a block or deadlock occurs, you also
   need to know how to resolve the problem.




       Quick Check
       1. What are the most common lock modes and types that are available?

       2. How does a deadlock occur?

       Quick Check Answers
       1. The three most common locking modes are shared, exclusive, and update.
          The three lock types are row, page, and table.

       2. A deadlock requires at least two processes that are both modifying data. Each
           process acquires an exclusive lock on a resource and then attempts to acquire a
           shared lock on the resource that is exclusively locked by the other process.




 PRACTICE       Troubleshooting a Deadlock

In this practice, you configure a trace to capture a deadlock graph.
  1.   In SQL Server Profiler, select File, New Trace, and connect to your instance.
  2.   Specify a trace name, template, and options to save to a file, as shown in the following
       graphic.




3.   Click the Events Selection tab.
              4.    Select the check box for the Deadlock graph event within the Locks category,
                    as shown here.




               5.   Click Run.
              6.    Open two query windows and change the context to the AdventureWorks database.




7.   In query window 1, execute the following code:

      SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
      GO
      BEGIN TRANSACTION
      UPDATE Production.Product
      SET ReorderPoint = 600
      WHERE ProductID = 316

 8.   In query window 2, execute the following code:

      SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
      GO
      BEGIN TRANSACTION
      UPDATE Production.ProductInventory
      SET Quantity = 532
      WHERE ProductID = 316
      AND LocationID = 5


      SELECT Name, ReorderPoint
      FROM Production.Product
      WHERE ProductID = 316

 9.   Switch back to query window 1 and execute the following code, ensuring that you do
      not issue a commit transaction:

      SELECT ProductID, LocationID, Shelf, Bin, Quantity, ModifiedDate
      FROM Production.ProductInventory
      WHERE ProductID = 316
      AND LocationID = 5

10.   Observe the results in Profiler, shown here.




Lesson Summary
                   The Lock Manager is responsible for managing the locks that SQL Server uses to
                   maintain data consistency while allowing multiple users to manipulate data concurrently.
                   When an exclusive lock is acquired, no other process is allowed to acquire a shared
                   lock for reading or an exclusive lock for modification until the exclusive lock has been
                   released. If the process is running in the READ UNCOMMITTED isolation level, read
                   operations ignore exclusive locks.
                   You can use the KILL command to terminate a process.
                   A deadlock is a transient state in which two processes acquire competing locks in such
                   a way that neither process can complete. The Lock Manager throws a 1205 error and
                   selects one of the processes as a deadlock victim.


            Lesson Review
            The following question is intended to reinforce key information presented in Lesson 6,
            “Resolving Blocking and Deadlocking Issues.” The question is also available on the companion
            CD if you prefer to review it in electronic form.


   NOTE    ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or
   incorrect are located in the “Answers” section at the end of the book.


              1.   Which of the following are used to locate blocked processes? (Choose all that apply.)
                   A. sys.dm_exec_sessions view
                   B. sys.dm_exec_requests view
                   C. sys.dm_os_waiting_tasks view
                   D. sp_who2 system stored procedure




Chapter Review
To practice and reinforce the skills you learned in this chapter further, you can perform the
following tasks:
       Review the chapter summary.
       Review the list of key terms introduced in this chapter.
       Complete the case scenario. This scenario sets up a real-world situation involving the
       topics in this chapter and asks you to create a solution.
       Complete the suggested practices.
       Take a practice test.


Chapter Summary
       SQL Server Profiler provides an interface to the SQL Trace API, which exposes events
       that occur within the database engine so that you can capture information about the
       current operational state of an instance.
       System Monitor allows you to capture performance counters that can be correlated to
       SQL Trace output within Profiler that provides hardware state context to events that
       have been captured.
       Failures can occur at many levels: hardware, service accounts, and configuration. The
       most common causes of a service not being able to start are related to permissions.
       The most common cause of failures for active databases is running out of disk space.
       Deadlocks are a transient issue of competing blocks that should be trapped by an
       application, which then resubmits the command to be executed.


Key Terms
Do you know what these key terms mean? You can check your answers by looking up the
terms in the glossary at the end of the book.
       Counter log
       Deadlock
       Isolation level
       Lock escalation
       SQL Trace
       Trace Event


Case Scenario
In the following case scenario, you apply what you’ve learned in this chapter. You can find
answers to these questions in the “Answers” section at the end of this book.


Case Scenario: Designing an Automation Strategy for Coho Vineyard
          BACKGROUND
          Company Overview
          Coho Vineyard was founded in 1947 as a local, family-run winery. Due to the award-winning
wines it has produced over the last several decades, Coho Vineyard has experienced significant
growth. To continue expanding, the company acquired several existing wineries over the years. Today,
          the company owns 16 wineries; 9 wineries are in Washington, Oregon, and California, and the
          remaining 7 wineries are located in Wisconsin and Michigan. The wineries employ 532 people,
          162 of whom work in the central office that houses servers critical to the business. The company
          has 122 salespeople who travel around the world and need access to up-to-date inventory
          availability.

          Planned Changes
          Until now, each of the 16 wineries owned by Coho Vineyard has run a separate Web site locally
          on the premises. Coho Vineyard wants to consolidate the Web presence of these wineries
          so that Web visitors can purchase products from all 16 wineries from a single online store.
          All data associated with this Web site will be stored in databases in the central office.
             To meet the needs of the salespeople until the consolidation project is completed, inventory
          data at each winery is sent to the central office at the end of each day. Merge replication has
          been implemented to allow salespeople to maintain local copies of customer, inventory, and
          order data.

          EXISTING DATA ENVIRONMENT
          Databases
          Each winery presently maintains its own database to store all business information. At the
          end of each month, this information is brought to the central office and transferred into the
          databases shown in Table 12-5.

          TABLE 12-5 Coho Vineyard Databases

           DATABASE                               SIZE

           Customer                               180 megabytes (MB)
           Accounting                             500 MB
           HR                                     100 MB
           Inventory                              250 MB
           Promotions                             80 MB

             After the database consolidation project is complete, a new database named Order will
          serve as a data store to the new Web store. As part of their daily work, employees also
          will connect periodically to the Order database using a new in-house Web application.
             The HR database contains sensitive data and is protected using Transparent Data
          Encryption (TDE). In addition, data in the Salary table is encrypted using a certificate.

Database Servers
A single server named DB1 contains all the databases at the central office. DB1 is running SQL
Server 2008 Enterprise on Windows Server 2003 Enterprise.
   The chief technology officer (CTO) is considering buying a new machine to replace DB1
because users are complaining about performance issues on a sporadic basis. In addition,
several of the wineries have been reporting device activation errors and even a few blue
screens on the server running their local SQL Server instances.

Business Requirements
You need to design an archiving solution for the Customer and Order databases. Your archival
strategy should allow the Customer data to be saved for six years.
   To prepare the Order database for archiving procedures, you create a partitioned table
named Order.Sales. Order.Sales includes two partitions. Partition 1 includes sales activity for
the current month. Partition 2 is used to store sales activity for the previous month. Orders
placed before the previous month should be moved to another partitioned table named
Order.Archive. Partition 1 of Order.Archive includes all archived data. Partition 2 remains
empty.
   A process needs to be created to load the inventory data from each of the 16 wineries by
4 A.M. daily.
   Four large customers submit orders using Coho Vineyards Extensible Markup Language
(XML) schema for Electronic Data Interchange (EDI) transactions. The EDI files arrive by 5 P.M.
and need to be parsed and loaded into the Customer, Accounting, and Inventory databases,
which each contain tables relevant to placing an order. The EDI import routine is currently a
single threaded C++ application that takes between three and six hours to process the files.
You need to finish the EDI process by 5:30 P.M. to meet your Service Level Agreement (SLA)
with the customers. After the consolidation project has finished, the EDI routine loads all data
into the new Order database.
   There have been reports of massive contention on the central SQL Server while data is
being imported during the nightly consolidation process. You need to reduce or eliminate the
contention.
   You need to back up all databases at all locations. You can lose a maximum of five minutes
of data under a worst-case scenario. The Customer, Account, Inventory, Promotions, and Order
databases can be off-line for a maximum of 20 minutes in the event of a disaster. Data older
than six months in the Customer and Order databases can be off-line for up to 12 hours in the
event of a disaster.
   Answer the following questions.
  1.   How do you determine the cause of performance issues?
  2.   How can you reduce or eliminate the blocking problems during the nightly
       consolidation run?
  3.   How do you troubleshoot the errors that are occurring at the wineries?


Suggested Practices
          To help you master the exam objectives presented in this chapter, complete the following
          tasks.


          Creating a Trace Using SQL Server Profiler to Diagnose
          Performance and Deadlock Issues
                   Practice 1 Create a trace to capture deadlock graphs and set the trace to start
                   automatically when SQL Server starts.
                   Practice 2 Create a trace to capture query performance statistics that can be used to
                   produce a comparison baseline for your instance.


          Create a Counter Log Using System Monitor to Diagnose
          Performance, Deadlock, and System Issues
                   Practice 1 Create a counter log that captures hardware and SQL Server counters that
                   can be used to produce a performance baseline for the machine.
                   Practice 2 Create a counter log to capture events that allow you to diagnose disk
                   space and hardware errors.



          Take a Practice Test
          The practice tests on this book’s companion CD offer many options. For example, you can test
          yourself on just one exam objective, or you can test yourself on all the 70-432 certification
          exam content. You can set up the test so that it closely simulates the experience of taking
          a certification exam, or you can set it up in study mode so that you can look at the correct
          answers and explanations after you answer each question.


             MORE INFO          PRACTICE TESTS
             For details about all the practice test options available, see the section entitled “How to
             Use the Practice Tests,” in the Introduction to this book.




CHAPTER 13


Optimizing Performance
Entire books, hundreds of webcasts, thousands of conference sessions, hundreds of
seminars, dozens of training classes, and tens of thousands of pages spread across
hundreds of Web sites have been devoted to helping you optimize Microsoft SQL Server
performance. Just about all of these resources assume that you already know where the
performance problems are and provide recipes for fixing performance issues that you have
already identified. Anyone handed the task of optimizing SQL Server performance knows
that the biggest challenge is finding the performance issue in the first place. Chapter 12
already covered two tools, SQL Server Profiler and System Monitor, which are invaluable
in optimizing performance. In this chapter, you learn about the rest of the tools available
within SQL Server 2008 that enable you to find performance bottlenecks.


Exam objectives in this chapter:
    Implement Resource Governor.
    Use the Database Engine Tuning Advisor.
    Collect performance data by using Dynamic Management Views (DMVs).
    Use Performance Studio.

Lessons in this chapter:
    Lesson 1: Using the Database Engine Tuning Advisor

    Lesson 2: Working with Resource Governor

    Lesson 3: Using Dynamic Management Views and Functions
    Lesson 4: Working with the Performance Data Warehouse



Before You Begin
To complete the lessons in this chapter, you must have:
      SQL Server 2008 installed
      The AdventureWorks database installed within the instance




REAL WORLD
                   Michael Hotek



                   I frequently hear the claim that performance tuning is an “art.” People who say that
                       are usually trying to convince you that performance tuning is an “art” so that they
                   can sell you a piece of software or consulting services to fix your problems. After
                   all, we’re all taught that it takes special talents, which only a few people have, to
                   be an “artist.” Taking a blank piece of cloth, paint, and a brush and turning out a
                   painting—that is art. Taking a hammer and chisel to a hunk of rock and producing
                   a statue is art. Performance tuning has about as much in common with art as a piece
                   of fruit has in common with a car.

                   SQL Server is a bunch of instructions executed by a computer. The code within SQL
                   Server defines the output that the computer produces based on the input received.
                   Due to the simple fact that every computer system is based on binary math, one
                   input produces exactly one output. The same input always produces exactly the
                   same output. The code also limits the range of possible inputs that are valid. So
                   long as your requests are within the defined limits of the computer program, you
                   receive an answer. The task of performance tuning is simply a matter of knowledge.
                   The better you understand the rules and structures within the computer code that
                   makes up SQL Server, the better you are able to construct requests such that the
                   fewest possible resources are used.

                   Performance tuning only has one real principle, which is rooted in mathematics: the
                   shortest path between two points is a straight line.

                   If your code reads more data than is necessary, your application runs more slowly
                   than if you read only the data that is needed. If your code makes two passes
                   through the data before returning a result, it runs more slowly than code that
                   makes only one pass through the data.

                   Your challenge in performance tuning is to first find the code that you are telling
                   SQL Server to execute that doesn’t take a straight-line, single pass through the data
                   or manipulates more data than is necessary. After you find it, you then apply your
                   knowledge of the way SQL Server works to rewrite the request such that it goes in a
                   straight line, manipulating the least amount of data possible, and only takes a single
                   pass through the data to do so.




Lesson 1: Using the Database Engine Tuning Advisor
The Database Engine Tuning Advisor (DTA) is designed to evaluate your queries against
the rules in the Query Optimizer to make suggestions that can improve performance. In
this lesson, you learn how to build a workload file and then use DTA to analyze the query
workload to determine how you might be able to improve the performance.


       After this lesson, you will be able to:
          Configure DTA to analyze a workload
          Save DTA recommendations

       Estimated lesson time: 20 minutes



Database Engine Tuning Advisor
DTA works in conjunction with SQL Trace output. First, a trace is captured that contains the
queries that you want DTA to analyze. The trace output is read and evaluated by DTA against
a database. The recommendations that DTA can make are:
       Adding indexes
       Dropping indexes
       Partitioning tables
       Storage aligning tables
   The source of a DTA workload can be a trace file, Transact-SQL (T-SQL) script, or a table
that contains T-SQL commands. Although Profiler is capable of capturing a wide range of
events, the only events that DTA is concerned about are:
       RPC:Starting
       RPC:Completed
       SQL:BatchStarting
       SQL:BatchCompleted
   An analysis is accomplished with four steps:
  1.   Generate a workload for analysis.
  2.   Start DTA and connect to a server running SQL Server that contains a database to
       analyze the workload against.
  3.   Select the workload to use.
  4.   Specify tuning options.
   Let’s take a look at the DTA analysis steps and options that you can specify. Start DTA so
that you can configure an analysis session, as shown in Figure 13-1.


FIGURE 13-1 Creating a DTA analysis session


               Analysis within DTA is performed in a session. Each session must have a name and is saved so
            that you can review the results at a later date. You should give each session a unique name that
            helps you remember what the analysis was for, as well as when the analysis was executed.
                After specifying the session name, you need to select the workload options. The most
            common way of performing an analysis is with a file which either contains the output of
            a trace or contains one or more T-SQL commands.


   BEST PRACTICES    AUTOMATING ANALYSIS
   DTA is the graphical utility with which you interact. You can also use the command-line
   utility dta.exe. You can configure a trace using code, which can be executed from a SQL
               Server Agent job. The job can import a trace file into a table once the trace is complete.
               Because DTA can use a table as a workload source, you can set up a job step to start a DTA
               analysis run against the trace data that you just imported. Instead of spending your time
               clicking through GUIs, you can leave the analysis to the computer.
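
   For example, an unattended analysis run from the command line might look like the following
   sketch (the server, database, workload file, and session names are all placeholders):

   dta -S MyServer -E -D AdventureWorksTest -if workload.sql -s NightlyTuningSession

   The -E switch uses Windows authentication, -if names the workload file, and -s names the
   tuning session so that you can review the results later.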




In the Workload section, you select the database to be used for the workload analysis. The
database selected for workload analysis is used as the basis for any tuning recommendations.
  The bottom section allows you to select the databases and tables that you want to tune.
Queries that reference objects that are not selected for tuning are ignored by DTA during analysis.
   After you have specified all the general options, click the Tuning Options tab, as shown in
Figure 13-2.




FIGURE 13-2 Specifying tuning options


   There are four groups of tuning options:
       Time limitations and online actions
       Existing structures in the database
       Partitioning options
       Whether to retain existing structures in the database
   If you limit the tuning time, DTA analyzes as many of the queries in the workload as
possible within the required time frame. Any queries still remaining in the workload when the
tuning time expires are not analyzed. You can also use the Advanced Tuning Options dialog
box to limit the space consumed by recommendations, as well as specify whether changes
recommended need to be implemented online or off-line, as shown in Figure 13-3.



FIGURE 13-3 Advanced tuning options


               DTA makes index and indexed view recommendations based on your settings for the
            Physical Design Structures (PDS) To Use In Database section. The most common setting is
            to recommend indexes only, which include both clustered and nonclustered indexes. The
            Evaluate Utilization Of Existing PDS only option can be used to locate indexes and indexed
            views that can be removed because they are not being used.
               If you specify a partitioning strategy, all recommendations are suggested using either full
            partitioning or an aligned partition strategy. The options in the Physical Design Structures
(PDS) To Keep In Database section enable you to define whether recommendations have
            to consider existing index and partitioning structures in the database or whether existing
            structures can be removed as part of the recommendations.


   CAUTION    DTA PERFORMANCE IMPACT
               DTA analyzes the cost of a specified query against each possible recommendation. Query cost
               is generated by the query optimizer based on distribution statistics. To receive an accurate
               query cost, DTA generates statistics in the database being used for workload analysis before
               submitting a request to the optimizer. The creation and destruction of statistics by DTA can
               place a very heavy load on the database being analyzed. Therefore, you must be very careful
   if you are running DTA against a production database; in almost all cases, you want to use
               DTA against a test system instead.


               After you have specified your desired tuning options, you can start analysis by clicking Start
            Analysis on the toolbar. At the completion of an analysis run, DTA presents recommendations
            that are complete with the command(s) necessary to implement each recommendation, as
            shown in Figure 13-4.




FIGURE 13-4 Tuning recommendations


    You can also review a variety of reports related to the queries tuned that can tell you the
following:
       Estimated percentage improvement
       Frequency of each query within the workload
       Query cost statistics
       Detailed report of current indexes in the analyzed database


   EXAM TIP
   You need to know how each of the tuning options affect the recommendations that
   DTA makes.




       Quick Check
             What are the valid input sources for DTA to analyze?

       Quick Check Answer
             DTA can analyze queries and stored procedures that are stored in either a file or a
             table. The most common tuning source for DTA is a trace output file.




 PRACTICE        Analyzing a Query Workload

            In this practice, you build a workload file that can be used by DTA to make recommendations
            to improve performance.
              1.    If it doesn’t already exist, create a new database named AdventureWorksTest.
              2.    Execute the following command to generate a testing table:

                    USE AdventureWorksTest
                    GO
                    CREATE SCHEMA Person AUTHORIZATION dbo
                    GO


                    SELECT * INTO AdventureWorksTest.Person.Address
                    FROM AdventureWorks.Person.Address

               3.   Save the following code to a file:

                    USE AdventureWorksTest
                    GO
                    SELECT AddressLine1, AddressLine2, City, PostalCode
                    FROM Person.Address
                    WHERE City = 'Dallas'
                    GO
                    SELECT AddressLine1, AddressLine2, City, PostalCode
                    FROM Person.Address
                    WHERE City LIKE 'S%'
                    GO
                    SELECT AddressLine1, AddressLine2, City, PostalCode
                    FROM Person.Address
                    WHERE PostalCode = '75201'
                    GO

              4.    Start DTA and connect to the instance containing your AdventureWorksTest database.
               5.   Give your tuning session a name.
              6.    Select the file that you created in step 3 and specify the AdventureWorksTest database
                    for the workload analysis.
               7.   Select the AdventureWorksTest database and Person.Address table to tune.
              8.    Click the Tuning Options tab.
               9.   In the Physical Design Structures (PDS) To Use In Database section, select the Indexes
                    option if necessary.
             10.    In the Partitioning Strategy To Employ section, select the No Partitioning option.
             11.    In the Physical Design Structures (PDS) To Keep In Database section, select the Keep All
                    Existing PDS option if necessary.
             12.    Start the analysis.
             13.    Review the recommendations, along with each of the analysis reports available.

Lesson Summary
       DTA is used to analyze a query workload against a database to make recommendations
       on structures to create or drop, which might improve performance.
       You can use either a file or a table as the workload source.
       DTA creates statistics in the analysis database and then submits a request to the Query
       Optimizer to evaluate the query cost and determine if an improvement has been
       made.


Lesson Review
The following question is intended to reinforce key information presented in Lesson 1, “Using
the Database Engine Tuning Advisor.” The question is also available on the companion CD if you
prefer to review it in electronic form.


   NOTE    ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or
   incorrect are located in the “Answers” section at the end of the book.


  1.   What types of workloads can DTA use for analysis? (Choose all that apply.)
       A. A T-SQL script
       B. A trace file containing Extensible Markup Language (XML) showplans
       C. A trace file containing RPC:Completed events
       D. A trace file containing SP:StmtCompleted events




Lesson 2: Working with Resource Governor
            Resource Governor allows you to limit the CPU and memory allocated to or used by a specific
            connection or group of users. In this lesson, you learn how to configure Resource Governor to
            maximize the resource allocation based on business workload priorities.


       After this lesson, you will be able to:
                      Create a resource pool
                      Create a workload group
                      Create a classifier function
                      Evaluate resource utilization of a resource pool
                      Evaluate resource utilization of a workload group

                   Estimated lesson time: 20 minutes



            Resource Governor
            Resource Governor works with three components:
                   Resource pools
                   Workload groups
                   Classification functions
               When Resource Governor is activated, processing within SQL Server adheres to the process
            shown in Figure 13-5.


   NOTE    CONNECTION CLASSIFICATION
               Classification occurs at the time a connection is created. Therefore, the only way you can
               limit resources is based on properties of the connection. You cannot limit the resource
               consumption of individual queries or even types of queries.


                The resources that can be managed by Resource Governor are CPU and memory.
            Although you can limit the resources made available to a workload group, any request that
            is currently executing is not limited, and you cannot place limitations on internal SQL Server
            operations.
               A workload group is just a name that is associated to a user session. A workload group
            doesn’t define a query workload but rather a login that is executing the queries. Workload
            groups are just labels that you associate to a connection when it is created so that Resource
            Governor can assign the connection to the appropriate resource pool.




FIGURE 13-5 Resource Governor processing: connections pass through the classification function, are assigned to a workload group, and each workload group is mapped to a resource pool


   A classifier function is a function that you create in the master database. Only one
classifier function can be active for Resource Governor at a time. The function cannot have
any input parameters and is required to return a scalar value. The value that is returned is
the name of the workload group that the session should be classified into. The function can
contain any code that is valid for a function, but you should minimize the amount of code
in any classification function. Because the classifier function executes after authentication
but before a connection handle is returned to the user’s application, any performance issues
in your classification function affect connection times to the server running SQL Server and
could potentially lead to connection time out issues in an application.


   NOTE    DEFAULT RESOURCE POOLS
   If a classifier function is not associated to Resource Governor, or the classifier function
   does not exist, returns NULL, or returns a nonexistent workload group, the user session is
   associated to the default resource pool.


   A connection can belong to only a single workload group, but multiple connections can be
classified to the same workload group. Each workload group is assigned to a single resource
pool, but multiple workload groups can be assigned to the same resource pool.
   A resource pool defines the minimum and maximum CPU, memory, or both allocated to
a resource pool. The minimum value designates the lowest guaranteed amount of a resource


that is available to the resource pool. Each resource pool can be configured with a minimum
value, but the total of the minimum values across all resource pools cannot exceed 100. The
            maximum value places an upper bound on the amount of a resource that can be allocated to
            a workload group associated to a resource pool.


     NOTE    RESOURCE ALLOCATION
                 All connections running within a resource pool are treated with equal weight, and SQL
                 Server balances the resources available to the resource pool across all requests currently
                 executing within the pool.


               Although a classification function can be created at any time, you should create the
            classification function as the last step in a Resource Governor implementation. When
            associated to Resource Governor, the classification function is executed for every new
            connection to your instance.
                 You implement Resource Governor using the following steps:
                 1.   Enable Resource Governor.
                 2.   Create one or more resource pools.
                 3.   Create one or more workload groups.
                 4.   Associate each workload group to a resource pool.
                 5.   Create and test a classifier function.
                 6.   Associate the classifier function to Resource Governor.
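
   Steps 1 through 4 can also be performed in T-SQL. The following is a minimal sketch; the pool
   and group names are hypothetical, and the limits shown are illustrative only:

   CREATE RESOURCE POOL ReportPool
       WITH (MIN_CPU_PERCENT = 0, MAX_CPU_PERCENT = 30)
   GO
   CREATE WORKLOAD GROUP ReportGroup
       USING ReportPool
   GO
   ALTER RESOURCE GOVERNOR RECONFIGURE
   GO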


                 EXAM TIP
                 You need to know the resources that Resource Governor can control, as well as how to test
                 and troubleshoot a classification function.


               You can use the following views to return information about the Resource Governor
            configuration:

            SELECT * FROM sys.resource_governor_resource_pools
            SELECT * FROM sys.resource_governor_workload_groups
            SELECT * FROM sys.resource_governor_configuration
            GO

               When Resource Governor is active, the group_id column of sys.dm_exec_sessions is the ID
            of the workload group to which the session is assigned.


          Quick Check
          1. What are the objects that are used for a Resource Governor implementation?

                      2. What resources can Resource Governor control?




Quick Check Answers
        1. Resource Governor relies on a user-defined classifier function in the master
          database to assign a connection to a workload group. Each workload group is
          assigned to a resource pool that manages CPU and memory resources.

       2. Resource Governor can be used to manage CPU and memory resources.



 PRACTICE         Implementing Resource Governor

In this practice, you implement Resource Governor and test the effect on user sessions.

PRACTICE 1        Creating a Resource Pool
In this practice, you create two resource pools that will be used to guarantee CPU availability
for groups of users.
  1.   In SSMS, connect to your instance in the Object Explorer.
  2.   Expand Management, Resource Governor, right-click Resource Pools, and select New
       Resource Pool.
  3.   Select the Enable Resource Governor check box.
  4.   Create a pool named Executives and set the minimum CPU to 20%.
  5.   Create a pool named Customers and set the minimum CPU to 50%.
  6.   Create a pool named AdHocReports and set the minimum CPU to 0%.




  7.   Click OK.


PRACTICE 2         Assigning a Workload Group
            In this practice, you create workload groups that will be used to segment user requests into
            the appropriate resource pools.
  1.    Expand the AdHocReports node in Object Explorer, right-click Workload Groups, and
                    select New Workload Group.
              2.    Create an AdHocReportGroup for the AdHocReports resource pool, as shown here.




               3.   Create a CustomerGroup for the Customers resource pool.
              4.    Create an ExecutiveGroup for the Executives resource pool.
               5.   Click OK.

PRACTICE 3         Creating a Classifier Function
            In this practice, you create a classifier function and then test the workload classification and
            assignment to a resource pool.
              1.    Execute the following code in the master database:

                    CREATE FUNCTION dbo.fn_ResourceGovernorClassifier()
                    RETURNS sysname
                    WITH SCHEMABINDING
                    AS
                    BEGIN
                         DECLARE @group    sysname
                         --Workload group name is case sensitive,
                         --     regardless of server setting




IF SUSER_SNAME() = 'Executive'
                 SET @group = 'ExecutiveGroup'
             ELSE IF SUSER_SNAME() = 'Customer'
                 SET @group = 'CustomerGroup'
             ELSE IF SUSER_SNAME() = 'AdHocReport'
                 SET @group = 'AdHocReportGroup'
             ELSE
                 SET @group = 'default'


             RETURN @group
       END
       GO

  2.   Execute the following code to associate the classifier function to Resource Governor:

        ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_ResourceGovernorClassifier)
       GO

  3.   Execute the following code to make the classifier function active:

       ALTER RESOURCE GOVERNOR RECONFIGURE
       GO


PRACTICE 4         Testing Resource Governor
In this practice, you build a simple test script to be executed and monitor the operation of the
resource pools using System Monitor.
  1.   Open a new query window, log in as an administrator, and execute the following code
       to create three logins for testing:

       CREATE LOGIN Executive WITH PASSWORD = '<InsertStrongPasswordHere>'
       GO
       CREATE LOGIN Customer WITH PASSWORD = '<InsertStrongPasswordHere>'
       GO
       CREATE LOGIN AdHocReport WITH PASSWORD = '<InsertStrongPasswordHere>'
       GO

  2.   Open a new query window and log in using the Customer login.
  3.   Open a new query window and log in using the Executive login.
  4.   Open a new query window and log in using the AdHocReport login.
  5.   Execute the following query within the query window where you are connected as an
       administrator to verify the workload group assigned to each connection:

       SELECT b.name WorkloadGroup, a.login_name, a.session_id
       FROM sys.dm_exec_sessions a INNER JOIN sys.dm_resource_governor_workload_groups b
             ON a.group_id = b.group_id
       WHERE b.name != 'internal'




6.    Enter the following query in each of the query windows:

                    SET NOCOUNT ON
                    DECLARE @var      INT


                    SET @var = 1


                    WHILE @var < 10000000
                    BEGIN
                          SELECT @@VERSION
                          SET @var = @var + 1
                    END
                    GO

               7.   Because you don’t care about the actual results returned by the query, for the Customer,
                    Executive, and AdHocReports query windows, select Query, Query Options. In the left
                    pane of the Query Options dialog box, select the Grid node and then select the Discard
                    Results After Execution check box.
              8.    Start System Monitor, remove all the default counters, and add in the counter
                    SQLServer:Workload Group Stats: CPU Usage % for the AdHocReportGroup,
                    CustomerGroup, and ExecutiveGroup instances, as shown here.




               9.   Start a second instance of System Monitor, remove all the default counters, and add in
                    the counter for the SQLServer:Resource Pool Stats:CPU usage % for the AdHocReports,
                    Customers, and Executives instances, as shown on the following page.




10.   Execute the script in the Customer query window and observe the graphs in System
      Monitor.
11.   Execute the script in the Executive query window and observe the graphs in System
      Monitor.
12.   Execute the script in the AdHocReports query window and observe the graphs in
      System Monitor, as shown here.




13.    Switch to the Customer, Executive, and AdHocReports query windows and stop the
                    query execution.
             14.    Limit the resource usage by the AdHocReports pool by executing the following code:

                    ALTER RESOURCE POOL AdHocReports
                         WITH (MAX_CPU_PERCENT = 5)
                    ALTER RESOURCE POOL AdHocReports
                         WITH (MIN_CPU_PERCENT = 0)
                    ALTER RESOURCE POOL Executives
                         WITH (MAX_CPU_PERCENT = 20)
                    ALTER RESOURCE POOL Executives
                         WITH (MIN_CPU_PERCENT = 20)
                    ALTER RESOURCE POOL Customers
                         WITH (MAX_CPU_PERCENT = 75)
                    ALTER RESOURCE POOL Customers
                         WITH (MIN_CPU_PERCENT = 50)
                    ALTER RESOURCE GOVERNOR RECONFIGURE
                    GO

             15.    Execute the script in the Customer query window and observe the graphs in System
                    Monitor.
             16.    Execute the script in the Executive query window and observe the graphs in System
                    Monitor.
              17.   Execute the script in the AdHocReports query window and observe the graphs in
                    System Monitor, as shown on the following page.




CAUTION    LIMITING RESOURCES
Even with Resource Governor enabled, SQL Server still seeks to maximize the resources
available to all concurrently executing requests. If you set a maximum limit for a resource
pool, connections assigned to the resource pool can use more resources than the
configured maximum. If other sessions that are executing do not need all the resources,
any amount of free resource is allowed to be used by any session, even if that causes the
session to exceed the resource limits of its assigned resource pool.




Lesson Summary
                   Resource Governor is used to limit the CPU, memory, or both allocated to one or more
                   connections.
                   Connections are assigned to a workload group using a classifier function.
                   A workload group is assigned to a resource pool.
                   Resource pools are configured to manage CPU and memory resources.


            Lesson Review
            The following question is intended to reinforce key information presented in Lesson 2,
            “ Working with Resource Governor.” The question is also available on the companion CD if
            you prefer to review it in electronic form.


   NOTE    ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or incorrect
   are located in the “Answers” section at the end of the book.


              1.   You are the database administrator at Coho Vineyards. Following the consolidation of all
                   the wineries’ inventory, customer, and order databases, the marketing group wants to be
                   able to run ad hoc queries for analysis purposes. Users are allowed to execute any query
                   that they can construct, regardless of the impact it might have on the performance of
                   the database. Unfortunately, the same databases are being used to create and process
                   customer orders. Management does not want to restrict the queries that marketing can
                   execute, but it wants you to ensure that customer orders can be created and processed
                   in a timely fashion. What can be used to limit the impact of marketing queries to ensure
                   customer orders are processed?
                   A. Configure the max degree of parallelism option.
                   B. Implement Resource Governor.
                   C. Configure the query governor cost threshold.
                   D. Limit the memory utilization for marketing users.




Lesson 3: Using Dynamic Management Views
and Functions
Dynamic management views (DMVs) and dynamic management functions (DMFs) provide
the instrumentation infrastructure that allows database administrators to retrieve system
information as well as monitor, diagnose, and fix problems. In this lesson, you learn about the
basic DMVs and DMFs available to optimize performance.


   NOTE    TERMINOLOGY CONVENTIONS
   For simplicity, the entire set of instrumentation code that is available within SQL Server
   is referred to collectively as DMVs, regardless of whether you are using a view or
   a function.




      After this lesson, you will be able to:
           Understand the categories of DMVs and DMFs
           Identify important performance and monitoring DMVs and DMFs

      Estimated lesson time: 20 minutes



DMV Categories
DMVs are all stored in the sys schema and can be grouped into several dozen broad categories.
Because DMVs use a standard naming scheme and the names used are very descriptive,
separating the categories is reasonably straightforward. Table 13-1 lists the most important
DMV categories that are used for performance tuning.

TABLE 13-1 DMV Categories

 DMV PREFIX         GENERAL PURPOSE

 dm_db_*            General database space and index utilization
 dm_exec_*          Statistics for queries that are executing, as well as queries that have
                    completed and still have plans in the query cache
 dm_io_*            Disk subsystem statistics
 dm_os_*            Statistics related to the use of hardware resources




Database Statistics
            The most common DMVs used to gather database statistics are:
                   sys.dm_db_index_usage_stats
                   sys.dm_db_index_operational_stats
                   sys.dm_db_index_physical_stats
                   sys.dm_db_missing_index_groups
                   sys.dm_db_missing_index_group_stats
                   sys.dm_db_missing_index_details
                Indexes are created to improve performance of specific queries. However, you also
            have maintenance overhead for each index when data modifications are made. Having an
            insufficient number of indexes can cause performance problems, as can having indexes that
            are not used or used very infrequently. The sys.dm_db_index_usage_stats view contains the
            number of times (and the last time) each index was used to satisfy a seek, scan, or lookup, as
            well as the number of times and the last time an update was performed to each index. If an
            index does not have any seeks, scans, or lookups, the index is not being used. If an index has
            not been used to satisfy a seek, scan, or lookup for a significant amount of time, SQL Server
            is no longer using the index.
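   For example, a query along the following lines (a minimal sketch, not taken from the
companion samples; the join to sys.indexes is only there to retrieve the index name) lists
indexes in the current database that are being maintained by updates but have never
satisfied a seek, scan, or lookup since the instance started:

SELECT OBJECT_NAME(ius.object_id) AS table_name,
       i.name AS index_name,
       ius.user_updates
FROM sys.dm_db_index_usage_stats AS ius
INNER JOIN sys.indexes AS i
    ON ius.object_id = i.object_id AND ius.index_id = i.index_id
WHERE ius.database_id = DB_ID()
    AND ius.user_seeks = 0
    AND ius.user_scans = 0
    AND ius.user_lookups = 0
ORDER BY ius.user_updates DESC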
   Sys.dm_db_index_operational_stats is a function that takes four optional parameters:
database_id, object_id, index_id, and partition_number. This function returns locking, latching, and
            access statistics for each index that can help you determine how heavily an index is being
            used. This function also helps you diagnose contention issues due to locking and latching.
    Sys.dm_db_index_physical_stats is a function that takes five optional parameters: database_id,
object_id, index_id, partition_number, and mode. The function returns size and fragmentation
            statistics for each index and should be the primary source for determining when an index
            needs to be defragmented.
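   As an illustration (a sketch only; the 30 percent and 1,000-page thresholds are arbitrary
example values, not recommendations made in this lesson), the following query uses the
LIMITED scan mode to return the most fragmented indexes in the current database:

SELECT OBJECT_NAME(ps.object_id) AS table_name,
       ps.index_id,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
WHERE ps.avg_fragmentation_in_percent > 30
    AND ps.page_count > 1000
ORDER BY ps.avg_fragmentation_in_percent DESC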
                 When you submit a query to SQL Server, the query is parsed and optimized to determine
            the most efficient way to satisfy the query. The execution plan generated is then used
            to execute the query. Even if a table does not have an index, SQL Server still keeps basic
            distribution statistics for each column in the table. Therefore, even in the absence of an index,
the Optimizer can determine whether an index would have been beneficial in satisfying the query.
            If you have enabled the AUTO_CREATE_STATISTICS option and the Optimizer determines that
            it is beneficial to do so, statistics are automatically generated that can subsequently be used
            by queries for improved performance.
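   If the option is not already enabled, you can turn it on with a statement such as the
following (AdventureWorksTest is used here only because it is the sample database referenced
in this chapter’s practices):

ALTER DATABASE AdventureWorksTest SET AUTO_CREATE_STATISTICS ON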
    When the Optimizer determines that an index would be beneficial but does not exist,
it creates a situation referred to as an index miss. Although an index miss automatically
generates column statistics when you have enabled automatic creation, an actual index is
still more efficient for satisfying the query.
   One of the most interesting categories available in SQL Server 2008 is the sys.dm_db_
missing_index_* views. When an index miss is generated, SQL Server logs the details of the



index miss, which can then be viewed using the sys.dm_db_missing_index_* views. The
sys.dm_db_missing_index_details view describes the columns that a suggested index should
contain, and the sys.dm_db_missing_index_group_stats view contains aggregate statistics on
how many times an index miss was generated for a given index possibility, which you can
use to evaluate whether you should create the index.


   BEST PRACTICES   DETERMINING WHICH INDEXES TO CREATE
   The missing index DMVs can list multiple permutations of a group of columns to use for
   index creation. Each unique combination of columns producing an index miss generates
   an entry. By applying some basic aggregations across the data in the missing index views,
   you can determine which indexes are more beneficial. Keep in mind that the cost of
   maintaining an index is not included in the aggregation. The query that you can use to
   calculate a “usefulness factor” is:

   SELECT *
   FROM (SELECT user_seeks * avg_total_user_cost * (avg_user_impact * 0.01)
             AS index_advantage, migs.*
         FROM sys.dm_db_missing_index_group_stats AS migs) AS migs_adv
   INNER JOIN sys.dm_db_missing_index_groups AS mig
        ON migs_adv.group_handle = mig.index_group_handle
   INNER JOIN sys.dm_db_missing_index_details AS mid
        ON mig.index_handle = mid.index_handle
   ORDER BY migs_adv.index_advantage

   If the index advantage reaches 10,000, you have an index that can provide a significant
   impact on query performance, but you need to balance the performance benefit against
   the additional maintenance overhead. If the index advantage exceeds 50,000, the benefit
   of creating the index far outweighs any maintenance required due to data manipulation
   activities.



Query Statistics
The sys.dm_exec_* DMVs return information related to connections to the instance, as well as
query execution.
   The following DMVs return information about connections and actively executing requests:
       sys.dm_exec_connections
       sys.dm_exec_sessions
       sys.dm_exec_requests
   Sys.dm_exec_connections contains one row for each connection to the instance. Within
this view, you can find out when the connection was made along with connection properties
and encryption settings. This view also tells you the total number of reads and writes for the
connection, as well as the last time a read or write was executed.



Sys.dm_exec_sessions contains a row for each currently authenticated session. In addition
            to the login information, this DMV also tracks the current state of each possible query option
            and the current execution status. This DMV also returns the accumulated reads, writes, CPU,
            and query execution duration for the session.
               Sys.dm_exec_requests contains one row for each currently executing request in the instance.
            You can use the blocking_session_id column to diagnose contention issues. This DMV also
            contains the start time, elapsed time, estimated completion time, reads, writes, and CPU for
            the request. In addition, you can retrieve the database and command being executed, along
            with handles for the SQL statement and query plan associated with the request.
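   For example, a minimal sketch of a blocking check, using only the columns just described,
might look like the following; each row returned is a request that is currently blocked by
another session:

SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       r.command,
       DB_NAME(r.database_id) AS database_name
FROM sys.dm_exec_requests AS r
WHERE r.blocking_session_id <> 0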
    Each query that is executed has to be tokenized and compared to the query cache. If
a match is found, the Optimizer uses the cached query plan to execute the query. If a match
is not found, the query is compiled and the resulting query plan is written into the query cache.
               The sys.dm_exec_query_stats DMV contains detailed statistics on the performance and
            resources consumed for every query in the query cache. This DMV lists the last time the query
            was executed and how many times the query was executed, along with the minimum and
            maximum execution time, logical/physical reads/writes/CPU, and a handle to the query plan
            generated by the Optimizer.
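   As a sketch (the TOP (10) cutoff is an arbitrary example), the following query combines
sys.dm_exec_query_stats with the sys.dm_exec_sql_text function described next to list the
cached queries that have consumed the most CPU:

SELECT TOP (10)
       qs.execution_count,
       qs.total_worker_time / qs.execution_count AS avg_cpu_time,
       qs.total_elapsed_time / qs.execution_count AS avg_duration,
       st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC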
               SQL Server stores the query plan and text of each query executed in the query cache,
identified by a unique value called the handle. The sys.dm_exec_sql_text function returns the
text of the SQL statement associated with the handle that was passed in. The sys.dm_exec_query_
plan function accepts a plan handle and returns the corresponding XML showplan.
              If you wanted to return the query and XML showplan for all currently executing queries,
            you could use the following statement:
            SELECT * FROM sys.dm_exec_requests CROSS APPLY sys.dm_exec_query_plan(plan_handle)
                   CROSS APPLY sys.dm_exec_sql_text(sql_handle)

               The following command could be used to return the SQL statement and XML showplan for
            every query that is cached in the query cache:

            SELECT * FROM sys.dm_exec_query_stats CROSS APPLY sys.dm_exec_query_plan(plan_handle)
                   CROSS APPLY sys.dm_exec_sql_text(sql_handle)



            Disk Subsystem Statistics
The sys.dm_io_virtual_file_stats function returns statistics about the reads and writes for every
database file. The function returns the aggregate number of reads and writes, as well as the bytes
read and written to each file since the instance was started. You can also retrieve a piece of
            information called the IOStall for both reads and writes. When SQL Server has to wait for
            the disk subsystem to become available to satisfy either a read or write operation, an IOStall
            occurs. The time for IOStalls, measured in milliseconds, is logged for each database file.
  You use the information returned by the sys.dm_io_virtual_file_stats function to determine
whether disk contention is contributing to performance issues. You can also use this function to



determine whether your disk input/output (I/O) is balanced across database files or if you
have created a disk hot spot.
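   A minimal sketch of such a check follows; passing NULL for both parameters returns every
file in every database, and the files with the largest combined IOStall values are listed first:

SELECT DB_NAME(vfs.database_id) AS database_name,
       vfs.file_id,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms,
       vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC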
   The sys.dm_io_pending_io_requests DMV contains a row for each request that is waiting for
the disk subsystem to complete an I/O request. On a busy system, you always find entries in
this view. However, if you have a request that appears frequently or stays for a very long time,
you probably have a disk bottleneck issue that needs to be dealt with.
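   Joining the pending requests to the virtual file statistics on the file handle (a common
diagnostic pattern, shown here as a sketch) identifies which database files the outstanding
I/O belongs to:

SELECT DB_NAME(vfs.database_id) AS database_name,
       vfs.file_id,
       pio.io_pending,
       pio.io_pending_ms_ticks
FROM sys.dm_io_pending_io_requests AS pio
INNER JOIN sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    ON pio.io_handle = vfs.file_handle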


Hardware Resources
One of the mistakes that many people make when attempting to track down performance
issues is to think that a poorly performing query can be fixed by either adding indexes or
rewriting code. A query could be running slowly due to inefficient code or a lack of indexes.
However, a query also could be running slowly due to resource contention issues that cause
the query to wait for a resource to become available.
    SQL Server uses a cooperative processing model to satisfy requests. Each request that is
executed is assigned to a User Mode Scheduler (UMS). Unless you have changed the default
configuration through the affinity mask configuration option, SQL Server has one UMS per
processor available to satisfy query requests. Because only a single command can execute on
a processor at any time, the maximum number of requests that can be executing concurrently
is equal to the number of UMSs that SQL Server has running. Any requests that exceed this
number are added to a runnable queue in the order they were received. After a request
has made it to the top of the runnable queue and a UMS becomes available, the request is
swapped on to the running queue of the UMS and begins to execute. As soon as the process
has to wait on a resource to be allocated such as a lock to be acquired, disk I/O to become
available, or memory to be allocated, the request is swapped off the processor and on to
a waiting queue to make room for the next request in the runnable queue to start executing.
The request remains on the waiting queue until the resource becomes available and then the
request is moved to the bottom of the runnable queue, where it must wait behind all other
requests before being swapped back on to the UMS to continue executing.
   If you have contention for resources, a request could make multiple cycles between the
runnable, running, and waiting queues before the query can complete. By removing the
processing bottlenecks, you can improve query performance.
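   One way to observe this pressure is to look at the task counts on each scheduler in the
sys.dm_os_schedulers DMV (a sketch; scheduler IDs of 255 and above are used internally and
are filtered out here). Consistently high runnable_tasks_count values suggest that requests
are queuing for CPU:

SELECT scheduler_id,
       current_tasks_count,
       runnable_tasks_count,
       active_workers_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255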
   When a request is sent to the waiting queue, SQL Server sets a value called the wait type
that designates the type of resource that the request is waiting on. As soon as a wait type is
set, SQL Server starts a clock. When the resource becomes available, SQL Server stops the
clock and records the amount of time that the request had to wait for the resource to become
available, called the wait time. SQL Server also sets a clock when a request enters the runnable
queue, called the signal wait, which records how long it takes a process to get to the top of
the queue and begin executing.
   The sys.dm_os_wait_stats DMV lists the aggregate amount of signal wait and wait time
for each wait type. Signal wait and wait time are aggregate values since the last time the



statistics were cleared. Although most DMVs can be cleared only by restarting the instance,
            the wait time and signal wait time can be cleared by executing the following code:

DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR)
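   After a representative workload has run, a simple ranking of the accumulated statistics
(a sketch; filtering out benign idle wait types is omitted for brevity) shows where requests
spend the most time waiting:

SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC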



               MORE INFO       WAIT TYPES
               Although wait statistics are an extremely valuable piece of information for diagnosing
               performance issues, in the almost 10 years since detailed information has been available,
               Microsoft still has not documented the wait types, all of which have extremely cryptic
   names. The best resource available for understanding wait types and how to resolve issues
   uncovered by wait types is on Gert Drapers’ Web site (http://www.sqldev.net).



               EXAM TIP
               For the exam, you need to know the purpose of the main set of DMVs and how to use each
               of them to diagnose and troubleshoot performance issues.




      Quick Check
         1. What is the difference between sys.dm_db_index_operational_stats and
            sys.dm_db_index_physical_stats?

                   2. Which DMV can you use to retrieve execution statistics for each connection
                        currently executing a command?

      Quick Check Answers
         1. Sys.dm_db_index_physical_stats returns size and fragmentation statistics for
            each index, and sys.dm_db_index_operational_stats returns locking, latching,
            and access statistics for each index.

                   2. The sys.dm_exec_requests DMV returns one row for each currently executing
                        command.




 PRACTICE   Evaluating Missing Indexes

            In this practice, you use the DMVs to find and evaluate index misses.
  1.   Open a new query window and execute the following queries against the
       AdventureWorksTest database:

       SELECT City, PostalCode, AddressLine1
       FROM Person.Address
       WHERE City = 'Seattle'
       GO
       SELECT City, PostalCode, AddressLine1
       FROM Person.Address
       WHERE City = 'Seattle' AND AddressLine2 IS NOT NULL
       GO
       SELECT City, PostalCode, AddressLine1
       FROM Person.Address
       WHERE City LIKE 'D%'
       GO

 2.   Execute the following index evaluation query to inspect the indexes suggested by the
      Optimizer:

       SELECT *
       FROM (SELECT user_seeks * avg_total_user_cost * (avg_user_impact * 0.01)
                 AS index_advantage, migs.*
             FROM sys.dm_db_missing_index_group_stats AS migs) AS migs_adv
       INNER JOIN sys.dm_db_missing_index_groups AS mig
            ON migs_adv.group_handle = mig.index_group_handle
       INNER JOIN sys.dm_db_missing_index_details AS mid
            ON mig.index_handle = mid.index_handle
       ORDER BY migs_adv.index_advantage

 3.   Execute the following code:

      SELECT City, PostalCode, AddressLine1
      FROM Person.Address
      WHERE City LIKE 'Atlan%'
       GO 100

 4.   Execute the following index evaluation query to see how the values change as the
      number of query executions increases:

       SELECT *
       FROM (SELECT user_seeks * avg_total_user_cost * (avg_user_impact * 0.01)
                 AS index_advantage, migs.*
             FROM sys.dm_db_missing_index_group_stats AS migs) AS migs_adv
       INNER JOIN sys.dm_db_missing_index_groups AS mig
            ON migs_adv.group_handle = mig.index_group_handle
       INNER JOIN sys.dm_db_missing_index_details AS mid
            ON mig.index_handle = mid.index_handle
       ORDER BY migs_adv.index_advantage



Lesson Summary
      The sys.dm_db_* DMVs provide general space and index utilization information.
      The sys.dm_exec_* DMVs are used to return information about currently executing queries
      as well as queries that are still in the query cache. You can also use this set of DMVs to
      troubleshoot blocking issues, as well as view the last wait type assigned to a request.


The sys.dm_io_* DMVs are used to evaluate disk subsystem performance and
                   determine if you have disk bottlenecks.
                   The sys.dm_os_wait_stats DMV provides information about the internal handling of
                   requests and whether a request is waiting on resources to become available.


            Lesson Review
            The following question is intended to reinforce key information presented in Lesson 3, “Using
            Dynamic Management Views and Functions.” The question is also available on the companion
            CD if you prefer to review it in electronic form.


   NOTE   ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or
   incorrect are located in the “Answers” section at the end of the book.


              1.   What DMV would you use to find indexes that are no longer being used?
                   A. sys.dm_db_index_operational_stats
                   B. sys.dm_db_index_physical_stats
                   C. sys.dm_db_index_usage_stats
                   D. sys.dm_db_missing_index_details




Lesson 4: Working with the Performance
Data Warehouse
The Performance Data Warehouse, also referred to as Performance Studio, is a new feature in
SQL Server Management Studio (SSMS) that allows you to configure and gather performance
data for instances throughout your environment for later analysis. In this lesson, you learn
about the components of the Performance Data Warehouse, as well as how to configure
collection sets and data collection and how to analyze the results.


      After this lesson, you will be able to:
          Create a Performance Data Warehouse
          Create collection items and collection sets
          Define a collection target
          Configure data collection
          Analyze the results of data collection

      Estimated lesson time: 20 minutes



Performance Data Warehouse
The Performance Data Warehouse is based on a new feature in SQL Server 2008 called the
Data Collector. The Data Collector is based on SQL Server Integration Services (SSIS) packages
and SQL Server Agent jobs, as shown in Figure 13-6.
    Data collection for the Performance Data Warehouse is configured using one of the
following collector types:
      T-SQL Query
      SQL Trace
      Performance Counter
      Query Activity
   The T-SQL Query collector allows you to specify a SELECT statement to execute, as well
as the database(s) to execute the query against. The results of the query are stored in a table
within the Performance Data Warehouse whose name you define using the OutputTable
parameter of the Data Collector definition. Because the Data Collector dynamically generates
the table based on the results of the query defined, you must ensure all of the following:
      The result set does not contain columns named snapshot_time, snapshot_id, or
      database_name, because these are reserved for the Data Collector.
      All columns in the result set must have a name.
      Columns with an image, text, ntext, or XML data type cannot be included.
      Only a single result set is returned.
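   To make the moving parts concrete, the following script is a rough sketch of defining
a custom T-SQL Query collection set by using the data collector stored procedures in msdb.
The procedure names, the predefined schedule, and the collector type name exist in
SQL Server 2008, but the collection set name, the item name, and the shape of the parameter
XML shown here are illustrative assumptions; verify them against SQL Server Books Online
before relying on them:

USE msdb;
GO
DECLARE @set_id int, @set_uid uniqueidentifier, @item_id int;
DECLARE @type_uid uniqueidentifier;

-- Create the collection set; @collection_mode = 1 requests non-cached mode,
-- which collects and uploads on the same schedule.
EXEC dbo.sp_syscollector_create_collection_set
    @name = N'Ad Hoc DMV Collection',            -- hypothetical name
    @collection_mode = 1,
    @days_until_expiration = 14,
    @schedule_name = N'CollectorSchedule_Every_15min',
    @collection_set_id = @set_id OUTPUT,
    @collection_set_uid = @set_uid OUTPUT;

-- Look up the UID for the T-SQL Query collector type.
SELECT @type_uid = collector_type_uid
FROM dbo.syscollector_collector_types
WHERE name = N'Generic T-SQL Query Collector Type';

-- Add a collection item; <Value> holds the SELECT statement to run and
-- <OutputTable> names the destination table in the warehouse.
EXEC dbo.sp_syscollector_create_collection_item
    @collection_set_id = @set_id,
    @collector_type_uid = @type_uid,
    @name = N'Index usage snapshot',             -- hypothetical name
    @parameters = N'<ns:TSQLQueryCollector xmlns:ns="DataCollectorType">
  <Query>
    <Value>SELECT * FROM sys.dm_db_index_usage_stats</Value>
    <OutputTable>index_usage_stats</OutputTable>
  </Query>
</ns:TSQLQueryCollector>',
    @collection_item_id = @item_id OUTPUT;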


FIGURE 13-6 The Data Collector (diagram relating the SSIS package, data provider, collector type,
collection items, job schedule, SQL Server Agent job, Data Collector run-time, collection set,
collection target, and audit and execution history)


                The SQL Trace collector supports either the default trace or a user-defined trace. The results
            of the trace are written to a file and the Data Collector uses the fn_trace_gettable function to
            extract the contents of the file as a result set to be loaded to the Performance Data Warehouse.
            Any setting that is valid for a trace can be defined for the Data Collector (e.g., rollover files and
            filters). The results of the data collection are stored in the snapshots.trace_info and snapshots.
            trace_data tables in the Performance Data Warehouse.
   The Performance Counter collector allows you to define any combination of objects,
counters, and counter instances. The results of the data collection are stored in the
snapshots.performance_counters table in the Performance Data Warehouse.
               The Query Activity collector gathers information from sys.dm_exec_requests, sys.dm_exec_
            sessions, and sys.dm_exec_query_stats.


               EXAM TIP
               For the exam, you need to know what the purpose of the Performance Data Warehouse
               is, the components that data collection is based upon, and the information that can be
               collected.




       Quick Check
          1. What is the Performance Data Warehouse based on?

       2. What collector types are available in SQL Server 2008?

       Quick Check Answers
          1. The Performance Data Warehouse is built upon the Data Collector infrastructure.
          Data collection is based on SSIS packages and SQL Server Agent jobs.

       2. SQL Server 2008 ships with T-SQL Query, SQL Trace, Query Activity, and
          Performance Counter collector types.




 PRACTICE   Configure the Performance Data Warehouse

In this practice, you configure the Performance Data Warehouse, set up the built-in collection
sets, and analyze the data collection results.

PRACTICE 1   Configuring the Performance Data Warehouse
In this practice, you configure the Performance Data Warehouse.
  1.   Start SSMS, connect to your instance, expand the Management node, right-click Data
       Collection, and select Configure Management Data Warehouse.
  2.   Click Next, select Create Or Upgrade A Management Data Warehouse, as shown here,
       and click Next again.




  3.   Create a new database named PerfData with all the default settings. Click Next.



4.    Select the login corresponding to your SQL Server service account (the one in this
                    exercise is called SQL2008SBSDE) and the mdw_admin role, as shown here. Click Next.




               5.   Click Finish to create the structures within the PerfData database.
              6.    Review the objects that have been created in the PerfData database.

PRACTICE 2   Configuring Data Collection
            In this practice, you configure data collection for the newly created Management Data
            Warehouse.
              1.    Right-click the Data Collection node and select Configure Management Data
                    Warehouse.
              2.    Click Next, select Set Up Data Collection, as shown here, and click Next again.




3.   Select the location of your PerfData database and leave the Cache directory blank.
       Click Next.

  4.   Click Finish.

  5.   Expand the System Data Collection Sets folder, right-click the Disk Usage collector,
       and select Properties. Review the settings for the data collection set, as
       shown here.




PRACTICE 3   Reviewing Results for a Collection Set

In this practice, you review the reports that are created by default for the system data
collection sets.

  1.   Right-click the Data Collection node, select Reports, Management Data Warehouse,
       and Disk Usage Summary.

  2.   Review the results, which should look as shown on the following page.




Lesson Summary
                   The Data Collector is a new infrastructure component available in SQL Server 2008 that
                   is based on SSIS packages and SQL Server Agent jobs.
                   You can define four different data collection types—T-SQL Query, SQL Trace,
                   Performance Counter, and Query Activity.
                   The T-SQL Query collector is the most flexible, allowing you to specify the SELECT
                   statement to execute as well as the databases to execute the query against.
                   All the data gathered by the Data Collector, as well as the definitions of all the
                   collection sets, is stored in the Performance Data Warehouse.



            Lesson Review
            The following question is intended to reinforce key information presented in Lesson 4,
“Working with the Performance Data Warehouse.” The question is also available on the
            companion CD if you prefer to review it in electronic form.


   NOTE   ANSWERS
   Answers to this question and an explanation of why each answer choice is correct or
   incorrect are located in the “Answers” section at the end of the book.




1.   As part of a recent acquisition, Humongous Insurance now has SQL Server
     instances from version 6.5 through 9.0. A variety of third-party products and
     custom code has been used in the past to manage capacity across the SQL Server
     environment. Your manager wants to consolidate everything into a single platform
     that can be used to perform capacity management tasks and evaluate performance
     against baselines. You need to implement a solution that has minimal cost and
     requires the least amount of effort to configure and maintain. What solution should
     you propose?
     A. Install a SQL Server 2008 instance and implement policy-based management.
     B. Install a SQL Server 2008 instance and implement a Performance Data Warehouse.
     C. Install a SQL Server 2008 instance and rewrite everything using SSIS.
     D. Implement Microsoft System Center Operations Manager 2007.




Chapter Review
          To practice and reinforce the skills you learned in this chapter further, you can perform the
          following tasks:
                   Review the chapter summary.
                   Review the list of key terms introduced in this chapter.
                   Complete the case scenario. The scenario sets up a real-world situation involving the
                   topics in this chapter and asks you to create a solution.
                   Complete the suggested practices.
                   Take a practice test.


          Chapter Summary
                   Resource Governor allows you to classify connections into workload groups that can be
                   assigned to a resource pool that can limit CPU and memory resources available to the
                   connection.
                   DTA evaluates one or more SQL statements and makes recommendations for indexes
                   that could be created or dropped to improve performance.
                   DMVs are a collection of views and functions that ship with SQL Server and expose
                   system and diagnostic data in a format that is easy to use and manipulate.
                   The Performance Data Warehouse uses the Data Collector infrastructure to aggregate
                   information that can be used to do capacity management and analyze performance
                   trends.


          Key Terms
          Do you know what these key terms mean? You can check your answers by looking up the
          terms in the glossary at the end of the book.
                   Classification function
                   Collection item
                   Collection set
                   Collection target
                   Data collector
                   Data provider
                   Dynamic Management Function (DMF)
                   Dynamic Management View (DMV)
                   Resource pool
                   Workload file
                   Workload group


Case Scenario
In the following case scenario, you apply what you’ve learned in this chapter. You can find
answers to these questions in the “Answers” section at the end of this book.

Case Scenario: Designing an Automation Strategy for Coho Vineyard
BACKGROUND
Company Overview
Coho Vineyard was founded in 1947 as a local, family-run winery. Due to the award-winning
wines it has produced over the last several decades, Coho Vineyards has experienced
significant growth. To continue expanding, several existing wineries were acquired over the
years. Today, the company owns 16 wineries; 9 wineries are in Washington, Oregon, and
California, and the remaining 7 wineries are located in Wisconsin and Michigan. The wineries
employ 532 people, 162 of whom work in the central office that houses servers critical to the
business. The company has 122 salespeople who travel around the world and need access to
up-to-date inventory availability.

Planned Changes
Until now, each of the 16 wineries owned by Coho Vineyard has run a separate Web site
locally on the premises. Coho Vineyard wants to consolidate the Web presence of these
wineries so that Web visitors can purchase products from all 16 wineries from a single online
store. All data associated with this Web site can be stored in databases in the central office.
   To meet the needs of the salespeople until the consolidation project is completed,
inventory data at each winery is sent to the central office at the end of each day. Merge
replication has been implemented to allow salespeople to maintain local copies of customer,
inventory, and order data.

EXISTING DATA ENVIRONMENT

Databases
Each winery presently maintains its own database to store all business information. At the
end of each month, this information is brought to the central office and transferred into the
databases shown in Table 13-2.

TABLE 13-2 Coho Vineyard Databases

 DATABASE               SIZE

 Customer               180 megabytes (MB)
 Accounting             500 MB
 HR                     100 MB
 Inventory              250 MB
 Promotions             80 MB



After the database consolidation project is complete, a new database named Order
          will serve as a data store to the new Web store. As part of their daily work, employees also
          will connect periodically to the Order database using a new in-house Web application.
            The HR database contains sensitive data and is protected using Transparent Data
          Encryption (TDE). In addition, data in the Salary table is encrypted using a certificate.

          Database Servers
          A single server named DB1 contains all the databases at the central office. DB1 is running SQL
          Server 2008 Enterprise on Windows Server 2003, Enterprise edition.
             The chief technology officer (CTO) is considering buying a new machine to replace DB1
          because users are complaining about performance issues on a sporadic basis. In addition,
          several of the wineries have been reporting device activation errors and even a few blue
          screens on the server running their local SQL Server instances.

          Business Requirements
          You need to design an archiving solution for the Customer and Order databases. Your archival
          strategy should allow the Customer data to be saved for six years.
             To prepare the Order database for archiving procedures, you create a partitioned table
          named Order.Sales. Order.Sales includes two partitions. Partition 1 includes sales activity for the
          current month. Partition 2 is used to store sales activity for the previous month. Orders placed
          before the previous month should be moved to another partitioned table named Order.Archive.
          Partition 1 of Order.Archive includes all archived data. Partition 2 remains empty.
             A process needs to be created to load the inventory data from each of the 16 wineries by
          4 A.M. daily.
              Four large customers submit orders using Coho Vineyards Extensible Markup Language
          (XML) schema for Electronic Data Interchange (EDI) transactions. The EDI files arrive by 5 P.M.
          and need to be parsed and loaded into the Customer, Accounting, and Inventory databases,
          which each contain tables relevant to placing an order. The EDI import routine is currently
          a single threaded C++ application that takes between three and six hours to process the files.
          You need to finish the EDI process by 5:30 P.M. to meet your Service Level Agreement (SLA)
          with the customers. After the consolidation project has finished, the EDI routine loads all data
          into the new Order database.
             There have been reports of massive contention on the central SQL Server while data is
          being imported during the nightly consolidation process. You need to reduce or eliminate the
          contention.
   You need to back up all databases at all locations. You can lose a maximum of five minutes
of data under a worst-case scenario. The Customer, Accounting, Inventory, Promotions, and Order
databases can be off-line for a maximum of 20 minutes in the event of a disaster. Data older
          than six months in the Customer and Order databases can be off-line for up to 12 hours in the
          event of a disaster.




Answer the following questions.
  1.   How do you determine the cause of performance issues?
  2.   How do you troubleshoot the errors that are occurring at the wineries?



Suggested Practices
To help you master the exam objectives presented in this chapter, complete the following
tasks.


Using the Performance Data Warehouse to Gather Data
for Performance Optimization
       Practice 1  Configure a Query Activity collection set for all your instances running SQL
       Server 2005 or later.
       Practice 2   Configure a Performance Counter collection set for all your instances that
       gathers the System, Processor, Network, Physical Disk, and all the SQL Server performance
       objects so that you can establish a performance baseline for comparison.
       Practice 3 Configure a SQL Trace collection set that includes the RPC:Completed and
       SQL:BatchCompleted events so that you can baseline query performance.
       Practice 4  Configure a T-SQL Query collection set to gather application-specific data
       that you can use for analysis.
       Practice 5   Establish a single performance warehouse and combine data from multiple
       collection targets into the single warehouse.
       Practice 6    Using Reporting Services, define custom reports for the data you are
       collecting.


Using Database Engine Tuning Advisor to Gather Data
for Performance Optimization
       Practice 1 Using the results of the SQL Trace collection set stored in the Performance
       Data Warehouse, run an analysis using DTA.


Using Dynamic Management Views to Gather Data
for Performance Optimization
       Practice 1 Using the T-SQL Query collector type, configure a data collection for the
       sys.dm_db_* and sys.dm_io_* DMVs.




Take a Practice Test
          The practice tests on this book’s companion CD offer many options. For example, you can test
          yourself on just one exam objective, or you can test yourself on all the 70-432 certification
          exam content. You can set up the test so that it closely simulates the experience of taking
          a certification exam, or you can set it up in study mode so that you can look at the correct
          answers and explanations after you answer each question.


             MORE INFO     PRACTICE TESTS
             For details about all the practice test options available, see the section entitled “How to
             Use the Practice Tests,” in the Introduction to this book.




CHAPTER 14


Failover Clustering
Microsoft SQL Server failover clustering is built on top of Microsoft Windows clustering
and is designed to protect a system against hardware failure. This chapter explains
Windows clustering and SQL Server failover clustering configurations.


Exam objective in this chapter:
   Implement a SQL Server clustered instance.

Lessons in this chapter:
   Lesson 1: Designing Windows Clustering       410

   Lesson 2: Designing SQL Server 2008 Failover Cluster Instances      430



Before You Begin
To complete the lessons in this chapter, you must have
       Cluster-capable hardware or Microsoft Virtual Server 2005 R2
       Windows Server 2003 SP2 and later or Windows Server 2008 installed on your server



   NOTE   VIRTUAL SERVER
   You can use Virtual Server and Microsoft Virtual PC to simulate hardware configurations.
   Unlike Virtual PC, Virtual Server supports Windows clustering and you can use it to build
   a SQL Server failover cluster.




   IMPORTANT   LESSON PRACTICES
               You use Virtual Server for all the practices in this chapter. To follow the steps in the practices,
               you have to create three virtual machines using Virtual Server, and you must install all three
               machines with Windows Server 2003 Standard edition SP2 and later or Windows Server
               2008 Standard and later. You should configure one of the virtual machines as a domain
               controller, and the other two machines as member servers in the domain. You need to allocate
               512 megabytes (MB) of memory to the two virtual machines that you configure as member
               servers, and configure the domain controller with 192 MB of RAM. To meet the hardware
               requirements for this Virtual Server configuration, you need a minimum of 1.5 GB of RAM on
               the host machine, and the disk drives should be at least 7200 RPM for reasonable performance.


               The practices in the lessons require you to have performed the following actions:
                   Created three virtual machines
                   Installed Windows Server 2003 Standard edition and later or Windows Server 2008
                   Standard edition and later onto each virtual machine
                   Configured one virtual machine as a domain controller
                   Configured two virtual machines as member servers in the domain
                   Configured the domain controller with a single network adapter, as shown in Table 14-1
                   Configured the member servers with two network adapters, as shown in Table 14-1
                   Configured all networks as Guest Only

            TABLE 14-1 TCP/IP Address Configuration for Networks

              MACHINE                        CONNECTION                      IP SETTINGS

              Domain controller              Local Area Connection           IP: 10.1.1.1
                                                                             Subnet: 255.255.255.0
                                                                             Gateway: 10.1.1.1
                                                                             DNS: 10.1.1.1
              Member server (Node1)          Local Area Connection           IP: 10.1.1.2
                                                                             Subnet: 255.255.255.0
                                                                             Gateway: 10.1.1.1
                                                                             DNS: 10.1.1.1
              Member server (Node1)          Local Area Connection 2         Dynamically assign
              Member server (Node2)          Local Area Connection           IP: 10.1.1.3
                                                                             Subnet: 255.255.255.0
                                                                             Gateway: 10.1.1.1
                                                                             DNS: 10.1.1.1
              Member server (Node2)          Local Area Connection 2         Dynamically assign




IMPORTANT

A complete discussion of Virtual Server is beyond the scope of this book. You can find
step-by-step instructions for performing each of the actions required to configure the base
environment in the Virtual Server documentation. If you have physical hardware capable of
clustering, you can perform the practices on this hardware by skipping the steps specific to
configuring the Virtual Server environment.




Lesson 1: Designing Windows Clustering
            Windows clustering is the foundation for building a SQL Server failover cluster. This lesson
            outlines how to configure a Windows cluster and describes best practices for configuration.


   IMPORTANT   COMPATIBLE HARDWARE
               The most frequent cause of outages for a cluster is hardware that has not been certified for
               clustering. To ensure that the hardware you are deploying is certified for clustering, it must
               appear in the Windows Catalog. The entire hardware solution must specifically designate that
               it is certified for clustering, so you need to ensure that you check the clustering categories of
               the Windows Catalog (which can be found at www.microsoft.com/whdc/hcl/default.mspx).



               MORE INFO      WINDOWS CLUSTERING
               You can find white papers, webcasts, blogs, and other resources related to Windows
               clustering at www.microsoft.com/windowsserver2003/community/centers/clustering.




                   After this lesson, you will be able to:
                      Design a Microsoft Cluster Service (MSCS) implementation.

                   Estimated lesson time: 45 minutes



            Windows Cluster Components
            Windows clustering enables multiple pieces of hardware to act as a single platform for
            running applications. Each piece of hardware in a cluster is called a cluster node.


               MORE INFO      WINDOWS SERVER VERSIONS
               At the time of writing, Windows Server 2008 was just being released onto the market.
               The exercises in this chapter, as well as detailed Windows clustering information, are
               based primarily on Windows Server 2003, with information from Windows Server 2008
               incorporated where available. If you are deploying Windows Server 2008, please refer to
               the Windows Server 2008 documentation for details on clustering features.


               You first must install cluster nodes with an operating system such as Windows Server 2003
            or Windows Server 2008. Depending on the edition you choose, different numbers of nodes
            are supported, as shown in Table 14-2.




TABLE 14-2 Number of Nodes Supported for Clustering

 VERSION                        EDITION          NODES

 Windows Server 2003            Standard         2
 Windows Server 2003            Enterprise       4
 Windows Server 2003            Datacenter       8
 Windows Server 2008            Standard         2
 Windows Server 2008            Enterprise       16


   Each Windows cluster has a distinct name along with an associated Internet Protocol (IP)
address. The cluster name is registered into Domain Name System (DNS) and can be resolved
on the network.
   A quorum database is created that contains all the configuration information for the cluster.
   All nodes within a cluster must be in a Windows domain and you should also configure
them in the same domain. You need to create a domain account that you use for the cluster
administrator account.
    The most complicated elements within a cluster are groups and resources. A cluster group is
a logical name that is assigned to a container that holds one or more cluster resources. A cluster
resource consists of anything that is allowed to be configured on a server. Examples of cluster
resources are IP addresses, network names, disk drives, Windows services, and file shares.
   A basic diagram of a two-node cluster is shown in Figure 14-1.



FIGURE 14-1 Windows two-node cluster (diagram: two cluster nodes, Node A and Node B, each
running MSCS and connected by a heartbeat network and a public network, with a SQL Server 2008
instance running against a shared disk array)




Types of Clusters
            Windows Server 2003 and Windows Server 2008 support standard clusters and majority node
            set clusters.

            Standard Windows Cluster
            A standard cluster, shown in Figure 14-1, has a single quorum database stored on the shared
            array. The quorum drive is accessible by only one node within the cluster at any time. All other
            nodes in the cluster cannot access the drive. In the event of a failure, another node takes
            ownership of the disk resource containing the quorum database and then continues cluster
            operations.

            Majority Node Set Cluster
            The main difference with a majority node set cluster is that a copy of the quorum database is
            stored locally on each node in the cluster.


                  NOTE    LOCAL QUORUM
   The location of the quorum is %SystemRoot%\Cluster\QoN.%ResourceGUID%$\
   %ResourceGUID%$\MSCS. A share named \\%NodeName%\%ResourceGUID%$ is created on
   each node. You should not modify this directory or change the permissions on this
   directory or share in any way.


               A majority node set cluster gets its name because a majority of the nodes have to be
            online for the cluster to be online. For this reason, you create majority node set clusters only
            when you have three or more nodes configured in the cluster. Table 14-3 shows a comparison
            of how many nodes can be offline with the cluster still operational for a standard cluster and
            a majority node set cluster.

            TABLE 14-3 Fault Tolerance for Clustering

                          FAILED NODE TOLERANCE—        FAILED NODE TOLERANCE—
  NUMBER OF NODES         MAJORITY NODE SET CLUSTER     STANDARD CLUSTER

              1                             0                         0
              2                             0                         1
              3                             1                         2
              4                             1                         3
              5                             2                         4
              6                             2                         5
              7                             3                         6
              8                             3                         7


   Looking at Table 14-3, you might wonder why anyone would use a majority node set
cluster because it appears to offer less tolerance than a standard cluster.
    The quorum database contains the configuration of the cluster and controls cluster operations.
If the quorum database were to become unavailable, the entire cluster would be unavailable.
A standard cluster uses a single quorum database on a single shared drive array. Failure of the
shared drive array or corruption of the quorum database causes the entire cluster to become
unavailable. A majority node set cluster has a copy of the quorum database on each node that
is synchronized with all other copies, so it eliminates the quorum database as a single point of
failure in a cluster.


Security Configuration
You should apply all security best practices for Windows to each node within a cluster. Disable
any services that are not necessary.
   You need to create an account in the domain that is used as the cluster administrator
account. You should add this domain account to each node in the cluster as a member of the
local administrators groups prior to configuring the cluster.


   CAUTION     ENCRYPTED OPERATING SYSTEM
   To support encryption of the file system in a cluster configuration, Kerberos must be
   enabled, and the computer accounts, along with the cluster service account, must
   be trusted. If you choose to encrypt the file system, you must also account for the
   performance degradation that all read and write operations incur because of encrypt/
   decrypt processes.


   You cannot use a regular user account for the cluster service; the cluster service must
be able to read and write to the registry, mount and unmount disk drives, stop and start
services, and perform other tasks. These tasks are possible only under a local administrator
authority.


Disk Configuration
You can build clusters by using either Small Computer System Interface/Internet Small
Computer System Interface (SCSI/iSCSI) drives or Fibre drives; Integrated Drive Electronics
(IDE) drives are not supported for clustering. If you are building a cluster that contains
more than two nodes, have Windows Datacenter, or have the 64-bit version of Windows,
you are restricted to using only Fibre drives.
   Clusters do not support the use of dynamic disks; you can use only basic disks and mount
points for clustering. Because drive letters A, B, C, and D are already allocated to local
resources on each node, a total of 22 drive letters can be used.




   NOTE   OPERATING SYSTEM
               Check with your storage area network (SAN) vendor to determine whether your nodes can
               be booted from the SAN. If your nodes cannot be booted from the SAN, or if you are using
               direct attached storage, you must install the operating system on an internal hard drive
               that you use to boot the node. Installing the operating system on an internal hard drive on
               each node is the most common configuration.


               When configuring the disks, you should allocate a dedicated drive for use by the quorum.
               You need to configure the Microsoft Distributed Transaction Coordinator (MS DTC) in all
            clusters. MS DTC requires disk space on a drive that is configured as a dependency of the MS
            DTC resource that you manually add to the cluster after you create it.
               The disk required for MS DTC creates a dilemma for most administrators. You need to
            ensure that you have the maximum number of drive letters to use for databases while also
            balancing best practices for performance and stability. The best practices recommendation
            for a cluster is to allocate a dedicated disk for the MS DTC resource and then configure MS
            DTC and its associated disk drive in a separate cluster group.
                If you are not enlisting MS DTC in your applications, you are wasting a disk drive that might
            be put to better use for databases. Therefore, if you do not have enough drives to produce
            the drive configuration that you need for database operations and if you are not enlisting MS
            DTC for any applications, you can place the MS DTC resource into the cluster group and set
            its disk dependency to the drive that you have configured as the quorum. This configuration
            violates best practices, but if you need the extra drive, and if MS DTC is not taking advantage
            of it, you can make this configuration change for functionality reasons without affecting cluster
            operations.


   CAUTION   ANTIVIRUS SOFTWARE
               Antivirus software has become very prevalent on database servers. In a cluster environment,
               you need to configure the antivirus scanning so that it does not interfere with cluster
               operations. You must exclude the MSCS directory and all the directories containing data files
               from scanning. During a failover, the disks are mounted on the node that a group is failing
               over to, which triggers the antivirus software to begin scanning the disk. If the antivirus
               software begins scanning a database file before SQL Server can open it, the recovery of the
               database is delayed until the file has been fully scanned. Because database files are normally
               very large, scanning can add a considerable amount of time to the failover process.



            Network Configuration
            Each node within a Windows cluster needs at least two network cards that are configured for
            public and private communications. The public network is the access point for all applications
and external traffic that request data from the cluster. The private network is used for all
internode communications within the cluster.

Windows clustering executes periodic health checks, which determine whether a node is
available and can be used to run applications. The most basic health check, which is called
a LooksAlive test, is executed by sending a ping request from one node in the cluster to
another node. If a node fails to respond to a LooksAlive test, it is considered unavailable, and
the cluster executes a failover process.
   If the private network becomes saturated, a LooksAlive test can fail and cause a spurious
failover. To prevent such spurious failovers, you should configure the public and private
networks on different subnets.


   BEST PRACTICES: PRIVATE NETWORK CONNECTION
   You should configure the following items on the private network connection:
              Disable all services except Transmission Control Protocol/Internet Protocol (TCP/IP).
              Remove the default gateway address.
              Remove any DNS server addresses.
              Disable DNS registration.
              Disable NetBIOS over TCP/IP.
              Disable LMHOSTS lookup.
   This configuration ensures that the network connection can process only TCP/IP traffic
   and that the IP address must be known in order to use the connection.
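
   Most of these settings can also be applied from the command line with netsh, which is
convenient when configuring several nodes consistently. A sketch, assuming the connection
has already been renamed Private and using the private addresses from this chapter's
practice; the NetBIOS and LMHOSTS settings still need to be changed through the
connection's WINS properties:

    REM Assign the static private address with no default gateway.
    netsh interface ip set address name="Private" source=static addr=10.10.213.1 mask=255.255.255.0
    REM Remove any DNS servers and disable DNS registration for this connection.
    netsh interface ip set dns name="Private" source=static addr=none register=none
    REM Remove any WINS server addresses.
    netsh interface ip set wins name="Private" source=static addr=none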



   NOTE: REMOTE PROCEDURE CALL (RPC)
   All health checks within a cluster use the remote procedure call (RPC) service. If the RPC
   service is unavailable or has been disabled, all health checks within a cluster fail. You must
   ensure that the RPC service is enabled and set to start automatically on all nodes within a
   cluster.
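
   You can quickly confirm the state and start-up type of the RPC service on each node with
the sc.exe utility; RpcSs is the service name of the RPC service. A brief sketch:

    REM Check the current state of the RPC service.
    sc query RpcSs
    REM Ensure that the service is set to start automatically.
    sc config RpcSs start= auto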



Cluster Resources
You can separate cluster resources, which are the most granular items that you can configure
within a cluster, into the broad categories shown in Table 14-4.

TABLE 14-4 Cluster Resources

 CATEGORY                      EXAMPLES

 Networking                    IP address, network name
 Hardware                      Disk drives
 Software                      Services, executable files, file shares, MS DTC




Resources that are physically attached to a machine cannot be configured in a cluster,
            so you might wonder how disk drives can be defined as a cluster resource. As described in
            the “Disk Configuration” section earlier in this lesson, all data within a cluster must reside on
            an external drive array. The external drive array can be a Fibre channel cabinet attached to
            each node in the cluster or a SAN that is connected to all nodes in the cluster. You cannot
            configure the local hard drive in each node as a cluster resource.

   The physical disk drives within the disk array are not the actual cluster resources. The
cluster resource is the disk mount definition within Windows, which Windows clustering
configures and controls. Although a disk resource is defined on all nodes, only the node that
is configured to own the disk resource has the disks mounted and accessible. All other nodes
maintain the disk mount definition but keep the disks unmounted. This prevents more than
one machine from writing to the same media at the same time.

               The main resource that is configured in a cluster is a service such as SQL Server or SQL
            Server Agent. Although each node in the cluster has an entry for a given service, it is started
            only on a single node within the cluster.

               One of the most powerful elements within a cluster is the way in which IP addresses and
            network names are handled. Although each node in the cluster carries the IP address and
            network name definition, only the node designated as the owner of the IP address and name
            has it bound to a physical network card. When a failover occurs to another node, clustering
            performs the following operations on the network stack:

              1.    Unregisters the network name from DNS
              2.    Binds the IP address to a physical network card on the operational node
               3.   Reregisters the network name in DNS

               This process ensures that all applications maintain the same IP address and network
            name, regardless of the piece of hardware on which they are currently running. By preserving
            the same IP address and network name through a failover, you do not need to reconfigure
            applications to reconnect following a failover.
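
   After a failover, you can verify from any machine on the public network that the network
name still resolves and responds at the same address. A minimal check, assuming the cluster
name Clust1 that is created in this chapter's practice:

    REM Confirm that DNS resolves the cluster network name.
    nslookup Clust1
    REM Confirm that the name answers at the same IP address as before the failover.
    ping Clust1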



            Cluster Groups
            You use cluster groups to combine one or more cluster resources into a logical management
            structure. The unit of failover within a cluster is a group. It can be helpful to think of a cluster
            group as an application. Each SQL Server failover cluster instance that you create appears as a
            separate group within a Windows cluster.

               A cluster group, along with the resources contained within the group, is shown in
            Figure 14-2.




FIGURE 14-2 Cluster group and associated cluster resources




       Quick Check
       1. What is the main difference between a standard cluster and a majority node set
          cluster?

       2. What are some examples of cluster resources?

       3. How many network connections does a node need for clustering? Why?

       4. How does the health check within a Windows cluster work?

       5. Which types of disk configurations are supported for clustering?

       Quick Check Answers
       1. A standard cluster uses a shared quorum database. A majority node set cluster
          maintains a separate quorum database on each node that is synchronized across
          all nodes. The majority of nodes (more than 50 percent) must be online for a
          majority node set cluster to function.

       2. Cluster resources can be hardware, software, or networking. Some examples are
          IP addresses, network names, disk mounts, and Windows services.




3. Each node needs at least two network connections: One connection is used for
                       public communications to applications on the network, and the other is used for
                       private internal communications within the cluster.

                    4. The basic health check that is performed is called a LooksAlive test. This test
                       consists of each node pinging the others.

                     5. Clustering supports basic disks. Dynamic disks are not supported. Disks must
                        also be external to each node within the cluster; disks mounted locally within
                        a node are not visible to any resource within the cluster.




              PRACTICE: Creating a Windows Cluster

            In this practice, you create a Windows cluster that you use in Lesson 2, “Designing SQL
            Server 2008 Failover Cluster Instances,” to install a SQL Server failover cluster instance.
              1.    Open the Virtual Server Administration Web site.
              2.    Start the Virtual Machine Remote Control Client and connect to your Virtual Server
                    instance.
               3.   Verify that Node1 and Node2 are off. Start the domain controller (hereafter referred
                    to as DC).
              4.    Under the Virtual Disks section of the Virtual Server Administration Web site, choose
                    Create and then Fixed Size Virtual Hard Disk.
               5.   Name this disk Quorum.vhd with a size of 500 MB.
              6.    Repeat steps 4 and 5 to create two more disks: Sqldata.vhd with a size of 1 GB and
                    Sqllog.vhd with a size of 500 MB.
               7.   Within the Virtual Server Administration Web site, click Edit Configuration for the first
                    node in your cluster (hereafter referred to as Node1).
              8.    Verify that you have configured two network adapters. If you do not have two network
                    adapters configured, add a second network adapter.
               9.   Click the SCSI Adapters link.
             10.    Add three SCSI adapters with the SCSI Adapter ID set to 6 and the Share SCSI Bus For
                    Clustering check box selected, as shown in Table 14-5.

                    TABLE 14-5 Node1 SCSI Adapter Configuration

                     VIRTUAL ADAPTER                          SCSI ID

                     Virtual SCSI Adapter 1                   6 (Share SCSI Bus For Clustering)
                     Virtual SCSI Adapter 2                   6 (Share SCSI Bus For Clustering)
                     Virtual SCSI Adapter 3                   6 (Share SCSI Bus For Clustering)



11.   Click the Hard Disks link.
12.   Click Add Disk, and then add each of the Quorum.vhd, Sqldata.vhd, and Sqllog.vhd
      disks. Attach each disk, as displayed in Table 14-6.

      TABLE 14-6 Node1 Cluster Disk Configuration

        DISK                       NAME                              ATTACHMENT

        Virtual Hard Disk 1        (Name of disk for base machine)   Primary channel (0)
        Virtual Hard Disk 2        Quorum.vhd                        SCSI 0 ID 0 (shared bus)
        Virtual Hard Disk 3        Sqldata.vhd                       SCSI 1 ID 0 (shared bus)
        Virtual Hard Disk 4        Sqllog.vhd                        SCSI 2 ID 0 (shared bus)


13.   Verify that your configuration matches Table 14-6. An example is shown in Figure 14-3.




      FIGURE 14-3 Node1 configuration


14.   Click the Master Status link under Navigation.
15.   Repeat steps 7–13 for the second node in your cluster (hereafter referred to as Node2).

       NOTE: SCSI ADAPTER ID FOR NODE2
       Each node must use a different SCSI adapter ID. Because Node1 is configured with
       a SCSI adapter ID of 6 for each SCSI adapter, you must configure Node2 with a SCSI
       adapter ID of 7 for each adapter.



16.    Verify that your configurations match those in Tables 14-7 and 14-8. An example is
                    shown in Figure 14-4.

                    TABLE 14-7 Node2 SCSI Adapter Configuration

                     VIRTUAL ADAPTER                          SCSI ID

                     Virtual SCSI Adapter 1                   7 (Share SCSI Bus For Clustering)
                     Virtual SCSI Adapter 2                   7 (Share SCSI Bus For Clustering)
                     Virtual SCSI Adapter 3                   7 (Share SCSI Bus For Clustering)



                    TABLE 14-8 Node2 Cluster Disk Configuration

                     DISK                      NAME                                 ATTACHMENT

                     Virtual Hard Disk 1       (Name of disk for base machine)      Primary channel (0)
                     Virtual Hard Disk 2       Quorum.vhd                           SCSI 0 ID 0 (shared bus)
                     Virtual Hard Disk 3       Sqldata.vhd                          SCSI 1 ID 0 (shared bus)
                     Virtual Hard Disk 4       Sqllog.vhd                           SCSI 2 ID 0 (shared bus)




                    FIGURE 14-4 Node2 configuration


              17.   Click the Master Status link under Navigation.
             18.    Switch to the Virtual Machine Remote Control Client and log on to the DC.

19.   Open Active Directory Users And Computers.
20.   Create a new user named Clusteradmin that is not a member of any special groups, as
      shown in Figure 14-5.




      FIGURE 14-5 Cluster administrator account


       CAUTION: INITIAL CONFIGURATION
       It is critical that you be very careful with the order in which you start and stop Node1
       and Node2 during the subsequent steps in this practice. If you ever run both Node1 and
       Node2 at the same time before you configure the cluster, you will corrupt the disks and
       will not be able to complete the steps. You must check and double-check the state of
       Node1 and Node2 before stopping or starting either one.


21.   Verify that Node2 is off and then start Node1.
 22.   After logging on to Node1, open Disk Management by right-clicking My Computer on
       the Start menu and choosing Manage. In the console tree of the Computer Management
       console, select Disk Management.
23.   Because you have three unconfigured disks, you see the Initialize And Convert Disk
      Wizard.
24.   Click Next, verify that all three disks are selected, and click Next.
25.   Verify that all three disks are not selected (because dynamic disks are incompatible
      with clustering), click Next, and then click Finish.

                    CAUTION: BASIC DISKS
                    Follow the prompts in the dialog box to set up the disks. Make absolutely certain that
                    you do not convert the disks to dynamic. Clustering supports only basic disks; if you
                    convert the disks to dynamic disks, you cannot configure your cluster and will have to
                    start at the beginning with new disks.


              26.    For each disk that shows its space as unallocated, create a new NTFS primary
                     partition encompassing the entire disk.
              27.   Configure the drive letters according to Table 14-9, as shown in Figure 14-6.

                    TABLE 14-9 Node1 Disk Configuration

                      DISK                DRIVE LETTER

                      Disk 0              C
                      Disk 1              Q
                      Disk 2              M
                      Disk 3              N




                    FIGURE 14-6 Node1 disk configuration




28.   In the Computer Management console, expand the Local Users And Groups node and
      select Groups.
29.   Double-click the Administrators group and add the Clusteradmin account you created
      within your domain in step 20.
30.   Close the Computer Management console.
31.   Open Network Connections.
32.   Rename Local Area Connection to Public.
33.   Rename Local Area Connection 2 to Private.
34.   Right-click the Private connection and choose Properties.
35.   Clear the Client For Microsoft Networks and File And Printer Sharing For Microsoft
      Networks check boxes, as shown in Figure 14-7.




      FIGURE 14-7 Private network adapter properties


36.   Select Internet Protocol (TCP/IP) and click Properties.
37.   Specify 10.10.213.1 with a subnet mask of 255.255.255.0. Do not configure a default
      gateway or DNS server, as shown in Figure 14-8.
38.   Click Advanced.
39.   Select the DNS tab and clear the Register This Connection’s Addresses In DNS check
      box, as shown in Figure 14-9.

FIGURE 14-8 Private network IP and DNS settings




                   FIGURE 14-9 Private network DNS configuration



40.   Select the WINS tab, clear the Enable LMHOSTS Lookup check box, and then select
      Disable NetBIOS Over TCP/IP, as shown in Figure 14-10.




      FIGURE 14-10 Private network WINS configuration


41.   Click OK twice and then click Close to close the Private Properties dialog box.
42.   Close Network Connections and shut down Node1.
43.   Verify that Node1 is off and then start Node2.
44.   Repeat steps 21–42 for Node2. Refer to Tables 14-10 and 14-11 for the disk and
      networking configuration on Node2.


       NOTE: DISK INITIALIZATION
      When you select Disk Management on Node2, the Initialize And Convert Disk Wizard
      does not appear because the disks already have a signature written to them. You do not
      need to format the disks because you already performed this step when you configured
      Node1. You also do not need to specify drive letters because Node2 picks them up from
      the cluster after you configure it.




TABLE 14-10 Node2 Disk Configuration

                     DISK                 DRIVE LETTER

                     Disk 0               C
                     Disk 1               Q
                     Disk 2               M
                     Disk 3               N



                    TABLE 14-11 Node2 Network Configuration

                     OPTION                                                    SETTING

                     Client For Microsoft Networks                             Disabled
                     File And Printer Sharing For Microsoft Networks           Disabled
                     IP Address                                                10.10.213.2
                     Subnet Mask                                               255.255.255.0
                     Default Gateway                                           Blank
                     DNS                                                       Blank
                     Register This Connection’s Addresses In DNS               Disabled
                     Enable LMHOSTS Lookup                                     Disabled
                     Disable NETBIOS Over TCP/IP                               Selected


             45.    Verify that both Node1 and Node2 are off and then start Node1.
             46.    Log on to Node1 and start Cluster Administrator.
              47.   In the Action drop-down list, choose Create New Cluster and click OK.
             48.    In the New Server Cluster Wizard, click Next, verify that your domain name is specified
                    correctly in the Domain drop-down list, and enter Clust1 as the Cluster Name.
                    Click Next.
             49.    Node1 should be specified by default for the Computer Name. Click Next.
             50.    The wizard now analyzes Node1 to verify that it is compatible for clustering. When the
                    analysis completes and displays a green bar, click Next.
             51.    Enter an IP address for the cluster on the Public segment. Based on the suggested
                    IP address settings specified at the beginning of this chapter, set the IP address to
                    10.1.1.5. Click Next.
             52.    Enter clusteradmin for the User Name, enter the password that you used for this
                    account, verify that the domain name is specified correctly, and click Next.
             53.    On the Proposed Cluster Configuration page, click Quorum and ensure that Disk Q is
                    specified for the quorum. If not, change the entry and click OK.


       NOTE: SPECIFYING A QUORUM
      When you configure a cluster on physical hardware, the disk that the New Server Cluster
      Wizard selects by default as the quorum is the first disk added to Node1 that is not a
      locally attached disk. Virtual Server selects the first drive letter in alphabetical order. You
      can use the Cluster Configuration Quorum dialog box to specify a local quorum that is
      used when building a single node cluster for testing. This dialog box is also where you
      can change the type of cluster from the standard cluster you are building to a majority
      node set cluster by choosing Majority Node Set from the drop-down list. If you choose
      Majority Node Set from this drop-down list, the New Server Cluster Wizard creates a
      quorum database on each node in the cluster.


54.   Verify that all settings are correct and click Next.
55.   The next step in the process takes a few minutes as the cluster is built to your
      specifications. When this process completes, click Next and then click Finish.
56.   Congratulations—you have created a Windows cluster!
57.   Verify that you have created three groups: Cluster Group contains Cluster Name,
      Cluster IP Address, and Disk Q; Group 0 contains Disk M; and Group 1 contains Disk N.
58.   With Node1 running, start Node2.
59.   In Cluster Administrator on Node1, right-click Clust1, and choose New and then Node.
      Click Next when the Add Nodes Wizard starts.
60.   Specify Node2 as the computer name, click Add, and then click Next.
61.   After the analysis completes, click Next.
62.   Enter the password for the Clusteradmin account and then click Next.
63.   Verify the cluster configuration settings and click Next.
64.   Node2 is now configured for clustering and added to Clust1.
65.   Click Next and then click Finish.


       NOTE: CLUSTER ANALYSIS WARNINGS
      Because of the way Virtual Server handles disk resources internally, you can receive
      some warnings when a cluster is configured. This is normal and does not affect the
      operation of your cluster. As long as you do not receive an error (the progress bar turns
      red), your configuration succeeded, and you have a fully functional cluster.


66.   Verify that you now see both Node1 and Node2 configured as part of Clust1.
67.   Select the Cluster Group group. In the details pane, right-click Cluster Name and
      choose Take Offline. Right-click Cluster IP Address and choose Take Offline.
68.   From the File menu, choose New and then Resource.



69.   Specify a name of MSDTC, select Distributed Transaction Coordinator for the Resource
                   Type, and verify that Cluster Group is selected for the group. Click Next.
             70.   Verify that both Node1 and Node2 are specified as Possible Owners and click Next.
             71.   Add Cluster Name and Disk Q to the Resource Dependencies. Click Finish. Click OK to
                   close the message box confirming the creation of the cluster resource.
             72.   Right-click Cluster Group and choose Bring Online. After the resource is online, your
                   screen should look similar to Figure 14-11.




                   FIGURE 14-11 Completed two-node cluster


                    BEST PRACTICES: MICROSOFT DISTRIBUTED TRANSACTION COORDINATOR (MS DTC)
                    MS DTC, which you need to add to every Windows cluster you build, ensures that
                    operations that enlist resources such as COM+ components can work in a cluster. It has
                    been recommended that you always configure MS DTC to use a disk that is different from
                    the quorum disk or any disk used by SQL Server or other applications. We generally find
                    this to be a waste of very limited disk resources. If you are running applications in a
                    cluster that make very heavy use of MS DTC, you need to dedicate a disk for MS DTC
                    operations. If you are not running applications that require COM+, you can safely
                    configure MS DTC within the cluster group and set its dependencies to the quorum drive.



            Lesson Summary
                   You build a standard cluster using a single quorum database stored on a shared disk
                   array. You build a majority node set cluster with a copy of the quorum database on all
                   nodes within the cluster.


Windows clustering supports only basic disks. You can encrypt disks, but the encrypt/
       decrypt functions affect performance. Clustering does not support disk compression.
       A cluster needs two separate networks. The cluster uses the public network to
       communicate with applications and clients; it uses the private network for internal
       cluster communications.


Lesson Review
You can use the following questions to test your knowledge of the information in Lesson 1,
“Designing Windows Clustering.” The questions are also available on the companion CD if you
prefer to review them in electronic form.


   NOTE: ANSWERS
  Answers to these questions and explanations of why each answer choice is right or wrong
  are located in the “Answers” section at the end of the book.


  1.   Coho Vineyards has recently experienced problems with its distribution system. Delays
       in scheduling trucks and getting shipments out to suppliers were caused by a series
       of hardware failures. Management has authorized the chief technical officer (CTO) to
       acquire a hardware solution capable of withstanding the failure of an entire server.
       Hardware that is compatible with clustering will be acquired. Which operating system
       should you install to meet these business requirements for the least cost?
       A. Windows 2000 Server Standard edition
       B. Windows 2000 Advanced Server
       C. Windows Server 2003 Standard edition
       D. Windows Server 2003 Enterprise edition
  2.   The CTO at Coho Vineyards has decided to purchase two servers for clustering that will
       be used to run the distribution system. Which combination of operating system version
       and cluster type provides the most fault tolerance for the lowest cost?
       A. Windows Server 2003 Standard edition with a standard cluster
       B. Windows Server 2003 Standard edition with a majority node set cluster
       C. Windows Server 2003 Enterprise edition with a standard cluster
       D. Windows Server 2003 Enterprise edition with a majority node set cluster
  3.   Which service needs to be running for health checks to be executed within a cluster?
       A. Server service
       B. RPC service
       C. Net Logon service
       D. Terminal Services service




Lesson 2: Designing SQL Server 2008 Failover
            Cluster Instances
            After you build and configure the Windows cluster, you can install instances of SQL Server
            into the cluster. Clustered instances provide fault tolerance to SQL Server by ensuring that a
            hardware failure cannot cause an extended outage for applications. This lesson explains how
            to install and configure SQL Server 2008 failover cluster instances for optimal redundancy in a
            cluster.


                   After this lesson, you will be able to:
                      Install a SQL Server clustered instance

                   Estimated lesson time: 45 minutes



                    REAL WORLD
                   Michael Hotek



                   A little more than two years ago, I was at a customer site to help implement
                        clustering. Instead of starting with the installation and configuration of
                   clustering, I had to back up and explain clustering. Some consultant told employees
                   of this company that clustering could be used to eliminate downtime when service
                   packs were installed and enable them to load-balance their hardware resources.
                   They were also told that clustering could enable a transaction that started on one
                   node to be completed after the cluster failed over.

                   Clustering does not have the capability to do any of these things. SQL Server
                   failover clustering provides protection against hardware failures. In the event of a
                   failure of one piece of hardware, a second piece of hardware automatically takes
                   over and starts SQL Server.

                   Service packs still cause an outage on a cluster because the SQL Server instance can
                   exist on only a single node at any time. Any transactions that are not completed
                   when a cluster fails over are rolled back. Because SQL Server does not allow multiple
                   processes to access database files simultaneously, load balancing is not possible.

                   After explaining that clustering protects only from hardware failures, I still
                   implemented the cluster within the customer’s environment. The customer could
                   effectively manage the database within the cluster by understanding that failures
                   would still incur outages, but the amount of downtime because of hardware failure
                   would be minimal.




Terminology
SQL Server instances installed into a cluster have been referred to by several different
terms, many of which are inaccurate. So before explaining the SQL Server configuration
within a cluster, this lesson addresses the terminology issues.
    SQL Server clusters are either single- or multiple-instance clusters. A single-instance cluster
is a Windows cluster that has exactly one instance of SQL Server installed. A multiple-instance
cluster is a Windows cluster that has more than one instance of SQL Server installed. It does
not matter on which node you configure the instances to run; the terminology stays the same.
    Active/Active and Active/Passive clusters exist at a Windows level. An Active/Active cluster
indicates that applications are running on all the nodes in a cluster. An Active/Passive cluster
indicates that applications are running on only a single node in the cluster. This is irrelevant
as far as SQL Server is concerned because SQL Server is either running or not. SQL Server
instances are unaware of any other SQL Server instances. SQL Server cannot be load-balanced.
So SQL Server is running on one of the nodes; the node that SQL Server is running on is left to
the whim of the database administrator (DBA) who manages the cluster.
   SQL Server instances installed into a cluster used to be referred to as virtual servers. This
terminology created a fundamental problem because Microsoft has a stand-alone product
that is called Virtual Server. Instances of SQL Server in a cluster are referred to as either SQL
Server clustered instances or SQL Server failover clustered instances.
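
   From within SQL Server, you can confirm whether an instance is clustered by querying
SERVERPROPERTY, for example:

    -- Returns 1 for a failover clustered instance, 0 for a stand-alone instance.
    SELECT SERVERPROPERTY('IsClustered') AS IsClustered;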


Failover Cluster Instance Components
When installing a stand-alone instance, DBAs are not concerned with IP addresses, network
names, or even the presence of disk drives. Each of these components needs to be considered
when installing a SQL Server instance into a cluster.
   The components that you need to configure for a SQL Server failover clustered instance
are the following:
       IP addresses
       Network names
       Disk drives on the shared drive array
       SQL Server services
       Service accounts

Network Configuration
Each SQL Server instance installed into a cluster requires a unique IP address, which needs
to be on the public network segment configured in the cluster. Bound to each IP address is a
unique network name that is registered into DNS so the SQL Server can be resolved by name.




   BEST PRACTICES: SQL BROWSER SERVICE
               SQL Server 2008 installs a service called SQL Browser. If you have installed named instances
               in a cluster, the SQL Browser service must be running to resolve these names. If you do not
               have named instances, you should disable the SQL Browser service.
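
   If you have no named instances, you can disable the service from a command prompt on
each node. A sketch using the service's registered name, SQLBrowser:

    REM Stop the SQL Browser service and prevent it from starting again.
    net stop SQLBrowser
    sc config SQLBrowser start= disabled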



            Disk Configuration
            You must configure each SQL Server clustered instance with a dedicated set of drive letters.
            On a stand-alone server, multiple instances can store databases on the same drive or even in
            the same directory as other instances. In a cluster, the drives are mounted to a particular node
            at any given time. Any other node does not have access to those drives. You can configure
            an instance of SQL Server to run on any node. If you could configure more than one SQL
            Server clustered instance to store databases on the same drive letter, it would be possible to
            create a configuration in which the instance is running on one node while another node has
            ownership of the disks, thereby rendering the SQL Server instance inoperable.
    The concept of disk configurations in a SQL Server cluster is known as the instance-to-disk
ratio. Although a SQL Server cluster instance can address more than one drive letter, a drive
letter can be associated with only a single SQL Server cluster instance. Additionally, a drive
letter must be configured as a dependency of the SQL Server resource before the instance is
allowed to store databases on it.
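
    You can list the shared drives available to a clustered instance for database files by
querying a dynamic management view (this requires VIEW SERVER STATE permission):

    -- Each row is a drive letter configured as a dependency of this
    -- clustered SQL Server instance.
    SELECT DriveName
    FROM sys.dm_io_cluster_shared_drives;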

            Security Configuration
            You need to configure each SQL Server service with a service account. You should generally
            use a different account for each SQL Server service—such as the SQL Server, SQL Server
            Agent, and Full Text services.
              Although the accounts do not need any special privileges, they must be domain accounts
            because the security identifier (SID) for a local account cannot be resolved on another
            machine.
               SQL Server 2008 does not require service accounts with administrative authority in
            Windows. This has created a situation in which a Windows account could have dozens of
            individual permissions granted to it, such as registry access, directory access, and file access
            permissions. Changing service accounts would become very complicated because you would
            have to assign all these individual permissions to the new service account to ensure that
            services continue to function normally.
               With the shift in the security infrastructure, the Windows accounts for SQL Server 2008
            services are designed to follow industry-accepted practices for managing Windows accounts.
            Windows groups are granted permissions on the various resources that will be accessed.
            Windows accounts are then added to their respective groups to gain access to resources.
  On a stand-alone machine, these groups are created by default with names of the form
SQLServerMSSQLUser$<machine name>$<instance name> and SQLServerSQLAgentUser$<machine
name>$<instance name>. SQL Server Setup automatically grants the appropriate group



permissions on the directories, registry keys, and other resources that a SQL Server
instance needs in order to function, and then adds the service account to the respective group.
   Although this process works on a stand-alone machine, it is not as simple in a cluster. Within
the cluster, a SQL Server failover cluster instance can be running on any physical machine in the
cluster. Local Windows groups do not have a valid security context across machines. Therefore,
the groups for the SQL Server service accounts need to be created at the domain level.
   The installation routine does not assume that you have the authority to create groups in
the domain. You need to create these domain groups prior to installing a SQL Server failover
cluster instance. You have to define three groups within the domain that have the following
purposes:
       SQL Server service accounts
       SQL Server Agent service accounts
       SQL Server Full Text Search Daemon accounts
   You specify the groups that you create during the final stages of the installation routine.
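
   You can also create these groups from a command prompt on the domain controller instead
of using Active Directory Users And Computers. The following sketch uses the group and
account names from this chapter's practice; adjust them to your own naming convention:

    REM Create the three global groups in the domain.
    net group SQLServerService /add /domain
    net group SQLServerAgentService /add /domain
    net group SQLServerFullTextService /add /domain
    REM Add the service account to the groups that it needs.
    net group SQLServerService SQLAdmin /add /domain
    net group SQLServerAgentService SQLAdmin /add /domain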


   BEST PRACTICES: BALANCING SECURITY WITH MANAGEABILITY
   Security best practices would create a domain-level group for each type of service and
   for each SQL Server clustered instance installed. Management simplicity would create
   a domain-level group for each of the three services, and all SQL Server failover cluster
   instances would specify the same set of domain groups. You need to determine where to
   balance a very secure (but highly complex) domain group scheme with a less complex (but
   less secure) domain group scheme.



Health Checks
Clustering performs two health checks against a SQL Server failover cluster instance. The first
check performed is the LooksAlive test, which is a ping from each node in the cluster to the IP
address of the SQL Server instance. However, a ping test does not indicate that an instance is
available—the instance could be responding to a ping but still be inaccessible.
   To detect availability issues because SQL Server is unavailable, a second check, the IsAlive
test, is performed. The IsAlive test creates a connection to the SQL Server instance and issues
SELECT @@SERVERNAME. The SQL Server must return a valid result set to pass this health
check.
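
   You can reproduce the query portion of the IsAlive test yourself with sqlcmd to confirm
that an instance accepts connections and returns results. A sketch, assuming the
SQLClust1\SQLClust1 named instance built in this chapter's practice:

    REM Connect to the clustered instance and run the same query that the
    REM IsAlive check issues; a valid result set indicates the instance
    REM would pass the health check.
    sqlcmd -S SQLClust1\SQLClust1 -Q "SELECT @@SERVERNAME"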


Cluster Failover
If either health check fails, the cluster initiates a failover of the SQL Server instance.
   The first step in the failover process is to restart SQL Server on the same node. The
instance is restarted on the same node because the cluster first assumes that a transient error
caused the health check to fail.



   If the restart does not bring the instance back online immediately, the SQL Server group
fails over to another node in the cluster (the secondary node). The network name of the server
running SQL Server
            is unregistered from DNS. The SQL Server IP address is bound to the network interface card
            (NIC) on the secondary node. The disks associated to the SQL Server instance are mounted
            on the secondary node. After the IP address is bound to the NIC on the secondary node, the
            network name of the SQL Server instance is registered into DNS. After the network name and
            disks are online, the SQL Server service is started. After the SQL Server service is started, SQL
            Server Agent and Full Text indexing are started.
              Regardless of whether the instance was restarted on the same node or on a secondary
            node, the SQL Server instance is shut down and restarted. Any transactions that have not
            completed when the failover process is initiated are rolled back when SQL Server restarts.
            Upon restarting, the normal process of restart recovery is followed.
               In general, a cluster will fail over in 10 to 15 seconds. The failover time can be affected
            by the registration into DNS, and it can also increase if a large number of databases are
            configured on the instance. In SQL Server 2000, the failover time was bound by the amount
            of time it took for both the redo and undo phases to complete, which left the failover time at
            the mercy of the applications issuing transactions against databases. Because databases are
            now available as soon as the redo phase completes, a SQL Server 2008 clustered instance fails
            over and has databases available much more rapidly.
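
   After a failover, you can confirm which physical node the instance is running on, and
which nodes it can run on, directly from T-SQL:

    -- The physical machine currently hosting the clustered instance.
    SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS CurrentNode;

    -- All nodes on which this clustered instance can run.
    SELECT NodeName
    FROM sys.dm_os_cluster_nodes;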


                    Quick Check
                    1. Which types of Windows accounts and groups can you use with a SQL Server
                      cluster instance?

                   2. With how many clustered instances can a single drive letter be used?

                   3. What are the two health checks performed in a cluster, and which operations are
                      executed?

                   Quick Check Answers
                    1. Domain users and domain groups must be used with SQL Server failover cluster
                      instances. The SID for accounts and groups used must be resolvable across all
                      nodes in the cluster. The SID for a local account or group cannot be resolved
                      across machines.

                   2. Although a clustered instance can address multiple drive letters, you can configure
                      a given drive letter for only a single instance. This configuration prevents the
                      possibility of having SQL Server running on one node while a different node has
                      ownership of the disk resources required by the clustered instance.

                   3. The LooksAlive check executes every 5 seconds by default and issues a ping from
                      all nodes to the IP address of the SQL Server clustered instance. The IsAlive check
                      executes every 60 seconds by default, connects to the SQL Server clustered
                      instance, issues SELECT @@SERVERNAME, and must receive a valid result set.



PRACTICE: Installing a SQL Server Failover Clustered Instance

In this practice, you install a SQL Server failover cluster instance into the Windows cluster
created in the practice for Lesson 1.


   NOTE: PREREQUISITES
   You must have already installed .NET Framework 2.0 SP1 on both nodes in the cluster
   before proceeding with this practice. If you are configuring a Windows Server 2003 cluster,
   you also need to download and install the KB937444 hotfix.


  1.   Open Cluster Administrator and connect to Clust1.
  2.   Right-click Group 0 and rename it Temp.


        IMPORTANT: CLUSTER GROUP SPECIFICATION
        Unlike the previous three versions of SQL Server, the SQL Server 2008 setup routine
        does not allow you to specify an existing cluster group to install services to, even
        if the specified group does not contain a SQL Server instance. To work around this
        limitation, make sure that a group with the name you want to use does not already
        exist when installation completes.


  3.   Select the Group 1 group; drag and drop the disk in the Group 1 group into the Temp
       group. When prompted, click Yes twice to confirm this move.
  4.   Verify that the Temp group contains both Disk M and Disk N.
  5.   Right-click Group 1 and select Delete. Click Yes to confirm the deletion. Verify that the
       Temp group contains Disk M and Disk N, as shown in Figure 14-12.
  6.   Switch to the DC.
  7.   Open Active Directory Users And Computers.
  8.   Create three global security groups: SQLServerService, SQLServerAgentService, and
       SQLServerFullTextService.
  9.   Create a user account named SQLAdmin, as shown in Figure 14-13. Also create an
       account named SQLServerFullText.
 10.   Add the SQLAdmin account to the SQLServerService and SQLServerAgentService
       groups. Add the SQLServerFullText account to the SQLServerFullTextService group.
 11.   Switch back to Node1 and start SQL Server setup.
 12.   Accept the End User License Agreement and click Next.
 13.   Click Install to install the setup prerequisites. When the installation completes, click
       Next.
 14.   Click the Installation link and then click the New SQL Server Failover Cluster installation
       link in the SQL Server Installation Center.

FIGURE 14-12 Initial Cluster Group configuration




                   FIGURE 14-13 Domain groups and users




15.   After the Setup rules execute, click OK.
16.   On the Setup Support Files page, click Install to install the support files.
17.   After the System Configuration Check completes, click Next.
18.   On the Product Key page, enter your product key or choose Specify A Free Edition.
      Click Next.
19.   On the License Terms page, select the I Accept The License Terms check box.
      Click Next.
20.   Select the SQL Server Database Services and SQL Server Replication check boxes. Click
      Next.
21.   On the Instance Configuration page, specify SQLClust1 as the SQL Server Failover
      Cluster Network Name as well as the Instance Name and Instance ID, as shown in
      Figure 14-14. Click Next.




      FIGURE 14-14 Specifying the cluster name


22.   Verify the disk space requirements. Click Next.
23.   Specify SQLClust1 for the SQL Server cluster resource group name and click Next, as
      shown in Figure 14-15.
24.   Select Disk M and Disk N on the Cluster Disk Selection page, as shown in Figure 14-16.
      Click Next.




FIGURE 14-15 Specifying the cluster resource group




                   FIGURE 14-16 Selecting the cluster disk




 25.   Select the IPv4 check box and specify 10.1.1.6 for the IP address, as shown in
       Figure 14-17. Click Next.




      FIGURE 14-17 Selecting the cluster network


26.   Specify SQLServerService for the Database Engine domain group and
      SQLServerAgentService for the SQL Server Agent domain group on the Cluster Security
      Policy page, and then click Next, as shown in Figure 14-18.




      FIGURE 14-18 Cluster security policy

27.   Specify the service accounts for the SQL Server Agent, SQL Server Database Engine, and
                    Full Text Search Daemon services along with their passwords, as shown in Figure 14-19,
                    and click Next.




                    FIGURE 14-19 Specifying service accounts


                    CAUTION: SERVICE STARTUP
                    The start-up type for SQL Server clustered services should be Manual. The Windows
                    cluster needs to have control over the services. If you change the start-up type to
                    Automatic, you will cause errors with the cluster operations.


             28.    Select Mixed Mode (SQL Server Authentication And Windows Authentication) and click
                    Add Current User, as shown in Figure 14-20.
             29.    Click the Data Directories tab and change the location of the log files to the N: drive, as
                    shown in Figure 14-21. Click Next.
             30.    Select the check boxes on the Error And Usage Reporting page if you want to send
                    error reports and usage data to Microsoft. Click Next.
             31.    Review the Cluster Installation Rules page. Click Show Details if you want to see the
                    setup rules and your configuration’s status. Click Next.
             32.    Review the configuration on the Ready To Install page and click Install.
             33.    At the completion of the installation, click Close.




FIGURE 14-20 Specifying the authentication mode




FIGURE 14-21 Specifying data directories




34.    Observe the resources that are now configured in the SQLClust1 group within Cluster
                    Administrator, as shown in Figure 14-22.




                    FIGURE 14-22 Configured, single-node cluster


             35.    It is impossible to install all nodes in a cluster from a single machine, so switch to
                    Node2, map the directory on Node1 that has the SQL Server installation files, and start
                    SQL Server setup directly from Node2.
             36.    Install the .NET Framework prerequisites, any required hotfixes, and reboot if necessary.
              37.   When you reach the SQL Server Installation Center, click the Installation link, and click
                    Add Node To A SQL Server Failover Cluster.
             38.    After the Setup Support Rules analysis completes, click OK.
             39.    Install the Setup Support files.
             40.    After the second analysis of Setup Support Rules completes, click Next.
             41.    On the Cluster Node Configuration page, select SQLCLUST1 for the instance name, as
                    shown in Figure 14-23. Click Next.
             42.    Specify the passwords for the service account(s). Click Next.
             43.    Select the check boxes on the Error And Usage Reporting page as appropriate and
                    then click Next.
             44.    Verify that the server passes all node rules and click Next.
             45.    Click Install to start the installation, as shown in Figure 14-24.



FIGURE 14-23 Selecting the clustered instance to configure




FIGURE 14-24 Installing binaries on the second node


Lesson Summary
                     You can configure SQL Server in either single-instance or multiple-instance clusters.
                    The LooksAlive and IsAlive health checks provide the capability to detect failures and
                    fail over automatically.
                    Although an instance can use multiple disks, you can associate a disk only to a single
                    SQL Server clustered instance.


            Lesson Review
            You can use the following questions to test your knowledge of the information in Lesson 2,
            “Designing SQL Server 2008 Failover Cluster Instances.” The questions are also available on
            the companion CD if you prefer to review them in electronic form.


                NOTE: ANSWERS
               Answers to these questions and explanations of why each answer choice is right or wrong
               are located in the “Answers” section at the end of the book.


              1.    Consolidated Messenger is experiencing outages because of hardware failures. Because
                    the company’s business is run from SQL Server databases, a solution needs to be
                    implemented to minimize downtime. Management also wants to ensure that the system
                    can recover from failures without requiring the intervention of IT staff. What technology
                    can you use to accomplish these requirements?
                    A. Log shipping
                    B. Replication
                    C. Failover clustering
                    D. Database snapshots
               2.   Trey Research currently has four instances of SQL Server running a variety of databases in
                    support of the company’s medical research. Instance1 requires 200 GB of disk space for
                    databases and serves more than 500 concurrent users. Instance2 requires about 1 terabyte
                    of storage space for a small group of 25 researchers who are investigating genome
                    therapy. Instance3 and Instance4 contain smaller databases that manage all the company’s
                    infrastructure (for example, HumanResources, Payroll, and Contacts). The Genetrak
                    database on Instance1 routinely consumes more than 60 percent of the processor
                    capacity. Instance2 averages 45 percent processor utilization. Which version and edition of
                    Windows is required to build a SQL Server cluster environment at the minimal cost?
                    A. Windows 2000 Advanced Server
                    B. Windows 2000 Datacenter edition
                    C. Windows Server 2003 Standard edition
                    D. Windows Server 2003 Enterprise edition



Chapter Review
To practice and reinforce the skills you learned in this chapter further, you can perform the
following tasks:
       Review the chapter summary.
       Review the list of key terms introduced in this chapter.
       Complete the case scenario. This scenario sets up a real-world situation involving the
       topics of this chapter and asks you to create a solution.
       Complete the suggested practices.
       Take a practice test.


Chapter Summary
       SQL Server clustering is based on Windows clustering to provide automatic failure
       detection and automatic failover.
       A cluster can be configured as a standard cluster with a shared quorum or as a majority
       node set with a copy of the quorum database on each node.
       The LooksAlive and IsAlive health checks are designed to detect hardware failures as
       well as the unavailability of SQL Server for connections.
       SQL Server failover clustering protects only from a hardware failure.


Key Terms
Do you know what these key terms mean? You can check your answers by looking up the
terms in the glossary at the end of the book.
       Cluster group
       Cluster name
       Cluster node
       Cluster resource
       Majority node set cluster
       Quorum database
       Standard cluster


Case Scenario
In the following case scenario, you apply what you’ve learned in this chapter. You can find
answers to these questions in the “Answers” section at the end of this book.

Case Scenario: Planning for High Availability
In the following case scenario, you apply what you’ve learned about failover clustering. You
can find answers to these questions in the “Answers” section at the end of this book.

BACKGROUND

          Company Overview
          Margie’s Travel provides travel services from a single office located in San Diego. Customers
          can meet with an agent in the San Diego office or make arrangements through the
          company’s Web site.

          Problem Statements
          With the addition of a new product catalog, the Web site is experiencing stability issues.
          Customers are also prevented from purchasing products or services at various times during
          the day when changes are being made to the underlying data.
             The company has just fired the consulting firm responsible for developing and managing
          the Web site and all other applications within the company because of its failure to provide
          any availability for business-critical systems.

          Planned Changes
          The newly hired CTO has been tasked with implementing high availability for all
          business-critical systems. The CTO has just hired a DBA and a systems administrator to assist
          in this task as well as manage the day-to-day operations.

          EXISTING DATA ENVIRONMENT
          There are 11 databases within the environment, as shown in Table 14-12.

          TABLE 14-12 Margie’s Travel Databases

           DATABASE         PURPOSE                                                             SIZE

           Orders           Stores all orders placed by customers.                              50 GB
           Customers        Stores all personal information related to a customer.              15 GB
           CreditCards      Stores customer credit card information.                            200 MB
           Employees        Stores information related to all employees.                        50 MB
            HumanResources   Stores all human resource (HR) documents, as well as employee    300 MB
                             salaries.
           Products         Stores the products that can be purchased on the Web site.          25 GB
           Flights          Stores the flights that have been booked by customers.               2 GB
           Cruises          Stores the cruises that have been booked by customers.              1 GB
           CarRental        Stores the car rentals that have been booked by customers.          1 GB
           Excursions       Stores the excursions that have been booked by customers.           2 GB
                            (An excursion is defined as something that is not a flight, cruise,
                            product, or car rental.)
           Admin            A utility database for use by DBAs that is currently empty.         12 GB




The environment has a single Web server named WEB1, along with a single database
server named SQL1. All servers are running on Windows Server 2003, and SQL1 is running
SQL Server 2008.
   SQL1 has an external storage cabinet connected to a redundant array of inexpensive disks
(RAID) controller with a battery backup that is capable of implementing RAID 0, RAID 1, and
RAID 5. The entire array is currently configured as a single RAID 0 set. The current storage is
at only 10 percent capacity.
   A tape drive is connected to both WEB1 and SQL1, but the tape drives have never been used.
   SQL1 and WEB1 are currently located in the cubicle adjacent to the previously fired
consultant. All applications on WEB1 are written using either Active Server Pages (ASP) or
ColdFusion.

PROPOSED ENVIRONMENT
The CTO has allocated a portion of the budget to acquire four more servers configured with
Windows Server 2003 and SQL Server 2008. All hardware will be cluster-capable.
   Data within the Products, Customers, Orders, Flights, Cruises, Excursions, and CarRental
databases can be exposed to the Internet through applications running on WEB1. All other
databases must be behind the firewall and accessible only to users authenticated to the
corporate domain.
   A new SAN is being implemented for database storage that contains sufficient drive
space for all databases. Each of the 20 logical unit numbers (LUNs) configured on the SAN is
configured in a stripe of mirrors configuration, with four disks in each mirror set.

Business Requirements
A short-term solution is in place that enables the system to be fully recovered from any
outage within two business days, with a maximum data loss of one hour. In the event of a
major disaster, the business can survive the loss of up to two days of data.
  A maintenance window between the hours of midnight and 8 A.M. on Sunday is available to
make any changes.
    A longer-term solution needs to be created to protect the company from hardware
failures, with a maximum outage of less than one minute required.

Technical Requirements
The Orders and Customers databases need to be stored on the same SQL Server instance and
fail over together because both databases are linked.
    All HR-related databases must be secured very strongly, with access granted only to the HR
director. All HR data must be encrypted within the database, as well as anywhere else on the network.
   The marketing department needs to build reports against all the customer and order data,
along with the associated products or services that were booked, to develop new marketing
campaigns and product offerings. All analysis requires near real-time data.



All databases are required to maintain 99.92 percent availability over an entire year, which
          allows roughly seven hours of downtime annually. Recovering from an outage must require
          minimal intervention from administrators. Customers using the Web site need to be
          unaware when a failover occurs.
            1.     Which technology or technologies can you use to meet all availability and business
                   needs? (Choose all that apply.)
                   A. A two-node majority node set cluster
                   B. A two-node standard cluster
                   C. Database mirroring
                   D. Replication
            2.     Which technology should be used to meet the needs of the marketing department?
                   A. Failover clustering
                   B. Database mirroring
                   C. Log shipping
                   D. Replication
            3.     Which combinations of Windows and SQL Server meet the needs of Margie’s Travel
                   with the lowest cost?
                   A. Windows Server 2003 Standard edition with SQL Server 2008 Standard
                   B. Windows Server 2003 Enterprise edition with SQL Server 2008 Standard
                   C. Windows Server 2003 Enterprise edition with SQL Server 2008 Enterprise
                   D. Windows Server 2003 Datacenter edition with SQL Server 2008 Datacenter



          Suggested Practices
          To help you master the exam objectives presented in this chapter, complete the following tasks.


          Windows Clustering
          The following suggested practices for this topic are based on the Windows cluster built in the
          practice for Lesson 1.
                   Practice 1 Fail over the cluster from Node1 to Node2 and observe the state of each
                   resource along with the dependency chain.
                   Practice 2   Fail all groups over to Node1. Evict Node2 from the cluster.
                   Practice 3   Add Node2 to the cluster again.
                   Practice 4   Change the IP address for the cluster.
                   Practice 5   Complete the best practices configuration for a Windows cluster by
                   setting the Public network to All Communications and the Private network to Internal
                   Clustering Communications Only.




SQL Server Failover Clustering
The following suggested practices for this topic are based on the SQL Server failover cluster
instance built in the practice for Lesson 2.
       Practice 1   Fail over the SQL Server instance from Node1 to Node2 and observe the
       state of each resource along with the dependency chain.
       Practice 2    Install a second failover cluster instance into your Windows cluster.
       Practice 3    Change the IP address for the server running SQL Server.
       Practice 4 Create a file share, add it to the cluster, and configure it so that it is
       addressable by the same name regardless of which node it is running on.
       Practice 5 Configure the file share so that if it fails to come online during a failover, it
       does not cause the entire group to be taken off-line.


Take a Practice Test
The practice tests on this book’s companion CD offer many options. For example, you can test
yourself on just one exam objective, or you can test yourself on all the 70-432 certification
exam content. You can set up the test so that it closely simulates the experience of taking
a certification exam, or you can set it up in study mode so that you can look at the correct
answers and explanations after you answer each question.

   MORE INFO        PRACTICE TESTS
   For details about all the practice test options available, see the section “How to Use the
   Practice Tests,” in the Introduction to this book.




CHAPTER 15


Database Mirroring
Database Mirroring provides a fault-tolerant alternative to SQL Server failover clustering
while also allowing failure protection to be limited to one or more databases instead of
to the entire instance.
   This chapter explains how to design and deploy Database Mirroring.


Exam objective in this chapter:
   Implement database mirroring.

Lessons in this chapter:
   Lesson 1: Overview of Database Mirroring      452

   Lesson 2: Initializing Database Mirroring    464

   Lesson 3: Designing Failover and Failback Strategies 471



Before You Begin
To complete the lessons in this chapter, you must have the following:
      Three instances of SQL Server installed

         Two of the instances must be running SQL Server 2008 Standard, Enterprise, or
          Developer.

         One of the instances can be any edition of SQL Server, including SQL Server 2008
          Express.
      A SQL Server 2005 version of the AdventureWorks database installed on at least one
      of the instances


   CAUTION     FILESTREAM DATA
   You need a version of the AdventureWorks database from a previous edition of SQL
   Server, because the SQL Server 2008 version of the AdventureWorks database contains
   FILESTREAM data, and Database Mirroring is not compatible with FILESTREAM data.
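
   To confirm that the copy of AdventureWorks you plan to mirror contains no FILESTREAM
data, you can query the database’s file catalog. This is a minimal sketch that assumes the
database is attached under the name AdventureWorks:

   USE AdventureWorks;
   GO
   -- Returns one row per FILESTREAM file; no rows means the database
   -- can participate in Database Mirroring.
   SELECT name, type_desc
   FROM sys.database_files
   WHERE type_desc = 'FILESTREAM';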




Lesson 1: Overview of Database Mirroring
             Database Mirroring introduces new terminology along with its new capabilities. This
             lesson covers that terminology and explains how Database Mirroring operates.


                   After this lesson, you will be able to:
                         Design Database Mirroring roles

                   Estimated lesson time: 45 minutes



            Database Mirroring Roles
            There are two mandatory Database Mirroring roles and a third optional role. You must
            designate a database in a principal role and another database in a mirror role. If you want,
            you can also designate a SQL Server instance in the role of witness server to govern automatic
            failover from the principal to the mirror database. Figure 15-1 shows a reference diagram for
            a Database Mirroring configuration.

            [Figure 15-1 is a diagram: an application connects to the principal SQL Server,
            which sends transactions to the mirror SQL Server, while a witness server
            monitors both partners.]

            FIGURE 15-1 Database Mirroring components


               The databases designated in the role of principal and mirror comprise a Database
            Mirroring session. You can configure an optional witness server for each session, and a single
            witness server can manage multiple Database Mirroring sessions.
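
                Lesson 2 walks through initializing a session in detail, but the following sketch shows
             the overall shape of establishing one. The server names, port, and backup path here are
             hypothetical:

             -- On the mirror instance: restore a full backup (and at least one
             -- transaction log backup) WITH NORECOVERY, then point the database
             -- at the principal's endpoint.
             RESTORE DATABASE AdventureWorks
                 FROM DISK = 'C:\Backups\AdventureWorks.bak'
                 WITH NORECOVERY;
             GO
             ALTER DATABASE AdventureWorks
                 SET PARTNER = 'TCP://principal.contoso.com:5022';
             GO

             -- On the principal instance: complete the session by pointing at
             -- the mirror's endpoint.
             ALTER DATABASE AdventureWorks
                 SET PARTNER = 'TCP://mirror.contoso.com:5022';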




Principal Role
The database that you configure in the principal role becomes the source of all transactions
in a Database Mirroring session. The principal, or primary, database is recovered and allows
connections, and applications can read data from and write data to it.


   NOTE   SERVING THE DATABASE
   When an instance has a database that allows transactions to be processed against it, it is
   said to be “serving the database.”



Mirror Role
The database that you define in the mirror role is the database partner of the principal
database and continuously receives transactions. The Database Mirroring process constantly
transfers transaction log records from the principal into the mirror’s transaction log and
replays them against the mirror’s data files so that the mirror database contains the same
data as the principal database. The mirror database is in a recovering state, so it does not
allow connections of any kind, and transactions cannot be written directly to it. However,
you can create a database snapshot against a mirror database to give users read-only access
to the database’s data at a specific point in time.
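
   For example, a snapshot such as the following gives reporting users a static, point-in-time
view of the mirror. The snapshot name and file path are illustrative, and NAME must match
the logical name of the source database’s data file:

   CREATE DATABASE AdventureWorks_Snapshot
   ON ( NAME = AdventureWorks_Data,
        FILENAME = 'C:\Snapshots\AdventureWorks_Snapshot.ss' )
   AS SNAPSHOT OF AdventureWorks;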


   NOTE   TRANSIENT OPERATING STATES
   The principal and mirror roles are transient operating states within a Database Mirroring
   session. Because the databases are exact equivalents and are maintained in synch with each
   other, either database can take on the role of principal or mirror at any time.
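
   You can check the role each mirrored database currently holds, along with the state of its
session, by querying the sys.database_mirroring catalog view:

   SELECT DB_NAME(database_id) AS database_name,
          mirroring_role_desc,
          mirroring_state_desc,
          mirroring_safety_level_desc
   FROM sys.database_mirroring
   WHERE mirroring_guid IS NOT NULL;  -- Only databases participating in mirroring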



Witness Server
The witness server role is the third optional role that you can define for Database Mirroring.
The sole purpose of the witness is to serve as an arbiter within the High Availability operating
mode to ensure that the database can be served on only one SQL Server instance at a time. If
the principal database fails and the witness confirms the failure, the mirror database can take
over the principal role and make its data available to users.
    Although Database Mirroring enables a principal and mirror to occur only in pairs
(for example, a principal cannot have more than one mirror, and vice versa), a witness
server can service multiple Database Mirroring pairs. The sys.database_mirroring_witnesses
catalog view stores a single row for each Database Mirroring pair that is serviced by the
witness.
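
    For example, running the following on the witness instance lists every Database Mirroring
session that the witness is servicing:

    SELECT database_name,
           principal_server_name,
           mirror_server_name,
           safety_level_desc
    FROM sys.database_mirroring_witnesses;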




                IMPORTANT   DATABASE-LEVEL VS. SERVER-LEVEL ROLES
               Principal and mirror roles occur at a database level and must be defined within SQL Server
               2008 instances that are running either SQL Server 2008 Standard or Enterprise. However,
               you define the witness role at an instance level. The instance of SQL Server 2008 that you
               use for the witness server can be running any edition, including SQL Server 2008 Express,
               which is why you refer to a principal or mirror database but a witness server.



            Database Mirroring Endpoints
            All Database Mirroring traffic is transmitted through a TCP endpoint with a payload of
            DATABASE_MIRRORING. You can create only one Database Mirroring endpoint per SQL
            Server instance.


               MORE INFO     ENDPOINTS
               For more information about defining endpoints, please refer to Chapter 8, “Designing SQL
               Server Endpoints.”


                By default, the Database Mirroring endpoint is defined on port 5022. Although port 5022
             can be used for Database Mirroring, it is recommended that you choose a different port
             number so that an attacker scanning for well-known default configurations cannot find the
             endpoint immediately.
                You can configure multiple SQL Server instances on a single server, and each instance can
            have a single Database Mirroring endpoint. However, you must set the port number for the
            Database Mirroring endpoint on each instance on the same server to a different port number.
            If you will be using only a single instance per server for Database Mirroring, you should
            standardize a port number within your environment.
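
                As a sketch, the following creates a Database Mirroring endpoint on a nondefault port;
             7024 is an arbitrary choice for illustration:

             -- ROLE = ALL lets the instance act as a partner or a witness;
             -- use PARTNER or WITNESS to restrict the endpoint to one role.
             CREATE ENDPOINT Mirroring
                 STATE = STARTED
                 AS TCP ( LISTENER_PORT = 7024 )
                 FOR DATABASE_MIRRORING ( ROLE = ALL );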
                You can assign a name to each endpoint that you create. The name for a Database
            Mirroring endpoint is used only when the state is being changed or a GRANT/REVOKE
            statement is being issued. Because the endpoint name is used only by a database
            administrator (DBA) for internal operations, it is recommended that you leave the name set to
            its default value of Mirroring.
                Security is the most important aspect that you configure for Database Mirroring. You
             can configure the Database Mirroring endpoint for either encrypted or nonencrypted
             communications. It is recommended that you keep the default setting, which encrypts all
             traffic between endpoints. If the instances participating in Database Mirroring do not use
             the same service account for the SQL Server service, you must ensure that each service
             account is granted access to the SQL Server instance and granted CONNECT permission on
             the Database Mirroring endpoint.
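
                For instances running under different service accounts, a sketch of the required grants
             looks like this; the domain account name is hypothetical:

             -- Allow the partner instance's service account to connect
             -- to this instance's Database Mirroring endpoint.
             CREATE LOGIN [CONTOSO\SQLMirrorSvc] FROM WINDOWS;
             GO
             GRANT CONNECT ON ENDPOINT::Mirroring TO [CONTOSO\SQLMirrorSvc];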




MORE INFO     SECURING AN ENDPOINT
   For more information about defining the security of a Database Mirroring endpoint, please
   refer to Chapter 8.



Operating Modes
You can configure Database Mirroring for three different operating modes: High Availability,
High Performance, and High Safety. The operating mode governs the way SQL Server
transfers transactions between the principal and the mirror databases, as well as the failover
processes that are available in the Database Mirroring session. In this lesson, you learn about
each operating mode, the benefits of each mode, and how caching and Transparent Client
Redirect capabilities give Database Mirroring advantages over other availability technologies.
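
   The operating modes map to the session’s SAFETY setting and the presence of a witness.
As a sketch, using AdventureWorks and a hypothetical witness address:

   -- High Performance: asynchronous transfer, manual failover only.
   ALTER DATABASE AdventureWorks SET PARTNER SAFETY OFF;

   -- High Safety: synchronous transfer, no automatic failover.
   ALTER DATABASE AdventureWorks SET PARTNER SAFETY FULL;

   -- High Availability: synchronous transfer plus a witness, which
   -- enables automatic failure detection and automatic failover.
   ALTER DATABASE AdventureWorks SET PARTNER SAFETY FULL;
   ALTER DATABASE AdventureWorks SET WITNESS = 'TCP://witness.contoso.com:7024';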

High Availability Operating Mode
High Availability operating mode provides durable synchronous transfer between the principal
and mirror databases, as well as automatic failure detection and automatic failover.
   SQL Server first writes all transactions into memory buffers within the SQL Server memory
space. The system writes out these memory buffers to the transaction log. When SQL
Server writes a transaction to the transaction log, the system triggers Database Mirroring
to begin transferring the transaction log rows for a given transaction to the mirror. When
the application issues a commit for the transaction, SQL Server does not acknowledge the
commit to the application until the mirror has hardened those log records to its own
transaction log, which guarantees that a committed transaction cannot be lost if the
principal fails.
  • 20. Lesson 2: Working with the SQL Server Profiler . . . . . . . . . . . . . . . . . . . . . 317 Defining a Trace. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 Specifying Trace Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .320 Selecting Data Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 Applying Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 Managing Traces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324 Correlating Performance and Monitoring Data . . . . . . . . . . . . . . . . . . . . . 325 Lesson Summary 330 Lesson Review 331 Lesson 3: Diagnosing Database Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332 SQL Server Logs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332 Database Space Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .334 Lesson Summary 338 Lesson Review 339 Lesson 4: Diagnosing Service Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .340 Finding Service Startup Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .340 Configuration Manager 340 Lesson Summary 349 Lesson Review 349 Lesson 5: Diagnosing Hardware Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351 Disk Drives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351 Memory and Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352 Lesson Summary 353 Lesson Review 353 Lesson 6: Resolving Blocking and Deadlocking Issues . . . . . . . . . . . . . . . . 355 Locks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355 Transaction Isolation Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356 Blocked Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357 Deadlocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .358 Lesson Summary 362 Lesson Review 362 xviii Contents
  • 21. Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .363 Chapter Summary 363 Key Terms 363 Case Scenario 363 Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .366 Creating a Trace Using SQL Server Profiler to Diagnose Performance and Deadlock Issues 366 Create a Counter Log Using System Monitor to Diagnose Performance, Deadlock, and System Issues 366 Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .366 Chapter 13 Optimizing Performance 367 Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .367 Lesson 1: Using the Database Engine Tuning Advisor . . . . . . . . . . . . . . . .369 Database Engine Tuning Advisor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .369 Lesson Summary 375 Lesson Review 375 Lesson 2: Working with Resource Governor. . . . . . . . . . . . . . . . . . . . . . . . . 376 Resource Governor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376 Lesson Summary 386 Lesson Review 386 Lesson 3: Using Dynamic Management Views and Functions . . . . . . . . .387 DMV Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .387 Database Statistics 388 Query Statistics 389 Disk Subsystem Statistics 390 Hardware Resources 391 Lesson Summary 393 Lesson Review 394 Lesson 4: Working with the Performance Data Warehouse . . . . . . . . . . .395 Performance Data Warehouse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .395 Lesson Summary 400 Lesson Review 400 Contents xix
  • 22. Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .402 Chapter Summary 402 Key Terms 402 Case Scenario 403 Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .405 Using the Performance Data Warehouse to Gather Data for Performance Optimization 405 Using Database Engine Tuning Advisor to Gather Data for Performance Optimization 405 Using Dynamic Management Views to Gather Data for Performance Optimization 405 Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .406 Chapter 14 Failover Clustering 407 Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .407 Lesson 1: Designing Windows Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . 410 Windows Cluster Components 410 Types of Clusters 412 Security Configuration 413 Disk Configuration 413 Network Configuration 414 Cluster Resources 415 Cluster Groups 416 Lesson Summary 428 Lesson Review 429 Lesson 2: Designing SQL Server 2008 Failover Cluster Instances . . . . . . .430 Terminology 431 Failover Cluster Instance Components 431 Health Checks 433 Cluster Failover 433 Lesson Summary 444 Lesson Review 444 Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .445 Chapter Summary 445 Key Terms 445 Case Scenario 445 xx Contents
  • 23. Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .448 Windows Clustering 448 SQL Server Failover Clustering 449 Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .449 Chapter 15 Database Mirroring 451 Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451 Lesson 1: Overview of Database Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . 452 Database Mirroring Roles 452 Principal Role 453 Mirror Role 453 Witness Server 453 Database Mirroring Endpoints 454 Operating Modes 455 Caching 458 Transparent Client Redirect 458 Database Mirroring Threading 459 Database Snapshots 459 Lesson Summary 462 Lesson Review 462 Lesson 2: Initializing Database Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . .464 Recovery Model 465 Backup and Restore 465 Copy System Objects 466 Lesson Summary 469 Lesson Review 469 Lesson 3: Designing Failover and Failback Strategies . . . . . . . . . . . . . . . . . 471 Designing Mirroring Session Failover 471 Designing Mirroring Session Failback 472 Lesson Summary 474 Lesson Review 475 Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476 Chapter Summary 476 Key Terms 476 Case Scenario 477 Contents xxi
  • 24. Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .480 Establishing Database Mirroring 480 Creating a Database Snapshot Against a Database Mirror 480 Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .481 Chapter 16 Log Shipping 483 Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .483 Lesson 1: Overview of Log Shipping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .484 Log Shipping Scenarios 484 Log Shipping Components 485 Types of Log Shipping 487 Lesson Summary 487 Lesson Review 488 Lesson 2: Initializing Log Shipping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .489 Log Shipping Initialization 489 Lesson Summary 499 Lesson Review 499 Lesson 3: Designing Failover and Failback Strategies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .500 Log Shipping Failover 501 Log Shipping Failback 502 Lesson Summary 505 Lesson Review 505 Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .506 Chapter Summary 506 Key Terms 506 Case Scenario 507 Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511 Initiating Log Shipping 511 Failover and Failback Log Shipping 512 Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512 xxii Contents
  • 25. Chapter 17 Replication 513 Before You Begin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513 Lesson 1: Overview of Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514 Replication Components 514 Replication Roles 515 Replication Topologies 516 Replication Agents 517 Agent Profiles 518 Replication Methods 519 Data Conflicts 521 Lesson Summary 524 Lesson Review 525 Lesson 2: Transactional Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526 Change Tracking 526 Transactional Options 528 Transactional Architectures 530 Monitoring 532 Validation 532 Lesson Summary 536 Lesson Review 537 Lesson 3: Merge Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .538 Change Tracking 538 Validation 541 Lesson Summary 543 Lesson Review 543 Chapter Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .545 Chapter Summary 545 Key Terms 545 Case Scenario 546 Suggested Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 550 Transactional Replication 550 Merge Replication 551 Take a Practice Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552 Contents xxiii
  • 26. Glossary 553 Answers 561 Index 599 xxiv Contents
Acknowledgements

Thank you to all my readers over the past decade or so; it's hard to believe that this will be the seventh SQL Server book I've written, and it would not be possible without you.

I'd like to thank my editorial team at Microsoft Press, Denise Bankaitis and Laura Sackerman. I would especially like to thank Ken Jones, who has gone through five books with me and has proved to be an invaluable asset to Microsoft Press. Thank you to Rozanne Whalen, who has now tech-edited three books for me. I don't know how she does it, but Susan McClung's word wizardry has transformed my writing into the volume you hold in your hands. That all of this content is coherent is a testament to the many hours of hard work put in by Rozanne, Susan, and the rest of the editing team.
Introduction

This training kit is designed for information technology (IT) professionals who plan to take the Microsoft Certified Technology Specialist (MCTS) Exam 70-432, as well as database administrators (DBAs) who need to know how to implement, manage, and troubleshoot Microsoft SQL Server 2008 instances. It's assumed that before using this training kit, you already have a working knowledge of Microsoft Windows and SQL Server 2008, and you have experience with SQL Server or another database platform.

By using this training kit, you learn how to do the following:

- Install and configure SQL Server 2008
- Create and implement database objects
- Implement high availability and disaster recovery
- Secure instances, databases, and database objects
- Monitor and troubleshoot SQL Server instances

Using the CD and DVD

A companion CD and an evaluation software DVD are included with this training kit. The companion CD contains the following:

Practice tests  You can reinforce your understanding of how to implement and maintain SQL Server 2008 databases by using electronic practice tests that you can customize to meet your needs from the pool of Lesson Review questions in this book. Alternatively, you can practice for the 70-432 certification exam by using tests created from a pool of about 200 realistic exam questions, which will give you enough different practice tests to ensure that you're prepared.

Practice files  Not all exercises incorporate code, but for each exercise that has code, you find one or more files in a folder for the corresponding chapter on the companion CD. You can either type the code from the book or open the corresponding code file in a query window.

eBook  An electronic version (eBook) of this training kit is included for use at times when you don't want to carry the printed book with you. The eBook is in Portable Document Format (PDF), and you can view it by using Adobe Acrobat or Adobe Reader. You can use the eBook to cut and paste code as you work through the exercises.

Sample chapters  Sample chapters from other Microsoft Press titles on SQL Server 2008. These chapters are in PDF format.
Evaluation software  The evaluation software DVD contains a 180-day evaluation edition of SQL Server 2008 in case you want to use it instead of a full version of SQL Server 2008 to complete the exercises in this book.

Digital Content for Digital Book Readers: If you bought a digital-only edition of this book, you can enjoy select content from the print edition's companion CD. Visit http://go.microsoft.com/fwlink/?LinkId=139187 to get your downloadable content. This content is always up-to-date and available to all readers.

How to Install the Practice Tests

To install the practice test software from the companion CD to your hard disk, perform the following steps:

1. Insert the companion CD into your CD-ROM drive and accept the license agreement that appears onscreen. A CD menu appears.

NOTE: ALTERNATIVE INSTALLATION INSTRUCTIONS IF AUTORUN IS DISABLED
If the CD menu or the license agreement doesn't appear, AutoRun might be disabled on your computer. Refer to the Readme.txt file on the companion CD for alternative installation instructions.

2. Click Practice Tests and follow the instructions on the screen.

How to Use the Practice Tests

To start the practice test software, follow these steps:

1. Click Start and select All Programs, Microsoft Press Training Kit Exam Prep. A window appears that shows all the Microsoft Press training kit exam prep suites that are installed on your computer.

2. Double-click the lesson review or practice test that you want to use.

Lesson Review Options

When you start a lesson review, the Custom Mode dialog box appears, enabling you to configure your test. You can click OK to accept the defaults, or you can customize the number of questions you want, the way the practice test software works, which exam objectives you want the questions to relate to, and whether you want your lesson review to be timed. If you are retaking a test, you can select whether you want to see all the questions again or only those questions you previously skipped or answered incorrectly.
After you click OK, your lesson review starts. You can take the test by performing the following steps:

1. Answer the questions and use the Next, Previous, and Go To buttons to move from question to question.

2. After you answer an individual question, if you want to see which answers are correct, along with an explanation of each correct answer, click Explanation.

3. If you would rather wait until the end of the test to see how you did, answer all the questions and then click Score Test. You see a summary of the exam objectives that you chose and the percentage of questions you got right overall and per objective. You can print a copy of your test, review your answers, or retake the test.

Practice Test Options

When you start a practice test, you can choose whether to take the test in Certification Mode, Study Mode, or Custom Mode.

Certification Mode  Closely resembles the experience of taking a certification exam. The test has a set number of questions, it is timed, and you cannot pause and restart the timer.

Study Mode  Creates an untimed test in which you can review the correct answers and the explanations after you answer each question.

Custom Mode  Gives you full control over the test options so that you can customize them as you like.

In all modes, the user interface that you see when taking the test is basically the same, but different options are enabled or disabled, depending on the mode. The main options are discussed in the previous section, "Lesson Review Options."

When you review your answer to an individual practice test question, a "References" section is provided. This section lists the location in the training kit where you can find the information that relates to that question, and it provides links to other sources of information. After you click Test Results to score your entire practice test, you can click the Learning Plan tab to see a list of references for every objective.

How to Uninstall the Practice Tests

To uninstall the practice test software for a training kit, use the Add Or Remove Programs option (Windows XP or Windows Server 2003) or the Program And Features option (Windows Vista or Windows Server 2008) in Control Panel.
Microsoft Certified Professional Program

Microsoft certifications provide the best method to prove your command of current Microsoft products and technologies. The exams and corresponding certifications are developed to validate your mastery of critical competencies as you design and develop or implement and support solutions with Microsoft products and technologies. Computer professionals who become Microsoft-certified are recognized as experts and are sought after industry-wide. Certification brings a variety of benefits to the individual and to employers and organizations.

MORE INFO: LIST OF MICROSOFT CERTIFICATIONS
For a full list of Microsoft certifications, go to http://www.microsoft.com/learning/mcp/default.mspx.

Technical Support

Every effort has been made to ensure the accuracy of this book and the contents of the companion CD. If you have comments, questions, or ideas regarding this book or the companion CD, please send them to Microsoft Press by using either of the following methods:

E-mail: tkinput@microsoft.com

Postal Mail:
Microsoft Press
Attn: MCTS Self-Paced Training Kit (Exam 70-432): Microsoft SQL Server 2008 Implementation and Maintenance Editor
One Microsoft Way
Redmond, WA 98052-6399

For additional support information regarding this book and the companion CD (including answers to commonly asked questions about installation and use), visit the Microsoft Press Technical Support Web site at http://www.microsoft.com/learning/support/books. To connect directly to the Microsoft Knowledge Base and enter a query, visit http://support.microsoft.com/search. For support information regarding Microsoft software, please connect to http://support.microsoft.com.
Evaluation Edition Software

The 180-day evaluation edition provided with this training kit is not the full retail product and is provided only for the purposes of training and evaluation. Microsoft and Microsoft Technical Support do not support this evaluation edition. Information about any issues relating to the use of this evaluation edition with this training kit is posted in the Support section of the Microsoft Press Web site (http://www.microsoft.com/learning/support/books/). For information about ordering the full version of any Microsoft software, please call Microsoft Sales at (800) 426-9400 or visit http://www.microsoft.com.
CHAPTER 1

Installing and Configuring SQL Server 2008

This chapter will prepare you to install Microsoft SQL Server instances. You will learn about the capabilities of each SQL Server edition as well as the hardware requirements to install SQL Server. At the end of this chapter, you will be able to configure services and SQL Server components. You will also learn how to configure Database Mail, which will be used for a variety of notification tasks.

Exam objectives in this chapter:

- Install SQL Server 2008 and related services.
- Configure SQL Server instances.
- Configure SQL Server services.
- Configure additional SQL Server components.
- Implement Database Mail.

Lessons in this chapter:

- Lesson 1: Determining Hardware and Software Requirements
- Lesson 2: Selecting SQL Server Editions
- Lesson 3: Installing and Configuring SQL Server Instances
- Lesson 4: Configuring Database Mail

Before You Begin

To complete the lessons in this chapter, you must have both of the following:

- A machine that meets or exceeds the minimum hardware and software requirements as outlined in Lesson 1
- SQL Server 2008 installation media
REAL WORLD
Michael Hotek

SQL Server 2008 is not simply a database, but is instead a complete database platform consisting of numerous services and hundreds of capabilities. All too frequently, organizations simply "point and click" to install SQL Server and then start loading data. Prior to installing, you need to determine how the SQL Server computer is going to be used, as well as the hardware resources required.

Not too long ago, I was working with a company that just installed servers running SQL Server and depended upon being able to change configurations as they went. Unfortunately, no one did the homework for a new application the company was deploying. SQL Server was installed, and the DBA team deployed the database structure and started to load data. Suddenly, the load procedures aborted and the database was no longer accessible. They had undersized the disk drives and had run out of space during the load process. After they allocated more disk space and started the load process again, they encountered another error, which made SQL Server unavailable. Although they had allocated additional disk space to the database, Tempdb had now run out of space.

After multiple retries, they finally got the data loaded, only to find out that the design specifications called for replication, Service Broker, and CLR capabilities. After installing replication support and configuring Service Broker and the CLR routines, the system went into production, 16 days behind schedule. In less than one day, all the users were complaining about slow response times. The DBA team planned to have only 20 concurrent users in the application, the maximum number they had ever seen before; yet more than 2,000 people were trying to use the new application. The single processor machine with 2 GB of RAM was insufficient to handle 2,000 concurrent users attempting to access more than 400 GB of data.

After taking the application off-line, buying new hardware, and redeploying the system, the new application went back online, 43 days behind their scheduled date. Most of the users had moved on to other systems deployed by competitors. The company wasted millions of dollars of holiday advertising due to lack of planning at both the installation and deployment stages.
Lesson 1: Determining Hardware and Software Requirements

SQL Server 2008 has very minimal hardware and software requirements. This lesson explains the minimum hardware requirements along with operating system versions and additional software necessary to run SQL Server 2008 instances.

IMPORTANT: MINIMUM HARDWARE REQUIREMENTS
This lesson outlines the minimum requirements for installing SQL Server. Production systems usually require significantly more hardware to meet performance and capacity expectations. You need to apply the knowledge from subsequent chapters in this book to help you determine the memory, disk storage, and processor requirements that may be required by a given application.

After this lesson, you will be able to:
- Verify minimum hardware requirements
- Verify operating system support
- Verify additional software required

Estimated lesson time: 20 minutes

Minimum Hardware Requirements

SQL Server 2005 had a variety of requirements that depended upon the edition of SQL Server as well as whether it was a 32-bit or 64-bit version. SQL Server 2008 simplifies the minimum hardware requirements for a SQL Server instance. The minimum hardware requirements are listed in Table 1-1.

TABLE 1-1 Hardware Requirements

REQUIREMENT       32-BIT                          64-BIT
Processor         Pentium III or higher           Itanium, Opteron, Athlon, or
                                                  Xeon/Pentium with EM64T support
Processor Speed   1.0 gigahertz (GHz) or higher   1.6 GHz or higher
Memory            512 megabytes (MB)              512 MB

The amount of disk space consumed by the installation depends upon the services and utilities that are installed. To determine the amount of disk space required, please refer to the SQL Server Books Online article, "Hardware and Software Requirements for Installing SQL Server 2008," at http://technet.microsoft.com/en-us/library/ms143506.aspx.
IMPORTANT: ADDITIONAL HARDWARE COMPONENTS
SQL Server Books Online lists a mouse, CD/DVD drive, and monitor with at least 1024 x 768 resolution as requirements for installation. However, it is possible to install SQL Server to a computer that does not have any of these devices attached, which is very common within a server environment. A CD/DVD drive is required only if you are installing from a disk. A monitor is required only if you are using the graphical tools.

Supported Operating Systems

SQL Server 2008 is supported on 32-bit and 64-bit versions of Microsoft Windows. The 64-bit version of SQL Server can install only to a 64-bit version of Windows. The 32-bit version of SQL Server can be installed to either a 32-bit version of Windows or to a 64-bit version of Windows with Windows on Windows (WOW) enabled.

The operating systems supported for all editions of SQL Server are:
- Windows Server 2008 Standard or higher
- Windows Server 2003 Standard SP2 or higher

The operating systems supported for SQL Server Developer, Evaluation, and Express are:
- Windows XP Professional SP2 or higher
- Windows Vista Home Basic or higher

SQL Server Express is also supported on:
- Windows XP Home Edition SP2 or higher
- Windows XP Home Reduced Media Edition
- Windows XP Tablet Edition SP2 or higher
- Windows XP Media Center 2002 SP2 or higher
- Windows XP Professional Reduced Media Edition
- Windows XP Professional Embedded Edition Feature Pack 2007 SP2
- Windows XP Professional Embedded Edition for Point of Service SP2
- Windows Server 2003 Small Business Server Standard Edition R2 or higher

EXAM TIP
SQL Server 2008 is not supported on Windows Server 2008 Server Core. Windows Server 2008 Server Core is not supported because the .NET Framework is not supported on Server Core. SQL Server 2008 relies on .NET Framework capabilities to support FILESTREAM, SPATIAL, and DATE data types, along with several additional features.
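Once an instance is installed, you can confirm exactly what you ended up with from a query window. This is a minimal sketch using only built-in functions; the values shown in the comments are illustrative examples, not guaranteed output:

-- Build, edition, and host Windows version in a single string
SELECT @@VERSION AS VersionInfo;

-- The same details as discrete values
SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,  -- e.g., 10.0.1600.22
       SERVERPROPERTY('ProductLevel')   AS ProductLevel,    -- RTM, SP1, ...
       SERVERPROPERTY('Edition')        AS Edition;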
Software Requirements

SQL Server 2008 requires .NET Framework 3.5. Although the installation routine installs the required versions of the .NET Framework, you need to have Windows Installer 4.5 on the computer prior to the installation of SQL Server.

IMPORTANT: .NET FRAMEWORK
.NET Framework 2.0 includes Windows Installer 3.1, so if you have .NET Framework 2.0 already installed, you meet the minimum requirements. However, to minimize the amount of time required for installation, it is recommended that you install all versions of the .NET Framework through version 3.5 on the machine prior to installing SQL Server.

The SQL Server setup routine also requires:
- Microsoft Data Access Components (MDAC) 2.8 SP1 or higher
- Shared Memory, Named Pipes, or TCP/IP networking support
- Internet Explorer 6 SP1 or higher

Quick Check
1. What edition of Windows Server 2008 is not supported for SQL Server 2008 installations?
2. Which operating systems are supported for all editions of SQL Server?

Quick Check Answers
1. Windows Server 2008 Server Core is not supported for SQL Server 2008 installations.
2. Windows Server 2003 Standard SP2 or higher, and Windows Server 2008 Standard or higher.

PRACTICE: Verify Minimum Requirements

In the following practices, you verify that your machine meets the minimum hardware, operating system, and supporting software requirements for a SQL Server installation.

PRACTICE 1: Verify Hardware and Operating System Requirements

In this practice, you verify that your computer meets the minimum hardware and operating system requirements to install SQL Server 2008.

1. Click Start, right-click My Computer, and select Properties.
2. On the General tab under System, verify that your operating system meets the minimum requirements.
3. On the General tab under Computer, verify that your computer meets the minimum hardware requirements.

PRACTICE 2: Verify Supporting Software Requirements

In this practice, you verify that you have the appropriate supporting software installed.

1. Click Start, and then select Control Panel.
2. Double-click Add/Remove Programs.
3. Verify that you have the minimum versions of Windows Internet Explorer and the .NET Framework installed by performing the following steps:
   a. Click Start, and then select Run.
   b. Enter regedit in the text box.
   c. When the Registry Editor opens, browse through the navigation pane to HKEY_LOCAL_MACHINE\Software\Microsoft\DataAccess.
4. Verify the MDAC version in the FullInstallVer key.

Lesson Summary

- SQL Server 2008 is supported on both 32-bit and 64-bit operating systems.
- You can install all editions of SQL Server 2008 on either Windows Server 2003 Standard Edition SP2 and higher or Windows Server 2008 Standard and higher.
- You cannot install SQL Server 2008 on Windows Server 2008 Server Core.

Lesson Review

The following questions are intended to reinforce key information presented in Lesson 1, "Determining Hardware and Software Requirements." The questions are also available on the companion CD if you prefer to review them in electronic form.

NOTE: ANSWERS
Answers to these questions and explanations of why each answer choice is right or wrong are located in the "Answers" section at the end of the book.

1. You are deploying a new server within Wide World Importers that will be running a SQL Server 2008 instance in support of a new application. Because of the feature support that is needed, you will be installing SQL Server 2008 Enterprise. Which operating systems will support your installation? (Choose all that apply.)
   A. Windows 2000 Server Enterprise SP4 or higher
   B. Windows Server 2003 Enterprise
   C. Windows Server 2003 Enterprise SP2
   D. Windows Server 2008 Enterprise
2. You are deploying SQL Server 2008 Express in support of a new Web-based application that will enable customers to order directly from Coho Vineyards. Which operating system does NOT support your installation?
   A. Windows XP Home Edition SP2
   B. Windows Server 2008 Server Core
   C. Windows Server 2003 Enterprise SP2
   D. Windows XP Tablet Edition SP2
Lesson 2: Selecting SQL Server Editions

SQL Server 2008 is available in several editions, ranging from editions designed for mobile, embedded applications with a very small footprint to editions designed to handle petabytes of data being manipulated by millions of concurrent users. This lesson explains the services available within the SQL Server 2008 database platform and the differences between the SQL Server editions.

After this lesson, you will be able to:
- Understand the differences between SQL Server 2008 Enterprise, Workgroup, Standard, and Express
- Understand the role of each service that ships within the SQL Server 2008 data platform

Estimated lesson time: 20 minutes

SQL Server Services

SQL Server 2008 is much more than a simple database used to store data. Within the SQL Server 2008 data platform are several services that can be used to build any conceivable application within an organization. Within the core database engine, you will find services to store, manipulate, and back up and restore data. The core database engine also contains advanced security capabilities to protect your investments, along with services to ensure maximum availability. Your data infrastructure can be extended to handle unstructured text along with synchronizing multiple copies of a database. Many of these capabilities are discussed in subsequent chapters in this book.

Service Broker

Service Broker was introduced in SQL Server 2005 to provide a message queuing system integrated into the SQL Server data platform. Based on user-defined messages and processing actions, you can use Service Broker to provide asynchronous data processing capabilities. Not only is Service Broker a capable message queuing system, you can also provide advanced business process orchestration with Service Broker handling data processing across a myriad of platforms, all without requiring the user to wait for the process to complete or affecting the user in any other way.

SQL Server Integration Services

SQL Server Integration Services (SSIS) features all the enterprise class capabilities that you can find in Extract, Transform, and Load (ETL) applications while also allowing organizations to build applications that can manage databases and system resources, respond to database and system events, and even interact with users.
SSIS has a variety of tasks to enable packages to upload or download files from File Transfer Protocol (FTP) sites, manipulate files in directories, import files into databases, or export data to files. SSIS can also execute applications, interact with Web services, send and receive messages from Microsoft Message Queue (MSMQ), and respond to Windows Management Instrumentation (WMI) events. Containers allow SSIS to execute entire tasks and workflows within a loop with a variety of inputs, from a simple counter to files in a directory or across the results of a query. Specialized tasks are included to copy SQL Server objects around an environment as well as manage database backups, re-indexing, and other maintenance operations. If SSIS does not ship with a task already designed to meet your needs, you can write your own processes using the Visual Studio Tools for Applications or even design your own custom tasks that can be registered and utilized within SSIS.

Precedence constraints allow you to configure the most complicated operational workflows, where processing can be routed based on whether a component succeeds, fails, or simply completes execution. In addition to the static routing based on completion status, you can combine expressions to make workflow paths conditional. Event handlers allow you to execute entire workflows in response to events that occur at a package or task level, such as automatically executing a workflow to move a file to a directory when it cannot be processed, log the details of the error, and send an e-mail to an administrator. Package configurations enable developers to expose internal properties of a package such that the properties can be modified for the various environments in which a package will be executing. By exposing properties in a configuration, administrators have a simple way of reconfiguring a package, such as changing database server names or directories, without needing to edit the package.

Beyond the workflow tasks, SSIS ships with extensive data movement and manipulation components. Although it is possible for you to simply move data from one location to another within a data flow task, you can also apply a wide variety of operations to the data as it moves through the engine. You can scrub invalid data, perform extensive calculations, and convert data types as the data moves through a pipeline. Inbound data flows can be split to multiple destinations based on a condition. The data flow task has the capability to perform data lookups against sources to either validate inbound data or include additional information as the data is sent to a destination. Fuzzy lookups and fuzzy grouping can be applied to allow very flexible matching and grouping capabilities beyond simple wildcards. Multiple inbound data flows can be combined to be sent to a single destination. Just as multiple inbound flows can be combined, you can also take a single data flow and broadcast to multiple destinations. Within an SSIS data flow task, you can also remap characters, pivot or unpivot data sets, calculate aggregates, sort data, perform data sampling, and perform text mining. If SSIS does not have a data adapter capable of handling the format of your data source or data destination or does not have a transform capable of the logic that you need to perform, a script component is included that allows you to bring the entire capabilities of Visual Studio Tools for Applications to bear on your data.
SQL Server Reporting Services

Organizations of all sizes need to have access to the vast quantities of data stored throughout the enterprise in a consistent and standardized manner. Although it would be nice to expect everyone to know how to write queries against data sources to obtain the data that is needed or to have developers available to write user interfaces for all the data needs, most organizations do not have the resources. Therefore, tools need to be available to create standardized reports that are made available throughout the organization, as well as providing the ability for users to build reports on an ad hoc basis.

SQL Server Reporting Services (SSRS) fills the data delivery gap by providing a flexible platform for designing reports as well as distributing data throughout an organization. The IT department can build complex reports rapidly, which are deployed to one or more portals that can be accessed based on flexible security rules. The IT department can also design and publish report models that allow users to build their own reports without needing to understand the underlying complexities of a database. Reports built by IT as well as by users can be deployed to a centralized reporting portal that allows members of the organization to access the information they need to do their jobs. Users can access reports which are either generated on the fly or displayed from cached data that is refreshed on a schedule. Users can also configure subscriptions to a report which allow SSRS to set up a schedule to execute the report and then send it to users on their preferred distribution channel, formatted to their specifications. For example, a sales manager can create a subscription to a daily sales report such that the report is generated at midnight after all sales activity is completed, have it rendered in a Portable Document Format (PDF) format, and dropped in his e-mail inbox for review in the morning.

SSRS ships with two main components, a report server and a report designer. The report server is responsible for hosting all the reports and applying security. When reports are requested, the report server is responsible for connecting to the underlying data sources, gathering data, and rendering the report into the final output. Rendering a report is accomplished either on demand from a user or through a scheduled task which allows the report to be run during off-peak hours. For the report server to have anything to deliver to users, reports must first be created. The report designer is responsible for all the activities involved in creating and debugging reports. Components are included that allow users to create both simple tabular or matrix reports and more complex reports with multiple levels of subreports, nested reports, charts, linked reports, and links to external resources. Within your reports, you can embed calculations and functions, combine tables, and even vary the report output based on the user accessing the report. The report designer is also responsible for designing reporting models that provide a powerful semantic layer which masks the complexities of a data source from users so that they can focus on building reports.
SQL Server Analysis Services

As the volume of data within an organization explodes, you need to deploy tools that allow users to make business decisions on a near-real-time basis. Users can no longer wait for IT to design reports for the hundreds of questions that might be asked by a single user. At the same time, IT does not have the resources to provide the hundreds of reports necessary to allow people to manage a business. SQL Server Analysis Services (SSAS) was created to fill the gap between the data needs of business users and the ability of IT to provide data.

SSAS encompasses two components: Online Analytical Processing (OLAP) and Data Mining. The OLAP engine allows you to deploy, query, and manage cubes that have been designed in Business Intelligence Development Studio (BIDS). You can include multiple dimensions and multiple hierarchies within a dimension, and choose a variety of options such as which attributes are available for display and how members are sorted. Measures can be designed as simple additive elements as well as employing complex, user-defined aggregation schemes. Key Performance Indicators (KPIs) can be added which provide visual cues for users on the state of a business entity. Cubes can contain perspectives which define a subset of data within a single cube to simplify viewing. The built-in metadata layer allows you to specify language translations at any level within a cube so that users can browse data in their native language.

The Data Mining engine extends business analysis to allow users to find patterns and make predictions. Utilizing any one of the several mining algorithms that ship with SQL Server, businesses can examine data trends over time, determine what factors influence buying decisions, or even reconfigure a shopping experience based on buying patterns to maximize the potential of a sale.

MORE INFO: SQL SERVER SERVICES
For a detailed discussion of each feature available within the SQL Server 2008 data platform, please refer to the book Microsoft SQL Server 2008 Step by Step (Microsoft Press, 2008), which provides overview chapters on every SQL Server 2008 feature.

SQL Server Editions

SQL Server 2008 is available in the following editions:

Enterprise  Designed for the largest organizations and those needing to leverage the full power of the SQL Server 2008 platform.

Standard  Designed for small and midsized organizations that do not need all the capabilities available in SQL Server 2008 Enterprise.

Workgroup  Suitable for small departmental projects with a limited set of features.

Express  A freely redistributable version of SQL Server that is designed to handle the needs of embedded applications as well as the basic data storage needs for server-based applications, such as Web applications with a small number of users.
Compact  Designed as an embedded database.

Developer  Designed for use by developers in creating SQL Server applications. SQL Server 2008 Developer has all the features and capabilities of SQL Server 2008 Enterprise, except that it is not allowed to be used in a production environment.

Evaluation  Designed to allow organizations to evaluate SQL Server 2008. SQL Server 2008 Evaluation has all the features and capabilities of SQL Server 2008 Enterprise, except that it is not allowed to be used in a production environment and it expires after 180 days.

NOTE: SQL SERVER EDITIONS
The Developer Edition of SQL Server is designed for developers to create new SQL Server applications. The Evaluation Edition of SQL Server is designed to allow organizations to evaluate the features available in SQL Server. Both the Developer and Evaluation editions contain the same functionality as the Enterprise Edition of SQL Server; the only exception being that the Developer and Evaluation editions are not licensed to run in a production environment.

For the purposes of this book, we will discuss only the editions that can be deployed into production environments: Express, Workgroup, Standard, and Enterprise. The main differences between the SQL Server editions are in the hardware and feature set that is supported. The tables below provide a basic overview of the differences between the editions in the various areas.

TABLE 1-2 Hardware Support

HARDWARE        STANDARD    WORKGROUP    EXPRESS    COMPACT
# of CPUs       4           4            1          1
Database size   Unlimited   Unlimited    4 GB       4 GB
RAM             Unlimited   Unlimited    1 GB       1 GB

TABLE 1-3 Database Engine Support

FEATURE                        STANDARD        WORKGROUP        EXPRESS             COMPACT
SQL Server Management Studio   Yes             Yes              Separate download   No
Full Text Search               Yes             Yes              Advanced services   No
                                                                only
Partitioning                   No              No               No                  No
Parallel Operations            No              No               No                  No
Multiple Instances             No              No               No                  No
Database Snapshots             No              No               No                  No
Scalable Shared Databases      No              No               No                  No
Indexed Views                  No              No               No                  No
Log Compression                No              No               No                  No
Clustering                     2 nodes         No               No                  No
Database Mirroring             Single-thread   No               No                  No
Online Operations              No              No               No                  No
Resource Governor              No              No               No                  No
Backup Compression             No              No               No                  No
Hot Add Memory/CPU             No              No               No                  No
Data Encryption                Limited         Limited          Limited             Password-based
                                                                                    only
Change Data Capture            No              No               No                  No
Data Compression               No              No               No                  No
Policy-Based Management        Yes             No               No                  No
Performance Data Collection    Yes             No               No                  No
CLR                            Yes             Yes              Yes                 No
XML                            Native          Native           Native              Stored as text
Spatial data                   Yes             Yes              Yes                 No
Stored procedures, triggers,   Yes             Yes              Yes                 No
and views
Merge Replication              Yes             Yes              Subscriber only     Subscriber only
Transactional Replication      Yes             Subscriber only  Subscriber only     No
TABLE 1-4 SSIS Support

FEATURE                  STANDARD   WORKGROUP   EXPRESS   COMPACT
Import/Export Wizard     Yes        Yes         N/A       N/A
Package Designer         Yes        Yes         N/A       N/A
Data Mining              No         No          N/A       N/A
Fuzzy grouping/lookup    No         No          N/A       N/A
Term extraction/lookup   No         No          N/A       N/A
OLAP processing          No         No          N/A       N/A

TABLE 1-5 SSRS Support

FEATURE                        STANDARD   WORKGROUP   EXPRESS             COMPACT
Microsoft Office Integration   Yes        Yes         Advanced services   N/A
                                                      only
Report Builder                 Yes        Yes         Advanced services   N/A
                                                      only
Scale-out reporting            No         No          No                  N/A
Data-driven subscriptions      No         No          No                  N/A

TABLE 1-6 SSAS—OLAP Support

FEATURE                      STANDARD   WORKGROUP   EXPRESS   COMPACT
Linked measures/dimensions   No         No          N/A       N/A
Perspectives                 No         No          N/A       N/A
Partitioned cubes            No         No          N/A       N/A

TABLE 1-7 SSAS—Data Mining Support

FEATURE                              STANDARD   WORKGROUP   EXPRESS   COMPACT
Time series                          No         No          N/A       N/A
Parallel processing and prediction   No         No          N/A       N/A
Advanced mining algorithms           No         No          N/A       N/A

EXAM TIP
For the exam, you need to understand the basic design goals for each edition of SQL Server. You also need to know the feature set, memory, and processor support differences between the editions.
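If you need to determine which edition an existing instance is running before comparing it against these tables, SERVERPROPERTY reports it directly. A minimal sketch; the commented values are examples only:

SELECT SERVERPROPERTY('Edition')       AS Edition,        -- e.g., Standard Edition (64-bit)
       SERVERPROPERTY('EngineEdition') AS EngineEdition;  -- 2 = Standard, 3 = Enterprise
                                                          -- (incl. Developer/Evaluation), 4 = Express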
Quick Check
1. Which editions support the entire feature set available within the SQL Server data platform? Of these editions, which editions are not licensed for production use?
2. Which editions of SQL Server are designed as storage engines for embedded applications with limited hardware and feature support?

Quick Check Answers
1. Enterprise, Developer, and Evaluation editions have the entire set of features available within the SQL Server 2008 data platform. Developer and Evaluation editions are not licensed for use in a production environment.
2. Express and Compact editions are designed as storage engines for embedded applications and support only a single CPU, up to 1 GB of RAM, and a maximum database size of 4 GB.

Lesson Summary

- SQL Server 2008 is available in Enterprise, Standard, Workgroup, Express, and Compact editions for use in a production environment.
- In addition to the core database engine technologies, SQL Server 2008 Enterprise supports Service Broker for asynchronous processing.

Lesson Review

The following questions are intended to reinforce key information presented in Lesson 2, "Selecting SQL Server Editions." The questions are also available on the companion CD if you prefer to review them in electronic form.

NOTE: ANSWERS
Answers to these questions and explanations of why each answer choice is right or wrong are located in the "Answers" section at the end of the book.

1. Margie's Travel is opening a new division to offer online travel bookings to their customers. Managers expect the traffic volume to increase rapidly, to the point where hundreds of users will be browsing offerings and booking travel at any given time. Management would also like to synchronize multiple copies of the database of travel bookings to support both online and face-to-face operations. Which editions of SQL Server 2008 would be appropriate for Margie's Travel to deploy for their new online presence? (Choose all that apply.)
   A. Express
   B. Standard
   C. Enterprise
   D. Compact
2. Margie's Travel decided to minimize the cost and deploy SQL Server 2008 Standard to support the new online division. After a successful launch, managers are having a hard time managing business operations and need to deploy advanced analytics. A new server running SQL Server will be installed. Which edition of SQL Server needs to be installed on the new server to support the necessary data analytics?
   A. SQL Server 2008 Standard
   B. SQL Server 2008 Express with Advanced Services
   C. SQL Server 2008 Workgroup
   D. SQL Server 2008 Enterprise
Lesson 3: Installing and Configuring SQL Server Instances

In this lesson, you learn how to create and configure service accounts that will be used to run the SQL Server services that you choose to install. You also learn about the authentication mode and collation settings that are specified when an instance is installed. Finally, you learn how to configure and manage SQL Server services following installation.

After this lesson, you will be able to:
- Create service accounts
- Install a SQL Server 2008 instance
- Understand collation sequences
- Understand authentication modes
- Install sample databases
- Configure a SQL Server instance

Estimated lesson time: 40 minutes

Service Accounts

All the core SQL Server components run as services. To configure each component properly, you need to create several service accounts prior to installation. You need dedicated service accounts for the following components:
- Database Engine
- SQL Server Agent

The service account that is utilized for each SQL Server service not only allows SQL Server to provide data and scheduling services to applications but also defines a security boundary. The SQL Server engine requires access to many resources on a computer, such as memory, processors, disk space, and networking. However, the SQL Server service is still running within the security framework provided by Windows. SQL Server is able to access only the Windows resources for which the service account has been granted permissions.

NOTE: SQL SERVER SECURITY
SQL Server security is discussed in more detail in Chapter 11, "Designing SQL Server Security."
NOTE: OPERATING SYSTEM VERSION
I am using Windows XP Professional SP2 for all exercises in this book. You will need to make appropriate adjustments for the Windows version that you are using. In addition, if your machine is a member of a domain, your service accounts should be domain accounts, not local accounts, when installing SQL Server 2008 in any operational environment.

Collation Sequences

Collation sequences control how SQL Server treats character data for storage, retrieval, sorting, and comparison operations. SQL Server 2008 allows you to specify a collation sequence to support any language currently used around the world.

Collation sequences can be specified at the instance, database, table, and column levels. The only collation sequence that is mandatory is defined at the instance level, which defaults to all other levels unless it is overridden specifically. A collation sequence defines the character set that is supported, including case sensitivity, accent sensitivity, and kana sensitivity. For example, if you use the collation sequence of SQL_Latin1_General_CP1_CI_AI, you get support for a Western European character set that is case-insensitive and accent-insensitive. SQL_Latin1_General_CP1_CI_AI treats e, E, è, é, ê, and ë as the same character for sorting and comparison operations, whereas a case-sensitive (CS), accent-sensitive (AS) French collation sequence treats each as a different character.

NOTE: COLLATION SEQUENCES
You will learn more about collation sequences in Chapter 2, "Database Configuration and Maintenance."

Authentication Modes

One of the instance configuration options you need to set during installation is the authentication mode that SQL Server uses to control the types of logins allowed. You can set the authentication mode for SQL Server to either:
- Windows Only (integrated security)
- Windows and SQL Server (mixed mode)

When SQL Server is configured with Windows-only authentication, you can use only Windows accounts to log in to the SQL Server instance. When SQL Server is configured in mixed mode, you can use either Windows accounts or SQL Server–created accounts to log in to the SQL Server instance.

NOTE: LOGINS
Logins are discussed in more detail in Chapter 11.
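Both settings described above can be observed from a query window. The following is a minimal sketch using two collations that ship with SQL Server; the string literals are arbitrary examples:

-- Accent-insensitive: 'resume' and 'résumé' compare as equal
SELECT CASE WHEN 'resume' = N'résumé' COLLATE SQL_Latin1_General_CP1_CI_AI
            THEN 'equal' ELSE 'different' END AS AccentInsensitiveResult;

-- Case- and accent-sensitive: the same values compare as different
SELECT CASE WHEN 'resume' = N'résumé' COLLATE Latin1_General_CS_AS
            THEN 'equal' ELSE 'different' END AS AccentSensitiveResult;

-- Authentication mode: 1 = Windows Only, 0 = mixed mode
SELECT SERVERPROPERTY('IsIntegratedSecurityOnly') AS WindowsAuthOnly;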
• 53. SQL Server Instances SQL Server instances define the container for all operations you perform within SQL Server. Each instance contains its own set of databases, security credentials, configuration settings, Windows services, and other SQL Server objects. SQL Server 2008 supports the installation of up to 50 instances of SQL Server on a single machine. You can install one instance as the default instance along with up to 49 additional named instances, or you can install 50 named instances with no default. When you connect to a default instance of SQL Server, you use the name of the machine to which the instance is installed. When connecting to a named instance, you use the combination of the machine name and instance name, such as <machinename>\<instancename>. The primary reasons for installing more than one instance of SQL Server on a single machine are: You need instances for quality assurance testing or development. You need to support multiple service pack or patch levels. You have different groups of administrators who are allowed to access only a subset of databases within your organization. You need to support multiple sets of SQL Server configuration options. NOTE MULTIPLE SQL SERVER INSTANCES Only SQL Server 2008 Enterprise supports the installation of multiple instances on a single machine. EXAM TIP You will need to know how collation sequences affect the way SQL Server stores and handles character data. SQL Server Configuration Manager Shown in Figure 1-1, SQL Server Configuration Manager is responsible for managing SQL Server services and protocols. The primary tasks that you will perform are Starting, stopping, pausing, and restarting a service Changing service accounts and service account passwords Managing the start-up mode of a service Configuring service start-up parameters Lesson 3: Installing and Configuring SQL Server Instances CHAPTER 1 19
• 54. FIGURE 1-1 List of services within SQL Server Configuration Manager After you have completed the initial installation and configuration of your SQL Server services, the primary action that you will perform within SQL Server Configuration Manager is to change service account passwords periodically. When changing service account passwords, you no longer have to restart the SQL Server instance for the new credential settings to take effect. CAUTION WINDOWS SERVICE CONTROL APPLET The Windows Service Control applet also has entries for SQL Server services and allows you to change service accounts and passwords. You should never change service accounts or service account passwords using the Windows Service Control applet. SQL Server Configuration Manager must be used instead, because it includes the code to regenerate the service master key that is critical to the operation of SQL Server services. Although you can start, stop, pause, and restart SQL Server services, SQL Server has extensive management features that should ensure that you rarely, if ever, need to shut down or restart a SQL Server service. SQL Server Configuration Manager also allows you to configure the communications protocols available to client connections. In addition to configuring protocol-specific arguments, you can control whether communications are required to be encrypted or whether an instance responds to an enumeration request, as shown in Figure 1-2. 20 CHAPTER 1 Installing and Configuring SQL Server 2008
• 55. FIGURE 1-2 The Protocol Properties dialog box NOTE ENUMERATION REQUESTS Applications can broadcast a special command, called an enumeration request, across a network to locate any servers running SQL Server that are on the network. Although being able to enumerate servers running SQL Server is valuable in development and testing environments where instances can appear, disappear, and be rebuilt on a relatively frequent basis, enumeration is not desirable in a production environment. By disabling enumeration responses, accomplished by setting the Hide Instance option to Yes, you prevent someone from using discovery techniques to locate servers running SQL Server for a possible attack. Quick Check 1. Which edition of SQL Server supports installing more than one instance of SQL Server on a machine? 2. What are the authentication modes that SQL Server can be configured with? Quick Check Answers 1. Only SQL Server Enterprise supports multiple instances on the same machine. 2. You can configure SQL Server to operate under either Windows Only or Windows And SQL Server authentication modes. Lesson 3: Installing and Configuring SQL Server Instances CHAPTER 1 21
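Because a machine can host a default instance alongside several named instances, it is often useful to confirm exactly which instance a connection is using; a brief T-SQL sketch:

-- @@SERVERNAME returns machinename for a default instance
-- and machinename\instancename for a named instance
SELECT @@SERVERNAME AS ServerName,
       SERVERPROPERTY('MachineName') AS MachineName,
       SERVERPROPERTY('InstanceName') AS InstanceName; -- NULL for the default instance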
• 56. PRACTICE Installing and Configuring a SQL Server Instance In the following practices, you create service accounts, install a default SQL Server instance, and install the AdventureWorks sample database that will be used throughout the remainder of this book. PRACTICE 1 Creating Service Accounts In this practice, you create a service account that will be used to run your SQL Server service. 1. Click Start, right-click My Computer, and select Manage. 2. Expand Local Users And Groups and select Users. 3. Right-click in the right-hand pane and select New User. 4. Specify SQL2008TK432DE in the User Name field, supply a strong password, clear the User Must Change Password At Next Logon check box, and select the Password Never Expires check box. 5. Repeat steps 3 and 4 to create another account named SQL2008TK432SQLAgent. PRACTICE 2 Install a SQL Server Instance In this practice, you install an instance of SQL Server. 1. Start the SQL Server installation routine. 2. If you have not already installed the prerequisites for SQL Server 2008, the setup routine installs the necessary components. 3. After the prerequisites have been installed, you see the main installation window, as shown in Figure 1-3. FIGURE 1-3 Main SQL Server installation screen 22 CHAPTER 1 Installing and Configuring SQL Server 2008
  • 57. 4. Click the New SQL Server stand-alone Installation link to start the SQL Server installation. 5. Installation executes a system configuration check. When the check completes successfully, your screen should look similar to Figure 1-4. FIGURE 1-4 The system configuration check upon successful completion 6. Click OK. Select the SQL Server edition that you want to install. Click Next. 7. Click Next. Select the features, as shown in Figure 1-5, and then click Next. FIGURE 1-5 The Feature Selection dialog box Lesson 3: Installing and Configuring SQL Server Instances CHAPTER 1 23
• 58. 8. Review the disk space requirements, and then click Next. 9. On the Instance Configuration page, verify that Default Instance is selected and click Next. 10. Enter the names of the service accounts that you created in Practice 1. When complete, the page should look similar to Figure 1-6. FIGURE 1-6 The Specifying Service Accounts tab of the Server Configuration dialog box 11. Click the Collation tab to review the collation sequence set for the Database Engine. Make any adjustments that you think are necessary according to the language support that you require. Click Next. 12. On the Database Engine Configuration page, select Mixed Mode (SQL Server Authentication And Windows Authentication) and set a password. Click Add Current User to add the Windows account that you are running the installation under as an administrator within SQL Server. Click Add to add any other Windows accounts that you want as administrators within SQL Server. 13. Click the Data Directories tab to review the settings. MORE INFO DATABASES AND DIRECTORIES You will learn more about creating databases, database files, and directories in Chapter 2. 14. Click the FILESTREAM tab, and then select the Enable FILESTREAM for Transact-SQL Access check box and the Enable FILESTREAM For File I/O Streaming Access check box. Leave the Windows share name set to the default of MSSQLSERVER. Click Next. MORE INFO FILESTREAM DATA TYPE You will learn more about the FILESTREAM data type and creating tables in Chapter 3, “Tables.” 24 CHAPTER 1 Installing and Configuring SQL Server 2008
• 59. 15. Select the options of your choice on the Error And Usage Reporting page. Click Next. 16. Review the information on the Ready To Install page. When you are satisfied, click Install. 17. SQL Server starts the installation routines for the various options that you have specified and displays progress reports. PRACTICE 3 Install the AdventureWorks Sample Database In this practice, you install the AdventureWorks sample database, which will be used throughout this book to demonstrate SQL Server 2008 capabilities. NOTE SAMPLE DATABASES SQL Server 2008 does not ship with any sample databases. You need to download the AdventureWorks2008 and AdventureWorksDW2008 databases from the CodePlex Web site (http://www.codeplex.com). 1. Open Internet Explorer and go to http://www.codeplex.com/MSFTDBProdSamples. Click the Releases tab. 2. Scroll to the bottom of the page and download the AdventureWorks2008*.msi and AdventureWorksDW2008*.msi files to your local machine. IMPORTANT INSTALLATION ROUTINE The CodePlex site contains installation routines for 32-bit, x64, and IA64 platforms. Download the msi file that is appropriate to your operating system. 3. Run the installation routines for both downloads and use the default extract location. 4. Click Start, and then select All Programs, Microsoft SQL Server 2008, and SQL Server Management Studio. 5. If not already entered, specify the machine name that you installed your SQL Server instance to in the previous exercise. Click Connect. Your screen should look like Figure 1-7. 6. Click New Query. Enter the following code and then click Execute:
EXEC sp_configure 'filestream_access_level', 2;
GO
RECONFIGURE
GO
RESTORE DATABASE AdventureWorks
FROM DISK='C:\Program Files\Microsoft SQL Server\100\Tools\Samples\AdventureWorks2008.bak'
WITH RECOVERY;
GO
RESTORE DATABASE AdventureWorksDW
FROM DISK='C:\Program Files\Microsoft SQL Server\100\Tools\Samples\AdventureWorksDW2008.bak'
WITH RECOVERY;
GO
Lesson 3: Installing and Configuring SQL Server Instances CHAPTER 1 25
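When the restores complete, a quick query against sys.databases confirms that both sample databases are online before you continue; for example:

SELECT name, state_desc, recovery_model_desc
FROM sys.databases
WHERE name IN (N'AdventureWorks', N'AdventureWorksDW');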
  • 60. FIGURE 1-7 The SQL Server Management Studio screen 7. When you expand the Database node, your screen should look similar to Figure 1-8. FIGURE 1-8 Installing the AdventureWorks sample database 26 CHAPTER 1 Installing and Configuring SQL Server 2008
• 61. Lesson Summary The SQL Server engine and SQL Server Agent run as services and need to have service accounts created with the appropriate Windows permissions to be able to access needed Windows resources. SQL Server can be configured to run under either the Windows Only or the Windows And SQL Server authentication mode. The collation sequence controls how SQL Server stores and manages character-based data. SQL Server supports up to 50 instances installed on a single machine; however, only the Enterprise Edition has multi-instance support. Lesson Review The following questions are intended to reinforce key information presented in Lesson 3, “Installing and Configuring SQL Server Instances.” The questions are also available on the companion CD if you prefer to review them in electronic form. NOTE ANSWERS Answers to these questions and explanations of why each answer choice is right or wrong are located in the “Answers” section at the end of the book. 1. Wide World Importers will be using the new FILESTREAM data type to store scanned images of shipping manifests. Which command must be executed against the SQL Server instance before FILESTREAM data can be stored? A. ALTER DATABASE B. DBCC C. sp_configure D. sp_filestream_configure 2. Contoso has implemented a new policy that requires the passwords on all service accounts to be changed every 30 days. Which tool should the Contoso database administrators use to change the service account passwords so that SQL Server services comply with the new policy? A. Windows Service Control applet B. SQL Server Management Studio C. SQL Server Configuration Manager D. SQL Server Surface Area Configuration Manager Lesson 3: Installing and Configuring SQL Server Instances CHAPTER 1 27
• 62. Lesson 4: Configuring Database Mail Database Mail provides a notification capability to SQL Server instances. In this lesson, you learn about the features of Database Mail and how to configure Database Mail within a SQL Server 2008 instance. After this lesson, you will be able to: Configure Database Mail Send messages using Database Mail Estimated lesson time: 30 minutes Database Mail Database Mail enables a computer running SQL Server to send outbound mail messages. Although messages can contain the results of queries, Database Mail is primarily used to send alert messages to administrators to notify them of performance conditions or changes that have been made to objects. Database Mail was added in SQL Server 2005 as a replacement for SQL Mail. The reasons for the replacement were very simple: Remove the dependency on Microsoft Mail Application Programming Interface (MAPI) Simplify configuration and management Provide a fast, reliable way to send mail messages Database Mail uses the Simple Mail Transfer Protocol (SMTP) relay service that is available on all Windows machines to transmit mail messages. When a mail send is initiated, the message along with all of the message properties is logged into a table in the Msdb database. On a periodic basis, a background task that is managed by SQL Server Agent executes. When the mail send process executes, all messages within the send queue that have not yet been forwarded are picked up and sent using the appropriate mail profile. Profiles form the core element within Database Mail. A given profile can contain multiple e-mail accounts to provide a failover capability in the event a specific mail server is unavailable. Mail accounts define all the properties associated with a specific e-mail account, such as e-mail address, reply-to e-mail address, mail server name, port number, and authentication credentials. You can secure access to a mail profile to restrict the user’s ability to send mail through a given profile. When a profile is created, you can configure the profile to be either a public or private profile. A public profile can be accessed by any user with the ability to send mail. A private profile can be accessed only by those users who have been granted access to the mail profile explicitly. 28 CHAPTER 1 Installing and Configuring SQL Server 2008
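Everything the Database Mail Configuration Wizard does in the practice later in this lesson can also be scripted through stored procedures in the msdb database. The following is a minimal sketch; the profile name, account name, addresses, and mail server shown are hypothetical placeholders:

-- Create a mail account (server and addresses are placeholders)
EXEC msdb.dbo.sysmail_add_account_sp
    @account_name = 'TK432Account',
    @email_address = 'dba@example.com',
    @replyto_address = 'dba@example.com',
    @display_name = 'SQL Server Notifications',
    @mailserver_name = 'smtp.example.com';
-- Create a profile and add the account as the first entry in the failover sequence
EXEC msdb.dbo.sysmail_add_profile_sp @profile_name = 'TK432Profile';
EXEC msdb.dbo.sysmail_add_profileaccount_sp
    @profile_name = 'TK432Profile',
    @account_name = 'TK432Account',
    @sequence_number = 1;
-- Make the profile public and designate it as the default
EXEC msdb.dbo.sysmail_add_principalprofile_sp
    @profile_name = 'TK432Profile',
    @principal_name = 'public',
    @is_default = 1;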
• 63. In addition to configuring a mail profile as either public or private, you can designate a mail profile to be the default. When sending mail, if a mail profile is not specified, SQL Server uses the mail profile designated as the default to send the message. TIP SENDING MAIL Database Mail utilizes the services of SQL Server Agent to send messages as a background process. If SQL Server Agent is not running, messages will accumulate in a queue within the Msdb database. Quick Check 1. What are the two basic components of Database Mail? 2. What are the two types of mail profiles that can be created? Quick Check Answers 1. Database Mail uses mail profiles, which can contain one or more mail accounts. 2. Mail profiles can be configured as either public or private. PRACTICE Configuring Database Mail In this practice, you configure Database Mail and send a test mail message. 1. Open SQL Server Management Studio, connect to your SQL Server instance, and click New Query to open a new query window and execute the following code to enable the Database Mail feature:
EXEC sp_configure 'Database Mail XPs',1
GO
RECONFIGURE WITH OVERRIDE
GO
2. Within the Object Explorer, open the Management node, right-click Database Mail, and select Configure Database Mail. 3. Click Next on the Welcome screen. 4. Select Set Up Database Mail By Performing The Following Tasks and click Next. 5. Specify a name for your profile and click the Add button to specify settings for a mail account. 6. Fill in the Account Name, E-mail Address, Display Name, Reply E-mail, and Server Name fields. 7. Select the appropriate SMTP Authentication mode for your organization and, if using Basic authentication, specify the username and password. Your settings should look similar to Figure 1-9. Lesson 4: Configuring Database Mail CHAPTER 1 29
• 64. FIGURE 1-9 Configuring Database Mail NOTE DATABASE MAIL SETTINGS I am using my Internet e-mail account and have purposely left the Server Name, User Name, and Password out of Figure 1-9. You need to specify at least the Server Name if you are using an internal mail server. 8. Click OK and then click Next. 9. Select the check box in the Public column next to the profile you just created and set this profile to Yes in the Default Profile column. Click Next. 10. Review the settings on the Configure System Parameters page. Click Next. 11. Click OK and then click Next. Click Finish. 12. The final page should show that all four configuration steps completed successfully. Click Close. 13. Right-click Database Mail and select Send Test E-mail. 14. Select the Database Mail profile you just created, enter an e-mail address in the To line, and click Send Test E-Mail, as shown in Figure 1-10. 15. Go to your e-mail client and verify that you have received the test mail message. TIP SENDING MAIL Database Mail utilizes the services of SQL Server Agent to send messages as a background process. If SQL Server Agent is not running, messages accumulate in a queue within the Msdb database. 30 CHAPTER 1 Installing and Configuring SQL Server 2008
• 65. FIGURE 1-10 Sending a test mail message Lesson Summary Database Mail is used to send mail messages from a SQL Server instance. To send mail messages, SQL Server Agent must be running. A mail profile can contain one or more mail accounts. You can create either public or private mail profiles. The mail profile designated as the default profile will be used to send mail messages if a profile is not specified. Lesson Review The following questions are intended to reinforce key information presented in Lesson 4, “Configuring Database Mail.” The questions are also available on the companion CD if you prefer to review them in electronic form. NOTE ANSWERS Answers to these questions and explanations of why each answer choice is right or wrong are located in the “Answers” section at the end of the book. 1. As part of the implementation of the new Web-based booking system at Margie’s Travel, customers should receive notices when a travel booking has been successfully saved. What technologies or features can the developers at Margie’s Travel use to implement notifications? (Choose all that apply.) A. Notification Services B. Database Mail C. Microsoft Visual Studio.NET code libraries D. Activity Monitor Lesson 4: Configuring Database Mail CHAPTER 1 31
  • 66. 2. The developers at Margie’s Travel have decided to utilize Database Mail to send messages to their customers. The ability to send mail messages through a given profile needs to be restricted, but it must not require an approved user to specify a mail profile when sending messages. What settings need to be configured to meet these requirements? (Choose all that apply.) A. Set the mail profile to public. B. Set the mail profile to private. C. Set the mail profile to private and grant access to approved users. D. Designate the mail profile as the default. 32 CHAPTER 1 Installing and Configuring SQL Server 2008
  • 67. Chapter Review To practice and reinforce the skills you learned in this chapter further, you can: Review the chapter summary. Review the list of key terms introduced in this chapter. Complete the case scenario. This scenario sets up a real-world situation involving the topics of this chapter and asks you to create solutions. Complete the suggested practices. Take a practice test. Chapter Summary SQL Server 2008 is available in Enterprise, Standard, Workgroup, Express, and Compact editions. SQL Server runs as a service within Windows and requires a service account to be assigned during installation. You can configure an instance for Windows Only or Windows And SQL Server authentication modes. SQL Server Configuration Manager is used to manage any SQL Server services. Database Mail can be enabled and configured on a SQL Server instance to allow users and applications to send mail messages. Key Terms Do you know what these key terms mean? You can check your answers by looking up the terms in the glossary at the end of the book. Collation Sequence Database Mail Data Mining Mail profile Case Scenario In the following case scenario, you apply what you’ve learned in this chapter. You can find answers to these questions in the “Answers” section at the end of this book. Case Scenario: Defining a SQL Server Infrastructure Wide World Importers is implementing a new set of applications to manage several lines of business. Within the corporate data center, they need the ability to store large volumes of data that can be accessed from anywhere in the world. Chapter Review CHAPTER 1 33
• 68. Several business managers need access to operational reports that cover the current workload of their employees, along with new and pending customer requests. The same business managers also need to be able to access large volumes of historical data to spot trends and optimize their staffing and inventory levels. A large sales force makes customer calls all over the world and needs access to data on the customers that a sales rep is servicing, along with potential prospects. The data for the sales force needs to be available even when the salespeople are not connected to the Internet or the corporate network. Periodically, sales reps will connect to the corporate network and synchronize their data with the corporate databases. A variety of Windows applications have been created with Visual Studio.NET and all data access is performed using stored procedures. The same set of applications is deployed for users connecting directly to the corporate database server as well as for sales reps connecting to their own local database servers. Answer the following questions: 1. What edition of SQL Server 2008 should be installed on the laptops of the sales force to minimize the cost? 2. What edition of SQL Server 2008 should be installed within the corporate data center? 3. What SQL Server services need to be installed to meet the needs of the business managers? 4. What versions of Windows need to be installed on the corporate database server? Suggested Practices To help you master the exam objectives presented in this chapter, complete the following tasks. Installing SQL Server Practice: Install SQL Server Instances Install two more SQL Server 2008 database engine instances. Managing SQL Server Services Practice: Manage a SQL Server Instance Change the service account of one of your installed instances. Change the service account password for one of your installed instances. 34 CHAPTER 1 Installing and Configuring SQL Server 2008
  • 69. Take a Practice Test The practice tests on this book’s companion CD offer many options. For example, you can test yourself on just one exam objective, or you can test yourself on all the 70-432 certification exam content. You can set up the test so that it closely simulates the experience of taking a certification exam, or you can set it up in study mode so that you can look at the correct answers and explanations after you answer each question. MORE INFO PRACTICE TESTS For details about all the practice test options available, see the section “How to Use the Practice Tests,” in the Introduction to this book. Take a Practice Test CHAPTER 1 35
• 71. CHAPTER 2 Database Configuration and Maintenance The configuration choices that you make for a database affect its performance, scalability, and management. In this chapter, you learn how to design the file and filegroup storage structures underneath a database. You learn how to configure database options and recovery models. You will also learn how to check and manage the integrity of a database. Exam objectives in this chapter: Back up databases. Manage and configure databases. Maintain database integrity. Manage collations. Lessons in this chapter: Lesson 1: Configuring Files and Filegroups 39 Lesson 2: Configuring Database Options 46 Lesson 3: Maintaining Database Integrity 54 Before You Begin To complete the lessons in this chapter, you must have: Microsoft SQL Server 2008 installed The AdventureWorks database installed within the instance CHAPTER 2 37
  • 72. REAL WORLD Michael Hotek I have worked on millions of databases across thousands of customers during the portion of my career where I have worked with SQL Server. In all that time, I have come up with many best practices while at the same time creating many arguments among the “purists.” All my recommendations and approaches to architecting and managing SQL Servers come from a pragmatic, real-world perspective that, although rooted in a deep knowledge of SQL Server, hardware, networking, and many other components, rarely matches up with the perfect world theory. Designing the disk structures that underlie a database is one of the cases where I deviate from a lot of the theoretical processes and computations that you will find published. Although you can find entire white papers and even sections of training classes devoted to teaching you how to calculate disk transfer and random vs. sequential writes, I have never encountered an environment where I had the time or luxury to run those calculations prior to implementing a system. It is really nice that there are formulas to calculate the disk transfer of a given disk configuration, and you can also apply statistical methods to further refine those calculations based on the random vs. sequential I/O of a system. However, all the time spent doing the calculations is worthless unless you also know the required read and write capacity of the databases you are going to place on that disk subsystem. Additionally, unless you are buying a new storage system, dedicated to a specific application, you will have a very difficult time architecting the disk storage underneath a database according to all the theories. The challenge in achieving optimal performance is to separate the transaction logs from data files so that you can isolate disk I/O. The transaction log is the key to high-performance write operations, because the maximum transaction rate is bound by the write capacity to the transaction log file. After taking care of the transaction log, you need to add enough files and filegroups to achieve enough disk throughput to handle the read/write activity. However, the most important component of performance is to write applications with efficient code that accesses only the minimum amount of data necessary to accomplish the business task. 38 CHAPTER 2 Database Configuration and Maintenance
• 73. Lesson 1: Configuring Files and Filegroups Data within a database is stored on disk in one or more data files. Prior to being written to the data file(s), every transaction is written to a transaction log file. In this lesson, you learn how to design the data files underneath a database, group the files into filegroups to link physical storage into a database, and manage the transaction log. You also learn how to configure the tempdb database for optimal performance. After this lesson, you will be able to: Create filegroups Add files to filegroups Work with FILESTREAM data Configure the transaction log Estimated lesson time: 20 minutes Files and Filegroups Although storing all your data in memory would provide extremely fast access, you would lose everything after the machine was shut down. To protect your data, it has to be persisted to disk. Underneath each database is one or more files for persisting your data. SQL Server uses two different types of files: data and transaction log files. Data files are responsible for the long-term storage of all the data within a database. Transaction log files, discussed in more detail later in this lesson, are responsible for storing all the transactions that are executed against a database. Instead of defining the storage of objects directly to a data file, SQL Server provides an abstraction layer for more flexibility called a filegroup. Filegroups are a logical structure, defined within a database, that maps the database and the objects it contains to the data files on disk. Filegroups can contain more than one data file. All objects that contain data (tables, indexes, and indexed views) have an ON clause that you can use when creating the object to specify the filegroup where SQL Server stores the object. As data is written to the objects, SQL Server uses the filegroup definition to determine on which file(s) it should store the data. At the time that a file is added to a database, you specify the initial size of the file. You can also specify a maximum size for the file, as well as whether SQL Server automatically increases the size of the file when it is full of data. If you specify automatic growth, you can specify whether the file size increases based on a percentage of the current size or whether the file size increases at a fixed amount that you define. Lesson 1: Configuring Files and Filegroups CHAPTER 2 39
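As a simple illustration of the ON clause, the sketch below creates a table on a user-defined filegroup; the table and filegroup names are hypothetical and assume the filegroup already exists in the database:

-- Store the table's data on the FG1 filegroup instead of the default filegroup
CREATE TABLE dbo.OrderHistory
    (OrderID   int      NOT NULL,
     OrderDate datetime NOT NULL)
ON FG1;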
• 74. Unless a filegroup has only a single file, you do not know in which file a specific row of data is stored. When writing to files, SQL Server uses a proportional fill algorithm. The proportional fill algorithm is designed to ensure that all files within a filegroup reach the maximum defined capacity at the same time. For example, if you had a data file that was 10 gigabytes (GB) and a data file that was 1 GB, SQL Server writes ten rows to the 10 GB file for every one row that is written to the 1 GB file. The proportional fill algorithm is designed to allow a resize operation to occur at a filegroup level. In other words, all files within a filegroup expand at the same time. File Extensions SQL Server uses three file extensions: .mdf, .ndf, and .ldf. Unfortunately, many people have placed a lot of emphasis and meaning on these three extensions, where no meaning was ever intended. Just like Microsoft Office Word documents have a .doc or .docx extension, and Microsoft Office Excel files have an .xls or .xlsx extension, the extension is nothing more than a naming convention. I could just as easily create a Word document with an extension of .bob, or even no extension, without changing the fact that it is still a Word document or preventing the ability of Word to open and manipulate the file. A file with an .mdf extension is usually the first data file that is created within a database, generally is associated with the primary filegroup, and usually is considered the primary data file, which contains all the system objects necessary to a database. The .ndf extension is generally used for all other data files underneath a database, regardless of the filegroup to which the file is associated. The .ldf extension generally is used for transaction logs. The file extensions that you see for SQL Server are nothing more than naming conventions. SQL Server does not care what the file extensions are or even if the files have extensions. If you really wanted to, you could use an .ldf extension for the primary data file, just as you could use an .mdf extension for a transaction log file. Although the use of file extensions in this way does not affect SQL Server, it generally could cause confusion among the other database administrators (DBAs) in your organization. To avoid this confusion, it is recommended that you use the .mdf, .ndf, and .ldf naming conventions commonly used across the SQL Server industry, but do not forget that this is just a naming convention and has absolutely no effect on SQL Server itself. All data manipulation within SQL Server occurs in memory within a set of buffers. If you are adding new data to a database, the new data is first written to a memory buffer, then written to the transaction log, and finally persisted to a data file via a background process called check pointing. When you modify or delete an existing row, if the row does not already exist 40 CHAPTER 2 Database Configuration and Maintenance
• 75. in memory, SQL Server first reads the data off disk before making the modification. Similarly, if you are reading data that has not yet been loaded into a memory buffer, SQL Server must read it out of the data files on disk. If you could always ensure that the machine hosting your databases had enough memory to hold all the data within your databases, SQL Server could simply read all the data off disk into memory buffers upon startup to improve performance. However, databases are almost always much larger than the memory capacity on any machine, so SQL Server retrieves data from disk only on an as-needed basis. If SQL Server does not have enough room in memory for the data being read in, the least recently used buffer pools are emptied to make room for newly requested data. Because accessing a disk drive is much slower than accessing memory, the data file design underneath a database can have an impact on performance. The first layer of design is within the disk subsystem. As the number of disk drives within a volume increases, the read and write throughput for SQL Server increases. However, there is an upper limit on the disk input/output (I/O), which is based upon the capacity of the redundant array of independent disks (RAID) controller, host bus adapter (HBA), and disk bus. So you cannot fix a disk I/O bottleneck by continually adding more disk drives. Although entire 200+ page white papers have been written on random vs. sequential writes, transfer speeds, rotational speeds, calculations of raw disk read/write speeds, and other topics, the process of designing the disk subsystem is reduced to ensuring that you have enough disks along with appropriately sized controllers and disk caches to deliver the read/write throughput required by your database. If it were simply a matter of the number of disks, there would be far fewer disk I/O bottlenecks in systems. But there is a second layer of data file design: determining how many data files you need and the location of each data file. SQL Server creates a thread for each file underneath a database. As you increase the number of files underneath a database, SQL Server creates more threads that can be used to read and write data. However, you cannot just create a database with thousands of files to increase its number of threads. This is because each thread consumes memory, taking away space for data to be cached, and even if you could write to all the threads at the same time, you would then saturate the physical disks behind the data files. In addition, managing thousands of data files underneath a database is extremely cumbersome, and if a large percentage of the files need to expand at the same time, you could create enough activity to halt the flow of data within the database. Due to these competing factors, and the simple fact that in the real world few DBAs have the time to spend running complex byte transfer rate calculations or even to design the disk layer based on a precise knowledge of the data throughput required, designing the data layer is an iterative process. Designing the data layer of a database begins with the database creation. When you create a database, it should have three files and two filegroups. You should have a file with an .mdf extension within a filegroup named PRIMARY, a file with an .ndf extension in a filegroup with any name that you choose, and the transaction log with an .ldf extension. Lesson 1: Configuring Files and Filegroups CHAPTER 2 41
• 76. NOTE FILE EXTENSIONS As stated in the sidebar “File Extensions,” earlier in this chapter, file extensions are nothing more than naming conventions. They do not convey any special capabilities. Besides being the logical definition for one or more files that defines the storage boundary for an object, filegroups have a property called DEFAULT. The purpose of the DEFAULT property is to define the filegroup where SQL Server places objects if you do not specify the ON clause during object creation. When the database is created, the primary filegroup is marked as the default filegroup. After you create the database, you should mark the second filegroup as the default filegroup. By changing the default filegroup, you ensure that any objects you create are not accidentally placed on the primary filegroup and that only the system objects for the database reside on the primary filegroup. You change the default filegroup by using the following command:
ALTER DATABASE <database name> MODIFY FILEGROUP <filegroup name> DEFAULT
The main reason not to place any of your objects on the primary filegroup is to provide as much isolation in the I/O as possible. The data in the system objects does not change as frequently as data in your objects. By minimizing the write activity to the primary data file, you reduce the possibility of introducing corruption due to hardware failures. In addition, because the state of the primary filegroup also determines the state of the database, you can increase the availability of the database by minimizing the changes made to the primary filegroup. Following the initial creation of the database, you add filegroups as needed to separate the storage of objects within the database. You also add files to filegroups to increase the disk I/O available to the objects stored on the filegroup, thereby reducing disk bottlenecks. Transaction Logs When SQL Server acknowledges that a transaction has been committed, SQL Server must ensure that the change is hardened to persistent storage. Although all writes occur through memory buffers, persistence is guaranteed by requiring that all changes are written to the transaction log prior to a commit being issued. In addition, the writes to the transaction log must occur directly to disk. Because every change made to a database must be written directly to disk, the disk storage architecture underneath your transaction log is the most important decision affecting the maximum transaction throughput that you can achieve. SQL Server writes sequentially to the transaction log but does not read from the log except during a restart recovery. Because SQL Server randomly reads and writes to the data files underneath a database, by isolating the transaction log to a dedicated set of disks you ensure that the disk heads do not have to move all over the disk and move in a mostly linear manner. 42 CHAPTER 2 Database Configuration and Maintenance
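The filegroup and file additions just described are also performed with ALTER DATABASE; a minimal sketch, in which the database name, file name, and path are placeholders:

ALTER DATABASE TK432 ADD FILEGROUP FG2;
GO
ALTER DATABASE TK432
ADD FILE ( NAME = N'TK432_Data3',
           FILENAME = N'e:\data\TK432_3.ndf',
           SIZE = 8MB, FILEGROWTH = 16MB )
TO FILEGROUP FG2;
GO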
• 77. EXAM TIP The maximum transaction throughput for any database is bound by the amount of data per second that SQL Server can write to the transaction log. Benchmarks Benchmark disclosures are the best source of information when designing the disk storage for optimal performance. Many organizations and the press place great emphasis on various benchmarks. However, a careful study reveals that, by itself, SQL Server doesn’t have as large an impact on the overall numbers as you are led to believe. The transaction processing engine within SQL Server is extremely efficient and has a fixed contribution to transaction throughput, but the real key to maximizing the transaction rate is in the disk storage. Given the same disk configuration, a 7,200 RPM drive delivers about 50 percent of the SQL Server transaction rate of a 15,000 RPM drive. Having 100 disks underneath a transaction log generally doubles the transaction rate of having only 50 disks. In addition, one of the tricks used in benchmarks is to partition a disk such that all the SQL Server data is written to the outside half or less of the disk platter, because based on physics, as the read/write head of a disk moves toward the edge of a circular object, the velocity increases, thereby spinning a larger segment of the disk platter underneath the drive head per unit of time. FILESTREAM data Although the volume of data within organizations has been exploding, leading the way in this data explosion is unstructured data. To tackle the problem of storing, managing, and combining large volumes of unstructured data with the structured data in your databases, SQL Server 2008 introduced FILESTREAM. The FILESTREAM feature allows you to associate files with a database. The files are stored in a folder on the operating system, but are linked directly into a database where the files can be backed up, restored, full-text-indexed, and combined with other structured data. Although the details of FILESTREAM are covered in more detail in Chapter 3, “Tables,” and Chapter 5, “Full Text Indexing,” to store FILESTREAM data within a database, you need to specify where the data will be stored. You define the location for FILESTREAM data in a database by designating a filegroup within the database to be used for storage with the CONTAINS FILESTREAM property. The FILENAME property defined for a FILESTREAM filegroup specifies the path to a folder. The initial part of the folder path definition must exist; however, the last folder in the path defined cannot exist and is created automatically. After the FILESTREAM folder has been created, a filestream.hdr file is created in the folder, which is a system file used to manage the files subsequently written to the folder. Lesson 1: Configuring Files and Filegroups CHAPTER 2 43
• 78. tempdb Database Because the tempdb database is much more heavily used than in previous versions, special care needs to be taken in how you design the storage underneath tempdb. In addition to temporary objects, SQL Server uses tempdb for worktables used in grouping/sorting operations, worktables to support cursors, the version store supporting snapshot isolation level, and overflow for table variables. You can also cause index build operations to use space in tempdb. Due to the potential for heavy write activity, you should move tempdb to a set of disks separated from your databases and any backup files. To spread out the disk I/O, you might consider adding additional files to tempdb. NOTE MULTIPLE tempdb FILES A common practice for tempdb is to create one file per processor. The one file per processor is with respect to what SQL Server would consider a processor and not the physical processor, which could have multiple cores as well as hyperthreading. Quick Check 1. What are the types of files that you create for databases and what are the commonly used file extensions? 2. What is the purpose of the transaction log? Quick Check Answers 1. You can create data and log files for a database. Data files commonly have either an .mdf or .ndf extension, whereas log files have an .ldf extension. 2. The transaction log records every change that occurs within a database to persist all transactions to disk. PRACTICE Creating Databases In this practice, you create a database with multiple files that is enabled for FILESTREAM storage. 1. Execute the following code to create a database:
CREATE DATABASE TK432 ON PRIMARY
( NAME = N'TK432_Data', FILENAME = N'c:\test\TK432.mdf' , SIZE = 8MB , MAXSIZE = UNLIMITED, FILEGROWTH = 16MB ),
FILEGROUP FG1
( NAME = N'TK432_Data2', FILENAME = N'c:\test\TK432.ndf' , SIZE = 8MB , MAXSIZE = UNLIMITED, FILEGROWTH = 16MB ),
44 CHAPTER 2 Database Configuration and Maintenance
• 79.
FILEGROUP Documents CONTAINS FILESTREAM DEFAULT
( NAME = N'Documents', FILENAME = N'c:\test\TK432Documents' )
LOG ON
( NAME = N'TK432_Log', FILENAME = N'c:\test\TK432.ldf' , SIZE = 8MB , MAXSIZE = 2048GB , FILEGROWTH = 16MB )
GO
2. Execute the following code to change the default filegroup:
ALTER DATABASE TK432 MODIFY FILEGROUP FG1 DEFAULT
GO
Lesson Summary You can define one or more data and log files for the physical storage of a database. Data files are associated with a filegroup within a database. Filegroups provide the logical storage container for objects within a database. Files can be stored using the new FILESTREAM capabilities. Lesson Review The following question is intended to reinforce key information presented in Lesson 1, “Configuring Files and Filegroups.” The question is also available on the companion CD if you prefer to review it in electronic form. NOTE ANSWERS Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the “Answers” section at the end of the book. 1. You have a reference database named OrderHistory, which should not allow any data to be modified. How can you ensure, with the least amount of effort, that users can only read data from the database? A. Add all database users to the db_datareader role. B. Create views for all the tables and grant select permission only on the views to database users. C. Set the database to READ_ONLY. D. Grant select permission on the database to all users and revoke insert, update, and delete permissions from all users on the database. Lesson 1: Configuring Files and Filegroups CHAPTER 2 45
• 80. Lesson 2: Configuring Database Options Every database has a set of options that control its behavior, from the recovery model that determines which backups you can take to settings that govern automatic actions, change tracking, access, and parameterization. In this lesson, you learn how to configure these database options and how to manage collation sequences. After this lesson, you will be able to: Set the database recovery model Configure database options Manage collation sequences Check and maintain database consistency Estimated lesson time: 20 minutes Database Options A database has numerous options that control a variety of behaviors. These options are broken down into several categories, including the following: Recovery Auto options Change tracking Access Parameterization Recovery Options The recovery options determine the behavior of the transaction log and how damaged pages are handled. Recovery Models Every database within a SQL Server instance has a property setting called the recovery model. The recovery model determines the types of backups you can perform against a database. The recovery models available in SQL Server 2008 are: Full Bulk-logged Simple 46 CHAPTER 2 Database Configuration and Maintenance
• 81. THE FULL RECOVERY MODEL When a database is in the Full recovery model, all changes made, using both data manipulation language (DML) and data definition language (DDL), are logged to the transaction log. Because all changes are recorded in the transaction log, it is possible to recover a database in the Full recovery model to a given point in time so that data loss can be minimized or eliminated if you should need to recover from a disaster. Changes are retained in the transaction log indefinitely and are removed only by executing a transaction log backup. BEST PRACTICES RECOVERY MODELS Every production database that accepts transactions should be set to the Full recovery model. By placing the database in the Full recovery model, you can maximize the restore options that are possible. THE BULK-LOGGED RECOVERY MODEL Certain operations are designed to manipulate large amounts of data. However, the overhead of logging to the transaction log can have a detrimental impact on performance. The Bulk-logged recovery model allows certain operations to be executed with minimal logging. When a minimally logged operation is performed, SQL Server does not log every row changed but instead logs only the extents, thereby reducing the overhead and improving performance. The operations that are performed in a minimally logged manner with the database set in the Bulk-logged recovery model are: BCP BULK INSERT SELECT...INTO CREATE INDEX ALTER INDEX...REBUILD Because the Bulk-logged recovery model does not log every change to the transaction log, you cannot recover a database to a point in time within any interval during which a minimally logged operation executed. THE SIMPLE RECOVERY MODEL The third recovery model is Simple. A database in the Simple recovery model logs operations to the transaction log exactly as the Full recovery model does. However, each time the database checkpoint process executes, the committed portion of the transaction log is discarded. A database in the Simple recovery model cannot be recovered to a point in time because it is not possible to issue a transaction log backup for a database in the Simple recovery model. Because the recovery model is a property of a database, you set the recovery model by using the ALTER DATABASE command as follows:
ALTER DATABASE database_name SET RECOVERY { FULL | BULK_LOGGED | SIMPLE }
Lesson 2: Configuring Database Options CHAPTER 2 47
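A common pattern, shown in the hedged sketch below, is to switch to the Bulk-logged model only for the duration of a large bulk operation and then return to Full; the database name and the operation in the comment are illustrative:

ALTER DATABASE AdventureWorks SET RECOVERY BULK_LOGGED;
GO
-- Perform the minimally logged operation here, such as a BULK INSERT or index rebuild
ALTER DATABASE AdventureWorks SET RECOVERY FULL;
GO
-- Back up the transaction log afterward to re-establish point-in-time recovery going forward
BACKUP LOG AdventureWorks TO DISK = N'c:\test\AdventureWorks_log.trn';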
• 82. The backup types available for each recovery model are shown in Table 2-1.
TABLE 2-1 Backup Types Available for Each Recovery Model

RECOVERY MODEL   FULL   DIFFERENTIAL   TRANSACTION LOG
Full             Yes    Yes            Yes
Bulk-logged      Yes    Yes            Yes (no point-in-time recovery across minimally logged operations)
Simple           Yes    Yes            No

EXAM TIP You need to know which types of backups are possible for each recovery model. Damaged Pages It is possible to damage data pages during a write to disk if you have a power failure or failures in disk subsystem components during the write operation. If the write operation fails to complete, you can have an incomplete page in the database that cannot be read. Because the damage happens to a page on disk, the only time that you see a result of the damage is when SQL Server attempts to read the page off disk. The default configuration of SQL Server does not check for damaged pages and could cause the database to go off-line if a damaged page is encountered. The PAGE_VERIFY CHECKSUM option can be enabled, which allows you to discover and log damaged pages. When pages are written to disk, a checksum for the page is calculated and stored in the page header. When SQL Server reads a page from disk, a checksum is calculated and compared to the checksum stored in the page header. If a damaged page is encountered, an 824 error is returned to the calling application and logged to the SQL Server error log and Windows Application Event log, and the ID of the damaged page is logged to the suspect_pages table in the msdb database. In SQL Server 2005, the only way to fix a damaged page was to execute a page restore, which is discussed in Chapter 9, “Backing Up and Restoring a Database.” In addition to a page restore, if the database is participating in a database mirroring session, SQL Server 2008 automatically replaces the page with a copy of the page from the mirror. When Database Mirroring automatically fixes a corrupt page, an entry is logged and can be reviewed with the sys.dm_db_mirroring_auto_page_repair view. Auto Options There are five options for a database that enable certain actions to occur automatically: AUTO_CLOSE AUTO_SHRINK 48 CHAPTER 2 Database Configuration and Maintenance
• 83. AUTO_CREATE_STATISTICS AUTO_UPDATE_STATISTICS AUTO_UPDATE_STATISTICS_ASYNC Each database within an instance requires a variety of resources, the most significant of which is a set of memory buffers. Each open database requires several bytes of memory and any queries against the database populate the data and query caches. If the AUTO_CLOSE option is enabled, when the last connection to a database is closed, SQL Server shuts down the database and releases all resources related to the database. When a new connection is made to the database, SQL Server starts up the database and begins allocating resources. By default, AUTO_CLOSE is disabled. Unless you have severe memory pressure, you should not enable a database for AUTO_CLOSE. In addition, a database that is frequently accessed should not be set to AUTO_CLOSE because it would cause a severe degradation in performance. This is because you would never be able to use the data and query caches adequately. Data files can be set to grow automatically when additional space is needed. Although most operations to increase space affect the database on a long-term basis, some space increases are needed only on a temporary basis. If the AUTO_SHRINK option is enabled, SQL Server periodically checks the space utilization of data and transaction log files. If the space checking algorithm finds a data file that has more than 25 percent free space, the file automatically shrinks to reclaim disk space. Expanding a database file is a very expensive operation. Shrinking a database file is also an expensive operation. If the size of a database file increased during normal operations, it is very likely that if the file shrinks, the operation would recur and increase the database file again. The only operations that cause one-time space utilization changes to database files are administrative processes that create and rebuild indexes, archive data, or load data. Because the growth of database files is so expensive, it is recommended to leave the AUTO_SHRINK option disabled and manually shrink files only when necessary. Statistics allow the Query Optimizer to build more efficient query plans. If the AUTO_CREATE_STATISTICS option is enabled, SQL Server automatically creates statistics that are missing during the optimization phase of query processing. Although the creation of statistics incurs some overhead, the benefit to query performance is worth the overhead cost for SQL Server to create statistics automatically when necessary. Statistics capture the relative distribution of values in one or more columns of a table. After the database has been in production for a while, normal database changes do not appreciably change the statistics distribution in general. However, mass changes to the data or dramatic shifts in business processes can suddenly introduce significant skew into the data. If the statistics are not updated to reflect the distribution shift, the Optimizer could select an inefficient query plan. Databases have two options that allow SQL Server to update out-of-date statistics automatically. The AUTO_UPDATE_STATISTICS option updates out-of-date statistics during query optimization. If you choose to enable AUTO_UPDATE_STATISTICS, a second Lesson 2: Configuring Database Options CHAPTER 2 49
• 84. option, AUTO_UPDATE_STATISTICS_ASYNC, controls whether statistics are updated during query optimization or if query optimization continues while the statistics are updated asynchronously. Change Tracking One of the challenges for any multiuser system is to ensure that the changes of one user do not accidentally overwrite the changes of another. To prevent the changes of multiple users from overriding each other, applications are usually built with mechanisms to determine whether a row has changed between the time it was read and the time it is written back to the database. The tracking mechanisms usually involve a datetime or timestamp column and might also include an entire versioning system. SQL Server 2008 introduces a new feature implemented through the CHANGE_TRACKING database option. Change tracking is a lightweight mechanism that associates a version with each row in a table that has been enabled for change tracking. Each time the row is changed, the version number is incremented. Instead of building systems to avoid changes from multiple users overriding each other, applications need only compare the row version to determine if a change has occurred to the row between when the row was read and written. After change tracking has been enabled for the database, you can choose the tables within a database for which change tracking information should be captured. Over time, change tracking information accumulates in the database, so you can also specify how long tracking information is retained through the CHANGE_RETENTION option and whether tracking information should be automatically cleaned up with the AUTO_CLEANUP option. Access Access to a database can be controlled through several options. The status of a database can be explicitly set to ONLINE, OFFLINE, or EMERGENCY. When a database is in an ONLINE state, you can perform all operations that would otherwise be possible. A database that is in an OFFLINE state is inaccessible. A database in an EMERGENCY state can be accessed only by a member of the db_owner role, and the only command allowed to be executed is SELECT. You can control the ability to modify data for an online database by setting the database to either READ_ONLY or READ_WRITE. A database in READ_ONLY mode cannot be written to. In addition, when a database is placed in READ_ONLY mode, SQL Server removes any transaction log file that is specified for the database. Changing a database from READ_ONLY to READ_WRITE causes SQL Server to re-create the transaction log file. User access to a database can be controlled through the SINGLE_USER, RESTRICTED_USER, and MULTI_USER options. When a database is in SINGLE_USER mode, only a single user is allowed to access the database. A database set to RESTRICTED_USER only allows access to members of the db_owner, dbcreator, and sysadmin roles. 50 CHAPTER 2 Database Configuration and Maintenance
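Each of these status and access settings is changed with ALTER DATABASE; a brief sketch using the sample database (in practice you would issue only the statement you need):

ALTER DATABASE AdventureWorks SET OFFLINE;
ALTER DATABASE AdventureWorks SET ONLINE;
GO
ALTER DATABASE AdventureWorks SET READ_ONLY;
ALTER DATABASE AdventureWorks SET READ_WRITE;
GO
ALTER DATABASE AdventureWorks SET RESTRICTED_USER;
ALTER DATABASE AdventureWorks SET MULTI_USER;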
Access

Access to a database can be controlled through several options. The status of a database can be explicitly set to ONLINE, OFFLINE, or EMERGENCY. When a database is in an ONLINE state, you can perform all operations that would otherwise be possible. A database in an OFFLINE state is inaccessible. A database in an EMERGENCY state can be accessed only by a member of the db_owner role, and the only command allowed to be executed is SELECT.

You can control the ability to modify data for an online database by setting the database to either READ_ONLY or READ_WRITE. A database in READ_ONLY mode cannot be written to. In addition, when a database is placed in READ_ONLY mode, SQL Server removes any transaction log file that is specified for the database. Changing a database from READ_ONLY to READ_WRITE causes SQL Server to re-create the transaction log file.

User access to a database can be controlled through the SINGLE_USER, RESTRICTED_USER, and MULTI_USER options. When a database is in SINGLE_USER mode, only a single user is allowed to access the database. A database set to RESTRICTED_USER allows access only to members of the db_owner, dbcreator, and sysadmin roles.

If users are connected to the database when you change the mode to SINGLE_USER, or if connected users conflict with the allowed set for RESTRICTED_USER, the ALTER DATABASE command is blocked until all the nonallowed users disconnect. Instead of waiting for users to complete operations and disconnect from the database, you can specify a ROLLBACK action to terminate connections forcibly. The ROLLBACK IMMEDIATE option forcibly rolls back any open transactions and disconnects any nonallowed users. You can allow users to complete transactions and exit the database by using the ROLLBACK AFTER <number of seconds> option, which waits for the specified number of seconds before rolling back transactions and disconnecting users.

The normal operational mode for most databases is ONLINE, READ_WRITE, and MULTI_USER.

Parameterization

One of the "hot button" topics in application development is whether to parameterize calls to the database. When a database call is parameterized, the values are passed as variables, and you can find just as many articles advocating for each side of the debate. Regardless of the debate, applications gain a significant benefit when database calls are parameterized.

SQL Server caches the query plan for every query that is executed. Unless there is pressure on the query cache that forces a query plan from the cache, every query executed since the instance started is in the query cache. When a query is executed, SQL Server parses and compiles the query. The query is then compared to the query cache using a string-matching algorithm. If a match is found, SQL Server retrieves the plan that has already been generated and executes the query. A parameterized query has a much higher probability of being matched, because the query string does not change even when the values being used vary. Therefore, parameterized queries can reuse cached query plans more frequently and avoid the time required to build a query plan.

Because not all applications parameterize calls to the database, you can force SQL Server to parameterize every query for a given database by setting the PARAMETERIZATION FORCED database option. The default setting for a database is not to force parameterization.

The reuse of query plans provides a benefit only so long as the plan being reused is the most efficient path through the data. For tables with significant data skew, one value produces an efficient query plan, whereas another value causes a different query plan to be created. In addition, applications see the effect of parameterization only if the majority of database calls have an extremely short duration. So long as the majority of your database calls have a very short duration and the query plans generated do not change depending upon the parameters passed, you could see a performance boost by forcing parameterization.
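A minimal sketch, again using the hypothetical Sales database:

ALTER DATABASE Sales SET PARAMETERIZATION FORCED;
-- Revert to the default behavior:
ALTER DATABASE Sales SET PARAMETERIZATION SIMPLE;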
Collation Sequences

SQL Server has the capability to store character data that spans every possible written language. However, not every language follows the same rules for sorting or data comparisons. SQL Server allows you to define the rules for comparison, sorting, case sensitivity, and accent sensitivity through the specification of a collation sequence. When you install SQL Server, you specify a default collation sequence that is used for all databases, tables, and columns. You can override the default collation sequence at each level. The collation sequence for an instance can be overridden at a database level by specifying the COLLATE clause in either the CREATE DATABASE or ALTER DATABASE command.
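A minimal sketch of overriding the instance collation at the database level; the database name and collations shown are placeholders:

CREATE DATABASE Sales COLLATE SQL_Latin1_General_CP1_CI_AS;
GO
-- Changing the collation of an existing database requires exclusive access
-- and fails if objects depend on the current collation:
ALTER DATABASE Sales COLLATE French_CI_AS;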
Quick Check
1. How do you restrict database access to members of the db_owner role and terminate all active transactions and connections at the same time?
2. Which backups can be executed for a database in each of the recovery models?

Quick Check Answers
1. You would execute the following command: ALTER DATABASE <database name> SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE.
2. You can create full, differential, and file/filegroup backups in the Simple recovery model. The Bulk-logged recovery model allows you to execute all types of backups, but you cannot restore a database to a point in time during an interval when a minimally logged transaction is executing. All types of backups can be executed in the Full recovery model.

PRACTICE: Changing the Database Recovery Model

In this practice, you change the recovery model of the AdventureWorks database to FULL to ensure that you can recover from a failure to a point in time.

1. Execute the following code:

ALTER DATABASE AdventureWorks
SET RECOVERY FULL
GO

2. Right-click the AdventureWorks database, select Properties, and select the Options tab to view the recovery model and verify that it is Full.

Lesson Summary
- You can set the recovery model for a database to Full, Bulk-logged, or Simple. You can back up transaction logs for a database in the Full or Bulk-logged recovery model.
- The AUTO_SHRINK option shrinks a database file when there is more than 25 percent free space in the file.
- You can track and log damaged pages by enabling the PAGE_VERIFY CHECKSUM option.

Lesson Review

The following question is intended to reinforce key information presented in Lesson 2, "Configuring Database Options." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. You are the database administrator at Blue Yonder Airlines and are primarily responsible for the Reservations database, which runs on a server running SQL Server 2008. In addition to customers booking flights through the company's Web site, flights can be booked with several partners. Once an hour, the Reservations database receives multiple files from partners, which are then loaded into the database using the Bulk Copy Program (BCP) utility. You need to ensure that you can recover the database to any point in time while also maximizing the performance of import routines. How would you configure the database to meet business requirements?
A. Enable AUTO_SHRINK
B. Set PARAMETERIZATION FORCED on the database
C. Configure the database in the Bulk-logged recovery model
D. Configure the database in the Full recovery model
Lesson 3: Maintaining Database Integrity

In a perfect world, everything that you save to disk storage would always write correctly, read correctly, and never have any problems. Unfortunately, your SQL Server databases live in an imperfect world where things do go wrong. Although it occurs very rarely, data within your database can become corrupted if there is a failure in the disk storage system while SQL Server is writing to a page. Data pages are 8 kilobytes (KB) in size, but SQL Server divides a page into 16 blocks of 512 bytes apiece when performing write operations. If SQL Server begins writing blocks on a page and the disk system fails in the middle of the write process, only a portion of the page is written successfully, producing a problem called a torn page. In this lesson, you learn how to detect and correct corruption errors in your database.

After this lesson, you will be able to:
- Check a database for integrity
- Use DMVs to diagnose corruption issues

Estimated lesson time: 20 minutes

Database Integrity Checks

As you learned in Lesson 2, databases have an option called PAGE_VERIFY. The page verification can be set to either TORN_PAGE_DETECTION or CHECKSUM. The PAGE_VERIFY TORN_PAGE_DETECTION option exists for backwards compatibility and should not be used.

When the PAGE_VERIFY CHECKSUM option is enabled, SQL Server calculates a checksum for the page prior to the write. Each time a page is read off disk, a checksum is recalculated and compared to the checksum written to the page. If the checksums do not match, the page has been corrupted. When SQL Server encounters a corrupt page, an error is thrown, the command attempting to access the corrupt page is aborted, and an entry is written into the suspect_pages table in the msdb database.

BEST PRACTICES: PAGE VERIFICATION
You should enable the PAGE_VERIFY CHECKSUM option on every production database.

Although page verification can detect and log corrupted pages, the page must be read off disk to trigger the verification check. Data is normally read off disk as users and applications access it, but instead of letting a user receive an error message, it is much better for you to find corruption proactively and fix the problem by using a backup before the user has a process aborted.
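A minimal sketch of enabling page verification and then reviewing any logged damage; Sales is a hypothetical database name, while msdb.dbo.suspect_pages is the system table described above:

ALTER DATABASE Sales SET PAGE_VERIFY CHECKSUM;
GO
-- Review any damaged pages that have been logged:
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;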
You can force SQL Server to read every page from disk and check the integrity by executing the DBCC CHECKDB command. The generic syntax of DBCC CHECKDB is:

DBCC CHECKDB [( 'database_name' | database_id | 0
    [ , NOINDEX
    | { REPAIR_ALLOW_DATA_LOSS | REPAIR_FAST | REPAIR_REBUILD } ] )]
    [ WITH {[ ALL_ERRORMSGS ]
    [ , [ NO_INFOMSGS ] ]
    [ , [ TABLOCK ] ]
    [ , [ ESTIMATEONLY ] ]
    [ , [ PHYSICAL_ONLY ] ]
    | [ , [ DATA_PURITY ] ] } ]

When DBCC CHECKDB is executed, SQL Server performs all the following actions:
- Checks page allocation within the database
- Checks the structural integrity of all tables and indexed views
- Calculates a checksum for every data and index page to compare against the stored checksum
- Validates the contents of every indexed view
- Checks the database catalog
- Validates Service Broker data within the database

To accomplish these checks, DBCC CHECKDB executes the following commands:
- DBCC CHECKALLOC, to check the page allocation of the database
- DBCC CHECKCATALOG, to check the database catalog
- DBCC CHECKTABLE, for each table and view in the database, to check the structural integrity

Any errors encountered are output so that you can fix the problems. If an integrity error is found in an index, you should drop and re-create the index. If an integrity error is found in a table, you need to use your most recent backups to repair the damaged pages.

NOTE: DATABASE MIRRORING
If the database is participating in Database Mirroring, SQL Server attempts to retrieve a copy of the page from the mirror. If the page can be retrieved from the mirror and has the correct page contents, the page is replaced automatically on the principal without requiring any intervention. When SQL Server replaces a corrupt page from the mirror, an entry is written into the sys.dm_db_mirroring_auto_page_repair view.

Quick Check
1. Which option should be enabled for all production databases?
2. What checks does DBCC CHECKDB perform?
Quick Check Answers
1. You should set the PAGE_VERIFY CHECKSUM option for all production databases.
2. DBCC CHECKDB checks the logical and physical integrity of every table, index, and indexed view within the database, along with the contents of every indexed view, page allocations, Service Broker data, and the database catalog.

PRACTICE: Checking Database Integrity

In this practice, you check the integrity of the AdventureWorks database.

1. Execute the following code:

DBCC CHECKDB ('AdventureWorks') WITH NO_INFOMSGS, ALL_ERRORMSGS
GO

2. Review the results.

Lesson Summary
- The PAGE_VERIFY CHECKSUM option should be enabled for every production database to detect any structural integrity errors.
- When a corrupt page is encountered, the page is logged to the suspect_pages table in the msdb database.
- If a database is participating in a Database Mirroring session, SQL Server automatically retrieves a copy of the page from the mirror, replaces the page on the principal, and logs an entry in the sys.dm_db_mirroring_auto_page_repair view.
- DBCC CHECKDB is used to check the logical and physical consistency of a database.

Lesson Review

The following question is intended to reinforce key information presented in Lesson 3, "Maintaining Database Integrity." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. Which commands are executed when you run the DBCC CHECKDB command? (Check all that apply.)
A. DBCC CHECKTABLE
B. DBCC CHECKIDENT
C. DBCC CHECKCATALOG
D. DBCC FREEPROCCACHE
Chapter Review

To practice and reinforce the skills you learned in this chapter further, you can perform the following tasks:
- Review the chapter summary.
- Review the list of key terms introduced in this chapter.
- Complete the case scenario. This scenario sets up a real-world situation involving the topics in this chapter and asks you to create a solution.
- Complete the suggested practices.
- Take a practice test.

Chapter Summary
- Databases can be configured with the Full, Bulk-logged, or Simple recovery model. The recovery model of the database determines the backups that can be created, as well as limitations on the recovery options that can be performed.
- You can set a collation sequence for a database that overrides the collation sequence defined for the instance.

Key Terms

Do you know what these key terms mean? You can check your answers by looking up the terms in the glossary at the end of the book.
- Corrupt page
- Filegroup
- Recovery model

Case Scenario

In the following case scenario, you apply what you've learned in this chapter. You can find answers to these questions in the "Answers" section at the end of this book.

Case Scenario: Configuring Databases for Coho Vineyard

BACKGROUND

Company Overview

Coho Vineyard was founded in 1947 as a local, family-run winery. Due to the award-winning wines it has produced over the last several decades, Coho Vineyard has experienced significant growth. To continue expanding, several existing wineries were acquired over the years. Today, the company owns 16 wineries; 9 wineries are in Washington, Oregon, and California, and the remaining 7 wineries are located in Wisconsin and Michigan.
The wineries employ 532 people, 162 of whom work in the central office that houses the servers critical to the business. The company has 122 salespeople who travel around the world and need access to up-to-date inventory availability.

Planned Changes

Until now, each of the 16 wineries owned by Coho Vineyard has run a separate Web site locally on the premises. Coho Vineyard wants to consolidate the Web presence of these wineries so that Web visitors can purchase products from all 16 wineries from a single online store. All data associated with this Web site will be stored in databases in the central office. When the data is consolidated at the central office, merge replication will be used to deliver data to the salespeople as well as to allow them to enter orders. To meet the needs of the salespeople until the consolidation project is completed, inventory data at each winery is sent to the central office at the end of each day. Merge replication has been implemented to allow salespeople to maintain local copies of customer, inventory, and order data.

EXISTING DATA ENVIRONMENT

Databases

Each winery presently maintains its own database to store all business information. At the end of each month, this information is brought to the central office and transferred into the databases shown in Table 2-2.

TABLE 2-2 Coho Vineyard Databases

DATABASE      SIZE
Customer      180 megabytes (MB)
Accounting    500 MB
HR            100 MB
Inventory     250 MB
Promotions    80 MB

After the database consolidation project is complete, a new database named Order will serve as a data store for the new Web store. As part of their daily work, employees also will connect periodically to the Order database using a new in-house Web application.

The HR database contains sensitive data and is protected using Transparent Data Encryption (TDE). In addition, data in the Salary table is encrypted using a certificate.

Database Servers

A single server named DB1 contains all the databases at the central office. DB1 is running SQL Server 2008 Enterprise on Windows Server 2003 Enterprise.
Business Requirements

You need to design an archiving solution for the Customer and Order databases. Your archival strategy should allow the Customer data to be saved for six years.

To prepare the Order database for archiving procedures, you create a partitioned table named Order.Sales. Order.Sales includes two partitions. Partition 1 includes sales activity for the current month. Partition 2 is used to store sales activity for the previous month. Orders placed before the previous month should be moved to another partitioned table named Order.Archive. Partition 1 of Order.Archive includes all archived data. Partition 2 remains empty.

A process needs to be created to load the inventory data from each of the 16 wineries by 4 A.M. daily.

Four large customers submit orders using Coho Vineyard's Extensible Markup Language (XML) schema for Electronic Data Interchange (EDI) transactions. The EDI files arrive by 5 P.M. and need to be parsed and loaded into the Customer, Accounting, and Inventory databases, which each contain tables relevant to placing an order. The EDI import routine is currently a single-threaded C++ application that takes between three and six hours to process the files. You need to finish the EDI process by 5:30 P.M. to meet your Service Level Agreement (SLA) with the customers. After the consolidation project has finished, the EDI routine loads all data into the new Order database.

You need to back up all databases at all locations. You can lose a maximum of five minutes of data under a worst-case scenario. The Customer, Accounting, Inventory, Promotions, and Order databases can be off-line for a maximum of 20 minutes in the event of a disaster. Data older than six months in the Customer and Order databases can be off-line for up to 12 hours in the event of a disaster.

Answer the following questions.
1. How should you configure the databases for maximum performance?
2. How should the databases be configured to meet recovery obligations?

Suggested Practices

To help you master the exam objectives presented in this chapter, complete the following tasks.

Configuring Databases
- Practice 1: Create a database that can store FILESTREAM data.
- Practice 2: Change the recovery model and observe the effects on backup and restore options.
- Practice 3: Change the database state to READ_ONLY and observe the effect on the transaction log file.
- Practice 4: Create multiple connections to a database, change the access to RESTRICTED_USER, and specify the ROLLBACK IMMEDIATE option. Observe the effects.

Take a Practice Test

The practice tests on this book's companion CD offer many options. For example, you can test yourself on just one exam objective, or you can test yourself on all the 70-432 certification exam content. You can set up the test so that it closely simulates the experience of taking a certification exam, or you can set it up in study mode so that you can look at the correct answers and explanations after you answer each question.

MORE INFO: PRACTICE TESTS
For details about all the practice test options available, see the section entitled "How to Use the Practice Tests" in the Introduction to this book.
CHAPTER 3

Tables

Tables form the core of your databases by defining the structure that is used to store your data. A database without tables would have very little use within a business application. In this chapter, you learn how to create efficient tables that can perform well under a variety of conditions while also enforcing the rules of your business.

Exam objective in this chapter:
- Implement data compression

Lessons in this chapter:
- Lesson 1: Creating Tables
- Lesson 2: Implementing Constraints

Before You Begin

To complete the lessons in this chapter, you must have:
- An instance of SQL Server 2008 installed
- The AdventureWorks sample database loaded in your instance

REAL WORLD: Michael Hotek

Almost 20 years ago, I started working with SQL Server. Of course, back then it was Sybase SQL Server, which became the basis of the first version of Microsoft SQL Server. During that time, I've dealt with millions of databases across thousands of companies. I've also taught SQL Server to over 50,000 people. During that time, I've been amazed at the complicated lengths many people go to to design a database or teach someone how to design a database.

Interestingly enough, it turns out that regardless of whether you are designing a relational database or a data warehouse, the entirety of the field of database design can be found in a single statement: "Put stuff where it belongs."
Yes, those hundreds of thousands of pages that have been published about relational database or data warehouse design can really be encompassed in a single sentence, coupled with the logic that we all possess by the time we are asked to design a database. If you designed all your databases around this single sentence, not only would you need very little time to design a database, but you would also produce the database that best met your company's needs.

When you design a database, you aren't actually designing the entire database in one step. You are designing one table at a time for the data that you need to store. Tables follow some very basic rules: columns define a group of data that you need to store, and you add one row to the table for each unique group of information. The columns that you define represent the distinct pieces of information that you need to work with inside your database, such as a city, product name, first name, last name, or price.

If you were to design a database to store orders placed by customers for products, you have already defined three core tables: customers, orders, and products. For a customer to place an order, you want to know who the customer is. So your customer table would have one or more columns for a name, depending upon whether you wanted to work with the first name separately from the last name and whether you wanted to store an honorific such as Rev., Mr., or Mrs. If your customers are placing orders, you probably want to ship the orders to the customers, so you would have one or more columns to store the address.

You could validly place the column(s) for the customer's address into the customer table. This structure would work well if you allowed only a single address for a customer. If your customers wanted more flexibility to store multiple addresses, you would create a new table to store the addresses and then link the addresses back to the customers. The customer address table would be created because you can add an unlimited number of rows to a table, whereas the number of columns is finite.

If you were to continue the process, you would quickly have defined dozens or even hundreds of tables that would allow you to store the data required by your business application. You would have also created your database without ever having to think about first, second, or third normal forms, star schemas, snowflake schemas, or any other type of database construct. You would have created your database by "putting stuff where it belongs." Then, after you have defined the database structure, all you have left to do is determine what type of data each column is going to store and whether the column is required, and you have a database that could be used by an application.
Lesson 1: Creating Tables

Tables form the most granular building blocks of applications, defining the structure of the data that can be stored. When designing tables, your task is to create tables that can store the data required by your business applications while at the same time minimizing the amount of disk and memory being used. In this lesson, you learn about the trade-offs that you need to make in the definition of a table to handle your business data while minimizing the resources consumed by the data.

After this lesson, you will be able to:
- Create schemas
- Select appropriate data types
- Apply column properties to enforce business requirements
- Add computations to a table
- Define storage properties that reduce the amount of space consumed by a row or page

Estimated lesson time: 40 minutes

Schemas

In addition to being a security structure (which you learn more about in Chapter 11, "SQL Server Security"), a schema provides a convenient mechanism to group objects together within a database. A schema is also the container that owns all objects within a database.

You manage each database that is created within an instance separately in terms of disk consumption, transactions, and memory resources. If your application currently accesses multiple databases, or you are creating an application with multiple databases that do not need to be stored on separate instances for increased capacity, you should combine the objects into a single database and use schemas to separate groups of objects.

The simplest syntax to create a schema is:

CREATE SCHEMA <schema name> AUTHORIZATION <owner name>

NOTE: CREATING SCHEMAS
The CREATE SCHEMA statement supports the creation of a schema along with the creation of tables and views and the assignment of permissions in a single statement. Creating code within SQL Server is not an obfuscation exercise, nor is it an exercise in trying to figure out the fewest statements you can construct to achieve your goals. Someone else usually has to maintain your code, and spending a couple of extra steps to create a more maintainable script is advisable. Therefore, it is recommended that you do not create tables and views or assign permissions within a CREATE SCHEMA statement. Any CREATE SCHEMA statement that is executed must be in a separate batch.
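A minimal sketch following that advice; the schema and table names are hypothetical, and the CREATE SCHEMA statement sits alone in its own batch:

CREATE SCHEMA Sales AUTHORIZATION dbo;
GO
-- Objects are then created in separate statements, qualified by the schema:
CREATE TABLE Sales.OrderDetail
    (OrderDetailID INT NOT NULL);
GO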
Data Types

Although not typically referred to as constraints, the data type for a column is the most fundamental constraint that you can specify for a table. Your choice of data type restricts the range of possible values while defining the maximum amount of space that the column can consume within a row. The choice of data type is also the most fundamental performance decision you will ever make for a database.

You need to select a data type that can store the data required by the business, but your data type should not consume a single byte of storage more than necessary. Although it might seem strange to worry about something as trivial-sounding as 1 byte, when you have millions or billions of rows of data in a table, a single wasted byte per row adds up to a significant amount of disk storage. More importantly, each wasted byte also wastes your most precious commodity: memory on the server, because all data must pass through memory before an application can use it.

Numeric Data Types

Nine numeric data types ship with SQL Server 2008, and they are used to store integer, monetary, and decimal-based numbers. Table 3-1 lists the numeric data types available, along with the range of values and storage space required for each.

TABLE 3-1 Numeric Data Types

DATA TYPE          RANGE OF VALUES                            STORAGE SPACE
TINYINT            0 to 255                                   1 byte
SMALLINT           -32,768 to 32,767                          2 bytes
INT                -2^31 to 2^31 - 1                          4 bytes
BIGINT             -2^63 to 2^63 - 1                          8 bytes
DECIMAL(P,S) and   -10^38 + 1 to 10^38 - 1                    5 to 17 bytes
NUMERIC(P,S)
SMALLMONEY         -214,748.3648 to 214,748.3647              4 bytes
MONEY              -922,337,203,685,477.5808 to               8 bytes
                   922,337,203,685,477.5807
REAL               -3.40E+38 to -1.18E-38, 0, and             4 bytes
                   1.18E-38 to 3.40E+38
FLOAT(N)           -1.79E+308 to -2.23E-308, 0, and           4 bytes or 8 bytes
                   2.23E-308 to 1.79E+308

NOTE: NUMERIC AND DECIMAL DATA TYPES
The data types NUMERIC and DECIMAL are exactly equivalent. Both data types still exist within SQL Server for backwards compatibility purposes.
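To illustrate the point about not wasting bytes, here is a hypothetical table that picks the smallest type that can hold each column's business range:

CREATE TABLE dbo.ProductRating
    (ProductID   INT      NOT NULL,  -- supports roughly 2.1 billion products (4 bytes)
     Rating      TINYINT  NOT NULL,  -- ratings of 1 through 5 fit in 1 byte
     ReviewCount SMALLINT NOT NULL,  -- 2 bytes; fine if reviews never exceed 32,767
     ListPrice   MONEY    NOT NULL); -- 8 bytes, exact monetary storage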
The MONEY and SMALLMONEY data types are designed specifically to store monetary values with a maximum of four decimal places.

The FLOAT data type takes an optional parameter, N, that specifies the number of bits used to store the mantissa. If the mantissa is defined between 1 and 24 bits, a FLOAT consumes 4 bytes of storage. If the mantissa is defined between 25 and 53 bits, a FLOAT consumes 8 bytes of storage.

NOTE: NUMERIC PRECISION
FLOAT and REAL data types are classified as approximate numerics, or floating point numbers. The value stored within a float or real column depends upon the processor architecture that is used. Moving a database from a server with an Intel chipset to one with an AMD chipset, or vice versa, can produce different results in these columns. If you are utilizing FLOAT and REAL due to the range of values supported, you must account for compounding error factors in any calculation that you perform.

Decimal Data Types

Decimal data types have two parameters: precision and scale. The precision indicates the total number of digits that can be stored, both to the left and to the right of the decimal point. The scale indicates the maximum number of digits to the right of the decimal point. For example, assigning a column the DECIMAL(8,3) data type allows SQL Server to store a total of eight digits in the column, with three of the digits to the right of the decimal point, or values between -99999.999 and 99999.999. The storage space consumed by a decimal data type depends on the defined precision, as shown in Table 3-2.

TABLE 3-2 Decimal and Numeric Data Type Storage

PRECISION    STORAGE SPACE
1 to 9       5 bytes
10 to 19     9 bytes
20 to 28     13 bytes
29 to 38     17 bytes
Character Data Types

SQL Server 2008 has four data types for storing character data; the choice of which one to use depends upon whether you have fixed- or variable-length values and whether you want to store Unicode or non-Unicode data. Table 3-3 shows the storage space consumed by the character data types.

TABLE 3-3 Character Data Types

DATA TYPE      STORAGE SPACE
CHAR(n)        Non-Unicode, 1 byte per character defined by n, up to a maximum
               of 8,000 bytes
VARCHAR(n)     Non-Unicode, 1 byte per character stored, up to a maximum of
               8,000 bytes
NCHAR(n)       Unicode, 2 bytes per character defined by n, up to a maximum of
               4,000 characters
NVARCHAR(n)    Unicode, 2 bytes per character stored, up to a maximum of
               4,000 characters

You can substitute the keyword MAX for the number of characters, such as VARCHAR(MAX). A VARCHAR(MAX) or NVARCHAR(MAX) data type allows you to store up to 2 gigabytes (GB) of data.

Date and Time Data

One of the biggest recent advances in SQL Server is a greatly expanded set of data types to store dates and times, as shown in Table 3-4.

TABLE 3-4 Date and Time Data Types

DATA TYPE         RANGE OF VALUES              ACCURACY           STORAGE SPACE
SMALLDATETIME     01/01/1900 to 06/06/2079     1 minute           4 bytes
DATETIME          01/01/1753 to 12/31/9999     0.00333 seconds    8 bytes
DATETIME2         01/01/0001 to 12/31/9999     100 nanoseconds    6 to 8 bytes
DATETIMEOFFSET    01/01/0001 to 12/31/9999     100 nanoseconds    8 to 10 bytes
DATE              01/01/0001 to 12/31/9999     1 day              3 bytes
TIME              00:00:00.0000000 to          100 nanoseconds    3 to 5 bytes
                  23:59:59.9999999

The SMALLDATETIME and DATETIME data types store a date and a time together as a single value and have existed for several versions of SQL Server. The range of values stored for a DATETIME data type was rather limited for historical applications, so SQL Server 2008 introduces a DATETIME2 data type that provides better precision than either SMALLDATETIME or DATETIME, along with a much larger range of values. The DATETIMEOFFSET type allows you to store a time zone for applications that need to localize dates and times. The most sought-after data type additions are DATE and TIME. You can now store data as either just a date or just a time, thereby eliminating many of the parsing and comparison issues that developers faced in previous versions of SQL Server.
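A brief sketch of the new types in action; the literal values are arbitrary samples:

-- Declaring variables with the new date and time types:
DECLARE @OrderDate DATE           = '2008-09-01';
DECLARE @OrderTime TIME(7)        = '14:30:00.1234567';
DECLARE @PlacedAt  DATETIME2(7)   = '2008-09-01 14:30:00.1234567';
DECLARE @PlacedTz  DATETIMEOFFSET = '2008-09-01 14:30:00.1234567 -05:00';

SELECT @OrderDate AS OrderDate, @OrderTime AS OrderTime,
       @PlacedAt AS PlacedAt, @PlacedTz AS PlacedTz;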
Binary Data

Binary data is stored in the data types listed in Table 3-5.

TABLE 3-5 Binary Data Types

DATA TYPE    RANGE OF VALUES               STORAGE SPACE
BIT          NULL, 0, and 1                1 bit
BINARY       Fixed-length binary data      Up to 8,000 bytes
VARBINARY    Variable-length binary data   Up to 8,000 bytes

Similar to the variable-length character data types, you can apply the MAX keyword to the VARBINARY data type to allow the storage of up to 2 GB of data while supporting all the programming functions available for manipulating binary data.

XML Data Type

The XML data type allows you to store and manipulate Extensible Markup Language (XML) documents natively. When storing XML documents, you are limited to a maximum of 2 GB, as well as a maximum of 128 levels within a document. Although you could store an XML document in a character column, the XML data type natively understands the structure of XML data and the meaning of XML tags within the document.

Because the XML data type natively understands an XML structure, you can apply additional validation to the XML column, which restricts the documents that can be stored based on one or more XML schemas. XML schemas are stored within SQL Server in a structure called a schema collection. Schema collections can contain one or more XML schemas. When a schema collection is applied to an XML column, the only documents allowed to be stored within the XML column are those that validate against the associated XML schema collection. The following command creates an XML schema collection:

CREATE XML SCHEMA COLLECTION ProductAttributes AS
'<xsd:schema xmlns:schema="PowerTools"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes"
    targetNamespace="PowerTools" elementFormDefault="qualified">
  <xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes"
      schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" />
  <xsd:element name="dbo.PowerTools">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Category">
          <xsd:simpleType>
            <xsd:restriction base="sqltypes:varchar" sqltypes:localeId="1033"
                sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth"
                sqltypes:sqlSortId="52">
              <xsd:maxLength value="30" />
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <xsd:element name="Amperage">
          <xsd:simpleType>
            <xsd:restriction base="sqltypes:decimal">
              <xsd:totalDigits value="3" />
              <xsd:fractionDigits value="1" />
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <xsd:element name="Voltage">
          <xsd:simpleType>
            <xsd:restriction base="sqltypes:char" sqltypes:localeId="1033"
                sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth"
                sqltypes:sqlSortId="52">
              <xsd:maxLength value="7" />
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>'
Spatial Data Types

SQL Server 2008 supports two data types to store spatial data: GEOMETRY and GEOGRAPHY. Both spatial data types are implemented by using the Common Language Runtime (CLR) capabilities that were introduced in SQL Server 2005. Geometric data is based on Euclidean geometry and is used to store points, lines, curves, and polygons. Geographic data is based on an ellipsoid and is used to store data such as latitudes and longitudes.

You define spatial columns in a table using either the GEOMETRY or GEOGRAPHY data type. When values are stored in a spatial column, you have to create an instance using one of several spatial functions specific to the type of data being stored. A GEOMETRY column can contain one of seven different geometric objects, with each coordinate in the definition separated by a space, as shown in Table 3-6. The Multi* instances define multiple geometric shapes within a single instance. The GeometryCollection allows multiple shapes to be combined into a single column to represent a complex shape. When the object is instantiated by storing it within a column defined as either GEOMETRY or GEOGRAPHY, the data and the definition of the object instance are stored within the column. Because the type of object and the coordinate data values are inseparable, it is possible to store multiple different types of objects in a single column.
TABLE 3-6 Geometry Data Type Definitions

INSTANCE            DESCRIPTION
Point               Has x and y coordinates, with optional elevation and measure
                    values.
LineString          A series of points that defines the start, end, and any bends in
                    the line, with optional elevation and measure values.
Polygon             A surface defined as a sequence of points that defines an exterior
                    boundary, along with zero or more interior rings. A polygon has at
                    least three distinct points.
GeometryCollection  Contains one or more instances of other geometry shapes, such as
                    a Point and a LineString.
MultiPolygon        Contains the coordinates for multiple Polygons.
MultiLineString     Contains the coordinates of multiple LineStrings.
MultiPoint          Contains the coordinates of multiple Points.

Geographic data is stored as latitude and longitude points. The only restriction on geographic data is that the data and any comparisons cannot span more than a single hemisphere.

HIERARCHYID Data Type

The HIERARCHYID data type is used to organize hierarchical data, such as organization charts, bills of materials, and flowcharts. The HIERARCHYID stores a position within a tree hierarchy. By employing a HIERARCHYID, you can quickly locate nodes within a hierarchy as well as move data between nodes within the structure.
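A brief sketch instantiating spatial and HIERARCHYID values; the coordinates and hierarchy path shown are arbitrary samples:

-- Instantiate spatial values with the static methods on each type.
DECLARE @shape GEOMETRY  = geometry::STGeomFromText('POINT (3 4)', 0);
DECLARE @place GEOGRAPHY = geography::STGeomFromText(
    'POINT (-122.349 47.651)', 4326);  -- WKT lists longitude first; 4326 = WGS 84

SELECT @shape.STX AS X, @shape.STY AS Y,
       @place.Lat AS Latitude, @place.Long AS Longitude;

-- A HIERARCHYID value identifies a node's position within a tree:
DECLARE @node HIERARCHYID = hierarchyid::Parse('/1/2/');
SELECT @node.GetLevel() AS Level, @node.ToString() AS Path;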
Column Properties

The seven properties that you can apply to a column are nullability, COLLATE, IDENTITY, ROWGUIDCOL, FILESTREAM, NOT FOR REPLICATION, and SPARSE.

Nullability

You specify whether a column allows nulls by specifying NULL or NOT NULL in the column properties. Just as with every command you execute, you should always specify explicitly each option that you want, especially when you are creating objects. If you do not specify the nullability option, SQL Server uses the default option when creating a table, which could produce unexpected results. In addition, the default option is not guaranteed to be the same for each database, because you can modify it by changing the ANSI_NULL_DEFAULT database property.

COLLATE

Collation sequences control the way characters in various languages are handled. When you install an instance of SQL Server, you specify the default collation sequence for the instance. You can set the COLLATE property of a database to override the instance collation sequence, which SQL Server then applies as the default collation sequence for objects within the database. Just as you can override the default collation sequence at a database level, you can also override the collation sequence for an entire table or an individual column. By specifying the COLLATE option for a character-based column, you can set language-specific behavior for the column.

IDENTITY

Identities are used to provide a value for a column automatically when data is inserted. You cannot update a column with the identity property. Columns with any numeric data type, except float and real, can accept an identity property; you also have to specify a seed value and an increment to be applied for each subsequently inserted row. You can have only a single identity column in a table.

Identity columns frequently are unique, but they do not have to be. To make an identity column unique, you must apply a constraint to the column, which you learn about in Lesson 2, "Implementing Constraints." Although SQL Server automatically provides the next value in the sequence, you can insert a value into an identity column explicitly by using the SET IDENTITY_INSERT <table name> ON command. You can also change the next value generated by modifying the seed using the DBCC CHECKIDENT command.
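A minimal sketch of these identity behaviors; dbo.Widget and its values are hypothetical:

CREATE TABLE dbo.Widget
    (WidgetID INT IDENTITY(1,1) NOT NULL,
     Name     VARCHAR(50) NOT NULL);
GO
-- Explicitly supply a value for the identity column:
SET IDENTITY_INSERT dbo.Widget ON;
INSERT INTO dbo.Widget (WidgetID, Name) VALUES (100, 'Gadget');
SET IDENTITY_INSERT dbo.Widget OFF;
GO
-- Reseed so that the next generated value is 201:
DBCC CHECKIDENT ('dbo.Widget', RESEED, 200);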
ROWGUIDCOL

The ROWGUIDCOL property is used mainly by merge replication to designate a column that is used to identify rows uniquely across databases. Only a single column in a table can have the ROWGUIDCOL property, and that column must have a UNIQUEIDENTIFIER data type.

FILESTREAM

Databases are designed to store well-structured, discrete data. As the variety of data within an organization expands, organizations need to be able to consolidate data of all formats within a single storage architecture. SQL Server has the ability to store all the various data within an organization, the majority of which exists as documents, spreadsheets, and other types of files.

Prior to SQL Server 2008, you had to extract the contents of a file to store it in a VARBINARY(MAX), VARCHAR(MAX), or NVARCHAR(MAX) data type. However, you were limited to storing only 2 GB of data within a large data type. To work around this restriction, many organizations stored the filename within SQL Server and maintained the file on the operating system. The main issue with storing the file outside the database is that it was very easy to move, delete, or rename a file without making a corresponding update to the database.

SQL Server 2008 introduces a new property for a column called FILESTREAM, which combines the best of both worlds. Binary large objects (BLOBs) stored in a FILESTREAM column are controlled and maintained by SQL Server; however, the data resides in a file on the operating system. By storing the data on the file system outside of the database, you are no longer restricted by the 2-GB limit on BLOBs. In addition, when you back up the database, all the files are backed up at the same time, ensuring that the state of each file remains synchronized with the database.

You apply the FILESTREAM property to columns with a VARBINARY(MAX) data type. The column within the table maintains a 16-byte identifier for the file, and SQL Server manages access to the files stored on the operating system.

EXAM TIP
A filegroup designated for FILESTREAM storage is off-line and inaccessible within a Database Snapshot. In addition, you cannot implement Database Mirroring against a database containing data stored with FILESTREAM.

NOT FOR REPLICATION

The NOT FOR REPLICATION option is used for a column that is defined with the IDENTITY property. When you define an identity, you specify a seed (the starting value) and an increment to be applied to generate the next value. If you explicitly insert a value into an identity column, SQL Server automatically reseeds the column. If the table is participating in replication, you do not want to reseed the identity column each time data is synchronized. By applying the NOT FOR REPLICATION option, SQL Server does not reseed the identity column when the replication engine is applying changes.

SPARSE

New in SQL Server 2008, the option to designate a column as SPARSE is designed to optimize storage space for columns with a large percentage of NULLs. To designate a column as SPARSE, the column must allow NULLs. When a NULL is stored in a column designated as SPARSE, no storage space is consumed. However, non-NULL values require 4 bytes of storage space in addition to the normal space consumed by the data type. Unless you have a high enough percentage of rows containing a NULL to offset the increased storage required for non-NULL values, you should not designate a column as SPARSE.

You cannot apply the SPARSE property to:
- Columns with the ROWGUIDCOL or IDENTITY property
- TEXT, NTEXT, IMAGE, TIMESTAMP, GEOMETRY, GEOGRAPHY, or user-defined data types
- A VARBINARY(MAX) column with the FILESTREAM property
- A computed column or a column with a rule or default bound to it
- Columns that are part of either a clustered index or a primary key
- A column within an ALTER TABLE statement
NOTE: ROW SIZE LIMITATION
If the maximum size of a row in your table exceeds 4,009 bytes, you cannot issue an ALTER statement to either change a column to SPARSE or add an additional SPARSE column. During the ALTER, each row is recomputed by writing a second copy of the row on the same data page. Because two copies of a row that exceeds 4,009 bytes would exceed the 8,018 bytes allowed per page, the ALTER TABLE statement fails. The only workarounds to this storage design issue are the following:
- Reduce the data within a row so that the maximum row size is less than 4,009 bytes
- Create a new table, copy all the data to the new table, drop the old table, and then rename the newly created table
- Export the data, truncate the existing table, make the changes, and import the data back into the table

Computed Columns

Computed columns allow you to add to a table columns that, instead of being populated with data, are calculated based on other columns in the row. For example, you might have a subtotal and a shipping amount in your table, which would allow you to create a computed column for the grand total that automatically changes if the subtotal or shipping amount changes.

When you create a computed column, only the definition of the calculation is stored. If you use the computed column within any data manipulation language (DML) statement, the value is calculated at the time of execution. If you do not want to incur the overhead of making the calculation at runtime, you can specify the PERSISTED property. If a computed column is PERSISTED, SQL Server stores the result of the calculation in the row and updates the value anytime data that the calculation relies upon is changed.
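A minimal sketch of a persisted computed column; the table name is hypothetical:

CREATE TABLE dbo.OrderSummary
    (SubTotal    MONEY NOT NULL,
     ShippingAmt MONEY NOT NULL,
     -- PERSISTED stores the result and maintains it as the inputs change:
     GrandTotal  AS (SubTotal + ShippingAmt) PERSISTED);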
Row and Page Compression

SQL Server 2008 now allows you to compress rows and pages for tables that do not have a SPARSE column, as well as for indexes and indexed views. Row-level compression allows you to compress individual rows to fit more rows on a page, which in turn reduces the amount of storage space for the table because you don't need to store as many pages on disk. Because you can uncompress the data at any time, and the uncompress operation must always succeed, you cannot use compression to store more than 8,060 bytes in a single row. Page compression reduces only the amount of disk storage required, because the entire page is compressed.

When SQL Server applies page compression to a heap (a table without a clustered index), it compresses only the pages that currently exist in the table. SQL Server compresses new data added to a heap only if you use the BULK INSERT or INSERT INTO. . .WITH (TABLOCK) statements. Pages that are added to the table using either BCP or an INSERT that does not specify a table lock hint are not compressed. To compress any newly added, uncompressed pages, you need to execute an ALTER TABLE. . .REBUILD statement with the PAGE compression option.

The compression setting for a table does not pass to any nonclustered indexes or indexed views created against the table. You need to specify compression for each nonclustered index or indexed view that you want to be compressed. If the table is partitioned, which you learn about in Chapter 6, "Distributing and Partitioning Data," you can apply compression at a partition level.

VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) store data in specialized structures outside the row. In addition, VARBINARY(MAX) with the FILESTREAM option stores documents in a directory external to the database. Any data stored outside the row cannot be compressed.

Creating Tables

A portion of the general syntax for creating a table is:

CREATE TABLE
    [ database_name . [ schema_name ] . | schema_name . ] table_name
    ( { <column_definition> | <computed_column_definition>
        | <column_set_definition> }
        [ <table_constraint> ] [ ,...n ] )
    [ ON { partition_scheme_name ( partition_column_name ) | filegroup
        | "default" } ]
    [ { TEXTIMAGE_ON { filegroup | "default" } ]
    [ FILESTREAM_ON { partition_scheme_name | filegroup | "default" } ]
    [ WITH ( <table_option> [ ,...n ] ) ]
[ ; ]

<column_definition> ::=
column_name <data_type>
    [ FILESTREAM ]
    [ COLLATE collation_name ]
    [ NULL | NOT NULL ]
    [ [ CONSTRAINT constraint_name ] DEFAULT constant_expression ]
    | [ IDENTITY [ ( seed ,increment ) ] [ NOT FOR REPLICATION ] ]
    [ ROWGUIDCOL ] [ <column_constraint> [ ...n ] ]
    [ SPARSE ]

<data type> ::=
[ type_schema_name . ] type_name
    [ ( precision [ , scale ] | max
        | [ { CONTENT | DOCUMENT } ] xml_schema_collection ) ]

<computed_column_definition> ::=
column_name AS computed_column_expression
    [ PERSISTED [ NOT NULL ] ]

<column_set_definition> ::=
column_set_name XML COLUMN_SET FOR ALL_SPARSE_COLUMNS

<table_option> ::=
{ DATA_COMPRESSION = { NONE | ROW | PAGE }
    [ ON PARTITIONS ( { <partition_number_expression> | <range> } [ , ...n ] ) ] }
A standard table in SQL Server 2008 can have 1,024 columns. However, by using the new column set definition in conjunction with the new sparse column capabilities, you can create a table with as many as 30,000 columns. Tables that exceed 1,024 columns by using a column set definition are referred to as wide tables, but the data stored in any row still cannot exceed 8,019 bytes unless you have a VARCHAR(MAX), NVARCHAR(MAX), or VARBINARY(MAX) column defined for the table.

In addition to the persistent tables that you create within a database, you can also create four transient table structures. Temporary tables are stored in the tempdb database and can be either local or global. A local temporary table is designated by a name beginning with a # symbol and is visible only within the connection that created it. A global temporary table is designated by a name beginning with a ## symbol and is visible across all connections to the instance. Both global and local temporary tables are dropped automatically when the connection that created them is terminated.

Table variables can be created to pass sets of data within objects such as stored procedures and functions. Table variables can be populated with INSERT, UPDATE, DELETE, or MERGE statements and can even participate in JOIN statements like any other table. A table variable is a memory-resident structure that is visible only within the connection that declared the variable and is deallocated after the code that declared the variable completes. Finally, a new feature in SQL Server 2008 is a table data type that allows you to create function and stored procedure parameters to pass sets of data between objects.
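A brief sketch contrasting a local temporary table with a table variable; the names and values are hypothetical:

-- Local temporary table, visible only to the creating connection:
CREATE TABLE #WorkOrders (OrderID INT NOT NULL);
INSERT INTO #WorkOrders VALUES (1);

-- Table variable, deallocated when the declaring batch completes:
DECLARE @WorkOrders TABLE (OrderID INT NOT NULL);
INSERT INTO @WorkOrders VALUES (2);
SELECT OrderID FROM @WorkOrders;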
Quick Check
1. How do you design a database?
2. What are three new options that you can configure for columns, rows, or pages within a table?

Quick Check Answers
1. The ruling principle for designing a database is "Put things where they belong." If the need is to store multiple rows of information that link back to a single entity, you need a separate table for those rows. Otherwise, each table defines a major object for which you want to store data, and the columns within the table define the specific data that you want to store.
2. You can designate columns as SPARSE to optimize the storage of NULLs. You can apply the FILESTREAM property to a VARBINARY(MAX) column to enable the storage of documents that exceed 2 GB in a directory on the operating system. Rows can be compressed to fit more rows on a page. Pages can be compressed to reduce the amount of storage space required for the table, index, or indexed view.

PRACTICE: Creating Tables

In this practice, you create a schema to store a set of tables. You also add constraints and configure the storage options for rows and pages within a table.

1. Execute the following code to create the test schema in the AdventureWorks database:

USE AdventureWorks
GO
CREATE SCHEMA test AUTHORIZATION dbo
GO

2. Execute the following code to create a table with an IDENTITY and a SPARSE column:

CREATE TABLE test.Customer
    (CustomerID    INT          IDENTITY(1,1),
     LastName      VARCHAR(50)  NOT NULL,
     FirstName     VARCHAR(50)  NOT NULL,
     CreditLine    MONEY        SPARSE NULL,
     CreationDate  DATE         NOT NULL)
GO

3. Execute the following code to create a table with a computed column and row compression:

CREATE TABLE test.OrderHeader
    (OrderID      INT    IDENTITY(1,1),
     CustomerID   INT    NOT NULL,
     OrderDate    DATE   NOT NULL,
     OrderTime    TIME   NOT NULL,
     SubTotal     MONEY  NOT NULL,
     ShippingAmt  MONEY  NOT NULL,
     OrderTotal   AS (SubTotal + ShippingAmt))
WITH (DATA_COMPRESSION = ROW)
GO

Lesson Summary
- Schemas allow you to group related objects together as well as provide a security container for objects.
- The most important decision you can make when designing a table is the data type of a column.
- You can use a column set definition along with sparse columns to create tables with up to 30,000 columns.
- Tables, indexes, and indexed views can be compressed using either row or page compression; however, compression is not compatible with sparse columns.
Lesson Review

The following question is intended to reinforce key information presented in Lesson 1, "Creating Tables." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
The answer to this question and an explanation of why each answer choice is right or wrong are located in the "Answers" section at the end of the book.

1. Which options are not compatible with row or page compression? (Choose two. Each forms a separate answer.)
A. A column with a VARCHAR(MAX) data type
B. A sparse column
C. A table with a column set
D. A VARBINARY(MAX) column with the FILESTREAM property
Lesson 2: Implementing Constraints

You use constraints to enforce business rules as well as consistency in data. In this lesson, you learn about constraints and how to implement each type of constraint within your database.

After this lesson, you will be able to:
- Create a primary key
- Create a foreign key
- Create a unique constraint
- Implement a default constraint
- Apply a check constraint

Estimated lesson time: 40 minutes

Primary Keys

You can have only a single primary key constraint defined for a table. The primary key defines the column(s) that uniquely identify every row in the table. You must specify all columns within the primary key as NOT NULL. When you create a primary key, you also designate whether the primary key is clustered or nonclustered. A clustered primary key, the default SQL Server behavior, causes SQL Server to store the table in sorted order according to the primary key.

EXAM TIP
The default option for a primary key is clustered. When a clustered primary key is created on a table that is compressed, the compression option is applied to the primary key when the table is rebuilt.

Foreign Keys

You use foreign keys to implement referential integrity between tables within your database. By creating foreign keys, you can ensure that related tables cannot contain invalid, orphaned rows. Foreign keys create what is referred to as a parent-child relationship between two tables and ensure that a value cannot be written to the child table that does not already exist in the parent table. For example, it would not make any sense to have an order for a customer who does not exist.

To create a foreign key between two tables, the parent table must have a primary key, which the foreign key in the child table references. In addition, the data types of the parent column(s) and child column(s) must be compatible. If you have a multicolumn primary key, all the columns from the parent primary key must exist in the child table to define a foreign key.
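A minimal sketch of a parent-child pair with the constraints defined inline; the table and constraint names are hypothetical:

CREATE TABLE dbo.Parent
    (ParentID INT NOT NULL CONSTRAINT pk_parent PRIMARY KEY CLUSTERED);
GO
CREATE TABLE dbo.Child
    (ChildID  INT NOT NULL CONSTRAINT pk_child PRIMARY KEY CLUSTERED,
     ParentID INT NOT NULL CONSTRAINT fk_child_parent
         FOREIGN KEY REFERENCES dbo.Parent (ParentID));
GO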
CAUTION: CASCADING
One of the options for a foreign key is CASCADE. You can configure a foreign key such that modifications to the parent table are cascaded to the child table. For example, when you delete a customer, SQL Server also deletes all the customer's associated orders. Cascading is an extremely bad idea. It is very common to have foreign keys defined between all the tables within a database. If you were to issue a DELETE statement without a WHERE clause against the wrong table, you could eliminate every row, in every table within your database, very quickly. By leaving the CASCADE option off for a foreign key, if you attempt to delete a parent row that is referenced, you get an error.

Unique Constraints

Unique constraints allow you to define a column or columns for which the values must be unique within the table; duplicate entries are not allowed. For example, you might want to ensure that you do not have any duplicate customer names in your database. Although a unique constraint is similar to a primary key, a unique constraint allows NULLs.

EXAM TIP
Although a NULL does not equal another NULL, and NULLs cannot be compared, a unique constraint treats a NULL as it does any other data value. If the unique constraint is defined on a single column, then a single row within the table is allowed to have a NULL within that column. If the unique constraint is defined across more than one column, then you can store NULLs within the columns so long as you do not produce a duplicate across the combination of NULLs and actual data values.

Default Constraints

Default constraints allow you to specify a value that is written to the column if the application does not supply a value. Default constraints apply only to new rows added with an INSERT, BCP, or BULK INSERT statement. You can define default constraints for either NULL or NOT NULL columns. If a column has a default constraint and an application passes in a NULL for the column, SQL Server writes a NULL to the column instead of the default value. SQL Server writes the default value to the column only if the application does not specify the column in the INSERT statement.

Check Constraints

Check constraints limit the range of values within a column. Check constraints can be created at a column level, in which case they are not allowed to reference any other column in the table. Table-level check constraints can reference any column within a table, but they are not allowed to reference columns in other tables.
Check Constraints

Check constraints limit the range of values allowed in a column. A column-level check constraint can reference only the column on which it is defined. A table-level check constraint can reference any column within the table, but it is not allowed to reference columns in other tables. The evaluation of a check constraint must return a value of true or false; any other state for the evaluation is not allowed. Data that passes the check constraint is allowed into the table or column, whereas data that does not pass the check constraint is rejected, and an error is returned to the application.

Check constraints can use simple comparisons, such as >, <, >=, <=, <>, and =. You can create more complex check constraints by combining multiple tests with AND, OR, and NOT. Check constraints can also use the wildcards % and _, as well as perform pattern matching. For example, you could create the following check constraint to enforce a valid format for a U.S. social security number, which consists of three digits, a dash, two digits, a dash, and four digits:

CHECK (Column1 LIKE '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]')

Quick Check
1. What is the difference between a primary key and a unique constraint?
2. What restrictions does the parent table have when creating a foreign key?

Quick Check Answers
1. A primary key does not allow NULLs; a unique constraint does.
2. The parent table must have a primary key that is used to define the relationship between the parent and child tables. In addition, if the parent's primary key is defined on multiple columns, all those columns must exist in the child table for the foreign key to be created.

PRACTICE: Implement Constraints

In this practice, you add constraints to the tables that you created in Lesson 1.

1. Execute the following code to add primary keys to the Customer and OrderHeader tables:

ALTER TABLE test.Customer
ADD CONSTRAINT pk_customer PRIMARY KEY CLUSTERED (CustomerID)
GO
ALTER TABLE test.OrderHeader
ADD CONSTRAINT pk_orderheader PRIMARY KEY CLUSTERED (OrderID)
GO

2. Execute the following code to add a foreign key between the Customer and OrderHeader tables:

ALTER TABLE test.OrderHeader
ADD CONSTRAINT fk_orderheadertocustomer FOREIGN KEY (CustomerID)
REFERENCES test.Customer (CustomerID)
GO
3. Execute the following code to implement defaults for the CreationDate and OrderDate columns:

ALTER TABLE test.Customer
ADD CONSTRAINT df_creationdate DEFAULT (GETDATE()) FOR CreationDate
GO
ALTER TABLE test.OrderHeader
ADD CONSTRAINT df_orderdate DEFAULT (GETDATE()) FOR OrderDate
GO

4. Execute the following code to add a check constraint to the SubTotal column:

ALTER TABLE test.OrderHeader
ADD CONSTRAINT ck_subtotal CHECK (SubTotal > 0)
GO

Lesson Summary
A primary key defines the column(s) that uniquely identify each row in a table.
Foreign keys are used to enforce referential integrity between tables.
Default constraints provide a value when the application does not specify one for a column.
Check constraints limit the acceptable values for a column.

Lesson Review

The following question is intended to reinforce key information presented in Lesson 2, "Implementing Constraints." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
The answer to this question and an explanation of why each answer choice is right or wrong are located in the "Answers" section at the end of the book.

1. Columns with which properties cannot be sparse columns? (Choose two. Each forms a separate answer.)
A. FILESTREAM
B. NULL
C. NOT FOR REPLICATION
D. COLLATE
Chapter Review

To practice and reinforce the skills you learned in this chapter further, you can:
Review the chapter summary.
Review the list of key terms introduced in this chapter.
Complete the case scenario. This scenario sets up a real-world situation involving the topics of this chapter and asks you to create solutions.
Complete the suggested practices.
Take a practice test.

Chapter Summary
Tables form the foundation of every database that you create, with the choice of data types for each column being the most important performance decision that you make.
You can designate columns as SPARSE to optimize storage when a large number of rows contain a NULL for a column.
Row and page compression can conserve storage space and improve data-processing performance.
Primary keys should be created on tables to uniquely identify each row within a table.

Key Terms

Do you know what these key terms mean? You can check your answers by looking up the terms in the glossary at the end of the book.
Check constraint
Default constraint
FILESTREAM
Foreign key
Identity column
Primary key
Schema
Schema collection
Sparse column
Unique constraint

Case Scenario

In the following case scenario, you apply what you've learned in this chapter. You can find answers to these questions in the "Answers" section at the end of this book.
Case Scenario: Performing Data Management Tasks

Wide World Importers is implementing a new set of applications to manage several lines of business. Within the corporate data center, they need the ability to store large volumes of data that can be accessed from anywhere in the world.

Several business managers need access to operational reports that cover the current workload of their employees along with new and pending customer requests. The same business managers also need access to large volumes of historical data to spot trends and optimize their staffing and inventory levels.

Business managers want to eliminate all the product manuals that are included with their products and instead direct users to the company Web site. Users should be able to browse for manuals based on the product or search for text within a manual. The sales force also would like to enhance the company Web site to allow product descriptions to be created and searched in multiple languages.

A large sales force makes customer calls all over the world and needs access to data on the customers that a sales rep is servicing, along with potential prospects. The data for the sales force needs to be available even when the reps are not connected to the Internet or the corporate network. Periodically, sales reps connect to the corporate network and synchronize their data with the corporate databases.

A variety of Windows applications have been created with Microsoft Visual Studio .NET, and all data access is performed using stored procedures. The same set of applications is deployed for users connecting directly to the corporate database server, as well as for sales reps connecting to their own local database servers.

1. How should you design the tables to allow product manuals to be stored within the database?
2. How should you design the table to hold product descriptions in multiple languages?
3. How should you design the tables so that you can assign customers to sales reps while also ensuring that a customer cannot be assigned to a sales rep who does not exist?

Suggested Practices

To help you master the exam objectives presented in this chapter, complete the following tasks.

Creating Tables
Practice 1: Insert a row into the Customer table and review the value of the CustomerID. Change the seed, the increment for the identity column, or both (a reseeding sketch follows this practice). Insert additional rows into the Customer table and review the value(s) for the CustomerID. Did you get the CustomerID values that you expected?
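One way to approach Practice 1: the current seed of an existing identity column can be changed with DBCC CHECKIDENT, but the increment cannot be altered in place; changing the increment requires dropping and re-creating the column or table. A short sketch against the practice's test.Customer table, with a hypothetical reseed value:

-- Report the current identity value without changing it
DBCC CHECKIDENT ('test.Customer', NORESEED);
GO
-- Reseed so that the next row inserted receives 1000 + the increment
DBCC CHECKIDENT ('test.Customer', RESEED, 1000);
GO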
Creating Constraints
Practice 1: Attempt to insert a customer without specifying a last name. Did you receive the result you expected?
Practice 2: Try to insert an order with an invalid CustomerID. What happens?
Practice 3: When you insert a new order or a new customer, what do you get for the CreationDate or OrderDate?
Practice 4: Attempt to insert an order with a negative subtotal. What happens?

Take a Practice Test

The practice tests on this book's companion CD offer many options. For example, you can test yourself on just one exam objective, or you can test yourself on all the 70-432 certification exam content. You can set up the test so that it closely simulates the experience of taking a certification exam, or you can set it up in study mode so that you can look at the correct answers and explanations after you answer each question.

MORE INFO: PRACTICE TESTS
For details about all the practice test options available, see the section "How to Use the Practice Tests" in the Introduction to this book.
CHAPTER 4
Designing SQL Server Indexes

In Chapter 3, "Tables," you learned about the key considerations that go into designing a flexible and high-performing database. After you have an optimal table design, you need to design efficient indexes to effectively query any data that is stored. In this chapter, you learn about the internal architecture of an index, as well as how to construct clustered, nonclustered, Extensible Markup Language (XML), and spatial indexes. You then learn how to manage and maintain the indexes to ensure peak performance.

Exam objective in this chapter:
Maintain indexes.

Lessons in this chapter:
Lesson 1: Index Architecture
Lesson 2: Designing Indexes
Lesson 3: Maintaining Indexes

Before You Begin

To complete the lessons in this chapter, you must have:
Microsoft SQL Server 2008 installed
The AdventureWorks database installed within the instance
REAL WORLD
Michael Hotek

One of my customers had a moderate-sized data warehouse environment that was used to drive many company pricing and product decisions. Once a month, they would receive data from several source systems. The most recent data would be combined with all the previous data on a staging server. After they imported the data, they would execute several processes to compute aggregates, derive tables, transform data into fact and dimension tables, and denormalize data to be used for subsequent query activity.

The fundamental business problem was performance and data availability. To begin improving the situation, they completed a multi-month project to replace all the servers, networking, and storage area network (SAN) storage at a cost of over $1 million. Even when all the new hardware was in place, the processing run still took between 12 and 16 days, and some of the processes took 12 to 18 hours.

One of their consultants analyzed the databases and determined that many of the indexes were severely fragmented. Over the course of about two weeks, they defragmented the indexes and added several more. In the process, they told the customer that the changes being implemented were going to improve performance. The customer was happy that they were "finally getting help" and that their problems were "SQL Server's fault."

During the next monthly run, however, there was very little improvement in the processing routines. Analysis determined that the indexes were almost completely fragmented again. What the consultant failed to account for was the fact that the processing routines manipulated almost the entire contents of every table within the database. No matter how much effort was put into defragmenting indexes, the way the processing routines were written, SQL Server was not going to take advantage of many indexes, and the indexes just added overhead to many of the routines.

Further analysis found a host of problems. Data types used in joins did not match, GROUP BY clauses had been added to dozens of queries that did not contain an aggregate, temp tables were being filled with millions of rows of data and then never used, temp tables with tens or hundreds of millions of rows were created to generate other temp tables which generated other temp tables, INSERT...SELECT statements had ORDER BY clauses, joins were being performed on calculations, table designs were not efficient, and the list went on and on.

The moral of the story is that although indexes are designed to improve the performance of data retrieval operations, indexes alone cannot overcome inefficient code or inefficient table designs.
Lesson 1: Index Architecture

Indexes are designed so that you can find the information you are looking for within a vast volume of data while performing only a very small number of read operations. In this lesson, you learn about the internal structure of an index, as well as how SQL Server builds and manipulates indexes. Armed with this structural information, you can make better decisions about the number, type, and definition of the indexes that you choose to create.

After this lesson, you will be able to:
Understand how a B-tree is built and maintained
Understand why SQL Server uses a B-tree structure for indexes
Estimated lesson time: 20 minutes

Index Structure

SQL Server does not need indexes on a table to retrieve data. A table can simply be scanned to find the piece of data that is requested. However, the amount of time needed to find a piece of data is directly proportional to the amount of data in the table. Because users want to store increasing amounts of data in a table and still have consistent query performance regardless of the data volume, you need to employ indexes to satisfy the needs of the applications that all businesses are built upon.

Indexes are not a new concept; we use them every day. At the back of this book, you can find an index in printed form. If you want to read about clustering, you can find the information two different ways. You could open this book, start at page 1, and scan each page until you reached Chapter 14, "Failover Clustering," and located the specific information that you needed. You could also open the index at the back of the book, locate the clustering entry, and then go to the corresponding page in the book. Either approach accomplishes your goal, but using the index allows you to locate the information you want by looking at the smallest number of pages possible.

An index is useful only if it can provide a means to find data very quickly regardless of the volume of data that is stored. Take a look at the index at the back of this book. The index contains only a very small sampling of the words in the book, so it provides a much more compact way to search for information. The index is organized alphabetically, a natural way for humans to work with words, which enables you to eliminate a large percentage of the pages in the book to find the information you need. In addition, it enables you to scan down to the term you are searching for; after you find the word, you know that you don't have to look any further. SQL Server organizes indexes in a very similar manner.
Balanced Trees (B-Trees)

The structure that SQL Server uses to build and maintain indexes is called a balanced tree (B-tree). An example of a B-tree is shown in Figure 4-1.

FIGURE 4-1 B-tree structure, with a root level, an intermediate level, and a leaf level

A B-tree is constructed of a root node that contains a single page of data, one or more optional intermediate levels, and a leaf level. (As described later in this lesson, when an index is small enough to fit on a single page, the root page is also the leaf page.) The core concept of a B-tree can be found in the first word of the name: balanced. A B-tree is always symmetrical, with the same number of pages on both the left and right halves at each level.

The leaf-level pages contain entries sorted in the order that you specified. The data at the leaf level contains every combination of values within the column(s) that are being indexed. The number of index rows on a page is determined by the storage space required by the columns that are defined in the index.

NOTE: INDEX ENTRY STORAGE
Pages in SQL Server can store up to 8,060 bytes of data. So an index created on a column with an INT data type can store 2,015 values on a single page within the index, whereas an index based on a column with a datetime2 data type can store only about half as many, or 1,007 values per page.

The root and intermediate levels of the index are constructed by taking the first entry from every page in the level below, along with a pointer to the page where the entry came from, as shown in Figure 4-2. A query scans the root page until it finds a page that contains the value being searched on. It then uses the page pointer to hop to the next level and scans the rows in that page until it finds a page that contains the data being searched for. It repeats the process with subsequent levels until it reaches the leaf level of the index. At this point, the query has located the required data.
FIGURE 4-2 Constructing intermediate and root levels from the first entry on each page of the level below (the example pages contain city values, City1 through City247)

For example, if you were looking for City132 in the B-tree depicted in Figure 4-2, the query starts at the root level and scans the rows. Because City132 falls between City121 and City190, SQL Server calculates that City132 could possibly be found on the page that starts with City121. SQL Server then moves to the intermediate-level page beginning with City121. Upon scanning the page, SQL Server again determines that City132 lies between City121 and City150, so SQL Server moves to the leaf-level page starting with City121 and scans that page until City132 is located. Because this is the leaf-level page, there aren't any more pages to search. If City132 did not exist in the table, SQL Server would not find an entry for City132; as soon as it read the entry for City133, it would determine that the value City132 could not possibly be contained farther down the page, and the query would return with no results found.

You should note that, with the structure shown here, SQL Server has to read a maximum of only three pages to locate any city within the database. This is what it means to have a balanced tree: every search that SQL Server performs always travels through the same number of levels in the index, as well as the same number of pages in the index, to locate the piece of data you want.

Index Levels

The number of levels in an index and the number of pages within each level are determined by simple mathematics. A data page in SQL Server is 8,192 bytes in size, of which up to 8,060 bytes can be used to store actual user data. Based on the number of bytes required to store an index key, determined by the data type, you can calculate the number of rows stored per page by using simple division.

The following example describes not only how an index is built, but also the size calculations for an index. It gives you an idea of how valuable an index can be for finding data within very large tables, and it explains why the amount of time needed to find a piece of data does not vary much even when the size of a database increases dramatically. Of course, the amount of time needed to locate data also depends upon writing efficient queries.

If you build an index on an INT column, each row in the table requires 4 bytes of storage in the index.
If the table contains only 1,200 rows of data, you need 4,800 bytes of storage. Because all the entries fit on a single page of data, the index has a single page that serves as both the root page and the leaf page. In fact, you could store 2,015 rows in the table and still allocate only a single page to the index. As soon as you add the 2,016th row, however, all the entries can no longer fit on a single page, so two additional pages are allocated to the index in a process called page splitting. The existing root page is pushed down the structure to become a leaf-level page. SQL Server takes half of the data on the index page and moves it to one of the newly allocated pages. The other new page is allocated at the top of the index structure to become the new root page. The final step in the process is to take the first entry on each of the leaf-level pages and write the entries to the newly created root page. You are now left with an index that has a root page and two leaf-level pages. This index does not need an intermediate level, because the root page can contain all the values at the beginning of the leaf-level pages. At this point, locating any row in the table requires scanning exactly two pages in the index.

NOTE: PAGE SPLITS
Keep in mind that rows on an index page are maintained in sorted order, so SQL Server always writes any new entries into the correct sorted location when page splitting. This can cause rows to move between pages, and page splits can occur at any level within the storage structure.

You can continue to add rows to the table without affecting the number of levels in the index until you reach 4,060,225 rows. You then have 2,015 leaf-level pages with 2,015 entries apiece, and the root page has 2,015 entries corresponding to the first row on each of the leaf-level pages. Therefore, for SQL Server to find any row within the 4,060,225 rows in the table, it requires reading exactly two pages.

When the 4,060,226th row of data is added to the table, another page needs to be allocated to the index at the leaf level, but the root page cannot hold 2,016 entries, because that would exceed the 8,060 bytes that are allowed. So SQL Server goes through a page split process. The previous root-level page becomes an intermediate-level page, with a second page allocated at the intermediate level. The former root page undergoes a page split to move half of the entries to the newly allocated intermediate-level page, and the first entry on each of the two intermediate-level pages is written to the newly allocated root page.

The next time SQL Server needs to introduce an additional intermediate level occurs when it must add the 8,181,353,376th row of data to the table: 2,015 rows on the root page corresponding to 2,015 pages on the intermediate level, each of which has 2,015 entries corresponding to 2,015 pages at the leaf level, plus one extra row of data that does not fit.

As you can see, this type of structure enables SQL Server to locate rows in extremely large tables very quickly. In this example, finding a row in the table with a little over 4 million rows requires SQL Server to scan only two pages of data, and the table could grow to more than 8 billion rows before it would require SQL Server to read three pages to find any row.
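You can observe these levels directly. The following query is a sketch that assumes the AdventureWorks sample database used throughout this chapter; it reports the depth and per-level page counts for the indexes on Person.Address:

-- Run in the AdventureWorks database so that OBJECT_ID resolves correctly.
-- DETAILED mode returns one row per index level; index_level 0 is the leaf,
-- and the highest index_level for each index is the root.
SELECT index_id, index_level, index_depth, page_count, record_count
FROM sys.dm_db_index_physical_stats
    (DB_ID(N'AdventureWorks'), OBJECT_ID(N'Person.Address'),
     NULL, NULL, 'DETAILED');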
EXAM TIP
If you are creating an index on a sparse column, you should use a filtered index to create the most compact and efficient index possible.

Quick Check
1. What type of structure does SQL Server use to construct an index?
2. What are the three types of pages within an index?

Quick Check Answers
1. SQL Server uses a B-tree structure for indexes.
2. An index can contain root, intermediate, and leaf pages. An index has a single root page defined at the top of the index structure. An index can have one or more levels of intermediate pages, but they are optional. The leaf pages are the lowest-level pages within an index.

Lesson Summary
SQL Server creates an index using a B-tree structure.
Each index has a single root-level page; if all the entries do not fit on a single page, the index adds pages at intermediate and leaf levels.

Lesson Review

The following question is intended to reinforce key information presented in Lesson 1, "Index Architecture." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
The answer to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. Fabrikam stores product information in the following table:

CREATE TABLE Products.Product
(ProductID INT IDENTITY(1,1),
ProductName VARCHAR(30) NOT NULL,
SKU CHAR(8) NOT NULL,
Cost MONEY NOT NULL,
ListPrice MONEY NOT NULL,
ShortDescription VARCHAR(200) NOT NULL,
LongDescription VARCHAR(MAX) NULL,
CONSTRAINT pk_product PRIMARY KEY CLUSTERED (ProductID))
The table is queried by ProductID, ProductName, or SKU. The application displays ProductName, SKU, ListPrice, and ShortDescription. The ProductID is also returned to facilitate any subsequent operations. Several thousand new products were recently added, and now you have performance degradation. Which index should you implement to provide the greatest improvement in query performance?

A. CREATE NONCLUSTERED INDEX idx_product ON Products.Product (ProductID, ProductName, SKU)
B. CREATE NONCLUSTERED INDEX idx_product ON Products.Product (ProductName)
C. CREATE NONCLUSTERED INDEX idx_product ON Products.Product (ProductName) INCLUDE (SKU, ListPrice, ShortDescription, ProductID)
D. CREATE NONCLUSTERED INDEX idx_product ON Products.Product (ProductName, SKU, ProductID, ListPrice, ShortDescription)
Lesson 2: Designing Indexes

Indexes enable you to query large amounts of data within a database effectively. In this lesson, you learn how to create clustered and nonclustered indexes, as well as why each type of index is useful. You learn how to create filtered indexes and indexes with included columns to expand the number of queries that an index can cover. Finally, you learn how to create XML and spatial indexes to improve search capabilities for XML documents and spatial applications.

After this lesson, you will be able to:
Create clustered indexes
Create nonclustered indexes
Understand forwarding pointers
Create filtered indexes
Specify included columns for a nonclustered index
Create XML indexes
Create spatial indexes
Estimated lesson time: 30 minutes

Clustered Indexes

You can define an index by using one or more columns in the table, called the index key, with the following restrictions:
You can define an index with a maximum of 16 columns.
The maximum size of the index key is 900 bytes.

The column(s) defined for the clustered index are referred to as the clustering key. A clustered index is special because it causes SQL Server to arrange the data in the table according to the clustering key. Because a table cannot be sorted more than one way, you can define only one clustered index on a table.

Clustered indexes provide a sort order for the storage of data within a table. However, a clustered index does not provide a physical sort order. A clustered index does not physically store the data on disk in sorted order, because doing so would create a large amount of disk input/output (I/O) for page split operations. Instead, a clustered index ensures that the page chain of the index is sorted logically, allowing SQL Server to traverse directly down the page chain to locate data. As SQL Server traverses the clustered index page chain, each row of data is read in clustering key order.

Because the leaf level of a clustered index is the row of data in the table, when SQL Server traverses the clustered index down to the leaf level, it has retrieved the data. No additional reads are required to locate the required data.
In general, every table should have a clustered index. One of the main purposes of a clustered index is to eliminate forwarding pointers.

Forwarding Pointers

A table without a clustered index is referred to as a heap. When you have a heap, page chains are not stored in sorted order. SQL Server allocates pages and stores data as data is written to the table. The nonclustered indexes are built on the data that is stored, with the leaf level of the indexes containing a pointer to the location of the row in the table's data pages. If subsequent modifications force SQL Server to move a row, such as when a page split occurs or the row no longer fits on the data page, SQL Server does not update the nonclustered index with the new location of the row. Instead, SQL Server creates a forwarding pointer on the data page that points to the new location of the row.

Although the presence of a handful of forwarding pointers for a table is generally not a concern, a large number of forwarding pointers can cause severe performance degradation. If a forwarding pointer did not exist, SQL Server would traverse the nonclustered index and then need to perform only one additional operation to retrieve the row. However, if a forwarding pointer is encountered, SQL Server needs to perform an additional operation to gather data from the forwarded row and then return to continue reading down the page chain. In severe cases, you could observe SQL Server requiring 10 to 15 times the number of read operations as the number of rows returned by the query.

The general syntax for creating a relational index is as follows:

CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX index_name
ON <object> ( column [ ASC | DESC ] [ ,...n ] )
[ INCLUDE ( column_name [ ,...n ] ) ]
[ WHERE <filter_predicate> ]
[ WITH ( <relational_index_option> [ ,...n ] ) ]
[ ON { partition_scheme_name ( column_name ) | filegroup_name | default } ]
[ FILESTREAM_ON { filestream_filegroup_name | partition_scheme_name | "NULL" } ]
[ ; ]

NOTE: FILESTREAM DATA
The FILESTREAM_ON clause is used when clustered indexes are created on a table containing FILESTREAM data. If you specify a different filegroup in the FILESTREAM_ON clause than where the FILESTREAM data is currently located, all the FILESTREAM data is moved to the newly specified filegroup during the creation of the clustered index.
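Forwarding pointers are easy to measure. The following sketch assumes a hypothetical heap named dbo.CustomerStaging; the forwarded_record_count column applies only to heaps and is populated only in DETAILED mode:

-- index_id 0 designates the heap itself
SELECT forwarded_record_count, page_count, record_count
FROM sys.dm_db_index_physical_stats
    (DB_ID(), OBJECT_ID(N'dbo.CustomerStaging'), 0, NULL, 'DETAILED');
-- Creating a clustered index on the table removes all forwarding pointers,
-- because the rows are then located through the clustering key instead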
You may recall that the table creation scripts we used in Chapter 3 specified the CLUSTERED keyword for the primary key. Although a primary key is a constraint, SQL Server physically implements a primary key as an index. Because the default option for a primary key is clustered unless you specify otherwise, SQL Server creates a clustered index for a primary key. Likewise, a unique constraint is physically implemented as a unique index. Because a primary key is always unique, SQL Server physically implements each primary key as a unique index, and unless the primary key is specified as nonclustered, as a unique clustered index.

As we already discussed in Chapter 3, the ON clause specifies the filegroup that the index is created on. However, because the leaf level of a clustered index is the row of data in the table, the table and clustered index are always stored on the same filegroup.

MORE INFO: PARTITION SCHEMES
Partition schemes are discussed in detail in Chapter 6, "Distributing and Partitioning Data."

Nonclustered Indexes

The other type of relational index that you can create is a nonclustered index. Nonclustered indexes do not impose a sort order on the table, so you can create multiple nonclustered indexes on a table. Nonclustered indexes have the same restrictions as a clustered index: a maximum of 900 bytes in the index key and a maximum of 16 columns. In addition, a table is limited to a maximum of 999 nonclustered indexes.

The leaf level of a nonclustered index contains a pointer to the data you require. If a clustered index exists on the table, the leaf level of the nonclustered index points at the clustering key. If a clustered index does not exist on the table, the leaf level of the nonclustered index points at the row of data in the table. Either way, when SQL Server traverses a nonclustered index to the leaf level, one additional operation is required to locate the data within a table row.

Index Maintenance

At first glance, you might think that you should just create dozens or hundreds of indexes against a table to satisfy any possible query. However, an index is a B-tree structure that consists of all the entries from the table corresponding to the index key. Values within an index are stored on index pages according to the sort order specified for the index. When a new row is added to the table, before the operation can complete, SQL Server must add the value from this new row to the correct location within the index. Each time SQL Server writes to the table, it must also perform a write operation to any affected index. If the leaf-level index page does not have room for the new value, SQL Server has to perform a page split and write half the rows from the full page to a newly allocated page. If this also causes an intermediate-level index page to overflow, a page split occurs at that level as well. If the new row also causes the root page to overflow, the root page is split into a new intermediate level, creating a new root page.
Indexes can improve query performance, but each index created also causes performance degradation on all INSERT, UPDATE, DELETE, BULK INSERT, and BCP operations. Therefore, you need to balance the number of indexes carefully for optimal operations. As a general rule of thumb, if you have five or more indexes on a table designed for Online Transaction Processing (OLTP) operations, you probably need to reevaluate why those indexes exist. Tables designed for read operations or data warehouse types of queries usually have many more indexes, because write operations to a data warehouse typically occur via administratively controlled batch operations during off-peak hours.

Covering Indexes

When an index is built, every value in the index key is loaded into the index. In effect, each index is a mini-table containing all the values corresponding to just the columns in the index key. Therefore, it is possible for a query to be entirely satisfied by using the data in the index. An index that is constructed such that SQL Server can completely satisfy queries by reading only the index is called a covering index. If you can construct covering indexes for frequently accessed data, you can improve the response time for queries by avoiding additional reads from the underlying table. You can also potentially increase concurrency by having queries access data from an index while changes that do not write to the index are being made to the underlying table.

SQL Server is also capable of using more than one index for a given query. If two indexes have at least one column in common, SQL Server can join the two indexes to satisfy a query.

Included Columns

Clearly, indexes are a good thing to have in your database, and covering indexes provide even greater value to queries. However, you are limited to 16 columns and 900 bytes for the index key. These limitations effectively rule out columns with large data types that would be useful within a covering index so that a query does not have to pull the data from the underlying table.

Indexes can be created using the optional INCLUDE clause. Included columns become part of the index at the leaf level only. Values from included columns do not appear in the root or intermediate levels of an index and do not count against the 900-byte limit for an index. Therefore, by using the INCLUDE clause, you can construct covering indexes that exceed 16 columns and 900 bytes.

Distribution Statistics

The component that is responsible for determining whether an index should even be used to satisfy a query is called the query optimizer. The query optimizer decides whether or not to use an index based on the distribution statistics that are stored for the index. When an index is created, SQL Server generates a structure called a histogram that stores information about the relative distribution of data values within a column. The degree to which values in the column allow you to locate small sets of data is referred to as the selectivity of the index. As the number of unique values within a column increases, the selectivity of an index increases. The query optimizer chooses the most selective indexes to satisfy a query, because a highly selective index allows the query processor to eliminate a very large portion of the table and access the least amount of data necessary to satisfy the query. Indexes with low selectivity and a low percentage of unique values are not considered by the query optimizer, but they still incur overhead for write operations.
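You can inspect the histogram behind an index with DBCC SHOW_STATISTICS. A sketch against the AdventureWorks database, using the clustered primary key on Person.Address that appears later in this chapter's practices; substitute any index you have created:

-- Show only the histogram steps for the chosen index's statistics
DBCC SHOW_STATISTICS ('Person.Address', 'PK_Address_AddressID') WITH HISTOGRAM;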
Filtered Indexes

An index key can have a significant skew in the data values, where a large percentage of the table contains duplicated values within only a narrow range of the overall set of values. After a query retrieves data from the highly selective portion of the table, it is likely that subsequent queries executed against the low-selectivity range would use the same index, even though doing so is clearly inappropriate. To handle cases where significant skew exists in the data, SQL Server 2008 allows you to create filtered indexes. A filtered index is simply an index with a WHERE clause. Only the index keys matching the WHERE clause are added to the index, allowing you to build indexes that focus on the highly selective portions of the table while allowing SQL Server to choose another method for the less selective range. Filtered indexes have the following restrictions:
They must be nonclustered indexes.
They cannot be created on computed columns.
Columns cannot undergo implicit or explicit data type conversion.

Index Options

Several options can be specified during the creation of an index. The most important of these is FILLFACTOR. When an index page is full and SQL Server needs to write an entry to that page, a page split must occur. The result of the page split is two index pages which are only half full. If page splits occur frequently within the index, you can quickly have a large number of index pages that contain only a partial set of data. In the same manner as files on disk, indexes become fragmented due to page splitting. Highly fragmented indexes require a large number of read operations to locate the information requested.

To control the rate at which page splits occur, you can specify a fill factor for the index. FILLFACTOR specifies the percentage of free space that should be left on the leaf level of an index during creation or rebuild. By leaving space on the leaf level, you can write a small number of rows to a leaf-level page before a page split is required, thereby slowing the rate of fragmentation for an index.

FILLFACTOR applies only to the leaf level of the index. Intermediate-level pages (if applicable) and the root page are filled to near capacity; SQL Server reserves only enough space on intermediate-level page(s) and the root page for approximately one additional row to be added.
However, if you are going to be introducing large numbers of leaf-level pages, which in turn cause page splits at the intermediate level(s) and potentially the root page, you can use the PAD_INDEX option. The PAD_INDEX option causes the FILLFACTOR to be applied to the intermediate-level page(s) and the root page of an index.

During the creation of an index, all the data values for the index key are read. After these values are read, SQL Server creates a series of internal work tables to sort the values prior to building the B-tree structure. By default, the work tables are created in the same database as the index. If you do not want to consume space in the database where the index is created, you can specify the SORT_IN_TEMPDB option, which causes the work tables for sort operations to be generated in the tempdb database.

Both clustered and nonclustered indexes can be designated as unique. After an index has been specified as such, you cannot place duplicate entries within it. If you attempt to insert duplicate values, you receive an error and the transaction is disallowed. By default, a multi-row insert where even one row produces a duplicate has the entire transaction rolled back. If you want to allow the rows that do not produce duplicate values to be inserted and reject only the rows that cause duplicates in a multi-row insert operation, you can specify the IGNORE_DUP_KEY option. With the IGNORE_DUP_KEY option enabled, rows that produce duplicate values generate a warning message, and only those rows are rejected.

Online Index Creation

When an index is built, all the values in the index key need to be read and used to construct the index. The process of reading all the values and building the index does not occur instantly, so it is possible for the data to change within the index key. SQL Server controls the data changes in a table to ensure data consistency during the build of the index, according to the creation option specified. Indexes can be created either online or off-line. When an index is created using the WITH ONLINE = OFF option, SQL Server locks the entire table, preventing any changes until the index is created. When an index is created using the ONLINE = ON option, SQL Server allows changes to the table during the creation of the index by using the version store within the tempdb database.

You control the creation of an index by using the WITH ONLINE = ON | OFF option; the default is ONLINE = OFF. When you build a clustered index off-line, the table is locked and does not allow select statements or data modifications. If you build a nonclustered index off-line, a shared table lock is acquired, which allows select statements but not data modification. During an online index creation, the underlying table or view can be accessed by queries and data modification statements, because the row versioning functionality within SQL Server ensures that the index can be built without conflicting with other operations on the table. Online index creation is available only in SQL Server 2008 Enterprise.

EXAM TIP
Online operations such as online index creation/rebuild or online restore are available only in SQL Server 2008 Enterprise.
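The following sketch pulls these options together in a single statement. The table, column, and index names are hypothetical, and ONLINE = ON requires SQL Server 2008 Enterprise:

CREATE UNIQUE NONCLUSTERED INDEX idx_customer_email
ON dbo.Customer (EmailAddress)
WITH (FILLFACTOR = 80,      -- leave 20 percent free space at the leaf level
      PAD_INDEX = ON,       -- apply the fill factor to intermediate pages too
      SORT_IN_TEMPDB = ON,  -- build the sort work tables in tempdb
      IGNORE_DUP_KEY = ON,  -- reject only the duplicate rows in a multi-row insert
      ONLINE = ON);         -- allow changes to the table during the build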
XML Indexes

An XML data type can contain up to 2 gigabytes (GB) of data in a single column. Although XML data has a structure that can be queried, SQL Server needs to scan the data structure to locate data within an XML document. To improve the performance of queries against XML data, you can create a special type of index called an XML index.

There are two different types of XML indexes: primary and secondary. A primary XML index is built against all the nodes within the XML column. The primary XML index is also tied to the table by maintaining a link to the corresponding row in the clustered index. Therefore, a clustered index is required before you can create a primary XML index.

After a primary XML index has been created, you can create additional secondary indexes on PATH, VALUE, or PROPERTY. The primary XML index is required first, because secondary XML indexes are built against the data contained within it. Secondary XML indexes created FOR PATH are built on the PATH and NODE values of the primary XML index; a PATH XML index is used to optimize queries searching for a path within an XML document. Indexes created FOR VALUE are built against the PATH and VALUE of the primary XML index and are used to search for values within XML documents. Indexes created FOR PROPERTY are created using the primary key, node, and path; PROPERTY XML indexes are used to efficiently return data from an XML column along with additional columns from the table.

The generic syntax for creating an XML index is:

CREATE [ PRIMARY ] XML INDEX index_name
ON <object> ( xml_column_name )
[ USING XML INDEX xml_index_name
    [ FOR { VALUE | PATH | PROPERTY } ] ]
[ WITH ( <xml_index_option> [ ,...n ] ) ]
[ ; ]

Spatial Indexes

Spatial indexes are created against a spatial column that is typed as either geometry or geography:

CREATE SPATIAL INDEX index_name
ON <object> ( spatial_column_name )
{ [ USING <geometry_grid_tessellation> ]
    WITH ( <bounding_box>
        [ [,] <tessellation_parameters> [ ,...n ] ]
        [ [,] <spatial_index_option> [ ,...n ] ] )
| [ USING <geography_grid_tessellation> ]
    [ WITH ( [ <tessellation_parameters> [ ,...n ] ]
        [ [,] <spatial_index_option> [ ,...n ] ] ) ]
}
[ ON { filegroup_name | "default" } ];
Spatial data is defined using a two-dimensional coordinate system, whereas indexes are built using B-trees, which are a linear structure. Therefore, to index spatial data, SQL Server must transform the two-dimensional space of spatial data into a linear chain. The decomposition is accomplished using a process known as tessellation.

If you are indexing a geography data type, SQL Server maps the ellipsoidal data to a two-dimensional, non-Euclidean space. The surface of the ellipsoid is first divided into hemispheres. Each hemisphere is then projected onto a quadrilateral pyramid, and each pyramid is flattened into a two-dimensional plane. The planes representing the upper and lower hemispheres are then joined at the edge. The final step in indexing geography data is to apply tessellation.

Prior to tessellation, SQL Server constructs a four-level, uniform, hierarchical decomposition of the represented space. Level 1 is the top level of the hierarchy. Level 2 decomposes each cell in the Level 1 grid into a grid of equal dimension. Level 3 decomposes each cell in the Level 2 grid into a grid of equal dimension. Likewise, Level 4 decomposes each cell in Level 3 into a grid of equal dimension, as shown in Figure 4-3.

FIGURE 4-3 A four-level, uniform grid hierarchy (Levels 1 through 4)

The grid at each level in the hierarchy is numbered using the Hilbert space-filling curve.

Tessellation

After the four-level grid hierarchy is constructed, each row of spatial data is read and plotted onto the grid. Beginning at Level 1, the tessellation process plots the spatial object onto the set of grid cells that the object touches. The set of touched cells is then recorded into the index. Very small objects touch only a small number of cells within the grid hierarchy, whereas large objects can touch a very large number of cells. To limit the size of the index without losing accuracy, tessellation applies a set of rules to determine the final output that is written into the index:
Covering: If an object completely covers a cell, the cell is not tessellated.
Cells-per-object: This rule enforces the CELLS_PER_OBJECT parameter for the spatial index at levels 2, 3, and 4 of the grid hierarchy.
Deepest cell: Only the lowest-level cells that have been tessellated are recorded.

If a cell is not tessellated, no information is recorded for any subsequent levels of the grid hierarchy that correspond to the nontessellated cell. As long as the cells-per-object rule has not been exceeded and the object does not completely cover a cell, the cell is tessellated. When a cell is tessellated, the portion of the object contained within the cell is plotted against the grid in the next level of the hierarchy, where the tessellation rules are again applied. Tessellation continues across each cell in the grid and then through each subsequent level of the grid hierarchy. After the process reaches either the cells-per-object limit or Level 4 of the hierarchy, tessellation finishes. The cells that were tessellated at the lowest level define the key of the spatial index for the row.

Bounding Box

When indexing geometry data, you need to define one additional option for a spatial index: the bounding box. A B-tree is a finite, linear structure that has a clearly defined beginning and end. Because a geometric plane is infinite, you cannot define a B-tree against all possible two-dimensional space. The BOUNDING_BOX parameter defines the maximum and minimum x and y coordinates that are considered when constructing the grid hierarchy and tessellating the rows of geometry data. Any objects, or portions of an object, that fall outside the bounding box are not considered or counted for the index. When you choose the limits of the bounding box, you need to select values that encompass the majority of the objects that you want to index within the table.

MORE INFO
For more information on spatial indexes and the tessellation process, refer to the SQL Server Books Online article "Spatial Indexing Overview," at http://technet.microsoft.com/en-us/library/bb964712.aspx.

Quick Check
1. What is the difference between a clustered and a nonclustered index?
2. How does the FILLFACTOR option affect the way an index is built?

Quick Check Answers
1. A clustered index imposes a sort order on the data pages in the table. A nonclustered index does not impose a sort order.
2. The FILLFACTOR option reserves space on the leaf level of the index. When the PAD_INDEX option is also enabled, the same percentage of free space is reserved on the intermediate-level pages and the root page.
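For geometry data, the bounding box is supplied directly in the WITH clause. Here is a sketch against a hypothetical dbo.Parcels table whose shapes fall within a 500-by-500 plane; the names and grid densities are illustrative only:

CREATE SPATIAL INDEX sidx_parcelshape
ON dbo.Parcels(ParcelShape)
USING GEOMETRY_GRID
WITH (BOUNDING_BOX = (0, 0, 500, 500),   -- xmin, ymin, xmax, ymax
      GRIDS = (LOW, LOW, MEDIUM, HIGH),  -- density for levels 1 through 4
      CELLS_PER_OBJECT = 16);
-- Objects, or portions of objects, outside the bounding box are not indexed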
PRACTICE: Creating Indexes

In this practice, you create indexes in the AdventureWorks database.

1. Execute the following code to create a nonclustered index on the Person.Address table:

CREATE NONCLUSTERED INDEX idx_city
ON Person.Address(City)
INCLUDE (AddressLine1)

2. Execute the following code to create a filtered index on the Person.Address table:

CREATE NONCLUSTERED INDEX idx_city2
ON Person.Address(City)
INCLUDE (AddressLine1, AddressLine2)
WHERE AddressLine2 IS NOT NULL

3. Execute the following code to create a spatial index on the Person.Address table:

CREATE SPATIAL INDEX sidx_spatiallocation
ON Person.Address(SpatialLocation)
USING GEOGRAPHY_GRID
WITH (GRIDS = (MEDIUM, LOW, MEDIUM, HIGH),
CELLS_PER_OBJECT = 64);

Lesson Summary
Clustered indexes specify a sort order for data pages in a table.
You can create up to 999 nonclustered indexes on a table. Nonclustered indexes can include columns in the leaf level of the index to cover more queries, and you can specify a WHERE clause for a nonclustered index to limit the data set that the index is built upon.
You can create a primary XML index and three types of secondary XML indexes: PATH, VALUE, and PROPERTY.
Spatial indexes can be defined for either geography or geometry data types. If you are indexing a geometry data type, the BOUNDING_BOX parameter is required to provide limits to the two-dimensional plane.

Lesson Review

The following question is intended to reinforce key information presented in Lesson 2, "Designing Indexes." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
The answer to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.
1. You are the database administrator at a retail company that supplies blanks and kits to pen turners. You are designing a database to store characteristics of the products offered. Each product has a variety of characteristics, but not all products have the same set of characteristics. You are planning the index strategy for the database. The most common query will be the following:

SELECT a.ProductName, b.ProductType, b.WoodSpecies, b.Color
FROM Products a
INNER JOIN ProductAttributes b ON a.ProductID = b.ProductID
WHERE b.Color = 'X'

Not all products have a Color attribute. Which index strategy would be the most efficient?

A. A nonclustered index on Color
B. A nonclustered index on Color that includes the ProductType and WoodSpecies columns
C. A filtered, nonclustered index on Color
D. A filtered, nonclustered index on Color that includes the ProductType and WoodSpecies columns
Lesson 3: Maintaining Indexes

Over time, data changes cause indexes to become fragmented. To keep query operations as efficient as possible, you need to minimize that fragmentation. In this lesson, you learn how to control the rate of fragmentation, as well as how to remove fragmentation from an index.

After this lesson, you will be able to:
Rebuild indexes to remove fragmentation
Disable an index
Estimated lesson time: 30 minutes

Index Management and Maintenance

Because the data within an index is stored in sorted order, over time values can move around within the index due to either page splits or changes in the values. To manage the fragmentation of an index over time, you need to perform periodic maintenance.

Index Fragmentation

Files on an operating system can become fragmented over time due to repeated write operations. Although indexes can also become fragmented, index fragmentation is a bit different from file fragmentation. When an index is built, all the values from the index key are written in sorted order onto pages within the index. If a row is removed from the table, SQL Server needs to remove the corresponding entry from the index. The removal of the value creates a "hole" on the index page. SQL Server does not reclaim the space left behind, because the cost of finding and reusing a hole in an index is prohibitive. If a value in the table that an index is based on changes, SQL Server must move the index entry to the appropriate location, which leaves behind another hole. When index pages fill up and require a page split, you get additional fragmentation of the index. Over time, the indexes on a table undergoing large amounts of data change become fragmented.

To control the rate of fragmentation of an index, you can use an index option called the fill factor. You can then use the ALTER INDEX statement to remove the fragmentation.

FILLFACTOR

The FILLFACTOR option for an index determines the percentage of free space that is reserved on each leaf-level page of the index when an index is created or rebuilt. The free space reserved leaves room on the page for additional values to be added, thereby reducing the rate at which page splits occur. The FILLFACTOR is represented as a percentage full. For example, a fill factor of 75 means that 25 percent of the space on each leaf-level page is left empty to accommodate future values.
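Before choosing between the REBUILD and REORGANIZE options covered next, you can measure how fragmented an index actually is. The following sketch queries the AdventureWorks table used in this chapter's practices; the commonly cited guidance in SQL Server Books Online is to reorganize at roughly 5 to 30 percent fragmentation and rebuild above 30 percent:

SELECT i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats
    (DB_ID(), OBJECT_ID(N'Person.Address'), NULL, NULL, 'LIMITED') AS ps
INNER JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id;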
Defragmenting an Index

Because SQL Server does not reclaim space, you must periodically reclaim the empty space in an index to preserve the performance benefits of the index. You defragment an index by using the ALTER INDEX statement, as shown here:

ALTER INDEX { index_name | ALL }
ON <object>
{ REBUILD
    [ [ WITH ( <rebuild_index_option> [ ,...n ] ) ]
    | [ PARTITION = partition_number
        [ WITH ( <single_partition_rebuild_index_option> [ ,...n ] ) ] ] ]
| DISABLE
| REORGANIZE
    [ PARTITION = partition_number ]
    [ WITH ( LOB_COMPACTION = { ON | OFF } ) ]
| SET ( <set_index_option> [ ,...n ] )
}
[ ; ]

When you defragment an index, you can use either the REBUILD or REORGANIZE option. The REBUILD option rebuilds all levels of the index and leaves all pages filled according to the FILLFACTOR setting of the index. If you rebuild the clustered index, only the clustered index is rebuilt; rebuilding with the ALL keyword, however, also rebuilds all the nonclustered indexes on the table. The rebuild of an index effectively re-creates the entire B-tree structure, so unless you specify the ONLINE option, a shared table lock is acquired, preventing any changes until the rebuild operation completes.

The REORGANIZE option removes fragmentation at the leaf level only; intermediate-level pages and the root page are not defragmented. REORGANIZE is always an online operation that does not incur any long-term blocking.

Disabling an Index

An index can be disabled by using the ALTER INDEX statement as follows:

ALTER INDEX { index_name | ALL } ON <object> DISABLE [ ; ]

When an index is disabled, the definition remains in the system catalog but the index is no longer used. SQL Server does not maintain the index as data in the table changes, and the index cannot be used to satisfy queries. If a clustered index is disabled, the entire table becomes inaccessible. To enable an index, it must be rebuilt to regenerate and populate the B-tree structure. You can accomplish this by using the following command:

ALTER INDEX { index_name | ALL } ON <object> REBUILD [ ; ]
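One practical use of disabling an index is a large data load: disabling the nonclustered indexes avoids per-row index maintenance during the load, and a single rebuild afterward both regenerates the indexes and leaves them defragmented. A sketch using the index created in the Lesson 2 practice; never disable the clustered index this way, because the table would become inaccessible:

ALTER INDEX idx_city ON Person.Address DISABLE
GO
-- ...perform the bulk load here (BULK INSERT, bcp, or an INSERT batch)...
ALTER INDEX idx_city ON Person.Address REBUILD
GO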
Quick Check
1. What is the difference between the REBUILD and REORGANIZE options of ALTER INDEX?
2. What happens when an index is disabled?

Quick Check Answers
1. REBUILD defragments all levels of an index. REORGANIZE defragments only the leaf level of the index.
2. An index that is disabled is no longer used by the optimizer. In addition, as data changes in the table, the disabled index is not maintained.

PRACTICE: Maintaining Indexes

In this practice, you defragment, disable, and re-enable indexes.

1. Execute the following query to rebuild all the indexes on the Person.Address table:

ALTER INDEX ALL ON Person.Address REBUILD

2. Execute the following query to reorganize an index on the Person.Person table:

ALTER INDEX IX_Person_LastName_FirstName_MiddleName ON Person.Person REORGANIZE

3. Execute the following query to disable the clustered index on the Person.Address table:

ALTER INDEX PK_Address_AddressID ON Person.Address DISABLE

4. Execute the following query to verify that the Person.Address table is not accessible:

SELECT * FROM Person.Address

5. Execute the following query to re-enable the clustered index on the Person.Address table:

ALTER INDEX PK_Address_AddressID ON Person.Address REBUILD

6. Execute the following query to verify that you can now access the Person.Address table:

SELECT * FROM Person.Address
Lesson Summary

- You can defragment indexes by using either the REBUILD or the REORGANIZE option. The REBUILD option defragments all levels of an index. Unless the ONLINE option is specified, a REBUILD acquires a shared table lock and blocks any data modifications. The REORGANIZE option defragments only the leaf level of an index and does not cause blocking.
- You can disable an index to exclude it from consideration by the optimizer and from any index maintenance due to data changes. If the clustered index is disabled, the entire table becomes inaccessible.

Lesson Review

The following question is intended to reinforce key information presented in Lesson 3, "Maintaining Indexes." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. You are in charge of building the process that loads approximately 150 GB of data into the enterprise data warehouse every month. Every table in your data warehouse has at least eight indexes to support data analysis routines. You want to load the data directly into the tables as quickly as possible. Which operation provides the best performance improvement with the least amount of administrative effort?
A. Use a BULK INSERT command.
B. Drop and re-create the indexes.
C. Disable and enable the indexes.
D. Use Integration Services to import the data.
Chapter Review

To practice and reinforce the skills you learned in this chapter further, you can perform the following tasks:
- Review the chapter summary.
- Review the list of key terms introduced in this chapter.
- Complete the case scenario. The scenario sets up a real-world situation involving the topics in this chapter and asks you to create a solution.
- Complete the suggested practices.
- Take a practice test.

Chapter Summary

- You can create a single clustered index on a table, which imposes a sort order on the index pages.
- You can create up to 999 nonclustered indexes on a table. Nonclustered indexes can be filtered and can include additional columns in the leaf level of the index.
- A spatial index is constructed by mapping the spatial objects to a four-level uniform grid and applying tessellation rules.
- You can defragment indexes by using either the REBUILD or the REORGANIZE option of the ALTER INDEX statement.

Key Terms

Do you know what these key terms mean? You can check your answers by looking up the terms in the glossary at the end of the book.

- Balanced tree (B-tree): A symmetric, linear structure used to construct an index. A B-tree provides a compact structure that enables searching very large volumes of data with a small number of read operations.
- Clustered index: An index that imposes a sort order on the pages within the index. A table can have only one clustered index.
- Covering index: An index that allows a query to be satisfied by using only the entries within the index.
- Nonclustered index: An index that does not impose a sort order on the data pages within the table. You can have up to 999 nonclustered indexes on a table.
- Tessellation: The process that is used to construct a spatial index. Tessellation counts the cells that a spatial object touches within the four-level grid hierarchy.

Case Scenario

In the following case scenario, you apply what you have learned about designing indexes. You can find answers to these questions in the "Answers" section at the end of this book.
Case Scenario: Performing Data Management Tasks

Wide World Importers is implementing a new set of applications to manage several areas of their business. Within the corporate data center, they need the ability to store large volumes of data that can be accessed from anywhere in the world.

Several business managers need access to operational reports that cover the current workload of their employees along with new and pending customer requests. The same business managers need to be able to access large volumes of historical data to spot trends and optimize their staffing and inventory levels.

Business managers want to eliminate all the product manuals that are included with their products and instead direct users to the company Web site. Users should be able to browse for manuals based on product or search for text within a manual. The sales force also would like to enhance the company Web site to allow product descriptions to be created and searched in multiple languages.

A large sales force makes customer calls all over the world and needs access to data on the customers that a sales rep is serving along with potential prospects. The data for the sales force needs to be available even when the sales reps are not connected to the Internet or the corporate network. Periodically, sales reps connect to the corporate network and synchronize their data with the corporate databases.

Some of the main tables within the database are listed in Table 4-1.

TABLE 4-1 Tables in the Wide World Importers Database

- Customer: Contains the name of a customer along with their credit line, account number, Web site login, and password.
- CustomerAddress: Contains one or more addresses for each customer. An address can have up to three lines in addition to the city, state/province, postal code, country, and the latitude/longitude of the address. One address line, city, and country are required.
- CustomerContact: Contains one or more rows per customer to store contact information such as phone number, cell phone, e-mail address, and fax number.
- SalesPerson: Contains a list of the employees assigned to sales along with the territory each one is assigned to, commission rate, and sales quota.
- Product: Contains a ProductID, SKU, inventory on hand, minimum stock amount, product cost, and standard price.
- CustomerOrder: Contains the orders placed by a customer along with the salesperson of record for the order, order date, a flag indicating whether the order is shipped, and the grand total of the order.
- CustomerOrderDetail: Contains the line items for each order placed.
- CustomerSalesPerson: Links each customer to a salesperson.
A variety of Microsoft Windows applications have been created with Microsoft Visual Studio .NET, and all data access is performed using stored procedures. The same set of applications is deployed for users connecting directly to the corporate database server, as well as for sales reps connecting to their own local database servers.

Some of the common queries are:
- Search for customers by name, city, or salesperson
- Search for the customers who ordered a specific product
- Search for all orders that have not yet shipped
- Find the shipping address for a customer
- Find all the products that have been ordered in a given month
- Find all customers within a given distance from a salesperson's current location

What should you do to ensure efficient query operations in the database?

Suggested Practices

To help you master the exam objectives presented in this chapter, complete the following tasks.

Creating Indexes

- Practice 1: Add primary keys to all tables in your database that do not have one.
- Practice 2: Find all the tables in your database that do not have a clustered index and create a clustered index or change the primary key to be clustered.
- Practice 3: Create covering indexes for frequently executed queries that are not currently being satisfied entirely by an index.
- Practice 4: Change an index that is used to satisfy queries accessing only a subset of rows to a filtered index to foster more efficient operations. (A sketch illustrating Practices 3 and 4 appears at the end of this chapter review.)

Take a Practice Test

The practice tests on this book's companion CD offer many options. For example, you can test yourself on just one exam objective, or you can test yourself on all the 70-432 certification exam content. You can set up the test so that it closely simulates the experience of taking a certification exam, or you can set it up in study mode so that you can look at the correct answers and explanations after you answer each question.

MORE INFO: PRACTICE TESTS
For details about all the practice test options available, see the section "How to Use the Practice Tests" in the Introduction to this book.
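To illustrate Practices 3 and 4, here is a minimal sketch against the CustomerOrder table from the case scenario; the specific column names (OrderDate, CustomerID, GrandTotal, ShippedFlag) are assumptions for illustration, not definitions from the book:

-- Covering index: the INCLUDE columns let the query be satisfied from the index alone
CREATE NONCLUSTERED INDEX IX_CustomerOrder_OrderDate
ON dbo.CustomerOrder (OrderDate)
INCLUDE (CustomerID, GrandTotal);

-- Filtered index: only the small subset of unshipped orders is indexed
CREATE NONCLUSTERED INDEX IX_CustomerOrder_Unshipped
ON dbo.CustomerOrder (OrderDate)
WHERE ShippedFlag = 0;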
CHAPTER 5
Full Text Indexing

In this chapter, you will learn how to create and manage full text indexes so your applications can efficiently work with unstructured data stored in FILESTREAM, XML, and large character columns.

Exam objectives in this chapter:
- Install SQL Server 2008 and related services
- Configure additional SQL Server components
- Configure full text indexing
- Maintain indexes

Lessons in this chapter:
- Lesson 1: Creating and Populating Full Text Indexes
- Lesson 2: Querying Full Text Data
- Lesson 3: Managing Full Text Indexes

Before You Begin

To complete the lessons in this chapter, you must have:
- The AdventureWorks sample database installed

REAL WORLD: Michael Hotek

One of the companies that I worked with sold a large number of complex products. Each product shipped with one or more manuals, and some of the manuals could be thousands of pages, spanning multiple volumes. To save paper and ink costs, the company eliminated all printed manuals in 2002. Manuals were produced using Microsoft Office Word and then rendered to a PDF to be loaded on the company's Web site. Customers could then visit the company Web site to access the relevant manual. Although customers were satisfied with
the term index at the back of each manual, after the manuals were loaded to the company Web site, customers expected to be able to search across the manuals to locate the information needed.

To solve the problem, the company hired a consulting firm to build a searchable index for the manuals. After spending two months developing a proof of concept that indexed 15 manuals at a cost of over $40,000, the company faced a project estimate of over $8 million to index all the existing manuals, along with an estimated $1 million per year for maintaining the code and indexing any new manuals or changes to existing manuals.

The project was brought to our attention within a larger budget meeting. Without making a big announcement, we built a simple database that took advantage of the FILESTREAM capabilities of Microsoft SQL Server 2008. Over the weekend, we loaded all 20,000+ Word documents into a VARBINARY column designated for FILESTREAM using the Win32 API features. We then created and populated a full text index across all the documents. The document load and creation of the full text index required approximately 32 hours to complete. We also had one of our ASP developers create a quick page mockup that allowed us to submit searches through a browser.

On Monday morning, we called a meeting with the project team working on the document indexing project and presented our solution. Not only was management stunned that we had managed to index all the manuals in a weekend, but they could not believe that our search quality was significantly better than the home-grown solution, would automatically update with any changes, and was ready to go into production immediately. Within two hours of leaving the meeting, we had the manual search capability live on the company's Web site at a savings of over $8 million, not to mention the $1 million per year in maintenance fees the vendor was proposing.
Lesson 1: Creating and Populating Full Text Indexes

SQL Server 2008 allows you to build indexes that give you the ability to query large volumes of unstructured data rapidly. In this lesson, you learn how to create full text catalogs and full text indexes.

After this lesson, you will be able to:
- Create a full text catalog
- Create a full text index
- Configure the population mode for a full text index
- Manage full text index population

Estimated lesson time: 20 minutes

Full Text Catalogs

The first step in building a full text index is to create a storage structure. Unlike relational indexes, full text indexes have a unique internal structure that is maintained within a separate storage format called a full text catalog. Each full text catalog contains one or more full text indexes. The generic syntax for creating a full text catalog is:

CREATE FULLTEXT CATALOG catalog_name
    [ON FILEGROUP filegroup ]
    [IN PATH 'rootpath']
    [WITH <catalog_option>]
    [AS DEFAULT]
    [AUTHORIZATION owner_name ]

<catalog_option>::=
    ACCENT_SENSITIVITY = {ON|OFF}

EXAM TIP
Prior to SQL Server 2008, you associated the full text catalog with a filegroup only for backup purposes, with the entire contents of the catalog maintained in a directory structure on the operating system. In SQL Server 2008, Microsoft eliminated the external file structure, and the contents of a full text catalog are now stored within the database.

The FILEGROUP clause specifies the filegroup that you want to use to store any full text indexes created within the full text catalog. The IN PATH clause has been deprecated and should no longer be used because full text indexes are now stored within the database.
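For example, the following sketch exercises the syntax just shown; the catalog and filegroup names are hypothetical, not from the book:

-- Accent-insensitive catalog on a dedicated filegroup, used when no catalog is named
CREATE FULLTEXT CATALOG DocumentsFTC
ON FILEGROUP DocumentsFG
WITH ACCENT_SENSITIVITY = OFF
AS DEFAULT;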
ACCENT_SENSITIVITY allows you to configure whether the full text engine considers accent marks when building or querying a full text index. If you change the ACCENT_SENSITIVITY option, you need to rebuild all the full text indexes within the catalog.

The AS DEFAULT clause works the same as the DEFAULT option for a filegroup. If you do not specify a catalog name when creating a full text index, SQL Server creates the index within the catalog that is marked as the default full text catalog.

The AUTHORIZATION option specifies the owner of the full text catalog. The specified owner must have TAKE OWNERSHIP permission on the full text catalog.

NOTE: FILEGROUP PLACEMENT
Although it is possible to store a full text catalog on a filegroup that also contains relational data, it is recommended that you create a separate filegroup for full text indexes to separate the input/output (I/O) against the full text catalog from that of the relational data.

Full Text Indexes

After you create the full text catalog, you can create the full text indexes that are the basis for searching unstructured data. You can create full text indexes on columns that are CHAR/VARCHAR, XML, and VARBINARY data types.

If you create a full text index on a CHAR/VARCHAR column, the full text engine can parse the data directly and build an appropriate index. Full text indexes built on XML columns load a special processor that can understand and parse an Extensible Markup Language (XML) document, so that you are indexing the content of the XML document and not the XML tags within the document.

The most common use of a VARBINARY(MAX) column is to store documents using the new FILESTREAM capabilities in SQL Server 2008. Although the full text engine can build an index directly on documents that you create, thereby avoiding costly conversion processes, the engine needs to employ specialized assemblies designed for the various types of documents that you want to store. When you index a VARBINARY(MAX) column, you also need to specify a column that designates the type of document so that the full text parser can load the appropriate assembly. SQL Server 2008 ships with 50 filters that allow processing of a variety of document types such as Hypertext Markup Language (HTML), Word, Microsoft Office PowerPoint, and Microsoft Office Excel.

The full text indexing engine uses helper services such as word breakers and stemmers that are language-specific to build indexes. The first task in building an efficient index on unstructured data is to build a list of words within the data being indexed. Word breakers are assemblies that locate breaks between words to build a list of words to be indexed. Because verbs can have multiple forms, such as past, present, and future tense, stemmers conjugate verbs so that your queries can locate information even across multiple verb tenses. Languages are used to specify the particular word breaker and stemmer to apply to the column, because languages conjugate verbs and even break words differently.
The list of words is filtered through a list of common words called stop words. You specify stop words so that your index does not become polluted with large volumes of words that you would not normally search on. For example, "the," "a," and "an" are considered stop words for the English language, whereas "le" and "la" are stop words for the French language.

You can create a full text index on multiple columns; however, you can create only a single full text index on a table or indexed view. The generic syntax for creating a full text index is:

CREATE FULLTEXT INDEX ON table_name
    [ ( { column_name
        [ TYPE COLUMN type_column_name ]
        [ LANGUAGE language_term ] } [ ,...n] ) ]
    KEY INDEX index_name
        [ ON <catalog_filegroup_option> ]
    [ WITH [ ( ] <with_option> [ ,...n] [ ) ] ]
[;]

<catalog_filegroup_option>::=
    {fulltext_catalog_name
    | ( fulltext_catalog_name, FILEGROUP filegroup_name )
    | ( FILEGROUP filegroup_name, fulltext_catalog_name )
    | ( FILEGROUP filegroup_name )}

<with_option>::=
    {CHANGE_TRACKING [ = ] { MANUAL | AUTO | OFF [, NO POPULATION ] }
    | STOPLIST [ = ] { OFF | SYSTEM | stoplist_name }}

The TYPE COLUMN parameter designates the column that contains the filter type that the full text index engine should utilize when processing a VARBINARY(MAX) column. You use the LANGUAGE parameter to specify the language of the data being indexed. The KEY INDEX parameter is the single-column index within the table or indexed view that uniquely identifies a row.

Change Tracking

The CHANGE_TRACKING option for a full text index determines how SQL Server maintains the index when the underlying data changes. When you specify either MANUAL or AUTO, SQL Server maintains a list of changes to the indexed data. When set to MANUAL, you are responsible for periodically propagating the changes into the full text index. When set to AUTO, SQL Server automatically updates the full text index as the data is modified. Unlike a relational index, population of a full text index is not an immediate process, because the data has to be submitted to the indexing engine, which then applies word breakers, stemmers, language files, and stop lists before merging the changes into the index.
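As an illustration of this syntax (the table, column, index, and catalog names here are hypothetical, not from the book), the following sketch indexes FILESTREAM documents with a type column and manual change tracking, and then propagates the tracked changes on demand:

-- ManualContent is VARBINARY(MAX); FileExtension (for example, '.doc') picks the filter
CREATE FULLTEXT INDEX ON dbo.ProductManual
(
    ManualContent TYPE COLUMN FileExtension LANGUAGE 1033
)
KEY INDEX PK_ProductManual_ManualID
ON ProductsFTC
WITH CHANGE_TRACKING = MANUAL;

-- With MANUAL change tracking, you push the tracked changes on your own schedule
ALTER FULLTEXT INDEX ON dbo.ProductManual START UPDATE POPULATION;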
When CHANGE_TRACKING is set to OFF, SQL Server does not maintain a list of changes to the underlying data. Therefore, if you want to update the index to reflect the data currently in the indexed column, you must repopulate the index completely. With CHANGE_TRACKING turned off, you can also specify the NO POPULATION option, which allows you to create the full text index without populating it upon initial creation.

Language, Word Breakers, and Stemmers

The language specification is a key component in building an effective full text index. Although you could simply use a single word breaker for all your data, when the data spans multiple languages, you can get unexpected results. For example, the English language breaks words with a space, whereas languages such as German and French can combine words. If a word breaker recognized only white space between words as breaks, the full text index would meet your needs only if all the data stored were English.

The language specification controls the specific word breaker and stemmer loaded by the full text indexing engine. The selected word breaker and stemmer are the same for the entire full text index and cannot change dynamically based on a type column the way a filter can for a VARBINARY column. However, you do not have to split column data based on each specific language. Although words may differ, you can group many languages into a small set of general language families, and each word breaker has the ability to handle words that span a narrow group of languages. For example, you might store data that spans various Western European languages such as English, German, French, and Spanish. You could use a single language to index the column that would appropriately break the words for the index.

When you have data spanning languages, you should specify a language setting for the most complicated language. For example, the German word breaker can also break English, Spanish, and French correctly, whereas the English word breaker would have trouble with some of the language elements of German. When the languages vary widely, such as Arabic, Chinese, English, and Icelandic, you should split the data into separate columns based on language. Otherwise, you will not be able to break all words validly and build a full text index that behaves as you expect.

SQL Server 2008 ships with 50 language-specific word breakers/stemmers. Support is also included for you to register and use third-party word breakers and stemmers within SQL Server. For example, Turkish, Danish, and Polish are third-party word breakers that ship with SQL Server 2008.

Word breakers locate and tokenize word boundaries within text. The full text index then aggregates each token to build distribution statistics for searching. In addition, word breakers recognize proximity within the data set and build the proximity into the full text statistics. The ability to search based on word proximity is a unique characteristic of full text indexes that allows for compound search criteria that take into account how words relate to each other.

SQL Server uses stemmers to allow a full text index to search on all inflectional forms of a search term, such as drive, drove, driven, and driving. Stemming is language-specific. Although you could employ a German word breaker to tokenize English, the German stemmer cannot process English.
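If you want to see which word breaker/stemmer languages are registered on your instance before choosing a LANGUAGE setting, you can query the catalog view directly (a small sketch added here, not from the original text):

-- Each row is a language (LCID) available to full text indexing
SELECT lcid, name
FROM sys.fulltext_languages
ORDER BY name;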
Quick Check

1. Before you can create a full text index, which structure do you need to create?
2. What do you need to specify to create a full text index on documents stored within a FILESTREAM?

Quick Check Answers

1. Full text indexes are contained within full text catalogs. Therefore, you must create a full text catalog prior to creating a full text index.
2. SQL Server stores the documents within a VARBINARY(MAX) column with the FILESTREAM property enabled. To create a full text index, you also need to specify a type column that designates what type of document is stored in the VARBINARY(MAX) column, so that the appropriate filter is loaded for the word breaker to use.

PRACTICE: Creating Full Text Indexes

In the following practices, you create a full text catalog along with a full text index.

PRACTICE 1: Create a Full Text Catalog

In this practice, you create a full text catalog.

1. Execute the following code to add a filegroup and a file to the AdventureWorks database for use with full text indexing (the file path shown assumes a default instance installation; adjust it to match your environment):

ALTER DATABASE AdventureWorks
ADD FILEGROUP AWFullTextFG
GO

ALTER DATABASE AdventureWorks
ADD FILE
    (NAME = N'AdventureWorksFT',
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\AdventureWorksFT.ndf')
TO FILEGROUP AWFullTextFG
GO

2. Execute the following code to create the ProductsFTC full text catalog:

USE AdventureWorks
GO

CREATE FULLTEXT CATALOG ProductsFTC
ON FILEGROUP AWFullTextFG
GO
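After running Practice 1, you can confirm the catalog exists and check its settings with a quick query (a verification step added here, not part of the original practice):

-- One row per full text catalog in the current database
SELECT name, is_default, is_accent_sensitivity_on
FROM sys.fulltext_catalogs;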
PRACTICE 2: Create a Full Text Index

In this practice, you create a full text index.

1. Execute the following code to create a full text index for the product description:

CREATE FULLTEXT INDEX ON Production.ProductDescription(Description)
KEY INDEX PK_ProductDescription_ProductDescriptionID
ON ProductsFTC
WITH CHANGE_TRACKING = AUTO
GO

2. Expand the Storage, Full Text Catalogs node underneath the AdventureWorks database, right-click the ProductsFTC catalog, and select Properties.

3. Select the Tables/Views page to review the full text index you just created.

4. Review the full text indexes within the AW2008FullTextCatalog that ships with the AdventureWorks database.

Lesson Summary

- Before creating a full text index, you must create a full text catalog, mapped to a filegroup, to contain one or more full text indexes.
- You can create a full text index on CHAR/VARCHAR, XML, and VARBINARY columns. If you create a full text index on a VARBINARY(MAX) column, you must specify a column for the TYPE COLUMN parameter so that the full text engine loads the appropriate filter for parsing.
- The LANGUAGE setting controls the word breaker and stemmer that SQL Server loads to tokenize and build inflectional forms for the index. Although a word breaker can be used against different languages that are closely related with acceptable results, stemmers are specific to the language that is selected.
- The CHANGE_TRACKING option controls whether SQL Server tracks changes to underlying columns, as well as whether changes are populated automatically into the index.

Lesson Review

The following questions are intended to reinforce key information presented in Lesson 1, "Creating and Populating Full Text Indexes." The questions are also available on the companion CD if you prefer to review them in electronic form.

NOTE: ANSWERS
Answers to these questions and explanations of why each answer choice is right or wrong are located in the "Answers" section at the end of the book.
1. You are the database administrator at your company. You need to enable the sales support team to perform fuzzy searches on product descriptions. Which actions do you need to perform to satisfy user needs with the least amount of effort? (Choose two. Each forms part of the correct answer.)
A. Create a full text catalog specifying the filegroup for backup purposes and the root path to store the contents of the catalog on the file system.
B. Create a full text catalog and specify the filegroup to store the contents of the catalog.
C. Create a full text index on the table of product descriptions for the description column and specify NO POPULATION.
D. Create a full text index on the table of product descriptions for the description column and specify CHANGE_TRACKING AUTO.

2. You want to configure your full text indexes so that SQL Server migrates changes into the index as quickly as possible with the minimum amount of administrator effort. Which command should you execute?
A. ALTER FULLTEXT INDEX ON <table_name> START FULL POPULATION
B. ALTER FULLTEXT INDEX ON <table_name> START INCREMENTAL POPULATION
C. ALTER FULLTEXT INDEX ON <table_name> SET CHANGE_TRACKING AUTO
D. ALTER FULLTEXT INDEX ON <table_name> START UPDATE POPULATION
Lesson 2: Querying Full Text Data

SQL Server provides two commands to query full text data: CONTAINS and FREETEXT. There are two additional commands that produce a result set with additional columns of information: CONTAINSTABLE and FREETEXTTABLE. The main difference between the four commands is that CONTAINS and FREETEXT return a True/False value used to restrict a result set, whereas CONTAINSTABLE and FREETEXTTABLE return a result set that can be used to extend query functionality.

When you include a full text predicate in a query, SQL Server hands off the full text search term to the full text indexing engine, which applies a word breaker to tokenize the search argument. Based on the tokenization of the search term, distribution statistics are returned to the optimizer, which then merges the full text portion of the query with the relational portion to build a query plan.

After this lesson, you will be able to:
- Query unstructured data through a full text index

Estimated lesson time: 20 minutes

FREETEXT

FREETEXT queries are the most basic form of a full text search. The generic syntax for a FREETEXT query is:

FREETEXT ( { column_name | (column_list) | * }
    , 'freetext_string' [ , LANGUAGE language_term ] )

An example of a FREETEXT query is:

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE FREETEXT(Description,N'bike')
GO

The LANGUAGE parameter allows you to specify the word breaker and stemmer that are employed to evaluate the input search argument. Although you might have used the German language to build the full text index, you can specify an English-language parameter to employ an English-specific word breaker for the query.

NOTE: LANGUAGE PARAMETERS
Although you can employ a different language for a query than the one an index was built upon, you will not automatically improve the accuracy of a full text search. Because stemmers are language-specific, if the index was built with a language different from the language specified in the query, you will not be able to find any inflectional forms of words with your query.
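For example, you could make the word breaker explicit in the query above (a small variation added here for illustration; 1033 is the locale identifier for U.S. English):

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE FREETEXT(Description, N'bike', LANGUAGE 1033)
GO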
EXAM TIP
All search terms used with full text are Unicode strings. If you pass in a non-Unicode string, the query still works, but it is much less efficient because the optimizer cannot use parameter sniffing to evaluate distribution statistics on the full text index. Make certain that all terms you pass in for full text search are always typed as Unicode for maximum performance.

FREETEXTTABLE returns a result set with additional information that ranks the results according to how closely each row matched the original search term. The generic syntax for FREETEXTTABLE is:

FREETEXTTABLE (table , { column_name | (column_list) | * }
    , 'freetext_string'
    [ , LANGUAGE language_term ]
    [ , top_n_by_rank ] )

The same query expressed with FREETEXTTABLE is as follows:

SELECT a.ProductDescriptionID, a.Description, b.*
FROM Production.ProductDescription a
    INNER JOIN FREETEXTTABLE(Production.ProductDescription,
        Description, N'bike') b ON a.ProductDescriptionID = b.[Key]
ORDER BY b.[Rank]
GO

CONTAINS

For queries that require greater flexibility, you use the CONTAINS predicate, which allows you to:
- Search word forms
- Search for word proximity
- Provide relative weighting to terms

The generic syntax for CONTAINS is:

CONTAINS ( { column_name | (column_list) | * }
    , '<contains_search_condition>'
    [ , LANGUAGE language_term ] )

<contains_search_condition> ::=
    { <simple_term> | <prefix_term> | <generation_term>
        | <proximity_term> | <weighted_term> }
    | { ( <contains_search_condition> )
        [ { <AND> | <AND NOT> | <OR> } ] <contains_search_condition> [ ...n ] }
<simple_term> ::= word | "phrase"

<prefix_term> ::= { "word*" | "phrase*" }

<generation_term> ::=
    FORMSOF ( { INFLECTIONAL | THESAURUS } , <simple_term> [ ,...n ] )

<proximity_term> ::=
    { <simple_term> | <prefix_term> }
    { { NEAR | ~ } { <simple_term> | <prefix_term> } } [ ...n ]

<weighted_term> ::=
    ISABOUT ( { { <simple_term> | <prefix_term> | <generation_term>
        | <proximity_term> } [ WEIGHT ( weight_value ) ] } [ ,...n ] )

Search terms can be used for either exact matches or as prefixes. The following query returns the products with an exact match on the word "bike." Although the query looks almost exactly the same as the FREETEXT version, the CONTAINS query returns two fewer rows due to the exact matching:

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description,N'bike')
GO

If you want to perform a basic wildcard search for words prefixed by a search term, you can execute the following query:

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description,N'"bike*"')
GO

If you compare the previous results to the FREETEXT query, you see that each returns the same set of rows. With CONTAINS, you have to specify explicitly that you want to perform fuzzy searching, which includes word prefixes, whereas FREETEXT defaults to fuzzy searching.

In those cases where you want to search on word variants, you can use the FORMSOF, INFLECTIONAL, and THESAURUS options. INFLECTIONAL causes the full text engine to consider word stems. For example, searching on "driven" also searches on drive, driving, drove, and so on. THESAURUS produces synonyms for the search term. An example of searching on word variants is as follows:

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description,N' FORMSOF (INFLECTIONAL,ride) ')
GO
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description,N' FORMSOF (THESAURUS,metal) ')
GO

NOTE: THESAURUS FILES
A thesaurus file exists for each supported language. All thesaurus files are XML files stored in the FTDATA directory underneath your default SQL Server installation path. The thesaurus files are not populated, so to perform synonym searches, you need to populate the thesaurus files. You will learn about thesaurus files in Lesson 3, "Managing Full Text Indexes."

Because full text indexes are built against unstructured data, the index stores the proximity of one word to another in addition to indexing the words found within the data. Proximity searching is accomplished by using the NEAR keyword. Although you can perform proximity and weighted proximity searches using CONTAINS, these types of searches generally are performed using CONTAINSTABLE to take advantage of the RANK value that is calculated.

The following query returns all rows where "bike" is near "performance." The rank value is affected by the distance between the two words:

SELECT a.ProductDescriptionID, a.Description, b.*
FROM Production.ProductDescription a
    INNER JOIN CONTAINSTABLE(Production.ProductDescription, Description,
        N'bike NEAR performance') b ON a.ProductDescriptionID = b.[Key]
ORDER BY b.[Rank]
GO

The following query returns the top 10 rows by rank according to the weighted averages of the words performance, comfortable, smooth, safe, and competition:

SELECT a.ProductDescriptionID, a.Description, b.*
FROM Production.ProductDescription a
    INNER JOIN CONTAINSTABLE(Production.ProductDescription, Description,
        N'ISABOUT (performance WEIGHT (.8), comfortable WEIGHT (.6),
        smooth WEIGHT (.2), safe WEIGHT (.5), competition WEIGHT (.5))', 10) b
    ON a.ProductDescriptionID = b.[Key]
ORDER BY b.[Rank] DESC
GO

Quick Check

1. Which predicate performs fuzzy searching by default?
2. Which predicate is used to perform proximity and synonym searches?
Quick Check Answers

1. The FREETEXT and FREETEXTTABLE predicates perform wildcard searches by default.
2. CONTAINS and CONTAINSTABLE are used for proximity, thesaurus, and inflectional searches.

PRACTICE: Querying with a Full Text Index

In the following practice, you execute several queries to compare the results of CONTAINS, CONTAINSTABLE, FREETEXT, and FREETEXTTABLE.

1. Execute the following query and review the contents of the Description column:

SELECT Description
FROM Production.ProductDescription
GO

2. Execute the following query and review the rows that are returned:

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE FREETEXT(Description,N'bike')
GO

3. Execute the following query and review the rows that are returned:

SELECT a.ProductDescriptionID, a.Description, b.*
FROM Production.ProductDescription a
    INNER JOIN FREETEXTTABLE(Production.ProductDescription,
        Description, N'bike') b ON a.ProductDescriptionID = b.[Key]
ORDER BY b.[Rank]
GO

4. Execute the following query and review the rows that are returned:

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description,N'bike')
GO

5. Execute the following query and review the rows that are returned:

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description,N'"bike*"')
GO
6. Execute the following query and review the rows that are returned:

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description,N' FORMSOF (INFLECTIONAL,ride) ')
GO

7. Execute the following query and note that 0 rows are returned because you haven't populated a thesaurus file yet:

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description,N' FORMSOF (THESAURUS,metal) ')
GO

8. Execute the following query and review the rows that are returned:

SELECT a.ProductDescriptionID, a.Description, b.*
FROM Production.ProductDescription a
    INNER JOIN CONTAINSTABLE(Production.ProductDescription, Description,
        N'bike NEAR performance') b ON a.ProductDescriptionID = b.[Key]
ORDER BY b.[Rank]
GO

9. Execute the following query and review the rows that are returned:

SELECT a.ProductDescriptionID, a.Description, b.*
FROM Production.ProductDescription a
    INNER JOIN CONTAINSTABLE(Production.ProductDescription, Description,
        N'ISABOUT (performance WEIGHT (.8), comfortable WEIGHT (.6),
        smooth WEIGHT (.2), safe WEIGHT (.5), competition WEIGHT (.5))', 10) b
    ON a.ProductDescriptionID = b.[Key]
ORDER BY b.[Rank] DESC
GO

Lesson Summary

- The FREETEXT and CONTAINS predicates return a value of True or False, which you can then use in a query, similar to an EXISTS clause, to restrict a result set.
- The FREETEXTTABLE and CONTAINSTABLE predicates return a result set that includes a ranking column that tells you how closely a row matched the search term.
- FREETEXT and FREETEXTTABLE perform wildcard searches by default.
- CONTAINS and CONTAINSTABLE can perform wildcard searches along with proximity, word form, and synonym searches.
Lesson Review

The following questions are intended to reinforce key information presented in Lesson 2, "Querying Full Text Data." The questions are also available on the companion CD if you prefer to review them in electronic form.

NOTE: ANSWERS
Answers to these questions and explanations of why each answer choice is right or wrong are located in the "Answers" section at the end of the book.

1. You want to search for two terms based on proximity within a row. Which full text predicates can be used to perform proximity searches? (Choose two. Each forms a separate answer.)
A. CONTAINS
B. FREETEXT
C. CONTAINSTABLE
D. FREETEXTTABLE

2. You want to perform a proximity search based on a weighting value for the search arguments. Which options for the CONTAINSTABLE predicate should you use?
A. FORMSOF with the THESAURUS keyword
B. FORMSOF with the INFLECTIONAL keyword
C. ISABOUT
D. ISABOUT with the WEIGHT keyword
Lesson 3: Managing Full Text Indexes

Although you can derive a significant amount of benefit from just the creation and automatic population of a full text index, you can increase the index's usefulness by creating thesaurus files and building stop lists of words to filter out of the index. In this lesson, you learn how to create and use a thesaurus file, and how to build a stop list and rebuild a full text index.

After this lesson, you will be able to:
- Maintain thesaurus files
- Create and manage stop lists
- Manage the population of full text indexes

Estimated lesson time: 15 minutes

Thesaurus

You use a thesaurus file to enable full text queries to retrieve rows that match the search argument along with synonyms of the search argument. A thesaurus is a language-specific XML file that is stored in the FTDATA directory. After you define it, SQL Server uses the language-specific thesaurus automatically for FREETEXT and FREETEXTTABLE queries. The thesaurus is used for CONTAINS and CONTAINSTABLE queries only when you specify the FORMSOF THESAURUS option.

A thesaurus can contain expansion sets or replacement sets. A replacement set defines a term or terms that are replaced within the search argument before the word breaker tokenizes the argument list. An expansion set defines a set of terms that are used to expand upon a search argument. When an expansion set is used, a match on any term within the expansion set causes SQL Server to retrieve the row. The basic structure of a thesaurus file is:

<XML ID="Microsoft Search Thesaurus">
<!-- Commented out
    <thesaurus xmlns="x-schema:tsSchema.xml">
        <diacritics_sensitive>0</diacritics_sensitive>
        <expansion>
            <sub>Internet Explorer</sub>
            <sub>IE</sub>
            <sub>IE5</sub>
        </expansion>
        <replacement>
            <pat>NT5</pat>
            <pat>W2K</pat>
            <sub>Windows 2000</sub>
        </replacement>
        <expansion>
            <sub>run</sub>
            <sub>jog</sub>
        </expansion>
    </thesaurus>
-->
</XML>

The diacritics setting specifies whether the thesaurus is accent-sensitive.

Stop Lists

Stop lists, known in previous versions of SQL Server as noise word files, are used to exclude words that you do not want included in a full text index. You exclude commonly occurring words from an index so that valid, targeted results can be returned for validly formed searches. Although you might want to search for "Microsoft" across the entire Internet, if your search is being executed across Microsoft product documentation, not only would such a search likely return every product document that exists within the indexed set, but the designer of the system would consider such a search request invalid.

The common stop words for each language, such as "the," "a," and "an," are already accounted for by the full text indexing engine. In addition to the common words, an administrator can add words that are specific to your organization and are likely to appear frequently within the data you want to index. When a stop word is included as a search argument or encountered within data being indexed, the word breaker categorizes the term as uninteresting and removes it. If the arguments that you submit within a full text predicate are all stop words, the query returns no results without ever accessing the data.

EXAM TIP
Although many features of SQL Server operate the same from one version to another, others are enhanced or changed. You can assume that you will have questions on an exam designed to test whether you know the change in behavior for the new version. In SQL Server 2005 and prior versions, you configured noise word files that were in the FTDATA directory. In SQL Server 2008, you configure stop lists that are contained within a database in SQL Server. It is very likely that if you have a question concerning the configuration of stop words, the available answers will include both the SQL Server 2005 and SQL Server 2008 behaviors, and any of the SQL Server 2005 behaviors would be incorrect answers.
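To see the stop words that the engine already accounts for, you can query the system stop list directly (a quick sketch added here, not from the original text; 1033 is the LCID for U.S. English):

-- System-supplied stop words for U.S. English
SELECT stopword
FROM sys.fulltext_system_stopwords
WHERE language_id = 1033;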
Populate Full Text Indexes

Full text indexes can be populated manually, either on demand or on a schedule, or automatically as the data underneath the index changes. You can also stop, pause, or resume the population of an index to control resource utilization when making large volumes of changes to a full text index.

The options for populating a full text index are:
- FULL: Reprocesses every row from the underlying data to rebuild the full text index completely.
- INCREMENTAL: Processes only the rows that have changed since the last population; requires a timestamp column on the table.
- UPDATE: Processes any changes since the last time the index was updated; requires that the CHANGE_TRACKING option is enabled for the index and set to MANUAL.

To initiate population of a full text index, you execute the ALTER FULLTEXT INDEX statement.

Quick Check

1. Which type of file enables searching based on synonyms?
2. What do you configure to exclude words from your index and search arguments?

Quick Check Answers

1. A thesaurus file allows you to configure synonyms for search arguments.
2. A stop list contains the list of words that you want excluded from a full text index as well as from search arguments.

PRACTICE: Manage Full Text Indexes

In the following practices, you populate a thesaurus file and compare the search results. You also build a stop list to filter common words from search arguments and the full text index.

PRACTICE 1: Populate a Thesaurus

In this practice, you populate a thesaurus file.

1. Execute the following query and verify that you do not return any rows:

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description,N' FORMSOF (THESAURUS,metal) ')
GO

2. Open the Tsenu.xml file (U.S. English) located at C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\FTData.

3. Change the contents of the file to the following:

<XML ID="Microsoft Search Thesaurus">
    <thesaurus xmlns="x-schema:tsSchema.xml">
        <diacritics_sensitive>0</diacritics_sensitive>
        <expansion>
            <sub>metal</sub>
            <sub>steel</sub>
            <sub>aluminum</sub>
            <sub>alloy</sub>
        </expansion>
    </thesaurus>
</XML>

4. Reload the thesaurus file by executing the following (1033 specifies the U.S. English language):

USE AdventureWorks
GO

EXEC sys.sp_fulltext_load_thesaurus_file 1033;
GO

5. Execute the following query, verify that you now receive 33 rows of data, and compare the rows returned to what you expect based on your thesaurus entry:

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description,N' FORMSOF (THESAURUS,metal) ')
GO

PRACTICE 2: Build a Stop List

In this practice, you build a stop list and then compare the results of queries.

NOTE: COMMAND DELIMITERS
A semicolon is the Transact-SQL delimiter for a command. In most cases, you do not have to specify a command delimiter explicitly. Some commands, however, such as CREATE FULLTEXT STOPLIST, require you to specify a command delimiter explicitly for the command to execute successfully.

1. Execute the following query against the AdventureWorks database and review the 16 rows returned:

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description,N'"bike*"')
GO

2. Create a new stop list by executing the following command:

CREATE FULLTEXT STOPLIST ProductStopList;
GO
3. Add the word "bike" to the stop list by executing the following command:

ALTER FULLTEXT STOPLIST ProductStopList ADD 'bike' LANGUAGE 1033;
GO

4. Associate the stop list with the full text index on the ProductDescription table as follows:

ALTER FULLTEXT INDEX ON Production.ProductDescription
SET STOPLIST ProductStopList
GO

5. Execute the following query against the AdventureWorks database and review the results:

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description,N'"bike*"')
GO

Lesson Summary

- You manage thesaurus files by editing the language-specific file that is contained within the FTDATA directory for the instance.
- You use the CREATE FULLTEXT STOPLIST and ALTER FULLTEXT STOPLIST commands to build a list of stop words to be excluded from search arguments and the full text index. Once a stop list has been built, you can use the ALTER FULLTEXT INDEX command to associate the stop list with a full text index.

Lesson Review

The following questions are intended to reinforce key information presented in Lesson 3, "Managing Full Text Indexes." The questions are also available on the companion CD if you prefer to review them in electronic form.

NOTE: ANSWERS
Answers to these questions and explanations of why each answer choice is right or wrong are located in the "Answers" section at the end of the book.

1. You have a list of words that should be excluded from search arguments. Which action should you perform in SQL Server 2008 to meet your requirements with the least amount of effort?
A. Create a stop list and associate the stop list to the full text index.
B. Create a noise word file and associate the noise word file to the full text index.
C. Populate a thesaurus file and associate the thesaurus file to the full text index.
D. Parse the inbound query and remove any common words from the search arguments.
Chapter Review

To practice and reinforce the skills you learned in this chapter further, you can:
- Review the chapter summary.
- Review the list of key terms introduced in this chapter.
- Complete the case scenario. This scenario sets up a real-world situation involving the topics of this chapter and asks you to create solutions.
- Complete the suggested practices.
- Take a practice test.

Chapter Summary

- Full text indexes can be created against CHAR/VARCHAR, XML, and VARBINARY columns. When you full text index a VARBINARY column, you must specify the filter to be used by the word breaker to interpret the document content.
- Thesaurus files allow you to specify a list of synonyms or word replacements for search terms.
- Stop lists exclude a list of words from search arguments and a full text index.

Key Terms

Do you know what these key terms mean? You can check your answers by looking up the terms in the glossary at the end of the book.

- Full text catalog
- Full text filter
- Full text index
- Stemmer
- Stop list
- Thesaurus file
- Word breaker

Case Scenario

In the following case scenario, you apply what you've learned in this chapter. You can find answers to these questions in the "Answers" section at the end of this book.
Case Scenario: Installing and Configuring SQL Server 2008

Wide World Importers is implementing a new set of applications to manage several lines of business. They need the ability to store large volumes of data within the corporate data center that can be accessed from anywhere in the world.

Several business managers need access to operational reports that cover the current workload of their employees along with new and pending customer requests. They also need to be able to access large volumes of historical data to spot trends and optimize their staffing and inventory levels.

Business managers want to eliminate all the product manuals that are included with their products and instead direct users to the company Web site. Users should be able to browse for manuals based on product names or search for text within a manual. The sales force also would like to enhance the company Web site to allow product descriptions to be created and searched in multiple languages.

A large sales force makes customer calls all over the world and needs access to data on the customers a sales rep is serving, along with potential prospects. The data for the sales force needs to be available even when the sales reps are not connected to the Internet or the corporate network. Periodically, sales reps connect to the corporate network and synchronize their data with the corporate databases.

A variety of Microsoft Windows applications have been created with Microsoft Visual Studio .NET, and all data access is performed using stored procedures. The same set of applications is deployed for users connecting directly to the corporate database server as well as for sales reps connecting to their own local database servers.

1. What features of SQL Server 2008 should be used to store the product manuals?
2. What should you configure to allow users to perform searches against a product manual?
3. To provide the best possible results for searches, which objects should be configured?

Suggested Practices

To help you master the exam objectives presented in this chapter, complete the following tasks.

Create a Full Text Index

- Create a full text index against a large character data type.

Query a Full Text Index

- Perform various queries using CONTAINS, CONTAINSTABLE, FREETEXT, and FREETEXTTABLE against the data you created for a full text index and compare the results to what you expect to return.
Create a Thesaurus File

- Populate a thesaurus file to provide word replacements or synonyms and execute additional queries to review the effect.

Create a Stop List

- Create a stop list to exclude common words from your searches and verify the effect when you attempt to utilize excluded words.

Take a Practice Test

The practice tests on this book's companion CD offer many options. For example, you can test yourself on just one exam objective, or you can test yourself on all the 70-432 certification exam content. You can set up the test so that it closely simulates the experience of taking a certification exam, or you can set it up in study mode so that you can look at the correct answers and explanations after you answer each question.

MORE INFO: PRACTICE TESTS
For details about all the practice test options available, see the section "How to Use the Practice Tests" in the Introduction to this book.
CHAPTER 6
Distributing and Partitioning Data

Table partitioning was introduced in Microsoft SQL Server 2005 as a means to split large tables across multiple storage structures. Previously, objects were restricted to a single filegroup that could contain multiple files. However, the placement of data within a filegroup was still determined by SQL Server. Table partitioning allows tables, indexes, and indexed views to be created on multiple filegroups while also allowing the database administrator (DBA) to specify which portion of the object will be stored on a specific filegroup.

The process for partitioning a table, index, or indexed view is as follows:
1. Create a partition function.
2. Create a partition scheme mapped to a partition function.
3. Create the table, index, or indexed view on the partition scheme.

Exam objective in this chapter:
- Manage data partitions

Lessons in this chapter:
- Lesson 1: Creating a Partition Function
- Lesson 2: Creating a Partition Scheme
- Lesson 3: Partitioning Tables and Indexes
- Lesson 4: Managing Partitions

Before You Begin

To complete the lessons in this chapter, you must have:
- An instance of SQL Server 2008 installed using the Enterprise, Developer, or Evaluation edition
REAL WORLD: Michael Hotek

One of my customers was having some severe contention issues on their production servers running SQL Server. The contention was so severe at times that their customers could not log in to schedule payments, check account balances, or perform any other actions. The contention issue was tracked down to the archive routines that were mandated by a newly created SOX data retention policy. No matter what they tried, the DBAs could not reduce the overhead of the archive process enough to keep it from affecting customers. The daily archive routines would take 3 to 4 hours to execute, and weekly archives of auditing data could take as much as 22 hours.

To solve the contention issues, we partitioned all the tables covered by the SOX data retention policy. Then we implemented a new process utilizing the SPLIT, MERGE, and SWITCH capabilities of partitioning to move segments of data from the OLTP tables into a set of staging tables. Not only was the time required to complete the archive routines reduced from hours to less than 5 seconds, we also eliminated all the data contention against the tables.
Lesson 1: Creating a Partition Function

A partition function defines the set of boundary points on which data will be partitioned. In this lesson, you learn how to perform the first step in partitioning: creating a partition function.

After this lesson, you will be able to:
- Create a partition function

Estimated lesson time: 15 minutes

Partition Functions

A partition function defines the boundary points that will be used to split data across a partition scheme. Figure 6-1 shows an example of a basic partitioned table.

[FIGURE 6-1 A partitioned table: each row's partitioning-column value is mapped through the partition function to a partition in the partition scheme, which in turn maps to a filegroup.]

An example of a partition function is:

CREATE PARTITION FUNCTION mypartfunction (int)
AS RANGE LEFT
FOR VALUES (10,20,30,40,50,60)

Each partition function requires a name and a data type. The data type defines the limits of the boundary points that can be applied, and it must span the same data range as, or a smaller range than, the data type of the column in a table, index, or indexed view to which you want to apply the partition function.
The data type for a partition function can be any native SQL Server data type, except text, ntext, image, varbinary(max), timestamp, xml, and varchar(max). You also cannot use Transact-SQL user-defined data types or Common Language Runtime (CLR) data types. Imprecise data types, such as real, and computed columns must be persisted. Any columns that are used to partition must be deterministic.

The AS clause allows you to specify whether the partition function you are creating is RANGE LEFT or RANGE RIGHT. The LEFT and RIGHT parameters define which partition will include a boundary point. The FOR VALUES clause is used to specify the boundary points for the partition function. If the partition function is created as RANGE LEFT, the boundary point is included in the left partition. If the partition function is created as RANGE RIGHT, the boundary point is included in the right partition.

A partition function always maps the entire range of data; therefore, no gaps are present. You cannot specify duplicate boundary points. This ensures that any value stored in a column always evaluates to a single partition. Null values are always stored in the leftmost partition unless you explicitly specify null as a boundary point and use the RANGE RIGHT syntax, in which case nulls are stored in the rightmost partition.

Because the entire range of values is always mapped for a partition function, the result is the creation of one more partition than you have defined boundary points. Table 6-1 shows how the following partition function is defined in SQL Server:

CREATE PARTITION FUNCTION mypartfunction (int)
AS RANGE LEFT
FOR VALUES (10,20,30,40,50,60)

TABLE 6-1 Range Left Partition Function

PARTITION NUMBER    MINIMUM VALUE    MAXIMUM VALUE
1                   -∞               10
2                   11               20
3                   21               30
4                   31               40
5                   41               50
6                   51               60
7                   61               +∞

NOTE: CODE REUSE
The definition of a partition function does not include a clause for an object, column, or storage. This means that a partition function is a stand-alone object that you can apply to multiple tables, indexes, or indexed views if you choose.
Table 6-2 shows how the partitions change when the partition function is defined as RANGE RIGHT instead, as follows:

CREATE PARTITION FUNCTION mypartfunction (int)
AS RANGE RIGHT FOR VALUES (10,20,30,40,50,60)

TABLE 6-2 Range Right Partition Function

PARTITION NUMBER    MINIMUM VALUE    MAXIMUM VALUE
1                   -∞               9
2                   10               19
3                   20               29
4                   30               39
5                   40               49
6                   50               59
7                   60               +∞

You can have a maximum of 1,000 partitions for an object; therefore, you are allowed to specify a maximum of 999 boundary points.

EXAM TIP
You can partition an existing object after it has been populated with data. To partition an existing table, you need to drop the clustered index and re-create the clustered index on the partition scheme. To partition an existing index or indexed view, drop the index and re-create it on a partition scheme. Be very careful when partitioning existing objects that already contain data, because implementing the partition causes a significant amount of disk input/output (I/O).

Quick Check
1. What data types cannot be used with partition functions?
2. What is the maximum number of partitions allowed for a table?
3. What is the maximum number of boundary points allowed for a partition function?

Quick Check Answers
1. You cannot use text, ntext, image, xml, varbinary(max), varchar(max), or any CLR data types.
2. The maximum number of partitions for a table is 1,000.
3. The maximum number of boundary points for a partition function is 999.
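One common way to carry out the exam tip's drop-and-re-create step in a single statement is to rebuild the existing clustered index directly onto the partition scheme. The following is a minimal sketch, not the book's practice code; it assumes a dbo.Employee table whose clustered index is named pk_employee, plus the mypartfunction and mypartscheme objects defined in this chapter:

-- Rebuild the clustered index on the partition scheme; because the clustered
-- index is the table data, this physically moves the rows into the partitions
CREATE UNIQUE CLUSTERED INDEX pk_employee
    ON dbo.Employee (EmployeeID)
    WITH (DROP_EXISTING = ON)
    ON mypartscheme (EmployeeID);
GO

Using DROP_EXISTING = ON avoids the intermediate heap that results from dropping the index in a separate step, but the operation still rewrites every row, so it generates the heavy I/O that the exam tip warns about.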
PRACTICE  Creating a Partition Function

In this practice, you create a database to use while learning partitioning, and you create a partition function in the newly created database.

1. In Microsoft Windows Explorer, create a directory called C:\Test if one does not already exist.

2. Open a new query window in SQL Server Management Studio (SSMS).

3. Execute the following statement to create a test database:

--Create a database with multiple filegroups.
USE master
GO
CREATE DATABASE partitiontest
    ON PRIMARY
        (NAME = primary_data, FILENAME = 'c:\test\db.mdf', SIZE = 2MB),
    FILEGROUP FG1
        (NAME = FG1_data, FILENAME = 'c:\test\FG1.ndf', SIZE = 2MB),
    FILEGROUP FG2
        (NAME = FG2_data, FILENAME = 'c:\test\FG2.ndf', SIZE = 2MB),
    FILEGROUP FG3
        (NAME = FG3_data, FILENAME = 'c:\test\FG3.ndf', SIZE = 2MB),
    FILEGROUP FG4
        (NAME = FG4_data, FILENAME = 'c:\test\FG4.ndf', SIZE = 2MB),
    FILEGROUP FG5
        (NAME = FG5_data, FILENAME = 'c:\test\FG5.ndf', SIZE = 2MB),
    FILEGROUP FG6
        (NAME = FG6_data, FILENAME = 'c:\test\FG6.ndf', SIZE = 2MB),
    FILEGROUP FG7
        (NAME = FG7_data, FILENAME = 'c:\test\FG7.ndf', SIZE = 2MB),
    FILEGROUP FG8
        (NAME = FG8_data, FILENAME = 'c:\test\FG8.ndf', SIZE = 2MB),
    FILEGROUP FG9
        (NAME = FG9_data, FILENAME = 'c:\test\FG9.ndf', SIZE = 2MB),
    FILEGROUP FG10
        (NAME = FG10_data, FILENAME = 'c:\test\FG10.ndf', SIZE = 2MB),
    FILEGROUP FG11
        (NAME = FG11_data, FILENAME = 'c:\test\FG11.ndf', SIZE = 2MB),
    FILEGROUP FG12
        (NAME = FG12_data, FILENAME = 'c:\test\FG12.ndf', SIZE = 2MB),
    FILEGROUP FG13
        (NAME = FG13_data, FILENAME = 'c:\test\FG13.ndf', SIZE = 2MB)
    LOG ON
        (NAME = db_log, FILENAME = 'c:\test\log.ndf', SIZE = 2MB, FILEGROWTH = 10%);
GO
USE partitiontest
GO
4. Create a partition function with boundary points for each month, as follows:

--Create a partition function with boundary points for each month
CREATE PARTITION FUNCTION partfunc (datetime)
AS RANGE RIGHT FOR VALUES ('1/1/2005','2/1/2005','3/1/2005','4/1/2005',
    '5/1/2005','6/1/2005','7/1/2005','8/1/2005','9/1/2005','10/1/2005',
    '11/1/2005','12/1/2005')
GO

5. Execute the following command to view the results of step 4:

SELECT * FROM sys.partition_range_values;

Lesson Summary
- A partition function defines the boundary points for a set of partitions.
- You can create a partition function as either RANGE LEFT or RANGE RIGHT.
- You can use any data type except text, ntext, image, varbinary(max), varchar(max), xml, or CLR data types.

Lesson Review

The following question is intended to reinforce key information presented in Lesson 1, "Creating a Partition Function." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE  ANSWERS
The answer to this question and an explanation of why each answer choice is right or wrong is located in the "Answers" section at the end of the book.

1. Contoso has a very high-volume transaction system. There is not enough memory on the database server to hold the active data set, so a very high number of read and write operations hit the disk drives directly. After adding several additional indexes, the performance still does not meet expectations, and the DBAs cannot find any more candidates for additional indexes. There isn't enough money in the budget for additional memory, additional servers, or a server with more capacity. However, a new storage area network (SAN) has recently been implemented. What technology can Contoso use to increase performance?
A. Log shipping
B. Replication
C. Partitioning
D. Database mirroring
Lesson 2: Creating a Partition Scheme

A partition scheme defines the storage structures and collection of filegroups that you want to use with a given partition function. In this lesson, you learn how to create partition schemes to map partitions to filegroups.

After this lesson, you will be able to:
- Create a partition scheme
Estimated lesson time: 10 minutes

Partition Schemes

Partition schemes provide an alternate definition for storage. You define a partition scheme to encompass one or more filegroups. The generic syntax for creating a partition scheme is

CREATE PARTITION SCHEME partition_scheme_name
AS PARTITION partition_function_name
[ ALL ] TO ( { file_group_name | [ PRIMARY ] } [ ,...n ] )

Three examples of partition schemes are as follows:

CREATE PARTITION SCHEME mypartscheme AS PARTITION mypartfunction
TO (Filegroup1, Filegroup2, Filegroup3, Filegroup4, Filegroup5, Filegroup6, Filegroup7)

CREATE PARTITION SCHEME mypartscheme AS PARTITION mypartfunction
TO (Filegroup1, Filegroup1, Filegroup2, Filegroup2, Filegroup3)

CREATE PARTITION SCHEME mypartscheme AS PARTITION mypartfunction
ALL TO (Filegroup1)

Each partition scheme must have a name that conforms to the rules for identifiers. You use the AS PARTITION clause to specify the name of the partition function that you want to map to the partition scheme. The TO clause specifies the list of filegroups that are included in the partition scheme.

IMPORTANT  FILEGROUPS
Any filegroup specified in the CREATE PARTITION SCHEME statement must already exist in the database.

A partition scheme must be defined in such a way as to contain a filegroup for each partition that is created by the partition function mapped to the partition scheme. SQL Server 2008 allows the use of the ALL keyword, as shown previously, which allows you to create all partitions defined by the partition function within a single filegroup. If you do not use the ALL keyword, the partition scheme must contain at least one filegroup for each partition defined within the partition function. For example, a partition function with six boundary points (seven partitions) must be mapped to a partition scheme with at least seven filegroups defined. If more filegroups are included in the partition scheme than there are partitions, any excess filegroups are not used to store data unless explicitly specified by using the ALTER PARTITION SCHEME command.
EXAM TIP
If you specify the ALL keyword when creating a partition scheme, you can specify a maximum of one filegroup.

Table 6-3 shows how a partition function and partition scheme are mapped to specific filegroups, as the following code shows:

CREATE PARTITION FUNCTION mypartfunction (int)
AS RANGE LEFT FOR VALUES (10,20,30,40,50,60);
GO
CREATE PARTITION SCHEME mypartscheme AS PARTITION mypartfunction
TO (Filegroup1, Filegroup2, Filegroup2, Filegroup4, Filegroup5, Filegroup6, Filegroup7);
GO

TABLE 6-3 Partition Function Mapped to a Partition Scheme

FILEGROUP     PARTITION NUMBER    MINIMUM VALUE    MAXIMUM VALUE
Filegroup1    1                   -∞               10
Filegroup2    2                   11               20
Filegroup2    3                   21               30
Filegroup4    4                   31               40
Filegroup5    5                   41               50
Filegroup6    6                   51               60
Filegroup7    7                   61               +∞

Quick Check
1. How many filegroups can you specify if you use the ALL keyword when defining a partition scheme?
2. Can you create a new filegroup at the same time that you are creating a partition scheme?

Quick Check Answers
1. You can specify exactly one filegroup when using the ALL keyword.
2. No. Any filegroups that you specify in the CREATE PARTITION SCHEME statement must already exist in the database.
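If you want to inspect how an existing scheme maps partition numbers to filegroups, you can join the standard catalog views. This is a sketch using documented system views; it assumes nothing beyond the scheme itself:

SELECT ps.name            AS partition_scheme,
       dds.destination_id AS partition_number,
       fg.name            AS filegroup_name
FROM sys.partition_schemes AS ps
JOIN sys.destination_data_spaces AS dds
    ON ps.data_space_id = dds.partition_scheme_id
JOIN sys.filegroups AS fg
    ON dds.data_space_id = fg.data_space_id
ORDER BY ps.name, dds.destination_id;

The destination_id column corresponds to the partition number, so the result set reads like Table 6-3.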
PRACTICE  Create a Partition Scheme

In this practice, you create a partition scheme mapped to the partition function from the previous exercise.

1. Open a new query window in SSMS and change context to the partitiontest database.

2. Execute the following statement to create a partition scheme mapped to the partition function:

CREATE PARTITION SCHEME partscheme
AS PARTITION partfunc
TO ([FG1], [FG2], [FG3], [FG4], [FG5], [FG6], [FG7], [FG8], [FG9], [FG10],
    [FG11], [FG12], [FG13])
GO

--View the partition scheme
SELECT * FROM sys.partition_schemes;

Lesson Summary
- A partition scheme is a storage definition containing a collection of filegroups.
- If you specify the ALL keyword, the partition scheme allows only a single filegroup to be specified.
- If you do not specify the ALL keyword, you must specify enough filegroups to map all the partitions created by the partition function.

Lesson Review

The following question is intended to reinforce key information presented in Lesson 2, "Creating a Partition Scheme." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE  ANSWERS
The answer to the question and an explanation of why each answer choice is right or wrong is located in the "Answers" section at the end of the book.

1. Margie's Travel wants to keep orders in their online transaction processing database for a maximum of 30 days from the date an order is placed. The orders table contains a column called OrderDate that contains the date an order was placed. How can the DBAs at Margie's Travel move orders that are older than 30 days from the orders table with the least amount of impact on user transactions? (Choose two. Each answer represents a part of the solution.)
A. Use the SWITCH operator to move data partitions containing data that is older than 30 days.
B. Create a stored procedure that deletes any orders that are older than 30 days.
C. Partition the orders table on the OrderDate column, using a partition function defined for the datetime data type.
D. Create a job to delete orders that are older than 30 days.
Lesson 3: Creating Partitioned Tables and Indexes

The final step in creating a partitioned table or index is to create the table or index on a partition scheme through a partitioning column. In this lesson, you learn how to create partitioned tables and indexes.

After this lesson, you will be able to:
- Create a partitioned table
- Create a partitioned index
Estimated lesson time: 10 minutes

Creating a Partitioned Table

Creating a partitioned table, index, or indexed view is very similar to creating a nonpartitioned table, index, or indexed view. Every object that you create has an ON clause that specifies where SQL Server should store the object. The ON clause is routinely omitted, causing SQL Server to create objects on the default filegroup. Because a partition scheme is just a definition for storage, partitioning a table, index, or indexed view is a very straightforward process. An example of a partitioned table follows:

CREATE TABLE Employee
    (EmployeeID int NOT NULL,
     FirstName varchar(50) NOT NULL,
     LastName varchar(50) NOT NULL)
ON mypartscheme(EmployeeID);
GO

The key is the ON clause. Instead of specifying a filegroup on which to create the table, you specify a partition scheme. You have already defined the partition scheme with a mapping to a partition function, so you need to specify only the column in the table, the partitioning key, to which the partition function will be applied. In the previous example, we created a table named Employee and used the EmployeeID column to partition the table based on the definition of the partition function that was mapped to the partition scheme on which the table is stored. Table 6-4 shows how the data is partitioned in the Employee table, as shown in the following code:

CREATE PARTITION FUNCTION mypartfunction (int)
AS RANGE LEFT FOR VALUES (10,20,30,40,50,60);
GO
CREATE PARTITION SCHEME mypartscheme AS PARTITION mypartfunction
TO (Filegroup1, Filegroup2, Filegroup3, Filegroup4, Filegroup5, Filegroup6, Filegroup7);
GO
CREATE TABLE Employee
    (EmployeeID int NOT NULL,
     FirstName varchar(50) NOT NULL,
     LastName varchar(50) NOT NULL)
ON mypartscheme(EmployeeID);
GO

TABLE 6-4 Partition Function Mapped to a Partition Scheme

FILEGROUP     PARTITION NUMBER    MINIMUM EMPLOYEEID    MAXIMUM EMPLOYEEID
Filegroup1    1                   -∞                    10
Filegroup2    2                   11                    20
Filegroup3    3                   21                    30
Filegroup4    4                   31                    40
Filegroup5    5                   41                    50
Filegroup6    6                   51                    60
Filegroup7    7                   61                    +∞

The partitioning key that is specified must match the data type, length, and precision of the partition function. If the partitioning key is a computed column, the computed column must be PERSISTED.

NOTE  PARTIAL BACKUP AND RESTORE
Partitioning has an interesting management effect on your tables and indexes. Based on the definition of the partition function and partition scheme, it is possible to determine the set of rows that are contained in a given filegroup. By using this information, you can back up and restore a portion of a table, as well as manipulate the data in a portion of a table without affecting any other part of the table.

Creating a Partitioned Index

Similar to creating a partitioned table, you partition an index by specifying a partition scheme in the ON clause, as in the following code example:

CREATE NONCLUSTERED INDEX idx_employeefirstname
    ON dbo.Employee(FirstName)
ON mypartscheme(EmployeeID);
GO

When specifying the partitioning key for an index, you are not limited to the columns on which the index is defined. As you learned in Chapter 4, "Designing SQL Server Indexing," an index can have an optional INCLUDE clause. When you create an index on a partitioned table, SQL Server automatically includes the partitioning key in the definition of each index, thereby allowing you to partition an index the same way as the table is partitioned.
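After a partitioned table is populated, you can confirm how its rows are distributed by grouping on the $PARTITION function. A minimal sketch against the Employee table defined above:

-- Count the rows that landed in each partition of dbo.Employee
SELECT $PARTITION.mypartfunction(EmployeeID) AS partition_number,
       COUNT(*) AS row_count
FROM dbo.Employee
GROUP BY $PARTITION.mypartfunction(EmployeeID)
ORDER BY partition_number;

Partitions that hold no rows simply do not appear in the result, so an empty partition shows up only in catalog views such as sys.partitions.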
Quick Check
1. What property must be set to use a computed column as a partitioning key?
2. Which clause of the CREATE TABLE or CREATE INDEX statement is used to partition the object?

Quick Check Answers
1. A computed column must be PERSISTED.
2. The ON clause is used to specify the storage structure, filegroup, or partition scheme, for the table or index.

PRACTICE  Partitioning a Table

In this practice, you create a partitioned table using the partition function and partition scheme you created in previous exercises.

1. Open a new query window in SSMS and change context to the partitiontest database.

2. Create an orders table on the partition scheme, as follows:

CREATE TABLE dbo.orders (
    OrderID      int      identity(1,1),
    OrderDate    datetime NOT NULL,
    OrderAmount  money    NOT NULL,
    CONSTRAINT pk_orders PRIMARY KEY CLUSTERED (OrderDate, OrderID))
ON partscheme(OrderDate)
GO

3. Populate some data into the orders table by executing the following code:

SET NOCOUNT ON
DECLARE @month int, @day int
SET @month = 1
SET @day = 1
WHILE @month <= 12
BEGIN
    WHILE @day <= 28
    BEGIN
        INSERT dbo.orders (OrderDate, OrderAmount)
        SELECT cast(@month as varchar(2)) + '/' + cast(@day as varchar(2))
            + '/2005', @day * 20
        SET @day = @day + 1
    END
    SET @day = 1
    SET @month = @month + 1
END
GO

4. View the basic data distribution by executing the following:

SELECT * FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.orders')

Lesson Summary
- The ON clause is used to specify the storage structure, filegroup, or partition scheme, on which to store a table or index.
- The partitioning key must match the data type, length, and precision of the partition function.
- A computed column used as a partitioning key must be persisted.

Lesson Review

The following question is intended to reinforce key information presented in Lesson 3, "Creating Partitioned Tables and Indexes." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE  ANSWERS
The answer to the question and an explanation of why each answer choice is right or wrong is located in the "Answers" section at the end of the book.

1. Wide World Importers has a very large and active data warehouse that is required to be accessible to users 24 hours a day, 7 days a week. The DBA team needs to load new sets of data on a weekly basis to support business operations. Inserting large volumes of data would affect users unacceptably. Which feature should be used to minimize the impact while still handling the weekly data loads?
A. Transactional replication
B. The SWITCH operator within partitioning
C. Database mirroring
D. Database snapshots
Lesson 4: Managing Partitions

After you partition a table or index, SQL Server automatically stores the data according to the definition of your partition function and partition scheme. Over time, the partitioning needs of your data can change. In this lesson, you learn how to change the definition of a partition function and partition scheme and how to manage partitions within a database.

After this lesson, you will be able to:
- Add and remove boundary points from a partition function
- Add filegroups to a partition scheme
- Designate a filegroup to be used for the next partition created
- Move partitions between tables
Estimated lesson time: 20 minutes

Split and Merge Operators

With data constantly changing, partitions are rarely static. Two operators are available to manage the boundary point definitions: SPLIT and MERGE. The SPLIT operator introduces a new boundary point into a partition function. MERGE eliminates a boundary point from a partition function. The general syntax is as follows:

ALTER PARTITION FUNCTION partition_function_name()
{SPLIT RANGE ( boundary_value )
 | MERGE RANGE ( boundary_value ) } [ ; ]

You must be very careful when using the SPLIT and MERGE operators. You are either adding or removing an entire partition from the partition function. Data is not removed from the table with these operators, only the partition. Because a partition can reside only in a single filegroup, a SPLIT or MERGE can cause a significant amount of disk I/O as SQL Server relocates rows on the disk.

Altering a Partition Scheme

You can add filegroups to an existing partition scheme to create more storage space for a partitioned table. The general syntax is as follows:

ALTER PARTITION SCHEME partition_scheme_name
NEXT USED [ filegroup_name ] [ ; ]

The NEXT USED clause has two purposes:
1. It adds a new filegroup to the partition scheme, if the specified filegroup is not already part of the partition scheme.
2. It marks the NEXT USED property for a filegroup.
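Putting the two statements together, a typical sequence marks the target filegroup first and then splits. The following is a minimal sketch against the mypartfunction and mypartscheme objects from earlier lessons; the boundary value 70 and the filegroup name Filegroup8 are illustrative assumptions:

-- Designate the filegroup that will receive the partition created by the next SPLIT
ALTER PARTITION SCHEME mypartscheme
NEXT USED Filegroup8;
GO

-- Introduce a new boundary point; the new partition is placed on Filegroup8
ALTER PARTITION FUNCTION mypartfunction()
SPLIT RANGE (70);
GO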
The filegroup that is marked with the NEXT USED flag is the filegroup that contains the next partition that is created when a SPLIT operation is executed.

Index Alignment

You can partition a table and its associated indexes differently. The only requirement is that you must partition the clustered index and the table the same way, because SQL Server cannot store the clustered index in a structure separate from the table. However, if a table and all its indexes are partitioned using the same partition function, they are said to be aligned. If a table and all its indexes use the same partition function and the same partition scheme, the storage is aligned as well. A basic diagram of a storage-aligned table is shown in Figure 6-2.

FIGURE 6-2 Storage alignment

By aligning the storage, rows in a table, along with the indexes dependent upon those rows, are stored in the same filegroups. This ensures that if a single partition is backed up or restored, the data and corresponding indexes are kept together as a single unit.
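To check whether the indexes of a table are storage-aligned, you can look at the data space each index is created on. This is a sketch using standard catalog views; it assumes only a partitioned table such as dbo.orders from the earlier practice:

SELECT i.name       AS index_name,
       ds.name      AS data_space,
       ds.type_desc AS data_space_type  -- PARTITION_SCHEME vs. ROWS_FILEGROUP
FROM sys.indexes AS i
JOIN sys.data_spaces AS ds
    ON i.data_space_id = ds.data_space_id
WHERE i.object_id = OBJECT_ID('dbo.orders');

If every index reports the same partition scheme, the table and its indexes are storage-aligned.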
Switch Operator

At this point, partitioning is probably about as clear as mud. After all, the purpose of partitioning is to split a table and its associated indexes into multiple storage structures, and the purpose of each operator is to manage those multiple storage structures. However, partitioning allows advanced data management features that go well beyond simply storing a portion of a table in a filegroup. To understand the effect, we must take a step back and look at the basic layout of data within SQL Server.

SQL Server stores data on pages in a doubly linked list. To locate and access data, SQL Server performs the following basic process:
1. Resolve the table name to an object ID.
2. Locate the entry for the object ID in sys.indexes to extract the first page for the object.
3. Read the first page of the object.
4. Using the Next Page and Previous Page entries on each data page, walk the page chain to locate the data required.

The first page in an object does not have a previous page; therefore, that entry value is set to 0:0. The last page of the object does not have a next page, so that entry value is set to 0:0. When a value of 0:0 for the next page is located, SQL Server does not have to read any further.

What does the page chain structure have to do with partitioning? When a table is partitioned, the data is physically sorted, split into sections, and stored in filegroups. So, from the perspective of the page chain, SQL Server finds the first page of the object in partition 1; walks the page chain; reaches the last page in partition 1, which points to the first page in partition 2; and continues through the rest of the table. By creating a physical ordering of the data, a very interesting possibility becomes available. If you were to modify the page pointer on the last page of partition 1 to have a value of 0:0 for the next page, SQL Server would not read past that point, and data would "disappear" from the table. There would not be any blocking or deadlocking, because a simple, metadata-only operation had occurred to update the page pointer. The basic idea for a metadata operation is shown in Figure 6-3.

It would be nice to be able to simply discard a portion of a table, but SQL Server does not allow you to throw away data. This is where the SWITCH operator comes in. The basic idea is that SWITCH allows you to exchange partitions between tables in a perfectly scalable manner with no locking, blocking, or deadlocking. SWITCH has several requirements to ensure that the operation is perfectly scalable. The most important requirements are the following:
- The data and index for the source and target tables must be aligned.
- Source and target tables must have the same structure.
- Data cannot be moved from one filegroup to another.
- Two partitions with data cannot be exchanged.
- The target partition must be empty.
- The source or target table cannot participate in replication.
- The source or target tables cannot have full-text indexes or a FILESTREAM data type defined.
FIGURE 6-3 Doubly linked list (the metadata edit sets a next-page pointer to 0:0)

By meeting these requirements, you can accomplish an effect similar to Figure 6-4, in which the data sitting in Table2 is exchanged into the empty partition 4 of Table1 with a single statement:

ALTER TABLE Table2 SWITCH TO Table1 PARTITION 4

FIGURE 6-4 Switching a partition
EXAM TIP
For the 70-432 exam, you need to know that in a SWITCH operation, you cannot move data from one filegroup to another or exchange two partitions with data.

Quick Check
1. Which operators are used to add or remove boundary points from a partition function?
2. Which operator is used to move partitions between tables?

Quick Check Answers
1. The SPLIT operator is used to introduce a new boundary point. The MERGE operator is used to remove a boundary point.
2. The SWITCH operator is used to move partitions between tables.

PRACTICE  Sliding Window Scenario

In this practice, you use the SPLIT, MERGE, and SWITCH operators to remove data from a table so that it can be archived without affecting query performance on the operational table. In the Lesson 3 practice, you set up the orders table with 12 months of order data. In this practice, using the SPLIT operator, you create a new partition for January 2006. Using the SWITCH operator, you remove the partition for January 2005 so that it can be archived. Using the MERGE operator, you eliminate the boundary point for January 2005. The data in the orders table should look like the following:

TABLE 6-5 Orders Table Data Distribution

FILEGROUP    MINIMUM DATE    MAXIMUM DATE
FG1          -∞              12/31/2004
FG2          1/1/2005        1/31/2005
FG3          2/1/2005        2/28/2005
FG4          3/1/2005        3/31/2005
FG5          4/1/2005        4/30/2005
FG6          5/1/2005        5/31/2005
FG7          6/1/2005        6/30/2005
FG8          7/1/2005        7/31/2005
FG9          8/1/2005        8/31/2005
FG10         9/1/2005        9/30/2005
FG11         10/1/2005       10/31/2005
FG12         11/1/2005       11/30/2005
FG13         12/1/2005       +∞

1. Alter the partition scheme to set the NEXT USED flag on FG1, as follows:

ALTER PARTITION SCHEME partscheme
NEXT USED [FG1];
GO

2. Introduce a new boundary point for January 2006, as follows:

ALTER PARTITION FUNCTION partfunc()
SPLIT RANGE ('1/1/2006');
GO

3. Create an archive table for the January 2005 orders, as follows:

CREATE TABLE dbo.ordersarchive (
    OrderID      int      NOT NULL,
    OrderDate    datetime NOT NULL
        CONSTRAINT ck_orderdate CHECK (OrderDate < '2/1/2005'),
    OrderAmount  money    NOT NULL,
    CONSTRAINT pk_ordersarchive PRIMARY KEY CLUSTERED (OrderDate, OrderID)
) ON FG2
GO

4. Use the SWITCH operator to detach the January 2005 partition from orders and attach it to ordersarchive, as follows:

ALTER TABLE dbo.orders
SWITCH PARTITION 2 TO dbo.ordersarchive
GO

5. Remove the boundary point for January 2005, as follows:

ALTER PARTITION FUNCTION partfunc()
MERGE RANGE ('1/1/2005');
GO

6. Verify the contents of the orders and ordersarchive tables.
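One way to perform the verification in step 6 is to compare date ranges and row counts side by side. A minimal sketch (nothing beyond the two practice tables is assumed):

SELECT 'orders' AS table_name,
       MIN(OrderDate) AS min_date,
       MAX(OrderDate) AS max_date,
       COUNT(*) AS row_count
FROM dbo.orders
UNION ALL
SELECT 'ordersarchive',
       MIN(OrderDate),
       MAX(OrderDate),
       COUNT(*)
FROM dbo.ordersarchive;

After the SWITCH and MERGE operations, the orders table should no longer contain any January 2005 rows, and ordersarchive should contain exactly the 28 rows inserted for that month by the Lesson 3 practice.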
Lesson Summary
- SPLIT is used to introduce a new boundary point to a partition function.
- MERGE is used to remove a boundary point from a partition function.
- SWITCH is used to move partitions between tables.

Lesson Review

The following question is intended to reinforce key information presented in Lesson 4, "Managing Partitions." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE  ANSWERS
The answer to this question and an explanation of why each answer choice is right or wrong is located in the "Answers" section at the end of the book.

1. Contoso Limited has a very high-volume order entry system. Management has determined that orders should be maintained in the operational system for a maximum of six months before being archived. After data is archived from the table, it is loaded into the data warehouse. The data load occurs once per month. Which technology is the most appropriate choice for archiving data from the order entry system?
A. Database mirroring
B. Transactional replication
C. Database snapshots
D. Partitioning
Chapter Review

To practice and reinforce the skills you learned in this chapter further, you can:
- Review the chapter summary.
- Review the list of key terms introduced in this chapter.
- Complete the case scenario. This scenario sets up a real-world situation involving the topics of this chapter and asks you to create solutions.
- Complete the suggested practices.
- Take a practice test.

Chapter Summary
- Partitioning allows you to divide a table or index across multiple filegroups.
- Tables and indexes are partitioned horizontally, based on rows, by specifying a partitioning column.
- To create a partitioned table or index, you need to perform the following actions:
  - Create a partition function.
  - Create a partition scheme mapped to the partition function.
  - Create the table or index on the partition scheme.
- The $PARTITION function allows you to limit queries to a specific partition.
- You use the SPLIT operator to add a new boundary point, and hence a partition.
- You use the MERGE operator to remove a boundary point, and hence a partition.
- You use the SWITCH operator to move a partition of data between tables.

Key Terms

Do you know what these key terms mean? You can check your answers by looking up the terms in the glossary at the end of the book.
- Index alignment
- Partition function
- Partitioning key
- Partition scheme

Case Scenario

In the following case scenario, you apply what you've learned in this chapter. You can find answers to these questions in the "Answers" section at the end of this book.
Case Scenario: Building a SQL Server Infrastructure for Coho Vineyard

BACKGROUND

Company Overview

Coho Vineyard was founded in 1947 as a local, family-run winery. Due to the award-winning wines produced over the last several decades, Coho Vineyards has experienced significant growth. Today, the company owns 12 wineries spread across the upper midwestern United States and employs 400 people, 74 of whom work in the central office that houses servers critical to the business.

Planned Changes

Until now, each of the 12 wineries owned by Coho Vineyard has run a separate Web site locally on the premises. Coho Vineyard wants to consolidate the Web presence of these wineries so that Web visitors can purchase products from all 12 wineries from a single online store. All data associated with this Web site will be stored in databases in the central office.

EXISTING DATA ENVIRONMENT

Databases

Each winery presently maintains its own database to store all business information. At the end of each month, this information is brought to the central office and transferred into the databases shown in Table 6-6.

TABLE 6-6 Coho Vineyard Databases

DATABASE      SIZE
Customer      180 MB
Accounting    500 MB
HR            100 MB
Inventory     250 MB
Promotions    80 MB

After the database consolidation project is complete, a new database named Order will serve as a back end to the new Web store. As part of their daily work, employees also will connect periodically to the Order database using a new in-house Web application.

Database Servers

A single server named DB1 contains all the databases at the central office. DB1 is running SQL Server 2008, Enterprise Edition, on Windows Server 2003, Enterprise Edition.

Business Requirements

PERFORMANCE

You need to design an archiving solution for the Customer and Order databases. Your archival strategy should allow the Customer data to be saved for six years.
To prepare the Order database for archiving procedures, you create a partitioned table named Order.Sales. Order.Sales includes two partitions. Partition 1 will include sales activity for the current month. Partition 2 will be used to store sales activity for the previous month. Orders placed before the previous month should be moved to another partitioned table named Order.Archive. Partition 1 of Order.Archive will include all archived data. Partition 2 will remain empty. These two tables and their partitions are illustrated in Figure 6-5.

FIGURE 6-5 Partitions at Coho Vineyards (Order.Sales: Partition 1 holds this month's data, Partition 2 holds last month's data; Order.Archive: Partition 1 holds all data before last month, Partition 2 is empty)

SPLIT, MERGE, and SWITCH operations will be used to move the data among these partitions.

Answer the following questions.

1. Which of the following methods allows you to meet the archival requirements for the Customer database with the least amount of administrative overhead?
A. Import all Customer data into a Microsoft Office Excel spreadsheet. Save the spreadsheets on disk for six years.
B. Perform a monthly backup of all Customer data to tape. Save the backups in a secure location for six years.
C. Create a new database named ArchiveData and use database replication to migrate the Customer data into the new database every month. Keep all data in the ArchiveData database for at least six years.
D. Do not copy the Customer data to another location. Simply save all data in the Customer database for at least six years.
2. How should you move Partition 2 of the Order.Sales table to the Order.Archive table?
A. Use a SPLIT operation to move data to Partition 1 on Order.Archive.
B. Use a SPLIT operation to move data to Partition 2 on Order.Archive.
C. Use a SWITCH operation to move data to Partition 1 on Order.Archive.
D. Use a SWITCH operation to move data to Partition 2 on Order.Archive.

3. Which of the following two partitions must be located on the same filegroup?
A. Partitions 1 and 2 on Order.Sales
B. Partitions 1 and 2 on Order.Archive
C. Partition 2 on Order.Sales and Partition 2 on Order.Archive
D. Partition 1 on Order.Sales and Partition 1 on Order.Archive

Suggested Practices

To help you master the exam objectives presented in this chapter, complete the following tasks.

Partitioning

For this task, you practice partitioning tables and using the SPLIT, MERGE, and SWITCH operators to archive data as well as to load data.

Practice 1  Create a partitioned table and practice adding filegroups as well as splitting and merging partitions.
Practice 2  Using the partitioned table in Practice 1, create an archive table to use with the SWITCH operator to remove data.
Practice 3  Using the archive table created in Practice 2, use the SWITCH operator to append the data to another table.

Take a Practice Test

The practice tests on this book's companion CD offer many options. For example, you can test yourself on just one exam objective, or you can test yourself on all the 70-432 certification exam content. You can set up the test so that it closely simulates the experience of taking a certification exam, or you can set it up in study mode so that you can look at the correct answers and explanations after you answer each question.

MORE INFO  PRACTICE TESTS
For details about all the practice test options available, see the section "How to Use the Practice Tests," in the Introduction to this book.
CHAPTER 7

Importing and Exporting Data

Most applications are designed to allow users to manipulate individual pieces of data. However, there are times when you need to import or export large volumes of data. When importing a large volume of data, an INSERT statement is not very efficient. Likewise, when exporting data, it is not very efficient to return a result set to an application that then has to write the rows out to a file or some other destination. In this chapter, you learn about the features that SQL Server provides to efficiently import as well as export large volumes of data while consuming minimal resources.

Exam objective in this chapter:
- Import and export data.

Lesson in this chapter:
- Lesson 1: Importing and Exporting Data  163

Before You Begin

To complete the lessons in this chapter, you must have:
- Microsoft SQL Server 2008 installed
- The AdventureWorks database installed within the instance

REAL WORLD
Michael Hotek

A couple of years ago, I was working with a customer that had an entire division of the company focused on fulfilling orders for partners. Once per day, partners would upload files to our FTP server with their orders. The files would be parsed; loaded into a database; and then routed through the picking, packing, shipping, and invoicing process. Unfortunately, the process could take anywhere
from two to seven hours to import the orders of each partner, but the business needed all partners' files imported within an hour to meet the agreements with their customers. It was very common to have 30 or more files sitting in a folder waiting to be processed. Furthermore, only about 5 percent of the partners were even allowed to upload orders in bulk, because the system could not handle any additional load.

The system that was built to import the orders was composed of about a dozen C++ applications, in excess of 30 folders spanning three servers, and a small amount of code within the database where the orders were imported. Written over a decade ago, more than 90 percent of the work done within the applications was to move files between directories, and the only purpose of the directories was to isolate files during processing. Further research uncovered code to manage multiple applications attempting to access the same file, a situation that was actually created by the way the whole system was put together. After the files finally got to the point where something was processing data that the business even cared about, we found an application that read one row at a time from the file and processed it. For each row that was processed, the application executed in excess of 100 queries to validate product codes, inventory on hand, price levels, and several other business rules.

We rewrote the entire system to use the bulk import capabilities of SQL Server. The first phase of the rewrite replaced all the C++ code, as well as the entire folder structure, with a single folder, one stored procedure to BCP the files, and one stored procedure to process everything after the file was imported. A subsequent phase replaced the BCP routine with an SSIS package that was capable of processing multiple files in parallel and was more flexible in dealing with multiple data formats.

At the completion of phase 1, the import routine was outrunning the ability of partners to upload files. Within one minute of a file being delivered, all the data was imported into SQL Server and processed, and the orders were at the warehouse queued up for packing. After we finished phase 2, we were able to extend the order upload service to the other 95 percent of the partners, and even the largest partner's order files were processed and acknowledged within 15 seconds of the file being delivered. The direct result to the business was not only the retention of their existing customers, but an increase in their customer base, which fueled a business increase of more than 400 percent within six months of implementing the new system.
Lesson 1: Importing and Exporting Data

The BULK INSERT command and the Bulk Copy Program (BCP) provide limited import and export capabilities. In this lesson, you learn how to use the BCP utility to export as well as import data. You also learn how to import data using the BULK INSERT command and the SQL Server Import and Export Wizard available within SQL Server Management Studio (SSMS).

After this lesson, you will be able to:
- Export data to a file using BCP
- Import data from a file using BCP
- Import data from a file using BULK INSERT
- Import and export data using the SQL Server Import and Export Wizard
Estimated lesson time: 20 minutes

Bulk Copy Program

BCP.exe (BCP) is arguably the oldest piece of code within SQL Server. I can remember using BCP to import and export data before Microsoft even licensed the first version of SQL Server from Sybase. Although it has been enhanced over the more than two decades that I have been using it, the original syntax, purpose, and performance have not changed. Simply put, BCP is the most efficient way to import well-defined data in files into SQL Server, as well as to export tables to a file.

BCP is designed as a very fast, lightweight solution for importing and exporting data, and it is designed to handle well-formatted data. If you need to perform transformations or run error-handling routines during the import/export process, you should use SQL Server Integration Services (SSIS) for your import/export processes.

If you are exporting data using BCP, the account that BCP is running under needs only SELECT permission on the table or view. If you are importing data, the account that BCP is running under needs SELECT, INSERT, and ALTER TABLE permissions.

BCP is a utility that you execute from the command line, and it has the following syntax:

bcp {[[database_name.][owner].]{table_name | view_name} | "query"}
    {in | out | queryout | format} data_file
    [-mmax_errors] [-fformat_file] [-x] [-eerr_file]
    [-Ffirst_row] [-Llast_row] [-bbatch_size]
    [-n] [-c] [-w] [-N] [-V (60 | 65 | 70 | 80)] [-6]
    [-q] [-C { ACP | OEM | RAW | code_page } ] [-tfield_term]
    [-rrow_term] [-iinput_file] [-ooutput_file] [-apacket_size]
    [-Sserver_name[\instance_name]] [-Ulogin_id] [-Ppassword]
    [-T] [-v] [-R] [-k] [-E] [-h"hint [,...n]"]
Although BCP has many options, also known as command-line switches, you most commonly use a small set of them.

CAUTION  CASE SENSITIVITY
All command-line switches for BCP are case-sensitive. For example, you use -e to specify an error file, yet -E tells BCP to preserve identity values during an import.

Below are three examples of common BCP commands:

bcp AdventureWorks.HumanResources.Department out c:\test\department.txt -n -SHOTEK -T

bcp AdventureWorks.HumanResources.Department in c:\test\department.txt -c -SHOTEK -U<login> -P<password>

bcp "SELECT Name, GroupName FROM HumanResources.Department" queryout c:\test\department.txt -n -SHOTEK -T

NOTE  EXPORT SOURCES
All the discussions for importing and exporting data in this chapter refer to a table for simplicity. You can export from tables or views. You can also import data into a view, so long as the view meets the requirements for executing an INSERT statement.

The first argument specifies the table or query that BCP operates upon. The third argument specifies the file that is the source or target of the BCP command. The second argument for the BCP command can be set to in, out, or queryout. When the switch is set to in, the BCP command imports the entire contents of the file specified into the table specified. If the second argument is set to out, BCP exports the entire contents of the table into the file specified. If you want to export only a subset of a table or the result set of a query, you can replace the name of the table with a query delimited by double quotes and then specify the queryout parameter. As the name implies, the queryout option allows you only to export data.

EXAM TIP
The exam tests you on whether you know which import/export option is most appropriate to a given situation.

Following the BCP arguments are a set of command-line switches that you can specify in any order, but most database administrators (DBAs) follow the convention of specifying switches in the same order as they appear in the general syntax listing.

The -n and -c switches are mutually exclusive. The -n switch specifies that the data in the file is in the native format of SQL Server. The -c switch specifies that data in the file is in a character format. If you are exporting data that will be imported to another SQL Server instance, you should use the -n switch, because it provides better performance by allowing SQL Server to dump data in the internal storage format that SQL Server uses on the data pages.
If you need to move the file between database platforms or to other non-SQL Server systems, you should use the -c switch, which converts the data from the native storage format of SQL Server into a standard character-based format. The switch that you specify when importing data from a file is dictated by the format of the file that you receive, because BCP cannot convert data between storage formats during an import.

When you execute Transact-SQL (T-SQL) commands, you don't have to specify the instance or database context, because the connection properties have already encapsulated the connection information. Because BCP is an application, it does not have any database or instance context; therefore, you have to specify the connection information to use. The -S switch specifies the instance name to connect to. You can log in to an instance using either SQL Server or Microsoft Windows credentials. The -T switch designates a trusted connection, in which BCP uses the Windows credentials of the account that is executing the BCP command to connect. You can use a SQL Server login by specifying the -U and -P switches: -U specifies the login name, and -P specifies the password to use.

NOTE  ENFORCING CHECK CONSTRAINTS AND TRIGGERS
When you import data into a table using BCP, triggers and check constraints are disabled by default. If you want to enforce check constraints and fire triggers during the import, you need to use the -h switch. If you do not disable triggers and check constraints during an import, you do not need ALTER TABLE permissions.
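For example, to import a file while still enforcing constraints and firing triggers, you can pass both hints with the -h switch. This is a minimal sketch, not one of the book's practices; the table name, file path, and instance name are illustrative assumptions:

bcp AdventureWorks.dbo.orders in c:\test\orders.txt -c -S<instance name> -T -h "CHECK_CONSTRAINTS, FIRE_TRIGGERS"

Because the hints leave the constraints and triggers enabled, the account running this command needs only SELECT and INSERT permissions on the table, not ALTER TABLE.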
The BULK INSERT Command

One of the drawbacks of the BCP utility is that it is a command-line program. The BULK INSERT command has many of the same options as BCP and behaves almost identically, except for the following two differences:
- BULK INSERT cannot export data.
- BULK INSERT is a T-SQL command and does not need to specify the instance name or login credentials.

The general syntax for BULK INSERT is:

BULK INSERT
    [ database_name . [ schema_name ] . | schema_name . ]
    [ table_name | view_name ]
FROM 'data_file'
[ WITH (
    [ [ , ] BATCHSIZE = batch_size ]
    [ [ , ] CHECK_CONSTRAINTS ]
    [ [ , ] CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' } ]
    [ [ , ] DATAFILETYPE = { 'char' | 'native' | 'widechar' | 'widenative' } ]
    [ [ , ] FIELDTERMINATOR = 'field_terminator' ]
    [ [ , ] FIRSTROW = first_row ]
    [ [ , ] FIRE_TRIGGERS ]
    [ [ , ] FORMATFILE = 'format_file_path' ]
    [ [ , ] KEEPIDENTITY ]
    [ [ , ] KEEPNULLS ]
    [ [ , ] KILOBYTES_PER_BATCH = kilobytes_per_batch ]
    [ [ , ] LASTROW = last_row ]
    [ [ , ] MAXERRORS = max_errors ]
    [ [ , ] ORDER ( { column [ ASC | DESC ] } [ ,...n ] ) ]
    [ [ , ] ROWS_PER_BATCH = rows_per_batch ]
    [ [ , ] ROWTERMINATOR = 'row_terminator' ]
    [ [ , ] TABLOCK ]
    [ [ , ] ERRORFILE = 'file_name' ]
)]
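A minimal sketch of a typical call follows. It assumes an empty staging table, dbo.DepartmentStaging, with the same structure as HumanResources.Department, and a file produced by a character-mode (-c) BCP export:

BULK INSERT dbo.DepartmentStaging
FROM 'c:\test\department.txt'
WITH (
    DATAFILETYPE = 'char',
    FIELDTERMINATOR = '\t',  -- tab: the default bcp -c field terminator
    ROWTERMINATOR = '\n',    -- newline: the default bcp -c row terminator
    TABLOCK                  -- take a table lock, one prerequisite for minimally logged loads
);
GO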
The SQL Server Import and Export Wizard

BCP and the BULK INSERT command provide a simple, lightweight means to import and export data through the use of files. If you want to import and export data directly between a source and destination, as well as apply transformations and error-handling routines, you can use the capabilities of SSIS to build packages. The SQL Server Import and Export Wizard provides a subset of the SSIS capabilities within SSMS to enable administrators to move data between a source and destination. You access the wizard by right-clicking a database within Object Explorer, selecting Tasks, and then selecting either Import Data or Export Data.

Although BCP and BULK INSERT use files, the Import and Export Wizard can use any data source that is recognized by SSIS, such as Microsoft Office Excel, Microsoft Office Access, or Extensible Markup Language (XML) files. In addition, the Import and Export Wizard supports any data source or destination for which you have an Object Linking and Embedding Database (OLE DB) provider. BCP and BULK INSERT require a SQL Server instance to be either the source or target for the data, but the Import and Export Wizard does not require a SQL Server instance to be either the source or destination. Finally, the Import and Export Wizard can move data from multiple tables or files in a single operation, whereas BCP and BULK INSERT can operate only against a single table, view, or query.

MORE INFO  INTEGRATION SERVICES
The full capabilities of SSIS packages are beyond the scope of this book. For an overview of SSIS capabilities, please refer to Microsoft SQL Server 2008 Step by Step by Mike Hotek (Microsoft Press, 2008).

Quick Check
1. What are the data formats that BCP supports, and what are the command-line switches for each format?
2. Which parameter do you specify to export data using a query?
3. The Import and Export Wizard is based on which feature of SQL Server?
4. Which sources and destinations is the Import and Export Wizard capable of using?

Quick Check Answers
1. BCP can work with data in either a character or native format. The -c switch designates character mode, while the -n switch is used for native mode.
2. The queryout parameter is used to export the result set of a query.
3. The Import and Export Wizard uses a subset of the SSIS feature set.
4. You can define any source or destination for which you have an OLE DB provider.

PRACTICE  Exporting Data

In these practices, you export the contents of the HumanResources.Department table to both native and character-based files. You also use the Import and Export Wizard to export the entire contents of the AdventureWorks database.

PRACTICE 1  Export Data Using BCP

In this practice, you export the contents of the HumanResources.Department table.

1. Open a command-prompt window and execute the following command:

bcp AdventureWorks.HumanResources.Department out c:\test\department.txt -c -S<instance name> -T

2. Open the file generated in Notepad and inspect the results.

3. Execute the following command from the command prompt:

bcp AdventureWorks.HumanResources.Department out c:\test\department.bcp -n -S<instance name> -T

4. Open the file generated in Notepad and inspect the results.

PRACTICE 2  Exporting Tables

In this practice, you export the data from the AdventureWorks database to a new database named AdventureWorksTest.

1. In SSMS, execute the following command from a query window:

CREATE DATABASE AdventureWorksTest

2. In Object Explorer, right-click the AdventureWorks database, select Tasks, and then select Export Data.

3. Click Next when the Welcome To SQL Server Import And Export Wizard page appears.

4. Specify the AdventureWorks database as the source, as shown here, and click Next.
5. Specify the AdventureWorksTest database as the destination, as shown here, and click Next.
6. Select Copy Data From One Or More Tables Or Views, and click Next.

7. Select all the tables in the AdventureWorks database, as shown here.

8. In the Source list, select the AWBuildVersion table, as shown here, and click Edit Mappings.
9. Inspect the options that are available as the data is moved from source to destination. Click Cancel, and then click Next.

10. When SQL Server moves data using SSIS, it translates the data types to .NET data types, because SSIS is based on C#.NET. C#.NET does not have a data type equivalent for the hierarchyid, geography, or geometry data types. Therefore, you cannot use the Import and Export Wizard or SSIS if you need to work with these three data types. If one or more of these data types are present, you see the error message shown here. Click Back.

11. Clear the [HumanResources].[Employee], [Person].[Address], [Production].[Document], and [Production].[ProductDocument] table check boxes, and click Next.

12. Verify that the Run Immediately check box is selected, as shown here, and click Next.

13. Review the actions to be performed, as shown here. When all actions have been performed, click Finish.
Lesson Summary
- BCP is a lightweight, command-line utility that allows you to import and export data.
- The BCP utility is not designed to provide data transformation or error-handling routines.
- In addition to exporting the entire contents of a table or view, you can export the results of a query by using the queryout argument of the BCP utility.
- BULK INSERT is a T-SQL command you can use only to import data.
- The Import and Export Wizard, based on a subset of SSIS, allows you to move data directly between a source and destination without requiring the use of a file.

Lesson Review

The following questions are intended to reinforce key information presented in this lesson. The questions are also available on the companion CD if you prefer to review them in electronic form.

NOTE  ANSWERS
Answers to these questions and explanations of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. You want to import data into the Orders table. The table has triggers and check constraints that you want to be checked to guarantee integrity. You choose to use the BCP utility and specify the -h "CHECK_CONSTRAINTS, FIRE_TRIGGERS" hint to accomplish your task. Which of the following permissions must be in place?
A. SELECT permission on the Orders table
B. ALTER TABLE on the Orders table
C. INSERT permission on the Orders table
D. A member of the bulkadmin role

2. You are performing a migration of the Order database at Contoso from Oracle to SQL Server. The Order database contains several hundred tables. The CustomerAddress table has an XML column named AddressBook. What is the most efficient, least intrusive way to move the data to the new SQL Server database?
A. Move the Order database from Oracle to SQL Server using replication.
B. Unload the data using Oracle utilities and load the data into SQL Server using BCP.
C. Move the Order database using the Import and Export Wizard.
D. Move the data from Oracle to SQL Server using the OPENROWSET function.
Chapter Review

To practice and reinforce the skills you learned in this chapter further, you can perform the following tasks:
- Review the chapter summary.
- Review the list of key terms introduced in this chapter.
- Complete the case scenario. The scenario sets up a real-world situation involving the topics in this chapter and asks you to create a solution.
- Complete the suggested practices.
- Take a practice test.

Chapter Summary
- BCP is a program that allows you to import data from a file into a table, as well as export data from a table to a file.
- BULK INSERT is a T-SQL command that allows you to import data from a file into a table.
- The Import and Export Wizard uses a subset of the SSIS feature set to move data between a source and destination.

Key Terms

Do you know what these key terms mean? You can check your answers by looking up the terms in the glossary at the end of the book.
- Bulk Copy Program (BCP)
- BULK INSERT

Case Scenario

In the following case scenario, you apply what you've learned in this chapter. You can find answers to these questions in the "Answers" section at the end of this book.

Case Scenario: Designing an Import Strategy for Coho Vineyard

BACKGROUND

Company Overview

Coho Vineyard was founded in 1947 as a local, family-run winery. Due to the award-winning wines it has produced over the last several decades, Coho Vineyards has experienced significant growth. To continue expanding, several existing wineries were acquired over the years. Today, the company owns 16 wineries; 9 wineries are in Washington, Oregon, and California, and the remaining 7 wineries are located in Wisconsin and Michigan. The wineries
employ 532 people, 162 of whom work in the central office that houses servers critical to the business. The company has 122 salespeople who travel around the world and need access to up-to-date inventory availability.

PLANNED CHANGES

Until now, each of the 16 wineries owned by Coho Vineyard has run a separate Web site locally on the premises. Coho Vineyard wants to consolidate the Web presence of these wineries so that Web visitors can purchase products from all 16 wineries from a single online store. All data associated with this Web site will be stored in databases in the central office. When the data is consolidated at the central office, merge replication will be used to deliver data to the salespeople as well as to allow them to enter orders. To meet the needs of the salespeople until the consolidation project is completed, inventory data at each winery is sent to the central office at the end of each day.

EXISTING DATA ENVIRONMENT

Databases

Each winery presently maintains its own database to store all business information. At the end of each month, this information is brought to the central office and transferred into the databases shown in Table 7-1.

TABLE 7-1 Coho Vineyard Databases

DATABASE      SIZE
Customer      180 megabytes (MB)
Accounting    500 MB
HR            100 MB
Inventory     250 MB
Promotions    80 MB

After the database consolidation project is complete, a new database named Order will serve as a data store for the new Web store. As part of their daily work, employees also will connect periodically to the Order database using a new in-house Web application. The HR database contains sensitive data and is protected using Transparent Data Encryption (TDE). In addition, data in the Salary table is encrypted using a certificate.

Database Servers

A single server named DB1 contains all the databases at the central office. DB1 is running SQL Server 2008 Enterprise on Windows Server 2003 Enterprise.
Business Requirements
You need to design an archiving solution for the Customer and Order databases. Your archival strategy should allow the Customer data to be saved for six years.

To prepare the Order database for archiving procedures, you create a partitioned table named Order.Sales. Order.Sales includes two partitions. Partition 1 includes sales activity for the current month. Partition 2 is used to store sales activity for the previous month. Orders placed before the previous month should be moved to another partitioned table named Order.Archive. Partition 1 of Order.Archive includes all archived data. Partition 2 remains empty.

A process needs to be created to load the inventory data from each of the 16 wineries by 4 A.M. daily.

Four large customers submit orders using Coho Vineyard's Extensible Markup Language (XML) schema for Electronic Data Interchange (EDI) transactions. The EDI files arrive by 5 P.M. and need to be parsed and loaded into the Customer, Accounting, and Inventory databases, which each contain tables relevant to placing an order. The EDI import routine is currently a single-threaded C++ application that takes between three and six hours to process the files. You need to finish the EDI process by 5:30 P.M. to meet your Service Level Agreement (SLA) with the customers. After the consolidation project has finished, the EDI routine loads all data into the new Order database.

You need to back up all databases at all locations. You can lose a maximum of five minutes of data under a worst-case scenario. The Customer, Account, Inventory, Promotions, and Order databases can be off-line for a maximum of 20 minutes in the event of a disaster. Data older than six months in the Customer and Order databases can be off-line for up to 12 hours in the event of a disaster.

Answer the following questions.
1. What method should be used to move the data from each winery into the central database?
2. What method would provide the most flexible way to handle all the EDI submissions?

Suggested Practices

To help you master the exam objectives presented in this chapter, complete the following tasks.

Import and Export Data
- Practice 1: Use BCP in character mode to export the contents of a table to a file.
- Practice 2: Use BULK INSERT to import the contents of the file generated in Practice 1 into a table.
- Practice 3: Learn SSIS. The SSIS platform has capabilities to accomplish any import/export or data manipulation process that you need for your environment.
Take a Practice Test

The practice tests on this book's companion CD offer many options. For example, you can test yourself on just one exam objective, or you can test yourself on all the 70-432 certification exam content. You can set up the test so that it closely simulates the experience of taking a certification exam, or you can set it up in study mode so that you can look at the correct answers and explanations after you answer each question.

MORE INFO: PRACTICE TESTS
For details about all the practice test options available, see the section entitled "How to Use the Practice Tests," in the Introduction to this book.
CHAPTER 8
Designing Policy Based Management

Prior to Microsoft SQL Server 2008, you performed configuration management of an environment by using a conglomeration of documents, scripts, and manual checking. The configuration options, naming conventions, and allowed feature set were outlined in one or more documents. To enforce your standards, you would have had to connect to each instance and execute scripts that needed to be maintained and updated with new versions and service packs. In this chapter, you learn about the new Policy Based Management framework that allows you to check and enforce policy compliance across your entire SQL Server infrastructure.

Exam objectives in this chapter:
- Implement the declarative management framework (DMF).
- Configure surface area.

Lesson in this chapter:
- Lesson 1: Designing Policies

Before You Begin
To complete the lessons in this chapter, you must have:
- SQL Server 2008 installed
- The AdventureWorks database installed within the instance

REAL WORLD
Michael Hotek

Managing a single server running SQL Server or even a small group of them, one at a time, has always been reasonably straightforward. However, when you needed to uniformly manage an entire SQL Server environment or a large group of instances, you had to either write a large amount of custom code or purchase additional products.
One customer I work with has an environment with more than 5,000 SQL Server instances. Prior to the release of SQL Server 2008, two DBAs were required to manage the almost 50,000 lines of code that checked instances for compliance with corporate policies. They devoted more than 70 hours each week to maintaining the code and checking systems. After deploying SQL Server 2008, they started to convert all their code to policies. After the conversion was completed, they estimated that less than 1,000 lines of custom logic remained. By using the central management features to check and enforce policies across the environment, they should be able to save over 3,000 hours of management and maintenance time per year.
Lesson 1: Designing Policies

SQL Server 2008 has a new feature called Policy Based Management, also known as the declarative management framework (DMF), to tackle the problem of standardizing your SQL Server instances. Although Policy Based Management can be used just to alert an administrator when an object is out of compliance, depending upon the type of policy, you can also enforce compliance by preventing changes that would violate a policy. Policy Based Management introduces the following new objects that are used to design and check for compliance:
- Facets
- Conditions
- Policies
- Policy targets
- Policy categories

After this lesson, you will be able to:
- Create conditions
- Define policies
- Specify targets for policy checking
- Configure policy categories
- Check for policy compliance
- Import and export policies

Estimated lesson time: 30 minutes

Facets

Facets are the core object upon which your standards are built. Facets define the type of object or option to be checked, such as database, Surface Area, and login. SQL Server ships with 74 facets, implemented as .NET assemblies, each with a unique set of properties. All the objects for Policy Based Management are stored within the msdb database. You can get a list of the facets available by querying the dbo.syspolicy_management_facets table. Unfortunately, unless you want to write code to interact with Server Management Objects (SMOs), the only way to get a list of facet properties is to open each facet in SQL Server Management Studio (SSMS), one at a time, and view the list of properties.
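The query below is a minimal sketch of that lookup. The lesson names the table but not its columns, so the name column is an assumption about the schema to verify on your instance.

-- List the Policy Based Management facets registered in msdb
SELECT name
FROM msdb.dbo.syspolicy_management_facets
ORDER BY name;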
Conditions

When you define a WHERE clause for a data manipulation language (DML) statement, you set a condition for the DML statement that defines the set of rows that meet your specific inclusion criteria. Within the Policy Based Management framework, conditions are the equivalent of a WHERE clause that defines the criteria needing to be checked.

You define the conditions that you want to check or enforce for a policy by defining criteria for the properties of a facet. Just like a WHERE clause, a condition can be defined by one or more facet properties, and a single facet property can be checked for multiple criteria. The comparison operators that can be used are restricted by the data type of the property. For example, a property of type string can be checked with =, <>, LIKE, NOT LIKE, IN, or NOT IN, whereas a boolean type can only be checked for = and <>.

If a condition that you want to check for a facet does not have a specific property that can be used, you can use the advanced editor to define complex conditions that compare multiple properties and incorporate functions. For example, you can check that every table has a primary key and that a table with a single index must be clustered. Unfortunately, if you define a condition using the advanced editor, a policy that incorporates the condition must be executed manually and cannot be scheduled.

Conditions are checked in a single step. You cannot have a condition pull a list of objects, iterate across the list of objects, and then apply subsequent checks. To work within the Policy Based Management framework, conditions need to return a True or False value. Therefore, when building complex conditions with the advanced editor, you cannot return a list of objects that do not meet your criteria. You have to define the condition such that if any object does not meet your criteria, a value of False is returned.

Although you can check many properties of a facet within a single condition, a single condition can't be defined for multiple facets. For example, you can check all 10 of the properties for the Surface Area Configuration facet in a single condition, but you have to define a second condition to check a property of the Surface Area Configuration for Analysis Services.
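Like facets, the conditions you create are stored in msdb and can be inventoried with a query. This is a sketch only: the syspolicy_conditions view and its name and facet columns are assumptions based on the lesson's statement that all Policy Based Management objects live in msdb, not details given in the text.

-- List each condition and the single facet it is defined against
SELECT name, facet
FROM msdb.dbo.syspolicy_conditions
ORDER BY facet, name;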
Policy Targets

Conditions are the foundation for policies. However, you don't always want to check policies across every object available, such as every database in an instance or every index within every database. Conditions can also be used to specify the objects to compare the condition against, called policy targeting or target sets. You can target a policy at the server level, such as instances that are SQL Server 2005 or SQL Server 2008. You can also target a policy at the database level, such as all user databases or all system databases.

Policies

Policies are created for a single condition and set to either enforce or check compliance. The execution mode can be set as follows:
- On demand: Evaluates the policy when directly executed by a user
- On change, prevent: Creates data definition language (DDL) triggers to prevent a change that violates the policy
- On change, log only: Checks the policy automatically when a change is made, using the event notification infrastructure
- On schedule: Creates a SQL Server Agent job to check the policy on a defined schedule

If a policy contains a condition that was defined using the advanced editor, the only available execution mode is On demand. To use the On change, prevent and On change, log only execution modes, the policy must target instances running SQL Server 2005 and above. The On change, log only execution mode uses the event notification infrastructure that is available only for SQL Server 2005 and later. The On change, prevent execution mode depends on DDL triggers to prevent a change that is not in compliance with the policy, and DDL triggers are available only for SQL Server 2005 and later.

In addition, you can set a policy to On change, prevent only if it is possible for a DDL trigger to prevent the change. For example, you could prevent the creation of an object that violated your naming conventions, but you could not enforce a policy that all databases have to be in the Full recovery model, because the ALTER DATABASE command executes outside the context of a transaction.

Policy Categories

Policy categories can be used to group one or more policies into a single compliance unit. If not specified, all policies belong to the DEFAULT category. To check or enforce policies, you create a subscription to one or more policies. Subscription occurs at two levels: instance and database. A member of the sysadmin role can subscribe an instance to a policy category. Once subscribed, the owner of each database within the instance can subscribe their database to a policy category.

Each policy category has a Mandate property that applies to databases. When a policy category is set to Mandate and a sysadmin subscribes the instance to a policy category, all databases that meet the target set are controlled by the policies within the policy category. A policy subscription to a policy category set to Mandate cannot be overridden by a database owner.
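Policies, too, can be inventoried from msdb. The sketch below assumes the syspolicy_policies view exposes an execution_mode column and that the numeric codes map to the modes as shown; treat both as assumptions to verify on your instance rather than facts stated in this lesson.

-- List each policy with a readable execution mode
SELECT name,
       CASE execution_mode
            WHEN 0 THEN 'On demand'
            WHEN 1 THEN 'On change, prevent'
            WHEN 2 THEN 'On change, log only'
            WHEN 4 THEN 'On schedule'
       END AS execution_mode
FROM msdb.dbo.syspolicy_policies
ORDER BY name;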
Policy Compliance

Because you cannot set all policies to enforce compliance, you need to manually check policies that cannot be enforced on a regular basis. You view policies that apply to an instance by right-clicking the name of the instance within Object Explorer and selecting Policies, View. You can check policies that apply to an instance by right-clicking the name of the instance within Object Explorer and selecting Policies, Evaluate. You can check all policies within an instance, as shown in Figure 8-1, by right-clicking the Policies node and selecting Evaluate.

FIGURE 8-1 Evaluate policies

By clicking Evaluate, you execute the policies and review the results, as shown in Figure 8-2.

FIGURE 8-2 Policy check results
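Manual checks leave a trail in msdb that you can review afterward. As a sketch only: the syspolicy_policy_execution_history view and the columns selected here are assumptions about the Policy Based Management tables, not details documented in this lesson.

-- Review recent policy evaluations and their outcomes
SELECT policy_id, start_date, result
FROM msdb.dbo.syspolicy_policy_execution_history
ORDER BY start_date DESC;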
EXAM TIP
Defining a condition to be used as a policy target is a critical component of well-defined policies. A policy fails during a check if the object does not conform to the criteria, as well as if the property does not exist. For example, attempting to check that the Web Assistant is disabled against a SQL Server 2008 instance fails because the feature does not exist.

Central Management Server

Policy Based Management would be limited to SQL Server 2008 and be very tedious if you had to do any of the following:
- Duplicate policies on every instance
- Create subscriptions to each instance in your environment individually
- Check compliance for each instance individually

Within the Registered Servers pane in SSMS, you can configure a Central Management Server. Underneath the Central Management Server, you can create multiple levels of folders and register instances into the appropriate folder. After you have the Central Management Server structure set up in SSMS, you can evaluate policies against a specific instance, folder, or all instances underneath the Central Management Server. Figure 8-3 shows an example of a Central Management Server.

FIGURE 8-3 Central Management Server

Import and Export Policies

Policies and conditions can be exported to files as well as imported from files. SQL Server ships with 53 policies that are located in the Microsoft SQL Server\100\Tools\Policies folder. There are 50 policies for the database engine, 2 policies for Reporting Services, and 1 policy for Analysis Services. The CodePlex site (http://www.codeplex.com) has additional policies that you can download and import.
You can import policies within the Registered Servers pane or the Object Explorer. Within Object Explorer, you can right-click the Policies node underneath Policy Management and select Import Policy. Within Registered Servers, you can right-click the Central Management Server or any folder or instance underneath the Central Management Server and select Import Policies. If you import policies from the Central Management Server, the policies are imported to every instance defined underneath the Central Management Server, but not to the Central Management Server itself. Likewise, right-clicking a folder imports the policies to all instances within the folder hierarchy. To import policies to the Central Management Server, you must connect to the instance within Object Explorer and import from the Policies node.

Quick Check
1. What are the five objects that are used within Policy Based Management?
2. What are the allowed execution modes for a policy?
3. Which object has a property that allows you to mandate checking for all databases on an instance?
4. How many facets can be checked within a single condition?
5. How many conditions can be checked within a single policy?

Quick Check Answers
1. The objects that are used with Policy Based Management are facets, conditions, policies, policy targets, and policy categories.
2. The policy execution modes are On demand, On schedule, On change, log only, and On change, prevent.
3. Policy categories allow you to mandate checking of all databases within an instance.
4. A condition can be defined on only one facet.
5. A policy can check only a single condition.

PRACTICE: Defining Policies and Checking for Compliance

In these practices, you define and check several policies for your environment.

PRACTICE 1: Create a Condition

In this practice, you create a condition for each of the following:
- Check that a database does not have the auto shrink or auto close properties set.
- Check that CLR, OLE Automation, Ad Hoc Remote Queries, and SQL Mail are all disabled.
- Check that a database is not in the Simple recovery model.
- Check that all tables have a primary key.
1. In Object Explorer, expand the Policy Management node within the Management node.
2. Right-click the Conditions node and select New Condition.
3. Configure the condition as shown here. Click OK when you are done.
4. Right-click the Conditions node again, select New Condition, and configure the condition as shown here. Click OK.
5. Right-click the Conditions node, select New Condition, and configure this third condition as shown here. Click OK when you are finished.
6. Right-click the Conditions node and select New Condition. Select the Table facet, click the ellipsis button next to the Field column to display the Advanced Edit dialog box, enter the following code in the Cell Value text box, and click OK:

IsNull(ExecuteSql('Numeric', 'SELECT 1 FROM sys.tables a INNER JOIN sys.indexes b ON a.object_id = b.object_id WHERE b.is_primary_key = 1 AND a.name = @@ObjectName AND a.schema_id = SCHEMA_ID(@@SchemaName)'), 0)

7. Configure the Name, Operator, and Value as shown here, and then click OK.

PRACTICE 2: Create a Condition for a Target Set

In this practice, you create a condition to target all SQL Server 2005 and later instances, along with a condition to target all user databases that are online.
1. Right-click the Conditions node, select New Condition, and configure the condition as shown here. Click OK.
2. Right-click the Conditions node, select New Condition, and configure the condition as shown here. Click OK when you are done.

PRACTICE 3: Create a Policy

In this practice, you create policies that use the conditions you just created to do the following:
- Check that a database does not have the auto shrink or auto close properties set.
- Check that CLR, OLE Automation, Ad Hoc Remote Queries, and SQL Mail are all disabled.
- Check that a database is not in the Simple recovery model.
- Check that all tables have a primary key.

1. Right-click the Policies node, select New Policy, and configure the policy as shown here. Click OK.
2. Right-click the Policies node, select New Policy, and configure this second policy as shown here. Click OK.
3. Right-click the Policies node, select New Policy, and configure the policy as shown here. Click OK.
4. Right-click the Policies node, select New Policy, and configure the last policy as shown here. Click OK.
PRACTICE 4: Create a Policy Category

In this practice, you create two policy categories for the policies that you created.

1. Right-click Policy Management, select Manage Categories, and create the categories as shown here. Click OK.
2. In SSMS, in the console tree, expand the Policies folder. Right-click the Check For Auto Shrink And Auto Close policy, select Properties, click the Description tab, and change the category to Database Best Practices. Click OK.
3. Right-click the Check For Simple Recovery Model policy, select Properties, select the Description tab, and change the category to Database Best Practices. Click OK.
4. Right-click the Check For Surface Area Configuration policy, select Properties, click the Description tab, and change the category to Instance Surface Area Best Practices. Click OK.
5. Right-click the Check Tables For Primary Key policy, select Properties, select the Description tab, and change the category to Database Best Practices. Click OK.

PRACTICE 5: Import Policies

In this practice, you import the policies that ship with SQL Server.

1. Right-click the Policies node underneath Policy Management and select Import Policy.
2. Click the ellipsis button next to the Files To Import text box, navigate to the Microsoft SQL Server\100\Tools\Policies\DatabaseEngine\1033 folder, select all the files in the folder, as shown here, and click Open.
3. Select the Replace Duplicates With Items Imported check box, select Preserve Policy State On Import, and click OK.
4. Take the time to browse the policies and conditions that were created during the import.

Lesson Summary
- You can build policies to enforce conditions across any version of SQL Server.
- A policy can enforce a single condition, and each condition can be based on a single facet.
- Policy categories allow you to group policies together for compliance checking. A policy category can be set with the Mandate property, which requires the policy to be checked against all databases within an instance.

Lesson Review

The following question is intended to reinforce key information presented in this lesson. The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect is located in the "Answers" section at the end of the book.
1. You have defined several policies that you want applied to all databases within an instance. How do you ensure that a database owner is not allowed to avoid the policy check with the least amount of administrative effort?
A. Create a condition that checks all databases.
B. Add the policy to a user-defined policy category and set the Mandate property.
C. Add the policy to the default policy category.
D. Check the policies manually against the instance.
Chapter Review

To practice and reinforce the skills you learned in this chapter further, you can perform the following tasks:
- Review the chapter summary.
- Review the list of key terms introduced in this chapter.
- Complete the case scenario. The scenario sets up a real-world situation involving the topics in this chapter and asks you to create a solution.
- Complete the suggested practices.
- Take a practice test.

Chapter Summary
- Facets are the .NET assemblies that define the set of properties for an object upon which conditions are built.
- A condition can be defined for a single facet, and a policy can be defined for a single condition.
- Policies can be checked manually or automatically. Automatic policy checking can be performed on a scheduled basis or by using the event notification infrastructure.
- A database owner can subscribe a database to one or more policies; however, a policy that belongs to a policy category set with the Mandate property requires checking against all databases.

Key Terms

Do you know what these key terms mean? You can check your answers by looking up the terms in the glossary at the end of the book.
- Condition
- Facet
- Policy category
- Policy target

Case Scenario

In the following case scenario, you apply what you've learned in this chapter. You can find answers to these questions in the "Answers" section at the end of this book.
Case Scenario: Designing a Management Strategy for Coho Vineyard

BACKGROUND

Company Overview
Coho Vineyard was founded in 1947 as a local, family-run winery. Due to the award-winning wines it has produced over the last several decades, Coho Vineyard has experienced significant growth. To continue expanding, several existing wineries were acquired over the years. Today, the company owns 16 wineries; 9 wineries are in Washington, Oregon, and California, and the remaining 7 wineries are located in Wisconsin and Michigan. The wineries employ 532 people, 162 of whom work in the central office that houses servers critical to the business. The company has 122 salespeople who travel around the world and need access to up-to-date inventory availability.

Planned Changes
Until now, each of the 16 wineries owned by Coho Vineyard has run a separate Web site locally on the premises. Coho Vineyard wants to consolidate the Web presence of these wineries so that Web visitors can purchase products from all 16 wineries from a single online store. All data associated with this Web site will be stored in databases in the central office. When the data is consolidated at the central office, merge replication will be used to deliver data to the salespeople as well as to allow them to enter orders. To meet the needs of the salespeople until the consolidation project is completed, inventory data at each winery is sent to the central office at the end of each day. Management wants to ensure that you cannot execute stored procedures written in C#.NET or use the OPENROWSET or OPENDATASOURCE command.

EXISTING DATA ENVIRONMENT

Databases
Each winery presently maintains its own database to store all business information. At the end of each month, this information is brought to the central office and transferred into the databases shown in Table 8-1.

TABLE 8-1 Coho Vineyard Databases

DATABASE      SIZE
Customer      180 megabytes (MB)
Accounting    500 MB
HR            100 MB
Inventory     250 MB
Promotions    80 MB
After the database consolidation project is complete, a new database named Order will serve as a data store to the new Web store. As part of their daily work, employees also will connect periodically to the Order database using a new in-house Web application. The HR database contains sensitive data and is protected using Transparent Data Encryption (TDE). In addition, data in the Salary table is encrypted using a certificate.

Database Servers
A single server named DB1 contains all the databases at the central office. DB1 is running SQL Server 2008 Enterprise on Windows Server 2003 Enterprise.

Business Requirements
You need to design an archiving solution for the Customer and Order databases. Your archival strategy should allow the Customer data to be saved for six years.

To prepare the Order database for archiving procedures, you create a partitioned table named Order.Sales. Order.Sales includes two partitions. Partition 1 includes sales activity for the current month. Partition 2 is used to store sales activity for the previous month. Orders placed before the previous month will be moved to another partitioned table named Order.Archive. Partition 1 of Order.Archive includes all archived data. Partition 2 remains empty.

A process needs to be created to load the inventory data from each of the 16 wineries by 4 A.M. daily.

Four large customers submit orders using Coho Vineyard's Extensible Markup Language (XML) schema for Electronic Data Interchange (EDI) transactions. The EDI files arrive by 5 P.M. and need to be parsed and loaded into the Customer, Accounting, and Inventory databases, which each contain tables relevant to placing an order. The EDI import routine is currently a single-threaded C++ application that takes between three and six hours to process the files. You need to finish the EDI process by 5:30 P.M. to meet your Service Level Agreement (SLA) with the customers. After the consolidation project finishes, the EDI routine loads all data into the new Order database.

You need to back up all databases at all locations. All production databases are required to be configured with the Full recovery model. You can lose a maximum of five minutes of data under a worst-case scenario. The Customer, Account, Inventory, Promotions, and Order databases can be off-line for a maximum of 20 minutes in the event of a disaster. Data older than six months in the Customer and Order databases can be off-line for up to 12 hours in the event of a disaster.

Answer the following question.
What policies would you implement to check and enforce the business requirements for Coho Vineyard?
Suggested Practices

To help you master the exam objectives presented in this chapter, complete the following tasks.

Implement Policy Based Management
- Practice 1: Configure a policy to check the surface area configuration for all your SQL Server instances.
- Practice 2: Configure a policy to check the last time a database was successfully backed up.
- Practice 3: Configure policies to check the membership of the sysadmin and db_owner roles.
- Practice 4: Configure a policy to ensure that databases are not set to either auto shrink or auto close.
- Practice 5: Based on the policies that ship with SQL Server 2008, decide which policies apply to your environment and implement the policy checks.

Take a Practice Test

The practice tests on this book's companion CD offer many options. For example, you can test yourself on just one exam objective, or you can test yourself on all the 70-432 certification exam content. You can set up the test so that it closely simulates the experience of taking a certification exam, or you can set it up in study mode so that you can look at the correct answers and explanations after you answer each question.

MORE INFO: PRACTICE TESTS
For details about all the practice test options available, see the section entitled "How to Use the Practice Tests," in the Introduction to this book.
CHAPTER 9
Backing up and Restoring a Database

Along with security, the other fundamental task of a database administrator (DBA) is to ensure that data can be recovered in the event of a disaster. Unless you can protect the data, the thousands of features in Microsoft SQL Server 2008 to build high-performance and scalable applications cannot be used to run a business. In this chapter, you learn about the capabilities of the backup and restore engine, as well as procedures for recovering from various disaster scenarios.

Exam objectives in this chapter:
- Back up a SQL Server environment.
- Back up databases.
- Restore databases.
- Manage database snapshots.
- Maintain a database by using maintenance plans.

Lessons in this chapter:
- Lesson 1: Backing up Databases
- Lesson 2: Restoring Databases
- Lesson 3: Database Snapshots

Before You Begin
To complete the lessons in this chapter, you must have:
- SQL Server 2008 installed
- The AdventureWorks database installed within the instance
REAL WORLD
Michael Hotek

Several years ago, I was called into a company to help them recover from a major disaster that required relocation to new facilities. A new office and data center were already established, and it was my job to get the database servers online with all the databases recovered. I was given a large number of documents and a detailed description of the backup procedures that had been in place for the databases and the standby servers where redundant copies of the data were online in the event of a failure on a primary. Everything looked to be in order, so I got on a plane and arrived at the new data center a few hours later.

Only after getting to the new data center and starting to gather everything together did I learn the bad news: the documentation was completely worthless. The company had standby servers that were maintained by log shipping, and the standby servers were in the same data center as the primary servers. Although having the standby servers in the same data center would not be a recommended solution, it poses a bit of a problem when the previous data center is under 17 feet of water. To make matters worse, even though everything was backed up to tape, the tapes were stored in a filing cabinet…in the same data center that was 17 feet underwater.

The only usable backups that we had were a set of tapes from about two months before, when one of the new DBAs started a project to store backup tapes off-site. The project hadn't gone anywhere due to funding issues. Fortunately, the first step in the off-site storage project had been simply to have the on-call DBAs take the previous day's tapes home with them, and this new DBA had gotten busy and forgotten that he had taken a set of tapes home with him two months before.

Having a backup strategy, standby servers, and multiple copies of a database are all best practices for database management. But you have to filter in just a little bit of common sense. If the purpose of backups is to protect you from a disaster, you probably don't want the backup tapes stored anywhere near the primary machines that could encounter a disaster.
Lesson 1: Backing up Databases

Database backups form the backbone upon which every disaster recovery plan is built. Backup strategies are designed in the opposite order of the material presented in this chapter. You start with the recovery requirements and procedures and then figure out which types of backups best meet your recovery needs. Unless you are devising recovery-oriented backup strategies, it is unlikely that you will ever meet your disaster recovery requirements. However, it is very difficult to teach data recovery without first having backups, so in this lesson you learn about the various backup types and how to create backups to support your databases prior to learning about recovering databases.

After this lesson, you will be able to:
- Create full, differential, and transaction log backups
- Create maintenance plans

Estimated lesson time: 20 minutes

Backup Security

All backups execute under the security context of the SQL Server service account. Although you could grant read/write access on the backup directory directly to the SQL Server service account, you should instead grant read/write access to the Windows group SQLServerMSSQLUser$<machine_name>$<instance_name>, which contains the SQL Server service account.

A member of the sysadmin server role can back up any database in an instance, and members of the db_owner database role can back up their databases. You can also add a user to the db_backupoperator fixed database role to allow the user to back up a database while preventing any other access to the database.
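As a minimal sketch of that last grant, with hypothetical login, user, and password names:

-- Create a login and a database user whose only capability is backing up this database
USE master;
CREATE LOGIN BackupOnlyLogin WITH PASSWORD = 'StrongP@ssw0rd!';
GO

USE AdventureWorks;
CREATE USER BackupOnlyUser FOR LOGIN BackupOnlyLogin;
EXEC sp_addrolemember 'db_backupoperator', 'BackupOnlyUser';
GO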
Backup Types

SQL Server 2008 allows you to create four different types of backups:
- Full
- Differential
- Transaction log
- Filegroup

Full Backups

A full backup captures all pages within a database that contain data. Pages that do not contain data are not included in the backup. Therefore, a backup is never larger, and in most cases is smaller, than the database for which it is created. A full backup is the basis for recovering a database and must exist before you can use a differential or transaction log backup.

Because it is more common to back up a database than to restore one, the backup engine is optimized for the backup process. When a backup is initiated, the backup engine grabs pages from the data files as quickly as possible, without regard to the order of pages. Because the backup process is not concerned with the ordering of pages, multiple threads can be used to write pages to your backup device. The limiting factor for the speed of a backup is the performance of the device where the backup is being written.

A backup can be executed concurrently with other database operations. Because changes can be made to the database while a backup is running, SQL Server needs to be able to accommodate the changes while also ensuring that backups are consistent for restore purposes. To ensure both concurrent access and backup consistency, SQL Server performs the steps of the backup procedure as follows:
1. Locks the database, blocking all transactions
2. Places a mark in the transaction log
3. Releases the database lock
4. Extracts all pages in the data files and writes them to the backup device
5. Locks the database, blocking all transactions
6. Places a mark in the transaction log
7. Releases the database lock
8. Extracts the portion of the log between the marks and appends it to the backup

The only operations that are not allowed during a full backup are:
- Adding or removing a database file
- Shrinking a database

The generic syntax to back up a database is:

BACKUP DATABASE { database_name | @database_name_var }
TO <backup_device> [ ,...n ]
[ <MIRROR TO clause> ] [ next-mirror-to ]
[ WITH { DIFFERENTIAL | <general_WITH_options> [ ,...n ] } ]

<backup_device>::=
{
    { logical_device_name | @logical_device_name_var }
  | { DISK | TAPE } = { 'physical_device_name' | @physical_device_name_var }
}
<MIRROR TO clause>::=
MIRROR TO <backup_device> [ ,...n ]

<general_WITH_options> [ ,...n ]::=
--Backup Set Options
   COPY_ONLY
 | { COMPRESSION | NO_COMPRESSION }
 | DESCRIPTION = { 'text' | @text_variable }
 | NAME = { backup_set_name | @backup_set_name_var }
 | PASSWORD = { password | @password_variable }
 | { EXPIREDATE = { 'date' | @date_var } | RETAINDAYS = { days | @days_var } }
--Media Set Options
   { NOINIT | INIT }
 | { NOSKIP | SKIP }
 | { NOFORMAT | FORMAT }
 | MEDIADESCRIPTION = { 'text' | @text_variable }
 | MEDIANAME = { media_name | @media_name_variable }
 | MEDIAPASSWORD = { mediapassword | @mediapassword_variable }
 | BLOCKSIZE = { blocksize | @blocksize_variable }
--Error Management Options
   { NO_CHECKSUM | CHECKSUM }
 | { STOP_ON_ERROR | CONTINUE_AFTER_ERROR }

The only two parameters required for a backup are the name of the database and the backup device. When you specify a disk backup device, a directory and a file name can be specified. If a directory is not specified, SQL Server performs a backup to disk and writes the file to the default backup directory configured for the instance. Although most backups are written to a single disk file or a single tape device, you can specify up to 64 backup devices. When you specify more than one backup device, SQL Server stripes the backup across each of the devices specified.

NOTE: RESTORING A STRIPED BACKUP
When SQL Server stripes a backup across multiple devices, all devices are required to successfully restore. SQL Server does not provide any redundancy or fault tolerance within the stripe set. A stripe set is used strictly for performance purposes.

An example of a striped backup is:

BACKUP DATABASE AdventureWorks
TO DISK = 'AdventureWorks_1.bak', DISK = 'AdventureWorks_2.bak'
GO

One of the maxims of disaster recovery is that you can't have enough copies of your backups. The MIRROR TO clause provides a built-in capability to create up to four copies of a backup in a single operation. When you include the MIRROR TO clause, SQL Server retrieves the page once from the database and writes a copy of the page to each backup mirror.
During a restore operation, you can use any of the mirrors. Mirrored backups have a small number of requirements:
- All backup devices must be of the same media type.
- Each mirror must have the same number of backup devices.
- WITH FORMAT must be specified in the backup command.

If you back up to tape, you must mirror to tape. If you back up to disk, you must mirror to disk. You can't back up to tape and mirror to disk, or vice versa. In addition, you must mirror to the same number of devices as you back up to; for example, if you back up to 64 disk devices, then you must also mirror to 64 disk devices.

The limiting factor for backup performance is the speed of the backup device. By compressing a backup, you can write the data necessary while also reducing the amount of data written to the backup device. There is a cost in processing overhead when compressing a backup. Although an uncompressed backup usually consumes very few processing resources, compression can consume a noticeable amount of processing resources. A backup normally compresses between 4:1 and 10:1.

BEST PRACTICE: DECREASING BACKUP TIMES
The overhead of compression is always worth it. The amount of time saved for a compressed backup far exceeds the overhead associated with the compression operation. Fortunately, SQL Server has a configuration option called backup compression default that you can set to always have backups compressed regardless of whether you explicitly specify compression. Unfortunately, compression is available only in SQL Server 2008 Enterprise.
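A minimal sketch of enabling that option instance-wide (on an Enterprise edition instance):

-- Make COMPRESSION the default for all backups on this instance
EXEC sp_configure 'backup compression default', 1;
RECONFIGURE;
GO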
A single backup device can contain multiple backups. The INIT/NOINIT options of a BACKUP command control whether an existing backup file is overwritten or appended to. When you specify NOINIT and you are backing up to a file that already exists, SQL Server appends the new backup to the end of the file. If you specify INIT and the file already exists, SQL Server overwrites the file with the contents of the new backup.

BEST PRACTICE: AVOIDING BACKUP PROBLEMS
To avoid confusion, it is recommended that you use a unique naming scheme that employs a date and time in the file name so that you can tell when a backup was taken based on the name of the backup file. Because backups are taken to reduce your risk of data loss, it is also never a good idea to include multiple backups in a single file.

When CHECKSUM is specified, SQL Server verifies the page checksum, if it exists, before writing the page to the backup. In addition, a checksum is calculated for the entire backup that can be used to determine whether the backup has been corrupted. The default behavior for errors encountered during a backup is STOP_ON_ERROR. If an invalid page checksum is encountered during a backup, the backup terminates with an error. To continue past the error and back up as many pages as possible, you can specify the CONTINUE_AFTER_ERROR option.

NOTE: IDENTIFYING BAD PAGES
It is recommended that you specify the CHECKSUM option to catch bad pages as early as possible. You do not want to encounter any surprises when you need to use the backup to restore a database.

Transaction Log Backups

Every change made to a database has an entry made to the transaction log. Each row is assigned a unique number internally called the Log Sequence Number (LSN). The LSN is an integer value that starts at 1 when the database is created and increments to infinity. An LSN is never reused for a database and always increments. Essentially, an LSN provides a sequence number for every change made to a database.

The contents of a transaction log are broken down into two basic parts: active and inactive. The inactive portion of the transaction log contains all the changes that have been committed to the database. The active portion of the log contains all the changes that have not yet been committed. When a transaction log backup is executed, SQL Server starts with the lowest LSN in the transaction log and starts writing each successive transaction log record into the backup. As soon as SQL Server reaches the first LSN that has not yet been committed (that is, the oldest open transaction), the transaction log backup completes. The portion of the transaction log that has been backed up is then removed, allowing the space to be reused.

Based on the sequence number, it is possible to restore one transaction log backup after another to recover a database to any point in time by simply following the chain of transactions as identified by the LSN. Because transaction log backups are intended to be restored one after another, the restrictions on transaction log backups depend on having the entire sequence of LSNs intact. Any action that creates a gap in the LSN sequence prevents any subsequent transaction log backup from being executed. If an LSN gap is introduced, you must create a full backup before you can start backing up the transaction log.

A transaction log backup works just like an incremental backup in Microsoft Windows. A transaction log backup gathers all committed transactions in the log since the last transaction log backup. However, because a transaction log backup contains only the transactions that have been issued against the database, you need a starting point for the transaction log chain. Before you can issue a transaction log backup, you must execute a full backup. After the first backup, so long as the transaction log chain is not interrupted, you can restore the database to any point in time.
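A related sketch: if the data files are damaged but the transaction log itself is intact, you can still capture the tail of the log before starting a restore, preserving changes made since the last log backup. The file path here is illustrative:

-- Back up the tail of the log even though the database's data files are inaccessible
BACKUP LOG AdventureWorks
TO DISK = 'C:\test\AdventureWorks_tail.trn'
WITH NO_TRUNCATE, INIT
GO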
Additional full backups can be created to have a more recent starting point for a restore operation. Regardless of the number of full backups that you create, so long as you haven't introduced a gap in the LSN chain, you can start with any full backup and restore every transaction log from that point forward to recover a database.

The general syntax for creating a transaction log backup is:

BACKUP LOG { database_name | @database_name_var }
TO <backup_device> [ ,...n ]
[ <MIRROR TO clause> ] [ next-mirror-to ]
[ WITH { <general_WITH_options> | <log-specific_optionspec> } [ ,...n ] ]
[;]

Differential Backups

A differential backup captures all extents that have changed since the last full backup. The primary purpose of a differential backup is to reduce the number of transaction log backups that need to be restored. A differential backup has to be applied to a full backup and can't exist until a full backup has been created.

SQL Server tracks each extent that has been changed following a full backup using a special page in the header of a database called the Differential Change Map (DCM). A full backup zeroes out the contents of the DCM. As changes are made to extents within the database, SQL Server sets the bit corresponding to the extent to 1. When a differential backup is executed, SQL Server reads the contents of the DCM to find all the extents that have changed since the last full backup.

A differential backup is not the same as an incremental backup. A transaction log backup is an incremental backup because it captures any changes that have occurred since the last transaction log backup. A differential backup contains all pages changed since the last full backup. For example, if you were to take a full backup at midnight and a differential backup every four hours, both the 4 A.M. backup and the 8 A.M. backup would contain all the changes made to the database since midnight.
The COPY_ONLY Option

One of the options that can be specified for any backup type is COPY_ONLY. Each backup executed against a database has an effect on the starting point for a recovery and which backups can be used. Differential backups contain all extents that have changed since the last full backup, so every full backup executed changes the starting point that a differential backup is based upon. When a transaction log backup is executed, the transactions that have been backed up are removed from the transaction log.

On occasion, you need to create a backup to create a database for a development or test environment. You want to have the most recent set of data, but you do not want to affect the backup set in the production environment. The COPY_ONLY option allows you to create a backup that can be used to create the development or test environment, but it does not affect the database state or set of backups in production. A full backup with the COPY_ONLY option does not reset the differential change map page and therefore has no impact on differential backups. A transaction log backup with the COPY_ONLY option does not remove transactions from the transaction log.

Filegroup Backups

Although full backups capture all the used pages across the entire database, a full backup of a large database can consume a significant amount of space and time. If you need to reduce the footprint of a backup, you can rely on file or filegroup backups instead. As the name implies, a file/filegroup backup allows you to target a portion of a database to be backed up.

CAUTION: BACKING UP INDIVIDUAL FILES
Although it is possible to back up a file, it is recommended that your backups are only as granular as the filegroup level. A filegroup is a storage boundary, and when you have multiple files within a filegroup, SQL Server stores data across all the files. However, with respect to the table, index, or partition, the distribution of data across the files is essentially random. Therefore, to recover a database, you need all the files underneath a filegroup to be in exactly the same state.

Filegroup backups can be used in conjunction with differential and transaction log backups to recover a portion of the database in the event of a failure. In addition, so long as you do not need to restore the primary filegroup and you are running SQL Server 2008 Enterprise, the database can remain online and accessible to applications during the restore operation. Only the portion of the database being restored is off-line.

EXAM TIP
You need to know how to perform each type of backup that can be executed. Backing up and restoring databases and the entire SQL Server environment is a major focus of the exam.
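Two sketches of the options just described; the filegroup name and file paths are hypothetical rather than taken from the chapter's practices:

-- A copy-only full backup that leaves the DCM and the backup chain untouched
BACKUP DATABASE AdventureWorks
TO DISK = 'C:\test\AdventureWorks_copy.bak'
WITH COPY_ONLY, INIT
GO

-- Back up a single filegroup rather than the whole database
BACKUP DATABASE AdventureWorks
FILEGROUP = 'SECONDARY'
TO DISK = 'C:\test\AdventureWorks_fg.bak'
WITH INIT
GO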
Partial Backups

Filegroups can be marked as read-only. A read-only filegroup cannot have any changes to the objects that are stored on the filegroup. Because the purpose of backups is to capture changes so that you can reconstruct a database to the most current state during a recovery operation, backing up filegroups that cannot change unnecessarily consumes space within the backup.

To reduce the size of a backup to only the filegroups that can change, you can perform a partial backup. Partial backups are performed by specifying the READ_WRITE_FILEGROUPS option as follows:

BACKUP DATABASE database_name READ_WRITE_FILEGROUPS [ , <file_filegroup_list> ]
TO <backup_device>

When a partial backup is executed, SQL Server backs up the primary filegroup, all read/write filegroups, and any explicitly specified read-only filegroups.

Page Corruption

Hopefully you will never have to deal with corruption within a database. Unfortunately, hardware components fail, especially drive controllers and disk drives. Prior to a complete failure, drive controllers or disk drives can introduce corruption to data pages by performing incomplete writes. Prior to SQL Server 2005, when SQL Server encountered a corrupt page, you could potentially have the entire instance go off-line.

SQL Server 2005 introduced the ability to quarantine corrupt pages while allowing the database to remain online. By executing the following command, SQL Server detects and quarantines corrupted pages:

ALTER DATABASE <dbname> SET PAGE_VERIFY CHECKSUM

When SQL Server writes a page to disk, a checksum is calculated for the page. When you enable page verification, each time a page is read from disk, SQL Server computes a new checksum and compares it to the checksum stored on the page. If the checksums do not match, SQL Server returns an error and logs the page into a table in the msdb database.

NOTE: REPAIRING A CORRUPT PAGE
If the database is participating in a Database Mirroring session, a copy of the corrupt page is retrieved from the mirror. If the page on the mirror is intact, the corrupt page is repaired automatically with the page retrieved from the mirror.

Although corrupt pages can be quarantined, SQL Server has a protection mechanism in place to protect your database from massive corruption. You are limited to a total of 1,000 corrupt pages in a database. When you reach the corrupt page limit, SQL Server takes the database off-line and places it in a suspect state to protect it from further damage.
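The msdb table that quarantined pages are logged to can be inspected directly. As a sketch: the lesson does not name the table, so the suspect_pages table and the columns selected here are assumptions to verify against your instance.

-- Review pages that have been flagged as corrupt
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;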
Maintenance Plans

Maintenance plans provide a mechanism to graphically create job workflows that support common administrative functions such as backup, re-indexing, and space management. Tasks that are supported by maintenance plans are:
- Backing up of databases and transaction logs
- Shrinking databases
- Re-indexing
- Updating of statistics
- Performing consistency checks

The most common tasks performed by maintenance plans are database backups. Instead of writing the code to back up a database, you can configure a maintenance plan to perform the backup operations that you need to support your disaster recovery requirements.

NOTE: EXECUTING MAINTENANCE PLANS
Maintenance plans are based upon the tasks within SQL Server Integration Services (SSIS). So, when a maintenance plan executes, it first loads the SSIS engine. Then the .NET Framework interprets the tasks within the package, constructs the necessary backup statements, and executes the code generated.

Certificates and Master Keys

You always have a service master key for each instance. You could also have database master keys and certificates. Certificates and master keys need to be backed up to ensure a complete recovery of your instance.

A service master key is created automatically the first time that an instance is started. The service master key is regenerated each time that you change the SQL Server service account or service account password. The first action that you should take after an instance is started is to back up the service master key. You should also back up the service master key immediately following a change to the service account or service account password. The generic syntax to back up a service master key is:

BACKUP SERVICE MASTER KEY TO FILE = 'path_to_file'
    ENCRYPTION BY PASSWORD = 'password'

Database master keys (DMKs) are created prior to the creation of a certificate, symmetric key, or asymmetric key. As explained in Chapter 11, "Designing SQL Server Security," a DMK is the root of the encryption hierarchy in a database. To ensure that you can access certificates, asymmetric keys, and symmetric keys within a database, you need to have a backup of the DMK. Immediately following the creation of a DMK, you should create a backup of the DMK. The generic syntax to back up a DMK is:

BACKUP MASTER KEY TO FILE = 'path_to_file'
    ENCRYPTION BY PASSWORD = 'password'

Before you can back up a DMK, it must be open. By default, a DMK is encrypted with the service master key. If the DMK is encrypted only with a password, you must first open the DMK by using the following command:

USE <database name>;
OPEN MASTER KEY DECRYPTION BY PASSWORD = '<SpecifyStrongPasswordHere>';
Certificates are used to encrypt data as well as digitally sign code modules. Although you could create a new certificate to replace the digital signature in the event of the loss of a certificate, you must have the original certificate to access any data that was encrypted with the certificate. Certificates have both a public and a private key. You can back up just the public key by using the following command:

BACKUP CERTIFICATE certname TO FILE = 'path_to_file'

However, if you restore a backup of a certificate containing only the public key, SQL Server generates a new private key. Unfortunately, the private key is the important component of a certificate that is used to encrypt/decrypt data within SQL Server. Therefore, you need to ensure that both the public and private keys are backed up for a certificate. Just like master keys, you should back up a certificate immediately after creation by using the following command:

BACKUP CERTIFICATE certname TO FILE = 'path_to_file'
  [ WITH PRIVATE KEY
    (
      FILE = 'path_to_private_key_file',
      ENCRYPTION BY PASSWORD = 'encryption_password'
      [ , DECRYPTION BY PASSWORD = 'decryption_password' ]
    )
  ]

Backup Storage

To restore databases after a disaster, you need to be able to access your backups. Because disasters can encompass an entire site, all backups should be stored off-site. However, backups that are rotated to an off-site storage facility pose a security risk because you are moving data to another location that is outside the security controls of Active Directory and your network. Therefore, you need to take appropriate physical security measures to ensure that your backups are safe.

Master keys and certificates impose an additional constraint on off-site storage. It is common for an organization to have a single off-site backup vendor that collects and stores all the corporate backups. If someone were to steal the backup of a database that contained encrypted data, he or she would not be able to access the encrypted data without also having access to the master keys and certificates. Therefore, although you need to back up certificates and master keys, the backups of your master keys and certificates should never be stored in the same location as the databases with which they are associated.

MORE INFO: MASTER KEYS
Chapter 11 has additional information about the use and management of certificates and master keys.
Validating a Backup
Because backups are your insurance policy for a database, you need to ensure that the backups you create are valid and usable. To validate a backup, execute the following command:

RESTORE VERIFYONLY FROM <backup device>

When a backup is validated, SQL Server performs the following checks:
Calculates a checksum for the backup and compares it to the checksum stored in the backup file
Verifies that the header of the backup is correctly written and valid
Traverses the page chain to ensure that all pages are contained in the backup and can be located

Quick Check
1. What are the four types of backups?
2. How can you detect and log corrupt pages?

Quick Check Answers
1. You can execute full, differential, transaction log, and file/filegroup backups. A full backup is required before you can create a differential or transaction log backup.
2. Execute ALTER DATABASE <database name> SET PAGE_VERIFY CHECKSUM.

PRACTICE Backing up Databases
In this practice, you create full, differential, and transaction log backups.

NOTE Before performing the following exercises, verify that the AdventureWorks database is set to the Full recovery model.

PRACTICE 1 Create a Compressed, Mirrored, Full Backup
In this practice, you create a compressed backup for the AdventureWorks database, mirror the backups for redundancy, and validate page checksums.
1. Execute the following code to back up the AdventureWorks database:

BACKUP DATABASE AdventureWorks
    TO DISK = 'c:\test\AdventureWorks_1.bak'
    MIRROR TO DISK = 'c:\test\AdventureWorks_2.bak'
    WITH COMPRESSION, INIT, FORMAT, CHECKSUM, STOP_ON_ERROR
GO
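Before moving on, you can confirm that the backup you just created is usable by applying the RESTORE VERIFYONLY command described earlier in this lesson; the path matches the one used in step 1:

RESTORE VERIFYONLY
    FROM DISK = 'c:\test\AdventureWorks_1.bak'
    WITH CHECKSUM
GO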
PRACTICE 2 Create a Transaction Log Backup
In this practice, you create a pair of transaction log backups for the AdventureWorks database.
1. Execute the following code to modify data and perform the first transaction log backup:

USE AdventureWorks
GO
INSERT INTO HumanResources.Department
    (Name, GroupName)
VALUES('Test1', 'Research and Development')
GO
BACKUP LOG AdventureWorks
    TO DISK = 'c:\test\AdventureWorks_1.trn'
    WITH COMPRESSION, INIT, CHECKSUM, STOP_ON_ERROR
GO

2. Execute the following code to make another data modification and perform a second transaction log backup:

INSERT INTO HumanResources.Department
    (Name, GroupName)
VALUES('Test2', 'Research and Development')
GO
BACKUP LOG AdventureWorks
    TO DISK = 'c:\test\AdventureWorks_2.trn'
    WITH COMPRESSION, INIT, CHECKSUM, STOP_ON_ERROR
GO

PRACTICE 3 Create a Differential Backup
In this practice, you create a differential backup for the AdventureWorks database.
1. Execute the following code to create one more transaction:

USE AdventureWorks
GO
INSERT INTO HumanResources.Department
    (Name, GroupName)
VALUES('Test3', 'Research and Development')
GO

2. Execute the following code to create a differential backup:

BACKUP DATABASE AdventureWorks
    TO DISK = 'c:\test\AdventureWorks_1.dif'
    MIRROR TO DISK = 'c:\test\AdventureWorks_2.dif'
    WITH DIFFERENTIAL, COMPRESSION, INIT, FORMAT, CHECKSUM, STOP_ON_ERROR
GO

Lesson Summary
Full backups are the starting point for every backup procedure and recovery process. A full backup contains only the pages within the database that have been used.
Differential backups contain all pages that have changed since the last full backup and are used to reduce the number of transaction log backups that need to be applied.
Transaction log backups contain all the changes that have occurred since the last transaction log backup. To execute a transaction log backup, the database must be in either the Full or Bulk-logged recovery model, a full backup must have been executed, and the transaction log must not have been truncated since the last full backup.
You can back up only the filegroups that accept changes by using the READ_WRITE_FILEGROUPS option of the BACKUP DATABASE command.

Lesson Review
The following question is intended to reinforce key information presented in Lesson 1, "Backing up Databases." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. You are the database administrator for Fabrikam. The Orders database is critical to company operations and is set to the Full recovery model. You are running full backups daily at 1 A.M., differential backups every four hours beginning at 5 A.M., and transaction log backups every five minutes. If the Orders database were to become damaged and go off-line, what is the first step in the restore process?
A. Restore the most recent full backup with the NORECOVERY option.
B. Restore the most recent differential backup with the NORECOVERY option.
C. Back up the transaction log with the NO_TRUNCATE option.
D. Back up the transaction log with the TRUNCATE_ONLY option.
Lesson 2: Restoring Databases
In everyday life, you purchase various types of insurance. You hope that you never have to use the insurance, but in the event of a disaster, an insurance policy provides financial protection. Backups are the insurance policy for your data. You hope that you never need to use your backups, but in the event of a disaster, backups allow you to recover your data and continue business operations. In this lesson, you learn how to use your backups to recover your SQL Server environment. In addition, because the recovery of a database depends upon the state of the transaction log, you also learn a little about the internals of a transaction log.

After this lesson, you will be able to:
Restore databases
Estimated lesson time: 20 minutes

Transaction Log Internals
A transaction log is one or more files associated with a database that track every modification made to either data or objects within the database. Transaction logs do not span databases; therefore, a business transaction that is executed across multiple databases is implemented physically as a separate transaction within each affected database. The transaction log is also required to store enough information to allow SQL Server to recover a database when the instance is restarted.

The key piece of information within a transaction log is the log sequence number (LSN). The LSN starts at 0 when the database is created and increments to infinity. The LSN always moves forward, never repeats, and cannot be reset to a previous value. Each operation that affects the state of the database increments the LSN.

Each storage unit within a database tracks the LSN of the last modification made to the storage structure. At a database level, the LSN of the last change in the database is stored in the header of the master data file. At a data file level, the LSN of the last change to a page within the file is stored in the header of the data file. Each data page within a database also records the LSN of the last change for the data page.

All data changes occur within buffers in memory. When a change is made, the corresponding buffer is modified and a record is added to the transaction log. A modified page in the buffer pool is referred to as a dirty page. Every dirty page tracks the LSN in the transaction log that corresponds to the change that modified the page in the buffer pool.

When SQL Server executes a checkpoint, all dirty pages in the buffer pool are written out to the data files. During the checkpoint process, SQL Server compares the LSN of the dirty page in the buffer pool to the LSN of the data page on disk. If the LSN of the data page on disk is equal
to or less than the LSN of the dirty page in the buffer pool, as well as equal to or less than the LSN for the data file, the page on disk is overwritten with the page from the buffer pool. If the LSN of the dirty page is greater than the LSN of the page on disk or the LSN of the data file containing the page, the page in the buffer pool is overwritten by the page on disk. When the checkpoint process finishes writing dirty pages to the data files, the largest LSN written to each file is written into the header of the file. In addition, the largest LSN written across the entire checkpoint process is written to the header of the master data file. SQL Server ensures that the LSN for every page within a file is equal to or less than the LSN for the file, and that the LSN for every file within a database is equal to or less than the LSN for the database. The final step in the process is to clear the flag on each dirty page affected by the checkpoint that designates that the page has changed.

When SQL Server is started, every database undergoes a process called restart recovery. Restart recovery runs in two phases: REDO and UNDO. During the REDO phase, all committed transactions in the transaction log are flushed to disk. The REDO phase uses the same basic logic as the checkpoint process: if the LSN stored on the page is less than or equal to the LSN of the log record being written to the page, the change is written; otherwise, it is skipped as having already been hardened to disk. At the completion of the REDO phase, the UNDO phase starts. The UNDO phase moves through the transaction log and invalidates any transaction in the log that is still open, ensuring that an uncommitted transaction cannot be written out to disk.

At the completion of the UNDO phase, the database undergoes a process that is referred to as rolling forward. When a database is rolled forward, SQL Server reads the last LSN recorded in the transaction log, increments the LSN, and writes the new LSN into the header of every data file within the database, ensuring that transactions older than the roll-forward point cannot be written to the data files.

Every backup that is created stores the minimum and maximum LSN for the database that correspond to the backup taken. Because a full backup contains the portion of the transaction log that was generated while the backup is running, a full backup is consistent as of the time that the full backup completes and stores only the last LSN used within the backup. Differential and transaction log backups record the database LSN at the beginning of the backup operation, as well as the LSN at the end of the backup operation. Because the LSN always moves forward, SQL Server only has to compare the current LSN to the LSN(s) recorded for the backup to determine whether a backup can be applied to a database. If the backup contains the next LSN in the sequence, the backup can be restored. If the backup does not contain the next LSN in the sequence, an error is generated and the restore process terminates without applying any changes.

A full backup or a filegroup backup is required to begin a restore sequence; additional differential and transaction log backups can then be applied. However, to restore additional differential or transaction log backups, the database or filegroup must be in a restoring state. Any attempt to restore a differential or transaction log backup to a database or filegroup that is not in a restoring state results in an error.
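You can see the LSNs that each backup records by querying the backup history that SQL Server maintains in the msdb database. The following query is a minimal sketch; msdb.dbo.backupset is a standard catalog table, but the filter assumes the AdventureWorks backups created in the Lesson 1 practice:

SELECT database_name, type, backup_start_date,
       first_lsn, last_lsn, checkpoint_lsn, database_backup_lsn
FROM msdb.dbo.backupset
WHERE database_name = 'AdventureWorks'
ORDER BY backup_start_date;

The type column distinguishes full (D), differential (I), and transaction log (L) backups, and a log backup whose first_lsn matches the previous log backup's last_lsn is the next link in the chain.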
Over the years, many people have been incorrectly told that a differential or transaction log backup cannot be restored to a database that is recovered because at the end of the restore
process, the LSN is rolled forward and is no longer compatible with any of the transaction log or differential backups. SQL Server does not reject the differential or transaction log backup in this case due to the LSN. Differential and transaction log backups are specific to a full or filegroup backup. A database that is recovered can have transactions executed against it, which would make the database state incompatible with the differential or transaction log backup. Because transactions cannot be executed against a database or filegroup that is in a restoring state, SQL Server only has to verify that the database or filegroup is in a restoring state before proceeding with the secondary check for LSN compatibility.

MORE INFO
For more information about how SQL Server processes transactions, as well as the structure of data files and transaction logs, please refer to Microsoft SQL Server 2008 Internals, by Kalen Delaney (Microsoft Press, 2009).

Database Restores
All restore sequences begin with either a full backup or a filegroup backup. When restoring backups, you have the option to terminate the restore process at any point and make the database available for transactions. After the database or filegroup being restored has been brought online, you can't apply any additional differential or transaction log backups to the database.

Restoring a Full Backup
The generic syntax for restoring a full backup is:

RESTORE DATABASE { database_name | @database_name_var }
    [ FROM <backup_device> [ ,...n ] ]
    [ WITH
        { [ RECOVERY | NORECOVERY
          | STANDBY = { standby_file_name | @standby_file_name_var } ]
        | , <general_WITH_options> [ ,...n ]
        | , <replication_WITH_option>
        | , <change_data_capture_WITH_option>
        | , <service_broker_WITH_options>
        | , <point_in_time_WITH_options—RESTORE_DATABASE>
        } [ ,...n ]
    ]

<general_WITH_options> [ ,...n ]::=
--Restore Operation Options
    MOVE 'logical_file_name_in_backup' TO 'operating_system_file_name'
        [ ,...n ]
    | REPLACE
    | RESTART
    | RESTRICTED_USER
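As a concrete illustration of this syntax, the following sketch restores a full backup to new file locations and leaves the database ready for additional restores. The RECOVERY, NORECOVERY, and MOVE options are described in the discussion that follows; the logical file names and target paths shown here are assumptions that you would replace with your own:

RESTORE DATABASE AdventureWorks
    FROM DISK = 'c:\test\AdventureWorks_1.bak'
    WITH MOVE 'AdventureWorks_Data' TO 'd:\data\AdventureWorks.mdf',
         MOVE 'AdventureWorks_Log' TO 'e:\log\AdventureWorks_1.ldf',
         NORECOVERY
GO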
When a RESTORE command is issued, if the database does not already exist within the instance, SQL Server creates the database along with all files underneath the database. During this process, each file is created and sized to match the file sizes at the time the backup was created. After it creates the files, SQL Server begins restoring each database page from the backup.

TIP RESTORING AN EXISTING DATABASE
The creation and sizing of all files associated with a database can consume a significant amount of time. If the database already exists, you should just restore over the top of the existing database, as described next. The REPLACE option is used to force the restore over the top of an existing database.

Because it is much more common to back up a database than it is to restore one, the backup process is optimized to complete in the shortest amount of time. To accomplish the shortest duration backup, SQL Server pulls pages into the backup regardless of the page order. However, when restoring a database, the pages must be placed back into the database in sequential order: within each file, SQL Server must locate page 1, then page 2, then page 3, and so on. As a general rule of thumb, a restore operation takes approximately 30 percent longer than the duration of the backup being restored.

The first pages within a database store structural information about the database, such as the list of pages allocated to the database. After the restore process has restored the first page in the database, anything currently residing on disk is invalidated. If you are restoring over the top of an existing database and the restore process aborts, you can no longer access anything that was in the database prior to the restore operation. If you are restoring a file or a filegroup, only the portion of the database being restored is affected.

If you want the database to be online and accessible for transactions after the RESTORE operation has completed, you need to specify the RECOVERY option. When a RESTORE is issued with the NORECOVERY option, the restore completes, but the database is left in a RESTORING state such that subsequent differential and/or transaction log backups can be applied. The STANDBY option allows you to issue SELECT statements against the database while still applying additional differential and/or transaction log restores. If you restore a database with the STANDBY option, an additional file is created to make the database consistent as of the last restore that was applied.

The file system on the machine that you are restoring the database to might not match that of the machine where the backup was taken, or you might want to change the location of database files during the restore. The MOVE option provides the ability to change the location of one or more data files when the database is restored.

Restore Paths
Before restoring a database, you first need to take an inventory of the backups you have created and the state of the database to which each backup applies. Table 9-1 provides a basic overview of the database state with respect to each backup that was created in Lesson 1.
TABLE 9-1 Database Modifications

BACKUP CREATED          DATA MODIFICATION
Full backup
                        Insert Test1
Log backup
                        Insert Test2
Log backup
                        Insert Test3
Differential backup

Regardless of the point to which you want to restore the database, you have to restore the full backup first. If you wanted to restore the database only up to the point where the Test1 department was added, you would then restore the first transaction log backup and recover the database. The Test2 and Test3 departments would be lost. Similarly, if you wanted to restore the database to the point before the Test3 department was added, you would restore the full backup and then the first and second transaction log backups before recovering the database.

If you wanted to restore the database without losing any data, you would only need to restore the full backup and then the differential backup, because the differential also contains all the changes captured by each of the transaction log backups. What would happen, though, if you restored the full backup and only then found out that the differential could not be used due to damage to the backup? You could restore the two transaction log backups, but you would irrevocably lose the Test3 department that was inserted.

To provide the greatest flexibility for a restore, the first step in any restore operation is to issue a transaction log backup against the original database. Obviously, if the entire original database no longer exists, you do not have the option to take a final transaction log backup before beginning restore operations. However, so long as the transaction log is intact and the master database still has an entry for the damaged database, you are allowed to issue a BACKUP LOG command against the database, even if all the data files are damaged or missing. The step in the restore process where you first take a final transaction log backup is referred to as backing up the tail of the log.

Restoring a Differential Backup
A differential restore uses the same command syntax as a full database restore. After the full backup has been restored, you can then restore the most recent differential backup.

Restoring a Transaction Log Backup
The generic syntax for restoring a transaction log backup is:

RESTORE LOG { database_name | @database_name_var }
    [ <file_or_filegroup_or_pages> [ ,...n ] ]
    [ FROM <backup_device> [ ,...n ] ]
    [ WITH
        { [ RECOVERY | NORECOVERY
          | STANDBY = { standby_file_name | @standby_file_name_var } ]
        | , <general_WITH_options> [ ,...n ]
        | , <replication_WITH_option>
        | , <point_in_time_WITH_options—RESTORE_LOG>
        } [ ,...n ]
    ]

<point_in_time_WITH_options—RESTORE_LOG>::=
    { STOPAT = { 'datetime' | @datetime_var }
    | STOPATMARK = { 'mark_name' | 'lsn:lsn_number' } [ AFTER 'datetime' ]
    | STOPBEFOREMARK = { 'mark_name' | 'lsn:lsn_number' } [ AFTER 'datetime' ]
    }

There are times when you need to restore a database but do not want to recover every transaction that was issued. When restoring a transaction log, you can have SQL Server replay only a portion of the log by issuing what is referred to as a point-in-time restore. The STOPAT option allows you to specify a date and time to which SQL Server restores. The STOPATMARK and STOPBEFOREMARK options allow you to specify either an LSN or a transaction log mark to use as the stopping point in the restore operation.

Online Restores
A database has a state that governs whether it can be accessed and what operations can be performed. For example, a database in an ONLINE state allows transactions and any other operations to be executed, but a database in an EMERGENCY state allows only SELECT operations to be executed by a member of the db_owner database role. Each filegroup within a database can also have a state. While one filegroup is in a RESTORING state and not accessible, another filegroup can be in an ONLINE state and accept transactions. The state of the database equals the state of the filegroup designated as PRIMARY.

SQL Server 2008 Enterprise allows you to perform restore operations while the database is still online and accessible. However, because a full backup affects the entire database, the state of the database is the state of the primary filegroup, and a database that is restoring is not accessible. To perform an online restore operation, you must perform a file or filegroup restore, and you cannot be restoring the primary filegroup or a file within the primary filegroup. A filegroup restore that affects only a portion of the database is referred to as a partial restore.

Restore a Corrupt Page
Page corruption occurs when the contents of a page are not consistent. Page contents can become inconsistent when the page checksum does not match the contents of the page or when a row is only partially written to the page. Page corruption usually occurs when a disk controller begins to fail. If the page corruption occurs in an index, you do not need to perform a restore to fix the corrupted page. Dropping and re-creating the index removes the corruption.
However, if the corruption occurs within a page of data within a table or the primary key, you need to perform a restore to fix the corruption issue. In addition to being able to restore filegroups, you can also restore an individual page in the database. Page restore has several requirements:
The database must be in either the Full or Bulk-logged recovery model.
You must be able to create a transaction log backup.
A page restore can apply only to a read/write filegroup.
You must have a valid full, file, or filegroup backup available.
The page restore cannot be executed at the same time as any other restore operation.

All editions of SQL Server 2008 allow you to restore one or more pages while the database is off-line. SQL Server 2008 Enterprise allows you to restore a page while the database, as well as the filegroup containing the corrupt page, remains online. However, any operations that attempt to access the page(s) during a restore receive an error and fail. The syntax to restore a page is:

RESTORE DATABASE database_name
    PAGE = 'file:page [ ,...n ]' [ ,...n ]
    FROM <backup_device> [ ,...n ]
    WITH NORECOVERY

The procedure to restore a corrupt page is as follows:
1. Retrieve the PageID of the damaged page.
2. Using the most recent full, file, or filegroup backup, execute the following command:

RESTORE DATABASE database_name
    PAGE = 'file:page [ ,...n ]' [ ,...n ]
    FROM <backup_device> [ ,...n ]
    WITH NORECOVERY

3. Restore any differential backups with the NORECOVERY option.
4. Restore any additional transaction log backups with the NORECOVERY option.
5. Create a transaction log backup.
6. Restore the transaction log backup from step 5 using the WITH RECOVERY option.

EXAM TIP
You need to know the steps to restore a page to a database.

Restoring with Media Errors
As noted previously in this lesson, because pages are restored in sequential order, as soon as the first page has been restored to a database, anything that previously existed is no longer valid. If a problem with the backup media was subsequently encountered and the restore aborted, you
would be left with an invalid database that could not be used. If the only copy of your database was the copy that you just overwrote with a restore, or if you had only a single backup of the database, not only would you have an invalid database, you would also have lost all of your data.

SQL Server has the ability to continue the restore operation even if the backup media is damaged. When it encounters an unreadable section of the backup file, SQL Server can continue past the source of the damage and restore as much of the database as possible. This feature is referred to as best effort restore. After the restore operation completes, the database is set to EMERGENCY mode. An administrator then has to connect to the database and determine whether it is viable. If the database is deemed valid and viable, the administrator can change the database status to ONLINE. If the database is not viable, you at least have the option to read as much data as possible from the database while it is in EMERGENCY mode. To restore from backup media that has been damaged, you need to specify the CONTINUE_AFTER_ERROR option for a RESTORE DATABASE or RESTORE LOG command.

Quick Check
1. Which recovery model always allows you to restore to the point of failure so long as you can back up the tail of the log?
2. What is the first operation that should be performed for any restore operation?

Quick Check Answers
1. The Full recovery model.
2. Back up the tail of the log.

PRACTICE Restoring a Database
In these practices, you restore the AdventureWorks database using the backups that were created in the Lesson 1 practice.

PRACTICE 1 Purposely Damage a Database
In this practice, you purposely damage the AdventureWorks database such that a restore is required to be able to access your data.
1. Execute the following query to insert a new row into the AdventureWorks database:

USE AdventureWorks
GO
INSERT INTO HumanResources.Department
    (Name, GroupName)
VALUES('Test4', 'Research and Development')
GO
2. Open SQL Server Configuration Manager and stop the SQL Server service for your SQL Server 2008 instance.
3. Open Windows Explorer and delete the AdventureWorks.mdf file. Make certain that you do not delete the AdventureWorks_1.ldf file.
4. In SQL Server Configuration Manager, start the SQL Server service for your SQL Server 2008 instance.
5. Reconnect to the instance with SSMS.
6. Observe that although the entry for the AdventureWorks database still exists, the database is completely inaccessible because the data file no longer exists.

PRACTICE 2 Restore a Full Backup
In this practice, you restore the AdventureWorks database from the full backup you created in the Lesson 1 practice.

NOTE Restoring a database requires exclusive access. Prior to executing any restore operation, you need to ensure that you do not have any connections open to the AdventureWorks database.

1. The first step in a restore process is to back up the tail of the log. Open a new query window and execute the following code:

BACKUP LOG AdventureWorks
    TO DISK = 'c:\test\AdventureWorks_3.trn'
    WITH COMPRESSION, INIT, NO_TRUNCATE
GO

2. Now that you have captured the tail of the log, execute the following code to restore the full backup:

RESTORE DATABASE AdventureWorks
    FROM DISK = 'c:\test\AdventureWorks_1.bak'
    WITH STANDBY = 'c:\test\AdventureWorks.stn'
GO

3. Verify that you can read the data but not modify it.
4. Verify that the departments that were added are missing from the database.

PRACTICE 3 Restore a Differential Backup
In this practice, you restore the differential backup to the AdventureWorks database and, following verification, you bring the database online.
1. Execute the following code to restore the differential backup:

RESTORE DATABASE AdventureWorks
    FROM DISK = 'c:\test\AdventureWorks_1.dif'
    WITH STANDBY = 'c:\test\AdventureWorks.stn'
GO
2. Verify that the Test1, Test2, and Test3 departments exist.
3. Execute the following code to recover the database and terminate the restore process:

RESTORE DATABASE AdventureWorks
    WITH RECOVERY
GO

PRACTICE 4 Restore a Transaction Log Backup
In this practice, you restore the AdventureWorks database using the three transaction log backups that you previously created.
1. Execute the following code to restore the full backup of the AdventureWorks database:

RESTORE DATABASE AdventureWorks
    FROM DISK = 'c:\test\AdventureWorks_1.bak'
    WITH STANDBY = 'c:\test\AdventureWorks.stn', REPLACE
GO

2. Verify that data is missing from the Department table.
3. Execute the following code to restore the first transaction log backup:

RESTORE LOG AdventureWorks
    FROM DISK = 'c:\test\AdventureWorks_1.trn'
    WITH STANDBY = 'c:\test\AdventureWorks.stn'
GO

4. Verify that the Test1 department now exists.
5. Execute the following code to restore the second transaction log backup:

RESTORE LOG AdventureWorks
    FROM DISK = 'c:\test\AdventureWorks_2.trn'
    WITH STANDBY = 'c:\test\AdventureWorks.stn'
GO

6. Verify that the Test2 department now exists.
7. Execute the following code to restore the third transaction log backup:

RESTORE LOG AdventureWorks
    FROM DISK = 'c:\test\AdventureWorks_3.trn'
    WITH STANDBY = 'c:\test\AdventureWorks.stn'
GO

8. Verify that the Test3 and Test4 departments exist.
9. Execute the following code to recover the database for access and transactions:

RESTORE DATABASE AdventureWorks
    WITH RECOVERY
GO
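If you do not want to replay one of these transaction log backups completely, you can combine the same restore sequence with the point-in-time options described earlier in this lesson. The following sketch stops the restore at a specific moment; the timestamp is a placeholder for a time that falls within the first log backup:

RESTORE DATABASE AdventureWorks
    FROM DISK = 'c:\test\AdventureWorks_1.bak'
    WITH NORECOVERY, REPLACE
GO
RESTORE LOG AdventureWorks
    FROM DISK = 'c:\test\AdventureWorks_1.trn'
    WITH STOPAT = '2009-01-15 14:30:00', RECOVERY
GO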
Lesson Summary
The first step in any restore procedure is to back up the tail of the log.
A restore sequence begins with either a full, file, or filegroup restore. If you execute a restore with the NORECOVERY option, you can apply subsequent differential and transaction log backups. If you execute a restore with the RECOVERY option, the database is recovered and you cannot restore any additional backups to the database.
You can specify the STANDBY option during a restore process if you need to read the contents of the database while still being able to restore additional differential and transaction log backups.
You can execute a transaction log backup so long as the transaction log file is intact, even if all data files for the database are damaged.
A restore can continue past damage to the backup media when you specify the CONTINUE_AFTER_ERROR option. If errors are encountered, the database is left in EMERGENCY mode following the restore.

Lesson Review
The following question is intended to reinforce key information presented in Lesson 2, "Restoring Databases." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. The server that the Customers database is running on fails and needs to be replaced. You build a new server and install SQL Server 2008. When you built the new server, you decided that instead of configuring the new server exactly like the old one, you would implement a new drive letter and folder structure for data and log files. Which option do you need to use when you restore the Customers database to the new server?
A. NORECOVERY
B. CONTINUE_AFTER_ERROR
C. MOVE
D. PARTIAL
Lesson 3: Database Snapshots
The Database Snapshots feature was introduced in SQL Server 2005 to provide users with a method to create read-only copies of data rapidly. In this lesson, you learn how to create a database snapshot, as well as how to use a database snapshot to revert data or a database to a previous point in time.

NOTE DATABASE SNAPSHOT
Database Snapshot is available only in SQL Server 2008 Enterprise.

CAUTION MANAGING FILESTREAM DATA
Database Snapshot is not compatible with FILESTREAM. If you create a Database Snapshot against a database with FILESTREAM data, the FILESTREAM filegroup is disabled and not accessible.

After this lesson, you will be able to:
Create a Database Snapshot
Revert data or a database from a Database Snapshot
Estimated lesson time: 20 minutes

Creating a Database Snapshot
The creation of a Database Snapshot is very similar to the creation of any database. To create a Database Snapshot, you use the CREATE DATABASE command with the AS SNAPSHOT OF clause. Because a Database Snapshot is a point-in-time, read-only copy of a database, you don't specify a transaction log. The requirements to create a Database Snapshot are:
You must include an entry for each data file specified in the source database.
The logical name of each file must match the name in the source database exactly.

The generic syntax to create a Database Snapshot is:

CREATE DATABASE database_snapshot_name
    ON (NAME = logical_file_name,
        FILENAME = 'os_file_name') [ ,...n ]
    AS SNAPSHOT OF source_database_name
The restrictions on a Database Snapshot are:
You can't back up, restore, or detach a Database Snapshot.
The Database Snapshot must exist on the same instance as the source database.
Full-text indexes are not supported.
FILESTREAM is not supported, and any FILESTREAM data is inaccessible through the Database Snapshot.
You can't create a Database Snapshot against a system database.
You can't drop, restore, or detach a source database that has a Database Snapshot created against it.
You can't reference filegroups that are off-line, defunct, or restoring.

When a Database Snapshot is created, SQL Server doesn't allocate space on disk equivalent to the current size of the data files in the source database. Instead, SQL Server takes advantage of an operating system feature called sparse files. A sparse file is essentially an entry in the file allocation table that consumes almost no space on disk. As data is added to the file, the file automatically grows on disk. By using sparse files, the creation time for a Database Snapshot is independent of the size of the source database.

Accessing a Database Snapshot from an application perspective is very simple. A Database Snapshot looks and acts like a read-only database to any queries being issued. Therefore, you can issue a SELECT statement against a Database Snapshot and use the Database Snapshot just as you would any other database. At the time of creation, a Database Snapshot doesn't contain any data, but the instant a Database Snapshot is created, you can issue SELECT statements against it. SQL Server uses the source database to retrieve data that hasn't changed since you created the Database Snapshot.

Copy-On-Write Technology
Because a Database Snapshot has to retain the state of the data in the source database at the instant the Database Snapshot was created, SQL Server needs a mechanism to manage any changes that occur within the source database. The mechanism SQL Server uses is known as Copy-On-Write.

Remember that data within SQL Server is stored on pages; there are eight pages in an extent, and SQL Server reads and writes extents. The first time a modification to a data page within an extent occurs, SQL Server copies the before image of the extent to the Database Snapshot. When SELECT statements are issued against the Database Snapshot, SQL Server retrieves data from the Database Snapshot for any data that has changed while still pulling data from the source database for any extents that have not changed. By writing the before image of the extent the first time a change is made, SQL Server allows changes to occur against the source database while also ensuring that any queries against the Database Snapshot do not reflect any changes after the Database Snapshot was created.
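You can watch this copy-on-write growth at the file level. The following query is a rough sketch that compares the logical size of a snapshot's files to the bytes actually consumed on disk; it assumes a snapshot named AdventureWorksSnap, like the one created in the practice at the end of this lesson:

SELECT df.name,
       df.size * 8 AS logical_size_kb,
       fs.size_on_disk_bytes / 1024 AS on_disk_kb
FROM AdventureWorksSnap.sys.database_files AS df
    JOIN sys.dm_io_virtual_file_stats(DB_ID('AdventureWorksSnap'), NULL) AS fs
        ON df.file_id = fs.file_id;

Immediately after the snapshot is created, on_disk_kb should be close to zero; it grows as before images of changed extents are written into the sparse files.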
After the initial change has been made to a page within an extent and SQL Server writes the extent to the Database Snapshot, any subsequent changes to the extent are ignored by the Copy-On-Write feature. Because you can create multiple Database Snapshots against a source database, the before image of an extent is written to each Database Snapshot that has not already received a copy of the extent.

TIP DATABASE SNAPSHOT MAXIMUM SIZE
Because SQL Server maintains the Database Snapshot at the point in time that the Database Snapshot was created, the maximum size of the Database Snapshot is the amount of data that existed in the source database at the time of creation.

Reverting Data Using a Database Snapshot
Because a Database Snapshot contains all the data in the source database as of the time of creation of the Database Snapshot, you can use the Database Snapshot to return data in the source database to the state contained in the Database Snapshot. In extreme cases, such as when you need to discard every change that happened within the database since the Database Snapshot was created, you can use the Database Snapshot to return the entire contents of the source database to the state of the Database Snapshot.

A database revert is a special category of restoring data that can be performed when you have a Database Snapshot created. If you need to revert only a row or a portion of a database, you can use an INSERT, UPDATE, DELETE, or MERGE statement. SQL Server also allows you to revert the entire database using the Database Snapshot, if necessary. When you use the Database Snapshot to revert the entire database, the source database goes back to exactly the way it looked at the time the Database Snapshot was created. Any transactions that had been issued against the source database since that time are lost.

The syntax to revert a database from a Database Snapshot is:

RESTORE DATABASE <database_name>
    FROM DATABASE_SNAPSHOT = <database_snapshot_name>

When you revert a source database, there are several restrictions:
Only a single Database Snapshot can exist for the source database.
Full-text catalogs on the source database must be dropped and then re-created after the revert completes.
Because the transaction log is rebuilt, the transaction log chain is broken.
Both the source database and the Database Snapshot are off-line during the revert process.
The source database cannot be enabled for FILESTREAM.
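Putting these restrictions together, a revert might look like the following sketch. The snapshot names are illustrative, and the DROP DATABASE step reflects the requirement that only a single Database Snapshot can exist when you revert:

-- Drop any other snapshots of the source database first (hypothetical name)
DROP DATABASE AdventureWorksSnap_Morning;
GO
-- Revert the source database to the state captured by the remaining snapshot
RESTORE DATABASE AdventureWorks
    FROM DATABASE_SNAPSHOT = 'AdventureWorksSnap';
GO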
EXAM TIP
You need to know that FILESTREAM is not compatible with Database Snapshots. Although you can create a Database Snapshot against a database enabled for FILESTREAM, you cannot use the Database Snapshot as a source for a RESTORE DATABASE operation.

Quick Check
1. Which two features are incompatible with Database Snapshots?
2. Prior to reverting a database using a Database Snapshot, what must you do?

Quick Check Answers
1. FILESTREAM and full-text indexes.
2. You must drop all Database Snapshots except the Database Snapshot being used as the source for the RESTORE command.

PRACTICE Creating a Database Snapshot
In the following practice, you create a Database Snapshot against the AdventureWorks database.
1. Execute the following code:

CREATE DATABASE AdventureWorksSnap
    ON (NAME = N'AdventureWorks2008_Data',
        FILENAME = N'c:\test\AdventureWorks.ds'),
       (NAME = N'AdventureWorksFT',
        FILENAME = N'c:\test\AdventureWorksFT.ds')
    AS SNAPSHOT OF AdventureWorks
GO

2. Execute the following code to compare the structures of the source database and the Database Snapshot. Note the value in the source_database_id column of master.sys.databases:

SELECT * FROM AdventureWorks.sys.database_files
SELECT * FROM AdventureWorksSnap.sys.database_files
SELECT * FROM master.sys.databases
GO

3. Expand the Database Snapshots node in Object Explorer to view the new Database Snapshot that you just created.
4. Execute a SELECT statement against the Database Snapshot and compare the results to the AdventureWorks database.
5. Make a change to the data and compare the results between the Database Snapshot and the AdventureWorks database.
Lesson Summary
A Database Snapshot is a point-in-time, read-only copy of a database.
The Database Snapshots feature is not compatible with FILESTREAM or full-text indexes.
You can revert a database from a Database Snapshot.

Lesson Review
The following question is intended to reinforce key information presented in Lesson 3, "Database Snapshots." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. A Database Snapshot can be created against which database? (Choose all that apply. Each answer is a complete solution.)
A. master
B. A database with full-text indexes
C. A database with FILESTREAM data
D. distribution
Chapter Review
To practice and reinforce the skills you learned in this chapter further, you can perform the following tasks:
Review the chapter summary.
Review the list of key terms introduced in this chapter.
Complete the case scenario. The scenario sets up a real-world situation involving the topics in this chapter and asks you to create a solution.
Complete the suggested practices.
Take a practice test.

Chapter Summary
Backups are the insurance policy for your data. Although you hope that you never have to use a backup, in the event of a disaster, backups allow you to recover your databases and continue business operations.
The first operation that should be performed during a restore process is to back up the tail of the log.
Every restore begins with a full, file, or filegroup backup.
You can create transaction log backups for a database that is in the Full or Bulk-logged recovery model. You can restore to a point in time using a transaction log backup; however, you cannot restore to a point in time during which a minimally logged transaction was executing.

Key Terms
Do you know what these key terms mean? You can check your answers by looking up the terms in the glossary at the end of the book.
Database revert
Differential backup
Full backup
Log Sequence Number (LSN)
Online restore
Page corruption
Partial backup
Partial restore
Tail backup
Transaction log backup
Transaction log chain
Case Scenario
In the following case scenario, you apply what you've learned in this chapter. You can find answers to these questions in the "Answers" section at the end of this book.

Case Scenario: Designing a Backup Strategy for Coho Vineyard

BACKGROUND
Company Overview
Coho Vineyard was founded in 1947 as a local, family-run winery. Due to the award-winning wines it has produced over the last several decades, Coho Vineyard has experienced significant growth. To continue expanding, the company acquired several additional wineries over the years. Today, the company owns 16 wineries; 9 wineries are in Washington, Oregon, and California, and the remaining 7 wineries are located in Wisconsin and Michigan. The wineries employ 532 people, 162 of whom work in the central office that houses servers critical to the business. The company has 122 salespeople who travel around the world and need access to up-to-date inventory information.

Planned Changes
Until now, each of the 16 wineries owned by Coho Vineyard has run a separate Web site locally on the premises. Coho Vineyard wants to consolidate the Web presence of these wineries so that Web visitors can purchase products from all 16 wineries from a single online store. All data associated with this Web site will be stored in databases in the central office. After the data is consolidated at the central office, merge replication will be used to deliver data to the salespeople and allow them to enter orders. To meet the needs of the salespeople until the consolidation project is completed, inventory data at each winery will be sent to the central office at the end of each day.

EXISTING DATA ENVIRONMENT
Databases
Each winery presently maintains its own database to store all business information. At the end of each month, this information is brought to the central office and transferred into the databases shown in Table 9-2.

TABLE 9-2 Coho Vineyard Databases

DATABASE        SIZE
Customer        180 megabytes (MB)
Accounting      500 MB
HR              100 MB
Inventory       250 MB
Promotions      80 MB
After the database consolidation project is complete, a new database named Order will serve as the data store for the new Web store. As part of their daily work, employees also will connect periodically to the Order database using a new in-house Web application. The HR database contains sensitive data and is protected using Transparent Data Encryption (TDE). In addition, data in the Salary table is encrypted using a certificate.

Database Servers
A single server named DB1 contains all the databases at the central office. DB1 is running SQL Server 2008 Enterprise on Windows Server 2003 Enterprise.

Business Requirements
You need to design an archiving solution for the Customer and Order databases. Your archival strategy should allow the Customer data to be saved for six years.

To prepare the Order database for archiving procedures, you create a partitioned table named Order.Sales. Order.Sales includes two partitions. Partition 1 includes sales activity for the current month. Partition 2 is used to store sales activity for the previous month. Orders placed before the previous month should be moved to another partitioned table named Order.Archive. Partition 1 of Order.Archive includes all archived data. Partition 2 remains empty. The archive data should reside in a different filegroup than the actively used data.

A process needs to be created to load the inventory data from each of the 16 wineries by 4 A.M. daily.

Four large customers submit orders using Coho Vineyard's Extensible Markup Language (XML) schema for Electronic Data Interchange (EDI) transactions. The EDI files arrive by 5 P.M. and need to be parsed and loaded into the Customer, Accounting, and Inventory databases, which contain tables relevant to placing an order. The EDI import routine is currently a single-threaded C++ application that takes between three and six hours to process the files. You need to finish the EDI process by 5:30 P.M. to meet your Service Level Agreement (SLA) with the customers. After the consolidation project has finished, the EDI routine will load all data into the new Order database.

You need to back up all databases at all locations. All production databases are required to be configured with the Full recovery model. You can lose a maximum of five minutes of data under a worst-case scenario. The Customer, Accounting, Inventory, Promotions, and Order databases can be off-line for a maximum of 20 minutes in the event of a disaster. Data older than two months in the Customer and Order databases can be off-line for up to 12 hours in the event of a disaster.

Answer the following questions.
1. What backups do you need for the Accounting, Inventory, and Promotions databases?
2. What backups do you need for the Customer and Order databases?
3. What backups do you need for the HR database?
Suggested Practices
To help you master the exam objectives presented in this chapter, complete the following tasks.

Backing up a Database
Practice 1 Create a certificate. Create a table that contains data encrypted by the certificate. Back up the certificate along with the private key.
Practice 2 Create a database with multiple filegroups. Back up the entire database using filegroup, differential, and transaction log backups.

Restoring a Database
Practice 1 Restore a certificate and the private key from a backup. Verify that you can decrypt the data in your table using the restored certificate.
Practice 2 Practice restoring the database to different points in time using the filegroup, differential, and transaction log backups that you created in the "Backing up a Database" practice.

Take a Practice Test
The practice tests on this book's companion CD offer many options. For example, you can test yourself on just one exam objective, or you can test yourself on all the 70-432 certification exam content. You can set up the test so that it closely simulates the experience of taking a certification exam, or you can set it up in study mode so that you can look at the correct answers and explanations after you answer each question.

MORE INFO PRACTICE TESTS
For details about all the practice test options available, see the section entitled "How to Use the Practice Tests" in the Introduction to this book.
CHAPTER 10
Automating SQL Server

To ensure that your data is protected, you need to create backups frequently. In addition, you need to run various database maintenance routines, such as reindexing databases, shrinking files, and expanding databases. In this chapter, you learn how to create and schedule jobs within SQL Server Agent. You also learn how to configure alerts to notify you of issues that need attention or to execute routines that fix problems before an outage occurs.

Exam objectives in this chapter:
Manage SQL Server Agent jobs.
Manage SQL Server Agent alerts.
Manage SQL Server Agent operators.
Identify SQL Agent job execution problems.

Lessons in this chapter:
Lesson 1: Creating Jobs
Lesson 2: Creating Alerts

Before You Begin
To complete the lessons in this chapter, you must have:
Microsoft SQL Server 2008 installed.
The AdventureWorks database installed within the instance.
Lesson 1: Creating Jobs
SQL Server Agent provides a scheduling engine for SQL Server. Without SQL Server Agent, you would either have to install a separate scheduling engine, or administrators would have to remember to execute jobs at various times throughout the day. Jobs provide the execution container that allows you to package together one or more steps in a process that needs to execute. Although many jobs that you create have only a single task, SQL Server allows you to create jobs composed of multiple tasks for which you can configure various actions depending on whether the tasks succeeded or failed. Each task or unit of work to be performed is contained within a job step.

After this lesson, you will be able to:
Create jobs
Create operators
View job status and history
Estimated lesson time: 20 minutes

Job Steps
Job steps are the execution elements within a job. The types of job steps that can be executed are:
Transact-SQL (T-SQL)
Replication tasks
Operating system tasks or executable files
Analysis Services tasks
Integration Services packages
ActiveX scripts

Like any executable code, each job step runs under a security context. The default security context for a job step corresponds to the login that is set as the owner of the job. You can also override the security context by specifying a proxy account that SQL Server Agent uses for the job step, based on credentials assigned to the proxy account.
In addition to the commands to execute, a job step can be configured with:
Logging
Notification to an operator
Retry settings that specify the number of times to retry a step as well as the number of minutes between retries
Control flow logic
The control flow options allow you to specify an action based on either success or failure, as follows:
Quit job reporting success
Quit job reporting failure
Go to next step
Go to a specific step number

Logging can be directed to a file that is overwritten each time the step executes, or you can append to an existing file. You can also log step output to a table, although this is not generally recommended due to the extra overhead of logging to a table versus logging to a text file.

BEST PRACTICE LOGGING JOB STEPS
Every job step that you create should be configured to log to a file. The most common way to configure logging is to create a new log file in the first step of a job, and then have each subsequent job step append information to the log file.

Job Schedules
After you have added one or more steps to your job, you are ready to specify a schedule. Schedules are defined and stored as independent objects, allowing you to define a single schedule that can be applied to multiple jobs. A job schedule can be created through either the Manage Schedules dialog box or during the creation of a job. Some of the properties that you can set for a schedule are:
Frequency type; for example, daily, weekly, or monthly
Recurrence within a daily, weekly, or monthly frequency; for example, every third day of every second month for a monthly frequency type
Recurrence within a day on a minute or hourly basis
Start and stop times
Start and end date for the schedule to be valid

For example, you could create a schedule to execute on the first Monday of every third month, running every 15 minutes between the hours of 3:00 A.M. and 7:00 P.M. A T-SQL alternative to the scheduling dialog boxes is sketched after this list of properties, below.
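If you prefer T-SQL to the dialog boxes, you can also define schedules with the stored procedures in msdb. The following sketch creates a schedule that runs every 15 minutes between 3:00 A.M. and 7:00 P.M. each day; the schedule name is a placeholder, and the frequency codes are documented with sp_add_schedule:

USE msdb;
GO
EXEC dbo.sp_add_schedule
    @schedule_name = N'Every15Minutes',   -- hypothetical name
    @freq_type = 4,                       -- daily
    @freq_interval = 1,                   -- every day
    @freq_subday_type = 4,                -- repeat on a minutes interval
    @freq_subday_interval = 15,           -- every 15 minutes
    @active_start_time = 030000,          -- starting at 3:00 A.M.
    @active_end_time = 190000;            -- ending at 7:00 P.M.
GO

You can then attach the schedule to one or more jobs with sp_attach_schedule.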
Job History
When a job is executed, any errors or messages are sent to the log file or table that is specified for each step, allowing you to review the log file in the event of a job execution error. In addition to any logging configured for a job step, each time a job executes, SQL Server logs information into the dbo.sysjobhistory table in the msdb database for each job step that is executed within the job. Some of the information that is recorded is:
Job step
Status
Execution date and time
Duration
If an error occurs, the number, severity, and text of the last error message generated

EXAM TIP
You need to know where to find information to diagnose the cause of an error in a job step.

Operators
An operator is an alias for a person, group, or device. Operators are used to send notifications when jobs fail or an alert is generated. For each operator, you specify a name along with contact information such as an e-mail address, pager number, or NET SEND address. In addition, you can designate the day(s) and times that the operator is available, including the start and end time of a workday.

NOTE UNDERSTANDING THE STANDARD WORKWEEK
The start and end time of a workday is based on the U.S. standard workweek of Monday through Friday and does not accommodate any other workweek definition.

Quick Check
1. If a job fails, where can you look to diagnose the problem?
2. What types of job steps can be executed?

Quick Check Answers
1. The first place to look is in the job history, which can be accessed from SQL Server Management Studio (SSMS) by right-clicking a job and selecting View History. You can also look in the logging files that are configured for each job step. In some cases, you might find additional information in the Microsoft Windows event logs.
2. You can create jobs that execute T-SQL, ActiveX scripts, operating system commands, or executable files. You can also configure specific tasks for replication, Analysis Services, and Integration Services.
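If you prefer to check job outcomes from a query window rather than through SSMS, you can read the history table directly. The following sketch lists recent failed job steps; the msdb tables and the run_status code of 0 (failed) are standard, while the TOP value is arbitrary:

SELECT TOP (50)
       j.name AS job_name, h.step_id, h.step_name,
       h.run_date, h.run_time, h.sql_severity, h.message
FROM msdb.dbo.sysjobhistory AS h
    JOIN msdb.dbo.sysjobs AS j ON h.job_id = j.job_id
WHERE h.run_status = 0    -- 0 = failed
ORDER BY h.run_date DESC, h.run_time DESC;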
PRACTICE Creating Jobs and Operators
In these practices, you define an operator for SQL Server to notify. You also create a job to reindex the AdventureWorks database.

PRACTICE 1 Create an Operator
In this practice, you create an operator that is subsequently used to send notifications for jobs and alerts.
1. In SQL Server Management Studio, expand the SQL Server Agent node, right-click Operators, and select New Operator.
2. Give the operator a name and specify an e-mail address, as shown here.
3. Click OK and review the operator that was just created.

PRACTICE 2 Create a Job
In this practice, you create a job to reindex the AdventureWorks database.
1. Execute the following code in the AdventureWorks database to create a stored procedure to reindex tables:

CREATE PROCEDURE dbo.asp_reindex @database SYSNAME, @fragpercent INT
AS
DECLARE @cmd    NVARCHAR(max),
        @table  SYSNAME,
        @schema SYSNAME
--Using a cursor for demonstration purposes.
--Could also do this with a table variable and a WHILE loop.
DECLARE curtable CURSOR FOR
    SELECT DISTINCT OBJECT_SCHEMA_NAME(object_id, database_id) SchemaName,
           OBJECT_NAME(object_id, database_id) TableName
    FROM sys.dm_db_index_physical_stats
         (DB_ID(@database), NULL, NULL, NULL, 'SAMPLED')
    WHERE avg_fragmentation_in_percent >= @fragpercent
    FOR READ ONLY

OPEN curtable
FETCH curtable INTO @schema, @table

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @cmd = 'ALTER INDEX ALL ON ' + @database + '.' + @schema + '.'
               + @table + ' REBUILD WITH (ONLINE = ON)'

    --Try an ONLINE rebuild first; on failure, fall back to an OFFLINE rebuild.
    BEGIN TRY
        EXEC sp_executesql @cmd
    END TRY
    BEGIN CATCH
        SET @cmd = 'ALTER INDEX ALL ON ' + @database + '.' + @schema + '.'
                   + @table + ' REBUILD WITH (ONLINE = OFF)'
        EXEC sp_executesql @cmd
    END CATCH

    FETCH curtable INTO @schema, @table
END

CLOSE curtable
DEALLOCATE curtable
GO

2. Below SQL Server Agent, right-click the Jobs node and select New Job.
3. Give your new job a name, set the owner to sa, select Database Maintenance for the job category, and add a description, as shown here.
4. Select the Steps page and click New to open the New Job Step dialog box.
5. Specify a name for the step, select Transact-SQL for the step type, leave Run As blank, set the database to AdventureWorks, and enter the SQL command shown here to execute the reindex procedure you created in step 1.
6. Select the Advanced page and specify an output file of C:\Test\Dailymaintenance.txt for logging. Click OK.
7. Select the Schedules page and click New to define a new daily schedule, as shown here, and click OK to close the New Job Schedule dialog box.
8. Click OK to save the new job and close the New Job dialog box.
9. Expand Jobs. Right-click the Re-index Databases job and select Start. Upon completion of the job, review the job execution history and the logging file.

Lesson Summary
- An operator is an alias for a person, group, or device that you want to be the target of notifications.
- A job can be created that contains multiple steps with control flow dependency, logging, and one or more schedules.

Lesson Review
The following question is intended to reinforce key information presented in Lesson 1, "Creating Jobs." The question is also available on the companion CD if you prefer to review it in electronic form.
NOTE  ANSWERS
The answer to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. Where would you look to retrieve a list of jobs that have failed?
A. The Windows event log
B. The job history in SSMS
C. The SQL Server Agent error log
D. The SQL Server error log
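As a closing aside for this lesson (not part of the original practice), the job built in Practice 2 could also be created entirely in T-SQL through the SQL Server Agent stored procedures in msdb. The sketch below is illustrative: the schedule time, the fragmentation threshold of 30, and the exact names are assumptions rather than values from the book, while the output file path matches step 6 above.

USE msdb;
GO
-- Create the job, owned by sa, in the Database Maintenance category.
EXEC dbo.sp_add_job
    @job_name = N'Re-index Databases',
    @owner_login_name = N'sa',
    @category_name = N'Database Maintenance';

-- Add a single T-SQL step that calls the reindex procedure.
EXEC dbo.sp_add_jobstep
    @job_name = N'Re-index Databases',
    @step_name = N'Reindex AdventureWorks',
    @subsystem = N'TSQL',
    @database_name = N'AdventureWorks',
    @command = N'EXEC dbo.asp_reindex @database = ''AdventureWorks'', @fragpercent = 30;',
    @output_file_name = N'C:\Test\Dailymaintenance.txt';

-- Schedule the job to run once a day at 2 A.M. (time in HHMMSS form).
EXEC dbo.sp_add_jobschedule
    @job_name = N'Re-index Databases',
    @name = N'Daily reindex',
    @freq_type = 4,        -- daily
    @freq_interval = 1,    -- every day
    @active_start_time = 020000;

-- Target the local server so SQL Server Agent runs the job.
EXEC dbo.sp_add_jobserver
    @job_name = N'Re-index Databases',
    @server_name = N'(local)';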
Lesson 2: Creating Alerts

Alerts provide the capability to send notifications or perform actions based on events or conditions that occur either within the SQL Server instance or on the machine hosting your instance.

After this lesson, you will be able to:
- Create alerts

Estimated lesson time: 20 minutes

SQL Server Agent Alerts

Alerts can be configured as one of the following three types:
- A SQL Server event
- A Performance Condition alert
- A Windows Management Instrumentation (WMI) event

An alert is raised for a SQL Server event based on either an error number or an error severity level. In addition, you can restrict the alert to a specific database or to a specific text string within an error message. When a SQL Server event alert is created, the SQL Server Agent scans the Windows Application event log to look for matches to the event criteria that you have defined. For example, you could fire an alert on an error severity of 22 to notify an operator that a table is suspect.

Performance Condition alerts are defined against System Monitor counters. When the alert is defined, you specify the object, counter, and instance that you want to monitor, along with a value for the counter and whether the alert should fire when the counter is greater than, less than, or equal to that value. For example, you could fire an alert to notify you when the amount of free disk space falls below 15 percent.

An alert for a WMI event allows you to send alerts based on events that occur on the server hosting your SQL Server instance. Any time an event occurs on the machine (for example, the network card is disconnected, a file is created, a file is deleted, or the registry is written to), a WMI event is raised within Windows. A WMI alert sets up a listener to the WMI infrastructure and fires the alert when the Windows event occurs.

Each alert can be configured with a response. The responses that are available are:
- Execute job
- Notify operator
By specifying a job to execute when an alert is raised, you can configure your environment to trap and attempt to fix errors on an automated basis, eliminating the need for an administrator to respond to routine events.

EXAM TIP
You need to know the types of alerts that can be defined in SQL Server and the response criteria that can be specified for an alert.

Quick Check
1. What are the three types of alerts that can be created?
2. What are the two response actions that can be configured for an alert?

Quick Check Answers
1. You can create alerts on performance counters, SQL Server errors, and WMI queries.
2. You can have an alert send a notification or execute a job in response to the alert condition.

PRACTICE  Creating Alerts

In these practices, you create alerts to send notifications when you are running out of transaction log space and when a Level 22 error occurs.

PRACTICE 1  Create a Performance Condition Alert

In this practice, you create an alert to send a notification when the percentage of transaction log space used for the AdventureWorks database exceeds 90 percent.
1. In SQL Server Management Studio, below SQL Server Agent, right-click Alerts and select New Alert.
2. Give your alert a name, and from the Type drop-down list, select SQL Server Performance Condition Alert.
3. From the Object drop-down list, select the SQLServer:Databases object. From the Counter drop-down list, select Percent Log Used. Select the AdventureWorks instance from the Instance drop-down list, and set the alert to fire when the counter rises above 90 by selecting Rises Above from the Alert If Counter drop-down list and entering 90 in the Value text box.
4. Select the Response page, select the Notify Operators check box, and select the check boxes for the notification options for your operator.
5. Select the Options page and select the E-mail check box to include the alert error text. Click OK.

PRACTICE 2  Create a SQL Server Event Alert

In this practice, you create a SQL Server event alert.
1. Right-click Alerts and select New Alert.
2. Give your alert a name by entering it into the Name text box, and select SQL Server Event Alert from the Type drop-down list.
3. Specify All Databases and an error severity of 22, as shown here.
4. Select the Response page, select the Notify Operators check box, and select the notification options for your operator.
5. Select the Options page and select the E-mail check box to include the alert error text. Click OK.
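If you prefer scripting to the dialog boxes, SQL Server Agent also exposes alerts through stored procedures in msdb. The following sketch is a rough equivalent of the two practices above; the alert names are hypothetical, and it assumes an operator named SQLAdmin already exists:

USE msdb;
GO
-- Performance condition alert: Percent Log Used for AdventureWorks rises above 90.
EXEC dbo.sp_add_alert
    @name = N'AdventureWorks log over 90 percent',
    @performance_condition = N'SQLServer:Databases|Percent Log Used|AdventureWorks|>|90',
    @include_event_description_in = 1;   -- include the error text in the e-mail

-- SQL Server event alert: any severity 22 error in any database.
EXEC dbo.sp_add_alert
    @name = N'Severity 22 errors',
    @severity = 22,
    @include_event_description_in = 1;

-- Notify the operator by e-mail for both alerts (method 1 = e-mail).
EXEC dbo.sp_add_notification
    @alert_name = N'AdventureWorks log over 90 percent',
    @operator_name = N'SQLAdmin',
    @notification_method = 1;

EXEC dbo.sp_add_notification
    @alert_name = N'Severity 22 errors',
    @operator_name = N'SQLAdmin',
    @notification_method = 1;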
Lesson Summary
- Alerts enable you to notify operators as well as execute jobs to fix problems when an event occurs.

Lesson Review
The following question is intended to reinforce key information presented in Lesson 2, "Creating Alerts." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE  ANSWERS
The answer to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. Your Orders database crashed last night, and you have determined that the crash was caused by a data file running out of space. What tool do you use to send a notification to an administrator as well as expand the data file before it runs out of space?
A. SQL Server Agent
B. System Monitor
C. Event Viewer
D. Network Monitor
Chapter Review

To practice and reinforce the skills you learned in this chapter further, you can perform the following tasks:
- Review the chapter summary.
- Review the list of key terms introduced in this chapter.
- Complete the case scenario. This scenario sets up a real-world situation involving the topics in this chapter and asks you to create a solution.
- Complete the suggested practices.
- Take a practice test.

Chapter Summary
- SQL Server Agent provides a scheduling engine that can be used to execute jobs.
- Jobs can have one or more steps with basic control flow dependencies, logging, notification, and one or more execution schedules.
- Operators are used to encapsulate the settings used to send a notification to a person, group, or device.

Key Terms
Do you know what these key terms mean? You can check your answers by looking up the terms in the glossary at the end of the book.
- Alert
- Job step
- Operator

Case Scenario
In the following case scenario, you apply what you've learned in this chapter. You can find answers to these questions in the "Answers" section at the end of this book.

Case Scenario: Designing an Automation Strategy for Coho Vineyard

BACKGROUND

Company Overview
Coho Vineyard was founded in 1947 as a local, family-run winery. Due to the award-winning wines it has produced over the last several decades, Coho Vineyard has experienced significant growth. To continue expanding, several existing wineries were acquired over the years. Today, the company owns 16 wineries; 9 wineries are in Washington, Oregon, and
California, and the remaining 7 wineries are located in Wisconsin and Michigan. The wineries employ 532 people, 162 of whom work in the central office that houses servers critical to the business. The company has 122 salespeople who travel around the world and need access to up-to-date inventory availability.

Planned Changes
Until now, each of the 16 wineries owned by Coho Vineyard has run a separate Web site locally on the premises. Coho Vineyard wants to consolidate the Web presence of these wineries so that Web visitors can purchase products from all 16 wineries from a single online store. All data associated with this Web site will be stored in databases in the central office. When the data is consolidated at the central office, merge replication will be used to deliver data to the salespeople as well as to allow them to enter orders. To meet the needs of the salespeople until the consolidation project is completed, inventory data at each winery is sent to the central office at the end of each day.

EXISTING DATA ENVIRONMENT

Databases
Each winery presently maintains its own database to store all business information. At the end of each month, this information is brought to the central office and transferred into the databases shown in Table 10-1.

TABLE 10-1 Coho Vineyard Databases
DATABASE     SIZE
Customer     180 megabytes (MB)
Accounting   500 MB
HR           100 MB
Inventory    250 MB
Promotions   80 MB

After the database consolidation project is complete, a new database named Order will serve as a data store for the new Web store. As part of their daily work, employees also will connect periodically to the Order database using a new in-house Web application. The HR database contains sensitive data and is protected using Transparent Data Encryption (TDE). In addition, data in the Salary table is encrypted using a certificate.

Database Servers
A single server named DB1 contains all the databases at the central office. DB1 is running SQL Server 2008 Enterprise on Windows Server 2003, Enterprise edition.
Business Requirements
You need to design an archiving solution for the Customer and Order databases. Your archival strategy should allow the Customer data to be saved for six years.

To prepare the Order database for archiving procedures, you create a partitioned table named Order.Sales. Order.Sales includes two partitions. Partition 1 includes sales activity for the current month. Partition 2 is used to store sales activity for the previous month. Orders placed before the previous month should be moved to another partitioned table named Order.Archive. Partition 1 of Order.Archive includes all archived data. Partition 2 remains empty.

A process needs to be created to load the inventory data from each of the 16 wineries by 4 A.M. daily.

Four large customers submit orders using the Coho Vineyard Extensible Markup Language (XML) schema for Electronic Data Interchange (EDI) transactions. The EDI files arrive by 5 P.M. and need to be parsed and loaded into the Customer, Accounting, and Inventory databases, each of which contains tables relevant to placing an order. The EDI import routine is currently a single-threaded C++ application that takes between three and six hours to process the files. You need to finish the EDI process by 5:30 P.M. to meet your Service Level Agreement (SLA) with the customers. After the consolidation project has finished, the EDI routine loads all data into the new Order database.

You need to back up all databases at all locations. You can lose a maximum of five minutes of data under a worst-case scenario. The Customer, Account, Inventory, Promotions, and Order databases can be off-line for a maximum of 20 minutes in the event of a disaster. Data older than six months in the Customer and Order databases can be off-line for up to 12 hours in the event of a disaster.

Answer the following question.
1. What would you configure to ensure that administrative processes were automated?

Suggested Practices
To help you master the exam objectives presented in this chapter, complete the following tasks.

Create Jobs
- Practice: Create jobs to execute full, differential, and transaction log backups within your environment.

Create Alerts
- Practice: Create an alert that is raised when the disk drive that your data files are on has less than 15 percent free space available.
Take a Practice Test
The practice tests on this book's companion CD offer many options. For example, you can test yourself on just one exam objective, or you can test yourself on all the 70-432 certification exam content. You can set up the test so that it closely simulates the experience of taking a certification exam, or you can set it up in study mode so that you can look at the correct answers and explanations after you answer each question.

MORE INFO  PRACTICE TESTS
For details about all the practice test options available, see the section entitled "How to Use the Practice Tests" in the Introduction to this book.
CHAPTER 11

Designing SQL Server Security

Designing a solid security system requires implementation of a layered approach, a process called "defense in depth." This chapter explains how to configure each layer within the Microsoft SQL Server security infrastructure to help prevent unauthorized access to data.

Exam objectives in this chapter:
- Manage logins and server roles.
- Manage users and database roles.
- Manage SQL Server instance permissions.
- Manage database permissions.
- Manage schema permissions and object permissions.
- Audit SQL Server instances.
- Manage transparent data encryption.
- Configure surface area.

Lessons in this chapter:
- Lesson 1: TCP Endpoints
- Lesson 2: Configuring the SQL Server Surface Area
- Lesson 3: Creating Principals
- Lesson 4: Managing Permissions
- Lesson 5: Auditing SQL Server Instances
- Lesson 6: Encrypting Data

Before You Begin
To complete the lessons in this chapter, you must have an instance of SQL Server 2008 installed with the AdventureWorks sample database.
Lesson 1: TCP Endpoints

Endpoints control the capability to connect to an instance of SQL Server as well as dictate the communication methods that are acceptable. Acting much like firewalls on the network, endpoints are a layer of security at the border between applications and your SQL Server instance. This lesson provides a basic overview of the endpoint architecture present in SQL Server 2008.

After this lesson, you will be able to:
- Understand the role of endpoints in securing a SQL Server 2008 instance

Estimated lesson time: 15 minutes

Endpoint Types and Payloads

An endpoint has two basic parts: a transport and a payload. Endpoints can use one of two transports: TCP and HTTP. Endpoints also have a payload that defines the basic category of traffic that is allowed; the payload can be SOAP, TSQL, SERVICE_BROKER, or DATABASE_MIRRORING. Table 11-1 lists the valid combinations of endpoint transport and endpoint payload.

TABLE 11-1 Endpoint Transport and Payload
TRANSPORT   PAYLOAD
TCP         TSQL
TCP         SERVICE_BROKER
TCP         DATABASE_MIRRORING
HTTP        SOAP

By combining an endpoint transport and payload, SQL Server can filter acceptable traffic before a command even reaches the SQL Server instance. For example, suppose you have an endpoint defined as TCP with a payload of TSQL. If an application attempted to send HTTP, SERVICE_BROKER, or DATABASE_MIRRORING traffic through the endpoint, the connection would be denied without SQL Server needing to authenticate the request.

This process is very similar to the way firewalls work on a network. Network administrators configure firewalls to allow traffic on only a specific set of TCP and UDP ports. Any request attempting to use a port that is blocked is rejected at the firewall. Endpoints act in the same manner by rejecting requests that are not properly formatted based on the endpoint definition.
Endpoint Access

Even if traffic going to the endpoint matches the correct transport and payload, a connection is still not allowed unless access has been granted on the endpoint. Endpoint access has two layers.

The first layer of access security is determined by the endpoint state. An endpoint can have one of three states:
- STARTED  The endpoint is actively listening for connections and replies to an application.
- STOPPED  The endpoint is actively listening but returns a connection error to an application.
- DISABLED  The endpoint does not listen and does not respond to any connection that is attempted.

The second layer of security is permission to connect to the endpoint. An application must have a login created in SQL Server that has the CONNECT permission granted on the endpoint before the connection is allowed through the endpoint.

You might be wondering about all the effort involved just to create a connection to an instance of SQL Server before the user is even authenticated. In prior versions of SQL Server, any application could connect to a server running SQL Server and transmit any type of request. No attempt was made to ensure that applications transmitted validly formed requests, so hacking into a server running SQL Server was much easier to accomplish. SQL Server 2008 ensures that only valid requests can be submitted by a valid user before a request is scheduled within the engine. Administrators also have a master switch to shut off access immediately, by setting the state of the endpoint being used to DISABLED, if they feel someone is attempting to compromise the server.

TCP Endpoints

You can create Transmission Control Protocol (TCP) endpoints with three different payloads: TSQL, DATABASE_MIRRORING, and SERVICE_BROKER.

TCP Protocol Arguments
You can configure TCP endpoints to listen on specific Internet Protocol (IP) addresses and port numbers. The two arguments that are universal for all TCP endpoints are:
- LISTENER_PORT
- LISTENER_IP

LISTENER_PORT is required. The TCP for TSQL endpoint created for each instance during installation is already configured for TCP port 1433 or the alternate port number for the instance.
BEST PRACTICES  PORT NUMBERS
Because port 5022 is the default TCP port number for a DATABASE_MIRRORING endpoint, and 1433 is the default TCP port for a TSQL endpoint, you might want to specify a different port number. Not using the default port number helps to foil potential hackers, or at least makes their job more difficult, by requiring them to use a port scanner instead of just blindly connecting to port 1433 or 5022 for a denial of service (DoS) attack or other hacking attack.

The LISTENER_IP argument is an optional argument that can provide a very powerful security layer for some types of applications. You can specify a specific IP address for the endpoint to listen on. The default setting is ALL, which means that the endpoint listens for connections sent to any valid IP address configured on the machine. However, if you want to limit connection requests to a specific network interface card (NIC), you can specify a LISTENER_IP argument. When you specify an IP address, the endpoint listens for requests sent only to the IP address specified.

EXAM TIP
TSQL endpoints do not have any additional configuration options beyond the universal TCP settings.

Database Mirroring and Service Broker Common Arguments
Database Mirroring and Service Broker endpoints provide options to specify the authentication method and the encryption setting. You can use either Microsoft Windows-based authentication or certificates. You specify Windows-based authentication by selecting the NTLM, KERBEROS, or NEGOTIATE option. The NEGOTIATE option causes the instances to select the authentication method dynamically. You can set up certificate-based authentication by using a certificate from a trusted authority or by generating your own Windows certificate.

BEST PRACTICES  AUTHENTICATION
When all Database Mirroring and Service Broker instances reside within a single domain or across trusted domains, you should use Windows authentication. When instances span nontrusted domains, you should use certificate-based authentication.

SQL Server can encrypt all communications between endpoints, and you can specify which encryption algorithm to use for the communications. The default algorithm is RC4, but you can specify the much stronger Advanced Encryption Standard (AES) algorithm.
BEST PRACTICES  ENCRYPTION
Use RC4 for minimal encryption strength and best performance. Use AES if you require strong encryption, but note that this algorithm requires more calculation overhead and affects performance.

Database Mirroring-Specific Arguments
Database Mirroring endpoints include a third argument related to the role within the Database Mirroring session.

EXAM TIP
You can specify only one TCP endpoint with a payload of DATABASE_MIRRORING for each instance of SQL Server.

You can specify that an endpoint is a PARTNER, WITNESS, or ALL. An endpoint specified as PARTNER can participate only as the principal or as the mirror. An endpoint specified as WITNESS can participate only as a witness. An endpoint specified as ALL can function in any role.

NOTE  ENDPOINTS ON EXPRESS EDITION
If you are creating a Database Mirroring endpoint on SQL Server 2008 Express, it supports only a role of WITNESS.

The following T-SQL example shows how to create a Database Mirroring endpoint:

CREATE ENDPOINT [Mirroring]
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (ROLE = PARTNER, ENCRYPTION = REQUIRED);

ALTER ENDPOINT [Mirroring] STATE = STARTED;

This code creates an endpoint to service Database Mirroring sessions on port 5022, responding to requests from all valid IP addresses. The ROLE = PARTNER option specifies that the endpoint allows only databases hosted on this SQL Server instance to participate as a principal or mirror, using the default RC4 encryption algorithm.

Service Broker-Specific Arguments
In addition to authentication modes and encryption, Service Broker endpoints implement arguments related to message forwarding. The MESSAGE_FORWARDING option enables messages destined for a different broker instance to be forwarded to a specified forwarding address. The options are ENABLED and DISABLED. If the MESSAGE_FORWARDING option is set to ENABLED, you can also specify
MESSAGE_FORWARD_SIZE, which specifies the maximum amount of storage to allocate for forwarded messages.

Although a complete discussion of Service Broker is beyond the scope of this book, a short overview is necessary to explain this behavior. Service Broker instances process messages by executing stored procedures to perform work in an asynchronous manner. Each Service Broker instance is configured to process messages of a particular format. However, it is possible to have many Service Broker instances configured in an environment, each of which processes different types of messages. By employing message forwarding, administrators can balance the load on Service Broker instances more easily, without requiring changes to applications.

NOTE  ENCRYPTION
The communication encryption for endpoints is coded to understand the source and destination of the traffic. If the communication occurs entirely within the SQL Server instance, the traffic is not encrypted, because encryption would introduce unnecessary overhead in the communications. This is especially important with Service Broker, in which many messages are exchanged between queues within a single instance. Traffic is encrypted only when data will be transmitted outside the SQL Server instance.

Quick Check
1. What are the two parts of an endpoint?
2. What are the three states of an endpoint, and what is the difference between each state?
3. What authority must be granted before an endpoint allows a connection request?
4. What types of authentication are available for Service Broker and Database Mirroring endpoints?
5. What are the two universal arguments for TCP endpoints?

Quick Check Answers
1. An endpoint has a transport defined as either TCP or HTTP and a payload defined as TSQL, SERVICE_BROKER, DATABASE_MIRRORING, or SOAP.
2. The three states are STARTED, STOPPED, and DISABLED. An endpoint that is STARTED listens for and allows connections. An endpoint that is STOPPED listens for connection requests and returns an error message. An endpoint that is DISABLED does not respond to any request.
3. To allow a connection request, the login that is being used must have been granted the CONNECT permission on the endpoint.
4. NTLM or Kerberos authentication can be specified. You can also specify an option of NEGOTIATE, which causes the specific authentication method to be negotiated between the application and the endpoint.
5. You are required to specify a port for the endpoint to listen on. If you want, you can configure an IP address that restricts the endpoint to respond only to traffic coming from the specified IP address.

PRACTICE  Inspecting Existing Endpoints

In this practice, you query several dynamic management views (DMVs) to gather information about endpoints configured in your environment.
1. Start SQL Server Management Studio (SSMS) and connect to your instance. Open a new query window and execute the following batch:

SELECT * FROM sys.endpoints
SELECT * FROM sys.tcp_endpoints
SELECT * FROM sys.http_endpoints
SELECT * FROM sys.database_mirroring_endpoints
SELECT * FROM sys.service_broker_endpoints

2. Inspect the results for the data that is returned from each of the DMVs.

Lesson Summary
- Endpoints in SQL Server act much like firewalls by filtering out any traffic that does not meet allowed formats.
- Each endpoint has a transport that is defined as either TCP or HTTP.
- Endpoints have a second part called the payload, which is defined as TSQL, DATABASE_MIRRORING, SERVICE_BROKER, or SOAP.
- TSQL endpoints are configured during installation to listen on the port number specified for the instance.
- Service Broker and Database Mirroring endpoints can have an authentication method specified, and can encrypt all traffic sent, based on an algorithm that you specify.

Lesson Review
You can use the following questions to test your knowledge of the information presented in Lesson 1, "TCP Endpoints." The questions are also available on the companion CD if you prefer to review them in electronic form.
NOTE  ANSWERS
Answers to these questions and explanations of why each answer choice is right or wrong are located in the "Answers" section at the end of the book.

1. You are the database administrator at A. Datum Corporation. Users are complaining that applications cannot connect to the SQL Server. You have verified all the application settings, and you can connect to the server from your desktop using SSMS, but the users' applications keep returning an "Access denied" error message. What could be the problem?
A. The TCP endpoint for TSQL is DISABLED.
B. The TCP endpoint for TSQL is STOPPED.
C. Remote connections are not enabled.
D. Users do not have CONNECT permissions on the endpoint.

2. You have configured a Database Mirroring session within your environment. The principal and mirror endpoints were created successfully with a ROLE setting of PARTNER and then started. You have verified that you can connect and authenticate to each endpoint. However, Database Mirroring fails to configure properly. What might be the problem?
A. The authentication mode is set to NTLM.
B. The authentication mode is set to NEGOTIATE.
C. The encryption setting is different on each endpoint.
D. The encryption is set to AES on each endpoint.
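As a closing sketch for this lesson (not from the book's practices), the following shows how the two access layers interact on the default TSQL endpoint, which is named TSQL Default TCP on a default installation; the login name is hypothetical:

-- Layer 2: the login needs CONNECT permission on the endpoint.
GRANT CONNECT ON ENDPOINT::[TSQL Default TCP] TO [AppLogin];

-- Layer 1: the endpoint state is the master switch for connections.
-- STOPPED still listens but returns an error; DISABLED does not respond at all.
ALTER ENDPOINT [TSQL Default TCP] STATE = STOPPED;

-- Restore normal operation.
ALTER ENDPOINT [TSQL Default TCP] STATE = STARTED;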
Lesson 2: Configuring the SQL Server Surface Area

Given enough time, anyone can eventually beat any security implementation. The purpose of security is to provide enough barriers that the effort required to break into a system exceeds the benefit received. In this lesson, you learn how to configure your instances to expose the minimum number of attack points possible by minimizing the feature set that is enabled within SQL Server.

After this lesson, you will be able to:
- Enable and disable SQL Server features

Estimated lesson time: 15 minutes

Surface Area Configuration

One of the most frequent ways that attackers gain access to a system is through features that have been enabled but are rarely used. SQL Server now disables every feature not required for the operation of the database engine. At the time of installation, you can decide to force users to authenticate to an instance using only Windows credentials. If the authentication mode for your instance is set to Windows only, you have disabled users' ability to use SQL Server native logins.

For many years, SQL Server did not have any issues with viruses, mainly because no one wrote a virus specifically attacking SQL Server. Several years ago, the first SQL Server-specific attack, the Slammer worm, wreaked havoc on organizations around the world within a few hours of release. The main issue with Slammer was not that it targeted SQL Server, but that administrators had an extremely difficult time containing it, because thousands of copies of SQL Server were installed across an organization and were not under administrative control. Many of the SQL Server instances were Microsoft Database Engine (MSDE), used as a local data store for many applications. Unfortunately, every SQL Server instance, regardless of edition, allowed open connections from any source, when in fact few instances required the ability to connect remotely.

Beginning with SQL Server 2005, you can determine whether the instance accepts remote connections by configuring the network protocols for remote access. By default, editions that normally do not need to allow remote connections, such as SQL Server Express, have only the Shared Memory network provider enabled. If you want to be able to connect to a SQL Server instance remotely, the Transmission Control Protocol/Internet Protocol (TCP/IP) network provider must be enabled.

The biggest potential risk to an instance is through the use of features that expose an external interface or ad hoc execution capability. The two features with the greatest risk are OPENROWSET/OPENDATASOURCE and OLE Automation procedures.
You enable and disable SQL Server features by using sp_configure. The features that you should leave disabled unless you need their specific functionality are the following:
- Ad Hoc Distributed Queries
- CLR Enabled
- Cross Database Ownership Chaining (CDOC)
- Database Mail
- External Key Management
- Filestream Access Level
- OLE Automation Procedures
- Remote Admin Connections
- SQL Mail extended stored procedures (XPs)
- xp_cmdshell

OLE Automation procedures exist within SQL Server to provide some basic interoperability features for previous versions. With the inclusion of the Common Language Runtime (CLR) in SQL Server 2005, any applications that need the services of Object Linking and Embedding (OLE) automation should be rewritten as Visual Basic .NET or C#.NET assemblies. The main advantage of a CLR routine is that the routine runs within a protected memory space and cannot corrupt the SQL Server memory stack, which is possible with OLE automation.

SQL Mail XPs exist for backward compatibility and were deprecated in SQL Server 2005. You should not be using SQL Mail, and if you have applications still using SQL Mail functionality, they need to be rewritten before the next version of SQL Server ships.

OPENROWSET and OPENDATASOURCE expose you to attack by allowing applications to embed security credentials into code that spawns a connection to another instance from within SQL Server. If you need the ability to execute queries across instances, you should be using linked servers, which allow Windows credentials to be passed between machines.

CDOC allows you to transfer execution authority across databases. When enabled, the owner of the database containing the object being called effectively cedes control to another database owner. In Lesson 4, "Managing Permissions," you will learn about signatures, which provide better control over security while still allowing procedures, functions, and triggers to access objects across databases.

EXAM TIP
SQL Server 2005 provided a utility called the Surface Area Configuration Manager, which does not exist in SQL Server 2008. The functionality provided by Surface Area Configuration for Connections is now accomplished using SQL Server Configuration Manager. The functionality provided by Surface Area Configuration for Features did not change; the GUI interface to sp_configure was simply removed.
Quick Check
1. How do you configure an instance so that only local connections are allowed?
2. What do you use to enable or disable features for an instance?

Quick Check Answers
1. The TCP/IP provider enables connections to be created to the instance remotely. By disabling the TCP/IP provider, you can create only local connections to the instance.
2. The sp_configure system stored procedure is used to enable or disable features.

PRACTICE  Configuring the Surface Area

In this practice, you disable several features and check the configuration options that are set for the instance.
1. Execute the following code to turn on the ability to view all the configuration options for an instance:

EXEC sp_configure 'show advanced options',1
GO
RECONFIGURE WITH OVERRIDE
GO
EXEC sp_configure
GO

2. Execute the following code to turn off ad hoc distributed queries, CDOC, CLR, OLE Automation procedures, SQL Mail XPs, and xp_cmdshell:

EXEC sp_configure 'Ad Hoc Distributed Queries',0
EXEC sp_configure 'clr enabled',0
EXEC sp_configure 'cross db ownership chaining',0
EXEC sp_configure 'Ole Automation Procedures',0
EXEC sp_configure 'SQL Mail XPs',0
EXEC sp_configure 'xp_cmdshell',0
GO
RECONFIGURE WITH OVERRIDE
GO

Lesson Summary
- The first surface area configuration decision occurs during installation, when you decide whether to force all login access to the instance to use Windows-only credentials.
- You should disable the TCP/IP provider for any instance for which you do not want remote connections.
- The sp_configure system stored procedure is used to enable or disable features.
Lesson Review
The following question is intended to reinforce key information presented in Lesson 2, "Configuring the SQL Server Surface Area." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE  ANSWERS
The answer to this question and an explanation of why each answer choice is right or wrong is located in the "Answers" section at the end of the book.

1. Which tool would you use to enable or disable SQL Server features?
A. SQL Server Configuration Manager
B. The sp_configure tool
C. SQL Server Surface Area Configuration Manager
D. SQL Server Installation Center
Lesson 3: Creating Principals

Principals are the means by which you authenticate and are identified within an instance or database. Principals are broken down into two major categories: logins/users and groups, which exist at both the instance and database levels.

After this lesson, you will be able to:
- Create a login
- Manage server role membership
- Create database users and roles
- Manage database role membership
- Create a loginless user

Estimated lesson time: 40 minutes

Logins

To gain access to an instance, a user has to authenticate by supplying credentials for SQL Server to validate. You create logins for an instance to allow a user to authenticate. Logins within SQL Server 2008 can be of five different types:
- Standard SQL Server login
- Windows login
- Windows group
- Certificate
- Asymmetric key

A standard SQL Server login is created by a database administrator (DBA) and configured with a name and password that must be supplied by a user to authenticate successfully. The login is stored inside the master database and assigned a local security identifier (SID) within SQL Server.

A SQL Server login can also be mapped to either a Windows login or a Windows group. When adding a Windows login or Windows group, SQL Server stores the name of the login or group along with the corresponding Windows SID. When a user logs in to the instance using Windows credentials, SQL Server makes a call to the Windows security application programming interface (API) to validate the account, retrieves the SID, and then compares the SID to those stored within the master database to verify whether the Windows account has access to the instance.
EXAM TIP
You can create SQL Server logins mapped to certificates or asymmetric keys. However, a login mapped to a certificate or asymmetric key does not provide a means to authenticate to the instance. Logins mapped to certificates and asymmetric keys are used internally as a security container.

The generic syntax for creating a login is:

CREATE LOGIN loginName { WITH <option_list1> | FROM <sources> }

<option_list1> ::=
    PASSWORD = { 'password' | hashed_password HASHED } [ MUST_CHANGE ]
    [ , <option_list2> [ ,... ] ]

<option_list2> ::=
    SID = sid
    | DEFAULT_DATABASE = database
    | DEFAULT_LANGUAGE = language
    | CHECK_EXPIRATION = { ON | OFF }
    | CHECK_POLICY = { ON | OFF }
    | CREDENTIAL = credential_name

<sources> ::=
    WINDOWS [ WITH <windows_options> [ ,... ] ]
    | CERTIFICATE certname
    | ASYMMETRIC KEY asym_key_name

<windows_options> ::=
    DEFAULT_DATABASE = database
    | DEFAULT_LANGUAGE = language

When the CHECK_POLICY option (the default and recommended setting) is enabled, SQL Server 2008 enforces the Windows password policy settings when you create a SQL Server login. CHECK_EXPIRATION is used to prevent brute force attacks against a login. When CHECK_EXPIRATION is enabled, each time the login is used to authenticate to an instance, SQL Server checks whether the password has expired and prompts the user to change the password if necessary.

Using Windows groups provides the greatest flexibility for managing security access. You simply add or remove accounts from the group to control access to a SQL Server instance. A DBA is also isolated from the details of people joining and leaving companies or moving to different groups within an organization. The DBA can then focus on defining groups based on permission profiles and leave the mechanics of adding and removing user accounts to standard business processes within your company.
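As a concrete illustration of this syntax (the domain, group, and login names here are hypothetical), you might create one login mapped to a Windows group and one standard SQL Server login with the policy options enabled:

-- Map a Windows group to a SQL Server login; membership in the group
-- then controls access to the instance.
CREATE LOGIN [CONTOSO\SQLUsers] FROM WINDOWS
    WITH DEFAULT_DATABASE = AdventureWorks;

-- Standard SQL Server login with password policy and expiration checks.
CREATE LOGIN WebAppLogin
    WITH PASSWORD = '<EnterStrongPasswordHere>',
         CHECK_POLICY = ON,
         CHECK_EXPIRATION = ON;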
When you create a SQL Server login, you can specify a SID for the account explicitly. You do not normally use this capability; however, when you need to copy SQL Server logins from one instance to another, being able to specify the SID allows you to map logins appropriately to any restored databases.

An account protection mechanism in Windows causes an account to be locked out when the correct password is not provided within a specified number of attempts. Every account in Windows can be locked out due to failed login attempts, except the administrator account. The administrator account cannot be locked out, because if you locked out the administrator, you would not have a way of logging in to the system and fixing anything. Just like the administrator account in Windows, the sa account cannot be locked out due to failed login attempts, making the sa account a prime target for brute force attacks. System administrators defeat brute force attacks on the administrator account by renaming the account. You can also rename the sa account to protect an instance from brute force attacks.

When you are performing maintenance on a database, such as deploying new code or changing the database structure, you need to ensure that users are not accessing the database in the meantime. One way to prevent access is to revoke permissions from a login; however, you then have to be able to reestablish the permissions afterward. You can prevent access while keeping the permissions for a login intact by disabling the login, as follows:

ALTER LOGIN <loginname> DISABLE

Fixed Server Roles

Roles in SQL Server provide the same functionality as groups within Windows. Roles provide a convenient way to group multiple users with the same permissions. Permissions are assigned to the role instead of to individual users. Users then gain the required set of permissions by having their account added to the appropriate role.

SQL Server ships with a set of instance-level roles. The instance-level roles are referred to as fixed server roles, because you cannot modify the permissions on the roles. You also cannot create additional roles at an instance level. The server roles that ship with SQL Server are shown in Table 11-2.

TABLE 11-2 Fixed Server Roles
ROLE            MEMBERS CAN
bulkadmin       Administer BCP and Bulk Insert operations
dbcreator       Create databases
diskadmin       Manage disk resources
processadmin    Manage connections and start or pause an instance
securityadmin   Create, alter, and drop logins, but cannot change passwords
TABLE 11-2 Fixed Server Roles (continued)
ROLE            MEMBERS CAN
serveradmin     Perform the same actions as diskadmin and processadmin, plus manage endpoints, change instance settings, and shut down the instance
setupadmin      Manage linked servers
sysadmin        Perform any action within the instance. Members cannot be prevented from accessing any object or performing any action.

Database Users

SQL Server security works on the principle of "no access by default." If you haven't explicitly been granted permission, you cannot perform an action. You grant access to a database by adding a login to the database as a user by executing the CREATE USER command. CREATE USER has the following general syntax:

CREATE USER user_name
    [ { { FOR | FROM }
        { LOGIN login_name
          | CERTIFICATE cert_name
          | ASYMMETRIC KEY asym_key_name }
      | WITHOUT LOGIN ]
    [ WITH DEFAULT_SCHEMA = schema_name ]

The SID of the login is mapped to the database user to provide an access path after a user has authenticated to the instance. When a user changes context to a database, SQL Server looks up the SID for the login, and if the SID has been added to the database, the user is allowed to access the database. However, just because a user can access a database does not mean that the user can access any objects within the database; the user still needs permissions granted on the database objects.

You can create a database user mapped to a certificate or asymmetric key. Database users mapped to certificates or asymmetric keys do not provide access to the database for any login. Certificate- and asymmetric key-mapped users are a security structure internal to the database. One of the applications of this structure, signatures, is covered in Lesson 4, "Managing Permissions," later in this chapter.

Loginless Users

It is possible to create a user in a database that is not associated with a login, referred to as a loginless user. Prior to SQL Server 2005, if you wanted to allow users to access a database only when a specific application was being used, you used an application role. You created the application role with a password and assigned permissions to the application role. Users would then specify the password for the application role to gain access to the database under the application role's security context. Unfortunately, when you connected with the application
role, SQL Server no longer knew the user issuing commands, which created a problem for auditing activity.

Loginless users were added to replace application roles. Users still authenticate to the instance using their own credentials, and the user's login needs access to the database. After SQL Server changes the user's context to the database, the user impersonates the loginless user to gain the necessary permissions. Because the user is authenticating to the instance with his or her own credentials, SQL Server can still audit activity to an individual login even though the login is impersonating a loginless user.

EXAM TIP
Loginless users are designed to replace application roles. Loginless users also provide a much better audit trail than an application role, because each user must authenticate to the instance using their own credentials instead of a generic account.

Fixed Database Roles

Just as you have fixed roles at an instance level, SQL Server provides a set of fixed roles at a database level, as shown in Table 11-3.

TABLE 11-3 Fixed Database Roles
ROLE                MEMBERS CAN
db_accessadmin      Add or remove users in the database
db_backupoperator   Back up the database, but cannot restore a database or view any information in the database
db_datareader       Issue SELECT against all tables, views, and functions within the database
db_datawriter       Issue INSERT, UPDATE, DELETE, and MERGE against all tables within the database. Members of this role must also be members of the db_datareader role.
db_ddladmin         Execute data definition language (DDL) statements
db_denydatareader   Prevent SELECT against all tables, views, and functions within the database
db_denydatawriter   Prevent INSERT, UPDATE, DELETE, and MERGE against all tables within the database
db_owner            Owner of the database, with full control over the database and all objects contained within it
db_securityadmin    Manage the membership of roles and associated permissions, but cannot manage membership for the db_owner role
public              Default group in every database to which all users belong
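Pulling the last few sections together in one hypothetical sketch (building on the WebAppLogin login from the earlier example), the following shows how to create a database user for a login and manage role membership with the SQL Server 2008 system procedures:

USE AdventureWorks;
GO
-- Map the instance-level login to a database user; DEFAULT_SCHEMA
-- controls where unqualified object names resolve first.
CREATE USER WebAppUser FOR LOGIN WebAppLogin
    WITH DEFAULT_SCHEMA = Sales;
GO
-- Add the user to a fixed database role.
EXEC sp_addrolemember @rolename = 'db_datareader', @membername = 'WebAppUser';

-- Fixed server roles are managed at the instance level.
EXEC sp_addsrvrolemember @loginame = 'WebAppLogin', @rolename = 'dbcreator';

-- A user-defined database role, discussed next, works the same way:
-- permissions are granted to the role rather than to individual users.
CREATE ROLE SalesReaders;
EXEC sp_addrolemember @rolename = 'SalesReaders', @membername = 'WebAppUser';
GRANT SELECT ON SCHEMA::Sales TO SalesReaders;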
User Database Roles

Instead of managing permissions for each account, all modern operating systems allow you to define groups of users that all have the same permissions. All system administrators need to do is manage the members of a group instead of the potentially hundreds or thousands of individual permissions. SQL Server applies the same security management principle that administrators have applied to Windows domains by providing the ability to create database roles. A database role is a principal within the database that contains one or more database users. Permissions are assigned to the database role. Although you can assign permissions directly to a user, it is recommended that you create database roles, add users to a role, and then grant permissions to the role.

Quick Check
1. Which logins cannot be used to authenticate to an instance?
2. What database principal was created as a replacement for an application role?

Quick Check Answers
1. You cannot use logins that are mapped to a certificate or asymmetric key to authenticate to an instance.
2. Loginless users are the replacement for an application role.

PRACTICE  Creating Logins and Database Users

In this practice, you create several logins, add the logins as users in the AdventureWorks database, and then create a loginless user within the AdventureWorks database.
1. Click Start, right-click My Computer, and select Manage.
2. Right-click the Users node underneath Local Users and Groups and select New User. Create a Windows account named TestAccount. Close Computer Management.
3. Execute the following code to add the Windows account as a login to your instance, replacing <computer name> with the name of the machine on which you are running SQL Server:

--Brackets are required due to the rules for identifiers
CREATE LOGIN [<computer name>\TestAccount] FROM WINDOWS
GO

4. Execute the following code to create two SQL Server native logins, replacing <EnterStrongPasswordHere> with a strong password, and add the accounts as users to the AdventureWorks database:

CREATE LOGIN Test WITH PASSWORD = '<EnterStrongPasswordHere>'
CREATE LOGIN Test2 WITH PASSWORD = '<EnterStrongPasswordHere>'
GO
USE AdventureWorks
GO
CREATE USER Test FOR LOGIN Test
CREATE USER Test2 FOR LOGIN Test2
GO

5. Execute the following code to create a loginless user in the AdventureWorks database:

USE AdventureWorks
GO
CREATE USER TestUser WITHOUT LOGIN
GO

6. Execute the following code to review the endpoints along with the instance and database principals:

--Instance-level principals.
SELECT * FROM sys.asymmetric_keys
SELECT * FROM sys.certificates
SELECT * FROM sys.credentials
SELECT * FROM sys.linked_logins
SELECT * FROM sys.remote_logins
SELECT * FROM sys.server_principals
SELECT * FROM sys.server_role_members
SELECT * FROM sys.sql_logins
SELECT * FROM sys.endpoints
GO

--Database-level principals.
SELECT * FROM sys.database_principals
SELECT * FROM sys.database_role_members
GO

7. Execute the following code to rename the sa account:

ALTER LOGIN sa WITH NAME = MySaAccount
GO

Lesson Summary
- You can create SQL Server native logins or map Windows accounts to a SQL Server login.
- Logins can be mapped to certificates or asymmetric keys, but logins mapped to certificates or asymmetric keys cannot be used to authenticate to an instance.
- Because the sa account cannot be locked out, you should rename the account using the ALTER LOGIN command.
- Members of the sysadmin role can perform any action within the instance and cannot be prevented from executing any command.
- Members of the db_owner role can perform any action within the given database and cannot be prevented from executing any command within the database.
- Loginless users, created as a replacement for application roles, are users in a database that are not mapped to a login.

Lesson Review
The following questions are intended to reinforce key information presented in Lesson 3, "Creating Principals." The questions are also available on the companion CD if you prefer to review them in electronic form.

NOTE  ANSWERS
Answers to these questions and explanations of why each answer choice is right or wrong are located in the "Answers" section at the end of the book.

1. Wide World Importers has a Windows Server 2003 domain, and all the servers running SQL Server are running on Windows Server 2003 Enterprise edition. The SQL Server instance is configured for Windows-only authentication. Database roles have been created for each group of permissions within a database. Logins are added to the database roles. The DBAs want to move the security assignment of users to the owners of each application without giving up control of the accounts or permissions inside the SQL Server instance. How can the DBAs accomplish their goals? (Choose two. Each answer represents part of the solution.)
A. Have the Windows administrator allow application owners to manage the Windows groups associated with their applications.
B. Add the logins for application owners to the securityadmin role.
C. Map SQL Server logins to the Windows group corresponding to each application.
D. Add the logins for application owners to the sysadmin role.

2. Tina needs to be able to back up databases on an instance without also having the authority to restore or access the contents of the database. How would you accomplish this business requirement with the least amount of effort?
A. Add Tina to the diskadmin role.
B. Add Tina to the db_owner role.
C. Add Tina to the db_backupoperator role.
D. Add Tina to the sysadmin role.
Lesson 4: Managing Permissions

SQL Server denies access by default. Therefore, to access any object or perform any action, you must be granted permission. In this lesson, you learn how to manage permissions on the objects within an instance or database, which are called securables. You learn how to impersonate a user to verify that permissions are set properly. Finally, you learn how to create and manage master keys so that you can use signatures to elevate permissions only when code is executing.

After this lesson, you will be able to:
- Assign permissions to a user
- Control permissions based on a scope
- Understand the effects of metadata security
- Work with ownership chains
- Impersonate a login or user
- Create and manage master keys
- Create signatures and sign modules

Estimated lesson time: 20 minutes

Administrative Accounts

Administrative accounts hold a special position within the SQL Server security structure. Accounts that are considered administrative accounts are:
- Members of the sysadmin fixed server role
- Members of the db_owner fixed database role
- The sa account

In addition, members of the sysadmin role are members of the db_owner role in every database within the instance. You can prevent an account from performing an action by removing the corresponding permission. You cannot limit the permissions of an administrative account. Although you can execute commands to remove permissions, the command does not have any effect, because SQL Server does not check permissions for an administrative account.
Securables

Permissions would be rather uninteresting unless you had something to apply them to, and there would be no need for permissions if no one existed to use them. Permissions work in concert with securables and principals: you GRANT, REVOKE, or DENY <permissions> ON <securables> TO <principals>.

Securables are the objects on which you grant permissions. Every object within SQL Server, including the entire instance, is a securable. Securables can also be nested inside other securables. For example, an instance contains databases; databases contain schemas; and schemas contain tables, views, procedures, functions, and so on.

Schemas

No object created within a database can exist without an owner. All objects must have an owner because objects cannot spontaneously come into existence; rather, they must be created by someone. In addition, for any account to access an object, permission has to be assigned, and you need at least one user with the authority to manage permissions on an object.

Because objects ultimately have to be owned by a user, you can create a management problem when you need to remove a user from a database. If database users directly owned objects, it would not be possible to drop a user without reassigning the user's objects to a different owner, and reassigning an object to a different owner would change the name of the object. Schemas provide the containers that own all objects within a database, and in turn, a schema is owned by a database user. By introducing a schema between users and objects, you can drop a user from the database without affecting the name of an object or the applications that use the object. Schemas are the only objects directly owned by a database user, so to drop a user that owns a schema, you must first change the ownership of the schema to another user.

Permissions

Permissions provide the authority for principals to perform actions within an instance or database. Some permissions apply to a statement, such as INSERT, UPDATE, and SELECT; other permissions apply to an action, such as ALTER TRACE; and still others encompass a broad scope of authority, such as CONTROL.

You grant permissions on an object with the GRANT statement, and you prevent access to an object with the DENY statement. To access an object, permission must be granted explicitly. Each time you issue a GRANT statement, SQL Server places an entry in a security table for the corresponding permission granted. Each time you issue a DENY, an entry is placed in a security table for the DENY. Because a DENY overrides any other permission, a DENY overrides a GRANT.

The REVOKE statement removes permission entries for the object referenced. For example, if you issue GRANT SELECT ON Person.Address TO Test, you can remove the access by executing REVOKE SELECT ON Person.Address FROM Test. Similarly, if you issue DENY SELECT ON Person.Address TO Test, you can remove the DENY by executing REVOKE SELECT ON Person.Address FROM Test.
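Put together as a runnable sketch (using the Test user created in the Lesson 3 practice), the interplay of the three statements looks like this:

USE AdventureWorks;
GO
-- Grant access at the object level.
GRANT SELECT ON Person.Address TO Test;

-- DENY overrides the GRANT; Test can no longer read the table.
DENY SELECT ON Person.Address TO Test;

-- REVOKE removes the existing permission entries (both the GRANT and
-- the DENY), so the default of "no access" applies again.
REVOKE SELECT ON Person.Address FROM Test;

-- The same permission can also be granted at a broader scope, such as
-- an entire schema, which implicitly covers every object within it.
GRANT SELECT ON SCHEMA::Person TO Test;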
• 307. You can also grant permissions at multiple levels; for example, you might grant SELECT permission on the AdventureWorks database, the Person schema, and directly to the Person.Address table. To prevent the user from accessing the Person.Address table, you can then issue three REVOKE statements—database, schema, and table—to remove the SELECT access on the table. Permission Scope Prior to SQL Server 2005, you granted all permissions directly to objects within a database. SQL Server 2005 and later define multiple scopes to which you can assign permissions. A securable can be a database, a schema, or an object. Because you grant permissions on a securable, you can assign permissions to a securable at any scope. Granting permission on a database causes the permission to be granted implicitly to all schemas within the database and thereby to all objects within all schemas. Granting permission on a schema causes the permission to be granted implicitly to all objects within a schema. IMPORTANT PERMISSION SCOPE You can assign permissions to any securable. By using higher-level containers such as databases and schemas, you can assign permissions very flexibly. Although you can assign all permissions directly to the lowest-level objects, if a user needs the same permission to access all objects in a schema or database, you can replace dozens or even thousands of separate permissions by granting the permission on the schema or database instead. A schema is the first layer of security within a database that you should plan for and take advantage of. A schema should represent a functional grouping within an application, such as Customers, Products, Inventory, and HumanResources. You then create objects within the corresponding schema and grant permissions on the schemas to provide security access to an application. For example, if you want to grant SELECT, INSERT, UPDATE, and DELETE permissions on all tables and views within a database, you can accomplish the permission assignment in three different ways: Grant permissions on each table and view Grant permissions on each schema within the database Grant permissions on the database Metadata Security In your everyday life, you take it for granted that many things are hidden from you because you don't have the authority to use them. It should not be a surprise to find out that SQL Server follows the same principle of "out of sight, out of mind." SQL Server secures all the metadata within the system such that you can view only the objects within an instance or database on which you have permissions to perform an action. Lesson 4: Managing Permissions CHAPTER 11 273
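Before the metadata discussion continues, here is a sketch of the three scope options just listed, again assuming a Test user in AdventureWorks; granting at the schema or database scope implicitly covers every contained object. Practice 1 at the end of this lesson exercises the same scopes:

-- Object scope: one table only.
GRANT SELECT ON OBJECT::Person.Address TO Test;
-- Schema scope: every object in the Person schema.
GRANT SELECT ON SCHEMA::Person TO Test;
-- Database scope: every schema, and therefore every object, in the database.
GRANT SELECT ON DATABASE::AdventureWorks TO Test;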
• 308. If you need to allow users to view metadata in a database, you can execute the following code: GRANT VIEW DEFINITION TO <user> If you grant VIEW ANY DEFINITION to a login, the login can view metadata for any object within the instance. The VIEW ANY DATABASE permission allows a login to see the existence of databases within the instance, even databases the login does not have access rights to. For a login to see execution statistics, such as those returned by sys.dm_exec_requests, you need to grant VIEW SERVER STATE to the login. Ownership Chains Each object within a database has an owner associated to it—the schema owner. You can also build objects that reference other objects within a database, such as stored procedures that call functions which issue SELECT statements against views that are based on tables. The owner of each object that is referenced in a calling stack forms an ownership chain as the code transits from one object to the next within the calling stack. So long as an object and every object that it references have the same owner, you have an intact ownership chain. SQL Server checks your permissions on an object at the top of the calling stack, as well as each time the object owner changes within a calling stack. By using ownership chains, stored procedures become your most powerful security mechanism within your database. Applications can be built to call stored procedures, which can accomplish all the data manipulation required by the application. However, users are never granted direct access to the underlying tables; therefore, the only actions that can be performed are the actions allowed by the stored procedure. BEST PRACTICES APPLICATION APIS It is interesting to note that many developers seem to argue about calling stored procedures in their applications. Instead, they want to embed SQL directly into the application. But none of the developers you work with would ever think of writing an application as just a bunch of embedded code. Rather, developers spend large amounts of time constructing objects that have interfaces and then building applications by connecting objects via their interfaces. This development style allows multiple developers to work on a complex application, even when dependent code has not been completed. Stored procedures perform the same function as the APIs that developers use within every application. A stored procedure is nothing more than an API to the database, which means that developers do not even need to know the structure of the database. If you have a calling stack with different object owners and the user has not been granted permission for each object within the calling stack where the owner changes, you have produced a broken ownership chain. It is a common misconception that a broken ownership chain represents a design flaw in your database. There are situations, such as auditing, where 274 CHAPTER 11 Designing SQL Server Security
• 309. you want to break the ownership chain deliberately to ensure that users cannot access any of the code used to audit actions or any of the audit data that is stored. However, to bridge the gap created by a broken ownership chain, you need to use signatures, which will be discussed at the end of this lesson. IMPORTANT OBJECT OWNER Although schemas contain all objects within a database, SQL Server considers the owner of the schema to be the owner of every object within the schema when determining ownership chains. Impersonation You can impersonate another principal to execute commands in a specific user context. To impersonate another principal, you must have the IMPERSONATE permission granted to your account on the principal that you want to impersonate. If IMPERSONATE permission is assigned on a login, you can impersonate the login and execute under that principal's authority in any database to which the principal has access. If IMPERSONATE permission is assigned on a database user, you can execute under the user's context only within that database. You accomplish impersonation by using the EXECUTE AS statement as follows: { EXEC | EXECUTE } AS <context_specification> <context_specification>::= { LOGIN | USER } = 'name' [ WITH { NO REVERT | COOKIE INTO @varbinary_variable } ] | CALLER EXAM TIP To create a schema owned by another database principal, the user creating the schema must have IMPERSONATE permission on the principal being designated as the schema owner. So long as you have not specified the NO REVERT clause for EXECUTE AS, you can return to the previous execution context by executing a REVERT. Master Keys Master keys provide the basis for the encryption hierarchy within SQL Server and are also required before you can create a certificate or asymmetric key. You have a single service master key for the entire instance along with a database master key within each database. Lesson 4: Managing Permissions CHAPTER 11 275
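Before moving into master keys, a minimal sketch of impersonation with EXECUTE AS and REVERT, assuming the Test user from the practices exists; the COOKIE form shown in the syntax above ensures that only a caller holding the cookie value can revert:

-- Switch to the Test user's context and verify it.
EXECUTE AS USER = 'Test';
SELECT SUSER_SNAME() AS login_context, USER_NAME() AS user_context;
REVERT;
GO
-- The COOKIE form: the cookie value is required to revert.
DECLARE @cookie varbinary(8000);
EXECUTE AS USER = 'Test' WITH COOKIE INTO @cookie;
SELECT USER_NAME();               -- Test
REVERT WITH COOKIE = @cookie;
SELECT USER_NAME();               -- back to the original context
GO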
• 310. Service Master Key Each instance of SQL Server has a service master key that is generated automatically the first time the instance is started. Service master keys are symmetric keys generated from the local machine key and encrypted by the Windows Data Protection API using the SQL Server service account credentials. The generation and encryption process ensures that the service master key can be decrypted only by the service account under which it was created or by a principal with access to the service account credentials. By default, the service master key is used to encrypt any database master key that is created within the instance. Database Master Key A database master key must be generated explicitly using the following command: CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword>' Each database has a different master key, ensuring that a user with access to decrypt data in one database cannot also decrypt data in another database without being granted permission to do so. The database master key is used to protect any certificates, symmetric keys, or asymmetric keys that are stored within a database. The database master key is encrypted using Triple DES and the user-supplied password. A copy of the database master key is also encrypted using the service master key such that automatic decryption can be accomplished within the instance. When you make a request to decrypt data, the service master key is used to decrypt the database master key, which is used to decrypt a certificate, symmetric key, or asymmetric key, which in turn is used to decrypt the data. The reason this hierarchy is important is that you must be careful when moving backups containing encrypted data between SQL Server instances. To restore and be able to decrypt data successfully, you must also back up the database master key and then restore the database master key on the other instance. To perform this process, you need to use the OPEN MASTER KEY, BACKUP MASTER KEY, RESTORE MASTER KEY, and CLOSE MASTER KEY commands. IMPORTANT MASTER KEYS You will learn more about data encryption in Lesson 6, “Encrypting Data.” However, a database master key is required to create a certificate that is the basis of a signature. Certificates Certificates are keys based on the X.509 standard that are used to authenticate the credentials of the entity supplying the certificate. You can create either public or private certificates. A public certificate is essentially a file that is supplied by a certificate authority that validates the entity using the certificate. Private certificates are generated by and used to protect data within an organization. For example, the public certificate used by your bank's Web site is used to prove that the bank's Web site is valid, as well as to encrypt the data transmitted between your browser and the bank's servers. 276 CHAPTER 11 Designing SQL Server Security
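Before looking at certificate creation, here is a sketch of the master key backup and restore sequence described above; the file paths and passwords are placeholders:

-- On the source instance: back up the database master key.
USE AdventureWorks;
GO
OPEN MASTER KEY DECRYPTION BY PASSWORD = '<SourceKeyPassword>';
BACKUP MASTER KEY TO FILE = 'C:\KeyBackups\aw_masterkey.key'
    ENCRYPTION BY PASSWORD = '<FilePassword>';
CLOSE MASTER KEY;
GO
-- On the destination instance, after restoring the database:
RESTORE MASTER KEY FROM FILE = 'C:\KeyBackups\aw_masterkey.key'
    DECRYPTION BY PASSWORD = '<FilePassword>'
    ENCRYPTION BY PASSWORD = '<NewKeyPassword>';
GO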
• 311. To create a self-signed certificate in SQL Server, you use the following command: CREATE CERTIFICATE certificate_name [ AUTHORIZATION user_name ] { FROM <existing_keys> | <generate_new_keys> } [ ACTIVE FOR BEGIN_DIALOG = { ON | OFF } ] <existing_keys> ::= ASSEMBLY assembly_name | { [ EXECUTABLE ] FILE = 'path_to_file' [ WITH PRIVATE KEY ( <private_key_options> ) ] } <generate_new_keys> ::= [ ENCRYPTION BY PASSWORD = 'password'] WITH SUBJECT = 'certificate_subject_name' [ , <date_options> [ ,...n ] ] <private_key_options> ::= FILE = 'path_to_private_key' [ , DECRYPTION BY PASSWORD = 'password' ] [ , ENCRYPTION BY PASSWORD = 'password' ] <date_options> ::= START_DATE = 'mm/dd/yyyy' | EXPIRY_DATE = 'mm/dd/yyyy' Signatures Signatures allow you to elevate a user's permissions, with the restriction that the elevation applies only while the user is executing a specific piece of code. You can add a digital signature to a module—stored procedures, functions, triggers, and assemblies—by using the ADD SIGNATURE command. The process to sign code digitally to manage permissions is as follows: 1. Create a database master key. 2. Create a certificate in the database. 3. Create a user mapped to the certificate. 4. Assign permissions on an object or objects to the user. 5. Execute ADD SIGNATURE on a module, signing it with the certificate. One of the most useful places to employ a signature is to bridge the gap in a broken ownership chain. For example, you could construct logic to audit user actions in the database and ensure that users cannot access the audit data directly by implementing a broken ownership chain. The code that logs the user activity could then be digitally signed, allowing the audit action to occur within the context of a user's transaction while still preventing the user from directly accessing the audit tables. Lesson 4: Managing Permissions CHAPTER 11 277
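A compact sketch of the five-step signing process just listed; every name here is hypothetical, and Practice 3 at the end of this lesson applies the same steps to a broken ownership chain:

USE AdventureWorks;
GO
-- 1. Database master key (skip if the database already has one).
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<EnterStrongPasswordHere>';
-- 2. Certificate to sign with.
CREATE CERTIFICATE SigningCert WITH SUBJECT = 'Module signing certificate';
-- 3. Loginless user mapped to the certificate.
CREATE USER SigningCertUser FROM CERTIFICATE SigningCert;
-- 4. Grant the elevated permissions to the certificate-mapped user.
GRANT SELECT ON Person.Address TO SigningCertUser;
-- 5. Sign the module; callers gain the certificate user's permissions only
--    while the module executes. dbo.GetAddresses is a hypothetical procedure.
ADD SIGNATURE TO dbo.GetAddresses BY CERTIFICATE SigningCert;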
• 312. Quick Check 1. How are principals, securables, and permissions related? 2. What is an ownership chain, and how can you have a broken ownership chain? Quick Check Answers 1. You can GRANT, REVOKE, or DENY permissions ON a securable TO a principal. 2. An ownership chain applies to objects that reference other objects within a database. The owner of the schema that contains the object is considered the owner of the object. SQL Server checks permissions for the first object that you access, as well as each time the owner changes within the calling stack. The chain of object owners within a calling stack is called an ownership chain. You have a broken ownership chain when the object owner changes within a calling stack and you have not been granted sufficient permissions to continue accessing objects within the call stack. PRACTICE Managing Permissions In the following practices, you view the effect of metadata security within a database as you grant permissions on objects at various scopes. You also investigate ownership chains and use signatures to allow access to an object through a stored procedure while not being able to access the same object directly. PRACTICE 1 Assigning Object Permissions In this practice, you assign object permissions at various scopes and view the effect on metadata security by using impersonation. 1. Execute the following code to verify your user context: SELECT SUSER_SNAME(), USER_NAME() GO 2. Change context to the AdventureWorks database and view the list of objects: USE AdventureWorks GO --View the list of objects in the database SELECT * FROM sys.objects GO 3. Impersonate the Test user and view the list of objects: EXECUTE AS USER = 'Test' GO SELECT SUSER_SNAME(), USER_NAME() GO 278 CHAPTER 11 Designing SQL Server Security
  • 313. SELECT * FROM sys.objects GO REVERT GO SELECT SUSER_SNAME(), USER_NAME() GO 4. Grant SELECT permission on the Production.Document table to the Test user and view the results: GRANT SELECT ON Production.Document TO Test GO EXECUTE AS USER = 'Test' GO SELECT * FROM sys.objects SELECT DocumentNode, Title, FileName FROM Production.Document REVERT GO 5. Grant SELECT permission on the Production schema and view the results: --Schema scoped permission GRANT SELECT ON SCHEMA::Production TO Test GO EXECUTE AS USER = 'Test' GO SELECT * FROM sys.objects REVERT GO 6. Grant SELECT permission on the entire AdventureWorks database and view the results. Notice that even though the user now has SELECT permission on the entire database, there are still objects that are visible only to the database owner: GRANT SELECT ON DATABASE::AdventureWorks TO Test GO EXECUTE AS USER = 'Test' GO SELECT * FROM sys.objects REVERT GO 7. Remove the ability to view object metadata and review the results: DENY VIEW DEFINITION TO Test GO EXECUTE AS USER = 'Test' GO Lesson 4: Managing Permissions CHAPTER 11 279
  • 314. SELECT * FROM sys.objects SELECT DocumentNode, Title, FileName FROM Production.Document REVERT GO 8. Restore the ability to view object metadata: REVOKE VIEW DEFINITION FROM Test GO EXECUTE AS USER = 'Test' GO SELECT * FROM sys.objects REVERT GO 9. Remove SELECT permission from the database. Notice that the user can still view the contents of the Production schema: REVOKE SELECT ON DATABASE::AdventureWorks FROM Test GO EXECUTE AS USER = 'Test' GO SELECT * FROM sys.objects REVERT GO 10. Remove SELECT permission on the schema. Notice that the user can still view the Production.Document table and objects directly associated to the table: REVOKE SELECT ON SCHEMA::Production FROM Test GO EXECUTE AS USER = 'Test' GO SELECT * FROM sys.objects REVERT GO 11. Remove SELECT permission on the table. Notice that you have finally removed the Test user’s access to the Production.Document table: REVOKE SELECT ON Production.Document FROM Test GO EXECUTE AS USER = 'Test' GO SELECT * FROM sys.objects REVERT GO 280 CHAPTER 11 Designing SQL Server Security
• 315. PRACTICE 2 Creating and Managing Master Keys In this practice, you create a database master key along with a database user based on a certificate. You also learn how to back up a certificate. 1. Create a master key in the AdventureWorks database: USE AdventureWorks GO CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<EnterStrongPasswordHere>' GO 2. Back up the database master key and store it in a secure location away from your database backups: OPEN MASTER KEY DECRYPTION BY PASSWORD = '<EnterStrongPasswordHere>' BACKUP MASTER KEY TO FILE = 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Backup\awmasterkey.key' ENCRYPTION BY PASSWORD = '<EnterStrongPasswordHere>' GO 3. Create a certificate: CREATE CERTIFICATE TestCert WITH SUBJECT = 'Test Certificate' GO 4. Back up the certificate and store it in a secure location away from your database backups: BACKUP CERTIFICATE TestCert TO FILE = 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Backup\testcert.cer' GO 5. Create a database user mapped to the certificate: CREATE USER CertUser FROM CERTIFICATE TestCert GO PRACTICE 3 Adding a Signature to a Module to Bridge an Ownership Chain In this practice, you purposely implement a broken ownership chain and then use signatures to access objects to which your account would not normally have access. 1. Create a schema and test objects that create a broken ownership chain: CREATE SCHEMA SignatureTest AUTHORIZATION Test2 GO CREATE TABLE SignatureTest.TestTable Lesson 4: Managing Permissions CHAPTER 11 281
• 316. (ID INT IDENTITY(1,1), Col1 VARCHAR(10) NOT NULL) GO INSERT INTO SignatureTest.TestTable (Col1) VALUES ('Row1'), ('Row2') GO --Create a procedure to access the test table CREATE PROCEDURE SignatureTest.asp_Proc1 AS SELECT ID, Col1 FROM SignatureTest.TestTable GO --Create a stored procedure call stack CREATE PROCEDURE dbo.asp_SignatureTest AS EXEC SignatureTest.asp_Proc1 GO --Grant execute permissions on the outer stored procedure GRANT EXECUTE ON dbo.asp_SignatureTest TO Test GO 2. Test the stored procedure: EXECUTE AS USER = 'Test' EXEC dbo.asp_SignatureTest REVERT GO 3. Grant execute permissions on the inner stored procedure to the certificate-mapped user and add a digital signature to the outer procedure: GRANT EXECUTE ON SignatureTest.asp_Proc1 TO CertUser GO --Sign the procedure with the certificate ADD SIGNATURE TO dbo.asp_SignatureTest BY CERTIFICATE TestCert GO 4. Test the procedure execution: EXECUTE AS USER = 'Test' EXEC dbo.asp_SignatureTest REVERT GO 5. Verify the user cannot execute the inner stored procedure directly: EXECUTE AS USER = 'Test' EXEC SignatureTest.asp_Proc1 REVERT GO 282 CHAPTER 11 Designing SQL Server Security
• 317. 6. Verify the user cannot access the table directly: EXECUTE AS USER = 'Test' SELECT ID, Col1 FROM SignatureTest.TestTable REVERT GO 7. Verify that you cannot impersonate the user mapped to the certificate: EXECUTE AS USER = 'CertUser' GO Lesson Summary You GRANT permissions ON a securable TO a principal. An instance, a database, and a schema are all securables. Assigning a permission at a database or schema scope applies to all objects contained within the database or schema. All metadata within SQL Server is secured. If you have not been granted permission on an object, you do not even see the object. You can impersonate a login or database user with the EXECUTE AS statement. You cannot impersonate a principal that has been mapped to a certificate or asymmetric key. A service master key is created when the instance is first started. A database master key must be created explicitly within each database and is required to create a certificate, an asymmetric key, or a symmetric key. Digital signatures can be applied to a code module through the ADD SIGNATURE statement to provide a means to escalate permissions only when you execute a specified module without allowing direct access to the underlying objects. Lesson Review The following questions are intended to reinforce key information presented in Lesson 4, “Managing Permissions.” The questions are also available on the companion CD if you prefer to review them in electronic form. NOTE ANSWERS Answers to these questions and explanations of why each answer choice is right or wrong are located in the “Answers” section at the end of the book. Lesson 4: Managing Permissions CHAPTER 11 283
  • 318. 1. Wide World Importers has just implemented a new order inquiry system. All users with access to the database need to be able to issue a SELECT statement against any table within the database. How can you accomplish this functionality with the least amount of effort? A. Add the users to the db_datawriter database role. B. Grant the users SELECT permission on every table in the database. C. Grant the users SELECT permission on the database. D. Grant the users SELECT permission on every schema in the database. 2. Which statement prevents users from viewing metadata about objects in a single database, even if the user has access to the objects? A. DENY VIEW DEFINITION B. DENY VIEW ANY DEFINITION C. DENY VIEW SERVER STATE D. REVOKE VIEW DEFINITION 284 CHAPTER 11 Designing SQL Server Security
• 319. Lesson 5: Auditing SQL Server Instances After granting the minimum permissions required for the completion of a task, you will deal with the second security principle—“Trust, but verify.” In this lesson, you learn about the auditing capabilities available within SQL Server 2008. After this lesson, you will be able to: Create DDL triggers Configure instance and database audit specifications Implement C2 auditing Estimated lesson time: 30 minutes DDL Triggers In addition to CREATE, DROP, and ALTER actions, DDL triggers allow you to trap and respond to login events. You can scope DDL triggers at either an instance or a database level. The generic syntax for creating a DDL trigger is as follows: CREATE TRIGGER trigger_name ON { ALL SERVER | DATABASE } [ WITH <ddl_trigger_option> [ ,...n ] ] { FOR | AFTER } { event_type | event_group } [ ,...n ] AS { sql_statement [ ; ] [ ,...n ] | EXTERNAL NAME < method specifier > [ ; ] } Trigger on a LOGON event (Logon Trigger) CREATE TRIGGER trigger_name ON ALL SERVER [ WITH <logon_trigger_option> [ ,...n ] ] { FOR | AFTER } LOGON AS { sql_statement [ ; ] [ ,...n ] | EXTERNAL NAME < method specifier > [ ; ] } You use the ON clause to scope a trigger as either instance-level (ON ALL SERVER) or database-level (ON DATABASE). You specify the DDL event or event group that the trigger fires upon within the FOR clause. DDL triggers fire within the context of the DDL statement being executed. In addition to obtaining information about the command that was executed, DDL triggers allow you to prevent many DDL actions. If you execute a ROLLBACK TRANSACTION within the DDL trigger, the DDL statement that was executed rolls back because almost every DDL statement is transactional and automatically executes within the context of a transaction. Not all DDL statements execute within the context of a transaction. ALTER DATABASE can make changes to the database, but it can also make changes to the file structure underneath Lesson 5: Auditing SQL Server Instances CHAPTER 11 285
• 320. the database. Because the Windows operating system is not transactional, you cannot roll back an action against the file system. To provide consistent behavior, the ALTER DATABASE command executes outside the scope of a transaction. You can still fire a DDL trigger ON ALTER DATABASE; however, the trigger is only able to audit, not prevent. EXAM TIP An important feature of DDL triggers is the ability to roll back an action. The Policy-Based Management Framework creates DDL triggers for all policies that you configure to prevent an out-of-compliance situation. SQL Server provides a grouping mechanism for all DDL events within an instance. You could create a DDL trigger to fire for the CREATE, DROP, or ALTER of a table or you could specify the corresponding event group—DDL_TABLE_EVENTS. MORE INFO EVENT GROUPS For more information about event groups, please refer to the Books Online article “DDL Event Groups” at http://technet.microsoft.com/en-us/library/bb510452.aspx. Within the execution context of the DDL trigger, you have access to a special function, EVENTDATA(), that provides information about the DDL action. EVENTDATA() returns an Extensible Markup Language (XML) document with a structure that depends upon the event. MORE INFO EVENTDATA() SCHEMAS The XML schema available for an event is documented at http://schemas.microsoft.com/sqlserver/2006/11/eventdata. Audit Specifications Prior to SQL Server 2008, you had to use multiple features to perform the full array of auditing for an instance. DDL triggers would audit DDL changes; data manipulation language (DML) triggers would audit data changes at the cost of increasing transaction times; SQL Trace would audit SELECT statements. SQL Server 2008 combines all the auditing capabilities into an audit specification. Audit specifications begin with a server-level audit object that defines the logging location for the audit trail. You then create server and database audit specifications tied to the audit object. The general syntax for creating a server audit object is: CREATE SERVER AUDIT audit_name TO { [ FILE (<file_options> [, ...n]) ] | APPLICATION_LOG | SECURITY_LOG } [ WITH ( <audit_options> [, ...n] ) ] }[ ; ] 286 CHAPTER 11 Designing SQL Server Security
• 321. <file_options>::= {FILEPATH = 'os_file_path' [, MAXSIZE = { max_size { MB | GB | TB } | UNLIMITED } ] [, MAX_ROLLOVER_FILES = integer ] [, RESERVE_DISK_SPACE = { ON | OFF } ] } <audit_options>::= { [ QUEUE_DELAY = integer ] [, ON_FAILURE = { CONTINUE | SHUTDOWN } ] [, AUDIT_GUID = uniqueidentifier ]} If you specify a file to log an audit trail to, you can specify the maximum size of a single audit file, as well as how many rollover files should be retained on the operating system. In addition, you can preallocate disk space for the audit log instead of having the file grow as audit rows are added. Logging of messages occurs either synchronously or asynchronously. When QUEUE_DELAY = 0, audit records are sent to the audit log synchronously with the transaction. If you specify a delay time (in milliseconds), audit records can be accumulated, but they still must be written within the specified interval. The ON_FAILURE action controls how the instance behaves if audit records cannot be written. The default option is CONTINUE, which allows the instance to continue running and processing transactions. If you specify a value of SHUTDOWN and the audit record cannot be written to the log within the specified QUEUE_DELAY interval, the instance is shut down. After a server audit object has been established, you can add one or more specifications to the audit. If you want to audit actions that occur at an instance level, you create a server audit specification with the following general syntax: CREATE SERVER AUDIT SPECIFICATION audit_specification_name FOR SERVER AUDIT audit_name { { ADD ( { audit_action_group_name } ) } [, ...n] [ WITH ( STATE = { ON | OFF } ) ]}[ ; ] If you want to audit events specific to a database, you create a database audit specification with the following general syntax: CREATE DATABASE AUDIT SPECIFICATION audit_specification_name { [ FOR SERVER AUDIT audit_name ] [ { ADD ( { <audit_action_specification> | audit_action_group_name } ) } [, ...n] ] [ WITH ( STATE = { ON | OFF } ) ]}[ ; ] <audit_action_specification>::= {action [ ,...n ]ON [ class :: ] securable BY principal [ ,...n ]} Lesson 5: Auditing SQL Server Instances CHAPTER 11 287
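A sketch tying the audit object and a specification together; the audit name, file path, and action group are illustrative choices, not requirements:

USE master;
GO
CREATE SERVER AUDIT ComplianceAudit
    TO FILE (FILEPATH = 'C:\AuditLogs\',
             MAXSIZE = 100 MB,
             MAX_ROLLOVER_FILES = 10)
    WITH (QUEUE_DELAY = 1000, ON_FAILURE = CONTINUE);
GO
-- Instance-level specification: record failed login attempts.
CREATE SERVER AUDIT SPECIFICATION FailedLoginSpec
    FOR SERVER AUDIT ComplianceAudit
    ADD (FAILED_LOGIN_GROUP)
    WITH (STATE = ON);
GO
-- Audits are created disabled; enable the audit object to start logging.
ALTER SERVER AUDIT ComplianceAudit WITH (STATE = ON);
GO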
• 322. MORE INFO AUDIT EVENTS For a list of the event classes and event groups that can be audited, please refer to the SQL Server Books Online article, “SQL Server Audit Action Groups and Actions,” at http://technet.microsoft.com/en-us/library/cc280663.aspx. C2 Auditing C2 auditing is a U.S. Department of Defense audit specification that can be enabled by executing the following code: sp_configure 'show advanced options', 1 RECONFIGURE sp_configure 'c2 audit mode', 1 RECONFIGURE C2 auditing has been superseded by the Common Criteria specification developed by the European Union. Whether you are complying with C2 or Common Criteria, with respect to SQL Server, the audit result is essentially the same. You need to audit every successful and unsuccessful attempt to access a database object. When C2 auditing is enabled, an audit log file is written to the default data directory with a rollover size of 200 megabytes (MB). SQL Server continues to generate rollover files until you run out of disk space, thereby causing the instance to shut down. With C2 auditing enabled, the audit records are required to be written. If the system is too busy, user requests are aborted to free up resources to write the audit trail. CAUTION AUDITING IMPACT You must be very careful when implementing C2 auditing. Be sure to check that a lower level of auditing does not meet your requirements. When you enable C2 auditing, you have made the decision that the audit is more important than a transaction. If the system becomes too busy, SQL Server aborts a user transaction to write audit information. Quick Check 1. Which object can be used to audit as well as prevent most object changes? 2. Which object is required before you can create a server or database audit specification? Quick Check Answers 1. DDL triggers can audit any DDL command. If the DDL command executes within a transaction, a DDL trigger can be used to roll back the DDL and prevent the change from occurring. 2. You must create a server audit object before a server or database audit specification can be created. 288 CHAPTER 11 Designing SQL Server Security
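As the Quick Check notes, a DDL trigger can roll back a transactional DDL statement. A minimal sketch of a database-scoped trigger that blocks table drops and reports the offending command via EVENTDATA(); the trigger name is hypothetical:

CREATE TRIGGER tr_BlockTableDrops
ON DATABASE
FOR DROP_TABLE
AS
BEGIN
    -- Pull the command text out of the EVENTDATA() XML document.
    DECLARE @cmd nvarchar(max) =
        EVENTDATA().value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]',
                          'nvarchar(max)');
    RAISERROR('Table drops are blocked in this database: %s', 16, 1, @cmd);
    ROLLBACK TRANSACTION;  -- undoes the DROP TABLE
END;
GO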
• 323. PRACTICE Creating a Database Audit Specification In this practice, you create a database audit specification to audit a SELECT, INSERT, UPDATE, or DELETE statement that a user with db_owner rights executes against the confidential data contained in the payroll history table. 1. Execute the following code to create the server audit object: USE MASTER GO CREATE SERVER AUDIT RestrictedAccessAudit TO APPLICATION_LOG WITH ( QUEUE_DELAY = 1000, ON_FAILURE = CONTINUE); GO 2. Execute the following code to create the database audit specification: USE AdventureWorks GO CREATE DATABASE AUDIT SPECIFICATION EmployeePayrollAccess FOR SERVER AUDIT RestrictedAccessAudit ADD (SELECT, INSERT, UPDATE, DELETE ON HumanResources.EmployeePayHistory BY dbo) WITH (STATE = ON); GO 3. Execute the following code to enable the audit: USE MASTER GO ALTER SERVER AUDIT RestrictedAccessAudit WITH (STATE = ON); GO 4. Expand the Security node in Object Explorer and review the server audit object named RestrictedAccessAudit underneath the Audits node. 5. Expand the Security node and then the Database Audit Specifications node in Object Explorer underneath the AdventureWorks database and review the properties of the database audit specification named EmployeePayrollAccess that you created. 6. Execute the following code to test the database audit: USE AdventureWorks GO SELECT * FROM HumanResources.EmployeePayHistory GO 7. Right-click the server audit object and select View Audit Logs to review the results of the audit, as shown here. Lesson 5: Auditing SQL Server Instances CHAPTER 11 289
• 324. 8. Disable the server audit by executing the following code: USE MASTER GO ALTER SERVER AUDIT RestrictedAccessAudit WITH (STATE = OFF); GO 9. Review the audit log and note that disabling the audit has also been logged. Lesson Summary DDL triggers can be created to fire when specific DDL events or events within a group are executed. If the DDL event executes within the context of a transaction, you can use a DDL trigger to prevent the action from occurring. CREATE SERVER AUDIT creates an instance of an audit object. After you create an audit object, you can hook server and database audit specifications to the audit object in order to centrally manage auditing. 290 CHAPTER 11 Designing SQL Server Security
• 325. Lesson Review The following questions are intended to reinforce key information presented in Lesson 5, “Auditing SQL Server Instances.” The questions are also available on the companion CD if you prefer to review them in electronic form. NOTE ANSWERS Answers to these questions and explanations of why each answer choice is right or wrong are located in the “Answers” section at the end of the book. 1. The Human Resources (HR) director at Contoso needs to ensure that only authorized users are accessing employee pay records. What do you need to implement to satisfy these auditing needs? A. Database audit specification B. A DDL trigger C. A DML trigger D. Server audit specification 2. The database administrators at Fabrikam have implemented log shipping for the Orders database. To ensure that log shipping cannot break, you need to prevent anyone from changing the recovery model of the database to Simple. How can you accomplish this task? A. A DDL trigger. B. A DML trigger. C. You can’t prevent the change of the recovery model. D. Server audit specification. Lesson 5: Auditing SQL Server Instances CHAPTER 11 291
• 326. Lesson 6: Encrypting Data Data that must remain confidential, even from a user that has SELECT permission on a table, should be encrypted. In this lesson you learn about the encryption infrastructure provided by SQL Server 2008 and how to apply encryption to your data. After this lesson, you will be able to: Encrypt data using a hash algorithm Encrypt data using symmetric keys Encrypt data using asymmetric keys or certificates Enable transparent database encryption Estimated lesson time: 30 minutes Data Encryption Data that needs to remain confidential within the database (such as credit card numbers) should be encrypted. After it's encrypted, the data cannot be read without having the proper credentials. In addition, encrypted columns cannot be used as search arguments or as columns within an index because each action would defeat the purpose of encrypting the data. Columns can be encrypted using a hash, passphrase, symmetric key, asymmetric key, or a certificate. Symmetric keys are commonly used because a symmetric key provides the best balance between securing data and performance. Asymmetric keys and certificates provide the strongest encryption and decryption method. Preventing Access to Objects and Data Securing a database is not an exercise in guaranteeing that objects can't be accessed. If an asset is valuable enough and enough time is available, an attacker can always get to it. Security is an exercise in making a system more difficult to break into than the value of the reward that would be gained in the attempt. In addition, administrators have full control over a system for a reason: to provide the authority to manage the system. Permissions are not checked for a user with administrative access, so you can't restrict the actions an administrator can take. An administrator has access to the internal structures necessary to retrieve any encryption keys that may be in use. Therefore, it is impossible to prevent access from an administrator. SQL Server has a powerful, multilayered security infrastructure; it is not a digital rights management system. 292 CHAPTER 11 Designing SQL Server Security
• 327. Securing objects also reduces the functionality available for those objects. In particular, an encrypted column can't be indexed, and you can't search on the contents of an encrypted column. One of the dumbest articles I've found published explained how you could search on an encrypted column. The solution to the problem involved placing a column in the table that contained the unencrypted data so that you could search on the column while also retaining the encrypted column. If you are going to store the data in an unencrypted format, it is pointless to encrypt the data as well, especially within the same table. In addition, as soon as you allow a user to search on encrypted data, you have just enabled an attacker to use very simple dictionary attacks to reverse-engineer your encrypted information. Hash Algorithms Encryption algorithms are either one-way or two-way. Two-way algorithms allow you to encrypt and decrypt data. One-way algorithms only encrypt data, without any ability to decrypt. A hash algorithm is a one-way algorithm that allows you to encrypt data but does not allow decryption. IMPORTANT TRANSMITTING AND STORING PASSWORDS It is a common misconception that passwords are sent to SQL Server in plaintext and that SQL Server decrypts the password stored to verify if the submitted password matches. SQL Server uses an MD5 hash to handle passwords. When a password is specified for an object, SQL Server applies an MD5 hash and stores the hash value. When you specify a password to access an object, the password is hashed using the same MD5 hash, the hashed password is transmitted in a secure channel, and the hash value transmitted is compared to the hash value stored. Even an administrator who is running a trace cannot access the password. SQL Server allows you to specify five different hash algorithms—SHA, SHA1, MD2, MD4, and MD5. MD5 is the algorithm of choice because it provides stronger encryption than the other algorithms. Hash algorithms are also platform-agnostic. You could hash a value within PHP on a Linux system and receive the same value as if you hashed the same value within SQL Server, so long as you used the same algorithm. Hash algorithms are vulnerable to brute force attacks. If the range of values that you are seeking to encrypt is small, an attacker can easily generate all the possible hashes for the range of possible values. After generating these hashes, the attacker needs to compare the hash values to find a match and thus reverse-engineer your data. For example, birth dates, salaries, and credit card numbers would not be good choices to encrypt using a hash algorithm. Lesson 6: Encrypting Data CHAPTER 11 293
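Salting, discussed next, is the standard defense against this weakness. As a preview, a sketch contrasting an unsalted and a salted hash of a small-domain value; the salt character here is an arbitrary choice:

DECLARE @salary varchar(10) = '50000';
-- Unsalted: an attacker can precompute hashes for every plausible salary.
SELECT HashBytes('MD5', @salary);
-- Salted: the attacker must now also guess the salt and its position.
SELECT HashBytes('MD5', 'k' + @salary);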
  • 328. Salting a Hash So long as the range of possible values is small, a hash algorithm is very easy to defeat with a brute force attack. However, you can increase the complexity for an attacker dramatically by implementing an encryption technique called salting. A salt is a string of one or more characters that are added to the value before hashing. Even adding a single character can defeat most attacks, so long as you are adding a salt that actually increases the complexity. For example, you could append a zero to the end of a salary, but you would not have increased the complexity nor made it more difficult for an attacker to break with a brute force attack. However, if you were to use a single letter as the salt value, you have made the brute force attack significantly more difficult. Even if the attacker knew you were adding only a single letter, the English language provides 52 additional possibilities (uppercase and lowercase letters). In addition, you could have added the character at either the beginning or end, turning a simple problem into 104 different possibilities for each possible salary value. If you account for the fact that you could have inserted the letter anywhere within the salary, the range of possibilities for each salary value would require more effort than just about any attacker would be willing to expend. Symmetric Keys Symmetric keys utilize a single key for both encryption and decryption. Because only a single key is needed to encrypt and decrypt data, symmetric key encryption is not as strong as asymmetric key or certificate-based encryption. However, symmetric keys provide the best possible performance for routine use of encrypted data. Certificates and Asymmetric Keys Certificates and asymmetric keys are based on the X.509 standard and are essentially equivalent in their application. Asymmetric keys are generated by a key server within an organization and cannot be backed up or moved from one system to another. Certificates can be backed up to and restored from a file, allowing you to move databases that are encrypted while being able to re-create the certificate to access your data. Transparent Data Encryption Passphrases and encryption keys can be used by an application to encrypt data deliberately. However, to use the data, you must apply special routines to decrypt data. Although encrypting selective data is possible and manageable, encrypting the entire contents of a database is generally prohibitive. Unless the data is encrypted, an attacker can read data directly from the database files on disk. Although the information is not easy to read, data is stored on disk in a plaintext format that can be viewed within any text editor. To prevent the theft of data as it resides on disk or within a backup, SQL Server 2008 introduces Transparent Data Encryption (TDE). TDE provides real-time encryption and 294 CHAPTER 11 Designing SQL Server Security
• 329. decryption services to ensure that data within the files and backups is encrypted. In addition, SQL Server transparently encrypts and decrypts the data so that applications do not have to be recoded to take advantage of the encryption. REAL WORLD Michael Hotek We've all seen the news articles over the last several years regarding data thefts from a variety of organizations. As is human nature, we all assume that such a theft could not possibly happen to us. I was recently working with a major bank that was struggling with the implementation of increased levels of security. They had encrypted sensitive data within columns. However, under increasing regulatory scrutiny, a third-party audit identified the backups as a weak point in the security implementation, even though sensitive data was encrypted. The database still contained confidential data and the table structures also provided useful information for an attacker. The business decided that the entire contents of the backup needed to be encrypted. After a three-month evaluation period, they had narrowed the list down to four vendors who had solutions to meet their needs. Unfortunately, only one vendor could encrypt the data before it was written to disk, leaving the backup still vulnerable. In addition, the company was looking at a very large software expenditure to purchase the necessary licenses. Because the company's SQL Server licenses were under software assurance, we upgraded the databases to SQL Server 2008 over the weekend and implemented transparent data encryption. Not only are the backups now encrypted, but the data and log files are as well. The company saved a significant amount of money in the process. Three weeks after implementation, a backup tape went missing. The tape was eventually found and determined to simply have been mislabeled. However, had an actual data theft occurred, the company would still have been protected. TDE works by using an encryption key stored within the database boot record. The TDE key is encrypted by using a certificate within the master database. In the event that an attacker steals your data or log files, or more likely a backup of your database, the contents of the database can't be accessed without the certificate stored within the master database. The process of implementing TDE on a database is as follows: 1. Create a database master key in the master database. 2. Create a certificate in the master database. Lesson 6: Encrypting Data CHAPTER 11 295
• 330. 3. Create a database encryption key in the target database using the certificate in the master database. 4. Alter the database and enable encryption. EXAM TIP You must back up the certificate used for TDE and store the backup in a safe location. After you encrypt the database, you cannot access your data without the certificate. Encryption Key Management Although SQL Server provides a variety of encryption methods, each method must be enabled and managed within an instance. SQL Server 2008 provides the capability through Extensible Key Management (EKM) to integrate with enterprise key management systems. Keys can be maintained in a central location within an enterprise and exported for use within SQL Server. By registering a key management provider to SQL Server, an instance can take advantage of all the advanced features of hardware and software key management solutions, such as key rotation. Quick Check 1. What object is required to implement TDE? 2. What do you need to do to a hash algorithm to increase the complexity when the range of possible encryption values is small? Quick Check Answers 1. You must create a certificate in the master database that is used to encrypt the database encryption key. 2. If the range of possible values to encrypt is small, you need to salt the hash value in order to defeat brute force attacks. PRACTICE Encrypting Data In the following practices, you apply multiple forms of encryption keys to encrypt data for an application, as well as apply TDE to the AdventureWorks database. PRACTICE 1 Hashing Data In this practice, you compare hash algorithms for encrypting data. 1. Execute the following code and compare the results for each hash algorithm: DECLARE @Hash varchar(100) SELECT @Hash = 'Encrypted Text' 296 CHAPTER 11 Designing SQL Server Security
• 331. SELECT HashBytes('MD5', @Hash) SELECT @Hash = 'Encrypted Text' SELECT HashBytes('SHA', @Hash) 2. Execute the following code and note that the hash algorithm is case-sensitive: DECLARE @Hash varchar(100) SELECT @Hash = 'encrypted text' SELECT HashBytes('SHA1', @Hash) SELECT @Hash = 'ENCRYPTED TEXT' SELECT HashBytes('SHA1', @Hash) PRACTICE 2 Encrypting Data with a Passphrase In this practice, you use a passphrase to encrypt data. 1. Execute the following code and compare the results of the passphrase encryption: DECLARE @EncryptedText VARBINARY(80) SELECT @EncryptedText = EncryptByPassphrase('<EnterStrongPasswordHere>','Encrypted Text') SELECT @EncryptedText, CAST(DecryptByPassPhrase('<EnterStrongPasswordHere>',@EncryptedText) AS VARCHAR(MAX)) PRACTICE 3 Encrypting Data with a Symmetric Key In this practice, you create a symmetric key to encrypt data. 1. Execute the following code in the AdventureWorks database to create a symmetric key: CREATE SYMMETRIC KEY TestSymmetricKey WITH ALGORITHM = RC4 ENCRYPTION BY PASSWORD = '<EnterStrongPasswordHere>' SELECT * FROM sys.symmetric_keys 2. Execute the following code to open the symmetric key: OPEN SYMMETRIC KEY TestSymmetricKey DECRYPTION BY PASSWORD = '<EnterStrongPasswordHere>' 3. Execute the following code to view the data encrypted with the symmetric key: DECLARE @EncryptedText VARBINARY(80) SELECT @EncryptedText = EncryptByKey(Key_GUID('TestSymmetricKey'),'Encrypted Text') SELECT @EncryptedText, CAST(DecryptByKey(@EncryptedText) AS VARCHAR(30)) 4. Execute the following code to close the symmetric key: CLOSE SYMMETRIC KEY TestSymmetricKey GO Lesson 6: Encrypting Data CHAPTER 11 297
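Symmetric keys can also be protected by a certificate rather than a password; a sketch, assuming the TestCert certificate created in Lesson 4, Practice 2, still exists in AdventureWorks:

CREATE SYMMETRIC KEY CertProtectedKey
    WITH ALGORITHM = AES_256
    ENCRYPTION BY CERTIFICATE TestCert;
GO
OPEN SYMMETRIC KEY CertProtectedKey DECRYPTION BY CERTIFICATE TestCert;
-- sys.openkeys lists the keys open in the current session.
SELECT * FROM sys.openkeys;
CLOSE SYMMETRIC KEY CertProtectedKey;
GO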
• 332. PRACTICE 4 Encrypting Data with a Certificate In this practice, you create and use a certificate to encrypt data so that users cannot view data they do not have permission to access. 1. Execute the following code to create a test table, two users, and permissions: CREATE TABLE dbo.CertificateEncryption (ID INT IDENTITY(1,1), SalesRep VARCHAR(30) NOT NULL, SalesLead VARBINARY(500) NOT NULL) GO CREATE USER SalesRep1 WITHOUT LOGIN GO CREATE USER SalesRep2 WITHOUT LOGIN GO GRANT SELECT, INSERT ON dbo.CertificateEncryption TO SalesRep1 GRANT SELECT, INSERT ON dbo.CertificateEncryption TO SalesRep2 GO 2. Create a certificate for each user as follows: CREATE CERTIFICATE SalesRep1Cert AUTHORIZATION SalesRep1 WITH SUBJECT = 'SalesRep 1 certificate' GO CREATE CERTIFICATE SalesRep2Cert AUTHORIZATION SalesRep2 WITH SUBJECT = 'SalesRep 2 certificate' GO SELECT * FROM sys.certificates GO 3. Insert data for each user as follows: EXECUTE AS USER='SalesRep1' GO INSERT INTO dbo.CertificateEncryption (SalesRep, SalesLead) VALUES('SalesRep1',EncryptByCert(Cert_ID('SalesRep1Cert'), 'Fabrikam')) REVERT GO EXECUTE AS USER='SalesRep2' GO INSERT INTO dbo.CertificateEncryption 298 CHAPTER 11 Designing SQL Server Security
• 333. (SalesRep, SalesLead) VALUES('SalesRep2',EncryptByCert(Cert_ID('SalesRep2Cert'), 'Contoso')) REVERT GO 4. Review the contents of the table, as well as for each user, as follows: SELECT ID, SalesRep, SalesLead FROM dbo.CertificateEncryption GO EXECUTE AS USER='SalesRep1' GO SELECT ID, SalesRep, SalesLead, CAST(DecryptByCert(Cert_Id('SalesRep1Cert'), SalesLead) AS VARCHAR(MAX)) FROM dbo.CertificateEncryption REVERT GO EXECUTE AS USER='SalesRep2' GO SELECT ID, SalesRep, SalesLead, CAST(DecryptByCert(Cert_Id('SalesRep2Cert'), SalesLead) AS VARCHAR(MAX)) FROM dbo.CertificateEncryption REVERT GO PRACTICE 5 Implementing TDE In this practice, you implement TDE for the AdventureWorks database. 1. Create a master key and certificate in the master database as follows: USE master GO CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<EnterStrongPasswordHere>' GO CREATE CERTIFICATE ServerCert WITH SUBJECT = 'My Server Cert for TDE' GO 2. Back up the certificate and private key to a file to ensure recoverability as follows: BACKUP CERTIFICATE ServerCert TO FILE = 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Backup\servercert.cer' WITH PRIVATE KEY (FILE = 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Backup\servercert.key', ENCRYPTION BY PASSWORD = '<EnterStrongPasswordHere>') Lesson 6: Encrypting Data CHAPTER 11 299
• 334. 3. Create a database encryption key for the AdventureWorks database as follows: USE AdventureWorks GO CREATE DATABASE ENCRYPTION KEY WITH ALGORITHM = AES_128 ENCRYPTION BY SERVER CERTIFICATE ServerCert GO 4. Enable encryption for the AdventureWorks database: ALTER DATABASE AdventureWorks SET ENCRYPTION ON GO Lesson Summary Data can be encrypted within tables using a hash algorithm, a passphrase, a symmetric key, an asymmetric key, or a certificate. A hash algorithm should be used with a salt value unless the range of values being encrypted is large enough to defeat a brute force attack. TDE is used to encrypt “data at rest.” The contents of the data and transaction log, along with any backups, are encrypted by the engine. If you implement TDE, make certain that you have a backup of the certificate along with the private key; otherwise, you will not be able to restore a backup. Lesson Review The following questions are intended to reinforce key information presented in Lesson 6, “Encrypting Data.” The questions are also available on the companion CD if you prefer to review them in electronic form. NOTE ANSWERS Answers to these questions and explanations of why each answer choice is right or wrong are located in the “Answers” section at the end of the book. 1. The DBAs at Woodgrove Bank manage several sensitive databases containing credit card and customer information. They need to encrypt the entire contents of the database so that an attacker cannot read information off the disk. How can they meet their requirement with the least amount of effort? A. Create a certificate in the database that is used to encrypt the data. B. Create a database encryption key and enable the database for encryption. 300 CHAPTER 11 Designing SQL Server Security
  • 335. C. Create a symmetric key in the database that is used to encrypt the data. D. Create an asymmetric key in the database that is used to encrypt the data. 2. The DBAs at Woodgrove Bank manage several sensitive databases containing credit card and customer information. Due to recent data thefts at other banks that have made headlines, the business wants to ensure that all data within backups is encrypted. How can they accomplish the encryption requirement without needing to change applications? A. Create a certificate in the database that is used to encrypt the data. B. Create a symmetric key in the database that is used to encrypt the data. C. Create a database encryption key and enable the database for encryption. D. Create an asymmetric key in the database that is used to encrypt the data. Lesson 6: Encrypting Data CHAPTER 11 301
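A closing sketch for this lesson: after enabling TDE as in Practice 5, you can watch the background encryption scan and confirm completion; note that tempdb is also encrypted once any database on the instance uses TDE:

SELECT DB_NAME(database_id) AS database_name,
       encryption_state,          -- 2 = encryption in progress, 3 = encrypted
       percent_complete
FROM sys.dm_database_encryption_keys;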
  • 336. Chapter Review To practice and reinforce the skills you learned in this chapter further, you can perform the following tasks: Review the chapter summary. Review the list of key terms introduced in this chapter. Complete the case scenario. This scenario sets up a real-world situation involving the topics of this chapter and asks you to create solutions. Complete the suggested practices. Take a practice test. Chapter Summary Endpoints provide the first layer of security within SQL Server. By providing a barrier that is very similar to a firewall, endpoints ensure that only valid connections with valid traffic can gain access to your SQL Server instance. Endpoints can be created for either TCP or HTTP protocols. TCP endpoints can have payloads for TSQL, DATABASE_MIRRORING, or SERVICE_BROKER. HTTP endpoints can have a payload of SOAP. HTTP endpoints enable stored procedures and functions to be exposed and consumed as a Web service; in effect, enabling your SQL Server to act as a registered Web service. Key Terms Do you know what these key terms mean? You can check your answers by looking up the terms in the glossary at the end of the book. Asymmetric key Certificate Database audit specification Database master key DDL trigger Fixed database role Fixed server role Hash algorithm Impersonation Loginless user Ownership chain Principal 302 CHAPTER 11 Designing SQL Server Security
  • 337. Salting Securable Server audit Server audit specification Service master key Signature Symmetric key TCP endpoint Case Scenario: Designing SQL Server Security In the following case scenario, you apply what you’ve learned in this chapter. You can find answers to these questions in the “Answers” section at the end of this book. Case Scenario: Securing Coho Vineyard BACKGROUND Company Overview Coho Vineyard was founded in 1965 as a local, family-run winery. Due to the award-winning wines produced over the last several decades, Coho Vineyards has experienced significant growth. Today, the company owns 12 wineries spread across California and Washington State. Coho employs 400 people, 74 of whom work in the central office that houses servers critical to the business operations. Planned Changes In 2008, Coho Vineyards finally integrated the operations of all 12 vineyards, providing a centralized database platform that is accessed from a variety of Web-based applications. An audit has determined that Coho Vineyards essentially does not have any security implemented within their databases, relying instead on the applications having complete access via the sa account. The sa account is to be used only in an emergency by members of the DBA team. At all other times, the DBAs are expected to use their own Windows credentials when accessing an instance. The tables containing customer information (especially credit card numbers) are required to be encrypted to pass an upcoming Payment Card Industry (PCI) audit. All actions performed by a sysadmin or database owner are required to be audited and logged to the Windows Application Event Log. EXISTING DATA ENVIRONMENT Databases Table 11-4 shows the databases at Coho Vineyards. Chapter Review CHAPTER 11 303
• 338. TABLE 11-4 Coho Vineyard Databases

DATABASE        SIZE

Customer        180 MB

Accounting      500 MB

HR              100 MB

Inventory       250 MB

Promotions      80 MB

Database Servers A single server named DB1 contains all the databases at the central office. DB1 is running SQL Server 2008 Enterprise edition on Windows Server 2003 Enterprise edition. Business Requirements GENERAL REQUIREMENTS The Web-based applications need to be able to read and write data to the corresponding database. The HR database needs to be restricted to the members of the HR department, and any access to the HR database from an administrative user must be audited. Technical Requirements SECURITY All traffic to and from DB1 must be encrypted. The SQL Server configuration must minimize the server's attack surface while still meeting all the business and technical requirements. All client computers at the central office must be updated automatically with Microsoft Updates. 1. How do you configure the instance to provide access to the applications without giving the applications sysadmin authority? 2. How do you ensure that credit card numbers are encrypted within the database? 3. What would you implement to audit access to the HR database? Suggested Practices To help you master the exam objectives presented in this chapter, complete the following practice tasks. Manage Logins and Server Roles Practice 1 Create a SQL Server login and a Windows login for an instance. Practice 2 Disable the login to prevent access to the instance. 304 CHAPTER 11 Designing SQL Server Security
  • 339. Manage Users and Database Roles Practice 1 Add a user to a database mapped to a login. Practice 2 Create a database role and add a user to the role. Manage SQL Server Instance Permissions Practice Add a login to the appropriate fixed server role for the permissions you want to grant. Manage Database Permissions Practice 1 Add a user to a fixed database role. Practice 2 Grant permissions at a database scope and view the effects. Manage Schema Permissions and Object Permissions Practice 1 Grant permissions at a schema scope and view the effects. Practice 2 Revoke permissions at an object level and view the effects. Audit SQL Server Instances Practice 1 Create a server audit object. Practice 2 Add server audit and database audit specifications to the server audit and test the auditing. Manage Transparent Data Encryption Practice 1 Configure a database for TDE. Practice 2 Restore the database to another instance to practice restoring the encryption keys that are necessary to access the backup. Take a Practice Test The practice tests on this book’s companion CD offer many options. For example, you can test yourself on just one exam objective, or you can test yourself on all the 70-432 certification exam content. You can set up the test so that it closely simulates the experience of taking a certification exam, or you can set it up in study mode so that you can look at the correct answers and explanations after you answer each question. MORE INFO PRACTICE TESTS For details about all the practice test options available, see the section “How to Use the Practice Tests,” in the Introduction to this book. Take a Practice Test CHAPTER 11 305
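For the TDE practice above, the following minimal T-SQL sketch shows the usual sequence. The certificate name TK432TDECert, the Customer database, and the file paths are illustrative assumptions, not values prescribed by this book:

    USE master;
    GO
    -- A database master key in master protects the certificate that TDE uses
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword>';
    GO
    CREATE CERTIFICATE TK432TDECert WITH SUBJECT = 'TDE certificate';
    GO
    -- Back up the certificate immediately; you need it to restore the
    -- database on another instance (Practice 2)
    BACKUP CERTIFICATE TK432TDECert TO FILE = 'C:\test\TK432TDECert.cer'
    WITH PRIVATE KEY (FILE = 'C:\test\TK432TDECert.pvk',
                      ENCRYPTION BY PASSWORD = '<StrongPassword>');
    GO
    USE Customer;
    GO
    CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_128
    ENCRYPTION BY SERVER CERTIFICATE TK432TDECert;
    GO
    ALTER DATABASE Customer SET ENCRYPTION ON;
    GO

Restoring the certificate (from the .cer and .pvk files) on the second instance before restoring the database backup is the substance of Practice 2.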
CHAPTER 12
Monitoring Microsoft SQL Server

In a perfect world, you would be able to install Microsoft SQL Server and deploy databases without incident, applications would have perfect performance, and nothing would ever go wrong. Unfortunately, we don't live in a perfect world. Hardware fails. Application performance degrades. Transactions block each other. Changes to the environment cause outages. In this chapter, you learn how to monitor your SQL Server environment and diagnose problems.

Exam objectives in this chapter:
- Collect performance data by using System Monitor.
- Collect trace data by using SQL Server Profiler.
- Identify SQL Server service problems.
- Identify concurrency problems.
- Locate error information.

Lessons in this chapter:
- Lesson 1: Working with System Monitor
- Lesson 2: Working with the SQL Server Profiler
- Lesson 3: Diagnosing Database Failures
- Lesson 4: Diagnosing Service Failures
- Lesson 5: Diagnosing Hardware Failures
- Lesson 6: Resolving Blocking and Deadlocking Issues

Before You Begin
To complete the lessons in this chapter, you must have:
- SQL Server 2008 installed
- The AdventureWorks database installed within the instance

REAL WORLD: Michael Hotek

Over the years, development platforms and database servers have become more sophisticated, loaded with features and graphical interfaces. The graphical interfaces are supposed to allow applications to be developed in less time by shielding the developer from all the low-level details. Being shielded from the details has pros and cons. On the plus side, you don't have to worry about learning all the architectural details of a feature or how the feature interacts with hardware or other components. On the negative side, you don't have to worry about learning all the architectural details of a feature or how the feature interacts with hardware or other components. Into this mix, the development tools have debuggers that too many people think are the only answer to finding and fixing a problem. It is becoming increasingly common that unless the debugger shows the precise line of code where an error occurred as well as explaining what the exact error is, a developer cannot find the problem. This is a serious problem for a SQL Server environment. SQL Server interacts with many hardware components and in many different ways. SQL Server also has many interrelated, internal components that affect each other. Into that mix, you introduce your applications with the data structures, database code, and multiple concurrent users. One poorly written query executed by a single person can cause problems for other users that can cascade through multiple SQL Server and hardware components. The real trick for a DBA is to be able to monitor a SQL Server, spot the problem areas, and institute changes before applications are impacted. In those cases where a DBA cannot avoid a problem, a methodical process coupled with knowledge of all the low-level details between SQL Server and hardware is the only way to find and fix an issue rapidly.

Lesson 1: Working with System Monitor

System Monitor, commonly referred to as PerfMon, is a Microsoft Windows utility that allows you to capture statistical information about the hardware environment, operating system, and any applications that expose properties and counters. In this lesson, you learn how to use System Monitor to gather counters into counter logs, which can be used to troubleshoot system and performance issues.

After this lesson, you will be able to:
- Select performance counters
- Create a counter log
Estimated lesson time: 20 minutes

System Monitor Overview

System Monitor uses a polling architecture to capture and log numeric data exposed by applications. The applications are responsible for updating the counters that are exposed to System Monitor. An administrator chooses the counters to capture for analysis and the interval at which to gather data. System Monitor then uses the definition supplied for the counters and polling interval to gather only the counters desired on the interval defined.

NOTE: FINDING SYSTEM MONITOR
With every new version of Windows, it seems that you have to hunt for the tools that you use every day. They are either moved to a different location, or the name is changed. System Monitor is no different. Everyone that I have ever dealt with in this industry calls this utility PerfMon because you start the program with a file called PerfMon.exe. By the time Microsoft released Windows XP and Windows Server 2003, it had been called Performance, Performance Monitor, and System Monitor. Windows Vista and Windows Server 2008 renamed it yet again, as will the upcoming Windows 7. Regardless of the name within your particular version of Windows, it is officially called System Monitor by Microsoft, you are still using PerfMon.exe, and the rest of the world simply calls it PerfMon.

Because the only data allowed by System Monitor is numeric and processes are not being executed to calculate the values as data is gathered, the overhead for System Monitor is almost nonexistent, regardless of the number of counters being captured. Although you want to minimize the number of counters being captured to avoid being overwhelmed with data, capturing one counter or 100 counters does not affect system performance.

Counters are organized into a three-level hierarchy: object, counter, and instance. An object is a component, application, or subsystem within an application. For example, Processor, Network Interface, and SQLServer:Databases are all objects. One or more counters are specified within an object, and every object has to have at least one counter. For example, within the SQLServer:Databases object, you have counters for active transactions, data file size, transactions/sec, and the percent of the transaction log space currently in use. A counter can have zero or more instances. For example, the System object has a Processor Queue Length counter that does not have any instances, whereas the counter that captures the percentage of the log space used within the SQLServer:Databases object has an instance for each database as well as a cumulative instance named _Total.

When you define counters to capture, you can specify criteria at any level within the hierarchy. If you decide to capture an entire object, System Monitor gathers data for every counter within the object, as well as for each instance available within the counter. If you do not want to capture everything, you can alternatively capture data for a subset of counters within an object as well as for a subset of instances within a counter.

EXAM TIP
For the exam, you need to know what various performance counters are for, as well as how to select the appropriate counter(s) to capture to diagnose problems within an instance.

Capturing Counter Logs

When first using System Monitor, most people start the program, add objects and counters, and view the results in the graphical display. However, the graphical display does not allow you to save the information to a file for later analysis and only provides about a two-minute snapshot of the counters on a system. To capture data for analysis, you need to configure a counter log, which runs in the background even when no one is logged on to the machine. Depending upon your operating system, you configure a counter log from a variety of places. Figure 12-1 is from Windows XP and Windows Server 2003.

FIGURE 12-1 System Monitor counter logs

In System Monitor, by right-clicking Counter Logs, selecting New Log Settings, and giving the counter log a name, you see the counter log definition window shown in Figure 12-2.

FIGURE 12-2 Defining counter log properties

Clicking Add Objects allows you to specify the objects that you want to capture. Add Counters allows you to specify individual counters within an object as well as individual instances within a counter.

BEST PRACTICES: CAPTURING COUNTER LOGS
Viewing counters in the graphical interface provided by System Monitor is useful only for looking at the immediate state of a system. It is much more useful to capture counters to a log to be used for later analysis. When setting up counter logs, it is recommended that you select counter objects instead of individual counters to ensure you have captured everything necessary for analysis.

The sample interval determines how frequently Windows gathers data for the counters specified in the log. The default setting is every 15 seconds and is the most common setting for routine analysis, establishing a baseline, and long-term trend analysis. If you need to analyze a problem that is recurring but has a very short duration, you want to decrease the polling interval.

BEST PRACTICES: SPECIFYING AN ACCOUNT FOR THE COUNTER LOG
At the bottom of the counter log definition screen, you can specify the security credentials that the counter log runs under. You should always configure a counter log to run under a specific account with sufficient permissions to access the applications for which you are gathering counters. Failure to define a specific account is the most common cause of a counter log failing to start. The next most common causes of a counter log failing to start are password expiration, a locked-out account, or a deactivated account.

On the Log Files tab, shown in Figure 12-3, you can define the name and format of the counter log.

FIGURE 12-3 Defining log file properties

Performance Counters

Although you can potentially capture thousands of counters and tens of thousands of counter instances, there is a small set of common counters that can be used to troubleshoot a variety of problems. An individual counter is generally used in conjunction with other counters plus additional information about your environment to diagnose a problem. Individual counters and groups of counters can direct you to an area of the system that might need further investigation but do not directly indicate a problem on the system. However, three counters indicate a system problem on their own:
- System:Processor Queue Length
- Network Interface:Output Queue Length
- Physical Disk:Avg. Disk Queue Length

When the processor, network interface, or disk becomes overwhelmed with activity, processes need to wait for resources to be freed up. Each thread that has to wait for a resource increments the corresponding queue length counter. For example, a processor queue length of eight indicates that there is insufficient processor capacity on the machine and that eight requests are waiting in a queue for a processor core to become available. Although the queue length can be greater than zero for very short durations, having any queue length greater than zero on a routine or extended basis means that you have a hardware bottleneck that affects application performance.
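As a side note, the SQL Server-specific counter objects described above can also be sampled directly from T-SQL through the sys.dm_os_performance_counters dynamic management view; Windows objects such as System and Processor are not exposed there. A minimal sketch follows, assuming a default instance (on a named instance the object names are prefixed MSSQL$<instance> rather than SQLServer, which is why the filter uses LIKE):

    -- Sample the Percent Log Used counter for every database instance,
    -- including the cumulative _Total instance
    SELECT object_name, counter_name, instance_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%:Databases%'
      AND counter_name = 'Percent Log Used'
    ORDER BY instance_name;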
MORE INFO: DIAGNOSING PROBLEMS WITH PERFORMANCE COUNTERS
Lesson 3, "Diagnosing Database Failures," Lesson 5, "Diagnosing Hardware Failures," and Lesson 6, "Resolving Blocking and Deadlocking Issues," provide additional detail on dozens of counters that are used to diagnose a variety of SQL Server, hardware, and Windows issues.

Quick Check
1. What are the items that you can capture data for with System Monitor?
2. What types of data can System Monitor capture?
3. What are the three counters that, by themselves, indicate a system problem?

Quick Check Answers
1. You can capture objects, counters, and counter instances.
2. System Monitor captures numeric data for performance counters that are defined for hardware or software components.
3. System:Processor Queue Length, Network Interface:Output Queue Length, and Physical Disk:Avg. Disk Queue Length.

PRACTICE: Creating a Counter Log

In this practice, you create a counter log to use as a performance baseline and to troubleshoot a variety of system errors.
1. Start System Monitor and create a new counter log.
2. On the General tab of the dialog box for the new counter log, click Add Objects. The Add Objects dialog box appears, as shown here.
3. Add the following objects: Memory, Network Interface, PhysicalDisk, Processor, SQLServer:Buffer Manager, SQLServer:Databases, SQLServer:Exec Statistics, SQLServer:General Statistics, SQLServer:Latches, SQLServer:Locks, SQLServer:Memory Manager, SQLServer:Plan Cache, SQLServer:SQL Statistics, and System. Click Close.
4. Specify an interval of 15 seconds as well as an account under which to run the counter log.
5. Click the Log Files tab, as shown here, and inspect the default settings.
6. Click OK to save the counter log. Start the counter log by right-clicking the log and selecting Start.

Lesson Summary
- System Monitor is used to capture numeric statistics about hardware and software components.
- Counters are organized into a three-level hierarchy: counter object, counter, and counter instance. A counter object must have at least one counter. A counter can have zero or more instances.
- You capture counter logs with System Monitor to perform analysis.

Lesson Review
The following question is intended to reinforce key information presented in Lesson 1, "Working with System Monitor." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. What does the System:Processor Queue Length counter measure?
A. The number of system requests waiting for a processor
B. The number of SQL Server requests waiting for a processor
C. The number of processors actively performing work
D. The amount of time that a processor is in use
Lesson 2: Working with the SQL Server Profiler

SQL Server is built upon an event subsystem, SQL Trace, which is exposed so that administrators can capture information associated with more than 200 events that can occur within an instance. SQL Server Profiler is the graphical tool that provides the most common interface to the SQL Trace subsystem. You can use the data captured to monitor an instance for errors or concurrency issues. You can also use Profiler to capture data that is used to optimize the performance of queries being executed against the environment. In this lesson, you learn how to create a variety of Profiler traces that can be used to diagnose errors and improve the performance and stability of your applications.

After this lesson, you will be able to:
- Create a Profiler trace
- Select trace events
- Filter traces
Estimated lesson time: 20 minutes

Defining a Trace

Although you can define a trace using Transact-SQL (T-SQL), it is more common to use SQL Server Profiler to define the trace. You can start SQL Server Profiler from the SQL Server 2008 Performance Tools menu. After Profiler starts, you select File, New Trace, and then connect to an instance to begin configuring a trace, as shown in Figure 12-4.

FIGURE 12-4 Creating a new trace

You can specify several general properties for a trace, such as the name, template, stop time, and whether to save the trace data. Every trace is required to have a name and at least one event. Profiler ships with several templates that have events, data columns, and filters already defined, so you can use them as a starting point. Table 12-1 lists the general purpose of each template.

TABLE 12-1 Profiler Trace Templates
- Blank: An empty trace; allows you to create an entire trace from scratch.
- SP_Counts: Captures each stored procedure executed so that you can determine how many times each procedure is executed.
- Standard: The most common template to start with; captures stored procedures and ad hoc SQL being executed along with performance statistics for each procedure and batch. Every login and logout is also captured.
- TSQL: Captures a list of all the stored procedures and ad hoc SQL batches that are executed, but does not include any performance statistics.
- TSQL_Duration: Captures the duration of every stored procedure and ad hoc SQL batch that is executed.
- TSQL_Grouped: Captures every login and logout along with the stored procedures and ad hoc SQL batches that are executed. Includes information to identify the application and user executing the request, but does not include any performance data.
- TSQL_Locks: Captures blocking and deadlock information such as blocked processes, deadlock chains, deadlock graphs, lock escalation, and lock timeouts. This template also captures every stored procedure, each command within a stored procedure, and every ad hoc SQL request.
- TSQL_Replay: Captures the stored procedures and ad hoc SQL batches executed against the instance in a format that allows you to replay the trace against a test system. This template is commonly used to perform load and regression tests.
- TSQL_SPs: Captures performance data for all ad hoc SQL batches, stored procedures, and each statement inside a stored procedure. Every login and logout is also captured.
- Tuning: Captures basic performance information for ad hoc SQL batches, stored procedures, and each statement inside a stored procedure.

By default, when a trace is started using Profiler, all the events appear in a grid within the interface. You can additionally save the trace data to a table, a file, or both.

CAUTION: RETURNING RESULTS INTO A GRID
Many organizations install the client tools on the server when SQL Server is installed. Although the installation of tools provides utilities for querying and troubleshooting an instance from the server console, you have to account for the overhead of the tools. Profiler can capture a very large number of events in a short amount of time, and when those events are loaded into the grid within Profiler, they can require a large amount of memory. Although grids can present information in an easy-to-understand format, a grid has much more overhead than a text-based format.

If you save a trace to a file, you can specify an upper size limit on the trace file to keep the file from growing out of control. In addition, you can enable file rollover. If you enable file rollover, after a trace file has reached the maximum size, the file is closed and a new file is opened. If you specify a maximum size without file rollover, Profiler stops capturing events after the maximum file size has been reached.

BEST PRACTICES: MAXIMUM TRACE FILE SIZE
Setting the maximum file size to 50 megabytes (MB) provides a good trade-off for managing the size and number of trace files. A 50-MB file is small enough to copy or move quickly across any network while also containing a large enough set of events within a single file for analysis.

When you specify a table to save the trace output to, Profiler creates a connection and streams all events captured to the target table. If you configure a maximum number of rows, Profiler quits capturing events once the maximum size has been reached.

BEST PRACTICES: LOGGING A TRACE TO A FILE
Although SQL Server is optimized to handle large volumes of data changes, the SQL Trace application programming interface (API) can produce enough events to overwhelm even a server running SQL Server. You should never log trace events to the same instance for which you are capturing events. Because you can possibly overwhelm a server running SQL Server when you are logging events, the best logging solution is to log trace events to a file and then later import the file into a server for analysis.

The trace stop time allows you to start a trace that runs for a given duration. After the stop time has been reached, based on the server clock, the trace is stopped automatically.

NOTE: CAPTURING ANALYSIS SERVICES EVENTS
Profiler can also be used to capture events for Analysis Services.
Specifying Trace Events

SQL Trace exposes more than 200 events that can be captured. The most important action that you take when configuring a trace is to select the set of events that you need to monitor for various situations that occur in an operational environment. Events are classified into 21 event groups, some of which contain more than 40 events. The event groups available are listed in Table 12-2.

TABLE 12-2 SQL Trace Event Groups
- Broker: 13 events for Service Broker messages, queues, and conversations.
- CLR: 1 event for the loading of a Common Language Runtime (CLR) assembly.
- Cursors: 7 events for the creation, access, and disposal of cursors.
- Database: 6 events for data/log file grow/shrink as well as Database Mirroring state changes.
- Deprecation: 2 events to notify when a deprecated feature is used within the instance.
- Errors and Warnings: 16 events for errors, warnings, and informational messages being logged, as well as events to detect suspect pages, blocked processes, and missing column statistics.
- Full Text: 3 events to track the progress of a full-text index crawl.
- Locks: 9 events for lock acquisition, escalation, release, and deadlocks.
- OLEDB: 5 events for distributed queries and remote stored procedure calls.
- Objects: 3 events that track when an object is created, altered, or dropped.
- Performance: 14 events that allow you to capture showplans, use of plan guides, and parallelism. This event group also allows you to capture full-text queries.
- Progress Report: 1 event for online index creation progress.
- Query Notifications: 4 events to track the parameters, subscriptions, and templates for query notifications.
- Scans: 2 events to track when a table or index is scanned.
- Security Audit: 44 events to track the use of permissions, impersonation, changes to security objects, management actions taken on objects, start/stop of an instance, and backup/restore of a database.
- Server: 3 events for mounting a tape, changes to server memory, and closing a trace file.
- Sessions: 3 events for existing connections when the trace starts as well as tracking the execution of logon triggers and Resource Governor classifier functions.
- Stored Procedures: 12 events for the execution of a stored procedure, cache usage, recompilation, and statements within a stored procedure.
- Transactions: 13 events for the begin, save, commit, and rollback of transactions.
- TSQL: 9 events for the execution of ad hoc T-SQL or XQuery calls. Events for an entire SQL batch as well as each statement within a batch.
- User Configurable: 10 events that you can configure with SQL Trace.

The most commonly used event groups are Locks, Performance, Security Audit, Stored Procedures, and TSQL. The Stored Procedures and TSQL event groups are commonly captured along with events from the Performance group to baseline and troubleshoot query performance. The Security Audit event group is used to define auditing quickly across a variety of security events, although the new audit specification feature provides more secure and flexible auditing capabilities. Events from the Locks event group are commonly used to troubleshoot concurrency issues.

EXAM TIP
You need to know which events are used to solve various problems, for example, resolving deadlocks, blocking, or stored procedure performance.

Although most events return a small amount of data, some can have a significant payload on a very busy instance. The events to be very careful with are:
- Performance | Showplan *
- Stored Procedures | SP:StmtCompleted
- Stored Procedures | SP:StmtStarting
- TSQL | StmtCompleted
- TSQL | StmtStarting

This group of events should be included in a trace only in conjunction with a very restrictive filter that limits the trace to a single object or query string. The Showplan Statistics Profile, Showplan Text, and Showplan XML events return varying amounts of data depending upon the complexity of a query. Complex queries can return a large query plan, and functions and stored procedures can return multiple query plans to cover the statements within the function or procedure. Showplan XML events return the most data of all events in the Performance event group because not only are the showplan and statistical information returned, but the information is also formatted with all the XML tags for the showplan XML schema.

The StmtCompleted and StmtStarting events produce the highest volume of events in any trace. Stored procedures and T-SQL batches can contain a large number of statements to be executed. You can capture performance data on an entire stored procedure with the RPC:Completed event and an entire batch with the SQL:BatchCompleted event. However, if you need to troubleshoot performance for an individual statement, you can use the StmtCompleted and StmtStarting events, which send event data back for every statement that is executed within a stored procedure or T-SQL batch.

Selecting Data Columns

After you have determined which events you want to capture, you then need to determine which columns of data to return. Although you could select all 64 possible data columns to return for a trace event, your trace data is more useful if you capture only the information necessary to your purposes. For example, you could return the transaction sequence number for an RPC:Completed event, but if none of the stored procedures you are analyzing change any data, then the transaction sequence number consumes space without providing any value. Additionally, not all data columns are valid for all trace events; for example, Reads, Writes, CPU, and Duration are not valid for the RPC:Starting or SQL:BatchStarting events.

When you are trying to baseline performance, you can capture the following events:
- RPC:Starting
- RPC:Completed
- SQL:BatchStarting
- SQL:BatchCompleted

However, you will find that the *Starting events capture almost all the same information as the *Completed events, whereas tracing the *Completed events also lets you capture performance statistics with the Reads, Writes, CPU, and Duration columns. Therefore, it is very unlikely that you would trace the SQL:BatchStarting or RPC:Starting events, because they cannot provide any of the performance data necessary for a baseline or for troubleshooting performance issues.

You can capture the DatabaseID as well as the DatabaseName that corresponds to the event captured. However, because the DatabaseName is much more descriptive and user-friendly than the DatabaseID, you should leave the DatabaseID out of your trace definition.

The ApplicationName, NTUserName, LoginName, ClientProcessID, SPID, HostName, LoginSID, NTDomainName, and SessionLoginName columns provide context for who is executing a command and where the command is originating from. The SessionLoginName always displays the name of the login that was used to connect to the SQL Server instance, whereas the LoginName column accounts for any EXECUTE AS statements and displays the current user context for the command. The ApplicationName is empty unless the application property of the connection string has been set when a connection attempt is made to the instance. The NTUserName and NTDomainName reflect the Windows account for the connection, regardless of whether the connection used a Windows or SQL Server login to connect. The HostName is particularly useful in finding rogue processes in an environment because it lists the name of the machine that a command originated from. For example, you could use the HostName column to find commands being executed from a development machine against a production instance due to an incorrect configuration of an application pool.

The StartTime and EndTime columns record the time boundaries for an event. The StartTime and EndTime columns are especially useful when you need to correlate trace data with information from other systems.

The ObjectName, ServerName, and DatabaseName columns are useful for analysis operations. For example, the ObjectName column for the RPC:Completed event lists the name of the stored procedure executed so that you can easily locate all calls to a specific stored procedure that might be causing problems in your environment. Because it is possible to save trace data to a table, a common practice is to have a single instance within your environment where you import all your traces. By including the ServerName and DatabaseName columns, you can easily separate trace data between multiple instances/databases while still only needing a single table for storage.

Applying Filters

To target trace data, you can add one or more filters to a trace. A filter is essentially a WHERE clause that is applied to the event data returned by the SQL Trace API. Filters are defined on data columns and allow you to specify multiple criteria to be applied, as shown in Figure 12-5.

FIGURE 12-5 Specifying trace filters

Data columns that contain character data allow filters to be defined on a text string using LIKE or NOT LIKE that can contain one or more wildcard characters. Time-based data columns allow you to specify greater than or less than. Numeric-based data columns allow you to specify equals, not equal to, greater than or equal, and less than or equal. Binary data columns cannot be filtered. Multiple filters for a single data column are treated as OR conditions. Filters across multiple data columns are treated as AND conditions.

Managing Traces

You can start, stop, and pause a trace. After a trace has been started, the SQL Trace API returns events that match the trace definition while discarding any events that do not match the trace filter criteria. When a trace is stopped, all event collection terminates, and if the trace is subsequently started again, all previous trace data is cleared from the Profiler display. If you want to suspend data collection temporarily, you can pause a trace. Upon resumption of a trace, subsequent events are appended to the bottom of the Profiler display.

SQL Server Profiler is an application that allows you to graphically define settings that are translated into stored procedure calls to create and manage traces. The trace modules that ship with SQL Server 2008 are listed in Table 12-3.

TABLE 12-3 SQL Trace Stored Procedures and Functions
- sp_trace_create: A stored procedure that creates a new trace object. Equivalent to the definition on the General tab of the New Trace dialog box in Profiler.
- sp_trace_generateevent: A stored procedure that allows you to define your own trace event.
- sp_trace_setevent: A stored procedure that adds a data column for an event to be captured by a trace. You need to call sp_trace_setevent once for each data column being captured for an event. Equivalent to the event and data column selection grid on the Events Selection tab of the New Trace dialog box in Profiler.
- sp_trace_setfilter: A stored procedure that adds a filter to a trace. Equivalent to the Edit Filter dialog box in Profiler.
- sp_trace_setstatus: A stored procedure that starts, stops, and closes a trace. A status of 0 stops a trace. A status of 1 starts a trace. A status of 2 closes a trace and removes the trace definition from the instance.
- fn_trace_geteventinfo: A function that returns the events and data columns being captured by a trace.
- fn_trace_getfilterinfo: A function that returns the filters applied to a specified trace.
- fn_trace_getinfo: A function that returns status and property information about all traces defined in the instance.
- fn_trace_gettable: A function that reads one or more trace files and returns the contents as a result set. Commonly used to import trace files into a table.
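The following sketch shows how the modules in Table 12-3 fit together to build the kind of baseline trace described earlier, captured server-side to a 50-MB rollover file. The file path is an assumption; the event and data column IDs (10 = RPC:Completed, 12 = SQL:BatchCompleted; 1 = TextData, 13 = Duration, and so on) come from the sp_trace_setevent documentation, so verify them in Books Online before relying on this:

    DECLARE @TraceID int;
    DECLARE @maxfilesize bigint;
    DECLARE @on bit;
    SET @maxfilesize = 50;   -- in MB, per the best practice above
    SET @on = 1;

    -- Option 2 = TRACE_FILE_ROLLOVER; SQL Server appends the .trc extension
    EXEC sp_trace_create @TraceID OUTPUT, 2, N'C:\test\TK432Baseline', @maxfilesize;

    -- Event 10 = RPC:Completed, event 12 = SQL:BatchCompleted
    -- Columns: 1 = TextData, 11 = LoginName, 13 = Duration,
    --          16 = Reads, 17 = Writes, 18 = CPU, 35 = DatabaseName
    EXEC sp_trace_setevent @TraceID, 10, 1, @on;
    EXEC sp_trace_setevent @TraceID, 10, 11, @on;
    EXEC sp_trace_setevent @TraceID, 10, 13, @on;
    EXEC sp_trace_setevent @TraceID, 10, 16, @on;
    EXEC sp_trace_setevent @TraceID, 10, 17, @on;
    EXEC sp_trace_setevent @TraceID, 10, 18, @on;
    EXEC sp_trace_setevent @TraceID, 10, 35, @on;
    EXEC sp_trace_setevent @TraceID, 12, 1, @on;
    EXEC sp_trace_setevent @TraceID, 12, 13, @on;
    EXEC sp_trace_setevent @TraceID, 12, 16, @on;
    EXEC sp_trace_setevent @TraceID, 12, 17, @on;
    EXEC sp_trace_setevent @TraceID, 12, 18, @on;
    EXEC sp_trace_setevent @TraceID, 12, 35, @on;

    -- Filter to one database: column 35 (DatabaseName), 0 = AND, 0 = equals
    EXEC sp_trace_setfilter @TraceID, 35, 0, 0, N'AdventureWorks';

    -- 1 = start; note the ID so you can later stop (0) and close (2) the trace
    EXEC sp_trace_setstatus @TraceID, 1;
    SELECT @TraceID AS TraceID;

The resulting file can later be imported with fn_trace_gettable, exactly as in Practice 3 later in this lesson.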
BEST PRACTICES: RUNNING TRACES AUTOMATICALLY
Although you can use Profiler to gather quick traces against an instance, it is much more common to set up and manage traces using code. By running traces via code, you eliminate all the overhead of a graphical tool while also providing a means of unattended tracing through the use of jobs within SQL Server Agent.

Correlating Performance and Monitoring Data

SQL Trace is used to capture information about what occurs within a SQL Server instance. System Monitor is used to capture performance counters that provide a picture of the hardware resources and other components running on the server. SQL Server cannot run without access to hardware resources, and the state of those resources affects how well a server running SQL Server functions. For example, a query might be running slowly, but Profiler tells you only how slowly the query is running. By adding in performance counters, you might find that the reason queries are running slowly is that you have insufficient processing resources. Although you might be able to diagnose an issue using only System Monitor or SQL Trace, using the two sets of data together provides context to any analysis.

The challenge is to bring the two sets of data together in a coherent way that enables efficient analysis. You can save a trace to a table, just as you can save a counter log to a table. After both data sets are saved to tables, you can execute a variety of queries to correlate the information. If you don't want to write queries to correlate the data sets, you can perform this action in a much simpler manner: Profiler allows you to view a trace file together with a counter log. Not only can you view the two data sets on the same screen, but Profiler also keeps the two synchronized. As you scroll down a trace, an indicator shows the counter values at the point when each query was executed.

CAUTION: CORRELATING A COUNTER LOG TO A TRACE FILE
You can correlate a counter log with a trace file only if you have captured the StartTime data column in the trace.

Quick Check
1. What are the three items that you define within a trace?
2. Which events are commonly used to establish a performance baseline?

Quick Check Answers
1. You define events, data columns, and filters within a trace.
2. The RPC:Completed and SQL:BatchCompleted events are used to establish a performance baseline.

PRACTICE: Creating Traces

In this practice, you create traces to audit security, establish a performance baseline, and troubleshoot a deadlock. You also import a trace into a table.

PRACTICE 1: Creating a Security Auditing Trace
In this practice, you configure a trace to audit security.
1. Start Profiler. Select File, New Trace, and connect to your instance.
2. Specify a trace name, template, and options to save to a file, as shown here.
3. Click the Events Selection tab.
4. Select the Show All Events and Show All Columns check boxes.
5. Right-click the Security Audit event category and then choose Select Event Category, as shown here.
6. Click Run.
7. Start SQL Server Management Studio (SSMS), connect to the instance, and perform various security actions, such as creating a login, creating a database user, and creating an object.
8. Observe the effects of these actions within the Profiler event grid, shown here.
9. Stop and close the trace.

PRACTICE 2: Establishing a Performance Baseline
In this practice, you configure a trace to establish a performance baseline for stored procedures and ad hoc SQL execution.
1. If necessary, start Profiler. Select File, New Trace, and connect to your instance.
2. Specify a trace name, template, and options to save to a file, as shown here.
3. Click the Events Selection tab.
4. Below Security Audit, clear the Audit Login, Audit Logout, ExistingConnection, and SQL:BatchStarting event check boxes.
5. Select the Showplan XML event check box. The Showplan XML event is added for demonstration purposes only; normally, you would not capture the XML showplan for a performance baseline trace.
6. Select the TextData, NTUserName, LoginName, CPU, Reads, Writes, Duration, SPID, StartTime, EndTime, BinaryData, DatabaseName, ServerName, and ObjectName columns, as shown here.
7. Click Column Filters and specify a filter on the AdventureWorks database, as shown here.
8. Click Run.
9. Execute several queries and stored procedures against the AdventureWorks database, and then observe the results in Profiler, as shown here.

PRACTICE 3: Importing a Trace File
In this practice, you import a trace file into a table for further analysis.
1. Open a query window and execute the following command (the path assumes the trace file from Practice 2 was saved to C:\test):

SELECT * INTO dbo.TK432BaselineTrace
FROM fn_trace_gettable('C:\test\TK432 Performance Baseline.trc', default);
GO

2. Execute the following query and inspect the results:

SELECT * FROM dbo.TK432BaselineTrace;
GO

PRACTICE 4: Correlating SQL Trace and System Monitor Data
In this practice, you correlate the System Monitor counter log created in Lesson 1 with the Profiler trace created in Practice 2 of this lesson.
1. Stop the counter log that you created in Lesson 1.
2. Start Profiler. Select File, Open, Trace File. Select the performance baseline trace file that you created in Practice 2 of this lesson.
3. Select File, Import Performance Data. Select the counter log that you created in Lesson 1.
4. In the Performance Counters Limit dialog box, shown on the following page, select Network Interface:Output Queue Length, Processor:% Processor Time, System:Processor Queue Length, and SQLServer:Buffer Manager:Page life expectancy. Click OK.
5. Scroll down the Profiler trace in the top pane and observe the changes that occur within the performance counter graph and grid, as shown here.

Lesson Summary
- Profiler is the utility that allows you to interact graphically with the SQL Trace API.
- SQL Trace exposes events that can be captured to audit actions, monitor the operational state of an instance, baseline queries, and troubleshoot performance problems.
- You can specify the columns of data that you want to capture for a given event.
- Trace output can be limited by applying filters.

Lesson Review
The following question is intended to reinforce key information presented in Lesson 2, "Working with the SQL Server Profiler." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. You are trying to troubleshoot a performance issue at Fabrikam. At about 15 minutes past the hour, on a recurring basis, query performance declines for about 1 minute before application performance returns to normal. What tools can you use to diagnose the cause of the performance problems? (Choose all that apply.)
A. System Monitor
B. Database Engine Tuning Advisor
C. Resource Governor
D. Profiler
Lesson 3: Diagnosing Database Failures

In this lesson, you learn how to view and filter the logs used in diagnosing errors. You also learn how to deal with database space issues, which are the most common causes of errors.

After this lesson, you will be able to:
- Work with SQL Server log files
- Diagnose out-of-space issues
Estimated lesson time: 20 minutes

SQL Server Logs

Error and informational messages related to your server running SQL Server can be found in:
- Windows Event logs
- SQL Server error logs
- SQL Server Agent logs
- Database Mail logs

Windows Event logs are commonly viewed using the Windows Event Viewer. Information that affects your SQL Server installation can be found in three different event logs:
- System Event log: Contains system error and informational messages primarily related to hardware and operating system components.
- Application Event log: The primary source of SQL Server information, containing all the error and informational messages for an instance, including service start/stop/pause messages.
- Security Event log: If you have enabled auditing of login/logout events, each successful connect and disconnect from an instance is logged in this event log.

The SQL Server error log is a text file on disk that can be opened and viewed by any text editor, such as Notepad. The current SQL Server error log is named errorlog (without any file extension) and is located in the MSSQL10.<instance>\MSSQL\LOG directory. You can also retrieve the contents of the current error log as a result set by executing the system extended stored procedure sys.xp_readerrorlog.

The SQL Server error log contains startup information such as the version (including service pack) of SQL Server and Windows; the Windows process ID; the authentication mode; the number of processors; instance configuration parameters; and messages for each database that was opened, recovered, and successfully started. The error log also contains informational messages for major events within the instance, such as traces starting and stopping or database backup/restores. However, the main purpose of the error log is to log error messages such as database corruption, stack dumps, insufficient resources, and hardware failures. The SQL Server error log also contains any messages created with a RAISERROR command that specifies the WITH LOG parameter. Messages are logged to both the SQL Server error log and the Windows Application Event log if you execute a RAISERROR with a severity level of 16 or higher.

Errors and informational messages related to SQL Server Agent are found in the SQL Server Agent log file, named Sqlagent.out. The Database Mail log is contained in the dbo.sysmail_log table in the msdb database.

You could use the Windows Event Viewer to view event logs, Notepad to view SQL Server and SQL Server Agent logs, and T-SQL for Database Mail logs. Instead of having to open logs using multiple tools, SSMS has a Log File Viewer that allows you to view the various error and event logs in a single interface, shown in Figure 12-6.

FIGURE 12-6 Viewing error and event logs

The Log File Viewer provides an integrated view of all the logs that you specify, based on the date and time of each event. By integrating the logs that you specify, you can correlate information across multiple logs directly to diagnose an error, such as being able to view any system or application log entries that might have occurred around the same time that you encountered a SQL Server error. In addition to viewing multiple files interleaved in a single interface, the Log File Viewer also allows you to search, filter, and export information in one or more log files.
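Two quick T-SQL touchpoints for the logs described above. Reading the current error log with sys.xp_readerrorlog requires no parameters; the search parameters shown in the second call are widely used but undocumented, so treat them as an assumption and verify against your build:

    -- Return the current SQL Server error log as a result set
    EXEC sys.xp_readerrorlog;

    -- Undocumented parameters (assumed): log number (0 = current),
    -- log type (1 = SQL Server, 2 = Agent), search string
    EXEC sys.xp_readerrorlog 0, 1, N'Error';

    -- Write a message to both the error log and the
    -- Windows Application Event log
    RAISERROR (N'TK432: test message for the error log', 10, 1) WITH LOG;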
Database Space Issues

The most common errors that you encounter deal with running out of space for either the data or log files. When you run out of space in the transaction log, all write activity to the database stops. You can still read data from the database, but any operation that attempts a write rolls back and returns error 9002 to the application, as well as writing the error to the SQL Server error log and Windows Application Event log. If a transaction log fills up, you can perform the following actions:
- Back up the transaction log.
- Add disk space to the volume that the transaction log file is on.
- Move the log to a volume with more space.
- Increase the size of the transaction log file.
- Add another log file on a disk volume that has space.

The first action that you should perform is to execute a transaction log backup. However, a transaction log backup might not free up enough space within the log to be reused if any of the following occurs:
- The database is participating in replication and the distribution database is not available.
- You have an open transaction.
- Database Mirroring is paused.
- The mirror database is behind the principal.
- A log scan is running.
- SQL Server cannot increase the size of a file quickly enough.

NOTE: INCREASING TRANSACTION LOG SPACE
If a transaction log backup does not free up enough space in the log to be reused, then you need to add space to the transaction log by increasing the disk space available. The most common way to increase the disk space available is to add a second log file to the database on a disk volume that has free space.

If a database runs out of disk space, an 1101 or 1105 error is raised. As long as the database is out of space, you cannot insert any new data. You can increase the space available to a database by adding a file to the appropriate filegroup on a disk volume that has additional space available. You might be tempted to add a new filegroup with files on a disk volume with space, but this does not solve your space problem. All your data is being written to tables within the database. A table is stored in either a single filegroup or mapped to a partition function that spreads the table across multiple filegroups. When you add a new filegroup, none of the existing tables or partition schemes can use the new filegroup to store data being written to the existing objects in the database. Files have to be added to existing filegroups in the database to increase the space available and eliminate the 1101/1105 errors.

TIP: FILE GROWTH
Before adding files to filegroups, you should first check whether the data files have the auto-grow feature disabled. If auto-grow is disabled and the disk volume still has space, you can increase the space available just by increasing the size of the existing files.

The tempdb database is a special case that needs to be closely watched. Running out of space in tempdb causes serious problems for a variety of processes. In addition to the local and global temporary tables that any user can create, tempdb storage is used for the following:
- Work tables for GROUP BY, ORDER BY, and UNION queries
- Work tables for cursors and spool operations
- Work tables for creating/rebuilding indexes that specify the SORT_IN_TEMPDB option
- Work files for hash operations

Tempdb storage is also used for the version store. The version store is used to store row versions for the following:
- Online index creation
- Online index rebuild
- Transactions running under snapshot isolation level
- Transactions running against a database with the read committed snapshot property enabled
- Multiple Active Result Sets (MARS)

If tempdb runs out of space, you can affect every database on an instance. In severe cases, all your applications running against an instance could cease to function. In addition to the 1101, 1105, and 9002 errors, tempdb raises 3958, 3959, 3966, and 3967 errors when space issues for the version store are encountered.

EXAM TIP
For the exam, you should know the common error codes for space issues, what each error code means, and how to fix the problems that generated the errors.

BEST PRACTICES: OUT-OF-SPACE ERRORS
Running out of space is a serious issue that should be avoided at all costs. Most database administrators (DBAs) react to problems. Instead, you should proactively manage your instances to ensure that you do not run out of space. You can proactively manage space by creating alerts to notify an operator when the amount of free space falls below a specified threshold, usually 10 percent to 15 percent. You can also be notified immediately if a space error occurs by creating alerts in SQL Server Agent.
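Before taking any of the corrective actions above, it helps to see why the log cannot be reused; sys.databases exposes this directly. A minimal sketch follows; the database name, file names, and paths are illustrative assumptions:

    -- Diagnose: why is the transaction log not being reused?
    -- (typical values: LOG_BACKUP, ACTIVE_TRANSACTION, REPLICATION,
    --  DATABASE_MIRRORING)
    SELECT name, log_reuse_wait_desc
    FROM sys.databases;

    -- First response to a 9002 error: back up the transaction log
    BACKUP LOG AdventureWorks TO DISK = 'C:\test\AdventureWorks_log.trn';

    -- If that is not enough, add a second log file on a volume with free space
    ALTER DATABASE AdventureWorks
    ADD LOG FILE (NAME = AdventureWorks_log2,
                  FILENAME = 'D:\SQLLogs\AdventureWorks_log2.ldf',
                  SIZE = 512MB);

    -- For 1101/1105 errors, add a file to the filegroup that is full
    ALTER DATABASE AdventureWorks
    ADD FILE (NAME = AdventureWorks_data2,
              FILENAME = 'D:\SQLData\AdventureWorks_data2.ndf',
              SIZE = 1GB)
    TO FILEGROUP [PRIMARY];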
Quick Check
1. What are the main places to find error and informational messages about the database engine?
2. What are the three error codes that are raised when you run out of space for a database?

Quick Check Answers
1. You use the SQL Server error log and Windows Application Event log for messages about the database engine. If you are auditing logins and logouts from the instance, you use the SQL Server error log and Windows Security Event log. The Windows System Event log can also provide hardware and operating system information useful in troubleshooting a database engine issue.
2. You receive a 9002 error when you run out of transaction log space, and either an 1101 or 1105 error when the data files are out of space.

PRACTICE: Creating Failure Alerts

In this practice, you create SQL Server Agent alerts for a variety of database errors.

PRACTICE 1: Version Store Alerts
In this practice, you create alerts for version store errors in tempdb.
1. In Object Explorer, expand the SQL Server Agent node.
2. Right-click the Alerts node and select New Alert.
3. Name your alert Full Version Store.
4. Select the tempdb database.
5. Select Error Number and specify 3959 for the error number, as shown in Figure 12-7.
6. Select the Response page and select an operator to notify by e-mail.
7. Select the Options page, select the E-mail check box, and set the Delay Between Responses option to 1 minute, as shown in Figure 12-8.
8. Click OK.
9. Repeat the process to create an alert for error 3967 named Forced Version Store Shrink.
10. Repeat the process to create an alert for error 3958 named Row Version Not Found.
11. Repeat the process to create an alert for error 3966. Because alert names must be unique and there are two error numbers that indicate the same problem, add an increment to the name of the second alert, Row Version Not Found2.

FIGURE 12-7 Specifying the error number

FIGURE 12-8 Specifying the delay between responses
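The same alert can be scripted instead of clicked through. A sketch of the T-SQL equivalent of Practice 1 follows; the operator name DBA Team is a placeholder for whatever operator you selected in step 6:

    EXEC msdb.dbo.sp_add_alert
        @name = N'Full Version Store',
        @message_id = 3959,
        @severity = 0,                    -- must be 0 when @message_id is used
        @database_name = N'tempdb',
        @delay_between_responses = 60;    -- seconds (1 minute, as in step 7)

    EXEC msdb.dbo.sp_add_notification
        @alert_name = N'Full Version Store',
        @operator_name = N'DBA Team',     -- placeholder operator name
        @notification_method = 1;         -- 1 = e-mail

Repeating the pair of calls with the other error numbers scripts the remaining alerts in this practice.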
PRACTICE 2: Log File Alerts
In this practice, you create an alert for when the transaction log is full.
1. Right-click the Alerts node and select New Alert.
2. Name your alert Transaction Log Full.
3. Select <All Databases>.
4. Select Error Number and specify 9002 for the error number.
5. Select the Response page and select an operator to notify by e-mail.
6. Select the Options page, select the E-mail check box, and set the Delay Between Responses option to 1 minute.
7. Click OK.

PRACTICE 3: Data File Alerts
In this practice, you create an alert for when a database is out of space.
1. Right-click the Alerts node and select New Alert.
2. Name your alert Database Full.
3. Select <All Databases>.
4. Select Error Number and specify 1101 for the error number.
5. Select the Response page and select an operator to notify by e-mail.
6. Select the Options page, select the E-mail check box, and set the Delay Between Responses option to 1 minute.
7. Click OK.
8. Repeat the process to create an alert for error 1105 named Database Full2.

Lesson Summary
- The SQL Server error log contains configuration information upon instance startup, errors, stack dumps, and informational messages about your instance.
- The Windows Application Event log contains service start/stop messages, major event informational messages, errors, and anything from a RAISERROR command that uses either the WITH LOG parameter or specifies a severity level of 16 or higher.
- The Log File Viewer allows you to view error and event logs combined into a single list in chronological order. The Log File Viewer also allows you to filter and search logs.
- An 1101 or 1105 error occurs when a database runs out of space. A 9002 error occurs when a transaction log is full. When the version store encounters space issues, you could receive 3958, 3959, 3966, and 3967 errors.
- You can configure alerts in SQL Server Agent to notify you when any space-related errors occur. You can also configure performance condition alerts in SQL Server Agent to notify you when storage space is getting low.

Lesson Review
The following question is intended to reinforce key information presented in Lesson 3, "Diagnosing Database Failures." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. Which types of SQL Server events are logged to the Windows Application Event log? (Choose all that apply.)
A. Stack dumps
B. Startup configuration messages
C. Job failures
D. Killed processes
  • 374. Lesson 4: Diagnosing Service Failures SQL Server and SQL Server Agent run as services within Windows. In addition to SQL Server failing due to hardware or software issues, you can also have startup failures. In this lesson, you learn how to diagnose and fix the most common issues when a service fails to start. After this lesson, you will be able to: to: Locate information about a service startup failure Diagnose the cause of a service startup failure Fix common causes of service startup failures Estimated lesson time: 20 minutes Finding Service Startup Failures You can use two utilities to view the state and configuration of SQL Server services—the Windows Services console and SQL Server Configuration Manager. As we discussed in Chapter 11, “Designing SQL Server Security,” every instance has a service master key at the root of the encryption hierarchy. The service master key is encrypted using the SQL Server service account and service account password. The Windows Services console does not contain the code necessary to encrypt a service master key; therefore, you should never use the Windows Services console to change the service account or service account password. Configuration Manager SQL Server Configuration Manager is installed on every machine that is running an instance of SQL Server 2008, and it cannot be removed. Unlike other SQL Server utilities, SQL Server Configuration Manager cannot be used to manage instances on multiple machines. The purpose of SQL Server Configuration Manager is to configure and manage the SQL Server services, network configuration, and native client. From the main console, shown in Figure 12-9, you can view the state, startup mode, and service account, as well as start, stop, pause, and restart a service. BEST PRACTICES S MANAGING SQL SERVER SERVICES It is always much easier to remember and enforce administration policies that do not have a lot of exceptions. You can safely change some of the options for SQL Server services using the Windows Services console; however, others cannot be safely changed this way. Therefore, most environments that I have worked in dictate that all changes to a SQL l Server service must be made using the SQL Server Configuration Manager. 340 CHAPTER 12 Monitoring Microsoft SQL Server
FIGURE 12-9 SQL Server Configuration Manager

Remote Desktop
Many organizations are now running SQL Server on dedicated machines. Unfortunately, many of those organizations have also implemented a policy under the guise of "separation of duties." Under such a policy, SQL Server DBAs are supposed to manage only SQL Server, whereas Windows System Administrators are supposed to manage the Windows operating system and hardware that SQL Server is running on. What the organizations have not accounted for in implementing this policy is that SQL Server is intimately tied to the operating system and hardware.
When a machine running SQL Server is operating normally, DBAs don't need access to the operating system or hardware, except to check the contents of and the space available on disk volumes. When a machine running SQL Server is not operating normally, DBAs need access to the operating system, hardware, and diagnostic tools. When you need to troubleshoot service startup, you need to be able to use Remote Desktop to connect to the console of the machine that SQL Server is running on to access SQL Server Configuration Manager. You also need access to disk volumes, Windows Event Viewer, and error logs for troubleshooting. If a DBA cannot remote
into a server running SQL Server, he or she must find someone with access and then attempt to troubleshoot service startup failures over the phone, hoping that the person on the other end of the line is looking at the correct information and performing the actions that he or she is relaying. In many cases, fixing a service startup problem when the DBA does not have remote access to the machine turns a problem that might take two to three minutes to fix into a major outage that might take hours before it is finally solved.

If you cannot connect to your SQL Server instance, the first thing that you need to check is whether SQL Server is running. If the service is in a starting state, SQL Server is in the process of starting up and should be available in a short amount of time. If the service is in a stopped state, the SQL Server is shut down due to any of the following:
- The service is set to manual or disabled startup mode and the machine was rebooted.
- The service was shut down by someone.
- SQL Server encountered a critical error and shut down the service.

The first place you should look when the SQL Server service is in a stopped state is either the SQL Server error log or the Windows Application Event log. If there were critical errors and the SQL Server instance was stopped as a result, both logs contain additional diagnostic messages that can be used to troubleshoot hardware problems, which are covered in Lesson 5, "Diagnosing Hardware Failures." If the machine was rebooted, both logs show a message that states "SQL Server is terminating because of a system shutdown." If someone shut down the service, then both logs show a message that states "SQL Server is terminating in response to a 'stop' request from Service Control Manager." So long as you do not see any additional hardware errors in the logs, all you have to do is start the service again. After the service has been successfully restarted, you need to connect and verify that all the databases are online and accessible, and that applications can connect successfully. Following establishment of normal operations, you need to find out who shut the service down or rebooted the machine to ensure that the problem does not happen again.

The normal startup mode for SQL Server and SQL Server Agent services on a stand-alone machine is Automatic. If the services were set to either Manual or Disabled, you must find out who made the change and why, and ensure that the changes do not affect operations. You can change the startup mode from the Service tab by right-clicking the service and selecting Properties, as shown in Figure 12-10.

NOTE: STARTUP MODE
A service with a startup mode of Disabled shows as Other in the Start Mode column in the SQL Server Configuration Manager main window.
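If you prefer to search the current error log for the shutdown messages just described from a query window rather than the Log File Viewer, the undocumented xp_readerrorlog procedure can filter on a string. A minimal sketch follows; because the procedure is undocumented, verify its parameter behavior on your build before relying on it.

-- A minimal sketch using the undocumented xp_readerrorlog procedure.
-- Parameters: log number (0 = current), log type (1 = SQL Server error log),
-- and a search string.
EXEC master.dbo.xp_readerrorlog 0, 1, N'terminating';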
FIGURE 12-10 Service account startup mode

CAUTION: SQL SERVER CLUSTERED INSTANCES
The start mode for all SQL Server services in a clustered installation should be set to Manual, and you should not change it. The services are controlled by the cluster service, and setting any of the services to an Automatic start mode creates significant problems on the cluster if a node restarts.

Service Accounts
To start SQL Server, the SQL Server service account needs to have several permissions, including the following:
- Read and Write access to the folder(s) where the data/log files for system databases are stored
- Read and Write permissions on SQL Server registry keys
- Log On As A Service authority
- Sysadmin authority inside the SQL Server instance

If the SQL Server service starts and then immediately shuts down, you are most likely dealing with an issue with the service account. The first step should be to check whether any of the following conditions exist for the service account:
- Account deleted
- Account locked out
- Account disabled
- Password expired
If the service account has been deleted, you need to have your System Administrator re-create the account and then reset the startup account on the Log On tab of the SQL Server service Properties dialog box, shown in Figure 12-11.

FIGURE 12-11 Service account logon

If the account was disabled, your System Administrator must re-enable the account before you can restart the SQL Server service. Following a restart, you should work with your System Administrator to figure out how the service account got deleted or disabled and put a process in place to ensure that it doesn't happen again.
If the password has expired, your System Administrator should disable password expiration for the account. When password expiration has been disabled, you should change the password, update the password for the service account on the Log On tab, and then restart the service.
If the service account was locked out, you should immediately change the password. Before unlocking the service account, your network and system administrators need to run a detailed security diagnostic across the network. Because a service account should never be used for anything except to run the associated service, a locked-out service account means that someone attempted to use the account for other purposes in violation of security policies. You need to isolate the person or application that attempted to use the service account and make sure that the security violation is addressed before you unlock the account and start the SQL Server service back up. Although it might sound counterintuitive to keep a critical SQL Server system off-line until you isolate the cause of a service account lockout, if the person or application had managed to knowingly or unknowingly crack the password after the account was locked out, starting the SQL Server back up before isolating the security violation gives that person access to all the data hosted on the instance.
If all the service account checks pass, then you are probably dealing with a permission issue.
When SQL Server starts up, the master database is brought online first. After the master database is brought online, the tempdb database is re-created. The creation of the tempdb database causes a write to occur to the folder(s) that store the data and log files for the tempdb database. If SQL Server cannot re-create the tempdb database, the service shuts down. The most common causes of SQL Server not being able to re-create the tempdb database are the following:
- The SQL Server service account does not have sufficient permissions.
- The folder does not exist or is unavailable.

If the folder does not exist, then you either need to create the folder or get the disk volume for tempdb back online before restarting the SQL Server instance. If you cannot get the disk volume back online or the tempdb creation problems were due to a configuration error, you can start SQL Server in single-user mode using the -m startup parameter, change the location of tempdb (a sketch follows the note below), and then restart the instance normally. If the folder is available, you should look for the following sequence of events in the Windows Application Event Log:
1. Service startup
2. Device activation or file not found error
3. Service shutdown

This sequence of events indicates that the SQL Server service account does not have sufficient permissions to access the data or log file(s) for either master or tempdb. If you rename the data and log files for tempdb, restart the instance, and then see a file named Tempdb.mdf with a size of 0 where the Tempdb.mdf file should be located, the problem is with the tempdb data/log files. After you fix the permission issues, SQL Server starts normally.

NOTE: DEVICE ACTIVATION ERRORS
Anytime you see a device activation error, SQL Server could not access a data or log file for a database. Device activation errors for master and tempdb prevent the instance from starting, whereas device activation errors for any other database only make that database unavailable. You should always investigate any device activation error because you either have a failing disk subsystem or an administrator is improperly shutting down a storage system while it is being used.
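As a sketch of the tempdb relocation step mentioned above, once the instance has been started in single-user mode with -m, you can point tempdb at a new folder. The path below is hypothetical, and tempdev and templog are the default logical file names, which you can confirm with sp_helpfile.

-- A minimal sketch: relocating tempdb after starting the instance with -m.
-- D:\TempDBData is a hypothetical path.
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, FILENAME = 'D:\TempDBData\tempdb.mdf');
ALTER DATABASE tempdb
    MODIFY FILE (NAME = templog, FILENAME = 'D:\TempDBData\templog.ldf');
-- The new location takes effect the next time the instance is restarted.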
Startup Parameters
If you encounter a device activation error related to the master database, the sequence of events that you see in the Windows Application Event Log is as follows:
1. Service startup
2. Device activation or file not found error
3. Service shutdown

The most common causes of a device activation error during SQL Server startup are the following:
- The service account does not have sufficient permissions.
- The storage system is off-line.
- The startup parameters were changed improperly.

You should first check that the storage system is online, the folder(s) containing the master database files exist, the master data and log files exist, and the service account has permissions to the folder(s). If you do not see any problems with the permissions, files, and folders, then you should check the service startup options, as shown in Figure 12-12.

FIGURE 12-12 Startup parameters

Startup parameters for the service can be entered directly by clicking the drop-down list and typing in the settings, as shown in Figure 12-13.

FIGURE 12-13 Entering startup parameters

The -d parameter lists the location and file name of the master data file. The -l parameter lists the location and file name of the master transaction log file. The -e parameter lists the name and location of the SQL Server error log. One of the most common changes that you make to the startup parameters is adding trace flags through the use of the -T parameter. However, unless you are very careful and fight your normal typing instincts, you can create an
error that prevents the instance from starting. If you look very closely at Figure 12-13, startup parameters are separated by a semicolon (;). However, your normal instinct is to place a space after the semicolon. If you introduce a space after a semicolon, SQL Server interprets the semicolon and everything after it as part of the previous parameter. As a result, instead of the master transaction log being named <directory>\mastlog.ldf, SQL Server determines that the name of the master transaction log file is <directory>\mastlog.ldf; -T<trace flag>. So if you have just changed the startup parameters and the instance does not start, the first thing you should always look for is a space that does not belong (a concrete example appears after the System Databases caution below).

NOTE: EXPERIENCED SQL SERVER DBAS
Besides the additional years in the job of DBA, the biggest thing that separates someone with experience from someone without experience is that the experienced person has managed to survive things that have gone wrong. I'll never admit many of the things that I've managed to survive over the decades. By following many of the best practices, sidebars, notes, and cautions that you will find in this book, I hope that you can avoid many of the mistakes I and many others have made over the years.

If the folder or folders containing the data and log files for the master database are accessible and the files exist, you should first check that the SQL Server service account has read/write permissions to the folder(s). If the SQL Server service account has sufficient permissions and you are still receiving device activation errors on the master database files, then you have a corrupt master database that can be repaired only by running SQL Server setup. If you have to repair a corrupt SQL Server installation, you should start SQL Server setup, select the Maintenance page, and then start the Repair Wizard.

EXAM TIP
For the exam, you should focus on the most common error scenarios, most of which deal with security permissions. You need to know which utilities to use to troubleshoot the errors. You should also know how to rebuild a master database in SQL Server 2008, which is accessed using the new Installation Center even though it still uses setup. Favorite questions for the exam test whether you know the "new" or "improved" way of performing an action in the current SQL Server version vs. the method(s) from the previous version.

CAUTION: SYSTEM DATABASES
SQL Server has four system databases on every instance: master, model, msdb, and tempdb. If you have configured replication, you also see a database named distribution. SQL Server has a sixth system database, first introduced in SQL Server 2005, named mssqlsystemresource. The mssqlsystemresource database contains most of the stored procedure, function, DMV, and other code that ships with SQL Server. The mssqlsystemresource database is critical to SQL Server operations; if it is missing or damaged, the instance cannot start. Unfortunately, this database is hidden, so you need to look for device activation errors related to the mssqlsystemresource database as well as the master and tempdb databases.
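Returning to the startup parameter syntax, here is a sketch that makes the semicolon rule concrete. The paths use the same <directory> placeholder style as above, and trace flag 1222, which writes deadlock information to the error log, is chosen only as an example.

Correct (no spaces after the semicolons):
-d<directory>\master.mdf;-l<directory>\mastlog.ldf;-e<directory>\ERRORLOG;-T1222

Broken (the space after the last semicolon makes "; -T1222" part of the -e value):
-d<directory>\master.mdf;-l<directory>\mastlog.ldf;-e<directory>\ERRORLOG; -T1222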
Quick Check
1. What does a device activation error mean?
2. Errors in which three databases prevent SQL Server from starting?

Quick Check Answers
1. A device activation error means that SQL Server either cannot find or cannot access a data or log file.
2. Errors in the master, tempdb, and mssqlsystemresource databases can prevent an instance from starting. Errors in all other databases just make the problem database inaccessible until you can fix the problem.

PRACTICE: Troubleshooting Service Startup Errors
In this practice, you fix a startup problem related to SQL Server not being able to find the master database files.

PRACTICE 1: Changing the Startup Folder
In this practice, you change the startup folder for the master database file.
1. Open SQL Server Configuration Manager and select SQL Server Services.
2. Stop the SQL Server service for your instance.
3. Open Windows Explorer and navigate to the folder that contains master.mdf and mastlog.ldf for your instance (commonly found in Microsoft SQL Server\MSSQL10.<instance>\MSSQL\Data).
4. Rename master.mdf to master.mdf2.
5. Rename mastlog.ldf to mastlog.ldf2.
6. Attempt to start your SQL Server instance.
7. Inspect the Windows Application Event log and the SQL Server error log, as shown here.

PRACTICE 2: Correcting Unavailable Devices
In this practice, you fix the device activation errors introduced by the change in Practice 1.
1. Right-click the SQL Server service in SQL Server Configuration Manager and select Properties.
2. Select the Advanced tab.
3. Change the file names for the master data and log file parameters.
4. Click OK.
5. Start the instance.
6. Inspect the SQL Server error log and the Windows Application Event Log for a normal startup sequence.

Lesson Summary
- SQL Server Configuration Manager is used to configure and manage services, protocols, and the SQL Native Client.
- The most common cause of service startup errors is permissions.
- A device activation error indicates that SQL Server either cannot find or cannot write to a data or log file.
- When changing the startup parameters for the SQL Server service, a semicolon separates parameters, and you need to make certain that you do not introduce a space following a semicolon.

Lesson Review
The following question is intended to reinforce key information presented in Lesson 4, "Diagnosing Service Failures." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.
1. You are the DBA at Blue Yonder Airlines and the phone rings. The main ticket booking application has just gone off-line and cannot be reconnected to the database. You attempt to connect to the SQL Server and find that it is unreachable. You find that the service has stopped, and upon inspecting the error logs, you find a large number of device activation errors. What is the most likely cause of the problem?
A. Someone deleted the ticketing database files.
B. The disk storage system underneath the master or tempdb databases went off-line.
C. The disk storage system underneath the ticket booking database went off-line.
D. The SQL Server service account was locked out.
Lesson 5: Diagnosing Hardware Failures
Although most failures you encounter with SQL Server are related to organizational processes or security, all hardware fails eventually. In this lesson, you learn how to diagnose the causes of hardware problems so that you can replace the appropriate components.

After this lesson, you will be able to:
- Diagnose failures due to hardware errors
Estimated lesson time: 20 minutes

Disk Drives
Disk drives are one of the last remaining hardware components with moving parts. Because disk drives contain very small parts with very stringent clearance tolerances between components, and also subject components to extremely high velocities and mechanical stresses, the most common hardware failures occur in your disk storage. Having a single disk fail generally isn't a problem because all your databases should be stored on a disk system with some redundant array of inexpensive disks (RAID) level that provides a spare. However, a failure of multiple disks can take an entire disk volume off-line, making your databases or the entire instance unavailable.
The first indication that you have of a failure in the disk system is errors logged to the Windows System Event Log or within the logging system for your Storage Area Network (SAN) or Network Attached Storage (NAS) array. If the errors get severe enough for a volume or an entire array to go off-line, you immediately begin seeing device activation errors in the SQL Server error log as well as the Windows Application Event log. When you encounter device activation errors and have determined that the storage system where the data or log files are stored is unavailable, you should notify the storage administrator for your organization. If the storage for your databases is locally attached, you can use the Disk Management folder within the Computer Management console to determine the state of disk volumes, as shown in Figure 12-14. Errors in locally attached storage can be diagnosed, as well as possibly fixed, by using the CHKDSK command-line utility.

CAUTION: SAN AND NAS ARRAYS
When your databases are stored on SAN or NAS arrays, you should always use the specialized utilities that ship with your storage array to diagnose and repair any disk errors.
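For locally attached storage, a CHKDSK run looks like the following sketch; the drive letter E: is a hypothetical data volume, and the check should be run from an elevated command prompt.

REM A minimal sketch; E: is a hypothetical data volume.
REM Without switches, chkdsk only reports errors.
chkdsk E:
REM /F attempts to fix file system errors; SQL Server must not have open
REM files on the volume while the repair runs.
chkdsk E: /F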
FIGURE 12-14 Managing locally attached disks

Memory and Processors
It is quite rare for a stick of random access memory (RAM) or a processor to simply fail. More likely, you begin receiving sporadic errors that occur on a seemingly random basis, and the SQL Server instance might stay running without showing any errors at all. You might also see stack dumps in the SQL Server error log or in the folder where the SQL Server error log is stored. When SQL Server encounters a severe error, a stack dump is generated. If the error is recoverable, SQL Server continues to run. If the error is severe enough, the instance shuts down.
Because it is very rare to be on the console of the machine running SQL Server, you do not see any blue screens when a STOP error occurs. Therefore, the first indication that you normally have of memory or processor issues is when a stack dump is generated. You should have an alert sent to an operator if a stack dump entry ever appears in the SQL Server error log, Windows Application Event log, or Windows System Event log, or if a file with a .mdmp extension is created in the folder that stores the SQL Server error log. To diagnose a memory or processor problem, you should use the diagnostic utilities that the vendor ships with the hardware.

EXAM TIP
For the exam, you need to know the most common errors related to the failure of hardware components.
Quick Check
1. What errors do you see if your disk storage goes off-line underneath a database?
2. What errors do you see if there is a fault in either the memory or processor?

Quick Check Answers
1. When a disk volume that databases are stored on goes off-line, SQL Server begins logging device activation errors.
2. If you are encountering memory or processor problems, you see STOP errors. If there is a memory error encountered when the computer is booting, you see a POST error. Both STOP and POST errors are accompanied by a blue screen with additional diagnostic information.

Lesson Summary
- A severe failure of the disk system that takes a storage volume off-line logs device activation errors, and the affected databases become inaccessible.
- Memory and processor errors are usually intermittent and generally occur in conjunction with a stack dump being generated.

Lesson Review
The following question is intended to reinforce key information presented in Lesson 5, "Diagnosing Hardware Failures." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. Humongous Insurance has hired you to evaluate its SQL Server infrastructure, make recommendations, and manage projects to improve the production environment. During your evaluation, you have learned that database files are stored across dozens of different drives and mount points. Data and log files are mixed with backup files. System databases exist on the same disk drives as user databases. Your first project is to move all the system databases to separate drives from the user databases. You will also be moving the tempdb database to dedicated storage on each instance because many very poorly written queries move massive quantities of data through tempdb. The on-call DBA is performing the maintenance and has tested the procedures several times in a lab environment. You receive a call that the first instance, following the
database moves, does not start. You have verified service account permissions for the folder containing the master database files, that the master database files are in the correct location, and that the startup parameters are correct. What is the most likely cause of the problem?
A. The master database is corrupted.
B. The mssqlsystemresource database is corrupted.
C. The service account does not have permissions to the folder containing the tempdb database files.
D. You have a bad memory module in the server.
Lesson 6: Resolving Blocking and Deadlocking Issues
As the saying goes, "SQL Server would run perfectly, if it weren't for the users." However, if there weren't any users to work with the data, the databases that you are managing would be worthless. Also, if you could only read data from a database, you wouldn't have to worry about multiple users trying to work with the same data. Of course, a read-only database would not have any way to get data into the database in the first place. Because you need to have data in a database that is accessible to and can be manipulated by multiple users, a mechanism must be in place to manage concurrent access to maintain data consistency. In this lesson, you learn how the SQL Server locking mechanism manages access and how to troubleshoot processes that collide, producing blocking and deadlocking.

MORE INFO: LOCKING, BLOCKING, AND DEADLOCKING
For a detailed discussion of locking, blocking, deadlocking, and isolation levels, please refer to Microsoft SQL Server 2008 Internals by Kalen Delaney (Microsoft Press, 2009).

After this lesson, you will be able to:
- Find blocked processes
- Kill a process
- View a deadlock graph
Estimated lesson time: 20 minutes

Locks
SQL Server uses a locking mechanism to maintain data consistency for multiuser access. An internal process called the Lock Manager determines the appropriate lock to acquire, how long to retain the lock, and arbitrates when processes are allowed to modify data such that reads are always consistent. SQL Server has seven different locking modes and three different lock types. In most situations, you deal with only the following three locking modes:
- Shared
- Exclusive
- Update

The three different lock types that can be acquired are row, page, and table. Locks can be scoped to a session, transaction, or cursor. A shared lock is acquired for read operations to prevent the data being read from changing during the read. Because read operations cannot introduce data inconsistencies, you can have multiple shared locks on the same resource at the same time. An exclusive lock is acquired on
a resource that is being modified and is held until the modification is complete. As the name implies, you can have only one exclusive lock on a resource at a time, and all other processes needing to access the resource must wait until the exclusive lock has been released. An update lock is a hybrid of a shared and an exclusive lock. Although an update lock is acquired for any update, update locks can also be acquired during any action that requires SQL Server to first locate the piece of data to be modified. An update lock starts out as a shared lock on the resource; when the piece of data that needs to be modified is found, the lock is changed to an exclusive lock while the data is being changed.
Each lock mode can be acquired against a row, page, or table. The Lock Manager determines the type of lock to acquire based on a very aggressive resource threshold, commonly referred to as the two percent rule, which is designed to minimize the number of locks needing to be acquired and managed, because each lock acquired also consumes memory. If SQL Server determines that more than two percent of the rows on a page will need to be accessed, a page lock is acquired. Likewise, if more than two percent of the pages in a table will need to be accessed, a table lock is acquired. The Lock Manager uses distribution statistics, also used by the Query Optimizer, to determine which type of lock to acquire. Because distribution statistics are not always accurate or don't always exist, the Lock Manager has a mechanism called lock escalation that allows a lock to be promoted to another type. SQL Server can escalate a row lock to a table lock, or a page lock to a table lock.

NOTE: LOCK ESCALATION
It is a very common misconception that SQL Server promotes row locks to page locks. Row locks are promoted only to table locks.

Transaction Isolation Levels
Isolation levels affect the way SQL Server handles transactions, as well as the duration of locks acquired. SQL Server has five isolation levels, which are described in Table 12-4.

TABLE 12-4 Transaction Isolation Levels

ISOLATION LEVEL      DESCRIPTION

READ UNCOMMITTED     Data can be read that has not been committed. Although an
                     exclusive lock still blocks another exclusive lock, any read
                     operations ignore an exclusive lock.

READ COMMITTED       This is the default isolation level for SQL Server. An
                     exclusive lock blocks both shared and exclusive locks. A
                     shared lock blocks an exclusive lock. Shared locks are
                     released as soon as the data has been read.
TABLE 12-4 Transaction Isolation Levels (continued)

ISOLATION LEVEL      DESCRIPTION

REPEATABLE READ      Exclusive locks block both shared and exclusive locks.
                     Shared locks block exclusive locks. Shared locks are held
                     for the duration of the transaction.

SERIALIZABLE         All the restrictions of the REPEATABLE READ isolation level
                     apply. In addition, you cannot insert a new row within the
                     keyset range currently locked by the transaction. Locks are
                     held for the duration of the transaction.

SNAPSHOT             Uses the row versioning feature to keep shared and exclusive
                     locks from blocking each other while maintaining data
                     consistency. A read operation retrieves data from the version
                     of the row prior to the start of a data modification operation.

Blocked Processes
The Lock Manager is based on a first in, first out (FIFO) algorithm. Each process that executes a command needs to acquire a lock. The locks being requested are queued up by the Lock Manager in the order that the requests are made. So long as the requested resource does not have a lock or has a lock that does not conflict with the lock being requested, the Lock Manager grants the lock request. If a locking conflict occurs, such as a request to acquire a shared lock on a row that is exclusively locked by another process, the request is not granted and the Lock Manager holds the request in the locking queue, along with every other lock request for that resource, until the competing lock is released.
Blocking is the term used when a situation occurs that produces competing locks on the same resource. The second process is blocked from acquiring the lock until the first process releases the competing lock. A process that is blocked stops executing until the necessary locks can be acquired. Although blocking is a normal occurrence within any database that allows data manipulation by multiple users, you have a problem if the blocking is severe or lasts for a long time.
Anyone living in a large metropolitan area has first-hand experience with blocking. At rush hour each day, a flood of vehicles attempts to use a single resource—a road. So long as driving conflicts do not occur, every vehicle rapidly travels the road and completes its route. However, if an accident occurs that suddenly closes every lane on the road, all the traffic stops, and people begin to get angry. Traffic cannot start flowing again until the accident is cleared out of the way; at that point, traffic flow eventually returns to normal. The same process occurs within a database. If blocking occurs for a long time or continuous blocking occurs across processes, the performance of an application suffers.
You can determine whether a process is blocked by using the sys.dm_exec_requests view. A process that is blocked shows a nonzero number in the blocking_session_id column. If you determine that a process is causing contention within your database, a member of the sysadmin fixed server role can terminate the process forcibly by executing the following command (where spid is the system process ID of the blocking session):

KILL <spid>

When a process is killed:
- Any open transaction is rolled back.
- A message is returned to the client.
- An entry is placed in the SQL Server error log.
- An entry is placed in the Windows Application Event Log.
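A minimal sketch of this check follows; the session ID 53 in the KILL statement is a hypothetical value taken from the blocking_session_id column, so substitute the actual blocking session before running it.

-- Find requests that are currently blocked and what is blocking them.
SELECT session_id, blocking_session_id, wait_type, wait_time, command
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;

-- Terminate the blocking session; 53 is a hypothetical SPID.
-- Requires membership in the sysadmin fixed server role.
KILL 53;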
Deadlocks
SQL Server maintains an orderly flow of transactions, even through blocking operations. A blocked process waits until a competing lock is released before execution resumes. However, it is possible to create a situation in which blocks can never be resolved. When two processes block each other in such a way that neither process can be resolved, you have created a deadlock. A deadlock requires at least two processes, and each process must be performing an action that modifies data. Because you can have multiple shared locks acquired on a single resource at the same time, it is not possible to produce a deadlock with a process that only retrieves data.
A deadlock is a transient issue that occurs through the following sequence:
1. SPID1 acquires an exclusive lock on RowA.
2. SPID2 acquires an exclusive lock on RowC.
3. SPID1 attempts to acquire a shared lock on RowC and is blocked by the exclusive lock.
4. SPID2 attempts to acquire a shared lock on RowA and is also blocked by the exclusive lock.
Because both processes have to wait on the other process to release an exclusive lock before they can complete, the Lock Manager has an impossible situation. When this occurs, the Lock Manager detects the deadlock and chooses one of the processes to be killed automatically. Unfortunately, the process that is chosen as the deadlock victim is the one that has the least amount of accumulated time within SQL Server. So, you can have a critical process that is chosen as the deadlock victim purely because you do not keep the connection open and execute multiple queries on the session. You cannot change the deadlock victim selection algorithm.
When a deadlock is detected, a 1205 error message is returned to the client and the deadlock is recorded in the SQL Server error log. In addition, you can use Profiler to capture a deadlock trace, which allows you to inspect the cause of the deadlock graphically.

BEST PRACTICES: HANDLING DEADLOCKS
Deadlocks are transient situations basically caused by bad timing. If the queries that the two sessions were running had completed in less time, the deadlock might have been avoided. Likewise, if one process had been started a small amount of time later, the deadlock might never have occurred. Because a deadlock is a transient locking conflict state, your applications should be coded to detect a 1205 error and then immediately reissue the transaction, because there is a very strong possibility that the process will not deadlock the second time the command is executed (see the sketch at the end of this lesson).

EXAM TIP
For the exam, you need to know the locks that can be acquired and how lock escalation can lead to either blocking or deadlocking issues. If a block or deadlock occurs, you also need to know how to resolve the problem.

Quick Check
1. What are the most common lock modes and types that are available?
2. How does a deadlock occur?

Quick Check Answers
1. The three most common locking modes are shared, exclusive, and update. The three lock types are row, page, and table.
2. A deadlock requires at least two processes that are both modifying data. Each process acquires an exclusive lock on a resource and then attempts to acquire a shared lock on the resource exclusively locked by the other process.

PRACTICE: Troubleshooting a Deadlock
In this practice, you configure a trace to capture a deadlock graph.
1. In SQL Server Profiler, select File, New Trace, and connect to your instance.
2. Specify a trace name, template, and options to save to a file, as shown in the following graphic.
3. Click the Events Selection tab.
4. Select the check box for the Deadlock graph event within the Locks category, as shown here.
5. Click Run.
6. Open two query windows and change the context to the AdventureWorks database.
7. In query window 1, execute the following code:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
GO
BEGIN TRANSACTION
UPDATE Production.Product
SET ReorderPoint = 600
WHERE ProductID = 316

8. In query window 2, execute the following code:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
GO
BEGIN TRANSACTION
UPDATE Production.ProductInventory
SET Quantity = 532
WHERE ProductID = 316
    AND LocationID = 5

SELECT Name, ReorderPoint
FROM Production.Product
WHERE ProductID = 316

9. Switch back to query window 1 and execute the following code, ensuring that you do not issue a commit transaction:

SELECT ProductID, LocationID, Shelf, Bin, Quantity, ModifiedDate
FROM Production.ProductInventory
WHERE ProductID = 316
    AND LocationID = 5

10. Observe the results in Profiler, shown here.
Lesson Summary
- The Lock Manager is responsible for managing the locks that SQL Server uses to maintain data consistency while allowing multiple users to manipulate data concurrently.
- When an exclusive lock is acquired, no other process is allowed to acquire a shared lock for reading or an exclusive lock for modification until the exclusive lock has been released. If the process is running in the READ UNCOMMITTED isolation level, read operations ignore exclusive locks.
- You can use the KILL command to terminate a process.
- A deadlock is a transient state in which two processes acquire competing locks in such a way that neither process can complete. The Lock Manager throws a 1205 error and selects one of the processes as a deadlock victim.

Lesson Review
The following question is intended to reinforce key information presented in Lesson 6, "Resolving Blocking and Deadlocking Issues." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. Which of the following are used to locate blocked processes? (Choose all that apply.)
A. sys.dm_exec_sessions view
B. sys.dm_exec_requests view
C. sys.dm_os_waiting_tasks view
D. sp_who2 system stored procedure
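As promised in the Handling Deadlocks sidebar, here is a minimal sketch of the 1205 retry pattern. The dbo.Orders table and the UPDATE statement are hypothetical placeholders, and a production version would typically live in application code with a capped retry count.

-- A minimal sketch: retry a transaction when it is chosen as a deadlock victim.
DECLARE @retry INT;
SET @retry = 3;
WHILE @retry > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderID = 42;
        COMMIT TRANSACTION;
        SET @retry = 0;                  -- success, stop looping
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        IF ERROR_NUMBER() = 1205 AND @retry > 1
            SET @retry = @retry - 1;     -- deadlock victim: try again
        ELSE
        BEGIN
            SET @retry = 0;
            -- Report the original failure; THROW does not exist in SQL Server 2008.
            DECLARE @msg NVARCHAR(2048);
            SET @msg = ERROR_MESSAGE();
            RAISERROR(@msg, 16, 1);
        END
    END CATCH
END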
Chapter Review
To practice and reinforce the skills you learned in this chapter further, you can perform the following tasks:
- Review the chapter summary.
- Review the list of key terms introduced in this chapter.
- Complete the case scenario. This scenario sets up a real-world situation involving the topics in this chapter and asks you to create a solution.
- Complete the suggested practices.
- Take a practice test.

Chapter Summary
- SQL Server Profiler provides an interface to the SQL Trace API, which exposes events that occur within the database engine so that you can capture information about the current operational state of an instance.
- System Monitor allows you to capture performance counters that can be correlated to SQL Trace output within Profiler, providing hardware state context to events that have been captured.
- Failures can occur at many levels: hardware, service accounts, and configuration. The most common causes of a service not being able to start are related to permissions. The most common cause of failures for active databases is running out of disk space.
- Deadlocks are a transient issue of competing blocks that should be trapped by an application, which then resubmits the command to be executed.

Key Terms
Do you know what these key terms mean? You can check your answers by looking up the terms in the glossary at the end of the book.
- Counter log
- Deadlock
- Isolation level
- Lock escalation
- SQL Trace
- Trace Event

Case Scenario
In the following case scenario, you apply what you've learned in this chapter. You can find answers to these questions in the "Answers" section at the end of this book.
Case Scenario: Designing an Automation Strategy for Coho Vineyard

BACKGROUND
Company Overview
Coho Vineyard was founded in 1947 as a local, family-run winery. Due to the award-winning wines it has produced over the last several decades, Coho Vineyard has experienced significant growth. To continue expanding, several existing wineries were acquired over the years. Today, the company owns 16 wineries; 9 wineries are in Washington, Oregon, and California, and the remaining 7 wineries are located in Wisconsin and Michigan. The wineries employ 532 people, 162 of whom work in the central office that houses servers critical to the business. The company has 122 salespeople who travel around the world and need access to up-to-date inventory availability.

Planned Changes
Until now, each of the 16 wineries owned by Coho Vineyard has run a separate Web site locally on the premises. Coho Vineyard wants to consolidate the Web presence of these wineries so that Web visitors can purchase products from all 16 wineries from a single online store. All data associated with this Web site will be stored in databases in the central office. To meet the needs of the salespeople until the consolidation project is completed, inventory data at each winery is sent to the central office at the end of each day. Merge replication has been implemented to allow salespeople to maintain local copies of customer, inventory, and order data.

EXISTING DATA ENVIRONMENT
Databases
Each winery presently maintains its own database to store all business information. At the end of each month, this information is brought to the central office and transferred into the databases shown in Table 12-5.

TABLE 12-5 Coho Vineyard Databases

DATABASE      SIZE
Customer      180 megabytes (MB)
Accounting    500 MB
HR            100 MB
Inventory     250 MB
Promotions    80 MB

After the database consolidation project is complete, a new database named Order will serve as a data store for the new Web store. As part of their daily work, employees also will connect periodically to the Order database using a new in-house Web application. The HR database contains sensitive data and is protected using Transparent Data Encryption (TDE). In addition, data in the Salary table is encrypted using a certificate.
Database Servers
A single server named DB1 contains all the databases at the central office. DB1 is running SQL Server 2008 Enterprise on Windows Server 2003 Enterprise. The chief technology officer (CTO) is considering buying a new machine to replace DB1 because users are complaining about performance issues on a sporadic basis. In addition, several of the wineries have been reporting device activation errors and even a few blue screens on the servers running their local SQL Server instances.

Business Requirements
You need to design an archiving solution for the Customer and Order databases. Your archival strategy should allow the Customer data to be saved for six years.
To prepare the Order database for archiving procedures, you create a partitioned table named Order.Sales. Order.Sales includes two partitions. Partition 1 includes sales activity for the current month. Partition 2 is used to store sales activity for the previous month. Orders placed before the previous month should be moved to another partitioned table named Order.Archive. Partition 1 of Order.Archive includes all archived data. Partition 2 remains empty.
A process needs to be created to load the inventory data from each of the 16 wineries by 4 A.M. daily.
Four large customers submit orders using the Coho Vineyard Extensible Markup Language (XML) schema for Electronic Data Interchange (EDI) transactions. The EDI files arrive by 5 P.M. and need to be parsed and loaded into the Customer, Accounting, and Inventory databases, which each contain tables relevant to placing an order. The EDI import routine is currently a single-threaded C++ application that takes between three and six hours to process the files. You need to finish the EDI process by 5:30 P.M. to meet your Service Level Agreement (SLA) with the customers. After the consolidation project has finished, the EDI routine loads all data into the new Order database.
There have been reports of massive contention on the central SQL Server while data is being imported during the nightly consolidation process. You need to reduce or eliminate the contention.
You need to back up all databases at all locations. You can lose a maximum of five minutes of data under a worst-case scenario. The Customer, Account, Inventory, Promotions, and Order databases can be off-line for a maximum of 20 minutes in the event of a disaster. Data older than six months in the Customer and Order databases can be off-line for up to 12 hours in the event of a disaster.

Answer the following questions:
1. How do you determine the cause of performance issues?
2. How can you reduce or eliminate the blocking problems during the nightly consolidation run?
3. How do you troubleshoot the errors that are occurring at the wineries?
Suggested Practices
To help you master the exam objectives presented in this chapter, complete the following tasks.

Creating a Trace Using SQL Server Profiler to Diagnose Performance and Deadlock Issues
- Practice 1: Create a trace to capture deadlock graphs and set the trace to start automatically when SQL Server starts.
- Practice 2: Create a trace to capture query performance statistics that can be used to produce a comparison baseline for your instance.

Create a Counter Log Using System Monitor to Diagnose Performance, Deadlock, and System Issues
- Practice 1: Create a counter log that captures hardware and SQL Server counters that can be used to produce a performance baseline for the machine.
- Practice 2: Create a counter log to capture events that allow you to diagnose disk space and hardware errors.

Take a Practice Test
The practice tests on this book's companion CD offer many options. For example, you can test yourself on just one exam objective, or you can test yourself on all the 70-432 certification exam content. You can set up the test so that it closely simulates the experience of taking a certification exam, or you can set it up in study mode so that you can look at the correct answers and explanations after you answer each question.

MORE INFO: PRACTICE TESTS
For details about all the practice test options available, see the section entitled "How to Use the Practice Tests" in the Introduction to this book.
CHAPTER 13
Optimizing Performance
Entire books, hundreds of webcasts, thousands of conference sessions, hundreds of seminars, dozens of training classes, and tens of thousands of pages spread across hundreds of Web sites have been devoted to helping you optimize Microsoft SQL Server performance. Just about all of these resources assume that you already know where the performance problems are and provide recipes for fixing performance issues that you have already identified. Anyone handed the task of optimizing SQL Server performance knows that the biggest challenge is finding the performance issue in the first place. Chapter 12 already covered two tools—SQL Server Profiler and System Monitor—which are invaluable in optimizing performance. In this chapter, you learn about the rest of the tools available within SQL Server 2008 that enable you to find performance bottlenecks.

Exam objectives in this chapter:
- Implement Resource Governor.
- Use the Database Engine Tuning Advisor.
- Collect performance data by using Dynamic Management Views (DMVs).
- Use Performance Studio.

Lessons in this chapter:
- Lesson 1: Using the Database Engine Tuning Advisor
- Lesson 2: Working with Resource Governor
- Lesson 3: Using Dynamic Management Views and Functions
- Lesson 4: Working with the Performance Data Warehouse

Before You Begin
To complete the lessons in this chapter, you must have:
- SQL Server 2008 installed
- The AdventureWorks database installed within the instance
REAL WORLD
Michael Hotek

I frequently hear the claim that performance tuning is an "art." People who say that are usually trying to convince you that performance tuning is an "art" so that they can sell you a piece of software or consulting services to fix your problems. After all, we're all taught that it takes special talents, which only a few people have, to be an "artist." Taking a blank piece of cloth, paint, and a brush and turning out a painting—that is art. Taking a hammer and chisel to a hunk of rock and producing a statue is art. Performance tuning has about as much in common with art as a piece of fruit has in common with a car.
SQL Server is a bunch of instructions executed by a computer. The code within SQL Server defines the output that the computer produces based on the input received. Due to the simple fact that every computer system is based on binary math, one input produces exactly one output. The same input always produces exactly the same output. The code also limits the range of possible inputs that are valid. So long as your requests are within the defined limits of the computer program, you receive an answer.
The task of performance tuning is simply a matter of knowledge. The better you understand the rules and structures within the computer code that makes up SQL Server, the better you are able to construct requests such that the fewest possible resources are used. Performance tuning only has one real principle, which is rooted in mathematics: the shortest path between two points is a straight line. If your code reads more data than is necessary, your application runs more slowly than if you read only the data that is needed. If your code makes two passes through the data before returning a result, it runs more slowly than code that makes only one pass through the data.
Your challenge in performance tuning is to first find the code that you are telling SQL Server to execute that doesn't take a straight-line, single pass through the data or manipulates more data than is necessary. After you find it, you then apply your knowledge of the way SQL Server works to rewrite the request so that it goes in a straight line, manipulating the least amount of data possible, and takes only a single pass through the data to do so.
Lesson 1: Using the Database Engine Tuning Advisor
The Database Engine Tuning Advisor (DTA) is designed to evaluate your queries against the rules in the Query Optimizer to make suggestions that can improve performance. In this lesson, you learn how to build a workload file and then use DTA to analyze the query workload to determine how you might be able to improve performance.

After this lesson, you will be able to:
- Configure DTA to analyze a workload
- Save DTA recommendations
Estimated lesson time: 20 minutes

Database Engine Tuning Advisor
DTA works in conjunction with SQL Trace output. First, a trace is captured that contains the queries that you want DTA to analyze. The trace output is read and evaluated by DTA against a database. The recommendations that DTA can make are:
- Adding indexes
- Dropping indexes
- Partitioning tables
- Storage-aligning tables

The source of a DTA workload can be a trace file, a Transact-SQL (T-SQL) script, or a table that contains T-SQL commands. Although Profiler is capable of capturing a wide range of events, the only events that DTA is concerned with are:
- RPC:Starting
- RPC:Completed
- SQL:BatchStarting
- SQL:BatchCompleted

An analysis is accomplished in four steps:
1. Generate a workload for analysis.
2. Start DTA and connect to a server running SQL Server that contains a database to analyze the workload against.
3. Select the workload to use.
4. Specify tuning options.

Let's take a look at the DTA analysis steps and options that you can specify. Start DTA so that you can configure an analysis session, as shown in Figure 13-1.
FIGURE 13-1 Creating a DTA analysis session

Analysis within DTA is performed in a session. Each session must have a name and is saved so that you can review the results at a later date. You should give each session a unique name that helps you remember what the analysis was for, as well as when the analysis was executed. After specifying the session name, you need to select the workload options. The most common way of performing an analysis is with a file that either contains the output of a trace or contains one or more T-SQL commands.

BEST PRACTICES: AUTOMATING ANALYSIS
DTA is the graphical utility with which you interact. You can also use the command-line utility, dta.exe. You can configure a trace using code, which can be executed from a SQL Server Agent job. The job can import a trace file into a table once the trace is complete. Because DTA can use a table as a workload source, you can set up a job step to start a DTA analysis run against the trace data that you just imported. Instead of spending your time clicking through GUIs, you can leave the analysis to the computer.
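As a sketch of what an automated run might look like, the following command line tunes a workload file against a database. The server, database, session, and file names are hypothetical, and you should confirm the switches against dta -? on your build.

REM A minimal sketch; all names and paths are hypothetical.
REM -E uses Windows authentication, -D names the database to tune,
REM -if names the workload file, -s names the session,
REM -of writes the recommendation script, -A caps tuning time in minutes.
dta -S MyServer -E -D AdventureWorksTest -if C:\Workloads\workload.sql -s NightlyTuning -of C:\Workloads\recommendations.sql -A 30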
In the Workload section, you select the database to be used for the workload analysis. The database selected for workload analysis is used as the basis for any tuning recommendations. The bottom section allows you to select the databases and tables that you want to tune. Queries that contain objects that are not selected for tuning are ignored by DTA during analysis. After you have specified all the general options, click the Tuning Options tab, as shown in Figure 13-2.

FIGURE 13-2 Specifying tuning options

There are four groups of tuning options:
- Time limitations and online actions
- Existing structures in the database
- Partitioning options
- Whether to retain existing structures in the database

If you limit the tuning time, DTA analyzes as many of the queries in the workload as possible within the required time frame. Any queries still remaining in the workload when the tuning time expires are not analyzed. You can also use the Advanced Tuning Options dialog box to limit the space consumed by recommendations, as well as specify whether recommended changes need to be implemented online or off-line, as shown in Figure 13-3.
FIGURE 13-3 Advanced tuning options

DTA makes index and indexed view recommendations based on your settings in the Physical Design Structures (PDS) To Use In Database section. The most common setting is to recommend indexes only, which includes both clustered and nonclustered indexes. The Evaluate Utilization Of Existing PDS Only option can be used to locate indexes and indexed views that can be removed because they are not being used. If you specify a partitioning strategy, all recommendations are suggested using either a full partitioning or an aligned partitioning strategy. The options in the Physical Design Structures (PDS) To Keep In Database section enable you to define whether recommendations have to consider existing index and partitioning structures in the database or whether existing structures can be removed as part of the recommendations.

CAUTION: DTA PERFORMANCE IMPACT
DTA analyzes the cost of a specified query against each possible recommendation. Query cost is generated by the Query Optimizer based on distribution statistics. To receive an accurate query cost, DTA generates statistics in the database being used for workload analysis before submitting a request to the optimizer. The creation and destruction of statistics by DTA can place a very heavy load on the database being analyzed. Therefore, you must be very careful if you are running DTA against a production database, and in almost all cases, you want to use DTA against a test system instead.

After you have specified your desired tuning options, you can start the analysis by clicking Start Analysis on the toolbar. At the completion of an analysis run, DTA presents recommendations, complete with the command(s) necessary to implement each recommendation, as shown in Figure 13-4.
FIGURE 13-4 Tuning recommendations

You can also review a variety of reports related to the queries tuned that can tell you the following:
- Estimated percentage improvement
- Frequency of each query within the workload
- Query cost statistics
- Detailed report of current indexes in the analyzed database

EXAM TIP
You need to know how each of the tuning options affects the recommendations that DTA makes.

Quick Check
- What are the valid input sources for DTA to analyze?

Quick Check Answer
- DTA can analyze queries and stored procedures that are stored in either a file or a table. The most common tuning source for DTA is a trace output file.
PRACTICE Analyzing a Query Workload

In this practice, you build a workload file that can be used by DTA to make recommendations to improve performance.

1. If it doesn't already exist, create a new database named AdventureWorksTest.
2. Execute the following command to generate a testing table:

USE AdventureWorksTest
GO
CREATE SCHEMA Person AUTHORIZATION dbo
GO
SELECT *
INTO AdventureWorksTest.Person.Address
FROM AdventureWorks.Person.Address

3. Save the following code to a file:

USE AdventureWorksTest
GO
SELECT AddressLine1, AddressLine2, City, PostalCode
FROM Person.Address
WHERE City = 'Dallas'
GO
SELECT AddressLine1, AddressLine2, City, PostalCode
FROM Person.Address
WHERE City LIKE 'S%'
GO
SELECT AddressLine1, AddressLine2, City, PostalCode
FROM Person.Address
WHERE PostalCode = '75201'
GO

4. Start DTA and connect to the instance containing your AdventureWorksTest database.
5. Give your tuning session a name.
6. Select the file that you created in step 3 and specify the AdventureWorksTest database for the workload analysis.
7. Select the AdventureWorksTest database and the Person.Address table to tune.
8. Click the Tuning Options tab.
9. In the Physical Design Structures (PDS) To Use In Database section, select the Indexes option if necessary.
10. In the Partitioning Strategy To Employ section, select the No Partitioning option.
11. In the Physical Design Structures (PDS) To Keep In Database section, select the Keep All Existing PDS option if necessary.
12. Start the analysis.
13. Review the recommendations, along with each of the analysis reports available.
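For this particular workload, the recommendations in step 13 typically include covering nonclustered indexes on the filtered columns. The statements below are a hypothetical illustration of that shape; DTA generates its own index names and may choose a different column list, so treat this as a sketch rather than the exact output.

-- Hypothetical recommendations for the practice workload above.
USE AdventureWorksTest
GO
CREATE NONCLUSTERED INDEX IX_Address_City
ON Person.Address (City)
INCLUDE (AddressLine1, AddressLine2, PostalCode)
GO
-- A similar index keyed on PostalCode would cover the third query.
CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City)
GO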
Lesson Summary
- DTA is used to analyze a query workload against a database and to make recommendations on structures to create or drop that might improve performance.
- You can use either a file or a table as the workload source.
- DTA creates statistics in the analysis database and then submits a request to the Query Optimizer to evaluate the query cost and determine whether an improvement has been made.

Lesson Review
The following question is intended to reinforce key information presented in Lesson 1, "Using the Database Engine Tuning Advisor." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. What types of workloads can DTA use for analysis? (Choose all that apply.)
A. A T-SQL script
B. A trace file containing Extensible Markup Language (XML) showplans
C. A trace file containing RPC:Completed events
D. A trace file containing SP:StmtCompleted events
Lesson 2: Working with Resource Governor

Resource Governor allows you to limit the CPU and memory allocated to, or used by, a specific connection or group of users. In this lesson, you learn how to configure Resource Governor to maximize resource allocation based on business workload priorities.

After this lesson, you will be able to:
- Create a resource pool
- Create a workload group
- Create a classifier function
- Evaluate resource utilization of a resource pool
- Evaluate resource utilization of a workload group

Estimated lesson time: 20 minutes

Resource Governor

Resource Governor works with three components:
- Resource pools
- Workload groups
- Classification functions

When Resource Governor is activated, processing within SQL Server adheres to the process shown in Figure 13-5.

NOTE: CONNECTION CLASSIFICATION
Classification occurs at the time a connection is created. Therefore, the only way you can limit resources is based on properties of the connection. You cannot limit the resource consumption of individual queries or even types of queries.

The resources that can be managed by Resource Governor are CPU and memory. Although you can limit the resources made available to a workload group, any request that is currently executing is not limited, and you cannot place limitations on internal SQL Server operations.

A workload group is just a name that is associated with a user session. A workload group doesn't define a query workload, but rather identifies the login that is executing the queries. Workload groups are simply labels that you associate with a connection when it is created so that Resource Governor can assign the connection to the appropriate resource pool.
FIGURE 13-5 Resource Governor processing: connections are classified into workload groups, and each workload group is mapped to a resource pool

A classifier function is a function that you create in the master database. Only one classifier function can be active for Resource Governor at a time. The function cannot have any input parameters and is required to return a scalar value. The value returned is the name of the workload group into which the session should be classified. The function can contain any code that is valid for a function, but you should minimize the amount of code in any classification function. Because the classifier function executes after authentication but before a connection handle is returned to the user's application, any performance issue in your classification function affects connection times to the server running SQL Server and could potentially lead to connection time-out issues in an application.

NOTE: DEFAULT RESOURCE POOLS
If a classifier function is not associated with Resource Governor, or the classifier function does not exist, returns NULL, or returns a nonexistent workload group, the user session is associated with the default resource pool.

A connection can belong to only a single workload group, but multiple connections can be classified into the same workload group. Each workload group is assigned to a single resource pool, but multiple workload groups can be assigned to the same resource pool.

A resource pool defines the minimum and maximum CPU, memory, or both allocated to the pool. The minimum value designates the lowest guaranteed amount of a resource
that is available to the resource pool. Each resource pool can be configured with a minimum value, but the total of the minimum values across all resource pools cannot exceed 100. The maximum value places an upper bound on the amount of a resource that can be allocated to the workload groups associated with the resource pool.

NOTE: RESOURCE ALLOCATION
All connections running within a resource pool are treated with equal weight, and SQL Server balances the resources available to the resource pool across all requests currently executing within the pool.

Although a classification function can be created at any time, you should create the classification function as the last step in a Resource Governor implementation. When associated with Resource Governor, the classification function is executed for every new connection to your instance. You implement Resource Governor using the following steps:

1. Enable Resource Governor.
2. Create one or more resource pools.
3. Create one or more workload groups.
4. Associate each workload group with a resource pool.
5. Create and test a classifier function.
6. Associate the classifier function with Resource Governor.

EXAM TIP
You need to know the resources that Resource Governor can control, as well as how to test and troubleshoot a classification function.

You can use the following views to return information about the Resource Governor configuration:

SELECT * FROM sys.resource_governor_resource_pools
SELECT * FROM sys.resource_governor_workload_groups
SELECT * FROM sys.resource_governor_configuration
GO

When Resource Governor is active, the group_id column of sys.dm_exec_sessions contains the ID of the workload group to which each session is assigned.

Quick Check
1. What are the objects that are used for a Resource Governor implementation?
2. What resources can Resource Governor control?
Quick Check Answers
1. Resource Governor relies on a user-defined classifier function in the master database to assign a connection to a workload group. Each workload group is assigned to a resource pool that manages CPU and memory resources.
2. Resource Governor can be used to manage CPU and memory resources.

PRACTICE Implementing Resource Governor

In this practice, you implement Resource Governor and test the effect on user sessions.

PRACTICE 1 Creating a Resource Pool

In this practice, you create three resource pools that are used to guarantee CPU availability for groups of users.

1. In SSMS, connect to your instance in Object Explorer.
2. Expand Management, Resource Governor, right-click Resource Pools, and select New Resource Pool.
3. Select the Enable Resource Governor check box.
4. Create a pool named Executives and set the minimum CPU to 20%.
5. Create a pool named Customers and set the minimum CPU to 50%.
6. Create a pool named AdHocReports and set the minimum CPU to 0%.
7. Click OK.
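If you prefer scripts to the GUI, the pools from Practice 1 can also be created with T-SQL. The following is a minimal sketch of the equivalent DDL; only the minimum CPU percentages come from the practice, and everything else is left at its defaults.

-- Create the three pools from Practice 1.
CREATE RESOURCE POOL Executives WITH (MIN_CPU_PERCENT = 20);
CREATE RESOURCE POOL Customers WITH (MIN_CPU_PERCENT = 50);
CREATE RESOURCE POOL AdHocReports WITH (MIN_CPU_PERCENT = 0);
GO
-- Workload groups (Practice 2) follow the same pattern, for example:
-- CREATE WORKLOAD GROUP ExecutiveGroup USING Executives;

-- Nothing takes effect (and Resource Governor is not enabled) until
-- you reconfigure.
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO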
PRACTICE 2 Assigning a Workload Group

In this practice, you create workload groups that are used to segment user requests into the appropriate resource pools.

1. Expand the AdHocReports node in Object Explorer, right-click Workload Groups, and select New Workload Group.
2. Create an AdHocReportGroup for the AdHocReports resource pool, as shown here.
3. Create a CustomerGroup for the Customers resource pool.
4. Create an ExecutiveGroup for the Executives resource pool.
5. Click OK.

PRACTICE 3 Creating a Classifier Function

In this practice, you create a classifier function and then test the workload classification and assignment to a resource pool.

1. Execute the following code in the master database:

CREATE FUNCTION dbo.fn_ResourceGovernorClassifier()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @group sysname
    --Workload group name is case sensitive,
    --regardless of server setting
    IF SUSER_SNAME() = 'Executive'
        SET @group = 'ExecutiveGroup'
    ELSE IF SUSER_SNAME() = 'Customer'
        SET @group = 'CustomerGroup'
    ELSE IF SUSER_SNAME() = 'AdHocReport'
        SET @group = 'AdHocReportGroup'
    ELSE
        SET @group = 'default'
    RETURN @group
END
GO

2. Execute the following code to associate the classifier function with Resource Governor:

ALTER RESOURCE GOVERNOR
WITH (CLASSIFIER_FUNCTION = dbo.fn_ResourceGovernorClassifier)
GO

3. Execute the following code to make the classifier function active:

ALTER RESOURCE GOVERNOR RECONFIGURE
GO

PRACTICE 4 Testing Resource Governor

In this practice, you build a simple test script to execute and monitor the operation of the resource pools using System Monitor.

1. Open a new query window, log in as an administrator, and execute the following code to create three logins for testing:

CREATE LOGIN Executive WITH PASSWORD = '<InsertStrongPasswordHere>'
GO
CREATE LOGIN Customer WITH PASSWORD = '<InsertStrongPasswordHere>'
GO
CREATE LOGIN AdHocReport WITH PASSWORD = '<InsertStrongPasswordHere>'
GO

2. Open a new query window and log in using the Customer login.
3. Open a new query window and log in using the Executive login.
4. Open a new query window and log in using the AdHocReport login.
5. Execute the following query within the query window where you are connected as an administrator to verify the workload group assigned to each connection:

SELECT b.name WorkloadGroup, a.login_name, a.session_id
FROM sys.dm_exec_sessions a
INNER JOIN sys.dm_resource_governor_workload_groups b
    ON a.group_id = b.group_id
WHERE b.name != 'internal'
6. Enter the following query in each of the query windows:

SET NOCOUNT ON
DECLARE @var INT
SET @var = 1
WHILE @var < 10000000
BEGIN
    SELECT @@VERSION
    SET @var = @var + 1
END
GO

7. Because you don't care about the actual results returned by the query, for the Customer, Executive, and AdHocReport query windows, select Query, Query Options. In the left pane of the Query Options dialog box, select the Grid node and then select the Discard Results After Execution check box.
8. Start System Monitor, remove all the default counters, and add the SQLServer:Workload Group Stats:CPU Usage % counter for the AdHocReportGroup, CustomerGroup, and ExecutiveGroup instances, as shown here.
9. Start a second instance of System Monitor, remove all the default counters, and add the SQLServer:Resource Pool Stats:CPU Usage % counter for the AdHocReports, Customers, and Executives instances, as shown on the following page.
10. Execute the script in the Customer query window and observe the graphs in System Monitor.
11. Execute the script in the Executive query window and observe the graphs in System Monitor.
12. Execute the script in the AdHocReport query window and observe the graphs in System Monitor, as shown here.
13. Switch to the Customer, Executive, and AdHocReport query windows and stop the query execution.
14. Adjust the resource limits for the pools by executing the following code:

ALTER RESOURCE POOL AdHocReports WITH (MAX_CPU_PERCENT = 5)
ALTER RESOURCE POOL AdHocReports WITH (MIN_CPU_PERCENT = 0)
ALTER RESOURCE POOL Executives WITH (MAX_CPU_PERCENT = 20)
ALTER RESOURCE POOL Executives WITH (MIN_CPU_PERCENT = 20)
ALTER RESOURCE POOL Customers WITH (MAX_CPU_PERCENT = 75)
ALTER RESOURCE POOL Customers WITH (MIN_CPU_PERCENT = 50)
ALTER RESOURCE GOVERNOR RECONFIGURE
GO

15. Execute the script in the Customer query window and observe the graphs in System Monitor.
16. Execute the script in the Executive query window and observe the graphs in System Monitor.
17. Execute the script in the AdHocReport query window and observe the graphs in System Monitor, as shown on the following page.
CAUTION: LIMITING RESOURCES
Even with Resource Governor enabled, SQL Server still seeks to maximize the resources available to all concurrently executing requests. If you set a maximum limit for a resource pool, connections assigned to the resource pool can use more resources than the configured maximum: if the other executing sessions do not need all the resources, any amount of free resource can be used by any session, even if that causes the session to exceed the resource limits of its assigned resource pool.
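Beyond System Monitor, you can verify the configuration and watch consumption through the Resource Governor catalog and dynamic views. The following is a minimal sketch; the column names come from the SQL Server 2008 views, but it is worth confirming them against Books Online on your build.

-- Confirm that the classifier function is bound and Resource Governor
-- is enabled (run in master so OBJECT_NAME resolves the function).
SELECT OBJECT_NAME(classifier_function_id) AS classifier_function,
       is_enabled
FROM sys.resource_governor_configuration;

-- Cumulative per-group activity since statistics_start_time.
SELECT name,
       total_request_count,
       total_cpu_usage_ms,
       statistics_start_time
FROM sys.dm_resource_governor_workload_groups
WHERE name <> 'internal';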
Lesson Summary
- Resource Governor is used to limit the CPU, memory, or both allocated to one or more connections.
- Connections are assigned to a workload group using a classifier function. A workload group is assigned to a resource pool.
- Resource pools are configured to manage CPU and memory resources.

Lesson Review
The following question is intended to reinforce key information presented in Lesson 2, "Working with Resource Governor." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. You are the database administrator at Coho Vineyards. Following the consolidation of all the wineries' inventory, customer, and order databases, the marketing group wants to be able to run ad hoc queries for analysis purposes. Users are allowed to execute any query that they can construct, regardless of the impact it might have on the performance of the database. Unfortunately, the same databases are being used to create and process customer orders. Management does not want to restrict the queries that marketing can execute, but it wants you to ensure that customer orders can be created and processed in a timely fashion. What can be used to limit the impact of marketing queries and ensure that customer orders are processed?
A. Configure the max degree of parallelism option.
B. Implement Resource Governor.
C. Configure the query governor cost threshold.
D. Limit the memory utilization for marketing users.
Lesson 3: Using Dynamic Management Views and Functions

Dynamic management views (DMVs) and dynamic management functions (DMFs) provide the instrumentation infrastructure that allows database administrators to retrieve system information as well as monitor, diagnose, and fix problems. In this lesson, you learn about the basic DMVs and DMFs available for optimizing performance.

NOTE: TERMINOLOGY CONVENTIONS
For simplicity, the entire set of instrumentation code that is available within SQL Server is referred to collectively as DMVs, regardless of whether you are using a view or a function.

After this lesson, you will be able to:
- Understand the categories of DMVs and DMFs
- Identify important performance and monitoring DMVs and DMFs

Estimated lesson time: 20 minutes

DMV Categories

DMVs are all stored in the sys schema and can be grouped into several dozen broad categories. Because DMVs use a standard naming scheme and the names used are very descriptive, separating the categories is reasonably straightforward. Table 13-1 lists the most important DMV categories that are used for performance tuning.

TABLE 13-1 DMV Categories

DMV PREFIX    GENERAL PURPOSE
dm_db_*       General database space and index utilization
dm_exec_*     Statistics for queries that are executing, as well as queries that have completed and still have plans in the query cache
dm_io_*       Disk subsystem statistics
dm_os_*       Statistics related to the use of hardware resources
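Because the naming scheme is so regular, you can enumerate the full set of DMVs and DMFs on your instance directly from the catalog. A minimal sketch:

-- List every dynamic management view and function, sorted by name,
-- so the prefix-based categories in Table 13-1 are easy to spot.
SELECT name, type_desc
FROM sys.system_objects
WHERE name LIKE 'dm[_]%'
ORDER BY name;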
Database Statistics

The most common DMVs used to gather database statistics are:
- sys.dm_db_index_usage_stats
- sys.dm_db_index_operational_stats
- sys.dm_db_index_physical_stats
- sys.dm_db_missing_index_groups
- sys.dm_db_missing_index_group_stats
- sys.dm_db_missing_index_details

Indexes are created to improve the performance of specific queries. However, each index also carries maintenance overhead whenever data modifications are made. Having an insufficient number of indexes can cause performance problems, as can having indexes that are not used or are used very infrequently. The sys.dm_db_index_usage_stats view contains the number of times (and the last time) each index was used to satisfy a seek, scan, or lookup, as well as the number of times (and the last time) an update was performed against each index. If an index does not have any seeks, scans, or lookups, the index is not being used. If an index has not been used to satisfy a seek, scan, or lookup for a significant amount of time, SQL Server is no longer using the index.

The sys.dm_db_index_operational_stats function takes four optional parameters: database_id, object_id, index_id, and partition_id. This function returns locking, latching, and access statistics for each index that can help you determine how heavily an index is being used. This function also helps you diagnose contention issues due to locking and latching.

The sys.dm_db_index_physical_stats function takes five optional parameters: database_id, object_id, index_id, partition_id, and mode. The function returns size and fragmentation statistics for each index and should be the primary source for determining when an index needs to be defragmented.

When you submit a query to SQL Server, the query is parsed and optimized to determine the most efficient way to satisfy it. The execution plan that is generated is then used to execute the query. Even if a table does not have an index, SQL Server still keeps basic distribution statistics for each column in the table. Therefore, even in the absence of an index, the Optimizer can determine whether an index would have been beneficial in satisfying the query. If you have enabled the AUTO_CREATE_STATISTICS option and the Optimizer determines that it is beneficial to do so, statistics are automatically generated that can subsequently be used by queries for improved performance. When the Optimizer determines that an index would be beneficial but the index does not exist, the situation is referred to as an index miss. Although an index miss automatically generates statistics when automatic creation is enabled, an index would still be more efficient for satisfying the query.

One of the most interesting DMV categories available in SQL Server 2008 is the set of sys.dm_db_missing_index_* views. When an index miss is generated, SQL Server logs the details of the
index miss, which can then be viewed using the sys.dm_db_missing_index_* views. The sys.dm_db_missing_index_details view contains aggregate statistics on how many times an index miss was generated for a given index possibility, which you can use to evaluate whether you should create the index.

BEST PRACTICES: DETERMINING WHICH INDEXES TO CREATE
The missing index DMVs can list multiple permutations of a group of columns to use for index creation. Each unique combination of columns producing an index miss generates an entry. By applying some basic aggregations across the data in the missing index views, you can determine which indexes are most beneficial. Keep in mind that the cost of maintaining an index is not included in the aggregation. The query that you can use to calculate a "usefulness factor" is:

SELECT *
FROM (SELECT user_seeks * avg_total_user_cost *
             (avg_user_impact * 0.01) AS index_advantage, migs.*
      FROM sys.dm_db_missing_index_group_stats migs) AS migs_adv
INNER JOIN sys.dm_db_missing_index_groups AS mig
    ON migs_adv.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid
    ON mig.index_handle = mid.index_handle
ORDER BY migs_adv.index_advantage

If the index advantage reaches 10,000, you have an index that can provide a significant impact on query performance, but you need to balance the performance benefit against the additional maintenance overhead. If the index advantage exceeds 50,000, the benefit of creating the index far outweighs any maintenance required due to data manipulation activities.

Query Statistics

The sys.dm_exec_* DMVs return information related to connections to the instance, as well as query execution. The following DMVs return information about connections and actively executing requests:
- sys.dm_exec_connections
- sys.dm_exec_sessions
- sys.dm_exec_requests

The sys.dm_exec_connections view contains one row for each connection to the instance. Within this view, you can find out when the connection was made, along with connection properties and encryption settings. This view also tells you the total number of reads and writes for the connection, as well as the last time a read or write was executed.
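For example, a quick way to see that connection-level activity is to query the view directly. This is a minimal sketch using a handful of its columns:

-- When each connection was established and how much I/O it has done.
SELECT session_id,
       connect_time,
       num_reads,
       num_writes,
       last_read,
       last_write
FROM sys.dm_exec_connections;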
The sys.dm_exec_sessions view contains a row for each currently authenticated session. In addition to the login information, this DMV also tracks the current state of each possible query option and the current execution status. This DMV also returns the accumulated reads, writes, CPU, and query execution duration for the session.

The sys.dm_exec_requests view contains one row for each currently executing request in the instance. You can use the blocking_session_id column to diagnose contention issues. This DMV also contains the start time, elapsed time, estimated completion time, reads, writes, and CPU for the request. In addition, you can retrieve the database and command being executed, along with handles for the SQL statement and query plan associated with the request.

Each query that is executed has to be compiled, tokenized, and compared to the query cache. If a match is found, the optimizer uses the cached query plan to execute the query. If a match is not found, the query plan that is generated is written into the query cache. The sys.dm_exec_query_stats DMV contains detailed statistics on the performance and resources consumed for every query in the query cache. This DMV lists the last time the query was executed and how many times the query was executed, along with the minimum and maximum execution time, logical/physical reads/writes/CPU, and a handle to the query plan generated by the Optimizer.

SQL Server stores the query plan and text of each query executed in the query cache, identified by a unique value called the handle. The sys.dm_exec_sql_text function returns the text of the SQL statement associated with the handle that is passed in. The sys.dm_exec_query_plan function accepts a plan handle and returns the corresponding XML showplan. If you wanted to return the query and XML showplan for all currently executing queries, you could use the following statement:

SELECT *
FROM sys.dm_exec_requests
CROSS APPLY sys.dm_exec_query_plan(plan_handle)
CROSS APPLY sys.dm_exec_sql_text(sql_handle)

The following command could be used to return the SQL statement and XML showplan for every query that is cached in the query cache:

SELECT *
FROM sys.dm_exec_query_stats
CROSS APPLY sys.dm_exec_query_plan(plan_handle)
CROSS APPLY sys.dm_exec_sql_text(sql_handle)

Disk Subsystem Statistics

The sys.dm_io_virtual_file_stats function returns statistics about the reads and writes for every database file. This function returns the aggregate number of reads and writes, as well as the bytes read and written to each file, since the instance was started. You can also retrieve a piece of information called the IOStall for both reads and writes. When SQL Server has to wait for the disk subsystem to become available to satisfy either a read or a write operation, an IOStall occurs. The time for IOStalls, measured in milliseconds, is logged for each database file.

You use the information returned by the sys.dm_io_virtual_file_stats function to determine whether disk contention is contributing to performance issues. You can also use this view to
determine whether your disk input/output (I/O) is balanced across database files or whether you have created a disk hot spot.

The sys.dm_io_pending_io_requests DMV contains a row for each request that is waiting for the disk subsystem to complete an I/O request. On a busy system, you always find entries in this view. However, if you have a request that appears frequently or stays for a very long time, you probably have a disk bottleneck issue that needs to be dealt with.

Hardware Resources

One of the mistakes that many people make when attempting to track down performance issues is to think that a poorly performing query can be fixed only by adding indexes or rewriting code. A query could be running slowly due to inefficient code or a lack of indexes. However, a query also could be running slowly due to resource contention issues that cause the query to wait for a resource to become available.

SQL Server uses a cooperative processing model to satisfy requests. Each request that is executed is assigned to a User Mode Scheduler (UMS). Unless you have changed the default configuration through the affinity mask configuration option, SQL Server has one UMS per processor available to satisfy query requests. Because only a single command can execute on a processor at any time, the maximum number of requests that can execute concurrently is equal to the number of UMSs that SQL Server has running.

Any requests that exceed this number are added to a runnable queue in the order they were received. After a request has made it to the top of the runnable queue and a UMS becomes available, the request is swapped onto the running queue of the UMS and begins to execute. As soon as the process has to wait for a resource to be allocated, such as a lock to be acquired, disk I/O to become available, or memory to be allocated, the request is swapped off the processor and onto a waiting queue to make room for the next request in the runnable queue to start executing. The request remains on the waiting queue until the resource becomes available, and then the request is moved to the bottom of the runnable queue, where it must wait behind all other requests before being swapped back onto the UMS to continue executing. If there is contention for resources, a request could make multiple cycles between the runnable, running, and waiting queues before the query completes. By removing the processing bottlenecks, you can improve query performance.

When a request is sent to the waiting queue, SQL Server sets a value called the wait type that designates the type of resource the request is waiting on. As soon as a wait type is set, SQL Server starts a clock. When the resource becomes available, SQL Server stops the clock and records the amount of time that the request had to wait for the resource to become available, called the wait time. SQL Server also sets a clock when a request enters the runnable queue; this measurement, called the signal wait, records how long it takes a process to get to the top of the queue and begin executing.

The sys.dm_os_wait_stats DMV lists the aggregate amount of signal wait and wait time for each wait type. Signal wait and wait time are aggregate values since the last time the
statistics were cleared. Although most DMVs can be cleared only by restarting the instance, the wait time and signal wait time can be cleared by executing the following code:

DBCC SQLPERF(WAITSTATS,CLEAR)

MORE INFO: WAIT TYPES
Although wait statistics are an extremely valuable piece of information for diagnosing performance issues, in the almost 10 years since detailed information has been available, Microsoft still has not documented the wait types, all of which have extremely cryptic names. The best resource available for understanding wait types and how to resolve issues uncovered by wait types is Gert Draper's Web site (http://www.sqldev.net).

EXAM TIP
For the exam, you need to know the purpose of the main set of DMVs and how to use each of them to diagnose and troubleshoot performance issues.

Quick Check
1. What is the difference between sys.dm_db_index_operational_stats and sys.dm_db_index_physical_stats?
2. Which DMV can you use to retrieve execution statistics for each connection currently executing a command?

Quick Check Answers
1. sys.dm_db_index_physical_stats returns size and fragmentation statistics for each index, and sys.dm_db_index_operational_stats returns locking, latching, and access statistics for each index.
2. The sys.dm_exec_requests DMV returns one row for each currently executing command.

PRACTICE Evaluating Missing Indexes

In this practice, you use the DMVs to find and evaluate index misses.

1. Open a new query window and execute the following query against the AdventureWorksTest database:

SELECT City, PostalCode, AddressLine1
FROM Person.Address
WHERE City = 'Seattle'
GO
SELECT City, PostalCode, AddressLine1
FROM Person.Address
WHERE City = 'Seattle'
    AND AddressLine2 IS NOT NULL
GO
SELECT City, PostalCode, AddressLine1
FROM Person.Address
WHERE City LIKE 'D%'
GO

2. Execute the following index evaluation query to inspect the indexes suggested by the Optimizer:

SELECT *
FROM (SELECT user_seeks * avg_total_user_cost *
             (avg_user_impact * 0.01) AS index_advantage, migs.*
      FROM sys.dm_db_missing_index_group_stats migs) AS migs_adv
INNER JOIN sys.dm_db_missing_index_groups AS mig
    ON migs_adv.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid
    ON mig.index_handle = mid.index_handle
ORDER BY migs_adv.index_advantage

3. Execute the following code:

SELECT City, PostalCode, AddressLine1
FROM Person.Address
WHERE City LIKE 'Atlan%'
GO 100

4. Execute the following index evaluation query to see how the values change as the number of query executions increases:

SELECT *
FROM (SELECT user_seeks * avg_total_user_cost *
             (avg_user_impact * 0.01) AS index_advantage, migs.*
      FROM sys.dm_db_missing_index_group_stats migs) AS migs_adv
INNER JOIN sys.dm_db_missing_index_groups AS mig
    ON migs_adv.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid
    ON mig.index_handle = mid.index_handle
ORDER BY migs_adv.index_advantage

Lesson Summary
- The sys.dm_db_* DMVs provide general space and index utilization information.
- The sys.dm_exec_* DMVs are used to return information about currently executing queries, as well as queries that are still in the query cache. You can also use this set of DMVs to troubleshoot blocking issues, as well as view the last wait type assigned to a request.
- The sys.dm_io_* DMVs are used to evaluate disk subsystem performance and determine whether you have disk bottlenecks.
- The sys.dm_os_wait_stats DMV provides information about the internal handling of requests and whether a request is waiting on resources to become available.

Lesson Review
The following question is intended to reinforce key information presented in Lesson 3, "Using Dynamic Management Views and Functions." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.

1. Which DMV would you use to find indexes that are no longer being used?
A. sys.dm_db_index_operational_stats
B. sys.dm_db_index_physical_stats
C. sys.dm_db_index_usage_stats
D. sys.dm_db_missing_index_details
Lesson 4: Working with the Performance Data Warehouse

The Performance Data Warehouse, also referred to as Performance Studio, is a new feature in SQL Server Management Studio (SSMS) that allows you to configure and gather performance data for instances throughout your environment that can be used for later analysis. In this lesson, you learn about the components of the Performance Data Warehouse, as well as how to configure collection sets and data collection and how to analyze the results.

After this lesson, you will be able to:
- Create a Performance Data Warehouse
- Create collection items and collection sets
- Define a collection target
- Configure data collection
- Analyze the results of data collection

Estimated lesson time: 20 minutes

Performance Data Warehouse

The Performance Data Warehouse is based on a new feature in SQL Server 2008 called the Data Collector. The Data Collector is based on SQL Server Integration Services (SSIS) packages and SQL Server Agent jobs, as shown in Figure 13-6. Data collection for the Performance Data Warehouse is configured using one of the following collector types:
- T-SQL Query
- SQL Trace
- Performance Counter
- Query Activity

The T-SQL Query collector allows you to specify a SELECT statement to execute, as well as the database(s) to execute the query against. The results of the query are stored in a table within the Performance Data Warehouse whose name you define using the OutputTable parameter of the Data Collector definition. Because the Data Collector dynamically generates the table based on the results of the query defined, you must ensure all of the following:
- The result set does not contain columns named snapshot_time, snapshot_id, or database_name, because these are reserved for the Data Collector.
- All columns in the result set must have a name.
- Columns with an image, text, ntext, or XML data type cannot be included.
- Only a single result set is returned.
FIGURE 13-6 The Data Collector (collector types and data providers define collection items, which are grouped into collection sets that run against a collection target through SQL Server Agent jobs and schedules, with run-time audit and execution history)

The SQL Trace collector supports either the default trace or a user-defined trace. The results of the trace are written to a file, and the Data Collector uses the fn_trace_gettable function to extract the contents of the file as a result set to be loaded into the Performance Data Warehouse. Any setting that is valid for a trace can be defined for the Data Collector (for example, rollover files and filters). The results of the data collection are stored in the snapshots.trace_info and snapshots.trace_data tables in the Performance Data Warehouse.

The Performance Counter collector allows you to define any combination of objects, counters, and counter instances. The results of the data collection are stored in the snapshots.performance_counters table in the Performance Data Warehouse.

The Query Activity collector gathers information from sys.dm_exec_requests, sys.dm_exec_sessions, and sys.dm_exec_query_stats.

EXAM TIP
For the exam, you need to know the purpose of the Performance Data Warehouse, the components that data collection is based upon, and the information that can be collected.
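Custom collection sets are defined with stored procedures in msdb rather than through the UI. The sketch below shows the general shape; the procedure names and the syscollector_collector_types view exist in SQL Server 2008, but the schedule name shown and the collector-specific @parameters XML are assumptions you would need to confirm against Books Online.

USE msdb;
GO
-- Find the UID for the collector type you want to use.
SELECT name, collector_type_uid
FROM dbo.syscollector_collector_types;

-- Create a collection set; the schedule name is one of the shared
-- collector schedules and may differ on your instance.
DECLARE @set_id INT;
EXEC dbo.sp_syscollector_create_collection_set
    @name = 'Custom T-SQL Collection',
    @collection_mode = 1,              -- 1 = non-cached
    @days_until_expiration = 14,
    @schedule_name = 'CollectorSchedule_Every_15min',
    @collection_set_id = @set_id OUTPUT;

-- A collection item is then added with
-- dbo.sp_syscollector_create_collection_item, passing the collector
-- type UID and a collector-specific XML @parameters document that
-- defines the query and the OutputTable name.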
Quick Check
1. Which features is the Performance Data Warehouse based on?
2. What collector types are available in SQL Server 2008?

Quick Check Answers
1. The Performance Data Warehouse is built upon the Data Collector infrastructure. Data collection is based on SSIS packages and SQL Server Agent jobs.
2. SQL Server 2008 ships with the T-SQL Query, SQL Trace, Query Activity, and Performance Counter collector types.

PRACTICE Configuring the Performance Data Warehouse

In this practice, you configure the Performance Data Warehouse, set up the built-in collection sets, and analyze the data collection results.

PRACTICE 1 Configuring the Performance Warehouse

In this practice, you configure the Performance Data Warehouse.

1. Start SSMS, connect to your instance, expand the Management node, right-click Data Collection, and select Configure Management Data Warehouse.
2. Click Next, select Create Or Upgrade A Management Data Warehouse, as shown here, and click Next again.
3. Create a new database named PerfData with all the default settings. Click Next.
4. Select the login corresponding to your SQL Server service account (the one in this exercise is called SQL2008SBSDE) and the mdw_admin role, as shown here. Click Next.
5. Click Finish to create the structures within the PerfData database.
6. Review the objects that have been created in the PerfData database.

PRACTICE 2 Configuring Data Collection

In this practice, you configure data collection for the newly created Management Data Warehouse.

1. Right-click the Data Collection node and select Configure Management Data Warehouse.
2. Click Next, select Set Up Data Collection, as shown here, and click Next again.
3. Select the location of your PerfData database and leave the Cache directory blank. Click Next.
4. Click Finish.
5. Expand the System Data Collection Sets folder, right-click the Disk Usage collector, and select Properties. Review the settings for the data collection set, as shown here.

PRACTICE 3 Reviewing Results for a Collection Set

In this practice, you review the reports that are created by default for the system data collection sets.

1. Right-click the Data Collection node and select Reports, Management Data Warehouse, Disk Usage Summary.
2. Review the results, which should look as shown on the following page.
Lesson Summary
- The Data Collector is a new infrastructure component in SQL Server 2008 that is based on SSIS packages and SQL Server Agent jobs.
- You can define four different collector types: T-SQL Query, SQL Trace, Performance Counter, and Query Activity. The T-SQL Query collector is the most flexible, allowing you to specify the SELECT statement to execute as well as the databases to execute the query against.
- All the data gathered by the Data Collector, as well as the definitions of all the collection sets, is stored in the Performance Data Warehouse.

Lesson Review
The following question is intended to reinforce key information presented in Lesson 4, "Working with the Performance Data Warehouse." The question is also available on the companion CD if you prefer to review it in electronic form.

NOTE: ANSWERS
Answers to this question and an explanation of why each answer choice is correct or incorrect are located in the "Answers" section at the end of the book.
1. As part of a recent acquisition, Humongous Insurance now has SQL Server instances ranging from version 6.5 through 9.0. A variety of third-party products and custom code have been used in the past to manage capacity across the SQL Server environment. Your manager wants to consolidate everything into a single platform that can be used to perform capacity management tasks and evaluate performance against baselines. You need to implement a solution that has minimal cost and requires the least amount of effort to configure and maintain. What solution should you propose?
A. Install a SQL Server 2008 instance and implement policy-based management.
B. Install a SQL Server 2008 instance and implement a Performance Data Warehouse.
C. Install a SQL Server 2008 instance and rewrite everything using SSIS.
D. Implement Microsoft System Center Operations Manager 2007.
Chapter Review

To further practice and reinforce the skills you learned in this chapter, you can perform the following tasks:
- Review the chapter summary.
- Review the list of key terms introduced in this chapter.
- Complete the case scenario. The scenario sets up a real-world situation involving the topics in this chapter and asks you to create a solution.
- Complete the suggested practices.
- Take a practice test.

Chapter Summary
- Resource Governor allows you to classify connections into workload groups that are assigned to resource pools, which can limit the CPU and memory resources available to the connections.
- DTA evaluates one or more SQL statements and makes recommendations for indexes that could be created or dropped to improve performance.
- DMVs are a collection of views and functions that ship with SQL Server and expose system and diagnostic data in a format that is easy to use and manipulate.
- The Performance Data Warehouse uses the Data Collector infrastructure to aggregate information that can be used for capacity management and for analyzing performance trends.

Key Terms
Do you know what these key terms mean? You can check your answers by looking up the terms in the glossary at the end of the book.
- Classification function
- Collection item
- Collection set
- Collection target
- Data collector
- Data provider
- Dynamic Management Function (DMF)
- Dynamic Management View (DMV)
- Resource pool
- Workload file
- Workload group
Case Scenario

In the following case scenario, you apply what you've learned in this chapter. You can find answers to these questions in the "Answers" section at the end of this book.

Case Scenario: Designing an Automation Strategy for Coho Vineyard

BACKGROUND

Company Overview
Coho Vineyard was founded in 1947 as a local, family-run winery. Due to the award-winning wines it has produced over the last several decades, Coho Vineyard has experienced significant growth. To continue expanding, several existing wineries were acquired over the years. Today, the company owns 16 wineries; 9 wineries are in Washington, Oregon, and California, and the remaining 7 wineries are located in Wisconsin and Michigan. The wineries employ 532 people, 162 of whom work in the central office that houses servers critical to the business. The company has 122 salespeople who travel around the world and need access to up-to-date inventory availability.

Planned Changes
Until now, each of the 16 wineries owned by Coho Vineyard has run a separate Web site locally on the premises. Coho Vineyard wants to consolidate the Web presence of these wineries so that Web visitors can purchase products from all 16 wineries from a single online store. All data associated with this Web site can be stored in databases in the central office. To meet the needs of the salespeople until the consolidation project is completed, inventory data at each winery is sent to the central office at the end of each day. Merge replication has been implemented to allow salespeople to maintain local copies of customer, inventory, and order data.

EXISTING DATA ENVIRONMENT

Databases
Each winery presently maintains its own database to store all business information. At the end of each month, this information is brought to the central office and transferred into the databases shown in Table 13-2.

TABLE 13-2 Coho Vineyard Databases

DATABASE      SIZE
Customer      180 megabytes (MB)
Accounting    500 MB
HR            100 MB
Inventory     250 MB
Promotions    80 MB
After the database consolidation project is complete, a new database named Order will serve as a data store for the new Web store. As part of their daily work, employees also connect periodically to the Order database using a new in-house Web application. The HR database contains sensitive data and is protected using Transparent Data Encryption (TDE). In addition, data in the Salary table is encrypted using a certificate.

Database Servers
A single server named DB1 contains all the databases at the central office. DB1 is running SQL Server 2008 Enterprise on Windows Server 2003, Enterprise edition. The chief technology officer (CTO) is considering buying a new machine to replace DB1 because users are complaining about performance issues on a sporadic basis. In addition, several of the wineries have been reporting device activation errors and even a few blue screens on the servers running their local SQL Server instances.

Business Requirements
You need to design an archiving solution for the Customer and Order databases. Your archival strategy should allow the Customer data to be saved for six years.

To prepare the Order database for archiving procedures, you create a partitioned table named Order.Sales. Order.Sales includes two partitions. Partition 1 includes sales activity for the current month. Partition 2 is used to store sales activity for the previous month. Orders placed before the previous month should be moved to another partitioned table named Order.Archive. Partition 1 of Order.Archive includes all archived data. Partition 2 remains empty.

A process needs to be created to load the inventory data from each of the 16 wineries by 4 A.M. daily.

Four large customers submit orders using the Coho Vineyard Extensible Markup Language (XML) schema for Electronic Data Interchange (EDI) transactions. The EDI files arrive by 5 P.M. and need to be parsed and loaded into the Customer, Accounting, and Inventory databases, each of which contains tables relevant to placing an order. The EDI import routine is currently a single-threaded C++ application that takes between three and six hours to process the files. You need to finish the EDI process by 5:30 P.M. to meet your Service Level Agreement (SLA) with the customers. After the consolidation project has finished, the EDI routine loads all data into the new Order database.

There have been reports of massive contention on the central SQL Server while data is being imported during the nightly consolidation process. You need to reduce or eliminate the contention.

You need to back up all databases at all locations. You can lose a maximum of five minutes of data under a worst-case scenario. The Customer, Account, Inventory, Promotions, and Order databases can be off-line for a maximum of 20 minutes in the event of a disaster. Data older than six months in the Customer and Order databases can be off-line for up to 12 hours in the event of a disaster.
Answer the following questions.
1. How do you determine the cause of the performance issues?
2. How do you troubleshoot the errors that are occurring at the wineries?

Suggested Practices

To help you master the exam objectives presented in this chapter, complete the following tasks.

Using the Performance Data Warehouse to Gather Data for Performance Optimization
- Practice 1: Configure a Query Activity collection set for all your instances running SQL Server 2005 or later.
- Practice 2: Configure a Performance Counter collection set for all your instances that gathers the System, Processor, Network, Physical Disk, and all the SQL Server performance objects so that you can establish a performance baseline for comparison.
- Practice 3: Configure a SQL Trace collection set that includes the RPC:Completed and SQL:BatchCompleted events so that you can baseline query performance.
- Practice 4: Configure a T-SQL Query collection set to gather application-specific data that you can use for analysis.
- Practice 5: Establish a single performance warehouse and combine data from multiple collection targets into the single warehouse.
- Practice 6: Using Reporting Services, define custom reports for the data you are collecting.

Using Database Engine Tuning Advisor to Gather Data for Performance Optimization
- Practice 1: Using the results of the SQL Trace collection set stored in the Performance Data Warehouse, run an analysis using DTA.

Using Dynamic Management Views to Gather Data for Performance Optimization
- Practice 1: Using the T-SQL Query collector type, configure a data collection for the sys.dm_db_* and sys.dm_io_* DMVs.
Take a Practice Test

The practice tests on this book's companion CD offer many options. For example, you can test yourself on just one exam objective, or you can test yourself on all the 70-432 certification exam content. You can set up the test so that it closely simulates the experience of taking a certification exam, or you can set it up in study mode so that you can look at the correct answers and explanations after you answer each question.

MORE INFO: PRACTICE TESTS
For details about all the practice test options available, see the section entitled "How to Use the Practice Tests" in the Introduction to this book.
CHAPTER 14

Failover Clustering

Microsoft SQL Server failover clustering is built on top of Microsoft Windows clustering and is designed to protect a system against hardware failure. This chapter explains Windows clustering and SQL Server failover clustering configurations.

Exam objective in this chapter:
- Implement a SQL Server clustered instance.

Lessons in this chapter:
- Lesson 1: Designing Windows Clustering
- Lesson 2: Designing SQL Server 2008 Failover Cluster Instances

Before You Begin

To complete the lessons in this chapter, you must have:
- Cluster-capable hardware or Microsoft Virtual Server 2005 R2
- Windows Server 2003 SP2 and later or Windows Server 2008 installed on your server

NOTE: VIRTUAL SERVER
You can use Virtual Server and Microsoft Virtual PC to simulate hardware configurations. Unlike Virtual PC, Virtual Server supports Windows clustering, and you can use it to build a SQL Server failover cluster.
IMPORTANT: LESSON PRACTICES
You use Virtual Server for all the practices in this chapter. To follow the steps in the practices, you have to create three virtual machines using Virtual Server, and you must install all three machines with Windows Server 2003 Standard edition SP2 and later or Windows Server 2008 Standard and later. You should configure one of the virtual machines as a domain controller and the other two machines as member servers in the domain. You need to allocate 512 megabytes (MB) of memory to the two virtual machines that you configure as member servers, and configure the domain controller with 192 MB of RAM. To meet the hardware requirements for this Virtual Server configuration, you need a minimum of 1.5 GB of RAM on the host machine, and the disk drives should be at least 7200 RPM for reasonable performance.

The practices in the lessons require you to have performed the following actions:
- Created three virtual machines
- Installed Windows Server 2003 Standard edition and later or Windows Server 2008 Standard edition and later onto each virtual machine
- Configured one virtual machine as a domain controller
- Configured two virtual machines as member servers in the domain
- Configured the domain controller with a single network adapter, as shown in Table 14-1
- Configured the member servers with two network adapters, as shown in Table 14-1
- Configured all networks as Guest Only

TABLE 14-1 TCP/IP Address Configuration for Networks

MACHINE                  CONNECTION                 IP SETTINGS
Domain controller        Local Area Connection      IP: 10.1.1.1; Subnet: 255.255.255.0; Gateway: 10.1.1.1; DNS: 10.1.1.1
Member server (Node1)    Local Area Connection      IP: 10.1.1.2; Subnet: 255.255.255.0; Gateway: 10.1.1.1; DNS: 10.1.1.1
Member server (Node1)    Local Area Connection 2    Dynamically assigned
Member server (Node2)    Local Area Connection      IP: 10.1.1.3; Subnet: 255.255.255.0; Gateway: 10.1.1.1; DNS: 10.1.1.1
Member server (Node2)    Local Area Connection 2    Dynamically assigned
IMPORTANT
A complete discussion of Virtual Server is beyond the scope of this book. You can find step-by-step instructions for performing each of the actions required to configure the base environment in the Virtual Server documentation. If you have physical hardware capable of clustering, you can perform the practices on this hardware by skipping the steps specific to configuring the Virtual Server environment.
Lesson 1: Designing Windows Clustering

Windows clustering is the foundation for building a SQL Server failover cluster. This lesson outlines how to configure a Windows cluster and describes best practices for configuration.

IMPORTANT: COMPATIBLE HARDWARE
The most frequent cause of outages for a cluster is hardware that has not been certified for clustering. To ensure that the hardware you are deploying is certified for clustering, it must appear in the Windows Catalog. The entire hardware solution must specifically designate that it is certified for clustering, so you need to check the clustering categories of the Windows Catalog (which can be found at www.microsoft.com/whdc/hcl/default.mspx).

MORE INFO: WINDOWS CLUSTERING
You can find white papers, webcasts, blogs, and other resources related to Windows clustering at www.microsoft.com/windowsserver2003/community/centers/clustering.

After this lesson, you will be able to:
- Design a Microsoft Cluster Service (MSCS) implementation.

Estimated lesson time: 45 minutes

Windows Cluster Components

Windows clustering enables multiple pieces of hardware to act as a single platform for running applications. Each piece of hardware in a cluster is called a cluster node.

MORE INFO: WINDOWS SERVER VERSIONS
At the time of writing, Windows Server 2008 was just being released onto the market. The exercises in this chapter, as well as detailed Windows clustering information, are based primarily on Windows Server 2003, with information from Windows Server 2008 incorporated where available. If you are deploying Windows Server 2008, please refer to the Windows Server 2008 documentation for details on clustering features.

You first must install cluster nodes with an operating system such as Windows Server 2003 or Windows Server 2008. Depending on the edition you choose, different numbers of nodes are supported, as shown in Table 14-2.
TABLE 14-2 Number of Nodes Supported for Clustering

VERSION                  EDITION       NODES
Windows Server 2003      Standard      2
Windows Server 2003      Enterprise    4
Windows Server 2003      Datacenter    8
Windows Server 2008      Standard      2
Windows Server 2008      Enterprise    16

Each Windows cluster has a distinct name along with an associated Internet Protocol (IP) address. The cluster name is registered into Domain Name System (DNS) and can be resolved on the network. A quorum database is created that contains all the configuration information for the cluster.

All nodes within a cluster must be in a Windows domain, and you should configure them in the same domain. You need to create a domain account that is used as the cluster administrator account.

The most complicated elements within a cluster are groups and resources. A cluster group is a logical name assigned to a container that holds one or more cluster resources. A cluster resource consists of anything that is allowed to be configured on a server. Examples of cluster resources are IP addresses, network names, disk drives, Windows services, and file shares. A basic diagram of a two-node cluster is shown in Figure 14-1.

FIGURE 14-1 Windows two-node cluster (two MSCS nodes connected by a public network and a private heartbeat network, with a SQL Server 2008 instance and a shared disk array)
Types of Clusters

Windows Server 2003 and Windows Server 2008 support standard clusters and majority node set clusters.

Standard Windows Cluster

A standard cluster, shown in Figure 14-1, has a single quorum database stored on the shared array. The quorum drive is accessible by only one node within the cluster at any time; all other nodes in the cluster cannot access the drive. In the event of a failure, another node takes ownership of the disk resource containing the quorum database and then continues cluster operations.

Majority Node Set Cluster

The main difference with a majority node set cluster is that a copy of the quorum database is stored locally on each node in the cluster.

NOTE: LOCAL QUORUM
The location of the quorum is %SystemRoot%\Cluster\QoN.%ResourceGUID%$\%ResourceGUID%$\MSCS. A share is created on each node that is named \\%NodeName%\%ResourceGUID%$. You should not modify this directory or change the permissions on this directory or share in any way.

A majority node set cluster gets its name because a majority of the nodes have to be online for the cluster to be online. For this reason, you create majority node set clusters only when you have three or more nodes configured in the cluster. Table 14-3 compares how many nodes can be offline with the cluster still operational for a standard cluster and a majority node set cluster.

TABLE 14-3 Fault Tolerance for Clustering (failed nodes tolerated)

NUMBER OF NODES   MAJORITY NODE SET CLUSTER   STANDARD CLUSTER
1                 0                           0
2                 0                           1
3                 1                           2
4                 1                           3
5                 2                           4
6                 2                           5
7                 3                           6
8                 3                           7
Looking at Table 14-3, you might wonder why anyone would use a majority node set cluster because it appears to offer less tolerance than a standard cluster. The quorum database contains the configuration of the cluster and controls cluster operations. If the quorum database were to become unavailable, the entire cluster would be unavailable. A standard cluster uses a single quorum database on a single shared drive array, so failure of the shared drive array or corruption of the quorum database causes the entire cluster to become unavailable. A majority node set cluster has a copy of the quorum database on each node that is synchronized with all other copies, so it eliminates the quorum database as a single point of failure in a cluster.

Security Configuration

You should apply all security best practices for Windows to each node within a cluster. Disable any services that are not necessary. You need to create an account in the domain that is used as the cluster administrator account. You should add this domain account to each node in the cluster as a member of the local Administrators group prior to configuring the cluster.

CAUTION: ENCRYPTED OPERATING SYSTEM
To support encryption of the file system in a cluster configuration, Kerberos must be enabled, and the computer accounts, along with the cluster service account, must be trusted. If you choose to encrypt the file system, you must also account for the performance degradation that all read and write operations incur because of encrypt/decrypt processes.

You cannot use a regular user account for the cluster service; the cluster service must be able to read and write to the registry, mount and unmount disk drives, stop and start services, and perform other tasks. These tasks are possible only under a local administrator authority.

Disk Configuration

You can build clusters by using either Small Computer System Interface/Internet Small Computer System Interface (SCSI/iSCSI) drives or Fibre drives; Integrated Drive Electronics (IDE) drives are not supported for clustering. If you are building a cluster that contains more than two nodes, have Windows Datacenter, or have the 64-bit version of Windows, you are restricted to using only Fibre drives.

Clusters do not support the use of dynamic disks; you can use only basic disks and mount points for clustering. Because drive letters A, B, C, and D are already allocated to local resources on each node, a total of 22 drive letters can be used.
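Once a clustered SQL Server instance is running, you can list the shared drives available to it from T-SQL. The following query is a minimal sketch using the documented sys.dm_io_cluster_shared_drives view; it requires VIEW SERVER STATE permission and returns no rows when run on a nonclustered instance.

-- One row per shared drive letter this clustered instance can use for databases.
SELECT DriveName
FROM sys.dm_io_cluster_shared_drives;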
NOTE: OPERATING SYSTEM
Check with your storage area network (SAN) vendor to determine whether your nodes can be booted from the SAN. If your nodes cannot be booted from the SAN, or if you are using direct attached storage, you must install the operating system on an internal hard drive that you use to boot the node. Installing the operating system on an internal hard drive on each node is the most common configuration.

When configuring the disks, you should allocate a dedicated drive for use by the quorum. You need to configure the Microsoft Distributed Transaction Coordinator (MS DTC) in all clusters. MS DTC requires disk space on a drive that is configured as a dependency of the MS DTC resource, which you manually add to the cluster after you create the cluster.

The disk required for MS DTC creates a dilemma for most administrators. You need to ensure that you have the maximum number of drive letters to use for databases while also balancing best practices for performance and stability. The best practices recommendation for a cluster is to allocate a dedicated disk for the MS DTC resource and then configure MS DTC and its associated disk drive in a separate cluster group. If you are not enlisting MS DTC in your applications, however, you are wasting a disk drive that might be put to better use for databases. Therefore, if you do not have enough drives to produce the drive configuration that you need for database operations and if you are not enlisting MS DTC for any applications, you can place the MS DTC resource into the cluster group and set its disk dependency to the drive that you have configured as the quorum. This configuration violates best practices, but if you need the extra drive, and if MS DTC is not taking advantage of it, you can make this change for functionality reasons without affecting cluster operations.

CAUTION: ANTIVIRUS SOFTWARE
Antivirus software has become very prevalent on database servers. In a cluster environment, you need to configure the antivirus scanning so that it does not interfere with cluster operations. You must exclude the MSCS directory and all the directories containing data files from scanning. During a failover, the disks are mounted on the node that a group is failing over to, which triggers the antivirus software to begin scanning the disk. If the antivirus software begins scanning a database file before SQL Server can open it, the recovery of the database is delayed until the file has been fully scanned. Because database files are normally very large, scanning can add a considerable amount of time to the failover process.

Network Configuration

Each node within a Windows cluster needs at least two network cards that are configured for public and private communications. The public network is the access point for all applications and external traffic that request data from the cluster. The private network is used for all internode and intercluster communications.
Windows clustering executes periodic health checks, which determine whether a node is available and can be used to run applications. The most basic health check, called a LooksAlive test, is executed by sending a ping request from one node in the cluster to another node. If a node fails to respond to a LooksAlive test, it is considered unavailable, and the cluster executes a failover process. If the private network becomes saturated, a LooksAlive test can fail and cause a spurious failover. To prevent this, you should configure the public and private networks on different subnets.

BEST PRACTICES: PRIVATE NETWORK CONNECTION
You should configure the following items on the private network connection:
- Disable all services except Transmission Control Protocol/Internet Protocol (TCP/IP).
- Remove the default gateway address.
- Remove any DNS server addresses.
- Disable DNS registration.
- Disable NetBIOS over TCP/IP.
- Disable LMHOSTS lookup.
This configuration ensures that the network connection can process only TCP/IP traffic and that the IP address must be known for the connection to be used.

NOTE: REMOTE PROCEDURE CALL (RPC)
All health checks within a cluster use the remote procedure call (RPC) service. If the RPC service is unavailable or has been disabled, all health checks within a cluster fail. You must ensure that the RPC service is enabled and set to start automatically on all nodes within a cluster.

Cluster Resources

You can separate cluster resources, which are the most granular items that you can configure within a cluster, into the broad categories shown in Table 14-4.

TABLE 14-4 Cluster Resources

CATEGORY     EXAMPLES
Networking   IP address, network name
Hardware     Disk drives
Software     Services, executable files, file shares, MS DTC
Resources that are physically attached to a machine cannot be configured in a cluster, so you might wonder how disk drives can be defined as a cluster resource. As described in the "Disk Configuration" section earlier in this lesson, all data within a cluster must reside on an external drive array. The external drive array can be a Fibre channel cabinet attached to each node in the cluster or a SAN that is connected to all nodes in the cluster. You cannot configure the local hard drive in each node as a cluster resource.

The physical disk drives within the disk array are not the actual cluster resources. Rather, the disk mount definition within Windows is configured and controlled by Windows clustering. Although a disk resource is defined on all nodes, only the node that is configured to own the disk resource has the disks mounted and accessible. All other nodes maintain the disk mount definition but have the disks unmounted. This prevents more than one machine from writing to the same media at the same time.

The main resource that is configured in a cluster is a service such as SQL Server or SQL Server Agent. Although each node in the cluster has an entry for a given service, the service is started on only a single node within the cluster.

One of the most powerful elements within a cluster is the way in which IP addresses and network names are handled. Although each node in the cluster carries the IP address and network name definition, only the node designated as the owner of the IP address and name has it bound to a physical network card. When a failover occurs to another node, clustering performs the following operations on the network stack:
1. Unregisters the network name from DNS
2. Binds the IP address to a physical network card on the operational node
3. Reregisters the network name in DNS
This process ensures that all applications maintain the same IP address and network name, regardless of the piece of hardware on which they are currently running. Because the IP address and network name are preserved through a failover, you do not need to reconfigure applications to reconnect following a failover.

Cluster Groups

You use cluster groups to combine one or more cluster resources into a logical management structure; the unit of failover within a cluster is a group. It can be helpful to think of a cluster group as an application. Each SQL Server failover cluster instance that you create appears as a separate group within a Windows cluster. A cluster group, along with the resources contained within the group, is shown in Figure 14-2.
FIGURE 14-2 Cluster group and associated cluster resources

Quick Check
1. What is the main difference between a standard cluster and a majority node set cluster?
2. What are some examples of cluster resources?
3. How many network connections does a node need for clustering? Why?
4. How does the health check within a Windows cluster work?
5. Which types of disk configurations are supported for clustering?

Quick Check Answers
1. A standard cluster uses a shared quorum database. A majority node set cluster maintains a separate quorum database on each node that is synchronized across all nodes. The majority of nodes (more than 50 percent) must be online for a majority node set cluster to function.
2. Cluster resources can be hardware, software, or networking. Some examples are IP addresses, network names, disk mounts, and Windows services.
3. Each node needs at least two network connections: one connection is used for public communications to applications on the network, and the other is used for private internal communications within the cluster.
4. The basic health check that is performed is called a LooksAlive test. This test consists of each node pinging the others.
5. Clustering supports basic disks; dynamic disks are not supported. Disks must also be external to each node within the cluster, so disks mounted locally on a node are not visible to any resource within the cluster.

PRACTICE Creating a Windows Cluster

In this practice, you create a Windows cluster that you use in Lesson 2, "Designing SQL Server 2008 Failover Cluster Instances," to install a SQL Server failover cluster instance.
1. Open the Virtual Server Administration Web site.
2. Start the Virtual Machine Remote Control Client and connect to your Virtual Server instance.
3. Verify that Node1 and Node2 are off. Start the domain controller (hereafter referred to as DC).
4. Under the Virtual Disks section of the Virtual Server Administration Web site, choose Create and then Fixed Size Virtual Hard Disk.
5. Name this disk Quorum.vhd with a size of 500 MB.
6. Repeat steps 4 and 5 to create two more disks: Sqldata.vhd with a size of 1 GB and Sqllog.vhd with a size of 500 MB.
7. Within the Virtual Server Administration Web site, click Edit Configuration for the first node in your cluster (hereafter referred to as Node1).
8. Verify that you have configured two network adapters. If you do not have two network adapters configured, add a second network adapter.
9. Click the SCSI Adapters link.
10. Add three SCSI adapters with the SCSI Adapter ID set to 6 and the Share SCSI Bus For Clustering check box selected, as shown in Table 14-5.

TABLE 14-5 Node1 SCSI Adapter Configuration

VIRTUAL ADAPTER          SCSI ID
Virtual SCSI Adapter 1   6 (Share SCSI Bus For Clustering)
Virtual SCSI Adapter 2   6 (Share SCSI Bus For Clustering)
Virtual SCSI Adapter 3   6 (Share SCSI Bus For Clustering)
11. Click the Hard Disks link.
12. Click Add Disk, and then add each of the Quorum.vhd, Sqldata.vhd, and Sqllog.vhd disks. Attach each disk as displayed in Table 14-6.

TABLE 14-6 Node1 Cluster Disk Configuration

DISK                  NAME                              ATTACHMENT
Virtual Hard Disk 1   (Name of disk for base machine)   Primary channel (0)
Virtual Hard Disk 2   Quorum.vhd                        SCSI 0 ID 0 (shared bus)
Virtual Hard Disk 3   Sqldata.vhd                       SCSI 1 ID 0 (shared bus)
Virtual Hard Disk 4   Sqllog.vhd                        SCSI 2 ID 0 (shared bus)

13. Verify that your configuration matches Table 14-6. An example is shown in Figure 14-3.

FIGURE 14-3 Node1 configuration

14. Click the Master Status link under Navigation.
15. Repeat steps 7-13 for the second node in your cluster (hereafter referred to as Node2).

NOTE: SCSI ADAPTER ID FOR NODE2
Each node must use a different SCSI adapter ID. Because Node1 is configured with a SCSI adapter ID of 6 for each SCSI adapter, you must configure Node2 with a SCSI adapter ID of 7 for each adapter.
16. Verify that your configurations match those in Tables 14-7 and 14-8. An example is shown in Figure 14-4.

TABLE 14-7 Node2 SCSI Adapter Configuration

VIRTUAL ADAPTER          SCSI ID
Virtual SCSI Adapter 1   7 (Share SCSI Bus For Clustering)
Virtual SCSI Adapter 2   7 (Share SCSI Bus For Clustering)
Virtual SCSI Adapter 3   7 (Share SCSI Bus For Clustering)

TABLE 14-8 Node2 Cluster Disk Configuration

DISK                  NAME                              ATTACHMENT
Virtual Hard Disk 1   (Name of disk for base machine)   Primary channel (0)
Virtual Hard Disk 2   Quorum.vhd                        SCSI 0 ID 0 (shared bus)
Virtual Hard Disk 3   Sqldata.vhd                       SCSI 1 ID 0 (shared bus)
Virtual Hard Disk 4   Sqllog.vhd                        SCSI 2 ID 0 (shared bus)

FIGURE 14-4 Node2 configuration

17. Click the Master Status link under Navigation.
18. Switch to the Virtual Machine Remote Control Client and log on to the DC.
19. Open Active Directory Users And Computers.
20. Create a new user named Clusteradmin that is not a member of any special groups, as shown in Figure 14-5.

FIGURE 14-5 Cluster administrator account

CAUTION: INITIAL CONFIGURATION
It is critical that you be very careful with the order in which you start and stop Node1 and Node2 during the subsequent steps in this practice. If you ever run both Node1 and Node2 at the same time before you configure the cluster, you will corrupt the disks and will not be able to complete the steps. You must check and double-check the state of Node1 and Node2 before stopping or starting either one.

21. Verify that Node2 is off and then start Node1.
22. After logging on to Node1, open Disk Management by right-clicking My Computer on the Start menu and choosing Manage. In the console tree of the Computer Management console, select Disk Management.
23. Because you have three unconfigured disks, the Initialize And Convert Disk Wizard appears.
24. Click Next, verify that all three disks are selected for initialization, and click Next.
25. On the conversion page, verify that none of the three disks is selected (dynamic disks are incompatible with clustering), click Next, and then click Finish.
CAUTION: BASIC DISKS
Follow the prompts in the dialog box to set up the disks. Make absolutely certain that you do not convert the disks to dynamic. Clustering supports only basic disks; if you convert the disks to dynamic disks, you cannot configure your cluster and will have to start at the beginning with new disks.

26. For each disk that shows its space as unallocated, create a new primary NTFS partition encompassing the entire disk.
27. Configure the drive letters according to Table 14-9, as shown in Figure 14-6.

TABLE 14-9 Node1 Disk Configuration

DISK     DRIVE LETTER
Disk 0   C
Disk 1   Q
Disk 2   M
Disk 3   N

FIGURE 14-6 Node1 disk configuration
28. In the Computer Management console, expand the Local Users And Groups node and select Groups.
29. Double-click the Administrators group and add the Clusteradmin account you created within your domain in step 20.
30. Close the Computer Management console.
31. Open Network Connections.
32. Rename Local Area Connection to Public.
33. Rename Local Area Connection 2 to Private.
34. Right-click the Private connection and choose Properties.
35. Clear the Client For Microsoft Networks and File And Printer Sharing For Microsoft Networks check boxes, as shown in Figure 14-7.

FIGURE 14-7 Private network adapter properties

36. Select Internet Protocol (TCP/IP) and click Properties.
37. Specify 10.10.213.1 with a subnet mask of 255.255.255.0. Do not configure a default gateway or DNS server, as shown in Figure 14-8.
38. Click Advanced.
39. Select the DNS tab and clear the Register This Connection's Addresses In DNS check box, as shown in Figure 14-9.
FIGURE 14-8 Private network IP and DNS settings

FIGURE 14-9 Private network DNS configuration
40. Select the WINS tab, clear the Enable LMHOSTS Lookup check box, and then select Disable NetBIOS Over TCP/IP, as shown in Figure 14-10.

FIGURE 14-10 Private network WINS configuration

41. Click OK twice and then click Close to close the Private Properties dialog box.
42. Close Network Connections and shut down Node1.
43. Verify that Node1 is off and then start Node2.
44. Repeat steps 21-42 for Node2. Refer to Tables 14-10 and 14-11 for the disk and networking configuration on Node2.

NOTE: DISK INITIALIZATION
When you select Disk Management on Node2, the Initialize And Convert Disk Wizard does not appear because the disks already have a signature written to them. You do not need to format the disks because you already performed this step when you configured Node1. You also do not need to specify drive letters because Node2 picks them up from the cluster after you configure it.
TABLE 14-10 Node2 Disk Configuration

DISK     DRIVE LETTER
Disk 0   C
Disk 1   Q
Disk 2   M
Disk 3   N

TABLE 14-11 Node2 Network Configuration

OPTION                                            SETTING
Client For Microsoft Networks                     Disabled
File And Printer Sharing For Microsoft Networks   Disabled
IP Address                                        10.10.213.2
Subnet Mask                                       255.255.255.0
Default Gateway                                   Blank
DNS                                               Blank
Register This Connection's Addresses In DNS       Disabled
Enable LMHOSTS Lookup                             Disabled
Disable NetBIOS Over TCP/IP                       Selected

45. Verify that both Node1 and Node2 are off and then start Node1.
46. Log on to Node1 and start Cluster Administrator.
47. In the Action drop-down list, choose Create New Cluster and click OK.
48. In the New Server Cluster Wizard, click Next, verify that your domain name is specified correctly in the Domain drop-down list, and enter Clust1 as the Cluster Name. Click Next.
49. Node1 should be specified by default for the Computer Name. Click Next.
50. The wizard now analyzes Node1 to verify that it is compatible for clustering. When the analysis completes and displays a green bar, click Next.
51. Enter an IP address for the cluster on the Public segment. Based on the suggested IP address settings specified at the beginning of this chapter, set the IP address to 10.1.1.5. Click Next.
52. Enter clusteradmin for the User Name, enter the password that you used for this account, verify that the domain name is specified correctly, and click Next.
53. On the Proposed Cluster Configuration page, click Quorum and ensure that Disk Q is specified for the quorum. If not, change the entry and click OK.
NOTE: SPECIFYING A QUORUM
When you configure a cluster on physical hardware, the disk that the New Server Cluster Wizard selects by default as the quorum is the first disk added to Node1 that is not a locally attached disk. Virtual Server selects the first drive letter in alphabetical order. You can use the Cluster Configuration Quorum dialog box to specify a local quorum that is used when building a single-node cluster for testing. This dialog box is also where you can change the type of cluster from the standard cluster you are building to a majority node set cluster by choosing Majority Node Set from the drop-down list. If you choose Majority Node Set, the New Server Cluster Wizard creates a quorum database on each node in the cluster.

54. Verify that all settings are correct and click Next.
55. The next step in the process takes a few minutes as the cluster is built to your specifications. When this process completes, click Next and then click Finish.
56. Congratulations, you have created a Windows cluster!
57. Verify that you have created three groups: Cluster Group contains Cluster Name, Cluster IP Address, and Disk Q; Group 0 contains Disk M; and Group 1 contains Disk N.
58. With Node1 running, start Node2.
59. In Cluster Administrator on Node1, right-click Clust1, and choose New and then Node. Click Next when the Add Nodes Wizard starts.
60. Specify Node2 as the computer name, click Add, and then click Next.
61. After the analysis completes, click Next.
62. Enter the password for the Clusteradmin account and then click Next.
63. Verify the cluster configuration settings and click Next.
64. Node2 is now configured for clustering and added to Clust1.
65. Click Next and then click Finish.

NOTE: CLUSTER ANALYSIS WARNINGS
Because of the way Virtual Server handles disk resources internally, you can receive some warnings when a cluster is configured. This is normal and does not affect the operation of your cluster. As long as you do not receive an error (the progress bar turns red), your configuration succeeded, and you have a fully functional cluster.

66. Verify that you now see both Node1 and Node2 configured as part of Clust1.
67. Select the Cluster Group group. In the details pane, right-click Cluster Name and choose Take Offline. Right-click Cluster IP Address and choose Take Offline.
68. From the File menu, choose New and then Resource.
69. Specify a name of MSDTC, select Distributed Transaction Coordinator for the Resource Type, and verify that Cluster Group is selected for the group. Click Next.
70. Verify that both Node1 and Node2 are specified as Possible Owners and click Next.
71. Add Cluster Name and Disk Q to the Resource Dependencies. Click Finish. Click OK to close the message box confirming the creation of the cluster resource.
72. Right-click Cluster Group and choose Bring Online. After the resource is online, your screen should look similar to Figure 14-11.

FIGURE 14-11 Completed two-node cluster

BEST PRACTICES: MICROSOFT DISTRIBUTED TRANSACTION COORDINATOR (MS DTC)
MS DTC, which you need to add to every Windows cluster you build, ensures that operations that enlist resources such as COM+ can work in a cluster. It has been recommended that you always configure MS DTC to use a disk that is different from the quorum disk or any disk used by SQL Server or other applications. We find this to generally be a waste of very limited disk resources. If you are running applications in a cluster that make very heavy use of MS DTC, you need to dedicate a disk for MS DTC operations. If you are not running applications that require COM+, you can safely configure MS DTC within the cluster group and set its dependencies to the Quorum drive.

Lesson Summary
- You build a standard cluster using a single quorum database stored on a shared disk array. You build a majority node set cluster with a copy of the quorum database on all nodes within the cluster.
- Windows clustering supports only basic disks. You can encrypt disks, but the encrypt/decrypt functions affect performance. Clustering does not support disk compression.
- A cluster needs two separate networks. The cluster uses the public network to communicate with applications and clients; it uses the private network for internal cluster communications.

Lesson Review

You can use the following questions to test your knowledge of the information in Lesson 1, "Designing Windows Clustering." The questions are also available on the companion CD if you prefer to review them in electronic form.

NOTE: ANSWERS
Answers to these questions and explanations of why each answer choice is right or wrong are located in the "Answers" section at the end of the book.

1. Coho Vineyards has recently experienced problems with its distribution system. Delays in scheduling trucks and getting shipments out to suppliers were caused by a series of hardware failures. Management has authorized the chief technical officer (CTO) to acquire a hardware solution capable of withstanding the failure of an entire server. Hardware that is compatible with clustering will be acquired. Which operating system should you install to meet these business requirements for the least cost?
A. Windows 2000 Server Standard edition
B. Windows 2000 Advanced Server
C. Windows Server 2003 Standard edition
D. Windows Server 2003 Enterprise edition
2. The CTO at Coho Vineyards has decided to purchase two servers for clustering that will be used to run the distribution system. Which combination of operating system version and cluster type provides the most fault tolerance for the lowest cost?
A. Windows Server 2003 Standard edition with a standard cluster
B. Windows Server 2003 Standard edition with a majority node set cluster
C. Windows Server 2003 Enterprise edition with a standard cluster
D. Windows Server 2003 Enterprise edition with a majority node set cluster
3. Which service needs to be running for health checks to be executed within a cluster?
A. Server service
B. RPC service
C. Net Logon service
D. Terminal Services service
Lesson 2: Designing SQL Server 2008 Failover Cluster Instances

After you build and configure the Windows cluster, you can install instances of SQL Server into the cluster. Clustered instances provide fault tolerance to SQL Server by ensuring that a hardware failure cannot cause an extended outage for applications. This lesson explains how to install and configure SQL Server 2008 failover cluster instances for optimal redundancy in a cluster.

After this lesson, you will be able to:
- Install a SQL Server clustered instance
Estimated lesson time: 45 minutes

REAL WORLD
Michael Hotek

A little more than two years ago, I was at a customer site to help implement clustering. Instead of starting with the installation and configuration of clustering, I had to back up and explain clustering. Some consultant told employees of this company that clustering could be used to eliminate downtime when service packs were installed and enable them to load-balance their hardware resources. They were also told that clustering could enable a transaction that started on one node to be completed after the cluster failed over. Clustering does not have the capability to do any of these things.

SQL Server failover clustering provides protection against hardware failures. In the event of a failure of one piece of hardware, a second piece of hardware automatically takes over and starts SQL Server. Service packs still cause an outage on a cluster because the SQL Server instance can exist on only a single node at any time. Any transactions that are not completed when a cluster fails over are rolled back. Because SQL Server does not allow multiple processes to access database files simultaneously, load balancing is not possible.

After explaining that clustering protects only from hardware failures, I still implemented the cluster within the customer's environment. The customer could effectively manage the database within the cluster by understanding that failures would still incur outages, but the amount of downtime because of hardware failure would be minimal.
Terminology

SQL Server instances installed into a cluster have been referred to by several different terms, many of which are inaccurate. So before explaining the SQL Server configuration within a cluster, this section addresses the terminology.

SQL Server clusters are either single- or multiple-instance clusters. A single-instance cluster is a Windows cluster that has exactly one instance of SQL Server installed. A multiple-instance cluster is a Windows cluster that has more than one instance of SQL Server installed. It does not matter on which node you configure the instances to run; the terminology stays the same.

Active/Active and Active/Passive clusters exist at a Windows level. An Active/Active cluster indicates that applications are running on all the nodes in a cluster. An Active/Passive cluster indicates that applications are running on only a single node in the cluster. This distinction is irrelevant as far as SQL Server is concerned because SQL Server is either running or not. SQL Server instances are unaware of any other SQL Server instances, and SQL Server cannot be load-balanced. SQL Server runs on one of the nodes; which node it runs on is left to the discretion of the database administrator (DBA) who manages the cluster.

SQL Server instances installed into a cluster used to be referred to as virtual servers. This terminology created a fundamental problem because Microsoft has a stand-alone product that is called Virtual Server. Instances of SQL Server in a cluster are referred to as either SQL Server clustered instances or SQL Server failover cluster instances.

Failover Cluster Instance Components

When installing a stand-alone instance, DBAs are not concerned with IP addresses, network names, or even the presence of disk drives. Each of these components needs to be considered when installing a SQL Server instance into a cluster. The components that you need to configure for a SQL Server failover cluster instance are the following:
- IP addresses
- Network names
- Disk drives on the shared drive array
- SQL Server services
- Service accounts

Network Configuration

Each SQL Server instance installed into a cluster requires a unique IP address, which needs to be on the public network segment configured in the cluster. Bound to each IP address is a unique network name that is registered into DNS so that the SQL Server can be resolved by name.
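Because the network name travels with the failover cluster instance while the physical node changes, it can be useful to ask SQL Server where it is actually running. The following query is a minimal sketch using documented SERVERPROPERTY values; it distinguishes the name clients connect to from the node currently hosting the instance.

-- Identity of a failover cluster instance versus the node hosting it.
SELECT SERVERPROPERTY('IsClustered') AS IsClustered,                      -- 1 for a failover cluster instance
       SERVERPROPERTY('MachineName') AS NetworkName,                     -- virtual network name clients resolve
       SERVERPROPERTY('InstanceName') AS InstanceName,
       SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS CurrentHostNode; -- physical node currently hosting the instance

Run before and after a failover, NetworkName stays constant while CurrentHostNode changes, which is exactly why applications do not need to be reconfigured.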
BEST PRACTICES: SQL BROWSER SERVICE
SQL Server 2008 installs a service called SQL Browser. If you have installed named instances in a cluster, the SQL Browser service must be running to resolve these names. If you do not have named instances, you should disable the SQL Browser service.

Disk Configuration

You must configure each SQL Server clustered instance with a dedicated set of drive letters. On a stand-alone server, multiple instances can store databases on the same drive or even in the same directory as other instances. In a cluster, the drives are mounted to a particular node at any given time; any other node does not have access to those drives. You can configure an instance of SQL Server to run on any node. If you could configure more than one SQL Server clustered instance to store databases on the same drive letter, it would be possible to create a configuration in which the instance is running on one node while another node has ownership of the disks, thereby rendering the SQL Server instance inoperable.

The concept of disk configuration in a SQL Server cluster is known as the instance-to-disk ratio. Although a SQL Server clustered instance can address more than one drive letter, a drive letter can be associated with only a single SQL Server clustered instance. Additionally, a drive letter must be configured as a dependency of the SQL Server service before the instance is allowed to store databases on it.

Security Configuration

You need to configure each SQL Server service with a service account. You should generally use a different account for each SQL Server service, such as the SQL Server, SQL Server Agent, and Full Text services. Although the accounts do not need any special privileges, they must be domain accounts because the security identifier (SID) for a local account cannot be resolved on another machine.

SQL Server 2008 does not require service accounts with administrative authority in Windows. This creates a situation in which a Windows account could have dozens of individual permissions granted to it, such as registry access, directory access, and file access permissions. Changing service accounts would become very complicated because you would have to assign all these individual permissions to the new service account to ensure that services continue to function normally. With the shift in the security infrastructure, the Windows accounts for SQL Server 2008 services are designed to follow industry-accepted practices for managing Windows accounts: Windows groups are granted permissions on the various resources that will be accessed, and Windows accounts are then added to their respective groups to gain access to resources.

On a stand-alone machine, these groups are created by default in the form SQLServerMSSQLUser$<machine name>$<instance name> and SQLServerSQLAgentUser$<machine name>$<instance name>. SQL Server Setup automatically assigns permissions on the
directories, registry keys, and other resources needed to allow a SQL Server to function to the appropriate group. It then adds the service account to the respective group. Although this process works on a stand-alone machine, it is not as simple in a cluster. Within the cluster, a SQL Server failover cluster instance can be running on any physical machine in the cluster, and local Windows groups do not have a valid security context across machines. Therefore, the groups for the SQL Server service accounts need to be created at the domain level.

The installation routine does not assume that you have the authority to create groups in the domain, so you need to create these domain groups prior to installing a SQL Server failover cluster instance. You have to define three groups within the domain that have the following purposes:
- SQL Server service accounts
- SQL Server Agent service accounts
- SQL Server Full Text Search Daemon accounts
You specify the groups that you create during the final stages of the installation routine.

BEST PRACTICES: BALANCING SECURITY WITH MANAGEABILITY
Security best practices would create a domain-level group for each type of service and for each SQL Server clustered instance installed. Management simplicity would create a domain-level group for each of the three services, with all SQL Server failover cluster instances specifying the same set of domain groups. You need to determine how to balance a very secure (but highly complex) domain group scheme against a less complex (but less secure) domain group scheme.

Health Checks

Clustering performs two health checks against a SQL Server failover cluster instance. The first check performed is the LooksAlive test, which is a ping from each node in the cluster to the IP address of the SQL Server instance. However, a ping test does not indicate that an instance is available: the instance could be responding to a ping but still be inaccessible. To detect the case in which SQL Server itself is unavailable, a second check, the IsAlive test, is performed. The IsAlive test creates a connection to the SQL Server instance and issues SELECT @@SERVERNAME. The SQL Server must return a valid result set to pass this health check.

Cluster Failover

If either health check fails, the cluster initiates a failover of the SQL Server instance. The first step in the failover process is to restart SQL Server on the same node; the instance is restarted on the same node because the cluster first assumes that a transient error caused the health check to fail.
If the restart does not respond immediately, the SQL Server group fails over to another node in the cluster (the secondary node). The network name of the server running SQL Server is unregistered from DNS. The SQL Server IP address is bound to the network interface card (NIC) on the secondary node, and the disks associated with the SQL Server instance are mounted on the secondary node. After the IP address is bound to the NIC on the secondary node, the network name of the SQL Server instance is registered into DNS. After the network name and disks are online, the SQL Server service is started, followed by SQL Server Agent and Full Text indexing.

Regardless of whether the instance was restarted on the same node or on a secondary node, the SQL Server instance is shut down and restarted. Any transactions that have not completed when the failover process is initiated are rolled back when SQL Server restarts. Upon restarting, the normal process of restart recovery is followed. In general, a cluster fails over in 10 to 15 seconds. The failover time can be affected by the registration into DNS, and it can also increase if a large number of databases are configured on the instance. In SQL Server 2000, the failover time was bound by the amount of time it took for both the redo and undo phases to complete, which left the failover time at the mercy of the applications issuing transactions against databases. Because databases are now available as soon as the redo phase completes, a SQL Server 2008 clustered instance fails over and has databases available much more rapidly.

Quick Check
1. Which types of Windows accounts and groups can you use with a SQL Server clustered instance?
2. With how many clustered instances can a single drive letter be used?
3. What are the two health checks performed in a cluster, and which operations are executed?

Quick Check Answers
1. Domain users and domain groups must be used with SQL Server failover cluster instances. The SID for accounts and groups used must be resolvable across all nodes in the cluster; the SID for a local account or group cannot be resolved across machines.
2. Although a clustered instance can address multiple drive letters, you can configure a given drive letter for only a single instance. This configuration prevents the possibility of having SQL Server running on one node while a different node has ownership of the disk resources required by the clustered instance.
3. The LooksAlive check executes every 5 seconds by default and issues a ping from all nodes to the IP address of the SQL Server clustered instance. The IsAlive check executes every 60 seconds by default, connects to the SQL Server clustered instance, issues SELECT @@SERVERNAME, and must receive a valid result set.
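When troubleshooting, you can reproduce the IsAlive connection test by hand: connect to the clustered instance and run the same query the cluster service issues. A minimal sketch:

-- The IsAlive health check connects and issues this query; a valid result
-- set must be returned. On a clustered instance, @@SERVERNAME reports the
-- virtual network name (and instance name), not the physical node name.
SELECT @@SERVERNAME;

If this query hangs or errors while a ping to the instance's IP address still succeeds, you are seeing exactly the condition the IsAlive test exists to catch.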
PRACTICE Installing a SQL Server Failover Clustered Instance

In this practice, you install a SQL Server failover cluster instance into the Windows cluster created in the practice for Lesson 1.

NOTE: PREREQUISITES
You must have already installed .NET Framework 2.0 SP1 on both nodes in the cluster before proceeding with this practice. If you are configuring a Windows Server 2003 cluster, you also need to download and install the KB937444 hotfix.

1. Open Cluster Administrator and connect to Clust1.
2. Right-click Group 0 and rename it Temp.

IMPORTANT: CLUSTER GROUP SPECIFICATION
Unlike the previous three versions of SQL Server, the SQL Server 2008 setup routine does not allow you to specify an existing cluster group to install services to, even if the group specified does not contain a SQL Server instance. To get around this problem, make sure that you do not already have a group with the name you want when installation completes.

3. Select the Group 1 group; drag and drop the disk in the Group 1 group into the Temp group. When prompted, click Yes twice to confirm this move.
4. Verify that the Temp group contains both Disk M and Disk N.
5. Right-click Group 1 and select Delete. Click Yes to confirm the deletion. Verify that the Temp group contains Disk M and Disk N, as shown in Figure 14-12.
6. Switch to the DC.
7. Open Active Directory Users And Computers.
8. Create three global security groups: SQLServerService, SQLServerAgentService, and SQLServerFullTextService.
9. Create a user account named SQLAdmin, as shown in Figure 14-13. Also create an account named SQLServerFullText.
10. Add the SQLAdmin account to the SQLServerService and SQLServerAgentService groups. Add the SQLServerFullText account to the SQLServerFullTextService group.
11. Switch back to Node1 and start SQL Server Setup.
12. Accept the End User License Agreement and click Next.
13. Click Install to install the setup prerequisites. When the installation completes, click Next.
14. Click the Installation link and then click the New SQL Server Failover Cluster Installation link in the SQL Server Installation Center.
FIGURE 14-12 Initial Cluster Group configuration

FIGURE 14-13 Domain groups and users
15. After the Setup rules execute, click OK.
16. On the Setup Support Files page, click Install to install the support files.
17. After the System Configuration Check completes, click Next.
18. On the Product Key page, enter your product key or choose Specify A Free Edition. Click Next.
19. On the License Terms page, select the I Accept The License Terms check box. Click Next.
20. Select the SQL Server Database Services and SQL Server Replication check boxes. Click Next.
21. On the Instance Configuration page, specify SQLClust1 as the SQL Server Failover Cluster Network Name as well as the Instance Name and Instance ID, as shown in Figure 14-14. Click Next.

FIGURE 14-14 Specifying the cluster name

22. Verify the disk space requirements. Click Next.
23. Specify SQLClust1 for the SQL Server cluster resource group name, as shown in Figure 14-15, and click Next.
24. Select Disk M and Disk N on the Cluster Disk Selection page, as shown in Figure 14-16. Click Next.
FIGURE 14-15 Specifying the cluster resource group

FIGURE 14-16 Selecting the cluster disk
25. Select the IPv4 check box and specify 10.1.1.6 for the IP address, as shown in Figure 14-17. Click Next.

FIGURE 14-17 Selecting the cluster network

26. On the Cluster Security Policy page, specify SQLServerService for the Database Engine domain group and SQLServerAgentService for the SQL Server Agent domain group, as shown in Figure 14-18, and then click Next.

FIGURE 14-18 Cluster security policy
27. Specify the service accounts for the SQL Server Agent, SQL Server Database Engine, and Full Text Search Daemon services along with their passwords, as shown in Figure 14-19, and click Next.

FIGURE 14-19 Specifying service accounts

CAUTION: SERVICE STARTUP
The start-up type for SQL Server clustered services should be Manual; the Windows cluster needs to have control over the services. If you change the start-up type to Automatic, you will cause errors with cluster operations.

28. Select Mixed Mode (SQL Server Authentication And Windows Authentication) and click Add Current User, as shown in Figure 14-20.
29. Click the Data Directories tab and change the location of the log files to the N: drive, as shown in Figure 14-21. Click Next.
30. Select the check boxes on the Error And Usage Reporting page if you want to send error reports and usage data to Microsoft. Click Next.
31. Review the Cluster Installation Rules page. Click Show Details if you want to see the setup rules and your configuration's status. Click Next.
32. Review the configuration on the Ready To Install page and click Install.
33. At the completion of the installation, click Close.
FIGURE 14-20 Specifying the authentication mode

FIGURE 14-21 Specifying data directories
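Once the instance is installed and online, you can confirm the authentication mode selected in step 28. This is a minimal sketch using the documented SERVERPROPERTY function:

-- 1 = Windows Authentication only; 0 = Mixed Mode, as selected in step 28.
SELECT SERVERPROPERTY('IsIntegratedSecurityOnly') AS WindowsAuthOnly;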
34. Observe the resources that are now configured in the SQLClust1 group within Cluster Administrator, as shown in Figure 14-22.

FIGURE 14-22 Configured, single-node cluster

35. It is impossible to install all nodes in a cluster from a single machine, so switch to Node2, map the directory on Node1 that has the SQL Server installation files, and start SQL Server Setup directly from Node2.
36. Install the .NET Framework prerequisites and any required hotfixes, and reboot if necessary.
37. When you reach the SQL Server Installation Center, click the Installation link and then click Add Node To A SQL Server Failover Cluster.
38. After the Setup Support Rules analysis completes, click OK.
39. Install the Setup Support files.
40. After the second analysis of Setup Support Rules completes, click Next.
41. On the Cluster Node Configuration page, select SQLCLUST1 for the instance name, as shown in Figure 14-23. Click Next.
42. Specify the passwords for the service account(s). Click Next.
43. Select the check boxes on the Error And Usage Reporting page as appropriate and then click Next.
44. Verify that the server passes all node rules and click Next.
45. Click Install to start the installation, as shown in Figure 14-24.
FIGURE 14-23 Selecting the clustered instance to configure

FIGURE 14-24 Installing binaries on the second node
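When the Add Node installation finishes, you can confirm from the clustered instance itself that both nodes can now host it. A minimal sketch using the documented sys.dm_os_cluster_nodes view (requires VIEW SERVER STATE permission; returns no rows on a stand-alone instance):

-- One row per Windows cluster node on which this instance can run;
-- after completing this practice, both Node1 and Node2 should be listed.
SELECT NodeName
FROM sys.dm_os_cluster_nodes;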
Lesson Summary
- You can configure SQL Server as either single- or multiple-instance clusters.
- The LooksAlive and IsAlive health checks provide the capability to detect failures and fail over automatically.
- Although an instance can use multiple disks, you can associate a disk with only a single SQL Server clustered instance.

Lesson Review

You can use the following questions to test your knowledge of the information in Lesson 2, "Designing SQL Server 2008 Failover Cluster Instances." The questions are also available on the companion CD if you prefer to review them in electronic form.

NOTE: ANSWERS
Answers to these questions and explanations of why each answer choice is right or wrong are located in the "Answers" section at the end of the book.

1. Consolidated Messenger is experiencing outages because of hardware failures. Because the company's business is run from SQL Server databases, a solution needs to be implemented to minimize downtime. Management also wants to ensure that the system can recover from failures without requiring the intervention of IT staff. What technology can you use to accomplish these requirements?
A. Log shipping
B. Replication
C. Failover clustering
D. Database snapshots
2. Trey Research currently has four instances of SQL Server running a variety of databases in support of the company's medical research. Instance1 requires 200 GB of disk space for databases and serves more than 500 concurrent users. Instance2 requires about 1 terabyte of storage space for a small group of 25 researchers who are investigating genome therapy. Instance3 and Instance4 contain smaller databases that manage all the company's infrastructure (for example, HumanResources, Payroll, and Contacts). The Genetrak database on Instance1 routinely consumes more than 60 percent of the processor capacity, and Instance2 averages 45 percent processor utilization. Which version and edition of Windows is required to build a SQL Server cluster environment at the minimal cost?
A. Windows 2000 Advanced Server
B. Windows 2000 Datacenter edition
C. Windows Server 2003 Standard edition
D. Windows Server 2003 Enterprise edition
Chapter Review

To practice and reinforce the skills you learned in this chapter further, you can perform the following tasks:
- Review the chapter summary.
- Review the list of key terms introduced in this chapter.
- Complete the case scenario. This scenario sets up a real-world situation involving the topics of this chapter and asks you to create a solution.
- Complete the suggested practices.
- Take a practice test.

Chapter Summary
- SQL Server clustering is based on Windows clustering to provide automatic failure detection and automatic failover.
- A cluster can be configured as a standard cluster with a shared quorum or as a majority node set with a copy of the quorum database on each node.
- The LooksAlive and IsAlive health checks are designed to detect hardware failures as well as the unavailability of SQL Server for connections.
- SQL Server failover clustering protects only from a hardware failure.

Key Terms

Do you know what these key terms mean? You can check your answers by looking up the terms in the glossary at the end of the book.
- Cluster group
- Cluster name
- Cluster node
- Cluster resource
- Majority node set cluster
- Quorum database
- Standard cluster

Case Scenario

In the following case scenario, you apply what you've learned in this chapter. You can find answers to these questions in the "Answers" section at the end of this book.

Case Scenario: Planning for High Availability

In the following case scenario, you apply what you've learned about failover clustering. You can find answers to these questions in the "Answers" section at the end of this book.
BACKGROUND

Company Overview

Margie's Travel provides travel services from a single office located in San Diego. Customers can meet with an agent in the San Diego office or make arrangements through the company's Web site.

Problem Statements

With the addition of a new product catalog, the Web site is experiencing stability issues. Customers are also prevented from purchasing products or services at various times during the day when changes are being made to the underlying data. The company has just fired the consulting firm responsible for developing and managing the Web site and all other applications within the company because of its failure to provide any availability for business-critical systems.

Planned Changes

The newly hired CTO has been tasked with implementing high availability for all business-critical systems. The CTO has just hired a DBA and a systems administrator to assist in this task as well as to manage the day-to-day operations.

EXISTING DATA ENVIRONMENT

There are 11 databases within the environment, as shown in Table 14-12.

TABLE 14-12 Margie's Travel Databases

DATABASE         PURPOSE                                                       SIZE
Orders           Stores all orders placed by customers.                        50 GB
Customers        Stores all personal information related to a customer.        15 GB
CreditCards      Stores customer credit card information.                      200 MB
Employees        Stores information related to all employees.                  50 MB
HumanResources   Stores all human resource (HR) documents, as well as          300 MB
                 employee salaries.
Products         Stores the products that can be purchased on the Web site.    25 GB
Flights          Stores the flights that have been booked by customers.        2 GB
Cruises          Stores the cruises that have been booked by customers.        1 GB
CarRental        Stores the car rentals that have been booked by customers.    1 GB
Excursions       Stores the excursions that have been booked by customers.     2 GB
                 (An excursion is defined as something that is not a flight,
                 cruise, product, or car rental.)
Admin            A utility database for use by DBAs that is currently empty.   12 GB
The environment has a single Web server named WEB1, along with a single database server named SQL1. All servers are running Windows Server 2003, and SQL1 is running SQL Server 2008. SQL1 has an external storage cabinet connected to a redundant array of inexpensive disks (RAID) controller with a battery backup that is capable of implementing RAID 0, RAID 1, and RAID 5. The entire array is currently configured as a single RAID 0 set, and the current storage is at only 10 percent capacity. A tape drive is connected to both WEB1 and SQL1, but the tape drives have never been used. SQL1 and WEB1 are currently located in the cubicle adjacent to the previously fired consultant. All applications on WEB1 are written using either Active Server Pages (ASP) or ColdFusion.

PROPOSED ENVIRONMENT

The CTO has allocated a portion of the budget to acquire four more servers configured with Windows Server 2003 and SQL Server 2008. All hardware will be cluster-capable. Data within the Products, Customers, Orders, Flights, Cruises, Excursions, and CarRental databases can be exposed to the Internet through applications running on WEB1. All other databases must be behind the firewall and accessible only to users authenticated to the corporate domain. A new SAN is being implemented for database storage that contains sufficient drive space for all databases. Each of the 20 logical unit numbers (LUNs) configured on the SAN is configured in a stripe of mirrors configuration, with four disks in each mirror set.

Business Requirements

A short-term solution is in place that enables the system to be fully recovered from any outage within two business days, with a maximum data loss of one hour. In the event of a major disaster, the business can survive the loss of up to two days of data. A maintenance window between the hours of midnight and 8 A.M. on Sunday is available to make any changes. A longer-term solution needs to be created to protect the company from hardware failures, with a maximum outage of less than one minute required.

Technical Requirements

The Orders and Customers databases need to be stored on the same SQL Server instance and fail over together because both databases are linked. All HR-related databases must be secured very strongly, with access for only the HR director. All HR data must be encrypted within the database, as well as anywhere else on the network. The marketing department needs to build reports against all the customer and order data, along with the associated products or services that were booked, to develop new marketing campaigns and product offerings. All analysis requires near real-time data.
All databases are required to maintain 99.92 percent availability over an entire year. A minimum of intervention from administrators is required to recover from an outage. Customers using the Web site need to be unaware when a failover occurs.

1. Which technology or technologies can you use to meet all availability and business needs? (Choose all that apply.)
A. A two-node majority node set cluster
B. A two-node standard cluster
C. Database mirroring
D. Replication
2. Which technology should be used to meet the needs of the marketing department?
A. Failover clustering
B. Database mirroring
C. Log shipping
D. Replication
3. Which combination of Windows and SQL Server meets the needs of Margie's Travel with the lowest cost?
A. Windows Server 2003 Standard edition with SQL Server 2008 Standard
B. Windows Server 2003 Enterprise edition with SQL Server 2008 Standard
C. Windows Server 2003 Enterprise edition with SQL Server 2008 Enterprise
D. Windows Server 2003 Datacenter edition with SQL Server 2008 Datacenter

Suggested Practices

To help you master the exam objectives presented in this chapter, complete the following tasks.

Windows Clustering

The following suggested practices for this topic are based on the Windows cluster built in the practice for Lesson 1.
- Practice 1: Fail over the cluster from Node1 to Node2 and observe the state of each resource along with the dependency chain.
- Practice 2: Fail all groups over to Node1. Evict Node2 from the cluster.
- Practice 3: Add Node2 to the cluster again.
- Practice 4: Change the IP address for the cluster.
- Practice 5: Complete the best practices configuration for a Windows cluster by setting the Public network to All Communications and the Private network to Internal Clustering Communications Only.
SQL Server Failover Clustering

The following suggested practices for this topic are based on the SQL Server failover cluster instance built in the practice for Lesson 2.

Practice 1: Fail over the SQL Server instance from Node1 to Node2 and observe the state of each resource along with the dependency chain.
Practice 2: Install a second failover cluster instance into your Windows cluster.
Practice 3: Change the IP address for the server running SQL Server.
Practice 4: Create a file share, add it to the cluster, and configure it so that it is addressable by the same name regardless of which node it is running on.
Practice 5: Configure the file share so that if it fails to come online during a failover, it does not cause the entire group to be taken offline.

Take a Practice Test

The practice tests on this book's companion CD offer many options. For example, you can test yourself on just one exam objective, or you can test yourself on all the 70-432 certification exam content. You can set up the test so that it closely simulates the experience of taking a certification exam, or you can set it up in study mode so that you can look at the correct answers and explanations after you answer each question.

MORE INFO PRACTICE TESTS
For details about all the practice test options available, see the section "How to Use the Practice Tests" in the Introduction to this book.
CHAPTER 15

Database Mirroring

Database Mirroring provides a fault-tolerant alternative to SQL Server failover clustering while also allowing failure protection to be limited to one or more databases rather than the entire instance. This chapter explains how to design and deploy Database Mirroring.

Exam objective in this chapter:
    Implement database mirroring.

Lessons in this chapter:
    Lesson 1: Overview of Database Mirroring
    Lesson 2: Initializing Database Mirroring
    Lesson 3: Designing Failover and Failback Strategies

Before You Begin

To complete the lessons in this chapter, you must have the following:
    Three instances of SQL Server installed:
        Two of the instances must be running SQL Server 2008 Standard, Enterprise, or Developer.
        One of the instances can be any edition of SQL Server, including SQL Server 2008 Express.
    A SQL Server 2005 version of the AdventureWorks database installed on at least one of the instances.

CAUTION FILESTREAM DATA
You need a version of the AdventureWorks database from a previous edition of SQL Server because the SQL Server 2008 version of the AdventureWorks database contains FILESTREAM data, and Database Mirroring is not compatible with FILESTREAM data.
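Before setting up the lab, it can help to confirm which edition each instance is running. The following is a minimal sketch (not part of the book's lab scripts) that uses the built-in SERVERPROPERTY function; run it on each instance.

-- Confirm the edition and version of this instance.
SELECT SERVERPROPERTY('Edition')        AS edition,
       SERVERPROPERTY('ProductVersion') AS product_version;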
Lesson 1: Overview of Database Mirroring

Database Mirroring introduces new terminology along with new capabilities. This lesson covers the terminology used with Database Mirroring and explains how Database Mirroring operates.

After this lesson, you will be able to:
    Design Database Mirroring roles

Estimated lesson time: 45 minutes

Database Mirroring Roles

There are two mandatory Database Mirroring roles and a third optional role. You must designate a database in a principal role and another database in a mirror role. If you want, you can also designate a SQL Server instance in the role of witness server to govern automatic failover from the principal to the mirror database. Figure 15-1 shows a reference diagram for a Database Mirroring configuration.

FIGURE 15-1 Database Mirroring components (an application connects to the principal SQL Server, which is paired with a mirror SQL Server; an optional witness observes both)

The databases designated in the role of principal and mirror comprise a Database Mirroring session. You can configure an optional witness server for each session, and a single witness server can manage multiple Database Mirroring sessions.
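To see which role a database currently holds within a session, you can query the sys.database_mirroring catalog view. The following sketch lists the role, state, and partner for every mirrored database on an instance; nothing here is assumed beyond the standard catalog view columns.

-- List the mirroring role and state of each mirrored database on this instance.
SELECT DB_NAME(database_id)   AS database_name,
       mirroring_role_desc,
       mirroring_state_desc,
       mirroring_partner_name
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;  -- databases that are not mirrored return NULL here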
Principal Role

The database that you configure in the principal role becomes the source of all transactions in a Database Mirroring session. The principal, or primary, database is recovered and allows connections, and applications can read data from and write data to it.

NOTE SERVING THE DATABASE
When an instance has a database that allows transactions to be processed against it, it is said to be "serving the database."

Mirror Role

The database that you define in the mirror role is the database partner of the principal database and continuously receives transactions. The Database Mirroring process constantly replays transactions from the principal database into the mirror's transaction log and flushes that transaction log to the data files on the mirror database, so the mirror database contains the same data as the principal database. The mirror database is in a recovering state, so it does not allow connections of any kind and transactions cannot be written directly to it. However, you can create a database snapshot against a mirror database to give users read-only access to the database's data at a specific point in time (a sketch of this follows the next note).

NOTE TRANSIENT OPERATING STATES
The principal and mirror roles are transient operating states within a Database Mirroring session. Because the databases are exact equivalents and are maintained in sync with each other, either database can take on the role of principal or mirror at any time.
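As a hedged illustration of the snapshot technique mentioned above, the following sketch creates a snapshot against a mirrored copy of AdventureWorks. The logical file name (AdventureWorks_Data, the name used by the SQL Server 2005 version of the database) and the snapshot file path are assumptions to adapt to your own layout, and database snapshots require SQL Server 2008 Enterprise or Developer.

-- Run on the mirror instance. The logical name must match the source
-- database's data file; the path below is hypothetical.
CREATE DATABASE AdventureWorks_Mirror_Snapshot
ON ( NAME = AdventureWorks_Data,
     FILENAME = 'C:\Snapshots\AdventureWorks_Mirror_Snapshot.ss' )
AS SNAPSHOT OF AdventureWorks;

-- Reporting users can then read from the snapshot, for example:
-- SELECT COUNT(*) FROM AdventureWorks_Mirror_Snapshot.Person.Contact;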
Witness Server

The witness server is the third, optional role that you can define for Database Mirroring. The sole purpose of the witness is to serve as an arbiter within the High Availability operating mode to ensure that the database can be served on only one SQL Server instance at a time. If a principal database fails and the witness confirms the failure, the mirror database can take the principal role and make its data available to users.

Although Database Mirroring pairs a principal with exactly one mirror (a principal cannot have more than one mirror, and vice versa), a witness server can service multiple Database Mirroring pairs. The sys.database_mirroring_witnesses catalog view stores a single row for each Database Mirroring pair that is serviced by the witness.

IMPORTANT DATABASE-LEVEL VS. SERVER-LEVEL ROLES
Principal and mirror roles occur at a database level and must be defined within SQL Server 2008 instances that are running either SQL Server 2008 Standard or Enterprise. However, you define the witness role at an instance level. The instance of SQL Server 2008 that you use for the witness server can be running any edition, including SQL Server 2008 Express, which is why we refer to a principal or mirror database but a witness server.

Database Mirroring Endpoints

All Database Mirroring traffic is transmitted through a TCP endpoint with a payload of DATABASE_MIRRORING. You can create only one Database Mirroring endpoint per SQL Server instance.

MORE INFO ENDPOINTS
For more information about defining endpoints, please refer to Chapter 8, "Designing SQL Server Endpoints."

By default, the Database Mirroring endpoint is defined on port 5022. Although port 5022 works, it is recommended that you choose a different port number so that an inexperienced attacker probing for default configurations cannot find the endpoint as easily. You can configure multiple SQL Server instances on a single server, and each instance can have a single Database Mirroring endpoint; however, each instance on the same server must use a different port number for its Database Mirroring endpoint. If you will be using only a single instance per server for Database Mirroring, you should standardize a port number within your environment.

You can assign a name to each endpoint that you create. The name for a Database Mirroring endpoint is used only when the state is being changed or a GRANT/REVOKE statement is being issued. Because the endpoint name is used only by a database administrator (DBA) for internal operations, it is recommended that you leave the name set to its default value of Mirroring.

Security is the most important aspect that you configure for Database Mirroring. You can configure the Database Mirroring endpoint for either encrypted or nonencrypted communications. It is recommended that you keep the default setting, which encrypts all traffic between endpoints. If the instances participating in Database Mirroring do not have the same service account for the SQL Server service, you must ensure that each service account is granted access to the other SQL Server along with being granted CONNECT authority on the Database Mirroring endpoint (a sketch follows).
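The following hedged sketch shows one way to create such an endpoint and grant the partner instance's service account access to it. The port number (5024), the encryption algorithm choice, and the DOMAIN\SqlSvc account are all assumptions to adapt to your environment; the endpoint name Mirroring matches the default recommended above.

-- Run on each partner instance (a sketch, not the book's lab script).
CREATE ENDPOINT Mirroring
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5024)   -- assumed non-default port
    FOR DATABASE_MIRRORING (ENCRYPTION = REQUIRED ALGORITHM AES, ROLE = PARTNER);

-- If the instances run under different service accounts, grant the other
-- instance's service account permission to connect to the endpoint
-- (create the login first if it does not already exist).
GRANT CONNECT ON ENDPOINT::Mirroring TO [DOMAIN\SqlSvc];  -- hypothetical account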
MORE INFO SECURING AN ENDPOINT
For more information about defining the security of a Database Mirroring endpoint, please refer to Chapter 8.

Operating Modes

You can configure Database Mirroring for three different operating modes: High Availability, High Performance, and High Safety. The operating mode governs the way SQL Server transfers transactions between the principal and the mirror databases, as well as the failover processes that are available in the Database Mirroring session. In this lesson, you learn about each operating mode, the benefits of each mode, and how caching and Transparent Client Redirect capabilities give Database Mirroring advantages over other availability technologies.

High Availability Operating Mode

High Availability operating mode provides durable, synchronous transfer between the principal and mirror databases, as well as automatic failure detection and automatic failover. SQL Server first writes all transactions into memory buffers within the SQL Server memory space and then writes these memory buffers out to the transaction log. When SQL Server writes a transaction to the transaction log, the system triggers Database Mirroring to begin transferring the transaction log rows for that transaction to the mirror. When the application issues a commit for the transaction, the principal does not acknowledge the commit to the application until the mirror has hardened those log records to its own transaction log.
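To make the synchronous behavior concrete, the following sketch shows how a session with this safety level is typically established using ALTER DATABASE. The server names, domain, and port are hypothetical, and the mirror copy of the database must already have been restored WITH NORECOVERY before the session is created.

-- On the mirror instance (assumed names; adapt to your environment):
ALTER DATABASE AdventureWorks
    SET PARTNER = 'TCP://principal.contoso.local:5024';

-- On the principal instance:
ALTER DATABASE AdventureWorks
    SET PARTNER = 'TCP://mirror.contoso.local:5024';

-- SAFETY FULL (the default) makes the transfer synchronous; adding a
-- witness enables the automatic failover of High Availability mode.
ALTER DATABASE AdventureWorks SET PARTNER SAFETY FULL;
ALTER DATABASE AdventureWorks SET WITNESS = 'TCP://witness.contoso.local:5024';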