SQLBits X SQL Server 2012 Rich Unstructured Data
Make SQL Server the preferred choice for managing
Unstructured Data and allow building Rich Application
Experience on top
Scale Up for storage and search to 100m to 500m documents
Easy use/access to Unstructured data from all applications
Rich insight into unstructured data to make better decisions
[Architecture diagram]
Access paths:
  Transactional Access: Database Applications -> Blobs
  Streaming Win32 Access: Windows Apps -> SMB Share (Files/Folders); SQL Apps -> FileStream API
Rich Services: Fulltext Search, Semantic Similarity Search
Database: FileTable, FileStreams
Integrated Administration: Backup/Replication/AlwaysOn
Scale-up Solutions: Multiple Containers (Disk 1, Disk 2, Disk 3)
Remote BLOB Storage: Customer Application -> SQL RBS API -> Azure lib, Centera lib, SQL FILESTREAM lib -> Azure, Centera, SQL DB
[Diagram: Remote BLOB Storage write path]
Components: Application (RBS Client Library, BLOB Store Provider Library), BLOB Store, SQL Server, with a Machine Boundary between them
  1 Write BLOB (Photo)
  2 Return Blob ID
  3 Write Blob ID to PhotoRef field
RBS Services: Create, Fetch, GC, Delete
Example row: ClaimID 4390, ClaimDate 6/5/2007, PhotoRef <Binary(20)>
// Store a new blob.
byte[] myBlobId;
SqlRemoteBlobContext blobContext = new SqlRemoteBlobContext(sqlConn);

using (SqlRemoteBlob newBlob = blobContext.CreateNewBlob()) {
    // Write to a System.IO.Stream object.
    newBlob.Write(…);
    // Alternative way to write:
    // newBlob.WriteFromStream(inputStream);
    newBlob.Close();
    myBlobId = newBlob.BlobId;
}

// Add a new row including the blob ID to the database table.

// Fetch the blob.
using (SqlRemoteBlob existingBlob = blobContext.OpenBlob(myBlobId)) {
    // Read from System.IO.Stream object.
    existingBlob.Read(...);
    // Alternative way to read:
    // existingBlob.ReadToStream(outputStream);
}
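
The three numbered steps in the RBS diagram (write the BLOB to the store, get a blob ID back, write that ID into the PhotoRef field) can be modeled without the RBS library. What follows is a toy sketch in Python, not the RBS API: `InMemoryBlobStore` is a hypothetical stand-in for a BLOB store provider, SQLite stands in for SQL Server, and the 16-byte ID is an arbitrary choice. Only the Claims column names and the sample row come from the slide.

```python
import sqlite3
import uuid


class InMemoryBlobStore:
    """Hypothetical stand-in for an external BLOB store (not the RBS API)."""

    def __init__(self):
        self._blobs = {}

    def create(self, data: bytes) -> bytes:
        # Steps 1 and 2: write the BLOB and return an opaque blob ID.
        blob_id = uuid.uuid4().bytes
        self._blobs[blob_id] = data
        return blob_id

    def fetch(self, blob_id: bytes) -> bytes:
        return self._blobs[blob_id]


def store_claim_photo(conn, store, claim_id, claim_date, photo):
    """Write the photo to the store, then record only its ID in the row."""
    blob_id = store.create(photo)
    # Step 3: the database row keeps the reference, not the photo bytes.
    conn.execute(
        "INSERT INTO Claims (ClaimID, ClaimDate, PhotoRef) VALUES (?, ?, ?)",
        (claim_id, claim_date, blob_id),
    )
    return blob_id


def load_claim_photo(conn, store, claim_id):
    """Look up the blob ID in the row, then fetch the bytes from the store."""
    row = conn.execute(
        "SELECT PhotoRef FROM Claims WHERE ClaimID = ?", (claim_id,)
    ).fetchone()
    return store.fetch(row[0])


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Claims (ClaimID INTEGER PRIMARY KEY, ClaimDate TEXT, PhotoRef BLOB)"
)
store = InMemoryBlobStore()
blob_id = store_claim_photo(conn, store, 4390, "6/5/2007", b"photo bytes")
photo = load_claim_photo(conn, store, 4390)
```

The point of the split is that the relational row stays small and fixed-width while the store can be swapped out behind the same create/fetch interface.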
Store BLOBs in DB + File System
[Diagram: Application, BLOB, DB]
-- New T-SQL function: GET_FILESTREAM_TRANSACTION_CONTEXT()
SELECT GET_FILESTREAM_TRANSACTION_CONTEXT()

-- New T-SQL function: PathName()
SELECT ClaimImage.PathName()
FROM Insurancedb..Claims
// New SqlFileStream Class in VS08 SP1
SqlFileStream sfs = new SqlFileStream(path, txnId, System.IO.FileAccess.Read);

// Output file to read into.
using (System.IO.FileStream fs = new System.IO.FileStream(@"c:\output2.jpg", System.IO.FileMode.Create))
{
    byte[] buffer = new byte[512 * 1024];
    int cbBytesRead;
    // Read returns 0 at end of stream; a short read alone does not mean the data is exhausted.
    while ((cbBytesRead = sfs.Read(buffer, 0, buffer.Length)) > 0)
    {
        fs.Write(buffer, 0, cbBytesRead);
    }
}
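
A chunked copy like the one above should end only when a read returns zero bytes, because a stream is allowed to hand back fewer bytes than the buffer size before the data is exhausted. The Python sketch below illustrates that general stream behavior; it does not use SqlFileStream. `ShortReadStream` is a hypothetical source that returns at most a few bytes per call, and `copy_stream` is an illustrative helper name.

```python
import io


class ShortReadStream:
    """Hypothetical source that returns at most `limit` bytes per read call."""

    def __init__(self, data: bytes, limit: int):
        self._inner = io.BytesIO(data)
        self._limit = limit

    def read(self, size: int) -> bytes:
        return self._inner.read(min(size, self._limit))


def copy_stream(src, dst, buffer_size: int = 512 * 1024) -> int:
    """Copy src to dst in chunks and return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(buffer_size)
        if not chunk:
            # An empty read is the only reliable end-of-stream signal.
            break
        dst.write(chunk)
        total += len(chunk)
    return total


payload = bytes(range(10))
out = io.BytesIO()
copied = copy_stream(ShortReadStream(payload, limit=3), out)
```

A loop that stops as soon as a read comes back shorter than the buffer would quit after the first 3-byte chunk here and silently truncate the output.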
// Writing uses the same class: open sfs as a SqlFileStream with write access and call sfs.Write.
// Commit SQL transaction and close SQL connection.
FileTable Folder Hierarchy
Level                              Example
Machine                            my_machine
FILESTREAM share                   MSSQLSERVER
Database directories               Private Docs (Database1), Office Docs (Database2)
FileTable directories              Media, Documents, LogFiles (each a FileTable)
User-defined directory structure   below each FileTable directory
Example path: \\my_machine\MSSQLSERVER\Office Docs\Documents
ALTER DATABASE Contoso SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL,
        DIRECTORY_NAME = N'Contoso')

CREATE TABLE Contoso..Documents AS FILETABLE
      WITH (FILETABLE_DIRECTORY = N'Document Library')

\\<machine name>\<FILESTREAM share>\Contoso\Document Library
FileTable Schema
      File Attribute Name              Type                                           Purpose
 path_locator               hierarchyid                 Represents position of this node in the hierarchical FileNamespace.

 parent_path_locator        hierarchyid                 Represents the hierarchyID of the parent directory
                                                        -- a computed column
 stream_id                  uniqueidentifier            UniqueId for Filestream Data
 file_stream                varbinary(max) filestream   Filestream data

 file_type                  nvarchar(255)               Type of the file. Can be used for fulltext index creation

 cached_file_size           bigint                      Size of the filestream (cached value)
 name                       nvarchar(255)               File/Folder Name (e.g. foo.txt)
 creation_time              datetime2                   Creation Time
 last_write_time            datetime2                   LastWrite Time
 last_access_time           datetime2                   LastAccess Time
 is_directory               bit                         TRUE for directories.
 is_offline                 bit                         Offline attribute
 is_hidden                  bit                         Hidden attribute
 is_readonly                bit                         Read Only attribute
 is_archive                 bit                         Archive attribute
 is_system                  bit                         System attribute
 is_temporary               bit                         Temporary attribute
ALTER TABLE Documents DISABLE FILETABLE_NAMESPACE
\\machine\<FILESTREAM share>\<Database_directory>\<FileTable_Directory>\...
GetFileNamespacePath()
FileTableRootPath()
GetPathLocator()
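
The path layout above is a fixed sequence of levels: machine, FILESTREAM share, database directory, FileTable directory, then whatever folders and files the user creates. As a rough illustration only, the Python sketch below composes such a UNC path from its parts. `filetable_unc_path` is a hypothetical helper, and the example values are taken from the slides; in a real application the server-side functions listed above are the authoritative source for the path.

```python
def filetable_unc_path(machine, filestream_share, database_directory,
                       filetable_directory, *relative_parts):
    """Compose \\\\machine\\share\\db_dir\\filetable_dir\\... from its levels."""
    parts = [machine, filestream_share, database_directory,
             filetable_directory, *relative_parts]
    for part in parts:
        # Each level is a single path component, so separators are rejected.
        if not part or "\\" in part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return "\\\\" + "\\".join(parts)


path = filetable_unc_path(
    "my_machine", "MSSQLSERVER", "Contoso", "Document Library", "MySpec.doc"
)
```

With these inputs the result is `\\my_machine\MSSQLSERVER\Contoso\Document Library\MySpec.doc`, which is the shape a Win32 application would pass to its normal file APIs.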
DECLARE @path nvarchar(max)

-- Get FileNamespace path (T-SQL)
SELECT @path = file_stream.GetFileNamespacePath()
FROM DocumentStore WHERE name = 'MySpec.doc';

// Open file handle (Win32, in the client application, using the path returned above)
handle = CreateFile(path, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                    FILE_ATTRIBUTE_NORMAL, NULL);
\\VNN\Share\db
sys.dm_filestream_non_transacted_handles
sp_kill_filestream_non_transacted_handles
CREATE/ALTER DATABASE
          MAXSIZE
DBCC SHRINKFILE (EMPTYFILE)
Use of multiple spindles for achieving better I/O Scalability
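[Comparison matrix]
Options: File Stores / External Blob Stores (CAS); SQL BLOBs; Remote Blob API; FILESTREAM; FILETABLE
Criteria: Streaming Performance; Win32 App Compat; Link Level Consistency; Data Level Consistency; Integrated Query & Management; Non-local Windows File Servers; External Blob Stores
Streaming Performance: depends on external store (File Stores / External Blob Stores, Remote Blob API)
Win32 App Compat: depends on external store (File Stores / External Blob Stores, Remote Blob API)
Non-local Windows File Servers: n/a (File Stores / External Blob Stores)
External Blob Stores: n/a (File Stores / External Blob Stores)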
Features                                                                                       FileServer+DB   SQL 2008     SQL 2012
                                                                                               Solution        FILESTREAM   FileTable
Integrated Admin operations for Relational and File data (Backup/Restore, HA/Mirroring)        No              Yes          Yes
Integrated Services for Relational and File data (Text/Semantic Search, Reports, Query etc.)   No              Yes          Yes
Integrated Security Model                                                                      No              Yes          Yes
In-place update of Filestream data (non-transacted)                                            Yes             No           Yes
Fully Transacted update of Filestream data                                                     No              Yes          Yes
File/Directory hierarchy in DB                                                                 No              No           Yes
Win32 App compatibility                                                                        Yes             No           Yes
Relational access to File Attributes                                                           No              No           Yes
Queries run over a 350M-document database with random DML running in the background.
SQL Server 2012 beats SQL Server 2005 with a scale factor of more than 2x and, on average, 60x better throughput.
[Chart: 2005/8 vs 2012. Query avgExecTime (ms) under various numbers of connections (50 to 2000 users) for the customer playback benchmark.]
New Search Filter for Document Properties
    CONTAINS (PROPERTY ( { column_name }, 'property_name' ), 'contains_search_condition' )
[Diagram: Full-Text and Semantic Processing (quarter, record, revenue…) turns the Source Table into the Keyword Index, Keyphrases, KeyphraseDocuments and DocumentSimilarity tables]

Source Table
Key   Title                Document
D1    Annual Budget        …
D2    Corporate Earnings   …
D3    Marketing Reports    …
…     …                    …

Keyword Index (Full-Text)
ID   Keyword   Colid   …   compDocid   CompOc               CompPid
K1   revenue   1       …   10,23,123   (1,4),(5,8),(1,34)   2,5,6,8,4,3
K2   growth    1       …   10,23,123   (1,5),(5,9),(1,34)   2,5,6,8,5,4
     …         …       …   …           …                    …

Keyphrases
ID   Keyword
T1   revenue
T2   growth
T3   Windows
T4   Azure
…    …

KeyphraseDocuments
ID             DocID
T1 (revenue)   D1 (Annual Budget)
T2 (growth)    D2 (Corporate Earnings)
T3 (Windows)   D3 (Marketing Reports)
…              …
T1 (revenue)   D7 (Finance Report)
…              …
T3 (Windows)   D11 (Azure Strategy)
T4 (Azure)     D11 (Azure Strategy)

DocumentSimilarity
DocID                    MatchedDocID
D1 (Annual Budget)       D2 (Corporate Earnings)
D1 (Annual Budget)       D7 (Finance Report)
D3 (Marketing Reports)   D11 (Azure Strategy)
…                        …
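
The relationship between the KeyphraseDocuments and DocumentSimilarity tables can be made concrete with a deliberately simplified model: two documents are treated as related when they share at least one keyphrase. This Python sketch is an assumption for illustration, not how SQL Server computes semantic similarity (the product uses statistical scoring), and `similar_documents` is a hypothetical name. The input rows are only the ones visible on the slide, so pairs that depend on elided rows, such as D1 and D2, do not appear.

```python
from collections import defaultdict
from itertools import combinations

# (keyphrase ID, document ID) rows visible in the KeyphraseDocuments table.
keyphrase_documents = [
    ("T1", "D1"),   # revenue -> Annual Budget
    ("T2", "D2"),   # growth  -> Corporate Earnings
    ("T3", "D3"),   # Windows -> Marketing Reports
    ("T1", "D7"),   # revenue -> Finance Report
    ("T3", "D11"),  # Windows -> Azure Strategy
    ("T4", "D11"),  # Azure   -> Azure Strategy
]


def similar_documents(pairs):
    """Map each document pair to the set of keyphrase IDs the two share."""
    docs_by_phrase = defaultdict(set)
    for phrase_id, doc_id in pairs:
        docs_by_phrase[phrase_id].add(doc_id)

    shared = defaultdict(set)
    for phrase_id, docs in docs_by_phrase.items():
        # Sorting gives each unordered pair one canonical key.
        for first, second in combinations(sorted(docs), 2):
            shared[(first, second)].add(phrase_id)
    return dict(shared)


similarity = similar_documents(keyphrase_documents)
```

On this input the model links D1 with D7 through "revenue" and D3 with D11 through "Windows", matching two of the three DocumentSimilarity rows shown.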
CREATE FULLTEXT INDEX ON Production.Document (
  Title LANGUAGE 1033,
  Document
      TYPE COLUMN FileExtension
      LANGUAGE 1033
      STATISTICAL_SEMANTICS
  )
  KEY INDEX PK_Document_DocumentID
      ON documents_catalog
      WITH CHANGE_TRACKING OFF, NO POPULATION;

ALTER FULLTEXT INDEX ON Production.Document
    ALTER COLUMN Document
      ADD STATISTICAL_SEMANTICS
    WITH NO POPULATION;
…
…
ALTER FULLTEXT INDEX ON Production.Document
    START FULL POPULATION;
SQLBits X SQL Server 2012 Rich Unstructured Data

  • 2. Make SQL Server the preferred choice for managing Unstructured Data and allow building Rich Application Experience on top
  • 3. Scale Up for storage and search to 100m to 500m documents Easy use/access to Unstructured data from all applications Rich insight into unstructured data to make better decisions
  • 4. Transactional Access Streaming Win32 Access Streaming Win32 Access?? Database Applications Windows Apps SQL Apps Blobs SMB Share FileStream Files/Folders API Rich Services Fulltext Search Database Solutions Scale-up Semantic Similarity Disk Disk Disk FileTable 1 2 3 FileStreams Search Multiple Containers Integrated Administration? Integrated Administration Remote BLOB Storage Customer Application SQL RBS API D D Centera SQL B FileStre Azure lib lib FILESTREAM lib B FileStreams Integrated Azure Centera SQL DB Backup/Replication/AlwaysOn
  • 5. Machine Boundary 1 Write BLOB(Photo) Application 2 Return Blob ID 2 RBS Client RBS 3 Write Blob ID to Library Services: PhotoRef field • Create BLOB Store • Fetch Provider Library • GC • Delete ClaimID ClaimDate PhotoRef 1 3 4390 6/5/2007 <Binary(20)> BLOB Store SQL Server
  • 6. // Store a new blob. byte[] myBlobId; SqlRemoteBlobContext blobContext = new SqlRemoteBlobContext(sqlConn); using (SqlRemoteBlob newBlob = blobContext.CreateNewBlob()) { // Write to a System.IO.Stream object. newBlob.Write(…); newBlob.Close(); myBlobId = newBlob.BlobId; } // Alternative way to write. newBlob.WriteFromStream(inputStream);
  • 7. // Add a new row including the blob ID to the database // table. // Fetch the blob. using (SqlRemoteBlob existingBlob = blobContext.OpenBlob(myBlobId)) { // Read from System.IO.Stream object. existingBlob.Read(...); } // Alternative way to read. existingBlob.ReadToStream(outputStream);
  • 8. Store BLOBs in DB + File System Application BLOB DB
  • 9. // New TSQL Function: // Get_filestream_transaction_context() // SELECT Get_filestream_transaction_context() // New TSQL Function : // PathName() // SELECT ClaimImage.PathName() FROM Insurancedb..Claims
  • 10. // New SqlFileStream Class in VS08 SP1 // SqlFileStream sfs = new SqlFileStream(path, txnId, System.IO.FileAccess.Read); // output file to read into System.IO.FileStream fs = new System.IO.FileStream ("c:output2.jpg", System.IO.FileMode.Create); { byte[] buffer = new byte[512 * 1024]; int cbBytesRead = buffer.Length; while (cbBytesRead == buffer.Length) { cbBytesRead = sfs.Read(buffer, 0, buffer.Length); fs.Write(buffer, 0, cbBytesRead); } }
  • 11. sfs SqlFileStream sfs.Write // commit SQL transaction and close SQL connection.
  • 13. FileTable Folder Hierarchy FILESTREAM Share MSSQLSERVER my_machine Database MSSQLSERVEROffice Directories DocsDocuments Private Docs Office Docs (Database1) (Database2) FileTable Directories Media Documents LogFiles (FileTable) (FileTable) (FileTable) User-Defined Directory Structure
  • 14. ALTER DATABASE Contoso SET FILESTREAM( non_transacted_access=FULL, Directory_name = N’Contoso’) CREATE TABLE Contoso..Documents AS FILETABLE WITH (filetable_directory = N'Document Library') <machine name><FILESTREAM share>ContosoDocument Library
  • 15. FileTable Schema File Attribute Name Type Purpose Path_locator hierarchyid Represents position of this node in the hierarchical FileNamespace. parent_path_locator hierarchyid Represents the hierarchyID of the parent directory -- a computed column stream_id uniqueidentifier UniqueId for Filestream Data file_stream varbinary(max) filestream Filestream data file_type nvarchar(255) Type of the file. Can be used for fulltext index creation cached_file_size bigint Size of the filestream (cached value) Name nvarchar(255) File/Folder Name (e.g foo.txt) creation_time datetime2 Creation Time last_write_time datetime2 LastWrite Time last_access_time datetime2 LastAccess Time is_directory bit TRUE for directories. is_offline bit Offline attribute is_hidden bit Hidden attribute is_readonly bit Read Only attribute is_archive bit Archive attribute is_system bit System attribute is_temporary bit Temporary attribute
  • 16. ALTER TABLE Documents DISABLE FILETABLE_NAMESPACE
  • 19. DECLARE @path nvarchar(max) // get FileNamespace path SELECT @path=file_stream.GetFileNamespacePath() FROM DocumentStore WHERE name='MySpec.doc'; // Open File handle handle = CreateFile( @path, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
  • 25. Create/Alter Database max_size DBCC Shrinkfile Emptyfile
  • 26. Use of multiple spindles for achieving better I/O Scalability
  • 27. 2012 2012
  • 28. File Stores / SQL BLOBs Remote Blob External Blob FILESTREAM FILETABLE API Stores (CAS) Depends on Depends on Streaming Performance external store external store Depends on Depends on Win32 App Compat external store external store Link Level Consistency Data Level Consistency Integrated Query & Management Non-local Windows File n/a Servers External Blob Stores n/a
  • 29. Features FileServer+DB SQL 2008– SQL 2012– Solution FILESTREAM FileTable Integrated Admin operations for Relational and File No Yes Yes data - Backup/Restore, HA/Mirroring Integrated Services for Relational and File data No Yes Yes - Tex/Semantic Search, Reports, Query etc Integrated Security Model No Yes Yes In-place update of Filestream data Yes No Yes (non-transacted) Fully Transacted update of Filestream data No Yes Yes File/Directory hierarchy in DB No No Yes Win32 App compatibility Yes No Yes Relational access to File Attributes No No Yes
  • 33. Queries over a 350M-document database with random DML running in the background. Beats SQL Server 2005 by a scale factor of more than 2x, with on average 60x better throughput
  • 34. 2005/8 vs 2012: query avgExecTime (ms) under various numbers of connections (50 ~ 2000 users) for a customer playback benchmark
  • 36. New Search Filter for Document Properties: CONTAINS ( PROPERTY ( { column_name }, 'property_name' ), 'contains_search_condition' )
  • 40. Full-Text and Semantic Processing
      Source Table (Key ID, Title, Document): D1 Annual Budget; D2 Corporate Earnings; D3 Marketing Reports; …
      Example extracted terms: quarter, record, revenue…
      Keyphrases (ID, Keyword): T1 revenue; T2 growth; T3 Windows; T4 Azure; …
      KeyphraseDocuments (ID, DocID): T1 (revenue) D1 (Annual Budget); T2 (growth) D2 (Corporate Earnings); T3 (Windows) D3 (Marketing Reports); T1 (revenue) D7 (Finance Report); T3 (Windows) D11 (Azure Strategy); T4 (Azure) D11 (Azure Strategy); …
      DocumentSimilarity (DocID, MatchedDocID): D1 (Annual Budget) D2 (Corporate Earnings); D1 (Annual Budget) D7 (Finance Report); D3 (Marketing Reports) D11 (Azure Strategy); …
      Keyword Index (Full-Text) (ID, Keyword, Colid, …, compDocid, CompOc, CompPid):
        K1  revenue  1  …  10,23,123  (1,4),(5,8),(1,34)  2,5,6,8,4,3
        K2  growth   1  …  10,23,123  (1,5),(5,9),(1,34)  2,5,6,8,5,4
        …
  • 41. Enabling semantic indexing when creating a full-text index:
      CREATE FULLTEXT INDEX ON Production.Document (
          Title LANGUAGE 1033,
          Document TYPE COLUMN FileExtension LANGUAGE 1033 STATISTICAL_SEMANTICS
      )
      KEY INDEX PK_Document_DocumentID
      ON documents_catalog
      WITH CHANGE_TRACKING OFF, NO POPULATION;
      Adding semantic indexing to an existing full-text index:
      ALTER FULLTEXT INDEX ON Production.Document
          ALTER COLUMN Document ADD STATISTICAL_SEMANTICS
          WITH NO POPULATION;
      …
      ALTER FULLTEXT INDEX ON Production.Document START FULL POPULATION;
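
The KeyphraseDocuments rows on slide 40 hint at how shared key phrases can link documents. The following is only a toy sketch in Python, not SQL Server's semantic similarity algorithm (which scores matches statistically): it pairs documents that share at least one key phrase. The function name `shared_keyphrase_pairs` is hypothetical, and the rows are copied from the slide.

```python
from collections import defaultdict
from itertools import combinations

# KeyphraseDocuments rows as listed on slide 40: (keyphrase, document ID).
keyphrase_documents = [
    ("revenue", "D1"),
    ("growth", "D2"),
    ("Windows", "D3"),
    ("revenue", "D7"),
    ("Windows", "D11"),
    ("Azure", "D11"),
]


def shared_keyphrase_pairs(rows):
    """Map each document pair to the sorted key phrases both documents contain.

    Hypothetical helper for illustration only; real semantic similarity
    is score-based rather than a simple overlap test.
    """
    docs_by_phrase = defaultdict(set)
    for phrase, doc in rows:
        docs_by_phrase[phrase].add(doc)

    pairs = defaultdict(set)
    for phrase, docs in docs_by_phrase.items():
        # Every pair of documents under one phrase shares that phrase.
        for doc_a, doc_b in combinations(sorted(docs), 2):
            pairs[(doc_a, doc_b)].add(phrase)
    return {pair: sorted(phrases) for pair, phrases in pairs.items()}


pairs = shared_keyphrase_pairs(keyphrase_documents)
for (doc_a, doc_b), phrases in sorted(pairs.items()):
    print(doc_a, doc_b, phrases)
```

With the slide's rows, D1 pairs with D7 through "revenue" and D3 pairs with D11 through "Windows", which matches two of the three DocumentSimilarity rows shown. The D1/D2 match on the slide cannot be derived from the listed rows alone.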

Editor's Notes

  • #5: SQL 2008 provides FILESTREAM as a way to add large blobs/unstructured data streams into SQL while still being able to open a Win32 handle (using the SQL API) and get high streaming performance for the data. Win32 namespace support in SQL Server 2012 has the following goals. (1) Reduce the barrier to entry for customers who have data on file servers and Win32 applications that currently work on it. By enabling the Win32 namespace, SQL generates a Windows share that can be exposed to existing Win32 applications like any file server share. This allows Win32 applications and mid-tier servers (like IIS) to work with this data without having to understand database/transaction semantics. (2) A single integrated set of admin tools: SQL backup/restore, replication, HA solutions, etc. (3) Scale up: add multiple disks on a machine for storing FILESTREAM data. (4) Use SQL services like full-text search for both FILESTREAM and relational metadata, plus a property promotion infrastructure for extracting interesting properties from SQL blobs/FILESTREAM data to surface as relational columns for query.
  • #6: RBS API is exposed in RBS client library.
  • #7: The blob ID is generated after close. Now the app can store the blob ID in the RBS column.
  • #10: To get the tran context, you need a transaction. This is a SQL tran.
  • #11: We are reading from SqlFilestream and writing the bytes read into the output buffer.
  • #12: URI: HealthCare.MRI.JoeSmith. Application::GetResourceStream method: returns a resource stream for a resource data file that is located at the specified Uri. Writing into a SqlFileStream: we use a buffer that we read into and write from. FileOptions: 0 => default: buffered reads, no write-through. Because there is no write-through, it might be a bit faster in some cases. The native API shipped first; we wanted client FILESTREAM code to be aggressive about flushing cached writes. The managed SqlFileStream class shipped some time after the native API. If the file access is ReadWrite, the SqlFileStream handle is positioned at the beginning of the file. Use the System.IO seek methods to move the handle.
  • #25: Reading with bigger buffers gives better performance. FS volume: Dedicated volumes means volumes not used for tempdb (non-OS, paging, SQL data & log volumes). If stored files are large, as we generally recommend, format with 64K clusters. Do compress FILESTREAM volumes or FILESTREAM containers, but ONLY if the data to be stored is compressible. Note that in this case the NTFS cluster size must be 4K. One volume per container => enables space management at the volume level. AV should be configured to quarantine infected files rather than delete them; otherwise corruption will be reported. SMB: With 60KB, a read can happen in one single I/O, ideally coming back in one single TCP/IP packet. It is not 64K because 64KB of data can't fit in one single TCP/IP buffer. Partitioning: FILESTREAM columns require the presence of the ROWGUID unique index for aligned partitioning or, in case this is not possible, explicitly specifying the data placement option for the unique or primary key constraint on the ROWGUID column.
  • #28: Optimized hot paths, removed unnecessary serialization, expensive FileSystem operations etc
  • #39: Not the first extraction; another instance. Each has specialty syntax. The user has to just know, and remember. Better to have one construct for all extraction-related BR services.
  • #41: Expose this data to users. Customize: don’t want fancy relationships, just sharing concepts!
  • #42: In all examples: choose value, choose storage. Imagine IntelliSense: start typing, here’s the value!
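
Notes #11 and #25 describe a read-into-buffer, write-to-output loop and say that bigger read buffers perform better. The following is a minimal Python sketch of that loop, assuming in-memory `io.BytesIO` streams as stand-ins for a SqlFileStream and its destination. The `copy_stream` helper and the 600 KB payload are hypothetical, and the read counts only illustrate fewer round trips. They are not measured performance.

```python
import io


def copy_stream(source, destination, buffer_size):
    """Copy source to destination in buffer_size chunks.

    Returns (bytes_copied, read_calls), where read_calls counts only the
    reads that returned data. Hypothetical helper for illustration.
    """
    bytes_copied = 0
    read_calls = 0
    while True:
        chunk = source.read(buffer_size)
        if not chunk:  # empty result means end of stream
            break
        read_calls += 1
        destination.write(chunk)
        bytes_copied += len(chunk)
    return bytes_copied, read_calls


# 600 KB of zero bytes standing in for BLOB data (assumed size).
payload = bytes(600 * 1024)

small = copy_stream(io.BytesIO(payload), io.BytesIO(), 4 * 1024)
large = copy_stream(io.BytesIO(payload), io.BytesIO(), 60 * 1024)

print("4 KB buffer:", small)
print("60 KB buffer:", large)
```

For this payload the 4 KB buffer needs 150 data-returning reads and the 60 KB buffer needs 10. Over SMB each read is a network round trip, which is why the notes care about the buffer size.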