Using and Developing with Open Source Digital Forensics Software in Digital Archives Programs
Mark A. Matienzo
Manuscripts and Archives, Yale University Library
2012 SAA Research Forum
August 7, 2012
Is open source digital forensics software extensible enough and well-suited to support work in the archival domain?
Digital forensics in the archival domain

• Increasing use of digital forensics tools/methodologies
   within the context of digital archives programs
   (Kirschenbaum et al. 2010)

• Technology-focused work (John 2008; Woods & Brown
   2009; AIMS Work Group 2012)

• Methodology-focused work (Duranti 2009; Xie 2011)
Significant barriers to use of digital forensics in archives


• Cost (Kirschenbaum et al. 2010; Daigle 2012)
• Complexity (Kirschenbaum et al. 2010; Daigle 2012)
• Digital archives as an emerging market for forensics
Potential of open source digital forensics software


• Requires additional tool development work to be useful
   for archivists (Kirschenbaum et al. 2010)

• Requires additional integration work (Lee et al. 2012)
Institutional Context

• Focus on implementation of and development with open
  source digital forensics software at Yale University
  Library

• Work must support accessioning, processing, and
  management of born-digital archival material

• The primary focus is records received on legacy media
Design Principles
• Use and develop with open source digital forensics software to support
   accessioning, arrangement, and description of born-digital archival records

• Focus on the first two phases (preservation and searching) of Carrier’s (2005)
   model of the digital investigation process

• Curation micro-services (Abrams et al. 2011) as the philosophical basis to
   guide development and implementation

• Recognition that both disk images, as digital objects (Woods, Lee, and
   Garfinkel 2011), and the objects within disk images need management

• Aim for forensic soundness, but assume much of the original state is lost
Micro-services as Design Philosophy*
Principles
• Granularity
• Orthogonality
• Parsimony
• Evolution
Preferences
• Small and simple over large and complex
• Minimally sufficient over feature-laden
• Configurable over the prescribed
• The proven over the merely novel
• Outcomes over means
Practices
• Define, decompose, recurse
• Top-down design, bottom-up implementation
• Code to interfaces
• Sufficiency through a series of incrementally necessary steps
*UC Curation Center/California Digital Library, 2010
Workflow
• Start accessioning process
• Retrieve media
• Assign identifiers to media
• Write-protect media
• Record identifying characteristics of media as metadata (media metadata)
• Create image (disk image)
• Verify image
• Extract filesystem- and file-level metadata (filesystem/file metadata)
• Package images and metadata for ingest (transfer package: disk images and metadata)
• Ingest transfer package
• Document accessioning process
• End accessioning process
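Read as software, the workflow above is a chain of small, single-purpose steps, in keeping with the micro-services principles of granularity and coding to interfaces. The following is a minimal Python sketch of that idea, not the implementation used at Yale: the package dictionary, the step functions, and the identifier scheme (accession number plus a zero-padded media sequence number, inferred from image file names such as 2004-M-088.0007.dd) are all hypothetical, and hardware-dependent steps such as write-protecting media or creating the image are only logged.

```python
from typing import Callable, Dict, List

# Hypothetical structure: a "package" is a dict accumulating identifiers,
# metadata, and a log of events as it moves through the pipeline.
Package = Dict[str, object]
Step = Callable[[Package], Package]


def assign_identifier(pkg: Package) -> Package:
    # Hypothetical scheme: accession number + zero-padded media sequence number
    pkg["media_id"] = f"{pkg['accession']}.{pkg['sequence']:04d}"
    return pkg


def record_media_metadata(pkg: Package) -> Package:
    # Record identifying characteristics of the media as metadata
    pkg.setdefault("media_md", {})["type"] = pkg.get("media_type", "unknown")
    return pkg


def log_event(name: str) -> Step:
    # Factory for placeholder steps that only record that they ran; stands in
    # for manual or hardware-dependent work such as imaging.
    def step(pkg: Package) -> Package:
        pkg.setdefault("events", []).append(name)
        return pkg
    return step


def run_pipeline(pkg: Package, steps: List[Step]) -> Package:
    # Every step shares one interface (package in, package out), so steps
    # can be replaced, reordered, or tested in isolation.
    for step in steps:
        pkg = step(pkg)
    return pkg


ACCESSIONING: List[Step] = [
    assign_identifier,
    log_event("write-protect media"),
    record_media_metadata,
    log_event("create image"),
    log_event("verify image"),
    log_event("extract filesystem- and file-level metadata"),
    log_event("package images and metadata for ingest"),
]

if __name__ == "__main__":
    result = run_pipeline(
        {"accession": "2004-M-088", "sequence": 7, "media_type": "3.5in floppy"},
        ACCESSIONING,
    )
    print(result["media_id"])  # 2004-M-088.0007
    print(result["events"])
```

Because each step has the same signature, swapping a logged placeholder for a real implementation does not change the pipeline itself.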
Disk Image Acquisition
• Requires a combination of hardware (drives/media
   readers, controller cards, write blockers) and software

• In some cases, software depends on particular hardware
• Software tested: FTK Imager (proprietary/gratis),
   hardware-specific solutions (FC5025 WinDIB; KryoFlux
   DTC/GUI; Catweasel Imagetool3)

• Goal: a sector image that multiple tools can interpret
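A raw sector image should contain a whole number of sectors, and its digests should be recorded when it is created and rechecked during the "Verify image" step. The sketch below checks both, assuming plain raw (dd-style) images and Python's standard hashlib; the helper names are hypothetical, and the expected size uses the standard geometry of a 3.5" high-density floppy (80 cylinders x 2 heads x 18 sectors x 512 bytes).

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Optional

SECTOR_SIZE = 512  # bytes per sector on floppies and most hard disks (CD-ROM data sectors hold 2048)

# Standard 3.5" high-density floppy: 80 cylinders x 2 heads x 18 sectors x 512 bytes
HD_35_FLOPPY_BYTES = 80 * 2 * 18 * SECTOR_SIZE  # 1,474,560 bytes


def image_digests(path: Path, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Compute MD5 and SHA-1 of an image in chunks (1 MiB by default)."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


def check_sector_image(path: Path, expected_bytes: Optional[int] = None,
                       sector_size: int = SECTOR_SIZE) -> List[str]:
    """Return a list of problems with a raw image's size (empty if none)."""
    problems = []
    size = path.stat().st_size
    if size % sector_size:
        problems.append(f"size {size} is not a multiple of {sector_size}")
    if expected_bytes is not None and size != expected_bytes:
        problems.append(f"size {size} differs from expected {expected_bytes}")
    return problems


def verify_image(path: Path, recorded: Dict[str, str]) -> bool:
    """Recompute digests and compare them with those recorded at imaging time."""
    current = image_digests(path)
    return all(current.get(alg) == value.lower() for alg, value in recorded.items())
```

Reading in fixed-size chunks keeps memory use constant, which matters more for CD and DVD images than for floppies.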
Analysis Process
• Multiple levels of analysis within digital forensics based
   on layers of abstraction (Carrier 2003)

• Conceptual links between these metadata extraction/analysis
   processes and the digital curation/archival domain

[Figure after Carrier, 2003]
Metadata Extraction
• Use open source digital forensics software (Sleuth Kit,
   fiwalk) and other open source tools to characterize
   media, volume, file system, and file information

• Attempt to repurpose this information as descriptive,
   structural, and/or technical metadata to support
   accessioning, appraisal, and processing
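One way extracted file-system metadata might be repurposed as descriptive metadata is to derive an inclusive date range for an accession from file modification times. This is a minimal sketch, assuming ISO 8601 timestamps in the form fiwalk writes them (e.g. 2001-02-22T22:30:52Z); the function names are hypothetical, the 1980 cut-off is an arbitrary choice made here (FAT dates cannot precede 1980), and timestamps on legacy media should still be checked before they inform a finding aid.

```python
from datetime import datetime
from typing import Iterable, Optional


def parse_dfxml_time(value: Optional[str]) -> Optional[datetime]:
    """Parse a timestamp such as '2001-02-22T22:30:52Z'; None if unparseable."""
    try:
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    except (TypeError, ValueError):
        return None


def date_range(mtimes: Iterable[Optional[str]], earliest_plausible: int = 1980) -> str:
    """Summarize modification times as an inclusive year range, e.g. '1997-2001'.

    Years before `earliest_plausible` (an arbitrary cut-off) are treated as
    unset clocks or epoch defaults and skipped.
    """
    years = sorted(
        dt.year
        for dt in map(parse_dfxml_time, mtimes)
        if dt is not None and dt.year >= earliest_plausible
    )
    if not years:
        return "undated"
    first, last = years[0], years[-1]
    return str(first) if first == last else f"{first}-{last}"
```

An archivist would treat the result as a starting point for description, not as a final date statement.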
The Sleuth Kit
• Open source C library, command line tools, and GUI
  application (Autopsy) for forensic analysis

• Supports analysis of FAT, NTFS, ISO9660, HFS+, Ext2/3,
  UFS1/2

• Splits tools into layers: volume system, file system, file
  name, metadata, data unit (“block”)

• Additional utilities to sort and post-process extracted
  metadata
Digital Forensics XML
• Representation in XML of structured forensic information,
  developed by Simson Garfinkel

• Produced by tools including fiwalk (Garfinkel 2012),
  which uses the Sleuth Kit for volume, file system, file, and
  application-level analysis

• Easily extensible, with a focus on local plugin development
• Straightforward to process
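Because DFXML is plain XML, standard libraries can process it. The following sketch uses Python's xml.etree.ElementTree.iterparse to stream through fiwalk output and tally files, bytes, and unallocated entries without loading the whole document into memory. It assumes file objects carry filesize and unalloc elements, as in the sample fiwalk output in this deck, and it strips namespaces so that it works whether or not the elements are namespaced; the function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict


def local(tag: str) -> str:
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_dfxml(source) -> Dict[str, int]:
    """Stream through DFXML and tally file objects (source: path or file object)."""
    stats: Counter = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "fileobject":
            continue
        fields = {local(child.tag): (child.text or "") for child in elem}
        stats["files"] += 1
        stats["bytes"] += int(fields.get("filesize", "0") or 0)
        # Assumption: <unalloc>1</unalloc> marks an unallocated (e.g. deleted) entry
        if fields.get("unalloc") == "1":
            stats["unallocated"] += 1
        elem.clear()  # free the processed subtree to keep memory use low
    return dict(stats)


if __name__ == "__main__":
    demo = b"<dfxml><volume><fileobject><filesize>10</filesize><unalloc>1</unalloc></fileobject></volume></dfxml>"
    print(summarize_dfxml(io.BytesIO(demo)))  # {'files': 1, 'bytes': 10, 'unallocated': 1}
```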
Results
Disk Images
• Acquired 1,039 disk images from across 69 accessions at
  Manuscripts and Archives

Disk images by media type:
• CDs: 422
• 3.5” floppies: 312
• DVDs: 185
• 5.25” floppies: 94
• Zip disks: 26
Metadata Extraction
• Ran metadata extraction on 812 images

File systems within images:
• ISO9660: 386
• FAT12: 246
• Unidentified: 155
• HFS+: 14
• FAT16: 11
Metadata Extraction
• Ran enhanced metadata extraction on 619 images (using
  fiwalk plugins developed during this research)

• Performed analysis on 49,724 files within images
• Successfully identified 43,729 files (147 unique file types)
  against the PRONOM format registry

• Identified 9 files as containing virus signatures (2 unique
  virus signatures)
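Expressed as proportions of the 49,724 files analyzed, the counts above work out as follows (simple arithmetic on the reported figures):

```latex
\frac{43{,}729}{49{,}724} \approx 0.879 \ (\approx 87.9\%\ \text{identified}), \qquad
49{,}724 - 43{,}729 = 5{,}995\ \text{not identified}, \qquad
\frac{9}{49{,}724} \approx 0.018\%\ \text{matching virus signatures}
```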
Identified MIME Types by OPF FIDO (36,320 total matches)
• image/tiff: 12,429
• image/jpeg: 8,219
• application/msword: 5,008
• text/html: 3,558
• application/pdf: 3,111
• image/gif: 756; image/bmp: 499; image/x-pict: 485; application/x-gzip: 480; image/vnd.dwg: 395; message/rfc822: 280; application/postscript: 208
• application/zip: 152; application/octet-stream: 105; text/plain: 102; video/mpeg: 100; application/java-archive: 89; image/x-sgi-bw: 71; text/xml: 58; application/vnd.lotus-1-2-3: 40
• image/png: 23; text/css: 22; video/x-msvideo: 17; video/quicktime: 17; application/rtf: 15; application/xml: 11; audio/mpeg: 10; application/vnd.ms-powerpoint: 10; application/javascript: 10
• image/vnd.dxf: 6; audio/x-wav: 6; audio/prs.sid: 4; application/vnd.ms-excel: 4; application/inf: 4; video/x-ms-wmv: 3; audio/x-ms-wma: 3; application/xhtml+xml: 3; application/x-endnote-refer: 3; image/vnd.microsoft.icon: 2; application/x-shockwave-flash: 1; application/x-director: 1
Software Development
• Created fiwalk plugins to perform additional analysis
  and evaluation of files/bitstreams within disk images

• Virus identification plugin using ClamAV/pyclamd
• File format identification against the PRONOM format
  registry using the Open Planets Foundation’s FIDO

• Code (including additional plugins) available online:
  https://github.com/anarchivist/fiwalk-dgi/
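The actual plugins are in the repository above. Purely to illustrate the general shape of such a plugin, the following hypothetical Python script identifies a file from its leading bytes and reports the result as a `name: value` line. The assumption that fiwalk invokes a plugin with the path of an extracted file and reads `name: value` lines from standard output should be checked against the fiwalk documentation, and the short signature table is a deliberately simplified stand-in for FIDO's PRONOM-based identification, not a substitute for it.

```python
#!/usr/bin/env python3
"""Hypothetical fiwalk-style plugin: identify a file by its leading bytes.

Assumed interface: called with the path of an extracted file; prints
'name: value' lines on standard output.
"""
import sys

# (magic prefix, MIME type) pairs for a few well-known formats
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
    (b"\xffWPC", "application/vnd.wordperfect"),
]


def identify(header: bytes) -> str:
    """Return the MIME type of the first matching signature, else a generic type."""
    for magic, mime in SIGNATURES:
        if header.startswith(magic):
            return mime
    return "application/octet-stream"


def report(path: str) -> str:
    """Read the first bytes of `path` and format a 'name: value' result line."""
    with open(path, "rb") as fh:
        header = fh.read(16)
    return f"magic_mime: {identify(header)}"


if __name__ == "__main__":
    # Assumed invocation: magic_plugin.py /path/to/extracted/file
    for arg in sys.argv[1:]:
        print(report(arg))
```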
Gumshoe
• Prototype based on Blacklight (Ruby on Rails + Solr)
• Indexing code works with fiwalk output or directly from a
   disk image

• Populates Solr index with all file-level metadata from
   fiwalk and, optionally, text strings extracted from files

• Provides searching, sorting and faceting based on
   metadata extracted from filesystems and files

• Code at http://github.com/anarchivist/gumshoe
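Gumshoe's own indexing code is in the repository above. The sketch below only illustrates the general idea of flattening fiwalk file records into search documents and computing facet counts; it does not talk to Solr, and its field names are hypothetical, borrowing the `_s`/`_i`/`_b`/`_dt` dynamic-field suffixes found in Solr's example schemas rather than Gumshoe's actual schema.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def to_solr_doc(image: str, fobj: Dict[str, str]) -> Dict[str, object]:
    """Flatten one fiwalk file record (element name -> text) into a Solr-style doc.

    Field names are hypothetical, not Gumshoe's schema.
    """
    doc: Dict[str, object] = {
        "id": f"{image}/{fobj.get('id', '')}",
        "image_s": image,
        "filename_s": fobj.get("filename", ""),
        "filesize_i": int(fobj.get("filesize", 0) or 0),
        "libmagic_s": fobj.get("libmagic", ""),
        "deleted_b": fobj.get("unalloc") == "1",
    }
    if fobj.get("mtime"):
        doc["mtime_dt"] = fobj["mtime"]  # already ISO 8601 in fiwalk output
    return doc


def facet_counts(docs: Iterable[Dict[str, object]], field: str) -> List[Tuple[object, int]]:
    """Mimic a facet on `field`: value counts, most frequent first."""
    return Counter(doc[field] for doc in docs if field in doc).most_common()
```

In a real deployment the documents would be posted to Solr and the facet counts computed by Solr itself; counting locally simply makes the idea visible.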
Advantages
• It is faster (and more forensically sound) to extract metadata
  once than to process an image repeatedly

• Possibility of developing better assessments during the
  accessioning process (significance of directory structure,
  accuracy of timestamps)

• Integrating additional extraction processes and building
  supplemental tools is simple

• Tool performance correlates with the complexity of the analysis
Limitations
• Use of tools is limited to specific types of file systems
• Additional software (particularly for documenting the imaging
  process) requires further integration and data
  normalization

• DFXML is not (currently) a common metadata format
  within the archives and library domains, and requires a
  domain-specific application profile

• Extracted metadata may be harder to repurpose for
  descriptive purposes because of its level of granularity
Work in Progress
• BitCurator project under development; early release
   available for testing: http://wiki.bitcurator.net

• The Sleuth Kit and related tools under continuing
   development (Autopsy, fiwalk, etc.): http://sleuthkit.org

• Additional testing, development, and integration work under
   way at Yale and NYPL
Thanks!
   Mark A. Matienzo
mark.matienzo@yale.edu
  http://matienzo.org
      @anarchivist
References
•   Abrams, S., et al. (2011). “Curation Micro-Services: A Pipeline Metaphor for Repositories.” Journal of Digital Information 12(2). http://journals.tdl.org/jodi/article/view/1605

•   AIMS Work Group (2012). AIMS Born-Digital Collections: An Inter-Institutional Model for Stewardship. http://www2.lib.virginia.edu/aims/whitepaper/

•   Carrier, B. (2003). “Defining Digital Forensic Examination and Analysis Tools Using Abstraction Layers.” International Journal of Digital Evidence 1(4).

•   Carrier, B. (2005). File System Forensic Analysis. Boston and London: Addison-Wesley.

•   Daigle, B.J. (2012). “The Digital Transformation of Special Collections.” Journal of Library Administration 52(3-4), 244-264.

•   Duranti, L. (2009). “From Digital Diplomatics to Digital Records Forensics.” Archivaria 68, 39-66.

•   Garfinkel, S. (2012). “Digital Forensics XML and the DFXML Toolset.” Digital Investigation 8, 161-174.

•   John, J.L. (2008). “Adapting Existing Technologies for Digitally Archiving Personal Lives: Digital Forensics, Ancestral Computing, and Evolutionary
    Perspectives and Tools.” Presented at iPRES 2008. http://www.bl.uk/ipres2008/presentations_day1/09_John.pdf

•   Kirschenbaum, M.G., et al. (2010). Digital Forensics and Born-Digital Content in Cultural Heritage Collections. Washington: Council on Library and
    Information Resources.

•   Lee, C.A., et al. (2012). “BitCurator: Tools and Techniques for Digital Forensics in Collecting Institutions.” D-Lib Magazine 18(5/6).

•   UC Curation Center/California Digital Library (2010). “UC3 Curation Foundations.” Revision 0.13. https://confluence.ucop.edu/download/attachments/13860983/UC3-Foundations-latest.pdf

•   Woods, K. and Brown, G. (2009). “From Imaging to Access: Effective Preservation of Legacy Removable Media.” In Archiving 2009. Springfield, VA:
    Society for Imaging Science and Technology.

•   Woods, K., Lee, C.A., and Garfinkel, S. (2011). “Extending Digital Repository Architectures to Support Disk Image Preservation and Access.” In JCDL ’11.

•   Xie, S.L. (2011). “Building Foundations for Digital Records Forensics: A Comparative Study of the Concept of Reproduction in Digital Records Management
    and Digital Forensics.” American Archivist 74(2), 576-599.
Sleuth Kit example
$ fsstat -t 2004-M-088.0007.dd
fat12

$ fls -a -m A: 2004-M-088.0007.dd
0|A:/DRURY|3|r/rrwxrwxrwx|0|0|1281|1284955200|871048826|0|0
0|A:/BEARD.897|4|r/rrwxrwxrwx|0|0|2392|1284955200|871054862|0|0
0|A:/_P}WP{2 (deleted)|5|r/rrwxrwxrwx|0|0|2392|0|871054894|0|0
0|A:/$MBR|45779|v/v---------|0|0|512|0|0|0|0
0|A:/$FAT1|45780|v/v---------|0|0|4608|0|0|0|0
0|A:/$FAT2|45781|v/v---------|0|0|4608|0|0|0|0
0|A:/$OrphanFiles|45782|d/d---------|0|0|0|0|0|0|0

$ icat 2004-M-088.0007.dd 4 | file -
/dev/stdin: (Corel/WP)

$ icat 2004-M-088.0007.dd 4 | strings | head -n 6
WPCN
Courier 10cpi
HP LaserJet+
HPLASERJ.PRS
Cowles Foundation for Research in Economics
Yale University

$ tsk_recover -a 2004-M-088.0007.dd /tmp
Files Recovered: 2
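The `fls -m` output above is in the Sleuth Kit's "body" format, which its mactime utility turns into a timeline. As a rough illustration of that post-processing (not a replacement for mactime), the sketch below parses body lines and lists modification times oldest first. It assumes the TSK 3.x field order MD5|name|inode|mode_as_string|UID|GID|size|atime|mtime|ctime|crtime, which should be verified against the installed version; the function names are hypothetical. FAT stores local time without a time zone, so the UTC values shown depend on the time zone fls assumed (see its -z option).

```python
import sys
from datetime import datetime, timezone
from typing import List, NamedTuple


class BodyEntry(NamedTuple):
    name: str
    inode: str  # kept as text: NTFS addresses look like '3-128-1'
    mode: str
    size: int
    atime: int
    mtime: int
    ctime: int
    crtime: int


def parse_body(text: str) -> List[BodyEntry]:
    """Parse body-file lines (assumed TSK 3.x field order), skipping malformed ones."""
    entries = []
    for line in text.splitlines():
        # Split from the right so a '|' inside a file name cannot shift the fields
        parts = line.rsplit("|", 9)
        if len(parts) != 10 or "|" not in parts[0]:
            continue
        md5_and_name, inode, mode, _uid, _gid, size, a, m, c, cr = parts
        name = md5_and_name.split("|", 1)[1]
        entries.append(BodyEntry(name, inode, mode, int(size),
                                 int(a), int(m), int(c), int(cr)))
    return entries


def timeline(entries: List[BodyEntry]) -> List[str]:
    """Format modification times (UTC), oldest first, skipping unset (0) values."""
    rows = []
    for e in sorted(entries, key=lambda e: e.mtime):
        if e.mtime == 0:
            continue
        ts = datetime.fromtimestamp(e.mtime, tz=timezone.utc)
        rows.append(f"{ts:%Y-%m-%d %H:%M:%S} UTC  {e.size:>8}  {e.name}")
    return rows


if __name__ == "__main__":
    # Assumed usage: fls -a -m A: image.dd > body.txt && python body_timeline.py body.txt
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for row in timeline(parse_body(fh.read())):
                print(row)
```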
Sample DFXML Output
<?xml version='1.0' encoding='UTF-8'?>
<dfxml version='1.0'>
  <metadata
  xmlns='http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML'
  xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
  xmlns:dc='http://purl.org/dc/elements/1.1/'>
    <dc:type>Disk Image</dc:type>
  </metadata>
  <creator version='1.0'>
    <!-- provenance information re: extraction - software used; operating system -->
  </creator>
  <source>
    <image_filename>2004-M-088.0018.dd</image_filename>
  </source>
  <volume offset='0'><!-- partitions within each disk image -->
      <fileobject><!-- files within each partition --></fileobject>
  </volume>
  <runstats><!-- performance and other statistics --></runstats>
</dfxml>
Sample DFXML Output
<fileobject>
  <filename>_ublist1.wpd</filename>
  <partition>1</partition>
  <id>1</id>
  <name_type>r</name_type>
  <filesize>202152</filesize>
  <unalloc>1</unalloc>
  <used>1</used>
  <inode>3</inode>
  <meta_type>1</meta_type>
  <mode>511</mode>
  <nlink>0</nlink>
  <uid>0</uid>
  <gid>0</gid>
  <mtime>2001-02-22T22:30:52Z</mtime>
  <atime>2001-02-22T05:00:00Z</atime>
  <crtime>2001-02-22T22:31:54Z</crtime>
  <libmagic>(Corel/WP)</libmagic>
  <byte_runs>
   <byte_run file_offset='0' fs_offset='16896' img_offset='16896' len='512'/>
  </byte_runs>
  <hashdigest type='md5'>d7bc22242c0a88fd8b68712980d5ab28</hashdigest>
  <hashdigest type='sha1'>64bf2bdf82e33fcda50158804483ac611e753db5</hashdigest>
</fileobject>
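A single fileobject like the one above already carries technical metadata (size, timestamps, digests) and structural information (where its bytes sit in the image). The sketch below flattens it into a simple record; the output keys are illustrative rather than any standard application profile, and it assumes un-namespaced elements as in this sample (if elements are in the DFXML namespace, the element names passed to findtext and iter need that namespace).

```python
import xml.etree.ElementTree as ET
from typing import Dict


def fileobject_record(xml_text: str) -> Dict[str, object]:
    """Turn one DFXML <fileobject> into a flat technical-metadata record."""
    fo = ET.fromstring(xml_text)
    return {
        "filename": fo.findtext("filename"),
        "size": int(fo.findtext("filesize", "0")),
        "allocated": fo.findtext("unalloc") != "1",
        "modified": fo.findtext("mtime"),
        "created": fo.findtext("crtime"),
        "format_hint": fo.findtext("libmagic"),
        # hashdigest elements are keyed by their 'type' attribute (md5, sha1, ...)
        "digests": {h.get("type"): h.text for h in fo.iter("hashdigest")},
        # (offset within the disk image, length) for each run of the file's bytes
        "image_extents": [
            (int(r.get("img_offset")), int(r.get("len")))
            for r in fo.iter("byte_run")
        ],
    }
```

The image_extents list is what would let a later tool read the file's bytes straight from the disk image, without re-running the file-system analysis.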

Network Security Unit 5.pdf for BCA BBA.
Electronic commerce courselecture one. Pdf
Encapsulation theory and applications.pdf
“AI and Expert System Decision Support & Business Intelligence Systems”

Using and Developing with Open Source Digital Forensics Software in Digital Archives Programs

  • 1. Using and Developing with Open Source Digital Forensics Software in Digital Archives Programs Mark A. Matienzo Manuscripts and Archives, Yale University Library 2012 SAA Research Forum August 7, 2012
  • 2. Is open source digital forensics software extensible enough and well-suited to support work in the archival domain?
  • 3. Digital forensics in the archival domain • Increasing use of digital forensics tools/methodologies within the context of digital archives programs (Kirschenbaum et al. 2010) • Technology-focused work (John 2008; Woods & Brown 2009; AIMS Work Group 2012) • Methodology-focused work (Duranti 2009; Xie 2011)
  • 4. Significant barriers to use of digital forensics in archives • Cost (Kirschenbaum et al. 2010; Daigle 2012) • Complexity (Kirschenbaum et al. 2010; Daigle 2012) • Digital archives as an emerging market for forensics
  • 5. Potential of open source digital forensics software • Requires additional tool development work to be useful for archivists (Kirschenbaum et al. 2010) • Requires additional integration work (Lee et al. 2012)
  • 6. Institutional Context • Focus on implementation of and development with open source digital forensics software at Yale University Library • Work must support accessioning, processing, and management of born-digital archival material • The primary focus is records received on legacy media
  • 7. Design Principles • Use and develop with open source digital forensics software to support accessioning, arrangement, and description of born-digital archival records • Focus on the first two phases (preservation and searching) of Carrier’s (2005) model of the digital investigation process • Curation micro-services (Abrams, et al. 2010) as the philosophical basis to guide development and implementation • Recognition that both disk images, as digital objects (Woods, Lee, and Garfinkel 2011), and the objects within disk images need management • Aim for forensic soundness, but assume that much of the original state has been lost
  • 8. Micro-services as Design Philosophy*
      Principles: Granularity; Orthogonality; Parsimony; Evolution
      Preferences: Small and simple over large and complex; Minimally sufficient over feature-laden; Configurable over the prescribed; The proven over the merely novel; Outcomes over means
      Practices: Define, decompose, recurse; Top down design, bottom up implementation; Code to interfaces; Sufficiency through a series of incrementally necessary steps
      *UC Curation Center/California Digital Library, 2010
  • 9. Workflow
      [Diagram: accessioning workflow, from "Start accessioning process" to "End accessioning process". Steps shown: retrieve media; assign identifiers to media; record identifying characteristics of media as metadata; write-protect media; create image; verify image; extract filesystem- and file-level metadata; document accessioning process; package images and metadata for ingest; transfer package; ingest transfer package. Outputs shown: media metadata, disk images, filesystem/file metadata.]
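A minimal sketch of how the micro-service principles and the accessioning workflow might translate into code: each step is a small function with the same interface (take a package, return a package), so steps can be added, reordered, or replaced independently. The `Package` class, the service names, and the `accession.NNNN` identifier pattern are hypothetical illustrations, not code from the Yale workflow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Package:
    """Hypothetical accession package handed from one micro-service to the next."""
    accession: str
    metadata: Dict[str, object] = field(default_factory=dict)
    events: List[str] = field(default_factory=list)


# Every service shares one small interface: take a package, return a package.
Service = Callable[[Package], Package]


def assign_identifier(pkg: Package) -> Package:
    # Hypothetical scheme: accession number plus a zero-padded media sequence number.
    count = int(pkg.metadata.get("media_count", 0)) + 1
    pkg.metadata["media_count"] = count
    pkg.metadata["media_id"] = f"{pkg.accession}.{count:04d}"
    pkg.events.append("assign_identifier")
    return pkg


def record_media_characteristics(pkg: Package) -> Package:
    # Placeholder: a real service would capture media type, labels, condition, etc.
    pkg.metadata.setdefault("media_type", "unrecorded")
    pkg.events.append("record_media_characteristics")
    return pkg


def document_process(pkg: Package) -> Package:
    # Summarize which services ran, in order, before this one.
    pkg.metadata["process_log"] = " -> ".join(pkg.events)
    pkg.events.append("document_process")
    return pkg


def run_pipeline(pkg: Package, services: List[Service]) -> Package:
    """Apply each service in turn; services stay independent and swappable."""
    for service in services:
        pkg = service(pkg)
    return pkg


if __name__ == "__main__":
    result = run_pipeline(
        Package("2004-M-088"),
        [assign_identifier, record_media_characteristics, document_process],
    )
    print(result.metadata["media_id"])  # 2004-M-088.0001
```

Because every service has the same signature, a new step (say, a virus scan) slots in by adding one function to the list, which is the "code to interfaces" practice in miniature.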
  • 10. Disk Image Acquisition • Requires a combination of hardware (drives/media readers, controller cards, write blockers) and software • In some cases, software depends on particular hardware • Software tested: FTK Imager (proprietary/gratis), hardware-specific solutions (FC5025 WinDIB; KryoFlux DTC/GUI; Catweasel Imagetool3) • Goal: sector image interpretable by multiple tools
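The slides do not detail how the workflow's "Verify image" step is carried out; one common approach is to record checksums at acquisition and recompute them later. A minimal sketch, assuming a raw sector image on disk and MD5/SHA-1 (the two digest types that appear in the DFXML sample); the function names are hypothetical.

```python
import hashlib
from typing import Dict


def image_digests(path: str, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Compute MD5 and SHA-1 of a raw (dd-style) image, reading it in chunks."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as f:
        # Chunked reads keep memory flat even for multi-gigabyte DVD images.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


def verify_image(path: str, expected: Dict[str, str]) -> bool:
    """True only if every recorded digest matches the recomputed one."""
    actual = image_digests(path)
    return all(actual.get(alg) == value.lower() for alg, value in expected.items())
```

Recording both digests at acquisition time and storing them with the media metadata lets any later tool confirm it is reading the same bit sequence.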
  • 12. Analysis Process • Multiple levels of analysis within digital forensics, based on layers of abstraction (Carrier 2003) • Conceptual linkages between these levels of forensic analysis and metadata extraction/analysis processes in the digital curation/archival domain [Figure: Carrier, 2003]
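To make the layer idea concrete, here is a sketch of the step that moves from raw bytes to file-system-layer facts for a floppy image: reading the FAT boot sector. Offsets follow the published Microsoft FAT on-disk layout. This is an illustration written for this summary, not how the Sleuth Kit implements `fsstat`, and it assumes a FAT12/FAT16 boot sector at offset 0 with no partition table, as on floppies.

```python
import struct


def parse_fat_boot_sector(sector: bytes) -> dict:
    """Read file-system-layer facts from a FAT12/FAT16 boot sector (first 512 bytes)."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a FAT boot sector: missing 0x55AA signature")
    # BIOS Parameter Block fields, little-endian, starting at offset 0x0B.
    (bytes_per_sector, sectors_per_cluster, reserved_sectors, num_fats,
     root_entries, total_sectors_16, media_descriptor,
     sectors_per_fat) = struct.unpack_from("<HBHBHHBH", sector, 0x0B)
    if bytes_per_sector == 0 or sectors_per_cluster == 0:
        raise ValueError("implausible BIOS Parameter Block")
    # The 16-bit total is 0 when the volume needs the 32-bit field at 0x20.
    total_sectors = total_sectors_16 or struct.unpack_from("<I", sector, 0x20)[0]
    # The cluster count, not the label string, decides the FAT type.
    root_dir_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    data_sectors = total_sectors - (reserved_sectors + num_fats * sectors_per_fat
                                    + root_dir_sectors)
    cluster_count = data_sectors // sectors_per_cluster
    info = {
        "oem_name": sector[3:11].decode("ascii", "replace").rstrip(),
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "num_fats": num_fats,
        "root_entries": root_entries,
        "media_descriptor": f"0x{media_descriptor:02X}",
        "total_sectors": total_sectors,
        "size_bytes": total_sectors * bytes_per_sector,
        "cluster_count": cluster_count,
        "fat_type": ("FAT12" if cluster_count < 4085
                     else "FAT16" if cluster_count < 65525 else "FAT32"),
    }
    # Extended boot signature 0x29: serial number, label and type string follow.
    if sector[0x26] == 0x29:
        info["volume_serial"] = f"{struct.unpack_from('<I', sector, 0x27)[0]:08X}"
        info["volume_label"] = sector[0x2B:0x36].decode("ascii", "replace").rstrip()
        info["fs_type_label"] = sector[0x36:0x3E].decode("ascii", "replace").rstrip()
    return info
```

Usage on an image file: `with open("2004-M-088.0007.dd", "rb") as f: parse_fat_boot_sector(f.read(512))`. For a 1.44 MB floppy the arithmetic yields 2,847 clusters, well under the FAT12 limit of 4,085.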
  • 13. Metadata Extraction • Use open source digital forensics software (Sleuth Kit, fiwalk) and other open source tools to characterize media, volume, file system, and file information • Attempt to repurpose this information as descriptive, structural, and/or technical metadata to support accessioning, appraisal, and processing
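One way extracted technical metadata could be repurposed as descriptive metadata is deriving a date span from file modification times. A minimal sketch, assuming ISO 8601 UTC timestamps like those in DFXML (`2001-02-22T22:30:52Z`). The 1980 lower bound is an assumption (it is the earliest date FAT can store), meant to drop zeroed or implausible values, and should be tuned per collection and file system.

```python
from datetime import datetime
from typing import Iterable, List, Optional


def date_span(timestamps: Iterable[Optional[str]], not_before: int = 1980,
              not_after: Optional[int] = None) -> Optional[str]:
    """Summarize file timestamps as a year ('2001') or a span ('1997-2001')."""
    years: List[int] = []
    for ts in timestamps:
        if not ts:
            continue  # missing timestamp
        # fromisoformat() in older Pythons does not accept a trailing 'Z'.
        year = datetime.fromisoformat(ts.replace("Z", "+00:00")).year
        if year < not_before or (not_after is not None and year > not_after):
            continue  # outside the plausibility window; a candidate for review
        years.append(year)
    if not years:
        return None
    first, last = min(years), max(years)
    return str(first) if first == last else f"{first}-{last}"
```

Excluded values are exactly the ones worth a human look during accessioning, since the slides flag timestamp accuracy as an open assessment question.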
  • 14. The Sleuth Kit • Open source C library, command-line tools, and GUI application (Autopsy) for forensic analysis • Supports analysis of FAT, NTFS, ISO9660, HFS+, Ext2/3, and UFS1/2 • Splits tools into layers: volume system, file system, file name, metadata, and data unit (“block”) • Additional utilities to sort and post-process extracted metadata
  • 15. Digital Forensics XML • XML representation of structured forensic information, developed by Simson Garfinkel • Produced by tools including fiwalk (Garfinkel 2012), which uses the Sleuth Kit for volume, file system, file, and application-level analysis • Easily extensible (with local plugin development as a focus) • Straightforward to process
  • 17. Disk Images • Acquired 1,039 disk images from across 69 accessions at Manuscripts and Archives • Disk images by media type: CDs, 422; 3.5” floppies, 312; DVDs, 185; 5.25” floppies, 94; Zip disks, 26
  • 18. Metadata Extraction • Ran metadata extraction on 812 images • File systems within images: ISO9660, 386; FAT12, 246; unidentified, 155; HFS+, 14; FAT16, 11
  • 19. Metadata Extraction • Ran enhanced metadata extraction on 619 images (using plugins for fiwalk developed during this research) • Performed analysis on 49,724 files within images • Successfully identified 43,729 files (147 unique file types) against the PRONOM format registry • Identified 9 files as containing virus signatures (2 unique virus signatures)
  • 20. Identified MIME Types by OPF FIDO (36,320 total matches) • Most frequent: image/tiff, 12,429; image/jpeg, 8,219; application/msword, 5,008; text/html, 3,558; application/pdf, 3,111 • Other identified types (756 or fewer matches each): image/gif, image/bmp, image/x-pict, application/x-gzip, image/vnd.dwg, message/rfc822, application/postscript, application/zip, application/octet-stream, text/plain, video/mpeg, application/java-archive, image/x-sgi-bw, text/xml, application/vnd.lotus-1-2-3, image/png, text/css, video/x-msvideo, video/quicktime, application/rtf, application/xml, audio/mpeg, application/vnd.ms-powerpoint, application/javascript, image/vnd.dxf, audio/x-wav, audio/prs.sid, application/vnd.ms-excel, application/inf, video/x-ms-wmv, audio/x-ms-wma, application/xhtml+xml, application/x-endnote-refer, image/vnd.microsoft.icon, application/x-shockwave-flash, application/x-director
  • 21. Software Development • Created fiwalk plugins to perform additional analysis and evaluation of files/bitstreams within disk images • Virus identification plugin using ClamAV/pyclamd • File format identification against the PRONOM format registry using the Open Planets Foundation’s FIDO • Code (including additional plugins) available online: https://github.com/anarchivist/fiwalk-dgi/
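FIDO matches files against PRONOM signatures, which can include offsets, end-of-file sequences, and variable patterns, and it reports PRONOM identifiers (PUIDs). The sketch below shows only the underlying idea, leading-byte ("magic number") matching, plus a tally like the one behind the MIME-type chart. The signature table is a small hypothetical subset chosen for illustration; it is not FIDO's API or PRONOM data.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical, heavily reduced signature table: leading bytes -> MIME type.
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"II*\x00", "image/tiff"),          # little-endian TIFF
    (b"MM\x00*", "image/tiff"),          # big-endian TIFF
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xffWPC", "application/vnd.wordperfect"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the MIME type of the first matching signature, or None."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


def tally(blobs: Iterable[bytes]) -> Counter:
    """Count identifications across many files; misses go under '(unidentified)'."""
    return Counter(identify(b) or "(unidentified)" for b in blobs)
```

Only the first few bytes of each file are needed, so a plugin could read a short prefix from each bitstream rather than the whole file.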
  • 22. Gumshoe • Prototype based on Blacklight (Ruby on Rails + Solr) • Indexing code works with fiwalk output or directly from a disk image • Populates a Solr index with all file-level metadata from fiwalk and, optionally, text strings extracted from files • Provides searching, sorting, and faceting based on metadata extracted from file systems and files • Code at http://github.com/anarchivist/gumshoe
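Gumshoe relies on Solr for faceting. As a sketch of what a facet computes (not Solr or Blacklight code), the functions below count the values of one metadata field across file records and narrow the records to a selected value. The field names follow the DFXML sample (`filename`, `libmagic`, `partition`); the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def facet_counts(records: Iterable[Record], field: str,
                 limit: int = 10) -> List[Tuple[str, int]]:
    """Count distinct values of one field, most frequent first (a facet list)."""
    counts = Counter(str(r[field]) for r in records if r.get(field) is not None)
    return counts.most_common(limit)


def apply_facets(records: Iterable[Record], **selected: object) -> List[Record]:
    """Keep only records whose fields equal every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]
```

For example, `facet_counts(files, "libmagic")` on a disk's records gives the format breakdown, and `apply_facets(files, libmagic="(Corel/WP)")` drills into the WordPerfect files. Solr computes the same counts inside the index so they stay fast across thousands of images.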
  • 24. Advantages • Extracting metadata once is faster (and more forensically sound) than repeatedly reprocessing an image • Possibility of developing better assessments during the accessioning process (significance of directory structure, accuracy of timestamps) • Integrating additional extraction processes and building supplemental tools is simple • Performance of tools correlates with complexity of analysis
  • 25. Limitations • Use of tools is limited to specific types of file systems • Additional software (particularly to document the imaging process) requires additional integration and data normalization • DFXML is not (currently) a common metadata format within the archives/library domains and requires a domain-specific application profile • Extracted metadata may be harder to repurpose for descriptive purposes because of its level of granularity
  • 26. Work in Progress • BitCurator project under development; early release available for testing: http://wiki.bitcurator.net • The Sleuth Kit and related tools (Autopsy, fiwalk, etc.) under continuing development: http://sleuthkit.org • Additional testing, development, and integration underway at Yale and NYPL
  • 27. Thanks! Mark A. Matienzo mark.matienzo@yale.edu http://matienzo.org @anarchivist
  • 28. References
      • Abrams, S., et al. (2011). “Curation Micro-Services: A Pipeline Metaphor for Repositories.” Journal of Digital Information 12(2). http://journals.tdl.org/jodi/article/view/1605
      • AIMS Work Group (2012). AIMS Born-Digital Collections: An Inter-Institutional Model for Stewardship. http://www2.lib.virginia.edu/aims/whitepaper/
      • Carrier, B. (2003). “Defining Digital Forensic Examination and Analysis Tools Using Abstraction Layers.” International Journal of Digital Evidence 1(4).
      • Carrier, B. (2005). File System Forensic Analysis. Boston and London: Addison Wesley.
      • Daigle, B.J. (2012). “The Digital Transformation of Special Collections.” Journal of Library Administration 52(3-4), 244-264.
      • Duranti, L. (2009). “From Digital Diplomatics to Digital Records Forensics.” Archivaria 68, 39-66.
      • Garfinkel, S. (2012). “Digital Forensics XML and the DFXML Toolset.” Digital Investigation 8, 161-174.
      • John, J.L. (2008). “Adapting Existing Technologies for Digitally Archiving Personal Lives: Digital Forensics, Ancestral Computing, and Evolutionary Perspectives and Tools.” Presented at iPRES 2008. http://www.bl.uk/ipres2008/presentations_day1/09_John.pdf
      • Kirschenbaum, M.G., et al. (2010). Digital Forensics and Born-Digital Content in Cultural Heritage Collections. Washington: Council on Library and Information Resources.
      • Lee, C.A., et al. (2012). “BitCurator: Tools and Techniques for Digital Forensics in Collecting Institutions.” D-Lib Magazine 18(5/6).
      • UC Curation Center/California Digital Library (2010). “UC3 Curation Foundations.” Revision 0.13. https://confluence.ucop.edu/download/attachments/13860983/UC3-Foundations-latest.pdf
      • Woods, K. and Brown, G. (2009). “From Imaging to Access: Effective Preservation of Legacy Removable Media.” In Archiving 2009. Springfield, VA: Society for Imaging Science and Technology.
      • Woods, K., Lee, C.A., and Garfinkel, S. (2011). “Extending Digital Repository Architectures to Support Disk Image Preservation and Access.” In JCDL ’11.
      • Xie, S.L. (2011). “Building Foundations for Digital Records Forensics: A Comparative Study of the Concept of Reproduction in Digital Records Management and Digital Forensics.” American Archivist 74(2), 576-599.
  • 33. Sleuth Kit example
      $ fsstat -t 2004-M-088.0007.dd
      fat12
      $ fls -a -m A: 2004-M-088.0007.dd
      0|A:/DRURY|3|r/rrwxrwxrwx|0|0|1281|1284955200|871048826|0|0
      0|A:/BEARD.897|4|r/rrwxrwxrwx|0|0|2392|1284955200|871054862|0|0
      0|A:/_P}WP{2 (deleted)|5|r/rrwxrwxrwx|0|0|2392|0|871054894|0|0
      0|A:/$MBR|45779|v/v---------|0|0|512|0|0|0|0
      0|A:/$FAT1|45780|v/v---------|0|0|4608|0|0|0|0
      0|A:/$FAT2|45781|v/v---------|0|0|4608|0|0|0|0
      0|A:/$OrphanFiles|45782|d/d---------|0|0|0|0|0|0|0
      $ icat 2004-M-088.0007.dd 4 | file -
      /dev/stdin: (Corel/WP)
      $ icat 2004-M-088.0007.dd 4 | strings | head -n 6
      WPCN
      Courier 10cpi
      HP LaserJet+
      HPLASERJ.PRS
      Cowles Foundation for Research in Economics
      Yale University
      $ tsk_recover -a 2004-M-088.0007.dd /tmp
      Files Recovered: 2
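The `fls -m` lines in this example use the Sleuth Kit body-file layout. A minimal sketch that turns them into records, assuming the TSK 3.x field order `MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime` and treating a timestamp of 0 as missing. The name is split off from both ends so a `|` inside a file name does not shift the other fields. Epoch values are formatted as UTC; how FAT's local times became epoch values depends on the time zone fls assumed.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

TIME_FIELDS = ("atime", "mtime", "ctime", "crtime")
DELETED_SUFFIX = " (deleted)"


def _iso(epoch: int) -> Optional[str]:
    # 0 is treated here as "no timestamp recorded".
    if epoch == 0:
        return None
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def parse_body_line(line: str) -> Dict[str, object]:
    """Parse one body-file line into a flat record."""
    md5, rest = line.rstrip("\n").split("|", 1)
    # Take the nine trailing fields from the right; whatever remains is the name.
    name, inode, mode, uid, gid, size, *times = rest.rsplit("|", 9)
    if len(times) != len(TIME_FIELDS):
        raise ValueError(f"unexpected body-file line: {line!r}")
    deleted = name.endswith(DELETED_SUFFIX)
    if deleted:
        name = name[: -len(DELETED_SUFFIX)]
    record: Dict[str, object] = {
        "md5": md5,
        "name": name,
        "inode": inode,                      # kept as text; some file systems use "N-T-I" forms
        "name_type": mode.split("/", 1)[0],  # r = regular, d = directory, v = TSK virtual
        "mode": mode,
        "uid": int(uid),
        "gid": int(gid),
        "size": int(size),
        "deleted": deleted,
    }
    for field_name, value in zip(TIME_FIELDS, times):
        record[field_name] = _iso(int(value))
    return record
```

Applied to the DRURY line above, this yields a modification time of 1997-08-08T14:00:26Z and an access time of 2010-09-20T04:00:00Z; the gap between them is the kind of timestamp question flagged under Advantages.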
  • 34. Sample DFXML Output
      <?xml version='1.0' encoding='UTF-8'?>
      <dfxml version='1.0'>
        <metadata xmlns='http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML'
                  xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
                  xmlns:dc='http://purl.org/dc/elements/1.1/'>
          <dc:type>Disk Image</dc:type>
        </metadata>
        <creator version='1.0'>
          <!-- provenance information re: extraction - software used; operating system -->
        </creator>
        <source>
          <image_filename>2004-M-088.0018.dd</image_filename>
        </source>
        <volume offset='0'><!-- partitions within each disk image -->
          <fileobject><!-- files within each partition --></fileobject>
        </volume>
        <runstats><!-- performance and other statistics --></runstats>
      </dfxml>
  • 35. Sample DFXML Output
      <fileobject>
        <filename>_ublist1.wpd</filename>
        <partition>1</partition>
        <id>1</id>
        <name_type>r</name_type>
        <filesize>202152</filesize>
        <unalloc>1</unalloc>
        <used>1</used>
        <inode>3</inode>
        <meta_type>1</meta_type>
        <mode>511</mode>
        <nlink>0</nlink>
        <uid>0</uid>
        <gid>0</gid>
        <mtime>2001-02-22T22:30:52Z</mtime>
        <atime>2001-02-22T05:00:00Z</atime>
        <crtime>2001-02-22T22:31:54Z</crtime>
        <libmagic>(Corel/WP)</libmagic>
        <byte_runs>
          <byte_run file_offset='0' fs_offset='16896' img_offset='16896' len='512'/>
        </byte_runs>
        <hashdigest type='md5'>d7bc22242c0a88fd8b68712980d5ab28</hashdigest>
        <hashdigest type='sha1'>64bf2bdf82e33fcda50158804483ac611e753db5</hashdigest>
      </fileobject>
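DFXML is straightforward to process with general XML tooling. The DFXML toolset has its own Python support; this sketch instead uses only the standard library's `xml.etree.ElementTree`, streaming through `<fileobject>` elements like the sample above and flattening each into a dict. Namespace prefixes are stripped so it works whether or not the root declares the DFXML default namespace. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator


def _local(tag: str) -> str:
    # Drop any "{namespace}" prefix from an element tag.
    return tag.rsplit("}", 1)[-1]


def iter_fileobjects(source) -> Iterator[Dict[str, object]]:
    """Yield one flat dict per <fileobject>; source is a path or binary file object."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "fileobject":
            continue
        hashes: Dict[str, str] = {}
        record: Dict[str, object] = {"hashes": hashes}
        for child in elem:
            tag = _local(child.tag)
            if tag == "hashdigest":
                hashes[child.get("type", "")] = (child.text or "").strip()
            elif tag == "byte_runs":
                # Keep each run's offsets/length as a dict of attribute strings.
                record["byte_runs"] = [dict(run.attrib) for run in child]
            else:
                record[tag] = (child.text or "").strip()
        yield record
        elem.clear()  # release the subtree so large images do not exhaust memory
```

Values stay as strings (for example `filesize` is `"202152"`); converting types is left to whatever application profile maps DFXML into archival metadata.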