Data Carving using Artificial Headers 
R. Daniel1, N.L. Clarke1,2 & F. Li1 
1Centre for Security, Communications & Network Research (CSCAN), Plymouth 
University, United Kingdom; 
2Security Research Institute, Edith Cowan University, Western Australia 
e-mail: info@cscan.org 
Abstract 
Digital forensic tools are an essential requirement in criminal and increasingly civil 
cases in order to process electronic evidence. Investigators rely upon the functionality 
of these tools to identify and extract relevant artifacts. One of these key processes is 
data carving – an approach that ignores the file system and analyses the drive for files 
that match a particular signature. Unfortunately, however, other than simple files, data 
carving has many limitations that result in either missing files or producing high 
numbers of false alarms. The core of their detection is largely based upon a signature 
appearing in the header of the file. However, for files that have corrupted or missing 
headers, modern data carvers are unable to recover the file successfully. This paper 
proposes a new approach to data carving that inserts an artificial header onto the file, 
thereby circumventing the header issue. Experiments have demonstrated that this 
approach is able to successfully recover files that no current data-carving tools are 
able to achieve. 
Keywords: Digital forensics, computer forensics, data carving
1. INTRODUCTION 
Digital forensics has become an invaluable tool in the identification of criminal 
activities (Casey, 2010). Computer and mobile forensics have received particular 
attention due to the demand from law enforcement, which is in turn linked to the 
growth and popularity of such equipment (European Anti-Fraud Office, 2014). Used 
for both cyber and traditional crime (e.g. terrorist attacks, child pornography and 
information leakage), these electronic devices provide an invaluable source of 
information and evidence. Indeed, criminals have been prosecuted based upon the 
evidence recovered from their computers and mobile phones via digital forensic 
techniques (FBI, 2011; Inforsecusa, 2011; Brainz, 2014). An essential analysis technique 
available to investigators is data carving. This process permits the recovery 
of files from the raw image independent of any file system that might be present. This 
enables files to be recovered from unallocated space, slack space and from within 
other files, none of which an inspection of the file system would reveal. The primary 
detection mechanism is to locate the header and footer of a file and extract the data in 
between (Beek, 2011). Unfortunately, however, due to a variety of issues, such as 
fragmentation, deletion and missing sectors, the ability of data carvers to recover the 
data successfully is variable (Merola, 2008). 
A key issue for data carvers is their ability to recover data in scenarios where no 
associated header or footer information is present. For example, slack space often 
contains information regarding files but with the header missing perhaps due to being 
overwritten. This paper develops a new approach to data carving that enables the 
investigator to determine whether particular chunks of data contain meaningful information. 
The paper is structured as follows. Section 2 describes the current state of the art, 
introduces a range of data carvers and performs an evaluation of data carvers to 
investigate their performance. Section 3 presents the new tool and describes the 
design, testing and logic of the approach. An evaluation of the tool is presented in 
Section 4 alongside the conclusions and future work in Section 5. 
2. BACKGROUND LITERATURE 
The literature often classifies data carving approaches into two categories: simple and 
advanced (Pal & Memon, 2009). Simple data carvers carve files by 
identifying a unique signature within the header and locating its associated footer. For 
example, a PDF file could be carved from a piece of data if it starts with “%PDF” (i.e. 
the PDF header) and ends with “%%EOF” (i.e. the PDF footer). The approach therefore 
assumes the files are stored in contiguous data clusters within the raw image (Hand, 
2012). From one perspective, this is a sound assumption, as modern file systems will 
always seek to store data in contiguous data clusters. 
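The header and footer matching described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the logic of any of the tools evaluated later: the function name carve_by_signature and the sample buffer are hypothetical, and the footer is written as %%EOF, the end-of-file marker as it appears in PDF files.

```python
from typing import List


def carve_by_signature(image: bytes, header: bytes, footer: bytes) -> List[bytes]:
    """Return every span that starts at a header and ends at the next footer.

    Assumes each file is stored contiguously, which is exactly the
    assumption that simple carvers make.
    """
    carved = []
    pos = 0
    while True:
        start = image.find(header, pos)
        if start == -1:
            break  # no further header signature in the image
        end = image.find(footer, start + len(header))
        if end == -1:
            break  # header without a footer: nothing more to carve
        carved.append(image[start:end + len(footer)])
        pos = end + len(footer)
    return carved


# Hypothetical raw image: filler, one contiguous PDF-like file, more filler.
raw = b"\x00" * 16 + b"%PDF-1.4 body %%EOF" + b"\xff" * 16
files = carve_by_signature(raw, b"%PDF", b"%%EOF")
```

Note that a buffer whose header bytes are missing or overwritten yields no result at all, which is the limitation this paper sets out to address.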
However, due to the operation of the file system and the size of a file, a series of 
alternative scenarios are possible. As illustrated in Figure 1, a variety of 
fragmentation possibilities exist which result in the data for a file being injected with 
data from another file, missing or reversed.
Figure 1: Examples of File Fragmentation 
Advanced data carving approaches seek to overcome these issues. Techniques to date 
largely rely upon some internal file structure within the data itself. 
Content-based approaches utilize characteristics such as character count, 
text/language recognition, white and black listing of data, statistical attributes and 
information entropy (Kloet, 2010). Such approaches are, however, prone to errors that 
result in incorrectly carved files, which in turn affects their performance. 
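As an illustration of one of the statistical attributes mentioned above, the following sketch computes the Shannon entropy of a block of bytes. It is a generic calculation rather than the method of any cited tool; the function name shannon_entropy is hypothetical, and any threshold used to separate, say, text from compressed data would need to be chosen empirically.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Entropy in bits per byte: near 0 for constant data, up to 8 for uniform bytes."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Plain text tends to score lower than compressed or encrypted data.
text_score = shannon_entropy(b"the quick brown fox jumps over the lazy dog")
uniform_score = shannon_entropy(bytes(range(256)))
```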
Garfinkel (2007) identified two key limitations with current data carving tools: 
1. Files had to be stored in sequential clusters 
2. No evaluation of the carved file leading to a large number of false positives 
Pal and Memon (2009) present a number of approaches that seek to automate the 
reconstruction of fragmented files with varying levels of success. Automated 
verification of the validity of data carving is no simple problem to solve. 
Whilst the literature provides a reasonable overview of the current state of the art, it is 
difficult to establish the relative performance of the tools. Moreover, it is not evident 
from the prior work how well they perform in scenarios where files are fragmented. It was 
therefore considered prudent to evaluate current tool capabilities. An experiment was 
devised to test the capabilities of a number of data carvers against a fixed forensic 
image. The Digital Forensic Research Workshop (DFRWS) through its annual conference 
challenge produced a dataset in 2006 (and also a more advanced version in 2007) 
(DFRWS, 2006; DFRWS, 2007). 
The 2006 dataset focused primarily on four categories of files: HTML, Microsoft Office, 
JPEG and Zip, and contained a total of 32 base files. 
A selection of open source and commercial data carving tools was utilized, including 
the industry-leading products: Guidance Software’s EnCase and AccessData’s FTK 
(Guidance Software, 2014; AccessData, 2014).
Application   No. of files   No. of files   No. of successfully   No. of partially
              present        extracted      carved files          carved files
EnCase        32             24             10 (31%)              6
FTK           32             24             6 (19%)               10
Scalpel       32             50             15 (47%)              5
WinHex        32             13             8 (25%)               5
Table 1: Data Carver Results for DFRWS 2006 Dataset 
The results from the DFRWS 2006 dataset demonstrate relatively poor performance 
across the tools. A file is counted as successfully carved only when it is 
carved completely and correctly. It was notable on a number of occasions across all tools 
that partial recovery was possible. Indeed, utilizing Scalpel, three of the fragmented 
image files were partially recovered. In these particular cases, enough was recovered 
to recognize the content and thus be of potential use; however, this is not necessarily 
always the case. Notably, none of the carvers supported the Microsoft Excel 
spreadsheet or the text file formats, so neither was successfully carved. That said, 
some of the text files were contained within other partially carved files (i.e. appeared 
as a fragment after an HTML file). Initially, the 2007 dataset was also going to be 
evaluated; however, as it represents a more complex scenario incorporating a wider 
range of file types such as MP3, AVI, FLV and PDF, and given the performance 
against the 2006 dataset, it was deemed unnecessary. Analysis of these results shows 
that the data carvers have a significant issue when it comes to files that are fragmented, 
out of sequence or missing. What is particularly surprising is that these problems have 
been established for over 8 years and modern carvers are still unable to process them 
(Garfinkel, 2007). 
3. FILE RECOVERY USING ARTIFICIAL HEADERS (FRAH) 
Given the prior art and evaluation of the tools, the research sought to develop an 
approach to data carving that looks to solve the following issues: 
• To provide the ability to render files with missing or corrupt headers 
• To provide the ability to render fragments of data that contain no associated 
header information. 
This approach to the problem enables the investigator to examine whether files that 
are not rendering (or cannot be opened) might indeed be incomplete but yet contain 
valuable information. It also provides an approach to examine the slack space areas 
within the drive to determine whether the data is meaningful. It achieves this by 
inserting an artificial header on the file and subsequently manipulating the data in 
order to determine whether a valid file is present. A process model for the approach is 
presented in Figure 2.
Figure 2: FRAH Process Model 
In order to test the approach, a prototype was developed. As illustrated in Figure 3, a 
simple interface was proposed that accepted the location of the file and would then 
subsequently proceed to evaluate the data against a set of pre-defined file types (e.g. 
BMP, PNG, GIF, PDF). In order to focus upon the concept of artificial headers, the 
tool was designed to take files that AccessData’s FTK was able to extract, rather than 
working directly on the forensic image; however, future developments will include 
this functionality. After the file has been entered and a file type selected, the system 
applies the appropriate header and attempts to open the file using the system’s built-in 
viewer. 
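The paper does not specify how the prototype constructs each header, so the following is only a sketch of what applying an artificial header could look like for the BMP case. It assumes the payload is uncompressed 24-bit pixel data and that the investigator supplies a guessed width and height; the function names and the 2835 pixels-per-metre resolution are illustrative choices, not details of FRAH.

```python
import struct


def bmp_row_stride(width: int) -> int:
    """Bytes per row for 24-bit pixels; BMP rows are padded to a multiple of 4."""
    return ((width * 3 + 3) // 4) * 4


def artificial_bmp_header(width: int, height: int, payload_len: int) -> bytes:
    """Build a 54-byte header (14-byte file header plus 40-byte info header)."""
    file_header = struct.pack(
        "<2sIHHI",
        b"BM",             # BMP signature
        54 + payload_len,  # total file size
        0, 0,              # reserved fields
        54,                # offset of the pixel data
    )
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40,                # size of this info header
        width, height,     # assumed dimensions (investigator's guess)
        1,                 # colour planes
        24,                # bits per pixel (assumption)
        0,                 # no compression
        payload_len,       # size of the pixel data
        2835, 2835,        # nominal resolution in pixels per metre
        0, 0,              # palette fields unused for 24-bit images
    )
    return file_header + info_header


def apply_artificial_header(payload: bytes, width: int, height: int) -> bytes:
    """Prepend a freshly built header to header-less pixel data."""
    return artificial_bmp_header(width, height, len(payload)) + payload


# A 2x2 image needs 2 rows of bmp_row_stride(2) bytes each.
pixels = bytes(bmp_row_stride(2) * 2)
rebuilt = apply_artificial_header(pixels, 2, 2)
```

If the guessed dimensions are wrong the viewer will typically show a skewed or truncated image, so in practice several candidate widths might be tried.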
Figure 3: FRAH Interface 
For the purposes of demonstrating the capability, the tool leaves the decision 
as to whether the file content is valid to the investigator. However, for large 
numbers of files, this process will need to be automated. 
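One way such automation could begin, offered here purely as an assumption and not as part of the prototype, is a structural check on the rebuilt file. For PNG, every chunk carries a CRC over its type and data, so a rebuilt candidate can be screened without a human viewer. The function names below are hypothetical, and passing the check shows only that the chunk structure is intact, not that the image content is meaningful.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunks_valid(data: bytes) -> bool:
    """Walk the chunk list and verify each CRC until the IEND chunk is reached."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):  # 4 length + 4 type + 4 CRC at minimum
        (length,) = struct.unpack_from(">I", data, pos)
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 8 + length
        if end + 4 > len(data):
            return False  # chunk runs past the end of the data
        (stored_crc,) = struct.unpack_from(">I", data, end)
        if zlib.crc32(data[pos + 4:end]) & 0xFFFFFFFF != stored_crc:
            return False  # CRC covers the chunk type and the chunk data
        if chunk_type == b"IEND":
            return True
        pos = end + 4
    return False


def make_chunk(chunk_type: bytes, body: bytes) -> bytes:
    """Helper for building test data: length, type, body, CRC."""
    crc = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)


# Minimal structurally valid stream (placeholder IHDR body, for illustration only).
candidate = PNG_SIGNATURE + make_chunk(b"IHDR", bytes(13)) + make_chunk(b"IEND", b"")
is_valid = png_chunks_valid(candidate)
```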
4. EVALUATION & DISCUSSION 
In order to test the tool across the differing file types, a number of test files were 
created (2 BMP, 2 PNG, 1 GIF, 1 PDF). In each of the cases, the header information 
was corrupted through the deletion or addition of random bytes. Importantly 
however, in all but one test file (Testfile1b), the data carving signature was included, 
meaning data carvers should be able to identify the file. As illustrated in Figure 4, in a 
standard file system view of the files, none of them are either rendered or identified 
except for Testfile1a, which is recognized as a BMP merely due to the file extension 
being present on the file name. Nevertheless, it still cannot be rendered due 
to corruption. 
Figure 4: Evaluation Files: Initial State 
As illustrated in Table 3 and Figure 5, the application of FRAH results in each of 
these files being recoverable. In each case, FRAH ignores any header 
information present and simply inserts an artificial header onto the file. 
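For a format with a fixed signature, discarding the damaged header and inserting a clean one can be sketched as follows. The example uses PNG, whose 8-byte signature is followed by an IHDR chunk. It assumes that the IHDR chunk itself survived the corruption, which the paper does not state, and the function name replace_png_header is hypothetical.

```python
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def replace_png_header(damaged: bytes) -> Optional[bytes]:
    """Drop whatever precedes the IHDR chunk and prepend a clean PNG signature.

    Returns None when no IHDR chunk (with its 4-byte length field) is found.
    """
    idx = damaged.find(b"IHDR")
    if idx < 4:
        return None  # not found, or no room for the chunk length field
    # The chunk starts 4 bytes before its type name (the length field).
    return PNG_SIGNATURE + damaged[idx - 4:]


# Hypothetical damaged file: signature replaced by random bytes.
body = b"\x00\x00\x00\x0dIHDR" + b"rest-of-file"
repaired = replace_png_header(b"\x12\x34\x56" + body)
```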
Filename     Carve signature   File type analysis   File type   Successful carve
Testfile1a   Yes               BMP                  BMP         Yes
Testfile1b   No                Unknown              BMP         Yes
Testfile2a   Yes               Unknown              PNG         Yes
Testfile2b   Yes               Unknown              PNG         Yes
Testfile3a   Yes               Unknown              GIF         Yes
Testfile4a   Yes               Unknown              PDF         Yes
Table 3: Evaluation Results 
Interestingly, even with valid carver signatures present in five of the six files, 
AccessData’s FTK was unable to recover any of the files. 
The FTK data carving process did however recover three partially carved files, 
but all three were associated with images contained within the PDF of Testfile4a. 
Figure 5: Evaluation Files: Post FRAH
Notably, neither of the forensic images (DFRWS 2006 and 2007) contains files where 
the header is specifically corrupted or no longer present, although files that have been 
fragmented could arguably fall into this category for any fragments (bar the one 
containing the header). Therefore, a secondary external source was identified in order 
to evaluate the tool. The DC3 Digital Forensics Challenge is an annual forensics 
challenge run by the US Department of Defense (DC3, 2013). The challenge involves 
users putting their knowledge of security to the test in completing a range of tasks 
such as data carving, decryption, file registry analysis and steganography. The 
challenge includes two files with missing headers (a PNG and a PDF). As illustrated 
in Figure 4 both of these files were recovered successfully. 
Whilst the evaluation has proven successful, further analysis of the scenarios that 
would naturally occur within cases does highlight a number of limitations with the 
current approach. FRAH currently operates by inserting an artificial header onto the 
payload of the file. If a file header is corrupted then FRAH is able to recover the file. 
However, in circumstances where the header or the first fragment is missing, it is 
likely that elements of the payload in addition to the header are also missing. Further 
research needs to investigate the impact of missing or corrupt payload data, with a 
view to padding and manipulating the data in order to recover the file contents 
that remain. This approach would then also permit single fragments 
of data to be recovered (rather than simply the first fragment, as is typical with data 
carvers today). 
5. CONCLUSIONS 
The proposed tool is capable of recovering files with corrupt or missing header 
information across a number of standard file types. An analysis of current data carvers 
demonstrated that none of these tools currently have such capability and the 
evaluation successfully demonstrated recovery for all files. 
The initial prototype is however limited and further research is required to provide a 
more robust carver with a level of automation. Enhancements are required in the 
following areas: 
• The ability to accept a range of data fragments, rather than a single file, so that 
multiple data fragments can be readily analyzed 
• To automate the identification of meaningful data, thereby removing the need 
for human intervention 
• To manipulate the file contents in a systematic fashion in order to enable 
successful viewing of the content 
• To increase the range of file types supported 
REFERENCES 
AccessData (2014) “FTK-Forensic Toolkit”, 
http://www.accessdata.com/products/digital-forensics/ftk, date accessed: 02 February 2014 
Beek, C. (2011) “Introduction to File Carving”, McAfee white paper, 
http://www.mcafee.com/uk/resources/white-papers/foundstone/wp-intro-to-file-carving.pdf, 
date accessed: 18 February 2014 
Brainz (2014) “15 Criminal Cases Solved With Digital Evidence”, 
http://brainz.org/15-criminal-cases-solved-digital-evidence/, date accessed: 15 January 2014 
Casey, E. ed (2010) “Handbook of Digital Forensics and Investigation”, Academic Press, 
p. 567, ISBN 0-12-374267-6 
DC3 (2013) “DC3 Cyber Crime Challenges”, https://www.dc3.mil/challenge/, 
date accessed: 10 February 2014 
DFRWS (2006) “DFRWS 2006 Forensics Challenge Overview”, 
http://www.dfrws.org/2006/challenge/index.shtml, date accessed: 28 January 2014 
DFRWS (2007) “DFRWS 2007 Forensics Challenge Overview”, 
http://www.dfrws.org/2007/challenge/index.shtml, date accessed: 28 January 2014 
European Anti-Fraud Office (2014) “Digital Forensics”, 
http://ec.europa.eu/anti_fraud/investigations/forensics/index_en.htm, 
date accessed: 19 February 2014 
FBI (2011) “Digital Forensics Regional Labs Help Solve Local Crimes”, 
http://www.fbi.gov/news/stories/2011/may/forensics_053111, date accessed: 20 January 2014 
Garfinkel, S. (2007) “Carving contiguous and fragmented files with fast object validation”, 
http://dfrws.org/2007/proceedings/p2-garfinkel.pdf, date accessed: 12 April 2013 
Guidance Software (2014) “EnCase Forensic”, 
http://www.guidancesoftware.com/products/Pages/encase-forensic/overview.aspx?cmpid=nav, 
date accessed: 06 February 2014 
Hand, S. (2012) “Bin-Carver: Automatic Recovery of Binary Executables”, 
http://www.dfrws.org/2012/proceedings/DFRWS2012-12.pdf, date accessed: 12 April 2013 
Inforsecusa (2011) “Computer Forensics Criminal Cases”, 
http://infosecusa.com/computer-forensics-criminal-cases, date accessed: 22 January 2014 
Kloet, B. (2010) “Advanced File Carving”, 
http://computer-forensics.sans.org/summit-archives/2010/eu-digital-forensics-incident-response-summit-bas-kloet-advanced-file-carving.pdf, 
date accessed: 12 April 2013 
Merola, A. (2008) “Data Carving Concepts”, 
http://www.sans.org/reading-room/whitepapers/forensics/data-carving-concepts-32969, 
date accessed: 10 February 2014 
Pal, A. & Memon, N. (2009) “The Evolution of File Carving”, 
http://digital-assembly.com/technology/research/pubs/ieee-spm-2009.pdf, 
date accessed: 12 April 2013 

More Related Content

PPT
File Carving
PPTX
Advances in File Carving
PPT
File structures
PDF
File organisation
PPT
PPTX
File management
PPTX
File Management
PPT
Fundamental File Processing Operations
File Carving
Advances in File Carving
File structures
File organisation
File management
File Management
Fundamental File Processing Operations

What's hot (19)

PPT
File Management
PPT
Files concepts.53
PPTX
File Management – File Concept, access methods, File types and File Operation
PPT
File organisation
PPTX
6 chapter 6 record storage and primary file organization
PPTX
File management
PPT
File organization
PDF
A basic course on Reseach data management, part 2: protecting and organizing ...
PPT
Chapter 11 - File System Implementation
PPT
File management
PDF
Unit ivos - file systems
PPTX
FILE SYSTEMS IN WINDOWS OPERATING SYSTEMS
PDF
A basic course on Research data management, part 3: sharing your data
PDF
ITFT_File system interface in Operating System
PPT
PDF
10 File System
PPTX
Types of files
ODP
NTFS and Inode
File Management
Files concepts.53
File Management – File Concept, access methods, File types and File Operation
File organisation
6 chapter 6 record storage and primary file organization
File management
File organization
A basic course on Reseach data management, part 2: protecting and organizing ...
Chapter 11 - File System Implementation
File management
Unit ivos - file systems
FILE SYSTEMS IN WINDOWS OPERATING SYSTEMS
A basic course on Research data management, part 3: sharing your data
ITFT_File system interface in Operating System
10 File System
Types of files
NTFS and Inode
Ad

Similar to Data carving using artificial headers info sec conference (20)

PDF
File Reconstruction in Digital Forensic
PDF
Bi4101343346
PDF
A Systems Approach To Qualitative Data Management And Analysis
PDF
A SURVEY ON MULTIMEDIA FILE CARVING
PDF
An Analyzing of different Techniques and Tools to Recover Data from Volatile ...
PDF
Rule-based Information Extraction for Airplane Crashes Reports
PDF
Rule-based Information Extraction for Airplane Crashes Reports
PDF
Comparison of data recovery techniques on master file table between Aho-Coras...
PDF
Forensic Readiness on Hadoop Platform: Non-Ambari HDP as a Case Study
DOCX
SDOT Secure Hash, Semantic Keyword Extraction, and Dynamic Operator Pattern-B...
ODP
The need of Interoperability in Office and GIS formats
PDF
Adaptive information extraction
PDF
Cloud Storage Client Application Analysis
PDF
Web_Mining_Overview_Nfaoui_El_Habib
DOCX
Nate French Internship Final Paper
RTF
Visualization of Computer Forensics Analysis on Digital Evidence
PDF
Flexor Muscle Exercise
PDF
In tech application-of_data_mining_technology_on_e_learning_material_recommen...
PDF
In tech application-of_data_mining_technology_on_e_learning_material_recommen...
PDF
Client Forensics: An Assessment of Existing Research And Future Directions
File Reconstruction in Digital Forensic
Bi4101343346
A Systems Approach To Qualitative Data Management And Analysis
A SURVEY ON MULTIMEDIA FILE CARVING
An Analyzing of different Techniques and Tools to Recover Data from Volatile ...
Rule-based Information Extraction for Airplane Crashes Reports
Rule-based Information Extraction for Airplane Crashes Reports
Comparison of data recovery techniques on master file table between Aho-Coras...
Forensic Readiness on Hadoop Platform: Non-Ambari HDP as a Case Study
SDOT Secure Hash, Semantic Keyword Extraction, and Dynamic Operator Pattern-B...
The need of Interoperability in Office and GIS formats
Adaptive information extraction
Cloud Storage Client Application Analysis
Web_Mining_Overview_Nfaoui_El_Habib
Nate French Internship Final Paper
Visualization of Computer Forensics Analysis on Digital Evidence
Flexor Muscle Exercise
In tech application-of_data_mining_technology_on_e_learning_material_recommen...
In tech application-of_data_mining_technology_on_e_learning_material_recommen...
Client Forensics: An Assessment of Existing Research And Future Directions
Ad

Recently uploaded (20)

PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PPTX
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Encapsulation theory and applications.pdf
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
cuic standard and advanced reporting.pdf
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
NewMind AI Monthly Chronicles - July 2025
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Empathic Computing: Creating Shared Understanding
PPTX
A Presentation on Artificial Intelligence
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
Cloud computing and distributed systems.
PDF
CIFDAQ's Market Insight: SEC Turns Pro Crypto
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Machine learning based COVID-19 study performance prediction
PPTX
MYSQL Presentation for SQL database connectivity
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Dropbox Q2 2025 Financial Results & Investor Presentation
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Encapsulation theory and applications.pdf
NewMind AI Weekly Chronicles - August'25 Week I
cuic standard and advanced reporting.pdf
Digital-Transformation-Roadmap-for-Companies.pptx
NewMind AI Monthly Chronicles - July 2025
Encapsulation_ Review paper, used for researhc scholars
Empathic Computing: Creating Shared Understanding
A Presentation on Artificial Intelligence
20250228 LYD VKU AI Blended-Learning.pptx
Mobile App Security Testing_ A Comprehensive Guide.pdf
Cloud computing and distributed systems.
CIFDAQ's Market Insight: SEC Turns Pro Crypto
“AI and Expert System Decision Support & Business Intelligence Systems”
Chapter 3 Spatial Domain Image Processing.pdf
Machine learning based COVID-19 study performance prediction
MYSQL Presentation for SQL database connectivity

Data carving using artificial headers info sec conference

  • 1. Data Carving using Artificial Headers R. Daniel1, N.L. Clarke1,2 & F. Li1 1Centre for Security, Communications & Network Research (CSCAN), Plymouth University, United Kingdom; 2Security Research Institute, Edith Cowan University, Western Australia e-mail: info@cscan.org Abstract Digital forensic tools are an essential requirement in criminal and increasingly civil cases in order to process electronic evidence. Investigators rely upon the functionality of these tools to identify and extract relevant artifacts. One of these key processes is data carving – an approach that ignores the file system and analyses the drive for files that match a particular signature. Unfortunately, however, other than simple files, data carving has many limitations that result in either missing files or producing high numbers of false alarms. The core of their detection is largely based upon a signature appearing in the header of the file. However, for files that have corrupted or missing headers, modern data carvers are unable to recover the file successfully. This paper proposes a new approach to data carving that inserts an artificial header onto the file, thereby circumventing the header issue. Experiments have demonstrated that this approach is able to successfully recover files that no current data-carving tools are able to achieve. Keywords: Digital forensics, computer forensics, data carving
  • 2. Data Carving using Artificial Headers 1. INTRODUCTION Digital forensics has become an invaluable tool in the identification of criminal activities (Casey, 2010). Computer and mobile forensics have received particular attention due to the demand from law enforcement, which is in turn linked to the growth and popularity of such equipment (European Anti-Fraud Office, 2014). Used for both cyber and traditional crime (e.g. terrorist attacks, child pornography and information leakage), these electronic devices provide an invaluable source of information and evidence. Indeed, criminals have been prosecuted based upon the evidence recovered from their computers and mobile phones via digital forensic techniques (FBI, 2011; Inforsecusa, 2011; Brainz, 2014). An essential analysis tool available to investigators is to perform data carving. This process permits the recovery of files from the raw image independent of any file system that might be present. This enables files to be recovered from unallocated space, slack space and from within files that an inspection of the file system would not reveal. The primary for detection mechanism is to locate the header and footer of a file and extract the data in between (Beek, 2011). Unfortunately, however, due to a variety of issues, such as fragmentation, deletion and missing sectors, the ability for data carvers to recover the data successfully is variable (Merola, 2008). A key issue for data carvers is their ability to recover data in scenarios where no associated header or footer information is present. For example, slack space often contains information regarding files but with the header missing perhaps due to being overwritten. The paper develops a new approach to data carving that enables the investigator to be able to determine if particular chunks of data contain information. The paper is structured as follows. 
Section 2 describes the current state of the art, introduces a range of data carvers and performs an evaluation of data carvers to investigate their performance. Section 3 presents the new tool and describes the design, testing and logic of the approach. An evaluation of the tool is presented in Section 4 alongside the conclusions and future work in Section 5. 2. BACKGROUND LITERATURE Literature often seeks to classify data carving approaches into two: simple and advanced (Pal & Memon, 2009). Simple data carvers are able to carve files via identifying a unique signature within the header and locating its associated footer. For example, a PDF file could be carved from a piece of data if it starts with “%PDF” (i.e. the PDF header) and ends with “%EOF” (i.e. the PDF footer). The approach therefore assumes the files are stored in continuous data clusters within the raw image (Hand, 2012). From one perspective, this is a sound assumption, as modern file systems will always seek to store data in continuous data clusters. However, due to the operation of the file system and the size of a file, a series of alternative scenarios are possible. As illustrated in Figure 1, a variety of fragmentation possibilities exist which result in the data for a file being injected with data from another file, missing or reversed.
  • 3. Figure 1: Examples of File Fragmentation Advanced data carving approaches seek to overcome these issues. Techniques to date largely focus upon relying upon some internal file structure within the data itself. Content-based approaches utilize characteristics such as character count, text/language recognition, white and black listing of data, statistical attributes and information entropy (Kloet, 2010). Such approaches are however open to errors with incorrectly carved files. This gives rise to performance characteristics. Garfinkel (2007) identified two key limitations with current data carving tools: 1. Files had to be stored in sequential clusters 2. No evaluation of the carved file leading to a large number of false positives Pal and Memon (2009) present a number of approaches that seek to automate the reconstruction of fragmented files with varying levels of success. Automated verification of the validity of data carving is no simple problem to solve. Whilst the literature provides a reasonable overview of the current state of the art, it is difficult to establish their relative performance. Moreover, it is not evident from the prior work, how well they perform in scenarios where files are fragmented. It was therefore considered prudent to perform an evaluation of current tool capabilities in order to evaluate the performance. An experiment was devised to test the capabilities of a number of data carvers against a fixed forensic image. The Digital Forensics Workshop (DFRWS) through its annual conference challenge produced a dataset in 2006 (and also a more advanced version in 2007) (DFRWS, 2006; DFRWS, 2007). The 2006 dataset focused primarily on 4 categories of files: HTML, Microsoft Office, JPEG and Zip and contained a total of 32 base files. A selection of open source and commercial data carving tools were utilized, including the industry leading products: Guidance Software’s Encase and AccessData’s FTK (Guidance Software, 2014; AccessData, 2014).
  • 4. Application No. of files present No. of Files Extracted No. of Successfully Carved No. of Partial Carved Files Encase 32 24 10 (31%) 6 FTK 32 24 6 (19%) 10 Scalpel 32 50 15 (47%) 5 WinHex 32 13 8 (25%) 5 Table 1: Data Carver Results for DFRWS 2006 Dataset The results from the DRFWS 2006 dataset demonstrate a relatively poor performance across the tools. The successful category is measured based upon a file that is completely carved correctly. It was notable on a number of occasions across all tools that partial recovery was possible. Indeed, utilizing Scalpel, three of the fragmented image files had been partial recovered successfully. In these particular cases, enough to recognize the content and thus be of potential use; however, this is not necessarily always the case. Notably, none of the carvers supported the Microsoft Excel spreadsheet or the text file formats, so neither were successfully carved. That said, some of the text files were contained within other partially carved files (i.e. appeared as a fragment after an HTML file). Initially, the 2007 dataset was also going to be evaluated; however, as it represents a more complex scenario incorporating a wider range of file types such as MP3, AVI, FLV and PDF and given the performance against the 2006 dataset, it was deemed unnecessary. Analysis of these results shows that the data carvers have a significant issue when it comes files that are fragmented, out of sequence or missing. What is particularly surprising is that these problems have been established for over 8 years and modern carvers are still unable to process them (Garfinkel, 2007). 3. FILE RECOVERY USING ARTIFICIAL HEADERS (FRAH) Given the prior art and evaluation of the tools, the research sought to develop an approach to data carving that look to solve several issues: • To provide the ability to render files with missing or corrupt headers • To provide the ability to render fragments of data that contain no associated header information. 
This approach enables the investigator to examine whether files that do not render (or cannot be opened) may be incomplete yet still contain valuable information. It also provides a means of examining the slack space areas within the drive to determine whether the data is meaningful. It achieves this by inserting an artificial header on the file and subsequently manipulating the data in order to determine whether a valid file is present. A process model for the approach is presented in Figure 2.
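The idea of inserting an artificial header can be sketched in Python for the simplest of the supported formats, an uncompressed 24-bit BMP. This is a hedged illustration rather than the FRAH implementation, whose internals are not described here: the function names are hypothetical, and the image dimensions and the number of damaged leading bytes to discard (`skip`) are assumed to be supplied or guessed by the investigator.

```python
import struct


def build_bmp_header(width: int, height: int, payload_len: int,
                     bits_per_pixel: int = 24) -> bytes:
    """Build a 54-byte BMP header (BITMAPFILEHEADER + BITMAPINFOHEADER)."""
    pixel_offset = 14 + 40  # pixel data starts directly after both headers
    file_header = struct.pack(
        "<2sIHHI",
        b"BM",                       # BMP signature
        pixel_offset + payload_len,  # total file size in bytes
        0, 0,                        # reserved fields
        pixel_offset,                # offset of the pixel data
    )
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40,              # size of this info header
        width, height,   # assumed dimensions (positive height = bottom-up rows)
        1,               # colour planes, always 1
        bits_per_pixel,
        0,               # compression: 0 = uncompressed
        payload_len,     # size of the pixel data
        2835, 2835,      # resolution in pixels per metre (about 72 DPI)
        0, 0,            # palette size and important colours (unused here)
    )
    return file_header + info_header


def apply_artificial_header(data: bytes, width: int, height: int,
                            skip: int = 0) -> bytes:
    """Discard `skip` damaged leading bytes and prepend a fresh header."""
    payload = data[skip:]
    return build_bmp_header(width, height, len(payload)) + payload


if __name__ == "__main__":
    pixels = bytes(16)  # 2x2 image: each 6-byte row is padded to 8 bytes
    damaged = b"XX" + build_bmp_header(2, 2, len(pixels))[2:] + pixels
    repaired = apply_artificial_header(damaged, 2, 2, skip=54)
    print(repaired[:2], len(repaired))
```

In practice the correct dimensions are rarely known in advance, so a tool built along these lines would need to try several candidate values and let the investigator (or an automated check) judge which rendering is meaningful.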
Figure 2: FRAH Process Model

In order to test the approach, a prototype was developed. As illustrated in Figure 3, a simple interface was designed that accepts the location of a file and then evaluates the data against a set of pre-defined file types (e.g. BMP, PNG, GIF, PDF). In order to focus upon the concept of artificial headers, the tool was designed to take files that AccessData's FTK was able to extract, rather than working directly on the forensic image; future developments will include this functionality. After the file has been entered and a file type selected, the system applies the appropriate header and attempts to open the file using the system's built-in viewer.

Figure 3: FRAH Interface

For the purposes of demonstrating the capability, the tool leaves the decision as to whether the file content is valid to the investigator. However, for large numbers of files, this process will need to be automated.

4. EVALUATION & DISCUSSION

In order to test the tool across the differing file types, a number of test files were created (2 BMP, 2 PNG, 1 GIF, 1 PDF). In each case, the header information was corrupted through the deletion or addition of random bytes. Importantly, however, in all but one test file (Testfile1b) the data carving signature was retained, meaning data carvers should be able to identify the file. As illustrated in Figure 4, in a standard file system view of the files, none of them is either rendered or identified except for Testfile1a, which is recognized as a BMP merely due to the file extension
being present on the file name. Nevertheless, the image still cannot be rendered due to the corruption.

Figure 4: Evaluation Files: Initial State

As illustrated in Table 3 and Figure 5, the application of FRAH results in each of these files being recoverable. In each case, FRAH simply ignores any header information present and inserts an artificial header onto the file.

Filename     Carve signature   File type analysis   File type   Successful carve
Testfile1a   Yes               BMP                  BMP         Yes
Testfile1b   No                Unknown              BMP         Yes
Testfile2a   Yes               Unknown              PNG         Yes
Testfile2b   Yes               Unknown              PNG         Yes
Testfile3a   Yes               Unknown              GIF         Yes
Testfile4a   Yes               Unknown              PDF         Yes

Table 3: Evaluation Results

Interestingly, even with valid carver signatures present in five of the six files, AccessData's FTK was unable to recover any of the files when tested against them. The FTK data carving process did, however, recover three partially carved files, but all three were images contained within the PDF of Testfile4a.

Figure 5: Evaluation Files: Post FRAH
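The distinction drawn in Table 3 between a carve signature being present and the file type being identified can be illustrated with a short Python sketch. The signatures listed are the standard magic numbers for the four file types tested; the function names and the 64-byte search window are assumptions made for illustration, and the sketch does not represent how FTK or any other carver performs its analysis.

```python
from typing import Dict, Optional

# Standard leading signatures ("magic numbers") for the four tested file types.
SIGNATURES = {
    "BMP": (b"BM",),
    "PNG": (b"\x89PNG\r\n\x1a\n",),
    "GIF": (b"GIF87a", b"GIF89a"),
    "PDF": (b"%PDF-",),
}


def identify_by_header(data: bytes) -> Optional[str]:
    """Return the file type only if a signature sits exactly at offset 0."""
    for file_type, magics in SIGNATURES.items():
        if data.startswith(magics):
            return file_type
    return None


def find_signatures(data: bytes, window: int = 64) -> Dict[str, int]:
    """Return the earliest offset of each signature within the first `window` bytes."""
    head = data[:window]
    hits = {}
    for file_type, magics in SIGNATURES.items():
        offsets = [head.find(m) for m in magics if m in head]
        if offsets:
            hits[file_type] = min(offsets)
    return hits


if __name__ == "__main__":
    png = b"\x89PNG\r\n\x1a\n" + bytes(16)
    damaged = b"\x00\x01\x02" + png      # three stray bytes added to the header
    print(identify_by_header(damaged))   # signature no longer at offset 0
    print(find_signatures(damaged))      # but still present nearby
```

A file whose header has gained or lost a few bytes therefore still contains its signature, yet fails a check that expects the signature at the very start of the file. Short signatures such as the two-byte BMP marker will also match by chance, so any hit found this way would need further validation.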
Notably, neither of the forensic images (DFRWS 2006 and 2007) contains files where the header is specifically corrupted or no longer present, although fragmented files could arguably fall into this category for every fragment bar the one containing the header. Therefore, a secondary external source was identified in order to evaluate the tool. The DC3 Digital Forensics Challenge is an annual forensics challenge run by the US Department of Defense (DC3, 2013). The challenge involves users putting their knowledge of security to the test in completing a range of tasks such as data carving, decryption, file registry analysis and steganography. The challenge includes two files with missing headers (a PNG and a PDF). As illustrated in Figure 4, both of these files were recovered successfully.

Whilst the evaluation has proven successful, further analysis of the scenarios that would naturally occur within cases highlights a number of limitations with the current approach. FRAH currently operates by inserting an artificial header onto the payload of the file. If a file header is corrupted, FRAH is able to recover the file. However, in circumstances where the header or the first fragment is missing, it is likely that elements of the payload are missing in addition to the header. Further research needs to investigate the impact of missing or corrupt payload data, with a view to padding and manipulating the data in order to recover the file contents that remain. This approach would then also permit any single fragment of data to be recovered (rather than simply the first fragment, as is typical with data carvers today).

5. CONCLUSIONS

The proposed tool is capable of recovering files with corrupt or missing header information across a number of standard file types.
An analysis of current data carvers demonstrated that none of these tools currently has such a capability, whereas the evaluation demonstrated successful recovery of all test files. The initial prototype is, however, limited, and further research is required to provide a more robust carver with a level of automation. Enhancements are required in the following areas:

• The ability to accept a range of data fragments, rather than a single file, so that multiple data fragments can be readily analyzed
• To automate the identification of meaningful data, thereby removing the need for human intervention
• To manipulate the file contents in a systematic fashion in order to enable successful viewing of the content
• To increase the range of file types supported

REFERENCES

AccessData (2014) "FTK-Forensic Toolkit", http://www.accessdata.com/products/digital-forensics/ftk, date accessed: 02 February 2014

Beek, C. (2011) "Introduction to File Carving", McAfee white paper, http://www.mcafee.com/uk/resources/white-papers/foundstone/wp-intro-to-file-carving.pdf, date accessed: 18 February 2014
Brainz (2014) "15 Criminal Cases Solved With Digital Evidence", http://brainz.org/15-criminal-cases-solved-digital-evidence/, date accessed: 15 January 2014

Casey, E. (ed.) (2010) "Handbook of Digital Forensics and Investigation", Academic Press, p. 567, ISBN 0-12-374267-6

DC3 (2013) "DC3 Cyber Crime Challenges", https://www.dc3.mil/challenge/, date accessed: 10 February 2014

DFRWS (2006) "DFRWS 2006 Forensics Challenge Overview", http://www.dfrws.org/2006/challenge/index.shtml, date accessed: 28 January 2014

DFRWS (2007) "DFRWS 2007 Forensics Challenge Overview", http://www.dfrws.org/2007/challenge/index.shtml, date accessed: 28 January 2014

European Anti-Fraud Office (2014) "Digital Forensics", http://ec.europa.eu/anti_fraud/investigations/forensics/index_en.htm, date accessed: 19 February 2014

FBI (2011) "Digital Forensics Regional Labs Help Solve Local Crimes", http://www.fbi.gov/news/stories/2011/may/forensics_053111, date accessed: 20 January 2014

Garfinkel, S. (2007) "Carving contiguous and fragmented files with fast object validation", http://dfrws.org/2007/proceedings/p2-garfinkel.pdf, date accessed: 12 April 2013

Guidance Software (2014) "EnCase Forensic", http://www.guidancesoftware.com/products/Pages/encase-forensic/overview.aspx?cmpid=nav, date accessed: 06 February 2014

Hand, S. (2012) "Bin-Carver: Automatic Recovery of Binary Executables", http://www.dfrws.org/2012/proceedings/DFRWS2012-12.pdf, date accessed: 12 April 2013

Inforsecusa (2011) "Computer Forensics Criminal Cases", http://infosecusa.com/computer-forensics-criminal-cases, date accessed: 22 January 2014

Kloet, B. (2010) "Advanced File Carving", http://computer-forensics.sans.org/summit-archives/2010/eu-digital-forensics-incident-response-summit-bas-kloet-advanced-file-carving.pdf, date accessed: 12 April 2013

Merola, A. (2008) "Data Carving Concepts", http://www.sans.org/reading-room/whitepapers/forensics/data-carving-concepts-32969, date accessed: 10 February 2014

Pal, A. & Memon, N. (2009) "The Evolution of File Carving", http://digital-assembly.com/technology/research/pubs/ieee-spm-2009.pdf, date accessed: 12 April 2013