Data carving using artificial headers info sec conference

Data Carving using Artificial Headers
R. Daniel¹, N.L. Clarke¹,² & F. Li¹
¹Centre for Security, Communications & Network Research (CSCAN), Plymouth
University, United Kingdom;
²Security Research Institute, Edith Cowan University, Western Australia
e-mail: info@cscan.org
Abstract
Digital forensic tools are an essential requirement in criminal and increasingly civil
cases in order to process electronic evidence. Investigators rely upon the functionality
of these tools to identify and extract relevant artifacts. One of these key processes is
data carving – an approach that ignores the file system and analyses the drive for files
that match a particular signature. Unfortunately, for anything other than simple files,
data carving has many limitations that result in either missed files or high numbers
of false alarms. Detection is largely based upon a signature appearing in the header
of the file, so for files that have corrupted or missing headers, modern data carvers
are unable to recover the file successfully. This paper proposes a new approach to
data carving that inserts an artificial header onto the file, thereby circumventing the
header issue. Experiments have demonstrated that this approach is able to
successfully recover files that no current data-carving tool is able to recover.
Keywords: Digital forensics, computer forensics, data carving

1. INTRODUCTION
Digital forensics has become an invaluable tool in the identification of criminal
activities (Casey, 2010). Computer and mobile forensics have received particular
attention due to the demand from law enforcement, which is in turn linked to the
growth and popularity of such equipment (European Anti-Fraud Office, 2014). Used
for both cyber and traditional crime (e.g. terrorist attacks, child pornography and
information leakage), these electronic devices provide an invaluable source of
information and evidence. Indeed, criminals have been prosecuted based upon the
evidence recovered from their computers and mobile phones via digital forensic
techniques (FBI, 2011; Inforsecusa, 2011; Brainz, 2014). An essential analysis technique
available to investigators is data carving. This process permits the recovery
of files from the raw image independent of any file system that might be present. This
enables files to be recovered from unallocated space, slack space and from within
files that an inspection of the file system would not reveal. The primary detection
mechanism is to locate the header and footer of a file and extract the data in between
(Beek, 2011). Unfortunately, due to a variety of issues, such as
fragmentation, deletion and missing sectors, the ability of data carvers to recover the
data successfully is variable (Merola, 2008).
A key issue for data carvers is their ability to recover data in scenarios where no
associated header or footer information is present. For example, slack space often
contains data belonging to files whose header is missing, perhaps because it has been
overwritten. The paper develops a new approach to data carving that enables the
investigator to determine whether particular chunks of data contain information.
The paper is structured as follows. Section 2 describes the current state of the art,
introduces a range of data carvers and performs an evaluation of data carvers to
investigate their performance. Section 3 presents the new tool and describes the
design, testing and logic of the approach. An evaluation of the tool is presented in
Section 4 alongside the conclusions and future work in Section 5.
2. BACKGROUND LITERATURE
The literature often classifies data carving approaches into two categories: simple and
advanced (Pal & Memon, 2009). Simple data carvers carve files by
identifying a unique signature within the header and locating its associated footer. For
example, a PDF file could be carved from a piece of data if it starts with “%PDF” (i.e.
the PDF header) and ends with “%%EOF” (i.e. the PDF footer). The approach therefore
assumes the files are stored in contiguous data clusters within the raw image (Hand,
2012). From one perspective, this is a sound assumption, as modern file systems will
always seek to store data in contiguous data clusters.
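To make the simple approach concrete, the following is a minimal sketch of header/footer carving over a raw image held in memory. It is illustrative only and is not taken from any of the tools discussed: the function name `carve_by_signature`, the `max_size` limit and the choice to stop at the first footer are all assumptions. PDF files end with the marker `%%EOF`, which is used as the footer here.

```python
from typing import List

# Signatures for the PDF example in the text.
PDF_HEADER = b"%PDF"
PDF_FOOTER = b"%%EOF"


def carve_by_signature(image: bytes, header: bytes, footer: bytes,
                       max_size: int = 10 * 1024 * 1024) -> List[bytes]:
    """Return every header..footer span found in `image`.

    Hypothetical helper: assumes each file is stored contiguously and
    stops at the FIRST footer after a header, which is exactly the
    assumption that breaks on fragmented files.
    """
    carved = []
    pos = 0
    while True:
        start = image.find(header, pos)
        if start == -1:
            break  # no further headers in the image
        # Look for the footer within max_size bytes of the header.
        end = image.find(footer, start + len(header), start + max_size)
        if end == -1:
            # Header with no footer in range: skip it and keep scanning.
            pos = start + len(header)
            continue
        end += len(footer)
        carved.append(image[start:end])
        pos = end
    return carved
```

A carver of this kind finds nothing at all when the header bytes are damaged, because the scan never gets a starting offset; this is the gap the paper targets.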
However, due to the operation of the file system and the size of a file, a series of
alternative scenarios are possible. As illustrated in Figure 1, a variety of
fragmentation possibilities exist in which the data for a file is interleaved with
data from another file, has fragments missing, or has fragments stored out of order.

Figure 1: Examples of File Fragmentation
Advanced data carving approaches seek to overcome these issues. Techniques to date
largely rely upon some internal file structure within the data itself.
Content-based approaches utilize characteristics such as character count,
text/language recognition, white and black listing of data, statistical attributes and
information entropy (Kloet, 2010). Such approaches are, however, open to errors in
the form of incorrectly carved files, which in turn affects their performance.
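As an illustration of one of the statistical attributes mentioned above, the sketch below computes the Shannon entropy of a block of bytes, H = -Σ p(b) log2 p(b), in bits per byte. The function name and the idea of applying it per block are assumptions made for illustration; the cited work does not prescribe this code.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of `block` in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    counts = Counter(block)  # frequency of each byte value
    total = len(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Blocks of compressed or encrypted data tend towards 8 bits per byte, while plain text and sparse structures score lower, so the value can hint at what type of content a headerless fragment holds. It cannot confirm that a fragment belongs to a particular file.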
Garfinkel (2007) identified two key limitations with current data carving tools:
1. Files had to be stored in sequential clusters
2. No evaluation of the carved file leading to a large number of false positives
Pal and Memon (2009) present a number of approaches that seek to automate the
reconstruction of fragmented files with varying levels of success. Automated
verification of the validity of data carving is no simple problem to solve.
Whilst the literature provides a reasonable overview of the current state of the art, it is
difficult to establish the relative performance of the available tools. Moreover, it is not
evident from the prior work how well they perform in scenarios where files are
fragmented. It was therefore considered prudent to evaluate current tool capabilities.
An experiment was devised to test the capabilities
of a number of data carvers against a fixed forensic image. The Digital Forensic
Research Workshop (DFRWS), through its annual conference challenge, produced a dataset in
2006 (and also a more advanced version in 2007) (DFRWS, 2006; DFRWS, 2007).
The 2006 dataset focused primarily on 4 categories of files (HTML, Microsoft Office,
JPEG and Zip) and contained a total of 32 base files.
A selection of open source and commercial data carving tools was utilized, including
the industry-leading products: Guidance Software’s EnCase and AccessData’s FTK
(Guidance Software, 2014; AccessData, 2014).

Application   No. of files   No. of files   No. of successfully   No. of partially
              present        extracted      carved files          carved files
EnCase        32             24             10 (31%)              6
FTK           32             24             6 (19%)               10
Scalpel       32             50             15 (47%)              5
WinHex        32             13             8 (25%)               5
Table 1: Data Carver Results for DFRWS 2006 Dataset
The results from the DFRWS 2006 dataset demonstrate relatively poor performance
across the tools. A file is counted as successful only if it is carved completely and
correctly. It was notable on a number of occasions across all tools
that partial recovery was possible. Indeed, utilizing Scalpel, three of the fragmented
image files were partially recovered. In these particular cases, enough of the image
was recovered to recognize the content and thus be of potential use; however, this is
not necessarily always the case. Notably, none of the carvers supported the Microsoft Excel
spreadsheet or the text file formats, so neither was successfully carved. That said,
some of the text files were contained within other partially carved files (i.e. appeared
as a fragment after an HTML file). Initially, the 2007 dataset was also going to be
evaluated; however, as it represents a more complex scenario incorporating a wider
range of file types such as MP3, AVI, FLV and PDF, and given the performance
against the 2006 dataset, this was deemed unnecessary. Analysis of these results shows
that the data carvers have a significant issue when it comes to files that are fragmented,
out of sequence or missing. What is particularly surprising is that these problems have
been established for over 8 years and modern carvers are still unable to process them
(Garfinkel, 2007).
3. FILE RECOVERY USING ARTIFICIAL HEADERS (FRAH)
Given the prior art and evaluation of the tools, the research sought to develop an
approach to data carving that addresses two issues:
• To provide the ability to render files with missing or corrupt headers
• To provide the ability to render fragments of data that contain no associated
header information.
This approach to the problem enables the investigator to examine whether files that
do not render (or cannot be opened) might be incomplete and yet contain
valuable information. It also provides a means to examine the slack space areas
within the drive to determine whether the data is meaningful. It achieves this by
inserting an artificial header on the file and subsequently manipulating the data in
order to determine whether a valid file is present. A process model for the approach is
presented in Figure 2.

Figure 2: FRAH Process Model
In order to test the approach, a prototype was developed. As illustrated in Figure 3, a
simple interface was developed that accepts the location of the file and then
evaluates the data against a set of pre-defined file types (e.g.
BMP, PNG, GIF, PDF). In order to focus upon the concept of artificial headers, the
tool was designed to take files that AccessData’s FTK was able to extract, rather than
working on the forensic image directly; however, future developments will include
this functionality. After the file has been entered and a file type selected, the system
applies the appropriate header and attempts to open the file using the system’s built-in
viewer.
Figure 3: FRAH Interface
For the purposes of demonstrating the capability, the tool leaves the decision
as to whether the file content is valid or not to the investigator. However, for large
numbers of files, this process will need to be automated.
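The core step of the approach, placing an artificial header in front of data whose own header is damaged or absent, can be sketched as follows for the BMP case. This is a hedged illustration rather than the FRAH implementation, which the paper does not list: the function names, the fixed 54-byte uncompressed layout, the 2835 pixels-per-metre resolution and the need for the investigator to supply (or guess) the image dimensions are all assumptions.

```python
import struct


def make_bmp_header(width: int, height: int, bits_per_pixel: int = 24) -> bytes:
    """Build a 54-byte header for an uncompressed BMP of the given size.

    Hypothetical helper: dimensions must be supplied or guessed, since a
    headerless fragment does not record them.
    """
    # Each pixel row is padded to a multiple of 4 bytes.
    row_size = ((bits_per_pixel * width + 31) // 32) * 4
    image_size = row_size * abs(height)
    # BITMAPFILEHEADER: magic, file size, two reserved fields, pixel offset.
    file_header = struct.pack("<2sIHHI", b"BM", 54 + image_size, 0, 0, 54)
    # BITMAPINFOHEADER: size, width, height, planes, bpp, compression (0),
    # image size, x/y resolution, colours used, important colours.
    info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1,
                              bits_per_pixel, 0, image_size, 2835, 2835, 0, 0)
    return file_header + info_header


def apply_artificial_header(data: bytes, header: bytes, skip: int = 0) -> bytes:
    """Discard the first `skip` bytes of `data` and prepend `header`.

    `skip` is the length of any damaged header to be ignored; use 0 for a
    fragment that has no header at all.
    """
    return header + data[skip:]
```

The result would then be written out and opened in a viewer. Other formats need different artificial headers (for example, a PNG requires its signature and an IHDR chunk), so a sketch like this would need one builder per supported file type.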
4. EVALUATION & DISCUSSION
In order to test the tool across the differing file types, a number of test files were
created (2 BMP, 2 PNG, 1 GIF, 1 PDF). In each of the cases, the header information
was corrupted through the deletion or addition of random bytes. Importantly,
however, in all but one test file (Testfile1b), the data carving signature was included,
meaning data carvers should be able to identify the file. As illustrated in Figure 4, in a
standard file system view of the files, none of them is either rendered or identified
except for Testfile1a, which is recognized as a BMP merely due to the file extension
being present on the file name. Nevertheless, the image still cannot be rendered due
to corruption.
Figure 4: Evaluation Files: Initial State
As illustrated in Table 3 and Figure 5, the application of FRAH results in each of
these files being recoverable. In each case, FRAH simply ignores any header
information present and inserts an artificial header onto the file.
Filename     Carve       File Type   File Type   Successful
             Signature   Analysis                Carve
Testfile1a   Yes         BMP         BMP         Yes
Testfile1b   No          Unknown     BMP         Yes
Testfile2a   Yes         Unknown     PNG         Yes
Testfile2b   Yes         Unknown     PNG         Yes
Testfile3a   Yes         Unknown     GIF         Yes
Testfile4a   Yes         Unknown     PDF         Yes
Table 3: Evaluation Results
Interestingly, even with valid carver signatures present in five of the six files, AccessData’s
FTK was unable to recover any of the files when tested against them.
The FTK data carving process did, however, recover three partially carved files,
but all three were associated with images contained within the PDF of Testfile4a.
Figure 5: Evaluation Files: Post FRAH

Notably, neither of the forensic images (DFRWS 2006 and 2007) contains files where
the header is specifically corrupted or no longer present, although files that have been
fragmented could arguably fall into this category for any fragments (bar the one
containing the header). Therefore, a secondary external source was identified in order
to evaluate the tool. The DC3 Digital Forensics Challenge is an annual forensics
challenge run by the US Department of Defense (DC3, 2013). The challenge involves
users putting their knowledge of security to the test in completing a range of tasks
such as data carving, decryption, file registry analysis and steganography. The
challenge includes two files with missing headers (a PNG and a PDF). As illustrated
in Figure 4 both of these files were recovered successfully.
Whilst the evaluation has proven successful, further analysis of the scenarios that
would naturally occur within cases does highlight a number of limitations with the
current approach. FRAH currently operates by inserting an artificial header onto the
payload of the file. If a file header is corrupted then FRAH is able to recover the file.
However, in circumstances where the header or the first fragment is missing, it is
likely that elements of the payload in addition to the header are also missing. Further
research needs to investigate the impact of missing or corrupt payload data, with a
view to padding and manipulating the data in order to recover the file contents
that remain. This would then also permit individual fragments
of data to be recovered (rather than simply the first fragment, as is typical with data
carvers today).
5. CONCLUSIONS
The proposed tool is capable of recovering files with corrupt or missing header
information across a number of standard file types. An analysis of current data carvers
demonstrated that none of these tools currently has such a capability, and the
evaluation successfully demonstrated recovery for all test files.
The initial prototype is however limited and further research is required to provide a
more robust carver with a level of automation. Enhancements are required in the
following areas:
• The ability to accept a range of data fragments, rather than a single file, so that
multiple data fragments can be readily analyzed
• To automate the identification of meaningful data, thereby removing the need
for human intervention
• To manipulate the file contents in a systematic fashion in order to enable
successful viewing of the content
• To increase the range of file types supported
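As one possible starting point for the automation enhancement listed above, a rebuilt file can be checked structurally instead of by eye. The sketch below does this for PNG only, walking the chunk list and verifying each chunk's CRC-32 as defined by the PNG format. It is an assumption about how automation might begin, not part of the prototype, and the helper names (`png_chunks_valid`, `_chunk`, `minimal_png`) are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunks_valid(data: bytes) -> bool:
    """True if `data` has the PNG signature and every chunk up to and
    including IEND has a correct CRC-32. Does not decode the image."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        body_end = pos + 8 + length
        if body_end + 4 > len(data):
            return False  # chunk claims more data than is present
        body = data[pos + 8:body_end]
        (stored_crc,) = struct.unpack(">I", data[body_end:body_end + 4])
        # The CRC covers the chunk type and body, not the length field.
        if zlib.crc32(chunk_type + body) & 0xFFFFFFFF != stored_crc:
            return False
        if chunk_type == b"IEND":
            return True
        pos = body_end + 4
    return False  # ran out of data before reaching IEND


def _chunk(chunk_type: bytes, body: bytes) -> bytes:
    """Serialize one PNG chunk (used only to build the demo file)."""
    crc = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)


def minimal_png() -> bytes:
    """A 1x1 greyscale PNG, built in memory for demonstration."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one pixel
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", idat) + _chunk(b"IEND", b""))
```

A check like this only shows that the structure is intact. Whether the content is meaningful to the case remains a judgement for the investigator, and formats without per-block checksums would need a different test.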
REFERENCES
AccessData (2014) “FTK-Forensic Toolkit”, http://www.accessdata.com/products/digital-forensics/ftk,
date accessed: 02 February 2014
Beek, C. (2011) “Introduction to File Carving”, McAfee white paper,
http://www.mcafee.com/uk/resources/white-papers/foundstone/wp-intro-to-file-carving.pdf, date
accessed: 18 February 2014

Brainz (2014) “15 Criminal Cases Solved With Digital Evidence”,
http://brainz.org/15-criminal-cases-solved-digital-evidence/, date accessed: 15 January 2014
Casey, E. ed (2010). “Handbook of Digital Forensics and Investigation”, Academic Press.
p. 567. ISBN 0-12-374267-6
DC3 (2013) “DC3 Cyber Crime Challenges”, https://www.dc3.mil/challenge/, date accessed: 10
February 2014
DFRWS (2006) “DFRWS 2006 Forensics Challenge Overview”,
http://www.dfrws.org/2006/challenge/index.shtml, date accessed: 28 January 2014
DFRWS (2007) “DFRWS 2007 Forensics Challenge Overview”,
http://www.dfrws.org/2007/challenge/index.shtml, date accessed: 28 January 2014
European Anti-Fraud Office (2014) “Digital Forensics”,
http://ec.europa.eu/anti_fraud/investigations/forensics/index_en.htm, date accessed: 19 February 2014
FBI (2011) “Digital Forensics Regional Labs Help Solve Local Crimes”,
http://www.fbi.gov/news/stories/2011/may/forensics_053111, date accessed: 20 January 2014
Garfinkel, S. (2007). Carving contiguous and fragmented files with fast object validation. Available:
http://dfrws.org/2007/proceedings/p2-garfinkel.pdf. Last accessed 12th Apr 2013.
Guidance Software (2014) “EnCase Forensic”,
http://www.guidancesoftware.com/products/Pages/encase-forensic/overview.aspx?cmpid=nav, date
accessed: 06 February 2014
Hand, S. (2012). Bin-Carver: Automatic Recovery of Binary Executables. Available:
http://www.dfrws.org/2012/proceedings/DFRWS2012-12.pdf. Last accessed 12th Apr 2013.
Inforsecusa (2011) “Computer Forensics Criminal Cases”,
http://infosecusa.com/computer-forensics-criminal-cases, date accessed: 22 January 2014
Kloet, B. (2010). Advanced File Carving. Available:
http://computer-forensics.sans.org/summit-archives/2010/eu-digital-forensics-incident-response-summit-bas-kloet-advanced-file-carving.pdf.
Last accessed 12th Apr 2013.
Merola, A. (2008) “Data Carving Concepts”,
http://www.sans.org/reading-room/whitepapers/forensics/data-carving-concepts-32969, date
accessed: 10 February 2014
Pal, A. & Memon, N. (2009). The Evolution of File Carving. Available:
http://digital-assembly.com/technology/research/pubs/ieee-spm-2009.pdf. Last accessed 12th Apr 2013.
