PhotoRec Data Carving - CGSecurity
PhotoRec Data Carving
Data carving is the process of extracting a collection of data from a larger data set. Data carving techniques are frequently used during a digital investigation, when the unallocated file system space is analyzed to extract files. The files are "carved" from the unallocated space using file type-specific header and footer values. File system structures are not used during the process. This is exactly how PhotoRec works.
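As a rough illustration of header/footer carving, here is a minimal Python sketch. It is not PhotoRec's implementation: the function name carve_jpegs is hypothetical, and the sketch ignores sector alignment, fragmentation, embedded thumbnails and validation, all of which a real carver has to deal with.

```python
JPEG_HEADER = b"\xff\xd8\xff"  # SOI marker followed by the first byte of the next marker
JPEG_FOOTER = b"\xff\xd9"      # EOI marker


def carve_jpegs(data: bytes) -> list:
    """Return (start, end) byte offsets of naive header..footer spans.

    Hypothetical helper: it pairs each header with the next footer and
    performs no validation of the bytes in between.
    """
    spans = []
    pos = 0
    while True:
        start = data.find(JPEG_HEADER, pos)
        if start == -1:
            break
        end = data.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break
        end += len(JPEG_FOOTER)  # make the end offset exclusive
        spans.append((start, end))
        pos = end
    return spans
```

Pairing a header with the nearest footer fails as soon as two files are interleaved, which is the situation handled manually later on this page.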
The Digital Forensics Research Workshop (DFRWS) has issued a data carving challenge.
The data set for this challenge is a 50MB raw file. It has no file system, but it contains JPEG, ZIP, HTML, Text, and Microsoft Office files and fragments. The goal is to extract as many full JPEG, ZIP, HTML, Text, and Office files as possible from it. Using this challenge as a test bed, PhotoRec has been improved to recover even more data than before.
Everyone is welcome to contribute to the project.
Contents
1 Data Recovery Process
1.1 PhotoRec
1.2 Manual recovery of remaining JPEG
1.3 Manual recovery of zip files
1.4 Manual recovery of XLS/Ole file
2 Disk Layout
3 Files
4 Conclusion
Data Recovery Process
PhotoRec
The first step was to run PhotoRec. Version 6.5-WIP (WIP = Work In Progress) was used.
PhotoRec scanned the image file for known headers and successfully recognised all JPEG, OLE/Office, HTML and ZIP headers. There were no false positives.
The JPEG footer is used to determine the file size, and PhotoRec checks the validity of each recovered JPEG using libjpeg.
ZIP footers are detected, but file integrity isn't checked.
The OLE file format is very complex and its internals are similar to a file system, but PhotoRec is able to get the file size by analyzing the FAT.
Text files are hard to detect because they have no header. After a UTF-8 to ASCII translation, PhotoRec calculates the index of coincidence to determine whether a sector holds text or random data. False positives can occur if a Doc or an HTML file isn't detected correctly (i.e. fragmented data).
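The index of coincidence can be sketched as follows. This is an illustration under stated assumptions, not PhotoRec's code: it works directly on raw bytes and skips the UTF-8 to ASCII translation, and both the helper names and the 0.02 threshold are hypothetical choices made for the example.

```python
from collections import Counter


def index_of_coincidence(block: bytes) -> float:
    """Probability that two distinct positions in block hold the same byte."""
    n = len(block)
    if n < 2:
        return 0.0
    counts = Counter(block)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def looks_like_text(block: bytes, threshold: float = 0.02) -> bool:
    """Guess whether a sector holds text (threshold is an assumed value)."""
    return index_of_coincidence(block) > threshold
```

Uniformly random bytes give a value close to 1/256, while natural-language text, where a few characters dominate, gives a noticeably higher value.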
Manual recovery of remaining JPEG
PhotoRec can handle some forms of data fragmentation in JPEG files: using the libjpeg library, it is able to check the recovered data. This way, it recovered 9 JPEG files perfectly. Manual recovery was then started for the remaining files. Using dd and PhotoRec, additional files were recovered.
A picture of a hedgehog begins at sector 31475 and a picture from Mars begins at sector 31533.
Extract from photorec.log:
31475-31532: jpg
31533-32836: jpg
The second picture begins before the first one is finished, so both pictures are corrupted.
Reading the PhotoRec log file, we learn that the Mars picture is corrupted after about 118784 bytes (JPG error at offset 118784). Let's try to find the exact size of the data fragment.
$ dd if=dfrws-2006-challenge.raw of=mars1.jpg skip=31533 count=`expr 118784 / 512`
232+0 records in
232+0 records out
$ display mars1.jpg
display: Corrupt JPEG data: premature end of data segment `mars1.jpg'.
display: Corrupt JPEG data: premature end of data segment `mars1.jpg'.
The JPEG fragment was extracted as 232 sectors, but garbage can be seen at the end of the image, which means the fragment is too large.
By trial and error, it's possible to determine that the fragment is 220 sectors long.
$ dd if=dfrws-2006-challenge.raw of=mars2.jpg skip=31533 count=220
220+0 records in
220+0 records out
$ display mars2.jpg
display: Corrupt JPEG data: premature end of data segment `mars2.jpg'.
display: Corrupt JPEG data: premature end of data segment `mars2.jpg'.
There is no garbage left in the picture.
31475-31532: jpg fragment, hedgehog
31533-31752: jpg fragment, mars
31753-32836 ?
$ dd if=dfrws-2006-challenge.raw skip=31475 count=`expr 31532 - 31475 + 1` > hedgehog.jpg
58+0 records in
58+0 records out
$ dd if=dfrws-2006-challenge.raw skip=31753 count=`expr 32836 - 31753 + 1` >> hedgehog.jpg
1084+0 records in
1084+0 records out
$ display hedgehog.jpg
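The dd commands above copy inclusive sector ranges and append them to one file. The same operation can be sketched in Python, assuming the whole image has been read into memory (it is about 50MB); read_sectors and join_fragments are hypothetical helper names.

```python
SECTOR_SIZE = 512


def read_sectors(image: bytes, first: int, last: int) -> bytes:
    """Return sectors first..last (inclusive), like dd skip=first count=last-first+1."""
    return image[first * SECTOR_SIZE:(last + 1) * SECTOR_SIZE]


def join_fragments(image: bytes, ranges) -> bytes:
    """Concatenate several inclusive sector ranges in the given order."""
    return b"".join(read_sectors(image, first, last) for first, last in ranges)


# Possible usage, mirroring the two dd commands (not executed here):
#   with open("dfrws-2006-challenge.raw", "rb") as f:
#       image = f.read()
#   data = join_fragments(image, [(31475, 31532), (31753, 32836)])
#   with open("hedgehog.jpg", "wb") as out:
#       out.write(data)
```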
Now, the exact file size can be found using PhotoRec on the recovered picture.
$ photorec hedgehog.jpg
PhotoRec 6.4, Data Recovery Utility, June 2006
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Please wait...
Disk hedgehog.jpg - 584 KB / 571 KiB - CHS 1 255 63 (RO), sector size=512
PhotoRec exited normally.
$ ls -l recup_dir.1/f0.jpg
-rw-rw-r-- 1 kmaster kmaster 98354 Jul 10 11:30 recup_dir.1/f0.jpg
$ md5sum recup_dir.1/f0.jpg
db89684c177168036e274140ecf766a1 recup_dir.1/f0.jpg
The picture size is 98354 bytes (193 sectors). We can now recover the Mars picture.
$ expr 31753 + 193 - 58
31888
$ dd if=dfrws-2006-challenge.raw skip=31533 count=220 > mars3.jpg
220+0 records in
220+0 records out
$ dd if=dfrws-2006-challenge.raw skip=31888 count=`expr 32836 - 31888 + 1` >> mars3.jpg
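The sector arithmetic behind these commands can be written out explicitly. The sketch below uses hypothetical helper names and only the numbers already given on this page: a file of a known byte size occupies a whole number of 512-byte sectors, and whatever the first fragment does not cover must lie in the second one.

```python
SECTOR_SIZE = 512


def sectors_needed(size_bytes: int) -> int:
    """Number of whole sectors occupied by a file of size_bytes (ceiling division)."""
    return -(-size_bytes // SECTOR_SIZE)


def second_fragment(size_bytes: int, first_fragment_sectors: int, second_start: int):
    """Return (sector count, last sector, next free sector) of the second fragment."""
    remaining = sectors_needed(size_bytes) - first_fragment_sectors
    last = second_start + remaining - 1
    return remaining, last, last + 1


# Hedgehog: 98354 bytes, 58 sectors in the first fragment, second fragment at 31753.
print(second_fragment(98354, 58, 31753))
# Mars: 188693 bytes, 220 sectors in the first fragment, second fragment at 31888.
print(second_fragment(188693, 220, 31888))
```

For the hedgehog this gives a second fragment of 135 sectors ending at sector 31887, so the second Mars fragment starts at sector 31888, the value computed with expr above.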
As seen before, it's possible to get the exact file size:
$ photorec mars3.jpg
PhotoRec 6.4, Data Recovery Utility, June 2006
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Please wait...
Disk mars3.jpg - 598 KB / 584 KiB - CHS 1 255 63 (RO), sector size=512
PhotoRec exited normally.
$ ls -l recup_dir.2/
total 192
-rw-rw-r-- 1 kmaster kmaster 188693 Jul 10 11:47 f0.jpg
$ md5sum recup_dir.2/f0.jpg
0915313e99af0f6bf13bc06bcd003113 recup_dir.2/f0.jpg
Manual recovery of zip files
PhotoRec recovered three zip files, but one of them is corrupted.
A small Perl script was used to fix the zip file that PhotoRec found beginning at sector 45015.
Using unzip, this script locates and removes the extra sectors present in the file.
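The Perl script itself is not reproduced on this page. As a hedged illustration of the same idea, the Python sketch below brute-forces the position and length of a run of foreign sectors, using the standard zipfile module in place of unzip to test each candidate. The function names and the max_extra limit are assumptions made for this example.

```python
import io
import zipfile

SECTOR_SIZE = 512


def zip_is_valid(data: bytes) -> bool:
    """True if the archive opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return zf.testzip() is None
    except Exception:  # a damaged archive can raise several exception types
        return False


def find_extra_sectors(data: bytes, max_extra: int = 8):
    """Return (first sector, sector count) of a run whose removal yields a valid zip.

    Tries every sector-aligned run of 1..max_extra sectors; returns None on failure.
    """
    n_sectors = -(-len(data) // SECTOR_SIZE)
    for extra in range(1, max_extra + 1):
        for start in range(n_sectors - extra + 1):
            a = start * SECTOR_SIZE
            b = a + extra * SECTOR_SIZE
            if zip_is_valid(data[:a] + data[b:]):
                return start, extra
    return None
```

Each candidate is tested in full, so the cost grows with the archive size times the number of candidates; for a file of a few hundred sectors this stays manageable.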
Manual recovery of XLS/Ole file
Office documents, including Excel workbooks, use the OLE file format.
A document was identified at sector 2051, but PhotoRec did not recover it successfully; the file may be fragmented. An OLE file consists of a header structure followed by a list of sectors. In our case, the sector size is 512 bytes.
The SID (Sector Identifier) of the first sector of the directory stream is 1688.
The master sector allocation table uses 14 sectors: SIDs 1673-1685 and 1689.
The directory stream, found at sector 3761 (SID 1688), lists the following components:
Workbook SID 0 size 848333
SummaryInformation SID 1657 size 4096
DocumentSummaryInformation SID 1690 size 4096
Sectors | Object | SID
2051 | Header | N/A
2052-?,?-3729 | Workbook | 0-1656
x-x+20 | 21 extra sectors, not XLS | N/A
3730-3737 | SummaryInformation | 1657-1664
3746-3758,3762 | Allocation Table | 1673-1685,1689
3761 | RootDirectory | 1688
3763-3770 | DocumentSummaryInformation | 1690-1697
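The figure of 21 extra sectors follows from comparing where each SID should be with where it was found. In an OLE file with 512-byte sectors, SID 0 immediately follows the header sector, so a contiguous file starting at sector 2051 would place SID n at sector 2051 + 1 + n. The sketch below uses hypothetical helper names and checks that relation against the values listed above.

```python
HEADER_SECTOR = 2051  # image sector holding the OLE header
SECTOR_SIZE = 512


def expected_sector(sid: int, header_sector: int = HEADER_SECTOR) -> int:
    """Image sector where SID would sit if the file were contiguous."""
    return header_sector + 1 + sid


def extra_sectors(observed_sector: int, sid: int, header_sector: int = HEADER_SECTOR) -> int:
    """Number of foreign sectors inserted before the observed position."""
    return observed_sector - expected_sector(sid, header_sector)


# (observed image sector, SID) pairs taken from the values listed above
for observed, sid in [(3730, 1657), (3746, 1673), (3761, 1688), (3763, 1690)]:
    print(sid, extra_sectors(observed, sid))

# The Workbook stream of 848333 bytes needs this many 512-byte sectors:
print(-(-848333 // SECTOR_SIZE))
```

Every pair gives the same shift of 21 sectors, and the Workbook stream needs 1657 sectors (SIDs 0-1656), so the foreign sectors must lie somewhere inside the Workbook range 2052-3729.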
Unfortunately, I failed to locate the 21 extra sectors.
Nevertheless, the latest version of OpenOffice was able to open the corrupted file and display most of the data.
A new version of the document can be found on the FCC web site.
Disk Layout
Sectors | File type | Note
0-8 | HTML (fragment) | Alice in Wonderland by Lewis Carroll
9-44 | HTML | Alice in Wonderland by Lewis Carroll
2051 | Office (fragment) | Excel, http://www.fcc.gov/Forms/Form477/477.xls
3868-4428 | JPG | 640x481 Mars
4436-4455 | HTML (end is missing) | A STUDY IN SCARLET by Sir Arthur Conan Doyle
4456-4501 | HTML | Stave 1: Marley's Ghost by Charles Dickens
4502-4556 | HTML (beginning is missing) | Stave 1: Marley's Ghost by Charles Dickens
7964-8284 | Office | Upcoming Research Symposium 1/3
8285-9473 | JPG | http://www.dfrws.org/2004/photos/day2/rodeo1-3-dfrws2004.jpg
9474 | Office | Upcoming Research Symposium 2/3
10031 | Office | Upcoming Research Symposium 3/3
11619-11822 | JPG | yeast 1/2
11823-11848 | Text | Moby Dick, Chapter i - LOOMINGS (page 1-6)
11849-12017 | JPG | yeast 2/2
12222-26116 | JPG | DFRWS 2006 Forensics Challenge, 11598x11598
27496-27606 | HTML | The Comedy of Errors by Shakespeare, Act I, Scene I (1/2)
27607-27977 | JPG | The porcupine
27978-28196 | HTML | The Comedy of Errors by Shakespeare (2/2)
28244-28245 | HTML | Moby Dick - chapter 134 (1/2)
28246-28306 | Text (fragment) | De la division du travail social, Emile Durkheim
28307-28344 | HTML | Moby Dick page - chapter 135 (2/2)
28439-28726 | ZIP | Zip Ok
28729-29528 | ZIP | ZIP 1/2
29529-29895 | HTML | The Tempest, Shakespeare
29896-31368 | ZIP | ZIP 2/2
31475-31532 | JPG | A hedgehog (1/2)
31533-31752 | JPG | Mars (1/2)
31753-31887 | JPG | A hedgehog (2/2)
31888-32036 | JPG | Mars (2/2)
32837-33397 | Office | http://www.tsa.gov/public/interweb/assetlibrary/Permitted_Prohibited_Facts.doc
34288-34306 | Office | "Reports on Computer Systems Technology"; http://csrc.nist.gov/publications/nistpubs/800-26/sp800-26.doc 1/2
34307-34412 | Text | The Adventure of the Copper Beeches
34413-36236 | Office | http://csrc.nist.gov/publications/nistpubs/800-26/sp800-26.doc 2/2
36292-36640 | JPG | ?
36998-37649 | Office | PREVENTING CRIME: WHAT WORKS, WHAT DOESN'T, WHAT'S PROMISING; http://www.ncjrs.org/docfiles/wholedoc.doc 1/3
37727-39427 | Office | http://www.ncjrs.org/docfiles/wholedoc.doc 2/3
39477-40380 | Office | http://www.ncjrs.org/docfiles/wholedoc.doc 3/3
40638-41219 | JPG | http://www.dfrws.org/2004/photos/day2/rodeo1-breaf-dfrws2004.jpg 1/2
41239-41609 | JPG | http://www.dfrws.org/2004/photos/day2/rodeo1-breaf-dfrws2004.jpg 2/2
41611-43433 | JPG | http://imgsrc.hubblesite.org/hu/db/2006/10/images/a/formats/1280_wallpaper.jpg (1/2)
43434-44028 | JPG | http://www.dfrws.org/2004/photos/day2/rodeo1-dfrws2004.jpg
44029-44200 | JPG | http://imgsrc.hubblesite.org/hu/db/2006/10/images/a/formats/1280_wallpaper.jpg (2/2)
45015-45386 | ZIP | Zip 1/2
45390-45545 | ZIP | Zip 2/2
45566-45963 | JPG | U. S. Geological Survey Open-File Report 01-154; Slope off Florida Keys; http://pubs.usgs.gov/of/2001/of01-154/data/bphotos/1565.jpg 1/2
45964-46103 | Office | Farm Credit System Insurance Corporation; Statement of Financial Condition; March 31, 2006 and December 31, 2005; http://www.fcsic.gov/documents/3-31-2006%20Financial%20Statement.doc
46104-46826 | JPG | http://pubs.usgs.gov/of/2001/of01-154/data/bphotos/1565.jpg 2/2
46910-94836 | JPG | DFRWS 2006 Forensics Challenge, 8640x8640
94846-95628 | JPG | Saturn http://imgsrc.hubblesite.org/hu/db/2001/15/images/a/formats/full_jpg.jpg (1/2)
95630-96653 | JPG | Saturn http://imgsrc.hubblesite.org/hu/db/2001/15/images/a/formats/full_jpg.jpg (2/2)
Files
File type | File size (in bytes) | MD5 hash | Sectors | Note | PhotoRec Score
HTML (fragment) | 4608 | ec89111e45da8265b641655d0f68725e | 0-8 | Alice in Wonderland by Lewis Carroll | 5
HTML | 18147 | eec87931b03e5a4a4ef8fd51109a1227 | 9-44 | Alice in Wonderland by Lewis Carroll | 5
Office | 869888 | ? | ~ 2051-3770 (21 extra sectors) | http://www.fcc.gov/Forms/Form477/477.xls | 1
JPG | 287186 | daf4205574abd6919b10ca8be92d17a3 | 3868-4428 | 640x481 Mars | 5
HTML (end is missing) | 10240 | 799ad2d2f2f1f17657338d98c97559c4 | 4436-4455 | A STUDY IN SCARLET by Sir Arthur Conan Doyle | 5
HTML | 23544 | f4481ed348d3d59c5dad80afeb0341f9 | 4456-4501 | Stave 1: Marley's Ghost by Charles Dickens | 5
HTML (beginning is missing) | 27875 | baf8b811ee9502408f9f0e73efa77cf0 | 4502-4556 | Stave 1: Marley's Ghost by Charles Dickens | 5
Office | 450048 | 8d2a9a284e078805ada47db191f35244 | 7964-8284, 9474-10031 | Upcoming Research Symposium | 5
JPG | 608703 | 4efc6c572683878efd8f3404ddaded7b | 8285-9473 | http://www.dfrws.org/2004/photos/day2/rodeo1-3-dfrws2004.jpg | 5
JPG | 190720 | 7b07320709e0caa947663f5df3a0a390 | 11619-11822, 11849-12017 | yeast | 5
Text | 12826 | f800a46e18fafd309825c5ee84a654a2 | 11823-11848 | Moby Dick, Chapter i - LOOMINGS (page 1-6) | 3
JPG | 7113968 | b070beae1606f67a342bc5f78c29c743 | 12222-26116 | DFRWS 2006 Forensics Challenge, 11598x11598 | 5
HTML | 168525 | 1959aa0391664b60fd0f2e64ed7a22f4 | 27496-27606, 27978-28196 | The Comedy of Errors by Shakespeare, Act I, Scene I | 2
JPG | 189534 | fe7e7ac67709f2d9c2483aa98c681b99 | 27607-27977 | The porcupine | 5
HTML | 20019 | 045798407b927321326a547704e67831 | 28244-28245, 28307-28344 | Moby Dick - chapter 134 and 135 | 2
Text (fragment) | 30816 | 616a6bbe915c3dbf51014fd76f55b0e3 | 28246-28306 | De la division du travail social, Emile Durkheim | 0
ZIP | 147150 | ebabde39ba44d38888dd82606980498a | 28439-28726 | Zip Ok | 5
ZIP | 1163745 | 9a4c2d3a9bd203eb39c9f954a3c997e4 | 28729-29528, 29896-31368 | ZIP | 5
HTML | 187793 | 158496c522d97b7389c9907cae777ac1 | 29529-29895 | The Tempest, Shakespeare | 5
JPG | 98354 | db89684c177168036e274140ecf766a1 | 31475-31532, 31753-31887 | A hedgehog | 2
JPG | 188693 | 0915313e99af0f6bf13bc06bcd003113 | 31533-31752, 31888-32036 | Mars | 2
Office | 287232 | 0e52e75029e99cd2e9dcd0af271cf4a2 | 32837-33397 | http://www.tsa.gov/public/interweb/assetlibrary/Permitted_Prohibited_Facts.doc | 5
Office | 943616 | d7ff92b8cc1c89c46a78288b9c673152 | 34288-34306, 34413-36236 | http://csrc.nist.gov/publications/nistpubs/800-26/sp800-26.doc | 2
Text | 53870 | 5a12ef9dba88a186ef18a5d349b28e37 | 34307-34412 | The Adventure of the Copper Beeches | 3
JPG | 178659 | 2fae8770cc013d22e9ea1c070f2f509b | 36292-36640 | ? | 5
Office | 1667584 | 4a22f04b097920d11fff4e192e0667a4 | 36998-37649, 37727-39427, 39477-40380 | PREVENTING CRIME: WHAT WORKS, WHAT DOESN'T, WHAT'S PROMISING; http://www.ncjrs.org/docfiles/wholedoc.doc | 2
JPG | 487473 | f8c51e0688796b5d616f0e5d4a94d104 | 40638-41219, 41239-41609 | http://www.dfrws.org/2004/photos/day2/rodeo1-breaf-dfrws2004.jpg | 2
JPG | 1021085 | 7cce072e518fd72484c97adb1b4be08e | 41611-43433, 44029-44200 | http://imgsrc.hubblesite.org/hu/db/2006/10/images/a/formats/1280_wallpaper.jpg | 5
JPG | 304413 | c0da37b3f1a07af790e6e9171cedc4d2 | 43434-44028 | http://www.dfrws.org/2004/photos/day2/rodeo1-dfrws2004.jpg | 5
ZIP | 270181 | f940fcc37c82e8ff1431e5c3c061611e | 45015-45386, 45390-45545 | Zip | 2
JPG | 573499 | 2320fe9c41eaddb864a56c2ddc4dd186 | 45566-45963, 46104-46826 | U. S. Geological Survey Open-File Report 01-154; Slope off Florida Keys; http://pubs.usgs.gov/of/2001/of01-154/data/bphotos/1565.jpg | 5
Office | 71680 | 109284cc5abddc83879a29785795fd75 | 45964-46103 | Farm Credit System Insurance Corporation; Statement of Financial Condition; March 31, 2006 and December 31, 2005; http://www.fcsic.gov/documents/3-31-2006%20Financial%20Statement.doc | 5
JPG | 24538540 | db32b271506b2f4974791957627c61cc | 46910-94836 | DFRWS 2006 Forensics Challenge, 8640x8640 | 5
JPG | 924877 | 1a5a843000ef617af93a9cad645e3cdf | 94846-95628, 95630-96653 | Saturn http://imgsrc.hubblesite.org/hu/db/2001/15/images/a/formats/full_jpg.jpg | 1
PhotoRec Score Legend:
0 File not found
1 First sector identified
2 + correct file type
3 + all sectors identified
4 + correct file size
5 + correct checksum
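One way to sanity-check a row of the Files table is to compare the number of listed sectors with the number of sectors implied by the file size. The sketch below uses hypothetical helper names and handles only rows whose Sectors column is a plain list of ranges; it does not parse annotated entries such as the "~ 2051-3770 (21 extra sectors)" row.

```python
SECTOR_SIZE = 512


def parse_ranges(text: str) -> list:
    """Parse '31475-31532, 31753-31887' into inclusive (first, last) pairs."""
    ranges = []
    for part in text.split(","):
        first, _, last = part.strip().partition("-")
        ranges.append((int(first), int(last or first)))  # a lone number is a 1-sector range
    return ranges


def sector_count(ranges) -> int:
    """Total number of sectors covered by inclusive ranges."""
    return sum(last - first + 1 for first, last in ranges)


def size_matches(size_bytes: int, sectors_text: str) -> bool:
    """True if the listed sectors are exactly enough to hold size_bytes."""
    needed = -(-size_bytes // SECTOR_SIZE)  # ceiling division
    return sector_count(parse_ranges(sectors_text)) == needed
```

For example, the hedgehog row (98354 bytes, sectors 31475-31532 and 31753-31887) covers 58 + 135 = 193 sectors, which is exactly what 98354 bytes require.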
Conclusion
PhotoRec has been able to retrieve most files automatically.
Results can still be improved by brute forcing JPG fragment location or adding some JPG search-only phase but this can be time-consuming.
Thanks to
Daniel Sedory, for letting me know about this contest and for his long-time involvement in the TestDisk/PhotoRec project
the following ESIEA students: Gregory BLANC, Fabien BOUFFARD, Hicham CHAARAOUI, Karim EL FILALI, Amine HASSANI, Igor VALLEE for their work on OLE file format.
Christophe GRENIER
Category: Data Recovery
This page was last modified 20:33, 22 October 2006.
Content is available under GNU Free Documentation License 1.2.