Computer Forensic Exam of Najibullah Zazi’s Laptop

Earlier today, Jonathan Abolins tweeted about a US DOJ memorandum on detainee Najibullah Zazi.  The memorandum is about the motion the US government filed for a permanent order of detention for Zazi.  Part of the evidence that supports the order of detention comes from a forensic exam of Zazi’s laptop.  I found a few pieces of evidence quite interesting from a digital forensics perspective.

  • Zazi is associated with three separate email accounts.  The memorandum states that one account is “directly subscribed to Zazi”, and “all three accounts contain slight variations of the same password.”
    • While not the best password policy, it could help with attribution.
  • JPEG images of handwritten notes about explosives (manufacture, handling, etc.) were found as email attachments.
    • Keyword searches would probably fail to find this evidence, since the notes are JPEG images.  Are there any digital forensics tools (or plugins/scripts) that support keyword searching of images? (perhaps by OCR?)
  • Browser artifacts were uncovered that suggested Zazi searched for hydrochloric acid.  Additionally, a site for “Lab Safety for Hydrochloric Acid” was bookmarked with two different web browsers.
    • The bookmarking could be useful in demonstrating intent, as users often bookmark sites they wish to remember, and/or return to.  The same bookmark in two different browsers makes this action less likely to be “accidental”.
  • Some of the browser artifacts suggested that Zazi “searched a beauty salon website for hydrocide and peroxide”.  Later, surveillance videos and receipts were used to show that Zazi purchased hydrogen peroxide products from a beauty supply store.  Other persons associated with Zazi also purchased hydrogen peroxide and acetone from three other beauty supply stores.
    • Digital evidence is just one type of evidence.  Here digital evidence (browser artifacts) is combined with physical evidence (surveillance video and receipts) to make the arguments more persuasive.
  • After executing another search warrant (at a later date), Zazi’s laptop was seized again.  The difference is that in the latter seizure, the hard drive was not recovered (it had been removed).
    • This could be considered a rudimentary form of anti-forensics.  You can’t analyze ones and zeros if they aren’t there.

You can view the memorandum here.

The Meaning of LEAK Records

I’ve been pretty quiet lately, largely due to spending time developing LibForensics.  Currently I’m adding support to read Microsoft Windows Internet cache containers (a.k.a. index.dat files).  If you’ve ever dealt with index.dat files before, you’ve probably encountered the mysterious “LEAK” record.  The purpose of this blog post is to explain one way that these records are created.

Background Information

In order to understand how LEAK records are created, it is useful to understand the Microsoft Windows Internet API.  The Microsoft Windows Internet API (WinInet) provides applications with the ability to interact with networked resources, usually over FTP and HTTP.  There are several functions in the WinInet API, including functions that provide caching.  Applications can use the WinInet API caching functions to store local (temporary) copies of files retrieved from the network.  The primary reason to use caching is to speed up future network requests, by reading a local copy of a file instead of retrieving it over the network.

A cached file is called a “Temporary Internet File” (TIF).  The WinInet API manages TIFs using cache containers, which are files named index.dat.  There are several WinInet API functions for working with entries in cache containers, including creating a URL cache entry, reading locally cached files, and deleting URL cache entries.  The WinInet API also provides a cache scavenger, which periodically runs and cleans up entries that are marked for deletion.

The cache containers (index.dat files) are almost always associated with Microsoft Internet Explorer.  This is likely because Internet Explorer is one of the most commonly used applications that uses the WinInet API caching capabilities.  However, since the WinInet API is available to any end-user application, any application can use the caching capabilities.  This can pose an issue when attributing a specific entry in the cache container to the program that generated the entry.

Internally a cache container is composed of a header, followed by one or more records.  There are several different types of records, including URL records (which describe cached URLs), and REDR records (for describing redirects).  A cached URL can have an associated TIF, which is described in the appropriate URL record.
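To make the record layout concrete, here is a small Python sketch that walks the records in a synthetic container.  It assumes a simplified layout (each record starts with a 4-byte signature followed by a 4-byte count of 0x80-byte blocks); real index.dat files also have a header, hash tables, and other structures, so treat this as an illustration rather than a parser:

```python
import struct

KNOWN_SIGNATURES = (b"URL ", b"REDR", b"LEAK", b"HASH")

def scan_records(data, first_record_offset):
    """Walk records in a (simplified) index.dat-style container.

    Assumes each record begins with a 4-byte signature and a 4-byte
    little-endian count of 0x80-byte blocks.
    """
    offset = first_record_offset
    records = []
    while offset + 8 <= len(data):
        signature = data[offset:offset + 4]
        if signature not in KNOWN_SIGNATURES:
            break
        nblocks = struct.unpack("<I", data[offset + 4:offset + 8])[0]
        if nblocks == 0:   # malformed record; avoid looping forever
            break
        records.append((offset, signature, nblocks))
        offset += nblocks * 0x80
    return records

# Synthetic container: a URL record (2 blocks) followed by a LEAK record.
container = bytearray(0x300)
container[0x000:0x008] = b"URL " + struct.pack("<I", 2)
container[0x100:0x108] = b"LEAK" + struct.pack("<I", 2)
print(scan_records(bytes(container), 0))   # [(0, b'URL ', 2), (256, b'LEAK', 2)]
```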

LEAK Records

Now that we’ve reviewed index.dat files, we’ll see how to create LEAK records.  However, before going further, I want to emphasize that this is just one approach to creating LEAK records.  LEAK records may have uses outside of what is described in this post.

For the impatient: A LEAK record can be generated by attempting to delete a URL cache entry (via DeleteUrlCacheEntry) when the associated temporary internet file (TIF) cannot be deleted.

The last paragraph of the MSDN documentation on the cache scavenger discusses what happens when a cache entry is marked for deletion:

The cache scavenger is shared by multiple processes. When one application deletes a cache entry from its process space by calling DeleteUrlCacheEntry, it is normally deleted on the next cycle of the scavenger. However, when the item that is marked for deletion is in use by another process, the cache entry is marked for deletion by the scavenger, but not deleted until the second process releases it.

To summarize, when the cache scavenger runs and it encounters an item that is marked for deletion, but the item is in use by another process, the cache entry is not actually deleted.

Another reference to LEAK records can be found at Understanding index.dat Files.  The author describes LEAK as a “Microsoft term for an error”.

Combining these two ideas (deleting a cache entry when it is in use, and LEAK as a term for error), we can come up with a theory: a LEAK record is generated when an error occurs during the deletion of a URL cache entry.  If you’ve ever taken a SANS Security 508 course (Computer Forensics, Investigation, and Response) from me, you’ll probably remember my approach to examinations (and investigations in general): theory (hypothesis) and test.

In order to test the theory, we need to create a series of statements and associated outcomes that would be true if our theory is correct.

At this stage our theory is fairly generic.  To make the theory testable, we need to make it more specific.  This means we will need to determine a series of actions that will result in the generation of a LEAK record.  The first place to look is at the MSDN documentation on the WinInet API.  To save time, rather than walking through all the WinInet API functions, I’ll just reference the relevant ones: CreateUrlCacheEntry, CommitUrlCacheEntry, RetrieveUrlCacheEntryStream, and DeleteUrlCacheEntry.

Looking at this list, there are a few possible ways to generate an error while deleting a URL cache entry:

  1. Create/Commit a URL cache entry, and lock the entry using RetrieveUrlCacheEntryStream.
  2. Create/Commit a URL cache entry and corresponding TIF, and open the TIF.
  3. Create/Commit a URL cache entry and corresponding TIF, and make the TIF read-only.

The general approach is to create (and commit) a URL cache entry, then create a condition that would make deleting the entry fail.

Let’s solidify these into testable theories as “if-then” statements (logical implications) with function calls:

  • IF we create a URL cache entry using CreateUrlCacheEntry and CommitUrlCacheEntry, lock the entry using RetrieveUrlCacheEntryStream, and call DeleteUrlCacheEntry
    • THEN we will see a LEAK record.
  • IF we create a URL cache entry and corresponding TIF using CreateUrlCacheEntry and CommitUrlCacheEntry, open the TIF using open(), and call DeleteUrlCacheEntry
    • THEN we will see a LEAK record.
  • IF we create a URL cache entry and corresponding TIF using CreateUrlCacheEntry and CommitUrlCacheEntry, make the TIF read-only using chmod, and call DeleteUrlCacheEntry
    • THEN we will see a LEAK record.
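Before touching the real API, the bookkeeping that the theory predicts can be modeled in a few lines of Python.  To be clear, this is purely a toy model of the hypothesized behavior, not the WinInet implementation: deleting an entry whose TIF is "in use" re-labels the record instead of removing it:

```python
class ToyCache:
    """Toy model of the theory (NOT the real WinInet logic): a cache
    entry whose TIF cannot be deleted is re-labeled LEAK instead of
    being removed from the container."""

    def __init__(self):
        self.records = {}      # url -> record signature ("URL " or "LEAK")
        self.tif_in_use = set()

    def commit(self, url):
        self.records[url] = "URL "

    def delete(self, url):
        if url in self.tif_in_use:
            self.records[url] = "LEAK"   # TIF undeletable: record survives
        else:
            del self.records[url]        # normal case: record goes away

cache = ToyCache()
cache.commit("http://rand_1")
cache.commit("http://rand_2")
cache.tif_in_use.add("http://rand_2")    # simulate an open TIF
cache.delete("http://rand_1")
cache.delete("http://rand_2")
print(cache.records)   # {'http://rand_2': 'LEAK'}
```

If the theory is correct, the real tests below should show the same pattern: a deletable entry vanishes, while an undeletable one leaves a LEAK record behind.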

Theory Testing

The next step is to test our theories.  It is relatively straightforward to translate the if-then statements into code.  In the “Sample Code” section I’ve included a link to a zip file that contains (amongst other things) three Python files: test_leak1.py, test_leak2.py, and test_leak3.py.  Each file implements one of the if-then statements.

Here is the output from running test_leak1.py (in a Windows 2003 virtual machine):

C:\Tools\Python31>python z:\inet_cache\test_leak1.py
Creating URL: http://rand_286715790
Using file: b'C:\Documents and Settings\Administrator\Local Settings\Temporary Internet Files\Content.IE5\81QNCLMB\CAUJ6C3U'
Locking URL: http://rand_286715790
Deleting URL: http://rand_286715790
ERROR: DeleteUrlCacheEntryA failed with error 0x20: The process cannot access the file because it is being used by another process.

The output from test_leak1.py indicates that there was an error during the call to DeleteUrlCacheEntry.  After copying the associated index.dat file to a Linux system, we can find a reference to http://rand_286715790:

xxd -g 1 -u index.dat.leak1
...
000ef00: 55 52 4C 20 02 00 00 00 00 00 00 00 00 00 00 00  URL ............
000ef10: 50 A1 F4 DB 08 32 CA 01 00 00 00 00 00 00 00 00  P....2..........
000ef20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000ef30: 60 00 00 00 68 00 00 00 02 00 10 10 80 00 00 00  `...h...........
000ef40: 01 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00  ..@.............
000ef50: 2A 3B B3 5A 02 00 00 00 01 00 00 00 2A 3B B3 5A  *;.Z........*;.Z
000ef60: 00 00 00 00 EF BE AD DE 68 74 74 70 3A 2F 2F 72  ........http://r
000ef70: 61 6E 64 5F 32 38 36 37 31 35 37 39 30 00 AD DE  and_286715790...
000ef80: 43 41 55 4A 36 43 33 55 00 BE AD DE EF BE AD DE  CAUJ6C3U........
...

The record is still marked as “URL “.  Further examination of the file shows no additional references to http://rand_286715790.  Here is the output from running test_leak2.py (in a Windows 2003 virtual machine):

C:\Tools\Python31>python z:\inet_cache\test_leak2.py
Creating URL: http://rand_3511348668
Opening file: b'C:\Documents and Settings\Administrator\Local Settings\Temporary Internet Files\Content.IE5\81QNCLMB\CAC23G8H'
Deleting URL: http://rand_3511348668

There was no clear indication that an error occurred.  After copying the index.dat file to a Linux system, we can find a reference to http://rand_3511348668:

xxd -g 1 -u index.dat.leak2
...
000ef00: 4C 45 41 4B 02 00 00 00 00 00 00 00 00 00 00 00  LEAK............
000ef10: 90 70 17 74 0C 32 CA 01 00 00 00 00 00 00 00 00  .p.t.2..........
000ef20: 00 04 00 00 00 00 00 00 00 00 00 00 00 E7 00 00  ................
000ef30: 60 00 00 00 68 00 00 00 02 00 10 10 80 00 00 00  `...h...........
000ef40: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000ef50: 2A 3B EB 5D 01 00 00 00 00 00 00 00 2A 3B EB 5D  *;.]........*;.]
000ef60: 00 00 00 00 EF BE AD DE 68 74 74 70 3A 2F 2F 72  ........http://r
000ef70: 61 6E 64 5F 33 35 31 31 33 34 38 36 36 38 00 DE  and_3511348668..
000ef80: 43 41 43 32 33 47 38 48 00 BE AD DE EF BE AD DE  CAC23G8H........
...

This time a LEAK record was created.  Further examination of the file shows no additional references to http://rand_3511348668.  Here is the output from running test_leak3.py (in a Windows 2003 virtual machine):

C:\Tools\Python31>python z:\inet_cache\test_leak3.py
Creating URL: http://rand_1150829499
chmod'ing file: b'C:\Documents and Settings\Administrator\Local Settings\Temporary Internet Files\Content.IE5\81QNCLMB\CAKB2RNB'
Deleting URL: http://rand_1150829499

Again, there was no clear indication that an error occurred.  After copying the index.dat file to a Linux system, we can find a reference to http://rand_1150829499:

xxd -g 1 -u index.dat.leak3
...
000ef00: 4C 45 41 4B 02 00 00 00 00 00 00 00 00 00 00 00  LEAK............
000ef10: 00 2B AF B5 0D 32 CA 01 00 00 00 00 00 00 00 00  .+...2..........
000ef20: 00 04 00 00 00 00 00 00 00 00 00 00 00 E7 00 00  ................
000ef30: 60 00 00 00 68 00 00 00 02 00 10 10 80 00 00 00  `...h...........
000ef40: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000ef50: 2A 3B 0A 5F 01 00 00 00 00 00 00 00 2A 3B 0A 5F  *;._........*;._
000ef60: 00 00 00 00 EF BE AD DE 68 74 74 70 3A 2F 2F 72  ........http://r
000ef70: 61 6E 64 5F 31 31 35 30 38 32 39 34 39 39 00 DE  and_1150829499..
000ef80: 43 41 4B 42 32 52 4E 42 00 BE AD DE EF BE AD DE  CAKB2RNB........
...

As with test_leak2.py, a LEAK record was generated.  Further examination of the file shows no additional references to http://rand_1150829499.
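For illustration, here is a small Python sketch that pulls the signature and URL string out of a record laid out like the dumps above.  The 0x68 offset for the URL text matches these particular dumps; it is not a fixed property of the format (real records store the string's location in a field), and the record bytes built here are synthetic:

```python
def parse_record(rec):
    """Extract the signature and URL string from one record.

    The 0x68 offset for the URL text matches the dumps in this post;
    real parsers should read the stored string offset instead.
    """
    signature = rec[0:4]
    url_end = rec.index(b"\x00", 0x68)             # NUL-terminated string
    url = rec[0x68:url_end].decode("ascii")
    return signature, url

# Build a synthetic 0x100-byte record resembling the test_leak2 dump.
rec = bytearray(0x100)
rec[0:4] = b"LEAK"
url_bytes = b"http://rand_3511348668\x00"
rec[0x68:0x68 + len(url_bytes)] = url_bytes
print(parse_record(bytes(rec)))   # (b'LEAK', 'http://rand_3511348668')
```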

Given the results, we can assess the correctness of our theories.  Since test_leak1.py did not generate a LEAK record, while test_leak2.py and test_leak3.py did, we can narrow our original theory to TIFs.  Specifically, a LEAK record is generated when DeleteUrlCacheEntry is called and the associated TIF (temporary internet file) cannot be deleted.

It is also prudent to note that we only ran the tests once.  In all three tests it is possible that there are other (unknown) variables that we did not account for, and in the latter two tests the unknown variables just happened to work in our favor.  To strengthen the theory that LEAK records occur when a TIF cannot be deleted, we could run the tests multiple times, as well as attempt other methods to make the TIF file “undeletable”.

Sample Code

The file test_leak.zip contains code used to implement the testing of theories in this blog post.  The files test_leak1.py, test_leak2.py, and test_leak3.py implement the tests, while inet_cache_lib.py, groups.py, entries.py, and __init__.py are library files used by the test files.  All of the code was designed to run under Python 3.1 on Windows systems, and interfaces with the Windows Internet API via the ctypes module.  The code is licensed under the GPL v3.

You can download the sample code by clicking on the link test_leak.zip.  To install it, unzip the file to a directory of your choosing.

Recovering a FAT filesystem directory entry in five phases

This is the last in a series of posts about five phases that digital forensics tools go through to recover data structures (digital evidence) from a stream of bytes. The first post covered fundamental concepts of data structures, as well as a high level overview of the phases. The second post examined each phase in more depth. This post applies the five phases to recovering a directory entry from a FAT file system.

The directory entry we’ll be recovering is from the Honeynet Scan of the Month #24. You can download the file by visiting the SOTM24 page. The entry we’ll recover is the 3rd directory entry in the root directory (the short name entry for _IMMYJ~1.DOC, istat number 5.)

Location

The first step is to locate the entry. It’s at byte offset 0x2640 (9792 decimal). How do we know this? Well, assuming we know we want the third entry in the root directory, we can calculate the offset using values from the boot sector, as well as the fact that each directory entry is 0x20 (32 decimal) bytes long (this piece of information comes from the FAT file system specification). There is an implicit step that we skipped: recovering the boot sector (so we could use its values). To keep this post to a (semi) reasonable length, we’ll skip this step. It is fairly straightforward though. The calculation to locate the third entry in the root directory of the image file is:

3rd entry in root directory = (bytes per sector) * [(length of reserved area) + [(number of FATs) * (size of one FAT)]] + (offset of 3rd directory entry)

bytes per sector = 0x200 (512 decimal)

length of reserved area = 1 sector

number of FATs = 2

size of one FAT = 9 sectors

size of one directory entry = 0x20 (32 decimal) bytes

offset of 3rd directory entry = size of one directory entry * 2 (entries are numbered from 0, since it’s an offset)

3rd entry in root directory = 0x200 * (1 + (2 * 9)) + (0x20 * 2) = 0x2640 (9792 decimal)
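The same calculation is easy to express in Python (the boot sector values are hard-coded here from this particular image):

```python
# Values recovered from this image's boot sector.
BYTES_PER_SECTOR = 0x200   # 512
RESERVED_SECTORS = 1
NUM_FATS = 2
SECTORS_PER_FAT = 9
DIR_ENTRY_SIZE = 0x20      # fixed by the FAT specification

def root_dir_entry_offset(n):
    """Byte offset of the n-th (0-based) entry in the root directory."""
    root_dir_start = BYTES_PER_SECTOR * (RESERVED_SECTORS +
                                         NUM_FATS * SECTORS_PER_FAT)
    return root_dir_start + n * DIR_ENTRY_SIZE

print(hex(root_dir_entry_offset(2)))   # 3rd entry -> 0x2640
```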

Using xxd, we can see the hex dump for the 3rd directory entry:

$ xxd -g 1 -u -l 0x20 -s 0x2640 image
0002640: E5 49 4D 4D 59 4A 7E 31 44 4F 43 20 00 68 38 46 .IMMYJ~1DOC .h8F
0002650: 2B 2D 2B 2D 00 00 4F 75 8F 2C 02 00 00 50 00 00 +-+-..Ou.,...P..

Extraction

Continuing to the extraction phase, we need to extract each field. For a short name directory entry, there are roughly 12 fields (depending on whether you consider the first character of the file name as its own field). The multibyte fields are stored in little endian, so we’ll need to reverse the bytes that we see in the output from xxd.

To start, the first field we’ll consider is the name of the file. This starts at offset 0 (relative to the start of the data structure) and is 11 bytes long. It’s the ASCII representation of the name.

0002640: E5 49 4D 4D 59 4A 7E 31 44 4F 43 20 00 68 38 46 .IMMYJ~1DOC .h8F
0002650: 2B 2D 2B 2D 00 00 4F 75 8F 2C 02 00 00 50 00 00 +-+-..Ou.,...P..
File name = _IMMYJ~1.DOC (_ represents the byte 0xE5)

The next field is the attributes field, which is at offset 11 and 1 byte long. It’s an integer and a bit field, so we’ll examine it further in the decoding phase.

0002640: E5 49 4D 4D 59 4A 7E 31 44 4F 43 20 00 68 38 46 .IMMYJ~1DOC .h8F
0002650: 2B 2D 2B 2D 00 00 4F 75 8F 2C 02 00 00 50 00 00 +-+-..Ou.,...P..
Attributes = 0x20

Continuing in this manner, we can extract the rest of the fields:

0002640: E5 49 4D 4D 59 4A 7E 31 44 4F 43 20 00 68 38 46 .IMMYJ~1DOC .h8F
0002650: 2B 2D 2B 2D 00 00 4F 75 8F 2C 02 00 00 50 00 00 +-+-..Ou.,...P..
Reserved = 0x00

0002640: E5 49 4D 4D 59 4A 7E 31 44 4F 43 20 00 68 38 46 .IMMYJ~1DOC .h8F
0002650: 2B 2D 2B 2D 00 00 4F 75 8F 2C 02 00 00 50 00 00 +-+-..Ou.,...P..
Creation time (hundredths of a second) = 0x68

0002640: E5 49 4D 4D 59 4A 7E 31 44 4F 43 20 00 68 38 46 .IMMYJ~1DOC .h8F
0002650: 2B 2D 2B 2D 00 00 4F 75 8F 2C 02 00 00 50 00 00 +-+-..Ou.,...P..
Creation time = 0x4638

0002640: E5 49 4D 4D 59 4A 7E 31 44 4F 43 20 00 68 38 46 .IMMYJ~1DOC .h8F
0002650: 2B 2D 2B 2D 00 00 4F 75 8F 2C 02 00 00 50 00 00 +-+-..Ou.,...P..
Creation date = 0x2D2B

0002640: E5 49 4D 4D 59 4A 7E 31 44 4F 43 20 00 68 38 46 .IMMYJ~1DOC .h8F
0002650: 2B 2D 2B 2D 00 00 4F 75 8F 2C 02 00 00 50 00 00 +-+-..Ou.,...P..
Access date = 0x2D2B

0002640: E5 49 4D 4D 59 4A 7E 31 44 4F 43 20 00 68 38 46 .IMMYJ~1DOC .h8F
0002650: 2B 2D 2B 2D 00 00 4F 75 8F 2C 02 00 00 50 00 00 +-+-..Ou.,...P..
High word of first cluster = 0x0000

0002640: E5 49 4D 4D 59 4A 7E 31 44 4F 43 20 00 68 38 46 .IMMYJ~1DOC .h8F
0002650: 2B 2D 2B 2D 00 00 4F 75 8F 2C 02 00 00 50 00 00 +-+-..Ou.,...P..
Modification time = 0x754F

0002640: E5 49 4D 4D 59 4A 7E 31 44 4F 43 20 00 68 38 46 .IMMYJ~1DOC .h8F
0002650: 2B 2D 2B 2D 00 00 4F 75 8F 2C 02 00 00 50 00 00 +-+-..Ou.,...P..
Modification date = 0x2C8F

0002640: E5 49 4D 4D 59 4A 7E 31 44 4F 43 20 00 68 38 46 .IMMYJ~1DOC .h8F
0002650: 2B 2D 2B 2D 00 00 4F 75 8F 2C 02 00 00 50 00 00 +-+-..Ou.,...P..
Low word of first cluster = 0x0002

0002640: E5 49 4D 4D 59 4A 7E 31 44 4F 43 20 00 68 38 46 .IMMYJ~1DOC .h8F
0002650: 2B 2D 2B 2D 00 00 4F 75 8F 2C 02 00 00 50 00 00 +-+-..Ou.,...P..
Size of file = 0x00005000 (bytes)
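The whole extraction phase can be performed in one call to Python's struct.unpack. The 32 bytes below are transcribed from the xxd output above; the field layout (an 11-byte name, three single bytes, seven little-endian 16-bit words, and a 32-bit size) matches the fields we just walked through:

```python
import struct

# The 32 bytes at offset 0x2640, transcribed from the xxd output above.
entry = bytes.fromhex(
    "E5 49 4D 4D 59 4A 7E 31 44 4F 43 20 00 68 38 46"
    " 2B 2D 2B 2D 00 00 4F 75 8F 2C 02 00 00 50 00 00"
)

# "<" = little endian; 11s name, 3B single bytes, 7H words, I file size.
(name, attrs, reserved, ctime_hundredths, ctime, cdate, adate,
 cluster_hi, mtime, mdate, cluster_lo, size) = struct.unpack("<11s3B7HI", entry)

print(name, hex(attrs), hex(ctime), hex(cdate), hex(mtime), hex(mdate), hex(size))
```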

Decoding

With the various fields extracted, we can decode the various bit-fields. Specifically, the attributes, date, and time fields. The attributes field is a single byte, with the following bits used to represent the various attributes:

  • Bit 0: Read only
  • Bit 1: Hidden
  • Bit 2: System
  • Bit 3: Volume label
  • Bit 4: Directory
  • Bit 5: Archive
  • Bits 6 and 7: Unused
  • Bits 0, 1, 2, 3: Long name

When decoding the fields in a FAT file system, the right most bit is considered bit 0. To specify a long name entry, bits 0, 1, 2, and 3 would be set. The value we extracted from the example was 0x20 or 0010 0000 in binary. The bit at offset 5 (starting from the right) is set, and represents the “Archive” attribute.
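A small Python helper makes the attribute decoding explicit (the function name is mine, chosen for illustration):

```python
# Bit positions from the attribute table above (bit 0 is the rightmost bit).
ATTR_BITS = {0: "Read only", 1: "Hidden", 2: "System",
             3: "Volume label", 4: "Directory", 5: "Archive"}

def decode_attributes(value):
    """Return the attribute names set in a FAT attributes byte."""
    if value & 0x0F == 0x0F:          # bits 0-3 all set: long name entry
        return ["Long name"]
    return [name for bit, name in ATTR_BITS.items() if value & (1 << bit)]

print(decode_attributes(0x20))   # ['Archive']
```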

Date fields for a FAT directory entry are encoded in two-byte values, and groups of bits are used to represent the various sub-fields. The layout for all date fields (modification, access, and creation) is:

  • Bits 0-4: Day
  • Bits 5-8: Month
  • Bits 9-15: Year

Using this knowledge, we can decode the creation date. The value we extracted was 0x2D2B which is 0010 1101 0010 1011 in binary. The day, month, and year fields are thus decoded as:

0010 1101 0010 1011
Creation day: 01011 binary = 0xB = 11 decimal

0010 1101 0010 1011
Creation month: 1001 binary = 0x9 = 9 decimal

0010 1101 0010 1011
Creation year: 0010110 binary = 0x16 = 22 decimal

A similar process can be applied to the access and modification dates. The value we extracted for the access date was also 0x2D2B, and consequently the access day, month, and year values are identical to the respective fields for the creation date. The value we extracted for the modification date was 0x2C8F (0010 1100 1000 1111 in binary). The decoded day, month, and year fields are:

0010 1100 1000 1111
Modification day: 01111 binary = 0xF = 15 decimal

0010 1100 1000 1111
Modification month: 0100 binary = 0x4 = 4 decimal

0010 1100 1000 1111
Modification year: 0010110 binary = 0x16 = 22 decimal

You might have noticed the year values seem somewhat small (i.e. 22). This is because the value for the year field is an offset starting from the year 1980. This means that in order to properly interpret the year field, the value 1980 (0x7BC) needs to be added to the value of the year field. This is done during the next phase (interpretation).

The time fields in a directory entry, similar to the date fields, are encoded in two-byte values, with groups of bits used to represent the various sub-fields. The layout to decode a time field is:

  • Bits 0-4: Seconds
  • Bits 5-10: Minutes
  • Bits 11-15: Hours

Recall that we extracted the value 0x4638 (0100 0110 0011 1000 in binary) for the creation time. Thus the decoded seconds, minutes, and hours fields are:

0100 0110 0011 1000
Creation seconds = 11000 binary = 0x18 = 24 decimal

0100 0110 0011 1000
Creation minutes = 110001 binary = 0x31 = 49 decimal

0100 0110 0011 1000
Creation hours = 01000 binary = 0x8 = 8 decimal

The last value we need to decode is the modification time. The bit-field layout is the same as for the creation time. The value we extracted for the modification time was 0x754F (0111 0101 0100 1111 in binary). The decoded seconds, minutes, and hours fields for the modification time are:

0111 0101 0100 1111
Modification seconds = 01111 binary = 0xF = 15 decimal

0111 0101 0100 1111
Modification minutes = 101010 binary = 0x2A = 42 decimal

0111 0101 0100 1111
Modification hours = 01110 binary = 0xE = 14 decimal
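The date and time decoding above boils down to shifting and masking, and can be expressed as a pair of small Python functions (names are mine; the bit layouts are the ones listed above):

```python
def decode_fat_date(value):
    """Split a FAT date word into (year offset, month, day)."""
    day = value & 0x1F            # bits 0-4
    month = (value >> 5) & 0x0F   # bits 5-8
    year = (value >> 9) & 0x7F    # bits 9-15 (offset from 1980)
    return year, month, day

def decode_fat_time(value):
    """Split a FAT time word into (hours, minutes, two-second units)."""
    seconds = value & 0x1F        # bits 0-4 (units of two seconds)
    minutes = (value >> 5) & 0x3F # bits 5-10
    hours = (value >> 11) & 0x1F  # bits 11-15
    return hours, minutes, seconds

print(decode_fat_date(0x2D2B))   # (22, 9, 11)
print(decode_fat_time(0x4638))   # (8, 49, 24)
```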

Interpretation

Now that we’ve finished extracting and decoding the various fields, we can move into the interpretation phase. The values for the years and seconds fields need to be interpreted. The value of the years field is the offset from 1980 (0x7BC) and the seconds field is the number of seconds divided by two. Consequently, we’ll need to add 0x7BC to each year field and multiply each second field by two. The newly calculated years and seconds fields are:

  • Creation year = 22 + 1980 = 2002
  • Access year = 22 + 1980 = 2002
  • Modification year = 22 + 1980 = 2002
  • Creation seconds = 24 * 2 = 48
  • Modification seconds = 15 * 2 = 30

We also need to calculate the first cluster of the file, which simply requires concatenating the high and the low words. Since the high word is 0x0000, the value for the first cluster of the file is the value of the low word (0x0002).

In the next phase (reconstruction) we’ll use Python, so there are a few additional values that are useful to calculate. The first order of business is to account for the hundredths of a second associated with the seconds field for creation time. The value we extracted for the hundredths of a second for creation time was 0x68 (104 decimal). Since this value is greater than 100 we can add 1 to the seconds field of creation time. Our new creation seconds field is:

  • Creation seconds = 48 + 1 = 49

This still leaves four hundredths of a second left over. Since we’ll be reconstructing this in Python, we’ll use the Python datetime.time class, which accepts values for hours, minutes, seconds, and microseconds. To convert the remaining four hundredths of a second to microseconds, multiply by 10000. The value for creation microseconds is:

  • Creation microseconds = 4 * 10000 = 40000

The other calculation is to convert the attributes field into a string. This is purely arbitrary, and is being done for display purposes. So our new attributes value is:

  • Attributes = “Archive”
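The interpretation rules can be collected into one small Python function (the function and parameter names are mine, chosen for illustration):

```python
def interpret(year_offset, two_sec_units, hundredths, cluster_hi, cluster_lo):
    """Apply the interpretation rules described above."""
    year = year_offset + 1980            # years are offsets from 1980
    seconds = two_sec_units * 2          # seconds are stored in 2-second units
    seconds += hundredths // 100         # whole seconds in the hundredths field
    microseconds = (hundredths % 100) * 10000
    first_cluster = (cluster_hi << 16) | cluster_lo
    return year, seconds, microseconds, first_cluster

# Creation values from our entry: year 22, 24 two-second units, 0x68 hundredths.
print(interpret(22, 24, 0x68, 0x0000, 0x0002))   # (2002, 49, 40000, 2)
```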

Reconstruction

This is the final phase of recovering our directory entry. To keep things simple, we’ll reconstruct the data structure as a Python dictionary. Most applications would likely use a Python object, and doing so is a fairly straightforward translation. Here is a snippet of Python code to create a dictionary with the extracted, decoded, and interpreted values (don’t type the >>> or …):

$ python
>>> from datetime import date, time
>>> dirEntry = dict()
>>> dirEntry["File Name"] = "\xE5IMMYJ~1DOC"
>>> dirEntry["Attributes"] = "Archive"
>>> dirEntry["Reserved Byte"] = 0x00
>>> dirEntry["Creation Time"] = time(8, 49, 49, 40000)
>>> dirEntry["Creation Date"] = date(2002, 9, 11)
>>> dirEntry["Access Date"] = date(2002, 9, 11)
>>> dirEntry["First Cluster"] = 2
>>> dirEntry["Modification Time"] = time(14, 42, 30)
>>> dirEntry["Modification Date"] = date(2002, 4, 15)
>>> dirEntry["size"] = 0x5000
>>>

If you wanted to print out the values in a (semi) formatted fashion you could use the following Python code:

>>> for key in dirEntry.keys():
...     print "%s == %s" % (key, str(dirEntry[key]))
...

And you would get the following output:

Modification Date == 2002-04-15
Creation Date == 2002-09-11
First Cluster == 2
File Name == ?IMMYJ~1DOC
Creation Time == 08:49:49.040000
Access Date == 2002-09-11
Reserved Byte == 0
Modification Time == 14:42:30
Attributes == Archive
size == 20480
>>>

At this point, there are a few additional fields that could have been calculated. For instance, the file name could have been broken into the respective 8.3 (base and extension) components. It might also be useful to calculate the allocation status of the associated file (in this case it would be unallocated). These are left as exercises for the reader ;).

This concludes the 3-post series on recovering data structures from a stream of bytes. Hopefully the example helped clarify the roles and activities of each of the five phases. Realize that the five phases aren’t specific to recovering file system data structures; they apply to network traffic, code, file formats, etc.