Forensic Data Recovery From Flash Memory
Abstract—Current forensic tools for examination of embedded systems like mobile phones and PDAs mostly perform data extraction on a logical level and do not consider the type of storage media during data analysis. This paper suggests a low level approach for the forensic examination of flash memories and describes three low-level data acquisition methods for making full memory copies of flash memory devices. Results are presented of a file system study in which USB memory sticks of 45 different makes and models were used. For different mobile phones it is shown how full memory copies of their flash memories can be made and which steps are needed to translate the extracted data into a format that can be understood by common forensic media analysis tools. Artifacts caused by flash specific operations like block erasing and wear leveling are discussed, and directions are given for enhanced data recovery and analysis on data originating from flash memory.

Index Terms—embedded systems, flash memory, physical analysis, hex analysis, forensic, mobile phones, USB sticks.

I. INTRODUCTION

The evolution in consumer electronics has caused an exponential growth in the amount of mobile digital data. The majority of mobile phones nowadays have a built-in camera and are able to record, store, play and forward picture, audio, and video data. Some countries probably have more memory sticks than inhabitants. A lot of this data is related to human behavior and might become the subject of a forensic investigation.

Flash memory is currently the most dominant non-volatile solid-state storage technology in consumer electronic products. An increasing number of embedded systems use high level file systems comparable to the file systems used on personal computers. Current forensic tools for examination of embedded systems like mobile phones or PDAs mostly perform logical data acquisition. With logical data acquisition it is often not possible to recover all data from a storage medium. Deleted data, for example, but sometimes also other data which is not directly relevant from a user standpoint, cannot be acquired, and potentially interesting information might be missed. For this reason data acquisition is wanted at the lowest layer where evidence can be expected. For hard disk based storage media it is common to copy all bytes from the original storage device to a destination storage device and then do the analysis on this copy. The same procedure is desired for embedded systems with solid-state storage media.

This paper suggests a low level approach for the forensic examination of flash memory. In chapter II the most important technology basics of flash memories are explained. Chapter III describes three low-level data acquisition methods for flash memories: first with so called flasher tools, then by usage of an access port commonly used for testing and debugging, and finally with a semi-invasive method where the flash memory chips are physically removed from the printed circuit board. Chapter IV explains methods to translate the extracted data to file system level, where common forensic media analysis tools can be used for further analysis. Experimental results are given on data originating from USB sticks and mobile phones. Chapter V explains some artifacts characteristic to data originating from flash file systems.

II. FLASH TECHNOLOGY

Flash memory is a type of non-volatile memory that can be electrically erased and reprogrammed. Flash memory comes in two flavors, NOR¹ flash and NAND² flash, named after the basic logical structures of these chips. Contrary to NAND flash, NOR flash can be read byte by byte in constant time, which is the reason why it is often used when the primary goal of the flash memory is to hold and execute firmware³, while parts of NOR flash that are not occupied by firmware can be used for user data storage. Most mobile media, like USB flash disks, or multimedia centred devices like digital cameras and camera phones, use NAND flash memory to create compact mobile data storage. This chapter explains the basics of flash technology first on the physical level and then from a logical perspective. An introduction to NAND flash memory can be found in [5]; more in-depth information can be found in [9].

¹ NOR flash memory was introduced in 1988 by Intel.
² NAND flash memory was introduced in 1989 by Toshiba.
³ Firmware is software that is embedded in a hardware device (like a mobile phone or a PDA).

A. Physical Characteristics

The physical mechanism to store data in flash memory is based on storing electrical charge in a floating gate of a transistor. This charge can be stored for extended periods of time without using an external power supply, but gradually it will leak away because of physical effects. Data retention specifications for current flash memory are between 10 and 100 years.

Flash memory can be written byte for byte, like EEPROM⁴, but it has to be erased in blocks at a time before it can be re-written. Erasing results in a memory block that is filled completely with 1's. In NAND flash, erase blocks are divided further into pages, for example 32 or 64 per erase block. A page is usually a multiple of 512 bytes in size, to emulate the 512 byte sector size commonly found in file systems on magnetic media. Additionally, a page has a number of so called 'spare area' bytes, generally used for storing meta data. Some flash disk drivers use the concept of zones⁵. A zone is a group of blocks, usually 256 to 1024. Contrary to blocks and pages, a zone is just a logical concept; there is no physical representation. See figure 1 for a dissection of NAND flash memory.

⁴ Electrically Erasable Programmable Read Only Memory.
⁵ The term partition is sometimes also used to indicate sections of flash memory.
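The organisation of NAND flash into erase blocks, pages and spare areas can be illustrated with a short Python sketch. It assumes a small-page geometry (512 data bytes plus 16 spare bytes per page, 32 pages per erase block) and a raw dump in which the spare bytes of each page directly follow its data bytes. Both are assumptions made for illustration, and the names iter_pages and is_erased are hypothetical rather than part of any tool described in this paper.

```python
from typing import Iterator, NamedTuple

# Assumed geometry (illustrative only; check the datasheet of the actual chip).
DATA_BYTES = 512        # user data bytes per page
SPARE_BYTES = 16        # spare area bytes per page
PAGES_PER_BLOCK = 32    # pages per erase block
PAGE_BYTES = DATA_BYTES + SPARE_BYTES


class Page(NamedTuple):
    block: int      # physical erase block number
    index: int      # page number within the erase block
    data: bytes     # user data
    spare: bytes    # spare area (meta data)


def iter_pages(dump: bytes) -> Iterator[Page]:
    """Yield every complete page of a raw dump (data bytes followed by spare bytes)."""
    for n in range(len(dump) // PAGE_BYTES):
        raw = dump[n * PAGE_BYTES:(n + 1) * PAGE_BYTES]
        yield Page(n // PAGES_PER_BLOCK, n % PAGES_PER_BLOCK,
                   raw[:DATA_BYTES], raw[DATA_BYTES:])


def is_erased(page: Page) -> bool:
    """An erased page is filled with 1 bits, so every byte reads as 0xFF."""
    return all(b == 0xFF for b in page.data + page.spare)
```

Keeping the spare area next to its page data, instead of discarding it, matters for the file system reconstruction in chapter IV, where the meta data in the spare area is interpreted.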
SMALL SCALE DIGITAL DEVICE FORENSICS JOURNAL, VOL. 1, NO. 1, JUNE 2007 2
For mobile phones a lot of these tools can be found on the internet in forums like gsm-forum [23] or online shops like gsmtechnology [24] or gsm-server [25]. Flasher boxes are mostly accompanied by a large number of cables to connect different phone models.

An example of a flasher box with software which can be used with a wide variety of phones is the Twister flasher box. Figure 4 shows a screenshot of the Twister series flasher box software. With this software it is possible to make full memory copies of a large range of Nokia models. For some models only partial memory copies are possible. Figure 5 shows a screenshot of a tool for making complete flash memory copies of Samsung D500/D600 handsets. Recent versions of this tool also copy the meta data needed for reconstruction of the file system (see section IV-B). A lot of these flasher tools work in a similar way: they enter the bootstrap mode of a phone, upload dedicated flash loader software to RAM, execute this software and then use it for low level access to the flash memory. Further research is needed to incorporate this mechanism in future forensic mobile phone acquisition software.

1) Advantages and disadvantages:
• Hardware connection is usually easy with a connector.
• Flash memory can be imaged without de-soldering of flash memory chips.
• Some tools do not make a full forensic image of flash memory (some copy only parts of the memory space or skip the spare area).
• It cannot be guaranteed that no data is written in flash memory.

When a forensically sound image cannot be produced with flasher tools, a second option is to use a JTAG¹⁴ test access port of an embedded device. A JTAG test access port is normally used to test or debug embedded systems but can also be used to access flash memory [11].

¹⁴ JTAG: Joint Test Action Group, see "IEEE Std 1149.1 Standard Test Access Port and Boundary-Scan Architecture".

This section explains two test modes (extest¹⁵ and debug mode) and how to use these test modes for forensic imaging of flash memory. JTAG enabled boards have extra test pads, usually not directly reachable for the user. The second part of this section describes a method to find this JTAG test access port on an embedded system with unknown layout.

¹⁵ Extest: External test mode.

1) How to access flash memory using JTAG: Flash memory chips are not JTAG enabled. But, as shown in an example embedded system in figure 6, flash memory chips are usually connected to other chips like a processor. This processor can be used to gain access to flash memory if the processor is JTAG enabled. Most JTAG enabled processors offer an extest mode or debug mode. Note that extest or debug mode may not be available on all processors and that some processors offer both modes. The next two paragraphs explain how to use these two modes for forensic imaging of flash memory.

a) Extest mode: In extest mode, all processor pins are controlled by a JTAG controller while the processor core is disabled. Test vectors are loaded or read using a (usually long) shift register. An external flash memory can be read by loading and reading a series of test vectors. An example in figure 7 shows how to access a NOR flash memory using extest mode and a series of two test vectors.
reader usually has several types of ZIF²⁰ sockets for connecting a memory chip to the programmer or reader. Flash chips in TSOP casing usually use a casing with 48 pins. Therefore most TSOP chips can be read with only one type of socket. Micro BGA chips, however, are found in many different sizes and differ greatly in number of balls. Usually chips have casings between 40 balls up to 167 or more. The number of sockets to be used is huge (>40) and continues to grow. Usually a socket is expensive and has a long delivery time. Therefore it is not feasible to buy a new socket for each type of chip.

²⁰ ZIF: Zero Insertion Force.

A solution for this problem is to use a socket that can be adapted for many types of chip casings [18]. The solution presented in this paper uses a matrix of 15 x 15 pogo pins where sockets are available for 0.5mm, 0.75mm and 0.8mm pitch. The flash memory chip is held in position by a locator. This locator is specific for each type of casing; it is cheap and can be made relatively fast and easily with a milling machine. See figure 12.

Fig. 11. Pogo-pin

Fig. 12. Universal socket (left) and locator with chip (right)

3) Flash memory chip programmer or reader: A flash memory chip can be read with a commercially available memory chip programmer like the BP 1600 from manufacturer BP Microsystems [16]. A disadvantage is that a driver is needed for each type of memory chip. If a driver for a certain type of chip is not available, the manufacturer of the programmer has to program this driver. This can take some time and is not always possible, for example when a datasheet is not available.

Another solution is to use a universal flash chip reader. This custom made design is called 'NFI memory toolkit'. A schematic is drawn in figure 13.

An FPGA is used for communicating with a flash memory chip, where configurations are available for a NAND and NOR flash protocol (with multiplexed and de-multiplexed address bus). All parameters, like address bus size and data bus size, are fully customizable by the PC software. In case of a NOR flash memory a data structure is read from the NOR flash memory (CFI²¹ data structure). This data structure contains all parameters needed for reading that particular flash memory (like protocol, memory size etc). The command to read this data structure is compatible with all protocols, and the toolkit software automatically uses the parameters to read a NOR flash chip without any configuration from the user. NAND flash chips can also be read automatically because the number of protocols used by NAND flash chips is very limited. The toolkit software automatically scans all protocols until a correct response is received from the NAND flash chip. Due to the automatic configuration properties of the software it is sometimes possible to read flash chips even if a datasheet is not available.

²¹ CFI: Common Flash Interface.

4) Advantages / disadvantages of physical extraction:
• It can be guaranteed that no data is written in flash memory because the embedded system stays powered down.
• Data from broken or damaged embedded systems can be recovered.
• A complete forensic image can be produced (all data, including spare area, bad blocks etc).
• A disadvantage is that there is a risk of damaging the flash memory chip due to the heat needed for de-soldering.
• The embedded system has to be opened to reach and de-solder flash memory chips.

²² Current definitions of a flash file system are a bit greedy. In this paper the term flash file system (FFS) is used to describe data translation mechanisms between the physical flash chip and the file system API of the host operating system. This covers flash translation layers (FTL) of disk-drive emulators and dedicated flash optimized file systems.

IV. FILE SYSTEM ANALYSIS

Data acquisition as described in the previous chapter results in one or more binary files containing linear bitwise copies of flash memory data. Before any volume analysis and succeeding file system analysis can take place, the sectors of data as used by the high level file system need to be placed in the right order. For devices with a flash file system (FFS²²) this
means finding out how the FFS maps physical data to logical data and how the difference between valid and invalid data can be determined. The result of flash file system analysis is a method that splits the physical data into two parts: a file with all logical sectors in the right order belonging to the actual high level file system and a file with all other data not belonging to the (current) high level file system.

The file system analysis process is explained in the next sections in the context of USB memory sticks and for mobile phones.

A. File System Analysis on USB Memory Sticks

The flash file systems on USB memory sticks are usually relatively simple mechanisms that only translate the Logical Block Number used in high level file systems to a low level physical address and do not support wear levelling. In the file systems described in this chapter, the block size of the flash file system is equal to the erase block size of the flash memory chip used. This means that when a logical block changes, the new version is stored in a new erase block and the complete old version is erased. This is not a very efficient way of dealing with flash memory, because pages within an erase block that are not changed are still copied to the new block and the old page is erased, yielding a higher wear than absolutely necessary for this block.

In the USB memory sticks studied for this chapter, the concept of zones is used in two devices²³. In these devices, a zone is defined as a group of erase blocks, for example: 1024 erase blocks are grouped into a zone. Within this zone, 1000 blocks are actively used to store the high level file system, and 24 blocks are kept aside to replace bad blocks when they arise. These blocks are marked in a special way, so that they can be recognised by the controller as such.

²³ Smart Media format and the Alcor 9385 controller format.

For the study on which this section is based a reference collection of USB memory sticks was needed. To create this collection, colleagues at the NFI were asked whether they owned any USB memory and whether they wanted to trade it for a new 128 Mbyte device. This resulted in 45 sticks of different make and model.

1) Identification of controller and memory chips: The controller chips found in USB sticks are sometimes hard to identify, mainly because often only a manufacturer logo and a part number can be found on the chip. But even with this information and some creative Internet queries the manufacturer of the controller can be identified. The memory chips are often much easier to identify. The memory chip usually carries the manufacturer name and logo and the part number. Furthermore, contrary to controller chips, flash chips appear to be produced by companies well known in the electronics industry.

In the reference collection of 45 USB memory sticks, 16 different manufacturers of controllers have been identified, who produced 24 different controllers. Table IV shows all controllers identified in the reference collection. In the reference collection, 8 different manufacturers of NAND flash memory have been identified, who produced 26 different types of chips. Table V shows all unique memory chips identified in the reference collection.

TABLE IV
CONTROLLER CHIPS IDENTIFIED IN THE REFERENCE COLLECTION

Brand                              Type          Company Website
SSS                                6633          www.3system.com.tw
                                   6666
ALCOR                              AU9382        www.alcormicro.com
                                   AU9384
                                   AU9385
ChipsBank                          CBM1183       www.chipsbank.com
PQI                                CLCPC02       www.pqi.com.tw
M-Systems                          T4            www.m-systems.com
                                   Titan
KTC                                FC1325N       www.ktc.com.tw
Lexar                              FC1610        www.lexar.com
GenesysLogic                       GL814         www.genesyslogic.com
OTi                                OTi002168     www.oti.com.tw
                                   OTi006808
                                   OTi006828
PointChips                         PP2201        www.pointchips.com
                                   PP2366
Silicon Motion                     SM3210        www.siliconmotion.com.tw
SONIX                              SN11085A      www.sonix.com.tw
                                   SN11088B
Trumpion                           t33521FL      www.trumpion.com.tw
Prolific                           ?             www.prolific.com.tw
SanDisk                            ?             www.sandisk.com
Silicon Integrated Systems Corp.   ?             www.sis.com

TABLE V
NAND FLASH CHIPS IDENTIFIED IN THE REFERENCE COLLECTION

Brand      Type                  Company Website
Hitachi    HN29V51211T-50        www.hitachi.com
Hynix      HY27UF081G2M-TPCB     www.hynix.com
           HY27US08121M-TCB
           HYF33DS512800ATCG1
Samsung    K91FG08U0M-YBB0       www.samsung.com
           K9F1208UOM-YCB0
           K9F1G08U0A-PCB0
           K9F1G08U0M-VIB0
           K9F1G08U0M-YCB0
           K9F1G16U0M-YCB0
           K9F2808U0B-YCB0
           K9F2808U0C-YCB0
           K9F2808U0M-YCB0
           K9F5608U0A-YCB0
           K9F5608U0B-YCB0
           K9F5608U0C-YCB0
           K9K1G08U0M-YIB0
RF         N1208U0B-0FF          ?
ST         NAND256W3A0AN6        www.st.com
           NAND512W3A0AN6
SanDisk    S4164901              www.sandisk.com
Toshiba    TC58128FT             www.toshiba.com
           TC58DVG02A1FT00
           TC58DVG04B1FT00
           TC58DVM92A1FT00

To put the number of different memory chips in perspective: most NAND flash memory chips are compatible, but some important properties differ. Some of these properties are:
• Storage capacity: 16 Mbyte to 128 Mbyte
• Number of addressing cycles: three, four or five
• Width of the I/O bus: 8 or 16 bit
• Operating voltages: 1.70~1.95V, 2.4~2.9V, 2.7~3.6V
• Erase block size: 16 kbyte, 128 kbyte
• Page size: 528 byte, 2112 byte
• Housing: TSOP 48

2) Making an exact copy of the flash chip(s): When the chip is extracted from the PCB, it can be read with a device programmer²⁴ as described in section III-C. When reading the content of flash chips one needs to be aware of the fact that some programmers have a special way of handling NAND flash. When programming a NAND flash in a production environment, the programmer obviously wants to skip bad blocks. Furthermore, when a file is loaded in the programmer, one wants to be sure that the file will fit in the flash chip, so the programmer will only accept files smaller than the guaranteed minimal number of good blocks. These two properties often also play a role when reading the device. Bad blocks are not read, and only the guaranteed minimal number of good blocks is read. For making a forensically sound copy of a memory chip this is not the desired behaviour.

²⁴ We initially used a BP1600 from BP Microsystems.

Skipping of bad blocks can lead to the following problem when reconstructing the high level file system. Suppose a USB memory controller divides the memory into zones of 256 blocks. Each block (belonging to the high level file system) within a zone has to have a unique number, stored in the spare area of each page in that block. Then one bad block in a zone arises. After this, the memory is imaged with a programmer that skips bad blocks. The resulting image is also split up into zones. Now, the zone with the bad block will contain one block of the next zone, because all blocks after the bad block will be shifted one block in the image. It is then very likely that we have two blocks with the same 'unique' number in one zone. See figure 14.

Fig. 14. Block shifting into previous zone because of bad block skipping

Reading up to only the guaranteed minimum number of good blocks can result in blocks at the high end of the memory chip not being read. These blocks might very well contain parts of the high level file system, so not reading them might hinder reconstruction of the high level file system.

There are several solutions to these problems. One is to request the manufacturer of the device programmer to make a special version of the algorithm for the specific chip which reads all blocks, good and bad. Another is to develop an 'in house' solution. For the Memory Toolkit, described in paragraph III-C3, an algorithm for reading NAND flash was developed. Furthermore, an adapter socket was made to make contact to TSOP 48 housings. With this system a complete binary copy of a NAND flash memory chip can be made. The rest of this paragraph is based on complete binary copies of flash chips, made with the NFI memory toolkit.

3) Converting the copy to the high level file system: In order to convert the exact copy of the NAND flash memory back to the file system as seen by the host OS, the meta data in the NAND flash memory needs to be interpreted. There are three main questions in this regard:
1) What is the granularity of the flash file system?
2) Where is the meta data stored?
3) How can the meta data be interpreted?

The answers to these questions are of course known by the manufacturer of the USB memory controller. Sometimes the answers can be found in literature, see for instance the definition of the Smart Media File System. When unlucky, the controller manufacturer is unable or unwilling to give information on the flash file system. This leaves reverse engineering of the flash file system as the last option. In this case a reference stick of the same make and model is nearly indispensable.

a) Smart media flash file system: The Smart Media format, introduced by Samsung and Toshiba in the late 90's, is an example of how to store a FAT file system in flash memory. Information on the Smart Media flash file system cannot be found on Samsung's and Toshiba's websites anymore, as the format is 'End Of Life'. Now only copies of the document can be found on the Internet [9].

Each FAT cluster is stored in a flash erase block, while the information on which FAT cluster is stored in which flash erase block is kept in the spare area of each page in the erase block. Furthermore, the spare area contains information on the status of the data, the block and error correction code.

Fig. 15. Smart Media format

TABLE VI
BLOCK ADDRESS INSIDE THE BLOCK ADDRESS FIELDS

Byte       D7    D6    D5    D4    D3    D2    D1    D0
518, 523   0     0     0     1     0     BA9   BA8   BA7
519, 524   BA6   BA5   BA4   BA3   BA2   BA1   BA0   P

BA0 - BA9 are the bits of a ten bit block address, yielding a maximum of 2^10 * 0x4000 = 16777216 bytes (16 MByte) of data. Smart Media memories with more than 16 MByte use zone based block management where each zone has 1024 physical blocks, of which 1000 blocks are used as logical blocks. P is the parity bit for even parity.
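The block address layout of Table VI can be decoded with a few bit operations. The Python sketch below makes three assumptions that are not stated in the text: the byte numbers 518/519 and 523/524 are treated as 0-based offsets into a 528-byte page (512 data bytes followed by 16 spare bytes), the parity bit P is taken to give the complete 16-bit field an even number of 1 bits, and the second copy of the field is used when the first one is invalid. The function names are hypothetical.

```python
from typing import Optional

# First byte of each of the two block address fields (Table VI), assumed to be
# 0-based offsets into a 528-byte page (512 data bytes + 16 spare bytes).
FIELD_OFFSETS = (518, 523)


def decode_block_address(page: bytes, offset: int) -> Optional[int]:
    """Return the 10-bit block address stored at `offset`, or None if invalid."""
    high, low = page[offset], page[offset + 1]
    # Table VI: the five most significant bits of the first byte are 0 0 0 1 0.
    if high >> 3 != 0b00010:
        return None
    # Assumption: P gives the whole 16-bit field an even number of 1 bits.
    if bin((high << 8) | low).count("1") % 2 != 0:
        return None
    # BA9..BA7 are bits 2..0 of the first byte, BA6..BA0 bits 7..1 of the second.
    return ((high & 0x07) << 7) | (low >> 1)


def block_address(page: bytes) -> Optional[int]:
    """Try both copies of the field and return the first valid block address."""
    for offset in FIELD_OFFSETS:
        address = decode_block_address(page, offset)
        if address is not None:
            return address
    return None
```

An erased page (all bytes 0xFF) fails the fixed-bit check and yields None, so unused blocks are not mistaken for blocks of the high level file system.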
When all bad blocks are skipped, all pages within an erase block have the same logical block number (LBN). So to convert a raw Smart Media flash memory copy to the high level file system, one needs to sort all erase blocks by their logical block number (within each zone) and for each page strip off the spare area.

b) Unknown flash file system: In case there is no knowledge about the flash file system used, the flash file system needs to be reverse engineered. A reference device of exactly the same make and model as the exhibit makes the reverse engineering job a lot easier, and it is indispensable when it comes to validating the method.

4) What is the granularity of the flash file system?: To illustrate a way to find out about the granularity of the flash file system, we investigated a USB memory stick with a Lexar FC1610 controller and a K9K1G08U0M-YCB0 128 Mbyte memory by Samsung. The flash chip has a page size of 512+16 bytes, and 32 pages per erase block. To answer the question on granularity, first the USB memory stick is completely filled with files of random length and content. Then a logical image needs to be made of the USB memory stick²⁵. When this image is made, two lists can be produced from it: one containing a hash value²⁶ over each 512 bytes of the image, and one containing a hash value over each 16384 bytes. Next, a physical image needs to be made of the flash memory chip (see chapter III). From this image, two more lists need to be made: one with hash values of each page (without the spare area), and one with a hash value of each erase block (without the spare areas).

²⁵ For instance with the 'dd' command under Linux.
²⁶ For instance the MD5 hash.

Load these lists into a tool with which the lists can be sorted on the various columns. In this case Microsoft's Access and Excel²⁷ were used. When sorting on hash values, it will become clear whether identical hash values are found for each page or for each block.

²⁷ When using Excel, the maximum number of rows is 65536. This is just enough for comparing all sectors of a memory of 32Mbyte or all 4k blocks of a 1 Gbyte memory.

TABLE VII
HASH VALUES CALCULATED OVER PAGE AND BLOCK DATA

Logical block (16384 bytes)    MD5 hash
00001350                       0005D859EC1AA3DDD590A13761CB520A
00000DCB                       00186EC8203D6759E9049511E8F5E238
0000008E                       001F200A0668AB9071D9E207C9783895
00001C15                       0020C85071AA64FF1AF56542CD2D8AA1
00000490                       0028EBEDAB6090610EDCFD0F45985F5D
00000E84                       002E99C24BC440505E8477811AAB1639
000003B5                       0043A7ADF684459073A1E2EE4DAC9161
...

Physical block (16384 bytes)   MD5 hash
00001A43                       00081EDB46257F7C61E14A175F638ADF
00001BBA                       0018BC3D54C18EDCC54D8CBE344FCAEF
000008CA                       0018E4019976D0B0D9984B4816241F9E
0000070B                       001FC96E9FB689249EA97B03EB298052
00000BAB                       002711850DA93D0707FEF134945C83D3
00001EA2                       0027BA1FA4C4F2C60A4926C61E62069F
0000007A                       00384D4C7AFE6548C2ACB6A6FD4AB664
...

Logical sector (512 bytes)     MD5 hash
00026193                       0000EE07779E4A23827BF396E501B121
0001C10A                       000100B88720F47EB8C517AB796D3C20
0000F7A0                       0001A198418524B9A30E99CEB0882AB2
000001DA                       0001B7FE9BDAD0920FB2A505FBA0B20B
000000E5                       0001B7FE9BDAD0920FB2A505FBA0B20B
000374A6                       0001DE7FABA54AAEECB6E8E3295D6D32
00007D51                       0001F3620D7C07A58050A223236EC1C4
...

Physical page (512 bytes)      MD5 hash
000141B6                       0000EE07779E4A23827BF396E501B121
0000B6AD                       000100B88720F47EB8C517AB796D3C20
00006E63                       0001A198418524B9A30E99CEB0882AB2
00038E5D                       0001B7FE9BDAD0920FB2A505FBA0B20B
00038CC8                       0001B7FE9BDAD0920FB2A505FBA0B20B
00031AA9                       0001DE7FABA54AAEECB6E8E3295D6D32
00026BF4                       0001F3620D7C07A58050A223236EC1C4
...

In table VII, the first 7 entries of each list are shown. From this it is already clear that identical MD5 hashes are found on page level (logical sectors versus physical pages) but not on block level, so the granularity for this flash file system is page size. Logical sector 0x26193 is stored in (mapped to) physical page 0x141B6, 0x1C10A is stored in 0xB6AD, and so on. Note that logical sectors 0x01DA and 0x00E5 have the same MD5 hash, so most probably they have the same content. This can have several causes:
• the test data is not random enough;
• there are identical (bad) blocks that are not changed by the system anymore;
• there are spare blocks (within a zone) that are not yet used to store the high level file system.

This mapping is based on sorting the MD5 hashes of the content of both physical and logical images. Needless to say, when we want to reconstruct the file system from a physical copy alone, the mapping from physical to logical has to be found in another way. We need to explore the meta data.

5) Where is the meta data stored?: Meta data can be stored in the spare areas of the flash memory. In case there is a page size granularity, all spare areas within each block will contain different information. In case there is a block size granularity, spare areas within one erase block may contain at least a few identical bytes: the ones that indicate the logical block number. In section IV-A6 a method is described to analyze meta data stored in spare areas.

Meta data can also be stored in the normal pages/blocks. No generic method has yet been developed for USB memory to analyze this type of meta data storage.

6) How can the meta data in spare areas be interpreted?: To illustrate a way to get information on the meaning of the meta data, we investigated a USB memory stick with an Alcor 9385 controller and a 64 Mbyte NAND512W3A flash chip by ST.

First the granularity of the flash file system of this controller was investigated as described in section IV-A4. It appeared that the granularity is erase block size. This means that the order of pages within an erase block is not changed when stored in physical flash memory.

To get an idea of what data is stored in the spare areas, a table of 16 columns by 256 rows is built. Each column represents one byte location in the spare area, each row is a possible value of that byte. All spare areas are read and for each byte location in the spare area, the counter in the corresponding value row is incremented. When done, each element contains the frequency of a certain value in a certain
TABLE VIII
PART OF THE SPARE AREA FREQUENCY TABLE
TABLE IX
P HYSICAL BLOCK NUMBERS VERSUS SPARE AREA BLOCK NUMBER INFO TABLE XI
H OW TO CALCULATE THE LBN
Physical Block Block number in zone
0001 0001
041F 0001
0804 0001
0C25 0001 counter in byte 7/12 and bits 0-2 of byte 6/11 and the zone
00F8 0002 number. Expressed in C code this looks like:
0401 0002
0808 0002
0C26 0002 Logicalblocknumber = Byte7;
0004 0004 Logicalblocknumber += (Byte6 & 0x07) << 8;
spare area byte location.
Table VIII is a fragment of the total frequency table of the spare
areas. In columns 7 and 12 there is an even distribution of all
possible values28. This is a good indication that there is a counter
in this byte. Columns 6 and 11 only have values between 16 (0x10)
and 23 (0x17)29, so this might also be a counter, but only of 3 bits
(bits 0 through 2), with bit 4 always high. Further examination of
the data in columns 6/11 and 7/12 showed that in all spare areas the
data in these columns is the same. Apparently the counter is stored
twice in each spare area. The data in columns 6/11 and 7/12 adds up
to an 11 bit counter that can address 2^11 erase blocks, which is
2048 * 16384 bytes = 32 MByte. The next thing to investigate is
whether the counters are contiguous. If not, the erase blocks within
a zone have to be arranged in the order of the counter in bytes 6
and 7, but missing numbers are allowed. Of course this will decrease
the addressable memory space of the counter, and increase the number
of zones. To find out about the contiguousness of the counter, make
a table with physical block number and block number based on byte
6/7 of the spare area.

28 Except for the values 0 and 255.
29 Except for the values 0 and 255.

When ordered on the last column, it is clear that each block number
appears 4 times within the whole memory, leading to the conclusion
that there must be four zones. Table XI illustrates how to calculate
logical block numbers from the
Logicalblocknumber += (Physicalblocknumber & 0xFC00) <<1;
0423 0004
0805 0004
0C27 0004
In the example above the LBN is 0x5B6E. All blocks can now be
rearranged by ordering the blocks by their LBN. When done, the
result can be checked by calculating and comparing hash values over
the reconstructed file system and over the image obtained through
‘normal’ methods.

B. Mobile Phone File System Analysis
Figure 16 depicts the NAND array structure of the multi-chip package
memory in a Samsung SGH-D500 phone as described in the datasheet
[17]. Four 512 byte data sectors are grouped into one page together
with four 16 byte spare areas. Figure 17, also from [17], explains
the assignment of the spare area bytes. The spare area description
suggests that logical sector numbers (LSN) should be stored in byte
3-6 of each 16 byte spare area part. To verify this and determine
the storage format a sample flash memory file with known data has
been studied. The LSN is stored in byte 3-5 with the least
significant byte first as demonstrated below:

FFFF 5C35 00FE 00FF 0C03 CCFF FFFF FFFF    LSN = 00355Ch = 13660d
FFFF 5D35 00FE 00FF 0C03 CCFF FFFF FFFF    LSN = 00355Dh = 13661d

Calculating all LSNs for an experimental flash memory file gives
different physical sectors with the same LSN. Additional experiments
were done to find out which physical sector from a set of sectors
with identical LSNs is the sector that belongs
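
The byte order shown in the two spare area dumps can be checked with
a short sketch. The Python below is illustrative only and is not the
script used in the study: the names decode_lsn and build_list_lsn are
hypothetical, and it assumes that the text counts spare area bytes
from 1, so that "byte 3-5" sits at zero-based offsets 2 to 4, which
is what the two sample dumps imply.

```python
from collections import defaultdict


def decode_lsn(spare: bytes) -> int:
    """Return the logical sector number stored in a 16-byte spare area.

    Assumption: "byte 3-5" counts from 1, so the LSN occupies zero-based
    offsets 2..4, least significant byte first.
    """
    if len(spare) != 16:
        raise ValueError("expected a 16-byte spare area")
    return int.from_bytes(spare[2:5], "little")


def build_list_lsn(spares):
    """Map each LSN to the physical offsets of all sectors that carry it.

    `spares` is an iterable of (physical_offset, spare_bytes) pairs. How
    these pairs are read from a memory dump depends on the chip layout
    and is left out of this sketch.
    """
    list_lsn = defaultdict(list)
    for offset, spare in spares:
        list_lsn[decode_lsn(spare)].append(offset)
    return dict(list_lsn)


# The two spare areas shown in the text.
first = bytes.fromhex("FFFF 5C35 00FE 00FF 0C03 CCFF FFFF FFFF")
second = bytes.fromhex("FFFF 5D35 00FE 00FF 0C03 CCFF FFFF FFFF")
print(decode_lsn(first), decode_lsn(second))  # 13660 13661
```

A mapping of this kind makes the duplicate LSNs visible: any LSN whose
list holds more than one offset has several physical candidates, and
deciding which of them is current still needs the additional
experiments mentioned in the text.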
METADATA INFORMATION
--------------------------------------------
Range: 2 - 3969666
Root Directory: 2
CONTENT INFORMATION
--------------------------------------------
Sector Size: 512
Cluster Size: 4096
Total Cluster Range: 2 - 31014
...
phones running Symbian complies with VFAT37 specifications, making
the file system compatible with desktop PCs. The actual data itself
however is stored in a proprietary format. The results described in
this paragraph are based on experiments with Nokia 3650 and 7610
phones [26]. The described block and page headers correspond to a
Nokia 7610 phone; for the 3650 there are some differences in sizes
and offsets.
a) Data acquisition: The flash memory of the Nokia 3650 can be
copied by using the RRawDisk function of the Symbian file server
API. Because this function only works when no other resources are
accessing the file system, the Symbian backup server was used to
close all handles to open files before using the RRawDisk function.
The application to copy the internal flash memory needs to be
installed on an external MMC card which is also used as destination
for the flash memory copy. Unfortunately this method did not work on
other Symbian phones because not all file handles could be closed
successfully. The flash memories of the Nokia 7610 have been removed
physically and copied with the NFI memory toolkit as described in
section III-C.
b) Block and sector headers: The memory is divided in 128 blocks of
64 kB each. A block starts with a 76 bytes block header, which holds
information about the block. The rest of the block contains data
sectors and sector headers. Each data sector has a 27 bytes wide
header located just after the block header. Data sectors are written
to a block in a “bottom-up” fashion, and can have a maximum size of
512 bytes. Sequences of one or more data sectors constitute a
stream. A single stream can represent any kind of data, for example,
a file or a directory table. The data sectors that a stream consists
of need not be stored in a sequential manner, and can span multiple
blocks. The size and position of a data sector is determined by its
sector header. The sector header also determines which stream the
data block belongs to, and the sequence number of the data block.
If a block is empty, only the block header exists. Otherwise, the
block header is first followed by a sector header that does not
point to any data sector. Then a list of ‘useful’ data sector
headers follows. The list of sector headers is of constant size.
Table XII and table XIII contain partial specifications (determined
from experiments) of the fields of block and sector headers.
When a file is deleted the sectors of a stream become dirty; this is
achieved by setting the valid bits in the sector header to 0. The
sector still exists in the flash device, but is marked as dirty.

Byte         Meaning
{3...0}      The block number (0-63)
{7...4}      After formatting, all blocks are numbered 1 to 64. The
             first dirty page that is to be erased after a format
             will be labeled 65, the second 66, etc. This number is
             perhaps used by a wear-leveling algorithm
{11...8}     Initially contains the same number as 3..0, but is
             incremented after the block becomes dirty and after
             being overwritten. Possibly also related to wear-leveling
{15...12}    Initially contains the same number as 3..0, but is
             incremented after the block becomes dirty and after
             being overwritten. Possibly also related to wear-leveling
{12...43}    Seems to be the same for every block
{75...72}    Possibly a checksum of the block header
TABLE XII
INTERPRETATION OF BYTES 0-75 OF A BLOCK HEADER

Byte         Meaning
{0}          Data sector attributes; bit 2 and bit 3 of this byte
             (bit 0=LSB) seem to indicate whether the data sector
             contains valid data, or whether it is dirty
{1}          The type of data that this stream represents
{2}{3}       Unknown
{5...4}      Stream ID
{7...6}      Were always found to be zero. The stream ID may be a
             32-bit number, with {7..6} as the higher bytes, instead
             of a 16-bit number
{9...8}      Sequence number of the sector in the stream
{11...10}    Seems to be always zero, though the sequence number may
             be a 32-bit number, with these as the higher bytes
{13}{12}     This is an address offset of the data sector. The offset
             is relative to the start of the block. This address is a
             word-address, so a left-shift is needed to obtain the
             proper byte-address
{15}{14}     Size of the data sector in bytes
{19...16}    Possibly a checksum of the data sector
{23...20}    Possibly a checksum of the page sector
TABLE XIII
INTERPRETATION OF BYTES 0-23 OF A SECTOR HEADER

Byte                  Meaning
{3...0}               Length of current directory entry in bytes.
                      This is always a multiple of 4 bytes
{7...4}               Stream id of the relevant file or subdirectory
{11...8}              Some sort of counter; zero for the first
                      directory entry; the next entry is always 3-5
                      higher than the previous. The meaning of this
                      field is unknown
{12...size-of-entry}  Name of the file/subdirectory, in 16-bit
                      unicode. Because the length of a directory
                      entry is always a multiple of four, the name
                      will be padded with 0x0000 if the amount of
                      characters in the name is an odd number
TABLE XIV
THE INTERPRETATION OF A DIRECTORY ENTRY
c) Files and directories: A file consists of two streams with the
same stream ID. Only the data type field in the sector header (table
XIII) will be different. 0x84 in this field denotes that the stream
represents the contents of the file itself, while 0x81 means that
the stream contains the file attributes of this file. The only known
field in this stream is the 6th byte, which denotes whether the data
stream represents a user file or a directory: 0x01 means the stream
represents a file, 0x02 means the stream represents a directory.
0x03 indicates that this stream is the root-directory. A directory
is stored in much the same way as a normal file. The content of this
file is a directory table, with a list of directory entries pointing
to its files and subdirectories. A directory entry is organized as
depicted in table XIV.
d) Decoding a memory copy: With the information described in the
previous section it is possible to decode the current file system.
Deleted files can be recovered by looking for stream IDs in dirty
sectors, but this will be complicated if different deleted files
with the same stream ID exist.

37 Virtual File Allocation Table, a virtual installable file system
driver serving as an interface between applications and the File
Allocation Table (FAT).
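
A directory table laid out as in Table XIV can be walked entry by
entry, as sketched below in Python. This is a sketch under stated
assumptions, not the decoder used in the experiments: the three
4-byte fields are taken to be little-endian, names are decoded as
UTF-16LE, and entries are assumed to follow each other without gaps.
The names parse_directory_table and make_entry are hypothetical, and
the sample table is synthetic.

```python
import struct

# Bytes 0-11 of an entry: length, stream id, counter (Table XIV).
# Little-endian byte order is an assumption.
ENTRY_HEADER = struct.Struct("<III")


def parse_directory_table(table: bytes):
    """Yield (stream_id, counter, name) for each entry in a directory table."""
    pos = 0
    while pos + ENTRY_HEADER.size <= len(table):
        length, stream_id, counter = ENTRY_HEADER.unpack_from(table, pos)
        # Stop on anything that cannot be a well-formed entry.
        if length < ENTRY_HEADER.size or length % 4 or pos + length > len(table):
            break
        raw_name = table[pos + ENTRY_HEADER.size:pos + length]
        # Strip the 0x0000 padding added for names with an odd length.
        name = raw_name.decode("utf-16-le", errors="replace").rstrip("\x00")
        yield stream_id, counter, name
        pos += length


def make_entry(stream_id: int, counter: int, name: str) -> bytes:
    """Build a synthetic entry for illustration (not from a real phone)."""
    raw_name = name.encode("utf-16-le")
    if len(raw_name) % 4:  # odd number of characters: pad with 0x0000
        raw_name += b"\x00\x00"
    length = ENTRY_HEADER.size + len(raw_name)
    return ENTRY_HEADER.pack(length, stream_id, counter) + raw_name


table = make_entry(5, 0, "a.txt") + make_entry(9, 4, "docs")
print(list(parse_directory_table(table)))  # [(5, 0, 'a.txt'), (9, 4, 'docs')]
```

The stream IDs obtained this way are the values to look for in sector
headers when collecting the data sectors of a file, including dirty
ones.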
V. EVIDENCE SEARCHING
After reconstruction of the high level file system and extraction of
all other data in the most plausible order, further investigation of
the flash memory can be done in the same way as for any other
forensic file system investigation [3]. This section will discuss
some specific analysis topics related to data originating from flash
file systems.

A. File System Tools
Current forensic file system analysis tools like Encase, FTK,
R-Studio, and TSK are not fully aware of the physical media from
which an image file originates. For advanced data recovery this
knowledge of the physical properties might improve the recovery
process. Flash file systems for example often contain different
versions of the same data objects because flash memory can’t be
erased in small quantities. Especially for small objects (much
smaller than one flash block) with a high update frequency, a lot of
old versions might exist outside of the normal high level file
system.
For FAT file systems the FAT and directory entries are interesting
candidates for advanced analysis because of their size, update
frequency and evidence value. To give an idea of the number of
different versions: in an actual case with a Samsung SGH-D500 mobile
phone the flash memory file contained 83 versions of some part of
the FAT and 1464 versions of the directory
“\multimedia\VIDEOS\video clips” where all user recorded video
movies are stored by default. A common forensic tool will show the
last version of the directory, possibly with some files marked as
deleted, but from the other versions of the directory data a lot of
the user behavior can be reconstructed.
The same holds for other data objects although larger objects (like
movie files) are likely to be (partly) overwritten earlier after
deletion because they occupy complete flash blocks which can be
reused immediately after deletion.

B. Dedicated Search Strategies
Some specific flash memory analysis issues are discussed below based
on a case example. In this case a doubtful witness declares that he
made a recording with his phone on which somebody confessed a
murder. A standard forensic investigation of the mobile phone with
.XRY [21] did not show any relevant data so it was sent to the NFI
for advanced analysis on the presence of erased audio or video
material. At the laboratory the flash memory chip of a reference
phone of the same brand and type was analyzed. It contains 32 MByte
of NOR flash and 128 MByte of NAND flash. From experiments it was
found that multimedia data like sound, pictures and video are stored
in the NAND flash memory so this memory was further examined on the
case phone. JTAG [10] has been used to copy the NAND flash data to a
binary file. This file has been used with the script discussed in
section IV-B1 to extract a file with the FAT file system and a file
with the remaining data in the most plausible order as discussed
before.
R-Studio has been used to load the FAT file system file and recover
all file system data to a file server. With R-Studio three erased
video files could be identified in the “video clips” folder. The
recover option of R-Studio reported that differences between the
file size and the length of the FAT chain indicate that these files
were overwritten. The video clips folder contains a “THUMB” folder
with the same file entries but shorter file sizes. R-Studio reported
that these files were successfully recovered but because it was not
possible to decode these files they were further analyzed with TSK.
With the fls tool the cluster number of each file was decoded from
the directory entries and with the istat tool all other file
metadata was displayed. The istat tool reported that file recovery
was not possible. How these files can still be recovered by using
data sectors from the flash memory not present in the FAT file
system is illustrated with example thumbnail “video-0003.3gp”.
fls -o4 -f fat -r D500 FATFS.bin gives for “video-0003.3gp”:

++++ r/r * 5901: video-0003.3gp

istat -o4 -f fat D500 FATFS.bin 5901 gives:

Directory Entry: 5901
Not Allocated
File Attributes: File
Size: 2720
Name: _IDEO-~3.3gp
Directory Entry Times:
Written: Tue May 3 17:41:24 2005
Accessed: Tue May 3 00:00:00 2005
Created: Tue May 3 17:41:24 2005
Sectors:
88909 88910 88911 88912 88913 88914 88915 88916
Recovery:
File recovery not possible

Looking in ListLSN for the sector numbers gives38:

ListLSN[88913]={0x06a7c940,0x044515b0,0x0444e430,0x067e0540}
ListLSN[88914]={0x06a7cb50,0x044517c0,0x0444e640,0x067e0330}
ListLSN[88915]={0x06a7cd60,0x044519d0,0x0444e850,0x067e0120}
ListLSN[88916]={0x06a7cf70,0x04451be0,0x0444ea60,0x067dff10}
ListLSN[88917]={0x06a7d180,0x04451df0,0x0444ec70,0x067dfd00}
ListLSN[88918]={0x06a7d390,0x0444ee80,0x067dfaf0}
ListLSN[88919]={0x06a7d5a0,0x0444f090}
ListLSN[88920]={0x06a7d7b0,0x0444f2a0}

38 4 sectors need to be added to istat’s starting sector because the
flash manager takes the first offset sectors into account and TSK
does not.

The current FAT-FS uses address range
0x06a7c940-(0x06a7d7b0+0x200) for logical sectors 88913-88920 but
that data range is currently allocated to another file. If the non
FAT-FS sectors are grouped according to the heuristic described in
section IV-B2 the following ranges appear in the resulting file:

Range 1: 0x044515b0,0x044517c0,0x044519d0,0x04451be0,
         0x04451df0
Range 2: 0x0444e430,0x0444e640,0x0444e850,0x0444ea60,
         0x0444ec70,0x0444ee80,0x0444f090,0x0444f2a0
Range 3: 0x067e0540,0x067e0330,0x067e0120,0x067dff10,
         0x067dfd00,0x067dfaf0

After comparing data from known thumbnail video files the
data belonging to range 3 could be decoded to a valid picture
file showing a small thumbnail of a video clip as used by the
multimedia browser on the phone. From the decoded thumbnails two of
the related video files could be marked as “not relevant”.
In the search for deleted audio and video data a script was
developed to use the non erased data to find all data in the flash
memory file originating from this non erased data. The script
produces a bookmark file for the Hex Workshop hex editor [22] to
make it easy to find all the known data parts. With another script
this bookmark file can be used to overwrite all known data with a
predefined pattern in order to make searching for erased data
easier. To assist the manual search for deleted audio and video data
a tool developed by the NFI to analyze video data has been used. For
this tool a .3gp parser has been written to search for data
fragments originating from .3gp files. After extensive search no
audio or video data has been found that confirmed the statement of
the witness.

VI. CONCLUSION
Three techniques have been described for making low level
byte-by-byte copies of flash memory chips. More research needs to be
done on the flash read mechanisms used by flasher tools in order to
adapt these mechanisms for usage in the next generation of forensic
data acquisition tools. Steps have been illustrated for translating
acquired flash data to a level that can be understood by existing
forensic tools targeted towards commonly used file systems. More
research is needed for flash data that can’t directly be translated
to file system level. More research is also needed on the relation
between flash specific operations like block erasing and wear
leveling on one side and the resulting artifacts and potentials for
data recovery and analysis on the other side. With the results of
this research future forensic tools might be able to improve the
power and efficiency of embedded systems examinations for reasonably
skilled IT professionals.

VII. ABBREVIATIONS
ATA - Advanced Technology Attachment
API - Application Programming Interface
BGA - Ball Grid Array
CFI - Common Flash Interface
ECC - Used for both Error Correcting Code and Error Checking and
Correction.
EEPROM - Electrically Erasable Programmable Read Only Memory
FAT - File Allocation Table
FFS - Flash File System
FSD - File System Driver
FTL - Flash Translation Layer
I/O - Input/Output
JTAG - Joint Test Action Group
LBA - Logical Block Address
LBN - Logical Block Number
LSN - Logical Sector Number
MMC - Multi Media Card
NFI - Netherlands Forensic Institute
OS - Operating System
PCB - Printed Circuit Board
PDA - Personal Digital Assistant
RAPI - Remote Application Programming Interface
SCSI - Small Computer System Interface
TSK - The Sleuth Kit
TSOP - Thin Small-Outline Package
USB - Universal Serial Bus
VFAT - Virtual File Allocation Table
XSR - Extended Sector Remapper
ZIF - Zero Insertion Force

REFERENCES
[1] R. van der Knijff, “Embedded Systems Analysis”, chapter 11 of
“Handbook of Computer Crime Investigations - Forensic Tools and
Technology”, edited by E. Casey. Academic Press, 2002.
[2] W. Jansen and R. Ayers, “Guidelines on Cell Phone Forensics,”
August 2006. [Online]. Available:
http://csrc.nist.gov/publications/drafts/Draft-SP800-101.pdf.
[Accessed: November 29, 2006].
[3] B. Carrier, File System Forensic Analysis. Addison-Wesley, 2005.
[4] E. Sutter, Firmware Demystified. CMP Books, 2002.
[5] Samsung Electronics, “APPLICATION NOTE for NAND Flash Memory”,
rev. 2, 1999. [Online]. Available:
http://www.samsung.com/Products/Semiconductor/Memory/appnote/app nand.pdf.
[Accessed: November 29, 2006].
[6] Sandisk, “Sandisk flash memory cards - wear leveling”, October
2003. [Online]. Available:
www.sandisk.com/Assets/File/OEM/WhitePapersAndBrochures/RS-MMC/WPaperWearLevelv1.0.pdf.
[Accessed: November 29, 2006].
[7] M-Systems, “TrueFFS wear-leveling Mechanism”, Technical note
(TN-Doc-017). [Online]. Available:
www.m-systems.com/NR/rdonlyres/FCC7D817-38A5-4D80-8471-67DA793EA255/0/TN 017 TrueFFS Wear Leveling Mechanism.pdf.
[Accessed: November 29, 2006].
[8] HDDGURU, “ATA/ATAPI Command Set”, [Online]. Available:
http://hddguru.com/content/en/documentation/2006.01.27-ATA-ATAPI-8-rev2b/.
[Accessed: November 29, 2006].
[9] Samsung Electronics, “Smartmedia Format Introduction (Software
Considerations)”, 1999. [Online]. Available:
www.win.tue.nl/~aeb/linux/smartmedia/SmartMedia Format.pdf.
[10] B. Dipert and M. Levy, “Designing with Flash Memory,”
Annabooks, 1994.
[11] M. Breeuwsma, “Forensic imaging of embedded systems using JTAG
(boundary-scan)”, Digital Investigation, vol. 3, no. 1, March 2006.
[12] IEEE - Standards Association, “IEEE standard test access port
and boundary scan architecture - description”, July 23, 2001.
[Online]. Available:
http://standards.ieee.org/reading/ieee/std public/description/testtech/1149.1-2001 desc.html.
[13] Intel, “Intel Wireless Communications and Computing Package
User’s Guide”, [Online]. Available:
http://download.intel.com/design/flcomp/packdata/wccp/25341805.pdf.
[Accessed: November 29, 2006].
[14] PACE, [Online]. Available: http://www.paceworldwide.com/.
[Accessed: November 30, 2006].
[15] Signum, [Online]. Available: http://www.signum.com.
[Accessed: November 30, 2006].
[16] BPM Microsystems, [Online]. Available: http://www.bpmicro.com/.
[Accessed: November 30, 2006].
[17] Samsung Electronics, “Datasheet of the Multi-Chip Package
MEMORY, 256M Bit (16M x16) Synchronous Burst, Multi Bank NOR Flash
Memory / 512M Bit (32Mx16) OneNAND, Flash*2 / 128M Bit (8Mx16)
Synchronous Burst Uni-Transistor Random Access Memory”, [Online].
Available: http://samsung.com/.
[18] Logic Technology, “Universal socket solution”, [Online].
Available:
http://www.logic.nl/trial.aspx?sid=380(ZIPpassword=”FFIT-2006”).
[Accessed: November 30, 2006].
[19] The Sleuth Kit, [Online]. Available: http://www.sleuthkit.org.
[Accessed: November 29, 2006].
[20] rtt, “R-Studio Data Recovery Software”, [Online]. Available:
http://www.data-recovery-software.net/. [Accessed: November 29,
2006].
[21] Micro Systemation, “.XRY phone examination”, [Online].
Available: http://www.msab.com/en/. [Accessed: November 29, 2006].
Coert Klaver received his B.Sc. in computer science in 1986 from the
’s-Hertogenbosch College of Advanced Technology. He has worked for
several public organizations and private companies since then,
developing embedded software and hardware. In 1999 he joined the
Netherlands Forensic Institute to work as a forensic scientist in
the field of embedded systems. His main interests are embedded
operating systems in personal digital electronics and improvised
electronics used in high tech fraud schemes.

Ronald van der Knijff received his B.Sc. degree in electrical
engineering in 1991 from the Rijswijk Institute of Technology. After
performing military service as a Signal Officer he obtained his
M.Sc. degree in Information Technology in 1996 from the Eindhoven
University of Technology. Since then he has worked at the Digital
Technology and Biometrics department of the Netherlands Forensic
Institute as a scientific investigator. He is responsible for the
embedded systems group and is also a court-appointed expert witness
in this area. He is author of the (outdated) cards4labs and TULP
software and founder of the TULP2G framework. He is a visiting
lecturer on ‘Cards & IT’ at the Dutch Police Academy; a visiting
lecturer on ‘Smart Cards and Biometrics’ at the Masters Program
‘Information Technology’ of TiasNimbas Business School and a
visiting lecturer on ‘Mobile and Embedded Device Forensics’ at the
Master’s in ‘Artificial Intelligence’ of the University of Amsterdam
(UvA). He wrote a chapter on embedded systems analysis in Eoghan
Casey’s Handbook of Computer Crime Investigation - Forensic Tools
and Technology.