Journal of Computer Virology and Hacking Techniques

https://doi.org/10.1007/s11416-020-00370-y

ORIGINAL PAPER

Superfetch: the famous unknown spy


Mathilde Venault1 · Baptiste David1

Received: 11 July 2020 / Accepted: 28 September 2020


© Springer-Verlag France SAS, part of Springer Nature 2020

Abstract
Since Windows Vista, Microsoft has offered us a new life companion called SysMain, or Superfetch after its former name. This service analyzes and records the user’s daily software use to speed up his or her experience on the operating system. However, the service also makes it possible to track the software used and the private files viewed, such as movies or confidential documents, to reveal the user’s lifetime activities and to map directories. More than just a privacy issue, this constitutes a reliable resource for forensic analysis. Furthermore, the service is often misunderstood because of its scarce documentation and the myths surrounding it, which quickly complicates any investigation. This paper is an extended version of the talk presented at Black Hat USA 2020: it aims at debunking partial and false information about SysMain and its files. It examines the architecture of the service in detail, analyzes its mechanisms and explains how it operates. It details the format of all the prefetch files, which has so far been undocumented or obsolete. In addition, it illustrates concrete forensic cases in which SysMain turns out to be useful.

Keywords Superfetch · SysMain · Prefetch

Mathilde Venault: venault@et.esiea.fr
Baptiste David: bdavid@et.esiea.fr
1 Laboratoire de Virologie et de Cryptologie Opérationnelles, ESIEA, Laval, France

1 Introduction

1.1 Vocabulary and history

The notion of Prefetcher appeared in 2001 in US patent 6,633,968 [1], which announced the technique that would become part of Windows XP. Under Windows Vista, another component called Superfetch was added to the algorithm and the service was renamed after this improvement; it kept that name until Windows 10. In Windows 10, the service was renamed SysMain, but Microsoft did not explain this change [2].
On Windows 10, Superfetch is only one part of SysMain, which is the name of the whole service and contains many parts, including the Prefetcher and Superfetch. As the name only changed with Windows 10, the whole algorithm is commonly, though erroneously, called Superfetch.

1.2 Goals and mechanisms

The main goal of the service is to speed up the user experience. To this end, SysMain focuses on two aspects:

• Booting faster;
• Saving time from the start-up to the closure of any process.

To boot as fast as possible, SysMain frequently calculates the “optimal layout”, which is the order in which files are loaded into memory at boot. This list is established during idle states: whenever CPU, disk and memory utilization fall below a certain percentage, the service proceeds to non-urgent operations such as the optimal layout calculation. The result is written to C:\Windows\Prefetch\Layout.ini (Fig. 1).

Fig. 1 Extract of Layout.ini

On the other hand, speeding up navigation within applications relies on reducing page faults in memory, which is an optimization of memory paging.

Memory paging A process is a set of pages in memory, which are same-sized blocks of data containing the instructions of a program. Whenever a process is executed, these pages are in principle mapped into the physical memory (RAM), within spaces called frames. To execute instructions, the CPU has to figure out where the instructions are from two available pieces of information: the page number and the instruction offset. To facilitate the operation, each process has its own page table, associating page numbers with the corresponding frame numbers. Therefore, when the CPU must find an instruction, it looks into the process’ page table to get the number of the frame containing the page of the instruction. Then, the CPU is able to execute the instruction at the offset it had in the first place (Fig. 2).

Fig. 2 Memory paging mechanisms

Still, under special circumstances, a page may be mapped into the virtual memory, on the disk. When this happens, the page table cannot resolve the frame number and returns an invalid value to the CPU: a page fault occurs. The memory management unit, responsible for declaring the page fault, then relies on the operating system to find the page in the virtual memory and bring it back into a free frame of the physical memory, so that the operation can be retried. Once the page is mapped into the physical memory, the page table is updated and resolves the initial page number to the frame that has just been allocated. The CPU can now find the instruction.
This procedure requires time and memory operations, thus reducing the program’s reactivity. Superfetch aims at curtailing page faults to avoid this loss of time. For each program, it logs the hard page faults that occurred and records general page accesses, so that it can preload into memory the pages the user might need next time. Each process has one or more .pf files in the C:\Windows\Prefetch directory (Fig. 3) as a support for the next optimization.

Fig. 3 View of Prefetch directory
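The page-table lookup and the page-fault handling described in this section can be illustrated with a toy model. The Python sketch below is only an illustration: the ToyProcess class, the 4096-byte page size and the frame allocation policy are assumptions made for the example and are not taken from SysMain. It shows how replaying the pages that faulted during a first run removes the faults from a second run, which is the effect Superfetch aims for.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only


@dataclass
class ToyProcess:
    """Hypothetical process model: a page table mapping page -> frame."""
    page_table: dict = field(default_factory=dict)
    hard_faults: list = field(default_factory=list)  # pages that were not resident
    next_free_frame: int = 0

    def translate(self, virtual_address: int) -> int:
        """Resolve an address to frame * PAGE_SIZE + offset, faulting if needed."""
        page, offset = divmod(virtual_address, PAGE_SIZE)
        frame = self.page_table.get(page)
        if frame is None:
            # Page fault: bring the page into a free frame, update the page
            # table, then resolve the address again.
            self.hard_faults.append(page)
            frame = self.next_free_frame
            self.next_free_frame += 1
            self.page_table[page] = frame
        return frame * PAGE_SIZE + offset

    def prefetch(self, pages) -> None:
        """Map pages ahead of time, as a prefetcher would before a launch."""
        for page in pages:
            if page not in self.page_table:
                self.page_table[page] = self.next_free_frame
                self.next_free_frame += 1


ADDRESSES = (0x0010, 0x1008, 0x2004, 0x1010)

first_run = ToyProcess()
for address in ADDRESSES:
    first_run.translate(address)

second_run = ToyProcess()
second_run.prefetch(first_run.hard_faults)  # replay what the first run logged
for address in ADDRESSES:
    second_run.translate(address)

print(first_run.hard_faults)   # pages that faulted during the first run
print(second_run.hard_faults)  # empty once those pages are pre-mapped
```

The model deliberately ignores eviction and working-set limits; it only makes the relationship between logged faults and later preloading concrete.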


1.3 Implications

The mechanisms explained above imply that SysMain watches and keeps traces of each action performed on a computer, such as:

• Evidence of software installations;
• Dates and times of application launches;
• Number of executions per program;
• Names and locations of files used by each process;
• Links to cache files that might contain the content of personal text documents.

Therefore, SysMain knows a lot about you, from the time you woke up to your favorite songs. On the one hand, this raises a very serious privacy issue since it tracks your lifetime activities. Even though Superfetch does improve the speed of your experience on your OS, it raises the question of the limit between profiling and spying.
On the other hand, it is a significant forensic opportunity. Within a forensic analysis, it helps to analyze the activities very precisely, especially because few people are even aware of Superfetch traces, so these traces are usually left on a computer. For instance, in malware analysis, the .pf files can prove the date and time of a program’s execution and show where the malicious files were located.
So, from a black hat point of view or from a white hat one, it does matter to know how SysMain works and what exactly its files can reveal.

1.4 Previous work, documentation and myths

Few studies have been done so far to document how SysMain operates. The first issue is that some of them focused on Windows versions older than Windows 10, such as the study Digital Forensic Analysis on Prefetch Files [3], and are thus now obsolete. Regarding the format of the .pf files, documentation has been accessible and known approximately since 2010, and the most up-to-date is Joachim Metz’s study [4]. Still, Superfetch has another kind of file, ending with a “.db” extension, and there are so far only two studies about it: Rewolf’s Blog [5] and Joachim Metz’s study [6]. Despite these, the documentation is incomplete: Rewolf’s Blog [5] covers one file precisely but sets aside the others, which are very different, and Joachim Metz’s study [6] focuses on reporting observations more than on drawing conclusions from a reverse engineering process.
Another aspect of SysMain has been widely covered: the hash function, notably in [7,8]. Even though many of these sources disagree on details such as the origin of the string to hash, they all converge on the same algorithm, presented in the section about the hash algorithm, in Sect. 5.
Regarding the global operation, the most reliable source is the documentation from Windows Internals [9], which covers the basics of the global operation. However, it contains inaccuracies, reinforcing some myths that are already widespread.
One of them is that SysMain can be disabled by setting the registry value EnablePrefetcher to 0, within the key HKLM\SYSTEM\CurrentControlSet\Control\SessionManager\MemoryManagement. Although this method has been shared across the Internet and is also described in Windows Internals [9], it is not enough to stop SysMain nowadays.
Indeed, whatever the value of this key is, SysMain keeps on writing its databases. This can be observed by checking the “last written time” of the files in the C:\Windows\Prefetch directory after executing a program, with the values EnablePrefetcher and EnableSuperfetch set to 0. While the service was supposed to stop, the prefetch files are still being updated. Regarding the registry value EnableSuperfetch, SysMain has a function called PfSvSuperfetchCheckAndEnable, which forces this value to 3 no matter the initial value. This proves that these registry values do not have any impact on SysMain’s activity. For the record, disabling SysMain manually in the service control manager solves this issue. To do so, the Service Manager must be opened, the SysMain service selected and the service properties accessed. The Startup type should be set to Disabled and the changes applied. SysMain will no longer track the user once the computer has restarted.

Part I Global operation

This section explains the key points of SysMain’s operation. The components listed and detailed here are not an exhaustive list of SysMain’s parts but aim at clarifying the key points. At first sight, the service can be seen as many divisions, handled by groups of functions identifiable by their prefix (Table 1).
Further, these groups are connected to each other in order to carry out the different types of tasks (Fig. 4). The major task is the pf routines: the non-stop jobs responsible for the essential functions such as processing traces of applications, predicting and pre-launching the pages the user might need. They are also the parts communicating with the rest of the kernel: exchanges with the drivers, RPC (Remote Procedure Call) requests, logging and event parts, requests for information on WNF (Windows Notification Facility) states and global system information.
Under special circumstances, SysMain can declare an idle state to process actions that need time and memory operations, but do not need to be done every day. This is the reason why SysMain will check for power supply presence


and utilization of CPU, disk and memory to be sure SysMain will not be harmful to the user’s activities. In this free time, SysMain updates the optimal boot layout or executes the command defrag.exe -s -b: these are the idle tasks. For the actions that do not need to be done frequently, SysMain has periodic tasks, based on planned actions that synchronize registry key values or remnant data. To carry out all these tasks, the work is divided among agents.

Table 1 Function initials and their meaning

Prefix    Name
PfPr      Prefetch Processor
PfTr      Prefetch Trace
PfSi      Prefetch Section Info
PfHp      Prefetch Heap
PfCl      Prefetch Collector
PfDb      Prefetch Database
PfDi      Prefetch Device Info
Rdb       ReadyBoost
HbDrv     Hybrid Drive
AgAl      Agent Application Launch
AgGl      Agent Global
AgPd      Agent PFN Database
AgRp      Agent Robust Performance
AgTw      Agent Trace Writer

2 SysMain’s agents

SysMain includes agents: they are components dedicated to a specific task. They constantly watch for changes and can be triggered at any time. They are loaded in order of importance, which is the following:

• Agent PFN (Page Frame Number);
• Agent Global;
• Agent Application Launch;
• Agent Context;
• Agent Robust Performance.

Agent page frame number The page frame number (PFN) database is an array representing the state of each physical page in memory on the system (Active / Standby / Freed), and it is therefore aware of page faults. The agent is the direct interlocutor of the PFN, so it can relay the page faults or page accesses and classify the response. For instance, it classifies the memory page’s origin (foreground or background application) or its state (committed page or not). This agent is the one in charge of getting the data from the memory, which will be the basis of future pre-launching.

Agent global This agent oversees the context for one user. It defines the criteria of Active Days and the limits of daily phases (which hours and days belong to work time or to the morning schedule), and it might organize the history of successive scenarios within a certain phase.

Agent application launch AgAl is involved throughout the prediction chain. First, it post-processes the data received from a driver called FileInfo, concerning the files used by a process, or from the agent PfnDb. In addition, it creates Markov chains to represent the probabilities of program uses. This constitutes the basis for making predictions. Given the calculated probabilities, the agent makes decisions and asks the memory manager to prelaunch certain pages.

Agent context This agent is responsible for watching the overall context:

• The current state of the computer (standby, hibernation);
• The current session information (SID);
• The current user information (user token).

Whenever a change occurs, it updates the current information so that SysMain is aware of the new situation and reacts if needed. For instance, if a user session switch occurs, AgCx takes a snapshot of the current situation to be able to restore it faster if it is required later. There are two modes of disconnection:

• Classic Disconnect: properly quitting with the button and logging in on the other session.
• Lazy Disconnect: switching by clicking on “disconnecting”. In this case, the agent also goes through the Classic Disconnect mode.

Agent robust performance Basically, this agent oversees SysMain’s performance. According to Windows Internals [9], it watches for specific file I/O accesses that might harm the system by populating the standby lists with unneeded data. It also checks the frequency of accesses to the files referenced by SysMain to avoid pre-launching irrelevant data, such as the whole content of a file opened just once. Thanks to an internal threshold, AgRp prioritizes the referenced files to make sure the performance is at its best and runs regular checks to avoid keeping irrelevant data.
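The Markov-chain prediction attributed to the Agent Application Launch in this section can be sketched with a first-order chain over a launch history. This is a minimal illustration and not SysMain’s implementation: the function names, the sample history and the 0.5 decision threshold are all hypothetical, and the real agent also takes the context (phases, active days) into account.

```python
from collections import Counter, defaultdict


def build_transitions(launch_history):
    """Count how often each application is directly followed by another one."""
    counts = defaultdict(Counter)
    for current_app, next_app in zip(launch_history, launch_history[1:]):
        counts[current_app][next_app] += 1
    return counts


def predict_next(counts, current_app, threshold=0.5):
    """Return (app, probability) when the likeliest successor reaches the threshold."""
    followers = counts.get(current_app)
    if not followers:
        return None
    app, hits = followers.most_common(1)[0]
    probability = hits / sum(followers.values())
    return (app, probability) if probability >= threshold else None


# Hypothetical launch history, oldest first.
history = ["outlook.exe", "winword.exe", "outlook.exe", "winword.exe",
           "outlook.exe", "chrome.exe", "winword.exe"]
transitions = build_transitions(history)

# outlook.exe was followed by winword.exe in two launches out of three.
print(predict_next(transitions, "outlook.exe"))
```

A prediction above the threshold would be the point at which an agent could ask the memory manager to preload the pages recorded for the predicted application.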


Fig. 4 SysMain global operation

3 SysMain’s pillar: PfSvcGlobals

To facilitate communication from one function to another, SysMain uses global variables. PfSvcGlobals has been the major global variable since the creation of the service. Unlike the other global variables, this one is initialized before the main thread worker and is by far the largest variable, with 4 456 bytes. PfSvcGlobals contains values necessary to the proper functioning of the whole service, including:

• Handles to its own heap;
• Handles to registry keys and registry values;
• Handles to process logging, process events or other external kernel communications;
• Time references for synchronization, task scheduling or time measurement;
• Current session information (SID, user information, state of the computer’s components);
• References to other global variables;
• References to important structures;
• Countless flags used everywhere.

It is important to understand that PfSvcGlobals is not “a big array”: it is the pillar of all the components of the algorithm. PfSvcGlobals contains references to many other important structures, such as the ones related to the agents or to parts of the prefetcher (PfXp, PfSi, PfTr, PfCl, PfIu), which are needed for the global operation. In addition, many of its flags have an impact on the type of actions which have to be done in the pf routines.

4 Drivers connected to SysMain’s activity

4.1 RdyBoost driver

There are two terms close to each other that might be confusing: ReadyBoot and ReadyBoost. ReadyBoost is the name of a driver, located at C:\Windows\System32\Drivers\RdyBoost.sys, which has a set of information in the registry within the key HKLM\SYSTEM\CurrentControlSet\Services\rdyboost. ReadyBoot refers to one of the functionalities of SysMain that increases boot speed, and it has its own directory, C:\Windows\Prefetch\ReadyBoot. According to Windows Internals [9], ReadyBoost is “responsible for writing the cached data to the NVRAM device. When you insert a USB flash disk into a system, ReadyBoost looks at the device to determine its performance characteristics and stores the results of its tests in HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Emdmgmt.” Depending on these characteristics, ReadyBoost might dedicate a certain space of the disk to caching data and create a file ReadyBoost.sfcache in the root of the device. Cached data is compressed and encrypted per block using AES (Advanced Encryption Standard) encryption. It allows faster access to the data from the USB disk, contributing to SysMain’s goals.

4.2 FileInfo driver

FileInfo is a mini-filter driver, located at C:\Windows\System32\Drivers\FileInfo.sys, which gives SysMain information about the files used by a process. FileInfo allows SysMain to get the names of the memory pages currently processed and information about their origin. Since FileInfo plays an important role in file construction, SysMain starts the driver’s service at every launch and might even change the driver’s start configuration, setting it to automatic start depending on its needs.
According to Windows Internals [10], the prefetcher was supposed to execute transparently to other activities on a system, but its file references can lead to sharing violations. FileInfo was used to watch for potential sharing violation collisions and prevent them by stalling a second operation on a file being accessed. This explains the need for creating an independent driver rather than including the component within the service.
In addition, since SysMain does not have the rights to get information from ring 0, using a driver to do so was the easiest solution. Windows Internals [9] explains that the FileInfo driver associates streams, identified by a unique key, currently implemented as the FsContext field of the respective file object, with file names so that the user-mode Superfetch service can identify the specific file stream and offset with which a page in the standby list belonging to a memory-mapped section is associated.
In concrete terms, this driver tracks the names and paths of the files used by a given process to build a buffer required to create the .pf files. To communicate, SysMain sends an IO Control code through the function NtDeviceIoControlFile() and gets back the buffer built by FileInfo (Fig. 5) through the function parameters. The output buffer is written in the NL format, which uses environment path variables instead of full directory paths. The buffer is then translated under the watch of SysMain and post-processed to contribute to the formation of the .pf files.
Once SysMain has the NL formatted buffer, the environment variable is translated thanks to an internal “translation table” and the data regarding the page numbers and offsets is stored into hash tables: the basis of the .pf file.

Fig. 5 Buffer given by FileInfo to SysMain

5 SysMain hash algorithm

Whenever it comes to reversing SysMain, the hash process soon becomes familiar. SysMain has its own hash algorithm to serve two purposes:

• Building a part of the .pf file names;
• Making references in internal hash tables.

Under Windows 10 the hash algorithm is:

    Result = 314159;
    for (i = 0; i < len(StringToHash); i++) {
        char = StringToHash[i];
        charmaj = RtlUpcaseUnicodeChar(char);
        Result = (Result × 37 + charmaj) × 37;
    }                                                        (1)

The initialization value used as a seed is made of the first digits of pi (3.14159). Superfetch has used several hash algorithms since its creation, but they all share the same basis. According to some studies [7,8], the following elements might have changed in the previous versions:

• initialization value of the hash;
• multiplier coefficient;


• addition of a modulo operation.

This hash is far from being cryptographic: its operations are basic and easy to reverse. Since uppercase and lowercase characters are processed the same way, two different input strings might have the same hashed output. Therefore, the algorithm is neither second pre-image resistant nor collision resistant. Still, this hash algorithm does not need to be cryptographic because the hashed strings are used as references. There is no actual need to protect the input, which is not sensitive information.

6 Registry keys

More than 22 registry keys are frequently consulted, created, or deleted within routine protocols. The most important among them are:

HKLM\SYSTEM\CurrentControlSet\Control\SessionManager\Memory Management\PrefetchParameters

• BaseTime
• BootId
• EnablePrefetcher Despite countless sources claiming that setting this value to 0 disables SysMain, this is not the case.
• EnableSuperfetch This value is forced to 3 by SysMain in the function PfSvSuperfetchCheckAndEnable().

HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Prefetcher

• BootFilesOptimized Changed in PfXpUpdateOptimalLayout() just before the update of layout.ini.
• LastDiskLayoutTime
• MinRelayoutHours
• LastDiskLayoutTimeString Date and time of the last optimal layout update.
• MaxPrefetchFiles By default, 256.

HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Superfetch

• PfPddata This value is a database buffer, read within a function called PfFgContextLoad(). Even though this is a registry value, there is no doubt it is a database such as the one in the .db files, since it goes through the same checks as the buffers from .db files.
• PfIuBatteryPaths Path of the battery.
• PfIuHistory Compressed buffer of 5604 bytes containing file paths. The value is updated periodically within the function PfIuHistorySave().
• LastResPriGenTime Corresponds to a SystemFileTime of the last synchronization, value x transformed such as x shleft 64 shleft 23.
• ResPriOptions
• StartedComponents
• Rebalancer Flags
• PfRbMinPagesToPrefetch
• PfRbPrefetchStopTreshold
• PfRbPostBootDurationInMs
• PfRbPrioInversionThreshold
• PfRbImageScoreBoost
• PfRbLargeFileSizeInPages

\PfAp

• ApFetch_%SIDHashed Compressed buffer.
• ApLaunch_%SIDHashed Compressed buffer with a fixed size, which actually contains the full names of applications. This buffer is updated periodically.
• UserTime_%ID Compressed buffer related to the context for a given user. This buffer is updated periodically.

\DiskAssessment

• DiskNumber
• LongSeekMicrosecondsBase
• LongSeekMicrosecondsPerSqrtGB
• PeakTransferMBsPerSecond
• RPM
• SeekBreakPages
• SizeInGb
• VolumeCreateTime
• VolumeSerialNumber
• PfRbSmallFileScoreBoost

\StaticConfig

• ProtectedProcesses
• ResPriHMImageListFilePath
• ResPriImageListFilePath
• Sku
• ServiceKeyPath
• MigratedServiceKey

HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\OptimalLayout

• Enable auto layout Whenever the operating system detects an idle state, SysMain updates the optimal layout if the value is set to 1. Not always available.
• LayoutFilePath Full path of the optimal layout, by default: C:\Windows\Prefetch\Layout.ini.
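Several of the value names listed here carry a hashed identifier (ApFetch_%SIDHashed, ApLaunch_%SIDHashed); whether they are produced by the routine of Sect. 5 is not stated here. As an illustration of that routine only, the following Python sketch transcribes Eq. (1). Two points are assumptions that the text does not confirm: the accumulator is truncated to 32 bits, and Python’s str.upper() stands in for RtlUpcaseUnicodeChar(). The function name sysmain_hash is hypothetical.

```python
def sysmain_hash(string_to_hash: str) -> int:
    """Sketch of the Windows 10 hash of Eq. (1).

    Assumptions: 32-bit truncation of the accumulator, and str.upper()
    used as a stand-in for RtlUpcaseUnicodeChar().
    """
    result = 314159  # seed: first digits of pi
    for char in string_to_hash:
        upper = char.upper()
        # Keep the original character when its uppercase form is not a
        # single character in Python (for example, the German sharp s).
        code = ord(upper) if len(upper) == 1 else ord(char)
        result = ((result * 37 + code) * 37) & 0xFFFFFFFF
    return result


# Case is ignored, so both spellings give the same value.
print(sysmain_hash("notepad.exe") == sysmain_hash("NOTEPAD.EXE"))  # True
```

The equality printed above is the property the text relies on when it notes that two different input strings can share the same hashed output.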


HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\EMDMgmt This key is linked to the ReadyBoost job, EMD referring to External Memory Device, the working name of ReadyBoost during its development according to Windows Internals [9].

• GroupPolicyDisallowCaches
• Attributes
• CacheSizeInMB
• DeviceStatus
• LastTestedTime
• ReadyBootTrainingCountSinceLastServicing This value refers to the activation of ReadyBoost.

Part II Prefetch Files

SysMain has two major types of support files:

• The scenario files (.pf) relative to programs;
• The database files (.db or .7db) relative to the agents.

Both can be found in the C:\Windows\Prefetch directory, but the database files are not always present there. Another common characteristic is their compression with the XPRESS_HUFFMAN algorithm, through the RtlCompressBuffer() function from Ntoskrnl.lib. Table 2 lists the database files with a brief description.

7 Databases: the agent’s support

The database files are the files ending with .db or .7db. Since they are not always present in the directory, they are often forgotten and little documentation has been produced so far. Their main goal is to keep traces of the agents’ work so that the collected data persists through OS reboots or context changes such as user switching or hibernation. Each agent of SysMain (AgCx, AgGl, AgAl, AgRp, AgPd) has one or more associated .db files, with different names, since one agent might need more than one internal database for its operation. Their names and purposes have not been documented at all and, unfortunately, the few studies made about the format only partially document it.

7.1 Databases construction

Database files are traces of internal databases left during Superfetch activities. Each agent has its own type of database, but it is important to understand that all the databases are connected to each other. Despite their different purposes, they are built on the same basis. SysMain’s initialization functions build the internal databases “from scratch”, from different functions specific to each agent. Each initialization function defines either default parameters or specific parameters, then calls an underlying function, and so on. Since they share the same basis, database files are read through the same process and their information is retrieved in the same way. Here is the reading process that aims at extracting useful information from the database file:

1. Get the view of the file when it is required, decompress the file if required and read the buffer;
2. Initialize a corresponding internal database with default parameters and default sizes;
3. Check file format conditions;
4. Fill in the information extracted from the file and adapt the internal database characteristics to this information.

Table 2 Database names detailed

Database name                                                        Details
AgAppLaunch.db                                                       Related to the applications; contains names of .exe.
AgRobust.db                                                          Related to the performance of SysMain.
AgCx_%SID.snp.db                                                     Related to user sessions; one per user. If there is a disconnection from one session to another, SysMain takes a snapshot of the previous session so it can be loaded faster if required later.
AgGlFaultHistory.db, AgGlFgAppHistory.db, AgGlGlobalHistory.db       Related to the AgPd agent, covering data from the PFN and referring respectively to page faults, foreground applications and global accesses.
AgGlUAD_P_%SID.db, AgGlUAD_%SID.db                                   UAD might refer to User Active Days. Since SysMain includes the context within its prediction, these files could be related to the context.
dynrespri.7db, cadrespri.7db                                         Referring to Dynamic Reserved priority. They seem to be used as a basis to sort data and synchronize. Whenever the file is created, the registry value LastResPriGenTime is updated.
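Step 1 of the reading process has to decide whether a file must be decompressed before it is parsed. The sketch below shows one way to make that decision in Python from the compressed layout given in Table 3 (Sect. 7.2): signature at 0x00, decompressed size at 0x04, checksum at 0x08 and payload at 0x10. It rests on assumptions: the fields are read as little-endian DWORDs, only the three ASCII bytes of the signature are compared, and the function name is hypothetical. The decompression itself is left out.

```python
import struct

COMPRESSED_HEADER_SIZE = 0x10  # compressed data starts at this offset (Table 3)


def read_compressed_header(buffer: bytes):
    """Return (decompressed_size, checksum, payload), or None if not compressed.

    Assumptions: little-endian DWORD fields; the signature is matched on
    its three ASCII bytes only.
    """
    if len(buffer) < COMPRESSED_HEADER_SIZE or buffer[:3] != b"MAM":
        return None
    decompressed_size, checksum = struct.unpack_from("<II", buffer, 0x04)
    return decompressed_size, checksum, buffer[COMPRESSED_HEADER_SIZE:]


# Synthetic buffer built for the example; it is not a real database file.
sample = b"MAM\x00" + struct.pack("<II", 4096, 0x1234ABCD) + b"\x00" * 4 + b"payload"
print(read_compressed_header(sample))
print(read_compressed_header(b"not a compressed file"))
```

A caller would hand the returned payload and expected size to a decompression routine, then continue with steps 2 to 4 on the decompressed buffer.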


Once the step of linking the database file and the internal database is completed, the database is ready to be used elsewhere in the service.

7.2 Database format

Under Windows 10, the format of the compressed database files is given in Table 3.

Table 3 Compressed .db file format

Offset    Content
0x00      0x4d4d41 MAM
0x04      Total size of the decompressed data
0x08      Checksum
0x10      Data compressed

Once the file is decompressed, the common header format for the .db files is given in Table 4. Please note that no generalization can be made beyond this header, because of the specificities of each database.

Table 4 Uncompressed .db file format

Offset    Content
0x00      Magic Number
0x04      Total Size
0x08      Header Size
0x0C      FileType Param
0x10      Param 1
0x14      Param 2
0x18      Param 3
0x1C      Param 4
0x20      Param 5
0x24      Param 6
0x28      Param 7
0x34      Count of volumes
0x38      Count of paths registered
0x3C      Check condition verified after the reading process to ensure integrity of the data
0x40      Condition to do specific reading operations

Magic number The same magic numbers have been seen for groups of databases, but no connection between them could be established. Table 5 shows the magic numbers seen and their associated databases on one specific machine.

Table 5 Magic number and associated .db

Magic number    .db associated seen
03              AgCs_%s.db, AgGlFaultHistory.db, AgGlFaultAppHistory.db, AgGIUAD_P%s.db, dynrespri.7db, cadrespri.7db
05              PfPre_%sidhash.mkd, AgAppLaunch.db
0F              AgRobust.db

Sizes The minimum total size is 72 bytes, verified in a common compliance function called PfDbFileVerifyCommon(). The header size follows the same rule: it must be higher than 72 bytes. Still, each database file might have its own criteria, thus the required sizes could be higher in complementary compliance functions.

Filetype parameters The FileType number indicates the index in an internal array relative to database sizes and offset calculation on the file (Table 6). The array is a 9-DWORD-long table, declared under the name PfDbDatabaseParamsForFileType. If there are reading problems, SysMain knows where to find the parameters thanks to the index indicated by the FileType number.

Table 6 FileType parameters

FileType    Parameters
5           40h; 58h; 10h; 10h; 10h; 10h; 0h; 0h
6           48h; 58h; 60h; 18h; 20h; 10h; 10h; 0h
7           48h; 48h; 60h; 18h; 10h; 10h; 10h; 0h
8           60h; 38h; 50h; 8; 8; 14h; 8; 0h
9           0h
A           60h; 38h; 50h; 8; 8; C; 8h; 0h
B           60h; 38h; 50h; 10h; 10h; 10h; 10h
C           60h; 38h; 50h; C; 08h; 08h; 08h
D           0h
E           48h; 70h; 90h; 10h; 10h; 10h; 10h
F           68h; 40h; 50h; 8h; 8h; 14h; 8h
10          60h; 40h; 88h; 10h; 18h; 8h; 8h
11          0h
12          50h; 50h; 58h; 18h; 10h; 10h; 10h
13          60h; 38h; 50h; 8h; 8h; 8h; 8h
14          60h; 40h; 58h; 10h; 8h; 8h; 8h
15          60h; 50h; 58h; 10h; 18h; 8h; 8h
16          60h; 40h; 50h; 8h; 8h; 8h; 8h

The spread of the different FileType values could be explained by backward compatibility. Under Windows 10, the values tend toward the last ones; previous versions might have used the earlier numbers, some of which remained while others evolved.
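The uncompressed header of Table 4 lends itself to a short parsing sketch. The Python code below reads the documented offsets from a buffer and applies the 72-byte minimum mentioned for PfDbFileVerifyCommon(). It is a sketch under assumptions: every field is treated as a little-endian DWORD, the DbHeader and parse_db_header names are hypothetical, and the offsets that Table 4 does not describe (0x2C, 0x30, 0x44) are simply skipped.

```python
import struct
from typing import NamedTuple, Tuple

MIN_SIZE = 72  # minimum total size reported for PfDbFileVerifyCommon()


class DbHeader(NamedTuple):
    magic: int
    total_size: int
    header_size: int
    file_type: int
    params: Tuple[int, ...]  # Param 1 to Param 7
    volume_count: int
    path_count: int


def parse_db_header(buffer: bytes) -> DbHeader:
    """Parse the common .db header (Table 4), assuming little-endian DWORDs."""
    if len(buffer) < MIN_SIZE:
        raise ValueError("buffer shorter than the 72-byte minimum")
    # 18 DWORDs cover offsets 0x00 to 0x47.
    dwords = struct.unpack_from("<18I", buffer, 0)
    header = DbHeader(
        magic=dwords[0],          # offset 0x00
        total_size=dwords[1],     # offset 0x04
        header_size=dwords[2],    # offset 0x08
        file_type=dwords[3],      # offset 0x0C
        params=dwords[4:11],      # offsets 0x10 to 0x28
        volume_count=dwords[13],  # offset 0x34
        path_count=dwords[14],    # offset 0x38
    )
    if header.total_size < MIN_SIZE:
        raise ValueError("total size below the 72-byte minimum")
    return header


# Synthetic header built for the example; it is not taken from a real file.
values = [0x0F, 4096, 0x48, 0x16, 0x60, 0x40, 0x50, 8, 8, 8, 8, 0, 0, 2, 5, 0, 0, 0]
sample = struct.pack("<18I", *values)
header = parse_db_header(sample)
print(hex(header.magic), hex(header.file_type), header.volume_count, header.path_count)
```

The seven parameters returned in params are the values that feed the offset and size calculations; database-specific checks would have to follow, since the layout after the header differs per file.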


Explanation of the parameters

• Parameter 1: used to calculate the offset of the volume path, with the following sum: Offset of volume path = End of header + Param 1.
• Parameter 2: used to calculate the offset of the second string, with the following sum: Offset of string 2 = End of volume path + Param 2.
• Parameter 3: used for offset calculations that follow a recurring pattern. Unlike Parameters 1 and 2, it is used inside an offset calculation loop.
• Parameters 4, 5, 6 and 7: size parameters for the internal database.

8 Scenarios: the traces of the user's activities

Scenarios are the medium Superfetch uses to log what happened during a program's execution and to improve future predictions. An application is assigned one or more scenario files depending on the way it has been executed. The name always has the form "NameoftheApp-HASH.pf", where the hash is computed from the command line that triggered the execution. The maximum number of scenario files within the Prefetch directory and their maximum size are set by values under the registry key SOFTWARE\Microsoft\Windows NT\CurrentVersion\Prefetcher. By default, the maximum number of scenarios is 256 and the maximum size is 10 485 760 bytes. Whenever a process starts up, the scenario is immediately created or updated, referencing the page accesses and the page faults that occurred so as to avoid them at the next launch. Given the references to these pages, the next time the process is started, SysMain determines whether or not it is a prelaunchable application, calculates the entry threshold, compares it to an internal threshold to make sure prelaunching is worthwhile and, if so, prelaunches the pages.

8.1 Scenario construction

Whenever a program is launched, SysMain always follows the same logic to build the corresponding scenario:

1. Get what the scenario name would be;
2. Retrieve the files used by the process thanks to the minifilter driver FileInfo;
3. Initialize a "scenario info" buffer, containing the basic header such as process information and the current date and time of execution;
4. Only if the scenario file already exists:
   a Open the existing file;
   b Decompress it and get the older buffer;
   c Verify the buffer format and basic conditions. If there is a single problem, the file is deleted;
   d Copy the obsolete content into the new buffer, then select and update information.
5. Compress the buffer, add a header and write it into the file.

8.2 Scenario format

Under Windows 10, the format of the compressed scenario is given in Table 7. Once the scenario is decompressed, the header format is given in Table 8.

Table 7 Compressed .pf file format
Offset Content
0x00 4D 41 4D ("MAM")
0x04 Total size of the decompressed data
0x08 Compressed data

Table 8 Uncompressed .pf file format
Offset Content
0x00 Operating system identifier
0x04 SCCA: prefetch signature
0x08 11 41 43 43: format condition
0x0C Total size of the file
0x10 Program name
0x4C Hash value
0x50 RESERVED
0x58 Count of paths registered
0x64 Offset of paths block
0x6C Offset of volume block
0x74 Size of volume block
0x80 8 last dates and times of execution
0xC8 Count of executions

8.3 Scenario content

The scenarios are the tip of the SysMain iceberg: they contain the result of a long processing of the data generated by the user experience within a certain context, and the condition to achieve what SysMain has been created for: prelaunching what is required by a user when the time comes. Once the buffer is decompressed, the last part of the file contains full paths of files used by the applications. The majority are internal files required for the global operation of the application: DLLs (Dynamic Link Libraries), dependencies, or any files with an extension specific to the application.
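
The header layouts of Tables 7 and 8 can be sketched in Python as follows. The offsets are those of the tables; everything else is an assumption made for the example: the function names are hypothetical, integers are taken to be little-endian, the program name is taken to be a NUL-terminated UTF-16LE string, and the execution dates are taken to be 64-bit FILETIME values. Decompressing the payload is deliberately left out of the sketch.

```python
import struct
from datetime import datetime, timedelta, timezone

WINDOWS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME (100 ns ticks since 1601-01-01 UTC) to a datetime."""
    return WINDOWS_EPOCH + timedelta(microseconds=filetime // 10)


def parse_compressed_header(blob: bytes) -> tuple:
    """Table 7: check the "MAM" marker, return (decompressed size, payload)."""
    if len(blob) < 8 or blob[:3] != b"MAM":
        raise ValueError("not a compressed scenario file")
    (size,) = struct.unpack_from("<I", blob, 0x04)
    return size, blob[0x08:]  # the payload is returned still compressed


def parse_scenario_header(buf: bytes) -> dict:
    """Table 8: read a few header fields from a decompressed scenario."""
    if len(buf) < 0xCC or buf[0x04:0x08] != b"SCCA":
        raise ValueError("not a decompressed scenario buffer")
    # Assumption: the name field (0x10 to 0x4C) is NUL-terminated UTF-16LE.
    raw_name = buf[0x10:0x4C].decode("utf-16-le", errors="replace")
    (hash_value,) = struct.unpack_from("<I", buf, 0x4C)
    (path_count,) = struct.unpack_from("<I", buf, 0x58)
    (paths_offset,) = struct.unpack_from("<I", buf, 0x64)
    (volume_offset,) = struct.unpack_from("<I", buf, 0x6C)
    # Assumption: the 8 execution dates are 64-bit FILETIME values.
    raw_times = struct.unpack_from("<8Q", buf, 0x80)
    (run_count,) = struct.unpack_from("<I", buf, 0xC8)
    return {
        "program_name": raw_name.split("\x00", 1)[0],
        "hash": f"{hash_value:08X}",
        "path_count": path_count,
        "paths_offset": paths_offset,
        "volume_offset": volume_offset,
        "last_runs": [filetime_to_datetime(t) for t in raw_times if t],
        "run_count": run_count,
    }
```

Unused execution slots are assumed to hold zero and are skipped, so `last_runs` only lists the executions actually recorded.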


Still, scenarios also reference recent or frequently used files (Fig. 6). Thus, the scenario of a photo editor contains the names of the last photos opened, and the scenario of a media player contains the names of songs listened to or the title of the last movie seen, along with the dates and hours of these actions. By combining these lists, SysMain makes it possible to determine the habits and hobbies of the user.

Fig. 6 Extract of VLC.EXE-5A3EF7FA.pf (decompressed)

In addition, SysMain goes further, storing references to the cache files. The cache files are produced by the Cache Manager, which temporarily stores data to reduce the access time the next time the data is required. Within UserDirectory\AppData\Local\Microsoft\Windows\Cache, there are parts of files remaining from previous use. The cache is designed to do stream caching, which implies that the data stored could also be parts of files, in clear text. Therefore, since SysMain references the cache files, it gives the possibility to view clear-text data from the user's applications, including document editors containing private data.

9 Forensic uses and opportunities

Superfetch resources turn out to be useful in many forensic situations. Finding traces of executed programs provides the opportunity to determine which activities have been carried out on a computer. In addition, the details given by the scenarios make it possible to understand the context of the program use, thanks to the hours of the activity or the amount of time it has been used. Thus, it is possible to find some patterns in the user's activities or establish connections between actions. This part aims at showing some of the potential clues that may be found in the scenarios. All of the operations can be achieved with the tool called SysMain View, available on GitHub at MathildeVenault/SysMainView. This tool makes it possible to compress, decompress, edit, and view information on database and scenario files. This section details some circumstances where SysMain traces might contribute to forensic analysis.

9.1 Malware analysis

As part of malware analysis, SysMain could be useful to trace the history of events on a computer and determine the circumstances of the attack. This may help to figure out the exact time of the attack and to identify the malicious payload.

Methods It is possible to go through the prefetch files, examine the program traces around the estimated period of the attack and look for something suspicious. If the malware is an executable, its scenario will have been created: look for prefetch files with unfamiliar names and recent activity. If the malware is in the form of a script, it will have been executed through a shell: search the command-line interpreter scenarios for the names of the scripts they have executed; the name and the location of the payload might be found there. If the script has not been removed, it might remain on the system and allow the malware to be reverse engineered.
The scenarios could also help to understand the vector of infection. If it is a phishing attack, traces might be left in the scenarios of Word or other office programs, or of PDF readers: look for anything suspicious in the recently opened files. In this way, this might complete a forensic investigation of web browsers.

Example On an incident response mission, an employee's computer has been infected by malware which deletes files on the system; when the victim found out, he immediately turned off his computer. The goal: find out what happened. A first look at the recent activities reveals that PowerShell was active at 4:23 pm, whereas the owner is not supposed to use this kind of application. The corresponding scenario shows that PowerShell was executed fewer than 5 times but unfortunately does not list any suspicious file.
Investigating further into the recent activities shows that the victim opened Word for the last time at 4:21 pm, and among the files related to Word's activity there is one from a USB device: D:\MYUSB\NovemberSchedule.docx. Afterwards, the victim states that he was reading this document from a USB key while all of his professional documents were disappearing from his desktop. The device is retrieved and the document opened in a sandbox: it contains a VBA macro that runs, directly through PowerShell, a command deleting all the content of the user's documents.

9.2 Suspicion of illegitimate activity

The prefetch files play an important role in the context of criminal investigations since they are a means to track any suspect as long as there is access to his or her computer. Indeed, they provide lots of information on the user personal


files and give concrete evidence of certain activities, which are essential for forensic purposes.

Methods The first step in this kind of forensic investigation involves profiling the owner of the machine. This includes determining how the machine is commonly used: the context of use (professional or private), which programs are used the most, when the usual activities take place, and so on. To this end, it is useful to observe, for each program, how many times it has been executed and the last dates and times of execution. The hours of program execution could give the exact schedule of the user. For instance, if Microsoft Teams and a mail application are opened every morning between 8 am and 10 am except on Saturdays and Sundays, it is possible to assume the computer is used for professional purposes. Similarly, if a media player or video games are opened each evening, the machine is likely a personal computer.
Profiling can also be extended thanks to the names of the files used by each process, recorded in the scenarios. This tells a lot about user activities, especially for text editors or media players. This kind of program reveals the type of documents the user has and exposes his or her hobbies and preferences. Given the movies and songs recorded for media players, the type of music and movies he or she likes could be guessed. This applies to many other things such as text documents, which might reveal favorite readings or even wider information, such as what the user has been working on lately. Sometimes, the files recorded lead to other important information such as events or intents. If the user has lately written a CV, it is possible to speculate he or she might be thinking about getting a new job. If the user has looked at his or her last holiday pictures, then the name of the folder would be recorded, and it might contain details about the date or the destination.
In addition to profiling someone, the scenarios constitute concrete evidence of the user's activity on the machine. Indeed, whether or not the file is still stored on the computer, the record in the scenarios remains. Looking into the prefetch files allows listing the files that have been opened on the computer, even if they have since been removed or were consulted from an external device, since SysMain references the original path of the file at the moment the file was consulted. Grouping the different directories and files registered can thus lead to a precise mapping of the computer as it was in the past.
The same goes for the applications: SysMain keeps the scenario of an uninstalled program for a certain time afterwards. The maximum duration depends on the global use of the computer; SysMain will remove the corresponding scenario once it figures out the file is not useful anymore, thanks to its Agent Robust Performance. In the case of criminal acts, the traces are often destroyed or at least hidden as much as possible, and SysMain is one of the rare means to get precise and reliable information.

Example A company has just fired an employee for professional misconduct. The company suspects illegal activities on the professional machine. How could evidence of such acts be found?
The first approach is to look for suspicious programs that could have been used for illegal activities. Among the prefetch files, one catches attention: a scenario proving the use of Tor Browser, one of the means to access the dark web. It shows that Tor has been executed more than fifty times and, among the last execution dates and times, the majority indicate execution at night. Looking into other scenarios reveals, in the scenario of a photo editor, a list of more than 30 files named "Passport" with identification numbers and last names.

9.3 Warning: scenario falsification

It is important to keep in mind that Superfetch traces might be falsified. Indeed, any hacker could remove prefetch files or edit the information that could betray what has been done on the computer. For instance, it is possible to save the scenario corresponding to the program that is about to be used and, once the action that has to be hidden is done, replace the legitimate scenario with the scenario backed up before. Similarly, it is possible to edit sensitive information such as the dates and the count of executions directly in the legitimate scenario. Windows will not notice the change and will process the scenario, without the traces of what has been done. In a forensic investigation, whilst falsification is unlikely to have been done, it is still a possibility. SysMain traces are useful but not infallible.

Part III
Remarks and Conclusions

9.4 Weaknesses of SysMain

The biggest weaknesses of SysMain are due to the need for backward compatibility. Among its 2500 functions, some show deficiencies when they need to adapt to old OS characteristics. One example is the process of loading agents. Whenever it needs to load its agents, SysMain has two ways of proceeding:

• PfSvLoadDefaultAgents() which is a set of basic loads;


• PfPrAgentsLoadFromRegistry() which is based on references in the registry.

The function PfPrAgentsLoadFromRegistry plays an important role. First, it opens the following key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Superfetch. It gets all the values under the Agents key with RegEnumValueA(), splitting a value into multiple names when it contains several of them (Fig. 7).

Fig. 7 Part 1 of PfPrAgentsLoadFromRegistry() function

SysMain now has string associations, which are actually library names (argument LibraryName) and function names of the specified library (argument ProcName). The function PfPrAgentLoad() will be called for each association.
Therefore, as soon as the registry value is edited, it is possible to point SysMain to a specific dynamic library and any function of this DLL, which will be loaded and executed without any further checks. In addition, it is also possible to execute multiple functions this way since the PfPrAgentLoad() function is called once per value under the \Agents key. The SysMain process of agent loading is not explicit enough about the characteristics of the DLL to be loaded and it is thus possible to load a malicious DLL: this is DLL side-loading. Even though this is not a DLL hijacking as explained in [11], it was important enough to report the weakness to Microsoft. This weakness is likely due to backward compatibility because this function does not seem to be used anymore under Windows 10.

9.5 Conclusion

SysMain has been misunderstood for a while, whereas it plays an important role within Windows' daily performance. Reversing was challenging: SysMain has lots of functions belonging to lots of different notions, and the backward compatibility needs make its architecture, which is already complex by design, even wider. Another major point was evaluating the consequences of SysMain's job. The privacy issue is undeniable. As long as SysMain is enabled on a computer, it is possible to track the user and exploit the information. This paper aimed at clarifying SysMain's operation to give anyone the means to make this decision in full knowledge of the facts. It also aimed at illustrating concrete forensic cases in which SysMain turns out to be useful and at showing how the tool presented can help in this kind of forensic analysis.

9.6 Limitations

This study represents a solid base to understand SysMain on Windows 10. Nevertheless, SysMain is intended to evolve, and it might change dramatically with the next Windows version or at any upgrade. Therefore, what is documented now could become obsolete at any time. Also, access to the Prefetch directory requires administrator privileges. From a black hat point of view, if the hacker does not have administrator access, the process becomes complicated and the knowledge about SysMain might be useless.

9.7 Future work

Even though the study is a solid base covering all of the essential aspects of the global operation, it would be interesting to explore further the external components involved in SysMain's operation, such as the drivers. Indeed, the drivers related to SysMain's activity have lots of functions unused by SysMain. What are their uses? Do they have other functionalities? In addition, SysMain's operation is closely tied to the memory manager, including the cache. As the cache management is not widely documented, it would be rewarding to understand its mechanisms and its exact links with SysMain.

References

1. Zwiegincew, A., Walsh, J.E.: Prefetching of pages prior to a hard page fault sequence. U.S. Patent (2001)
2. suat.cini: Superfetch service has been promoted to sysmain. congratulations! [Online]. Available: https://answers.microsoft.com/en-us/insider/forum/all/superfetch-service-has-been-promoted-to-sysmain/395cd8b7-7a02-44fa-af91-dd6b358b7276. Accessed 07 2018
3. Shashidhar, N.K., Novak, D.: Digital forensic analysis on prefetch files. Int. J. Inf. Secur. Sci. 4(2), 39–49 (2015). https://pdfs.semanticscholar.org/2e5e/bffd41661a4ca85420be881f70b2162a4638.pdf
4. Metz, J.: Superfetch databases [Online]. Available: https://github.com/libyal/libscca/blob/master/documentation/Windows%20Prefetch%20File%20%28PF%29%20format.asciidocc. Accessed 02 2020
5. Blog, R.: Windows superfetch file format-partial specification [Online]. Available: http://blog.rewolf.pl/blog/?p=214. Accessed 10 2011
6. Metz, J.: Superfetch databases [Online]. Available: https://github.com/libyal/libagdb/blob/master/documentation/Windows%20SuperFetch%20(DB)%20format.asciidoc. Accessed 04 2014
7. Blog, H.: Prefetch hash calculator [Online]. Available: http://www.hexacorn.com/blog/2012/06/13/prefetch-hash-calculator-a-hash-lookup-table-xpvistaw7w2k3w2k8. Accessed 06 2012


8. Hiddenillusion Blog: Go prefetch yourself [Online]. Available: https://hiddenillusion.github.io/2016/05/10/go-prefetch-yourself/. Accessed 05 2016
9. Yosifovich, S., Russinovich, M.E., Ionescu, A.: Windows Internals, Part 2 (6th edn.). Microsoft Press, Redmond, Washington (2017)
10. Margosis, A., Russinovich, M.: Windows Sysinternals Administrator's Reference. Microsoft Press, Redmond, Washington (2011)
11. Blog, I.: Windows dll hijacking (hopefully) clarified [Online]. Available: https://itm4n.github.io/windows-dll-hijacking-clarified. Accessed 04 2020

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
