2020 - Superfetch - The Famous Unknown Spy
https://doi.org/10.1007/s11416-020-00370-y
ORIGINAL PAPER
Abstract
Since Windows Vista, Microsoft has offered us a new life companion called SysMain, or Superfetch after its old name. This
service analyzes and records the user's daily software use to speed up his or her experience on the
operating system. However, it also makes it possible to track the software used and the private files viewed, such as movies
or confidential files, to reveal the user's lifetime activities and to map directories. More than just a privacy issue, this constitutes
a reliable approach in forensic analysis. Furthermore, this service is often misunderstood because of its sparse documentation and
the myths surrounding it, which quickly complicates investigations. This paper is an extended version of the talk
presented at Black Hat USA 2020: it aims at debunking partial and false information about SysMain and its files. This paper
examines its architecture in detail, analyzes its mechanisms and explains its operating method. It details the format of all the
prefetch files, which has been undocumented or obsolete so far. In addition, this paper illustrates concrete forensic cases
in which SysMain turns out to be useful.
Mathilde Venault
venault@et.esiea.fr
Baptiste David
bdavid@et.esiea.fr
1 Laboratoire de Virologie et de Cryptologie Opérationnelles, ESIEA, Laval, France

1.1 Vocabulary and history

The notion of Prefetcher appeared in 2001 in the American patent 6,633,968 [1], announcing the technique that would become part of Windows XP. Under Windows Vista, another component called Superfetch was added to the algorithm and the service was renamed after this improvement, until Windows 10. In this version, the service was renamed SysMain, but Microsoft did not explain this change [2].

On Windows 10, Superfetch is only a part of SysMain, which is the name of the whole service, containing many parts including the Prefetcher and Superfetch. As the name was only changed with Windows 10, the whole algorithm is commonly, though erroneously, called Superfetch.

The main goal of the service is to increase the speed of the user experience. To this end, SysMain focuses on two aspects:

• Booting faster;
• Gaining time from the start-up to the closure of any process.

To boot as fast as possible, SysMain frequently calculates the “optimal layout”, which is the order in which files are loaded into memory at boot. This list is established during idle states: whenever CPU, disk and memory utilization are under a certain percentage of use, the service proceeds to non-urgent operations such as the optimal layout calculation. The result is written to C:\Windows\Prefetch\Layout.ini (Fig. 1).

On the other hand, speeding up navigation within applications relies on reducing page faults in memory, which is an optimization of memory paging.

Memory paging A process is a set of pages in memory, which are same-sized blocks of data containing the instructions of a program. Whenever a process is executed, these pages are mapped in theory into the physical memory (RAM)
M. Venault, B. David
Superfetch: the famous unknown spy
1.3 Involvements

The mechanisms explained above imply that SysMain is watching and keeping traces of each action performed on a computer, such as:

• Evidence of software installs;
• Dates and times of application launches;
• Number of executions per program;
• Names and locations of files used by each process;
• Links to cache files that might contain the content of personal text documents.

Therefore, SysMain knows a lot about you, from the time you woke up to your favorite songs. On the one hand, it raises a very serious privacy issue since it tracks your lifetime activities. Even though Superfetch does improve the speed of your experience on your OS, it raises the question of the limit between profiling and spying.

On the other hand, it is a significant forensic opportunity. Within a forensic analysis, it helps to reconstruct activities very precisely, especially because few people are even aware of Superfetch traces, so these traces are usually left on a computer. For instance, in malware analysis, the .pf files could prove the program's execution date and time and show where the malicious files were located.

So, from a black hat point of view or from a white hat one, it does matter to know how SysMain works and what exactly its files could reveal.

1.4 Previous work, documentation and myths

Few studies have been done so far to document SysMain operation. The first issue is that some of them focused on Windows versions older than Windows 10, such as the study Digital Forensic Analysis on Prefetch Files [3], and are thus now obsolete. Regarding the format of the .pf files, documentation has been accessible and known approximately since 2010, and the most up-to-date is Joachim Metz's study [4]. Still, Superfetch has another kind of file ending with a “.db” extension, and there are so far only two studies about it: Rewolf's Blog [5] and Joachim Metz's study [6]. Despite these, the documentation is incomplete because Rewolf's Blog [5] covers precisely one file but sets aside the others, which are very different, and Joachim Metz's study [6] focuses on reporting observations more than on drawing conclusions from a reverse engineering process.

Another widely covered aspect of SysMain is the hash function, notably in [7,8]. Even though many of these sources disagree on details such as the origin of the string to hash, they all converge on the same algorithm, presented in the section about the hash algorithm, in Sect. 5.

Regarding the global operation, the most reliable source is the documentation from Windows Internals [9], covering the basics of the global operation. However, there are inaccuracies in it, reinforcing some myths already widespread. One of them is that SysMain could be disabled by setting the registry value EnablePrefetcher to 0, within the key HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PrefetchParameters. Although this method has been shared across the Internet and is also described in Windows Internals [9], it is not enough to stop SysMain nowadays.

Indeed, whatever the value of this key is, SysMain will keep on writing its databases. This can be observed by checking the “last written time” of files in the C:\Windows\Prefetch directory after executing a program, with the values of EnablePrefetcher and EnableSuperfetch set to 0. While the service was supposed to stop, the prefetch files are still being updated. Regarding the registry value EnableSuperfetch, SysMain has a function called PfSvSuperfetchCheckAndEnable, which forces this value to 3 no matter the initial value. This shows that these registry values do not have any impact on SysMain's activity. For the record, disabling SysMain manually in the service control manager will solve this issue. To do so, the Service Manager must be opened, the SysMain service selected and the service properties accessed. The Startup type should be set to Disabled and the changes applied. SysMain will no longer track the user after the computer restarts.

Part I
Global operation

This section will explain the key points of SysMain's operation. The components listed and detailed are not an exhaustive list of SysMain parts but aim at clarifying the key points. At first sight, the service could be seen as many divisions, handled by groups of functions identifiable by their prefix (Table 1).

Further, these groups are connected to each other in order to carry out the different types of tasks (Fig. 4). The major task is the pf routines: the non-stop jobs responsible for the essential functions such as processing traces of applications, and predicting and pre-launching pages the user might need. They are also the parts communicating with the rest of the kernel: exchanges with the drivers, RPC (Remote Procedure Call) requests, logging and event parts, requests for information WNF (Windows Notification Facility) states, global system information.

Under special circumstances, SysMain can declare an idle state to process actions that need time and memory operations but do not need to be done every day. This is the reason why SysMain will check for power supply presence
and utilization of CPU, disk and memory to be sure SysMain will not be harmful to the user's activities. In this free time, SysMain will update the optimal boot layout or execute the command defrag.exe -s -b: these are the idle tasks.

For the actions that do not need to be done frequently, SysMain has periodic tasks, based on actions planned to synchronize registry key values or remnant data. To carry out all these tasks, the work is divided per agent.

Table 1 Function initials and their meaning

Prefix   Name
PfPr     Prefetch Processor
PfTr     Prefetch Trace
PfSi     Prefetch Section Info
PfHp     Prefetch Heap
PfCl     Prefetch Collector
PfDb     Prefetch Database
PfDi     Prefetch Device Info
Rdb      ReadyBoost
HbDrv    Hybrid Drive
AgAl     Agent Application Launch
AgGl     Agent Global
AgPd     Agent PFN Database
AgRp     Agent Robust Performance
AgTw     Agent Trace Writer

2 SysMain's agents

SysMain includes agents: they are components dedicated to a specific task. They are constantly watching for change and could be triggered anytime. They are loaded in order of importance, which is the following:

• Agent PFN (Page Frame Number);
• Agent Global;
• Agent Application Launch;
• Agent Context;
• Agent Robust Performance.

Agent page frame number The page frame number (PFN) database is an array representing the state of each physical page in memory on the system (Active / Standby / Freed), which makes it aware of page faults. The agent is the direct interlocutor of the PFN database, so it can relay the page faults or page accesses and classify the response. For instance, it classifies the memory page's origin (foreground or background application) or its state (committed page or not). This agent is the one in charge of getting the data from memory, which will be the basis of future pre-launching.

Agent global This agent oversees the context for one user. It defines the criteria of Active Days and the limits of daily phases (which hours and days belong to work time/morning schedule), and it might organize histories of successions of scenarios within a certain phase.

Agent application launch AgAl is involved throughout the prediction chain. First, it post-processes the data received from a driver called FileInfo and concerning the files used by a process or by the agent PfnDb. In addition, it creates Markov chains to represent the probabilities of program uses. This constitutes the basis for making predictions. Given the calculated probabilities, the agent makes decisions and asks the memory manager to prelaunch certain pages.

Agent context This agent is responsible for watching the overall context:

• The current state of the computer (standby, hibernation);
• The current session information (SID);
• The current user information (user token).

Whenever a change occurs, it updates the current information so that SysMain is aware of the new situation and reacts if needed. For instance, if a user session switch occurs, AgCx takes a snapshot of the current situation to be able to restore it faster if required later. There are two modes of disconnection:

• Classic Disconnect: properly quitting with the button and logging in on the other session.
• Lazy Disconnect: switching by clicking on “disconnecting”. In this case, the agent will also go through Classic Disconnection mode.

Agent robust performance Basically, this agent oversees SysMain performance. According to Windows Internals [9], it watches for specific file I/O accesses that might harm the system by populating the standby lists with unneeded data. It also checks the frequency of accesses to the files referenced by SysMain to avoid pre-launching irrelevant data such as the whole content of a file opened just once. Thanks to an internal threshold, AgRp prioritizes the files referenced to make sure the performance is at its best and runs regular checks to avoid keeping irrelevant data.
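As a rough illustration of the Markov-chain idea mentioned for the Agent Application Launch, the following Python sketch builds first-order transition probabilities from a launch history and returns the most likely next program. The function names, the sample history and the choice of a first-order model are assumptions made for illustration only; the internal data structures SysMain actually uses are not described here.

```python
from collections import Counter, defaultdict


def build_transitions(launch_history):
    """Count program-to-program transitions and normalise them to probabilities."""
    counts = defaultdict(Counter)
    for current, following in zip(launch_history, launch_history[1:]):
        counts[current][following] += 1
    return {
        program: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for program, followers in counts.items()
    }


def predict_next(transitions, current_program):
    """Return the most probable next program, or None if nothing is known."""
    candidates = transitions.get(current_program)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical launch history, oldest launch first.
history = [
    "outlook.exe", "chrome.exe", "outlook.exe",
    "chrome.exe", "outlook.exe", "winword.exe",
]
transitions = build_transitions(history)
```

With this history, `predict_next(transitions, "outlook.exe")` selects `chrome.exe`, since two of the three recorded launches following `outlook.exe` were `chrome.exe`. A predictor of this kind would then decide which pages are worth pre-launching.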
More than 22 registry keys are frequently consulted, created, or deleted within routine protocols. The most important among them are:

HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PrefetchParameters

• BaseTime
• BootId
• EnablePrefetcher Despite countless sources claiming that setting this value to 0 disables SysMain, this is not the case.
• EnableSuperfetch This value is forced by SysMain to 3 in the function PfSvSuperfetchCheckAndEnable().

HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Prefetcher

• BootFilesOptimized Changed in PfXpUpdateOptimalLayout() just before the update of layout.ini.
• LastDiskLayoutTime
• MinRelayoutHours
• LastDiskLayoutTimeString Date and time of the last optimal layout update.
• MaxPrefetchFiles By default, 256.

HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Superfetch

• PfPddata This value is a database buffer, read within a function called PfFgContextLoad(). Even though this is a registry value, there is no doubt it is a database such as the one in the .db files, since it goes through the same checks as the buffers from .db files.
• PfIuBatteryPaths Path of the battery.
• PfIuHistory Compressed buffer of 5604 bytes containing file paths. The value is updated periodically within the function PfIuHistorySave().
• ApFetch_%SIDHashed Compressed buffer.
• ApLaunch_%SIDHashed Compressed buffer with a fixed size, which actually contains the full names of applications. This buffer is updated periodically.
• UserTime_%ID Compressed buffer related to the context for a given user. This buffer is updated periodically.

\DiskAssessment

• DiskNumber
• LongSeekMicrosecondsBase
• LongSeekMicrosecondsPerSqrtGB
• PeakTransferMBsPerSecond
• RPM
• SeekBreakPages
• SizeInGb
• VolumeCreateTime
• VolumeSerialNumber
• PfRbSmallFileScoreBoost

\StaticConfig

• ProtectedProcesses
• ResPriHMImageListFilePath
• ResPriImageListFilePath
• Sku
• ServiceKeyPath
• MigratedServiceKey

HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\OptimalLayout

• Enable auto layout Whenever the operating system detects an idle state, SysMain updates the optimal layout if the value is set to 1. Not always available.
• LayoutFilePath Full path of the optimal layout, by default: C:\Windows\Prefetch\Layout.ini.
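One hedged way to observe the EnablePrefetcher and EnableSuperfetch values listed above on a live system is to read them back after SysMain has been running. The sketch below uses Python's standard winreg module; the helper name read_prefetch_parameters is hypothetical, the key path is written with the spaces used by the Windows registry ("Session Manager"), and the script only reads the two values without reproducing any of SysMain's behaviour. On non-Windows systems it simply returns an empty dictionary.

```python
import sys

# Sub-path under HKEY_LOCAL_MACHINE holding the two values discussed in the text.
PREFETCH_PARAMETERS = (
    r"SYSTEM\CurrentControlSet\Control\Session Manager"
    r"\Memory Management\PrefetchParameters"
)
VALUE_NAMES = ("EnablePrefetcher", "EnableSuperfetch")


def read_prefetch_parameters(names=VALUE_NAMES):
    """Return {value name: data} for the values found; {} when not on Windows."""
    if sys.platform != "win32":
        return {}
    import winreg  # standard library, available on Windows only

    found = {}
    try:
        key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, PREFETCH_PARAMETERS)
    except OSError:
        return found  # key missing or not readable
    with key:
        for name in names:
            try:
                data, _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                continue  # this value does not exist on the machine
            found[name] = data
    return found


if __name__ == "__main__":
    for name, data in read_prefetch_parameters().items():
        print(f"{name} = {data}")
```

Running the script before and after setting a value to 0 and restarting the service gives a simple way to check whether the value has been rewritten.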
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\EMDMgmt

This key is linked to the ReadyBoost job, EMD referring to External Memory Device, the working name of ReadyBoost during its development according to Windows Internals [9].

• GroupPolicyDisallowCaches
• Attributes
• CacheSizeInMB
• DeviceStatus
• LastTestedTime
• ReadyBootTrainingCountSinceLastServicing This value refers to the activation of ReadyBoost.

often forgotten and only a little documentation has so far been produced. Their main goal is to keep traces of the agents' work so that the data collected persists through OS reboots or context changes such as user switching or hibernation. Each agent of SysMain (AgCx, AgGl, AgAl, AgRp, AgPd) has one or more associated .db files, with different names, since one agent might need more than one internal database for its operation. Their names and purposes have not been documented at all and unfortunately the few studies made about the format only partially document it.

7.1 Databases construction
Once the step of linking the database file and the internal database is completed, the database is ready to be used elsewhere in the service.

7.2 Database format

Under Windows 10, the format of the compressed database files is given in Table 3.

Whenever the file is decompressed, the common header format for the .db files is given in Table 4. Please note that no generalities can be made about what follows this header, because of each database's specificities.

Table 5 Magic number and associated .db

Magic number   .db associated seen
03             AgCs_%s.db, AgGlFaultHistory.db, AgGlFaultAppHistory.db, AgGIUAD_P%s.db, dynrespri.7db, cadrespri.7db
05             PfPre_%sidhash.mkd, AgAppLaunch.db
0F             AgRobust.db

Table 6 FileType parameters

FileType   Parameters
Explanation of the parameters

• Parameter 1: it is useful to calculate the offset of the volume path, with the following sum: Offset of volume path = End of header + Param 1.
• Parameter 2: it is useful to calculate the offset of the second string, with the following sum: Offset of string 2 = End of volume path + Param 2.
• Parameter 3: it is useful for offset calculation as recurrent patterns. Unlike Parameters 1 and 2, it will be used in an offset calculation loop.
• Parameters 4, 5, 6 and 7: size parameters for the internal database.

c Verify the buffer format and basic conditions. If there is a single problem, the file is deleted.
d Copy the obsolete content on the new buffer, select and update information.
5. Compress the buffer, add a header and write it into the file.

8.2 Scenario format

Under Windows 10, the format of the compressed scenario is Table 7.

Whenever the scenario is decompressed, the header format is Table 8.
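Returning to the two offset sums given for Parameter 1 and Parameter 2 of the .db header, they can be chained as in the short Python sketch below. The function name compute_offsets, the DbOffsets container and the numeric values are hypothetical. In particular, volume_path_size is an assumption: the way the end of the volume path is located is not specified here, so it is passed in explicitly.

```python
from dataclasses import dataclass


@dataclass
class DbOffsets:
    volume_path: int  # Offset of volume path = End of header + Param 1
    string_2: int     # Offset of string 2 = End of volume path + Param 2


def compute_offsets(end_of_header, param1, volume_path_size, param2):
    """Apply the two sums from the parameter explanation, in order."""
    volume_path = end_of_header + param1
    # Assumption: the volume path occupies volume_path_size bytes.
    end_of_volume_path = volume_path + volume_path_size
    string_2 = end_of_volume_path + param2
    return DbOffsets(volume_path=volume_path, string_2=string_2)


# Hypothetical values, for illustration only.
offsets = compute_offsets(end_of_header=0x48, param1=0x10,
                          volume_path_size=0x2E, param2=0x08)
```

With these made-up inputs, the volume path starts at offset 0x58 and the second string at offset 0x8E.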
files and give concrete evidence of certain activities, which are essential for forensic purposes.

the traces are often destroyed or at least hidden as much as possible and SysMain is one of the rare means to get precise and reliable information.
• PfPrAgentsLoadFromRegistry() which is based on references on the registry.

The function PfPrAgentsLoadFromRegistry plays an important role. First, it opens the following key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Superfetch. It gets all the values on the Agents hive, separating multiple names within the same value name if there are many of them, with RegEnumValueA() (Fig. 7).

SysMain now has string associations, which are actually library names (argument LibraryName) and function names of the specified library (argument ProcName). The function PfPrAgentLoad() will be called for each association.

Therefore, as soon as the registry value is edited, it is possible to indicate to SysMain a specific dynamic library and any function of this DLL, which will be loaded and executed without any further checks. In addition, it is also possible to execute multiple functions this way since the PfPrAgentLoad() function is called once per value on the \Agents hive. The SysMain agent-loading process is not explicit enough about the characteristics of the DLL to be loaded, and it is thus possible to load a malicious DLL: this is DLL side-loading. Even though this is not DLL hijacking as explained in Blog [11], it was important enough to report the weakness to Microsoft. This weakness is likely due to backward compatibility, because this function does not seem to be used anymore under Windows 10.

9.5 Conclusion

knowledge of the facts. It also aimed at illustrating concrete forensic cases in which SysMain turns out to be useful and at showing how the tool presented can help in this kind of forensic analysis.

9.6 Limitations

This study represents a solid base to understand SysMain on Windows 10. Nevertheless, SysMain is intended to evolve, and it might change dramatically with the next Windows version or at any upgrade. Therefore, what is documented now could become obsolete or be upgraded at any time. Also, access to the Prefetch directory requires administrator privileges. From a black hat point of view, if the attacker does not have administrator access, the process becomes complicated and the knowledge about SysMain might be useless.

9.7 Future work

Even though the study is a solid base covering all of the essential aspects of the global operation, it would be interesting to explore further the external components involved in SysMain's performance, such as the drivers. Indeed, the drivers related to SysMain's activity have many functions unused by SysMain. What are their uses? Do they have other functionalities? In addition, SysMain's performance is closely tied to the memory manager, including the cache. As cache management is not widely documented, it would be rewarding to understand its mechanisms and its exact links with SysMain.
8. Hiddenillusion Blog: Go prefetch yourself [Online]. Available: https://hiddenillusion.github.io/2016/05/10/go-prefetch-yourself/. Accessed 05 2016
9. Yosifovich, S., Russinovich, M.E., Ionescu, A.: Windows Internals, Part 2 (6th edn.). Microsoft Press, Redmond, Washington (2017)
10. Margosis, A., Russinovich, M.: Windows Sysinternals Administrator's Reference. Microsoft Press, Redmond, Washington (2011)
11. Blog, I.: Windows dll hijacking (hopefully) clarified [Online]. Available: https://itm4n.github.io/windows-dll-hijacking-clarified. Accessed 04 2020

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.