Lesson 3:
Static & Dynamic Analysis
CSC448/548-CYEN404 – REVERSE ENGINEERING
DR. ANDREY TIMOFEYEV
OUTLINE
•Static analysis.
• Assessment and reconnaissance.
• Disassembly.
• Decompilation.
•Dynamic analysis.
• Monitoring.
• Tracing.
• Debugging.
• Memory analysis.
•Obfuscation.
•Instrumentation.
STATIC ANALYSIS: PROCESS
•Static analysis.
• Process of understanding the structure, behavior, and functionality of a given file without executing it.
•Static analysis steps:
• Initial assessment & reconnaissance.
• Disassembly & decompilation.
• Functions identification.
• Data analysis.
• Control flow analysis.
• String analysis.
• Patterns & signature matching.
STATIC ANALYSIS: ASSESSMENT & RECONNAISSANCE (1)
•Initial assessment & reconnaissance.
• Gather basic file information, identify file type, assess potential obfuscation.
•File identification.
• File type & format.
• Executable, library, firmware, script, archive, document.
• File architecture.
• x86, x64, ARM.
• Headers, signatures, metadata.
• Assessing obfuscation.
• Packing.
• Compressed executable (UPX, PECompact, ASPack).
• Protection.
• Protected executable (Themida, ASProtect, Enigma Protector).
• Encryption.
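Packing, protection, and encryption all tend to make bytes look random, so a common first-pass check (a heuristic not named on the slide) is the Shannon entropy of the file or of each section. A minimal Python sketch follows; the `PACKED_THRESHOLD` value of 7.2 bits/byte is an assumption rather than a standard, and ordinary compressed resources can trigger it too.

```python
import math
from collections import Counter

# Hypothetical cut-off: values near 8.0 bits/byte suggest compressed or encrypted
# content; ordinary code and text usually score noticeably lower. Tune per corpus.
PACKED_THRESHOLD = 7.2


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte (0.0 = constant, 8.0 = uniform)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over every byte value that occurs.
    return sum((count / total) * math.log2(total / count) for count in Counter(data).values())


def looks_packed(data: bytes, threshold: float = PACKED_THRESHOLD) -> bool:
    """Flag data whose entropy reaches the (assumed) packing threshold."""
    return shannon_entropy(data) >= threshold
```

In practice the entropy is usually computed per section, since one high-entropy section (for example, the compressed payload of a UPX-packed file) can be diluted by the rest of the file.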
STATIC ANALYSIS: ASSESSMENT & RECONNAISSANCE (2)
•Executable file headers.
• Each OS has its own format for executable files.
• Portable Executable (PE) – Windows.
.acm, .ax, .cpl, .dll, .drv, .efi, .exe, .mui, .ocx, .scr, .sys, .tsp, .mun
• Executable and Linkable Format (ELF) – Unix.
none, .axf, .bin, .elf, .o, .out, .prx, .puff, .ko, .mod, and .so
• Mach Object (Mach-O) – macOS.
none, .o, .dylib, .bundle
• Hierarchical file headers contain information about code location & entry point.
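Since each format begins with a distinctive magic value, file type identification can be sketched as a lookup on the first bytes. The labels below are illustrative; note that `MZ` alone only proves a DOS-style header (a real PE also carries `PE\0\0` at the offset stored in `e_lfanew`), and `CA FE BA BE` is shared by Mach-O universal binaries and Java class files.

```python
# Leading "magic" bytes for the three executable formats on the slide.
# Mach-O magics are stored in the file's own byte order, so both orders appear.
MAGICS = [
    (b"\x7fELF", "ELF"),
    (b"MZ", "PE"),  # MZ = DOS header; a real PE also has "PE\0\0" at e_lfanew
    (b"\xfe\xed\xfa\xce", "Mach-O 32-bit (big-endian)"),
    (b"\xce\xfa\xed\xfe", "Mach-O 32-bit (little-endian)"),
    (b"\xfe\xed\xfa\xcf", "Mach-O 64-bit (big-endian)"),
    (b"\xcf\xfa\xed\xfe", "Mach-O 64-bit (little-endian)"),
    (b"\xca\xfe\xba\xbe", "Mach-O universal (or Java class file)"),
]


def identify_format(header: bytes) -> str:
    """Return a format label based on the first bytes of a file."""
    for magic, label in MAGICS:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read just enough of the file to check its magic value."""
    with open(path, "rb") as f:
        return identify_format(f.read(16))
```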
STATIC ANALYSIS: ASSESSMENT & RECONNAISSANCE (3)
•Executable file headers (cont).
• Components of PE file header:
• MZ header.
• DOS stub.
• PE header.
• Location of code & data in file.
• How file will be mapped into memory.
• Address of entry point (loaded into the EIP register).
• Data directories.
• Addresses of important tables.
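• Section table.
• Describes each section (name, size, location in file & memory); used to locate the import table referenced by the data directories.
• Import table.
• Lists the imported libraries (DLLs) and the API functions used from each.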
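The PE header walk described above can be sketched with Python's `struct` module: follow `e_lfanew` (the 32-bit value at offset 0x3C of the MZ header) to the `PE\0\0` signature, read the 20-byte COFF file header, then the optional header and the section table. This is a minimal sketch that skips the data directories and performs no bounds validation; the keys of the returned dictionary are our own naming.

```python
import struct

MACHINES = {0x14C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}


def parse_pe(data: bytes) -> dict:
    """Read key PE header fields from raw file bytes (minimal, unvalidated)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)   # file offset of the PE header
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")

    coff = e_lfanew + 4                                   # COFF file header (20 bytes)
    machine, n_sections, _timestamp, _symtab, _nsyms, opt_size, _flags = \
        struct.unpack_from("<HHIIIHH", data, coff)

    opt = coff + 20                                       # optional header
    (magic,) = struct.unpack_from("<H", data, opt)        # 0x10B = PE32, 0x20B = PE32+
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    if magic == 0x10B:
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")

    sections = []
    table = opt + opt_size                                # section table follows
    for i in range(n_sections):                           # 40 bytes per entry
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from("<8sIIII", data, table + 40 * i)
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii", "replace"),
            "virtual_address": vaddr, "virtual_size": vsize,
            "raw_offset": raw_ptr, "raw_size": raw_size,
        })

    return {
        "machine": MACHINES.get(machine, hex(machine)),
        "pe32_plus": magic == 0x20B,
        "image_base": image_base,
        "entry_point_rva": entry_rva,
        "entry_point_va": image_base + entry_rva,         # where execution starts once mapped
        "sections": sections,
    }
```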
STATIC ANALYSIS: ASSESSMENT & RECONNAISSANCE (5)
•Executable file headers (cont).
• Components of ELF file header:
• ELF magic.
• ELF header.
• Essential information about file format & structure.
• Program header table (optional).
• Segments of file for execution & loading.
• Section header table.
• Individual sections of the file (code, data, symbols).
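The ELF header fields above can be read the same way. This sketch assumes a well-formed file: it uses `EI_CLASS` (byte 4) to pick the 32- or 64-bit layout and `EI_DATA` (byte 5) for endianness, and maps only a few `e_type`/`e_machine` values.

```python
import struct

ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x64", 183: "AArch64"}


def parse_elf_header(data: bytes) -> dict:
    """Decode the ELF header (first 52 or 64 bytes) of a well-formed file."""
    if data[:4] != b"\x7fELF":                      # ELF magic
        raise ValueError("missing ELF magic")
    ei_class, ei_data = data[4], data[5]             # 1 = 32-bit, 2 = 64-bit; 1 = LSB, 2 = MSB
    order = "<" if ei_data == 1 else ">"
    if ei_class == 1:
        fmt = order + "HHIIIIIHHHHHH"                # 32-bit: addresses/offsets are 4 bytes
    elif ei_class == 2:
        fmt = order + "HHIQQQIHHHHHH"                # 64-bit: addresses/offsets are 8 bytes
    else:
        raise ValueError(f"unknown EI_CLASS {ei_class}")
    (e_type, e_machine, _version, e_entry, e_phoff, e_shoff, _flags,
     _ehsize, _phentsize, e_phnum, _shentsize, e_shnum, _shstrndx) = struct.unpack_from(fmt, data, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "little_endian": ei_data == 1,
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": ELF_MACHINES.get(e_machine, hex(e_machine)),
        "entry": e_entry,
        "phoff": e_phoff, "phnum": e_phnum,          # program header table (segments)
        "shoff": e_shoff, "shnum": e_shnum,          # section header table (sections)
    }
```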
STATIC ANALYSIS: DISASSEMBLY & DECOMPILATION (1)
•Disassembly.
• Converting machine code into assembly language representation.
•Decompilation.
• Converting assembly code into high-level language representation.
•Language generations:
• First-generation.
• 0/1, hexadecimal.
• Machine code, byte code, binaries.
• Second-generation.
• Mapping between binary/hex and opcode mnemonics.
• Assembly.
• Third-generation.
• Keywords & constructs as program building blocks.
• C, Java, Python.
• Fourth-generation.
• High-level, more abstracted, more domain specific.
• SQL, SAS, MATLAB.
STATIC ANALYSIS: DISASSEMBLY & DECOMPILATION (2)
•Basic disassembly process:
• Identify region of code to disassemble.
• Distinguish instructions from data and find the entry point.
• Consult the executable file header (PE, ELF, Mach-O).
• Match binary string with opcode mnemonic.
• Determine instruction length, determine operands, understand prefixes.
• Format & output assembly instruction.
• Choose between output syntax types (Intel, AT&T).
• Advance to next instruction & repeat until no more instructions.
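The steps above can be illustrated with a toy linear-sweep disassembler. It is a sketch, assuming 32-bit x86 and recognizing only a handful of opcodes (no prefixes, no ModRM decoding); everything else is emitted as a `db` data byte, and output uses Intel operand order.

```python
import struct

# Register order used by the one-byte "+r" opcode encodings (e.g., 0x50 + r = push r32).
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code: bytes, off: int):
    """Decode one instruction at `off` from a tiny subset of 32-bit x86.

    Returns (length, mnemonic, operands); unknown bytes are emitted as `db`.
    """
    op = code[off]
    if op == 0x90:
        return 1, "nop", []
    if op == 0xC3:
        return 1, "ret", []
    if 0x50 <= op <= 0x57:
        return 1, "push", [REGS[op - 0x50]]
    if 0x58 <= op <= 0x5F:
        return 1, "pop", [REGS[op - 0x58]]
    if 0xB8 <= op <= 0xBF and off + 5 <= len(code):
        (imm,) = struct.unpack_from("<I", code, off + 1)  # little-endian imm32
        return 5, "mov", [REGS[op - 0xB8], hex(imm)]
    if op == 0xE8 and off + 5 <= len(code):
        (rel,) = struct.unpack_from("<i", code, off + 1)  # signed rel32
        return 5, "call", [hex(off + 5 + rel)]           # relative to next instruction
    if op == 0xEB and off + 2 <= len(code):
        (rel,) = struct.unpack_from("<b", code, off + 1)  # signed rel8
        return 2, "jmp", [hex(off + 2 + rel)]
    return 1, "db", [hex(op)]


def linear_sweep(code: bytes):
    """Disassemble from the first byte to the end, one instruction after another."""
    listing, off = [], 0
    while off < len(code):
        length, mnemonic, operands = decode(code, off)
        text = f"{mnemonic} {', '.join(operands)}".strip()  # Intel-style operand order
        listing.append((off, code[off:off + length].hex(), text))
        off += length  # advance to the next instruction
    return listing


if __name__ == "__main__":
    for off, raw, text in linear_sweep(bytes.fromhex("55b8010000005dc3")):
        print(f"{off:04x}: {raw:<12} {text}")
```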
STATIC ANALYSIS: DISASSEMBLY & DECOMPILATION (3)
•Disassembly algorithms:
• Linear sweep.
• Begins with first byte in code region, disassembling one instruction at a time.
• Advantage: complete coverage.
• Disadvantage: does not account for data commingled with code.
• Recursive descent.
• Based on control flow: an instruction is disassembled only if it is referenced by another instruction.
• Instructions are classified by their effect on EIP:
• Sequential instructions.
• Conditional & unconditional branching.
• Function calls & returns.
• Advantage: distinguishes code from data.
• Disadvantage: inability to follow indirect code paths.
• Jumps/calls through tables of pointers used to look up target addresses.
• Mitigated with heuristics.
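A recursive-descent sketch over a similar toy opcode subset: it starts at an entry offset, disassembles an instruction, and queues only the offsets that control can reach next. The decoder and function names are hypothetical; the `jmp eax` case shows the indirect-path limitation, since its target cannot be known statically.

```python
import struct

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code: bytes, off: int):
    """Toy decoder for a few 32-bit x86 opcodes.

    Returns (length, text, successors) or None for bytes it does not recognize.
    Successors are the offsets control can flow to next.
    """
    op = code[off]
    if op == 0x90:
        return 1, "nop", [off + 1]
    if op == 0xC3:
        return 1, "ret", []                                  # return: no static successor
    if 0xB8 <= op <= 0xBF and off + 5 <= len(code):
        (imm,) = struct.unpack_from("<I", code, off + 1)
        return 5, f"mov {REGS[op - 0xB8]}, {imm:#x}", [off + 5]
    if op == 0xE8 and off + 5 <= len(code):
        (rel,) = struct.unpack_from("<i", code, off + 1)
        target = off + 5 + rel
        return 5, f"call {target:#x}", [off + 5, target]     # fall-through + callee
    if op in (0xEB, 0x74) and off + 2 <= len(code):
        (rel,) = struct.unpack_from("<b", code, off + 1)
        target = off + 2 + rel
        if op == 0xEB:
            return 2, f"jmp {target:#x}", [target]           # unconditional branch
        return 2, f"je {target:#x}", [off + 2, target]       # conditional: both paths
    if op == 0xFF and off + 2 <= len(code) and code[off + 1] == 0xE0:
        return 2, "jmp eax", []                              # indirect: target unknown
    return None


def recursive_descent(code: bytes, entry: int = 0) -> dict:
    """Disassemble only what is reachable from `entry` by following control flow."""
    listing, worklist = {}, [entry]
    while worklist:
        off = worklist.pop()
        if off in listing or not 0 <= off < len(code):
            continue                                         # visited or out of range
        insn = decode(code, off)
        if insn is None:
            continue                                         # unknown byte: stop this path
        _length, text, successors = insn
        listing[off] = text
        worklist.extend(successors)
    return dict(sorted(listing.items()))
```

For the bytes `b8 01 00 00 00 eb 04 de ad be ef c3`, the four bytes after the `jmp` are never visited, whereas a linear sweep would try to decode them as instructions.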
STATIC ANALYSIS: DISASSEMBLY & DECOMPILATION (4)
•Decompilation challenges:
• Compilation process is lossy.
• No variable/function names, no data types/structures in machine language.
• Compilation is a many-to-many operation.
• Compiling and then decompiling a file does not reproduce the original source.
• Compilation is language & library dependent.
• Appropriate decompiler must be used.
• Must be based on high quality disassembly outputs.
• Errors and omissions in disassembly propagate to decompilation.
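Lossiness is easy to observe even with a bytecode compiler. As an analogy only (Python bytecode, not native machine code), this sketch compiles two differently written sources and compares the resulting instructions: comments, hex versus decimal spelling of a literal, and redundant parentheses leave no trace, so no decompiler could tell which form was written.

```python
import dis


def bytecode_view(source: str):
    """Compile `source` and return its instructions as (opname, argval) pairs."""
    code = compile(source, "<example>", "exec")
    return [(ins.opname, ins.argval) for ins in dis.get_instructions(code)]


# Two different "sources" that differ only in details compilation discards.
original = "limit = 0x10  # sixteen bytes, written in hex"
rewritten = "limit = (16)"
print(bytecode_view(original) == bytecode_view(rewritten))  # True
```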
STATIC ANALYSIS: FUNCTIONS IDENTIFICATION
•Functions identification.
• Locate function boundaries.
• Identify starting/ending points of functions within the code.
• Analyze function properties.
• Examine function names, parameters, local variables, interactions with other functions.
• Create call graph.
• Visualize function call relationships to understand code flow & structure.
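A simple static heuristic for function boundaries is to scan for a standard prologue, then treat each direct `call rel32` (opcode `E8`) whose target lands on a known start as a call-graph edge. This sketch assumes 32-bit x86 code that keeps frame pointers and assumes each function ends where the next begins; optimized code often omits the prologue, and a raw byte scan can hit `E8` bytes inside other instructions, so real tools combine this with proper disassembly.

```python
import struct

# Frame-setup prologue "push ebp; mov ebp, esp". Both 89 E5 and 8B EC are valid
# encodings of "mov ebp, esp"; different compilers/assemblers favor different ones.
PROLOGUES = (b"\x55\x89\xe5", b"\x55\x8b\xec")


def find_function_starts(code: bytes) -> list:
    """Heuristically locate function starts by scanning for prologue byte patterns."""
    starts = set()
    for pattern in PROLOGUES:
        pos = code.find(pattern)
        while pos != -1:
            starts.add(pos)
            pos = code.find(pattern, pos + 1)
    return sorted(starts)


def build_call_graph(code: bytes) -> dict:
    """Map each function start to the functions it calls via E8 rel32.

    Only counts E8 bytes whose computed target is a known function start,
    which filters out much (not all) of the noise from data bytes.
    """
    starts = find_function_starts(code)
    graph = {start: [] for start in starts}
    ends = starts[1:] + [len(code)]
    for start, end in zip(starts, ends):
        for i in range(start, end - 4):          # need 5 bytes: E8 + rel32
            if code[i] != 0xE8:
                continue
            (rel,) = struct.unpack_from("<i", code, i + 1)
            target = i + 5 + rel                 # relative to the next instruction
            if target in graph and target not in graph[start]:
                graph[start].append(target)
    return graph
```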
STATIC ANALYSIS: DATA ANALYSIS
•Data analysis.
• Identify data types & structures.
• Recognize variables, data structures, and their relationships within the code.
• Track data flow.
• Follow how data is used & modified throughout the code.
• Understand purpose & potential vulnerabilities.
• Analyze cross-references.
• Find references to a particular variable or data structure.
• Grasp its impact on the code's behavior.
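In 32-bit x86 code, cross-references to a global variable often appear as the variable's absolute address embedded little-endian in an instruction. A crude sketch of that search follows, assuming 32-bit absolute addressing (64-bit code usually uses RIP-relative offsets, which this does not catch); it reports where the address bytes sit, not the start of the referencing instruction.

```python
import struct


def find_xrefs(code: bytes, target_va: int, code_va: int = 0) -> list:
    """Return addresses in `code` where `target_va` appears as a little-endian
    32-bit absolute address (e.g., an operand of mov/push)."""
    needle = struct.pack("<I", target_va)
    hits, pos = [], code.find(needle)
    while pos != -1:
        hits.append(code_va + pos)               # address of the embedded operand
        pos = code.find(needle, pos + 1)
    return hits
```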
STATIC ANALYSIS: CONTROL FLOW ANALYSIS
•Control flow analysis.
• Trace code execution paths.
• Map the possible execution paths through the code.
• Conditional branches & loops.
• Identify key decision points.
• Understand how decisions are made within the code and their potential implications.
• Discover hidden code.
• Uncover code that might be reachable only under certain conditions or through indirect calls.
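Once a control flow graph exists, these analyses reduce to graph traversal. This sketch assumes a CFG given as a hypothetical dict mapping basic-block names to successors: a breadth-first search from the entry finds reachable blocks, blocks with more than one successor are decision points, and blocks never reached are candidates for hidden code (for example, targets of indirect calls that the static edges miss).

```python
from collections import deque


def analyze_cfg(cfg: dict, entry: str):
    """Return (reachable blocks, decision points, unreached blocks) for a CFG.

    `cfg` maps each basic block to the list of blocks it can transfer control to.
    """
    reachable, queue = {entry}, deque([entry])
    while queue:                                  # breadth-first walk of execution paths
        block = queue.popleft()
        for succ in cfg.get(block, []):
            if succ not in reachable:
                reachable.add(succ)
                queue.append(succ)
    decisions = sorted(b for b in reachable if len(cfg.get(b, [])) > 1)   # branch points
    unreached = sorted(set(cfg) - reachable)      # no static path from entry
    return reachable, decisions, unreached
```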
STATIC ANALYSIS: STRING ANALYSIS
•String analysis.
• Extract string literals.
• Isolate human-readable strings within the code.
• Provide clues about functionality or purpose.
• Search for keywords.
• Look for specific strings that might indicate functionality.
• File paths, URLs, API calls, sensitive data.
• Identify encoding & obfuscation.
• Recognize techniques used to conceal strings.
• Encryption, encoding.
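String extraction can be sketched in the spirit of the Unix `strings` tool: find runs of printable ASCII, plus the same characters encoded as UTF-16LE (common in Windows binaries), then tag them by keyword. The four-character minimum and the `KEYWORDS` patterns are illustrative assumptions; encoded or encrypted strings will not surface this way.

```python
import re

# Illustrative keyword patterns; a real list would be much longer.
KEYWORDS = {
    "url": re.compile(r"https?://\S+", re.IGNORECASE),
    "windows_path": re.compile(r"[A-Za-z]:\\"),
    "registry": re.compile(r"HKEY_|Software\\", re.IGNORECASE),
    "api": re.compile(r"CreateProcess|VirtualAlloc|WriteProcessMemory|InternetOpen"),
}


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Printable ASCII runs, then UTF-16LE runs, of at least `min_len` characters."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)   # char + zero byte
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_re.finditer(data)]
    return found


def tag_strings(strings: list) -> dict:
    """Map each string to the keyword categories it matches (possibly none)."""
    return {s: [name for name, rx in KEYWORDS.items() if rx.search(s)] for s in strings}
```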
STATIC ANALYSIS: PATTERNS & SIGNATURE MATCHING
•Patterns & signature matching.
• Search for known patterns.
• Search for known code patterns or signatures that indicate specific functionality or vulnerabilities.
• Identify libraries & frameworks.
• Recognize usage of common libraries or frameworks.
• Understand the code's context & potential attack vectors.
• Detect malware traits.
• Look for patterns known to be associated with malware or malicious code.
• Create a file hash.
• Hash digest of a file can be used to identify it in a malware database.
• Similarity (fuzzy) hashes can help identify different versions of the same malware; cryptographic hashes only match identical files.
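Hashing and byte-signature matching each fit in a few lines of standard-library Python: cryptographic digests for lookups in malware databases, and hex signatures with `??` wildcards (the notation YARA hex strings use). This is a hedged sketch, not a YARA implementation; `compile_signature` simply turns the pattern into a bytes regular expression.

```python
import hashlib
import re


def file_hashes(data: bytes) -> dict:
    """Digests commonly used to look samples up in malware databases."""
    return {name: hashlib.new(name, data).hexdigest() for name in ("md5", "sha1", "sha256")}


def compile_signature(signature: str):
    """Turn a hex signature such as "55 8B EC ?? ??" into a bytes regex; ?? matches any byte."""
    parts = [b"." if tok == "??" else re.escape(bytes.fromhex(tok)) for tok in signature.split()]
    return re.compile(b"".join(parts), re.DOTALL)     # DOTALL so "." also matches 0x0A


def find_signature(data: bytes, signature: str) -> list:
    """Offsets where the signature matches (non-overlapping)."""
    return [m.start() for m in compile_signature(signature).finditer(data)]
```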
DYNAMIC ANALYSIS: PROCESS
•Dynamic analysis.
• Process of understanding the runtime behavior of the program.
• Helps understand how code operates & interacts with its environment.
•Dynamic analysis steps:
• Environment setup & monitoring.
• Debugging, tracing & instrumentation.
• Memory analysis.
• Additional considerations.
DYNAMIC ANALYSIS: ENVIRONMENT SETUP & MONITORING
•Environment setup.
• VMs, sandboxes, emulators.
• Dynamic analysis must be carried out in a safe / controlled environment.
•Monitoring.
• Runtime behavior is carefully monitored.
• Process monitors (system activity), network analyzers (network activity), debuggers, memory analysis tools.
• Data collection:
• System interactions (filesystem, registry, network).
• Memory usage.
• Function & API calls.
• Execution flow.
DYNAMIC ANALYSIS: DEBUGGING & TRACING
•Debugging & tracing.
• Debuggers allow monitoring execution by setting breakpoints & stepping through the code.
• Pausing execution at specific points of interest.
• Examining memory, registers, variables, stacks.
• Understand program's state at specific point.
• Tracing code execution path instruction by instruction.
• Kernel-level tracing can additionally monitor system activities.
•Dynamic instrumentation.
• Allows tracking function calls & memory access.
• Code injection & hooking into running processes.
DYNAMIC ANALYSIS: MEMORY ANALYSIS (1)
•Memory analysis.
• Capture & analyze memory dumps during execution.
• Understand memory usage, potential vulnerabilities, encrypted/obfuscated data.
•Virtual Address Space (VAS).
• Each process runs in its own VAS.
• Abstraction of physical memory.
• Each VAS is mapped to physical memory.
• Managed by OS kernel.
• Uses pages and page tables.
• VAS is divided into user space & kernel space.
• User privileges (ring 3) & kernel privileges (ring 0).
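Address translation is plain bit arithmetic. This sketch assumes 4 KiB pages: `split_va_x86_32` shows the classic 32-bit non-PAE x86 split into directory index, table index, and offset; `translate` uses a hypothetical single-level page table dict; and `is_user_space` assumes the default 32-bit Windows 2 GB/2 GB user/kernel split.

```python
PAGE_SIZE = 4096  # 4 KiB pages (assumption; other page sizes exist)


def split_va_x86_32(va: int):
    """Split a 32-bit VA for classic two-level (non-PAE) x86 paging:
    10-bit page-directory index | 10-bit page-table index | 12-bit offset."""
    return (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF


def translate(va: int, page_table: dict) -> int:
    """Toy single-level translation: page_table maps virtual page -> physical frame."""
    page, offset = divmod(va, PAGE_SIZE)
    if page not in page_table:
        raise LookupError(f"page fault: no mapping for {va:#x}")
    return page_table[page] * PAGE_SIZE + offset   # offset within the page is unchanged


def is_user_space(va: int, split: int = 0x8000_0000) -> bool:
    """Default 32-bit Windows layout: lower 2 GB user space (ring 3), upper 2 GB kernel (ring 0)."""
    return va < split
```

For example, VA `0x00401234` splits into directory index 1, table index 1, and offset `0x234`; if virtual page `0x401` maps to frame `0x1A2`, the physical address is `0x1A2234`.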
DYNAMIC ANALYSIS: MEMORY ANALYSIS (2)
•Virtual Address Space (VAS) (cont.)
• User space is initially allocated for:
• Stack.
• Heap.
• Program.
• Dynamic libraries.
• Thread Environment Block (TEB).
• Process Environment Block (PEB).
• Further allocations triggered by malloc or VirtualAlloc.
DYNAMIC ANALYSIS: ADDITIONAL STEPS
•Additional steps.
• Post-execution differences.
• Comparing differences between snapshots taken before / after running executable.
• Monitoring system changes.
• File monitoring.
• Created, modified, or deleted files and directories.
• Registry monitoring.
• Created, updated, or deleted registry keys, values, data.
• Analysis of runtime artifacts.
• Generated logs, crash reports, any artifacts created during runtime.
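Post-execution differencing for the file system can be sketched by hashing every file under a directory before and after a run, then comparing the two maps. The function names are hypothetical; registry differencing on Windows would follow the same pattern with keys and values instead of files.

```python
import hashlib
import os


def snapshot(root: str) -> dict:
    """Map every file under `root` (relative path) to the SHA-256 of its contents."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                state[os.path.relpath(path, root)] = hashlib.sha256(f.read()).hexdigest()
    return state


def diff_snapshots(before: dict, after: dict) -> dict:
    """Classify paths as created, deleted, or modified between two snapshots."""
    return {
        "created": sorted(after.keys() - before.keys()),
        "deleted": sorted(before.keys() - after.keys()),
        "modified": sorted(p for p in before.keys() & after.keys() if before[p] != after[p]),
    }
```

Typical use inside the analysis VM: `before = snapshot(path)`, run the sample, `after = snapshot(path)`, then inspect `diff_snapshots(before, after)`.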
OBFUSCATION: INTRO
•Obfuscation.
• Deliberately making code difficult to understand, hindering reverse engineering analysis.
• Anti-reverse engineering.
•Obfuscation goals:
• Evasion.
• Bypassing signature-based detection.
• Analysis delay.
• Deciphering takes time & expertise.
• Anti-tampering.
• Harder to reverse-engineer & modify.
• Confidentiality.
• Harder to trace origins & link back.
•Obfuscation impacts:
• Increased damage.
• Longer operation time = more harm.
• Slower response times.
• Longer to develop effective countermeasures.
• Higher vulnerability.
• Easier to bypass security = easier to attack.
OBFUSCATION: CATEGORIES
•Obfuscation categories:
• Anti-static analysis techniques.
• Disassembly desynchronization.
• Target addresses obfuscation.
• Control flow obfuscation.
• Opcode obfuscation.
• Anti-dynamic analysis techniques.
• Detecting virtualization.
• Anti-sandboxing.
• Detecting instrumentation.
• Anti-monitoring, anti-tracing.
• Detecting/preventing debugging.
OBFUSCATION: ANTI-STATIC TECHNIQUES (1)
•Disassembly desynchronization.
• Misaligning instructions.
• Inserting seemingly harmless data bytes within the code.
• Utilizing conditional jumps that always/never execute.
• Employing indirect jumps that target instructions unconventionally.
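The effect of an always-taken jump over a junk byte can be shown with a toy decoder. This sketch assumes 32-bit x86 and handles only four opcodes (a real disassembler would decode the stray `00` as part of an `add`, not as data), but the misalignment is the same: a linear sweep decodes the junk `E8` as a `call` and swallows the real `mov`.

```python
import struct


def decode(code: bytes, off: int):
    """Toy decoder for four 32-bit x86 opcodes: returns (length, text, jump_target)."""
    op = code[off]
    if op == 0xEB and off + 2 <= len(code):              # jmp rel8
        (rel,) = struct.unpack_from("<b", code, off + 1)
        return 2, f"jmp {off + 2 + rel:#x}", off + 2 + rel
    if op == 0xE8 and off + 5 <= len(code):              # call rel32
        (rel,) = struct.unpack_from("<i", code, off + 1)
        return 5, f"call {off + 5 + rel:#x}", None
    if op == 0xB8 and off + 5 <= len(code):              # mov eax, imm32
        (imm,) = struct.unpack_from("<I", code, off + 1)
        return 5, f"mov eax, {imm:#x}", None
    if op == 0xC3:
        return 1, "ret", None
    return 1, f"db {op:#04x}", None                      # unrecognized: treat as data


def linear_sweep(code: bytes):
    """Decode back-to-back from offset 0, ignoring control flow."""
    out, off = [], 0
    while off < len(code):
        length, text, _ = decode(code, off)
        out.append((off, text))
        off += length
    return out


def follow_flow(code: bytes, off: int = 0, max_steps: int = 100):
    """Follow the path that actually executes (taking jumps; calls fall through)."""
    out = []
    for _ in range(max_steps):                           # guard against jump loops
        if not 0 <= off < len(code):
            break
        length, text, target = decode(code, off)
        out.append((off, text))
        if text == "ret":
            break
        off = target if target is not None else off + length
    return out
```

For the bytes `eb 01 e8 b8 01 00 00 00 c3`, `linear_sweep` reports `jmp 0x3`, `call 0x1bf`, `db 0x00`, `ret`, while `follow_flow` reports what actually executes: `jmp 0x3`, `mov eax, 0x1`, `ret`.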
•Target addresses obfuscation.
• Dynamically computed target addresses (DCTA).
• Addresses are computed during execution.
• Using complex math formulas/algorithms.
• Combining values from variables, registers, memory.
• Employing operations that make prediction difficult.
OBFUSCATION: ANTI-STATIC TECHNIQUES (2)
•Control flow obfuscation.
• Flattening.
• Nested conditionals/loops are replaced by a single loop whose switch (dispatcher) selects among a large number of blocks.
• Jump table obfuscation.
• Jump table references different parts of the program; target addresses are calculated from data or during code execution.
• Dead code injection.
• Harmless code snippets that never execute are injected into the code.
• Unstructured control flow.
• Using unconventional jumps/branches.
• Bit manipulation to encode target addresses within variables or instructions.
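Flattening, an always-true condition, and dead code can be illustrated at source level. In this sketch (Python standing in for compiled code), `gcd_flattened` computes the same result as `gcd_structured`, but the loop becomes a dispatcher over a `state` variable, and an opaque predicate guards a block that never runs. Real obfuscators apply this to machine code and encode the state values far less transparently.

```python
def gcd_structured(a: int, b: int) -> int:
    """Original code: an ordinary loop."""
    while b != 0:
        a, b = b, a % b
    return a


def gcd_flattened(a: int, b: int) -> int:
    """Same computation after flattening: one loop dispatching on `state`."""
    state = 0
    while True:
        if state == 0:                    # block: loop test
            state = 1 if b != 0 else 2
        elif state == 1:                  # block: loop body
            if (a * a) % 2 == a % 2:      # opaque predicate: true for every integer
                a, b = b, a % b
            else:                         # dead code: never executes
                a, b = b + 1, a
            state = 0
        else:                             # block: exit
            return a
```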
OBFUSCATION: ANTI-STATIC TECHNIQUES (3)
•Opcode obfuscation.
• Encode/encrypt actual instructions when the executable is created.
• Instructions must be de-obfuscated before they are executed.
• A portion of the code stays unencrypted: the startup routine responsible for de-obfuscation.
• Original code fed to obfuscator utility.
• Obfuscates original code/data sections.
• Adds deobfuscation stub.
• Deobfuscates code/data before runtime.
• Transfers control to the original entry point.
• Modifies header to redirect entry point to deobfuscation stub.
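The obfuscator workflow above can be modeled with a hypothetical `ToyImage` holding named sections and an entry point. The sketch uses repeating-key XOR as the encoding (a stand-in for whatever scheme a real packer or protector uses) and keeps the key in metadata for simplicity; `run_stub` plays the role of the deobfuscation stub that restores the sections and hands control to the original entry point.

```python
from dataclasses import dataclass, field


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR: applying it twice with the same key restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


@dataclass
class ToyImage:
    """Hypothetical stand-in for an executable: named sections plus an entry point."""
    sections: dict
    entry_point: str
    metadata: dict = field(default_factory=dict)


def obfuscate(image: ToyImage, key: bytes) -> ToyImage:
    """Obfuscator utility: encode sections, add a stub, redirect the entry point."""
    encoded = {name: xor_bytes(data, key) for name, data in image.sections.items()}
    encoded["stub"] = b"<deobfuscation stub>"       # placeholder; stays unencoded
    return ToyImage(encoded, entry_point="stub",     # header now points at the stub
                    metadata={"key": key, "original_entry": image.entry_point})


def run_stub(image: ToyImage) -> ToyImage:
    """What the stub does at startup: decode sections, then hand control to the OEP."""
    key = image.metadata["key"]
    decoded = {n: xor_bytes(d, key) for n, d in image.sections.items() if n != "stub"}
    return ToyImage(decoded, entry_point=image.metadata["original_entry"])
```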
OBFUSCATION: ANTI-DYNAMIC TECHNIQUES (1)
•Detecting virtualization.
• Virtualization-specific software.
• Detecting hypervisors.
• Virtualization-specific hardware.
• Detecting virtualized hardware.
• Process-specific behavioral changes.
• Instructions behave differently in native vs virtualized environment.
•Detecting instrumentation.
• Check for running monitoring/tracing tools.
• Shut down if present.
OBFUSCATION: ANTI-DYNAMIC TECHNIQUES (2)
•Detecting/preventing debugging.
• Query OS for running debugger.
• Through API calls.
• Check memory/processor artifacts for debugger presence.
• Processor debug flag set to 1.
• Hinder debugging process.
• Introducing spurious breakpoints.
• Clearing hardware breakpoints.
• Intentionally generating exceptions.
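One way a Linux program can query for a ptrace-based debugger is the `TracerPid` field of `/proc/self/status`, which is non-zero while the process is traced; on Windows the counterparts are API calls such as `IsDebuggerPresent`. This sketch parses that field and, as a Python-level analogue, also checks `sys.gettrace()`, which debuggers such as `pdb` set. It is illustrative only and easy to defeat.

```python
import os
import sys


def tracer_pid_from_status(status_text: str) -> int:
    """Parse TracerPid from Linux /proc/<pid>/status text (0 = not traced)."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split(":", 1)[1].strip())
    return 0


def debugger_suspected() -> bool:
    # Linux: a ptrace-based debugger (e.g., gdb) shows up as a non-zero TracerPid.
    status = "/proc/self/status"
    if os.path.exists(status):
        with open(status) as f:
            if tracer_pid_from_status(f.read()) != 0:
                return True
    # Python-level analogue: pdb and similar tools install a trace function.
    return sys.gettrace() is not None
```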
INSTRUMENTATION: INTRO
•Instrumentation.
• Process of inserting code/probes into a program & collecting valuable runtime data.
•Key instrumentation techniques:
• Dynamic binary instrumentation (DBI).
• Inserting custom code snippets (instrumentation) into a program during runtime for monitoring/modification.
• Code injection.
• Tracing.
• Capturing/recording events/data during execution, providing valuable insights into behavior & interactions.
• System calls, handles, library calls.
INSTRUMENTATION: DBI
•Dynamic binary instrumentation (DBI).
• Capabilities:
• Log arguments & return values.
• What data enters/exits function.
• Modify specific instructions.
• Temporarily (during runtime) change behavior.
• Trigger custom actions.
• Custom code executed whenever function is called.
• Process:
• Load target program.
• Identify instrumentation points.
• Inject probes.
• Execute instrumented program.
• Collect & analyze data.
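DBI frameworks (for example Intel Pin, DynamoRIO, or Frida) inject probes into native code; the same capabilities can be sketched at Python level by wrapping a function at runtime. The hypothetical `instrument` helper below logs arguments and return values, triggers an optional custom action on every call, can modify the result, and returns an undo callback that removes the probe.

```python
import functools


def instrument(obj, name, on_call=None, override=None, log=None):
    """Replace obj.<name> with a probe; return (collected log, undo function)."""
    original = getattr(obj, name)
    log = log if log is not None else []

    @functools.wraps(original)
    def probe(*args, **kwargs):
        if on_call is not None:
            on_call(args, kwargs)                 # custom action on every call
        result = original(*args, **kwargs)
        if override is not None:
            result = override(result)             # temporarily change behavior
        log.append((name, args, kwargs, result))  # what entered/exited the function
        return result

    setattr(obj, name, probe)                     # inject the probe
    return log, (lambda: setattr(obj, name, original))
```

For example, overriding a license check to always return `True` while logging each call mirrors how a probe can change behavior during runtime without modifying the program on disk.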
INSTRUMENTATION: TRACING
•Tracing.
• Classified by the type of event/resource being traced:
• Function calls tracing.
• Identifies when program calls specific functions.
• Reveals execution flow & dependencies.
• System calls tracing.
• Identifies system calls made by program.
• Reveals interactions with the OS, file system, network.
• Library & API calls tracing.
• Identifies calls to shared libraries & API calls.
• Reveals how program utilizes external data structures, functions, services.
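Function-call tracing can be sketched with Python's `sys.setprofile` hook, which reports each Python function call and return; for native programs, system-call and library-call tracing on Linux is typically done with tools such as `strace` and `ltrace`. The `trace_calls` helper is hypothetical and records only Python-level calls (C-level `c_call` events are ignored).

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func and record every Python-level function call and return, in order."""
    events = []

    def profiler(frame, event, arg):
        if event in ("call", "return"):
            events.append((event, frame.f_code.co_name))

    sys.setprofile(profiler)          # start tracing
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)          # always stop tracing, even on exceptions
    return result, events
```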