
Lovely Professional University: Phagwara Punjab

This document discusses a disassembler, which is a program that translates machine language into assembly language. It describes the code analysis, design, implementation, and tools used for building disassemblers. The implementation has two main phases - the first pass identifies basic blocks of code, and the second phase generates assembly language instructions from the binary code through symbol generation and instruction matching. Commercial and open source disassembler tools are also listed.

Uploaded by

ashu12318
Copyright
© Attribution Non-Commercial (BY-NC)

LOVELY PROFESSIONAL UNIVERSITY

PHAGWARA
PUNJAB

TERM PAPER

SYSTEM SOFTWARE

CSE-318

TOPIC

DISASSEMBLER

Submitted To: Miss. Aparajita

Submitted By: Ashutosh Singh
RB1805A10
10806909

INDEX
• INTRODUCTION
• CODE ANALYSIS
• DESIGN OF DISASSEMBLER
1. SYMBOL TABLE
2. INSTRUCTION SET DECODER
• IMPLEMENTATION OF DISASSEMBLER
1. FIRST PASS OF DISASSEMBLY
2. SECOND PHASE OF DISASSEMBLY
• TOOLS FOR BUILDING DISASSEMBLERS

ABSTRACT
This project addresses what a disassembler is, how it works, its code analysis, the databases it
uses, its design and implementation, the passes of disassembly, the tools used to build a
disassembler, and how binary code is disassembled into symbolic instructions. The purpose of this
work is to create an object-oriented design of the program and to implement that design.

INTRODUCTION
Definition

A disassembler is a computer program that translates machine language into assembly language
—the inverse operation to that of an assembler. Disassembly, the output of a disassembler, is
often formatted for human-readability rather than suitability for input to an assembler, making it
principally a reverse-engineering tool.

The greatest problem in disassembly is determining what is code (instructions) and what is
data, as both are represented in the same way in current machines. Further, separating code from
data in an arbitrary program is in general as hard as the Halting Problem, so disassembly cannot
be fully automated for all input programs.

Compiling a program written in a programming language yields binary code for a processor.
Sometimes it is binary code to be processed by a virtual processor (e.g. the Java Virtual
Machine). Reverse program translation is the translation of binary code into a form readable by a
human. The human-readable format can be a symbolic instruction language (the lowest level of
reverse program translation) or some kind of programming language (the highest level of reverse
translation). A program that performs reverse translation of binary code into a symbolic
instruction language is called a disassembler.

CODE ANALYSIS
A compiled program is saved in an executable file. There are several different executable file
formats, and some of them are usable only on particular operating systems. In general, every
executable file contains several sections: some contain instructions, some contain data, constant
data, etc. A disassembler must distinguish two types of sections. The first type is a data
section; the second is an executable section, which contains instructions for the processor. A
data section is disassembled into a simple dump of its content, in either hexadecimal text format
or binary format.
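
The hexadecimal output of a data section described above can be sketched as follows. This is only a minimal illustration: the function name hex_dump, the 16-bytes-per-line layout and the ASCII column are assumptions made for the example, not part of any particular disassembler.

```python
def hex_dump(data: bytes, base_address: int = 0, width: int = 16) -> list:
    """Render raw section bytes as lines of 'address: hex bytes  |ascii|'."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Two hex digits per byte, separated by spaces.
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is, everything else as a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short last line.
        padded = hex_part.ljust(width * 3 - 1)
        lines.append(f"{base_address + offset:08x}: {padded}  |{ascii_part}|")
    return lines


if __name__ == "__main__":
    # Hypothetical data section content loaded at address 0x1000.
    for line in hex_dump(b"Hello, world\x00\x01\x02\x03\x04", base_address=0x1000):
        print(line)
```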

DESIGN OF DISASSEMBLER
A disassembler consists of three main parts. The first part provides access to the sections of
the input file, the second part is a symbol table, and the last one decodes instructions
according to the instruction sets.

1) SYMBOL TABLE
The symbol table associates addresses in the input file with the names by which they are
referred to (for example, function entry points), so that the disassembler can output symbolic
names instead of raw addresses.

2) INSTRUCTION SET DECODER


Reverse translation requires a description of the instruction set used to decode instructions.
This set is composed of many tables and decoding variables. The decoding process starts by
decoding the first byte of the executable section. Each byte of the section is used as an index
into a table, which yields a decoding expression. This expression may contain references to other
tables indexed by the following bytes and references to the values of the decoding variables. The
instruction set decoder processes this expression according to the following bytes of the
executable section, and the result is a symbolic instruction.
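
The table-driven idea can be illustrated with a much simplified sketch. The opcode values, mnemonics and the single-table layout below are a made-up toy instruction set, assumed only for this example; the decoder described above uses several tables that reference each other, which this sketch does not attempt to reproduce.

```python
# Hypothetical toy instruction set: opcode byte -> (text template, operand size in bytes).
OPCODE_TABLE = {
    0x00: ("nop", 0),
    0x01: ("load r0, {imm}", 1),
    0x02: ("jmp {imm}", 2),
    0xFF: ("halt", 0),
}


def decode_one(code: bytes, offset: int) -> tuple:
    """Decode the instruction at offset and return (text, length in bytes)."""
    opcode = code[offset]
    entry = OPCODE_TABLE.get(opcode)
    if entry is None:
        # Unknown byte: emit it as raw data so decoding can continue.
        return f"db 0x{opcode:02x}", 1
    template, operand_size = entry
    operand_bytes = code[offset + 1:offset + 1 + operand_size]
    if len(operand_bytes) < operand_size:
        # Operand runs past the end of the section: treat the opcode as data.
        return f"db 0x{opcode:02x}", 1
    # The following bytes fill in the placeholder of the template (little-endian assumed).
    imm = int.from_bytes(operand_bytes, "little")
    return template.format(imm=f"0x{imm:x}"), 1 + operand_size


def decode_section(code: bytes) -> list:
    """Decode a whole executable section from its first byte onwards."""
    result, offset = [], 0
    while offset < len(code):
        text, length = decode_one(code, offset)
        result.append(text)
        offset += length
    return result
```

Keeping the instruction set in a table rather than in code means that a different processor can, in principle, be supported by swapping the table.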

INSTRUCTION SET DESCRIPTION LANGUAGE

Instruction sets are described in text files. The instruction set language is defined by the
following rewriting rules. ident is an identifier as defined in the C language. char is a list of
characters other than the comma, quotation marks and these control characters: $, {, }, (, ).
number is a number in decimal or hexadecimal format, and letter is one letter of the English
alphabet (a–z).

MAIN -> VARIABLE MAIN
MAIN -> TABLE MAIN
MAIN -> $
VARIABLE -> ident = “ char ” ;
TABLE -> = { TABLEITEMS }
TABLEITEMS -> TABLEITEM ; TABLEITEMS
TABLEITEMS -> e
TABLEITEM -> number , number , TEXT , LENGTH
TEXT -> “ ITEMTEXT ”
ITEMTEXT -> $ { ITEMTEXT } ITEMTEXT
ITEMTEXT -> char ITEMTEXT
ITEMTEXT -> $ ( ITEMTEXT , INDEX ) ITEMTEXT
ITEMTEXT -> e
LENGTH -> number LENGTHS
LENGTH -> letter LENGTHS
LENGTHS -> + LENGTH
LENGTHS -> e

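
As an illustration of how such rules can be processed, the following sketch parses only the LENGTH and LENGTHS rules. It reads e as the empty string and assumes that hexadecimal numbers carry a 0x prefix; both are assumptions of this example, since the rules above do not spell them out. The function name parse_length is likewise hypothetical.

```python
import re

# One token: a hexadecimal number (0x prefix assumed), a decimal number,
# a single lowercase letter, or a plus sign. Leading whitespace is skipped.
TOKEN = re.compile(r"\s*(0[xX][0-9a-fA-F]+|\d+|[a-z]|\+)")


def parse_length(text: str) -> list:
    """Parse LENGTH -> (number | letter) LENGTHS, LENGTHS -> + LENGTH | e.

    Returns the terms in order: numbers as int, letters as str.
    """
    text = text.rstrip()
    terms = []
    pos = 0
    while True:
        match = TOKEN.match(text, pos)
        if match is None or match.group(1) == "+":
            raise ValueError(f"expected number or letter at position {pos}")
        token = match.group(1)
        if token.isalpha():
            terms.append(token)
        elif token.lower().startswith("0x"):
            terms.append(int(token, 16))
        else:
            terms.append(int(token))
        pos = match.end()
        if pos == len(text):
            return terms  # LENGTHS -> e
        match = TOKEN.match(text, pos)
        if match is None or match.group(1) != "+":
            raise ValueError(f"expected '+' at position {pos}")
        pos = match.end()  # LENGTHS -> + LENGTH


if __name__ == "__main__":
    print(parse_length("1 + a + 0x10"))
```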
IMPLEMENTATION OF DISASSEMBLER

The disassembler proceeds in the following phases:

Initialization phase

The disassembler takes an ELF binary file and a processor specification in the IR as input. It
performs the following tasks in this phase:

• It identifies the data encoding (byte order) of the host processor.
• It checks the integrity of the IR file by looking for the "META TABLE" (table of
contents) at the start.
• It reads the meta table entry and detects the encoding used in the IR file.
• It then extracts the information required for disassembly from the IR file.
• Lastly, it reads the information from the binary file that will be needed later. This
information, which includes the ELF header, the symbol table and so on, is held in
appropriate data structures so that everything required is readily available.
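
The ELF side of the initialization can be sketched as follows. The example only reads the identification bytes and two header fields and compares the file's byte order with that of the host; the function name read_elf_ident and the returned dictionary layout are assumptions of this sketch, and the IR file checks are not shown because their format is not given here.

```python
import struct
import sys


def read_elf_ident(header: bytes) -> dict:
    """Extract word size, byte order, file type and machine from an ELF header."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or data encoding")
    # EI_DATA: 1 means little-endian, 2 means big-endian.
    byte_order = "little" if ei_data == 1 else "big"
    prefix = "<" if ei_data == 1 else ">"
    # e_type and e_machine are two 16-bit fields right after the 16 identification bytes.
    e_type, e_machine = struct.unpack_from(prefix + "HH", header, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "byte_order": byte_order,
        "type": e_type,
        "machine": e_machine,
        # True when multi-byte values in the file must be swapped on this host.
        "needs_swap": byte_order != sys.byteorder,
    }


if __name__ == "__main__":
    # Usage: python this_script.py path/to/elf_binary
    with open(sys.argv[1], "rb") as f:
        print(read_elf_ident(f.read(64)))
```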

FIRST PASS OF DISASSEMBLY


In the first pass of disassembly, all basic blocks of the code are identified. The identification
process is based on the assumption that there must always be some way to reach the code: if
there is no such path, the code will never be executed, and we need not worry about it. Since
the ELF binary file contains the names of the functions in its symbol table, these are taken as
the basic blocks at the beginning. The algorithm then tracks each of them, one by one, in order
to discover all possible program paths.

The information gathered in the first pass is stored for use in the second pass. The disassembler
maintains a list of pairs. Each pair associates one entry point in the text section with the
name by which it is referred to. The list is built up during this pass.

SECOND PHASE OF DISASSEMBLY


The objective of this phase is to generate the assembly language instructions from their binary
counterparts. Since the address ranges of valid code have been identified in the first pass, we
only need to disassemble the instructions in each address range.
The instruction disassembly is carried out in the following steps:

SYMBOL GENERATION
To perform symbolic disassembly, at the beginning of the disassembly of an instruction it is
checked whether a symbol is associated with the address of the current instruction; if so, the
symbol is also extracted from the symbol table.

INSTRUCTION GENERATION
An instruction is matched using the instruction matching algorithm, and the corresponding
assembly language instruction is output with appropriate parameters.

TOOLS FOR BUILDING DISASSEMBLERS

Various commercial, shareware and freeware disassemblers, and tools for building
disassemblers, are available:

• IDA Pro. Generally regarded as the most powerful disassembler.
• Sourcer by V Communications. Also a very powerful disassembler.
• BDASM by Manuel Jiménez. A relatively new disassembler.
• Borg. A freeware disassembler for Windows 32-bit binaries.

Sourcer Disassembler

Sourcer is a commercial program for disassembling x86 binaries (EXE, NE and PE). Sourcer 8.0
includes the BIOS Preprocessor and Windows Source (which were previously separate products).
Sourcer does a good job of automatically detecting code and data fragments. More information can
be found on their web page. Used together with Windows Source, it produces a great deal of
additional information. Windows Source can extract information from .SYM, CodeView or .DBG
files; there is full CodeView support.

PEBrowse Professional: Windows Disassembler

With the PEBrowse disassembler, one can open and examine any executable without the need to
have it loaded as part of an active process with a debugger. Applications, system DLLs, device
drivers and Microsoft .NET assemblies are all candidates for offline analysis using either
PEBrowse program. The information is organized in a convenient treeview index with the
major divisions of the PE file displayed as nodes. In most cases selecting nodes will enable
context-sensitive multiple view menu options, including binary dump, section detail,
disassembly and structure options as well as displaying sub-items, such as optional header
directory entries or exported functions, that can be found as part of a PE file unit. Several table
displays, hex/ASCII equivalents, window messages and error codes, as well as a calculator and
scratchpads are accessible from the main menu (PEBrowse Professional only).

While the binary dump display offers various display options, e.g., BYTE, WORD, or DWORD
alignment, the greatest value of PEBrowse comes when one disassembles an entry-point.

PEBrowse Professional will decompile type library information either embedded inside of the
binary as the resource "TYPELIB" or inside of individual type libraries, i.e., .TLB or .OLB files.
PEBrowse Professional and PEBrowse64 Professional also display all metadata for .NET
assemblies and display IL (Intermediate Language) for .NET methods. They seamlessly handle
mixed assemblies, i.e., those that contain both native and managed code. Finally, PEBrowse can
be employed as a file browsing utility for any type of file, with the restriction that the file
must be small enough to be memory-mapped.

Win32 Program Disassembler

Win32 Program Disassembler is a straight-line disassembler of Windows 32-bit executables (i.e.
PE) by Sang Cho from South Korea. The program works in console mode (no graphical
interface) and is invoked from the command line as follows: disassem yourfile.exe > yourfile.txt

Win32PD appears to understand switch statements as it does not get tripped up by the pointers. It
also decodes Win32 API calls. No disassembly of the data section is done, but string statements
are emitted where appropriate. This disassembler does not support symbols in PE files.
