Assembly Language Basics
Assembly language is a low-level programming language for a computer or
other programmable device. It is specific to a particular computer
architecture, in contrast to most high-level programming languages, which
are generally portable across multiple systems. Assembly language is
converted into executable machine code by a utility program called an
assembler, such as NASM or MASM.
For installation of NASM (make sure that you are logged in as root):
1. RPM file: copy the RPM file to your home directory, then on the terminal run
> rpm -ivh nasm-2.10.09-3.x86_64.rpm
2. Check whether NASM is installed:
> whereis nasm
It will show the path if NASM is already installed. Otherwise go to
www.nasm.us, open the topmost directory on the download page, and download
the Linux source archive nasm-X.XX.tar.gz.
Unpack the archive into a directory; this creates a subdirectory nasm-X.XX.
> tar xvzf nasm-X.XX.tar.gz
> ls (the nasm-X.XX directory will be listed)
> cd nasm-X.XX
> ./configure (this shell script finds the best C compiler to use and sets
up the Makefiles accordingly)
> make (builds nasm)
> make install (installs nasm)
An assembly program can be divided into three sections −
The data section,
The bss section, and
The text section.
The data Section
The data section is used for declaring initialized data or constants. Its
size and initial values are fixed when the program is assembled. You can
declare various constant values, file names, buffer sizes, etc., in this
section.
The syntax for declaring the data section is −
section .data
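As a minimal sketch of a data section in use, the program below declares a
string, an assemble-time length constant, and a 32-bit value, then uses all
three. The labels greeting, greetlen, and answer and the file name
data_demo.asm are illustrative assumptions. The system calls use the 32-bit
Linux int 0x80 convention, which is assumed to be available on the system.

```nasm
; data_demo.asm (hypothetical file name)
section .data
    greeting  db  'data section demo', 0xa  ; initialized byte string ending in a newline
    greetlen  equ $ - greeting              ; assemble-time constant: length of greeting in bytes
    answer    dd  42                        ; initialized 32-bit value

section .text
    global _start
_start:
    mov edx, greetlen       ; number of bytes to write
    mov ecx, greeting       ; address of the string
    mov ebx, 1              ; file descriptor 1 (stdout)
    mov eax, 4              ; system call number (sys_write)
    int 0x80                ; call kernel

    mov ebx, [answer]       ; load the stored value to use as the exit status
    mov eax, 1              ; system call number (sys_exit)
    int 0x80                ; call kernel
```

Assembled with nasm -f elf64 data_demo.asm and linked with
ld -o data_demo data_demo.o, the program should print the string, and
echo $? run right afterwards should report 42.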
The bss Section
The bss section is used for declaring variables, that is, for reserving
uninitialized space whose contents are set while the program runs. The
syntax for declaring the bss section is −
section .bss
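A short sketch of how reserved bss space might be used: the program below
reserves a buffer with resb, reads a line from standard input into it, and
writes the same bytes back out. The names buffer and BUFSIZE and the file
name bss_demo.asm are assumptions made for illustration, and the system call
numbers again follow the 32-bit Linux int 0x80 convention.

```nasm
; bss_demo.asm (hypothetical file name)
BUFSIZE equ 64                  ; assemble-time constant, occupies no memory

section .bss
    buffer  resb BUFSIZE        ; reserve BUFSIZE uninitialized bytes

section .text
    global _start
_start:
    mov edx, BUFSIZE            ; maximum number of bytes to read
    mov ecx, buffer             ; address of the reserved buffer
    mov ebx, 0                  ; file descriptor 0 (stdin)
    mov eax, 3                  ; system call number (sys_read)
    int 0x80                    ; eax = bytes read, 0 at end of input, negative on error

    cmp eax, 0
    jle done                    ; nothing was read, so skip the write

    mov edx, eax                ; write back exactly as many bytes as were read
    mov ecx, buffer
    mov ebx, 1                  ; file descriptor 1 (stdout)
    mov eax, 4                  ; system call number (sys_write)
    int 0x80

done:
    mov ebx, 0                  ; exit status 0
    mov eax, 1                  ; system call number (sys_exit)
    int 0x80
```

Note that resb only reserves space; unlike db in the data section, it stores
no initial value in the object file.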
The text Section
The text section is used for keeping the actual code. This section must
begin with the declaration global _start, which makes the _start label
visible to the linker so that it knows where program execution begins.
The syntax for declaring the text section is −
section .text
global _start
_start:
Comments
An assembly language comment begins with a semicolon (;). It may contain
any printable character, including blanks. It can appear on a line by
itself, like −
; This program displays a message on screen
or, on the same line along with an instruction, like −
add eax, ebx ; adds ebx to eax
Assembly Language Statements
Assembly language programs consist of three types of statements −
Executable instructions or instructions,
Assembler directives or pseudo-ops, and
Macros.
The executable instructions or simply instructions tell the processor
what to do. Each instruction consists of an operation code (opcode) and, in
most cases, one or more operands. Each executable instruction generates one
machine language instruction.
The assembler directives or pseudo-ops tell the assembler about the
various aspects of the assembly process. These are non-executable and do
not generate machine language instructions.
Macros are basically a text substitution mechanism.
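The three statement types can be seen side by side in the following sketch.
The macro name exit_with, the constant SYS_EXIT, and the file name
statements.asm are hypothetical names chosen for this illustration; the
%define and %macro forms are NASM's macro syntax.

```nasm
; statements.asm (hypothetical file name)
%define SYS_EXIT 1          ; single-line macro: SYS_EXIT is replaced by 1 in the text

%macro exit_with 1          ; multi-line macro taking one parameter (%1)
    mov ebx, %1             ; exit status
    mov eax, SYS_EXIT       ; system call number (sys_exit)
    int 0x80                ; call kernel
%endmacro

section .text               ; assembler directive: generates no machine instruction
    global _start           ; assembler directive: exports the _start label
_start:
    mov eax, 5              ; executable instruction
    add eax, 2              ; executable instruction: eax = 7
    exit_with eax           ; macro call: expands to the three instructions above
```

After assembling, linking, and running this program, echo $? should report
7, because the macro copies eax into ebx before it loads the system call
number into eax.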
Execution steps:
nasm -f elf64 hello.asm   # -f selects the output file format; elf64 is the
                          # 64-bit Executable and Linkable Format (creates hello.o)
ld -o hello hello.o       # link the object file and create an executable
                          # file named hello
./hello                   # execute the program
Code for hello world:
segment .text               ;code segment
    global _start           ;must be declared for linker
_start:                     ;tell linker entry point
    mov edx,len             ;message length
    mov ecx,msg             ;message to write
    mov ebx,1               ;file descriptor (stdout)
    mov eax,4               ;system call number (sys_write)
    int 0x80                ;call kernel
    mov ebx,0               ;exit status 0 (success)
    mov eax,1               ;system call number (sys_exit)
    int 0x80                ;call kernel
segment .data               ;data segment
msg db 'Hello, world!',0xa  ;our dear string
len equ $ - msg             ;length of our dear string
Memory Segments
A segmented memory model divides the system memory into groups of
independent segments referenced by pointers located in the segment
registers. Each segment is used to contain a specific type of data. One
segment is used to contain instruction codes, another segment stores the
data elements, and a third segment keeps the program stack.
In the light of the above discussion, we can specify various memory
segments as −
Data segment − It is represented by the .data section and the .bss section.
The .data section declares the memory region where the program's initialized
data elements are stored. This section cannot be expanded after the data
elements are declared, and its size remains static throughout the program.
The .bss section is also a static memory section; it contains buffers for
data to be filled in later in the program. This buffer memory is zero-filled.
Code segment − It is represented by the .text section. This defines an area
in memory that stores the instruction codes. This is also a fixed area.
Stack − This segment contains data values passed to functions and procedures
within the program.
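To make the three segments concrete, here is a small sketch that touches
each of them: a string in .data is copied into a buffer in .bss by a
subroutine in .text, and the call instruction uses the stack to hold the
return address. The labels src, srclen, dst, and copy_msg and the file name
segments.asm are assumptions made for this example only.

```nasm
; segments.asm (hypothetical file name)
section .data
    src     db 'copied via .bss', 0xa   ; initialized data
    srclen  equ $ - src                 ; length of src in bytes

section .bss
    dst     resb srclen                 ; zero-filled buffer of the same size

section .text
    global _start
_start:
    call copy_msg           ; pushes the return address on the stack and jumps

    mov edx, srclen         ; number of bytes to write
    mov ecx, dst            ; write from the .bss copy, not from the original
    mov ebx, 1              ; file descriptor 1 (stdout)
    mov eax, 4              ; system call number (sys_write)
    int 0x80

    mov ebx, 0              ; exit status 0
    mov eax, 1              ; system call number (sys_exit)
    int 0x80

copy_msg:                   ; copies srclen bytes from src to dst
    mov esi, src            ; source address (.data)
    mov edi, dst            ; destination address (.bss)
    mov ecx, srclen         ; byte counter
.next:
    mov al, [esi]           ; load one byte from the source
    mov [edi], al           ; store it in the destination
    inc esi
    inc edi
    dec ecx
    jnz .next               ; repeat until the counter reaches zero
    ret                     ; pops the return address from the stack
```

The sketch only uses the stack implicitly through call and ret; passing
argument values on the stack, as the Stack entry describes, would
additionally use push and pop.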