Module 1
Module 1
MODULE 1
INTRODUCTION TO DATA STRUCTURES: Data Structures, Classifications
(Primitive
& Non-Primitive), Data structure Operations
Review of pointers and dynamic Memory Allocation
ARRAYS and STRUCTURES: Arrays, Dynamic Allocated Arrays, Structures and
Unions,
Polynomials, Sparse Matrices, representation of Multidimensional Arrays, Strings
STACKS: Stacks, Stacks Using Dynamic Arrays, Evaluation and conversion of
Expressions
INTRODUCTION TO DATA STRUCTURES
BASIC TERMINOLOGY
Data: Data are simply values or sets of values that convey no meaning. For example, numbers,
names, marks etc are the data. Consider the two data items “Rama” and 10. The data “Rama”
and 10 does not convey any meaning to the reader.
Information: Information is defined as a collection of data from which conclusions may be
drawn.
“Rama scored 10 marks”. This sentence is an information. Because, it conveys meaning to the
reader.
The instructions given to a person or commands issued to the computer are considered as
information.
Entity: An entity can be defined as a person, place, thing or concept about which data can be
collected and stored. An entity is made up of a number of attributes which represent that entity.
An attribute is the one which describes the facts, details or characteristics of an entity.
For example, if we collect name, usn, phone number, branch of engineering of a particular
student, then student is treated as an entity whereas name, usn, phone number, branch are the
attributes.
Entity set: A collection of entities with similar attributes is an entity set. For example, all the
students in a college represent entity set.
Field:A field is a single elementary unit of information representing an attribute of an entity.
Record: A record is a collection of field values of a given entity.
File: A file is a collection of records of the entities in a given entity set.
Data Structures: The study of how the data is collected and stored in the memory, how
efficiently the data is organized in the memory, how efficiently the data can be retrieved and
manipulated, and the possible ways in which different data items are logically related is called
data structures.
Data Structures can be divided into two categories,
i) Primitive Data Structures
ii) Non-Primitive Data Structures
The data structures that can be manipulated directly by machine instructions are called
primitive data structures. Thus, the primitive data structures are fundamental data types that
are supported by a programming language. These data types consists of characters that cannot
be divided and hence they also called simple data types.
Example: Integers, Floating Point Numbers, Characters and Pointers etc.
The data structures that cannot be manipulated directly by machine instructions are called non-
primitive data structures. Thus, the non-primitive data structures are created or constructed
using primitive data structures. The nonprimitive data structures are also called compound data
structures.Example: Arrays, Lists and Files, Graphs, trees etc.
Based on the structure and arrangement of data, non-primitive data structures is
Further classified into
1. Linear Data Structure
2. Non-linear Data Structure
The data structure where its values or elements are stored in a sequential or linear order is called
linear data structure. The elements can be accessed sequentially one after the other in a linear
order and hence the name. The linear relationship between the elements is maintained:
• by means of sequential memory locations as in arrays: Here, one or more elements
are stored one after the other sequentially.
• by means of links as in linked list: Here, each element is stored in a node. One or more
nodes are logically adjacent and the logical adjacency is maintained using links. In
linked lists, even though the nodes are not physically adjacent, they seem to be adjacent
logically because of links. The logical adjacency is maintained using pointers.
For example, arrays, stacks, queues, linked lists etc., are all linear data structures.
The data structure where its values or elements are not stored in a sequential or linear order is
called non-linear data structures. Unlike linear data structures, here, the logical adjacency
between the elements is not maintained and hence elements cannot be accessed if we go in
sequential order.
For example, graphs, trees, files are all non-linear data structures.
if ptr is NULL appropriate message has to be displayed using the following statement:
if (ptr == NULL)
{
printf(“Insufficient memory\n”);
return;
}
Pointers are Dangerous
• In many computers, the size of int data type and size of pointer is same. So, a valid
integer value can be interpreted as a pointer which is very serious error. For example,
when a function returns an integer value, the returned value can be interpreted as a
pointer. This is very serious error.
Arrays in C:
A one-dimensional array can be declared as follows:
int list[5]; //array of 5 integers
int *plist[5]; //array of 5 pointers to integers
Program to print both address of ith element of given array & the value found at that
address:
ptr+i can be replaced by &ptr[i] also.
So, *(ptr+i) is equal to ptr[i].
The above code would allocate an array of exactly the required size and hence would not result
in any wastage.
REALLOC
• These functions resize memory previously allocated by either malloc or calloc. For
example,
realloc(p,s); //this changes the size of memory-block pointed at by p to s
• When s< oldSize, the rightmost oldSize-s bytes of old block are freed.
• When s>oldSize, the additional s-oldSize have an unspecified value.
• On successful resizing, it returns a pointer to the start of the new block. On failure, it returns
the value NULL.
• To create clean and readable programs, the REALLOC macro can be created as shown below:
strcpy(person.name,"james") ;
Person.age=20;
person.salary = 35000;
➢ We can create our own structure data types by using the typedef statement as below:
humanBeing person1,person2;
➢ Structures cannot be directly checked for equality or inequality. So, we can write a function
to do this.
PRG: Function to check equality of structures
int humansequal(humanbeing personl,humanbeing person2)
{ /* being otherwise return FALSE
if (strcmp(personl.name, person2.name)) return FALSE;
if (personl.age != person2.age) return FALSE;
if (personl.salary != person2.salary) return FALSE;
return TRUE;
return TRUE if personl and person2 are the same human
*/
}
if (humansequal(personl,person2))
printf("The two human beings are the same\n");
else
printf{"The two human beings are not the same\n");
typedef struct {
int month;
int day;
int year; } date;
typedef struct humanbeing {
char name[10];
int age;
float salary;
date dob;
};
humanbeing person1;
➢ A person born on February 11, 1944, would have the values for the date struct set as:
personl.dob.month = 2;
personl.dob.day = 11;
personl.dob.year = 1944;
Unions
➢ This is similar to a structure, but the fields of a union must share their memory space. This
means that only one field of the union is "active" at any given time.
typedef struct sextype {
enum tagfield {female, male
} sex;
union {
int children;
int beard ;
} u;
};
typedef struct humanbeing {
char name[10];
int age;
float salary;
date dob;
sextype sexinfo;
};
humanbeing personl, person2;
➢ We could assign values to person1 and person2 as:
personl.sexinfo.sex = male;
personl.sexinfo.u.beard = FALSE;
and
person2.sexinfo.sex = female;
person2.sexinfo.u.children - 4;
➢ we first place a value in the tag field. This allows us to determine which field in the union
is active. We then place a value in the appropriate field of the union.
Internal Implementation Of Structures
• The size of an object of a struct or union type is the amount of storage necessary to represent
the largest component, including any padding that may be required.
• Structures must begin and end on the same type of memory boundary, for example, an even
byte boundary or an address that is a multiple of 4, 8, or 16.
Self-Referential Structures
• A self-referential structure is one in which one or more of its components is a pointer to itself.
• These require dynamic storage management routines (malloc & free) to explicitly obtain and
release memory.
typedef struct list
{
char data;
list *link; //link is a pointer to a list structure
};
• Consider three structures and values assigned to their respective fields:
list item1,item2,item3;
item1.data='a';
item2.data='b';
item3.data='c';
item1.link=item2.link=item3.link=NULL;
• We can attach these structures together as follows
item1.link=&item2;
item2.link=&item3;
POLYNOMIALS
Definition: A polynomial is a mathematical expression consisting of sum of terms
where each term is made up of a coefficient multiplied by a variables raised to a
power.
Ex: A polynomial consisting of only one variable is shown below:
x4+ 10 x3+ 3x2+ 1
Each term consists of a co-efficient multiplied by a variable raised to a power. So, each term
can be represented by a structure consisting of 2 fields namely:coefficient and exponent.
To represent the polynomial 2x3+8x2+5
A[0]
A[1]
A[2]
• If exponents of current terms from a and b are equal, add their coefficients into c.
• If exponent of term in a is greater, copy term from a to c.
• If exponent of term in b is greater, copy term from b to c.
• Increment indices accordingly.
3 After merging, if there are any remaining terms in either a or b, copy them directly into
c.
Program:
if (a[i][1] == b[j][1]) {
} else {
while (i < m) {
while (j < n) {
printf("After addition:\n");
SPARSE MATRICES
A sparse matrix is a matrix that has very few non-zero elements spread out thinly.
Conversely, a matrix which has more number of zero elements (or high proportion of zeros
elements is called sparse matrix. The sparse matrix can be single dimensional or
multidimensional such as 2-dimensional, 3-dimensional and so on.
In the first matrix, 10 non-zero elements are present and 2 zero elements. The
number of non-zero elements are more than the number of zero elements. So, it is
In the second matrix only 8 non-zero elements are present and 12 zero elements
are present. So, the number of non-zero elements are less than the number of zero
The sparse matrix contains many zeroes. If we are manipulating only non-zero values, then
we are wasting the memory space by storing unnecessary zero values. This disadvantage can
be overcome by storing only non-zero values thus saving the space.
Sparse Matrix Representation:
We know that any element in the matrix can be uniquely defined using the triples
<row, col, val>. Thus, a sparse matrix can be created using the array of triples as
shown below:
#define MAX_TERMS 101 /* Maximum number of terms */
typdef struct
{
int row;
int col;
int val;
} TERM;
/* 1- dimensional array representing array of triples <row, col, val> */
TERM a[MAX_TERMS];
How do you represent the following spare matrix using triples in a single dimensional array?
The non-zero elements in the above matrix along with row and col position can be
represented using a single dimensional array starting from a[1] as shown below:
Transpose of a matrix
A matrix which is obtained by changing row elements into column elements and column
elements into row elements is called transpose of a matrix.
Initially, the size of the matrix must be reversed. This can be done by interchanging rows and
columns as shown below:
b[0].row = a[0].col
b[0].col = a[0].row
b[0].val = a[0].val
We find the transpose of given matrix by accessing col index of each non-zero element in array
a and store in array b as shown below:
b[k].row = a[j].col // make column as row
b[k].col = a[j].row // make row as column
b[k].val = a[j].val
k++; // to add the next entry
To obtain the index j, we start searching for 0th column, 1st column, 2nd column, 3rdcolumn
and 4th column from a[1].col to a[8].col. The corresponding code can be written as shown
below:
• Obtain the position of the null character of str1. This can be done using the following
statements:
i = 0;
while (str1[i] != ‘\0’) i++;
• zero if s1 = s2
• positive if s1 > s2
• negative if s1 < s2
5) strrev(str) – string reverse
we see the built in function strrev() and we write one user defined function without using any
of the built-in functions. It is defined in “string.h”.
Syntax:
void strrev (char str[])
This function reverses all characters in the string str except the terminating
NULLcharacter ‘\0’. The original string is lost.
C function that shows implementation of strrev()
void my_strrev (char src[], char dest[])
{
int i; /* Used as index to read characters from src */
int n; /* Holds the string length */
n = strlen(src); /* Compute the length of the string */
for (i = 0; i < n; i++) /* Reverse the given string */
{
dest[n-1-i] = src[i];
}
dest[n] = ‘\0’; /* NULL terminate the string */
}
Pattern matching algorithms
Given a string called pattern p with n characters and another string called text with m characters
where n <= m. It is required to search for the pattern string p in the text string t. If search is
successful return the position of the first occurrence of pattern string p in the text string t.
Otherwise, return -1. This process of searching for a pattern string in a given text string is called
pattern matching.
The easiest way to determine if pat is in string is to use the built-in function strstr.
The call (t = strstr(string,pat)) returns a null pointer if pat is not in string.
If pat is in string, t holds a pointer to the start of pat in string. The entire string beginning at
position t is printed out.
Although strstr seems ideally suited to pattern matching, there are two reasons why we may
want to develop our own pattern matching function:
• The function strstr is new to ANSI C. Therefore, it may not be available with the Compiler
we are using.
• There are several different methods for implementing a pattern matching function. The easiest
but least efficient method sequentially examines each character of the string until it finds the
pattern or it reaches the end of the string. If pat is not in string, this method has a computing
time of O(n . m) where n is the length of pat and m is the length of string.
Knuth, Morris, Pratt Pattern Matching algorithm.(KMP Algorithm)
Proper suffix and prefix for the string ABABD
The algorithm for pattern matching using KMP method?” The two steps to be followed using
this technique are:
Step 1: Create a failure function for the pattern to be searched
Step 2: Search for the pattern using the above failure function
Create a failure function for the pattern to be searched:For obtaining the
failure function for the pattern “ABABD”
Step 1: Given the pattern string “ABABD” obtain the proper prefixes and proper
suffixes for the strings “A”, “AB”, “ABA”, “ABAB” and “ABABD”. Refer
columns 1, 2 and 3 in the table in the next page.
Step 2: For every substring in step 1, obtain a longest proper prefix which is same
as the longest proper suffix.
Step 3: Find the length of the longest proper prefix which is same as the longest
proper suffix
Failure function:
As shown in above figure, the elements are added in the stack in the order A, B, C, D, E, then
E is the first element that is deleted from the stack and the last element is deleted from stack is
A. Figure illustrates this sequence of operations.
Since the last element inserted into a stack is the first element removed, a stack is also known
as a Last-In-First-Out (LIFO) list.
SYSTEM STACK
A stack used by a program at run-time to process function-calls is called system-stack.
• When functions are invoked, programs
→ create a stack-frame (or activation-record) &
→ place the stack-frame on top of system-stack
• Initially, stack-frame for invoked-function contains only
→ pointer to previous stack-frame &
→ return-address
• The previous stack-frame pointer points to the stack-frame of the invoking-function while
return-address contains the location of the statement to be executed after the function
terminates.
• If one function invokes another function, local variables and parameters of the invoking
function are added to its stack-frame.
• A new stack-frame is then
→ created for the invoked-function &
→ placed on top of the system-stack
• When this function terminates, its stack-frame is removed (and processing of the
invokingfunction, which is again on top of the stack, continues).
• Frame-pointer(fp) is a pointer to the current stack-frame.
Stack ADT
• The following operations make a stack an ADT. For simplicity, assume the data is an integer
type.
• Main stack operations
– Push (int data): Inserts data onto stack.
– int Pop(): Removes and returns the last inserted element from the stack.
• Auxiliary stack operations
– int Top(): Returns the last inserted element without removing it.
– int Size(): Returns the number of elements stored in the stack.
– int IsEmptyStack(): Indicates whether any elements are stored in the stack or not.
– int IsFullStack(): Indicates whether the stack is full or not.
• The easiest way to implement this ADT is by using a one-dimensional array, say, stack [MAX-
STACK-SIZE], where MAX STACK SIZE is the maximum number of entries.
• The first, or bottom, element of the stack is stored in stack[0], the second in stack[1] and the
ith in stack [i-1].
• Associated with the array is a variable, top, which points to the top element in the stack.
Initially, top is set to -1 to denote an empty stack.
• we have specified that element is a structure that consists of only a key field
1. CREATE STACK:
The element which is used to insert or delete is specified as a structure that consists of
only a key field.
1. Boolean IsEmpty(Stack)::= top < 0;
2. Boolean IsFull(Stack)::= top >= MAX_STACK_SIZE-1;
The IsEmpty and IsFull operations are simple, and is implemented directly in the
program push and pop functions. Each of these functions assumes that the variables
stack and top are global.
Insert an item into stack(Push operation)
C function to insert an integer item (by passing parameters)
Infix Expression: In this expression, the binary operator is placed in-between the operand. The
expression can be parenthesized or un- parenthesized.
Example: A + B
Here, A & B are operands and + is operand
Prefix or Polish Expression: In this expression, the operator appears before its operand.
Example: + A B
Here, A & B are operands and + is operand
Postfix or Reverse Polish Expression: In this expression, the operator appears after its
operand.
Example: A B +
Here, A & B are operands and + is operand
Precedence of the operators
The rules that determine the order in which different operators are evaluated are called
precedence rules or precedence of operators.
Highest precedence operators will have the highest value and lowest precedence operators will
have the least value. The table below shows arithmetic operators along with priority values.
The symbol ‘$’ or ‘^’ is considered as the operator to perform exponentiation and should be
given first precedence since it has higher priority value. The addition or subtraction is given
the least precedence since it has least priority value as shown in above table.
During evaluation, if two or more operators have the same precedence, then precedence rules
are not applicable. Instead, we go for associativity of the operators. The order in which the
operators with same precedence are evaluated in an expression is called associativity of the
operator. In such case, precedence rules are not considered.
In an expression, if two or more operators have the same priority and are evaluated from left-
to-right, then the operators are called left associative operators. The process of evaluating from
left to right is called left associativity.eg. The operator “+” is left associative.
In an expression, if two or more operators have the same priority and if they are evaluated from
right-to-left, then the operators are called right associative
Operators.Eg. the operator “$” is right-to-left associative (also called right associative).
Conversion from infix to postfix
1. Obtain the postfix expression for ( ( A + ( B – C ) * D ) ^ E + F )
2. Obtain the postfix expression for X ^ Y ^ Z – M + N + P / Q
To convert a given infix expression to its equivalent postfix expression
PROGRAM:
#include <ctype.h>
#include <stdio.h>
#define SIZE 50 /* Size of Stack */
char s[SIZE]; /* Global declarations */
int top = -1;
push(char elem) /* Function for PUSH operation */
{
s[++top] = elem;
}
char pop() /* Function for POP operation */
{
return (s[top--]);
}
int pr(char elem) /* Function for precedence */
{
switch (elem)
{
case '#': return 0;
case '(': return 1;
case '+':
case '-': return 2;
case '*':
case '/':
case '%': return 3;
case '^': return 4;
}
}
void main() /* Main Program */
{
char infx[50], pofx[50], ch, elem;
int i = 0, k = 0;
printf("\n\n Enter the Infix Expression : ");
scanf(“%s”,infx);
push('#');
while ((ch = infx[i++]) != '\0')
{
if (ch == '(')
push(ch);
else if (isalnum(ch))
pofx[k++] = ch;
else if (ch == ')')
{
while (s[top] != '(')
pofx[k++] = pop();
elem = pop(); /* Remove ( */
}
else /* Operator */
{
while (pr(s[top]) >= pr(ch))
pofx[k++] = pop();
push(ch);
}
}
while (s[top] != '#') /* Pop from stack till empty */
pofx[k++] = pop();
pofx[k] = '\0'; /* Make pofx as valid string */
printf("\n\n Given Infix Expn is: %s\n The Postfix Expn is: %s\n", infx, pofx);
}
Evaluation of Postfix expression
The evaluation of infix expression is not recommended because of the following reasons: