[go: up one dir, main page]

0% found this document useful (0 votes)
1K views23 pages

1 File Structure & Organization

File organization refers to how records in a file are arranged and accessed. There are several methods of file organization including sequential, heap, hash, and B+ tree. The hash file organization uses a hash function to map search keys to record addresses, allowing direct access without an index. Dynamic hashing allows data buckets to grow and shrink as the number of records changes.

Uploaded by

Sufi khan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
1K views23 pages

1 File Structure & Organization

File organization refers to how records in a file are arranged and accessed. There are several methods of file organization including sequential, heap, hash, and B+ tree. The hash file organization uses a hash function to map search keys to record addresses, allowing direct access without an index. Dynamic hashing allows data buckets to grow and shrink as the number of records changes.

Uploaded by

Sufi khan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 23

Chapter 1

. 1 File structure and organization


Introduction:-
a file organization is a method of arranging the
record in a flie
a file can be accessed or modify in different way for e.g sort
file
What is File?
This data has to be arranged in a proper way to accept
,process and communicate operation and result.
* File Structure within term
a) character or byte
a bit is the smallest unit of data representation (value of
a bit may be a 0 or 1)
1 character = 1 byte
b) Data Item
one or more character combined may from a data item
it is used to describe an attribute of an object
c)Record
The data items related to an object or entity are group
into record
Record can also be define as a set of logically related
fields. the record are two types
1)fixed length record
2) Variable length record
d) File
a file is set of logically related record almost all
information stored in computer must be file there are many
different files data file, text file, program files
Logical file
logical files is a file viewed in terms of what data items contains its record and
what processing operation may be performed on the file
physical file
a file viewed in terms of how the data is stored on a storage device and
how the processing operation are possible

Application program

O.S

Secondary Storage Device


*Basic File Operation*
what is basic file operation
1) open file operation –
we can open the existing physical file the program pointer is positioned at the beginning of the files and stored content are not changed
e.g picking up a phone to place call

2) Close file operation


operating system close a file when program terminates normally
closing a file is most important for the protection of data
3)Reading and Writing file
reading and writing are fundamental
operation performed on any file we can open a new
file in write mode or an existing file in append mode
in read operation content of a file taken into
the buffer area from secondary storage device and than
buffer to user working area of RAM.6
4)Seeking file operation
we through the file sequentially i.e reading one
byte after another we reach the end of file
The action of moving directly to a certain position
in a file is called seeking
• File – A file is named collection of related
information that is recorded on secondary
storage such as magnetic disks, magnetic tables
and optical disks. 
• What is File Organization? 
File Organization refers to the logical
relationships among various records that
constitute the file, particularly with respect to the
means of identification and access to any specific
record. In simple terms, Storing the files in
certain order is called file Organization.
•  File Structure refers to the format of the label
and data blocks and of any logical control record. 
• Types of File Organizations –
Sequential File Organization
• Heap File Organization 
• Hash File Organization 
• B+ Tree File Organization
•  Clustered File Organization
• Sequential File Organization –
• The easiest method for file Organization is Sequential
method. In this method the file are stored one after
another in a sequential manner. There are two ways to
implement this method:  
• File Method – This method is quite simple, in which we
store the records in a sequence i.e one after other in
the order in which they are inserted into the tables.
Insertion of new record – 
Let the R1, R3 and so on upto R5 and R4 be four records
in the sequence. Here, records are nothing but a row in
any table. Suppose a new record R2 has to be inserted in
the sequence, then it is simply placed at the end of the
file. 
Sorted File Method –In this method, As the
name itself suggest whenever a new record has
to be inserted, it is always inserted in a sorted
(ascending or descending) manner. Sorting of
records may be based on any primary key or any
other key. 
• Hashing
In a huge database structure, it is very inefficient
to search all the index values and reach the desired
data. Hashing technique is used to calculate the
direct location of a data record on the disk without
using index structure.
• Heap File Organization –
• Heap File Organization works with data blocks.
In this method records are inserted at the end
of the file, into the data blocks. No Sorting or
Ordering is required in this method. If a data
block is full, the new record is stored in some
other block, Here the other data block need not
be the very next data block, but it can be any
block in the memory. the new records.
•  
Hashing 
is an efficient technique to directly search the location of desired data on
the disk without using index structure. Data is stored at the data blocks whose
address is generated by using hash function. The memory location where these
records are stored is called as data block or data bucket.
• Hash File Organization :
• Data bucket – Data buckets are the memory locations where the records
are stored. These buckets are also considered as Unit Of Storage.
• Hash Function – Hash function is a mapping function that maps all the set
of search keys to actual record address. Generally, hash function uses
primary key to generate the hash index – address of the data block. Hash
function can be simple mathematical function to any complex mathematical
function.
• Hash Index-The prefix of an entire hash value is taken as a hash index. Every
hash index has a depth value to signify how many bits are used for
computing a hash function. These bits can address 2n buckets.
Below given diagram clearly depicts how hash function work:
• Static Hashing –
• In static hashing, when a search-key value is provided, the hash function always
computes the same address. For example, if we want to generate address for
STUDENT_ID = 76 using mod (5) hash function, it always result in the same bucket
address 4.  There will not be any changes to the bucket address here. Hence number
of data buckets in the memory for this static hashing .
• Operations –
• Insertion – When a new record is inserted into the table, The hash function h
generate a bucket address for the new record based on its hash key K.
Bucket address = h(K)
• Searching – When a record needs to be searched, The same hash function is used to
retrieve the bucket address for the record. For Example, if we want to retrieve whole
record for ID 76, and if the hash function is mod (5) on that ID, the bucket address
generated would be 4.
• Deletion – If we want to delete a record, Using the hash function we will first fetch
the record which is supposed to be deleted.  Then we will remove the records for
that address in memory.
• Updation – The data record that needs to be updated is first searched using hash
function, and then the data record is updated.
If we want to insert some new records into the file But the data bucket address
generated by the hash function is not empty or the data already exists in that address.
This becomes a critical situation to handle.  This situation in the static hashing is
called bucket overflow.
• Open Hashing –
In Open hashing method, next available data block is used
to enter the new record, instead of overwriting older one. This
method is also called  linear probing.For example, D3 is a new
record which needs to be inserted , the hash function generates
address as 105. But it is already full. So the system searches
next available data bucket, 123 and assigns D3 to it.
Closed hashing –
In Closed hashing method, a new data bucket is allocated with
same address and is linked it after the full data bucket. This method
is also known as  overflow chaining.
For example, we have to insert a new record D3 into the tables. The
static hash function generates the data bucket address as 105. But
this bucket is full to store the new data. In this case is a new data
bucket is added at the end of 105 data bucket and is linked to it.
Then new record D3 is inserted into the new bucket.
• Dynamic Hashing –
• The drawback of static hashing is that that it
does not expand or shrink dynamically as the
size of the database grows or shrinks.  In
Dynamic hashing, data buckets grows or
shrinks (added or removed dynamically) as the
records increases or decreases. Dynamic
hashing is also known as extended hashing.
• B+ Tree File Organization –
It uses a tree like structure to store records in
File. It uses the concept of Key indexing where the
primary key is used to sort the records. For each
primary key, an index value .
• B+ Tree is very much similar to binary search tree,
with the only difference that instead of just two
children, it can have more than two. All the
information is stored in leaf node and the
intermediate nodes acts as pointer to the leaf
• node.
•  
Cluster File Organization –
In cluster file organization, two or more related tables/records are stored
withing same file known as clusters. These files will have two or more tables in the
same data block and the key attributes which are used to map these table together
are stored only once. 
Thus it lowers the cost of searching and retrieving various records in different files as
they are now combined and kept in a single cluster. 
For example we have two tables or relation Employee and Department. These table
are related to each other. 
• What is Indexing?
• Indexing is a data structure technique which allows
you to quickly retrieve records from a database file.
An Index is a small table having only two columns.
The first column comprises a copy of the primary or
candidate key of a table. Its second column contains
a set of pointers for holding the address .
Types of Indexing
• Primary Index
• Secondary Index
• Clustering Index
• What is Multilevel Index?
Types of Indexing

Primary Index
Primary Index is an ordered file which is fixed length size with two fields. The first field is
the same a primary key and second, filed is pointed to that specific data block. In the primary
Index, there is always one to one relationship between the entries in the index table. The
primary Indexing in DBMS is also further divided into two types.
Dense Index
Sparse Index
Dense Index
In a dense index, a record is created for every search key valued in the database. This
helps you to search faster but needs more space to store index records. In this Indexing,
method records contain search key value and points to the real record on the disk.
• Primary Index − Primary index is defined on an
ordered data file. The data file is ordered on a key
field. The key field is generally the primary key of
the relation.
. Secondary Index − Secondary index may be
generated from a field which is a candidate key and
has a unique value in every record, or a non-key
with duplicate values.
• Clustering Index − Clustering index is defined on
an ordered data file. The data file is ordered on a
non-key field.
Primary Index are two types:-
• Sparse Index
• Dense Index
• Dense Index:_
There is an index record for every search key value in the
database. This makes searching faster but requires more
space to store index records itself. Index records contain
search key value and a pointer to the actual record on the
disk.
• Sparse Index
In index records are not created for every search key. An index
record here contains a search key and an actual pointer to the data
on the disk. To search a record, we first proceed by index record
and reach at the actual location of the data. If the data we are
looking for is not where we directly reach by following the index,
then the system starts sequential.

You might also like