
Chapter 5 Data Representation

1- Decimal Numbers

In the decimal number system, each of the ten digits 0 through 9 represents a certain quantity. These ten symbols (digits) do not limit you to expressing only ten different quantities, because you use the various digits in appropriate positions within a number to indicate the magnitude of the quantity.

The position of each digit in a decimal number indicates the magnitude of the quantity represented, and
can be assigned a weight. The decimal number system is said to be of base, or radix, 10 because it uses
10 digits and the coefficients are multiplied by powers of 10. In general, a number with a decimal point
is represented by a series of coefficients:

a4 a3 a2 a1 a0 . a-1 a-2 a-3

The coefficients aj are any of the 10 digits (0, 1, 2,……, 9), and the subscript value j gives the place value
and, hence, the power of 10 by which the coefficient must be multiplied.

10^4×a4 + 10^3×a3 + 10^2×a2 + 10^1×a1 + 10^0×a0 + 10^-1×a-1 + 10^-2×a-2 + 10^-3×a-3

2- Binary Numbers

This is another way to represent quantities. The binary system is less complicated than the decimal
system because it has only two digits. It’s a base- 2 system.

A binary digit, called a bit, has two values 0 & 1. Each coefficient aj is multiplied by 2j, and the results
are added to obtain the decimal equivalent of the number. For example,

11010.11 is equal to 26.75 as follows:

1×2^4 + 1×2^3 + 0×2^2 + 1×2^1 + 0×2^0 + 1×2^-1 + 1×2^-2

16 + 8 + 0 + 2 + 0 + 0.5 + 0.25 = (26.75)10

In general, a number expressed in a base-r system has coefficients multiplied by powers of r.

r^n×an + r^(n-1)×an-1 + ….. + r^2×a2 + r^1×a1 + r^0×a0 + r^-1×a-1 + r^-2×a-2 + ….. + r^-m×a-m
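As a sketch of this weighted-sum rule, the following Python function (the name base_r_value is illustrative, not from the text) evaluates lists of integer-part and fractional-part coefficients in base r:

```python
def base_r_value(int_digits, frac_digits, r):
    """Sum each coefficient a_j times r**j, where j is its place value."""
    value = sum(d * r**i for i, d in enumerate(reversed(int_digits)))
    value += sum(d * r**-(i + 1) for i, d in enumerate(frac_digits))
    return value

# (26.75)10 written out in base 10, and (11010.11)2 in base 2:
print(base_r_value([2, 6], [7, 5], 10))          # ≈ 26.75
print(base_r_value([1, 1, 0, 1, 0], [1, 1], 2))  # 26.75
```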

Now, let us begin to count in the binary system:

One bit 2 values 0, 1

Two bits 4 values 00, 01, 10, 11


Computational course chapter 5 Page 1
Three bits 8 values 000, 001, 010, 011, 100, 101, 110, 111

Four bits 16 values 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100,
1101, 1110, 1111

And so on……..

Decimal number Binary number

Weights: 8 4 2 1

0 0000

1 0001

2 0010

3 0011

4 0100

5 0101

6 0110

7 0111

8 1000

9 1001

10 1010

11 1011

12 1100

13 1101

14 1110

15 1111



The LSB (least significant bit) is the right-most bit and has a weight of 2^0 = 1.

The MSB (most significant bit) is the left-most bit; in a 4-bit number it has a weight of 2^3 = 8.

In general, with n bits you can count up to a number equal to:

Largest decimal number = 2^n – 1

When n = 5, you can count from 0 to 31 (32 values). Max number = (31)10.

The weight structure of a fractional binary number is

2^(n-1) ….. 2^3 2^2 2^1 2^0 . 2^-1 2^-2 2^-3 …… 2^-n

….. 8 4 2 1 . 0.5 0.25 0.125 ……

3 – Hexadecimal Numbers

The hexadecimal number system has 16 digits and is used primarily as a compact way of displaying or writing binary numbers; it is very easy to convert between binary and hexadecimal numbers.

Decimal number Binary number Hexadecimal number

Weights: 8 4 2 1

0 0000 0

1 0001 1

2 0010 2

3 0011 3

4 0100 4

5 0101 5

6 0110 6

7 0111 7

8 1000 8



9 1001 9

10 1010 A

11 1011 B

12 1100 C

13 1101 D

14 1110 E

15 1111 F

How do you count in hexadecimal once you get to F? Simply start over with another column and
continue as follows:

10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D…….

With two hexadecimal digits, you can count up to FF, which is 255 in decimal.

(100)16 = (256)10

(101)16 = (257)10

(FFF)16 = (4095)10

(FFFF)16 = (65535)10

4- Octal Numbers

This system provides a convenient way to express binary numbers and codes (like Hex. Number system),
it is used less frequently than hexadecimal. The octal number system is composed of eight digits, which
are:

0, 1, 2, 3, 4, 5, 6, 7

To count above 7, begin another column and start over

10 11 12 13 14 15 16 17 20 21 22 23 24 25 26 27 30 31 32 33 34 35 36 37……

(15)8 = (13)10 = (D)16



Binary – to – Decimal Conversion

The decimal value of any binary number can be found by adding the weights of all bits that are 1 and
discarding the weights of all bits that are 0.

Example

Convert the binary number 1101101 to decimal.

Solution

2^6×1 + 2^5×1 + 2^4×0 + 2^3×1 + 2^2×1 + 2^1×0 + 2^0×1

64 + 32 + 0 + 8 + 4 + 0 + 1 = (109)10

Example

Convert 0.1011 to decimal.

Solution

2^0×0 + 2^-1×1 + 2^-2×0 + 2^-3×1 + 2^-4×1

0 + 0.5 + 0 + 0.125 + 0.0625 = (0.6875)10
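The sum-of-weights procedure used in the two examples above can be sketched in Python (the function name is illustrative):

```python
def bin_to_dec(bits):
    """Add the weights of all bits that are 1; bits that are 0 contribute nothing.
    Accepts an optional fractional part after a '.'."""
    if '.' in bits:
        int_part, frac_part = bits.split('.')
    else:
        int_part, frac_part = bits, ''
    total = sum(2**i for i, b in enumerate(reversed(int_part)) if b == '1')
    total += sum(2**-(i + 1) for i, b in enumerate(frac_part) if b == '1')
    return total

print(bin_to_dec('1101101'))  # 109
print(bin_to_dec('0.1011'))   # 0.6875
```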

Decimal – to – Binary Conversion

Sum – of – Weights Method: determine the set of binary weights whose sum is equal to the decimal
number

Example: Convert 9 to binary.

Solution: 9 = 8 + 1 (choosing weights that are powers of 2)

Binary weights are: ….. 32 16 8 4 2 1

Placing a 1 under each weight used (8 and 1) and 0 under the rest gives 1001.

So (9)10 = (1001)2

Example

Convert the following



12 = 8 + 4 = 1100

25 = 16 + 8 + 1 = 11001

82 = 64 + 16 + 2 = 1010010.

Repeated Division-by-2 Method: divide the decimal number repeatedly by 2 and record the remainders; the example below converts 12.

12/2 = 6 remainder = 0 this is the LSB

6/2 = 3 remainder = 0

3/2 = 1 remainder = 1

1/2 = 0 remainder = 1 this is the MSB

Stop when the whole number quotient is 0.

Example: convert 45 to binary.

Solution:

45/2 = 22 remainder = 1

22/2 = 11 remainder = 0

11/2 = 5 remainder = 1

5/2 = 2 remainder = 1

2/2 = 1 remainder = 0

1/2 = 0 remainder = 1

(45)10 = (101101)2
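The repeated division-by-2 method can be sketched in Python (the function name is illustrative):

```python
def dec_to_bin(n):
    """Divide repeatedly by 2; the remainders, read last-to-first, are the bits."""
    if n == 0:
        return '0'
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        bits.append(str(remainder))   # the first remainder is the LSB
    return ''.join(reversed(bits))    # reverse so the MSB comes first

print(dec_to_bin(45))  # 101101
print(dec_to_bin(12))  # 1100
```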

What about decimal numbers with fractions?

Binary weights: 0.5 0.25 0.125 0.0625

(0.625)10 = 0.5 + 0.125 = (0.101)2 (by using sum – of – weight method)

OR by using the repeated multiplication-by-2 method:



0.625 × 2 = 1.25 carry = 1 this is the MSB

0.25 × 2 = 0.5 carry = 0

0.5 × 2 = 1.00 carry = 1 this is the LSB

So (0.625)10 = (0.101)2
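The repeated multiplication-by-2 method can be sketched in Python (the names are illustrative; the max_bits cutoff guards against fractions with no finite binary expansion):

```python
def frac_to_bin(f, max_bits=16):
    """Multiply repeatedly by 2; each integer carry is the next bit, MSB first.
    Stops when the fraction reaches 0 or max_bits bits have been produced."""
    bits = []
    while f > 0 and len(bits) < max_bits:
        f *= 2
        carry, f = divmod(f, 1)       # integer part is the next bit
        bits.append(str(int(carry)))
    return '0.' + ''.join(bits)

print(frac_to_bin(0.625))  # 0.101
```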

Binary to Hexadecimal Conversion

Simply break the binary number into 4-bit groups, starting at the rightmost bit and replace each 4-bit
group with the equivalent hexadecimal symbol.

Example

1100101001010111 = 1100 1010 0101 0111

C A 5 7

Note: each group must be four bits.

Hexadecimal – to – Binary Conversion

Reverse the process and replace each hexadecimal symbol with the appropriate four bits.

Example

10AF = 1 0 A F

0001 0000 1010 1111

So (10AF)16 = (0001000010101111)2
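Both 4-bit-grouping conversions can be sketched in Python (the function names are illustrative):

```python
def bin_to_hex(bits):
    """Pad to a multiple of 4 bits, then map each 4-bit group to one hex digit."""
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return ''.join('{:X}'.format(int(g, 2)) for g in groups)

def hex_to_bin(hexdigits):
    """Replace each hexadecimal digit with its 4-bit binary group."""
    return ''.join('{:04b}'.format(int(d, 16)) for d in hexdigits)

print(bin_to_hex('1100101001010111'))  # CA57
print(hex_to_bin('10AF'))              # 0001000010101111
```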

Decimal – to – Hexadecimal Conversion

Repeated division of a decimal number by 16 will produce the equivalent hexadecimal number, formed by the remainders of the division. The first remainder produced is the LSD (least significant digit). Each successive division by 16 yields a remainder that becomes a digit in the equivalent hexadecimal number.

Example

(650)10 = ( ? )16



Solution

650/16 = 40 remainder = 10 = A this is the LSD

40/16 = 2 remainder = 8

2/16 = 0 remainder = 2 this is the MSD

So (650)10 = (28A)16
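The repeated division-by-16 procedure can be sketched in Python (the names are illustrative):

```python
HEX_DIGITS = '0123456789ABCDEF'

def dec_to_hex(n):
    """Divide repeatedly by 16; the first remainder is the LSD."""
    if n == 0:
        return '0'
    digits = []
    while n > 0:
        n, remainder = divmod(n, 16)
        digits.append(HEX_DIGITS[remainder])
    return ''.join(reversed(digits))   # reverse so the MSD comes first

print(dec_to_hex(650))  # 28A
print(dec_to_hex(255))  # FF
```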

Octal – to – Decimal Conversion

(2374)8 = 2 × 8^3 + 3 × 8^2 + 7 × 8^1 + 4 × 8^0

= 1024 + 192 + 56 + 4 = (1276)10

Decimal – to – Octal Conversion

(359)10 = ( )8

359/8 = 44 remainder = 7 this is the LSD

44/8 = 5 remainder = 4

5/8 = 0 remainder = 5 this is the MSD

So (359)10 = (547)8

Octal – to – Binary Conversion

Replace each octal digit with the appropriate three bits.

Example

(25)8 = (010 101)2

(7526)8 = 7 5 2 6

111 101 010 110

(7526)8 = (111101010110)2

Binary – to – Octal Conversion



Start with the rightmost group of three bits and, moving from right to left, convert each 3-bit group to the equivalent octal digit.

Example

11010000100 = 011 010 000 100

3 2 0 4

(11010000100)2 = (3204)8
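The 3-bit-grouping conversions can be sketched in Python (the function names are illustrative):

```python
def octal_to_bin(octdigits):
    """Replace each octal digit with its 3-bit binary group."""
    return ''.join('{:03b}'.format(int(d, 8)) for d in octdigits)

def bin_to_octal(bits):
    """Pad to a multiple of 3 bits, then map each 3-bit group to one octal digit."""
    bits = bits.zfill((len(bits) + 2) // 3 * 3)
    return ''.join(str(int(bits[i:i + 3], 2)) for i in range(0, len(bits), 3))

print(octal_to_bin('7526'))         # 111101010110
print(bin_to_octal('11010000100'))  # 3204
```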

Logic Gates

THE INVERTER

The inverter (NOT circuit) performs the operation called inversion or complementation. The inverter
changes one logic level to the opposite level.

In terms of bits, it changes a 1 to a 0 and a 0 to a 1.

The distinctive-shape standard logic symbols for the inverter are shown in the figure below.

The negation indicator is a "bubble" (o) that indicates inversion or complementation when it appears on the input or output of any logic element. Generally, the input is on the left of a logic symbol and the output is on the right. When appearing on the input, the bubble means that a 0 is the active level, and the input is called an active-LOW input. When appearing on the output, the bubble means that a 0 is the active level, and the output is called an active-LOW output.

When a HIGH level is applied to an inverter input, a LOW level will appear on its output. When a LOW
level is applied to its input, a HIGH will appear on its output. This operation is summarized in Table 3-1,
which shows the output for each possible input in terms of levels and corresponding bits. A table such as
this is called a truth table.



Input Output
1 0
0 1

The AND Gate

The term gate is used to describe a circuit that performs a basic logic operation. The AND gate is
composed of two or more inputs and a single output, as indicated by the standard logic symbols shown
in Fig. Inputs are on the left, and the output is on the right in each symbol. Gates with two inputs are
shown; however, an AND gate can have any number of inputs greater than one.

Operation of an AND Gate

An AND gate produces a HIGH output only when all of the inputs are HIGH. When any of the inputs is
LOW, the output is LOW. Therefore, the basic purpose of an AND gate is to determine when certain
conditions are simultaneously true, as indicated by HIGH levels on all of its inputs, and to produce a
HIGH on its output to indicate that all these conditions are true.

The inputs of the 2-input AND gate in Figure below are labeled A, B and the output is labeled X. The gate
operation can be stated as follows:

For a 2-input AND gate, output X is HIGH only when inputs A and B are HIGH; X is LOW when either A
or B is LOW, or when both A and B are LOW.



The logical operation of a gate can be expressed with a truth table that lists all input combinations with
the corresponding outputs, as illustrated in Table below for a 2-input AND gate. The truth table can be
expanded to any number of inputs. For any AND gate, regardless of the number of inputs, the output is
HIGH only when all inputs are HIGH.
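As a sketch, the AND operation and its truth table can be reproduced in Python (the function name is ours; 1 stands for HIGH and 0 for LOW):

```python
from itertools import product

def AND(*inputs):
    """Output is HIGH (1) only when all inputs are HIGH."""
    return int(all(inputs))

# Truth table for a 2-input AND gate:
for a, b in product([0, 1], repeat=2):
    print(a, b, AND(a, b))
# 0 0 0
# 0 1 0
# 1 0 0
# 1 1 1
```

Because AND accepts any number of arguments, the same function covers gates with more than two inputs, mirroring the statement that the truth table can be expanded to any number of inputs.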

The OR Gate

The OR gate is another of the basic gates from which all logic functions are constructed; it performs what is known as logical addition.

An OR gate has two or more inputs and one output, as indicated by the standard logic symbol in the figure below, where OR gates with two inputs are illustrated. An OR gate can have any number of inputs greater than one.



Operation of an OR Gate

An OR gate produces a HIGH on the output when any of the inputs is HIGH. The output is LOW only
when all of the inputs are LOW. The inputs of the 2-input OR gate in Figure above are labeled A, B and
the output is labeled X. The operation of the gate can be stated as follows:

For a 2-input OR gate, output X is HIGH when either input A or input B is HIGH, or when both A and B
are HIGH; X is LOW only when both A and B are LOW.

The HIGH level is the active or asserted output level for the OR gate. Figure below illustrates the
operation for a 2-input OR gate for all four possible input combinations.

OR Gate Truth Table

The operation of a 2-input OR gate is described in the table below. This truth table can be expanded to any number of inputs; regardless of the number of inputs, the output is HIGH when one or more of the inputs are HIGH.



The operation of a 2-input OR gate can be expressed as follows: If one input variable is A, if the other
input variable is B, and if the output variable is X, then the Boolean expression is

X=A+B

THE NAND GATE

The NAND gate is a popular logic element because it can be used as a universal gate: that is, NAND gates
can be used in combination to perform the AND, OR, and inverter operations.

The term NAND is a contraction of NOT-AND and implies an AND function with a complemented
(inverted) output. The standard logic symbol for a 2-input NAND gate and its equivalency to an AND gate
followed by an inverter are shown in Fig (a), where the symbol ≡ means equivalent to. A rectangular
outline symbol is shown in part (b).

Operation of a NAND Gate



A NAND gate produces a LOW output only when all the inputs are HIGH. When any of the inputs is LOW,
the output will be HIGH. For the specific case of a 2-input NAND gate, as shown in Fig above with the
inputs labeled A, B and the output labeled X, the operation can be stated as follows:

For a 2-input NAND gate, output X is LOW only when inputs A and B are HIGH; X is HIGH when either A or B is LOW, or when both A and B are LOW.

Note that this operation is opposite that of the AND in terms of the output level. In a NAND gate, the
LOW level (0) is the active or asserted output level, as indicated by the bubble on the output. Fig below
illustrates the operation of a 2-input NAND gate for all four input combinations, and Table is the truth
table summarizing the logical operation of the 2-input NAND gate.
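The universal-gate property mentioned above can be illustrated in Python: a sketch (the function names are ours) that builds NOT, AND, and OR out of NAND alone.

```python
def NAND(a, b):
    """Output is LOW (0) only when both inputs are HIGH."""
    return int(not (a and b))

# NAND as a universal gate: the other basic operations built from NAND only.
def NOT(a):
    return NAND(a, a)            # tying both inputs together inverts

def AND(a, b):
    return NOT(NAND(a, b))       # NAND followed by an inverter

def OR(a, b):
    return NAND(NOT(a), NOT(b))  # invert both inputs, then NAND

print([NOT(0), AND(1, 1), OR(0, 1)])  # [1, 1, 1]
```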

The NOR Gate

The NOR gate, like the NAND gate, is a useful logic element because it can also be used as a universal
gate i.e. NOR gates can be used in combination to perform the AND, OR, and inverter operations.



The term NOR is a contraction of NOT-OR and implies an OR function with an inverted (complemented)
output. The standard logic symbol for a 2-input NOR gate and its equivalent OR gate followed by an
inverter are shown in Fig below.

Operation of a NOR Gate

A NOR gate produces a LOW output when any of its inputs is HIGH. Only when all of its inputs are LOW is
the output HIGH. For the specific case of a 2-input NOR gate, as shown in Fig above with the inputs
labeled A, B and the output labeled X, the operation can be stated as follows:

For a 2-input NOR gate, output X is LOW when either input A or input B is HIGH, or when both A and B
are HIGH; X is HIGH only when both A and B are LOW.

This operation results in an output level opposite that of the OR gate. In a NOR gate, the LOW output is
the active or asserted output level as indicated by the bubble on the output.

Fig below illustrates the operation of a 2-input NOR gate for all four possible input combinations, and
Table is the truth table for a 2-input NOR gate.



THE EXCLUSIVE-OR AND EXCLUSIVE-NOR GATES

Exclusive-OR and exclusive-NOR gates are formed by a combination of other gates already discussed.
However, because of their fundamental importance in many applications, these gates are often treated
as basic logic elements with their own unique symbols.

The Exclusive-OR Gate

The standard symbol for an exclusive-OR (XOR for short) gate is shown in the figure below. The XOR gate has only two inputs.

For an exclusive-OR gate, output X is HIGH when input A is LOW and input B is HIGH, or when input A
is HIGH and input B is LOW: X is LOW when A and B are both HIGH or both LOW.

The four possible input combinations and the resulting outputs for an XOR gate are illustrated in Fig
below. The HIGH level is the active or asserted output level and occurs only when the inputs are at
opposite levels. The operation of an XOR gate is summarized in the table shown in Table.



The Exclusive-NOR Gate

Standard symbols for an exclusive-NOR (XNOR) gate are shown in Fig below. Like the XOR gate, an XNOR
has only two inputs. The bubble on the output of the XNOR symbol indicates that its output is opposite
that of the XOR gate. When the two input logic levels are opposite, the output of the exclusive-NOR gate
is LOW. The operation can be stated as follows (A and B are inputs, X is the output):

For an exclusive-NOR gate, output X is LOW when input A is LOW and input B is HIGH, or when A is
HIGH and B is LOW; X is HIGH when A and B are both HIGH or both LOW.

The four possible input combinations and the resulting outputs for an XNOR gate are shown in Fig
below. The operation of an XNOR gate is summarized in Table. Notice that the output is HIGH when the
same level is on both inputs.
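Both gates reduce to a single comparison, which can be sketched in Python (the function names are ours):

```python
def XOR(a, b):
    """HIGH (1) only when the inputs are at opposite levels."""
    return int(a != b)

def XNOR(a, b):
    """HIGH (1) only when both inputs are at the same level."""
    return int(a == b)

# All four input combinations, showing the two outputs are complements:
for a in (0, 1):
    for b in (0, 1):
        print(a, b, XOR(a, b), XNOR(a, b))
# 0 0 0 1
# 0 1 1 0
# 1 0 1 0
# 1 1 0 1
```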



5.6 Database Management System

A database-management system (DBMS) is a collection of interrelated data and a set of programs to access those data: a collection of related data with an implicit meaning, and hence a database.
The collection of data, usually referred to as the database, contains information relevant to an
enterprise. The primary goal of a DBMS is to provide a way to store and retrieve database information
that is both convenient and efficient. By data, we mean known facts that can be recorded and that have
implicit meaning. For example, consider the names, telephone numbers, and addresses of the people
you know. You may have recorded this data in an indexed address book, or you may have stored it on a
diskette, using a personal computer and software such as DBASE IV or V, Microsoft ACCESS, or EXCEL. A
datum – a unit of data – is a symbol or a set of symbols which is used to represent something. This
relationship between symbols and what they represent is the essence of what we mean by information.
Hence, information is interpreted data – data supplied with semantics. Knowledge refers to the practical
use of information. While information can be transported, stored or shared without many difficulties the
same cannot be said about knowledge. Knowledge necessarily involves a personal experience. Referring
back to the scientific experiment, a third person reading the results will have information about it, while
the person who conducted the experiment personally will have knowledge about it.



Database systems are designed to manage large bodies of information. Management of data involves
both defining structures for storage of information and providing mechanisms for the manipulation of
information. In addition, the database system must ensure the safety of the information stored, despite
system crashes or attempts at unauthorized access. If data are to be shared among several users, the
system must avoid possible anomalous results.

Because information is so important in most organizations, computer scientists have developed a large
body of concepts and techniques for managing data.

5.6.1 Data Processing Vs. Data Management Systems

Although Data Processing and Data Management Systems both refer to functions that take raw data and
transform it into usable information, the usage of the terms is very different. Data Processing is the
term generally used to describe what was done by large mainframe computers from the late 1940's until
the early 1980's (and which continues to be done in most large organizations to a greater or lesser
extent even today): large volumes of raw transaction data fed into programs that update a master file,
with fixed-format reports written to paper.

The term Data Management Systems refers to an expansion of this concept, where the raw data,
previously copied manually from paper to punched cards, and later into data-entry terminals, is now fed
into the system from a variety of sources, including ATMs, EFT, and direct customer entry through the
Internet. The master file concept has been largely displaced by database management systems, and
static reporting replaced or augmented by ad-hoc reporting and direct inquiry, including downloading of
data by customers. The ubiquity of the Internet and the personal computer has been the driving force in the transformation of Data Processing into the more global concept of Data Management Systems.

5.6.2 File Oriented Approach

The earliest business computer systems were used to process business records and produce
information. They were generally faster and more accurate than equivalent manual systems. These
systems stored groups of records in separate files, and so they were called file processing systems. In a
typical file processing systems, each department has its own files, designed specifically for those
applications. The department itself, working with the data processing staff, sets policies or standards for
the format and maintenance of its files.

Programs are dependent on the files and vice-versa; that is, when the physical format of the file is
changed, the program has also to be changed. Although the traditional file oriented approach to
information processing is still widely used, it does have some very important disadvantages.

5.6.3 Database Oriented Approach to Data Management



Consider part of a savings-bank enterprise that keeps information about all customers and savings
accounts. One way to keep the information on a computer is to store it in operating system files. To
allow users to manipulate the information, the system has a number of application programs that
manipulate the files, including

 A program to debit or credit an account
 A program to add a new account
 A program to find the balance of an account
 A program to generate monthly statements

System programmers wrote these application programs to meet the needs of the bank. New application
programs are added to the system as the need arises. For example, suppose that the savings bank
decides to offer checking accounts. As a result, the bank creates new permanent files that contain
information about all the checking accounts maintained in the bank, and it may have to write new
application programs to deal with situations that do not arise in savings accounts, such as overdrafts.
Thus, as time goes by, the system acquires more files and more application programs.

This typical file-processing system is supported by a conventional operating system. The system stores
permanent records in various files, and it needs different application programs to extract records from,
and add records to, the appropriate files. Before database management systems (DBMSs) came along,
organizations usually stored information in such systems.

Keeping organizational information in a file-processing system has a number of major disadvantages:

 Data redundancy and inconsistency: Since different programmers create the files and
application programs over a long period, the various files are likely to have different formats and
the programs may be written in several programming languages. Moreover, the same
information may be duplicated in several places (files). For example, the address and telephone
number of a particular customer may appear in a file that consists of savings-account records
and in a file that consists of checking-account records. This redundancy leads to higher storage
and access cost. In addition, it may lead to data inconsistency; that is, the various copies of the
same data may no longer agree.
 Difficulty in accessing data: Conventional file-processing environments do not allow needed
data to be retrieved in a convenient and efficient manner. More responsive data-retrieval
systems are required for general use.
 Data isolation. Because data are scattered in various files, and files may be in different formats,
writing new application programs to retrieve the appropriate data is difficult.
 Integrity problems. The data values stored in the database must satisfy certain types of
consistency constraints. For example, the balance of a bank account may never fall below a
prescribed amount (say, $25). Developers enforce these constraints in the system by adding
appropriate code in the various application programs. However, when new constraints are



added, it is difficult to change the programs to enforce them. The problem is compounded when
constraints involve several data items from different files.
 Atomicity problems. Consider a program to transfer $50 from account A to account B. If a
system failure occurs during the execution of the program, it is possible that the $50 was
removed from account A but was not credited to account B, resulting in an inconsistent
database state. Clearly, it is essential to database consistency that either both the credit and
debit occur, or that neither occur. That is, the funds transfer must be atomic—it must happen in
its entirety or not at all. It is difficult to ensure atomicity in a conventional file-processing
system.
 Concurrent-access anomalies. Consider bank account ‘A’, containing $500. If two customers
withdraw funds (say $50 and $100 respectively) from account A at about the same time, the
result of the concurrent executions may leave the account in an incorrect (or inconsistent) state.
Suppose that the programs executing on behalf of each withdrawal read the old balance, reduce
that value by the amount being withdrawn, and write the result back. If the two programs run
concurrently, they may both read the value $500, and write back $450 and $400, respectively.
Depending on which one writes the value last, the account may contain $450 or $400, rather
than the correct value of $350. To guard against this possibility, the system must maintain some
form of supervision. But supervision is difficult to provide because data may be accessed by
many different application programs that have not been coordinated previously.
 Security problems. Not every user of the database system should be able to access all the data.
For example, in a banking system, payroll personnel need to see only that part of the database
that has information about the various bank employees. They do not need access to information
about customer accounts. But, since application programs are added to the system in an ad hoc
manner, enforcing such security constraints is difficult.
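The atomicity and concurrent-access problems above are exactly what DBMS transactions address. As an illustrative sketch (using Python's built-in sqlite3 module; the table and account names are hypothetical, not from the text), a funds transfer either commits in full or rolls back, so the database never shows a half-done transfer:

```python
import sqlite3

# An in-memory database with two hypothetical accounts.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)')
con.execute("INSERT INTO accounts VALUES ('A', 500), ('B', 100)")
con.commit()

try:
    with con:  # opens a transaction; commits on success, rolls back on error
        con.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'A'")
        con.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'B'")
except sqlite3.Error:
    pass  # on failure, neither UPDATE is applied

print(dict(con.execute('SELECT name, balance FROM accounts')))  # {'A': 450, 'B': 150}
```

If a failure occurred between the two UPDATE statements, the `with con:` block would roll both back, leaving A at 500 and B at 100, which is the atomicity property described above.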

These difficulties, among others, prompted the development of database systems. In what follows, we
shall see the concepts and algorithms that enable database systems to solve the problems with file-
processing systems. In most of this book, we use a bank enterprise as a running example of a typical
data-processing application found in a corporation.

Characteristics of Database

The database approach has some very characteristic features which are discussed in detail below:

Concurrent Use

A database system allows several users to access the database concurrently. Answering different
questions from different users with the same (base) data is a central aspect of an information system.
Such concurrent use of data increases the economy of a system.

An example of concurrent use is the travel database of a bigger travel agency. The employees of different branches can access the database concurrently and book journeys for their clients. Each travel agent sees on his interface whether there are still seats available for a specific journey or whether it is already fully booked.

Structured and Described Data

A fundamental feature of the database approach is that the database system does not only contain the
data but also the complete definition and description of these data. These descriptions are basically
details about the extent, the structure, the type and the format of all data and, additionally, the
relationship between the data. This kind of stored data is called metadata ("data about data").

Separation of Data and Applications

As described in the feature structured data the structure of a database is described through metadata
which is also stored in the database. Application software does not need any knowledge about the
physical data storage like encoding, format, storage place, etc. It only communicates with the
management system of a database (DBMS) via a standardized interface with the help of a standardized
language like SQL. The access to the data and the metadata is entirely done by the DBMS. In this way all
the applications can be totally separated from the data. Therefore database internal reorganizations or
improvement of efficiency do not have any influence on the application software.

Data Integrity

Data integrity is a byword for the quality and the reliability of the data of a database system. In a
broader sense data integrity includes also the protection of the database from unauthorized access
(confidentiality) and unauthorized changes. Data reflect facts of the real world.

Transactions

A transaction is a bundle of actions which are done within a database to bring it from one consistent state to a new consistent state. In between, the data may be temporarily inconsistent. A transaction is atomic, which means that it cannot be divided up any further. Within a transaction, all or none of the actions need to be carried out; doing only a part of the actions would lead to an inconsistent database state. One example of a transaction is the transfer of an amount of money from one bank account to another. The debit of the money from one account and the credit of it to another account together make a consistent transaction. This transaction is also atomic.

Data Persistence

Data persistence means that in a DBMS all data is maintained as long as it is not deleted explicitly. The life span of data needs to be determined directly or indirectly by the user and must not be dependent on system features. Additionally, data once stored in a database must not be lost. Changes of a database which are done by a transaction are persistent. When a transaction is finished, even a system crash cannot put the data in danger.

Advantages and Disadvantages of a DBMS

Using a DBMS to manage data has many advantages:

 Data independence: Application programs should be as independent as possible from details of data representation and storage. The DBMS can provide an abstract view of the data to insulate application code from such details.
 Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and retrieve
data efficiently. This feature is especially important if the data is stored on external storage
devices.
 Data integrity and security: If data is always accessed through the DBMS, the DBMS can enforce
integrity constraints on the data. For example, before inserting salary information for an
employee, the DBMS can check that the department budget is not exceeded. Also, the DBMS
can enforce access controls that govern what data is visible to different classes of users.
 Data administration: When several users share the data, centralizing the administration of data
can offer significant improvements. Experienced professionals who understand the nature of
the data being managed, and how different groups of users use it, can be responsible for
organizing the data representation to minimize redundancy and fine-tuning the storage of the
data to make retrieval efficient.
 Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the data in
such a manner that users can think of the data as being accessed by only one user at a time.
Further, the DBMS protects users from the effects of system failures.
 Reduced application development time: Clearly, the DBMS supports many important functions
that are common to many applications accessing data stored in the DBMS. This, in conjunction
with the high-level interface to the data, facilitates quick development of applications.

Disadvantages of a DBMS

 Danger of Overkill: For small and simple single-user applications, a database system is
often not advisable.
 Complexity: A database system creates additional complexity and requirements. The supply and
operation of a database management system with several users and databases is quite costly
and demanding.
 Qualified Personnel: The professional operation of a database system requires appropriately
trained staff. Without a qualified database administrator nothing will work for long.
 Costs: The use of a database system generates new costs, not only for the system itself but
also for additional hardware and the more complex handling of the system.



 Lower Efficiency: A database system is general-purpose software, which is often less efficient than
specialized software produced and optimized for exactly one problem.

Instances and Schemas

Databases change over time as information is inserted and deleted. The collection of information stored
in the database at a particular moment is called an instance of the database. The overall design of the
database is called the database schema. Schemas are changed infrequently, if at all.

The concept of database schemas and instances can be understood by analogy to a program written in a
programming language. A database schema corresponds to the variable declarations (along with
associated type definitions) in a program. Each variable has a particular value at a given instant. The
values of the variables in a program at a point in time correspond to an instance of a database schema.

Database systems have several schemas, partitioned according to the levels of abstraction.

The physical schema describes the database design at the physical level, while the logical schema
describes the database design at the logical level. A database may also have several schemas at the view
level, sometimes called sub-schemas that describe different views of the database.

Of these, the logical schema is by far the most important, in terms of its effect on application programs,
since programmers construct applications by using the logical schema. The physical schema is hidden
beneath the logical schema, and can usually be changed easily without affecting application programs.
Application programs are said to exhibit physical data independence if they do not depend on the
physical schema, and thus need not be rewritten if the physical schema changes.

We study languages for describing schemas, after introducing the notion of data models in the next
section.

5.7 Data Models

Underlying the structure of a database is the data model: a collection of conceptual tools for describing
data, data relationships, data semantics, and consistency constraints.

To illustrate the concept of a data model, we outline two data models in this section: the entity-
relationship model and the relational model. Both provide a way to describe the design of a database at
the logical level.

5.7.1 The Entity-Relationship Model

The entity-relationship (E-R) data model is based on a perception of a real world that consists of a
collection of basic objects, called entities, and of relationships among these objects. An entity is a



“thing” or “object” in the real world that is distinguishable from other objects. For example, each person
is an entity, and bank accounts can be considered as entities.

Entities are described in a database by a set of attributes. For example, the attributes account-number
and balance may describe one particular account in a bank, and they form attributes of the account
entity set. Similarly, attributes customer-name, customer-street, and customer-city may describe
a customer entity.

An extra attribute customer-id is used to uniquely identify customers (since it may be possible to have
two customers with the same name, street address, and city).

A unique customer identifier must be assigned to each customer. In the United States, many enterprises
use the social-security number of a person (a unique number the U.S. government assigns to every
person in the United States) as a customer identifier. A relationship is an association among several
entities. For example, a depositor relationship associates a customer with each account that she has.
The set of all entities of the same type and the set of all relationships of the same type are termed an
entity set and relationship set, respectively.
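As a minimal sketch, entity sets and a relationship set can be modelled with plain Python classes. The attribute names follow the text; the street and city values are invented for illustration, and this is not a real E-R modelling tool:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:           # an entity, described by a set of attributes
    customer_id: str      # extra attribute used to identify customers uniquely
    customer_name: str
    customer_street: str
    customer_city: str

@dataclass(frozen=True)
class Account:            # another entity set
    account_number: str
    balance: int

# the depositor relationship set associates customers with their accounts
jane = Customer("192-83-7465", "Jane R. Hathaway", "Main St", "Harrison")
acct = Account("A-101", 500)
depositor = {(jane.customer_id, acct.account_number)}
```

Two customers with the same name, street, and city would still be distinguishable, because the relationship refers to them by customer_id rather than by name.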

The overall logical structure (schema) of a database can be expressed graphically by an E-R diagram.

Basic Constructs of E-R Modeling

The ER model views the real world as a set of entities and the associations between them.

Entities

Entities are the principal data objects about which information is to be collected. Entities are usually
recognizable concepts, either concrete or abstract, such as persons, places, things, or events, which have
relevance to the database. Some specific examples of entities are EMPLOYEES, PROJECTS and INVOICES.
An entity is analogous to a table in the relational model.

Entities are classified as independent or dependent (in some methodologies, the terms used are strong
and weak, respectively). An independent entity is one that does not rely on another for identification. A
dependent entity is one that relies on another for identification.

An entity occurrence (also called an instance) is an individual occurrence of an entity. An occurrence is
analogous to a row in the relational table.

Relationships

A Relationship represents an association between two or more entities. Examples of relationships
would be:



 Employees are assigned to projects
 Projects have subtasks
 Departments manage one or more projects

Relationships are classified in terms of degree, connectivity, cardinality, and existence.

Attributes

Attributes describe the entity with which they are associated. A particular instance of an attribute is a
value. For example, "Jane R. Hathaway" is one value of the attribute Name. The domain of an attribute
is the collection of all possible values the attribute can have. The domain of Name is a character string.

Attributes can be classified as identifiers or descriptors. Identifiers, more commonly called keys,
uniquely identify an instance of an entity. A descriptor describes a non-unique characteristic of an entity
instance.

ER Notation

5.7.2 Relational Model

The relational model uses a collection of tables to represent both data and the relationships among
those data. Each table has multiple columns, and each column has a unique name.

The data is arranged in a relation which is visually represented in a two-dimensional table. The data is
inserted into the table in the form of tuples (which are simply rows). A tuple is formed by one or
more attributes, which are used as basic building blocks in the formation of various expressions
that are used to derive meaningful information. There can be any number of tuples in the
table, but all tuples contain the same fixed set of attributes, with varying values. The relational model is
implemented in databases where a relation is represented by a table, a tuple is represented by a row, an
attribute is represented by a column of the table, the attribute name is the name of the column, such as
'identifier', 'name', 'city' etc., and the attribute value contains the value for that column in the row. Constraints
are applied to the table and form the logical schema. In order to facilitate the selection of a particular
row/tuple from the table, the attributes (i.e., column names) are used, and to expedite the selection of
rows, some fields are defined uniquely so that they can be used as indexes; this helps in searching for the required
data as fast as possible. The relational algebra operations, such as Select, Intersection, Product,
Union, Difference, Project, Join, and Division, can also be performed on the relational database
model. Operations on the relational database model are facilitated with the help of different
conditional expressions, various key attributes, pre-defined constraints, etc.
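A few of the relational algebra operations named above (Select, Project, Union) can be sketched over relations represented as Python sets of tuples. This is an illustration of the operations only, not how a real DBMS implements them, and the sample data is invented:

```python
# a relation as a set of tuples over the attributes ("identifier", "name", "city")
r = {(1, "Alice", "Oslo"), (2, "Bob", "Bergen"), (3, "Carol", "Oslo")}
s = {(4, "Dan", "Oslo")}

def select(rel, pred):
    """Select (sigma): keep only the tuples that satisfy the predicate."""
    return {t for t in rel if pred(t)}

def project(rel, indices):
    """Project (pi): keep only the given columns; duplicates collapse, since a relation is a set."""
    return {tuple(t[i] for i in indices) for t in rel}

oslo_rows = select(r, lambda t: t[2] == "Oslo")  # tuples whose city is Oslo
cities = project(r, (2,))                        # just the city column
everyone = r | s                                 # Union of two compatible relations
```

Representing relations as sets makes Union, Intersection, and Difference come for free as Python's set operators (`|`, `&`, `-`).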

Relational Model Concepts

We shall represent a relation as a table with columns and rows. Each column of the table has a name, or
attribute. Each row is called a tuple.

• Domain: a set of atomic values that an attribute can take

• Attribute: name of a column in a particular table (all data is stored in tables). Each attribute Ai
must have a domain, dom(Ai).

• Relational Schema: The design of one table, containing the name of the table (i.e. the name of the
relation), and the names of all the columns, or attributes.

Relational keys

There are two kinds of keys in relations. The first are identifying keys: the primary key is the main
concept, while two other keys – super key and candidate key – are related concepts. The second kind is
the foreign key.



Identity Keys

Super Keys

A super key is a set of attributes whose values can be used to uniquely identify a tuple within a relation.
A relation may have more than one super key, but it always has at least one: the set of all attributes that
make up the relation.

Candidate Keys

A candidate key is a super key that is minimal; that is, no proper subset of it is itself a super key.
A relation may have more than one candidate key, and the different candidate keys may have a different
number of attributes. In other words, you should not interpret 'minimal' to mean the super key with the
fewest attributes.

A candidate key has two properties:

 in each tuple of R, the values of K uniquely identify that tuple (uniqueness)
 no proper subset of K has the uniqueness property (irreducibility).
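Both properties can be checked mechanically. The sketch below brute-forces the minimal candidate keys of a small example relation; the attribute names and rows are made up for illustration, and this exhaustive approach is only feasible for relations with few attributes:

```python
from itertools import combinations

attrs = ("id", "name", "dept")
rows = [
    (1, "Alice", "HR"),
    (2, "Bob",   "HR"),
    (3, "Alice", "IT"),
]

def is_super_key(cols):
    """Uniqueness: the projection onto `cols` contains no duplicate tuples."""
    seen = [tuple(r[attrs.index(c)] for c in cols) for r in rows]
    return len(set(seen)) == len(rows)

def candidate_keys():
    """Candidate keys = super keys with no proper subset that is also a super key."""
    supers = [set(c) for n in range(1, len(attrs) + 1)
              for c in combinations(attrs, n) if is_super_key(c)]
    return [k for k in supers if not any(s < k for s in supers)]

keys = candidate_keys()
```

Here both {id} and {name, dept} come out as candidate keys, which also illustrates the warning above: candidate keys may have different numbers of attributes.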

Primary Key

The primary key of a relation is a candidate key especially selected to be the key for the relation. In
other words, it is a choice, and there can be only one candidate key designated to be the primary key.

Foreign keys

A foreign key is an attribute (or set of attributes) within one relation that matches a candidate key of
another relation. A relation may have several foreign keys, associated with different target relations.

Foreign keys allow users to link information in one relation to information in another relation. Without
FKs, a database would be a collection of unrelated tables.
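SQLite can demonstrate the point. The table layout is hypothetical, and note that SQLite only enforces foreign keys once the pragma is switched on; with it on, a row that references a non-existent customer is rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite does not enforce FKs by default
conn.executescript("""
    create table customer (customer_id text primary key, customer_name text);
    create table account  (account_number text primary key,
                           balance integer,
                           customer_id text references customer(customer_id));
""")
conn.execute("insert into customer values ('192-83-7465', 'Jane R. Hathaway')")
conn.execute("insert into account values ('A-101', 500, '192-83-7465')")  # ok

try:
    conn.execute("insert into account values ('A-102', 300, 'no-such-id')")
except sqlite3.IntegrityError:
    print("rejected: no matching customer")  # the FK links the two relations
```

The foreign key is what turns the two tables into related information: each stored account is guaranteed to point at an existing customer.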

5.7.3 Object oriented data model

The object-oriented model can be seen as extending the E-R model with notions such as encapsulation,
methods (functions), and object identity.

Object oriented databases are also called Object Database Management Systems (ODBMS). Object
databases store objects rather than data such as integers, strings or real numbers. Objects are used in
object oriented languages such as Smalltalk, C++, Java, and others. Objects basically consist of the
following:



 Attributes - Attributes are data that define the characteristics of an object. This data may be
simple, such as integers, strings, and real numbers, or it may be a reference to a complex object.
 Methods - Methods define the behavior of an object and are what were formerly called
procedures or functions.

Therefore objects contain both executable code and data. There are other characteristics of objects such
as whether methods or data can be accessed from outside the object. We don't consider this here, to
keep the definition simple and to apply it to what an object database is. One other term worth
mentioning is classes. Classes are used in object oriented programming to define the data and methods
the object will contain. The class is like a template to the object. The class does not itself contain data or
methods but defines the data and methods contained in the object. The class is used to create
(instantiate) the object. Classes may be used in object databases to recreate parts of the object that may
not actually be stored in the database. Methods may not be stored in the database and may be
recreated by using a class.
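In Python terms (illustrative only, not tied to any particular ODBMS), the class is the template, the object instantiated from it holds the attribute data, and the methods define its behavior:

```python
class Account:
    """Class: defines the attributes and methods its objects will contain."""

    def __init__(self, account_number, balance):
        self.account_number = account_number  # attributes: the object's data
        self.balance = balance

    def deposit(self, amount):                # method: the object's behavior
        self.balance += amount
        return self.balance

acct = Account("A-101", 500)  # instantiation: the class creates the object
acct.deposit(100)
```

An object database would store `acct` (its attribute values) directly; the `deposit` method need not be stored, since it can be recreated from the class, exactly as the paragraph above describes.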

The object-relational data model combines features of the object-oriented data model and relational
data model. Semi-structured data models permit the specification of data where individual data items of
the same type may have different sets of attributes. This is in contrast with the data models mentioned
earlier, where every data item of a particular type must have the same set of attributes. The extensible
markup language (XML) is widely used to represent semi-structured data.

5.7.4 Network Database Model

A network database model is a database model that allows multiple records to be linked to the same
owner file. The model can be seen as an upside-down tree where the branches are the member
information linked to the owner, which is the bottom of the tree. These multiple linkages make the
network database model very flexible. In addition, relationships in the network database model are
many-to-many, because one owner file can be linked to many member files and vice versa.

The network model allows each record to have multiple parent and child records, forming a generalized
graph structure. This property applies at two levels: the schema is a generalized graph of record types
connected by relationship types (called "set types" in CODASYL), and the database itself is a generalized
graph of record occurrences connected by relationships (CODASYL "sets"). Cycles are permitted at both
levels. The chief argument in favor of the network model, in comparison to the hierarchic model, was
that it allowed a more natural modeling of relationships between entities.

5.7.5 Hierarchical Database

A hierarchical database is a design that uses a one-to-many relationship for data elements. Hierarchical
database models use a tree structure that links a number of disparate elements to one "owner," or
"parent," primary record.



The idea behind hierarchical database models is useful for a certain type of data storage, but it is not
extremely versatile. Its limitations mean that it is confined to some very specific uses. For example,
where each individual person in a company may report to a given department, the department can be
used as a parent record and the individual employees will represent secondary records, each of which
links back to that one parent record in a hierarchical structure.

Hierarchical databases were popular in early database design, in the era of mainframe computers. While
some IBM and Microsoft models are still in use, many other types of business databases use more
flexible models to accommodate more sophisticated types of data management. Hierarchical models
make the most sense where the primary focus of information gathering is on a concrete hierarchy such
as a list of business departments, assets or people that will all be associated with specific higher-level
primary data elements.

Hierarchical database model

A hierarchical database model is a data model in which the data is organized into a tree-like structure.
The data is stored as records which are connected to one another through links. A record is a collection
of fields, with each field containing only one value. The entity type of a record defines which fields the
record contains.

Example of a hierarchical model

A record in the hierarchical database model corresponds to a row (or tuple) in the relational database
model and an entity type corresponds to a table (or relation).

The hierarchical database model mandates that each child record has only one parent, whereas each
parent record can have one or more child records. In order to retrieve data from a hierarchical database
the whole tree needs to be traversed, starting from the root node. This model is recognized as the first
database model, created by IBM in the 1960s.
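The one-parent/many-children rule and the root-first traversal can be sketched as a plain tree; the department and employee names below are invented for illustration:

```python
# each record has exactly one parent; each parent lists its child records
tree = {
    "Company": ["Sales", "IT"],   # root record
    "Sales":   ["Ann", "Ben"],
    "IT":      ["Cara"],
    "Ann": [], "Ben": [], "Cara": [],
}

def traverse(node, depth=0, out=None):
    """Retrieve data by walking the whole tree, starting from the root."""
    if out is None:
        out = []
    out.append("  " * depth + node)   # indent by depth to show the hierarchy
    for child in tree[node]:
        traverse(child, depth + 1, out)
    return out

listing = traverse("Company")
```

Because every record has a single parent, any record can be reached by exactly one path from the root, which is both the model's strength (simple, fast navigation along that path) and its limitation.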



5.8 Database Languages

A database system provides a data definition language to specify the database schema and a data
manipulation language to express database queries and updates. In practice, the data definition and
data manipulation languages are not two separate languages; instead they simply form parts of a single
database language, such as the widely used SQL language.

5.8.1 Data-Definition Language

We specify a database schema by a set of definitions expressed by a special language called a data-
definition language (DDL).

For instance, the following statement in the SQL language defines the account table:

create table account (account-number char(10), balance integer)

Execution of the above DDL statement creates the account table. In addition, it updates a special set of
tables called the data dictionary or data directory.

A data dictionary contains metadata—that is, data about data. The schema of a table is an example of
metadata. A database system consults the data dictionary before reading or modifying actual data.

We specify the storage structure and access methods used by the database system by a set of
statements in a special type of DDL called a data storage and definition language. These statements
define the implementation details of the database schemas, which are usually hidden from the users.

The data values stored in the database must satisfy certain consistency constraints. For example,
suppose the balance on an account should not fall below $100. The DDL provides facilities to specify
such constraints. The database systems check these constraints every time the database is updated.
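Running the DDL above in SQLite shows both effects: the table is created, its definition lands in the catalog (`sqlite_master`, SQLite's data dictionary), and a CHECK constraint enforces the $100 minimum balance. The constraint syntax is SQLite's, added here for illustration, and the hyphen in account-number is replaced by an underscore to make it a legal SQL identifier:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table account (
                    account_number char(10),
                    balance integer check (balance >= 100))""")

# the schema itself is metadata, stored in the data dictionary
ddl = conn.execute(
    "select sql from sqlite_master where name = 'account'"
).fetchone()[0]

conn.execute("insert into account values ('A-101', 500)")     # satisfies the constraint
try:
    conn.execute("insert into account values ('A-102', 50)")  # violates it
except sqlite3.IntegrityError:
    print("rejected: balance below $100")
```

The constraint is checked on every update, just as the text describes: the second insert never reaches the table.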

5.8.2 Data-Manipulation Language

Data manipulation is:

 The retrieval of information stored in the database
 The insertion of new information into the database
 The deletion of information from the database
 The modification of information stored in the database

A data-manipulation language (DML) is a language that enables users to access or manipulate data as
organized by the appropriate data model. There are basically two types:



 Procedural DMLs require a user to specify what data are needed and how to get those data.
 Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data
are needed without specifying how to get those data.

Declarative DMLs are usually easier to learn and use than are procedural DMLs. However, since a user
does not have to specify how to get the data, the database system has to figure out an efficient means
of accessing data. The DML component of the SQL language is nonprocedural.

A query is a statement requesting the retrieval of information. The portion of a DML that involves
information retrieval is called a query language. Although technically incorrect, it is common practice to
use the terms query language and data manipulation language synonymously.

This query in the SQL language finds the name of the customer whose customer-id is 192-83-7465:

select customer.customer-name from customer where customer.customer-id = 192-83-7465
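The same declarative query can be run against a small SQLite customer table. The sample row is invented, the hyphens are replaced by underscores for SQL identifiers, and the customer-id is quoted here since it is stored as text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table customer (customer_id text, customer_name text)")
conn.execute("insert into customer values ('192-83-7465', 'Jane R. Hathaway')")

# declarative DML: states WHAT is wanted; the DBMS decides HOW to retrieve it
name = conn.execute(
    "select customer_name from customer where customer_id = '192-83-7465'"
).fetchone()[0]
```

Note that nothing in the query says whether to scan the table or use an index; choosing an efficient access path is exactly the job the nonprocedural DML leaves to the database system.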

5.8.3 Data Dictionary

We can define a data dictionary as a DBMS component that stores the definitions of data characteristics
and relationships. You may recall that such "data about data" were labeled metadata. The DBMS data
dictionary provides the DBMS with its self-describing characteristic. In effect, the data dictionary
resembles an X-ray of the company's entire data set, and is a crucial element in the data administration
function.

Two main types of data dictionary exist: integrated and stand-alone. An integrated data dictionary is
included with the DBMS. For example, all relational DBMSs include a built-in data dictionary or system
catalog that is frequently accessed and updated by the RDBMS. Other DBMSs, especially older types, do
not have a built-in data dictionary; instead, the DBA may use third-party stand-alone data dictionary
systems.

Data dictionaries can also be classified as active or passive. An active data dictionary is automatically
updated by the DBMS with every database access, thereby keeping its access information up-to-date. A
passive data dictionary is not updated automatically and usually requires a batch process to be run. Data
dictionary access information is normally used by the DBMS for query optimization purposes.

5.8.4 The Three-Schema Architecture

The goal of the three-schema architecture, illustrated in Figure 1.1, is to separate the user applications
and the physical database. In this architecture, schemas can be defined at the following three levels:



1. The internal level has an internal schema, which describes the physical storage structure of the
database. The internal schema uses a physical data model and describes the complete details of
data storage and access paths for the database.

2. The conceptual level has a conceptual schema, which describes the structure of the whole
database for a community of users. The conceptual schema hides the details of physical storage
structures and concentrates on describing entities, data types, relationships, user operations,
and constraints. A high-level data model or an implementation data model can be used at this
level.

3. The external or view level includes a number of external schemas or user views. Each
external schema describes the part of the database that a particular user group is interested in
and hides the rest of the database from that user group. A high-level data model or an
implementation data model can be used at this level.

The three-schema architecture is a convenient tool for the user to visualize the schema levels in
a database system. Most DBMSs do not separate the three levels completely, but support the
three-schema architecture to some extent. Some DBMSs may include physical-level details in
the conceptual schema. In most DBMSs that support user views, external schemas are specified
in the same data model that describes the conceptual-level information. Some DBMSs allow
different data models to be used at the conceptual and external levels.

Notice that the three schemas are only descriptions of data; the only data that actually exists is
at the physical level. In a DBMS based on the three-schema architecture, each user group refers
only to its own external schema. Hence, the DBMS must transform a request specified on an
external schema into a request against the conceptual schema, and then into a request on the
internal schema for processing over the stored database. If the request is a database retrieval, the
data extracted from the stored database must be reformatted to match the user's external
view. The processes of transforming requests and results between levels are called mappings.
These mappings may be time-consuming, so some DBMSs—especially those that are meant to
support small databases—do not support external views. Even in such systems, however, a
certain amount of mapping is necessary to transform requests between the conceptual and
internal levels.
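An external schema (user view) over a conceptual schema can be sketched with a SQL view; the DBMS then maps queries on the view into queries on the underlying table. The table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# conceptual schema: the full employee relation
conn.execute("create table employee (id integer, name text, salary integer)")
conn.execute("insert into employee values (1, 'Ann', 50000), (2, 'Ben', 60000)")

# external schema: a view that hides the salary column from this user group
conn.execute("create view employee_public as select id, name from employee")

# a request against the external schema; the DBMS maps it onto the base table
rows = conn.execute("select id, name from employee_public order by id").fetchall()
```

This user group sees only id and name; the salary attribute, and of course the physical storage beneath it, stay hidden, which is the external/conceptual/internal separation in miniature.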

5.9 Database Architecture

5.9.1 Centralized database

A centralized database is a collection of information at a single location accessible from numerous
points, in contrast with a distributed database where the information is spread out across multiple sites.
There are advantages and disadvantages to this setup that can become considerations when people
make decisions about how to configure databases. This is important to think about when setting up a
new database or retrofitting a database to meet new needs.

There are a number of ways to set up the centralized database. Multiple programming languages are
well suited to database building and companies can also purchase database software rather than
developing their own. Users may have a number of ways to access material, and the database can be set
up with varying security levels to allow for more access controls. Information technology staff maintain
the database with various operations to keep it orderly and address early signs of problems such as virus
infections. They can also change access levels on request and administer the security system.

One advantage of the centralized database is the ability to access all the information in one location.
Searches of the database can be fast because the search engine does not need to check multiple
locations to return results. Information may also be easier to organize in a single location. In a database
upgrade to handle more information, servers can be added to the database site easily, and the company
will not have to balance the needs of a distributed database.

A centralized database can also be easier to physically secure. It can be enclosed in a variety of ways to
protect it from theft, sabotage, fire, and other issues. It is also possible to set up an extremely
robust computer security system to prevent unauthorized access. For extremely sensitive databases, the
computers may not be connected to a network, and users will have to physically enter the database
location to pull information. This may be used with some government computers that contain high-
security information.

There can also be disadvantages. A centralized database tends to create bottlenecks if multiple users
need to access it and their needs are substantial. It can also be very vulnerable if something happens to
it and a backup has not been performed or the existing backup is outdated. One advantage of
distributed databases is the redundancy factor, which can allow the system to function even if an
individual database is down.



5.9.2 Distributed database

Database design typically includes the physical layout of hardware and software devices that manage a
company's data storage. There are multiple techniques that can be applied when designing a database.
A distributed database is a database that is split over multiple hardware devices but managed by a
central database controller. This distributed approach typically provides better performance and
reliability.

Dividing a database into separate physical units has many benefits. This approach provides better
control over specific data. It also distributes the load on the computer hardware and network devices.

A distributed database is normally separated by business units, companies, or geographical regions. This
approach provides for faster response times for users because the database is local to each business
unit within the organization. The business unit is typically smaller than the entire organization, which
reduces the overall load on each server.

Most large companies have separate business units for specific functions. Some examples include
accounting, human resources, and sales departments. A distributed database is designed to serve
specific business units throughout the organization, while maintaining control from a central server. This
technique enables the separation of hardware and data throughout the company, which provides for
better control and overall performance.

A distributed database design provides the benefits of central access by corporate headquarters, while
enabling local access for specific business units. This is a good design for companies that are dispersed
throughout the world. It is also recommended for organizations that support multiple portfolios. Some
examples of industries that would benefit from this design include manufacturing, hospitality, and
banking.

A distributed database might also be used in an accounting operation. A global organization would
typically include a distributed database designed to serve each country. This geographical distribution
approach would enable the local country to query data faster. The central database would access each
country's data without impacting each local accounting application.

Distributed databases provide better flexibility for a business. With the data divided between multiple
servers, it can easily be replicated onto new hardware throughout the organization. This reduces the risk
of unavailable data due to hardware failure.

There are some drawbacks to a distributed database design. The most prevalent concerns are database
integrity and concurrency. At times the distributed data may become unavailable to the central server.
This is typically due to network issues within the computer system. While the database will remain
available to the local business units, it may become outdated at the central headquarters of the
organization until the network issue is repaired.



5.9.3 Client server architecture

The client–server model of computing is a distributed application structure that partitions tasks or
workloads between the providers of a resource or service, called servers, and service requesters,
called clients. Often clients and servers communicate over a computer network on separate hardware,
but both client and server may reside in the same system. A server host runs one or more server
programs which share their resources with clients. A client does not share any of its resources, but
requests a server's content or service function. Clients therefore initiate communication sessions with
servers which await incoming requests.

The data processing is split into distinct parts. A part is either a requester (client) or a provider (server).
During data processing, the client sends one or more requests to the servers to perform specified tasks.
The server part provides services for the clients.

This basic structure is called a 2-tier structure. The client and server parts may reside on the same node or
on different nodes. A part can play the roles of a server of one service and a client of another service at the
same time. A client can be connected to several servers.

5.10 Data mining

Data Mining, also popularly known as Knowledge Discovery in Databases (KDD), refers to the nontrivial
extraction of implicit, previously unknown and potentially useful information from data in databases.
While data mining and knowledge discovery in databases (or KDD) are frequently treated as synonyms,
data mining is actually part of the knowledge discovery process. The following figure shows data mining
as a step in an iterative knowledge discovery process.



The Knowledge Discovery in Databases process comprises a few steps leading from raw data
collections to some form of new knowledge. The iterative process consists of the following steps:

 Data cleaning: also known as data cleansing, it is a phase in which noisy and irrelevant data
are removed from the collection.
 Data integration: at this stage, multiple data sources, often heterogeneous, may be combined in
a common source.
 Data selection: at this step, the data relevant to the analysis is decided on and retrieved from
the data collection.
 Data transformation: also known as data consolidation, it is a phase in which the selected data
is transformed into forms appropriate for the mining procedure.
 Data mining: the crucial step in which clever techniques are applied to extract potentially
useful patterns.
 Pattern evaluation: in this step, strictly interesting patterns representing knowledge are
identified based on given measures.
 Knowledge representation: the final phase, in which the discovered knowledge is visually
represented to the user. This essential step uses visualization techniques to help users
understand and interpret the data mining results.
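The steps above can be illustrated with a toy pipeline over hypothetical purchase records. The field names, the two example "sources", and the support threshold are illustrative assumptions, not part of any standard KDD tool.

```python
# A toy walk-through of the KDD steps: cleaning, integration, selection,
# transformation, mining, evaluation, and representation.
from collections import Counter

# Raw data from two heterogeneous sources (made-up records)
store_a = [{"customer": "c1", "item": "milk"}, {"customer": "c2", "item": None}]
store_b = [{"customer": "c1", "item": "bread"}, {"customer": "c3", "item": "milk"}]

# Data integration: combine the sources; data cleaning: drop noisy records
cleaned = [r for r in store_a + store_b if r["item"] is not None]

# Data selection: keep only the attribute relevant to the analysis
selected = [r["item"] for r in cleaned]

# Data transformation: consolidate into a form fit for the mining procedure
counts = Counter(selected)

# Data mining + pattern evaluation: keep items bought at least twice
patterns = {item: n for item, n in counts.items() if n >= 2}

# Knowledge representation: present the discovered patterns to the user
for item, n in sorted(patterns.items()):
    print(f"frequent item: {item} (support={n})")
```

Real KDD pipelines apply the same sequence of phases, only with far larger data collections and more sophisticated mining algorithms.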

Application of data mining

 Sales/Marketing: Data mining enables businesses to understand the hidden patterns in
historical purchasing transaction data, helping them plan and launch new marketing
campaigns in a prompt and cost-effective way.
 Banking / Finance: Several data mining techniques, e.g., distributed data mining, have been
researched, modeled, and developed to help with credit card fraud detection. Data mining is also
used to measure customer loyalty by analyzing customers' purchasing activities, such as the
frequency of purchases over a period of time, the total monetary value of all purchases, and
when the last purchase was made. After analyzing those dimensions, a relative score is
generated for each customer. The higher the score, the more loyal the customer is.
 Health Care and Insurance: The growth of the insurance industry depends heavily on the ability
to convert data into knowledge, information, or intelligence about customers, competitors, and
markets. Data mining has been applied in the insurance industry only recently, but it has already
brought tremendous competitive advantages to the companies that have implemented it
successfully.
 Data Mining Applications in Medicine: Data mining makes it possible to characterize patient
activities, to anticipate incoming office visits, and to identify the patterns of successful medical
therapies for different illnesses.
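The loyalty scoring described under Banking / Finance can be sketched as a simple function of the three dimensions mentioned: frequency of purchase, total monetary value, and recency of the last purchase. The weights and customer data below are illustrative assumptions, not a standard formula.

```python
# Hedged sketch of a loyalty score combining frequency, monetary value,
# and recency (days since last purchase). Weights are arbitrary for illustration.
def loyalty_score(frequency, monetary, days_since_last):
    recency = 1.0 / (1.0 + days_since_last)   # a long gap lowers the score
    return 2.0 * frequency + 0.01 * monetary + 100.0 * recency

customers = {
    "alice": loyalty_score(frequency=12, monetary=900.0, days_since_last=3),
    "bob":   loyalty_score(frequency=2,  monetary=150.0, days_since_last=60),
}
# The higher the score, the more loyal the customer.
best = max(customers, key=customers.get)
print(best)  # prints "alice"
```

Production systems derive such weights from historical data (for example via regression or clustering) rather than fixing them by hand.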

5.11 Computational biology

Broadly speaking, computational biology is the application of computer science, statistics, and
mathematics to problems in biology. Computational biology spans a wide range of fields within biology,
including genomics/genetics, biophysics, cell biology, biochemistry, and evolution. Likewise, it makes
use of tools and techniques from many different quantitative fields, including algorithm design, machine
learning, Bayesian and frequentist statistics, and statistical physics.

Much of computational biology is concerned with the analysis of molecular data, such as bio-sequences
(DNA, RNA, or protein sequences), three-dimensional protein structures, gene expression data, or
molecular biological networks (metabolic pathways, protein-protein interaction networks, or gene
regulatory networks). A wide variety of problems can be addressed using these data, such as the
identification of disease-causing genes, the reconstruction of the evolutionary histories of species, and
the unlocking of the complex regulatory codes that turn genes on and off. Computational biology can
also be concerned with non-molecular data, such as clinical or ecological data.
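A small taste of the bio-sequence analysis mentioned above: two common operations on a DNA sequence are computing its GC content and its reverse complement. The sequence below is made up for illustration.

```python
# Elementary DNA sequence analysis: GC content and reverse complement.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def gc_content(seq):
    # Fraction of bases that are G or C
    return (seq.count("G") + seq.count("C")) / len(seq)

def reverse_complement(seq):
    # Complement each base, reading the strand in the reverse direction
    return "".join(COMPLEMENT[base] for base in reversed(seq))

seq = "ATGCGC"
print(round(gc_content(seq), 2))   # 4 of the 6 bases are G or C -> 0.67
print(reverse_complement(seq))     # prints "GCGCAT"
```

Real analyses scale these ideas to genomes of millions of bases, which is where the algorithm design and machine learning techniques mentioned above become essential.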

Differences between computational biology and bioinformatics

The terms computational biology and bioinformatics are often used interchangeably. However,
computational biology sometimes connotes the development of algorithms, mathematical models, and
methods for statistical inference, while bioinformatics is more associated with the development of
software tools, databases, and visualization methods.

Software tools for computational biology

Computational biologists use a wide range of software, from command-line programs to graphical
and web-based tools. Some examples are as follows:

 Antimony: Antimony is a human-readable and human-writable language for describing
biological modules. The modules can be connected together by declaring overlapping molecular
species between two modules or via the PoPS in/PoPS out interface.



 Athena: Athena is a tool for building, simulating, and analyzing genetic circuits (as well as
metabolic/signaling networks, such as SBML files). It provides a visual interface for building
biological modules that can be saved and later connected together. The connection can be
achieved using either the PoPS interface or by defining overlapping molecular species (similar to
the concept of a module in CellML and SBML). In addition to simulation, Athena supports several
other useful features: a database of the E. coli regulatory network from RegulonDB, a graphical
view of part sequences, automated derivation of transcription rate equations, an interface to all
Systems Biology Workbench programs, an interface with the R statistical language, and an easy
plug-in architecture.
 SynBioSS: SynBioSS (Synthetic Biology Software Suite) is a software suite for the quantitative
simulation of biochemical networks using hybrid stochastic algorithms. Its authors believe that
one should not need to know how to program (or use a command line) to use sophisticated
numerical methods. The software aims to put the most powerful techniques for simulating
chemically reacting networks into the hands of biologists (or any scientist who can put them to
good scientific use). SynBioSS can accurately simulate any system modeled as a network of
reactions. To achieve this, state-of-the-art algorithms are wrapped inside a user-friendly
graphical user interface (GUI) that handles input data, runs the simulations, and vividly
visualizes simulation results, without requiring any programming background from the user. The
software is open source and runs on the three platforms most used by scientists: Windows,
Macintosh, and Linux.
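The stochastic simulation of biochemical networks that tools like SynBioSS perform can be illustrated with Gillespie's direct method on a single degradation reaction A -> 0 with rate constant k. This is a generic textbook sketch, not SynBioSS code; the rate constant, initial count, and end time are arbitrary.

```python
# Gillespie stochastic simulation of the degradation reaction A -> 0.
# Each event removes one molecule; waiting times are exponentially distributed
# with rate equal to the current propensity k * n.
import random

def gillespie_degradation(n0, k, t_end, rng):
    t, n = 0.0, n0
    while n > 0:
        propensity = k * n                  # reaction propensity for A -> 0
        t += rng.expovariate(propensity)    # time until the next reaction fires
        if t > t_end:
            break                           # simulation horizon reached
        n -= 1                              # one molecule of A degrades
    return n

rng = random.Random(42)                     # fixed seed for repeatability
remaining = gillespie_degradation(n0=100, k=0.5, t_end=2.0, rng=rng)
print(remaining)  # stochastic; on average near 100 * exp(-0.5 * 2) ≈ 37
```

Hybrid algorithms such as those in SynBioSS combine this kind of exact stochastic stepping for rare reactions with faster approximate methods for abundant species.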

5.12 Computational nanoscience

Nanoscience and nanotechnology are changing the nature of almost every human-made object in this
century. Advances in the field of nanoscience empower us with new tools for producing electronic
devices at ever-decreasing scales. Many people have projected that nanometer-scale devices will
continue this trend, bringing control of matter to unprecedented scales. This includes scale reduction
not only in microelectronics, but also, in the shorter term, in fields such as quantum-switch-based
computing. These
advances have the potential to change the way we engineer our environment, construct and control
systems, and interact in society. Computational science, which has emerged as a third way of doing
research, one that complements theory and experiment, plays a key role in developing our
understanding of materials at the nanometer scale and in the development “by-design” of new
nanoscale materials and devices. Hence, modeling and simulation are now integral components of
scientific research.

