Computer Applications: System Analysis and Design
System Analysis and Design:
A collection of components that work together to realize some objective forms a system. Basically there are three major components in every system, namely input, processing and output.
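The three components named above can be pictured with a minimal Python sketch. The class name `SimpleSystem` and the payroll figures are hypothetical and chosen only for illustration; they do not come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class SimpleSystem:
    """A toy system: it accepts input, processes it, and produces output."""
    process: Callable[[float], float]  # the processing component

    def run(self, inputs: Iterable[float]) -> List[float]:
        # The output is the result of applying the processing step to each input.
        return [self.process(x) for x in inputs]


# Hypothetical example: gross pay goes in, net pay (after a 10% deduction) comes out.
payroll = SimpleSystem(process=lambda gross: round(gross * 0.90, 2))
net_pay = payroll.run([1000.0, 2500.0])
print(net_pay)  # [900.0, 2250.0]
```

The processing step is passed in as a function so that the same input and output handling can serve different objectives.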
In a system the different components are connected with each other and are interdependent. For example, the human body represents a complete natural system. We are also bound by many national systems such as the political system, the economic system, the educational system and so forth. The objectives of the system demand that some output is produced as a result of processing suitable inputs. The term system originates from the Greek term systema, which means to place together. Multiple business and engineering domains have definitions of a system. This text defines a system as:
System: An integrated set of interoperable elements, each with explicitly specified and bounded capabilities, working synergistically to perform value-added processing to enable a User to satisfy mission-oriented operational needs in a prescribed operating environment with a specified outcome and probability of success.
Characteristics of a System
1. Organization: The structure or order in which the system is built.
2. Interaction: The procedure by which the components interact.
3. Interdependence: How the modules in a system depend on one another.
4. Integration: How different modules are integrated to form a full system.
5. Central Objective: How the system is made to achieve the central goal, and its performance towards that goal.
Types of Systems
1. Physical or abstract systems
2. Open or closed systems
3. Deterministic or probabilistic systems
4. Man-made systems
Formal systems: organization representation
Informal systems: employee-based system
Computer-based information systems: computers handling business applications. These are collectively known as Computer Based Information Systems (CBIS).
a. Transaction Processing System (TPS)
b. Management Information System (MIS)
c. Decision Support System (DSS)
d. Office Automation System (OAS)
Systems analysis is the interdisciplinary part of science dealing with the analysis of sets of interacting entities (the systems), often prior to their automation as computer systems, and with the interactions within those systems. This field is closely related to operations research. It is also "an explicit formal inquiry carried out to help someone, referred to as the decision maker, identify a better course of action and make a better decision than he might have otherwise made."
System study
Feasibility study
System analysis
System design
Coding and Implementation
Testing
User Implementation
Maintenance
29.5 PHASES OF SYSTEM DEVELOPMENT LIFE CYCLE
Let us now describe the different phases and the related activities of the system development life cycle in detail.
(a) System Study
System study is the first stage of the system development life cycle. It gives a clear picture of what the physical system actually is. In practice, the system study is done in two phases. In the first phase, a preliminary survey of the system is done, which helps in identifying the scope of the system. The second phase is a more detailed and in-depth study in which the users' requirements and the limitations and problems of the present system are identified. After completing the system study, a system proposal is prepared by the System Analyst (who studies the system) and placed before the user. The system proposal contains the findings about the present system and recommendations to overcome its limitations and problems in the light of the users' requirements. To describe the system study phase more analytically, we would say that system study phase passes through the following steps:
(b) Feasibility Study
On the basis of the results of the initial study, the feasibility study takes place. The feasibility study is basically a test of the proposed system in the light of its workability, its ability to meet users' requirements, its effective use of resources and, of course, its cost effectiveness. The main goal of the feasibility study is not to solve the problem but to achieve the scope. In the process of the feasibility study, the costs and benefits are estimated with greater accuracy.
There are 3 types of feasibility:
1. Economical
2. Operational
3. Technical
(c) System Analysis
Assuming that a new system is to be developed, the next phase is system analysis. Analysis involves a detailed study of the current system, leading to specifications of a new system. It is a detailed study of the various operations performed by a system and their relationships within and outside the system. During analysis, data are collected on the available files, decision points and transactions handled by the present system. Interviews, on-site observation and questionnaires are the tools used for system analysis. Using the following steps it becomes easy to draw the exact boundary of the new system under consideration:
Keep in view the problems and the new requirements.
Work out the pros and cons, including new areas of the system.
All procedures and requirements must be analyzed and documented in the form of detailed data flow diagrams (DFDs), a data dictionary, logical data structures and miniature specifications. System analysis also includes subdividing the complex processes involving the entire system, and identifying data stores and manual processes. The main points to be discussed in system analysis are:
Specification of what the new system is to accomplish, based on the user requirements.
Functional hierarchy showing the functions to be performed by the new system and their relationship with each other.
Function networks, which are similar to the function hierarchy but highlight those functions which are common to more than one procedure.
List of attributes of the entities: these are the data items which need to be held about each entity (record).
(d) System Design
Based on the user requirements and the detailed analysis of a new system, the new system must be designed. This is the phase of system designing. It is the most crucial phase in the development of a system. Normally, the design proceeds in two stages:
Preliminary or general design: In the preliminary or general design, the features of the new system are specified. The costs of implementing these features and the benefits to be derived are estimated. If the project is still considered to be feasible, we move to the detailed design stage.
Structure or Detailed design: In the detailed design stage, computer-oriented work begins in earnest. At this stage, the design of the system becomes more structured. Structure design is a blueprint of a computer system solution to a given problem, having the same components and the same interrelationships among those components as the original problem. Input, output and processing specifications are drawn up in detail. In the design stage, the programming language and the platform on which the new system will run are also decided. There are several tools and techniques used for designing. These tools and techniques are:
Flowchart
Data flow diagrams (DFDs)
Data dictionary
Structured English
Decision table
Decision tree
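As an illustration of one of the tools listed above, a decision table can be sketched in Python as a mapping from condition combinations to actions. The discount policy, the threshold of 1000 and the names `DECISION_TABLE` and `decide` are hypothetical and are used only to show the idea.

```python
# Decision table for a hypothetical order-discount policy.
# Each combination of conditions (is_member, order_over_1000) maps to one action.
DECISION_TABLE = {
    (True, True): "15% discount",
    (True, False): "5% discount",
    (False, True): "10% discount",
    (False, False): "no discount",
}


def decide(is_member: bool, order_total: float) -> str:
    """Evaluate the conditions and look up the matching action."""
    return DECISION_TABLE[(is_member, order_total > 1000)]


print(decide(True, 1200.0))  # 15% discount
print(decide(False, 300.0))  # no discount
```

Listing every combination explicitly makes it easy to check that no case has been forgotten, which is the main reason for using a decision table at design time.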
Each of the above tools for designing will be discussed in detail in the next lesson.
(e) Coding and Implementation
After designing the new system, the whole system has to be converted into a language the computer understands. This is done by coding the new system in a computer programming language. It is an important stage in which the defined procedures are transformed into control specifications with the help of a computer language. This is also called the programming phase, in which the programmer converts the program specifications into computer instructions, which we refer to as programs. The programs coordinate the data movements and control the entire process in a system. It is generally felt that the programs must be modular in nature. This helps in fast development, maintenance and future change, if required.
(f) Testing
Before actually putting the new system into operation, a test run of the system is done to remove all the bugs, if any. It is an important phase of a successful system. After coding all the programs of the system, a test plan should be developed and run on a given set of test data. The output of the test run should match the expected results. Using the test data, the following test runs are carried out:
Unit test: When the programs have been coded, compiled and brought to working condition, they must be individually tested with the prepared test data. Any undesirable behaviour must be noted and debugged (error correction).
System test: After the unit test has been carried out for each of the programs of the system and the errors have been removed, the system test is done. At this stage the complete system is executed on actual data. At each stage of the execution, the results or output of the system are analyzed. During the result analysis, it may be found that the outputs do not match the expected output of the system. In such a case, the errors in the particular programs are identified, fixed and tested again for the expected output. When it is ensured that the system is running error-free, the users are called with their own actual data so that the system can be shown running as per their requirements.
(g) User Implementation
After the user has accepted the newly developed system, the implementation phase begins. Implementation is the stage of a project during which theory is turned into practice. During this phase, all the programs of the system are loaded onto the user's computer. After loading the system, training of the users starts. Main topics of such type of training are:
How to execute the package
How to enter the data
How to process the data (processing details)
How to take out the reports
After the users are trained on the computerized system, working has to shift from manual to computerized. The following two strategies are followed for running the system:
Parallel run: In such run for a certain defined period, both the systems i.e. computerized and manual are executed in parallel. This strategy is helpful because of the following:
Manual results can be compared with the results of the computerized system.
Failure of the computerized system at an early stage does not affect the working of the organization, because the manual system continues to work as it used to.
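The comparison made during a parallel run can be sketched as follows. This is only an illustration: the function name `compare_parallel_run` and the invoice totals are hypothetical, and a real comparison would depend on the reports the two systems produce.

```python
def compare_parallel_run(manual: dict, computerized: dict) -> dict:
    """Return the records whose manual and computerized results differ."""
    mismatches = {}
    # Look at every record that appears in either system.
    for key in sorted(set(manual) | set(computerized)):
        manual_value = manual.get(key)
        computer_value = computerized.get(key)
        if manual_value != computer_value:
            mismatches[key] = (manual_value, computer_value)
    return mismatches


# Hypothetical totals produced by both systems for the same period.
manual_totals = {"INV-001": 1200, "INV-002": 450, "INV-003": 980}
computer_totals = {"INV-001": 1200, "INV-002": 540, "INV-003": 980}
print(compare_parallel_run(manual_totals, computer_totals))
# {'INV-002': (450, 540)}
```

An empty result over the agreed period is one possible signal that the manual system can be retired.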
Pilot run: In this type of run, the new system is installed in parts. Some part of the new system is installed first and executed successfully for a considerable period of time. Only when the results are found satisfactory are the other parts implemented. This strategy builds confidence, and errors are traced easily.
(h) Maintenance
Maintenance is necessary to eliminate errors in the system during its working life and to tune the system to any variations in its working environment. It has been seen that there are always some errors found in the system that must be noted and corrected. Maintenance also means reviewing the system from time to time. The review of the system is done for:
knowing the full capabilities of the system
knowing the required changes or the additional requirements
studying the performance
If a major change to a system is needed, a new project may have to be set up to carry out the change. The new project will then proceed through all the above life cycle phases.
Requirements Gathering
Requirements analysis, in systems engineering and software engineering, encompasses those tasks that go into determining the needs or conditions to meet for a new or altered product, taking account of the possibly conflicting requirements of the various stakeholders, such as beneficiaries or users. Requirements analysis is critical to the success of a development project. Requirements must be actionable, measurable, testable, related to identified business needs or opportunities, and defined to a level of detail sufficient for system design. Requirements can be functional and non-functional. Conceptually, requirements analysis includes three types of activity:
Eliciting requirements: the task of communicating with customers and users to determine what their requirements are. This is sometimes also called requirements gathering.
Analyzing requirements: determining whether the stated requirements are unclear, incomplete, ambiguous, or contradictory, and then resolving these issues.
Recording requirements: Requirements might be documented in various forms, such as natural-language documents, use cases, user stories, or process specifications.
Requirements analysis can be a long and arduous process during which many delicate psychological skills are involved. New systems change the environment and relationships between people, so it is important to identify all the stakeholders, take into account all their needs and ensure they understand the implications of the new systems. Analysts can employ several techniques to elicit the requirements from the customer. Historically, this has included such things as holding interviews, or holding focus groups (more aptly named in this context as requirements workshops) and creating requirements lists. More modern techniques include prototyping, and use cases. Where necessary, the analyst will employ a combination of these methods to establish the exact requirements of the stakeholders, so that a system that meets the business needs is produced.
Requirements engineering
Systematic requirements analysis is also known as requirements engineering. It is sometimes referred to loosely by names such as requirements gathering, requirements capture, or requirements specification. The term requirements analysis can also be applied specifically to the analysis proper, as opposed to elicitation or documentation of the requirements, for instance.
Developing an IT application is an investment, since once developed the application provides the organization with profits. Profits can be monetary or in the form of an improved working environment. However, the investment carries risks, because in some cases an estimate can be wrong and the project might not actually turn out to be beneficial. Cost benefit analysis helps to give management a picture of the costs, benefits and risks. It usually involves comparing alternate investments. Cost benefit analysis determines the benefits and savings that are expected from the system and compares them with the expected costs. The cost of an information system involves the development cost and the maintenance cost. Development costs are a one-time investment whereas maintenance costs are recurring. The development cost is basically the cost incurred during the various stages of system development.
Each phase of the life cycle has a cost. Some examples are:
1. Development costs: costs incurred during the development of the system, which are a one-time investment, e.g., wages and equipment.
2. Operating costs, e.g., wages, supplies and overheads.
Another classification of the costs can be:
3. Hardware/software costs: These include the cost of purchasing or leasing computers and their peripherals. Software costs cover the required software.
4. Personnel costs: This is the money spent on the people involved in the development of the system. These expenditures include salaries and other benefits such as health insurance, conveyance allowance, etc.
5. Facility costs: Expenses incurred during the preparation of the physical site where the system will be operational. These can be wiring, flooring, acoustics, lighting, and air conditioning.
6. Operating costs: Operating costs are the expenses required for the day-to-day running of the system. This includes the maintenance of the system, which can take the form of maintaining the hardware or application programs, or of money paid to professionals responsible for running or maintaining the system.
7. Supply costs: These are variable costs that vary proportionately with the amount of use of paper, ribbons, disks, and the like. These should be estimated and included in the overall cost of the system.
Benefits
We can define benefit as:
Profit or Benefit = Income - Costs
Benefits can be accrued by:
- Increasing income, or
- Decreasing costs, or
- both
The system will also provide some benefits. Benefits can be tangible or intangible, direct or indirect. In cost benefit analysis, the first task is to identify each benefit and assign a monetary value to it. The two main benefits are improved performance and minimized processing costs. Costs and benefits can be categorized further as follows.
Tangible or Intangible Costs and Benefits
Tangible costs and benefits can be identified and measured. The purchase of hardware or software, personnel training, and employee salaries are examples of tangible costs. Costs whose value cannot be measured are referred to as intangible costs. For example, the breakdown of an online system during banking hours will cause the bank to lose deposits. Benefits are also tangible or intangible. For example, greater customer satisfaction and improved company status are intangible benefits, whereas improved response time and error-free output, such as error-free reports, are tangible benefits. Both tangible and intangible costs and benefits should be considered in the evaluation process.
Direct or Indirect Costs and Benefits
From the cost accounting point of view, costs are treated as either direct or indirect. Direct costs are those that have a rupee value directly associated with them. Direct benefits are likewise attributable to a given project. For example, if the proposed system can handle more transactions, say 25% more than the present system, then that is a direct benefit. Indirect costs result from operations that are not directly associated with the system. Insurance, maintenance, heat, light and air conditioning are all indirect costs.
Fixed or Variable Costs and Benefits
Some costs and benefits are fixed. Fixed costs don't change. Depreciation of hardware, insurance, etc. are all fixed costs. Variable costs are incurred on a regular basis. The recurring period may be weekly or monthly depending upon the system. They are proportional to the work volume and continue as long as the system is in operation. Fixed benefits don't change. Variable benefits are realized on a regular basis.
Performing Cost Benefit Analysis (CBA)
Example: Cost for the proposed system (figures in USD Thousands)
Profit = Benefits - Costs = 300,000 - 154,000 = USD 146,000
Since we are gaining, this system is feasible.
Steps of CBA can briefly be described as:
Estimate the development costs, operating costs and benefits.
Determine the life of the system.
When will the benefits start to accrue?
When will the system become obsolete?
Determine the interest rate (this should reflect a realistic low-risk investment rate).
1. Present value analysis
2. Payback analysis
3. Net present value
4. Net benefit analysis
5. Cash-flow analysis
6. Break-even analysis
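Three of the methods listed above can be sketched in a few lines of Python. The benefit and cost totals (300,000 and 154,000) are taken from the example in the text; the initial cost of 100,000, the yearly net benefit of 40,000 over five years and the 8% interest rate are assumptions made only for this illustration, and the function names are hypothetical.

```python
def net_benefit(benefits: float, costs: float) -> float:
    """Net benefit analysis: total benefits minus total costs."""
    return benefits - costs


def payback_period(initial_cost: float, yearly_net_benefits: list):
    """Payback analysis: first year in which cumulative net benefits
    cover the initial cost, or None if they never do."""
    cumulative = 0.0
    for year, amount in enumerate(yearly_net_benefits, start=1):
        cumulative += amount
        if cumulative >= initial_cost:
            return year
    return None


def net_present_value(rate: float, initial_cost: float, yearly_net_benefits: list) -> float:
    """Net present value: discounted yearly net benefits minus the initial cost."""
    present_value = sum(
        amount / (1 + rate) ** year
        for year, amount in enumerate(yearly_net_benefits, start=1)
    )
    return present_value - initial_cost


# Figures from the example in the text.
print(net_benefit(300_000, 154_000))  # 146000

# Assumed figures, for illustration only.
flows = [40_000] * 5
print(payback_period(100_000, flows))  # 3
npv = net_present_value(0.08, 100_000, flows)
print(round(npv, 2))
```

A positive net present value suggests that the system earns more than the chosen interest rate would, which is why the interest rate should reflect a realistic low-risk alternative.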
Examples of relational database software are Microsoft Access, Oracle, Sybase and Paradox.
Flat-file Database Software
A flat-file database program allows the user to create many databases but lets him/her work with only one file at a time. Using a flat-file database program, one can create simple applications such as mailing list databases or personnel files.
Advantages of the database approach over traditional file-processing systems
Following are some of the advantages of using a database over a traditional file-processing system:
Potential for enforcing standards.
Flexibility.
Reduced application development time.
Availability of up-to-date information to all users.
Economies of scale.
Benefits of a Relational Database
Following are some of the advantages of a relational database:
Data can be easily accessed.
Data can be shared.
Data modeling can be flexible.
Data storage and redundancy can be reduced.
Data inconsistency can be avoided.
Data integrity can be maintained.
Standards can be enforced.
Security restrictions can be applied.
Independence between physical storage and logical data design can be maintained.
A high-level data manipulation language (SQL) can be used to access and manipulate data.
A relational database stores data in tables. The data stored in a table is organized into rows and columns. Each row in a table represents an individual record and each column represents a field. A record is an individual entry in the database. For example, each person's name, address, and phone number is a single record of information in a phone book. A "field", by contrast, is a piece of information in a record. For example, you can divide a person's record in the phone book into fields for their last name, first name, address, city and phone number.
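The phone book example can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the table name `phone_book`, the column names and the two sample entries are made up for illustration.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The table: each column is a field.
conn.execute(
    """
    CREATE TABLE phone_book (
        last_name  TEXT,
        first_name TEXT,
        address    TEXT,
        city       TEXT,
        phone      TEXT
    )
    """
)

# Each row is one record (hypothetical sample data).
conn.executemany(
    "INSERT INTO phone_book VALUES (?, ?, ?, ?, ?)",
    [
        ("Sharma", "Meera", "12 Hill Road", "Delhi", "555-0101"),
        ("Rao", "Anil", "4 Lake View", "Pune", "555-0102"),
    ],
)

# SQL is used to pick out fields from the records that match a condition.
rows = conn.execute(
    "SELECT first_name, phone FROM phone_book WHERE city = ?", ("Pune",)
).fetchall()
print(rows)  # [('Anil', '555-0102')]
```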
The DBMS (Database Management System) is preferred over the conventional file processing system due to the following advantages:
1. Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
Duplication of the same data in different files.
Wastage of storage space, since duplicated data is stored.
Errors caused by updating the same data in different files.
Time wasted in entering the same data again and again.
Needless use of computer resources.
Difficulty in combining information.
2. Elimination of Inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. If they are not, the data becomes inconsistent. We therefore need to remove this duplication of data across multiple files to eliminate inconsistency.
For example, consider a student result system. Suppose that the STUDENT file indicates that Roll No. = 10 has opted for the 'Computer' course, but the RESULT file indicates that Roll No. = 10 has opted for the 'Accounts' course. In this case the two entries for a particular student don't agree with each other, and the database is said to be in an inconsistent state. To eliminate this conflicting information we need to centralize the database. On centralizing the database, the duplication will be controlled and hence the inconsistency will be removed.
Data inconsistencies are often encountered in everyday life. As another example, we have all come across situations where a new address is communicated to an organization that we deal with (e.g., a telecom company, gas company or bank). We find that some of the communications from that organization are received at the new address while others continue to be mailed to the old address. Combining all the data in a database reduces redundancy as well as inconsistency, so it is likely to reduce the costs of collecting, storing and updating data.
Let us again consider the example of the result system. Suppose that a student having Roll No. 201 changes his course from 'Computer' to 'Arts'. The change is made in the SUBJECT file but not in the RESULT file. This may lead to inconsistency of the data. So we need to centralize the database so that changes once made are reflected in all the tables where a particular field is stored. The update is then applied automatically, which is known as propagating updates.
3. Better service to the users - A DBMS is often used to provide better services to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and how up to date it is are likely to improve, since the data can now be shared and the DBMS makes it easy to respond to anticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, use of a DBMS should allow users who don't know programming to interact with the data more easily, unlike a file processing system, where the programmer may need to write new programs to meet every new demand.
4. Flexibility of the System is Improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints. In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
For example, consider the result system that we have already discussed. Since multiple files are to be maintained, you may sometimes enter a value for course which does not exist. Suppose course can have the values (Computer, Accounts, Economics, and Arts) but we enter the value 'Hindi' for it. This leads to inconsistent data, and so to a lack of integrity.
Even if we centralize the database it may still contain incorrect data. For example:
The salary of a full-time employee may be entered as Rs. 500 rather than Rs. 5000.
A student may be shown to have borrowed books but have no enrollment.
A list of employee numbers for a given department may include a number of non-existent employees.
These problems can be avoided by defining validation procedures that run whenever any update operation is attempted.
6. Standards can be enforced - Since all access to the database must be through the DBMS, standards are easier to enforce.
Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an ad hoc/temporary manner. Often different systems of an organization access different components of the operational data, and in such an environment enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions since the data is now centralized. It is easier to control who has access to what parts of the database. Different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
Consider an example from banking, in which employees at different levels may be given access to different types of data in the database. A clerk may be given the authority to know only the names of all the customers who have a loan in the bank, but not the details of each loan the customer may have. This can be accomplished by giving the appropriate privileges to each employee.
8. Organization's requirement can be identified - All organizations have sections and departments, and each of these units often considers its own work, and therefore its own needs, as the most important. Once a database has been set up with centralized control, it will be necessary to identify the organization's requirements and to balance the needs of the competing units. So it may become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization. It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization. For example, a DBA must choose the best file structure and access method to give a faster response for highly critical applications than for less critical applications.
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up a database and of developing and maintaining application programs to be far lower than for a similar service using conventional systems, since the productivity of programmers can be higher using the non-procedural languages that have been developed with DBMSs than using procedural languages.
10. Data Model must be developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, it is more likely that files will be designed as the needs of particular applications demand, and the overall view is often not considered. Building an overall view of an organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database makes it possible to provide backup and recovery schemes for failures, including disk crashes, power failures and software errors. These schemes help the database to recover from an inconsistent state to the state that existed prior to the failure, though the methods are very complex.
12. Concurrent access to data is possible while keeping the whole of the data in the database consistent.
Disadvantages of DBMS
A data model is a conceptual representation of the data structures that are required by a database. The data structures include the data objects, the associations between data objects, and the rules which govern operations on the objects. As the name implies, the data model focuses on what data is required and how it should be organized rather than what operations will be performed on the data. To use a common analogy, the data model is equivalent to an architect's building plans.
The ER model maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables. It is simple and easy to understand with a minimum of training. Therefore, the model can be used by the database designer to communicate the design to the end user. In addition, the model can be used as a design plan by the database developer to implement a data model in specific database management software.
Basic Constructs of E-R Modeling The ER model views the real world as a construct of entities and association between entities. Entities Entities are the principal data object about which information is to be collected. Entities are usually recognizable concepts, either concrete or abstract, such as person, places, things, or events which have relevance to the database. Some specific examples of entities are EMPLOYEES, PROJECTS, and INVOICES. An entity is analogous to a table in the relational model.
Entities are classified as independent or dependent (in some methodologies, the terms used are strong and weak, respectively). An independent entity is one that does not rely on another for identification. A dependent entity is one that relies on another for identification. An entity occurrence (also called an instance) is an individual occurrence of an entity. An occurrence is analogous to a row in the relational table. Special Entity Types Associative entities (also known as intersection entities) are entities used to associate two or more entities in order to reconcile a many-to-many relationship. Subtypes entities are used in generalization hierarchies to represent a subset of instances of their parent entity, called the super type, but which have attributes or relationships that apply only to the subset. Associative entities and generalization hierarchies are discussed in more detail below. Relationships A Relationship represents an association between two or more entities. An example of a relationship would be: employees are assigned to projects projects have subtasks departments manage one or more projects Relationships are classified in terms of degree, connectivity, cardinality, and existence. These concepts will be discussed below. Attributes Attributes describe the entity of which they are associated. A particular instance of an attribute is a value. For example, "Jane R. Hathaway" is one value of the attribute Name. The domainof an attribute is the collection of all possible values an attribute can have. The domain of Name is a character string.
Attributes can be classified as identifiers or descriptors. Identifiers, more commonly called keys, uniquely identify an instance of an entity. A descriptor describes a non-unique characteristic of an entity instance.

Classifying Relationships

Relationships are classified by their degree, connectivity, cardinality, direction, type, and existence. Not all modeling methodologies use all these classifications.

Degree of a Relationship

The degree of a relationship is the number of entities associated with the relationship. The n-ary relationship is the general form for degree n. Special cases are the binary and ternary relationships, where the degree is 2 and 3, respectively. Binary relationships, the associations between two entities, are the most common type in the real world. A recursive binary relationship occurs when an entity is related to itself. An example might be "some employees are married to other employees". A ternary relationship involves three entities and is used when a binary relationship is inadequate. Many modeling approaches recognize only binary relationships; in those approaches, ternary or n-ary relationships are decomposed into two or more binary relationships.

Connectivity and Cardinality
The connectivity of a relationship describes the mapping of associated entity instances in the relationship. The values of connectivity are "one" or "many". The cardinality of a relationship is the actual number of related occurrences for each of the two entities. The basic types of connectivity for relationships are one-to-one, one-to-many, and many-to-many.

A one-to-one (1:1) relationship is when at most one instance of entity A is associated with one instance of entity B. For example, "employees in the company are each assigned their own office". For each employee there exists a unique office and for each office there exists a unique employee.

A one-to-many (1:N) relationship is when for one instance of entity A, there are zero, one, or many instances of entity B, but for one instance of entity B, there is only one instance of entity A. Examples of 1:N relationships are:

a department has many employees
each employee is assigned to one department
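The connectivity types can be illustrated by inspecting a handful of instance pairs. The following is a minimal sketch only: the function name connectivity and the sample data are assumptions made for illustration, and observed data can only show what occurs in the sample, not the business rule that a modeler would record in the ERD.

```python
from collections import defaultdict

def connectivity(pairs):
    """Classify a binary relationship from observed (a, b) instance pairs."""
    b_per_a = defaultdict(set)
    a_per_b = defaultdict(set)
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)
    # Does any A instance relate to several B instances, and vice versa?
    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many = any(len(as_) > 1 for as_ in a_per_b.values())
    if a_has_many and b_has_many:
        return "M:N"
    if a_has_many:
        return "1:N"
    if b_has_many:
        return "N:1"
    return "1:1"

# Hypothetical sample data mirroring the examples in the text.
offices = [("alice", "room1"), ("bob", "room2")]                       # employee, office
departments = [("sales", "alice"), ("sales", "bob"), ("it", "carol")]  # department, employee
assignments = [("alice", "p1"), ("alice", "p2"), ("bob", "p1")]        # employee, project

print(connectivity(offices), connectivity(departments), connectivity(assignments))
# prints: 1:1 1:N M:N
```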
A many-to-many (M:N) relationship, sometimes called non-specific, is when for one instance of entity A, there are zero, one, or many instances of entity B and for one instance of entity B there are zero, one, or many instances of entity A. An example is:

employees can be assigned to no more than two projects at the same time
projects must have at least three employees assigned

A single employee can be assigned to many projects; conversely, a single project can have many employees assigned to it. Here the cardinality for the relationship between employees and projects is two and the cardinality between projects and employees is three. Many-to-many relationships cannot be directly translated to relational tables but instead must be transformed into two or more one-to-many relationships using associative entities.

Direction

The direction of a relationship indicates the originating entity of a binary relationship. The entity from which a relationship originates is the parent entity; the entity where the relationship terminates is the child entity. The direction of a relationship is determined by its connectivity. In a one-to-one relationship the direction is from the independent entity to a dependent entity. If both entities are independent, the direction is arbitrary. With one-to-many relationships, the entity occurring once is the parent. The direction of many-to-many relationships is arbitrary.
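Returning to the many-to-many case between employees and projects, the transformation into two one-to-many relationships can be sketched with an associative table. This is an illustrative sketch using Python's built-in sqlite3 module; the table names (employee, project, assignment), column names, and rows are all assumptions, not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE employee (emp_id  INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Associative entity: one row per (employee, project) pairing.
-- Each parent table has a one-to-many relationship to this table.
CREATE TABLE assignment (
    emp_id  INTEGER NOT NULL REFERENCES employee(emp_id),
    proj_id INTEGER NOT NULL REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);
INSERT INTO employee   VALUES (1, 'Ann'), (2, 'Raj');
INSERT INTO project    VALUES (10, 'Payroll'), (20, 'Intranet');
INSERT INTO assignment VALUES (1, 10), (1, 20), (2, 10);
""")

# Joining through the associative table recovers the many-to-many view.
pairs = conn.execute("""
    SELECT e.name, p.title
    FROM assignment a
    JOIN employee e ON e.emp_id  = a.emp_id
    JOIN project  p ON p.proj_id = a.proj_id
    ORDER BY e.name, p.title
""").fetchall()
print(pairs)
```

The composite primary key on assignment prevents the same employee from being linked to the same project twice; rules such as "no more than two projects per employee" would need additional constraints or application checks.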
An identifying relationship is one in which one of the child entities is also a dependent entity. A non-identifying relationship is one in which both entities are independent.
Existence denotes whether the existence of an entity instance is dependent upon the existence of another, related, entity instance. The existence of an entity in a relationship is defined as either mandatory or optional. If an instance of an entity must always occur for an entity to be included in a relationship, then it is mandatory. An example of mandatory existence is the statement "every project must be managed by a single department". If the instance of the entity is not required, it is optional. An example of optional existence is the statement, "employees may be assigned to work on projects".
Generalization Hierarchies

A generalization hierarchy is a form of abstraction that specifies that two or more entities that share common attributes can be generalized into a higher-level entity type called a supertype or generic entity. The lower-level entities become the subtypes, or categories, of the supertype. Subtypes are dependent entities. Generalization occurs when two or more entities represent categories of the same real-world object. For example, Wages_Employees and Classified_Employees represent categories of the same entity, Employees. In this example, Employees would be the supertype; Wages_Employees and Classified_Employees would be the subtypes.

Subtypes can be either mutually exclusive (disjoint) or overlapping (inclusive). A mutually exclusive category is when an entity instance can be in only one category. The above example is a mutually exclusive category: an employee can be either wages or classified but not both. An overlapping category is when an entity instance may be in two or more subtypes. An example would be a person who works for a university and is also a student at that same university. The completeness constraint requires that all instances of the subtype be represented in the supertype.

Generalization hierarchies can be nested. That is, a subtype of one hierarchy can be a supertype of another. The level of nesting is limited only by the constraint of simplicity. Subtype entities may be the parent entity in a relationship but not the child.

ER Notation

There is no standard for representing data objects in ER diagrams. Each modeling methodology uses its own notation. The original notation used by Chen is widely used in academic texts and journals but rarely seen in either CASE tools or publications by non-academics. Today, a number of notations are in use; among the more common are Bachman, crow's foot, and IDEF1X. All notational styles represent entities as rectangular boxes and relationships as lines connecting boxes. Each style uses a special set of symbols to represent the cardinality of a connection. The notation used in this document is from Martin. The symbols used for the basic ER constructs are:
Entities are represented by labeled rectangles. The label is the name of the entity. Entity names should be singular nouns.
Relationships are represented by a solid line connecting two entities. The name of the relationship is written above the line. Relationship names should be verbs.
Attributes, when included, are listed inside the entity rectangle. Attributes which are identifiers are underlined. Attribute names should be singular nouns.
Cardinality of many is represented by a line ending in a crow's foot. If the crow's foot is omitted, the cardinality is one.
Existence is represented by placing a circle or a perpendicular bar on the line. Mandatory existence is shown by the bar (looks like a 1) next to the entity for which an instance is required. Optional existence is shown by placing a circle next to the entity that is optional.
Figure 1: ER Notation
Developing an ERD
Developing an ERD requires an understanding of the system and its components. Before discussing the procedure, let's look at a narrative created by Professor Harman.
Consider a hospital:
Patients are treated in a single ward by the doctors assigned to them. Usually each patient will be assigned a single doctor, but in rare cases they will have two. Healthcare assistants also attend to the patients; a number of these are associated with each ward. Initially the system will be concerned solely with drug treatment. Each patient is required to take a variety of drugs a certain number of times per day and for varying lengths of time. The system must record details concerning patient treatment and staff payment. Some staff are paid part time, and doctors and care assistants work varying amounts of overtime at varying rates (subject to grade). The system will also need to track what treatments are required for which patients and when, and it should be capable of calculating the cost of treatment per week for each patient (though it is currently unclear to what use this information will be put).
3. Add attributes to the relations; these are determined by the queries, and may also suggest new entities, e.g. grade; or they may suggest the need for keys or identifiers. What questions can we ask?
   a. Which doctors work in which wards?
   b. How much will be spent in a ward in a given week?
   c. How much will a patient cost to treat?
   d. How much does a doctor cost per week?
   e. Which assistants can a patient expect to see?
   f. Which drugs are being used?
4. Add cardinality to the relations.
   Many-to-many must be resolved to two one-to-manys with an additional entity.
   This usually happens automatically.
   Sometimes it involves the introduction of a link entity (which will be all foreign key).
   Examples: Patient-Drug
5. This flexibility allows us to consider a variety of questions such as:
   a. Which beds are free?
   b. Which assistants work for Dr. X?
   c. What is the least expensive prescription?
   d. How many doctors are there in the hospital?
   e. Which patients are family related?
6. Represent that information with symbols. Generally E-R Diagrams require the use of the following symbols:
Reading an ERD
It takes some practice reading an ERD, but they can be used with clients to discuss business rules. These allow us to represent the information from above such as the E-R Diagram below:
An ERD brings out issues:

Many-to-manys
Ambiguities
Entities and their relationships
What data needs to be stored
The degree of a relationship

Now, think about a university in terms of an ERD. What entities, relationships and attributes might you consider? Look at this simplified view. There is also an example of a simplified view of an airline on that page.
Database Normalization
In the field of relational database design, normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics (insertion, update, and deletion anomalies) that could lead to a loss of data integrity. It is also a process of removing redundancy. Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know as the First Normal Form (1NF) in 1970. Codd went on to define the Second Normal Form (2NF) and Third Normal Form (3NF) in 1971.

When an attempt is made to modify (update, insert into, or delete from) a table, undesired side-effects may follow. Not all tables can suffer from these side-effects; rather, the side-effects can only arise in tables that have not been sufficiently normalized. An insufficiently normalized table might have one or more of the following characteristics:
The same information can be expressed on multiple rows; therefore updates to the table may result in logical inconsistencies. For example, each record in an "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of his skills). If the update is not carried through successfully (if, that is, the employee's address is updated on some records but not others), then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly.

There are circumstances in which certain facts cannot be recorded at all. For example, each record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code; thus we can record the details of any faculty member who teaches at least one course, but we cannot record the details of a newly-hired faculty member who has not yet been assigned to teach any courses. This phenomenon is known as an insertion anomaly.

There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. The "Faculty and Their Courses" table described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears. This phenomenon is known as a deletion anomaly.
An update anomaly. Employee 519 is shown as having different addresses on different records.
An insertion anomaly. Until the new faculty member, Dr. Newsome, is assigned to teach at least one course, his details cannot be recorded.
A deletion anomaly. All information about Dr. Giddens is lost when he temporarily ceases to be assigned to any courses.
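The update anomaly can be reproduced in a few lines. The sketch below uses Python's built-in sqlite3 module with an unnormalized "Employees' Skills" table; the table name employee_skills, the addresses, and the skills are made-up values for illustration (only the employee number 519 comes from the caption above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_skills (emp_id INTEGER, address TEXT, skill TEXT)")
conn.executemany(
    "INSERT INTO employee_skills VALUES (?, ?, ?)",
    [
        (519, "94 Chestnut Street", "Public Speaking"),
        (519, "94 Chestnut Street", "Carpentry"),
        (426, "87 Sycamore Grove", "Typing"),
    ],
)

# A careless update changes the address on only one of employee 519's rows.
conn.execute(
    "UPDATE employee_skills SET address = ? WHERE emp_id = ? AND skill = ?",
    ("96 Walnut Avenue", 519, "Carpentry"),
)

# Any employee with more than one distinct address is now in an inconsistent state.
conflicts = conn.execute("""
    SELECT emp_id, COUNT(DISTINCT address)
    FROM employee_skills
    GROUP BY emp_id
    HAVING COUNT(DISTINCT address) > 1
""").fetchall()
print(conflicts)  # prints: [(519, 2)]
```

Storing the address once, in a table keyed by employee ID alone, would make this kind of inconsistency impossible to express.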
First Normal Form
Eliminate repeating groups in individual tables. Create a separate table for each set of related data. Identify each set of related data with a primary key.
A table is said to be in 1NF if the intersection of every row and column contains only a single value; that is, multivalued data is not permitted. Do not use multiple fields in a single table to store similar data. For example, to track an inventory item that may come from two possible sources, an inventory record may contain fields for Vendor Code 1 and Vendor Code 2. But what happens when you add a third vendor? Adding a field is not the answer; it requires program and table modifications and does not smoothly accommodate a dynamic number of vendors. Instead, place all vendor information in a separate table called Vendors, then link inventory to vendors with an item number key, or vendors to inventory with a vendor code key.
For a table to be in first normal form, data must be broken up into the smallest units possible. For example, the following table is not in first normal form.
Address
123 Broadway New York, NY, 11234
456 Jolly Jumper St. Trenton NJ, 11547
To conform to first normal form, this table would require additional fields. The name field should be divided into first and last name, and the address should be divided into street, city, state, and zip, like this:
Street                 City       State   Zip
123 Broadway           New York   NY      11234
456 Jolly Jumper St.   Trenton    NJ      11547
In addition to breaking data up into the smallest meaningful values, tables in first normal form should not contain repeating groups of fields such as in the following table.
Client 1   Time 1   Client 2   Time 2   Client 3      Time 3
US Corp.   14 hrs   Taggarts   26 hrs   Kilroy Inc.   9 hrs
Italiana   67 hrs   Linkers    2 hrs
The problem here is that each representative can have multiple clients, and not all will have three. Some may have fewer, as is the case in the second record, tying up storage space in your database that is not being used, and some may have more, in which case there are not enough fields. The solution to this is to add a record for each new piece of information.
Rep ID   First Name   Client        Time With Client
TS-89    Gilroy       US Corp       14 hrs
TS-89    Gilroy       Taggarts      26 hrs
TS-89    Gilroy       Kilroy Inc.   9 hrs
RK-56    Mary         Italiana      67 hrs
RK-56    Mary         Linkers       2 hrs
Notice the splitting of the first and last name fields again.
This table is now in first normal form. Note that by avoiding repeating groups of fields, we have created a new problem in that there are identical values in the primary key field, violating the rules of the primary key. In order to remedy this, we need to have some other way of identifying each record. This can be done with the creation of a new key called client ID.
Client US Corp
This new field can now be used in conjunction with the Rep ID field to create a multiple field primary key. This will prevent confusion if ever more than one Representative were to serve a single client.
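The step of replacing repeating groups of fields with one record per piece of information can be sketched in plain Python. The dictionary keys (client_1, time_1, and so on) and the function name to_first_normal_form are assumptions chosen for illustration; the data values follow the representative example above.

```python
# Each dict is one wide record with three repeating (client, time) groups.
wide_rows = [
    {"rep_id": "TS-89",
     "client_1": "US Corp", "time_1": 14,
     "client_2": "Taggarts", "time_2": 26,
     "client_3": "Kilroy Inc.", "time_3": 9},
    {"rep_id": "RK-56",
     "client_1": "Italiana", "time_1": 67,
     "client_2": "Linkers", "time_2": 2,
     "client_3": None, "time_3": None},   # unused group wastes space
]

def to_first_normal_form(rows, group_count=3):
    """Emit one (rep_id, client, hours) record per non-empty repeating group."""
    records = []
    for row in rows:
        for i in range(1, group_count + 1):
            client = row.get(f"client_{i}")
            if client is None:
                continue  # skip empty groups instead of storing blanks
            records.append((row["rep_id"], client, row[f"time_{i}"]))
    return records

long_rows = to_first_normal_form(wide_rows)
for record in long_rows:
    print(record)
```

After the conversion a representative with a fourth client simply gets a fourth record; no table structure has to change.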
Second Normal Form

Create separate tables for sets of values that apply to multiple records. Relate these tables with a foreign key.
A table is said to be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key. A full functional dependency is one in which the right-hand side (the dependent) depends on the whole of a composite left-hand side (the determinant). Given AB -> C, C is fully functionally dependent on AB only if neither A -> C nor B -> C holds; if either does, the dependency is partial rather than full.

Records should not depend on anything other than a table's primary key (a compound key, if necessary). For example, consider a customer's address in an accounting system. The address is needed by the Customers table, but also by the Orders, Shipping, Invoices, Accounts Receivable, and Collections tables. Instead of storing the customer's address as a separate entry in each of these tables, store it in one place, either in the Customers table or in a separate Addresses table.
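The difference between a full and a partial functional dependency can be checked mechanically against sample rows. The sketch below is illustrative only: the helper names holds and partial_dependencies and the sample rows are assumptions, and a dependency that holds in sample data is merely consistent with the business rule, not proof of it.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in rows (list of dicts)."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # The same determinant must always map to the same dependent value.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True

def partial_dependencies(rows, key, attr):
    """Proper non-empty subsets of the composite key that already determine attr."""
    return [
        subset
        for size in range(1, len(key))
        for subset in combinations(key, size)
        if holds(rows, subset, [attr])
    ]

# Hypothetical rows keyed by (rep_id, client_id).
rows = [
    {"rep_id": "TS-89", "client_id": 978, "rep_name": "Gilroy", "hours": 14},
    {"rep_id": "TS-89", "client_id": 665, "rep_name": "Gilroy", "hours": 26},
    {"rep_id": "RK-56", "client_id": 221, "rep_name": "Mary", "hours": 67},
    {"rep_id": "RK-56", "client_id": 665, "rep_name": "Mary", "hours": 4},
]
key = ("rep_id", "client_id")

print(partial_dependencies(rows, key, "rep_name"))  # prints: [('rep_id',)]
print(partial_dependencies(rows, key, "hours"))     # prints: []
```

Here rep_name depends on rep_id alone, so it is only partially dependent on the key and violates 2NF, while hours needs both key fields and is fully dependent.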
Computer Applications BBA 3rd year

The second normal form applies only to tables with multiple-field primary keys. Take the following table for example.
Rep ID*   First Name
TS-89     Gilroy
TS-89     Gilroy
TS-89     Gilroy
RK-56     Mary
RK-56     Mary
RK-56     Mary
This table is already in first normal form. It has a primary key consisting of Rep ID and Client ID since neither alone can be considered a unique value.
The second normal form states that each field in a table with a multiple-field primary key must be directly related to the entire primary key. In other words, each non-key field should be a fact about all the fields in the primary key. Only fields that are absolutely necessary should appear in the table; all other fields should reside in different tables. To find out which fields are necessary, we should ask a few questions of our database. In the preceding example, I should ask the question "What information is this table meant to store?" Currently, the answer is not obvious. It may be meant to store information about individual clients, or it could be holding data for employees' time cards.

As a further example, if my database is going to contain records of employees, I may want a table of demographics and a table for payroll. The demographics table will hold all the employees' personal information and will assign each employee an ID number. I should not have to enter the data twice; the payroll table should refer to each employee only by ID number. I can then link the two tables by a relationship and have access to all the necessary data.

In the table of the preceding example we are devoting three fields to the identification of the employee and two to the identification of the client. I could identify them with only one field each: the primary key. I can then take out the extraneous fields and put them in their own tables. For example, my database would then look like the following.
Rep ID*   Client ID*   Time With Client
TS-89     978          14 hrs
TS-89     665          26 hrs
TS-89     782          9 hrs
RK-56     221          67 hrs
RK-56     982          2 hrs
RK-56     665          4 hrs

Client ID*   Client Name
978          US Corp
665          Taggarts
782          Kilroy Inc.
221          Italiana
982          Linkers
The above table contains client information. By splitting off the unnecessary information and putting it in its own tables, we have eliminated redundancy and put our first table in second normal form. These tables are now ready to be linked to each other through relationships.
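Linking the split tables back together is a join on the shared key. The following is a minimal sketch using Python's built-in sqlite3 module and the client and time values from the example; the table names (client, rep_client_time) and column names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (
    client_id   INTEGER PRIMARY KEY,
    client_name TEXT NOT NULL
);
CREATE TABLE rep_client_time (
    rep_id    TEXT    NOT NULL,
    client_id INTEGER NOT NULL REFERENCES client(client_id),
    hours     INTEGER NOT NULL,
    PRIMARY KEY (rep_id, client_id)      -- multiple-field primary key
);
INSERT INTO client VALUES
    (978, 'US Corp'), (665, 'Taggarts'), (782, 'Kilroy Inc.'),
    (221, 'Italiana'), (982, 'Linkers');
INSERT INTO rep_client_time VALUES
    ('TS-89', 978, 14), ('TS-89', 665, 26), ('TS-89', 782, 9),
    ('RK-56', 221, 67), ('RK-56', 982, 2), ('RK-56', 665, 4);
""")

# The client name is stored once and looked up through the key when needed.
rk56 = conn.execute("""
    SELECT t.rep_id, c.client_name, t.hours
    FROM rep_client_time t
    JOIN client c ON c.client_id = t.client_id
    WHERE t.rep_id = 'RK-56'
    ORDER BY c.client_name
""").fetchall()
print(rk56)

# Renaming a client now touches exactly one row, however many reps serve it.
renamed = conn.execute(
    "UPDATE client SET client_name = 'Taggarts Ltd' WHERE client_id = 665"
).rowcount
print(renamed)  # prints: 1
```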
Third Normal Form

In third normal form there should not be any transitive dependency.
Remove transitive dependencies.

Transitive dependency: a type of functional dependency where an attribute is functionally dependent on an attribute other than the primary key. Thus its value is only indirectly determined by the primary key.

Create a separate table containing the attribute and the fields that are functionally dependent on it. Tables created at this step will usually contain descriptions of either resources or agents. Keep a copy of the key attribute in the original file.
A table is said to be in 3NF if it is in 2NF and no non-key attribute is transitively dependent on the primary key; that is, there should not be any transitive dependency.

Third Normal Form Example

The new tables would be:

CustomerNo, CustomerName, CustomerAdd
ClerkNo, ClerkName

All of these fields except the primary key will be removed from the original table. The primary key will be left in the original table to allow linking of data as follows:

SalesOrderNo, Date, CustomerNo, ClerkNo

Together with the unchanged tables below, these tables make up the database in third normal form.

ItemNo, Description
SalesOrderNo, ItemNo, Qty, UnitPrice

What if we did not Normalize the Database to Third Normal Form?

Repetition of data: detail for the customer and clerk would appear on every sales order (SO).
Delete anomalies: deleting a sales order deletes the customer and clerk.
Insert anomalies: to insert a customer or clerk, a sales order must be inserted.
Update anomalies: to change the name, address, etc., it must be changed on every SO.
Completed Tables in Third Normal Form

Customers: CustomerNo, CustomerName, CustomerAdd
Clerks: ClerkNo, ClerkName
Inventory Items: ItemNo, Description
Sales Orders: SalesOrderNo, Date, CustomerNo, ClerkNo
SalesOrderDetail: SalesOrderNo, ItemNo, Qty, UnitPrice
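The completed tables can be tried out as an actual schema. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module, the column types and sample rows are invented, the Date column is named OrderDate here, and the Inventory Items table is spelled InventoryItems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Customers (CustomerNo INTEGER PRIMARY KEY, CustomerName TEXT, CustomerAdd TEXT);
CREATE TABLE Clerks (ClerkNo INTEGER PRIMARY KEY, ClerkName TEXT);
CREATE TABLE InventoryItems (ItemNo INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE SalesOrders (
    SalesOrderNo INTEGER PRIMARY KEY,
    OrderDate    TEXT,
    CustomerNo   INTEGER NOT NULL REFERENCES Customers(CustomerNo),
    ClerkNo      INTEGER NOT NULL REFERENCES Clerks(ClerkNo)
);
CREATE TABLE SalesOrderDetail (
    SalesOrderNo INTEGER NOT NULL REFERENCES SalesOrders(SalesOrderNo),
    ItemNo       INTEGER NOT NULL REFERENCES InventoryItems(ItemNo),
    Qty          INTEGER,
    UnitPrice    REAL,
    PRIMARY KEY (SalesOrderNo, ItemNo)
);
""")

# No insert anomaly: a customer and a clerk can exist before any sales order.
conn.execute("INSERT INTO Customers VALUES (1, 'Acme Ltd', '1 High St')")
conn.execute("INSERT INTO Clerks VALUES (7, 'Lee')")
conn.execute("INSERT INTO SalesOrders VALUES (100, '2024-01-15', 1, 7)")

# The foreign key rejects an order for a customer that does not exist.
try:
    conn.execute("INSERT INTO SalesOrders VALUES (101, '2024-01-16', 999, 7)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# No delete anomaly: removing the order leaves the customer record in place.
conn.execute("DELETE FROM SalesOrders WHERE SalesOrderNo = 100")
remaining = conn.execute("SELECT CustomerName FROM Customers").fetchall()
print(fk_rejected, remaining)
```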
Values in a record that are not part of that record's key do not belong in the table. In general, any time the contents of a group of fields may apply to more than a single record in the table, consider placing those fields in a separate table. For example, in an Employee Recruitment table, a candidate's university name and address may be included. But you need a complete list of universities for group mailings. If university information is stored in the Candidates table, there is no way to list universities with no current candidates. Create a separate Universities table and link it to the Candidates table with a university code key. EXCEPTION: Adhering to the third normal form, while theoretically desirable, is not always practical. If you have a Customers table and you want to eliminate all possible interfield dependencies, you must create separate tables for cities, ZIP codes, sales representatives, customer classes, and any other factor that may be duplicated in multiple records. In theory, normalization is worth pursuing; however, many small tables may degrade performance or exceed open file and memory capacities. It may be more feasible to apply third normal form only to data that changes frequently. If some dependent fields remain, design your application to require the user to verify all related fields when any one is changed.
SQL, often referred to as Structured Query Language, is a database computer language designed for managing data in relational database management systems (RDBMS), and originally based upon relational algebra. Its scope includes data query and update, schema creation and modification, and data access control. SQL was one of the first languages for Edgar F. Codd's relational model, described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks",[4] and became the most widely used language for relational databases.
SQL was developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. This version, initially called SEQUEL, was designed to manipulate and retrieve data stored in IBM's original relational database product, System R. The SQL language is sub-divided into several language elements, including:
Clauses, which are constituent components of statements and queries and are in some cases optional.[9]
Expressions, which can produce either scalar values or tables consisting of columns and rows of data.
Predicates, which specify conditions that can be evaluated to SQL three-valued logic (3VL) Boolean truth values and which are used to limit the effects of statements and queries, or to change program flow.
Queries, which retrieve data based on specific criteria.
Statements, which may have a persistent effect on schemas and data, or which may control transactions, program flow, connections, sessions, or diagnostics. SQL statements also include the semicolon (";") statement terminator. Though not required on every platform, it is defined as a standard part of the SQL grammar.
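Three-valued logic is easiest to see with NULL values. The sketch below uses Python's built-in sqlite3 module, which reports the Unknown truth value as Python None; the table t and its contents are assumptions for illustration, and other database systems may display Unknown differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

# Comparing NULL with anything, even NULL, yields Unknown rather than True or False.
unknown = conn.execute("SELECT NULL = NULL").fetchone()[0]

# A WHERE clause keeps only rows whose predicate is True, so the NULL row
# satisfies neither the predicate nor its negation.
greater = conn.execute("SELECT COUNT(*) FROM t WHERE x > 1").fetchone()[0]
not_greater = conn.execute("SELECT COUNT(*) FROM t WHERE NOT (x > 1)").fetchone()[0]

# IS NULL is the predicate that actually finds the NULL row.
is_null = conn.execute("SELECT COUNT(*) FROM t WHERE x IS NULL").fetchone()[0]

print(unknown, greater, not_greater, is_null)  # prints: None 1 1 1
```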
The most common operation in SQL is the query, which is performed with the declarative SELECT statement. SELECT retrieves data from one or more tables, or expressions. Standard SELECT statements have no persistent effects on the database. Some non-standard implementations of SELECT can have persistent effects, such as the SELECT INTO syntax that exists in some databases.[10] Queries allow the user to describe desired data, leaving the database management system (DBMS) responsible for planning, optimizing, and performing the physical operations necessary to produce that result as it chooses. A query includes a list of columns to be included in the final result immediately following the SELECT keyword. An asterisk ("*") can also be used to specify that the query should return all columns of the queried tables. SELECT is the most complex statement in SQL, with optional keywords and clauses that include:
The FROM clause, which indicates the table(s) from which data is to be retrieved. The FROM clause can include optional JOIN subclauses to specify the rules for joining tables.
The WHERE clause includes a comparison predicate, which restricts the rows returned by the query. The WHERE clause eliminates all rows from the result set for which the comparison predicate does not evaluate to True.
The GROUP BY clause is used to project rows having common values into a smaller set of rows. GROUP BY is often used in conjunction with SQL aggregation functions or to eliminate duplicate rows from a result set. The WHERE clause is applied before the GROUP BY clause.
The HAVING clause includes a predicate used to filter rows resulting from the GROUP BY clause. Because it acts on the results of the GROUP BY clause, aggregation functions can be used in the HAVING clause predicate.
The ORDER BY clause identifies which columns are used to sort the resulting data, and in which direction they should be sorted (options are ascending or descending). Without an ORDER BY clause, the order of rows returned by an SQL query is undefined.
The following is an example of a SELECT query that returns a list of expensive books. The query retrieves all rows from the Book table in which the price column contains a value greater than 100.00. The result is sorted in ascending order by title. The asterisk (*) in the select list indicates that all columns of the Book table should be included in the result set.
SELECT * FROM Book WHERE price > 100.00 ORDER BY title;
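The query above, together with the GROUP BY and HAVING clauses described earlier, can be run against a small table. This is a sketch using Python's built-in sqlite3 module; the Book columns (title, author, price) and the four sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (title TEXT, author TEXT, price REAL)")
conn.executemany("INSERT INTO Book VALUES (?, ?, ?)", [
    ("SQL Basics", "Adams", 45.00),
    ("Query Tuning", "Adams", 120.00),
    ("Data Models", "Baker", 150.00),
    ("Advanced Joins", "Baker", 110.00),
])

# The example query: expensive books, sorted by title.
expensive = conn.execute(
    "SELECT * FROM Book WHERE price > 100.00 ORDER BY title"
).fetchall()
print([title for title, _, _ in expensive])

# WHERE filters rows first, GROUP BY then forms one group per author,
# and HAVING keeps only the groups that satisfy the aggregate predicate.
prolific = conn.execute("""
    SELECT author, COUNT(*) AS expensive_books
    FROM Book
    WHERE price > 100.00
    GROUP BY author
    HAVING COUNT(*) >= 2
    ORDER BY author
""").fetchall()
print(prolific)  # prints: [('Baker', 2)]
```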
Data manipulation
The Data Manipulation Language (DML) is the subset of SQL used to add, update and delete data:
INSERT INTO My_table (field1, field2, field3) VALUES ('test', 'N', NULL);
TRUNCATE deletes all data from a table in a very fast way. It usually implies a subsequent COMMIT operation.
MERGE is used to combine the data of multiple tables. It combines the INSERT and UPDATE elements. It is defined in the SQL:2003 standard; prior to that, some databases provided similar functionality via different syntax, sometimes called "upsert".
Transaction controls (TCL)
Transactions, if available, wrap DML operations:
START TRANSACTION (or BEGIN WORK, or BEGIN TRANSACTION, depending on SQL dialect) marks the start of a database transaction, which either completes entirely or not at all.
SAVE TRANSACTION (or SAVEPOINT) saves the state of the database at the current point in the transaction.
CREATE TABLE tbl_1(id int);
INSERT INTO tbl_1(id) VALUES (1);
INSERT INTO tbl_1(id) VALUES (2);
COMMIT;
UPDATE tbl_1 SET id=200 WHERE id=1;
SAVEPOINT id_1upd;                    -- remember the state after the first update
UPDATE tbl_1 SET id=1000 WHERE id=2;
ROLLBACK TO id_1upd;                  -- undo only the second update
SELECT id FROM tbl_1;                 -- returns 200 and 2
COMMIT causes all data changes in a transaction to be made permanent.
ROLLBACK causes all data changes since the last COMMIT or ROLLBACK to be discarded, leaving the state of the data as it was prior to those changes.
Once the COMMIT statement completes, the transaction's changes cannot be rolled back. COMMIT and ROLLBACK terminate the current transaction and release data locks. In the absence of a START TRANSACTION or similar statement, the semantics of SQL are implementation-dependent.

Example: A classic bank transfer of funds transaction.

START TRANSACTION;
UPDATE Account SET amount=amount-200 WHERE account_number=1234;
UPDATE Account SET amount=amount+200 WHERE account_number=2345;
IF ERRORS=0 COMMIT;
IF ERRORS<>0 ROLLBACK;
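The bank transfer can be made runnable with explicit transaction statements. The sketch below uses Python's built-in sqlite3 module; the function name transfer, the starting balances, and the choice to treat a missing account as an error are assumptions for illustration, standing in for the ERRORS checks in the pseudocode above.

```python
import sqlite3

# isolation_level=None leaves transaction control to the explicit SQL below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Account (account_number INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT INTO Account VALUES (1234, 500), (2345, 100)")

def transfer(conn, source, target, amount):
    """Move amount between two accounts; either both updates apply or neither."""
    conn.execute("BEGIN")
    try:
        for delta, account in ((-amount, source), (amount, target)):
            cursor = conn.execute(
                "UPDATE Account SET amount = amount + ? WHERE account_number = ?",
                (delta, account),
            )
            if cursor.rowcount != 1:          # no such account: treat as an error
                raise LookupError(f"no such account: {account}")
        conn.execute("COMMIT")
        return True
    except (sqlite3.Error, LookupError):
        conn.execute("ROLLBACK")              # discard the partial transfer
        return False

first = transfer(conn, 1234, 2345, 200)   # both accounts exist, so it commits
second = transfer(conn, 1234, 9999, 50)   # debit succeeds, credit fails, rolled back
balances = conn.execute(
    "SELECT account_number, amount FROM Account ORDER BY account_number"
).fetchall()
print(first, second, balances)
# prints: True False [(1234, 300), (2345, 300)]
```

The second call debits account 1234 before discovering that the target does not exist; the ROLLBACK restores the debit, which is the all-or-nothing behavior the text describes.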
Data definition
The Data Definition Language (DDL) manages table and index structure. The most basic items of DDL are the CREATE, ALTER, RENAME, DROP and TRUNCATE statements:
CREATE creates an object (a table, for example) in the database.
DROP deletes an object in the database, usually irretrievably.
ALTER modifies the structure of an existing object in various ways, for example, adding a column to an existing table.
Example:

CREATE TABLE My_table (
    my_field1 INT,
    my_field2 VARCHAR(50),
    my_field3 DATE NOT NULL,
    PRIMARY KEY (my_field1, my_field2)
);
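The three basic DDL statements can be exercised in sequence. This sketch runs the CREATE TABLE example through Python's built-in sqlite3 module; the added column my_field4 is a hypothetical name, and the PRAGMA table_info and sqlite_master lookups are SQLite-specific ways of inspecting the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define the table from the example.
conn.execute("""
    CREATE TABLE My_table (
        my_field1 INT,
        my_field2 VARCHAR(50),
        my_field3 DATE NOT NULL,
        PRIMARY KEY (my_field1, my_field2)
    )
""")

# ALTER: add a column to the existing table.
conn.execute("ALTER TABLE My_table ADD COLUMN my_field4 TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(My_table)")]
print(columns)

# DROP: remove the table and everything stored in it.
conn.execute("DROP TABLE My_table")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # prints: []
```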
Embedded SQL
Embedded SQL is a method of combining the computing power of a programming language and the database manipulation capabilities of SQL. Embedded SQL statements are SQL statements written in line with the program source code of the host language. The embedded SQL statements are parsed by an embedded SQL preprocessor and replaced by host-language calls to a code library. The output from the preprocessor is then compiled by the host compiler. This allows programmers to embed SQL statements in programs written in any number of languages such as: C/C++, COBOL and FORTRAN. The ANSI SQL standards committee defined the embedded SQL standard in two steps: a formalism called Module Language was defined, then the embedded SQL standard was derived from Module Language.[1] The SQL standard defines embedding of SQL as embedded SQL and the language in which SQL queries are embedded is referred to as the host language. A popular host language is C. The mixed C and embedded SQL is called Pro*C in Oracle and Sybase database management systems. In the PostgreSQL database management system this precompiled version is called ECPG. Other embedded SQL precompilers are Pro*Ada, Pro*COBOL, Pro*FORTRAN, Pro*Pascal, and Pro*PL/I.
Oracle Corporation
Ada: Pro*Ada was officially desupported by Oracle in version 7.3. Starting with Oracle8, Pro*Ada has been replaced by SQL*Module but appears not to have been updated since.[3] SQL*Module is a module language that offers a different programming method from embedded SQL. SQL*Module supports the Ada83 language standard for Ada.
C/C++: Pro*C became Pro*C/C++ with Oracle8. Pro*C/C++ is currently supported as of Oracle Database 11g.
COBOL: Pro*COBOL is currently supported as of Oracle Database 11g.
Fortran: Pro*FORTRAN is no longer updated as of Oracle8 but Oracle will continue to issue patch releases as bugs are reported and corrected.[4]
Pascal: Pro*Pascal was not released with Oracle8.[4]
PL/I: Pro*PL/I was not released with Oracle8. The Pro*PL/I Supplement to the Oracle Precompilers Guide, however, continues to make appearances in the Oracle Documentation Library (current as of release 11g).[4]
C/C++: ECPG has been part of PostgreSQL since version 6.3.
COBOL: Cobol-IT is now distributing a COBOL precompiler for PostgreSQL.
C/C++: SESC is an embedded SQL precompiler provided by Altibase Corp. for its DBMS server.
With DataFlex 3.2 and Visual DataFlex you can pass SQL statements via one of the Data Access CLI connectivity kits to Microsoft SQL Server, IBM DB2 or any ODBC supporting database. The results can be retrieved and processed.
COBOL: Cobol-IT is distributing an embedded SQL precompiler for COBOL.
File Organization
File organization is the methodology which is applied to structured computer files. Files contain computer records which can be documents or information which is stored in a certain way for later retrieval. File organization refers primarily to the logical arrangement of data (which can itself be organized in a system of records with correlation between the fields/columns) in a file system. It should not be confused with the physical storage of the file in some types of storage media. There are certain basic types of computer file, which can include files stored as blocks of data and streams of data, where the information streams out of the file while it is being read until the end of the file is encountered. We will look at two components of file organization here:
1. The way the internal file structure is arranged and 2. The external file as it is presented to the O/S or program that calls it. Here we will also examine the concept of file extensions.
We will examine various ways that files can be stored and organized. Files are presented to the application as a stream of bytes and then an EOF (end of file) condition. A program that uses a file needs to know the structure of the file and needs to interpret its contents.
A file should be organized in such a way that the records are always available for processing with no delay. This should be done in line with the activity and volatility of the information.
Sequential Organization
A sequential file contains records organized in the order they were entered. The order of the records is fixed. The records are stored in physical, contiguous blocks, and within each block the records are in sequence. Records in these files can only be read or written sequentially. Once stored in the file, a record cannot be made shorter or longer, or deleted. However, a record can be updated if its length does not change. (This is done by replacing the records by creating a new file.) New records always appear at the end of the file. If the order of the records in a file is not important, sequential organization will suffice, no matter how many records you may have. Sequential output is also useful for report printing or for the sequential reads that some programs prefer to do.
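The behaviour described above can be sketched in C++ as follows. This is only an illustration, assuming fixed-length 16-byte text records held in a standard stream; the names kRecordSize, make_record, append_record and read_all are hypothetical and do not come from any library.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Assumed record length: every record occupies exactly this many bytes.
constexpr std::size_t kRecordSize = 16;

// Pad (or truncate) the text so that it fills exactly one record slot.
std::string make_record(const std::string& text) {
    std::string rec = text.substr(0, kRecordSize);
    rec.resize(kRecordSize, ' ');
    return rec;
}

// New records are only ever written after the records already stored,
// because the stream's write position is never moved backwards.
void append_record(std::ostream& file, const std::string& text) {
    file << make_record(text);
}

// Read every record from the start, in the order in which it was written.
std::vector<std::string> read_all(std::istream& file) {
    std::vector<std::string> records;
    file.clear();                      // forget any earlier EOF condition
    file.seekg(0, std::ios::beg);      // sequential reads start at the beginning
    std::string rec(kRecordSize, '\0');
    while (file.read(&rec[0], static_cast<std::streamsize>(kRecordSize))) {
        records.push_back(rec);
    }
    file.clear();                      // reset the EOF condition after the last read
    return records;
}
```

A std::stringstream can stand in for the file when trying this out: append two records with append_record, then call read_all to get them back in entry order.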
Line-Sequential Organization
Line-sequential files are like sequential files, except that the records can contain only characters as data. Line-sequential files are maintained by the native byte stream files of the operating system. In the COBOL environment, line-sequential files that are created with WRITE statements with the ADVANCING phrase can be directed to a printer as well as to a disk.
Indexed-Sequential Organization
Indexed-sequential organization improves key searches. The simplest structure is single-level indexing: the index is a file whose records are (key, pointer) pairs, where the pointer is the position in the data file of the record with the given key. Only a subset of the records, evenly spaced along the data file, is indexed, so that the index entries mark intervals of data records. A key search is performed as follows: the search key is compared with the index keys to find the highest index key that does not exceed the search key; a linear search is then performed from the record that this index entry points to, until the search key is matched or the record pointed to by the next index entry is reached. Although this kind of search requires two file accesses (index plus data), the access time is significantly lower than for a sequential file search.
Consider, for example, a simple linear search on a sequentially organized file of 1,000 records. An average of 500 key comparisons is needed (assuming the search keys are uniformly distributed among the data keys). With an evenly spaced index of 100 entries, each index entry covers an interval of 10 records, so a search needs on average about 50 comparisons in the index file plus about 5 in the data file: roughly a nine-to-one reduction in the operation count.
The scheme can be extended hierarchically, since an index is itself a sequential file and can in turn be indexed by a second-level index, and so on. Decomposing the search into more levels reduces the access time further, but only up to a point: each additional level adds storage cost and index accesses of its own, which eventually outweigh the gain.
The hardware for indexed-sequential organization is usually disk-based rather than tape-based. Records are physically ordered by primary key, and the index gives the physical location of each record. Records can be accessed sequentially or directly, via the index. The index is stored in a file and read into memory when the file is opened. Indexes must also be maintained. As in sequential organization, the data is stored in physically contiguous blocks; the difference lies in the use of indexes. There are three areas in the disk storage:
Primary Area: contains file records stored by key or ID number.
Overflow Area: contains records that cannot be placed in the primary area.
Index Area: contains the keys of records and their locations on the disk.
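The two-step key search of indexed-sequential organization (scan the index, then scan one interval of the data file) can be sketched in C++ as below. This is a minimal in-memory illustration, not a disk implementation: the data file is assumed to be a key-sorted vector of integer keys, and IndexEntry, build_index, indexed_search and kNotFound are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One index record: a key and the position of that key's record in the data file.
struct IndexEntry {
    int key;
    std::size_t position;
};

// Sentinel returned when the search key is not present.
const std::size_t kNotFound = static_cast<std::size_t>(-1);

// Build a sparse index with one entry for every `step`-th record.
// The keys are assumed to be sorted in ascending order.
std::vector<IndexEntry> build_index(const std::vector<int>& keys, std::size_t step) {
    assert(step > 0);
    std::vector<IndexEntry> index;
    for (std::size_t pos = 0; pos < keys.size(); pos += step) {
        index.push_back({keys[pos], pos});
    }
    return index;
}

// Return the position of search_key in the data file, or kNotFound.
std::size_t indexed_search(const std::vector<int>& keys,
                           const std::vector<IndexEntry>& index,
                           int search_key) {
    // Step 1: linear scan of the index for the highest index key
    // that does not exceed the search key.
    std::size_t start = 0;
    std::size_t end = keys.size();
    bool have_interval = false;
    for (std::size_t i = 0; i < index.size(); ++i) {
        if (index[i].key > search_key) {
            end = index[i].position;   // the interval stops at the next index entry
            break;
        }
        start = index[i].position;
        have_interval = true;
    }
    if (!have_interval) {
        return kNotFound;              // smaller than every indexed key
    }
    // Step 2: linear scan of the data records inside that interval only.
    for (std::size_t pos = start; pos < end; ++pos) {
        if (keys[pos] == search_key) {
            return pos;
        }
    }
    return kNotFound;
}
```

With 1,000 keys and a step of 10, build_index produces 100 index entries, and each call to indexed_search inspects at most 10 data records after scanning the index.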
Inverted List
In file organization, an inverted list is a file that is indexed on many of the attributes of the data itself. The inverted list method has a single index for each key type. The records are not necessarily stored in a sequence. They are placed in the data storage area, and the indexes are updated with the record keys and locations. For example, in a company file, one index could be maintained for all products and another for product types. It is then faster to search the indexes than to search every record. These types of file are also known as "inverted indexes." The benefit is apparent immediately, because searching is fast. However, inverted list files use more media space, so storage devices fill up quickly with this type of organization, and updating is much slower. Content-based queries in text retrieval systems use inverted indexes as their preferred mechanism. Data items in these systems are usually stored compressed, which would normally slow the retrieval process, but the compression algorithm is chosen to support this technique. In certain circumstances a query is designed to be modal, which means that rules are set that require different information to be held in the index. For example, when phrase querying is undertaken, the particular algorithm requires that offsets to word classifications are held in addition to document numbers.
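The product-type example can be sketched in C++ as follows. This is an in-memory illustration under stated assumptions: the data file is a vector of records, and Product, InvertedIndex, build_type_index and find_by_type are hypothetical names introduced only for this sketch.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A record in the (unsorted) data storage area.
struct Product {
    std::string name;
    std::string type;
};

// The inverted index maps an attribute value to the positions of all
// records that carry that value.
using InvertedIndex = std::map<std::string, std::vector<std::size_t>>;

// Build the index on the "type" attribute by visiting every record once.
InvertedIndex build_type_index(const std::vector<Product>& file) {
    InvertedIndex index;
    for (std::size_t pos = 0; pos < file.size(); ++pos) {
        index[file[pos].type].push_back(pos);
    }
    return index;
}

// Look up a type in the index instead of scanning every record.
std::vector<std::size_t> find_by_type(const InvertedIndex& index,
                                      const std::string& type) {
    std::vector<std::size_t> positions;
    InvertedIndex::const_iterator it = index.find(type);
    if (it != index.end()) {
        positions = it->second;
    }
    return positions;
}
```

The trade-off named in the text is visible here: every new record must also be added to each index, which is why updates are slower than with a plain sequential file.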
interrelated databases distributed over a computer network. Sometimes "distributed database system" is used to refer jointly to the distributed database and the distributed DBMS.
A distributed database is a database that is under the control of a central database management system (DBMS) in which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers. Collections of data (e.g. in a database) can be distributed across multiple physical locations. A distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks. Replication and distribution of databases improve database performance at end-user worksites. To ensure that the distributed databases are up to date and current, there are two processes: replication and duplication. Replication uses specialized software that looks for changes in the distributed databases. Once the changes have been identified, the replication process makes all the databases look the same. Depending on the size and number of the distributed databases, replication can be very complex and can require a lot of time and computer resources. Duplication, on the other hand, is not as complicated. It identifies one database as a master and then duplicates that database. The duplication process is normally done at a set time after hours, to ensure that each distributed location has the same data. In the duplication process, changes are allowed to the master database only, so that local data will not be overwritten. Both processes can keep the data current in all distributed locations. Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies.
These technologies' implementation can and does depend on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database, and hence the price the business is willing to spend on ensuring data security, consistency and integrity.
Important considerations
Care with a distributed database must be taken to ensure the following:
The distribution is transparent: users must be able to interact with the system as if it were one logical system. This applies to the system's performance and methods of access, among other things.
Transactions are transparent: each transaction must maintain database integrity across multiple databases. Transactions must also be divided into subtransactions, each subtransaction affecting one database system.
Advantages
Management of distributed data with different levels of transparency.
Increased reliability and availability.
Easier expansion.
Reflects organizational structure: database fragments are located in the departments they relate to.
Local autonomy: a department can control the data about itself (as its members are the ones familiar with it).
Protection of valuable data: if there were ever a catastrophic event such as a fire, all of the data would not be in one place, but distributed in multiple locations.
Improved performance: data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on one module of the database won't affect other modules of the database in a distributed database.)
Economics: it costs less to create a network of smaller computers with the power of a single large computer.
Modularity: systems can be modified, added and removed from the distributed database without affecting other modules (systems).
Reliable transactions, due to replication of the database.
Hardware, operating system, network, fragmentation, DBMS, replication and location independence.
Continuous operation.
Distributed query processing.
Distributed transaction management.
A single site failure does not affect the performance of the system.
All transactions follow the A.C.I.D. properties: atomicity, the transaction takes place as a whole or not at all; consistency, the transaction maps one consistent DB state to another; isolation, each transaction sees a consistent DB; durability, the results of a transaction must survive system failures.
The merge replication method is used to consolidate the data between databases.
Disadvantages
Complexity: extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one. Extra database design work must also be done to account for the disconnected nature of the database; for example, joins become prohibitively expensive when performed across multiple systems.
Economics: increased complexity and a more extensive infrastructure mean extra labour costs.
Security: remote database fragments must be secured, and because they are not centralized the remote sites must be secured as well. The infrastructure must also be secured (e.g., by encrypting the network links between remote sites).
Difficult to maintain integrity: in a distributed database, enforcing integrity over a network may require too much of the network's resources to be feasible.
Inexperience: distributed databases are difficult to work with, and as a young field there is not much readily available experience on proper practice.
Lack of standards: there are no tools or methodologies yet to help users convert a centralized DBMS into a distributed DBMS.
Database design is more complex: besides the normal difficulties, the design of a distributed database has to consider fragmentation of data, allocation of fragments to specific sites and data replication.
Additional software is required.
The operating system should support a distributed environment.
Concurrency control is a major issue. It is solved by locking and time stamping.
A distributed database management system is software for managing databases stored on multiple computers in a network. A distributed database is a set of databases stored on multiple computers that typically appears to applications as a single database. Consequently, an application can simultaneously access and modify the data in several databases in a network. DDBMS is specially developed for heterogeneous database platforms, focusing mainly on heterogeneous database management systems (HDBMS).
The car example is analogous to the object-oriented software. Rather than writing a huge program to create, for example, a project management system, the solution is broken into real-world parts such as project, task, estimate, actual, deliverable, etc. Each of these can then be developed and tested independently before being combined. An object-oriented program may be considered a collection of interacting objects. Each object is capable of sending and receiving messages, and processing data. Consider the objects of a driver, a car, and a traffic light. When the traffic light changes, it sends a virtual message to the driver. The driver receives the message, and then chooses to accelerate or decelerate. This sends a virtual message to the car. When the car's speed changes, it sends a virtual message back to the driver, via the speedometer.
Key Concepts
Classes and Objects
The basic building blocks of object-oriented programming are the class and the object. A class defines the available characteristics and behavior of a set of similar objects and is defined by a programmer in code. A class is an abstract definition that is made concrete at run-time when objects based upon the class are instantiated and take on the class' behavior. As an analogy, let's consider the concept of a 'vehicle' class. The class developed by a programmer would include methods such as Steer(), Accelerate() and Brake(). The class would also include properties such as Colour, NumberOfDoors, TopSpeed and NumberOfWheels. The class is an abstract design that becomes real when objects such as Car, RacingCar, Tank and Tricycle are created, each with its own version of the class' methods and properties.
A class defines the characteristics of an object, i.e. it is a template for creating objects. Characteristics include attributes (fields or properties) and behaviors (methods or operations). For example, a "Car" class could have properties such as year, make, model, color, number of doors, and engine. Behaviors of the "Car" class include on, off, change gears, accelerate, decelerate, turn, and brake.
An object is an instance of a class. Creating an object is also known as instantiation. For example, the object "my Porsche" is an instance of the car class.
A method is a behavior of an object. Within a program, a method usually affects only one particular object. In our example, all cars can accelerate, but the program only needs to make "my Porsche" accelerate.
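The relationship between a class, its objects and its methods can be sketched in C++ as below. The Car class and its member names are illustrative assumptions based on the example in the text, not code from any library.

```cpp
#include <string>
#include <utility>

// The class is the template: it lists the attributes and behaviors
// that every Car object will have.
class Car {
public:
    Car(std::string make, std::string model, int doors)
        : make_(std::move(make)), model_(std::move(model)), doors_(doors) {}

    // Behaviors (methods): each call affects only the object it is invoked on.
    void accelerate(int delta) { speed_ += delta; }
    void brake() { speed_ = 0; }

    // Read access to the attributes.
    const std::string& make() const { return make_; }
    const std::string& model() const { return model_; }
    int doors() const { return doors_; }
    int speed() const { return speed_; }

private:
    // Attributes (fields): every object has its own copy.
    std::string make_;
    std::string model_;
    int doors_;
    int speed_ = 0;
};
```

Instantiation then looks like Car my_porsche("Porsche", "Carrera GT", 2); and calling my_porsche.accelerate(30) changes the speed of that one object only, leaving every other Car untouched.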
Message Passing
Message passing (or method calling) is the process where an object sends data to another object to invoke a method. For example, when the object called "joe" (an instance of the driver class), presses the gas pedal, he literally passes an accelerate message to object "my Porsche", which in turn, invokes the "my Porsche" accelerate method. Message passing, also known as interfacing, describes the communication between objects using their public interfaces. There are three main ways to pass messages. These are using methods, properties and events. A property can be defined in a class to allow objects of that type to advertise and allow changing of state information, such as the 'TopSpeed' property. Methods can be provided so that other objects can request a process to be undertaken by an object, such as the Steer() method. Events can be defined that an object can raise in response to an internal action. Other objects can subscribe to these so that they can react to an event occurring. An example for vehicles could be an 'ImpactDetected' event subscribed to by one or more 'AirBag' objects.
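The three ways of passing messages can be approximated in C++ as sketched below. C++ has no built-in property or event syntax, so this sketch assumes a getter/setter pair for the property and a list of std::function callbacks for the event; Vehicle, AirBag and all member names are hypothetical.

```cpp
#include <functional>
#include <utility>
#include <vector>

class AirBag {
public:
    void deploy() { deployed_ = true; }
    bool deployed() const { return deployed_; }

private:
    bool deployed_ = false;
};

class Vehicle {
public:
    // Property: state that other objects may read and change.
    int top_speed() const { return top_speed_; }
    void set_top_speed(int value) { top_speed_ = value; }

    // Method: other objects request that a process be carried out.
    void steer(int degrees) { heading_ += degrees; }
    int heading() const { return heading_; }

    // Event: other objects subscribe a handler...
    void on_impact_detected(std::function<void()> handler) {
        impact_handlers_.push_back(std::move(handler));
    }
    // ...and the vehicle raises the event in response to an internal action.
    void detect_impact() {
        for (const std::function<void()>& handler : impact_handlers_) {
            handler();
        }
    }

private:
    int top_speed_ = 0;
    int heading_ = 0;
    std::vector<std::function<void()>> impact_handlers_;
};
```

An AirBag object can subscribe with car.on_impact_detected([&bag] { bag.deploy(); }); and it is then deployed when car.detect_impact() raises the event.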
Encapsulation conceals the functional details of a class from objects that send messages to it. For example, the "Porsche Carrera GT" class has a method called "accelerate". The code for the "accelerate" method defines exactly how acceleration occurs: in this example, fuel is pumped from the gas tank and mixed with air in the cylinders, the pistons move causing compression, resulting in combustion, and so on. Object "Joe" is an instance of the "Driver" class. It does not need to know how "my Porsche" accelerates when sending it an accelerate message. Encapsulation protects the integrity of an object by preventing users from changing internal data into something invalid. Encapsulation reduces system complexity, and thus increases robustness, by limiting inter-dependencies between components.
Encapsulation, also known as data hiding, is an important object-oriented programming concept. It is the act of concealing the functionality of a class so that the internal operations are hidden from, and irrelevant to, the programmer. With correct encapsulation, the developer does not need to understand how the class actually operates in order to communicate with it via its publicly available methods and properties, known as its public interface. Encapsulation is essential to creating maintainable object-oriented programs. When the interaction with an object uses only the publicly available interface of methods and properties, the class of the object becomes a correctly isolated unit. This unit can then be replaced independently to fix bugs, to change internal behavior or to improve functionality or performance. In the car analogy this is similar to replacing a headlight bulb. As long as we choose the correct bulb size and connection (the public interface), it will work in the car. It does not matter if the manufacturer has changed or the internal workings of the bulb differ from the original. It may even offer an improvement in brightness!
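How encapsulation protects an object's internal data can be sketched in C++ with a private field that is reachable only through a validating public interface. The FuelTank class below is a hypothetical example in the spirit of the car analogy, not code from the original text.

```cpp
#include <stdexcept>

class FuelTank {
public:
    explicit FuelTank(double capacity_litres) : capacity_(capacity_litres) {}

    // Public interface: the only way to change the fuel level.
    // Invalid requests are rejected, so the level can never become invalid.
    void refuel(double litres) {
        if (litres < 0.0 || level_ + litres > capacity_) {
            throw std::invalid_argument("invalid refuel amount");
        }
        level_ += litres;
    }

    double level() const { return level_; }

private:
    // Hidden state: callers cannot write to these fields directly.
    double capacity_;
    double level_ = 0.0;
};
```

Because callers depend only on refuel() and level(), the internal representation could later be changed (for example, to store the level as a percentage) without touching any calling code.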
Abstraction is the practice of reducing details so that someone can focus on a few concepts at a time. For example, "my Porsche" may be treated as a "Car" most of the time. It may sometimes be treated as a "Porsche Carrera GT" to access specific properties and methods relevant to "Porsche Carrera GT". It could also be treated as a vehicle, the parent class of "Car", when considering all traffic in your neighborhood.
Inheritance (also known as subclassing) occurs when a specialized version of a class is defined. The subclass inherits attributes and behaviors from the parent class. For example, the "Car" class could have subclasses called "Porsche car", "Chevy car", and "Ford car". Subclasses inherit properties and methods from the parent class, so the software engineer only has to write the code for them once. A subclass can alter its inherited attributes or methods. In our example, the "Porsche car" subclass would specify that the default make is Porsche. A subclass can also include its own attributes and behaviors. We could create a subclass of "Porsche car" called "Porsche Carrera GT". The "Porsche Carrera GT" class could further specify a default model of "Carrera GT" and "number of doors" as two. It could also include a new method called "deploy rear wing spoiler".
The object of "my Porsche" may be instantiated from class "Porsche Carrera GT" instead of class "Car". This allows sending a message to invoke the new method "deploy rear wing spoiler". Inheritance is an interesting object-oriented programming concept. It allows one class (the sub-class) to be based upon another (the super-class) and inherit all of its functionality automatically. Additional code may then be added to create a more specialized version of the class. In the example of vehicles, sub-classes for cars or motorcycles could be created. Each would still have all of the behavior of a vehicle but can add specialized methods and properties, such as 'Lean()' and 'LeanAngle' for motorcycles.
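The "Car", "Porsche car" and "Porsche Carrera GT" chain described above can be sketched in C++ as follows. The class and member names are illustrative assumptions that follow the wording of the text.

```cpp
#include <string>
#include <utility>

// Parent class: the code for the shared attributes and behaviors is written once.
class Car {
public:
    Car(std::string make, std::string model, int doors)
        : make_(std::move(make)), model_(std::move(model)), doors_(doors) {}

    void accelerate(int delta) { speed_ += delta; }
    int speed() const { return speed_; }
    const std::string& make() const { return make_; }
    const std::string& model() const { return model_; }
    int doors() const { return doors_; }

private:
    std::string make_;
    std::string model_;
    int doors_;
    int speed_ = 0;
};

// Subclass: fixes the make to "Porsche" and inherits everything else.
class PorscheCar : public Car {
public:
    PorscheCar(std::string model, int doors)
        : Car("Porsche", std::move(model), doors) {}
};

// Subclass of the subclass: fixes model and doors, and adds a new behavior.
class PorscheCarreraGT : public PorscheCar {
public:
    PorscheCarreraGT() : PorscheCar("Carrera GT", 2) {}

    void deploy_rear_wing_spoiler() { spoiler_deployed_ = true; }
    bool spoiler_deployed() const { return spoiler_deployed_; }

private:
    bool spoiler_deployed_ = false;
};
```

An object declared as PorscheCarreraGT my_porsche; can use accelerate(), which it inherits from Car, as well as deploy_rear_wing_spoiler(), which exists only in the most specialized class.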
Some programming languages allow multiple inheritance, where a subclass is derived from two or more super-classes. C# does not permit this but does allow a class to implement multiple interfaces. An interface defines a contract for the methods and properties of classes that implement it. However, it does not include any actual functionality.
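C++, the language used for the code examples in these notes, does permit multiple inheritance, and an interface can be modelled as a class that contains only pure virtual functions. The sketch below assumes two hypothetical interfaces, Drivable and Floatable, and a class AmphibiousCar that implements both.

```cpp
#include <string>

// An "interface": a contract with no functionality of its own.
class Drivable {
public:
    virtual ~Drivable() = default;
    virtual std::string drive() const = 0;
};

// A second, unrelated contract.
class Floatable {
public:
    virtual ~Floatable() = default;
    virtual std::string sail() const = 0;
};

// Multiple inheritance: one class derived from two super-classes,
// supplying the functionality that both contracts require.
class AmphibiousCar : public Drivable, public Floatable {
public:
    std::string drive() const override { return "driving on road"; }
    std::string sail() const override { return "sailing on water"; }
};
```

An AmphibiousCar object can be passed wherever a Drivable or a Floatable reference is expected, because it fulfils both contracts.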
Polymorphism is the ability of one type to appear as (and be used like) another type. Classes "Porsche Carrera GT" and "Ford Mustang" both inherit a method called "brake" from a common parent class, but executing "brake" on the two types produces different results: "my Porsche" may brake at a rate of 33 feet per second, whereas "my Mustang" may brake at a rate of 29 feet per second. Polymorphism is also described as the ability of an object to change its behaviour according to how it is being used. Where an object's class inherits from a super-class or implements one or more interfaces, it can be referred to by those class or interface names. So if a method expects an object of type 'vehicle' to be passed as a parameter, we can pass any vehicle, car or motorcycle object to that method, even though its data type is technically different.
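The braking example can be sketched in C++ as follows. The rates of 33 and 29 come from the text; the base-class value of 20, the class layout and the function name braking_rate_of are assumptions made only for this illustration.

```cpp
// Parent class with a behavior that subclasses may redefine.
class Vehicle {
public:
    virtual ~Vehicle() = default;
    // Hypothetical default braking rate for a generic vehicle.
    virtual int braking_rate() const { return 20; }
};

class PorscheCarreraGT : public Vehicle {
public:
    int braking_rate() const override { return 33; }
};

class FordMustang : public Vehicle {
public:
    int braking_rate() const override { return 29; }
};

// This function expects "a vehicle": any Vehicle or subclass object can be
// passed, and the call is dispatched to the object's own version at run-time.
int braking_rate_of(const Vehicle& vehicle) {
    return vehicle.braking_rate();
}
```

The same call, braking_rate_of(x), therefore yields a different result depending on whether x is a PorscheCarreraGT, a FordMustang or a plain Vehicle.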
Function Overloading
A function name f is overloaded by declaring more than one function with the name f in the same scope. The declarations of f must differ from each other in the types and/or the number of arguments in the argument list. When you call an overloaded function named f, the correct function is selected by comparing the argument list of the function call with the parameter list of each of the overloaded candidate functions with the name f. A candidate function is a function that can be called based on the context of the call of the overloaded function name. Consider a function print, which displays an int. As shown in the following example, you can overload the function print to display other types, for example, double and char*. You can have three functions with the same name, each performing a similar operation on a different data type:
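#include <iostream>
using namespace std;

// Selected when the argument is an int
void print(int i) {
    cout << " Here is int " << i << endl;
}

// Selected when the argument is a double
void print(double f) {
    cout << " Here is float " << f << endl;
}

// Selected when the argument is a char*
void print(char* c) {
    cout << " Here is char* " << c << endl;
}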
Virtual Function
A C++ virtual function is a member function of a class whose functionality can be overridden in its derived classes. The whole function body can be replaced with a new implementation in the derived class. The concept of C++ virtual functions is different from C++ function overloading.
C++ Virtual Function - Properties:
A C++ virtual function is:
A member function of a class
Declared with the virtual keyword
Usually given a different functionality in the derived class
Resolved at run-time when it is called
The difference between a non-virtual C++ member function and a virtual member function is that calls to non-virtual member functions are resolved at compile time. This mechanism is called static binding. Calls to C++ virtual member functions, by contrast, are resolved at run-time. This mechanism is known as dynamic binding.
C++ Virtual Function - Reasons:
The most prominent reason for using a C++ virtual function is to provide a different functionality in the derived class. For example, a Create function in a class Window may have to create a window with a white background. But a class called Command Button, derived or inherited from Window, may have to use a gray background and write a caption in the center. The Create function for Command Button should then have a functionality different from the one in the class called Window.
C++ Virtual function - Example:
This example assumes a base class named Window with a virtual member function named Create. The derived class is named CommandButton, with an overridden function Create.
#include <iostream>
using namespace std;

class Window // Base class for C++ virtual function example
{
public:
    virtual ~Window() {} // virtual destructor, so that delete through a base pointer is safe
    virtual void Create() // virtual function for C++ virtual function example
    {
        cout << "Base class Window" << endl;
    }
};

class CommandButton : public Window
{
public:
    void Create() // overrides Window::Create
    {
        cout << "Derived class Command Button - Overridden C++ virtual function" << endl;
    }
};

int main()
{
    Window *x, *y;
    x = new Window();
    x->Create();
    y = new CommandButton();
    y->Create(); // resolved at run-time: calls CommandButton::Create
    delete x;
    delete y;
    return 0;
}
The output of the above program will be:
Base class Window
Derived class Command Button - Overridden C++ virtual function
If the function had not been declared virtual, the base class function would have been called every time, because the function address would have been statically bound at compile time. As the function is declared virtual, it is a candidate for run-time linking, and the derived class function is invoked.
C++ Virtual function - Call Mechanism:
Whenever a class declares a C++ virtual function, a v-table is constructed for the class. In typical implementations the v-table holds the addresses of the virtual functions of that class, and each object of the class (or of a derived class) carries a pointer to the v-table of its own class. Whenever a call is made to a C++ virtual function, the v-table is used to resolve the function address. This is how dynamic binding happens during a virtual function call.
Friend Functions
This section covers friend functions: why they are needed, how to define and use them, and a few important points about them, explained with an example.
As we know from access specifiers, when data is declared as private inside a class, it is not accessible from outside the class. A function that is not a member of the class, or an external class, will not be able to access the private data. A programmer may nevertheless need to access private data from non-member functions and external classes. For handling such cases, the concept of friend functions is a useful tool.
What is a Friend Function?
A friend function is used for accessing the non-public members of a class. A class can allow non-member functions and other classes to access its own private data by making them friends. Thus, a friend function is an ordinary function or a member of another class.
How to define and use Friend Function in C++:
The friend function is written like any other normal function, except that its declaration inside the class is preceded by the keyword friend. Because it is not a member, the friend function is normally passed an object of the class that declares it a friend as an argument.
Some important points to note while using friend functions in C++:
The keyword friend is placed only in the function declaration inside the class and not in the function definition outside it.
It is possible to declare a function as a friend in any number of classes.
When a class is declared as a friend, the friend class has access to the private data of the class that made it a friend.
A friend function, even though it is not a member function, has the right to access the private members of the class.
The friend declaration can be placed in either the private or the public part of the class.
The function can be invoked without the use of an object. The friend function takes objects as its arguments, as seen in the example below.
#include <iostream>
using namespace std;

class exforsys
{
private:
    int a, b;
public:
    void test()
    {
        a = 100;
        b = 200;
    }
    // Friend function declaration: the keyword friend, with an object of
    // class exforsys (the class it is a friend of) passed to it
    friend int compute(exforsys e1);
};

// Friend function definition, which has access to the private data
int compute(exforsys e1)
{
    return int(e1.a + e1.b) - 5;
}

int main()
{
    exforsys e;
    e.test();
    // Calling the friend function with an object as argument
    cout << "The result is: " << compute(e) << endl;
    return 0;
}
The output of the above program is:
The result is: 295
The function compute() is a non-member function of the class exforsys. In order to give this function access to the private data a and b of class exforsys, it is made a friend function of the class exforsys. As a first step, the function compute() is declared as a friend in the class exforsys as:
friend int compute (exforsys e1)
Objects can work together in many ways within a system. In some situations, classes and objects can be tightly coupled together to provide more complex functionality. This is known as composition. In the car example, the wheels, panels, engine, gearbox, etc. can be thought of as individual classes. To create the car class, you link all of these objects together, possibly adding
further functionality. The internal workings of each class are not important due to encapsulation as the communication between the objects is still via passing messages to their public interfaces.
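Composition can be sketched in C++ by giving one class member objects of other classes. The Engine, Wheel and Car classes below are hypothetical and deliberately minimal; the point is only that Car talks to its parts through their public interfaces.

```cpp
#include <array>
#include <cstddef>

class Engine {
public:
    void start() { running_ = true; }
    bool running() const { return running_; }

private:
    bool running_ = false;
};

class Wheel {
public:
    void turn(int degrees) { angle_ += degrees; }
    int angle() const { return angle_; }

private:
    int angle_ = 0;
};

// The car is composed of an engine and four wheels, and adds
// functionality of its own on top of them.
class Car {
public:
    void start() { engine_.start(); }   // message passed to the engine
    bool running() const { return engine_.running(); }

    // Steering turns the two front wheels (assumed to be wheels 0 and 1).
    void steer(int degrees) {
        wheels_[0].turn(degrees);
        wheels_[1].turn(degrees);
    }
    int front_wheel_angle() const { return wheels_[0].angle(); }
    std::size_t wheel_count() const { return wheels_.size(); }

private:
    Engine engine_;
    std::array<Wheel, 4> wheels_;
};
```

Car never touches the private fields of Engine or Wheel, so either part could be reimplemented without changing Car, as long as its public interface stays the same.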
In addition to the concepts described above, object-oriented programming also permits increased modularity. Individual classes or groups of linked classes can be thought of as a module of code that can be re-used in many software projects. This reduces the need to redevelop similar functionality and therefore can lower development time and costs.