Module 5
1. Decomposition Techniques:
1.1 Software sizing:
Definition:
Software sizing is the process of estimating the size of a software project, which helps determine
the effort, time, and cost required to develop it. Accurate software sizing is critical for project
planning.
Key Points:
1. Factors Affecting Accuracy:
o Proper estimation of the product's size.
o Ability to translate size into human effort, time, and cost using reliable metrics.
o The capabilities of the software team.
o Stability of product requirements and the supporting environment.
2. Measuring Software Size:
o Direct Approach (LOC): Measures the software’s size by counting Lines of Code
(LOC).
o Indirect Approach (FP): Uses Function Points (FP), focusing on the functionality
provided by the system.
3. Approaches to Software Sizing:
o Fuzzy Logic Sizing: Uses approximate reasoning techniques, where the planner
identifies the type of application, estimates its magnitude qualitatively, and refines the
estimate.
o Function Point Sizing: Estimates based on information domain characteristics, such
as inputs, outputs, inquiries, files, and interfaces.
o Standard Component Sizing: Estimates based on standard components like
subsystems, modules, reports, or batch programs. Historical project data helps
estimate the size for each component.
o Change Sizing: Used when modifying existing software. The planner estimates the
number and type of changes (e.g., adding, deleting, or reusing code).
4. Advantages:
o Helps in understanding the scope of the software and estimating resources early.
o Various approaches allow flexibility depending on the project’s nature.
5. Limitations:
o Accuracy depends on available historical data and the stability of requirements.
o Some methods (like fuzzy logic) can be subjective and less precise.
1.2 Problem based estimation:
Definition:
Problem-based estimation uses LOC (Lines of Code) or FP (Function Points) to estimate the size
of a software project and calculate the associated cost and effort. These estimation variables help
break down the scope of the project into smaller, manageable functions, which are then individually
estimated.
Key Points:
1. Two Uses of LOC and FP:
o Sizing Functions: LOC or FP are used to size each function in the software.
o Historical Data: Past project data is used to calculate cost and effort by applying
baseline productivity metrics (e.g., LOC/pm or FP/pm).
2. Process:
o Decompose Software Scope: The project scope is broken into smaller functions, and
LOC or FP is estimated for each function.
o Apply Productivity Metrics: Historical productivity data (e.g., LOC/pm or FP/pm) is
used to derive cost or effort for each function.
o Combine Estimates: Function estimates are summed up to produce an overall
estimate for the entire project.
3. Challenges with Baseline Metrics:
o Productivity metrics can vary, so it’s better to compute domain-specific averages
(e.g., by team size, complexity, or application area) to improve estimate accuracy.
4. LOC vs. FP Estimation:
o LOC Estimation: Requires detailed decomposition of the project to estimate the
number of lines of code for each function.
o FP Estimation: Focuses on information domain characteristics (inputs, outputs,
data files, etc.), and uses complexity adjustment values to estimate function points.
5. Three-Point Estimation:
o A range of values (optimistic, most likely, pessimistic) is used to estimate each
function. The expected value (S) is calculated as the weighted average
S = (Sopt + 4Sm + Spess) / 6
where Sopt, Sm, and Spess are the optimistic, most likely, and pessimistic estimates.
o Because the most likely estimate carries the heaviest weight, this provides a more
reliable estimate while still accounting for uncertainty (a worked sketch follows this list).
6. Final Check: After applying historical data and calculating the expected value, it's important
to cross-check estimates with other methods and rely on experience and common sense.
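For instance, the three-point calculation can be written as a minimal Python sketch; the function name expected_size is illustrative, and the input figures are the 3D geometric analysis values used in the example of Section 1.3:

    def expected_size(optimistic, most_likely, pessimistic):
        # Weighted average: the most likely value gets the heaviest weight (4x)
        return (optimistic + 4 * most_likely + pessimistic) / 6

    # 3D geometric analysis estimates from the CAD example
    print(expected_size(4600, 6900, 8600))  # 6800.0 LOC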
1.3 An Example of LOC-Based Estimation
A CAD software project is to be developed for mechanical components, involving interactions
with peripherals like a mouse, digitizer, and printer. The preliminary scope includes:
Accepting 2D/3D geometric data from an engineer.
Interacting through a user interface.
Maintaining a CAD database and interfacing with graphics peripherals.
LOC Estimates:
For 3D geometric analysis:
o Optimistic: 4600 LOC
o Most likely: 6900 LOC
o Pessimistic: 8600 LOC
Expected value for 3D geometric analysis: (4600 + 4 × 6900 + 8600) / 6 = 6800 LOC.
After estimating the LOC for all functions, the total LOC estimate for the CAD system is 33,200
LOC.
Historical Data:
The average productivity for systems of this type is 620 LOC per person-month (LOC/pm).
With a burdened labour rate of $8000 per month, the cost per LOC is approximately $13 ($8000 ÷ 620 LOC/pm).
Total estimated project cost: approximately $431,000 (33,200 LOC × ~$13 per LOC).
Total estimated effort: 54 person-months (33,200 LOC ÷ 620 LOC/pm ≈ 53.5).
This example illustrates how LOC-based estimation can predict the project’s size, cost, and
effort using historical productivity data.
LOC-Based Estimation with Example
Definition:
LOC-based estimation is a technique used to estimate the size of a software project by calculating
the Lines of Code (LOC) needed to implement different functions. It helps predict the effort, time,
and cost required for development.
Steps in LOC-Based Estimation:
1. Decompose the System:
Break the software system into major functions or components.
2. Estimate LOC for Each Function:
For each function, develop optimistic, most likely, and pessimistic LOC estimates. Then,
calculate the expected LOC using a weighted average.
3. Sum the LOC Estimates:
Add the expected LOC for all functions to get the total LOC for the project.
4. Apply Historical Data:
Use historical productivity metrics (e.g., LOC per person-month) to calculate the effort and
cost.
Example:
Let's take an example of a CAD software system. The software is designed to interact with
various peripherals like a mouse, digitizer, and printer. The system has multiple functions, and the
LOC estimates for one of the key functions (3D Geometric Analysis) are:
Optimistic LOC: 4600
Most Likely LOC: 6900
Pessimistic LOC: 8600
The expected LOC for this function is (4600 + 4 × 6900 + 8600) / 6 = 6800 LOC.
By summing the expected LOC for all the functions, the total LOC for the CAD system is
estimated to be 33,200 LOC.
Historical Data:
Based on historical productivity data:
The average productivity is 620 LOC per person-month (LOC/pm).
With a labour rate of $8000 per month, the cost per LOC is approximately $13.
Final Estimates:
Total Estimated Cost: $431,000
Total Estimated Effort: 54 person-months
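As a quick check, the cost and effort arithmetic above can be reproduced with a short Python sketch; the variable names are illustrative, and only figures quoted in the notes (total LOC, productivity, labour rate) are used:

    # Figures quoted in the notes for the CAD example
    total_loc = 33_200       # total expected LOC over all functions
    productivity = 620       # historical average, LOC per person-month
    labour_rate = 8000       # burdened labour rate, $ per person-month

    cost_per_loc = labour_rate / productivity   # ~$12.90, rounded to ~$13 in the notes
    effort_pm = total_loc / productivity        # ~53.5, reported as 54 person-months
    total_cost = effort_pm * labour_rate        # ~$428,000 here; the notes report ~$431,000 after rounding
    print(round(cost_per_loc, 2), round(effort_pm), round(total_cost))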
1.4 An Example of FP-Based Estimation
In FP-based estimation, decomposition focuses on information domain values such as inputs,
outputs, inquiries, files, and external interfaces, rather than the software functions themselves.
1. Decomposition Process:
o For the CAD software, you would estimate the complexity of each information
domain value, such as inputs, outputs, inquiries, files, and external interfaces.
o The complexity weight for each domain is assigned, with the average complexity
assumed for this estimate.
2. Formula for Function Points (FP): The function points are calculated using the
following formula:
FP = count total × [0.65 + 0.01 × Σ Fi]
Where count total is the sum of the weighted information domain values (inputs, outputs,
inquiries, files, and external interfaces), and the Fi (i = 1 to 14) are the complexity
adjustment values. Average complexity weights are assumed for this calculation (a short
sketch applying the formula appears after this list).
3. Historical Data:
o The average productivity for systems of this type is 6.5 FP per person-month
(FP/pm).
o Based on a burdened labour rate of $8000 per month, the cost per FP is
approximately $1230 ($8000 ÷ 6.5 FP/pm).
4. Final Estimates:
o Total estimated project cost: $461,000
o Total estimated effort: 58 person-months
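Putting the FP pieces together, here is a hedged Python sketch. The information domain counts below are hypothetical placeholders (the notes do not give the CAD project's actual counts); the weights are the usual average-complexity weights, and the productivity and labour figures are the ones quoted above:

    # FP = count_total * (0.65 + 0.01 * sum(Fi))  -- standard FP computation
    def function_points(counts, weights, adjustment_factors):
        count_total = sum(counts[k] * weights[k] for k in counts)
        return count_total * (0.65 + 0.01 * sum(adjustment_factors))

    # Hypothetical counts for the five information domain values (NOT the actual CAD figures)
    counts  = {"inputs": 20, "outputs": 12, "inquiries": 16, "files": 4, "interfaces": 2}
    weights = {"inputs": 4, "outputs": 5, "inquiries": 4, "files": 10, "interfaces": 7}  # average complexity
    f_i = [3] * 14          # 14 value adjustment factors, all set to "average"

    fp = function_points(counts, weights, f_i)
    productivity = 6.5      # FP per person-month (from the notes)
    labour_rate = 8000      # $ per person-month
    cost_per_fp = labour_rate / productivity            # ~$1230
    print(round(fp), round(fp / productivity), round(fp * cost_per_fp))  # FP, effort (pm), cost ($)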
1.5 Process-Based Estimation:
Definition:
Process-based estimation involves breaking down the software development process into smaller
tasks or activities. The effort required to accomplish each task is then estimated, typically in
person-months.
Key Points:
1. Starting Point:
Process-based estimation begins by defining software functions from the project scope.
These functions are then mapped to specific process activities.
2. Decomposing the Process:
The software process is broken down into a set of framework activities (e.g., design,
coding, testing). Each activity is associated with a specific function, creating a matrix where
effort estimates are calculated for each combination.
3. Effort Estimation:
o For each function and activity, the effort (person-months) required is estimated.
o These estimates are combined to form a central table, where the labor required for
each task is identified.
4. Labour Rates:
The labour rate (cost per unit of effort) is applied to each process activity. Labour rates may
vary based on task complexity and staff seniority. For example, senior staff will be involved
in early activities (e.g., design) and will generally have higher rates than junior staff working
on later activities (e.g., coding and release).
5. Cost and Effort Calculation:
After the effort matrix is filled in, the cost and effort for each function and activity are
calculated. These estimates can be compared and reconciled with other estimation methods
(e.g., LOC- or FP-based) to check for consistency (a minimal sketch of the matrix follows this list).
6. Validation:
If both the process-based and other estimation methods show similar results, the estimates
are likely reliable. If they disagree, further investigation is needed to refine the estimates.
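The function-activity matrix described above can be sketched in a few lines of Python; the functions, activities, and person-month figures below are hypothetical placeholders, not the CAD project's actual table:

    # Rows = software functions, columns = framework activities; cells = effort in person-months.
    activities = ["analysis", "design", "code", "test"]
    effort_matrix = {                     # hypothetical figures for illustration only
        "Function A": [0.5, 2.5, 0.4, 5.0],
        "Function B": [0.75, 4.0, 0.6, 2.0],
        "Function C": [0.5, 4.0, 1.0, 3.0],
    }
    labour_rate = 8000                    # $ per person-month (a single flat rate for simplicity)

    effort_per_function = {f: sum(row) for f, row in effort_matrix.items()}   # horizontal totals
    effort_per_activity = [sum(r[i] for r in effort_matrix.values()) for i in range(len(activities))]
    total_effort = sum(effort_per_function.values())

    print(effort_per_function)
    print(dict(zip(activities, effort_per_activity)))                         # vertical totals
    print(total_effort, total_effort * labour_rate)                           # person-months, cost ($)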
1.6 An Example for Process Based Estimation:
For the CAD software project, the system configuration and software functions remain the same
as outlined in the project scope.
1. Process-Based Table:
The table estimates the effort (in person-months) required for each software
engineering activity for every CAD software function:
o The engineering and construction activities are subdivided into the major software
engineering tasks (e.g., analysis, design, coding, testing).
o Gross estimates for customer communication, planning, and risk analysis are
entered in the total row at the bottom of the table.
2. Effort Distribution:
Horizontal and vertical totals show the estimated effort required for different stages like
analysis, design, coding, and testing. It is noted that 53% of all effort goes into front-end
engineering tasks (requirements analysis and design), highlighting their importance.
3. Cost and Effort Calculation:
o Using an average labour rate of $8000 per month, the total estimated project cost
is $368,000 (46 person-months × $8000), and the total estimated effort is 46 person-months.
o If needed, labour rates can be applied to each framework activity or task for more
detailed cost breakdowns.
1.7 Estimation with Use Cases:
Use cases provide valuable insights into software scope and requirements, but using them for
estimation is challenging due to several reasons:
1. No Standard Format: Use cases are described in many different formats and styles,
without a standard form.
2. External View: Use cases represent the user's view of the software and can be written at
varying levels of abstraction.
3. Complexity: Use cases do not address the complexity of the software’s functions and
features.
4. Varied Effort: Different use cases may require vastly different amounts of effort to
implement. One use case might take months, while another may be completed in a few
days.
Despite these challenges, Smith suggests that use cases can be used for estimation, but only
within a defined structural hierarchy. This means:
The hierarchy should be divided into levels, and no more than 10 use cases should
describe any level.
Each use case should contain no more than 30 distinct scenarios.
The level of abstraction depends on the system’s size. For larger systems, use cases are
written at a higher abstraction level.
To use the use cases for estimation, several factors must be considered:
The level within the hierarchy of use cases.
The average length (in pages) of each use case.
The type of software (e.g., real-time, business, WebApp, embedded).
The rough architecture of the system.
Once these characteristics are established, empirical data can be used to estimate the number of
LOC (Lines of Code) or FP (Function Points) per use case. Historical data helps compute the
effort required to develop the system.
Formula for LOC Estimate:
LOC estimate = N × LOCavg + [(Sa/Sh − 1) + (Pa/Ph − 1)] × LOCadjust
Where:
N = actual number of use cases
LOCavg = historical average LOC per use case
Sa, Sh = actual and historical average number of scenarios per use case
Pa, Ph = actual and historical average number of pages per use case
LOCadjust = an adjustment equal to n percent of LOCavg, where n reflects how different
this project is from the average past project
This formula adjusts the average LOC per use case based on actual scenarios, page length,
and the project’s uniqueness.
1.8 An Example of Use-Case–Based Estimation:
The CAD software is divided into three subsystem groups:
1. User Interface Subsystem: Includes UICF (described by 6 use cases).
2. Engineering Subsystem: Includes 2DGA, 3DGA, and DAM (described by 10 use cases).
3. Infrastructure Subsystem: Includes CGDF and PCF (described by 5 use cases).
Each use case has the following characteristics:
User Interface Subsystem: 6 use cases, each with up to 10 scenarios and an average of 6
pages.
Engineering Subsystem: 10 use cases, each with up to 20 scenarios and an average of 8
pages.
Infrastructure Subsystem: 5 use cases, each with 6 scenarios and an average of 5
pages.
Using the LOC-estimate formula from Section 1.7 with a 30% adjustment factor (n = 30%), the
LOC estimate for each subsystem group is calculated. The historical data used for the estimates
includes:
User Interface: 800 LOC per use case with no more than 12 scenarios and fewer than 5
pages.
After applying the formula to all subsystems, the total estimated LOC for the CAD system is
42,500 LOC.
Using the historical productivity rate of 620 LOC per person-month (LOC/pm) and a labour rate
of $8000 per month, the cost per LOC is approximately $13.
Based on this estimate, the total project cost is $552,000, and the estimated effort is 68
person-months.
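The following Python sketch applies the LOC-estimate formula from Section 1.7 to the User Interface group above (6 use cases, 10 scenarios, 6 pages, against historical averages of 800 LOC per use case, 12 scenarios, and 5 pages, with n = 30%). It is a sketch of the reconstructed formula only; the notes do not list the historical LOC averages for the other two subsystem groups, so only the UI group and the quoted totals are shown:

    def use_case_loc(n_use_cases, loc_avg, s_actual, s_hist, p_actual, p_hist, n_percent):
        # LOC estimate = N*LOCavg + [(Sa/Sh - 1) + (Pa/Ph - 1)] * LOCadjust,
        # where LOCadjust is n percent of LOCavg (Section 1.7)
        loc_adjust = n_percent * loc_avg
        return n_use_cases * loc_avg + ((s_actual / s_hist - 1) + (p_actual / p_hist - 1)) * loc_adjust

    ui_loc = use_case_loc(6, 800, 10, 12, 6, 5, 0.30)   # User Interface subsystem group
    print(round(ui_loc))                                # this group's contribution to the total

    # Converting the quoted total of 42,500 LOC into effort and cost, as in the notes:
    total_loc, productivity = 42_500, 620
    print(total_loc / productivity)   # ~68.5, reported as 68 person-months
    print(total_loc * 13)             # $552,500 at ~$13/LOC, reported as ~$552,000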
1.9 Reconciling Estimates:
In the estimation process, multiple methods are used to generate different estimates of effort,
project duration, or cost. These estimates need to be reconciled into a single, final estimate.
For the CAD software example:
The estimated effort ranges from 46 person-months (process-based estimation) to 68
person-months (use-case estimation).
The average estimate from all four methods is 56 person-months.
The variation from the average is approximately 18% lower and 21% higher.
When estimates show significant disagreement, it is important to re-evaluate the information used
for the estimates. The causes of such discrepancies are often:
1. Misunderstanding or misinterpretation of the project scope by the planner.
2. Inaccurate or outdated productivity data used in estimation techniques, or the data was
misapplied.
To resolve this, the cause of the divergence must be identified, and the estimates must be
reconciled.
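A small Python sketch reproduces the reconciliation arithmetic, using the four effort estimates that appear in this module (LOC-based 54, FP-based 58, process-based 46, and use-case-based 68 person-months):

    estimates = {"LOC": 54, "FP": 58, "process": 46, "use case": 68}   # person-months
    average = sum(estimates.values()) / len(estimates)                 # 56.5, rounded to 56 in the notes

    low, high = min(estimates.values()), max(estimates.values())
    print(average)
    print((average - low) / average)    # ~0.19; the notes quote ~18% below (using the rounded average 56)
    print((high - average) / average)   # ~0.20; the notes quote ~21% above (using the rounded average 56)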
2. Empirical Estimation Models:
Estimation models predict software effort based on LOC (Lines of Code) or FP
(Function Points), using formulas derived from past project data.
These models are based on limited data and may not be suitable for all software types or
development environments, so results should be used carefully.
The model should be calibrated to local conditions and tested with data from completed
projects to compare actual vs. predicted results.
If predictions are inaccurate, the model needs to be tuned and retested before being used
for future estimates.
2.1 The Structure of Estimation Models:
Estimation models are created using regression analysis on data collected from past software projects.
These models predict effort (E) in person-months based on an estimation variable (ev), such as
LOC (Lines of Code) or FP (Function Points).
The general structure of these models is:
E = A + B × (ev)^C
Where A, B, and C are empirically derived constants, and ev is the estimation variable (either
LOC or FP).
Most models also include an adjustment component that modifies E based on factors such as
problem complexity, staff experience, and the development environment.
LOC-based models include:
E = 5.2 × (KLOC)^0.91 (Walston-Felix model)
E = 5.5 + 0.73 × (KLOC)^1.16 (Bailey-Basili model)
E = 3.2 × (KLOC)^1.05 (Boehm simple model)
E = 5.288 × (KLOC)^1.047 (Doty model, for KLOC > 9)
FP-based models include:
E = -91.4 + 0.355 × FP (Albrecht and Gaffney model)
E = -37 + 0.96 × FP (Kemerer model)
E = -12.88 + 0.405 × FP (small project regression model)
Key point: Since each model produces different results for the same LOC or FP values, these
models must be calibrated to local project conditions for accurate estimation.
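To see why calibration matters, the LOC-based models above can be evaluated for a single size; a short Python sketch (using the CAD example's 33.2 KLOC as input) shows how widely the predicted effort varies across models:

    # Effort (person-months) as a function of size in KLOC, per the models listed above
    models = {
        "Walston-Felix": lambda kloc: 5.2 * kloc ** 0.91,
        "Bailey-Basili": lambda kloc: 5.5 + 0.73 * kloc ** 1.16,
        "Boehm simple":  lambda kloc: 3.2 * kloc ** 1.05,
        "Doty (KLOC>9)": lambda kloc: 5.288 * kloc ** 1.047,
    }

    kloc = 33.2   # size from the CAD example
    for name, model in models.items():
        print(f"{name}: {model(kloc):.1f} person-months")
    # For this single input the predictions range from roughly 48 to roughly 210 person-months,
    # which is why each model must be calibrated before use.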
2.2 The COCOMO II Model:
COCOMO II (Constructive Cost Model II) is an advanced software cost estimation model
developed by Barry Boehm. It is used to predict the effort, cost, and schedule of software
development projects. The model consists of three sub-models:
1. Application Composition Model: Used in the early stages of development for prototyping,
user interface design, and system interaction evaluation. It estimates effort using object
points, which are based on the number of screens, reports, and components in the system.
Each object is assigned a complexity level (simple, medium, or difficult).
2. Early Design Model: Applied once the project requirements are stable, and the basic
architecture is defined. It provides a broader estimate of effort and cost based on system
complexity.
3. Post-Architecture Model: Used during the construction phase of the software. It gives
more detailed estimates based on finalized architecture and design.
Sizing Options in COCOMO II: The model requires sizing information to calculate estimates:
Object Points: Used in the application composition model, based on screens, reports, and
components.
Function Points: Used for more advanced stages, measuring software functionality from
the user's perspective.
Lines of Code (KLOC): Measures software size by counting lines of code.
Formulas:
Productivity Rate (PROD):
PROD = NOP / person-month
Where NOP is the number of object points and person-month is the effort expended by a
developer or team in one month.
Effort Estimation:
Estimated effort = NOP / PROD
This formula calculates the total effort in person-months required to complete the project, given
the productivity rate.
COCOMO II also accounts for reuse by adjusting the object point count based on the percentage
of reused components. This helps in improving the accuracy of estimates.
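As a rough illustration of the application composition calculation, here is a hedged Python sketch. The object counts, complexity weights, reuse percentage, and productivity rate are hypothetical placeholders; only the structure (weighted object points, reuse adjustment, effort = NOP / PROD) follows the model as summarised above:

    # COCOMO II application composition model -- sketch with hypothetical inputs
    screens = {"simple": 4, "medium": 3, "difficult": 1}   # counts by complexity level
    reports = {"simple": 2, "medium": 2, "difficult": 1}
    weights = {"screens": {"simple": 1, "medium": 2, "difficult": 3},
               "reports": {"simple": 2, "medium": 5, "difficult": 8}}   # illustrative weights

    object_points = (sum(n * weights["screens"][c] for c, n in screens.items())
                     + sum(n * weights["reports"][c] for c, n in reports.items()))

    percent_reuse = 20                                     # assumed reuse of existing components
    nop = object_points * (100 - percent_reuse) / 100      # object points adjusted for reuse

    prod = 13                                              # assumed productivity (object points per person-month)
    estimated_effort = nop / prod                          # Effort = NOP / PROD, in person-months
    print(object_points, nop, round(estimated_effort, 1))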