
Higher Nationals

Internal verification of assessment decisions – BTEC (RQF)

INTERNAL VERIFICATION – ASSESSMENT DECISIONS

Programme title BTEC Higher National Diploma in Computing - Software Engineering

Assessor Internal Verifier


Unit(s)    Unit 20: Applied Programming and Design Principles

Assignment title    Sales Analysis System for Sampath Food City (PVT) Ltd

Student’s name

List which assessment criteria the Assessor has awarded.    Pass    Merit    Distinction

INTERNAL VERIFIER CHECKLIST

Do the assessment criteria awarded match those shown in the
assignment brief? Y/N

Is the Pass/Merit/Distinction grade awarded justified by the
assessor’s comments on the student work? Y/N

Has the work been assessed accurately? Y/N

Is the feedback to the student:

Give details:

• Constructive? Y/N
• Linked to relevant assessment criteria? Y/N
• Identifying opportunities for improved performance? Y/N
• Agreeing actions? Y/N

Does the assessment decision need amending? Y/N

Assessor signature Date

Internal Verifier signature Date

Programme Leader signature


(if required) Date

Confirm action completed
Remedial action taken

Give details:

Internal Verifier signature    Date
Programme Leader signature (if required)    Date

Higher Nationals - Summative Assignment Feedback Form

Student Name/ID
Unit Title Unit 20: Applied Programming and Design Principles

Assignment Number 01 Assessor

Submission Date    Date Received 1st submission
Re-submission Date    Date Received 2nd submission

Assessor Feedback:

LO1. Investigate the impact of SOLID development principles on the OOP paradigm

Pass, Merit & Distinction Descripts    P1    P2    M1    D1
LO2. Design a large dataset processing application using SOLID principles and clean coding
techniques

Pass, Merit & Distinction Descripts    P3    P4    M2

LO3. Build a data processing application based on a developed design


Pass, Merit & Distinction Descripts    P5    M3

LO4. Perform automatic testing on a data processing application


Pass, Merit & Distinction Descripts    P6    P7    M4    D2

Grade: Assessor Signature: Date:

Resubmission Feedback:

Grade: Assessor Signature: Date:

Internal Verifier’s Comments:

Signature & Date:

* Please note that grade decisions are provisional. They are only confirmed once internal and external
moderation has taken place and grade decisions have been agreed at the assessment board.

Assignment Feedback

Formative Feedback: Assessor to Student

Action Plan

Summative feedback

Feedback: Student to Assessor

Assessor signature Date

Student signature Date

Pearson
Higher Nationals in
Computing

Unit 20: Applied Programming and


Design Principles
Assignment 01
General Guidelines

1. A Cover page or title page – You should always attach a title page to your assignment. Use the previous
page as your cover sheet and make sure all the details are accurately filled in.
2. Attach this brief as the first section of your assignment.
3. All the assignments should be prepared using word processing software.
4. All the assignments should be printed on A4 sized paper. Use single side printing.
5. Allow 1” for the top, bottom and right margins and 1.25” for the left margin of each page.

Word Processing Rules

1. The font size should be 12 point, and the font should be Times New Roman.
2. Use 1.5-line spacing. Left justify all paragraphs.
3. Ensure that all the headings are consistent in terms of the font size and font style.
4. Use the footer function in the word processor to insert Your Name, Subject, Assignment No, and
Page Number on each page. This is useful if individual sheets become detached for any reason.
5. Use the word processing application’s spell check and grammar check functions to help edit your
assignment.

Important Points:

1. It is strictly prohibited to use text boxes to add text in the assignments, except for compulsory
information, e.g. figures, tables of comparison, etc. Adding text boxes in the body other than for the
aforementioned compulsory information will result in rejection of your work.
2. Carefully check the hand in date and the instructions given in the assignment. Late submissions will
not be accepted.
3. Ensure that you give yourself enough time to complete the assignment by the due date.
4. Excuses of any nature will not be accepted for failure to hand in the work on time.
5. You must take responsibility for managing your own time effectively.
6. If you are unable to hand in your assignment on time and have valid reasons such as illness, you may
apply (in writing) for an extension.
7. Failure to achieve at least PASS criteria will result in a REFERRAL grade.
8. Non-submission of work without valid reasons will lead to an automatic REFERRAL. You will
then be asked to complete an alternative assignment.
9. If you use other people’s work or ideas in your assignment, reference them properly using the
HARVARD referencing system to avoid plagiarism. You have to provide both in-text citations and a
reference list.
10. If you are proven to be guilty of plagiarism or any academic misconduct, your grade could be
reduced to a REFERRAL or, at worst, you could be expelled from the course.

Student Declaration

I hereby, declare that I know what plagiarism entails, namely to use another’s work and to present it as my
own without attributing the sources in the correct way. I further understand what it means to copy another’s
work.

1. I know that plagiarism is a punishable offence because it constitutes theft.


2. I understand the plagiarism and copying policy of Pearson UK.
3. I know what the consequences will be if I plagiarise or copy another’s work in any of the
assignments for this program.
4. I declare therefore that all work presented by me for every aspect of my program, will be my own,
and where I have made use of another’s work, I will attribute the source in the correct way.
5. I acknowledge that the attachment of this document signed or not, constitutes a binding agreement
between myself and Pearson, UK.
6. I understand that my assignment will not be considered as submitted if this document is not attached
to the assignment.

Student’s Signature:
(Provide E-mail ID)
Date:
(Provide Submission Date)

Assignment Brief

Student Name /ID Number

Unit Number and Title Unit 20: Applied Programming and Design Principles

Academic Year 2023 / 24

Unit Tutor

Assignment Title Sales Analysis System for Sampath Food City (PVT) Ltd

Issue Date

Submission Date

IV Name & Date

Submission Format:

Part 1.
Report - Submit a professional report with appropriate report formatting and guidelines followed. All the
research data should be referenced, along with in-text citations, using the Harvard referencing system.

Part 2
A fully functional standalone software system (command-line interface based)

Unit Learning Outcomes:

LO1 Investigate the impact of SOLID development principles on the OOP paradigm.

LO2 Design a large dataset processing application using SOLID principles and clean coding techniques.

LO3 Build a data processing application based on a developed design.

LO4 Perform automatic testing on a data processing application.

Assignment Brief and Guidance:

Assignment Brief

Scenario

‘Data Labs’ is a leading software development company in Sri Lanka. They focus on helping
businesses grow through creative and effective solutions. Assume that you work as an
apprentice software developer for Data Labs. As part of your role, you have been asked to
develop a software system (command-line interface based) for the following scenario using the Python
programming language.

Sampath Food City (PVT) Ltd is one of the main supermarket networks in Sri Lanka. Currently, Sampath
Food City has several branches island-wide. At present, transactions of each branch are recorded through
a point of sale (POS) system. At the end of each month, the recorded data from each point of sale system
are transferred to a centralized database. Top-level management of the company use the centralized data
to carry out the monthly sales data analysis of the whole company and find insights for taking
managerial decisions. Currently, the company uses a paper-based manual system to do
monthly sales data analysis. Weaknesses and drawbacks of the manual system, such as human errors
leading to inaccurate information, time consumption, data redundancy, inconsistency and difficulty in
finding insights, affect business performance negatively.

Therefore, the management of Sampath Food City has decided that using a customized software system
for sales data analysis is the solution for eliminating the above-mentioned weaknesses and drawbacks of
the existing sales data analysis process.

Assume that you are a software developer at Data Labs (PVT) Ltd assigned to develop a sales data
analysis system (command-line interface based) using the Python programming language for the scenario
given above.

The new system should provide the following features:

 Monthly sales analysis of each branch
 Price analysis of each product
 Weekly sales analysis of the supermarket network
 Product preference analysis
 Analysis of the distribution of total sales amount of purchases

Develop a command-line interface-based solution for the above scenario and produce a report
covering the following tasks.

Activity 1

 Investigate the characteristics of the object-orientated paradigm, including class relationships
(inheritance, association, composition, aggregation) and evaluate the impact of SOLID principles
(single responsibility principle, open/closed principle, Liskov’s substitution principle, interface
segregation principle and dependency inversion principle) by taking suitable examples
incorporating UML diagrams and coding samples. Your answer should include suitable examples
to evaluate the impact of SOLID principles in Object Oriented Development.

 Explain how clean coding techniques can impact on the use of data structures and operations when
writing algorithms by taking suitable examples from the given scenario. Analyse each of the
creational, structural and behavioral design patterns with relevant examples.

Activity 2

 Design a large data set processing application, utilizing SOLID principles, clean coding
techniques, a design pattern and data structures, providing justifications for the selected design
pattern and data structures.

 Design a suitable testing regime for the application developed with a provision for automated
testing, selected test types and selected automatic testing tools, and provide justifications for the
selections. Refine the design to include multiple design patterns by justifying the reasons for the
inclusion of each design pattern for the given scenario.

Activity 3

 Build a large dataset processing application based on the design produced, using the Python
programming language, and provide evidence for the usage of data structures and file handling
techniques. Your answer must include an assessment of how effective the use of SOLID
principles, clean coding techniques and programming patterns has been in the application developed.
Take suitable examples from the developed application to elaborate your answer.

Activity 4

 Examine the benefits and drawbacks of the different methods of automatic testing available for
applications and software systems, taking examples from the developed application. Provide an
action plan to address the identified drawbacks of testing the developed application.

 Implement automatic testing of the developed application by using selected testing tools and
provide evidence for the automatic testing. Discuss how developer-produced and vendor-provided
automatic testing tools differ for applications and software systems by taking suitable examples
from the testing of the developed application.

Grading Rubric

Grading Criteria Achieved Feedback

LO1 Investigate the impact of SOLID development principles on the OOP paradigm.

P1 Investigate the characteristics of the object orientated paradigm, including class relationships and SOLID principles.

P2 Explain how clean coding techniques can impact on the use of data structures and operations when writing algorithms.

M1 Analyse, with examples, each of the creational, structural and behavioral design pattern types.

D1 Evaluate the impact of SOLID development principles on object orientated application development.

LO2 Design a large dataset processing application using SOLID principles and clean coding techniques.

P3 Design a large data set processing application, utilizing SOLID principles, clean coding techniques and a design pattern.

P4 Design a suitable testing regime for the application, including provision for automated testing.

M2 Refine the design to include multiple design patterns.

LO3 Build a data processing application based on a developed design.

P5 Build a large dataset processing application based on the design produced.

M3 Assess the effectiveness of using SOLID principles, clean coding techniques and programming patterns on the application developed.

LO4 Perform automatic testing on a data processing application.

P6 Examine the different methods of implementing automatic testing as designed in the test plan.

P7 Implement automatic testing of the developed application.

M4 Discuss the differences between developer-produced and vendor-provided automatic testing tools for applications and software systems.

D2 Analyse the benefits and drawbacks of different forms of automatic testing of applications and software systems, with examples from the developed application.

OBSERVATION RECORD

Learner name:
Qualification:
Unit number &
title:
Description of activity undertaken

Assessment criteria

How the activity meets the requirements of the assessment criteria

Learner name:

Learner signature: Date:
Assessor name:
Assessor signature: Date:

WITNESS STATEMENT

Learner name:
Qualification:
Unit number &
title:
Description of activity undertaken (please be as specific as possible)

Assessment criteria (for which the activity provides evidence)

How the activity meets the requirements of the assessment criteria, including how and
where the activity took place

Witness name:        Job role:
Witness signature:   Date:
Learner name:
Learner signature:   Date:
Assessor name:
Assessor signature:  Date:

ACTIVITY 1

Key object-oriented concepts

Encapsulation

Encapsulation is the bundling of data and methods that operate on that data within a
single unit or class while restricting direct access to some components. This ensures that
the internal representation of an object is hidden from the outside, and access to it is
controlled via public methods. By enforcing encapsulation, we ensure that only
authorized interactions with the object's data occur, thereby improving security and
maintainability. For example, private attributes and getter/setter methods enable this
principle in Python. Encapsulation prevents unintended interference and misuse of an
object's internal states.

The BankAccount class demonstrates encapsulation by making the attributes
__account_holder and __balance private using double underscores. These attributes
cannot be accessed directly from outside the class. Instead, methods like deposit and
get_balance allow controlled interaction. The deposit method checks if the amount is
valid and updates the balance securely. The get_balance method returns the current
balance. This ensures that the object's state is modified and retrieved only through
predefined methods, preserving its integrity and encapsulation. Direct access to __balance
is restricted, ensuring it cannot be tampered with externally.
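
For illustration, a minimal sketch of the BankAccount class described above (the holder name and the amounts used are invented):

class BankAccount:
    def __init__(self, account_holder, balance=0):
        self.__account_holder = account_holder  # private attribute
        self.__balance = balance                # private attribute

    def deposit(self, amount):
        # Only valid, positive amounts may change the internal state
        if amount > 0:
            self.__balance += amount

    def get_balance(self):
        # Controlled, read-only access to the balance
        return self.__balance


account = BankAccount("A. Perera", 1000)
account.deposit(500)
print(account.get_balance())  # 1500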

Inheritance

Inheritance allows one class (child class) to inherit the properties and behaviors (methods)
of another class (parent class). This promotes code reuse and establishes a hierarchical
relationship between classes. By inheriting from a parent class, the child class gains
access to all its methods and attributes, allowing developers to build specialized classes
without rewriting shared functionality. Inheritance also supports polymorphism, enabling
flexibility and dynamic method overriding. Python implements inheritance using the
parent class name in parentheses while defining the child class.

The example demonstrates inheritance where the Dog class inherits from the Animal
class. The Animal class has a method sound that returns a generic sound. The Dog class
overrides the sound method to provide a specific implementation for dogs. When we
create an object of Dog and call the sound method, the overridden version in the Dog
class is executed, showcasing polymorphism. This allows code reuse as the Dog class
inherits other properties of the Animal class, even though only the sound method is
explicitly overridden here. Inheritance simplifies extending or modifying functionality.
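
A compact sketch matching the Animal/Dog example described above:

class Animal:
    def sound(self):
        return "Some generic animal sound"


class Dog(Animal):
    def sound(self):
        # Overrides the inherited method with dog-specific behaviour
        return "Woof!"


print(Dog().sound())  # Woof!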

Polymorphism

Polymorphism allows objects of different classes to be treated as objects of a common
super class. It enables methods in different classes to have the same name but behave
differently based on the object that calls them. This is particularly useful for
implementing dynamic and flexible code. Polymorphism can be achieved through method
overriding, where a child class redefines a method of its parent class, or through
interfaces and abstract classes in languages that support them. In Python, polymorphism
works seamlessly with dynamic typing, allowing objects to implement the same method
in unique ways.

This example demonstrates polymorphism with the Bird and Penguin classes. The
Penguin class overrides the fly method of the Bird class to reflect its unique
characteristics. When iterating over the Bird and Penguin objects in a loop and calling fly,
each object executes its respective fly method. This behavior highlights polymorphism as
the same method name can adapt its implementation depending on the object's type. It
allows for treating objects uniformly while retaining their specialized behavior. The
ability to redefine methods enables flexibility and supports extending functionalities
dynamically.
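
One possible version of the Bird/Penguin example outlined above:

class Bird:
    def fly(self):
        return "This bird can fly"


class Penguin(Bird):
    def fly(self):
        # Overridden to reflect the penguin's characteristics
        return "Penguins cannot fly"


# The same call adapts its behaviour to the object's actual type
for bird in [Bird(), Penguin()]:
    print(bird.fly())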

Abstraction

Abstraction is the process of hiding implementation details and showing only the essential
features of an object. It focuses on simplifying complex systems by exposing only
relevant aspects to the user while concealing the underlying complexities. Abstraction can
be achieved in Python through abstract classes and methods, where abstract methods are
declared but not implemented in the base class. Subclasses inheriting the abstract class
must provide implementations for the abstract methods. This promotes the use of
consistent interfaces and enforces a blueprint for child classes.

In this example, the Shape class is an abstract class that defines the area method as
abstract using the @abstractmethod decorator. The Rectangle class inherits from Shape
and provides a specific implementation of the area method. Abstract classes serve as
blueprints, enforcing that all child classes must implement the required methods. Here,
the Rectangle class calculates the area based on its dimensions. Abstract classes ensure
consistency across related classes while hiding unnecessary details, fostering a clear
separation between interface and implementation. This allows users to focus on high-level
functionalities without concerning themselves with internal mechanics. (Bhuyan, 2024)
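
A short sketch of the abstract Shape example described above (the dimensions are invented):

from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        pass  # declared but not implemented in the base class


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


print(Rectangle(4, 5).area())  # 20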

Main class relationships

Aggregation

Aggregation is a "has-a" relationship where one class contains another class, but both can
exist independently. It represents a weak association where the contained object is not
owned by the container object, meaning the lifecycle of the contained object is not
dependent on the container. Aggregation allows objects to be reused in different contexts.
For example, a Department may have multiple Teachers, but the Teachers can exist
independently of the Department. Aggregation promotes loose coupling between classes,
making the code more modular and maintainable.

The Teacher class represents individual teachers, while the Department class represents a
department containing multiple teachers. The Department class has an add_teacher
method to associate teachers with it. Importantly, the Teacher objects can exist
independently of the Department. For example, teacher1 and teacher2 are created before
associating them with the "Science" department, demonstrating the loose coupling
characteristic of aggregation. This approach allows flexibility as teachers can be reused
across different departments or contexts.
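
A minimal sketch of the aggregation example above (the teacher names are invented):

class Teacher:
    def __init__(self, name):
        self.name = name


class Department:
    def __init__(self, name):
        self.name = name
        self.teachers = []  # the department holds, but does not own, its teachers

    def add_teacher(self, teacher):
        self.teachers.append(teacher)


# Teachers exist independently before (and after) being associated with a department
teacher1 = Teacher("Nimal")
teacher2 = Teacher("Kamala")
science = Department("Science")
science.add_teacher(teacher1)
science.add_teacher(teacher2)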

The UML diagram illustrates the aggregation relationship between Department and
Teacher. The open diamond on the Department class represents aggregation, signifying
that the Department has a collection of Teacher objects. The Teacher class exists
independently of the Department, as shown by the lack of ownership in the relationship.
The association line labeled with o-- highlights the weak connection. This relationship
reflects the real-world scenario where teachers belong to departments but are not bound to
them for their existence, maintaining modularity and reusability.

Association

Association is a general "uses-a" relationship between two classes, where one class
interacts with another. It signifies that objects of one class are connected to objects of
another, but neither depends on the other's lifecycle. This relationship can be one-to-one,
one-to-many, or many-to-many. For example, a Student can be associated with a Library
to borrow books, but both can exist independently. Association is a broader concept than
aggregation or composition, as it does not imply ownership. It facilitates interaction
between objects without enforcing tight coupling, allowing classes to communicate
effectively while remaining independent.

The Student and Library classes demonstrate association, as a Student can borrow a book
from a Library. The borrow_book method in the Library class accepts a Student object
and prints the borrowing action. Here, neither Student nor Library is dependent on the
other’s lifecycle. The objects interact for specific actions but remain independent. For
instance, the Student object exists without requiring a Library, and vice versa. This
showcases a loose coupling where the classes maintain independence, promoting
modularity and easier maintenance.
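
A minimal sketch of the Student/Library association described above (the student name is invented):

class Student:
    def __init__(self, name):
        self.name = name


class Library:
    def borrow_book(self, student):
        # Interaction only; neither object owns or stores the other
        print(f"{student.name} borrowed a book from the library")


library = Library()
library.borrow_book(Student("Amali"))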

The UML diagram depicts an association between the Student and Library classes. The
connecting line (--) represents the interaction between the two classes. Neither class owns
or depends on the other, as shown by the absence of diamonds in the relationship. The
Library class provides the borrow_book method to enable interaction with Student
objects, while the Student class remains independent. This setup models a simple and
modular association where objects communicate to perform tasks without tight coupling
or ownership.

Composition

Composition is a strong "has-a" relationship where one class owns another class, and the
lifecycle of the contained object is tied to the container. If the container is destroyed, so
are the objects it contains. This represents a whole-part relationship where the contained
class cannot exist independently of the container. For example, a Car may have an
Engine, which ceases to exist if the Car is destroyed. Composition enforces a tighter
relationship, ensuring that the container manages the lifetime of the contained objects,
thereby enhancing encapsulation.

The Car class demonstrates composition with the Engine class. The Car class owns an
Engine object, created inside its constructor. The Engine class cannot exist independently
because its lifecycle is tied to the Car. When a Car object is created, an associated Engine
object is also instantiated. The show_details method in the Car class provides details
about both the car and its engine, demonstrating the close relationship. This composition
enforces a tighter coupling where the Engine class cannot be reused without being part of
a Car.
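
A minimal sketch of the Car/Engine composition described above (model and engine capacity are invented):

class Engine:
    def __init__(self, capacity):
        self.capacity = capacity


class Car:
    def __init__(self, model, capacity):
        self.model = model
        self.engine = Engine(capacity)  # the Engine is created and owned by the Car

    def show_details(self):
        print(f"{self.model} with a {self.engine.capacity} cc engine")


Car("Toyota Axio", 1500).show_details()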

The UML diagram shows a composition relationship between Car and Engine. The solid
diamond (*--) on the Car class represents ownership and strong lifecycle dependency.
The Engine object is part of the Car and cannot exist independently. The association line
indicates that the Car class contains an Engine instance. This whole-part relationship
ensures that when a Car is destroyed, the associated Engine is also removed, reinforcing
the tight coupling between these classes.

Dependency

Dependency is a "uses-a" relationship where one class relies on another to perform a
specific function. This is a temporary relationship where a class depends on another class
during runtime to fulfill a specific task. Unlike association, dependency is not persistent
and exists only during method calls or specific interactions. For example, a Printer may
depend on a Document to print its content, but the two do not have a permanent
relationship. Dependency allows classes to interact without binding them together,
making the code more flexible and reusable.

The Printer class demonstrates dependency on the Document class. The print_document
method accepts a Document object and prints its content. The Printer does not store or
maintain a reference to the Document, and the relationship exists only during the method
call. This temporary interaction ensures flexibility as the Printer can work with any
Document instance without being tightly coupled to it. This setup models a lightweight
dependency, promoting reusability and easier maintenance while maintaining a clear
separation of responsibilities.
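
A minimal sketch of the Printer/Document dependency described above:

class Document:
    def __init__(self, content):
        self.content = content


class Printer:
    def print_document(self, document):
        # Temporary "uses-a" relationship; no reference to the document is kept
        print(document.content)


Printer().print_document(Document("Monthly sales report"))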

The UML diagram represents a dependency between Printer and Document. The dotted
line (..>) indicates that Printer depends on Document to perform the print_document
operation. The Document class provides the content needed for the Printer to function,
but there is no persistent connection between them. The dependency relationship exists
only temporarily during the method call, allowing both classes to remain decoupled and
reusable. This approach promotes modularity, as changes to one class do not directly
impact the other.

Inheritance

Inheritance is a "is-a" relationship where a child class derives properties and behaviors
from a parent class. This allows code reuse and hierarchical classification. The child class
inherits attributes and methods from the parent, and it can also override or extend them.
For example, a Dog is an Animal but has specific behaviors like barking. Inheritance
fosters modularity and maintainability, allowing common functionality to reside in the
parent class while specialized behavior is implemented in the child class.

The Dog class inherits from the Animal class, as demonstrated by the move method in
both classes. The Dog class overrides the move method to provide a specific
implementation while retaining the ability to use other attributes or methods from the
Animal class. This hierarchical relationship enables code reuse and specialization, where
shared functionality is defined in the parent class, and specific behavior is implemented in
the child class. The Dog class retains the general characteristics of Animal but with its
unique attributes.
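
A compact sketch matching the move example above:

class Animal:
    def move(self):
        return "This animal moves"


class Dog(Animal):
    def move(self):
        # Specialised behaviour in the child class
        return "The dog runs"


print(Dog().move())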

The UML diagram depicts an inheritance relationship between Animal and Dog. The
arrow (<|--) points from the child (Dog) to the parent (Animal), representing that the Dog
class inherits from the Animal class. The Animal class provides a general move method,
which is overridden by the Dog class to specify its unique behavior. This inheritance
relationship promotes reuse of common functionality while allowing child classes to
implement specialized behavior, adhering to the "is-a" relationship principle. (This vs.
That, 2023)

SOLID principles

Single Responsibility Principle (SRP)

The Single Responsibility Principle (SRP) states that a class should have only one reason
to change, meaning it should have only one responsibility. By adhering to SRP, we ensure
that a class focuses on a single functionality, which enhances maintainability, testability,
and readability. For example, in a banking application, a class responsible for processing
transactions should not also handle logging. Violating SRP makes code harder to modify
and test since changes for one functionality might unintentionally affect another.
Breaking down responsibilities into separate classes improves modularity and makes the
code easier to manage.

The TransactionProcessor class is responsible solely for processing transactions, while
the Logger class handles logging. By separating these responsibilities, each class has a
single reason to change. For example, if the logging mechanism needs to change (e.g.,
switching to a file logger), the TransactionProcessor class remains unaffected. This
separation of concerns makes the code easier to maintain and test independently.
Adhering to SRP ensures that classes remain focused on one functionality, minimizing
unintended side effects when modifications are made.
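
A minimal sketch of the SRP example described above (method names and the sample amount are assumptions):

class Logger:
    def log(self, message):
        # Logging is this class's only responsibility
        print(f"[LOG] {message}")


class TransactionProcessor:
    def __init__(self, logger):
        self.logger = logger

    def process(self, amount):
        # Transaction logic only; logging is delegated to the Logger
        self.logger.log(f"Processing transaction of {amount}")
        return True


TransactionProcessor(Logger()).process(2500)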

The UML diagram illustrates SRP by separating the TransactionProcessor and Logger
classes. The TransactionProcessor class is solely responsible for processing transactions,
while the Logger class is responsible for logging messages. The arrow (-->) indicates that
the TransactionProcessor interacts with the Logger to log messages but does not handle
logging itself. This separation ensures that each class has a single responsibility,
enhancing maintainability, modularity, and testability.

Open / Closed Principle (OCP)

The Open/Closed Principle (OCP) states that classes should be open for extension but
closed for modification. This means a class's behavior should be extendable without
altering its existing code. OCP helps to avoid breaking existing functionality when
introducing new features. For example, if we need to add a new type of shape in a
drawing application, we should be able to extend the system without modifying existing
shape-related code. Adhering to OCP reduces the risk of introducing bugs and makes the
code easier to evolve over time.

The Shape class is an abstract base class with an abstract area method, adhering to OCP
by allowing new shapes to be added without modifying existing code. The Circle and
Rectangle classes extend the Shape class and implement the area method. New shape
types can be added by creating additional subclasses without altering the Shape class or
existing subclasses. This ensures the system remains extendable and avoids the need to
change existing, tested code, reducing the risk of bugs and maintaining the integrity of the
application.
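
A minimal sketch of the OCP example described above:

import math
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        pass


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


# A new shape type is added as a further subclass; existing classes stay untouched
for shape in [Circle(3), Rectangle(2, 4)]:
    print(shape.area())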

The UML diagram shows OCP by defining Shape as an abstract class with subclasses
Circle and Rectangle. The arrow (<|--) represents inheritance. The Shape class provides a
general interface for calculating the area, while specific shapes implement the area
method independently. This structure allows new shapes to be introduced by extending
the Shape class, leaving existing code untouched. This adherence to OCP ensures
extensibility without risking modifications to existing functionality, preserving system
stability.

Liskov Substitution Principle (LSP)

The Liskov Substitution Principle (LSP) ensures that subclasses can replace their parent
classes without affecting the functionality of the program. If a subclass violates the
expectations of the parent class, it breaks LSP, leading to unpredictable behavior.
Subtypes must maintain the integrity and behavior of their base types. For instance, if a
base class Bird has a method fly, a subclass Penguin that cannot fly would violate LSP if
it inherits the fly method. Adhering to LSP ensures that derived classes extend the
behavior of their parent classes in a predictable and compatible way.

The Bird class defines a fly method, which is overridden by the Penguin class to indicate
its inability to fly. Despite this difference, the Penguin class adheres to LSP because it
provides a meaningful implementation for the fly method. The let_bird_fly function
accepts any object of type Bird, ensuring substitutability. Both Sparrow and Penguin can
replace the Bird class in the function without causing any unexpected behavior. This
ensures that the system respects LSP by maintaining compatibility across the base and
derived classes.
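
A minimal sketch of the LSP example described above (the returned messages are illustrative):

class Bird:
    def fly(self):
        return "Flying high"


class Sparrow(Bird):
    pass  # inherits fly unchanged


class Penguin(Bird):
    def fly(self):
        # Still returns a meaningful result rather than raising an error
        return "Penguins cannot fly, they swim instead"


def let_bird_fly(bird):
    print(bird.fly())


let_bird_fly(Sparrow())  # substitutable for Bird
let_bird_fly(Penguin())  # substitutable for Bird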

The UML diagram illustrates the LSP-compliant inheritance relationship between the
Bird class and its subclasses Sparrow and Penguin. Both subclasses inherit the fly method
from the Bird class. The diagram uses arrows (<|--) to show inheritance. Each subclass
ensures substitutability by either inheriting or overriding the fly method meaningfully.
The system thus adheres to LSP, ensuring that instances of the subclasses can seamlessly
replace the parent class without disrupting the functionality.

Interface Segregation Principle (ISP)

The Interface Segregation Principle (ISP) states that a class should not be forced to
implement methods it does not use. This principle promotes the use of multiple specific
interfaces instead of a single general-purpose interface. Adhering to ISP ensures that
classes remain lightweight and focused on their specific responsibilities. For example, a
Printer interface should separate printing functionalities into distinct interfaces like Scan
and Print rather than forcing all devices to implement unrelated methods. This modular
approach reduces unnecessary dependencies and enhances code clarity and flexibility.

The Print and Scan interfaces separate printing and scanning functionalities, adhering to
ISP. The MultiFunctionPrinter class implements both interfaces, providing concrete
methods for print_document and scan_document. By dividing the responsibilities into
specific interfaces, other devices, such as single-function printers or scanners, can
implement only the methods they require. This modular design ensures that no class is
forced to implement unnecessary methods, enhancing maintainability and flexibility
while reducing coupling between unrelated functionalities.
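
A minimal sketch of the ISP example described above (the method parameters are assumptions):

from abc import ABC, abstractmethod


class Print(ABC):
    @abstractmethod
    def print_document(self, document):
        pass


class Scan(ABC):
    @abstractmethod
    def scan_document(self, document):
        pass


class MultiFunctionPrinter(Print, Scan):
    def print_document(self, document):
        print(f"Printing: {document}")

    def scan_document(self, document):
        print(f"Scanning: {document}")


# A single-function printer would implement only the Print interface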

The UML diagram shows the ISP principle by separating Print and Scan interfaces. The
MultiFunctionPrinter class implements both interfaces, as depicted by the dotted arrows
(<|..). Each interface defines a specific functionality, allowing devices to implement only
the required methods. This separation ensures that classes are not burdened with
irrelevant methods. Adhering to ISP results in a cleaner, modular system where
functionalities are decoupled, enabling flexibility and reducing unnecessary
dependencies.

Dependency Inversion Principle (DIP)

The Dependency Inversion Principle (DIP) states that high-level modules should not
depend on low-level modules; instead, both should depend on abstractions. Abstractions
should not depend on details, but details should depend on abstractions. This principle
decouples classes and promotes flexibility. For instance, a PaymentProcessor should not
depend on a specific payment method like CreditCard. Instead, both should rely on an
abstraction, enabling new payment methods to be integrated seamlessly without
modifying the PaymentProcessor.

The PaymentMethod abstraction ensures that the PaymentProcessor depends on an
interface rather than a specific implementation. The CreditCard class implements this
abstraction, providing a concrete pay method. The PaymentProcessor class interacts with
the PaymentMethod interface to process payments, making it independent of specific
payment methods. This adherence to DIP allows new payment methods to be added by
implementing the PaymentMethod interface without modifying the PaymentProcessor,
ensuring flexibility and minimizing the risk of breaking existing functionality.
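
A minimal sketch of the DIP example described above (the sample amount is invented):

from abc import ABC, abstractmethod


class PaymentMethod(ABC):
    @abstractmethod
    def pay(self, amount):
        pass


class CreditCard(PaymentMethod):
    def pay(self, amount):
        print(f"Paid {amount} by credit card")


class PaymentProcessor:
    def __init__(self, payment_method):
        # Depends on the abstraction, not on a concrete payment method
        self.payment_method = payment_method

    def process_payment(self, amount):
        self.payment_method.pay(amount)


PaymentProcessor(CreditCard()).process_payment(4500)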

The UML diagram shows the DIP-compliant relationship between the PaymentProcessor,
PaymentMethod interface, and CreditCard. The PaymentProcessor depends on the
abstraction PaymentMethod, as indicated by the arrow (-->). The CreditCard class
implements the PaymentMethod interface, as depicted by the dotted arrow (<|..). This
separation ensures that the PaymentProcessor is decoupled from specific payment
implementations. New payment methods can be added without modifying the
PaymentProcessor, adhering to DIP and promoting a flexible, extensible design. (Joseph,
2024)

Key clean coding principles

Meaningful naming

Good code begins with meaningful and descriptive naming. Variable, function, and class
names should clearly convey their purpose. Poor naming leads to confusion and
misinterpretation of code functionality. For instance, instead of naming a variable x, use
total_price to reflect its role in the application. Functions should have verb-based names
like calculate_total to express their behavior, while class names should be noun-based,
such as InvoiceProcessor. Proper naming eliminates ambiguity, makes code self-
explanatory, and reduces the need for extensive comments. A clear naming strategy also
makes the code more readable and maintainable for collaborators.

The code uses meaningful names like ShoppingCart, add_item, and calculate_total_price,
which clearly reflect their purpose. The ShoppingCart class represents a shopping cart,
while the add_item method is self-explanatory, indicating it adds items to the cart. The
calculate_total_price method computes the total cost of the items. This naming
convention ensures that the code is self-documenting, making it easier for other
developers to understand and extend the functionality without needing excessive
comments or additional explanation.
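
A minimal sketch of the ShoppingCart example described above (item names and prices are invented):

class ShoppingCart:
    def __init__(self):
        self.items = []

    def add_item(self, name, price):
        self.items.append({"name": name, "price": price})

    def calculate_total_price(self):
        return sum(item["price"] for item in self.items)


cart = ShoppingCart()
cart.add_item("Rice 5kg", 1850)
cart.add_item("Milk powder", 1150)
print(cart.calculate_total_price())  # 3000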

Keep functions small

Functions should be small and perform only one task. Large, multi-purpose functions are
harder to understand, test, and maintain. A well-designed function should encapsulate a
single responsibility, making it reusable and easier to debug. If a function is performing
multiple operations, consider splitting it into smaller functions. Keeping functions small
enhances readability and reduces the cognitive load for developers. It also enables better
testing, as small functions are less prone to errors and easier to isolate during debugging.

The OrderProcessor class contains small, focused methods: calculate_subtotal,
calculate_tax, and calculate_total. Each method performs a single task, such as computing
the subtotal or tax. The calculate_total method combines the results of the other methods
without duplicating their logic. This structure adheres to the principle of keeping
functions small, making the code more modular, easier to test, and less error-prone. Any
change in tax calculation logic, for instance, requires modification only in calculate_tax,
ensuring minimal impact on other parts of the code.
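
A minimal sketch of the OrderProcessor described above (the tax rate and sample prices are invented):

class OrderProcessor:
    def __init__(self, item_prices, tax_rate=0.08):
        self.item_prices = item_prices
        self.tax_rate = tax_rate

    def calculate_subtotal(self):
        return sum(self.item_prices)

    def calculate_tax(self):
        return self.calculate_subtotal() * self.tax_rate

    def calculate_total(self):
        # Combines the other two methods without duplicating their logic
        return self.calculate_subtotal() + self.calculate_tax()


print(OrderProcessor([1000, 2500, 750]).calculate_total())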

Avoid code duplication

Code duplication increases the risk of bugs and makes maintenance cumbersome. If the
same logic is repeated, any change requires updating all instances, increasing the chances
of inconsistencies. Instead, extract the repeated logic into reusable functions or classes.
Reusability reduces redundancy and ensures consistency across the codebase. Adhering to
the DRY (Don't Repeat Yourself) principle minimizes code duplication, leading to a
cleaner, more maintainable codebase.

The apply_discount method encapsulates the discount logic, avoiding duplication. Instead
of repeating the formula for each item, this method is reused for multiple items. This
approach adheres to the DRY principle by centralizing the discount calculation logic. If
the discount formula changes, updates are made in one location, ensuring consistency
across the code. The extracted method makes the code more concise, easier to understand,
and reduces the risk of errors introduced by duplicated logic.
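
A minimal sketch of the apply_discount idea described above (the prices and rate are invented):

def apply_discount(price, discount_rate):
    # Single, reusable location for the discount formula
    return price - (price * discount_rate)


prices = [1200, 850, 4300]
print([apply_discount(p, 0.10) for p in prices])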

Commenting and documentation

While clean code should be self-explanatory, comments and documentation are essential
to clarify complex logic or provide context. Comments should describe the why behind a
decision, not what the code does. Avoid unnecessary comments for simple code, as
they can clutter the codebase. Proper documentation, including docstrings in functions,
classes, and modules, helps developers understand the purpose and expected usage of the
code. This is particularly important for public APIs or complex algorithms. Well-
documented code improves collaboration and ensures continuity when new developers
join the project.

The TemperatureConverter class is self-documenting due to meaningful naming and the
addition of docstrings. The class-level docstring provides an overview, while the method-
level docstring explains the conversion formula and its usage. This documentation
clarifies the method’s purpose for developers who encounter it for the first time. By
including docstrings, the code achieves a balance between self-explanatory naming and
supplementary comments for complex logic, ensuring clarity and maintainability.
Developers can quickly understand and use the celsius_to_fahrenheit method without
guessing its purpose.
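
A minimal sketch of the TemperatureConverter described above:

class TemperatureConverter:
    """Utility class for converting temperatures between units."""

    @staticmethod
    def celsius_to_fahrenheit(celsius):
        """Convert a Celsius value to Fahrenheit using F = C * 9/5 + 32."""
        return celsius * 9 / 5 + 32


print(TemperatureConverter.celsius_to_fahrenheit(30))  # 86.0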

Error handling

Proper error handling ensures the program behaves predictably in unexpected scenarios.
Use exceptions to handle errors gracefully without crashing the program. Avoid silent
failures, which make debugging difficult. Provide meaningful error messages that help
identify the problem. Wrap critical code sections with try and except blocks and ensure
errors are logged for future analysis. Good error handling improves user experience and
code reliability by allowing the application to recover gracefully from issues.

The divide_numbers function incorporates error handling using a try and except block. It
ensures that division by zero, which would otherwise crash the program, is gracefully
managed. When b is zero, the function returns a clear error message, informing the user
of the issue. This approach ensures that the program remains functional even when
encountering invalid inputs. Proper error handling like this not only improves robustness
but also enhances the user experience by preventing abrupt program termination.
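
A minimal sketch of the divide_numbers function described above:

def divide_numbers(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        # Fail gracefully with a meaningful message instead of crashing
        return "Error: division by zero is not allowed"


print(divide_numbers(10, 2))  # 5.0
print(divide_numbers(10, 0))  # error message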

Consistency

Consistency in coding style and structure improves readability and reduces confusion.
Follow a consistent naming convention (e.g., snake_case for variables and methods,
PascalCase for classes). Organize code consistently across files and modules. Use
consistent indentation, spacing, and commenting styles to maintain uniformity. Adopting
style guides like PEP 8 in Python ensures a cohesive codebase, making it easier for teams
to collaborate. Consistency reduces cognitive load and makes transitioning between
different parts of the codebase seamless for developers.

The BankAccount class follows consistent naming conventions (snake_case for variables
and methods, PascalCase for class names). Its methods (deposit and withdraw) have a
uniform structure, making it easier for developers to predict their behavior. The use of
consistent argument names like amount ensures clarity and avoids ambiguity. Adhering to
a consistent coding style, as shown in this example, improves maintainability and allows
new team members to quickly understand and extend the codebase without confusion.
(Egbajie, 2023)
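
A minimal sketch of the consistently styled BankAccount described above:

class BankAccount:                        # PascalCase class name
    def __init__(self, opening_balance):
        self.balance = opening_balance    # snake_case attributes and methods

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        if amount <= self.balance:
            self.balance -= amount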

How clean coding leads to better readability, maintainability, and optimization in data
structure usage and algorithm performance

Improved code readability

Clean coding practices make code easier to read by using meaningful variable names,
proper indentation, and clear function definitions. Readability allows developers to
quickly understand the structure and purpose of the code. By following clean coding
principles, such as keeping functions small and avoiding unnecessary complexity, the
flow of the program becomes intuitive. Clear code helps developers identify bugs, add
features, and modify existing functionality with minimal friction, saving time and
reducing errors. The easier code is to read, the faster a new developer can get up to speed
with the project.

The calculate_area function is simple and clear, with a meaningful name and a direct
implementation of the area formula. The use of radius as a variable name further
improves clarity. The code is easy to understand at a glance, ensuring that developers or
collaborators can quickly grasp its purpose. Clean coding practices, such as clear naming
and direct implementation, contribute to the function's readability, enabling others to
modify or extend the code with minimal effort.
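
A minimal sketch of the calculate_area function described above:

import math


def calculate_area(radius):
    # Area of a circle: pi * radius squared
    return math.pi * radius ** 2


print(calculate_area(5))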

Simplified debugging

Clean code is structured logically, which makes identifying and fixing bugs easier. When
a program is well-organized, with functions performing one task each and variables
clearly named, it's easier to trace the source of an issue. Proper error handling, such as
checking for null values or invalid input, can prevent problems before they arise. Clean
code allows for easier debugging by maintaining a simple, understandable structure,
reducing the need for excessive searching to find the problem.

The safe_divide function checks if b is zero before attempting division, avoiding a
runtime error. The code is clear and easy to follow, with simple conditions and clear
logic. In the event of an error, the function returns a helpful message instead of crashing.
This kind of error handling ensures that issues are addressed early, simplifying the
debugging process. Clean code practices like this make identifying and fixing bugs much
more manageable, as the flow of the program is easy to track and understand.
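
A minimal sketch of the safe_divide function described above:

def safe_divide(a, b):
    if b == 0:
        return "Error: cannot divide by zero"
    return a / b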

Faster onboarding for new developers

When code is clean, with meaningful names and consistent structures, it is easier for new
developers to understand the project. Clean code reduces the time required for onboarding
by eliminating unnecessary complexity. Developers can quickly pick up where previous
team members left off without the need for extensive training. By following best practices
such as consistent indentation, modular code, and detailed documentation, new
developers can quickly grasp the logic behind the code and contribute effectively to the
project.

The Calculator class uses clear, descriptive method names like add and subtract to make
the code easily understandable. The methods are simple and perform a single, well-
defined operation. A new developer can quickly grasp how the class works and what it
does, ensuring a faster onboarding process. By avoiding unnecessary complexity, clean
code like this allows new team members to focus on writing additional functionality
instead of deciphering how the existing code works.
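
A minimal sketch of the Calculator class described above:

class Calculator:
    def add(self, a, b):
        return a + b

    def subtract(self, a, b):
        return a - b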

Enhanced maintainability

Clean code ensures that future modifications or additions are easier and less error-prone.
By following clear and consistent design patterns and writing modular code, developers
can make changes to individual components without affecting other parts of the program.
This reduces the likelihood of introducing bugs during maintenance and ensures that code
remains robust even as it evolves over time. Clean, modular code is easier to test, and
individual functions can be updated or optimized without affecting other areas of the
program.

The OrderProcessor class is modular, with methods focusing on individual tasks:
calculating the total and applying a discount. This separation of concerns makes the code
easy to maintain, as developers can modify one method (e.g., apply_discount) without
affecting others (like calculate_total). If the discount logic needs to change, developers
can make adjustments to apply_discount without having to dig through the entire class,
ensuring maintainability. Clean, modular code makes future updates less prone to bugs,
improving long-term code quality.
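
A minimal sketch consistent with the description above (the discount rate parameter is an assumption):

class OrderProcessor:
    def __init__(self, item_prices):
        self.item_prices = item_prices

    def calculate_total(self):
        return sum(self.item_prices)

    def apply_discount(self, discount_rate):
        # Discount logic can change here without touching calculate_total
        return self.calculate_total() * (1 - discount_rate)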

Better reusability

Clean, modular code increases reusability, as components are designed to perform
specific tasks and can be easily reused in different parts of the program or in future
projects. Functions that are focused on one task can be applied wherever that task is
required, reducing duplication. This avoids reinventing the wheel each time a similar
function is needed and promotes efficiency by utilizing pre-existing code. By adhering to
clean coding principles, developers can create reusable modules that work across various
scenarios.

The calculate_tax function is simple and reusable. It accepts two parameters, price and
tax_rate, and returns the calculated tax. This function can be reused in different parts of
the program or in different projects, as it performs one task in a generic way. By adhering
to clean code principles, this function can easily be used wherever tax calculation is
required, reducing duplication and promoting efficient, reusable code.
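
A minimal sketch of the calculate_tax function described above (the sample values are invented):

def calculate_tax(price, tax_rate):
    return price * tax_rate


print(calculate_tax(2500, 0.08))  # 200.0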

Increased performance via optimization

Clean code practices allow for better performance optimization, as developers can
identify bottlenecks and unnecessary calculations more easily. When code is modular, it's
easier to isolate areas that need improvement. Clean code also ensures that unnecessary
computations are avoided by simplifying algorithms and removing redundant processes.
By focusing on optimizing individual, well-defined components, clean code leads to more
efficient use of resources and faster execution times.

The optimized_sum function uses Python's built-in sum function, which is highly
optimized for performance. The code avoids unnecessary loops or additional steps,
relying on Python’s efficient built-in methods. By utilizing clean code practices like this,
performance optimization becomes easier, as the developer focuses on making well-
defined, efficient functions. This approach minimizes unnecessary complexity and
computation, improving the overall performance of the program.
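
A minimal sketch of the optimized_sum function described above:

def optimized_sum(numbers):
    # Delegates to Python's highly optimised built-in instead of a manual loop
    return sum(numbers)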

Clear algorithm structure

Clean code ensures that algorithms are structured clearly and efficiently. A well-
structured algorithm, with clear steps and logical flow, reduces the chances of errors
during implementation. Code that follows clean coding principles makes algorithms
easier to understand, test, and optimize. By breaking down complex algorithms into
smaller, well-defined functions, the overall structure becomes more transparent, making it
easier to pinpoint areas for performance improvement.

The binary_search function implements a standard binary search algorithm, clearly
breaking down the logic into smaller steps: checking mid-value, adjusting search bounds,
and returning the result. The code is simple and efficient, making it easy to understand
and debug. Its clear structure allows for easy testing and potential optimization. By
following clean coding principles, the algorithm’s flow is transparent, making it easier for
developers to maintain and improve.
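
A minimal sketch of the binary_search function described above:

def binary_search(sorted_items, target):
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid               # found: return the index
        elif sorted_items[mid] < target:
            low = mid + 1            # continue in the upper half
        else:
            high = mid - 1           # continue in the lower half
    return -1                        # not found


print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3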

Scalability

Clean code supports scalability by making it easier to add new features or extend the
program’s functionality. Modular code and clear abstractions allow developers to extend
a program with minimal changes to the existing code. As new requirements arise, clean
code makes it possible to scale up a project without introducing excessive complexity or
breaking existing functionality. This ensures that the program can handle increased data
or user load without significant rework.

The Product class is modular and can easily be extended by adding more attributes or
methods without affecting other parts of the program. For instance, adding new features
like product ratings or inventory levels could be done without changing the existing
functionality. This scalability is supported by the clear, modular design, where each part
of the code is responsible for a single concern. Clean code practices enable easy
expansion of the system to accommodate new requirements without unnecessary
complexity.
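
A minimal sketch of the Product class described above (the attribute names are illustrative):

class Product:
    def __init__(self, name, price):
        self.name = name
        self.price = price
    # New attributes such as rating or stock_level could be added later
    # without affecting code that already uses name and price.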

Efficient memory usage

Clean code ensures that memory usage is optimized by avoiding unnecessary data
structures or operations. By using the appropriate data structures for each task, developers
can improve memory efficiency. Clean code practices ensure that memory is allocated
only when necessary and that large structures are not duplicated unnecessarily. This
reduces the overhead of managing memory, ensuring that the program runs efficiently
even as data grows.

The find_duplicates function uses a set to efficiently remove duplicates from the list. The
set data structure automatically eliminates duplicate values, optimizing memory usage by
reducing the size of the data structure. This approach is cleaner and more memory-
efficient compared to manually iterating through the list and maintaining a separate list
for duplicates. Clean code practices, such as using efficient data structures, ensure that the
program runs optimally even as data sets increase in size.
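
A minimal sketch of the find_duplicates function described above, using sets as the text suggests:

def find_duplicates(values):
    seen = set()
    duplicates = set()
    for value in values:
        if value in seen:
            duplicates.add(value)
        else:
            seen.add(value)
    return duplicates


print(find_duplicates([1, 2, 3, 2, 4, 1]))  # {1, 2}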

Better collaboration

Clean code fosters better collaboration between team members. By adhering to coding
standards and writing code that is easy to understand, developers can work together more
effectively. Clean code reduces the chances of misunderstandings and errors, as the
purpose of each function, variable, and class is clear. When the code is easy to follow,
team members can contribute to various parts of the project with minimal friction.

The get_user_info function is simple and returns a dictionary containing user information.
It’s easy to understand and follow, ensuring that developers can quickly understand its
purpose and reuse it elsewhere. Clean code practices make it easier for multiple
developers to collaborate, as each part of the code is clear and well-defined. This reduces
the likelihood of mistakes and conflicts, allowing the team to focus on solving problems
and enhancing the project efficiently. (Andersen, 2024)
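
A minimal sketch of the get_user_info function described above (the field names and values are illustrative):

def get_user_info(user_id, name, email):
    return {"id": user_id, "name": name, "email": email}


print(get_user_info(1, "Amali", "amali@example.com"))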

ACTIVITY 2

Application architecture for processing large datasets

To process large datasets effectively, the application architecture for the sales analysis
system must be designed with scalability, efficiency, and maintainability in mind.
Processing large datasets introduces challenges such as memory management, data
loading times, and computational efficiency. The architecture must address these
challenges while maintaining the simplicity and modularity required for the system's core
functions.

At its foundation, the system should adopt a layered architecture that separates data
access, business logic, and presentation layers. The data access layer, represented by the
CSV Reader in the initial design, needs to be enhanced to handle large datasets. Instead of
loading the entire dataset into memory at once, it can use techniques such as chunked data
reading. By reading data in smaller, manageable chunks, the system can operate
efficiently without exceeding memory limits. Python’s built-in csv module supports
iterators, allowing data to be processed row by row. This approach is especially useful for
systems running on hardware with limited memory resources.
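
A possible sketch of row-by-row CSV reading for this system (the file name and the use of DictReader are assumptions):

import csv


def read_sales_rows(file_path):
    # The csv reader is an iterator: rows are yielded one at a time,
    # so the whole file never has to be held in memory.
    with open(file_path, newline="", encoding="utf-8") as csv_file:
        for row in csv.DictReader(csv_file):
            yield row


# for row in read_sales_rows("sales_data.csv"):
#     process(row)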

The business logic layer, where the analysis modules reside, must also be optimized to
work seamlessly with large datasets. These modules should employ lazy evaluation
techniques to defer computations until results are explicitly needed. For example, using
Python's generator functions instead of lists allows the system to handle data streams
efficiently. Additionally, sorting and grouping operations should be implemented using
algorithms optimized for large datasets, such as divide-and-conquer strategies or external
sorting if data exceeds memory capacity.
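
A short sketch of lazy evaluation with a generator expression (the column names branch and amount are assumptions):

def branch_sales(rows, branch_name):
    # Amounts are produced lazily, one at a time, only when the generator is consumed
    return (float(row["amount"]) for row in rows if row["branch"] == branch_name)


# total = sum(branch_sales(read_sales_rows("sales_data.csv"), "Colombo"))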

To further improve processing efficiency, the system could adopt parallel processing or
multithreading within the business logic layer. For instance, analyzing sales data for
different branches or weeks can be parallelized, as these tasks are independent of one
another. Python's multiprocessing module can be leveraged to distribute workload across
multiple CPU cores, significantly reducing processing time for large datasets.
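
A possible sketch of distributing per-branch analysis across CPU cores (the branch names and the placeholder task are invented):

from multiprocessing import Pool


def analyse_branch(branch_name):
    # Placeholder for an independent, per-branch analysis task
    return branch_name, 0.0


if __name__ == "__main__":
    branches = ["Colombo", "Kandy", "Galle", "Matara"]
    with Pool(processes=4) as pool:
        results = pool.map(analyse_branch, branches)
    print(results)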

Another critical enhancement lies in the data storage layer. While the CSV file format is
sufficient for small to medium-sized datasets, larger datasets might benefit from
transitioning to more robust storage solutions such as relational databases (e.g., SQLite,
MySQL) or NoSQL databases (e.g., MongoDB). These systems support efficient
querying and indexing, enabling faster data retrieval and processing. For example, using
SQL queries to filter data at the database level reduces the amount of data transferred to
the application, optimizing performance.

The presentation layer, which interacts with the user via the command-line interface, must
also accommodate large datasets gracefully. Instead of displaying all results at once, the
system should implement pagination to display data in smaller chunks, ensuring the
interface remains responsive. Additionally, providing options for exporting analysis
results to files (e.g., CSV or JSON) allows users to access the data without overwhelming
the interface.

Error handling and logging are essential for processing large datasets, as data
inconsistencies or system failures are more likely to occur. The architecture should
include robust mechanisms for validating input data, catching exceptions, and logging
errors. For instance, invalid rows in the CSV file should be logged and skipped without
halting the entire processing workflow.
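One possible way to log and skip invalid rows without halting the run, assuming an 'amount' column that must be numeric:

import csv
import logging

logging.basicConfig(filename="sales_errors.log", level=logging.WARNING)

def load_valid_rows(path):
    """Return well-formed rows; log and skip any row that fails validation."""
    valid = []
    with open(path, newline="", encoding="utf-8") as f:
        for line_no, row in enumerate(csv.DictReader(f), start=2):
            try:
                row["amount"] = float(row["amount"])
                valid.append(row)
            except (KeyError, ValueError, TypeError) as exc:
                logging.warning("Skipping row %d: %s", line_no, exc)
    return valid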

Lastly, scalability should be a key consideration in the architecture design. As the dataset
grows, the system must be able to scale horizontally or vertically. Horizontal scaling can
be achieved by distributing the workload across multiple instances of the application,
while vertical scaling involves upgrading hardware resources such as memory or
processing power. (Vercel.app, 2025)

Applying SOLID principles in design

Single Responsibility Principle (SRP)

The SRP ensures that each module or class has only one responsibility. In the context of
the sales analysis system, the CSV Reader module should exclusively handle data loading
and parsing from the CSV file, without performing any analysis or business logic.
Similarly, each analysis module (e.g., monthly sales analysis, product preference
analysis) should focus solely on its specific type of analysis. For instance, the Monthly
Sales Analysis module will aggregate branch-wise sales data, while the Price Analysis
module will compute product price trends. By adhering to SRP, the architecture ensures
that changes to one aspect (e.g., switching from a CSV file to a database) do not affect
unrelated modules.

Open-Closed Principle (OCP)

The system should be open to extension but closed to modification. For instance, new
types of analysis (e.g., customer behavior analysis) can be added without altering the
existing codebase. This can be achieved by designing a base class or interface, such as
AnalysisModule, with methods like load_data() and analyze(). Each analysis module
(e.g., MonthlySalesAnalysis, PriceAnalysis) can inherit or implement this base class. If a
new analysis type is introduced, a new subclass can be added without modifying the
existing ones. This approach prevents regression and ensures the architecture remains
adaptable to new requirements.
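A sketch of the base class described above; the MonthlySalesAnalysis subclass and its column names are illustrative:

from abc import ABC, abstractmethod

class AnalysisModule(ABC):
    """Common interface every analysis type implements (open for extension)."""

    def __init__(self, reader):
        self.reader = reader
        self.data = []

    def load_data(self):
        self.data = self.reader.load_data()

    @abstractmethod
    def analyze(self):
        """Return this module's analysis result."""

class MonthlySalesAnalysis(AnalysisModule):
    # A new analysis type becomes a new subclass; existing classes stay untouched.
    def analyze(self):
        totals = {}
        for row in self.data:
            totals[row["branch"]] = totals.get(row["branch"], 0.0) + float(row["amount"])
        return totals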

Liskov Substitution Principle (LSP)

The LSP ensures that subclasses can replace their parent classes without altering the
functionality of the system. For example, if there is a base class DataLoader for loading
data, subclasses like CSVDataLoader or DatabaseDataLoader can be created to handle
specific data sources. The rest of the system interacts with the DataLoader interface and
does not need to know whether the data comes from a CSV file or a database. This design
makes it easy to substitute data sources without impacting the analysis modules or other
components.
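A sketch of substitutable loaders; the 'sales' table name and column layout are assumptions made for illustration:

import csv
import sqlite3

class DataLoader:
    """Base type: every loader returns a list of dictionaries."""
    def load_data(self):
        raise NotImplementedError

class CSVDataLoader(DataLoader):
    def __init__(self, path):
        self.path = path

    def load_data(self):
        with open(self.path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))

class DatabaseDataLoader(DataLoader):
    def __init__(self, db_path):
        self.db_path = db_path

    def load_data(self):
        with sqlite3.connect(self.db_path) as conn:
            conn.row_factory = sqlite3.Row
            return [dict(r) for r in conn.execute("SELECT * FROM sales")]

def count_records(loader: DataLoader):
    """Works with any loader, because each subclass honours the same contract."""
    return len(loader.load_data())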

Interface Segregation Principle (ISP)

The ISP advocates for creating smaller, more specific interfaces rather than one large,
monolithic interface. In this architecture, instead of having a single interface that handles
all aspects of data processing and analysis, separate interfaces can be created for tasks
like data loading, data filtering, and analysis. For example, a DataLoader interface can be
used for loading data, while an Analyzer interface can handle the computation logic. This
separation ensures that modules only depend on the functionality they use, reducing
unnecessary dependencies and making the codebase more modular.

Dependency Inversion Principle (DIP)

The DIP emphasizes that high-level modules should not depend on low-level modules but
rather on abstractions. In this system, analysis modules like MonthlySalesAnalysis should
depend on an abstraction (e.g., DataProvider) rather than directly interacting with a CSV
Reader or database loader. This abstraction can be implemented by classes like
CSVDataProvider or DatabaseDataProvider. By decoupling high-level modules from
low-level implementation details, the architecture becomes more flexible and resilient to
changes. For instance, switching from CSV to a database will only require changes to the
implementation of the DataProvider abstraction, without affecting the analysis modules.
(DigitalGadgetWave.com, 2023)
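A brief sketch of this inversion, where the high-level analysis module receives the abstraction through its constructor (the 'price' column is an assumed field):

class DataProvider:
    """Abstraction the analysis modules depend on."""
    def get_sales_records(self):
        raise NotImplementedError

class CSVDataProvider(DataProvider):
    def __init__(self, csv_reader):
        self.csv_reader = csv_reader  # any object exposing load_data()

    def get_sales_records(self):
        return self.csv_reader.load_data()

class PriceAnalysis:
    """High-level module: knows only about DataProvider, not CSV files or databases."""
    def __init__(self, provider: DataProvider):
        self.provider = provider

    def average_price(self):
        prices = [float(r["price"]) for r in self.provider.get_sales_records()]
        return sum(prices) / len(prices) if prices else 0.0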

Clean coding practices in the design

Meaningful names

All variables, classes, methods, and modules should have meaningful, descriptive names
that clearly convey their purpose. For instance, the CSVReader module explicitly
describes its functionality—reading data from a CSV file. Similarly, method names like
load_data, filter_sales_by_branch, and calculate_weekly_totals make their operations
self-explanatory. Descriptive names reduce the cognitive load for developers by
eliminating ambiguity, especially when multiple modules interact in a large system.

Single responsibility

Each class and method should perform a single, well-defined task. For example, the
DataLoader class focuses on loading data, while the SalesAnalyzer class handles analysis
tasks. By adhering to this principle, each component remains decoupled and easier to test,
debug, or replace. For large datasets, this separation ensures that optimizations in data
loading (e.g., chunked processing) do not impact the analysis logic or interface design.

Keep methods short and simple

Methods should be concise, performing one task and avoiding excessive nesting or
complexity. For instance, the process_sales_data method can break down tasks into
smaller helper methods like read_csv_in_chunks, aggregate_branch_sales, and
generate_analysis_report. This modularity improves readability and allows developers to
focus on specific sections of logic without navigating through lengthy, tangled code.

Avoid code duplication

Redundant code should be minimized by reusing common functionality through modular
design. For example, a generic filter_data method can be reused across various analysis
modules (e.g., for filtering data by branch, product, or time period). Avoiding duplication
reduces maintenance overhead and ensures consistency when changes are required.

Optimize data structures

Efficient data structures should be chosen based on the specific use case. For processing
large datasets, lists, dictionaries, and generators can be used appropriately. For instance,
sales data can be stored as dictionaries where keys represent product IDs or branches for
O(1) lookups. Using generators for data streams ensures minimal memory usage, which is
critical for large datasets.

Commenting and documentation

While clean code minimizes the need for excessive comments, critical sections of the
code should be well-documented to explain complex logic or assumptions. For example,
methods implementing advanced data aggregation algorithms or parallel processing
should have inline comments explaining their purpose and approach. Additionally, high-level
documentation should describe the architecture and data flow for future developers.

Error handling and validation

Robust error handling is crucial for processing large datasets. Methods should include
try-except blocks to catch potential issues like missing files, corrupted data, or invalid inputs.
For instance, the CSVReader class can handle exceptions like FileNotFoundError and
provide clear error messages. Validation checks (e.g., ensuring non-empty datasets or
verifying correct column names) prevent errors from propagating to other modules.

Separation of concerns

Each layer of the architecture—data access, business logic, and user interface—should
remain independent. The CSVReader module handles data access, while the analysis
modules focus on computations. The user interface layer (command-line interface)
interacts with these modules but does not perform any analysis or data manipulation
itself. This separation ensures that changes in one layer do not disrupt other layers.

Testing and modularity

The design should facilitate testing by keeping modules decoupled and independent. Each
module can be unit-tested in isolation. For instance, the MonthlySalesAnalysis module
can be tested with a mock dataset to ensure it calculates totals correctly, while the
CSVReader can be tested for its ability to parse data accurately. This modular approach
reduces the risk of introducing bugs when modifying or extending the system.

Scalability and performance

Clean coding practices prioritize performance when processing large datasets. Chunked
data reading, lazy evaluations, and parallel processing should be implemented efficiently.
For example, instead of loading an entire file into memory, the read_csv_in_chunks
method processes data iteratively. Analysis modules can use Python's multiprocessing or
itertools to handle operations concurrently, ensuring that performance remains consistent
even as dataset size increases. (Amr Saafan, 2023)

Design pattern and its justification

Singleton pattern

The Singleton pattern ensures that a class has only one instance throughout the
application. This pattern can be applied to the CSVReader or DatabaseConnection
components. Since the application loads data from a single source (either a CSV file or a
database) and this data needs to be shared across various modules (e.g., monthly sales
analysis, weekly analysis), it makes sense to enforce a single instance. For example, the
CSVReader could be instantiated once and passed to all analysis modules, ensuring that
the same file or database connection is used consistently throughout the system.

In systems dealing with large datasets, such as this one, having a single, centralized
access point for data loading ensures consistency, prevents unnecessary file openings or
database connections, and reduces memory usage. The Singleton pattern ensures that the
data access is efficient and that only one instance of the data loader is ever created,
preventing redundant reads and improving overall performance.
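A minimal sketch of a Singleton-style CSVReader; the file path and loading logic are illustrative:

import csv

class CSVReader:
    """Singleton-style reader so every module shares one data access point."""
    _instance = None

    def __new__(cls, path="sales.csv"):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.path = path
        return cls._instance

    def load_data(self):
        with open(self.path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))

# Both names below refer to the same object, so the data source stays consistent:
# reader_a = CSVReader("sales.csv")
# reader_b = CSVReader()
# assert reader_a is reader_b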

Strategy pattern

The Strategy pattern is useful when you need to choose from a variety of algorithms to
perform a task. In this case, the sales analysis system needs to support various types of
analyses, such as monthly sales, price analysis, and product preference analysis. Each of
these analyses can be encapsulated in different strategy classes that follow a common
interface.

For example, the SalesAnalyzer class could define a compute_analysis() method, while
specific analysis strategies (like MonthlySalesStrategy, PriceAnalysisStrategy) implement
this method according to their unique logic. The SalesAnalyzer class would delegate the
analysis process to the strategy class.

The Strategy pattern provides flexibility in adding new types of analyses without
modifying existing code. Each new analysis strategy can be added as a new class that
implements the common interface, making the system easily extensible. This approach
promotes the Open-Closed Principle (OCP) from SOLID, where the system is open for
extension but closed for modification. The Strategy Pattern enhances maintainability by
isolating the specific logic for each analysis type in separate classes.
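A sketch of the strategy classes described above, with illustrative column names:

from abc import ABC, abstractmethod

class AnalysisStrategy(ABC):
    @abstractmethod
    def compute_analysis(self, data):
        """Return the result of one kind of analysis."""

class MonthlySalesStrategy(AnalysisStrategy):
    def compute_analysis(self, data):
        totals = {}
        for row in data:
            totals[row["branch"]] = totals.get(row["branch"], 0.0) + float(row["amount"])
        return totals

class PriceAnalysisStrategy(AnalysisStrategy):
    def compute_analysis(self, data):
        prices = [float(row["price"]) for row in data]
        return sum(prices) / len(prices) if prices else 0.0

class SalesAnalyzer:
    """Delegates the actual computation to whichever strategy it is given."""
    def __init__(self, strategy: AnalysisStrategy):
        self.strategy = strategy

    def run(self, data):
        return self.strategy.compute_analysis(data)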

Factory Method pattern

The Factory Method pattern allows the creation of objects without specifying the exact
class of object that will be created. In the context of the sales analysis system, the factory
pattern can be used to create various types of data readers or analysis modules, based on
the type of input or analysis required.

For example, a DataLoaderFactory class could create instances of different data loaders
(such as CSVDataLoader, DatabaseDataLoader, etc.) depending on the input data source.
Similarly, an AnalysisModuleFactory could be used to create different analysis modules
based on user input or command-line options (e.g., MonthlySalesAnalysis,
ProductPreferenceAnalysis).

The Factory Method pattern decouples the creation of objects from the rest of the
application. This enables the system to easily switch between different data sources or
analysis types without affecting the rest of the architecture. This pattern enhances
flexibility and supports scalability as new data loaders or analysis modules are added. By
encapsulating object creation in factory methods, the system avoids unnecessary
dependencies between modules, contributing to a cleaner, more modular design.
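A sketch of the DataLoaderFactory idea described above, using simplified loader classes as stand-ins; an AnalysisModuleFactory would follow the same shape:

class CSVDataLoader:
    def __init__(self, path):
        self.path = path

class DatabaseDataLoader:
    def __init__(self, connection_string):
        self.connection_string = connection_string

class DataLoaderFactory:
    """Callers ask for a loader by source type instead of naming a concrete class."""
    @staticmethod
    def create(source_type, location):
        if source_type == "csv":
            return CSVDataLoader(location)
        if source_type == "database":
            return DatabaseDataLoader(location)
        raise ValueError(f"Unsupported data source: {source_type}")

# loader = DataLoaderFactory.create("csv", "sales.csv")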

Adapter pattern

The Adapter pattern allows incompatible interfaces to work together. In this case, if the
system needs to support multiple data sources (e.g., CSV, database, API), the Adapter
Pattern can be applied to ensure that each data source provides data in a common format,
regardless of its original interface.

For example, the CSVReader and DatabaseReader modules can be adapted to provide the
same interface for data access. Both can implement a common interface like
IDataProvider which exposes methods like load_data(). Internally, the CSVReader and
DatabaseReader will handle their respective data reading processes but can be used
interchangeably by other parts of the system.

The Adapter pattern ensures that different modules interacting with varying data sources
can use a common interface without rewriting code for each data source. This design
promotes Interface Segregation (ISP) by allowing modules to depend on smaller, more
focused interfaces, and it enhances the flexibility of the system by allowing for easy
integration of new data sources in the future. The Adapter pattern also simplifies testing
by allowing mock adapters for different data sources during unit tests.
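A sketch of adapters that give two incompatible readers the same load_data() interface; the JSON source is an illustrative second format:

import csv
import json

class CSVReader:
    """Existing component with its own reading method."""
    def read_csv(self, path):
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))

class JSONReader:
    """Another source with an incompatible interface."""
    def read_json(self, path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)

class CSVAdapter:
    """Adapts CSVReader to the common load_data() call used by the analyses."""
    def __init__(self, reader, path):
        self.reader, self.path = reader, path
    def load_data(self):
        return self.reader.read_csv(self.path)

class JSONAdapter:
    def __init__(self, reader, path):
        self.reader, self.path = reader, path
    def load_data(self):
        return self.reader.read_json(self.path)

# Analysis code only ever calls provider.load_data(), whichever adapter it receives.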

Observer pattern

The Observer pattern can be applied to monitor changes in the sales data or analysis
results. For instance, if the sales data changes or updates are made to the dataset, the
system can notify multiple components (observers) that are interested in those changes,
such as the Dashboard displaying real-time metrics or the Sales Trends Analysis module.

For example, the DataChangeNotifier class would maintain a list of observers (modules
interested in changes) and notify them whenever the data is updated. The observers, such
as the SalesTrendObserver or SalesDashboardObserver, would then update their outputs
accordingly.

The Observer pattern is highly effective in scenarios where multiple components need to
react to changes in a single data source. In the case of large datasets, this pattern can help
keep the user interface or other components synchronized with the latest sales data
without polling or constantly reloading data. It improves system responsiveness and
allows real-time monitoring of changes. This approach also adheres to the Dependency
Inversion Principle (DIP), as the high-level modules (e.g., the UI components) depend on
abstractions (observers), not concrete data sources.
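A minimal sketch of the notifier and two observers described above:

class DataChangeNotifier:
    """Subject: keeps a list of observers and notifies them when data changes."""
    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def data_updated(self, records):
        for observer in self._observers:
            observer.update(records)

class SalesTrendObserver:
    def update(self, records):
        print(f"Recomputing sales trends over {len(records)} records")

class SalesDashboardObserver:
    def update(self, records):
        print(f"Refreshing dashboard with {len(records)} records")

# notifier = DataChangeNotifier()
# notifier.attach(SalesTrendObserver())
# notifier.attach(SalesDashboardObserver())
# notifier.data_updated(new_records)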

Command pattern

The Command pattern is useful when you want to encapsulate all details of a request in a
single object, which can be executed at a later time. In the case of this sales analysis
system, the Command pattern can be used to represent different user commands (e.g.,
running a monthly sales report, generating a price analysis) as command objects. Each
command object will implement an interface such as execute(), and when a user selects a
command from the menu, the corresponding command is executed.

The Command pattern decouples the request from the logic that processes it,
allowing for a more flexible and extensible design. As new commands (e.g., different
types of analysis) are added to the system, they can be encapsulated as new command
objects without modifying the existing logic. This makes the system easy to extend,
adheres to the Open-Closed Principle (OCP), and simplifies the addition of new
functionality without breaking existing workflows. (Ramuglia, 2023)
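A sketch of command objects wired to a simple menu; the analyzer methods named here are illustrative:

from abc import ABC, abstractmethod

class Command(ABC):
    @abstractmethod
    def execute(self):
        """Run the encapsulated request."""

class MonthlySalesReportCommand(Command):
    def __init__(self, analyzer):
        self.analyzer = analyzer
    def execute(self):
        return self.analyzer.monthly_totals()

class PriceAnalysisCommand(Command):
    def __init__(self, analyzer):
        self.analyzer = analyzer
    def execute(self):
        return self.analyzer.average_prices()

class MenuSystem:
    """Maps menu choices to command objects; new commands need no changes here."""
    def __init__(self, commands):
        self.commands = commands  # e.g. {"1": MonthlySalesReportCommand(analyzer)}
    def run(self, choice):
        return self.commands[choice].execute()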

Architectural design

The architectural design of the sales analysis system follows a modular, layered
architecture approach, clearly divided into three core layers: the User Interaction Layer,
the Business Logic Layer, and the Data Access Layer. This separation ensures a clean
organization of responsibilities, promotes scalability, and simplifies future maintenance
and enhancements.

At the top of the architecture lies the User Interaction Layer, which manages all user-
facing functionalities. This layer includes the User, Login System, and Menu System
components. The User interacts with the system by initiating a login process through the
Login System, which validates hardcoded credentials to restrict access to authorized
personnel only. Once authenticated, users are granted access to the Menu System, which
acts as the central navigation interface. This menu allows users to select specific types of
sales analyses, guiding them seamlessly to different parts of the system based on their
input.

Beneath the interface layer is the Business Logic Layer, which contains the core
analytical functionalities of the system. This layer includes distinct modules such as the
Monthly Sales Analysis Module, Price Analysis Module, Weekly Sales Analysis Module,
Product Preference Analysis Module, and Sales Distribution Analysis Module. Each of
these modules encapsulates a specific analytical function and operates independently,
allowing for focused development and testing. These modules process data passed from
the lower data layer and generate insights such as branch-level performance, pricing
trends, weekly sales patterns, popular products, and category-based sales distribution. The
use of separate modules for each type of analysis also supports extensibility, making it
easy to integrate new analysis features without disrupting existing ones.

At the base of the architecture is the Data Access Layer, represented by the CSV Reader.
This component is responsible for reading, parsing, and formatting data from an external
CSV file into a structure suitable for analysis. It acts as the single source of truth for all
data required by the business logic layer. Each analysis module depends on this reader to
access consistent and accurate data. The modularity of this layer is especially
advantageous—should the system later transition to a database or another data source,
only the CSV Reader component would need to be updated, leaving the upper layers
untouched.

Overall, this architectural design not only supports clarity and logical separation of
concerns but also enhances maintainability, scalability, and reusability. It ensures that
each layer can evolve independently while maintaining the integrity of the system as a
whole.

Class diagram

The class diagram illustrates a modular and layered architecture of the sales analysis
system, where each component is encapsulated within its own class, promoting separation
of concerns and maintainability. At the user interaction layer, the User class is responsible
for initiating interaction with the system. This class calls upon the LoginSystem to handle
user authentication, validating credentials before granting access to the core functionality.
The LoginSystem contains an authenticate method that checks hardcoded username and
password inputs to ensure that only authorized users can proceed.

Once authentication is successful, control is passed to the MenuSystem, which acts as the
central navigation hub. The MenuSystem provides a structured interface where users can
select from a variety of sales analysis options. Each menu option corresponds to a specific
analysis module: MonthlySalesAnalysis, PriceAnalysis, WeeklySalesAnalysis,
ProductPreferenceAnalysis, and SalesDistributionAnalysis. The menu interacts
dynamically with these modules, allowing the user to access targeted analytical
functionality depending on their selection.

Each of these analysis modules is implemented as a separate class, adhering to the
principles of modularity and reusability. The MonthlySalesAnalysis class focuses on
branch-wise aggregation of sales data, calculating total sales per branch. The
PriceAnalysis class analyzes average product prices to reveal pricing trends and
anomalies. The WeeklySalesAnalysis module examines sales data on a weekly basis,
highlighting short-term trends and fluctuations. The ProductPreferenceAnalysis class
ranks products based on quantity sold, providing insight into customer preferences.
Lastly, the SalesDistributionAnalysis class breaks down sales across different categories
such as branches or product types to help identify profitable or underperforming
segments.

All of these analysis modules rely on a centralized data access layer represented by the
CSVReader class. This component is responsible for reading and parsing sales data from
a CSV file and delivering it in the form of a list of dictionaries. It ensures consistent and
structured access to data for all analysis modules. This approach isolates data retrieval
from business logic, which enhances scalability—should the data source transition from
CSV to a database in the future, only the CSVReader class would require modification,
leaving the rest of the system untouched.

In summary, the class diagram depicts a well-structured system where each class has a
distinct responsibility, interactions are clearly defined, and modularity is prioritized to
support ease of maintenance, scalability, and user-centric operation. (Kumar, 2023)

Key testing approaches

Unit testing focuses on testing individual components or functions in isolation. It is the
most granular level of testing and ensures that every small piece of code works correctly.
By writing test cases for specific functions, developers can identify and fix defects at their
source. For example, in a sales analysis system, a unit test could check whether the
function for calculating branch totals works with correct input-output mapping. This
approach promotes modular design, simplifies debugging, and creates a reliable
foundation for building larger systems. Furthermore, unit tests act as living
documentation, helping future developers understand the behavior and edge cases of
individual components.

Integration testing ensures that different modules or components of the system interact
properly. While unit testing focuses on isolated functionality, integration testing validates
data flow and collaboration between components. In the sales analysis system, integration
testing would ensure that data loaded from a CSV file can seamlessly feed into the
analysis module, and the results are correctly displayed. This approach catches issues like
mismatched data formats or broken APIs that might not surface in isolated tests.
Integration testing plays a vital role in guaranteeing that the combined system performs as
expected when different modules work together.

Functional testing assesses the system’s functionality against defined requirements. It is
user-centric, ensuring that the system does what it is supposed to do from the perspective
of the end user. For example, a functional test in the sales analysis system would validate
whether the system generates accurate monthly and weekly reports based on the provided
sales data. This testing approach uses real-world scenarios to evaluate how the software
performs under typical usage. By covering both normal and edge cases, functional testing
ensures the system fulfills its intended purpose and aligns with business objectives.

System testing evaluates the entire integrated system to ensure it functions correctly as a
whole. It includes testing all functional and non-functional aspects, such as performance,
usability, and compatibility. For the sales analysis system, system testing would involve
simulating the processing of large datasets, generating multiple reports, and verifying that
the outputs align with expectations. This holistic approach replicates real-world
conditions and uncovers issues that may not arise during isolated or component-level
testing. System testing ensures that the system is ready for deployment and meets both
user and organizational needs.

Performance testing evaluates how well the system performs under various conditions,
such as heavy loads or limited resources. This testing approach is crucial for systems
processing large datasets, like the sales analysis application. For instance, performance
testing might involve measuring how quickly the system can analyze data from multiple
branches or handle 100,000 transactions without slowing down. Key metrics like
response time, throughput, and memory usage are monitored to identify bottlenecks. By
optimizing performance, this approach ensures that the system delivers consistent and
efficient results even under high demand.

Regression testing ensures that new features or changes to the system do not break
existing functionality. In iterative development, where updates are frequent, regression
testing is critical to maintaining system stability. For instance, adding a feature for
product price analysis in the sales analysis system should not affect the accuracy of
monthly or weekly analysis. By re-executing previous test cases after changes, developers
can quickly identify and resolve issues. Automated regression tests are especially useful
for large systems, enabling faster testing and reducing the risk of introducing defects.

Acceptance testing validates the system against user requirements and determines
whether it is ready for deployment. This type of testing ensures that the software delivers
value to the client or end user. In the sales analysis system, acceptance testing might
involve ensuring that the command-line interface is intuitive and that generated reports
meet the client’s expectations for accuracy and formatting. Conducted by end users or
client representatives, acceptance testing bridges the gap between development and
real-world usage. It serves as the final checkpoint before the system is rolled out.

Exploratory testing is an unscripted, dynamic approach where testers actively explore the
system to uncover defects. This testing method relies on the creativity and intuition of
testers to identify issues that structured tests might miss. For the sales analysis system,
exploratory testing could involve inputting malformed data, using unsupported
commands, or intentionally exceeding data limits to observe the system’s response.
Exploratory testing is particularly effective for identifying usability issues and unexpected
behavior, making it a valuable complement to traditional testing methods.

Security testing evaluates the system’s ability to protect data and prevent unauthorized
access. In systems like the sales analysis application, which handle sensitive sales data,
security testing ensures that only authorized personnel can access the system and that data
is safeguarded against breaches. Simulated attacks, such as SQL injections or brute-force
login attempts, help identify vulnerabilities. By addressing these issues, security testing
ensures compliance with data protection standards and builds user trust in the system.

Usability testing focuses on ensuring the system is user-friendly and intuitive. For a
command-line-based system like the sales analysis application, usability tests might
involve evaluating the clarity of commands, the organization of menus, and the system’s
overall ease of navigation. Feedback from usability testing helps developers refine the
interface, reduce user friction, and improve the overall experience. This approach ensures
that the system meets user expectations and is accessible to its intended audience. (Haque,
2023)

How automated testing tools (Unittest) will be used to automate the testing process

Automated testing tools like Python's unittest module play a critical role in streamlining
the testing process, reducing manual effort, and ensuring consistent software quality. The
unittest framework, part of Python's standard library, allows developers to create test
cases, test suites, and automate the execution of those tests. By leveraging the power of
automation, it becomes possible to verify code functionality, maintain stability during
development, and quickly detect regressions, even in large and complex systems. Below
is a detailed explanation of how unittest can be used to automate the testing process and
its significance in modern software development.

First, unittest facilitates the creation of reusable test cases, which form the foundation of
an automated testing workflow. Developers can define test cases as methods within a test
class, inheriting from unittest.TestCase. Each test case focuses on a specific functionality
or component, such as verifying the output of a function or validating edge cases. For
example, in a sales analysis system, test cases might check whether a method correctly
calculates total sales or handles missing data. By structuring tests into self-contained
units, unittest encourages modular testing, making it easier to debug failures and pinpoint
issues in the codebase.
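A minimal example of such a test case, using an illustrative calculate_branch_total function and hypothetical figures:

import unittest

def calculate_branch_total(records, branch):
    """Sum the sales amounts for one branch (stand-in for a real system function)."""
    return sum(float(r["amount"]) for r in records if r["branch"] == branch)

class TestBranchTotals(unittest.TestCase):
    def setUp(self):
        self.records = [
            {"branch": "Colombo", "amount": "1500.00"},
            {"branch": "Kandy", "amount": "900.50"},
            {"branch": "Colombo", "amount": "250.00"},
        ]

    def test_total_for_known_branch(self):
        self.assertEqual(calculate_branch_total(self.records, "Colombo"), 1750.00)

    def test_total_for_missing_branch_is_zero(self):
        self.assertEqual(calculate_branch_total(self.records, "Jaffna"), 0)

if __name__ == "__main__":
    unittest.main()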

Another key feature of unittest is the ability to create test suites, which aggregate multiple
test cases for systematic execution. This allows developers to group related tests, such as
those targeting a specific module or feature. For instance, a test suite for a sales analysis
system could include tests for data processing, report generation, and error handling. Test
suites enable developers to run all relevant tests with a single command, ensuring
comprehensive validation of the system. Additionally, the unittest framework supports
selective execution, allowing developers to focus on specific test cases during iterative
development, which saves time.

Automated testing with unittest ensures consistency in the testing process. By writing
tests that follow predefined inputs and expected outputs, developers eliminate the
variability and potential errors of manual testing. The automated nature of unittest also
means that tests can be run repeatedly without additional effort, making it ideal for
continuous integration and delivery (CI/CD) pipelines. Every time code is committed, the
test suite can be executed automatically to detect issues early in the development
lifecycle. This early feedback loop significantly reduces the cost of fixing bugs and
ensures stability throughout the project.

The framework's built-in assertion methods, such as assertEqual, assertTrue, and
assertRaises, are another valuable feature for automated testing. These assertions allow
developers to define precise expectations for each test case, ensuring that failures provide
clear and actionable feedback. For example, if a method is expected to return a specific
value, assertEqual will immediately flag a mismatch, along with a descriptive error
message. This clarity accelerates debugging and fosters confidence in the reliability of the
system.

One of the most impactful aspects of unittest is its ability to simulate and test edge cases
that might be difficult to handle manually. For example, developers can use mocking (via
the unittest.mock module) to simulate scenarios like missing files, network failures, or
incorrect user inputs. In a sales analysis system, mocks can be used to test how the system
handles malformed CSV data or simulate the behavior of external dependencies like
databases. This capability ensures robust testing even in complex environments, making
the system more resilient to unexpected situations.
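A small sketch of this idea, using mock_open to fake a CSV file and a side effect to simulate a missing file; the count_data_rows helper is illustrative:

import unittest
from unittest.mock import patch, mock_open

def count_data_rows(path):
    """Count the data rows (excluding the header) in a CSV file."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    return max(len(lines) - 1, 0)

class TestCountDataRows(unittest.TestCase):
    def test_counts_rows_from_fake_file(self):
        fake_csv = "branch,amount\nColombo,1500.00\nKandy,900.50\n"
        with patch("builtins.open", mock_open(read_data=fake_csv)):
            self.assertEqual(count_data_rows("any.csv"), 2)

    def test_missing_file_is_reported(self):
        with patch("builtins.open", side_effect=FileNotFoundError):
            with self.assertRaises(FileNotFoundError):
                count_data_rows("missing.csv")

if __name__ == "__main__":
    unittest.main()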

In addition to improving code quality, automated testing with unittest enhances
productivity by reducing the time spent on repetitive manual testing. Developers can
focus on writing and improving code while the automated tests continuously validate the
system's behavior. This is particularly beneficial in agile development environments,
where frequent updates and iterations are common. Automated testing ensures that new
features or changes do not introduce regressions, enabling faster and more reliable
development cycles.

Furthermore, unittest integrates seamlessly with other tools and frameworks, enhancing
its usability in larger projects. For example, it can generate detailed reports on test results,
including the number of tests run, passed, or failed, along with error details. These reports
provide valuable insights into the system's health and help prioritize areas that need
improvement. Integration with CI/CD platforms, such as Jenkins or GitHub Actions,
ensures that unittest becomes an integral part of the development workflow, contributing
to the overall reliability of the software delivery process.

Finally, unittest supports testing for both functional and non-functional requirements.
Functional tests verify that the software meets specified requirements, such as generating
accurate reports or processing data correctly. Non-functional tests, such as performance
or scalability tests, can also be automated by defining benchmarks and using assertions to
validate results. For instance, a performance test might assert that a sales analysis system
processes 10,000 transactions within a specified time limit. Automating such tests ensures
that the system not only works but also meets performance expectations.
(www.netguru.com, n.d.)

Tests that will be conducted and the tools used, ensuring that edge cases and large
datasets are covered

Unit tests for core functionalities

The first step involves writing unit tests for the core functionalities of the system, such as
reading data from CSV files, calculating sales metrics, and generating reports. Each
function in the system is tested independently using specific inputs and expected outputs.
For instance, a unit test for the function calculating total sales for a branch will use a
predefined set of sales records and assert that the output matches the expected total. This
ensures that each individual piece of functionality works correctly in isolation. Edge
cases, such as empty datasets, malformed CSV files, or invalid data entries, are tested to
verify that the system handles these scenarios gracefully. For example, a test might
simulate a file with missing column headers or non-numeric values in numeric fields,
ensuring the system can either process the file with warnings or reject it with meaningful
error messages.

Integration tests for module interactions

Integration tests validate the interaction between different components of the system. For
the sales analysis system, this involves ensuring seamless integration between data
ingestion, analysis, and output modules. For instance, integration tests would check
whether data read from a CSV file is accurately passed to the analysis functions and
whether the analysis results are properly formatted for output. A test might involve
providing a sample CSV file with diverse sales data and asserting that the generated
monthly or weekly reports are accurate. These tests also cover edge cases like handling
inconsistent data formats across files or processing multiple branches with varying dataset
sizes.

Performance testing with large datasets

One critical aspect of testing the sales analysis system is ensuring it performs efficiently
with large datasets. Using unittest, performance tests can be conducted by generating
synthetic datasets with tens of thousands of sales records and measuring execution time.
These tests assess whether the system can handle the volume of data typically expected in
real-world scenarios, such as the sales records of multiple branches over several months.
Assertions can be added to ensure that the processing time remains within acceptable
limits, such as completing the analysis of 100,000 records in under a specified time
threshold. Additionally, the system’s memory usage can be monitored to ensure it does
not exceed reasonable limits, especially when processing large files.

Testing edge cases and data validation

Edge cases are a critical focus in the testing process. Tests are designed to handle
scenarios such as missing data, negative sales amounts, or transactions with zero quantity.
For example, a test case might simulate a dataset where some rows are missing product
IDs, and assertions can ensure the system either skips those rows or handles them in a
predefined manner. Another edge case involves duplicate entries, where tests verify that
the system detects and handles duplicates appropriately, either by removing them or
flagging them for review. These tests ensure that the system is robust and can handle
unexpected input gracefully, without crashing or producing incorrect results.

Mocking and simulating dependencies

The unittest.mock module is used to simulate external dependencies and test components
in isolation. For example, the data ingestion module can be tested independently by
mocking the file-reading process, allowing the developer to simulate different file
scenarios without relying on actual files. Similarly, database interactions, such as
retrieving branch-specific data or storing analysis results, can be mocked to test how the
system interacts with external storage systems. This approach ensures comprehensive
testing without the need for a fully operational database during development.

Regression testing for system stability

Regression tests are crucial to ensure that new features or bug fixes do not introduce
unintended side effects. With unittest, previously written test cases are executed after
every change to the system. For instance, if a new feature is added to analyze product
price trends, regression tests ensure that the existing functionality for monthly and weekly
sales analysis continues to work as expected. Automated regression testing saves time,
especially in large systems, by quickly detecting and pinpointing issues introduced by
recent changes.

Usability testing for CLI commands

Although unittest is primarily used for functional and integration testing, it can also be
used to test the command-line interface (CLI) of the system. Tests can simulate user
inputs, such as selecting analysis options or specifying file paths, and assert that the
system responds correctly. For instance, a test might simulate a user requesting a branch’s
weekly sales analysis and verify that the correct report is displayed. Edge cases, such as
invalid commands or missing file paths, are also tested to ensure the system provides
helpful error messages and maintains usability.

Automated reporting of test results

The unittest framework’s reporting capabilities ensure that developers receive detailed
feedback on test results. Each test case outputs information on whether it passed or failed,
along with failure reasons and stack traces. This makes it easy to identify and address
issues quickly. By integrating unittest into a continuous integration (CI) pipeline, these
reports can be automatically generated and reviewed after every code commit, ensuring
ongoing quality assurance.

Scalability testing for future requirements

Scalability is a key consideration for the sales analysis system, especially as the business
expands. Tests are conducted to simulate future scenarios, such as processing data from
hundreds of branches or handling multiple concurrent users. Synthetic datasets are used to
mimic these conditions, and assertions verify that the system continues to perform
efficiently. By addressing scalability early in the development process, the system is
prepared to accommodate growth without requiring significant redesigns.

Code coverage and maintenance

Finally, automated testing with unittest ensures high code coverage, meaning that most of
the system’s code is executed during testing. Tools like coverage.py can be used
alongside unittest to measure which lines of code are tested and identify gaps in test
coverage. This encourages developers to write comprehensive tests and maintain a high
standard of code quality. Regularly updating test cases as the system evolves ensures that
the tests remain relevant and useful throughout the development lifecycle. (Kitakabee,
2022)

ACTIVITY 3

Python application for processing large datasets

Codes

(Screenshots of the application source code.)

Output

(Screenshots of the program output.)

CSV file

(Screenshot of the sample sales data CSV file.)

Evidences for proper use of data structures

Efficient data handling with lists and dictionaries

In the context of the sales analysis system, one of the primary data structures used is the
list. Lists are ideal for storing ordered sales records or transaction entries where the
sequence of the records matters. For example, when processing the monthly sales data for
each branch, the system could use a list to store individual transaction records, with each
record being a dictionary containing key-value pairs such as product ID, sales quantity,
price, and transaction time. The list allows the system to efficiently store and iterate over
sales data in the order they occurred, making it easier to calculate totals, perform
aggregations, and generate reports. Additionally, dictionaries are used extensively to map
keys to values, such as mapping product IDs to product names or storing sales data for
each branch. This enables quick lookups, modifications, and efficient data retrieval. For
instance, when calculating total sales for a specific product, the system can use a
dictionary to directly access the sales data associated with the product ID, ensuring
efficient performance even as the dataset grows in size.

Handling large datasets with pandas DataFrames

For handling large datasets, especially when processing transaction records or sales data,
the pandas DataFrame is an indispensable data structure. Although the task requirement
specifically calls for no external libraries, the concept of using a structured tabular format
like a DataFrame is highly relevant in terms of efficiency and organization of data. In
cases where the system must handle thousands of sales records, using a DataFrame-like
structure provides a clear advantage over simple lists or dictionaries. In a more complex
sales analysis system, a DataFrame allows operations such as filtering, grouping, and
summarizing data to be performed efficiently. For example, when the system needs to
calculate the total sales for a given branch over the month, the DataFrame can easily
aggregate the data based on specific columns (e.g., branch ID, product category, etc.),
optimizing both performance and code readability. By representing data in rows and
columns, the DataFrame also makes it easier to conduct advanced analysis and ensure that
data is well-organized.

Utilizing stacks and queues for order processing

Stacks and queues are fundamental data structures that are especially useful in scenarios
involving order processing or transactional workflows in the sales analysis system. A
stack (Last In, First Out - LIFO) can be used to handle a sequence of actions or
transactions that need to be processed in reverse order. For example, a stack might be
used to track a series of user actions, such as undoing or redoing specific tasks within the
sales report generation process. On the other hand, a queue (First In, First Out - FIFO)
can be used to manage a list of pending transactions that need to be processed in the order
they are received. This is particularly useful in a real-time system where each incoming
sale needs to be processed and logged sequentially. In both cases, stacks and queues
ensure that operations are carried out in the correct order, providing structure and
efficiency in processing data.

Tree structures for hierarchical data representation

In the sales analysis system, tree structures are particularly useful for representing
hierarchical data, such as the relationship between different branches of the supermarket
network. For instance, a tree structure can be used to model a directory of branches,
where the root node represents the main headquarters and the child nodes represent
individual branch offices. This allows the system to quickly traverse the data, analyze
sales per branch, and generate aggregated reports at different levels (e.g., total sales for a
region or for the entire network). The tree structure provides a clear and efficient way to
represent hierarchical relationships, enabling fast searches and updates as new branches
are added or sales data changes. For example, when performing an analysis of weekly
sales, the system can quickly traverse the tree to aggregate data at various levels, from
individual branches to regional groupings.

Graphs for product preference and sales trends

For the analysis of product preferences and sales trends, graph data structures are a
powerful tool for representing and analyzing relationships between products, customers,
and sales patterns. In the case of product preference analysis, the system could represent
products as nodes in a graph, with edges between them indicating a co-purchase
relationship. This enables the system to identify products that are often purchased
together, helping the management team make data-driven decisions about product
placement or promotional offers. Similarly, a graph can be used to model sales trends
over time, with nodes representing time intervals (e.g., weeks, months) and edges
representing the volume of sales during those intervals. By using a graph, the system can
track how sales for individual products or categories evolve, allowing for in-depth trend
analysis and better forecasting.

Hash tables for fast data lookups

Hash tables are ideal for situations where fast lookups and quick retrieval of data are
essential. In the sales analysis system, hash tables can be employed to map customer IDs
to customer details or product IDs to product information. This enables rapid lookups and
ensures that the system can efficiently retrieve data even as the dataset expands. For
example, when generating a report that summarizes sales for a particular customer, the
system can use a hash table to retrieve the customer's details and associated transaction
history in constant time. This is especially critical in real-world systems where
performance is a concern, and ensuring efficient data retrieval is vital to maintaining
responsiveness in the system.

Priority queues for sales data processing

In cases where certain sales data or transactions need to be prioritized over others, priority
queues can be used. For example, if the system needs to prioritize processing high-value
transactions first (perhaps to generate reports for the most important clients), a priority
queue can be employed to ensure that transactions are processed in the order of their
value. Priority queues use a heap-based structure, which allows for efficient insertion and
retrieval of elements based on a priority ranking. This ensures that the system processes
high-priority transactions first, without compromising performance or accuracy in data
processing.
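A minimal sketch using Python's heapq module, with the 'amount' field as an illustrative priority key:

import heapq

def order_by_value(transactions):
    """Return transactions ordered highest amount first via a heap-based priority queue."""
    heap = [(-float(t["amount"]), index, t) for index, t in enumerate(transactions)]
    heapq.heapify(heap)
    ordered = []
    while heap:
        _, _, transaction = heapq.heappop(heap)
        ordered.append(transaction)
    return ordered

# order_by_value([{"amount": "120"}, {"amount": "950"}, {"amount": "40"}])
# yields the 950 transaction first, then 120, then 40.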

Arrays for fixed-size data

In situations where the size of the data is known in advance and does not change
frequently, arrays are a simple and effective data structure to use. For example, the
system could use arrays to store fixed sets of data, such as predefined sales targets or
fixed product categories. Since arrays offer constant time access and a fixed size, they
provide an efficient solution when the data structure's size is predetermined and does not
require resizing. Arrays are particularly effective when used in conjunction with other
data structures like dictionaries or lists to store and manipulate more dynamic data. (On,
2023)

Evidences for implementation of file handling

Reading sales data from CSV files

The first step in implementing file handling involves reading data from external CSV files
containing sales information. The system must be able to open, parse, and process sales
data that is regularly generated by individual branches. In the sales analysis system,
Python’s built-in file handling and the csv module are used to read CSV files. A critical
part of this implementation is ensuring that the file path is provided, allowing the system
to access the correct files from the local or remote storage. For example, the program uses
Python’s open() function to open the file and the csv.reader() function to parse the file
line-by-line. Each line of the file corresponds to a sales record, and the system converts
this data into a structured format, such as a dictionary or a list of lists, for further analysis.
This method ensures that even large datasets can be processed efficiently, reading the file
in chunks instead of loading the entire file into memory at once. As a result, this approach
prevents memory overflow and allows the system to scale for large datasets. The
implementation also checks for errors, such as missing files or malformed data, by
employing exception handling (try-except blocks) to provide informative error messages
if the data cannot be read.

Handling file paths and user input for file location

One important aspect of file handling in the system is the flexibility to specify file paths.
This allows the system to dynamically load different CSV files for different branches or
for different periods (e.g., monthly reports). By allowing users to input or hardcode the
location of the CSV files, the system becomes more adaptable and capable of handling
diverse datasets without needing to recompile or hardcode each file. For instance, when a
user runs a report for a specific branch, the system might ask for or automatically search
for the corresponding file based on branch IDs. The implementation supports absolute
and relative file paths, providing the user with flexibility in choosing where the files are
stored on the system. This flexibility is crucial when working with multiple branches and
regional datasets, where files may be stored in different directories based on the
organizational structure.

Writing processed data to CSV files

After reading the raw sales data and performing necessary calculations or analysis, the
system must write the results back to CSV files for reporting purposes. This is
particularly important for generating monthly sales summaries, price analysis reports, or
weekly sales trends. In the sales analysis system, Python’s csv.writer() method is
employed to write data back to CSV format. For example, the system can generate a new
file that contains total sales data for each branch or a summary of the most popular
products. This file handling is designed to ensure the accurate formatting of the output,
with proper handling of commas, quotes, and newlines, as specified by the CSV format.
Each piece of calculated data is added to the output file in a new row, preserving the
structured format. The system also checks for the existence of the output file before
writing and prompts the user for permission to overwrite the file if necessary. This feature
ensures that users can easily store their results for future reference, and it supports the
export of analysis results to a CSV file that can be shared or used for further processing.

Handling multiple files for branch-specific data

In a large supermarket chain with multiple branches, each branch generates its own sales
data, often stored in separate CSV files. The system must be able to handle and process
multiple CSV files to generate aggregated reports. For example, the system can read and
process sales data from several CSV files, each corresponding to a different branch, and
then aggregate the data to provide insights at a company-wide level. This implementation
requires efficient file handling to read each file individually, process the data contained
within, and then combine the results into a unified report. The system uses Python’s os
module to iterate through a directory containing multiple CSV files, opening each file in
turn and applying the necessary analysis to the data. Once the data from all branches is
processed, the system consolidates the results and writes the aggregated data into a new
summary report. This implementation is essential for generating overall performance
metrics, such as total sales, average product prices, or regional sales trends, while
maintaining individual branch data integrity.

Exception handling and data validation during file operations

A crucial aspect of implementing file handling in the sales analysis system is ensuring
that file operations, such as opening, reading, and writing, are robust and resilient to
errors. The system employs exception handling to manage scenarios where files might not
exist, where file paths are incorrect, or where the data within the file is malformed or
inconsistent. For example, if the user inputs an incorrect file path or the file is empty, the
system catches the FileNotFoundError or IOError exceptions and displays a user-friendly
message. Similarly, if the system detects that the contents of the CSV file are not in the
expected format (e.g., missing columns or invalid values), it raises an appropriate error
and prompts the user to correct the issue. This prevents the system from crashing and
ensures that users are alerted to any data-related problems during the file handling
process. The ability to handle errors gracefully ensures that the file handling
implementation is reliable and user-friendly, providing a positive experience even in the
case of unexpected issues.

Data processing and memory optimization with file handling

When dealing with large datasets, such as sales records spanning several months or years,
it is essential to optimize memory usage. The system’s file handling approach uses
efficient methods to read and process data without overwhelming the system’s memory.
For example, rather than loading the entire CSV file into memory at once, the system
processes the file row by row using Python’s csv.reader(). This method minimizes
memory usage by only keeping the current line in memory and discarding it once it is
processed. Additionally, the system can read and process files in chunks, allowing it to
handle large datasets even if the system’s memory is limited. Once the data is processed,
the results can be written back to a new file, ensuring that the memory footprint remains
manageable throughout the execution of the system.

User interaction and file management

The sales analysis system also allows users to interact with the file handling process by
selecting specific files or directories for processing. This interaction can be done through
the command line interface (CLI), where users input file names or browse directories to
select the appropriate files. The file handling implementation includes user prompts for
selecting files and confirming file overwrite actions when generating output reports. This
ensures that the system provides a smooth and intuitive interface for working with files.
For example, if the user needs to select a file containing the sales data of a particular
branch, the system will display available options and allow the user to select the correct
one. This feature streamlines the workflow and minimizes user errors related to file
selection. (On, 2023)

Evidences for ensuring the design and features meet the specifications

Requirement analysis and feature mapping

The first step in ensuring the design aligns with specifications is conducting an in-depth
requirement analysis. The project begins with understanding the exact needs of Sampath
Food City, such as monthly sales analysis, price analysis, weekly sales analysis, product
preference analysis, and total sales distribution. Each of these requirements was
systematically translated into discrete system features. For instance, the monthly sales
analysis feature is designed to allow users to input data from individual branches and
generate comprehensive reports. Similarly, the weekly sales analysis feature aggregates
data across multiple files to identify trends over shorter time frames. These mappings
between requirements and features ensure that the software design and functionality
comprehensively address the stated goals.

The evidence of this alignment is reflected in the modular structure of the code, with
functions dedicated to specific analyses, such as analyze_monthly_sales or
analyze_product_preferences. Each function performs targeted operations that directly
correlate with the desired features, ensuring no critical aspect is overlooked. Additionally,
the use of command-line input prompts ensures that user needs, such as selecting specific
datasets, are met seamlessly.

Validation of feature implementation against specifications

Each feature of the system was validated to ensure it met its intended purpose. For
example, the price analysis feature required the system to calculate average prices,
maximum and minimum prices, and identify price fluctuations over time. Rigorous
testing was performed to ensure the calculations were accurate across various datasets,
including edge cases such as missing or invalid price values. This validation demonstrates
that the system adheres to the specifications and reliably delivers correct results.

To further validate the feature implementation, unit testing was conducted using Python’s
unittest framework. Each critical function was subjected to test cases, ensuring proper
handling of both normal and abnormal inputs. For instance, the product preference
analysis was tested against datasets containing skewed preferences, confirming the
function’s ability to rank products accurately even in such scenarios. This testing
evidence proves that features work as intended under real-world conditions.
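A small example of such a unit test is sketched below. The rank_products helper is a hypothetical stand-in for the system's product preference logic, included only so the test is self-contained; the real function may have a different name and signature.

import unittest

# Hypothetical helper for illustration: ranks products by total quantity sold.
def rank_products(rows):
    totals = {}
    for row in rows:
        totals[row["product"]] = totals.get(row["product"], 0) + int(row["quantity"])
    return sorted(totals, key=totals.get, reverse=True)

class TestProductPreference(unittest.TestCase):
    def test_ranking_with_skewed_preferences(self):
        rows = [
            {"product": "Rice", "quantity": "50"},
            {"product": "Milk", "quantity": "3"},
            {"product": "Rice", "quantity": "40"},
        ]
        self.assertEqual(rank_products(rows), ["Rice", "Milk"])

if __name__ == "__main__":
    unittest.main()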

User-focused design for data interaction

One of the core specifications was to design a system that integrates with the existing
workflow of Sampath Food City’s POS system, specifically by reading CSV files
containing sales data. The design of the file handling module is evidence of this
integration. Users can specify file paths for data input and view detailed output in a user-
friendly CSV format. The system is designed to work with large datasets, supporting both
row-by-row processing and file chunking to accommodate memory constraints. This
design ensures compatibility with the real-world data volumes generated by a
supermarket chain, meeting the scalability specification.

Furthermore, the command-line interface provides clear prompts and error messages to
guide users. For example, when analyzing sales data, the system asks for the file location,
ensures the file exists, and confirms that the data format is correct. These interactions
ensure that users can operate the system without prior technical knowledge, aligning with
the specification of creating a user-friendly solution.

Meeting performance and optimization goals

The system’s architecture and design ensure efficient handling of large datasets, meeting
the performance specifications outlined in the requirements. By leveraging efficient file
handling techniques, such as using Python’s csv.reader to process data row-by-row, the
system avoids memory overflows and performs operations in a time-efficient manner. For
instance, during total sales distribution analysis, the system processes hundreds of rows of
data without performance degradation, even on systems with limited computational
resources. This efficiency directly supports the need for quick, accurate data analysis, as
required by the specifications.

Additionally, algorithm optimization is evident in features like weekly sales analysis,
where the system aggregates data by summing sales values for each week. These
operations use Python’s dictionaries for quick lookups and efficient computation,
ensuring that even large datasets can be analyzed within a reasonable time frame. This
evidence demonstrates the system’s capability to meet both functionality and performance
specifications.
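The dictionary-based aggregation described above can be sketched as follows; the field names week and total_amount are assumed for illustration and mirror the CSV columns used elsewhere in this report.

from collections import defaultdict

def aggregate_weekly_sales(rows):
    """Sum sales amounts per week using a dictionary for O(1) lookups."""
    weekly_totals = defaultdict(float)
    for row in rows:
        weekly_totals[row["week"]] += float(row["total_amount"])
    return dict(weekly_totals)

# Example:
# aggregate_weekly_sales([{"week": "1", "total_amount": "1000"},
#                         {"week": "1", "total_amount": "2000"}])
# -> {"1": 3000.0}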

Alignment with business goals through analytical insights

The specifications emphasize the importance of deriving actionable insights from the data
to support managerial decision-making. This goal is fulfilled through features like product
preference analysis, which identifies top-performing products, and sales distribution
analysis, which highlights revenue contributions from different purchase categories. The
system’s output is not limited to raw data but includes visual insights like rankings and
summaries, which directly contribute to business decision-making.

For example, the output of the product preference analysis includes a ranked list of
products based on sales volume. This ranking enables managers to identify high-demand
items and adjust inventory strategies accordingly. Similarly, the price analysis feature
highlights pricing trends, helping management identify opportunities for price
adjustments. These analytical capabilities demonstrate that the system fulfills the core
business objectives outlined in the specifications.

Comprehensive error handling and robustness

Another specification was to create a system that minimizes human errors and handles
anomalies gracefully. Evidence of this is present in the system’s extensive error handling
mechanisms. For example, the system checks for missing or invalid data during file
reading and provides clear error messages to the user. If a file is not found or is
incorrectly formatted, the system raises an appropriate exception and guides the user to
correct the issue. This robust design ensures the system can operate reliably in real-world
scenarios where data inconsistencies are common.

Iterative development and continuous feedback

The development process itself provides evidence of meeting specifications, as features
were implemented iteratively and tested thoroughly before moving to the next stage. This
approach allowed for continuous feedback from stakeholders, ensuring that the final
system met their expectations. Each module, such as the monthly sales analysis, was
developed, tested, and refined based on real-world use cases. This iterative approach
provides strong evidence that the system’s design and features align with the
specifications. (www.stan.vision, n.d.)

ACTIVITY 4

Examination of automated testing approaches

Unit testing

Unit testing focuses on testing individual units or functions of the code in isolation.
Python’s unittest framework is a powerful tool for this approach. For the sales analysis
system, each function, such as analyze_monthly_sales or analyze_product_preferences,
can be tested independently to ensure it produces correct outputs for various inputs. This
approach helps validate that core functionalities are implemented correctly and handle
normal, boundary, and edge cases effectively. For example, unit tests can confirm that the
calculate_weekly_sales function properly aggregates sales data from a CSV file and
correctly identifies invalid or missing data. Automated unit testing ensures that changes to
one part of the system do not unintentionally break other parts. Continuous execution of
unit tests during development accelerates the debugging process and improves developer
confidence.

Integration testing

Integration testing ensures that different modules of the system work seamlessly together.
In the sales analysis system, the interaction between file handling, data analysis functions,
and the command-line interface must be validated. Automated integration tests involve
creating end-to-end scenarios where the system reads from a sample CSV file, processes
the data, and outputs the expected results. For instance, a test can validate whether the
sales data is correctly parsed from the file and passed to the analyze_sales_distribution
function for accurate computation. By automating integration tests, potential mismatches
between modules can be quickly identified and resolved, especially when the system
scales to handle more features or integrates with external data sources.
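A minimal integration-style test of this kind is sketched below: it writes a temporary CSV, reads it back, and checks the aggregated result. The two helper functions are illustrative assumptions standing in for the system's file reader and distribution analysis.

import csv
import os
import tempfile
import unittest

def read_csv_rows(path):
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def total_by_category(rows):
    totals = {}
    for row in rows:
        totals[row["category"]] = totals.get(row["category"], 0.0) + float(row["total_amount"])
    return totals

class TestFileToAnalysisFlow(unittest.TestCase):
    def test_csv_is_parsed_and_aggregated(self):
        fd, path = tempfile.mkstemp(suffix=".csv")
        try:
            with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f)
                writer.writerow(["category", "total_amount"])
                writer.writerow(["Fruits", "500"])
                writer.writerow(["Fruits", "700"])
            rows = read_csv_rows(path)
            self.assertEqual(total_by_category(rows), {"Fruits": 1200.0})
        finally:
            os.remove(path)

if __name__ == "__main__":
    unittest.main()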

Regression testing

Regression testing ensures that new updates or bug fixes do not introduce unintended
errors into the existing system. In a system as complex as the sales analysis software,
where features like monthly or weekly sales analysis are interdependent, introducing new
features (e.g., year-over-year comparison) can potentially disrupt existing functionalities.
Automated regression tests, using tools like unittest, re-run all previously validated tests
to confirm that no older functionality is impacted by new changes. For example, if a
feature is added to analyze seasonal sales trends, regression tests would verify that
existing monthly sales and product preference analyses still produce accurate results.

Performance testing

Performance testing evaluates how the system performs under different conditions,
including large datasets and high computational loads. In the context of the sales analysis
system, automated performance tests can be implemented to simulate large-scale
operations, such as processing millions of rows of sales data. These tests ensure that the
system handles large datasets efficiently without crashing or slowing down significantly.
Automating these tests allows for continuous performance monitoring during
development and after deployment, ensuring the system remains optimized as
requirements evolve.
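One simple way to automate such a check is to time a computation over a generated dataset, as in the sketch below. The one-million-row figure and the five-second threshold are illustrative assumptions, not measured requirements of the system.

import time
import unittest

def sum_amounts(rows):
    total = 0.0
    for row in rows:
        total += float(row["total_amount"])
    return total

class TestPerformance(unittest.TestCase):
    def test_large_dataset_completes_quickly(self):
        # A generator keeps memory usage low while simulating a very large file.
        rows = ({"total_amount": "10.50"} for _ in range(1_000_000))
        start = time.perf_counter()
        sum_amounts(rows)
        elapsed = time.perf_counter() - start
        self.assertLess(elapsed, 5.0)  # threshold chosen purely for illustration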

Data-driven testing

Data-driven testing automates the process of validating the system's functionality across
multiple datasets. For the sales analysis software, which relies heavily on input data files,
this approach is particularly important. Automated tests can be configured to read
multiple test CSV files containing diverse sales scenarios, such as branches with missing
data, irregular product categories, or highly skewed sales distributions. Each dataset is
processed by the system, and the output is compared against pre-defined expected results.
This ensures that the system is robust and reliable across a wide variety of data
conditions.
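Using unittest, data-driven checks can be expressed with subTest so that each scenario is reported separately, as in the sketch below. The scenarios and the count_valid_rows helper are assumptions used only to keep the example self-contained.

import unittest

def count_valid_rows(rows):
    """Count rows that carry both a product name and a numeric quantity."""
    valid = 0
    for row in rows:
        if row.get("product") and str(row.get("quantity", "")).isdigit():
            valid += 1
    return valid

class TestDataDriven(unittest.TestCase):
    scenarios = {
        "complete_data": ([{"product": "Tea", "quantity": "4"}], 1),
        "missing_product": ([{"product": "", "quantity": "4"}], 0),
        "invalid_quantity": ([{"product": "Tea", "quantity": "four"}], 0),
    }

    def test_scenarios(self):
        for name, (rows, expected) in self.scenarios.items():
            with self.subTest(scenario=name):
                self.assertEqual(count_valid_rows(rows), expected)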

Boundary and edge case testing

Automating boundary and edge case testing ensures the system behaves predictably under
extreme or unconventional conditions. For instance, the sales analysis system should be
tested with datasets containing zero sales, invalid prices, duplicate product entries, or
non-UTF8-encoded files. Automated tests can programmatically generate such datasets,
pass them to the system, and validate the error handling or fallback mechanisms. This
guarantees that the system is prepared for unexpected real-world data anomalies and
avoids crashes or inaccuracies.
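The sketch below shows how two such edge cases (an empty dataset and invalid price values) might be covered. The average_price helper is a hypothetical simplification of the system's price analysis logic.

import unittest

def average_price(rows):
    """Average the 'price' column, ignoring rows with missing or invalid values."""
    prices = []
    for row in rows:
        try:
            prices.append(float(row["price"]))
        except (KeyError, ValueError):
            continue
    return sum(prices) / len(prices) if prices else 0.0

class TestEdgeCases(unittest.TestCase):
    def test_empty_dataset_returns_zero(self):
        self.assertEqual(average_price([]), 0.0)

    def test_invalid_prices_are_skipped(self):
        rows = [{"price": "abc"}, {"price": "100"}]
        self.assertEqual(average_price(rows), 100.0)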

User interface testing

Although the system is command-line-based, the interface’s usability and interaction
logic are critical to its success. Automated tests can simulate user inputs, such as file path
entries and menu navigation, to verify that the interface functions as expected. For
example, tests can ensure that invalid file paths prompt the correct error messages, and
valid inputs lead to successful data processing. By automating this testing approach,
developers can ensure consistent user experiences without manual verification.
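Simulating keyboard input in automated tests can be done by patching the built-in input() function, as sketched below. The ask_for_menu_choice function is a hypothetical prompt used only for this illustration.

import unittest
from unittest.mock import patch

# Hypothetical prompt used only for this illustration.
def ask_for_menu_choice():
    choice = input("Select an option (1-5): ")
    if choice not in {"1", "2", "3", "4", "5"}:
        return "Invalid choice! Please select a valid option."
    return f"Option {choice} selected"

class TestMenuInput(unittest.TestCase):
    @patch("builtins.input", return_value="0")
    def test_invalid_choice_shows_error(self, mock_input):
        self.assertIn("Invalid choice", ask_for_menu_choice())

    @patch("builtins.input", return_value="3")
    def test_valid_choice_is_accepted(self, mock_input):
        self.assertEqual(ask_for_menu_choice(), "Option 3 selected")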

Continuous integration and testing

Automated testing is often integrated into a continuous integration (CI) pipeline, where
tests are automatically executed whenever code changes are pushed to the repository.
This ensures that the system’s quality is constantly monitored, and any issues introduced
during development are identified early. For the sales analysis system, a CI pipeline can
run all unit, integration, and regression tests on every update, ensuring that the codebase
remains stable and that all features are working as intended. This approach significantly
reduces the time and effort required for manual testing.

Security and input validation testing

Security is an essential aspect of any system handling business-critical data. Automated
tests can simulate malicious inputs, such as injecting invalid file paths, executing
unauthorized commands, or processing corrupted CSV files, to verify that the system
handles such cases securely. For instance, an automated test might check if the system
gracefully denies access to sensitive operations without proper user authentication. This
ensures the system’s resilience against potential vulnerabilities.
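A small sketch of this idea is shown below: a wrapper that refuses to crash on an invalid or inaccessible path, together with a test that exercises it. The safe_read helper is an assumption for illustration rather than the system's actual file handler.

import unittest

def safe_read(path):
    """Return file contents, or None with a clear message when the path is invalid."""
    try:
        with open(path, newline="", encoding="utf-8") as f:
            return f.read()
    except OSError:
        print(f"Cannot open '{path}'. Please check the path and permissions.")
        return None

class TestInputValidation(unittest.TestCase):
    def test_missing_file_is_rejected_gracefully(self):
        self.assertIsNone(safe_read("no_such_folder/sales.csv"))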

Test coverage and metrics

Automated testing tools provide detailed test coverage reports, highlighting parts of the
code that are not tested adequately. By analyzing these reports, developers can identify
areas that require additional test cases, such as rarely executed conditions in error
handling. For the sales analysis system, achieving high test coverage ensures that even the
smallest components, like helper functions for parsing data, are thoroughly validated.
This comprehensive approach enhances the system’s reliability and ensures its readiness
for deployment. (Das, 2024)

Exploration of testing strategies and how they are applied to the system

Unit testing strategy

Unit testing is a foundational testing strategy that focuses on validating individual
components or functions of the system in isolation. For the sales analysis system, this
strategy ensures that each module—such as data parsing, monthly sales computation, and
product price analysis—works as intended. Each function is tested with normal,
boundary, and invalid input values to verify its behavior. For example, the
analyze_monthly_sales function is tested with complete datasets, datasets with missing
rows, and datasets containing invalid formats. By isolating individual units, unit testing
allows developers to identify issues at the micro-level and resolve them quickly without
affecting other parts of the system. This strategy is automated using Python’s unittest
framework, which enables systematic testing, seamless integration with continuous
testing pipelines, and generation of detailed reports for further analysis.

Integration testing strategy

The integration testing strategy evaluates the interactions between different modules and
ensures that they work together as intended. The sales analysis system relies on the
seamless integration of file handling, data processing, and reporting functionalities. For
instance, the integration between the file reader module, which extracts data from CSV
files, and the data analysis modules must be validated. Tests are conducted to ensure that
data read from the file is accurately passed into processing functions and that errors, such
as missing columns or malformed data, are correctly propagated and handled. This
strategy identifies mismatches or failures in communication between modules, ensuring
that the system functions cohesively. By automating integration tests, the testing process
becomes repeatable, efficient, and adaptable to changes in system design.

System testing strategy

System testing involves validating the end-to-end functionality of the entire application.
For the sales analysis system, this includes testing workflows such as user login, CSV file
input, sales analysis execution, and results display. The testing strategy ensures that the
system meets its requirements, such as providing accurate monthly sales data and
identifying product preferences. Scenarios are created to simulate real-world use cases,
such as analyzing sales data for a branch over multiple months or processing data from
multiple files simultaneously. Automating system tests provides consistency and ensures
that the application performs reliably across all defined functionalities.

Regression testing strategy

Regression testing ensures that new changes to the system do not inadvertently break
existing functionality. This strategy is particularly important for the sales analysis system,
where multiple features, such as weekly sales analysis and sales distribution computation,
are interconnected. When a new feature, such as yearly sales trend analysis, is added,
regression tests are executed to validate that previous features still function correctly.
Automated regression tests are run using Python’s unittest framework to repeatedly test
the entire codebase against existing test cases. This strategy minimizes the risk of
introducing bugs during iterative development and ensures system stability over time.

Performance testing strategy

Performance testing evaluates how well the system performs under various load
conditions, such as processing large datasets or handling multiple user requests. In the
context of the sales analysis system, performance tests assess how efficiently the system
reads, processes, and analyzes large CSV files. Scenarios include processing files with
millions of rows, analyzing data from multiple branches, and executing multiple
computations concurrently. Performance testing identifies bottlenecks, such as slow file
parsing or inefficient algorithms, and enables developers to optimize system performance.
Automating performance tests ensures consistent monitoring of system behavior under
different conditions and provides actionable insights for further optimization.

Security testing strategy

Security testing validates the system’s ability to protect sensitive data and resist malicious
inputs. For the sales analysis system, this strategy ensures that unauthorized users cannot
access the application or its underlying data. The login functionality, which involves
hardcoded credentials, is tested to verify that invalid credentials are consistently rejected
and that the system does not expose sensitive information. Additionally, security tests are
conducted to validate input handling, ensuring that invalid file paths or maliciously
crafted CSV files do not crash the system or lead to data corruption. Automating these
tests ensures continuous security validation during development and deployment.

Boundary and edge case testing strategy

Boundary and edge case testing ensures the system performs reliably under extreme or
unusual conditions. For the sales analysis system, edge cases include datasets with zero
sales, negative prices, or missing product information. Boundary cases, such as datasets
with exactly one row of data or maximum allowable file sizes, are also tested. This
strategy validates that the system gracefully handles these scenarios by providing
meaningful error messages or fallback mechanisms. Automating edge and boundary tests
ensures systematic validation of all potential scenarios, reducing the likelihood of system
failure in production.

Usability testing strategy

Although the system is command-line-based, usability testing ensures that the interface is
user-friendly and intuitive. The testing strategy involves simulating user interactions, such
as navigating menus, entering file paths, and executing analysis commands. Tests validate
that the system provides clear instructions, handles invalid inputs gracefully, and displays
results in an understandable format. Automated usability tests simulate various user
workflows and validate that the system meets user expectations consistently.

Data-driven testing strategy

Data-driven testing involves testing the system against multiple datasets to ensure
consistent functionality across diverse scenarios. For the sales analysis system, this
strategy uses various test CSV files containing different data configurations, such as
missing columns, irregular sales patterns, or outliers. Automated tests iterate through
these datasets, execute the analysis functions, and validate the results against expected
outputs. This strategy ensures the system's robustness and adaptability to real-world data
variations.

Test coverage analysis strategy

Test coverage analysis ensures that all parts of the code are adequately tested. For the
sales analysis system, this strategy involves generating reports to identify untested
functions, conditions, or error-handling paths. Automated tools integrated with the
unittest framework provide detailed coverage metrics, enabling developers to identify and
address gaps in test coverage. Comprehensive test coverage ensures that all aspects of the
system are validated, reducing the risk of undetected bugs and improving overall
reliability. (Nader, 2023)

Discussion of test coverage for different parts of the system

Core functional modules

The core functional modules, such as monthly sales analysis, price analysis, weekly sales
analysis, and product preference analysis, are the heart of the system. Test coverage for
these modules involves creating unit tests to validate the correctness of each function
under normal, boundary, and erroneous conditions. For example, the monthly sales
analysis module is tested with datasets representing a variety of scenarios, such as
missing data, invalid entries, and large-scale files. Similarly, price analysis functions are
tested to ensure proper handling of edge cases like negative prices or zero values. Test
coverage ensures these core functions consistently produce accurate results and handle
unexpected scenarios gracefully.

File handling and data loading

File handling is a critical part of the system, as it directly impacts how data is ingested
and processed. Comprehensive test coverage ensures the system can handle valid CSV
files while properly identifying and responding to issues such as missing files, incorrect
file paths, or corrupted data. For example, test cases simulate scenarios where files are
missing columns, contain malformed data, or exceed predefined size limits. Coverage
extends to verifying the system’s ability to extract meaningful error messages or fallback
mechanisms when data issues arise. Testing file handling ensures data integrity and
smooth interaction between the system and the underlying datasets.

Data parsing and validation

Once files are loaded, the data parsing and validation module ensures the incoming data
meets predefined standards. Test coverage for this part of the system includes validating
column names, data types, and ranges. For example, tests confirm that invalid product
names or sales values outside an acceptable range trigger appropriate error messages.
Boundary tests check whether minimal datasets (e.g., one row) and maximal datasets
(e.g., millions of rows) are processed efficiently. This ensures the system performs
reliably regardless of input size or format.

Error handling and exception management

Error handling is integral to creating a resilient system that continues functioning even
when faced with unexpected inputs or operational issues. Test coverage for this module
involves simulating various error conditions, such as empty datasets, incorrect user
inputs, or system resource limitations (e.g., low memory). For example, if a user enters an
invalid file path, the system should display a meaningful error message without crashing.
Comprehensive testing ensures all exceptions are caught and handled gracefully,
maintaining system stability.

User login and security

The login module plays a vital role in controlling access to the system. Test coverage
includes scenarios where users input correct, incorrect, or empty credentials. Tests also
validate the system's behavior under potential brute-force attempts by simulating repeated
login failures. This ensures that the login functionality is robust, secure, and resistant to
unauthorized access.

Reporting and output generation

The system’s ability to generate reports, whether as terminal outputs or saved files, is a
key feature. Test coverage for this part includes validating the accuracy of calculations
and ensuring the generated output matches user expectations. For example, tests simulate
scenarios where monthly sales data is summarized for a specific branch and cross-verify
the results against manually calculated benchmarks. Reporting modules are also tested to
ensure formatting remains consistent across different scenarios, such as varying dataset
sizes or missing information.

Integration between modules

Testing the integration between modules ensures seamless communication and
functionality across the system. For example, test cases verify that data parsed and
validated by the file handling module flows correctly into the analysis functions.
Integration tests also check whether results generated by analysis modules are accurately
displayed or stored in output files. Comprehensive test coverage identifies any issues
arising from inter-module dependencies, ensuring a smooth workflow throughout the
system.

Performance and scalability

While not directly related to functional correctness, test coverage also extends to
performance testing. Tests measure the system's ability to handle large datasets, ensuring
that file handling, parsing, and analysis modules perform efficiently without significant
delays. For example, stress tests simulate datasets with millions of rows to evaluate
memory usage and computation time. Coverage in this area ensures the system can scale
as the company’s data grows.

Edge and boundary cases

Edge and boundary testing ensures the system remains reliable under extreme conditions.
Test coverage includes scenarios such as datasets with minimal or no data, files with
excessive rows, and unusually formatted inputs. For example, tests validate how the
system responds when a CSV file contains only headers without data rows or when sales
values significantly exceed typical ranges. These tests confirm that the system remains
stable and provides meaningful feedback in such cases.

Code paths and conditional logic

The sales analysis system contains numerous conditional statements to handle different
workflows, such as branching logic for sales analysis or error reporting. Test coverage
ensures all possible paths within these conditions are executed during testing. For
example, if the system includes logic to detect whether a dataset belongs to a specific
branch, tests validate both the true and false branches of that logic. This ensures no
hidden bugs exist in rarely executed code paths. (Morgan, 2023)

Introduction to test plan

A test plan is a comprehensive document that outlines the strategy, objectives, resources,
scope, and schedule for testing activities in a software development project. It serves as a
guiding framework, providing clarity and direction to the testing process and ensuring
that all aspects of the software are thoroughly evaluated. A test plan is critical for aligning
the testing efforts with the overall project goals and business requirements, fostering a
systematic and efficient approach to quality assurance. By defining the testing scope,
priorities, and success criteria, the test plan helps mitigate risks and ensures the delivery
of a high-quality product.

The test plan outlines essential elements, such as the features to be tested, the testing
techniques to be used, and the tools or resources required for execution. It also identifies
the roles and responsibilities of team members, the testing environment setup, and the
schedule for testing activities, ensuring that timelines are met without compromising
quality. Furthermore, the test plan accounts for edge cases, dependencies, and potential
risks, outlining mitigation strategies to address unforeseen challenges. By documenting
these details, the test plan provides transparency and fosters collaboration among
stakeholders, including developers, testers, project managers, and clients, ensuring that
expectations are aligned and deliverables are met.

One of the key benefits of a test plan is its ability to serve as a reference throughout the
development lifecycle. It provides a clear roadmap for testing, enabling teams to track
progress and make informed decisions. Additionally, the test plan facilitates effective
communication, ensuring that all stakeholders have a shared understanding of the testing
goals and priorities. In complex projects involving multiple teams or geographically
distributed teams, the test plan acts as a unifying document, promoting consistency and
reducing the likelihood of oversights or misunderstandings. Ultimately, the test plan is an
indispensable tool for ensuring that the software meets the highest standards of quality,
reliability, and performance. (FineReport, 2020)

Test plan

Purpose

The purpose of the test plan is to ensure the sales analysis system for Sampath Food City
meets its functional and non-functional requirements. This system automates sales data
analysis tasks like branch-wise monthly sales, product price analysis, weekly trends,
product preferences, and sales distribution, thereby addressing the inefficiencies of the
manual process. The test plan ensures that the program provides accurate and reliable
results while handling diverse datasets and user interactions through a command-line
interface. Additionally, it validates the robustness, data integrity, and error-handling
capabilities of the system, ensuring that potential edge cases, incorrect inputs, and large
datasets are managed efficiently.

Scope

The scope of this test plan encompasses all functional and non-functional aspects of the
sales analysis system. Functional testing focuses on verifying the key modules, such as
user authentication, data loading from CSV files, and all five types of data analysis
operations. Non-functional testing includes performance evaluation on large datasets,
error handling for invalid or missing data, and usability validation of the command-line
interface. The scope also ensures that edge cases, such as empty datasets, incorrect CSV
formats, and invalid login attempts, are handled gracefully. Ultimately, the test plan
guarantees that the system aligns with the specifications and provides accurate results for
business decision-making.

Objectives

 Ensure that the program successfully logs in with the hardcoded username and
password.
 Verify that incorrect login attempts fail and display the appropriate error message.
 Validate that the program reads a correctly formatted CSV file and loads the data
into memory.
 Check if the program handles a missing or incorrect file path gracefully with an
error message.
 Ensure the system correctly calculates the total sales for each branch from the
dataset.
 Verify that the program calculates and displays the average price for each product
accurately.
 Confirm that the weekly sales totals are calculated correctly based on the dataset.
 Validate that the system identifies the top five most popular products based on
quantity sold.
 Ensure the program accurately calculates and displays the sales distribution by
category.
 Confirm that the program performs well and provides accurate results when
processing large datasets. (Brooks, 2023)

Introduction to test case

A test case is a fundamental unit within the software testing process that outlines a
specific set of conditions, inputs, and expected outputs used to validate the functionality,
behavior, and performance of a software application. Test cases are designed to ensure
that every aspect of the application, from basic features to complex interactions, operates
as intended. A well-defined test case acts as a blueprint for testers, specifying the steps to
execute, the expected results, and any prerequisites or assumptions necessary for testing.
Each test case is tailored to address a particular functionality or scenario, whether it
involves testing core business logic, validating user interactions, or assessing edge cases
and potential failures. By documenting test cases, organizations can maintain a consistent,
structured approach to testing, ensuring thorough coverage of the application’s features.

In software systems, test cases play a crucial role in detecting bugs, verifying feature
implementations, and ensuring compliance with requirements. They help identify
discrepancies between expected and actual behaviors, providing valuable feedback to
developers for refinement. Test cases can be automated or manual, with automation
offering speed, repeatability, and efficiency, especially for regression testing or repetitive
tasks. Properly constructed test cases not only streamline testing but also contribute to the
maintainability of the application by highlighting areas that require attention or
improvement. (www.lambdatest.com, n.d.)

Test cases for the application

Test case 01

Test case ID: TC1
Test case: Login with correct credentials
Pre-condition: Need the application
Test steps: Run the login() function and input the username "admin" and password "password123". Verify that the system compares the input credentials against the hardcoded ones. Confirm that the system prints "Login successful!" and returns True.
Test data: Username: "admin", Password: "password123"
Expected result: The system should print "Login successful!" and return True.
Post-condition: The user is logged in, and the system proceeds to the next step.
Actual result: The system correctly validated the credentials and logged the user in.
Status: Success
Test case 02

Test case ID: TC2
Test case: Login with incorrect credentials
Pre-condition: Need the application
Test steps: Run the login() function and input the username "user" and password "pass123". Verify that the system compares the input credentials against the hardcoded ones. Confirm that the system prints "Login failed!" and returns False.
Test data: Username: "user", Password: "pass123"
Expected result: The system should print "Login failed!" and return False.
Post-condition: The user is not logged in, and the system asks for the credentials again.
Actual result: The system correctly rejected the login attempt and displayed "Login failed!".
Status: Success
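Manual test cases such as TC1 and TC2 can also be automated. The sketch below assumes a simple login() that reads credentials via input() and compares them with hardcoded values; the exact implementation in the system may differ, so this stand-in is included only to make the example self-contained.

import unittest
from unittest.mock import patch

# Hypothetical stand-in for the system's login() function.
def login():
    username = input("Username: ")
    password = input("Password: ")
    if username == "admin" and password == "password123":
        print("Login successful!")
        return True
    print("Login failed!")
    return False

class TestLogin(unittest.TestCase):
    @patch("builtins.input", side_effect=["admin", "password123"])
    def test_correct_credentials(self, mock_input):
        self.assertTrue(login())   # mirrors TC1

    @patch("builtins.input", side_effect=["user", "pass123"])
    def test_incorrect_credentials(self, mock_input):
        self.assertFalse(login())  # mirrors TC2

if __name__ == "__main__":
    unittest.main()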
Test case 03

Test case ID: TC3
Test case: CSV file exists
Pre-condition: Need the application
Test steps: Run the test for checking the existence of the CSV file. Ensure that the file path provided in CSV_FILE_PATH is correct. Verify that the system confirms the file's existence by returning True.
Test data: CSV file path: "sales_data.csv"
Expected result: The system should return True if the file exists.
Post-condition: The CSV file is confirmed to exist in the specified location.
Actual result: The file "sales_data.csv" exists in the specified location.
Status: Success
Test case 04

Test case ID: TC4
Test case: Correctly reading data from CSV
Pre-condition: Need the application
Test steps: Run the read_csv() function to read the data from the CSV file. Verify that the function reads the data without errors. Ensure that the data is returned and not empty.
Test data: A CSV file containing product sales data
Expected result: The system should return a list containing the sales data from the CSV file.
Post-condition: The sales data is correctly read and available for analysis.
Actual result: The system successfully read the CSV data and returned a non-empty list.
Status: Success

Test case 05

Test case ID: TC5
Test case: Monthly sales analysis
Pre-condition: Need the application
Test steps: Prepare a set of sales data for multiple branches. Run the monthly_sales_analysis() function with the provided data. Verify that the total sales per branch are calculated and displayed correctly.
Test data: {"branch": "Colombo", "total_amount": "500.00"}, {"branch": "Colombo", "total_amount": "300.00"}
Expected result: The total sales for "Colombo" should be 800.00 LKR.
Post-condition: The sales data is correctly aggregated by branch for monthly analysis.
Actual result: The system correctly aggregated the sales, showing total sales for "Colombo" as 800.00 LKR.
Status: Success
Test case 06

Test case ID: TC6
Test case: Price analysis of each product
Pre-condition: Need the application
Test steps: Prepare a set of sales data with product prices. Run the price_analysis() function with the provided data. Verify that the average price of each product is calculated and displayed correctly.
Test data: {"product": "Apple", "price": "100"}, {"product": "Apple", "price": "120"}
Expected result: The average price for "Apple" should be 110.00 LKR.
Post-condition: The average price for each product is calculated and available for analysis.
Actual result: The system correctly calculated the average price for "Apple" as 110.00 LKR.
Status: Success
Test case 07

Test case ID: TC7
Test case: Weekly sales analysis
Pre-condition: Need the application
Test steps: Prepare sales data for multiple weeks. Run the weekly_sales_analysis() function with the provided data. Verify that the total sales per week are calculated and displayed correctly.
Test data: {"week": "1", "total_amount": "1000"}, {"week": "1", "total_amount": "2000"}
Expected result: The total sales for week "1" should be 3000.00 LKR.
Post-condition: The sales data is correctly aggregated by week for weekly analysis.
Actual result: The system correctly aggregated the sales, showing total sales for week "1" as 3000.00 LKR.
Status: Success
Test case 08

Test case ID: TC8
Test case: Product preference analysis
Pre-condition: Need the application
Test steps: Prepare sales data for multiple products. Run the product_preference_analysis() function with the provided data. Verify that the product quantities are summed and displayed correctly.
Test data: {"product": "Orange", "quantity": "3"}, {"product": "Orange", "quantity": "2"}
Expected result: The total sales quantity for "Orange" should be 5.
Post-condition: The total sales quantity for each product is calculated and available for analysis.
Actual result: The system correctly summed the quantities for "Orange" as 5.
Status: Success
Test case 09

Test case ID: TC9
Test case: Sales distribution analysis
Pre-condition: Need the application
Test steps: Prepare sales data with different categories. Run the sales_distribution_analysis() function with the provided data. Verify that the total sales per category are calculated and displayed correctly.
Test data: {"category": "Fruits", "total_amount": "500"}, {"category": "Fruits", "total_amount": "700"}
Expected result: The total sales for "Fruits" should be 1200.00 LKR.
Post-condition: The sales data is correctly aggregated by category for distribution analysis.
Actual result: The system correctly aggregated the sales, showing total sales for "Fruits" as 1200.00 LKR.
Status: Success
Test case 10

Test case ID: TC10
Test case: Invalid menu choice
Pre-condition: Need the application
Test steps: Run the main_menu() function with invalid menu choices. Enter an invalid option (e.g., 0). Verify that the system outputs an error message and prompts the user to enter a valid option.
Test data: Invalid menu choice: 0
Expected result: The system should display "Invalid choice! Please select a valid option."
Post-condition: The system continues to prompt for valid input until a correct choice is made.
Actual result: The system correctly displayed the error message for an invalid choice.
Status: Success

(Safa Emhemed, 2023)

REFERENCES

Bhuyan, A.P. (2024). Combining Object-Oriented and Functional Programming in Large Projects. [online] DEV Community. Available at: https://dev.to/adityabhuyan/combining-object-oriented-and-functional-programming-in-large-projects-6m2 [Accessed 6 Jun. 2025].

This vs. That. (2023). Aggregation vs. Association - What’s the Difference? [online] Available at: https://thisvsthat.io/aggregation-vs-association [Accessed 6 Jun. 2025].

Joseph, F. (2024). SOLID principles for JavaScript. [online] LogRocket Blog. Available at: https://blog.logrocket.com/solid-principles-javascript [Accessed 6 Jun. 2025].

Egbajie, S. (2023). Best Practices for all developers. [online] DEV Community. Available at: https://dev.to/codepapi/best-practices-for-all-developers-1ak0 [Accessed 6 Jun. 2025].

Andersen, G. (2024). Full Stack Development: How to Effectively Handle Large Datasets. [online] Moldstud.com. Available at: https://moldstud.com/articles/p-full-stack-development-how-to-effectively-handle-large-datasets [Accessed 6 Jun. 2025].

Vercel.app. (2025). Large Data Processing In Java | Restackio. [online] Available at: https://store-restack.vercel.app/p/java-problem-solving-methodologies-answer-large-data-processing [Accessed 6 Jun. 2025].

DigitalGadgetWave.com. (2023). Understanding SRP: What is Single Responsibility Principle and How It Improves Code Quality. [online] Available at: https://digitalgadgetwave.com/understanding-srp-what-is-single-responsibility [Accessed 6 Jun. 2025].

Amr Saafan (2023). Clean Code In C#: A Guide To Writing Elegant .NET Applications. [online] Nile Bits. Available at: https://www.nilebits.com/blog/2023/10/clean-code-in-c-a-guide-to-writing-elegant-net-applications [Accessed 6 Jun. 2025].

Ramuglia, G. (2023). Singleton Class in Java: What It Is and How to Use It. [online] Linux Dedicated Server Blog. Available at: https://ioflood.com/blog/what-is-singleton-class-in-java [Accessed 6 Jun. 2025].

Kumar, A. (2023). What is Sales Analysis? 4 Elements, Process, Principles, Problems. [online] Getuplearn. Available at: https://getuplearn.com/blog/sales-analysis [Accessed 6 Jun. 2025].

Haque, M. (2023). Functional Testing vs. Unit Testing: Unraveling the Key Differences. [online] Bitbytesoft.com. Available at: https://bitbytesoft.com/functional-testing-vs-unit-testing-difference [Accessed 6 Jun. 2025].

www.netguru.com. (n.d.). Python Testing Essentials: A Comprehensive Guide. [online] Available at: https://www.netguru.com/blog/python-testing-frameworks [Accessed 6 Jun. 2025].

Kitakabee (2022). Unit Testing: A Detailed Guide. [online] BrowserStack. Available at: https://www.browserstack.com/guide/unit-testing-a-detailed-guide [Accessed 6 Jun. 2025].

On, S. (2023). What are The Most Popular Data Structures in Python? [online] Squash. Available at: https://www.squash.io/most-commonly-used-python-data-structure-an-analysis [Accessed 6 Jun. 2025].

On, S. (2023). How to Work with CSV Files in Python: An Advanced Guide. [online] Squash. Available at: https://www.squash.io/processing-csv-files-in-python [Accessed 6 Jun. 2025].

www.stan.vision. (n.d.). What Is a Product Design Specification? And How to Write It. [online] StanVision Agency. Available at: https://www.stan.vision/journal/what-is-a-product-design-specification-and-how-to-write-it [Accessed 6 Jun. 2025].

Das, A. (2024). Unit Testing in Frontend: Tools and Best Practices. [online] PixelFreeStudio Blog. Available at: https://blog.pixelfreestudio.com/unit-testing-in-frontend-tools-and-best-practices [Accessed 6 Jun. 2025].

Nader, K. (2023). Different Types of Software Testing. [online] AppMaster. Available at: https://appmaster.io/blog/types-software-testing [Accessed 6 Jun. 2025].

Morgan, L. (2023). What Does Client Management Systems Such As ERP, SRM Mean? [online] Available at: https://www.lemmymorgan.com/important-client-management-systems [Accessed 6 Jun. 2025].

FineReport. (2020). A Beginner’s Guide to Sales Analysis. [online] Available at: https://www.finereport.com/en/data-analysis/sales-analysis.html [Accessed 6 Jun. 2025].

Brooks, H. (2023). QA Roadmap: Test Plan vs. Test Strategy. [online] testRigor. Available at: https://testrigor.com/blog/test-plan-vs-test-strategy [Accessed 6 Jun. 2025].

www.lambdatest.com. (n.d.). How To Write Test Cases - A Complete Guide With Examples And Best Practices. [online] Available at: https://www.lambdatest.com/learning-hub/test-case [Accessed 6 Jun. 2025].

Safa Emhemed (2023). Test Case VS Test Suite. [online] Software Testing Journey. Available at: https://testing-journey.hashnode.dev/test-case-vs-test-suite [Accessed 6 Jun. 2025].

