Report On Ectopic Beat Classification Using ML

The project report titled 'Ectopic Beat Classification Using Machine Learning' details a system developed to classify ectopic beats in ECG signals using machine learning techniques, specifically a Random Forest classifier. Utilizing the MIT-BIH Arrhythmia Database, the project achieved a classification accuracy of 99.15% and includes a web-based interface for real-time analysis. The report outlines the significance of automated ECG analysis in improving cardiac diagnostics and addresses the limitations of traditional manual interpretation methods.

Uploaded by

pavan Janagama

A Real Time Project Report on

“Ectopic Beat Classification Using Machine Learning”

Submitted in partial fulfilment of the requirement for the award of
Real Time Project in

ELECTRONICS AND COMMUNICATION ENGINEERING
(Communication and Signal Processing)

Submitted by
L. HRUSHIKESH REDDY - 23011A0404
A. HIMAJA REDDY - 23011A0419
CH. RISHIKA PRIYA - 23011A0434
JANAGAMA PAVAN - 23011A0458
ABDUL MANAN KHAN - 23011A0465

Under the esteemed guidance of
Dr. M. RAJESH
Assistant Professor (c)

Jawaharlal Nehru Technological University Hyderabad University College of Engineering Hyderabad (Autonomous)
Kukatpally, Hyderabad - 500085, Telangana
Academic Year 2024-25
JNTUH UNIVERSITY COLLEGE OF ENGINEERING, SCIENCE
AND TECHNOLOGY HYDERABAD 2024-2025

DECLARATION OF CANDIDATES

We, the undersigned, declare that the project report entitled “Ectopic Beat Classification Using Machine Learning”, carried out and submitted in partial fulfilment of the requirements for the award of the Real Time Project at JNTUH College of Engineering, Science and Technology Hyderabad, is an authentic work carried out under the guidance of Dr. M. RAJESH, Assistant Professor (c), ECE, JNTUHUCESTH, and has not been submitted to any other university.

L. HRUSHIKESH REDDY - 23011A0404
A. HIMAJA REDDY - 23011A0419
CH. RISHIKA PRIYA - 23011A0434
JANAGAMA PAVAN - 23011A0458
ABDUL MANAN KHAN - 23011A0465
JNTUH UNIVERSITY COLLEGE OF ENGINEERING,
SCIENCE AND TECHNOLOGY, HYDERABAD
2024-2025

This is to certify that the project entitled “Ectopic Beat Classification Using Machine Learning” is a bonafide record of a project carried out by

L. HRUSHIKESH REDDY - 23011A0404
A. HIMAJA REDDY - 23011A0419
CH. RISHIKA PRIYA - 23011A0434
JANAGAMA PAVAN - 23011A0458
ABDUL MANAN KHAN - 23011A0465

in the Department of Electronics and Communication Engineering, Jawaharlal Nehru Technological University College of Engineering, Science and Technology Hyderabad, submitted in partial fulfilment of the requirement for the award of Real Time Project.

PROJECT GUIDE
Dr. M. RAJESH
Assistant Professor (c),
Department of ECE,
JNTUHUCESTH

HEAD OF THE DEPARTMENT
Dr. T. MADHAVI KUMARI
Associate Professor & Head
Department of ECE,
JNTUHUCESTH
ACKNOWLEDGEMENT

We wish to express our sincere thanks to our project guide, Dr. M. RAJESH, Assistant Professor (c), Dept. of ECE, for his continuous guidance and suggestions in carrying out the project work in the college.

We are thankful to Dr. T. Madhavi Kumari, Professor and Head of the Department, Department of ECE, JNTUHUCESTH, for supporting us in carrying out the project work in the college.

We write this acknowledgement with great honor, pride, and pleasure to pay our respects to the people who enabled us, directly or indirectly, to accomplish this project.

We are immensely thankful to each and every faculty member of the ECE Department for their relentless contribution towards the successful completion of our course.

L. HRUSHIKESH REDDY - 23011A0404
A. HIMAJA REDDY - 23011A0419
CH. RISHIKA PRIYA - 23011A0434
JANAGAMA PAVAN - 23011A0458
ABDUL MANAN KHAN - 23011A0465
TABLE OF CONTENTS

ABSTRACT

CHAPTER 1: INTRODUCTION
1.1 Types of Ectopic Beats
1.2 Objectives of the Project
1.3 Conclusion

CHAPTER 2: LITERATURE REVIEW
2.1 Motivation for the Study
2.2 Review of Existing Approaches
2.3 Justification for This Project
2.4 Summary of Existing Studies
2.5 Research Gaps
2.6 Contributions of the Present Work

CHAPTER 3: ECG SIGNAL PREPROCESSING & ML BASED BEAT CLASSIFICATION
3.1 Introduction
3.2 System Model
3.2.1 Data Collection
3.2.2 Preprocessing
3.2.3 Feature Extraction
3.2.4 ML Training
3.2.5 Web Page Development
3.3 Dataset
3.4 Conclusion

CHAPTER 4: RESULTS & ANALYSIS
4.1 ML Simulation Results
4.2 Web Page Simulation Results

CHAPTER 5: CONCLUSION & FUTURE SCOPE
5.1 Conclusion
5.1.1 Advantages
5.1.2 Disadvantages
5.1.3 Applications
5.2 Future Scope

REFERENCES

APPENDICES
List of Figures

| Figure no | Caption |
|---|---|
| 3.1 | Block diagram for ectopic beat classification using machine learning |
| 3.4.1 | Internal workflow of Random Forest Model for ECG Feature-based classification |
| 3.5.1 | Webpage frontend deployment |
| 3.5.2 | ECG Data Processing and Classification Flow |
| 4.1.1 | Confusion matrix |
| 4.2.1 | Frontend of webpage |
| 4.2.2 | Output after uploading the ECG features |
| Figure (a) | Data frame, which was given to ML model for training & testing |

List of Tables

| Table no | Title |
|---|---|
| 1.1 | Types of Ectopic Beats |
| 2.3 | Summary of Previous Studies |
| 3.3 | Feature Extraction |
| 3.4.1 | Total Number of Beats in the Dataset |
| 3.4.2 | Train set |
| 3.4.3 | Test set |
| 3.6 | The features extracted |
| 3.7 | Summary |
| 4.1 | Evaluation Metrics |
ABSTRACT
Cardiovascular diseases are a leading cause of global mortality, and arrhythmias, particularly ectopic beats (premature heartbeats originating from abnormal locations in the heart), are critical indicators of cardiac health. Early and accurate detection of arrhythmias is essential for timely diagnosis and treatment. This project presents a machine learning-based system for classification of ectopic beats using ECG signals.
The MIT-BIH Arrhythmia Database, developed by Beth Israel Hospital in collaboration with MIT and hosted by PhysioNet, was used as the primary data source for this project. It contains 48 half-hour ECG recordings from 47 different patients, representing a diverse range of arrhythmias and normal rhythms, which makes it one of the most widely used benchmark datasets for evaluating arrhythmia detection algorithms. After downloading the dataset, preprocessing techniques such as bandpass filtering (to remove baseline drift and noise) and heartbeat segmentation (based on R-peak detection) were applied. From each segmented beat, 19 features spanning time-domain, morphological, slope-based, and heart rate variability (HRV) metrics were extracted to form the input for machine learning classification.
The extracted features were used to train a Random Forest classifier, chosen because it performed more robustly on noisy ECG data than other models such as SVM or KNN. The model achieved a classification accuracy of 99.15% and an F1 score of 94% (the F1 score being the more informative metric for an imbalanced dataset such as ECG data) in distinguishing between five beat classes based on the AAMI EC57 standard: Non-ectopic (N), Supraventricular ectopic (S), Ventricular ectopic (V), Fusion (F), and Unknown (Q).
To make the system accessible and interactive, a web-based interface was developed that allows users to upload ECG features and receive beat classification results. The project demonstrates the potential of machine learning in automating cardiac diagnostics, thereby supporting clinical decision-making and improving patient care through real-time monitoring.
CHAPTER 1
INTRODUCTION

The human heart functions by maintaining a rhythmic electrical activity that controls its beating. Any deviation from this normal rhythm is termed an arrhythmia, which may present as an irregular, abnormally fast, or slow heartbeat. While some arrhythmias are benign, ectopic beats, a specific type of arrhythmia, can be life-threatening and may lead to strokes, heart failure, or sudden cardiac arrest. Ectopic beats do not originate from the sinoatrial (SA) node, but rather from other sites within the heart. With the global rise in cardiovascular diseases, the early and accurate detection of arrhythmias has become a major priority in healthcare.

Traditionally, arrhythmias are diagnosed using electrocardiogram (ECG) recordings that monitor the heart’s electrical activity. These recordings are manually interpreted by cardiologists. However, manual interpretation is time-consuming, prone to human error, and requires specialized expertise, particularly for long-duration or continuous ECG monitoring.

To address these challenges, automated ECG analysis systems have emerged. These systems integrate signal processing techniques with machine learning (ML) algorithms to detect and classify arrhythmic patterns. ML models are capable of recognizing subtle patterns in ECG waveforms that may go unnoticed by human observers, thereby reducing diagnostic time and supporting clinical decision-making.

This project aims to develop such a system using the MIT-BIH Arrhythmia Database, a widely recognized benchmark dataset in this field. We employ a Random Forest classifier to classify each heartbeat into one of five categories, based on the AAMI EC57 classification standard.
The AAMI EC57 standard, developed by the Association for the Advancement of Medical Instrumentation (AAMI), provides guidelines for the evaluation and categorization of arrhythmia detection algorithms. It defines standardized heartbeat classes and performance metrics to ensure consistency, safety, and comparability across different ECG analysis systems. By adhering to this standard, the classification results become more clinically relevant and interpretable by healthcare professionals.

1.1 Types of Ectopic Beats

| Ectopic Beat Type | Feature-Based Classification Characteristics | Origin |
|---|---|---|
| N - Normal Beat / Non-ectopic beat | PR interval is consistent. QRS duration is narrow and uniform. RR intervals are regular and stable. Heart rate is steady and within normal limits. P, Q, R, S, T amplitudes follow typical morphology. Slopes (PR, QR, RS, ST) are smooth and predictable. Low variability in RR intervals (low SDNN, RMSSD). | SA node |
| S - Supraventricular Ectopic Beat (SVEB) | PR interval may be shortened or inconsistent. P wave may be absent, inverted, or merged with T wave. QRS duration remains narrow (not widened like VEB). RR interval is irregular, often with premature occurrence. Pre-RR and Post-RR intervals are asymmetric. PR slope may be steeper or erratic. Increased variability in RR intervals. | Atria or AV node |
| V - Ventricular Ectopic Beat (VEB) | QRS duration is noticeably widened and distorted. P wave is typically absent before the ectopic beat. R and S amplitudes are exaggerated or abnormal in shape. RR interval shows a compensatory pause after the ectopic beat. RS slope is steep or irregular. ST segment may be displaced. High variability in RR intervals and beat morphology. | Ventricles |
| F - Fusion Beat | QRS complex appears as a hybrid between normal and VEB. R amplitude and shape are intermediate or blended. PR interval may be distorted or partially present. QR and RS slopes show mixed characteristics. T wave may be altered or ambiguous. RR interval is slightly irregular but not as erratic as SVEB or VEB. | Both supraventricular & ventricular |
| Q - Unknown Beat | Features do not clearly match any known beat type. Multiple waveform components are distorted or missing. QT interval may be abnormal without a clear pattern. T amplitude and ST slope are often atypical. RR intervals may be highly irregular or inconsistent. High variability in all features; used when classification is uncertain. | Indeterminate |

Table 1.1: Types of Ectopic Beats

1.2 Objectives of the Project

The key objectives of this project are:

1. To utilize the MIT-BIH Arrhythmia Database for training and evaluation, addressing class imbalance and inter-patient variability in ECG data.

2. To preprocess raw ECG signals by applying noise filtering, heartbeat segmentation, and annotation of key points (P, Q, R, S, T).

3. To extract relevant features such as time-domain metrics, morphological and slope-based attributes, and heart rate variability (HRV) indicators.

4. To train and validate a Random Forest classifier for accurate multi-class heartbeat classification based on AAMI EC57 standards.

5. To develop a web-based interface for real-time ECG analysis, allowing users to upload data, perform automated classification, and view results.

1.3 Conclusion
This chapter introduced the significance of detecting serious cardiac conditions. It highlighted the limitations of manual ECG interpretation and the need for automated, accurate solutions. The proposed system utilizes the MIT-BIH Arrhythmia Database, applies preprocessing and feature extraction techniques, and employs a Random Forest classifier to categorize ECG beats into five AAMI-defined classes. The integration of a web-based interface further enhances usability, making the system suitable for practical deployment in clinical or remote settings.
CHAPTER 2

LITERATURE REVIEW
Automated ECG analysis has been a growing area of research, driven by the need for efficient and accurate cardiac diagnostics. Numerous studies have explored the use of machine learning for arrhythmia detection, with a particular focus on classifying abnormal beats such as ectopic beats. More recent work has explored deep learning models, particularly Convolutional Neural Networks (CNNs), which can learn hierarchical features directly from raw ECG waveforms. While CNNs and LSTMs show superior accuracy, there is limited work on hybrid systems that combine interpretability with accuracy for ectopic beat detection in real-time scenarios.

This project builds on these foundations by implementing a feature-based classification pipeline tailored for ectopic beat detection. It aims to strike a balance between performance and practicality, offering a solution that is both accurate and computationally efficient.

2.1. Motivation for the Study

The following surveys and studies were conducted to justify the relevance and necessity of this project:

1. Increasing Global Burden of Heart Disease
According to the World Health Organization (WHO), cardiovascular diseases (CVDs) are the leading cause of death globally, accounting for ~17.9 million deaths annually. Among CVDs, arrhythmias are often underdiagnosed, especially in low-resource settings [1].

2. Rise in Wearable ECG Devices
The popularity of devices like Apple Watch, Fitbit, and Holter monitors has increased interest in automated ECG analysis tools. These devices generate large volumes of ECG data, which require intelligent algorithms for real-time interpretation.

3. Availability of Standardized Datasets
The MIT-BIH Arrhythmia Database from PhysioNet is a globally recognized dataset that provides reliable annotations and is used as a benchmark in many research studies. This availability enables reproducible and comparative research [2].

4. Success of ML in Healthcare
Recent studies and publications have shown that machine learning models like Random Forests, CNNs, and LSTMs outperform traditional rule-based ECG classifiers. This provides a strong technical foundation and validation for pursuing an ML-based solution [3].

2.2 Review of Existing Approaches

Rule-Based Systems:
Initial arrhythmia detection methods were predominantly rule-based systems,
relying on handcrafted algorithms that analyzed ECG waveforms based on
amplitude thresholds, time intervals, and signal morphology. Although these
systems offered interpretable logic, they lacked adaptability across patient
profiles and were highly susceptible to noise and signal variability.
Moody & Mark (2001) [4] highlighted how the early use of the MIT-BIH
dataset in rule-based classifiers provided the foundation for standardized
evaluation but exposed limitations in handling class imbalance and beat
variability.

Traditional Machine Learning Approaches:

With the availability of labeled datasets and feature extraction techniques, traditional machine learning models such as Support Vector Machines (SVM), Decision Trees, and k-Nearest Neighbors (k-NN) gained popularity.
Wasimuddin et al. (2020) [8] performed a comprehensive study of such approaches using time-domain, HRV, and morphological features. Their results indicated that while models like SVM performed well (~95% accuracy), they were sensitive to noise and often required intensive feature engineering.
Mahmood et al. (2021) [12] introduced a hierarchical method using RR intervals and random projections, achieving ~96% accuracy. However, their model struggled with classifying rare arrhythmias and required dimensionality reduction techniques for performance optimization.

Deep Learning Techniques:

Deep learning has emerged as a dominant force in ECG classification due to its ability to learn features automatically from raw signals. Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks are among the most successful architectures.
Gour et al. (2024) [11] reviewed ECG-based heart disease classification techniques and concluded that CNNs achieved up to 97-98% accuracy without manual feature extraction. Despite their superior performance, these models are often criticized for being computationally intensive and lacking interpretability, which limits their adoption in clinical settings.

Hybrid Models:

To address the trade-off between accuracy and interpretability, some studies explored hybrid methods that combine deep learning with traditional features or incorporate ensemble learning with feature engineering. Although not widespread, these models show promise in balancing performance with transparency. For instance, ensemble methods like Random Forests have been favored for their interpretability, lower computational cost, and robustness to noise.
The current project builds on this approach by using a Random Forest classifier trained on 19 handcrafted features derived from ECG beats, covering time-domain, morphological, slope-based, and HRV characteristics.

2.3 Justification for This Project

The goal of this project is to develop an ECG beat classification system that is interpretable, for clinical use and validation, and deployable, via a user-friendly web interface.
Random Forests offer a strong balance of accuracy, speed, and transparency, especially in cases involving noisy biomedical signals. Compared to deep neural networks, they are less prone to overfitting and can perform well even with smaller datasets like MIT-BIH.
2.4 Summary of Existing Studies

| Author(s) | Methodology | Dataset | Features Used | Accuracy | Limitations |
|---|---|---|---|---|---|
| Moody & Mark (2001) [4] | Rule-based | MIT-BIH | Intervals, Morphology | ~90% | Poor generalization |
| Wasimuddin et al. (2020) [5] | SVM, k-NN | MIT-BIH | Time-domain, HRV | ~95% | Sensitive to noise |
| Mahmood et al. (2021) [10] | Hierarchical ML + RR Intervals | MIT-BIH | RR + Projections | ~96% | Lacked interpretability |
| Gour et al. (2024) [11] | CNN (Deep Learning) | MIT-BIH | Raw ECG signal | 97-98% | Black-box behavior |
| This Project | Random Forest | MIT-BIH | 19 ECG-derived features | 99.15% | Limited to 1 dataset |

Table 2.3: Summary of Previous Studies

2.5 Research Gaps

Despite extensive research and significant advancements in the field of ECG arrhythmia classification, several key challenges and limitations persist:
1. Limited Interpretability: Many deep learning-based models function as "black boxes," offering limited insight into decision-making processes. This lack of transparency makes them difficult to validate and adopt in clinical environments.
2. Underperformance on Minority Classes: Classification accuracy for underrepresented arrhythmia classes, such as Fusion (F) and Supraventricular (S) beats, remains low due to dataset imbalance, leading to biased or incomplete diagnostic outputs.
3. Lack of Real-time Integration: While algorithmic accuracy has improved, few models have been developed into deployable systems that operate in real-time or interactive clinical workflows.
4. Generalizability Issues: Most studies rely heavily on limited datasets such as the MIT-BIH Arrhythmia Database. Models trained exclusively on these datasets often fail to generalize across different populations, device types, or signal acquisition conditions.

2.6 Contributions of the Present Work

To bridge the research gaps outlined above, this project proposes a practical and interpretable solution tailored to ectopic beat classification in real-time scenarios:
1. Use of interpretable, handcrafted features rather than opaque latent vectors.
2. Adoption of a Random Forest classifier for balanced accuracy and robustness.
3. Integration with a web-based interface for real-time usability.
CHAPTER 3
ECG SIGNAL PREPROCESSING & ML BASED BEAT
CLASSIFICATION

3.1 Introduction
Electrocardiogram (ECG) signals, being non-stationary and prone to various
types of noise and artifacts, require systematic preprocessing before they can be
used effectively in machine learning models. This chapter outlines the end-to-end
workflow for ECG-based beat classification, starting from raw signal acquisition to
model deployment. The process involves critical steps such as noise filtering, beat
segmentation, feature extraction, and supervised classification. Each stage plays a
vital role in ensuring that the ECG data is clean, meaningful, and structured
appropriately for accurate and reliable classification of arrhythmic beats. The
overall methodology is designed to ensure both clinical relevance and
computational efficiency, ultimately enabling real-time analysis through a web-
based interface.

3.2 System Model

The complete process of ECG signal analysis with ectopic beat classification using machine learning is illustrated in Figure 3.1. It begins with an ECG dataset, which undergoes preprocessing: conversion to an ECG signal array, filtering to remove noise, segmentation to isolate meaningful portions of the signal, and labelling.

[Figure 3.1 block diagram: ECG Dataset → (WFDB) → ECG signal array → Filtering → Segmentation → Labelling (together forming the preprocessing stage) → Feature extraction → ML training → Web page]

Figure 3.1: Block diagram for ectopic beat classification using machine learning

After preprocessing, the segmented signals are passed through a feature extraction
stage where critical characteristics such as intervals and amplitudes are derived.
These features serve as input to a machine learning classifier, which is trained to
identify various beat types. The trained model is then integrated into a web platform
for real-time interaction, visualization, and decision support.
1. ECG Dataset
The process starts with acquiring the ECG dataset, which contains raw ECG signals.
2. Preprocessing
The ECG data undergoes several preprocessing steps:
• ECG Signal Array: Converting raw ECG data into a signal array suitable for processing, using the WFDB library.
• Filtering: Removing noise, baseline wander, and other artifacts from the ECG signal.
• Segmentation: Dividing the ECG signal into smaller, manageable segments (e.g., heartbeat cycles).
• Labelling: The points P, Q, S, and T in each ECG cycle are labelled using an adaptive windowing method based on the beat's morphology and clinical characteristics.
3. Feature Extraction
From the segmented and filtered ECG signals, key features (such as RR intervals, QRS duration, and amplitudes) are extracted for analysis.
4. ML Training (Machine Learning Training)
The extracted features are used to train a machine learning model. This model can then be used for tasks such as classification, anomaly detection, or diagnosis.
5. Web Page
Finally, the trained model is deployed on a web page to enable user interaction and visualization of results.

3.2.1 Data Collection:


In this project, the MIT-BIH Arrhythmia dataset was used. The raw ECG data was first converted into an ECG signal array using the WFDB library so that it could undergo filtering and segmentation.
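
As a minimal sketch of this conversion step, the snippet below reads one record and its annotations with the `wfdb` Python package. The helper names `pick_channel` and `load_record`, the choice of record, and streaming from PhysioNet through `pn_dir="mitdb"` are assumptions made for illustration; the report does not show its own loading code.

```python
def pick_channel(sig_names, preferred="MLII"):
    """Return the index of the preferred lead, falling back to channel 0."""
    return sig_names.index(preferred) if preferred in sig_names else 0


def load_record(record_name, pn_dir="mitdb"):
    """Read one MIT-BIH record and its annotations as plain arrays.

    Assumption: the record is streamed from PhysioNet via pn_dir. To read
    downloaded files instead, pass pn_dir=None and a local path as record_name.
    """
    import wfdb  # imported lazily so pick_channel works without wfdb installed

    record = wfdb.rdrecord(record_name, pn_dir=pn_dir)          # .dat + .hea
    annotation = wfdb.rdann(record_name, "atr", pn_dir=pn_dir)  # .atr

    channel = pick_channel(record.sig_name)
    ecg = record.p_signal[:, channel]  # 1-D signal array in physical units
    # annotation.sample holds sample indices, annotation.symbol the labels;
    # some entries are non-beat annotations (e.g. rhythm changes).
    return ecg, record.fs, annotation.sample, annotation.symbol
```

A call such as `ecg, fs, samples, symbols = load_record("100")` would then give the signal array that the filtering and segmentation steps operate on.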

3.2.2 Preprocessing:
This step involves filtering, ECG beat segmentation, and labelling.
• Filtering:
A Butterworth bandpass filter was used, with a lower cutoff frequency of 0.5 Hz to remove baseline wander and a higher cutoff frequency of 40 Hz to remove muscle noise and interference.
• Segmentation:
R-peak locations were first obtained from the annotation files (the ECG data is sampled at 360 Hz), and the signal was then segmented into individual beats around these R-peaks.
• Labelling:
Inspired by Umer et al. (2014), an adaptive labelling method was adopted, which scales the search windows for the P-wave and T-wave relative to each beat’s local RR interval, thereby aligning better with physiological timing variability. Unlike rigid fixed windows, this approach adjusts dynamically to changes in heart rate and beat morphology, yielding high accuracy (± 5%) even on arrhythmic ECGs such as those in the MIT-BIH database.
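
The three preprocessing steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the filter order, the fixed segment lengths (`pre_s`, `post_s`), and the window fractions (`p_frac`, `t_frac`) are hypothetical values, since the report gives only the 0.5 Hz and 40 Hz cutoffs and the 360 Hz sampling rate.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 360  # MIT-BIH sampling rate in Hz


def bandpass(ecg, fs=FS, low=0.5, high=40.0, order=4):
    """Zero-phase Butterworth bandpass; the order is an assumed value."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, ecg)


def segment_beats(ecg, r_peaks, fs=FS, pre_s=0.25, post_s=0.40):
    """Cut a fixed window around each R-peak, skipping windows that would
    run past either end of the record."""
    pre, post = int(round(pre_s * fs)), int(round(post_s * fs))
    return [ecg[r - pre:r + post] for r in r_peaks
            if r - pre >= 0 and r + post <= len(ecg)]


def adaptive_windows(r_peaks, i, p_frac=(0.10, 0.35), t_frac=(0.10, 0.50)):
    """Search windows (sample indices) for the P and T waves of beat i,
    scaled by the neighbouring RR intervals. Fractions are illustrative."""
    r = r_peaks[i]
    rr_prev = r - r_peaks[i - 1]
    rr_next = r_peaks[i + 1] - r
    p_win = (r - int(round(p_frac[1] * rr_prev)),
             r - int(round(p_frac[0] * rr_prev)))
    t_win = (r + int(round(t_frac[0] * rr_next)),
             r + int(round(t_frac[1] * rr_next)))
    return p_win, t_win
```

Within each returned window, the P or T point could then be taken as the local extremum of the filtered signal; a faster heart rate shortens both RR intervals and therefore narrows both windows automatically.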
3.2.3 Feature Extraction:
After preprocessing, the time-domain, morphological, slope-based, and heart rate variability features were extracted.

| Feature Category | Included Features | Clinical Significance |
|---|---|---|
| Time-Domain Features | PR interval, QRS duration, QT interval, RR interval, Heart rate, RR variation | Reflect conduction times through atria, AV node, and ventricles. Abnormal durations indicate blocks, pre-excitation, or ventricular origin. RR interval reflects rhythm regularity. |
| Morphological Features | P, Q, R, S, T amplitudes, Pre-RR avg, post-RR avg | Represent electrical activity of atria and ventricles. Abnormal shapes or amplitudes suggest ectopic origin, hypertrophy, or ischemia. P wave absence/inversion → atrial ectopy. High R or deep S → ventricular ectopy. |
| Slope-Based Features | PR slope, QR slope, RS slope, ST slope | Reflect the rate of voltage change between waveform segments. Steep slopes may indicate rapid depolarization or abnormal conduction. ST slope changes may reflect ischemia or repolarization abnormalities. |
| Heart Rate Variability (HRV) Features | SDNN, RMSSD | Measure beat-to-beat variability and autonomic nervous system balance. High variability may indicate arrhythmia or ectopic activity. RMSSD reflects short-term variability; SDNN reflects overall variability. |

Table 3.3: Feature Extraction

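# P, Q, R, S, T: sample indices of the labelled points in signal_data
# fs: sampling frequency (360 Hz); RR: RR interval in seconds; np is numpy
features = {
    'PR_interval': (Q - P) / fs,
    'QRS_duration': (S - Q) / fs,
    'QT_interval': (T - Q) / fs,
    'RR_interval': RR,
    'RR_variation': RR_var,
    'P_amp': signal_data[P],
    'Q_amp': signal_data[Q],
    'R_amp': signal_data[R],
    'S_amp': signal_data[S],
    'T_amp': signal_data[T],
    # slopes are amplitude change per sample between two labelled points
    'PR_slope': (signal_data[R] - signal_data[P]) / (R - P),
    'QR_slope': (signal_data[R] - signal_data[Q]) / (R - Q),
    'RS_slope': (signal_data[S] - signal_data[R]) / (S - R),
    'ST_slope': (signal_data[T] - signal_data[S]) / (T - S),
    # mean amplitude over the 30 samples before and after the R-peak
    'Pre_R_avg_amp': np.mean(signal_data[max(0, R - 30):R]),
    'Post_R_avg_amp': np.mean(signal_data[R:R + 30]),
    'Heart_rate': 60 / RR,  # beats per minute
}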
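
The listing above covers 17 of the 19 features; the remaining two are the HRV metrics SDNN and RMSSD from Table 3.3. A minimal sketch of their standard definitions is given below. The function name `hrv_features` is hypothetical, and the span of RR intervals over which the report computed these values (whole record or a local window) is not stated, so the input here is simply an assumed sequence of RR intervals in seconds.

```python
import numpy as np


def hrv_features(rr_intervals):
    """Compute SDNN and RMSSD from a sequence of RR intervals (seconds)."""
    rr = np.asarray(rr_intervals, dtype=float)
    sdnn = np.std(rr, ddof=1)                   # overall variability
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))  # short-term, beat-to-beat
    return {"SDNN": sdnn, "RMSSD": rmssd}


example = hrv_features([0.8, 0.9, 0.8, 0.9])
```

SDNN is the sample standard deviation of the RR intervals, while RMSSD is the root mean square of successive differences, which is why the latter responds more strongly to an isolated premature beat.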

3.2.4 ML Training
The obtained features, along with the corresponding beat annotation and AAMI label, were assembled into a data frame for ML model training. The full data frame contains the following number of beats:

| Label | Number of Beats |
|---|---|
| N | 85,738 |
| Q | 8,007 |
| V | 3,189 |
| S | 2,114 |
| F | 772 |

Table 3.4.1: Total Number of Beats in the Dataset

This data was split into 80% training and 20% testing sets as shown below:

Train set (80%)

| Label | Number of Beats |
|---|---|
| N | 68,590 |
| Q | 6,406 |
| V | 2,551 |
| S | 1,691 |
| F | 618 |

Table 3.4.2: Train set

Test set (20%)

| Label | Number of Beats |
|---|---|
| N | 17,148 |
| Q | 1,601 |
| V | 638 |
| S | 423 |
| F | 154 |

Table 3.4.3: Test set
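
The per-class counts in the train and test tables are each close to 80% and 20% of the totals, which is what a stratified split produces. The sketch below shows such a split followed by Random Forest training using scikit-learn. The library choice, the hyperparameters, and the synthetic stand-in data are all assumptions for illustration; the scores it produces say nothing about the report's model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the 19-feature table (NOT the report's data):
# 1,000 beats with an imbalanced label distribution.
classes = ["N", "Q", "V", "S", "F"]
labels = np.array(["N"] * 860 + ["Q"] * 80 + ["V"] * 30 + ["S"] * 20 + ["F"] * 10)
X = rng.normal(size=(labels.size, 19))
for k, cls in enumerate(classes):
    X[labels == cls] += 4.0 * k  # shift each class so it is separable

# stratify=labels keeps every class at roughly the same 80/20 ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.20, stratify=labels, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Macro F1 weights all classes equally, so rare classes are not hidden
macro_f1 = f1_score(y_test, y_pred, average="macro")
```

Without stratification, a rare class such as F could end up almost entirely in one of the two sets, which would make the per-class evaluation unreliable.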

Figure 3.4.1 illustrates the internal mechanism of the Random Forest model used for classifying ECG beats based on 19 extracted features. Each ECG beat is first transformed into a numerical feature vector capturing time-domain, morphological, and heart rate variability (HRV)-based characteristics. This feature vector is then passed to an ensemble of decision trees, each trained on a random subset of features and data. Each tree independently predicts a class label (e.g., Normal (N), Ventricular (V), etc.). In the example shown, Decision Tree #1 and #100 predict class 'N', while Decision Tree #2 predicts 'V'.
The final classification is determined by majority voting, where the most frequently predicted class among all the trees is selected as the final output. For instance, if 60 out of 100 trees vote for class 'N' and 40 vote for 'V', the final prediction will be 'N'.
This ensemble strategy improves model robustness, reduces overfitting, and provides reliable classification, especially for imbalanced datasets like MIT-BIH, where some beat types are underrepresented.
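
The voting rule described above can be written in a few lines. This is a plain-Python illustration of hard majority voting using the 60/40 example from the text; the function name `majority_vote` is hypothetical. Note that scikit-learn's `RandomForestClassifier`, if that is the implementation used, combines trees by averaging their class probability estimates, which usually but not always agrees with a hard vote.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# 60 of 100 trees vote 'N', the remaining 40 vote 'V'
votes = ["N"] * 60 + ["V"] * 40
final_label = majority_vote(votes)
```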

Figure 3.4.1 Internal workflow of Random Forest Model for ECG Feature-based classification

3.2.5 Web Page development
FRONTEND
The frontend is the user-facing part of the project: the webpage or interface that users interact with to upload ECG features, initiate classification, and view the results. It serves as a bridge between the user and the backend model.
• The user selects and uploads ECG feature data.
• The frontend sends it to the backend via a request.
• The frontend displays the classification results and graphs after receiving the response.

[Figure 3.5.1 flow: User opens webpage → Upload ECG file (.csv format) → Send file to ML model → Display results on webpage: beat classification summary]

Figure 3.5.1: Webpage frontend deployment

3.5.1Key Functionalities:

1. User-Friendly Layout and Configuration:


o The webpage uses {st.set_page_config()} to set a custom page title

and icon, making the interface visually appealing and easy to


recognize.
o A title and brief instruction appear at the top using {st.title() and
st.write()} to orient the user.

2. Model Upload Section:


o In the sidebar, users can upload a trained machine learning model

(.pkl file) using a file_uploader.


o Once uploaded, the model is loaded using pickle.load(), and
Streamlit’s session_state is used to retain the model across
interactions.

24
3. ECG Feature Upload:

o Users upload ECG feature data in .csv or .xlsx format.
o The system displays a preview of the uploaded data in a scrollable
dataframe, allowing users to verify input correctness.

4. Prediction Mechanism:

o When the "Make Predictions" button is clicked, the backend
validates and processes the data.
o Features are matched against the trained model’s expected inputs.
Missing features are filled with default values (zeros).
o The Random Forest model then predicts the class for each ECG beat.

5. Result Display:

o Prediction results are shown in a clean, categorized summary using
st.metric() for each beat type (e.g., N, S, V, F, Q), with their meaning.
o A detailed table of predictions is also provided, allowing users to
inspect individual results.

6. Waveform Visualization:

o A custom ECG waveform is generated for each beat using Gaussian
functions to simulate PQRST complexes.
o The user can select how many ECG beats to visualize, and
matplotlib is used to plot the synthetic ECG waveform with labeled
peaks.

7. Download Option:

o Users can download their classification results in CSV format using
st.download_button().

8. Session Management:

o st.session_state is used to track uploaded files, model state, and
prediction status, allowing smooth transitions without losing user
progress.
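
The waveform synthesis described in item 6 can be sketched without any plotting. This is a minimal illustration, assuming the wave centres and widths used in the appendix code (P at 0.2 RR, Q at 0.4 RR, R at 0.42 RR, S at 0.44 RR, T at 0.6 RR); `synth_beat` is a hypothetical helper name and the amplitudes in the usage lines are made-up values, not dataset values.

```python
import math

def gaussian_wave(center, width, amplitude, t):
    """Gaussian bump of the given amplitude centred at `center` (seconds)."""
    return amplitude * math.exp(-0.5 * ((t - center) / width) ** 2)

# (centre as a fraction of the RR interval, width in seconds) per wave,
# taken from the appendix plotting code.
WAVES = {
    "P": (0.20, 0.025),
    "Q": (0.40, 0.012),
    "R": (0.42, 0.015),
    "S": (0.44, 0.012),
    "T": (0.60, 0.050),
}

def synth_beat(rr_interval, amplitudes, fs=500):
    """Return (times, samples) for one synthetic beat of rr_interval seconds."""
    n = int(rr_interval * fs)
    times = [i * rr_interval / (n - 1) for i in range(n)]
    samples = [
        sum(
            gaussian_wave(rel * rr_interval, width, amplitudes[name], t)
            for name, (rel, width) in WAVES.items()
        )
        for t in times
    ]
    return times, samples

# Illustrative amplitudes in mV (assumed values, not from the dataset).
amps = {"P": 0.1, "Q": -0.1, "R": 1.0, "S": -0.2, "T": 0.3}
times, samples = synth_beat(0.8, amps)
peak_time = times[samples.index(max(samples))]
print(f"R peak near {peak_time:.3f} s")
```

Because the waveform is rebuilt from five amplitudes and one RR interval, it is a schematic of the beat rather than a reconstruction of the recorded signal.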

BACKEND
Handles ECG features data.
Classifies the ECG beat using the trained Random Forest model.
Sends results back to frontend for display.

[Flow diagram: Receive File via API Request (user uploads file) -> File Validation (checks format, reads ECG data) -> Use Trained ML Model (Random Forest Classifier predicts beat category) -> Output (predictions per beat, display of waveform)]

Fig no 3.5.2: ECG Data Processing and Classification Flow
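
The validation and alignment stage of this flow can be sketched without any web framework. This is a minimal, dependency-free illustration rather than the report's implementation (the appendix code does the same job with pandas): `align_features` is a hypothetical helper, and the three feature names in the usage lines are only a subset chosen for brevity.

```python
import csv
import io

def align_features(csv_text, required_features, fill_value=0.0):
    """Parse CSV text and return (rows, missing).

    rows    -- one list of floats per beat, ordered as required_features
    missing -- required features absent from the file (filled with fill_value)
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in required_features if name not in header]
    rows = []
    for record in reader:
        rows.append([
            float(record[name]) if name in header else fill_value
            for name in required_features
        ])
    return rows, missing

# Usage: the file lacks 'Heart_rate' and carries an extra label column.
uploaded = "RR_interval,R_amp,aami_label\n0.80,1.2,N\n0.45,0.9,V\n"
rows, missing = align_features(uploaded, ["RR_interval", "Heart_rate", "R_amp"])
print(rows)     # [[0.8, 0.0, 1.2], [0.45, 0.0, 0.9]]
print(missing)  # ['Heart_rate']
```

Zero-filling keeps the column order the model expects, but a zero may be far from a feature's typical value, so reporting the `missing` list to the user, as the webpage's warning does, seems worthwhile.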

Web Development requirements
 Frontend
Languages: Python
Libraries : streamlit, pickle, numpy, pandas, matplotlib.pyplot,
collections.Counter, io

 Backend
Framework: Python Flask.
Functions:
 Accept ECG input from user
 Preprocess and extract features
 Call the trained ML model for classification
 Return and display results
3.3 DATASET
The dataset used in this project is the MIT-BIH Arrhythmia Database, developed
by Beth Israel Hospital and MIT, which contains 48 half-hour ECG recordings
from 47 subjects, sampled at 360 Hz per channel. The leads used are modified
limb lead II (MLII) and V1, occasionally V2 or V5, and in one instance V4.
Each beat is annotated by expert cardiologists with rhythm and beat type
labels. The data is distributed in WFDB format, comprising .dat (ECG signal),
.hea (record info, sampling frequency, etc.), and .atr (annotation) files.
 From this dataset, a set of 19 meaningful features was extracted, and the
corresponding beat-level annotations were categorized into 5 classes according
to the AAMI (Association for the Advancement of Medical Instrumentation) EC57
standard.
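
As an illustration of the class grouping, the sketch below maps MIT-BIH beat symbols to the five classes using the same symbol table as the appendix mapping code. The sample symbol list is made up, and `to_aami` is an illustrative name; symbols outside the table (such as the rhythm-change marker '+') are dropped.

```python
from collections import Counter

# MIT-BIH annotation symbol -> AAMI class, as in the appendix mapping code.
AAMI_CLASS = {
    "N": "N", "L": "N", "R": "N", "e": "N", "j": "N",
    "A": "S", "a": "S", "J": "S", "S": "S",
    "V": "V", "E": "V",
    "F": "F",
    "/": "Q", "f": "Q", "Q": "Q", "!": "Q", "|": "Q",
}

def to_aami(symbols):
    """Map annotation symbols to AAMI classes, skipping unmapped symbols."""
    return [AAMI_CLASS[s] for s in symbols if s in AAMI_CLASS]

# Made-up annotation sequence for illustration.
symbols = ["N", "L", "V", "+", "A", "N", "/", "F"]
labels = to_aami(symbols)
print(labels)
print(Counter(labels))
```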

The features extracted were:


Category            Features
Time-domain         PR interval, QRS duration, QT interval, RR interval, Heart rate, RR variation
Morphological       P amplitude, Q amplitude, R amplitude, S amplitude, T amplitude, Pre-R average amplitude, Post-R average amplitude
Slope-based         PR slope, QR slope, RS slope, ST slope
Statistical (HRV)   RR_SDNN (standard deviation of RR intervals), RR_RMSSD (root mean square of successive differences)
Table 3.6: The features extracted
 The obtained features, along with the corresponding beat annotation and
AAMI label, are assembled into a DataFrame for ML model training.
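
The two HRV features are the least self-explanatory entries in the table, so a small worked sketch may help. It assumes R-peak positions given in samples and the 360 Hz sampling rate, and uses the population standard deviation (NumPy's default, as in the appendix code); `hrv_features` is an illustrative name.

```python
import math

def hrv_features(r_peaks, fs=360):
    """Return (SDNN, RMSSD) in seconds from R-peak sample indices."""
    rr = [(b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]
    if len(rr) < 2:
        return 0.0, 0.0
    mean_rr = sum(rr) / len(rr)
    # SDNN: standard deviation of the RR intervals
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr) / len(rr))
    # RMSSD: root mean square of successive RR differences
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return sdnn, rmssd

# Three beats: RR intervals of 1.0 s and 0.8 s.
sdnn, rmssd = hrv_features([0, 360, 648])
print(round(sdnn, 3), round(rmssd, 3))  # 0.1 0.2
```

In the appendix pipeline these two values are computed once per record, so every beat taken from the same record carries the same RR_SDNN and RR_RMSSD.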

3.4. Conclusion

This project uses ECG data from the MIT-BIH Arrhythmia Database to classify
ectopic beats with a Random Forest classifier. Implemented in Python, it extracts
time-domain, morphological, slope-based, and HRV features from each beat.

Libraries used include WFDB, NumPy, Pandas, SciPy, Scikit-learn, and Matplotlib
for data handling, ML, and visualization. A Streamlit-based web frontend
is included for user interaction. The system offers an efficient, automated method
for ectopic beat detection using biomedical signal processing and machine learning.

CHAPTER 4
RESULTS & ANALYSIS
The experimental environment was designed to evaluate the
performance of an ECG signal classification system using machine learning
techniques. The simulation and implementation were carried out using
Python, with key libraries including NumPy, Pandas, Matplotlib, SciPy, and
scikit-learn for data manipulation, visualization, signal processing, and model
training.
The ECG dataset used in this study consists of labelled beats categorized as
N (Normal), Q, V, S, and F classes. Preprocessing steps included filtering,
segmenting, and labelling the signals before feature extraction.
The features extracted were grouped into four major categories: Time-Domain
Features, Morphological Features, Slope-Based Features, and Heart Rate
Variability (HRV) Features. These were selected based on their clinical
significance in detecting abnormalities in heart rhythms and conduction
pathways. The feature extraction process resulted in a diverse set of input
parameters fed into machine learning algorithms for classification.
The total dataset was split into 80% training and 20% testing sets. The
training set included 68,590 N beats, 6,406 Q beats, 2,551 V beats, 1,691 S
beats, and 618 F beats. The test set comprised 17,148 N beats, 1,601 Q beats,
638 V beats, 423 S beats, and 154 F beats.
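
A quick arithmetic check, using only the counts quoted above, suggests that the split preserved class proportions, as a stratified split would: every class contributes close to 20% of its beats to the test set.

```python
# Per-class beat counts reported for the training and test sets.
train = {"N": 68590, "Q": 6406, "V": 2551, "S": 1691, "F": 618}
test = {"N": 17148, "Q": 1601, "V": 638, "S": 423, "F": 154}

total = sum(train.values()) + sum(test.values())
print(f"total beats: {total}")  # 99820

shares = {}
for label in train:
    class_total = train[label] + test[label]
    shares[label] = test[label] / class_total
    print(f"{label}: {class_total:>6} beats, test share {shares[label]:.3f}")
```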
Classification was performed using a Random Forest model, and the model’s
performance was evaluated using accuracy, precision, recall, and F1-score.
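
For reference, these metrics follow the standard per-class definitions. The symbols are introduced here for illustration: for class c, TP_c, FP_c, and FN_c are the true positives, false positives, and false negatives, and N is the number of test beats.

```latex
\[
\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad
\mathrm{Recall}_c = \frac{TP_c}{TP_c + FN_c}, \qquad
F1_c = \frac{2 \cdot \mathrm{Precision}_c \cdot \mathrm{Recall}_c}
            {\mathrm{Precision}_c + \mathrm{Recall}_c}
\]
\[
\mathrm{Accuracy} = \frac{\sum_c TP_c}{N}
\]
```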

4.1 ML Simulation Results


During training and testing of the Random Forest model, an accuracy of
99.21% was achieved. The model was trained to classify each beat into one of
the following five classes: Non-ectopic beat (N), Supraventricular ectopic
beat (S), Ventricular ectopic beat (V), Fusion beat (F), and Unknown beat (Q).

Class             Precision  Recall  F1-score  Support
F                 0.95       0.75    0.84      154
N                 0.99       1.00    1.00      17,148
Q                 1.00       0.99    1.00      1,601
S                 0.98       0.84    0.91      423
V                 0.98       0.93    0.96      638
Accuracy                             0.99      19,964
Macro Average     0.98       0.90    0.94      19,964
Weighted Average  0.99       0.99    0.99      19,964
Table 4.1: Evaluation Metrics

Figure 4.1.1: confusion matrix

Analysis:
The classification model was evaluated using standard performance metrics—
Precision, Recall, F1-score, and Support—across five beat classes: F (Fusion),
N (Normal), Q (Unknown), S (Supraventricular), and V (Ventricular). The
results are summarized below:

Key Observations

1. High Overall Accuracy


o The model achieved an overall accuracy of 99.15%, indicating
excellent general performance across the test set of 19,964 beats.
2. Class-wise Performance
o Normal (N) beats, which constitute the majority class, were
classified with near-perfect performance: Precision = 0.99,
Recall = 1.00, F1-score = 1.00.
o Unknown (Q) beats also showed excellent classification with
Precision = 1.00 and F1-score = 1.00, though Recall was slightly
lower at 0.99.
o Ventricular (V) beats were detected with high reliability (F1-score
= 0.96), showing the model’s strength in identifying clinically
critical ectopic events.
o Supraventricular (S) beats had a slightly lower Recall (0.84),
suggesting that some instances were missed or misclassified.
o Fusion (F) beats had the lowest performance, particularly in
Recall (0.75) and F1-score (0.84), indicating that the model
struggles most with this rare and morphologically ambiguous
class.
3. Macro vs. Weighted Averages
o Macro Average (treats all classes equally):
 Precision = 0.98, Recall = 0.90, F1-score = 0.94
 Indicates strong average performance across all classes. The lower
macro Recall reflects the missed Fusion (F) and Supraventricular (S)
beats, which weigh as much as the majority class in this average.
o Weighted Average (accounts for class frequency):
 Precision = 0.99, Recall = 0.99, F1-score = 0.99
 Confirms that the model performs exceptionally well on the
dataset as a whole, especially for dominant classes.
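
The difference between the two averages can be reproduced from the per-class rows of Table 4.1. The sketch below is illustrative only; the helper names are assumptions, and the inputs are the two-decimal values from the table.

```python
# Per-class values from Table 4.1: (precision, recall, F1, support).
report = {
    "F": (0.95, 0.75, 0.84, 154),
    "N": (0.99, 1.00, 1.00, 17148),
    "Q": (1.00, 0.99, 1.00, 1601),
    "S": (0.98, 0.84, 0.91, 423),
    "V": (0.98, 0.93, 0.96, 638),
}

def macro_average(column):
    """Unweighted mean of one metric column over the classes."""
    return sum(row[column] for row in report.values()) / len(report)

def weighted_average(column):
    """Mean of one metric column weighted by class support."""
    total = sum(row[3] for row in report.values())
    return sum(row[column] * row[3] for row in report.values()) / total

for name, column in [("precision", 0), ("recall", 1), ("F1", 2)]:
    print(f"{name}: macro {macro_average(column):.3f}, "
          f"weighted {weighted_average(column):.3f}")
```

Because the inputs are already rounded to two decimals, the recomputed values can differ from the table in the last digit.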

4.2 Web Page Simulation results

Figure 4.2.1: frontend of webpage

Figure 4.2.2: output after uploading the ECG features

CHAPTER 5
CONCLUSION & FUTURE SCOPE

5.1 Conclusion

The project successfully demonstrates the potential of machine
learning techniques in automating the detection and classification
of ectopic beats using ECG features. By utilizing the MIT-BIH
Arrhythmia Database, robust preprocessing methods, and a
comprehensive set of extracted features, the system was able to
classify ECG beats. A Random Forest classifier was trained and
tested on a dataset of nearly 100,000 ECG beats, achieving a high
classification accuracy of 99.21%.

The key achievements of this project include:

 Excellent classification performance, particularly for Normal,
Unknown, and Ventricular beats.
 Fusion and Supraventricular beats present classification challenges,
likely due to their low support and morphological overlap with other
classes.
 The high F1-scores across most classes indicate a well-balanced trade-off
between Precision and Recall, making the model suitable for real-world
ECG monitoring applications.
 The integration of a user-friendly web interface further enhanced
the project by demonstrating how this solution can be used in
real-world settings for both patients and healthcare professionals.

5.1.1 Advantages
 Enables faster and more efficient ectopic beat detection, especially
in emergency or high-volume scenarios.

 Achieved an impressive 99.21% classification accuracy using a
Random Forest classifier.

 Reliable performance across multiple beat classes (N, S, V, F, Q)
improves diagnostic confidence.

 Reduces dependency on repeated ECG reviews by specialists.

 Can be deployed in rural or under-resourced areas with minimal
infrastructure, reducing healthcare costs.

 The user-friendly web interface allows healthcare staff or patients
to upload ECG feature files and get immediate results.

 Enables remote usage without specialized software.

 Useful for training medical students in ECG interpretation.

 Acts as a baseline for academic research in biomedical signal
processing and health AI.

5.1.2. Disadvantages

 The model was trained using only the MIT-BIH Arrhythmia
Database, which consists of recordings from just 47 subjects.
This may reduce its generalizability to patients with different
demographics, heart conditions, or ECG device settings.

 Some classes, such as Fusion (F) and Supraventricular (S), occur
much less frequently in the dataset. As a result, the model may
be biased toward majority classes (e.g., Normal beats) and
perform poorly on rare but critical arrhythmias.

 Preprocessing, feature extraction, and classification are done on
the server after ECG upload. The current pipeline may not be
suitable for real-time monitoring or continuous ECG streaming
from wearable devices.

 The segmentation and feature extraction process relies heavily
on correctly identifying R-peaks. Any error in R-peak detection
(due to noise or motion artifacts) can cause incorrect
classification.

 Uploading ECG data to a web server raises concerns about data
security, especially under laws like HIPAA or GDPR. The current
implementation may not fully protect patient confidentiality.

5.1.3 APPLICATIONS
Clinical Decision Support Systems:
 Assists doctors in diagnosing ectopic beats more quickly and
accurately.
 Reduces the workload on cardiologists by automating
beat-by-beat analysis.

AI-Powered Healthcare Devices

 The ML model can be embedded into mobile apps, IoT
devices, or ECG monitors.
 Used in ambulances or portable ECG kits to assist
paramedics with early triage.

Medical Research & Education


 The system can be used by researchers and students to study:
 ECG signal patterns
 Classification techniques
 Impact of various features on arrhythmia detection
 Supports training simulations for cardiology residents and
medical students.

Integration with Hospital Management Systems (HMS)


 Can be linked with HMS or Electronic Health Records (EHR) to store
and access ECG diagnosis history.

5.2 Future Scope

o Future improvements could focus on augmenting underrepresented classes
(like F and S) and enhancing feature sensitivity to subtle waveform
variations.

o Integration with real-time ECG streaming from wearables or
mobile devices.

o Use of deep learning models (CNN, LSTM) to improve performance
and feature learning.

o Support for multi-lead ECGs and 12-lead standard hospital ECG
devices.

o Validation through clinical trials for regulatory compliance and
deployment in medical environments.

o Enhancement of data privacy and security features for patient
safety and legal compliance.
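
As one low-cost complement to augmenting the rare classes, the classifier can weight classes by inverse frequency. The sketch below computes n_samples / (n_classes x class_count) from the training counts reported in Chapter 4, which is the heuristic scikit-learn applies for class_weight='balanced'. Class weighting is a suggestion here and was not evaluated in this project; `balanced_weights` is an illustrative name.

```python
# Training-set beat counts per class, as reported in Chapter 4.
train_counts = {"N": 68590, "Q": 6406, "V": 2551, "S": 1691, "F": 618}

def balanced_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

weights = balanced_weights(train_counts)
for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label}: {weight:.2f}")
```

Under this scheme a Fusion beat would count roughly a hundred times as much as a Normal beat during training, which gives a sense of how skewed the class distribution is.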


APPENDICES

Figure (a): Data frame given to the ML model for training and testing
(*this is not the complete data frame; the full data frame has 99821 rows and 22 columns)

Webpage Creation
import streamlit as st
import pickle
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
#import io

# Set page config
st.set_page_config(
    page_title="ECG Beat Classification",
    page_icon="❤",
    layout="wide"
)

# Title and description
st.title("❤ ECG Beat Classification System")
st.write("Upload your trained model and ECG features to classify heartbeat types")

# Initialize session state
if 'model' not in st.session_state:
    st.session_state.model = None
if 'predictions_made' not in st.session_state:
    st.session_state.predictions_made = False

# Label meanings dictionary
label_meaning = {
    'N': 'Normal / Non-ectopic beat',
    'S': 'Supraventricular ectopic beat',
    'V': 'Ventricular ectopic beat',
    'F': 'Fusion beat',
    'Q': 'Unknown beat'
}

# Gaussian wave function
def gaussian_wave(center, width, amplitude, t):
    return amplitude * np.exp(-0.5 * ((t - center) / width) ** 2)

# ECG plot function: builds one synthetic beat from the extracted features
def plot_synthetic_ecg(features, row_index, prediction):
    fs = 500
    rr_interval = features['RR_interval']
    num_samples = int(rr_interval * fs)
    t = np.linspace(0, rr_interval, num_samples)

    # Wave centres as fixed fractions of the RR interval
    p_center = 0.2 * rr_interval
    q_center = 0.4 * rr_interval
    r_center = 0.42 * rr_interval
    s_center = 0.44 * rr_interval
    t_center = 0.6 * rr_interval

    p_wave = gaussian_wave(p_center, 0.025, features['P_amp'], t)
    q_wave = gaussian_wave(q_center, 0.012, features['Q_amp'], t)
    r_wave = gaussian_wave(r_center, 0.015, features['R_amp'], t)
    s_wave = gaussian_wave(s_center, 0.012, features['S_amp'], t)
    t_wave = gaussian_wave(t_center, 0.05, features['T_amp'], t)

    ecg_waveform = p_wave + q_wave + r_wave + s_wave + t_wave

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(t, ecg_waveform, label=f'Beat {row_index}', color='blue', linewidth=2)
    ax.axvline(p_center, color='green', linestyle='--', alpha=0.7, label='P')
    ax.axvline(q_center, color='purple', linestyle='--', alpha=0.7, label='Q')
    ax.axvline(r_center, color='red', linestyle='--', alpha=0.7, label='R')
    ax.axvline(s_center, color='orange', linestyle='--', alpha=0.7, label='S')
    ax.axvline(t_center, color='brown', linestyle='--', alpha=0.7, label='T')

    prediction_text = f"{prediction} ({label_meaning.get(prediction, 'Unknown')})"
    ax.set_title(f"ECG Waveform - Beat {row_index} | Prediction: {prediction_text}",
                 fontsize=14, fontweight='bold')
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Amplitude (mV)")
    ax.grid(True, alpha=0.3)
    ax.legend()
    plt.tight_layout()

    return fig

# Sidebar for file uploads
st.sidebar.header("File Uploads")

# Step 1: Model Upload
st.sidebar.subheader("1. Upload Trained Model")
model_file = st.sidebar.file_uploader(
    "Choose your trained model file (.pkl)",
    type=['pkl'],
    help="Upload the pickle file containing your trained ECG classification model"
)

if model_file is not None:
    try:
        # Load the model
        model = pickle.load(model_file)
        st.session_state.model = model
        st.sidebar.success("Model loaded successfully!")

        # Display model info
        if hasattr(model, 'feature_names_in_'):
            st.sidebar.info(f"Model expects {len(model.feature_names_in_)} features")

    except Exception as e:
        st.sidebar.error(f"Error loading model: {str(e)}")

# Step 2: ECG Data Upload
st.sidebar.subheader("2. Upload ECG Features")
data_file = st.sidebar.file_uploader(
    "Choose ECG feature file (.csv or .xlsx)",
    type=['csv', 'xlsx', 'xls'],
    help="Upload the file containing ECG features for classification"
)

# Main content
if st.session_state.model is None:
    st.info("Please upload your trained model file (.pkl) in the sidebar to get started")
elif data_file is None:
    st.info("Please upload your ECG features file (.csv or .xlsx) in the sidebar")
else:
    try:
        # Load ECG data
        if data_file.name.endswith('.csv'):
            df = pd.read_csv(data_file)
        else:
            df = pd.read_excel(data_file)

        st.success(f"Loaded ECG data: {len(df)} records")

        # Show data preview
        with st.expander("Data Preview"):
            st.dataframe(df.head())

        # Process data and make predictions
        if st.button("Make Predictions", type="primary"):
            with st.spinner("Processing ECG data and making predictions..."):
                # Clean data: drop label columns if present
                df_processed = df.drop(columns=['Annotation', 'Annotation_meaning',
                                                'aami_label'], errors='ignore')

                # Get required features
                model = st.session_state.model
                required_features = model.feature_names_in_

                # Add missing columns, filled with zeros
                missing_cols = []
                for col in required_features:
                    if col not in df_processed.columns:
                        df_processed[col] = 0
                        missing_cols.append(col)

                if missing_cols:
                    st.warning(f"⚠ Added missing columns with default values: {', '.join(missing_cols)}")

                # Reorder columns to match training
                df_processed = df_processed[required_features]

                # Make predictions
                predictions = model.predict(df_processed)

                # Store results in session state
                st.session_state.predictions = predictions
                st.session_state.df_processed = df_processed
                st.session_state.predictions_made = True

        # Display results if predictions are made
        if st.session_state.predictions_made:
            predictions = st.session_state.predictions
            df_processed = st.session_state.df_processed

            # Summary
            st.header("Prediction Results")
            summary = Counter(predictions)

            # Create columns for summary display
            cols = st.columns(len(summary))
            for i, (label, count) in enumerate(summary.items()):
                with cols[i]:
                    meaning = label_meaning.get(label, 'Unknown')
                    st.metric(
                        label=f"{label} - {meaning}",
                        value=count,
                        help=f"Number of {meaning.lower()} detected"
                    )

            # Detailed results table
            with st.expander("Detailed Results"):
                results_df = pd.DataFrame({
                    'Beat_Index': range(len(predictions)),
                    'Prediction': predictions,
                    'Meaning': [label_meaning.get(p, 'Unknown') for p in predictions]
                })
                st.dataframe(results_df)

            # ECG Waveform Visualization
            st.header("ECG Waveform Visualization")

            # Allow user to select number of plots (at most 10)
            max_plots = min(10, len(df_processed))

            if max_plots == 1:
                num_plots = 1
                st.info("Displaying the single ECG waveform in your dataset")
            else:
                num_plots = st.slider("Number of ECG waveforms to display:", 1, max_plots,
                                      min(5, max_plots))

            # Plot ECG waveforms
            for i in range(num_plots):
                prediction = predictions[i]
                st.subheader(f"Beat {i} - Prediction: {prediction} ({label_meaning.get(prediction, 'Unknown')})")

                try:
                    fig = plot_synthetic_ecg(df_processed.iloc[i], i, prediction)
                    st.pyplot(fig)
                    plt.close(fig)  # Close this figure to free memory
                except Exception as e:
                    st.error(f"Error plotting beat {i}: {str(e)}")

            # Download results
            st.header("Download Results")
            results_df = pd.DataFrame({
                'Beat_Index': range(len(predictions)),
                'Prediction': predictions,
                'Meaning': [label_meaning.get(p, 'Unknown') for p in predictions]
            })

            csv = results_df.to_csv(index=False)
            st.download_button(
                label="Download Predictions as CSV",
                data=csv,
                file_name="ecg_predictions.csv",
                mime="text/csv"
            )

    except Exception as e:
        st.error(f"Error processing data: {str(e)}")

# Footer
st.markdown("---")
st.markdown("ECG Beat Classification System | Built with Streamlit ❤")

This is the code for webpage creation.

ML training and testing


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns
import joblib  # Import for saving the model

# --- Step 1: Load your CSV containing extracted features and AAMI labels ---
# Replace with your actual file path
df = pd.read_csv(r"C:\Adaptive labelling\(final)aami_standardized_features_adaptive_label.csv")

# --- Step 2: Separate features and labels ---
# Drop non-feature columns
X = df.drop(columns=['aami_label', 'Annotation', 'Annotation_meaning'], errors='ignore')
y = df['aami_label']  # Use AAMI label as the target variable

# --- Step 3: Split the data into 80% training and 20% testing ---
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)  # Stratify keeps the class proportions the same in both sets

# --- Step 4: Train a Random Forest Classifier ---
model = RandomForestClassifier(n_estimators=100, random_state=42)  # Use 100 trees
model.fit(X_train, y_train)  # Train on training set

# --- Step 5: Make predictions on the test set ---
y_pred = model.predict(X_test)  # Predict using trained model

# --- Step 6: Calculate and print metrics ---
print("Classification Report:\n")
print(classification_report(y_test, y_pred, zero_division=0))  # Handles divisions by zero
print("Accuracy Score:", accuracy_score(y_test, y_pred))

# --- Step 7: Compute the confusion matrix ---
cm = confusion_matrix(y_test, y_pred, labels=model.classes_)

# --- Step 8: Visualize Confusion Matrix using Seaborn ---
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=model.classes_,
            yticklabels=model.classes_)
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix')
plt.tight_layout()
plt.show()

# --- Step 9: Save the trained model using joblib ---
model_save_path = "(final)Random_forest_arrhythmia_model.pkl"  # Path to save the model
joblib.dump(model, model_save_path)  # Save the trained model to disk

print(f"Model trained on the training split and saved to '{model_save_path}' successfully.")

This is the code for ML training and testing.

AAMI Mapping
import pandas as pd

# --- Step 1: Load your feature DataFrame ---
df = pd.read_csv(r"C:\Adaptive labelling\(final)extracted_ecg_features_Adaptive_label.csv")

# --- Step 2: Define AAMI Mapping Dictionary ---
aami_mapping = {
    'N': 'N', 'L': 'N', 'R': 'N', 'e': 'N', 'j': 'N',    # Normal beats
    'A': 'S', 'a': 'S', 'J': 'S', 'S': 'S',              # Supraventricular beats
    'V': 'V', 'E': 'V',                                  # Ventricular beats
    'F': 'F',                                            # Fusion beats
    '/': 'Q', 'f': 'Q', 'Q': 'Q', '!': 'Q', '|': 'Q'     # Unknown or unclassifiable beats
}

# --- Step 3: Apply AAMI Label Mapping ---
df['aami_label'] = df['Annotation'].map(aami_mapping)
df = df.dropna(subset=['aami_label'])  # Drop rows with unmapped labels

# --- Step 4: Print Sample Count Per Class (Before Saving) ---
print("Sample count per AAMI class:")
print(df['aami_label'].value_counts())

# --- Step 5: Save the AAMI-labeled DataFrame (No Balancing Applied) ---
df.to_csv("(final)aami_standardized_features_adaptive_label.csv", index=False)
print("\nSaved to: (final)aami_standardized_features_adaptive_label.csv")

This is the code for AAMI mapping.

Data preprocessing and feature extraction


import os
import wfdb
import numpy as np
import pandas as pd
from scipy.signal import butter, filtfilt, find_peaks

# ------------------------------
# 1. Load and Select Lead
# ------------------------------
def load_record(base_path, record_name):
    try:
        record = wfdb.rdrecord(os.path.join(base_path, record_name))
        annotation = wfdb.rdann(os.path.join(base_path, record_name), 'atr')
        print(f"Loaded: {record_name}")
        return record, annotation
    except Exception as e:
        print(f"Failed to load {record_name}: {e}")
        return None, None

def select_ecg_lead(record):
    lead_names = record.sig_name
    signals = record.p_signal
    if 'MLII' in lead_names:
        lead_index = lead_names.index('MLII')
        selected_lead = signals[:, lead_index]
        print(" ➤ Selected Lead: MLII (Lead II)")
    else:
        # Fall back to the first channel when MLII is not recorded
        lead_index = 0
        selected_lead = signals[:, lead_index]
        print(f" ➤ MLII not found. Using lead: {lead_names[lead_index]}")
    return selected_lead, lead_names[lead_index]

def process_all_records(data_dir):
    processed_data = []
    dat_files = [f for f in os.listdir(data_dir) if f.endswith('.dat')]
    for dat_file in dat_files:
        record_name = os.path.splitext(dat_file)[0]
        record, annotation = load_record(data_dir, record_name)
        if record is not None and annotation is not None:
            signal, lead_used = select_ecg_lead(record)
            processed_data.append({
                'record_name': record_name,
                'lead_name': lead_used,
                'signal': signal,
                'annotation': annotation
            })
    return processed_data

# ------------------------------
# 2. Filtering
# ------------------------------
def butter_bandpass(lowcut, highcut, fs, order=4):
    # Cutoffs are normalized by the Nyquist frequency (fs / 2)
    nyquist = 0.5 * fs
    low = lowcut / nyquist
    high = highcut / nyquist
    b, a = butter(order, [low, high], btype='band')
    return b, a

def apply_bandpass_filter(signal, fs, lowcut=0.5, highcut=40.0, order=4):
    b, a = butter_bandpass(lowcut, highcut, fs, order)
    return filtfilt(b, a, signal)  # Zero-phase filtering

def filter_loaded_records(all_records_data):
    for record in all_records_data:
        fs = 360
        raw_signal = record['signal']
        record['filtered_signal'] = apply_bandpass_filter(raw_signal, fs)
    return all_records_data

# ------------------------------
# 3. R-peak Detection & Labeling
# ------------------------------
def detect_r_peaks(filtered_signal, fs=360):
    r_peaks, _ = find_peaks(filtered_signal, height=0.5, distance=fs/2)
    return r_peaks

def label_r_peaks_with_annotations(filtered_signal, annotation, fs=360):
    detected_r_peaks = detect_r_peaks(filtered_signal, fs)
    labeled_r_peaks = []
    if len(detected_r_peaks) == 0:
        return labeled_r_peaks  # No peaks detected, nothing to label
    for i, sample in enumerate(annotation.sample):
        # Match each annotation to the nearest detected peak within 50 ms
        closest_idx = np.argmin(np.abs(detected_r_peaks - sample))
        closest_peak = detected_r_peaks[closest_idx]
        if abs(closest_peak - sample) <= fs * 0.05:
            labeled_r_peaks.append((closest_peak, annotation.symbol[i]))
    return labeled_r_peaks

def label_all_r_peaks_for_all_records(all_filtered_data):
    for record in all_filtered_data:
        record['labeled_r_peaks'] = label_r_peaks_with_annotations(
            record['filtered_signal'], record['annotation']
        )
    return all_filtered_data

# ------------------------------
# 4. Adaptive PQRST Labeling
# ------------------------------
def label_pqrst_points_adaptive(filtered_signal, labeled_r_peaks, fs=360):
    pqrst_labeled_points = []
    for i, (r_index, symbol) in enumerate(labeled_r_peaks):
        # Estimate the local RR interval (in samples) from neighbouring beats
        if 0 < i < len(labeled_r_peaks) - 1:
            prev_rr = r_index - labeled_r_peaks[i - 1][0]
            next_rr = labeled_r_peaks[i + 1][0] - r_index
            rr_interval = int((prev_rr + next_rr) / 2)
        elif i > 0:
            rr_interval = r_index - labeled_r_peaks[i - 1][0]
        elif i < len(labeled_r_peaks) - 1:
            rr_interval = labeled_r_peaks[i + 1][0] - r_index
        else:
            rr_interval = int(0.8 * fs)

        labeled_points = {'R': r_index, 'R_symbol': symbol}

        # Search windows (sample ranges) around the R peak
        search_window = {
            'P': (r_index - int(0.4 * rr_interval), r_index - int(0.1 * rr_interval)),
            'Q': (r_index - int(0.08 * fs), r_index),
            'S': (r_index + 1, r_index + int(0.08 * fs)),
            'T': (r_index + int(0.1 * rr_interval), r_index + int(0.5 * rr_interval))
        }

        for point in ['P', 'Q', 'S', 'T']:
            start, end = search_window[point]
            start = max(0, start)
            end = min(len(filtered_signal), end)
            segment = filtered_signal[start:end]

            if len(segment) == 0:
                labeled_points[point] = None
                continue

            # P and T are local maxima; Q and S are local minima
            idx = np.argmax(segment) if point in ['P', 'T'] else np.argmin(segment)
            labeled_points[point] = start + idx

        pqrst_labeled_points.append(labeled_points)
    return pqrst_labeled_points

# ------------------------------
# 5. Annotation Map
# ------------------------------
annotation_map = {
    'N': 'Normal beat', 'L': 'LBBB', 'R': 'RBBB', 'A': 'APB', 'V': 'PVC', 'F': 'Fusion', 'E': 'VEB',
    '/': 'Paced beat', 'f': 'Fusion of paced and normal beat', 'j': 'Nodal escape beat'
}

# ------------------------------
# 6. Feature Extraction
# ------------------------------
def compute_hrv_features(r_peaks, fs=360):
    rr_intervals = np.diff(r_peaks) / fs
    if len(rr_intervals) < 2:
        return 0.0, 0.0
    sdnn = np.std(rr_intervals)
    rmssd = np.sqrt(np.mean(np.diff(rr_intervals) ** 2))
    return sdnn, rmssd

def extract_ecg_features(signal_data, r_peaks, pqrst_points, fs=360):
    features = []
    # HRV features are computed once per record and shared by all its beats
    sdnn, rmssd = compute_hrv_features(r_peaks, fs)
    for i in range(1, len(r_peaks)):
        r = r_peaks[i]
        prev_r = r_peaks[i - 1]
        beat = pqrst_points.get(r, {})
        # Skip beats for which any fiducial point could not be located
        if not all(k in beat and beat[k] is not None for k in ['P', 'Q', 'S', 'T']):
            continue
        P, Q, R, S, T = beat['P'], beat['Q'], r, beat['S'], beat['T']
        RR = (r - prev_r) / fs
        RR_prev = (prev_r - r_peaks[i - 2]) / fs if i > 1 else RR
        RR_var = abs(RR - RR_prev)
        row = {
            'PR_interval': (Q - P) / fs,
            'QRS_duration': (S - Q) / fs,
            'QT_interval': (T - Q) / fs,
            'RR_interval': RR,
            'RR_variation': RR_var,
            'P_amp': signal_data[P],
            'Q_amp': signal_data[Q],
            'R_amp': signal_data[R],
            'S_amp': signal_data[S],
            'T_amp': signal_data[T],
            'PR_slope': (signal_data[R] - signal_data[P]) / (R - P) if R != P else 0.0,
            'QR_slope': (signal_data[R] - signal_data[Q]) / (R - Q) if R != Q else 0.0,
            'RS_slope': (signal_data[S] - signal_data[R]) / (S - R) if S != R else 0.0,
            'ST_slope': (signal_data[T] - signal_data[S]) / (T - S) if T != S else 0.0,
            'Pre_R_avg_amp': np.mean(signal_data[max(0, R - 30):R]),
            'Post_R_avg_amp': np.mean(signal_data[R:R + 30]),
            'Heart_rate': 60 / RR if RR > 0 else 0,
            'RR_SDNN': sdnn,
            'RR_RMSSD': rmssd,
            'Annotation': beat.get('annotation', 'N'),
            'Annotation_meaning': annotation_map.get(beat.get('annotation', 'N'), 'Unknown')
        }
        features.append(row)
    return pd.DataFrame(features)

# ------------------------------
# 7. Execute Pipeline
# ------------------------------
if __name__ == "__main__":
    mitdb_path = r"C:\final requirements for ECG project\mit-bih-arrhythmia-database"
    all_records_data = process_all_records(mitdb_path)
    all_filtered_data = filter_loaded_records(all_records_data)
    all_labeled_data = label_all_r_peaks_for_all_records(all_filtered_data)
    all_features = []

    for record in all_labeled_data:
        record['pqrst_points'] = label_pqrst_points_adaptive(
            record['filtered_signal'], record['labeled_r_peaks']
        )
        r_peaks = [r[0] for r in record['labeled_r_peaks']]
        # Index the fiducial points by R-peak sample for lookup during extraction
        pqrst = {
            r['R']: {
                'P': r['P'], 'Q': r['Q'], 'S': r['S'], 'T': r['T'], 'annotation': r['R_symbol']
            } for r in record['pqrst_points']
        }
        features_df = extract_ecg_features(record['signal'], r_peaks, pqrst)
        all_features.append(features_df)

    final_df = pd.concat(all_features, ignore_index=True)
    final_df.to_csv("extracted_ecg_features_Adaptive_label(final).csv", index=False)
    print("ECG features saved to 'extracted_ecg_features_Adaptive_label(final).csv'")

This is the code for data preprocessing and feature extraction.

