Report On Ectopic Beat Classification Using ML
Submitted by
L. HRUSHIKESH REDDY – 23011A0404
A. HIMAJA REDDY – 23011A0419
CH. RISHIKA PRIYA – 23011A0434
JANAGAMA PAVAN – 23011A0458
ABDUL MANAN KHAN – 23011A0465
DECLARATION OF CANDIDATES
We, the undersigned, declare that the project report entitled “Ectopic
Beat Classification Using Machine Learning”, carried out and submitted
in partial fulfilment of the requirements for the award of the Real Time
Project at JNTUH College of Engineering, Science and Technology
Hyderabad, is an authentic work carried out under the guidance of
Dr. M. RAJESH, Assistant Professor (C), ECE, JNTUHUCESTH, and has
not been submitted to any other university.
A. HIMAJA REDDY-23011A0419
JNTUH UNIVERSITY COLLEGE OF ENGINEERING,
SCIENCE AND TECHNOLOGY, HYDERABAD
2024-2025
ACKNOWLEDGEMENT
L.HRUSHIKESH REDDY-23011A0404
TABLE OF CONTENTS
ABSTRACT
CHAPTER 1: INTRODUCTION
1.1 Types of Ectopic Beats
1.2 Objectives of the Project
1.3 Conclusion
CHAPTER 5: CONCLUSION & FUTURE SCOPE
5.1 Conclusion
5.1.1 Advantages
5.1.2 Disadvantages
5.1.3 Applications
5.2 Future Scope
REFERENCES
APPENDICES
List of Figures
List of Tables
Table no | Title
1.1 | Types of Ectopic Beats
2.3 | Summary of Previous Studies
3.3 | Feature Extraction
3.4.1 | Total Number of Beats in the Dataset
3.4.2 | Train set
3.4.3 | Test set
3.6 | The features extracted
3.7 | Summary
4.1 | Evaluation Metrics
ABSTRACT
Cardiovascular diseases are a leading cause of global mortality, and
arrhythmias, particularly ectopic beats (premature heartbeats originating
from abnormal locations in the heart), are critical indicators of cardiac
health. Early and accurate detection of arrhythmias is essential for
timely diagnosis and treatment. This project presents a machine
learning-based system for classification of ectopic beats using ECG signals.
The MIT-BIH Arrhythmia Database, developed by Beth Israel Hospital
in collaboration with MIT and hosted by PhysioNet, was used as the primary
data source for this project. It contains 48 half-hour ECG recordings from
47 different patients, representing a diverse range of arrhythmias and
normal rhythms, which makes it one of the most widely used benchmark
datasets for evaluating arrhythmia detection algorithms. After downloading
the dataset, preprocessing techniques such as bandpass filtering (to
remove baseline drift and noise) and heartbeat segmentation (based on
R-peak detection) were applied. From each segmented beat, 19 features
spanning time-domain, morphological, slope-based, and heart rate
variability (HRV) metrics were extracted to form the input for machine
learning classification.
The extracted features were used to train a Random Forest classifier,
chosen because it performed more robustly on noisy ECG data than other
models such as SVM or KNN. The model achieved a classification accuracy
of 99.15% and a macro-averaged F1 score of 94%, the latter being the more
informative figure for an imbalanced dataset such as ECG data, in
distinguishing between the five beat classes of the AAMI EC57 standard:
Non-ectopic (N), Supraventricular ectopic (S), Ventricular ectopic (V),
Fusion (F), and Unknown (Q).
To make the system accessible and interactive, a web-based interface
was developed that allows users to upload ECG features and receive beat
classification results. The project demonstrates the potential of machine
learning in automating cardiac diagnostics, thereby supporting clinical
decision-making and improving patient care through real-time monitoring.
CHAPTER 1
INTRODUCTION
This project aims to develop such a system using the MIT-BIH Arrhythmia
Database, a widely recognized benchmark dataset in this field. We employ
a Random Forest classifier to classify each heartbeat into one of five
categories, based on the AAMI EC57 classification standard.
The AAMI EC57 standard, developed by the Association for the
Advancement of Medical Instrumentation (AAMI), provides guidelines for
the evaluation and categorization of arrhythmia detection algorithms. It
defines standardized heartbeat classes and performance metrics to ensure
consistency, safety, and comparability across different ECG analysis
systems. By adhering to this standard, the classification results become
more clinically relevant and interpretable by healthcare professionals.
segment may be displaced. High
variability in RR intervals and beat
morphology.
F - Fusion Beat | QRS complex appears as a hybrid between normal and VEB. R amplitude and shape are intermediate or blended. PR interval may be distorted or partially present. QR and RS slopes show mixed characteristics. T wave may be altered or ambiguous. RR interval is slightly irregular but not as erratic as SVEB or VEB. | Both supraventricular & ventricular
Q - Unknown Beat | Features do not clearly match any known beat type. Multiple waveform components are distorted or missing. QT interval may be abnormal without a clear pattern. T amplitude and ST slope are often atypical. RR intervals may be highly irregular or inconsistent. High variability in all features; used when classification is uncertain. | Indeterminate
Table 1.1: Types of Ectopic Beats
4. To train and validate a Random Forest classifier for accurate multi-class
heartbeat classification based on AAMI EC57 standards.
1.3 Conclusion
This chapter introduced the significance of detecting serious cardiac
conditions. It highlighted the limitations of manual ECG
interpretation and the need for automated, accurate solutions. The
proposed system utilizes the MIT-BIH Arrhythmia Database, applies
preprocessing and feature extraction techniques, and employs a
Random Forest classifier to categorize ECG beats into five AAMI-
defined classes. The integration of a web-based interface further
enhances usability, making the system suitable for practical
deployment in clinical or remote settings.
CHAPTER 2
LITERATURE REVIEW
Automated ECG analysis has been a growing area of research, driven by the
need for efficient and accurate cardiac diagnostics. Numerous studies have
explored the use of machine learning for arrhythmia detection, with a particular
focus on classifying abnormal beats such as ectopic beats. More recent work has
explored deep learning models, particularly Convolutional Neural Networks
(CNNs), which can learn hierarchical features directly from raw ECG waveforms.
While CNNs and LSTMs show superior accuracy, there is limited work on hybrid
systems that combine interpretability with accuracy for ectopic beat detection in
real-time scenarios.
3. Availability of Standardized Datasets
The MIT-BIH Arrhythmia Database from PhysioNet is a globally
recognized dataset that provides reliable annotations and is used as a
benchmark in many research studies. This availability enables
reproducible and comparative research [2].
4. Success of ML in Healthcare
Recent studies and publications have shown that machine learning
models like Random Forests, CNNs, and LSTMs outperform traditional
rule-based ECG classifiers. This provides a strong technical foundation
and validation for pursuing an ML-based solution [3].
Rule-Based Systems:
Initial arrhythmia detection methods were predominantly rule-based systems,
relying on handcrafted algorithms that analyzed ECG waveforms based on
amplitude thresholds, time intervals, and signal morphology. Although these
systems offered interpretable logic, they lacked adaptability across patient
profiles and were highly susceptible to noise and signal variability.
Moody and Mark (2001) [4] highlighted how the early use of the MIT-BIH
dataset in rule-based classifiers provided the foundation for standardized
evaluation but exposed limitations in handling class imbalance and beat
variability.
Hybrid Models:
The goal of this project is to develop an ECG beat classification system that is
interpretable, for clinical use and validation, and deployable via a user-friendly
web interface.
Random Forests offer a strong balance of accuracy, speed, and transparency,
especially in cases involving noisy biomedical signals. Compared to deep neural
networks, they are less prone to overfitting and can perform well even with
smaller datasets like MIT-BIH.
2.4 Summary of Existing Studies
4. Generalizability Issues: Most studies rely heavily on limited datasets such as
the MIT-BIH Arrhythmia Database. Models trained exclusively on these datasets
often fail to generalize across different populations, device types, or signal
acquisition conditions.
To bridge the research gaps outlined above, this project proposes a practical and
interpretable solution tailored to ectopic beat classification in real-time scenarios:
1. Use of interpretable features rather than opaque latent vectors.
2. A Random Forest classifier for balanced accuracy and robustness.
3. Web-based integration, so that the classifier can be used in real time through
a web interface.
CHAPTER 3
ECG SIGNAL PREPROCESSING & ML BASED BEAT
CLASSIFICATION
3.1 Introduction
Electrocardiogram (ECG) signals, being non-stationary and prone to various
types of noise and artifacts, require systematic preprocessing before they can be
used effectively in machine learning models. This chapter outlines the end-to-end
workflow for ECG-based beat classification, starting from raw signal acquisition to
model deployment. The process involves critical steps such as noise filtering, beat
segmentation, feature extraction, and supervised classification. Each stage plays a
vital role in ensuring that the ECG data is clean, meaningful, and structured
appropriately for accurate and reliable classification of arrhythmic beats. The
overall methodology is designed to ensure both clinical relevance and
computational efficiency, ultimately enabling real-time analysis through a web-
based interface.
The complete process of ECG signal analysis with ectopic beat classification
using machine learning is illustrated in Figure 3.1. It begins with an ECG dataset,
which undergoes preprocessing: conversion to an ECG signal array, filtering to
remove noise, segmentation to isolate meaningful portions of the signal, and
labelling.
[Block diagram: ECG Dataset → ECG signal array (WFDB) → Filtering → Segmentation → Labelling (preprocessing) → Feature extraction → ML training → Web page]
Figure 3.1: Block diagram for ectopic beat classification using machine learning
After preprocessing, the segmented signals are passed through a feature extraction
stage where critical characteristics such as intervals and amplitudes are derived.
These features serve as input to a machine learning classifier, which is trained to
identify various beat types. The trained model is then integrated into a web platform
for real-time interaction, visualization, and decision support.
1. ECG Dataset
The process starts with acquiring the ECG dataset, which contains raw ECG signals.
2. Preprocessing
The ECG data undergoes several preprocessing steps, which include:
ECG Signal Array: Converting raw ECG data into a signal array suitable for
processing, using the WFDB library.
Filtering: Removing noise, baseline wander, and other artifacts from the ECG
signal.
Segmentation: Dividing the ECG signal into smaller, manageable segments
(e.g., heartbeat cycles).
Labelling: The P, Q, S, and T points in each ECG cycle are labelled using an
adaptive windowing method based on the beat's morphology and clinical
characteristics.
3. Feature Extraction
From the segmented and filtered ECG signals, key features (like RR intervals,
QRS duration, amplitudes, etc.) are extracted for analysis.
4. ML Training (Machine Learning Training)
The extracted features are used to train a machine learning model. This model
can then be used for tasks such as classification, anomaly detection, or
diagnosis.
5. Web Page
Finally, the trained model is deployed on a web page to enable user interaction
or visualization of results.
3.2.2 Preprocessing:
This step involves filtering, ECG beat segmentation, and labelling.
Filtering:
A Butterworth bandpass filter was used, with a lower cutoff frequency of 0.5 Hz to
remove baseline wander and an upper cutoff frequency of 40 Hz to remove muscle
noise and interference.
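The filtering step can be sketched as follows. This is a minimal illustration, assuming SciPy is available; the fourth-order setting matches the appendix code, while the second-order-sections form and the zero-phase `sosfiltfilt` call are choices made here for numerical stability and are not necessarily what the project used. The synthetic input signal is purely illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 360  # MIT-BIH sampling rate in Hz


def bandpass_ecg(signal, lowcut=0.5, highcut=40.0, fs=FS, order=4):
    """Zero-phase Butterworth bandpass (0.5-40 Hz by default)."""
    # Second-order sections stay numerically stable for very low cutoffs.
    sos = butter(order, [lowcut, highcut], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)


# Illustrative input: a 10 Hz component riding on a constant baseline offset.
t = np.arange(0, 10, 1 / FS)
raw = 1.0 + np.sin(2 * np.pi * 10 * t)
filtered = bandpass_ecg(raw)

# Look at the middle of the record only, to ignore edge effects.
middle = filtered[2 * FS : -2 * FS]
print(f"mean after filtering: {middle.mean():.3f}")
```

The constant offset stands in for baseline wander: it lies below the 0.5 Hz cutoff and is removed, while the 10 Hz component lies inside the passband and is kept.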
Segmentation:
In the segmentation process, R-peak locations were first taken from the
annotation files. The ECG data is sampled at 360 Hz.
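As a rough sketch of R-peak-based segmentation, the function below cuts one window around each annotated R-peak. The function name and the window lengths (`pre_s`, `post_s`) are hypothetical values chosen for illustration; the report does not state the window used in the project.

```python
import numpy as np

FS = 360  # MIT-BIH sampling rate in Hz


def segment_beats(signal, r_peaks, pre_s=0.25, post_s=0.45, fs=FS):
    """Cut one fixed-length window around each R-peak sample index.

    pre_s and post_s are illustrative values, not taken from the report.
    """
    pre, post = round(pre_s * fs), round(post_s * fs)
    beats, kept = [], []
    for r in r_peaks:
        start, stop = r - pre, r + post
        if start < 0 or stop > len(signal):
            continue  # skip beats whose window would run off the record
        beats.append(signal[start:stop])
        kept.append(r)
    return np.array(beats), np.array(kept)


# Illustrative record of 10 s with four R-peaks, two of them too close to the edges.
ecg = np.zeros(10 * FS)
beats, kept = segment_beats(ecg, [50, 400, 800, 3590])
print(beats.shape, kept)
```

Beats too close to the start or end of a record are dropped rather than padded, so every segment has the same length.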
Labelling:
Inspired by Umer et al. (2014), an adaptive labelling method was adopted
that scales the search windows for the P-wave and T-wave relative to each beat’s
local RR interval, thereby aligning better with physiological timing variability.
Unlike rigid fixed windows, this approach adjusts dynamically to changes in
heart rate and beat morphology, yielding high accuracy (± 5%) even on
arrhythmic ECGs such as those in the MIT-BIH database.
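The idea of scaling the search windows with the local RR interval can be sketched as below. The function names and the window fractions (`p_frac`, `t_frac`) are hypothetical placeholders, not the values used in the project or by Umer et al.; the sketch only shows how the windows shrink as the heart rate rises.

```python
import numpy as np


def adaptive_windows(r_index, rr_samples, p_frac=(0.35, 0.08), t_frac=(0.10, 0.50)):
    """Return (start, stop) sample windows for the P-wave and T-wave searches.

    The fractions of the RR interval are hypothetical placeholders.
    """
    p_win = (r_index - round(p_frac[0] * rr_samples),
             r_index - round(p_frac[1] * rr_samples))
    t_win = (r_index + round(t_frac[0] * rr_samples),
             r_index + round(t_frac[1] * rr_samples))
    return p_win, t_win


def peak_in_window(signal, window):
    """Index of the largest sample inside a window, or None if the window is empty."""
    start, stop = max(window[0], 0), min(window[1], len(signal))
    if stop <= start:
        return None
    return start + int(np.argmax(signal[start:stop]))


# The same R-peak position searched at two heart rates (360 Hz sampling).
slow_p, slow_t = adaptive_windows(r_index=1000, rr_samples=360)  # 60 bpm
fast_p, fast_t = adaptive_windows(r_index=1000, rr_samples=180)  # 120 bpm

# Illustrative beat with a small bump before and after the R-peak.
beat = np.zeros(2000)
beat[900], beat[1150] = 0.2, 0.3
p_index = peak_in_window(beat, slow_p)
t_index = peak_in_window(beat, slow_t)
```

At the faster rate both windows are roughly half as wide, which is the behaviour a fixed window cannot provide.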
3.2.3 Feature Extraction:
After pre-processing, the Time-domain, Morphological, Slope-based, and
Heart rate variability features were extracted.
Feature Category | Included Features | Clinical Significance
Time-Domain Features | PR interval, QRS duration, QT interval, RR interval, Heart rate, RR variation | Reflect conduction times through atria, AV node, and ventricles. Abnormal durations indicate blocks, pre-excitation, or ventricular origin. RR interval reflects rhythm regularity.
Morphological Features | P, Q, R, S, T amplitudes, Pre-RR avg, post-RR avg | Represent electrical activity of atria and ventricles. Abnormal shapes or amplitudes suggest ectopic origin, hypertrophy, or ischemia. P wave absence/inversion → atrial ectopy. High R or deep S → ventricular ectopy.
Slope-Based Features | PR slope, QR slope, RS slope, ST slope | Reflect the rate of voltage change between waveform segments. Steep slopes may indicate rapid depolarization or abnormal conduction. ST slope changes may reflect ischemia or repolarization abnormalities.
Heart Rate Variability (HRV) Features | SDNN, RMSSD | Measure beat-to-beat variability and autonomic nervous system balance. High variability may indicate arrhythmia or ectopic activity. RMSSD reflects short-term variability; SDNN reflects overall variability.
Table 3.3: Feature Extraction
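As a small worked example of the two HRV features, the sketch below computes SDNN and RMSSD for four made-up RR intervals. The population form of the standard deviation is used, which is what `numpy.std` returns by default; the RR values are illustrative and are not project data.

```python
import math


def sdnn(rr):
    """Standard deviation of the RR intervals (population form)."""
    mean = sum(rr) / len(rr)
    return math.sqrt(sum((x - mean) ** 2 for x in rr) / len(rr))


def rmssd(rr):
    """Root mean square of successive RR differences."""
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


rr_seconds = [0.80, 0.82, 0.78, 0.80]  # illustrative RR intervals in seconds
print(round(sdnn(rr_seconds), 4), round(rmssd(rr_seconds), 4))  # 0.0141 0.0283
```

RMSSD is larger here because it reacts to the beat-to-beat jumps (0.02, -0.04, 0.02 s), whereas SDNN measures only the spread around the mean of 0.80 s.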
3.2.4 ML Training
The extracted features, together with the corresponding beat annotation and AAMI
label, were assembled into a DataFrame for ML model training. The complete
DataFrame contains the following number of beats per class:
Label Number of Beats
N 85,738
Q 8,007
V 3,189
S 2,114
F 772
Table 3.4.1: Total Number of Beats in the Dataset
This data was split into 80% training and 20% testing sets as shown below:
Train set(80%)
Label Number of Beats
N 68,590
Q 6,406
V 2,551
S 1,691
F 618
Table 3.4.2: Train set
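The per-class counts can be checked with simple arithmetic. The sketch below applies the 80% fraction to each class total from Table 3.4.1; rounding per class is only an approximation of what a stratified splitter does internally, but here it reproduces the train counts above and the test counts reported in Chapter 4.

```python
# Per-class totals from Table 3.4.1.
totals = {"N": 85738, "Q": 8007, "V": 3189, "S": 2114, "F": 772}

TRAIN_FRACTION = 0.8

# A stratified split applies the same fraction to every class.
train = {label: round(n * TRAIN_FRACTION) for label, n in totals.items()}
test = {label: totals[label] - train[label] for label in totals}

for label in totals:
    print(label, train[label], test[label])
print("test total:", sum(test.values()))
```

Because every class keeps the same 80/20 proportion, even the smallest class (F) still contributes 154 beats to the test set.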
Figure 3.4.1 illustrates the internal mechanism of the Random Forest model
used for classifying ECG beats based on the 19 extracted features. Each ECG beat is
first transformed into a numerical feature vector capturing time-domain,
morphological, slope-based, and heart rate variability (HRV) characteristics. This
feature vector is then passed to an ensemble of decision trees, each trained on a
random subset of features and data. Each tree independently predicts a class label
(e.g., Normal (N), Ventricular (V), etc.). In the example shown, Decision Tree #1 and
#100 predict class 'N', while Decision Tree #2 predicts 'V'.
The final classification is determined by majority voting, where the most frequently
predicted class among all the trees is selected as the final output. For instance, if
60 out of 100 trees vote for class 'N' and 40 vote for 'V', the final prediction will be
'N'.
This ensemble strategy improves model robustness, reduces overfitting, and
provides reliable classification, especially for imbalanced datasets like MIT-BIH,
where some beat types are underrepresented.
Figure 3.4.1 Internal workflow of Random Forest Model for ECG Feature-based classification
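The voting step described above can be sketched in a few lines. The function name is hypothetical and the vote counts repeat the 60/40 example from the text. Note that scikit-learn's `RandomForestClassifier` combines trees by averaging their predicted class probabilities rather than by counting hard votes, so this sketch illustrates the principle rather than that library's exact procedure.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most frequent class label among the individual tree outputs."""
    label, _count = Counter(tree_predictions).most_common(1)[0]
    return label


# 100 trees: 60 predict 'N' and 40 predict 'V', as in the example above.
votes = ["N"] * 60 + ["V"] * 40
print(majority_vote(votes))  # N
```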
3.2.5 Web Page development
FRONTEND
The frontend is the user-facing part of the project — the webpage or
interface that users interact with to upload ECG features, initiate
classification, and view the results. It serves as a bridge between the user
and the backend model.
User selects and uploads ECG features data.
Sends it to backend via a request.
Displays the classification results and graphs after receiving response.
[Flow diagram: User opens webpage → Upload ECG file (.csv format) → Webpage sends file to ML model → Display results on webpage: beat classification summary]
3.5.1 Key Functionalities:
3. ECG Feature Upload:
4. Prediction Mechanism:
5. Result Display:
BACKEND
Handles ECG features data.
Classifies the ECG beat using the trained Random Forest model.
Sends results back to frontend for display.
File Validation:
• Checks format
• Reads ECG data
Web Development requirements
Frontend
Languages: Python
Libraries : streamlit, pickle, numpy, pandas, matplotlib.pyplot,
collections.Counter, io
Backend
Framework: Python Flask.
Functions:
Accept ECG input from user
Preprocess and extract features
Call the trained ML model for classification
Return and display results
3.3 DATASET
The dataset used in this project is the MIT-BIH Arrhythmia Database, developed
by Beth Israel Hospital and MIT, which contains 48 half-hour ECG recordings
from 47 subjects, sampled at 360 Hz per channel. The leads used are modified
limb lead II (MLII) and V1, occasionally V2 or V5, and in one instance V4. Each
beat is annotated by expert cardiologists with rhythm and beat type labels. The
data is distributed in WFDB format, comprising .dat (ECG signal), .hea (record
information, sampling frequency, etc.), and .atr (annotation) files.
From this dataset, a set of 19 meaningful features was extracted, and the
corresponding beat-level annotations were grouped into 5 classes according
to the AAMI EC57 (Association for the Advancement of Medical
Instrumentation) standard.
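The grouping of beat-level annotations into AAMI classes can be sketched as a lookup table. The table below is the grouping of MIT-BIH beat symbols commonly used in the literature; it is an assumption that the project used exactly this table, since the project's own mapping code is not reproduced in full in this report.

```python
# Commonly used grouping of MIT-BIH beat symbols into the five AAMI classes.
# Assumption: the project's own mapping table may differ in detail.
AAMI_GROUPS = {
    "N": ["N", "L", "R", "e", "j"],
    "S": ["A", "a", "J", "S"],
    "V": ["V", "E"],
    "F": ["F"],
    "Q": ["/", "f", "Q"],
}

# Invert the groups into a symbol -> class lookup.
SYMBOL_TO_AAMI = {sym: cls for cls, syms in AAMI_GROUPS.items() for sym in syms}


def to_aami(symbol):
    """Map one MIT-BIH beat symbol to its AAMI class; None for non-beat annotations."""
    return SYMBOL_TO_AAMI.get(symbol)


mapped = [to_aami(s) for s in ["N", "L", "A", "V", "F", "/", "+"]]
print(mapped)
```

Non-beat annotations such as the rhythm-change marker `+` map to `None` and would be dropped before training.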
Statistical (HRV) | RR_SDNN (standard deviation of RR intervals), RR_RMSSD (root mean square of successive differences)
Table 3.6: The features extracted
The obtained features, along with the corresponding beat annotation and AAMI
label, are assembled into a DataFrame for ML model training.
3.4 Conclusion
This project uses ECG data from the MIT-BIH Arrhythmia Database to detect heart
arrhythmias with a Random Forest classifier. Implemented in Python, it extracts
time-domain, morphological, slope-based, and HRV features.
Libraries used include WFDB, NumPy, Pandas, SciPy, Scikit-learn, and Matplotlib
for data handling, ML, and visualization. A simple HTML/CSS/JavaScript frontend
is included for user interaction. The system offers an efficient, automated method
for ectopic beat detection using biomedical signal processing and machine learning.
CHAPTER 4
RESULTS & ANALYSIS
The experimental environment was designed to evaluate the
performance of an ECG signal classification system using machine learning
techniques. The simulation and implementation were carried out using
Python, with key libraries including NumPy, Pandas, Matplotlib, SciPy, and
scikit-learn for data manipulation, visualization, signal processing, and model
training.
The ECG dataset used in this study consists of labelled beats categorized as
N (Normal), Q, V, S, and F classes. Preprocessing steps included filtering,
segmenting, and labelling the signals before feature extraction.
The features extracted were grouped into four major categories: Time-Domain
Features, Morphological Features, Slope-Based Features, and Heart Rate
Variability (HRV) Features. These were selected based on their clinical
significance in detecting abnormalities in heart rhythms and conduction
pathways. The feature extraction process resulted in a diverse set of input
parameters fed into machine learning algorithms for classification.
The total dataset was split into 80% training and 20% testing sets. The
training set included 68,590 N beats, 6,406 Q beats, 2,551 V beats, 1,691 S
beats, and 618 F beats. The test set comprised 17,148 N beats, 1,601 Q beats,
638 V beats, 423 S beats, and 154 F beats.
Classification was performed using a Random Forest model, and the model’s
performance was evaluated using accuracy, precision, recall, and F1-score.
Class             Precision  Recall  F1-score  Support
F                 0.95       0.75    0.84      154
N                 0.99       1.00    1.00      17,148
Q                 1.00       0.99    1.00      1,601
S                 0.98       0.84    0.91      423
V                 0.98       0.93    0.96      638
Accuracy                             0.99      19,964
Macro Average     0.98       0.90    0.94      19,964
Weighted Average  0.99       0.99    0.99      19,964
Table 4.1: Evaluation Metrics
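The summary rows of Table 4.1 can be related to the per-class rows with a short calculation. The sketch below recomputes the macro averages and the support-weighted recall from the rounded per-class values in the table; because the inputs are already rounded to two decimals, results computed this way may differ from the reported figures in the last digit.

```python
# Per-class values copied from Table 4.1: (precision, recall, F1, support).
per_class = {
    "F": (0.95, 0.75, 0.84, 154),
    "N": (0.99, 1.00, 1.00, 17148),
    "Q": (1.00, 0.99, 1.00, 1601),
    "S": (0.98, 0.84, 0.91, 423),
    "V": (0.98, 0.93, 0.96, 638),
}


def macro_average(column):
    """Unweighted mean over classes: every class counts equally."""
    return sum(row[column] for row in per_class.values()) / len(per_class)


def weighted_average(column):
    """Mean weighted by support: large classes dominate."""
    total = sum(row[3] for row in per_class.values())
    return sum(row[column] * row[3] for row in per_class.values()) / total


macro = [round(macro_average(c), 2) for c in range(3)]
weighted_recall = round(weighted_average(1), 2)
print(macro, weighted_recall)  # [0.98, 0.9, 0.94] 0.99
```

The macro recall of 0.90 is pulled down by the F and S classes, while the weighted recall of 0.99 is dominated by the 17,148 N beats. This is why the macro figures are the more informative ones for an imbalanced dataset.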
Analysis:
The classification model was evaluated using standard performance metrics—
Precision, Recall, F1-score, and Support—across five beat classes: F (Fusion),
N (Normal), Q (Unknown), S (Supraventricular), and V (Ventricular). The
results are summarized below:
Key Observations
Figure 4.2.2: output after uploading the ECG features
CHAPTER 5
CONCLUSION & FUTURE SCOPE
5.1 Conclusion
5.1.1 Advantages
Enables faster and more efficient ectopic beat detection, especially
in emergency or high-volume scenarios.
Reliable performance across multiple ectopic classes (N, S, V, F, Q)
improves diagnostic confidence.
5.1.2 Disadvantages
The segmentation and feature extraction process relies heavily
on correctly identifying R-peaks. Any error in R-peak detection
(due to noise or motion artifacts) can cause incorrect
classification.
5.1.3 Applications
Clinical Decision Support Systems:
Assists doctors in diagnosing ectopic beats more quickly and
accurately.
Reduces the workload on cardiologists by automating
5.2 Future Scope
REFERENCES
[1] https://www.who.int/health-topics/cardiovascular-diseases#tab=tab_1
[2] https://physionet.org/content/mitdb/1.0.0/
[3] https://accessmedicine.mhmedical.com/content.aspx?bookid=2725&sectionid=225759283
[4] G. B. Moody and R. G. Mark, “The impact of the MIT-BIH arrhythmia database,” IEEE
Engineering in Medicine and Biology Magazine, vol. 20, no. 3, pp. 45–50, May 2001, doi:
10.1109/51.932724.
[6] A. L. Goldberger et al., “PhysioNet: A Web-Based Resource for the Study of Physiologic
Signals,” IEEE Engineering in Medicine and Biology Magazine, vol. 20, no. 3, pp. 70–75, May–
Jun. 2001, doi: 10.1109/51.932728.
[7] Association for the Advancement of Medical Instrumentation, Testing and Reporting
Performance Results of Cardiac Rhythm and ST Segment Measurement Algorithms, AAMI
EC57:2012, AAMI, 2012.
[10] MIT Laboratory for Computational Physiology, “WFDB Python Toolbox,” [Online]. Available:
https://github.com/MIT-LCP/wfdb-python. [Accessed: Jul. 5, 2025].
[11] A. Gour, M. Gupta, R. Wadhvani, and S. Shukla, “ECG based heart disease classification:
Advancement and review of techniques,” Procedia Computer Science, vol. 235, pp. 1634–1648,
2024, doi: 10.1016/j.procs.2024.04.155.
[12] A. N. Mahmood, N. N. Q. Zulkefli, and A. A. Manaf, “A new hierarchical method for inter-
patient ECG arrhythmia classification using random projections and RR intervals,” Computer
Methods and Programs in Biomedicine, vol. 208, p. 106226, 2021, doi:
10.1016/j.cmpb.2021.106226.
APPENDICES
Figure (a): Data frame, which was given to ML model for training & testing
(*This is only part of the data frame; the complete data frame has 99821 rows and 22 columns.)
Webpage Creation
import streamlit as st
import pickle
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
#import io
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Amplitude (mV)")
    ax.grid(True, alpha=0.3)
    ax.legend()
    plt.tight_layout()
    return fig

    except Exception as e:
        st.sidebar.error(f"Error loading model: {str(e)}")

# Main content
if st.session_state.model is None:
    st.info("Please upload your trained model file (.pkl) in the sidebar to get started")
elif data_file is None:
    st.info("Please upload your ECG features file (.csv or .xlsx) in the sidebar")
else:
    try:
        # Load ECG data
        if data_file.name.endswith('.csv'):
            df = pd.read_csv(data_file)
        else:
            df = pd.read_excel(data_file)

        if missing_cols:
            st.warning(f"⚠ Added missing columns with default values: {', '.join(missing_cols)}")

        # Make predictions
        predictions = model.predict(df_processed)

        # Summary
        st.header("Prediction Results")
        summary = Counter(predictions)

        if max_plots == 1:
            num_plots = 1
            st.info("Displaying the single ECG waveform in your dataset")
        else:
            num_plots = st.slider("Number of ECG waveforms to display:", 1, max_plots,
                                  min(5, max_plots))

            try:
                fig = plot_synthetic_ecg(df_processed.iloc[i], i, prediction)
                st.pyplot(fig)
                plt.close()  # Close figure to free memory
            except Exception as e:
                st.error(f"Error plotting beat {i}: {str(e)}")

        # Download results
        st.header("Download Results")
        results_df = pd.DataFrame({
            'Beat_Index': range(len(predictions)),
            'Prediction': predictions,
            'Meaning': [label_meaning.get(p, 'Unknown') for p in predictions]
        })
        csv = results_df.to_csv(index=False)
        st.download_button(
            label="Download Predictions as CSV",
            data=csv,
            file_name="ecg_predictions.csv",
            mime="text/csv"
        )
    except Exception as e:
        st.error(f"Error processing data: {str(e)}")

# Footer
st.markdown("---")
st.markdown("ECG Beat Classification System | Built with Streamlit ❤")
Code for webpage creation
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# --- Step 1: Load your CSV containing extracted features and AAMI labels ---
# Replace with your actual file path
df = pd.read_csv(r"C:\Adaptive labelling\(final)aami_standardized_features_adaptive_label.csv")
# --- Step 3: Split the data into 80% training and 20% testing ---
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)  # stratify=y keeps the class proportions the same in both splits
# zero_division=0 reports 0 instead of warning when a class has no predictions
print(classification_report(y_test, y_pred, zero_division=0))
print("Accuracy Score:", accuracy_score(y_test, y_pred))
AAMI Mapping
import pandas as pd
# --- Step 4: Print Sample Count Per Class (Before Saving) ---
print(" Sample count per AAMI class:")
print(df['aami_label'].value_counts())
# --- Step 5: Save the AAMI-labeled DataFrame (No Balancing Applied) ---
df.to_csv("(final)aami_standardized_features_adaptive_label.csv", index=False)
print("\n Saved to: (final)aami_standardized_features_adaptive_label.csv")
Code for AAMI mapping
import os

import numpy as np
import wfdb
from scipy.signal import butter, find_peaks

# ------------------------------
# 1. Load and Select Lead
# ------------------------------
def load_record(base_path, record_name):
    # Read the signal (.dat/.hea) and the beat annotations (.atr) of one record
    try:
        record = wfdb.rdrecord(os.path.join(base_path, record_name))
        annotation = wfdb.rdann(os.path.join(base_path, record_name), 'atr')
        print(f" Loaded: {record_name}")
        return record, annotation
    except Exception as e:
        print(f" Failed to load {record_name}: {e}")
        return None, None

def select_ecg_lead(record):
    # Prefer lead MLII; fall back to the first channel when it is absent
    lead_names = record.sig_name
    signals = record.p_signal
    if 'MLII' in lead_names:
        lead_index = lead_names.index('MLII')
        selected_lead = signals[:, lead_index]
        print(" ➤ Selected Lead: MLII (Lead II)")
    else:
        lead_index = 0
        selected_lead = signals[:, lead_index]
        print(f" ➤ MLII not found. Using lead: {lead_names[lead_index]}")
    return selected_lead, lead_names[lead_index]

def process_all_records(data_dir):
    processed_data = []
    dat_files = [f for f in os.listdir(data_dir) if f.endswith('.dat')]
    for dat_file in dat_files:
        record_name = os.path.splitext(dat_file)[0]
        record, annotation = load_record(data_dir, record_name)
        # load_record returns (None, None) on failure
        if record is not None and annotation is not None:
            signal, lead_used = select_ecg_lead(record)
            processed_data.append({
                'record_name': record_name,
                'lead_name': lead_used,
                'signal': signal,
                'annotation': annotation
            })
    return processed_data
# ------------------------------
# 2. Filtering
# ------------------------------
def butter_bandpass(lowcut, highcut, fs, order=4):
    nyquist = 0.5 * fs
    low = lowcut / nyquist
    high = highcut / nyquist
    b, a = butter(order, [low, high], btype='band')
    return b, a

def filter_loaded_records(all_records_data):
    for record in all_records_data:
        fs = 360
        raw_signal = record['signal']
        record['filtered_signal'] = apply_bandpass_filter(raw_signal, fs)
    return all_records_data
# ------------------------------
# 3. R-peak Detection & Labeling
# ------------------------------
def detect_r_peaks(filtered_signal, fs=360):
    r_peaks, _ = find_peaks(filtered_signal, height=0.5, distance=fs/2)
    return r_peaks

def label_all_r_peaks_for_all_records(all_filtered_data):
    for record in all_filtered_data:
        record['labeled_r_peaks'] = label_r_peaks_with_annotations(
            record['filtered_signal'], record['annotation']
        )
    return all_filtered_data
# ------------------------------
# 4. Adaptive PQRST Labeling
# ------------------------------
def label_pqrst_points_adaptive(filtered_signal, labeled_r_peaks, fs=360):
    pqrst_labeled_points = []
    for i, (r_index, symbol) in enumerate(labeled_r_peaks):
        if 0 < i < len(labeled_r_peaks) - 1:
            prev_rr = r_index - labeled_r_peaks[i - 1][0]
            next_rr = labeled_r_peaks[i + 1][0] - r_index
            rr_interval = int((prev_rr + next_rr) / 2)
        elif i > 0:
            rr_interval = r_index - labeled_r_peaks[i - 1][0]
        elif i < len(labeled_r_peaks) - 1:
            rr_interval = labeled_r_peaks[i + 1][0] - r_index
        else:
            rr_interval = int(0.8 * fs)

            if len(segment) == 0:
                labeled_points[point] = None
                continue
        pqrst_labeled_points.append(labeled_points)
    return pqrst_labeled_points
# ------------------------------
# 5. Annotation Map
# ------------------------------
annotation_map = {
    'N': 'Normal beat', 'L': 'LBBB', 'R': 'RBBB', 'A': 'APB', 'V': 'PVC',
    'F': 'Fusion', 'E': 'VEB', '/': 'Paced beat',
    'f': 'Fusion of paced and normal beat', 'j': 'Nodal escape beat'
}
# ------------------------------
# 6. Feature Extraction
# ------------------------------
def compute_hrv_features(r_peaks, fs=360):
    rr_intervals = np.diff(r_peaks) / fs
    if len(rr_intervals) < 2:
        return 0.0, 0.0
    sdnn = np.std(rr_intervals)
    rmssd = np.sqrt(np.mean(np.diff(rr_intervals) ** 2))
    return sdnn, rmssd
# ------------------------------
# 7. Execute Pipeline
# ------------------------------
if __name__ == "__main__":
    mitdb_path = r"C:\final requirements for ECG project\mit-bih-arrhythmia-database"
    all_records_data = process_all_records(mitdb_path)
    all_filtered_data = filter_loaded_records(all_records_data)
    all_labeled_data = label_all_r_peaks_for_all_records(all_filtered_data)
    all_features = []
    for record in all_labeled_data:
        record['pqrst_points'] = label_pqrst_points_adaptive(
            record['filtered_signal'], record['labeled_r_peaks']
        )
        r_peaks = [r[0] for r in record['labeled_r_peaks']]
        # Index the labelled P, Q, S, T points by their R-peak position
        pqrst = {
            r['R']: {
                'P': r['P'], 'Q': r['Q'], 'S': r['S'], 'T': r['T'], 'annotation': r['R_symbol']
            } for r in record['pqrst_points']
        }
        features_df = extract_ecg_features(record['signal'], r_peaks, pqrst)
        all_features.append(features_df)