UNIVERSITY OF ENGINEERING AND TECHNOLOGY, TAXILA
FACULTY OF TELECOMMUNICATION AND INFORMATION ENGINEERING
COMPUTER ENGINEERING DEPARTMENT
CEP PROPOSAL
Network Security and Cryptography
Team Members:
Habib Ur Rehman 21-CP-09
Muniza Babar 21-CP-31
Sumaiya Bibi 21-CP-37
Minal Fatima 21-CP-55
Areej Mehboob 21-CP-97
Dataset Overview
The **CIC-AndBot 2020** dataset, developed by the Canadian Institute for Cybersecurity
(CIC) at the University of New Brunswick, is tailored for evaluating Android botnet
detection systems. It simulates a real-world network environment and includes both benign
and malicious traffic traces originating from Android-based botnets. The dataset facilitates
binary and multi-class classification tasks for Android botnet detection, with a focus on
network-based behavior analysis.
Basic Dataset Information
| Attribute | Details |
|----------------------|-------------------------------------------------------------------------|
| Dataset Name | CIC-AndBot 2020 |
| Provider | Canadian Institute for Cybersecurity, University of New Brunswick (UNB) |
| Purpose | Detection and classification of Android botnets |
| Primary Focus | Network traffic-based analysis of botnets |
| Data Types | PCAP files, extracted features (CSV), labeled flows |
Botnet Families and Distribution
The dataset includes several Android botnet families with varying traffic behavior. Each
botnet scenario is simulated separately and traffic is labeled accordingly. Families include:
- **Geinimi**
- **DroidKungFu**
- **FakeInst**
- **Plankton**
- **Zsone**
- **BaseBridge**
- **Obad**
- **Youmi**
Benign traffic is also included, generated from common Android applications to simulate
real-world background activity.
Data Collection and Labeling
Data was collected in a controlled environment emulating a cellular network, where
infected and clean Android devices interacted with various online services. Network flows
were captured using Wireshark and exported in PCAP and CSV formats. Labeling was done
manually based on application behavior and botnet activity signatures.
Feature Extraction
Network flow features were extracted using CICFlowMeter, which computes statistical and
time-based characteristics of each flow from the captured packets. Key features include:
- Flow Duration
- Packet Length Statistics
- Flow Bytes and Packets per Second
- Flow Inter-arrival Times
- TCP Flags and Header Information
- Source/Destination Ports and Protocols
These features help differentiate between normal and malicious behavior by analyzing
traffic patterns.
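To make the listed features concrete, the following is a minimal sketch of how a few of them could be computed for a single flow. It is not CICFlowMeter's implementation: the function name `flow_features`, the input format (a list of `(timestamp_seconds, length_bytes)` pairs) and the feature names are assumptions made for illustration, and the tool's own units and definitions may differ.

```python
from statistics import mean, pstdev


def flow_features(packets):
    """Compute a few flow-level features for one flow.

    packets: list of (timestamp_seconds, length_bytes) tuples (assumed format).
    """
    packets = sorted(packets)                      # order packets by timestamp
    times = [t for t, _ in packets]
    lengths = [n for _, n in packets]

    duration = times[-1] - times[0]                # flow duration in seconds
    iats = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times

    return {
        "flow_duration": duration,
        "pkt_len_mean": mean(lengths),
        "pkt_len_std": pstdev(lengths),            # population std (assumption)
        "flow_bytes_per_s": sum(lengths) / duration if duration > 0 else 0.0,
        "flow_pkts_per_s": len(lengths) / duration if duration > 0 else 0.0,
        "flow_iat_mean": mean(iats) if iats else 0.0,
    }


# Hypothetical four-packet flow used only to exercise the function.
example = [(0.0, 60), (0.5, 1500), (1.0, 60), (2.0, 380)]
features = flow_features(example)
print(features)
```

A flow with short, regular inter-arrival times and small, uniform packets would score differently on these features than interactive application traffic, which is the kind of contrast a classifier can learn from.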
Literature Review
| Model / Paper Title | Year | Methodology & Techniques | Classifier & Accuracy | Classes | Strengths | Weaknesses |
|---|---|---|---|---|---|---|
| BotHunter: ML-based Android Botnet Detection | 2024 | Supervised learning using random forests on network features | ~94% accuracy | Binary | High interpretability, low false positive rate | May struggle with unseen botnet types |
| NetBot-ID: Deep Learning for Botnet Detection | 2023 | CNN on packet sequences and RNN on flow sequences | ~96% accuracy | Binary & Multi | Captures sequential patterns effectively | Requires large datasets |
| Hybrid Intrusion Detection using Flow Analysis | 2022 | Feature selection + ensemble learning | ~92% accuracy | Multi-class | Robust to noise, good generalization | Higher computational cost |
Proposed Methodology
The proposed system aims to utilize the CIC-AndBot 2020 dataset to develop a robust
botnet detection model. The process will include the following steps:
1. **Data Preprocessing**: Raw PCAP files will be converted into structured flow-based
CSV files using CICFlowMeter. Redundant or non-informative features will be removed.
Missing values will be handled and features normalized.
2. **Feature Selection**: Correlation analysis and Recursive Feature Elimination (RFE) will
be used to identify the most impactful features for classification.
3. **Model Development**: A Random Forest classifier will be implemented using Python
(Scikit-learn), given its interpretability and strong performance in similar contexts.
Additional models like SVM and XGBoost will also be tested for comparison.
4. **Classification Tasks**: Both binary classification (botnet vs. benign) and multi-class
classification (based on botnet family) will be explored.
5. **Evaluation**: Model performance will be assessed using accuracy, precision, recall,
F1-score, and the confusion matrix. K-fold cross-validation will be used to check that the
results generalize.
6. **Tools and Platform**: Implementation will be carried out on Google Colab with all
necessary Python libraries.
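The steps above can be sketched as a single Scikit-learn pipeline. This is an illustrative outline under stated assumptions, not the final implementation: synthetic data from `make_classification` stands in for the CICFlowMeter CSV files, and the number of selected features, the tree counts and the fold count are placeholder values that would need tuning on the real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.impute import SimpleImputer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the flow features (assumption: the real features
# would be loaded from the CSV files exported by CICFlowMeter).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=6, n_redundant=4, random_state=0
)

pipeline = Pipeline([
    # Step 1: handle missing values and normalize features.
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    # Step 2: Recursive Feature Elimination, dropping 4 features per round.
    ("select", RFE(
        RandomForestClassifier(n_estimators=25, random_state=0),
        n_features_to_select=8,
        step=4,
    )),
    # Step 3: Random Forest classifier.
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# Step 5: stratified K-fold cross-validation; every sample is predicted by a
# model that did not see it during training.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_pred = cross_val_predict(pipeline, X, y, cv=cv)

cm = confusion_matrix(y, y_pred)
report = classification_report(y, y_pred, target_names=["benign", "botnet"])
print(cm)
print(report)
```

Keeping imputation, scaling and feature selection inside the pipeline means they are refitted on the training folds only, so no information from a held-out fold leaks into preprocessing. For the multi-class task in step 4, the same pipeline could be reused with botnet family labels in `y` and a matching `target_names` list.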
This methodology is expected to offer practical insights into Android botnet behavior and
help design more resilient intrusion detection systems.