A comprehensive data warehouse solution for Ethiopian medical business data scraped from Telegram channels, including data scraping, object detection with YOLO, and ETL/ELT processes.
The repository is organized into the following directories:
- `.github/workflows/`: Configurations for GitHub Actions, enabling continuous integration and automated testing.
- `.vscode/`: Configuration files for the Visual Studio Code editor, optimizing the development environment.
- `app/`: Implementation of the machine learning model API, exposing the model through RESTful endpoints.
- `notebooks/`: Jupyter notebooks used for tasks such as data exploration, feature engineering, and preliminary modeling.
- `scripts/`: Python scripts for data scraping, cleaning, and loading.
- `tests/`: Unit tests to ensure the correctness and robustness of the data processing logic.
To run the project locally, follow these steps:
- Clone the repository:

  ```bash
  git clone https://github.com/epythonlab/EthiomedDataWarehouse.git
  cd EthiomedDataWarehouse
  ```

- Set up the virtual environment:

  Create a virtual environment to manage the project's dependencies.

  For Linux/macOS:

  ```bash
  python3 -m venv .venv
  source .venv/bin/activate
  ```

  For Windows:

  ```bash
  python -m venv .venv
  .venv\Scripts\activate
  ```

- Install dependencies:

  Install the required Python packages by running:

  ```bash
  pip install -r requirements.txt
  ```
- Navigate to the `scripts/` directory and run `telegram_scraper`. Ensure that the required libraries are installed and that the Telegram API ID and hash are stored in the `.env` file.
- Next, run `data_cleaner.py` to automatically clean the scraped data.
- Once the data is cleaned, run `store_data.py`. Before doing so, create a database in PostgreSQL, store its credentials in the `.env` file, and start the PostgreSQL server.
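The scraping and loading scripts above read their secrets from the `.env` file. As a minimal, dependency-free sketch of how such a file can be parsed and turned into a PostgreSQL connection string (the variable names `TELEGRAM_API_ID`, `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, and `DB_NAME` are illustrative assumptions, not necessarily the project's exact keys):

```python
def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    env = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

def postgres_dsn(env):
    """Build a PostgreSQL connection string from the parsed credentials."""
    return (
        f"postgresql://{env['DB_USER']}:{env['DB_PASSWORD']}"
        f"@{env.get('DB_HOST', 'localhost')}:{env.get('DB_PORT', '5432')}"
        f"/{env['DB_NAME']}"
    )
```

In practice you would more likely use a library such as `python-dotenv`; this sketch only illustrates the shape of the file and of the resulting connection string.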
- Go to the `ethio_medical_project` directory and explore the DBT configurations.
- Run the DBT models:

  ```bash
  dbt run
  ```

- Testing and documentation:

  ```bash
  dbt test
  dbt docs generate
  dbt docs serve
  ```
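The DBT steps above can also be chained from a small Python driver, which is convenient when the pipeline is run on a schedule. A minimal sketch (the helper name `run_dbt_steps` is hypothetical; `dbt docs serve` is left out because it starts a blocking local server):

```python
import shutil
import subprocess

# The dbt workflow steps from the instructions above, in order.
DBT_STEPS = [
    ["dbt", "run"],
    ["dbt", "test"],
    ["dbt", "docs", "generate"],
]

def run_dbt_steps(steps=DBT_STEPS, dry_run=False):
    """Run each dbt command in order; with dry_run=True (or when dbt is
    not installed) only return the commands that would be executed."""
    commands = [" ".join(step) for step in steps]
    if dry_run or shutil.which("dbt") is None:
        return commands
    for step in steps:
        subprocess.run(step, check=True)  # stop on the first failure
    return commands

print(run_dbt_steps(dry_run=True))
```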
- Setting up the environment: Ensure you have the necessary dependencies installed, including YOLO and its required libraries (e.g., OpenCV, and TensorFlow or PyTorch, depending on the YOLO implementation).

  ```bash
  pip install opencv-python
  pip install torch torchvision  # for PyTorch-based YOLO
  pip install tensorflow         # for TensorFlow-based YOLO
  ```

- Downloading the YOLO model:

  ```bash
  git clone https://github.com/ultralytics/yolov5.git
  cd yolov5
  pip install -r requirements.txt
  ```
- Once the YOLO model is installed, go to the `notebooks/` directory, run the notebook to check the outputs, and explore the PostgreSQL database for the stored data.
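Before detection results are written to PostgreSQL, they are typically filtered by a confidence threshold. A minimal stdlib sketch of that post-processing step, using hypothetical detection dicts (the field names `label`, `confidence`, and `bbox` are illustrative, not the project's actual schema):

```python
def filter_detections(detections, min_conf=0.5):
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d["confidence"] >= min_conf]

# Hypothetical detections in the shape YOLO-style results are often
# flattened to: class label, confidence score, pixel bounding box.
detections = [
    {"label": "bottle", "confidence": 0.91, "bbox": [10, 20, 110, 220]},
    {"label": "person", "confidence": 0.32, "bbox": [50, 60, 90, 140]},
]
print(filter_detections(detections))
```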
- Make sure you are in the root directory, then start the API server:

  ```bash
  uvicorn app.main:app --reload
  ```
- Note: Ensure that all the required libraries are installed. You can install any missing dependencies manually using `requirements.txt`.
We welcome contributions to improve the project. Please follow the steps below to contribute:
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Submit a pull request with a detailed explanation of your changes.
