tarekmasryo (Tarek Masryo) · GitHub

tarekmasryo/README.md

AI/ML Engineer building production ML services and reliable GenAI apps (RAG + Agents).
From raw data → validated pipelines → deployed APIs → decision-ready dashboards.

Kaggle Datasets Grandmaster • Kaggle Notebooks Master

GitHub • Website • Repos • LinkedIn

Kaggle • Hugging Face • Streamlit


🧭 What I do

| Area | What you can expect |
|---|---|
| Production ML delivery | Data validation + leak-safe evaluation → calibrated models → threshold policies → inference-ready artifacts |
| GenAI (RAG & Agents) | Grounded RAG, structured extraction/summarization, tool-calling agents with guardrails & evaluation |
| ML APIs & deployment | Dockerized FastAPI services, strict request/response schemas, versioned artifacts, CI-friendly delivery |
| MLOps & monitoring | MLflow tracking, monitoring signals (latency/errors/drift/cost), reproducibility and quality gates |
| Applied NLP & CV | NLP: classification/extraction/semantic search • CV: classification/detection/segmentation |

🌟 Featured

🧩 Dashboards & Apps

| Project | Focus | Link |
|---|---|---|
| Fraud Detection Dashboard | Streamlit app integrated with ML artifacts + decision-first UX | Repo |
| Streamlit profile | Deployed dashboards gallery | Profile |
| Hugging Face profile | Spaces + Datasets | Profile |

🤖 GenAI (RAG & Agentic Workflows)

| Project | Focus | Link |
|---|---|---|
| LLM System Ops — Production Telemetry | LLM/RAG monitoring signals: quality, cost, latency, failure patterns | Repo |
| RAG QA Evaluation Logs & Corpus | Retrieval + answer quality evaluation with realistic logs | Dataset |

📦 Data products (Kaggle)

| Dataset | What it’s for | Link |
|---|---|---|
| YouTube Shorts & TikTok Trends 2025 | Short-form trends analytics and virality exploration | Dataset |
| Cancer Risk Factors | Clean features for health EDA and risk modeling | Dataset |
| Football Matches 2024/2025 (Top Leagues + UCL) | Standardized match-level data for analytics/modeling | Dataset |
| Digital Lifestyle & Mental Wellness | Behavioral signals for wellbeing analytics and prediction | Dataset |

🧰 Systems & Pipelines

| Project | Focus | Link |
|---|---|---|
| Credit Card Fraud Detection — A Pipeline Journey | End-to-end pipeline layout + evaluation mindset | Repo |
| Text Sentiment Analysis | NLP workflow + clean evaluation structure | Repo |
| Pima Diabetes Pipeline | Tabular ML pipeline layout (train/evaluate/infer) with validation + reproducible runs | Repo |

🛠️ Tech stack

| Category | Tools |
|---|---|
| Languages & Core | Python, SQL, Bash, Git, Linux |
| Data & Analytics | NumPy, Pandas, Polars, DuckDB, Jupyter |
| ML / DL | PyTorch, TensorFlow, scikit-learn, XGBoost, LightGBM |
| NLP / CV / LLM | Hugging Face Transformers, OpenCV |
| Visualization & Apps | Matplotlib, Seaborn, Plotly, Streamlit, Gradio |
| APIs & Deployment | FastAPI, Pydantic, Docker, Postgres |
| GenAI / RAG Stack | LangChain, LlamaIndex, FAISS, pgvector |
| MLOps & Quality | MLflow, GitHub Actions, pytest, Ruff |

🤝 Collaboration

  • 🚀 Build & ship ML/GenAI products: FastAPI + Docker, clean contracts, production-ready delivery
  • 🧠 RAG/LLM reliability: retrieval evaluation, grounded answers, guardrails & regression suites
  • 🛠️ MLOps: MLflow tracking, CI quality gates, monitoring signals (latency/errors/drift/cost)

Best contact: LinkedIn

If you find the work useful, a ⭐ helps more people discover it.

Pinned

  1. pima-diabetes-pipeline (Public)

    End-to-end diabetes risk prediction pipeline (Pima): EDA → feature engineering → calibration + cost-aware threshold → deployable artifacts.

    Jupyter Notebook 9

  2. tarekmasryo.github.io (Public)

    Tarek Masryo — AI/ML Engineer Portfolio

    JavaScript 2

  3. text-sentiment-analysis (Public)

    IMDB reviews sentiment analysis: EDA → TF-IDF baselines (NB/LogReg/Linear SVM + calibration) → F1 threshold tuning → explainability → BiLSTM baseline.

    Jupyter Notebook 6

  4. fraud-detection-dashboard (Public)

    Production-minded Streamlit + Plotly fraud detection dashboard with decision policies (Strict/Balanced/Lenient), cost-vs-threshold analysis, and calibrated model artifacts.

    Python 5

  5. rag-qa-logs-corpus-data (Public)

    Synthetic multi-table RAG QA telemetry benchmark (corpus→chunks→retrieval→eval): labels for correctness/faithfulness/hallucination + cost/latency for RAG evaluation and dashboards.

    2

  6. llm-system-ops-production-telemetry-sft-data (Public)

    Production-grade synthetic dataset for LLMOps: interaction-level telemetry (latency/cost/tokens), failure RCA, tool-use analytics, user feedback, plus 1:1 aligned SFT samples.

    1
