Project 1: Real-Time Retail Recommendation Engine

Description: Build a low-latency, personalized product recommender for an e-commerce platform, ingesting clickstreams and purchase history.

Requirements:
- Data processing: Kafka, Spark/PySpark, Hadoop
- Feature store: Redis, Snowflake
- Modeling: Scikit-Learn, LightGBM, XGBoost, Bayesian Optimization, SHAP
- Serving: FastAPI, Docker, Kubernetes, AWS EC2/EKS, Lambda, SageMaker Endpoints
- Monitoring & Logging: Prometheus, Grafana, CloudWatch, ELK (Elasticsearch)
- UI: Streamlit UI, Flask

Process:
1. Data ingestion: Deploy Kafka producers to capture user events (see the sketch after this entry).
2. ETL pipeline: Use Spark on Hadoop cluster; write cleaned tables into Snowflake.
3. Feature engineering: Build features with Pandas, NumPy, Prophet for seasonality.
4. Model training & tuning: Train LightGBM/XGBoost on SageMaker; optimize hyperparameters via Bayesian Optimization.
5. Explainability: Analyze feature impact with SHAP.
6. Containerize: Package model server in Docker; push to ECR.
7. Deploy: Orchestrate with Kubernetes/EKS; expose via FastAPI behind AWS ALB.
8. Real-time scoring: Lambda functions triggered by Kafka; cache lookups in Redis.
9. Monitoring: Track latency & errors in Prometheus/Grafana; ship logs to Elasticsearch.
10. Dashboard: Build admin UI in Streamlit to visualize recommendations and drift.
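A minimal sketch of step 1 using kafka-python: a producer that publishes clickstream events keyed by user so each user's events stay ordered. The broker address, topic name (user-events), and event fields are illustrative assumptions, not part of the spec.

# Minimal clickstream producer sketch (kafka-python).
import json
import time
import uuid

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",          # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    key_serializer=lambda k: k.encode("utf-8"),
)

def publish_click(user_id: str, product_id: str, action: str) -> None:
    """Send one user event, keyed by user_id so a user's events stay in partition order."""
    event = {
        "event_id": str(uuid.uuid4()),
        "user_id": user_id,
        "product_id": product_id,
        "action": action,                        # e.g. "view", "add_to_cart", "purchase"
        "ts": time.time(),
    }
    producer.send("user-events", key=user_id, value=event)  # placeholder topic name

if __name__ == "__main__":
    publish_click("u-123", "sku-987", "view")
    producer.flush()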
Project 2: Automated Document Intelligence Platform

Description: End-to-end system to ingest, OCR-extract, classify, and analyze business documents (invoices, contracts).

Requirements:
- Ingestion & Storage: S3, Glue, Athena
- OCR & NLP: AWS Textract, Tesseract, spaCy, NLTK, Transformers (BERT), LangChain, Whisper
- Database & Search: Elasticsearch, Pinecone, Neo4j (knowledge graph)
- Orchestration: Airflow, Step Functions, AWS Lambda
- Web UI: Flask, React + Streamlit UI, Gradio
- Deployment: Docker, AWS ECS/Fargate, Kubernetes
- Monitoring: CloudWatch, Prometheus, Grafana

Process:
1. File ingestion: Upload docs to S3; trigger Glue crawler to catalog.
2. OCR: Invoke AWS Textract via Lambda; fall back to Tesseract if needed (see the sketch after this entry).
3. Text cleaning: Preprocess with spaCy, NLTK.
4. Embedding & semantic search: Generate embeddings via Transformers; store in Pinecone.
5. Entity extraction & classification: Use BERT fine-tuned on invoices; build graph in Neo4j.
6. Workflow orchestration: Airflow DAG or Step Functions for ETL→OCR→NLU.
7. API layer: Expose search/classify via FastAPI; containerize.
8. UI: Interactive demo in Gradio or Streamlit.
9. Monitoring & alerts: CloudWatch + Prometheus metrics.
10. Versioning & retraining: MLflow for model registry; schedule retraining with Airflow.
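A sketch of step 2, assuming an S3-triggered Lambda handler: try Textract's detect_document_text first, and fall back to local Tesseract OCR if the call fails. The bucket/key layout and the event shape are assumptions for illustration.

# Textract-first OCR with a Tesseract fallback (boto3 + pytesseract).
import boto3

textract = boto3.client("textract")
s3 = boto3.client("s3")

def extract_text(bucket: str, key: str) -> str:
    try:
        resp = textract.detect_document_text(
            Document={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        lines = [b["Text"] for b in resp["Blocks"] if b["BlockType"] == "LINE"]
        return "\n".join(lines)
    except Exception:
        # Fallback: download the image and OCR it locally with Tesseract.
        import io
        from PIL import Image
        import pytesseract

        obj = s3.get_object(Bucket=bucket, Key=key)
        image = Image.open(io.BytesIO(obj["Body"].read()))
        return pytesseract.image_to_string(image)

def handler(event, context):
    # Assumes an S3 "ObjectCreated" trigger event shape.
    record = event["Records"][0]["s3"]
    text = extract_text(record["bucket"]["name"], record["object"]["key"])
    return {"characters": len(text)}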
Project 3: Scalable Autonomous Vehicle Simulation & Planning

Description: Simulate fleets of autonomous vehicles, plan routes, and evaluate with traffic models.

Requirements:
- Simulator: SUMO, ROS
- Routing & Optimization: OR-Tools, Gurobi, Bayesian Optimization
- Data handling: Hadoop, Spark, Snowflake
- Compute: CUDA, TensorRT, ONNX, PyTorch, TensorFlow
- Orchestration: Kubernetes, Docker, AWS Batch, EC2 GPU (p3)
- Visualization: OpenCV, Matplotlib, Seaborn, Grafana

Process:
1. Scenario setup: Define road networks in SUMO; control via Python ROS scripts.
2. Data ingestion: Use Spark on Hadoop to process traffic logs; store in Snowflake.
3. Route planning: Formulate as optimization; solve with OR-Tools/Gurobi (see the sketch after this entry).
4. Model acceleration: Convert PyTorch/TensorFlow planning networks to ONNX; optimize with TensorRT on CUDA GPUs.
5. Batch simulations: Launch parallel jobs on Kubernetes/EKS or AWS Batch.
6. Metrics collection: Aggregate latency, collision stats in Snowflake; push to Prometheus.
7. Visualization: Plot routes and performance in OpenCV overlays; chart with Matplotlib/Seaborn.
8. Hyperparameter tuning: Use Bayesian Optimization to refine planning heuristics.
9. Continuous integration: MLflow to track experiments; deploy new planners via Docker.
10. Dashboard: Grafana dashboards for fleet health and simulation KPIs.
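A small sketch of the step 3 formulation using OR-Tools' routing solver. The 4-node distance matrix and single-vehicle setup are invented purely to illustrate how a route-planning problem plugs into the solver.

# Toy single-vehicle routing problem with OR-Tools.
from ortools.constraint_solver import pywrapcp, routing_enums_pb2

DIST = [  # symmetric distances between depot (node 0) and three waypoints
    [0, 9, 7, 4],
    [9, 0, 3, 6],
    [7, 3, 0, 5],
    [4, 6, 5, 0],
]

def solve_route():
    manager = pywrapcp.RoutingIndexManager(len(DIST), 1, 0)  # 1 vehicle, depot = node 0
    routing = pywrapcp.RoutingModel(manager)

    def distance_cb(from_index, to_index):
        return DIST[manager.IndexToNode(from_index)][manager.IndexToNode(to_index)]

    transit = routing.RegisterTransitCallback(distance_cb)
    routing.SetArcCostEvaluatorOfAllVehicles(transit)

    params = pywrapcp.DefaultRoutingSearchParameters()
    params.first_solution_strategy = (
        routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC
    )
    solution = routing.SolveWithParameters(params)

    index, route = routing.Start(0), []
    while not routing.IsEnd(index):
        route.append(manager.IndexToNode(index))
        index = solution.Value(routing.NextVar(index))
    route.append(manager.IndexToNode(index))
    return route, solution.ObjectiveValue()

if __name__ == "__main__":
    print(solve_route())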
Project 4: Real-Time Fraud Detection & Alerting System

Description: Detect anomalous transactions in streaming finance data and raise alerts with low false positives.

Requirements:
- Streaming: Kafka, Kinesis, Spark Streaming, PySpark
- Feature store: Redis, Snowflake
- Anomaly Detection: Scikit-Learn, Isolation Forest, Prophet (seasonal patterns)
- Model hosting: SageMaker, FastAPI, Lambda
- Alerting: AWS SNS, WebSocket, Grafana Alertmanager
- Logging & Audit: CloudTrail, Elasticsearch, Kibana

Process:
1. Stream ingestion: Deploy Kafka consumers; ingest transaction stream.
2. Feature computation: Real-time features in Spark Streaming; store in Redis.
3. Baseline modeling: Fit seasonal models with Prophet; detect outliers.
4. Machine learning: Train Isolation Forest/XGBoost on historical data stored in Snowflake (see the sketch after this entry).
5. Deployment: Expose models via SageMaker endpoints and FastAPI containers.
6. Real-time scoring: Lambda functions invoked per transaction; push scores via WebSocket to the front-end.
7. Alerting: SNS pushes SMS/email for high-risk flags; Grafana monitors thresholds.
8. Audit logs: Stream logs to Elasticsearch; build Kibana dashboards.
9. Feedback loop: Label confirmed fraud; retrain weekly via Glue & Airflow.
10. Security & compliance: Enforce IAM policies; encrypt data at rest/in transit.
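A sketch of the offline half of step 4: fitting an Isolation Forest on historical transaction features with scikit-learn. The feature columns, synthetic data, and contamination rate are assumptions; real features would be pulled from the Snowflake history.

# Isolation Forest on stand-in transaction features.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Stand-in for features queried from Snowflake (amount, hour-of-day, recent txn count).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "amount": rng.lognormal(mean=3.0, sigma=1.0, size=10_000),
    "hour": rng.integers(0, 24, size=10_000),
    "txn_count_24h": rng.poisson(lam=5, size=10_000),
})

features = ["amount", "hour", "txn_count_24h"]
model = IsolationForest(
    n_estimators=200,
    contamination=0.01,   # assumed fraud prevalence
    random_state=42,
).fit(df[features])

# decision_function: lower scores are more anomalous; predict == -1 flags outliers.
df["anomaly_score"] = model.decision_function(df[features])
df["is_anomaly"] = model.predict(df[features]) == -1
print(df["is_anomaly"].mean())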
Project 5: Conversational AI & Virtual Assistant

Description: End-to-end chatbot with multimodal capabilities and context management.

Requirements:
- NLP: Dialogflow, Rasa (optional), Transformers (BERT, Whisper), spaCy, NLTK
- Vector DB: FAISS, Pinecone
- Orchestration: LangChain, Airflow
- Backend: FastAPI, Flask, Firebase
- Deployment: Docker, Kubernetes, AWS Lambda, Cloud Functions
- Monitoring: Prometheus, Grafana, CloudWatch

Process:
1. Intent and entity design: Configure Dialogflow with training phrases.
2. Embedding pipeline: Use Whisper for speech-to-text; generate embeddings via Transformers; index in FAISS/Pinecone (see the sketch after this entry).
3. Context management: Build LangChain chains; store session state in Firebase.
4. Backend APIs: FastAPI endpoints for chat; integrate Dialogflow via webhook.
5. Orchestration: Use Airflow to retrain NLU models daily with new transcripts.
6. UI: Web chat widget built in Flask or Streamlit UI.
7. Containerization & scaling: Docker + Kubernetes on AWS EKS; Lambda for lightweight tasks.
8. Monitoring: Track user sessions and latencies in Prometheus/Grafana.
9. Analytics: Store logs in Elasticsearch; analyze in Kibana.
10. Continuous improvement: Use MLflow for experiment tracking; deploy updated NLU models.
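A sketch of the embedding-and-index half of step 2, assuming sentence-transformers for embeddings and a FAISS flat index. The model name and example utterances are placeholders, and Whisper transcription is assumed to happen upstream.

# Embed utterances and do a nearest-neighbour lookup in FAISS.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model

corpus = [
    "What is my order status?",
    "I want to return a product.",
    "Talk to a human agent.",
]
emb = model.encode(corpus, normalize_embeddings=True).astype("float32")

index = faiss.IndexFlatIP(emb.shape[1])           # inner product == cosine after normalization
index.add(emb)

query = model.encode(["where is my package"], normalize_embeddings=True).astype("float32")
scores, ids = index.search(query, k=2)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {corpus[i]}")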
Project 6: Social Media Analytics & Trend Prediction

Description: Ingest social feeds, analyze sentiment & topics, forecast viral trends.

Requirements:
- Scraping: Beautiful Soup, Selenium
- Streaming & storage: Kafka, Hadoop, Snowflake
- NLP: Transformers (BERT), spaCy, NLTK, TextBlob
- Time series: Prophet, SciPy
- Visualization: Matplotlib, Seaborn, Tableau, Grafana
- Deployment: Docker, AWS EC2, Lambda, Airflow

Process:
1. Data collection: Automate with Selenium and Beautiful Soup; push to Kafka.
2. Storage & ETL: Spark jobs on Hadoop; land into Snowflake.
3. Preprocessing: Clean text with spaCy/NLTK.
4. Topic modeling: Train LDA or Transformers-based classifiers.
5. Sentiment analysis: Fine-tune BERT; serve via FastAPI.
6. Trend forecasting: Forecast topic volumes with Prophet; refine with SciPy optimizations (see the sketch after this entry).
7. Dashboard: Build real-time Grafana dashboards; embed print-quality charts in Tableau.
8. Pipeline orchestration: Orchestrate scraping→ETL→modeling via Airflow.
9. Alerts: Lambda-triggered SNS alerts for spikes.
10. Scaling & monitoring: Kubernetes on AWS; metrics in CloudWatch/Prometheus.
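A sketch of step 6 with Prophet, run on a synthetic daily mention-count series that stands in for topic volumes aggregated out of Snowflake.

# Forecast daily mention volume for one topic with Prophet.
import numpy as np
import pandas as pd
from prophet import Prophet

# Synthetic daily mention counts with weekly seasonality.
days = pd.date_range("2023-01-01", periods=365, freq="D")
rng = np.random.default_rng(0)
volume = 200 + 50 * np.sin(2 * np.pi * days.dayofweek / 7) + rng.normal(0, 10, len(days))

df = pd.DataFrame({"ds": days, "y": volume})      # Prophet expects columns ds / y

m = Prophet(weekly_seasonality=True, yearly_seasonality=False)
m.fit(df)

future = m.make_future_dataframe(periods=14)      # forecast two weeks ahead
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())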
Project 7: Personalized Healthcare Predictive Analytics

Description: Predict patient outcomes, recommend interventions, visualize risk.

Requirements:
- Data platforms: Hadoop, Snowflake, AWS S3/Glue/Athena
- Modeling: Scikit-Learn, XGBoost, LightGBM, PyTorch, Keras, Transformers
- Optimization: Bayesian Optimization, SHAP for explainability
- Deployment: SageMaker, FastAPI, Docker, Kubernetes, AWS Lambda
- Visualization: Dashboards in Tableau, Streamlit UI, Matplotlib, Seaborn

Process:
1. Data ingestion: Load EHR data into S3; catalog with Glue.
2. Exploratory analysis: Query via Athena; visualize with Seaborn.
3. Feature engineering: Use Pandas and SciPy for numeric features, Transformers for unstructured notes.
4. Model training: Compare XGBoost/LightGBM vs. deep nets in PyTorch/Keras; tune with Bayesian Optimization.
5. Explainability: Compute SHAP values; build interpretability reports (see the sketch after this entry).
6. Deployment: Package best model in Docker; deploy on a SageMaker endpoint.
7. API & UI: Serve predictions via FastAPI; interactive dashboard in Streamlit.
8. Monitoring & retraining: Track data drift via SageMaker Model Monitor; schedule retraining with Airflow.
9. Security: Enforce IAM policies, encrypt PHI, and manage credentials with Secrets Manager.
10. Reporting: Export KPI charts to Tableau for clinicians.
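A sketch of steps 4-5 combined: train a gradient-boosted classifier on tabular features and produce a simple global SHAP importance summary. The patient features and toy outcome are invented; they stand in for the engineered EHR features from earlier steps.

# XGBoost + SHAP on stand-in patient features.
import numpy as np
import pandas as pd
import shap
from xgboost import XGBClassifier

rng = np.random.default_rng(7)
X = pd.DataFrame({
    "age": rng.integers(20, 90, 2000),
    "bmi": rng.normal(27, 5, 2000),
    "prior_admissions": rng.poisson(1.2, 2000),
})
# Toy outcome: readmission risk loosely driven by age and prior admissions.
y = ((0.02 * X["age"] + 0.5 * X["prior_admissions"] + rng.normal(0, 1, 2000)) > 2.5).astype(int)

model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1, eval_metric="logloss")
model.fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
# Mean absolute SHAP value per feature gives a simple global importance report.
importance = pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns)
print(importance.sort_values(ascending=False))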
Project 8: High-Frequency Trading Analytics Platform

Description: Ultra-low-latency analytics and backtesting system for trading strategies.

Requirements:
- Streaming: Kinesis, Kafka
- Compute: EC2 Spot GPU (for backtest ML), Ray for distributed execution
- Frameworks: PyTorch, TensorFlow, CUDA, TensorRT
- Backtesting: QuantLib, OR-Tools, Gurobi
- Data store: Redis, Snowflake, Hadoop
- Orchestration: Airflow, Step Functions
- Visualization: Grafana, Matplotlib, Seaborn

Process:
1. Market data feed: Ingest via Kinesis; buffer in Kafka.
2. Feature extraction: Real-time features in Spark Streaming; cache in Redis.
3. Strategy modeling: Build RL/supervised models in PyTorch; accelerate inference via TensorRT.
4. Backtesting engine: Use QuantLib for pricing; solve portfolio allocation via Gurobi and OR-Tools.
5. Distributed execution: Orchestrate with Ray across EC2 spot clusters (see the sketch after this entry).
6. Results storage: Persist trade logs to Snowflake.
7. Dashboard & alerts: Grafana for latency/trade P&L; SNS alerts for anomalies.
8. CI/CD: Dockerize strategies; deploy via Kubernetes/EKS.
9. Scheduling: Airflow DAG for nightly backtests; Step Functions for deploy pipelines.
10. Performance tuning: Profile with CUDA tools; iterate.
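A sketch of step 5: fanning a strategy parameter sweep out over Ray remote tasks. The backtest body is a stub; a real run would price with QuantLib and replay trade logs from Snowflake.

# Parallel parameter sweep with Ray.
import ray

ray.init(ignore_reinit_error=True)

@ray.remote
def backtest(lookback: int, threshold: float) -> dict:
    # Placeholder P&L computation standing in for a real QuantLib-based backtest.
    pnl = lookback * 0.1 - threshold * 2.0
    return {"lookback": lookback, "threshold": threshold, "pnl": pnl}

param_grid = [(lb, th) for lb in (10, 20, 50) for th in (0.5, 1.0, 2.0)]
futures = [backtest.remote(lb, th) for lb, th in param_grid]
results = ray.get(futures)

best = max(results, key=lambda r: r["pnl"])
print("best params:", best)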
1. Video ingestion: Stream
from Kinesis Video; shard to
Kafka.
2. Preprocessing: Decode
- CV frameworks:
frames with OpenCV; resize,
OpenCV, YOLO,
normalize.
Roboflow, ONNX,
3. Object detection: Run YOLO
TensorRT
in PyTorch; convert to ONNX
- Streaming: Kafka,
→ TensorRT on GPU.
Kinesis Video
4. Tracking & classification:
Streams
Use SORT/DeepSORT; classify
- Storage: S3,
actions with deep nets.
Hadoop
5. Audio analytics: Extract
- Modeling:
Detect, track, and audio; run Whisper for
9. Intelligent PyTorch,
classify speech-to-text.
Video Analytics & TensorFlow,
objects/activities in live 6. Event pipeline: Lambda
Surveillance Transformers (for
video feeds. triggers on detections; store
captioning),
metadata in Snowflake.
Whisper (audio)
7. Dashboard: Real-time
- Deployment: AWS
metrics in Grafana; playback
EC2 GPU, Lambda
UI in Streamlit.
(for triggers),
8. Model updates: Roboflow
Docker, Kubernetes
pipeline to label new data;
- Monitoring:
retrain weekly.
CloudWatch,
9. Scalability: Kubernetes
Prometheus,
auto-scale based on stream
Grafana
volume.
10. Alerts & export: SNS
email/SMS on critical events;
export clips to S3.
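A sketch of steps 2-3, using the ultralytics package as one convenient YOLO implementation: decode frames with OpenCV and print detections. The stream URL, resize dimensions, and confidence threshold are placeholders, and the ONNX → TensorRT conversion is not shown here.

# Frame decoding with OpenCV plus YOLO detection (ultralytics).
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                 # small pretrained model, downloaded on first use
cap = cv2.VideoCapture("rtsp://example.local/stream")  # placeholder stream URL

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (640, 384))  # normalize input size
    results = model(frame, conf=0.4, verbose=False)
    for box in results[0].boxes:
        cls = model.names[int(box.cls)]
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        print(f"{cls}: ({x1},{y1})-({x2},{y2})")

cap.release()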
Project 10: End-to-End MLOps Platform with MLflow

Description: Build a generic MLOps framework to track, package, deploy, and monitor any ML model.

Requirements:
- Experiment tracking: MLflow, Weights & Biases
- CI/CD: GitHub Actions, AWS CodePipeline, CodeBuild, CodeDeploy
- Containers & Orchestration: Docker, Kubernetes, EKS/ECS
- Model registry & serving: MLflow Serving, SageMaker
- Data & feature store: Snowflake, Redis, AWS S3/Glue/Athena
- Monitoring: Prometheus, Grafana, CloudWatch
- Languages & libs: Python, Scikit-Learn, TensorFlow, PyTorch, XGBoost, LightGBM, Transformers, spaCy, NLTK

Process:
1. Setup repo: Define a standard project template with MLflow integration.
2. Data ingestion: Ingest from S3 or Snowflake; register features in Redis.
3. Experimentation: Use MLflow to log metrics/artifacts for sklearn, TF, PyTorch (see the sketch after this entry).
4. Hyperparameter tuning: Integrate Bayesian Optimization in the MLflow pipeline.
5. CI/CD: Configure GitHub Actions → build Docker images → push to ECR.
6. Model registry: Promote models through Dev→Staging→Prod in MLflow.
7. Deployment: Serve via MLflow Serving or SageMaker endpoints; orchestrate with Kubernetes.
8. Monitoring: Instrument code for Prometheus metrics; dashboards in Grafana.
9. Alerts: CloudWatch alarms for drift; SNS notifications.
10. Documentation & templates: Provide a Streamlit UI to kick off new experiments; include examples using Transformers, XGBoost, deep learning, and NLP with spaCy.
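A sketch of step 3: logging parameters, a metric, and a model artifact to MLflow from a toy scikit-learn run. The experiment name is an assumption; without a tracking server configured, MLflow writes to a local ./mlruns directory.

# Log a toy sklearn run to MLflow.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("demo-classifier")   # assumed experiment name

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

with mlflow.start_run():
    params = {"n_estimators": 200, "max_depth": 6}
    model = RandomForestClassifier(**params, random_state=0).fit(X_tr, y_tr)

    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy_score(y_te, model.predict(X_te)))
    mlflow.sklearn.log_model(model, artifact_path="model")  # later promoted via the MLflow registry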