Hi, I'm Bayo.
I'm a passionate Data/ML Engineer with a knack for building robust, scalable data pipelines and turning raw data into actionable insights. With years of experience in the field, I've worked on projects ranging from real-time streaming analytics to large-scale batch processing systems. My expertise extends to machine learning, where I've implemented ML pipelines and deployed models at scale, bridging the gap between data engineering and data science.
I'm always excited to expand my knowledge and stay up to date with the latest trends in data engineering. Currently, I'm focusing on:
- Generative AI: Exploring applications of generative models in data pipelines and analytics
- MLOps: Implementing best practices for deploying and maintaining machine learning models in production
- Graph Databases: Learning Neo4j for handling complex, interconnected data
- Data Mesh Architecture: Studying decentralized data management approaches
Real-time Data Processing Pipeline with Spark Streaming
- Developed a robust real-time data processing pipeline using Apache Spark Streaming and Kafka
- Ingested high-volume streaming data from IoT devices and processed it in real time
- Implemented windowed operations and stateful transformations to analyze time-series data
- Utilized Spark SQL for complex aggregations and Delta Lake for reliable storage
- Deployed the pipeline on AWS EMR for scalability and cost-effectiveness
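
As a rough illustration of the windowed operations mentioned above, here is a minimal plain-Python sketch of a tumbling-window average over time-series readings. It is not the pipeline's actual code: the function name `tumbling_window_avg`, the `(device_id, timestamp, value)` event shape, and the sample readings are all assumptions made for the example. In Spark, the same idea is normally expressed by grouping on `pyspark.sql.functions.window` rather than bucketing by hand.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds):
    """Average readings per (device, window start).

    Windows are fixed-size and non-overlapping; each event falls into
    exactly one window, identified by the window's start timestamp.
    """
    # Maps (device_id, window_start) -> [running_sum, count]
    sums = defaultdict(lambda: [0.0, 0])
    for device_id, ts, value in events:
        # Round the timestamp down to the start of its window.
        window_start = ts - (ts % window_seconds)
        acc = sums[(device_id, window_start)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical sample data: (device_id, epoch_seconds, reading)
events = [
    ("sensor-a", 0, 20.0),
    ("sensor-a", 30, 22.0),
    ("sensor-a", 65, 30.0),
    ("sensor-b", 10, 5.0),
]
averages = tumbling_window_avg(events, window_seconds=60)
```

Here the first two `sensor-a` readings share the window starting at 0, while the third lands in the window starting at 60. A real streaming job would also need to handle late or out-of-order events, which this sketch ignores.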
Data Warehouse Optimization
- Designed and implemented a star schema data model for a large-scale data warehouse
- Optimized query performance by creating appropriate indexes and partitioning strategies
- Reduced query execution time by 60% through careful schema design and query tuning
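
To make the star schema idea concrete, the sketch below builds a tiny fact table and one dimension table in an in-memory SQLite database, adds an index on the join key, and runs a typical aggregate query. The table names (`fact_sales`, `dim_product`), columns, and rows are hypothetical and chosen only for illustration; the actual warehouse, its engine, and its partitioning scheme are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One dimension table and one fact table that references it.
conn.executescript(
    """
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        category   TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        sale_date  TEXT NOT NULL,
        amount     REAL NOT NULL
    );
    -- Index the foreign key used by the join below.
    CREATE INDEX idx_fact_sales_product ON fact_sales(product_id);
    """
)

conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?)",
    [(1, "books"), (2, "games")],
)
conn.executemany(
    "INSERT INTO fact_sales (product_id, sale_date, amount) VALUES (?, ?, ?)",
    [(1, "2024-01-01", 10.0), (1, "2024-01-02", 15.0), (2, "2024-01-01", 40.0)],
)

# Typical star-schema query: join the fact table to a dimension, then aggregate.
totals = dict(
    conn.execute(
        """
        SELECT d.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS d USING (product_id)
        GROUP BY d.category
        """
    ).fetchall()
)
conn.close()
```

Keeping measures in the fact table and descriptive attributes in small dimension tables is what lets queries like this stay a single join plus a `GROUP BY`.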
ETL Pipeline Automation
- Built an automated ETL pipeline using Apache Airflow to process daily batches of data
- Integrated multiple data sources and implemented data quality checks
- Reduced manual intervention by 87% and improved data freshness
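
The data quality checks mentioned above can take many forms. As one hedged example, the sketch below validates that each row in a batch has the fields it needs before loading. The function name `check_batch`, the field names, and the sample rows are assumptions for illustration and are not taken from the real pipeline; in an Airflow DAG, a function like this could run as its own task ahead of the load step.

```python
def check_batch(rows, required_fields):
    """Split a batch into valid rows and failures.

    A row fails if any required field is missing, None, or an empty string.
    Failures are returned as (row_index, missing_fields) tuples.
    """
    valid, failures = [], []
    for index, row in enumerate(rows):
        missing = [field for field in required_fields if row.get(field) in (None, "")]
        if missing:
            failures.append((index, missing))
        else:
            valid.append(row)
    return valid, failures


# Hypothetical daily batch with two bad rows.
batch = [
    {"order_id": 1, "customer_id": "c-10", "amount": 19.99},
    {"order_id": 2, "customer_id": "", "amount": 5.00},
    {"order_id": 3, "amount": 7.50},
]
valid_rows, failed_rows = check_batch(batch, ["order_id", "customer_id", "amount"])
```

Returning the failures instead of raising on the first bad row makes it possible to log or quarantine problem records while the rest of the batch continues.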