

Data Engineering and MLOps

The document outlines the syllabus for a Data Engineering and MLOps course, detailing course objectives, teaching strategies, and assessment methods. It covers key topics such as the data engineering lifecycle, data architecture, MLOps challenges, and model governance. The course aims to equip students with the skills to analyze and apply data engineering principles and MLOps features effectively.

Uploaded by

Abubaker osman
Copyright
© All Rights Reserved

DATA ENGINEERING AND MLOps                 Semester: 7
Course Code: BAD714C                       CIE Marks: 50
Teaching Hours/Week (L:T:P:S): 3:0:0:0     SEE Marks: 50
Total Hours of Pedagogy: 50                Total Marks: 100
Credits: 04                                Exam Hours: 3
Examination type (SEE): Theory
Course objectives:
1. To introduce the concepts and lifecycle of Data Engineering.
2. To explore principles of data architecture and distributed systems.
3. To familiarize students with MLOps pipelines for scalable ML solutions.
4. To understand model deployment, CI/CD, monitoring, and feedback loops.
5. To ensure governance, reproducibility, and responsible AI compliance.

Teaching-Learning Process
These are sample strategies that teachers can use to accelerate the attainment of the various course
outcomes.
1. The lecture method (L) need not be only the traditional lecture method; alternative effective teaching
methods could be adopted to attain the outcomes.
2. Use video/animation to explain the functioning of various concepts.
3. Encourage collaborative (group) learning in the class.
4. Ask at least three HOT (Higher Order Thinking) questions in the class to promote critical thinking.
5. Adopt Problem Based Learning (PBL), which fosters students' analytical skills and develops design thinking
skills such as the ability to design, evaluate, generalize, and analyze information rather than simply recall it.
6. Introduce topics in manifold representations.
7. Show the different ways to solve the same problem with different circuits/logic and encourage the students to
come up with their own creative ways to solve them.
8. Discuss how every concept can be applied to the real world; where that is possible, it helps improve the
students' understanding.
9. Use any of these methods: chalk and board, active learning, case studies.

Module-1
Data Engineering: Definition, The Data Engineering Lifecycle, Evolution of the Data Engineer, Data
Engineering and Data Science, Data Engineering Skills and Activities, Data Maturity and the Data Engineer,
The Background and Skills of a Data Engineer, Business Responsibilities, Technical Responsibilities, The
Continuum of Data Engineering Roles, Data Engineers Inside an Organization,
Internal-Facing Versus External-Facing Data Engineers, Data Engineers and Other Technical Roles, Data
Engineers and Business Leadership.
Data Engineering Lifecycle: The Data Lifecycle Versus the Data Engineering Lifecycle, Generation: Source
Systems, Major Undercurrents Across the Data Engineering Lifecycle

Textbook 1: Chapter 1 (1.1–1.5), Chapter 2 (2.1–2.4)


Module-2
Data Architecture: Enterprise Architecture Defined, Data Architecture Defined, “Good” Data Architecture,
Principles of Good Data Architecture, Major Architecture Concepts, Domains and Services, Distributed
Systems, Scalability, and Designing for Failure, Tight Versus Loose Coupling: Tiers, Monoliths, and
Microservices, User Access: Single Versus Multitenant, Event-Driven Architecture, Examples and Types of
Data Architecture.
Choosing Technologies Across the Data Engineering Lifecycle: Team Size and Capabilities, Speed to Market,
Interoperability, Cost Optimization and Business Value, Total Cost of Ownership, Total Opportunity Cost of
Ownership, FinOps, Today Versus the Future: Immutable Versus Transitory Technologies, Hybrid Cloud,
Multicloud, Decentralized: Blockchain and the Edge, Monolith Versus Modular, Serverless Versus Servers,
Server Versus Serverless Evaluation.

Textbook 1: Chapter 3 (3.1–3.7), Chapter 4 (4.1–4.6)

Module-3
MLOps Challenges, MLOps to Mitigate Risk, Risk Assessment, Risk Mitigation, MLOps for Responsible
AI, MLOps for Scale.
Key MLOps Features: Model Development, Establishing Business Objectives, Data Sources and Exploratory
Data Analysis, Feature Engineering and Selection, Training and Evaluation, Reproducibility, Responsible AI,
Productionalization and Deployment, Model Deployment Types and Contents, Model Deployment
Requirements, Monitoring
Developing Models: Machine Learning Model, Required Components, Different ML Algorithms, Different
MLOps Challenges, Data Exploration, Feature Engineering and Selection, Feature Engineering Techniques,
How Feature Selection Impacts MLOps Strategy, Experimentation, Evaluating and Comparing Models,
Choosing Evaluation Metrics, Cross-Checking Model Behavior, Impact of Responsible AI on Modeling, Version
Management and Reproducibility

Textbook 2: Chapter 1 (1.1–1.3), Chapter 2 (2.1–2.4)

Module-4
Preparing for Production: Runtime Environments, Adaptation from Development to Production Environments,
Data Access Before Validation and Launch to Production, Final Thoughts on Runtime Environments, Model
Risk Evaluation, The Purpose of Model Validation, The Origins of ML Model Risk, Quality Assurance for
Machine Learning.
Deploying to Production: CI/CD Pipelines, Building ML Artifacts, The Testing Pipeline, Deployment Strategies,
Categories of Model Deployment, Considerations When Sending Models to Production, Maintenance in
Production, Containerization, Scaling Deployments, Requirements and Challenges.

Textbook 2: Chapter 3 (3.1–3.5), Chapter 4 (4.1–4.4)


Module-5 10 hours
Monitoring and Feedback Loop: How Often Should Models Be Retrained, Understanding Model Degradation,
Ground Truth Evaluation, Input Drift Detection, Drift Detection in Practice, Example Causes of Data Drift, Input
Drift Detection Techniques, The Feedback Loop, Logging, Model Evaluation, Online Evaluation
Model Governance: Who Decides What Governance the Organization Needs, Matching Governance with Risk Level, Current
Regulations Driving MLOps Governance, Pharmaceutical Regulation in the US: GxP,
Financial Model Risk Management Regulation, GDPR and CCPA Data Privacy Regulations, The New Wave of
AI-Specific Regulation, The Emergence of Responsible AI, Key Elements of Responsible AI (Element 1 to
Element 5), A Template for MLOps Governance (Step 1 to 8).

Textbook 2: Chapter 5 (5.1–5.4), Chapter 6 (6.1–6.3)
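
As an illustration of the "Input Drift Detection Techniques" topic in Module-5, the sketch below computes the Population Stability Index (PSI) between a reference sample (for example, training data) and a current sample (for example, recent production inputs) of one numeric feature. PSI is one common univariate technique chosen here for illustration; it is not claimed to be the method used in the textbook. The function name `psi`, the equal-width binning, and the `eps` floor for empty bins are assumptions of this sketch.

```python
import math
import random


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature.

    Bins are equal-width over the range of the reference sample; values in
    the current sample that fall outside that range are clamped into the
    first or last bin. Empty bins are floored at `eps` to avoid log(0).
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")

    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    ref_frac = fractions(reference)
    cur_frac = fractions(current)
    # Each term is non-negative, so PSI is 0 only when the binned
    # distributions match exactly.
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


# Usage: compare a reference sample with an unshifted and a shifted sample.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same_dist = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [x + 1.0 for x in reference]

print("PSI, same distribution:", psi(reference, same_dist))
print("PSI, shifted by one unit:", psi(reference, shifted))
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift; these thresholds are a convention, not part of the syllabus, and should be tuned per feature.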

Course outcomes
At the end of the course, the student will be able to:
1. Explain Data Engineering and various roles.
2. Analyze various major architecture concepts of Data Engineering.
3. Apply MLOps features and analyze the challenges in developing and deploying Machine Learning models.
4. Design CI/CD pipelines for deploying Machine Learning models.
5. Explain the need for model governance and MLOps governance.

Assessment Details (both CIE and SEE)
Continuous Internal Evaluation (CIE) and the Semester End Examination (SEE) each carry 50%
weightage. The minimum passing mark for the CIE is 40% of the maximum marks (20 marks out of 50),
and the minimum passing mark for the SEE is 35% of the maximum marks (18 marks out of 50). A
student shall be deemed to have satisfied the academic requirements and earned the credits allotted
to each subject/course if the student secures a minimum of 40% (40 marks out of 100) in the sum
total of the CIE and SEE taken together.
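
The three pass conditions above can be written as a small check. This is a minimal sketch: the function name `has_passed` and the constant names are illustrative and not part of the regulations, and both inputs are assumed to be already on the 50-mark scale.

```python
CIE_MAX = 50
SEE_MAX = 50
CIE_MIN = 20    # 40% of 50
SEE_MIN = 18    # 35% of 50 is 17.5, stated as 18 in the scheme
TOTAL_MIN = 40  # 40% of the combined 100 marks


def has_passed(cie_marks: int, see_marks: int) -> bool:
    """Return True only if all three minimums are met."""
    if not 0 <= cie_marks <= CIE_MAX:
        raise ValueError("cie_marks must be between 0 and 50")
    if not 0 <= see_marks <= SEE_MAX:
        raise ValueError("see_marks must be between 0 and 50")
    return (
        cie_marks >= CIE_MIN
        and see_marks >= SEE_MIN
        and cie_marks + see_marks >= TOTAL_MIN
    )


# Meeting both component minimums exactly gives 20 + 18 = 38,
# which is still below the combined minimum of 40.
print(has_passed(20, 18))
print(has_passed(22, 18))
```

The first call shows why the combined condition matters: a student at exactly the CIE and SEE minimums still needs two more marks overall.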

Continuous Internal Evaluation:


● The Assignment component of the CIE carries 25 marks and the Internal Assessment
Test component carries 25 marks.
● The first test will be administered after 40-50% of the syllabus has been covered, and the second
test will be administered after 85-90% of the syllabus has been covered.
● Any two assignment methods mentioned in 22OB2.4 may be used; if an assignment is project-based,
then only one assignment for the course shall be planned. The teacher should not conduct two
assignments at the end of the semester if two assignments are planned.
● For the course, CIE marks will be based on a scaled-down sum of two tests and other methods
of assessment.
The Internal Assessment Test question paper is designed to attain the different levels of Bloom’s taxonomy
as per the outcomes defined for the course.

Semester-End Examination:
The theory SEE will be conducted by the University as per the scheduled timetable, with common question papers for
the course (duration 03 hours).
1. The question paper will have ten questions. Each question is set for 20 marks.
2. There will be 2 questions from each module. Each of the two questions under a module (with a maximum
of 3 sub-questions), should have a mix of topics under that module.
3. The students have to answer 5 full questions, selecting one full question from each module.
4. Marks scored shall be proportionally reduced to 50 marks.
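
As a worked example of items 3 and 4, the sketch below totals the five answered questions (20 marks each, 100 in all) and reduces the total proportionally to 50. The function name `scale_see_marks` is illustrative, and rounding a half mark upward is an assumption of this sketch; the scheme above does not state a rounding rule.

```python
QUESTIONS_ANSWERED = 5
MARKS_PER_QUESTION = 20
SCALED_MAX = 50


def scale_see_marks(question_marks: list[int]) -> int:
    """Reduce a raw SEE score out of 100 to the 50-mark scale.

    Assumption: a resulting half mark is rounded up.
    """
    if len(question_marks) != QUESTIONS_ANSWERED:
        raise ValueError("exactly five full questions must be answered")
    if any(not 0 <= m <= MARKS_PER_QUESTION for m in question_marks):
        raise ValueError("each question is marked out of 20")
    raw_total = sum(question_marks)  # out of 100
    # 100 raw marks map to 50 scaled marks, i.e. divide by 2, rounding up.
    return (raw_total + 1) // 2


# One full question chosen from each of the five modules.
print(scale_see_marks([15, 12, 18, 10, 14]))
```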

Suggested Learning Resources:


Textbooks:
1. Joe Reis, Matt Housley, Fundamentals of Data Engineering, O’Reilly, 2022
2. Mark Treveil & Dataiku Team, Introducing MLOps, O’Reilly, 2020
Web links and Video Lectures (e-Resources):
https://www.ibm.com/think/topics/data-engineering
https://martinfowler.com/articles/microservices.html
https://www.coursera.org/specializations/mlops
Activity Based Learning (Suggested Activities in Class)/ Practical Based learning
● Assignment 1 (15 marks): Select a simple machine learning use case (e.g., house price prediction,
customer churn prediction, or fraud detection). Design an MLOps pipeline that includes:
a. Problem statement and business objective
b. Data sources and exploratory data analysis summary
c. Feature engineering and selection approach
d. Model training and evaluation plan
e. Reproducibility and version control strategy
Draw a flowchart or block diagram representing the MLOps pipeline.
● Assignment 2 (10 marks): Choose any one open-source MLOps tool (like MLflow, Kubeflow, or TFX).
Study how the tool supports:
a. CI/CD pipelines
b. Model testing & validation
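
For Assignment 1, a pipeline of the kind described can be prototyped in a few functions before it is drawn as a flowchart. The sketch below is a starting point only, under stated assumptions: the data is synthetic (a made-up linear relation between area and price, standing in for a real house price source), the model is a one-feature least-squares fit, and all function names are illustrative. It covers data loading, training, evaluation, and a reproducibility record; the business objective, exploratory analysis, and feature selection are left to the student.

```python
import hashlib
import json
import random
from statistics import mean

SEED = 42  # fixed seed so every run produces the same data and split


def load_data(n=200):
    """Synthetic stand-in for a real data source: (area_sqm, price) pairs."""
    rng = random.Random(SEED)
    rows = []
    for _ in range(n):
        area = rng.uniform(40, 200)
        price = 1500 * area + rng.gauss(0, 10000)  # assumed toy relation
        rows.append((area, price))
    return rows


def split(rows, test_fraction=0.2):
    """Shuffle deterministically, then hold out a test set."""
    rng = random.Random(SEED)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(rows):
    """Fit price = slope * area + intercept by ordinary least squares."""
    xs = [area for area, _ in rows]
    ys = [price for _, price in rows]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in rows) / sum(
        (x - mx) ** 2 for x in xs
    )
    return {"slope": slope, "intercept": my - slope * mx}


def evaluate(model, rows):
    """Mean absolute error on held-out rows."""
    return mean(
        abs(model["slope"] * area + model["intercept"] - price)
        for area, price in rows
    )


def run_pipeline():
    """Run all stages and return a record that identifies the run."""
    rows = load_data()
    train_rows, test_rows = split(rows)
    model = train(train_rows)
    return {
        "seed": SEED,
        # Hash of the input data, so a later run can verify it used the same data.
        "data_sha256": hashlib.sha256(json.dumps(rows).encode()).hexdigest(),
        "model": model,
        "test_mae": evaluate(model, test_rows),
    }


print(json.dumps(run_pipeline(), indent=2))
```

Storing the seed, a hash of the data, and the fitted parameters together is one simple way to address the reproducibility and version control item; a dedicated tool such as those named in Assignment 2 would normally take over this role.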
