Cloud & Data Engineering Syllabus (40+ hours)
Data Engineering Roadmap:
Data Engineering Roadmap Video: https://youtu.be/8uVRbry5A2U?feature=shared
Data Engineering Basics (the following topics are covered):
Data warehousing
SQL
Python
Linux
Data Engineering Core:
Apache Hadoop - (HDFS as the landing zone for all incoming files)
Introduction to Big Data & Hadoop Fundamentals
Dimensions of Big Data
Types of data generation
Apache Hadoop ecosystem & its projects
Hadoop distributions
HDFS core concepts
Modes of Hadoop deployment
HDFS flow architecture
Hadoop MRv1 vs MRv2 (YARN) architecture
Types of Data compression techniques
Rack topology/awareness
HDFS utility commands with usage examples
Minimum hardware requirements for a cluster & property file changes
Apache Spark - (Spark job deployment with Python programming)
Introduction to Spark & features
Spark Core & SparkSQL concepts
Actions & transformations logic
Spark scripts to read & write tables in HBase & S3 buckets (see the sketch below)
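For illustration, a minimal PySpark sketch of such a job, assuming the cluster already has S3 access configured (e.g. EMRFS on EMR); the bucket names, paths and the "id" column are hypothetical placeholders, and the HBase read/write path needs an additional connector that is not shown here:
    from pyspark.sql import SparkSession

    # Start (or reuse) a Spark session for the job
    spark = SparkSession.builder.appName("s3-read-write-demo").getOrCreate()

    # Read raw CSV files from the landing-zone bucket
    df = spark.read.option("header", "true").csv("s3://my-landing-bucket/incoming/")

    # Transformation: keep only rows with a non-null id column
    clean_df = df.filter(df["id"].isNotNull())

    # Action: write the transformed data back as Parquet to the curated bucket
    clean_df.write.mode("overwrite").parquet("s3://my-curated-bucket/output/")

    spark.stop()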
Apache HBase - (NoSQL database)
Introduction to HBase concepts & features
Introduction to NoSQL/CAP theorem concepts
HBase design/architecture flow
HBase table commands (see the sketch below)
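A minimal sketch of basic table operations using the happybase Python client (an assumption, since the class may instead use the HBase shell); it requires a running HBase Thrift server, and the table, column family and row key names are hypothetical:
    import happybase

    # Connect to the HBase Thrift server (default Thrift port is 9090)
    connection = happybase.Connection(host="localhost", port=9090)

    # Create a table with a single column family
    connection.create_table("users", {"profile": dict()})
    table = connection.table("users")

    # Put a row keyed by user id, then read it back
    table.put(b"user#1001", {b"profile:name": b"Asha", b"profile:city": b"Chennai"})
    print(table.row(b"user#1001"))

    connection.close()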
Apache Airflow - (workflow orchestration)
Airflow Introduction
Installation
Architecture
Sample project (see the DAG sketch below)
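A minimal DAG sketch for the sample project (Airflow 2.4+ assumed); the DAG id and task logic are placeholders:
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("extracting data ...")

    def load():
        print("loading data ...")

    # One DAG with two tasks chained in sequence
    with DAG(
        dag_id="sample_etl_dag",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> load_task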
Apache Kafka + Streaming - (streaming data from data sources)
Introduction to Kafka and what streaming data is
Installing & working with Kafka
Projects in Kafka (see the sketch below)
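A minimal producer/consumer sketch using the kafka-python package (an assumption; the broker address and topic name are hypothetical placeholders):
    from kafka import KafkaProducer, KafkaConsumer

    # Produce one JSON message to the "orders" topic
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("orders", b'{"order_id": 1, "amount": 250}')
    producer.flush()

    # Consume from the beginning of the topic, stopping after 5 s of silence
    consumer = KafkaConsumer(
        "orders",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,
    )
    for message in consumer:
        print(message.value)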
Cloud Computing Services
Basic overview of the cloud
Different types of cloud models
Different types of cloud services
Different vendors of cloud implementation
Why choose AWS?
Features of AWS and key offerings
AWS S3 (create buckets to hold both the ingested data & the transformed data)
What is AWS S3 & what is it used for?
What are AWS S3 buckets and how to create buckets in the AWS Console?
How to upload and manage files in AWS S3 (see the sketch below)
Features & advantages of S3
How does AWS S3 work?
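A minimal boto3 sketch of creating a bucket and uploading a file (assumes configured AWS credentials and the us-east-1 region; the bucket and file names are hypothetical placeholders, and bucket names must be globally unique):
    import boto3

    s3 = boto3.client("s3")

    # Create a bucket, upload a local file, then list the bucket contents
    s3.create_bucket(Bucket="my-demo-landing-bucket")
    s3.upload_file("sales.csv", "my-demo-landing-bucket", "incoming/sales.csv")

    response = s3.list_objects_v2(Bucket="my-demo-landing-bucket")
    for obj in response.get("Contents", []):
        print(obj["Key"], obj["Size"])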
AWS EC2 - (Instance creation with VPC, Networking, Security Groups etc.)
What is EC2 and its important features?
Types of EC2 computing instances
How to create EC2 instances by selecting an AMI, Security Groups & VPC, and connect using PuTTY (see the boto3 sketch below)
What are the advantages of EC2 instances?
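A minimal boto3 sketch of launching an instance, mirroring the choices made in the Console walkthrough above (the AMI, key pair, security group and subnet IDs are hypothetical placeholders):
    import boto3

    ec2 = boto3.client("ec2")

    # Launch a single t2.micro instance into a chosen subnet and security group
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",          # placeholder AMI
        InstanceType="t2.micro",
        KeyName="my-keypair",                     # key pair later used with PuTTY/SSH
        SecurityGroupIds=["sg-0123456789abcdef0"],
        SubnetId="subnet-0123456789abcdef0",
        MinCount=1,
        MaxCount=1,
    )
    print(response["Instances"][0]["InstanceId"])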
AWS EMR - (spin up a cluster for deploying Spark jobs)
What is EMR used for, and how does it relate to big data concepts?
How to launch and configure the EMR service
Run a sample Spark program and view the job details to analyse big data (see the sketch below)
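A minimal boto3 sketch that submits a Spark step to an already-running EMR cluster (the cluster ID and script location are hypothetical placeholders):
    import boto3

    emr = boto3.client("emr")

    # Add a spark-submit step to an existing cluster
    response = emr.add_job_flow_steps(
        JobFlowId="j-0123456789ABC",              # placeholder cluster ID
        Steps=[
            {
                "Name": "sample-spark-job",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "s3://my-demo-scripts/etl_job.py"],
                },
            }
        ],
    )
    print(response["StepIds"])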
AWS Athena (Query data in S3 buckets)
What is Amazon Athena and its features?
How to create databases & tables in Athena from S3 buckets and from DDL?
How to use Athena with other AWS services, with a use case (see the sketch below)
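A minimal boto3 sketch of running an Athena query (the database, table and results bucket are hypothetical placeholders):
    import boto3

    athena = boto3.client("athena")

    # Start a query; results land in the configured S3 output location
    response = athena.start_query_execution(
        QueryString="SELECT * FROM sales LIMIT 10",
        QueryExecutionContext={"Database": "demo_db"},
        ResultConfiguration={"OutputLocation": "s3://my-demo-athena-results/"},
    )
    print(response["QueryExecutionId"])
    # Rows can then be fetched with get_query_results once the query succeeds.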
AWS CLI (To query data using CLI commands)
What is the AWS CLI and its features?
How to use CloudShell for accessing AWS services?
How to use the command line interface for triggering & querying datasets in S3 buckets
AWS DynamoDB (NoSQL DB to hold the configuration mappings)
What is Amazon DynamoDB and its features?
How to create, insert into and query a table in DynamoDB (see the sketch below)
How to integrate DynamoDB with other AWS services
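A minimal boto3 sketch of writing and reading a configuration item (assumes a table named job_config with a partition key job_name already exists; all names are hypothetical placeholders):
    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("job_config")

    # Insert a configuration mapping item
    table.put_item(Item={"job_name": "daily_sales_load",
                         "source_bucket": "my-landing-bucket"})

    # Read it back by its partition key
    item = table.get_item(Key={"job_name": "daily_sales_load"})
    print(item.get("Item"))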
AWS Lambda (serverless compute service)
What is Amazon Lambda and its features?
How to write a simple, basic Lambda function (see the sketch below)
How to integrate Lambda + S3 with other AWS services
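A minimal Lambda handler sketch for an S3 object-created trigger (the function body is a placeholder that only logs the new object):
    import json

    def lambda_handler(event, context):
        # Each record describes one object that landed in the bucket
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            print(f"New object s3://{bucket}/{key}")
        return {"statusCode": 200, "body": json.dumps("processed")}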
AWS Glue (to perform file format conversions)
Use AWS Glue Crawlers to discover the schema of your data in S3.
Create an AWS Glue Data Catalog to store metadata information.
Develop AWS Glue ETL jobs to transform the data using SparkSQL.
Utilize AWS Glue Dynamic Frames for schema flexibility (see the sketch below).
Schedule and orchestrate ETL jobs using AWS Glue Triggers and Workflows.
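A minimal Glue ETL job sketch that reads a crawled table from the Data Catalog and writes it back to S3 as Parquet (the database, table and output path are hypothetical placeholders):
    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the catalog table as a DynamicFrame (schema flexibility), then
    # convert the files to Parquet in the curated bucket
    dyf = glue_context.create_dynamic_frame.from_catalog(database="demo_db", table_name="sales")
    glue_context.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3",
        connection_options={"path": "s3://my-curated-bucket/sales_parquet/"},
        format="parquet",
    )
    job.commit()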
AWS Step Functions (to orchestrate all the workflows step by step)
What is Step Functions and its features?
How to orchestrate workflows with different AWS Services?
How to define Tasks, States and create State Machines in AWS (see the sketch below)
How to integrate Step functions in AWS with other services
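A minimal boto3 sketch that defines a two-state workflow in Amazon States Language and creates the state machine (the Lambda ARNs and IAM role ARN are hypothetical placeholders):
    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    # Two Task states chained in sequence: extract, then load
    definition = {
        "StartAt": "ExtractData",
        "States": {
            "ExtractData": {
                "Type": "Task",
                "Resource": "arn:aws:lambda:us-east-1:123456789012:function:extract",
                "Next": "LoadData",
            },
            "LoadData": {
                "Type": "Task",
                "Resource": "arn:aws:lambda:us-east-1:123456789012:function:load",
                "End": True,
            },
        },
    }

    response = sfn.create_state_machine(
        name="demo-etl-workflow",
        definition=json.dumps(definition),
        roleArn="arn:aws:iam::123456789012:role/demo-stepfunctions-role",
    )
    print(response["stateMachineArn"])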
How to enrol in this course: If you're interested in joining this course, please feel free to contact us:
Call: +91 90424 63272, +91 93422 72961
WhatsApp - +91 96196 63272
Email: admin@tamilboomi.com