
HLD-1 Module Bytes Notes

Module 1 - Deploy App server in AWS
Module 2 - QPrep - System Design
Module 3 - Jenkins
Module 4 - Docker Introduction
Module 5 - Docker Advanced
Module 6 - Kafka

Module 1 - Deploy App server in AWS


By self
●​ Cloud
○​ Cloud computing
■​ refers to the delivery of computing services—such as servers,
storage, databases, networking, software, analytics, and
intelligence—over the internet (“the cloud”) rather than hosting on
a local computer or physical server.
■​ This model provides faster innovation, flexible resources, and
economies of scale.
○​ Key features of cloud computing include:
■​ On-demand services: You can access resources like storage or
compute power as needed, without active management by the user.
■​ Scalability: You can easily scale up or down depending on demand.
■​ Pay-as-you-go: You only pay for what you use, reducing capital
expenses.
■​ Global access: Cloud services are available over the internet,
allowing access from anywhere with an internet connection.
○​ There are three main types of cloud computing models:
■​ Infrastructure as a Service (IaaS):
●​ Provides virtualized computing resources over the internet
●​ (e.g., Amazon EC2, Google Cloud Compute Engine).
■​ Platform as a Service (PaaS):
●​ Delivers hardware and software tools over the internet
●​ (e.g., Google App Engine, Microsoft Azure).
■​ Software as a Service (SaaS):
●​ Provides access to software applications over the internet
●​ (e.g., Google Workspace, Microsoft Office 365).
○​ Cloud services allow businesses to focus on innovation, security, and growth
without the need to manage complex hardware and infrastructure themselves.
●​ AWS
○​ Amazon Web Services (AWS)
■​ is one of the most popular cloud service providers out in the
market.
■​ They provide different kinds of services for companies and
individuals.
○​ Some of the popular offerings from AWS are:
■​ Virtual machines (EC2) - Elastic Compute Cloud
●​ VM: Virtual Machine
○ A software-based emulation (reproduction/imitation) of a physical computer, with specific configurations (e.g., CPU, RAM, storage).
○ In simple terms, a computer with a given configuration, e.g., 2 CPUs, 16 GB RAM, 20 GB hard disk.
■​ Storage (S3) - Simple Storage Service
●​ A cloud storage service provided by AWS that offers scalable
object storage.
■​ Load Balancers (ELB) - Elastic Load Balancer
●​ A service by AWS that automatically distributes incoming
application traffic across multiple targets, such as EC2
instances, to improve application availability and fault
tolerance.
●​ AWS Elastic Compute Cloud (EC2)
○​ Offers scalable and flexible virtual server instances in the cloud.
○​ With AWS EC2, users can quickly launch, configure, and manage virtual
machines known as "instances" to run their applications and workloads.
●​ Virtual Server vs Virtual Machine
○​ Virtual server is generally the same as a virtual machine (VM), though the
terms are used in slightly different contexts:
○​ Virtual Machine (VM): Refers to the complete emulation of a physical
computer that runs its own operating system (OS) and applications.
○​ Virtual Server: This term is often used when the VM is being used
specifically for server purposes, such as hosting websites, applications, or
databases.
●​ SSH
○ SSH (Secure Shell) is a
■ cryptographic network protocol (a network protocol like HTTP, but encrypted) that provides a secure way to access and manage remote systems over an unsecured network, like the internet.
■​ It encrypts all data exchanged between the client and server, ensuring
that sensitive information, such as login credentials and commands, is
protected.
○​ Key Features of SSH:
■​ Secure Remote Access: Enables users to remotely log in and
manage systems securely.
■​ Encryption: Encrypts communication to protect against
eavesdropping.
■​ Authentication: Uses public key authentication or passwords to verify
the identity of users and devices.
■​ File Transfer: Facilitates secure file transfers using protocols like
SFTP (SSH File Transfer Protocol) or SCP (Secure Copy).
■​ Command Execution: Allows running commands on remote
machines securely.
■​ Port Forwarding: Creates secure tunnels for other applications.
○​ SSH is commonly used by system administrators and developers to manage
servers, transfer files, and execute commands remotely.
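○ A minimal usage sketch (the key name, user, and host below are placeholders, not values from this module):
ssh-keygen -t ed25519 -f ~/.ssh/demo_key              # generate a key pair (demo_key + demo_key.pub)
ssh -i ~/.ssh/demo_key ubuntu@<remote-host>           # log in as "ubuntu" with the private key
scp -i ~/.ssh/demo_key notes.txt ubuntu@<remote-host>:/tmp/   # copy a file securely over SSH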
●​ PuTTY
○ PuTTY is downloaded (typically on Windows) to make secure SSH connections to the AWS instance.
○​ PuTTY is a
■​ free and open-source terminal emulator, serial console, and
network file transfer application
■​ commonly used for making secure shell (SSH) connections to
remote servers, especially in Unix/Linux environments.
■​ PuTTY supports various network protocols like SSH, Telnet, SCP,
SFTP, and Rlogin, allowing users to securely log into remote
systems, manage servers, and transfer files.
○​ Key Features:
■​ SSH Client: Securely connect to remote servers over SSH for
command-line interaction.
■​ File Transfers: Supports SCP and SFTP protocols for transferring
files securely.
■​ Telnet and Rlogin: Supports older protocols for less secure
connections (though SSH is more common today).
■​ Serial Console: Allows communication with devices via serial ports.
■​ Cross-Platform: Available on Windows, macOS (with third-party
builds), and Linux.
○​ PuTTY is especially popular among system administrators and developers for
remote server management.
○ Terms:
■ Terminal emulator - emulates several network protocols (SSH, Telnet, SCP, SFTP, and Rlogin) via a terminal
■ Serial console - a method of managing or interacting with a computer, server, or network device through a serial communication interface, i.e., a terminal/console
■ Network file transfer application - software that allows you to transfer files between computers or devices over a network
●​ AWS EC2 features
○​ Amazon EC2 offers the following key features:
○​ Instances: Virtual servers for computing tasks.
○​ Amazon Machine Images (AMIs): Preconfigured templates with OS and
software for instances.
○​ Instance Types: Different CPU, memory, storage, and networking
configurations.
○​ Amazon EBS Volumes: Persistent storage for data.
■​ Elastic Block Store(EBS) - a storage service provided by Amazon
Web Services (AWS).
○​ Instance Store Volumes: Temporary storage that is deleted upon stopping or
terminating the instance.
○​ Key Pairs: Secure login credentials using public/private key pairs.
○​ Security Groups: Virtual firewalls controlling inbound/outbound traffic.
○​ PCI DSS Compliance: Supports secure processing of credit card data,
compliant with PCI standards.
■​ PCI DSS Compliance refers to adhering to the Payment Card
Industry Data Security Standard (PCI DSS)
■​ It is a set of security standards created to protect cardholder data and
reduce credit card fraud.
●​ EBS - Elastic Block Store
○​ EBS stands for Elastic Block Store, an easy to use, high-performance,
block-storage service designed for use with EC2.
○​ Block storage systems are used to host databases, support random
read/write operations, and keep system files of the running virtual machines.
○​ Data is stored in volumes and blocks where files are split into
evenly-sized blocks. Each block has its own address, but unlike objects they
do not have metadata.
○​ When storing large amounts of data, files are split into smaller chunks of a
fixed size, the “blocks,” which are distributed amongst the storage nodes.
○ This distribution of blocks also helps volume I/O performance.
○ With EBS, stopping the instance does not stop all billing; only terminating it does. When an instance is stopped, the data on its Elastic Block Store volume remains (and you may still be charged for that storage), whereas terminating the instance deletes the volume data along with its charges.
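○ Because stopped instances keep their EBS volumes (and their storage charges), a quick way to audit volumes is the AWS CLI; a sketch, assuming the CLI is installed and configured:
# list detached ("available") volumes, which still accrue storage charges
aws ec2 describe-volumes --filters Name=status,Values=available \
    --query 'Volumes[].{ID:VolumeId,SizeGiB:Size}' --output table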
●​ Run QEats in AWS instance
○ In the Ubuntu terminal, to run QEats on the AWS instance, we run:
○​ sudo apt-get update​
sudo apt-get install wget​
wget -qO- https://get.docker.com/ | sh​
sudo usermod -aG docker $USER​
sudo service docker start​
newgrp docker
■​ sudo apt-get update
● Tl;dr: get updates from the system's source list files
●​ sudo
○​ stands for "superuser do" and is a command in Unix
and Linux-based systems
○​ that allows a permitted user to execute a command
as the superuser (root) or another user with elevated
privileges
● apt-get
○​ APT: Advanced Package Tool
○​ GET: Retrieve packages (as in "get" software or
packages)
○​ apt is a command-line interface for the Advanced
Package Tool (APT)
○​ apt-get
■​ is part of the APT (Advanced Package Tool)
suite
■​ is designed to retrieve and manage software
packages for installation, upgrade, or removal
on Debian-based systems like Ubuntu.
●​ As a superuser, to retrieve updates for the package list (i.e.,
information about available software and their versions) from
the repositories
●​ Repos are defined in the system's sources list files, which are
stored in files like
○​ /etc/apt/sources.list
○​ Files inside /etc/apt/sources.list.d/
■​ sudo apt-get install wget
● Tl;dr: install wget
●​ As superuser, install wget (a utility that retrieves or downloads
content from the web)
■​ wget -qO- https://get.docker.com/ | sh
● Tl;dr: download the Docker install script from the web and execute it in the shell
●​ wget: The command-line utility used to download files from the
web.
●​ -q: This option tells wget to run in "quiet" mode, suppressing
the usual output.
●​ -O-: This tells wget to output the downloaded content directly
to standard output (i.e., the terminal) instead of saving it to a
file.
●​ https://get.docker.com/: This is the URL from which
wget is downloading content. In this case, it's a script provided
by Docker to install Docker on your system.
●​ |: This is a pipe, which takes the output of the command on
the left (wget) and sends it as input to the command on the
right (sh).
●​ sh: This command tells the shell to execute the script that is
being piped in from wget.
■​ sudo usermod -aG docker $USER
● Tl;dr: add the current user (represented by $USER) to the docker group on a Linux system.
●​ sudo: Runs the command with superuser (root) privileges, as
modifying user groups requires administrative rights.
●​ usermod: This is the command to modify a user account.
●​ -aG: These are options for the usermod command:
○​ -a (append): Adds the user to the specified group(s)
without removing them from any other groups they are
already part of.
○​ -G (group): Specifies the group(s) to which the user
should be added. In this case, it's the docker group.
●​ docker: This is the name of the group you're adding the user
to. Docker is a containerization platform, and adding the user
to the docker group allows them to run Docker
commands without needing to use sudo every time.
●​ $USER: This is a shell variable that refers to the currently
logged-in user.
■​ sudo service docker start
● Tl;dr: start the Docker service on a Linux system
●​ sudo: Runs the command with superuser (root) privileges, as
managing services requires administrative rights.
●​ service: A command used to manage services (also known
as daemons) on older Linux systems. It allows you to start,
stop, restart, and check the status of services.
●​ docker: The name of the service you're managing—in this
case, the Docker service, which runs the Docker daemon that
manages containers.
●​ start: This action starts the Docker service if it is not already
running.
■​ newgrp docker
● Tl;dr: switch the current user's group to the docker group
within the same session
●​ newgrp: This command is used to log in to a new group,
effectively changing the current group ID for your shell session.
●​ docker: This specifies the group you want to switch to, which
is the docker group in this case.
○​ Then we run:
■ sudo apt install -y telnet
■ Tl;dr: install the Telnet client on a Linux system
■​ sudo: Runs the command with superuser (root) privileges, as
installing software requires administrative rights.
■​ apt: This is the package manager command used to manage
software packages (install, update, remove, etc.) on Debian-based
systems.
■​ install: This tells apt that you want to install a package.
■​ -y: This flag automatically answers "yes" to any prompts during the
installation process. Without it, you would be asked to confirm the
installation manually.
■​ telnet: The name of the package you want to install. In this case, it's
the Telnet client, a tool that allows you to connect to remote servers
using the Telnet protocol.
○ Then we run:
■ sudo docker run -d -m 800m -v /var/log:/var/log:rw -p 8081:8081 criodo/qeats-server
■ Tl;dr: start a Docker container with several specific configurations
■​ runs the criodo/qeats-server Docker container in the
background, limits its memory usage to 800 MB, mounts the
/var/log directory from the host to the container with read-write
permissions, and exposes port 8081 of the container to port
8081 on the host
■​ sudo: Runs the command with superuser privileges, which is required
to manage Docker containers if your user isn't part of the Docker
group.
■​ docker run: This is the Docker command to create and start a new
container from an image.
■​ -d: Runs the container in "detached" mode, meaning the container will
run in the background, not attached to your terminal.
■​ -m 800m: Limits the container's memory usage to 800 MB. This
prevents the container from using more than 800 MB of RAM.
■​ -v /var/log:/var/log:rw: Mounts a volume, allowing you to
share files between the host and the container:
●​ /var/log on the host is mapped to /var/log inside the
container.
●​ rw means this mount will have read-write access, so both the
host and the container can read and write to the logs.
■​ -p 8081:8081: Maps port 8081 on the host to
port 8081 in the container:
●​ The first 8081 refers to the host's port.
●​ The second 8081 refers to the container's port. This allows
access to the service running inside the container (on port
8081) via port 8081 on your local machine.
■​ criodo/qeats-server: This is the name of the Docker image
being used to create the container. It refers to an image
(qeats-server) stored in the criodo repository, likely representing
a server component of the qeats project.
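○ After starting the container, a few standard Docker commands help confirm the flags took effect (a sketch; container names/IDs will differ on your machine):
docker ps                     # the qeats container should appear with 0.0.0.0:8081->8081/tcp
docker stats --no-stream      # the memory limit column should show roughly 800MiB
docker logs <container-id>    # print the container's own stdout/stderr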
○​ Then we run:
■​ tail -f /var/log/qeats-server.log
■ Tl;dr: continuously monitor the contents of the log file /var/log/qeats-server.log in real time
■​ tail: This command displays the last part of a file. By default, it shows the
last 10 lines of a file.
■​ -f: This option stands for "follow." It tells tail to keep the file open and
display any new lines that are added in real time. This is useful for
monitoring logs as they are being written.
■​ /var/log/qeats-server.log: This is the path to the log file that you want to
monitor. In this case, it's the log file for the qeats-server.
○​ Then we run:
■​ cat /var/log/qeats-server.log
■ Tl;dr: display the entire contents of the file /var/log/qeats-server.log in the terminal.
■​ cat: This command stands for "concatenate" and is commonly used to
read and display the content of a file or files.
■​ /var/log/qeats-server.log: This is the path to the log file you want to view,
which in this case contains logs related to the qeats-server.
● Logging in and out of QEats
○ Log in: telnet localhost 8081
○ Log out: logout
●​ Telnet
○​ Telnet is a
■​ network protocol and application used to remotely access and
manage devices over a network.
■​ It allows users to log in to another computer or network device
(like routers, switches, etc.) and issue commands as if they were
directly connected to the remote machine.
○​ Key Features of Telnet:
■​ Remote Access: Provides the ability to log into a remote device or
server and execute commands.
■​ Text-Based Communication: Operates as a command-line interface
(CLI), where users type commands to interact with the remote system.
■​ Unencrypted Communication: Telnet sends all data, including
passwords, in plaintext, which makes it insecure for modern usage
over the internet.
■​ Port Number: By default, Telnet runs over TCP port 23.
○​ Telnet Usage:
■​ Used primarily for connecting to remote devices to perform
administrative tasks or troubleshooting.
■​ Historically used for remote terminal connections, but it has largely
been replaced by SSH (Secure Shell), which offers encryption for
secure remote access.
○​ Telnet vs SSH:
■​ Telnet transmits data in plaintext, making it insecure.
■​ SSH (Secure Shell) encrypts the communication, providing secure
and authenticated access, which is why it's favoured over Telnet
today.
○​ Telnet is still sometimes used in private networks or for specific diagnostic
purposes, but it's not recommended for secure remote management over
public networks due to its lack of encryption.
●​ Docker
○​ Docker is an
■​ open-source platform that automates the deployment, scaling, and
management of applications using containers.
○​ Containers are lightweight, portable, and self-sufficient environments that
package an application and its dependencies, ensuring it runs consistently
across different environments, from development to production.
○​ Key Concepts of Docker:
■​ Containers:
●​ Containers are lightweight, standalone environments that run
applications and their dependencies. They ensure the
application runs the same regardless of the host system.
●​ Unlike virtual machines (VMs), containers share the host
system's kernel, making them faster and more efficient
because they don't require a separate OS for each instance.
■​ Images:
●​ A Docker image is a lightweight, immutable file that includes
the application code, runtime, libraries, and settings needed
to run the application.
●​ Images are like blueprints used to create Docker
containers. You can pull existing images from Docker Hub
(Docker’s public repository) or create custom images using a
Dockerfile.
■​ Dockerfile:
●​ A Dockerfile is a script that contains a set of instructions
to build a Docker image. It specifies the base image,
application code, dependencies, environment variables, and
commands needed to run the container.
■​ Docker Hub:
●​ Docker Hub is a repository for Docker images. It allows users
to upload and share images publicly or privately, or pull
pre-built images from others.
■​ Docker Engine:
●​ The core component that allows you to build and run
containers. It is responsible for managing the containers on the
host system.
■​ Docker Compose:
●​ A tool used to define and run multi-container Docker
applications using a YAML file. Compose allows you to define
how different services (containers) work together in an
application (e.g., a web server, database, etc.).

Docker Ecosystem
│
├── Docker Engine
│   ├── Manages Containers
│   │   ├── Containers are created from Images
│   │   └── Containers are lightweight, isolated environments running applications
│   ├── Manages Images
│   │   ├── Images are built from Dockerfiles
│   │   └── Images are read-only templates with instructions to create containers
│   └── Pulls Images from Docker Hub
│       └── Docker Hub is a cloud-based registry for storing and sharing images
│
├── Dockerfile
│   └── Defines instructions to build an Image
│       └── The result is a Docker Image
│
└── Docker Compose
    └── Manages multi-container applications
        ├── Uses YAML files to define and run multiple containers
        └── Uses the Docker Engine to run these containers
○​ Advantages of Docker:
■​ Portability: Docker ensures that your application runs the same way
on any system, as long as Docker is installed.
■​ Efficiency: Containers are lightweight and use fewer resources
compared to traditional VMs because they share the host's OS kernel.
■​ Isolation: Applications running in containers are isolated from each
other, reducing compatibility issues.
■​ Scalability: Docker makes it easy to scale applications across
multiple environments with consistent configurations.
○​ Use Cases:
■​ Development: Provides developers a consistent environment,
avoiding the "it works on my machine" problem.
■​ Continuous Integration/Continuous Deployment (CI/CD):
Automates testing, building, and deploying code.
■​ Microservices: Enables applications to be broken down into smaller,
manageable services that can be independently deployed and scaled.
○​ In summary, Docker simplifies the development and deployment process by
packaging applications and their dependencies in containers, ensuring
consistency across different environments.
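○ To make the image/Dockerfile/container relationship concrete, a minimal, hypothetical sketch (the app.py file, the demo-app image name, and the python:3.11-slim base tag are illustrative assumptions, unrelated to the QEats image):
echo 'print("hello from a container")' > app.py
cat > Dockerfile <<'EOF'
# base image the new image builds on
FROM python:3.11-slim
# copy the application code into the image
COPY app.py /app/app.py
# command run when a container starts from this image
CMD ["python", "/app/app.py"]
EOF
docker build -t demo-app .    # build an image named demo-app from the Dockerfile
docker run --rm demo-app      # create and run a container from that image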
●​ Genymotion
○ Genymotion is an Android emulator that allows developers to test their Android applications on virtual devices.
○ The Genymotion website can be used to emulate the QEats app APK.
●​ Mattermost
○​ is an open-source messaging platform designed for team collaboration.
○​ It provides features similar to those found in popular communication tools
like Slack or Microsoft Teams but with an emphasis on privacy, control, and
customization.
●​ Slack
○ A collaboration and communication platform (like MS Teams) designed to facilitate teamwork and streamline workplace communication.

By ChatGPT
●​ Overview
●​ Primary Goals
■​ Launch Your First Virtual Server in AWS
●​ Set up an EC2 instance (virtual machine) with desired
specifications (e.g., 2 CPU, 16GB RAM).
■​ Deploy the App Backend Server
●​ Deploy your backend server on the AWS EC2 instance.
■​ Connect Mobile App to the App Backend Server
●​ Configure your mobile app to communicate with the backend
server running on AWS.
●​ Key Concepts Covered
■​ Cloud Computing
●​ Cloud: A collection of hardware resources (storage, compute)
managed externally for you. For example, AWS provides cloud
infrastructure to run your servers and apps.
■​ AWS (Amazon Web Services)
●​ EC2 (Elastic Compute Cloud): Virtual machines or instances
used to run applications.
●​ S3 (Simple Storage Service): Storage service for files and
data.
●​ ELB (Elastic Load Balancer): Automatically distributes traffic
across multiple instances.
■​ Docker (Optional)
●​ Docker provides containerization technology, offering a
lightweight alternative to VMs.
●​ Setup1 - Create an AWS Account
●​ Steps to Create an AWS Account:
■​ Visit AWS Console:
●​ Go to the AWS Console: https://aws.amazon.com/console/.
■​ Sign Up:
●​ Create a new account by following the steps on the AWS site.
■​ Payment Information:
●​ Use a valid Debit/Credit card for the sign-up process (A small
fee of 2 INR will be deducted and refunded after verification).
■​ Login to AWS Console:
●​ After account creation, log in to the AWS console.
■​ Dashboard Overview:
●​ Explore the AWS Console dashboard; it's use-case specific
and you don’t need to worry about all options immediately.
■​ Recommendation:
●​ Use a new AWS account if you already have one, to avoid
exceeding free tier limits and potential charges.
●​ Milestone1 - Create an EC2 instance
●​ Steps to Create and Connect to an AWS EC2 Instance:
■​ Login to AWS:
●​ Go to AWS Console and login.
■​ Navigate to EC2:
●​ Search for “EC2” in the top search bar and select EC2: Virtual
Servers in the Cloud.
■​ Launch EC2 Instance:
●​ Click on Launch Instance to create a new EC2 instance.
■​ Instance Details:
●​ Name your instance.
●​ Select Ubuntu from the Quick Start section in the AMI list.
■​ Create Key Pair:
●​ Scroll to the Key Pair section.
●​ Click on Create a new key pair, name it, and download the
.pem file.
■​ Launch Instance:
●​ Click on Launch Instance to create the instance.
■​ View Instances:
●​ Once the instance is launched, click on View all instances to
see the status.
■​ Connect to EC2:
●​ Select the instance from the dashboard.
●​ Click Actions → Connect → SSH Client.
●​ Follow the instructions to connect via SSH, including updating
the permissions for the private key file (.pem).
■​ Error Handling:
●​ If you get an error, check that the .pem file permissions are set
to read-only (use chmod 400 filename.pem).
■​ Open Port 8081:
●​ Select your instance and go to Security Groups.
●​ Click on Edit Inbound Rules.
●​ Add a Custom TCP rule for port 8081 with CIDR block
0.0.0.0/0.
●​ Click Save Rules.
■​ Prevent Unintended Charges:
●​ Stop or terminate the instance when not in use to avoid
charges.
●​ Notes:
■​ Security Groups: Used to control inbound/outbound traffic.
■ EBS (Elastic Block Store): Look into EBS only for understanding instance storage.
■ EC2 Instance states (reference)
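■ The inbound rule for port 8081 can also be added with the AWS CLI; a sketch, assuming the CLI is configured and sg-0123456789abcdef0 stands in for your instance's security group ID:
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 8081 --cidr 0.0.0.0/0   # open TCP 8081 to all IPv4 addresses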
●​ Milestone2 - SSH into your instance
●​ Steps to SSH into Your AWS EC2 Instance:
■​ Install PuTTY (for Windows users):
●​ Download and set up PuTTY from here.
■​ Understand SSH:
●​ SSH (Secure Shell) allows you to remotely log in to your server
securely.
■​ Set Permissions for PEM File:
●​ Navigate to where your .pem file is located (usually
~/Downloads/).
■​ Run the command to set read-only permissions for the PEM file:
●​ chmod 400 <path-to-your-pem-file>
■​ SSH into Your EC2 Instance:
●​ Use the following SSH command to log into your instance:
○ ssh -i "<path-to-your-pem-file>" ubuntu@<your-ec2-url>
● Example:
○ ssh -i "~/Downloads/ubuntu-test.pem" ubuntu@ec2-13-233-100-101.ap-south-1.compute.amazonaws.com
■​ Get EC2 Public DNS:
●​ Find your EC2 instance URL under the Public IPv4 DNS
heading on your EC2 dashboard.
■​ Login Confirmation:
●​ Once you run the SSH command successfully, you will be
logged into your remote server.
●​ Notes:
■​ Windows Users: Use PuTTY for SSH access by converting .pem to
.ppk format.
■​ References:
●​ Learn more about SSH and how to convert .pem to .ppk
format for Putty if you're on Windows.
●​ What is SSH
●​ SSH into AWS instance
●​ Converting private key from .pem to .ppk format for Putty
(Windows)
●​ Milestone3 - Deploy your first app backend server
●​ Steps to Deploy the App Backend Server for QEats:
■​ Install Docker:
●​ Update the package manager:
○​ sudo apt-get update
●​ Install wget:
○​ sudo apt-get install wget
●​ Download and install Docker:
○​ wget -qO- https://get.docker.com/ | sh
●​ Add the current user to Docker group:
○​ sudo usermod -aG docker $USER
●​ Start Docker service:
○​ sudo service docker start
●​ Apply group changes:
○​ newgrp docker
■​ Install Telnet:
●​ Install telnet client:
○​ sudo apt install -y telnet
●​ Run the Docker Container for QEats Backend:
○​ Ensure Port 8081 is open for external connections.
●​ Run the Docker container:
○ sudo docker run -d -m 800m -v /var/log:/var/log:rw -p 8081:8081 criodo/qeats-server
■​ Monitor Server Logs:
●​ To see live logs:
○​ tail -f /var/log/qeats-server.log
●​ To print all logs:
○​ cat /var/log/qeats-server.log
■​ Test the Server:
●​ Check server connection:
○​ telnet localhost 8081
○​ If the connection is successful, you should see a
confirmation message.
○​ To exit telnet, press Ctrl + ] and type quit.
■​ Logout:
●​ Exit the AWS instance:
●​ logout
●​ Notes:
■​ Ensure firewall rules allow incoming traffic on port 8081.
■​ Use Docker Hub to download and run containers.
■​ Telnet is used to verify server functionality on port 8081.
■​ Check logs if the server doesn't start properly.
■​ Crio’s Docker Byte
■​ Getting started with Docker
■​ Telnet
●​ Milestone4 - Connect app to the server
●​ Steps to Connect the QEats Android App with Your Server:
■​ Download and Install the QEats Android App:
●​ Download the APK from this link.
●​ Install it on your Android phone.
●​ If you don't have an Android phone, use an emulator (steps in
the next section).
■​ Login to QEats App:
●​ Open the app on your phone or emulator.
●​ Enter the following details:
○​ IP Address: Public IP of your AWS EC2 instance.
○​ Port Number: 8081.
●​ Log in to the app.
■​ Interact with the App:
●​ After logging in, you'll see restaurants listed.
●​ Use the app's features, such as searching for restaurants and
placing orders, to interact with the backend.
●​ Steps to Run QEats App on an Android Emulator:
■​ Go to Genymotion:
●​ Visit Genymotion.
●​ Follow the instructions to set up the Android emulator on your
computer.
■​ Install QEats App on the Emulator:
●​ Download the QEats APK on your computer.
●​ Install the APK in the emulator.
■​ Connect the Emulator to Your Server:
●​ Enter the AWS IP address and port 8081 to connect the app
to your backend server.
■​ After this, you can explore the app's features and interact with the
backend server hosted on your AWS EC2 instance.
●​ Milestone5 - Takeaways
●​ Takeaways:
■​ Successfully deployed your first app server on AWS EC2 using
Docker.
■ Your friends can now install the QEats app and connect to your server using the app link.
■​ Remember to terminate your EC2 instance after usage to avoid
unnecessary charges.
●​ Solutions to Curious Cats:
■​ Solutions to Curious Cats
■​ Try hosting Mattermost (a Slack/Microsoft Teams alternative)
yourself by following the installation instructions: Mattermost
Installation.
●​ After hosting it, connect through browser and mobile apps
(iOS/Android).
●​ Adjust firewall settings to allow port 8065 for external
connections.
●​ Interview Corner:
■​ What is EC2?
●​ Amazon Elastic Compute Cloud (EC2) provides scalable
virtual server instances in the cloud.
■​ What is an AMI (Amazon Machine Image)?
●​ Preconfigured templates containing an OS and software, used
to create EC2 instances.
■​ What are different types of AWS instances?
●​ General Purpose, Compute Optimized, Memory Optimized,
and Storage Optimized, varying in CPU, RAM, and storage to
handle different workloads.
■​ How do you connect to a remote machine?
●​ Using SSH (Secure Shell) protocol.
■​ What is SSH?
●​ A protocol for securely logging into and managing remote
servers over an encrypted connection.
■​ What is Docker?
●​ A containerization platform that packages applications and
dependencies into containers for consistent performance
across environments.
●​ Milestone6 - Next Steps
●​ Next Steps:
■​ Reflect on the experience:
●​ Was deploying the app server for QEats enjoyable?
■​ Challenge yourself:
●​ Try building the app server on your own.
■​ Explore further:
●​ Dive deeper into backend development by trying out the Java
Backend Developer Experience: Java Backend Developer
Experience.

Module 2 - QPrep - System Design


By Self
●​ GCP
○​ Google Cloud Platform (GCP) is a comprehensive suite of cloud
computing services offered by Google. It provides infrastructure, platform,
and software services to businesses and developers to build, deploy, and
scale applications and services
○​ core components and features of GCP:
■​ 1. Compute Services
●​ Google Compute Engine (GCE): Provides virtual machines
with customizable configurations, allowing users to run
workloads in the cloud.
●​ Google Kubernetes Engine (GKE): A managed environment
for deploying, managing, and scaling containerized
applications using Kubernetes.
●​ Cloud Functions: A serverless compute service that runs
code in response to events, allowing developers to build
lightweight, event-driven applications.
●​ Cloud Run: A fully managed compute platform for deploying
and scaling containerized applications without worrying about
managing infrastructure.
■​ 2. Storage Services
●​ Cloud Storage: A scalable object storage service for
unstructured data (e.g., images, videos, backups).
●​ Persistent Disk: Block storage that can be attached to virtual
machines for data persistence.
●​ Cloud SQL: Managed relational databases (MySQL,
PostgreSQL, and SQL Server) with automated backups,
scaling, and maintenance.
●​ Cloud Spanner: A globally distributed, horizontally scalable,
and strongly consistent database service for large-scale,
mission-critical applications.
●​ Cloud Firestore: A fully managed NoSQL document database
for mobile, web, and server development.
■​ 3. Networking Services
●​ VPC (Virtual Private Cloud): Allows users to create isolated
networks for their workloads.
●​ Cloud Load Balancing: Distributes incoming traffic across
multiple servers for optimal performance and availability.
●​ Cloud CDN: A content delivery network that accelerates
content delivery by caching content at the network edge.
●​ Cloud Interconnect: Provides dedicated connectivity between
GCP and on-premises infrastructure for hybrid cloud setups.
■​ 4. AI and Machine Learning
●​ AI Platform: Provides tools for building, training, and
deploying ML models, including TensorFlow integration.
●​ AutoML: A suite of ML products that allow developers to train
custom models without extensive expertise in AI.
●​ Cloud Vision API: Provides image analysis capabilities like
object detection and facial recognition.
●​ Natural Language API: Analyzes and processes natural
language for tasks like sentiment analysis, entity recognition,
and syntax analysis.
■​ 5. Big Data and Analytics
●​ BigQuery: A fully managed data warehouse for fast
SQL-based queries over large datasets, enabling real-time
analytics.
●​ Dataflow: A serverless stream and batch processing service
based on Apache Beam, designed for ETL (Extract, Transform,
Load) pipelines.
●​ Dataproc: A managed service for running Apache Hadoop and
Apache Spark clusters.
●​ Pub/Sub: A messaging service for event-driven architectures,
allowing for real-time data streaming.
■​ 6. Identity and Security
●​ Identity and Access Management (IAM): Controls who has
access to resources, defining permissions at a granular level.
●​ Cloud Identity: Manages users, devices, and apps from a
single platform, providing secure access to GCP services.
●​ Cloud Key Management: A service for managing
cryptographic keys for encrypting data and applications.
●​ Security Command Center: A tool for monitoring and
responding to security risks across GCP resources.
■​ 7. Developer Tools
●​ Cloud Source Repositories: Git-based repositories for
managing source code.
●​ Cloud Build: A continuous integration and delivery (CI/CD)
platform for automating the building, testing, and deployment
of applications.
●​ Cloud Deployment Manager: Allows users to define
infrastructure as code, deploying and managing GCP
resources with templates.
■​ 8. Hybrid and Multi-Cloud
●​ Anthos: A platform that enables consistent development and
operations across hybrid and multi-cloud environments,
allowing Kubernetes-based applications to run on GCP,
on-premises, or other clouds.
●​ Cloud Run for Anthos: Extends serverless containers to run
across environments (on-premises or in the cloud) seamlessly.
■​ 9. API Management
●​ Apigee API Management: A comprehensive platform for
designing, securing, and scaling APIs.
●​ Cloud Endpoints: A simpler API management solution for
developing and deploying APIs using NGINX.
■​ 10. Monitoring and Operations
●​ Cloud Monitoring: Provides metrics, dashboards, and alerting
to monitor infrastructure and applications.
●​ Cloud Logging: Offers real-time log management, including
storage, search, and analysis of log data.
●​ Cloud Trace: Helps developers trace and monitor distributed
applications and reduce latency.
●​ Cloud Profiler: A performance management tool that
continuously collects CPU and memory usage data from
applications.
■​ NGINX
●​ pronounced as "Engine-X"
●​ high-performance, open-source web server and reverse
proxy server. It is widely used for serving static content, load
balancing, reverse proxying, mail proxy, HTTP cache, and
more.
●​ The main goal of the NGINX project is to ensure a stable,
lightweight, and highly efficient web server for websites that
experience a huge amount of traffic.
●​ key tool in modern web development and DevOps due to its
flexibility, scalability, and efficiency
●​ Reverse proxy
○​ type of server that sits between client devices (like
web browsers) and backend servers (such as web
servers or application servers)
○​ It forwards client requests to the appropriate backend
server and then returns the server’s response to the
client.
○​ In contrast to a forward proxy, which handles outbound
traffic, a reverse proxy handles inbound traffic from
clients to servers.
●​ Reverse proxy vs Forward proxy
○​ Reverse Proxy: Handles inbound traffic, forwarding
client requests to servers. It hides the identity of
servers.
○​ Forward Proxy: Handles outbound traffic, forwarding
requests from clients to external servers. It hides the
identity of clients.
●​ Load Balancer
○​ load balancer is a critical component in cloud
computing and networking that distributes incoming
network traffic across multiple servers, ensuring no
single server becomes overwhelmed.
●​ Mail Proxy
○​ A mail proxy (or email proxy) is an intermediary
server that sits between email clients (like Outlook,
Gmail, etc.) and email servers (such as Microsoft
Exchange, SMTP, or IMAP servers).
○​ It serves to filter, route, and sometimes optimize email
traffic before it reaches its destination.
●​ HTTP cache
○​ An HTTP cache is a mechanism used to temporarily
store copies of web resources (such as HTML pages,
images, CSS files, and JavaScript)
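● A minimal sketch of NGINX as the reverse proxy described above, forwarding requests to a backend on port 8081 (the site name qeats-proxy, the listen port 8080, and the backend port are assumptions for illustration, not part of this module's setup):
sudo tee /etc/nginx/sites-available/qeats-proxy > /dev/null <<'EOF'
server {
    listen 8080;                            # 8080 avoids clashing with the default site on port 80
    location / {
        proxy_pass http://127.0.0.1:8081;   # forward requests to the backend server
        proxy_set_header Host $host;        # preserve the original Host header
    }
}
EOF
sudo ln -sf /etc/nginx/sites-available/qeats-proxy /etc/nginx/sites-enabled/qeats-proxy
sudo nginx -t && sudo service nginx reload  # validate the config, then reload NGINX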
●​ Azure
○​ cloud computing platform and service offered by Microsoft
●​ Google Compute Engine (GCE)
○​ Deploying NGINX refers to the process of installing, configuring, and running
the NGINX web server
○ Debian: a free and open-source Linux distribution
○ Linux distribution (often abbreviated as Linux distro): a complete operating system built around the Linux kernel
○​ Kernel :
■​ The kernel is a fundamental component of an operating system.
■​ It acts as the bridge between the hardware of a computer and the
software applications running on it.
■​ It is a core component of an operating system that interacts directly
with the hardware but operates in the software layer.
●​ Ping
○​ ping is a way to check whether one device can communicate with another
over a network and how long that communication takes.
○​ "Ping" is a
■​ network utility that tests the reachability of a host (like a computer,
server, or virtual machine) on an IP network.
■​ It works by sending ICMP (Internet Control Message Protocol) echo
request packets to the target device and waiting for a reply.
■​ If the device is reachable, it responds with an echo reply.
■​ Azure VMs don't respond to ping by default due to security settings
○​ ICMP
■ The ICMP protocol is typically used for diagnostics and is often used to troubleshoot networking issues.
■ One of the diagnostic tools that uses ICMP is ping.

By ChatGPT
●​ Introduction
○​ Introduction to System Design Concepts
■​ Course Overview
●​ Duration: 20 hours
●​ Focus: System Design
●​ Pre-requisites: Basic Linux knowledge
■​ Key Concepts Covered
●​ Client-Server Architecture
○​ Understanding how clients and servers interact in a
network.
●​ HTTP (Hypertext Transfer Protocol)
○​ Basics of how data is transferred over the web.
●​ DNS (Domain Name System)
○​ How domain names are translated into IP addresses.
●​ NS (Name Server)
○​ Role of name servers in DNS resolution.
●​ Load Balancing
○​ Techniques to distribute network traffic across multiple
servers.
●​ Caching
○​ Strategies to temporarily store data for quicker access.
●​ Data Consistency
○​ Ensuring data remains accurate and reliable across
systems.
■​ Learning Objectives
●​ Gain a solid understanding of web architecture fundamentals.
●​ Prepare effectively for System Design interviews.
●​ Develop practical skills to address complex design scenarios.
●​ Learn to explain the process of what happens when a URL is
typed in a browser (e.g., www.google.com).
■​ Target Audience
●​ Developers seeking practical exposure to essential System
Design concepts.
● Host a web server in the cloud
○​ Introduction to Hosting a Web Server in the Cloud
■​ Course Overview
●​ Duration: 20 hours
●​ Focus: System Design
●​ Pre-requisites: Basic Linux knowledge
■​ Architecture Diagram
●​ Objective: Study the overall architecture to be implemented in
the upcoming modules.
■​ Objective
●​ Create a Virtual Machine (VM) in the cloud and deploy an
HTTP server.
■​ Background
●​ Virtual Machine (VM):
○​ An emulation of a physical computer providing the
same functionality.
○​ Offered by major cloud providers (Google Cloud,
Amazon Cloud, Azure).
○​ Advantages:
■​ Pay-per-use model allows flexibility in VM size
(e.g., larger VM today, smaller VM tomorrow).
■​ Primary Goals
●​ Create a VM in Google Cloud Platform (GCP).
●​ Deploy and configure an NGINX HTTP server on the VM to
serve external traffic.
○​ Signing Up for Cloud Services
■​ Google Cloud Platform (GCP)
●​ Free Trial Availability:
○​ GCP offers a free trial for students without requiring a
credit card.
○​ To sign up, submit proof of student status.
●​ Registration Instructions:
○​ Follow the link to create a free cloud account: Register
for a free trial.
●​ Credit Card Note:
○​ You may be asked for credit card details, but you will
not be charged without your consent.
■​ Microsoft Azure
●​ Alternative Signup:
○​ If unable to register for GCP, sign up for Azure, which
does not require a credit card.
●​ Registration Instructions:
○​ Visit: Azure Free for Students.
●​ Creating a VM:
○​ Follow tutorials for creating a VM: Azure VM Quick
Create.
■​ Additional Notes
●​ All subsequent instructions will be focused on GCP, but the
steps will be similar in Azure.
●​ Once the setup is complete, you can proceed to your first
milestone task.
○​ Creating an Ubuntu 18.04 VM
■​ Instructions
●​ Follow the Link:
○​ Use the provided link to create a VM in Google Cloud
Engine (GCE).
●​ Version Selection:
○​ Choose Ubuntu 18.04 LTS when prompted for the
Linux version.
■​ Task Goal
●​ Aim to SSH into the created VM and execute basic commands.
●​ You can adopt any approach or tutorial from the internet that
you find comfortable.
○​ Quiz
■​ Ping Command:
●​ Can you ping the VM you just created from your laptop?
●​ Command Example: ping <External_IP_of_VM>
●​ Output Explanation:
○​ If successful, you will see response times.
○​ If unsuccessful, the output may indicate "Host
Unreachable" if using an internal IP or if an external IP
hasn't been created yet.
○​ Static vs. Dynamic IPs
■​ Overview
●​ When you create a VM, it typically gets an external IP
address.
●​ This IP address is usually a dynamic (or ephemeral) IP.
■​ Experiment Steps
●​ Note the Current External IP:
○​ Record the external IP of the VM.
●​ Stop the VM:
○​ Shut down the VM.
●​ Reboot After 5 Minutes:
○​ Restart the VM and check the external IP again.
■​ Observation
●​ IP Change:
○​ If the external IP changes after rebooting, it’s because
dynamic IPs are reassigned by the cloud provider.
■​ References
●​ Static vs. Dynamic IP: Understand the differences and
implications.
●​ IP Addresses 101: Basics of IP addressing.
○​ Adding a Static IP
■​ Reserve a Static External IP:
● Configure the VM with a static IP to prevent it from changing on reboot.
● Reference: Reserve static external IP address
■​ Ping Test:
●​ Verify that you can ping this static IP from your laptop.
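■ The reservation can also be done from the command line; a sketch with the gcloud CLI (my-static-ip and us-central1 are illustrative values):
gcloud compute addresses create my-static-ip --region=us-central1   # reserve a regional static external IP
gcloud compute addresses list                                       # note the reserved address, then attach it to the VM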
○​ Quiz Insights
■​ IP Address Comparison:
●​ Commands: Use ipconfig (Windows) or ifconfig
(Linux/Mac) to get local IP.
●​ Google Search: Typing "What’s my IP" provides your public
IP.
■​ Correct Answers:
●​ Different because:
○​ ifconfig/ipconfig gives a private IP.
○​ Google gives a public IP.
○​ The WiFi router acts as a NAT gateway.
○​ One is provided by the router, the other by the ISP.
■​ IP Address Tests:
●​ Consistency Observations:
○​ Same IP address from both laptop and mobile on WiFi.
○​ 4G IP changes based on location.
○​ Laptop reboot doesn’t change the IP address.
■​ Recommended Articles
●​ Private vs. Public IP Addresses
●​ Understanding Public IP Addresses
○​ Nginx as a Web Server
■​ Overview
●​ Nginx is a versatile web server that can also function as a
reverse proxy, load balancer, mail proxy, and HTTP cache.
●​ For this task, you will configure Nginx to serve static web
content.
■​ Installation and Configuration Steps
●​ Install Nginx:
●​ Run the following command in your VM:
●​ sudo apt -y install nginx
●​ Check Nginx Server Status:
●​ Verify that Nginx is running with:
●​ sudo service nginx status
●​ Confirm Port 80 is Open:
●​ Check if the web server is working by testing port 80:
●​ telnet localhost 80
○​ Successful Connection: If you can connect, your
Nginx server is running.
○​ Connection Refused: If you see telnet: Unable
to connect to remote host: Connection
refused, then the server is not running.
■​ Useful Utilities
●​ Ping:
○​ Checks if an IP is reachable.
●​ Example:
○​ ping www.google.com
●​ Telnet:
○​ Checks if a specific port is open on an IP.
○​ Examples:
●​ Check HTTPS port:
○​ telnet google.com 443
●​ Check SSH port:
○​ telnet google.com 22
■​ Note
●​ Ensure your VM has internet connectivity for external checks.
●​ How to use Nginx?
○​ Check Accessibility to Your Nginx Site
■​ Access the Site:
●​ Open your web browser.
●​ Type http://<your_external_ip> and hit Enter to check
if you can access the website hosted with Nginx.
■​ Successful Access:
●​ If the webpage loads successfully, everything is configured
correctly. Understand why it worked.
■​ Troubleshooting Access Issues:
■​ If you cannot view the webpage, use the telnet command to check if
the appropriate port (port 80 for HTTP) is open:
●​ telnet <your_external_ip> 80
■​ Configure VM Firewall Rules:
●​ If the port is closed, configure the VM firewall to accept HTTP
traffic.
●​ This can typically be done through:
○​ Cloud provider console settings (e.g., GCP, Azure).
○​ Command line interface (CLI) commands.
■​ References:
●​ Check the documentation on Firewall Rules for detailed steps
on setting up firewall rules for your VM.
●​ Firewall rules
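■ On GCP the same rule can be created from the CLI; a sketch (the rule name allow-http is an illustrative choice):
gcloud compute firewall-rules create allow-http \
    --allow=tcp:80 --source-ranges=0.0.0.0/0 --direction=INGRESS    # permit inbound HTTP from anywhere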
○​ View Your Hosted Web Page
■​ Access the Web Page:
●​ Open a web browser on your laptop.
●​ Type http://<vm-ip> in the address bar and press Enter.
■​ Confirm Successful Access:
●​ Ensure you can view the web page hosted with Nginx.
■​ Update the Web Page:
●​ Modify the content of your hosted website to match the
provided image.
■​ Next Steps:
●​ After completing the milestone tasks, proceed to the debrief
tasks for further instructions.
○​ Quiz Points
■​ File to Edit for Website Content:
●​ Correct Answer:
/var/www/html/index.nginx-debian.html
■​ Hosting on a Different Port (e.g., 4000):
●​ Yes, you can host the HTTP server on port 4000.
●​ Steps:
○​ Edit the port configuration in
/etc/nginx/sites-enabled/default.
○​ Update the firewall rules to allow traffic on port 4000.
○​ Access the site in your browser using
http://<vm_ip>:4000.
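■ A sketch of that port change, assuming the stock Ubuntu Nginx default site (back the file up first; the sed patterns match the stock "listen 80" lines):
sudo cp /etc/nginx/sites-enabled/default ~/nginx-default.bak        # keep a backup outside sites-enabled
sudo sed -i 's/listen 80 /listen 4000 /; s/listen \[::\]:80 /listen [::]:4000 /' /etc/nginx/sites-enabled/default
sudo nginx -t && sudo service nginx reload                          # validate and apply the change
telnet localhost 4000                                               # confirm Nginx now answers on port 4000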
●​ Host your site in a domain and understand protocols involved
○​ Introduction to Hosting Your Site
■​ Objective:
●​ Understand the sequence of events that occur when accessing
a URL.
■​ Duration:
●​ 20 hours
■​ Focus:
●​ System Design
■​ Pre-requisites:
●​ Basic Linux knowledge
■​ Background:
●​ HTTP: Fundamental protocol for the World Wide Web;
understanding its workings and error codes is crucial.
●​ DNS: Converts domain names (e.g., www.crio.do) into IP
addresses.
■​ Primary Goals:
●​ Gain a deeper understanding of how HTTP functions.
●​ Purchase a domain name.
●​ Configure the DNS server to point the domain to your server's
IP address.
○​ Learning by Doing: Purchasing a Domain
■​ Cost:
●​ Many domain providers offer domains for under $3.
■​ Recommended Provider:
●​ GoDaddy.com is a popular choice for affordable domains.
■​ Choosing a Domain:
●​ Select an inexpensive domain unless you have future plans for
it.
■​ Importance:
●​ This step is essential for practical learning; the $3 investment
is worthwhile.
■​ Student Tip:
●​ Search for domain providers that offer free domains for
students with a valid student ID.
○​ What Happens in the Browser When Visiting Your Site
■​ HTTP Request Process:
●​ When you enter a URL, the browser sends an HTTP request to
the server.
■​ Using Developer Tools (DevTools):
●​ Access DevTools to inspect network activity and analyze
request and response details.
■​ Exploring DevTools:
●​ Look for Request Headers, Response Headers, Response
Body, etc.
■​ Sample HTTP Request-Response:
●​ Familiarize yourself with how HTTP requests and responses
are structured.
○​ Quiz Insights
■​ Status Code for http://crio.do:
●​ Answer: 301 - Moved Permanently
●​ Explanation: This indicates the browser should redirect from
unsecured (http) to secured (https).
■​ Status Code for http://<vm-ip>/welcome.html:
●​ Answer: 404 - Not Found
●​ Explanation: The server returns this when the requested page
does not exist.
■​ Identifying Nginx Version:
●​ Answer: Yes, it is nginx/1.14.0.
●​ Explanation: The server version can be found in the HTTP
Response header.
■​ Extra HTTP GET Calls for Images:
●​ Answer: 5 extra HTTP GET calls.
●​ Explanation: Each image added with <img> tags results in an
additional GET request.
■​ Error for Non-existent Domain:
●​ Answer: DNS_PROBE_FINISHED_NXDOMAIN
●​ Explanation: This error indicates that the domain name does
not exist.
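■ The same status codes can be reproduced from a terminal with curl (a sketch; <vm-ip> is your VM's external IP):
curl -sI http://crio.do | head -n 1                 # expect a 301 Moved Permanently status line
curl -sI http://<vm-ip>/welcome.html | head -n 1    # expect a 404 Not Found status line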
○​ Configure DNS to Point to Your Site
■​ Access DNS Settings:
●​ Log in to your domain provider's dashboard.
■​ Add DNS ‘A’ Record:
●​ Create an 'A' record pointing to your VM's external IP address.
■​ Remove Existing Records:
●​ If there are any existing ‘A’ records, delete them before adding
the new one.
■​ Wait for Propagation:
●​ It may take a few minutes for DNS changes to propagate.
■​ Testing Accessibility:
●​ After configuration, access your web server by typing
http://<your-domain> in your browser.
■​ Identify Name Server (NS):
●​ Use the nslookup command in your VM or online to find the
Name Server for your domain.
■​ Understand Record Types:
●​ Familiarize yourself with why an 'A' record is used instead of
'TXT' or 'CNAME' records.
○​ Food for Thought
■​ Explore the differences between DNS record types and use the online
nslookup tool to practice with various domains.
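■ A command-line sketch for checking the records (example.com stands in for your own domain):
nslookup example.com        # should return your VM's external IP once the 'A' record propagates
dig +short A example.com    # the same lookup with dig
dig +short NS example.com   # lists the name servers (NS records) for the domain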
○​ Quiz Insights
■​ True Statements About DNS Records:
●​ Answer:
○​ TXT record is not used by DNS servers to resolve
names.
○​ CNAME record results in recursive DNS name
resolution calls.
●​ Explanation: Learn more about different DNS record types
here.
■​ Understanding the dig Command:
●​ Question: What happens when you run dig +trace
@d.root-servers.net www.crio.do?
●​ Answer: This command traces the DNS queries to resolve the
domain name, providing insight into the resolution process.
More details can be found here.
○​ What Happens When You Hit a URL
■​ URL Entry:
●​ You type a URL (e.g., http://google.com) into your
browser and press Enter.
■​ DNS Resolution:
●​ The browser checks its cache for the IP address associated
with the domain.
●​ If not found, it queries a DNS server to resolve the domain
name to an IP address.
■​ Establishing Connection:
●​ The browser establishes a TCP connection to the server at the
resolved IP address, often using the HTTP or HTTPS protocol.
■​ HTTP Request:
●​ The browser sends an HTTP request to the server, requesting
the webpage.
■​ Server Processing:
●​ The server processes the request and generates a response,
often retrieving data from databases or file systems.
■​ HTTP Response:
●​ The server sends back an HTTP response, which includes a
status code (e.g., 200 for success) and the requested content
(HTML, images, etc.).
■​ Rendering the Page:
●​ The browser receives the response, interprets the HTML, and
renders the webpage for the user.
●​ If there are additional resources (like CSS, JavaScript, or
images), the browser makes further requests to fetch them.
■​ Display and Interaction:
●​ The webpage is displayed to the user, who can now interact
with it.
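■ Much of this sequence can be observed from a terminal; a sketch using curl's verbose mode (example.com is a placeholder domain):
curl -v http://example.com/ -o /dev/null
# -v prints the resolved IP and TCP connection, the request headers sent by the client,
# and the status line and response headers returned by the server; -o /dev/null discards the body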
○​ Additional Resources
■​ Explore detailed explanations about the process of visiting a URL
through provided reading materials.
■​ Understand the roles of the client (browser), server (VM), and DNS in
the web architecture.
○​ Note
■​ After completing the Milestone tasks, proceed to the Debrief tasks for
further insights and learning.
■​ Using DevTools to inspect network activity
■​ HTTP 101
■​ Common DNS records and their uses
■​ Detailed explanation of what happens when you visit a URL
■​ What happens when you type google.com and press enter?
■​ Events that take place when you visit a URL
●​ Fun with Load Balancers
○​ Fun with Load Balancers
■​ Duration:
●​ 20 hours
■​ Focus:
●​ System Design
■​ Pre-requisites:
●​ Basic Linux knowledge
■​ Objective:
●​ Configure a simple Load Balancer using DNS.
●​ Understand the concept of Load Balancing.
●​ Optionally, set up an HTTP Load Balancer as an additional
challenge.
○​ Background
■​ Role of Load Balancer:
●​ Acts as a "traffic cop" in front of servers.
●​ Routes client requests to maximize speed and capacity
utilization.
●​ Prevents any single server from being overworked, ensuring
optimal performance.
■​ Benefits of Load Balancing:
●​ Automatically redirects traffic if a server goes down.
●​ Begins sending requests to new servers added to the server
group.
○​ Primary Goals
■​ Configure DNS Load Balancing.
■​ Configure an HTTP Load Balancer (optional).
○​ Create a Duplicate VM
■​ Objective:
●​ Clone your existing VM (VM1) to create a second VM (VM2).
■​ Instructions:
●​ Follow instructions from a reliable source to clone your VM.
●​ VM2 will serve as a second HTTP server.
■​ Customize VM2:
●​ Change the website content on VM2 to a new design or layout
(as specified).
■​ Terminology:
●​ Original VM: VM1
●​ Cloned VM: VM2
○​ Experiment: Server Goes Down
■​ Objective:
●​ Understand the impact of server downtime and how DNS Load
Balancing works.
■​ Steps:
●​ Shut Down VM1:
○​ Turn off VM1 and try loading your domain in a browser.
●​ Observe:
○​ Notice the behavior of the website when VM1 is down.
■​ Add VM2:
●​ Add VM2’s IP address as another ‘A’ record in the DNS
configuration.
■​ Restart VM1:
●​ Turn VM1 back on.
■​ Ping the Domain:
●​ Use the command ping <domain-name> to see the resolved
IPs.
■​ Observation:
●​ The website may serve from either VM1 or VM2.
●​ Pings will resolve to different IPs at different times.
○​ Inference
■​ DNS Behavior:
●​ The DNS server uses round-robin to distribute requests across
multiple IP addresses.
■​ High Availability:
●​ Even if one VM goes down, the website remains operational,
served by the other VM.
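■ A small sketch for watching the rotation (assumes the domain has two 'A' records; example.com is a placeholder):
for i in 1 2 3; do
  dig +short A example.com   # both IPs are returned; their order typically rotates between queries
  echo "---"
done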
○​ References
■​ DNS Failover
■​ Load Balancing 101
○​ Load Balancing in the Real World
■​ Types of Load Balancing:
●​ DNS Level:
○​ DNS Load Balancer.
●​ Network Level:
○​ Network Load Balancer (handles TCP/UDP
connections).
●​ HTTP Level:
○​ HTTP Load Balancer (manages HTTP requests).
■​ Using Nginx:
●​ Nginx is a popular choice for HTTP Load Balancing.
■​ Optional Task: Set Up HTTP Load Balancer in Google Cloud
●​ Configure an HTTP Load Balancer using Google Cloud Load
Balancer.
■​ Traffic Routing Configuration:
●​ Set the Load Balancer to route traffic based on one of the
following methods:
○​ Utilization.
○ IP combination.
○​ Round-robin.
○​ References
■​ Limitations of DNS Load Balancer
○​ Next Steps
■​ Proceed to the Debrief task after completing the Milestone tasks.
○​ Factors Affecting Load Balancer (LB) Algorithm Choice
■​ User Request Distribution:
●​ Whether successive requests from a single user can be
directed to multiple servers.
■​ Server State:
●​ Whether the servers are stateful (maintaining session
information) or stateless (not retaining session information).
■​ Cache Server Availability:
●​ The presence of cache servers that can improve response
times and efficiency.
○​ Correct Answer Summary
■​ Multiple factors must be considered when choosing a load balancing
algorithm, particularly in a financial application involving multiple API
calls.
●​ Scale your server performance with CDN
○​ Objective
■​ Handle Traffic Overloading: Implement solutions to improve website
performance under heavy traffic.
○​ Background
■​ Scaling Infrastructure: As a product grows, it's essential to scale
infrastructure appropriately based on the nature of incoming traffic.
■​ Traffic Nature Awareness:
●​ Read-Intensive Traffic: Example - Facebook homepage.
●​ Write-Heavy Traffic: Example - Payment gateway sites.
■​ Meaningful Scaling Decisions: Understanding workload
characteristics is crucial for making effective architectural choices.
○​ Primary Goals
■​ Identify Performance Bottlenecks: Analyze the server to determine
limitations.
■​ Configure a Content Delivery Network (CDN): Implement a CDN to
enhance server performance and manage traffic effectively.
○​ Sign Up with a Free CDN Provider
■​ Choose a CDN Provider: Consider using CDN77 or another free
provider of your choice.
■​ Create an Account: Follow the registration process to set up your
account.
■​ Access Free Trial: Ensure you take advantage of any free trial offers
available.
■​ Documentation: Review the provider’s documentation for setup
instructions and features.
○​ Configure Web Server to Serve a Large Image
■​ Download the Large Image:
wget
"https://raw.githubusercontent.com/drumadrian/multipart_upload_tutorial/
master/bigpic.jpg"
■​ Create an Images Directory:
sudo mkdir /var/www/html/images
■​ Copy the Image to the Directory:
sudo cp bigpic.jpg /var/www/html/images/large-file.jpg
■​ Edit the Nginx Index File:
●​ Open /var/www/html/index.nginx-debian.html and
add the following lines:
<body>​
<h1>Learn By Doing!</h1>​
<p>I hear and I forget. I see and I remember. I do and I
understand.</p>​
<div style="display:inline-flex;width:100%;">​
<div style="width:50%;">​
Without Cache<img src="images/large-file.jpg" alt="IMG_NOT_FOUND"
style="width:100%;border:1px solid #000;" />​
</div>​
</div>​
</body>
■​ Reload Your Website:
●​ Visit your website in a browser to see the large image
displayed.
○​ Scale Down the Size of Your VM
■​ Stop the Running Instance:
●​ Access your VM management interface and stop the currently
running VM instance.
■​ Select the Smallest Available Size:
●​ Navigate to the settings or configuration options for the VM.
●​ Choose the smallest machine type available in your cloud
provider (e.g., GCE).
■​ Confirm the Changes:
●​ Save the changes and ensure that the VM is configured to the
new, smaller size.
■​ Note on Scaling Down:
●​ Understand that reducing the VM size helps visualize server
performance under low load conditions.
●​ A smaller VM requires less traffic to observe performance
slowdowns compared to a larger instance.
○​ References
■​ Change the Machine Type of a Stopped Instance: Consult your
cloud provider’s documentation for specific instructions on resizing VM
instances.
○​ Record the Benchmark Performance of Your Site
■​ Measure Initial Load Time:
●​ Note the time taken to load your website, including the large
image.
■​ Perform Forced Refresh:
●​ Use Ctrl + Shift + R to force refresh the page multiple times.
●​ Record the load time for each refresh.
■​ Compare Load Times:
●​ Analyze the differences in load times between normal loads
and forced refreshes.
●​ Look for patterns in the download times.
■​ Observations from Crio Trial:
●	Without cache, loading the image took over 7 minutes.
●​ The HTTP server experienced significant slowdowns even with
just one client (your laptop).
●​ A larger VM might experience slowdowns at a higher QPS
(Queries Per Second).
■​ Conclusion:
●​ Identify whether there is a significant difference in load times
due to caching and server performance.
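■	Optional: Measure Load Time from the Terminal
●	A minimal sketch using curl's timing output; <your-domain> is a placeholder for the domain (or IP) serving your site:
# Print only the total transfer time for the large image
curl -o /dev/null -s -w "Total time: %{time_total}s\n" "http://<your-domain>/images/large-file.jpg"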
○​ Configure CDN to Serve Static Image Content
■​ Set Up CDN:
●​ Use the CDN provider (e.g., CDN77) to serve the large image.
■​ Configure Cache Expiration:
●​ Set a small cache expiration time to observe eventual
consistency.
■​ Copy CDN URL:
●​ Copy the CDN URL for the large image.
■​ Update HTML File:
●​ Replace the existing <div> in
/var/www/html/index.nginx-debian.html with the
following code:
<body>
<h1>Learn By Doing!</h1>
<p>I hear and I forget. I see and I remember. I do and I understand.</p>
<div style="display:inline-flex;width:100%;">
<div style="width:50%;">
Without Cache<img src="images/large-file.jpg" alt="IMG_NOT_FOUND" style="width:100%;border:1px solid #000;" />
</div>
<div style="width:50%;">
With Cache<img src="https://1988345710.rsc.cdn77.org/images/large-file.jpg" alt="IMG_NOT_FOUND" style="width:100%;border:1px solid #000;"/>
</div>
</div>
</body>
○​ Measure Performance with CDN
■​ Load the Updated Page:
●​ Load the webpage and closely observe the load times.
■​ Perform Force Reloads:
●​ Use Ctrl + Shift + R to force reload the website multiple times.
●​ Notice if there is a significant change in loading time after
several refreshes.
■​ Compare Load Times:
●​ Compare the load time of the image with CDN (cached) versus
the original image (without cache).
■​ Analyze Results:
●​ Assess the difference in performance between loading with
and without cache.
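■	Optional: Inspect Response Headers
●	A quick way to compare the origin server and the CDN responses (a sketch; the CDN URL is whatever your provider generated, and the exact cache-related header names, such as X-Cache, vary by provider):
# Headers from your origin server (no CDN)
curl -I "http://<your-domain>/images/large-file.jpg"
# Headers from the CDN copy; look for Cache-Control / age / cache-hit style headers
curl -I "https://<your-cdn-url>/images/large-file.jpg"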
■​ Experiment with Eventual Consistency
■​ Download Images:
■	Use the following commands to download images:
wget "https://storage.googleapis.com/crio-assets/CrioLogo.png"
wget "https://storage.googleapis.com/crio-assets/CrioLogoWhite.jpg"
■​ Create a Toggle Script:
●​ Create a script to change the image every few seconds:
cat > toggle-images.sh <<- "EOF"
#!/bin/bash

x=1

while :
do
    if [ $x -eq 1 ]
    then
        sudo cp CrioLogo.png /var/www/html/images/large-file.jpg
        x=0
    else
        sudo cp CrioLogoWhite.jpg /var/www/html/images/large-file.jpg
        x=1
    fi
    echo "Changing image; ctrl c to abort"
    sleep 5s
done
EOF
■​ Set Execute Permission:
●​ Make the script executable:
chmod +x toggle-images.sh
■​ Run the Toggle Script:
●​ Start the script to change the image every 5 seconds:
sudo ./toggle-images.sh
■​ Force Refresh the Website:
●​ Refresh your website several times quickly (Ctrl + Shift + R).
●​ Observe the changes in the image displayed.
■​ Note Differences:
●​ You may see two different versions of large-file.jpg:
○​ One served from your web server (master source of
truth).
○​ Another served from the CDN.
■​ Stop the Script:
●​ Once you stop the script and refresh the page, the images
should eventually match again.
■​ Understand Eventual Consistency:
●​ This phenomenon is known as Eventual Consistency, where
data will become consistent across all sources over time.
○​ References
■​ Eventual vs. Strong Consistency in Distributed Databases
■​ Relatable Example of Eventual Consistency
○​ Quiz Answers
■​ Plausibly True Statements:
●​ Facebook uses Eventual Consistency for posts appearing
on your Wall.
●​ Twitter uses Eventual Consistency for comments on your
posts.
●​ All financial institutions use Immediate Consistency for
credit/debit transactions.
■​ Explanation:
●​ Eventual Consistency is acceptable in scenarios where users
can tolerate slight delays in data updates (e.g., social media
posts).
●​ Immediate Consistency is crucial for financial transactions to
ensure data accuracy across all servers.
■​ True Statements:
●​ Sharding helps with write-heavy workloads.
●​ CDN helps with cutting down latency for read-heavy
workloads.
■​ Explanation:
●​ Sharding distributes data across multiple servers, improving
performance for write-heavy operations.
●​ CDNs reduce latency by caching and serving static content
closer to users, optimizing read-heavy workloads.
○​ Key Concepts
■​ Client-Server Architecture
●​ Fundamental model where clients request resources or
services from servers.
■​ HTTP and HTTP Status Codes
●​ Protocol for transferring data over the web; status codes
indicate the result of a request (e.g., 200 OK, 404 Not Found).
■​ DNS and Name Servers
●​ DNS translates domain names into IP addresses; name
servers manage these translations.
■​ Load Balancers
●​ Distribute incoming traffic across multiple servers to optimize
resource use and ensure availability.
○​ DNS Load Balancers: Distribute traffic based on DNS
records.
○​ HTTP Load Balancers: Distribute traffic at the HTTP
level, often considering session data.
■​ Caching and Data Consistency
●​ Caching: Storing copies of files to reduce access time.
●​ Eventual Consistency vs. Immediate Consistency:
Eventual consistency allows for temporary discrepancies, while
immediate consistency ensures all servers reflect the same
data instantly.
○​ System Design Questions
■​ Design an Image Server
●​ Consider architecture to serve millions of static images
globally.
●​ Analyze if the architecture changes for serving memes due to
differences in file sizes, access patterns, and storage.
■​ Uber App Server Selection
●​ Identify which server the Uber app contacts based on user
location (e.g., Bangalore vs. Sydney).
●​ Determine the layer (e.g., Load Balancer, DNS) responsible for
routing the request to the appropriate server.
○​ Reflection
■​ Assess whether your understanding of these concepts allows you to
answer the above questions more effectively than before.
Module 3 - Jenkins
●​ Introduction to Jenkins and CI/CD
○​ Overview
■​ Duration: 2 hours
■​ Focus: Jenkins, CI/CD
■​ Pre-requisites: None
○​ Background
■​ Evolution of Software Development:
●	Traditional waterfall model included lengthy cycles:
○​ Month 1: Requirements
○​ Months 2-3: Development
○​ Month 3: Testing + Bug fixes
○​ Month 4: Alpha/Beta Launch + Bug fixes + Testing
○​ Month 5: Software shipped
●​ Modern development cycles can be as short as 1-2 days.
■​ Challenges of Rapid Development:
●​ Frequent updates mean developers may forget earlier code
changes by the time they go live.
○​ Continuous Integration/Continuous Deployment (CI/CD)
■​ CI/CD Concept:
●​ Automates the development, testing, and deployment process.
■​ Typical Workflow:
●​ Code Commit: Developer commits code to a git repository or
SCM.
●​ Build Process: Automated build process is triggered.
●​ Monitoring: Continuous monitoring during the build (can take
1+ hour).
●​ Testing: Runs unit tests and code quality checks post-build.
●​ Failure Notifications: If failures occur, sends an email to the
developer for fixes.
●​ Deployment: If successful, deploys code to the 'dev'
environment and runs integration tests.
■​ Key Phases:
●​ Build Phase
●​ Test Phase
●​ Deploy Phase
○​ Role of Jenkins
■​ Automation Server: Jenkins automates the CI/CD pipeline.
■​ Jobs: Each step (build, test, deploy) is accomplished through Jenkins
jobs, which can be configured with dependencies.
○​ Primary Goals
■​ Create/Modify/Debug Jobs: Focus on jobs across build, test, and
deploy phases.
■​ Experiment with Job Configurations: Play around with different job
settings.
■​ Explore Other Applications: Understand various applications and
capabilities of Jenkins in development workflows.
●​ Context: Building a Simple Calculator Application with Jenkins
○​ Repository Information
■​ Repository URL: Calculator Application Repository
■​ Branches:
●​ master:
○​ Deployed to production systems.
○​ Should have zero bugs.
●​ working_version:
○​ Used by developers for feature changes.
○​ Undergoes extensive testing before merging into
master.
○​ CI/CD Pipeline Components
■​ Jobs to Create:
●​ Build Job
●​ Test Job
●​ Deploy Job
○​ What is a Jenkins Job?
■​ A Jenkins job is a task executed to achieve objectives, such as:
●​ Building source code
●​ Running unit/integration tests
●​ Deploying code to cloud platforms
●​ Scheduling periodic backups
●​ Sending alerts (e.g., weather alerts via SMS)
■​ Job Execution
■​ A Jenkins job executes a series of commands to perform its tasks.
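■	For instance, a simple build/test job might boil down to shell commands like the following (a sketch using the calculator repository from this Byte, not the exact job configuration):
git clone https://gitlab.crio.do/crio_bytes/me_jenkins.git
cd me_jenkins
./gradlew build
./gradlew test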
○​ Activity: Deployment Job
■​ Build the Job:
●​ Navigate to the Deployment job.
●​ Click Build Now.
●​ Check the Build History for the new run.
■​ Review Console Output:
●​ Click the link for the new build run.
●​ Access Console Output to see the executed commands.
■​ Analyze Commands:
●​ Look for commands following the '+' symbol.
●​ Identify where these commands are configured in the job
settings.
○​ Job Configuration Overview
■​ Accessing Configuration:
●​ Click on the Configure button in the left pane.
■​ Configuration Sections:
●​ General:
○​ Job description, parameter passing, concurrency
settings.
●​ Source Code Management:
○​ Information for cloning the repository (e.g., GitLab).
○​ Specify branches for code checkout.
●​ Build Triggers:
○​ Various methods to trigger the job (manual, webhooks,
periodic).
●​ Build:
○​ Define actions to execute (e.g., shell commands,
Gradle commands).
●​ Post-build Actions:
○​ Define actions after the job completes (e.g., triggering
another job, sending emails).
○​ Plugins in Jenkins
■​ Functionality Enhancement:
●​ Plugins extend Jenkins capabilities, similar to browser
extensions.
●​ Examples include:
○​ Deployment to AWS
○​ Ticket creation on Jira
○​ Slack notifications
○​ Beautiful test reports
■​ Note: Plugin installation may be restricted in certain environments
(e.g., Crio Labs).
○​ Handling Concurrency
■​ Distributed Builds:
●​ Jenkins uses a Master-Slave setup for resource-intensive jobs.
●​ The master server distributes jobs to slave machines, which
process them concurrently.
○​ Curious Cats
■​ Build Number:
●​ Maintained for each job run to allow independent access to
console output.
●​ Facilitates checking historical runs by build number.
■​ Job Isolation:
●​ Each job has its own directory (sandbox), ensuring no
interference unless common resources are used.
●​ Changes to shared resources (e.g., files in /etc/) may affect
multiple jobs.
○​ Reference
■​ Sample question
●​ Creating a Test Job in Jenkins
○​ Activity Overview
■​ Goal: Create a test job in Jenkins.
■​ Job Name Format: <your_email_id>-test-build-job
●​ Remove all special characters and domain name from your
email.
●​ Example: crio.beaver@gmail.com →
criobeaver-test-build-job
○​ Job Requirements
■​ Clone Repository:
●​ Clone the repository:
https://gitlab.crio.do/crio_bytes/me_jenkins.g
it.
■​ Branch Checkout:
●​ Checkout the appropriate branch based on the Build
parameter.
■​ Run JUnit Tests:
■​ Execute JUnit tests using the Gradle wrapper: ./gradlew test
■​ On Success:
●​ Publish JUnit Test Results:
○​ Test results can be found in:
/build/test-results/test/TEST-*.xml.
●​ Trigger Deployment Job:
○​ Trigger the AWS deployment job:
deploy-build-job-CRIO-DO-NOT-MODIFY.
■​ On Failure:
●​ Publish JUnit Test Results:
●​ Send Email Notifications:
○​ Email to:
■​ jenkins-lab-auto-emails@criodo.com
■​ Your email address (use space to separate
email addresses).
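■	Sketch of the shell commands involved:
●	In the actual job, cloning and checkout are typically configured via the Source Code Management section; the commands below only illustrate what the job effectively does, with $BRANCH being the build parameter:
# $BRANCH is the build parameter (e.g., master or working_version)
git clone https://gitlab.crio.do/crio_bytes/me_jenkins.git
cd me_jenkins
git checkout "$BRANCH"
# Run the JUnit tests; results end up under build/test-results/test/
./gradlew test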
○​ Testing the Job
■​ Run the Job:
●​ Test on both branches by passing different parameters.
●​ Expected Outcomes:
○​ Tests on the master branch should run successfully
and trigger deployment.
○​ Tests on the working_version branch will fail, and
you should receive an email notification.
○​ Configuration Insights
■​ Parameter Usage:
●​ Observe how $BRANCH is used to get the passed parameter in
the job configuration.
■​ Local Verification:
●​ If unsure of the path for generated test XML files, clone the
repository locally and run the same commands to find the XML
file location.
■​ Viewing Test Results:
●​ Access JUnit test results from the left pane via the ‘Test Result’
icon.
○​ Quiz Question
■​ What was the reason for the test failure in
com.java.calculator.CalculatorTest.testDivisionByZer
o()?
●​ Correct Answer:
org.opentest4j.AssertionFailedError: Expected
com.java.calculator.exception.DivisionByZeroExce
ption to be thrown, but nothing was thrown.
●​ Your First CI/CD Setup in Jenkins
○​ Overview
■​ Congrats! You've created your first Jenkins job that runs tests.
■​ CI/CD Stages Completed:
●​ Build Job
●​ Test Job - ✓
●​ Deploy Job - ✓
○​ Build Job
■​ Purpose:
●​ Integrates GitLab with Jenkins for building applications.
●​ Typically involves tasks like creating JAR files, publishing
artifacts, building Docker images, and performing static code
analysis (e.g., Checkstyle, Spotbugs).
■​ Configuration Similarities:
●​ Similar to the test job but focused on running the Gradle build
task.
○​ Activity
■​ Check Build Triggers:
●​ Examine the Build triggers section of the Build job.
●​ Observation: Builds are triggered when developers push code
changes or create a merge request.
■​ Integration Setup:
●​ GitLab (and other SCMs) easily integrate with Jenkins.
●​ Configure Jenkins integration in GitLab to automate build
triggers.
○​ Quick Recap of CI/CD Pipeline
■​ Developer Actions:
●​ Developer commits code to
https://gitlab.crio.do/crio_bytes/me_jenkins or
creates a merge request.
■​ Build Job Triggered:
●​ GitLab triggers the build-job-CRIO-DO-NOT-MODIFY.
■​ Test Job Execution:
●​ The test-build-job-CRIO-DO-NOT-MODIFY is run either
periodically or after a successful build.
■​ Deployment:
●​ If all tests pass, the
deploy-build-job-CRIO-DO-NOT-MODIFY is executed.
○​ Key CI/CD Terms
■​ Continuous Integration (CI):
●​ Developers frequently merge changes to the main branch,
validating with builds and automated tests to prevent
integration issues.
■​ Continuous Delivery (CD):
●​ Automatically deploys code changes to testing and/or
production environments post-build.
■​ Continuous Deployment:
●​ Every change that passes all stages is automatically released
to customers without human intervention.
○​ Reference
■​ Learn More: What is CI/CD and what are its benefits?
●​ Other Uses of Jenkins
○​ Jenkins has a wide range of applications beyond CI/CD setups. Here are
some notable examples:
○​ Drop-in Replacement for Cron Jobs:
■​ Scheduled Tasks: Jenkins can execute jobs at specific times
periodically, such as:
●​ Backing up servers every 4 hours.
●​ Identifying and removing orphaned resources in the cloud to
save costs.
●​ Performing disk cleanup on servers.
●​ Running health checks on infrastructure.
○​ Offline Jobs:
■​ Handling Synchronous vs. Offline Tasks: Large companies often
have both real-time and offline tasks.
■​ Example: In online payment processing (e.g., credit card payments
on Amazon):
●​ Users receive immediate payment confirmation (synchronous
task).
●​ Offline jobs run nightly to reconcile credit card payments with
banks.
■​ Parallel Execution: Jenkins can parallelize these offline jobs across
multiple slave machines for efficiency.
●​ Why Improve Debugging Skills in Jenkins
○​ Importance of Debugging:
■​ Configuring Jenkins jobs often requires troubleshooting.
■​ Reading console logs is crucial for identifying and isolating problems
quickly.
○​ Exercise: Failure Job #1
■​ Activity:
●​ Run the job labeled Failure Job #1.
●​ Identify the reason for the failure.
■​ Observation:
●​ Review the console logs.
●​ Key finding: The error indicates that the docker command
was not found.
■​ Conclusion:
●​ This suggests that the Jenkins system likely doesn’t have
Docker installed.
●​ Action: Contact the Jenkins administrator to install Docker
(e.g., using sudo apt install docker).
○​ Exercise: Failure Job #2
■​ Activity:
●​ Run the job labeled Failure Job #2.
●​ Determine the cause of the failure.
■​ Observation:
●​ Scroll through the console logs.
●​ Key finding: The error states that checkStyleMain is not
found in the project.
■​ Debugging Steps:
●​ Replicate Jenkins steps locally to identify the issue.
■​ Clone the GitLab repository:
git clone https://gitlab.crio.do/crio_bytes/me_jenkins.git​
cd me_jenkins
■​ Run the following commands:
./gradlew build​
./gradlew test​
./gradlew checkStyleMain
●​ Identify that checkStyleMain is not a valid task.
■​ Action:
●​ Update the build.gradle file to resolve the issue before
re-running the job in Jenkins.
●​ Interview Corner: Jenkins Cheat-Sheet
○​ Resource: Jenkins Cheat-Sheet
○​ Quiz Questions
■​ Which parameter decides how many concurrent jobs you can run
on Jenkins?
●​ Correct Answer: # of executors
●​ Explanation:
○​ An executor is a process that runs a build/job.
○​ Example: With one executor, only one job runs at a
time; with two executors, two jobs can run in parallel.
■​ When would you go for Jenkins Pipeline?
●​ Correct Answer:
○​ When you need to define the whole application lifecycle
with complex dependencies.
○​ When you want to code the configuration.
○​ When you require robustness, such as automatic
resume after unexpected server restarts.
●​ References
○​ https://wiki.jenkins.io/display/JENKINS/Distributed+builds
Module 4 - Docker Introduction


●​ Overview
○​ Introduction: Learn Docker Basics
■​ Duration: 8 hours​
Focus: Containers​
Pre-requisites: None
○​ Objective:
■​ Understand the Docker philosophy and basic Docker commands.
○​ Background:
■​ Containers are widely used in the cloud computing world, crucial for
modern backend development.
■​ Containers are essential for creating scalable and serverless systems
like Kubernetes, Docker Swarm, and Cloud Run.
○​ Key Concepts:
■​ Importance of Containers:
●​ Containers are essential for modern backend development.
●​ Almost all cloud systems use containers or serverless
technology.
■​ Evolution and Advantages:
●​ Learn why containers are used and their evolution.
●​ Understand their advantages over traditional systems.
■​ Docker Syntax and Files:
●​ Understand the structure and syntax of Dockerfiles.
■​ Container Creation:
●​ Learn how to create basic containers from your code or Docker
images.
■​ Publishing with Containers:
●​ Learn how to publish your code using containers.
○​ Primary Goals:
■​ Familiarize yourself with container terminology.
■​ Learn the basics of Docker and how to create containers.
■​ Understand and work with Dockerfiles.
■​ Publish your code using containers.
●​ Setup1 - Get started with the Linux terminal
○​ Getting Started with Docker Byte:
■​ Two Working Options:
●​ Option 1: Gitpod (Recommended):
○​ Online workspace environment.
○​ No setup required for Docker installation.
○​ Use Gitpod by signing in with GitHub.
●​ Option 2: Local Setup:
○​ Install Docker Desktop on your local system.
○​ Download Docker Desktop from here.
○​ Ensure Docker Desktop is running when working on the
project.
■​ Running Docker Desktop:
●​ When Docker Desktop is active, check the bottom left for
status: "Running."
●​ If it shows "Failed to start," follow specific troubleshooting
steps provided in the guide.
■​ Important Note:
●​ Docker commands are not supported in Crio Workspace.
●​ Choose either Gitpod or Docker Desktop for working on this
Byte.
●​ Milestone1 - Know the basics
○​ Why Containers?
■​ Case Study: QMoney:
●​ Initial Setup:
○​ QMoney, a portfolio management firm, decided to
enable user trading with stock exchanges.
○​ They integrated a Python library for trading while
primarily using Java for development.
○​ This led to hosting a separate microservice in Python
for trading functionality.
■​ Complications Arising:
●​ Multiple Servers:
○​ Developers ended up managing three different servers:
■​ Server 1: Ubuntu 18 + Java 11 + QMoney
REST APIs
■​ Server 2: Ubuntu 18 + Python 3.4 + Trading
APIs
■​ Server 3: Ubuntu 20 + Python 3.8 + TensorFlow
for technical analysis
●​ This setup created challenges in executing end-to-end
workflows due to version incompatibilities.
■​ Need for Solutions:
●​ Developers struggled with maintaining multiple environments
and dependencies.
●​ The complexity of managing different versions of Java and
Python on the same machine increased significantly.
○​ Evolution of Containerization
■​ Challenges with Virtual Machines:
●​ Virtualization: Older solutions involved setting up separate
machines for different environments using tools like VirtualBox
or VMware.
●​ Problems with Virtual Machines:
○​ Resource-intensive (heavy on CPU and RAM).
○​ Difficult to manage and replicate environments.
■​ Introduction of Containers:
●​ Containers emerged as a lightweight alternative to virtual
machines.
●​ They encapsulate applications and their dependencies,
allowing them to run consistently across different environments
without the overhead of a full operating system.
■​ Key Concepts:
●​ Containers provide a solution to the challenges faced by
QMoney developers.
●​ They enable seamless deployment of applications regardless
of underlying infrastructure, improving efficiency and scalability.
■​ Further Exploration:
●​ For a deeper understanding, watch the videos on
Containerization Challenges and Evolution of Containerization.
●​ Milestone2 - Feel the pain
○​ Feel the Pain of Going Without Docker
■​ Installation Example: RabbitMQ:
●​ Follow the guide to install RabbitMQ on Ubuntu: Install
RabbitMQ.
●​ Steps Involved:
○​ Install Erlang.
○​ Install RabbitMQ.
○​ Setup necessary ports and configurations.
■​ Challenges Faced:
●​ Expect difficulties with:
○​ Sudo Privileges: Need admin rights for installations.
○​ User Settings: Configuration issues may arise.
■​ Configuration Complexity:
●​ Installing software involves several tasks:
○​ Install executables.
○​ Set up default configurations.
○​ Create user groups with specific access.
○​ Configure system services and startup scripts.
●​ Some configurations are irreversible, leading to residual
components even after uninstallation.
■​ System Pollution:
●​ Each installed software can clutter your system, making it
harder to manage dependencies and configurations.
○​ Advantages of Containers
■​ Single Command Installation:
■​ To run RabbitMQ using Docker, use the command:
docker run -d --hostname my-rabbit --name some-rabbit
rabbitmq:3
●​ This command downloads and runs a RabbitMQ server quickly
without the complexities of manual installation.
■​ Ease of Use:
●​ Note: Use sudo before Docker commands if facing
permissions issues.
■​ Applying to QEats Application:
●​ Running the QEats app in production involves multiple
dependencies (Java, MongoDB, RabbitMQ, Redis).
■​ Instead of lengthy installations, run it with a single command:
docker run -p 8081:8081 criodo/qeats-server
■​ Testing the QEats API:
■​ Open another terminal and test the QEats API:
curl -X GET
"http://localhost:8081/qeats/v1/restaurants?latitude=2
8.4900591&longitude=77.536386&searchFor=tamil"
●​ Observation: The API works without manual installations;
Docker handles all dependencies and configurations
automatically.
●​ Milestone3 - Getting started
○​ Create Your First Container
■​ Set Up Directory:
■​ Navigate to your workspace and create a directory:
cd ~/workspace​
mkdir docker-apache && cd docker-apache
■​ Create a Dockerfile:
touch Dockerfile
■​ Edit Dockerfile:
●​ Add the following content to your Dockerfile:
FROM httpd:2.4​
RUN echo "Hello World" > /index.html​
RUN cp /index.html /usr/local/apache2/htdocs/
■​ Alternatively, you can use:
echo -e "FROM httpd:2.4\n \
RUN echo 'Hello World' > /index.html \n \
RUN cp /index.html /usr/local/apache2/htdocs/" > Dockerfile
■​ Build the Docker Image:
■​ Run the build command:
cd ~/workspace/docker-apache​
docker build -t apache-server .
●​ Check the output for successful build messages.
■​ Run the Container:
■​ Start the container using:
docker run -p 80:80 apache-server
■​ Access the Application:
●​ Test by opening a browser or using curl:
○​ curl http://localhost
■​ List Docker Images:
■​ Run the following command to see all images:
○​ docker images -a
●​ Note the additional images generated during the build
process.
■​ Rebuild the Image:
■​ Build the image again:
○​ docker build -t apache-server .
●​ The process finishes quickly due to caching (known as
layers in Docker).
○​ Introduction to Docker Layers
■​ Understanding Layers:
●​ Each command in a Dockerfile creates a layer.
●​ Layers help speed up builds by caching results and reusing
them.
■​ Clone and Explore Spring Boot Application:
●​ Navigate to your workspace and clone the repository:
cd ~/workspace
git clone
git@gitlab.crio.do:public_content/spring-start
er.git
cd spring-starter
●​ Examine the Dockerfile in the cloned repository.
■​ Review Dockerfile Contents:
●​ The Dockerfile includes steps to build a Spring Boot
application:
○​ Base image: FROM gradle:jdk11-focal
○​ Environment variable for non-interactive installs: ARG
DEBIAN_FRONTEND=noninteractive
○​ Install required packages and MongoDB.
○​ Create a directory and copy application code.
○​ Build the application and define the entry point.
■​ Build Spring Boot Image:
●​ Create the Docker image:
○​ docker build -t spring-starter .
■​ Explore Docker Hub:
●​ Visit Docker Hub to find your image.
●​ Click on "tags" to view image details.
■​ Understanding Layers:
●​ Changes made during image creation are recorded as layers.
●​ Layers are similar to Git commits, enabling efficient storage
and versioning.
○​ Push Your Own Images to Docker Hub
■​ Create Docker Hub Account:
●​ Sign up at Docker Hub.
■​ Login via Terminal:
●​ Use the command:
●​ docker login
■​ Tag Your Image:
●​ Tag the image for pushing:
●​ docker tag <image_id>
<your_docker_hub_id>/spring-starter
■​ Push the Image:
●​ Push your image to Docker Hub:
●​ docker push <your_account>/spring-starter
■​ Verify Upload:
●​ Check your Docker Hub account to confirm the image upload.
●​ Milestone4 - Know the commands
○​ Common Docker Commands
■​ Overview
●​ Working with Docker involves more than just starting and
stopping containers; it includes various features for container
management, execution, and communication.
■​ Categories of Docker Commands
●​ Container Management:
○​ docker ps / docker ps -a: Lists all
running/stopped containers.
○​ docker rm: Removes a specified container from the
workspace (similar to VM termination).
○​ docker rmi: Removes the underlying image from the
cache.
●​ Container Execution:
○​ docker run: Runs a container with various options,
including:
■​ --volume: Shares the filesystem with the
container.
■​ --port: Maps container ports to host ports.
■​ --name: Assigns a name to the container.
■​ --attach: Attaches to the container's output.
■​ --rm: Automatically removes the container after
it exits.
○​ docker stop: Stops a running container.
○​ docker start: Starts a stopped container.
●​ Container Communication:
○​ docker exec: Connects to a running container to
execute a command.
○​ docker cp: Copies files to/from a running container.
■​ Additional Information
●​ For a complete list of options for a specific command,
use: docker <command> --help
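●	Putting the options together, an illustrative run might look like this (the name, host port, and host directory are placeholders):
# Custom name, port mapping, shared directory, and auto-removal on exit
docker run --rm --name my-apache-server -p 8080:80 -v "$(pwd)/html:/usr/local/apache2/htdocs" apache-server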
○​ Quiz Highlights
■​ COPY Command Placement:
●​ Question: Where should you place the COPY command in a
Dockerfile?
●​ Correct Answer: Just before the CMD command.
●​ Explanation: The COPY command invalidates the cache for
subsequent layers, making it efficient to place it towards the
end of the Dockerfile.
■​ Resulting Layers from Two Dockerfiles:
●​ Question: How many resulting layers will there be from two
given Dockerfiles?
●​ Correct Answer: 5 layers.
●​ Explanation: Each command creates a new layer. Even
identical commands in different Dockerfiles create separate
layers due to different checksums, leading to a total of 5 layers
when combined.
Module 5 - Docker Advanced


●​ Overview
○​ Advanced Docker Overview
■​ Objective
■​ Gain familiarity with various advanced Docker commands and
container management.
○​ Background
■​ Working with containers extends beyond starting and stopping them; it
involves a wide range of functionalities.
■​ Understanding how to share the filesystem and utilize Docker
commands is essential.
○​ Key Docker Commands
■​ Container Management:
●​ docker build: Used for building images (previously
discussed).
●​ docker ps / docker ps -a: Lists all running and stopped
containers.
●​ docker rm: Removes a specified container (similar to VM
termination).
●​ docker rmi: Removes an image from the cache.
■​ Container Execution:
●​ docker run: Starts a container with various options,
including:
○​ --volume: Shares filesystem with the container.
○​ --port: Maps container ports to host ports.
○​ --name: Assigns a name to the container.
○​ --attach: Attaches to the container's output.
○​ --rm: Automatically removes the container after exit.
●​ docker stop: Stops a running container.
●​ docker start: Starts a previously stopped container.
■​ Container Communication:
●​ docker exec: Connects to a running container to execute
commands.
●​ docker cp: Copies files to or from a running container.
●​ Primary Goals
■​ Play with Docker containers.
■​ Understand commands for managing containers.
■​ Learn to publish, delete, and republish container images while
grasping the underlying infrastructure.
●​ Setup1 - Get started with the Linux terminal
○​ Getting Started with Docker
■​ Two Setup Options:
●​ Gitpod (Online Workspace): No installation required.
○​ Click on the provided Gitpod link.
○​ Complete GitHub authorization.
○​ A new Gitpod workspace will be created automatically.
●​ Local Setup:
○​ Install Docker Desktop from the official Docker website.
○​ Ensure Docker is running (check the bottom left corner
for confirmation).
■​ Docker Desktop:
●​ If Docker fails to start, follow troubleshooting steps (hack
provided).
■​ Note:
●​ Docker commands are not supported in Crio Workspace, so
use either Gitpod or Docker Desktop for the project.
●​ Milestone1 - Container management
○​ Container Management Steps
■​ Running the Container:
●​ Command: docker run -p80:80 apache-server
●​ This runs the container and maps port 80 on the host to port
80 in the container.
■​ Checking Running Containers:
●​ Command: docker ps
●​ Lists all running containers.
■​ Checking Stopped Containers:
●​ Command: docker ps -a
●​ Lists all containers, including stopped ones.
■​ Starting a Stopped Container:
●​ Command: docker start <container_id>
●​ Use the container ID from the docker ps -a output to start a
stopped container.
■​ Port Mapping:
●​ Example: -p80:80
●​ Maps host port 80 to guest (container) port 80. This cannot be
changed after the container starts.
■​ Testing Port Mapping:
●​ Command: curl localhost
●​ If the mapping works, this command returns the content served
on port 80 (e.g., the Apache server homepage).
■​ Stopping and Removing Containers:
●​ Commands:
○​ docker stop <container_id> – Stops the running
container.
○​ docker rm <container_id> – Removes the
stopped container.
■​ Confirming No Containers Are Running:
●​ Command: docker ps -a
●​ Ensure no containers are listed after removal.
■​ Modifying the Dockerfile:
●​ Comment out the lines that write content to
/usr/local/apache2/htdocs.
■​ Rebuilding the Image:
●​ Command: docker build -t apache-server .
●​ Rebuild the image after Dockerfile modifications.
■​ Verifying Image Tag:
●​ Command: docker images
●​ Check that the image is correctly tagged as apache-server.
■​ Running the New Container:
●​ Command: docker run -p80:80 apache-server
●​ This starts the container with the updated image.
■​ Accessing Apache Default Page:
●​ Command: curl localhost
●​ You will see the Apache server's default homepage if
everything is set up correctly.
●​ Milestone2 - Playing with running container
○​ Playing with a Running Container
■​ Listing Running Containers:
●​ Command: docker ps
●​ Note the container ID from the output.
■​ Executing a Command in a Running Container:
●​ First attempt: docker exec <container_id> bash
●​ The command executes but nothing is returned because of
missing parameters.
■​ Checking Docker Exec Parameters:
●​ Command: docker exec --help
●​ Important flags:
○​ -d: Run in non-interactive mode.
○​ -e: Set environment variables.
○​ -i: Run in interactive mode (keeps standard I/O
connected).
○​ -t: Allocate a terminal to the running process.
■​ Running an Interactive Terminal:
●​ Correct command: docker exec -it <container_id>
bash
●​ Opens a bash terminal inside the container.
■​ Exploring Inside the Container:
●​ Use standard commands like ls, cd, and cat to explore files
within the container.
●​ Example: Check the content of index.html inside
/usr/local/apache2/htdocs/.
■​ Exiting the Container:
●​ Command: exit
●​ Exits the terminal session within the container.
■​ Creating a Simple File on Host Machine:
●​ Command: echo "This is my simple html file" >
~/index.html
●​ Check the content with cat ~/index.html.
■​ Copying File from Host to Container:
●​ Command: docker cp ~/index.html
<container_id>:/usr/local/apache2/htdocs/index
.html
●​ Copies the file index.html from the host system to the
container.
■​ Verifying Changes Inside the Container:
●​ Reconnect to the container using: docker exec -it
<container_id> bash
●​ Check the updated file using: cat
/usr/local/apache2/htdocs/index.html.
■​ Testing the Updated Content:
●​ Command: curl localhost
●​ Confirms if the changes reflect in the Apache server running
inside the container.
●​ Milestone3 - Container execution commands
○​ Container Management Commands and Concepts
■​ Stopping a Running Container:
●​ Command: docker stop <container_name>
●​ Example: docker stop upbeat_wescoff (container name
generated by Docker).
■​ Automatic Container Naming:
●​ Docker assigns cool names to containers if not provided by the
user.
●​ You can check this using the command: docker run
--help | grep name.
■​ Running a Container with a Custom Name:
●​ Command:
●​ docker run -p<host_port>:<container_port>
--name <custom_name> <image_name>.
●​ Example:
●​ docker run -p8080:80 --name my-apache-server
apache-server.
■​ Mounting a Volume (File Sharing between Host and Container):
●	Command:
docker run -v <host_directory>:<container_directory> <image_name>
●	Example:
docker run -v html:/usr/local/apache2/htdocs apache-server
●	This mounts the local html folder to the container's Apache server directory.
■​ Creating Files and Directories on Host System:
●​ Create file: echo "This file is hosted from host
system" > index.html
●​ Move file: mkdir html && mv index.html html
■​ Accessing the Hosted Content:
●​ Check running containers: docker ps
●​ Access content via browser or command line: curl
localhost:8080
■​ Stopping and Removing Containers:
●​ Stop container: docker stop my-apache-server
●​ Remove container image: docker rmi apache-server
■​ Listing and Removing Images:
●​ List all images (including unused ones): docker images -a
●​ Remove unused images: docker image prune
●​ Remove all unused containers and images: docker system
prune
■​ Understanding Dockerfile and Build Process:
●​ Example Dockerfile snippet:
FROM gradle:jdk11
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get -y upgrade
RUN apt-get -y install git redis-server wget
RUN git clone git@gitlab.crio.do:public_content/spring-starter.git
RUN cd spring-starter && chmod +x gradlew && ./gradlew bootjar

●	Will it pick up the latest version of the spring-starter repository?
○​ No, because the Docker build process uses cached
layers. The git clone command will not re-fetch the
latest commit unless there is a change in the Dockerfile
before the git clone step.
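○	If you do want the latest commit, one option (a sketch using a standard Docker flag) is to rebuild while skipping the layer cache:
docker build --no-cache -t spring-starter .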
●​ Milestone4 - Know the commands
○​ Introduction to Docker Orchestration
■​ Docker offers much more functionality than just starting and stopping
containers.
■​ Filesystem Sharing: Docker supports sharing the filesystem between
host and container.
■​ To explore more, run: docker --help.
○​ Key Docker Command Categories:
■​ Container Management:
●​ docker build: Builds Docker images (covered previously).
●​ docker ps: Lists running containers.
●​ docker ps -a: Lists all containers, including stopped ones.
●​ docker rm <container_id>: Removes a stopped
container.
●​ docker rmi <image_id>: Removes an unused image from
the cache.
■​ Container Execution:
●​ docker run: Runs a container with multiple options like:
○​ Volume: Share directories/files between host and
container.
○​ Port: Map ports between host and container.
○​ Name: Assign a custom name to the container.
○​ Attach: Attach to the container's I/O.
○​ rm: Automatically remove the container when it stops.
●​ Full list: docker run --help.
●​ docker stop <container_id>: Stops a running container.
●​ docker start <container_id>: Starts a stopped
container.
■​ Container Communication:
●​ docker exec <container_id> <command>: Executes a
command inside a running container.
●​ docker cp <source_path>
<container_id>:<destination_path>: Copies files
between the host and container.
○​ Quiz Insights:
■​ Apache Server Access:
●​ Works seamlessly:
○​ When Apache server runs on the host, and accessed
via http://127.0.0.1.
○​ When Apache runs in a container, and another process
in the same container accesses it via
http://127.0.0.1.
○​ Explanation: 127.0.0.1 is a loopback address for the
same machine. To access host/container ports, they
need to be explicitly exposed, and 127.0.0.1 refers to
localhost inside the container, not the host.
■​ Removing a Docker Container:
●​ Command: docker rm <container_id> (removes a
stopped container).
●​ docker rmi <image_id> is for removing an image.
●​ No commands like terminate or kill exist in Docker for
container removal.
Module 6 - Kafka
●​ Overview
○​ Introduction to Apache Kafka
■​ Duration: 4 hours
■​ Focus: Apache Kafka
■​ Pre-requisites: None
■​ Objective: Understand core concepts of Kafka and gain hands-on
experience with its key features to explore its use cases.
○​ What is Apache Kafka?
■​ Kafka is an event streaming platform designed to:
●​ Read, write, store, and process event streams.
●​ Widely used for financial transactions, sensor data, and
data-driven architectures.
○​ Kafka Architecture Overview:
■​ Producers: Generate and send events (messages) to topics
(queues).
■​ Consumers: Receive and process the events.
■	Topics: Named queues (streams) to which events are sent. A topic is split into partitions, and partitions are replicated across brokers for scalability and fault tolerance.
■​ Partitions: A topic can have multiple partitions, and each has at least
one replica to ensure high availability.
■​ Brokers: Store messages from producers on disk and make them
available to consumers.
■​ Cluster: A group of brokers that form the Kafka environment.
○​ Key Concepts to Explore:
■​ Event-Driven Architecture: Kafka allows handling events
asynchronously, supporting decoupled systems.
■​ Stream Processing: Kafka’s capability to handle continuous streams
of data efficiently.
○​ Primary Learning Goals:
■​ Understand the need for a stream processing platform, message
brokers, or pub/sub systems.
■​ Learn how Kafka enables:
●​ Asynchronous workloads.
●​ Event-driven architecture.
●​ Stream processing in scalable systems.
○​ Prerequisites:
■​ Basic knowledge of Python is helpful for hands-on tasks.
●​ Setup1 - Setup & Getting Started
○​ Getting Started with Kafka on Docker
■​ Two Setup Options:
●​ Gitpod (Recommended): An online workspace that eliminates
the need for local setup of Docker or Kafka. Just click the
provided link to start a Gitpod workspace and skip the local
setup steps.
●​ Local Setup: Requires installing Docker and Kafka on your
machine.
■​ Understanding Docker:
●​ Docker allows packaging your code along with its
dependencies, libraries, and environment variables into a
container.
●​ A Docker image contains instructions for setting up and
running the container.
■​ Install Docker Desktop:
●​ Download Docker Desktop from here.
●​ Ensure Docker is running to avoid errors.
●​ Verify the Docker status in the bottom-left corner of the Docker
Desktop window. If it shows "Failed to start," follow the
provided hack to fix it.
■​ Installing Kafka:
●​ Download Kafka from here.
●​ Windows Users: Go to /kafka/bin, copy the path, and add
it to your Environment Variables (Add bin to path).
●​ Linux Users: Add Kafka to the system path (more details
provided).
■​ Pull Kafka Docker Image:
●​ Open a terminal and pull the Kafka Docker image using:
■​ docker pull spotify/kafka
■​ Start Kafka Docker Container:
●​ Start a Kafka container using this command (run once):
■​ docker run -p 2181:2181 -p 9092:9092 --env
ADVERTISED_HOST=localhost --env ADVERTISED_PORT=9092
spotify/kafka
●​ At this point, Kafka should be running in the container.
■​ Python Setup:
●​ Ensure Python 3 is installed. If not, download it from here.
●​ Clone or download the repo containing code snippets for this
byte from GitLab.
●​ If you encounter permission errors, add an SSH key to your
GitLab account.
■​ Set Up Python Virtual Environment:
●​ Create a virtual environment using the instructions here.
■​ Activate the virtual environment and install the required dependencies:
■​ pip install -r requirements.txt
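■	For example (a minimal sketch; assumes python3 with the venv module is available and you are inside the cloned repo directory):
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt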
■	Basic Docker Commands:
■	Show running containers: docker ps
■	Show all containers (including stopped ones): docker ps -a
■	Start a container: docker start <CONTAINER_ID>
■	Stop a container: docker stop <CONTAINER_ID>
●​ Milestone1 - Introduction to Apache Kafka
○​ Event Streaming and Kafka Overview
■​ What is Event Streaming?
●​ Capturing data in real-time from sources (databases, sensors,
etc.) as streams of events.
●​ Storing these streams for later retrieval.
●​ Processing and reacting to the event streams in real-time or
retrospectively.
●​ Routing streams to various destinations.
●​ Ensures the right data reaches the right place at the right time.
■​ Introduction to Kafka:
●​ Kafka stores streams of events in real-time (e.g., tweets,
playlist updates) rather than static snapshots like databases.
●​ Three main functions of Kafka:
○​ Publish/Subscribe: Write and read event streams.
○​ Storage: Store streams reliably.
○​ Process: Process events in real-time or retrospectively.
■​ Kafka Architecture:
●​ Producers publish events.
●​ Consumers subscribe to events.
●​ Topics organize events (similar to folders, events are the files).
●​ Partitions allow topics to be divided for scalability.
●​ Kafka ensures durability and fault tolerance through
partitioning and replication.
■​ Example: Twitter Model:
●​ User tweets → Backend stores tweet → Append tweet to
followers' timelines.
●​ Problem: Manually updating timelines for large numbers of
followers (e.g., 100,000) is time-consuming, leading to
performance issues.
■​ Solution with Kafka:
●​ Treat a tweet as an event.
●​ Use Kafka as a queue to asynchronously publish and
consume events (tweets).
●​ Timeline Service consumes the tweet event and updates
followers' timelines without blocking the API call.
○​ Kafka Setup and Hands-on Implementation
■​ Running a Kafka-based Twitter Service:
●​ Use the Flask app provided in the kafka-byte repo to simulate
a Twitter-like service.
●​ The /tweets API endpoint allows posting a tweet and
updating follower timelines.
■​ Steps to Run the Flask App:
■​ Run the Flask app:
■​ export FLASK_APP=beaver_twitter.py
■​ flask run
■​ Use cURL to send a POST request simulating a tweet:
■​ curl --location --request POST
http://localhost:5000/tweets \
■​ --header 'Content-Type: application/json' \
■​ --data-raw '{ "author_id": "1223", "text": "foobar
#cat @dog" }'
■​ Testing with Different Follower Counts:
●​ Increase the follower count in twitter_core.py to simulate
how long it takes to update timelines for 1000, 100,000
followers.
●​ Observing how response time increases highlights the need for
asynchronous processing using Kafka.
○​ Kafka Implementation
■​ Create a Kafka Topic:
●​ Ensure Docker container is running (check with Docker
Desktop or docker ps).
■​ Create a topic called "tweets":
■​ kafka-topics.sh --bootstrap-server localhost:9092
--create --topic tweets --partitions 2
--replication-factor 1
■	Verify the topic creation:
■	kafka-topics.sh --bootstrap-server localhost:9092 --list
■​ Kafka Producer (Publishing Tweets):
■​ Implement the Kafka producer in beaver_twitter.py:
■​ producer =
KafkaProducer(bootstrap_servers=['localhost:9092'],
value_serializer=lambda v:
json.dumps(v.__dict__).encode('utf-8'))
■​ producer.send('tweets', tweet)
●​ Comment out the synchronous timeline update logic to rely on
Kafka for event processing.
■​ Kafka Consumer (Updating Timelines):
■​ Use timeline_service.py to consume the tweet event from the
Kafka topic and update follower timelines:
■​ python timeline_service.py
●​ Send a new tweet using cURL and observe the tweet being
processed by the consumer.
■​ Performance Benefits of Kafka:
●​ API will return immediately even with large follower counts, as
Kafka processes timeline updates asynchronously.
●​ Kafka helps eliminate the need for blocking the producer while
waiting for consumers to finish processing.
○​ Additional Kafka Exploration
■​ Create and Experiment with New Topics:
●​ Create additional topics using the same kafka-topics.sh
command.
●​ Experiment with filtering messages by specific topics.
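■	Quick CLI Experiment (optional):
●	Kafka ships console clients you can use to publish and read events without writing any code (a sketch; on older Kafka versions the producer may expect --broker-list instead of --bootstrap-server):
# Terminal 1: read events on the "tweets" topic from the beginning
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic tweets --from-beginning
# Terminal 2: each line you type becomes an event on the topic
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic tweets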
■​ Kafka: Push or Pull-based?
●​ Kafka uses a pull-based system, where consumers fetch
messages from the Kafka broker as needed.
●​ Milestone2 - Add more consumer services
○​ Multiple Consumers in Kafka
■​ Concurrent Access by Multiple Consumers:
●​ Kafka allows multiple consumers to access the same producer
stream simultaneously without affecting one another.
●​ This is crucial for applications like Twitter that need different
services (e.g., trending topics, real-time ads, analytics) to
consume events from the same topic concurrently.
■​ Partitions and Replication Factor:
●​ Kafka Clusters: In production, Kafka runs on a multi-node
cluster to ensure scalability and fault tolerance. Locally, we run
it on a single container, which limits its performance and
reliability.
●​ Partitions: Kafka uses partitions to distribute events across
multiple nodes. This allows for parallel processing and
scalability. Each event is assigned to one partition.
●​ Keys in Kafka Events: Events can have optional keys. If a
key is provided, all events with the same key are assigned to
the same partition. If no key is provided, events are distributed
randomly across partitions.
●​ Replication: The replication factor ensures data is copied
across multiple nodes for fault tolerance. If a node goes down,
data is not lost as it's replicated across other nodes.
■​ Offsets:
●​ Each event in a partition has an offset (a unique identifier).
●​ Kafka clients manage their own offsets and can request events
from earlier offsets to re-consume data if necessary.
■​ Multiple Consumer Services:
●​ Kafka enables different consumer services to access the same
topic. For instance, a trends service can analyze trending
hashtags, while a timeline service updates users' timelines.
●​ The consumer client requests events based on offsets, and
each consumer can independently process the events without
interference.
○​ Hands-on Kafka Setup with Multiple Consumers
■​ Running Multiple Consumers:
●​ Set up a new trends service (trends_service.py) that
consumes from the "tweets" topic and prints trending hashtags.
●​ Run the following services concurrently:
○​ beaver_twitter.py (Flask app)
○​ timeline_service.py (updates followers' timelines)
○​ trends_service.py (analyzes trending hashtags)
■​ In a third terminal, use cURL to make POST requests, simulating
tweets:
■​ curl --location --request POST
http://localhost:5000/tweets \
■​ --header 'Content-Type: application/json' \
■​ --data-raw '{ "author_id": "1223", "text": "foobar
#cat @dog" }'
●​ Observe how both timeline_service.py and
trends_service.py consume the tweet event from the
"tweets" topic.
■​ Effect of Partitions:
●​ Kafka's partitioning allows the topic to scale across multiple
machines.
●​ Parallel consumption: Different consumers can consume
from different partitions of the same topic, improving scalability.
●​ In a consumer group, each consumer subscribes to a specific
partition, ensuring that the entire group consumes all the
messages from the topic.
○​ Advanced Kafka Features
■​ Setting Keys for Events:
●​ When publishing events, you can set a key (e.g., base64
encoding of the user_id).
●​ Events with the same key (e.g., user_id) will be assigned to the
same partition, ensuring the events are consumed in order.
■​ Consume from a Specific Partition:
●​ Kafka allows consumers to fetch messages from specific
partitions.
●​ This can be useful when analyzing data at the partition level or
for advanced use cases where ordering within partitions is
critical.
○​ Key Takeaways:
■​ Partitions enable parallelism, fault tolerance, and scalability.
■​ Replication ensures Kafka can handle node failures without data
loss.
■​ Multiple Consumers can consume the same stream of events
concurrently for different use cases.
■​ Keys can be used to route specific events to specific partitions,
ensuring order within those partitions.
●​ Milestone3 - Recover from application crash
○​ Recovering from Application Crash in Kafka
■​ Scenario:
●​ When a service, like the timeline service, crashes, it may miss
events that are produced while it's down.
■​ Testing the Crash Recovery:
●​ Stop the Timeline Service: Temporarily kill the timeline
service.
●​ Post a Tweet: Use cURL to post a tweet while the service is
down.
●​ Restart the Timeline Service: Upon restarting, notice that it
does not display the missed tweet.
■​ Key Concepts:
●​ Offset: An integer representing the position within a partition
for the next message. Kafka uses it to track the consumer's
current position.
●​ Broker: A Kafka component that receives messages from
producers and stores them on disk with unique offsets. It
allows consumers to fetch messages based on topic, partition,
and offset.
■​ Consumer Group:
●​ Kafka allows consumers to store the maximum offset they
have consumed using a consumer group. This is tracked by a
broker called the group coordinator.
●​ If a consumer is part of a group, it can fetch its last consumed
offset and resume from there upon restarting.
■​ Action Steps:
●​ Assign a Consumer Group Name: Modify the timeline and
trends services to assign a unique group name (e.g.,
"timeline") so they can resume processing after a crash.
●​ Check Consumer Groups: Use the following commands to
manage and inspect consumer groups:
■​ List all consumer groups:
■​ kafka-consumer-groups.sh --bootstrap-server
localhost:9092 --list
■​ Describe a specific group:
■​ kafka-consumer-groups.sh --bootstrap-server
localhost:9092 --group timeline --describe
■​ Understanding Output Terms:
●​ PARTITION: A segment of a topic where events are stored.
●​ LOG-END-OFFSET: The offset of the latest event in a
partition.
●​ CURRENT-OFFSET: The offset of the last event read by the
consumer group.
●​ LAG: The number of events in a partition that have yet to be
consumed by the consumer group.
■​ Practical Scenario:
●​ Kill All Consumer Services: Stop all consumer services and
produce multiple events to the topic.
●​ Monitor Lag: Describe the consumer group to observe the lag.
●​ Restart Timeline Service: Upon restarting, it should read from
the last offset and gradually reduce the lag to zero.
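■	Optional: Replaying Events
●	kafka-consumer-groups.sh can also rewind a group's offsets (a sketch using its --reset-offsets option; run it only while the group's consumers are stopped) so that the timeline service re-processes the topic from the start:
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group timeline --topic tweets --reset-offsets --to-earliest --execute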
○​ Curious Cats
■​ Consumer Groups: Multiple consumers can belong to a single group,
distributing the event processing load.
■​ Multiple Producers: When multiple producers publish to the same
topic, events are distributed across partitions based on the event key.
■​ Zookeeper: Understand its role in managing Kafka brokers and
consumer groups. Check additional resources to learn more.
●​ Milestone4 - Takeaways
○​ A Final Word
■​ Overview:
●​ This byte provided a brief, hands-on experience with event
streaming using Kafka.
●​ It focused on foundational concepts, encouraging further
exploration of Kafka’s documentation for deeper
understanding.
■​ Summary of Kafka:
●​ Event Streaming Platform: Kafka enables reading, writing,
modifying, storing, and processing event streams.
●​ Topics: Events are organized into topics to categorize related
events.
●​ Producers and Consumers:
○​ Multiple producers can publish to a topic.
○​ Multiple consumers can subscribe to a topic.
●​ Distributed System:
○​ Consists of servers and clients communicating via TCP,
deployable on hardware or cloud.
●​ Components:
○​ Brokers: Handle storage.
○​ Kafka Connect: Imports/exports data as event
streams, integrating with databases and other systems.
○​ Clients: Enable writing distributed applications and
microservices that process streams in parallel and
fault-tolerantly.
●​ Reliability: Utilizes topic partitions and replication to avoid
single points of failure.
■​ Fun Trivia About Kafka:
●​ Named after novelist Franz Kafka, who has no connection to
the software.
●​ Over 18,000 companies use Kafka, including notable names
like Spotify, Uber, and Netflix.
●​ By the time you read this, Netflix processes around 80 million
streams, thanks to Kafka.
○​ Curious Cats
■​ Try Out These Activities:
●​ Create Multiple Producers: Experiment with concurrent data
publishing.
●​ Create Multiple Partitions and Replicas: Understand data
distribution and redundancy.
●​ Create Multiple Consumer Groups: Learn about load
balancing in consumption.
●​ Fetch Events by Topic, Partition, and Offset: Test your
understanding of Kafka's data structure.
●​ Test Fault Tolerance: Simulate failure by bringing down a
partition and observe the replica’s response.
●​ Experiment with Brokers, Partitions, and Replicas:
Consider how to design these elements for a robust system.
○​ References
■​ Kafka Storage Internals: In-depth insights into how Kafka stores
data.
■​ Choosing Topics/Partitions: Guidelines on determining the number
of topics and partitions needed.
■​ Kafka Architecture: Clarifications on the master node concept within
Kafka.