Mohamed Khider University - Biskra 2024/2025
Department of Computer Science Level: Master 1
Module: Cloud Computing & Big Data Option: AI
Lab Work 1
The objective of this practical work is to design, build, and manage a large-scale relational
database using an open dataset from Kaggle. You will import data, establish relationships
between tables, and execute advanced SQL queries.
1. Dataset Selection
Choose a large and structured dataset from Kaggle.com that can be organized into
multiple related tables.
Examples of suitable datasets:
- E-commerce transactions
- Movie ratings and reviews
- Financial transactions
- Healthcare records
- Social media interactions
2. Database Creation & Data Import
Use PostgreSQL, MySQL, or SQLite to create your database.
Write SQL scripts to define tables with appropriate data types, keys, and constraints.
Import data from CSV files into the corresponding tables.
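As a concrete starting point, here is a minimal import sketch in Java over JDBC. It assumes PostgreSQL, a hypothetical ratings(user_id, movie_id, rating) table, and a ratings.csv file with a header row; the connection string, schema, and parsing must be adapted to your own dataset. For very large files, the database's native bulk loader (e.g., PostgreSQL's COPY) is much faster than row-by-row inserts.

```java
// Minimal sketch, assuming PostgreSQL, a hypothetical "ratings" table,
// and a ratings.csv file with columns user_id,movie_id,rating.
// Requires the PostgreSQL JDBC driver on the classpath.
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class CsvImport {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/labdb", "postgres", "password")) {
            // Define the table with data types, a primary key, and a constraint.
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS ratings ("
                        + "user_id INT, movie_id INT, "
                        + "rating NUMERIC(2,1) CHECK (rating BETWEEN 0 AND 5), "
                        + "PRIMARY KEY (user_id, movie_id))");
            }
            // Batch-insert the CSV rows (skipping the header line).
            String sql = "INSERT INTO ratings (user_id, movie_id, rating) VALUES (?, ?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql);
                 BufferedReader in = new BufferedReader(new FileReader("ratings.csv"))) {
                in.readLine(); // skip header
                String line;
                while ((line = in.readLine()) != null) {
                    String[] f = line.split(",");
                    ps.setInt(1, Integer.parseInt(f[0]));
                    ps.setInt(2, Integer.parseInt(f[1]));
                    ps.setBigDecimal(3, new java.math.BigDecimal(f[2]));
                    ps.addBatch();
                }
                ps.executeBatch();
            }
        }
    }
}
```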
3. Data Analysis
Execute SQL queries to analyze the data, including:
Aggregations: SUM, AVG, COUNT, MAX, MIN.
Implement indexing on large tables to improve query performance.
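Continuing the hypothetical ratings example from the import sketch above, the sketch below runs an aggregate query, measures its execution time, and then creates an index; rerunning the program afterwards lets you compare timings before and after indexing.

```java
// Minimal sketch, assuming the hypothetical ratings table from the import
// example; shows aggregate functions and the effect of adding an index.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class AggregateAndIndex {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/labdb", "postgres", "password");
             Statement st = conn.createStatement()) {
            // Aggregations: COUNT, AVG, MIN, MAX per movie.
            long t0 = System.nanoTime();
            try (ResultSet rs = st.executeQuery(
                    "SELECT movie_id, COUNT(*), AVG(rating), MIN(rating), MAX(rating) "
                    + "FROM ratings GROUP BY movie_id ORDER BY COUNT(*) DESC LIMIT 10")) {
                while (rs.next()) {
                    System.out.printf("movie %d: n=%d avg=%.2f%n",
                            rs.getInt(1), rs.getLong(2), rs.getDouble(3));
                }
            }
            System.out.printf("query took %.1f ms%n", (System.nanoTime() - t0) / 1e6);
            // An index on the grouped/filtered column can speed up large scans;
            // rerun the program afterwards and compare the elapsed times.
            st.execute("CREATE INDEX IF NOT EXISTS idx_ratings_movie ON ratings (movie_id)");
        }
    }
}
```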
4. Web Interface Development
Design a web-based interface using HTML, CSS, and JavaScript to interact with
the database.
Implement basic CRUD operations (Create, Read, Update, Delete) to allow users to
manage records.
Mohamed Khider University - Biskra 2024/2025
Department of Computer Science Level: Master 1
Module: Cloud Computing & Big Data Option: AI
Lab Work 2
Intelligent Query Processing
The goal of this practical work is to implement intelligent query processing techniques to
enhance user interactions with databases. You will explore:
1. Levenshtein Distance for auto-correction of misspelled queries.
2. Autocomplete using Tries (prefix trees) to suggest relevant queries based on user input.
3. BK-Tree (Burkhard-Keller Tree) for efficient fuzzy searching in large datasets.
Instructions:
1. Create an SQL database loaded with a large dataset.
2. Create a web-based interface using HTML, CSS, and JavaScript where users can type and
run SQL queries.
3. Use the Levenshtein distance to detect and correct misspelled queries (sketches for
steps 3 to 5 follow this list).
4. Implement Autocomplete using a Trie (Prefix Tree). Example: If the user types "SEL",
the system suggests "SELECT", "SELF", etc.
5. Implement a BK-Tree to efficiently handle approximate matching in large datasets.
This structure is useful for quickly finding the closest matches to a given input.
Example: If searching for "Biksra", the system finds similar names like "Biskra".
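For steps 3 and 5, the sketch below combines a standard dynamic-programming Levenshtein distance with a BK-tree built on top of it; the dictionary words are illustrative placeholders for whatever names or keywords your database contains.

```java
// Minimal sketch: Levenshtein distance plus a BK-tree for fuzzy search.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FuzzySearch {

    // Classic O(|a|*|b|) edit-distance table.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        return d[a.length()][b.length()];
    }

    // BK-tree node: children are keyed by their distance to this node's word.
    static class Node {
        final String word;
        final Map<Integer, Node> children = new HashMap<>();
        Node(String w) { word = w; }
    }

    static void insert(Node root, String word) {
        int d = levenshtein(word, root.word);
        Node child = root.children.get(d);
        if (child == null) root.children.put(d, new Node(word));
        else insert(child, word);
    }

    // The triangle inequality lets us visit only children in [d-tol, d+tol].
    static void query(Node node, String target, int tol, List<String> out) {
        int d = levenshtein(target, node.word);
        if (d <= tol) out.add(node.word);
        for (int k = d - tol; k <= d + tol; k++) {
            Node child = node.children.get(k);
            if (child != null) query(child, target, tol, out);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("biskra");
        for (String w : new String[] {"batna", "bejaia", "blida", "bouira"}) insert(root, w);
        List<String> matches = new ArrayList<>();
        query(root, "biksra", 2, matches);   // finds "biskra" despite the typo
        System.out.println(matches);
    }
}
```

For step 4, a minimal trie sketch for keyword autocompletion; the keyword list is again only illustrative.

```java
// Minimal sketch of a trie (prefix tree) for SQL-keyword autocomplete.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Autocomplete {
    static class TrieNode {
        Map<Character, TrieNode> next = new HashMap<>();
        boolean isWord;
    }

    static void insert(TrieNode root, String word) {
        TrieNode n = root;
        for (char c : word.toCharArray())
            n = n.next.computeIfAbsent(c, k -> new TrieNode());
        n.isWord = true;
    }

    // Walk down to the prefix node, then collect every word below it.
    static List<String> suggest(TrieNode root, String prefix) {
        TrieNode n = root;
        for (char c : prefix.toCharArray()) {
            n = n.next.get(c);
            if (n == null) return List.of();
        }
        List<String> out = new ArrayList<>();
        collect(n, new StringBuilder(prefix), out);
        return out;
    }

    static void collect(TrieNode n, StringBuilder path, List<String> out) {
        if (n.isWord) out.add(path.toString());
        for (Map.Entry<Character, TrieNode> e : n.next.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.deleteCharAt(path.length() - 1);
        }
    }

    public static void main(String[] args) {
        TrieNode root = new TrieNode();
        for (String kw : new String[] {"SELECT", "SELF", "SET", "FROM", "WHERE"})
            insert(root, kw);
        System.out.println(suggest(root, "SEL")); // [SELECT, SELF] (order may vary)
    }
}
```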
Mohamed Khider University - Biskra 2024/2025
Department of Computer Science Level: Master 1
Module: Cloud Computing & Big Data Option: AI
Lab Work 3
Database Indexing and TF-IDF for Efficient Search
The goal of this practical work is to explore database indexing techniques to optimize query
performance and implement TF-IDF (Term Frequency - Inverse Document Frequency)
for text search relevance. You will:
Create and use indexes to speed up SQL queries.
Implement TF-IDF to rank search results based on relevance.
Compare performance between indexed and non-indexed queries.
1. Create a Database and Load a Dataset (e.g., articles, product reviews, or
customer transactions).
2. Create Indexes for Faster Queries
3. Calculate Term Frequency (TF): compute how often a word occurs in a document.
4. Calculate Inverse Document Frequency (IDF): compute how important a word is across
all documents (formulas and a sketch follow this list).
5. Implement a Query using TF-IDF Ranking
6. Compare indexed vs. non-indexed queries and measure execution time.
7. Display the ranked search results in the interface.
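For steps 3 and 4, one common formulation of the weights (variants such as smoothed IDF also exist) is:

```latex
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \log\frac{N}{|\{d \in D : t \in d\}|}, \qquad
\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
```

where f_{t,d} is the raw count of term t in document d and N is the total number of documents. The sketch below computes these weights over an in-memory list of example documents and ranks them against a query; in the lab, the documents would come from your SQL tables instead.

```java
// Minimal sketch of TF-IDF scoring and ranking over example documents.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TfIdf {
    // Term frequency: occurrences of each term divided by document length.
    static Map<String, Double> tf(String doc) {
        String[] terms = doc.toLowerCase().split("\\W+");
        Map<String, Double> m = new HashMap<>();
        for (String t : terms) m.merge(t, 1.0, Double::sum);
        m.replaceAll((t, c) -> c / terms.length);
        return m;
    }

    // Inverse document frequency: log(N / number of docs containing the term).
    static Map<String, Double> idf(List<Map<String, Double>> tfs) {
        Map<String, Double> df = new HashMap<>();
        for (Map<String, Double> m : tfs)
            for (String t : m.keySet()) df.merge(t, 1.0, Double::sum);
        Map<String, Double> idf = new HashMap<>();
        df.forEach((t, n) -> idf.put(t, Math.log(tfs.size() / n)));
        return idf;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "cheap wireless mouse", "wireless keyboard and mouse", "leather office chair");
        List<Map<String, Double>> tfs = new ArrayList<>();
        for (String d : docs) tfs.add(tf(d));
        Map<String, Double> idfMap = idf(tfs);

        // Rank documents by the summed TF-IDF weight of the query terms.
        String[] query = "wireless mouse".split(" ");
        for (int i = 0; i < docs.size(); i++) {
            double score = 0;
            for (String q : query)
                score += tfs.get(i).getOrDefault(q, 0.0) * idfMap.getOrDefault(q, 0.0);
            System.out.printf("%.3f  %s%n", score, docs.get(i));
        }
    }
}
```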
Mohamed Khider University - Biskra 2024/2025
Department of Computer Science Level: Master 1
Module: Cloud Computing & Big Data Option: AI
Lab Work 4
Recommendation System & Product Comparison
The aim of this practical work is to build a recommendation system using TF-IDF (Term
Frequency - Inverse Document Frequency) to compare product descriptions and suggest
similar items. You will:
1. Extract textual features from product descriptions.
2. Compute TF-IDF scores to measure word importance.
3. Use cosine similarity to compare and recommend similar products.
4. Evaluate the effectiveness of TF-IDF for recommendations.
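For step 3, a minimal sketch of cosine similarity, assuming the product descriptions have already been turned into TF-IDF term-weight maps as in the previous lab; the weights below are made up for illustration.

```java
// Minimal sketch of cosine similarity between two TF-IDF vectors stored as
// term -> weight maps.
import java.util.Map;

public class CosineSimilarity {
    // cos(a,b) = (a . b) / (|a| * |b|); 1.0 means identical direction.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        // Hypothetical TF-IDF weights for two product descriptions.
        Map<String, Double> p1 = Map.of("wireless", 0.4, "mouse", 0.6);
        Map<String, Double> p2 = Map.of("wireless", 0.3, "keyboard", 0.7);
        System.out.printf("similarity = %.3f%n", cosine(p1, p2));
        // To recommend: score one product against all others and return
        // the highest-similarity items.
    }
}
```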
Mohamed Khider University - Biskra 2024/2025
Department of Computer Science Level: Master 1
Module: Cloud Computing & Big Data Option: AI
Lab Work 5
Big Data Processing with Hadoop
The objective of this practical work is to introduce students to Hadoop, a powerful framework
for distributed storage and processing of large datasets. Students will set up a Hadoop
environment, process data using HDFS (Hadoop Distributed File System), and perform
MapReduce operations to analyze a dataset.
1. Download and install Hadoop (single-node), then configure core-site.xml, hdfs-site.xml,
and mapred-site.xml.
2. Download a dataset (e.g., a Kaggle dataset like movie reviews, stock market data, or
web logs).
3. Word count example in Java: implement a MapReduce job that counts word
occurrences in a dataset (a minimal sketch follows this list).
4. Download and process a large dataset (e.g., customer reviews, social media posts).
a. Use HDFS to store the dataset.
b. Implement a MapReduce job to analyze trends (e.g., most common words, user
activity).
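For step 3, the sketch below is essentially the classic word-count job distributed with Hadoop as an example; the class names are only a suggestion. After loading the dataset into HDFS with `hdfs dfs -put`, package the job into a jar and run it with, e.g., `hadoop jar wordcount.jar WordCount /input /output`.

```java
// The classic word-count MapReduce job (Hadoop mapreduce API).
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: emit (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: sum the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));     // HDFS input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1]));   // HDFS output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```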
Mohamed Khider University - Biskra 2024/2025
Department of Computer Science Level: Master 1
Module: Cloud Computing & Big Data Option: AI
Lab Work 6
NoSQL Database Management with MongoDB
The objective of this practical work is to introduce students to MongoDB, a NoSQL database
used for handling large amounts of unstructured and semi-structured data. Students will learn
how to:
Install and configure MongoDB
Create and manage collections and documents
Perform CRUD (Create, Read, Update, Delete) operations
Execute complex queries using MongoDB’s aggregation framework
1. Install MongoDB on your system (download it from the official MongoDB site).
2. Create a Database
3. Manage Collections & Documents:
i. Insert Data into a Collection
ii. Retrieve Data
iii. Delete Documents
iv. Update Documents
4. Integrate MongoDB with a web application (a minimal CRUD sketch follows).
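A minimal sketch of the four CRUD operations with the official MongoDB Java synchronous driver (mongodb-driver-sync); the database, collection, and field names are illustrative.

```java
// Minimal sketch of CRUD operations with the MongoDB Java driver.
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.set;

import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;

public class MongoCrud {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> col =
                    client.getDatabase("labdb").getCollection("students");

            // Create: insert a document.
            col.insertOne(new Document("name", "Amina").append("grade", 15));

            // Read: find one document by field value.
            Document found = col.find(eq("name", "Amina")).first();
            System.out.println(found == null ? "not found" : found.toJson());

            // Update: change a field on the matching document.
            col.updateOne(eq("name", "Amina"), set("grade", 17));

            // Delete: remove the matching document.
            col.deleteOne(eq("name", "Amina"));

            // For step 4 of the overview, the aggregation framework is exposed
            // via col.aggregate(...) with builders such as Aggregates.group and
            // Accumulators.avg.
        }
    }
}
```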
Mohamed Khider University - Biskra 2024/2025
Department of Computer Science Level: Master 1
Module: Cloud Computing & Big Data Option: AI
Lab Work 7
Big Data Storage and Processing with Cassandra
The goal of this practical work is to introduce students to Cassandra, a distributed,
scalable NoSQL database designed for handling large amounts of data across
multiple nodes with high availability. Students will learn how to:
Set up a Cassandra environment
Create and manage tables
Perform CRUD (Create, Read, Update, Delete) operations
Execute advanced queries using CQL (Cassandra Query Language) and Java API
Instructions
1. Download and install Cassandra (standalone).
2. Create a Table in Cassandra Shell
a. Insert Data into the Table
b. Retrieve Data from the Table
c. Update Data
d. Delete Data
3. Set Up a Java Project with Cassandra (a minimal connection sketch follows).
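A minimal connection-and-CRUD sketch using the DataStax Java driver (java-driver-core 4.x); the contact point, datacenter name, keyspace, and table are assumptions that must match your local installation.

```java
// Minimal sketch: connect to a local Cassandra node and run CQL via Java.
import java.net.InetSocketAddress;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;

public class CassandraCrud {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
                .withLocalDatacenter("datacenter1") // must match your cluster config
                .build()) {

            session.execute("CREATE KEYSPACE IF NOT EXISTS lab "
                    + "WITH replication = {'class':'SimpleStrategy','replication_factor':1}");
            session.execute("CREATE TABLE IF NOT EXISTS lab.students "
                    + "(id int PRIMARY KEY, name text, grade int)");

            // Insert, read, update, and delete via CQL.
            session.execute("INSERT INTO lab.students (id, name, grade) VALUES (1, 'Amina', 15)");
            Row row = session.execute("SELECT name, grade FROM lab.students WHERE id = 1").one();
            if (row != null)
                System.out.println(row.getString("name") + " -> " + row.getInt("grade"));
            session.execute("UPDATE lab.students SET grade = 17 WHERE id = 1");
            session.execute("DELETE FROM lab.students WHERE id = 1");
        }
    }
}
```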