Eti Appa

The micro-project report focuses on the use of Apache Hive for real-time queries and analytics in big data, specifically aimed at predicting credit card application approval statuses. It outlines the project's aims, methodology, and the skills developed, emphasizing the importance of real-time data analysis and the integration of big data technologies. The report concludes that the project successfully demonstrates the capabilities of Apache Hive for efficient, near real-time analytics, providing valuable insights for businesses.

Uploaded by

nikitahingmire164

SHREEYASH PRATISHTHAN’S

SHREEYASH COLLEGE OF ENGINEERING AND TECHNOLOGY (POLYTECHNIC),

CHH. SAMBHAJINAGAR

MICRO-PROJECT REPORT

NAME OF DEPARTMENT:- ARTIFICIAL INTELLIGENCE & MACHINE LEARNING

ACADEMIC YEAR:- 2024-25
SEMESTER:- SIXTH
COURSE NAME:- BIG DATA ANALYTICS
COURSE CODE:- 22684
MICRO-PROJECT TITLE:- APACHE HIVE FOR REAL-TIME QUERIES AND ANALYTICS

PREPARED BY:-
1) SUSHANT DUDHMAL EN. NO. 2210920385

UNDER THE GUIDANCE OF:- Prof. P.N. GOPALE

MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION, MUMBAI

CERTIFICATE
This is to certify that Mr./Ms. Sushant Daulat Dudhmal of the 6th Semester of the Diploma in
Artificial Intelligence & Machine Learning of the institute Shreeyash College of Engineering &
Technology, Chh. Sambhajinagar, has successfully completed the Micro-Project work in the course
Big Data Analytics for the academic year 2024-25 as prescribed in the I-Scheme Curriculum.

Date:-                           Enrollment No:- 2210920385
Place:- CHH. SAMBHAJINAGAR       Exam Seat No.:-

Signature        Signature        Signature
Guide            HOD              Principal

Seal of Institute

ACKNOWLEDGEMENT
We wish to express our profound gratitude to our guide, Prof. P.N. GOPALE, who guided us
tirelessly in framing and completing this Micro-Project. He/She guided us on all the main points
of the Micro-Project. We are indebted to his/her constant encouragement, cooperation and help.
It was his/her enthusiastic support that helped us overcome various obstacles in the
Micro-Project.
We are also thankful to our Principal, HOD, Faculty Members and classmates for extending their
support and motivation in the completion of this Micro-Project.

Annexure-I
Micro-Project Proposal

Title of Micro-Project:-
APACHE HIVE FOR REAL-TIME QUERIES AND ANALYTICS
1.0 Aims/Benefits of the Micro-Project
The aim of this microproject is to develop a predictive analytics system using Big Data technologies
and machine learning algorithms to accurately forecast the approval status of credit card
applications. By analyzing large-scale applicant data, the project seeks to automate decision-
making, improve accuracy, and assist financial institutions in minimizing risk and enhancing
operational efficiency.

2.0 Course Outcomes Addressed

a. Describe Big data and Big Data Analytics.
b. Apply the Big data Analytics procedure to work on datasets.
c. Describe Hadoop Distributed File System.
d. Analyze structured data using HIVE.

3.0 Proposed Methodology

• Data Collection: Gather historical credit card application data with relevant features like
  income, age, credit score, and approval status.
• Data Preprocessing: Clean and transform the data by handling missing values, encoding
  categorical variables, and normalizing numerical features.
• Big Data Integration: Store and process the data using Big Data tools like Apache Spark or
  Hadoop for scalability and efficiency.
• Model Development: Apply machine learning algorithms (e.g., Logistic Regression, Decision Tree,
  Random Forest) to build a predictive model.
• Model Evaluation: Evaluate model performance using metrics like accuracy, precision, recall,
  and F1-score.
• Prediction: Use the trained model to predict credit card approval status for new applicants.
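
The Model Evaluation step above can be illustrated with a minimal Python sketch. It computes accuracy, precision, recall, and F1-score from a confusion-matrix count. The label lists and the names `evaluate` and `Metrics` are hypothetical and are not taken from the project; they only show how the four metrics relate to each other.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true, y_pred, positive=1):
    """Compute binary classification metrics from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Metrics(accuracy, precision, recall, f1)


# Hypothetical labels: 1 = application approved, 0 = application rejected.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
m = evaluate(y_true, y_pred)
print(m)
```

Precision and recall are reported separately because a model that approves almost every applicant can score high recall while still making many wrong approvals.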

4.0 Action Plan

Sr. No. | Week | Details of activity | Planned start date | Planned finish date
1 | 1 & 2 | Discussion & Finalization of Topic | 02/02/2025 | 05/02/2025
2 | 3 | Preparation of the Abstract | 05/02/2025 | 06/02/2025
3 | 4 | Literature Review | 06/02/2025 | 08/02/2025
4 | 5 | Submission of Micro-Project Proposal (Annexure-I) | 09/02/2025 | 10/02/2025
5 | 6 | Collection of information about Topic | 11/02/2025 | 13/02/2025
6 | 7 | Collection of relevant content / materials for the execution of Micro-Project | 14/02/2025 | 16/02/2025
7 | 8 | Discussion and submission of outline of the Micro-Project | 17/02/2025 | 19/02/2025
8 | 9 | Analysis / execution of collected data / information and preparation of prototypes / drawings / photos / charts / graphs / tables / circuits / models / programs etc. | 20/02/2025 | 25/02/2025
9 | 10 | Completion of Contents of Project Report | 26/02/2025 | 03/02/2025
10 | 11 | Completion of Weekly Progress Report | 01/03/2025 | 05/03/2025
11 | 12 | Completion of Project Report (Annexure-II) | 06/03/2025 | 10/03/2025
12 | 13 | Viva voce / Delivery of Presentation | |

5.0 Resources Required

Sr. No. | Name of Resources / Materials | Specification | Qty | Remarks
1 | Computer | Minimum 4 GB RAM, i5 7th gen | 1 |
2 | Operating system | Windows 10 | 1 |
3 | Internet | Google | 1 |

Names of Team Members with En. Nos.

1. SUSHANT DUDHMAL EN.NO:- 2210920385

Annexure-II

Micro-Project Report

Title of Micro-Project:- APACHE HIVE FOR REAL-TIME QUERIES AND ANALYTICS
1.0 Rationale

In the age of big data, organizations are increasingly seeking ways to make timely, data-driven
decisions. Traditional batch-processing systems often introduce delays that hinder responsiveness. This
microproject aims to bridge that gap by leveraging Apache Hive to perform near real-time analytics
on incoming data streams.

Apache Hive, though originally designed for batch processing, has evolved to support faster querying
through partitioning, bucketing, and integration with newer data formats like ORC and Parquet. By
simulating real-time data ingestion and applying optimized query techniques, Hive can be transformed
into a powerful engine for time-sensitive analytics on large-scale datasets.
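
To make the benefit of partitioning concrete, here is a minimal Python sketch of the idea (not Hive itself). It compares a full scan with a lookup that touches only the matching date partition, which is roughly what partition pruning achieves. The rows, the date values, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (order_id, sale_date, amount)
ROWS = [
    ("o1", "2025-02-20", 120.0),
    ("o2", "2025-02-20", 80.0),
    ("o3", "2025-02-21", 45.5),
    ("o4", "2025-02-22", 300.0),
]


def build_partitions(rows):
    """Group rows by sale_date, like one storage directory per partition value."""
    partitions = defaultdict(list)
    for order_id, sale_date, amount in rows:
        partitions[sale_date].append((order_id, amount))
    return partitions


def total_full_scan(rows, day):
    """Unpartitioned case: every row is read and then filtered."""
    scanned, total = 0, 0.0
    for _, sale_date, amount in rows:
        scanned += 1
        if sale_date == day:
            total += amount
    return total, scanned


def total_pruned(partitions, day):
    """Partitioned case: only rows stored under the requested date are read."""
    rows = partitions.get(day, [])
    return sum(amount for _, amount in rows), len(rows)


partitions = build_partitions(ROWS)
full = total_full_scan(ROWS, "2025-02-20")
pruned = total_pruned(partitions, "2025-02-20")
print("full scan:", full)
print("pruned:   ", pruned)
```

Both calls return the same total, but the pruned version reads fewer rows. On a large table the saving grows with the number of partitions that a query can skip.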

2.0 Aims/Benefits of the Micro-Project:-

• Provides near real-time insights.
• Handles large-scale data efficiently.
• Fast queries using partitioned tables.
• Cost-effective and scalable.
• Offers hands-on experience with big data tools.
• Can be integrated with other analytics platforms.

3.0 Course Outcomes Achieved

• CO1: Understand Apache Hive and its role in big data.
• CO2: Create and manage Hive tables for structured data.
• CO3: Use partitioning to speed up queries.
• CO4: Perform near real-time data analysis with HiveQL.
• CO5: Analyze and interpret sales trends from data.
• CO6: Apply emerging tech to real-world analytics problems.

4.0 Literature Review:-

Big data technologies have transformed the way organizations handle and analyze vast volumes of
structured and unstructured data. Apache Hive, introduced by Facebook in 2008, was developed to
simplify querying large datasets stored in the Hadoop Distributed File System (HDFS). Hive translates
SQL-like queries (HiveQL) into MapReduce jobs, enabling non-programmers to analyze big data
efficiently.
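
As a rough illustration of how a grouped aggregation maps onto the MapReduce model, the Python sketch below imitates the map, shuffle, and reduce phases for a query of the form "sum of amount per product". It is a conceptual stand-in only: the execution plans Hive actually generates are more involved, and the sample records and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical input records: (product, amount)
SALES = [("pen", 10.0), ("book", 50.0), ("pen", 5.0), ("bag", 70.0), ("book", 25.0)]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for product, amount in records:
        yield product, amount


def shuffle(pairs):
    """Group all values that share a key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's values into a single aggregate."""
    return {key: sum(values) for key, values in groups.items()}


result = reduce_phase(shuffle(map_phase(SALES)))
print(result)
```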

Over the years, researchers and practitioners have explored Hive's capabilities for improving
performance in analytical workloads. Studies show that data partitioning, bucketing, and the use of
optimized file formats like ORC and Parquet can significantly reduce query latency. While Hive is
traditionally batch-oriented, recent developments like Hive LLAP (Live Long and Process) have
aimed to reduce query execution time, enabling near real-time analytical capabilities.
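
Bucketing can be pictured as hashing a column value into a fixed number of files. The Python sketch below shows that idea under a stated assumption: it uses `zlib.crc32` as a stand-in hash, which is not the hash function Hive uses internally. The order IDs and the bucket count of 4 are hypothetical.

```python
import zlib


def bucket_for(key: str, num_buckets: int) -> int:
    """Map a key to a bucket index with a deterministic stand-in hash."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def assign_buckets(keys, num_buckets):
    """Distribute keys across num_buckets lists, one list per bucket."""
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[bucket_for(key, num_buckets)].append(key)
    return buckets


order_ids = [f"order-{i}" for i in range(20)]
buckets = assign_buckets(order_ids, 4)
for index, members in enumerate(buckets):
    print(index, len(members))
```

Because the same key always lands in the same bucket, a lookup or join on that key only needs to read one bucket instead of the whole table.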

Literature also highlights the importance of integrating Hive with tools like Apache Flume, Kafka,
and Spark for streaming data processing. These integrations allow Hive to work as part of a hybrid
architecture, where real-time data is ingested, stored, and analyzed quickly.

In the context of business applications, Hive is used in e-commerce, telecommunications, and
financial services for near real-time monitoring of transactions, sales, and customer behavior. This
makes it a valuable tool for building scalable and responsive analytics platforms.

5.0 Actual Methodology Followed

• Problem Definition: Identified the need for real-time analysis of product sales data.
• Data Generation: Simulated sales data using a Python script (order_id, product details, timestamp).
• Data Ingestion: Ingested data into HDFS using scripts to simulate real-time data flow.
• Hive Table Creation: Created external and partitioned tables in Hive for efficient data storage.
• Data Transformation: Cleaned and loaded data into partitioned tables using HiveQL queries.
• Query Execution: Performed real-time analytical queries (e.g., total sales, top products).
• Visualization (Optional): Used Apache Superset for real-time data visualization.
• Testing: Evaluated query performance and data processing efficiency.
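
The data generation, ingestion, and table creation steps above could look roughly like the following Python sketch. It is an illustration under assumptions, not the script used in the project. The product catalogue, the table name `sales_raw`, the column names other than `order_id`, and the `/data/sales_raw` location are hypothetical. Files are written to a local temporary directory in `dt=YYYY-MM-DD` folders instead of HDFS, and the HiveQL statements are only rendered as strings, not executed.

```python
import csv
import random
import tempfile
from datetime import datetime, timedelta
from pathlib import Path

PRODUCTS = {"pen": 10.0, "book": 50.0, "bag": 70.0}  # hypothetical catalogue
FIELDS = ["order_id", "product", "quantity", "price", "order_ts"]


def generate_orders(n, start, seed=0):
    """Simulate n orders spread over the two days following start."""
    rng = random.Random(seed)
    orders = []
    for i in range(n):
        product = rng.choice(sorted(PRODUCTS))
        ts = start + timedelta(minutes=rng.randint(0, 2 * 24 * 60 - 1))
        orders.append({
            "order_id": f"ORD{i:05d}",
            "product": product,
            "quantity": rng.randint(1, 5),
            "price": PRODUCTS[product],
            "order_ts": ts.strftime("%Y-%m-%d %H:%M:%S"),
        })
    return orders


def write_partitioned(orders, root):
    """Write one header-less CSV per day under root/dt=<day>/ (local stand-in for HDFS)."""
    by_day = {}
    for order in orders:
        by_day.setdefault(order["order_ts"][:10], []).append(order)
    for day, rows in by_day.items():
        part_dir = Path(root) / f"dt={day}"
        part_dir.mkdir(parents=True, exist_ok=True)
        with open(part_dir / "orders.csv", "w", newline="") as fh:
            csv.DictWriter(fh, fieldnames=FIELDS).writerows(rows)
    return sorted(by_day)


def render_hiveql(table, location, days):
    """Render (not run) DDL for an external partitioned table and its partitions."""
    create = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        "  order_id STRING, product STRING, quantity INT,\n"
        "  price DOUBLE, order_ts STRING\n"
        ")\n"
        "PARTITIONED BY (dt STRING)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )
    adds = [f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (dt='{d}')" for d in days]
    return [create] + adds


root = tempfile.mkdtemp()
orders = generate_orders(50, datetime(2025, 2, 20))
days = write_partitioned(orders, root)
statements = render_hiveql("sales_raw", "/data/sales_raw", days)
print(";\n".join(statements))
```

Running the generator repeatedly with a new start time and seed would imitate periodic arrival of fresh data, with each batch landing in its own date partition.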

6.0 Actual Resources Used

Sr. No. | Name of Resources / Materials | Specification | Qty | Remarks
1 | Computer | Minimum 4 GB RAM, i5 7th gen | 1 |
2 | Operating system | Windows 10 | 1 |
3 | Internet | Google | 1 |

7.0 Outputs of the Micro-Projects

• Real-time sales analytics (total sales, top products, hourly trends).
• Optimized Hive queries for fast data processing.
• Simulated data ingested periodically into HDFS.
• HiveQL scripts for data transformation and querying.
• Real-time dashboard for visualization (optional).
• Performance metrics for query response and data processing.
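
The three analytics listed above (total sales, top products, hourly trends) can be sketched in plain Python to show what each aggregate computes. This is an in-memory illustration with hypothetical order rows and function names, not the HiveQL used in the project and not its actual results.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical rows: (order_id, product, quantity, price, order_ts)
ORDERS = [
    ("ORD1", "pen", 2, 10.0, "2025-02-20 09:15:00"),
    ("ORD2", "book", 1, 50.0, "2025-02-20 09:40:00"),
    ("ORD3", "bag", 1, 80.0, "2025-02-20 10:05:00"),
    ("ORD4", "pen", 5, 10.0, "2025-02-20 10:30:00"),
    ("ORD5", "book", 2, 50.0, "2025-02-20 11:00:00"),
]


def total_sales(orders):
    """Sum of quantity * price over all orders."""
    return sum(qty * price for _, _, qty, price, _ in orders)


def top_products(orders, n):
    """The n products with the highest revenue, best first."""
    revenue = Counter()
    for _, product, qty, price, _ in orders:
        revenue[product] += qty * price
    return revenue.most_common(n)


def hourly_sales(orders):
    """Revenue grouped by the hour of the order timestamp."""
    per_hour = defaultdict(float)
    for _, _, qty, price, ts in orders:
        hour = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour
        per_hour[hour] += qty * price
    return dict(per_hour)


print(total_sales(ORDERS))
print(top_products(ORDERS, 2))
print(hourly_sales(ORDERS))
```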

8.0 Skill Developed/Learning outcome of this Micro-Project

• Apache Hive Proficiency
  ◦ Hands-on experience in creating tables, partitioning, and querying data in Hive.
• Big Data Management
  ◦ Ability to manage large datasets in HDFS and optimize storage and querying with Hive.
• Data Ingestion & ETL
  ◦ Skills in data ingestion from various sources and ETL (Extract, Transform, Load) processes
    using HiveQL.
• Real-Time Analytics
  ◦ Practical knowledge of setting up near real-time data analysis using Hive, with partitioning
    for time-sensitive queries.

9.0 Applications of this Micro-Project:-

• Real-Time Sales Analytics
  ◦ Used by e-commerce and retail businesses to analyze and track product sales in near real-time,
    enabling faster decision-making.
• Customer Behavior Analysis
  ◦ Helps businesses understand buying patterns and trends, improving marketing strategies and
    customer targeting.
• Inventory Management
  ◦ Facilitates monitoring of inventory levels in near real time, helping businesses manage stock
    and avoid shortages or overstocking.

Conclusion

This microproject successfully demonstrates the potential of Apache Hive for near real-time
analytics on large-scale datasets. By simulating a continuous stream of product sales data, we were
able to efficiently perform time-sensitive queries and derive actionable business insights. The use of
partitioning and query optimization in Hive allowed for faster data processing, showcasing its ability
to handle large volumes of data while providing near real-time insights.

Through this project, we developed essential skills in big data management, data ingestion, query
optimization, and real-time analytics. The system built here is applicable across industries like e-
commerce, retail, and finance, where quick access to data insights can drive business decisions.
Ultimately, the project highlights how emerging technologies like Apache Hive can be adapted for
modern, real-time data analysis, creating value for businesses looking to stay agile and data-driven.

Annexure-IV
MICRO-PROJECT EVALUATION SHEET

Name of Student:- SUSHANT DUDHMAL En. No. 2210920385

Name of Program:- Artificial Intelligence & Machine Learning
Semester:- 6TH
Course Name:- BIG DATA ANALYTICS
Course Code:- 22684
Title of The Micro-Project:- APACHE HIVE FOR REAL-TIME QUERIES AND ANALYTICS
Course Outcomes Achieved:-
a. Describe Big data and Big Data Analytics.
b. Apply the Big data Analytics procedure to work on datasets.
c. Describe Hadoop Distributed File System.
d. Analyze structured data using HIVE.

Sr. No. | Characteristic to be assessed | Poor (Marks 1-3) | Average (Marks 4-5) | Good (Marks 6-8) | Excellent (Marks 9-10) | Sub Total
(A) Process and Product Assessment (Convert below total marks out of 6 Marks)
1 | Relevance to the course | | | | |
2 | Literature Review/information collection | | | | |
3 | Completion of the Target as per project proposal | | | | |
4 | Analysis of Data and representation | | | | |
5 | Quality of Prototype/Model | | | | |
6 | Report Preparation | | | | |
(B) Individual Presentation/Viva (Convert below total marks out of 4 Marks)
7 | Presentation | | | | |
8 | Viva | | | | |

(A) Process and Product Assessment (6 marks) | (B) Individual Presentation/Viva (4 marks) | Total Marks (10)
 | |

Comments/Suggestions about team work/leadership/inter-personal communication

Name of Course Teacher:-

Dated Signature:-
