SHREEYASH PRATISHTHAN’S
SHREEYASH COLLEGE OF ENGINEERING AND TECHNOLOGY (POLYTECHNIC),
                         CHH. SAMBHAJINAGAR
                                           MICRO-PROJECT REPORT
 NAME OF DEPARTMENT:- ARTIFICIAL INTELLIGENCE & MACHINE LEARNING
 ACADEMIC YEAR:- 2024-25
 SEMESTER:- SIXTH
 COURSE NAME:- BIG DATA ANALYTICS
 COURSE CODE:- 22684
             MICRO-PROJECT TITLE:- APACHE HIVE FOR REAL-TIME QUERIES AND
                                      ANALYTICS
 PREPARED BY:-
       1) SUSHANT DUDHMAL                                               EN. NO.2210920385
                                UNDER THE GUIDANCE OF:- Prof. P. N. GOPALE
                  MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION, MUMBAI
                                                 CERTIFICATE
 This is to certify that Mr./ Ms. Sushant Daulat Dudhmal of 6TH Semester of Diploma in Artificial
 Intelligence & Machine Learning of Institute Shreeyash College of Engineering &
 Technology, Chh. Sambhajinagar has successfully completed Micro-Project Work in the Course
 Big Data Analytics for the academic year 2024-25 as prescribed in the I-Scheme Curriculum.
  Date:-                                                    Enrollment No:-
                                                            2210920385
  Place:- CHH.SAMBHAJINAGAR                                 Exam Seat No.:-
                    Signature                         Signature                       Signature
                     Guide                             HOD                           Principal
                                                  Seal of Institute
                                            ACKNOWLEDGEMENT
                               We wish to express our profound gratitude to our guide Prof.
 P. N. GOPALE, who guided us tirelessly in framing and completing this Micro-Project. He / She
 guided us on all the main points of the Micro-Project. We are indebted to his / her constant
 encouragement, cooperation and help. It was his / her enthusiastic support that helped us
 overcome various obstacles in the Micro-Project.
                               We are also thankful to our Principal, HOD, Faculty Members and
 classmates for extending their support and motivation in the completion of this Micro-
 Project.
                                                                                        Annexure-1
                                              Micro-Project Proposal
 Title of Micro-Project:-
                APACHE HIVE FOR REAL-TIME QUERIES AND ANALYTICS
 1.0 Aims/Benefits of the Micro-Project
 The aim of this microproject is to develop a predictive analytics system using Big Data technologies
 and machine learning algorithms to accurately forecast the approval status of credit card
 applications. By analyzing large-scale applicant data, the project seeks to automate decision-
 making, improve accuracy, and assist financial institutions in minimizing risk and enhancing
 operational efficiency.
 2.0         Course Outcomes Addressed
             a. Describe Big data and Big Data Analytics.
             b. Apply the Big data Analytics procedure to work on datasets.
             c. Describe Hadoop Distributed File System.
             d. Analyze structured data using HIVE.
 3.0 Proposed Methodology
  Data Collection:
       Gather historical credit card application data with relevant features like income, age, credit
       score, and approval status.
  Data Preprocessing:
       Clean and transform the data by handling missing values, encoding categorical variables, and
       normalizing numerical features.
  Big Data Integration:
       Store and process the data using Big Data tools like Apache Spark or Hadoop for scalability and
       efficiency.
  Model Development:
       Apply machine learning algorithms (e.g., Logistic Regression, Decision Tree, Random Forest)
       to build a predictive model.
  Model Evaluation:
       Evaluate model accuracy using metrics like accuracy, precision, recall, and F1-score.
  Prediction:
       Use the trained model to predict credit card approval status for new applicants.
  4.0        Action Plan
       Sr.   Week   Details of activity                              Planned       Planned
       No.                                                           Start date    Finish date
       1     1 & 2  Discussion & Finalization of Topic               02/02/2025    05/02/2025
       2     3      Preparation of the Abstract                      05/02/2025    06/02/2025
       3     4      Literature Review                                06/02/2025    08/02/2025
       4     5      Submission of Microproject Proposal              09/02/2025    10/02/2025
                    (Annexure-I)
       5     6      Collection of information about Topic            11/02/2025    13/02/2025
       6     7      Collection of relevant content / materials       14/02/2025    16/02/2025
                    for the execution of Microproject
       7     8      Discussion and submission of outline of          17/02/2025    19/02/2025
                    the Microproject
       8     9      Analysis / execution of collected data /         20/02/2025    25/02/2025
                    information and preparation of prototypes /
                    drawings / photos / charts / graphs / tables /
                    circuits / models / programs etc.
       9     10     Completion of Contents of Project Report         26/02/2025    03/02/2025
       10    11     Completion of Weekly progress Report             01/03/2025    05/03/2025
       11    12     Completion of Project Report (Annexure-II)       06/03/2025    10/03/2025
       12    13     Viva voce / Delivery of Presentation
              5.0      Resources Required
   Sr. No.   Name of Resources / Materials   Specification                     Qty   Remarks
   1         Computer                        RAM minimum 4 GB, i5 7th gen      1
   2         Operating system                Windows 10                        1
   3         Internet                        Google                            1
                     Names of Team Members with En. Nos.
                      1. SUSHANT DUDHMAL         EN.NO:- 2210920385
                                                                                        Annexure-II
                                               Micro-Project Report
 Title of Micro-Project:-                       APACHE HIVE FOR REAL-TIME QUERIES AND
                                                ANALYTICS
 1.0 Rationale
 In the age of big data, organizations are increasingly seeking ways to make timely, data-driven
decisions. Traditional batch-processing systems often introduce delays that hinder responsiveness. This
microproject aims to bridge that gap by leveraging Apache Hive to perform near real-time analytics
on incoming data streams.
Apache Hive, though originally designed for batch processing, has evolved to support faster querying
through partitioning, bucketing, and integration with newer data formats like ORC and Parquet. By
simulating real-time data ingestion and applying optimized query techniques, Hive can be transformed
into a powerful engine for time-sensitive analytics on large-scale datasets.
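The effect of partitioning described above can be illustrated with a minimal Python sketch. It is not Hive itself: it only mimics the idea that a partitioned table keeps each partition value in its own group, so a query filtered on that value reads one group instead of the whole table. The sample rows, the `sale_date` partition key and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (order_id, product, amount, sale_date)
ROWS = [
    (1, "laptop", 900.0, "2025-02-20"),
    (2, "mouse", 20.0, "2025-02-20"),
    (3, "laptop", 950.0, "2025-02-21"),
    (4, "keyboard", 45.0, "2025-02-22"),
]


def build_partitions(rows):
    """Group rows by sale_date, mimicking one directory per partition value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[3]].append(row)
    return partitions


def total_sales_full_scan(rows, sale_date):
    """Unpartitioned case: every row is read, then filtered."""
    scanned = len(rows)
    total = sum(r[2] for r in rows if r[3] == sale_date)
    return total, scanned


def total_sales_pruned(partitions, sale_date):
    """Partitioned case: only the matching partition is read."""
    part = partitions.get(sale_date, [])
    return sum(r[2] for r in part), len(part)


PARTITIONS = build_partitions(ROWS)
```

Both functions return the same total for a given date, but the second value in the result (rows read) is smaller for the pruned version, which is the reason partitioned tables tend to answer date-filtered queries faster.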
 2.0 Aims/Benefits of the Micro-Project:-
           Provides near real-time insights.
           Handles large-scale data efficiently.
           Fast queries using partitioned tables.
           Cost-effective and scalable.
           Offers hands-on experience with big data tools.
           Can be integrated with other analytics platforms.
 3.0 Course Outcomes Achieved
 CO1: Understand Apache Hive and its role in big data.
 CO2: Create and manage Hive tables for structured data.
 CO3: Use partitioning to speed up queries.
 CO4: Perform near real-time data analysis with HiveQL.
 CO5: Analyze and interpret sales trends from data.
 CO6: Apply emerging tech to real-world analytics problems.
 4.0 Literature Review:-
Big data technologies have transformed the way organizations handle and analyze vast volumes of
structured and unstructured data. Apache Hive, introduced by Facebook in 2008, was developed to
simplify querying large datasets stored in the Hadoop Distributed File System (HDFS). Hive translates
SQL-like queries (HiveQL) into MapReduce jobs, enabling non-programmers to analyze big data
efficiently.
Over the years, researchers and practitioners have explored Hive's capabilities for improving
performance in analytical workloads. Studies show that data partitioning, bucketing, and the use of
optimized file formats like ORC and Parquet can significantly reduce query latency. While Hive is
traditionally batch-oriented, recent developments like Hive LLAP (Live Long and Process) have
aimed to reduce query execution time, enabling near real-time analytical capabilities.
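Bucketing, mentioned above alongside partitioning, spreads rows over a fixed number of files according to a hash of a chosen column. The Python sketch below shows the general idea only. The modulo rule used here is a simplified stand-in and is an assumption: the hash Hive actually applies depends on the column type and the Hive version. The function names and the choice of `order_id` as the bucketing column are also hypothetical.

```python
def bucket_for(key: int, num_buckets: int) -> int:
    """Simplified stand-in for a bucketing hash: non-negative key modulo bucket count."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    return (key & 0x7FFFFFFF) % num_buckets


def assign_buckets(order_ids, num_buckets):
    """Distribute order ids over num_buckets lists, one list per bucket."""
    buckets = {b: [] for b in range(num_buckets)}
    for oid in order_ids:
        buckets[bucket_for(oid, num_buckets)].append(oid)
    return buckets
```

For example, `assign_buckets(range(1, 9), 4)` places two ids in each of the four buckets. A lookup or join on the bucketing column then needs to read only the bucket that the key hashes to.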
Literature also highlights the importance of integrating Hive with tools like Apache Flume, Kafka,
and Spark for streaming data processing. These integrations allow Hive to work as part of a hybrid
architecture, where real-time data is ingested, stored, and analyzed quickly.
In the context of business applications, Hive is widely used in e-commerce, telecommunications, and
financial services for real-time monitoring of transactions, sales, and customer behavior. This makes it
a valuable tool for building scalable and responsive analytics platforms.
 5.0 Actual Methodology Followed
 Problem Definition: Identified the need for real-time analysis of product sales data.
 Data Generation: Simulated sales data using a Python script (order_id, product details, timestamp).
 Data Ingestion: Ingested data into HDFS using scripts to simulate real-time data flow.
 Hive Table Creation: Created external and partitioned tables in Hive for efficient data storage.
 Data Transformation: Cleaned and loaded data into partitioned tables using HiveQL queries.
 Query Execution: Performed real-time analytical queries (e.g., total sales, top products).
 Visualization (Optional): Used Apache Superset for real-time data visualization.
 Testing: Evaluated query performance and data processing efficiency.
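The data generation, ingestion and table creation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in the project: the product catalogue, the column names, the table name `sales_raw` and the location `/data/sales_raw` are all hypothetical, and a local folder stands in for HDFS. The files are written under `sale_date=YYYY-MM-DD` directories, which is the layout Hive expects for a table partitioned by `sale_date`.

```python
import csv
import os
import random
import tempfile
from datetime import datetime, timedelta

# Hypothetical product catalogue with unit prices.
PRODUCTS = {"laptop": 900.0, "mouse": 20.0, "keyboard": 45.0}
FIELDS = ["order_id", "product", "quantity", "price", "order_ts"]

# Assumed HiveQL for an external, partitioned table over the generated files.
HIVE_DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
  order_id INT,
  product STRING,
  quantity INT,
  price DOUBLE,
  order_ts STRING
)
PARTITIONED BY (sale_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/sales_raw';
"""


def generate_orders(n, start, seed=0):
    """Return n simulated orders, one per minute beginning at `start`."""
    rng = random.Random(seed)
    orders = []
    for i in range(n):
        product = rng.choice(sorted(PRODUCTS))
        ts = start + timedelta(minutes=i)
        orders.append({
            "order_id": i + 1,
            "product": product,
            "quantity": rng.randint(1, 5),
            "price": PRODUCTS[product],
            "order_ts": ts.strftime("%Y-%m-%d %H:%M:%S"),
        })
    return orders


def write_partitioned(orders, root):
    """Append orders as CSV rows under root/sale_date=YYYY-MM-DD/ and count rows per date."""
    written = {}
    for order in orders:
        sale_date = order["order_ts"][:10]
        part_dir = os.path.join(root, f"sale_date={sale_date}")
        os.makedirs(part_dir, exist_ok=True)
        path = os.path.join(part_dir, "part-00000.csv")
        with open(path, "a", newline="") as fh:
            csv.DictWriter(fh, fieldnames=FIELDS).writerow(order)
        written[sale_date] = written.get(sale_date, 0) + 1
    return written
```

A typical use is `write_partitioned(generate_orders(100, datetime(2025, 2, 20, 9, 0)), tempfile.mkdtemp())`, run periodically to imitate a data stream. In a real setup the folder would be copied to HDFS, and after new partition directories appear, a statement such as `MSCK REPAIR TABLE sales_raw;` is one way to make Hive register them.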
 6.0 Actual Resources Used (Mention the actual resources used).
   Sr. No.   Name of Resources / Materials   Specification                     Qty   Remarks
   1         Computer                        RAM minimum 4 GB, i5 7th gen      1
   2         Operating system                Windows 10                        1
   3         Internet                        Google                            1
 7.0 Outputs of the Micro-Projects
   Real-time sales analytics (total sales, top products, hourly trends).
 Optimized Hive queries for fast data processing.
 Simulated data ingested periodically into HDFS.
 HiveQL scripts for data transformation and querying.
 Real-time dashboard for visualization (optional).
 Performance metrics for query response and data processing.
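The three analytics outputs listed above (total sales, top products, hourly trends) correspond to simple aggregations. The Python sketch below shows the logic on a few hypothetical orders; the field names and sample values are assumptions, and in Hive each function would correspond to a `SUM` with an appropriate `GROUP BY`.

```python
from collections import defaultdict

# Hypothetical orders; revenue per order is quantity * price.
ORDERS = [
    {"product": "laptop", "quantity": 1, "price": 900.0, "order_ts": "2025-02-20 10:15:00"},
    {"product": "mouse", "quantity": 3, "price": 20.0, "order_ts": "2025-02-20 10:40:00"},
    {"product": "keyboard", "quantity": 2, "price": 45.0, "order_ts": "2025-02-20 11:05:00"},
    {"product": "mouse", "quantity": 2, "price": 20.0, "order_ts": "2025-02-20 11:30:00"},
]


def total_sales(orders):
    """Sum of revenue over all orders."""
    return sum(o["quantity"] * o["price"] for o in orders)


def top_products(orders, n=2):
    """The n products with the highest revenue, ties broken by product name."""
    revenue = defaultdict(float)
    for o in orders:
        revenue[o["product"]] += o["quantity"] * o["price"]
    return sorted(revenue.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def hourly_sales(orders):
    """Revenue grouped by the hour part (HH) of the order timestamp."""
    by_hour = defaultdict(float)
    for o in orders:
        by_hour[o["order_ts"][11:13]] += o["quantity"] * o["price"]
    return dict(sorted(by_hour.items()))
```

On the sample data, `total_sales(ORDERS)` gives 1090.0, `top_products(ORDERS)` ranks laptop ahead of mouse, and `hourly_sales(ORDERS)` splits the revenue between hours 10 and 11.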
 8.0 Skill Developed/Learning outcome of this Micro-Project
  Apache Hive Proficiency
           Hands-on experience in creating tables, partitioning, and querying data in Hive.
 Big Data Management
           Ability to manage large datasets in HDFS and optimize storage and querying with Hive.
 Data Ingestion & ETL
           Skills in data ingestion from various sources and ETL (Extract, Transform, Load) processes
            using HiveQL.
 Real-Time Analytics
           Practical knowledge of setting up near real-time data analysis using Hive, with partitioning for
            time-sensitive queries.
 9.0 Applications of this Micro-Project:-
 Real-Time Sales Analytics
           Used by e-commerce and retail businesses to analyze and track product sales in near real-time,
            enabling faster decision-making.
 Customer Behavior Analysis
           Helps businesses understand buying patterns and trends, improving marketing strategies and
            customer targeting.
 Inventory Management
           Facilitates monitoring of inventory levels in real time, helping businesses manage stock and
            avoid shortages or overstocking.
 Conclusion
 This microproject successfully demonstrates the potential of Apache Hive for near real-time
analytics on large-scale datasets. By simulating a continuous stream of product sales data, we were
able to efficiently perform time-sensitive queries and derive actionable business insights. The use of
partitioning and query optimization in Hive allowed for faster data processing, showcasing its ability
to handle large volumes of data while providing near real-time insights.
Through this project, we developed essential skills in big data management, data ingestion, query
optimization, and real-time analytics. The system built here is applicable across industries like e-
commerce, retail, and finance, where quick access to data insights can drive business decisions.
Ultimately, the project highlights how emerging technologies like Apache Hive can be adapted for
modern, real-time data analysis, creating value for businesses looking to stay agile and data-driven.
                                                                                                Annexure-IV
                                    MICRO-PROJECT EVALUATION SHEET
 Name of Student:- SUSHANT DUDHMAL                               En. No. 2210920385
 Name of Program:- Artificial Intelligence & Machine Learning
 Semester:- 6TH
 Course Name:- BIG DATA ANALYTICS
 Course Code:- 22684
 Title of The Micro-Project:-         APACHE HIVE FOR REAL-TIME QUERIES AND ANALYTICS
 Course Outcomes Achieved:-
             a. Describe Big data and Big Data Analytics.
             b. Apply the Big data Analytics procedure to work on datasets.
             c. Describe Hadoop Distributed File System.
             d. Analyze structured data using HIVE.
   Sr.   Characteristic to be assessed              Poor          Average       Good          Excellent      Sub
   No.                                              (Marks 1-3)   (Marks 4-5)   (Marks 6-8)   (Marks 9-10)   Total
                          (A) Process and Product Assessment (Convert below total marks out of 6 Marks)
       1      Relevance to the course
       2      Literature Review / information collection
       3      Completion of the Target as per project proposal
       4      Analysis of Data and representation
       5      Quality of Prototype / Model
       6      Report Preparation
                          (B) Individual Presentation / Viva (Convert below total marks out of 4 Marks)
       7      Presentation
       8      Viva
                 (A) Process and Product        (B) Individual Presentation /      Total Marks
                 Assessment (6 marks)           Viva (4 marks)                     10
 Comments/Suggestions about team work/leadership/inter-personal communication
 Name of Course Teacher:-
 Dated Signature:-