Project Report 1
On
To
RIMJHIM RANA
(00224402024)
SWATI
(00324402024)
SANYA MOTWANI
(01524402024)
ANANYA SRIVASTAVA
(02824402024)
Firstly, we would like to extend our deepest gratitude to Mr. Divyank Chauhan,
whose invaluable guidance and insights were essential throughout the training.
His expertise and patience were crucial in helping us navigate the complexities
of advanced data science and machine learning.
We also wish to sincerely thank IITM Janakpuri for collaborating with
ShapeMySkills Pvt. Ltd. to provide this exceptional learning opportunity. Special
appreciation goes to our course coordinator, Dr. Meenu, for her dedicated efforts
and support during the training. Furthermore, we are deeply grateful to the
Head of the Department for her encouragement and for providing the necessary
resources and environment to facilitate this learning experience.
Finally, we acknowledge the unwavering support of our family and friends, who
have been our pillars of strength and encouragement throughout this journey.
Thank you all for making this experience valuable and memorable.
Sincerely,
Rimjhim
Swati
Sanya
Ananya
Certificate
This is to certify that Rimjhim, Swati, Sanya and Ananya have successfully
completed the project titled “FLIPKART SALES DASHBOARD” as part of
the Data Analytics summer training organised by IITM Janakpuri in
collaboration with ShapeMySkills Pvt. Ltd.
This project was conducted under the esteemed guidance of Mr. Divyank
Chauhan, whose expertise and mentorship were instrumental in its successful
completion. The project exemplifies a thorough understanding of data
analytics techniques, highlighting the skills acquired during the training
program.
We commend Rimjhim, Swati, Sanya and Ananya for their dedication, hard
work, and enthusiasm throughout the project duration.
Head of Department:
Date:
Signature:
INDEX
Chapter 2: Introduction to Python Libraries
Chapter 3: SQLite Basics and Data Operations
Chapter 4: Advanced Excel
Chapter 5: Introduction to Power BI
Chapter 6: Dashboard
List of Abbreviations
S No. Name
1 OOP: Object Oriented programming
2 I/O: Input/Output
10 PY: Python
1. for Loop
The for loop is used to iterate over a sequence like a list, tuple, string,
or range.
Syntax:
for item in sequence:
    # code block
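The syntax above can be illustrated with a small sketch that iterates over a list and over a range. The names fruits and total are hypothetical, chosen only for this example.

```python
# Hypothetical sample data, chosen only for illustration
fruits = ["apple", "banana", "cherry"]

# Iterate over a list: the loop variable takes each element in turn
for fruit in fruits:
    print("Fruit:", fruit)

# Iterate over a range: range(1, 6) yields 1, 2, 3, 4, 5
total = 0
for number in range(1, 6):
    total += number

print("Sum of 1 to 5:", total)
```

The loop body is whatever is indented under the for line; the loop ends when the sequence is exhausted.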
2. while Loop
The while loop keeps running as long as a given condition is true.
Syntax:
while condition:
    # code block
Example:
count = 0
while count < 5:
    print("Count is:", count)
    count += 1
Variables in Python
A variable stores data in memory.
You don’t need to declare the data type.
name = "Alice"
age = 25
pi = 3.14
is_valid = True
Dynamic Typing
x = 5       # int
x = "five"  # str: the name x now refers to a value of a different type
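A short sketch may make the idea of dynamic typing more concrete. It assumes nothing beyond the built-in type() function, and the variable names are illustrative. It also shows that a value keeps its own type: Python does not silently mix a string and a number.

```python
x = 5
first_type = type(x).__name__    # type of the value currently bound to x
print(first_type)

x = "five"                       # the same name is rebound to a string
second_type = type(x).__name__
print(second_type)

# The value itself is still strictly typed: str + int is not allowed
try:
    result = x + 5
except TypeError as error:
    result = None
    print("TypeError:", error)
```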
Operators in Python
Chapter 2:
Introduction to Python Libraries
NumPy
NumPy (Numerical Python) is a powerful open-source Python library
used for performing numerical computations efficiently. It is the
foundational package for scientific computing in Python and is widely
used in data analytics, machine learning, and big data applications.
NumPy provides support for arrays, matrices, and a vast collection of
mathematical functions.
Key features:
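3. Array Operations
Element-wise Operations:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)  # element-wise sum: [5 7 9]
Mathematical Functions:
arr = np.array([1, 2, 3, 4, 5, 6])  # sample array, assumed for illustration
np.mean(arr)    # arithmetic mean
np.median(arr)  # median
np.std(arr)     # standard deviation
np.sum(arr)     # sum of all elements
np.max(arr)     # largest element
np.min(arr)     # smallest element
4. Array Manipulation:
Reshape: arr.reshape(3, 2)
Flatten: arr.flatten()
Transpose: arr.T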
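The three manipulation calls can be tried together in a minimal sketch. The six-element sample array is an assumption made only so that reshape(3, 2) has the right number of values.

```python
import numpy as np

# Sample data, assumed only for illustration
arr = np.array([1, 2, 3, 4, 5, 6])

# reshape arranges the same six values as 3 rows by 2 columns
matrix = arr.reshape(3, 2)
print(matrix.shape)

# .T swaps the axes, giving 2 rows by 3 columns
transposed = matrix.T
print(transposed.shape)

# flatten returns a new 1-D copy, read row by row
flat = transposed.flatten()
print(flat.tolist())
```

Note that reshape only works when the new shape holds exactly as many elements as the original array.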
Pandas
Pandas is a fast, powerful, and flexible open-source data analysis and
data manipulation library for Python. It is built on top of NumPy and
designed for working with structured data like tables, Excel files,
CSVs, and databases. It introduces two primary data structures: Series
and DataFrame.
Key Features:
# Write to CSV
df.to_csv('output.csv', index=False)
Data Exploration
df.head() # First 5 rows
df.tail() # Last 5 rows
df.info() # Summary
df.describe() # Stats summary
df.columns # Column names
df.shape # Rows and columns
Data Selection and Indexing
df['Name'] # Access single column
df[['Name', 'Age']] # Access multiple columns
df.loc[0] # Access row by index label
df.iloc[1] # Access row by integer position
df[ df['Age'] > 20 ] # Conditional filtering
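The selection calls above assume a DataFrame named df already exists. A minimal, self-contained sketch follows; the column names match the ones used above, while the rows are hypothetical and invented only for illustration.

```python
import pandas as pd

# Hypothetical data, invented only for illustration
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meera"],
    "Age": [19, 22, 25],
})

names = df["Name"]              # a single column is returned as a Series
subset = df[["Name", "Age"]]    # a list of columns gives a DataFrame
first_row = df.loc[0]           # row whose index label is 0
second_row = df.iloc[1]         # row at integer position 1
adults = df[df["Age"] > 20]     # boolean mask keeps only matching rows

print(adults)
```

With the default index, labels and positions coincide, so loc and iloc look alike here; they differ as soon as the index is reordered or replaced with custom labels.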
Matplotlib
Matplotlib is a widely used data visualization library in Python. It enables
users to create static, animated, and interactive plots with high flexibility
and customization. It is particularly useful for data analysts to represent
data insights visually through graphs and charts.
1. Line Plot:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y)
plt.title("Simple Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
2. Bar Chart:
labels = ['A', 'B', 'C']
values = [10, 15, 7]
plt.bar(labels, values)
plt.title("Bar Chart")
plt.show()
3. Scatter Plot:
plt.scatter(x, y, color='red')
plt.title("Scatter Plot")
plt.show()
Seaborn
Seaborn is a Python data visualization library built on top of
Matplotlib, designed to make statistical graphics easier and more
attractive. It works seamlessly with Pandas DataFrames, making it
ideal for data analytics tasks.
Example:
Chapter 3:
SQLite Basics and Data Operations
Main Features
Serverless – No need to install or run a separate database server.
Zero Configuration – No setup, just connect and use.
Self-contained – Everything (code + data) is in one file.
Portable – Database files can be copied between systems easily.
Cross-platform – Runs on Windows, Linux, macOS, Android, and iOS.
Fast & Efficient – Performs well for small to medium datasets.
Reliable – Fully ACID-compliant, supports transactions.
Use Cases
Mobile apps (e.g., WhatsApp, Android apps)
Desktop software (e.g., Firefox, Chrome)
Embedded systems & IoT devices (e.g., Raspberry Pi)
Data analysis & quick prototypes (Python projects)
Educational or test projects without setting up a full DBMS
import sqlite3
o Creating a Database
You can create a new SQLite database using the connect() function. If the
specified file (e.g., students.db) does not exist, SQLite will automatically create
it. If it already exists, it will simply connect to the existing one.
conn = sqlite3.connect('students.db')  # creates the file if it does not exist
cursor = conn.cursor()
o Creating a Table
cursor.execute('''
    CREATE TABLE IF NOT EXISTS students (
        id INTEGER PRIMARY KEY,
        name TEXT,
        age INTEGER,
        grade TEXT
    )
''')
conn.commit()
CRUD stands for Create, Read, Update, and Delete — the four
basic operations for managing data in a database. With SQLite and
Python, these operations are simple and efficient using the built-in
sqlite3 module. Below are examples of each operation using a sample
table called students.
To add new records to the database, we use the INSERT INTO SQL
statement. This can be done using parameterized queries to prevent
SQL injection.
import sqlite3
conn = sqlite3.connect('school.db')
cursor = conn.cursor()
cursor.execute('''
    CREATE TABLE IF NOT EXISTS students (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT,
        age INTEGER,
        grade TEXT
    )
''')
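The parameterized INSERT described above could look like the following sketch. It uses an in-memory database so that it runs on its own (an assumption; the listing above uses the file 'school.db'), and the student rows are hypothetical.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained (an assumption)
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("""
    CREATE TABLE IF NOT EXISTS students (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT,
        age INTEGER,
        grade TEXT
    )
""")

# Each ? is a placeholder; sqlite3 binds the values separately, so input
# is never spliced into the SQL text
cursor.execute(
    "INSERT INTO students (name, age, grade) VALUES (?, ?, ?)",
    ("Asha", 19, "A"),  # hypothetical row
)

# executemany runs the same statement once per tuple in the list
more_students = [("Ravi", 20, "B"), ("Meera", 18, "A")]  # hypothetical rows
cursor.executemany(
    "INSERT INTO students (name, age, grade) VALUES (?, ?, ?)",
    more_students,
)
conn.commit()

cursor.execute("SELECT COUNT(*) FROM students")
row_count = cursor.fetchone()[0]
print("Rows inserted:", row_count)
```

Because the values travel separately from the SQL string, a name containing a quote character cannot change the meaning of the statement.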
cursor.execute("SELECT name, grade FROM students WHERE age > 18")
rows = cursor.fetchall()  # list of (name, grade) tuples
cursor.execute("DELETE FROM students WHERE name = ?", ('Sanya',))
conn.commit()
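The read, update, and delete steps can be sketched together as follows. This is a minimal illustration under stated assumptions: an in-memory database, hypothetical rows, and cursor.rowcount used to confirm how many rows each statement changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, assumed for the sketch
cursor = conn.cursor()
cursor.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, grade TEXT)"
)
cursor.executemany(
    "INSERT INTO students (name, age, grade) VALUES (?, ?, ?)",
    [("Asha", 19, "B"), ("Ravi", 17, "A"), ("Meera", 21, "C")],  # hypothetical rows
)

# Read: fetchall returns the matching rows as a list of tuples
cursor.execute("SELECT name, grade FROM students WHERE age > ? ORDER BY name", (18,))
adults = cursor.fetchall()

# Update: rowcount reports how many rows the statement changed
cursor.execute("UPDATE students SET grade = ? WHERE name = ?", ("A", "Asha"))
updated = cursor.rowcount

# Delete: again parameterized, again checked through rowcount
cursor.execute("DELETE FROM students WHERE name = ?", ("Ravi",))
deleted = cursor.rowcount
conn.commit()

cursor.execute("SELECT name, grade FROM students ORDER BY name")
remaining = cursor.fetchall()
print(adults, updated, deleted, remaining)
conn.close()
```

Changes become permanent only after conn.commit(); closing the connection without committing discards them.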
Advantages of SQLite
1. Serverless Architecture
SQLite is self-contained and does not require a separate server to
operate. All you need is the database file, which can be easily created
and accessed using a simple API.
2. Lightweight and Fast
SQLite is extremely lightweight, just a few hundred kilobytes in size,
and performs very well for read-heavy operations or small-to-medium
datasets.
3. Zero Configuration
There is no setup or installation required. You can start using SQLite
immediately without configuring user accounts, permissions, or a
server.
4. Cross-Platform Compatibility
SQLite works seamlessly on Windows, macOS, Linux, Android, iOS,
and embedded systems such as the Raspberry Pi.
Disadvantages of SQLite
Chapter 4:
Advanced Excel
What is Advanced Excel?
Advanced Excel refers to powerful features and tools beyond basic
spreadsheet use. These include formulas, pivot tables, data visualization,
advanced functions, macros, and data analysis tools. Mastering these tools
enhances your ability to clean, explore, and visualize data efficiently.
Why Use Excel for Data Analytics?
User-friendly interface for handling structured data.
Excellent for data cleaning, manipulation, and visualization.
Supports advanced calculations and automation (macros, VBA).
Ideal for quick analysis, dashboards, and reports.
Commonly used in industry for finance, marketing, HR, and operations.
Real-World Applications
Industry     Use Case
Finance      Budgeting, forecasting, investment tracking
Marketing    Campaign analysis, ROI reports
HR           Attendance, payroll, performance tracking
Sales        Sales pipeline, region-wise performance
Education    Grading systems, student analytics
Logistics    Inventory control, supply chain monitoring
=COUNTIF(C2:C20, ">70")
Pivot Tables
A Pivot Table allows you to automatically group, filter, and summarize
your data. It’s perfect for answering questions like:
Pivot Charts
A Pivot Chart is a dynamic chart connected to your PivotTable. As you
change your table (e.g., filter years), the chart updates automatically.
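As a rough analogue in Python (not Excel itself), the grouping and summarizing that a PivotTable performs can be sketched with the pandas pivot_table method. The sales records and column names below are hypothetical, invented only for illustration.

```python
import pandas as pd

# Hypothetical sales records, invented only for illustration
sales = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "South"],
    "Year": [2023, 2024, 2023, 2024, 2024],
    "Amount": [100, 150, 80, 120, 30],
})

# Rows are grouped by Region, columns by Year, and Amount is summed per cell
summary = sales.pivot_table(
    index="Region", columns="Year", values="Amount", aggfunc="sum"
)
print(summary)

# Filtering first and then summarizing mirrors a PivotTable filter
south_total = sales[sales["Region"] == "South"]["Amount"].sum()
print("South total:", south_total)
```

In Excel the same result comes from dragging Region to Rows, Year to Columns, and Amount to Values.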
Advantages of Excel
1. User-Friendly Interface
Excel is easy to learn and use, even for beginners. Its tab-based layout, intuitive
features, and familiar design make it highly accessible.
2. Powerful Analytical Tools
With formulas, functions, PivotTables, and data analysis add-ins, Excel
supports advanced analytics, modeling, and reporting without requiring
programming skills.
3. Data Visualization
Excel provides a wide variety of charts, graphs, conditional formatting, and
dashboards to help visualize and interpret data effectively.
4. Flexible and Versatile
It can handle different types of data—numeric, textual, date/time—and is used
in fields like finance, HR, education, marketing, and logistics.
5. Integration with Other Tools
Excel integrates with Power BI, SQL, Python, R, and online services, making it
useful for both standalone and collaborative workflows.
Disadvantages of Excel
1. Limited Scalability
Excel is not designed for handling very large datasets: a worksheet holds at
most 1,048,576 rows, and performance may degrade with size and complexity.
2. Error-Prone with Manual Entry
Manual data entry and formula writing increase the risk of human errors,
especially in large or complex spreadsheets.
3. No Real-Time Collaboration (Offline Versions)
While Excel 365 supports collaboration, offline versions lack true
real-time multi-user support like Google Sheets.
4. Security Concerns
Excel files can be easily copied, edited, or shared without restrictions. It
lacks strong, built-in access control and auditing features.
Chapter 5:
Introduction to Power BI
What is Power BI?
A business intelligence and data visualization tool developed by
Microsoft.
Converts raw data into interactive dashboards and meaningful insights.
Allows easy connection to multiple data sources (Excel, SQL, Azure,
etc.).
Offers drag-and-drop interface for designing reports without coding.
Used for creating charts, graphs, KPIs, and maps for data storytelling.
Supports real-time data updates and monitoring through dashboards.
Empowers both technical and non-technical users to analyze data.
Power BI Family
Power BI Desktop – Used to build, model, and design interactive
reports.
Power BI Service – Cloud-based platform to share, publish, and
collaborate.
Power BI Mobile – Mobile apps to access dashboards on phones and
tablets.
Power BI Report Server – On-premise version for hosting reports
securely.
2. Power BI Service
A cloud-based platform used for publishing and sharing reports.
Allows users to create dashboards and collaborate in real-time.
Supports scheduled data refresh and alert notifications.
Provides workspace features for managing user access and roles.
Enables sharing reports across teams, departments, or entire
organizations.
3. Power BI Gateway
Acts as a bridge between on-premise data sources and Power BI
services.
Keeps cloud-based reports updated with the latest on-premise data.
Two types: Personal gateway (for individual use) and Enterprise
gateway (for teams).
Essential for organizations with hybrid data infrastructure.
4. Power BI Mobile
Available for iOS, Android, and Windows devices.
Lets users view and interact with dashboards on the go.
Supports touch-enabled navigation, filtering, and drill-downs.
Sends mobile alerts based on predefined thresholds or conditions.
Features of Power BI
1. Data Connectivity
Power BI supports a wide variety of data connectors, allowing users to
import data from sources like Excel, CSV, SQL Server, MySQL, Oracle,
Azure, SharePoint, Google Analytics, Salesforce, and even web APIs. This
extensive connectivity makes it flexible for almost any data environment. It
enables users to combine data from multiple sources into a unified data
model for reporting and analysis.
Power BI includes Power Query Editor, a built-in tool that helps users clean
and shape their data before loading it into reports. It supports operations like
filtering rows, removing duplicates, merging tables, renaming columns, and
changing data types. These steps are recorded automatically and can be
reused or modified later, making data preparation efficient and repeatable.
4. Interactive Visualizations
Power BI provides a rich set of built-in visuals including bar charts, line
graphs, pie charts, maps, tables, matrices, cards, gauges, and more. Users can
simply drag and drop fields to create visuals, and customize them with
colors, labels, filters, and tooltips. These visuals are interactive—clicking on
one visual updates others to reflect related data.
5. Custom Visuals
In addition to built-in visuals, Power BI supports importing custom visuals
from Microsoft AppSource or even creating your own. These visuals can be
used to meet specific business needs or create unique dashboards.
1. Business Decision-Making
Organizations use Power BI to:
2. Financial Analysis
Power BI helps finance teams to:
3. HR Analytics
Used for:
Limitations
Limited Export Options: Exporting large reports to PDF can be
restricted.
Complex DAX Syntax: Learning curve for advanced calculations.
Data Model Size: Performance issues with very large datasets.
Custom Visuals Licensing: Some visuals require paid licenses.
Career Opportunities
Learning Power BI opens doors in:
Business Intelligence
Data Analytics
Data Science
Financial Analysis
Project Management
Job Roles
Power BI Developer
Data Analyst
Business Analyst
BI Consultant
Chapter 6:
Dashboard
What is a Dashboard in Power BI?
A dashboard in Power BI is a single-page, interactive view that displays
key insights and metrics from various reports and datasets. Often referred
to as a “canvas,” a dashboard allows users to pin visuals from different
reports and sources into one unified view. It’s ideal for monitoring
business performance at a glance and quickly identifying trends or issues.
Benefits of Dashboards
Provide a quick overview of KPIs and business metrics.
Save time by summarizing complex data in a single view.
Encourage collaboration by enabling sharing across teams.
Help executives and managers make fast, informed decisions.