Experiment 1
1. Introduction
In the rapidly evolving landscape of data science and analytics, Python has
emerged as a quintessential programming language, favored for its
readability, extensive libraries, and robust community support. Central to
Python’s dominance in this field are powerful libraries like NumPy and
Pandas, which provide highly optimized tools for numerical operations and
data manipulation, respectively.
2. Aim
The aims of Experiment 1 are:
1. To successfully install, configure, and verify a Python programming environment suitable for data science.
2. To explore and demonstrate basic data manipulation capabilities using the NumPy and Pandas libraries.
3. Background and Theoretical Context
3.1 Python for Data Science
4. Tools Required
Python: Version 3.8 or higher (preferably installed via the Anaconda distribution
for ease of package management).
NumPy: Python library for numerical computing.
Pandas: Python library for data analysis and manipulation.
Integrated Development Environment (IDE) / Code Editor: Jupyter
Notebook/Lab, VS Code, or PyCharm are recommended for interactive
development.
5. Methodology and Tasks Performed
The experiment was conducted on a [Your Operating System, e.g., Windows
10 / macOS Ventura / Ubuntu 22.04 LTS] system. The following steps outline
the installation process and the subsequent programming tasks.
5.1 Installation of Python, NumPy, and Pandas
Steps:
Download Anaconda:
Navigate to the official Anaconda website:
https://www.anaconda.com/products/distribution
Download the appropriate graphical installer for your operating system (e.g.,
64-bit Graphical Installer for Windows).
Install Anaconda:
Run the downloaded installer.
Follow the on-screen prompts. It is generally recommended to accept the
default options, including adding Anaconda to your system PATH (the installer
warns advanced users against this, but for a beginner it is often convenient).
Once installed, open your system’s command prompt (Windows: cmd,
macOS/Linux: Terminal).
Verify Python Installation:
Type the following command and press Enter:
python --version
Expected Output: Python 3.x.x (e.g., Python 3.9.12). This confirms Python is
installed and accessible.
Verify NumPy and Pandas Installation:
Since Anaconda typically pre-installs these libraries, we can verify their
presence and version.
Open a Python interpreter session by typing python in your terminal, or
preferably, open a Jupyter Notebook/Lab.
Execute the following commands:
import numpy as np
print(f"NumPy Version: {np.__version__}")
import pandas as pd
print(f"Pandas Version: {pd.__version__}")
If these commands execute without ModuleNotFoundError, it confirms that
NumPy and Pandas are successfully installed.
(Optional: if they are not already installed, or you are not using Anaconda,
install them via pip or conda):
# Using pip
pip install numpy pandas
# Using conda (if using Anaconda/Miniconda)
conda install numpy pandas
5.2 Writing Basic Programs Using NumPy Arrays and Pandas DataFrames
Once the environment is set up, basic programs were written and executed
to demonstrate the core functionalities of NumPy and Pandas. These
programs were run in a Jupyter Notebook environment for interactive
development and clear output.
Task 1: Basic NumPy Array Operations
Objective: Create a NumPy array, perform element-wise operations, and
demonstrate basic array attributes.
Task 2: Basic Pandas DataFrame Operations
Objective: Create a Pandas DataFrame, access specific columns/rows, and
perform basic data selection.
6. Results and Observations
6.1 Installation Verification
The Python environment was successfully set up using the Anaconda
distribution. The verification commands yielded the following outputs:
# Command to check Python version
python --version
# Output:
# Python 3.9.12
# Commands to check NumPy and Pandas versions in a Python
interpreter/Jupyter Notebook
import numpy as np
print(f"NumPy Version: {np.__version__}")
import pandas as pd
print(f"Pandas Version: {pd.__version__}")
# Output:
# NumPy Version: 1.21.5
# Pandas Version: 1.4.2
(Note: Versions may vary based on the Anaconda distribution used at the
time of installation.)
The successful output confirms that Python, NumPy, and Pandas are correctly
installed and configured within the environment.
6.2 Execution of Basic Programs
Program 1: NumPy Array Operations
import numpy as np

print("--- NumPy Array Operations ---")

# 1. Create a NumPy array
data_list = [10, 20, 30, 40, 50]
numpy_array = np.array(data_list)
print("\n1. Original NumPy Array:")
print(numpy_array)
print(f"   Type: {type(numpy_array)}")
print(f"   Shape: {numpy_array.shape}")
print(f"   Data Type (dtype): {numpy_array.dtype}")

# 2. Perform an element-wise operation (add 5 to each element)
modified_array = numpy_array + 5
print("\n2. Array after adding 5 to each element:")
print(modified_array)

# 3. Perform another operation (multiply by 2)
multiplied_array = numpy_array * 2
print("\n3. Array after multiplying each element by 2:")
print(multiplied_array)

# 4. Calculate the sum of array elements
array_sum = np.sum(numpy_array)
print(f"\n4. Sum of array elements: {array_sum}")

# 5. Create a 2D array and perform operations
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print("\n5. 2D NumPy Array (Matrix):")
print(matrix)
print(f"   Shape of matrix: {matrix.shape}")
print(f"   Sum of all elements in matrix: {np.sum(matrix)}")
Output of Program 1:
--- NumPy Array Operations ---
1. Original NumPy Array:
[10 20 30 40 50]
Type: <class 'numpy.ndarray'>
Shape: (5,)
Data Type (dtype): int64
2. Array after adding 5 to each element:
[15 25 35 45 55]
3. Array after multiplying each element by 2:
[ 20 40 60 80 100]
4. Sum of array elements: 150
5. 2D NumPy Array (Matrix):
[[1 2 3]
[4 5 6]]
Shape of matrix: (2, 3)
Sum of all elements in matrix: 21
Program 2: Pandas DataFrame Operations
import pandas as pd

print("\n--- Pandas DataFrame Operations ---")

# 1. Create a Pandas DataFrame from a dictionary
data = {
    'Student_ID': [101, 102, 103, 104, 105],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Score': [85, 92, 78, 95, 88],
    'Course': ['Math', 'Physics', 'Chemistry', 'Math', 'Physics']
}
df = pd.DataFrame(data)
print("\n1. Original DataFrame:")
print(df)
print(f"\n   Type: {type(df)}")
print(f"   Shape: {df.shape}")
print(f"   Columns: {list(df.columns)}")

# 2. Access a specific column
names_column = df['Name']
print("\n2. 'Name' Column:")
print(names_column)
print(f"   Type of 'Name' column: {type(names_column)}")

# 3. Access multiple columns
selected_columns = df[['Name', 'Score']]
print("\n3. 'Name' and 'Score' Columns:")
print(selected_columns)

# 4. Select rows based on a condition (Score > 90)
high_scorers = df[df['Score'] > 90]
print("\n4. Students with Score > 90:")
print(high_scorers)

# 5. Get basic descriptive statistics for numerical columns
print("\n5. Descriptive Statistics for numerical columns:")
print(df.describe())
Output of Program 2:
--- Pandas DataFrame Operations ---
1. Original DataFrame:
Student_ID Name Score Course
0 101 Alice 85 Math
1 102 Bob 92 Physics
2 103 Charlie 78 Chemistry
3 104 David 95 Math
4 105 Eve 88 Physics
   Type: <class 'pandas.core.frame.DataFrame'>
   Shape: (5, 4)
   Columns: ['Student_ID', 'Name', 'Score', 'Course']
2. 'Name' Column:
0      Alice
1        Bob
2    Charlie
3      David
4        Eve
Name: Name, dtype: object
   Type of 'Name' column: <class 'pandas.core.series.Series'>
3. 'Name' and 'Score' Columns:
Name Score
0 Alice 85
1 Bob 92
2 Charlie 78
3 David 95
4 Eve 88
4. Students with Score > 90:
Student_ID Name Score Course
1 102 Bob 92 Physics
3 104 David 95 Math
5. Descriptive Statistics for numerical columns:
       Student_ID      Score
count    5.000000   5.000000
mean   103.000000  87.600000
std      1.581139   6.580274
min    101.000000  78.000000
25%    102.000000  85.000000
50%    103.000000  88.000000
75%    104.000000  92.000000
max    105.000000  95.000000
Experiment 1: Install, Configure, and Run Python, NumPy, and Pandas
Aim:
To install and set up the Python environment and explore basic data
manipulation using NumPy and Pandas.
Steps:
1. Install Python from python.org.
2. Open terminal/command prompt and install libraries:
pip install numpy pandas
3. Create two programs:
NumPy Program (numpy_demo.py):
import numpy as np
# create a numpy array
arr = np.array([1, 2, 3, 4, 5])
print("Array:", arr)
# perform operations
print("Mean:", np.mean(arr))
print("Squared:", arr ** 2)
Pandas Program (pandas_demo.py):
import pandas as pd
# create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [20, 21, 19],
'Score': [85, 90, 88]}
df = pd.DataFrame(data)
print("DataFrame:\n", df)
print("Mean Score:", df['Score'].mean())
Expected Output:
Successful execution of array operations and DataFrame manipulations.
---
Experiment 2: Install, Configure, and Run Hadoop and HDFS
Aim:
To set up Hadoop and interact with the Hadoop Distributed File System
(HDFS).
Steps:
1. Install Hadoop, set environment variables (JAVA_HOME, HADOOP_HOME).
2. Configure core-site.xml and hdfs-site.xml (minimal single-node examples are sketched after this experiment's expected output).
3. Start NameNode and DataNode.
4. Run commands:
hdfs dfs -mkdir /mydata
hdfs dfs -put localfile.txt /user/hadoop/
hdfs dfs -ls /user/hadoop/
Expected Output:
HDFS directories and files should be created, listed, and accessible.
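For reference, minimal single-node configurations for step 2 might look like the sketches below; the localhost address, port 9000, and a replication factor of 1 are common single-node defaults, not values required by this experiment:

core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>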
---
Experiment 3: Visualize Data Using Basic Plotting Techniques
Aim:
To create visualizations using Matplotlib and Seaborn.
Steps & Code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# sample dataset
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Math': [85, 78, 92],
'Science': [90, 88, 95],
'English': [80, 75, 85]}
df = pd.DataFrame(data)
# Line Plot
df.set_index('Name')[['Math','Science','English']].plot(kind='line')
plt.title("Line Plot - Scores")
plt.show()
# Bar Chart
sns.barplot(x='Name', y='Math', data=df)
plt.title("Bar Chart - Math Scores")
plt.show()
# Pie Chart
df[['Math','Science','English']].sum().plot.pie(autopct='%1.1f%%')
plt.title("Pie Chart - Total Marks")
plt.show()
# Histogram
plt.hist(df['Science'], bins=5, edgecolor='black')
plt.title("Histogram - Science Marks")
plt.show()
Expected Output:
Line, bar, pie, and histogram plots.
---
Experiment 4: CRUD Operations in MongoDB
Aim:
To perform CRUD operations and manage arrays in MongoDB using Python.
Code (mongodb_crud.py):
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["studentDB"]
collection = db["students"]
# Insert
collection.insert_many([
{"name": "Alice", "math": 85, "skills": ["Python", "SQL"]},
{"name": "Bob", "math": 90, "skills": ["Java"]},
{"name": "Charlie", "math": 78, "skills": ["C++"]}
])
# Read
print("All Students:")
for doc in collection.find():
    print(doc)
# Update
collection.update_one({"name": "Alice"}, {"$set": {"math": 95}})
print("\nUpdated Alice:", collection.find_one({"name": "Alice"}))
# Delete
collection.delete_one({"name": "Charlie"})
print("\nAfter Deletion:")
for doc in collection.find():
    print(doc)
# Query Array
print("\nStudents with Python:")
for doc in collection.find({"skills": "Python"}):
    print(doc)
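The program above only queries an array field; since the aim also covers managing arrays, here is a brief sketch of in-place array updates using MongoDB's $push and $pull update operators (the skill values are illustrative):

# Add a skill to Bob's skills array
collection.update_one({"name": "Bob"}, {"$push": {"skills": "MongoDB"}})
# Remove a skill from Alice's skills array
collection.update_one({"name": "Alice"}, {"$pull": {"skills": "SQL"}})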
Expected Output:
Insertion, retrieval, update, deletion, and array query results.
---
Experiment 5: Advanced MongoDB Operations
Aim:
To use Count, Sort, Limit, Skip, and Aggregate functions in MongoDB.
Code:
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["companyDB"]
collection = db["employees"]
collection.insert_many([
{"name": "A", "salary": 50000, "dept": "HR"},
{"name": "B", "salary": 60000, "dept": "IT"},
{"name": "C", "salary": 70000, "dept": "IT"},
{"name": "D", "salary": 80000, "dept": "Finance"},
{"name": "E", "salary": 75000, "dept": "Finance"}
])
print("Count:", collection.count_documents({}))
print("\nSorted by Salary:")
for doc in collection.find().sort("salary", -1):
    print(doc)
print("\nTop 3 Salaries:")
for doc in collection.find().sort("salary", -1).limit(3):
    print(doc)
print("\nSkip 2:")
for doc in collection.find().sort("salary", -1).skip(2):
    print(doc)
print("\nAverage Salary by Dept:")
for doc in collection.aggregate([
    {"$group": {"_id": "$dept", "avgSalary": {"$avg": "$salary"}}}
]):
    print(doc)
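Expected Output (abridged):
With the five documents above, the count is 5, and the aggregation should report average salaries of 50000.0 for HR, 65000.0 for IT, and 77500.0 for Finance (the order of $group results is not guaranteed).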
---
Experiment 6: Word Count Using MapReduce
Aim:
To implement word frequency using MapReduce in Python.
Code:
def mapper(line):
    # emit (word, 1) for every word on the line
    return [(word, 1) for word in line.strip().split()]

def shuffle_and_sort(mapped):
    # group the counts by word
    grouped = {}
    for word, count in mapped:
        grouped.setdefault(word, []).append(count)
    return grouped

def reducer(grouped):
    # sum the counts for each word
    return {word: sum(counts) for word, counts in grouped.items()}

def main():
    with open("input.txt", "r") as f:
        lines = f.readlines()
    mapped = []
    for line in lines:
        mapped.extend(mapper(line))
    grouped = shuffle_and_sort(mapped)
    reduced = reducer(grouped)
    print("Word Count:", reduced)

if __name__ == "__main__":
    main()
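As a quick sanity check: with a hypothetical input.txt containing the single line "big data big ideas", the program would print Word Count: {'big': 2, 'data': 1, 'ideas': 1}. Note that the split is case- and punctuation-sensitive, so "Big" and "big" would be counted separately.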
---
Experiment 7: MapReduce on Dataset (Average Salary)
Aim:
To compute average salary per department using MapReduce.
Code:
import csv

def mapper(row):
    # emit (dept, (salary, 1)) for each employee row
    dept, salary = row[1], float(row[2])
    return (dept, (salary, 1))

def shuffle_and_sort(mapped):
    # group the (salary, count) pairs by department
    grouped = {}
    for dept, (salary, count) in mapped:
        if dept not in grouped:
            grouped[dept] = []
        grouped[dept].append((salary, count))
    return grouped

def reducer(grouped):
    # compute the average salary per department
    results = {}
    for dept, values in grouped.items():
        total_salary = sum(s for s, _ in values)
        total_count = sum(c for _, c in values)
        results[dept] = round(total_salary / total_count, 2)
    return results

def main():
    with open("employees.csv") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        mapped = [mapper(row) for row in reader]
    grouped = shuffle_and_sort(mapped)
    reduced = reducer(grouped)
    print("Average Salary by Department:", reduced)

if __name__ == "__main__":
    main()
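Note that mapper() assumes employees.csv has a header row followed by rows whose second column is the department and whose third column is the salary. A hypothetical file matching that layout:

name,dept,salary
Alice,HR,50000
Bob,IT,60000
Charlie,IT,72000

With this input the program would print Average Salary by Department: {'HR': 50000.0, 'IT': 66000.0}.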
---
Experiment 8: Clustering with Spark MLlib
Aim:
To perform clustering using Spark MLlib’s K-Means.
Code (PySpark):
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Clustering").getOrCreate()
# load dataset
data = [(1, 2.0, 3.0), (2, 10.0, 15.0), (3, 25.0, 30.0)]
df = spark.createDataFrame(data, ["id", "x", "y"])
# assemble features
vec = VectorAssembler(inputCols=["x", "y"], outputCol="features")
df = vec.transform(df)
# k-means
kmeans = KMeans(k=2, seed=1)
model = kmeans.fit(df)
print("Cluster Centers:")
for center in model.clusterCenters():
    print(center)
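To see which cluster each point was assigned to, the fitted model can transform the DataFrame; KMeansModel appends a prediction column by default:

# assign each row to its nearest cluster centre
model.transform(df).select("id", "prediction").show()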
Experiment 9: MongoDB + Hadoop Integration (Mini-Project)
Aim:
To build a mini-project that stores student data in MongoDB, exports it to
HDFS (Hadoop Distributed File System), and then processes it (e.g., count
total records).
Python Program (experiment9.py):
from pymongo import MongoClient
import pandas as pd
import subprocess

# 1. Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["bigdataDB"]
collection = db["students"]

# Insert sample student data
students = [
    {"name": "Alice", "dept": "CS", "score": 85},
    {"name": "Bob", "dept": "IT", "score": 90},
    {"name": "Charlie", "dept": "CS", "score": 78},
    {"name": "David", "dept": "IT", "score": 88},
    {"name": "Eve", "dept": "Math", "score": 92}
]
collection.insert_many(students)
print("✅ Data inserted into MongoDB.")

# 2. Export data from MongoDB to CSV
data = list(collection.find({}, {"_id": 0}))  # exclude the _id field
df = pd.DataFrame(data)
csv_file = "students.csv"
df.to_csv(csv_file, index=False)
print("✅ Exported data to students.csv")

# 3. Put the CSV file into HDFS (requires Hadoop running)
hdfs_dir = "/bigdata_exp9"
try:
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    subprocess.run(["hdfs", "dfs", "-put", "-f", csv_file, hdfs_dir], check=True)
    print(f"✅ File uploaded to HDFS at {hdfs_dir}/{csv_file}")
except Exception as e:
    print("⚠️ HDFS commands failed. Make sure Hadoop is running.")
    print(e)

# 4. Simple processing (count rows, average scores)
print("\nProcessing Data:")
print(f"Total number of student records: {len(df)}")
print("Average Score by Department:")
print(df.groupby("dept")["score"].mean())
Steps to Run:
1. Start MongoDB (mongod) and Hadoop (start-dfs.sh).
2. Save the program as experiment9.py.
3. Run:
python experiment9.py
Expected Output:
✅ Data inserted into MongoDB.
✅ Exported data to students.csv
✅ File uploaded to HDFS at /bigdata_exp9/students.csv
Processing Data:
Total number of student records: 5
Average Score by Department:
CS 81.5
IT 89.0
Math 92.0
Name: score, dtype: float64
👉 This program demonstrates:
Storage in MongoDB
Export to Hadoop HDFS
Processing (average scores per department)
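To verify the upload independently of the script, the HDFS directory can be inspected from a terminal (assuming the Hadoop binaries are on the PATH):

hdfs dfs -ls /bigdata_exp9
hdfs dfs -cat /bigdata_exp9/students.csv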