Dav Lab

The document discusses popular Python and R libraries used for data analytics and visualization. It introduces NumPy, Pandas, Matplotlib, Scikit-learn, SciPy, Dplyr, ggplot2, tidyr, Shiny, and Plotly libraries and provides examples of key functions and code snippets for working with arrays, dataframes, plotting, machine learning, and more.

Uploaded by

Vidhi Artani

Data analytics libraries for Python and R

Python libraries:-

● NumPy

● Pandas

● Matplotlib

● Scikit-learn

● SciPy

1] NumPy:

NumPy is a Python library used for working with arrays.

It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.

NumPy was created in 2005 by Travis Oliphant. It is an open-source project and you can use it freely.

NumPy stands for Numerical Python.

In Python, lists serve the purpose of arrays, but they are slow to process.

NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.

The array object in NumPy is called ndarray. It provides many supporting functions that make working with ndarray very easy.

Arrays are used very frequently in data science, where speed and resource use are very important.
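The speed difference between lists and arrays can be checked by timing the same computation both ways. The sketch below is an illustration rather than a benchmark: the names n, values_list, values_array and the two helper functions are hypothetical, and the measured ratio depends on the machine, the array size and the operation, so it may be well below or above 50x.

```python
import timeit

import numpy as np

n = 100_000  # hypothetical problem size, chosen only for illustration
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)  # int64 so the squares cannot overflow


def square_with_list():
    # Pure-Python loop: one interpreter step per element.
    return [v * v for v in values_list]


def square_with_numpy():
    # Vectorized: the multiplication runs in compiled code over the whole array.
    return values_array * values_array


list_time = timeit.timeit(square_with_list, number=20)
numpy_time = timeit.timeit(square_with_numpy, number=20)

squares_list = square_with_list()
squares_array = square_with_numpy()

print(f"list:  {list_time:.4f} s for 20 runs")
print(f"numpy: {numpy_time:.4f} s for 20 runs")
```

Both versions produce the same numbers; only the time taken differs.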

Functions: np.array(), np.zeros(), np.ones()

Code snippet:

import numpy as np

a = np.array([1, 2, 3])

np.array() creates a NumPy ndarray object from a Python list or other sequence.

b = np.zeros((2, 3))

Returns a new array of the given shape and type, filled with zeros.

c = np.ones((2, 3))

Returns a new array of the given shape and type, filled with ones.
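As a minimal sketch of what these three functions return, the example below inspects the shape and dtype of each array and then combines them elementwise. The variable names d, e and f are introduced here only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])   # 1-D array built from a Python list
b = np.zeros((2, 3))      # 2 rows, 3 columns, all 0.0
c = np.ones((2, 3))       # 2 rows, 3 columns, all 1.0

# Every ndarray carries its shape and element type.
print(a.shape, a.dtype)   # integer dtype; the exact width depends on the platform
print(b.shape, b.dtype)
print(c.shape, c.dtype)

d = b + c                 # elementwise addition of two arrays with the same shape
e = a * 2                 # the scalar is applied to every element
f = c + a                 # broadcasting: a (shape (3,)) is added to each row of c

print(d)
print(e)
print(f)
```

np.zeros() and np.ones() default to floating-point elements, which is why adding the integer array a to c gives a floating-point result.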

2] Pandas:-

Pandas is a library for data manipulation and analysis, providing data structures and operations for manipulating numerical tables and time series.

Pandas is widely used in data science because it works in conjunction with the other libraries used in the field. It is built on top of NumPy, so many NumPy structures are used or replicated in Pandas. Data produced by Pandas is often used as input for plotting functions in Matplotlib, statistical analysis in SciPy, and machine learning algorithms in Scikit-learn. Some of the most commonly used Pandas functions are listed below.

Functions:-

pd.read_csv(): Read a CSV file into a DataFrame.

pd.DataFrame(): Create a DataFrame.

pd.concat(): Concatenate DataFrames.

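Code snippets:

python

import pandas as pd

# 'data.csv' is a placeholder file name
df = pd.read_csv('data.csv')

# Two small DataFrames with the same columns
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})

# Stack df2 below df1
df_concat = pd.concat([df1, df2])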
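The three functions can be tried together without any file on disk. The sketch below is only an illustration: csv_text is made-up data, and io.StringIO stands in for a real CSV file so that the example is self-contained.

```python
import io

import pandas as pd

# Made-up CSV content; io.StringIO lets read_csv treat the string like a file.
csv_text = "A,B\n1,4\n2,5\n3,6\n"
df1 = pd.read_csv(io.StringIO(csv_text))

# A second DataFrame with the same columns, built from a dictionary.
df2 = pd.DataFrame({"A": [7, 8], "B": [9, 10]})

# By default concat keeps each frame's own row labels: 0, 1, 2, 0, 1.
stacked = pd.concat([df1, df2])

# ignore_index=True renumbers the rows 0..4 instead.
renumbered = pd.concat([df1, df2], ignore_index=True)

print(stacked)
print(renumbered)
```

Renumbering is usually what is wanted when the original row labels carry no meaning, since duplicate labels can make later label-based lookups ambiguous.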

3] Matplotlib:

Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.

Matplotlib was created by John D. Hunter.

Matplotlib is open source and we can use it freely.

Matplotlib is mostly written in Python; a few segments are written in C, Objective-C and JavaScript for platform compatibility.

Functions:

plt.plot(): Plot a line chart.

plt.scatter(): Create a scatter plot.

plt.hist(): Plot a histogram.

Code snippets:

python

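import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [4, 5, 6]

plt.plot(x, y)      # line chart
plt.scatter(x, y)   # scatter points drawn on the same axes
plt.show()

plt.hist(x)         # histogram of x in a new figure
plt.show()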
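The three plotting functions can also be placed side by side in one figure. The sketch below is one possible arrangement, not the only one: it assumes the non-interactive "Agg" backend so that it runs without a display, and it writes the image into an in-memory buffer (the name buffer is hypothetical) instead of opening a window.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: no display is available, so draw off-screen
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [4, 5, 6]

# One row, three panels: each Axes object has the same plot/scatter/hist methods.
fig, axes = plt.subplots(1, 3, figsize=(9, 3))

axes[0].plot(x, y)
axes[0].set_title("plot")

axes[1].scatter(x, y)
axes[1].set_title("scatter")

# hist returns the bin counts and bin edges along with the drawn bars.
counts, bin_edges, _ = axes[2].hist(x, bins=3)
axes[2].set_title("hist")

fig.tight_layout()

# Save as PNG into memory; pass a file name instead to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

In an interactive session, the backend line can be dropped and plt.show() used in place of savefig.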

4] Scikit-learn:

Scikit-learn is a machine learning library for Python, providing simple and efficient tools for predictive data analysis.

Functions:

train_test_split(): Split arrays or matrices into random train and test subsets.

LinearRegression(): Create a linear regression model.

KMeans(): Perform K-Means clustering.

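Code snippets:

python

import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Placeholder data: 10 samples with one feature (replace with your own X and y)
X = np.arange(10).reshape(-1, 1)
y = 2 * X.ravel() + 1

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = LinearRegression()
model.fit(X_train, y_train)

kmeans = KMeans(n_clusters=3)
kmeans.fit(X)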
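To see what the fitted objects contain, the sketch below runs the same three steps on synthetic data. Everything about the data is an assumption made for illustration: a single feature, a true slope of 3 and intercept of 2, and a fixed random seed so that the run is repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumed): y = 3x + 2 plus a little Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=100)

# 80 samples for training, 20 for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit on the training part, then score on data the model has not seen.
model = LinearRegression()
model.fit(X_train, y_train)
print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("R^2 on test set:", model.score(X_test, y_test))

# K-Means needs no y: it groups the samples by feature values alone.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print("cluster centers:", kmeans.cluster_centers_.ravel())
```

The estimated slope and intercept should land close to the values used to generate the data, and each sample receives one of three cluster labels.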

5] SciPy:

SciPy is a library for scientific and technical computing, providing many user-friendly and efficient numerical routines.

Modules:

scipy.optimize: Optimization tools, for example minimize().

scipy.stats: Statistical functions and distributions, for example norm.

scipy.integrate: Integration functions, for example quad().

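Code snippets:

python

from scipy.optimize import minimize
from scipy.stats import norm
from scipy.integrate import quad

# Placeholder objective, starting point and integrand (replace with your own)
objective = lambda x: (x[0] - 1) ** 2
x0 = [0.0]
func = lambda t: t ** 2

# Find the x that minimizes objective, starting the search at x0
result = minimize(objective, x0)

# Density at x = 0 of a normal distribution with mean 1 (the second argument is loc)
p = norm.pdf(0, loc=1)

# quad returns the integral of func from 0 to 1 and an error estimate
integral, error = quad(func, 0, 1)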
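The three calls are easier to follow on problems whose answers are known in advance. The sketch below uses such cases; the objective function, the starting point and the integrand are all chosen here for illustration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize
from scipy.stats import norm


def objective(x):
    # A parabola with its minimum value 1 at x = 2 (chosen for illustration).
    return (x[0] - 2.0) ** 2 + 1.0


x0 = np.array([0.0])              # starting guess for the search
result = minimize(objective, x0)  # result.x is the minimizer, result.fun the minimum
print(result.x, result.fun)

# Standard normal density at 0, which equals 1 / sqrt(2 * pi).
p = norm.pdf(0)
print(p)

# Integral of t^2 from 0 to 1, which equals 1/3; quad also returns an error estimate.
integral, abs_error = quad(lambda t: t ** 2, 0, 1)
print(integral, abs_error)
```

minimize() returns a result object rather than a bare number, so the location of the minimum and the value there are read from its x and fun attributes.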

R libraries:-

● dplyr

● ggplot2

● tidyr

● Shiny

● Plotly

1] dplyr:-

dplyr is a popular data manipulation library in R, known for a small set of functions that combine naturally with the group_by() function.

Functions:

mutate(): Add new variables that are functions of existing variables.

select(): Select variables based on their names.

filter(): Pick/select cases based on their values.

Code snippets:

r

library(dplyr)

# 'data' is assumed to be a data frame with numeric columns var1 and var2

new_data <- data %>% mutate(new_var = var1 + var2)

selected_data <- data %>% select(var1, var2)

filtered_data <- data %>% filter(var1 > 10)

2] ggplot2:

ggplot2 is a widely used data visualization library in R, known for its high-quality graphs and functionality.

Functions:

ggplot(): Create a new ggplot.

geom_point(): Add a layer of points to the plot.

geom_bar(): Add a layer of bars to the plot.

Code snippets:

library(ggplot2)

p <- ggplot(data, aes(x = var1, y = var2)) + geom_point()

p <- ggplot(data, aes(x = var1, fill = var2)) + geom_bar()

3] tidyr:

tidyr is a library in the tidyverse collection of R packages, used for tidying or cleaning data.

Functions:

gather(): Gather columns into key-value pairs (superseded by pivot_longer() in current tidyr).

spread(): Spread a key-value pair across multiple columns (superseded by pivot_wider() in current tidyr).

separate(): Separate one column into multiple columns.

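Code snippets:

library(tidyr)

# 'data' is assumed to be a data frame with an 'id' column;
# the separate() example also assumes a text column named 'col'

gathered_data <- data %>% gather(key, value, -id)

# spread() reverses gather(), so it is applied to the gathered data
spread_data <- gathered_data %>% spread(key, value)

separated_data <- data %>% separate(col, into = c("new_col1", "new_col2"), sep = "-")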

4] Shiny:

Shiny is an R package used for developing interactive web applications and dashboards.

Functions:

fluidPage(): Create a page with fluid layout.

renderPlot(): Render a plot.

reactive(): Create a reactive expression.

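Code snippets:

library(shiny)

# 'data' is assumed to be an existing data frame or numeric vector to plot

ui <- fluidPage(
  plotOutput("plot1")
)

server <- function(input, output) {
  output$plot1 <- renderPlot({
    plot(data)
  })
}

# Combine the UI and the server logic and start the app
shinyApp(ui = ui, server = server)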

5] Plotly:

plotly is an R package for creating interactive web-based graphs via the plotly.js JavaScript library.

Functions:

plot_ly(): Create a plotly plot.

add_trace(): Add trace(s) to a plotly visualization.

layout(): Modify the layout of a plotly visualization.

Code snippets:

library(plotly)

p <- plot_ly(data, x = ~var1, y = ~var2, type = 'scatter', mode = 'markers')

p <- add_trace(p, y = ~var3, mode = 'lines')

p <- layout(p, title = "A Plotly Plot")
