

Dav Lab

The document discusses popular Python and R libraries used for data analytics and visualization. It introduces NumPy, Pandas, Matplotlib, Scikit-learn, SciPy, Dplyr, ggplot2, tidyr, Shiny, and Plotly libraries and provides examples of key functions and code snippets for working with arrays, dataframes, plotting, machine learning, and more.

Uploaded by

Vidhi Artani

Data analytics libraries for Python and R

Python libraries:-

● NumPy

● Pandas

● Matplotlib

● Scikit-learn

● SciPy

1] NumPy:-

NumPy (Numerical Python) is a Python library for working with arrays.

It also provides functions for linear algebra, Fourier transforms, and matrices.

NumPy was created in 2005 by Travis Oliphant. It is an open source project and can be used freely.

Python lists can serve the purpose of arrays, but they are slow to process.

NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.

The array object in NumPy is called ndarray. It comes with many supporting functions that make it easy to work with.

Arrays are used very frequently in data science, where speed and resources are very important.
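The speed claim above can be checked with a small timing sketch. The array size and repeat count are arbitrary assumptions, and the measured ratio depends on the machine and the operation, so treat the output as indicative only.

```python
import timeit
import numpy as np

n = 100_000  # arbitrary size chosen for illustration
values = list(range(n))
array = np.arange(n)

# Same computation two ways: double every element.
doubled_list = [v * 2 for v in values]   # Python-level loop
doubled_array = array * 2                # single vectorized operation

# Time each approach; the numbers vary by machine and NumPy version.
list_time = timeit.timeit(lambda: [v * 2 for v in values], number=20)
array_time = timeit.timeit(lambda: array * 2, number=20)
print(f"list: {list_time:.4f}s  ndarray: {array_time:.4f}s")
```

Both approaches produce the same values; only the time taken differs.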

Functions:

np.array(): Create an ndarray from a list or other sequence.

np.zeros(): Return a new array of given shape and type, filled with zeros.

np.ones(): Return a new array of given shape and type, filled with ones.

Code snippets:

python

import numpy as np

# Create an ndarray from a Python list
a = np.array([1, 2, 3])

# 2x3 array filled with zeros
b = np.zeros((2, 3))

# 2x3 array filled with ones
c = np.ones((2, 3))
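As a minimal sketch of the linear algebra and Fourier transform support mentioned earlier, the following uses np.linalg.solve and np.fft.fft. The matrix and signal values are assumptions chosen so the results are easy to verify by hand.

```python
import numpy as np

# Shape and dtype describe the layout of an ndarray.
grid = np.zeros((2, 3))
print(grid.shape, grid.dtype)

# Linear algebra: solve the system M @ x = v.
# M and v are illustrative values, not from the handout.
M = np.array([[2.0, 0.0], [0.0, 4.0]])
v = np.array([2.0, 8.0])
x = np.linalg.solve(M, v)   # 2*x0 = 2 and 4*x1 = 8

# Fourier transform of a constant signal: everything lands in the first bin.
spectrum = np.fft.fft(np.ones(4))
```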

2] Pandas:-

Pandas is a library for data manipulation and analysis. It provides data structures and operations for manipulating numerical tables and time series.

Pandas is widely used in data science because it works well with the other libraries in that ecosystem. It is built on top of NumPy, so many NumPy structures are used or mirrored in Pandas. Data prepared with Pandas is often passed to Matplotlib plotting functions, SciPy statistical analysis, and Scikit-learn machine learning algorithms. Some of the most commonly used Pandas functions are listed below.

Functions:-

pd.read_csv(): Read a CSV file into a DataFrame.

pd.DataFrame(): Create a DataFrame.

pd.concat(): Concatenate DataFrames.

Code snippets:

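python

import pandas as pd

# Read a CSV file into a DataFrame ('data.csv' must exist in the working directory)
df = pd.read_csv('data.csv')

# Build two DataFrames from dictionaries
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [7, 8], 'B': [9, 10]})

# Stack df1 and df2 row-wise
df_concat = pd.concat([df1, df2])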
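The three functions can also be tried without a file on disk. The sketch below reads hypothetical CSV text from memory through io.StringIO; the column names and values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content, held in memory so no file is needed.
csv_text = "name,score\nasha,82\nravi,67\n"
scores_a = pd.read_csv(io.StringIO(csv_text))

# A second frame with the same columns, built from a dictionary.
scores_b = pd.DataFrame({"name": ["meera"], "score": [91]})

# ignore_index=True renumbers the rows 0..n-1 after stacking.
combined = pd.concat([scores_a, scores_b], ignore_index=True)

# Basic inspection and a simple summary statistic.
print(combined.head())
mean_score = combined["score"].mean()
```

Without ignore_index=True the combined frame keeps each input's original row labels, so labels can repeat.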

3] Matplotlib:

Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.

Matplotlib was created by John D. Hunter.

Matplotlib is open source and can be used freely.

Matplotlib is mostly written in Python; a few segments are written in C, Objective-C, and JavaScript for platform compatibility.

Functions:

plt.plot(): Plot a line chart.

plt.scatter(): Create a scatter plot.

plt.hist(): Plot a histogram.

Code snippets:

python

import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [4, 5, 6]

# Line chart
plt.plot(x, y)
plt.show()

# Scatter plot
plt.scatter(x, y)
plt.show()

# Histogram of the x values
plt.hist(x)
plt.show()
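The same three plot types can be placed side by side in one figure using the object-oriented interface. This is a sketch under two assumptions: the Agg backend is selected so the code runs without a display, and plots.png is a hypothetical output file name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [4, 5, 6]

# One figure with three side-by-side axes, one per plot type.
fig, axes = plt.subplots(1, 3, figsize=(9, 3))
axes[0].plot(x, y)
axes[0].set_title("Line")
axes[1].scatter(x, y)
axes[1].set_title("Scatter")
axes[2].hist(x, bins=3)
axes[2].set_title("Histogram")

fig.tight_layout()
fig.savefig("plots.png")  # hypothetical output file name
```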
4] Scikit-learn:

Scikit-learn is a machine learning library for Python that provides simple and efficient tools for predictive data analysis.

Functions:

train_test_split(): Split arrays or matrices into random train and test subsets.

LinearRegression(): Create a linear regression model.

KMeans(): Perform K-Means clustering.

Code snippets:

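python

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Sample data (illustrative): 10 rows, 2 features, one target
X = np.arange(20).reshape(10, 2)
y = X[:, 0] + X[:, 1]

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Fit a linear regression model on the training rows
model = LinearRegression()
model.fit(X_train, y_train)

# Group all rows into 3 clusters
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)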
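After fitting, a model is usually checked on the held-out rows. The sketch below does this on synthetic data that follows y = 3x + 2 exactly, an assumption made so the fitted coefficients are known in advance. The random_state and n_init arguments are added only for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3*x + 2 exactly.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() + 2

# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)      # R^2 on the held-out rows
predictions = model.predict(X_test)   # one prediction per test row

# n_init is set explicitly because its default changed across scikit-learn versions.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_               # cluster index assigned to each row
```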

5] SciPy:

SciPy is a library for scientific and technical computing that provides many user-friendly and efficient numerical routines.

Modules:

scipy.optimize: Optimization tools.

scipy.stats: Statistical functions.

scipy.integrate: Integration functions.

Code snippets:

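python

from scipy.optimize import minimize
from scipy.stats import norm
from scipy.integrate import quad

# Illustrative objective: (x - 2)^2, minimised at x = 2
def objective(x):
    return (x[0] - 2) ** 2

# Illustrative integrand
def func(x):
    return x ** 2

# Minimise objective starting from the initial guess x0
x0 = [0.0]
result = minimize(objective, x0)

# Density at x = 0 of a normal distribution with mean 1 (default scale 1)
p = norm.pdf(0, 1)

# Integrate func from 0 to 1; returns (value, error estimate)
integral, error = quad(func, 0, 1)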
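The return values of these routines carry more than a single number. The sketch below shows how they might be read; the objective function and the 1.96 interval are assumptions chosen because their answers are well known.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical objective with a known minimum at x = 3.
def shifted_square(x):
    return (x[0] - 3.0) ** 2

opt = minimize(shifted_square, x0=[0.0])
best_x = opt.x[0]        # location of the minimum found
converged = opt.success  # True if the solver reports convergence

# quad accepts infinite limits; a probability density integrates to 1.
area, abs_err = quad(norm.pdf, -np.inf, np.inf)

# Probability that a standard normal value falls within +/- 1.96.
coverage = norm.cdf(1.96) - norm.cdf(-1.96)
```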

R libraries:-

● dplyr

● ggplot2

● tidyr

● Shiny

● Plotly

1] dplyr:-

dplyr is a popular data manipulation library in R, known for a small set of verb functions that combine naturally with group_by() to perform operations by group.

Functions:

mutate(): Add new variables that are functions of existing variables.

select(): Select variables based on their names.

filter(): Pick/select cases based on their values.

Code snippets:
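r

library(dplyr)

# Sample data frame (illustrative)
data <- data.frame(var1 = c(5, 12, 20), var2 = c(1, 2, 3))

# Add a column computed from existing columns
new_data <- data %>% mutate(new_var = var1 + var2)

# Keep only the named columns
selected_data <- data %>% select(var1, var2)

# Keep only rows where var1 exceeds 10
filtered_data <- data %>% filter(var1 > 10)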
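For readers working in the Python half of this handout, the three dplyr verbs have close Pandas counterparts. The sketch below is an illustrative mapping rather than part of dplyr; the column names mirror the R snippet and the sample values are assumptions.

```python
import pandas as pd

# Sample values are assumptions for illustration.
data = pd.DataFrame({"var1": [5, 12, 20], "var2": [1, 2, 3]})

# mutate(new_var = var1 + var2)  ->  assign()
new_data = data.assign(new_var=data["var1"] + data["var2"])

# select(var1, var2)  ->  column indexing with a list of names
selected_data = new_data[["var1", "var2"]]

# filter(var1 > 10)  ->  boolean mask (query("var1 > 10") is an alternative)
filtered_data = data[data["var1"] > 10]
```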

2] ggplot2:

ggplot2 is a widely used data visualization library in R, known for its high-quality graphs and functionality.

Functions:

ggplot(): Create a new ggplot.

geom_point(): Add a layer of points to the plot.

geom_bar(): Add a layer of bars to the plot.

Code snippets:

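r

library(ggplot2)

# data is assumed to be a data frame with columns var1 and var2

# Scatter plot of var2 against var1
p <- ggplot(data, aes(x = var1, y = var2)) + geom_point()

# Bar chart of row counts per var1 value, filled by var2 (works best when var2 is categorical)
p <- ggplot(data, aes(x = var1, fill = var2)) + geom_bar()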

3] tidyr:

tidyr is a library in the tidyverse of R packages, used for tidying or cleaning data.

Functions:

gather(): Gather columns into key-value pairs (superseded by pivot_longer() since tidyr 1.0.0).

spread(): Spread a key-value pair across multiple columns (superseded by pivot_wider() since tidyr 1.0.0).

separate(): Separate one column into multiple columns.


Code snippets:

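r

library(tidyr)

# Reshape every column except id into key/value rows
gathered_data <- data %>% gather(key, value, -id)

# Turn the key/value rows back into one column per key
spread_data <- gathered_data %>% spread(key, value)

# Split the column named col at "-" into two new columns
separated_data <- data %>% separate(col, into = c("new_col1", "new_col2"), sep = "-")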

4] Shiny:

Shiny is an R package used for developing interactive web applications and dashboards.

Functions:

fluidPage(): Create a page with fluid layout.

renderPlot(): Render a plot.

reactive(): Create a reactive expression.

Code snippets:

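r

library(shiny)

# UI: a fluid page with one plot output
ui <- fluidPage(
  plotOutput("plot1")
)

# Server: draw the plot (data is assumed to be a data frame defined beforehand)
server <- function(input, output) {
  output$plot1 <- renderPlot({
    plot(data)
  })
}

# Launch the app
shinyApp(ui = ui, server = server)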

5] Plotly:

plotly is an R package for creating interactive web-based graphs via the plotly.js JavaScript library.

Functions:

plot_ly(): Create a plotly plot.

add_trace(): Add trace(s) to a plotly visualization.

layout(): Modify the layout of a plotly visualization.

Code snippets:

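r

library(plotly)

# data is assumed to be a data frame with columns var1, var2 and var3

# Scatter plot of var2 against var1
p <- plot_ly(data, x = ~var1, y = ~var2, type = 'scatter', mode = 'markers')

# Overlay var3 as a line
p <- add_trace(p, y = ~var3, mode = 'lines')

# Set the plot title
p <- layout(p, title = "A Plotly Plot")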
