numpy1
Content
Introduction to DAV
Python Lists vs Numpy Array
Importing Numpy
Why use Numpy?
Dimension & Shape
Type Conversion
Indexing
Slicing
NPS use case
Numpy
Pandas
Matplotlib & Seaborn
2. DAV-2: Probability Statistics
3. DAV-3: Hypothesis Testing
Because of this heterogeneity, the data elements of a Python list are not stored together in memory (RAM).
On the other hand, Numpy only stores homogeneous data, i.e. a numpy array cannot contain mixed data types.
It will either
Speed
In fact,
With Numpy, although we write our code in Python, the core operations run behind the scenes in compiled C code, which makes them faster.
Because of this, a Numpy Array will be significantly faster than a Python List in performing the same operation.
This is very important to us because, in data science, we deal with huge amounts of data.
Properties
In-built Functions
Slicing
import numpy as np
Note:
In this notebook, numpy is already installed because we are working on Google Colab.
However, when working in an environment that does not have it installed, you'll have to install it before the first use.
This can be done with the command: !pip install numpy
type(a)
list
The basic approach here would be to iterate over the list and square each element.
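A minimal sketch of that basic approach, assuming the list a holds the values 1 to 5 (the same values shown in the array output further on):

```python
# Assumed contents of the list `a`; the values match the array printed later.
a = [1, 2, 3, 4, 5]

# Iterate over the list and square each element with a list comprehension.
squares = [x ** 2 for x in a]
print(squares)  # [1, 4, 9, 16, 25]
```

Note that a ** 2 on the plain list itself would raise a TypeError, which is why the explicit loop is needed here.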
We can convert any list a into a Numpy array using the array() function.
b = np.array(a)
b
array([1, 2, 3, 4, 5])
type(b)
numpy.ndarray
Now, how can we get the square of each element in the same Numpy array?
b**2
But is the clean syntax and ease in writing the only benefit we are getting here?
l = range(1000000)
343 ms ± 21.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
It took roughly 343 ms per loop to iterate over and square all elements from 0 to 999,999.
l = np.array(range(1000000))
%timeit l**2
778 µs ± 100 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Notice that the numpy operation took only about 778 µs per loop, several hundred times faster than the list version.
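Outside a notebook, where the %timeit magic is not available, a similar comparison can be sketched with the standard timeit module. The names py_list and np_arr and the smaller size n are illustrative assumptions, and the exact timings will differ from machine to machine:

```python
import timeit

import numpy as np

n = 100_000  # smaller than the 1,000,000 used above, to keep the run short
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # int64 so the squares cannot overflow

# Time 10 repetitions of each approach.
list_time = timeit.timeit(lambda: [x ** 2 for x in py_list], number=10)
numpy_time = timeit.timeit(lambda: np_arr ** 2, number=10)

print(f"list comprehension: {list_time:.4f} s for 10 runs")
print(f"numpy array:        {numpy_time:.4f} s for 10 runs")
```

Both approaches produce the same squared values; only the time taken differs.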
arr1 = np.array(range(1000000))
arr1.ndim
Numpy arrays have another property called shape that tells us the number of elements along each dimension.
arr1.shape
(1000000,)
This means that the array arr1 has 1000000 elements in a single dimension.
[[ 1 2 3]
[ 4 5 6]
[10 11 12]]
What do you think will be the shape & dimension of this array?
arr2.ndim
arr2.shape
(3, 3)
ndim specifies the number of dimensions of the array i.e. 1D (1), 2D (2), 3D (3) and so on.
shape returns the exact shape in all dimensions, that is (3,3) which implies 3 in axis 0 and 3 in axis 1.
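As a small illustration, the 3 x 3 array printed above can be rebuilt from a nested list and inspected. The name grid is a placeholder chosen for this sketch:

```python
import numpy as np

# A 2D array built from a list of three rows (same values as printed above).
grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [10, 11, 12]])

print(grid.ndim)   # 2  -> two dimensions (rows and columns)
print(grid.shape)  # (3, 3) -> 3 elements along axis 0, 3 along axis 1
```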
np.arange()
We can pass a starting point, an ending point (not included in the array) and a step size.
Syntax:
arr2 = np.arange(1, 5)
arr2
array([1, 2, 3, 4])
arr2_step = np.arange(1, 5, 2)
arr2_step
array([1, 3])
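A short sketch of the other calling forms, shown next to Python's built-in range for comparison. The variable same_as_range is only there for illustration:

```python
import numpy as np

# With a single argument, the start defaults to 0 and the step to 1.
print(np.arange(5))         # [0 1 2 3 4]

# start=1, stop=10 (excluded), step=3
print(np.arange(1, 10, 3))  # [1 4 7]

# For integer arguments, the values match what range() produces.
same_as_range = np.array_equal(np.arange(1, 10, 3), list(range(1, 10, 3)))
print(same_as_range)        # True
```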
array([1, 2, 3, 4])
Similarly, what will happen when we run the following code? Will it give an error?
array([1, 2, 3, 4])
arr5 = np.array([1, 2, 3, 4], dtype="float")
arr5
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-26-bdb627c3c07e> in <cell line: 1>()
----> 1 np.array(["Shivank", "Bipin", "Ritwik"], dtype=float)
Since it is not possible to convert alphabetic strings to floats, Numpy raises a ValueError.
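The behaviour can be sketched in one place as follows. The try/except wrapper and the names numeric_strings and conversion_failed are additions for illustration only:

```python
import numpy as np

# Integers convert to floats without any problem.
arr5 = np.array([1, 2, 3, 4], dtype="float")
print(arr5)  # [1. 2. 3. 4.]

# Strings that look like numbers can be converted as well.
numeric_strings = np.array(["1.5", "2"], dtype=float)
print(numeric_strings)  # [1.5 2. ]

# Alphabetic strings cannot, so the conversion raises a ValueError.
conversion_failed = False
try:
    np.array(["Shivank", "Bipin", "Ritwik"], dtype=float)
except ValueError as err:
    conversion_failed = True
    print("Conversion failed:", err)
```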
We can also convert the data type with the astype() method.
arr = arr.astype('float64')
print(arr)
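A self-contained sketch of astype(), assuming arr starts out as a small integer array (the original cell that defines arr is not shown here). The name arr_float is illustrative:

```python
import numpy as np

arr = np.array([1, 2, 3, 4])         # assumed starting values, integer dtype
arr_float = arr.astype("float64")    # astype() returns a new, converted array

print(arr_float)        # [1. 2. 3. 4.]
print(arr_float.dtype)  # float64
print(arr.dtype.kind)   # i  -> the original array is still an integer array
```

Because astype() returns a copy, the result has to be assigned back (as in arr = arr.astype('float64')) if the original name should refer to the converted array.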
Indexing
Similar to Python lists
m1 = np.arange(12)
m1
11
m1 = np.array([100,200,300,400,500,600])
m1[[2,3,4,1,2,2]]
Did you notice how a single index can be repeated multiple times when giving a list of indices?
Note:
If you want to extract multiple indices, you need to use two sets of square brackets [[ ]]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-34-0ec34089038e> in <cell line: 1>()
----> 1 m1[2,3,4,1,2,2]
IndexError: too many indices for array: array is 1-dimensional, but 6 were indexed
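The two cases can be compared side by side in a short sketch. The try/except wrapper and the names picked and raised_index_error are illustrative additions:

```python
import numpy as np

m1 = np.array([100, 200, 300, 400, 500, 600])

# A list of indices inside the brackets picks those positions, repeats allowed.
picked = m1[[2, 3, 4, 1, 2, 2]]
print(picked)  # [300 400 500 200 300 300]

# Without the inner brackets, each number is treated as an index along a
# separate dimension, which fails for a 1-dimensional array.
raised_index_error = False
try:
    m1[2, 3, 4, 1, 2, 2]
except IndexError as err:
    raised_index_error = True
    print("IndexError:", err)
```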
Slicing
Similar to Python lists
m1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
m1
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
m1[:5]
array([1, 2, 3, 4, 5])
m1[-5:-1]
array([6, 7, 8, 9])
array([], dtype=int64)
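A few more slices on the same array, as an illustrative sketch. The specific slices are chosen here as examples and are not necessarily the ones used in the original cells; one of them shows a way an empty array can arise:

```python
import numpy as np

m1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Every second element, using the step part of start:stop:step.
print(m1[::2])        # [1 3 5 7 9]

# With the default step of +1, a start that lies after the stop gives nothing.
print(m1[-1:-5])      # []

# A negative step walks backwards, so the same bounds now return elements.
print(m1[-1:-5:-1])   # [10  9  8  7]
```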
m1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
m1 < 6
array([ True, True, True, True, True, False, False, False, False,
False])
m1[[True, True, True, True, True, False, False, False, False, False]]
array([1, 2, 3, 4, 5])
Now, let's use this to filter or mask values from our array.
m1[m1 < 6]
array([1, 2, 3, 4, 5])
m1[m1%2 == 0]
array([ 2, 4, 6, 8, 10])
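A mask can also be stored in a variable and reused. The sketch below is illustrative; the name mask and the use of ~ to invert it are additions that do not appear in the cells above:

```python
import numpy as np

m1 = np.arange(1, 11)   # 1 to 10, the same values as above

mask = m1 < 6           # boolean array, one True/False per element
print(mask.dtype)       # bool
print(mask.sum())       # 5  -> True counts as 1, so this counts the matches

print(m1[mask])         # [1 2 3 4 5]
print(m1[~mask])        # [ 6  7  8  9 10]  (~ flips True and False)
```

Counting matches this way is handy whenever only the number of elements that satisfy a condition is needed.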
You've been asked to analyze user survey data and report NPS to the management.
Have you all seen that every month, you get a survey form from Scaler?
This form asks you to rate, as a numerical score, how much you like Scaler's services.
This is known as the Likelihood to Recommend Survey.
It is widely used by different companies and service providers to evaluate their performance and customer satisfaction.
Range of NPS
NPS helps a brand in gauging its brand value and sentiment in the market.
Promoters are highly likely to recommend your product or service, and hence bring in more business.
Detractors, on the other hand, are likely to recommend against using your product or service, and hence bring the business down.
These insights can help a business make customer-oriented decisions and improve its product.
Even at Scaler, every month, we randomly reach out to our learners over a call, and try to understand,
Based on the feedback received, we sometimes get really good insights and act on them.
Dataset: https://drive.google.com/file/d/1c0ClC8SrPwJq5rrkyMKyPn80nyHcFikK/view?usp=sharing
type(score)
numpy.ndarray
score[:5]
score.shape
(1167,)
% Promoters
% Detractors
In order to calculate % Promoters and % Detractors, we need to get the counts of promoters and detractors.
# Number of detractors -
num_detractors = len(detractors)
num_detractors
332
# Number of promoters -
num_promoters = len(promoters)
num_promoters
609
total = len(score)
total
1167
# % of detractors -
28.449014567266495
# % of promoters -
52.185089974293064
23.73607540702657
# Rounding off to 2 decimal places -
np.round(nps, 2)
23.74
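The whole calculation can be sketched end to end as below. This assumes the standard NPS convention (scores of 9 or 10 are promoters, scores of 0 to 6 are detractors) and the usual formula NPS = % Promoters - % Detractors. The helper nps_from_scores and the sample array are hypothetical and made up purely for illustration; they are not the survey dataset:

```python
import numpy as np


def nps_from_scores(scores):
    """Hypothetical helper: NPS = % promoters - % detractors."""
    scores = np.asarray(scores)
    promoters = scores[scores >= 9]    # assumed convention: 9-10
    detractors = scores[scores <= 6]   # assumed convention: 0-6
    total = len(scores)
    return (len(promoters) - len(detractors)) / total * 100


# Made-up sample: 5 promoters, 3 detractors, 2 passives.
sample = np.array([10, 9, 9, 8, 7, 6, 3, 10, 5, 9])
sample_nps = nps_from_scores(sample)
print(sample_nps)  # 20.0

# Cross-check with the counts reported above: 609 promoters,
# 332 detractors and 1167 responses in total.
nps_from_counts = (609 - 332) / 1167 * 100
print(np.round(nps_from_counts, 2))  # 23.74
```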