CH 3 2
Uploaded by
mr explorer
Python for Data Science: Getting Your Hands Dirty with Data

* A visual representation of how this image is stored as a NumPy array is as follows:

[Figure: the image stored as a NumPy array, with the first dimension labeled]

    import skimage.io
    # read image
    image = skimage.io.imread(fname="wtable.jpg")

    # display image (skimage.viewer is only available in older scikit-image releases)
    import skimage.viewer
    viewer = skimage.viewer.ImageViewer(image)
    viewer.show()

Managing Data from Relational Databases

* Relational databases accomplish both the manipulation and the data retrieval objectives. SQL performs all sorts of management tasks in a relational database and retrieves data as needed.
* A relational database consists of one or more tables of information. The rows in a table are called records and the columns are called fields or attributes. A database that contains two or more related tables is called a relational database.
* When working on a data science project, you may want to connect Python scripts with databases. A library known as SQLAlchemy bridges the gap between SQL and Python.
* The first thing to do is to create an engine, which is an object used to manage the connection to the database.

    from sqlalchemy import create_engine
    from sqlalchemy.orm import scoped_session, sessionmaker
    engine = create_engine('postgresql://login:password@localhost:5432/flight')

* The general format to create an engine is:

    create_engine("postgresql://login:password@localhost:5432/name_database")

* The sqlalchemy library provides support for SQL databases such as SQLite, MySQL, PostgreSQL and SQL Server.

TECHNICAL PUBLICATIONS® - An up thrust for knowledge

Interacting with Data from NoSQL Databases

* Not only SQL (NoSQL) databases are used in large data storage scenarios in which the relational model can become overly complex.
* NoSQL databases store and retrieve data in a very different way from their relational counterparts.
* The first thing we need to do in order to establish a connection is to import the MongoClient class. We'll use this to communicate with the running database instance. Use the following code to do so:

    from pymongo import MongoClient
    client = MongoClient()

Conditioning Your Data: Juggling between NumPy and Pandas

1. NumPy

* NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays.
* NumPy is the fundamental package needed for scientific computing with Python. It contains:
  a) A powerful N-dimensional array object
  b) Basic linear algebra functions
  c) Basic Fourier transforms
  d) Sophisticated random number capabilities
  e) Tools for integrating Fortran code
  f) Tools for integrating C/C++ code
* NumPy is an extension package to Python for array programming. It provides "closer to the hardware" optimization, which in Python means a C implementation.

2. Pandas

* Pandas is a high-level data manipulation tool developed by Wes McKinney. It is built on the NumPy package and its key data structure is called the DataFrame.
* DataFrames allow you to store and manipulate tabular data in rows of observations and columns of variables.
* Because Pandas is built on top of NumPy, a lot of the structure of NumPy is used or replicated in Pandas. Data in pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.
* Pandas is the library for data manipulation and analysis, and it is usually the starting point for your data science tasks. It allows you to read and write data from and to multiple sources, process missing data, align your data, reshape it, merge and join it with other data, search it, group it and slice it.
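The hand-off between the two libraries described above can be sketched as follows. This is a minimal illustration, not the book's own example: the array values, the column names `x` and `y`, and the choice to fill the missing value with the column mean are all assumptions made for demonstration.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements: a 2-D NumPy array with one missing value.
values = np.array([[1.0, 10.0],
                   [2.0, np.nan],
                   [3.0, 30.0]])

# Wrap the array in a DataFrame: rows are observations, columns are variables.
frame = pd.DataFrame(values, columns=["x", "y"])

# Process the missing value by filling it with the mean of its column.
filled = frame.fillna(frame.mean())

# Go back to a plain NumPy array when array-style computation is needed.
as_array = filled.to_numpy()

print(filled)
print(as_array.shape)
```

The DataFrame adds labels and missing-data handling on top of the array, while `to_numpy()` returns the underlying numbers for code that expects a NumPy array.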
Figuring out what's in Your Data

* Duplicate data creates problems for a data science project. If the database is large, processing duplicate data wastes time.
* Finding duplicates is important because it saves time and space and avoids false results. You can remove duplicate data easily and efficiently using the drop_duplicates() function in pandas.
* Create a DataFrame with duplicate data:

    import pandas as pd
    # two last_name entries are illegible in the source and are left as '...'
    raw_data = {'first_name': ['rupali', 'rupali', 'rakshita', 'sangeeta', 'mahesh', 'vilas'],
                'last_name': ['dhotre', 'dhotre', '...', '...', 'jadhav', 'bagad'],
                'age': [12, 12, 1111111, 36, 24, 73],
                'TestScore1': [4, 4, 4, 31, 2, 3],
                'TestScore2': [25, 25, 25, 57, 62, 70]}
    # the column names must match the dictionary keys
    df = pd.DataFrame(raw_data, columns=['first_name', 'last_name', 'age', 'TestScore1', 'TestScore2'])

* Drop duplicates:

    df.drop_duplicates()

* Drop duplicates in the first name column, but take the last observation in the duplicated set:

    df.drop_duplicates(['first_name'], keep='last')

Creating a Data Map and Data Plan

* A data map gives an overview of the dataset. It is used for finding potential problems in the data, such as redundant variables, possible errors, missing values and variable transformations.
* Try creating a Python script that converts a Python dictionary into a Pandas DataFrame, then print the DataFrame to the screen.

    import pandas as pd
    # each hill maps to (height in metres, latitude, longitude)
    scottish_hills = {'Ben Nevis': (1345, 56.79685, -5.003508),
                      'Ben Macdui': (1309, 57.070453, -3.668262),
                      'Braeriach': (1296, 57.078628, -3.728024),
                      'Cairn Toul': (1291, 57.054611, -3.71042),
                      'Sgòr an Lochain Uaine': (1258, 57.057999, -3.725416)}
    dataframe = pd.DataFrame(scottish_hills)
    print(dataframe)

Manipulating & Creating Categorical Variables

* A categorical variable is one that takes a specific value from a limited selection of values. The number of values is usually fixed.
* Categorical features can only take on a limited, and usually fixed, number of possible values. For example, if a dataset is about information related to users, then you will typically find features like country, gender, age group, etc. Alternatively, if the data you are working with is related to products, you will find features like product type, manufacturer, seller and so on.
* The following method creates a categorical variable and then uses it to check whether some data falls within the specified limits.

    import pandas as pd
    cycle_colors = pd.Series(['Blue', 'Red', 'Green'], dtype='category')
    cycle_data = pd.Series(pd.Categorical(['Yellow', 'Green', 'Red', 'Blue', 'Purple'],
                                          categories=cycle_colors, ordered=False))
    # values outside the allowed categories become missing (NaN)
    find_entries = pd.isnull(cycle_data)
    print(cycle_colors)
    print()
    print(cycle_data)
    print()
    print(find_entries[find_entries == True])

* Here cycle_colors is a categorical variable. It contains the values Blue, Red and Green as colors.

Renaming Levels & Combining Level

* Data frame variable names are typically used many times when wrangling data. Good names for these variables make it easier to write and read wrangling programs.
* Categorical data has a categories property and an ordered property, which list the possible values and whether the ordering matters or not.
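The two properties named above can be inspected directly, and levels can be merged once they are known. The sketch below is illustrative only: the `sizes` data and the level names are hypothetical, and combining levels by mapping old values to new ones is one possible approach assumed here, not a method taken from the text.

```python
import pandas as pd

# Hypothetical ordered categorical: garment sizes.
sizes = pd.Series(pd.Categorical(["small", "large", "medium", "small"],
                                 categories=["small", "medium", "large"],
                                 ordered=True))

# The categories property lists the possible values.
categories = list(sizes.cat.categories)

# The ordered property says whether the ordering matters.
is_ordered = sizes.cat.ordered

# Combine 'medium' and 'large' into a single level by mapping the values,
# then convert the result back to a categorical.
merge_map = {"small": "small", "medium": "big", "large": "big"}
combined = sizes.astype(str).map(merge_map).astype("category")

print(categories, is_ordered)
print(combined)
```

After the mapping, the combined series has only two levels, which can be useful when some categories are too rare to analyse separately.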
* Renaming categories is done by assigning new values to the Series.cat.categories property or by using the Categorical.rename_categories() method:

    In [42]: s = pd.Series(["a", "b", "c", "a"], dtype="category")

    In [43]: s
    Out[43]:
    0    a
    1    b
    2    c
    3    a
    dtype: category
    Categories (3, object): [a, b, c]

    In [44]: s.cat.categories = ["Group %s" % g for g in s.cat.categories]

    In [45]: s
    Out[45]:
    0    Group a
    1    Group b
    2    Group c
    3    Group a
    dtype: category
    Categories (3, object): [Group a, Group b, Group c]

    In [46]: s.cat.rename_categories([1, 2, 3])
    Out[46]:
    0    1
    1    2
    2    3
    3    1
    dtype: category
    Categories (3, int64): [1, 2, 3]

Dealing with Dates and Times Values

* Dates are often provided in different formats and must be converted into single-format DateTime objects before analysis.
* Python provides two methods of formatting date and time:
  1. str(): turns a datetime value into a string without any formatting.
  2. strftime(): defines how the user wants the datetime value to appear after conversion.

1. Using pandas.to_datetime() with a date

    import pandas as pd
    # input in dd.mm.yyyy format
    date = ['21.07.2020']
    # output in yyyy-mm-dd format
    print(pd.to_datetime(date, dayfirst=True))

2. Using pandas.to_datetime() with a date and time

    import pandas as pd
    # date (dd.mm.yyyy) and time (HH:MM:SS AM/PM)
    date = ['21.07.2020 11:31:01 AM']
    # output in yyyy-mm-dd HH:MM:SS
    print(pd.to_datetime(date, dayfirst=True))

* We can convert a string to a datetime using the strptime() function. This function is available in the datetime and time modules to parse a string to datetime and time objects respectively.
* Python strptime() is a class method of the datetime class.
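Its syntax is:

    datetime.strptime(date_string, format)

* Both arguments are mandatory and must be strings.

    import datetime
    format = "%a %b %d %H:%M:%S %Y"
    today = datetime.datetime.today()
    print("ISO     :", today)
    # format the datetime as a string
    s = today.strftime(format)
    print("strftime:", s)
    # parse the string back into a datetime
    d = datetime.datetime.strptime(s, format)
    print("strptime:", d.strftime(format))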
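The idea of bringing dates from different formats into a single format can be sketched with strptime() and strftime() alone. The list of candidate formats, the sample strings and the helper name `parse_date` are hypothetical choices for illustration; real data may need a different set of formats.

```python
from datetime import datetime

# Hypothetical list of formats that the incoming data might use.
CANDIDATE_FORMATS = ["%d.%m.%Y", "%Y-%m-%d", "%d/%m/%Y %I:%M:%S %p"]


def parse_date(text):
    """Try each candidate format in turn and return the first successful parse."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"unrecognised date format: {text!r}")


raw_dates = ["21.07.2020", "2020-07-22", "23/07/2020 11:31:01 AM"]

# Parse every string into a datetime object, then write all of them
# back out in one common format.
parsed = [parse_date(d) for d in raw_dates]
normalised = [d.strftime("%Y-%m-%d %H:%M:%S") for d in parsed]

print(normalised)
```

Trying formats in a fixed order keeps the behaviour predictable, but ambiguous inputs (for example day-first versus month-first) still need a decision by whoever knows the data source.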
    >>> from sklearn.feature_extraction.text import TfidfTransformer
    >>> from sklearn.feature_extraction.text import CountVectorizer
    >>> from sklearn.pipeline import Pipeline
    >>> import numpy as np
    >>> corpus = ['this is the first document',
    ...           'this document is the second document',
    ...           'and this is the third one',
    ...           'is this the first document']
    >>> vocabulary = ['this', 'document', 'first', 'is', 'second', 'the',
    ...               'and', 'one']
    >>> pipe = Pipeline([('count', CountVectorizer(vocabulary=vocabulary)),
    ...                  ('tfid', TfidfTransformer())]).fit(corpus)
    >>> pipe['count'].transform(corpus).toarray()
    array([[1, 1, 1, 1, 0, 1, 0, 0],
           [1, 2, 0, 1, 1, 1, 0, 0],
           [1, 0, 0, 1, 0, 1, 1, 1],
           [1, 1, 1, 1, 0, 1, 0, 0]])
    >>> pipe['tfid'].idf_
    array([1.        , 1.22314355, 1.51082562, 1.        , 1.91629073,
           1.        , 1.91629073, 1.91629073])
    >>> pipe.transform(corpus).shape
    (4, 8)

Understanding the Adjacency Matrix

* An adjacency matrix represents the connections between nodes of a graph.
* The row and column indices represent the vertices: matrix[i][j] = 1 means that there is an edge from vertex i to vertex j, and matrix[i][j] = 0 denotes that there is no edge between i and j.
* The advantage of the adjacency matrix is that it is simple, and for small graphs it is easy to see which nodes are connected to other nodes.
* Start Python and import NetworkX.
* Different classes exist for directed and undirected networks. Let's create a basic undirected Graph:

    import networkx as nx
    g = nx.Graph()  # empty graph

* The graph g can be grown in several ways. NetworkX provides many generator functions and facilities to read and write graphs in many formats.
Example:

    import networkx as nx
    # Create a networkx graph object
    my_graph = nx.Graph()
    # Add edges to the graph object
    # Each tuple represents an edge between two nodes
    my_graph.add_edges_from([(1, 2), (1, 3), (3, 4), (1, 5),
                             (3, 5), (4, 2), (2, 3), (3, 0)])
    # Draw the resulting graph (the remaining arguments are illegible in the source)
    nx.draw(my_graph, with_labels=True)
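To connect the example back to the adjacency matrix definition, the same edge list can be turned into a matrix by hand. This is a minimal sketch in plain Python, assuming an undirected graph so that every edge is recorded in both directions; the variable names are illustrative and NetworkX is not required.

```python
# The edge list from the example above.
edges = [(1, 2), (1, 3), (3, 4), (1, 5), (3, 5), (4, 2), (2, 3), (3, 0)]

# Collect the nodes and give each one a row/column position.
nodes = sorted({node for edge in edges for node in edge})
index = {node: position for position, node in enumerate(nodes)}

# Start with a square matrix of zeros (no edges).
size = len(nodes)
matrix = [[0] * size for _ in range(size)]

# matrix[i][j] = 1 means there is an edge between vertex i and vertex j.
# The graph is undirected, so each edge is written in both directions.
for u, v in edges:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1

for row in matrix:
    print(row)
```

Each row shows at a glance which nodes a vertex is connected to, and the matrix is symmetric because the graph is undirected.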