
MOOC Audit Course 4101079

The document is an audit course report submitted by Jadhav Rahul Kalidas to Savitribai Phule Pune University, focusing on a MOOC about Python for Data Science. It covers the fundamentals of Python, its applications in data science, and the importance of data analysis techniques. The report includes acknowledgments, an abstract, and a structured outline of the content, emphasizing the significance of Python in modern data science practices.


SAVITRIBAI PHULE PUNE UNIVERSITY, PUNE

AUDIT COURSE REPORT ON

MOOC - LEARN NEW SKILL IN PYTHON FOR DATA SCIENCE
SUBMITTED TO THE SAVITRIBAI PHULE PUNE UNIVERSITY, PUNE
IN THE FULFILLMENT OF THE REQUIREMENT

OF

FINAL YEAR OF COMPUTER ENGINEERING


Academic Year 2023-24

BY

Name: Jadhav Rahul Kalidas PRN: 72168652E

DEPARTMENT OF COMPUTER ENGINEERING


STES’S SINHGAD INSTITUTE OF TECHNOLOGY AND SCIENCE

NARHE, PUNE - 411041


CERTIFICATE
This is to certify that the audit course report entitled

“MOOC - Learn New Skill in Python for Data Science”

Submitted by

Jadhav Rahul Kalidas 72168652E

is a bonafide work carried out by them under the supervision of Ms. K. O. Akhade and is approved for the
partial fulfilment of the requirement of Savitribai Phule Pune University, Pune for the award of the Final
Year of Computer Engineering.

(Ms. K. O. Akhade) (Dr. G.S.Navale)


Co-ordinator Head
Department of Computer Engineering Department of Computer Engineering

(Dr. S.D.Markande)
Principal

SINHGAD INSTITUTE OF TECHNOLOGY and SCIENCE, NARHE – 41

Place: Pune

Date: 12/10/2023
Abstract
Python is a widely used general-purpose, high-level programming language. It was created by Guido van Rossum,
first released in 1991, and is now developed by a community of core developers, supported by the Python Software
Foundation. It was designed with an emphasis on code readability, and its syntax allows programmers to express
concepts in fewer lines of code.

Data science is the study of data to extract meaningful insights for business. It is a multidisciplinary approach
that combines principles and practices from the fields of mathematics, statistics, artificial intelligence, and
computer engineering to analyze large amounts of data.

Among the many benefits of Python in data science is its simplicity. Python is also an open source language,
meaning the language is open to the public and freely available. This is beneficial for data scientists looking
to learn a new language because there is no up-front cost to start learning Python.
The NPTEL course is an introduction to programming and problem solving in Python and aims to equip
participants to use Python programming for solving data science problems. It does not assume any
prior knowledge of programming.
Keywords: Python, Data Science, NPTEL, Learn
Acknowledgement
I would like to take a moment to acknowledge and express my deepest gratitude for the kind of help and
guidance received from several people in preparation of this audit course report.
Firstly, I would like to express my deepest gratitude to my guide Ms. K. O. Akhade for her invaluable
guidance, support, and suggestions throughout the audit course. It would not have been possible to
prepare the audit course report in this form without her constant motivation.
Secondly, I would like to thank Dr. G. S. Navale, Head, Department of Computer Engineering, SITS, for
her constant guidance and support throughout the audit course. I also wish to record my sincere gratitude to
our beloved Principal, Dr. S. D. Markande, for his constant support and encouragement, and for making
available facilities like the library and laboratory for the development of this audit course report.
Lastly, I would like to thank my parents for financing my studies and providing immense motivation
to study. I would also like to express my gratitude to my friends for helping me in difficult times during the
audit course and constantly supporting me.

(Jadhav Rahul Kalidas )


Contents

CHAPTER  TITLE
1  INTRODUCTION TO PYTHON LANGUAGE
2  DATA SCIENCE
3  PYTHON FOR DATA SCIENCE
4  HOW TO LEARN PYTHON FOR DATA SCIENCE
5  NPTEL CERTIFICATE OF PYTHON FOR DATA SCIENCE
6  CONCLUSION
7  REFERENCES
1. INTRODUCTION

1.1 Python Language Introduction:

Python is a widely used general-purpose, high-level programming language. It was created by Guido
van Rossum, first released in 1991, and is now developed by a community of core developers, supported by the
Python Software Foundation. It was designed with an emphasis on code readability, and its syntax allows
programmers to express concepts in fewer lines of code. Python is a programming language that lets you work
quickly and integrate systems more efficiently. Python is a high-level, interpreted, interactive and
object-oriented scripting language. Python is designed to be highly readable. It uses English keywords
frequently where other languages use punctuation, and it has fewer syntactical constructions than other languages.

Python is Interpreted: Python is processed at runtime by the interpreter. You do not need to compile your
program before executing it.
Python is Interactive: You can actually sit at a Python prompt and interact with the interpreter directly to
write your programs.
Python is Object-Oriented: Python supports Object-Oriented style or technique of programming that
encapsulates code within objects.
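To make these three points concrete, here is a minimal sketch; the `Counter` class is a hypothetical illustration, not code from the course. Saved as a `.py` file it runs directly under the interpreter with no separate compile step, and each statement can also be typed at the interactive `>>>` prompt.

```python
class Counter:
    """Bundles data (the running count) with the methods that change it."""

    def __init__(self, start: int = 0) -> None:
        # The leading underscore marks this attribute as internal by convention;
        # Python does not enforce privacy, it relies on the convention.
        self._count = start

    def increment(self, step: int = 1) -> int:
        """Add `step` to the count and return the new total."""
        self._count += step
        return self._count

    @property
    def value(self) -> int:
        """Read-only access to the current count."""
        return self._count


counter = Counter()
counter.increment()
counter.increment(5)
print(counter.value)  # prints 6
```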

1.2 History of Python:


Python was developed by Guido van Rossum in the late eighties and early nineties at the National Research
Institute for Mathematics and Computer Science in the Netherlands. Python is derived from many other
languages, including ABC, Modula-3, C, C++, Algol-68, Smalltalk, and Unix shell and other scripting
languages. Python is copyrighted, but its source code is freely available under the Python Software Foundation
License, an open source license that is compatible with the GNU General Public License (GPL). Python is now
maintained by a core development team. Guido van Rossum directed its progress for many years and stepped down
from that leadership role in 2018, after which an elected Steering Council took over the language's governance.

1.3 Python Features:

Python’s features include:


Easy-to-learn: Python has few keywords, simple structure, and a clearly defined syntax. This allows the
student to pick up the language quickly.
Easy-to-read: Python code is clearly structured and easy to read.
Easy-to-maintain: Python's source code is fairly easy-to-maintain.
A broad standard library: The bulk of Python's standard library is very portable and cross-platform compatible on
UNIX, Windows, and Macintosh.
Interactive Mode: Python has support for an interactive mode which allows interactive testing and debugging
of snippets of code.
Portable: Python can run on a wide variety of hardware platforms and has the same interface on all platforms.
Extendable: You can add low-level modules to the Python interpreter. These modules enable programmers
to add to or customize their tools to be more efficient.
Databases: Python provides interfaces to all major commercial databases.
GUI Programming: Python supports GUI applications that can be created and ported to many system calls,
libraries and windows systems, such as Windows MFC, Macintosh, and the X Window system of Unix.

Scalable: Python provides a better structure and support for large programs than shell scripting.
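As a small illustration of the standard-library and database points, the sketch below uses Python's built-in `sqlite3` module, which follows the DB-API 2.0 interface (PEP 249). The table name, cities and ticket counts are hypothetical. Drivers for commercial databases generally follow the same connect, cursor, execute pattern, although their connection details differ.

```python
import sqlite3

# An in-memory database: nothing is written to disk, so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE bookings (city TEXT, tickets INTEGER)")
# Parameterised inserts ("?" placeholders) keep data separate from the SQL text.
cur.executemany(
    "INSERT INTO bookings VALUES (?, ?)",
    [("Pune", 120), ("Delhi", 95), ("Pune", 80)],
)
conn.commit()

# Aggregate in SQL, then use the rows as ordinary Python tuples.
rows = cur.execute(
    "SELECT city, SUM(tickets) FROM bookings GROUP BY city ORDER BY city"
).fetchall()
print(rows)  # [('Delhi', 95), ('Pune', 200)]
conn.close()
```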

Future Technologies Counting On Python


Python is extensively used for web development, application development, system administration, game
development and more. Several emerging technologies also rely heavily on Python, and for many of them it has
become the core language. Let’s look at the technologies that use Python as a core element for research,
production and further development.

Artificial Intelligence:
Python is widely regarded as the leading language for Artificial Intelligence (AI). There are plenty of Python
frameworks, libraries, and tools developed specifically for AI work, helping to reduce human effort while
improving accuracy and efficiency. AI has made it possible to develop speech recognition systems and autonomous
cars, and to interpret data such as images and videos.
Some of the Python libraries and tools used in various AI branches are listed below.
Machine Learning: PyML, PyBrain, scikit-learn, MDP Toolkit, GraphLab Create, MIPy etc.
General AI: pyDatalog, AIMA, EasyAI, SimpleAI etc.
Neural Networks: PyAnn, pyrenn, ffnet, neurolab etc.
Natural Language & Text Processing: Quepy, NLTK, gensim

Big Data:
Python's future scope can also be gauged by the way it has helped big data technology grow. Python is widely
used to analyze large data sets, including across computer clusters, through its high-performance toolkits and
libraries.
Some of the Python libraries and toolkits used for data analysis and other big data tasks are:
Pandas, Scikit-Learn, NumPy, SciPy, GraphLab Create, IPython, Bokeh, Agate, PySpark, Dask.
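The core idea behind processing data that does not fit in memory can be sketched with pandas alone: read the data in chunks, aggregate each chunk, then combine the partial results. This is a toy under stated assumptions (the CSV is generated in memory and the `city` and `tickets` columns are hypothetical); tools such as Dask and PySpark apply the same split-apply-combine pattern across many cores or machines.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file too large to load in one go.
rows = [("Pune", 3), ("Delhi", 5), ("Pune", 4), ("Mumbai", 2)] * 1000
csv_text = "city,tickets\n" + "\n".join(f"{city},{n}" for city, n in rows)

# chunksize makes read_csv yield DataFrames of at most 500 rows each,
# so only one chunk needs to be held in memory at a time.
partial_totals = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=500):
    partial_totals.append(chunk.groupby("city")["tickets"].sum())

# Combine the per-chunk results into one total per city.
totals = pd.concat(partial_totals).groupby(level=0).sum()
print(totals)
```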

Networking:
Networking is another field in which Python has a promising future. Python is used to read, write and configure
routers and switches and to perform other network automation tasks in a cost-effective and secure manner. Many
libraries and tools for these purposes are built on top of Python. Some of those used by network engineers for
network automation are Ansible, Netmiko, and NAPALM (Network Automation and Programmability Abstraction Layer
with Multivendor Support).

Websites Developed Using Python:

Python is also widely used for web development. Some of the world’s most popular websites and services that use
Python in their technology stacks are YouTube, Quora, Instagram, Pinterest, Spotify, Flipkart, Uber, Google, and
Facebook.

2. DATA SCIENCE

2.1 What is data science?

Data science is the study of data to extract meaningful insights for business. It is a multidisciplinary approach
that combines principles and practices from the fields of mathematics, statistics, artificial intelligence, and
computer engineering to analyze large amounts of data. This analysis helps data scientists to ask and answer
questions like what happened, why it happened, what will happen, and what can be done with the results.

Why is data science important?

Data science is important because it combines tools, methods, and technology to generate meaning from data.
Modern organizations are inundated with data; there is a proliferation of devices that can automatically
collect and store information. Online systems and payment portals capture more data in the fields of e-
commerce, medicine, finance, and every other aspect of human life. We have text, audio, video, and image
data available in vast quantities.

History of data science

While the term data science is not new, the meanings and connotations have changed over time. The word first
appeared in the ’60s as an alternative name for statistics. In the late ’90s, computer science professionals
formalized the term. A proposed definition for data science saw it as a separate field with three aspects: data
design, collection, and analysis. It still took another decade for the term to be used outside of academia.

Future of data science

Artificial intelligence and machine learning innovations have made data processing faster and more efficient.
Industry demand has created an ecosystem of courses, degrees, and job positions within the field of data
science. Because of the cross-functional skillset and expertise required, data science shows strong projected
growth over the coming decades.

2.2 What is data science used for?

Data science is used to study data in four main ways:

1. Descriptive analysis

Descriptive analysis examines data to gain insights into what happened or what is happening in the data
environment. It is characterized by data visualizations such as pie charts, bar charts, line graphs, tables, or
generated narratives. For example, a flight booking service may record data like the number of tickets booked
each day. Descriptive analysis will reveal booking spikes, booking slumps, and high-performing months for
this service.
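A minimal sketch of descriptive analysis on the flight booking example, assuming the data sits in a pandas DataFrame with hypothetical `date` and `tickets` columns. It answers "what happened" by summarising bookings per month and locating the spike.

```python
import pandas as pd

# Hypothetical daily ticket counts for the flight booking service.
bookings = pd.DataFrame({
    "date": pd.to_datetime([
        "2023-01-05", "2023-01-20", "2023-02-03",
        "2023-02-14", "2023-02-25", "2023-03-10",
    ]),
    "tickets": [40, 55, 90, 120, 110, 35],
})

# What happened? Total tickets per calendar month...
monthly = bookings.groupby(bookings["date"].dt.to_period("M"))["tickets"].sum()
print(monthly)

# ...plus overall summary statistics for the daily counts.
print(bookings["tickets"].describe())

busiest = monthly.idxmax()  # the month with the booking spike
print(f"Highest-booking month: {busiest}")
```

In a report, `monthly.plot(kind="bar")` (which needs Matplotlib installed) would turn these monthly totals into the kind of bar chart the text mentions.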

2. Diagnostic analysis

Diagnostic analysis is a deep-dive or detailed data examination to understand why something happened. It is
characterized by techniques such as drill-down, data discovery, data mining, and correlations. Multiple data
operations and transformations may be performed on a given data set to discover unique patterns in each of
these techniques. For example, the flight service might drill down on a particularly high-performing month to
better understand the booking spike. This may lead to the discovery that many customers visit a particular city
to attend a monthly sporting event.
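The drill-down step can be sketched the same way: filter to the high-performing month, then break its total down by another dimension. The records and the `destination` column are hypothetical; in practice the drill-down dimension depends on what the data set records.

```python
import pandas as pd

# Hypothetical booking records with the destination of each booking batch.
records = pd.DataFrame({
    "month": ["2023-01", "2023-02", "2023-02", "2023-02", "2023-02", "2023-03"],
    "destination": ["Delhi", "Chennai", "Chennai", "Chennai", "Delhi", "Delhi"],
    "tickets": [95, 100, 120, 60, 40, 35],
})

# Drill down: keep only the high-performing month...
spike = records[records["month"] == "2023-02"]

# ...then break its total down by destination to see what drove it.
by_destination = (
    spike.groupby("destination")["tickets"].sum().sort_values(ascending=False)
)
share = by_destination / by_destination.sum()  # fraction of the month's tickets
print(by_destination)
print(share)
```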

3. Predictive analysis

Predictive analysis uses historical data to forecast data patterns that are likely to occur in the
future. It is characterized by techniques such as machine learning, forecasting, pattern matching, and predictive
modeling. In each of these techniques, computers are trained to learn patterns in historical data that can be
projected forward. For example, the flight service team might use data science to predict flight booking patterns
for the coming year at the start of each year. The computer program or algorithm may look at past data and predict
booking spikes for certain destinations in May. Having anticipated their customers’ future travel requirements, the
company could start targeted advertising for those cities from February.
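Predictive models range from simple baselines to machine learning. The sketch below is deliberately the simplest kind, a seasonal-average forecast (each month's prediction is the mean of the same calendar month in past years), and not one of the machine learning models named above. The two years of monthly bookings are made up so that a May spike is visible.

```python
import pandas as pd

# Hypothetical monthly bookings for one destination over two past years.
history = pd.DataFrame({
    "year": [2021] * 12 + [2022] * 12,
    "month": list(range(1, 13)) * 2,
    "tickets": [30, 32, 35, 60, 150, 70, 40, 38, 36, 34, 33, 45,
                28, 35, 40, 65, 170, 80, 42, 39, 37, 35, 31, 50],
})

# Seasonal-average forecast: next year's value for each calendar month is the
# mean of that month across the past years.
forecast = history.groupby("month")["tickets"].mean()

peak_month = int(forecast.idxmax())
print(forecast)
print(f"Expected spike in month {peak_month}: about {forecast.max():.0f} tickets")
```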

4. Prescriptive analysis

Prescriptive analytics takes predictive data to the next level. It not only predicts what is likely to happen but
also suggests an optimum response to that outcome. It can analyze the potential implications of different choices
and recommend the best course of action. It uses graph analysis, simulation, complex event processing, neural
networks, and recommendation engines from machine learning.

Returning to the flight booking example, prescriptive analysis could look at historical marketing campaigns to
maximize the advantage of the upcoming booking spike. A data scientist could project booking outcomes for
different levels of marketing spend on various marketing channels. These data forecasts would give the flight
booking company greater confidence in their marketing decisions.
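Prescriptive analysis adds a decision step on top of a forecast. The sketch below simulates every split of a fixed budget between two channels and recommends the split with the most projected bookings. Everything in it is an assumption for illustration: the channel names, the budget, and the response curves (`math.log1p` with invented coefficients, to model diminishing returns). A real analysis would estimate such curves from historical campaign data.

```python
import math

# Hypothetical response curves: extra bookings from spending `s` (in thousands)
# on each channel, with diminishing returns. The coefficients are invented.
CHANNELS = {
    "search_ads": lambda s: 120 * math.log1p(s / 10),
    "social_media": lambda s: 80 * math.log1p(s / 5),
}

BUDGET = 50  # total marketing budget in thousands (assumed)
STEP = 5     # granularity of the simulated splits

# Simulate every split of the budget and keep the best projected outcome.
best_split, best_bookings = None, -1.0
for search in range(0, BUDGET + 1, STEP):
    social = BUDGET - search
    projected = CHANNELS["search_ads"](search) + CHANNELS["social_media"](social)
    if projected > best_bookings:
        best_split, best_bookings = (search, social), projected

print(f"Recommended split (search, social): {best_split}, "
      f"projected extra bookings: {best_bookings:.0f}")
```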

2.3 What is the data science process?

A business problem typically initiates the data science process. A data scientist will work with business
stakeholders to understand what the business needs. Once the problem has been defined, the data scientist may
solve it using the OSEMN data science process:

O – Obtain data

Data can be pre-existing, newly acquired, or a data repository downloadable from the internet. Data scientists
can extract data from internal or external databases, company CRM software, web server logs, social media
or purchase it from trusted third-party sources.

S – Scrub data

Data scrubbing, or data cleaning, is the process of standardizing the data according to a predetermined format.
It includes handling missing data, fixing data errors, and removing any data outliers. Some examples of data
scrubbing are:

• Changing all date values to a common standard format.

• Fixing spelling mistakes or additional spaces.

• Fixing mathematical inaccuracies or removing commas from large numbers.
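The three scrubbing examples above can be sketched with pandas. The raw values, the column names and the list of date layouts are hypothetical; in particular, `12/10/2023` is assumed to be day-first, a choice that must be confirmed against the data source.

```python
from datetime import datetime

import pandas as pd

# Hypothetical raw records showing the three kinds of problems listed above.
raw = pd.DataFrame({
    "date": ["2023-10-12", "12/10/2023", "2023/10/14"],
    "city": ["  Pune", "Mumbay ", "Delhi"],
    "revenue": ["1,200", "15,750", "980"],
})

# Date layouts assumed to occur in this data ("12/10/2023" is read day-first).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d")


def to_iso_date(value: str) -> str:
    """Return `value` as YYYY-MM-DD, trying each known layout in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {value!r}")


clean = raw.copy()
clean["date"] = clean["date"].map(to_iso_date)  # one common date format
clean["city"] = clean["city"].str.strip().replace({"Mumbay": "Mumbai"})  # spaces, spelling
clean["revenue"] = clean["revenue"].str.replace(",", "", regex=False).astype(int)  # commas
print(clean)
```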

E – Explore data

Data exploration is preliminary data analysis that is used for planning further data modeling strategies. Data
scientists gain an initial understanding of the data using descriptive statistics and data visualization tools.
Then they explore the data to identify interesting patterns that can be studied or actioned.
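A first exploratory pass often combines descriptive statistics with a quick look at relationships between columns. This sketch uses `DataFrame.describe` and a Pearson correlation on a tiny hypothetical used-car table; the column names are assumptions.

```python
import pandas as pd

# Hypothetical cleaned data: a handful of used cars.
cars = pd.DataFrame({
    "age_months": [12, 24, 36, 48, 60],
    "price": [9500, 8700, 7600, 6900, 6100],
})

summary = cars.describe()                      # count, mean, std, min, quartiles, max
corr = cars["age_months"].corr(cars["price"])  # Pearson correlation by default
print(summary)
print(f"Correlation between age and price: {corr:.2f}")
```

A strongly negative correlation like this one is the kind of "interesting pattern" that would then be studied further in the modeling step.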

M – Model data

Software and machine learning algorithms are used to gain deeper insights, predict outcomes, and prescribe
the best course of action. Machine learning techniques like association, classification, and clustering are
applied to the training data set. The model might be tested against predetermined test data to assess result
accuracy. The data model can be fine-tuned many times to improve result outcomes.
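The model-and-test loop described above can be sketched with scikit-learn. This example assumes a classification task and uses the Iris data set that ships with scikit-learn, with a decision tree chosen purely for illustration. Here, fine-tuning means comparing a few values of the `max_depth` parameter by cross-validation on the training data before a single final check on the held-out test data.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Iris ships with scikit-learn, so no download is needed.
X, y = load_iris(return_X_y=True)

# Hold back 30% of the rows as predetermined test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fine-tuning: compare tree depths using cross-validation on the training
# set only, so the test set stays untouched until the end.
cv_scores = {
    depth: cross_val_score(
        DecisionTreeClassifier(max_depth=depth, random_state=42),
        X_train, y_train, cv=5,
    ).mean()
    for depth in (1, 2, 3, 4)
}
best_depth = max(cv_scores, key=cv_scores.get)

model = DecisionTreeClassifier(max_depth=best_depth, random_state=42)
model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"best max_depth={best_depth}, test accuracy={test_accuracy:.2f}")
```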

N – Interpret results

Data scientists work together with analysts and businesses to convert data insights into action. They make
diagrams, graphs, and charts to represent trends and predictions. Data summarization helps stakeholders
understand and implement results effectively.

What are different data science technologies?

Data science practitioners work with complex technologies such as:

1. Artificial intelligence: Machine learning models and related software are used for predictive and prescriptive
analysis.

2. Cloud computing: Cloud technologies have given data scientists the flexibility and processing power required
for advanced data analytics.

3. Internet of things: IoT refers to various devices that can automatically connect to the internet. These devices
collect data for data science initiatives. They generate massive data which can be used for data mining and data
extraction.

4. Quantum computing: For certain classes of problems, quantum computers may perform complex calculations much
faster than classical computers. Data scientists are exploring them for building complex quantitative algorithms.

3. PYTHON FOR DATA SCIENCE

Python is an open source, interpreted, high-level language that provides a strong approach to object-oriented
programming. It is one of the most popular languages used by data scientists for data science projects and
applications. Python provides rich functionality for mathematics, statistics and scientific computing, along with
excellent libraries for data science applications. One of the main reasons Python is widely used in the scientific
and research communities is its ease of use and simple syntax, which make it easy to adopt for people who do not
have an engineering background. It is also well suited for quick prototyping.

3.1 WHY PYTHON FOR DATA SCIENCE?

The first of many benefits of Python in data science is its simplicity. While some data scientists come from a
computer science background or know other programming languages, many come from backgrounds in
statistics, mathematics, or other technical fields and may not have as much coding experience when they enter
the field of data science. Python syntax is easy to follow and write, which makes it a simple programming
language to get started with and learn quickly.

In addition, there are plenty of free resources available online to learn Python and get help if you get stuck.
Python is an open source language, meaning the language is open to the public and freely available. This is
beneficial for data scientists looking to learn a new language because there is no up-front cost to start learning
Python. This also means that there are a lot of data scientists already using Python, so there is a strong
community of both developers and data scientists who use and love Python.

The Python community is large, thriving, and welcoming. Python is the fourth most popular language among
all developers based on a 2020 Stack Overflow survey of nearly 65,000 developers. Python is especially
popular among data scientists. According to SlashData, there are 8.2 million active Python users with “a
whopping 69% of machine learning developers and data scientists now us[ing] Python (compared to 24% of
them using R).”4 A large community brings a wealth of available resources to Python users. Not only are
there numerous books and tutorials available, there are also conferences such as PyCon where Python users
across the world can come together to share knowledge and connect. Python has created a supportive and
welcoming community of data scientists willing to share new ideas and help one another.

If the sheer number of people using Python doesn’t convince you of the importance of Python for data science,
maybe the libraries available to make your data science coding easier will. A library in Python is a collection
of modules with pre-built code to help with common tasks. They essentially allow us to benefit from and
build on top of the work of others. In other languages, some data science tasks would be cumbersome and
time consuming to code from scratch. There are countless libraries like NumPy, Pandas, and Matplotlib
available in Python to make data cleaning, data analysis, data visualization, and machine learning tasks easier.
Some of the most popular libraries include:

• NumPy: NumPy is a Python library that provides support for many mathematical tasks on large,
multidimensional arrays and matrices.
• Pandas: The Pandas library is one of the most popular and easy-to-use libraries available. It allows
for easy manipulation of tabular data for data cleaning and data analysis.
• Matplotlib: This library provides simple ways to create static or interactive boxplots, scatterplots, line
graphs, and bar charts. It’s useful for simplifying your data visualization tasks.
• Seaborn: Seaborn is another data visualization library built on top of Matplotlib that allows for
visually appealing statistical graphs. It allows you to easily visualize beautiful confidence intervals,
distributions, and other graphs.
• Statsmodels: This statistical modeling library provides tools for estimating statistical models and running
statistical tests, including linear regression, generalized linear models, and time series analysis models.
• SciPy: SciPy is a library used for scientific computing that helps with linear algebra, optimization, and
statistical tasks.
• Requests: This is a useful library for fetching data from websites and web APIs, a common first step in web
scraping. It provides a user-friendly way to configure and send HTTP requests.
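As a small taste of how three of these libraries fit together, the sketch below uses NumPy for vectorised arithmetic, pandas to hold the result as a table, and SciPy's `stats.linregress` for a simple linear regression. The mileage and price figures are hypothetical and generated from an exact straight line, so the fitted slope and intercept should recover it.

```python
import numpy as np
import pandas as pd
from scipy import stats

# NumPy: vectorised arithmetic on a whole array at once, with no explicit loop.
km = np.array([15_000, 30_000, 45_000, 60_000, 75_000])
price = 10_000 - 0.05 * km  # hypothetical linear price model

# pandas: hold the arrays as labelled columns of a table.
cars = pd.DataFrame({"km": km, "price": price})

# SciPy: a statistical routine, here ordinary least-squares linear regression.
fit = stats.linregress(cars["km"], cars["price"])
print(cars)
print(f"slope={fit.slope:.3f}, intercept={fit.intercept:.1f}, r={fit.rvalue:.3f}")
```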

In addition to all of the general data manipulation libraries available in Python, a major advantage of Python
in data science is the availability of powerful machine learning libraries. These machine learning libraries
make data scientists’ lives easier by providing robust, open source libraries for any machine learning
algorithm desired. These libraries offer simplicity without sacrificing performance. You can build
powerful and accurate neural networks using these frameworks. Some of the most popular machine learning
and deep learning libraries in Python include:

• Scikit-learn: This popular machine learning library is a one-stop-shop for all of your machine learning
needs with support for both supervised and unsupervised tasks. Some of the machine learning
algorithms available are logistic regression, k-nearest neighbors, support vector machine, random
forest, gradient boosting, k-means, DBSCAN, and principal component analysis.
• TensorFlow: TensorFlow is an open source library for building and training neural networks. Because its core is
written largely in C++, it combines a Python interface with high power and performance. However, working
directly with TensorFlow's lower-level APIs is not well suited to beginners.
• Keras: Keras is a popular high-level API that acts as an interface for the TensorFlow library. It’s a tool
for building neural networks using a TensorFlow backend that’s extremely user friendly and easy to
get started with.
• PyTorch: PyTorch is another deep learning framework, created by Facebook’s AI research group. It is often
considered more flexible than Keras, but because its API is lower level, it is more complex and may be a little
less beginner friendly than Keras.
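To show one of the listed scikit-learn algorithms in use, here is a minimal k-means sketch on two synthetic, well-separated groups of points. The data is randomly generated for illustration, and `n_init` and `random_state` are set explicitly so the result is reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic groups of 2-D points centred far apart.
group_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2))
group_b = rng.normal(loc=(5.0, 5.0), scale=0.5, size=(50, 2))
points = np.vstack([group_a, group_b])

# Unsupervised: k-means sees only the points, never which group they came from.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = kmeans.labels_
print(kmeans.cluster_centers_.round(2))
```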

Python is the most popular programming language for data science. If you’re looking for a new job as a data
scientist, you’ll find that Python is also required in most job postings for data science roles. Jeff Hale, a
General Assembly data science instructor, scraped job postings from popular job posting sites to see what
was required for jobs with the title of “Data Scientist.” Hale found that Python appears in nearly 75% of all
job postings. Python libraries including Tensorflow, Scikit-learn, Pandas, Keras, Pytorch, and Numpy also
appear in many data science job postings.

R, another popular programming language for data science, appeared in roughly 55% of the job postings.
While R is a useful tool for data science and has many benefits including data cleaning, data visualization,
and statistical analysis, Python continues to become more popular and preferred among data scientists for a
majority of tasks. In fact, the average percentage of job postings requiring R dropped by about 7% between
2018 and 2019, while Python increased in the percentage of job postings requiring the language. This isn’t to
say that learning R is a waste of time; data scientists that know both of these languages can benefit from the
strengths of both languages for different purposes. However, since Python is becoming increasingly popular,
there’s a high chance that your team uses Python, and it’s important to use the language that your team is
comfortable with and prefers.

As Python continues to grow in popularity and as the number of data scientists continues to increase, the use
of Python for data science is likely to keep growing. As machine learning, deep learning, and other data science
tasks advance, we’ll likely see these advancements available for our use as libraries in Python.
Python has been well-maintained and continuously growing in popularity for years, and many of the top
companies use Python today. With its continued popularity and growing support, Python will be used in the
industry for years to come.

4. HOW TO LEARN PYTHON FOR DATA SCIENCE

Step 1: Learn Python fundamentals

Everyone starts somewhere. This first step is to learn Python programming basics. You can do this with an
online course (which Dataquest offers), data science bootcamps, self-directed learning, or university
programs. There is no right or wrong way to learn the Python basics. The key is to choose a path and stay
consistent.

Find an online community

For help staying motivated, join an online community. Most communities allow you to learn with questions
that you or others ask the group. You can also connect with other community members and build relationships
with industry professionals. This also increases your opportunities for employment, as employee referrals
account for 30% of all hires. Many students also find it helpful to create a Kaggle account and to join a local
Meetup group. If you’re a Dataquest subscriber, you get access to Dataquest’s learner community, where
you’ll find access to support from both current students and alums.

Step 2: Practice with hands-on learning

One of the best ways to accelerate your education is through hands-on learning.

Practice with Python projects

It may surprise you how quickly you catch on when you build small Python projects. Fortunately, virtually
every Dataquest course contains a project to enhance your learning. Here are a few of them:

• Prison Break — Have some fun, and analyze a dataset of helicopter prison escapes using Python and
Jupyter Notebook.
• Profitable App Profiles for the App Store and Google Play Markets — In this guided project, you’ll
work as a data analyst for a company that builds mobile apps. You’ll use Python to provide value
through practical data analysis.
• Exploring Hacker News Posts — Work with a dataset of submissions to Hacker News, a popular
technology site.

• Exploring eBay Car Sales Data — Use Python to work with a scraped dataset of used cars from eBay
Kleinanzeigen, a classifieds section of the German eBay website.

Here are some other Python project ideas for beginners:

• Build a rock, paper, scissors game


• Build a text adventure game
• Build a guessing game
• Build interactive Mad Libs

Alternative ways to practice and learn

To enhance your coursework and find answers to the Python programming problems you encounter, read
guidebooks, blog posts, Python tutorials, or other people’s open-source code for new ideas. If you still want
more, look for articles on other ways to learn Python for data science.

Step 3: Learn Python data science libraries

Four of the most important Python libraries for data science are NumPy, pandas, Matplotlib, and Scikit-learn.

• NumPy — A library that makes a variety of mathematical and statistical operations easier; it is also
the basis for many features of the pandas library.
• pandas — A Python library created specifically to facilitate working with data. This is the bread and
butter of a lot of Python data science work.
• Matplotlib — A visualization library that makes it quick and easy to generate charts from your data.
• Scikit-learn — The most popular library for machine learning work in Python.

NumPy and pandas are great for exploring and playing with data. Matplotlib is a data visualization library
that makes graphs like those you’d find in Excel or Google Sheets. Beyond these four, many other Python
libraries are useful for data science.
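Matplotlib, the third library in the list, can be sketched with a bar chart built from a small pandas DataFrame. The sales figures are hypothetical. Selecting the non-interactive `Agg` backend and saving to an in-memory PNG are choices made so the example runs without a display; in a Jupyter Notebook you would normally just show the figure.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: renders without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly sales figures.
sales = pd.DataFrame({"month": ["Jan", "Feb", "Mar", "Apr"], "units": [120, 150, 90, 180]})

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(sales["month"], sales["units"], color="steelblue")
ax.set_title("Units sold per month (sample data)")
ax.set_xlabel("Month")
ax.set_ylabel("Units")
fig.tight_layout()

# Save to memory instead of a file; in a notebook, plt.show() would display it.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
print(f"Rendered a {len(png_bytes)}-byte PNG chart")
```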

Step 4: Build a data science portfolio as you learn Python

For aspiring data scientists, a portfolio is a necessity — it’s one of the most important things hiring managers
look for in a qualified candidate. These projects should include work with several different datasets, and each
should share interesting insights that you discovered. Here are some types of projects to consider:

• Data Cleaning Project — Any project that involves dirty or “unstructured” data that you clean up
and analyze will impress potential employers, since most real-world data requires cleaning.
• Data Visualization Project — Making attractive, easy-to-read visualizations is both a programming
and a design challenge, but if you can do it well, your analysis will be considerably more useful.
Having great-looking charts in a project will make your portfolio stand out.
• Machine Learning Project — If you aspire to work as a data scientist, you will definitely need a
project that shows off your ML skills. You may want a few different machine learning projects, with
each focused on a different algorithm.

Present your portfolio effectively

Your analysis should be clear and easy to read — ideally in a format like a Jupyter Notebook so a technical
audience can read your code. (Non-technical readers can follow along with your charts and written
explanations.)

Does your portfolio need a theme?

Your portfolio doesn’t necessarily need a particular theme. Find datasets that interest you, then develop a way
to link them. If you want to work at a particular company or in a particular industry, showcasing projects
relevant to that industry is a great idea. Displaying projects like these demonstrates to future employers that
you’ve taken the time to learn Python and other important programming skills.

Step 5: Apply advanced data science techniques

Finally, improve your skills. Your data science journey will be full of constant learning, but there are advanced
Python courses you can complete to ensure you’ve covered all the bases. Learn to be comfortable with
regression, classification, and k-means clustering models. You can also step into machine learning by
studying bootstrapping models and creating neural networks using Scikit-learn.
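Bootstrapping is a resampling technique: draw many samples with replacement from the data, recompute a statistic each time, and use the spread of those values to estimate its uncertainty. The sketch below uses NumPy to build a 95% percentile bootstrap confidence interval for a mean. The "measurements" are randomly generated, and 2,000 resamples is an arbitrary but common choice.

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.normal(loc=50, scale=10, size=200)  # hypothetical measurements

# Resample with replacement and recompute the mean each time.
n_resamples = 2000
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(n_resamples)
])

# Percentile method: the middle 95% of the bootstrap means.
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {sample.mean():.2f}, 95% bootstrap CI = [{low:.2f}, {high:.2f}]")
```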

5. NPTEL PYTHON FOR DATA SCIENCE

The NPTEL course is an introduction to programming and problem solving in Python and aims to equip
participants to use Python programming for solving data science problems. It does not assume any
prior knowledge of programming. Using some motivating examples, the course quickly builds up basic
concepts such as conditionals, loops, functions, lists, strings and tuples. It goes on to cover searching and
sorting algorithms, dynamic programming and backtracking, as well as topics such as exception handling and
using files. As far as data structures are concerned, the course covers Python dictionaries as well as classes
and objects for defining user defined datatypes such as linked lists and binary search trees.
Course layout

Week 1: BASICS OF PYTHON SPYDER (TOOL)

• Introduction to Spyder

• Setting the working directory

• Creating and saving a script file

• File execution, clearing console, removing variables from environment, clearing environment
• Commenting script files

• Variable creation

• Arithmetic and logical operators

• Data types and associated operations
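The Week 1 topics on variables, operators and data types can be illustrated with a few lines of plain Python; the variable names and values are hypothetical and not taken from the course material.

```python
# Variable creation: Python infers the type from the value assigned.
price = 15000          # int
discount_rate = 0.1    # float
model = "Corolla"      # str
is_automatic = False   # bool

# Arithmetic operators
final_price = price * (1 - discount_rate)
quotient, remainder = divmod(17, 5)  # floor division and modulo together
power = 2 ** 10

# Logical and comparison operators
affordable = final_price < 14000 and not is_automatic

print(type(price).__name__, type(discount_rate).__name__, type(model).__name__)
print(final_price, quotient, remainder, power, affordable)
```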


Week 2: Sequence data types and associated operations

• Strings
• Lists
• Arrays
• Tuples
• Dictionary
• Sets
• Range

NumPy

• ndArray
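A compact sketch of the Week 2 sequence types, plus a NumPy `ndarray`. The course outline does not say which "Arrays" it means; this sketch assumes the standard-library `array` module, distinct from the NumPy array shown last. All values are illustrative.

```python
import array

import numpy as np

text = "Toyota Corolla"                      # str: immutable sequence of characters
prices = [13500, 13750, 13950]               # list: ordered and mutable
readings = array.array("i", [1, 2, 3])       # array: compact, single numeric type (stdlib)
point = (4.5, 7.2)                           # tuple: ordered and immutable
car = {"model": "Corolla", "year": 2004}     # dict: key -> value mapping
fuel_types = {"Petrol", "Diesel", "Petrol"}  # set: duplicates are dropped
indices = range(0, 10, 3)                    # range: lazy sequence 0, 3, 6, 9

prices.append(14000)   # lists can grow in place
car["km"] = 50000      # dicts gain keys by assignment
matrix = np.array([[1, 2, 3], [4, 5, 6]])    # NumPy ndarray with shape (2, 3)

print(text.upper(), text.split())
print(prices[-1], point[0], sum(readings), car["km"], len(fuel_types), list(indices))
print(matrix.shape, matrix.T.shape, (matrix * 2).sum())
```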

Week 3: Pandas dataframe and dataframe related operations on Toyota Corolla dataset

1. Reading files

2. Exploratory data analysis

3. Data preparation and preprocessing

• Data visualization on Toyota Corolla dataset using matplotlib and seaborn libraries

1. Scatter plot

2. Line plot

3. Bar plot

4. Histogram

5. Box plot

6. Pair plot

• Control structures using Toyota Corolla dataset

1. if-else family

2. for loop

3. for loop with if break

4. while loop

•Functions
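The Week 3 control structures can be sketched on a few rows shaped like a used-car table. The rows, the column names (`price`, `age`, `fuel`) and the price bands are hypothetical and only loosely modelled on the Toyota Corolla data set used in the course; a plain list of dictionaries stands in for a pandas DataFrame so each construct stays visible.

```python
# Hypothetical rows shaped like a used-car table (values are illustrative only).
cars = [
    {"price": 13500, "age": 23, "fuel": "Diesel"},
    {"price": 9950, "age": 60, "fuel": "Petrol"},
    {"price": 21500, "age": 12, "fuel": "Petrol"},
    {"price": 7500, "age": 75, "fuel": "CNG"},
]


def price_band(price: int) -> str:
    """Function using if-elif-else to place a price into a hypothetical band."""
    if price < 10000:
        return "low"
    elif price < 20000:
        return "medium"
    else:
        return "high"


# for loop: label every car.
bands = []
for car in cars:
    bands.append(price_band(car["price"]))

# for loop with if + break: stop scanning at the first CNG car.
first_cng = None
for index, car in enumerate(cars):
    if car["fuel"] == "CNG":
        first_cng = index
        break

# while loop: take cars in order while the running total stays within 30,000.
total, count = 0, 0
while count < len(cars) and total + cars[count]["price"] <= 30000:
    total += cars[count]["price"]
    count += 1

print(bands, first_cng, count, total)
```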

Week 4: CASE STUDY


•Regression: Predicting price of pre-owned cars

•Classification: Classifying personal income
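The two Week 4 case studies follow the same fit-then-evaluate pattern. The sketch below runs both on synthetic data generated inside the code, not on the course's data sets: a linear regression predicts a car's price from hypothetical age and mileage features, and a logistic regression classifies whether a hypothetical income signal exceeds a threshold. All coefficients and thresholds are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Regression: price of a pre-owned car (synthetic data).
n_cars = 300
age_years = rng.uniform(1, 10, n_cars)
km_thousands = rng.uniform(5, 150, n_cars)
# Assumed relationship plus noise; the coefficients are invented.
price = 20000 - 1200 * age_years - 40 * km_thousands + rng.normal(0, 500, n_cars)

X_car = np.column_stack([age_years, km_thousands])
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_car, price, test_size=0.25, random_state=1)
regressor = LinearRegression().fit(Xc_train, yc_train)
r2 = regressor.score(Xc_test, yc_test)  # R^2 on held-out cars

# Classification: is personal income above a threshold? (synthetic data)
n_people = 400
hours_per_week = rng.uniform(10, 60, n_people)
education_years = rng.uniform(8, 20, n_people)
signal = 0.1 * hours_per_week + 0.5 * education_years + rng.normal(0, 1, n_people)
above_threshold = (signal > 10.5).astype(int)  # 1 = higher income class

X_person = np.column_stack([hours_per_week, education_years])
Xp_train, Xp_test, yp_train, yp_test = train_test_split(
    X_person, above_threshold, test_size=0.25, random_state=1, stratify=above_threshold)
classifier = LogisticRegression(max_iter=1000).fit(Xp_train, yp_train)
accuracy = classifier.score(Xp_test, yp_test)  # fraction classified correctly

print(f"price model: R^2 = {r2:.3f}, coefficients = {regressor.coef_.round(1)}")
print(f"income model: accuracy = {accuracy:.2f}")
```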

6. CONCLUSION

Because of Python's focus on simplicity and readability, it has a gradual and relatively low learning curve. This
ease of learning makes Python an ideal tool for beginning programmers. Python also lets programmers accomplish tasks
with fewer lines of code than older languages require.
The NPTEL Python for Data Science course helps establish mastery of data science and analytics
techniques using Python. Through this course, I learned the essential concepts of Python programming and gained
in-depth, valuable knowledge in data analytics, machine learning, data visualization, web scraping, and
natural language processing. As we’ve seen, Python is an increasingly required skill for many data science
positions, so enhancing one's career with this interactive, hands-on course is really valuable.

7. REFERENCES

1. https://onlinecourses.nptel.ac.in/noc22_cs32/preview
2. https://www.python.org/doc/essays/blurb/
3. https://www.fullstackpython.com/ask.html
4. https://realpython.com/tutorials/
5. https://aws.amazon.com/what-is/data-science/
6. https://www.studocu.com/in/document/
