Data Science
Presented By -
Name – Rutuja Babasaheb Salunkhe
Branch – 4th Year Information Technology
PRN No. – 2030331246057
College Name – Dr. Babasaheb Ambedkar Technological University, Lonere
Under the guidance of –
Prof. Ekta Meshram
CONTENT
What is Data Science?
Purpose of Data Science
Application for Data Scientist
Why Python in Data Science?
Introduction to Pandas
Correlation and its properties
Predictive Modeling
Algorithm Selection
Conclusion
What is Data Science?
Data science combines math and statistics, specialized programming, advanced
analytics, artificial intelligence (AI), and machine learning with specific subject matter
expertise to uncover actionable insights hidden in an organization’s data. These insights
can be used to guide decision making and strategic planning.
In today’s world, a large amount of data is generated daily. The main challenge is to handle
this data and extract insights from it that help organizations and businesses.
This is where Data Science comes in: it combines skills from computer science, mathematics,
statistics, information visualization, graphics, and business to find patterns in this data.
Purpose of Data Science
Application for Data Scientist
• Healthcare: Data science can identify and predict disease, and personalize healthcare recommendations.
• Transportation: Data science can optimize shipping routes in real time.
• Sports: Data science can accurately evaluate athletes’ performance.
• Government: Data science can prevent tax evasion and predict incarceration rates.
• E-commerce: Data science can automate digital ad placement.
• Social media: Data science can create algorithms to pinpoint compatible partners.
Why Python in Data Science?
1. Easy To Learn: Python has a simple and intuitive syntax that is easy to learn and read, and it is open source
and free to use. This makes it a great language for beginners learning data science.
2. Cross-Platform: Python runs on Windows, macOS, UNIX, and Linux, so developers don’t need to worry about
which operating system their code will run on.
3. Portable: Python code is highly portable, which means that a developer can usually run the same code on
different machines without further changes, as long as a Python interpreter and the required libraries are installed.
4. Extensive Library: Python has several powerful libraries that make data analysis and visualization easy.
Pandas is a library for data manipulation and analysis, NumPy is a library for numerical computation, and
Matplotlib is a library for data visualization.
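As a minimal sketch of how two of these libraries fit together (the student labels and scores are made-up example data, and Matplotlib is left out so the snippet runs without a display), NumPy does the numerical work on an array and pandas wraps the same values in a labelled table that can be filtered:

```python
import numpy as np
import pandas as pd

# NumPy: fast, vectorised arithmetic on arrays (example scores are made up)
scores = np.array([72, 85, 90, 66, 78])
print("mean score:", scores.mean())
print("standard deviation:", scores.std())

# pandas: the same numbers as a labelled table that can be filtered by condition
df = pd.DataFrame({"student": ["A", "B", "C", "D", "E"], "score": scores})
top = df[df["score"] > 75]  # keep only rows where the score is above 75
print(top)
```

NumPy works on raw arrays, while pandas adds column names and row labels on top of them, which is what makes filtering by a condition like `score > 75` read naturally.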
Introduction to Pandas
• What is Pandas?
Pandas is a Python library used for working with data sets.
It has functions for analyzing, cleaning, exploring, and manipulating data.
The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.
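A minimal sketch of the explore-and-manipulate workflow described above, assuming a small hypothetical CSV (the station names and readings are invented) held in a string so the example needs no file on disk:

```python
import io
import pandas as pd

# Hypothetical CSV data standing in for a real file on disk
csv_text = """station,temperature,humidity
S1,31,40
S2,33,70
S3,38,25
S4,30,80
"""

df = pd.read_csv(io.StringIO(csv_text))  # load the data set
print(df.head())       # exploring: first rows
print(df.dtypes)       # exploring: inferred column types
print(df.describe())   # analyzing: count, mean, std, min, quartiles, max

hot = df[df["temperature"] > 32]                  # manipulating: filter rows
df["temp_f"] = df["temperature"] * 9 / 5 + 32     # manipulating: add a derived column
print(hot)
print(df)
```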
• Why Use Pandas?
Pandas allows us to analyze large data sets and draw conclusions based on statistical theory.
Pandas can clean messy data sets, and make them readable and relevant.
Relevant data is very important in data science.
• What Can Pandas Do?
Pandas can answer questions about the data, such as:
Is there a correlation between two or more columns?
What is the average value?
What is the maximum value?
What is the minimum value?
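Each of these questions maps onto a single pandas call. In this sketch the `hours` and `score` columns are hypothetical example data; `DataFrame.corr()` computes the Pearson correlation between every pair of numeric columns by default:

```python
import pandas as pd

# Hypothetical data: hours studied and exam score for five students
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 58, 65, 70, 78],
})

print(df.corr())            # Is there a correlation between the columns?
print(df["score"].mean())   # What is the average value?
print(df["score"].max())    # What is the maximum value?
print(df["score"].min())    # What is the minimum value?
```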
Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the
data.
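A minimal sketch of that cleaning step, assuming a small hypothetical table in which one row is missing an age and another is missing a name. `dropna()` returns a new DataFrame without every row that contains at least one missing value, leaving the original untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical records with missing values (None / NaN)
df = pd.DataFrame({
    "name": ["Asha", "Ravi", None, "Meera"],
    "age": [21, np.nan, 23, 22],
})

cleaned = df.dropna()  # drop rows containing any missing value
print(cleaned)
```

Rows with wrong but non-empty values (for example, a negative age) are not caught by `dropna()`; they need an explicit filter such as `df[df["age"] >= 0]`.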
Correlation and its properties
Correlation is a statistical measure of the relationship between two variables. Correlation methods
summarize that relationship in a single unitless number called the correlation coefficient (most commonly Pearson's r, which measures linear association).
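Pearson's r divides the covariance of X and Y by the product of their standard deviations, $r = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2}}$. The sketch below computes it directly from that definition (the helper name `pearson_r` and the sample values are illustrative, not from any library) and compares the result with NumPy's built-in `np.corrcoef`:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: sum of products of deviations, divided by
    the square root of the product of the summed squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of X
    dy = y - y.mean()  # deviations from the mean of Y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y))
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in, for comparison
```

Because the units of X and Y appear in both the numerator and the denominator, they cancel, which is why the coefficient is unitless.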
1. Coefficient of Correlation lies between -1 and +1:
The coefficient of correlation cannot take a value less than -1 or greater than +1. Symbolically,
$-1 \le r \le +1$, or equivalently $|r| \le 1$.
2. Coefficients of Correlation are independent of Change of Origin:
This property means that if we subtract a constant from all the values of X, and a (possibly different) constant from all the values of Y, the coefficient of correlation does not change.
3. Coefficients of Correlation possess the property of symmetry:
The degree of relationship between two variables is symmetric, so the correlation of X with Y equals the correlation of Y with X:
$r_{XY} = r_{YX}$
4. If two variables X and Y are independent, coefficient of correlation between them will be zero.
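The four properties can be checked numerically. This sketch uses randomly generated data (the seed, sample sizes and shift constants are arbitrary choices) together with `np.corrcoef`. It also illustrates a caveat worth knowing: the converse of property 4 does not hold, because a zero coefficient only rules out a linear relationship. For example, Y = X² on values symmetric around zero gives r = 0 even though Y depends entirely on X.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility

# Correlated data: y depends linearly on x plus noise
x = rng.normal(size=500)
y = 2 * x + rng.normal(size=500)
r_xy = np.corrcoef(x, y)[0, 1]

# Property 1: r lies between -1 and +1
print("r in [-1, 1]:", -1.0 <= r_xy <= 1.0)

# Property 2: change of origin (subtracting constants) leaves r unchanged
r_shifted = np.corrcoef(x - 10.0, y - 3.5)[0, 1]
print("origin-invariant:", np.isclose(r_xy, r_shifted))

# Property 3: symmetry, r_XY equals r_YX
r_yx = np.corrcoef(y, x)[0, 1]
print("symmetric:", np.isclose(r_xy, r_yx))

# Property 4: independent variables give r near zero
# (exactly zero in the population; a finite sample only gets close)
a = rng.normal(size=100_000)
b = rng.normal(size=100_000)
r_indep = np.corrcoef(a, b)[0, 1]
print("independent r:", round(r_indep, 4))

# Caveat: r = 0 does not imply independence (t**2 depends fully on t)
t = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
r_nonlinear = np.corrcoef(t, t ** 2)[0, 1]
print("r for y = t**2:", round(r_nonlinear, 10))
```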
Predictive Modeling
Algorithm Selection
Conclusion
Data science is a rapidly growing field and has become an important
part of almost every sector. It helps organizations meet ever-increasing
demand and plan for a sustainable future. As the importance of data
science grows day by day, so does the need for data scientists. Data
scientists are therefore central to the future, and they must be capable
of providing effective solutions to the challenges of every field. To do
this, they need the right resources and systems to help them achieve
their goals.
Thank You