
Data Science

Uploaded by

Durjoy Saha

Data Analysis Tools and Libraries
Data

Data: Data refers to raw and unprocessed facts and figures. It can take the form of numbers, text,
images, or any other representation of facts. Data by itself does not carry any meaning and needs to be
processed to become useful.

Example: A list of temperatures recorded hourly (e.g., 25°C, 26°C, 24°C) is data. Each individual
temperature reading is a piece of data.
Information

Information: Information is data that has been processed, organized, or structured in a meaningful
way to convey a message or provide insights. It represents knowledge and has context, making it
valuable for decision-making.

Example: If we take the list of temperatures mentioned earlier and calculate the average temperature
for the day (25°C for the three readings shown), the average temperature becomes information. It
provides a meaningful summary of the data. Here is another example: a student rating chart.
[Figure: Student Rating chart. Students: Akash, Ahnaf, Sihan, Moajjim; subjects: Math, AI, CS; rating axis from 0 to 6.]
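
The step from data to information can be sketched in a few lines of Python. This is a minimal illustration, assuming the three readings from the temperature example are the whole data set; the names readings_c and summary are made up for this sketch.

```python
from statistics import mean

# Raw data: temperature readings in °C (the three values from the example).
readings_c = [25, 26, 24]

# Processing the data yields information: a short summary of the day.
summary = {
    "average_c": mean(readings_c),
    "min_c": min(readings_c),
    "max_c": max(readings_c),
}

print(summary)
```

The list on its own is data; the summary dictionary is information, because it answers a question about the day (how warm it was on average and within what range).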
Database

Database: A database is a structured collection of data that is organized and stored in a way that
allows for efficient retrieval, updating, and management. It typically consists of tables, relationships,
and a management system to facilitate easy access and manipulation of data.

Example: Consider a database for a library. It might have tables for books, authors, and borrowers.
Each table contains data, such as book titles, author names, and borrower information. The database
structure allows for efficient querying and retrieval of information, such as finding all books by a
specific author or tracking borrowed books.
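
The library example can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the column names, the sample rows, and the extra loans table (used here to record which borrower has which book) are hypothetical and not taken from the slides.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE books (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES authors(id)
);
CREATE TABLE borrowers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Assumed helper table linking books to the borrowers who hold them.
CREATE TABLE loans (
    book_id INTEGER NOT NULL REFERENCES books(id),
    borrower_id INTEGER NOT NULL REFERENCES borrowers(id)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO authors (id, name) VALUES (?, ?)",
                 [(1, "Author A"), (2, "Author B")])
conn.executemany("INSERT INTO books (id, title, author_id) VALUES (?, ?, ?)",
                 [(1, "Book One", 1), (2, "Book Two", 1), (3, "Book Three", 2)])
conn.execute("INSERT INTO borrowers (id, name) VALUES (1, 'Borrower X')")
conn.execute("INSERT INTO loans (book_id, borrower_id) VALUES (3, 1)")


def books_by_author(name):
    """Return the titles of all books written by the given author."""
    rows = conn.execute(
        "SELECT b.title FROM books AS b "
        "JOIN authors AS a ON a.id = b.author_id "
        "WHERE a.name = ? ORDER BY b.title",
        (name,),
    ).fetchall()
    return [title for (title,) in rows]


def borrowed_books():
    """Return (title, borrower name) pairs for books currently on loan."""
    return conn.execute(
        "SELECT b.title, r.name FROM loans AS l "
        "JOIN books AS b ON b.id = l.book_id "
        "JOIN borrowers AS r ON r.id = l.borrower_id "
        "ORDER BY b.title"
    ).fetchall()


print(books_by_author("Author A"))
print(borrowed_books())
```

The relationships between tables (author_id, book_id, borrower_id) are what make queries such as "all books by a specific author" possible without duplicating author details in every book row.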
Database Types

1. Relational Databases
Description: Store data in tables with rows and columns. Use SQL (Structured Query Language)
for querying.
Examples:
• MySQL
• PostgreSQL
• SQLite
• Microsoft SQL Server
• Oracle
Microsoft Access is a desktop relational database system.

2. NoSQL Databases
Description: Non-relational databases designed for flexibility and scalability. Data is often
stored in key-value pairs, documents, or graphs.
Examples:
• MongoDB
• Firebase Realtime Database
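
The difference between the two families can be sketched with plain Python data structures. This is not the API of any real database; the records, field names, and the get_student helper are hypothetical and only illustrate the shape of the data.

```python
import json

# Relational style: a fixed set of columns, and every row has the same shape.
columns = ("id", "name", "age")
rows = [
    (1, "John Doe", 21),
    (2, "Jane Roe", 23),
]

# Document style: each record describes itself and may have different or nested fields.
documents = [
    {"_id": 1, "name": "John Doe", "age": 21},
    {"_id": 2, "name": "Jane Roe", "courses": ["Math", "AI"]},
]

# Key-value style: a value (here a JSON string) is stored and fetched by its key.
kv_store = {f"student:{doc['_id']}": json.dumps(doc) for doc in documents}


def get_student(key):
    """Look up a student by key; return None if the key is absent."""
    raw = kv_store.get(key)
    return json.loads(raw) if raw is not None else None


print(get_student("student:2"))
```

The relational rows depend on a schema agreed in advance, while the documents can vary from record to record, which is the flexibility the NoSQL description refers to.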
SQL (Structured Query Language)

SQL (Structured Query Language) is a standard programming language used for managing and
manipulating relational databases. It allows users to query, insert, update, and delete data in a
database, as well as manage database structures such as tables and indexes.

Key Functions of SQL:


1. Data Querying: Retrieve data from one or more tables using the SELECT statement.
• Example: SELECT * FROM students;
2. Data Insertion: Insert new records into tables using the INSERT statement.
• Example: INSERT INTO students (name, age) VALUES ('John Doe', 21);
3. Data Updating: Modify existing data with the UPDATE statement.
• Example: UPDATE students SET age = 22 WHERE name = 'John Doe';
4. Data Deletion: Remove records using the DELETE statement.
• Example: DELETE FROM students WHERE name = 'John Doe';
5. Database Management: Create, modify, and delete database structures using CREATE, ALTER, and DROP
statements.
• Example: CREATE TABLE students (id INT, name VARCHAR(100), age INT);
6. Filtering, Sorting, and Grouping: Filter rows with WHERE, sort them with ORDER BY, and combine
them into groups with GROUP BY.
• Example: SELECT * FROM students WHERE age > 20 ORDER BY name;
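
The statements above can be tried from Python using the built-in sqlite3 module. This is a minimal sketch, assuming a students table that also has an age column (the INSERT and UPDATE examples use one); the second student, "Alice Roe", is a made-up row added so the filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
cur = conn.cursor()

# Database management: create the table (age column assumed for this sketch).
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name VARCHAR(100), age INT)")

# Data insertion, using placeholders instead of pasting values into the SQL text.
cur.execute("INSERT INTO students (name, age) VALUES (?, ?)", ("John Doe", 21))
cur.execute("INSERT INTO students (name, age) VALUES (?, ?)", ("Alice Roe", 19))

# Data updating.
cur.execute("UPDATE students SET age = 22 WHERE name = ?", ("John Doe",))

# Filtering and sorting.
older = cur.execute(
    "SELECT name, age FROM students WHERE age > 20 ORDER BY name"
).fetchall()

# Data deletion, then a query to see what is left.
cur.execute("DELETE FROM students WHERE name = ?", ("John Doe",))
remaining = cur.execute("SELECT name FROM students").fetchall()
conn.commit()

print(older)
print(remaining)
```

The ? placeholders let the database driver handle quoting, which avoids both quote-character mistakes and SQL injection.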
Data Analysis Tools

[Figure: Data science tools: Python, SQL, R, Jupyter Notebooks, Power BI, Tableau, Excel]
Data Analysis Tools

Commonly used data analysis software/Tools:


1. Python - Python is a versatile programming language widely used for data analysis tasks. Its
libraries like Pandas, NumPy, and Matplotlib are extensively employed for data manipulation,
numerical computing, and data visualization, respectively.
2. R - R is a programming language and environment specifically designed for statistical computing
and graphics. It is commonly used for tasks such as statistical analysis, data visualization, and
machine learning modeling.
3. MATLAB - MATLAB is a high-level programming language and interactive environment used
primarily for numerical computing and visualization. It is frequently utilized in engineering and
scientific research for tasks including data analysis, algorithm development, and simulation.
4. Tableau - Tableau is a data visualization tool that allows users to create interactive
and shareable dashboards and reports. It is widely used for exploring and presenting data insights
in a visually appealing manner.
5. Excel - Microsoft Excel is a spreadsheet application with built-in tools for data analysis, such as
functions, pivot tables, and charting capabilities. It is commonly used for tasks ranging from
simple data manipulation to complex statistical analysis.
6. SPSS - SPSS (Statistical Package for the Social Sciences) is a software package used for statistical
analysis. It provides a wide range of tools for data manipulation, descriptive statistics, hypothesis
testing, and data visualization.
7. Jupyter Notebook - Jupyter Notebook is an open-source web application that allows users to
create and share documents containing live code, equations, visualizations, and narrative text. It is
commonly used for interactive data analysis, exploratory data analysis, and collaborative research.
Google Colab and Kaggle provide hosted notebook environments based on Jupyter.
8. Power BI - Power BI is a business analytics service by Microsoft that provides interactive data
visualization and business intelligence capabilities. It allows users to connect to various data
sources, create interactive dashboards, and share insights across an organization.
Python and its uses

Python is a versatile programming language with a wide range of applications across various
domains. Some of the key application fields of Python include:

1. Web Development: Python is widely used for developing web applications and websites.
Frameworks like Django and Flask simplify the process of building robust and scalable web
applications.

2. Data Science and Machine Learning: Python is one of the most popular languages for data science
and machine learning. Libraries such as NumPy, Pandas, Matplotlib, and scikit-learn make it easy to
perform data analysis, visualization, and machine learning tasks.

3. Artificial Intelligence: Python is commonly used in AI applications. Frameworks like TensorFlow
and PyTorch are extensively used for developing deep learning models and neural networks.

4. Automation and Scripting: Python's simplicity and readability make it an excellent choice for
writing scripts and automating repetitive tasks. It is often used in system administration, network
programming, and automation of various processes.

5. Game Development: Python is used in game development, both for building full-fledged games
and for creating game-related tools and scripts. Libraries like Pygame provide a framework for game
development.

6. Desktop GUI Applications: Python can be used to develop graphical user interface (GUI)
applications using libraries like Tkinter, PyQt, and Kivy. This makes it suitable for creating desktop
applications with a graphical interface.
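
As a small illustration of the automation and scripting use case, the sketch below groups file names by extension, a typical first step when tidying a folder. The file names and the group_by_extension helper are hypothetical; the script works on names only and does not touch the file system.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_extension(filenames):
    """Group file names by lowercase extension; names without one go under ''."""
    groups = defaultdict(list)
    for name in filenames:
        groups[PurePath(name).suffix.lower()].append(name)
    return dict(groups)


files = ["report.PDF", "data.csv", "notes.txt", "sales.csv", "README"]
grouped = group_by_extension(files)

for extension, names in sorted(grouped.items()):
    print(repr(extension), names)
```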
Python Libraries

Python has a vast ecosystem of libraries that cater to different domains and purposes. Here are
some important and commonly used libraries in Python:
1. NumPy: Fundamental package for scientific computing with support for large, multi-dimensional
arrays and matrices, along with a collection of mathematical functions.
2. Pandas: Provides high-performance, easy-to-use data structures and data analysis tools. It's
particularly useful for working with structured data and time series data.
3. Matplotlib: Comprehensive library for creating static, animated, and interactive visualizations
in Python. It is often used for plotting graphs, charts, histograms, and other types of
visualizations.
4. SciPy: Collection of mathematical algorithms and functions built on top of NumPy. It includes
modules for optimization, integration, interpolation, linear algebra, signal processing, and more.
5. scikit-learn: Simple and efficient tools for data mining and data analysis. It provides various
algorithms and tools for machine learning tasks such as classification, regression, clustering,
dimensionality reduction, and model selection.
6. TensorFlow: Open-source deep learning framework developed by Google. It provides a
comprehensive ecosystem of tools, libraries, and community resources for building and deploying
machine learning models, particularly deep neural networks.
7. PyTorch: Deep learning framework originally developed by Facebook's AI Research lab (now Meta AI). It is known for its
dynamic computational graph and ease of use, making it popular among researchers and
practitioners in the deep learning community.
8. Keras: High-level neural networks API. Earlier versions ran on top of TensorFlow, Theano, or
Microsoft Cognitive Toolkit (CNTK); Keras 3 supports TensorFlow, JAX, and PyTorch as backends. It
provides a simple and consistent interface for building and training deep learning models.
9. NLTK (Natural Language Toolkit): Library for natural language processing (NLP) tasks such as
tokenization, stemming, part-of-speech tagging, parsing, and semantic reasoning.
10. OpenCV: Open-source computer vision library with a wide range of functions for image and video
processing, including object detection, face recognition, feature extraction, and more.
11. Requests: HTTP library for making HTTP requests in Python. It simplifies the process of sending
HTTP requests and handling responses, making it easy to work with web APIs and web services.
12. Django: High-level web framework for building web applications in Python. It follows the
model-template-view (MTV) pattern, its variant of model-view-controller (MVC), and provides features
such as ORM (Object-Relational Mapping), authentication, URL routing, and templating.
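
To make the NumPy entry concrete, here is a minimal sketch of array-based computation. It assumes NumPy is installed (pip install numpy); the temperature values and variable names are made up for the illustration.

```python
import numpy as np

# Two days of temperature readings in °C, one row per day (hypothetical values).
temps = np.array([[25.0, 26.0, 24.0],
                  [27.0, 28.0, 29.0]])

# Aggregate along each row: one mean per day.
daily_mean = temps.mean(axis=1)

# Element-wise arithmetic on the whole array, with no explicit loop.
temps_f = temps * 9 / 5 + 32

# Boolean mask: which readings are above 26 °C?
warm = temps > 26

print(daily_mean)
print(temps_f)
print(int(warm.sum()), "readings above 26 °C")
```

Operating on whole arrays at once, rather than looping over individual values, is the main reason NumPy underlies libraries such as Pandas and SciPy.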
Cloud Computing

Cloud computing is a technology that enables users to access and store data, run applications, and
utilize computing resources over the internet, without the need for local hardware or
infrastructure. It involves delivering various services, including servers, storage, databases,
networking, software, and analytics, over the internet.

Some features of cloud computing:


• Flexibility: Cloud platforms offer various services and deployment models.
• Automated Management: Cloud providers often offer tools for automated resource management.
• Security: Cloud services typically implement robust security measures.
• Collaboration: Cloud-based applications facilitate collaboration among users by providing
real-time access to shared documents, communication tools, and project management platforms.
