
A Report Submitted in Partial Fulfillment of The Requirement of The Award of Degree of

This internship report details Keerthi R M's experience at AK Infopark Private Limited, focusing on Python for Data Science over a week in June 2024. The report covers various topics including Python basics, operators, and libraries like NumPy and Pandas, emphasizing their application in real-time data science projects. It highlights the importance of adaptability and teamwork, concluding with recommendations for future marketing strategies based on insights gained during the internship.

Uploaded by

Sathiya Kumar

INTERNSHIP REPORT

A report submitted in partial fulfillment of the requirement of the award of degree of

MASTER OF SCIENCE IN MATHEMATICS

By

KEERTHI R M

Reg. No.: 23083076511012007

(Duration: 12th June to 18th June, 2024)

DEPARTMENT OF MATHEMATICS

GOVERNMENT ARTS AND SCIENCE COLLEGE

KANYAKUMARI – 629 401


DATA SCIENCE WITH PYTHON

Submitted by

KEERTHI R M

Reg. No.: 23083076511012007

OCTOBER 2024

ABSTRACT

This report presents a comprehensive analysis of my internship experience at AK INFOPARK PRIVATE LIMITED, PARVATHIPURAM, a leading firm in the sector. The primary focus of the internship was to understand and learn about various software packages through a hands-on approach. I worked on Python for Data Science.

This internship provided an invaluable opportunity to apply theoretical knowledge acquired in academic studies to real-world scenarios. I participated in several learning sessions, notably on Python, Python operators, working with NumPy and Pandas, data science in real-time applications, data visualization and data science components. I used various analytical tools that are valuable for job opportunities.

The internship underscored the importance of adaptability, teamwork, and continuous learning to keep up with the latest developments in Python and data science. This report details the project undertaken, the skills developed and the lessons learned throughout the internship, which helped me develop my skills in Python and data science. It concludes with recommendations for future marketing endeavors based on the insights gained.


TABLE OF CONTENTS

CHAPTER    TITLE

           INTRODUCTION
1          PYTHON
2          PYTHON OPERATORS
3          WORKING WITH NUMPY, PANDAS
4          DATA SCIENCE IN REALTIME APPLICATION
5          DATA VISUALIZATION
6          DATA SCIENCE COMPONENTS
           CONCLUSION

INTRODUCTION

A program is a sequence of instructions that specifies how to perform a computation. The computation might be something mathematical, such as solving a system of equations or finding the roots of a polynomial, but it can also be a symbolic computation, such as searching and replacing text in a document, or something graphical, like processing an image or playing a video. Python is a powerful and versatile programming language that has become increasingly popular in the field of data science. With its simple syntax and vast array of libraries and tools, Python has made it easier for data scientists to manipulate and analyze data, build predictive models and make data-driven decisions. In this report, we will explore how Python is used in data science, as well as some of the key libraries and tools that data scientists use to perform their work.

Python is favored by data scientists for its flexibility and ease of use. It is a high-level programming language that is both easy to learn and easy to read, making it ideal for data scientists who may not have a strong background in programming. Python also offers a wide range of libraries and tools that are specifically designed for data analysis and machine learning, such as NumPy, Pandas, Matplotlib and scikit-learn. These libraries allow data scientists to easily manipulate and visualize data, as well as build and evaluate predictive models.

NumPy is a fundamental package for scientific computing with Python, providing support for large, multi-dimensional arrays and matrices, as well as a variety of mathematical functions to operate on these arrays. Pandas is a powerful data manipulation library that offers data structures like data frames and series, which allow data scientists to easily work with structured data. Matplotlib is a plotting library that enables data scientists to create a wide variety of visualizations, such as line plots, scatter plots and histograms. Scikit-learn is a machine learning library that provides a wide range of algorithms for classification, regression, clustering and more.

As the field of data science continues to grow and evolve, Python will likely remain a key programming language for data scientists around the world.

CHAPTER 1

PYTHON

Python is a popular programming language. It was created by

Guido van Rossum, and released in 1991. It is used for web development

(server-side), software development, mathematics and system scripting.

Python’s popularity in data science is largely attributed to its readability,

ease of learning, and the powerful libraries it provides. These libraries

enable data manipulation, statistical analysis and machine learning,

making Python an invaluable tool for data scientists.

Key Libraries and Tools

Pandas: A library providing high-performance data manipulation and analysis. It introduces data structures like DataFrames that simplify data handling and preparation.

NumPy: This library offers support for arrays and matrices, along with a collection of mathematical functions to operate on these arrays. It forms the backbone for many scientific computations in Python.

In Python we have lists, which serve the purpose of arrays, but they are slow to process. NumPy aims to provide an array object that is up to 50x faster than traditional Python lists. The array object in NumPy is called ndarray, and it comes with many supporting functions that make working with it easy. Arrays are used very frequently in data science, where speed and resources are very important.
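To make the difference between a list and an ndarray concrete, here is a minimal sketch. The variable names and values are illustrative assumptions, not taken from the internship material. It shows the same doubling step written as a list loop and as a single vectorized NumPy expression.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]              # a plain Python list (illustrative values)
doubled_list = [p * 2 for p in prices]   # element-by-element loop in Python

arr = np.array(prices)                   # the same data stored as an ndarray
doubled_arr = arr * 2                    # one vectorized operation, no explicit loop

print(type(arr).__name__, arr.dtype, arr.shape)
print(doubled_arr)
```

Both approaches give the same numbers; the ndarray version pushes the loop into NumPy's compiled code, which is where the speed advantage on large arrays comes from.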

Matplotlib and Seaborn: These libraries are used for data visualization. Matplotlib offers a wide range of plotting options, while Seaborn provides a high-level interface for drawing attractive and informative statistical graphics.

Scikit-learn: A machine learning library that includes simple and efficient tools for data mining and data analysis. It supports various algorithms for classification, regression, clustering and dimensionality reduction.

TensorFlow and PyTorch: These libraries are used for deep learning. TensorFlow, developed by Google, and PyTorch, developed by Facebook, are popular for building and training neural networks.

Workflow in Python

Data Collection: Data can be collected from various sources, including databases, APIs and web scraping. Libraries like requests and Beautiful Soup are commonly used for these tasks.

Data Cleaning and Preparation: Data often needs to be cleaned and transformed before analysis. Pandas is particularly useful for handling missing values, filtering data and merging datasets.

Exploratory Data Analysis (EDA): EDA involves summarizing the main characteristics of a dataset. This step helps in understanding the data distribution and uncovering patterns.

Model Building: Using libraries like Scikit-learn or TensorFlow, data scientists build and train models to make predictions or classify data. This involves selecting algorithms, training the model, and tuning hyperparameters.
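The cleaning, exploration and modelling steps can be sketched on a tiny data set. Everything below is an illustrative assumption: the column names and values are made up, and NumPy's least-squares line fit (np.polyfit) stands in for a Scikit-learn or TensorFlow model to keep the sketch short.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing value
raw = pd.DataFrame({
    "hours": [1.0, 2.0, 3.0, 4.0, 5.0],
    "score": [52.0, 54.0, None, 58.0, 60.0],
})

# Data cleaning: drop the row that has a missing score
clean = raw.dropna()

# Exploratory data analysis: summary statistics and correlation
print(clean.describe())
print(clean["hours"].corr(clean["score"]))

# Model building (stand-in): fit score = slope * hours + intercept
slope, intercept = np.polyfit(clean["hours"], clean["score"], deg=1)

# Use the fitted line to estimate the missing score at hours = 3
predicted = slope * 3.0 + intercept
print(round(predicted, 2))
```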

PYTHON BASICS

Python is an interpreted, high-level programming language known for its

simplicity and readability. Python uses indentation to define code blocks,

making it easy to read and understand.

VARIABLES AND DATA TYPES

Variables store data in memory and are assigned using the

assignment operator “=”. Common data types in Python include integers,

floats, strings, lists, tuples, dictionaries, and sets.
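As a short illustration of the data types listed above, the following sketch assigns one value of each type and prints its type name. All names and values are hypothetical.

```python
count = 10                        # int
ratio = 2.5                       # float
name = "Python"                   # str
scores = [90, 85, 77]             # list: ordered and mutable
point = (3, 4)                    # tuple: ordered and immutable
ages = {"Asha": 22, "Ravi": 23}   # dict: maps keys to values
tags = {"numpy", "pandas"}        # set: unique elements, no order

for value in (count, ratio, name, scores, point, ages, tags):
    print(type(value).__name__, value)
```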

CONTROL STRUCTURES

Conditional statements like ‘if’, ‘elif’ and ‘else’ allow a program to make decisions based on conditions. Loops like ‘for’ and ‘while’ can be used for iteration and repetitive tasks.
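A minimal sketch of these control structures follows. The marks and grade thresholds are illustrative assumptions.

```python
marks = [82, 55, 31]   # hypothetical marks out of 100
grades = []

# for loop with if / elif / else: classify each mark
for mark in marks:
    if mark >= 75:
        grades.append("distinction")
    elif mark >= 40:
        grades.append("pass")
    else:
        grades.append("fail")
print(grades)

# while loop: add 1 + 2 + 3 + ... until the running total reaches 100
n = 0
total = 0
while total < 100:
    n += 1
    total += n
print(n, total)
```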

FUNCTIONS

Functions are blocks of reusable code that perform a specific

task. Functions can take arguments as input and return values as output.
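For instance, a function that takes arguments and returns a value might look like the sketch below. The function name and its default parameter are hypothetical.

```python
def mean(values, default=0.0):
    """Return the arithmetic mean of values, or default for an empty sequence."""
    if not values:
        return default
    return sum(values) / len(values)

print(mean([2, 4, 6]))   # called with one argument
print(mean([]))          # falls back to the default value
```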

MODULES AND PACKAGES

Python modules are files that contain Python code. Modules are loaded with the import statement. Packages are directories containing multiple modules and a special file called __init__.py.
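The common forms of the import statement can be sketched with the standard library alone. The modules below (math, statistics, json) were chosen for illustration and are not part of the internship material; json happens to be organized as a package.

```python
import math                      # import a whole module
from statistics import median    # import a single name from a module
import json                      # json is a package (a directory with __init__.py)

print(math.sqrt(16))
print(median([3, 1, 2]))
print(json.dumps({"library": "numpy"}))

# Packages expose a __path__ attribute; plain modules do not
print(hasattr(json, "__path__"), hasattr(math, "__path__"))
```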

FILE I/O

Python provides built-in functions for reading from and writing to files. Use open() to open a file and read() or write() to manipulate file contents.
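A minimal sketch of writing and then reading a text file is shown below. The file name is hypothetical, and a temporary directory is used so the example leaves nothing behind in the working folder.

```python
import os
import tempfile

# Hypothetical file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# "w" opens the file for writing; the with-block closes it automatically
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

# "r" opens the file for reading
with open(path, "r", encoding="utf-8") as f:
    content = f.read()

lines = content.splitlines()
print(lines)
```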

OBJECT-ORIENTED PROGRAMMING

Python supports OOP principles like encapsulation, inheritance

and polymorphism. Classes are blueprints for creating objects, while

objects are instances of classes.
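The three principles can be sketched with a small, hypothetical class hierarchy. The class names are illustrative and do not come from the internship material.

```python
import math

class Shape:
    """Base class: keeps its name internal and defines a common interface."""
    def __init__(self, name):
        self._name = name   # leading underscore marks it as internal (encapsulation)

    def area(self):
        raise NotImplementedError("subclasses must implement area()")

    def describe(self):
        return f"{self._name} with area {self.area():.2f}"

class Rectangle(Shape):     # inheritance: Rectangle reuses Shape.describe()
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

# Polymorphism: the same describe() call works on every kind of shape
shapes = [Rectangle(2, 3), Circle(1)]
descriptions = [s.describe() for s in shapes]
print(descriptions)
```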

CHAPTER 2

PYTHON OPERATORS

Operators are standard symbols used to perform logical and arithmetic operations on variables and values, for example +, -, * and /. The value on which an operator is applied is called an operand. Python groups its operators into Arithmetic Operators, Assignment Operators, Comparison Operators, Logical Operators, Identity Operators, Membership Operators, and Bitwise Operators.

Python Arithmetic Operators

Arithmetic operators are used with numeric values to perform common

mathematical operations.

Operator    Name              Example

+           Addition          x + y
-           Subtraction       x - y
*           Multiplication    x * y
/           Division          x / y
%           Modulus           x % y
**          Exponentiation    x ** y
//          Floor division    x // y
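The arithmetic operators can be tried on a pair of sample integers. The values 17 and 5 are arbitrary choices for illustration.

```python
x, y = 17, 5

print(x + y)    # 22
print(x - y)    # 12
print(x * y)    # 85
print(x / y)    # 3.4  (true division always returns a float)
print(x % y)    # 2    (remainder)
print(x ** y)   # 1419857
print(x // y)   # 3    (floor division rounds down)

# With a negative operand, floor division still rounds toward minus infinity
print(-17 // 5)  # -4
print(-17 % 5)   # 3
```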

Python Assignment Operators

Assignment operators are used to assign values to variables.

Operator    Example    Same As

=           x = 5      x = 5
+=          x += 3     x = x + 3
-=          x -= 3     x = x - 3
*=          x *= 3     x = x * 3
/=          x /= 3     x = x / 3
%=          x %= 3     x = x % 3

Python Comparison Operators

Comparison operators are used to compare two values.

Operator    Name                        Example

==          Equal                       x == y
!=          Not equal                   x != y
>           Greater than                x > y
<           Less than                   x < y
>=          Greater than or equal to    x >= y

Python Logical Operators

Logical operators are used to combine conditional statements.

Operator    Description                                                 Example

and         Returns True if both statements are true                    x < 5 and x < 10
or          Returns True if one of the statements is true               x < 5 or x < 4
not         Reverses the result, returns False if the result is true    not(x < 5 and x < 10)
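A short sketch of the comparison and logical operators, using the same expressions as the tables, is given below. The values of x and y are arbitrary.

```python
x, y = 3, 8

# Comparison operators return a bool
print(x == y)   # False
print(x != y)   # True
print(x > y)    # False
print(x < y)    # True
print(x >= y)   # False

# Logical operators combine conditions
both = x < 5 and x < 10          # True: both conditions hold
either = x < 5 or x < 4          # True: at least one condition holds
negated = not (x < 5 and x < 10) # False: reverses a True result
print(both, either, negated)
```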

Python Identity Operators

Identity operators are used to compare the objects, not if they are

equal, but if they are actually the same object, with the same memory

location.

Operator    Description                                               Example

is          Returns True if both variables are the same object        x is y
is not      Returns True if both variables are not the same object    x is not y

Python Membership Operators

Membership operators are used to test whether a value is present in a sequence or other container object.

Operator    Description                                                             Example

in          Returns True if the specified value is present in the object           x in y
not in      Returns True if the specified value is not present in the object       x not in y
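The difference between equality and identity, and the use of the membership operators, can be sketched as follows. The lists and strings are illustrative.

```python
a = [1, 2, 3]
b = a            # a second name for the same list object
c = [1, 2, 3]    # equal contents, but a separate object

# Identity versus equality
print(a == c)      # True: same contents
print(a is c)      # False: different objects in memory
print(a is b)      # True: both names refer to one object
print(a is not c)  # True

# Membership in a list, a string and a dictionary (keys are tested)
print(2 in a)                        # True
print(5 not in a)                    # True
print("py" in "numpy")               # True
print("name" in {"name": "Keerthi"}) # True
```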

Python Bitwise Operators

Bitwise operators are used to compare (binary) numbers.

Operator    Name                    Description                                         Example

&           AND                     Sets each bit to 1 if both bits are 1               x & y
|           OR                      Sets each bit to 1 if one of two bits is 1          x | y
^           XOR                     Sets each bit to 1 if only one of two bits is 1     x ^ y
~           NOT                     Inverts all the bits                                ~x
<<          Zero fill left shift    Shifts left by pushing zeros in from the right      x << 2
                                    (Python integers are unbounded, so no leftmost
                                    bits are lost)
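The bitwise operators are easiest to follow on small binary literals. The operands below (12 and 10) are arbitrary choices for illustration.

```python
x, y = 0b1100, 0b1010   # 12 and 10 written in binary

print(format(x & y, "04b"))  # 1000  (bits set in both)
print(format(x | y, "04b"))  # 1110  (bits set in either)
print(format(x ^ y, "04b"))  # 0110  (bits set in exactly one)
print(~x)                    # -13   (for Python integers, ~x equals -x - 1)
print(x << 2)                # 48    (two zeros pushed in from the right)
```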

Examples

CHAPTER 3

WORKING WITH NUMPY AND PANDAS

NumPy and pandas are popular libraries in python that are

commonly used for data manipulation and data analysis.

NumPy provides support for multidimensional arrays and

mathematical functions.

Basic NumPy operations include importing NumPy, creating arrays, array indexing and array slicing. Array operations include basic element-wise math, matrix multiplication, reshaping and transposing.
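A brief sketch of the NumPy operations named above, on a small illustrative array:

```python
import numpy as np

a = np.arange(6)        # create a 1-D array: [0 1 2 3 4 5]
m = a.reshape(2, 3)     # reshape into 2 rows and 3 columns

print(m[1, 2])          # indexing: row 1, column 2
print(m[:, 1])          # slicing: every row, column 1
print(m * 10)           # element-wise multiplication
print(m.T.shape)        # transpose swaps rows and columns

product = m @ m.T       # matrix multiplication, (2x3) @ (3x2) gives 2x2
print(product)
```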

Pandas offers data structures like data frames and series that make it easy to work with structured data.

Basic Pandas operations include importing pandas, creating data frames and series, data selection and data filtering. Merging, joining, pivoting, reshaping, grouping and computing statistics are used for data manipulation.
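The Pandas operations listed above can be sketched on two small tables. All names, departments and numbers are made-up sample data.

```python
import pandas as pd

# Hypothetical sample data
students = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "dept": ["Maths", "Maths", "Physics", "Physics"],
    "marks": [80, 50, 70, 90],
})
fees = pd.DataFrame({"dept": ["Maths", "Physics"], "fee": [1000, 1200]})

marks = students["marks"]                    # selecting a column gives a Series
high = students[students["marks"] >= 60]     # filtering rows with a condition

merged = students.merge(fees, on="dept", how="left")   # merging two tables
avg = students.groupby("dept")["marks"].mean()         # grouping and statistics

print(high)
print(merged)
print(avg)
```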

NumPy and pandas are indispensable tools. Best practice includes using vectorized operations, optimizing data structures and visualizing data effectively.

Screenshots for NumPy & Pandas

CHAPTER 4

DATA SCIENCE IN REAL TIME APPLICATION

Data Science is the deep study of large quantities of data, which involves extracting meaning from raw, structured and unstructured data. This extraction relies on processing the data with statistical techniques, algorithms, scientific methods and a range of tools and technologies.

Data Science is applied in Finance (stock market prediction, credit scoring), Healthcare (patient monitoring, disease diagnosis), Marketing (customer segmentation) and IoT (sensor data analysis, predictive maintenance).

Among Python libraries, NumPy and Pandas are used for data manipulation, Scikit-learn for machine learning, TensorFlow or PyTorch for deep learning, and Matplotlib and Seaborn for visualization.

In real-time applications, data comes from streaming sources (Twitter, sensor data), API calls (weather, stock prices) and web scraping. Data processing involves data cleaning and preprocessing, feature extraction and selection, and data transformation.
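As a rough illustration of processing a stream of sensor readings as they arrive, the sketch below smooths the values with a rolling mean. The readings, the window size and the function name are all assumptions made for the example; a real application would read from a live source rather than a list.

```python
from collections import deque

def rolling_mean(readings, window=3):
    """Yield the mean of the most recent `window` readings as each one arrives."""
    recent = deque(maxlen=window)   # old readings drop out automatically
    for value in readings:
        recent.append(value)
        yield sum(recent) / len(recent)

# Hypothetical temperature readings, with one spike at 35.0
sensor_feed = [20.0, 22.0, 21.0, 35.0, 23.0]
smoothed = list(rolling_mean(sensor_feed))
print(smoothed)
```

The generator handles one reading at a time, so it never needs the whole stream in memory.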

Example:

import pandas as pd

import numpy as np

from sklearn.model_selection import train_test_split

from sklearn.linear_model import
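The example above stops at the imports. One way such a script might continue is sketched below, assuming the linear model is LinearRegression (the source does not say which class was imported) and using synthetic data in place of a real data set.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression   # assumed class, not named in the source
from sklearn.model_selection import train_test_split

# Synthetic data following y = 3x + 1 exactly
df = pd.DataFrame({"x": np.arange(1, 21, dtype=float)})
df["y"] = 3.0 * df["x"] + 1.0

X = df[["x"]]   # features must be 2-D: a DataFrame with one column
y = df["y"]

# Hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)   # R^2 on the held-out rows

print(model.coef_[0], model.intercept_, r2)
```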

CHAPTER 5

DATA VISUALIZATION

Data visualization represents data graphically to facilitate understanding and to identify trends, patterns and correlations. Among popular libraries, Matplotlib is used for 2D/3D plotting, Seaborn for statistical visualization, Plotly for interactive visualization, Bokeh for web-based visualization, and Pandas for data manipulation with built-in plotting.

BASIC PLOTS

Line plots (plt.plot())

Scatter plots (plt.scatter())

Bar charts (plt.bar())

Histograms (plt.hist())
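The four basic plot types can be drawn side by side in one figure. The sketch below makes several assumptions: the data values are invented, the Agg backend is selected so the script runs without a display, and the figure is saved to a temporary file instead of being shown. It uses the axes methods plot, scatter, bar and hist, which are the object-oriented counterparts of the plt functions listed above.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")   # non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]     # illustrative data
y = [2, 4, 1, 5, 3]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot(x, y)
axes[0, 0].set_title("Line plot")

axes[0, 1].scatter(x, y)
axes[0, 1].set_title("Scatter plot")

axes[1, 0].bar(x, y)
axes[1, 0].set_title("Bar chart")

axes[1, 1].hist(y, bins=5)
axes[1, 1].set_title("Histogram")

fig.tight_layout()
out_path = os.path.join(tempfile.mkdtemp(), "basic_plots.png")
fig.savefig(out_path)
print(out_path)
```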

In the real world, data visualization is applied in business intelligence, scientific research, machine learning and web analytics.

df = pd.read_csv("data.csv") loads data from a CSV file into a data frame.

BASIC PLOT

plt.plot(df["column"])

plt.show()

Data Visualization is a powerful tool for exploring and

communicating insights from data. Python provides a rich set of libraries

for creating a wide range of visualizations, from basic static plots to

interactive plots.

INPUT

OUTPUT

CHAPTER 6

DATA SCIENCE COMPONENTS

Data components are essential elements used for storing, organizing and manipulating data. These components are crucial for development and programming tasks, as they allow developers to work with various types of data efficiently.

Data science is an interdisciplinary field that uses scientific techniques, procedures, algorithms and systems to extract knowledge and insights from structured and unstructured data.

VARIABLES

Variables are used to store data values in memory. Variables are

created simply by assigning a value to a name. For example, a=10 creates

a variable named ‘a’ with a value of 10. Variables can store different

types of data such as integers, floats, strings, lists and dictionaries.

SETS

Sets are unordered collections of unique elements in Python. Sets do not allow duplicate values, and their elements are stored in no particular order. Sets are defined using curly braces {}; an empty set is written set(), since {} alone creates an empty dictionary.
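A short sketch of set behaviour follows. The subject names are illustrative.

```python
# The duplicate "algebra" is stored only once
subjects = {"algebra", "topology", "algebra", "analysis"}
electives = {"analysis", "statistics"}

print(len(subjects))                     # 3
print(sorted(subjects | electives))      # union
print(sorted(subjects & electives))      # intersection
print(sorted(subjects - electives))      # difference

empty_set = set()    # an empty set
empty_dict = {}      # empty braces create a dictionary, not a set
print(type(empty_set).__name__, type(empty_dict).__name__)
```

The results are passed through sorted() before printing because a set has no defined order.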


ARRAYS

Arrays in Python are data structures that store multiple values of the same type. Core Python has no built-in multidimensional array type (lists are general-purpose sequences, and the standard library's array module handles only one-dimensional typed arrays), but the NumPy library provides multidimensional arrays that are widely used in scientific computing and data analysis.

Examples:
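A minimal sketch, with illustrative values, contrasting a one-dimensional typed array from the standard library with a two-dimensional NumPy array:

```python
from array import array

import numpy as np

typed = array("i", [1, 2, 3])    # standard library: 1-D array of C integers
grid = np.array([[1, 2, 3],
                 [4, 5, 6]], dtype=np.int64)   # NumPy: 2-D array, one dtype

print(typed.typecode, list(typed))
print(grid.shape, grid.dtype)
print(grid.sum(axis=0))          # column sums
print(grid.sum(axis=1))          # row sums
```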

DATA FRAMES

Data frames are data structures commonly used in data analysis and manipulation tasks. Data frames are provided by the pandas library (which builds on NumPy arrays) and allow developers to work with tabular data in a versatile and efficient manner.
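The tabular structure of a data frame (row labels, named columns, one type per column) can be sketched as follows. The cities and temperatures are made-up sample values.

```python
import pandas as pd

# Hypothetical table with explicit row labels
table = pd.DataFrame(
    {"city": ["Nagercoil", "Chennai", "Madurai"],
     "temp_c": [31.5, 34.0, 36.2]},
    index=["r1", "r2", "r3"],
)

# Derive a new column from an existing one
table["temp_f"] = table["temp_c"] * 9 / 5 + 32

print(table.shape)                 # (rows, columns)
print(table.dtypes)                # one data type per column
print(table.loc["r2", "temp_f"])   # look up a cell by row label and column name
```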

PACKAGES

Packages are collections of Python modules that are organized in a directory hierarchy. Packages allow developers to structure their code in a more organized and maintainable way and provide a namespace for organizing related functionality.
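To show what a package looks like on disk, the sketch below builds a tiny one in a temporary directory and imports it. The package name mathtools_demo and its contents are hypothetical, invented only for this illustration.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "mathtools_demo")   # hypothetical package name
os.makedirs(pkg_dir)

# __init__.py marks the directory as a package and re-exports one function
with open(os.path.join(pkg_dir, "__init__.py"), "w", encoding="utf-8") as f:
    f.write("from .stats import mean\n")

# stats.py is an ordinary module inside the package
with open(os.path.join(pkg_dir, "stats.py"), "w", encoding="utf-8") as f:
    f.write("def mean(values):\n    return sum(values) / len(values)\n")

sys.path.insert(0, root)         # make the temporary directory importable
importlib.invalidate_caches()

import mathtools_demo
from mathtools_demo import stats

print(mathtools_demo.mean([1, 2, 3]))
print(stats.__name__)            # the module name is qualified by the package
```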

Each data component has its own characteristics and advantages, allowing developers to choose the most suitable data structure for the task.

ASSIGNMENTS

CONCLUSION

Python has emerged as the predominant language in the field of data science due to its flexibility, extensive libraries, and strong community support. Its simplicity and readability make it accessible to both beginners and experienced programmers, enabling data scientists to efficiently manipulate, analyze and visualize data. Python's libraries such as NumPy, pandas, and Matplotlib provide powerful tools for data manipulation and visualization, while frameworks like scikit-learn and TensorFlow enable the development of complex machine learning models. The language's versatility allows data scientists to seamlessly integrate different tools and technologies, making it an invaluable asset in tackling diverse data science problems. With Python, data scientists can easily process large data sets, derive meaningful insights, and build predictive models, making it an essential tool for anyone working in the field of data science.
