
Formative Assessment Report Advanced Programming University of York

Final client report


Section 1: Theory supported by code samples

Task (a)

Python threads can be applied in several parts of our program. Any task that is time consuming can be amended to run concurrently with threads. [1]

Although concurrency can benefit many aspects of our program, what we consider most important is parallelising graph generation. To do so, we modularise the process into distinct steps:

• Categorise the tasks: Our program currently has three graph-generation functions: a scatter plot, a pie chart and a bar graph.

• Define the functions to be threaded: Our program already contains three separate functions for graph generation: 1) generate_pie_chart, 2) generate_scatter_plot and 3) generate_bar_chart. These would remain the same.

• Create separate threads for the graphs: Creating a thread for each function allows us to later initiate those threads. The code for creating the threads can be seen in Appendix (1).

• start() and join() the threads: We start each function in a separate thread using start(), and then join() each thread so that the program awaits the completion of all threads before proceeding, as elaborated in the Python threading documentation. [2] The relevant code can be seen in Appendix (2).

The main issues that can arise when operating threads in our program are: 1) synchronisation and 2) data access. If multiple threads modify data simultaneously, this can result in what is called a race condition, where the program attempts to perform two or more operations at the same time. [3] Race conditions can result in errors or miscalculations, for example when one thread attempts to draw on a dataset that has not yet been updated while another thread is still performing the calculation. To avoid this, synchronisation remedies like locks can be used. Locks are multithreading tools that allow the efficient synchronisation of program tasks, their main states being "locked" and "unlocked". While the lock is held, access by other threads is restricted. [4]

To utilise a synchronisation lock, we have to define the lock at the beginning of our program, as seen in Appendix (3), and use it in all three of our graph generation functions. In Appendix (3), we also provide the necessary addition to the code of one of the three functions utilising a lock.
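A minimal sketch of the lock pattern, again with placeholder function bodies standing in for the real graph code:

```python
import threading

plot_lock = threading.Lock()  # one shared lock guarding the shared plotting state
drawn = []

def generate_pie_chart():
    # "with" acquires the lock and releases it even if the body raises
    with plot_lock:
        drawn.append("pie")

def generate_bar_chart():
    with plot_lock:
        drawn.append("bar")

threads = [threading.Thread(target=generate_pie_chart),
           threading.Thread(target=generate_bar_chart)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Only one thread at a time can execute inside the `with plot_lock:` block, which prevents the race condition described above.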

In our program, concurrency can be of substantial benefit in two parts, both marked in the flow chart in Appendix (4) of this chapter. The part we chose to select is Graph Generation, highlighted in bold.

The challenges facing the programmer while implementing threads can be hard to overcome, and might undermine the potential benefits of implementing concurrency in the code. It is important that we weigh the advantages and disadvantages of using threads against our code's functions, its resource intensity and its overall structure. [5]

Task (b)

The integration of various cognitive sciences into GUI layout has created a dynamic framework for analysing, preparing and presenting context-appropriate GUIs, and has contributed significantly to more engaging and inclusive best practices. [6]

With this in mind, our current GUI supports the following pre-specified functionalities:

• Load Data

• Clean Data

• Calculate Data

• Visualise Graphs (Pie Chart, Scatter Plot, Bar Chart)

• Save Data


Our GUI is created with several aspects of user interaction and convenience in
mind, the main characteristics being 1) Simplicity, 2) Functionality &
Consistency and 3) Aesthetics. Those principles are implemented and further
elaborated in the following paragraphs.

Our Program consists of:

• A general framework: The minimalistic framework of tkinter presents the user with easy-to-spot and easy-to-use functionalities, making the program straightforward.

• A main window: The window is appropriately resized so that it fits well with
the button layout. The relevant code can be seen in Appendix (5).

• Button widgets: The different buttons are positioned in a linear order of the
expected interaction. The buttons are responsive to hovering over and
clicking, and implement several aspects of colour theory. [7] The selected
colors are Maya Blue, Dark Cyan, and a light shade of red. The buttons are
tightly packed and are linked to a specific function within the main program.

• Making Calculations window: The output of the function calculate_statistics() has been replaced with a pop-up window in display_statistics(), as seen in Appendix (6). The pop-up window was created independently and is invoked at the end of the calculate_statistics() function, so that it does not appear before the user clicks, avoiding interruption of the experience and/or of other functionalities. [8]

• Graph Generation windows: Graph visualisations are also coded to appear in independent pop-up windows, for reasons of customisation and parallel overview. An example of the code for the pop-up display can be seen in Appendix (7). There are three separate buttons, each linked to a different type of visualisation. The graphs are based on classic Matplotlib and Seaborn, offering the distinct advantages of integration with Pandas and a wide variety of graph options. [9] The GUI could be further enhanced with interactive graphs for customisation/visualisation, or responsive grids. [10]
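The layout principles above might be sketched roughly as follows; the window title, geometry and hex colour values here are illustrative assumptions, not the program's actual values:

```python
import tkinter as tk

def build_main_window():
    """Sketch of a main window with buttons in the expected interaction order."""
    root = tk.Tk()
    root.title("Data Analysis Tool")   # assumed title
    root.geometry("360x300")           # resized to fit the button column

    actions = ["Load Data", "Clean Data", "Calculate Data",
               "Pie Chart", "Scatter Plot", "Bar Chart", "Save Data"]
    for label in actions:
        tk.Button(root, text=label,
                  bg="#73C2FB",                # approx. Maya Blue (assumption)
                  activebackground="#008B8B",  # Dark Cyan
                  command=lambda l=label: print(l)  # real buttons call program functions
                  ).pack(fill="x", padx=10, pady=3)
    return root

# build_main_window().mainloop()  # uncomment to run interactively
```

Packing the buttons with `fill="x"` keeps them tightly stacked in a single column, matching the linear order of expected interaction.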

The main challenge in creating this program was to construct a functional ensemble that is visually appealing and remains consistent while executing the various functions independently. Although the program doesn't use concurrency, it allows an acceptable degree of freedom in its order of execution, with warning messages or tips shown at the top of the screen when unexpected actions take place (such as trying to make calculations or generate graphs without cleaning the data first). The program remains faithful to best practices in UI design, as presented in various books and papers. [11]

A sample of the GUI framework is seen in Appendix (8).

Task (c)

Python and Java both rank among the most widely known and used programming languages. Their inceptions can be traced back to 1991 and 1995 respectively, and since then the two languages have been steadily developing and growing in popularity. [12] [13]

Although both languages have seen substantial change in their libraries and templates, Python can be argued to be the top choice among beginners and experienced data scientists alike. The main reason is simple: Python aims to do more with less:

• Less complicated syntax: Python's simplicity is among the highest, with many considering it the easiest programming language to learn. Python prides itself on its commitment to readability, resulting in less debugging time and simplified prototyping and testing. [12] [14] Although Java is considered faster than Python, Python can normally express the same program in fewer lines.

• More polished data visualisation: Python contains various libraries (like Seaborn or Plotly) that can produce data visualisations simply and efficiently. In our program, we used classic Matplotlib and Seaborn to produce various graph representations of our data with only a few lines of code. An example of data visualisation is shown in Appendix (11).

• More dedicated (Big) Data pipelines: Python has an excellent capacity for handling different sorts of data pipelines. In our program, the data pipeline flow can be traced through the following sequence: 1) Load Data >> 2) Clean Data with autosave >> 3) Isolate Graph Data for calculations >> 4) Save Data.

• More data cleaning: Python provides various functions to clean data efficiently, handling errors, inconsistencies and missing values.


• More data manipulation libraries: Python's immense number of libraries gives the language uncanny versatility, able to tackle all sorts of data analysis and other problems. Libraries like Pandas, NumPy, Matplotlib or Pendulum are among the top choices for programmers working with data manipulation. [15]

• More dynamic typing: Variable declaration in Python is easy and straightforward; unlike Java, declaring the variable type is not necessary. Examples of variable declaration can be seen in Appendix (9) and Appendix (10).

• More versatility: Python is estimated to have more than 140,000 available packages. [16] The language has evolved to cover a plethora of subjects, its lack of a single unified purpose offset by dedicated compartmentalisation. For this reason, more and more professionals are using Python for their own purposes, without needing expertise in the various other aspects/utilities of the language. [12]
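A small illustration of dynamic typing; the variable names are invented for the example:

```python
# Python: no type declaration needed; the interpreter tracks types at runtime
site_height = 75.0        # bound to a float
station_name = "BBC R4"   # bound to a str
site_height = "unknown"   # the same name may later hold a different type

# The Java equivalent would require explicit declarations, e.g.:
#   double siteHeight = 75.0;
#   String stationName = "BBC R4";
```

This is precisely the brevity trade-off described above: fewer declarations, at the cost of type errors only surfacing at runtime.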

Section 2: Design decision supported by code samples

Task (a)

Every data format has distinct advantages and disadvantages. Choosing one depends highly on the particularities of the program structure to be designed and implemented.

In our case, we selected JSON as the most appropriate format. JSON is used broadly for various purposes, from Marine Integrated Database Management [17] to Transformations in Cultural Heritage [18]. We selected it for several reasons:

• Advantage - our data are static: JSON's simplicity, compared to XML, makes writing and reading files more efficient; since the data are neither high-volume nor dynamic, there is no substantial need to fill the files with crowded syntax and slower parsing. [19] Our data backup method (saving and loading) can be seen in Appendix (12) and (13). An overview of our data format is also presented in a diagram in Appendix (14).

• Advantage - our API format is simple: The user-friendly API design in our software promotes JSON as the better option, due to its compact design and simplicity. It also offers both users and developers (in case of shared code) higher readability.

• Disadvantage - reading dates: A shortcoming of JSON compared to XML is XML's native support for dates which, given the client's needs for date processing, could make that processing more efficient. However, the Pandas function to_datetime() does an excellent job of handling the necessary date conversions. An example of the code used is seen in Appendix (15).

• Advantage - file size and security: One of the main reasons to prefer JSON over XML is its smaller file size and increased security. Although mitigating XML's security shortcomings is entirely feasible, JSON is the way to go for a secure data-structure environment. It is worth noting, however, that both JSON and XML can be subject to injection attacks. [20] [21]

Task (b)

The 3rd requirement was quite the challenge for a first-timer. The code used to implement the task is found in the function calculate_statistics(), which performs various calculations on pre-defined columns of the working_data DataFrame and presents the results in a pop-up window.

• Data cleaning: We defined the relevant columns in columns_to_reformat and converted the values to decimals, replacing commas with periods to ensure consistency, since the values mixed commas and dots as thousands and decimal separators. The method used is the .replace() method. [22] The data is then converted to float, for smoother calculations and a consistent format. The relevant code can be seen in Appendix (16).

• Defining and filtering the relevant columns: As seen in Appendix (17), Site_height_column values are converted to numbers using pd.to_numeric. The reason we selected pd.to_numeric over .astype() is that it better handles NaN values, via the error handling errors="coerce". Overall, pd.to_numeric is better at handling data of uncertain cleanness. [23] Similarly, the values in the date column are converted with .to_datetime. [24] Finally, the columns are filtered so that only rows with a Site Height over 75 and a date after 2001 are considered.

• Iterating and calculating: The function then iterates over the filtered_date columns, performing various calculations using Pandas built-in functions such as mode, mean and median, and concatenating the results into statistics_output, as seen in Appendix (18). [25]
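A sketch of the iterate-and-concatenate step with toy data; the column name is invented:

```python
import pandas as pd

filtered = pd.DataFrame({"Power": [4.0, 8.0, 8.0, 12.0]})

statistics_output = ""
for col in filtered.columns:
    # pandas built-ins; mode() can return several values, so take the first
    statistics_output += (
        f"{col}: mean={filtered[col].mean()}, "
        f"median={filtered[col].median()}, mode={filtered[col].mode()[0]}\n"
    )
```

The accumulated string can then be handed to a display function for the pop-up window.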

• Presenting the calculations: We pass statistics_output to the function display_statistics(), which displays the statistics in an independent pop-up window, for convenience of use and presentation. The relevant code can be seen in Appendix (19).

Task (c)

After extracting and cleaning the data, performing visualisations was relatively straightforward. We visualised the data in three different ways: a Pie Chart, a Scatter Plot, and a Bar Graph. For reasons of word limit, we elaborate only on the Pie Chart, which can be seen in Appendix (20).

Extraction of data and frequency of occurrence: We first extract the relevant columns into a separate 1D array, making them easier to manipulate, as seen in the code in Appendix (21). Since there are too many data points to present in the chart, we decided to present only the most frequently occurring elements, for better aesthetics and readability.

We do this using the classic Pandas count method .value_counts() [26]. We then calculate the percentage of occurrence by dividing each count by the number of elements in the DataFrame. The relevant code can be seen in Appendix (22).


Creating the actual data to be presented: We then identify the elements whose occurrence falls below a 2% threshold and remove them from the DataFrame, leaving only elements with an occurrence over 2%; the removed occurrences are added together under the "Other Stations" label. Appendix (23)

Visualising the data: We proceed with creating a new pie chart, adding the percentages, the labels (using label.index), and a percentage format. [27] Appendix (24)

Presenting in a pop-up window: Finally, we create a canvas, convert the figure to a widget, create a toolbar and pack everything up. (Appendix 25)

Why choose Matplotlib: The reason we selected Matplotlib for our visualisations is mainly familiarity. Matplotlib offers a great array of data visualisation tools, including static, animated and customisable graphs. Other viable (and perhaps better) alternatives are Seaborn [28] [9], Plotly [29] and PyQtGraph [30], each offering distinct advantages, especially for interactive visualisations.

Task (d)

In implementing task (d), our approach is twofold: 1) we first have to find a meaningful way to group/analyse string values, and 2) we must be able to find a meaningful correlation between them. Our approach of choice was to convert string elements into numerical/categorical data, and then compare their correlations and p-values to examine whether there is a statistically significant correlation between them. We explain both steps below.

Grouping/manipulating string values: We convert string elements into numerical/categorical data using the pd.Categorical [31] function, and use .cat.codes [32] to extract the integer codes from the categories formed. Although Pandas Categorical is not ideal for dynamic data tables, the fixed nature of our table renders it functional and just what we are looking for. Any non-numerical entries are filled with a value of -1. The relevant code can be seen in Appendix (26).
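A minimal sketch of the categorical encoding, with invented site names:

```python
import pandas as pd

df = pd.DataFrame({"Site": ["Alpha", "Beta", "Alpha", None]})

# Convert strings to fixed categories, then pull out their integer codes;
# missing values become -1, matching the fill value described above
codes = pd.Series(pd.Categorical(df["Site"]).codes)
```

Equal strings map to equal codes, which is what makes the correlation step possible on string columns.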


Calculating correlation: Using the data extracted from the table, we utilise the .corr() [33] [34] function to calculate the Pearson correlation [35] [36] for the numerical/categorical data, as seen in Appendix (27). Pearson correlation is a measure of linear correlation between data, ranging from -1 to 1. Our key measure for identifying statistical significance is the p-value, which is calculated for each pair of columns using nested loops and the pearsonr function from the SciPy library. [37] The relevant code can be seen in Appendix (28).
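A sketch of the correlation and p-value computation; the column names and the toy, perfectly correlated data are invented:

```python
import pandas as pd
from scipy.stats import pearsonr

df = pd.DataFrame({"height_code": [1, 2, 3, 4, 5],
                   "power_code":  [2, 4, 6, 8, 10]})

corr_matrix = df.corr()  # pairwise Pearson correlation, values in [-1, 1]

# p-value for each pair of columns via nested loops, as described above
p_values = {}
cols = df.columns
for i in range(len(cols)):
    for j in range(i + 1, len(cols)):
        r, p = pearsonr(df[cols[i]], df[cols[j]])
        p_values[(cols[i], cols[j])] = p
```

The upper-triangle loop avoids recomputing each pair twice and skips the trivial diagonal.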

Identifying statistical significance: The method to identify statistical significance is straightforward: a measure is statistically significant when its p-value is less than 0.05. [38] We compare the results printed on the terminal to whatever we decide to set as the measure of statistical significance. This gives the user flexibility in deciding their own thresholds, given the necessary information. Appendix (29)
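The significance check itself reduces to a simple comparison; the p-values here are invented, not program output:

```python
# Illustrative p-values for pairs of columns
p_values = {("height", "power"): 0.004, ("height", "date"): 0.41}

alpha = 0.05  # the conventional significance threshold used above
significant = {pair: p for pair, p in p_values.items() if p < alpha}
```

Changing `alpha` is all a user needs to do to apply their own measure of significance.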

A sample of the heat-map visualisation can be seen in Appendix (30).

Section 3: Reflection on the ethics, moral and legal aspects

Ethics vs Innovation is a classic dilemma that every computer scientist faces at some point in their career. We argue that Ethics should indeed constrain Innovation, but mainly to steer it in a consensual and desirable direction. Incorporating ethical frameworks into Research & Development is essential to progressing along the path that society as a whole needs. Otherwise, we risk listening only to the voices of those who use, or can use, our platform.

This does not mean that every innovation ought to follow the same ethics. In fact, ethical innovation appears to be directed towards a multi-level, contextual understanding of the ethical frameworks surrounding innovative decisions. [39] For example, drafting and creating an application might aim at visionary progress, in parallel with a main ethical ensemble, while patching or making security updates might be governed by more flexible ethical restrictions. In the same spirit, the mere fact that ethics should govern innovative decisions does not render those decisions 'public' or 'aligned with national interest'. [40] The question arises as to how distinct ethics should be from innovation, via which means the two should intermingle, and, most importantly, what should be the source of the ethical framework.

A big portion of the literature has proposed an integrative approach. [41] [42] The integration of Ethics and Innovation can at first appear a daunting task. After all, it would involve a time-consuming and at times inefficient political process in a rapidly changing environment, stalling innovative progress. What, then, could best incorporate ethical guidelines into innovative progress?

We conclude that the answer has to lie in the replacement of politics by policy. Policy-making is governed by efficiency, transparency and clear end-goals. Incorporating policy plans into Research & Development makes for a more efficient process in mitigating the conflictual aspects of Ethics and Innovation. This leads to research disciplines like Responsible Innovation or Innovation Ethics. [42] Responsible Innovation does appear to be the way of reconciling the two; treating it as an area of importance, where expertise in both aspects is highly sought after, will allow for a fine balance between 'wants' and 'needs'.

To sum up, although ethics can seem to obstruct innovation at first, a clear, coherent and dynamic ethical framework can shape innovation in the right direction, encompassing the minimum humanitarian needs that provide the basis upon which technological innovation can thrive.


References

[1] [2] The Python Standard Library: threading - Thread-based parallelism, Link: https://docs.python.org/3/library/threading.html

[3] Educative Inc: What are locks in Python?, 2023, Link: https://www.educative.io/answers/what-are-locks-in-python

[4] Saurabh Chaturvedi: Let's Synchronize Threads in Python, 2017, Link: https://betterprogramming.pub/synchronization-primitives-in-python-564f89fee732

[5] Beazley, David: Python Cookbook, 3rd Edition, 2013, O'Reilly Media, ISBN: 978-1-449-34037-7

[6] Everett, L. Heidi: Consistency & Contrast: A Content Analysis of Web Design Instruction, 2014, Technical Communication, Vol. 61, No. 4 (November 2014), pp. 245-256, Link: https://www.jstor.org/stable/43748721

[7] Yalanska, Marina: Introduction to Color Theory in UI Design, 2023, Uxcel, Link: https://uxcel.com/blog/beginners-guide-to-color-theory

[8] Kaley Anna: Popups: 10 Problematic Trends and Alternatives, 2019, Nielsen
Norman Group, Link: https://www.nngroup.com/articles/popups/

[9] McKinney Wes: Python for Data Analysis 2nd Edition, 2018, O’Reilly Media,
ISBN: 978-1-491-95766-0

[10] Muhammed Taqi Raza: Top 10 Advantages of the Matplotlib Library in Python, Educative, Link: https://www.educative.io/answers/top-10-advantages-of-the-matplotlib-library-in-python

[11] Garrett, Jesse James: The Elements of User Experience: User-Centered Design for the Web and Beyond, 2nd Edition, 2002, Peachpit, ISBN: 978-0735712027

[12] Reddy, Sandhya: Why do Data Scientists prefer Python over Java?, 2020, Medium, Link: https://medium.com/quick-code/why-do-data-scientists-prefer-python-over-java-d570499a1fcd

[13] Editors of Encyclopedia Britannica: Java, Encyclopedia Britannica, Link: https://www.britannica.com/technology/Java-computer-programming-language

[14] Kuhlman, Dave: A Python Book: Beginning Python, Advanced Python, and
Python Exercises, 2011, Platypus Global Media, ISBN: 978-0984221233


[15] VanderPlas, Jake: Python Data Science Handbook, 2023, O'Reilly Media, ISBN: 978-1098121228

[16] Great Learning Team: Top 30 Libraries to Know in 2023, Great Learning, 2022, Link: https://www.mygreatlearning.com/blog/open-source-python-libraries

[17] Zheshu Jia: Marine Integrated Database Management System Based on Improved Object-Relational Mapping, 2019, Journal of Coastal Research, Special Issue No. 98: Recent Developments in Practices and Research on Coastal Regions: Transportation, Environment and Economy

[18] Heath, Sebastian: Narrating Transitions and Transformations in Cultural Heritage Digital Workflows Using a JSON-Encoded Dataset of Roman Amphitheaters, 2022, Digital Heritage and Archaeology in Practice: Data, Ethics, and Professionalism, University Press of Florida

[19] Amazon AWS: What’s the difference between JSON and XML?, Amazon,
Link: https://aws.amazon.com/compare/the-difference-between-json-xml/

[20] Welekwe, Amakiri: What is a JSON Injection and How to Prevent it?, 2022, Comparitech, Link: https://www.comparitech.com/net-admin/json-injection-guide/

[21] Anilkumar, Nikhil: XML Injection, 2022, Beagle Security, Link: https://beaglesecurity.com/blog/vulnerability/xml-injection.html

[22] Pranit Sharma: Python Pandas: Convert commas decimal separators to dots within a Dataframe, 2022, Link: https://www.includehelp.com/python/pandas-convert-commas-decimal-separators-to-dots-within-a-dataframe.aspx

[23] Benedikt Droste: How To Change DataTypes In Pandas in 4 Minutes, 2020, Towards Data Science, Link: https://towardsdatascience.com/how-to-change-datatypes-in-pandas-in-4-minutes-677addf9a409

[24] Zach: How to Specify Format in pandas.to_datetime, 2023, Statology, Link: https://www.statology.org/pandas-to-datetime-format/

[25] W3Schools: Pandas DataFrame median() Method, Link: https://www.w3schools.com/python/pandas/ref_df_median.asp

[26] Marsja, Erik: Pandas Count Occurrences in Column - i.e. Unique Values, 2020, Link: https://www.marsja.se/pandas-count-occurrences-in-column-unique-values/


[27] Saturn Cloud: Matplotlib Pie Chart: Displaying Both Value and Percentage, 2023, Link: https://saturncloud.io/blog/matplotlib-pie-chart-displaying-both-value-and-percentage/

[28] Ravikiran A.S: An Interesting Guide to Visualize Data Using Python Seaborn, 2023, Link: https://www.simplilearn.com/tutorials/python-tutorial/python-seaborn

[29] Lukita, Andreas: Plotly and Pandas: Combining Forces for Effective Data Visualization, 2023, Link: https://towardsdatascience.com/plotly-and-pandas-combining-forces-for-effective-data-visualization-2e2caad52de9

[30] Lim, John: Plotting with PyQtGraph, 2023, Link: https://www.pythonguis.com/tutorials/plotting-pyqtgraph/

[31] Pandas official documentation: pandas.Categorical, Link: https://pandas.pydata.org/docs/reference/api/pandas.Categorical.html

[32] Pandas official documentation: pandas.Series.cat.codes, Link: https://pandas.pydata.org/docs/reference/api/pandas.Series.cat.codes.html

[33] Pandas official documentation: pandas.DataFrame.corr, Link: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html

[34] Mirko Stojiljkovic: NumPy, SciPy, and pandas: Correlation With Python, Real Python, Link: https://realpython.com/numpy-scipy-pandas-correlation-python/

[35] Shaun Turney: Pearson Correlation Coefficient (r) | Guide & Examples,
2023, Link: https://www.scribbr.com/statistics/pearson-correlation-coefficient/

[36] Laura Igual: Introduction to Data Science, 2017, Springer, ISBN: 978-3-319-50016-4

[37] Jason Brownlee: How to Calculate Correlation Between Variables in Python, 2018, Machine Learning Mastery, Link: https://machinelearningmastery.com/how-to-use-correlation-to-understand-the-relationship-between-variables/

[38] Saul McLeod: P-Value And Statistical Significance: What It Is & Why It Matters, 2023, Simply Psychology, Link: https://www.simplypsychology.org/p-value.html

[39] Fabian Chris: The Ethics of Innovation, 2014, Stanford Social Innovation
Review, Link: https://ssir.org/articles/entry/the_ethics_of_innovation


[40] Morgan P. Miles: Innovation, Ethics, and Entrepreneurship, 2004, Journal of Business Ethics, Vol. 54, No. 1 (Sep. 2004), pp. 97-101, Link: https://www.jstor.org/stable/25123326

[41] David De Cremer: The ethics of technology innovation: a double-edged sword?, 2021, AI and Ethics, Springer, Link: https://link.springer.com/article/10.1007/s43681-021-00103-x

[42] Domenec Mele: What's Ethics' Role in Responsible Innovation?, 2016, IESE Business School, Link: https://blog.iese.edu/ethics/2016/08/18/whats-ethics-role-in-responsible-innovation/


Appendix (1): Creating the Threads

Platform: Jupyter Notebook

Appendix (2): Start() and Join() the threads

Platform: Jupyter Notebook

Appendix (3): Creating and utilising a Synchronisation Lock

Appendix (4): Program Flow Chart

Program used to create the diagram: Lucid Chart

[Flow chart: Start Program GUI > Load Data into program > Clean Data process (NGR Cleaning, Extract EID Values; possible concurrency) > Retrieve Additional Data > Retrieve Graph Data > Calculate Statistics > Generate Graphs process (Generate Pie, Generate Scatter, Generate Bar; possible (chosen) concurrency) > Calculate Correlation + Graph > Save data and exit]

Appendix (5): Creating a Main Window

Appendix (6): Displaying statistics on a pop-up window

Appendix (7): Displaying the Pie Chart on a pop-up window


Appendix (8): GUI Framework

Appendix (9): Declaring variables

Appendix (10): Loading data into variables


Appendix (11): Data visualisation

Appendix (12): Saving as JSON

Appendix (13): Loading JSON files


Appendix(14): Data Format Diagram

Appendix (15): Converting Dates

Appendix (16): Replacing and converting values

Appendix (17): .to_numeric and .to_datetime conversions


Appendix (18): Iteration and calculations

Appendix (19): Displaying the statistics using Display_statistics()


Appendix (20): Pie chart visualisation

Appendix (21): Extracting data with Pandas .ravel

Appendix (22): Counting occurrences and calculating percentage


Appendix (23): Creation of the "Other Stations" label (boolean mask and count)

Appendix (24): Creating and formatting the Pie

Appendix (25): Canvas, Widget, Toolbar and Packing

Appendix (26): Extracting and Converting Data


Appendix (27): Calculating the Pearson Correlation

Appendix (28): Calculating P-value

Appendix (29): P-value table


Appendix (30): Heat-map visualisation
