Advanced Programming Final Client Report
Task (a)
Python threads could be applied in several sections of our program.
Any time-consuming task can be adapted to run concurrently with
threads. [1]
• Categorise the tasks: Our program currently has three graph generation
functions, a scatter plot, a pie chart and a bar graph.
• Create separate threads for the graphs: Creating a thread for each
function allows us to start those threads later. The code for creating the
threads can be seen in Appendix (1).
• start() and join() threads: We need to start each thread separately using
start(), and then join() each thread so that the program awaits the
completion of all threads before proceeding, as elaborated in the
Python-Threading document. [2] The new code can be seen below
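A minimal sketch of the three steps above, using Python's standard threading module. The function names generate_scatter, generate_pie and generate_bar are hypothetical stand-ins for our graph functions, and the sleep call only simulates slow work. Many plotting and GUI toolkits expect to be driven from the main thread, so in practice the threads might only prepare the data for each graph.

```python
import threading
import time

# Hypothetical stand-ins for the three graph generation functions.
# Each one records its result under its own key in a shared dict.
def generate_scatter(results):
    time.sleep(0.1)  # simulate time-consuming work
    results["scatter"] = "done"

def generate_pie(results):
    time.sleep(0.1)
    results["pie"] = "done"

def generate_bar(results):
    time.sleep(0.1)
    results["bar"] = "done"

def generate_all_graphs():
    results = {}
    # Create one thread per graph function.
    threads = [
        threading.Thread(target=function, args=(results,))
        for function in (generate_scatter, generate_pie, generate_bar)
    ]
    # start() launches every thread; join() waits for each to finish.
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

if __name__ == "__main__":
    print(generate_all_graphs())
```

Because the main program only continues after every join() has returned, all three results are available before the next step runs.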
The main issues that can arise when operating threads in our program are: 1)
synchronisation and 2) data access. If threads are modifying data
simultaneously, this can result in what is called a race condition, where two
or more operations on the same data overlap and the result depends on their
timing. [3] Race conditions can result in errors or miscalculations, as one
thread may read data that has not yet been updated while another thread is
still performing the calculation. To avoid this, synchronisation remedies like
locks can be used. Locks are multithreading tools that allow the efficient
Formative Assessment Report Advanced Programming University of York
The challenges facing the programmer while implementing threads can be hard
to overcome, and might undermine the potential benefits of implementing
concurrency in the code. It is important that we weigh the advantages and
disadvantages of using threads against the functions in our code, its
resource intensity and its overall structure. [5]
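The locking remedy discussed in this task can be sketched as follows. This is an illustrative example rather than code from our program: SharedCounter and run_workers are hypothetical names, and the counter stands in for any data that several threads update.

```python
import threading

class SharedCounter:
    """Hypothetical shared value that several threads update."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self, times):
        for _ in range(times):
            # Only one thread at a time may run the read-modify-write step.
            with self._lock:
                self.value += 1

def run_workers(n_threads=4, times=10_000):
    counter = SharedCounter()
    threads = [
        threading.Thread(target=counter.increment, args=(times,))
        for _ in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value

if __name__ == "__main__":
    print(run_workers())
```

Holding the lock around the update means no increment can be lost, at the cost of threads waiting for one another whenever they touch the shared value.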
Task (b)
The integration and implementation of various cognitive sciences in GUI
layout has created a dynamic framework for analysing, preparing and
presenting context-appropriate GUIs, and has contributed significantly to
more engaging and inclusive best practices. [6]
• Load Data
• Clean Data
• Calculate Data
• Save Data
Our GUI is created with several aspects of user interaction and convenience in
mind, the main characteristics being 1) Simplicity, 2) Functionality &
Consistency and 3) Aesthetics. These principles are implemented and further
elaborated in the following paragraphs.
• A main window: The window is appropriately resized so that it fits well with
the button layout. The relevant code can be seen in Appendix (5).
• Button widgets: The buttons are positioned in the linear order of the
expected interaction. They respond to hovering and clicking, and implement
several aspects of colour theory. [7] The selected colours are Maya Blue,
Dark Cyan, and a light shade of red. The buttons are tightly packed, and
each is linked to a specific function within the main program.
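The layout described in the two points above could be sketched as below. This assumes the tkinter toolkit from the standard library; the window title, the hex colour values, the pairing of colours with buttons and the build_window helper are all assumptions made for illustration, and the actual code is the one in the appendices.

```python
# Buttons in the expected order of interaction, each with an assumed colour.
BUTTONS = [
    ("Load Data", "#73C2FB"),       # Maya Blue (assumed hex value)
    ("Clean Data", "#73C2FB"),
    ("Calculate Data", "#73C2FB"),
    ("Save Data", "#FF7F7F"),       # a light shade of red (assumed hex value)
]
HOVER_COLOUR = "#008B8B"            # Dark Cyan (assumed hex value)

def build_window(callbacks):
    """Build the main window; callbacks maps a button label to a function."""
    import tkinter as tk  # imported here so the constants work without a display

    root = tk.Tk()
    root.title("Client Report Tool")  # hypothetical title
    for label, colour in BUTTONS:
        button = tk.Button(
            root,
            text=label,
            bg=colour,
            activebackground=HOVER_COLOUR,
            command=callbacks.get(label, lambda: None),
        )
        # Change colour while the pointer hovers over the button.
        button.bind("<Enter>", lambda event, b=button: b.config(bg=HOVER_COLOUR))
        button.bind("<Leave>", lambda event, b=button, c=colour: b.config(bg=c))
        # Pack the buttons tightly, one below the other.
        button.pack(fill="x")
    root.resizable(False, False)  # keep the window fitted to the button layout
    return root

if __name__ == "__main__":
    build_window({"Load Data": lambda: print("loading")}).mainloop()
```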
generate graphs without cleaning the data first). The program remains faithful
to best practices in UI, as presented in various books and papers. [11]
Task (c)
Python and Java both rank among the most widely known and used
programming languages. Their inception can be traced back to 1991
and 1995 respectively and, since then, the two languages have been
steadily developing and growing in popularity. [12] [13]
Although both languages have seen substantial change in their libraries and
templates, Python can be argued to be the top choice among beginners and
experienced data scientists alike. The main reason is simple: Python aims to
do more with less:
Task (a)
Every data format has distinct advantages and disadvantages. Choosing
one depends highly on the particularities of the program structure to be
designed and implemented.
In our case, we selected JSON as the most appropriate format. JSON is used
broadly for various purposes, from Marine Integrated Database Management
[17] to Transformations in Cultural Heritage [18]. We selected it for
several reasons:
• Advantage - our API Format is simple: The user-friendly API design in our
software promotes JSON as the better option, due to its compact design and
simplicity. It also presents both users and developers (in case of shared
code) with higher readability.
• Advantage - File Size and Security: One of the main reasons to prefer
JSON over XML is its smaller file size and its increased security.
Although mitigating the security shortcomings of XML is entirely feasible,
JSON is the safer default for a secure data structure environment. However,
it is worth noting that both JSON and XML can be subject to an injection
attack. [20] [21]
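To illustrate the simplicity argued for above, here is a minimal sketch of saving and reloading records with Python's built-in json module. The helper names, the file name working_data.json and the sample record are hypothetical and are not taken from our program.

```python
import json
import os
import tempfile

def save_records(records, path):
    """Write a list of dictionaries to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)

def load_records(path):
    """Read the records back; json.load parses data and never executes it."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

def round_trip(records):
    """Save the records to a temporary file and load them again."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "working_data.json")  # hypothetical file name
        save_records(records, path)
        return load_records(path)

if __name__ == "__main__":
    sample = [{"EID": "E1", "NGR": "SE 600 520", "power": 2.5}]  # made-up record
    print(round_trip(sample))
```

The whole save and load cycle needs two short functions and no schema, which reflects the compactness and readability points made above.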
Task (b)
The 3rd requirement was quite the challenge for a first timer. The code
used to implement the task is found in the function
calculate_statistics(). The function performs various calculations on
pre-defined columns of the working_data DataFrame, and presents the results
in a pop-up window.
• Iterating and calculating: The function then iterates over the filtered_date
columns, performing various calculations using the Pandas built-in
functions such as mode, mean and median, concatenating the calculations
to statistics_output, as seen in Appendix (18). [25]
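The iteration step can be sketched roughly as follows. This is a simplified illustration, not the code in Appendix (18): the helper name summarise_columns, its signature and the sample column are assumptions, and the pop-up window is left out.

```python
import pandas as pd

def summarise_columns(working_data, columns):
    """Build a text summary of mean, median and mode for each column."""
    statistics_output = ""
    for column in columns:
        series = working_data[column]
        # mode() returns a Series because there can be ties; take the first.
        statistics_output += (
            f"{column}: mean={series.mean():.2f}, "
            f"median={series.median():.2f}, "
            f"mode={series.mode().iloc[0]}\n"
        )
    return statistics_output

if __name__ == "__main__":
    # Made-up sample data with a hypothetical column name.
    sample = pd.DataFrame({"power": [1.0, 2.0, 2.0, 7.0]})
    print(summarise_columns(sample, ["power"]))
```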
Task (c)
After extracting and cleaning the data, performing visualisations was
relatively easy and straightforward. We attempted to visualise the
data in three different ways: a Pie Chart, a Scatter Plot, and a Bar
Graph. For reasons of word limit, we are going to elaborate only on the Pie
Chart, which can be seen in Appendix (20).
Visualising the data: We proceed by creating a new pie chart and adding the
percentages, the labels (using the label.index), and a percentage format. [27]
See Appendix (24).
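A rough sketch of this step, assuming pandas and Matplotlib. The helper names pie_data and draw_pie, the min_count threshold and the sample values are hypothetical; the grouping of rare categories into an "Other Sections" slice follows the boolean mask and count idea named in Appendix (23), but the details here are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

def pie_data(values, min_count=2):
    """Count occurrences and group rare categories under one label."""
    counts = values.value_counts()
    small = counts < min_count            # boolean mask of rare categories
    grouped = counts[~small].copy()
    if small.any():
        grouped["Other Sections"] = counts[small].sum()
    return grouped

def draw_pie(grouped):
    """Draw the pie with labels from the index and a percentage format."""
    fig, ax = plt.subplots()
    ax.pie(grouped, labels=grouped.index, autopct="%1.1f%%")
    return fig

if __name__ == "__main__":
    sample = pd.Series(["A", "A", "A", "B", "B", "C", "D"])  # made-up values
    draw_pie(pie_data(sample)).savefig("pie_chart.png")
```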
Task (d)
In implementing task (d), our approach is twofold. 1) We first have to find a
meaningful way to group/analyse string values, and 2) we must be able to
find a meaningful correlation between them. Our approach of choice was to
convert string elements into numerical/categorical data, and then compare
their correlation and p-values to examine whether there is a statistically
significant correlation between them. We proceed to explain both steps.
Calculating Correlation: Using the data extracted from the table, we utilise
the .corr() [33] [34] function to calculate the Pearson Correlation [35] [36] for
the numerical/categorical data, as seen in Appendix (27). Pearson correlation
is a measure of linear correlation between data, ranging from -1 to 1. Our key
measure for identifying statistical significance is the p-value,
which is calculated on each pair of columns using nested loops and the
pearsonr function from the SciPy library. [37] The relevant code can be seen
in Appendix (28).
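Both steps can be sketched together as below, assuming pandas and SciPy. The helper names encode_strings and pairwise_p_values and the sample columns are hypothetical, and converting strings through pandas category codes is one possible encoding rather than necessarily the one used in our appendices. Note that integer codes impose an arbitrary order on unordered categories, so the resulting coefficients should be read with care.

```python
import pandas as pd
from scipy.stats import pearsonr

def encode_strings(df):
    """Replace every non-numeric column with integer category codes."""
    encoded = pd.DataFrame(index=df.index)
    for column in df.columns:
        if pd.api.types.is_numeric_dtype(df[column]):
            encoded[column] = df[column]
        else:
            encoded[column] = df[column].astype("category").cat.codes
    return encoded

def pairwise_p_values(encoded):
    """Return {(col_a, col_b): (r, p_value)} for every pair of columns."""
    results = {}
    columns = list(encoded.columns)
    # Nested loops visit each pair of columns exactly once.
    for i, first in enumerate(columns):
        for second in columns[i + 1:]:
            r, p_value = pearsonr(encoded[first], encoded[second])
            results[(first, second)] = (r, p_value)
    return results

if __name__ == "__main__":
    # Made-up sample data with hypothetical column names.
    sample = pd.DataFrame({"site": ["a", "b", "c", "d"],
                           "power": [1.0, 2.0, 3.0, 4.0]})
    encoded = encode_strings(sample)
    print(encoded.corr())            # Pearson is the default method
    print(pairwise_p_values(encoded))
```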
This does not mean that every innovation ought to have the same ethics. In
fact, ethical innovation appears to be directed towards a multi-level,
contextual understanding of the ethical frameworks surrounding innovative
decisions. [39] For example, drafting and creating an application might attempt
visionary progress, parallel to a main ethical ensemble, while patching or
making security updates might be governed by more flexible ethical
restrictions. In the same spirit, the mere fact that ethics should govern
innovative decisions does not render them ‘public’ or ‘aligned with national
interest’. [40] The question arises as to how distinct ethics should be from
innovation, by which means they should intermingle and, most importantly,
what the source of the ethical framework should be.
A large portion of the literature has proposed an integrative approach. [41] [42]
The integration of Ethics and Innovation can at first appear to be a daunting
task. After all, it would involve a time consuming and at times inefficient
We conclude that the answer has to lie in the replacement of politics by policy.
Policy-making is governed by efficiency, transparency and clear end-goals.
Incorporating policy plans into Research & Development makes for a more
efficient process in mitigating the conflicting aspects of Ethics and Innovation.
This leads to research disciplines like Responsible Innovation or Innovation
Ethics. [42] Responsible innovation indeed appears to be the way of
reconciling the two, and treating it as an area of importance, where expertise
in both aspects is highly sought after, will allow for a fine balance between
‘wants’ and ‘needs’.
To sum up, although ethics can seem to obstruct innovation at first, it is also
true that a clear, coherent, dynamic ethical framework can shape innovation in
the right direction, encompassing the minimum humanitarian needs that
provide the basis upon which technological innovation can thrive.
References
[3] Educative inc: What are locks in Python?, 2023, Link:
https://www.educative.io/answers/what-are-locks-in-python
[5] Beazley David: Python Cookbook 3rd Edition, 2013, O’Reilly Media, ISBN:
978-1-449-34037-7
[6] Everett L. Heidi: Consistency & Contrast: A Content Analysis of Web Design
Instruction, 2014, Technical Communication, Vol. 61, No. 4 (November 2014),
pp. 245-256, Link: https://www.jstor.org/stable/43748721
[8] Kaley Anna: Popups: 10 Problematic Trends and Alternatives, 2019, Nielsen
Norman Group, Link: https://www.nngroup.com/articles/popups/
[9] McKinney Wes: Python for Data Analysis 2nd Edition, 2018, O’Reilly Media,
ISBN: 978-1-491-95766-0
[12] Reddy Sandhya: Why do Data Scientists prefer Python over Java?, 2020,
Medium, Link:
https://medium.com/quick-code/why-do-data-scientists-prefer-python-over-java-d570499a1fcd
[14] Kuhlman, Dave: A Python Book: Beginning Python, Advanced Python, and
Python Exercises, 2011, Platypus Global Media, ISBN: 978-0984221233
[15] VanderPlas Jake: Python Data Science Handbook, 2023, O’Reilly Media,
ISBN: 978-1098121228
[16] Great Learning Team: Top 30 Libraries to know in 2023, Great Learning,
2922, Link: https://www.mygreatlearning.com/blog/open-source-python-libraries
[19] Amazon AWS: What’s the difference between JSON and XML?, Amazon,
Link: https://aws.amazon.com/compare/the-difference-between-json-xml/
[20] Welekwe Amakiri: What is a JSON Injection and How to Prevent it?, 2022,
Comparitech, Link:
https://www.comparitech.com/net-admin/json-injection-guide/
[21] Anilkumar Nikhil: XML Injection, 2022, Beagle Security, Link:
https://beaglesecurity.com/blog/vulnerability/xml-injection.html
[26] Marsja Erik: Pandas Count Occurrences in Column – i.e. Unique Values,
2020, Link:
https://www.marsja.se/pandas-count-occurrences-in-column-unique-values/#?utm_content=cmp-true
[27] Saturn Cloud: Matplotlib Pie Chart: Displaying Both Value and Percentage,
2023, Link:
https://saturncloud.io/blog/matplotlib-pie-chart-displaying-both-value-and-percentage/
[29] Lukita Andreas: Plotly and Pandas: Combining Forces for Effective Data
Visualization, 2023, Link:
https://towardsdatascience.com/plotly-and-pandas-combining-forces-for-effective-data-visualization-2e2caad52de9
[34] Mirko Stojiljkovic: NumPy, SciPy, and pandas: Correlation With Python,
Real Python, Link:
https://realpython.com/numpy-scipy-pandas-correlation-python/
[35] Shaun Turney: Pearson Correlation Coefficient (r) | Guide & Examples,
2023, Link: https://www.scribbr.com/statistics/pearson-correlation-coefficient/
[38] Saul McLeod: P-Value And Statistical Significance: What It Is & Why It
Matters, 2023, SimplyPsychology, Link:
https://www.simplypsychology.org/p-value.html
[39] Fabian Chris: The Ethics of Innovation, 2014, Stanford Social Innovation
Review, Link: https://ssir.org/articles/entry/the_ethics_of_innovation
Appendix (4): Program Flow Chart
Program used to create diagram: Lucid Chart
Steps shown in the diagram: Start Program; GUI; Load Data into program;
Clean Data process; NGR Cleaning; Extract EID Values; Retrieve Additional
Data; Retrieve Graph Data; Calculate Statistics; Generate Graphs Process;
Generate Pie; Generate Scatter; Generate Bar; Calculate Correlation + Graph;
Save data and exit.
Annotations in the diagram: "Possible Concurrency" and "Possible (Chosen)
concurrency".
Appendix (23): Creation of “Other Sections” label (boolean mask and count