Shakeel Ur Rahman
By
Shakeel ur Rahman
Master of Science in
Physics
Islamabad, Pakistan
(2024)
DEDICATION
My Parents
Whose relentless efforts paved the way to my success.
To my
Grandfather
Whose absence I miss the most.
To my
Teachers
Who enabled me to be what I wished to be.
And to my
friends
Whose company made every moment worth living.
ACKNOWLEDGEMENTS
All praise belongs to Allah Almighty, the creator of all beings and the most merciful
and benevolent. The blessings of the Almighty have accompanied me throughout the writing
of this piece. He helped me to be creative and empowered me with the strength to cope with
the hardships and obstacles along the way. Then tributes and countless blessings upon
the Holy Prophet (PBUH), the ultimate source of inspiration.
I submit my loyalty and respect to my parents, whose magnanimous, selfless and
unconditional love and affection carved out the way to the present.
Contents
LIST OF TABLES
ABSTRACT
1 Introduction
1.1 Overview
1.2 Goals
1.3 Thesis structure
2.6.1.2 NP Class
2.6.1.3 NP Hard Class
2.6.1.4 NP Complete
2.6.1.5 BQP Class
2.7 Evolution of Quantum Computing
4 VQC for Penguins and Titanic Data sets
4.1 Penguins Dataset
4.2 QML Classification for Penguin Dataset
4.2.1 Importing and Displaying Dataset
4.2.2 Data Cleaning
4.2.3 Label Encoding
4.2.4 Previewing Data
4.2.5 Displaying Data
4.2.6 Removal of Weak Variables
4.2.7 Ordering the Data
4.2.8 Normalisation and Test/Train Split
4.2.9 Encoding Classical Data into Quantum States
4.2.10 Application of Suitable Ansatz
4.2.11 Optimisation
4.2.12 Calling VQC Function
4.2.13 Displaying Optimisation Path
4.2.14 Displaying Accuracy
4.3 Classification of Penguins Data Through Classical Support Vector Machine
4.4 QML and CML Comparison for Titanic Data
List of Tables
6.1 Optimal combinations of reps of feature map and Ansatz, penguins data
6.2 Optimal combinations of reps of feature map and Ansatz, Titanic data
List of Figures
4.9 Ordering the data
4.10 Selecting target and labels
4.11 Splitting data into train and test data
4.12 Data encoding through ZZ feature map
4.13 RealAmplitudes ansatz with Reps = 3
4.14 COBYLA Optimizer
4.15 Sampler
4.16 Variational quantum classifier
4.17 Objective function optimization against the number of iterations
4.18 Accuracy of training and test data
4.19 Classification through classical support vector machine algorithm
4.20 Accuracy of results using SVM
4.21 VQC result for Titanic data
4.22 SVM result for Titanic data
5.1 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 3, optimizer = COBYLA.
5.2 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 4, optimizer = COBYLA.
5.3 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 5, optimizer = COBYLA.
5.4 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 6, optimizer = COBYLA.
5.5 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 3, optimizer = SPSA.
5.6 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 4, optimizer = SPSA.
5.7 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 5, optimizer = SPSA.
5.8 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 6, optimizer = SPSA.
5.9 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 3, optimizer = COBYLA.
5.10 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 4, optimizer = COBYLA.
5.11 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 5, optimizer = COBYLA.
5.12 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 6, optimizer = COBYLA.
5.13 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 7, optimizer = COBYLA.
5.14 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 3, optimizer = SPSA.
5.15 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 4, optimizer = SPSA.
5.16 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 5, optimizer = SPSA.
5.17 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 6, optimizer = SPSA.
5.18 Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 7, optimizer = SPSA.
6.1 Training data accuracy against different No. of reps for various No. of features.
6.2 Test data accuracy against different No. of reps for various No. of features.
6.3 Training data accuracy against different No. of reps for various No. of features.
6.4 Test data accuracy against different No. of reps for various No. of features.
6.5 Training data accuracy against different No. of reps for various No. of features.
6.6 Test data accuracy against different No. of reps for various No. of features.
6.7 Training data accuracy against different No. of reps for various No. of features.
6.8 Test data accuracy against different No. of reps for various No. of features.
Abstract
Chapter 1
Introduction
This chapter gives a general introduction to the approaches and goals pursued in the
following chapters. It sets out the objectives of the research work and briefly introduces
the tools that have been used.
1.1 Overview
Quantum computing technology is on the verge of a revolution, with improvements being
announced every day. The realisation of what was once a dream is imminent. The
utility and potential advantage that we can leverage through quantum computers have
led us to pluck the chords of our imagination and tread a path that has never been
trodden before. Today we have purely quantum and hybrid models ready to take on
the challenges of the modern era. From cryptography to sensing and from AI to machine
learning, quantum computing is emerging as the torchbearer of the next technological revolution.
Quantum computing employs the peculiar laws of quantum mechanics and puts them
to use in its construction and operation. Entanglement and superposition are the two
important non-classical phenomena on which, under the hood, all quantum advantage
rests. The 0s and 1s commonly known as bits in classical computing, which take
deterministic values, are transformed in quantum computing into qubits that hold
probabilistic values of both 0 and 1 simultaneously. Quantum annealing and the gate
model are two independent paradigms of quantum computation. Resting on different
mathematical principles in their construction, both are now developing in competition.
Quantum algorithms that have mathematically proven advantages are yet to be
put to the test on a quantum computer with enough qubits and computational resources.
Quantum hardware is a major obstacle in the way. At the moment, quantum
computers of several hundred qubits are in use, but we would require millions of qubits
to demonstrate the quantum advantage of Shor's algorithm, for instance. Several types
of qubits are currently in use, such as superconducting qubits, trapped-ion qubits and photonic
qubits; some of them need ultra-low temperatures and others are difficult to scale
up. Qubits are extremely sensitive to noise and decoherence, which cause them to give
faulty outputs and alter the results. Despite these challenges, active research is making
progress by leaps and bounds to make qubits immune to noise and to go beyond
the NISQ (noisy intermediate-scale quantum) era. The latest achievements
announced by IBM and Google are a silver lining.
1.2 Goals
Machine learning has become a pertinent part of the modern tech industry. The AI revolution is
marching forward, resting on the pillars of machine learning and data science. With the
advent of quantum computing technology, a race has begun to achieve quantum
advantage. Just as with other algorithms, a quantum parallel of classical machine learning,
quantum machine learning, is emerging. From the quantum support vector machine to
quantum generative adversarial networks to the variational quantum eigensolver, numerous
algorithms are underway and many have stood the test of time.
With all the hype on the horizon, the goal of this endeavour is to look for quantum
advantage and, possibly, to expand it beyond merely solving a few mathematical
problems. Quantum computing has theoretically shown a clear advantage with its potential
to solve some NP problems in polynomial time. QML is expected to outperform its
classical counterpart by employing the unique features of entanglement and superposition,
which enable it to represent complex data structures and map intricate relationships in
feature space. In the NISQ era it is all about building an algorithm that can run efficiently
on existing devices with the minimum number of qubits and gates and the least possible
circuit depth. Looking for the optimal number of gates to map the data efficiently onto
quantum states is mandatory for an efficient model, and this is the main goal of the
following work.
the same chapter. Chapter 5 contains the training and test data results for different
numbers of features for both datasets. Graphs of accuracy against different combinations
of reps are presented for easy comparison. Chapter 6 presents combined graphs for
various numbers of features and ends with an analysis of the results. It concludes with
final commentary on the optimal combination of reps. Chapter 7 summarises the complete
work, and the thesis ends with the Bibliography.
Chapter 2
Quantum computing leverages the peculiar laws of quantum mechanics to bring
computation to life. The concept took birth in the time of Feynman [1]; nevertheless, it
has only been promulgated quite recently. Just as bits are the basic units of computation
in classical computing, qubits (or Q-bits) are the fundamental units of computation
in quantum computing. A qubit is a mathematical entity into which information can be
encoded. It also has a physical existence: there are many ways to prepare qubits, and
we are still in the phase of hunting for the most efficient way to prepare them.
1. Bits exist as 0 or 1, whereas qubits exist as 0, 1 and their superposition.
Figure 2.1: Bit vs Qubit [3]
associated with |0⟩ and |1⟩ respectively. Also, by the Born rule we have
$|\alpha|^2 + |\beta|^2 = 1$ [5].
2.3 Multi Qubit System
The representation of qubits can be extended to express multi-qubit systems. This can
be achieved by taking the tensor product of individual qubit states. For instance, a
two-qubit state can be represented by the tensor product of two single-qubit states.
This approach provides a powerful formalism for describing the quantum states of
multi-qubit systems.
$$|00\rangle = \begin{pmatrix}1\\0\\0\\0\end{pmatrix},\qquad
|01\rangle = \begin{pmatrix}0\\1\\0\\0\end{pmatrix},\qquad
|10\rangle = \begin{pmatrix}0\\0\\1\\0\end{pmatrix},\qquad
|11\rangle = \begin{pmatrix}0\\0\\0\\1\end{pmatrix} \tag{2.3}$$
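As a quick illustration, the tensor-product construction of Eq. (2.3) can be reproduced numerically; the following is a minimal sketch (NumPy assumed, not part of the thesis code) that builds the four two-qubit basis states from the single-qubit states |0⟩ and |1⟩.

import numpy as np

ket0 = np.array([1, 0])          # |0>
ket1 = np.array([0, 1])          # |1>

ket00 = np.kron(ket0, ket0)      # |00> = (1, 0, 0, 0)
ket01 = np.kron(ket0, ket1)      # |01> = (0, 1, 0, 0)
ket10 = np.kron(ket1, ket0)      # |10> = (0, 0, 1, 0)
ket11 = np.kron(ket1, ket1)      # |11> = (0, 0, 0, 1)

print(ket00, ket01, ket10, ket11)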
2.5 Models of Quantum Computation
There are two main architectures of quantum computation, both independent and well
resourced. One is adiabatic quantum computation, also known as quantum annealing,
and the other is gate-model quantum computation.
In the implementation, qubits are first prepared in the ground state of a problem-independent
Hamiltonian, which is then adiabatically evolved towards the Hamiltonian whose ground
state corresponds to the problem at hand. There is a whole class of problems called
QUBO (Quadratic Unconstrained Binary Optimization) that is considered for solution
under the AQC framework. D-Wave quantum computers are specially designed for this
implementation [12]. A general QUBO is of the form:
$$s^{*} = \underset{s \in \{-1,+1\}^{n}}{\operatorname{argmin}}\; s^{\top} Q s + q^{\top} s \tag{2.5}$$
Here $s$ represents the possible global states of the system over which the objective is
minimised, $Q \in \mathbb{R}^{n\times n}$ is the coupling matrix that models interactions, and
the vector $q \in \mathbb{R}^{n}$ models additional constraints. The work of Farhi et al. [13]
gives an approach which makes quantum computing a fair choice for dealing with complicated
QUBOs. It starts with known values of $Q$ and $q$; a time-dependent system of $n$ qubits is
then considered.
$$|\psi(t)\rangle = \sum_{i=0}^{2^{n}-1} \alpha_i(t)\,|\psi_i\rangle \tag{2.6}$$
It is then evolved under a time-dependent Hamiltonian $H(t)$ so that its behaviour is
governed by the Schrödinger equation
$$i\hbar \frac{d|\psi(t)\rangle}{dt} = H(t)\,|\psi(t)\rangle \tag{2.7}$$
Qubits are prepared in the ground state of the beginning Hamiltonian, called $H_B$:
$$H_B = -\sum_{i=1}^{n} \sigma_x^{i} \tag{2.8}$$
Finally, after the time evolution, at time $T$ a measurement is performed on the qubits.
This causes the system to collapse onto one of its $2^{n}$ basis states, with probability
given by $|\alpha_i(T)|^{2}$. Since the adiabatic evolution was steered towards the problem
Hamiltonian $H_P$, the basis states $|\psi_i\rangle$ that correspond to the ground state of $H_P$
are more likely to be found.
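To make the objective in Eq. (2.5) concrete, the following is a minimal sketch with an assumed 3-variable instance: it simply brute-forces all states s in {-1, +1}^3, which is what the adiabatic evolution is meant to achieve at scale without exhaustive search. The matrix Q and vector q are illustrative values, not taken from any real problem.

import itertools
import numpy as np

Q = np.array([[0.0, 1.0, -2.0],
              [1.0, 0.0, 0.5],
              [-2.0, 0.5, 0.0]])      # illustrative coupling matrix
q = np.array([0.3, -0.7, 0.1])        # illustrative linear term

best_s, best_val = None, float("inf")
for candidate in itertools.product([-1, 1], repeat=3):
    s = np.array(candidate)
    value = s @ Q @ s + q @ s         # the objective of Eq. (2.5)
    if value < best_val:
        best_s, best_val = s, value

print("s* =", best_s, "objective =", best_val)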
Figure 2.4: Left: Example of a digital subtractor circuit. Right: Example of a quan-
tum gate circuit for generating a 4-qubit Bell state.
The X gate performs the same bit-flip operation as the NOT gate does in classical
computing: it switches $|0\rangle$ to $|1\rangle$ and vice versa.
$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = |0\rangle\langle 1| + |1\rangle\langle 0| \tag{2.11}$$
$$Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} = -i\,|0\rangle\langle 1| + i\,|1\rangle\langle 0| \tag{2.13}$$
$$Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = |0\rangle\langle 0| - |1\rangle\langle 1| \tag{2.15}$$
The Hadamard gate H creates an equal superposition of the basis states:
$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \frac{1}{\sqrt{2}}\big(|0\rangle\langle 0| + |0\rangle\langle 1| + |1\rangle\langle 0| - |1\rangle\langle 1|\big) \tag{2.17}$$
$$H|0\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle + |1\rangle\big), \qquad H|1\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle - |1\rangle\big) \tag{2.18}$$
In addition to single-qubit gates we also have multi-qubit gates, which act on more than
one qubit. Two important examples are the CNOT gate and the Toffoli gate.
The CNOT gate is also known as an entangling gate, since it is used to entangle qubits.
It has one control qubit and one target qubit: if the control qubit is in the $|0\rangle$
state the target qubit is unchanged, and if the control qubit is in the $|1\rangle$ state the
target qubit is flipped. Its operation is shown below:
$$\mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes X \tag{2.19}$$
$$\mathrm{Toffoli} = \begin{pmatrix}
1&0&0&0&0&0&0&0 \\
0&1&0&0&0&0&0&0 \\
0&0&1&0&0&0&0&0 \\
0&0&0&1&0&0&0&0 \\
0&0&0&0&1&0&0&0 \\
0&0&0&0&0&1&0&0 \\
0&0&0&0&0&0&0&1 \\
0&0&0&0&0&0&1&0
\end{pmatrix} \tag{2.21}$$
The action of the Toffoli gate is given as
$$\begin{aligned}
T|000\rangle &= |000\rangle, & T|001\rangle &= |001\rangle, \\
T|010\rangle &= |010\rangle, & T|011\rangle &= |011\rangle, \\
T|100\rangle &= |100\rangle, & T|101\rangle &= |101\rangle, \\
T|110\rangle &= |111\rangle, & T|111\rangle &= |110\rangle.
\end{aligned} \tag{2.22}$$
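The entangling action of the CNOT gate described above can be checked with a small sketch in Qiskit (assuming the standard Qiskit API): a Hadamard followed by a CNOT turns |00⟩ into the two-qubit Bell state (|00⟩ + |11⟩)/√2.

from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

qc = QuantumCircuit(2)
qc.h(0)        # put the control qubit into superposition
qc.cx(0, 1)    # CNOT entangles the control (qubit 0) with the target (qubit 1)

print(Statevector(qc))   # amplitudes 1/sqrt(2) on |00> and |11> only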
Both the quantum annealing and gate-model frameworks work equally well in their
respective domains, with great resources available for each.
complexity can be represented using big-O notation. For instance, a problem which
takes polynomial time can be represented as $O(n^c)$ and one which takes quadratic time can
be represented as $O(n^2)$.
2.6.1.1 P Class
This class consists of the problems which can be solved and verified in deterministic
polynomial time. It contains decision problems with 'Yes' or 'No' as outputs.
Some problems may be hard but are nevertheless solvable in polynomial time. Common
examples of P-class problems are matrix multiplication, maximum flow, linear
programming, etc.
2.6.1.2 NP Class
The NP class contains problems that take non-deterministic polynomial time to solve. A
problem in NP can take anywhere from polynomial to exponential time to solve, but only
polynomial time to verify. For example, a decision problem with a simple 'Yes' or 'No'
answer may take non-deterministic polynomial time to solve, but once the answer is known
it takes polynomial time to verify it. Common problems in the NP class are graph colouring,
graph isomorphism, factoring, etc. [19].
2.6.1.3 NP Hard Class
The NP-hard class contains problems that are at least as hard as the hardest problems in
NP: if an algorithm were found to solve an NP-hard problem in polynomial time, then
many other problems in NP could be solved based on that algorithm. We do not know
all the NP-hard problems exactly, but one common example is the travelling salesman
problem [20].
2.6.1.4 NP Complete
The NP-complete problems are those that are both in NP and NP-hard; they are the hardest
problems in NP, and a polynomial-time algorithm for any one of them would yield
polynomial-time algorithms for all of NP.
2.6.1.5 BQP Class
BQP stands for Bounded-error Quantum Polynomial time. It is the class of problems that
can be solved by a quantum computer in polynomial time. When we talk about quantum
advantage, it is primarily this class of problems which is our target [22]. Bounded error
refers to the fact that the quantum computer should be able to solve these problems with
an error probability of no more than 1/3 [23]. It is an open question whether or not BQP
is a subset of NP; we have no conclusive arguments to either support or refute the idea
hitherto [24].
developed: one for the existing NISQ-era near-term quantum computers and others for
future fault-tolerant quantum computers. We have reached quantum computers of
several hundred qubits hitherto.
We are quickly striding forward to look for areas and algorithms which can materialise
our dream of quantum supremacy. The extent of research and the surge of new techniques
hint that a breakthrough is imminent. The cornerstones of quantum computing technology
are efficient qubits and gates, and then algorithms that would give quantum supremacy
using those qubits and gates. In the next chapter, a brief introduction to quantum machine
learning with its various types will be presented.
Chapter 3
In the last chapter we presented the detailed architecture of quantum computing and
ended our discussion with two parallel frameworks: quantum annealing and gate-model
quantum computing. In this chapter, an overview of machine learning, followed by a
detailed explanation of quantum machine learning and its popular classes, will be
presented.
Figure 3.1: Types of Machine Learning
3.2.3 Reinforcement Learning
Reinforcement learning works on a trial-and-error basis. The algorithm is rewarded with a
positive score when it makes a correct prediction of the desired output and is given a
negative score if the prediction goes wrong. The system learns to maximise the total score,
and that is how it learns to reach the output with the minimum number of steps and correct
predictions. It is just like a chess master who carefully plans his next move based on the
moves played by the opponent. Important algorithms that employ reinforcement learning
are Q-learning, policy gradient methods, etc. [29].
3.3.1 Classical Data on Classical Device (CC)
In this approach, classical data is processed on a classical device, which is conventional
machine learning as discussed in Sections 3.1 and 3.2. In the context of quantum machine
learning, it refers to machine learning based on methods borrowed from the framework
of quantum information. The application of tensor networks is an important example
of this approach. Wittek [31] describes it as quantum-like machine learning.
and clustering using unsupervised learning techniques. On the basis of the type of
machine learning task at hand, we can choose which kind of algorithm is aptly applicable.
Some run purely on quantum resources, but there are some, like the Variational Quantum
Classifier (VQC), which run on a hybrid model, using both classical and quantum
resources. Researchers are still looking for the algorithms which have the potential for a
vivid and promising quantum advantage [34].
In quantum support vector machines and quantum kernel methods, quantum feature
maps such as the ZZFeatureMap are used to map input vectors $\vec{x}$ to states
$|\phi(\vec{x})\rangle\langle\phi(\vec{x})|$ in a higher-dimensional Hilbert space using a
unitary transformation, where complex patterns, as shown in Fig. 3.5, can be separated by
a linear hyperplane. Constructing quantum feature maps from parameterised quantum
circuits, which are classically hard to simulate, is expected to be an important step towards
a possible advantage of QML over classical machine learning approaches, and is an active
area of research [38].
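As an illustration of the feature-map idea, the following sketch (assuming Qiskit's circuit library and quantum_info module; the sample points are arbitrary) maps two classical vectors into quantum states with a ZZFeatureMap and evaluates one kernel entry as the squared overlap |⟨ϕ(x1)|ϕ(x2)⟩|².

import numpy as np
from qiskit.circuit.library import ZZFeatureMap
from qiskit.quantum_info import Statevector

x1, x2 = np.array([0.4, 1.1]), np.array([0.5, 0.9])   # assumed sample points
fmap = ZZFeatureMap(feature_dimension=2, reps=2)

phi1 = Statevector(fmap.assign_parameters(x1))         # |phi(x1)>
phi2 = Statevector(fmap.assign_parameters(x2))         # |phi(x2)>

print(abs(phi1.inner(phi2)) ** 2)                      # kernel entry |<phi(x1)|phi(x2)>|^2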
Figure 3.4: Kernel trick to separate nonlinearly separable data [37]
The idea of replicating GANs in the quantum realm was proposed by [41] and [42] in 2018.
The basic idea is to replace the classical structure of either the generator or the
discriminator, or both, with quantum parameterised circuits so that it can process quantum
data. The generator produces a desired state $|\psi\rangle$ through repeated updates to its
parameters. In a QGAN, the landscape of the cost function evolves towards the goal of
improving the fake data so that it can dodge the discriminator. In a fully quantum QGAN
the discriminator is fed with both real and fake data; to that end, the first step is to encode
the classical data into quantum states, and for that the real data is encoded using a
unitary $U_R$:
$$|\mathrm{Data}_R\rangle = U_R\,|0\rangle^{\otimes n} \tag{3.1}$$
After encoding, it is sent to the parameterised circuit $U_D(\theta_D)$, which iteratively
updates the parameters $\theta_D$ with the aim of minimising the cost function. Similarly,
the generator creates the fake data state by applying a parameterised unitary $U_G(\theta_G)$
to the initial state $|0\rangle^{\otimes n}$.
$$\mathrm{Cost}_D = -\Pr\!\big( D(\theta_D, G(\theta_G)) = \mathrm{real} \big) \tag{3.4}$$
QGANs are a nascent approach in QML and it is too early to look for quantum advantages;
however, the inherent quantum mechanical laws that QGANs are built on are expected to
enable them to sample and manipulate classically intractable probability distributions,
which would potentially give useful applications in chemistry, drug discovery and
finance [41], [42].
1. Feature Map
2. Ansatz
3. Cost function
4. Optimizer
The gist of VQE is to look for the quantum state with the minimum eigenvalue of a
certain observable. It does so based on the variational theorem of quantum mechanics,
which states that if the (normalised) state $|\psi\rangle$ of a quantum system depends on a
parameter vector $\vec{\theta}$, then the optimal approximation of the ground state (i.e. the
eigenstate $|\phi_0\rangle$ with the minimum eigenvalue $\lambda_0$) is the one that minimises
the expectation value of the Hamiltonian $\hat{H}$ [44]:
$$\langle \hat{H} \rangle = \langle \psi(\vec{\theta})\,|\,\hat{H}\,|\,\psi(\vec{\theta})\rangle \geq \lambda_0 \tag{3.5}$$
Qiskit, a Python library by IBM, has been used for all the work from here onward and in
the next chapter as well, primarily due to the extensive resources available for it. However,
the same design can amply be reproduced in other languages like Q# or in similar libraries
like Cirq and PennyLane [45].
Each $x^{(m)}$ is an $N$-dimensional vector with $m = 1, \ldots, M$; we will use the following
techniques to encode such datasets into quantum states.
In basis encoding, each value of the classical dataset is encoded in the Z basis (computational
basis) of 0s and 1s. For instance, $x = 3$ can be written as the 4-bit string 0011, and the same
$x = 3$ in a 4-qubit system can be written in Dirac notation as $|0011\rangle$. More generally,
a classical data point $x = (b_1, b_2, \ldots, b_N)$ will be mapped into the quantum state
$|x^m\rangle = |b_1, b_2, \ldots, b_N\rangle$, where $b_n \in \{0, 1\}$ for $n = 1, \ldots, N$. The
entire dataset can be represented as a superposition of computational basis states as follows:
$$|X\rangle = \frac{1}{\sqrt{M}} \sum_{m=1}^{M} |x^{m}\rangle \tag{3.7}$$
$$|\psi_a\rangle = \sum_{i=1}^{N} a_i\,|i\rangle \tag{3.8}$$
where $N = 2^{n}$, $a_i$ is the $i$-th element of $a$ and $|i\rangle$ is the $i$-th computational
basis state. Of course, the quantum state must be normalised.
Angle encoding is the most common method in use for encoding classical data into quantum
states. In this method, each classical data value is encoded as the rotation angle of a qubit,
for instance for the data point $a = (a_1, \ldots, a_N)$.
Angle encoding differs from basis and amplitude encoding in that it encodes one data value
per qubit at a time. In angle encoding, the values can also be encoded as the angles of a
unitary gate [47]:
$$S(a) = \bigotimes_{i=1}^{N} U(a_i) \tag{3.10}$$
Here,
$$U(a_i) = \begin{pmatrix} \cos(a_i) & -\sin(a_i) \\ \sin(a_i) & \cos(a_i) \end{pmatrix} \tag{3.11}$$
where $U(a_i)$ acts on the $i$-th qubit.
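A minimal sketch of angle encoding, assuming Qiskit and an arbitrary three-feature data point: each value $a_i$ is written into the rotation angle of one qubit, and RY(2a_i) prepares cos(a_i)|0⟩ + sin(a_i)|1⟩, matching the rotation in Eq. (3.11).

import numpy as np
from qiskit import QuantumCircuit

a = np.array([0.1, 0.7, 1.3])      # one data point with N = 3 features (assumed values)
qc = QuantumCircuit(len(a))
for i, value in enumerate(a):
    qc.ry(2 * value, i)            # RY(2*a_i)|0> = cos(a_i)|0> + sin(a_i)|1>

print(qc.draw())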
Figure 3.7: Ansatz: EfficientSU2 with features = 3, Reps = 1, Entangling Pattern =
Circular
Based on the types of gates used, the number of reps and the entangling pattern, we can
generate an ansatz of our own. The efficiency of a particular type of ansatz depends on
the input data features. However, three important properties, namely expressibility,
entangling capability and hardware efficiency, determine the suitability of an ansatz;
they are discussed below in detail.
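An ansatz like the one in Figure 3.7 can be generated directly from Qiskit's circuit library; the following sketch (parameter choices mirror the figure) builds EfficientSU2 with 3 qubits, one repetition and circular entanglement.

from qiskit.circuit.library import EfficientSU2

ansatz = EfficientSU2(num_qubits=3, reps=1, entanglement="circular")
print(ansatz.decompose().draw())
print("trainable parameters:", ansatz.num_parameters)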
3.5.2.1 Expressibility
Figure 3.8: Expressibility of various PQC models shown on Bloch sphere
[48]
3.5.2.2 Entangling capability
Entanglement is a non-classical correlation between qubits. The ability of a PQC to create
more entangled states is expected to help it express complex data features and represent
complicated states on the quantum landscape. Entanglement is measured by the
Meyer-Wallach entanglement measure (Meyer and Wallach, 2002) [49]. A non-entangled
product state has a Meyer-Wallach measure of 0, and a maximally entangled state such as
a Bell state has a Meyer-Wallach measure of 1.
In Figure 3.9, circuit A has no entanglement among the qubits, so its Meyer-Wallach
measure is 0, while circuit B has entanglement in a linear pattern, so it will have a
Meyer-Wallach measure greater than 0.
3.5.2.3 Hardware efficiency
Quantum computers of the NISQ (Noisy Intermediate-Scale Quantum) era suffer from
decoherence: qubits are difficult to maintain in complete isolation, and the environment
continuously interacts with them, causing their states to change and resulting in reduced
gate fidelities. With this problem at hand, the circuit depth and the number of gates applied
become important factors determining the accuracy of the results. Exploring
hardware-efficient PQCs that reconcile with the hardware in existence today is an active
area of research [50].
3.5.4 Optimisation
After evaluation of the expectation value of the cost function, the result is passed on to
the optimiser, which looks for parameters that minimise the output. This step is repeated
many times and the parameters are updated with each iteration until the optimiser detects
the minimum value; it then returns the parameter values $\vec{\theta}^{*}$ that minimise the
expectation value, from which the solution state $|\Psi(\vec{\theta}^{*})\rangle$ is constructed.
We have two main frameworks of optimisation.
In gradient-based optimisation, the parameters are updated in the direction of steepest
descent, $\vec{\theta}_{t+1} = \vec{\theta}_t - \eta\,\vec{\nabla} f(\vec{\theta}_t)$. Here
$\vec{\nabla} f(\vec{\theta})$ is the gradient of the cost function and $\eta$ is a small
hyper-parameter that determines the size of the parameter update. Gradient-based
optimisation works fine with a small number of qubits, but as we increase the number of
qubits to express more features of the data, it suffers from a gruesome problem of
'barren plateaus', which refers to exponentially vanishing gradients.
Figure 3.10: Variance decreases with increasing number of qubits (barren plateau) [40]
Figure 3.11 shows that the variance also decreases with increasing circuit depth. The
optimisation then gets stuck at some local minimum, with the global minimum unexplored,
and as a result we do not obtain the optimised parameters.
Chapter 4
In the last chapter, an introduction to quantum machine learning, together with its different
approaches and popular algorithms, was presented. Here we will present a detailed
breakdown of the Variational Quantum Classifier (VQC) for the Penguins and Titanic
datasets [53]. All the coding is done in Python using the Qiskit library by IBM.
Figure 4.1: Importing the necessary libraries and functions
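Figure 4.1 is reproduced as an image in the original; a representative import block for the workflow of this chapter might look as follows. This is a sketch only: exact module paths (for example qiskit_algorithms versus qiskit.algorithms) depend on the installed Qiskit and qiskit-machine-learning versions, and the Penguins data is assumed to be loaded through seaborn.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.svm import SVC

from qiskit.circuit.library import ZZFeatureMap, RealAmplitudes
from qiskit.primitives import Sampler
from qiskit_algorithms.optimizers import COBYLA, SPSA
from qiskit_machine_learning.algorithms.classifiers import VQC

penguins = sns.load_dataset("penguins")   # the Penguins dataset used in this chapter
print(penguins.head())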
Figure 4.3: Displaying all the missing values
Now we will clean the data by dropping all the missing values. This is done to make sure
that the predicting model does not get a false input and train on it, since that can
significantly affect the accuracy. Figure 4.4 displays the values after cleaning.
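A sketch of this cleaning step, assuming the dataframe from the previous snippet is named penguins:

print(penguins.isnull().sum())        # count of missing values per column (Figure 4.3)
penguins = penguins.dropna()          # drop every row containing a missing value
print(penguins.shape)                 # remaining rows after cleaning (Figure 4.4)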
Figure 4.5: Label encoding
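A sketch of the label-encoding step shown in Figure 4.5, assuming the standard Penguins column names: the categorical columns are mapped to integers so that they can be fed to the classifiers.

from sklearn.preprocessing import LabelEncoder

for column in ["species", "island", "sex"]:           # categorical columns (assumed names)
    penguins[column] = LabelEncoder().fit_transform(penguins[column])

print(penguins.head())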
Figure 4.7: Displaying data
Figure 4.7 displays the relationship between different features. The four features of the
three penguin species make up different combinations for comparison. It helps determine
how each feature varies with respect to another. Feature values which are closely clustered
show more dependence on one another than the ones which are far apart.
Figure 4.9 shows that the features retained after reduction are island, bill length, flipper
length and body mass.
Next we will normalise the data so that all values lie between 0 and 1. After normalisation,
the data will be split into two parts: one part, usually 80% of the data, is used for training,
and the rest is the test data that will be used to evaluate the model, as shown in Figure 4.11.
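A sketch of the target selection, normalisation and split of Figures 4.10 and 4.11, assuming the retained feature columns named above; the scaler and split ratio follow the description in the text.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

features = ["island", "bill_length_mm", "flipper_length_mm", "body_mass_g"]
X = MinMaxScaler().fit_transform(penguins[features])   # scale every feature to [0, 1]
y = penguins["species"].to_numpy()                      # encoded target labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42)              # 80% training, 20% test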
The feature map initialises the qubits (their number proportional to the number of input
features) in the zero state. Hadamard gates put each qubit into superposition. Next, P
represents the Pauli rotation gates, which are Rz gates in this case. The input features are
encoded in the phases of these gates. The phase angles are multiplied by a factor of 2 by
choice of design, to amplify the angles and to derive some advantage during the gradient
calculation. These are followed by the entangling gates, whose pattern can be manually
adjusted as desired [54].
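A sketch of the encoding and ansatz circuits of Figures 4.12 and 4.13, assuming the Qiskit circuit library and the X_train array from the previous snippet; the reps values are illustrative (the feature-map reps are varied in Chapter 5, while Figure 4.13 uses a RealAmplitudes ansatz with 3 repetitions).

from qiskit.circuit.library import ZZFeatureMap, RealAmplitudes

num_features = X_train.shape[1]                        # one qubit per input feature
feature_map = ZZFeatureMap(feature_dimension=num_features, reps=2)
ansatz = RealAmplitudes(num_qubits=num_features, reps=3)

print(feature_map.decompose().draw())
print(ansatz.decompose().draw())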
4.2.11 Optimisation
VQC is a hybrid model in which both classical and quantum computational resources
are used. Here we will use COBYLA, a classical optimizer, to minimize the objective
function. Figure 4.14 gives the code that calls the optimiser.
In this framework, we have two popular primitives: the Sampler, which generates
probability distributions over bit-strings, and the Estimator, which calculates the
expectation values [57]. Here we will use the Sampler, as shown in Figure 4.15.
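A sketch of the optimizer, Sampler and VQC calls of Figures 4.14 to 4.16, together with the accuracy check of Section 4.2.14; it reuses feature_map, ansatz and the train/test arrays from the previous snippets, and keyword names may differ slightly between qiskit-machine-learning versions.

from qiskit.primitives import Sampler
from qiskit_algorithms.optimizers import COBYLA
from qiskit_machine_learning.algorithms.classifiers import VQC

objective_values = []                                   # optimisation path (Figure 4.17)

vqc = VQC(
    feature_map=feature_map,
    ansatz=ansatz,
    optimizer=COBYLA(maxiter=100),
    sampler=Sampler(),
    callback=lambda weights, obj_value: objective_values.append(obj_value),
)
vqc.fit(X_train, y_train)

print("train accuracy:", vqc.score(X_train, y_train))   # about 0.86 in Section 4.2.14
print("test accuracy:", vqc.score(X_test, y_test))      # about 0.82 in Section 4.2.14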
Figure 4.16: Variational quantum classifier
4.2.14 Displaying Accuracy
The last step in training the VQC classification model is to get the accuracy, i.e. how well
the model has performed. For this instance, we have 86% accuracy for the training data and
82% for the test data. It is not as good as the classical result, but it is still fairly accurate.
Considering the still-developing quantum technology and the hardware constraints, it is
quite an interesting result. QML is an active area of research and different frameworks are
proposed frequently; it is quite possible that the same data might give better or worse
accuracy results if trained with algorithms other than VQC.
It is quite evident from Figure 4.20 that the classical algorithm outperforms the quantum
one, but the silver lining is that inherent quantum mechanical properties such as
superposition, entanglement and interference have suggested strong clues about the
potential advantage that can be extracted through a quantum machine learning
algorithm [59].
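A sketch of the classical baseline of Figures 4.19 and 4.20, assuming scikit-learn's SVC with its default RBF kernel and the same normalised features used for the VQC.

from sklearn.svm import SVC

svm = SVC()                                             # default RBF kernel
svm.fit(X_train, y_train)

print("train accuracy:", svm.score(X_train, y_train))
print("test accuracy:", svm.score(X_test, y_test))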
Figure 4.22: SVM result for Titanic data
In this chapter, a detailed breakdown of the VQC algorithm was presented with Python code
snippets for the Penguins and Titanic datasets. It was one example each for a fixed number
of reps, and it can be seen that the QML algorithm is slightly less efficient for the Penguins
data but works equally well for the Titanic dataset. In the next chapter, accuracy results for
various combinations of reps of the feature map and ansatz will be presented for comparison.
Chapter 5
In the previous chapter, the data analysis and the VQC training and test accuracy in the
Python framework were explained. In this chapter we will discuss how the numbers of reps
of the feature map and ansatz contribute to the accuracy of the VQC classification algorithm.
In the following examples the number of reps of the feature map varies from 1 to 3 and that
of the ansatz varies from 1 to 5. We have used the ZZFeatureMap as the feature map and
RealAmplitudes as the ansatz. The numbers of features taken for the Penguins data are
3, 4, 5 and 6, and for the Titanic data 3, 4, 5, 6 and 7. We have used two different
optimizers, COBYLA and SPSA, for a fair comparison.
The first number in each combination represents the number of reps for the feature map,
whereas the second number represents the number of reps for the ansatz.
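The sweep behind the graphs in this chapter can be organised as a simple double loop; the following sketch reuses the helper names from the snippets in Chapter 4 and is illustrative rather than the exact script used for the experiments (maxiter is an assumed setting).

results = {}
for fm_reps in range(1, 4):            # feature-map reps 1-3
    for an_reps in range(1, 6):        # ansatz reps 1-5
        fmap = ZZFeatureMap(feature_dimension=num_features, reps=fm_reps)
        ansatz = RealAmplitudes(num_qubits=num_features, reps=an_reps)
        clf = VQC(feature_map=fmap, ansatz=ansatz,
                  optimizer=COBYLA(maxiter=100), sampler=Sampler())
        clf.fit(X_train, y_train)
        results[(fm_reps, an_reps)] = (clf.score(X_train, y_train),
                                       clf.score(X_test, y_test))

for (fm_reps, an_reps), (train_acc, test_acc) in sorted(results.items()):
    print(f"{fm_reps}{an_reps}: train={train_acc:.2f}, test={test_acc:.2f}")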
Figure 5.1: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 3, optimizer = COBYLA.
Figure 5.2: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 4, optimizer = COBYLA.
Figure 5.3: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 5, optimizer = COBYLA.
Figure 5.4: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 6, optimizer = COBYLA.
The same comparison is now repeated using a different optimizer called SPSA.
Figure 5.5: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 3, optimizer = SPSA.
Figure 5.6: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 4, optimizer = SPSA.
Figure 5.7: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 5, optimizer = SPSA.
Figure 5.8: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 6, optimizer = SPSA.
5.2 Relation between Numbers of Reps and Accuracy for the Titanic Dataset
In this section we see how the accuracy and the numbers of reps of the feature map and
ansatz are related, using the Titanic dataset. The Titanic dataset originally consists of 12
features which can be used to train a classifier to distinguish survived from deceased
passengers. For our purpose of classification through VQC, 7 features have been retained,
and the remaining ones, which were deemed unnecessary or incomplete, were removed
during data cleaning.
Figure 5.9: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 3, optimizer = COBYLA.
Figure 5.10: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 4, optimizer = COBYLA.
Figure 5.11: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 5, optimizer = COBYLA.
Figure 5.12: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 6, optimizer = COBYLA.
Figure 5.13: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 7, optimizer = COBYLA.
The Titanic comparison is now repeated, with SPSA used for optimization.
Figure 5.14: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 3, optimizer = SPSA.
Figure 5.15: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 4, optimizer = SPSA.
Figure 5.16: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 5, optimizer = SPSA.
Figure 5.17: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 6, optimizer = SPSA.
Figure 5.18: Training and test data accuracy against different numbers of reps of feature map and ansatz, number of features = 7, optimizer = SPSA.
Chapter 6
In the previous chapters, the detailed breakdown of model training and the output of
individual instances were presented. In this chapter we will analyse the results in
comparison. Training and test data accuracy graphs against different combinations of reps
have been plotted for various numbers of features.
Figure 6.1: Training data accuracy against different No. of reps for various No. of features.
Figure 6.2: Test data accuracy against different No. of reps for various No. of features.
6.1.2 Training and Test Data Accuracy Comparison Using SPSA
Figures 6.3 and 6.4 present combined graphs of training and test data accuracy against the
number of reps for various numbers of features, using the SPSA optimizer.
Figure 6.3: Training data accuracy against different No. of reps for various No. of features.
Figure 6.4: Test data accuracy against different No. of reps for various No. of features.
6.1.3 Optimal Combination of Reps
Table 6.1 below shows the combinations of repetitions of the feature map and ansatz which
give the best accuracy for various numbers of features.
Table 6.1. Optimal combinations of reps of feature map and Ansatz, penguins data
Figure 6.5: Training data accuracy against different No. of reps for various No. of features.
Figure 6.6: Test data accuracy against different No. of reps for various No. of features.
Figure 6.7: Training data accuracy against different No. of reps for various No. of features.
Figure 6.8: Test data accuracy against different No. of reps for various No. of features.
6.2.3 Optimal Combination of Reps
Table 6.2 below shows the combinations of repetitions of the feature map and ansatz which
give the best accuracy for various numbers of features.
Table 6.2. Optimal combinations of reps of feature map and Ansatz, Titanic data
• The number of repetitions of the feature map and ansatz dramatically affects the
output of the VQC algorithm for both training and test data, and setting the reps to the
right combination will produce optimal results.
• A trade-off between the number of reps and the circuit depth is important since, on
NISQ-era hardware, we are constrained to minimise the number of qubits and gates and
the circuit depth.
• In Figures 6.1 through 6.8 it can be seen that accuracy peaks appear when the number
of reps of the ansatz is increased from 1 to 5. At the same time, increasing the number of
reps of the feature map limits the performance.
• We also propose that, although increasing the number of reps of the feature map helps
capture intricate patterns in the data and might be of statistical value, it also negatively
affects the accuracy of the model, as seen in Figures 6.1 through 6.8. It is complicated to
comment on why exactly this is the case; however, a likely reason for the impact is the
increase in circuit length, which would invite noise.
• From the results in Sections 6.1 and 6.2 it can be concluded that the optimal combinations
for the numbers of repetitions of the feature map and ansatz are 13, 14, 15, 23, 24 and 25,
where the first digit is the feature-map reps and the second digit is the ansatz reps.
Chapter 7
To summarise the results produced in the last chapters: for both the Penguins and Titanic
datasets, the different combinations of repetitions of the feature map and ansatz seem to
play a pivotal role in determining the accuracy. Besides the choice of feature map and
ansatz, the number of times they are applied in a circuit noticeably affects the accuracy.
Quantum hardware of the NISQ era is prone to noise and can produce erroneous outputs.
Under this constraint, circuit depth and the entangling pattern can increase decoherence.
On the other hand, more repetitions of the feature map can represent complex data
structures in Hilbert space, and increasing the number of repetitions of the ansatz can
increase the expressibility, broadening the solution search space so that intricate data
patterns can potentially be captured. This not only helps enhance the accuracy but can
also make the model perform as well with fewer features. Here comes the trade-off between
the accuracy and the complexity of the algorithm. Until fault-tolerant quantum computers
arrive, we have to look for algorithms that use fewer qubits with minimum circuit depth.
Sections 6.1 and 6.2 present the combinations with the optimal number of reps. From the
graphs, accuracy peaks can be seen for the combinations 13, 14, 15, 24 and 25, where the
first number in each combination represents the feature map's reps and the second number
represents the number of reps for the ansatz.
Bibliography
[11] T. Albash, D. A. Lidar, Adiabatic quantum computation, Reviews of Modern
Physics 90 (1) (2018) 015002.
[12] Z. Bian, F. Chudak, W. G. Macready, G. Rose, The Ising model: teaching an old
problem new tricks, D-Wave Systems 2 (2010) 1–32.
[15] Proceedings of the Royal Society of London. Series A: Mathematical and Physical
Sciences 449 (1937) (1995) 669–677. doi:10.1098/rspa.1995.0065.
URL http://dx.doi.org/10.1098/rspa.1995.0065
[18] http://s3.amazonaws.com/sf-web-assets-prod/wp-content/uploads/2018/05/29144646/Scott_AAronson_Quantum_And_Classical_Uncertainty.svg, [Accessed 14-03-2024].
[25] K. A. Tychola, T. Kalampokas, G. A. Papakostas, Quantum machine learning—an
overview, Electronics 12 (11) (2023) 2379.
[26] A. Géron, Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow,
O'Reilly Media, Inc., 2022.
[31] P. Wittek, Quantum machine learning: what quantum computing means to data
mining, Academic Press, 2014.
[32] S. Aaronson, The learnability of quantum states, Proceedings of the Royal Society
A: Mathematical, Physical and Engineering Sciences 463 (2088) (2007) 3089–3114.
[35] P. Rebentrost, M. Mohseni, S. Lloyd, Quantum support vector machine for big
data classification, Physical review letters 113 (13) (2014) 130503.
[36] E. Badr, S. Almotairi, M. A. Salam, H. Ahmed, New sequential and parallel sup-
port vector machine with grey wolf optimizer for breast cancer diagnosis, Alexan-
dria Engineering Journal 61 (3) (2022) 2520–2534.
[38] M. Schuld, N. Killoran, Quantum machine learning in feature Hilbert spaces, Physical
Review Letters 122 (4) (Feb. 2019). doi:10.1103/physrevlett.122.040504.
URL http://dx.doi.org/10.1103/PhysRevLett.122.040504
[39] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair,
A. Courville, Y. Bengio, Generative adversarial networks (2014). arXiv:1406.2661.
[47] R. LaRose, B. Coyle, Robust data encodings for quantum classifiers, Physical
Review A 102 (3) (2020) 032420.
[52] A. Gilyén, S. Arunachalam, N. Wiebe, Optimizing quantum optimization algo-
rithms via faster quantum gradient computation, in: Proceedings of the Thirtieth
Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, 2019, pp. 1425–
1444.