Comparison queries can ask different questions:
- triplet: d(A, B) < d(A, C)
- quadruplet: d(A, B) < d(C, D)
- odd-one-out: d(B, C) < d(A, B) & d(A, C)
- most-central: d(A, B) & d(A, C) < d(B, C)
- choose-n-similar: d(A, B1) & d(A, B2) ... & d(A, Bn) < d(A, C1) ... & d(A, Cm)
- rank-n-similar: d(A, B1) < d(A, B2) ... < d(A, Bn) < d(A, C1) ... & d(A, Cm)
Notice their relations:
- a triplet is a special form of a quadruplet with A or B = C or D
- a triplet is a special form of n-similar with n=1, m=1
- odd-one-out, choose-n-similar, and rank-n-similar can each be represented by multiple triplets
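The last relation can be made concrete with a small sketch. It assumes a triplet is written as (anchor, near, far), meaning d(anchor, near) < d(anchor, far), and that objects are plain integer indices. The function names are hypothetical and are not part of the existing code base.

```python
def odd_one_out_to_triplets(query, odd_index):
    """Expand one odd-one-out response into two triplets.

    If A is the odd one among (A, B, C), then d(B, C) < d(A, B) and
    d(B, C) < d(A, C), which gives the triplets (B, C, A) and (C, B, A).
    """
    odd = query[odd_index]
    b, c = [obj for i, obj in enumerate(query) if i != odd_index]
    return [(b, c, odd), (c, b, odd)]


def choose_n_similar_to_triplets(anchor, selected, rejected):
    """Expand a choose-n-similar response into n * m triplets.

    Every selected object Bi is closer to the anchor than every
    rejected object Cj, so each pair (Bi, Cj) yields one triplet.
    """
    return [(anchor, b, c) for b in selected for c in rejected]
```

A rank-n-similar response carries the same triplets plus one per ordered pair among the selected objects, so it could be expanded in the same way.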
These queries can be represented in different formats that have their own advantages:
- list of queries with a column per object. The order of the entries indicates the response
- list of queries and a list of responses.
- sparse tensor
Even responses, if not implicit in the order of query items, can come in different formats:
- True, False of the inequality
- -1, 1, 0 -> false, true, undecided of the inequality
- index/indices of the selected item
- one hot encoding of the selected item
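As a rough illustration of how these response formats relate, the following sketch converts between them for a single query. The helper names are hypothetical, and `None` is assumed to stand for an undecided response.

```python
def bool_to_sign(response):
    """Map True / False / None to 1 / -1 / 0 (None means undecided)."""
    if response is None:
        return 0
    return 1 if response else -1


def index_to_one_hot(index, n_items):
    """Encode the selected item's position as a one-hot list."""
    return [1 if i == index else 0 for i in range(n_items)]


def apply_response(triplet, response):
    """Make the response implicit in the item order.

    For (A, B, C) with response True, d(A, B) < d(A, C) already holds.
    With response False, B and C are swapped so the inequality holds.
    """
    a, b, c = triplet
    return (a, b, c) if response else (a, c, b)
```

Note that the ordered format cannot express an undecided response, so a conversion to it would have to drop or reject those queries.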
Currently, we assume a query is a triplet and provide preprocessing functions to convert other queries to triplets. These triplets are stored as plain arrays (plus a response array) or as sparse tensors.
We infer the format and then convert it in a single utility function. This is neither easy to extend nor easy to maintain.
I would like to switch to an object-oriented API instead. We would provide a class for each type of question, with methods to convert between them. These classes store data as a list of queries, but can be built from and exported to multiple data formats.
We should still provide a function that infers the query type where possible, so that all functions accept either a Query subclass or the "raw" data formats.
class Query:
    ...
    def to_X(self):
        ...
    def to_X_y(self):
        # ask for response format as parameter
        ...
    def to_sparse(self):
        # ask for response format as parameter
        ...

class Triplet(Query):
    ...
    def to_quadruplet(self):
        return Quadruplet(self.data[:, [0, 1, 0, 2]])

class Quadruplet(Query):
    def to_triplet(self):
        # check if the rows are actually triplets (A or B = C or D)
        # otherwise throw error
        return Triplet(self.data[:, [col indices...]])
    ...
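To check that the round trip between triplets and quadruplets is well defined, here is a minimal runnable sketch of the two conversion methods. It is an assumption-laden illustration, not the proposed implementation: rows are stored as plain tuples instead of array indexing, a triplet (A, B, C) means d(A, B) < d(A, C), a quadruplet (A, B, C, D) means d(A, B) < d(C, D), and the export methods are left out.

```python
class Query:
    """Base class holding the queries as a list of tuples (one per query)."""

    def __init__(self, data):
        self.data = [tuple(row) for row in data]


class Triplet(Query):
    def to_quadruplet(self):
        # d(A, B) < d(A, C) becomes d(A, B) < d(A, C) with the anchor repeated.
        return Quadruplet([(a, b, a, c) for a, b, c in self.data])


class Quadruplet(Query):
    def to_triplet(self):
        rows = []
        for a, b, c, d in self.data:
            # A quadruplet is a triplet only if the two pairs share an object,
            # which then serves as the anchor.
            if a == c:
                rows.append((a, b, d))
            elif a == d:
                rows.append((a, b, c))
            elif b == c:
                rows.append((b, a, d))
            elif b == d:
                rows.append((b, a, c))
            else:
                raise ValueError(f"Row {(a, b, c, d)} is not a triplet.")
        return Triplet(rows)
```

For example, `Triplet([(0, 1, 2)]).to_quadruplet().to_triplet().data` gives back `[(0, 1, 2)]`, while a quadruplet with four distinct objects raises a `ValueError`.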