Object-oriented query API - more flexible and maintainable · Issue #35 · cblearn/cblearn · GitHub
Object-oriented query API - more flexible and maintainable #35

@dekuenstle

Description

Comparison queries can ask different questions:

  • triplet: d(A, B) < d(A, C)
  • quadruplet: d(A, B) < d(C, D)
  • odd-one-out: d(B, C) < d(A, B) & d(A, C)
  • most-central: d(A, B) & d(A, C) < d(B, C)
  • choose-n-similar: d(A, B1) & d(A, B2) ... & d(A, Bn) < d(A, C1) ... & d(A, Cm)
  • rank-n-similar: d(A, B1) < d(A, B2) ... < d(A, Bn) < d(A, C1) ... & d(A, Cm)

Notice their relations:

  • triplet is a special form of quadruplet with A or B = C or D
  • triplet is a special form of n-similar with n=1, m=1
  • odd-one-out, choose-n-similar, and rank-n-similar can be represented by multiple triplets
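
To make these relations concrete, here is a minimal NumPy sketch. The function names are hypothetical and not part of cblearn, and the odd-one-out expansion assumes a symmetric distance, d(X, Y) = d(Y, X).

```python
import numpy as np


def triplet_to_quadruplet(triplets):
    """Rows (A, B, C) with d(A, B) < d(A, C) become rows (A, B, A, C)."""
    triplets = np.asarray(triplets)
    return triplets[:, [0, 1, 0, 2]]


def choose_n_similar_to_triplets(anchor, chosen, rejected):
    """One choose-n-similar answer expands to n * m triplets (A, B_i, C_j)."""
    return np.array([(anchor, b, c) for b in chosen for c in rejected])


def odd_one_out_to_triplets(a, b, c):
    """A is the odd one: d(B, C) < d(A, B) and d(B, C) < d(A, C).

    Assuming a symmetric distance, this is the pair of triplets
    (B, C, A) and (C, B, A).
    """
    return np.array([(b, c, a), (c, b, a)])
```

With n=1 and m=1, `choose_n_similar_to_triplets` returns a single row, which is the plain triplet case.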

These queries can be represented in different formats that have their own advantages:

  • list of queries with a column per object. The order of the entries indicates the response
  • list of queries and a list of responses.
  • sparse tensor
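
As an illustration, the same triplets can be moved between the three formats roughly like this. The helper names are hypothetical, and the "sparse tensor" is shown only as a plain coordinate/value pair (COO-style), not as any particular library's sparse type.

```python
import numpy as np

# Format 1: ordered rows (A, B, C); the order itself encodes d(A, B) < d(A, C).
ordered = np.array([[0, 1, 2], [3, 5, 4]])


def ordered_to_X_y(ordered, rng):
    """Randomly swap the two candidates and record the response separately."""
    flip = rng.random(len(ordered)) < 0.5
    X = ordered.copy()
    X[flip] = X[flip][:, [0, 2, 1]]
    y = ~flip  # True where d(X[:, 0], X[:, 1]) < d(X[:, 0], X[:, 2])
    return X, y


def X_y_to_ordered(X, y):
    """Undo the swap so that the row order encodes the response again."""
    ordered = X.copy()
    ordered[~y] = ordered[~y][:, [0, 2, 1]]
    return ordered


def X_y_to_coo(X, y):
    """Coordinate form: the entry at index (A, B, C) is +1 or -1."""
    coords = X.T  # shape (3, n_queries)
    values = np.where(y, 1, -1)
    return coords, values
```

The ordered format is compact, the query/response pair matches the usual `X, y` estimator convention, and the coordinate form makes lookups by object index straightforward.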

Even responses, if not implicit in the order of query items, can come in different formats:

  • True, False of the inequality
  • -1, 1, 0 -> false, true, undecided of the inequality
  • index/indices of the selected item
  • one hot encoding of the selected item
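
A rough sketch of converting between these response formats for triplet queries (A, B, C), where the candidates are B (index 0) and C (index 1). All names here are hypothetical; note that an undecided response (0) has no index or one-hot equivalent, so this sketch simply rejects it.

```python
import numpy as np


def bool_to_numeric(y_bool):
    """True/False of the inequality -> 1/-1."""
    return np.where(np.asarray(y_bool), 1, -1)


def numeric_to_index(y_num):
    """1/-1 -> index of the selected candidate (0 for B, 1 for C)."""
    y_num = np.asarray(y_num)
    if (y_num == 0).any():
        raise ValueError("undecided responses cannot be expressed as an index")
    return np.where(y_num == 1, 0, 1)


def index_to_one_hot(index, n_candidates=2):
    """Index of the selected candidate -> one-hot row per query."""
    return np.eye(n_candidates, dtype=int)[np.asarray(index)]
```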

Currently, we assume a query is a triplet and provide preprocessing functions to convert other queries to triplets. These triplets are stored as plain arrays (plus a response array) or as sparse tensors.
We infer the format and then convert it in a single utility function. This is neither easy to extend nor easy to maintain.

I would like to switch to an object-oriented API instead. We would provide one class per query type, each with its own conversion methods. These classes store data as a list of queries, but can be built from and exported to multiple data formats.
We would still provide a function to infer the query type where possible, so that all functions accept either a Query subclass or the "raw" data formats.

class Query:
    ...
    def to_X(self):
        ...
    def to_X_y(self):
        # ask for response format as parameter
        ...
    def to_sparse(self):
        # ask for response format as parameter
        ...

class Triplet(Query):
    ...
    def to_quadruplet(self):
        return Quadruplet(self.data[:, [0, 1, 0, 2]])

class Quadruplet(Query):
    def to_triplet(self):
        # check if the rows are actually triplets (A or B = C or D)
        # otherwise throw error
        return Triplet(self.data[:, [col indices...]])

...
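
For discussion, here is one way the outline above could be filled in so that it runs. This is only a sketch under assumptions: the `n_columns` attribute, the `as_query` helper, and the width-based inference are hypothetical, and the quadruplet check assumes a symmetric distance so that the shared object can be moved to the anchor position.

```python
import numpy as np


class Query:
    n_columns = None  # hypothetical: number of objects per query row

    def __init__(self, data):
        data = np.asarray(data)
        if data.ndim != 2 or data.shape[1] != self.n_columns:
            raise ValueError(f"expected rows of {self.n_columns} objects")
        self.data = data

    def to_X(self):
        return self.data.copy()


class Triplet(Query):
    n_columns = 3

    def to_quadruplet(self):
        return Quadruplet(self.data[:, [0, 1, 0, 2]])


class Quadruplet(Query):
    n_columns = 4

    def to_triplet(self):
        a, b, c, d = self.data.T
        # A row d(A, B) < d(C, D) is a triplet if one of A, B equals one of C, D.
        conds = [a == c, a == d, b == c, b == d]
        if not np.any(conds, axis=0).all():
            raise ValueError("some rows do not share an object between both pairs")
        anchor = np.select(conds, [a, a, b, b])
        near = np.select(conds, [b, b, a, a])
        far = np.select(conds, [d, c, d, c])
        return Triplet(np.column_stack([anchor, near, far]))


def as_query(data):
    """Accept a Query or raw rows; infer the class from the row width."""
    if isinstance(data, Query):
        return data
    data = np.asarray(data)
    by_width = {3: Triplet, 4: Quadruplet}
    if data.ndim == 2 and data.shape[1] in by_width:
        return by_width[data.shape[1]](data)
    raise ValueError("cannot infer the query type from this data")
```

Inferring the type from the row width alone is ambiguous as soon as two query types share a width (for example triplet and odd-one-out), which is one argument for passing explicit Query objects wherever possible.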

Labels: enhancement (New feature or request)
