An implementation of the candidate elimination algorithm that can be run across multiple cores, written in Rust for speed and fearless concurrency.
Candidate elimination is a form of version-space learning, an older approach to machine learning originally introduced by Tom Mitchell in 1977. It involves maintaining the most specific and most general hypotheses that satisfy all of the training examples the algorithm has been shown so far, where each hypothesis is a logical sentence.
The algorithm is rarely seen nowadays, primarily because of its lack of noise resistance: a single incorrectly labelled training example will cause the algorithm to converge incorrectly, a problem that more modern machine learning methods, such as neural networks, are far more robust to.
The main advantage candidate elimination has over more modern approaches is that its output is easily interpretable. It is near impossible to work out what concepts a neural network (or any other black-box approach) has learned by looking at its weights alone. By contrast, looking at the hypotheses in the specific and general boundaries produced by the algorithm, it is quite easy to see what constraints the target concept places on each attribute.
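To make the specific and general boundaries concrete, here is a minimal, self-contained sketch of the boundary updates in Rust, run on a toy two-attribute weather dataset. The types, names, and data are illustrative assumptions only and do not reflect ccelm's actual API; a full implementation would also prune non-maximal specializations, which is omitted here for brevity.

```rust
// Illustrative sketch of candidate elimination; not ccelm's real types.
#[derive(Clone, Debug, PartialEq)]
enum Constraint {
    Any,        // "?": any value is acceptable for this attribute
    Empty,      // maximally specific: no value is acceptable
    Is(String), // exactly this attribute value
}

use Constraint::{Any, Empty, Is};

/// A conjunctive hypothesis: one constraint per attribute.
type Hypothesis = Vec<Constraint>;

/// Does `h` classify the given example as positive?
fn covers(h: &Hypothesis, example: &[&str]) -> bool {
    h.iter().zip(example).all(|(c, v)| match c {
        Any => true,
        Empty => false,
        Is(s) => s == v,
    })
}

/// Minimally generalize the specific boundary to cover a positive example.
fn generalize(s: &Hypothesis, example: &[&str]) -> Hypothesis {
    s.iter()
        .zip(example)
        .map(|(c, v)| match c {
            Empty => Is(v.to_string()),
            Is(x) if x != v => Any,
            other => other.clone(),
        })
        .collect()
}

/// Minimally specialize a general hypothesis so it rejects a negative
/// example, guided by the specific boundary.
fn specialize(g: &Hypothesis, s: &Hypothesis, example: &[&str]) -> Vec<Hypothesis> {
    let mut out = Vec::new();
    for i in 0..g.len() {
        if let (Any, Is(sv)) = (&g[i], &s[i]) {
            if sv != example[i] {
                let mut h = g.clone();
                h[i] = Is(sv.clone());
                out.push(h);
            }
        }
    }
    out
}

/// Run candidate elimination over labelled examples, returning (S, G).
fn candidate_elimination(
    examples: &[(Vec<&str>, bool)],
    n_attrs: usize,
) -> (Hypothesis, Vec<Hypothesis>) {
    let mut s: Hypothesis = vec![Empty; n_attrs]; // most specific
    let mut g: Vec<Hypothesis> = vec![vec![Any; n_attrs]]; // most general
    for (ex, positive) in examples {
        if *positive {
            s = generalize(&s, ex);
            g.retain(|h| covers(h, ex)); // drop general hypotheses that miss it
        } else {
            let mut next = Vec::new();
            for h in &g {
                if covers(h, ex) {
                    next.extend(specialize(h, &s, ex));
                } else {
                    next.push(h.clone());
                }
            }
            g = next;
        }
    }
    (s, g)
}
```

On toy data such as (sunny, warm) positive, (rainy, cold) negative, (sunny, hot) positive, both boundaries converge to a single hypothesis that reads directly as "the sky must be sunny", which is exactly the kind of interpretability described above.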
The tool can be installed using cargo, the package manager that ships with the Rust toolchain.
Since the tool isn't currently published to crates.io, the easiest way to get it running is to `cd` into a cloned copy of the repository and run:

```shell
cargo install --path .
```
Assuming cargo has been configured correctly, this will add the `ccelm` command to your path. Instructions on how to use the command can be found by running `ccelm --help`. Examples of configured datasets can be found in `data/`.
Pre-built binaries are likewise not currently available.
This repository includes data adapted from the paper *At the Boundaries of Syntactic Prehistory*. The original data can be found on GitHub.