A collection of utilities I wrote for ML stuff!
- K-means clustering
- Principal component analysis
- Convolution
- 1D/2D FFT/IFFT
It also includes some code for compatibility between projects:
- ndarray.simd
  - This provides the primary data type used in this repo, `F64Array`.
  - Support was added to read/write `F64Array`s to/from CSV files.
- Exposed and pgvector
  - Supports storing `F64Array`s in Exposed tables using the `PGvector` type, and implements a DSL for all operations.
- LangChain4J
  - Supports converting `F64Array`s to/from LangChain4J's `Embedding` type.
- Scrimage
  - Support was added to read/write `F64Array`s to/from image files.
  - Supports:
    - PNG
    - JPEG
    - WEBP (requires the `scrimage-webp` module)
- Kotlinx DataFrame
  - Support was added to convert `F64Array`s to/from DataFrame's `DataColumn<Double>` type.
- MultiK support
  - Potentially switch the entire codebase to use MultiK instead of Viktor at the core; this would also make it multiplatform.
- Kotlinx Serialization support
  - Support for serializing/deserializing `F64Array`s using kotlinx.serialization.
- Built-in extensions for dealing with ND data (like colors, complex numbers, etc.)
Coming soon:tm: to my Maven repo.
The following is a list of assets from `src/test/resources` and their origins.
`hw3-data.csv`: https://docs.google.com/uc?export=download&id=1CjR6Q6nMN_2pTJJietr07mRjEYYSWR7U
- This data was sourced from this Medium article.
`plush.png`: @Totalatomic_ on Twitter.
- This is an image Totalatomic_ made for me (@Martmists-GH) personally back in 2021. Its use is permitted in this project and forks for the purpose of testing, and it is not to be used for any other purpose unless explicitly stated.
This project is licensed under the 3-Clause BSD NON-AI License.
The TL;DR is: You may use the code in this project for AI/ML purposes (heck, that's what it's made for in the first place), but the source code may not be used as part of a dataset to train AI.