Martmists-GH/mlutils

MLUtils

A collection of utilities I wrote for ML stuff!

Features

  • K-means clustering
  • Principal component analysis
  • Convolution
  • 1D/2D FFT/IFFT
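
For anyone unsure what the FFT/IFFT pair computes, here is a rough plain-Kotlin sketch of a naive O(n²) discrete Fourier transform and its inverse. It is not mlutils' API and not an FFT (an FFT produces the same result faster); `ComplexSignal` and `dft` are hypothetical names made up for this illustration, and it uses plain `DoubleArray`s rather than F64Array.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.sin

// Hypothetical helper, not part of mlutils: a complex signal as two parallel arrays.
class ComplexSignal(val re: DoubleArray, val im: DoubleArray)

// Naive O(n^2) discrete Fourier transform; inverse = true computes the inverse transform.
fun dft(input: ComplexSignal, inverse: Boolean = false): ComplexSignal {
    val n = input.re.size
    // Forward transform uses exp(-i*angle), inverse uses exp(+i*angle).
    val sign = if (inverse) 1.0 else -1.0
    // Only the inverse transform carries the 1/n normalisation.
    val scale = if (inverse) 1.0 / n else 1.0
    val outRe = DoubleArray(n)
    val outIm = DoubleArray(n)
    for (k in 0 until n) {
        var sumRe = 0.0
        var sumIm = 0.0
        for (t in 0 until n) {
            val angle = sign * 2.0 * PI * k * t / n
            val c = cos(angle)
            val s = sin(angle)
            // Complex multiply: (re + i*im) * (c + i*s)
            sumRe += input.re[t] * c - input.im[t] * s
            sumIm += input.re[t] * s + input.im[t] * c
        }
        outRe[k] = sumRe * scale
        outIm[k] = sumIm * scale
    }
    return ComplexSignal(outRe, outIm)
}

fun main() {
    // A real-valued signal: the imaginary part is all zeros.
    val signal = ComplexSignal(doubleArrayOf(1.0, 2.0, 3.0, 4.0), DoubleArray(4))
    val spectrum = dft(signal)
    val restored = dft(spectrum, inverse = true)
    println("spectrum (real part): " + spectrum.re.joinToString())
    println("restored (real part): " + restored.re.joinToString())
}
```

Running the transform and then its inverse should give back the original signal up to floating-point error, which is the round trip the FFT/IFFT pair provides.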

It also includes some code for compatibility between projects:

  • ndarray.simd
    • This provides the primary data type used in this repo, F64Array.
    • Adds support for reading and writing F64Arrays to and from CSV files.
  • Exposed and pgvector
    • Supports storing F64Arrays in Exposed tables using the pgvector type, and implements a DSL for all operations.
  • LangChain4J
    • Supports converting F64Arrays to/from LangChain4J's Embedding type.
  • Scrimage
    • Adds support for reading and writing F64Arrays to and from image files.
    • Supported formats:
      • PNG
      • JPEG
      • WebP (requires the scrimage-webp module)
  • Kotlinx DataFrame
    • Adds support for converting F64Arrays to and from DataFrame's DataColumn<Double> type.

Planned features

  • MultiK support
    • Potentially switch the entire codebase to use MultiK instead of Viktor at the core. This would also make it multiplatform.
  • Kotlinx Serialization support
    • Support for serializing/deserializing F64Arrays using kotlinx.serialization.
  • Built-in extensions for dealing with ND data (like colors, complex numbers, etc.)

Installation

Coming soon™ to my Maven repo.

Test data sources

The following is a list of assets from src/test/resources and their origins.

hw3-data.csv: https://docs.google.com/uc?export=download&id=1CjR6Q6nMN_2pTJJietr07mRjEYYSWR7U

plush.png: @Totalatomic_ on Twitter.

  • This is an image Totalatomic_ made for me (@Martmists-GH) personally back in 2021. Its use is permitted in this project and forks for the purpose of testing, and it is not to be used for any other purpose unless explicitly stated.

License

This project is licensed under the 3-Clause BSD NON-AI License.

The TL;DR is: You may use the code in this project for AI/ML purposes (heck, that's what it's made for in the first place), but the source code may not be used as part of a dataset to train AI.
