🌲🎉 We've officially partnered with Black Forest Labs to bring you the latest Flux.1 family of models.
Turbo Registry: our new Docker registry, written in Rust, that drastically lowers cold starts for ML models.

Run any AI model in your cloud or ours

With Mystic, you can deploy ML in your own Azure/AWS/GCP account or deploy in our shared GPU cluster.

Recommended

Pay-as-you-go API

Mystic in our shared cloud

Our shared cluster of GPUs is used by hundreds of users simultaneously. Low cost, but performance will vary depending on real-time GPU availability.

Created and used by experts at

  • SensusFuturis
  • Seelab
  • Vellum
  • AWS
  • Cambridge University
  • Google
  • Monzo
  • Renovate AI
  • Charisma AI
  • Hypotenuse AI

Bring your generative AI product to market faster

Good AI products need good models and infrastructure;
we solve the infrastructure part.

Cost optimizations

  • Run on spot and parallelized GPUs

  • Run in AWS/GCP/Azure and use your cloud credits

Fast inference

  • Use vLLM, TensorRT, TGI or any other inference engine

  • Low cold starts with our fast registry

Simpler developer experience

  • A fully managed Kubernetes platform that runs in your own cloud

  • Open-source Python library and API to simplify your entire AI workflow

With our managed platform designed for AI

You get a high-performance platform to serve your AI models. Mystic will automatically scale up and down GPUs depending on the number of API calls your models receive. You can easily view, edit and monitor your infrastructure from your Mystic Dashboard, CLI and APIs.

Cost optimizations

What we’ve done to make sure your infrastructure bill is as low as possible.

Pay for GPUs at cloud cost

Serverless providers charge you a premium on compute that quickly becomes very expensive. With Mystic running in your cloud, there is no added fee on compute.

[Chart: cost comparison with other providers, showing Mystic adds no compute overhead]

Run inference on spot instances

Mystic allows you to run your AI models on spot instances and automatically requests new GPUs when an instance is preempted.

[Graphic: 10x cheaper than other on-demand GPUs (example: A100-80GB on Azure)]

Run in parallel on the same GPU

Mystic supports GPU fractionalization. With zero code changes, you can run multiple models on the same A30, A100, H100 or H200 GPU and maximise GPU utilization.

[Graphic: a GPU split into 24 slices, with different models occupying different slices]

Automatically scale down to 0 GPUs

If your models in production stop receiving requests, our auto-scaler will automatically release the GPUs back to the cloud provider. You can easily customize these warmup and cooldown periods with our API.

[Chart: Mystic scales down to 0 GPUs when idle and back up when requests arrive]
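As a rough sketch, adjusting these periods over the API could look like the snippet below; the endpoint path and field names are illustrative assumptions rather than Mystic's documented API, so check the docs for the exact call.

# Hypothetical sketch: tuning how long GPUs stay warm before being released.
# The endpoint path and JSON field names below are illustrative assumptions.
import requests

resp = requests.patch(
    "https://www.mystic.ai/v4/pipelines/user/pipeline_streaming:v1",  # assumed endpoint
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "scaling": {
            "warmup_seconds": 60,     # keep GPUs warm shortly after scale-up
            "cooldown_seconds": 300,  # release GPUs after 5 idle minutes
        }
    },
)
resp.raise_for_status()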

Cloud credits and commitments

If you are a company with cloud credits or existing cloud spend agreements, you can use them to pay for your cloud bill while using Mystic.

[Graphic: the three supported cloud providers: AWS, GCP and Azure]

Performance optimizations

What we’ve done to make sure your models run extremely fast and have minimal cold start.

Bring your inference engines

Within a few milliseconds, our scheduler decides the optimal queuing, routing and scaling strategy.

  • vLLM
  • TGI
  • TensorRT
  • DeepSpeed
  • ExLlamaV2
  • …or bring your own

High-performance model loader built in Rust

Thanks to our custom container registry, written in Rust, you get much lower cold starts than anywhere else on the market and your containers load extremely fast.


A simple and beautiful developer experience

We believe data scientists and AI engineers should be able to safely deploy their ML without having to be experts in infrastructure.

No Kubernetes or DevOps experience required

Our managed platform removes all the complexities of building and maintaining your custom ML platform. We've done all the engineering so you don't have to.


APIs, CLI and Python SDK to deploy and run your ML

Extremely simple APIs, a CLI tool and an open-source Python library give you the freedom and confidence to serve high-performance ML models.


A beautiful dashboard to view and manage all your ML deployments

A unified dashboard to view all your runs, ML pipelines, versions, GPU clusters, API tokens and much more.


Get started with Mystic

Run your AI models in your cloud or ours

With Mystic, you can deploy ML in your own Azure/AWS/GCP account or deploy in our shared GPU cluster.


How to deploy AI models with Mystic

From 0 to fast API endpoint

From your custom SDXL to your fine-tuned LLM, whether it's a LoRA or a complex pipeline, our open-source tool lets you package your ML pipeline.

Wrap your pipeline with our open source library

Pipeline AI is our open-source Python library to wrap AI pipelines.

Whether it's a standard PyTorch model, a HuggingFace model, a combination of multiple models using your favourite inference engine or your fine-tuned models.

You get it, it's flexible and you can package anything.

View docs
from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams

from pipeline import entity, pipe

@entity
class LlamaPipeline:
    @pipe(on_startup=True, run_once=True)
    def load_model(self) -> None:
        # Download the model weights once at startup and load them into vLLM.
        model_dir = "/tmp/llama2-7b-cache/"
        snapshot_download(
            "meta-llama/Llama-2-7b-chat-hf",
            local_dir=model_dir,
            token="YOUR_HUGGINGFACE_TOKEN",
        )
        self.llm = LLM(
            model_dir,
            dtype="bfloat16",
        )
        self.tokenizer = self.llm.get_tokenizer()

    @pipe
    def inference(self, prompt: str) -> str:
        # Illustrative inference step using vLLM's generate API.
        sampling_params = SamplingParams(temperature=0.7, max_tokens=256)
        output = self.llm.generate([prompt], sampling_params)[0]
        return output.outputs[0].text

Deploy to AWS, GCP, Azure with Mystic

With a single command, a new version of your pipeline is deployed on your own cloud.

Upload your pipeline
pipeline container push

Run your AI model as an API

Get an instant API endpoint to run your model after upload. Mystic automatically scales up and down GPUs depending on the usage of your deployed model. Use our APIs, CLI or Dashboard to view and manage your models and infrastructure.

RESTful APIs to call your model
curl -X POST 'https://www.mystic.ai/v4/runs/stream' \
  --header 'Authorization: Bearer YOUR_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "pipeline": "user/pipeline_streaming:v1",
    "inputs": [{"type": "string", "value": "A lone tree in the desert"}]
  }' -N
[Animation: a streamed image of a tree materializing from white to the full image over one second]
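For reference, the same run from Python with the requests library; the endpoint, headers and payload mirror the curl example above, and the streaming read loop is a minimal sketch:

import requests

resp = requests.post(
    "https://www.mystic.ai/v4/runs/stream",
    headers={
        "Authorization": "Bearer YOUR_TOKEN",
        "Content-Type": "application/json",
    },
    json={
        "pipeline": "user/pipeline_streaming:v1",
        "inputs": [{"type": "string", "value": "A lone tree in the desert"}],
    },
    stream=True,  # keep the connection open and read results as they arrive
)
for line in resp.iter_lines():
    if line:
        print(line.decode())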

Community

See what our public community uploads, and deploy models in your own cloud with 1-click deploy.