Moonshine deploys every model your lab builds.

Provide a link to your weights and we autonomously set up a production endpoint for you, usually within an hour. An agent handles container setup, GPU selection, and deployment for any common or custom model.

If you would like to try Moonshine, please reach out.

FAQ

What are the limits of “deploy every model”?

Moonshine deploys your model as a callable endpoint on serverless GPUs that scale with load. It's a model behind an API, not a backend, so it does not support long-running processes or persistent data.

What's the full workflow?

  1. Upload your model code + weights to GitHub, Hugging Face, etc.
  2. Provide the link and start the deployment process. Moonshine builds the image, selects the hardware, and deploys the model for you.
  3. Get an email when it's live with your endpoint and generated API documentation.
  4. Afterwards, update the API, change the hardware, or redeploy as needed.

What model types/frameworks do you support?

Any of them. Moonshine is framework- and architecture-agnostic. The only requirement is the shape, not the framework: the model runs on a GPU and works in a single call, with inputs in and results out.
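To make the "single call" shape concrete, here is a minimal sketch of what such an entry point might look like. The function name `predict`, the `features` and `score` keys, and the toy weights are illustrative assumptions, not a required Moonshine interface. The example runs on CPU so it stays self-contained.

```python
from typing import Any, Dict, List

# Hypothetical toy "model": a real one would load its weights once at start-up.
WEIGHTS: List[float] = [0.5, -1.0, 2.0]


def predict(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Single-call entry point: inputs in, results out.

    Nothing is written to disk and no state is kept between calls,
    which matches the "model behind an API" shape described above.
    """
    features = inputs["features"]
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    # Plain dot product standing in for a real forward pass.
    score = sum(w * x for w, x in zip(WEIGHTS, features))
    return {"score": score}
```

A model that needs a background worker, a job queue, or a database between calls does not fit this shape.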

What GPUs do you support?

GPU               VRAM
B300              288 GB
B200              180 GB
H200              141 GB
RTX Pro 6000      96 GB
H100 NVL          94 GB
H100 SXM          80 GB
H100 PCIe         80 GB
A100 SXM          80 GB
A100 PCIe         80 GB
L40S              48 GB
L40               48 GB
RTX 6000 Ada      48 GB
RTX A6000         48 GB
A40               48 GB
RTX 5090          32 GB
RTX 4090          24 GB
RTX 3090          24 GB
L4                24 GB
RTX A5000         24 GB
Your own compute
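
As a rough illustration of how VRAM drives hardware choice, the sketch below picks the smallest GPU from the table above that fits an estimated memory footprint. This is not Moonshine's actual selection logic. The function names, the 2 bytes per parameter (16-bit weights), and the 1.2x overhead factor are assumptions chosen for the example.

```python
from typing import Optional

# VRAM in GB, copied from the table above.
GPU_VRAM_GB = {
    "B300": 288, "B200": 180, "H200": 141, "RTX Pro 6000": 96,
    "H100 NVL": 94, "H100 SXM": 80, "H100 PCIe": 80,
    "A100 SXM": 80, "A100 PCIe": 80, "L40S": 48, "L40": 48,
    "RTX 6000 Ada": 48, "RTX A6000": 48, "A40": 48,
    "RTX 5090": 32, "RTX 4090": 24, "RTX 3090": 24,
    "L4": 24, "RTX A5000": 24,
}


def estimate_vram_gb(params_billion: float,
                     bytes_per_param: int = 2,
                     overhead: float = 1.2) -> float:
    """Rough footprint: weights plus an assumed 20% for activations and buffers."""
    # 1 billion parameters at 1 byte each is about 1 GB.
    return params_billion * bytes_per_param * overhead


def smallest_fitting_gpu(required_gb: float) -> Optional[str]:
    """Return the GPU with the least VRAM that still fits, or None if none does."""
    candidates = [(vram, name) for name, vram in GPU_VRAM_GB.items()
                  if vram >= required_gb]
    if not candidates:
        return None
    # Ties on VRAM are broken alphabetically by name, purely for determinism.
    return min(candidates)[1]
```

Under these assumptions, a 70-billion-parameter model in 16-bit precision needs about 168 GB, so `smallest_fitting_gpu(estimate_vram_gb(70))` returns `"B200"`. Real selection would also weigh price, availability, and throughput.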

Where does my model run?

Moonshine provisions serverless GPUs from a network of compute providers and auto-selects the right hardware. Nothing is stored.

How much does it cost?

Pricing is usage-based. Contact us to learn more.

Who is Moonshine?

We previously built video models and spent too much time on deployment. We're backed by Y Combinator and a number of angels.