Moonshine

Provide a link to your weights and we autonomously set up a production endpoint for you usually within an hour. An agent handles container setup, GPU selection, and deployment for any common or custom model.

If you would like to try Moonshine, please reach out to get set up.

FAQ

What are the limits of “deploy every model”?

Moonshine deploys your model as a callable endpoint on serverless GPUs that scale with load. It's a model behind an API, not a backend, so it does not support long-running processes or persistent data.

What's the full workflow?

Upload your model code + weights to GitHub, Hugging Face, etc.
Provide the link and start the deployment process. Moonshine builds the image, selects the hardware, and deploys the model for you.
Get an email when it's live with your endpoint and generated API documentation.
Update the API, hardware, or redeploy afterwards.

What model types/frameworks do you support?

Any of them. Moonshine is framework and architecture agnostic. The only requirement is the shape, not the framework: the model runs on a GPU and works in a single call with inputs in and results out.

What GPUs do you support?

GPU	VRAM
B300	288 GB
B200	180 GB
H200	141 GB
RTX Pro 6000	96 GB
H100 NVL	94 GB
H100 SXM	80 GB
H100 PCIe	80 GB
A100 SXM	80 GB
A100 PCIe	80 GB
L40S	48 GB
L40	48 GB
RTX 6000 Ada	48 GB
RTX A6000	48 GB
A40	48 GB
RTX 5090	32 GB
RTX 4090	24 GB
RTX 3090	24 GB
L4	24 GB
RTX A5000	24 GB
Your own compute

Where does my model run?

Moonshine provisions serverless GPUs from a network of compute providers and auto-selects the right hardware. Nothing is stored.

How much does it cost?

Pricing is usage based, contact us to learn more.

Who is Moonshine?

We previously built video models and spent too much time on deployment. We're backed by Y Combinator and a number of angels.