GPUs and scaling
Request GPUs by canonical name; set worker bounds and multi-GPU configs.
GPU types
Specify the GPU by its canonical name. All types are available in the US region; availability in other regions varies. Use A100_80 for the 80 GB A100 variant.
python
# Consumer / inference
@app.task(gpu=gw.Gpu("T4"))       # 16 GB
@app.task(gpu=gw.Gpu("L4"))       # 24 GB
@app.task(gpu=gw.Gpu("A10G"))     # 24 GB

# Training / large models
@app.task(gpu=gw.Gpu("A100"))     # 40 GB
@app.task(gpu=gw.Gpu("A100_80"))  # 80 GB
@app.task(gpu=gw.Gpu("H100"))     # 80 GB
@app.task(gpu=gw.Gpu("H200"))     # 141 GB

Multi-GPU
Request multiple GPUs on a single worker with the count argument. All GPUs share the same node, which is useful for tensor-parallel inference or DDP training.
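To size a multi-GPU request, a rough starting point is to divide the memory you need by the per-GPU memory listed under GPU types. The sketch below does only that arithmetic. `GPU_MEMORY_GB`, `gpus_needed`, and the `headroom` parameter are hypothetical helpers for illustration, not part of gworker, and the 0.9 default is an assumption rather than a measured figure.

```python
import math

# Per-GPU memory in GB, copied from the GPU types list in this guide.
GPU_MEMORY_GB = {
    "T4": 16,
    "L4": 24,
    "A10G": 24,
    "A100": 40,
    "A100_80": 80,
    "H100": 80,
    "H200": 141,
}


def gpus_needed(required_gb: float, gpu: str, headroom: float = 0.9) -> int:
    """Return how many GPUs of type `gpu` are needed to hold `required_gb`.

    `headroom` is the fraction of each GPU's memory treated as usable
    (an assumption; tune it for your workload).
    """
    if gpu not in GPU_MEMORY_GB:
        raise ValueError(f"unknown GPU type: {gpu!r}")
    if required_gb <= 0 or not 0 < headroom <= 1:
        raise ValueError("required_gb must be > 0 and headroom in (0, 1]")
    usable_gb = GPU_MEMORY_GB[gpu] * headroom
    return math.ceil(required_gb / usable_gb)
```

Under these assumptions, `gpus_needed(140, "A100")` returns 4 and `gpus_needed(140, "A100_80")` returns 2; the result is the value you would pass as count.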
python
@app.task(gpu=gw.Gpu("A100", count=4), memory=65536, timeout=3600)
async def train_ddp():
    import torch
    print(torch.cuda.device_count())  # 4

Worker bounds
min_workers keeps that many containers warm, which avoids cold starts. max_workers caps how far a task fans out under load. Set min_workers=1 for latency-sensitive inference.
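One way to picture the two bounds is as a clamp on whatever worker count demand alone would call for. The function below is only a mental model under that assumption. `target_workers` and `desired` are hypothetical names, and gworker's actual scaling policy is not described here.

```python
def target_workers(desired: int, min_workers: int, max_workers: int) -> int:
    """Clamp a demand-driven worker count into [min_workers, max_workers]."""
    if min_workers < 0 or max_workers < min_workers:
        raise ValueError("need 0 <= min_workers <= max_workers")
    # Never drop below the warm pool, never exceed the fan-out cap.
    return max(min_workers, min(desired, max_workers))
```

With min_workers=1 and max_workers=10, an idle task (`desired=0`) still maps to 1 worker, and a burst that would call for 25 workers is held at 10.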
python
@app.task(
    gpu=gw.Gpu("A100"),
    min_workers=1,   # always 1 warm worker
    max_workers=10,  # scale up to 10 under load
)
async def infer(prompt: str) -> str: ...

Region pinning
Pin a task to a datacenter with the region argument. ANY (the default) lets gworker pick the cheapest available region. Match the task's region to your volume's region to avoid cross-region egress.
python
@app.task(gpu=gw.Gpu("H100"), region=gw.Region.US)
async def run(): ...
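
A small pre-deploy check can catch a mismatch between a task's region and its volume's region. This sketch works on plain strings rather than gworker types. `crosses_region` is a hypothetical helper, and treating ANY as a possible mismatch is an assumption based on the scheduler being free to place the task in any region.

```python
def crosses_region(task_region: str, volume_region: str) -> bool:
    """Return True if the task may run outside the volume's region."""
    task = task_region.strip().upper()
    volume = volume_region.strip().upper()
    # With ANY the placement is not known ahead of time, so flag it.
    if task == "ANY":
        return True
    return task != volume
```

For a volume in US, `crosses_region("US", "US")` is False, while `crosses_region("ANY", "US")` is True, which is a hint to pin the task.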