Cerebras
Use Cerebras Wafer Scale Engine for ultra-fast Llama inference with Nexus.
Cerebras provides ultra-fast inference on its Wafer Scale Engine hardware. The provider talks to Cerebras through an OpenAI-compatible API.
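
    import (
        "os"

        "github.com/xraph/nexus"
        "github.com/xraph/nexus/providers/cerebras"
    )

    // Read the API key from the environment rather than hard-coding it.
    provider := cerebras.New(os.Getenv("CEREBRAS_API_KEY"))

    // Register the provider with a Nexus gateway.
    gw := nexus.New(
        nexus.WithProvider(provider),
    )
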
| Option | Description |
|---|---|
| `cerebras.WithBaseURL(url)` | Override the API base URL (default: `https://api.cerebras.ai/v1`) |
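
To make the role of the base URL concrete, the sketch below builds (but does not send) the kind of request an OpenAI-compatible chat endpoint expects. It is not the provider's actual implementation: `newChatRequest`, `chatRequest`, and `chatMessage` are hypothetical names, and the `/chat/completions` path and Bearer-token header are assumed from the OpenAI wire format rather than taken from the Nexus source.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// defaultBaseURL is the default listed in the options table.
const defaultBaseURL = "https://api.cerebras.ai/v1"

// chatMessage and chatRequest follow the OpenAI chat-completions wire format.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stream   bool          `json:"stream,omitempty"`
}

// newChatRequest (hypothetical helper) builds a chat request without sending it.
// An empty baseURL falls back to defaultBaseURL.
func newChatRequest(baseURL, apiKey, model, prompt string) (*http.Request, error) {
	if baseURL == "" {
		baseURL = defaultBaseURL
	}
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	// Trim a trailing slash so an override like ".../v1/" still joins cleanly.
	url := strings.TrimRight(baseURL, "/") + "/chat/completions"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newChatRequest("", "example-key", "llama3.1-8b", "Hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Overriding the base URL in this way is typically useful for pointing at a proxy or a local mock server during tests.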
| Capability | Supported |
|---|---|
| Chat | Yes |
| Streaming | Yes |
| Embeddings | No |
| Vision | No |
| Tools | No |
| Thinking | No |
| Model | Context (tokens) | Max Output (tokens) | Input Price (per 1M tokens) | Output Price (per 1M tokens) |
|---|---|---|---|---|
| llama3.1-8b | 8,192 | 4,096 | $0.10 | $0.10 |
| llama3.1-70b | 8,192 | 4,096 | $0.60 | $0.60 |
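
As a worked example of the pricing above, the following sketch estimates the cost of a single request, reading the listed prices as USD per million tokens. The `pricing` type and `estimateCost` function are hypothetical helpers for illustration and are not part of Nexus.

```go
package main

import "fmt"

// pricing holds USD prices per million tokens, copied from the models table.
type pricing struct {
	InputPerM  float64
	OutputPerM float64
}

var prices = map[string]pricing{
	"llama3.1-8b":  {InputPerM: 0.10, OutputPerM: 0.10},
	"llama3.1-70b": {InputPerM: 0.60, OutputPerM: 0.60},
}

// estimateCost (hypothetical helper) returns the USD cost of one request.
func estimateCost(model string, inputTokens, outputTokens int) (float64, error) {
	p, ok := prices[model]
	if !ok {
		return 0, fmt.Errorf("unknown model %q", model)
	}
	cost := float64(inputTokens)/1e6*p.InputPerM +
		float64(outputTokens)/1e6*p.OutputPerM
	return cost, nil
}

func main() {
	// 4,000 prompt tokens and 1,000 completion tokens on the 70B model.
	cost, err := estimateCost("llama3.1-70b", 4000, 1000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("$%.4f\n", cost) // prints "$0.0030"
}
```

Because input and output prices are identical for both listed models, the cost depends only on the total token count; the two rates are kept separate here so the helper still works if they diverge.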