Run fal.ai models for speed

fal is a serverless inference platform built around one promise: faster cold starts and lower latency than the rest of the hosted-inference market. Flux schnell on fal returns a finished image in under two seconds. The platform exposes a queue API, server-sent-event (SSE) streaming, real-time WebSocket connections, official Python and JavaScript SDKs, and a hosted MCP server at mcp.fal.ai/mcp. Agents that paste in generic HTTP calls miss the queue lifecycle, the streaming primitives, and the model registry conventions. This skill teaches an agent to use fal the way the platform is designed to be used.

When to use fal vs Replicate

Use fal when:

The user cares about latency (Flux schnell 1-2s, fast-SDXL 1-3s)
The use case is interactive (live preview, streaming, chat-style iteration)
The model is in fal's "fast-X" catalog (fast-sdxl, fast-lcm, flux schnell)
You need WebSocket / SSE streaming for progressive output

Use Replicate (companion skill) when the model isn't on fal, you need pinned model versions (Replicate exposes version SHAs; fal mostly doesn't), or the catalog gap matters more than latency.
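
The criteria above can be read as a small decision procedure. The sketch below is one hedged way to encode it; the `Job` fields and the `pick_platform` name are hypothetical labels for this illustration and are not part of any fal or Replicate API.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical description of a request; field names are illustrative only."""
    model_on_fal: bool
    needs_pinned_version: bool = False
    latency_sensitive: bool = False
    interactive: bool = False
    needs_streaming: bool = False


def pick_platform(job: Job) -> str:
    """Return "fal", "replicate", or "either" based on the criteria listed above."""
    # Hard blockers for fal: the model is missing, or a pinned version is required.
    if not job.model_on_fal or job.needs_pinned_version:
        return "replicate"
    # Any latency-driven requirement favors fal.
    if job.latency_sensitive or job.interactive or job.needs_streaming:
        return "fal"
    # No strong signal either way.
    return "either"
```

For example, `pick_platform(Job(model_on_fal=True, interactive=True))` selects fal, while a job that needs a pinned version SHA falls through to Replicate regardless of latency needs.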

Install

pip install fal-client
npm install @fal-ai/client
export FAL_KEY="..."

The official hosted MCP server is at mcp.fal.ai/mcp. It works with Claude Code, Cursor, and Windsurf; Claude Desktop is not yet supported because it requires OAuth 2.0. To add it to Claude Code:

claude mcp add --transport http fal-ai https://mcp.fal.ai/mcp \
  --header "Authorization: Bearer $FAL_KEY"
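
Before the first call, an agent may want to confirm that the setup above actually took effect. The following is a minimal sketch of such a check; the `preflight` helper is a hypothetical name introduced here, and it only assumes what the install steps state: the Python package is importable as `fal_client` and the key lives in `FAL_KEY`.

```python
import importlib.util
import os


def preflight(env=None) -> list:
    """Return a list of setup problems; an empty list means the basics are in place."""
    env = os.environ if env is None else env
    problems = []
    # The pip package fal-client is imported as fal_client.
    if importlib.util.find_spec("fal_client") is None:
        problems.append("fal-client is not installed (pip install fal-client)")
    # The key is expected in the FAL_KEY environment variable.
    if not env.get("FAL_KEY"):
        problems.append("FAL_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("setup problem:", problem)
```

Passing a dictionary instead of reading `os.environ` directly keeps the check easy to exercise in tests.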

Queue vs subscribe vs stream

Pattern	Use when	Returns
`run()`	One-shot, you can wait, no queue visibility	Final result
`subscribe()`	Default for agent code. Blocks, polls queue, exposes progress	Final result + queue updates
`submit()` + `iter_events()` + `get()`	Long jobs, webhooks, background work	request_id, then events
`stream()`	Live SSE progress. Bypasses queue, no retries	Iterator of events
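
The two patterns an agent is most likely to need, blocking `subscribe()` and background `submit()`, can be sketched with the Python SDK as follows. This is an illustration under assumptions rather than a reference implementation: the wrapper names (`log_messages`, `run_blocking`, `run_in_background`, `fetch_result`) are hypothetical, the input is assumed to be a single `prompt` argument, and progress updates are assumed to carry a `logs` list of dictionaries with a `message` key.

```python
import os


def log_messages(update) -> list:
    """Return the log messages carried by a queue update, or [] if it has none."""
    logs = getattr(update, "logs", None) or []
    return [entry["message"] for entry in logs if "message" in entry]


def run_blocking(prompt: str, model: str = "fal-ai/flux/schnell") -> dict:
    """subscribe(): block until the job finishes, printing progress logs as they arrive."""
    import fal_client  # lazy import so log_messages() works without the SDK installed

    if not os.environ.get("FAL_KEY"):
        raise RuntimeError("FAL_KEY is not set")

    def on_update(update):
        for message in log_messages(update):
            print(message)

    return fal_client.subscribe(
        model,
        arguments={"prompt": prompt},  # assumed input schema: a single prompt field
        with_logs=True,
        on_queue_update=on_update,
    )


def run_in_background(prompt: str, model: str = "fal-ai/flux/schnell") -> str:
    """submit(): enqueue the job and return its request_id for later retrieval."""
    import fal_client

    handle = fal_client.submit(model, arguments={"prompt": prompt})
    return handle.request_id


def fetch_result(request_id: str, model: str = "fal-ai/flux/schnell") -> dict:
    """Wait for a previously submitted job and return its output."""
    import fal_client

    return fal_client.result(model, request_id)
```

Keeping the `request_id` returned by `run_in_background` is what makes the long-job pattern work: the caller can persist it, return control to the user, and call `fetch_result` later.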

Model registry quick reference

Model	Slug	Use case
Flux schnell	`fal-ai/flux/schnell`	Fastest Flux, 1-4 steps, sub-2s. Drafts and iteration.
Flux dev	`fal-ai/flux/dev`	Standard quality, commercial-use license.
Flux Pro v1.1	`fal-ai/flux-pro/v1.1`	Higher fidelity, better composition.
Flux Pro Ultra	`fal-ai/flux-pro/v1.1-ultra`	Up to 2K, photoreal.
Fast SDXL	`fal-ai/fast-sdxl`	LoRA-friendly, very fast.
Recraft V4	`fal-ai/recraft/v4/text-to-image`	Design, brand systems, vector-friendly.
Kling v3 Pro	`fal-ai/kling-video/v3/pro/text-to-video`	Cinematic video with native audio.
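
One way for an agent to avoid mistyping slugs is to keep the table above as a lookup. The sketch below copies the slugs verbatim from the table; the short keys such as `draft` and `video` and the `slug_for` helper are hypothetical labels chosen for this illustration.

```python
# Slugs are copied from the quick-reference table; the keys are illustrative labels.
MODELS = {
    "draft": "fal-ai/flux/schnell",
    "standard": "fal-ai/flux/dev",
    "high_fidelity": "fal-ai/flux-pro/v1.1",
    "photoreal_2k": "fal-ai/flux-pro/v1.1-ultra",
    "lora": "fal-ai/fast-sdxl",
    "design": "fal-ai/recraft/v4/text-to-image",
    "video": "fal-ai/kling-video/v3/pro/text-to-video",
}


def slug_for(use_case: str) -> str:
    """Return the model slug for a use-case label, failing loudly on unknown labels."""
    try:
        return MODELS[use_case]
    except KeyError:
        known = ", ".join(sorted(MODELS))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}") from None
```

Raising on an unknown label, instead of silently defaulting to one model, keeps an agent from running an expensive endpoint by accident.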
