
fal is built for latency. This skill teaches your agent the queue lifecycle, streaming primitives, model slugs, and SDK conventions so it stops hand-rolling HTTP.
Install command
npm install @scopeful/fal-ai-models-runner
Download skill file: fal-ai-models-runner.md (8 KB)
Fetch via the Scopeful MCP (any client)
Once your agent is connected to the Scopeful MCP, it can load this skill on demand, no install required:
get_skill('fal-ai-models-runner')
fal is a serverless inference platform built around one promise: faster cold starts and lower latency than the rest of the hosted-inference market. Flux schnell on fal returns a finished image in under two seconds. The platform exposes a queue API, server-sent-event streaming, WebSocket real-time, official Python and JS SDKs, and a hosted MCP server at mcp.fal.ai/mcp. Agents that paste in generic HTTP calls miss the queue lifecycle, the streaming primitives, and the model registry conventions. This skill teaches an agent how to use fal the way fal wants to be used.
Use fal when:
Use Replicate (companion skill) when the model isn't on fal, you need pinned model versions (Replicate exposes version SHAs; fal mostly doesn't), or the catalog gap matters more than latency.
pip install fal-client
npm install @fal-ai/client
export FAL_KEY="..."
Official hosted MCP at mcp.fal.ai/mcp (Claude Code, Cursor, Windsurf; Claude Desktop not yet supported, needs OAuth 2.0):
claude mcp add --transport http fal-ai https://mcp.fal.ai/mcp \
--header "Authorization: Bearer $FAL_KEY"
MCP exposes 9 tools: search_models, get_model_schema, get_pricing, search_docs, run_model, submit_job, check_job, upload_file, recommend_model.
Every fal request has the same shape: a model slug (fal-ai/<family>/<variant>) plus an arguments / input object. The slug is the identity; the arguments are model-specific. Always call get_model_schema (or read the model page on fal.ai) before guessing field names. fal models do not share a unified schema.
# Python
import fal_client
result = fal_client.subscribe(
"fal-ai/flux/schnell",
arguments={"prompt": "rain-soaked neon noir street", "image_size": "landscape_16_9"},
)
print(result["images"][0]["url"])
// JS / TS
import { fal } from "@fal-ai/client";
const result = await fal.subscribe("fal-ai/flux/schnell", {
input: { prompt: "rain-soaked neon noir street", image_size: "landscape_16_9" },
onQueueUpdate: (update) => console.log(update.status),
});
console.log(result.data.images[0].url);
Four execution patterns. Pick the right one:
| Pattern | Use when | Returns |
|---|---|---|
| run() | One-shot, you can wait, no queue visibility | Final result |
| subscribe() | Default for agent code. Blocks, polls queue, exposes progress | Final result + queue updates |
| submit() + iter_events() + get() | Long jobs, webhooks, background work | request_id, then events |
| stream() | Live SSE progress. Bypasses queue, no retries | Iterator of events |
Submit + event stream (Python):
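import fal_client

# Enqueue the job; submit() returns a handle immediately instead of blocking.
handler = fal_client.submit("fal-ai/flux/schnell", arguments={"prompt": "..."})
# Follow queue events and print log lines while the job is in progress.
for event in handler.iter_events(with_logs=True):
    if isinstance(event, fal_client.InProgress):
        for log in event.logs or []:
            print(log["message"])
# Fetch the final result once the event stream ends.
result = handler.get()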
stream() does not support priority, start_timeout, client_timeout, or custom headers. It hits fal.run directly, no queue. Use subscribe() if you need queue guarantees.
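The table and the stream() caveat above can be condensed into a small decision helper. This is a sketch only: pick_pattern and its flag names are hypothetical and not part of fal-client, and the priority order is one reasonable reading of the table rather than an official rule.

```python
def pick_pattern(*, background: bool = False, live_events: bool = False,
                 needs_queue_options: bool = False,
                 wants_progress: bool = True) -> str:
    """Return the name of the execution pattern suggested by the table.

    All flag names are hypothetical; they only encode the "Use when" column.
    """
    # Long jobs, webhooks, and background work go through submit().
    if background:
        return "submit"
    # stream() bypasses the queue, so it only fits when no queue options
    # (priority, start_timeout, client_timeout, custom headers) are needed.
    if live_events and not needs_queue_options:
        return "stream"
    # subscribe() is the default; run() is for one-shot calls without progress.
    return "subscribe" if wants_progress else "run"
```

Called with no arguments, the helper falls through to subscribe(), which matches the table's "default for agent code" guidance.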
Slugs follow fal-ai/<family>/<variant>. Verify on fal.ai/models before locking into production, since families version frequently.
| Model | Slug | Use case |
|---|---|---|
| Flux schnell | fal-ai/flux/schnell | Fastest Flux, 1-4 steps, sub-2s. Drafts and iteration. |
| Flux dev | fal-ai/flux/dev | Standard quality, commercial-use license. |
| Flux Pro v1.1 | fal-ai/flux-pro/v1.1 | Higher fidelity, better composition. |
| Flux Pro Ultra | fal-ai/flux-pro/v1.1-ultra | Up to 2K, photoreal. |
| Fast SDXL | fal-ai/fast-sdxl | LoRA-friendly, very fast. |
| Recraft V4 | fal-ai/recraft/v4/text-to-image | Design, brand systems, vector-friendly. |
| Kling v3 Pro | fal-ai/kling-video/v3/pro/text-to-video | Cinematic video with native audio. |
Audio: fal-ai/elevenlabs/tts/turbo-v2.5, fal-ai/minimax/speech-2.8-hd. Verify all slugs against fal.ai/models before locking into production.
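A cheap shape check can catch typos before a request is sent. The sketch below assumes slugs are lowercase, start with fal-ai, and use only letters, digits, dots, hyphens, and underscores per segment; that pattern is inferred from the slugs listed above, not from a published grammar. It does not confirm that a model exists, so get_model_schema or fal.ai/models is still the source of truth. The name looks_like_fal_slug is hypothetical.

```python
import re

# Assumed shape: "fal-ai" followed by one or more "/segment" parts.
SLUG_SHAPE = re.compile(r"fal-ai(/[a-z0-9][a-z0-9._-]*)+")


def looks_like_fal_slug(slug: str) -> bool:
    """Return True if `slug` matches the assumed fal-ai/<family>/<variant> shape."""
    return SLUG_SHAPE.fullmatch(slug) is not None
```

The pattern deliberately allows any number of segments, since the slugs above range from fal-ai/fast-sdxl to fal-ai/kling-video/v3/pro/text-to-video.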
Upload local files before passing them to image-to-image or image-to-video models. Don't inline base64 for anything above a few hundred KB.
url = fal_client.upload_file("./input.jpg")
result = fal_client.subscribe(
"fal-ai/kling-video/v3/pro/image-to-video",
arguments={"image_url": url, "prompt": "slow orbit"},
)
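The upload-versus-inline rule can be sketched as a size check. This assumes the target model accepts a base64 data URI in place of a URL for small inputs (check the model schema first) and uses a hypothetical 256 KB cutoff for "a few hundred KB". The helper inline_or_none is illustrative and not part of fal-client.

```python
import base64
import mimetypes
from pathlib import Path

# Hypothetical cutoff standing in for "a few hundred KB".
INLINE_LIMIT_BYTES = 256 * 1024


def inline_or_none(path: str, limit: int = INLINE_LIMIT_BYTES):
    """Return a data URI for small files, or None when the file should be uploaded."""
    file = Path(path)
    if file.stat().st_size > limit:
        return None  # too large to inline; upload it and pass the URL instead
    mime = mimetypes.guess_type(file.name)[0] or "application/octet-stream"
    encoded = base64.b64encode(file.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

When the helper returns None, fall back to fal_client.upload_file and pass the resulting URL.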
Output URLs from fal.media/files/... are not permanent. Download or rehost immediately if the user needs the asset.
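Since output URLs expire, an agent can copy the asset to local storage as soon as the result arrives. The sketch below uses only the standard library; save_output and its arguments are hypothetical names, and the 60-second timeout is an arbitrary assumption.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def save_output(url: str, dest_dir: str) -> Path:
    """Copy the file at `url` into `dest_dir` and return the local path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Use the last path segment as the filename; fall back to a generic name.
    name = Path(urlparse(url).path).name or "output.bin"
    target = dest / name
    # Stream the body to disk instead of holding the whole file in memory.
    with urllib.request.urlopen(url, timeout=60) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

A typical call would be save_output(result["images"][0]["url"], "./outputs") right after subscribe() returns.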
The get_pricing tool returns current numbers for any slug. Use it before quoting cost. Point the user at scopeful.org/tools/fal for live USD-per-image and USD-per-second math.
handler = fal_client.submit(
"fal-ai/flux/schnell",
arguments={"prompt": "..."},
webhook_url="https://your-server.com/fal-hook",
)
Payload on completion:
{ "request_id": "abc123", "status": "OK", "payload": { "images": [{ "url": "..." }] } }
Webhooks fire once. If your endpoint 5xx's, fal does not retry indefinitely. Idempotent handlers, please.
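An idempotent handler can be sketched as a function that records each request_id and ignores repeats. It assumes the payload shape shown above and treats any status other than "OK" as a failure. handle_fal_webhook is a hypothetical name, and the in-memory set stands in for whatever durable store the server actually uses.

```python
def handle_fal_webhook(body: dict, seen: set) -> list:
    """Return image URLs for a first-time successful delivery, else an empty list."""
    request_id = body.get("request_id")
    # Ignore malformed deliveries and anything already processed.
    if request_id is None or request_id in seen:
        return []
    seen.add(request_id)
    # Assumption: only status "OK" carries usable output.
    if body.get("status") != "OK":
        return []
    images = (body.get("payload") or {}).get("images", [])
    return [image["url"] for image in images]
```

Delivering the same body twice returns the URLs once and an empty list the second time, so a duplicate delivery cannot trigger duplicate downstream work.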
subscribe() snippet (the right default for most cases)
fal.media URLs expire)
scopeful.org/tools/fal
fal-client / @fal-ai/client exists. The SDKs handle queue polling, retries, and SSE parsing.
status() in a tight loop. Use iter_events() (Python) or onQueueUpdate (JS).
num_inference_steps (1-4); Flux Pro takes different controls. Check the schema.
fal-client==0.x in new projects. The SDK shipped 1.0 in April 2026.
fal-ai/elevenlabs/tts/turbo-v2.5 directly.
scopeful.org/tools/fal.