
Walks your agent through MJ's image-to-video workflow: starting frames, --motion, --loop, --end, --bs batch sizes, and the 3.25× SD vs HD GPU cost split.
Install command
npm install @scopeful/midjourney-video
Fetch via the Scopeful MCP (any client)
Once your agent is connected to the Scopeful MCP, it can load this skill on demand, no install required:
get_skill('midjourney-video')
Midjourney's video feature is image-to-video, not text-to-video. You feed it a starting frame (usually an MJ-generated image) and, optionally, a text prompt, and it produces a 5-second clip. You can then extend the clip in 4-second chunks, up to 21 seconds total.
It is expensive compared to images but cheap compared to dedicated video models like Veo or Kling. Quality is mid-tier: better than the Higgsfield Free or Wan 2.5 baseline, worse than Veo 3.1. The killer feature is identity continuity: a video extended from an MJ image preserves the original aesthetic in ways no other model matches today.
Use MJ video when:
Do NOT use MJ video when:
There are two paths: midjourney.com (web) and Discord. The agent's output should target the user's platform.
For user-uploaded images: click the image icon in the Imagine bar, drag the uploaded image into the Starting Frame slot. A text prompt is optional.
<image_URL> <prompt_text> --video
The image URL goes first, then the text prompt, then --video. The image must be reachable online; if the user only has a local file, upload it to Discord first and use that URL.
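The ordering rule above can be sketched as a small helper. This is a minimal illustration, not part of Midjourney or the skill: the function name `build_discord_video_prompt` and the `https://example.com/...` URL are hypothetical, and the only behavior taken from the text is the order (image URL, then prompt text, then parameters, ending with `--video`).

```python
def build_discord_video_prompt(image_url: str, prompt_text: str = "", *params: str) -> str:
    """Assemble '<image_URL> <prompt_text> <params> --video' in that order."""
    image_url = image_url.strip()
    # The starting frame has to be hosted online, so require an http(s) URL.
    if not image_url.startswith(("http://", "https://")):
        raise ValueError("starting frame must be an online image URL")

    parts = [image_url]
    if prompt_text.strip():          # the text prompt is optional
        parts.append(prompt_text.strip())
    parts.extend(params)             # e.g. "--motion low", "--bs 1"
    parts.append("--video")          # required in Discord for a custom image URL
    return " ".join(parts)


if __name__ == "__main__":
    print(build_discord_video_prompt(
        "https://example.com/fisherman.png",   # hypothetical URL
        "moody push-in, gentle fog rolling",
        "--motion low", "--bs 1",
    ))
```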
If the user already has a Midjourney image upscaled in Discord, they can use the Animate (High motion) / Animate (Low motion) buttons directly under it.
Video generations ignore most regular image parameters. Only these work:
| Param | Purpose | Values |
|---|---|---|
| `--motion low` | Subtle camera + character movement (default) | flag |
| `--motion high` | Big camera moves, large character motion (more glitch risk) | flag |
| `--raw` | Reduce MJ's creative flair, follow prompt more literally | flag |
| `--loop` | Reuse the start frame as the end frame (creates a loop) | flag |
| `--end <URL>` | Use a different image as the end frame | URL |
| `--bs N` | Batch size, how many video variations to generate | 1, 2, or 4 (default 4) |
| `--video` | Required in Discord when using a custom image URL | flag |
That's it. Trying to put --ar, --stylize, --sref, --oref on a video prompt does nothing. The aspect ratio is inherited from the starting image. The style is inherited from the starting image.
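Since unsupported parameters are silently ignored, an agent may want to strip them before handing the prompt to the user. The sketch below is one possible approach, not an official parser: `strip_ignored_params` and `VIDEO_PARAMS` are hypothetical names, the allowed set is copied from the table above, and the tokenizing (splitting wherever whitespace is followed by `--`) is a simplifying assumption that will misread prompt text containing a literal `--`.

```python
import re

# Parameters the table above lists as working on video prompts.
VIDEO_PARAMS = {"--motion", "--raw", "--loop", "--end", "--bs", "--video"}


def strip_ignored_params(prompt: str) -> tuple[str, list[str]]:
    """Return (cleaned_prompt, dropped_params) for a video prompt."""
    # Split wherever whitespace is followed by "--", so each flag keeps its value.
    chunks = re.split(r"\s+(?=--)", prompt.strip())
    kept, dropped = [], []
    for chunk in chunks:
        if not chunk.startswith("--"):
            kept.append(chunk)               # image URL and prompt text
        elif chunk.split()[0] in VIDEO_PARAMS:
            kept.append(chunk)               # supported video parameter
        else:
            dropped.append(chunk)            # e.g. "--ar 16:9", "--sref <URL>"
    return " ".join(kept), dropped


if __name__ == "__main__":
    cleaned, dropped = strip_ignored_params(
        "https://example.com/a.png slow pan --ar 16:9 --motion low --stylize 200 --video"
    )
    print(cleaned)
    print(dropped)
```

Returning the dropped parameters lets the agent tell the user which ones had no effect and remind them that aspect ratio and style come from the starting image.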
MJ video is significantly more expensive than images. Memorize this table:
| Resolution | Batch 4 (default) | Batch 2 | Batch 1 |
|---|---|---|---|
| SD (480p) | 8 GPU-min | 4 GPU-min | 2 GPU-min |
| HD (720p) | 26 GPU-min | 13 GPU-min | 7 GPU-min |
Key implication: at the default batch size, HD video costs 3.25× as much as SD (26 vs. 8 GPU-min); at batch 1 the ratio is 3.5× (7 vs. 2). The agent should default-recommend SD for exploration and only switch to HD when the user has picked the final shot.
Per-clip USD cost on Standard plan ($0.0333 per GPU-min):
- SD (`--bs 1`): $0.07
- HD (`--bs 1`): $0.23
If the user is on the Basic plan ($0.05 per GPU-min), all numbers go up 50%.
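The GPU-minute table and the per-GPU-minute rates above can be folded into a small estimator. This is a sketch under the figures quoted in this document only; the names `GPU_MINUTES`, `USD_PER_GPU_MIN`, and `clip_cost` are hypothetical, and real pricing may differ from these numbers.

```python
# GPU-minutes per job, copied from the cost table: (resolution, batch size) -> minutes.
GPU_MINUTES = {
    ("SD", 4): 8, ("SD", 2): 4, ("SD", 1): 2,
    ("HD", 4): 26, ("HD", 2): 13, ("HD", 1): 7,
}

# USD per GPU-minute for the two plans mentioned in the text.
USD_PER_GPU_MIN = {"standard": 0.0333, "basic": 0.05}


def clip_cost(resolution: str, batch_size: int = 4, plan: str = "standard") -> tuple[int, float]:
    """Return (gpu_minutes, usd) for one video job."""
    gpu_min = GPU_MINUTES[(resolution, batch_size)]
    usd = round(gpu_min * USD_PER_GPU_MIN[plan], 2)
    return gpu_min, usd


if __name__ == "__main__":
    for res in ("SD", "HD"):
        for bs in (1, 2, 4):
            print(res, bs, clip_cost(res, bs))
```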
Recommend Relax SD on Pro+ for experimentation. Switch to Fast HD for final.
A single video starts at 5 seconds. You can extend it up to 4 more times, gaining 4 seconds per extension, max 21 seconds total.
Each extension costs the same as the initial video. A 21-second HD clip = 5 video jobs × 26 GPU-min = 130 GPU-min ≈ $4.33 on Standard. Tell the user before they start chaining.
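The chaining arithmetic can be written out explicitly so the agent can quote a total before the user starts extending. A minimal sketch, assuming the numbers stated above (5-second base clip, 4 seconds per extension, at most 4 extensions, each extension priced like the initial job); `extension_plan` is a hypothetical helper name.

```python
def extension_plan(extensions: int, gpu_min_per_job: int,
                   usd_per_gpu_min: float = 0.0333) -> tuple[int, int, float]:
    """Return (total_seconds, total_gpu_minutes, total_usd) for a chained clip."""
    if not 0 <= extensions <= 4:
        raise ValueError("a clip can be extended between 0 and 4 times")
    total_seconds = 5 + 4 * extensions        # 5 s base + 4 s per extension
    jobs = 1 + extensions                     # initial video plus each extension
    total_gpu_min = jobs * gpu_min_per_job
    return total_seconds, total_gpu_min, round(total_gpu_min * usd_per_gpu_min, 2)


if __name__ == "__main__":
    # Fully extended HD clip at the default batch size (26 GPU-min per job).
    print(extension_plan(4, 26))
```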
Two extension modes (web UI buttons or Discord buttons):
--loop reuses the start frame as the end frame. Use this for: subtle scene loops, ambient backgrounds, parallax-style breathing motion. The result genuinely loops cleanly.
--end <URL> uses a different image as the end frame. MJ interpolates between the two. Use this for:
Either flag goes at the end of the prompt with the other video params.
--motion low (default):
--motion high:
If both look bad, the starting image is probably the problem. Re-generate the still first.
The video inherits the starting image's aspect ratio. The exact pixel dimensions:
| Starting AR | Video AR | SD pixels | HD pixels |
|---|---|---|---|
| 1:1 | 1:1 | 624×624 | 960×960 |
| 4:3 | ~4:3 | 720×544 | 1104×832 |
| 2:3 | 2:3 | 512×768 | 784×1168 |
| 16:9 | ~16:9 | 832×464 | 1280×720 |
| 1:2 | 1:2 | 448×880 | 672×1360 |
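For an agent that needs to state the output size up front, the table can be kept as a lookup. This is only a sketch: `VIDEO_PIXELS` and `video_dimensions` are hypothetical names, the values are copied from the table above, and aspect ratios not listed there are deliberately rejected rather than guessed.

```python
# Starting aspect ratio -> output pixel dimensions, copied from the table above.
VIDEO_PIXELS = {
    "1:1":  {"SD": (624, 624), "HD": (960, 960)},
    "4:3":  {"SD": (720, 544), "HD": (1104, 832)},
    "2:3":  {"SD": (512, 768), "HD": (784, 1168)},
    "16:9": {"SD": (832, 464), "HD": (1280, 720)},
    "1:2":  {"SD": (448, 880), "HD": (672, 1360)},
}


def video_dimensions(starting_ar: str, resolution: str = "SD") -> tuple[int, int]:
    """Return (width, height) for a listed starting aspect ratio."""
    try:
        return VIDEO_PIXELS[starting_ar][resolution]
    except KeyError:
        raise KeyError(f"no table entry for {starting_ar} at {resolution}") from None


if __name__ == "__main__":
    print(video_dimensions("16:9", "HD"))
```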
For social-platform deliverables, generate the starting image at the target AR first:
- `--ar 9:16` on the still → video stays vertical
- `--ar 16:9`
- `--ar 1:1`
For an MJ video request, the agent should return:
- `--motion low --bs 1` for first attempt
Example output:
**Animating your fisherman image** (1280×720, HD on Standard plan)
**Discord:**
```
https://your-image-url.jpg moody push-in, gentle fog rolling, fisherman keeps mending nets --motion low --bs 1 --video
```
**Or on midjourney.com:** open the image, click **Animate Manually**, paste the prompt above (without the leading image URL or `--video`).
**Cost estimate:**
- HD, batch-of-1: ~7 GPU-min ≈ $0.23 on Standard
- HD, batch-of-4: ~26 GPU-min ≈ $0.87 (recommended for first pick)
- SD equivalent batch-of-1: 2 GPU-min ≈ $0.07 (recommended for cheap exploration)
**Likely follow-ups:**
- Extend by 4 s if you like the first 5 s (~7 more GPU-min)
- Add `--loop` to make it loop seamlessly
- Switch `--motion low` → `--motion high` if it's too static
Right-click any video on the Create or Organize page to get three download options:
In Discord, right-click only downloads the raw file.
- `--motion low` is too low; try `--motion high`. If that's also static, the source image has no implied motion direction; pick a more dynamic still.
- `--motion high` over-warps; drop to low. Or the source image had ambiguous anatomy (hands, faces in profile).
- `--bs 1` next time for the same prompt to spend less burning bad variants, or rerun the great one with `--motion high` for a variant.