Sogni AI Image & Video Generation: Decentralized AI Media - Openclaw Skills

Author: Internet

2026-03-27

AI Tutorials

What Is Sogni Image & Video Generation?

Sogni Image & Video Generation is a professional-grade skill that uses Sogni AI's decentralized GPU network to produce high-fidelity media. Integrating this tool into your Openclaw Skills workflow gives you access to a broad range of state-of-the-art models for text-to-image, text-to-video, and advanced image editing. Designed for both human developers and AI agents, the skill provides a powerful media-creation interface without requiring high-end local hardware.

Beyond simple generation, the utility offers specialized workflows for face transfer, photo restoration, and complex video manipulation. As a core component of the Openclaw Skills library, it bridges local file management and decentralized AI compute, keeping generated assets organized and readily accessible to downstream applications.

Download: https://github.com/openclaw/skills/tree/main/skills/krunkosaurus/sogni-gen

Installation & Download

1. ClawHub CLI

The fastest way to install the skill directly from the source.

npx clawhub@latest install sogni-gen

2. Manual Installation

Copy the skill folder to one of the following locations:

  • Global: ~/.openclaw/skills/
  • Workspace: /skills/

Priority: workspace > local > built-in

3. Prompt Installation

Copy this prompt into OpenClaw and it will install automatically.

Please help me install sogni-gen using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).

Sogni Image & Video Generation: Use Cases

  • Create cinematic video sequences from a simple text prompt or a static reference image.
  • Generate consistent character portraits across multiple camera angles with the 360-degree sweep mode.
  • Turn user photos into stylized avatars with the photobooth face-transfer feature.
  • Restore vintage and damaged photos with advanced image-to-image editing models.
  • Produce realistic lip-sync animation with the audio-driven (s2v) workflow.

How Sogni Image & Video Generation Works

  1. A user or agent initiates a request via the CLI, specifying the prompt, model, and reference files.
  2. The skill retrieves an authentication token from the local Sogni credentials file to authorize the request.
  3. The command is transmitted to Sogni AI's decentralized network, where specialized worker nodes handle generation.
  4. Progress is monitored in real time; the final media asset is returned as a URL or downloaded to the specified local path.
  5. Metadata about the render (including seed and model parameters) is stored locally for iterative editing or continued creation.

Sogni Image & Video Generation: Configuration Guide

To start using this skill in your Openclaw Skills environment, follow these steps:

  1. Obtain your credentials from the Sogni AI dashboard.
  2. Create a secure credentials file:
mkdir -p ~/.config/sogni
cat > ~/.config/sogni/credentials << 'EOF'
SOGNI_USERNAME=your_username
SOGNI_PASSWORD=your_password
EOF
chmod 600 ~/.config/sogni/credentials
  3. Install the skill via npm:
mkdir -p ~/.clawdbot/skills
cd ~/.clawdbot/skills
npm i sogni-gen
ln -sfn node_modules/sogni-gen sogni-gen

Sogni Image & Video Generation: Data Architecture and Taxonomy

The skill manages its operation through a structured set of files and directories within the Openclaw Skills ecosystem:

Path Purpose
~/.config/sogni/credentials Stores the username and password used for API access (protect with chmod 600).
~/.config/sogni/last-render.json Persists metadata from the most recent generation for easy retrieval.
~/.openclaw/openclaw.json Stores default plugin configuration such as model IDs and dimensions.
~/.clawdbot/media/inbound Primary directory for locating user-provided images and audio.
~/Downloads/sogni Default save location for assets generated via the Model Context Protocol (MCP).

The skill's manifest (SKILL.md frontmatter) declares its requirements and install steps:
name: sogni-gen
version: "1.5.11"
description: Generate images **and videos** using Sogni AI's decentralized network, with local credential/config files and optional local media inputs. Ask the agent to "draw", "generate", "create an image", or "make a video/animate" from a prompt or reference image.
homepage: https://sogni.ai
metadata:
  clawdbot:
    emoji: "??"
    primaryEnv: "SOGNI_USERNAME"
    os: ["darwin", "linux", "win32"]
    requires:
      bins: ["node"]
      anyBins: ["ffmpeg"]
      env:
        - "SOGNI_USERNAME"
        - "SOGNI_PASSWORD"
        - "SOGNI_CREDENTIALS_PATH"
        - "SOGNI_LAST_RENDER_PATH"
        - "SOGNI_MEDIA_INBOUND_DIR"
        - "OPENCLAW_CONFIG_PATH"
        - "OPENCLAW_PLUGIN_CONFIG"
        - "FFMPEG_PATH"
        - "SOGNI_DOWNLOADS_DIR"
        - "SOGNI_MCP_SAVE_DOWNLOADS"
      config:
        - "~/.config/sogni/credentials"
        - "~/.openclaw/openclaw.json"
        - "~/.clawdbot/media/inbound"
        - "~/.config/sogni/last-render.json"
        - "~/Downloads/sogni"
    install:
      - id: npm
        kind: exec
        command: "cd {{skillDir}} && npm i"
        label: "Install dependencies"

Sogni Image & Video Generation

Generate images and videos using Sogni AI's decentralized GPU network.

Setup

  1. Get Sogni credentials at https://app.sogni.ai/
  2. Create credentials file:
mkdir -p ~/.config/sogni
cat > ~/.config/sogni/credentials << 'EOF'
SOGNI_USERNAME=your_username
SOGNI_PASSWORD=your_password
EOF
chmod 600 ~/.config/sogni/credentials
  3. Install dependencies (if cloned):
cd /path/to/sogni-gen
npm i
  4. Or install from npm (no git clone):
mkdir -p ~/.clawdbot/skills
cd ~/.clawdbot/skills
npm i sogni-gen
ln -sfn node_modules/sogni-gen sogni-gen

Filesystem Paths and Overrides

Default file paths used by this skill:

  • Credentials file (read): ~/.config/sogni/credentials
  • Last render metadata (read/write): ~/.config/sogni/last-render.json
  • OpenClaw config (read): ~/.openclaw/openclaw.json
  • Media listing for --list-media (read): ~/.clawdbot/media/inbound
  • MCP local result copies (write): ~/Downloads/sogni

Path override environment variables:

  • SOGNI_CREDENTIALS_PATH
  • SOGNI_LAST_RENDER_PATH
  • SOGNI_MEDIA_INBOUND_DIR
  • OPENCLAW_CONFIG_PATH
  • SOGNI_DOWNLOADS_DIR (MCP)
  • SOGNI_MCP_SAVE_DOWNLOADS=0 to disable MCP local file writes
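Tooling built around the skill can consult these overrides with a simple env-then-default lookup. Only the variable names and default paths below come from the tables above; the resolution logic itself is an illustrative sketch:

```python
import os
from pathlib import Path

# Documented defaults, keyed by their override environment variable.
PATH_DEFAULTS = {
    "SOGNI_CREDENTIALS_PATH": "~/.config/sogni/credentials",
    "SOGNI_LAST_RENDER_PATH": "~/.config/sogni/last-render.json",
    "SOGNI_MEDIA_INBOUND_DIR": "~/.clawdbot/media/inbound",
    "OPENCLAW_CONFIG_PATH": "~/.openclaw/openclaw.json",
    "SOGNI_DOWNLOADS_DIR": "~/Downloads/sogni",
}

def resolve_path(var: str) -> Path:
    """Use the env var if set, otherwise fall back to the documented default."""
    raw = os.environ.get(var) or PATH_DEFAULTS[var]
    return Path(raw).expanduser()
```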

Usage (Images & Video)

# Generate and get URL
node sogni-gen.mjs "a cat wearing a hat"

# Save to file
node sogni-gen.mjs -o /tmp/cat.png "a cat wearing a hat"

# JSON output (for scripting)
node sogni-gen.mjs --json "a cat wearing a hat"

# Check token balances (no prompt required)
node sogni-gen.mjs --balance

# Check token balances in JSON
node sogni-gen.mjs --json --balance

# Quiet mode (suppress progress)
node sogni-gen.mjs -q -o /tmp/cat.png "a cat wearing a hat"

Options

Flag Description Default
-o, --output Save to file prints URL
-m, --model Model ID z_image_turbo_bf16
-w, --width Width 512
-h, --height Height 512
-n, --count Number of images 1
-t, --timeout Timeout seconds 30 (300 for video)
-s, --seed Specific seed random
--last-seed Reuse seed from last render -
--seed-strategy Seed strategy: random|prompt-hash prompt-hash
--multi-angle Multiple angles LoRA mode (Qwen Image Edit) -
--angles-360 Generate 8 azimuths (front -> front-left) -
--angles-360-video Assemble looping 360 mp4 using i2v between angles (requires ffmpeg) -
--azimuth front|front-right|right|back-right|back|back-left|left|front-left front
--elevation low-angle|eye-level|elevated|high-angle eye-level
--distance close-up|medium|wide medium
--angle-strength LoRA strength for multiple_angles 0.9
--angle-description Optional subject description -
--steps Override steps (model-dependent) -
--guidance Override guidance (model-dependent) -
--output-format Image output format: png|jpg png
--sampler Sampler (model-dependent) -
--scheduler Scheduler (model-dependent) -
--lora LoRA id (repeatable, edit only) -
--loras Comma-separated LoRA ids -
--lora-strength LoRA strength (repeatable) -
--lora-strengths Comma-separated LoRA strengths -
--token-type Token type: spark|sogni spark
--balance, --balances Show SPARK/SOGNI balances and exit -
-c, --context Context image for editing -
--last-image Use last generated image as context/ref -
--video, -v Generate video instead of image -
--workflow Video workflow (t2v|i2v|s2v|v2v|animate-move|animate-replace) inferred
--fps Frames per second (video) 16
--duration Duration in seconds (video) 5
--frames Override total frames (video) -
--auto-resize-assets Auto-resize video assets true
--no-auto-resize-assets Disable auto-resize -
--estimate-video-cost Estimate video cost and exit (requires --steps) -
--photobooth Face transfer mode (InstantID + SDXL Turbo) -
--cn-strength ControlNet strength (photobooth) 0.8
--cn-guidance-end ControlNet guidance end point (photobooth) 0.3
--ref Reference image for video or photobooth face required for video/photobooth
--ref-end End frame for i2v interpolation -
--ref-audio Reference audio for s2v -
--ref-video Reference video for animate/v2v workflows -
--controlnet-name ControlNet type for v2v: canny|pose|depth|detailer -
--controlnet-strength ControlNet strength for v2v (0.0-1.0) 0.8
--sam2-coordinates SAM2 click coords for animate-replace (x,y or x1,y1;x2,y2) -
--trim-end-frame Trim last frame for seamless video stitching -
--first-frame-strength Keyframe strength for start frame (0.0-1.0) -
--last-frame-strength Keyframe strength for end frame (0.0-1.0) -
--last Show last render info -
--json JSON output false
--strict-size Do not auto-adjust i2v video size for reference resizing constraints false
-q, --quiet No progress output false
--extract-last-frame Extract last frame from video (safe ffmpeg wrapper) -
--concat-videos Concatenate video clips (safe ffmpeg wrapper) -
--list-media [type] List recent inbound media (images|audio|all) images

OpenClaw Config Defaults

When installed as an OpenClaw plugin, sogni-gen will read defaults from:

~/.openclaw/openclaw.json

{
  "plugins": {
    "entries": {
      "sogni-gen": {
        "enabled": true,
        "config": {
          "defaultImageModel": "z_image_turbo_bf16",
          "defaultEditModel": "qwen_image_edit_2511_fp8_lightning",
          "defaultPhotoboothModel": "coreml-sogniXLturbo_alpha1_ad",
          "videoModels": {
            "t2v": "wan_v2.2-14b-fp8_t2v_lightx2v",
            "i2v": "wan_v2.2-14b-fp8_i2v_lightx2v",
            "s2v": "wan_v2.2-14b-fp8_s2v_lightx2v",
            "animate-move": "wan_v2.2-14b-fp8_animate-move_lightx2v",
            "animate-replace": "wan_v2.2-14b-fp8_animate-replace_lightx2v",
            "v2v": "ltx2-19b-fp8_v2v_distilled"
          },
          "defaultVideoWorkflow": "t2v",
          "defaultNetwork": "fast",
          "defaultTokenType": "spark",
          "seedStrategy": "prompt-hash",
          "modelDefaults": {
            "flux1-schnell-fp8": { "steps": 4, "guidance": 3.5 },
            "flux2_dev_fp8": { "steps": 20, "guidance": 7.5 }
          },
          "defaultWidth": 768,
          "defaultHeight": 768,
          "defaultCount": 1,
          "defaultFps": 16,
          "defaultDurationSec": 5,
          "defaultImageTimeoutSec": 30,
          "defaultVideoTimeoutSec": 300,
          "credentialsPath": "~/.config/sogni/credentials",
          "lastRenderPath": "~/.config/sogni/last-render.json",
          "mediaInboundDir": "~/.clawdbot/media/inbound"
        }
      }
    }
  }
}

CLI flags always override these defaults. If your OpenClaw config lives elsewhere, set OPENCLAW_CONFIG_PATH. Seed strategies: prompt-hash (deterministic) or random.
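The documented precedence (CLI flags over openclaw.json config over built-in defaults) can be modeled as a layered dictionary merge. The values in BUILTIN_DEFAULTS mirror defaults stated in this README; the merge function itself is illustrative, not the plugin's actual code:

```python
# Lowest-priority layer: the CLI's built-in defaults from the Options table.
BUILTIN_DEFAULTS = {
    "defaultWidth": 512,
    "defaultHeight": 512,
    "seedStrategy": "prompt-hash",
}

def effective_settings(config: dict, cli: dict) -> dict:
    """Merge settings: CLI flags > openclaw.json config > built-in defaults."""
    merged = dict(BUILTIN_DEFAULTS)
    merged.update({k: v for k, v in config.items() if v is not None})
    merged.update({k: v for k, v in cli.items() if v is not None})
    return merged
```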

Image Models

Model Speed Use Case
z_image_turbo_bf16 Fast (~5-10s) General purpose, default
flux1-schnell-fp8 Very fast Quick iterations
flux2_dev_fp8 Slow (~2min) High quality
chroma-v.46-flash_fp8 Medium Balanced
qwen_image_edit_2511_fp8 Medium Image editing with context (up to 3)
qwen_image_edit_2511_fp8_lightning Fast Quick image editing
coreml-sogniXLturbo_alpha1_ad Fast Photobooth face transfer (SDXL Turbo)

Video Models

WAN 2.2 Models

Model Speed Use Case
wan_v2.2-14b-fp8_i2v_lightx2v Fast Default video generation
wan_v2.2-14b-fp8_i2v Slow Higher quality video
wan_v2.2-14b-fp8_t2v_lightx2v Fast Text-to-video
wan_v2.2-14b-fp8_s2v_lightx2v Fast Sound-to-video
wan_v2.2-14b-fp8_animate-move_lightx2v Fast Animate-move
wan_v2.2-14b-fp8_animate-replace_lightx2v Fast Animate-replace

LTX-2 Models

Model Speed Use Case
ltx2-19b-fp8_t2v_distilled Fast (~2-3min) Text-to-video, 8-step
ltx2-19b-fp8_t2v Medium (~5min) Text-to-video, 20-step quality
ltx2-19b-fp8_v2v_distilled Fast (~3min) Video-to-video with ControlNet
ltx2-19b-fp8_v2v Medium (~5min) Video-to-video with ControlNet, quality

Image Editing with Context

Edit images using reference images (Qwen models support up to 3):

# Single context image
node sogni-gen.mjs -c photo.jpg "make the background a beach"

# Multiple context images (subject + style)
node sogni-gen.mjs -c subject.jpg -c style.jpg "apply the style to the subject"

# Use last generated image as context
node sogni-gen.mjs --last-image "make it more vibrant"

When context images are provided without -m, defaults to qwen_image_edit_2511_fp8_lightning.
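The default-model selection described in this README can be summarized as a small decision function. The model IDs are the documented defaults; the branching order is an assumption about the CLI's internals:

```python
def default_model(photobooth: bool = False, context_images: int = 0,
                  video: bool = False) -> str:
    """Pick a default model ID from the request shape (illustrative only)."""
    if photobooth:
        return "coreml-sogniXLturbo_alpha1_ad"        # photobooth default
    if video:
        return "wan_v2.2-14b-fp8_t2v_lightx2v"        # defaultVideoWorkflow: t2v
    if context_images > 0:
        return "qwen_image_edit_2511_fp8_lightning"   # edit default with -c
    return "z_image_turbo_bf16"                       # general-purpose default
```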

Photobooth (Face Transfer)

Generate stylized portraits from a face photo using InstantID ControlNet. When a user mentions "photobooth", wants a stylized portrait of themselves, or asks to transfer their face into a style, use --photobooth with --ref pointing to their face image.

# Basic photobooth
node sogni-gen.mjs --photobooth --ref face.jpg "80s fashion portrait"

# Multiple outputs
node sogni-gen.mjs --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"

# Custom ControlNet tuning
node sogni-gen.mjs --photobooth --ref face.jpg --cn-strength 0.6 --cn-guidance-end 0.5 "oil painting"

Uses SDXL Turbo (coreml-sogniXLturbo_alpha1_ad) at 1024x1024 by default. The face image is passed via --ref and styled according to the prompt. Cannot be combined with --video or -c/--context.

Agent usage:

# Photobooth: stylize a face photo
node {{skillDir}}/sogni-gen.mjs -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"

# Multiple photobooth outputs
node {{skillDir}}/sogni-gen.mjs -q --photobooth --ref /path/to/face.jpg -n 4 -o /tmp/stylized.png "LinkedIn professional headshot"

Multiple Angles (Turnaround)

Generate specific camera angles from a single reference image using the Multiple Angles LoRA:

# Single angle
node sogni-gen.mjs --multi-angle -c subject.jpg \
  --azimuth front-right --elevation eye-level --distance medium \
  --angle-strength 0.9 \
  "studio portrait, same person"

# 360 sweep (8 azimuths)
node sogni-gen.mjs --angles-360 -c subject.jpg --distance medium --elevation eye-level \
  "studio portrait, same person"

# 360 sweep video (looping mp4, uses i2v between angles; requires ffmpeg)
node sogni-gen.mjs --angles-360 --angles-360-video /tmp/turntable.mp4 \
  -c subject.jpg --distance medium --elevation eye-level \
  "studio portrait, same person"

The prompt is auto-built with the required token plus the selected camera angle keywords. --angles-360-video generates i2v clips between consecutive angles (including last→first) and concatenates them with ffmpeg for a seamless loop.
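The clip sequencing can be sketched as follows: AZIMUTHS is the documented 8-angle order, and loop_pairs shows how consecutive angles (including last→first) pair up into i2v clips. This models only the sequencing, not the generation itself:

```python
# The 8 azimuths swept by --angles-360, in the documented order.
AZIMUTHS = ["front", "front-right", "right", "back-right",
            "back", "back-left", "left", "front-left"]

def loop_pairs(angles: list[str]) -> list[tuple[str, str]]:
    """(start, end) frames for each i2v clip, wrapping last -> first
    so the concatenated result loops seamlessly."""
    return [(angles[i], angles[(i + 1) % len(angles)])
            for i in range(len(angles))]
```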

360 Video Best Practices

When a user requests a "360 video", follow this workflow:

  1. Default camera parameters (do not ask unless they specify):

    • Elevation: default to eye-level
    • Distance: default to medium
  2. Map user terms to flags:

    User says Flag value
    "high" angle --elevation high-angle
    "medium" angle --elevation eye-level
    "low" angle --elevation low-angle
    "close" --distance close-up
    "medium" distance --distance medium
    "far" --distance wide
  3. Always use first-frame/last-frame stitching - the --angles-360-video flag automatically handles this by generating i2v clips between consecutive angles including last→first for seamless looping.

  4. Example command:

    node sogni-gen.mjs --angles-360 --angles-360-video /tmp/output.mp4 \
      -c /path/to/image.png --elevation eye-level --distance medium \
      "description of subject"
    

Transition Video Rule

For any transition video work, always use the Sogni skill/plugin (not raw ffmpeg or other shell commands). Use the built-in --extract-last-frame, --concat-videos, and --looping flags for video manipulation.

Insufficient Funds Handling

When you see "Debit Error: Insufficient funds", reply:

"Insufficient funds. Claim 50 free daily Spark points at https://app.sogni.ai/"

Video Generation

Generate videos from a reference image:

# Text-to-video (t2v)
node sogni-gen.mjs --video "ocean waves at sunset"

# Basic video from image
node sogni-gen.mjs --video --ref cat.jpg -o cat.mp4 "cat walks around"

# Use last generated image as reference
node sogni-gen.mjs --last-image --video "gentle camera pan"

# Custom duration and FPS
node sogni-gen.mjs --video --ref scene.png --duration 10 --fps 24 "zoom out slowly"

# Sound-to-video (s2v)
node sogni-gen.mjs --video --ref face.jpg --ref-audio speech.m4a \
  -m wan_v2.2-14b-fp8_s2v_lightx2v "lip sync talking head"

# Animate (motion transfer)
node sogni-gen.mjs --video --ref subject.jpg --ref-video motion.mp4 \
  --workflow animate-move "transfer motion"

Video-to-Video (V2V) with ControlNet

Transform an existing video using LTX-2 models with ControlNet guidance:

# Basic v2v with canny edge detection
node sogni-gen.mjs --video --workflow v2v --ref-video input.mp4 \
  --controlnet-name canny "stylized anime version"

# V2V with pose detection and custom strength
node sogni-gen.mjs --video --workflow v2v --ref-video dance.mp4 \
  --controlnet-name pose --controlnet-strength 0.7 "robot dancing"

# V2V with depth map
node sogni-gen.mjs --video --workflow v2v --ref-video scene.mp4 \
  --controlnet-name depth "watercolor painting style"

ControlNet types: canny (edge detection), pose (body pose), depth (depth map), detailer (detail enhancement).

Photo Restoration

Restore damaged vintage photos using Qwen image editing:

# Basic restoration
sogni-gen -c damaged_photo.jpg -o restored.png \
  "professionally restore this vintage photograph, remove damage and scratches"

# Detailed restoration with preservation hints
sogni-gen -c old_photo.jpg -o restored.png -w 1024 -h 1280 \
  "restore this vintage photo, remove peeling, tears and wear marks, \
  preserve natural features and expression, maintain warm nostalgic color tones"

Tips for good restorations:

  • Describe the damage: "peeling", "scratches", "tears", "fading"
  • Specify what to preserve: "natural features", "eye color", "hair", "expression"
  • Mention the era for color tones: "1970s warm tones", "vintage sepia"
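One way to apply these tips programmatically is to assemble the prompt from structured hints. This helper is purely illustrative and not part of the CLI:

```python
def restoration_prompt(damage: list[str], preserve: list[str],
                       era: str = "") -> str:
    """Compose a restoration prompt from damage terms, preservation
    hints, and an optional era for color tones."""
    parts = ["professionally restore this vintage photograph"]
    if damage:
        parts.append("remove " + ", ".join(damage))
    if preserve:
        parts.append("preserve " + ", ".join(preserve))
    if era:
        parts.append(f"maintain {era} color tones")
    return ", ".join(parts)
```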

Finding received images (Telegram/etc):

node {{skillDir}}/sogni-gen.mjs --json --list-media images

Do NOT use ls, cp, or other shell commands to browse user files. Always use --list-media to find inbound media.

IMPORTANT KEYWORD RULE

  • If the user message includes the word "photobooth" (case-insensitive), always use --photobooth mode with --ref set to the user-provided face image.
  • Prioritize this rule over generic image-edit flows (-c) for that request.

Agent Usage

When user asks to generate/draw/create an image:

# Generate and save locally
node {{skillDir}}/sogni-gen.mjs -q -o /tmp/generated.png "user's prompt"

# Edit an existing image
node {{skillDir}}/sogni-gen.mjs -q -c /path/to/input.jpg -o /tmp/edited.png "make it pop art style"

# Generate video from image
node {{skillDir}}/sogni-gen.mjs -q --video --ref /path/to/image.png -o /tmp/video.mp4 "camera slowly zooms in"

# Generate text-to-video
node {{skillDir}}/sogni-gen.mjs -q --video -o /tmp/video.mp4 "ocean waves at sunset"

# Photobooth: stylize a face photo
node {{skillDir}}/sogni-gen.mjs -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"

# Check current SPARK/SOGNI balances (no prompt required)
node {{skillDir}}/sogni-gen.mjs --json --balance

# Find user-sent images/audio
node {{skillDir}}/sogni-gen.mjs --json --list-media images

# Then send via message tool with filePath

Security: Agents must use the CLI's built-in flags (--extract-last-frame, --concat-videos, --list-media) for all file operations and video manipulation. Never run raw shell commands (ffmpeg, ls, cp, etc.) directly.

Animate Between Two Images (First-Frame / Last-Frame)

When a user asks to animate between two images, use --ref (first frame) and --ref-end (last frame) to create a creative interpolation video:

# Animate from image A to image B
node {{skillDir}}/sogni-gen.mjs -q --video --ref /tmp/imageA.png --ref-end /tmp/imageB.png -o /tmp/transition.mp4 "descriptive prompt of the transition"

Animate a Video to an Image (Scene Continuation)

When a user asks to animate from a video to an image (or "continue" a video into a new scene):

  1. Extract the last frame of the existing video using the built-in safe wrapper:
    node {{skillDir}}/sogni-gen.mjs --extract-last-frame /tmp/existing.mp4 /tmp/lastframe.png
    
  2. Generate a new video using the last frame as --ref and the target image as --ref-end:
    node {{skillDir}}/sogni-gen.mjs -q --video --ref /tmp/lastframe.png --ref-end /tmp/target.png -o /tmp/continuation.mp4 "scene transition prompt"
    
  3. Concatenate the videos using the built-in safe wrapper:
    node {{skillDir}}/sogni-gen.mjs --concat-videos /tmp/full_sequence.mp4 /tmp/existing.mp4 /tmp/continuation.mp4
    

This ensures visual continuity — the new clip picks up exactly where the previous one ended.

Do NOT run raw ffmpeg commands. Always use --extract-last-frame and --concat-videos for video manipulation.

Always apply this pattern when:

  • User says "animate image A to image B" → use --ref A --ref-end B
  • User says "animate this video to this image" → extract last frame, use as --ref, target image as --ref-end, then stitch
  • User says "continue this video" with a target image → same as above
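The three-step continuation workflow can be scripted by assembling the CLI invocations first. The commands below are built but not executed, and SKILL is a stand-in for the real {{skillDir}}/sogni-gen.mjs path:

```python
SKILL = "sogni-gen.mjs"  # placeholder for {{skillDir}}/sogni-gen.mjs

def continuation_commands(existing: str, target: str, prompt: str,
                          workdir: str = "/tmp") -> list[list[str]]:
    """Build the extract / generate / concat command triplet for
    continuing a video into a target image."""
    last = f"{workdir}/lastframe.png"
    cont = f"{workdir}/continuation.mp4"
    full = f"{workdir}/full_sequence.mp4"
    return [
        ["node", SKILL, "--extract-last-frame", existing, last],
        ["node", SKILL, "-q", "--video", "--ref", last,
         "--ref-end", target, "-o", cont, prompt],
        ["node", SKILL, "--concat-videos", full, existing, cont],
    ]
```

An agent could hand each list to its command runner in order, stopping on the first failure.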

JSON Output

{
  "success": true,
  "prompt": "a cat wearing a hat",
  "model": "z_image_turbo_bf16", 
  "width": 512,
  "height": 512,
  "urls": ["https://..."],
  "localPath": "/tmp/cat.png"
}

On error (with --json), the script returns a single JSON object like:

{
  "success": false,
  "error": "Video width and height must be divisible by 16 (got 500x512).",
  "errorCode": "INVALID_VIDEO_SIZE",
  "hint": "Choose --width/--height divisible by 16. For i2v, also match the reference aspect ratio."
}
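For scripting against --json mode, a wrapper can branch on the success field. The field names match the shapes shown above; run_generation is an illustrative sketch that assumes the CLI is runnable from the current directory:

```python
import json
import subprocess

def run_generation(args: list[str]) -> dict:
    """Invoke the CLI in --json mode and parse its single JSON object."""
    out = subprocess.run(["node", "sogni-gen.mjs", "--json", *args],
                         capture_output=True, text=True)
    return json.loads(out.stdout)

def handle(result: dict) -> str:
    """Return the asset location on success, or a formatted error line."""
    if result.get("success"):
        return result.get("localPath") or result["urls"][0]
    return (f"{result.get('errorCode', 'ERROR')}: "
            f"{result.get('error')} ({result.get('hint', '')})")
```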

Balance check example (--json --balance):

{
  "success": true,
  "type": "balance",
  "spark": 12.34,
  "sogni": 0.56
}

Cost

Uses Spark tokens from your Sogni account. 512x512 images are most cost-efficient.

Troubleshooting

  • Auth errors: Check credentials in ~/.config/sogni/credentials
  • i2v sizing gotchas: Video sizes are constrained (min 480px, max 1536px, divisible by 16). For i2v, the client wrapper resizes the reference (fit: inside) and uses the resized dimensions as the final video size. Because this uses rounding, a requested size can still yield an invalid final size (example: 1024x1536 requested but ref becomes 1024x1535).
  • Auto-adjustment: With a local --ref, the script will auto-adjust the requested size to avoid non-16 resized reference dimensions.
  • If the script adjusts your size but you want to fail instead: pass --strict-size and it will print a suggested --width/--height.
  • Timeouts: Try a faster model or increase -t timeout
  • No workers: Check https://sogni.ai for network status
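The divisible-by-16 constraint above can be checked, and a nearby valid size suggested, with a few lines. The snapping rule here is illustrative and may differ from the CLI's own auto-adjustment:

```python
def snap_video_dim(px: int, lo: int = 480, hi: int = 1536) -> int:
    """Round a dimension to the nearest multiple of 16,
    clamped to the documented 480-1536 px range."""
    snapped = round(px / 16) * 16
    return max(lo, min(hi, snapped))

def valid_video_size(w: int, h: int) -> bool:
    """True if both dimensions satisfy the documented i2v constraints."""
    return all(480 <= d <= 1536 and d % 16 == 0 for d in (w, h))
```

For example, the failing 500x512 request from the error example above would snap to 496x512.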