Sogni AI Image & Video Generation: Decentralized AI Media - Openclaw Skills

Author: Internet

2026-03-27

AI Tutorials

What Is Sogni Image & Video Generation?

Sogni Image & Video Generation is a professional-grade skill that uses Sogni AI's decentralized GPU network to produce high-fidelity media. Integrating this tool into your Openclaw Skills workflow gives you access to a broad range of state-of-the-art models for text-to-image, text-to-video, and advanced image editing. Designed for both human developers and AI agents, the skill provides a powerful media-creation interface without requiring high-end local hardware.

Beyond simple generation, the utility offers specialized workflows for face transfer, photo restoration, and complex video manipulation. As a core component of the Openclaw Skills library, it bridges local file management and decentralized AI compute, keeping generated assets organized and readily accessible to downstream applications.

Download: https://github.com/openclaw/skills/tree/main/skills/krunkosaurus/sogni-gen

Installation & Download

1. ClawHub CLI

The fastest way to install the skill directly from the source.

npx clawhub@latest install sogni-gen

2. Manual Installation

Copy the skill folder to one of the following locations:

  • Global: ~/.openclaw/skills/
  • Workspace: /skills/

Priority: workspace > local > built-in

3. Prompt Installation

Copy this prompt into OpenClaw and it will install automatically.

Please help me install sogni-gen using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).

Sogni Image & Video Generation: Use Cases

  • Create cinematic video sequences from a simple text prompt or a static reference image.
  • Generate consistent character portraits across multiple camera angles with the 360-degree sweep mode.
  • Turn user photos into stylized avatars with the photobooth face-transfer feature.
  • Restore vintage and damaged photos with advanced image-to-image editing models.
  • Produce realistic lip-sync animation with the audio-driven (s2v) workflow.

How Sogni Image & Video Generation Works

  1. A user or agent initiates a request via the CLI, specifying the prompt, model, and reference files.
  2. The skill retrieves an authentication token from the local Sogni credentials file to authorize the request.
  3. The command is transmitted to Sogni AI's decentralized network, where specialized worker nodes handle generation.
  4. Progress is monitored in real time; the final media asset is returned as a URL or downloaded to the specified local path.
  5. Metadata about the render (including seed and model parameters) is stored locally for iterative editing or continued creation.

Sogni Image & Video Generation: Configuration Guide

To start using this skill in your Openclaw Skills environment, follow these steps:

  1. Obtain your credentials from the Sogni AI dashboard.
  2. Create a secure credentials file:
mkdir -p ~/.config/sogni
cat > ~/.config/sogni/credentials << 'EOF'
SOGNI_USERNAME=your_username
SOGNI_PASSWORD=your_password
EOF
chmod 600 ~/.config/sogni/credentials
  3. Install the skill via npm:
mkdir -p ~/.clawdbot/skills
cd ~/.clawdbot/skills
npm i sogni-gen
ln -sfn node_modules/sogni-gen sogni-gen

Sogni Image & Video Generation: Data Architecture and Taxonomy

The skill manages its operation through a structured set of files and directories within the Openclaw Skills ecosystem:

Path Purpose
~/.config/sogni/credentials Stores the username and password used for API access (protect with chmod 600).
~/.config/sogni/last-render.json Persists metadata from the most recent generation for easy retrieval.
~/.openclaw/openclaw.json Stores default plugin configuration such as model IDs and dimensions.
~/.clawdbot/media/inbound Primary directory for locating user-provided images and audio.
~/Downloads/sogni Default save location for assets generated via the Model Context Protocol (MCP).

The skill's manifest (SKILL.md frontmatter) declares its requirements and install steps:
name: sogni-gen
version: "1.5.11"
description: Generate images **and videos** using Sogni AI's decentralized network, with local credential/config files and optional local media inputs. Ask the agent to "draw", "generate", "create an image", or "make a video/animate" from a prompt or reference image.
homepage: https://sogni.ai
metadata:
  clawdbot:
    emoji: "??"
    primaryEnv: "SOGNI_USERNAME"
    os: ["darwin", "linux", "win32"]
    requires:
      bins: ["node"]
      anyBins: ["ffmpeg"]
      env:
        - "SOGNI_USERNAME"
        - "SOGNI_PASSWORD"
        - "SOGNI_CREDENTIALS_PATH"
        - "SOGNI_LAST_RENDER_PATH"
        - "SOGNI_MEDIA_INBOUND_DIR"
        - "OPENCLAW_CONFIG_PATH"
        - "OPENCLAW_PLUGIN_CONFIG"
        - "FFMPEG_PATH"
        - "SOGNI_DOWNLOADS_DIR"
        - "SOGNI_MCP_SAVE_DOWNLOADS"
      config:
        - "~/.config/sogni/credentials"
        - "~/.openclaw/openclaw.json"
        - "~/.clawdbot/media/inbound"
        - "~/.config/sogni/last-render.json"
        - "~/Downloads/sogni"
    install:
      - id: npm
        kind: exec
        command: "cd {{skillDir}} && npm i"
        label: "Install dependencies"

Sogni Image & Video Generation

Generate images and videos using Sogni AI's decentralized GPU network.

Setup

  1. Get Sogni credentials at https://app.sogni.ai/
  2. Create credentials file:
mkdir -p ~/.config/sogni
cat > ~/.config/sogni/credentials << 'EOF'
SOGNI_USERNAME=your_username
SOGNI_PASSWORD=your_password
EOF
chmod 600 ~/.config/sogni/credentials
  3. Install dependencies (if cloned):
cd /path/to/sogni-gen
npm i
  4. Or install from npm (no git clone):
mkdir -p ~/.clawdbot/skills
cd ~/.clawdbot/skills
npm i sogni-gen
ln -sfn node_modules/sogni-gen sogni-gen

Filesystem Paths and Overrides

Default file paths used by this skill:

  • Credentials file (read): ~/.config/sogni/credentials
  • Last render metadata (read/write): ~/.config/sogni/last-render.json
  • OpenClaw config (read): ~/.openclaw/openclaw.json
  • Media listing for --list-media (read): ~/.clawdbot/media/inbound
  • MCP local result copies (write): ~/Downloads/sogni

Path override environment variables:

  • SOGNI_CREDENTIALS_PATH
  • SOGNI_LAST_RENDER_PATH
  • SOGNI_MEDIA_INBOUND_DIR
  • OPENCLAW_CONFIG_PATH
  • SOGNI_DOWNLOADS_DIR (MCP)
  • SOGNI_MCP_SAVE_DOWNLOADS=0 to disable MCP local file writes
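Tooling built around the skill can consult these overrides with a simple env-then-default lookup. Only the variable names and default paths below come from the tables above; the resolution logic itself is an illustrative sketch:

```python
import os
from pathlib import Path

# Documented defaults, keyed by their override environment variable.
PATH_DEFAULTS = {
    "SOGNI_CREDENTIALS_PATH": "~/.config/sogni/credentials",
    "SOGNI_LAST_RENDER_PATH": "~/.config/sogni/last-render.json",
    "SOGNI_MEDIA_INBOUND_DIR": "~/.clawdbot/media/inbound",
    "OPENCLAW_CONFIG_PATH": "~/.openclaw/openclaw.json",
    "SOGNI_DOWNLOADS_DIR": "~/Downloads/sogni",
}

def resolve_path(var: str) -> Path:
    """Use the env var if set, otherwise fall back to the documented default."""
    raw = os.environ.get(var) or PATH_DEFAULTS[var]
    return Path(raw).expanduser()
```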

Usage (Images & Video)

# Generate and get URL
node sogni-gen.mjs "a cat wearing a hat"

# Save to file
node sogni-gen.mjs -o /tmp/cat.png "a cat wearing a hat"

# JSON output (for scripting)
node sogni-gen.mjs --json "a cat wearing a hat"

# Check token balances (no prompt required)
node sogni-gen.mjs --balance

# Check token balances in JSON
node sogni-gen.mjs --json --balance

# Quiet mode (suppress progress)
node sogni-gen.mjs -q -o /tmp/cat.png "a cat wearing a hat"

Options

Flag Description Default
-o, --output Save to file prints URL
-m, --model Model ID z_image_turbo_bf16
-w, --width Width 512
-h, --height Height 512
-n, --count Number of images 1
-t, --timeout Timeout seconds 30 (300 for video)
-s, --seed Specific seed random
--last-seed Reuse seed from last render -
--seed-strategy Seed strategy: random|prompt-hash prompt-hash
--multi-angle Multiple angles LoRA mode (Qwen Image Edit) -
--angles-360 Generate 8 azimuths (front -> front-left) -
--angles-360-video Assemble looping 360 mp4 using i2v between angles (requires ffmpeg) -
--azimuth front|front-right|right|back-right|back|back-left|left|front-left front
--elevation low-angle|eye-level|elevated|high-angle eye-level
--distance close-up|medium|wide medium
--angle-strength LoRA strength for multiple_angles 0.9
--angle-description Optional subject description -
--steps Override steps (model-dependent) -
--guidance Override guidance (model-dependent) -
--output-format Image output format: png|jpg png
--sampler Sampler (model-dependent) -
--scheduler Scheduler (model-dependent) -
--lora LoRA id (repeatable, edit only) -
--loras Comma-separated LoRA ids -
--lora-strength LoRA strength (repeatable) -
--lora-strengths Comma-separated LoRA strengths -
--token-type Token type: spark|sogni spark
--balance, --balances Show SPARK/SOGNI balances and exit -
-c, --context Context image for editing -
--last-image Use last generated image as context/ref -
--video, -v Generate video instead of image -
--workflow Video workflow (t2v|i2v|s2v|v2v|animate-move|animate-replace) inferred
--fps Frames per second (video) 16
--duration Duration in seconds (video) 5
--frames Override total frames (video) -
--auto-resize-assets Auto-resize video assets true
--no-auto-resize-assets Disable auto-resize -
--estimate-video-cost Estimate video cost and exit (requires --steps) -
--photobooth Face transfer mode (InstantID + SDXL Turbo) -
--cn-strength ControlNet strength (photobooth) 0.8
--cn-guidance-end ControlNet guidance end point (photobooth) 0.3
--ref Reference image for video or photobooth face required for video/photobooth
--ref-end End frame for i2v interpolation -
--ref-audio Reference audio for s2v -
--ref-video Reference video for animate/v2v workflows -
--controlnet-name ControlNet type for v2v: canny|pose|depth|detailer -
--controlnet-strength ControlNet strength for v2v (0.0-1.0) 0.8
--sam2-coordinates SAM2 click coords for animate-replace (x,y or x1,y1;x2,y2) -
--trim-end-frame Trim last frame for seamless video stitching -
--first-frame-strength Keyframe strength for start frame (0.0-1.0) -
--last-frame-strength Keyframe strength for end frame (0.0-1.0) -
--last Show last render info -
--json JSON output false
--strict-size Do not auto-adjust i2v video size for reference resizing constraints false
-q, --quiet No progress output false
--extract-last-frame Extract last frame from video (safe ffmpeg wrapper) -
--concat-videos Concatenate video clips (safe ffmpeg wrapper) -
--list-media [type] List recent inbound media (images|audio|all) images

OpenClaw Config Defaults

When installed as an OpenClaw plugin, sogni-gen will read defaults from:

~/.openclaw/openclaw.json

{
  "plugins": {
    "entries": {
      "sogni-gen": {
        "enabled": true,
        "config": {
          "defaultImageModel": "z_image_turbo_bf16",
          "defaultEditModel": "qwen_image_edit_2511_fp8_lightning",
          "defaultPhotoboothModel": "coreml-sogniXLturbo_alpha1_ad",
          "videoModels": {
            "t2v": "wan_v2.2-14b-fp8_t2v_lightx2v",
            "i2v": "wan_v2.2-14b-fp8_i2v_lightx2v",
            "s2v": "wan_v2.2-14b-fp8_s2v_lightx2v",
            "animate-move": "wan_v2.2-14b-fp8_animate-move_lightx2v",
            "animate-replace": "wan_v2.2-14b-fp8_animate-replace_lightx2v",
            "v2v": "ltx2-19b-fp8_v2v_distilled"
          },
          "defaultVideoWorkflow": "t2v",
          "defaultNetwork": "fast",
          "defaultTokenType": "spark",
          "seedStrategy": "prompt-hash",
          "modelDefaults": {
            "flux1-schnell-fp8": { "steps": 4, "guidance": 3.5 },
            "flux2_dev_fp8": { "steps": 20, "guidance": 7.5 }
          },
          "defaultWidth": 768,
          "defaultHeight": 768,
          "defaultCount": 1,
          "defaultFps": 16,
          "defaultDurationSec": 5,
          "defaultImageTimeoutSec": 30,
          "defaultVideoTimeoutSec": 300,
          "credentialsPath": "~/.config/sogni/credentials",
          "lastRenderPath": "~/.config/sogni/last-render.json",
          "mediaInboundDir": "~/.clawdbot/media/inbound"
        }
      }
    }
  }
}

CLI flags always override these defaults. If your OpenClaw config lives elsewhere, set OPENCLAW_CONFIG_PATH. Seed strategies: prompt-hash (deterministic) or random.
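The documented precedence (CLI flags over openclaw.json config over built-in defaults) can be modeled as a layered dictionary merge. The values in BUILTIN_DEFAULTS mirror defaults stated in this README; the merge function itself is illustrative, not the plugin's actual code:

```python
# Lowest-priority layer: the CLI's built-in defaults from the Options table.
BUILTIN_DEFAULTS = {
    "defaultWidth": 512,
    "defaultHeight": 512,
    "seedStrategy": "prompt-hash",
}

def effective_settings(config: dict, cli: dict) -> dict:
    """Merge settings: CLI flags > openclaw.json config > built-in defaults."""
    merged = dict(BUILTIN_DEFAULTS)
    merged.update({k: v for k, v in config.items() if v is not None})
    merged.update({k: v for k, v in cli.items() if v is not None})
    return merged
```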

Image Models

Model Speed Use Case
z_image_turbo_bf16 Fast (~5-10s) General purpose, default
flux1-schnell-fp8 Very fast Quick iterations
flux2_dev_fp8 Slow (~2min) High quality
chroma-v.46-flash_fp8 Medium Balanced
qwen_image_edit_2511_fp8 Medium Image editing with context (up to 3)
qwen_image_edit_2511_fp8_lightning Fast Quick image editing
coreml-sogniXLturbo_alpha1_ad Fast Photobooth face transfer (SDXL Turbo)

Video Models

WAN 2.2 Models

Model Speed Use Case
wan_v2.2-14b-fp8_i2v_lightx2v Fast Default video generation
wan_v2.2-14b-fp8_i2v Slow Higher quality video
wan_v2.2-14b-fp8_t2v_lightx2v Fast Text-to-video
wan_v2.2-14b-fp8_s2v_lightx2v Fast Sound-to-video
wan_v2.2-14b-fp8_animate-move_lightx2v Fast Animate-move
wan_v2.2-14b-fp8_animate-replace_lightx2v Fast Animate-replace

LTX-2 Models

Model Speed Use Case
ltx2-19b-fp8_t2v_distilled Fast (~2-3min) Text-to-video, 8-step
ltx2-19b-fp8_t2v Medium (~5min) Text-to-video, 20-step quality
ltx2-19b-fp8_v2v_distilled Fast (~3min) Video-to-video with ControlNet
ltx2-19b-fp8_v2v Medium (~5min) Video-to-video with ControlNet, quality

Image Editing with Context

Edit images using reference images (Qwen models support up to 3):

# Single context image
node sogni-gen.mjs -c photo.jpg "make the background a beach"

# Multiple context images (subject + style)
node sogni-gen.mjs -c subject.jpg -c style.jpg "apply the style to the subject"

# Use last generated image as context
node sogni-gen.mjs --last-image "make it more vibrant"

When context images are provided without -m, defaults to qwen_image_edit_2511_fp8_lightning.
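The default-model selection described in this README can be summarized as a small decision function. The model IDs are the documented defaults; the branching order is an assumption about the CLI's internals:

```python
def default_model(photobooth: bool = False, context_images: int = 0,
                  video: bool = False) -> str:
    """Pick a default model ID from the request shape (illustrative only)."""
    if photobooth:
        return "coreml-sogniXLturbo_alpha1_ad"        # photobooth default
    if video:
        return "wan_v2.2-14b-fp8_t2v_lightx2v"        # defaultVideoWorkflow: t2v
    if context_images > 0:
        return "qwen_image_edit_2511_fp8_lightning"   # edit default with -c
    return "z_image_turbo_bf16"                       # general-purpose default
```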

Photobooth (Face Transfer)

Generate stylized portraits from a face photo using InstantID ControlNet. When a user mentions "photobooth", wants a stylized portrait of themselves, or asks to transfer their face into a style, use --photobooth with --ref pointing to their face image.

# Basic photobooth
node sogni-gen.mjs --photobooth --ref face.jpg "80s fashion portrait"

# Multiple outputs
node sogni-gen.mjs --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"

# Custom ControlNet tuning
node sogni-gen.mjs --photobooth --ref face.jpg --cn-strength 0.6 --cn-guidance-end 0.5 "oil painting"

Uses SDXL Turbo (coreml-sogniXLturbo_alpha1_ad) at 1024x1024 by default. The face image is passed via --ref and styled according to the prompt. Cannot be combined with --video or -c/--context.

Agent usage:

# Photobooth: stylize a face photo
node {{skillDir}}/sogni-gen.mjs -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"

# Multiple photobooth outputs
node {{skillDir}}/sogni-gen.mjs -q --photobooth --ref /path/to/face.jpg -n 4 -o /tmp/stylized.png "LinkedIn professional headshot"

Multiple Angles (Turnaround)

Generate specific camera angles from a single reference image using the Multiple Angles LoRA:

# Single angle
node sogni-gen.mjs --multi-angle -c subject.jpg \
  --azimuth front-right --elevation eye-level --distance medium \
  --angle-strength 0.9 \
  "studio portrait, same person"

# 360 sweep (8 azimuths)
node sogni-gen.mjs --angles-360 -c subject.jpg --distance medium --elevation eye-level \
  "studio portrait, same person"

# 360 sweep video (looping mp4, uses i2v between angles; requires ffmpeg)
node sogni-gen.mjs --angles-360 --angles-360-video /tmp/turntable.mp4 \
  -c subject.jpg --distance medium --elevation eye-level \
  "studio portrait, same person"

The prompt is auto-built with the required token plus the selected camera angle keywords. --angles-360-video generates i2v clips between consecutive angles (including last→first) and concatenates them with ffmpeg for a seamless loop.
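The clip sequencing can be sketched as follows: AZIMUTHS is the documented 8-angle order, and loop_pairs shows how consecutive angles (including last→first) pair up into i2v clips. This models only the sequencing, not the generation itself:

```python
# The 8 azimuths swept by --angles-360, in the documented order.
AZIMUTHS = ["front", "front-right", "right", "back-right",
            "back", "back-left", "left", "front-left"]

def loop_pairs(angles: list[str]) -> list[tuple[str, str]]:
    """(start, end) frames for each i2v clip, wrapping last -> first
    so the concatenated result loops seamlessly."""
    return [(angles[i], angles[(i + 1) % len(angles)])
            for i in range(len(angles))]
```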

360 Video Best Practices

When a user requests a "360 video", follow this workflow:

  1. Default camera parameters (do not ask unless they specify):

    • Elevation: default to eye-level
    • Distance: default to medium
  2. Map user terms to flags:

    User says Flag value
    "high" angle --elevation high-angle
    "medium" angle --elevation eye-level
    "low" angle --elevation low-angle
    "close" --distance close-up
    "medium" distance --distance medium
    "far" --distance wide
  3. Always use first-frame/last-frame stitching - the --angles-360-video flag automatically handles this by generating i2v clips between consecutive angles including last→first for seamless looping.

  4. Example command:

    node sogni-gen.mjs --angles-360 --angles-360-video /tmp/output.mp4 \
      -c /path/to/image.png --elevation eye-level --distance medium \
      "description of subject"
    

Transition Video Rule

For any transition video work, always use the Sogni skill/plugin (not raw ffmpeg or other shell commands). Use the built-in --extract-last-frame, --concat-videos, and --looping flags for video manipulation.

Insufficient Funds Handling

When you see "Debit Error: Insufficient funds", reply:

"Insufficient funds. Claim 50 free daily Spark points at https://app.sogni.ai/"

Video Generation

Generate videos from a reference image:

# Text-to-video (t2v)
node sogni-gen.mjs --video "ocean waves at sunset"

# Basic video from image
node sogni-gen.mjs --video --ref cat.jpg -o cat.mp4 "cat walks around"

# Use last generated image as reference
node sogni-gen.mjs --last-image --video "gentle camera pan"

# Custom duration and FPS
node sogni-gen.mjs --video --ref scene.png --duration 10 --fps 24 "zoom out slowly"

# Sound-to-video (s2v)
node sogni-gen.mjs --video --ref face.jpg --ref-audio speech.m4a \
  -m wan_v2.2-14b-fp8_s2v_lightx2v "lip sync talking head"

# Animate (motion transfer)
node sogni-gen.mjs --video --ref subject.jpg --ref-video motion.mp4 \
  --workflow animate-move "transfer motion"

Video-to-Video (V2V) with ControlNet

Transform an existing video using LTX-2 models with ControlNet guidance:

# Basic v2v with canny edge detection
node sogni-gen.mjs --video --workflow v2v --ref-video input.mp4 \
  --controlnet-name canny "stylized anime version"

# V2V with pose detection and custom strength
node sogni-gen.mjs --video --workflow v2v --ref-video dance.mp4 \
  --controlnet-name pose --controlnet-strength 0.7 "robot dancing"

# V2V with depth map
node sogni-gen.mjs --video --workflow v2v --ref-video scene.mp4 \
  --controlnet-name depth "watercolor painting style"

ControlNet types: canny (edge detection), pose (body pose), depth (depth map), detailer (detail enhancement).

Photo Restoration

Restore damaged vintage photos using Qwen image editing:

# Basic restoration
sogni-gen -c damaged_photo.jpg -o restored.png \
  "professionally restore this vintage photograph, remove damage and scratches"

# Detailed restoration with preservation hints
sogni-gen -c old_photo.jpg -o restored.png -w 1024 -h 1280 \
  "restore this vintage photo, remove peeling, tears and wear marks, \
  preserve natural features and expression, maintain warm nostalgic color tones"

Tips for good restorations:

  • Describe the damage: "peeling", "scratches", "tears", "fading"
  • Specify what to preserve: "natural features", "eye color", "hair", "expression"
  • Mention the era for color tones: "1970s warm tones", "vintage sepia"
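One way to apply these tips programmatically is to assemble the prompt from structured hints. This helper is purely illustrative and not part of the CLI:

```python
def restoration_prompt(damage: list[str], preserve: list[str],
                       era: str = "") -> str:
    """Compose a restoration prompt from damage terms, preservation
    hints, and an optional era for color tones."""
    parts = ["professionally restore this vintage photograph"]
    if damage:
        parts.append("remove " + ", ".join(damage))
    if preserve:
        parts.append("preserve " + ", ".join(preserve))
    if era:
        parts.append(f"maintain {era} color tones")
    return ", ".join(parts)
```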

Finding received images (Telegram/etc):

node {{skillDir}}/sogni-gen.mjs --json --list-media images

Do NOT use ls, cp, or other shell commands to browse user files. Always use --list-media to find inbound media.

IMPORTANT KEYWORD RULE

  • If the user message includes the word "photobooth" (case-insensitive), always use --photobooth mode with --ref set to the user-provided face image.
  • Prioritize this rule over generic image-edit flows (-c) for that request.

Agent Usage

When user asks to generate/draw/create an image:

# Generate and save locally
node {{skillDir}}/sogni-gen.mjs -q -o /tmp/generated.png "user's prompt"

# Edit an existing image
node {{skillDir}}/sogni-gen.mjs -q -c /path/to/input.jpg -o /tmp/edited.png "make it pop art style"

# Generate video from image
node {{skillDir}}/sogni-gen.mjs -q --video --ref /path/to/image.png -o /tmp/video.mp4 "camera slowly zooms in"

# Generate text-to-video
node {{skillDir}}/sogni-gen.mjs -q --video -o /tmp/video.mp4 "ocean waves at sunset"

# Photobooth: stylize a face photo
node {{skillDir}}/sogni-gen.mjs -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"

# Check current SPARK/SOGNI balances (no prompt required)
node {{skillDir}}/sogni-gen.mjs --json --balance

# Find user-sent images/audio
node {{skillDir}}/sogni-gen.mjs --json --list-media images

# Then send via message tool with filePath

Security: Agents must use the CLI's built-in flags (--extract-last-frame, --concat-videos, --list-media) for all file operations and video manipulation. Never run raw shell commands (ffmpeg, ls, cp, etc.) directly.

Animate Between Two Images (First-Frame / Last-Frame)

When a user asks to animate between two images, use --ref (first frame) and --ref-end (last frame) to create a creative interpolation video:

# Animate from image A to image B
node {{skillDir}}/sogni-gen.mjs -q --video --ref /tmp/imageA.png --ref-end /tmp/imageB.png -o /tmp/transition.mp4 "descriptive prompt of the transition"

Animate a Video to an Image (Scene Continuation)

When a user asks to animate from a video to an image (or "continue" a video into a new scene):

  1. Extract the last frame of the existing video using the built-in safe wrapper:
    node {{skillDir}}/sogni-gen.mjs --extract-last-frame /tmp/existing.mp4 /tmp/lastframe.png
    
  2. Generate a new video using the last frame as --ref and the target image as --ref-end:
    node {{skillDir}}/sogni-gen.mjs -q --video --ref /tmp/lastframe.png --ref-end /tmp/target.png -o /tmp/continuation.mp4 "scene transition prompt"
    
  3. Concatenate the videos using the built-in safe wrapper:
    node {{skillDir}}/sogni-gen.mjs --concat-videos /tmp/full_sequence.mp4 /tmp/existing.mp4 /tmp/continuation.mp4
    

This ensures visual continuity — the new clip picks up exactly where the previous one ended.

Do NOT run raw ffmpeg commands. Always use --extract-last-frame and --concat-videos for video manipulation.

Always apply this pattern when:

  • User says "animate image A to image B" → use --ref A --ref-end B
  • User says "animate this video to this image" → extract last frame, use as --ref, target image as --ref-end, then stitch
  • User says "continue this video" with a target image → same as above
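The three-step continuation workflow can be scripted by assembling the CLI invocations first. The commands below are built but not executed, and SKILL is a stand-in for the real {{skillDir}}/sogni-gen.mjs path:

```python
SKILL = "sogni-gen.mjs"  # placeholder for {{skillDir}}/sogni-gen.mjs

def continuation_commands(existing: str, target: str, prompt: str,
                          workdir: str = "/tmp") -> list[list[str]]:
    """Build the extract / generate / concat command triplet for
    continuing a video into a target image."""
    last = f"{workdir}/lastframe.png"
    cont = f"{workdir}/continuation.mp4"
    full = f"{workdir}/full_sequence.mp4"
    return [
        ["node", SKILL, "--extract-last-frame", existing, last],
        ["node", SKILL, "-q", "--video", "--ref", last,
         "--ref-end", target, "-o", cont, prompt],
        ["node", SKILL, "--concat-videos", full, existing, cont],
    ]
```

An agent could hand each list to its command runner in order, stopping on the first failure.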

JSON Output

{
  "success": true,
  "prompt": "a cat wearing a hat",
  "model": "z_image_turbo_bf16", 
  "width": 512,
  "height": 512,
  "urls": ["https://..."],
  "localPath": "/tmp/cat.png"
}

On error (with --json), the script returns a single JSON object like:

{
  "success": false,
  "error": "Video width and height must be divisible by 16 (got 500x512).",
  "errorCode": "INVALID_VIDEO_SIZE",
  "hint": "Choose --width/--height divisible by 16. For i2v, also match the reference aspect ratio."
}
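For scripting against --json mode, a wrapper can branch on the success field. The field names match the shapes shown above; run_generation is an illustrative sketch that assumes the CLI is runnable from the current directory:

```python
import json
import subprocess

def run_generation(args: list[str]) -> dict:
    """Invoke the CLI in --json mode and parse its single JSON object."""
    out = subprocess.run(["node", "sogni-gen.mjs", "--json", *args],
                         capture_output=True, text=True)
    return json.loads(out.stdout)

def handle(result: dict) -> str:
    """Return the asset location on success, or a formatted error line."""
    if result.get("success"):
        return result.get("localPath") or result["urls"][0]
    return (f"{result.get('errorCode', 'ERROR')}: "
            f"{result.get('error')} ({result.get('hint', '')})")
```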

Balance check example (--json --balance):

{
  "success": true,
  "type": "balance",
  "spark": 12.34,
  "sogni": 0.56
}

Cost

Uses Spark tokens from your Sogni account. 512x512 images are most cost-efficient.

Troubleshooting

  • Auth errors: Check credentials in ~/.config/sogni/credentials
  • i2v sizing gotchas: Video sizes are constrained (min 480px, max 1536px, divisible by 16). For i2v, the client wrapper resizes the reference (fit: inside) and uses the resized dimensions as the final video size. Because this uses rounding, a requested size can still yield an invalid final size (example: 1024x1536 requested but ref becomes 1024x1535).
  • Auto-adjustment: With a local --ref, the script will auto-adjust the requested size to avoid non-16 resized reference dimensions.
  • If the script adjusts your size but you want to fail instead: pass --strict-size and it will print a suggested --width/--height.
  • Timeouts: Try a faster model or increase -t timeout
  • No workers: Check https://sogni.ai for network status
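The divisible-by-16 constraint above can be checked, and a nearby valid size suggested, with a few lines. The snapping rule here is illustrative and may differ from the CLI's own auto-adjustment:

```python
def snap_video_dim(px: int, lo: int = 480, hi: int = 1536) -> int:
    """Round a dimension to the nearest multiple of 16,
    clamped to the documented 480-1536 px range."""
    snapped = round(px / 16) * 16
    return max(lo, min(hi, snapped))

def valid_video_size(w: int, h: int) -> bool:
    """True if both dimensions satisfy the documented i2v constraints."""
    return all(480 <= d <= 1536 and d % 16 == 0 for d in (w, h))
```

For example, the failing 500x512 request from the error example above would snap to 496x512.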