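baoyu-image-gen: Multi-Provider AI Image Generation - Openclaw Skills
Author: Internet
2026-03-29
What is baoyu-image-gen?
baoyu-image-gen is a tool built for Openclaw Skills that lets AI agents create high-quality visual content through image generation APIs. It supports several providers: OpenAI, Google (Gemini and Imagen), DashScope (Alibaba Cloud), and Replicate. Whether you need simple text-to-image generation or model-based editing with reference images, the skill exposes one unified interface that handles each model's requirements.
Once it is part of your workflow, you can create marketing assets, UI placeholders, or illustrations directly from the AI agent environment. Its EXTEND.md configuration system gives fine-grained control over default models, quality presets, and output sizes across projects and environments in the Openclaw Skills ecosystem.
Download: https://github.com/openclaw/skills/tree/main/skills/peters820-art/baoyu-image-gen
Installation and Download
1. ClawHub CLI
The fastest way to install the skill directly from source.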
npx clawhub@latest install baoyu-image-gen
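2. Manual installation
Copy the skill folder to one of the following locations:
Global: ~/.openclaw/skills/
Workspace: /skills/
Priority: workspace > local > built-in
3. Prompt-based installation
Copy this prompt into OpenClaw to install automatically.
Please help me install baoyu-image-gen using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).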
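baoyu-image-gen Use Cases
- Automatically create blog post illustrations and social media graphics from text prompts.
- Generate UI/UX design assets and placeholders during development.
- Iterate on visual concepts with multimodal style transfer using reference images.
- Produce large numbers of images for high-volume projects using parallel execution.
- Switch between providers such as Google and OpenAI to find the best aesthetic for a given prompt.
How baoyu-image-gen Works
- The skill loads preferred defaults by checking for an EXTEND.md configuration file in the project or user directory.
- If no configuration is found, it triggers a blocking setup process to define API keys, the provider, and the default save location.
- The agent receives a command with a prompt and optional parameters such as aspect ratio, quality, or reference images.
- It selects the provider based on the supplied parameters, the available API keys, or an explicit user flag.
- The script runs the generation through the selected API (OpenAI, Google, DashScope, or Replicate).
- For large batches, and only when explicitly requested, the skill can launch multiple subagents to generate in parallel.
- The final image is saved to the specified path, and a JSON summary of the operation is returned to the agent.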
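baoyu-image-gen Configuration Guide
Before using this Openclaw Skills component, make sure you have the API key required by your preferred provider.
# The skill guides you through setup automatically on first run
npx -y bun scripts/main.ts --prompt "A serene mountain landscape" --image landscape.png
# Set environment variables manually:
export OPENAI_API_KEY='your-key'
export GOOGLE_API_KEY='your-key'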
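baoyu-image-gen Data Schema and Taxonomy
The skill organizes its operation through CLI flags and configuration files so that output stays consistent within Openclaw Skills.
| Parameter | Type | Description |
|---|---|---|
| --prompt | String | Text description of the image to generate. |
| --image | Path | Output path for the generated image file. |
| --ar | Ratio | Supported: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1. |
| --quality | Preset | 'normal' (1K) or '2k' (2K resolution). |
| --ref | File | Reference image for multimodal tasks (Google/OpenAI). |
| --provider | String | Force google, openai, dashscope, or replicate. |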
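To make the flag table concrete, here is a minimal sketch of how an agent-side wrapper might assemble the argument list for scripts/main.ts. The `GenOptions` type and the `buildArgs` helper are hypothetical names introduced for illustration; they are not part of the skill.

```typescript
// Hypothetical options object mirroring the flags in the table above.
type GenOptions = {
  prompt: string;
  image: string; // output path
  ar?: string; // e.g. "16:9"
  quality?: "normal" | "2k";
  ref?: string; // a reference image path
  provider?: "google" | "openai" | "dashscope" | "replicate";
};

// Builds the flag list; optional flags are added only when they are set.
function buildArgs(o: GenOptions): string[] {
  const args = ["--prompt", o.prompt, "--image", o.image];
  if (o.ar) args.push("--ar", o.ar);
  if (o.quality) args.push("--quality", o.quality);
  if (o.ref) args.push("--ref", o.ref);
  if (o.provider) args.push("--provider", o.provider);
  return args;
}

const exampleArgs = buildArgs({ prompt: "A cat", image: "cat.png", ar: "16:9" });
```

Returning an array rather than a single string keeps a prompt such as "A cat" as one argument when the list is handed to a process-spawning API, so no shell quoting is needed.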
name: baoyu-image-gen
description: AI image generation with OpenAI, Google, DashScope and Replicate APIs. Supports text-to-image, reference images, aspect ratios. Sequential by default; parallel generation available on request. Use when user asks to generate, create, or draw images.
Image Generation (AI SDK)
Official API-based image generation. Supports OpenAI, Google, DashScope (阿里通义万象) and Replicate providers.
Script Directory
Agent Execution:
- SKILL_DIR = this SKILL.md file's directory
- Script path = ${SKILL_DIR}/scripts/main.ts
Step 0: Load Preferences (BLOCKING)
CRITICAL: This step MUST complete BEFORE any image generation. Do NOT skip or defer.
Check EXTEND.md existence (priority: project → user):
test -f .baoyu-skills/baoyu-image-gen/EXTEND.md && echo "project"
test -f "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md" && echo "user"
| Result | Action |
|---|---|
| Found | Load, parse, apply settings. If default_model.[provider] is null → ask model only (Flow 2) |
| Not found | Run first-time setup (references/config/first-time-setup.md) → Save EXTEND.md → Then continue |
CRITICAL: If not found, complete the full setup (provider + model + quality + save location) using AskUserQuestion BEFORE generating any images. Generation is BLOCKED until EXTEND.md is created.
| Path | Location |
|---|---|
| .baoyu-skills/baoyu-image-gen/EXTEND.md | Project directory |
| $HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md | User home |
EXTEND.md Supports: Default provider | Default quality | Default aspect ratio | Default image size | Default models
Schema: references/config/preferences-schema.md
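The project-then-user lookup in Step 0 can be sketched as a first-match search over two candidate paths. This is only an illustration: `extendMdCandidates` and `findExtendMd` are hypothetical helpers, and the `exists` callback stands in for whatever file-system check the real script performs.

```typescript
// Candidate locations in priority order: project directory first, then user home.
function extendMdCandidates(projectDir: string, homeDir: string): string[] {
  const rel = ".baoyu-skills/baoyu-image-gen/EXTEND.md";
  return [projectDir + "/" + rel, homeDir + "/" + rel];
}

// Returns the first candidate that exists, or null when first-time setup is needed.
// `exists` is injected so the sketch does not depend on any file-system API.
function findExtendMd(
  candidates: string[],
  exists: (path: string) => boolean,
): string | null {
  for (const path of candidates) {
    if (exists(path)) return path;
  }
  return null;
}

// Example: only the user-level file is present.
const onDisk = ["/home/me/.baoyu-skills/baoyu-image-gen/EXTEND.md"];
const foundPath = findExtendMd(
  extendMdCandidates("/work/app", "/home/me"),
  (p) => onDisk.indexOf(p) !== -1,
);
```

A null result corresponds to the "Not found" row: generation stays blocked until setup has written an EXTEND.md.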
Usage
# Basic
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png
# With aspect ratio
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9
# High quality
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k
# From prompt files
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png
# With reference images (Google multimodal or OpenAI edits)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png
# With reference images (explicit provider/model)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png
# Specific provider
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider openai
# DashScope (阿里通义万象)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "一只可爱的猫" --image out.png --provider dashscope
# Replicate (google/nano-banana-pro)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate
# Replicate with specific model
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
Options
| Option | Description |
|---|---|
| --prompt, -p | Prompt text |
| --promptfiles | Read prompt from files (concatenated) |
| --image | Output image path (required) |
| --provider google\|openai\|dashscope\|replicate | Force provider (default: google) |
| --model, -m | Model ID (Google: gemini-3-pro-image-preview, gemini-3.1-flash-image-preview; OpenAI: gpt-image-1.5) |
| --ar | Aspect ratio (e.g., 16:9, 1:1, 4:3) |
| --size | Size (e.g., 1024x1024) |
| --quality normal\|2k | Quality preset (default: 2k) |
| --imageSize 1K\|2K\|4K | Image size for Google (default: from quality) |
| --ref | Reference images. Supported by Google multimodal (gemini-3-pro-image-preview, gemini-3-flash-preview, gemini-3.1-flash-image-preview) and OpenAI edits (GPT Image models). If provider omitted: Google first, then OpenAI |
| --n | Number of images |
| --json | JSON output |
Environment Variables
| Variable | Description |
|---|---|
| OPENAI_API_KEY | OpenAI API key |
| GOOGLE_API_KEY | Google API key |
| DASHSCOPE_API_KEY | DashScope API key (阿里云) |
| REPLICATE_API_TOKEN | Replicate API token |
| OPENAI_IMAGE_MODEL | OpenAI model override |
| GOOGLE_IMAGE_MODEL | Google model override |
| DASHSCOPE_IMAGE_MODEL | DashScope model override (default: z-image-turbo) |
| REPLICATE_IMAGE_MODEL | Replicate model override (default: google/nano-banana-pro) |
| OPENAI_BASE_URL | Custom OpenAI endpoint |
| GOOGLE_BASE_URL | Custom Google endpoint |
| DASHSCOPE_BASE_URL | Custom DashScope endpoint |
| REPLICATE_BASE_URL | Custom Replicate endpoint |
Load Priority: CLI args > EXTEND.md > env vars > > ~/.baoyu-skills/.env
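A priority chain like this usually reduces to "first defined value wins". The sketch below shows that idea for the first three sources only (CLI args, EXTEND.md, env vars) and leaves the .env file levels out. `SettingSource` and `resolveSetting` are hypothetical names, and the model IDs are simply taken from the Options table as sample values.

```typescript
// One configuration source and the value it supplies, if any.
type SettingSource = { label: string; value: string | undefined };

// Sources are passed from highest to lowest priority; the first non-empty value wins.
function resolveSetting(sources: SettingSource[]): SettingSource | undefined {
  for (const s of sources) {
    if (s.value !== undefined && s.value !== "") return s;
  }
  return undefined;
}

// Example: no --model flag, EXTEND.md sets a default, GOOGLE_IMAGE_MODEL is also set.
const pickedModel = resolveSetting([
  { label: "CLI args", value: undefined },
  { label: "EXTEND.md", value: "gemini-3-pro-image-preview" },
  { label: "env vars", value: "gemini-3.1-flash-image-preview" },
]);
// pickedModel comes from EXTEND.md because it outranks env vars.
```

Keeping the label alongside the value makes it easy to report which source supplied a setting when debugging an unexpected default.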
Replicate Model Configuration
When using --provider replicate, the model can be configured in the following ways (highest priority first):
- CLI flag: --model
- EXTEND.md: default_model.replicate
- Env var: REPLICATE_IMAGE_MODEL
- Built-in default: google/nano-banana-pro
Supported model formats:
- owner/name (recommended for official models), e.g. google/nano-banana-pro
- owner/name:version (community models by version), e.g. stability-ai/sdxl:
Examples:
# Use Replicate default model
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate
# Override model explicitly
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
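The two supported model formats can be told apart with a small parser. The following is a sketch under the assumption that owner, name, and version contain no slashes, colons, or whitespace; `ReplicateModelRef` and `parseReplicateModel` are hypothetical names, not functions from the skill.

```typescript
type ReplicateModelRef = { owner: string; model: string; version?: string };

// Accepts "owner/name" or "owner/name:version"; returns null for anything else,
// including a trailing colon with no version after it.
function parseReplicateModel(ref: string): ReplicateModelRef | null {
  const match = /^([^\/:\s]+)\/([^\/:\s]+)(?::([^\/:\s]+))?$/.exec(ref);
  if (!match) return null;
  const parsed: ReplicateModelRef = { owner: match[1], model: match[2] };
  if (match[3]) parsed.version = match[3];
  return parsed;
}

const officialModel = parseReplicateModel("google/nano-banana-pro");
// officialModel has owner "google", model "nano-banana-pro", and no version.
```

Validating the identifier before calling the API lets a wrapper fail fast with a clear message instead of surfacing a provider-side error.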
Provider Selection
- --ref provided + no --provider → auto-select Google first, then OpenAI, then Replicate
- --provider specified → use it (if --ref, must be google, openai, or replicate)
- Only one API key available → use that provider
- Multiple available → default to Google
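The selection rules above can be read as a small decision function. This is a sketch, not the skill's actual code: `selectProvider` is a hypothetical name, and the fallback to the first available provider when several keys exist but none is Google's is an assumption, since that case is not specified.

```typescript
type Provider = "google" | "openai" | "dashscope" | "replicate";

// Providers that accept reference images, in auto-selection order.
const REF_CAPABLE: Provider[] = ["google", "openai", "replicate"];

function selectProvider(
  available: Provider[], // providers with an API key configured
  requested?: Provider, // value of --provider, if given
  hasRef: boolean = false, // whether --ref was passed
): Provider {
  if (requested) {
    if (hasRef && REF_CAPABLE.indexOf(requested) === -1) {
      throw new Error('--ref is not supported with provider "' + requested + '"');
    }
    return requested;
  }
  if (hasRef) {
    for (const p of REF_CAPABLE) {
      if (available.indexOf(p) !== -1) return p;
    }
    throw new Error("No provider with reference-image support has an API key");
  }
  if (available.length === 0) throw new Error("No API key configured");
  if (available.length === 1) return available[0];
  // Assumption: fall back to the first available provider when Google has no key.
  return available.indexOf("google") !== -1 ? "google" : available[0];
}
```

For example, `selectProvider(["openai", "replicate"], undefined, true)` yields "openai", because Google has no key and OpenAI is next in the reference-image order.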
Quality Presets
| Preset | Google imageSize | OpenAI Size | Use Case |
|---|---|---|---|
| normal | 1K | 1024px | Quick previews |
| 2k (default) | 2K | 2048px | Covers, illustrations, infographics |
Google imageSize: Can be overridden with --imageSize 1K|2K|4K
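The preset table and the --imageSize override can be expressed as a lookup with an optional override. A minimal sketch, assuming the values in the table; `PRESETS` and `resolveGoogleImageSize` are hypothetical names.

```typescript
type Quality = "normal" | "2k";
type GoogleImageSize = "1K" | "2K" | "4K";

// Values copied from the Quality Presets table.
const PRESETS: Record<Quality, { googleImageSize: GoogleImageSize; openaiPx: number }> = {
  normal: { googleImageSize: "1K", openaiPx: 1024 },
  "2k": { googleImageSize: "2K", openaiPx: 2048 },
};

// An explicit --imageSize wins; otherwise the size follows the quality preset (default 2k).
function resolveGoogleImageSize(
  quality: Quality = "2k",
  override?: GoogleImageSize,
): GoogleImageSize {
  return override !== undefined ? override : PRESETS[quality].googleImageSize;
}
```

With no arguments this returns "2K", matching the documented default; passing "normal" with an override of "4K" returns "4K".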
Aspect Ratios
Supported: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
- Google multimodal: uses imageConfig.aspectRatio
- Google Imagen: uses aspectRatio parameter
- OpenAI: maps to closest supported size
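"Maps to closest supported size" can be implemented in several ways; one plausible approach is to compare ratios on a log scale. The sketch below assumes an illustrative candidate list (`CANDIDATE_SIZES`); the sizes a given OpenAI model accepts may differ, and `parseAspectRatio` and `closestSize` are hypothetical helpers.

```typescript
type PixelSize = { width: number; height: number };

// Illustrative candidates only; the real list depends on the OpenAI model in use.
const CANDIDATE_SIZES: PixelSize[] = [
  { width: 1024, height: 1024 },
  { width: 1536, height: 1024 },
  { width: 1024, height: 1536 },
];

// Parses "W:H" (e.g. "16:9", "2.35:1") into a numeric ratio, or null if malformed.
function parseAspectRatio(ar: string): number | null {
  const parts = ar.split(":");
  if (parts.length !== 2) return null;
  const w = Number(parts[0]);
  const h = Number(parts[1]);
  if (!isFinite(w) || !isFinite(h) || w <= 0 || h <= 0) return null;
  return w / h;
}

// Picks the candidate whose ratio is nearest on a log scale,
// so that wide and tall ratios are treated symmetrically.
function closestSize(ar: string, sizes: PixelSize[] = CANDIDATE_SIZES): PixelSize | null {
  const target = parseAspectRatio(ar);
  if (target === null || sizes.length === 0) return null;
  let best = sizes[0];
  let bestDist = Infinity;
  for (const s of sizes) {
    const dist = Math.abs(Math.log(s.width / s.height) - Math.log(target));
    if (dist < bestDist) {
      best = s;
      bestDist = dist;
    }
  }
  return best;
}
```

Under these assumed candidates, 16:9 and 4:3 both land on the landscape size and 9:16 on the portrait size, while a malformed ratio returns null so the caller can fall back to a default.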
Generation Mode
Default: Sequential generation (one image at a time). This ensures stable output and easier debugging.
Parallel Generation: Only use when user explicitly requests parallel/concurrent generation.
| Mode | When to Use |
|---|---|
| Sequential (default) | Normal usage, single images, small batches |
| Parallel | User explicitly requests, large batches (10+) |
Parallel Settings (when requested):
| Setting | Value |
|---|---|
| Recommended concurrency | 4 subagents |
| Max concurrency | 8 subagents |
| Use case | Large batch generation when user requests parallel |
Agent Implementation (parallel mode only):
# Launch multiple generations in parallel using Task tool
# Each Task runs as background subagent with run_in_background=true
# Collect results via TaskOutput when all complete
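To illustrate the concurrency settings, the sketch below splits a batch of prompts into waves of at most N items, clamped to the stated maximum of 8. It is a simplification: the Task tool may schedule subagents differently, and `planWaves` is a hypothetical helper that only plans the grouping without launching anything.

```typescript
const RECOMMENDED_CONCURRENCY = 4;
const MAX_CONCURRENCY = 8;

// Splits prompts into waves; each wave would be dispatched as parallel subagents
// and fully collected before the next wave starts.
function planWaves(
  prompts: string[],
  concurrency: number = RECOMMENDED_CONCURRENCY,
): string[][] {
  const size = Math.max(1, Math.min(MAX_CONCURRENCY, Math.floor(concurrency)));
  const waves: string[][] = [];
  for (let i = 0; i < prompts.length; i += size) {
    waves.push(prompts.slice(i, i + size));
  }
  return waves;
}
```

Ten prompts at the recommended concurrency produce waves of 4, 4, and 2; a requested concurrency above 8 is clamped to 8.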
Error Handling
- Missing API key → error with setup instructions
- Generation failure → auto-retry once
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint (switch to Google multimodal: gemini-3-pro-image-preview, gemini-3.1-flash-image-preview; or OpenAI GPT Image edits)
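Two of these behaviours, retrying once and falling back on an invalid aspect ratio, are easy to sketch. The helpers below are hypothetical and written synchronously for brevity; a real generation call would be asynchronous, and the fallback ratio is passed in because the default value is not stated here.

```typescript
// Runs `attempt`; on failure, retries exactly once and lets a second failure propagate.
function withOneRetry<T>(attempt: (tryNumber: number) => T): T {
  try {
    return attempt(1);
  } catch (_firstError) {
    return attempt(2);
  }
}

const SUPPORTED_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "2.35:1"];

// An invalid ratio yields a warning and the caller-supplied fallback instead of an error.
function checkAspectRatio(
  ar: string,
  fallback: string,
): { ratio: string; warning: string | null } {
  if (SUPPORTED_RATIOS.indexOf(ar) !== -1) return { ratio: ar, warning: null };
  return {
    ratio: fallback,
    warning: 'Unsupported aspect ratio "' + ar + '", using ' + fallback,
  };
}
```

Limiting the retry to a single attempt keeps a persistent failure, such as a bad API key, from looping, while still absorbing one transient error.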
Extension Support
Custom configurations via EXTEND.md. See Preferences section for paths and supported options.