Apify Ultimate Scraper: AI Web Data Scraping - Openclaw Skills

Author: Internet

2026-04-17

AI Tutorials

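What Is Apify Ultimate Scraper?

Apify Ultimate Scraper is a versatile tool built for Openclaw Skills that lets developers extract structured data from almost any platform. By integrating with the Apify Actor library, it provides a unified interface for scraping social media, search results, and travel sites. The skill acts as an intelligent layer that selects the best scraping logic for your specific needs, with the aim of high reliability and of getting past anti-scraping protections.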

Download: https://github.com/openclaw/skills/tree/main/skills/protoss70/test-name-deniz

Installation and Download

1. ClawHub CLI

The fastest way to install the skill directly from the source.

npx clawhub@latest install test-name-deniz

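2. Manual Installation

Copy the skill folder to one of the following locations:

Global: ~/.openclaw/skills/
Workspace: /skills/

Priority: workspace > local > built-in

3. Prompt Installation

Copy this prompt into OpenClaw to install the skill automatically.

Please help me install test-name-deniz using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).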

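Apify Ultimate Scraper Use Cases

  • Discover influencers on Instagram and TikTok by scraping profiles and engagement metrics.
  • Build lead lists from Google Maps and Facebook business pages.
  • Monitor competitors by tracking Facebook ads and YouTube channel updates.
  • Research market trends using Google Search and Google Trends data.
  • Aggregate reviews from TripAdvisor and Booking.com for sentiment analysis.

How Apify Ultimate Scraper Works

  1. The AI agent analyzes the request and matches it against 55+ specialized scraping Actors.
  2. The system calls the mcpc tool to fetch the latest input schema for the selected Actor.
  3. The user specifies the amount of data and the desired file format.
  4. A Node.js script runs the scraper, passing the required environment variables and API token.
  5. The skill processes the results, saves them locally, and suggests follow-up Openclaw Skills steps for data enrichment.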

Apify Ultimate Scraper Configuration Guide

First, obtain an APIFY_TOKEN and add it to your .env file. Make sure Node.js 20.6 or later is installed, then install the mcpc CLI tool:

npm install -g @apify/mcpc

Verify your configuration by making sure the environment variables are accessible in your Openclaw Skills environment.
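One way to run that check is a small Node.js script. The sketch below is an assumption rather than part of the skill: checkConfig is a hypothetical helper that looks for APIFY_TOKEN in an environment object and compares the Node.js version with the 20.6 minimum stated above.

```javascript
// Hypothetical helper (not part of the skill): lists configuration problems.
function checkConfig(env, nodeVersion) {
  const problems = [];
  if (!env.APIFY_TOKEN) {
    problems.push("APIFY_TOKEN is not set (add APIFY_TOKEN=your_token to .env)");
  }
  // The skill relies on the native --env-file flag, available from Node.js 20.6.
  const [major, minor] = nodeVersion.replace(/^v/, "").split(".").map(Number);
  if (major < 20 || (major === 20 && minor < 6)) {
    problems.push(`Node.js ${nodeVersion} is older than 20.6`);
  }
  return problems;
}

// Check the current process and print either the problems or an OK message.
const problems = checkConfig(process.env, process.version);
console.log(problems.length ? problems.join("\n") : "Configuration looks OK");
```

Saved under a name such as check_config.js (a hypothetical file name), it can be run with node --env-file=.env check_config.js so that the variables in .env are loaded before the check.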

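Apify Ultimate Scraper Data Schema and Taxonomy

Data organization varies by Actor but generally follows these categories:

Data Category | Typical Fields
Social profiles | Username, bio, follower count, verification status
Post metrics | Likes, comments, timestamp, URL
Business information | Name, address, phone, email, rating, category
Search data | Title, description, rank, source URL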

Files are generated as YYYY-MM-DD_filename.csv or YYYY-MM-DD_filename.json, depending on the user's choice.
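The naming pattern can be illustrated with a short helper. This is a minimal sketch, assuming the date prefix is taken in UTC; datedFileName is a hypothetical name and is not taken from the skill's scripts.

```javascript
// Hypothetical helper: builds a name such as 2026-04-17_leads.csv.
function datedFileName(baseName, format, date = new Date()) {
  if (format !== "csv" && format !== "json") {
    throw new Error(`Unsupported format: ${format}`);
  }
  // toISOString() always reports UTC, e.g. "2026-04-17T09:30:00.000Z".
  const day = date.toISOString().slice(0, 10);
  return `${day}_${baseName}.${format}`;
}

console.log(datedFileName("leads", "csv", new Date("2026-04-17T09:30:00Z")));
// prints "2026-04-17_leads.csv"
```

Because the prefix is computed in UTC, it can differ from the local calendar date close to midnight.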

name: apify-ultimate-scraper
description: Universal AI-powered web scraper for any platform. Scrape data from Instagram, Facebook, TikTok, YouTube, Google Maps, Google Search, Google Trends, Booking.com, and TripAdvisor. Use for lead generation, brand monitoring, competitor analysis, influencer discovery, trend research, content analytics, audience analysis, or any data extraction task.

Universal Web Scraper

AI-driven data extraction from 55+ Actors across all major platforms. This skill automatically selects the best Actor for your task.

Prerequisites

(No need to check these upfront)

  • .env file with APIFY_TOKEN
  • Node.js 20.6+ (for native --env-file support)
  • mcpc CLI tool: npm install -g @apify/mcpc

Workflow

Copy this checklist and track progress:

Task Progress:
- [ ] Step 1: Understand user goal and select Actor
- [ ] Step 2: Fetch Actor schema via mcpc
- [ ] Step 3: Ask user preferences (format, filename)
- [ ] Step 4: Run the scraper script
- [ ] Step 5: Summarize results and offer follow-ups

Step 1: Understand User Goal and Select Actor

First, understand what the user wants to achieve. Then select the best Actor from the options below.

Instagram Actors (12)

Actor ID | Best For
apify/instagram-profile-scraper | Profile data, follower counts, bio info
apify/instagram-post-scraper | Individual post details, engagement metrics
apify/instagram-comment-scraper | Comment extraction, sentiment analysis
apify/instagram-hashtag-scraper | Hashtag content, trending topics
apify/instagram-hashtag-stats | Hashtag performance metrics
apify/instagram-reel-scraper | Reels content and metrics
apify/instagram-search-scraper | Search users, places, hashtags
apify/instagram-tagged-scraper | Posts tagged with specific accounts
apify/instagram-followers-count-scraper | Follower count tracking
apify/instagram-scraper | Comprehensive Instagram data
apify/instagram-api-scraper | API-based Instagram access
apify/export-instagram-comments-posts | Bulk comment/post export

Facebook Actors (14)

Actor ID | Best For
apify/facebook-pages-scraper | Page data, metrics, contact info
apify/facebook-page-contact-information | Emails, phones, addresses from pages
apify/facebook-posts-scraper | Post content and engagement
apify/facebook-comments-scraper | Comment extraction
apify/facebook-likes-scraper | Reaction analysis
apify/facebook-reviews-scraper | Page reviews
apify/facebook-groups-scraper | Group content and members
apify/facebook-events-scraper | Event data
apify/facebook-ads-scraper | Ad creative and targeting
apify/facebook-search-scraper | Search results
apify/facebook-reels-scraper | Reels content
apify/facebook-photos-scraper | Photo extraction
apify/facebook-marketplace-scraper | Marketplace listings
apify/facebook-followers-following-scraper | Follower/following lists

TikTok Actors (14)

Actor ID | Best For
clockworks/tiktok-scraper | Comprehensive TikTok data
clockworks/free-tiktok-scraper | Free TikTok extraction
clockworks/tiktok-profile-scraper | Profile data
clockworks/tiktok-video-scraper | Video details and metrics
clockworks/tiktok-comments-scraper | Comment extraction
clockworks/tiktok-followers-scraper | Follower lists
clockworks/tiktok-user-search-scraper | Find users by keywords
clockworks/tiktok-hashtag-scraper | Hashtag content
clockworks/tiktok-sound-scraper | Trending sounds
clockworks/tiktok-ads-scraper | Ad content
clockworks/tiktok-discover-scraper | Discover page content
clockworks/tiktok-explore-scraper | Explore content
clockworks/tiktok-trends-scraper | Trending content
clockworks/tiktok-live-scraper | Live stream data

YouTube Actors (5)

Actor ID | Best For
streamers/youtube-scraper | Video data and metrics
streamers/youtube-channel-scraper | Channel information
streamers/youtube-comments-scraper | Comment extraction
streamers/youtube-shorts-scraper | Shorts content
streamers/youtube-video-scraper-by-hashtag | Videos by hashtag

Google Maps Actors (4)

Actor ID | Best For
compass/crawler-google-places | Business listings, ratings, contact info
compass/google-maps-extractor | Detailed business data
compass/Google-Maps-Reviews-Scraper | Review extraction
poidata/google-maps-email-extractor | Email discovery from listings

Other Actors (6)

Actor ID | Best For
apify/google-search-scraper | Google search results
apify/google-trends-scraper | Google Trends data
voyager/booking-scraper | Booking.com hotel data
voyager/booking-reviews-scraper | Booking.com reviews
maxcopell/tripadvisor-reviews | TripAdvisor reviews
vdrmota/contact-info-scraper | Contact enrichment from URLs

Actor Selection by Use Case

Use Case | Primary Actors
Lead Generation | compass/crawler-google-places, poidata/google-maps-email-extractor, vdrmota/contact-info-scraper
Influencer Discovery | apify/instagram-profile-scraper, clockworks/tiktok-profile-scraper, streamers/youtube-channel-scraper
Brand Monitoring | apify/instagram-tagged-scraper, apify/instagram-hashtag-scraper, compass/Google-Maps-Reviews-Scraper
Competitor Analysis | apify/facebook-pages-scraper, apify/facebook-ads-scraper, apify/instagram-profile-scraper
Content Analytics | apify/instagram-post-scraper, clockworks/tiktok-scraper, streamers/youtube-scraper
Trend Research | apify/google-trends-scraper, clockworks/tiktok-trends-scraper, apify/instagram-hashtag-stats
Review Analysis | compass/Google-Maps-Reviews-Scraper, voyager/booking-reviews-scraper, maxcopell/tripadvisor-reviews
Audience Analysis | apify/instagram-followers-count-scraper, clockworks/tiktok-followers-scraper, apify/facebook-followers-following-scraper
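The use-case table above can also be expressed as a lookup. This is only an illustrative sketch: the skill itself leaves the choice to the AI agent, and actorsForUseCase is a hypothetical helper name. The Actor IDs are copied from the table.

```javascript
// Use case -> primary Actors, copied from the table above.
const ACTORS_BY_USE_CASE = new Map([
  ["lead generation", ["compass/crawler-google-places", "poidata/google-maps-email-extractor", "vdrmota/contact-info-scraper"]],
  ["influencer discovery", ["apify/instagram-profile-scraper", "clockworks/tiktok-profile-scraper", "streamers/youtube-channel-scraper"]],
  ["brand monitoring", ["apify/instagram-tagged-scraper", "apify/instagram-hashtag-scraper", "compass/Google-Maps-Reviews-Scraper"]],
  ["competitor analysis", ["apify/facebook-pages-scraper", "apify/facebook-ads-scraper", "apify/instagram-profile-scraper"]],
  ["content analytics", ["apify/instagram-post-scraper", "clockworks/tiktok-scraper", "streamers/youtube-scraper"]],
  ["trend research", ["apify/google-trends-scraper", "clockworks/tiktok-trends-scraper", "apify/instagram-hashtag-stats"]],
  ["review analysis", ["compass/Google-Maps-Reviews-Scraper", "voyager/booking-reviews-scraper", "maxcopell/tripadvisor-reviews"]],
  ["audience analysis", ["apify/instagram-followers-count-scraper", "clockworks/tiktok-followers-scraper", "apify/facebook-followers-following-scraper"]],
]);

// Hypothetical helper: case-insensitive lookup, empty array when nothing matches.
function actorsForUseCase(useCase) {
  return ACTORS_BY_USE_CASE.get(useCase.trim().toLowerCase()) ?? [];
}

console.log(actorsForUseCase("Lead Generation")[0]);
// prints "compass/crawler-google-places"
```

An empty result is the cue to fall back to searching the Apify Store.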

Multi-Actor Workflows

For complex tasks, chain multiple Actors:

Workflow | Step 1 | Step 2
Lead enrichment | compass/crawler-google-places | vdrmota/contact-info-scraper
Influencer vetting | apify/instagram-profile-scraper | apify/instagram-comment-scraper
Competitor deep-dive | apify/facebook-pages-scraper | apify/facebook-posts-scraper
Local business analysis | compass/crawler-google-places | compass/Google-Maps-Reviews-Scraper
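Chaining means turning the output of step 1 into the input of step 2. The sketch below illustrates the lead-enrichment row under two loudly stated assumptions: that each step-1 item carries its site in a website field, and that step 2 accepts { startUrls: [{ url }] }. Both should be confirmed against the schemas fetched in Step 2 of the workflow; the helper names are hypothetical.

```javascript
// Assumption: each step-1 result exposes the business site under `website`.
function websitesFromPlaces(items) {
  const seen = new Set();
  for (const item of items) {
    if (typeof item.website === "string" && item.website.trim() !== "") {
      seen.add(item.website.trim()); // the Set drops duplicate URLs
    }
  }
  return [...seen];
}

// Assumption: the step-2 Actor takes its URLs as { startUrls: [{ url }] }.
function buildContactInput(urls) {
  return { startUrls: urls.map((url) => ({ url })) };
}

// Made-up sample data standing in for step-1 output.
const places = [
  { title: "Cafe A", website: "https://cafe-a.example" },
  { title: "Cafe B" },
  { title: "Cafe A (duplicate)", website: "https://cafe-a.example" },
];
console.log(JSON.stringify(buildContactInput(websitesFromPlaces(places))));
// prints {"startUrls":[{"url":"https://cafe-a.example"}]}
```

Items without a website are skipped, so the second run only receives URLs it can actually visit.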

Can't Find a Suitable Actor?

If none of the Actors above match the user's request, search the Apify Store directly:

export $(grep APIFY_TOKEN .env | xargs) && mcpc --json mcp.apify.com --header "Authorization: Bearer $APIFY_TOKEN" tools-call search-actors keywords:="SEARCH_KEYWORDS" limit:=10 offset:=0 category:="" | jq -r '.content[0].text'

Replace SEARCH_KEYWORDS with 1-3 simple terms (e.g., "LinkedIn profiles", "Amazon products", "Twitter").

Step 2: Fetch Actor Schema

Fetch the Actor's input schema and details dynamically using mcpc:

export $(grep APIFY_TOKEN .env | xargs) && mcpc --json mcp.apify.com --header "Authorization: Bearer $APIFY_TOKEN" tools-call fetch-actor-details actor:="ACTOR_ID" | jq -r ".content"

Replace ACTOR_ID with the selected Actor (e.g., compass/crawler-google-places).

This returns:

  • Actor description and README
  • Required and optional input parameters
  • Output fields (if available)

Step 3: Ask User Preferences

Before running, ask:

  1. Output format:
    • Quick answer - Display the top few results in chat (no file saved)
    • CSV - Full export with all fields
    • JSON - Full export in JSON format
  2. Number of results: choose based on the nature of the use case

Step 4: Run the Script

Quick answer (display in chat, no file):

node --env-file=.env ${CLAUDE_PLUGIN_ROOT}/reference/scripts/run_actor.js \
  --actor "ACTOR_ID" \
  --input 'JSON_INPUT'

CSV:

node --env-file=.env ${CLAUDE_PLUGIN_ROOT}/reference/scripts/run_actor.js \
  --actor "ACTOR_ID" \
  --input 'JSON_INPUT' \
  --output YYYY-MM-DD_OUTPUT_FILE.csv \
  --format csv

JSON:

node --env-file=.env ${CLAUDE_PLUGIN_ROOT}/reference/scripts/run_actor.js \
  --actor "ACTOR_ID" \
  --input 'JSON_INPUT' \
  --output YYYY-MM-DD_OUTPUT_FILE.json \
  --format json
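The three variants differ only in whether --output and --format are present. A minimal sketch of assembling those flags programmatically follows; buildRunArgs is a hypothetical helper, the flag names are the ones shown in the commands above, and the input object is a made-up placeholder rather than a real Actor schema.

```javascript
// Hypothetical helper: assembles the run_actor.js flags shown above.
function buildRunArgs({ actor, input, format = "quick", output }) {
  const args = ["--actor", actor, "--input", JSON.stringify(input)];
  if (format === "quick") {
    return args; // quick answer: results go to chat, no file is written
  }
  if (format !== "csv" && format !== "json") {
    throw new Error(`Unsupported format: ${format}`);
  }
  if (!output) {
    throw new Error("An output file name is required for csv and json");
  }
  return [...args, "--output", output, "--format", format];
}

console.log(
  buildRunArgs({
    actor: "compass/crawler-google-places",
    input: { query: "coffee" }, // placeholder input, not the Actor's real schema
    format: "csv",
    output: "2026-04-17_coffee.csv",
  }).join(" ")
);
// prints --actor compass/crawler-google-places --input {"query":"coffee"} --output 2026-04-17_coffee.csv --format csv
```

Keeping the arguments as an array makes it possible to hand them to a process launcher without shell quoting of the JSON input.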

Step 5: Summarize Results and Offer Follow-ups

After completion, report:

  • Number of results found
  • File location and name
  • Key fields available
  • Suggested follow-up workflows based on results:
If User Got | Suggest Next
Business listings | Enrich with vdrmota/contact-info-scraper or get reviews
Influencer profiles | Analyze engagement with comment scrapers
Competitor pages | Deep-dive with post/ad scrapers
Trend data | Validate with platform-specific hashtag scrapers

Error Handling

  • APIFY_TOKEN not found - Ask user to create .env with APIFY_TOKEN=your_token
  • mcpc not found - Ask user to install it: npm install -g @apify/mcpc
  • Actor not found - Check Actor ID spelling
  • Run FAILED - Ask user to check Apify console link in error output
  • Timeout - Reduce input size or increase --timeout
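These cases can be turned into a small lookup from error text to advice. The sketch assumes the error output contains the phrases listed above, which may not match the script's real messages word for word; hintForError is a hypothetical helper.

```javascript
// Error phrase -> suggested action, taken from the cases listed above.
const ERROR_HINTS = [
  { pattern: /APIFY_TOKEN not found/i, hint: "Create .env with APIFY_TOKEN=your_token" },
  { pattern: /mcpc not found/i, hint: "Install mcpc: npm install -g @apify/mcpc" },
  { pattern: /Actor not found/i, hint: "Check the Actor ID spelling" },
  { pattern: /Run FAILED/i, hint: "Check the Apify console link in the error output" },
  { pattern: /Timeout/i, hint: "Reduce input size or increase --timeout" },
];

// Hypothetical helper: returns the first matching hint, or null if none applies.
function hintForError(message) {
  const match = ERROR_HINTS.find(({ pattern }) => pattern.test(message));
  return match ? match.hint : null;
}

console.log(hintForError("Error: Actor not found: foo/bar"));
// prints "Check the Actor ID spelling"
```

A null result means the message is not one of the known cases and should be shown to the user unchanged.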
