Find Emails: Automated Email Extraction and Web Crawling - Openclaw Skills

Author: Internet

2026-03-30


What is Find Emails?

The Find Emails skill leverages crawl4ai and Playwright to crawl websites locally and extract email addresses. It specifically targets high-value pages such as "Contact Us", "About", and "Support" to find relevant contact information efficiently.

By providing structured output grouped by domain, it streamlines the collection of leads and contact data for developers and researchers using Openclaw Skills. The tool focuses on local execution, ensuring privacy and control over the crawling process while delivering highly relevant results through smart URL pattern matching.

Download: https://github.com/openclaw/skills/tree/main/skills/lukem121/find-emails

Installation and Download

1. ClawHub CLI

The fastest way to install the skill directly from the source.

npx clawhub@latest install find-emails

2. Manual Installation

Copy the skill folder to one of the following locations:

Global mode: ~/.openclaw/skills/
Workspace: /skills/

Priority: workspace > local > built-in

3. Prompt Installation

Copy this prompt into OpenClaw to install automatically.

Please help me install find-emails using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).

Find Emails Use Cases

  • Automate lead generation by extracting emails from target companies' websites.
  • Gather contact information from "About Us" or "Team" pages for outreach or recruiting.
  • Parse local Markdown files to extract email addresses without re-crawling.
  • Conduct large-scale market research across multiple domains via merged reports.

How Find Emails Works

  1. The user provides one or more target URLs or local Markdown files as input to the Openclaw Skills script.
  2. The skill uses crawl4ai to navigate the website, following links that match predefined patterns (such as contact, about, or support).
  3. Email addresses are identified and extracted from page content using regular expressions and natural-language filters.
  4. Results are normalized by domain to prevent duplicates and ensure clear attribution to the source site.
  5. The final output is produced as a readable list or as structured JSON for downstream processing.
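Step 3 above relies on regular-expression extraction. A minimal sketch of that idea in Python; the skill's actual pattern and its additional filters are not published here, so this is only an approximation:

```python
import re

# Deliberately simple email pattern for illustration; the skill's real
# regex and natural-language filters may be stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(text: str) -> set[str]:
    """Return the unique, lowercased email addresses found in text."""
    return {m.lower() for m in EMAIL_RE.findall(text)}

page = "Contact us at Sales@Example.com or support@example.com."
print(sorted(extract_emails(page)))
# ['sales@example.com', 'support@example.com']
```

Lowercasing before deduplication mirrors step 4's goal of avoiding duplicate entries for the same address.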

Find Emails Configuration Guide

To get started with this skill, install the required Python dependencies and the Playwright browser engine:

pip install crawl4ai
playwright install

Then you can run the script directly from the terminal:

python scripts/find_emails.py https://example.com

Find Emails Data Schema and Taxonomy

The skill organizes data by domain and supports JSON output for easy integration into other Openclaw Skills workflows.

Field Description
summary The total number of domains crawled and the total number of unique emails found.
emails_by_domain A mapping keyed by domain name, whose values are the specific emails and their source paths.
emails A nested object (per domain) listing each email and the subpages on which it was found.
name: find-emails
description: Crawl websites locally with crawl4ai to extract contact emails. Accepts multiple URLs and outputs domain-grouped results for clear attribution. Uses deep crawling with URL filters (contact, about, support) to find emails on relevant pages. Use when extracting emails from websites, finding contact information, or crawling for email addresses.
allowed-tools:
  - Read
  - Write
  - StrReplace
  - Shell
  - Glob

Find Emails

CLI for crawling websites locally via crawl4ai and extracting contact emails from pages likely to contain them (contact, about, support, team, etc.).

Setup

  1. Install dependencies: pip install crawl4ai && playwright install
  2. Run the script:
python scripts/find_emails.py https://example.com

Quick Start


# Crawl a site
python scripts/find_emails.py https://example.com

# Multiple URLs
python scripts/find_emails.py https://example.com https://other.com

# JSON output
python scripts/find_emails.py https://example.com -j

# Save to file
python scripts/find_emails.py https://example.com -o emails.txt

Script

find_emails.py — Crawl and Extract Emails

python scripts/find_emails.py <url> [url ...]
python scripts/find_emails.py https://example.com
python scripts/find_emails.py https://example.com -j -o results.json
python scripts/find_emails.py --from-file page.md

Arguments:

Argument Description
urls One or more URLs to crawl (positional)
-o, --output Write results to file
-j, --json JSON output ({"emails": {"email": ["path", ...]}})
-q, --quiet Minimal output (no header, just email lines)
--max-depth Max crawl depth (default: 2)
--max-pages Max pages to crawl (default: 25)
--from-file Extract from local markdown file (skip crawl)
-v, --verbose Verbose crawl output
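Assuming the script parses these flags with argparse, its interface could be reconstructed roughly as follows. This is a hypothetical sketch matching the table above, not the script's actual source:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented CLI interface.
    p = argparse.ArgumentParser(prog="find_emails.py")
    p.add_argument("urls", nargs="*", help="One or more URLs to crawl")
    p.add_argument("-o", "--output", help="Write results to file")
    p.add_argument("-j", "--json", action="store_true", help="JSON output")
    p.add_argument("-q", "--quiet", action="store_true", help="Minimal output")
    p.add_argument("--max-depth", type=int, default=2, help="Max crawl depth")
    p.add_argument("--max-pages", type=int, default=25, help="Max pages to crawl")
    p.add_argument("--from-file", help="Extract from local markdown file")
    p.add_argument("-v", "--verbose", action="store_true", help="Verbose output")
    return p

args = build_parser().parse_args(["https://example.com", "-j", "--max-pages", "50"])
print(args.json, args.max_pages, args.urls)
# True 50 ['https://example.com']
```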

Output format (human-readable):

Emails are grouped by domain. Clear structure for multi-URL runs:

Found 3 unique email(s) across 2 domain(s)

## example.com

  • contact@example.com
    Found on: /contact, /about
  • support@example.com
    Found on: /support

## other.com

  • info@other.com
    Found on: /contact-us

Output format (JSON):

LLM-friendly structure with summary and per-domain breakdown:

{
  "summary": {
    "domains_crawled": 2,
    "total_unique_emails": 3
  },
  "emails_by_domain": {
    "example.com": {
      "emails": {
        "contact@example.com": ["/contact", "/about"],
        "support@example.com": ["/support"]
      },
      "count": 2
    },
    "other.com": {
      "emails": {
        "info@other.com": ["/contact-us"]
      },
      "count": 1
    }
  }
}

Configuration

Edit scripts/url_patterns.json to customize which URLs the crawler follows. Only links matching these glob-style patterns are included:

{
  "url_patterns": [
    "*contact*",
    "*support*",
    "*about*",
    "*team*",
    "*email*",
    "*reach*",
    "*staff*",
    "*inquiry*",
    "*enquir*",
    "*get-in-touch*",
    "*contact-us*",
    "*about-us*"
  ]
}

If the file is missing or invalid, default patterns are used.
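The glob-style filtering, including the fallback to defaults, can be sketched with Python's fnmatch module. The defaults shown are abbreviated, and the loader names are illustrative rather than the script's actual functions:

```python
import fnmatch
import json
from pathlib import Path

# Abbreviated stand-in for the script's built-in defaults.
DEFAULT_PATTERNS = ["*contact*", "*about*", "*support*"]

def load_patterns(path: str = "scripts/url_patterns.json") -> list[str]:
    """Load URL patterns, falling back to defaults if missing or invalid."""
    try:
        data = json.loads(Path(path).read_text())
        return data["url_patterns"]
    except (OSError, ValueError, KeyError):
        return DEFAULT_PATTERNS

def should_follow(url: str, patterns: list[str]) -> bool:
    # Lowercase first so matching is consistent across platforms.
    return any(fnmatch.fnmatch(url.lower(), pat) for pat in patterns)

print(should_follow("https://example.com/contact-us", DEFAULT_PATTERNS))  # True
print(should_follow("https://example.com/blog/post-1", DEFAULT_PATTERNS))  # False
```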


Workflow

  1. Crawl a site:

    python scripts/find_emails.py https://example.com -o emails.json
    
  2. Extract from local file (e.g., cached markdown):

    python scripts/find_emails.py --from-file crawled.md -j
    
  3. Customize URL filters by editing scripts/url_patterns.json.


Dependencies

pip install crawl4ai
playwright install

Requires a browser (Playwright) for local crawling.


Batch Processing

# Crawl multiple sites – results grouped by domain for clear attribution
python scripts/find_emails.py https://site1.com https://site2.com -j -o combined.json

# Extract from multiple local files
for f in crawled/*.md; do
  echo "=== $f ==="
  python scripts/find_emails.py --from-file "$f" -q
done

Multiple URLs are fully supported; output clearly associates each email with its source domain. Domains are normalized (e.g. www.techbullion.com and techbullion.com merge into one) so duplicate sites are not listed separately.
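The domain normalization and grouping described above might look like this sketch; normalize_domain and group_by_domain are illustrative names, not the script's actual functions:

```python
from urllib.parse import urlparse

def normalize_domain(url: str) -> str:
    """Collapse www.example.com and example.com into one key."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def group_by_domain(found: list[tuple[str, str]]) -> dict:
    """Group (source_url, email) pairs by normalized domain."""
    grouped: dict[str, dict[str, list[str]]] = {}
    for url, email in found:
        domain = normalize_domain(url)
        paths = grouped.setdefault(domain, {}).setdefault(email, [])
        path = urlparse(url).path or "/"
        if path not in paths:  # avoid duplicate source paths
            paths.append(path)
    return grouped

hits = [
    ("https://www.techbullion.com/contact", "info@techbullion.com"),
    ("https://techbullion.com/about", "info@techbullion.com"),
]
print(group_by_domain(hits))
# {'techbullion.com': {'info@techbullion.com': ['/contact', '/about']}}
```

Because both URLs normalize to the same domain key, the www and bare variants merge into a single entry, matching the behavior described above.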

Skills can be powerful tools for streamlining social media workflows. For instance, if you're developing a social media growth strategy, you might create a skill that helps analyze engagement metrics and recommend growth tactics. Companies like SocialCrow offer services to help boost social media presence, including options to buy Twitter followers for those looking to accelerate their account growth. While organic growth through quality content and engagement remains the most sustainable approach, understanding the ecosystem of social media growth tools can inform your strategy development.