CLI Scraper

Before starting the CLI scraper, ensure you have:

  1. Environment Setup: A .env file in the root directory with your Google Gemini API key
  2. Dependencies Installed: Run bun install from the root directory
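The two prerequisites above can be checked before the first run. The following is a minimal sketch, not part of the project: `check_prereqs` is a hypothetical helper name, and it assumes that `bun install` has created a `node_modules` directory in the root. Because the name of the API key variable is not given here, the sketch only checks that `.env` exists and is non-empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): verify the prerequisites.
# Usage: check_prereqs <repo-root>   (defaults to the current directory)
check_prereqs() {
  local root="${1:-.}"

  # The .env file must exist and be non-empty. The key's variable name is
  # not specified in this guide, so only the file itself is checked.
  if [ ! -s "$root/.env" ]; then
    echo "missing or empty .env in $root" >&2
    return 1
  fi

  # Assumption: `bun install` has populated node_modules in the root.
  if [ ! -d "$root/node_modules" ]; then
    echo "node_modules not found; run 'bun install' in $root" >&2
    return 2
  fi

  return 0
}
```

Calling `check_prereqs .` from the root directory returns 0 when both checks pass, 1 when `.env` is missing or empty, and 2 when dependencies appear not to be installed.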
Terminal window
# Recommended: Run from root directory
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html
Terminal window
# Alternative: Run from scraper directory
cd apps/scraper
bun cli.js https://anu-vue.netlify.app/guide/components/alert.html
Terminal window
# Scrape a basic website
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html
# Output is saved to a markdown file with an auto-generated name
# Example of the naming pattern (shown for a page on example.com): example-com-20240115-143022.md
Terminal window
# Scrape with beast mode for complex sites
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --mode beast
# Scrape with custom query for semantic filtering
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --query "Grab the Outlined Alert Code snippets"
# Combine options
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --mode beast --query "Grab the Outlined Alert Code snippets" --output custom-name
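When the same flags are combined repeatedly, a small wrapper can assemble them. This is a sketch under stated assumptions: `scrape` and `DRY_RUN` are hypothetical names introduced for illustration, and only the flags shown in the examples above are used. With `DRY_RUN=1` the command is printed rather than executed, which is handy for checking the argument order.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the project).
# Usage: scrape <url> [mode] [query] [output]
# Set DRY_RUN=1 to print the command instead of running it.
scrape() {
  local url="$1" mode="${2:-}" query="${3:-}" output="${4:-}"

  # The URL is placed directly after the script name, before any options.
  local args=(bun run cli:scraper "$url")

  if [ -n "$mode" ]; then args+=(--mode "$mode"); fi
  if [ -n "$query" ]; then args+=(--query "$query"); fi
  if [ -n "$output" ]; then args+=(--output "$output"); fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Display only: values containing spaces are printed without quotes.
    printf '%s\n' "${args[*]}"
  else
    "${args[@]}"
  fi
}
```

For example, `DRY_RUN=1 scrape https://anu-vue.netlify.app/guide/components/alert.html beast "" custom-name` prints the command with `--mode beast --output custom-name` and no `--query` flag.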

<URL> (required): The target URL to scrape. It must be the first argument.

Terminal window
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html

--mode, -m: Choose the scraping strategy:

  • normal (default): Fast extraction for static content
  • beast: AI-powered extraction for interactive content
Terminal window
# Normal mode (default)
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --mode normal
# Beast mode for SPAs and dynamic content
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --mode beast

--query, -q: A natural-language description of the content you want, used for semantic filtering.

Terminal window
# Extract specific content
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --query "Grab the Outlined Alert Code snippets"

--output, -o: Specify a custom output filename. If omitted, the filename is auto-generated.

Terminal window
# Custom filename
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --output my-content

--help, -h: Display help information and the available options.

Terminal window
bun run cli:scraper --help
Terminal window
# Full syntax
bun run cli:scraper <URL> [OPTIONS]
# Options:
#   -m, --mode <mode>      Scraping mode: normal|beast (default: normal)
#   -q, --query <query>    Natural language content filter
#   -o, --output <file>    Output filename (default: auto-generated)
#   -h, --help             Show help information
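To make the syntax above concrete, here is a minimal sketch of how arguments of this shape could be parsed in a shell script. It is an illustration only and does not reflect how the project's `cli.js` actually works. `parse_args` and the variables `URL`, `MODE`, `QUERY`, and `OUTPUT` are hypothetical names, and rejecting unknown modes is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical parser for: <URL> [OPTIONS] (illustration, not the real CLI).
# Sets the globals URL, MODE, QUERY, and OUTPUT.
parse_args() {
  URL=""; MODE="normal"; QUERY=""; OUTPUT=""

  if [ $# -eq 0 ]; then
    echo "usage: <URL> [OPTIONS]" >&2
    return 1
  fi

  # Help may be requested without a URL.
  case "$1" in
    -h|--help) echo "usage: <URL> [OPTIONS]"; return 0 ;;
  esac

  # The URL must be the first argument.
  URL="$1"; shift

  while [ $# -gt 0 ]; do
    case "$1" in
      -h|--help)
        echo "usage: <URL> [OPTIONS]"; return 0 ;;
      -m|--mode|-q|--query|-o|--output)
        # Every remaining option takes exactly one value.
        if [ $# -lt 2 ]; then
          echo "missing value for $1" >&2
          return 1
        fi
        case "$1" in
          -m|--mode)
            case "$2" in
              normal|beast) MODE="$2" ;;
              *) echo "invalid mode: $2" >&2; return 1 ;;
            esac ;;
          -q|--query) QUERY="$2" ;;
          -o|--output) OUTPUT="$2" ;;
        esac
        shift 2 ;;
      *)
        echo "unknown option: $1" >&2
        return 1 ;;
    esac
  done
}
```

After `parse_args https://anu-vue.netlify.app/guide/components/alert.html -m beast -o my-content`, `MODE` holds `beast`, `OUTPUT` holds `my-content`, and `QUERY` stays empty.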