CLI Scraper
📝 Prerequisites
Before starting the CLI scraper, ensure you have:
- Environment Setup: A .env file in the root directory with your Google Gemini API key
- Dependencies Installed: Run bun install from the root directory
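Both prerequisites can be checked before the first run. The sketch below is only an illustration: `check_prereqs` is a hypothetical helper that is not part of the project, and it assumes that `bun install` leaves a `node_modules` directory in the root. It checks that `.env` exists but does not inspect its contents, since the name of the API key variable is not given here.

```shell
# Hypothetical helper (not shipped with the project): verify the two
# prerequisites for a given root directory (defaults to the current one).
check_prereqs() {
  local root="${1:-.}"
  local status=0

  # Prerequisite 1: a .env file in the root directory.
  if [ ! -f "$root/.env" ]; then
    echo "missing: $root/.env (should hold your Google Gemini API key)"
    status=1
  fi

  # Prerequisite 2: dependencies installed.
  # Assumption: 'bun install' creates node_modules in the root directory.
  if [ ! -d "$root/node_modules" ]; then
    echo "missing: $root/node_modules (run 'bun install' from the root directory)"
    status=1
  fi

  return "$status"
}
```

Calling `check_prereqs .` from the root directory prints one line per missing prerequisite and returns a non-zero status if anything is missing.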
💻 Using the CLI Scraper
From Root (Recommended)
# Recommended: Run from root directory
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html
Individual App Alternative
# Alternative: Run from scraper directory
cd apps/scraper
bun cli.js https://anu-vue.netlify.app/guide/components/alert.html
📜 Basic Usage
Simple Website Scraping
# Scrape a basic website
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html
# Output will be saved to a markdown file
# Output: example-com-20240115-143022.md
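The sample filename above looks like a hostname with dots turned into dashes, followed by a YYYYMMDD-HHMMSS timestamp. The sketch below reproduces that pattern so that an output file can be located from a script. It is inferred from that single example only: `default_output_name` is a hypothetical helper, and the scraper's actual naming logic may differ.

```shell
# Hypothetical helper: guess the auto-generated output filename for a URL.
# Assumption (inferred from one sample name): <host-with-dashes>-<timestamp>.md
default_output_name() {
  local url="$1"
  # Optional second argument overrides the timestamp (useful for testing).
  local stamp="${2:-$(date +%Y%m%d-%H%M%S)}"
  local host

  host="${url#*://}"   # drop the scheme, e.g. "https://"
  host="${host%%/*}"   # drop everything after the hostname
  host="$(printf '%s' "$host" | tr '.' '-')"   # dots become dashes

  echo "${host}-${stamp}.md"
}
```

Under these assumptions, `default_output_name https://example.com/page 20240115-143022` prints `example-com-20240115-143022.md`.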
Advanced Scraping with Options
# Scrape with beast mode for complex sites
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --mode beast
# Scrape with custom query for semantic filtering
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --query "Grab the Outlined Alert Code snippets"
# Combine options
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --mode beast --query "Grab the Outlined Alert Code snippets" --output custom-name
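When several pages need the same options, the commands above can be wrapped in a small loop. This is a minimal sketch rather than a project feature: `scrape_all` and the `DRY_RUN` variable are hypothetical names, and the function uses only the `--mode` flag described in this guide. It runs one scrape per URL and does not stop on failures.

```shell
# Hypothetical wrapper: scrape several URLs with the same mode.
# Usage: scrape_all <mode> <url> [<url> ...]
# Set DRY_RUN=1 to print the commands instead of running them.
scrape_all() {
  local mode="$1"
  shift
  local url

  for url in "$@"; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "bun run cli:scraper $url --mode $mode"
    else
      # Run from the root directory, as recommended above.
      bun run cli:scraper "$url" --mode "$mode"
    fi
  done
}
```

For example, `DRY_RUN=1 scrape_all beast https://anu-vue.netlify.app/guide/components/alert.html` prints the single command it would run, which is a cheap way to check the arguments before spending API calls.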
💭 Command Line Options
URL (Required)
The target URL to scrape. Must be the first argument.
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html
--mode or -m
Choose the scraping strategy:
- normal (default): Fast extraction for static content
- beast: AI-powered extraction for interactive content
# Normal mode (default)
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --mode normal
# Beast mode for SPAs and dynamic content
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --mode beast
--query or -q
Natural language description of desired content for semantic filtering.
# Extract specific content
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --query "Grab the Outlined Alert Code snippets"
--output or -o
Specify a custom output filename.
# Custom filename
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --output my-content
--help or -h
Display help information and available options.
bun run cli:scraper --help
📖 Complete Command Reference
# Full syntax
bun run cli:scraper <URL> [OPTIONS]
# Options:
#   -m, --mode <mode>     Scraping mode: normal|beast (default: normal)
#   -q, --query <query>   Natural language content filter
#   -o, --output <file>   Output filename (default: auto-generated)
#   -h, --help            Show help information
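The reference above can also be read as a small grammar: a URL first, then optional flags that each take one value. The sketch below checks arguments against that grammar before anything is sent to the scraper. It is an illustration only: `parse_scraper_args` is a hypothetical function, it is not how `cli.js` parses its input, and the way the real CLI reports invalid arguments is not documented here.

```shell
# Hypothetical pre-flight check mirroring the documented syntax:
#   <URL> [-m|--mode normal|beast] [-q|--query <query>] [-o|--output <file>]
# On success it sets the globals URL, MODE, QUERY and OUTPUT.
parse_scraper_args() {
  URL=""
  MODE="normal"   # documented default
  QUERY=""
  OUTPUT=""

  if [ "$#" -eq 0 ]; then
    echo "error: URL is required and must be the first argument" >&2
    return 2
  fi
  URL="$1"
  shift

  while [ "$#" -gt 0 ]; do
    case "$1" in
      -m|--mode|-q|--query|-o|--output)
        # Each of these flags needs a value after it.
        if [ "$#" -lt 2 ]; then
          echo "error: $1 needs a value" >&2
          return 2
        fi
        case "$1" in
          -m|--mode)   MODE="$2" ;;
          -q|--query)  QUERY="$2" ;;
          -o|--output) OUTPUT="$2" ;;
        esac
        shift 2
        ;;
      *)
        echo "error: unknown option $1" >&2
        return 2
        ;;
    esac
  done

  # Only the two documented modes are accepted.
  case "$MODE" in
    normal|beast) ;;
    *)
      echo "error: mode must be normal or beast" >&2
      return 2
      ;;
  esac
}
```

After a successful call such as `parse_scraper_args "$url" -m beast -q "Grab the Outlined Alert Code snippets"`, the four variables hold the validated values and can be passed on to `bun run cli:scraper`.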