CLI Scraper
📝 Prerequisites

Before starting the CLI scraper, ensure you have:

- Environment Setup: A `.env` file in the root directory with your Google Gemini API key
- Dependencies Installed: Run `bun install` from the root directory
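
The two prerequisites above can be checked before the first run. The following is a minimal sketch, not part of the project: the function names `has_env_file`, `has_bun`, and `preflight` are hypothetical, and the check only confirms that a `.env` file exists. It does not inspect the file, because the name of the API key variable is not stated here.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not shipped with the scraper).

# Succeeds if a .env file exists in the given directory (default: current directory).
has_env_file() {
  [ -f "${1:-.}/.env" ]
}

# Succeeds if the bun executable is available on PATH.
has_bun() {
  command -v bun >/dev/null 2>&1
}

# Runs both checks, reports every problem on stderr, returns non-zero if any check failed.
preflight() {
  local root="${1:-.}"
  local status=0
  has_env_file "$root" || { echo "missing $root/.env" >&2; status=1; }
  has_bun || { echo "bun not found on PATH" >&2; status=1; }
  return "$status"
}
```

Calling `preflight .` from the root directory prints nothing and returns 0 when both checks pass.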
💻 Using the CLI Scraper
From Root (Recommended)

```shell
# Recommended: Run from root directory
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html
```

Individual App Alternative

```shell
# Alternative: Run from scraper directory
cd apps/scraper
bun cli.js https://anu-vue.netlify.app/guide/components/alert.html
```

📜 Basic Usage
Simple Website Scraping

```shell
# Scrape a basic website
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html

# Output will be saved to a markdown file
# Example output filename: example-com-20240115-143022.md
```

Advanced Scraping with Options

```shell
# Scrape with beast mode for complex sites
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --mode beast

# Scrape with custom query for semantic filtering
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --query "Grab the Outlined Alert Code snippets"

# Combine options
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --mode beast --query "Grab the Outlined Alert Code snippets" --output custom-name
```

💭 Command Line Options
URL (Required)

The target URL to scrape. Must be the first argument.
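
Because the URL has to come first, a thin wrapper can catch a misplaced flag before the scraper starts. This is a sketch under assumptions: `is_http_url` and `scrape` are hypothetical helper names, and the wrapper only checks for an `http://` or `https://` prefix. It makes no claim about how the CLI itself validates its arguments.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the project).

# Succeeds if the argument starts with http:// or https:// followed by at least one character.
is_http_url() {
  case "${1:-}" in
    http://?*|https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Forwards all arguments to the scraper only when the first one looks like a URL.
scrape() {
  if ! is_http_url "${1:-}"; then
    echo "usage: scrape <URL> [OPTIONS] (the URL must be the first argument)" >&2
    return 2
  fi
  bun run cli:scraper "$@"
}
```

With this in place, `scrape --mode beast` fails immediately with a usage message instead of invoking the scraper.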

```shell
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html
```

--mode or -m
Choose the scraping strategy:

- `normal` (default): Fast extraction for static content
- `beast`: AI-powered extraction for interactive content

```shell
# Normal mode (default)
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --mode normal

# Beast mode for SPAs and dynamic content
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --mode beast
```

--query or -q
A natural-language description of the content you want, used for semantic filtering.

```shell
# Extract specific content
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --query "Grab the Outlined Alert Code snippets"
```

--output or -o
Specify a custom output filename.

```shell
# Custom filename
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --output my-content
```

--help or -h
Display help information and available options.

```shell
bun run cli:scraper --help
```

📖 Complete Command Reference

```shell
# Full syntax
bun run cli:scraper <URL> [OPTIONS]

# Options:
#   -m, --mode <mode>      Scraping mode: normal|beast (default: normal)
#   -q, --query <query>    Natural language content filter
#   -o, --output <file>    Output filename (default: auto-generated)
#   -h, --help             Show help information
```
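
The syntax above composes naturally into a loop when several pages need scraping. The sketch below is illustrative only: `scrape_all`, the `DRY_RUN` variable, and the `page-N` output names are assumptions made for this example, and only the documented `--mode` and `--output` flags are used. By default it prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical batch helper (not part of the project).

# Usage: scrape_all <mode> <URL>...
# Prints one scraper command per URL; set DRY_RUN=0 to execute them instead.
scrape_all() {
  local mode="$1"
  shift
  local i=0
  local url
  for url in "$@"; do
    i=$((i + 1))
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run: show the command that would be executed.
      echo "bun run cli:scraper $url --mode $mode --output page-$i"
    else
      bun run cli:scraper "$url" --mode "$mode" --output "page-$i"
    fi
  done
}
```

For example, `DRY_RUN=0 scrape_all normal https://anu-vue.netlify.app/guide/components/alert.html` runs the scraper once in normal mode and requests `page-1` as the output name.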