# Quick Start Guide
## ⚙️ Prerequisites

Before we begin, make sure you have:
- Bun >= 1.2.15 (Install Bun)
- Google Gemini API Key (Get free key from Google AI Studio)
## 🛠️ Installation & Setup

### Step 1: Clone the Repository

```bash
git clone https://github.com/mpmeetpatel/sniffhunt-scraper.git
cd sniffhunt-scraper
```
### Step 2: Install Dependencies

```bash
bun install
```
This installs all dependencies for the entire workspace, including all apps.
### Step 3: Configure Environment

```bash
cp .env.example .env
```
Edit the `.env` file and add your Gemini API key:
```ini
# Required
GOOGLE_GEMINI_KEY=your_actual_api_key_here

# Optional (provide multiple keys to spread load and avoid rate limits)
GOOGLE_GEMINI_KEY1=your_alternative_key_1
GOOGLE_GEMINI_KEY2=your_alternative_key_2
GOOGLE_GEMINI_KEY3=your_alternative_key_3

# Optional (defaults shown)
PORT=8080
MAX_RETRY_COUNT=2
RETRY_DELAY=1000
PAGE_TIMEOUT=10000
CORS_ORIGIN=*
```
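Round-robin rotation is one common way multiple keys are used to spread requests. The sketch below is purely illustrative — SniffHunt handles key selection internally, and the function name and rotation strategy here are our assumptions, not the project's actual implementation:

```python
import itertools
import os

def gemini_key_cycle():
    """Collect GOOGLE_GEMINI_KEY plus any numbered fallback keys from the
    environment and cycle through them round-robin (illustrative sketch)."""
    keys = [os.environ[name]
            for name in ("GOOGLE_GEMINI_KEY", "GOOGLE_GEMINI_KEY1",
                         "GOOGLE_GEMINI_KEY2", "GOOGLE_GEMINI_KEY3")
            if os.environ.get(name)]
    if not keys:
        raise RuntimeError("GOOGLE_GEMINI_KEY is required")
    return itertools.cycle(keys)
```

Each call to `next()` on the returned iterator yields the next key, wrapping around when the list is exhausted.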
## 🚀 Launch Options

Choose your preferred way to use SniffHunt:
### Option 1: API Server + Web Interface

Perfect for interactive use and web application integration.
#### Start the API Server

```bash
bun run dev:server
```

The server starts on http://localhost:8080.
#### Start the Web Interface (Optional)

```bash
# In a new terminal
bun run dev:web
```

Open http://localhost:6001 in your browser for the beautiful web interface.
#### Test the Setup

```bash
# Test the API
curl -X POST http://localhost:8080/scrape-sync \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "mode": "normal"}'
```
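The same request can be sent from any HTTP client. Here is a minimal Python sketch: the endpoint and body fields mirror the curl example, but the helper names are ours and the response shape is not documented here, so the raw body is returned as-is:

```python
import json
import urllib.request

API_URL = "http://localhost:8080/scrape-sync"

def build_scrape_body(url, mode="normal", query=None):
    """Assemble the JSON body used by POST /scrape-sync."""
    body = {"url": url, "mode": mode}
    if query is not None:
        body["query"] = query
    return json.dumps(body).encode()

def scrape_sync(url, mode="normal", query=None):
    """Send the request and return the raw response text."""
    req = urllib.request.Request(
        API_URL,
        data=build_scrape_body(url, mode, query),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```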
### Option 2: MCP Integration for AI Tools

Integrate SniffHunt directly with Claude Desktop, Cursor, or other MCP-compatible AI tools.
#### Build and Set Up the MCP Server

```bash
bun run setup:mcp
```

This builds the MCP server and makes it globally available.
#### Configure Your AI Client

Add this to your MCP client configuration (e.g., Cursor, Windsurf, VS Code, Claude Desktop):

```json
{
  "mcpServers": {
    "sniffhunt-scraper": {
      "command": "npx",
      "args": ["-y", "sniffhunt-scraper-mcp-server"],
      "env": {
        "GOOGLE_GEMINI_KEY": "your-api-key-here"
      }
    }
  }
}
```
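Config file locations vary by client (check your client's documentation), and the file usually already contains other servers. As a hedged sketch, merging the entry above into an existing config without clobbering other servers might look like this (the helper name is ours):

```python
import json

SNIFFHUNT_ENTRY = {
    "command": "npx",
    "args": ["-y", "sniffhunt-scraper-mcp-server"],
    "env": {"GOOGLE_GEMINI_KEY": "your-api-key-here"},
}

def add_sniffhunt(config: dict) -> dict:
    """Insert the sniffhunt-scraper server into an MCP client config dict,
    preserving any servers that are already registered."""
    servers = config.setdefault("mcpServers", {})
    servers["sniffhunt-scraper"] = SNIFFHUNT_ENTRY
    return config
```

In practice you would `json.load` the client's config file, pass the dict through `add_sniffhunt`, and `json.dump` it back.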
#### Test MCP Integration

Restart your AI client and try asking:

> Scrape https://anu-vue.netlify.app/guide/components/alert.html and grab the "Outlined Alert Code snippets"

The AI will automatically use SniffHunt to extract the content!
### Option 3: CLI Scraper

Perfect for automation, scripting, and one-off extractions.
#### Basic Usage

```bash
# Scrape any website
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html

# Output saved as:
#   scraped.raw.md or scraped.md  (name auto-generated from mode and query)
#   scraped.html
```
#### Advanced Usage

```bash
# Use normal mode for static sites
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --mode normal

# Use beast mode for complex sites
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --query "Grab the Outlined Alert Code snippets" --mode beast

# Add a semantic query for focused extraction and a custom output filename
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --query "Grab the Outlined Alert Code snippets" --output my-content
```
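For batch jobs, the CLI invocation can be generated per URL and run from a script. A hypothetical helper that only assembles the argument list using the flags shown above (nothing here is part of SniffHunt itself):

```python
def cli_args(url, mode=None, query=None, output=None):
    """Build the `bun run cli:scraper` argument list for one URL."""
    args = ["bun", "run", "cli:scraper", url]
    if mode:
        args += ["--mode", mode]
    if query:
        args += ["--query", query]
    if output:
        args += ["--output", output]
    return args

# Example: run each job with subprocess
# import subprocess
# for url in urls:
#     subprocess.run(cli_args(url, mode="normal"), check=True)
```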
## ✅ Verify Installation

### Health Check

```bash
# Check if the API server is running
curl http://localhost:8080/health
```

Should return:

```json
{
  "status": "healthy",
  "service": "SniffHunt Scraper API",
  "version": "1.0.0"
}
```
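A startup script can poll `/health` until the server reports healthy. Below is a sketch with an injectable fetch function so the logic works without a live server; the retry policy is our choice, not SniffHunt's:

```python
import json
import time

def wait_until_healthy(fetch, attempts=5, delay=1.0):
    """Call `fetch()` (which should return the /health response body as a
    string) until it reports status "healthy" or attempts run out."""
    for i in range(attempts):
        try:
            if json.loads(fetch()).get("status") == "healthy":
                return True
        except Exception:
            pass  # server not up yet, or response not parseable
        if i < attempts - 1:
            time.sleep(delay)
    return False
```

In practice `fetch` could wrap `urllib.request.urlopen("http://localhost:8080/health").read().decode()`.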
### Test Extraction

#### API Test

```bash
curl -X POST http://localhost:8080/scrape-sync \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://anu-vue.netlify.app/guide/components/alert.html",
    "mode": "normal",
    "query": "Grab the Outlined Alert Code snippets"
  }'
```
#### CLI Test

```bash
bun run cli:scraper https://anu-vue.netlify.app/guide/components/alert.html --query "Grab the Outlined Alert Code snippets"
```
#### Web UI Test

1. Open http://localhost:6001
2. Enter URL: https://anu-vue.netlify.app/guide/components/alert.html
3. Select mode: "Normal"
4. Add query: "Grab the Outlined Alert Code snippets"
5. Click "Extract Content"
## 🔧 Troubleshooting

### Common Issues

### Need Help?
Section titled “Need Help?”- 🐞 Bug Reports: GitHub Issues
- 💬 Questions: GitHub Discussions
- 📧 Support: contact me on Twitter
Ready to extract some content? Choose your preferred integration method above and start scraping! 🚀