API Server
📝 Prerequisites
Before starting the server, ensure you have:
- Environment Setup: A `.env` file in the root directory with your Google Gemini API key
- Dependencies Installed: Run `bun install` from the root directory
🟢 Starting the Server
From Root (Recommended)
```bash
# Start from root directory (automatically loads .env)
bun run dev:server
```
Benefits:
- Automatically loads environment variables from root `.env`
- Consistent with other workspace commands
- No need to navigate to subdirectories
Individual App Alternative
```bash
# Alternative: Start from server directory
cd apps/server
bun dev
```
Note: This approach also loads the root `.env` file automatically due to the server’s configuration.
The server will start on http://localhost:8080 by default.
Verify Server is Running
```bash
# Health check
curl http://localhost:8080/health
```
Expected response:
```json
{
  "status": "healthy",
  "service": "SniffHunt Scraper API",
  "version": "1.0.0",
  "timestamp": "xxxxx"
}
```
📡 API Reference
Health Check Endpoints
GET / & GET /health
Returns API health status and configuration validation.
Response:
```json
{
  "status": "healthy",
  "service": "SniffHunt Scraper API",
  "version": "1.0.0",
  "timestamp": "xxxxx"
}
```
Content Extraction Endpoints
POST /scrape - Streaming Content Extraction
Real-time streaming extraction with progress updates.
Request Body:
```json
{
  "url": "https://anu-vue.netlify.app/guide/components/alert.html",
  "mode": "normal" | "beast",
  "query": "natural language content description"
}
```
Example:
```bash
curl -N http://localhost:8080/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://anu-vue.netlify.app/guide/components/alert.html", "mode": "beast"}'
```
Response: Server-Sent Events (SSE) stream with real-time updates.
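The exact event names and payload shapes in the stream are not specified here, so the TypeScript sketch below simply reads the raw response body and prints each `data:` line as it arrives; treat the parsing as an assumption to adapt to the actual event format.
```ts
// Sketch: consume the /scrape SSE stream with fetch and a streaming reader.
// The event payload shape is an assumption; adjust parsing to the real stream format.
const res = await fetch("http://localhost:8080/scrape", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    url: "https://anu-vue.netlify.app/guide/components/alert.html",
    mode: "beast",
  }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // SSE events are separated by a blank line; each event carries "data: ..." lines.
  const events = buffer.split("\n\n");
  buffer = events.pop() ?? "";
  for (const event of events) {
    for (const line of event.split("\n")) {
      if (line.startsWith("data:")) {
        console.log("update:", line.slice(5).trim());
      }
    }
  }
}
```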
POST /scrape-sync - Synchronous Content Extraction
Standard synchronous extraction for simple integrations.
Request Body:
{ "url": "https://anu-vue.netlify.app/guide/components/alert.html", "mode": "normal" | "beast", "query": "natural language content description"}
Parameters:
- `url` (required): Target URL for content extraction
- `mode` (optional): Extraction strategy
  - `normal`: Standard content extraction (default)
  - `beast`: Interactive interface handling with AI intelligence
- `query` (optional): Natural language description for semantic filtering
Response Format:
{ "success": true, "content": "# Extracted Content\n\nMarkdown-formatted content here...", "metadata": { "title": "Page Title", "url": "https://anu-vue.netlify.app/guide/components/alert.html", "mode": "beast", "extractionTime": 3.2, "contentLength": 15420 }}
Example:
```bash
curl -X POST http://localhost:8080/scrape-sync \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://anu-vue.netlify.app/guide/components/alert.html",
    "mode": "normal",
    "query": "pricing information"
  }'
```
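For programmatic use, a small typed wrapper around `/scrape-sync` keeps call sites tidy. The interface below mirrors the documented response fields; the `scrapeSync` helper name and its error handling are illustrative rather than part of the API.
```ts
// Sketch of a typed /scrape-sync client; response fields mirror the documented format.
interface ScrapeSyncResponse {
  success: boolean;
  content: string; // Markdown-formatted extraction result
  metadata: {
    title: string;
    url: string;
    mode: "normal" | "beast";
    extractionTime: number;
    contentLength: number;
  };
}

async function scrapeSync(
  url: string,
  mode: "normal" | "beast" = "normal",
  query?: string,
): Promise<ScrapeSyncResponse> {
  const res = await fetch("http://localhost:8080/scrape-sync", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url, mode, query }),
  });
  if (!res.ok) {
    throw new Error(`scrape-sync failed with HTTP ${res.status}`);
  }
  return (await res.json()) as ScrapeSyncResponse;
}

// Usage: extract pricing-related content from the example page.
const result = await scrapeSync(
  "https://anu-vue.netlify.app/guide/components/alert.html",
  "normal",
  "pricing information",
);
console.log(result.metadata.title, result.content.length);
```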
🎛️ Configuration
Environment Variables
The server loads configuration from the root `.env` file:
```bash
# Required
GOOGLE_GEMINI_KEY=your_gemini_api_key_here

# Server Configuration
PORT=8080            # Server port
CORS_ORIGIN=*        # CORS allowed origins

# Scraping Configuration
MAX_RETRY_COUNT=2    # Maximum retry attempts
RETRY_DELAY=1000     # Delay between retries (ms)
PAGE_TIMEOUT=10000   # Page load timeout (ms)
```
CORS Configuration
The server supports configurable CORS settings:
```bash
# Allow all origins
CORS_ORIGIN=*
```
📊 Scraping Modes
Normal Mode
- Best for: Static content, blogs, documentation
- Performance: Fast extraction
- Capabilities: Basic content extraction (still better than paid services even in normal mode)
Beast Mode
- Best for: SPAs, dynamic dashboards, interactive interfaces
- Performance: Intelligent extraction with AI processing
- Capabilities:
- UI interaction (clicks, scrolls, navigation)
- Modal and popup handling
- Dynamic content loading
- Semantic content understanding
Semantic Content Filtering Examples
Use the `query` parameter to extract specific content like this:
```bash
# Extract Avatar Code snippets
curl -X POST http://localhost:8080/scrape-sync \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://anu-vue.netlify.app/guide/components/avatar.html",
    "mode": "beast",
    "query": "Grab the Avatar Code snippets"
  }'
```
```bash
# Extract API reference and code examples
curl -X POST http://localhost:8080/scrape-sync \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://anu-vue.netlify.app/guide/components/alert.html",
    "mode": "normal",
    "query": "Grab API reference and code examples"
  }'
```