Skip to content

API Server

Before starting the server, ensure you have:

  1. Environment Setup: A .env file in the root directory with your Google Gemini API key
  2. Dependencies Installed: Run bun install from the root directory
Terminal window
# Start from root directory (automatically loads .env)
bun run dev:server

Benefits:

  • Automatically loads environment variables from root .env
  • Consistent with other workspace commands
  • No need to navigate to subdirectories
Terminal window
# Alternative: Start from server directory
cd apps/server
bun dev

Note: This approach also loads the root .env file automatically due to the server’s configuration.

The server will start on http://localhost:8080 by default.

Terminal window
# Health check
curl http://localhost:8080/health
# Expected response:
{
"status": "healthy",
"service": "SniffHunt Scraper API",
"version": "1.0.0",
"timestamp": "xxxxx"
}

Returns API health status and configuration validation.

Response:

{
"status": "healthy",
"service": "SniffHunt Scraper API",
"version": "1.0.0",
"timestamp": "xxxxx"
}
POST /scrape - Streaming Content Extraction
Section titled “POST /scrape - Streaming Content Extraction”

Real-time streaming extraction with progress updates.

Request Body:

{
"url": "https://anu-vue.netlify.app/guide/components/alert.html",
"mode": "normal" | "beast",
"query": "natural language content description"
}

Example:

Terminal window
curl -N http://localhost:8080/scrape \
-H "Content-Type: application/json" \
-d '{"url": "https://anu-vue.netlify.app/guide/components/alert.html", "mode": "beast"}'

Response: Server-Sent Events (SSE) stream with real-time updates.

POST /scrape-sync - Synchronous Content Extraction
Section titled “POST /scrape-sync - Synchronous Content Extraction”

Standard synchronous extraction for simple integrations.

Request Body:

{
"url": "https://anu-vue.netlify.app/guide/components/alert.html",
"mode": "normal" | "beast",
"query": "natural language content description"
}

Parameters:

  • url (required): Target URL for content extraction
  • mode (optional): Extraction strategy
    • normal: Standard content extraction (default)
    • beast: Interactive interface handling with AI intelligence
  • query (optional): Natural language description for semantic filtering

Response Format:

{
"success": true,
"content": "# Extracted Content\n\nMarkdown-formatted content here...",
"metadata": {
"title": "Page Title",
"url": "https://anu-vue.netlify.app/guide/components/alert.html",
"mode": "beast",
"extractionTime": 3.2,
"contentLength": 15420
}
}

Example:

Terminal window
curl -X POST http://localhost:8080/scrape-sync \
-H "Content-Type: application/json" \
-d '{
"url": "https://anu-vue.netlify.app/guide/components/alert.html",
"mode": "normal",
"query": "pricing information"
}'

The server loads configuration from the root .env file:

Terminal window
# Required
GOOGLE_GEMINI_KEY=your_gemini_api_key_here
# Server Configuration
PORT=8080 # Server port
CORS_ORIGIN=* # CORS allowed origins
# Scraping Configuration
MAX_RETRY_COUNT=2 # Maximum retry attempts
RETRY_DELAY=1000 # Delay between retries (ms)
PAGE_TIMEOUT=10000 # Page load timeout (ms)

The server supports configurable CORS settings:

Terminal window
# Allow all origins
CORS_ORIGIN=*
  • Best for: Static content, blogs, documentation
  • Performance: Fast extraction
  • Capabilities: Basic content extraction (still better than paid services even in normal mode)
  • Best for: SPAs, dynamic dashboards, interactive interfaces
  • Performance: Intelligent extraction with AI processing
  • Capabilities:
    • UI interaction (clicks, scrolls, navigation)
    • Modal and popup handling
    • Dynamic content loading
    • Semantic content understanding

Use the query parameter to extract specific content like this:

Terminal window
# Extract Avatar Code snippets
curl -X POST http://localhost:8080/scrape-sync \
-H "Content-Type: application/json" \
-d '{
"url": "https://anu-vue.netlify.app/guide/components/avatar.html",
"mode": "beast",
"query": "Grab the 'Avatar Code snippets'"
}'
# Extract API reference and code examples
curl -X POST http://localhost:8080/scrape-sync \
-H "Content-Type: application/json" \
-d '{
"url": "https://anu-vue.netlify.app/guide/components/alert.html",
"mode": "normal",
"query": "Grab API reference and code examples"
}'