API Documentation

Getting started

This page covers everything you need to get started with the ScrapeBoss API.

API Endpoint

All requests must use the POST method. Connections default to HTTPS.

https://scrapeboss.com/api/v1/scrape

Authorization

Every request to the API must be authenticated. Pass your API key in the authentication request header:

"authentication": "14ac5499cfdd2bb2859e4476d2e5b1d2bad079bf"
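As a minimal sketch of attaching the header, the following Python builds (but does not send) a POST request using only the standard library. The helper name build_request is hypothetical and used for illustration only; the endpoint and the authentication header name are taken from this page.

```python
import urllib.request

API_ENDPOINT = "https://scrapeboss.com/api/v1/scrape"


def build_request(api_key: str) -> urllib.request.Request:
    """Create a POST request that carries the API key in the `authentication` header.

    Hypothetical helper for illustration; it only constructs the request object.
    """
    return urllib.request.Request(
        API_ENDPOINT,
        method="POST",
        headers={"authentication": api_key},
    )


req = build_request("YOUR_API_KEY_HERE")
print(req.get_method())
# urllib stores header names capitalized, so look the header up as "Authentication"
print(req.get_header("Authentication"))
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out here so the sketch runs without network access.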

Define scrape URL

Set the webpage to scrape by passing a url parameter in the body of your API request. The examples further down pass the same parameter in the query string.

PHP array
$postData = ['url' => 'https://bbc.co.uk/example-article'];
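The same body can be sketched in Python. This page does not state which body encoding the API expects, so the sketch assumes a form-encoded body (application/x-www-form-urlencoded); the helper name encode_body is hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def encode_body(target_url: str) -> bytes:
    """Form-encode the `url` parameter for use as a POST body.

    Assumption: the API accepts form-encoded bodies; this page does not confirm it.
    """
    return urlencode({"url": target_url}).encode("ascii")


body = encode_body("https://bbc.co.uk/example-article")
print(body)  # b'url=https%3A%2F%2Fbbc.co.uk%2Fexample-article'

# Decoding the body recovers the original URL unchanged
print(parse_qs(body.decode("ascii"))["url"][0])
```

Encoding the target URL this way keeps any characters such as & or ? inside it from being mistaken for separate request parameters.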

Response data

The ScrapeBoss API returns the article content together with page metadata. Some articles expose more data than others, so not every field is present in every response.

Success

A successful response from the API should include the following fields:

  • Data
    • main_image (src & alt)
    • main_text
    • main_title
  • Meta
    • viewport
    • description
    • x-country
    • x-audience
    • cps_audience
    • cps_changequeueid
  • Other
    • robots
    • theme-color
    • apple-itunes-app
    • apple-mobile-web-app-title
    • application-name
    • x-audience
    • msapplication-TileImage
    • msapplication-TileColor
    • mobile-web-app-capable

Data that may or may not be included:

  • Social
    • Facebook OG
      • title
      • description
      • pages
      • social-image (src & alt)
      • admins
    • Twitter
      • card
      • site
      • title
      • description
      • domain
      • creator
      • social-image (src & alt)
  • Meta
    • viewport
    • description
    • x-country
    • x-audience
    • cps_audience
    • cps_changequeueid
  • Extras
    • canonical
    • other_images
    • authors
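Because some groups are optional, client code should not assume they exist. The sketch below shows one defensive way to read a response. The response format is not specified on this page, so the JSON shape, the lowercase key names (data, social, extras, twitter) and the sample values are all assumptions made for illustration; summarize is a hypothetical helper.

```python
import json

# Hypothetical response body: the structure and key names are assumed, not documented.
sample = json.loads("""
{
  "data": {
    "main_title": "Example title",
    "main_text": "Example text",
    "main_image": {"src": "https://example.com/a.jpg", "alt": "Example"}
  },
  "meta": {"description": "Example description"}
}
""")


def summarize(response: dict) -> dict:
    """Pick a few fields, tolerating optional groups that are missing or null."""
    data = response.get("data") or {}
    social = response.get("social") or {}
    extras = response.get("extras") or {}
    return {
        "title": data.get("main_title"),
        "image_src": (data.get("main_image") or {}).get("src"),
        "twitter_card": (social.get("twitter") or {}).get("card"),
        "authors": extras.get("authors") or [],
    }


print(summarize(sample))
```

Using .get with a fallback means a response without the Social or Extras groups yields None or an empty list instead of raising KeyError.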

Failure

If you have problems retrieving data from ScrapeBoss, please contact us at hello@scrapeboss.com.


Examples

The examples below show how to call ScrapeBoss from several languages:

Python

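import http.client

# Connect to the API host over HTTPS
conn = http.client.HTTPSConnection("scrapeboss.com")

headers = {
    'authentication': "YOUR_API_KEY_HERE",
    'cache-control': "no-cache",
}

# POST to the scrape endpoint, passing the page to scrape as the url parameter
conn.request("POST", "/api/v1/scrape?url=https://www.bbc.co.uk/news/uk-47891737", headers=headers)

res = conn.getresponse()
data = res.read()

print(data.decode("utf-8"))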

cURL

curl -X POST \
'https://scrapeboss.com/api/v1/scrape?url=https://www.bbc.co.uk/news/uk-47891737' \
-H 'authentication: YOUR_API_KEY_HERE' \
-H 'cache-control: no-cache'

PHP cURL

$curl = curl_init();

curl_setopt_array($curl, array(
    CURLOPT_URL => "https://scrapeboss.com/api/v1/scrape?url=https://www.bbc.co.uk/news/uk-47891737",
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_ENCODING => "",
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 30,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => "POST",
    CURLOPT_HTTPHEADER => array(
        "authentication: YOUR_API_KEY_HERE",
        "cache-control: no-cache",
    ),
));

$response = curl_exec($curl);
$err = curl_error($curl);

curl_close($curl);

if ($err) {
    echo "cURL Error #:" . $err;
} else {
    echo $response;
}

Go (GoLang)

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	url := "https://scrapeboss.com/api/v1/scrape?url=https://www.bbc.co.uk/news/uk-47891737"

	req, err := http.NewRequest("POST", url, nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Add("authentication", "YOUR_API_KEY_HERE")
	req.Header.Add("cache-control", "no-cache")

	// Stop on a transport error; res is nil in that case
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(res)
	fmt.Println(string(body))
}

Found a problem?

Report a bug or contact us at hello@scrapeboss.com.