@signalpillar
Last active November 15, 2024 23:08

Table of Contents

  1. example
  • scrape a page as Markdown, or extract structured data (with a custom JSON schema or with just a prompt)
  • a bit slow (~15-30 s per page)
    • not a problem in my case
  • more reliable than CSS selector rules
  • the usage quota and cost can make it harder to experiment
  • INTERESTING feature: actions! Perform actions on the web page while it is being crawled
    {
      "url": "https://hellorayo.co.uk/kiss/schedule/",
      "formats": [
        "extract"
      ],
      "extract": {
        "schema": {
          "type": "object",
          "properties": {
            "schedule": {
              "description": "List of programmes for today",
              "type": "array",
              "items": {
                "type": "object",
                "required": [
                  "headline",
                  "body",
                  "startTime",
                  "hosts"
                ],
                "properties": {
                  "headline": {
                    "description": "Programme headline",
                    "type": "string"
                  },
                  "body": {
                    "description": "Programme details",
                    "type": "string"
                  },
                  "hosts": {
                    "description": "Hosts",
                    "type": "array",
                    "items": {
                      "type": "object",
                      "properties": {
                        "name": {
                          "type": "string"
                        }
                      }
                    }
                  },
                  "startTime": {
                    "description": "Time when programme starts in format HH:mm",
                    "type": "string"
                  }
                }
              }
            }
          },
          "required": [
            "schedule"
          ]
        }
      }
    }
    # Assumes $apiKey holds the API key and $payload holds the JSON above
    curl -X POST https://api.firecrawl.dev/v1/scrape \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $apiKey" \
        -d "$payload"
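
The same request can be sketched in Python, for use from a script rather than the shell. The endpoint and headers are taken from the curl command above. The function names (`with_actions`, `build_scrape_request`, `scrape`) and the example selector are hypothetical, and the shape of the `actions` list is an assumption about the v1 API, so check it against the current API reference before relying on it.

```python
import json
import urllib.request

API_URL = "https://api.firecrawl.dev/v1/scrape"  # endpoint from the curl command

# ASSUMPTION: the action types and field names below reflect the v1 API;
# the selector is made up for illustration.
EXAMPLE_ACTIONS = [
    {"type": "wait", "milliseconds": 2000},
    {"type": "click", "selector": "button.load-more"},
]


def with_actions(payload, actions):
    """Return a copy of the payload with an "actions" list attached.

    ASSUMPTION: the scrape endpoint accepts a top-level "actions" list.
    The original payload is left unmodified.
    """
    return {**payload, "actions": list(actions)}


def build_scrape_request(api_key, payload):
    """Build (but do not send) the POST request for the scrape endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def scrape(api_key, payload, timeout=60.0):
    """Send the request and return the decoded JSON response."""
    request = build_scrape_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `scrape(api_key, payload)` with the JSON payload above corresponds to the curl command; wrapping the payload in `with_actions(payload, EXAMPLE_ACTIONS)` would additionally request page actions. The 60-second default timeout is a guess that leaves headroom over the ~15-30 s per page noted earlier.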

example

{
  "success": true,
  "data": {
    "metadata": {
      "title": "KISS Schedule | List of Upcoming Shows ",
      "description": "Stay up to date with the schedule at KISS. See which shows and presenters are coming up this week and never miss your favourite KISS show again.",
      "language": "en",
      "ogTitle": "KISS - Latest Show Schedule",
      "ogDescription": "Stay up to date with the schedule at KISS. See which shows and presenters are coming up this week and never miss your favourite KISS show again.",
      "ogUrl": "https://hellorayo.co.uk/kiss/schedule/",
      "ogImage": "https://media.bauerradio.com/image/upload/c_crop,g_custom/v1678797505/brand_manager/stations/mahvph4fqjmnp4y8d6wt.png",
      "ogLocaleAlternate": [],
      "ogSiteName": "KISS",
      "og:local": "en_GB",
      "og:url": "https://hellorayo.co.uk/kiss/schedule/",
      "og:site_name": "KISS",
      "fb:app_id": "197948390245380",
      "twitter:site": "kissfmuk",
      "twitter:creator": "kissfmuk",
      "viewport": "width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=5",
      "og:title": "KISS - Latest Show Schedule",
      "og:description": "Stay up to date with the schedule at KISS. See which shows and presenters are coming up this week and never miss your favourite KISS show again.",
      "og:image": "https://media.bauerradio.com/image/upload/c_crop,g_custom/v1678797505/brand_manager/stations/mahvph4fqjmnp4y8d6wt.png",
      "image:type": "image/jpeg",
      "og:image:width": "150",
      "og:image:height": "150",
      "og:image:alt": "KISS Logo",
      "og:type": "website",
      "twitter:card": "summary",
      "twitter:title": "KISS - Latest Show Schedule",
      "twitter:description": "Stay up to date with the schedule at KISS. See which shows and presenters are coming up this week and never miss your favourite KISS show again.",
      "twitter:image:src": "https://media.bauerradio.com/image/upload/c_crop,g_custom/v1678797505/brand_manager/stations/mahvph4fqjmnp4y8d6wt.png",
      "twitter:image:alt": "KISS",
      "next-head-count": "42",
      "sourceURL": "https://hellorayo.co.uk/kiss/schedule/",
      "url": "https://hellorayo.co.uk/kiss/schedule/",
      "statusCode": 200
    },
    "extract": {
      "schedule": [
        {
          "headline": "Non-Stop KISS",
          "body": "All your favourite KISS tunes, back to back perfect for that overnight motivation!",
          "hosts": [],
          "startTime": "01:00"
        },
        {
          "headline": "Hot Right Now",
          "body": "The hottest tunes and brand-new fresh beats from KISS updated every Friday.",
          "hosts": [],
          "startTime": "03:00"
        },
        {
          "headline": "Jordan Lee",
          "body": "Jordan Lee on KISS with the BIGGEST Dance and R&B!",
          "hosts": [],
          "startTime": "04:00"
        },
        {
          "headline": "\"The Alfie Moon Fan Club\" - KISS Breakfast with Jordan & Perri",
          "body": "Jordan & Perri on KISS with the BIGGEST Dance and R&B! Join the boys' 'KISS Breakfast' WhatsApp channel for an exclusive look behind the scenes at KISS.",
          "hosts": [],
          "startTime": "07:00"
        },
        {
          "headline": "KISSTORY Anthems on KISS with Marvin Humes",
          "body": "KISSTORY Anthems on KISS with Marvin Humes dropping the BIGGEST Dance and R&B!",
          "hosts": [],
          "startTime": "11:00"
        },
        {
          "headline": "Tatum on KISS",
          "body": "Tatum's got your afternoon covered with The Biggest Dance and R&B, plus chances to win BIG cash prizes.",
          "hosts": [],
          "startTime": "13:00"
        },
        {
          "headline": "\"They don't know about Gary Barlow's son!\" - The Home Straight With Tyler West",
          "body": "You're on the Home Straight with Tyler West - wrapping up your day with banter, The Biggest Dance and R&B and chances to win big.",
          "hosts": [],
          "startTime": "16:00"
        },
        {
          "headline": "Majestic",
          "body": "Majestic in the mix with the best tunes & biggest anthems for your weekend rave",
          "hosts": [],
          "startTime": "19:00"
        },
        {
          "headline": "Sam Divine",
          "body": "Sam Divine in the mix with her unique blend of upfront house heaters.",
          "hosts": [],
          "startTime": "21:00"
        },
        {
          "headline": "Hix",
          "body": "Hix soundtracking your Friday night with the best in brand new dance music.",
          "hosts": [],
          "startTime": "23:00"
        }
      ]
    }
  }
}
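
A response like the one above can be checked before it is used. The following is a minimal sketch, assuming the response has already been decoded into a Python dict; `parse_schedule` and `programmes_without_hosts` are hypothetical helper names, not part of any library. It verifies that `startTime` follows the HH:mm format requested in the schema, sorts the programmes by start time, and lists the programmes whose `hosts` array came back empty (in the response above, that is every one of them, even where the headline names a presenter).

```python
import re

# 24-hour HH:mm, as requested in the schema's startTime description
HH_MM = re.compile(r"^([01]\d|2[0-3]):[0-5]\d$")


def parse_schedule(response):
    """Return the extracted programmes sorted by start time.

    Raises ValueError if the scrape failed or a startTime is not HH:mm.
    """
    if not response.get("success"):
        raise ValueError("scrape did not succeed")
    programmes = response["data"]["extract"]["schedule"]
    for programme in programmes:
        if not HH_MM.match(programme["startTime"]):
            raise ValueError(f"unexpected startTime: {programme['startTime']!r}")
    # Zero-padded HH:mm strings sort correctly as plain strings
    return sorted(programmes, key=lambda p: p["startTime"])


def programmes_without_hosts(programmes):
    """Return the headlines of programmes whose hosts list is empty."""
    return [p["headline"] for p in programmes if not p["hosts"]]
```

Checking the output this way is cheap compared with the scrape itself, and it makes gaps such as the empty `hosts` lists visible before the data is used elsewhere.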