I’m trying to create a scenario that identifies written case study content within a given website. Using the HTTP "Make a request" module to fetch the site’s sitemap.xml doesn’t provide enough detail, because the case studies typically sit several levels below the main URL, and there isn’t a uniform structure (some are stored as HTML, some as PDF, etc.). I could use a scraper, but that would be costly, and it would still be difficult to pick out the right content.
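To illustrate the sitemap idea and where it falls short, here is a minimal Python sketch (outside Make, for illustration only). It assumes the sitemap has already been fetched as text; the keyword list, the sample URLs, and the function name `candidate_urls` are all hypothetical. It only catches URLs that are listed in the sitemap and whose path happens to contain one of the keywords, which is the limitation described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical keywords; adjust to whatever wording the target site uses.
KEYWORDS = ("case-study", "case-studies", "customer-story", "success-story")

# Namespace defined by the sitemaps.org protocol for <urlset>/<url>/<loc>.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Made-up sitemap content standing in for a real HTTP response body.
SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/about/</loc></url>
  <url><loc>https://example.com/customers/case-studies/widgetco/</loc></url>
  <url><loc>https://example.com/downloads/acme-case-study.pdf</loc></url>
</urlset>
"""


def candidate_urls(sitemap_xml: str, keywords=KEYWORDS) -> list[str]:
    """Return sitemap URLs whose text contains any keyword (case-insensitive)."""
    root = ET.fromstring(sitemap_xml.strip())
    # Collect every <loc> entry, in document order.
    urls = [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
    return [u for u in urls if any(k in u.lower() for k in keywords)]
```

Calling `candidate_urls(SAMPLE)` keeps the HTML page and the PDF and drops the unrelated page, but a case study published under a path like `/resources/2023/widgetco/` would be missed entirely.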
Any suggestions?