Identifying specific content within a website


I’m trying to create a scenario that will identify written case study content within a given website. Using the HTTP “Make a request” module to fetch the site’s sitemap.xml doesn’t provide enough detail, because the case studies typically sit several levels below the main URL, and there isn’t a uniform structure (some are stored as HTML pages, others as PDFs, etc.). I could use a scraper, but that would be costly, and it would still be difficult to pick out the right content.
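One way to get more out of the sitemap is to filter it rather than read it as a whole. Sitemaps can be nested (a sitemap index pointing at child sitemaps), and the case study URLs then have to be picked out of what may be thousands of entries. Below is a minimal Python sketch of that filtering step, assuming the site publishes standard sitemaps.org XML. `parse_sitemap`, `looks_like_case_study`, `is_pdf` and the keyword list are hypothetical names chosen for illustration, and matching on the URL alone will miss case studies whose paths contain none of those words.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol; most sitemaps declare it
SM_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical URL hints; real sites vary, so tune this list per domain
CASE_STUDY_HINTS = (
    "case-study", "case-studies", "casestudy", "casestudies",
    "customer-story", "customer-stories", "success-story", "success-stories",
)


def parse_sitemap(xml_text):
    """Split a sitemap into (page URLs, child sitemap URLs).

    A plain <urlset> yields pages; a <sitemapindex> yields child sitemaps
    that must be fetched and parsed in turn.
    """
    root = ET.fromstring(xml_text)
    pages = [e.text.strip() for e in root.findall("sm:url/sm:loc", SM_NS) if e.text]
    children = [e.text.strip() for e in root.findall("sm:sitemap/sm:loc", SM_NS) if e.text]
    return pages, children


def looks_like_case_study(url):
    """Cheap keyword test on the URL itself; matches HTML pages and PDFs alike."""
    lowered = url.lower()
    return any(hint in lowered for hint in CASE_STUDY_HINTS)


def is_pdf(url):
    """PDFs need a different downstream step (text extraction) than HTML pages."""
    return url.lower().split("?", 1)[0].endswith(".pdf")


if __name__ == "__main__":
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://www.example.com/about</loc></url>
      <url><loc>https://www.example.com/customers/case-studies/acme.html</loc></url>
      <url><loc>https://www.example.com/docs/Acme-Case-Study.pdf</loc></url>
    </urlset>"""
    pages, _ = parse_sitemap(sample)
    for url in filter(looks_like_case_study, pages):
        print("pdf" if is_pdf(url) else "html", url)
```

The keyword test is deliberately cheap: it only looks at URLs, so nothing beyond the sitemap files has to be downloaded, which sidesteps the cost of a full scrape. The trade-off is recall, since anything not named like a case study is skipped.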

Any suggestions?

Could you share an example URL? Will the relevant content definitely be present in the sitemap?

Hi, here are a few examples:

Ideally I’d like to be able to point at a top-level URL (e.g. www.cisco.com) and identify all case studies automatically.
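Starting from a top-level URL alone, the sitemaps themselves first have to be located. The sitemaps.org protocol lets a site declare them with `Sitemap:` lines in robots.txt, while `/sitemap.xml` is only a common convention, so a sketch under those assumptions checks robots.txt first and then walks any sitemap indexes breadth-first. It uses Python’s standard `urllib.robotparser` (`site_maps()` needs Python 3.8+). The function names and the `max_sitemaps` cap are hypothetical, the fetch callable is left to the caller, and gzipped `.xml.gz` sitemaps are not handled.

```python
import xml.etree.ElementTree as ET
from collections import deque
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Namespace defined by the sitemaps.org protocol
SM_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def discover_sitemaps(base_url, robots_txt):
    """Sitemap URLs declared in robots.txt, else the conventional /sitemap.xml."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    declared = rp.site_maps()  # None when robots.txt has no Sitemap: lines
    return list(declared) if declared else [urljoin(base_url, "/sitemap.xml")]


def walk_sitemaps(start_urls, fetch, max_sitemaps=50):
    """Breadth-first walk through sitemap indexes; returns every page URL found.

    `fetch` is any callable mapping a URL to XML text or bytes (e.g. an HTTP GET).
    It is passed in so the walk can be tested offline. `max_sitemaps` caps how
    many sitemap files are downloaded, keeping the run cheap on very large sites.
    """
    queue, seen, pages = deque(start_urls), set(), []
    while queue and len(seen) < max_sitemaps:
        url = queue.popleft()
        if url in seen:  # guard against indexes that reference each other
            continue
        seen.add(url)
        root = ET.fromstring(fetch(url))
        # <urlset> entries are pages; <sitemapindex> entries are more sitemaps
        pages += [e.text.strip() for e in root.findall("sm:url/sm:loc", SM_NS) if e.text]
        queue.extend(e.text.strip() for e in root.findall("sm:sitemap/sm:loc", SM_NS) if e.text)
    return pages
```

For a live run, `fetch` could be as simple as `lambda u: urllib.request.urlopen(u).read()`, with the site’s `/robots.txt` fetched the same way and decoded to text for `discover_sitemaps`. The resulting URLs still need a case-study filter, and they only cover pages the site chose to list, which is the open question about whether the content is in the sitemap at all.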