Hi everyone!
I am currently working on a project for my supervisor to automate her invoice notification system. Before we commit to Make.com (we need permissions from management) I am making a test run.
If you read my previous post I have managed to pull the info I want from the email (in the test case the date). In my test case I want to search for corrections to articles for that date on the New York Times site and then send a pdf of that article to my email.
I’m currently hitting a problem with the http request I am making. It works, but I can’t get it to pull data for just the date/corrections. I’m assuming I have to use query strings, content requests or headers, but even with significant research I can’t figure out how to do this. I’ll add some photos below.
If anyone has any suggestions, or even a better description of how to use the various fields in HTTP it would be greatly appreciated.
You’re pulling the full web page, in an unstructured form, from the NYT website, such as the page Corrections. This is designed to display to humans in a browser, not for automated processing. Headers, query string etc. does not matter in this case.
So the response includes all parts of the page, including the top menu, which is what you’re seeing, as well as the corrections further down the page (and many other things)
Working with websites like this is called Scraping. The Documentation of Text Parser has a short section on this as well. It suggests some external tools.
While relatively inefficient, one method I can think of without coding knowledge, and staying within Make would be to feed the full page text into an AI, and prompt it to get the links you’re looking for. Probably a low cost AI like a nano model would be reliable enough for this.
The more robust way is to parse the HTML with selectors, but that’s not particularly easy without developer help for a complex webpage. Although again ChatGPT can help finding the ways to match it. You might be able to use regular expressions with AI help. But it might break on small changes to their webpage structure.
Or even better, use the NYT API if available, which is designed for automated processing. But that again might require some technical experience, and I did not check the pricing and terms.
Thank you so much, I actually ended up solving it through the url. I looked at the url when I was on the page I wanted and adapted it to change with the date.
2 Likes