Scrape Article Content from Redirected Google News Links

I’m trying to scrape the content of a specific article, but the starting point is a Google News link that redirects to the actual article. When I use the HTTP module and set headers, the redirect isn’t followed, so I can’t reach the article content directly.
For example, the Google News link is:
This link redirects to the actual article URL:
I’ve tried using the following headers, but I’m still not able to access the article content:

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36',
    'CONSENT': 'YES+cb.20220419-08-p0.cs+FX+111'
}

Do you guys have any suggestions on how I can efficiently scrape the article content from the redirected Google News link?

Welcome to the Make community!

If you view the HTML source of the page, the redirect appears to be done client-side with JavaScript. This means that the “Follow redirects” option in the HTTP module won’t work, as that option only handles server-side redirects (HTTP 301/302 status codes).
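To illustrate the difference outside of Make, here is a minimal Python sketch that assumes you already have the response status code and body. The function name classify_redirect and the patterns it looks for (a meta refresh tag or a location assignment in a script) are illustrative assumptions, not a description of what Google News actually serves.

```python
import re

# Status codes an HTTP client can follow automatically (server-side redirects).
SERVER_REDIRECT_CODES = {301, 302, 303, 307, 308}

# Hypothetical hints of a client-side redirect inside the HTML body.
META_REFRESH = re.compile(r'<meta[^>]+http-equiv=["\']?refresh', re.IGNORECASE)
JS_LOCATION = re.compile(r'location(?:\.href)?\s*=|location\.replace\(', re.IGNORECASE)


def classify_redirect(status: int, body: str) -> str:
    """Return 'server', 'client', or 'none' for a fetched response."""
    if status in SERVER_REDIRECT_CODES:
        return "server"
    # A 200 response whose body navigates away is invisible to "follow redirects".
    if META_REFRESH.search(body) or JS_LOCATION.search(body):
        return "client"
    return "none"
```

A response classified as "client" has to be handled by parsing the body for the target URL, since there is no Location header for the HTTP client to follow.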

From the source code, there are three obvious URLs to the article, so you could probably use a Text Parser “Match Elements” module to extract the URLs, and then filter by the domain name.
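For anyone who wants to see the same idea as code, the sketch below pulls every URL out of the HTML and keeps only those on a given domain. It is a rough Python equivalent of the approach, not what the Text Parser module does internally; the regular expression and the helper names extract_urls and filter_by_domain are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Simple URL matcher: stops at whitespace, quotes, angle brackets, or backslashes.
URL_PATTERN = re.compile(r'https?://[^\s"\'<>\\]+')


def extract_urls(html: str) -> list[str]:
    """Return every http(s) URL found in the HTML, in document order."""
    return URL_PATTERN.findall(html)


def filter_by_domain(urls: list[str], domain: str) -> list[str]:
    """Keep URLs whose host is the domain itself or one of its subdomains."""
    kept = []
    for url in urls:
        try:
            host = urlparse(url).hostname or ""
        except ValueError:
            continue  # skip strings that look like URLs but do not parse
        if host == domain or host.endswith("." + domain):
            kept.append(url)
    return kept
```

Filtering on the publisher's domain is what separates the article link from the Google-hosted URLs that also appear in the page.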

Or, you can just aggregate the results and use the built-in function last to get the last URL on the page.
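As a rough Python analogue of aggregating the matches and taking the last one, the sketch below returns the final URL found in the page. The helper name last_url and the URL pattern are assumptions, and it only works if the article link really is the last URL in the source.

```python
import re
from typing import Optional

# Simple URL matcher: stops at whitespace, quotes, angle brackets, or backslashes.
URL_PATTERN = re.compile(r'https?://[^\s"\'<>\\]+')


def last_url(html: str) -> Optional[str]:
    """Return the last http(s) URL in the HTML, or None if there is none."""
    matches = URL_PATTERN.findall(html)
    return matches[-1] if matches else None
```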


Give it a go and let us know if you have any issues!

samliew