How to scrape a page requiring login?

First of all, I need to scrape my own page. I’m not trying to steal anything here.

In particular, I want to scrape my Substack Notes feed to extract the number of likes and comments of each post.
The page is Home | Substack, but I need to be logged in to see it.

I found a year-old Python tutorial that used an HTTP POST request. The author inspected the login request to find out the payload.

I did the same, but couldn’t find the information.

How should I proceed?

If there is no CAPTCHA involved, it’s relatively simple.

You just have to emulate the headers and payload your browser sends during login, then store the cookie from the response and send it with your later requests.
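
A minimal sketch of that flow, using only the Python standard library. The function names, the example.com URLs, and the payload fields are placeholders I made up; the real values have to be copied from the login request you see in your browser. It also assumes the site accepts a JSON body, which may not be the case (some sites expect form-encoded data instead).

```python
import json
import urllib.request
from http.cookiejar import CookieJar


def build_login_request(login_url, payload, headers):
    """Build a POST request carrying the login payload as JSON.

    Assumption: the site expects a JSON body. Check the Content-Type
    of the real login request before relying on this.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        login_url, data=body, headers=headers, method="POST"
    )


def login_and_fetch(login_url, payload, headers, target_url):
    """Log in, keep the cookies, then fetch a page that needs them."""
    jar = CookieJar()
    # The cookie processor stores Set-Cookie values from responses and
    # sends them back on later requests made through the same opener.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    with opener.open(build_login_request(login_url, payload, headers)) as response:
        response.read()
    with opener.open(target_url) as response:
        return response.read().decode("utf-8")


# Hypothetical usage; every value here is a placeholder:
# html = login_and_fetch(
#     "https://example.com/login",
#     {"email": "you@example.com", "password": "your-password"},
#     {"Content-Type": "application/json", "User-Agent": "Mozilla/5.0"},
#     "https://example.com/feed",
# )
```

The same opener has to be reused for the second request, otherwise the cookie from the login response is not sent.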

Hope this helps! Let me know if there are any further questions or issues.

@samliew

Yes, there’s no captcha. How do you inspect the payload to emulate it?

Use the Network tab in your web browser’s developer tools.

In Google Chrome, press F12 to open them.

Yes, but how do I find the right call?

It varies from site to site, so you have to take a look yourself.

Usually it is a request with the POST method to an endpoint like /login.
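
If scrolling through the list by hand gets tedious, one option is to export the Network tab as a HAR file and filter it with a script. The sketch below assumes the standard HAR layout (log, entries, request, postData); the function name and the list of hint words are my own guesses, so adjust them to whatever the site actually uses.

```python
import json
from urllib.parse import urlparse

# Guessed substrings that often appear in login endpoints.
LOGIN_HINTS = ("login", "signin", "sign-in", "session", "auth")


def find_login_candidates(har, hints=LOGIN_HINTS):
    """Return POST requests whose URL path contains one of the hints."""
    candidates = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        if request.get("method", "").upper() != "POST":
            continue
        path = urlparse(request.get("url", "")).path.lower()
        if any(hint in path for hint in hints):
            post_data = request.get("postData", {})
            candidates.append({
                "url": request.get("url"),
                "mime_type": post_data.get("mimeType"),
                "body": post_data.get("text"),
            })
    return candidates


# Hypothetical usage with a file exported from the Network tab:
# with open("network-export.har", encoding="utf-8") as handle:
#     for candidate in find_login_candidates(json.load(handle)):
#         print(candidate["url"], candidate["mime_type"])
```

The body of each candidate shows the payload format to emulate. Note that the export can contain your password, so don't share the file.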

Ok. I’ll have to scroll through them all again.