I am looking to do two things. A: Format a URL I am getting from HubSpot for use in Scraptio, making sure it is correctly formatted.
That mainly involves:
Ensuring that it starts with “http://” or “https://”, and adding it if not.
Removing “www.” if it is present.
Removing everything after the top-level domain. For example, with “.com/about/history” I need to remove everything after “.com”, i.e. “/about/history”.
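A minimal Python sketch of those three steps, assuming the HubSpot value is either a bare domain or an http(s) URL, and assuming https:// is the scheme to add when none is present. normalize_url is a hypothetical helper name, and it uses the standard library's urllib.parse instead of a regex.

```python
from urllib.parse import urlsplit


def normalize_url(raw: str, default_scheme: str = "https") -> str:
    """Hypothetical helper: reduce a raw URL to scheme://host."""
    url = raw.strip()

    # Step 1: add a scheme when neither http:// nor https:// is present
    # (assumes https is the right default).
    if not url.lower().startswith(("http://", "https://")):
        url = f"{default_scheme}://{url}"

    parts = urlsplit(url)
    # .hostname is lowercased and excludes any port or user:password@ part.
    host = parts.hostname or ""

    # Step 2: drop a leading "www." if present.
    if host.startswith("www."):
        host = host[len("www."):]

    # Step 3: keep only scheme and host; path, query and fragment are discarded.
    return f"{parts.scheme}://{host}"


for value in ["www.example.com/about/history", "http://WWW.Example.com", "example.org"]:
    print(normalize_url(value))
# Prints https://example.com, http://example.com and https://example.org, one per line.
```

Subdomains other than www (e.g. blog.example.com) are kept whole, which matches "remove everything after .com".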
B: Then I am thinking of using the HTTP GET function with an empty body to check whether I get a 200 response.
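For step B, a rough Python analogue of a GET request with no body that reports whether the final response was 200. It is only a sketch outside Make: returns_200 is a hypothetical name, urllib follows redirects by default (so a 301 that ends on a 200 page counts as success), and any network error or 4xx/5xx status is treated as failure.

```python
import urllib.error
import urllib.request


def returns_200(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical helper: True if a plain GET to url ends in HTTP 200."""
    try:
        # A GET request with no body; a malformed URL raises ValueError here.
        request = urllib.request.Request(url, method="GET")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Any redirects have already been followed by this point.
            return response.status == 200
    except urllib.error.HTTPError as err:
        # 4xx/5xx statuses are raised as HTTPError: close it, report failure.
        err.close()
        return False
    except (OSError, ValueError):
        # URLError (DNS failure, refused connection) and timeouts are OSError
        # subclasses; ValueError covers input such as a missing scheme.
        return False
```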
For extracting the URL, a regex is about your simplest option, as far as I know.
You can workshop it at regex101.com: add a few different URLs to the test box and put your regex at the top. Make sure you select the JavaScript (ECMAScript) flavor on the left side, though.
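To mimic that regex101 workflow offline, here is a Python sketch that runs one pattern over several sample URLs pasted one per line, the way they would sit in the test box. The pattern and the URLs are illustrative assumptions, not a regex from this thread. The slashes are escaped as \/ so the same pattern can be pasted into regex101's JavaScript flavor, and the multiline flag makes ^ anchor at the start of each line.

```python
import re

# Sample "test box" contents: one made-up URL per line.
TEST_STRING = """https://www.example.com/about/history
http://example.org
www.example.net/contact?x=1
example.io"""

# Illustrative pattern: optional scheme, then capture the host up to the
# first "/", "?", "#", ":" or whitespace. Note that it keeps "www.".
PATTERN = re.compile(r"^(?:https?:\/\/)?([^\/?#:\s]+)", re.MULTILINE)

for match in PATTERN.finditer(TEST_STRING):
    print(match.group(1))
# Prints www.example.com, example.org, www.example.net and example.io.
```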
You don’t need to make an HTTP GET call per se to test the URL; there’s a module under the HTTP app that simply resolves a URL.
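For comparison outside Make, a Python sketch of "resolving" a URL: follow any redirects and return the address it ends up at. This assumes, going by its name, that the HTTP module behaves roughly like this; resolve_url is a hypothetical helper, not part of Make.

```python
import urllib.request


def resolve_url(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: return the address url ends up at after redirects.

    Raises urllib.error.HTTPError for 4xx/5xx responses, URLError for network
    failures and ValueError for malformed input, so a clean return also
    shows that the URL is reachable.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # .url holds the final URL once urlopen has followed any redirects.
        return response.url
```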
For more information, see Text Parser in the Make Help Center:
Match Pattern
The Match pattern module enables you to find and extract string elements matching a search pattern from a given text. The search pattern is a regular expression (aka regex or regexp), which is a sequence of characters in which each character is either a metacharacter, having a special meaning, or a regular character that has a literal meaning.
For experimenting with regular expressions, we recommend the regular expressions 101 website. Just make sure to tick the ECMAScript (JavaScript) FLAVOR in the left panel.
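The metacharacter point matters for URLs: the dot in "www." is itself a metacharacter meaning "any single character", so it has to be escaped to match only a literal dot. A small Python check with made-up strings:

```python
import re

# Unescaped: "." is a metacharacter that matches any single character.
print(bool(re.match(r"www.", "wwwx.example.com")))   # True
# Escaped: "\." is a literal dot, so only a real "www." prefix matches.
print(bool(re.match(r"www\.", "wwwx.example.com")))  # False
print(bool(re.match(r"www\.", "www.example.com")))   # True
```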
Then you can prepend https:// to each match.
If you add a new HTTP module via the add-application menu, you should see Resolve a Target URL among the options under that app.
Also, I thought you wanted the www removed from the URLs, so I think this regex would be slightly better; then just prepend https:// to the start, as @samliew suggests.
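The regex from that reply isn't reproduced here, but as an illustration (an assumed pattern, not the one posted), putting a non-capturing (?:www\.)? group before the capture group keeps "www." out of the match, and https:// is then prepended to the captured host. A Python sketch, with to_https as a hypothetical helper:

```python
import re
from typing import Optional

# Illustrative pattern: optional scheme and optional "www." (both
# non-capturing), then the host in group 1. Slashes are escaped so it can
# also be pasted into regex101's JavaScript flavor (use the i flag there).
HOST_NO_WWW = re.compile(r"^(?:https?:\/\/)?(?:www\.)?([^\/?#:\s]+)", re.IGNORECASE)


def to_https(url: str) -> Optional[str]:
    """Hypothetical helper: "https://" + host without www., or None if no match."""
    match = HOST_NO_WWW.match(url.strip())
    return f"https://{match.group(1)}" if match else None


print(to_https("https://www.example.com/about/history"))  # https://example.com
print(to_https("http://example.net/x"))                   # https://example.net
```

Note that this always produces https://, even for sites originally entered as http://.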