Formatting URLs for Scraptio and checking if it is a valid email

Hi Everyone

I am looking to do 2 things:
A. Format a URL I am getting from HubSpot for use with Scraptio, to ensure it is correctly formatted.

That mainly involves:

  1. Ensuring that it starts with “http://” or “https://”, and adding it if not.
  2. Removing any “www.” prefix if present.
  3. Removing anything after the top-level domain. For example, in “.com/about/history” I need to remove everything after “.com”, i.e. “/about/history”.

B. Then I am thinking of using the HTTP GET function with an empty body to check whether I get a 200 response.

Examples of formatting

I have not found a simple solution without using data stores, multiple replace functions or starting to dabble in regex.

Has anyone had any success doing this easily without too many workarounds?

In terms of extracting the URL, a regex is about the simplest way, afaik.
You can workshop it at regex101.com if you need to work it out: simply add a few different URLs into the test box and your regex at the top. Make sure you select the JavaScript option on the left side, though.

You don’t need to do a GET HTTP call per se to test the URL; there’s a control under the HTTP controls that lets you simply resolve a URL.
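
If you ever want to run the same kind of check outside Make, here is a minimal Python sketch of the "does a GET come back with 200" idea. It is not how the Make HTTP module works internally; the function name returns_200 and the 5-second timeout are just assumptions for illustration.

```python
import http.client
import urllib.error
import urllib.request


def returns_200(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` ends with HTTP status 200."""
    try:
        # urlopen follows redirects, so this reports the final response status.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, ValueError, OSError):
        # URLError covers DNS failures, refused connections and HTTP error
        # codes (HTTPError is a subclass); ValueError covers URLs without a
        # usable scheme; OSError covers timeouts.
        return False
```

Note that some sites answer scripts with 403 or similar even though they load fine in a browser, so a False here does not always mean the URL is wrong.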

Cheers

Simon

You can use a Text Parser “Match Pattern” module with this regular expression pattern, to match the hostname part of the URLs.

^(?:\w+:\/\/)?(?:www.)?([^/]+)

Regex test: https://regex101.com/r/RdVAmm/

For more information, see Text Parser in the Make Help Center:

Match Pattern
The Match pattern module enables you to find and extract string elements matching a search pattern from a given text. The search pattern is a regular expression (aka regex or regexp), which is a sequence of characters in which each character is either a metacharacter, having a special meaning, or a regular character that has a literal meaning.


Then you can prepend https:// in front of each match.

Hope this helps!

Hi Simon

Thanks for the answer!

What do you mean by ‘there’s a control under the HTTP controls’ that lets me resolve a URL?

Hi Sam

Thank you so much, exactly what I needed!

If you create a new HTTP component via the add application menu, you should see that one of the options under that app is Resolve a Target URL.

Also, I thought you wanted the www removed from the URLs, so I think this regex would be slightly better. Then just prepend the https:// to the start, as @samliew suggests:

^(?:https?:\/\/)?(?:www\.)?(?<url>[^\/\n]+).*
https://$<url>
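
For anyone who wants to sanity-check the pattern outside Make, here is a rough Python sketch of the same match-and-replace. The names HOST_PATTERN and normalize_url are made up for the example, and the named group is written as (?P<url>...) because Python's re module does not accept the JavaScript-style (?<url>...) used above.

```python
import re

# Same pattern as above, adapted to Python's named-group syntax.
# The forward slashes need no escaping in a Python raw string.
HOST_PATTERN = re.compile(r"^(?:https?://)?(?:www\.)?(?P<url>[^/\n]+).*")


def normalize_url(raw: str) -> str:
    """Reduce a URL to https://<host>, dropping a www. prefix and any path."""
    # \g<url> is Python's spelling of the $<url> replacement token.
    return HOST_PATTERN.sub(r"https://\g<url>", raw.strip(), count=1)


print(normalize_url("http://www.example.com/about/history"))
print(normalize_url("example.com"))
```

Both calls should give https://example.com. Like the original pattern, this only cuts at the first "/", so a query string attached directly to the host (for example "example.com?x=1") is kept, and an uppercase scheme such as "HTTP://" is not recognised.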

Cheers

Simon
