Extract links from websites with regex

Daniel_Muller · January 9, 2025, 1:20pm

What are you trying to achieve?

Hi Folks,
I am new here and am trying to build a scenario that extracts information from websites. I have already completed the advanced Make Academy course.
I used a regex to search for links. Example: www.example.com"/links"

Steps taken so far

I tried the following regex: (/(a-z]{3,}$)/gm (tested on regex101)
In regex101 it does exactly what I want: it finds links at the end of the string (URL) that have three or more letters.
But in my test scenario there is no output when I use the exact same regex. Without the $ it works, but it also returns things like the following: https:/"/www".linkedin.com"/company""/charonium"/
Any thoughts?

Thanks in advance
Daniel

Screenshots: scenario setup, module configuration, errors

Msquare_Automation · January 11, 2025, 8:22am

Hi @Daniel_Muller

You can use this module.

Regards,
Msquare Automation - Platinum Partner of Make
@Msquare_Automation

Kirill_Vodopianov · January 11, 2025, 9:29am

Hi @Daniel_Muller ,
I think the problem is your regex pattern; it looks malformed.
You can try to use this one:
https?://[^\s]+(?=/[a-zA-Z]{3,})
This avoids relying on $ to enforce end-of-line matching and adjusts the regex to match the full URL structure.
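
To see what the suggested pattern actually returns, here is a minimal sketch in Python's re module (not Make's Text parser, whose regex engine may differ in details). The pattern is copied from the reply above; the sample text is made up for illustration.

```python
import re

# Pattern quoted from the reply; the lookahead (?=...) is checked but not included in the match.
pattern = re.compile(r"https?://[^\s]+(?=/[a-zA-Z]{3,})")

# Hypothetical sample text, not taken from the thread.
text = (
    "See https://www.example.com/links and "
    "https://www.linkedin.com/company/charonium/ for details."
)

matches = pattern.findall(text)
print(matches)
# ['https://www.example.com', 'https://www.linkedin.com/company']
```

In this sketch the match is the part of the URL in front of the last path segment of three or more letters, not the segment itself. If the segment (for example /links) is what is needed, a capture group around that part is probably a better fit.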

Hope it helps.

Daniel_Muller · January 12, 2025, 8:46am

Thank you for your help.
Yes, in the meantime I found the problem: my regex works if the pattern is at the end of the line, as in regex101, but not if it appears somewhere in the middle of a text.
I will try your proposal. Thank you.
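
The behaviour described here can be reproduced outside Make. The sketch below uses Python's re module and assumes the intended pattern was a slash followed by three or more lowercase letters. The sample strings and the whitespace-lookahead variant are illustrative assumptions, not something from the thread.

```python
import re

# Assumed reading of the original pattern: "/" plus 3+ lowercase letters at the end of a line.
end_anchored = re.compile(r"/[a-z]{3,}$", re.MULTILINE)

# Possible alternative: accept the segment when it is followed by whitespace or the end of the text.
mid_text = re.compile(r"/[a-z]{3,}(?=\s|$)")

single_line = "www.example.com/links"                      # URL alone on its line
embedded = "Visit www.example.com/links for more details."  # URL inside a sentence

at_line_end = end_anchored.findall(single_line)   # ['/links']
inside_text = end_anchored.findall(embedded)      # [] because "$" only matches at a line end
with_lookahead = mid_text.findall(embedded)       # ['/links']

print(at_line_end, inside_text, with_lookahead)
```

The lookahead variant still skips segments that are followed by another slash or a dot, so parts such as /www or /company in a longer URL are not returned.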

Daniel_Muller · January 12, 2025, 8:47am

Thank you for the tip.
I will try it and let you know if it helped.
In the meantime, thank you.

Daniel_Muller · January 19, 2025, 10:54am

I got it working, perhaps in a slightly overcomplicated way, by using Gemini to extract the desired internal links such as /xyz.
I then use a "Set variable" module to combine the base URL with the extracted link.
Important at this stage: use the trim function to remove spaces while building the full link.
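
For readers working outside Make, the same combine-and-trim step might look like this in Python. This is only a sketch: the base URL and the extracted value are hypothetical, and urljoin from the standard library stands in for the "Set variable" step.

```python
from urllib.parse import urljoin

# Hypothetical inputs with the kind of stray whitespace an extraction step can leave behind.
base_url = "https://www.example.com "
extracted_link = " /links\n"

# Naive concatenation keeps the whitespace and produces a broken URL.
naive = base_url + extracted_link

# Trimming both parts first, then joining, gives a clean absolute URL.
full_link = urljoin(base_url.strip(), extracted_link.strip())

print(repr(naive))
print(full_link)  # https://www.example.com/links
```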

Thanks for your assistance

Cheers
Daniel

