Extract links from websites with regex

What are you trying to achieve?

Hi Folks,
I am new here and am trying to build a scenario that extracts information from websites. I have already completed the advanced Make Academy course.
I used regex to search for links. Example: www.example.com"/links"

Steps taken so far

I tried the following regex: (/(a-z]{3,}$)/gm https://regex101.com/r/jTskAe/1
In regex101 it does exactly what I want: it finds the link path at the end of the string (URL) with at least 3 letters.
But in my test scenario there is no output when I use the exact same regex. Without the $ it works, but it also returns things like the following: https:/“/www”.linkedin.com"/company"“/charonium”/
Any thoughts?

Thanks in advance
Daniel

Hi @Daniel_Muller

you can use this module.

Regards,
Msquare Automation - Platinum Partner of Make
@Msquare_Automation

Hi @Daniel_Muller ,
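I think the problem is your regex pattern; it looks malformed.
You can try this one instead:
https?://[^\s]+(?=/[a-zA-Z]{3,})
This avoids relying on $ to enforce end-of-line matching and adjusts the regex to follow the full URL structure. Note that the lookahead only checks for a following path segment of at least 3 letters; that segment is not part of the match itself.

Hope it helps.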
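
To see what the suggested pattern actually returns, here is a minimal sketch in Python. It assumes Python's `re` module behaves like the regex engine in the Make module for this pattern, which may not hold exactly. The sample text and the `WITH_GROUP` variant are illustrative assumptions, not part of the original suggestion.

```python
import re

# The pattern suggested above, compiled with Python's `re` for illustration only.
SUGGESTED = re.compile(r"https?://[^\s]+(?=/[a-zA-Z]{3,})")

# Hypothetical variant: a capture group instead of a lookahead,
# so the last path segment itself is available as group 1.
WITH_GROUP = re.compile(r"https?://[^\s]+(/[a-zA-Z]{3,})")

# Made-up sample text with the URL embedded in a sentence.
sample = "Visit https://www.linkedin.com/company/charonium/ for details."

prefix = SUGGESTED.search(sample).group(0)
segment = WITH_GROUP.search(sample).group(1)

print(prefix)   # https://www.linkedin.com/company
print(segment)  # /charonium
```

With the lookahead, the match is the URL up to, but not including, the last letter-only path segment. If the segment itself is what you need, a capture group like the one in `WITH_GROUP` may be closer to the goal.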

Thank you for your help.
Yes, in the meantime I found the problem. My regex works if the pattern is at the end of the line, as in the regex101 example, but not if it appears somewhere in the middle of a text.
I will try your proposal, thank you.
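
The end-of-line behaviour described above can be reproduced with a short Python sketch. It assumes the intended pattern was `/[a-z]{3,}$` with the multiline flag, and that Python's `re` is close enough to the engine Make uses. `IN_TEXT` is a hypothetical alternative and the sample strings are made up for illustration.

```python
import re

# Assumed reading of the original pattern: a path segment of at least
# three lowercase letters that must sit at the end of a line.
ANCHORED = re.compile(r"/[a-z]{3,}$", re.MULTILINE)

# Hypothetical alternative: the segment may be followed by an optional
# slash and then whitespace or the end of the input.
IN_TEXT = re.compile(r"/[a-z]{3,}(?=/?(?:\s|$))")

alone = "www.example.com/links"
embedded = "See https://www.linkedin.com/company/charonium/ for more."

print(ANCHORED.findall(alone))     # ['/links']
print(ANCHORED.findall(embedded))  # []
print(IN_TEXT.findall(embedded))   # ['/charonium']
```

The anchored pattern finds nothing in `embedded` because `$` only matches at the end of a line, and the URL is followed by more text. The lookahead version is only a sketch: it would still miss a URL followed directly by punctuation such as a full stop or a closing quote.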

Thank you for the tip.
I will try it and let you know if it helped.
In the meantime, thank you.