I am trying to parse text that contains emails, website URLs, etc. using the Text parser with this regex:
https?://(?!lh[1-6].|maps.|lh\d+.).?.[a-z]{2,}(/[a-zA-Z0-9-_/?&=%.])?
However, I am getting output like the below:
[
  {
    "i": 1,
    "$2": "/Place"
  },
  {
    "i": 2,
    "$1": "maps.",
    "$2": "/maps/api/staticmap"
  },
  {
    "i": 3,
    "$1": "maps.",
    "$2": "/maps/api/staticmap"
  },
  {
    "i": 4,
    "$1": "lh3.",
    "$2": "/"
  },
  {
    "i": 5,
    "$1": "lh4.",
    "$2": "/"
  }
]
It should be matching only valid URLs. Can you please tell me how to resolve this issue?
Steps taken so far
I tried changing the regular expression, but it doesn't help.
You need to use two Text parser modules, one for emails and one for URLs. When you do, make sure you select the Global match and Multiline options. Judging by your output, the regex you are using returns only its capture groups ($1 is the host prefix such as "maps." and $2 is the path), not the full URL, so you will need to try a different regular expression.
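To illustrate the two-pattern idea, here is a minimal sketch in Python rather than in the Text parser itself, so treat it as an approximation: the module's regex engine may behave differently. The sample text, the example.com hosts, and the names URL_RE, EMAIL_RE, and extract_urls are all made up for illustration. The URL pattern escapes the dots, uses a negative lookahead to skip hosts starting with "lh<digits>." or "maps.", and makes the path group non-capturing so the whole match is the URL.

```python
import re

# Hypothetical sample text; the hosts are placeholders, not taken from the post.
SAMPLE = """
Contact: someone@example.com
Site: https://www.example.com/Place?id=1
Map: https://maps.example.com/maps/api/staticmap
Photo: https://lh3.example.com/
"""

# URL pattern (an assumption, not the original poster's exact regex).
URL_RE = re.compile(
    r"https?://"
    r"(?!lh\d+\.|maps\.)"            # reject hosts that start with lh<digits>. or maps.
    r"[a-z0-9.-]+\.[a-z]{2,}"        # host name ending in a TLD of 2+ letters
    r"(?:/[a-zA-Z0-9\-_/?&=%.]*)?",  # optional path, non-capturing
    re.IGNORECASE,
)

# Deliberately simple email pattern, kept separate from the URL pattern.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_urls(text: str) -> list[str]:
    """Return every full URL match (the equivalent of a global match)."""
    return [m.group(0) for m in URL_RE.finditer(text)]


def extract_emails(text: str) -> list[str]:
    """Return every email address found in the text."""
    return EMAIL_RE.findall(text)


if __name__ == "__main__":
    print(extract_urls(SAMPLE))
    print(extract_emails(SAMPLE))
```

With the sample above, only the www.example.com URL is returned; the "maps." and "lh3." hosts are skipped by the lookahead, and the email is picked up by the second pattern.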
Another way of doing it is to use the Text parser's Match elements module: just select the predefined pattern and it will extract the matching items from the text.