Remove ID from URL

If I have multiple URLs that follow a consistent pattern but have a unique ID in the middle, what is the best way to extract it?

Let’s say the URLs look like this:

http://domain.com/project/inprogress/UNIQUE-ID/viewtasks

All I want is the UNIQUE-ID.

Thanks in advance.

Hi @jbuesking

@andyoneil mentioned the substring function. That should help you get the unique ID. Or you can use regex.

Cheers,
Gijs
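For anyone who wants to see the substring idea outside of Make, here is a rough Python sketch that slices the text between the markers "/inprogress/" and "/viewtasks". The helper name and the marker strings are assumptions taken from the example URL; this is plain Python string slicing, not Make's substring() function.

```python
def extract_id_substring(url: str) -> str:
    """Return the text between '/inprogress/' and '/viewtasks' (hypothetical helper)."""
    start_marker = "/inprogress/"
    end_marker = "/viewtasks"
    # str.index raises ValueError if a marker is missing, so a bad URL fails loudly.
    start = url.index(start_marker) + len(start_marker)  # first character of the ID
    end = url.index(end_marker, start)                   # first character after the ID
    return url[start:end]


url = "http://domain.com/project/inprogress/UNIQUE-ID/viewtasks"
print(extract_id_substring(url))  # UNIQUE-ID
```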

I suggest a regular expression for this. substring() works fine, but regular expressions give you more control and a lot more power. I use regex101.com to quickly test the pattern against the input string, then paste it into the Match Pattern module of the Text Parser tool.

Here’s the regular expression that will extract UNIQUE-ID

inprogress\/([\S]+)\/viewtasks

The key is the ([\S]+) pattern, which matches one or more (the +) non-whitespace characters (the \S) and puts them in a capture group (the parentheses) so they can be extracted.

The / characters in the URL are escaped as \/ so they work properly in the regex.

In my scenario, URLtoParse was set in a Set Variable module, but it could just as well come from an input bundle of another module.
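If you want to sanity-check the pattern outside regex101.com, this is a minimal Python sketch using the standard re module with the same pattern. The function name is hypothetical, and Make's regex engine may differ from Python's in small details, so treat this as an approximation of what the Match Pattern module does.

```python
import re
from typing import Optional

# Same pattern as in the post: capture one or more non-whitespace characters
# between "inprogress/" and "/viewtasks". Escaping "/" is allowed in Python regexes.
PATTERN = re.compile(r"inprogress\/([\S]+)\/viewtasks")


def extract_id_regex(url: str) -> Optional[str]:
    """Return capture group 1, or None when the URL does not match (hypothetical helper)."""
    match = PATTERN.search(url)
    return match.group(1) if match else None


print(extract_id_regex("http://domain.com/project/inprogress/UNIQUE-ID/viewtasks"))  # UNIQUE-ID
print(extract_id_regex("http://domain.com/project/done/UNIQUE-ID"))                   # None
```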

Exactly what I was looking for. Works perfectly. Thank you!

A regular expression is definitely more powerful, but in this case it will cost you additional operations.

As long as the URL structure always stays the same, I would prefer the substring function here, since the string is very simple and it saves operations.

But yeah, depending on your needs, a regular expression is a very good alternative.

I find that after using substring there are edge cases that force me to do more work, and expressions built with substring become harder to read and maintain. Regular expressions have their own maintenance headaches, but I think they are worth the extra operation simply because they make edge cases much easier to address, once you know regular expression syntax, of course.

To be honest, the mind-bending exercises with substring() and regexes end up being very similar in nature.

I agree about the mind-bending part of substring.

I attempted it that way first, and it had me thinking there must be a better way.

Regex is new to me, but Alex’s explanation gave me what I needed to play around with it and apply it to a couple of different URL structures I needed to handle the same way. I find it easier to grasp than substring.

For myself, in cases like this where I know the URL will be consistent, I like to use get(split(url;/);#), where # is the position of the segment I want.

I prefer this over regex, as it saves an operation depending on where you put it.
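A minimal Python analogue of get(split(url;/);#), assuming the example URL. One thing to watch: Python lists count from 0, while Make's get() counts from 1 as far as I know, and I'm not certain Make's split() keeps the empty segment between "http:" and "domain.com" the way Python does, so confirm the position inside Make rather than copying the index from here.

```python
def extract_id_split(url: str, index: int) -> str:
    """Hypothetical Python analogue of get(split(url;/);#), using 0-based indexing."""
    parts = url.split("/")
    # For the example URL, parts is:
    # ['http:', '', 'domain.com', 'project', 'inprogress', 'UNIQUE-ID', 'viewtasks']
    return parts[index]


print(extract_id_split("http://domain.com/project/inprogress/UNIQUE-ID/viewtasks", 5))  # UNIQUE-ID
```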

I have to agree with @luke.ifonly_solution on this one: that is the best way, and it costs zero operations :slightly_smiling_face:

That’s a terrific use of the get() array function to retrieve an element when you know its index won’t change.

But a word of warning: if for some reason extra cruft containing / is added to the URL, it may fail, and silently, in an unintended way. I always worry about edge cases :worried:.
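To make the warning concrete, here is a Python sketch with a hypothetical URL where an extra "/v2" segment has been added before the project path. The fixed-index split quietly returns the wrong segment, while the regex still finds the ID. Both helper functions are illustrative, not Make functions.

```python
import re


def extract_id_split(url: str, index: int) -> str:
    # Fixed-position lookup, like get(split(url;/);#), 0-based in Python.
    return url.split("/")[index]


def extract_id_regex(url: str):
    # Pattern anchored on the surrounding keywords instead of a position.
    match = re.search(r"inprogress\/([\S]+)\/viewtasks", url)
    return match.group(1) if match else None


# Hypothetical URL where an extra "/v2" segment appears before the project path.
url = "http://domain.com/v2/project/inprogress/UNIQUE-ID/viewtasks"
print(extract_id_split(url, 5))  # inprogress  (wrong segment, and no error is raised)
print(extract_id_regex(url))     # UNIQUE-ID
```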

I don’t have any specific concerns with this compared to regex. Both can have edge cases, and both can be worked around/with just as easily.

For get(split()) in this case, I think the edge cases would be:

  1. Services that would allow a / as part of the unique ID
  2. Services that would allow a / to be used in an earlier portion of the URL
  3. Malformed or oddly trimmed URLs

For 1 and 2, this would need to be considered before using this method.

But, a similar list could be made for regex. :smiley:

The only instance I have come across is Vimeo’s API, which allows / in filenames.

For the particular requirement, the regex I provided specifically looks for the group between the keywords and doesn’t care about a / inside the unique ID. Malformed URLs would similarly just return no data.
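A quick Python check of that claim, using the same pattern and a hypothetical ID that contains a "/": the greedy [\S]+ group captures the whole "team/UNIQUE-ID", the fixed-index split returns only "team", and a URL missing "/viewtasks" returns no match. The helper name and sample URLs are illustrative assumptions.

```python
import re

PATTERN = re.compile(r"inprogress\/([\S]+)\/viewtasks")


def extract_id_regex(url: str):
    match = PATTERN.search(url)
    return match.group(1) if match else None


# Hypothetical ID that itself contains a "/".
url = "http://domain.com/project/inprogress/team/UNIQUE-ID/viewtasks"
print(extract_id_regex(url))  # team/UNIQUE-ID  ([\S]+ also matches "/")
print(url.split("/")[5])      # team  (a fixed-index split only gets the first half)

# A malformed URL without the "/viewtasks" suffix simply produces no match.
print(extract_id_regex("http://domain.com/project/inprogress/UNIQUE-ID"))  # None
```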

Regex is at the same time more forgiving and more explicit than substring parsing. I would wager that none of your 3 issues would produce an incorrect unique ID with a regex. And yes, in general you could make a list of edge cases for any regex, but that’s the point: you can address the edge cases with a regex, whereas with substring it becomes very complex, with nested function calls.

Substring is a simple parser, whereas regex is a pattern matcher, a way to process grammars: two very different things.

That link summarizes my preference for regular expressions well. Regular expressions are:

  • Safe from bugs. Grammars and regular expressions are declarative specifications for strings and streams, which can be used directly by libraries and tools. These specifications are often simpler, more direct, and less likely to be buggy than parsing code written by hand.
  • Easy to understand. A grammar captures the shape of a sequence in a form that is easier to understand than hand-written parsing code. Regular expressions, alas, are often not easy to understand, because they are a one-line reduced form of what might have been a more understandable regular grammar.
  • Ready for change. A grammar can be easily edited, but regular expressions, unfortunately, are much harder to change, because a complex regular expression is cryptic and hard to understand.

Yup yup. The last two points also address some of the potential downsides of that complexity. I’ve seen clients use a lot of copy/paste regex formulas that work but, because they don’t fully understand the code, produce their own edge cases.

If we could use regex more flexibly as a function rather than as a module, it would be much more powerful, though perhaps limited in its use of capture groups, etc.

To each their own, but where the URLs are consistent, and operational efficiency matters, I’ll go for get(split()) :slight_smile: I’ve had zero issues with this everywhere it’s in use.

Great discussion everyone! Thanks to all for input.

I’m not heavy on operation use currently. As a small business just automating some of my basic recurring tasks, increasing accuracy by limiting user input errors is my first priority.

For now, using the Text Parser with regex seems to be the best route for me to keep things simple.

When I get further down the road and if operations are getting out of hand, I will then look at ways to increase efficiency in my scenarios.

All I’m doing is getting Pipedrive and Xero talking to each other, so I’m not entering data twice and I’m saving office staff time. That’s worth tens of thousands of dollars a year to me.

@jbuesking If you would like to understand regex a bit more, I made a small post about the basics of it here:

Thank you! I have bookmarked it.

You can have the best of both worlds (regex and fewer operations) by using replace().

replace(URL;/.+\/inprogress\/(.+?)\/.*/;$1)

Or, to allow “/” in the unique ID:

replace(URL;/.+\/inprogress\/(.+?)\/viewtasks.*/;$1)
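Python's re.sub() behaves much like this use of replace(), so here is a sketch of both patterns against a hypothetical URL whose ID contains a "/". Python writes the backreference as \1 where the Make formula uses $1. The first (lazy) pattern stops at the first "/" after inprogress, while the second keeps everything up to /viewtasks. If nothing matches, re.sub() returns the input unchanged, so a non-matching URL comes back whole rather than empty; I'd assume Make's replace() behaves the same, but that is worth verifying in a scenario.

```python
import re

# Jim's two patterns, translated to Python: \1 plays the role of Make's $1.
LAZY = r".+\/inprogress\/(.+?)\/.*"
ANCHORED = r".+\/inprogress\/(.+?)\/viewtasks.*"

# Hypothetical URL whose ID contains a "/".
url = "http://domain.com/project/inprogress/team/UNIQUE-ID/viewtasks"

first = re.sub(LAZY, r"\1", url)       # lazy group stops at the first "/" after inprogress/
second = re.sub(ANCHORED, r"\1", url)  # group extends until "/viewtasks"
print(first)   # team
print(second)  # team/UNIQUE-ID

# When the pattern does not match, re.sub returns the input string unchanged.
print(re.sub(LAZY, r"\1", "http://domain.com/other"))  # http://domain.com/other
```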

That’s a great one, @JimTheMondayMan.
For those interested, Jim also posted a nice topic about this here: How to use Regex in Make? - #4 by JimTheMondayMan