Link filtering in my scraper workflow

Right now I’m stuck at the link filtering step in my scenario:

  • I’m scraping websites for leads, extracting all <a> elements, and then using a filter to only keep “good links” (like /about, /team, /karriere, /leistungen, etc.).

  • The issue: my filter is blocking everything. Even when the links clearly contain those keywords, nothing passes through.

  • I tried combining an Include regex (to allow only relevant paths) with an Exclude regex (to filter out assets like .png, .css, mailto: etc.), but no links make it past the filter.

  • Because of that, my Array Aggregator right after “Links extrahieren” receives no bundles and stays empty.

  • So ChatGPT later in the flow only gets the person’s name/company but no real website text context.

Basically: the regex filter setup is the bottleneck — it’s blocking all links instead of letting through the relevant ones.

Could you help me adjust the regex/filter logic so that only the useful company subpages (About, Team, Services, Jobs, etc.) pass through, while skipping logos, assets, or legal pages?


Hi,

Did you test your pattern on regex101 or some other tool? If it works there (using the ECMAScript flavor), then it could be unsupported in the Make regexp engine, or there are subtleties you need to tweak. I know in the past I’ve had to tweak complex regexps to get them to work. It might also be useful to put your regexps in consecutive modules instead of all in the same module, just to make sure you can isolate which one is working and which one is breaking the process.
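To show what I mean by isolating the two patterns, here is a minimal sketch outside Make. The patterns and sample URLs are my own assumptions built from the keywords in your post (your actual regex was only in the screenshot), so treat them as placeholders. It evaluates the include and exclude checks separately and reports which one rejects each link:

```typescript
// Hypothetical patterns based on the keywords mentioned in the question.
// Include: a path segment that is one of the "good" keywords.
const includePattern = /\/(about|team|karriere|leistungen)(\/|\?|#|$)/i;
// Exclude: asset extensions and mailto: links.
const excludePattern = /(\.(png|css)(\?|#|$)|^mailto:)/i;

type Verdict = {
  url: string;
  includeOk: boolean; // true when the include pattern matches
  excludeOk: boolean; // true when the exclude pattern does NOT match
  kept: boolean;      // true only when both checks agree
};

function checkLink(url: string): Verdict {
  const includeOk = includePattern.test(url);
  const excludeOk = !excludePattern.test(url);
  return { url, includeOk, excludeOk, kept: includeOk && excludeOk };
}

// Made-up sample links, not taken from a real site.
const samples: string[] = [
  "https://example.com/about",
  "https://example.com/team/",
  "https://example.com/assets/logo.png",
  "mailto:info@example.com",
  "https://example.com/impressum",
];

for (const v of samples.map(checkLink)) {
  const status = v.kept ? "PASS" : "FAIL";
  console.log(`${status}  include=${v.includeOk}  exclude=${v.excludeOk}  ${v.url}`);
}
```

If every line shows `include=false`, the include pattern is the culprit; if `exclude=false` shows up on links you expected to keep, the exclude pattern is too greedy. The same split works in Make by putting each condition on its own filter.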

L

I tried this regex, and many others… still nothing goes through


Can you paste the regexp between code tags?

That will make it much easier to work with. If you can also include a few URLs (or attach the HTML output that you would be working with, provided it's not confidential), it'll be easier for others to test and find a solution.

L

@Azariel Regex patterns are case sensitive by default, so always make sure you pair the pattern with the right filter operator. If the pattern was generated by AI, try it with both the “matches pattern” and “does not match pattern” filters, because prompting errors can sometimes cause issues. Either one of them will work, or you need to create a new pattern.
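A quick sketch of the case-sensitivity point, in plain TypeScript rather than Make. The function name, keyword list, and URLs are made up for illustration; the `i` flag here plays the role that I assume the case-insensitive operator plays in a Make filter:

```typescript
// Hypothetical helper: tests a URL against a keyword pattern,
// optionally ignoring case (the "i" flag).
function matchesKeyword(url: string, ignoreCase: boolean): boolean {
  const pattern = new RegExp("/(about|team|karriere|leistungen)", ignoreCase ? "i" : "");
  return pattern.test(url);
}

// A capitalised path, as German sites often use.
const capitalised = "https://example.com/Karriere";

console.log(matchesKeyword(capitalised, false)); // false: "K" does not match "k"
console.log(matchesKeyword(capitalised, true));  // true: case is ignored
```

So a pattern that looks right can still reject every link if the site capitalises its paths and the filter is case sensitive.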

Use this prompt:

Make.com regex helper:
I need ONE rule using Text operators → Matches pattern (always mention case insensitive or sensitive).
Target field: .
Goal: or .
Tokens: <list them, e.g., png|jpeg|pdf>.
Constraints:

  • Use ECMAScript/JavaScript regex flavor.
  • Return only the pattern text. No /…/ delimiters and no flags like i m s g. Case choice is handled by the operator I pick.
  • Match the whole value with ^ and $.
  • Add a 6-line pass/fail table to sanity check.

Example input lines with sample text input and expected output:
<put 3 that should pass>
<put 3 that should fail>
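One thing to watch with the `^` and `$` constraint: the pattern then has to describe the entire value, so a path-only pattern will reject full URLs. Below is a rough sketch of the 6-line pass/fail table as a script. Both patterns and all sample URLs are assumptions for illustration, not a pattern confirmed to work in Make:

```typescript
// Hypothetical whole-value pattern: scheme, host, one keyword segment, optional slash.
const wholeValuePattern = new RegExp(
  "^https?://[^/]+/(about|team|karriere|leistungen)/?$"
);
// A path-only pattern with the same anchors, for comparison.
const pathOnlyPattern = new RegExp("^/(about|team|karriere|leistungen)/?$");

// Three made-up inputs that should pass and three that should fail.
const cases: Array<[string, boolean]> = [
  ["https://example.com/about", true],
  ["https://example.com/team/", true],
  ["http://example.com/karriere", true],
  ["https://example.com/assets/logo.png", false],
  ["mailto:info@example.com", false],
  ["https://example.com/about/history", false],
];

// Prints one line per case and returns whether each case behaved as expected.
function runTable(pattern: RegExp, rows: Array<[string, boolean]>): boolean[] {
  return rows.map(([url, expected]) => {
    const actual = pattern.test(url);
    const mark = actual === expected ? "ok " : "BAD";
    console.log(`${mark} expected=${expected} actual=${actual} ${url}`);
    return actual === expected;
  });
}

runTable(wholeValuePattern, cases);

// The anchored path-only pattern never matches an absolute URL.
console.log(pathOnlyPattern.test("https://example.com/about")); // false
console.log(pathOnlyPattern.test("/about"));                    // true
```

If the extracted links are absolute URLs, an anchored path-only pattern would block everything, which is worth ruling out here.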

Best,
@Prem_Patel
