Data scraping: extract specific text from the HTTP module

What are you trying to achieve?

Hi Everyone - I’d like to extract only the skills from an HTTP request, so that the output in my Google Sheet contains only: GMP, Analysis skills, LDAP, Organizational skills, Communication skills. How can I achieve that?

Screenshots: scenario setup, module configuration, errors

blueprint.json (35.7 KB)

Hi,
Basically you need to create 2 variables:

  1. startIndex: the index of “Skills:” + 7 (this gives you the start of your desired string) - indexOf(; 0)
  2. endIndex: the end of your desired string - indexOf(; “”; startIndex)

Then use a substring function to extract the text between those two indexes, and trim to clean up the result.
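
To make the two-index idea concrete, here is a minimal sketch of the same logic in TypeScript rather than in Make's own formula syntax. The sample text is invented for illustration (the real HTTP response is not shown in this thread), and the assumption that the skills list ends at the next line break is mine; swap in whatever marker actually follows the skills in your data. The names sampleBody and extractSkillsByIndex are hypothetical.

```typescript
// Hypothetical sample; the real response body is not shown in the thread.
const sampleBody: string =
  "Title: QA Analyst\n" +
  "Skills: GMP, Analysis skills, LDAP, Organizational skills, Communication skills\n" +
  "Location: Budapest";

// Extracts the text between `label` and the next `endMarker`.
// Assumption: the skills list ends at a line break ("\n").
function extractSkillsByIndex(
  text: string,
  label: string = "Skills:",
  endMarker: string = "\n"
): string {
  const labelAt = text.indexOf(label);
  if (labelAt === -1) {
    return ""; // label not present, nothing to extract
  }
  // "Skills:" is 7 characters long, hence the "+ 7" in the steps above.
  const startIndex = labelAt + label.length;
  const found = text.indexOf(endMarker, startIndex);
  // If no end marker follows, take everything up to the end of the text.
  const endIndex = found === -1 ? text.length : found;
  return text.substring(startIndex, endIndex).trim();
}

const skillsByIndex: string = extractSkillsByIndex(sampleBody);
console.log(skillsByIndex);
```

Using the label's length instead of a hard-coded 7 keeps the offset correct if the label text ever changes.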

Thanks for your prompt reply. May I ask for your help with a blueprint, so I can see what it should look like?

Thanks in advance Kirill.

Hi @Attila_Herter and welcome to the Make Community!

If the string “Skills:” appears only once in your data (and you are sure of that), you can use a regular expression to extract the information.

If you don’t know how to write that:

  • Give your sample text to ChatGPT (or Gemini, or Claude, or Mistral, or…) and ask it for an ECMAScript-compatible regular expression to extract the skills. In your request, make sure to tell it exactly what output you expect from the sample text you gave it.
  • Then go to https://regex101.com/ to test the regexp. If you’re not getting the result you want, tell ChatGPT what’s happening and ask it to fix the regexp. (Make sure you select ECMAScript in Regex101’s tool.)
  • Once it works, use the Text Parser module and the Match pattern action. Put your regexp in the field for the regular expression, and you should be able to extract the skills as you want. The Text Parser would replace the Tools module you are using.
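
As a rough illustration of what such a regular expression could look like, here is a small TypeScript sketch. It assumes “Skills:” appears once and that the list runs to the end of that line; the sample text is made up, since the actual response is not shown in this thread, and the names sampleResponse, skillsPattern and extractSkillsByRegex are hypothetical. Treat the pattern as a starting point to test in Regex101, not as a confirmed fit for your data.

```typescript
// Hypothetical sample; replace with your real HTTP response text.
const sampleResponse: string =
  "Title: QA Analyst\n" +
  "Skills: GMP, Analysis skills, LDAP, Organizational skills, Communication skills\n" +
  "Location: Budapest";

// Match "Skills:", skip any whitespace, then capture everything
// up to (but not including) the next line break.
const skillsPattern: RegExp = /Skills:\s*([^\r\n]+)/;

function extractSkillsByRegex(text: string): string {
  const match = skillsPattern.exec(text);
  // Capture group 1 holds the skills list; return "" if nothing matched.
  return match?.[1]?.trim() ?? "";
}

const skillsByRegex: string = extractSkillsByRegex(sampleResponse);
console.log(skillsByRegex);
```

In the Text Parser you would paste only the part between the slashes, and the value you want is capture group 1, which I would expect to come out as the first numbered output of the module.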

If you want someone to help you with your blueprint, export the one you have and post it here so someone can download it, make the necessary fixes, and re-upload it.

L
