How to clean up the HTML returned by an HTTP GET request

Hello team,
My scenario is a simple one:

  1. Get URL from Spreadsheet
  2. Make an HTTP GET request
  3. Paste the returned HTML back into the spreadsheet (another cell)

The problem is that the HTML responses seem to be over 50,000 characters long, and Google Sheets doesn’t allow that much text in a single cell. Any ideas or tips for cleaning up the HTML first?
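One way to shrink the response before it reaches the cell is to keep only the visible text and cut it at the 50,000-character limit. Below is a minimal Python sketch of that idea, assuming the HTML is already available as a string. The names `TextExtractor`, `html_to_cell_text` and `CELL_LIMIT` are made up for illustration and are not Make modules.

```python
from html.parser import HTMLParser

CELL_LIMIT = 50_000  # per-cell character limit mentioned above


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script>/<style>/<noscript> contents."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_cell_text(html, limit=CELL_LIMIT):
    """Return whitespace-normalized visible text, truncated to `limit` characters."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())
    return text[:limit]
```

Truncating is a blunt fallback. If the details you need sit near the end of a long page, extracting them first is safer than cutting the text.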

ScrapeNinja has a JavaScript field that helps clean up the data a bit, but I think their services are paid.

It depends on what you want your output to be.

What do you mean by clean? Do you want to use a Match Pattern module to extract specific details from the HTML code?
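To show what pattern matching on the page text could look like, here is a rough Python sketch that pulls e-mail addresses and phone-like numbers out of a string. The two regular expressions are simplified assumptions, not the patterns a Match Pattern module ships with. They will miss some valid formats and may match some unrelated digit runs.

```python
import re

# Simplified, illustrative patterns; real-world data will need tuning.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def extract_contacts(text):
    """Return de-duplicated, sorted e-mail addresses and phone-like strings."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    phones = sorted(set(match.strip() for match in PHONE_RE.findall(text)))
    return {"emails": emails, "phones": phones}
```

The result is small enough to write into separate spreadsheet cells, which avoids the cell length problem altogether.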

Let me know if there are any further questions or issues.

You can also join us in the Make Fans Discord server to chat with other makers!

I’m looking to extract addresses, e-mails and phone numbers from each business. Is there an easier way around this?

The most reliable way is probably to use the OpenAI “Transform Text to Structured Data” module, or the free Groq “Create a JSON Chat Completion” module.

Example

Here is a Groq example of how to specify the variables that you want to match from the text content. This setup is also similar to what you’d do with the OpenAI Structured Data module.

Output

(Google’s about page didn’t contain any Google email or phone numbers)
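For readers who want to see the idea outside Make, the sketch below shows one way to describe the wanted fields and to read back a JSON reply. It is only an illustration under assumptions: the field names, descriptions, and the `build_prompt` and `parse_reply` helpers are hypothetical, and no real OpenAI or Groq API call is made here.

```python
import json

# Hypothetical field list; names and descriptions are illustrative only.
FIELDS = {
    "address": "Postal address of the business",
    "email": "Contact e-mail address",
    "phone": "Contact phone number",
}


def build_prompt(page_text):
    """Build an instruction asking a model to reply with a JSON object only."""
    lines = [f'- "{name}": {desc}' for name, desc in FIELDS.items()]
    return (
        "Extract the following fields from the text and reply with a JSON "
        "object only. Use null for any field that is not present.\n"
        + "\n".join(lines)
        + "\n\nText:\n"
        + page_text
    )


def parse_reply(reply):
    """Parse the model's JSON reply, keeping only the expected fields."""
    data = json.loads(reply)
    # Missing fields become None, so every row has the same columns.
    return {name: data.get(name) for name in FIELDS}
```

Keeping a fixed set of keys means each business maps to the same spreadsheet columns, even when a page lists no e-mail address or phone number.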

Hope this helps! Let me know if there are any further questions or issues.
