How to search/filter through a massive amount of data for one resource

What are you trying to achieve?

I’m trying to build a system where I can pull info from an API, search through it for specific data points, and then run the selected data through ChatGPT to analyse it: basically a search engine over a massive amount of data from an API. Let’s say I have info on 1,000 companies. So far I’ve managed to pull the info into one long text file (which takes a while to load when opening), then parsed it into 1,000 separate files (which takes even longer), but I don’t know how to put such a large data set through ChatGPT. How can I filter or search through the data so I only access the relevant data, and pass only *that* data into a ChatGPT prompt? I’ve been practising by trying to look up the company Apple, but I don’t really know how to approach this. Eventually I want info on every company referenced by the API, which is pretty much the entire stock market, so I figured 1,000 companies is a good place to start. I know it’s possible to do this quickly, because Google exists. Which modules should I use, and in what order? And how do I limit loading times? Help please!

Steps taken so far

I pulled the 1,000 companies’ data from the API and parsed it into 1,000 files, but ran into serious problems trying to do anything beyond that. I’ve messed around with aggregators and iterators trying to address a different problem that I won’t go into here, but I thought I’d take a step back and ask whether there are better ways to approach this that I’m not seeing.
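For reference, the pull-and-parse pipeline described above can be sketched roughly like this. Everything here is hypothetical (the record format, file names, and separator are made up, since the real API output will look different); the point is that writing a small name-to-file index once means a later lookup reads only one small file instead of reloading the whole dump, which is where the long loading times come from:

```python
import json
from pathlib import Path

# Hypothetical: one big text dump where each company record is separated
# by a blank line and starts with "Company: <name>".
raw = """Company: Apple
Sector: Technology

Company: Ford
Sector: Automotive"""

out_dir = Path("companies")
out_dir.mkdir(exist_ok=True)

index = {}  # lowercase company name -> file path
for record in raw.strip().split("\n\n"):
    name = record.splitlines()[0].removeprefix("Company: ").strip()
    path = out_dir / f"{name}.txt"
    path.write_text(record)       # one small file per company
    index[name.lower()] = str(path)

# Save the index once; finding "apple" later is a dict lookup plus one
# small file read, instead of opening the entire dump.
Path("index.json").write_text(json.dumps(index))
print(index["apple"])
```

In Make terms, the same idea applies: store the records somewhere keyed by company name (a data store, for instance) so a scenario can fetch one record directly rather than iterating over all of them.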

Google stores its scraped data in an indexed database.

OpenAI GPTs cannot read from databases directly, and they have a limited context window, so you need to be selective about what you pass to them.
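As a rough illustration of what “being selective” means in practice (a minimal sketch, not Make-specific; the mini-dataset and search term are made up), you search your stored data for the query first, then build the prompt from only the matching record, instead of sending all 1,000 companies:

```python
# Hypothetical mini-dataset standing in for 1,000 parsed company files.
records = {
    "Apple": "Apple Inc. Sector: Technology. Ticker: AAPL.",
    "Ford": "Ford Motor Co. Sector: Automotive. Ticker: F.",
}

def retrieve(query: str) -> str:
    """Return only the records whose name or body mentions the query."""
    q = query.lower()
    hits = [text for name, text in records.items()
            if q in name.lower() or q in text.lower()]
    return "\n\n".join(hits)

# Only the relevant slice goes into the prompt, keeping it well inside
# the model's context window.
context = retrieve("apple")
prompt = f"Using only the data below, analyse the company.\n\n{context}"
print(prompt)
# The prompt would then be sent to the chat model (e.g. via Make's
# OpenAI module or the API); the call itself is omitted here.
```

The same filter-then-prompt shape works in a Make scenario: a search or filter step narrows the data down first, and only its output is mapped into the ChatGPT module.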

So you basically need to “visit” the site yourself to get the content. This is called web scraping.

Web Scraping

For web scraping, a service you can use is ScrapeNinja to get content from the page.

ScrapeNinja lets you use jQuery-like selectors to extract content from elements via an extractor function. It can also run the page in a real web browser, loading all the content and running the page-load scripts so it closely simulates what you see, as opposed to just the raw page HTML fetched by the HTTP module.

If you want an example, take a look at Grab data from page and url - #5 by samliew

AI-powered “easier” method

You can also use AI-powered web scraping tools like Dumpling AI.

This is probably the easiest and quickest way to set up, because all you need to do is describe the content you want, instead of inspecting elements to create selectors or coming up with regular-expression patterns.

The plus side is that such services combine BOTH fetching and extracting of the data in a single module (saving operations) and do away with the lengthy setup of the other methods.

For more information on the different methods of web scraping, see Overview of Different Web Scraping Techniques in Make 🌐

Hope this helps! Let me know if there are any further questions or issues.

@samliew

P.S.: Investing some effort into the Make Academy will save you lots of time and frustration using Make.

I’ve just looked into Dumpling AI

I’ve inserted the API key, but as far as I can tell, Dumpling seems to work the same as an HTTP API request. How do I automate it so I can, say, type a company’s name (say… Apple) into a Google document, have that variable mapped (I’ve already sorted this) somewhere into the Dumpling AI module, have Dumpling search through a given API’s data specifically for information relating to the variable (Apple), and then send that info off to ChatGPT? I’ve looked for videos detailing how to do it but I can’t find any! Any advice on how to do such a thing would be invaluable; I’ve been trying to do this for hours on end!