Scrape Websites at No Cost with Jina: Get LLM-Friendly Input from a URL

Web scraping can be a complex and expensive process, but Jina AI simplifies it by allowing you to extract valuable content from websites using just a URL. In this guide, we’ll share how integrating Jina AI into our Make workflows revolutionized our content creation process, enabling us to generate highly relevant, SEO-optimized articles that engage our audience while keeping costs low.

So, what makes Jina AI’s Reader API a game-changer for extracting information from URLs cost-effectively?

Jina AI’s Reader API tackles the challenges of feeding web data into large language models (LLMs).
Scraping webpages and passing raw HTML to LLMs can be complex, unreliable, and expensive, because raw HTML carries a large volume of unwanted tokens such as markup, scripts, and navigation. The Reader API solves this by extracting only the core content from a URL and converting it into clean, LLM-friendly text. This not only ensures high-quality input for your AI systems but also reduces costs by minimizing the number of tokens processed.
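Outside Make, the same idea can be sketched in a few lines of Python: the Reader API is called by prefixing the page URL with `https://r.jina.ai/`, and the response body is the cleaned text. This is a minimal sketch using only the standard library; the function names are hypothetical, and the network call is kept in its own function so the URL construction can be checked offline.

```python
import urllib.request

READER_PREFIX = "https://r.jina.ai/"


def reader_url(target_url: str) -> str:
    """Build the Reader API URL by prefixing the page URL with r.jina.ai."""
    return READER_PREFIX + target_url


def fetch_clean_text(target_url: str, timeout: float = 30.0) -> str:
    """Fetch the LLM-friendly text for a page (performs a network request)."""
    with urllib.request.urlopen(reader_url(target_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `reader_url("https://example.com")` gives `https://r.jina.ai/https://example.com`, and `fetch_clean_text("https://example.com")` returns the page as clean text rather than raw HTML.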

Best of all, the Reader API is free to use, with generous rate limits. Its scalable infrastructure offers high availability, concurrency, and reliability, making it an ideal solution for cost-effective LLM grounding.

Note: While we have chosen to use Anthropic Claude in our workflow, you can certainly use other LLMs such as ChatGPT or any alternative of your choice. However, we have found that Claude can handle a larger number of input tokens than many other LLMs, which is particularly beneficial when working with lengthy scraped content. Additionally, even without detailed prompt guidance, Claude has consistently delivered better results in our case, generating high-quality, coherent articles that require minimal editing.

So, let’s break down the flow of our Make scenario step by step:

  1. Scraping content with Jina AI-Reader: The first step in our workflow is to use the Jina AI-Reader module to scrape and extract the core content from a specified URL. This module retrieves the essential information from the webpage, strips out unnecessary elements, and converts it into a clean, LLM-friendly format.

  2. Generating the outline with Anthropic Claude: Once we have the scraped content from Jina AI-Reader, we feed it into the first instance of the Anthropic Claude module. This module analyzes the content and generates a comprehensive outline or structure for the article. The outline serves as a blueprint for the final content piece, ensuring that it covers all the important points and maintains a logical flow.

Download the prompts we’ve used in this scenario

  3. Creating the article with Anthropic Claude: With the outline generated, we move on to the second instance of the Anthropic Claude module. This module receives both the outline and the scraped content from Jina AI-Reader. Using the outline as a guide and the scraped data as a source of information, this instance of Claude generates the full article. It expands on each point in the outline, incorporating relevant details from the scraped content to create a well-structured, informative piece.

  4. Generating the Google Doc: In the final step of our workflow, we pass the generated article to the Google Docs module.
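The data flow of these four steps can be sketched as plain Python. This is not the Make blueprint itself: `scrape`, `ask_llm`, and `save_document` are hypothetical placeholders you would back with the Reader API, your LLM of choice, and Google Docs (or a CMS), and the prompt strings are placeholders rather than the prompts used in our scenario. What the sketch shows is that the scraped content is used twice: once to build the outline, and again, alongside the outline, to write the article.

```python
from typing import Callable


def run_pipeline(
    url: str,
    scrape: Callable[[str], str],
    ask_llm: Callable[[str], str],
    save_document: Callable[[str, str], str],
    title: str = "Draft article",
) -> str:
    """Scrape -> outline -> article -> document; returns the saved document's id."""
    # Step 1: get clean, LLM-friendly text for the page (e.g. via r.jina.ai).
    content = scrape(url)

    # Step 2: the first LLM call turns the scraped content into an outline.
    outline = ask_llm(
        "Create a detailed article outline based on this content:\n\n" + content
    )

    # Step 3: the second LLM call receives both the outline and the scraped content.
    article = ask_llm(
        "Write the full article following this outline:\n\n"
        + outline
        + "\n\nUse this source material:\n\n"
        + content
    )

    # Step 4: store the result (Google Docs in our scenario).
    return save_document(title, article)
```

Passing the three steps in as functions keeps the flow testable with stubs and makes it easy to swap Claude for another model or Google Docs for a CMS, mirroring how the modules can be swapped in Make.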

Note: In this example, we have used the Google Docs module to store our final content piece. However, you can easily adapt this workflow to post the generated article directly to your favorite content management system (CMS) such as WordPress, Medium, or any other platform that integrates with Make.

In addition to the core functionality of extracting clean, LLM-friendly content from URLs, Jina AI’s Reader API offers several advanced features that provide even greater control over the content scraping process:

  1. Target Selector: If the default settings don’t capture the specific content you need, you can use the Target Selector feature to provide a CSS selector, so that only the matching part of the page is extracted.

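  2. Wait For Selector: Sometimes the desired content is not immediately available when the page loads, for example when it is rendered by JavaScript. In such cases, the Wait For Selector feature lets you provide a CSS selector, and the Reader API waits for that element to appear before extracting the content.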

  3. Output Format: Jina AI’s Reader API supports various output formats, giving you the freedom to choose the one that best fits your workflow. You can opt for HTML, plain text, Markdown, or even screenshots of the webpage.
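When calling the Reader API directly (for example from Make’s HTTP module), these options are passed as request headers. The sketch below assumes the header names `X-Target-Selector`, `X-Wait-For-Selector`, and `X-Return-Format`, with format values `markdown`, `html`, `text`, or `screenshot`; treat these as assumptions and check them against Jina’s current Reader documentation. The helper only builds the request and does not send it.

```python
import urllib.request
from typing import Optional

# Assumed header names; verify against the current Reader API docs.
TARGET_SELECTOR = "X-Target-Selector"
WAIT_FOR_SELECTOR = "X-Wait-For-Selector"
RETURN_FORMAT = "X-Return-Format"
ALLOWED_FORMATS = {"markdown", "html", "text", "screenshot"}


def build_reader_request(
    target_url: str,
    target_selector: Optional[str] = None,
    wait_for_selector: Optional[str] = None,
    return_format: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a Reader API request with optional extraction headers."""
    headers = {}
    if target_selector:
        # Only extract the part of the page matching this CSS selector.
        headers[TARGET_SELECTOR] = target_selector
    if wait_for_selector:
        # Wait until an element matching this CSS selector appears.
        headers[WAIT_FOR_SELECTOR] = wait_for_selector
    if return_format:
        if return_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {return_format}")
        headers[RETURN_FORMAT] = return_format
    return urllib.request.Request("https://r.jina.ai/" + target_url, headers=headers)
```

Note that `urllib` stores header names via `str.capitalize()`, so a lookup is `req.get_header("X-target-selector")`; HTTP header names are case-insensitive, so this makes no difference to the server.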

Furthermore, Jina AI offers a powerful search endpoint, s.jina.ai, which enables you to perform web searches and retrieve the top-5 results. This feature is particularly useful for search grounding, where you need to provide your LLM with relevant, up-to-date information from the web.
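A search request follows the same prefix pattern as the Reader: the query is URL-encoded and appended to `https://s.jina.ai/`. A minimal sketch using only the standard library (the function name is hypothetical):

```python
from urllib.parse import quote


def search_url(query: str) -> str:
    """Build an s.jina.ai search URL with the query percent-encoded as one path segment."""
    # safe="" also encodes "/" so it cannot be read as part of the URL path structure.
    return "https://s.jina.ai/" + quote(query, safe="")
```

For example, `search_url("When was Jina AI founded?")` produces `https://s.jina.ai/When%20was%20Jina%20AI%20founded%3F`.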

To begin using Jina AI’s Reader API in your Make workflows, head over to the official website and grab your API key. Each key comes loaded with 1 million free tokens, giving you plenty of room to explore and harness the power of the Reader API in your content creation journey.

If you’re seeking a more streamlined experience, check out our custom Jina AI app built specifically for Make. For a small fee, you’ll gain access to a seamless integration that simplifies incorporating Jina AI’s capabilities into your Make workflows.

For those comfortable with Make’s HTTP module, you can also call the Jina API directly using its endpoints: r.jina.ai for reading a URL and s.jina.ai for search.

When using the HTTP module to call the Jina API directly, you have the option to make requests with or without an API key. While using the API without a key is possible, providing an API key grants you access to higher rate limits.
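In the HTTP module, the key goes in an `Authorization` header as a bearer token. The sketch below shows the same logic in Python: the header is added only when a key is available, so one helper covers both keyless and keyed calls. Reading the key from a `JINA_API_KEY` environment variable is an assumption made for illustration, not something Jina or Make requires.

```python
import os
import urllib.request
from typing import Optional


def reader_request(target_url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a Reader API request, authenticating only if a key is available."""
    # Fall back to the (hypothetical) JINA_API_KEY environment variable.
    key = api_key if api_key is not None else os.environ.get("JINA_API_KEY")
    # Without a key the request is sent anonymously, under the lower rate limits.
    headers = {"Authorization": f"Bearer {key}"} if key else {}
    return urllib.request.Request("https://r.jina.ai/" + target_url, headers=headers)
```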

Grab the blueprint for this scenario and import it directly into your Make account.

blueprint.json (36.5 KB)

If you have any questions or need further assistance, feel free to reach out at bilalmansouri.com
