Crawl Cloudflare-protected websites | Apify does not fully work

Hi all,

I tried to crawl a website with the HTTP “Make a request” (GET) module but received the following error:

InvalidConfigurationError
Error: 403 Forbidden
<!DOCTYPE html><html lang="en-US"><head><title>Just a moment...</title><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta name="robots" content="noindex,nofollow"><meta name="viewport" content="width=device-width,initial-scale=1"><style>*{box-sizing:border-box;margin:0;padding:0}html{line-height:1.15;-webkit-text-size-adjust:100%;color:#313131;font-family:system-ui,-apple-system,BlinkMacSystemFont,Segoe UI,Roboto,Helvetica Neue,Arial,Noto Sans,sans-serif,Apple Color Emoji,Segoe UI Emoji,Segoe UI Symbol,Noto Color Emoji}body{display:flex;flex-direction:column;height:100vh;min-height:100vh}.main-content{margin:8rem auto;max-width:60rem;padding-left:1.5rem}@media (width <= 720px){.main-content{margin-top:4rem}}.h2{font-size:1.5rem;font-weight:500;line-height:2.25rem}@media (width <= 720px){.h2{font-size:1.25rem;line-height:1.5rem}}#challenge-error-text{background-image:url(data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIzMiIgaGVpZ2h0PSIzMiIgZmlsbD0ibm9uZSI+PHBhdGggZmlsbD0iI0IyMEYwMyIgZD0iTTE2IDNhMTMgMTMgMCAxIDAgMTMgMTNBMTMuMDE1IDEzLjAxNSAwIDAgMCAxNiAzbTAgMjRhMTEgMTEgMCAxIDEgMTEtMTEgMTEuMDEgMTEuMDEgMCAwIDEtMTEgMTEiLz48cGF0aCBmaWxsPSIjQjIwRjAzIiBkPSJNMTcuMDM4IDE4LjYxNUgxNC44N0wxNC41NjMgOS41aDIuNzgzem0tMS4wODQgMS40MjdxLjY2IDAgMS4wNTcuMzg4LjQwNy4zODkuNDA3Ljk5NCAwIC41OTYtLjQwNy45ODQtLjM5Ny4zOS0xLjA1Ny4zODktLjY1IDAtMS4wNTYtLjM4OS0uMzk4LS4zODktLjM5OC0uOTg0IDAtLjU5Ny4zOTgtLjk4NS40MDYtLjM5NyAxLjA1Ni0uMzk3Ii8+PC9zdmc+);background-repeat:no-repeat;background-size:contain;padding-left:34px}@media (prefers-color-scheme:dark){body{background-color:#222;color:#d9d9d9}}</style><meta http-equiv="refresh" content="360"></head><body><div class="main-wrapper" role="main"><div class="main-content"><noscript><div class="h2"><span id="challenge-error-text">Enable JavaScript and cookies to continue</span></div></noscript></div></div><script>(function(){window._cf_chl_opt={cvId: 
'3',cZone: "angebote.ratzel-dasautohaus.de",cType: 'managed',cRay: '932b1cf64a754197',cH: 'WY806ruWhuIyFpc4QS6TOSA0XWDdeE20Ts0Kk04Cshk-1745052194-1.2.1.1-T3yN3s7RC6_CM8sDQ.NDDq0DT5_6Dljw3djlprHH6zzxCxW_53hM23_ZX7O5Eawd',cUPMDTk: "\/vw-t-roc\/?__cf_chl_tk=b9OMsSTZw0S6qbuQj7JNwRr0XR9SJC0lCbpOdZ3qUS0-1745052194-1.0.1.1-pJmHd.vlYzmE1YHmmWnKlnqp2qJraKkTPuK7nH5hrH0",cFPWv: 'g',cITimeS: '1745052194',cTplC: 0,cTplV: 5,cTplB: 'cf',cK: "unsupported_browser_beacon",fa: "\/vw-t-roc\/?__cf_chl_f_tk=b9OMsSTZw0S6qbuQj7JNwRr0XR9SJC0lCbpOdZ3qUS0-1745052194-1.0.1.1-pJmHd.vlYzmE1YHmmWnKlnqp2qJraKkTPuK7nH5hrH0",md: "WjXXqoQQICUUGwiGcqGboABxB6Dp.BTgTduUDEj7S5o-1745052194-1.2.1.1-upgRM.o3e_ebpWR1CFEjlNjv4hANv2QyHOOujSbkVwWIClHgUal_EqW.YG5FSRaPXGmvHUWUQNxMkD54IBlu0no0VEbJQhNWT4y0D9qnVyaiWR8638TzYusT8JtRA2XKMUb26jA3hsOpIw5FLR_JHiWCgRhoNOc6cjjztb0wH758BlG2Ilq0bErcCqt1Q46xXyAUhsxtoS3_Su3E0CbtLgeKoM_eSMjLH7Bo4.dTx4LdKZbW_33y6zHDKgxp.AQWz6hGBj4gnCkbfwG50IaKn_Z5nh6Wuu.pQ.LNRLzIodQlnK9fW0i06Xl68knE70jgyJ8Ujlnehl5a0tsbZRpFWv8V5Fhbxl7q.RASjD3ejkuYf8vwdjDvSsj1lBLhhcHfYPNQpduZxxAasLd2DhoRiXTqPhaCMRs6.lgmN3.CuCOPwiw9upDyN1_szVKcBKcgUvkJ_BodeQJ4TFMjHunvwAJ9zhSsoQhQGlD2FiWInax27hJh21yTiCsZBOH9ny56HEeref2DpaZ29uOBx4C9ZUULf5rqXqRmprNAc0irhd43y5LFaYXcuY6MrNn80OgqMWfV.Qks8gJc2dloyJTMX6fY7Xdh70mqv_BEovx5gX4hgrAqKTMzeiKV4il5.l63zYhTsy5da3sZmAFczbTEa1w3a6IgMbhFqSCr1ZPmneYCZc2JQKHyh_LnGSxBwEaebnweNt9Xiv9JcgKXqd4SQcsgZxz9dYRpdqFPSS4bANGnndeR5xz2_Al6JxAyq5bGSayeNz8jhq.GJimqM5apADjczee4fEnMleWMcZe.kmBCHuO_K6GCUPSCnU6d57qd2m3HDJudR7be3gcKWfYPUsUaGP.YMr0D_QkOqO14G71x6FP6Z8wK2cJfiJFDMzoBim9KLA4r7stVorD9CDB_b6Pmhr7Le4TN9deHm8eKSPHOUt3GUT5jRMJkLVL9kge5OpvxmeP6u8_FSi5vJmURGW.YoJym.AtCYGTw4.aiBam8rXsnN44t3h7exBG7lPWfPF.re4GKyEPEaIUHmDGCql4M5ODMsZyEkuqUCq57I8IkoG2PXQ7ARGWxEZ4DoappYgooFdXKeq6hr5QkiBMdaZcLqDqYKclk83cykDq8O8o7PqelAHpKzhbfyBkwsGfB",mdrd: 
"BD.qYMbhWNscHvliyt6HKjNwP8CPkC0rd.QrxJ5RG2Q-1745052194-1.2.1.1-9I9ne4HHz7bikfS5Qlv8Zl3kybXzN3QuqmbPKiRjYf1Bqori.huGKoBC8uZoQJ_IrVGHvT0acOp61b01n1UMWKEHCaVA8ji91F8UfbG2juNdHgdCfzgYG..p4I9WJy8.DYvO37eS_Zk2i7uHWAxLHN6DNRAHBkwndWWGHPCfzzSpgE9Hu8utc4auw8GyFCM0BzdDNFyzX2bOphSRE8u9FXCtiJIS3oiazcm_asO9sHfVJ9sbeLO._mKzqZFP0JrtTlqSSDJ1dtIiKu2bY_Xwpky1oIa55ekU2oE_dF4IozAtpypJ044wAJT1OFmJtEfSz7cKckIzBeJFuc6UOfyHBfQvVQPtvAFalGV2fvm6kHqRWfKiCMiI0uuqDjBeSt0Q6PepbIGr77wDVD2gqoUlnuWWeTW6Rz32Lxu2XkELD3fgoaZaPRo82cj2LSd09WKNWfDH63BAt.psNQlkyaIELlX0Ohi6M4SW.GHquPX5ig4ZuiJ1fEyEFf4EQEpWnlzSFNhn3xgPWMyA6qE0vL3W2VC6nivgN1vR4kf_59hSnXdi5lRfExjeW9JjH9d_twPAsG6h8a5YMEnX1OXpbpopxEIJ9HxzitfMAdibmPVrtZXYuImWhpFsGzylAR0obzzRGrRxCizxFIleJWg2H2rS2sgZTl4jLznsZBPjIysjOGF4l4H2k1NTNmkkJ.gedZZoS7eunbTeiYK_AzH9DNqof6nuuZ9764I1yESYnAfz39AFX5i.Q8o0WxrDip4SwnGJQ7fs8Pem4mGun2YlbKuBcbl2560MRI9zUdyrrXp99umcj6yfBHNifTqtVyi2yrn7RDCbMgH9vqvI07NF5tdgj7QrF27y1i5bVw2RWUY12IJ263Qj3C7YPCS9QbjBcJnsS4lL_CmsJXAdMtsfKUOv0GajyN1rooeFGLCs6O_EWQUUPNxGNfVTN9a7KI7rMmn9Wtfa_P4TtMKFKLQ9aFaRR_172WYb67Kd1Zj6pULiirFGaRmdXWiMCgBAowzsPoaKIrJgqfJWgNvKg.5CeGzeB1iuHSkIGHLtTHDYGqxSDXlcLJil9GpKn2tTimg9kg8cxYgQ_LU0xY8xGddDS.RXyGZswqVu7LIJIZVuL7fUaaK0vi0LmCselVpB633lmKEr.BXbn8PK3nuWiZgiHxMUT0qOPbfg2RozJrFV0gnevPVxw2bcPi6ACFR05QIMXv4e.mXd9X6_IaeWu9Jz8dytqKBAdyMKq97POILtQQVfHw.5zDbDRwIgfG9JV6M4D0lKxDnmcYvcpGOfr6d18_8Dir1VmZHw9anyr0JmZQHI2YgkRF6iGLDRwrh0DlMGDeh69E.dp7vPsovGn.oleFpY_cFiVfTUg4VJD7IBupjSq6lCa7w9IBw19z08HTF..VkTlGEo4nTPVOOsC8JFgRi2.ze6wz481.c4GqbSA94TfWeV11zYuLA4U7iKNOKeiTCgWD2rBB8DjWX4JcOoOz7oF3w3zFDNVJbOFUqOGpq8xQTl8d8M8rZtcKv9ieRknGRC913cDLONmem5Bj0uGTU0zmPdBZ5gvlurUDG2aY2A1VX_oXS0Y3Crzwnq4eQ2x.fqzP5NW4QcJPUPb8u6R9PAMNEl4GDzenHlbw_ppURL7QKDqYFQLZUbaKViv63Dm9zuJcVJHr_jEirqyHYT_a4CaAanLCGz4z2vPHf812EJgXxIg_u5cCAMx6zQDaCK9dqGVjSPLWuWdYroytT1dA.pLLvuhlBCmQWik.N0ST5e9hhKu.VZWEZmfrcsoez8ZAPOxMmVfX02GkXENv4q6FOtmJCrn2_374SSMdAoj09OBWg2ad4dn36HfjsdUtzCnJ1xAD1po5uVfLqM62RHgEnjV2ABhz7eU0lhycPex7OdcELw4KiQOlK_FqRO6joRAZAMa_juCREO.ugjqQ1l
qfK356TzNS7W6J2lrBO5zeHxewL0ouPCNAj2ljEeCOBclgSFbzBSRanXYny8oVVN.qQvFesRUQ0kClsQxZ_Ut9rsnUg.Bwia8GQiaN_rzsfGbTC6Z7mpHrAvPkhxFzbmXH_Jv7inwoJsJir_YYfFin9Kebquy48ninO2TAlT5FOz3OhdKvgQtC27j_JMpOBsTiBIE0Rh3IGlQ0BZgJFOVnaH4hr_YXKSp.Nc7ZJ0IP7Ic92n_VBAPlLh7zt6oFJyJHMymuSxC0zcohQQ9ipjHUy2fNtjseDRbO2o4ik5EGCYlbo3VvxAU2hcpjPQnp5axEmvVTiEww7ZTLMbdPI.FjxB92mjyJk5qweaPEBevCt3AcOksJ1h.DfeClD4VZ9msLoMVDPFDNJEhatIj3mXVfzJ0z8yWgsayxl8NtvyXDVu6hOBa90g2qClp95f7LvBamalkFKBoTB2Bn72bEXYO4GSeM3j3Ss.ePenwCCYsTZJ7dVkkLR52lMZ4Ul6pjcSmHi9lvaQBKN9rn1KWhWloklMwgrKr6q1ebJsOQ9bBjPZsc3sCN8H0TdqEfiP6vf.98tZs68Ui0.NEOiVFo.J3LcjPeGMwYk49Ww3T.EZ4gP3wr7nGw3w9FKjbFC0PXQ8DRh3qUtU7FMtcIujzdh.3dRtbh.rAfUIujUo4iBisMgFvY920rQAofF3SChJeYEU0QZUOpMfpwoB5Xa0UI9fsVeFZOq.h6F8IsSr34NAS3cFO5IqgF8Gn6aY31e7_OkVbCpcHgLYdgrv9xuYgtU0aWlQJg922a.m_ajd50RyU.hlRauA.AbFTs1tTlI_.jUFZzXeGwRpvC9VXjaRz6c4sFtC55V7ZCg69p.9Nf3K53MlzzUNlURdpB3yJul2qAsTJn0t7dZxN1aBf5Vr3ICGyNd.zCQ"};var cpo = document.createElement('script');cpo.src = '/cdn-cgi/challenge-platform/h/g/orchestrate/chl_page/v1?ray=932b1cf64a754197';window._cf_chl_opt.cOgUHash = location.hash === '' && location.href.indexOf('#') !== -1 ? '#' : location.hash;window._cf_chl_opt.cOgUQuery = location.search === '' && location.href.slice(0, location.href.length - window._cf_chl_opt.cOgUHash.length).indexOf('?') !== -1 ? '?' : location.search;if (window.history && window.history.replaceState) {var ogU = location.pathname + window._cf_chl_opt.cOgUQuery + window._cf_chl_opt.cOgUHash;history.replaceState(null, null, "\/vw-t-roc\/?__cf_chl_rt_tk=b9OMsSTZw0S6qbuQj7JNwRr0XR9SJC0lCbpOdZ3qUS0-1745052194-1.0.1.1-pJmHd.vlYzmE1YHmmWnKlnqp2qJraKkTPuK7nH5hrH0" + window._cf_chl_opt.cOgUHash);cpo.onload = function() {history.replaceState(null, null, ogU);}}document.getElementsByTagName('head')[0].appendChild(cpo);}());</script></body></html>

Origin
HTTP

This seems to be due to Cloudflare protection. I then added some headers to make the request look like it came from a real browser, but that did not work either. Next I tried Apify’s “Crawl Single URL”, but it only returns a small part of the content, roughly 20%. What other tool can I use in this case to crawl a website and get the full content? Dumpling.ai seems expensive.

Regards,
Soeren

Solved: I am using Firecrawl now, and it seems to work extremely well.

Welcome to the Make community!

So you basically need to “visit” the site yourself to get the content. This is called Web Scraping.

Incomplete Scraping

Are you getting NO output from the Text Parser “HTML to Text” module? This is because there is NO text content in the HTML! The entire content of the page you are scraping is embedded in a script tag, and it is only generated and placed onto the page when that JavaScript runs client-side in the user’s web browser. Make is a server-side runtime environment, so with the HTTP modules you get just the script tags, and the Text Parser “HTML to Text” module ignores those script tags because they are NOT HTML layout elements.
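To illustrate why the text conversion comes back empty, here is a minimal Python sketch. It is not how Make’s Text Parser is actually implemented; it only mimics the behaviour described above, where the contents of script tags (and similar non-layout elements) are skipped. The sample page and the `window.__DATA__` variable are made up for the example.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of non-layout elements."""

    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page as a server-side HTTP request would receive it:
# the data lives in a script tag and the visible container is empty.
raw_page = (
    '<html><body><div id="app"></div>'
    '<script>window.__DATA__ = {"cars": ["VW T-Roc"]};</script>'
    "</body></html>"
)

# The same hypothetical page after a browser has run the script.
rendered_page = '<html><body><div id="app"><h1>VW T-Roc</h1></div></body></html>'

raw_text = html_to_text(raw_page)
rendered_text = html_to_text(rendered_page)
```

Here `raw_text` is an empty string while `rendered_text` contains the heading, which is the difference between fetching the HTML and actually rendering it in a browser.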

The Make HTTP “Make a request” module does NOT run any of that JavaScript, so there is no content on the page other than a default message that tells you to enable JavaScript.
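If you want to recognise this situation programmatically, one option is to check the response for the markers that are visible in the error output posted above: the “Just a moment...” title, the `window._cf_chl_opt` object and the `/cdn-cgi/challenge-platform/` script path. The following Python sketch is only a heuristic built on those markers; the function name is made up, and Cloudflare can change its challenge page at any time.

```python
# Markers taken from the challenge page quoted in this thread.
CHALLENGE_MARKERS = (
    "<title>Just a moment...</title>",
    "window._cf_chl_opt",
    "/cdn-cgi/challenge-platform/",
)


def looks_like_cloudflare_challenge(status_code, body):
    """Heuristic: True if the response resembles a Cloudflare challenge page."""
    # The thread shows a 403; treating 503 the same way is an assumption.
    if status_code not in (403, 503):
        return False
    return any(marker in body for marker in CHALLENGE_MARKERS)


sample_body = (
    "<!DOCTYPE html><html><head><title>Just a moment...</title></head>"
    "<body><noscript>Enable JavaScript and cookies to continue</noscript>"
    "<script>(function(){window._cf_chl_opt={cType: 'managed'};}());</script>"
    "</body></html>"
)

is_challenge = looks_like_cloudflare_challenge(403, sample_body)
```

A check like this only tells you that a plain HTTP request was blocked. It does not get you past the challenge; for that you still need a service that loads the page in a real browser.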

This is NOT a Make platform, or Text Parser, or Regular Expression issue/bug.

You CANNOT use normal scraping integrations like ScrapingBee or HTTP “Make a request” module to fetch this page’s structure.

You will need to use ScrapeNinja’s “Scrape (Real browser)” module to emulate a real person visiting the site using a web browser, as client-side JavaScript needs to run to parse the JSON data in the script tags, and generate the page structure and content.

For more information and a demo using ScrapeNinja, see Scraping Bee Integration Runtime Error 400

Web Scraping

For web scraping, one service you can use to get content from the page is ScrapeNinja.

ScrapeNinja allows you to use jQuery-like selectors to extract content from elements by using an extractor function. ScrapeNinja can also run the page in a real web browser, loading all the content and running the page load scripts, so the result closely matches what you see in your own browser, as opposed to just the raw page HTML fetched by the HTTP module.

If you want an example, take a look at Grab data from page and url - #5 by samliew

AI-powered “easier” method

You can also use AI-powered web scraping tools like Dumpling AI.

This is probably the easiest and quickest method to set up, because all you need to do is describe the content that you want, instead of inspecting the elements to create selectors or having to come up with regular expression patterns.

The upside is that such services combine BOTH fetching and extracting the data in a single module (saving operations) and do away with the lengthy setup of the other methods.

More information, other methods

For more information on the different methods of web scraping, see Overview of Different Web Scraping Techniques in Make 🌐

Hope this helps! Let me know if there are any further questions or issues.

— @samliew

P.S.: Investing some effort into the Make Academy will save you lots of time and frustration using Make.

@sorenzi Can you elaborate on how you handled this? I am currently trying to do the same and running into output issues with Firecrawl.

Can you share more details?
I used:
Format: Markdown
Only main content: No
Timeout: 30000

I had “main content” set to “yes”.

I couldn’t get the prebuilt connector to work, but I did an HTTP module with the same parameters and it worked no problem.
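For anyone who wants to reproduce the HTTP-module approach outside Make, here is a rough Python sketch of the same request. The endpoint URL, the JSON field names (`formats`, `onlyMainContent`, `timeout`) and the response shape are assumptions based on Firecrawl’s v1 scrape API, so please check them against the current Firecrawl documentation. Only the three parameter values come from this thread.

```python
import json
import urllib.request

# Assumed endpoint (Firecrawl v1 scrape API); verify against the current docs.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(page_url, api_key):
    """Builds the POST request without sending it."""
    payload = {
        "url": page_url,
        "formats": ["markdown"],   # Format: Markdown
        "onlyMainContent": False,  # Only main content: No
        "timeout": 30000,          # Timeout: 30000 (presumably milliseconds)
    }
    return urllib.request.Request(
        FIRECRAWL_SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape_markdown(page_url, api_key):
    """Sends the request and returns the Markdown, or '' if none is present."""
    request = build_scrape_request(page_url, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        result = json.load(response)
    # Assumed response shape: {"success": true, "data": {"markdown": "..."}}
    return result.get("data", {}).get("markdown", "")
```

Calling `scrape_markdown("https://example.com/", "YOUR_API_KEY")` would send the request. In Make, the equivalent is an HTTP “Make a request” module with the same URL, headers and JSON body.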