crwl.io is a web crawling and scraping service.
Use our no-code tool to configure crawlers and scrapers yourself.
Or simply let us create your crawlers for you.
The crwl.io web app is a no-code tool that allows you to configure crawlers to your needs without any programming knowledge.
Our user interface enables you to define crawling and scraping procedures using pre-made and configurable building blocks known as "Steps".
Once you run your crawler, it performs the steps you defined and delivers the desired data.
If you need additional functionality for your crawlers that is not already included in the pre-made steps, you can create your own steps and install them as extensions in the web app.*
Instructions for programming your custom steps can be found in the documentation of the crwlr.software open-source library. You can then conveniently share your own code through a (private) GitHub repository. You'll find more detailed information on this feature in the web app.
* This feature is available starting from the S plan and is not included in the XS plan.
Typically, the term "web scraping" refers to extracting content from (HTML) websites, which is why many services focus solely on that. However, in practice, there are often cases where data needs to be extracted from other formats like JSON, XML, or CSV. With crwl.io, that's no problem.
Many web crawling and scraping libraries and services only offer loading websites via a so-called headless browser (an automated, regular web browser without a user interface). In most cases, however, a browser is not actually necessary.
Usually, a simple HTTP client that loads only the HTML source code of a website, without the linked assets (such as images, CSS, and JavaScript), is sufficient. Because it skips all those extra downloads, the HTTP client is much more efficient and performant, which is why it is the default choice in the crwl.io web app. And if needed, you can always switch to using a headless browser.
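The difference is easy to see in code. The following sketch (Python, using the requests and Playwright packages) is a general illustration of the two loading strategies; it is not part of the crwl.io product and makes no assumptions about its internals.

```python
import requests
from playwright.sync_api import sync_playwright

url = "https://example.com"

# Plain HTTP client: a single request that fetches only the HTML source.
html_via_http = requests.get(url, timeout=10).text

# Headless browser: starts a full browser engine, loads the page along
# with its linked assets (images, CSS, JavaScript) and executes scripts.
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(url)
    html_via_browser = page.content()  # HTML after JavaScript has run
    browser.close()
```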
Of course, you can start your crawlers not only manually on demand but also schedule them to run automatically at the times you prefer. This way, you keep your crawling data up to date continuously.
After a crawler has run successfully, you can easily download the collected data as a JSON, XML, or CSV file. If you wish to integrate crwl.io crawlers into your own or third-party applications, you can also retrieve your data through our REST API. Combined with webhooks, this lets you fully automate the integration into your applications.
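As a sketch of what such an integration could look like, the following Python snippet fetches the results of a finished crawler run over HTTP. The base URL, endpoint path, and authentication scheme are placeholders assumed for illustration; the actual REST API is documented in the web app.

```python
import requests

# All endpoint details below are assumptions for this sketch; consult
# the crwl.io REST API documentation for the actual URLs and auth scheme.
API_BASE = "https://api.crwl.io"          # hypothetical base URL
API_TOKEN = "your-api-token"              # hypothetical auth token
RUN_ID = "12345"                          # ID of a finished crawler run

response = requests.get(
    f"{API_BASE}/runs/{RUN_ID}/results",  # hypothetical endpoint
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
response.raise_for_status()
results = response.json()                 # the collected data
```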
Webhooks are the final ingredient for smoothly integrating the data collected by crwl.io into your own applications. If you set up a webhook URL (a URL that is part of your application) for a crawler, the crawler notifies your application after each successful run. The call to the webhook URL transmits the data needed to retrieve the results of that crawler run.
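A minimal webhook receiver could look like the following Flask sketch. The payload fields are assumptions for illustration; the fields actually sent with each notification are described in the web app.

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/crwlio-webhook", methods=["POST"])
def crwlio_webhook():
    # The payload shape is an assumption for this sketch; check the
    # crwl.io documentation for the fields actually transmitted.
    payload = request.get_json(force=True)
    run_id = payload.get("run_id")  # hypothetical field name
    # From here, hand off to your own logic, e.g. fetch the run's
    # results via the REST API (see the sketch above) and import them.
    print(f"Crawler run {run_id} finished successfully.")
    return "", 204
```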
The foundation of the crwl.io web app is the free and open-source web crawling and scraping library from crwlr.software. Therefore, you can always see in detail how the crawlers and the steps available in the app work and, if necessary, contribute improvements or changes.
Limitation/Feature | XS | S | M | L |
---|---|---|---|---|
Requests/Day¹ | 5,000 | 15,000 | 60,000 | 250,000 |
Requests/Month | 150,000 | 450,000 | 1,800,000 | 7,500,000 |
Storage² | 1 GB | 5 GB | 20 GB | 50 GB |
Private Instance³ | ✗ | ✓ | ✓ | ✓ |
Custom Extensions⁴ | ✗ | ✓ | ✓ | ✓ |
Price per month (incl. VAT) | € 36 | € 72 | € 240 | € 720 |
Price per year (incl. VAT) | € 396 | € 792 | € 2,640 | € 7,920 |
1) Refers to HTTP requests executed by your crawlers. Note that requests sent via a headless browser are weighted by a factor of five, as they demand significantly more resources (see JavaScript Execution). The daily limit is based on the average number of daily HTTP requests within a month. Exceeding the limit on individual days is not an issue, as long as the daily average stays below it. For example, on the S plan your crawlers may send 30,000 requests on a busy day, as long as the month's total stays below 450,000 (an average of 15,000 per day).
2) The storage space required for the data collected by your crawlers, as well as for the response cache.
3) In the XS plan, all crawlers run on shared infrastructure. From the S plan upwards, each customer gets their own instance of the crwl.io app.
4) For the same reason (shared infrastructure in the XS plan), custom extensions can only be installed in the app starting from the S plan.
The crwl.io app is currently in closed beta.
You can pre-register for an invitation here.