Announcing Zapdroid's open-source web crawler
We are excited to announce the open-sourcing of Zapdroid's Multi-Threaded Website Crawler! Designed to facilitate effective communication between Large Language Models (LLMs) and external web sources, our crawler provides developers and researchers with a robust tool to enhance AI-driven applications.
Why Zapdroid's Website Crawler?
In today's data-driven world, the ability for LLMs to access and process real-time information from the web is invaluable. Zapdroid's crawler addresses this need by offering a scalable and efficient solution tailored to integrate seamlessly with AI models, ensuring they can retrieve and utilize up-to-date data effortlessly.
Key Features
- Optimized for LLM Integrations: Tailored to enable smooth interactions between LLMs and external websites, ensuring AI models can fetch and utilize the necessary information effectively.
- Multi-Threaded Processing: Harness the power of multiple CPU cores to handle concurrent crawl jobs, significantly enhancing data retrieval speed.
- Robust Retry Mechanism: Retries a failed fetch up to 3 times at configurable intervals (10 seconds, 20 seconds, and 1 minute), so crawls can still succeed under unstable network conditions.
- Smart Rate Limiting: Configurable global and per-domain rate limits prevent server overloads and ensure respectful interaction with target websites.
- Robots.txt Compliance: Automatically adheres to the crawling rules specified in each website's robots.txt, promoting ethical data collection practices.
- RESTful API with Swagger Documentation: Easily submit crawl requests and retrieve results through intuitive HTTP endpoints, with comprehensive API documentation available via Swagger UI.
- Job Queue Management with Redis and Bull: Efficiently manages crawl jobs, ensuring scalability and reliability even under high demand.
- Detailed Result Retrieval: Access structured crawl results using a unique queueId; each result includes the crawled URLs, page content, and error statuses.
- Comprehensive Logging: Uses Winston for consistent, informative logging that aids monitoring and debugging.
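The retry schedule can be illustrated with a short TypeScript sketch. It assumes the three intervals are the waits before the first, second, and third retry, so a request runs at most four times in total. `withRetries`, `RETRY_DELAYS_MS`, and the injectable `sleep` parameter are hypothetical names for illustration, not code taken from the Zapdroid repository.

```typescript
// Assumed schedule: wait 10 s, 20 s, then 60 s before retries 1, 2 and 3.
const RETRY_DELAYS_MS: readonly number[] = [10_000, 20_000, 60_000];

type Sleep = (ms: number) => Promise<void>;

// Default sleep backed by setTimeout; tests can inject a fake one.
const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task`, retrying after each scheduled delay until it succeeds.
// Rethrows the last error once the initial attempt and all retries have failed.
async function withRetries<T>(
  task: (attempt: number) => Promise<T>,
  sleep: Sleep = realSleep,
): Promise<T> {
  let attempt = 0;
  let lastError: unknown;
  // The leading 0 means no wait before the first (non-retry) attempt.
  for (const delay of [0, ...RETRY_DELAYS_MS]) {
    if (delay > 0) {
      await sleep(delay);
    }
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
    }
    attempt++;
  }
  throw lastError;
}
```

Injecting `sleep` keeps the schedule testable: a test can pass a function that records the requested delays instead of actually waiting up to 90 seconds.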
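Global and per-domain limits can be enforced in a single check. The sketch below uses a simple fixed one-second window, which is our illustrative choice rather than an algorithm Zapdroid documents; `CrawlRateLimiter`, `tryAcquire`, and the limit values are hypothetical.

```typescript
// Hypothetical limit settings; the announcement does not publish default values.
interface RateLimits {
  globalPerSecond: number;
  perDomainPerSecond: number;
}

// Fixed-window limiter: counts requests in the current one-second window,
// both across all domains and per hostname.
class CrawlRateLimiter {
  private readonly limits: RateLimits;
  private readonly now: () => number;
  private windowStart = 0;
  private globalCount = 0;
  private readonly domainCounts = new Map<string, number>();

  constructor(limits: RateLimits, now: () => number = Date.now) {
    this.limits = limits;
    this.now = now;
  }

  // Returns true and records the request if both limits allow it right now.
  tryAcquire(url: string): boolean {
    const t = this.now();
    if (t - this.windowStart >= 1000) {
      // A new window begins: reset every counter.
      this.windowStart = t;
      this.globalCount = 0;
      this.domainCounts.clear();
    }
    const domain = new URL(url).hostname;
    const domainCount = this.domainCounts.get(domain) ?? 0;
    if (
      this.globalCount >= this.limits.globalPerSecond ||
      domainCount >= this.limits.perDomainPerSecond
    ) {
      return false;
    }
    this.globalCount++;
    this.domainCounts.set(domain, domainCount + 1);
    return true;
  }
}
```

A caller that receives `false` would typically requeue the job or wait for the next window. A fixed window can admit up to twice the limit in a burst that straddles a window boundary; a token bucket smooths this out at the cost of slightly more state.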
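Robots.txt compliance comes down to collecting the Allow and Disallow rules that apply to the crawler and checking each URL against them. The simplified sketch below follows the RFC 9309 matching rule (the longest matching path wins, and Allow wins ties) but reads only the `User-agent: *` group and ignores `*` and `$` wildcards. `parseRobotsTxt` and `isAllowed` are hypothetical names, not Zapdroid's implementation.

```typescript
interface Rule {
  allow: boolean;
  path: string;
}

// Collects Allow/Disallow rules from groups addressed to `User-agent: *`.
// Simplified: no wildcard support, no product-specific groups, no Crawl-delay.
function parseRobotsTxt(text: string): Rule[] {
  const rules: Rule[] = [];
  let inStarGroup = false;
  let lastWasUserAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines form one group.
      inStarGroup = lastWasUserAgent ? inStarGroup || value === "*" : value === "*";
      lastWasUserAgent = true;
      continue;
    }
    lastWasUserAgent = false;
    // An empty Disallow value allows everything, so it adds no rule.
    if (inStarGroup && (field === "allow" || field === "disallow") && value !== "") {
      rules.push({ allow: field === "allow", path: value });
    }
  }
  return rules;
}

// Longest matching rule wins; Allow wins ties; no matching rule means allowed.
function isAllowed(rules: Rule[], url: string): boolean {
  const u = new URL(url);
  const target = u.pathname + u.search;
  let best: Rule | undefined;
  for (const rule of rules) {
    if (!target.startsWith(rule.path)) continue;
    if (
      !best ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best ? best.allow : true;
}
```

In practice a crawler would fetch robots.txt once per host, cache the parsed rules, and check every queued URL against them before fetching.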
Open Source Benefits
By open-sourcing our website crawler, Zapdroid aims to:
- Foster Collaboration: Encourage developers to contribute, enhance, and customize the crawler to meet diverse needs.
- Promote Transparency: Share our development practices and architectural insights to help others build robust web scraping solutions.
Get Started
Visit our GitHub repository to explore the source code, contribute to its development, and integrate the crawler into your applications:
🔗 Zapdroid Crawler on GitHub
What is Zapdroid?
Zapdroid is an advanced AI Agent Framework that helps developers and organizations build intelligent, autonomous agents capable of interacting seamlessly with a wide range of data sources and services. The framework provides the tools and infrastructure needed to create and deploy agents on platforms such as Slack, WhatsApp, Telegram, websites, and IoT devices.