About the Job: HEROIC Cybersecurity (HEROIC.com) is seeking a Web Scraper Developer with deep expertise in building scalable, automated web data collection systems to power our AI-driven cybersecurity intelligence platforms.
You will be responsible for developing, deploying, and maintaining high-performance web crawlers and data extraction pipelines that source threat intelligence, leaked datasets, and cybersecurity-related data from the surface, deep, and dark web.
This role requires strong technical knowledge in Python-based scraping frameworks, distributed data pipelines, and automation systems to collect and normalize large-scale datasets with minimal manual intervention. Your work will directly support HEROIC’s mission to make the internet safer through intelligent, data-driven cybersecurity insights.
What you will do:
- Design, develop, and maintain large-scale, distributed web crawlers and data extraction pipelines.
- Build automated systems to scrape, clean, and normalize structured and unstructured data from multiple web sources (surface, deep, and dark web).
- Develop resilient scraping solutions using frameworks such as Scrapy, Selenium, Playwright, or custom Python-based tools.
- Implement strategies to overcome anti-bot challenges (e.g., proxy rotation, CAPTCHA handling, user-agent management).
- Integrate scraped data into centralized databases (e.g., PostgreSQL, MySQL).
- Collaborate with the backend team to design ingestion workflows that feed into HEROIC’s cybersecurity intelligence platform.
- Monitor and optimize scraping performance, reliability, and compliance with data usage policies.
- Automate deployment and scaling of crawler clusters using Docker, Kubernetes, or cloud infrastructure (AWS/GCP).
- Write and maintain APIs, scripts, and ETL components for downstream data processing.
- Collaborate closely with the software development team to ensure seamless data flow and usability.
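To illustrate one of the anti-bot strategies named above, here is a minimal sketch of proxy rotation and user-agent management in plain Python. The proxy addresses and user-agent strings are placeholders, not HEROIC's actual infrastructure; a production crawler would load these from configuration and attach them to real HTTP requests.

```python
import itertools
import random

# Hypothetical pools -- real deployments would load these from config
# or a proxy-provider API.
PROXIES = ["http://proxy1:8080", "http://proxy2:8080", "http://proxy3:8080"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

# Round-robin over the proxy pool so load spreads evenly.
proxy_pool = itertools.cycle(PROXIES)

def build_request_settings():
    """Return per-request settings: next proxy in rotation, random UA."""
    return {
        "proxy": next(proxy_pool),
        "headers": {"User-Agent": random.choice(USER_AGENTS)},
    }

first = build_request_settings()
second = build_request_settings()
```

In a Scrapy project the same idea is usually implemented as a downloader middleware that sets `request.meta["proxy"]` and the `User-Agent` header per request.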
Requirements
- Bachelor's Degree in Computer Science, Information Technology, or a related field
- Minimum 4 years of hands-on experience in web scraping, data crawling, or data pipeline development.
- Strong proficiency in Python and scraping frameworks such as Scrapy, Selenium, Playwright, or BeautifulSoup.
- Proven experience building scalable crawlers capable of handling high-volume, dynamic, or JavaScript-rendered sites.
- Deep understanding of HTTP, DOM structures, XPath/CSS selectors, and data parsing.
- Experience managing asynchronous/concurrent scraping tasks and distributed crawling architectures.
- Knowledge of data pipelines, ETL workflows, and API integrations.
- Familiarity with NoSQL and SQL databases (e.g., MongoDB, PostgreSQL, Elasticsearch, Cassandra).
- Strong command of Linux/Unix systems, shell scripting, and version control (Git).
- Experience with containerization and cloud-based deployments (Docker, Kubernetes, AWS, or GCP).
- Excellent problem-solving, analytical, and debugging skills.
- Strong written and verbal communication in English.
- Prior experience in cybersecurity, data intelligence, or dark web data collection (preferred but not required).
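As a sketch of the asynchronous/concurrent crawling experience listed above: the pattern below uses `asyncio` with a semaphore to bound concurrency. The `fetch` function is a placeholder (no real HTTP call); in practice it would wrap a client such as `aiohttp` or `httpx`.

```python
import asyncio

async def fetch(url):
    """Placeholder for an HTTP request; returns a fake page body."""
    await asyncio.sleep(0)  # simulate network I/O
    return f"<html>{url}</html>"

async def crawl(urls, max_concurrency=5):
    """Fetch all URLs concurrently, at most max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with sem:
            return await fetch(url)

    # gather preserves input order in its results.
    return await asyncio.gather(*(bounded(u) for u in urls))

pages = asyncio.run(crawl([f"https://example.com/{i}" for i in range(3)]))
```

The semaphore cap is what keeps a high-volume crawler from overwhelming either the target site or the crawler host itself.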
Benefits
- Position Type: Full-time
- Location: India (Remote – Work from anywhere)
- Salary: Competitive salary based on experience
- Other Benefits: Paid time off (PTO) & national holidays
- Professional Growth: Work with cutting-edge AI, cybersecurity, and SaaS technologies
- Culture: Fast-paced, innovative, mission-driven team.
About Us: HEROIC Cybersecurity (HEROIC.com) is building the future of cybersecurity. Unlike traditional solutions, HEROIC takes a predictive and proactive approach to intelligently secure users before an attack or threat occurs. Our work environment is fast-paced, challenging, and exciting. At HEROIC, you’ll collaborate with a team of passionate, driven individuals dedicated to making the world a safer digital place.