About the Job: HEROIC Cybersecurity (HEROIC.com) is seeking a Web Scraper Developer with deep expertise in building scalable, automated web data collection systems to power our AI-driven cybersecurity intelligence platforms.
You will be responsible for developing, deploying, and maintaining high-performance web crawlers and data extraction pipelines that source threat intelligence, leaked datasets, and cybersecurity-related data from the surface, deep, and dark web.
This role requires strong technical knowledge in Python-based scraping frameworks, distributed data pipelines, and automation systems to collect and normalize large-scale datasets with minimal manual intervention. Your work will directly support HEROIC’s mission to make the internet safer through intelligent, data-driven cybersecurity insights.
What you will do:
- Design, develop, and maintain large-scale, distributed web crawlers and data extraction pipelines.
- Build automated systems to scrape, clean, and normalize structured and unstructured data from multiple web sources (surface, deep, and dark web).
- Develop resilient scraping solutions using frameworks such as Scrapy, Selenium, Playwright, or custom Python-based tools.
- Implement strategies to overcome anti-bot challenges (e.g., proxy rotation, CAPTCHA handling, user-agent management).
- Integrate scraped data into centralized databases (e.g., PostgreSQL, MySQL).
- Collaborate with the backend team to design ingestion workflows that feed into HEROIC’s cybersecurity intelligence platform.
- Monitor and optimize scraping performance, reliability, and compliance with data usage policies.
- Automate deployment and scaling of crawler clusters using Docker, Kubernetes, or cloud infrastructure (AWS/GCP).
- Write and maintain APIs, scripts, and ETL components for downstream data processing.
- Collaborate closely with the software development team to ensure seamless data flow and usability.
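To illustrate one of the anti-bot strategies named above, here is a minimal sketch of proxy rotation and user-agent management in plain Python. The proxy addresses and user-agent strings are placeholders, not HEROIC's actual infrastructure; a production crawler would load these from configuration and attach them to real HTTP requests.

```python
import itertools
import random

# Hypothetical pools -- real deployments would load these from config
# or a proxy-provider API.
PROXIES = ["http://proxy1:8080", "http://proxy2:8080", "http://proxy3:8080"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

# Round-robin over the proxy pool so load spreads evenly.
proxy_pool = itertools.cycle(PROXIES)

def build_request_settings():
    """Return per-request settings: next proxy in rotation, random UA."""
    return {
        "proxy": next(proxy_pool),
        "headers": {"User-Agent": random.choice(USER_AGENTS)},
    }

first = build_request_settings()
second = build_request_settings()
```

In a Scrapy project the same idea is usually implemented as a downloader middleware that sets `request.meta["proxy"]` and the `User-Agent` header per request.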
Requirements
- Bachelor's Degree in Computer Science, Information Technology, or a related field
- Minimum 4 years of hands-on experience in web scraping, data crawling, or data pipeline development.
- Strong proficiency in Python and scraping frameworks such as Scrapy, Selenium, Playwright, or BeautifulSoup.
- Proven experience building scalable crawlers capable of handling high-volume, dynamic, or JavaScript-rendered sites.
- Deep understanding of HTTP, DOM structures, XPath/CSS selectors, and data parsing.
- Experience managing asynchronous/concurrent scraping tasks and distributed crawling architectures.
- Knowledge of data pipelines, ETL workflows, and API integrations.
- Familiarity with NoSQL and SQL databases (e.g., MongoDB, PostgreSQL, Elasticsearch, Cassandra).
- Strong command of Linux/Unix systems, shell scripting, and version control (Git).
- Experience with containerization and cloud-based deployments (Docker, Kubernetes, AWS, or GCP).
- Excellent problem-solving, analytical, and debugging skills.
- Strong written and verbal communication in English.
- Prior experience in cybersecurity, data intelligence, or dark web data collection (preferred but not required).
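As a sketch of the asynchronous/concurrent crawling experience listed above: the pattern below uses `asyncio` with a semaphore to bound concurrency. The `fetch` function is a placeholder (no real HTTP call); in practice it would wrap a client such as `aiohttp` or `httpx`.

```python
import asyncio

async def fetch(url):
    """Placeholder for an HTTP request; returns a fake page body."""
    await asyncio.sleep(0)  # simulate network I/O
    return f"<html>{url}</html>"

async def crawl(urls, max_concurrency=5):
    """Fetch all URLs concurrently, at most max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with sem:
            return await fetch(url)

    # gather preserves input order in its results.
    return await asyncio.gather(*(bounded(u) for u in urls))

pages = asyncio.run(crawl([f"https://example.com/{i}" for i in range(3)]))
```

The semaphore cap is what keeps a high-volume crawler from overwhelming either the target site or the crawler host itself.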
Benefits
- Position Type: Full-time
- Location: India (Remote – Work from anywhere)
- Salary: Competitive salary based on experience
- Other Benefits: Paid time off (PTO) & national holidays
- Professional Growth: Work with cutting-edge AI, cybersecurity, and SaaS technologies
- Culture: Fast-paced, innovative, mission-driven team.
About Us: HEROIC Cybersecurity (HEROIC.com) is building the future of cybersecurity. Unlike traditional solutions, HEROIC takes a predictive and proactive approach to intelligently secure users before an attack or threat occurs. Our work environment is fast-paced, challenging, and exciting. At HEROIC, you’ll collaborate with a team of passionate, driven individuals dedicated to making the world a safer digital place.