Scraping Projects
Looking for freelance Scraping jobs and project work? PeoplePerHour has you covered.
Scrape Emails using Make.com
I need someone who is an expert in using make.com to set up an automation that scrapes approximately 3,000 emails per month. I already use LinkedIn Sales Navigator, so I want to avoid that platform.
4 days ago · 27 proposals · Remote · opportunity
Web scraping solution
Requirement: Web Scraping Solution for News Media Portals

Project Objective: Design and implement an automated web scraping solution to collect information from news media portals. The collected data must be stored in a database by executing a SQL stored procedure.

Scope:
1. Scrape predefined news media websites, including but not limited to: national news, international news, sports, entertainment, and economy sections.
2. Collect key information such as: article title, publication date, author (if available), summary or headline, full article content, portal category or section, article URL, and media source.
3. Store the data in a SQL database via a stored procedure that validates data integrity before insertion and prevents duplicate records.

Functional Requirements:
1. Data Extraction: implement a module to scrape both static and dynamic HTML pages, handling sites with different HTML structures, including those that require JavaScript interaction.
2. Access Management: support access control for protected sites where applicable (e.g., logins) and consider anti-blocking techniques (User-Agent rotation, IP proxies).
3. Data Storage: design a SQL schema that supports the collected information and implement a stored procedure that inserts the data, with validations such as duplicate checking by URL and logging of errors or inconsistencies to a log table.
4. Scalability: allow new portals to be added without significant changes to the code base.
5. Scheduling: run automatically at configurable intervals (daily, weekly, etc.).

Technical Requirements:
1. Programming Language: Python (preferred) or Node.js, using frameworks such as Beautiful Soup, Scrapy, or Puppeteer.
2. Database: SQL Server or a compatible engine that supports stored procedures.
3. Database Integration: the scraper must execute the stored procedure after processing each batch of data.
4. Exception Handling: detailed logging of errors during scraping, storage, or stored procedure execution.
5. Compatibility: the solution must run on both Windows and Linux.

Acceptance Criteria:
1. The solution must collect and store information from at least 5 initial news portals.
2. Database inserts must complete without errors and follow the established validations.
3. The system must be modular to make it easy to add new portals.
4. The data must be accessible via SQL queries after storage.
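For illustration, a minimal Python sketch of the requested flow - the portal selectors, connection details, and stored procedure name (usp_InsertArticle) are placeholders and do not come from the brief; duplicate checks and error logging are assumed to live inside the procedure:

```python
import requests
from bs4 import BeautifulSoup
import pyodbc


def scrape_article(url: str) -> dict:
    # Fetch and parse one article page; the selectors are per-portal placeholders.
    html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    time_tag = soup.select_one("time")
    author_tag = soup.select_one(".author")
    return {
        "title": soup.select_one("h1").get_text(strip=True),
        "published": time_tag.get("datetime") if time_tag else None,
        "author": author_tag.get_text(strip=True) if author_tag else None,
        "body": " ".join(p.get_text(strip=True) for p in soup.select("article p")),
        "url": url,
    }


def store_batch(articles: list[dict]) -> None:
    # Hand each record to the stored procedure, which is expected to validate
    # integrity, skip duplicates by URL, and log errors to its log table.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=...;DATABASE=...;UID=...;PWD=..."
    )
    cur = conn.cursor()
    for a in articles:
        cur.execute(
            "{CALL usp_InsertArticle (?, ?, ?, ?, ?)}",
            a["title"], a["published"], a["author"], a["body"], a["url"],
        )
    conn.commit()
    conn.close()


store_batch([scrape_article("https://example.com/news/some-article")])  # placeholder URL
```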
5 days ago · 16 proposals · Remote
Python Developer to Repair and Enhance Web Scraping Script
Job Description: Python Developer to Repair and Enhance Web Scraping Script

We are looking for a skilled Python developer experienced in Selenium to repair and extend the functionality of our web scraping script. The script was fully functional until one month ago but requires updates due to changes in the target website's structure. Additionally, the script needs to be enhanced to filter and log profiles based on follower counts within specified ranges under specific hashtags.

Objective:
1. Restore the script's core functionality:
- Navigate hashtags and profiles.
- Like and comment on posts.
- Handle edge cases such as private or inaccessible profiles.
2. Add filtering and logging functionality:
- Scrape and log profile names and follower counts for profiles under specific hashtags.
- Allow configuration of follower ranges (e.g., 10,000–20,000).
- Output results in a structured format (e.g., CSV or JSON).

Key Tasks:
1. Repair the Existing Script:
- Update locators to match the new website structure.
- Resolve issues like NoSuchElementException and ElementClickInterceptedException.
- Ensure robust performance through error handling and retries.
2. Implement Profile Filtering:
- Scrape profile details under given hashtags.
- Filter profiles by follower count ranges.
- Log the profile name and follower count into a structured output.
3. Testing:
- Validate the script's functionality across multiple hashtags.
- Test the filtering logic thoroughly with different follower ranges.

Requirements:
- Expertise in Python and Selenium for web scraping.
- Experience adapting scripts to dynamic websites and resolving errors.
- Knowledge of HTML structure analysis using tools like Chrome DevTools.
- Ability to scrape and structure data efficiently.

This project requires delivering a fully functional script with the new features within 2-3 weeks. If you can deliver a robust solution, we’d love to hear from you.
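For illustration, a rough Selenium sketch of the retry-around-locator and follower-range filtering described above; the brief does not name the target site, so the CSS selectors, URL handling, and follower-count parsing are hypothetical:

```python
import csv
import time
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException


def find_with_retry(driver, by, locator, attempts=3, delay=2.0):
    # Retry a lookup a few times before giving up, to ride out slow renders.
    for i in range(attempts):
        try:
            return driver.find_element(by, locator)
        except NoSuchElementException:
            if i == attempts - 1:
                raise
            time.sleep(delay)


def log_profiles_in_range(driver, profile_urls, min_followers, max_followers,
                          out_path="profiles.csv"):
    # Visit each profile, parse the follower count, and keep it if it falls in range.
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["profile", "followers"])
        for url in profile_urls:
            driver.get(url)
            try:
                name = find_with_retry(driver, By.CSS_SELECTOR, "header h2").text  # hypothetical selector
                raw = find_with_retry(driver, By.CSS_SELECTOR, "span.follower-count").text  # hypothetical selector
                followers = int(raw.replace(",", ""))
            except (NoSuchElementException, ValueError):
                continue  # private/inaccessible profile or unparsable count
            if min_followers <= followers <= max_followers:
                writer.writerow([name, followers])
```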
2 days ago · 22 proposals · Remote
Data miner needed to scrape Coaching Business leads
We are looking for data miners to search online for Coaching Businesses in the USA & Canada and collect their information. Once gathered, the data will need to be added to a Google Sheet which we will use for outreach. We will provide all the tools and processes you need to source leads.

- Starting Task: Gather 1,000 leads for $100 USD.
- Ongoing Work: Successful candidates will source 2,000+ leads per month at $100 USD per 1,000 leads.

After the starting task, this is a long-term position with 3 months guaranteed. If you perform well, we’ll extend it to 12 months or more.

Skills required
- Excellent English is a MUST (both written and spoken)
- Knowledge of Google Sheets
- Knowledge of how to do lead sourcing (finding emails, etc.)

Technical requirements
- Good laptop/computer (Windows or Mac)
- Fast and reliable internet connection

Lead requirements
- Type: Coaching Businesses
- Country: United States or Canada
- Company Age: 3+ years

What needs to be added to our spreadsheet
- Company Name
- Company LinkedIn
- Website
- Owner Name
- Owner Email
- Owner LinkedIn

Please put the number ‘55’ in your application so we know you’ve taken the time to read this description.
12 hours ago · 28 proposals · Remote
Events Database
Job Description: We are seeking a skilled and detail-oriented Web Scraper & Data Researcher to assist in building a comprehensive events database. The role involves using online tools and scraping techniques to extract detailed information about events and associated contacts. The database will be a resource for networking, business development, and event planning.

Key Responsibilities:

Data Scraping:
- Scrape event details (e.g., name, date, location, description) from specified websites.
- Scrape contact information (e.g., organizer names, emails, phone numbers) linked to these events.
- Extract additional data fields as specified (e.g., ticket prices, registration links).

Data Research:
- Research additional event-related information where scraping is not applicable.
- Validate and cross-check collected data for accuracy and relevance.

Database Organization:
- Organize scraped data into a structured format (Excel, Google Sheets, or database software).
- Ensure data is properly categorized and searchable for future use.

It's a big project with up to 40,000 contacts.
5 days ago · 28 proposals · Remote
Google Chrome Extension V3 from V2 & Other Tasks
1. Allow Stripe subscribers to get back into the extension after previous use.
2. Upgrade the extension from Manifest V2 to Manifest V3.
3. Scrape the email address when a website is available.
4. Add a Google Sheets download option.
5. Add a counter that shows the number of results to download and the total results.
6 days ago · 13 proposals · Remote · opportunity
Need a Database of Smoke Shops in TX
I'm looking for someone who can data-scrape a list of smoke shops in Texas. I'm looking for a list of 10k shops. I will need the email, mailing address, name of shop, and phone number for each. I have a list of 2k already that I can provide to avoid duplicates of the ones on that list.
5 days ago · 44 proposals · Remote · opportunity · pre-funded
Daily scrape of jobs into Google Sheet from employer websites
This project seeks to streamline daily job posting efforts for a large jobs website. At present, we keep it busy by manually finding and posting jobs to the site daily. It's a lot of effort to maintain, and I'm looking to streamline and automate it.

The 13 organisations I'd like to start with - including links to their pages where jobs are listed/linked out from - are shown in the attached spreadsheet. We'll likely want to expand this list over time.

I'm looking to set up a process to monitor for new jobs each day and to scrape key information into a Google Sheet I already have set up. It needs to be a robust process that can run on autopilot in the longer term.

The information we'd need to capture for each new job into the Google Sheet is as follows:
1 - Job title
2 - HTML of the role description
3 - Employer name
4 - URL of the role
5 - Country
6 - State (if in the USA)

Once in the Google Sheet I can then categorise them and import them to the site in batches. Please outline how you'd approach the task. I hope you can help with this task.
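For illustration, a minimal sketch of one daily run - the employer careers page, CSS selectors, sheet ID, and service-account file are all placeholders; the column order follows the six fields listed above:

```python
import requests
import gspread
from bs4 import BeautifulSoup


def scrape_employer_jobs(employer_name: str, listing_url: str, country: str) -> list[list]:
    html = requests.get(listing_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for link in soup.select("a.job-listing"):  # hypothetical selector, set per employer
        job_url = link["href"]
        detail = BeautifulSoup(requests.get(job_url, timeout=30).text, "html.parser")
        title = detail.select_one("h1").get_text(strip=True)
        description_html = str(detail.select_one(".job-description"))  # keep raw HTML as requested
        rows.append([title, description_html, employer_name, job_url, country, ""])
    return rows


# Append any newly found jobs to the existing Google Sheet.
gc = gspread.service_account(filename="service_account.json")  # placeholder credentials file
ws = gc.open_by_key("YOUR_SHEET_ID").sheet1                     # placeholder sheet ID
existing_urls = set(ws.col_values(4))                           # column 4 = URL of the role
new_rows = [r for r in scrape_employer_jobs("Example Org", "https://example.org/careers", "UK")
            if r[3] not in existing_urls]
if new_rows:
    ws.append_rows(new_rows)
```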
6 days ago · 46 proposals · Remote
Events Database
Job Description: We are seeking a skilled and detail-oriented Web Scraper & Data Researcher to assist in building a comprehensive events database. The role involves using online tools and scraping techniques to extract detailed information about events and associated contacts. The database will be a resource for networking, business development, and event planning.

Key Responsibilities:

Data Scraping:
- Scrape event details (e.g., name, date, location, description) from specified websites.
- Scrape contact information (e.g., organizer names, emails, phone numbers) linked to these events.
- Extract additional data fields as specified (e.g., ticket prices, registration links).

Data Research:
- Research additional event-related information where scraping is not applicable.
- Validate and cross-check collected data for accuracy and relevance.

Database Organization:
- Organize scraped data into a structured format (Excel, Google Sheets, or database software).
- Ensure data is properly categorized and searchable for future use.
14 days ago · 40 proposals · Remote
Python script to scrape 6 data fields
I need a Python script to scrape the following data (see attachment: 6 columns of data) from this website: https://apps.hcr.ny.gov/BuildingSearch/

First go to the website, then click "zipcode" at the upper right. You can see there are two dropdowns. The script needs to go through all the selections of these two dropdowns, then export a CSV file using a dataframe.

Note - I don't want to use any third-party proxy or reCAPTCHA resolver. I don't know if the website uses that kind of technology, so please check before sending me a request (mention that you checked it in the proposal).
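For illustration, a rough sketch of the dropdown-iteration approach with Selenium and pandas; the element IDs and results-table selector below are placeholders that would need to be checked against the actual page markup:

```python
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

driver = webdriver.Chrome()
driver.get("https://apps.hcr.ny.gov/BuildingSearch/")
driver.find_element(By.ID, "zipcode-tab").click()  # hypothetical ID for the "zipcode" control

records = []
count_a = len(Select(driver.find_element(By.ID, "dropdown-a")).options)  # hypothetical ID
for i in range(count_a):
    Select(driver.find_element(By.ID, "dropdown-a")).select_by_index(i)  # re-locate each pass
    count_b = len(Select(driver.find_element(By.ID, "dropdown-b")).options)  # hypothetical ID
    for j in range(count_b):
        Select(driver.find_element(By.ID, "dropdown-b")).select_by_index(j)
        driver.find_element(By.ID, "search-button").click()  # hypothetical ID
        # Collect the six result columns from the rendered table (placeholder selector).
        for row in driver.find_elements(By.CSS_SELECTOR, "table#results tbody tr"):
            records.append([cell.text for cell in row.find_elements(By.TAG_NAME, "td")])

pd.DataFrame(records).to_csv("buildings.csv", index=False)
driver.quit()
```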
19 days ago · 19 proposals · Remote
Python Developer for Web Monitoring Tool with GUI
Job Description: I am seeking an experienced Python developer to create a web monitoring tool that checks a specific website field for changes and alerts the user with a sound. The tool should also have a simple graphical user interface (GUI) using Tkinter to allow users to input the URL and field selector. The main requirements are as follows:

Requirements:
- Develop a Python script that fetches content from a given website URL and checks a specific field for changes.
- Use BeautifulSoup for parsing HTML and Requests for fetching the webpage.
- Implement a sound alert when a change is detected using the playsound or similar library.
- Create a simple Tkinter GUI that allows users to input the URL and field selector, and start the monitoring process.
- Ensure the script checks the field every 3 seconds.

Skills Required:
- Proficiency in Python programming.
- Experience with web scraping using BeautifulSoup and Requests libraries.
- Familiarity with creating simple GUIs using Tkinter.
- Ability to implement sound alerts in Python.

Project Details: The developer will need to incorporate placeholder text for the website URL and field selector, which will be replaced by the user. The tool should run continuously, checking for changes every 3 seconds and playing a sound alert when a change is detected.

How to Apply: Please provide examples of previous Python projects, especially those involving web scraping and GUI development. Include your estimated time to complete the project and your availability.
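For illustration, a minimal sketch of one way to structure the tool: Tkinter inputs for the URL and CSS selector, a 3-second poll via root.after, and a playsound alert when the selected field's text changes. The placeholder URL, selector, and sound file name are assumptions:

```python
import tkinter as tk
import requests
from bs4 import BeautifulSoup
from playsound import playsound


def fetch_field(url: str, selector: str) -> str:
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else ""


def poll():
    global last_value
    current = fetch_field(url_entry.get(), selector_entry.get())
    if last_value is not None and current != last_value:
        status.config(text="Change detected!")
        playsound("alert.wav")   # placeholder sound file
    last_value = current
    root.after(3000, poll)       # check again in 3 seconds


last_value = None
root = tk.Tk()
root.title("Field Monitor")
tk.Label(root, text="URL:").grid(row=0, column=0)
url_entry = tk.Entry(root, width=50)
url_entry.insert(0, "https://example.com")   # placeholder text to be replaced by the user
url_entry.grid(row=0, column=1)
tk.Label(root, text="CSS selector:").grid(row=1, column=0)
selector_entry = tk.Entry(root, width=50)
selector_entry.insert(0, "#price")           # placeholder text to be replaced by the user
selector_entry.grid(row=1, column=1)
tk.Button(root, text="Start monitoring", command=poll).grid(row=2, column=0, columnspan=2)
status = tk.Label(root, text="Idle")
status.grid(row=3, column=0, columnspan=2)
root.mainloop()
```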
12 hours ago · 7 proposals · Remote
Job Data Collection System Python (scraping)
Project Overview
We are seeking an experienced Python developer to optimize and enhance our job data collection system. The current Selenium-based approach needs to be replaced with a more efficient API-driven solution, incorporating sophisticated data management and robust error handling.

Key Requirements
- Strong Python programming skills with API integration experience
- Database design and implementation (PostgreSQL preferred)
- Experience with data versioning and delta tracking
- Familiarity with VPN handling for IP rotation
- Linux server deployment experience (Ubuntu)

Technical Specifications

Core Functionalities
1. API Integration
- Implement API-based job ID collection to replace the current Selenium approach
- Design an intelligent filtering system to manage data retrieval within API limitations
- Develop dynamic filter adjustment for optimal data collection
2. Database Design & Implementation
- Design and implement a PostgreSQL database structure
- Key data points to track: job IDs and metadata; first addition and update dates; full job details (JSON format); update tracking and versioning; job availability status
3. Data Management
- Implement delta versioning for historical tracking
- Design the system to handle regular job listing updates
- Ensure no data loss during updates
4. System Features
- Flexible time period selection for data retrieval
- Automatic filter optimization to work within API limitations
- IP rotation mechanism using NordVPN

Additional Requirements
- Comprehensive logging system
- Email notification system for errors and results
- Daily statistics tracking and reporting
- Server deployment on Ubuntu VPS

Technical Considerations
- System must handle large volumes of data efficiently
- Solution should be scalable and maintainable
- Must work within API rate limits and restrictions

Deliverables
1. Complete Python codebase
2. Database schema and implementation
3. Import of existing data
4. Deployment documentation
5. System documentation including error handling procedures

Skills Required
- Advanced Python programming
- API integration expertise
- Database design and optimization
- Linux server administration
- Network handling (VPN integration)

This is a complex project requiring a developer with strong system design skills and attention to detail. The ideal candidate will have experience with large-scale data collection and management systems.
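For illustration, one possible delta-versioning scheme in Python with psycopg2: a current-state table keyed by job ID plus a history table that gets a row only when the payload actually changes. The table, column, and DSN names are illustrative, not taken from the brief:

```python
import json
import psycopg2

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id        TEXT PRIMARY KEY,
    first_seen    TIMESTAMPTZ NOT NULL DEFAULT now(),
    last_updated  TIMESTAMPTZ NOT NULL DEFAULT now(),
    is_available  BOOLEAN     NOT NULL DEFAULT TRUE,
    details       JSONB       NOT NULL
);
CREATE TABLE IF NOT EXISTS job_versions (
    job_id      TEXT        NOT NULL REFERENCES jobs (job_id),
    version_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    details     JSONB       NOT NULL
);
"""

UPSERT = """
INSERT INTO jobs (job_id, details)
VALUES (%s, %s)
ON CONFLICT (job_id) DO UPDATE
    SET details = EXCLUDED.details,
        last_updated = now(),
        is_available = TRUE;
"""


def store_jobs(conn, jobs: dict[str, dict]) -> None:
    with conn.cursor() as cur:
        cur.execute(SCHEMA)
        for job_id, details in jobs.items():
            cur.execute("SELECT details FROM jobs WHERE job_id = %s", (job_id,))
            previous = cur.fetchone()
            cur.execute(UPSERT, (job_id, json.dumps(details)))
            # Delta versioning: only record a history row when the payload changed.
            if previous is None or previous[0] != details:
                cur.execute(
                    "INSERT INTO job_versions (job_id, details) VALUES (%s, %s)",
                    (job_id, json.dumps(details)),
                )
    conn.commit()


conn = psycopg2.connect("dbname=jobs user=collector")  # placeholder DSN
store_jobs(conn, {"12345": {"title": "Example role", "location": "Remote"}})
```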
a month ago · 10 proposals · Remote
Data scraping and Automation for lead generation
About Us
We are a leading London-based design agency specializing in branding and packaging for the drinks sector. We are looking to streamline our client acquisition process by automating data collection and enrichment from public company registers like Companies House. This project is essential for building a highly targeted database of potential clients.

Role Overview
We are seeking an experienced Automation Specialist to design and implement an automated process to extract, enrich, and manage client data. The successful candidate will set up a system that collects data from company registers, enriches it with relevant contact details, and integrates seamlessly with our custom CRM, which is already built.

Key Responsibilities
1. Data Extraction
- Develop and deploy a web scraping solution to extract company details (e.g., name, incorporation date, SIC codes, director name, date of birth, profession, etc.) from Companies House or similar public registers.
- Ensure the solution complies with data privacy laws and ethical scraping practices.
2. Data Enrichment
- Integrate the scraping tool with services like LinkedIn, Hunter.io, Apollo.io, or similar platforms to locate contact details (emails, LinkedIn profiles, etc.) for company directors.
- Validate and clean the data for accuracy.
3. Integration with CRM
- Set up automated workflows to input data into our CRM.
- Implement safeguards to prevent duplicate entries.

Required Skills and Experience
• Proven experience in web scraping using tools like Octoparse, ParseHub, Apify, or Python-based solutions (e.g., BeautifulSoup, Scrapy).
• Familiarity with APIs and data enrichment tools (e.g., LinkedIn API, Hunter.io, Apollo.io).
• Strong knowledge of CRM platforms (e.g., HubSpot, Salesforce) and automation tools like Zapier or Make (Integromat).
• Understanding of data privacy regulations (e.g., GDPR) and ethical scraping practices.
• Excellent problem-solving skills with the ability to design efficient workflows.
• Strong communication skills for creating SOPs and explaining technical concepts.

Preferred Qualifications
• Experience in lead generation or B2B sales processes.
• Background in the drinks, branding, or creative industries.
• Previous experience setting up similar automation projects for small or medium-sized businesses.

Project Deliverables
• Fully functional automated system for scraping and enriching company data.
• Seamless integration with our CRM, including workflows for data input and validation.
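For illustration, a short sketch of the register-extraction step against the public Companies House API (HTTP basic auth with the API key as the username). The endpoint paths reflect the public data API, but field and parameter names should be verified against the current documentation, and the API key and company number below are placeholders:

```python
import requests

API_KEY = "YOUR_COMPANIES_HOUSE_API_KEY"  # issued via the Companies House developer hub
BASE = "https://api.company-information.service.gov.uk"


def company_profile(company_number: str) -> dict:
    r = requests.get(f"{BASE}/company/{company_number}", auth=(API_KEY, ""), timeout=30)
    r.raise_for_status()
    return r.json()


def company_directors(company_number: str) -> list[dict]:
    # The officers list includes secretaries as well, so filter by officer_role.
    r = requests.get(f"{BASE}/company/{company_number}/officers", auth=(API_KEY, ""), timeout=30)
    r.raise_for_status()
    return [o for o in r.json().get("items", []) if o.get("officer_role") == "director"]


profile = company_profile("01234567")  # placeholder company number
for d in company_directors("01234567"):
    # These rows would then be enriched (LinkedIn, Hunter.io, etc.) and pushed to the CRM.
    print(profile.get("company_name"), d.get("name"), d.get("appointed_on"))
```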
a month ago · 38 proposals · Remote
I need a contact list of newly started up companies
I need a contact list of business owners who own newly started companies - so a list of companies that are 3 months to 1 year old (this is important). I want to contact these owners and send them something, so I need a contact name, a company name, a personal email address, an address, and a phone number.

These need to be UK-based, single-location SMEs, NOT multiple-outlet chains, companies in a group, or supermarket chains. They need to be independent, newly started companies.

This can be a list you already have of new start companies, or you can scrape it from a directory, website, or list.

Please provide your best price for this. I am interested in an initial 500 UK contacts but would like 2,000, then more later. My price is just the PPH suggested price at the moment, so please tell me what you can deliver and how much it will cost us. Thanks
5 days ago · 36 proposals · Remote
Skilled programmer who can extract specific data from Google
I need a skilled programmer who can extract specific data from Google search results on a daily basis. The extracted data should be delivered in CSV format.

Ideal Skills and Experience:
- Proficient in web scraping and data manipulation
- Strong understanding of Google’s search algorithms and structures
- Experience handling large datasets
- Attention to detail

Skills: Excel, Web Scraping, Software Architecture, Data Mining
a month ago · 17 proposals · Remote · opportunity
Develop AI Agent for Automated Legal Case Processing
I'm a lawyer seeking an experienced developer to create an AI agent that can automatically collect, process, and integrate court decisions from Croatian legal websites into our database.

The agent should be able to:
1. Scrape data from specific Croatian legal websites, including https://e-oglasna.pravosudje.hr/
2. Navigate through search interfaces and handle dynamic content
3. Download and process PDF documents containing court decisions
4. Extract relevant information from these documents using NLP techniques
5. Categorize and index the decisions based on predefined legal areas and keywords
6. Integrate the processed information into our existing legal database

Required Skills and Experience:
- Proficient in Python, with expertise in web scraping libraries (e.g., Scrapy, Selenium)
- Experience with PDF processing libraries (e.g., PyPDF2, pdfminer)
- Strong background in Natural Language Processing (NLP) using libraries like NLTK or spaCy
- Familiarity with database management and indexing (e.g., SQL, Elasticsearch)
- Experience in developing AI/ML models for text classification and information extraction
- Knowledge of web technologies and ability to handle dynamic content and CAPTCHAs
- Understanding of data privacy and security best practices
- Ability to work with Croatian language text (knowledge of Croatian is a plus but not mandatory)
- Experience with legal documents or similar text-heavy domains is advantageous

Deliverables:
1. A fully functional AI agent meeting the above requirements
2. Comprehensive documentation and user guide
3. Source code with clear comments
4. A report detailing the methodology, challenges, and potential improvements

Please provide examples of similar projects you've worked on, especially those involving web scraping, PDF processing, or legal document analysis. Include your estimated timeline and budget for this project.

Note: The successful candidate must be willing to sign a non-disclosure agreement due to the sensitive nature of legal data.
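For illustration, a sketch of the download-extract-categorize step only, using pdfminer.six for text extraction and a naive keyword lookup standing in for the eventual NLP classifier; the category keywords, output fields, and PDF URL are illustrative, not from the brief:

```python
import requests
from pdfminer.high_level import extract_text

# Illustrative legal-area keywords (the real Croatian term lists would be agreed with the client).
CATEGORIES = {
    "insolvency": ["stečaj", "likvidacija"],
    "commercial": ["trgovački", "ugovor"],
    "criminal": ["kazneni"],
}


def process_decision(pdf_url: str) -> dict:
    # Download the decision PDF and pull plain text out of it.
    pdf_path = "decision.pdf"
    with open(pdf_path, "wb") as f:
        f.write(requests.get(pdf_url, timeout=60).content)
    text = extract_text(pdf_path)

    # Naive keyword categorization; a trained classifier (e.g., spaCy) would replace this.
    matched = [cat for cat, words in CATEGORIES.items()
               if any(w.lower() in text.lower() for w in words)]
    return {"url": pdf_url, "categories": matched or ["uncategorized"], "text": text}


record = process_decision("https://example.hr/decisions/12345.pdf")  # placeholder URL
print(record["categories"])
```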
a month ago · 26 proposals · Remote
Lead Generation EMAIL DATA EXPERT
Job Description: Email Collection Specialist

We are seeking a skilled and reliable data specialist to enrich an existing list of 5,000–10,000 leads, extracted from Sales Navigator, by adding their verified work email addresses. The list is provided in CSV format.

Responsibilities:
- Scrape, find, or validate work email addresses for the provided list of leads.
- Ensure all email addresses are accurate, valid, and active.
- Deliver a final, enriched CSV file with added email addresses.
- Maintain strict adherence to data privacy and compliance laws.

Qualifications:
- Proven experience in web scraping or email enrichment.
- Proficiency in tools like Hunter.io, Apollo, or other email-finding platforms.
- Familiarity with Sales Navigator and CRM integration is a plus.
- Understanding of GDPR and CAN-SPAM compliance.

Deliverables:
- Enriched CSV file with verified work email addresses.
- Brief report on the tools and methods used for data collection.
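For illustration, a sketch of one enrichment pass. It assumes the input CSV has first_name, last_name, and company_domain columns (an assumption about the Sales Navigator export, not stated in the brief) and uses Hunter.io's v2 email-finder endpoint; the endpoint parameters and response fields should be checked against Hunter's current API docs and rate limits before running at volume:

```python
import pandas as pd
import requests

HUNTER_API_KEY = "YOUR_HUNTER_API_KEY"  # placeholder


def find_email(first_name: str, last_name: str, domain: str):
    resp = requests.get(
        "https://api.hunter.io/v2/email-finder",
        params={"first_name": first_name, "last_name": last_name,
                "domain": domain, "api_key": HUNTER_API_KEY},
        timeout=30,
    )
    data = resp.json().get("data") or {}
    return data.get("email"), data.get("score")  # score = Hunter's confidence estimate


leads = pd.read_csv("sales_nav_export.csv")  # placeholder file and column names
leads[["work_email", "email_confidence"]] = leads.apply(
    lambda row: pd.Series(find_email(row["first_name"], row["last_name"], row["company_domain"])),
    axis=1,
)
leads.to_csv("leads_enriched.csv", index=False)
```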
24 days ago · 40 proposals · Remote
Virtual Assistant for Lead Generation and Data Entry
We are seeking a detail-oriented Virtual Assistant to help build a database of potential clients for our design agency. The role involves researching companies registered in the last 3 years under specific SIC codes (e.g., spirits and alcohol brands), identifying their directors, and finding their LinkedIn profiles or contact details. All information will need to be accurately entered into our CRM system.

Responsibilities:
• Research companies on public registers (e.g., Companies House).
• Locate contact details for company directors via LinkedIn or other sources.
• Input and organize data into our CRM.
• Maintain high levels of accuracy and efficiency.

Requirements:
• Strong research and data entry skills.
• Experience with LinkedIn and online research tools.
• Familiarity with CRM systems or data scraping.
• Reliable and able to meet deadlines.
a month ago · 92 proposals · Remote · opportunity · urgent
List of aesthetic clinics in the UK
Job Title: Data Collection and Research for UK Aesthetic Clinics Offering Botulinum Toxin Treatments

Job Description: We are seeking a diligent and detail-oriented researcher to compile an accurate and comprehensive list of UK aesthetic clinics that specifically deliver botulinum toxin injections, also known as Botox. The data should be well-structured and exclude dental clinics. Below are the requirements and key points for this project.

Scope of Work:

Objective: Compile a list of UK-based aesthetic clinics offering botulinum toxin injections, excluding dental clinics.

Data Requirements: For each clinic identified, collect the following details:
- Clinic Name
- Clinic Location (Town)
- Clinic Full Address
- Business Website
- Clinic Phone Number
- Number of Staff (if available on the website)
- Treatments Offered (scraped from the website)
- Key Personnel Information: name(s) of the clinical director, managing director, clinic owner, doctor, or dentist associated with the clinic
- Google Ratings Count and Average Review Score (if easily accessible)

Verification of Data Accuracy:
- Include a column or note on how the data was verified (e.g., direct website information, verified phone calls, or reliable secondary sources).
- Include a column for missing data points and a note on why they couldn’t be retrieved (e.g., no website, no phone number listed).

Exclusions: Do not include dental clinics that also offer aesthetic treatments.

Methodology Suggestions:
- Primary Tools: Use specific tools like Google Maps, LinkedIn, and official clinic websites to gather information. Supplement with reliable business directories or professional registries as needed.
- Search Method: Perform a targeted search on Google Maps using terms like “aesthetic clinic.” Cross-reference information with clinic websites to confirm that "botulinum toxin" or "toxin" treatments are mentioned on their website or elsewhere.

Formatting:
- The final dataset should be delivered in Excel format with clean and consistent formatting for key fields.
- Organize the data into logical categories (e.g., by region, town, or clinic size).

Deliverables:
- A structured Excel database containing all required data fields (outlined above).
- Clearly labeled columns, including one for missing data and one indicating how each data point was verified.
- Associated Google Ratings count and average review score where available.
- A brief explanation of the methodology used and any challenges encountered during data collection.

Volume Expectation: Based on market reports for this sector, we estimate up to 20,000 clinics may need to be identified during this search. Please confirm if this scope is manageable and suggest how you might structure the process for efficiency.

Ideal Candidate Requirements:
- Experience in web scraping, data collection, and research.
- Familiarity with tools for gathering and structuring data.
- Attention to detail to ensure exclusions (such as dental clinics) are implemented correctly.
- Ability to work efficiently and deliver accurate data within agreed timelines.

Additional Considerations:
- If you encounter discrepancies or unexpected patterns in the data (e.g., duplicate clinics, regional oversaturation), highlight them in your submission.
- Provide suggestions or feedback on improving the search methodology based on your expertise.

Budget: Open to proposals, based on the scope and timeframe.
Timeline: Ideally within 1-2 weeks.

If you are interested, please provide examples of similar projects you have completed and a brief summary of how you would approach this task. Thank you.
17 days ago · 56 proposals · Remote · Expires in 13
Past "Scraping" Projects
Data Scraping Project - quick turnaround
Brief for Data Scraping Project

Objective: To collect accurate and up-to-date contact information for specific roles within UK Local Authorities (Councils), Virtual Schools, and SENCOs. This data will be used for targeted outreach efforts, particularly for CFOs, School Business Managers (SBMs), and finance-related personnel, as well as the Head Teachers or relevant individuals responsible for Virtual Schools within Local Authorities.

Data Requirements:
At a minimum, we require:
- First Name
- Last Name
- Local Authority Name
- Email Address

Additionally, the following data points are desirable for enhanced targeting:
- Job Title
- Postal Address (with postcode)
- Phone Number
- Website URL

Key Targets:
- Local Authorities: CFOs, Finance Directors, School Business Managers (SBMs), or equivalent financial roles.
- Virtual Schools: Head Teachers or the individual at the Local Authority responsible for overseeing the Virtual School.
- SENCOs (Special Educational Needs Coordinators): SEN-specific financial or operational contacts within schools or Local Authorities.

Delivery Expectations:
- Data should be accurate and validated to ensure no duplicate or outdated entries.
- Organized in a clean and accessible format (e.g., Excel or CSV file).
- Include segmentation by role, region, or other relevant categories.

Additional Notes: If you have access to any other relevant data (e.g., Council size, school affiliations, etc.), please include it. This may help in tailoring outreach campaigns across multiple channels beyond email, such as direct mail or telemarketing.

Please confirm your ability to meet the requirements, delivery timeline, and any associated costs.