Crawlers Projects
Looking for freelance Crawlers jobs and project work? PeoplePerHour has you covered.
Past "Crawlers" Projects
DevOps engineer
We are looking for a DevOps professional freelancer to support us in maintaining, troubleshooting and optimising a domain crawler written in Python. You'll need to first deploy the code on whatever machine you like, then make sure you can run it. Get to know how the code works, then help us run it in production on our own server. Other tasks will be to troubleshoot issues it has, suggest optimisations (e.g. better scaling) and add new features. The code is in GitLab, which you'll be given access to. We are looking for someone to work with us for the long term, so we would like to find someone friendly and with excellent English.
Automated collection of business data with email
We are an email agency looking for a reliable and experienced data collector who can collect data from Google Maps based on industry and country. You are expected to research the sub-industries provided (for example manufacturing > steel manufacturing, automotive manufacturing, technology manufacturing etc.), for which you may use alternative directories; however, it is vital that all data is gathered from Google Maps only. All contacts must have a verified email (we cannot accept contacts without an email or with emails which have not been verified; this means you will probably have to collect roughly 2/3 more than the total amount, as you will lose unsuitable contacts). Please start your proposal by stating what verification software you will use (a full report is expected with the data). Some jobs may require additional manual research work which can only be completed by manually checking each site (we pay a premium for those records; if you do have the ability to collect manually, please state this in your proposal). This is ongoing work across various jobs. Please base your quote on the cost per 1k records (most orders will be 5-10k in total). IMPORTANT! Due to the expected volumes and delivery times you must be in a position to automate this job using relevant crawlers. Please do not reply if you only collect manually.
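For reference, one way a collector might pre-filter contacts before running them through a verification service is to check that each email's domain actually publishes MX records. A minimal Python sketch, assuming the dnspython package; this is a cheap filter, not full verification:

```python
# Rough pre-filter for collected contacts: keep only emails whose domain
# publishes MX records. This is NOT full verification (no SMTP check),
# just a cheap way to drop obviously undeliverable addresses.
# Assumes: pip install dnspython
import re
import dns.resolver

EMAIL_RE = re.compile(r"^[^@\s]+@([^@\s]+\.[^@\s]+)$")

def has_mx(domain: str) -> bool:
    try:
        return len(dns.resolver.resolve(domain, "MX")) > 0
    except Exception:
        return False

def prefilter(contacts: list[dict]) -> list[dict]:
    kept = []
    for c in contacts:
        m = EMAIL_RE.match(c.get("email", ""))
        if m and has_mx(m.group(1)):
            kept.append(c)
    return kept

if __name__ == "__main__":
    sample = [{"name": "Acme Steel", "email": "info@example.com"}]  # hypothetical contact
    print(prefilter(sample))
```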
Google Ads / Analytics Event /Conversion set up
We need to set up events/conversions for 4 ad landing pages and enquiry forms on the website, kept hidden from Google crawlers.
Advanced Python developer / Data scientist
We are looking for someone to help us with a web crawler built in Python, potentially as long-term support in the Net Knowledge team. We need someone to go through the code of the project and understand how it works in fine detail. Once knowledgeable in the code, the person should run the crawler in production on a monthly basis over millions of domain names provided via our system and resolve any issues that come up. The person should have strong skills in Python and data science, as well as excellent problem-solving skills and a high level of English. Ideally you should also have knowledge and experience in infrastructure, such as setting up servers, AWS and databases.
About us: We are a very small team (2) of data analysts with clients in several locations around the world, working exclusively in the industry of domain name registrations and in particular Top Level Domain Registries. We do a lot of analysis and reporting on trends in domain registrations and how domains are being used (hence why we run a crawler). As we are a small team we need someone who is friendly, proactive and has excellent communication skills.
Web Crawling and public data collection.
We are looking to develop a simple web crawler that can extract and compile contact information for businesses based on geographic location. We simply want to compile: business name, address, city, state/province, phone and main email address.
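A minimal sketch of the extraction side of such a crawler, assuming requests and BeautifulSoup; a real directory source would need per-site selectors rather than the generic regexes used here, and the example URL is hypothetical:

```python
# Sketch: fetch a business page and extract a phone number and email
# address with simple regexes. Real directories need per-site selectors.
# Assumes: pip install requests beautifulsoup4
import re
import requests
from bs4 import BeautifulSoup

PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def extract_contacts(url: str) -> dict:
    html = requests.get(url, timeout=30).text
    text = BeautifulSoup(html, "html.parser").get_text(" ")
    phone = PHONE_RE.search(text)
    email = EMAIL_RE.search(text)
    return {
        "url": url,
        "phone": phone.group(0) if phone else None,
        "email": email.group(0) if email else None,
    }

# Example (hypothetical URL):
# print(extract_contacts("https://example.com/contact"))
```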
Chrome extension
I need to create a Chrome extension. What the extension should do:
1. Load the parser/crawler library from the backend (a JS file). Send the extension version number as well, so that the correct JS file is returned.
2. Run the library (app with parsers). The first version covers the 10 largest ecommerce shops. We start with Amazon.com. Eventually 1,000 shops will be covered, across all countries.
3. Execute the Amazon-tailored crawler if the user is on any Amazon website, e.g. Amazon.com. The code baseline should be modular so that we can easily add more crawlers. Eventually, there will be 1,000 crawlers.
5. The Amazon crawler parses the page's ASIN codes (product IDs).
6. It adds the ASIN codes to an array and sends the array to the backend REST API.
7. There are two cases that determine whether the stamp is shown and which details are available for the product on that page.
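The extension itself would be JavaScript, but to illustrate steps 6-7, here is a minimal sketch of a backend REST endpoint that could accept the ASIN array (Flask assumed; the route name and payload shape are hypothetical):

```python
# Hypothetical backend endpoint that accepts the ASIN array sent by the
# extension (step 6) and returns which ASINs have details available (step 7).
# Assumes: pip install flask
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.post("/api/asins")
def receive_asins():
    payload = request.get_json(force=True) or {}
    asins = payload.get("asins", [])
    version = payload.get("extension_version", "unknown")
    # Placeholder lookup: in reality this would query the product database.
    known = {a: {"stamp": True, "details": "available"} for a in asins}
    return jsonify({"extension_version": version, "results": known})

if __name__ == "__main__":
    app.run(port=8000, debug=True)
```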
Build Web Crawler to Find Websites Not Blogging
Hello. I need a tool that finds business websites that have a blog which was active in the past, but where no new blog posts have been published within a given time period. I want to send cold emails to .co.uk businesses that don't have the time to blog, so we can promote our blog packages. Currently this is done manually, which is time-consuming. I want a bot / web crawler that can find these for us. If the bot can also automate any other parts of the process (i.e. find a contact email or send the email), I would love to hear what's possible. Thanks. Bob
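One way such a bot could flag stale blogs is to locate the site's RSS/Atom feed and check the date of its newest entry. A minimal sketch assuming the feedparser package; the feed paths and the staleness threshold are assumptions:

```python
# Sketch: decide whether a site's blog looks "stale" by checking the date
# of the most recent entry in its RSS/Atom feed.
# Assumes: pip install feedparser
import time
import feedparser

FEED_PATHS = ["/feed", "/blog/feed", "/rss.xml", "/atom.xml"]  # common guesses
STALE_AFTER_DAYS = 180  # assumed threshold

def is_blog_stale(base_url: str) -> bool | None:
    """True = stale, False = active, None = no dated feed found."""
    for path in FEED_PATHS:
        feed = feedparser.parse(base_url.rstrip("/") + path)
        if feed.entries:
            dates = [
                time.mktime(e.published_parsed)
                for e in feed.entries
                if getattr(e, "published_parsed", None)
            ]
            if not dates:
                return None
            age_days = (time.time() - max(dates)) / 86400
            return age_days > STALE_AFTER_DAYS
    return None

# Example (hypothetical domain):
# print(is_blog_stale("https://example.co.uk"))
```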
Data Aggregation from Public Real Estate Records Website
Project Description: I am seeking a freelancer to assist in pulling data from the publicly available website Chescoviews.org (or the tax assessor's office), which provides access to real estate public record data for Chester County, PA. The website presents data through interactive maps, allowing users to zoom in on specific areas such as roads, neighborhoods, and residential parcels. Each parcel on the map provides information about the owner and various tax details upon clicking. The task involves aggregating ownership and mailing addresses for specific geofences within Chester County, along with related form data such as lot size, last sale price, etc. This can be achieved either manually or through the automation of a script that simulates clicks on specific links within the map to capture the required form data. Ideally, once the data is captured, a crawler or similar tool could be employed to match, identify, and capture email addresses or other demographic data traits associated with each homeowner in the list. The initial target area encompasses approximately 1,830 homes within Chester County, PA. I am open to suggestions for more efficient methods or additional sources of information beyond Chescoviews.org.
Key Deliverables:
- Aggregated list of homeowner names and mailing addresses within specified geofences.
- If feasible, email addresses and/or mobile phone numbers associated with each homeowner on the list.
- Documentation outlining the methodology used and any scripts developed for automation.
- Recommendations for alternative or complementary sources of information for more efficient data collection.
- The script should be reusable and adjustable to the requirements of other interactive maps.
Skills Required:
- Web scraping
- Scripting/automation (e.g., Python)
- Data aggregation and analysis
- Familiarity with real estate records and public data sources
Additional Information: The freelancer may need to navigate the website and understand its structure to devise an effective data extraction strategy. Experience with similar projects or familiarity with Chester County's real estate landscape would be advantageous. Please provide an estimated timeline for completing the project, including any potential challenges or dependencies. Clear communication and regular updates on progress are essential throughout the project duration.
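A very rough sketch of the click-automation approach described above, using Selenium; every selector below is a hypothetical placeholder that would need to be replaced after inspecting the actual site's markup:

```python
# Sketch: open the map page, click each parcel link, and capture the fields
# from its details panel into a CSV. Selectors are hypothetical placeholders.
# Assumes: pip install selenium (with a Chrome driver available)
import csv
from selenium import webdriver
from selenium.webdriver.common.by import By

def scrape_parcels(start_url: str, out_path: str = "parcels.csv") -> None:
    driver = webdriver.Chrome()
    driver.get(start_url)
    rows = []
    # Hypothetical: each parcel link opens a details panel with labelled fields.
    for link in driver.find_elements(By.CSS_SELECTOR, "a.parcel-link"):
        link.click()
        panel = driver.find_element(By.CSS_SELECTOR, "div.parcel-details")
        rows.append({
            "owner": panel.find_element(By.CSS_SELECTOR, ".owner").text,
            "mailing_address": panel.find_element(By.CSS_SELECTOR, ".mailing").text,
            "lot_size": panel.find_element(By.CSS_SELECTOR, ".lot-size").text,
            "last_sale_price": panel.find_element(By.CSS_SELECTOR, ".sale-price").text,
        })
    driver.quit()
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys() if rows else ["owner"])
        writer.writeheader()
        writer.writerows(rows)
```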
Website scraping tool
# Composite Use Case for myClerk.ai Web Scraping Tool Development

## Project Overview
The myClerk.ai project aims to automate the collection, organization, and monthly update of documents from approximately 10,827 UK council websites, including 10,450 parish and town councils and 377 larger councils. This initiative seeks to make council documents easily accessible and searchable, covering essential materials such as constitutional documents, terms of reference, minutes, and planning documents.

## Objectives
- **Automate Document Extraction:** Develop a scraping tool to automate the retrieval of PDF documents across varied council websites, accounting for the unique structure and content of each site.
- **Efficient Data Organization:** Utilize council reference codes to systematically organize documents on a web server.
- **Monthly Updates:** Implement a mechanism to capture new documents on a monthly basis without duplicating existing files.
- **Link Monitoring and Notifications:** Create a system to track and report broken links and facilitate updates or notifications to site administrators.
- **Data Categorization for Larger Councils:** Classify documents on larger council websites for more efficient retrieval and analysis.

## Database Structure
The development leverages a hybrid database approach:
- **Relational Database (PostgreSQL):** Hosts a comprehensive list of councils and their metadata, crucial for guiding the scraping tool to the correct websites for document extraction.
- **Vector Database:** Reserved for storing processed text from PDFs for content-based searches, but note that this element is separate from the scraping tool task.

## Suggested Technologies
- **Web Scraping and Data Organization:** Python, with libraries such as BeautifulSoup, Scrapy, and Requests for web scraping and automation. AWS S3 for document storage and PostgreSQL on AWS RDS for data management.
- **Server and Hosting:** AWS Lambda for cost-effective routine downloading tasks and Amazon Aurora Serverless for RDS to dynamically adjust computational capacity.
- **Notification System:** AWS Lambda and SNS for monitoring and identifying broken links, sending notifications for action.

## Crawling and Scraping Process
- **Crawling:** Implement a depth-controlled crawler to navigate each council's website, identifying webpages with PDF links at all levels.
- **Scraping and Downloading:** Post-crawling, the tool will scrape the identified PDFs, checking against previous downloads to avoid duplication. The tool is designed to adapt to the diverse web structures of council sites, ensuring comprehensive document retrieval.

## Monthly Update Cycle
- The tool will perform a complete cycle each month, identifying and downloading new or updated documents based on changes in file details, thereby keeping the database current without accumulating duplicates.

## Development and Testing
- Prior to full deployment, the scraper will undergo a testing phase on a selection of websites to refine its operation, gradually scaling up to include the full range of targeted sites.

Timescale: basic model for testing to be delivered as soon as possible; a number of weeks can be allowed for the full model, including deploying it to the host web server and connecting to the database.
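To illustrate the crawling and scraping process described above, a minimal depth-controlled crawler that collects PDF links and skips files already downloaded (requests and BeautifulSoup assumed; the real tool would add the PostgreSQL/S3 plumbing from the spec, and the JSON "seen" file is only a stand-in):

```python
# Minimal depth-limited crawler: walk a council site, collect PDF links,
# and download only files not seen in previous runs (dedup by URL + size).
# Assumes: pip install requests beautifulsoup4
import hashlib
import json
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

SEEN_FILE = "seen_pdfs.json"  # stand-in for the PostgreSQL metadata table

def crawl_pdfs(start_url: str, max_depth: int = 2, out_dir: str = "pdfs") -> None:
    os.makedirs(out_dir, exist_ok=True)
    seen = json.load(open(SEEN_FILE)) if os.path.exists(SEEN_FILE) else {}
    domain = urlparse(start_url).netloc
    queue, visited = [(start_url, 0)], set()

    while queue:
        url, depth = queue.pop(0)
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        try:
            resp = requests.get(url, timeout=30)
        except requests.RequestException:
            continue
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            if link.lower().endswith(".pdf"):
                head = requests.head(link, timeout=30)
                key = f"{link}|{head.headers.get('Content-Length', '')}"
                if key in seen:
                    continue  # unchanged since the last monthly run
                name = hashlib.sha1(link.encode()).hexdigest() + ".pdf"
                with open(os.path.join(out_dir, name), "wb") as f:
                    f.write(requests.get(link, timeout=60).content)
                seen[key] = name
            elif urlparse(link).netloc == domain:
                queue.append((link, depth + 1))

    json.dump(seen, open(SEEN_FILE, "w"), indent=2)
```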
Create a Python-based web scraper
Build and deploy a Python-based web application that can:
• Crawl a wide range of corporate and news websites for pages and documents containing relevant information, based on search terms that can be either set or manually input.
• Be set to run automatically at set frequencies or be run manually.
• Be broadly as effective as a reasonably comprehensive Google search, e.g. return the majority of the results from the equivalent first two Google results pages.
• Scrape those pages and documents for the relevant information (predominantly text strings) and compile it into some sort of data lake/unstructured store ready for later analysis and structuring.
The application should be cloud hosted and the freelancer will also need to set this hosting up (install Python virtually etc.) and hand it over to me with the necessary demonstrations/explanations. Additional information and examples can be shared with interested freelancers following signature of an NDA. Further work may follow if this project is successful. Please indicate in your response how many hours you expect this work to take.
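A minimal sketch of the crawl-and-store loop this describes, assuming requests, BeautifulSoup and the schedule package; the search terms, seed URLs and local "data lake" folder are placeholders for the real configuration and cloud storage:

```python
# Sketch: fetch a set of seed pages, keep any whose text matches the
# configured search terms, and append the raw result to a local "data lake"
# folder as JSON (a stand-in for S3 or similar unstructured storage).
# Assumes: pip install requests beautifulsoup4 schedule
import json
import time
from datetime import datetime, timezone
from pathlib import Path

import requests
import schedule
from bs4 import BeautifulSoup

SEARCH_TERMS = ["sustainability report", "annual results"]  # assumed examples
SEED_URLS = ["https://example.com/news"]                     # assumed examples
LAKE_DIR = Path("data_lake")

def run_crawl() -> None:
    LAKE_DIR.mkdir(exist_ok=True)
    for url in SEED_URLS:
        try:
            html = requests.get(url, timeout=30).text
        except requests.RequestException:
            continue
        text = BeautifulSoup(html, "html.parser").get_text(" ")
        hits = [t for t in SEARCH_TERMS if t.lower() in text.lower()]
        if hits:
            record = {
                "url": url,
                "fetched_at": datetime.now(timezone.utc).isoformat(),
                "matched_terms": hits,
                "raw_text": text,
            }
            out = LAKE_DIR / f"{abs(hash(url))}_{int(time.time())}.json"
            out.write_text(json.dumps(record))

if __name__ == "__main__":
    schedule.every().day.at("06:00").do(run_crawl)  # "set frequency" mode
    run_crawl()                                      # or run once manually
    while True:
        schedule.run_pending()
        time.sleep(60)
```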
Directory website crawler - automation
I have a business directory website. Rather than add business listings manually, I need some type of automation that fetches info from the web and submits it to the website. There is not a database to be scraped; the input for said automation will be a list of web domains. The site is built on WordPress and is self-hosted. The price set is just arbitrary, as the scope needs to be fine-tuned. You will obviously need more info, so please reply with any questions if you are interested in the project. If you are not very confident in English, please DO NOT reply. This is a complicated project and communication is key.
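For the "submits to the website" half, one possible route is the WordPress REST API with an application password. A minimal sketch assuming the requests package; directory plugins usually register their own post type or endpoint, so /wp/v2/posts and all credentials below are placeholders:

```python
# Sketch: create a listing on a self-hosted WordPress site via its REST API
# using an application password. The /wp/v2/posts endpoint is a placeholder;
# a directory plugin would typically expose its own listing endpoint.
# Assumes: pip install requests
import requests

WP_SITE = "https://example-directory.com"        # hypothetical site
WP_USER = "api-user"                             # hypothetical credentials
WP_APP_PASSWORD = "xxxx xxxx xxxx xxxx"

def submit_listing(title: str, description: str, website: str) -> int:
    resp = requests.post(
        f"{WP_SITE}/wp-json/wp/v2/posts",
        auth=(WP_USER, WP_APP_PASSWORD),
        json={
            "title": title,
            "content": f"{description}\n\nWebsite: {website}",
            "status": "draft",  # draft so listings can be reviewed first
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["id"]

# Example (hypothetical data):
# submit_listing("Acme Plumbing", "Fetched summary text...", "https://acme.example")
```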
Video Game HUD/UI/UX designer needed for Unreal Engine 5
I am looking for a HUD/UI/UX designer for our multiplayer dungeon crawler game. We already have a large asset pack of files and icons; we just need someone with a keen eye to put it all together. Everything we have is currently in a prototype design and we need it to have a nice professional feel with the new assets we will provide. Please check the pictures for some examples of the current menus we have. Layouts needed:
- HUD improvements and improved layout
- Inventory/Equipment/Stash/Container
- In-game menu
- Tip window
- Interaction window
- Main menu windows: Customize/Skills/Shop/Settings/Stash
- In-game pop-ups
- In-game notes
Please reach out with any examples of your previous work. Thank you.
Website SEO Analysis and Competitor Analysis - Detailed
I have a website focused on a local service (like 'plumbers new york', but it's not that). I know which keywords are relevant and have some SEO done on the site. The site is 10+ years old. It has a good number of backlinks, many from good domains. We are being outranked by newer, thinner sites with fewer backlinks. Why?
Things I want done / questions I want answered:
- Competitor analysis: what keywords are they using that we are not? What backlinks do they have that are causing them to outrank us?
- Backlink analysis: analyse the current backlinks. Tell me if there are any issues with the backlink profile (does it look spammy etc.).
- On-page SEO: any suggested changes. Is there something about the site or its structure causing poor ranking? How does it look to the crawlers?
- Off-page SEO: what changes would you make?
- Internal linking structure: any suggested changes?
- Look into the .htaccess to see if that's causing any issues.
After all this is done, I would like to know why certain sites are performing better (their backlinks? their site speed?), and some action steps I can take to improve rankings for my main keyword or keywords. When quoting, please bear in mind that, while this reporting piece is a one-off project, we will likely need help in implementing your recommendations.
SEO for React PWA and database
We have developed an Ionic React single-page app with a large database and want to develop a way for search engines to index the millions of business profiles and postings within the app. We will provide further details to a developer we feel is qualified and negotiate the price. To update this description of our needs: our app is developed to promote other businesses locally. Our database currently contains over a million business listings attached to business categories and keywords, some with web links. We want search engines to find and surface these listings when users search by location. Our success is in helping businesses get found in their local community. To further update: we don't believe that standard SEO techniques will be adequate for this project, as a React SPA is not crawled in the traditional way. We need the data from our database presented for crawling. The successful freelancer should give us some idea of how they may achieve this, as the SPA will not render all the data from our database based on standard SEO programming.
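One common approach is to serve crawler-friendly, server-rendered pages (plus a sitemap) generated directly from the database, alongside the SPA. A minimal sketch assuming Flask and SQLite; the table and column names, URLs and database are all assumptions:

```python
# Sketch: serve simple server-rendered HTML per listing, plus a sitemap,
# straight from the database so crawlers can index content the SPA alone
# would not render. Table/column names are hypothetical.
# Assumes: pip install flask
import sqlite3
from flask import Flask, Response, abort

app = Flask(__name__)
DB = "listings.db"  # hypothetical; stands in for the production database

def query(sql, args=()):
    with sqlite3.connect(DB) as con:
        con.row_factory = sqlite3.Row
        return con.execute(sql, args).fetchall()

@app.get("/biz/<int:listing_id>")
def listing_page(listing_id: int):
    rows = query("SELECT name, category, city, description FROM listings WHERE id = ?",
                 (listing_id,))
    if not rows:
        abort(404)
    b = rows[0]
    return (
        f"<html><head><title>{b['name']} - {b['city']}</title>"
        f"<meta name='description' content='{b['category']} in {b['city']}'></head>"
        f"<body><h1>{b['name']}</h1><p>{b['description']}</p></body></html>"
    )

@app.get("/sitemap.xml")
def sitemap():
    urls = "".join(
        f"<url><loc>https://example.app/biz/{r['id']}</loc></url>"
        for r in query("SELECT id FROM listings")
    )
    xml = ('<?xml version="1.0" encoding="UTF-8"?>'
           f'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">{urls}</urlset>')
    return Response(xml, mimetype="application/xml")
```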
I need a web crawler program created for driving school purposes
I need an expert programmer who can create software to extract data from a website. I need to be able to use this software with ease, and it should be able to crawl the website every few seconds.
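A minimal sketch of a polling crawler that fetches a page every few seconds and reports when its content changes, assuming the requests package; the URL and interval are placeholders, and the target site's terms and rate limits should be respected:

```python
# Sketch: poll a page at a fixed interval and report when its content changes.
# Assumes: pip install requests
import hashlib
import time
import requests

URL = "https://example.com/booking-slots"  # hypothetical target page
INTERVAL_SECONDS = 5

def poll() -> None:
    last_hash = None
    while True:
        try:
            body = requests.get(URL, timeout=10).text
            digest = hashlib.sha256(body.encode()).hexdigest()
            if digest != last_hash:
                print(f"{time.strftime('%H:%M:%S')} page changed")
                last_hash = digest
        except requests.RequestException as exc:
            print("fetch failed:", exc)
        time.sleep(INTERVAL_SECONDS)

if __name__ == "__main__":
    poll()
```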
Build crawler to read entries from page
Hi PPH Community, I need a crawler built that reads the following page: https://die-kfzgutachter.de/sachverstaendiger-kfz-gutachter-aura-97717.htm I would like to get the following information from the page:
1. Name (in green text)
2. Street
3. ZIP
4. City
5. Phone
6. Website
There should be around 6,200 entries. The results should be written, one column per field, into a Google Sheet. Best regards, Daniel
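A minimal sketch of one way to do this in Python, assuming requests, BeautifulSoup and gspread with a Google service account; the CSS selectors and sheet name are hypothetical and would need adjusting to the real markup:

```python
# Sketch: read one listing page, pull the six requested fields, and append
# them as a row to a Google Sheet. Selectors are hypothetical placeholders.
# Assumes: pip install requests beautifulsoup4 gspread (+ a service account)
import gspread
import requests
from bs4 import BeautifulSoup

def scrape_entry(url: str) -> list[str]:
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")

    def text(selector: str) -> str:
        el = soup.select_one(selector)
        return el.get_text(strip=True) if el else ""

    return [
        text("h1.name"),          # 1. name in green text (placeholder selector)
        text(".address .street"),  # 2. street
        text(".address .zip"),     # 3. ZIP
        text(".address .city"),    # 4. city
        text(".contact .phone"),   # 5. phone
        text(".contact .website"), # 6. website
    ]

def append_to_sheet(row: list[str]) -> None:
    gc = gspread.service_account(filename="service_account.json")
    sheet = gc.open("Gutachter Entries").sheet1  # hypothetical sheet name
    sheet.append_row(row)

# append_to_sheet(scrape_entry(
#     "https://die-kfzgutachter.de/sachverstaendiger-kfz-gutachter-aura-97717.htm"))
```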
I need a site crawled to generate a CSV file
I'm looking for a site crawler to crawl https://www.cutpricewholesaler.com/ I would like the following data for every product listed, in CSV format: Title, SKU, Barcode, Unit Price, Minimum Quantity, Web link to the listing. All of this information is in the same format on every listing.
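A minimal sketch of such an export in Python, assuming requests and BeautifulSoup; the selectors are placeholders to be replaced once the product page markup is known:

```python
# Sketch: walk a list of product URLs and write the requested columns to CSV.
# Selectors are hypothetical placeholders for the site's real markup.
# Assumes: pip install requests beautifulsoup4
import csv
import requests
from bs4 import BeautifulSoup

FIELDS = ["Title", "SKU", "Barcode", "Unit Price", "Minimum Quantity", "Web link"]

def scrape_product(url: str) -> dict:
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")

    def text(sel: str) -> str:
        el = soup.select_one(sel)
        return el.get_text(strip=True) if el else ""

    return {
        "Title": text("h1.product-title"),
        "SKU": text(".sku"),
        "Barcode": text(".barcode"),
        "Unit Price": text(".unit-price"),
        "Minimum Quantity": text(".min-qty"),
        "Web link": url,
    }

def export(product_urls: list[str], out_path: str = "products.csv") -> None:
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for url in product_urls:
            writer.writerow(scrape_product(url))
```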
Information scraper/crawler.
Hi All, I'm looking for someone to help us create some scrapers/crawlers to collect data on product pricing and compare it against other companies in the market. An example being that we want to collect all data for laptops being sold online through retailers; we then want to save the data and information in a database that we can use to review pricing. The pricing scrapers/crawlers must pick up the below:
- Name of product
- Price
- Currency
- Data size
- Colour
- New/refurbished
It's important that the scraper matches the exact make and model together, without mixing in any other models. Once the data is collected we want to be able to check the pricing a few times a day to make sure the pricing is up to date. We want to create the products in a backend system, but this must be done in an automated way without the need to create products manually. We would like to get this information from categories such as TVs, laptops and mobile phones. The companies we want to get the pricing from are online retailers (I can share the names of these businesses with you). If you have experience in creating a database and an automated way of collecting this information then we would like to talk to you further about this project. Please share with me the work you've done previously that you believe will help me decide that you're the right person/company to work with. Thanks Ashley
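A minimal sketch of the matching and storage side: normalising make and model into one key so prices from different retailers only ever attach to the exact same model, with a SQLite upsert standing in for the backend system; the schema is an assumption:

```python
# Sketch: build an exact make+model key and upsert the latest price per
# retailer into SQLite, so periodic re-scrapes keep prices up to date.
import re
import sqlite3

def model_key(make: str, model: str) -> str:
    # "Lenovo", "ThinkPad X1 Carbon" -> "lenovo-thinkpad-x1-carbon"
    return re.sub(r"[^a-z0-9]+", "-", f"{make} {model}".lower()).strip("-")

def upsert_price(db: str, make: str, model: str, retailer: str,
                 price: float, currency: str, condition: str) -> None:
    with sqlite3.connect(db) as con:
        con.execute("""
            CREATE TABLE IF NOT EXISTS prices (
                model_key TEXT, retailer TEXT, price REAL, currency TEXT,
                condition TEXT, updated_at TEXT DEFAULT CURRENT_TIMESTAMP,
                PRIMARY KEY (model_key, retailer)
            )""")
        con.execute("""
            INSERT INTO prices (model_key, retailer, price, currency, condition)
            VALUES (?, ?, ?, ?, ?)
            ON CONFLICT(model_key, retailer) DO UPDATE SET
                price = excluded.price, currency = excluded.currency,
                condition = excluded.condition, updated_at = CURRENT_TIMESTAMP
            """, (model_key(make, model), retailer, price, currency, condition))

# Example (hypothetical data):
# upsert_price("prices.db", "Lenovo", "ThinkPad X1 Carbon", "RetailerA", 1099.0, "GBP", "new")
```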
Build crawler to scrape descriptions and prices
Hi Guys, we need somebody for data scraping. The whole job should not be more than 500-600 lines, each with around 5 fields. Potentially, this could also be done manually. Please see the video on Loom for the description: https://www.loom.com/share/1addd6a7ac2f4c3dacd56592ec096d28
Create an Automated Client Database Web Crawler
I need a programmer to help create a web crawler to auto-populate a prospect (client) database on a daily basis.