Develop a Web Scraping system
- Posted:
- Proposals: 8
- Remote
- #2074267
- Expired
Description
Experience Level: Intermediate
We want to develop a system that scrapes data from several websites specializing in classified ads for second-hand products.
The data must be captured periodically so that we know which ads are new, which have been updated, and which have been removed.
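One possible way to classify ads between two crawls is sketched below. It assumes each crawl produces a dictionary keyed by a stable ad identifier; the function names `fingerprint` and `diff_snapshots` are illustrative and not part of any existing crawler.

```python
import hashlib
import json


def fingerprint(ad: dict) -> str:
    """Return a stable hash of an ad's fields, used to notice content changes."""
    payload = json.dumps(ad, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_snapshots(previous: dict, current: dict) -> dict:
    """Compare two crawls keyed by ad id; each value is that ad's field dict."""
    prev_ids, curr_ids = set(previous), set(current)
    return {
        # Present now but not in the previous crawl.
        "new": sorted(curr_ids - prev_ids),
        # Present before but missing now.
        "removed": sorted(prev_ids - curr_ids),
        # Present in both crawls with different field values.
        "updated": sorted(
            ad_id
            for ad_id in prev_ids & curr_ids
            if fingerprint(previous[ad_id]) != fingerprint(current[ad_id])
        ),
    }
```

An ad that is missing from a single crawl may only be temporarily unavailable, so a real system might wait for several consecutive misses before marking it as removed.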
Once the data from all the websites is unified in a single database, we want to be able to analyze how the market evolves over time.
Therefore, we need to crawl certain categories (not all) on a total of five different websites, and scrape about 10 to 15 key fields from every ad in those categories (each website has the same page structure across all of its categories).
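Unifying the fields across sites could look roughly like the sketch below. The field names, the site keys `site_a` and `site_b`, and the raw key names are all hypothetical placeholders; the real 10 to 15 fields and each site's markup would have to be defined during the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ad:
    """Common record shared by all sites (field list is an assumption)."""
    site: str
    ad_id: str
    url: str
    title: str
    category: Optional[str] = None
    price: Optional[float] = None
    location: Optional[str] = None


# Hypothetical per-site mapping: raw key as scraped -> common field name.
FIELD_MAPS = {
    "site_a": {"id": "ad_id", "link": "url", "name": "title", "price_eur": "price"},
    "site_b": {"ref": "ad_id", "href": "url", "headline": "title", "town": "location"},
}


def normalize(site: str, raw: dict) -> Ad:
    """Translate one site's raw field names into the common Ad record."""
    mapping = FIELD_MAPS[site]
    fields = {target: raw[source] for source, target in mapping.items() if source in raw}
    # Raises TypeError if a required field (ad_id, url, title) is missing,
    # which is one way to notice that a site's page format has changed.
    return Ad(site=site, **fields)
```

Keeping the mapping in data rather than in code means a sixth website mostly needs a new dictionary entry plus its own extraction logic.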
We would prefer the system to be developed in Python (we already have a Python crawler for one of those websites, and it works fine).
We want a stable system that runs as autonomously as possible (as long as the format of the target websites does not change). We also want email alerts that notify us when a failure occurs (a service goes down, a website blocks us, a website's format has changed and we can no longer extract the data, etc.). We also want support for switching proxies (to prevent IP blocking).
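The alerting and proxy requirements above could be approached along the lines of the following sketch, which uses only the Python standard library. The status-code rules, the class `ProxyRotator`, and the function names are assumptions made for illustration; the SMTP host and addresses would come from the real deployment.

```python
import itertools
import smtplib
from email.message import EmailMessage
from typing import Optional


class ProxyRotator:
    """Cycle through a fixed list of proxy URLs."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self.current = next(self._cycle)

    def rotate(self) -> str:
        """Switch to the next proxy, e.g. after a block is detected."""
        self.current = next(self._cycle)
        return self.current


def classify_failure(status_code: int, extracted: dict, expected_fields) -> Optional[str]:
    """Map a response to a failure reason, or None if everything looks fine."""
    if status_code in (403, 429):
        return "blocked"
    if status_code >= 500:
        return "site down"
    missing = [f for f in expected_fields if not extracted.get(f)]
    if status_code == 200 and missing:
        return "format changed (missing: " + ", ".join(missing) + ")"
    return None


def build_alert(site: str, reason: str, sender: str, recipient: str) -> EmailMessage:
    """Create the alert email; sending is kept separate so this part is testable."""
    msg = EmailMessage()
    msg["Subject"] = f"[scraper] {site}: {reason}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"The crawler for {site} reported a failure: {reason}.")
    return msg


def send_alert(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Send the alert through an SMTP server (host and port are placeholders)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

A crawler loop might call `classify_failure` after each request, call `rotate()` when the reason is "blocked", and send an alert only if the problem persists after retrying with a different proxy.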
We are open to suggestions regarding the most professional architecture for maintaining the system, both the data pipeline (python->file->mysql; python->postgresql->mysql; python->mysql; ...) and the hosting (server hosting, specialized crawler hosting, etc.).
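As a rough illustration of the simplest pipeline (Python writing straight into one database), the sketch below uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs without a server. The table layout, column names, and function names are assumptions, not a recommended final schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS ads (
    site       TEXT NOT NULL,
    ad_id      TEXT NOT NULL,
    title      TEXT,
    price      REAL,
    first_seen TEXT NOT NULL,
    last_seen  TEXT NOT NULL,
    PRIMARY KEY (site, ad_id)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_ad(conn, site, ad_id, title, price, crawl_date):
    """Insert a newly seen ad, or refresh the fields of a known one."""
    known = conn.execute(
        "SELECT 1 FROM ads WHERE site = ? AND ad_id = ?", (site, ad_id)
    ).fetchone()
    if known is None:
        conn.execute(
            "INSERT INTO ads (site, ad_id, title, price, first_seen, last_seen) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (site, ad_id, title, price, crawl_date, crawl_date),
        )
    else:
        conn.execute(
            "UPDATE ads SET title = ?, price = ?, last_seen = ? "
            "WHERE site = ? AND ad_id = ?",
            (title, price, crawl_date, site, ad_id),
        )
    conn.commit()


def ads_not_seen_since(conn, site, crawl_date):
    """Return ids of ads whose last sighting predates the given crawl date."""
    rows = conn.execute(
        "SELECT ad_id FROM ads WHERE site = ? AND last_seen < ? ORDER BY ad_id",
        (site, crawl_date),
    ).fetchall()
    return [row[0] for row in rows]
```

Dates are stored as ISO 8601 strings so that plain string comparison orders them correctly. Analyzing price evolution over time would additionally need a history table with one row per ad per crawl, which this sketch leaves out.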
The five websites to be crawled (for now) are:
https://www.agriaffaires.es/
http://www.merkatia.com/motor-maquinaria/
https://www.mascus.es/
https://www.topmaquinaria.com
https://trademachines.fr/agricole
We will probably need maintenance services after the project ends.
Juanjo N.
- 100% (44)
- Projects Completed: 45
- Freelancers worked with: 39
- Projects awarded: 32%
- Last project: 5 Apr 2024
- Spain