Develop a Web Scraping system

Ended at: 10/09/2018

Fixed Price

Posted: 6 years ago
Proposals: 8
Remote
#2074267
Expired

Description

Experience Level: Intermediate

We want to develop a system that scrapes data from several classified-ad websites specializing in second-hand products.
The data must be captured periodically so that we know which ads are new, which have been updated, and which have been removed.
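
One possible way to classify ads between two periodic crawls is sketched below. It assumes each crawl yields a mapping from a stable ad ID to the scraped fields; the field names and sample ads are illustrative only, not taken from the target websites.

```python
import hashlib
import json


def fingerprint(ad: dict) -> str:
    """Stable hash of an ad's fields, used to detect content changes."""
    canonical = json.dumps(ad, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_snapshots(previous: dict, current: dict) -> dict:
    """Compare two crawls, each a mapping of ad ID -> ad fields."""
    prev_ids, curr_ids = set(previous), set(current)
    updated = {
        ad_id
        for ad_id in prev_ids & curr_ids
        if fingerprint(previous[ad_id]) != fingerprint(current[ad_id])
    }
    return {
        "new": curr_ids - prev_ids,
        "updated": updated,
        "removed": prev_ids - curr_ids,
    }


# Hypothetical sample data for two consecutive crawls.
yesterday = {
    "a1": {"title": "Tractor", "price": 15000},
    "a2": {"title": "Harvester", "price": 42000},
}
today = {
    "a2": {"title": "Harvester", "price": 39500},  # price changed
    "a3": {"title": "Plough", "price": 1200},      # newly listed
}
changes = diff_snapshots(yesterday, today)
```

An ad missing from a single crawl may only reflect a failed fetch, so a real system would probably wait for several consecutive absences before marking it as removed.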

With the data from all the websites, finally unified in a single database, we want to be able to analyze the evolution of the market data.

Therefore, we need to crawl certain categories (not all) across a total of five different websites. We need to scrape about 10 to 15 key fields from every ad in those categories (each website has the same page structure in all of its categories).
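
Because each website uses one page structure across its categories, the per-site differences could be kept in a small declarative configuration. The sketch below is only an illustration: the site name, URL, category paths, and CSS selectors are all hypothetical placeholders, not the real structure of any of the five websites.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SiteConfig:
    name: str
    base_url: str
    categories: tuple  # only the category paths that should be crawled
    selectors: dict = field(default_factory=dict)  # field name -> CSS selector


# Hypothetical example; a real config would list the 10 to 15 key fields.
EXAMPLE_SITE = SiteConfig(
    name="example-site",
    base_url="https://classifieds.example",
    categories=("/tractors", "/harvesters"),
    selectors={"title": "h1.ad-title", "price": "span.price", "year": "li.year"},
)


def category_urls(site: SiteConfig) -> list:
    """Build the start URL for each selected category."""
    return [site.base_url.rstrip("/") + path for path in site.categories]


def missing_fields(site: SiteConfig, record: dict) -> list:
    """Return the configured fields that are absent or empty in a scraped record."""
    return sorted(name for name in site.selectors if not record.get(name))


urls = category_urls(EXAMPLE_SITE)
gaps = missing_fields(EXAMPLE_SITE, {"title": "Tractor", "price": ""})
```

With this layout, adding a sixth website would mostly mean adding another `SiteConfig`, and a sudden rise in `missing_fields` results could hint that a site's format has changed.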

Preferably, the system should be developed in Python (we already have a Python crawler for one of these websites, and it works fine).

We want a stable system that runs as autonomously as possible (as long as the format of the target websites does not change). We also want the system to send email alerts when a failure occurs (a service goes down, a website blocks us, the format of a website has changed and we can no longer extract the data, etc.). We want proxy rotation support (to prevent IP blocking).
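
A minimal sketch of how proxy rotation and email alerts might fit together, using only the Python standard library. The `fetch` callable is injected and stands in for whatever HTTP client the crawler uses; the proxy addresses, email addresses, and SMTP host are placeholder assumptions.

```python
import itertools
import smtplib
from email.message import EmailMessage


class FetchError(Exception):
    """Raised when a page cannot be fetched within the allowed attempts."""


def fetch_with_rotation(url, proxies, fetch, max_attempts=3):
    """Call fetch(url, proxy) with successive proxies until one succeeds."""
    errors = []
    pool = itertools.cycle(proxies)
    for _ in range(max_attempts):
        proxy = next(pool)
        try:
            return fetch(url, proxy)
        except Exception as exc:  # a real crawler would catch narrower errors
            errors.append(f"{proxy}: {exc}")
    raise FetchError(f"{url} failed after {max_attempts} attempts: " + "; ".join(errors))


def build_alert(site, kind, details, sender, recipient):
    """Build (but do not send) an alert email describing a failure."""
    msg = EmailMessage()
    msg["Subject"] = f"[scraper] {site}: {kind}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"The crawler for {site} reported a problem:\n\n{details}\n")
    return msg


def send_alert(msg, host="localhost", port=25):
    """Send the alert through an SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


# Hypothetical fetcher: the first proxy is "blocked", the second one works.
def flaky_fetch(url, proxy):
    if proxy == "http://proxy-a.example:8080":
        raise ConnectionError("blocked")
    return f"<html>fetched via {proxy}</html>"


PROXIES = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]
page = fetch_with_rotation("https://classifieds.example/tractors", PROXIES, flaky_fetch)
alert = build_alert(
    "example-site", "layout changed", "Field 'price' missing on 40 ads.",
    "scraper@example.com", "ops@example.com",
)
```

In this arrangement the scheduler would catch `FetchError` (or a spike in missing fields) and pass the resulting message to `send_alert`.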

We are open to suggestions regarding the most professional architecture for maintaining the system (python->file->mysql; python->postgre->mysql; python->mysql; ...; server hosting, specialized crawler hosting, etc.).
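
As one illustration of the python->file->database option, the sketch below stages a crawl as a JSON Lines file and then loads it into a unified table keyed by site and ad ID. It uses `sqlite3` purely as a stand-in so the example runs anywhere; the table layout and field names are assumptions, and with MySQL the upsert would typically be written as `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def write_staging(path, ads):
    """Crawler side: append-friendly staging file, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for ad in ads:
            fh.write(json.dumps(ad, ensure_ascii=False) + "\n")


def load_staging(path, conn):
    """Loader side: upsert every staged ad into the unified table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ads (
               site  TEXT NOT NULL,
               ad_id TEXT NOT NULL,
               title TEXT,
               price REAL,
               PRIMARY KEY (site, ad_id)
           )"""
    )
    with open(path, encoding="utf-8") as fh:
        rows = [json.loads(line) for line in fh if line.strip()]
    conn.executemany(
        "INSERT OR REPLACE INTO ads (site, ad_id, title, price) "
        "VALUES (:site, :ad_id, :title, :price)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp) / "crawl.jsonl"
    write_staging(staging, [
        {"site": "example-site", "ad_id": "a1", "title": "Tractor", "price": 15000},
        {"site": "example-site", "ad_id": "a1", "title": "Tractor", "price": 14500},
        {"site": "other-site", "ad_id": "a1", "title": "Plough", "price": 1200},
    ])
    loaded = load_staging(staging, conn)
rows = conn.execute("SELECT site, ad_id, price FROM ads ORDER BY site").fetchall()
```

Keeping the staging file separates crawling from loading, so a database outage does not lose a crawl and a failed load can simply be rerun from the file.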

The five websites to be crawled (for now) are:
https://www.agriaffaires.es/
http://www.merkatia.com/motor-maquinaria/
https://www.mascus.es/
https://www.topmaquinaria.com
https://trademachines.fr/agricole

We will probably need maintenance services after the project ends.


Juanjo N.
