I need a Gumtree scraper which scans and stores related info of new posts

Ended at: 06/02/2016

Fixed Price

£60(approx. $75)

Posted: 8 years ago
Proposals: 6
Remote
#1008661
Expired

Description

Experience Level: Intermediate

General information for the website: I run a direct marketing tool for property managers and agents
Num. of web pages/modules: 1
Description of requirements/features: I run a business app which collects property ads posted on public websites and feeds them into a database, where they can later be presented to users of the system to support their direct marketing. My system is built in PHP and uses MySQL to store data.

I need a single PHP script which, whenever called, connects via cURL to gumtree.com and requests the listings of private ads for property to rent within a given search location (provided as a GET argument), e.g.:

https://www.gumtree.com/search?search_category=flats-and-houses-for-rent&q=&search_location=Walsall

The GET parameters above can be added as variables at the top of the script. I can later modify this to accept input from other scripts or add additional GET arguments. So to illustrate, something such as this is acceptable:

$website_url = 'https://www.gumtree.com/search?';
$category = 'flats-and-houses-for-rent';
$search_loc = 'Walsall';
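
As a minimal sketch of how these variables could be combined into the search URL shown earlier: `build_search_url()` is a hypothetical helper name, and the parameter names are simply copied from the example URL in this posting.

```php
<?php
// Configuration variables, as listed in the posting.
$website_url = 'https://www.gumtree.com/search?';
$category    = 'flats-and-houses-for-rent';
$search_loc  = 'Walsall';

// build_search_url() is a hypothetical helper, not part of the original brief.
// It appends the GET parameters from the example URL to the base URL.
function build_search_url(string $base, string $category, string $location): string
{
    $params = [
        'search_category' => $category,
        'q'               => '',        // empty keyword, as in the example URL
        'search_location' => $location,
    ];

    // http_build_query() URL-encodes each value and joins the pairs with "&".
    return $base . http_build_query($params);
}

$url = build_search_url($website_url, $category, $search_loc);
```

Further GET arguments could later be added to the `$params` array without touching the rest of the script.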

This initial cURL request will return a listings page. The script should then iterate through every ad link found in the returned HTML (and in the following two pages, in order of pagination) and collect metadata from each ad, including:

The unique ID of this post if any
The poster's name
The poster's telephone number if provided; if not, the email address
The title of the ad
The rental price for the property
The seller type
The date the ad was posted
The property type
The number of bedrooms
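
The fetch-and-iterate step described above might be sketched roughly as follows. This is only an illustration under stated assumptions: `fetch_html()` and `extract_ad_links()` are hypothetical names, and the rule that ad links contain `/p/` is a guess that would need checking against Gumtree's real markup. Pagination is left out because the brief does not say how the following pages are addressed.

```php
<?php
// Fetch a page over cURL and return its HTML, or null on failure.
function fetch_html(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 30,
    ]);
    $html = curl_exec($ch);

    return $html === false ? null : $html;
}

// Collect the unique ad links from a listings page.
// ASSUMPTION: ad URLs contain "/p/"; the real pattern must be verified.
function extract_ad_links(string $html, string $base = 'https://www.gumtree.com'): array
{
    if ($html === '') {
        return [];
    }

    // Real-world HTML is rarely valid, so keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//a[contains(@href, "/p/")]') as $anchor) {
        $href = $anchor->getAttribute('href');
        if (strpos($href, 'http') !== 0) {
            $href = $base . $href; // make relative links absolute
        }
        $links[$href] = true;      // array keys de-duplicate repeated links
    }

    return array_keys($links);
}

// Made-up sample markup standing in for a listings page.
$sample = '<html><body>'
    . '<a href="/p/flat/1">Flat</a>'
    . '<a href="/about">About</a>'
    . '<a href="/p/flat/1">Flat again</a>'
    . '<a href="/p/house/2">House</a>'
    . '</body></html>';

$ad_links = extract_ad_links($sample);
```

In the real script, each URL in `$ad_links` would be passed to `fetch_html()` in turn, and the same two functions would be reused for the next two listings pages.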

The script has to be able to parse the HTML that gets returned from each ad page and extract the above information as an array of PHP variables.

I will process this array separately for input into the database. You will not need to do any cleansing or validation on the values, just extract the TEXT NODES and strip any and all HTML from the above metadata before assigning them to variables. If blank text nodes are encountered, that is also fine. I will process these later.
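
One possible way to pull plain text out of an ad page uses PHP's bundled DOM extension, so no external library is linked. This is a sketch only: `first_text()` and `parse_ad()` are hypothetical names, and every XPath selector below is a placeholder, since the brief does not describe Gumtree's actual markup.

```php
<?php
// Return the text of the first node matching $query, with all markup dropped
// and whitespace collapsed; return '' when nothing matches.
function first_text(DOMXPath $xpath, string $query): string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return '';
    }

    // textContent holds only the text nodes, so nested tags are already gone.
    return trim((string) preg_replace('/\s+/', ' ', $nodes->item(0)->textContent));
}

// Extract one ad's metadata as an associative array of strings.
function parse_ad(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    // ASSUMPTION: placeholder selectors; the real ones depend on the live page.
    $selectors = [
        'title'        => '//h1',
        'rental_price' => '//*[@class="ad-price"]',
        'poster_name'  => '//*[@class="seller-name"]',
    ];

    $listing = [];
    foreach ($selectors as $key => $query) {
        $listing[$key] = first_text($xpath, $query);
    }

    return $listing;
}

// Made-up sample markup standing in for an ad page (no seller name present).
$ad = parse_ad(
    '<html><body><h1>Two bed <b>flat</b></h1>'
    . '<span class="ad-price"> 600 pcm </span></body></html>'
);
```

A field that is missing from the page simply comes back as an empty string, which matches the note that blank values are acceptable.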

There is no front-facing interface, no output that needs to be presented to the screen and ideally no libraries should be linked to if the entire codebase can be added to a single PHP file.
Extra notes: The project is for a PHP HTML scraper that pulls metadata from each returned page, forming one large array made up of individual arrays, each holding one ad's metadata, i.e.:

LISTINGS ARRAY = [
LISTING = [ uid = ... , poster_name = ... , poster_tel = ... , title = ... , rental_price = ... , seller_type = ... , date_posted = ... , etc ]
LISTING = [ uid = ... , poster_name = ... , poster_tel = ... , title = ... , rental_price = ... , seller_type = ... , date_posted = ... , etc ]
LISTING = [ uid = ... , poster_name = ... , poster_tel = ... , title = ... , rental_price = ... , seller_type = ... , date_posted = ... , etc ]
]
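
In PHP, the structure outlined above could look something like the sketch below. The key names `property_type` and `bedrooms` are assumed (the outline only says "etc"), `empty_listing()` and `make_listing()` are hypothetical helpers, and the values are made-up samples.

```php
<?php
// One listing with every expected key set to an empty string.
// ASSUMPTION: 'property_type' and 'bedrooms' are guessed key names.
function empty_listing(): array
{
    return array_fill_keys([
        'uid', 'poster_name', 'poster_tel', 'title', 'rental_price',
        'seller_type', 'date_posted', 'property_type', 'bedrooms',
    ], '');
}

// Merge scraped fields over the blank template, ignoring unknown keys.
function make_listing(array $fields): array
{
    $template = empty_listing();

    return array_merge($template, array_intersect_key($fields, $template));
}

// The outer array holds one inner array per ad.
$listings = [];
$listings[] = make_listing(['uid' => '1001', 'title' => 'Two bed flat', 'rental_price' => '600 pcm']);
$listings[] = make_listing(['uid' => '1002', 'poster_name' => 'Sam', 'colour' => 'ignored']);
```

Because every inner array carries the same keys, the later database step can loop over `$listings` without checking which fields exist.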

Extra notes: I take great pride in the software I have created and have been involved in all aspects of its development until now; however, I no longer have the time to sit and code everything myself.

This project best suits a developer who has created a PHP scraper in the past. I do not mind if code is reused extensively or integrated from other sources, provided the developer has the license or right to do so.
