PHP scraping script
£121 (approx. $151)
- Proposals: 11
- Remote
- #2176961
- OPPORTUNITY
- Expired
Description
Experience Level: Expert
This is the first of many hundreds of scraping projects, so we want it built robustly; plenty more work will follow. The objective is to scrape a real estate agent's website to collect data on property listings. This spec covers a single agent's website, but future projects will extend it so that we have scrapers for every real estate agent's website, all running automatically. The scrapers need to be as robust as possible so that minor changes to a site do not break the script. When choosing how to navigate the DOM, please make sensible choices that future-proof the script. We will be checking your work to ensure we are working with a developer of the right calibre for the future projects.
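One way to read "sensible choices" when navigating the DOM is to try several selectors in order of stability instead of relying on one brittle path. The sketch below is only an illustration: the helper name `firstMatch`, the sample markup, and the XPath expressions are all assumptions, and the real site's structure will differ.

```php
<?php
// Hypothetical helper: try several XPath expressions in order and return
// the trimmed text of the first non-empty match, or null if none match.
function firstMatch(DOMXPath $xpath, array $queries, ?DOMNode $context = null): ?string
{
    foreach ($queries as $query) {
        $nodes = $xpath->query($query, $context);
        if ($nodes !== false && $nodes->length > 0) {
            $text = trim($nodes->item(0)->textContent);
            if ($text !== '') {
                return $text;
            }
        }
    }
    return null;
}

// Sample markup invented for this example; it is not taken from the real site.
$html = '<html><body><div class="wrap"><div class="property-card">'
      . '<h2 itemprop="name">Example Street, Exampletown</h2>'
      . '<span class="price-text">Offers over &pound;250,000</span>'
      . '</div></div></body></html>';

// Collect libxml warnings internally, since real-world HTML is rarely valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Most stable selector first, loosest fallback last.
$price = firstMatch($xpath, [
    '//*[@itemprop="price"]',            // semantic attribute, if the site has one
    '//*[contains(@class, "price")]',    // class-name fragment
    '//*[text()[contains(., "£")]]',     // last resort: element whose own text has a pound sign
]);

$title = firstMatch($xpath, ['//*[@itemprop="name"]', '//h1', '//h2']);
```

If every selector in a list fails, the helper returns null, which gives the calling code a clear signal that the page layout has changed.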
So the spec for this project is:
Scrape a single real estate agent's website (https://www.bradfordandhowley.com/) and extract the data to PHP variables. We will save these to a database separately, so you only need to store the data in variables; we will integrate the script into our own framework. The scraper should set request headers that make it look like a normal user rather than an automated scraper. When testing, please put a small delay between requests so that the scraper does not amount to a DoS attack on the site; this is a 'friendly' scraper. Starting from the root, dynamically navigate through the property listings and extract the following data to PHP variables:
1. The website root
2. Whether or not robots are allowed, i.e. per robots.txt (Boolean)
3. Who built the website (e.g. Technicweb)
4. Timestamp of scrape.
5. Then, for each property for sale (in an array called $properties), the fields below. We only need properties for sale, not lettings:
- unique url of property
- price upper (i.e. the price the property is listed for, as an integer)
- price lower (only populated if it's a range, e.g. 480000 to 500000)
- price type (e.g. offers over, in region of, between, price on application)
- street
- town
- postcode
- status (on market, sold, STC, etc.)
- description (probably best stored as Markdown so we can recreate headers etc.)
- room details (probably best stored as Markdown so we can recreate headers etc.)
- property type (detached, semi-detached, terrace, bungalow, flat)
- number of bedrooms
- map coordinates (longitude/latitude)
- main cover image
- array of the rest of the property images
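The fetching and price-handling parts of the list above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a finished scraper: the function names, header values, two-second delay, array keys, and the list of price qualifiers are all illustrative, and the robots.txt check is a deliberately simplified reading of the file.

```php
<?php
declare(strict_types=1);

// Browser-like request headers; the exact values are illustrative assumptions.
function buildContext()
{
    return stream_context_create(['http' => [
        'method'     => 'GET',
        'header'     => "Accept: text/html,application/xhtml+xml\r\n"
                      . "Accept-Language: en-GB,en;q=0.9\r\n",
        'user_agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      . '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'timeout'    => 15,
    ]]);
}

// Fetch a URL after a pause, so requests are spaced out ("friendly" scraping).
function politeGet(string $url, float $delaySeconds = 2.0): ?string
{
    usleep((int) ($delaySeconds * 1000000));
    $body = @file_get_contents($url, false, buildContext());
    return $body === false ? null : $body;
}

// Simplified robots.txt check: returns false only when a "User-agent: *"
// group contains "Disallow: /". Not a full robots.txt parser.
function robotsAllowed(string $robotsTxt): bool
{
    $appliesToAll = false;
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
        if (stripos($line, 'User-agent:') === 0) {
            $appliesToAll = trim(substr($line, 11)) === '*';
        } elseif ($appliesToAll && stripos($line, 'Disallow:') === 0) {
            if (trim(substr($line, 9)) === '/') {
                return false;
            }
        }
    }
    return true;
}

// Split a price label into upper, lower, and type.
// The array keys and qualifier list are hypothetical; extend them per site.
function parsePrice(string $text): array
{
    $result = ['price_upper' => null, 'price_lower' => null, 'price_type' => null];

    // Collect every number in the label, e.g. "480,000" becomes 480000.
    preg_match_all('/\d[\d,]*/', $text, $matches);
    $numbers = array_map(fn($n) => (int) str_replace(',', '', $n), $matches[0]);

    if (count($numbers) >= 2) {
        $result['price_lower'] = min($numbers);
        $result['price_upper'] = max($numbers);
    } elseif (count($numbers) === 1) {
        $result['price_upper'] = $numbers[0];
    }

    foreach (['offers over', 'in region of', 'between', 'price on application'] as $type) {
        if (stripos($text, $type) !== false) {
            $result['price_type'] = $type;
            break;
        }
    }
    return $result;
}

// Example values for the top-level variables (names are assumptions).
$websiteRoot     = 'https://www.bradfordandhowley.com/';
$scrapeTimestamp = time();
$range           = parsePrice('Between £480,000 and £500,000');
```

In a real run, `$robotsAllowed` would be set from `robotsAllowed(politeGet($websiteRoot . 'robots.txt') ?? '')`, and `parsePrice` would be called on the price label of each listing.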
Please include logic to check that each variable above holds the expected kind of value, and provide an error variable so that it is possible to detect whether the script has run as expected or whether the website has changed enough that the extracted data is no longer correct.
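A possible shape for these checks is sketched below. Everything here is an assumption made for illustration: the array keys, the accepted ranges, the simplified UK postcode pattern, and the sample data are invented, and a real implementation would cover every field in the list.

```php
<?php
declare(strict_types=1);

// Hypothetical validator: returns the names of fields that look wrong.
// An empty array means the property passed every check.
function validateProperty(array $p): array
{
    $errors = [];

    if (filter_var($p['url'] ?? null, FILTER_VALIDATE_URL) === false) {
        $errors[] = 'url';
    }

    // A price is required unless the listing is "price on application".
    $poa   = ($p['price_type'] ?? '') === 'price on application';
    $upper = $p['price_upper'] ?? null;
    if (!$poa && (!is_int($upper) || $upper <= 0)) {
        $errors[] = 'price_upper';
    }
    $lower = $p['price_lower'] ?? null;
    if ($lower !== null && (!is_int($lower) || !is_int($upper) || $lower > $upper)) {
        $errors[] = 'price_lower';
    }

    // Simplified UK postcode pattern, not the full official format.
    $postcode = $p['postcode'] ?? null;
    if (!is_string($postcode) || !preg_match('/^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$/i', $postcode)) {
        $errors[] = 'postcode';
    }

    $beds = $p['bedrooms'] ?? null;
    if (!is_int($beds) || $beds < 0 || $beds > 50) {
        $errors[] = 'bedrooms';
    }

    $lat = $p['latitude'] ?? null;
    if (!is_float($lat) || abs($lat) > 90) {
        $errors[] = 'latitude';
    }
    $lng = $p['longitude'] ?? null;
    if (!is_float($lng) || abs($lng) > 180) {
        $errors[] = 'longitude';
    }

    return $errors;
}

// Invented sample data: one plausible record and one broken record.
$properties = [
    ['url' => 'https://example.com/property/1', 'price_upper' => 250000,
     'price_type' => 'offers over', 'postcode' => 'S10 2TN', 'bedrooms' => 3,
     'latitude' => 53.38, 'longitude' => -1.47],
    ['url' => 'not a url', 'price_upper' => 0, 'price_type' => 'offers over',
     'postcode' => 'XXXX', 'bedrooms' => -1, 'latitude' => 95.0, 'longitude' => -1.47],
];

// Per-property problems, keyed by index into $properties.
$propertyErrors = [];
foreach ($properties as $i => $property) {
    $problems = validateProperty($property);
    if ($problems !== []) {
        $propertyErrors[$i] = $problems;
    }
}

// Overall error flag: true if nothing was scraped or any property failed.
$error = $properties === [] || $propertyErrors !== [];
```

Treating an empty `$properties` array as an error is a design choice: a scrape that finds no listings at all is more likely a changed site than an agent with nothing for sale.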
Adam P.
100% (3)
Projects completed: 4
Freelancers worked with: 4
Projects awarded: 57%
Last project: 5 Aug 2016
United Kingdom