Web Scraping for 3 car sites

  • Posted:
  • Proposals: 9
  • Remote
  • #1213567
  • Expired

Description

Experience Level: Expert
General information for the website: Web scraper for three car sites for data
Description of requirements/features: We require a tool to scrape carsales.com.au, carsguide.com.au, and redbook.com.au websites for relevant information regarding specific cars and models. Keep in mind we may need to incorporate more sites in the future.

As we understand it, both redbook.com.au and carsales.com.au are protected by anti-scraping systems.

For each website, we need a MySQL database set up to hold the scraped data, with each field in its own column. You will also need to write an aggregation interface to work with the results.
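As a rough illustration of the one-field-per-column requirement, the sketch below builds a MySQL CREATE TABLE statement from a column list. The table name, column names, and column types are all assumptions made for illustration; they are not specified in this brief.

```python
# Hypothetical column layout: names and MySQL types are assumptions, not part of the brief.
COLUMNS = [
    ("url", "VARCHAR(2048) NOT NULL"),
    ("price", "INT NULL"),
    ("make", "VARCHAR(64) NULL"),
    ("model", "VARCHAR(64) NULL"),
    ("variant", "VARCHAR(64) NULL"),
    ("series", "VARCHAR(64) NULL"),
    ("colour", "VARCHAR(64) NULL"),
    ("body_type", "VARCHAR(64) NULL"),
    ("seats", "TINYINT NULL"),
    ("doors", "TINYINT NULL"),
    ("transmission", "VARCHAR(64) NULL"),
    ("engine", "VARCHAR(64) NULL"),
    ("drive_type", "VARCHAR(64) NULL"),
    ("fuel_type", "VARCHAR(64) NULL"),
    ("fuel_consumption", "DECIMAL(4,1) NULL"),
    ("kilometres", "INT NULL"),
    ("marketing_year", "TINYINT NULL"),
    ("year", "SMALLINT NULL"),
    ("features", "TEXT NULL"),
]


def create_table_sql(table: str) -> str:
    """Return a CREATE TABLE statement with one column per scraped field."""
    cols = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in COLUMNS)
    return (
        f"CREATE TABLE IF NOT EXISTS `{table}` (\n"
        f"  `id` INT AUTO_INCREMENT PRIMARY KEY,\n"
        f"{cols}\n"
        f");"
    )


# One table per site, e.g. a hypothetical `carsguide_listings`.
print(create_table_sql("carsguide_listings"))
```

Using the same column list for every site would keep the tables aligned, which also suits the aggregation interface.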

The tool should be made available for us to:
- initiate the scrape, run it against the sites, and populate the database with new data periodically until no more results are present
- configure how often to scrape the sites (i.e. daily, weekly, or monthly), and, should we want it, run the scrape continuously until all results on the page(s) are scraped
- import CSV (or Excel) of cars
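The scheduling and import points above could be sketched as follows. This is only an outline under stated assumptions: the CSV headers `Make` and `Model`, the function names, and the 30-day approximation of "monthly" are all hypothetical and not taken from this brief.

```python
import csv
import io
from datetime import datetime, timedelta

# Assumed mapping of the configurable frequencies; "monthly" is approximated as 30 days.
INTERVALS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
}


def next_run(last_run: datetime, frequency: str) -> datetime:
    """Return when the next scrape is due for the configured frequency."""
    return last_run + INTERVALS[frequency]


def load_makes_models(csv_file):
    """Read (make, model) pairs from a CSV with assumed 'Make' and 'Model' headers."""
    pairs = []
    for row in csv.DictReader(csv_file):
        make = (row.get("Make") or "").strip()
        model = (row.get("Model") or "").strip()
        if make and model:  # skip blank or incomplete rows
            pairs.append((make, model))
    return pairs


sample = io.StringIO("Make,Model\nHolden,Spark\nHolden, Commodore \n,\n")
print(load_makes_models(sample))
print(next_run(datetime(2016, 1, 1), "weekly"))
```

An Excel import would need a third-party reader; the CSV path is shown here because it only requires the standard library.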

The data to scrape for carsguide.com.au include the following:
- URL
- Price (integer)
- Model
- Make
- Variant
- Series
- Colour ext
- Body type
- Seats
- Doors
- Transmission
- Engine
- Drive Type
- Fuel Type
- Fuel Consumption
- Kilometres (integer)
- Marketing Year (MY)
- Year
- Features

For example
- URL : http://www.carsguide.com.au/cars-for-sale/NEW_UCJ16D_Mystic_Violet/HOLDEN--SPARK------Hatchback?searchKey=cg_s.38ffc6de7550de6aec952b87c417c3ac#pos3
- Price : 16690
- Model : Spark
- Make : Holden
- Variant : LS
- Series : MP
- Colour ext : Mystic Violet
- Body type : Hatchback
- Seats : 5
- Doors : 5
- Transmission : Automatic
- Engine : 4 cyl, 1.4 L
- Drive Type : Front
- Fuel Type : Unleaded
- Fuel Consumption : 5.8 L / 100 km
- Kilometres : (not showing on some cars, but when it does extract it)
- Marketing Year (MY) : 16
- Year : 2016
- Features : (everything under the features tab, comma separated)


The data to scrape for carsales.com.au include the following:
- URL
- Price
- Model
- Make
- Badge
- Series
- Colour
- Body style
- Seat Capacity
- Doors
- Transmission
- Engine
- Drive Type
- Fuel Type
- Fuel Consumption
- Kilometres
- Marketing Year (MY)
- Year
- Features


For example
- URL : http://www.carsales.com.au/bnc/details/Holden-Spark-2016/OAG-AD-12588414/?gts=OAG-AD-12588414&gtssaleid=OAG-AD-12588414
- Price : 14990
- Model : Spark
- Make : Holden
- Badge : LS
- Series : MP
- Colour : Solar Red
- Body style : Hatch
- Seat Capacity : 5
- Doors : 5
- Transmission : Manual
- Engine : 4cyl 1.4L Petrol
- Drive Type : Front
- Fuel Type : Petrol - Unleaded ULP
- Fuel Consumption : 5.2 (L/100km)
- Kilometres : (not showing on some cars, but when it does extract it)
- Marketing Year (MY) : 16
- Year : 2016
- Features : (everything under the features tab, comma separated, without the headings)
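The example values above come in slightly different text formats on each site (for instance "4 cyl, 1.4 L" versus "4cyl 1.4L Petrol"). A minimal sketch of turning such strings into typed values might look like the following; the helper names are hypothetical, and the patterns are only checked against the example values in this brief, not against the live sites.

```python
import re


def parse_int(text):
    """Extract an integer from text such as '$16,690'; None if there are no digits.

    Assumes the value has no decimal part.
    """
    digits = re.sub(r"[^\d]", "", text or "")
    return int(digits) if digits else None


def parse_fuel_consumption(text):
    """Return litres per 100 km from e.g. '5.8 L / 100 km' or '5.2 (L/100km)'."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None


def parse_engine(text):
    """Split e.g. '4 cyl, 1.4 L' or '4cyl 1.4L Petrol' into (cylinders, litres)."""
    cyl = re.search(r"(\d+)\s*cyl", text or "", re.IGNORECASE)
    litres = re.search(r"(\d+(?:\.\d+)?)\s*L\b", text or "")
    return (
        int(cyl.group(1)) if cyl else None,
        float(litres.group(1)) if litres else None,
    )


print(parse_int("$16,690"))
print(parse_fuel_consumption("5.2 (L/100km)"))
print(parse_engine("4cyl 1.4L Petrol"))
```

Returning `None` for a missing value fits the Kilometres field, which is not shown on some listings.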

The makes and models are listed in the provided Excel workbook, on the "Makes and Models" sheet. When scraping, we require every listing returned in the search results for each make and model.

The examples show the field names as they appear on each site, but the database fields will also need to be normalized so that the same column names are used regardless of the website being scraped.
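One way to approach this normalization is a per-site lookup from the on-page label to a shared column name, as in the sketch below. The shared column names are assumptions, and so is the pairing of carsales "Badge" with carsguide "Variant" (both hold "LS" in the examples above, but the brief does not state that they are equivalent).

```python
# Labels that need an explicit shared name on every site (assumed column name).
COMMON = {"Marketing Year (MY)": "marketing_year"}

# Site-specific label -> shared column name. These pairings are assumptions.
SITE_SPECIFIC = {
    "carsguide": {
        "Variant": "variant",
        "Colour ext": "colour",
        "Body type": "body_type",
        "Seats": "seats",
    },
    "carsales": {
        "Badge": "variant",
        "Colour": "colour",
        "Body style": "body_type",
        "Seat Capacity": "seats",
    },
}


def normalize(site, raw):
    """Rename a scraped record's keys to the shared column names."""
    mapping = {**COMMON, **SITE_SPECIFIC[site]}
    out = {}
    for label, value in raw.items():
        # Labels without an explicit mapping fall back to snake_case.
        column = mapping.get(label, label.lower().replace(" ", "_"))
        out[column] = value
    return out


print(normalize("carsales", {"Badge": "LS", "Seat Capacity": "5", "Drive Type": "Front"}))
```

Adding a further site later would then mean adding one more entry to `SITE_SPECIFIC` rather than changing the database columns.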

The scraper must be fully tested and must run without triggering the sites' anti-scraping detection.

Clarification Board

  • David B.

    Are you in a rush or flexible on deadline?
