Web Scraping for 3 car sites
£200 (approx. $255)
- Posted:
- Proposals: 5
- Remote
- #1213567
- Expired
Description
Experience Level: Expert
General information for the website: Web scraper for three car sites for data
Description of requirements/features: We require a tool to scrape carsales.com.au, carsguide.com.au, and redbook.com.au websites for relevant information regarding specific cars and models. Keep in mind we may need to incorporate more sites in the future.
As we understand it, both redbook.com.au and carsales.com.au are protected by anti-scraping systems.
For each website, we need a MySQL database holding the scraped data, with each field in its own column. You will also need to write an aggregation interface to work with the results.
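To illustrate the "one field per column" requirement, here is a minimal sketch that generates MySQL `CREATE TABLE` statements, one table per site. The table names, column names, and SQL types are assumptions for illustration only, not a final schema.

```python
# Hypothetical canonical columns; names and SQL types are assumptions.
COLUMNS = [
    ("url", "VARCHAR(2048) NOT NULL"),
    ("price", "INT NULL"),
    ("make", "VARCHAR(64) NULL"),
    ("model", "VARCHAR(64) NULL"),
    ("variant", "VARCHAR(64) NULL"),
    ("series", "VARCHAR(64) NULL"),
    ("colour", "VARCHAR(64) NULL"),
    ("body_type", "VARCHAR(64) NULL"),
    ("seats", "TINYINT NULL"),
    ("doors", "TINYINT NULL"),
    ("transmission", "VARCHAR(64) NULL"),
    ("engine", "VARCHAR(128) NULL"),
    ("drive_type", "VARCHAR(64) NULL"),
    ("fuel_type", "VARCHAR(64) NULL"),
    ("fuel_consumption", "DECIMAL(4,1) NULL"),
    ("kilometres", "INT NULL"),
    ("marketing_year", "TINYINT NULL"),
    ("year", "SMALLINT NULL"),
    ("features", "TEXT NULL"),
]


def create_table_sql(table_name: str) -> str:
    """Build a CREATE TABLE statement with one column per scraped field."""
    column_lines = ["  `id` INT AUTO_INCREMENT PRIMARY KEY"]
    column_lines += [f"  `{name}` {sql_type}" for name, sql_type in COLUMNS]
    return (
        f"CREATE TABLE IF NOT EXISTS `{table_name}` (\n"
        + ",\n".join(column_lines)
        + "\n);"
    )


# One table per site; adding a new site later only needs a new name here.
for site in ("carsguide", "carsales", "redbook"):
    print(create_table_sql(f"{site}_listings"))
```

Because every site shares the same column list in this sketch, an aggregation query could combine the tables with a plain `UNION ALL`.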
The tool should be made available for us to:
- initiate a scrape that runs against the sites and populates the database with new data until no more results are present
- configure how often to scrape the sites (e.g. daily, weekly, or monthly), in case we want it to run continuously until all results on the page(s) are scraped
- import CSV (or Excel) of cars
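As a rough sketch of the last two points, the snippet below maps a configured frequency to the next run time and loads a CSV of cars. The `Make` and `Model` column headers, the function names, and the 30-day approximation of "monthly" are assumptions for illustration.

```python
import csv
import io
from datetime import datetime, timedelta

# Assumed frequency names; "monthly" is approximated as 30 days here.
SCHEDULES = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
}


def next_run(last_run: datetime, frequency: str) -> datetime:
    """Return when the next scrape is due for the configured frequency."""
    return last_run + SCHEDULES[frequency]


def load_cars(csv_text: str) -> list[tuple[str, str]]:
    """Read (make, model) pairs from CSV text with 'Make' and 'Model' headers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["Make"].strip(), row["Model"].strip()) for row in reader]


cars = load_cars("Make,Model\nHolden,Spark\n")
print(cars)
print(next_run(datetime(2016, 1, 1), "weekly"))
```

An Excel import would need a third-party reader; exporting the sheet to CSV first keeps this sketch within the standard library.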
The data to scrape for carsguide.com.au includes the following:
- URL
- Price (integer)
- Model
- Make
- Variant
- Series
- Colour ext
- Body type
- Seats
- Doors
- Transmission
- Engine
- Drive Type
- Fuel Type
- Fuel Consumption
- Kilometres (integer)
- Marketing Year (MY)
- Year
- Features
For example
- URL : http://www.carsguide.com.au/cars-for-sale/NEW_UCJ16D_Mystic_Violet/HOLDEN--SPARK------Hatchback?searchKey=cg_s.38ffc6de7550de6aec952b87c417c3ac#pos3
- Price : 16690
- Model : Spark
- Make : Holden
- Variant : LS
- Series : MP
- Colour ext : Mystic Violet
- Body type : Hatchback
- Seats : 5
- Doors : 5
- Transmission : Automatic
- Engine : 4 cyl, 1.4 L
- Drive Type : Front
- Fuel Type : Unleaded
- Fuel Consumption : 5.8 L / 100 km
- Kilometres : (not shown for some cars; extract it when present)
- Marketing Year (MY) : 16
- Year : 2016
- Features : (everything under the features tab, comma separated)
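The example values above arrive as display text, so numeric fields need parsing before they can be stored as integers or decimals. The following is a minimal sketch using only regular expressions; the function names are hypothetical, and the patterns are based solely on the sample values in this posting.

```python
import re
from typing import Optional


def parse_int(text: Optional[str]) -> Optional[int]:
    """Keep only digits, e.g. for price or kilometres; None when absent.

    Intended for whole-number fields only (a decimal point would be dropped).
    """
    digits = re.sub(r"\D", "", text or "")
    return int(digits) if digits else None


def parse_fuel_consumption(text: Optional[str]) -> Optional[float]:
    """Take the first number in strings such as '5.8 L / 100 km'."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None


def parse_engine(text: str) -> dict:
    """Split strings such as '4 cyl, 1.4 L' into cylinders and litres."""
    cyl = re.search(r"(\d+)\s*cyl", text, re.IGNORECASE)
    litres = re.search(r"(\d+(?:\.\d+)?)\s*L\b", text, re.IGNORECASE)
    return {
        "cylinders": int(cyl.group(1)) if cyl else None,
        "litres": float(litres.group(1)) if litres else None,
    }


print(parse_int("16690"))
print(parse_fuel_consumption("5.8 L / 100 km"))
print(parse_engine("4 cyl, 1.4 L"))
```

Returning `None` for a missing value lets the Kilometres column stay NULL for listings that do not show it.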
The data to scrape for carsales.com.au includes the following:
- URL
- Price
- Model
- Make
- Badge
- Series
- Colour
- Body style
- Seat Capacity
- Doors
- Transmission
- Engine
- Drive Type
- Fuel Type
- Fuel Consumption
- Kilometres
- Marketing Year (MY)
- Year
- Features
For example
- URL : http://www.carsales.com.au/bnc/details/Holden-Spark-2016/OAG-AD-12588414/?gts=OAG-AD-12588414&gtssaleid=OAG-AD-12588414
- Price : 14990
- Model : Spark
- Make : Holden
- Badge : LS
- Series : MP
- Colour : Solar Red
- Body style : Hatch
- Seat Capacity : 5
- Doors : 5
- Transmission : Manual
- Engine : 4cyl 1.4L Petrol
- Drive Type : Front
- Fuel Type : Petrol - Unleaded ULP
- Fuel Consumption : 5.2 (L/100km)
- Kilometres : (not shown for some cars; extract it when present)
- Marketing Year (MY) : 16
- Year : 2016
- Features : (everything under the features tab, comma separated, without the headings)
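For the Features field, a small sketch of "comma separated, without the headings" might look like the following. It assumes the features tab has already been scraped into a mapping of heading to feature names; that structure and the sample entries are hypothetical, not taken from the site.

```python
def flatten_features(sections: dict[str, list[str]]) -> str:
    """Join all feature names into one comma-separated string.

    Headings (the dict keys) are dropped; blanks and repeats are skipped.
    """
    seen = set()
    ordered = []
    for items in sections.values():
        for item in items:
            name = item.strip()
            key = name.lower()
            if name and key not in seen:
                seen.add(key)
                ordered.append(name)
    return ", ".join(ordered)


# Hypothetical sample data, for illustration only.
sample = {
    "Safety": ["ABS", "Airbags "],
    "Comfort": ["Air Conditioning", "abs", ""],
}
print(flatten_features(sample))
```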
The makes and models are listed in the provided Excel workbook, on the Makes and Models sheet. When scraping, we require every listing returned by the search results for each make and model.
The examples show the field names used on each site, but the database columns must be normalized so that the same column name is used regardless of the website being scraped.
The scraper needs to be fully tested and must run without triggering anti-spam detection.
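One way to picture this normalization is a per-site lookup from the site's field label to a shared column name. The sketch below uses the field labels from the two lists in this posting; the canonical column names on the right-hand side are assumptions, as is the `normalise` helper.

```python
# Labels shared by both sites, mapped to assumed canonical column names.
COMMON = {
    "URL": "url", "Price": "price", "Model": "model", "Make": "make",
    "Series": "series", "Doors": "doors", "Transmission": "transmission",
    "Engine": "engine", "Drive Type": "drive_type", "Fuel Type": "fuel_type",
    "Fuel Consumption": "fuel_consumption", "Kilometres": "kilometres",
    "Marketing Year (MY)": "marketing_year", "Year": "year",
    "Features": "features",
}

# Site-specific labels that differ between carsguide and carsales.
FIELD_MAP = {
    "carsguide": {**COMMON, "Variant": "variant", "Colour ext": "colour",
                  "Body type": "body_type", "Seats": "seats"},
    "carsales": {**COMMON, "Badge": "variant", "Colour": "colour",
                 "Body style": "body_type", "Seat Capacity": "seats"},
}


def normalise(site: str, record: dict) -> dict:
    """Rename a scraped record's keys to the shared column names."""
    mapping = FIELD_MAP[site]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


print(normalise("carsguide", {"Variant": "LS", "Colour ext": "Mystic Violet"}))
print(normalise("carsales", {"Badge": "LS", "Colour": "Solar Red"}))
```

Supporting a further site would then mean adding one more entry to `FIELD_MAP` rather than changing the database columns.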
Genial Web Techologies
- 98% (9)
- Projects Completed: 9
- Freelancers worked with: 9
- Projects awarded: 18%
- Last project: 6 Feb 2021
- India