I need a web scraper configured or built & set up on a VPS
- Posted:
- Proposals: 7
- Remote
- #2062285
- Expired
Top rated PHP Web Development | WordPress | Magento | Drupal | OpenCart | PrestaShop | Joomla
Leicester
AI & Data Science Engineer | Nodejs | Ruby On Rails | AWS | GCP | Python | React | Angular |
Auckland
Description
Experience Level: Intermediate
Estimated project duration: less than 1 week
I am looking for someone to build/configure a simple web scraper for a specific site. Please briefly outline your approach in your quote (one paragraph, four or five sentences). Please, no copy-and-paste proposals.
About 20 specific, basic, mostly numeric data points will be scraped from a simple HTML table.
The scraper should:
1. Import a file containing all required URLs that use the single page template.
2. Be configurable to:
a) Randomise the time between requests for each URL, e.g. pick a random number of seconds between 1 and 10.
b) Adjust each scraped variable by a per-variable random modifier. For example, if the variable “number_of_shoes” has the value 6, the system should be configurable to multiply it by a random factor between 0.8 and 1.3 and round the result back to a whole number, so 6 could become 5, 6, 7 or 8. A different multiplier range could be applied to another variable, e.g. “number_of_boots”. Other values would need text transforms, e.g. converting “black” to “Black”.
c) Set the user agent of the requests from a random selection.
d) Randomise the start time of the scraping.
e) Set the frequency of the scraping (single or multiple times a day).
3. Run without timing out.
4. Write the data to a CSV or JSON file for each “day” of data (each webpage shows data for the next X days) that can be accessed from other browsers/servers.
5. Since each page contains data tied to specific dates, update the data held for a given date if that date's data appears across multiple scraping attempts.
6. Be written in a language appropriate to the application (Python, PHP, etc.).
7. Route traffic via Tor to ensure anonymity, e.g.: https://www.linkedin.com/pulse/python-how-scrape-websites-anonymously-afsheen-khosravian, https://jarroba.com/anonymous-scraping-by-tor-network/ or https://deshmukhsuraj.wordpress.com/2015/03/08/anonymous-web-scraping-using-python-and-tor/
8. Per 7, you will need to set up the hosting environment on an Ubuntu VPS (that we provide) to make this possible.
9. As part of this job, you will be responsible for creating the initial file containing all the required URLs (relatively easy by parsing the sitemap file provided by the site).
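The core of requirements 2(a)-2(c) could be sketched as below. This is a minimal illustration, not a proposed implementation: the config structure, the variable names beyond those given in the brief, and the example user-agent strings are all assumptions.

```python
import random

# Hypothetical per-variable config: multiplier ranges for numeric values,
# text transforms for string values. "number_of_shoes"/"number_of_boots"
# and "black" -> "Black" come from the brief; the rest is illustrative.
MODIFIERS = {
    "number_of_shoes": (0.8, 1.3),
    "number_of_boots": (0.5, 1.5),
}
TRANSFORMS = {
    "colour": str.capitalize,  # e.g. "black" -> "Black"
}
USER_AGENTS = [  # placeholder strings; a real pool would be fuller
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

def adjust(name, value):
    """Apply the per-variable random multiplier or text transform (req. 2b)."""
    if name in MODIFIERS:
        lo, hi = MODIFIERS[name]
        return round(int(value) * random.uniform(lo, hi))
    if name in TRANSFORMS:
        return TRANSFORMS[name](value)
    return value

def request_delay(low=1, high=10):
    """Random whole-second pause between requests (req. 2a)."""
    return random.randint(low, high)

def random_user_agent():
    """Pick a user agent at random for each request (req. 2c)."""
    return random.choice(USER_AGENTS)
```

With the 0.8-1.3 range, a value of 6 rounds back to 5, 6, 7 or 8, matching the example in the brief.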
I have no issues if you want to use a free library/script that you customise to do this work.
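Requirements 2(d) and 2(e), the randomised start time and the per-day frequency, would typically be handled outside the scraper, e.g. by cron invoking a small wrapper. A sketch under that assumption (the scraper path is a placeholder):

```shell
#!/usr/bin/env bash
# Wrapper a cron job could call. A crontab entry such as
#   0 6 * * * /opt/scraper/scrape-wrapper.sh
# would start this at 06:00; sleeping a random offset first makes the
# actual scrape time vary each day (req. 2d). Adding a second crontab
# entry (e.g. at 18:00) gives twice-daily scraping (req. 2e).
OFFSET=$((RANDOM % 3600))   # random 0-3599 s delay
echo "sleeping ${OFFSET}s before scraping"
# sleep "$OFFSET" && python3 /opt/scraper/run.py   # path is hypothetical
```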
The work will be completed in two steps:
1. First, provide me access to a sample file you have scraped, to demonstrate the data integrity of the test scrapes.
2. Second, set up the hosting environment so it can run the scraper.
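Steps 7 and 9 (Tor routing and building the URL list from the sitemap) might look roughly like this. The proxy address assumes a default Tor daemon running on the VPS, routing via `requests` assumes the `requests[socks]` extra (PySocks) is installed, and `template_marker` is a hypothetical substring identifying pages that use the single-page template.

```python
import xml.etree.ElementTree as ET
import requests

# Default Tor SOCKS port; socks5h resolves DNS through Tor as well.
TOR_PROXY = "socks5h://127.0.0.1:9050"

def tor_session():
    """A requests Session whose traffic is routed through the local Tor proxy."""
    s = requests.Session()
    s.proxies = {"http": TOR_PROXY, "https": TOR_PROXY}
    return s

def urls_from_sitemap(xml_text, template_marker):
    """Pull every <loc> from sitemap XML, keeping only URLs that match the
    single-page template (template_marker is an assumed substring filter)."""
    ns = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(f"{ns}loc")
            if template_marker in loc.text]
```

A session created this way can verify its exit point by fetching a what-is-my-IP service, which is the usual sanity check before scraping over Tor.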
Sean M.
0% (0) Projects Completed
-
Freelancers worked with
-
Projects awarded
0%
Last project
25 Apr 2024
France
Clarification Board
Please let me know if this is still available; I can do it with my best expertise.