Scrape webpages, no crawling required

Fixed Price

£86 (approx. $114)

Posted: 5 years ago
Proposals: 8
Remote
#3158616
Awarded

Description

Experience Level: Entry

I need to scrape 3324 webpages. I have a list of the URLs to scrape, so no crawling is required. Every webpage has the same format, so only one scraper is required. The URLs are provided in a CSV file.
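As a rough illustration of the input step, the URL list could be loaded along these lines. This is only a sketch: the layout of the CSV is not described in this posting, so the code assumes one URL per row in the first column, and the function name `read_urls` is hypothetical.

```python
import csv
from typing import Iterable


def read_urls(lines: Iterable[str]) -> list[str]:
    """Collect URLs from CSV lines.

    Assumption: the URL sits in the first column of each row.
    Rows that do not start with http:// or https:// (for example
    a header row or a blank line) are skipped.
    """
    urls = []
    for row in csv.reader(lines):
        if row and row[0].strip().startswith(("http://", "https://")):
            urls.append(row[0].strip())
    return urls
```

An open file object can be passed directly, for example `read_urls(open("urls.csv", newline="", encoding="utf-8"))`, where `urls.csv` is a placeholder file name.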

URLs to scrape: https://drive.google.com/file/d/1narrrwVo5GFCyG9wsKfkjPj7AcyGe6MX/view?usp=sharing

For each URL I require this data in json format (example data provided):

url: https://www.xero.com/uk/advisors/accountant/armstrong-watson-156b74095297/
name: Armstrong Watson
headoffice: "Victoria Place, Fairview House, Carlisle, England"
website: https://www.armstrongwatson.co.uk/xero-cloud-accounting
*rootdomain: armstrongwatson.co.uk
*homepage: https://www.armstrongwatson.co.uk
partnerstatus: Platinum champion partner
partnersince: 2013
facebook: https://en-gb.facebook.com/armstrongwatson/
twitter: https://www.twitter.com/armstrongwatson
linkedin: https://www.linkedin.com/company/armstrong-watson/
offices: [{"name": "carlisle", "address":"Victoria Place, Fairview House, Carlisle, CA1 1EX, England", "phone":"+44 01228 690100"},{...}]
officecount: 10
**logo: "armstrongwatson.png"

* Field calculated from the website URL.
** The logo should be downloaded and all logos put in one folder. To name the logo file, take the name field, lowercase it, remove all non-alphanumeric characters, and add the .png extension.
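The calculated values could be derived roughly as follows. This is a sketch under stated assumptions, not a required implementation: the helper names are hypothetical, the root domain is obtained simply by dropping a leading "www." (other subdomains would need a public-suffix lookup), and "alphanumeric" is taken to mean ASCII letters and digits.

```python
import re
from urllib.parse import urlsplit


def homepage_of(website: str) -> str:
    """Return scheme and host of the website URL, without path or query."""
    parts = urlsplit(website)
    return f"{parts.scheme}://{parts.netloc}"


def rootdomain_of(website: str) -> str:
    """Return the host with a leading 'www.' removed.

    Assumption: this is enough for the listed sites; it does not
    strip other subdomains such as 'shop.example.co.uk'.
    """
    host = urlsplit(website).hostname or ""
    return host[4:] if host.startswith("www.") else host


def logo_filename(name: str) -> str:
    """Lowercase the name, drop non-alphanumeric characters, add .png."""
    return re.sub(r"[^a-z0-9]", "", name.lower()) + ".png"
```

With the example data above, `logo_filename("Armstrong Watson")` gives `armstrongwatson.png`, and the website `https://www.armstrongwatson.co.uk/xero-cloud-accounting` yields `armstrongwatson.co.uk` and `https://www.armstrongwatson.co.uk`.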

Save the data in JSON format. Please validate the JSON before completing the job. Logo files should be saved in one folder and provided in a zip archive file.
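One possible way to check the deliverables before handing them over is sketched below. It assumes the output is a single file holding a top-level JSON array with one object per URL, which this posting does not state explicitly; the function names and any file names are placeholders.

```python
import json
import zipfile
from pathlib import Path


def validate_json_file(path: str) -> int:
    """Parse the file and return the number of records.

    json.load raises json.JSONDecodeError if the file is not valid JSON.
    Assumption: the records are stored as a top-level array.
    """
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array")
    return len(data)


def zip_logos(folder: str, archive: str) -> int:
    """Add every .png in the folder to a zip archive; return the file count."""
    files = sorted(Path(folder).glob("*.png"))
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # Store only the file name so the archive has a flat layout.
            zf.write(f, arcname=f.name)
    return len(files)
```

Comparing the record count returned by `validate_json_file` with the number of URLs in the CSV is a quick way to spot pages that were missed.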

Please do not create empty fields. If a field does not exist, omit it from the json.
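The rule about empty fields could be applied just before each record is written, for instance as in this sketch. It treats `None`, empty strings, empty lists and empty objects as "does not exist", which is an interpretation of the requirement rather than something spelled out here; the name `drop_empty` is hypothetical.

```python
def drop_empty(record: dict) -> dict:
    """Return a copy of the record without empty fields.

    Assumption: None, "", [] and {} all count as missing.
    Numeric values such as 0 are kept.
    """
    return {k: v for k, v in record.items() if v not in (None, "", [], {})}
```

For example, a record scraped from a page with no Twitter link would simply have no `twitter` key instead of `"twitter": ""`.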
