Data scraping tools required

Ends in (days)

1694

Fixed Price

£300 (approx. $404)

Posted: 5 years ago
Proposals: 12
Remote
#3058607
OPPORTUNITY
Awarded

Description

Experience Level: Intermediate

All local British councils have an obligation to publish a directory of childcare businesses in their area for parents to use when looking for a childminder, day nursery, preschool, etc. We want to collate an internal database of contact details for all of these organisations, and are looking for an experienced developer of code-based data scrapers to produce a handful of tools to facilitate this.

Although there are approximately 200 websites that need scraping, more than 80% of them use one of three off-the-shelf products to publish their directory. A single tool for each of these directory styles will therefore be able to scrape data from multiple websites.

The tools must work as follows:
Run from the Windows command line (Python preferred)
Take as input a CSV file listing all the URLs that share a common directory format
Open each website in turn
For one directory style, open each record’s individual page in turn
Extract key contact information from each record in the directory using CSS selectors
Save each record to a CSV output file (each record must be written reliably as it is extracted, so that a failure mid-process does not lose the data successfully extracted up to that point)

We will provide:
A template for the input CSV
A template for the output CSV showing the data to be extracted
3 example URLs for each directory style
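
To illustrate the record-by-record saving requirement, here is a minimal Python sketch of how one tool's main loop might look. It is not a definitive implementation. The column names in FIELDNAMES, the "url" header in the input CSV, and the functions read_urls and run are all assumptions made for illustration; the real columns would come from the templates we provide. The extract_records argument is a hypothetical callable standing in for the per-style step that opens a site and pulls fields out with CSS selectors (for example, with an HTML parsing library of the freelancer's choice).

```python
import csv
import os
import sys
from typing import Callable, Dict, Iterable, List

# Hypothetical output columns; the real ones would follow the output template.
FIELDNAMES = ["source_url", "name", "phone", "email"]


def read_urls(input_csv: str) -> List[str]:
    """Read site URLs from a CSV that has a 'url' header (assumed name)."""
    with open(input_csv, newline="", encoding="utf-8") as f:
        return [
            (row.get("url") or "").strip()
            for row in csv.DictReader(f)
            if (row.get("url") or "").strip()
        ]


def run(
    input_csv: str,
    output_csv: str,
    extract_records: Callable[[str], Iterable[Dict[str, str]]],
) -> int:
    """Visit each URL in turn and append every record to the output CSV.

    Each row is flushed and synced to disk as soon as it is written, so an
    exception part-way through leaves all earlier rows safely in the file.
    Returns the number of records saved.
    """
    urls = read_urls(input_csv)
    # Only write the header when starting a new (or empty) output file.
    need_header = (
        not os.path.exists(output_csv) or os.path.getsize(output_csv) == 0
    )
    saved = 0
    with open(output_csv, "a", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDNAMES, extrasaction="ignore")
        if need_header:
            writer.writeheader()
            out.flush()
        for url in urls:
            try:
                for record in extract_records(url):
                    writer.writerow({"source_url": url, **record})
                    out.flush()             # push Python's buffer to the OS
                    os.fsync(out.fileno())  # ask the OS to write it to disk
                    saved += 1
            except Exception as exc:
                # Log the failure and move on; rows already written are kept.
                print(f"Failed on {url}: {exc}", file=sys.stderr)
    return saved
```

A tool built this way could be called as run("input.csv", "output.csv", my_extractor), where my_extractor is the style-specific scraping function. Opening the output in append mode also means a rerun after a crash adds to the existing file rather than overwriting it.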

The selected freelancer must produce a separate tool for each directory style in line with the spec above, and must supply extracted data for all example urls to demonstrate their code’s efficacy.
