Data scraping tools required
£300 (approx. $377)
- Posted:
- Proposals: 12
- Remote
- #3058607
- OPPORTUNITY
- Awarded
Description
Experience Level: Intermediate
All local British councils have an obligation to publish a directory of childcare businesses in their area for parents to use when looking for a childminder/day nursery/preschool etc. We want to collate an internal database of contact details for all of these organisations, and are looking for an experienced code-based data scraper to produce a handful of tools to facilitate this.
Although there are approximately 200 websites which need scraping, more than 80% of them use one of three off-the-shelf products to publish their directory. A single tool for each of these directory styles will therefore be able to scrape data from multiple websites.
The tools must work as follows:
Be runnable from the Windows command line (Python preferred)
Take as input a CSV file listing all the URLs which share a common directory format
Open each website in turn
For one directory style, open each record’s individual page in turn
Extract key contact information from each record in the directory based on CSS selectors
Save each record to a CSV output (each record must be written reliably as it is extracted, so that a failure mid-process does not lose the data successfully extracted up to the exception)
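As a rough illustration of the save-as-you-go requirement above, a minimal Python sketch might look like the following. It uses only the standard library and is not a definitive implementation. The `url` column name, the `FIELDS` list and the `extract_records` callable are assumptions made for illustration and are not taken from the input or output templates. Page fetching and CSS-selector extraction are left to whatever callable is passed in.

```python
import csv
import os
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical output columns; the real ones would come from the output template.
FIELDS = ["source_url", "name", "phone", "email"]


def read_urls(input_csv: str, column: str = "url") -> Iterator[str]:
    """Yield each non-empty URL from the input CSV (the column name is an assumption)."""
    with open(input_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            url = (row.get(column) or "").strip()
            if url:
                yield url


def scrape_to_csv(
    input_csv: str,
    output_csv: str,
    extract_records: Callable[[str], Iterable[Dict[str, str]]],
) -> int:
    """Append every extracted record to output_csv, one row at a time.

    extract_records is a hypothetical callable that takes a directory URL and
    yields one dict per childcare provider. Returns the number of rows saved.
    """
    # Only write the header when the output file is new or empty.
    new_file = not os.path.exists(output_csv) or os.path.getsize(output_csv) == 0
    saved = 0
    with open(output_csv, "a", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
        if new_file:
            writer.writeheader()
            out.flush()
        for url in read_urls(input_csv):
            try:
                for record in extract_records(url):
                    writer.writerow({**record, "source_url": url})
                    # Push the row to disk immediately so a later crash
                    # cannot lose it.
                    out.flush()
                    os.fsync(out.fileno())
                    saved += 1
            except Exception as exc:
                # Log the failure and carry on with the next website.
                print(f"Failed on {url}: {exc}")
    return saved
```

Such a function could be called as `scrape_to_csv("input.csv", "output.csv", my_extractor)`, where `my_extractor` is the scraper for one directory style. Because the output file is opened in append mode, rows saved before an exception stay in place, and the sketch chooses to log the error and move on to the next URL rather than stop.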
We will provide:
A template for the input CSV
A template for the output CSV showing the required data to be extracted
3 example URLs for each directory style
The selected freelancer must produce a separate tool for each directory style in line with the spec above, and must supply extracted data for all example URLs to demonstrate their code’s efficacy.
Gregory Roger M.
100% (2)
Projects Completed: 2
Freelancers worked with: 2
Projects awarded: 25%
Last project: 6 Oct 2021
United Kingdom