Scrape a website & extract company data, save to CSV data files & download items
£50 (approx. $63)
- Posted:
- Proposals: 7
- Remote
- #498126
- Completed
Description
Experience Level: Intermediate
Estimated project duration: 1 - 2 weeks
To provide a script to scrape the following URL:
http://www.harperswineriesguide.com/Default.aspx
The script should use the alphabetical index of winery company names to find each winery, write its details to a CSV file, and download the linked files for each company.
Ideally it should use Python and standard scraping libraries (Beautiful Soup etc.), but we are willing to consider PHP.
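As a rough illustration of the link-collection step, the sketch below pulls company IDs out of index-page markup using only Python's standard library (`html.parser` and `urllib.parse`); Beautiful Soup could do the same job. The sample HTML is made up, and the assumption that every company link follows the `amendentry.aspx?id=...` pattern quoted in this brief has not been checked against the live site.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class CompanyLinkParser(HTMLParser):
    """Collect company IDs from <a href="...amendentry.aspx?id=NNN"> links."""

    def __init__(self):
        super().__init__()
        self.company_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parsed = urlparse(href)
        # Assumption: company links point at amendentry.aspx with a numeric id.
        if parsed.path.lower().endswith("amendentry.aspx"):
            ids = parse_qs(parsed.query).get("id", [])
            if ids and ids[0].isdigit() and ids[0] not in self.company_ids:
                self.company_ids.append(ids[0])


def extract_company_ids(html):
    """Return the unique company IDs found in the given HTML, in page order."""
    parser = CompanyLinkParser()
    parser.feed(html)
    return parser.company_ids


# Made-up sample markup, for illustration only.
SAMPLE = (
    '<a href="amendentry.aspx?id=14526">Example Winery</a>'
    '<a href="Default.aspx">Home</a>'
)
print(extract_company_ids(SAMPLE))
```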
It should create a database as follows and download the linked files (logos and documents) to the local PC.
Table 1 - Company Information
-Fields:
1. CompanyID (as assigned by their database; see href="amendentry.aspx?id=14526")
2. Company Name
3. Company Logo (filename), named as CompanyID in the Logo subdirectory
4. Document ID (filename), named as CompanyID_x in the Document subdirectory
5. Company Statement (Text)
6-11. Address1-6
12. Country
13. Telephone
14. Email Address
15. Website
16. Products and Services Item(s) (comma separated list)
17. Key personnel1 (Name: Job Title)
18. Key personnel2 (Name: Job Title)
19. Key personnel3 (Name: Job Title)
20. Key personnel4 (Name: Job Title)
21. Key personnel5 (Name: Job Title)
22. Date Downloaded (the date the utility is run)
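The 22 fields above could map onto a CSV header along the following lines. This is only a sketch: the `build_row` helper, the shape of its input dictionary (`products`, `personnel`) and the ISO date format for "Date Downloaded" are assumptions for illustration, not part of the specification.

```python
import csv
import io
from datetime import date

# Column names follow the field list in the brief (22 columns in total).
FIELDNAMES = (
    ["CompanyID", "Company Name", "Company Logo", "Document ID", "Company Statement"]
    + [f"Address{i}" for i in range(1, 7)]
    + ["Country", "Telephone", "Email Address", "Website", "Products and Services"]
    + [f"Key personnel{i}" for i in range(1, 6)]
    + ["Date Downloaded"]
)


def build_row(company):
    """Turn a scraped company dict (hypothetical shape) into one CSV row."""
    row = {name: "" for name in FIELDNAMES}
    row.update({k: v for k, v in company.items() if k in row})
    # Products and services become a single comma-separated cell.
    row["Products and Services"] = ", ".join(company.get("products", []))
    # At most five key personnel, each written as "Name: Job Title".
    for i, (name, title) in enumerate(company.get("personnel", [])[:5], start=1):
        row[f"Key personnel{i}"] = f"{name}: {title}"
    # Assumption: ISO format; the brief does not specify a date format.
    row["Date Downloaded"] = date.today().isoformat()
    return row


def write_csv(rows, fileobj):
    """Write the header and all rows to an open text file object."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Illustrative data only; none of these values come from the target site.
example = build_row({
    "CompanyID": "14526",
    "Company Name": "Example Winery",
    "products": ["Red wine", "Tours"],
    "personnel": [("A. Person", "Winemaker")],
})
buffer = io.StringIO()
write_csv([example], buffer)
print(buffer.getvalue())
```

When writing to a real file, opening it with `newline=""` avoids blank lines between rows on Windows.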
Directory structure:
Base Directory (definable in script by modification of a variable)
- "Logo" Directory
- "Document" Directory
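One way the directory layout and file naming might be handled is sketched below. `BASE_DIR` is the single variable to edit; its default value, the helper names, and the choice to keep each file's original extension are assumptions rather than requirements from the brief.

```python
from pathlib import Path
from urllib.parse import urlparse

# Change this one variable to relocate all output (the value is a placeholder).
BASE_DIR = Path("winery_data")


def ensure_directories(base_dir=BASE_DIR):
    """Create the Logo and Document subdirectories if they do not exist."""
    logo_dir = Path(base_dir) / "Logo"
    document_dir = Path(base_dir) / "Document"
    logo_dir.mkdir(parents=True, exist_ok=True)
    document_dir.mkdir(parents=True, exist_ok=True)
    return logo_dir, document_dir


def _extension(source_url):
    """Return the lower-cased file extension of a URL's path, e.g. '.png'."""
    return Path(urlparse(source_url).path).suffix.lower()


def logo_path(logo_dir, company_id, source_url):
    """Logo files are named after the CompanyID, e.g. Logo/14526.png."""
    return Path(logo_dir) / f"{company_id}{_extension(source_url)}"


def document_path(document_dir, company_id, index, source_url):
    """Documents are named CompanyID_x, e.g. Document/14526_1.pdf."""
    return Path(document_dir) / f"{company_id}_{index}{_extension(source_url)}"
```

For example, `logo_path(logo_dir, "14526", "http://example.com/img/logo.PNG")` yields a path ending in `14526.png` inside the Logo directory.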
Any errors should be logged to a separate "error_dd-mm-yy_hh;mm:ss.txt" file, with full details of each error and the point in the code that caused or found it (e.g. 404, timeout, access denied).
The script should not overload the target website and should generally be a sympathetic scraper.
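The error log could be produced with Python's standard `logging` module, as in the sketch below. One deliberate deviation, stated as an assumption: the time part of the filename uses hyphens (`hh-mm-ss`), because colons are not allowed in Windows filenames. The function names are illustrative only.

```python
import logging
from datetime import datetime
from pathlib import Path


def error_log_name(now=None):
    """Build a timestamped name such as error_20-09-19_14-05-09.txt."""
    now = now or datetime.now()
    # Assumption: hyphens replace the colons in the brief's pattern.
    return now.strftime("error_%d-%m-%y_%H-%M-%S.txt")


def make_error_logger(directory=".", now=None):
    """Return (logger, path); the file is only created on the first error."""
    path = Path(directory) / error_log_name(now)
    logger = logging.getLogger(f"scraper_errors_{path.stem}")
    logger.setLevel(logging.ERROR)
    logger.propagate = False
    if not logger.handlers:
        handler = logging.FileHandler(path, encoding="utf-8", delay=True)
        # Source file, line number and function name record where in the
        # code the error was caught.
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(filename)s:%(lineno)d in %(funcName)s: %(message)s"
        ))
        logger.addHandler(handler)
    return logger, path
```

Inside an `except` block, `logger.exception("Failed to fetch %s", url)` would also append the full traceback to the file.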
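A simple way to stay gentle on the target site is to enforce a minimum gap between requests, as sketched here. The two-second interval, the `Throttle` class, the `fetch` helper and the User-Agent string are all assumptions for illustration; the brief does not set a rate limit.

```python
import time
import urllib.request


class Throttle:
    """Ensure at least `min_interval` seconds pass between requests."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to respect the minimum interval."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def fetch(url, throttle, timeout=30):
    """Fetch a URL politely: wait first, identify the client, set a timeout."""
    throttle.wait()
    request = urllib.request.Request(
        url, headers={"User-Agent": "winery-scraper-sketch/0.1"}  # placeholder
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Sharing one `Throttle` instance across page requests and file downloads keeps the overall request rate bounded.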
Please ask any questions after reviewing the specification and the target website.
Please explain fully your proposed language and any modules you'll be using to achieve the result.
We will want the fully tested, working script with basic annotations so that we can maintain it and re-run it regularly to refresh the data.
Richard L.
- Rating: 99% (23)
- Projects Completed: 10
- Freelancers worked with: 17
- Projects awarded: 75%
- Last project: 20 Sep 2019
- Location: United Kingdom