Data scraper and text file/spreadsheet creation
£50 (approx. $63)
- Posted:
- Proposals: 4
- Remote
- #215306
- Awarded
Description
Experience Level: Intermediate
I want to produce two outputs from news sites that I have login details for.
1 - Scrape sites such as Newsbank and the Guardian Digital Archive to extract the Features articles. For each article I want to collect the following data: headline, body copy, author and date.
2 - Output this data in two formats. First, put the data (headline, body copy, author, date) into a spreadsheet. Second, save the body copy of each story to a separate .txt file, with the filename in newspaper-date-headline.txt format. For the .txt files the date should be in 20130209 format, and in the spreadsheet it should be in 09/02/2013 format.
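To make the two output formats concrete, here is a minimal Python sketch of the export step only (it does not do any scraping). It rests on several assumptions that are not part of the brief: that a CSV file is an acceptable spreadsheet, that the 20130209 date belongs in the .txt filename, that the filename parts are lower-cased with spaces and punctuation turned into hyphens, and that each article arrives as a dictionary with the keys shown. The names `slugify`, `txt_filename`, `export_articles` and `articles.csv` are hypothetical.

```python
import csv
import re
from datetime import date
from pathlib import Path


def slugify(text, max_len=80):
    """Lower-case text and replace runs of non-alphanumerics with hyphens."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-").lower()
    return slug[:max_len].rstrip("-")


def txt_filename(newspaper, published, headline):
    """Build a newspaper-date-headline.txt name with the date as YYYYMMDD."""
    stamp = published.strftime("%Y%m%d")  # e.g. 20130209
    return f"{slugify(newspaper)}-{stamp}-{slugify(headline)}.txt"


def export_articles(articles, out_dir):
    """Write one CSV row and one body-copy .txt file per article.

    Each article is assumed to be a dict with the keys
    'newspaper', 'headline', 'body', 'author' and 'date' (a datetime.date).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "articles.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Headline", "Body Copy", "Author", "Date"])
        for art in articles:
            writer.writerow([
                art["headline"],
                art["body"],
                art["author"],
                art["date"].strftime("%d/%m/%Y"),  # e.g. 09/02/2013
            ])
            name = txt_filename(art["newspaper"], art["date"], art["headline"])
            (out / name).write_text(art["body"], encoding="utf-8")
```

With these assumptions, a Guardian article dated 9 February 2013 and headlined "A Day at the Races!" would be saved as guardian-20130209-a-day-at-the-races.txt, and its spreadsheet row would show the date as 09/02/2013.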
Ideally this would be a scraper that can be run on ScraperWiki or similar, as I want to use it on the Times, Guardian and Mirror newspapers across several years of articles. If needed I can provide my login details (although if you are in the UK you can log into Newsbank using your local library card number via your local library's website).
Alternatively, I can manually go into Newsbank and save the Features articles as .pdf or .txt/.html files (not preferable, but it works); an example of the data from me doing this is attached. Again, some kind of code to extract the data (I have MS Office for Mac, so perhaps a Visual Basic program, or a Mac Automator program?) would be preferable to me having to copy and send the HTML/PDF over.
If you have any questions or need clarification, please ask.
Jonathan
Jonathan R.
100% (1)Projects Completed
1
Freelancers worked with
1
Projects awarded
67%
Last project
26 Feb 2013
United Kingdom