Data scraper and Text file/spreadsheet creation

Fixed Price

£50(approx. $67)

Posted: 13 years ago
Proposals: 4
Remote
#215306
Awarded

Description

Experience Level: Intermediate

I want to create two things based on data from news sites that I have the login details for.

1 - Scrape sites such as Newsbank and the Guardian Digital Archive to extract the Features articles. For each article I want to collect the following data: Headline, Body Copy, Author, Date.

2 - Put this data into two formats. First, put all of the data (headline, body copy, author, date) into a spreadsheet. Second, put the body copy from each story into a separate .txt file, with the filename in newspaper-date-headline.txt format. The date should be in 20130209 format for the .txt file and in 09/02/2013 format in the spreadsheet.
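To make the two output formats concrete, here is a minimal Python sketch of the export step only (it does not do any scraping). It assumes the articles have already been collected; the `Article` class, the `slugify` helper, the `articles.csv` filename and the use of CSV as the spreadsheet format are all illustrative assumptions, not part of the brief. It also assumes the 20130209 date goes into the .txt filename.

```python
import csv
import re
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass
class Article:
    # Hypothetical record holding the four requested fields plus the source paper.
    newspaper: str
    headline: str
    body: str
    author: str
    published: date


def slugify(text, max_len=80):
    # Lowercase and collapse anything that is not a letter or digit into a hyphen,
    # so the result is safe to use in a filename (an assumed convention).
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-")


def txt_filename(article):
    # newspaper-date-headline.txt, with the date written as YYYYMMDD.
    stamp = article.published.strftime("%Y%m%d")
    return f"{slugify(article.newspaper)}-{stamp}-{slugify(article.headline)}.txt"


def export(articles, out_dir):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # A CSV file opens directly in Excel and most other spreadsheet programs.
    with open(out / "articles.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Headline", "Body Copy", "Author", "Date"])
        for a in articles:
            # Spreadsheet date is written as DD/MM/YYYY.
            writer.writerow(
                [a.headline, a.body, a.author, a.published.strftime("%d/%m/%Y")]
            )
            # One .txt file per story, containing only the body copy.
            (out / txt_filename(a)).write_text(a.body, encoding="utf-8")
```

Calling `export([Article("The Guardian", "Budget cuts: what next?", "Body text", "A. Writer", date(2013, 2, 9))], "output")` would write `output/articles.csv` plus one .txt file for the story.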

Ideally, this would be a scraper that can run on ScraperWiki or similar, as I want to use it on the Times, Guardian and Mirror newspapers across several years of articles. If needed I can provide my login details (although if you are in the UK you can log into Newsbank using your local library card number via your local library website).

Alternatively, I can manually go into Newsbank and save the Features articles as a .pdf or .txt/.html file (not preferable, but it works). An example of the data I get from doing this is attached. Again, I would prefer some kind of code to extract the data (I have MS Office for Mac, so perhaps a Visual Basic program, or a Mac Automator workflow?) rather than having to copy and send the HTML/PDF over.
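For the manual route, a first step could be pulling the visible text out of a saved .html file. The sketch below uses only Python's standard library; the `TextExtractor` and `html_to_text` names are hypothetical, and it makes no assumptions about how Newsbank lays out its pages, so splitting the text into headline, author and date would still need rules based on the attached example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    # Tags whose contents are never visible article text.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped tag.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    # Returns the visible text, one line per text fragment found in the page.
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

A saved page could then be converted with something like `html_to_text(Path("saved.html").read_text(encoding="utf-8"))`, where the file name and encoding are placeholders.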

If you have any questions or need clarification, please ask.

Jonathan
