Experience Level: Expert
I have thousands of source HTML files, each containing multiple articles (formatted text); one example is attached. The job is to strip out the page header, footer and border content, extract every article into its own file (one HTML file per article), categorise and group the articles under major headings/sub-headings, and build an Excel/CSV file with metadata for each article (name, header, original file name, date, author, etc.), one line per article.
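To make the scope concrete, here is a minimal sketch of the split-and-index step in Python, using only the standard library. The markup is an assumption, since the attached example is not reproduced here: each article is taken to sit in a `<div class="article">` with its title in an `<h2>`. The names `ArticleSplitter`, `split_page` and `rows_to_csv` and the CSV columns are hypothetical; the real selectors and fields (date, author, category) would have to be read off the actual files.

```python
import csv
import io
from html.parser import HTMLParser
from pathlib import PurePath


class ArticleSplitter(HTMLParser):
    """Collects the inner HTML of every <div class="article"> (assumed markup)."""

    def __init__(self):
        # Keep entities such as &amp; untouched so the article HTML is preserved.
        super().__init__(convert_charrefs=False)
        self.articles = []       # one dict per article: {"title", "html"}
        self._depth = 0          # <div> nesting depth inside the current article
        self._buf = []           # raw HTML pieces of the current article
        self._title = []         # text pieces of the current <h2>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if self._depth == 0:
            # Outside an article: header, footer and border markup is skipped.
            classes = (dict(attrs).get("class") or "").split()
            if tag == "div" and "article" in classes:
                self._depth = 1
                self._buf, self._title = [], []
            return
        if tag == "div":
            self._depth += 1
        if tag == "h2":
            self._in_title = True
        self._buf.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self._depth:
            self._buf.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._depth == 0:
            return
        if tag == "div":
            self._depth -= 1
            if self._depth == 0:  # closing tag of the article wrapper
                self.articles.append({
                    "title": "".join(self._title).strip(),
                    "html": "".join(self._buf).strip(),
                })
                return
        if tag == "h2":
            self._in_title = False
        self._buf.append(f"</{tag}>")

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)
            if self._in_title:
                self._title.append(data)

    def handle_entityref(self, name):
        if self._depth:
            self._buf.append(f"&{name};")

    def handle_charref(self, name):
        if self._depth:
            self._buf.append(f"&#{name};")


def split_page(html_text, source_name):
    """Return one row per article found in a single source page."""
    parser = ArticleSplitter()
    parser.feed(html_text)
    parser.close()
    stem = PurePath(source_name).stem
    return [
        {
            "output_file": f"{stem}_{i:03d}.html",  # hypothetical naming scheme
            "title": art["title"],
            "source_file": source_name,
            "html": art["html"],
        }
        for i, art in enumerate(parser.articles, start=1)
    ]


def rows_to_csv(rows):
    """Serialise the metadata columns (not the HTML body) as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["output_file", "title", "source_file"],
        extrasaction="ignore",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

In a full run, each row's `html` value would be written to its `output_file` and the rows from all source pages would be collected into a single CSV. Categorising under headings/sub-headings is left out of this sketch because it depends on how the source pages mark their sections.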
Hamid K. 0% (0)
20 Oct 2011