Client Details
- Member Since: Oct 2011
- Last Login: 17 Mar 2012
- Jobs Posted: 1
- Jobs Awarded: 1
- Paid out: $1,411
Category: IT/Web/Programming > Web Programming
ID: 95694
Title: data processing
Location: Anywhere
Job Description
I have thousands of source HTML files, each containing multiple articles (formatted text); one example is attached. The job is to strip out the page header, footer and border content and extract every article into its own HTML file (one file per article). The articles must then be categorised and grouped under major headings and sub-headings, and an Excel/CSV file built with the metadata for each article (name, header, original file name, date, author, etc.), one line per article.
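A minimal sketch of the extraction step, for illustration only. The real markup of the source files is not shown in this listing, so the sketch assumes (hypothetically) that each article sits in a `<div class="article">` with its title in an `<h2>`; the names `ArticleSplitter`, `split_articles` and `write_index` are likewise made up for this example. Date, author and category columns are left out because their location in the markup is unknown.

```python
import csv
import html
from html.parser import HTMLParser
from pathlib import PurePath


class ArticleSplitter(HTMLParser):
    """Collects the inner HTML and title of every <div class="article">.

    The wrapper div and the <h2> title are ASSUMED markup, not taken from
    the client's files. Everything outside an article div (page header,
    footer, borders) is simply ignored.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True: text arrives unescaped
        self.articles = []  # list of {"title": str, "html": str}
        self._depth = 0     # <div> nesting inside the current article; 0 = outside
        self._parts = []
        self._title = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if self._depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if tag == "div" and "article" in classes:
                self._depth = 1
                self._parts, self._title = [], []
            return
        if tag == "div":
            self._depth += 1
        if tag == "h2":
            self._in_title = True
        self._parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self._depth:
            self._parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._depth == 0:
            return
        if tag == "div":
            self._depth -= 1
            if self._depth == 0:  # closing tag of the article wrapper itself
                self.articles.append({
                    "title": "".join(self._title).strip(),
                    "html": "".join(self._parts).strip(),
                })
                return
        if tag == "h2":
            self._in_title = False
        self._parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._depth:
            # Re-escape so the saved fragment stays valid HTML.
            self._parts.append(html.escape(data, quote=False))
            if self._in_title:
                self._title.append(data)


def split_articles(source_name, html_text):
    """Return one row per article: output file name, title, source file, HTML."""
    parser = ArticleSplitter()
    parser.feed(html_text)
    parser.close()
    stem = PurePath(source_name).stem
    return [
        {
            "article_file": f"{stem}_{i:02d}.html",
            "title": art["title"],
            "source_file": source_name,
            "html": art["html"],
        }
        for i, art in enumerate(parser.articles, start=1)
    ]


def write_index(rows, out):
    """Write the one-line-per-article CSV index to an open text stream."""
    fields = ["article_file", "title", "source_file"]
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
```

In use, each row's `html` value would be written to the file named in `article_file`, and the rows from all source files would be passed to `write_index` together. Because the files come in two formats, a second parser class with different assumed markers would probably be needed for the other half.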
Job Budget
Type: Fixed Price
Budget: £500 - £800 (approx. $784 - $1,254)
Additional Information
Attached Files:
finance_and_energy.html
Bidding ends: Bidding Closed
Job Posted: 07/10/2011 13:55
Clarification Board
J. M. on 07/10/2011 14:20: Can you give clarity on the actual number of HTML files involved in this project?
Reply from Client
H. K. on 07/10/2011 14:38: Approx. 1,100 HTML files containing on average 4 articles each, so about 4,400 individual articles to be extracted.
R. P. on 07/10/2011 18:55: Do all the HTML files follow the same format?
Reply from Client
H. K. on 07/10/2011 20:22: Yes and no! Half are in one format, half in a different format.
About half are from one website and follow a consistent format; the other half are from a different website with a slightly different pattern, but again consistent within that group. I have not looked at all of the files, but in the dozen or so I have seen, the two formats are consistent.