
Linux service to automate web scrapes
5542
£400 (approx. $534)
- Posted:
- Proposals: 4
- Remote
- #48092
- Archived
Description
Experience Level: Expert
Development of a Linux-based application that will automate the browsing of a given URL and store the resulting HTML.
A web service (already written) will provide a URL to be scraped by the application/service running on a Linux machine. The Linux machine will need to use a recognised web browser rendering engine (such as Gecko or WebKit) to browse the URL to ensure it downloads all parts of the page correctly.
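As a rough illustration of the rendering step, the sketch below shells out to a headless Chromium build and captures the rendered DOM. It assumes a binary named `chromium` is on the PATH (the name varies by distribution) and that Blink, which descends from WebKit, is an acceptable engine; the function names and the 60-second timeout are hypothetical, not part of the brief.

```python
import subprocess
from urllib.parse import urlparse


def build_render_command(url: str, browser: str = "chromium") -> list[str]:
    """Build the argv for a headless render of `url`.

    Only http(s) URLs are accepted, so a value such as "--some-flag"
    cannot be smuggled in as a browser option.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"refusing to render non-http(s) URL: {url!r}")
    # --headless and --dump-dom are Chrome/Chromium command-line switches;
    # --dump-dom prints the DOM to stdout after the page has loaded.
    return [browser, "--headless", "--disable-gpu", "--dump-dom", url]


def render_html(url: str, browser: str = "chromium", timeout: float = 60.0) -> str:
    """Run the browser and return the rendered HTML (assumed timeout: 60 s)."""
    result = subprocess.run(
        build_render_command(url, browser),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

A Gecko-based alternative would need a driver such as Selenium or Playwright, which is outside this sketch.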
The web service will provide a simple list of parts of the HTML document to be replaced by shortened tags (e.g. <body> might be replaced with #1#).
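The replacement step might look something like the following minimal sketch. It assumes the web service supplies the list as (original fragment, short tag) pairs; that format, and the name `shorten_html`, are assumptions, since the brief does not specify them.

```python
def shorten_html(html: str, replacements: list[tuple[str, str]]) -> str:
    """Replace each listed fragment of `html` with its short tag.

    Longer fragments are handled first so that a fragment which contains
    another one (e.g. "<body class='x'>" vs "<body") is not partially
    rewritten by the shorter rule.
    """
    ordered = sorted(replacements, key=lambda pair: len(pair[0]), reverse=True)
    for original, token in ordered:
        if not original:
            # str.replace("") would insert the token between every character.
            continue
        html = html.replace(original, token)
    return html
```

For example, with the pairs `("<body>", "#1#")` and `("</body>", "#2#")`, the input `<html><body>hi</body></html>` becomes `<html>#1#hi#2#</html>`. If a page could already contain text such as `#1#`, the scheme would presumably need an escaping rule agreed with the web service.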
Finally, the HTML document needs to be zipped before it is sent back to the web service to be stored.
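One possible way to do the zipping in memory, without temporary files, is sketched below using Python's standard `zipfile` module. The archive member name `page.html` and UTF-8 encoding are assumptions; how the bytes are then uploaded depends on the existing web service and is not shown.

```python
import io
import zipfile


def zip_html(html: str, member_name: str = "page.html") -> bytes:
    """Return a ZIP archive (as bytes) holding `html` as a single member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Encode explicitly so the stored bytes do not depend on the locale.
        archive.writestr(member_name, html.encode("utf-8"))
    return buffer.getvalue()
```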
The web scraper should run unattended on a Linux machine and be able to run 24/7.
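For unattended 24/7 operation, one common pattern is a polling loop that logs failures and keeps going, with a process supervisor (systemd, for instance) restarting the process if it dies. The sketch below assumes hypothetical `fetch_job` and `handle_job` callables standing in for the calls to the existing web service and the scrape pipeline; the 5-second idle delay is likewise an assumption.

```python
import logging
import time
from typing import Callable, Optional


def run_worker(
    fetch_job: Callable[[], Optional[str]],
    handle_job: Callable[[str], None],
    idle_seconds: float = 5.0,
    max_iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll for URLs and process them; return how many jobs succeeded.

    With max_iterations=None the loop never exits, which is the intended
    service mode. A failing job is logged and skipped, not fatal.
    """
    handled = 0
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        try:
            job = fetch_job()
            if job is None:
                sleep(idle_seconds)  # nothing to do; wait before polling again
                continue
            handle_job(job)
            handled += 1
        except Exception:
            logging.exception("job failed; continuing with the next one")
            sleep(idle_seconds)
    return handled
```

The `max_iterations` and `sleep` parameters exist only so the loop can be exercised in tests without waiting or running forever.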
Ben V.
0% (0)
Projects Completed: 1
Freelancers worked with: 1
Projects awarded: 50%
Last project: 28 Jun 2011
United Kingdom