Help with configuring and extending the functionality of a web crawler like Apache Nutch
£15/hr (approx. $19/hr)
- Posted:
- Proposals: 2
- Remote
- #723277
- Expired
Description
Experience Level: Intermediate
General information for the business: Online Investment Analysis Platform being developed
Kind of development: Customization of existing program
Description of requirements/functionality: I am currently developing a website in Java, using Hibernate and Maven, with a MySQL database.
I would like to build a web crawler into the application.
I would like to integrate an existing web crawler such as Apache Nutch or crawler4j, and then extend and configure it to meet my requirements.
As I plan to maintain the website myself, I need to know how everything works, so I would like to write the code myself.
I need someone to spend time with me on Skype, or another messenger service that allows screen sharing, to explain how to carry out my requirements and help me debug the problems that emerge.
I need help to complete the following tasks:
Integrating the web crawler libraries into my project
Creating a controller class that will control the crawler by:
sending the crawler a list of sites to be crawled
setting the speed at which each site can be crawled
defining which documents should be downloaded
defining the depth of the crawl, i.e. how many levels the crawl should go for each site specified
defining the location where the saved files are output
Extending the crawler to download any file type or, if this is too complex, to download all PDF files
Saving all downloaded files to the location specified in the controller class
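As a rough illustration of the controller settings listed above, here is a minimal sketch in plain Java (assuming Java 11 or later). It is not tied to Apache Nutch or crawler4j: the class name `CrawlSettings`, its fields and its methods are all hypothetical, and the values would still need to be passed on to whichever crawler is chosen. It only shows how the list of sites, crawl speed, depth, file-type filter and output location could be held in one place.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical settings holder for the controller class; not part of any crawler library. */
final class CrawlSettings {
    final List<String> seedUrls;          // sites to be crawled
    final int politenessDelayMs;          // pause between requests to the same site
    final int maxDepth;                   // how many link levels to follow from each site
    final Set<String> allowedExtensions;  // lower-case, no dot; an empty set means "any file type"
    final Path outputDir;                 // where downloaded files are saved

    CrawlSettings(List<String> seedUrls, int politenessDelayMs, int maxDepth,
                  Set<String> allowedExtensions, Path outputDir) {
        if (politenessDelayMs < 0 || maxDepth < 0) {
            throw new IllegalArgumentException("delay and depth must not be negative");
        }
        this.seedUrls = List.copyOf(seedUrls);
        this.politenessDelayMs = politenessDelayMs;
        this.maxDepth = maxDepth;
        this.allowedExtensions = Set.copyOf(allowedExtensions);
        this.outputDir = outputDir;
    }

    /** Path part of the URL without the query string; "" if the URL has no path. */
    private static String pathOf(String url) {
        // URI.create throws IllegalArgumentException for malformed URLs.
        String path = URI.create(url).getPath();
        return path == null ? "" : path;
    }

    /** Lower-cased file extension of the last path segment, or "" if it has none. */
    static String extensionOf(String url) {
        String path = pathOf(url);
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot <= slash || dot == path.length() - 1) {
            return "";
        }
        return path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** True if the URL's file type is one the crawl should download. */
    boolean shouldDownload(String url) {
        return allowedExtensions.isEmpty() || allowedExtensions.contains(extensionOf(url));
    }

    /** Location under outputDir for the file behind the URL (name collisions are not handled). */
    Path targetFor(String url) {
        String path = pathOf(url);
        String name = path.substring(path.lastIndexOf('/') + 1);
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            name = "index";
        }
        return outputDir.resolve(name);
    }

    public static void main(String[] args) {
        CrawlSettings settings = new CrawlSettings(
                List.of("https://example.com/"), 1000, 2, Set.of("pdf"), Paths.get("downloads"));
        String url = "https://example.com/docs/report.PDF";
        System.out.println(settings.shouldDownload(url)); // prints "true"
        System.out.println(settings.targetFor(url));
    }
}
```

Passing an empty set of extensions covers the "any file type" case, while `Set.of("pdf")` covers the PDF-only fallback, so the same class could serve either version of the requirement.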
Please note: only apply if you are already familiar with Apache Nutch or crawler4j and can easily help carry out the required tasks.
Specific technologies required: applicants must have experience of using the crawler, i.e. Apache Nutch or crawler4j.
Extra notes:
Arif S.
0% (0)Projects Completed
-
Freelancers worked with
-
Projects awarded
0%
Last project
24 Apr 2024
United Kingdom