Scrape the CIA World Factbook with Selenium WebDriver

Fixed Price: £300 (approx. $375)

Posted: 10 years ago
Proposals: 5
Remote
#366738
Awarded

Description

Experience Level: Expert

Estimated project duration: less than 1 week

I would like a Java project that can scrape data from the CIA World Factbook so that I can create some statistics.

1.1 Skill Set Required:
If you do not have the following skills, PLEASE do not apply!

1.1.1 JAVA
The application must be written in Java.
The source code must be delivered as an Eclipse Java project, and I plan to maintain it myself. As a statistician my Java is not very good, so
please use OOA/OOD and the highest coding standards.

1.1.2 Selenium WebDriver - the tool to be used for scraping.
1.1.3 HTML, CSS, XPath - to select the nodes from the HTML document.
1.1.4 REGEX - to further process the selected nodes before output.
1.1.5 XML - the processed data will be output as XML.
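As an illustration of the REGEX step, the sketch below pulls a number and its unit out of node text after it has been selected. The class name `FactbookText`, the pattern, and the unit list (`sq km`, `km`, `%`) are hypothetical assumptions, not taken from the posting; the real patterns must follow the actual Factbook fields and the sample XML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that post-processes node text selected by XPath. */
final class FactbookText {

    // Assumed format: a number with optional thousands separators and decimals,
    // followed by one of a few units, e.g. "8,514,877 sq km" or "7,491 km".
    private static final Pattern NUMBER_WITH_UNIT =
            Pattern.compile("(\\d[\\d,]*(?:\\.\\d+)?)\\s*(sq km|km|%)");

    private FactbookText() {
    }

    /** Returns the first number followed by a known unit, or null if none is found. */
    static Double extractNumber(String text) {
        Matcher m = NUMBER_WITH_UNIT.matcher(text);
        if (!m.find()) {
            return null;
        }
        // Remove thousands separators before parsing.
        return Double.valueOf(m.group(1).replace(",", ""));
    }

    /** Returns the unit that follows the first matched number, or null. */
    static String extractUnit(String text) {
        Matcher m = NUMBER_WITH_UNIT.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String sample = "total: 8,514,877 sq km"; // illustrative input only
        System.out.println(extractNumber(sample) + " " + extractUnit(sample));
    }
}
```

Keeping the pattern in one constant makes it easy to adjust when a field's format differs from the assumption.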

2.1 Software Requirements:
The application must satisfy all of the points in the Software Requirements.

2.1.1 Must be a Java application.
2.1.2 Must be a Java console application (no fancy GUI).

2.2.1 Must output logs to standard output.
2.2.2 Must also output logs to file using log4j.
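Requirements 2.2.1 and 2.2.2 can both be met by one log4j configuration that attaches a console appender and a file appender to the root logger. A minimal sketch, assuming Log4j 2 (the 1.x line is end-of-life) and a `log4j2.properties` file on the classpath; the log file path and layout pattern are placeholders:

```properties
status = warn

# Console appender: writes to standard output (2.2.1)
appender.console.type = Console
appender.console.name = STDOUT
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} %-5p %c - %m%n

# File appender: writes the same events to a file (2.2.2); path is a placeholder
appender.file.type = File
appender.file.name = LOGFILE
appender.file.fileName = logs/factbook-scraper.log
appender.file.layout.type = PatternLayout
appender.file.layout.pattern = %d{ISO8601} %-5p %c - %m%n

# Root logger sends every event to both appenders
rootLogger.level = info
rootLogger.appenderRef.stdout.ref = STDOUT
rootLogger.appenderRef.file.ref = LOGFILE
```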

2.3.1 Must use the Selenium Server Standalone jar for scraping.
2.3.2 Must use the Selenium FirefoxDriver WebDriver.

2.4.1 Must have the following command-line arguments:

Usage: my_app_name -option
where option is one of the following:

2.4.1.1 -url
2.4.1.2 -list
2.4.1.3 -version print product version and exit
2.4.1.4 -? -help print this help message
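A possible shape for the option handling in 2.4.1 is sketched below using only the standard library. The posting does not say what `-url` and `-list` take, so the assumption here is that each is followed by exactly one value; the class name `AppArguments` and the version string are hypothetical.

```java
/** Hypothetical parser for the options listed in 2.4.1. */
final class AppArguments {

    enum Option { URL, LIST, VERSION, HELP }

    final Option option;
    final String value; // argument after -url or -list; null for other options

    private AppArguments(Option option, String value) {
        this.option = option;
        this.value = value;
    }

    static AppArguments parse(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("No option given");
        }
        switch (args[0]) {
            case "-url":
                return new AppArguments(Option.URL, requireValue(args));
            case "-list":
                return new AppArguments(Option.LIST, requireValue(args));
            case "-version":
                return new AppArguments(Option.VERSION, null);
            case "-?":
            case "-help":
                return new AppArguments(Option.HELP, null);
            default:
                throw new IllegalArgumentException("Unknown option: " + args[0]);
        }
    }

    // Assumption: -url and -list each take exactly one following argument.
    private static String requireValue(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(args[0] + " requires a value");
        }
        return args[1];
    }

    static void printUsage() {
        System.out.println("Usage: my_app_name -option");
        System.out.println("where option is one of the following:");
        System.out.println("  -url <value>");
        System.out.println("  -list <value>");
        System.out.println("  -version     print product version and exit");
        System.out.println("  -? -help     print this help message");
    }

    public static void main(String[] args) {
        try {
            AppArguments parsed = parse(args);
            if (parsed.option == Option.HELP) {
                printUsage();
            } else if (parsed.option == Option.VERSION) {
                System.out.println("my_app_name 0.1"); // placeholder version string
            } else {
                System.out.println(parsed.option + " " + parsed.value);
            }
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
            printUsage();
            System.exit(1);
        }
    }
}
```

Invalid input prints the error and the usage text and exits with a non-zero status, which suits a console application.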

2.5.1 Must output the following components:

2.5.1.1 xml file called factbook.{url}.xml for each url (see chapter 2.6.1)
2.5.1.2 flag gif file called factbook.flag.{url}.giff for each url (image is on factbook page)
2.5.1.3 locator gif file called factbook.locator.{url}.giff for each url (image is on factbook page)
2.5.1.4 map gif file called factbook.map.{url}.giff for each url (image is on factbook page)
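The naming scheme in 2.5.1 could be centralised in one small helper, sketched below. It assumes that `{url}` means the page name without `.html` (for example `br` for `.../geos/br.html`); the posting does not define it, so this and the class name `OutputNames` are hypothetical. The image extension is kept exactly as written in 2.5.1.2 to 2.5.1.4.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds the output file names listed in 2.5.1. */
final class OutputNames {

    // Extension exactly as written in 2.5.1.2 to 2.5.1.4; change to ".gif" if that was a typo.
    static final String IMAGE_EXTENSION = ".giff";

    private OutputNames() {
    }

    /** Assumption: {url} is the page name without ".html", e.g. "br" for .../geos/br.html. */
    static String keyFor(String url) {
        String path = URI.create(url).getPath();
        String last = path.substring(path.lastIndexOf('/') + 1);
        return last.endsWith(".html") ? last.substring(0, last.length() - ".html".length()) : last;
    }

    /** Returns the XML, flag, locator and map file names for one Factbook page. */
    static List<String> fileNamesFor(String url) {
        String key = keyFor(url);
        return Arrays.asList(
                "factbook." + key + ".xml",
                "factbook.flag." + key + IMAGE_EXTENSION,
                "factbook.locator." + key + IMAGE_EXTENSION,
                "factbook.map." + key + IMAGE_EXTENSION);
    }

    public static void main(String[] args) {
        String url = "https://www.cia.gov/library/publications/the-world-factbook/geos/br.html";
        fileNamesFor(url).forEach(System.out::println);
    }
}
```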

2.6.1 A sample output XML file has been uploaded for this URL: https://www.cia.gov/library/publications/the-world-factbook/geos/br.html.
2.6.2 Please study the sample output XML file (it contains some comments) and the URL, and if anything is unclear, please ask questions.
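The sample XML file itself is not reproduced here, so the element names below (`factbook`, `country`, `area`) are placeholders only. This sketch shows one way to build the output with the JDK's built-in DOM and `Transformer` APIs, which escape special characters automatically; the class name `FactbookXmlWriter` and the example values are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical writer; the real element names must follow the uploaded sample file. */
final class FactbookXmlWriter {

    private FactbookXmlWriter() {
    }

    static String toXml(String url, String country, String area) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();

            Element root = doc.createElement("factbook"); // placeholder element name
            root.setAttribute("url", url);
            doc.appendChild(root);

            root.appendChild(textElement(doc, "country", country)); // placeholder
            root.appendChild(textElement(doc, "area", area));       // placeholder

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    private static Element textElement(Document doc, String name, String text) {
        Element element = doc.createElement(name);
        element.setTextContent(text); // special characters are escaped on output
        return element;
    }

    public static void main(String[] args) {
        System.out.println(toXml(
                "https://www.cia.gov/library/publications/the-world-factbook/geos/br.html",
                "Brazil",
                "8,514,877 sq km")); // illustrative values only
    }
}
```

To write straight to the file required by 2.5.1.1, the `StringWriter` can be replaced with `new StreamResult(new java.io.File(...))`.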

3.1. Deliverables
3.1.1 XML output from 5 random Factbook pages of my choosing.
3.1.2 Once I am happy with 3.1.1, the Java source code as an Eclipse project, with instructions on how to build and run it.

Tony W.