Scrape CIA Factbook with Selenium WebDriver
£300 (approx. $375)
- Proposals: 5
- Remote
- #366738
- Awarded
Description
Experience Level: Expert
Estimated project duration: less than 1 week
I would like a Java project that scrapes data from the CIA Factbook so that I can create some statistics.
1.1 Skill Set Required:
If you do not have the following skills, PLEASE do not apply!
1.1.1 Java
This is the language the application must be written in.
The source code must be delivered as an Eclipse Java project.
I plan to maintain the code myself and, as a statistician, my Java is not very good, so
please use OOA/OOD and the highest coding standards.
1.1.2 Selenium WebDriver - The tool to be used for scraping.
1.1.3 HTML, CSS, XPath - To select the nodes from the HTML document.
1.1.4 Regex - To further process the nodes for output once they have been selected.
1.1.5 XML - Output of processed data is going to be in XML.
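To illustrate how 1.1.4 and 1.1.5 fit together, here is a minimal sketch of cleaning one scraped text value with a regex and writing it as XML, using only the JDK. The class name `FieldToXml`, the `field` element, and the `name` and `unit` attributes are hypothetical; the real element names should follow the sample output file, which is not reproduced here.

```java
import java.io.StringWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical helper: turns one scraped text node into one XML element. */
final class FieldToXml {

    // A number with optional thousands separators and decimals, followed by a unit,
    // for example "1,234.5 sq km". Group 1 is the number, group 2 is the unit.
    private static final Pattern QUANTITY =
            Pattern.compile("^\\s*([\\d,]+(?:\\.\\d+)?)\\s+(.+?)\\s*$");

    /** Returns the element as a string; element and attribute names are assumptions. */
    static String toXml(String name, String rawText) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("field");
            xml.writeAttribute("name", name);
            Matcher m = QUANTITY.matcher(rawText);
            if (m.matches()) {
                // Quantity: keep the unit as an attribute and strip the thousands separators.
                xml.writeAttribute("unit", m.group(2));
                xml.writeCharacters(m.group(1).replace(",", ""));
            } else {
                // Anything else is written as trimmed text; the writer escapes &, < and >.
                xml.writeCharacters(rawText.trim());
            }
            xml.writeEndElement();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML for field " + name, e);
        }
        return out.toString();
    }
}
```

A call such as `FieldToXml.toXml("area", " 1,234.5 sq km ")` (a made-up input) would produce a `field` element carrying the unit as an attribute and the bare number as its text. Using a stream writer rather than string concatenation leaves the escaping of special characters to the library.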
2.1 Software Requirements:
The application must tick off all of the points in the Software Requirements.
2.1.1 Must be a Java application.
2.1.2 Must be a Java console application (no fancy GUI).
2.2.1 Must output logs to standard output.
2.2.2 Must also output logs to file using log4j.
2.3.1 Must use the Selenium Server Standalone jar for scraping.
2.3.2 Must use the Selenium FirefoxDriver WebDriver.
2.4.1 Must have the following command line arguments:
Usage: my_app_name -option
where option is one of the following:
2.4.1.1 -url
2.4.1.2 -list
2.4.1.3 -version print product version and exit
2.4.1.4 -? -help print this help message
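The option handling in 2.4.1 could be sketched as follows. This is only an illustration: the posting does not say what follows -url and -list, so the sketch assumes each takes one value (a page address, and the path of a file listing addresses). The class name `FactbookCli`, the `Mode` enum, and the version string are hypothetical.

```java
import java.util.Optional;

/** Sketch of the 2.4.1 options; the value after -url / -list is an assumption. */
final class FactbookCli {

    enum Mode { URL, LIST, VERSION, HELP }

    /** Parsed command line: the chosen mode plus its value (empty for -version and -help). */
    static final class Command {
        final Mode mode;
        final Optional<String> value;

        Command(Mode mode, Optional<String> value) {
            this.mode = mode;
            this.value = value;
        }
    }

    static final String VERSION = "0.1"; // placeholder version string

    static final String USAGE = String.join("\n",
            "Usage: my_app_name -option",
            "where option is one of the following:",
            "  -url <address>   (argument is an assumption)",
            "  -list <file>     (argument is an assumption)",
            "  -version         print product version and exit",
            "  -? -help         print this help message");

    /** Maps the raw arguments to a Command; no arguments is treated as a help request. */
    static Command parse(String[] args) {
        if (args.length == 0) {
            return new Command(Mode.HELP, Optional.empty());
        }
        switch (args[0]) {
            case "-url":
                return new Command(Mode.URL, Optional.of(requireValue(args)));
            case "-list":
                return new Command(Mode.LIST, Optional.of(requireValue(args)));
            case "-version":
                return new Command(Mode.VERSION, Optional.empty());
            case "-?":
            case "-help":
                return new Command(Mode.HELP, Optional.empty());
            default:
                throw new IllegalArgumentException("Unknown option: " + args[0]);
        }
    }

    private static String requireValue(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(args[0] + " requires a value");
        }
        return args[1];
    }

    public static void main(String[] args) {
        Command cmd;
        try {
            cmd = parse(args);
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
            System.err.println(USAGE);
            System.exit(2);
            return;
        }
        switch (cmd.mode) {
            case VERSION:
                System.out.println(VERSION);
                break;
            case HELP:
                System.out.println(USAGE);
                break;
            default:
                // The scraping itself is out of scope for this sketch.
                System.out.println("Would scrape: " + cmd.value.get());
        }
    }
}
```

Keeping `parse` separate from `main` means the option handling can be unit tested without starting a browser.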
2.5.1 Must output the following components:
2.5.1.1 xml file called factbook.{url}.xml for each url (see chapter 2.6.1)
2.5.1.2 flag gif file called factbook.flag.{url}.giff for each url (image is on factbook page)
2.5.1.3 locator gif file called factbook.locator.{url}.giff for each url (image is on factbook page)
2.5.1.4 map gif file called factbook.map.{url}.giff for each url (image is on factbook page)
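The naming scheme in 2.5.1 might be derived from the page address as in the sketch below. It assumes that {url} stands for the page's file name without its extension (for example `br` for `.../geos/br.html`); the posting does not define the placeholder, so this reading is a guess. The class name `OutputNames` is hypothetical, and the image extension is spelled exactly as in the list above.

```java
import java.net.URI;

/** Hypothetical helper that builds the output file names listed in 2.5.1. */
final class OutputNames {

    // Spelled as in the requirement list; ".gif" may be what was intended.
    static final String IMAGE_EXT = "giff";

    /** Assumed meaning of {url}: last path segment without its extension. */
    static String pageKey(String url) {
        String path = URI.create(url).getPath();
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            throw new IllegalArgumentException("URL has no page file name: " + url);
        }
        String file = path.substring(path.lastIndexOf('/') + 1);
        int dot = file.lastIndexOf('.');
        return dot > 0 ? file.substring(0, dot) : file;
    }

    /** Name of the XML output file for one page. */
    static String xmlName(String url) {
        return "factbook." + pageKey(url) + ".xml";
    }

    /** Name of an image file; kind is "flag", "locator" or "map". */
    static String imageName(String kind, String url) {
        return "factbook." + kind + "." + pageKey(url) + "." + IMAGE_EXT;
    }
}
```

Under that assumption, the sample address `https://www.cia.gov/library/publications/the-world-factbook/geos/br.html` gives `factbook.br.xml` and, for the flag, `factbook.flag.br.giff`.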
2.6.1 A sample output XML file is uploaded for this URL: https://www.cia.gov/library/publications/the-world-factbook/geos/br.html.
2.6.2 Please review the sample output XML file (it has some comments) and the URL, and ask questions if anything is unclear.
3.1. Deliverables
3.1.1 XML output from 5 random factbook pages of my selection.
3.1.2 Once I am happy with 3.1.1, the Java source code as an Eclipse project, with instructions on how to build and run it.
Tony W.
- 100% (2)
- Projects completed: 5
- Freelancers worked with: 4
- Projects awarded: 67%
- Last project: 10 Oct 2020
- United Kingdom