Java Programming / Web Scraping / Data Mining from UK Companies House

  • Proposals: 11
  • Remote
  • #30825
  • Expired

Description

Experience Level: Intermediate
I require a web scraping / data mining application that can extract and collate company information freely available from the UK Companies House "WebCHeck" service on its website.

The application should be written in Java (version 1.4 or later) and must run standalone on Windows XP, although I am willing to consider alternatives to Java. The source code must also be provided.
The application will need to navigate the WebCHeck pages automatically and iterate through a large number of company records (approximately 2 million or more) to extract the required fields listed below. These records should be collated into a text or CSV output file.
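The iteration described above could be structured roughly as follows. This is a minimal sketch only, written for a current JDK (Java 8 or later, for lambdas) rather than 1.4. It assumes company numbers are purely numeric and zero-padded to eight digits; prefixed numbers (for example Scottish "SC" numbers) would need a separate pass. `PageSource` and `PageParser` are hypothetical interfaces standing in for the real HTTP fetching and HTML parsing, because the actual WebCHeck URLs and page layout are not specified here. The fixed delay between requests is likewise an assumption, included to keep the load on the service low.

```java
import java.io.IOException;
import java.util.List;
import java.util.function.Consumer;

/**
 * Sketch of the record-iteration loop only. PageSource and PageParser are
 * hypothetical; no real WebCHeck URL or page structure is assumed.
 */
public class CrawlSketch {

    /** Hypothetical: fetch the page for one company number. */
    public interface PageSource {
        /** Returns the page HTML, or null if no company exists for this number. */
        String fetch(String companyNumber) throws IOException;
    }

    /** Hypothetical: extract the ordered output fields from one page. */
    public interface PageParser {
        List<String> parse(String html);
    }

    /** Assumed numbering scheme: zero-pad to eight digits, e.g. 123 -> "00000123". */
    public static String companyNumber(int n) {
        return String.format("%08d", n);
    }

    /**
     * Visits numbers first..last inclusive, hands each parsed record to sink,
     * and sleeps delayMillis between requests. Returns how many records were found.
     */
    public static int crawl(PageSource source, PageParser parser,
                            Consumer<List<String>> sink,
                            int first, int last, long delayMillis)
            throws IOException, InterruptedException {
        int found = 0;
        for (int n = first; n <= last; n++) {
            String html = source.fetch(companyNumber(n));
            if (html != null) {            // skip numbers with no company
                sink.accept(parser.parse(html));
                found++;
            }
            if (delayMillis > 0 && n < last) {
                Thread.sleep(delayMillis); // throttle between requests
            }
        }
        return found;
    }
}
```

In a full application the `sink` would append one line to the CSV output file per record; keeping it as a callback lets the fetching, parsing and output stages be tested separately.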

The fields required from WebCHeck are:
Company Name
Company Number
Status
Date of Incorporation
Country of Origin
Company Type
Nature of Business
Accounting Reference Date
Last Accounts Made Up To
Next Accounts Due
Last Return Made Up To
Next Return Due
Last Members List
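On the output side, one possible layout is a CSV file with one row per company and the thirteen fields above as columns, in the order listed. The sketch below assumes that layout; the class name `CompanyCsv` is hypothetical. Quoting follows the common RFC 4180 convention (a field containing a comma, double quote or line break is wrapped in double quotes, with embedded quotes doubled), which matters because company names can contain commas.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical CSV formatter for the thirteen fields listed in the brief. */
public class CompanyCsv {

    /** Header row, in the same order as the field list above. */
    public static final List<String> HEADER = Collections.unmodifiableList(Arrays.asList(
            "Company Name", "Company Number", "Status", "Date of Incorporation",
            "Country of Origin", "Company Type", "Nature of Business",
            "Accounting Reference Date", "Last Accounts Made Up To",
            "Next Accounts Due", "Last Return Made Up To", "Next Return Due",
            "Last Members List"));

    /** Quotes a single field only when CSV requires it; null becomes empty. */
    public static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    /** Builds one CSV line; rejects rows that do not have exactly 13 fields. */
    public static String toLine(List<String> fields) {
        if (fields.size() != HEADER.size()) {
            throw new IllegalArgumentException(
                    "expected " + HEADER.size() + " fields, got " + fields.size());
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields.get(i)));
        }
        return sb.toString();
    }
}
```

Writing `toLine(HEADER)` as the first line of the file gives a header row that spreadsheet tools can use directly.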
