Text Data Scraping & Word Counting Application
£800 (approx. $1.0k)
- Proposals: 6
- Remote
- #50684
- Archived
Flexible software developer and designer, incl. LAMP, CSS, MySQL, JavaScript, AJAX, .NET, iOS and more
Sutton in Ashfield
Description
Experience Level: Expert
Hi,
I'm looking for a developer to initially build a tool that forms part of a larger project I'm working on. There is potential for a lot more work for the right person.
I have a more detailed technical specification, which I will share with the freelancer I work with. I would rather not post all the details publicly.
The system should use Yahoo BOSS (or a similar search API; suggestions welcome) to retrieve a set of search results for a given keyword entered into the system and store them in the database. It should be possible to run this more than once, with the results de-duplicated before they are stored in the database.
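The de-duplication step could be sketched roughly as follows. This is a minimal illustration only, assuming SQLite as the database and a simple rule for treating two URLs as the same (same host without "www.", same path without a trailing slash). The table name `results` and the helpers `normalise` and `store_results` are hypothetical, and the call to the search API itself is left out.

```python
import sqlite3
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Build a comparison key: lowercase host without 'www.', path without trailing '/'."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def store_results(conn: sqlite3.Connection, keyword: str, urls: list[str]) -> int:
    """Insert search results, skipping any URL whose key is already stored.

    Returns the number of rows that were actually added.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "url_key TEXT PRIMARY KEY, url TEXT NOT NULL, keyword TEXT NOT NULL)"
    )
    before = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
    # The PRIMARY KEY on url_key plus INSERT OR IGNORE does the de-duplication.
    conn.executemany(
        "INSERT OR IGNORE INTO results (url_key, url, keyword) VALUES (?, ?, ?)",
        [(normalise(u), u, keyword) for u in urls],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
    return after - before
```

Running `store_results` a second time with overlapping results only adds the URLs that are new, which is the behaviour asked for when a keyword search is repeated.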
The system should then be able to query the WHOIS database to bring back a couple of additional details for each URL we've stored.
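One possible shape for the WHOIS lookup is sketched below. WHOIS is a plain-text protocol on TCP port 43: the client sends the domain name followed by CRLF and reads until the server closes the connection. The default server (`whois.verisign-grs.com`, which covers .com and .net), the function names, and the two fields picked out (`Registrar`, `Creation Date`) are assumptions for illustration; the posting does not say which details are wanted, and response formats vary between registries.

```python
import socket


def whois_query(domain: str, server: str = "whois.verisign-grs.com",
                timeout: float = 10.0) -> str:
    """Send a single WHOIS query over TCP port 43 and return the raw text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text: str, wanted=("Registrar", "Creation Date")) -> dict[str, str]:
    """Pick the first 'Key: Value' line for each wanted key out of a WHOIS reply."""
    found: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if sep and key in wanted and value and key not in found:
            found[key] = value
    return found
```

A call such as `parse_whois(whois_query("example.com"))` would need the domain extracted from each stored URL first, and a real tool would also have to respect the rate limits that WHOIS servers impose.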
At this point we should be able to either download and re-upload a CSV of these websites or manually deselect those we don't want to continue to the next stage.
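The CSV round trip might look something like the sketch below. The column names `url` and `keep`, and the convention that a row continues only when `keep` is "yes", are assumptions made for the example rather than part of the brief.

```python
import csv
import io


def export_csv(urls: list[str]) -> str:
    """Write the stored URLs to CSV text, with every row marked to keep by default."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["url", "keep"])
    for url in urls:
        writer.writerow([url, "yes"])
    return buf.getvalue()


def import_selection(csv_text: str) -> list[str]:
    """Read an edited CSV back and return only the URLs still marked 'yes'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["url"]
        for row in reader
        if (row.get("keep") or "").strip().lower() == "yes"
    ]
```

The downloaded file can be edited in a spreadsheet, changing `yes` to `no` for unwanted sites, and then uploaded again; only the remaining rows go on to the scraping stage.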
For the URLs we want to proceed with, the system should visit the site, scrape around 5-10 pages of text and store it in the database. All HTML should be stripped out.
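Stripping the HTML could be done along these lines. The sketch uses only Python's standard `html.parser` module and assumes that the contents of `<script>` and `<style>` elements should be discarded along with the tags. The class and function names are hypothetical, and downloading the pages and choosing which 5-10 to visit are not shown.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(html: str) -> str:
    """Return the page text with all tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

Text fragments are joined with a space, so a word split by inline markup (for example `<b>H</b>ello`) would come out as two tokens; a production version might want to treat inline and block elements differently.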
The system should perform a word count on each site's store of scraped data, similar to this program: http://textalyser.net/
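A word and phrase count over the scraped text could be sketched as below. The tokenisation rule (lowercase runs of letters, digits and apostrophes) and the function name `top_phrases` are assumptions; the exact behaviour of the referenced program is not reproduced here.

```python
import re
from collections import Counter


def top_phrases(text: str, n: int = 2, limit: int = 5) -> list[tuple[str, int]]:
    """Return the `limit` most frequent n-word phrases in `text` with their counts.

    Use n=1 for single words, n=2 for two-word phrases, and so on.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrases = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(phrases).most_common(limit)
```

For example, `top_phrases("Cheap red shoes. Buy cheap red shoes online, red shoes today!", n=2, limit=2)` returns `[("red shoes", 3), ("cheap red", 2)]`. Phrases are counted across sentence boundaries in this sketch, and common stop words are not filtered out.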
There are a couple of other processes to go through, but what I'd like as a finished product is a database which essentially contains:
A website's URL, some of its WHOIS information and the phrases it uses most frequently, based upon the word count performed on the site.
If you think this is something you can do and would be interested in, please let me know.
James R.
100% (10)
- Projects Completed: 16
- Freelancers worked with: 29
- Projects awarded: 58%
- Last project: 18 Mar 2021
United Kingdom