
Scrape and Parse Trademark & Patent Data into a DB
- Budget: $750
- Posted:
- Proposals: 6
- Remote
- Project ID: #670797
- Status: Awarded
Description
Experience Level: Expert
General information for the business: We collect data
Description of requirements/functionality: IMPORTANT! I will not respond to any form letters or lists of projects that you've done in the past. What I specifically want to see from any proposals is your approach to the problem. Anything other than that will be ignored.
I'm looking for someone to build a scraper to pull down and parse all of the trademark and patent data from this Google Books repository http://www.google.com/googlebooks/uspto.html into a database.
The scraper should crawl and download all of the files, extract them, and parse them into the database. Every subsequent crawl of the repository should only crawl, download, extract, and parse the new files.
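As a rough illustration of the intended incremental behaviour (a sketch only, not a prescribed implementation): assuming the index page links directly to downloadable .zip archives and that a hypothetical downloaded_files table records what has already been fetched, the crawl step could look something like this in Python, with SQLite standing in for MySQL:

```python
# Sketch only: incremental download of new archives from the bulk-data index.
# Table/column names (downloaded_files, url, fetched_at) are placeholders.
import re
import sqlite3
import urllib.request
from datetime import datetime, timezone

INDEX_URL = "http://www.google.com/googlebooks/uspto.html"

def already_downloaded(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloaded_files (url TEXT PRIMARY KEY, fetched_at TEXT)"
    )
    return {row[0] for row in conn.execute("SELECT url FROM downloaded_files")}

def crawl_new_files(conn):
    html = urllib.request.urlopen(INDEX_URL).read().decode("utf-8", "replace")
    links = re.findall(r'href="([^"]+\.zip)"', html)  # assumes absolute links to .zip archives
    seen = already_downloaded(conn)
    for url in links:
        if url in seen:
            continue                                   # already handled in an earlier run
        data = urllib.request.urlopen(url).read()      # download only the new archive
        # ...extract `data` and parse its records into the trademark/patent tables...
        conn.execute(
            "INSERT INTO downloaded_files VALUES (?, ?)",
            (url, datetime.now(timezone.utc).isoformat()),
        )
        conn.commit()
        seen.add(url)

if __name__ == "__main__":
    crawl_new_files(sqlite3.connect("uspto.db"))
```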
I'm open to whatever database you'd like to use, although MySQL or MongoDB are preferred.
The deliverables should be the database, an outline of the database schema, and the script.
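The exact schema will depend on the fields in the bulk files; purely as a placeholder for the level of detail expected in that outline (the table and column names below are assumptions, not requirements), it might start from something like:

```python
# Placeholder schema sketch, SQLite syntax as a stand-in for MySQL; the real
# columns should follow whatever fields the USPTO bulk files actually contain.
import sqlite3

DDL = [
    """CREATE TABLE IF NOT EXISTS patents (
           patent_number TEXT PRIMARY KEY,
           title         TEXT,
           grant_date    TEXT,
           source_file   TEXT  -- archive the record was parsed from
       )""",
    """CREATE TABLE IF NOT EXISTS trademarks (
           serial_number TEXT PRIMARY KEY,
           mark_text     TEXT,
           filing_date   TEXT,
           source_file   TEXT
       )""",
]

def create_schema(db_path="uspto.db"):
    conn = sqlite3.connect(db_path)
    for statement in DDL:
        conn.execute(statement)
    conn.commit()
    return conn
```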
The script should run automatically every hour via a cron job, but it should also feature a frontend that shows the progress of the current crawl-and-parse job, lists previous jobs, and allows a forced crawl and parse at any time.
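One possible shape for the hourly scheduling and the job history that the frontend would read, sketched with placeholder names (crawl_jobs, trigger_type) rather than required ones:

```python
# Sketch only: a job-status table that the scraper updates and a PHP frontend
# could read to show current progress, past jobs, and forced runs.
# An hourly cron entry might look like:
#   0 * * * * /usr/bin/python3 /opt/uspto/crawler.py >> /var/log/uspto_crawl.log 2>&1
import sqlite3
from datetime import datetime, timezone

def start_job(conn, trigger="cron"):
    """Record a new crawl-and-parse job; trigger is 'cron' or 'forced'."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawl_jobs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, trigger_type TEXT, "
        "started_at TEXT, finished_at TEXT, files_processed INTEGER, status TEXT)"
    )
    cur = conn.execute(
        "INSERT INTO crawl_jobs (trigger_type, started_at, files_processed, status) "
        "VALUES (?, ?, 0, 'running')",
        (trigger, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return cur.lastrowid

def finish_job(conn, job_id, files_processed, status="done"):
    """Mark a job finished so the frontend can list it under previous jobs."""
    conn.execute(
        "UPDATE crawl_jobs SET finished_at = ?, files_processed = ?, status = ? WHERE id = ?",
        (datetime.now(timezone.utc).isoformat(), files_processed, status, job_id),
    )
    conn.commit()
```

A forced crawl could then simply invoke the same script with trigger='forced', and the frontend would only need to query the crawl_jobs table to display current and past runs.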
Specific technologies required: Python, PHP, MySQL, MongoDB
Extra notes:
Michael K.
- 100% (19)
- Projects completed: 12
- Freelancers worked with: 22
- Projects awarded: 34%
- Last project: 18 Aug 2017
- United States

