Scrapy webscrape spider
$175
- Posted:
- Proposals: 8
- Remote
- #2829654
- PRE-FUNDED
- Awarded
Virtual Assistant, Web Scraping, Data Mining, Python Bot creation, Data Entry, Photoshop

Description
Experience Level: Intermediate
** Do not send me a generic, automated response; I will automatically decline it. ** The best response is to send me a couple of relevant projects where you have used Scrapy.
Develop a Python script using Scrapy 2.1 to crawl and scrape
for fiscal years beginning in 2015 in the Annual Index to the right of the page.
Click through that link and you will see a Fast Facts tab and a Highlights tab. Capture the text from both (the images are not needed). Always capture the unique link associated with each report so we can easily get back to the original URL. Some reports have podcasts; we don't need to download them, but capture the podcast URL if one exists.
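One way to hold what is captured per report is a small record type. This is only a sketch: the class name `ReportRecord`, its field names, and the helper `clean_text` are hypothetical, and the podcast URL is optional because not every report has one.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ReportRecord:
    """One scraped report (field names are illustrative assumptions)."""
    report_number: str                 # key, e.g. "GAO-19-539"
    report_url: str                    # link back to the original report page
    fast_facts: str                    # text of the Fast Facts tab
    highlights: str                    # text of the Highlights tab
    podcast_url: Optional[str] = None  # set only when a podcast exists


def clean_text(fragments):
    """Join text fragments from a tab and collapse runs of whitespace."""
    return " ".join(" ".join(fragments).split())
```

`asdict(record)` turns a record into a plain dictionary, which is convenient for writing rows to a CSV file later.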
Similarly, repeat this process for each of the months listed on the page. Be sure to note any duplicates (links in the months that also appear in the years). The report number, like GAO-19-539, should be your key.
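Duplicate detection across the annual and monthly listings could be sketched as below. The regular expression is an assumption based only on the GAO-19-539 example (other number formats may exist), and the function names, source labels, and example.org URLs are hypothetical.

```python
import re

# Assumed shape of a report number, modelled on the GAO-19-539 example.
REPORT_NUMBER = re.compile(r"GAO-\d{2}-\d+[A-Z]*")


def report_key(text):
    """Return the first report number found in a title or URL, or None."""
    match = REPORT_NUMBER.search(text.upper())
    return match.group(0) if match else None


def index_reports(listings):
    """Group (source_label, url) pairs by report number.

    Returns a dict keyed by report number. Each record lists every
    index page the report appeared on and is flagged as a duplicate
    when it appeared on more than one.
    """
    records = {}
    for source, url in listings:
        key = report_key(url)
        if key is None:
            continue  # no recognisable report number in this link
        record = records.setdefault(
            key, {"report_number": key, "url": url, "sources": []}
        )
        record["sources"].append(source)
    for record in records.values():
        record["duplicate"] = len(record["sources"]) > 1
    return records


# Hypothetical listings: one annual page and one monthly page.
listings = [
    ("FY2019", "https://example.org/products/gao-19-539"),
    ("2019-07", "https://example.org/products/GAO-19-539"),
]
reports = index_reports(listings)
```

Keying on the report number rather than the raw URL means the same report reached through a year link and a month link collapses into one record.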
For reports that have a recommendation (indicated by a Y in the above index file), there is a second CSV file with these columns:
sequence number (key to the index file above), report number, recommendation number, priority flag, recommendation, agency affected, status, comments
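A minimal sketch of writing that second file with Python's standard `csv` module follows. The header spellings are an assumption derived from the column list above, the helper names are hypothetical, and writing "N" for non-priority recommendations is also an assumption; the posting only says the flag is Y for a priority recommendation.

```python
import csv
import io

# Header names paraphrase the column list in the posting (assumed spelling).
RECOMMENDATION_FIELDS = [
    "sequence_number",        # key back to the index file
    "report_number",
    "recommendation_number",
    "priority_flag",
    "recommendation",
    "agency_affected",
    "status",
    "comments",
]


def priority_flag(is_priority):
    """Return "Y" for a priority recommendation; "N" otherwise (assumed)."""
    return "Y" if is_priority else "N"


def write_recommendations(rows, handle):
    """Write recommendation dicts to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=RECOMMENDATION_FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Usage with placeholder values only (not real report data).
buffer = io.StringIO()
write_recommendations(
    [{
        "sequence_number": 1,
        "report_number": "GAO-19-539",
        "recommendation_number": 1,
        "priority_flag": priority_flag(True),
        "recommendation": "placeholder text",
        "agency_affected": "placeholder agency",
        "status": "placeholder status",
        "comments": "",
    }],
    buffer,
)
```

When writing to a real file instead of `io.StringIO`, open it with `newline=""` so the `csv` module controls line endings.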
Some recommendations are "priority recommendations".
The priority flag I mention is set to Y if it's a priority recommendation. Using this example, I'd see something like this in the file
I'm going to leave the budget open. But I do not expect this to be very expensive. I would like to see initial results in two days, a test run with just a couple of records, for me to evaluate and comment on. I'll likely award within 2-3 days.

Herschel C.
100% (57)
Projects Completed: 53
Freelancers worked with: 45
Projects awarded: 43%
Last project: 12 Apr 2022
United States