Automatic scraping in Drupal 7 (+ Python?)
- Posted:
- Proposals: 3
- Remote
- #2047716
- Awarded
Description
Experience Level: Intermediate
The goal is to add and update Drupal nodes based on content from external websites.
This is a form of web scraping for Drupal: new nodes are created from different listings.
The import will be driven by 2 CSV files containing the parameters needed to import the correct fields:
- one file describing which startup list URL we'd like to crawl, for example listing-1.csv (attached)
- one file describing which elements of each startup we need to import, with the corresponding Drupal node field, for example xpath-fields-1.csv
The development must be adaptable, so that it can later be reused to import data from another startup list.
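To illustrate the two-file configuration described above, here is a minimal Python sketch of how the files could be loaded. The attached CSV files are not reproduced here, so the column names (`name`, `url`, `drupal_field`, `xpath`), the sample XPath expressions and the Drupal field names are assumptions made for illustration only.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    drupal_field: str  # machine name of the target Drupal field (hypothetical)
    xpath: str         # XPath locating the value on a startup page (hypothetical)


def load_listing_urls(csv_file):
    """Return the listing URLs to crawl, skipping rows without a URL."""
    urls = []
    for row in csv.DictReader(csv_file):
        url = (row.get("url") or "").strip()
        if url:
            urls.append(url)
    return urls


def load_field_rules(csv_file):
    """Return one FieldRule per row of the field-mapping file."""
    return [
        FieldRule(row["drupal_field"].strip(), row["xpath"].strip())
        for row in csv.DictReader(csv_file)
    ]


# Stand-ins for listing-1.csv and xpath-fields-1.csv (assumed layout).
LISTING_CSV = """name,url
angel-france-saas,https://angel.co/companies?locations[]=1717-France&company_types[]=SaaS&company_types[]=Startup
"""

FIELDS_CSV = """drupal_field,xpath
title,//h1
field_description,//div[@class='description']
"""

listing_urls = load_listing_urls(io.StringIO(LISTING_CSV))
field_rules = load_field_rules(io.StringIO(FIELDS_CSV))
print(listing_urls)
print(field_rules)
```

Keeping the URLs and the field mapping in data files rather than in code is what would let the same importer be pointed at another startup list by swapping the two CSV files.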
Example:
In our case, we need to get the startup list from 'https://angel.co/companies?locations[]=1717-France&company_types[]=SaaS&company_types[]=Startup' and periodically update some Drupal nodes automatically.
Data that will be provided:
- node template of a Drupal 'startup' node type
- URL to scrape periodically (cf. attached files with parameters):
https://angel.co/companies?locations[]=1717-France&company_types[]=SaaS&company_types[]=Startup
For each startup listed, we need to be able to import data into a Drupal node.
For example, for the startup https://angel.co/appsfire, the file xpath-fields-1.csv gives the data we need to import and the corresponding Drupal fields:
- Title
- Image
- Startup description
- City + tags + number of employees + URL + social network links: in our case it would import into separate fields 'Paris', 'iOS · Mobile · Android · Mobile Advertising', '11-50 employees', appsfire.com, http://twitter.com/appsfire, https://www.facebook.com/appsfire, http://www.linkedin.com/company/appsfire.com
- Founder name: Ouriel Ohayon
- Funding: ideally we should sum all investments, for example 3 600 000$ + 1 000 000$ in the case of appsfire.
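As a rough sketch of the field clean-up implied by the list above, the Python snippet below sums funding amounts written like '3 600 000$' and splits a tag string on the '·' separator. The function names are hypothetical, and the parser assumes whole-dollar amounts; abbreviated or decimal forms such as '1.5M' would need extra handling.

```python
import re


def parse_amount(text):
    """Parse an amount such as '3 600 000$' into an integer.

    Assumes whole amounts: every non-digit character (spaces,
    currency sign) is simply dropped.
    """
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)


def total_funding(amounts):
    """Sum all investment amounts listed for one startup."""
    return sum(parse_amount(amount) for amount in amounts)


def split_tags(text):
    """Split a tag string on the '·' separator and trim whitespace."""
    return [tag.strip() for tag in text.split("·") if tag.strip()]


print(total_funding(["3 600 000$", "1 000 000$"]))  # 4600000
print(split_tags(" iOS · Mobile · Android · Mobile Advertising"))
```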
** IMPORTANT **
- Web scraping should include a delay so that the crawler doesn't get blacklisted by websites (harvesting a startup list could be spread over multiple hours or days).
- For the scraping, we need a solution compatible with AJAX websites.
Python libraries (Selenium, Scrapy) seem able to do the job, but we are open to suggestions.
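The delay requirement above could be sketched in Python as follows. This is only an illustration: `crawl_with_delay` and its parameters are hypothetical names, the delay bounds are arbitrary, and `fetch` is a placeholder for whatever actually loads a page (for AJAX pages it would presumably be backed by a browser-driving tool such as Selenium).

```python
import random
import time


def crawl_with_delay(urls, fetch, min_delay=30.0, max_delay=120.0,
                     sleep=time.sleep):
    """Fetch each URL in turn, pausing a random time between requests.

    `fetch` is any callable taking a URL and returning the page content.
    `sleep` is injectable so the pacing can be tested without waiting.
    """
    if min_delay < 0 or max_delay < min_delay:
        raise ValueError("expected 0 <= min_delay <= max_delay")
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Randomised pause so requests are not evenly spaced.
            sleep(random.uniform(min_delay, max_delay))
        results[url] = fetch(url)
    return results


# Demo with a stub fetcher and a recording sleep, so nothing really waits.
pauses = []
pages = crawl_with_delay(
    ["https://angel.co/appsfire"],
    fetch=lambda url: "<html>stub page</html>",
    sleep=pauses.append,
)
print(pages, pauses)
```

With longer delays and a scheduler such as cron, the same loop could spread one listing over several hours or days.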
NOTES:
- Ideally, we'd like a solution based on existing Drupal modules, for example Feeds.
The developer should be able to set up their own test site autonomously.
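One possible way to combine a Python scraper with Feeds, sketched below, is to have the scraper write its results to a CSV file that a Feeds importer then reads periodically. This assumes a Feeds importer configured with a CSV parser and a node processor, with the `guid` column mapped as a unique target so that re-imports update existing nodes. The column names are hypothetical, and the 'Appsfire' title is illustrative.

```python
import csv
import io

# Hypothetical column names; they must match the Feeds importer mapping.
COLUMNS = ["guid", "title", "city", "tags", "employees", "website", "funding_total"]


def records_to_csv(records, columns=COLUMNS):
    """Serialise scraped records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        # Missing values become empty cells rather than raising an error.
        writer.writerow({column: record.get(column, "") for column in columns})
    return buffer.getvalue()


sample_records = [{
    "guid": "https://angel.co/appsfire",  # page URL reused as a stable identifier
    "title": "Appsfire",
    "city": "Paris",
    "tags": "iOS|Mobile|Android|Mobile Advertising",
    "employees": "11-50",
    "website": "appsfire.com",
    "funding_total": 4600000,
}]

csv_text = records_to_csv(sample_records)
print(csv_text)
```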
Mathieu D.
France
Clarification Board
-
Mathieu, please share website links.
-
Mathieu, please share website link.