FAQs To CSV
- Proposals: 1
- Remote
- #1426363
- Expired
Description
Kind of development: New program from scratch
Description of every module: We need a PHP script that extracts the FAQs (only the Q&A) from a URL given as a parameter and stores the result in a CSV file.
Description of requirements/functionality: It should crawl the URL and scrape the questions and answers from the site.
The result should be a UTF-8 CSV file with 2 columns:
Question
Answer (including links and images in html format)
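The output format described above can be sketched as follows. This is only an illustration in Python (the posting itself asks for PHP), and the header row, the quoting style, the function names `write_faq_csv` and `save_faq_csv`, and the sample data are all assumptions, not part of the original requirements.

```python
import csv
import io


def write_faq_csv(pairs, stream):
    """Write (question, answer_html) pairs as a two-column CSV."""
    # QUOTE_ALL keeps HTML answers with commas, quotes or newlines in one cell.
    writer = csv.writer(stream, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["Question", "Answer"])  # header row is an assumption
    for question, answer_html in pairs:
        writer.writerow([question.strip(), answer_html.strip()])


def save_faq_csv(pairs, path):
    """Save the pairs to a UTF-8 encoded file at the given path."""
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        write_faq_csv(pairs, handle)


# Hypothetical sample data, not taken from any of the listed sites.
sample = [
    (
        "¿Cómo me registro?",
        '<p>Visite <a href="https://example.com/alta">esta página</a>.</p>',
    ),
]

buffer = io.StringIO()
write_faq_csv(sample, buffer)
csv_text = buffer.getvalue()
```

The answer column keeps the HTML as-is, so links and images survive the export; the quote characters inside attributes are escaped by the CSV writer and restored by any standard CSV reader.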
OS requirements: Linux
Extra notes: Sites that should work
https://www.lotusthemes.com/pages/zendesk-themes-download-faq
http://clave.gob.es/clave_Home/registro/Preguntas-frecuentes.html
http://www.carrefour.com.ar/content/preguntas-frecuentes/
https://www.microsoft.com/en-us/software-download/faq
https://www.whatsapp.com/faq/
https://medlineplus.gov/spanish/faq/faq.html
https://www.timewarnercable.com/en/support/all-faqs.html
http://www.arba.gov.ar/Consultar/Inicio.asp
https://monisupport.zendesk.com/hc/es
https://moviepass.zendesk.com/hc/en-us
The results are in the attached file.
We develop chatbots. To help customers with the initial training, they will provide their FAQ URL when they sign up. We need the script to extract the Q&A from that URL (we do not control the format or the source) and store it in a CSV file to be imported by our system.
All the examples were processed using qnamaker.ai, but the only thing we need to mimic is the scraping of the FAQs to train our own bots.
Jose N.
Clarification Board
-
Hi Jose,
It's impossible to write a generic script that "understands" the structure of all those FAQ sites. Either you create a profile for each site you'd like to scrape, or the code ships with templates/profiles for all those sites and you have to be lucky that any site you scrape later on is compatible with one of them.
Best regards,
Patrick
Jose N. 20 Jan 2017
Hi Patrick,
Sorry to hear that, but clearly qnamaker.ai is able to process all those sites without any human intervention.
That is what we are looking for, so if you think you are not able to solve this, we will continue searching for somebody who can help us.
Patrick V. 20 Jan 2017
I suppose that they have a lot of templates to be able to process most sites. The problem is that I can't guarantee that it will work for any site you'd like to scrape in the future.
The more templates the system needs, the more work it takes, and so the more expensive it gets.
Would you like a proposal for a scraper with templates for those sites? The templates won't be tied to a specific site: the scraper will auto-detect the structure and use the most appropriate template, so it will be compatible with other sites as well.
-
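The auto-detection Patrick proposes could look roughly like the sketch below. It is in Python for illustration only (the project asks for PHP), and it makes several assumptions that are not in the thread: the two templates, their names, the regex-based matching (a real scraper would more likely use an HTML parser), and the rule that "most appropriate" means "the template that yields the most Q&A pairs".

```python
import re


def extract_definition_list(html):
    """Hypothetical template: questions in <dt>, answers in <dd>."""
    pattern = re.compile(
        r"<dt[^>]*>(.*?)</dt>\s*<dd[^>]*>(.*?)</dd>", re.S | re.I
    )
    return [(q.strip(), a.strip()) for q, a in pattern.findall(html)]


def extract_heading_paragraph(html):
    """Hypothetical template: an <h2>/<h3> ending in '?' followed by a <p>."""
    pattern = re.compile(
        r"<h([23])[^>]*>(.*?)</h\1>\s*<p[^>]*>(.*?)</p>", re.S | re.I
    )
    return [
        (q.strip(), a.strip())
        for _level, q, a in pattern.findall(html)
        if q.strip().endswith("?")
    ]


# Registry of templates; a new site structure means adding an entry here.
TEMPLATES = {
    "definition_list": extract_definition_list,
    "heading_paragraph": extract_heading_paragraph,
}


def detect_and_extract(html):
    """Run every template and keep the one that finds the most pairs."""
    results = {name: extract(html) for name, extract in TEMPLATES.items()}
    best = max(results, key=lambda name: len(results[name]))
    if not results[best]:
        return None, []  # no template recognised the page
    return best, results[best]


# Hypothetical page fragment, not taken from any of the listed sites.
page = (
    '<h3>How do I pay?</h3><p>Use the <a href="/pay">payment page</a>.</p>'
    "<h3>Is it free?</h3><p>Yes.</p>"
)
template_name, pairs = detect_and_extract(page)
```

The answers keep their inner HTML, so links and images are preserved. A page that matches none of the templates returns `(None, [])`, which is the case Patrick says he cannot rule out for future sites.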
Hello Jose,
since FAQs don't usually change, would you be alright with a one-time extract of all the URLs specified?
If not, could you please specify the version of PHP the script should work on? (It would be best to also send the phpinfo() output of your configuration.)
Best Regards.
Petr
Jose N. 19 Jan 2017
We develop chatbots. To help customers with the initial training, they will provide their FAQ URL when they sign up. We need the script to extract the Q&A from that URL (we do not control the format or the source) and store it in a CSV file to be imported by our system.
Petr N. 19 Jan 2017
Hello Jose,
thank you for your answer.
Is this list of 10 URLs final, or are these source URLs just examples?
Each of them needs to be handled individually within the script, because all the URLs have a different structure.
So if the list is final, it's no problem to create the script. If more sources are added in the future, an adjustment will be needed each time a new URL with an unknown structure is added.
I hope I am making sense, and I look forward to your clarification on this last point.
Best Regards,
Petr
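Petr's approach of handling each URL individually might be organised as in the sketch below. It is Python for illustration only (the project asks for PHP). The names `SITE_HANDLERS`, `register`, `handler_for` and `UnknownSiteError` are hypothetical, and the handler bodies are placeholders, since the thread does not describe the structure of any of the pages.

```python
from urllib.parse import urlsplit


class UnknownSiteError(LookupError):
    """Raised when no handler has been written for a URL's host."""


# One handler per known host; every new source needs a new entry.
SITE_HANDLERS = {}


def register(host):
    """Decorator that registers a parsing function for one host."""
    def decorator(func):
        SITE_HANDLERS[host] = func
        return func
    return decorator


@register("www.whatsapp.com")
def parse_whatsapp(html):
    # Placeholder: the real page structure is not described in the thread.
    return []


@register("medlineplus.gov")
def parse_medlineplus(html):
    # Placeholder: the real page structure is not described in the thread.
    return []


def handler_for(url):
    """Look up the handler for a URL by its host name."""
    host = urlsplit(url).hostname or ""
    try:
        return SITE_HANDLERS[host]
    except KeyError:
        raise UnknownSiteError(f"no handler written for {host!r}") from None
```

With a fixed list of sources this lookup always succeeds. A URL from a host outside the list raises `UnknownSiteError`, which corresponds to the adjustment Petr says would be needed for each new source.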