FAQs To CSV

Ended at: 18/02/2017

Fixed Price

Posted: 7 years ago
Proposals: 1
Remote
#1426363
Expired

Description

Experience Level: Intermediate

General information for the business: Chat bots
Kind of development: New program from scratch
Description of every module: We need a PHP script that extracts the FAQs (only the Q&A) from a URL given as a parameter and stores the result in a CSV file.
Description of requirements/functionality: It should crawl the URL and scrape the questions and answers from the site.

The result should be a UTF-8 CSV file with 2 columns:
Question
Answer (including links and images in html format)
OS requirements: Linux
Extra notes: Sites that should work

https://www.lotusthemes.com/pages/zendesk-themes-download-faq

http://clave.gob.es/clave_Home/registro/Preguntas-frecuentes.html

http://www.carrefour.com.ar/content/preguntas-frecuentes/

https://www.microsoft.com/en-us/software-download/faq

https://www.whatsapp.com/faq/

https://medlineplus.gov/spanish/faq/faq.html

https://www.timewarnercable.com/en/support/all-faqs.html

http://www.arba.gov.ar/Consultar/Inicio.asp

https://monisupport.zendesk.com/hc/es

https://moviepass.zendesk.com/hc/en-us

The results are in the attached file.

We develop chat bots. To help customers with the initial training, they will provide their FAQ URL at sign-up. We need the script to extract the Q&A from that URL (we do not control the format or the source) and store it in a CSV file to be imported by our system.

All the examples were processed using qnamaker.ai, but the only thing we need to mimic is the scraping of the FAQs to train our own bots.
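
A minimal sketch of the requested output format, written in Python for brevity even though the posting asks for PHP. The function name `write_faq_csv` and the sample pair are hypothetical. It assumes the Q&A pairs have already been extracted and that the answer is kept as an HTML fragment, as the description requires.

```python
import csv
import io


def write_faq_csv(pairs, stream):
    """Write (question, answer_html) pairs as a two-column CSV."""
    # QUOTE_ALL keeps HTML answers containing commas or quotes unambiguous.
    writer = csv.writer(stream, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["Question", "Answer"])
    for question, answer_html in pairs:
        writer.writerow([question.strip(), answer_html.strip()])


# Made-up sample data. A real run would write to a file opened with
# open("faqs.csv", "w", encoding="utf-8", newline="") to get UTF-8 output.
buffer = io.StringIO()
write_faq_csv(
    [("¿Qué es Cl@ve?", '<p>Ver <a href="https://example.com">aquí</a>.</p>')],
    buffer,
)
csv_text = buffer.getvalue()
```

Quoting every field is a design choice made here so that links and images in the answer column survive a round trip through any standard CSV reader.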

Clarification Board

19 Jan 2017

Hi Jose,

It's impossible to write a generic script that "understands" the structure of all those FAQ sites. Either you'll have to create a profile for each site you'd like to scrape or the code will have to have templates/profiles for all those sites and you need to be lucky that the one you'll be scraping later on is compatible with one of those.

Best regards,
Patrick

Jose N. 20 Jan 2017
Hi Patrick,

Sorry to hear that, but clearly qnamaker.ai is able to process all those sites without any human intervention.

That is what we are looking for, so if you think you are not able to solve this, we will continue searching for somebody who can help us.

Patrick V. 20 Jan 2017
I suppose that they have a lot of templates to be able to process most sites. The problem is that I can't guarantee that it will work for any site you'd like to scrape in the future.
The more templates the system must have, the more work it is, and so the more expensive it gets.

Would you like a proposal for a scraper with templates for those sites? The templates won't be bound to a specific site: the scraper will auto-detect the structure and use the most appropriate template, so it will be compatible with other sites as well.
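
One possible structural "template" of the kind discussed above, sketched in Python (not the requested PHP) using only the standard library. It is an assumption for illustration, not how qnamaker.ai or any proposal in this thread works: a heading-like element whose text ends in "?" is treated as a question, and the markup up to the next heading-like element becomes the answer. The names `FaqExtractor`, `extract_faqs`, `QUESTION_TAGS` and `VOID_TAGS` are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

QUESTION_TAGS = {"h2", "h3", "h4", "dt", "summary"}  # assumed question markers
VOID_TAGS = {"br", "hr", "img"}                      # tags with no closing tag


class FaqExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pairs = []        # finished (question, answer_html) tuples
        self._heading = None   # text parts of the heading being read
        self._question = None  # question currently collecting an answer
        self._answer = []      # HTML fragments of the current answer
        self._depth = 0        # open tags inside the current answer

    def _collecting(self):
        return self._question is not None and self._heading is None

    def _flush(self):
        answer = "".join(self._answer).strip()
        if self._question is not None and answer:
            self.pairs.append((self._question, answer))
        self._question, self._answer, self._depth = None, [], 0

    def handle_starttag(self, tag, attrs):
        if tag in QUESTION_TAGS:
            self._flush()
            self._heading = []
        elif self._collecting():
            self._answer.append(self.get_starttag_text())
            if tag not in VOID_TAGS:
                self._depth += 1

    def handle_startendtag(self, tag, attrs):
        if self._collecting():
            self._answer.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in QUESTION_TAGS and self._heading is not None:
            text = " ".join("".join(self._heading).split())
            self._heading = None
            if text.endswith("?"):
                self._question = text
        elif self._collecting() and self._depth > 0:
            # Closing tags of outer wrappers (depth 0) are dropped.
            self._answer.append(f"</{tag}>")
            self._depth -= 1

    def handle_data(self, data):
        if self._heading is not None:
            self._heading.append(data)
        elif self._question is not None:
            self._answer.append(escape(data, quote=False))

    def close(self):
        super().close()
        self._flush()


def extract_faqs(html_text):
    parser = FaqExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.pairs
```

A heuristic like this only covers pages that mark questions with headings or definition terms. Pages that load their FAQs with JavaScript, or that split questions across several linked pages, would need other templates, which is the cost concern raised in this thread.
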
19 Jan 2017

Hello Jose,
Since FAQs do not usually change, would you be all right with a one-time extraction of all the URLs specified?
If not, could you please specify the version of PHP the script should work on? (It would be best to also send the phpinfo() output of your configuration.)

Best Regards.
Petr

Jose N. 19 Jan 2017
We develop chat bots. To help customers with the initial training, they will provide their FAQ URL at sign-up. We need the script to extract the Q&A from that URL (we do not control the format or the source) and store it in a CSV file to be imported by our system.

Petr N. 19 Jan 2017
Hello Jose,
thank you for your answer.
Is this list of 10 URLs final, or are these source URLs just examples?
I ask because the script has to handle each of them individually, since all the URLs have a different structure.
If the list is final, it is no problem to create the script. If more sources are added in the future, the script will need an adjustment each time a new URL with an unknown structure is added.
I hope this makes sense, and I look forward to hearing from you on this last point.

Best Regards,
Petr