
Data Sourcing and RAG Creation for LLM
- $400
- Proposals: 12
- Remote
- #4247917
- Expired
Description
Experience Level: Intermediate
For now, our main goal is to have the right data to train our recommendation chat AI engine. Data quality, availability and value are the focus of this first project, which continues until our MVP launch. By data quality we mean the sources, deduplication, handling of missing values, normalization, stemming, tokenization, indexing and so on: everything needed so that this RAG is scalable and prepared for embedding and vectorization, making it suitable for machine learning purposes. This includes how and where we mine and harvest data, and the data sources themselves that are relevant to the travel industry within these verticals: Restaurants, Hotels, Nightlife and Experiences (Viator, Airbnb Experiences, GetYourGuide, etc.).
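As a rough illustration of the cleaning steps named above (normalization, handling missing values, deduplication), here is a minimal standard-library sketch. The record fields (`name`, `city`, `vertical`, `rating`) and the sample listings are assumptions made up for the example, not a schema from this project; a production pipeline would more likely use Pandas on scraped data.

```python
import re
import unicodedata

# Hypothetical raw listings; the field names are illustrative assumptions.
RAW = [
    {"name": "Café  Luna", "city": "Lisbon", "vertical": "Restaurants", "rating": "4.5"},
    {"name": "cafe luna", "city": "lisbon ", "vertical": "restaurants", "rating": None},
    {"name": "Hotel Sol", "city": "Madrid", "vertical": "Hotels", "rating": ""},
]

def normalize_text(value):
    """Lowercase, strip accents, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", value or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", text).strip().lower()

def parse_rating(value):
    """Return a float rating, or None when the value is missing or malformed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def clean(records):
    """Normalize fields and merge duplicates keyed on (name, city, vertical)."""
    merged = {}
    for rec in records:
        key = tuple(normalize_text(rec.get(f)) for f in ("name", "city", "vertical"))
        rating = parse_rating(rec.get("rating"))
        if key not in merged:
            merged[key] = {"name": key[0], "city": key[1], "vertical": key[2], "rating": rating}
        elif merged[key]["rating"] is None:
            # Fill a missing rating from a later duplicate.
            merged[key]["rating"] = rating
    return list(merged.values())

CLEANED = clean(RAW)
```

Keying duplicates on normalized text is one possible choice; fuzzy matching or source-specific IDs may work better for real listings.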
Performance Tracking and Visualization:
Visual Indicators: Use platforms like Power BI or Tableau to build dashboards that visualize and track the performance of the language models. This helps quickly identify patterns and areas for improvement.
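One way to feed such a dashboard is to aggregate evaluation results into a small CSV that Power BI or Tableau can load. The sketch below assumes a hypothetical evaluation log and uses retrieval hit rate per vertical as the metric; both the log format and the choice of metric are illustrative assumptions, not requirements stated in this posting.

```python
import csv
import io
from collections import defaultdict

# Hypothetical evaluation log: one row per test query, with the vertical and
# whether the expected document appeared in the retrieved results.
EVAL_LOG = [
    {"vertical": "Hotels", "hit": True},
    {"vertical": "Hotels", "hit": False},
    {"vertical": "Restaurants", "hit": True},
    {"vertical": "Restaurants", "hit": True},
]

def hit_rate_by_vertical(log):
    """Return the fraction of queries with a hit, grouped by vertical."""
    totals = defaultdict(lambda: [0, 0])  # vertical -> [hits, queries]
    for row in log:
        totals[row["vertical"]][0] += int(row["hit"])
        totals[row["vertical"]][1] += 1
    return {v: hits / n for v, (hits, n) in totals.items()}

def to_csv(rates):
    """Serialize the rates as CSV text, one row per vertical."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["vertical", "hit_rate"])
    for vertical, rate in sorted(rates.items()):
        writer.writerow([vertical, f"{rate:.2f}"])
    return buffer.getvalue()

RATES = hit_rate_by_vertical(EVAL_LOG)
CSV_TEXT = to_csv(RATES)
```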
We will not use any corporate LLM such as ChatGPT, LLamada or Gemini that could become our competitor in the future. We want to build a valuable business in this space with a very well trained proprietary model, so we want to use an open source LLM. This is important to us, and the individual we bring on board should be comfortable building pipelines and APIs for whichever open source LLM we all decide will achieve our desired outcome. We will use self-deployed open source LLMs.
Overview outline:
Foundational model building
Prompt templates and prompt engineering tools
Vector databases
Data SDKs and frameworks
Fine-tuning tools
Deployment and monitoring tools
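To show how two items in this outline fit together (vector databases and prompt templates), here is a toy retrieval sketch. It is an assumption-laden stand-in: `embed` uses bag-of-words counts instead of a real embedding model, `INDEX` is a plain list instead of a vector database, and the documents and template wording are invented for the example.

```python
import math
import re
from collections import Counter

# Hypothetical cleaned documents; a real system would store model embeddings
# in a vector database instead of word counts in a list.
DOCS = [
    "rooftop bar in lisbon with live music and cocktails",
    "family hotel in madrid near the prado museum",
    "guided food tour in lisbon old town",
]

PROMPT_TEMPLATE = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"

def embed(text):
    """Toy embedding: a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

INDEX = [(doc, embed(doc)) for doc in DOCS]

def retrieve(question, k=2):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(question, k=2):
    """Fill the prompt template with retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, k))
    return PROMPT_TEMPLATE.format(context=context, question=question)

PROMPT = build_prompt("Where can I find live music in Lisbon?", k=1)
```

The resulting prompt string would then be passed to whichever self-deployed open source LLM is chosen; that call is deliberately left out because the model has not been decided.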
Skills and Tools:
Data Handling Libraries: Experienced with Pandas, NumPy, Beautiful Soup, Scrapy for data manipulation and web scraping.
Database Management: Skilled in PostgreSQL and MongoDB for structuring and managing large datasets.
Machine Learning and NLP: Knowledgeable in using libraries such as NLTK and spaCy for natural language processing tasks, which are critical for RAG systems.
Conversational AI: Implement a conversational AI capable of handling various travel-related queries. Use frameworks like Rasa or Dialogflow for building the chatbot infrastructure.
Personalization: Ensure the chatbot can personalize responses based on user data and preferences, improving over time with machine learning algorithms.
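The personalization point above could start from something as simple as preference-weighted re-ranking. The sketch below is one possible approach under stated assumptions: the candidate items, their tags, the base scores, and the `rate` parameter are all hypothetical, and a learned model could replace the additive weighting later.

```python
# Hypothetical candidate results with a base relevance score and tags.
CANDIDATES = [
    {"name": "Jazz Cellar", "score": 0.70, "tags": {"nightlife", "music"}},
    {"name": "Sushi Corner", "score": 0.75, "tags": {"restaurants", "seafood"}},
    {"name": "Harbor Kayak Tour", "score": 0.60, "tags": {"experiences", "outdoors"}},
]

def update_preferences(prefs, liked_tags, rate=0.2):
    """Nudge tag weights upward for tags on an item the user liked."""
    updated = dict(prefs)
    for tag in liked_tags:
        updated[tag] = updated.get(tag, 0.0) + rate
    return updated

def personalize(candidates, prefs):
    """Re-rank by base score plus the summed weight of matching tags."""
    def adjusted(item):
        return item["score"] + sum(prefs.get(tag, 0.0) for tag in item["tags"])
    return sorted(candidates, key=adjusted, reverse=True)

# A user who liked a music venue sees music-tagged results move up.
prefs = update_preferences({}, {"music"})
RANKED = [item["name"] for item in personalize(CANDIDATES, prefs)]
```

With empty preferences the base scores decide the order, so the ranking only changes once the user has given some signal.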
This will be a worldwide launch of the chatbot AI, since launching a travel AI app specific to one region would lead to a terrible user experience. Imagine opening Expedia, searching for a trip to Mexico, and getting an error message saying "Sorry, we do not have any information about this location, please only search in the US for now." We can't do that. Also, there is so much data and information on the internet now that it would be lazy of us not to build the best travel AI tool out there.
Our goal is an app better than this: https://justasklayla.com/. Try it and you will understand our primary AI goal.

Alexander R.
0% (0)
- Projects Completed: -
- Freelancers worked with: -
- Projects awarded: 0%
- Last project: 30 Apr 2025
United Kingdom