
AI Speech & Audio Projects
Looking for freelance AI Speech & Audio jobs and project work? PeoplePerHour has you covered.
AI-Powered Personalized Audiobook System for Shopify
I am looking for one highly experienced developer to build a complete, clean setup for a personalized audiobook system on Shopify.
Scope (high level, simple for an expert):
- Custom Shopify frontend & backend for an audiobook configurator
- Claude for logic & personalization
- DeepSeek for story generation
- ElevenLabs for voice output
- Custom SFX / background sound integration
- Automated delivery via Klaviyo (or similar)
Customers should be able to configure a story (names, style, voice, mood, length) and receive a fully generated audiobook automatically after checkout. This is not a research project; I’m looking for someone who has already built similar AI pipelines and can implement this efficiently and pragmatically.
10 days ago, 31 proposals, Remote
AI Speech & Audio Processing Project
We are seeking an experienced freelancer for an AI Speech and Audio Processing project that showcases advanced skills in artificial intelligence, machine learning, and audio manipulation techniques. The ideal candidate will possess a strong understanding of speech recognition, natural language processing, and audio signal processing. The project aims to develop innovative solutions that enhance audio quality, improve speech synthesis, and optimize voice recognition systems. We invite proposals from professionals who are eager to demonstrate their expertise and contribute to this cutting-edge initiative.
14 days ago, 16 proposals, Remote
Past Projects
Help at town hall in cuavas
We have to go to the town hall in cuavas on Monday. We are at the second stage of a visa application and need the staff to generate a QR code.
Recording
I need people who can record 30 minutes of audio in an application.
Vapi AI voice calling - Refine spoken and written Swedish
I need help with advanced VAPI configuration for Swedish voice and transcription, supported by instructions that improve understanding of spoken Swedish. I'm currently using an ElevenLabs voice, and I'm looking for someone experienced.
Scope of work:
- Assist with advanced VAPI configuration for Swedish voice using ElevenLabs.
- Improve the pronunciation of spoken Swedish and the recognition of Swedish so that the live transcription gets it right. We are currently testing both Deepgram and Speechmatics.
Additional information:
- You don’t need to understand Swedish, as we will workshop this together.
- Once the foundations are right, we would like to move on to structured outputs.
Ideal candidate:
- Experienced with advanced VAPI configuration and transcription.
- Skilled in using ElevenLabs voice technology, preferably with pronunciation files, the API, and advanced transcription to structured output fields. (The JSON files and other outputs seem to be formatted differently depending on the transcription model?)
- Must provide a written scope of work with suggestions, deliverables, and expected outcomes, in English.
- Must be fluent in English.
- Keeps reasonable service levels and attends agreed meetings and feedback sessions.
- Can start as soon as possible.
- Open to ongoing improvements once we go live and start getting business value.
Transcription language: Swedish. We will help with the Swedish and identify needed improvements.
Multi-lingual Conversation Audio Collection Project (Canada)
TELUS Digital is seeking native-speaking individuals to participate in a conversational data collection project. The task involves recording real-world, two-party conversations to support AI model training. Contributors will work in pairs and generate conversations that sound natural, following strict guidelines for audio quality, content, and file format.
- This is a remote project.
- Conversations must be recorded in the same room, using a single microphone.
- Each pair will cover general and medical-related topics (a medical background is preferred but not required).
- The role of one speaker must remain consistent across all recordings for each topic.
- Your partner or friend who will perform the task with you also needs to register on our TELUS Digital AI Community Platform with the same link and submit a separate application.
Estimated time to complete the task:
- Each speaker: up to 2 hours of recorded speech
- Each pair: up to 4 hours combined
- Minimum 1 hour of recorded speech required to qualify for payment (after QA check)
- Each participant may only complete the project once.
Pay Rate: Canadian French - $35 per hour. This is an Independent Contractor opportunity. Payments will be made via Hyperwallet, where you can choose PayPal or Bank Transfer as the payment method.
Key Requirements:
- French (Canada) native speaker
- Willing to record in a pair, in the same room, on a single device
- Adherence to specific audio specifications (WAV, 16kHz, mono)
- Ability to follow guidelines to ensure conversations sound natural and are not read from a script
- Device with voice recording capability
- Stable Internet connection for uploading files
Register here (both partners need to submit an application separately): https://www.telusinternational.ai/cmp/contributor/jobs/available/127938
Selected participants will be contacted by TELUS Digital with detailed guidelines. If you have questions, we will be happy to assist you!
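Contributors may want to confirm that a recording meets the stated audio specification (WAV, 16kHz, mono) before uploading it. The following is a minimal sketch using Python's standard `wave` module; the function names `check_wav_spec` and `make_silent_wav` are hypothetical helpers for illustration and are not part of any TELUS tooling, and the listing's detailed guidelines may impose further requirements that this check does not cover.

```python
import io
import wave


def check_wav_spec(source, expected_rate=16000, expected_channels=1):
    """Return a list of problems; an empty list means the file matches the spec.

    `source` can be a file path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        if rate != expected_rate:
            problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
        if channels != expected_channels:
            problems.append(f"{channels} channel(s), expected {expected_channels}")
    return problems


def make_silent_wav(rate, channels, seconds=1):
    """Build an in-memory 16-bit WAV of silence, used here only to try the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * channels * seconds)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(check_wav_spec(make_silent_wav(16000, 1)))  # matches the spec
    print(check_wav_spec(make_silent_wav(44100, 2)))  # wrong rate and channel count
```

For a real recording, pass the file path instead of the in-memory buffer, for example `check_wav_spec("conversation_01.wav")` (a hypothetical file name).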
Dubbing Viral Videos with Artificial Intelligence
I am looking for a professional to dub viral videos using Artificial Intelligence technology, keeping the result natural and synchronized with the original audio.
Saudi Najdi Speaker
Description:
Language and dialect: The text will be provided in Modern Standard Arabic. The applicant must convert the text into the Najdi dialect and record it in a natural, clear, and fluent manner, without using other dialects or reverting to Modern Standard Arabic during the recording. The Arabic script file for the recording will be handed over only after the applicant has been reviewed and approved by Graphlogic as suitable for the required voice recording.
Recording quality: The recording must be of high quality (Studio Quality), free of any noise, interference, or external effects. The voice must be clear, with a steady tone and a professional delivery suitable for use in voice cloning technologies. No filters or artificial audio effects may be used.
Recording duration and delivery: The total duration of the recording must be about one hour. The content may be recorded as several separate files, provided that all required segments are fully covered, without omitting or neglecting any part of the text.
Usage rights: All voice recordings and any cloned voice model resulting from them are the exclusive property of Graphlogic. The applicant agrees to fully and permanently waive any current or future rights in the recordings or the cloned voice in favor of Graphlogic. The applicant is strictly prohibited from using, sharing, selling, or distributing the recordings or the cloned voice in any form or to any third party. The recordings or the cloned voice may not be used for any personal or commercial purpose outside Graphlogic, and may not be shared with any other party without official written approval from the company.
Confidentiality and privacy: All texts, recordings, and information provided by Graphlogic are confidential. The applicant undertakes not to share, leak, or reuse them in any form. Any breach of this clause is considered an explicit legal violation and gives Graphlogic the full right to take appropriate legal action.
Professional standards: The required tone (formal/neutral) must be maintained, with full adherence to the correct pronunciation of the Najdi dialect. If any recording contains sentences or segments in a non-Najdi dialect or with non-conforming pronunciation, the entire audio file must be re-recorded from scratch, not just the incorrect part, to ensure full consistency in the dialect and the quality of the voice data. Any errors or deviations from the required dialect or quality must be fully corrected, according to the notes and reviews of the Graphlogic team.
Approval and review procedures:
1. Explicit written consent must be obtained from the applicant to use their voice in the voice cloning process before any technical steps begin.
2. After the recording is delivered, it will be reviewed carefully and any errors or notes will be corrected to ensure full compliance with the requirements.
3. A dedicated meeting will be held to verify the voice after recording. It includes reading a reference sentence from the voice cloning platform, to confirm that the voice matches and to approve it before the voice model is built.
urgent
Expert Voice AI Prompt Engineer (Consultant)
Hearth AI is seeking a top-tier Voice AI prompt specialist for a high-impact consulting engagement. We are refining the user experience for our fast-growing flagship AI receptionist, and we are looking for a freelance Prompt Engineer who specializes in voice agents to help us achieve a more natural, effective, and human-like conversational flow. This is a short-term consulting engagement.
You will work directly with our product team and CEO to:
- Audit & Review: Analyze our current prompt library and conversation flows, identifying areas for improvement in clarity, tone, and efficiency.
- Design & Engineer: Write, test, and refine new, experimental prompts for various scenarios (e.g., call routing, message taking, appointment scheduling, handling ambiguity, repeat callers).
- Test & Optimize: Help us structure A/B tests to measure the performance of new prompts against existing ones.
This is the right role for you if:
- You have at least 1-2 years of professional experience focused specifically on prompt engineering and conversational design for voice-first AI products (e.g., AI assistants, advanced IVR, voice-controlled applications).
- You have a deep understanding of the nuances of spoken language vs. written language (e.g., handling pauses, interruptions, and non-linear conversations).
- You can define and embody a specific persona/tone of voice for an AI agent.
- Your portfolio demonstrates a deep understanding of Voice User Interface (VUI) design principles. We must see examples of your work for voice agents.
- You have a sophisticated understanding of modern prompt engineering techniques (e.g., chain-of-thought, few-shot learning, ReAct) and how to apply them to spoken dialogue.
- You are results-obsessed and have direct experience measuring and improving conversational AI performance through A/B testing and data analysis.
- You have a "good ear" for what makes a conversation feel natural and can write dialogue that leverages the high-fidelity output of platforms like ElevenLabs.
Bonus points for:
- Current or previous experience at a leading voice-first company
- Experience with workflow automation tools (n8n, Zapier) and an understanding of how prompts fit into an API-driven system
- Excellent communication skills and the ability to articulate the rationale behind your prompt design choices
How to Apply: Please begin your proposal with the word "HEARTH" so we know you've read the details. In your proposal, please include:
- A link to your portfolio or 2-3 specific examples of voice-related prompt engineering or conversational design work.
- A brief paragraph on your experience and philosophy for creating effective prompts for a voice-based AI receptionist to achieve desired user or business outcomes.
Realtime API ChatGPT bot
I need a simple bot using the Realtime API, following the official docs. It doesn’t need to look good or be deployed online; it just needs to run locally on my machine. The bot should connect to the API, handle basic messaging, and let me export our conversation as a text file or JSON. Keep the code clean and easy to follow, with setup instructions. No extra features or fancy UI, just a basic, working example I can build on later. (I want to be able to set its instructions and guidelines easily, however I like.)
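The export and configurable-instructions requirements can be illustrated independently of the API connection. The following is a minimal sketch of a conversation log that stores the instructions alongside the turns and exports both as JSON or plain text; the `Conversation` class and its field names are hypothetical and do not come from the Realtime API, and the API connection itself is deliberately left out.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Conversation:
    """In-memory log of one session; the names here are illustrative only."""

    instructions: str  # the bot's instructions, set freely by the user
    turns: list = field(default_factory=list)

    def add(self, role, text):
        """Record one message, e.g. role="user" or role="assistant"."""
        self.turns.append({"role": role, "text": text})

    def to_json(self):
        """Serialize the instructions and all turns as a JSON string."""
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)

    def to_text(self):
        """Render the turns as one 'role: text' line per message."""
        return "\n".join(f"{t['role']}: {t['text']}" for t in self.turns)


if __name__ == "__main__":
    chat = Conversation(instructions="Answer briefly and politely.")
    chat.add("user", "Hello")
    chat.add("assistant", "Hi there")
    print(chat.to_text())
    print(chat.to_json())
```

Writing either string to disk with `open(path, "w", encoding="utf-8")` would produce the text or JSON export the listing asks for.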
Multilingual Conversational Speech Recording
Task: Record conversational speech in pairs (2 speakers, same room, 1 device)
Clone my voice and switch a 2-minute video into French
I am preparing to announce a project on social media. I will record a 2-minute video in my native language, and I need my voice cloned and switched to French. I should sound in French the way I do in my mother tongue: emotion, nuances of speech, etc.
opportunity
AI-powered IVR System for my company
I have a startup company, and I have recently decided to build an IVR system for it. We can provide a RingCentral account and an Azure OpenAI key. The system will be somewhat complex, so I want someone who has similar experience. P.S. Please don't send a proposal if you don't have a demo project. This is an urgent project and I don't want to waste my time.
Create an AI voice agent
Project Title: Development of AI Voice Assistant Software for German Property Management Companies (Based on hallopetra.de & vetpal.de)
Project Description: We are looking for an experienced development team or AI expert to build a custom AI voice assistant software inspired by hallopetra.de and vetpal.de. The goal is to develop a scalable voice AI solution tailored to the needs of German property management companies ("Hausverwaltungen"). The software should answer incoming phone calls, understand the caller’s intent, and forward the call or provide relevant information based on predefined workflows, thereby reducing the workload of the internal team.
Key Requirements:
- Voice AI agent capable of answering and processing inbound phone calls in German (high-quality NLP/NLU).
- Intelligent call routing to the correct department or contact person.
- Ability to respond to frequently asked questions using AI or pre-defined text-to-speech responses.
- Customizable workflows to handle various tenant inquiries (e.g., noise complaints, heating issues, rental questions, etc.).
- Simple drag-and-drop frontend (modular builder) that allows non-technical users to set up workflows and FAQ responses easily.
- Integration with popular CRM/ticket/email systems (via API/webhook).
- Web-based admin dashboard to monitor and configure call handling, view logs, and manage clients.
- GDPR compliance, including proper consent for call recording and secure data handling.
- Multi-tenant architecture for use as a SaaS solution across multiple property management clients.
References for Functionality & UX:
- https://hallopetra.de
- https://www.vetpal.de
Target Market: German property management companies looking to automate and streamline daily inbound communication with tenants.
Deliverables:
- Fully functional MVP or production-ready version of the voice assistant
- Admin dashboard with drag-and-drop builder
- Source code and documentation
- Deployment assistance and optional maintenance offer
Technical Notes:
- Voice AI must support fluent German
- Backend: flexible, but scalable
- Frontend: should prioritize simplicity and usability (no-code/low-code logic builder)
- Use of existing frameworks like Twilio, Dialogflow, Whisper, etc. is welcome
opportunity
Build AI Voice-to-Invoice MVP (Speech to PDF via Email)
I’m looking for a UK-based developer (or small team) to build an AI-powered MVP tool that allows users to speak an invoice aloud, have it transcribed, and automatically generate and email a professional invoice PDF. This tool is aimed at small business owners (like tradespeople, coaches, and landlords) who often delay invoicing due to time constraints.
Core workflow:
• User speaks naturally: “Invoice Bernie’s Beans for £100 for helping me clean the garden”
• System uses Whisper API (or similar) to transcribe the message
• Extracts client name, service, amount, and date
• Prompts for missing info (e.g., email, address)
• Generates a branded invoice (PDF)
• Sends the PDF to the client via email (using Gmail or SMTP)
Optional (quote separately):
• Stripe/PayPal link on invoice
• Simple dashboard of past invoices
• Save invoices to Google Sheets
Deliverables:
• Working hosted MVP (web-based or via Bubble/Glide)
• Functional flow from voice input to emailed invoice
• 1–2 invoice templates (custom branding)
• Light documentation / short walkthrough
Tech Suggestions:
• Whisper API, Otter.ai, or AssemblyAI
• Zapier/Make.com or lightweight backend
• jsPDF or Puppeteer for PDF creation
• Gmail API or SMTP for email
Please include your experience with transcription or automation projects, your proposed approach, and estimated time/cost.
Skills: Voice recognition, Whisper API, Artificial Intelligence, Automation, PDF generation, Bubble.io, Gmail API, SaaS MVP, Speech-to-text
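The extraction and missing-info steps of the core workflow can be sketched on the listing's own example sentence. The following is a minimal illustration that assumes the transcript arrives as plain text in the form "Invoice <client> for £<amount> for <service>"; the pattern, the field names, and the choice of which fields are required are all assumptions for illustration. A real transcription service may render the amount differently (for example "100 pounds"), so a production version would likely need a more tolerant parser or an LLM-based extractor.

```python
import re
from datetime import date
from decimal import Decimal

# Assumed utterance shape: "Invoice <client> for £<amount> for <service>"
PATTERN = re.compile(
    r"^invoice\s+(?P<client>.+?)\s+for\s+£(?P<amount>\d+(?:\.\d{1,2})?)"
    r"\s+for\s+(?P<service>.+)$",
    re.IGNORECASE,
)

# Fields the user must be prompted for if they are still empty (an assumption).
REQUIRED = ("client", "amount", "service", "email")


def parse_invoice(transcript, today=None):
    """Extract invoice fields from a transcript; unmatched fields stay None."""
    fields = {
        "client": None,
        "amount": None,
        "service": None,
        "email": None,  # never spoken in the example, so always prompted for
        "date": (today or date.today()).isoformat(),
    }
    match = PATTERN.match(transcript.strip())
    if match:
        fields["client"] = match["client"]
        fields["amount"] = Decimal(match["amount"])
        fields["service"] = match["service"].rstrip(".")
    return fields


def missing_fields(fields):
    """List the required fields that still need to be asked for."""
    return [name for name in REQUIRED if fields[name] is None]


if __name__ == "__main__":
    spoken = "Invoice Bernie’s Beans for £100 for helping me clean the garden"
    invoice = parse_invoice(spoken)
    print(invoice)
    print("Still needed:", missing_fields(invoice))
```

Using `Decimal` rather than `float` for the amount avoids rounding surprises when the value is later formatted on the invoice.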
AI to Listen and Rate Infinity Call Tracking Calls
We're looking for an experienced AI specialist or developer to help us build or integrate a solution that can listen to recorded calls from Infinity Call Tracking and automatically rate them based on specific quality criteria (e.g. sales pitch, tone, lead qualification, objection handling, etc.).