
AI Speech & Audio Projects
Looking for freelance AI Speech & Audio jobs and project work? PeoplePerHour has you covered.
AI Speech Expert Needed
Seeking an AI speech expert to analyze and optimize speech recognition and synthesis systems. Responsibilities include evaluating current models, improving transcription accuracy, reducing latency, refining language models for diverse accents and noisy environments, and recommending architecture or data augmentation strategies. Candidate should possess deep experience with ASR, TTS, neural architectures, transfer learning, and evaluation metrics. Deliverables: technical audit, actionable roadmap, and performance benchmarks to guide implementation.
18 days ago · 17 proposals · Remote
Upgrade voice from theater play video 1 hr
I need to enhance and isolate the voice in a cellphone recording of a theater play, removing background noise. One hour of video.
24 days ago · 18 proposals · Remote
Past Projects
ACX audio book voice over
I am looking for a voiceover artist to narrate my short e-book of 10,000 words. The book would suit a motherly British/Londoner accent and someone who can relate to the content. It is also a guidebook, so it needs touches of humour at certain points and should not sound monotone. The speaker will hold no rights to the book and will be required to sign an NDA. You will need to deliver the final copy in ACX format and therefore know what this is. A suggested AI tool is elevenlabs.io, as they provide a great range of voices and tones to suit the requirements of my e-book. Delivery of the book should take no more than 3 hours. I would ask that the hired freelancer provides an example of the chosen voice before completing and moving forward with the whole project, as the voice and tone are really important for the book.
Audio Data Collection with Wireless Earbuds
I need two participants for an audio data collection project using wireless earbuds. The task involves recording natural conversation between two people in a quiet indoor environment using the Riverside platform.
Project Details:
* Total Sessions: 2 recordings (10 minutes each)
* Participants: Exactly 2 people
* Device Requirement: Wireless earbuds with microphone (e.g., AirPods or similar)
* Audio Format: WAV
* Recording Method: Audio must be captured only through the earbuds microphone (not phone/laptop mic)
Recording Process:
1. Session 1 (10 minutes):
   * Person A wears earbuds (primary speaker)
   * Person B sits 1–3 meters away (secondary speaker)
2. Session 2 (10 minutes):
   * Roles are switched
   * Person B wears earbuds (primary speaker)
   * Person A sits 1–3 meters away
Requirements:
* Continuous recording (no pauses, cuts, or edits)
* Natural conversation (no scripted reading)
* Distance must remain between 1 and 3 meters
* Both participants must be physically present in the same room
* Earbuds must remain in use throughout the session
Additional Requirements:
* Provide metadata including:
  * Earbud brand/model
  * Distance between participants
  * Ages of participants
  * Recording duration
  * Environment details (room setup, objects)
  * Background noise type and level
  * Room size category
  * Links to uploaded WAV files
Important Notes:
* Perform a short test recording before starting
* Ensure devices are fully charged
* Follow all instructions strictly to avoid rejection
Deliverables:
* Two 10-minute WAV audio files (one per primary speaker session)
* Completed metadata sheet with all required details
This is a simple task but requires strict adherence to guidelines and high-quality, natural audio recording.
opportunity
VAPI AI Customer service agent development
Looking for somebody to create an AI agent for a client to provide customer service support for their customers over the phone.
Thai Voice Sentence Recording
I’m collecting a set of 300 short Thai sentences for speech-training research and I’d like a native speaker to record them directly in our mobile app. You’ll be working with a smartphone—any operating system is fine, even alternatives beyond iOS or Android—and you should be able to record in a completely quiet space so there’s no background noise on the clips. Once you accept, I’ll send login credentials, a step-by-step recording guideline, and a link to the app. The workflow is straightforward: open the script inside the app, tap to record each sentence, review the waveform for clarity, and save. The system automatically uploads every take, so there’s no post-processing required on your side. Deliverable • 300 clearly spoken Thai sentences, captured and uploaded through the app, passing the built-in silence and clipping checks. I release payment after the platform confirms that all 300 files meet the quality threshold for volume, pronunciation, and absence of background noise. If you’re a native speaker with a quiet room and a smartphone, this should take less than an hour. Feel free to apply and I’ll get you set up right away.
Promo video from photos
## AI Promotional Video Needed for New Medical Lifting Aid Product (Healthcare / Assistive Device)

### Project Overview
We are launching a new healthcare lifting aid called Lone Raiser, designed to help elderly, disabled, and vulnerable individuals safely stand and transfer independently while reducing injury risk for carers. We are looking for a freelancer to create a professional AI-generated promotional video using product photos, branding, and a script. The video will be used for marketing to care homes, NHS, private carers, and families.

---

### What We Need
We require a 30–45 second promotional video that:
- Clearly explains the problem and solution
- Demonstrates the product visually using AI or animation
- Builds trust and credibility in a healthcare setting
- Looks modern, clean, and professional
- Includes motion graphics and text overlays
- Includes voiceover (UK English preferred)

We will provide:
- Product photos and concept visuals
- Logo and branding
- Key benefits and messaging
- Target audience information
- Guidance on tone and style

We are open to:
- AI video creation
- Motion graphics
- Medical explainer animation
- Product demo style
- AI avatar presenter (optional but welcome)

---

### Target Audience
- Care homes
- Healthcare providers
- Occupational therapists
- NHS and private healthcare
- Families caring for elderly relatives

The tone should feel:
- Professional
- Trustworthy
- Safe
- Healthcare focused
- Not overly “salesy”

---

### Key Product Benefits to Highlight
- Reduces risk of injury for carers
- Promotes patient independence and dignity
- Safe and easy lifting support
- Compact and discreet design
- Suitable for home and care environments

---

### Deliverables
- 1 main video (30–45 seconds)
- 1–2 short cutdowns for social media (optional)
- Voiceover and captions
- Editable source file preferred

---

### Budget
We are a startup, so we are looking for cost-effective options and potential long-term collaboration if results are strong.

Please include:
- Examples of healthcare or product videos
- AI or animation work
- Estimated turnaround time
- Suggested creative approach

---

### Future Opportunity
We plan to produce multiple marketing videos, so this could lead to ongoing work.

---

We look forward to working with someone creative, reliable, and experienced in healthcare or product marketing.
A.I Voiceover Creator
Overview
We are looking for a talented AI Voiceover Creator to join our team on a part-time, remote basis. This role offers flexible working hours within the weekdays, allowing you to apply your creative skills in producing high-quality AI-generated voiceovers. You will use your expertise to craft engaging voice content that enhances our brand communications and marketing efforts.
Your Role
- Use our pre-built custom AI voice (in MiniMax – NOT Eleven Labs)
- Generate, pace, and clean voiceover from finalized scripts
- Apply editing to fix tone, speed, pauses etc (regenerations for certain parts will be needed)
- Deliver clean, finished MP3 voiceovers
- 3–4 voiceovers per week (approximately 10 minutes each)
Requirements
- Familiarity with AI tools like MiniMax (or willingness to learn)
- AI audio editing experience (generations, cleanup, pacing, pauses)
- Fluent English comprehension and pacing sense
- VERY detail-oriented and consistent
- The voiceover must sound 100 percent real. This is not a press-a-button-and-it’s-done job. Regeneration and editing are required.
Trial
- This would start as a short trial to make sure the workflow and quality are a good fit on both sides before committing longer-term.
Future Work
- The channel will eventually expand into courses and digital products, which may create additional work opportunities over time.
Application
- Please tell us a bit about yourself, your experience with sound artistry, and why we should choose you for the job.
- Please send an A.I Voiceover sample you made.
AI-Powered Personalized Audiobook System for Shopify
I am looking for one highly experienced developer to build a complete, clean setup for a personalized audiobook system on Shopify.
Scope (high level, simple for an expert):
- Custom Shopify frontend & backend for an audiobook configurator
- Claude for logic & personalization
- DeepSeek for story generation
- ElevenLabs for voice output
- Custom SFX / background sound integration
- Automated delivery via Klaviyo (or similar)
Customers should be able to configure a story (names, style, voice, mood, length) and receive a fully generated audiobook automatically after checkout. This is not a research project – I’m looking for someone who has already built similar AI pipelines and can implement this efficiently and pragmatically.
AI Speech & Audio Processing Project
We are seeking an experienced freelancer for an AI Speech and Audio Processing project that showcases advanced skills in artificial intelligence, machine learning, and audio manipulation techniques. The ideal candidate will possess a strong understanding of speech recognition, natural language processing, and audio signal processing. The project aims to develop innovative solutions that enhance audio quality, improve speech synthesis, and optimize voice recognition systems. We invite proposals from professionals who are eager to demonstrate their expertise and contribute to this cutting-edge initiative.
Help at town hall in cuavas
We have to go to town hall in cuavas on Monday, we are at the second stage of visa application and need the staff to generate a QR code
Recording
I need people who can record for 30 minutes in an application.
Vapi Ai Voice calling - Refine spoken and written Swedish
I need help with advanced VAPI configuration for Swedish voice and transcription, including instructions that improve recognition of spoken Swedish. I'm currently using an ElevenLabs voice, and I'm looking for someone experienced.
Scope of work
- Assist with advanced VAPI configuration for a Swedish voice using ElevenLabs.
- Improve the pronunciation of spoken Swedish and the recognition of Swedish so the live transcription gets it right. We are currently testing both Deepgram and Speechmatics.
Additional information
- You don’t need to understand Swedish, as we will workshop this together.
- Once we get the foundations right, we would like to move on to structured outputs.
Ideal Candidate
- Experienced with advanced VAPI configuration and transcription.
- Skilled in using ElevenLabs voice technology, preferably with pronunciation files, the API, and advanced transcription to structured output fields. The JSON files and other outputs seem to be formatted differently depending on the transcription model?
- Must provide a written scope of work/suggestions, deliverables, and expected outcomes, in English.
- Must be fluent in English.
- Keeps reasonable service levels and attends agreed meetings and feedback sessions.
- Can start as soon as possible.
- Open to ongoing improvements as soon as we can go live and actually get business value.
Transcription language: Swedish
- We will help out with the Swedish and identify needed improvements.
Multi-lingual Conversation Audio Collection Project (Canada)
TELUS Digital is seeking native-speaking individuals to participate in a conversational data collection project. The task involves recording real-world, two-party conversations to support AI model training. Contributors will work in pairs and generate conversations that sound natural, following strict guidelines for audio quality, content, and file format.
This is a remote project. Conversations must be recorded in the same room, using a single microphone. Each pair will cover general and medical-related topics (medical background is preferred but not required). The role of one speaker must remain consistent across all recordings for each topic.
The partner or friend who performs the task with you will also need to register on our TELUS Digital AI Community Platform with the same link and submit a separate application.
Estimated time to complete the task:
- Each speaker: up to 2 hours of recorded speech
- Each pair: up to 4 hours combined
- Minimum 1 hour of recorded speech required to qualify for payment (after QA check)
Each participant may only complete the project once.
Pay Rate: Canadian French - $35 per hour.
This is an Independent Contractor opportunity. Payments will be made via Hyperwallet, where you can choose PayPal or Bank Transfer as the payment method.
Key Requirements:
- French (Canada) native speaker
- Willing to record in a pair, in the same room, on a single device
- Adherence to specific audio specifications (WAV, 16kHz, mono)
- Ability to follow guidelines to ensure conversations sound natural and are not read from a script
- Device with voice recording capability
- Stable Internet connection for uploading files
Register here (both partners need to submit an application separately): https://www.telusinternational.ai/cmp/contributor/jobs/available/127938
Selected participants will be contacted by TELUS Digital with detailed guidelines. If you have questions, we will be happy to assist you!
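Contributors who want to confirm that a recording meets the stated audio specification (WAV, 16kHz, mono) before uploading could run a quick local check. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_spec` and the file name in the usage note are hypothetical and not part of any TELUS tooling, and the sketch assumes an uncompressed PCM WAV file (the `wave` module raises `wave.Error` on other encodings).

```python
import wave


def check_wav_spec(path, expected_rate=16000, expected_channels=1):
    """Compare a PCM WAV file against a target sample rate and channel count.

    Returns (problems, duration_seconds). An empty problems list means the
    file matches the expected spec. The defaults mirror the listing's
    "WAV, 16kHz, mono" requirement.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()

    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"{channels} channel(s), expected {expected_channels}")

    # Duration helps when tracking progress toward a minimum recording length.
    duration = frames / rate if rate else 0.0
    return problems, duration
```

For example, `problems, seconds = check_wav_spec("conversation_01.wav")` returns an empty `problems` list when the file is 16 kHz mono, and `seconds` can be summed across files to track the one-hour minimum.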
Dubbing Viral Videos with Artificial Intelligence
I am looking for a professional to dub viral videos using artificial intelligence technology, maintaining naturalness and synchronization with the original audio.
Saudi Najdi Speaker
Description:
Language and dialect: The text will be provided in Modern Standard Arabic. The applicant must convert the text into the Najdi dialect and record it in a natural, clear, and fluent manner, without using other dialects or reverting to Modern Standard Arabic during the recording. The Arabic text file for the recording will be delivered only after the applicant has been reviewed and approved by Graphlogic as suitable for the required voice recording.
Recording quality: The recording must be of high (studio) quality, free of any noise, interference, or external effects. The voice must be clear, with a steady tone and a professional performance suitable for use in voice cloning technologies. No filters or artificial audio effects may be used.
Recording duration and delivery: The total duration of the recording must be approximately one hour. The content may be recorded as several separate files, provided that all required segments are fully covered, without omitting or neglecting any part of the text.
Usage rights: All voice recordings and any cloned voice model resulting from them are the exclusive property of Graphlogic. The applicant agrees to fully and permanently waive any current or future rights in the recordings or the cloned voice in favor of Graphlogic. The applicant is strictly prohibited from using, sharing, selling, or distributing the recordings or the cloned voice in any form or to any third party. The recordings or the cloned voice may not be used for any personal or commercial purpose outside the scope of Graphlogic, and may not be shared with any other party without first obtaining official written approval from the company.
Confidentiality and privacy: All texts, recordings, and information provided by Graphlogic are confidential. The applicant undertakes not to share, leak, or reuse them in any form. Any breach of this clause is considered an explicit legal violation and gives Graphlogic the full right to take appropriate legal action.
Professional standards: The required tone (formal/neutral) must be maintained, with full adherence to the correct pronunciation of the Najdi dialect. If any recording contains sentences or segments in a non-Najdi dialect or with non-conforming pronunciation, the entire audio file must be re-recorded from scratch, not just the faulty part, to ensure complete consistency in dialect and audio data quality. Any errors or deviations from the required dialect or quality must be fully corrected in accordance with the notes and reviews of the Graphlogic team.
Approval and review procedures:
1. Explicit written consent must be obtained from the applicant to use their voice in the voice cloning process before any technical steps begin.
2. After the recording is delivered, it will be reviewed carefully and any errors or comments will be corrected to ensure full compliance with the requirements.
3. A dedicated voice verification meeting will be held after the recording, which includes reading a reference sentence from the voice cloning platform, to confirm that the voice matches and to approve it before the voice model is built.
urgent
Expert Voice AI Prompt Engineer (Consultant)
Hearth AI is seeking a top-tier Voice AI prompt specialist for a high-impact consulting engagement. We are refining the user experience for our fast-growing flagship AI receptionist. We are looking for a top-tier, freelance Prompt Engineer who specializes in voice agents to help us achieve a more natural, effective, and human-like conversational flow.
This is a short-term consulting engagement. You will work directly with our product team and CEO to:
- Audit & Review: Analyze our current prompt library and conversation flows, identifying areas for improvement in clarity, tone, and efficiency.
- Design & Engineer: Design, test, and refine new, experimental prompts for various scenarios (e.g., call routing, message taking, appointment scheduling, handling ambiguity, repeat callers).
- Test & Optimize: Help us structure A/B tests to measure the performance of new prompts against existing ones.
This is the right role for you if:
- You have at least 1-2 years of professional experience focused specifically on prompt engineering and conversational design for voice-first AI products (e.g., AI assistants, advanced IVR, voice-controlled applications).
- You have a deep understanding of the nuances of spoken language vs. written language (e.g., handling pauses, interruptions, and non-linear conversations).
- You are able to define and embody a specific persona/tone of voice for an AI agent.
- Your portfolio demonstrates a deep understanding of Voice User Interface (VUI) design principles. We must see examples of your work for voice agents.
- You have a sophisticated understanding of modern prompt engineering techniques (e.g., chain-of-thought, few-shot learning, ReAct) and how to apply them to spoken dialogue.
- You are results-obsessed and have direct experience measuring and improving conversational AI performance through A/B testing and data analysis.
- You have a "good ear" for what makes a conversation feel natural and can write dialogue that leverages the high-fidelity output of platforms like ElevenLabs.
Bonus points for:
- Current or previous experience at a leading voice-first company
- Experience with workflow automation tools (n8n, Zapier) and an understanding of how prompts fit into an API-driven system
- Excellent communication skills and the ability to articulate the rationale behind your prompt design choices
How to Apply: Please begin your proposal with the word "HEARTH" so we know you've read the details. In your proposal, please include:
- A link to your portfolio or 2-3 specific examples of voice-related prompt engineering or conversational design work.
- A brief paragraph on your experience and philosophy for creating effective prompts for a voice-based AI receptionist to achieve desired user or business outcomes.
Realtime API chatgpt bot
I need a simple bot using the Realtime API, following the official docs. It doesn’t need to look good or be deployed online; it just needs to run locally on my machine. The bot should connect to the API, handle basic messaging, and let me export our conversation as a text file or JSON. Keep the code clean and easy to follow, with setup instructions. No extra features or fancy UI, just a basic, working example I can build on later. I also want to be able to set its instructions and guidelines easily, however I like.
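The export requirement (conversation as a text file or JSON, with editable instructions) can be handled independently of the API connection. Below is a minimal sketch of that piece only, using the Python standard library; `Turn`, `ConversationLog`, and their methods are hypothetical names invented for illustration, and the code makes no calls to the Realtime API itself, whose connection and event handling would follow the official docs.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Turn:
    role: str        # e.g. "user" or "assistant"
    text: str
    timestamp: str   # ISO 8601, UTC


@dataclass
class ConversationLog:
    # Instructions are stored with the log so an export records how the bot was configured.
    instructions: str = ""
    turns: list = field(default_factory=list)

    def add(self, role, text):
        """Append one message to the log with the current UTC time."""
        now = datetime.now(timezone.utc).isoformat()
        self.turns.append(Turn(role, text, now))

    def to_json(self):
        """Serialize instructions and all turns as a JSON string."""
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)

    def to_text(self):
        """Render the conversation as plain 'role: text' lines."""
        return "\n".join(f"{t.role}: {t.text}" for t in self.turns)

    def save(self, path):
        """Write JSON if the path ends in .json, otherwise plain text."""
        body = self.to_json() if path.endswith(".json") else self.to_text()
        with open(path, "w", encoding="utf-8") as f:
            f.write(body)
```

In use, the bot would call `log.add("user", ...)` and `log.add("assistant", ...)` as messages arrive, then `log.save("chat.json")` or `log.save("chat.txt")` on exit. Keeping the instructions in a plain string (or loading them from a text file) is one simple way to make them easy to change.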
Multilingual Conversational Speech Recording
Task: Record conversational speech in pairs (2 speakers, same room, 1 device)
clone my voice and switch 2 min video in French language
I am preparing a project to announce on social media. I will record a 2-minute video in my native language, and I need my voice cloned and switched to French. In French, I should sound just as I do in my mother tongue: the same emotion, nuances of speech, etc.