
Interactive AI Experience – 3D Guide & Custom Image Gen
- Proposals: 31
- Remote
- #4469890
- Expired




Description
The experience guides a participant through a short, ritual-like dialogue with a 3D speaking character (the full flow is described below). At the end of the interaction, the system produces:
• A symbolic, poetic spoken response
• One AI-generated image based on the participant’s clarified vision, rendered in a custom visual style trained on my artwork
This is a poetic, immersive digital art experience, not a generic chatbot or commercial tool.
Deliverable: A mini website / web module that can be integrated into an existing website (for example, as a subpage or subdirectory).
Scope Clarification
The generated images will later be shown in a separate digital “wall” project built by another team.
This job does NOT include building that wall interface.
Your responsibility is to:
✔ Generate the images
✔ Store them with structured metadata
✔ Make them exportable for future integration
Technical Constraints (Non-Negotiable)
• Open-source / open-weight AI models only (LLM, image generation, TTS, STT)
• Self-hosted deployment on my infrastructure (Hetzner servers)
• No proprietary AI APIs
Core User Experience Flow
- Short conceptual intro animation
- 3D character appears and speaks, introducing the ritual
- User selects one of five thematic prompts
- User shares a vision (text input; voice input optional bonus)
- AI-guided dialogue (2–4 turns) to clarify the scenario
- Final symbolic spoken response from the character
- One AI-generated image created from the clarified vision
- Session data saved for archive and future visual display
Technical Requirements
Frontend (Mini Website)
• Immersive but lightweight interface
• Smooth transitions between stages
• Audio playback (music + character voice)
• Responsive design (desktop + mobile)
• Built using React / Next.js or similar
3D Speaking Character
• WebGL / Three.js / A-Frame (or similar)
• Rigged character model (provided)
• Idle animation
• Speaking animation synced to audio (lip sync preferred; amplitude-based acceptable for MVP)
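The amplitude-based MVP fallback can be prepared entirely server-side. The sketch below (an assumption, not part of the brief) precomputes a normalised per-frame loudness envelope from the TTS output WAV, which the frontend can then map onto a mouth blendshape or jaw bone each animation frame; it assumes 16-bit PCM audio.

```python
# Sketch: precompute a per-frame amplitude envelope from a TTS WAV file so
# the browser can drive a mouth blendshape without doing audio analysis.
# The ~30 fps window size and 16-bit PCM assumption are illustrative choices.
import math
import struct
import wave

def amplitude_envelope(wav_path: str, window_s: float = 1 / 30) -> list:
    """Return RMS loudness per window, normalised to 0..1
    (one value per video frame at roughly 30 fps)."""
    with wave.open(wav_path, "rb") as wf:
        rate = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # Assumes 16-bit little-endian PCM samples.
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    if n_channels > 1:
        samples = samples[::n_channels]  # crude mono downmix: take one channel
    win = max(1, int(rate * window_s))
    env = []
    for i in range(0, len(samples), win):
        chunk = samples[i:i + win]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        env.append(rms)
    peak = max(env, default=1.0) or 1.0
    return [round(v / peak, 4) for v in env]
```

The resulting list can be shipped to the client as JSON alongside the audio URL, so the Three.js side only indexes into it by playback time.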
AI Dialogue System (Open-Source LLM)
• Self-hosted open-weight model
• Multi-turn conversation handling
• Structured prompting system
• Outputs:
– follow-up prompts
– final poetic response
– structured summary for image generation
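One way to keep the three outputs machine-readable is to ask the model for JSON on every turn and validate it before use. The sketch below assumes an OpenAI-compatible chat endpoint (which self-hosted servers such as vLLM or Ollama can expose); the schema and field names are illustrative assumptions, not fixed by this brief.

```python
# Sketch of the structured prompting layer: build the chat history for one
# session and validate the model's JSON reply. Field names are assumptions.
import json

SYSTEM_PROMPT = (
    "You are the ritual guide. Reply ONLY with JSON containing: "
    '"stage" ("clarify" or "final"), "speech" (what the character says), '
    'and, when stage is "final", "image_summary" (a concise scene '
    "description for the image generator)."
)

def build_messages(theme: str, transcript: list) -> list:
    """Assemble the chat history for one multi-turn session."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": f"Chosen theme: {theme}"}]
    messages.extend(transcript)  # prior user/assistant turns
    return messages

def parse_turn(raw: str) -> dict:
    """Validate the model's JSON reply; raise so the caller can
    retry with a repair prompt instead of showing broken output."""
    turn = json.loads(raw)
    if turn.get("stage") not in ("clarify", "final"):
        raise ValueError("missing or invalid 'stage'")
    if "speech" not in turn:
        raise ValueError("missing 'speech'")
    if turn["stage"] == "final" and "image_summary" not in turn:
        raise ValueError("final turn lacks 'image_summary'")
    return turn
```

Validating before use matters here because the final `image_summary` feeds directly into the image generation step.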
Voice System (Open-Source TTS)
• Open-source text-to-speech hosted on server
• Audio drives speaking animation
Custom Style Image Generation
The generated image must consistently match a custom artistic visual language based on my artwork.
Prompting alone is not enough.
You must implement:
Preferred: LoRA training using my artwork dataset
Alternative: Style adapter / reference conditioning
Requirements:
• One image per session
• Seed reproducibility
• Style strength control
• Save prompt + generation parameters
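Seed reproducibility and parameter logging can be handled by a thin layer around the generator. A minimal sketch, under the assumption that the seed is derived deterministically from the session id (so any session's image can be regenerated exactly); the LoRA name and default parameters below are hypothetical placeholders:

```python
# Sketch of the reproducibility layer around image generation. The seed is
# derived from the session id; the actual diffusers/LoRA call is only
# indicated in comments, and all parameter values here are assumptions.
import hashlib
import time

def derive_seed(session_id: str) -> int:
    """Stable 32-bit seed per session, for exact regeneration."""
    digest = hashlib.sha256(session_id.encode()).digest()
    return int.from_bytes(digest[:4], "big")

def generation_record(session_id: str, prompt: str,
                      lora: str = "artist-style-v1",  # hypothetical LoRA name
                      style_strength: float = 0.8,
                      steps: int = 30, cfg: float = 6.5) -> dict:
    """Everything needed to re-run one generation, saved next to the image.
    In a diffusers-based pipeline, `seed` would feed a torch Generator's
    manual_seed() and `lora_scale` would be applied when loading the LoRA."""
    return {
        "session_id": session_id,
        "prompt": prompt,
        "seed": derive_seed(session_id),
        "lora": lora,
        "lora_scale": style_strength,  # the 'style strength control'
        "steps": steps,
        "cfg_scale": cfg,
        "created_at": int(time.time()),
    }
```

Storing this record verbatim satisfies both the seed-reproducibility and save-parameters requirements at once.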
Backend & Storage
Store for each session:
• Selected prompt theme
• Dialogue transcript
• Final spoken response
• Scenario summary
• Image prompt + parameters
• Generated image file
• Timestamp
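The per-session fields above map naturally onto a single record per session. A minimal sketch, assuming one JSON file plus the image stored side by side per session directory (a small SQLite table would serve equally well):

```python
# Sketch of the per-session archive record, mirroring the field list above.
# The one-directory-per-session layout is an assumption.
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path

@dataclass
class SessionRecord:
    session_id: str
    theme: str                      # selected prompt theme
    transcript: list = field(default_factory=list)  # dialogue turns
    final_response: str = ""        # final spoken response
    summary: str = ""               # scenario summary
    image_prompt: str = ""
    image_params: dict = field(default_factory=dict)
    image_file: str = ""            # filename of the generated image
    timestamp: str = ""             # ISO 8601

def save_session(record: SessionRecord, root: Path) -> Path:
    """Write one session as <root>/<session_id>/session.json; the generated
    image is expected to be written into the same directory."""
    session_dir = root / record.session_id
    session_dir.mkdir(parents=True, exist_ok=True)
    path = session_dir / "session.json"
    path.write_text(json.dumps(asdict(record), ensure_ascii=False, indent=2))
    return path
```

Keeping each session self-contained on disk also makes the "exportable for future integration" requirement trivial: the wall team can consume the directories directly.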
Admin Panel
Simple password-protected page to:
• View sessions
• Download text and images
Deployment Requirements
• Linux deployment on Hetzner
• Docker / Docker Compose preferred
• Documentation for:
– setup
– model downloads
– environment variables
– running services
– updating style model
Project Timeline
Total duration: 2 months
Skills Required
• Web 3D (Three.js / A-Frame / WebGL)
• Experience integrating animated 3D characters in the browser
• Experience serving open-source LLMs
• Diffusion model LoRA or adapter training
• Backend/API development
• Docker + Linux deployment
How to Apply
Please include:
• 2–3 relevant projects (AI apps, WebGL/WebXR, or interactive experiences)
• Proposed tech stack (frontend, backend, model serving)
• Which open models you would use (LLM, diffusion, TTS) and why
• Recommended server setup (GPU/VRAM) for acceptable performance
Screening Questions
How would you sync speech audio to a 3D character animation in the browser?
Which open-weight LLM would you deploy and how would you serve it?
How would you train and deploy a custom style LoRA for image generation?
What server setup would you recommend and why?
Pierre G.
Clarification Board
-

Hi Pierre,
What approximate size and resolution of artwork dataset will you provide for training the custom LoRA style model, and do you already have usage rights cleared for all included images?
Please let me know.
Thanks
Naresh

Pierre G. (Thu 8:07am): Hello, all works are mine. I will share hi-res images of past projects.
-

Hi Pierre, thanks for the detailed brief. Before I send a proposal, could you confirm two points so I scope this correctly: (1) What’s the must-have MVP for the first 2–3 weeks (e.g., text-only input, amplitude-based mouth movement acceptable, 2 turns vs 4 turns, intro animation optional)? (2) What Hetzner GPU spec/VRAM will be available, and should all services (LLM + diffusion + TTS + web) run on a single server via Docker Compose?
