
Technical Lead (Python / AI Systems)
- Proposals: 29
- Remote
- #4493340
- Open for Proposals





Description
The goal is not theoretical AI, but a working prototype that:
- processes audio/video input
- detects speech and activity
- derives simple patterns
- produces structured output for further analysis
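As a rough illustration of "derives simple patterns" and "produces structured output", the sketch below turns a list of speech segments into per-speaker speaking time, total silence, and a JSON-ready dictionary. The `Segment` shape, the `summarize` function, and the output field names are assumptions made for this example and are not specified in the brief; in a real prototype the segments would come from a diarization tool rather than being hard-coded.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One detected speech segment (hypothetical shape, not from the brief)."""
    speaker: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording


def summarize(segments, total_duration):
    """Derive speaking time per speaker and total silence from segments."""
    speaking = {}
    for seg in segments:
        speaking[seg.speaker] = speaking.get(seg.speaker, 0.0) + (seg.end - seg.start)

    # Merge overlapping intervals so that overlapping speech is not counted
    # twice when measuring how much of the recording contains any speech.
    covered = 0.0
    current_start = current_end = None
    for seg in sorted(segments, key=lambda s: s.start):
        if current_end is None or seg.start > current_end:
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = seg.start, seg.end
        else:
            current_end = max(current_end, seg.end)
    if current_end is not None:
        covered += current_end - current_start

    return {
        "duration_s": total_duration,
        "speaking_time_s": {k: round(v, 3) for k, v in sorted(speaking.items())},
        "silence_s": round(max(total_duration - covered, 0.0), 3),
        "segments": [
            {"speaker": s.speaker, "start": s.start, "end": s.end}
            for s in sorted(segments, key=lambda s: s.start)
        ],
    }


if __name__ == "__main__":
    demo = [Segment("A", 0.0, 4.0), Segment("B", 3.0, 6.0), Segment("A", 8.0, 10.0)]
    print(json.dumps(summarize(demo, total_duration=12.0), indent=2))
```

Keeping the output a plain dictionary means it can be serialized to JSON as-is and extended with further fields later without changing the consumers.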
I’m looking for a strong technical thinker who can:
- translate high-level ideas into concrete system architecture
- make pragmatic technical decisions (tools, libraries, structure)
- break down work into clear tasks for a developer
- review and guide implementation
- focus on getting something working first, then improve quality and add complexity step by step
Your role:
- define the system architecture (input → processing → output)
- select and validate tools (e.g. Whisper, diarization, OpenCV, etc.)
- structure the pipeline and data flow
- write clear technical tasks/specs
- guide and review the work of 1 developer (offshore)
- act as a sparring partner for technical decisions
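One possible way to picture the input → processing → output structure is a list of small stages that each take and return a shared context dictionary. This is only a sketch under assumptions: the stage names (`load_input`, `detect_speech`, `build_output`) and the context keys are hypothetical, and the stub bodies stand in for real calls to tools such as Whisper or OpenCV.

```python
from typing import Any, Callable, Dict, List

# A stage is any function that takes the shared context and returns a new one.
Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_pipeline(stages: List[Stage], context: Dict[str, Any]) -> Dict[str, Any]:
    """Run the stages in order, passing each stage's result to the next."""
    for stage in stages:
        context = stage(context)
    return context


# Placeholder stages (hypothetical): real ones would wrap the chosen tools.
def load_input(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for decoding the audio/video file at ctx["path"].
    return {**ctx, "audio": f"decoded:{ctx['path']}"}


def detect_speech(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Stub result standing in for a speech-detection or diarization tool.
    return {**ctx, "segments": [{"speaker": "A", "start": 0.0, "end": 2.5}]}


def build_output(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Keep only the fields meant for downstream analysis.
    return {"source": ctx["path"], "segments": ctx["segments"]}


if __name__ == "__main__":
    result = run_pipeline(
        [load_input, detect_speech, build_output],
        {"path": "meeting.mp4"},
    )
    print(result)
```

Because every stage has the same signature, each one can be specified as a separate task for the developer, tested in isolation, and swapped for a different tool without touching the rest.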
Profile:
- strong experience with Python backend and/or data pipelines
- experience building real systems (not only notebooks or experiments)
- experience with APIs (FastAPI / Flask)
- experience with audio/video processing is a strong plus
- familiarity with integrating AI tools (not necessarily training models)
- independent, critical, and structured thinker
Practical:
- freelance / part-time (5–10 hours per week)
- remote
- start ASAP
This is not a pure development role. I’m specifically looking for someone who can think, structure, and guide, not just execute.
To apply please include:
- relevant experience (with concrete examples)
- how you would approach building such a pipeline (short, structured)
- availability
Kris V.
Clarification Board

Hi, thanks for the clear brief. This is exactly the kind of structured AI system I enjoy working on.
To align on the architecture and scope, could you clarify:
- what level of real-time vs. batch processing you’re aiming for initially
- whether you already have sample audio/video datasets to work with
- what kind of output format/insights you expect (e.g. timestamps, speaker labels, activity summaries, structured JSON)
Also, do you have any preference for deployment environment (local, cloud, GPU availability), or should I propose the full pipeline setup?
This will help me define a practical, step-by-step system architecture and guide the developer effectively from day one.

1/ What type of audio/video inputs are we primarily dealing with? Are these pre-recorded files, live streams, or both?
2/ How accurate does the speaker diarization need to be for your use case? Is approximate speaker separation acceptable or do you need high precision?
3/ What specific metrics are critical beyond speaking time and silence? Do you want advanced insights later like engagement scoring or sentiment?