
OpenClaw/Outlier AI Specialist - Python, Rubrics & Agent Trace
- Proposals: 20
- Remote
- #4501759
- Open for Proposals

Description
We are looking for a remote AI developer/evaluator with hands-on experience in OpenClaw and AI task platforms such as Outlier. The ideal candidate can support both OpenClaw Atlas-style tasks and authentic OpenClaw session trace submission work.
Responsibilities:
- Work on OpenClaw/Outlier-style AI evaluation tasks
- Build or review agent workflows using OpenClaw
- Create task-specific rubrics, validation checks, and unit tests
- Evaluate AI model trajectories, outputs, and common LLM errors
- Use Python for coding-related tasks when needed
- Export and prepare eligible completed OpenClaw sessions with 150+ turns
- Redact PII, credentials, confidential data, and sensitive identifiers
- Package traces and related artifacts according to submission guidelines
- Follow all platform rules, rights, privacy, and compliance requirements
Requirements:
- Proven experience with OpenClaw
- Experience with Outlier, RLHF, LLM evaluation, or AI training platforms
- Strong Python/coding knowledge
- Ability to write clear rubrics and evaluation criteria
- Understanding of AI agents, tool use, trajectories, and prompt quality
- Experience with long-horizon agentic sessions preferred
- Must use only legitimate personal work and data that you have the right to share
- Strong English reading and writing skills
- Detail-oriented, reliable, and able to work remotely
Nice to Have:
- Existing real OpenClaw sessions with 150+ turns
- Experience with OpenClaw Atlas
- Experience in data redaction, DevOps, cybersecurity, or API workflows
- Prior work on rubrics, unit tests, or model evaluation tasks
Important:
- No fabricated sessions
- No account sharing
- No confidential, customer, or employer data
- No policy violations
- All work must comply with platform terms, privacy rules, and data rights requirements
Marc D.
Clarification Board
-

I have a few questions to understand your requirement better:
===================================================
1. Are you looking for ongoing support or assistance on a project-by-project basis?
2. Which OpenClaw workflows are primarily involved—Atlas tasks, session trace submissions, or both?
3. Do you already have evaluation rubrics in place, or would you like them created from scratch?
4. What is the expected volume of traces or evaluation tasks per week?
5. Are there any specific AI models, domains, or tool-use workflows that require specialised evaluation experience?
-

A few quick questions:
1. Are you looking for ongoing support or a fixed number of OpenClaw/Outlier tasks per week?
2. Will access to OpenClaw projects and evaluation guidelines be provided?
3. Do you already have existing session traces that need review, or should new evaluations be performed from scratch?
-

Thanks for sharing the details. A few questions that would help me understand the scope better:
• Which OpenClaw workflows are you primarily working with today—Atlas tasks, trace generation, evaluation, or a combination of all three?
• Is prior OpenClaw production experience a strict requirement, or would equivalent experience with agent evaluation platforms, RLHF workflows, LangGraph, AutoGen, CrewAI, or similar frameworks be acceptable?
• For session trace submissions, are you expecting candidates to already possess eligible 150+ turn sessions, or will new sessions be created as part of the engagement?
• What percentage of the role is focused on rubric creation and evaluation versus Python development and workflow implementation?
• Are there existing evaluation guidelines and scoring frameworks, or will the selected candidate be expected to design rubrics from scratch?
• What types of agents are being evaluated most frequently (research, coding, customer support, browser automation, API workflows, etc.)?
• Are the OpenClaw sessions executed locally, in a hosted environment, or through a managed platform?
• What does a typical weekly workload look like in terms of number of traces, evaluations, or tasks?
• Will candidates be required to write automated tests and validation suites, or primarily perform manual evaluation of trajectories?
• Is this project expected to evolve into ongoing AI evaluation work, or is it primarily a fixed-duration engagement focused on specific submissions?