OpenClaw/Outlier AI Specialist - Python, Rubrics & Agent Trace

Per Hour

$25/hr

Posted: 3 hours ago
Proposals: 20
Remote
#4501759
Open for Proposals

Description

Experience Level: Entry

Overview:
We are looking for a remote AI developer/evaluator with hands-on experience in OpenClaw and AI task platforms such as Outlier. The ideal candidate can support both OpenClaw Atlas-style tasks and authentic OpenClaw session trace submission work.

Responsibilities:
- Work on OpenClaw/Outlier-style AI evaluation tasks
- Build or review agent workflows using OpenClaw
- Create task-specific rubrics, validation checks, and unit tests
- Evaluate AI model trajectories, outputs, and common LLM errors
- Use Python for coding-related tasks when needed
- Export and prepare eligible completed OpenClaw sessions with 150+ turns
- Redact PII, credentials, confidential data, and sensitive identifiers
- Package traces and related artifacts according to submission guidelines
- Follow all platform rules, rights, privacy, and compliance requirements
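The export, redaction, and eligibility steps above could be sketched roughly as follows. This is a minimal illustration, not the platform's actual tooling: the session layout (a dict with a `turns` list of `{"role", "content"}` entries), the function names, and the regex patterns are all assumptions made for the example. Only the 150-turn threshold comes from this posting.

```python
import re

MIN_TURNS = 150  # eligibility threshold stated in the posting

# Illustrative patterns only (assumed for this sketch); real redaction
# needs a broader, manually reviewed set of rules.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "API_KEY": re.compile(r"\b(?:sk|pk|ghp)_[A-Za-z0-9]{16,}\b"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def redact(text):
    """Replace every match of each pattern with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


def is_eligible(session):
    """A session qualifies only if it has at least MIN_TURNS turns."""
    return len(session["turns"]) >= MIN_TURNS


def prepare(session):
    """Return a redacted copy of an eligible session, or None if too short."""
    if not is_eligible(session):
        return None
    return {
        "turns": [
            {**turn, "content": redact(turn["content"])}
            for turn in session["turns"]
        ]
    }
```

Automated patterns like these only catch obvious cases, so a manual review pass before packaging would still be needed.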

Requirements:
- Proven experience with OpenClaw
- Experience with Outlier, RLHF, LLM evaluation, or AI training platforms
- Strong Python/coding knowledge
- Ability to write clear rubrics and evaluation criteria
- Understanding of AI agents, tool use, trajectories, and prompt quality
- Experience with long-horizon agentic sessions preferred
- Must only use legitimate personal work/data with proper rights to share
- Strong English reading and writing skills
- Detail-oriented, reliable, and able to work remotely
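As a rough idea of what "clear rubrics and evaluation criteria" might look like in code, a rubric can be expressed as a list of weighted pass/fail checks. The sketch below is hypothetical: the `Criterion` class, the `score` function, and the three sample criteria are invented for illustration and do not reflect any OpenClaw or Outlier scoring format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    name: str
    weight: float
    check: Callable[[str], bool]  # returns True when the output passes


def score(output, rubric):
    """Weighted fraction of criteria that the output satisfies (0.0 to 1.0)."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.check(output))
    return earned / total if total else 0.0


# Hypothetical criteria, chosen only to show the shape of a rubric.
RUBRIC = [
    Criterion("non_empty", 1.0, lambda o: bool(o.strip())),
    Criterion("cites_tool_result", 2.0, lambda o: "tool_result" in o),
    Criterion("no_placeholder_text", 1.0, lambda o: "TODO" not in o),
]
```

Keeping each criterion as a small named check makes it straightforward to cover the rubric itself with unit tests.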

Nice to Have:
- Existing real OpenClaw sessions with 150+ turns
- Experience with OpenClaw Atlas
- Experience in data redaction, DevOps, cybersecurity, or API workflows
- Prior work on rubrics, unit tests, or model evaluation tasks

Important:
- No fabricated sessions
- No account sharing
- No confidential, customer, or employer data
- No policy violations
- All work must comply with platform terms, privacy rules, and data rights requirements

Clarification Board

3 hours ago

I have a few questions to understand your requirement better:
===================================================
1. Are you looking for ongoing support or assistance on a project-by-project basis?
2. Which OpenClaw workflows are primarily involved—Atlas tasks, session trace submissions, or both?
3. Do you already have evaluation rubrics in place, or would you like them created from scratch?
4. What is the expected volume of traces or evaluation tasks per week?
5. Are there any specific AI models, domains, or tool-use workflows that require specialised evaluation experience?

3 hours ago

A few quick questions:
1. Are you looking for ongoing support or a fixed number of OpenClaw/Outlier tasks per week?
2. Will access to OpenClaw projects and evaluation guidelines be provided?
3. Do you already have existing session traces that need review, or should new evaluations be performed from scratch?

4 hours ago

Thanks for sharing the details. A few questions that would help me understand the scope better:

• Which OpenClaw workflows are you primarily working with today—Atlas tasks, trace generation, evaluation, or a combination of all three?

• Is prior OpenClaw production experience a strict requirement, or would equivalent experience with agent evaluation platforms, RLHF workflows, LangGraph, AutoGen, CrewAI, or similar frameworks be acceptable?

• For session trace submissions, are you expecting candidates to already possess eligible 150+ turn sessions, or will new sessions be created as part of the engagement?

• What percentage of the role is focused on rubric creation and evaluation versus Python development and workflow implementation?

• Are there existing evaluation guidelines and scoring frameworks, or will the selected candidate be expected to design rubrics from scratch?

• What types of agents are being evaluated most frequently (research, coding, customer support, browser automation, API workflows, etc.)?

• Are the OpenClaw sessions executed locally, in a hosted environment, or through a managed platform?

• What does a typical weekly workload look like in terms of number of traces, evaluations, or tasks?

• Will candidates be required to write automated tests and validation suites, or primarily perform manual evaluation of trajectories?

• Is this project expected to evolve into ongoing AI evaluation work, or is it primarily a fixed-duration engagement focused on specific submissions?

Marc D.