
Terminal Bench Expert
$1.0k
- Posted:
- Proposals: 8
- Remote
- #4500383
- Open for Proposals

Description
Experience Level: Expert
Role: Terminal Bench Expert
Employment Type: Remote
3–10 years of experience in software engineering or relevant domains. Strong debugging, reasoning, and analytical skills.
Full-time: 40 hours per week, with a 4-hour overlap with PST.
What does day-to-day look like:
• Design high-quality Terminal-Bench task ideas and specifications.
• Develop complex tasks requiring reasoning, investigation, and debugging.
• Write clear task descriptions, solution approaches, and verification logic.
• Define deterministic, outcome-based evaluation criteria.
• Identify realistic failure modes, edge cases, and operational constraints.
• Create tasks that challenge AI systems while remaining solvable by experts.
• Collaborate with reviewers to refine task quality and difficulty.
• Contribute expertise across one or more specialized domains.
Required Skills:
• 3–10 years of experience in software engineering or relevant domains.
• Strong debugging, reasoning, and analytical skills.
• Good understanding of system design, workflows, and dependencies.
• Ability to analyze complex systems across multiple layers.
• Experience with production systems, pipelines, or large-scale workflows.
• Strong technical writing and documentation skills.
• Exposure to LLMs, agentic systems, or AI evaluation frameworks.
• Experience reviewing technical specifications or designing validation logic.
Domains (Any of the following):
• Software Engineering & Code Operations
• Debugging & Codebase Navigation
• System Administration & Shell Workflows
• File & Text Processing Pipelines
• Data Engineering (ETL & Data Pipelines)
• Database & SQL Operations
• Machine Learning Pipelines & MLOps
• Post-training & Model Finetuning Workflows
• AI Evaluation & Benchmarking Systems
• Retrieval, Search & Ranking Systems
• GPU / Systems Performance Optimization
• Distributed Systems & Infrastructure
• Cloud & Platform Engineering
• DevOps & CI/CD Systems
• Build & Dependency Management
• Scientific & Numerical Computing
• Simulation & Optimization Systems
• Formal Methods & Theorem Proving
• Document & Structured Data Processing (PDFs, Excel, etc.)
• Media Processing (Video, Audio, Images via CLI tools)
• Programmatic Graphics & Design (SVG, layout, rendering)
• Data Visualization & Reporting Workflows
• Geospatial & Spatial Data Processing
• Time-series & Forecasting Systems
• Security, Forensics & Reverse Engineering
• Cybersecurity & Vulnerability Analysis
• Networking & API Integration Workflows
• Automation & Multi-step Toolchain Orchestration
• CLI Tooling & Developer Tool Workflows
• Version Control & Git Workflows
• Observability, Logging & Monitoring
• Storage Systems & File Systems
• Finance & Accounting Workflows
• Quantitative Finance & Risk Modeling
• Legal & Compliance Workflows
• Healthcare & Clinical Data Processing
• Supply Chain & Logistics Operations
• Marketing & Growth Analytics
• CRM & Sales Operations
• HR & Recruiting Analytics
• Consulting & Strategy Modeling
• Investment Workflows
• Operations Research & Decision Optimization
• Benchmark Infrastructure, Adapters & Harness
Pallavi M.
0% (0)
Projects Completed: -
Freelancers worked with: -
Projects awarded: 0%
Last project: 5 Jun 2026
United Kingdom
Clarification Board
-

A few quick questions:
1. Which domains are currently the highest priority for your team (Software Engineering, DevOps, AI Evaluation, Data Engineering, etc.)?
2. Is there an existing Terminal Bench framework that task creators will follow?
3. What is the expected volume of tasks per week for this role?
-

Is there an existing framework for deterministic verification, or is part of the role to help design and improve evaluation logic as well?
-

What are the actual day-to-day tasks or projects?