LLM Reasoning Benchmark Questions + Python Evaluation Script

Ends in (days): 430
Fixed Price: $20

Posted: 1 year ago
Proposals: 6
Remote
#4367579
Completed

Description

Experience Level: Entry

I need a freelancer to prepare benchmark questions and answers for testing a custom LLM’s reasoning ability.

Scope:

Question Set:
  - Collect 500–600 LLM benchmark questions with correct answers.
  - Focus areas: logical, mathematical, commonsense, analytical, and multi-step reasoning.
  - Deliver as JSON or CSV.

Python Script:
  - Load questions and send them to an LLM (I'll handle API integration).
  - Compare model answers to correct ones.
  - Output a simple accuracy report.

Requirements:
  - Knowledge of LLMs, reasoning datasets, or NLP is preferred.
  - Clean, documented code.
  - Use only open or original questions.
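
For illustration, the script described in the scope could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the delivered solution: the field names `question`, `answer`, and `category` are hypothetical (the posting does not specify a schema), `ask_model` is a placeholder for whatever API integration the client supplies, and answers are scored by normalized exact match, which may be too strict for free-form reasoning answers.

```python
import csv
import json
import re
from collections import defaultdict
from pathlib import Path


def load_questions(path):
    """Load question records from a .json file (list of objects) or a CSV file.

    Assumed (hypothetical) fields: 'question', 'answer', optional 'category'.
    """
    path = Path(path)
    if path.suffix.lower() == ".json":
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def normalize(text):
    """Lowercase, trim, collapse whitespace, and drop trailing punctuation."""
    text = re.sub(r"\s+", " ", str(text).strip().lower())
    return text.rstrip(".!?")


def evaluate(questions, ask_model):
    """Return {category: (correct, total)} using normalized exact match.

    ask_model is any callable mapping a question string to an answer string;
    it stands in for the client's own LLM API integration.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for item in questions:
        category = item.get("category") or "uncategorized"
        prediction = ask_model(item["question"])
        totals[category] += 1
        if normalize(prediction) == normalize(item["answer"]):
            correct[category] += 1
    return {c: (correct[c], totals[c]) for c in totals}


def format_report(results):
    """Build a plain-text accuracy report, one line per category plus overall."""
    lines = []
    for category in sorted(results):
        right, total = results[category]
        lines.append(f"{category}: {right}/{total} ({right / total:.1%})")
    all_right = sum(r for r, _ in results.values())
    all_total = sum(t for _, t in results.values())
    overall = all_right / all_total if all_total else 0.0
    lines.append(f"overall: {all_right}/{all_total} ({overall:.1%})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Tiny in-memory sample and a stub model, for demonstration only.
    sample = [
        {"question": "What is 17 + 25?", "answer": "42", "category": "mathematical"},
        {"question": "All cats are mammals. Tom is a cat. Is Tom a mammal?",
         "answer": "Yes", "category": "logical"},
    ]

    def stub_model(question):
        return "42" if "17" in question else "No."

    print(format_report(evaluate(sample, stub_model)))
```

In practice, `stub_model` would be replaced by the client's API call and `sample` by `load_questions("questions.json")` or a CSV path. Exact match is the simplest scoring rule; multi-step answers may need answer extraction or a more lenient comparison.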