
LLM Reasoning Benchmark Questions + Python Evaluation Script
$20
- Posted:
- Proposals: 6
- Remote
- #4367579
- Awarded
Description
Experience Level: Entry
I need a freelancer to prepare benchmark questions and answers for testing a custom LLM’s reasoning ability.
Scope:
Question Set:
Collect 500–600 LLM benchmark questions with correct answers.
Focus areas: logical, mathematical, commonsense, analytical, and multi-step reasoning.
Deliver as JSON or CSV.
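
One possible shape for the deliverable, sketched under assumptions: the posting does not define a schema, so the field names below (id, category, question, answer) and the helper parse_questions are hypothetical. The sketch parses either format into the same list of records and rejects entries with missing fields.

```python
import csv
import io
import json

# Hypothetical field names; the posting does not specify a schema.
REQUIRED_FIELDS = ("id", "category", "question", "answer")


def parse_questions(text, fmt):
    """Parse question records from JSON or CSV text and check required fields."""
    if fmt == "json":
        records = json.loads(text)
    elif fmt == "csv":
        records = list(csv.DictReader(io.StringIO(text)))
    else:
        raise ValueError(f"unsupported format: {fmt}")

    for index, record in enumerate(records):
        missing = []
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            # Treat absent, None, and blank values as missing.
            if value is None or not str(value).strip():
                missing.append(field)
        if missing:
            raise ValueError(f"record {index} is missing fields: {missing}")
    return records


# The same illustrative record in both formats (answers kept as strings).
SAMPLE_JSON = (
    '[{"id": "q1", "category": "mathematical", '
    '"question": "What is 17 + 25?", "answer": "42"}]'
)
SAMPLE_CSV = "id,category,question,answer\nq1,mathematical,What is 17 + 25?,42\n"

if __name__ == "__main__":
    print(parse_questions(SAMPLE_JSON, "json"))
    print(parse_questions(SAMPLE_CSV, "csv"))
```

Storing answers as strings in both formats keeps the JSON and CSV versions interchangeable, since CSV has no native number type.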
Python Script:
Load questions and send them to an LLM (I'll handle API integration).
Compare model answers to correct ones.
Output a simple accuracy report.
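
The three script steps above could be sketched roughly as follows. This is a minimal illustration, not the awarded solution: ask_model is a placeholder callable standing in for the client's API integration, stub_model returns canned answers, the record fields are assumed, and exact match after light normalization is only one possible way to compare answers.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, collapse whitespace, and drop trailing sentence punctuation."""
    text = re.sub(r"\s+", " ", str(text).strip().lower())
    return text.rstrip(".!?")


def evaluate(questions, ask_model):
    """Score a model with exact match on normalized answers.

    ask_model is any callable that takes a question string and returns
    the model's answer string (the API integration is out of scope here).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in questions:
        category = item.get("category", "uncategorized")
        total[category] += 1
        if normalize(ask_model(item["question"])) == normalize(item["answer"]):
            correct[category] += 1

    n_total = sum(total.values())
    n_correct = sum(correct.values())
    return {
        "total": n_total,
        "correct": n_correct,
        "accuracy": n_correct / n_total if n_total else 0.0,
        "by_category": {c: correct[c] / total[c] for c in total},
    }


def format_report(result):
    """Render the evaluation result as a short plain-text report."""
    lines = [
        f"Overall: {result['correct']}/{result['total']} "
        f"({result['accuracy']:.1%})"
    ]
    for category, accuracy in sorted(result["by_category"].items()):
        lines.append(f"  {category}: {accuracy:.1%}")
    return "\n".join(lines)


def stub_model(prompt):
    """Placeholder model with canned answers, used only for this demo."""
    canned = {
        "What is 17 + 25?": "42.",
        "Is every square a rectangle?": "No",
    }
    return canned.get(prompt, "")


# Illustrative records; field names are assumptions, not part of the posting.
QUESTIONS = [
    {"category": "mathematical", "question": "What is 17 + 25?", "answer": "42"},
    {"category": "logical", "question": "Is every square a rectangle?", "answer": "yes"},
]

if __name__ == "__main__":
    print(format_report(evaluate(QUESTIONS, stub_model)))
```

Exact match is strict: free-form model output often needs an answer-extraction step before comparison, which this sketch leaves out.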
Requirements:
Knowledge of LLMs, reasoning datasets, or NLP is preferred.
Clean, documented code.
Use only open or original questions.

Roberto R.
Rating: 100% (3)
Projects completed: 3
Freelancers worked with: 2
Projects awarded: 25%
Last project: 25 Apr 2025
United States