Prompt Engineer – AI Systems & Automation Specialist

- or -

Post a project like this

Ends in (days)

Per Hour

$10_/hr

Posted: 1 day ago
Proposals: 18
Remote
#4470916
Open for Proposals

+ have already sent a proposal.

Description

Experience Level: Entry

I'm looking for someone who can build a solid and reliable prompting framework that delivers measurable ROI for business use cases, not just impressive demos. We need a prompt engineer who thinks like a systems architect, not just someone who can write clever prompts.
You'll be designing, testing, and optimizing prompts for production-grade AI workflows. This isn't about getting ChatGPT to write poems; it's about building reliable, scalable prompt chains that solve real business problems.

Deliverables
What You'll Do:
Design and optimize prompts for LLM-powered automation workflows (GPT-4, Claude, open-source models)
Build evaluation frameworks to measure prompt performance against business KPIs
Develop prompt templates and chains for multi-step reasoning tasks
Document prompt engineering patterns and create reusable libraries
Collaborate on agentic AI systems using frameworks like LangChain
Debug and iterate on prompts when model outputs don't meet specifications
Required Skills & Experience:
2+ years hands-on prompt engineering experience (personal projects count if well-documented)
Deep understanding of prompt patterns: chain-of-thought, few-shot learning, system prompt architecture, retrieval-augmented generation (RAG)
Experience with prompt evaluation and A/B testing methodologies
Familiarity with tokenization, context window management, and model-specific quirks
Strong technical writing—you can explain complex prompt logic clearly
Python proficiency for prompt testing and automation integration
Preferred (Not Required):
Background in ML/AI fundamentals (understanding of transformers, embeddings, fine-tuning concepts), at least foundational AI/ML understanding, you don't need to train models, but you understand concepts like temperature, context windows, and model limitations.
Experience with n8n, Make, or similar automation platforms
Exposure to agentic frameworks (LangChain, AutoGPT patterns, function calling)
In your application, please begin your cover letter with the word "CONTEXT:" followed by a one-sentence summary of what makes our AI engineering approach different based on this job description. Applications without this will not be reviewed.
Assessment Task (Required with Application)
Instead of multiple interview rounds, complete this practical assessment:
Scenario: A client wants to automate customer support ticket classification. Tickets should be categorized into: Billing, Technical, Feature Request, or Escalation. The model sometimes misclassifies urgent technical issues as "Feature Request."
Your task:
Write the system prompt you would use
Provide 3-5 few-shot examples you'd include
Explain your reasoning for structural choices
Describe how you would evaluate and iterate on this prompt
Time estimate: 30-45 minutes. We value thoughtful reasoning over speed.
This task reveals: Do they understand edge cases? Can they structure prompts systematically? Do they think about evaluation? AI-generated responses typically miss the nuanced "why" behind choices.
What Success Looks Like
Prompts that work reliably at scale, not just in demos
Clear documentation that another engineer could pick up
Data-driven iteration, not just "try things until it works."
To Apply: Submit your assessment response and a brief note on a prompt engineering challenge you've solved. Generic applications will not be reviewed.

New Proposal

Clarification Board Ask a Question

Yesterday

Could you please let me know:
Which LLMs and deployment environment (e.g., OpenAI, Anthropic, or self-hosted models) do you plan to use in production so the prompting framework can be optimized for their specific constraints and capabilities?
Thanks
Naresh
Yesterday

Hi Franklin, quick checks so I can tailor the proposal:

1. Which one workflow are we starting with (e.g., support ticket classification only)?

2. What’s the success metric (target accuracy/precision-recall, and “urgent technical” must be caught)?

3. What stack are you using (model + where prompts run: Zendesk/Intercom/custom app + LangChain/n8n/Make)?

4. Do you have sanitised ticket samples / labels for evaluation (even 30–50 examples)?

Description

Franklin P.

New Proposal

Clarification Board Ask a Question