Generalist AI Quality Rater (US-Remote) – $75/hr
We are seeking multiple Generalist AI Quality Raters to design effective prompts, generate and evaluate LLM outputs, and collaborate with cross-functional teams to improve model quality.
You’ll drive best practices for prompt engineering, human evaluation, and continuous model improvement.
- To Note: This is for an immediate project need. The project is approved for an initial 3 months, with the possibility of extension based on project/client demands.
- To Note: This is a fixed-hourly-rate project paid in USD ($75/hr).
Responsibilities:
- Design prompts tailored to tasks, contexts, and audiences
- Generate creative and informative texts (stories, articles, summaries, dialogues)
- Evaluate/compare LLM responses for fluency, coherence, factuality, creativity, and bias
- Analyze outputs; identify strengths/weaknesses; write actionable feedback
- Collaborate with engineering, research, and product teams on prompts and evaluation metrics
- Contribute guidelines and best practices for prompt engineering & LLM evaluation
Requirements:
- Bachelor’s degree in English, Creative Writing, Linguistics, CS, or a related field
- 3+ years in content writing/copywriting or related field
- Strong grasp of LLM capabilities/limitations; prompt engineering/eval experience a plus
- Excellent writing/editing; adaptable tone and style
- Analytical problem-solver; independent and team-oriented; growth mindset
- Familiarity with data annotation/labeling tools; stats/data analysis a plus
- Tech-savvy; willingness to grow technical skills (e.g., read JSON)
- Preferred: video generation exposure; NLP knowledge; human eval methods; research experience