
Mercor’s expert-feedback engine scales to $500M run rate and $10B+ valuation
Mercor has turned specialist human feedback into a routinized supply chain for model refinement, and the numbers show it: the firm reports paying contributors more than $1 million per day, grew from a $1 million run rate to roughly $500 million in under 24 months, and is valued at an estimated $10 billion or more. That acceleration signals that investors view expert scoring and task-specific evaluation as a structural layer for production AI, not a temporary cost center.
The company aggregates tens of thousands of domain specialists, spanning medicine, law, finance, engineering, comedy and more, to produce labeled judgments and rubrics used in reinforcement learning from human feedback (RLHF) pipelines. Large labs such as OpenAI, Google and Anthropic rely on similar human-in-the-loop stages, and Scale AI and other service providers supply adjacent capacity, creating a nascent market with multiple suppliers and high strategic stakes.
In practice, experts are enlisted to grade model outputs against custom rubrics and to author corrective responses; firms then use those labels to optimize reward models and fine-tune policy networks. Compensation can reach several hundred dollars an hour for niche work, creating a gig-style labor market that monetizes professional judgment but also raises questions about the long-term composition of skilled work.
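To make that step concrete, here is a minimal sketch of how pairwise expert preferences can be turned into a reward model using a Bradley-Terry style loss, the standard pairwise RLHF objective. It is illustrative only: the model size, random embeddings and training setup are assumptions, not a description of Mercor's or any lab's actual pipeline.

```python
# Minimal sketch: training a reward model on pairwise expert preferences.
# All sizes, embeddings and names are illustrative assumptions.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a response embedding to a single scalar reward."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # The expert-preferred response should score higher:
    # loss = -log sigmoid(r_chosen - r_rejected).
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-ins for embedded (chosen, rejected) response pairs graded by experts.
chosen = torch.randn(32, 128)
rejected = torch.randn(32, 128)

for step in range(100):
    loss = bradley_terry_loss(model(chosen), model(rejected))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The trained scalar reward then serves as the optimization target when fine-tuning the policy network, which is where the expert judgments feed back into model behavior.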
In healthcare applications, physicians train models on layered scenarios that mix patient-facing prompts and clinician-level jargon, which materially reduces hallucination risk and improves triage precision when integrated into clinical decision support. Yet experts interviewed stress that models remain assistive: they reduce administrative load and surface differential diagnoses, but they do not replicate the tacit clinical judgment gained from in-person evaluation.
The business implications are twofold: first, investor bets on companies like Mercor imply recurring-revenue potential from continuous model maintenance and localization tasks; second, workflows that require specialized rubrics (comedy, legal reasoning, region-specific content moderation) create defensible demand for curated human feedback. That dynamic favors platforms that can both recruit deep specialists and translate qualitative judgments into quantitative reward signals.
Operational risks persist. Subjective tasks scale poorly because inter-rater variance undermines consistent reward shaping, and localization demands increase labeling complexity for humor or culturally specific content. Competitors such as Surge AI, Handshake and Micro1, plus large strategic players, create pricing pressure that could compress margins if commoditization occurs.
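The inter-rater problem is easy to illustrate. The sketch below computes Cohen's kappa for two hypothetical raters scoring the same ten outputs on a 1-5 rubric; the scores are invented, and production pipelines often use multi-rater statistics such as Krippendorff's alpha instead.

```python
# Sketch: measuring inter-rater agreement on a 1-5 rubric with Cohen's kappa.
# The scores below are invented for illustration.
from collections import Counter

def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    n = len(rater_a)
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: chance overlap implied by each rater's marginals.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical experts grading the same ten model outputs.
rater_a = [5, 4, 4, 3, 5, 2, 4, 3, 5, 4]
rater_b = [5, 3, 4, 3, 4, 2, 4, 2, 5, 4]
print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")
# ~0.58: only moderate agreement despite 70% raw overlap.
```

Even raw agreement of 70% yields only moderate chance-corrected agreement here, which is why noisy subjective rubrics translate into noisy reward signals.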