#1: Yupp wants you to judge ChatGPT—for cash
A comprehensive analysis of the startup that's turning human judgment into a renewable economic resource
Imagine if every time you compared ChatGPT to Claude or tested a new AI model, you got paid for your opinion. That's exactly what Yupp is building—a platform that transforms casual AI interactions into valuable data that improves the models we all use.
Yupp emerged from stealth in June 2025 with a massive $33M seed round led by a16z crypto, tackling one of AI's most pressing challenges: how do we actually know which AI models are better? The startup has created a crowdsourced evaluation platform that lets users compare over 500 AI models for free while earning rewards for their feedback.
Key Highlights:
Funding: $33M seed round (June 2025) led by a16z crypto
Investors: 45+ investors including Google Chief Scientist Jeff Dean, Twitter co-founder Biz Stone, Pinterest co-founder Evan Sharp, and Coinbase Ventures
Market Opportunity: Addressing the $X billion AI evaluation market gap
Unique Approach: Blockchain-incentivized human feedback for AI model improvement
The Problem: AI's Evaluation Crisis
The Black Box Dilemma
Modern AI development faces a fundamental problem: companies build powerful models behind closed doors, but nobody really knows how good they are in real-world scenarios. Here's why this matters:
Traditional Evaluation Falls Short:
Academic benchmarks don't predict real-world performance
Static tests can't capture the nuanced ways humans actually use AI
Companies keep their training data and processes secret
Users provide feedback into a void with no transparency or rewards
The Hidden Costs:
AI developers struggle with "hallucinations" and unreliable outputs
Users can't trust AI recommendations without transparent evaluation
Billions invested in AI development without clear performance standards
Innovation slows when feedback loops are broken or opaque
Real-World Impact
Consider this scenario: A healthcare AI suggests a treatment, a legal AI drafts a contract, or a financial AI recommends an investment. How do we know these models are actually better than alternatives? Current evaluation methods rely on:
Closed-door testing by the companies that built the models
Academic benchmarks that may not reflect real usage
User feedback that disappears into proprietary systems
This creates a trust gap that Yupp aims to bridge.
So, the product is pretty simple, right? But do we need it? Hell yeah.
It's still early-stage and needs a lot of improvement before it can become the truth machine for LLMs.
The Solution: Crowdsourced AI Evaluation with Crypto Incentives
How Yupp Works
Yupp has built what they call a "trustless AI feedback market"—a platform where human judgment becomes a measurable, valuable resource.
For Users:
Free Access: Compare responses from 500+ AI models including ChatGPT, Claude, Gemini, and emerging models
Side-by-Side Testing: Enter prompts and see multiple AI responses simultaneously
Get Paid: Earn "Yupp Credits" for selecting the best responses (up to $50/month)
Multiple Payout Options: Convert credits to USD/EUR via Stripe/PayPal or instant crypto payments on Base and Solana
For AI Developers:
Quality Data: Access digitally-signed preference data from real users
Transparent Evaluation: Use blockchain-verified feedback for model training
Standardized Benchmarks: Leverage the "Yupp VIBE Score" for performance comparison
Continuous Improvement: Implement RLHF and Direct Preference Optimization with fresh data
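Direct Preference Optimization (DPO, Rafailov et al.) is the kind of technique the preference data above feeds. A minimal sketch of the per-pair DPO loss, assuming you already have summed log-probabilities of each response under the policy being trained and a frozen reference model (the variable names here are illustrative, not Yupp's API):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy being trained (pi_*) and under a frozen
    reference model (ref_*). beta controls deviation from the reference.
    """
    # Implicit reward margin: how much more strongly the policy prefers
    # the chosen response over the rejected one, relative to the reference.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Negative log-sigmoid of the margin: small when the policy already
    # ranks the human-chosen response clearly above the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A pair where the policy agrees with the human vote incurs a lower
# loss than a pair where it prefers the rejected response.
good = dpo_loss(pi_chosen=-12.0, pi_rejected=-20.0, ref_chosen=-14.0, ref_rejected=-15.0)
bad = dpo_loss(pi_chosen=-20.0, pi_rejected=-12.0, ref_chosen=-15.0, ref_rejected=-14.0)
```

Each user vote on Yupp produces exactly this kind of (chosen, rejected) pair, which is why fresh preference data is directly usable for post-training.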
The Technical Aspect
Yupp's core innovation lies in creating "digitally signed packets of preference data"—blockchain-verified records of user choices that serve multiple purposes:
Transparency: All feedback is auditable on-chain
Provenance: Data source and quality can be verified
Incentive Alignment: Users earn rewards, developers get quality data
Market Dynamics: Data "expires" as newer interactions replace it, creating continuous demand
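Yupp has not published its exact packet format, but the idea of a "digitally signed preference packet" can be sketched in a few lines. The example below uses HMAC from Python's standard library purely to stay dependency-free; a production system would presumably use public-key signatures (e.g. an Ethereum or Solana keypair) so that anyone, not just the key holder, can verify provenance. All field names here are hypothetical:

```python
import hashlib
import hmac
import json
import time

def sign_preference(user_key: bytes, prompt: str, winner: str, loser: str) -> dict:
    """Build a minimal, verifiable preference record (hypothetical format)."""
    packet = {
        "prompt_hash": hashlib.sha256(prompt.encode()).hexdigest(),
        "winner": winner,          # model the user preferred
        "loser": loser,            # model the user rejected
        "timestamp": int(time.time()),
    }
    # Canonical serialization (sorted keys) so signing is deterministic.
    payload = json.dumps(packet, sort_keys=True).encode()
    packet["signature"] = hmac.new(user_key, payload, hashlib.sha256).hexdigest()
    return packet

def verify_preference(user_key: bytes, packet: dict) -> bool:
    """Recompute the signature over the unsigned fields and compare."""
    unsigned = {k: v for k, v in packet.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(user_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, packet["signature"])
```

The point of the signature is tamper-evidence: flip the winner after the fact and verification fails, which is what makes the feedback auditable.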
Market Opportunity & Timing
The AI Evaluation Market Gap
The rapid deployment of large language models has created an urgent need for systematic evaluation. Current market dynamics show:
Growing Demand:
Every company building AI products needs evaluation frameworks
Post-training techniques like RLHF require massive human feedback datasets
Model safety and alignment depend on understanding real-world performance
Regulatory requirements increasingly demand transparent AI evaluation
Supply Constraints:
High-quality human feedback is expensive and time-consuming to collect
Most evaluation happens in silos within individual companies
Academic benchmarks lag behind real-world AI applications
Users provide feedback for free with no transparency on usage
Remember? Attention is the new money.
Market Size Indicators:
Global AI market projected to reach $2.4T by 2030
Every major AI company invests millions in evaluation and safety
Human feedback represents 20-30% of post-training costs
Growing enterprise demand for AI evaluation tools
Several trends converge to make Yupp's timing strategic:
AI Model Proliferation: Hundreds of new models launch monthly, creating evaluation complexity
Regulatory Pressure: Governments increasingly require AI transparency and safety measures
Crypto Infrastructure Maturity: Layer-2 solutions make micro-payments feasible
User Awareness: People understand their data has value and want compensation
Founding Team & Leadership
Proven Track Record in AI and Crypto
Co-Founder 1:
Background: Global Consumer Engineering leader at Google Pay and Coinbase
Experience: Scaled consumer products serving millions of users
Expertise: Intersection of payments, consumer behavior, and blockchain technology
Co-Founder 2:
Background: Machine Learning lead at GoogleX (Google's moonshot division)
Experience: Built consumer-scale ML products in Twitter's early days
Expertise: Advanced ML systems, natural language processing, and product development
Investor Perspective: a16z's Strategic Bet
Why a16z Led the Round
From Chris Dixon and Elizabeth Harkavy's investment memo, several key themes emerge:
Market Timing:
"AI needs strong, reliable evaluation based on large-scale human input. Crypto is the trust machine that can help deliver it."
Technical Innovation: The combination of blockchain transparency with AI evaluation creates what a16z calls a "credibly neutral marketplace" where:
No one can hide the scoreboard
No one can manipulate rewards or results
All participants operate under transparent rules
Economic Model:
"Yupp's design turns human judgment into a renewable economic resource."
The flywheel effect creates sustainable growth:
More usage → Fresher evaluations → Better models → More usage
Blue-Chip Investor Validation
The 45+ investor syndicate includes remarkable names:
Tech Luminaries:
Jeff Dean: Google's Chief Scientist and AI pioneer
Biz Stone: Twitter co-founder who understands viral social platforms
Evan Sharp: Pinterest co-founder with visual discovery expertise
Strategic Investors:
Coinbase Ventures: Crypto infrastructure expertise
Multiple AI Research Labs: Technical validation from academia
This investor composition suggests Yupp has achieved rare consensus across traditionally separate communities (AI researchers, crypto investors, consumer internet veterans).
Competitive Landscape
Direct Competitors
Traditional AI Evaluation Platforms:
Confident AI: Focus on enterprise ML model monitoring
Arize AI: ML observability and performance tracking
MLFlow: Open-source ML lifecycle management
Ragas: LLM-specific evaluation frameworks
Limitations of Current Solutions:
Primarily serve enterprise developers, not end users
Rely on static benchmarks rather than dynamic human feedback
No incentive mechanism for quality data contribution
Limited transparency in evaluation methodologies
Yupp's Competitive Advantages
Technical Differentiation:
Blockchain-Verified Data: Tamper-proof evaluation records
User Incentives: Direct rewards for quality feedback
Real-World Testing: Crowdsourced evaluation vs. synthetic benchmarks
Transparent Scoring: Public "VIBE Score" leaderboard
Network Effects:
More users → Better data quality → More accurate evaluations → Attracts more developers
More developers → More models to test → Better user experience → More users
Strategic Moats:
Data Quality: Community-driven evaluation creates unique datasets
Brand Recognition: Public leaderboard becomes industry standard
Regulatory Alignment: Transparent evaluation meets compliance requirements
Technical Deep Dive
Architecture Overview
Core Platform Components:
Model Integration Layer
Supports 500+ AI models via standardized APIs
Real-time response collection and comparison
Scalable infrastructure for concurrent model queries
Feedback Capture System
User interface for side-by-side model comparison
Cryptographically signed preference data packets
Quality scoring and spam prevention mechanisms
Blockchain Infrastructure
Base (an Ethereum L2) and Solana integration for payments
Smart contracts for transparent reward distribution
On-chain audit trails for data provenance
Data Processing Pipeline
Real-time aggregation of user preferences
Statistical analysis for model ranking (VIBE Score)
Data packaging for developer consumption
Scalability Considerations
Current Capabilities:
Support for 500+ models with room for expansion
Global user base with multi-currency reward systems
Integration with major payment providers (Stripe, PayPal, Coinbase)
Technical Challenges:
Latency: Querying multiple AI models simultaneously
Cost Management: API costs for model access at scale
Data Quality: Ensuring feedback authenticity and preventing gaming
Blockchain Throughput: Managing micro-transactions efficiently
Solutions Implemented:
Layer-2 blockchain integration reduces transaction costs
Sophisticated anti-gaming algorithms protect data quality
Caching and optimization reduce model query costs
Distributed infrastructure supports global usage
Business Model Analysis
Revenue Streams
Primary Revenue (Projected):
Data Licensing: AI developers pay for access to preference datasets
Evaluation Services: Custom evaluation projects for enterprise clients
API Access: Developers integrate Yupp's evaluation capabilities
Premium Features: Advanced analytics and reporting tools
User Economics:
User Rewards: Up to $50/month for active contributors
Engagement Incentive: 1,000 credits = $1 USD
Quality Bonuses: Higher rewards for consistently good feedback
Unit Economics Framework
User Acquisition:
Customer Acquisition Cost (CAC): Low due to reward-based viral growth
Lifetime Value (LTV): Data value compounds over time
Payback Period: Immediate value creation through better AI evaluation
Data Economics:
Collection Cost: User rewards + platform infrastructure
Data Value: Premium pricing for verified, blockchain-backed preferences
Margin Potential: High-value data commands significant premiums
Growth Drivers:
Expanding AI model ecosystem increases evaluation demand
Regulatory requirements boost enterprise adoption
Network effects accelerate user and developer acquisition
International expansion multiplies addressable market
Review
I won't get started on the UI and UX, as the product is very new to the market and needs a lot of features to compete as a platform targeting consumers directly. The bigger competition for Yupp is conversational AI platforms like ChatGPT or Claude: if the responses aren't as good for the same prompts, people won't spend time here just to make a few bucks. When I tested it, two models (GPT-4o mini and Gemini 2.5 Flash) returned results. GPT-4o mini was extremely fast but buggy; Gemini was accurate in most cases, depending on how you play with the prompts.
I found this product very cool and am waiting to see some interactive features ship in the coming months.
Data is the new oil, indeed.
Keep cooking!
Thanks,
Sangam