LLM-Powered Relevance Assessment for Pinterest Search

Pinterest uses fine-tuned large language models (LLMs) to scale relevance labeling and improve search quality evaluation in online A/B experiments.


Why it matters

This work demonstrates how LLMs can be leveraged to scale relevance labeling and improve search quality measurement, which is crucial for personalized search systems.

Key Points

  1. Pinterest measures search relevance against a 5-level guideline and fine-tunes open-source LLMs to predict relevance scores.
  2. LLM labeling significantly reduces labeling costs and enables a stratified sampling design for measuring heterogeneous treatment effects.
  3. Together, stratified sampling and LLM-powered labeling reduced the minimum detectable effect (MDE) by an order of magnitude.
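As a rough illustration of the first point, per-result labels on a 5-level scale can be rolled up into a whole-page relevance score. The top-k average and the normalization below are illustrative assumptions, not Pinterest's published formula:

```python
# Hedged sketch: aggregating per-result 5-level relevance labels
# (e.g., predicted by a fine-tuned LLM) into a whole-page relevance
# metric. The top-k average and [0, 1] normalization are assumptions.

def whole_page_relevance(labels, k=10):
    """Average normalized relevance of the top-k results on a page.

    labels: relevance labels on a 1..5 scale, in ranked order.
    Returns a score in [0, 1].
    """
    top_k = labels[:k]
    # Map the 1..5 guideline levels onto [0, 1].
    return sum((label - 1) / 4 for label in top_k) / len(top_k)

# Example: LLM-predicted labels for one result page, in ranked order.
page_labels = [5, 5, 4, 3, 5, 2, 4, 3, 1, 4]
print(f"{whole_page_relevance(page_labels):.3f}")  # 0.650
```

Comparing this page-level score between control and treatment groups is what turns per-result labels into an A/B experiment metric.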

Details

Pinterest tracks whole-page relevance in online A/B experiments to evaluate new ranking models. Relevance measurement typically relies on human annotations, which are costly and in limited supply. To address this, Pinterest fine-tunes open-source LLMs on relevance prediction tasks using human-annotated labels. The fine-tuned LLMs are then used to evaluate ranking results across experimental groups, significantly reducing labeling costs and improving evaluation efficiency. Additionally, the authors leverage a stratified query sampling design, made practical by the scalable LLM labeling, which reduces the minimum detectable effect (MDE) by an order of magnitude compared to the previous approach.
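The variance-reduction intuition behind stratified sampling can be sketched numerically. Under proportional allocation, the stratified estimator removes the between-stratum component of variance, and since the MDE scales with the standard error, a smaller variance (combined with the larger sample sizes cheap LLM labels allow) means a smaller MDE. The strata, weights, and variances below are illustrative numbers, not Pinterest's actual query segments:

```python
import math

def srs_variance(strata):
    """Per-observation variance under simple random sampling.

    strata: list of (weight, within_stratum_variance, stratum_mean).
    Total variance = within-stratum + between-stratum components.
    """
    overall_mean = sum(w * m for w, _, m in strata)
    within = sum(w * v for w, v, _ in strata)
    between = sum(w * (m - overall_mean) ** 2 for w, _, m in strata)
    return within + between

def stratified_variance(strata, n, allocation):
    """Variance of the stratified mean with n total samples split
    across strata by `allocation` (fractions summing to 1)."""
    return sum((w ** 2) * v / (n * a)
               for (w, v, _), a in zip(strata, allocation))

# Illustrative strata: (population weight, relevance variance, mean relevance)
strata = [(0.6, 0.04, 0.80),   # head queries: low variance
          (0.3, 0.09, 0.65),   # torso queries
          (0.1, 0.16, 0.50)]   # tail queries: high variance

n = 10_000
srs_var = srs_variance(strata) / n
prop_alloc = [w for w, _, _ in strata]  # proportional allocation
strat_var = stratified_variance(strata, n, prop_alloc)

# The MDE is the standard error times a constant set by alpha and power,
# so the ratio of standard errors is the ratio of MDEs.
print(f"SRS std err:        {math.sqrt(srs_var):.5f}")
print(f"Stratified std err: {math.sqrt(strat_var):.5f}")
```

With proportional allocation the stratified variance reduces to the within-stratum term alone, so the gain grows with how much the strata means differ; in practice allocation can also be tuned toward high-variance strata (Neyman allocation) for further reduction.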
