
![Why Humans Are Still Powering AI [Sponsored]](https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_episode/4981699/4981699-1762130531151-b25593d2e6fc3.jpg)
Why Humans Are Still Powering AI [Sponsored]
Machine Learning Street Talk
What You'll Learn
- ✓ Artificial intelligence is fundamentally built on human intelligence, with a 'messy layer' of human data, labeling, and evaluation.
- ✓ Prolific aims to make it easy for AI researchers to access trustworthy, high-quality human participants for data collection, while incentivizing the right behavior and building long-term relationships.
- ✓ The platform uses techniques like participant vetting, feedback loops, network analysis, and dynamic task assignment to ensure high-quality data inputs.
- ✓ The goal is to move away from a 'mechanical Turk' model of commoditized human labor, and instead foster direct connections and mutual understanding between researchers and participants.
- ✓ This human-centric approach to powering AI has implications for the future of work, as the platform aims to provide human expertise on demand.
AI Summary
This episode discusses the importance of human data and expertise in powering AI systems, despite the common perception that AI is fully automated. The guest, the co-founder of Prolific, a human data infrastructure company, explains how they build a platform to connect researchers with high-quality, vetted participants to provide data and insights for AI model development. Key challenges include incentivizing participants, maintaining data quality, and building long-term relationships between researchers and participants.
Key Points
1. Artificial intelligence is fundamentally built on human intelligence, with a 'messy layer' of human data, labeling, and evaluation.
2. Prolific aims to make it easy for AI researchers to access trustworthy, high-quality human participants for data collection, while incentivizing the right behavior and building long-term relationships.
3. The platform uses techniques like participant vetting, feedback loops, network analysis, and dynamic task assignment to ensure high-quality data inputs.
4. The goal is to move away from a 'mechanical Turk' model of commoditized human labor, and instead foster direct connections and mutual understanding between researchers and participants.
5. This human-centric approach to powering AI has implications for the future of work, as the platform aims to provide human expertise on demand.
Topics Discussed
- Human data and expertise in AI
- Participant vetting and incentivization
- Long-term researcher-participant relationships
- Data quality assurance techniques
- Future of work and human-powered AI
Frequently Asked Questions
What is "Why Humans Are Still Powering AI [Sponsored]" about?
This episode discusses the importance of human data and expertise in powering AI systems, despite the common perception that AI is fully automated. The guest, the co-founder of Prolific, a human data infrastructure company, explains how they build a platform to connect researchers with high-quality, vetted participants to provide data and insights for AI model development. Key challenges include incentivizing participants, maintaining data quality, and building long-term relationships between researchers and participants.
What topics are discussed in this episode?
This episode covers the following topics: Human data and expertise in AI, Participant vetting and incentivization, Long-term researcher-participant relationships, Data quality assurance techniques, Future of work and human-powered AI.
What is key insight #1 from this episode?
Artificial intelligence is fundamentally built on human intelligence, with a 'messy layer' of human data, labeling, and evaluation.
What is key insight #2 from this episode?
Prolific aims to make it easy for AI researchers to access trustworthy, high-quality human participants for data collection, while incentivizing the right behavior and building long-term relationships.
What is key insight #3 from this episode?
The platform uses techniques like participant vetting, feedback loops, network analysis, and dynamic task assignment to ensure high-quality data inputs.
What is key insight #4 from this episode?
The goal is to move away from a 'mechanical Turk' model of commoditized human labor, and instead foster direct connections and mutual understanding between researchers and participants.
Who should listen to this episode?
This episode is recommended for anyone interested in human data and expertise in AI, participant vetting and incentivization, long-term researcher-participant relationships, and those who want to stay updated on the latest developments in AI and technology.
Episode Description
Ever wonder where AI models actually get their "intelligence"? We reveal the dirty secret of Silicon Valley: behind every impressive AI system are thousands of real humans providing crucial data, feedback, and expertise.

Guest: Phelim Bradley, CEO and Co-founder of Prolific

Phelim Bradley runs Prolific, a platform that connects AI companies with verified human experts who help train and evaluate their models. Think of it as a sophisticated marketplace matching the right human expertise to the right AI task - whether that's doctors evaluating medical chatbots or coders reviewing AI-generated software.

Prolific: https://prolific.com/?utm_source=mlst
https://uk.linkedin.com/in/phelim-bradley-84300826

The discussion dives into:
- **The human data pipeline**: How AI companies rely on human intelligence to train, refine, and validate their models - something rarely discussed openly
- **Quality over quantity**: Why paying humans well and treating them as partners (not commodities) produces better AI training data
- **The matching challenge**: How Prolific solves the complex problem of finding the right expert for each specific task, similar to matching Uber drivers to riders but with deep expertise requirements
- **Future of work**: What it means when human expertise becomes an on-demand service, and why this might actually create more opportunities rather than fewer
- **Geopolitical implications**: Why the centralization of AI development in US tech companies should concern Europe and the UK
Full Transcript
There's a dirty secret, isn't there, in Silicon Valley and in the tech world, and I don't think people realise the extent of it: there is an absolutely huge importance on human data and human expertise and human understanding. And that is completely glossed over.

Yeah, I mean, fundamentally, artificial intelligence is founded in human intelligence. And in the stack of data, algorithms and compute, I think the human data element is often the least spoken about, maybe the least glamorous. People want to imagine that there's a simple kind of input-output equation, but ultimately there's a messy layer in the stack of human beings who are providing their data, whether to label data or to provide RLHF post-training data, and then ultimately the evaluation and assessment of model performance all has an element of human data in it.

I can't tell the difference between a doctor and someone who pretends to be a doctor. The only way that you can actually tell the difference is if you have this deep, abstract understanding and you know that they're breaking the rules.

It's increasingly clear that frontier models as a platform are going to be centralised and controlled by a relatively small number of players, which at the moment is almost exclusively these US tech companies. So I think there is a bit of a wake-up call. What the future will hold is basically a marketplace of intelligence. In the past we had a marketplace of oil, for example, or electricity. Intelligence is going to be the new traded thing.

I'm the co-founder and CEO of Prolific. Prolific is a human data infrastructure company. We make it easy for people developing frontier AI models and running research to get access to trustworthy, high-quality participants for high-quality online data collection. Prior to the ChatGPT moment, in the primary modes of data collection the people were fairly fungible, right? You were optimizing for cost and scale, maybe offshore lower-cost labor, which I think created this dynamic of human data for AI being a bit of a dirty secret.

Let's talk about your core technology. You've solved an interesting problem that I've tried to solve in the past. I started a company called Merge, and it was a code review platform. It was exactly the same thing: I realized that code review has to be done by humans. People talk about automation in the software engineering lifecycle, and it's mostly bullshit. It's actually orchestration. You need humans involved in every single step, most importantly code review. We had a skill matrix and we could learn people's skills, so when a pull request came in from one of our customers, we would dynamically assign it to an expert in that field, because you don't want a superficial rubber-stamping of the pull request. You need someone who actually understands the code and goes into it. And this, I think, is one of the biggest problems in business in general: we have all of this expertise out there, and we have these problems over here. How do we match them together? How have you done that?
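A minimal sketch of the skill-matrix routing the host describes above, under the assumption that it boils down to scoring each reviewer's tagged skills against an incoming task (all names and weights here are hypothetical, not Merge's or Prolific's actual system):

```python
from dataclasses import dataclass, field

@dataclass
class Reviewer:
    name: str
    # Hypothetical skill matrix: tag -> proficiency in [0, 1]
    skills: dict = field(default_factory=dict)
    open_reviews: int = 0  # current load, used to avoid piling work on one person

def route_task(task_tags: list[str], reviewers: list[Reviewer]) -> Reviewer:
    """Pick the reviewer whose skills best cover the task, lightly penalising load."""
    def score(r: Reviewer) -> float:
        coverage = sum(r.skills.get(tag, 0.0) for tag in task_tags) / len(task_tags)
        return coverage - 0.05 * r.open_reviews
    return max(reviewers, key=score)

reviewers = [
    Reviewer("alice", {"python": 0.9, "genomics": 0.8}),
    Reviewer("bob", {"typescript": 0.9, "frontend": 0.7}, open_reviews=2),
]
print(route_task(["python", "genomics"], reviewers).name)  # -> alice
```

The same shape of problem recurs throughout the conversation: a task context on one side, a pool of humans on the other, and a scoring rule in between.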
Firstly, I appreciate that you understand the challenges and the complexity of dealing with human data and the orchestration and routing of tasks to the right human. I think often people want or expect this data to be as simple as calling an API and getting a kind of CI/CD-style response, where it's fully automated, easy, clean and simple. But the reality is that humans are messy, and dealing with humans, especially at the scale we're dealing with, where we have hundreds of thousands of active participants or raters on the platform, is challenging. Fundamentally, the value that we add is the deep verification and vetting of these participants: building up a deep profile of the nuances of each of our participants and the behavior they show in data collection tasks, so that we are able to route the most appropriate task or the right project to the right humans. Increasingly, it's also understanding how to incentivize these participants and make sure that we're incentivizing the right behavior. It's a win-win-win dynamic across the three relationships on the platform: the data collector, us, and the participants. We're not treating this as a strict supply chain where we're trying to commoditize or aggregate the participants and their data and get it for the lowest cost. Ultimately, we believe that the highest data quality is produced by people who are properly incentivized, who understand the impact of their work and are going beyond just the financial incentives.

Obviously we could use the Uber analogy, but it doesn't quite work because we're talking about very deep expertise here; we're talking about very, very specific things. So you need to find people and verify that they have the expertise. You need to check that they're not gaming the system, and I'm not sure, but you must have some kind of operational analytics where you know what a normal behavior profile would look like. And you need to incentivize them, and certainly when you employ people in the real world, you incentivize them in terms of things like autonomy and cultural fit and lots of human-like factors. How do you do all of that?

So first is the onboarding of the participants: everyone is ID verified, so we make sure they are who they say they are and that they are where they say they are in the world. Secondly, there's a feedback loop from the researcher. This means analyzing and assessing the QA of the data, feeding that back into the model and ultimately using that information to rank participants. So you're not just selecting for an audience; we're able to preferentially provide the participants who are going to give the highest data quality for that task context. And then I would say the third is the network analysis. Looking at our participants as a network, understanding the interconnections between them, and finding pockets of misbehaving participants, or people who are maybe trying to game the system, and then using that information to filter the pool or derank those participants so that they provide less data over time or are ultimately removed from the platform if required. And then the other thing, I think, is the incentive piece, which is a very, very interesting bit of game theory. From behavioral research, you mentioned Danny Kahneman and co, we know that when the opportunity arises, people tend to cheat a bit, especially when you have single-shot relationships. So there's the classic game theory experiment, share or steal. I don't know if you're familiar with that. It's a bit like the prisoner's dilemma.
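A minimal sketch of the feedback-loop ranking described above, with hypothetical names and weights (not Prolific's actual scoring): researcher QA ratings nudge each participant's score, and low scorers are deranked or filtered out of the pool.

```python
from collections import defaultdict

# Hypothetical quality scores per participant, updated from researcher feedback.
scores = defaultdict(lambda: 0.5)   # start everyone at a neutral prior
ALPHA = 0.2                         # how strongly new feedback moves the score

def record_feedback(participant_id: str, quality: float) -> None:
    """quality in [0, 1], taken from the researcher's QA of a submission."""
    scores[participant_id] += ALPHA * (quality - scores[participant_id])

def eligible_pool(candidates: list[str], min_score: float = 0.3) -> list[str]:
    """Derank low scorers: filter them out, then order the rest best-first for routing."""
    kept = [p for p in candidates if scores[p] >= min_score]
    return sorted(kept, key=lambda p: scores[p], reverse=True)

record_feedback("p1", 0.9)
record_feedback("p2", 0.1)
print(eligible_pool(["p1", "p2"]))  # p1 is ranked ahead of p2
```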
Yeah, prisoner's dilemma, that's what I think about. So there's the game theory idea of the prisoner's dilemma, where ultimately, if you see it as a single-shot relationship, a single point in time, the incentive is for both participants to steal. The thing that changes that into a dynamic where both participants are incentivized to share, or in our context not to cheat or game the system, is if you treat it as a relationship: you have multiple touch points over a long period of time, you have high communication between the two sides of the platform, and ultimately you drive towards this win-win-win dynamic rather than a single-shot experiment or data collection.

You might be, you know, measuring how long they take to do tasks. So how do you bring the human component into it?

That's a great question. So you're probably familiar with the analogy of the Mechanical Turk. Oh, yes. Tell the audience about that. There was a chap who had, I think, a robot playing chess; that was the classic example. It was perceived by the audience as a fully autonomous process, and then behind the scenes it was ultimately a human controlling the robot. I think the analogy of the Mechanical Turk is super interesting because people want this process to be simple, and for you to be able to fully abstract the humans behind an API and call human intelligence on demand via an API. That is the value that we want to provide ultimately, and we try to abstract away as much messiness and as much complexity as we can. And I think how we bring the human element back in is by trying to get out of the way as a middleman. Our philosophy is to try to build a direct connection between the people collecting the data and the participants providing the data.
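A minimal illustration of the repeated-game point above (the payoffs are hypothetical, chosen only to show the mechanism): in a one-shot share/steal game, stealing dominates, but once interactions repeat and a cheater loses future work, cooperating becomes the better long-run strategy.

```python
# Hypothetical one-shot payoffs (first element of the key = my move, second = theirs).
PAYOFF = {
    ("share", "share"): 3, ("share", "steal"): 0,
    ("steal", "share"): 5, ("steal", "steal"): 1,
}

def one_shot_best(their_move: str) -> str:
    # Whatever the other side does, stealing pays at least as much in a single round.
    return max(("share", "steal"), key=lambda m: PAYOFF[(m, their_move)])

def repeated_value(my_move: str, rounds: int = 10) -> int:
    # Crude repeated-game model: a known cheater is dropped after one round,
    # a cooperator keeps getting invited back (mirroring deranking on a platform).
    if my_move == "steal":
        return PAYOFF[("steal", "share")]          # one big payoff, then no more work
    return PAYOFF[("share", "share")] * rounds     # smaller payoff, many rounds

print(one_shot_best("share"))                            # -> steal
print(repeated_value("steal"), repeated_value("share"))  # -> 5 30
```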
So they're able to communicate, provide feedback, and ultimately understand the impact that their work is having on the data collections, whether that's research or model development, etc. Getting this mutual feedback across the platform, this peer-to-peer messaging, we think is crucial to the trust, and to making sure that there's a sufficient amount of empathy for the people who are providing this super valuable data.

Now, I've got a lot of experience of this kind of work, and it's good and bad in a way. I don't like it because, as we were just saying, there's this huge epistemic history. Even if someone is a creative professional and they've been doing it for years, I still find that quite often they're just uncalibrated, and the onus is on me to specify what I want to an insane level of detail, and the burden of doing that is often greater than the cost: I might as well just do it myself. So how do you guys overcome the specification problem? And the tasks that you do on Prolific, do they tend to be quite close-ended, by which I mean you have specific outputs, or are they sometimes quite ambiguous and open-ended in terms of the output?

Yeah, that's a great question. I think it ties back to the obsession with data quality, and there are two aspects to data quality. There's obviously profiling the quality of the audience, so what expertise, specialism and training the participants have, and then also the quality of the task design, or even the specification of the people you're looking for. We'll often get requests for PhDs in biology, and okay, there's quite a lot of nuance within biology: are you looking for genetics expertise, bioinformatics, healthcare, and so on? So we support the researchers in specifying the audience with sufficient detail. We have a mix of tasks, so the platform is fairly use-case agnostic. Some of it is fairly self-contained. The game theory dynamic of multiple touch points with the same participants tends to lead to a relationship between the researcher and the participant, which leads to better data quality. We have many, many projects that are long-running, where the first part of the project is training or providing that context to the audience so that they're able to learn over time what good data quality means for this project. And that can be very, very long-running: sometimes weeks or months of longitudinal or multi-step data collection.

Very cool. So essentially you've built this platform which gives you human expertise on demand. What does this mean for the future of work?

I think, yeah. So we've optimized the platform for breadth of audience choice.
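A minimal sketch of the audience-specification point above (the "PhD in biology" example): none of these field names are Prolific's actual API, they are just an assumed illustration of how much nuance a well-specified audience request carries compared with a one-line brief.

```python
# Hypothetical audience specification, illustrating why "PhD in biology" is under-specified.
audience_spec = {
    "credential": "PhD",
    "field": "biology",
    "subfield": ["genetics", "bioinformatics"],   # narrow the nuance explicitly
    "currently_practising": True,                  # active in the field, not ex-practitioners
    "country": ["UK", "US"],
    "min_quality_score": 0.8,                      # e.g. from a feedback-loop ranking like the one sketched earlier
}
print(audience_spec)
```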
Although you can earn a great side hustle on Prolific, we don't necessarily want to optimize for very, very professionalized raters or research participants. We want to tap into real-world users. So let's say, for example, you're looking at healthcare workers to evaluate your medical chatbot: we want to tap into folks who are actively working in the field, and not people who've left the field and are now professional annotators. We're trying to reflect the real world and real-world users as much as possible. So we absolutely see our platform as an augmentation to work rather than a replacement, though increasingly this work of human data for AI is being professionalized.

What I like about it is that it increases market efficiency. The market is all about this: we have these economic tasks over here that are valuable, and we have these folks over here with skills.

Yeah, absolutely. That is something we're explicitly thinking about: how do we train folks with the skills that are useful for these frontier model providers? For example, how do we take a general participant and turn them into a high-taste evaluator? It's not necessarily just domain expertise or domain knowledge which is valuable, but also a general audience who are skilled in overcoming the typical biases of preference evaluation.

You know, like Uber, for example, there is a critical mass, and maybe dating websites are another example: you have to bootstrap it. Uber wouldn't work if there was just no density of cars in my area. So is it the case that you had to bootstrap it and it got easier? Or was it easier at the beginning? Because even when you have loads of participants, you might have problems where there's just not enough work to go around, because it's very stratified. For example, if you're doing some demographic research and, I think you said, six percent of people fall into a particular group, you'd want to get six percent of people like that, so you might have a sparsity problem. But by the same token, when you actually have this scale, do you find that it works better?

Yeah, we have the chicken-and-egg problem of all marketplaces, and I think you can think of this as analogous to the Uber cities problem. The analogy of a city in the Uber context, for us, would be a segment of the audience, or people with a particular skill set, where we might have bootstrapped up from an atomic network to a scaled network, say for a US general audience population or a UK audience population. But then for each new expert or segment that's in demand, we need to go through that atomic-network scaling, growing that network up to the point where we're able to provide near-instant data; we optimize for data collection in the space of hours rather than days or weeks. So ultimately we look at that market liquidity as a user segmentation problem, where we might have scaled networks for particular audiences, and then it's about thinking what's the next marginal user who can incrementally add the most value to that network, and we drive the growth of the network in that way.
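A minimal sketch of treating liquidity as a segmentation problem, as described above (the segment names and numbers are hypothetical): compare demand against supply per segment and prioritise recruitment into the segment with the largest gap.

```python
# Hypothetical demand (tasks/week) and supply (active, vetted participants) per segment.
demand = {"us_general": 500, "uk_nurses": 120, "rust_engineers": 60}
supply = {"us_general": 20000, "uk_nurses": 150, "rust_engineers": 30}

def recruitment_priority(demand: dict, supply: dict) -> list[tuple[str, float]]:
    """Rank segments by how under-supplied they are relative to demand."""
    gaps = {seg: demand[seg] / max(supply.get(seg, 0), 1) for seg in demand}
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)

print(recruitment_priority(demand, supply))
# -> rust_engineers first (2.0 tasks per participant), then uk_nurses, then us_general
```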
The matching algorithm itself, could you tell me about that? Or is that... Yeah, secret sauce, for sure.

I'd be so fascinated to know, because I'm thinking maybe there's an analogy to Google search. They have the PageRank algorithm, and that is an example of this kind of social-graph-type metadata: you put a hyperlink to another page if you actually like that page, and then of course you can build up a ranking from that. But there's also intrinsic content-type metadata, and obviously you guys have come up with a way to mix all this together.

Yeah, exactly. I think the other analogy is what's called a two-tower algorithm, as in TikTok or Instagram Reels, where you have the context of the user who's filtering for the content, and then you have all of the choice of content. The analogy here is that you have the task context and you have the human context, and you want to be able to rank the humans so that the optimal people float to the top, in a similar way that when you open up YouTube or Instagram Reels or TikTok, you get the content that's most relevant for you.

Whoever controls these AI platforms controls quite a lot. The internet was a fully decentralized platform, not particularly owned by anyone. It's increasingly clear that AI infrastructure and frontier models as a platform are going to be centralized and controlled by a relatively small number of players, predominantly US players at the moment; maybe China is playing a role. Unfortunately, not a very significant role is being played in the UK and Europe as a whole right now. I think we're lucky that many of the folks who work for these global labs, even if the capital is coming from the US, are international; they have a global perspective, and we find when working with our customers in the frontier labs that they have extremely positive intent. They want these models to be globally useful and to reflect input from a wide variety of opinions and subjectivity. Though there is obviously a risk: if these models do become superintelligent and there is massive labor impact, that cost is going to be felt internationally, and it will be felt by us here in the UK and Europe, while the value of that efficiency flows to the owners of the platform, which at the moment is almost exclusively these US tech companies. So I think there is a bit of a wake-up call for the UK: even though we may be late, we should play a more significant role in the life cycle of these models, whether that's owning more of the training, having more locally produced models, data centers, or the energy abundance needed to power these energy-hungry models. I definitely think there's room for more UK-EU dynamism and accelerationism in this space.

We are still hiring software engineers aggressively, including junior software engineers. Some people have a philosophy that junior engineers won't be hired anymore because a senior engineer plus AI agents will be ten times more efficient. I think that neglects the fact that these models are extremely powerful teachers and coaches, and junior engineers much more rapidly become as competent as senior engineers with this co-pilot training to be better software developers.
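A minimal two-tower-style sketch of the ranking idea described above (the embeddings, weights and dimensions are made up, not Prolific's actual model): encode the task context and each participant's context into the same vector space, then rank participants by similarity to the task.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_task(task_features: np.ndarray, W_task: np.ndarray) -> np.ndarray:
    """'Task tower': project task features into a shared embedding space."""
    v = W_task @ task_features
    return v / np.linalg.norm(v)

def embed_participant(person_features: np.ndarray, W_person: np.ndarray) -> np.ndarray:
    """'Participant tower': project participant features into the same space."""
    v = W_person @ person_features
    return v / np.linalg.norm(v)

# Toy dimensions: 8 task features, 12 participant features, 4-d shared embedding.
W_task, W_person = rng.normal(size=(4, 8)), rng.normal(size=(4, 12))

task_vec = embed_task(rng.normal(size=8), W_task)
participants = {f"p{i}": embed_participant(rng.normal(size=12), W_person) for i in range(5)}

# Rank participants by dot-product similarity to the task embedding (best match first).
ranking = sorted(participants, key=lambda p: float(task_vec @ participants[p]), reverse=True)
print(ranking)
```

In a trained system the two projection matrices would be learned from feedback (which participants produced high-quality data for which tasks), exactly the feedback loop discussed earlier.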
You know, when I started Prolific, one of the motivations for starting it was to learn about web development and building a product, and that was a much slower learning process then than it would be now. I would kill for a near-superintelligent co-pilot in order to build more product faster, and I think there's a very elastic demand for many of these things, like software, where models are going to improve our efficiency to build.

There is something special about local, situated human expertise. And what I see is that this could make the pie bigger, because assuming that these models will be limited in how they understand different domains, we might in the future need an operational loop on top of language models, just so that people can actually verify things. It might be consequential health advice or something like that, and the user doesn't know whether it's correct or not. So we could have a Prolific-type plug-in system where a user could press a button and ask, is this legit?

This is definitely where we see the direction of travel. Also, AI-human interaction, or agent-human interaction, analogous maybe to human-computer interaction, I think that's the next phase of research. You could imagine, for example, a deep-research-style agent going off and doing a long-running task, and one of the steps along that workflow is to get a review from a human expert, routed to the most appropriate person for that task. So again, something that we're looking at at Prolific is how do we build these systems where agents and humans can collaborate, and where we can move beyond this relatively simplistic preference-based evaluation, in order to build systems which better simulate the ultimate objective that we're aiming for. We know, because of Goodhart's law, that when models have a goal they're very, very good at optimizing for it. So choosing the right goals is increasingly important, as is developing the tools to effectively simulate those goals and get as close to the true objective as possible, whatever we mean by that objective. But I think ultimately it's real-world performance for real-world users; real-world context is ultimately the thing we're trying to simulate with all of these evaluations and benchmarks.

We are still building a marketplace of intelligence by creating this link between all of these human situated experts and the system which matches and learns. And maybe there'd be a middle way; maybe we can create some automation against that. But fundamentally, I believe that we're going to need human expertise more than ever. I think the pie has gotten bigger because so many people are getting their appetites whetted. They're generating videos. They're writing code. They're building applications. They're doing all these things they couldn't do before. And then they hit a brick wall, because they realize they don't actually understand it deeply enough, so they need to bring the experts in. All of these experts are in more gainful employment, in my opinion, than they ever were before. And this could create a virtuous cycle. I think that's a positive outlook on the future. What do you think? Yeah, I tend to agree.
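A minimal sketch of the agent workflow with a human review step described above (the function names and the routing call are hypothetical, not an existing Prolific API): the agent pauses at a checkpoint, routes its draft to a matched expert, and incorporates the review before finishing.

```python
from dataclasses import dataclass

@dataclass
class Review:
    approved: bool
    comments: str

def route_to_expert(task_description: str, draft: str) -> Review:
    """Placeholder for routing the draft to the best-matched human expert
    (e.g. via a two-tower ranking like the one sketched earlier) and awaiting their verdict."""
    # In a real system this would create a task, wait for completion, and return the result.
    return Review(approved=False, comments="Cite a primary source for the dosage claim.")

def deep_research_agent(question: str) -> str:
    draft = f"Draft answer to: {question}"          # stand-in for the agent's long-running work
    review = route_to_expert(question, draft)       # human-in-the-loop checkpoint
    if not review.approved:
        draft += f"\n[Revised after expert feedback: {review.comments}]"
    return draft

print(deep_research_agent("Is this medication dosage advice safe?"))
```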
I think this reminds me of the dynamic between synthetic data and human data. People often frame that as an either-or proposition, whereas I'd say we're very bullish on both synthetic data and human data, and ultimately on augmenting human data. Human expertise is expensive, and models are able to make it cheaper and maybe more effective. But when you reduce the cost of something, you tend to increase the demand, which compensates for that effect. It often comes up with these AI models as infrastructure, and I think you'll end up with a similar effect here: even if we're accelerating human data in the context of post-training or model evaluation with LLM-as-a-judge or synthetic data, because of the explosive demand for these models in the first place, even if the proportion of human data decreases over time, the actual scale and importance of that data is likely to increase.

The Financial Times, the New York Times, YouTube, Reddit, etc., get a recurring payment for their data; it's licensed. And I think you could also imagine human expertise being licensed in a similar way, where you get an ongoing passive incentive for providing data to improve the model, maybe analogous to Spotify, where the proceeds of all the subscriptions go out to all of the people providing music. You could imagine a similar approach to incentivizing human data in the future as well, where we get this kind of ongoing incentive to continually improve either a very personalized model, as you mentioned, a kind of digital twin of your expertise, which is maybe the more obvious example, or even the centralized infrastructure models.

Yeah, it's almost a bit like you buy a solar panel and then you can give energy back to the grid. Yeah, something like that. Awesome.
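A minimal sketch of the Spotify-style recurring payout idea above (the revenue share and attribution weights are entirely hypothetical): a slice of subscription revenue is split pro rata across contributors according to how much of the model's value is attributed to their data.

```python
def royalty_payouts(revenue: float, contributor_share: float, attribution: dict) -> dict:
    """Split a pool (revenue * contributor_share) pro rata by attribution weight."""
    pool = revenue * contributor_share
    total = sum(attribution.values())
    return {person: pool * weight / total for person, weight in attribution.items()}

# Hypothetical numbers: $1M monthly revenue, 10% set aside for data contributors.
attribution = {"dr_lee": 5.0, "nurse_patel": 3.0, "coder_kim": 2.0}
print(royalty_payouts(1_000_000, 0.10, attribution))
# -> {'dr_lee': 50000.0, 'nurse_patel': 30000.0, 'coder_kim': 20000.0}
```

The hard part in practice would be the attribution weights themselves, which is left entirely abstract here.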
Related Episodes

The Mathematical Foundations of Intelligence [Professor Yi Ma]
Machine Learning Street Talk
1h 39m

Pedro Domingos: Tensor Logic Unifies AI Paradigms
Machine Learning Street Talk
1h 27m

The Universal Hierarchy of Life - Prof. Chris Kempes [SFI]
Machine Learning Street Talk
40m

Google Researcher Shows Life "Emerges From Code" - Blaise Agüera y Arcas
Machine Learning Street Talk
59m

The Secret Engine of AI - Prolific [Sponsored] (Sara Saab, Enzo Blindow)
Machine Learning Street Talk
1h 19m

AI Agents Can Code 10,000 Lines of Hacking Tools In Seconds - Dr. Ilia Shumailov (ex-GDM)
Machine Learning Street Talk
1h 1m