Conflict between AI and humans will result if AI pursues goals that differ from those of humankind. It follows that expressing human preferences to machines accurately, precisely and unambiguously is vital. Survey designers are key to that project.
The UK government blared an unprecedented alarm through the smartphones of its residents last weekend. Just weeks earlier, AI experts and tech leaders (including Elon Musk and Apple co-founder Steve Wozniak) published an open letter calling for a pause on AI experiments. A key concern is the alignment problem: what happens if AI pursues a goal that is misaligned with the goals of humans? It is not implausible that the goal desired by a superintelligent AI would be achieved at the expense of humanity. Misalignment can only be avoided if human preferences are accurately incorporated by AI.
The latest efforts to achieve alignment rely on human feedback – survey respondents are shown AI output and asked to evaluate it. The trouble is that these sorts of tasks can deliver survey data that systematically misrepresent human preferences.
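One common format for this feedback is a pairwise-comparison task: respondents choose which of two AI outputs they prefer, and a simple statistical model turns those choices into scores the AI is trained to pursue. The minimal Python sketch below illustrates the idea with a Bradley-Terry model; the outputs, data and parameters are entirely hypothetical.

```python
# A minimal sketch of preference-based feedback, assuming a pairwise-comparison
# task: respondents see two AI outputs and pick the one they prefer. A simple
# Bradley-Terry model then converts those choices into scores. All names and
# data here are hypothetical.
import math
import random

# Hypothetical feedback: each record is (preferred_output, rejected_output).
comparisons = [
    ("answer_A", "answer_B"),
    ("answer_A", "answer_C"),
    ("answer_B", "answer_C"),
    ("answer_A", "answer_B"),
]

outputs = {o for pair in comparisons for o in pair}
scores = {o: 0.0 for o in outputs}  # latent "reward" score per output

def win_probability(a: float, b: float) -> float:
    """Bradley-Terry probability that an output with score a beats score b."""
    return 1.0 / (1.0 + math.exp(b - a))

# Fit the scores by stochastic gradient ascent on the log-likelihood of the choices.
learning_rate = 0.1
for _ in range(200):
    winner, loser = random.choice(comparisons)
    p = win_probability(scores[winner], scores[loser])
    scores[winner] += learning_rate * (1.0 - p)
    scores[loser] -= learning_rate * (1.0 - p)

print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

The scores the model learns are only as good as the choices respondents make, which is exactly where survey design enters the picture.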
Economists have been using surveys to measure preferences since the 1980s. They used a stated preference survey to put a dollar value on the wildlife lost after the Exxon Valdez oil tanker ran aground and spilled its cargo. A US court took these stated preferences into account when ordering Exxon to pay $5bn in damages. Subsequently, the robustness of stated preferences came under scrutiny. Two general results emerged from that research:
1. Stated preferences are sensitive to superficial tweaks to the survey question (a split-sample check for this kind of framing effect is sketched below).
2. Stated preferences are less sensitive than they ought to be when substantive details in the survey question are altered.
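The first result is typically detected with a split-sample design: the same substantive question is posed under two superficially different wordings, and a statistical test asks whether the stated preferences differ. Here is a minimal sketch with simulated data; the wordings, sample size and approval rates are hypothetical.

```python
# A minimal sketch, with simulated data, of a split-sample framing experiment:
# two groups answer the same substantive question under different wordings, and
# a two-proportion z-test checks whether stated preferences shift. Hypothetical.
import math
import random

random.seed(0)

def simulate_responses(n: int, approval_rate: float) -> int:
    """Count how many of n respondents approve under a given wording."""
    return sum(random.random() < approval_rate for _ in range(n))

n = 500
approvals_wording_a = simulate_responses(n, 0.62)  # e.g. "protect the wildlife"
approvals_wording_b = simulate_responses(n, 0.48)  # e.g. "pay into a clean-up fund"

# Two-proportion z-test: under a pooled null, how surprising is the gap?
p_a, p_b = approvals_wording_a / n, approvals_wording_b / n
p_pool = (approvals_wording_a + approvals_wording_b) / (2 * n)
se = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
z = (p_a - p_b) / se
print(f"approval A={p_a:.2f}, B={p_b:.2f}, z={z:.2f}")
# A large |z| signals that a superficial rewording shifted stated preferences.
```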
Points 1 and 2 call for a rigorous and evidence-based approach to eliciting human feedback in the context of AI research. In short, the fate of the world rests on the quality of survey design.