Nudges, as characterized by Thaler and Sunstein (2008), aim to improve personal and societal welfare by steering our decision-making through insights from psychology and behavioral economics, without limiting freedom of choice. Nudges are designed to be easy and cheap to avoid. The nudge program is now global, with many governments adopting behavioral insights units in order to inform new social policies. So what’s the problem? Such an influential program of work has invited much debate. To push that debate in new directions, this three-part series aims to communicate cutting-edge insights on nudge, specifically concerning its evidential support, its theoretical basis, and its ethical acceptability.

The first part laid the foundations of the nudge program in order to set up this second part, which takes the form of a Q&A. Here, three academics involved in the theory and practice of nudging give their take on each of the three issues, posed to them as questions. The final part summarizes these answers and offers some ideas about how to move the nudge program ahead in ways that surmount some of the issues raised.

*   *   *

Part Two

Questioning Practicalities: For wide-scale field experiments, what do we need to take into account to judge when or where nudge policies are most effective?

 

Nick Chater: The value of interventions is best assessed comparatively. The question of which intervention is most effective is the crucial issue facing the policy maker, whether the interventions are nudges or not. So, for example, ideally a field study designed to reduce electricity consumption (either across the board or during peak-load hours) would assess the relative impact of the price of energy, the presence of a smart meter, the availability of specific nudges on the utility bill, and so on. It may not, of course, always be possible to manipulate these in the same study. Almost certainly, a field study will not be able to manipulate prices or install smart meters. But, as a fall-back, by drawing on prior work, it may be possible to provide credible estimates of the effects of various policy interventions and hence, as a second step, to compare their cost-effectiveness.

 

A further question concerns the robustness of the intervention. Will it work on a large scale? Will it work repeatedly and/or over long periods of time? Is the intervention robust to people understanding how it works? And are there compensatory behaviors that might undo the effectiveness of the nudge? So, for example, a phone app informing people of their home energy use might be an engaging novelty during the period of study, but the app might lose its cachet if it became ubiquitous, or it might simply become boring after a few weeks or months of use. Similarly, using small plates might reduce food consumption at a sitting, but a field study would need to check whether people then ate more calories between meals.

 

Till Grüne-Yanoff: If effectiveness means that policy P produces the desired result under actual circumstances, and field experiments test the effect size of P in those circumstances, then field experiments indeed seem to provide all the relevant evidence to choose the most effective policy from a given set (Levitt & List 2009).

 

However, field experiments do not support wider effectiveness claims. First, field experiments performed in one system might not justify a policy intervention in different systems (Cartwright & Hardie 2012). Second, social systems can change rather quickly over time. Thus, field experiments performed at time t might not justify an intervention at time t+1 if, for example, demographic, technological or legislative changes have substantially transformed the system (Grüne-Yanoff, 2015).

 

Besides extrapolation, a third issue concerns the scope of the experimental control itself. Did the experiment control for (unintended) spill-over effects on other welfare-relevant variables? Did it control for a possible deterioration of the effect size over time?

All three issues can be addressed and solved with the relevant mechanistic information: extrapolation is unproblematic if the differences in the systems do not affect the operation of the relevant intervention mechanisms; and spill-overs and temporal deteriorations can be similarly detected (Grüne-Yanoff, 2015; Steele, 2008). However, such information is often hard to obtain. This might caution us against focusing on interventions that require specific mechanistic assumptions, and in favor of those that depend on such assumptions less.

 

Elke Weber: Spearheaded by the British Behavioural Insights Team (BIT), there are now wide-scale field experiments springing up nationally (e.g., New York City, Philadelphia, Rio de Janeiro) and internationally (e.g., USA, Germany, Singapore). Given their popularity, being aware of failed attempts to change behavior using certain nudge-type interventions in specific settings and contexts is just as important as, or perhaps even more important than, focusing on their successes. Cataloguing all efforts is a key first step. For a start, what is needed is a record of attempted intervention type(s) (framing manipulation, default setting, social norm reminder, etc.) along with the specific settings in which they are implemented. The catalogue should also include details about the domain (tax compliance, claiming of eligible benefit/subsidy, etc.) and country/culture, as well as the subpopulation targeted (elderly citizens, civil servants). With a database centrally administered by an organization like the Organization for Economic Co-operation and Development (OECD), a large body of evidence about the optimal allocation of interventions to settings could accumulate over a short period of time and be meta-analyzed in different ways. With this in mind, the designers of wide-scale field experiments/randomized controlled trials (RCTs) can more easily follow some agreed procedures (random assignment to treatments and controls, factorial or nested combinations of different interventions if more than one is applied, documentation of effect sizes, etc.). If at all possible, it would be helpful to also collect process variables (see Question 2) that would allow researchers to examine the reasons why people accept or react against the implicitly guiding hand of a BCT.
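To make the idea of a shared catalogue concrete, here is a minimal sketch in Python of what one record might hold and how records could be pooled. The field names, the `TrialRecord` class, the `mean_effect_by` helper and every entry are hypothetical illustrations, not an existing OECD or BIT schema and not real results. A proper meta-analysis would weight studies by their precision rather than take the simple average used here.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TrialRecord:
    """One catalogued field experiment (all field names are hypothetical)."""
    intervention: str    # e.g. "default setting", "social norm reminder"
    domain: str          # e.g. "tax compliance"
    country: str
    subpopulation: str   # e.g. "civil servants"
    effect_size: float   # standardized effect; zero or negative marks a failure


def mean_effect_by(records, key):
    """Unweighted average effect size, grouped by one attribute of the records."""
    groups = defaultdict(list)
    for record in records:
        groups[getattr(record, key)].append(record.effect_size)
    return {name: mean(values) for name, values in groups.items()}


# Made-up entries for illustration only; failures are recorded alongside successes.
catalogue = [
    TrialRecord("default setting", "pension saving", "UK", "civil servants", 0.40),
    TrialRecord("default setting", "green energy", "Germany", "households", 0.20),
    TrialRecord("social norm reminder", "tax compliance", "UK", "taxpayers", 0.10),
    TrialRecord("social norm reminder", "tax compliance", "Singapore", "taxpayers", -0.10),
]

by_intervention = mean_effect_by(catalogue, "intervention")
by_country = mean_effect_by(catalogue, "country")
```

Because failed attempts sit in the same table as successes, the same records can be regrouped by intervention, domain, country or subpopulation without re-collecting anything.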

 

Questioning Theory: Based on an understanding of the mechanisms that underlie the interaction between human and choice environment, what examples are there in which this is used (in)effectively to map behavioral interventions (e.g. nudges) to the environment?

Nick Chater: One of the most basic psychological constraints on human choice is that the brain encodes magnitudes in relative, rather than absolute, terms (Laming, 1984, 1997). Indeed, when provided with a sequence of items varying on a single psychophysical dimension, people are unable to accurately divide these into more than five categories (e.g., Garner, 1954; Stewart, Brown & Chater, 2005). Roughly, for example, sounds can reliably be judged to be very quiet, quiet, average, loud, and very loud. But how these coarse categories are applied depends on the range of items encountered (Garner, 1962; Laming, 1997). The very same sound might be judged very loud when contrasted with a set of pins dropping, but judged very quiet when presented alongside a variety of pneumatic drills.

 

The relative nature of judgment implies that we can make an object seem more or less appealing by manipulating the other options available; which options were previously available; or which options are chosen by others (Stewart, Chater & Brown, 2006; Stewart, Reimers & Harris, 2015). So, for example, the preferred size of a portion of chocolate cake at a buffet will likely be influenced by the range of cake sizes displayed (and the plate and cutlery sizes); any reminders that this cake is larger or smaller than a person usually eats; or observing the cakes that other people are eating. More powerful still is the implication that people would enjoy cakes no less if all cakes were scaled down in size. The cake’s health impacts will, by contrast, depend on absolute, not relative size. This type of reasoning is one motivation for the across-the-board reduction of salt in processed foods.

 

Till Grüne-Yanoff: The effect sizes of default changes are explained with at least four different possible mechanisms: cognitive effort, loss aversion, recommendation effects and change of meaning (Grüne-Yanoff 2015). We still know very little about when these mechanisms operate – but we can learn more about them by considering intervention failures.

 

First, when looking at the effectiveness of default changes at the population level, the results are often more ambiguous than claimed. Beshears et al. (2009), for example, show that shifting the default contribution from 3% to 6% for a saving plan induces about 25% more people to save 6% of their income – a “sizable impact”, in the authors’ view. However, in this case the population as a whole saves no more under the 6% default (M = 6.4%, σ = 3.5) than under the 3% default (M = 6.9%, σ = 4.0). Why? Because the higher default also induces many others to save less than before. These unintended consequences point to a lack of understanding of the complexity of the mechanisms underlying the intervention.
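The arithmetic behind this point can be sketched with a toy example in Python. The two distributions below are invented for illustration and are not the Beshears et al. (2009) data. They only show how a default can pull many more people to the 6% rate while the population mean fails to rise, because some former high savers also drift down to the default.

```python
def mean_rate(shares):
    """Mean contribution rate, given {rate in %: share of employees in %}."""
    assert sum(shares.values()) == 100, "shares must sum to 100%"
    return sum(rate * share for rate, share in shares.items()) / 100


# Hypothetical distributions, NOT the Beshears et al. (2009) data.
under_3_default = {3: 35, 6: 20, 10: 45}
under_6_default = {3: 25, 6: 45, 10: 30}

# Share of employees saving exactly 6% rises by 25 percentage points ...
extra_at_6 = under_6_default[6] - under_3_default[6]

# ... yet the average contribution does not rise, because the share of
# 10% savers falls from 45% to 30% as some of them settle on the default.
mean_before = mean_rate(under_3_default)   # 6.75
mean_after = mean_rate(under_6_default)    # 6.45
```

Reporting only the share of people at the new default would hide this; the full distribution of contribution rates is needed to see the offsetting shift.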

Second, whether default changes achieve their normative goals also depends on the mechanisms through which they operate. For example, studying employees’ reasons for sticking to the default in a public retirement system, Brown et al. (2012) found that employees who stuck with the default for reasons of avoiding cognitive effort were more likely to regret their choices later on than those who stuck with the default for other reasons. Knowledge of mechanisms thus can help explain both positive and normative intervention failures.

 

Elke Weber: Generally, any robust psychological phenomenon is determined by multiple factors (Weber & Johnson, 2009). So, it helps to first spell out the possible mechanisms that connect (in)effective choice architecture interventions to human psychology. For instance, Dinner et al. (2011) propose three channels by which the setting of a default option is effective: (a) reducing cognitive effort, (b) providing an implicit endorsement of that option, and (c) guiding preference construction along the blueprints of prospect theory (Kahneman & Tversky, 1979; Tversky & Kahneman, 1992) and query theory (Johnson et al., 2007; Weber et al., 2007). Organ donation (Johnson & Goldstein, 2003), the poster child for automatic defaults, undoubtedly works by engaging all three channels.

 

By the same token, the differences in the effect size of a single BCT (e.g., automatic default) across applications (e.g., pension scheme, organ donation, phone tariffs) can also be explained by identifying the various mediating channels operating in each application or context (see Jachimowicz et al., 2016, for a meta-analysis of the effectiveness of setting a choice default). For instance, cognitive effort reduction may not be as important a motivator in emotionally less aversive decisions, and implicit endorsement may actually backfire, which can result in default rejection (Jachimowicz et al., 2016). If decision makers doubt the implicit endorsement of an option because the motives of a choice architect (e.g., restaurant owner, public health official) don’t align with their own (Shu, Bang, & Weber, 2016), then they may not be convinced by the use of small plates to reduce food consumption in a commercial buffet restaurant, but may be more convinced when the same plates appear in a school cafeteria.

 

Questioning Ethics: What are the relevant normative criteria that nudges (and their alternatives) should be judged by?

 

Nick Chater: When is a nudge a push (e.g., Oliver, 2015; Sugden, 2008)? That is, under what conditions can nudges be viewed as coercive manipulation of the behavior of others? Most of us feel that a small charge for plastic bags, which has dramatically cut plastic bag use in several countries, is benign. Yet we feel uncomfortable about free magazine subscriptions that rely on us forgetting to cancel; or “free” money when you open a betting account, which may be profitable only because many of us become “hooked” on gambling.

 

There is, I believe, a simple theoretical distinction between benign nudges and malign pushes, although making the distinction in practice is not straightforward. If the preservation of individual autonomy is our starting point, we should be happy to consent to actions of others, or public policies, if we agree to them (Gauthier, 1986); or, more accurately, if we would agree to them given appropriate time, attention and information (our agreement cannot be treated as legitimate if it depends on us being hoodwinked in some way). In particular, a nudge is legitimate if we (or, in a democracy, a suitable majority of us) would agree to it if we knew its mode of operation and consequences (cf. Misyak, Melkonyan, Zeitoun & Chater, 2014).

 

The plastic bag tax (a small cost at the point of purchase in supermarkets), which has successfully been implemented across the UK and elsewhere, is a nudge almost everyone will agree to, and even more so when we realise how effective it is. Magazine subscriptions and “free” opening accounts at bookmakers appear to depend on subterfuge—on us not knowing how likely we are to be caught out by inattention or drawn into addiction.

 

Till Grüne-Yanoff: Nudges have been questioned with respect to various normative criteria, including autonomy, dignity, liberty, non-manipulation, welfare, equality and transparency (for an overview, see Barton & Grüne-Yanoff, 2015). Whether they meet these criteria, I believe, cannot be decided categorically, but rather depends on (i) the specific mechanism through which an intervention operates, and (ii) the context in which this intervention is performed.

 

For an illustration of (i), consider the claim that a particular default is manipulative in the sense that it was set to influence behavior by circumventing the rational deliberative faculties of the decision maker (Hausman & Welch, 2010). The truth of this claim depends on the assumed mechanism: A default operating through cognitive effort avoidance or through loss aversion would arguably be considered manipulative, as neither mechanism engages the agent’s deliberative faculties. A default operating through a recommendation effect, however, would not be considered manipulative: here agents might incorporate defaults as relevant signals into their rational deliberation.

 

For an illustration of (ii), a manipulative default-setting might reduce an agent’s autonomy, but only if the agent enjoyed minimal autonomy in this context to start with. If instead the agent had already lost minimal autonomy with respect to a specific choice – for example through heavy manipulation by narrowly self-interested third parties – then a manipulative intervention would not count as autonomy-reducing. Consequently, what is needed is an ecological rationality perspective: an investigation of the normative acceptability of specific interventions in the light of their mechanisms and contexts (Hertwig & Grüne-Yanoff 2016; Sunstein 2016a).

 

Elke Weber: The primary normative criterion on which choice architecture interventions ought to be judged is whether they improve public welfare at minimal cost to the welfare of individuals. To assess this, one needs to consider whether the cost/benefit ratio favors a BCT when it is contrasted with traditional policy tools (e.g., mandates, economic incentives), or when it is assessed as a complementary method that works alongside them. In addition, it is necessary to consider the extent to which the final choice (mediated via behavioral or conventional policy tools) maps onto decision makers’ preferences. In the case of discrete choices this could be done by creating a type of n × n matrix and using an associated chi-squared test statistic, and in the case of continuous choice options it could involve correlating preferences and choices made (Johnson, 2017).
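One possible reading of these two checks is sketched below in plain Python. It is an illustrative assumption, not necessarily the procedure Johnson (2017) has in mind: the table layout (rows are stated preferences, columns are final choices), the function names and all numbers are hypothetical. The statistic would be compared against a chi-squared distribution with (n − 1)² degrees of freedom.

```python
from math import sqrt


def chi_squared(table):
    """Pearson chi-squared statistic for a square preference-by-choice table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if choice were unrelated to preference.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def pearson_r(xs, ys):
    """Pearson correlation between stated preferences and choices made."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Discrete case (made-up counts): rows = preferred option A/B, columns = chosen A/B.
table = [[40, 10],
         [10, 40]]
stat = chi_squared(table)
dof = (len(table) - 1) ** 2

# Continuous case (made-up values): preferred vs. chosen contribution rates in %.
preferred = [2, 4, 6, 8]
chosen = [3, 4, 7, 8]
r = pearson_r(preferred, chosen)
```

A large statistic, or a correlation near 1, would suggest that final choices track stated preferences. Note that the chi-squared statistic measures association in general, so the diagonal of the table should still be inspected to confirm that people ended up with the option they preferred.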

 

Should public acceptance form a basis on which to judge the normativity of a BCT? It is undoubtedly helpful to have broad-scale approval across the American political spectrum for implementing a default like an optimal level of sustainable “green” sources for electricity provision (Sunstein, 2016b). But we should also realize that ex-ante resistance to a behavioral or conventional policy initiative is often reversed after implementation. This usually happens when people experience the positive consequences of the change and when the new status quo receives the privileged attention previously given to the old status quo (Weber, 2015). In other words, some measure of paternalism by policy makers and public officials need not be a dirty word.

 

Read Part 3

*   *   *

Nick Chater: Nick Chater joined Warwick Business School in 2010, after holding chairs in psychology at Warwick and UCL. He has over 200 publications, and was elected a Fellow of the Cognitive Science Society in 2010 and a Fellow of the British Academy in 2012. Nick is co-founder of the research consultancy Decision Technology, and is on the advisory board of the Behavioural Insights Team (BIT), popularly known as the ‘Nudge Unit’.

 

Yashar Saghai: Yashar Saghai is currently a research scholar and associate faculty member at the Berman Institute, Johns Hopkins University. His research in applied ethics, political philosophy, and philosophy of science focuses on possible food futures. He engages with several fields, such as food and agriculture; futures/foresight studies and history; public health and medical research; and behavioral economics and cognitive psychology.

 

Elke Weber: Elke Weber is the Gerhard R. Andlinger Professor at Princeton University, having previously held the Jerome A. Chazen Professorship of International Business in the Management Division of Columbia Business School and a chair in psychology. She has over 200 publications, is a past president of the Society for Mathematical Psychology and the Society for Judgment and Decision Making, and sits on advisory committees of the U.S. National Academy of Sciences.

 

Till Grüne-Yanoff: Till Grüne-Yanoff is Professor of philosophy at the Royal Institute of Technology (KTH) in Stockholm. He has over 60 publications on the topics of philosophy of science and decision theory, as well as formal models of preference consistency and preference change, and the evaluation of evidence in policy decision making. He is a member of the TINT Finnish Centre of Excellence in the Philosophy of Social Science in Helsinki.

 

Magda Osman: Magda Osman is Associate Professor of Experimental Psychology at Queen Mary University of London. She has over 60 publications and has authored three books. Her research interests include dynamic decision-making, agency and control, and critical evaluations of dual-systems of thought. She is head of the Centre for Mind in Society, and sits on various advisory panels, consulting on banking regulation, food safety, and advertising.