Hi all, this is a work of amateur philosophy that’s very special to me, and after some basic vetting with folks I respect, I think it’s time to share it with the world. Warning: it’s fairly dense, challenging, and not fully self-contained. But at least it’s short!
To set yourself up to understand it, I highly recommend first checking out Nicky Case's interactive The Evolution of Trust and Wikipedia's article on the Knobe effect.
I have made some edits to this article as my conceptual framework has improved. A list of these is maintained at the bottom.
Summary
Getting along is cognitively demanding for humans. Because we can execute intentional strategies and obfuscate our intentions, humans are capable of sophisticated parasitic "cheating" strategies that threaten to destabilize the high-payoff prosocial equilibrium of trust and cooperation. In response, we have evolved the cognitive tools and motivations to build social norms. If we think of our social norms as classifiers that attempt to flag antisocial behavior for deterrence, we can call them cheat detectors, and that is the perspective we will adopt here. In normal adaptive use, cheat detectors rely on the imperfect information of people's behavior and speech to detect cheating as accurately as possible. By interpreting social norms as cheat detectors, we find a natural way to explain both the experimentally demonstrated Knobe effect and its non-universality, in a way that intuitively aligns with our collective moral experiences.
Main
Human intelligence enables complex prosocial cooperation strategies with very high potential payoffs. Unfortunately, the possibility of prosocial human behavior is threatened by "cheaters" with parasitic life strategies that enhance their reproductive success at the expense of prosocial strategists. Using a family of toy models from evolutionary game theory, Case (2017) and Axelrod (1984) find that three conditions are necessary for prosocial strategies to beat the cheaters (a toy simulation in this spirit follows the list):
Repeated interactions (the more the better)
Possibility of win-win scenarios (the bigger the better)
A low rate of miscommunication, defined as random or extrinsic cooperation failures not intended by an agent as part of its strategy.
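To make these three conditions concrete, here is a minimal sketch of a noisy iterated prisoner's dilemma in the spirit of Case's interactive (my own simplification, not code from either source; the payoffs, round counts, and noise rates are all invented for illustration):

```python
import random

random.seed(0)

# Payoffs: mutual cooperation is a win-win that beats mutual defection.
PAYOFF = {("C", "C"): (2, 2), ("C", "D"): (-1, 3),
          ("D", "C"): (3, -1), ("D", "D"): (0, 0)}

def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's last observed move."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    """A simple cheater."""
    return "D"

def play(strategy_a, strategy_b, rounds, noise):
    """Repeated interactions; `noise` is the chance a move flips by
    accident -- a miscommunication not intended by either strategy."""
    seen_by_a, seen_by_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(seen_by_a)
        move_b = strategy_b(seen_by_b)
        if random.random() < noise:
            move_a = "D" if move_a == "C" else "C"
        if random.random() < noise:
            move_b = "D" if move_b == "C" else "C"
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pay_a, score_b + pay_b
        seen_by_a.append(move_b)
        seen_by_b.append(move_a)
    return score_a, score_b

# Low noise: two prosocial strategists harvest the win-win every round.
# High noise: accidental defections cascade into cycles of retaliation.
for noise in (0.0, 0.05, 0.5):
    a, b = play(tit_for_tat, tit_for_tat, rounds=200, noise=noise)
    print(f"noise={noise}: tit-for-tat pair scores {a} and {b}")

a, b = play(tit_for_tat, always_defect, rounds=200, noise=0.05)
print(f"tit-for-tat vs always-defect: {a} vs {b}")
```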
While the Case-Axelrod family has many fascinating features suggesting that prosociality can be a dominant strategy, especially when there is tolerance for miscommunication, the model has limited applicability to humans because we are capable of much more sophisticated strategies, especially deceptive ones that allow cheaters to parasitize other humans without their awareness. When deception succeeds, a person does not know they are being cheated, which makes strategies like tit-for-tat impossible, with or without forgiveness. Thus, prosocial strategists must find ways to detect and punish cheating under deceptive conditions. To prevent deceptive exploitation without causing cycles of defection, they must accurately discriminate between dishonest and honest mistakes.
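To see why successful deception defeats retaliation-based strategies, here is a toy extension of the sketch above (again my own invention, not from the cited models): deception is modeled as control over what the victim observes, so the deceiver's defections are sometimes recorded in the victim's memory as cooperation.

```python
import random

random.seed(0)

PAYOFF = {("C", "C"): (2, 2), ("C", "D"): (-1, 3),
          ("D", "C"): (3, -1), ("D", "D"): (0, 0)}

def play_vs_deceiver(rounds, deception_rate):
    """Tit-for-tat against a deceiver who always defects, but whose
    defections are perceived as cooperation with probability
    `deception_rate`. Tit-for-tat can only retaliate against the
    defections it actually notices."""
    observed = []  # what tit-for-tat believes the deceiver has done
    tft_score = deceiver_score = 0
    for _ in range(rounds):
        tft_move = "C" if not observed else observed[-1]
        deceiver_move = "D"  # the deceiver cheats every round
        perceived = "C" if random.random() < deception_rate else "D"
        observed.append(perceived)
        pay_tft, pay_deceiver = PAYOFF[(tft_move, deceiver_move)]
        tft_score += pay_tft
        deceiver_score += pay_deceiver
    return tft_score, deceiver_score

# Without deception, tit-for-tat retaliates immediately; with it, the
# deceiver parasitizes a victim who never knows they are being cheated.
for rate in (0.0, 0.9):
    t, d = play_vs_deceiver(rounds=200, deception_rate=rate)
    print(f"deception_rate={rate}: tit-for-tat {t}, deceiver {d}")
```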
We[1] do this by making the problem simpler, at the cost of making it probabilistic. We create cheat detectors, heuristics that trigger social sanction under "transgressive" conditions that can be observed without arbitrary access to other people's mental states. Cheat detectors are like the classifiers we develop using data and machine learning methods on computers, but they operate in a notional geometric space of possible contextually-dependent human behaviors. The more accurately these classifiers represent the boundary between honest and dishonest mistakes, the more useful they are for the human cooperation problem.[2] We will define transgressions as behaviors that trigger the cheat detector.
At this juncture, it is absolutely essential to observe that transgressions are not the same as cheating. Transgressions are the predictions of the classifier, while cheating is the real-world behavior the classifier attempts to predict. This simple distinction between the predicted and the actual, foundational to all statistical learning theory, is the main reason that it is useful to think of social norms as classifiers, as we will see later when we come to the Knobe effect.
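As an entirely invented illustration of this distinction, here is a sketch in which transgressions are a detector's positive predictions and cheating is the ground truth the detector tries to predict; the false positives are honest mistakes that get flagged anyway:

```python
import random

random.seed(0)

# Invented one-dimensional "behavior" signal: higher looks more
# suspicious. Cheaters tend to score higher, but honest people's
# mistakes overlap with them, so no boundary is perfect.
population = ([("honest", random.gauss(0.3, 0.15)) for _ in range(900)]
              + [("cheater", random.gauss(0.7, 0.15)) for _ in range(100)])

THRESHOLD = 0.55  # the cheat detector: behavior above this is a transgression

tp = fp = fn = tn = 0
for truth, suspicion in population:
    transgression = suspicion > THRESHOLD  # the classifier's prediction
    cheating = truth == "cheater"          # the real-world behavior
    if transgression and cheating:
        tp += 1
    elif transgression and not cheating:
        fp += 1  # an honest mistake flagged as a transgression
    elif cheating:
        fn += 1  # a cheater who evades the detector
    else:
        tn += 1

print(f"transgressions: {tp + fp} (cheaters caught: {tp}, "
      f"honest mistakes punished: {fp})")
print(f"cheaters who slipped through: {fn}")
```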
Cheat detectors have historically been designed, consciously or unconsciously, using an informal scientific process combining theory and data. The theory comes from our own mental models of other people's social strategies, based heavily on extrapolation from strategies we learn from others or develop ourselves. The data comes from our collective, deep historical memory of experience with different kinds of mistakes. This explains in part our enduring interest in deep fictional and non-fictional accounts of our fellow humans' behavior and its complex intentional and non-intentional causes.
The range of cheat detectors comprises not only laws and customs, but also norms held only by part of a community, such as a norm against speech that mentions bigoted slurs without using them. Individuals build their own cheat detectors, discuss detector design, and use informal and formal consensus processes to promote detectors into shared collective use. Detectors in collective use are customs or laws. Religions and moral philosophies are in part attempts to make detector design more public and formal, which both facilitates consensus and inhibits individual antisocial behavior from interfering with design.
This theory predicts that the Knobe effect arises from (among other things, see Post-script below) a semi-conscious suppression of cognitive dissonance by conflating transgressions with real cheating (i.e., dishonest mistakes). While transgressions can be honest mistakes, admitting this possibility for any specific transgression implies that punishment could violate the norm of tolerating honest mistakes. In many cases, we resolve this dissonance by pre-emptively labeling the transgression as a dishonest mistake and broadly distrusting any information from the transgressor, including information suggesting the mistake may have been honest.[3][4]
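One way to state the conflation precisely (my gloss, with invented numbers): a transgression is probabilistic evidence of cheating, not proof of it, and the dissonance-suppressing move rounds that probability up to certainty.

```python
# Invented illustrative numbers: dishonest mistakes are rarer than
# honest ones, but much more likely to trigger the cheat detector.
p_cheater = 0.1             # base rate of dishonest mistakes
p_flag_given_cheater = 0.9  # detector sensitivity
p_flag_given_honest = 0.2   # detector false-alarm rate

# Bayes' rule: how likely is real cheating, given a transgression?
p_flag = (p_flag_given_cheater * p_cheater
          + p_flag_given_honest * (1 - p_cheater))
posterior = p_flag_given_cheater * p_cheater / p_flag

print(f"P(cheating | transgression) = {posterior:.2f}")  # = 0.33
# The suppression described above treats this 0.33 as if it were 1.0,
# licensing punishment and discounting the transgressor's own account.
```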
Cheat detector theory also explains why the Knobe effect is not universally observed in human moral judgments. Our collective moral experience includes many examples of cheat detectors malfunctioning, and punishing these cases not only violates the norm of tolerating honest mistakes but can also hurt people we respect or care about. Those who have not only seen such malfunctions, but recognized them as destructive malfunctions, are motivated to advocate for the review and redesign of cheat detectors to reduce their malfunction rate. They lead efforts to reverse the accidental or purposeful mis-triggering of cheat detectors in situations ranging from the Salem witch trials and the Dreyfus affair to modern controversies over allegedly inappropriate social media posts. Many features of modern liberalism, such as due process rights, have been motivated by a desire to stop the harm done to innocent people by malfunctioning cheat detectors. Liberalism advocates that we acknowledge:
The inherent risk of individual-level malfunction in all cheat detectors,
The potential benefits of reviewing and revoking the status of shared community cheat detectors that may no longer work well, and
The value of public moral philosophy as a guide to detector design and use that, by virtue of its public nature, is less vulnerable to private antisocial manipulation.
This work was significantly influenced by Knobe (2007) and Knobe (2010), and has resonances with Beebe (2012).
References
All of these can be found on Google Scholar except Axelrod.
Knobe (2007) - Knobe, Joshua. "Reason explanation in folk psychology." Midwest Studies in Philosophy 31.1 (2007): 90-106.
Knobe (2010) - Knobe, Joshua. "Person as scientist, person as moralist." Behavioral and Brain Sciences 33.4 (2010): 315-329.
Beebe (2012) - Beebe, James. "Social functions of knowledge attributions." Knowledge Ascriptions (2012): 220-242.
Case (2017) - Case, Nicky. "The Evolution of Trust" (2017), an interactive demonstration inspired by Robert Axelrod's 1984 book The Evolution of Cooperation. Miscommunications in this interactive are called honest mistakes in this essay.
Axelrod (1984) - Axelrod, Robert. The Evolution of Cooperation. Basic Books, 1984.
Oh yeah, and I also figured it out by analyzing my own past actions and their consequences, and by talking to a witch on Twitter lmao
Post-script: the Knobe effect isn't just about suppressing cognitive dissonance
While I believe the Knobe effect does commonly arise from an attempt to suppress cognitive dissonance by erasing valuable distinctions, that isn't the end of its story. I believe the Knobe effect can emerge from multiple mechanisms in our complex moral psychology. For example, if a person committed to a generally prosocial strategy has acted in a way that both triggered a widely respected cheat detector and had negative consequences, there is a substantial chance that the person will benefit from deep reflection on their motives for what they did. I believe that some people provisionally relabel themselves as dishonest and unworthy of trust. These people then perform a skeptical, perhaps even hostile interrogation of their own motives. Depending on the level of hostility, this could be a healthy and productive process or the beginning of an OCD-adjacent mental disorder.
Notes
A point in favor of this theory is that it is consistent with our broad cognitive architecture. We use statistically-based predictive models extensively across all of our cognition. Even something as simple as “this person will do as they said because they have in the past” is essentially a prediction from a learned statistical model.
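As a minimal sketch of that last example (the Beta-Bernoulli formalism is my choice here, not something argued for in the essay):

```python
def predicted_reliability(kept: int, broken: int) -> float:
    """Chance this person keeps their next promise, via Laplace's rule
    of succession: a Beta(1, 1) prior updated on their track record."""
    return (kept + 1) / (kept + broken + 2)

# "This person will do as they said because they have in the past."
print(predicted_reliability(kept=9, broken=1))  # ~0.83
print(predicted_reliability(kept=0, broken=0))  # 0.5 -- no track record yet
```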
I’d like to make this a full journal article. A friend pointed the way: in a standard philosophy paper you might explain the Knobe effect and its non-universality, explain standard approaches to explaining it, show how they fail, introduce your new term, show how it does a better job of explaining the data, fend off any intuitive counterarguments, and explain why this idea has implications for further research or our understanding of ourselves in some other way.
Questions this paper does (or should) answer:
Why do we accept social norms, even though they sometimes lead to innocent people being hurt socially?
False positives are inevitable for any classifier that is to be effective under the field conditions in which social norms are used: intentions are never directly observable, and the possibility of gaining fitness advantages through deceptive antisocial strategies is ever-present. (A potential analogy to generative adversarial networks: the game never ends, or at least far exceeds the vantage point of any one person or society.) A toy sketch of this trade-off follows.
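Here is a toy version of the trade-off (invented numbers; the overlapping distributions stand in for the hidden-intention problem):

```python
import random

random.seed(1)

# Intentions are hidden, so the detector sees only a noisy suspicion
# signal, and honest and dishonest behavior overlap.
honest = [random.gauss(0.3, 0.15) for _ in range(900)]
cheaters = [random.gauss(0.7, 0.15) for _ in range(100)]

# Catching more cheaters necessarily flags more honest people.
for threshold in (0.3, 0.5, 0.7):
    caught = sum(s > threshold for s in cheaters) / len(cheaters)
    wrongly_flagged = sum(s > threshold for s in honest) / len(honest)
    print(f"threshold {threshold}: catches {caught:.0%} of cheaters, "
          f"wrongly flags {wrongly_flagged:.0%} of honest people")
```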
On what basis are social norms selected or changed? How do humans select a particular configuration of social norms from the infinite space of possibilities? How do we decide when we need to add or modify a social norm?
I think the answer could conceptually be something like gradient ascent to maximize the accuracy of cheat detectors, over a surrogate landscape constructed from the "informal scientific process combining theory and data" mentioned in the paper (a toy sketch follows these notes).
Regardless of the exact process, the point here is that using a quantitative, accuracy-related criterion as a compass seems like the only plausible way to productively navigate the infinite space of possible social norm configurations.
In our folk moral thinking, I think we see this design process as based on feelings, but where do feelings come from if not evolution? And how does evolution select which feelings we will have in which contexts? Doesn't it have to come down, again, to something like this: some kind of evolved, quantitative calibration, which implicitly improves fitness by improving the accuracy of our cheat detectors?
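Here is a toy version of that idea (everything invented for illustration; a derivative-free hill climb stands in for gradient ascent, and remembered cases stand in for the surrogate landscape):

```python
import random

random.seed(2)

# Remembered cases: (suspicion signal, was it actually cheating?) --
# the "theory and data" of the informal scientific process.
cases = ([(random.gauss(0.3, 0.15), False) for _ in range(900)]
         + [(random.gauss(0.7, 0.15), True) for _ in range(100)])

def accuracy(threshold):
    """Fraction of remembered cases this detector classifies correctly."""
    return sum((s > threshold) == cheated for s, cheated in cases) / len(cases)

# Climb the surrogate accuracy landscape by nudging the detector's
# single parameter in whichever direction improves it.
threshold, step = 0.5, 0.02
for _ in range(100):
    threshold = max((threshold - step, threshold, threshold + step),
                    key=accuracy)

print(f"tuned threshold: {threshold:.2f}, accuracy: {accuracy(threshold):.3f}")
```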
Blackstone's 10:1 ratio ("better that ten guilty persons escape than that one innocent suffer") can be viewed as building on the cheat detector framework by specifying a particular weighted cost function.
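One hedged way to encode that note (the only number taken from Blackstone is the 10:1 weight; the rest is my framing):

```python
# Blackstone's ratio as an asymmetric cost function: punishing one
# innocent (false positive) costs ten times letting one guilty
# party slip through (false negative).
COST_FALSE_POSITIVE = 10.0
COST_FALSE_NEGATIVE = 1.0

def should_punish(p_cheating: float) -> bool:
    """Punish only when the expected cost of punishing (the risk of
    hitting an innocent) is below the expected cost of tolerating."""
    expected_cost_punish = (1 - p_cheating) * COST_FALSE_POSITIVE
    expected_cost_tolerate = p_cheating * COST_FALSE_NEGATIVE
    return expected_cost_punish < expected_cost_tolerate

# The 10:1 weighting implies punishing only above ~91% confidence
# (break-even at p = 10/11).
for p in (0.5, 0.9, 0.95):
    print(f"P(cheating)={p}: {'punish' if should_punish(p) else 'tolerate'}")
```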
Changes to this paper
August 11, 2024
Title changed from “social norms as classifiers” to “social norms as cheat detectors”
“Transgression detectors” was changed to “cheat detectors” to emphasize what these predictive models are trying to detect, as is standard in applications of predictive models.
I now define transgressions as the predictions of cheat detectors, distinct from the cheating or dishonest mistakes that are being predicted.
Footnotes
[1] At this point in the essay, I will change the pronoun used to refer to humanity from "they" to "we". I do this to invite you, the reader, to take the role of a prosocial strategist. I ask you to consider whether my argument rings true not only to your intellect, but to your moral-emotional reasoning. Or, dare I say, your heart.
[2] Interestingly, cheat detectors in public use can become self-fulfilling prophecies, because prosocial strategists will generally take pains to avoid setting them off, while cheaters may be willing to set them off if it serves their purposes. They're like no-trespassing signs in moral territory, wandered into only by strategic cheaters and naive prosocials. Or, much more rarely, by strategic prosocials doing what has to be done in a challenging situation.
[3] The Knobe effect is noted for its asymmetry: we tend to treat people as intending the consequences of their bad actions, even if they say they didn't intend them, but we don't do the same for good actions. Some readers have asked what my theory says about the good actions. My answer is that there is nothing to explain. People normally distinguish intent and consequence if they can. It's only when we're defending against cheaters that we get into a mess.
[4] It is very important to note that proposed mechanisms like this need to be experimentally validated, following the lead of works like Knobe's. There may be many ways to explain the Knobe effect, with different subtle consequences in different contexts.