Probabilistic, Reformative Justice

Introduction

Imagine you are an all-powerful judge, presented with two indistinguishable copies of the same person, one evil and one good. You know that one of them has committed multiple murders, whilst the other is completely innocent. However, it is impossible for you to figure out which is which. Letting both walk free feels wrong, and convicting both feels worse. Intuitively, some form of compromise would be best. The hypothetical is contrived, but it illustrates the point I’d like to argue: the outcomes of criminal cases largely follow a binary distinction, one that makes sense if the innocence or guilt of the subject is of crucial importance, but otherwise doesn’t. The core of this argument was developed by Fisher in Conviction without Conviction (2012).[1]

Exposition and Taxonomy

The questions “Why do we enforce justice?” and “How do we enforce justice?” tell you a lot about a society. They split paradigms of justice along two axes: punitive vs. reformative, and binary vs. probabilistic. Here’s a table to illustrate the taxonomy, as well as my ranking for each quadrant:


                | Punitive  | Reformative
Binary          | 3         | 2
Probabilistic   | 4 (worst) | 1 (best)


Let’s start with the “why” axis. A reformative justice system’s job is simply to minimize the amount of crime, and to rehabilitate criminals as quickly as possible. It stands in contrast to a punitive justice system, whose goal is to exact revenge on behalf of society or the plaintiff, through adequate punishment.

The “how” axis differentiates loss functions: how serious the intervention is as a function of p(guilty). A binary judgment distinguishes primarily on a guilty/innocent basis. In its purest form, its loss function is just the Heaviside step function, with the step sitting at the conviction threshold. A probabilistic judgment system has a smoother loss function, one that doesn’t include any steps or threshold values.
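As a rough illustration of the two kinds of loss function, here is a minimal Python sketch. Everything in it is an assumption made for the example: severity is normalized to the range 0 to 1, the conviction threshold of 0.95 is a made-up number, and the linear rule is only the simplest smooth function one could pick, not a proposal for the right one.

```python
def binary_severity(p_guilty: float, threshold: float = 0.95) -> float:
    """Heaviside-style rule: full intervention at or above the threshold, none below.

    The threshold value is hypothetical and only stands in for a
    "beyond reasonable doubt" style cut-off.
    """
    return 1.0 if p_guilty >= threshold else 0.0


def probabilistic_severity(p_guilty: float) -> float:
    """Simplest smooth rule: severity proportional to p(guilty), with no step."""
    return p_guilty


if __name__ == "__main__":
    # Compare both rules at a few probabilities, including p = 0.5,
    # which is the situation of each copy in the opening hypothetical.
    for p in (0.05, 0.5, 0.94, 0.96):
        print(
            f"p(guilty)={p:.2f}  "
            f"binary={binary_severity(p):.2f}  "
            f"probabilistic={probabilistic_severity(p):.2f}"
        )
```

Under the binary rule, moving from 0.94 to 0.96 flips the intervention from nothing to everything, while the smooth rule barely changes. At p(guilty) = 0.5, the binary rule lets both copies from the introduction walk free, and the smooth rule assigns each an intermediate intervention.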

Since no justice system is a perfect instantiation of any of these quadrants, the actual classification should be a 2D plane, a “judicial compass”. For instance, I would place the U.S. justice system in the third quadrant (Punitive/Binary), despite it having certain probabilistic features, most notably on the investigatory side. The required standard of evidence varies with the severity of the measure: reasonable suspicion for stop-and-frisk, probable cause for arrest, etc. The division into discrete table entries is a simplifying assumption for the sake of analysis, and aims to capture the overall ethos of a justice system.

The bottom left corner is a curious combination. Coupled with punitive intent, the probabilistic application of justice is reminiscent of the medieval system of punishment, as described by Foucault in Discipline and Punish. Medieval sentences were public displays of power, and torture was used as an interrogatory tactic: the investigation itself involved punishment, and if the accused managed to suffer through it without confessing, that suffering was deemed sufficient punishment[2].

Ranking the Quadrants

I won’t spend a lot of time arguing for the rankings between the three other quadrants. In a punitive system, knowledge of guilt is crucial: punishing an innocent person is not only useless, but deeply unjust. This is unlike reformative systems, where, for instance, mental health assistance can be positive regardless of guilt. Hence, P/P < P/B. Given a binary mode of judgment, many people argue that reformative methods are more effective at reducing the total harm to society. They are also seen as morally superior: as the self-determination of criminals is questioned, a justice system based on revenge seems barbaric[3]. Hence P/B < R/B.

Finally, given a system whose goal is rehabilitation, the guilty/innocent distinction is less important. As previously mentioned, reformative systems abstract away any moral feelings of disgust towards the individual, and focus on minimizing the occurrence of crimes. From this perspective, the determined guilt of the subject (the binary variable V that is 1 if found guilty and 0 otherwise) should be irrelevant to the sentence delivered. I am not advocating for the removal of the “innocent until proven guilty” standard, or for the removal of any such verdicts: it is still very important for individuals to be cleared of, or held responsible for, wrongdoing. Instead, I am arguing for the sentence S to be based on the probability of guilt p, rather than on the final verdict. In other words, the sentence and the verdict should be conditionally independent given the probability of having committed the crime[4]:

$$S \perp V \mid p$$

This in turn leads to a probabilistic loss function, as the sentence isn’t influenced by a discrete variable.
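To make the independence claim concrete, here is a small Python sketch under stated assumptions. The `Case` type, both sentencing functions and the numbers are hypothetical and exist only for illustration: the verdict is encoded as 1 if found guilty and 0 otherwise, and sentences are normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    p_guilty: float  # probability of guilt given all available evidence
    verdict: int     # 1 if found guilty, 0 otherwise


def verdict_based_sentence(case: Case) -> float:
    """Binary mode: the sentence is determined by the verdict alone."""
    return float(case.verdict)


def probability_based_sentence(case: Case) -> float:
    """Probabilistic mode: the sentence depends on p_guilty only.

    The verdict is still recorded on the case, but it is deliberately
    not read here, so the sentence cannot vary with it once p_guilty is fixed.
    """
    return case.p_guilty


# Two hypothetical cases with identical evidence but different verdicts.
acquitted = Case(p_guilty=0.8, verdict=0)
convicted = Case(p_guilty=0.8, verdict=1)

if __name__ == "__main__":
    for case in (acquitted, convicted):
        print(case, verdict_based_sentence(case), probability_based_sentence(case))
```

With the same probability of guilt, the verdict-based rule treats the two cases in opposite ways, while the probability-based rule gives them the same sentence. That is the sense in which the sentence isn’t influenced by a discrete variable.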


Why the R/P Quadrant is Empty

So why isn't this already a thing? Here are the main explanations I could think of:

  1. On a cultural level, the taxonomy I've laid out illustrates a clear progression sequence: jumping from P/B to R/P is too big a change. Reformative justice systems are already pretty avant-garde: when confronted with a probabilistic mode of judgment, the public imports the punitive template it is familiar with. Additionally, people prefer clear-cut judgments, and judiciary interventions remain highly stigmatized.
  2. Certain preventative measures, such as mental health aid, are already distributed on a probabilistic basis. This amounts to a de facto but not de jure instantiation of a probabilistic mode of judgment. Thus, the non-existence of a system instantiating the R/P paradigm is not conclusive: systems are already drifting toward this quadrant on the 2D plane.
  3. The probabilistic model is a slippery slope: it can easily be weaponized to justify entirely dropping the “innocent until proven guilty” standard of justice, enforce a surveillance state etc. It is possible that such systems are inherently less stable than the alternative quadrants.

Conclusion

The loss function, the “how” of a society's justice system, is a choice that has to be made, and not a direct consequence of its “why”. Investigating different choices of loss function leads to an interesting taxonomy of justice systems, where progress follows an arc-shaped trajectory, rather than a diagonal line. Probabilistic sentencing becomes the right choice within the context of a reformative justice system: the binary outcome of a trial shouldn't determine the sentence.

  1. ^

    Talia Fisher, Conviction Without Conviction, 96 Minnesota Law Review 833 (2012). Fisher proposes replacing the “threshold model” of conviction with a “probabilistic model” in which sentence severity scales with posterior probability of guilt. The core argument of this post overlaps substantially with hers. I came to it independently and discovered her paper while writing this up.

  2. ^

    Michel Foucault, Discipline and Punish: The Birth of the Prison, trans. Alan Sheridan (New York: Vintage Books, 1995 [1977]), 42. On the continuous gradation of guilt: “penal demonstration did not obey a dualistic system: true or false; but a principle of continuous gradation; a degree reached in the demonstration already formed a degree of guilt and consequently involved a degree of punishment.”

  3. ^

    Robert Sapolsky illustrates this argument well in his books on free will.

  4. ^

    S is the sentence, V the verdict, and p the probability of guilt, given all information available to the judge. S can also be influenced by other variables, e.g. the subject's receptiveness to interventions, the probability of recidivism, or the severity of the measure's impact on an innocent person.


