Firstly, I will define a straightforward model of caring about other people. I think it is a good model for understanding (and predicting) people’s actions. I also think it’s pretty meta-good to have one’s own ethics effectively be of approximately this form. However, I will not argue for either of these claims in this post.
I will make a few observations about the relationship between Pareto improvements and social welfare in this model. I will then clarify the model in a crucial way, leading to a low-complexity way to think about caring across time. I will use this framework to make some observations on value drift, to give a justification of (some) sin taxes, and to state an insight into parenting.
I see this post as mostly not having ground-breaking insights, but instead as thinking through basic things carefully. I think it’s surprising how much one can get out of such a simple model. I will mention a number of open directions for further thought along the way. It was my intention to keep the main text relatively modular and approachable[1], with further discussion in the footnotes.
Let $I$
be the set[2] of all moral patients.[3] For a moral patient
$i \in I$
, let $x_i$
denote the personal
utility of $i$
. One might also call this the welfare
of $i$
, or how much fun it’s having, or the total amount of
joy its experiences spark, or (the integral of) valence or pleasantness,
or personal pleasure.[4][5]
The idea we would like to implement here is that $i$
is
not necessarily (just) trying to maximize $x_i$
– it’s
possible that $i$
cares about (certain) other people, and
not just instrumentally because of e.g. some contract, but
terminally.[6] In particular, we
will assume that the terminal utility of $i$
,
denoted $u_i$
, takes the following linear form[7]:
\[u_i=\sum_{j\in I} w_{ij}x_j.\]
$u_i$
is what agent $i$
strives to
maximize.[8] For simplicity, we
will assume WLOG that for every $i$
, we have
$w_{ii}=1$
.[9] We
will also assume all $w_{ij}\geq 0$
.[10] If one likes, one can think of the
weights $w_{ij}$
as living in a matrix with rows and
columns indexed by $I$
. One can treat this as the adjacency
matrix of a weighted directed graph.[11] The weights $w_{ij}$
capture how much $i$
cares about each element of
$I$
.[12] For
example:
- $i$ being completely altruistic corresponds to having $w_{ij}=1$ for all $j\in I$;
- $i$ being completely selfish corresponds to having $w_{ij}=0$ for all $j\in I\setminus\{i\}$;
- a person might assign weight $\frac{1}{2}$ to her partner, weight $\frac{1}{2}$ to each of her children, and weight $10^{-4}$ to every person not in her family (so it would take roughly $10^4$ random people to weigh as much in her utility as she herself does).

Or one might assign weights according to genetic relatedness, or according to distance in some friendship graph, or according to physical distance, or according to mental similarity to oneself.[13]
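The weighted-sum model is easy to play with in code. Here is a minimal sketch; the agents, weights, and personal utilities are made-up numbers for illustration only:

```python
import numpy as np

def terminal_utilities(W, x):
    """Return u_i = sum_j W[i][j] * x[j] for every agent i."""
    W = np.asarray(W, dtype=float)
    x = np.asarray(x, dtype=float)
    assert np.allclose(np.diag(W), 1.0), "we normalize w_ii = 1"
    assert (W >= 0).all(), "we assume w_ij >= 0"
    return W @ x

# Agent 0 is completely altruistic, agent 1 is completely selfish, and
# agent 2 assigns weight 1/2 to agent 0 and weight 1e-4 to agent 1.
W = [[1.0, 1.0,  1.0],
     [0.0, 1.0,  0.0],
     [0.5, 1e-4, 1.0]]
x = [2.0, -1.0, 3.0]
print(terminal_utilities(W, x))  # u_0 = 4, u_1 = -1, u_2 = 3.9999
```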
To give a first taste of this model in action, I will use it to
explain why supporting society-level charity is very far from implying
that one needs to engage in individual charity[14][15],
contra what seems to sometimes be (implicitly) claimed[16]. Assume an agent $i$
is living in a country of $10^7$
rich people, and that
there are $10^7$
poor people living outside the country.
Conveniently for the ease of our analysis, it happens that
$i$
is a simple agent, assigning to each other agent
$j$
a constant weight $w=w_{ij}$
. Now
$i$
is considering the following three worlds:
- World A: a world in which $i$ personally donates $100 to a poor person.
- World B: a world in which the rich country passes a development aid law that takes $100 from each citizen and sends it to a poor person.
- World C: a world in which no donations happen.
Let’s assume that all $10^7$
poor people are equally
poor, that all $10^7$
rich people are 100 times wealthier,
and that personal utility is the $\log$
of wealth.[17] Since the derivative of
$\log y$
is $\frac{1}{y}$
,[18] in a linear approximation, the poor
person gains $100$
times the personal utility that the rich
person loses from each such donation. The condition on $w$
that is equivalent to $i$
being selfish enough to prefer
World C to World A is that $w< \frac{1}{100}$
. In words:
$i$
needs to be sort of selfish. But the condition
corresponding to $i$
preferring World C to World B is
roughly[19]
$w<10^{-9}$
. In words: $i$
needs to be
really really selfish. We see that there is a vast range, namely
$10^{-9}<w<10^{-2}$
, where $i$
is
selfish enough to prefer not to make a personal donation, but where
$i$
would nevertheless prefer a society-level donation
(which $i$
participates in) over no donation. I would guess
that many a reasonable person inhabits this range.
(All this said, I admit that the implication from public to private charity could be more compelling if one is more of a deontologist[20], and in particular thinks that taxes are also justified solely as duties. And in practice, there are also other things, such as the deadweight loss from taxes or government inefficiency, to consider, which could move the needle toward private charity.)
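The two thresholds in this example can be checked numerically. The sketch below uses log-of-wealth personal utility; the absolute wealth levels are assumptions (only the ratio of 100 is from the text), and because a $100 donation is not strictly marginal at these levels, the thresholds come out near, not exactly at, $10^{-2}$ and $10^{-9}$:

```python
import math

N = 10**7               # number of rich people = number of poor people
R, P = 10_000.0, 100.0  # assumed wealth levels, rich = 100 * poor
d = 100.0               # donation size

gain_poor = math.log(P + d) - math.log(P)  # a poor person's utility gain
loss_rich = math.log(R) - math.log(R - d)  # a rich person's utility loss

# i prefers World C (no donation) to World A (personal donation)
# iff  -loss_rich + w * gain_poor < 0:
w_personal = loss_rich / gain_poor

# i prefers World C to World B (everyone donates) iff
# -loss_rich + w * ((N - 1) * (-loss_rich) + N * gain_poor) < 0:
w_social = loss_rich / (N * gain_poor - (N - 1) * loss_rich)

print(w_personal)  # roughly 1e-2
print(w_social)    # roughly 1e-9
```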
The agents in our setting might sometimes come across an
option
[21] which
every agent would prefer over another option, but which is nevertheless
worse from the perspective of a social planner.
To say the same thing formally, in world $A$
, let the
terminal utilities of agents be $(u_{i,A})_{i\in I}$
, and
in world $B$
, let these be
$(u_{i,B})_{i\in I}$
. Suppose that for every agent
$i\in I$
, we have $u_{i,A}\leq u_{i,B}$
, i.e.,
every agent $i\in I$
weakly (terminally) prefers world
$B$
to world $A$
. We will assume that a social
planner cares about $u=\sum_{i\in I}x_i$
. It is then
nevertheless possible that the social planner prefers world
$A$
to world $B$
, i.e. that
\[u_A=\sum_{i\in I} x_{i,A}>\sum_{i\in I} x_{i,B}=u_B.\]
For an example of this, suppose that two parents $i,j$
are considering buying an expensive toy for their child
$k$
, splitting the cost evenly. The toy would contribute
$+10$
to the utility of $k$
, but bearing half
the cost would change the utility of each of $i,j$
by
$-9$
.[22] Suppose
that the parents do not care about each other, but both care about the
child with $w_{ik}=w_{jk}=1$
, and that the child is
selfish. It is then the case that $i,j,k$
each support
buying the toy, but a social planner would not, because
$+10-9-9=-8$
. This strikes me as a situation that might
occasionally happen in practice.
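The toy example can be verified mechanically with the weight-matrix formulation:

```python
import numpy as np

# Agents 0 and 1 are the parents, agent 2 is the child. The parents do
# not care about each other but care about the child with weight 1; the
# child is selfish.
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

dx = np.array([-9.0, -9.0, 10.0])  # changes in personal utility if the toy is bought

du = W @ dx      # changes in terminal utility: all positive
print(du)        # every agent prefers buying the toy...
print(dx.sum())  # ...but the change in total personal utility is -8
```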
However, various other nice properties are true in this model. I will quickly state these in terms of some undefined (but suggestive) terminology. I will leave making sense of and proving these as an exercise to the interested reader (commenter?).
- If no agent $i$ cares about some $j$ more than about $i$-self, then any Pareto trade (between two agents) with no externalities is good.[23]
- If $w_{ij}=w_{ji}$ for all $i,j$ and every agent has an equal amount of care to give to the other agents who are asked to consent to a contract[25], then if everyone consents to the contract, it is good.[26][27]
An interesting further direction is to investigate the
impact of more care on total wellbeing. That is, suppose the agents
start off inhabiting a default world (perhaps specified by current
property rights and contracts). They come across various possible
contracts, and they agree to a contract iff it is a Pareto improvement
over the current default, after which the default switches to the world
specified in this contract. Should we expect worlds where agents care
more about other agents to end up better off in this process? Here are
some observations:
- Assuming $w_{ij}\leq w_{ii}$ throughout and considering only trades involving two agents, increasing a weight $w_{ij}$ can only increase (in the sense of set inclusion) the set of possible trades which are Pareto improvements.[28] So since all such Pareto improvements are good, if every contract available only involves two agents, then increasing weights can only be good.
- However, it follows from our earlier example (and from the fact that selfish agents only agree to socially good contracts) that caring more can turn some socially bad contracts into Pareto improvements. This makes it difficult to unambiguously conclude that caring more is good from the perspective of a social planner. I think we would need to propose a distribution of possible contracts to proceed further with this line of inquiry. My guess is that given a reasonable distribution, more care will turn out to be better. I see figuring this out as an interesting open research direction.
A lot of what looks like caring in practice is instrumental caring, i.e. something like a set of agents signing a (possibly implicit) contract that commits each one to consenting to future contracts as if they cared about the other signatories. Alternatively, simply bundling many contracts (for instance, bundling a contract with the contract of one agent paying another agent) could look similar, although this might be computationally more difficult in certain domains in practice (in particular, we might need to think seriously about bargaining), and the outcomes could look quite different if there is uncertainty about the contracts one comes across in the future. I would expect gains from instrumental caring to largely substitute for the gains from terminal caring in domains where transaction costs are low (for instance, the important stuff being easily measurable could be helpful), between agents at similar power levels.
To say a little more, what I mean here by two agents having similar
power levels is that it would be possible for each to turn its own
personal utility into the personal utility of the other at a rate close
to $1:1$
.[29][30] The reason this is relevant is that
for a set of agents where every pair has similar power levels[31], a contract which is socially good
is a Kaldor-Hicks
improvement, which can be converted into a Pareto improvement by
adding appropriate compensation requirements into the contract (and
accepting the new contract leads to a state which is as good as if the
initial contract was accepted (with the same total amount of utility
being distributed differently)).[32] So, at least given that transaction
costs are low, we would expect socially good contracts to be amended and
accepted when the agents involved are similarly powerful.
So, in conclusion, in domains where transaction costs are high or where the agents have very different power levels, I would expect the realized total utility to depend more on the degree to which different agents care about each other. One instance where transactions are very problematic (impossible?[33]) is when the agents inhabit different times.
Instead of thinking of a person as a single agent inhabiting a long
time interval, let’s partition a person into a number of agents, one for
each short time interval. Namely, let’s choose a partition of time into
short intervals (e.g. of length $1$
second); let the set of
starting times of these intervals be $T$
. In the ~simplest
case, there is a set of people $J$
, independent of our
decisions (i.e. the same in all possibilities we are considering), with
the weights also being independent of our decisions. In
this case, the set of agents would be $I=J\times T$
, so
\[u_i=\sum_{j\in I} w_{ij}x_j=\sum_{k\in J}\sum_{t\in T}w_{i(k,t)}x_{(k,t)}.\]
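As a toy illustration of the $I=J\times T$ setup, here is a sketch in which each time-slice of a person discounts its own future and past slices geometrically and assigns a small constant weight to everyone else. All names, rates, and utilities here are made-up assumptions:

```python
import itertools

people = ["alice", "bob"]  # the set J (hypothetical)
times = range(5)           # starting times T, e.g. years
I = list(itertools.product(people, times))

x = {i: 1.0 for i in I}    # flat personal utilities, for simplicity

def w(i, j, self_discount=0.8, other_weight=0.1):
    """Made-up weights: a slice (k, t) cares about its own slices with a
    geometric discount in |s - t|, and about everyone else's slices with
    a small constant weight."""
    (k, t), (l, s) = i, j
    if k == l:
        return self_discount ** abs(s - t)
    return other_weight

def u(i):
    return sum(w(i, j) * x[j] for j in I)

# alice-at-time-0: own slices contribute 1 + 0.8 + ... + 0.8^4 ≈ 3.36,
# bob's five slices contribute 5 * 0.1 = 0.5.
print(u(("alice", 0)))
```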
In more generality, we might want to allow the set of agents in
existence to be different for the different possible worlds under
consideration, both because our decisions might lead to different people
(in particular, a different number of people) being born, and also
because our decisions might affect what certain already living people
end up being like, on which we might want to have the weights depend
(which is possible in our formalism iff we treat these different
possible versions of a person as different agents). It might also be
desirable to allow the weights to depend on indexical information –
e.g. $i$
-now might assign a high weight to the agent
$i$
-at-time-t loves[34] – which we will handle by letting
indexical information be part of the agent specification. In other
words, if needed, we will for instance consider
same-agent-except-not-loved-by-future-me and
same-agent-except-loved-by-future-me to be different elements of
$I$
.
This section is in titles and footnotes, because I started to run out of steam. Feel free to skip it – nothing in the later sections depends on it.
We will now examine some cases where an action has externalities on other agents that the agent cares about. Externalities often cause some socially suboptimal contracts to go through (or some socially optimal contracts to fail to go through), especially in cases where transaction costs are high, including instances where some contracts are hard to enforce, or in cases where the agents involved have very different power levels.[35]
All the cases of externalities which we will be looking at will be cases where a single agent can make a decision which affects multiple agents. If this agent is perfectly altruistic w.r.t. the set of agents that are affected, it will make a decision iff it deems it socially optimal. Things become more interesting if the agent is completely or partially selfish. For such agents, the state is often justified in taxing or subsidizing actions which have significant externalities.
Let’s consider the question of externality pricing (i.e. figuring out
how large the tax (or subsidy[36]) on someone should be for doing
something which has externalities on other agents), starting from the
case where everyone is completely selfish. Suppose agent
$i$
is deciding whether to perform an action which has
negative externalities on each agent in a set
$J\subseteq I$
who are practically unable to trade with
$i$
. Taxing $i$
for performing this action at
the sum of what agents in $J$
would maximally want to pay
to avoid the negative externalities on themselves ensures that
$i$
takes this action iff $i$
would take this
action if trading were possible (with zero transaction costs and perfect
information about the willingnesses to pay of each agent). This seems
like a reasonable proxy for the action being socially good.[37] Or at least, this level of
externality taxation is most likely better than none.
[\BEGIN{REMARK FROM FINAL EDIT} I initially thought that handling
externality pricing in terms of dollars is better than handling it in
terms of utilities. The version in terms of utilities would be to price
an externality so that the externality tax leads to a utility loss for
$i$
which is the sum of utility losses for all the other
agents. Dealing with caring becomes straightforwardly just summing up
these losses with weights given by $1-w_{ij}$
to find the
appropriate utility cost and picking the monetary cost to match that (if
it’s not clear what I mean by this, it will be after reading a little
bit ahead in the body of the post).
One reason for doing externality pricing in terms of dollars is that this might be more tractable to figure out in practice. (The utility approach is getting close to “let’s try to figure out what the optimal thing for each person to do is, and pay them iff they do that”.)

Another reason is that if the externality tax collected is actually paid out to the people suffering the externalities, then this is exactly the minimal price at which a trade with compensation is guaranteed to have positive utility. At this price, exactly those actions go through for which externalities can be compensated so that the whole contract has positive utility for everyone.

Even so, it is possible that the people suffering the externalities would prefer a different price (assuming there has to be one price across versions of the same contract involving agents with different personal preferences), for the same reason that a monopoly would not price at the efficient market price. And e.g. in instances in which these people are many orders of magnitude poorer, such a monopoly price would likely lead to higher total utility than the efficient market price, at least if we look at contracts of this type in isolation.
The utility version of externality pricing leads to the optimal level of production-without-a-wealth-transfer, which also seems like a reasonable proxy to aim for.
I guess that a somewhat more accurate and more complicated way to
think of this is the following. There is an action $X$
we
are considering taxing, which different agents get different personal
utility from, but which always has the same negative externalities.
Firstly, assume there is an optimal use of the additional tax dollars
collected from a tax (possibly, reducing other taxes), with marginal
rate $r$
of turning tax dollars into utility, which we
assume the government knows and implements. $r$
would be
good to know, and I guess that an adequate government would try to
constantly have a good estimate of $r$
. Second, assume that
everyone who might do $X$
has the same rate of turning
marginal dollars into utility, let this be $s$
. I would
guess that it is also fine to instead let $s$
be [the
average of this marginal exchange rate across all the people doing
$X$
] – hopefully this is not too dependent on the tax rate
in the cases we will consider. Given the total of utility losses from
externalities per action – call it $t$
(and assume it
is a constant independent of the number of actions), and given
$r$
and $s$
, and given the empirical
distribution for utilities gained from $X$
, for a proposed
externality price $p$
, the effects of the tax are the
following:
1. For each time action $X$ is taken, there is a utility difference of $p(r-s)$ coming from the transfer from the agent taking the action to the government money pool (and its subsequent use).
2. For each time action $X$ is taken, there is some utility gain to the agent taking the action and utility loss $t$ to the agents on whom the externalities fall.

If we are really lucky and $r=s$, then all we have to
, then all we have to
care about is maximizing utility from 2, which is equivalent to a trade
happening iff the utility gain to the agent is greater than the utility
loss from externalities. And this happens exactly if we ensure that
$pr=t$
, so $p=\frac{t}{r}$
.
Unfortunately, it seems likely that $r>s$
, since
otherwise the government could do at least as well as its optimal thing
to spend money on by just giving marginal money to (people like) the
agent, at least if we ignore the fact that some money would leak out of
the loop into administrative costs (although maybe it is often the case
that they should be doing this, with the optimal policy being lowering
other taxes on the agent, which also has the benefit of decreasing
administrative costs, hmm). I’m not sure if I have anything interesting
to additionally say about the general case here. I would like to
understand this better. Maybe one can assume that the distribution of
personal utility gains to agents taking $X$
is something
simple, and then derive some interesting general result?
Everything considered, especially given some of the messiness you will see discussed later, my guess is that the utility approach would have been better, but I won’t fix this now. I might rewrite this entire section if people seem interested.
\END{REMARK FROM FINAL EDIT}]
So far, we have covered the extreme cases where a decision-maker is
completely altruistic or completely selfish, which is unfortunately 0%
of all cases :). Motivated by these simple cases, I propose pricing
externalities by figuring out how much each agent $j\in J$
is maximally willing to pay for the externality,
multiplying this by $1-w_{ij}$
, and summing these over
$J$
. It feels like there should be some decent
justification of this, but I am currently failing to see one. This
should definitely follow if one makes the simplification that utility is
just equal to money (times a constant), which will be reasonable in
certain regimes (in particular, this is similar to assuming that
everyone involved has the same power level).
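The proposed pricing rule is simple to state in code. A minimal sketch, with made-up willingness-to-pay numbers and caring weights:

```python
def externality_price(ys, ws):
    """Price the action at sum_j (1 - w_ij) * y_j, where y_j is the most
    j would pay to avoid the externality on j-self and w_ij is how much
    the acting agent i cares about j."""
    return sum((1.0 - w) * y for y, w in zip(ys, ws))

# Three bystanders would pay 30, 20 and 50 to avoid the externality;
# i cares about them with weights 1, 0.5 and 0 (illustrative numbers).
print(externality_price([30, 20, 50], [1.0, 0.5, 0.0]))  # 0 + 10 + 50 = 60

# Sanity checks at the extremes: full altruism prices the externality at
# zero (already internalized), full selfishness at the plain sum.
print(externality_price([30, 20, 50], [1.0, 1.0, 1.0]))  # 0
print(externality_price([30, 20, 50], [0.0, 0.0, 0.0]))  # 100
```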
Another half-assed justification for this pricing rule is that it is the unique method which satisfies the following three properties:
1. The price takes the form $\sum_{j\in J}f_j(y_j,w_{ij})$, where $y_j$ is the maximal amount $j$ would be willing to pay for the externality on $j$-self. For instance, this is intuitively motivated if we want to ask each agent what they would pay for the “externality unaccounted for by caring”, and sum the answers to find the total price to assign to the externality.
2. $f_j(y_j,0)=y_j$ and $f_j(y_j,1)=0$. This is a strong version of the statement that we are extrapolating between the selfish and altruistic case. For instance, the second statement is saying that if $i$ is completely altruistic toward $j$, then we should not assign any further price to the externality on $j$; intuitively, the externality should already be internalized.
3. The price is linear in $w_{ij}$ (with any constant values for all the other weights).

To be precise: this is the unique solution even if we fix a single
contract and only allow weights to differ. If we instead aim to pick a
pricing method across different contracts (but with the same functions
$f_j$
across contracts), then we should be able to replace
the second assumption with the weaker assumption that we are
extrapolating between the completely selfish and completely altruistic
case.
A poetic way to state our conclusion is that externalities should be priced according to the total effect on people (and parts of people) external to the agent’s web of caring.
In my observation, (social) libertarians who think bringing people into existence is valuable (both positions which I like) often tie themselves into knots over abortion. Here’s what seems to me to be the obvious way to think about abortion. One’s child’s worthwhile life is a positive externality (to the child)[38], and the obvious policy w.r.t. any externality is to internalize it, i.e. in this case to have a subsidy for having children.
There’s a number of important details to be figured out regarding the best payment scheme, e.g. a lump transfer upon birth, or monthly installments, or a lump transfer when the child earns a PhD, or % of the child’s salary,[39] which track a child living a worthwhile life to various extents and could get Goodharted to various extents, but I do not want to discuss this further here. Instead, I would like to assume that a parent has the option to press a button that creates a child and secures the child a life of known personal utility (possibly also with some effect to the parent’s personal utility, of size depending on the circumstances), and that there is no way for the parent and child to sign a contract that the child has to compensate the parent for this personal utility gain in the future. It seems clear that the amount the parent cares about their child matters a lot in getting the externality pricing right here, and I hope we have made some theoretical progress towards answering this question.
A further messy issue is choosing the right way to compare money from different times. One option for this is to discount by average government bond interest rates. Another option is to turn money into utility on each side of the equation first, either at the rate for the given person or at society’s median rate at their time. I will not discuss this further now, but thinking this through and coming up with an estimate for the optimal subsidy size seems potentially practically important.
By time discounting one’s personal utility, I mean assigning future versions of oneself lower weights in our model.[40] I claim that most people time discount utility.[41] This is just another way of saying that people are not fully altruistic towards future versions of themselves. Many activities, e.g. smoking, drinking alcohol, exercising, saving, studying, have obvious externalities on future versions of oneself. Many sin taxes can be argued for as usual externality taxes, with the externalities falling on partially-uncared-for future selves.
I claim that democracy optimizes for utility with significantly less time discounting than individual people. But why would the “sum” of a bunch of short-sighted opinions be any less short-sighted? I recommend pausing here for a bit to see if you can figure out why that would be the case.[42]
Here is my explanation. My first subclaim is that most people would
sacrifice themselves for at most like $10^5$
other people.
It follows from this that in a democracy with at least like
$10^6$
people, considerations about other people’s utility
should dominate the voting decision of each person.[43] (This is assuming that democracy is
just people voting on contracts, and that the costs or benefits of each
contract are fairly evenly distributed across people. The conclusion
will not apply e.g. if the law that people are voting on says that you
and only you should stop smoking.)
My second subclaim is that most people have a discount rate for other
people which is significantly lower than their discount rate for
themselves. One reason this might make sense is that to person
$p$
, the difference between $p$
-now and
$p$
-20-years-from-now feels a lot bigger than the
difference between random-guy-now-$p$
-has-no-relation-to
and random-guy-20-years-from-now-$p$
-has-no-relation-to.
This could be because the random walk $p$
will be on for
the next $20$
years will take him reasonably far from who
he is now (average societal drift plus aging plus driftless random
walk), but the expected difference between the two random guys from
different times is determined by societal drift only. If for instance
one’s weights are based on mental similarity or on distance in some
relatedness graph, then I think one can conclude from this observation
that the sum of weights one assigns to all agents alive at a particular
time decays slower in time than the weight one assigns to oneself. But
it also just seems empirically accurate that people seem to discount
more when deciding for themselves than when deciding for other
people.
Combining the two subclaims, we conclude that people make voting decisions with a significantly smaller effective discount rate than the one used for personal decisions.
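A toy numeric check of how the two subclaims combine; the discount rates, caring weights, and population size below are all made-up assumptions:

```python
def prefers_later(benefit_now, benefit_later, years, discount):
    """Is a benefit `years` years from now worth more than a benefit
    today, under a fixed per-year discount rate?"""
    return benefit_later * (1.0 - discount) ** years > benefit_now

# For themselves (say 10%/year), p prefers 0.6 now over 1.0 in 20 years:
print(prefers_later(0.6, 1.0, 20, 0.10))  # False: 0.9**20 ≈ 0.12 < 0.6
# For other people (say 2%/year), p prefers the later benefit:
print(prefers_later(0.6, 1.0, 20, 0.02))  # True: 0.98**20 ≈ 0.67 > 0.6
# With 10^6 voters each weighted 1e-4 by p, other people contribute ~100
# units of weight against p's 1 unit of self-weight, so p's vote on a
# society-wide policy follows the lower, other-directed discount rate.
```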
As a parent, caring about one’s child with a smaller time discount rate than the child has for themselves[44] provides reason to set up an incentive structure for one’s child which internalizes the externalities on future versions of the child. For instance, if child+[20 years] would benefit from being better at math, then that’s a reason to reward child-now for learning math. One can sort of think of parents as merchants facilitating trades[45] between child+[20 years] and child-now.
Optimal pricing of these externalities is easier to think about if we consider child-20-years-from-now to be a roughly fixed agent independent of our actions, i.e. with fixed preferences. But what if there are versions of child-20-years-from-now that end up being violinists who think it’s really great child-from-their-past learned how to play the violin, but there are also versions of child-20-years-from-now who did not learn to play the violin and do not think of not having done so as a big loss either? One can dissolve this by noting that what we care about is the expected value of our child’s future, i.e. expected value of the personal utility of the self that is realized, and we can sensibly reason about changes now that are likely to change this expected value (that said, we might want to assign different weights to different future versions based on e.g. mental similarity, or if we dislike wireheaders).
A final observation: such rewards should generally decrease as the child’s time horizon broadens, because they start to discount less.
Here are two cases that stand interestingly in contrast to each other.
In the first case, suppose the state pays children for being kind to their parents (with the money coming from the general tax pool, i.e. with no increase in taxes for parents whose children are nice), exactly internalizing the positive externality to the parents. Suppose further that a child still feels that being nice is just barely not worth it. Would a parent be interested in paying the child 1 dollar for being nice if that tips the scale? (Note that this would constitute overcompensating for the externality, leading to a deadweight loss.)
In the second case, suppose a parent does no time discounting for their child, and the state has an incentive structure in place that rewards children for learning which appropriately captures the externality on future versions of themselves (again, with no discounting). Would a parent still be interested in setting up an additional incentive structure that rewards the child for learning?
I claim that even though the situations look superficially similar, the answer to the first question is yes, whereas the answer to the second question is no. I will leave making sense of this as one final exercise to the reader.[46]
In addition to the many directions for further thought mentioned in the text and in the footnotes, there is an obvious way of combining this with Internal Family Systems stuff. I don’t presently see a clear path to any interesting insights that only fall out of the conjunction of these two views, but I find it likely that there would be some.
I would be surprised if there were more than a few individual points here that had not been noted before by someone else, but I don’t know who first made each point, and I decided not to spend a significant amount of time finding out. I will instead thank the intellectual culture (that arose out) of the Enlightenment, and Kirke and Rudolf, for helpful discussions. And I’ll thank DALL-E 2 and Picasso for the illustrations.
[1]
And I think (and hope!) that this mostly worked out, except for some messiness in the section on externalities.
[2]
We will be assuming that $I$
is countable, and in fact
finite in cases where there would be concerns about convergence
otherwise. When discussing future selves, it might be neater to allow
$I$
to be uncountable, and to modify the formalism so that
$u_i$
is a sum of integrals, but we will refrain from this
to keep the presentation simpler.
[3]
By “moral patient”, I just mean a being whose experiences have intrinsic moral value, which potentially includes any being with experiences. I will later assume that moral patients are all also agents, by which I mean something like things that make decisions; if this equivocation is a source of concern for you: I think everything in this post remains true if we treat moral patients that can’t make decisions as “agents” that just never get any chances to make decisions.
[4]
I think the rest of the post makes sense if one remains pretty
agnostic about what “personal utility” means precisely, as long as one
considers the basic idea to be workable, and in particular understands
the distinction with the terminal utility of that person, and I don’t
intend to discuss what $x_i$
means at significant length in
this post. But here is a discussion of insignificant length:
I think of $x_i$
as being the dumbest sensible thing
that captures the idea of being linear in the number of equally
pleasurable experiences (where I’m assuming that pleasurability already
captures the effect of instrumental considerations like getting bored).
If you like, the unit of $x_i$
could be a marginal
neg-dustspeck in the eye of the median person in
annoyance-derived-from-dustspecks. The calibration of various
experiences to a common metric within one agent can be estimated by
offering it, or a computationally more powerful version of it, various
tradeoffs between lotteries involving cases where it knows the only
conscious being whose experiences are affected is itself, or asking it to
condition its answers on solipsism.[47] One unit of utility could maybe be
calibrated between two agents by trying to estimate the tradeoff they
would accept from behind a veil of ignorance; maybe by doing some crazy
thing with Neuralink; maybe by coming up with some model for predicting
the intensity of various experiences in various people, for instance by
tracking people over time and asking them to consider tradeoffs between
current and past versions of themselves; maybe by setting up some
appropriate economic game; maybe by experimenting on twins; maybe by
using just noticeable differences. Adjust for likely biases. Potentially
do something somewhat wackier for wackier moral patients. Or perhaps we
will be successful in constructing a neat theory of which computations
or field configurations correspond to good experiences.
I admit that I still haven’t quite defined this “personal utility”, at least not in the sense of reducing it to more basic concepts. At least for now, I’m fine with it being a theoretical concept that relates in various ways to other stuff. I guess this is also mostly what I think about “up quark”, “force”, “belief”, and so on. If this strikes you as appallingly anti-realist: consider replacing these last few sentences with a semantic externalist thing and proceeding.
By the way, given that one has worked out the details of the above, I don’t think there is any additional coefficient that results need to be multiplied by to account for complexity/level of consciousness/intelligence of each agent. I think the above methodology would already take this into account correctly. The process would output that the value of a typical human experience is (at least) an order of magnitude larger in absolute value than the value of a typical bee experience. That said, figuring out this complexity-dependence might well be a crucial part of the above process.
[^](#fnrefpvgx1cm14ai)
You can think of $x_i$ as a real number (which makes sense if we are implicitly operating with a single history of the world, or more narrowly a single history of experiences of $i$, from the beginning of time till the end of time, in mind), or as a function from the set of possible world-histories (or the set of possible experience-histories) to $\mathbb{R}$. I hope everything to come makes sense with either framing in mind.
[^](#fnrefbyxp6pcpj1r)
I am guessing that this distinction will be obvious to most readers here, but I think there is a reasonably possible confusion in this region of concept-space that leads one from something like [the metaethical position that all there is to ethics is acting according to one’s own preferences] to something like ethical egoism via an equivocation error involving personal utility and terminal utility. (That said, I do not wish to claim that there is no way to make a sound argument from one to the other.)
[^](#fnrefp51figh5wmg)
This is clearly related to Harsanyi’s Utilitarianism Theorem. In fact, I see this theorem as providing strong justification for having a terminal utility function of this form – the philosophical setting here is somewhat different than the setting Harsanyi appears to have had in mind in the paper, but I think the assumptions of the theorem are quite compelling in our setting.
To explain the difference in setting: it appears to me that Harsanyi was thinking of the terminal utilities (or rational preferences) of each agent as being given, and showing that some assumptions then constrain a social welfare function into having a certain form. By the way, I actually think his Postulate c is incorrect (or well, unappealing) in this philosophical context, with there being compelling counterexamples similar to the main example I provide in the subsection on Pareto improvements.
Here is what I currently believe is an explicit counterexample to his Postulate c (but I recommend reading the rest of this section of my post first and then returning here): let the weight graph be the directed version of a big star, with everyone really caring about the guy in the middle, and the guy in the middle only sort of caring about each other agent; offer this set of agents the contract of $+1$ personal utility to the middle guy and $-1$ personal utility to everyone else; I will leave it to the interested reader to figure out weights in each direction so that everyone is indifferent about this contract; however, it seems clear to me that this contract is really bad from the perspective of a social planner.
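To make this concrete, here is a sketch of one way to fill in the exercise. The weights are my own guess at the intended answer, not something the footnote specifies: each outer agent cares about the middle agent with weight $1$, and the middle agent cares about each of the $n$ outer agents with weight $1/n$.

```python
from fractions import Fraction

# Hypothetical weights making everyone exactly indifferent to the contract
# (my guess at the exercise's answer): each outer agent puts weight 1 on
# the middle agent; the middle agent puts weight 1/n on each outer agent.
n = 10**6                        # number of outer agents
w_outer_on_middle = 1
w_middle_on_outer = Fraction(1, n)

# Contract: +1 personal utility to the middle agent, -1 to each outer agent.
du_middle = 1 + w_middle_on_outer * n * (-1)  # own +1, minus n weighted -1's
du_outer = -1 + w_outer_on_middle * 1         # own -1, plus weighted +1
total_personal = 1 + n * (-1)                 # the social planner's tally

assert du_middle == 0 and du_outer == 0       # everyone is indifferent
print(total_personal)                         # 1 - n: catastrophic in total
```

With these weights every agent's terminal utility change is exactly zero, so all consent to (or are indifferent about) the contract, while total personal utility drops by $n-1$.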
[^](#fnrefjpm19ccbslr)
To be precise: randomness over world-histories makes $u_i$ into a random variable, and $i$ is of course maximizing the expectation of the random variable $u_i$. (I won’t specify the decision theory with much precision, because I don’t think anything in this post hinges on it, but if one is causally minded, one might want to look at only the contribution of everything from the future here. Or, this becomes vacuous if one decides in the next section to assign weight 0 to all past agents.)
[^](#fnref7i9j3ic2nn)
Or well, there is a teeny-tiny loss of generality here: we have assumed that if $i$ cares about something at all, then $i$ cares about $i$-self at least a little bit, i.e. that $w_{ii}>0$. Other than that, $w_{ii}=1$ without loss of generality, because maximizing $u_i$ is equivalent to maximizing $v_i=\frac{u_i}{w_{ii}}$. The weights don’t have any “physical meaning”, but ratios of weights do have a “physical meaning”. For instance, $\frac{w_{ij}}{w_{ii}}=\frac{1}{2}$ iff $i$ is indifferent between getting $1$ unit of personal pleasure $i$-self and $j$ getting $2$ units of personal pleasure.
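A minimal sketch of why this normalization is harmless (the options and weights below are made up for illustration): rescaling an agent's whole weight vector by $\frac{1}{w_{ii}}$ multiplies $u_i$ by a positive constant, so it cannot change which option the agent prefers.

```python
# Each option's personal-utility changes (x_i, x_j); numbers are made up.
options = {
    "A": (3, 0),
    "B": (1, 5),
}
w = {"i": 2.0, "j": 1.5}                        # unnormalized, w_ii = 2
w_norm = {k: v / w["i"] for k, v in w.items()}  # rescaled so w_ii = 1

def best(weights):
    # Pick the option maximizing the weighted sum of personal utilities.
    return max(options, key=lambda o: weights["i"] * options[o][0]
                                      + weights["j"] * options[o][1])

assert best(w) == best(w_norm)  # same preferred option under either scaling
print(best(w))
```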
[^](#fnref37tsnbto8zs)
This in no way rules out that there could be instrumental reasons to decrease someone's personal utility. But regarding terminal values, I doubt there is anyone whose negative coefficient on someone else's personal utility would survive some contemplation (well, I don't currently see a plausible path to this), except maybe for people who are too computationally bounded to operate with a distinction between instrumental and terminal values?
It would be very cool if one could draw connections between stuff from \[graph theory\]/\[network analysis\] and ethically/economically interesting properties of this graph. Will an upper bound on the second eigenvalue of the adjacency matrix together with a lower bound on trust in a society guarantee that rich people use public transport? I will mention another particular question of this kind in a later footnote.
It's important to understand that the weights capture per-experience care, not total care. For instance, with `$i$` being a grandfather and `$j$` being his grandchild, it's perfectly possible that simultaneously `$w_{ij}<1$` and it maximizes `$u_i$` if the grandfather sacrifices his life to save his grandchild's.
Out of these options, the ones that I think have the smallest expected distance to personal coherent extrapolated volition (or what would be suggested by [an ideal advisor](http://intelligence.org/files/IdealAdvisorTheories.pdf), or the views held in reflective equilibrium, where the equilibrium might be reached by doing [Bayesian ethics](https://rucore.libraries.rutgers.edu/rutgers-lib/40469/PDF/1/play/); or [some other kind of indirect normativity](https://ordinaryideas.wordpress.com/2012/04/21/indirect-normativity-write-up/)), where the expectation is taken both over my uncertainty and over picking a uniformly random person, are being completely altruistic and assigning weights according to mental similarity.
For the above claim to fully make sense, one needs to specify the personal utilities, since otherwise the model's prescriptions are not fully specified, which makes it unclear how we should be calculating its distance to CEV – by distance, what I had in mind was something like the number of disagreements on some representative set of decision problems, or a sum of all the badnesses of the verdicts (where badness is measured by the difference of the CEV-utility of the best option versus the option chosen by the proposed model), or the `$L^2$` norm of the difference of the CEV-utility and the `$L^2$`-distance-minimizing affine transformation of the model-utility (this assumes a measure on the space of all worlds), or how much worse the world would be (in terms of CEV-utility) if one perfectly followed the advice of model-utility instead of CEV-utility in one's decisions.
I endorse the claim with the personal utilities in this model being what I proposed in a previous footnote. I also endorse it with the personal utilities being "chosen by CEV", meaning the ones that minimize distance from CEV for given weights. I would also probably endorse this claim with most other reasonable things as these personal utilities.
Or well, I only want to say this conditional on the settings of weights considered in a bit being "metaethically tenable", I think. I do not necessarily wish to claim that they are tenable.
That said, I think both are good!
I do not claim that this is commonly claimed by rationalists/EAs, but I think it is often (implicitly) claimed by characters appearing in my media diet (e.g. [here](https://twitter.com/michelletandler/status/1508236751361519616) or [here](https://www.youtube.com/watch?v=65uuGA2xGwg&t=188s)).
These assumptions are actually unnecessary, in the sense that the result of this section is robust to making much weaker assumptions here. The assumptions are mostly here to facilitate the presentation.
I'm using the convention `$\log y:=\log_e y$` here. It's the most common convention in math, and I'd like to spread it. :)
The condition is that `$10^7\cdot w\cdot 100-(10^7-1)\cdot w-1<0$`, or equivalently `$w<\frac{1}{10^9-10^7+1}\approx 10^{-9}$`.
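A quick sanity check of this algebra, using exact rational arithmetic on the numbers straight from the condition above:

```python
from fractions import Fraction

# The footnote's claim: 10^7 * w * 100 - (10^7 - 1) * w - 1 < 0
# holds exactly when w < 1 / (10^9 - 10^7 + 1).
threshold = Fraction(1, 10**9 - 10**7 + 1)

def lhs(w):
    return 10**7 * w * 100 - (10**7 - 1) * w - 1

eps = Fraction(1, 10**12)
assert lhs(threshold - eps) < 0   # just below the threshold: condition holds
assert lhs(threshold + eps) > 0   # just above: it fails
assert lhs(threshold) == 0        # the threshold itself is the boundary
print(float(threshold))           # ≈ 1e-9, matching the footnote
```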
Actually, there is a way to justify a kind of deontological principle as a heuristic for utility maximization, at least for a completely altruistic agent. For concreteness, consider the question of whether to recycle (or whether to be vegan for environmental reasons (the same approach also works for animal welfare, although in this case the negative effect is less diffuse and easier to grasp directly), or whether to engage in some high-`$\text{CO}_2$`-emission-activity, or possibly whether to lie, etc.). It seems like the positive effect from recycling to each other agent is tiny, so maybe it can be safely ignored, so recycling has negative utility?^[\[48\]](#fnx1vfqvx8tjn)^ I think this argument is roughly as bad as saying that the number of people affected is huge, so the positive effect must be infinite. A tiny number times a large number is sometimes a reasonably-sized number – even at the extremes, size can matter.
A better first-order way to think of this is the following. Try to imagine a world in which everyone recycles, and one in which no one does. Recycle iff you'd prefer the former to the latter. This is a lot like the categorical imperative. What justifies this equivalence? Consider the process of going from a world where no one recycles to a world where everyone does, switching people from non-recycler to recycler one by one. We will make a linearity assumption, saying that each step along the way changes total welfare by the same amount. It follows that one person becoming a recycler changes total welfare by a positive amount iff a world in which everyone recycles has higher total welfare than a world in which no one does. So if one is completely altruistic (i.e. maximizes total welfare), then one should become a recycler iff one prefers a world where everyone is a recycler.
I think the main benefit of this is that it makes the tradeoff easier to imagine, at least in some cases. Here are three final remarks on this:
1) If our agent is not completely altruistic, then one can still understand diffuse effects in this way, except one needs to add a multiplier on one side of the equation. E.g. if one assigns a weight of `$1/10$` to everyone else, then one should compare the current world to a world in which everyone recycles, but with the diffuse benefits from recycling being only `$1/10$` of what they actually are.
2) We might deviate from linearity, but we can often understand this deviation. E.g. early vegans probably have a superlinear impact because of promoting veganism.
3) See [this](https://twitter.com/wtgowers/status/1564496390516097024?s=20&t=PXwk-V_mmMrA_-jpNg1l-Q) for discussion of an alternative similar principle.
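The linearity argument above can be sketched numerically (all numbers here are made up for illustration): under the linearity assumption, each new recycler changes total welfare by the same amount, so one person's marginal contribution is positive exactly when the everyone-recycles world beats the no-one-recycles world.

```python
# A toy version of the linearity argument; the numbers are invented.
N = 10**6          # population size
cost = 1.0         # personal cost to each recycler, in utility units
benefit = 3e-6     # diffuse benefit of one recycler to each person

def total_welfare(k):
    """Total welfare when k people recycle, under the linearity assumption."""
    return -k * cost + k * benefit * N

# Each step from k to k+1 recyclers changes total welfare by the same amount...
step = total_welfare(1) - total_welfare(0)
# ...so one person's marginal contribution is positive iff the
# everyone-recycles world has higher total welfare than the no-one world.
assert (step > 0) == (total_welfare(N) > total_welfare(0))
print(round(step, 6))  # +2.0 per new recycler with these made-up numbers
```

A tiny per-person benefit (`3e-6`) times a large population outweighs the unit personal cost here, illustrating the "tiny number times large number" point from the previous footnote.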
We think of decisions here as choosing between two fully specified worlds. One can also allow choices between lotteries more generally, in which case we just think of the options considered here as trivial lotteries.
Let us assume that the price of the toy minus the production costs is small, in the sense that the total contribution from the parents buying the toy to the wellbeing of the employees and shareholders of the toy company is at least an order of magnitude less than the contribution of the utility changes we mentioned earlier. (And assume similarly for any externalities.)
That said, it's possible that a trade has no externality on anyone else's personal utility, but a third person would nevertheless want to subsidize a particular trade, that this would make the trade go through, and that this contract would be bad.
Actually, there is a similar example where caring is always mutual, which one might consider simpler: let the utility differences be respectively `$-5,-5,+9$`, and let the nonzero cross-weights be `$w_{ik}=w_{ki}=0.8$` and `$w_{jk}=w_{kj}=0.8$`.
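The numbers in this mutual-caring example can be checked directly: every agent's terminal utility goes up, so all three would consent, yet total personal utility goes down.

```python
# Verifying the counterexample: personal utility changes and cross-weights
# exactly as given in the footnote (all other cross-weights are 0).
dx = {"i": -5, "j": -5, "k": 9}
w = {("i", "k"): 0.8, ("k", "i"): 0.8,
     ("j", "k"): 0.8, ("k", "j"): 0.8}

def du(a):
    """Terminal utility change of agent a (own weight is 1)."""
    return dx[a] + sum(w.get((a, b), 0) * dx[b] for b in dx if b != a)

assert all(du(a) > 0 for a in dx)   # everyone's terminal utility rises
assert sum(dx.values()) < 0         # but total personal utility falls
print({a: round(du(a), 6) for a in dx}, sum(dx.values()))
```

Concretely, `du("i")` and `du("j")` are each $-5 + 0.8\cdot 9 = 2.2$, `du("k")` is $9 - 0.8\cdot 5 - 0.8\cdot 5 = 1$, while the personal utility changes sum to $-1$.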
Okay, I will say what this means: with `$J$` being the set of agents asked to consent, there is a constant `$c$` independent of `$i\in J$` such that `$c=\sum_{j\in J}w_{ij}$`. A term I would propose for this is that the weight graph's induced subgraph on `$J$` is `$c$`-\[weighted-regular\].
The weight graph being a disjoint union of cliques (e.g. everyone cares about their family) is a subcase, of which everyone being selfish is a subsubcase.
If you are looking for exactly one statement to prove, I strongly recommend this one.
There is a subtlety here. By a Pareto improvement, we mean a trade that any agent whose personal utility is affected would agree to, not a trade that any agent whose terminal utility is affected would agree to. The latter is a stronger condition, and under that latter stricter notion of \[Pareto improvement\]*, it is possible that increasing a weight would make an initial \[Pareto improvement\]* no longer be one.
In many situations, this correlates quite well with the agents being equally wealthy. The idea is that a rich person could transfer a tiny fraction of their wealth, hence only incurring a slight personal utility cost, to a poor person, while increasing the personal utility of the poor person enormously, whereas any transfer of wealth in the opposite direction would hurt the poor person much more than it would benefit the rich person. The bidirectionally possible exchange rates in this case are bounded quite far away from `$1:1$`, so we would see this pair as having vastly different power levels under our formalism, and I think this matches our intuitive notion of power levels as well.
I think this also holds up when the extremal rates of exchange are achieved by things stranger than wealth transfers, like in the manager-employee relationship (especially if there is a significant principal-agent-misalignment between the manager and the company), in the \[government official\]-citizen relationship (especially if the official is significantly misaligned with the state), or in the teacher-student relationship (again, especially if the teacher is misaligned with the school).
Under "turning one's own personal utility into the personal utility of the other", I think we might want to include contracts involving more than these 2 people (assuming everyone else is happy with the contract), but only those which would still go through if every agent involved was selfish.
Assuming zero transaction costs, having equal power levels is a transitive relation, at least assuming one is allowed to propose a sequence of multiple trades (i.e. contract involving multiple people) in "turning one's own personal utility into the personal utility of the other", so it defines an equivalence relation. Given transaction costs, stuff becomes trickier. I think transaction costs should decrease as the number of similar-power-pairs increases, and conditional on the number of pairs staying the same, as the similar-power-graph becomes a better expander. Saying something non-vague in this direction would be interesting. (Also, it feels like there could be some business ideas here?)
Actually, I lied here. I believe this argument works for selfish agents, but not necessarily for terminally caring agents, at least not with the notion of good the maximizing of which matters (i.e. the argument fails if we care about the sum of personal utilities; the argument might work if we care about the sum of terminal utilities, but I consider it incorrect to do so). I nevertheless think that the big claim from this paragraph is mostly correct; my true justification is a hope that the result from the simple selfish case reasonably extends to the messy case where people can care about each other.
Actually, when these conditions are satisfied (which is suspicious, and it is especially suspicious that this would be preserved over time as e.g. the more capable or better-positioned agents become richer, but let's proceed), I guess it could only be the case that more caring decreases the total utility achieved compared to the case where everyone is selfish, but with a return to guaranteed optimality in the extremum where everyone is totally altruistic. So in this regime, I hereby reverse my earlier guess about more caring being better. My updated general guess is the following: more caring is good between agents at different power levels, and much less important (or perhaps as likely to be bad as good) between agents at similar power levels. (Also see [this](https://www.lesswrong.com/posts/jDQm7YJxLnMnSNHFu/moral-strategies-at-different-capability-levels).)
except for (arguably) pretty wacky [stuff](https://www.lesswrong.com/tag/acausal-trade)?
To make this example work without circular definitions, we might want to be careful about defining love without reference to caring.
I think the justification I would give for externalities not being that much of a problem (w.r.t. achieving maximum social welfare) otherwise is essentially the same as in our earlier discussion on when Kaldor-Hicks improvements can be transformed into Pareto improvements. (Also see the [Coase theorem](https://en.wikipedia.org/wiki/Coase_theorem).) As earlier, I think such an argument only works if the agents are selfish (and also if there are no issues with information and bargaining, which I am taking to be subsumed by the assumption that transaction costs are low).
A subsidy on `$A$` is sort of just a negative tax on `$A$`. It's also sort of just a tax on not-`$A$`. I think most economics things about taxes generalize to subsidies in both of these ways without changing any of the math. But I could imagine an argument that there is some significant behavioral economics type (irrational) difference, sort of like (I would guess) there is an empirical difference between how people treat paying for bus tickets vs paying for penalty charges for not having a ticket. (There are of course also rational reasons for not just comparing \[ticket price\] to \[penalty fare times probability of getting caught\], e.g. having to waste some time, but I'm guessing that there is a big empirical difference even after we count these as costs.)
Given no computational constraints, we might want to set taxes so that exactly the utility-maximizing actions are made, but this is clearly difficult.
There could be positive effects for the parents as well, but the parents account for those when making a decision. Externalities on people other than the child and the parent could also enter into the compensation calculation, if there is reason to believe that these would contribute significantly.
Furthermore, children subscribing to certain decision theories might compensate their parents later anyway (the situation here seems quite similar to [Parfit's hitchhiker](https://www.lesswrong.com/tag/parfits-hitchhiker)); state intervention would be superfluous in such cases. Or the parents could brainwash the child or do something to effectively make the child sign a contract to compensate them later.
Of course, the weight might be different for different future versions of oneself. I think what I want to say here more precisely is that this is true for most (according to the empirical distribution) future versions of most people, or for the average future version of most people, or for the average future version, with the average taken both over people and over their future versions.
The plots on page 362 [here](http://www.christosaioannou.com/Frederick,%20Loewenstein%20and%20O'Donoghue%20(2002).pdf) (page 12 in the pdf) look like reasonably strong evidence of this, although I have some uncertainty regarding whether the studies were any good at capturing utility discounting (in particular, did not fall for something stupid like ignoring the fact that people are likely to be richer when older and hence value money less), or in fact about whether this was even what they were trying to do. I have not spent sufficient time on this to be reasonably certain about this empirical question. I will try to update the post if someone points out in the comments that one can't deduce the existence of time discounting from this data.
One problem I anticipate with these plots is that they might not be accounting for uncertainty about one's future existence, which is a commonly cited reason for instrumental time discounting, and which would not constitute time discounting in the sense relevant to this subsection of my post. That said, I don't expect there to be a rational way to get the high discount rates indicated by the plots from instrumental discounting of this kind alone.
(By the way, the plots include a data point where the discount rate seems to be graphically indistinguishable from 1, which seems interesting. (In fact, I'd put like >1% probability on that being the one study in this sample that correctly captured non-instrumental time discounting in utility...) If anyone posts a link to that paper in the comments, I would be grateful for that.)
Or maybe you can come up with an argument for why it's not the case? Or maybe it is the case, but there is some completely different consideration that should dominate the analysis, which I've missed?
I could see a counter-claim here saying that people still seem to vote for policies according to what benefits them individually. This could be because people are irrational, or because they are computationally limited and this is a heuristic, or because this is part of the perceived rules of the voting game (I would guess that many analyses of voting decisions assume that each person is voting according to a decision rule with a majority of the total weight on themselves or their family). One could make the plausible counter-counter-claim that perhaps a better description of what's going on is that people are trying to vote according to a decision rule with a majority of the total weight on other people (or more strictly people distant in the social graph), but it just often ends up being what's good for themselves, perhaps because of something like the [typical situation fallacy](https://www.lesswrong.com/tag/typical-mind-fallacy), but in a way which still makes the rest of this argument go through (perhaps because one is still less myopic when deciding for these other versions of oneself). But even if that counter-counter-claim fails, what we need for this argument is actually the much weaker claim that the sum of weights assigned to other people is at least on the same order of magnitude as the weight assigned to oneself, which strikes me as eminently reasonable.
Or even if the child has a terminal time discount rate which is no lower, one could argue that a good heuristic for their computational boundedness is that they ignore consequences on future selves, and I think the rest of this section would still apply in that case.
This justifies calling [unparenting](https://en.wikipedia.org/wiki/Free-range_parenting) "North Korean style parenting". (I am actually very supportive of North-Korean style parenting – I think parents often epicfail at setting up an adequate incentive structure.)
I might post my explanation as a comment later.
I guess these proposals might not give reasonable results for agents who care about stuff other than what is experienced by someone, or for agents who value their own experiences only conditional on the experiences of other agents that are involved. I currently think this is misguided, but even if it is misguided, I admit that this is still a major issue for this framework's usefulness for understanding agents. My hope is that even if you disagree with this being misguided, or agree that it is a major issue for explanatory/predictive purposes, you can still join me in drawing some conclusions from this model that have a decent chance of extending to models you would see as better, or to reality.
Among others, I have seen a philosophy professor at a fine Institvte use this argument.