Milan Juza 19/05/2026

Most charity evaluation measures the wrong thing

This is Part 2 of 5 of the Giving With Intention series. Read Part 1 and Part 3.

Singer’s argument, which I wrote about in the first post in this series, answers the whether question. It does not answer the how. Once I accepted that giving substantially was both morally defensible and practically possible, I found myself facing a different and considerably harder problem: how do you know whether a charity is actually worth giving to? If I really wanted my donations to have as much impact as possible, how would I decide which causes, charities or institutions to focus on?

My first instinct was to look for external validation. Charity watchdogs exist precisely for this purpose: Charity Commission filings in the UK, Charity Navigator in the US, and assorted rating schemes that purport to separate responsible organisations from careless ones. What I found was a methodology built almost entirely on the wrong question.

The overhead ratio is not a measure of impact

The dominant heuristic in mainstream charity evaluation is the overhead ratio: what percentage of income does an organisation spend on programmes, as opposed to administration, fundraising, and staff? Charities whose overhead exceeds roughly 20 to 25% are treated with suspicion; those below that line are treated as virtuous.

While this superficially looks like a sensible indicator to optimise for, it is a measure of organisational tidiness. It is not a measure of impact, and treating them as interchangeable produces real distortions.

A charity that spends 5% on overheads and delivers a programme with no evidence of effectiveness is worse than one that spends 30% on overheads to run a rigorously evaluated intervention that demonstrably works. The overhead figure tells you nothing about what the money achieves. It tells you something about how the accounts are structured.
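The point can be illustrated with two hypothetical charities. All names and figures in the sketch below are invented; it only shows that ranking by overhead ratio and ranking by cost per outcome can point in opposite directions for the same pair of organisations.

```python
from dataclasses import dataclass


@dataclass
class Charity:
    name: str
    income: float     # total annual income
    overhead: float   # administration, fundraising, evaluation staff
    outcomes: float   # e.g. DALYs averted per year (hypothetical)

    @property
    def overhead_ratio(self) -> float:
        """Share of income not spent directly on programmes."""
        return self.overhead / self.income

    @property
    def cost_per_outcome(self) -> float:
        """Total spend divided by what it achieves; infinite if nothing is achieved."""
        return self.income / self.outcomes if self.outcomes > 0 else float("inf")


# Hypothetical figures, chosen only to illustrate the argument.
lean = Charity("Lean, unevaluated", income=1_000_000, overhead=50_000, outcomes=100)
evaluated = Charity("Evaluated, effective", income=1_000_000, overhead=300_000, outcomes=5_000)

for c in (lean, evaluated):
    print(f"{c.name}: overhead {c.overhead_ratio:.0%}, cost per outcome {c.cost_per_outcome:,.0f}")
```

With these made-up numbers the overhead metric prefers the first charity, while cost per outcome prefers the second by a factor of 50.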

There is a deeper problem. Overhead ratios actively penalise the activities that produce good outcomes: monitoring, evaluation, randomised controlled trials, and staff capable of designing interventions and interpreting evidence. Under the overhead metric, a charity that invests in finding out whether its work succeeds looks worse than one that spends all its money on inputs without ever asking about outputs. The incentive runs exactly backwards: it penalises investment in improving outcomes.

I realised quite early in the process that treating overhead as the primary signal was not helpful. Finding a better approach took longer.

What cost-effectiveness actually measures

The question I eventually settled on is simple to state: what does £1 buy, in terms of human welfare? Answering it requires having some mechanism to measure welfare, which is not a trivial challenge.

In practice, two units are commonly used, and they measure the same thing from opposite directions. A QALY (Quality-Adjusted Life Year) represents one year of full health gained; it is the standard unit in high-income healthcare systems, including the NHS. A DALY (Disability-Adjusted Life Year) represents one year of healthy life lost, either to premature death or to illness and disability; it is the standard unit in global health economics, used by the WHO and the research bodies that evaluate interventions in low-income settings. One averted DALY corresponds roughly to one QALY gained. The distinction matters in practice because the contexts in which each metric is used are quite different.
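The DALY definition can be made concrete. In global health economics a DALY is usually decomposed into years of life lost to premature death (YLL) and years lived with disability (YLD), where each year of illness is scaled by a disability weight between 0 (full health) and 1 (equivalent to death). The sketch below is a simplified version of that calculation: it ignores discounting and age weighting, and the function names and usage figures are hypothetical, chosen only to show the arithmetic.

```python
def years_of_life_lost(deaths: float, years_lost_per_death: float) -> float:
    """YLL: deaths multiplied by the years of life each death cuts short."""
    return deaths * years_lost_per_death


def years_lived_with_disability(cases: float, disability_weight: float,
                                duration_years: float) -> float:
    """YLD: cases x severity (0 = full health, 1 = equivalent to death) x duration."""
    if not 0.0 <= disability_weight <= 1.0:
        raise ValueError("disability_weight must lie between 0 and 1")
    return cases * disability_weight * duration_years


def dalys(yll: float, yld: float) -> float:
    """One DALY is one year of healthy life lost, from either source."""
    return yll + yld


# Hypothetical burden: 2 deaths each cutting 40 years short, plus
# 100 cases of an illness with disability weight 0.2 lasting half a year.
burden = dalys(
    years_of_life_lost(deaths=2, years_lost_per_death=40),
    years_lived_with_disability(cases=100, disability_weight=0.2, duration_years=0.5),
)
print(burden)
```

Averting this burden would, on the rough equivalence above, count as about the same number of QALYs gained.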

The UK’s National Institute for Health and Care Excellence (NICE) typically approves treatments costing up to £20,000 to £30,000 per QALY gained. This is roughly what UK society has decided a healthy year of life is worth in a domestic healthcare context. Naive idealists may find it repugnant to put a monetary value on a year of someone’s life but, in the real world, governments and organisations do this (and have to do it) all the time when deciding between policies.

Now contrast that figure with global health. The Disease Control Priorities project, a large academic collaboration that evaluates interventions across low-income settings, estimates that many of the most effective global health programmes avert a DALY for well under $200. GiveWell, whose methodology I will come to, puts the cost per life saved at its top-rated charities in the range of $4,000 to $6,500, depending on geography and year. The implicit cost per DALY at that level is substantially lower than the NICE threshold, often by a factor of 50 or more.
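The size of the gap can be checked with a few lines of arithmetic. The sketch below compares the NICE threshold with the Disease Control Priorities figure quoted above, treating one DALY averted as roughly one QALY gained. The exchange rate is an assumption for illustration only, and because $200 is an upper bound on cost per DALY for the most effective programmes, the resulting ratios are lower bounds.

```python
# Assumption for illustration only: exchange rates fluctuate.
USD_PER_GBP = 1.25

NICE_THRESHOLDS_GBP = (20_000, 30_000)  # per QALY gained, as quoted in the text
DCP_COST_PER_DALY_USD = 200             # "well under $200", so treat as an upper bound


def health_per_pound_ratio(threshold_gbp: float, cost_per_daly_usd: float,
                           usd_per_gbp: float = USD_PER_GBP) -> float:
    """How many times more health-years a pound buys at cost_per_daly_usd than at
    threshold_gbp, treating one DALY averted as roughly one QALY gained."""
    cost_per_daly_gbp = cost_per_daly_usd / usd_per_gbp
    return threshold_gbp / cost_per_daly_gbp


for threshold in NICE_THRESHOLDS_GBP:
    ratio = health_per_pound_ratio(threshold, DCP_COST_PER_DALY_USD)
    print(f"GBP {threshold:,}/QALY vs USD {DCP_COST_PER_DALY_USD}/DALY: at least ~{ratio:.0f}x")
```

Under these assumptions the ratio lands between roughly 125 and 190, consistent with a factor of 50 or more. A different exchange rate shifts the ratio proportionally but does not bring it anywhere near parity.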

This is not a rounding error. It is a structural consequence of two facts: disease burden is concentrated in low-income contexts, and intervention costs are lower where wages and supply chains are different. The arithmetic does not change based on whether you find it comfortable. The impact of a donation can be orders of magnitude larger if it is directed well.

The scale of the asymmetry

Abstract multipliers are easy to discount, so let’s look at concrete examples.

It costs Guide Dogs for the Blind UK approximately £50,000 to train and place a single guide dog. That investment helps one person navigate the world with greater independence; the benefit is real and not trivial. By contrast, cataract surgery in a high-burden developing-country context, through organisations such as Sightsavers, costs roughly £25 to £50 per operation and restores functional sight to a person who had lost it. At those prices, the same £50,000 would restore sight to somewhere between 1,000 and 2,000 people. Both interventions address visual impairment. The difference in cost-effectiveness between them is roughly three orders of magnitude.
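The arithmetic is simple enough to check. This sketch uses only the figures quoted above and, like the paragraph, counts people reached rather than the size of the benefit each person receives, which is a simplification.

```python
import math

# Figures quoted in the text.
GUIDE_DOG_COST_GBP = 50_000   # train and place one guide dog
CATARACT_COST_GBP = (25, 50)  # low and high cost per cataract operation


def people_helped(budget_gbp: float, cost_per_person_gbp: float) -> int:
    """Whole number of people a fixed budget reaches at a given cost per person."""
    return math.floor(budget_gbp / cost_per_person_gbp)


for cost in CATARACT_COST_GBP:
    n = people_helped(GUIDE_DOG_COST_GBP, cost)
    # log10 of the count gives the gap in orders of magnitude versus one person helped
    print(f"GBP {cost} per operation: {n:,} operations, about 10^{math.log10(n):.1f} times the reach")
```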

The comparison is not a criticism of guide dog charities. It is an illustration of what geographic and economic context does to cost-effectiveness, independent of how well-run or well-intentioned an organisation is.

The same logic applies when comparing interventions across health contexts more broadly. Despite significant progress in recent decades, malaria still kills around 600,000 people every year, many of them children. Long-lasting insecticide-treated bednets distributed in high-malaria-burden sub-Saharan Africa, at roughly $5 to $7 per net, prevent a substantial fraction of malaria deaths in the households where they are used. The cost per life saved, at scale, sits in the low thousands of dollars. Most charitable interventions operating in high-income countries cannot approach that ratio, not because they are doing less valuable work in absolute terms, but because the baseline cost structure of operating in those environments is entirely different.

This is the central asymmetry that evidence-based giving tries to navigate.

Counterfactual impact

Cost-effectiveness is necessary but not sufficient. A charity with a strong evidence base and favourable cost estimates still has to pass a different test: does my donation change what actually happens?

This is the counterfactual test. If a charity has more funds than it can productively deploy, or is delivering activities that would occur regardless of marginal donations, then additional money has no causal effect on outcomes. The organisation’s impact is genuine; my contribution to that impact is zero.

In practice, this means asking what GiveWell calls “room for more funding”: whether a charity can absorb additional donations, translate them into additional programme delivery, and do so within a reasonable timeframe. A charity with a funding gap and established operational capacity is a materially different bet from one that cannot deploy capital without degrading quality or accumulating reserves with no deployment plan.

I treat room for more funding as a filter, not a bonus. Evidence of effectiveness matters. So does evidence that my donation adds to what actually gets delivered, rather than simply replacing money that would have arrived from elsewhere.
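The counterfactual test can be expressed as a deliberately crude model: a donation buys additional outcomes only up to the amount the charity can productively deploy, and anything beyond that changes nothing. The function below is my own simplification, not GiveWell’s method; in practice returns tend to taper gradually rather than stop at a hard cut-off, and the figures in the usage lines are hypothetical.

```python
def marginal_dalys_averted(donation: float, funding_gap: float, cost_per_daly: float) -> float:
    """DALYs averted *because of* a donation, under a crude step model:
    money up to the charity's funding gap is deployed at its cost per DALY,
    and money beyond the gap adds nothing."""
    if donation < 0 or funding_gap < 0 or cost_per_daly <= 0:
        raise ValueError("donation and funding_gap must be >= 0; cost_per_daly must be > 0")
    deployable = min(donation, funding_gap)
    return deployable / cost_per_daly


# Hypothetical scenarios: the same charity (cost per DALY of 100), different funding gaps.
print(marginal_dalys_averted(1_000, funding_gap=10_000, cost_per_daly=100))  # gap still open
print(marginal_dalys_averted(1_000, funding_gap=0, cost_per_daly=100))       # fully funded
```

The charity’s effectiveness is identical in both calls; only the counterfactual contribution of the donation differs.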

GiveWell, Giving What We Can, and the case for independent evaluation

The most systematic attempt I know of to apply this framework at scale is GiveWell. Founded in 2007, it produces detailed cost-effectiveness analyses of a small number of charities that meet a high evidential bar. The methodology involves primary data, model-based projections, explicit uncertainty quantification, and published reasoning that can be audited and challenged. Its models are available in full; the assumptions are stated, not buried.

Giving What We Can (GWWC) takes a complementary approach. Originally a pledge community built around Toby Ord’s commitment to give 10% of lifetime income to effective charities, it has evolved into a broader recommendation platform that draws on multiple evaluators. Its Global Health and Wellbeing Fund currently allocates according to GiveWell’s research, reflecting the degree to which GiveWell has become the reference standard in this domain. GWWC also maintains its own recommended charity list and extends into cause areas beyond global health, including animal welfare and global catastrophic risk.

Neither GiveWell nor GWWC is the only organisation doing this work. Founders Pledge produces comparable analysis for founders and high earners considering large donations. The Copenhagen Consensus has run systematic cost-effectiveness comparisons across a broader set of global priorities. For an individual donor making evidence-based decisions in global health, GiveWell remains the most thorough publicly available resource, but the ecosystem around it matters.

GiveWell’s current top charities, as of its most recent update, are the Against Malaria Foundation (bednet distribution in high-burden malaria regions), Malaria Consortium (seasonal malaria chemoprevention), Helen Keller International (vitamin A supplementation), and New Incentives (conditional cash transfers to increase childhood vaccination uptake in Nigeria). All operate in high-mortality, low-income contexts where the disease burden is concentrated and the cost per intervention is low.

A methodology that corrects itself

One feature of this approach that I find more compelling than any single cost-effectiveness estimate is the commitment to self-correction.

GiveWell, for instance, maintains a regularly updated public page titled “Our Mistakes” that documents errors in its research, modelling, and funding recommendations, along with what changed as a result. I find the epistemic honesty here genuinely persuasive: this is not a public relations gesture but a live log of methodological revision and course correction. No organisation can avoid mistakes entirely, and a robust mechanism for finding errors and actively addressing them is the surest way to keep learning.

The most instructive example of an error and the course correction that followed is No Lean Season. GiveWell began recommending the programme, run by Evidence Action in Bangladesh, in 2017. The intervention subsidised seasonal migration during the lean season, and two earlier randomised controlled trials had found significant effects on migration, income, and consumption. A third RCT, conducted in 2017 and reported to GiveWell in late 2018, found no statistically significant impact on migration. GiveWell removed the recommendation publicly in November 2018, published a detailed account of what the evidence showed and why it had updated, and Evidence Action shut the programme down in 2019.

This is the scientific method applied to philanthropy. The original recommendation was not made carelessly or naively; it was grounded in evidence from two randomised controlled trials. The withdrawal was neither defensive nor delayed: it followed the evidence as soon as the evidence arrived, and the whole process was documented publicly so donors could see the reasoning at every stage. That’s how it should be done.

Giving What We Can applies similar standards to its own recommendations, updating and revising as evaluator research changes. The output of both organisations at any given moment is not a definitive answer; it is the current best estimate, subject to revision.

I find this more credible, not less, than organisations that do not acknowledge errors. A methodology that cannot be falsified is not rigorous. One that actively invites falsification is.

What this framework does not answer

Running a charity through this framework (cost-effectiveness per DALY, evidence quality, counterfactual impact and room for more funding) changes the answer considerably from what instinct alone would produce. Most well-known UK (and indeed Western) charities perform poorly against it: not because they do bad work, but because they operate in high-income contexts where the cost of each intervention is structurally high. The charities that perform best are, almost without exception, absent from the fundraising appeals in my inbox and from mainstream public discourse.
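The screening logic can be sketched as a filter-then-rank procedure: evidence quality and room for more funding act as pass/fail gates, and the survivors are ordered by cost per DALY. This is an illustrative simplification rather than any evaluator’s actual procedure (real assessments weigh evidence on a continuum and model uncertainty explicitly), and the candidate charities and figures below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    cost_per_daly: float          # estimated cost to avert one DALY (any single currency)
    strong_evidence: bool         # simplification: e.g. backed by rigorous trials
    room_for_more_funding: float  # extra money it can productively deploy


def shortlist(candidates: list[Candidate], donation: float) -> list[Candidate]:
    """Keep only well-evidenced candidates that can absorb the donation,
    then rank them by cost per DALY, cheapest first."""
    eligible = [
        c for c in candidates
        if c.strong_evidence and c.room_for_more_funding >= donation
    ]
    return sorted(eligible, key=lambda c: c.cost_per_daly)


# Hypothetical candidates; the figures are invented for illustration.
pool = [
    Candidate("Charity A", cost_per_daly=90, strong_evidence=True, room_for_more_funding=0),
    Candidate("Charity B", cost_per_daly=150, strong_evidence=True, room_for_more_funding=5_000_000),
    Candidate("Charity C", cost_per_daly=40, strong_evidence=False, room_for_more_funding=1_000_000),
    Candidate("Charity D", cost_per_daly=25_000, strong_evidence=True, room_for_more_funding=2_000_000),
]

for c in shortlist(pool, donation=10_000):
    print(c.name, c.cost_per_daly)
```

Charity A is dropped despite having the best cost per DALY among the well-evidenced candidates, because it cannot absorb more money; Charity C is dropped for weak evidence. Charity D passes both gates but ranks last, much as a high-cost, high-income-context charity would.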

This framework answers how to evaluate charities within a cause area. It ought to be taken much more seriously by governments and public institutions, and it gives individuals a rigorous (within the limits of the available data), systematic and self-correcting way to make informed decisions about their charitable giving. But it does not answer which cause areas deserve priority in the first place. Deciding how to weight global health against animal welfare against existential risk against climate change requires a different kind of reasoning, one that compares not just cost-effectiveness but also scope, tractability, and how neglected a problem already is relative to its scale. That framework, and how I personally apply it, is the subject of the next post.

Giving With Intention is a five-part series documenting how I think about charitable giving: the framework I use, how I evaluate charities, how I have structured my portfolio across cause areas, and what I have got wrong.

Milan Juza 11/05/2026

I gave to charity for eight years before asking if it worked

This is Part 1 of 5 of the Giving With Intention series. Read Part 2 and Part 3.

In 2007, the year our first child was born, I set up a direct debit to Cancer Research UK. Cancer had appeared in my family, which gave the choice a vague rationale, but beyond that I had no particular reason to choose it over anything else. Neither my family nor my wife's had any tradition of organised giving, no inherited framework for where charitable money should go or why. What we had was a new baby, and with it came a sudden and unfamiliar feeling of obligation. The charity felt serious. A safe, credible bet. The direct debit was £25 a month at the time. It felt meaningful, but I had no way of knowing whether it was.

That question, it turned out, would take me the better part of a decade to begin answering properly. The ten years since have been spent on little else.

On instinct

Becoming a parent changes the way you look at the world in ways that are difficult to predict in advance. I tend to think of myself as a fairly rational person, someone who works things through before acting, but parenthood altered my priorities in ways I had not anticipated. I started to notice things I had not paid attention to before. One of those changes was a sharp and immediate consciousness of scale and of responsibility. I had brought a child into a world with a great deal wrong in it, a great deal of preventable suffering, and a great deal of complacency from people, including myself, who had the means to do something about it but had not done much.

I did not think about this in those terms at the time. What I felt, more precisely, was that it would be dishonest not to act on it in some way. The Cancer Research direct debit was the path of least resistance.

Over the following years, the portfolio grew in the way that most people's charitable giving grows: incrementally, emotionally, and without much strategy. A colleague mentioned a charity. A news story prompted a one-off donation. A direct debit to a second organisation appeared, then a third. Each choice felt right at the time. None of them were made on the basis of any clear idea about what the charity was actually achieving with the money.

This is, I suspect, how most people give. The intention is genuine. The method is essentially nonexistent.

On not asking the right question

For a while, not asking felt acceptable. I was giving, which was more than most people around me were doing. The charities I supported were credible, respected, and well-known. Surely the money was going somewhere good. That was the assumption: I had no hard evidence for it, and I was not actively looking for any.

What broke the assumption was not a single dramatic moment. It was a short essay, “Famine, Affluence, and Morality”, written in 1972 by the philosopher Peter Singer, that I heard mentioned on a podcast at roughly the right point in my life. The argument Singer makes is simple enough to state in a sentence: if you can prevent something very bad from happening at no comparable cost to yourself, you are obligated to do it. He illustrates this with the image of a child drowning in a shallow pond. You would not walk past the child to protect your shoes. The fact that the child is a stranger, or that others are nearby who could also help, does not change your obligation.

The argument is discomfiting in proportion to how seriously you take it. I took it seriously enough to sit with it for a while, and over time, it changed something in my thinking. It did not reassure me that what I was doing was sufficient. If anything, it pointed firmly in the opposite direction.

But Singer's argument answers the whether question. It does not answer the how.

The question that followed

Once I accepted that giving substantially was both morally defensible and practically possible, the next question became more technical and considerably harder. If I was going to give, what should I be trying to achieve? How would I know whether I was achieving it? And were some ways of giving dramatically more effective than others?

The answer to the last question, I discovered, is: yes, by orders of magnitude.

This is where the Effective Altruism movement, of which Singer is a philosophical forefather, enters the picture. At its most basic, it is a framework for trying to do as much good as possible with whatever resources you commit to giving. It takes seriously the idea that not all charitable giving is equivalent, that the difference between a well-evaluated intervention and a poorly-evaluated one can be the difference between genuinely helping and spending money on good intentions with negligible effect, and that the question "What works?" is both answerable and worth asking before writing a cheque.

For me, that shift in thinking happened roughly ten years ago. The direct debit to Cancer Research UK had been running for the better part of a decade by that point, and I had never once asked what it was achieving. When I started asking, seriously and with effort, the answers were complicated. They changed how I thought about the rest of my giving, and they set off a process of rebuilding the whole approach from the ground up. That process is still running.

Where this series goes

This post is the start of a short series. Over the next four instalments, I want to do something I have not attempted publicly before: document how I actually think about giving now, what framework I use to evaluate charities, how I have structured my giving across cause areas, and what my giving portfolio looks like in concrete terms.

These posts are not advocacy. They are an attempt to document the thinking, show practical examples, and perhaps start a conversation, or prompt someone reading this to act. I have spent a decade building a view on this, and it is worth showing the working.

The question that runs through all of it: if effectiveness can be measured, and it can, at least imperfectly, what does it mean to give without measuring it?

That is where the next post starts.