[Original post: Book Review - From Oversight To Overkill]

Table of Contents

1: Comments From The Author Of The Book
2: Stories From People In The Trenches
3: Stories From People In Other Industries
4: Stories From People Who Use Mechanical Turk
5: Comments About Regulation, Liability, and Vetocracy
6: Comments About The Act/Omission Distinction
7: Comments About The Applications To AI
8: Other Interesting Comments

1. Comments From The Author Of The Book

Simon Whitney (author of the book!) writes:

Thank you, Scott, for this careful and thought-provoking essay.

Since so many people wonder, the study by Lynn Epstein and Louis Lasagna showed that people who read the short consent form were better both at comprehending the experiment and at realizing that the study drug might be dangerous to them.

Much of this fascinating conversation on ACX is on the theoretical side, and there’s a reason for that. IRBs are ever on the lookout for proposed research that would be unethical; that is why they exist. But there is no national database of proposed experiments to show how many were turned down because they would be abusive. In fact, I know of no individual IRB that even attempts to keep track of this. There are IRBs that are proud they turned down this or that specific protocol, but those decisions are made in private so neither other IRBs nor the public can ever see if they were right. Some IRBs pride themselves on improving the science of the protocols they review, but I know of no IRB that has ever permitted outside review to see if its suggestions actually helped. Ditto for a dozen other aspects of IRB review that could be measured, but are not. It’s a largely data-free zone.

I got an interesting email yesterday from a friend who read my book. She is part of a major enterprise that helps develop new gene therapies. From her point of view, IRBs aren’t really a problem at all. Her enterprise has standard ways of doing business that the IRBs they work with accept. She sees this work with and around big pharma as providing the relatively predictable breakthroughs that will lead to major life-enhancing treatments down the road. This is a world of big money and Big Science, and it’s all about the billions. A new drug costs $2.6 billion to develop; the FDA employs 17,000 people and has a budget of $3.3 billion; the companies involved measure their value and profits in the billions.

The scientists I am speaking for in “From Oversight to Overkill” are lucky when they can cobble together a budget in the millions, and much of the work they do, like Scott’s frustrating project, is entirely unfunded. They are dealing with OHRP, an agency with a budget of $9 million that employs 30 people. Unlike big pharma with its standardized routines, they are trying new approaches that raise new regulatory questions. And because OHRP operates on such a smaller scale, its actions are rarely newsworthy even when they make no sense at all. This includes decisions that suppress the little projects with no funding that people just starting out attempt.

Of course, the smaller budgets of the scientists in my book don’t mean that their findings will be trivial. It has always been true that when myriad scientists work to better understand human health and disease, each in their own way, the vast majority will make, at most, tiny steps, and that a very few will be on the track of something transformative. A system that makes their work more difficult means that we, the public who struggle with disease and death in our daily lives, are the ones who suffer.

I did accidentally mess up the conclusion of the Lasagna and Epstein study - the short consent forms were better, not worse. Most of you caught this from context, but sorry.

2. Stories From People In The Trenches

BladeDoc writes:

I have to tell my consent form story. I was asked to join an ongoing, IRB approved study in order to get control samples of normal skin to compare to samples of melanomas that had already been collected by the primary investigator. The samples were to be 3mm in diameter taken from the edge of an open incision made at the time of another surgery (e.g. I make an incision to fix your hernia and before I close the incision I take a 3mm wide ellipse of extra skin at the edge). There is literally zero extra risk. You could not have told after the closure where the skin was taken. The consent form was 6 pages long (the consent form for the operation itself that could actually have risk was 1 page and included a consent for blood products). I had to read every page to the patient out loud (the IRB was worried that the patients might not be literate and I wasn’t allowed to ask them because that would risk harm by embarrassing them). They had to initial every page and sign at the end.

I attempted to enroll three patients. Every one stopped me at the first page and said they would be happy to sign but they refused to listen to the other 5 pages of boilerplate. The only actual risk of the study seemed to be pissing off the subjects with the consent process itself. I quit after my first clinic day.

I never did any prospective research again as chart review and database dredging was much simpler.

This is one of my favorite stories. It perfectly captures the spirit of IRB requirements, and their exact flavor of “let’s make everything terrible for everyone because of something I can sort of imagine one person in an absurd scenario being psychologically harmed by.”

sclmlw writes:

Clinical researcher here. I wanted to comment on this suggestion:

- Let each institution run their IRB with limited federal interference. Big institutions doing dangerous studies can enforce more regulations; small institutions doing simpler ones can be more permissive. The government only has to step in when some institution seems to be failing really badly.

This is kind of already how it goes. Smaller clinical sites tend to use what we call “central IRBs”, which are essentially IRBs for hire. They can pick and choose which IRB best suits their needs. These include IRBs like Advarra and WIRB. Meanwhile, most clinicians at larger academic institutions have to use what we call a “local IRB”, which is the institution-specific board that everything has to go through no matter what. In some cases, they can outsource the use of a ‘central’ IRB, but they still have to justify that decision to their institutional IRB, which still includes a lengthy review process (and the potential the IRB says “no”).

What’s the difference between a central and a local IRB? At least 2x the startup time, but often longer (from 3 months to 6+ months). Partly, this is because a smaller research site can decide to switch from WIRB to Advarra if their review times are too long, so central IRBs have an incentive to not be needlessly obstructive. While a central IRB might meet weekly or sometimes even more than once a week, with local IRBs you’re lucky if they meet more than once a month. Did you miss your submission deadline? Better luck next month. You were supposed to get it in 2 weeks before the board meeting.

But this isn’t the end of the difference between smaller clinics and those associated with large institutions. At many academic centers, before you can submit to the IRB you have to get through the committee phase. Sometimes you’re lucky and you only have one committee, or maybe you can submit to them all simultaneously. More often, you have to run the gauntlet of sequential committee reviews, with each one taking 2-5 weeks plus comments and responses. There’s a committee to review the scientific benefit of the study (which the IRB will also review), one to review the safety (again, also the IRB’s job), and one to review the statistics (IRB will opine here as well).

In my experience, central IRBs tend to not just have a much faster turn-around time, they also tend to ask fewer questions. Often, those questions are already answered in the protocol, demonstrating that the IRB didn’t understand what they were supposed to be reviewing. I don’t remember ever going back to change the protocol because of an IRB suggestion.

Maybe you could argue that local IRBs are still better for other reasons? I’m not convinced this is the case. We brought in a site through a local IRB on a liver study. It took an extra six months past when most other sites had started (including other local IRB sites - obviously a much more stringent IRB!). Did that translate to better patient safety?

Nope, the opposite happened. One of the provisions of the protocol was that patients would get periodic LFT labs done (liver function tests) to make sure there was no drug-induced liver injury. In cases of elevated LFTs, patients were supposed to come back into the site for a confirmation within 48 hours of receiving the lab results. We were very strict about this, given the nature of the experimental treatment. The treatment period went on for 2 years, so there’s a concern that a long-term treatment might result in long-term damage if you’re not careful.

This site, with its local IRB, enrolled a few patients onto our study. At one point, I visited the site to check on them and discovered the PI hadn’t been reviewing the lab results in a timely manner. Sometimes he’d wait a month or more after a patient’s results came in to assess the labs. Obviously they couldn’t follow the protocol and get confirmatory LFT draws in time. Someone with a liver injury could continue accumulating damage to this vital organ without any intervention, simply because the PI wasn’t paying attention to the study. I was concerned, but these studies can sometimes be complicated so I communicated the concern - and the reason it was important - to the PI. The PI agreed he’d messed up and committed to do better.

When I came back, six months later, I discovered things had gotten worse, not better. There were multiple instances of patients with elevated LFTs, including one instance of a critical lab value. NONE of the labs had been reviewed by anyone at the site since I visited last. They hadn’t even pulled the reports from the lab. There was nobody at the wheel, but patients kept getting drug so the site could keep getting paid.

Since it’s not our job to report this kind of thing to the IRB, we told them to do it. We do review what they report, though, so we made sure they told the whole story to the IRB. These were major, safety-related protocol violations. They did the reporting. The PI blamed the whole fiasco on one of his low-paid research coordinators - one who hadn’t actually been working on the study at the time, but the IRB didn’t ask for details, so the PI could pretty much claim whatever and get away with it. The PI then said he’d let that guy go, so problem solved. The chutzpah of that excuse was that it’s not the coordinator’s job to review lab reports, it’s the PI’s job. This would be like claiming the reason you removed the wrong kidney is because you were relying on one of the nurses to do the actual resection and she did it wrong. The obvious question should have been WTF was the nurse doing operating on the patient!?! Isn’t that your job? Why weren’t you doing your job?

What was the IRB’s response to this gross negligence that put patient safety in danger? They ACKNOWLEDGED RECEIPT of the protocol violation and that was the end of it. They didn’t censure the PI, or ask further questions, or anything. If ‘strict IRBs’ were truly organized in the interest of patient safety, that PI would not be conducting any more research. We certainly put him on our list of investigators to NEVER use again. But the IRB ignored the whole thing.

I’m not convinced that this is a ‘tradeoff’ between spending a bunch of money to stall research versus saving patients’ lives through more stringent review. I think that the vetocracy isn’t about safety, so much as the illusion of safety.

Thanks, this sounds like an interesting application of polycentric law.

And spandrel writes:

I’m a scientist who does medical research at several top tier institutions. I only do research, and every month or so one of my projects is submitted to an IRB somewhere. I do clinical trials and observational studies, as well as a lot of health system trials (e.g., where we are randomizing doctors or hospitals, not patients). I have a few observations, some of which aren’t consistent with what Scott reports here.

1. I’ve never had an IRB nix a study or require non-trivial modifications to a study. This may be because my colleagues and I are always thinking about consent when we design a study, or it may be because top tier institutions have more effective IRBs. These institutions receive vast amounts of funding for doing research, which may incentivize a more efficient and flexible IRB.

2. I have done some small studies on the order of Scott’s questionnaire investigation. For these, and even some larger studies, we start by asking the IRB for a waiver of consent - we make the case that there are no risks, etc, and so no consent is needed. We have always received the waiver. Searching PubMed turns up many such trials - here’s a patient randomized trial of antibiotics where the IRB waived the requirement for patient consent: https://pubmed.ncbi.nlm.nih.gov/36898748/ I am wondering if the author discusses such studies where IRBs waive patient consent.

3. There are people working on the problem of how terrible patient consent forms can be. There are guidelines, standards, even measures. And of course research into what sort of patient consent form is maximally useful to patients (which is determined by asking patients). I helped develop a measure of informed consent for elective surgery (not the same thing as a trial, but same problem with consent forms) that is being considered for use in determining payment to providers.

4. Every year or so I have to take a test to be/stay certified for doing human subjects research. Interestingly, all the materials and questions indicate that the idea of patient consent emerged from the Nuremberg Trials and what was discovered there about the malfeasance of Nazi scientists. I’m surprised to hear the (more plausible) sequence of events Scott reports from the book.

5. Technology, especially internet + smartphones, is beginning to change the underlying paradigm of how some research is done. There are organizations which enroll what are essentially ‘subscribers’ who are connected via app and who can elect to participate in what is called ‘distributed’ research. Maybe you have diabetes, so you sign up; you get all the latest tips on managing diabetes, and if someone wants to do a study of a new diabetes drug you get an alert with an option to participate. There is still informed consent, but it is standardized and simplified, and all your data are ready and waiting to be uploaded when you agree. Obviously, there are some concerns here about patient data, but there are many people who want to be in trials, and this supports those people. These kinds of registries are in a sense standardizing the entire process, which will make it easier/harder for IRBs.

While this book sounds very interesting, and like one I will read, it also maybe obscures the vast number of studies that are greenlighted every day without any real IRB objections or concerns.

Regarding (2), see the full story of my IRB experience written up here for how the attempt to get a waiver of consent went.

3. Stories From People In Other Industries

CinnabarTactician writes:

I work at a big tech company and this is depressingly relatable (minus the AIDS, smallpox and death).

Any time something goes wrong with a launch the obvious response is to add extra process that would have prevented that particular issue. And there is no incentive to remove processes. Things go wrong in obvious, legible, and individually high impact ways. Whereas the extra 5% productivity hit from the new process is diffuse, hard to measure, and easy to ignore.

I’ve been trying to launch a very simple feature for months and months, and there are dozens of de facto approvers who can block the whole thing over some trivial issue like the wording of some text or the colour of a button. And these people have no incentive to move quickly.

I’m surprised by this, both because I thought tech had a reputation for “move fast and break things”, and because I would have expected the market to train this out of companies that don’t have to fear lawsuits.

But playing devil’s advocate: at a startup, code changes usually have high upside (you need to build the product fast in order to survive) and low downside (if your site used by 100 people goes down for a few minutes it doesn’t matter very much). At Facebook, code changes have low upside (Facebook will remain a hegemon regardless of whether it introduces a new feature today vs. in a year) and high downside (if you accidentally crash Facebook for an hour, it’s international news). Also, if the design of one button on the Facebook website causes people to use it 0.1% more, that’s probably still a difference of millions of hits - so it’s worth having strong opinions on button design.

Gbdub writes:

Same in defense contracting. Easily half and probably more of the cost and schedule of programs comes from “quality standards” and micromanagement that gives the illusion of competent oversight. Distilling very complicated technical problems into easy to understand but basically useless metrics so paper shufflers and congressional staffers can feel smart and like they know what’s going on is a big part of my job.

When in reality, we learn by screwing up in novel ways - the new process rarely catches any problems because we already learned our lesson and the next screw up will probably be something new and unanticipated. But the cost of the new process stays forever, because no one wants to be the guy that makes “quality controls less rigorous”.

Another situation where you would expect competition to train people out of this, but also another situation where a hegemon might feel tempted to rest on its laurels!

Anya L writes:

This reminds me a lot of a concept in software engineering I read in the Google Site Reliability Engineering book, the concept of error budgets as a way to resolve the conflict of interest between progress and safety.

Normally, you have devs, who want to improve a product, add new features, and iterate quickly. But change introduces risk, things crash more often, new bugs are found, and so you have a different group whose job it is to make sure things never crash. These incentives conflict, and so you have constant fighting between the second group trying to add new checklists, change management processes, and internal regulations to make release safer, and the first group who try to skip or circumvent these so they can make things. The equilibrium ends up being decided by whoever has more local political power.

The “solution” that Google uses is to first define (by business committee) a non-zero number of “how much should this crash per unit time”. This is common, for contracts, but what is less common is that the people responsible for defending this number are expected to defend it from both sides, not just preventing crashing too often but also preventing crashing not often enough. If there are too few crashes, then that means there is too much safety and effort should be put on faster change/releases, and that way the incentives are better.

I don’t know how directly applicable this is to the legal system, and of course this is the ideal theory, real implementation has a dozen warts involved, but it seemed like a relevant line of thought.

This is great, thanks.
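
For readers who want to see the two-sided version of the idea concretely, here is a minimal Python sketch of what Anya L describes. It is an illustration under stated assumptions, not Google's actual tooling: the class name `ErrorBudget`, the 99.9% target, the 30-day window, and especially the 25% "too safe" cutoff are all hypothetical numbers picked for the example.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Allowed downtime for a service over a window (all values hypothetical)."""

    target_availability: float         # e.g. 0.999 allows 0.1% of the window as downtime
    window_minutes: float              # length of the measurement window
    underspend_fraction: float = 0.25  # assumed cutoff below which we call it "too safe"

    @property
    def budget_minutes(self) -> float:
        # The budget is whatever share of the window the target does not cover.
        return (1.0 - self.target_availability) * self.window_minutes

    def verdict(self, downtime_minutes: float) -> str:
        # Fraction of the allowed downtime that has actually been used.
        used = downtime_minutes / self.budget_minutes
        if used > 1.0:
            return "over budget: slow down releases"
        if used < self.underspend_fraction:
            return "under budget: too cautious, release faster"
        return "within budget: keep current pace"


if __name__ == "__main__":
    budget = ErrorBudget(target_availability=0.999, window_minutes=30 * 24 * 60)
    for downtime in (5, 20, 60):
        print(downtime, "min down ->", budget.verdict(downtime))
```

The point of the sketch is the middle branch of `verdict`: the same number that tells the safety team to slow things down also tells them when they are being too careful, which is the part that has no analogue in how IRBs are evaluated.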

4. Stories From People Who Use Mechanical Turk

NoIndustry9653 from the subreddit writes:

I don’t work in academics, but I have a positive impression of IRBs from my time as an Amazon MTurk worker. It is very common for researchers to try to defraud such workers in various ways to cut costs (most commonly they fail to understand or care that a rejected hit threatens your ability to continue working and is not only about the few cents paid for it, so rejections should not be arbitrary or used as a way to get a refund). It’s widely reported to be effective to contact a requester’s governing IRB to resolve disputes if you can’t come to an agreement directly, or that even mentioning you know how to contact their IRB often leads to a resolution […]

A rejected hit is when the requester claims you didn’t do the given task correctly and declines to pay you. Requesters can filter workers for eligibility by accepted hit rate, so if you go below 98% it’s really bad, you want to do everything you can to avoid rejected hits, for example there are third party tools to help warn you about requesters with rejection rates that are too high. Naturally Amazon itself has no interest in mediating fairly or at all. […]

I don’t think it’s strongarming to appeal to the only authority which could possibly hold them accountable, especially for legitimate grievances, and there were definitely a lot of those. For example I remember something that would often happen is rejections being sent en masse with many people reporting getting the same message, which clearly admitted the reason was more that they didn’t want the data anymore and wanted a refund, than any specific mistake on the part of individual workers. At the time this was my only source of income, so the threat of this sort of thing getting me de-facto banned from the platform was a big deal.

At the same time I don’t think most mturk workers would try to use an IRB to get out of a rejection that’s the result of their own mistakes. There seemed to be consensus in the community that rejections for things like missing complex attention checks were legitimate and you just have to try to avoid making mistakes, and I don’t think anyone regarded IRBs as get out of jail free cards, more a last resort effort to maybe get justice when being outright scammed.

OwenQuillion gives a longer explanation:

I also fooled with MTurk for a while a few years ago (pre-pandemic), so this is based on sentiments from that point in time (though I doubt it’s changed much). If for whatever reason you want to look deeper into this, I’d suggest looking up Turkopticon, a group that I think has made a little headway in managing all this nonsense.

Since I sort of wound up editorializing, here are some bullet points:

  • A ‘rejected HIT’ is a task that the requestor declines to pay the worker for. This can be for any reason, legit or not. Amazon does not mediate at all. Disputes about rejected hits can also only be made for a month.

  • Amazon offers no recourse for dealing with bad-faith requestors, and poor communication tools for misunderstandings. This often leaves complaining to the IRB as the worker’s only option.

  • Amazon recommends requestors filter for workers that have a ~98+% acceptance rate.

  • ‘Batch HITs’ (quick simple tasks, e.g. image moderation, sentiment analysis) are more desirable than surveys, and a ‘mass rejection’ on these can easily tank one’s acceptance rate. Thus workers are very protective of their acceptance rate.

  • Foreign workers gaining access to US MTurk accounts and defrauding requestors was absolutely rampant in 2018 at least. Obviously this puts requestors on guard.

  • Finally, most of the surveys in question are just a series of basic psychology scales or tasks both the worker and average SSC reader are very familiar with. I suspect many of them are administered by students as practice rather than ‘serious’ research.

As the other poster said, rejected HITs are just any task the requestor declines for any reason. A worker’s acceptance rate is extremely important - one of the few pieces of advice Amazon seems to give requestors is to filter for 98% or 99% acceptance rate. It’s probably pretty reasonable for surveys - if you can’t get 99 out of 100 of those filled out acceptably (assuming good faith by the requestors), maybe you should be filtered. It’s also worth noting that Amazon makes communication difficult, and that rejected HITs can only be reversed for like a month - after that, they’re permanently on your record.

It’s also probably worth restating: if a worker goes below the high 90s, they’ll have access to fewer tasks, likely from less reputable requestors, and they’ll need to do 100 of these to offset every rejection. And the worker is at much greater risk of being dug deeper into that hole by requestors rejecting their work in bad faith with no recourse - part of why surveys are popular is because the IRB can bludgeon requestors into accountability.

Most of the surveys in question are also the crumbs that filter through the grasping pedipalps of the hordes of workers (and their scripts). If people are seriously using MTurk to monetize their time, they’re likely looking for ‘batch HITs’ - the sort of thing where there’s hundreds or thousands of tasks that can be quickly repeated (moderating images, 3 cents for a sentiment analysis, a couple quarters to outline a car in an image, etc.)

Of course, this manna from heaven rarely lasts long, and the worker always takes a risk - ‘if I do 100 of these, and this is an unscrupulous requestor, well - I better have ten thousand accepted HITs under my belt.’ That’s why workers are so protective of their acceptance rate.

Back to surveys - again as the other poster replied, most of what the average MTurk worker will see is probably a psychology study questionnaire with a series of whatever common scales, attention checks, and other tricks the worker has probably seen at least dozens if not hundreds of times by now. They often pay Amazon’s princely sum of about 10 cents per (expected) minute - based on the minimum wage in whatever benighted 00s year Amazon Mechanical Turk launched. Anecdotally, it also seems like a lot of these are from students - probably just practice research by someone who likely has less experience with the platform than the worker themselves.

The problem the requestor has - at least as of ~2018 - is that there is a lot of fraud with foreign workers getting access to MTurk accounts and submitting totally garbo data, often very quickly. Based purely on a ‘time to complete’ metric, this is hard to distinguish from a legit worker who has filled out hundreds of these and is looking to maximize how many pennies they get for their minutes. It also wasn’t uncommon for workers to ‘cook’ such a survey - letting it sit at the end screen before submitting - just to avoid getting pinged for finishing it quickly.

As for how this all ties back into Institutional Review Boards - well, yeah, griping to the IRB is often the MTurk worker’s only recourse. Amazon just doesn’t care, and as I recall a lot of requestors don’t even know workers can contact them - and as mentioned there’s a narrow time window to discuss rejected HITs before they become permanent. On the other hand, in a lot of cases this is basically a reddit mob complaining that a student doling out dimes screwed up their understanding of MTurk’s arcane inner workings, and that’s in the case that the workers aren’t actually trying to defraud them for said dimes.
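
The acceptance-rate arithmetic in OwenQuillion's comment is easy to check. Below is a small Python sketch, assuming the rate is simply accepted HITs divided by all judged HITs (I don't know exactly how Amazon computes it, so treat that as an assumption); the function names are made up for illustration.

```python
import math
from fractions import Fraction


def acceptance_rate(accepted: int, rejected: int) -> Fraction:
    """Share of judged HITs that were accepted (assumed definition)."""
    return Fraction(accepted, accepted + rejected)


def accepted_needed(rejections: int, threshold_percent: int) -> int:
    """Fewest accepted HITs that keep the rate at or above the threshold.

    Solves accepted / (accepted + rejections) >= t for accepted, which gives
    accepted >= rejections * t / (1 - t). Exact fractions avoid float rounding.
    """
    if not 0 < threshold_percent < 100:
        raise ValueError("threshold_percent must be between 1 and 99")
    return math.ceil(Fraction(rejections * threshold_percent, 100 - threshold_percent))


if __name__ == "__main__":
    print(accepted_needed(1, 99))    # accepted HITs to absorb one rejection at 99%
    print(accepted_needed(1, 98))    # same at a 98% filter
    print(accepted_needed(100, 99))  # a 100-HIT mass rejection at 99%
```

Under that assumption, one rejection at a 99% filter costs 99 accepted HITs, and a mass rejection of 100 batch HITs costs 9,900, which lines up with the "do 100 of these to offset every rejection" and "ten thousand accepted HITs under my belt" figures in the comment.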

5. Comments About Regulation, Liability, and Vetocracy

CatCube writes:

I think the fundamental problem is that you cannot separate the ability to make a decision from the ability to make a wrong decision. However, our society–pushed by the regulator/lawyer/journalist/administrator axis you discuss–tries to use detailed written rules to prevent wrong decisions from being made. But, because of the decision/wrong decision inseparability thing, the consequences are that nobody has the ability to make a decision.

This is ultimately a political question. It’s not wrong, precisely, or right either. It’s a question of value tradeoffs. Any constraint you put on a course of action is necessarily something that you value more than the action, but this isn’t something people like to admit or hear voiced aloud. If you say, “We want to make sure that no infrastructure project will drive a species to extinction”, then you are saying that’s more important than building infrastructure. Which can be a defensible decision! But if you keep adding stuff–we need to make sure we’re not burdening certain races, we need to make sure we’re getting input from each neighborhood nearby, etc.–you can eventually end up overconstraining the problem, where there turns out to be no viable path forward for a project. This is often a consequence of the detailed rules to prevent wrong decisions.

But because we can’t admit that we’re valuing things more than building stuff (or doing medical research, I guess?), we as a society just end up sitting and stewing about how we seemingly can’t do anything anymore. We need to either: 1) admit we’re fine with crumbling infrastructure, so long as we don’t have any environmental, social, etc., impacts; or 2) decide which of those are less important and streamline the rules, admitting that sometimes the people who are thus able to make a decision are going to screw it up and do stuff we ultimately won’t like.

Darwin on why safetyism expanded just as the neoliberals were trying to decrease government regulation:

Without the excuse of ‘we were following all of the very strict and explicit regulations, so the bad thing that happened was a freak accident and not our fault’ to rely on, companies had to take safety and caution and liability limitation and PR management into their own hands in a much more serious way.

And without the confidence in very strict and explicit regulations to limit the bad things companies might do, and without democratically-elected regulators as a means to bring complaint and effect change, we became much more focused on seeking remedy for corporate malfeasance by suing companies into oblivion and destroying them in the court of public opinion.

Basically, government actually can do useful things, as it turns out.

One of the useful things it can do is be a third party to a dispute between two people or entities, such as ‘corporations’ and ‘citizens’, and use its power to legibly and credibly ensure cooperation by explicitly specifying what will be considered defection and then punishing it harshly. This actually allows the two parties, which might otherwise be in conflict, to trust each other much more and cooperate much better, because their incentives have been shifted by a third party to make defection more costly.

Without government playing that role, you can fall back into bad equilibrium of distrust and warring, which in this case might look like a wary populace ready to sue and decry at the slightest excuse, and paranoid corporations going overboard on caution and PR to shield from that.

Meadow Freckle writes:

Why can’t you sue an IRB for killing people for blocking research? You can clearly at least sometimes activist them into changing course. But their behavior seems sue-worthy in these examples, and completely irresponsible. We have negligence laws in other areas. Is there an airtight legal case that they’re beyond suing, or is it just that nobody’s tried?

I don’t know, and this seems like an important question.

And Donald writes:

Why do we need special rules for medicine?

The law has rules about what dangerous activities people are allowed to consent to, for example in the context of dangerous sports or dangerous jobs. Criminal and civil trials in this context seem to be a fairly functional system. If Doctors do bad things, they can stand in the accused box in court and get charged with assault or murder, with the same standards applied as are applied to everyone else. If there need to be exceptions, they should be exceptions of the form “doctors have special permission to do X”.

I do want to slightly defend something IRB-like here.

When a doctor asks you to be part of a study, they’re implicitly promising that they did their homework, this is a valuable thing to study, and that there’s no obvious reason it should be extremely unsafe. As a patient (who may be uneducated) you have no way of knowing whether or not this promise is true.

Every so often, someone does everything right, and something goes wrong anyway. A drug that everyone reasonably thought would be safe and effective turns out to have unpredictable side effects - this is part of why we have to do studies in the first place. If every time this happened, a doctor had to stand trial for assault/murder, nobody would ever study new drugs. Trials are a crapshoot, and juries tend to rule against doctors on the grounds that the disabled/dead patient is very sympathetic and everyone knows doctors/hospitals are rich and can give them infinite money as damages. There is no way for an average uneducated jury to distinguish between “doctor did their homework and got unlucky” and “doctor did an idiotic thing”. Either way, the prosecution can find “expert witnesses” to testify, for money, that you were an idiot and should have known the study would fail.

In order to remove this risk, you need some standards for when a study is safe, so that if people sue you, you can say “I was following the standards and everyone else agreed with me that this was good” and then the lawsuit will fail. Right now those standards are “complied with an IRB”. This book is arguing that the IRB’s standards are too high, but we can’t cut the IRB out entirely without some kind of profound reform of the very concept of lawsuits, and I don’t know what that reform would look like.

6. Comments About The Act/Omission Distinction

jumpingjacksplash writes:

I think you’ve unintentionally elided two distinct points: first, that IRBs are wildly inefficient and often pointless within the prevailing legal-moral normative system (PLMNS); second, that IRBs are at odds with utilitarianism.

Law in Anglo-Saxon countries, and most people’s opinions, draw a huge distinction between harming someone and not helping them. If I cut you with a knife causing a small amount of blood loss and maybe a small scar, that’s a serious crime because I have an obligation not to harm you. If I see a car hurtling towards you that you’ve got time to escape from if you notice it, but don’t shout to warn you (even if I do this because I don’t like you), then that’s completely fine because I have no obligation to help you. This is the answer you’d get from both Christianity and Liberalism (in the old-fashioned/European sense of the term, cf. American Right-Libertarianism). Notably, in most Anglo-Saxon legal systems, you can’t consent to be caused physical injury.

Under PLMNS, researchers should always ask people if they consent to using their personal data in studies which are purely comparing data and don’t change how someone will be treated. For anything that affects what medical treatment someone will or won’t receive, you’d at least have to give them a full account of how their treatment would be different and what the risks of that are. If there’s a real risk of killing someone, or permanently disabling them, you probably shouldn’t be allowed to do the study even if all the participants give their informed consent. This isn’t quite Hans Jonas’ position, but it cashes out pretty similarly.

That isn’t to say the current IRB system works fine for PLMNS purposes; obviously there’s a focus on matters that are simply irrelevant to anything anyone could be rationally concerned with. But if, for example, they were putting people on a different ventilator setting than they otherwise would, and that risked killing the patient, then that probably shouldn’t be allowed; the fact that it might lead to the future survival of other, unconnected people isn’t a relevant consideration, and nor is “the same number of people end up on each ventilator setting, who cares which ones it is” because under PLMNS individuals aren’t fungible.

Under utilitarianism, you’d probably still want some sort of oversight to eliminate pointless yet harmful experiments or reduce unnecessary harm, but it’s not clear why subjects’ consent would ever be a relevant concern; you might not want to tell them about the worst risks of a study, as this would upset them. The threshold would be really low, because any advance in medical science could potentially last for centuries and save vastly more people than the study would ever involve. The problem is, as is always the case for utilitarianism, this binds you to some pretty nasty stuff; I can’t work out whether the Tuskegee experiment’s findings have saved any lives, but Mengele’s research has definitely saved more people than he killed, and I’d be surprised if that didn’t apply to Unit 731 as well. The utilitarian IRB would presumably sign off on those. More interestingly, it might have to object to a study where everyone gives informed consent but the risk of serious harm to subjects is pretty high, and insist that it be done on people whose quality of life will be less affected if it goes wrong (or whose lower expected utility in the longer term makes their deaths less bad) such as prisoners or the disabled.

The starting point to any ideal system has to be setting out what it’s trying to achieve. Granted, if you wanted reform in the utilitarian direction, you probably wouldn’t advocate a fully utilitarian system due to the tendency of the general public to recoil in horror.

I want to stress how far away we are from “do experiments without patients’ consent” here - a much more common problem is that patients really want to be in experiments, and the system won’t allow it. This is most classic in studies on cancer, where patients really want access to experimental drugs and IRBs are constantly coming up with reasons not to give it to them. Jonas argued that all cancer studies should be banned because it’s impossible to consent when you’re desperate to survive, which isn’t the direction I would have taken that particular example in. But there are other examples - during COVID, lots of effective altruists stepped up to be in human challenge trials that would have gotten the vaccines tested faster, but the government wouldn’t allow them to participate.

I would honestly be happy with a system that counts the harm of denying a patient’s ability to consent to an experiment they really want to be in as a negative, forget about any lives saved.

And JDK writes:

I haven’t finished reading but felt compelled to comment on this:

“the stricter IRB system in place since the ’90s probably only prevents a single-digit number of deaths per decade, but causes tens of thousands more by preventing lifesaving studies.”

No. It does NOT “cause” deaths. We can’t go down this weird path of imprecision about what “causing” means.

I’ve been examining Ivan Illich, “Medical Nemesis” recently. Claiming that IRBs which stop research ostensibly CAUSE death strikes me as cultural iatrogenesis masquerading as a cure for clinical iatrogenesis. […] “Might have been saved if” is not the same as “death was caused by”.

This seems to me to be a weird and overly metaphysical nitpick.

Suppose a surgeon is operating on someone. In the process, they must clamp a blood vessel - this is completely safe for one minute, but if they leave it clamped more than one minute, the patient dies. They clamp it as usual, but I rush into the operating room and forcibly restrain the surgeon and all the staff. The surgeon is unable to remove the clamp and the patient dies.

I (and probably the legal system) would like to be able to say I caused the patient’s death in this scenario. But it sounds like JDK is saying I have to say the surgeon caused the patient’s death and I was only tangentially involved.

Here’s another example: suppose the US government bans all food production - farmers, hunters, fishermen, etc. are forbidden from doing their jobs. After a few months, everyone starves to death. I might want to say something like “the US government’s ban on food production killed people”. But by JDK’s reasoning, this is wrong - the government merely prevented farmers and fishermen from saving people (by giving them food so they didn’t starve).

I might want to say something like “Mao’s collective farming policy killed lots of people”. But since this is just a weaker version of hypothetical-Biden’s ban on food, by JDK’s reasoning I can’t do this.

This seems contrary to common usage, common sense, and communicating information clearly. I have never heard any philosopher or dictionary suggest this, so what exactly is the argument?

(JDK has a response here, but I didn’t find it especially enlightening)

7. Comments About The Applications To AI

Metaphysiocrat writes:

People have joked about applying NEPA review to AI capabilities research, but I wonder if some kind of IRB model might have legs (as part of a larger package of capabilities-slowing policy.) It’s embedded in research bureaucracies, we sort of know how to subject institutions to it, and so on.

I can think of seven obvious reasons this wouldn’t work, but at this point I’m getting doomery enough that I feel like we may just have to throw every snowball we have at the train on the off chance one has stopping power.

Zach Stein-Perlman writes:

A colleague of mine is interested in ‘IRBs for AI’ - he hasn’t investigated it but has thought about IRB-y stuff in the context of takeaways for AI (https://wiki.aiimpacts.org/doku.php?id=responses_to_ai:technological_inevitability:incentivized_technologies_not_pursued:vaccine_challenge_trials). He’s interested in people’s takes on the topic.

My take: my understanding is that the US can’t technically demand all doctors use IRBs. (Almost) all doctors use IRBs for a combination of a few reasons:

  • The US government demands that everyone who receives federal funding use an IRB, and most doctors get some federal funding.

  • Journals can demand that doctors use an IRB if they want to publish in the journal.

  • The FDA demands that everyone who wants their studies to count for drug trials use an IRB.

  • Most doctors are affiliated with a larger institution (eg hospital, university) that demands all their affiliates use IRBs.

When the book says “the government demanded that IRBs do X”, my understanding is that the government demanded that everyone who wanted to remain linked to the federal funding spigot and the collection of other institutions linked to the federal funding spigot do X.

But I think a lot of AI development is genuinely not linked to the federal funding spigot. If the government passed an IRB law based off how things work in medicine, I think OpenAI could say “We’re not receiving federal funding, we can publish all our findings on arXiv, and we don’t care about the FDA, you have no power over us”. I don’t know if some branch of the government has enough power to mandate everyone use IRBs regardless of their funding source.

8. Other Interesting Comments

Rbbb writes:

I was going along nodding my head in general agreement til I got to the part where you said this is just like NIMBYism.


This is the near opposite of NIMBYism. When people (to cite recent examples in my neighborhood) rise up to protest building houses on unused land, they do it because they are more or less directly “injured”.

A person who prefers trees instead of town houses across the street is completely different from some institution that wants a dense thicket of regulations to prevent being sued. There is no connection.

I appreciate this correction - NIMBYs, whatever else you think about them, are resisting things they think would hurt them personally, whereas IRBs are often pushing rules that nobody (including themselves) wants or benefits from.

I still think there’s a useful analogy to be drawn in that they’re both systems designed to care about potential harms but not potential benefits, and so nothing can ever get done.

And DannyK writes:

Like with tax preparation, there is a small but lucrative industry organized around lubricating the IRB process, selling software, etc, and they are strongly opposed to anything that would make their services less necessary.

I’d never heard of this before and would be interested in more information.

  1. Thank you to everyone who pointed out that this study had a funny name.