A bit about Receiver Operating Characteristic Curves and Cesarean Delivery
In a few posts I have mentioned receiver operating characteristic (ROC) curves, and a few folks have asked what I mean, so I want to explain it. This is an extremely important concept in medicine, and in decision making in general. Unfortunately, it is also quite complex. So complex, in fact, that it is possible to explain an ROC in highly mathematical terms that few would understand (and yes, it can get over my head as well). To see this kind of explanation, check out the Wikipedia entry on the ROC. But I want to try to make it a little simpler.
Let’s take the example we have been working with about cesareans for protracted labor, and see if we can think about an ROC for the decision on whether or not to do a cesarean. Consider two populations of women: 1) women who, given enough time, will deliver a healthy baby and 2) women whose baby, given enough time, will be injured in utero or will be delivered vaginally but injured or dead. Now, consider the decision of whether or not to do a cesarean delivery. If this decision (the test) is to do a cesarean, we would say that the test was positive, and if the decision were to await a vaginal delivery, we would say that the test was negative. A cesarean delivery for dystocia done in group 2 would be a correct decision (a true positive). A cesarean delivery done in group 1 would be an incorrect decision (a false positive). Waiting for vaginal delivery in group 1 would be the correct move (a true negative), and waiting for vaginal delivery in group 2 would be the wrong move (a false negative).
So what about ROC? An ROC is a graph of the sensitivity of a test versus one minus its specificity (the false positive rate). OK, it’s getting confusing already, and that’s why ROC is a little hard to understand.
Sensitivity is the likelihood that the test will correctly identify those with a condition (likelihood that babies that need to be delivered by cesarean to be uninjured will get a cesarean), and specificity is the likelihood that the test will correctly identify those without the condition (likelihood that those who will eventually deliver vaginally uninjured will not get a cesarean).
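To make those two definitions concrete, here is a minimal sketch in Python. The counts are invented purely for illustration (they are not from any study), and the function and variable names are my own.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of those WITH the condition that the test flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of those WITHOUT the condition that the test clears."""
    return true_neg / (true_neg + false_pos)


# Invented counts for 1,000 labors, for illustration only.
true_pos = 45    # group 2, cesarean performed
false_neg = 5    # group 2, waited for vaginal delivery
false_pos = 190  # group 1, cesarean performed
true_neg = 760   # group 1, waited for vaginal delivery

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these made-up numbers, 45 of the 50 babies who needed a cesarean got one, and 760 of the 950 who did not need one were spared it.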
Sensitivity and specificity of a test depend on the cutoff value that one chooses to put on the test – that is where the line is that defines positive versus negative. In a case like iron deficiency, which can be defined objectively, we could have an objective cutoff like a ferritin of 100, and decide that those under 100 test positive and those over 100 test negative. Then we could compare those results to some gold standard, like bone marrow iron stores, and decide what the sensitivity and specificity were. We could then look at what they would have been if the cutoff had been 50. And again at 150. And again at 10, and then 20, and then 30, and so on. And when we graphed all those points, what would we have? A ROC!
In the cesarean example, it is a little more obscure, but in some ways more apropos to real medical decision making. In this case, the cutoff is not an objective value, but an internal thought process of how convinced we are going to need to be before we will take action. Are we going to do a cesarean at the first sign of trouble (way out towards sensitivity), or are we going to wait for a really terrible strip, or a woman arrested for 12 hours, before we go to the operating room (way out towards specificity)?
Ultimately the ROC describes just how good a test is. It describes the interplay between sensitivity and specificity, how much of one we have to give up to get some of the other. If a test is great, we may be able to get very high specificity and sensitivity at the same time. If a test is not as great, we may only be able to have one at a time, depending on what cutoff value we choose to use.
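The cutoff sweep can be sketched in a few lines of Python. The ten patients below are invented for illustration, and the names are hypothetical; the only thing taken from the text is the idea of calling ferritin values under the cutoff "positive" and recomputing at each cutoff.

```python
# Invented data: (ferritin value, truly iron deficient per the gold standard).
patients = [
    (8, True), (15, True), (22, True), (35, True), (60, True),
    (28, False), (55, False), (90, False), (120, False), (180, False),
]


def roc_point(cutoff):
    """Return (1 - specificity, sensitivity) when ferritin < cutoff is a positive test."""
    tp = sum(1 for value, deficient in patients if deficient and value < cutoff)
    fn = sum(1 for value, deficient in patients if deficient and value >= cutoff)
    fp = sum(1 for value, deficient in patients if not deficient and value < cutoff)
    tn = sum(1 for value, deficient in patients if not deficient and value >= cutoff)
    return (fp / (fp + tn), tp / (tp + fn))


# One point on the curve per cutoff; plotting them in order traces the ROC.
cutoffs = [10, 20, 30, 50, 100, 150]
roc = [roc_point(c) for c in cutoffs]
for cutoff, (false_pos_rate, sens) in zip(cutoffs, roc):
    print(f"cutoff {cutoff:>3}: 1 - specificity = {false_pos_rate:.1f}, sensitivity = {sens:.1f}")
```

Raising the cutoff catches more of the truly deficient patients (sensitivity climbs) but also mislabels more of the healthy ones (specificity falls).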
Here is an example of a ROC for a typical medical test, where sensitivity and specificity are traded for one another at different thresholds.
At point 6, we have nearly 100% specificity, but only 50% sensitivity. At point 3 we have 70% specificity, but 90% sensitivity.
One can see that the closer that line hugs the left and top parts of the graph, the better the test will be: the more sensitivity and specificity one can simultaneously have.
So why does this all matter in the cesarean section case? Because it demonstrates that there is no absolute to these decisions. Some commenters have tried to make the case that many cesareans are unnecessary, and they are of course correct. Some commenters have made the case that most cesareans are necessary, and they are correct as well. It all depends on where you put your cutpoint, and what the ROC for the decision looks like.
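One common way to put a number on how closely a curve hugs that corner is the area under it: 1.0 for a perfect test, 0.5 for a coin flip. The sketch below uses the trapezoid rule. Two of the points loosely follow the values quoted above (50% sensitivity at about 100% specificity, 90% sensitivity at 70% specificity); the rest are invented for illustration.

```python
def area_under_curve(points):
    """Trapezoid-rule area under ROC points given as (1 - specificity, sensitivity)."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Illustrative points only; each curve runs from (0, 0) to (1, 1).
good_test = [(0.0, 0.0), (0.0, 0.5), (0.1, 0.8), (0.3, 0.9), (1.0, 1.0)]
coin_flip = [(0.0, 0.0), (1.0, 1.0)]

print(f"good test: {area_under_curve(good_test):.2f}")
print(f"coin flip: {area_under_curve(coin_flip):.2f}")
```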
If our #1 outcome is to prevent any neonatal injury from intrapartum asphyxia and infection, we could do cesareans for everybody, at the expense of doing many cesareans that were not necessary. That would be running our setpoint all the way on the right side of the ROC. If our goal was to prevent every cesarean but the ones that were obviously necessary, we could run all the way on the left, doing the minimum number, but also failing to do cesareans for babies that might have ultimately needed them.
For this particular problem, we don’t know exactly what the ROC looks like, because we don’t have a gold standard test that can tell us what will happen to a baby if it is or is not delivered by cesarean. But this idea illustrates some of the difference between me and the OB/GYN commenters and some of the midwifery and doula commenters. OB/GYNs tend to run their setpoint further to the right on the ROC, and midwives and doulas prefer to run further on the left. OB/GYNs go for sensitivity, while the midwives and doulas go for specificity.
In complex decision making, this idea is crucial. Any time you change your decision threshold, you will trade sensitivity for specificity. To make a good decision, one has to be honest about what one fears most: a false positive or a false negative. In OB/GYN, we fear the false negative of the baby that needed a cesarean, that we failed to perform. And that’s why we tend to run to the right. Perhaps a little too far, as I have suggested before.
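Here is one way to sketch that honesty in Python: attach a cost to each kind of error and see which point on the curve comes out cheapest. Everything here is an assumption for illustration, including the curve, the 5% prevalence, and the cost ratios; none of it comes from real obstetric data.

```python
def expected_cost(false_pos_rate, sens, prevalence, cost_fp, cost_fn):
    """Average cost per patient at one operating point on the curve."""
    share_false_pos = (1 - prevalence) * false_pos_rate
    share_false_neg = prevalence * (1 - sens)
    return cost_fp * share_false_pos + cost_fn * share_false_neg


# Hypothetical operating points as (1 - specificity, sensitivity), left to right.
curve = [(0.0, 0.5), (0.1, 0.8), (0.3, 0.9), (0.6, 0.97), (1.0, 1.0)]
prevalence = 0.05  # assumed share of labors that truly need a cesarean


def best_point(cost_fp, cost_fn):
    """Pick the operating point with the lowest expected cost."""
    return min(
        curve,
        key=lambda p: expected_cost(p[0], p[1], prevalence, cost_fp, cost_fn),
    )


# The more a false negative is feared relative to a false positive,
# the further to the right the cheapest operating point sits.
for cost_fn in (1, 100, 1000):
    print(f"false negative costs {cost_fn}x: operate at {best_point(1, cost_fn)}")
```

Under these made-up numbers, weighting the two errors equally lands at the far left of the curve, while a large enough fear of the false negative pushes the choice all the way to the right.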
I notice nobody is jumping in on the comments on this one. 😉 But I do want to thank you for posting it.
My take-home from this is that there is a fancy mathematical model to aid decision-making at various cut-off points for a particular treatment/decision/intervention. And that the ideal treatment/decision/intervention would be one that is high in specificity and high in sensitivity, but in real life, you often sacrifice one for the other.
The problem, as you said, is that the cut-off for the c-section decision is not well defined, and I have the feeling it may not be easily definable, considering how many factors probably go into a decision that, in the end, the doctor probably makes heuristically. Are there any studies that attempt to frame the intervention decision in terms of ROC curves? I’d love to see them.
Thanks again for this post. Learn something new every day.
I think this makes things a bit too simple, insofar as it assumes that all things remain equal other than the outcome (intrapartum asphyxia and infection) and the “test” (CS? – though I’m not sure that CS is rightly described as the test…the analogy seems to be wrong, I would have thought it is the various indicators for CS that are the “tests” and not CS itself).
“If our #1 outcome is to prevent any neonatal injury from intrapartum asphyxia and infection, we could do cesareans for everybody, at the expense of doing many cesareans that were not necessary.” I think you gloss over the real contention. The objection is not to “caesareans that were not necessary” (i.e. we have simply performed an unnecessary ‘test’ – like I drew an extra bottle for a blood test at the lab that wasn’t indicated) but that hidden within that term “unnecessary caesarean” are all the risks for women and babies from caesarean section, that are not balanced by a corresponding benefit.
Sure, there are trade offs between sensitivity and specificity, but these need to be discussed in relation to particular real tests (fetal heartrate monitoring, partograms and definitions of prolonged labour, time from rupture of membranes, maternal risk factors etc…) and not in the abstract as if caesarean section were a “test”.
I’m really at odds as to how to respond to this piece. Not that your explanation of an ROC curve is particularly debatable — you’ve done an excellent job describing it in approachable terms. (Although I do tend to agree with Yehudit’s argument that “the cesarean” is not as appropriate a test as particular tracing patterns, various definitions of prolonged labor, etc.)
I think my “trouble” (if that is the appropriate word for it) stems from the concern that observing where a particular care provider’s decision falls on the curve omits the very pertinent discussion as to where an individual patient’s decision would fall on that same curve. You state the objective nature of this problem succinctly when you say: “Some [midwives and doulas] have tried to make the case that many cesareans are unnecessary, and they are of course correct. Some [ob/gyns] have made the case that most cesareans are necessary, and they are correct as well.” To that I would add “some [patients] have tried to make the case that they would rather do X, Y or Z, and they, too, are correct.” ( ) 😉
I simply am not convinced that science, mathematics and statistics can point us (collectively) to the “right” decision (on the issue of “to cut or not to cut”) every time, because the individual factors in the decision — including the individual mother and the baby — change, by definition, every.single.time.
Hmmm. Post did not come through appropriately. In between the parentheses should be the words “Please do not insert slippery slope argument here.” (to be appended with the smiley)
You all are right that there can be no specific ROC for cesarean delivery, and in that way it is a bit inappropriate, but it does illustrate the issue of sensitivity/specificity tradeoff in the decision to perform a cesarean delivery (as Ms Morvay notes). There is a ROC, but we can never know what it is as there is no gold standard to know if a cesarean was necessary or not, and therefore no way to accurately construct the curve. Nonetheless, ROC concepts are involved in medical decision making, though they may lack the numerical values they have when used in objective tests.
My idea is abstract, but it does help to explain how different people can justifiably have different decision making patterns given the same data. We choose different points on that virtual ROC, as we have different concerns regarding sensitivity and specificity.
Ms Morvay quite correctly stated that we all make our own ROC when we make decisions, and this is constructed not rationally but heuristically. While heuristic thinking allows us to operate quickly in a complex world, it also allows our biases to influence our decision making. This is not to say that this can be avoided, but by being aware of this problem we can often identify such biases prospectively and make better decisions.
“heuristic thinking allows us to operate quickly in a complex world, it also allows our biases to influence our decision making.”
Truer words never spoken, my friend.
I’ll just quickly echo what others have said in that all decisions (and medical decisions in particular) involve trade offs. In some cases, the trade offs are clear and the decision is simple. In other cases the trade-offs are complex and well informed people will arrive at drastically different conclusions. That is also fine.
The problem, as you have stated, is when we allow our biases to overrule quantifiable decisions and end up with less than optimal choices and actions.
Thanks for posting this and I hope you’ll do more posts about specific statistical concepts. They may not be as sexy as some of the other stuff we all like to blog about, but they’re helpful nonetheless.
With that said, I had the exact same reaction as Yehudit when I read this. I think part of the reason we end up on different points along the ROC curve is that we perceive and measure the trade-offs differently. I recently commented on a post at the Orthopedic Posterous about how we define success. I think it is relevant here because how we define the risks/tradeoffs of a procedure will inform how we define whether or not a procedure has been successful or worthwhile. In reality, you can get your live baby and live mother discharged home on time with many different approaches to care, but what will fluctuate is the amount of additional risk or minor (often unquantified) injury or waste that you have added to the mix. It seems to me, as someone who is no expert in ROC, that the curve is most helpful in the cesarean context at the ends where it hugs the axes, and less relevant in the middle.
Thanks!
One of the underlying concepts of a ROC is that there is only one for a given test – it is what it is. When we apply it to something like a decision for cesarean, it gets sort of fuzzy though, as in this case it is more of a metaphorical concept than an actual mathematical formula. The ROC for an analytical test is absolute. The ROC for an intellectual decision is more of a concept than an actual thing.
The best ROCs will hug the upper left corner of the graph, allowing one to have both high sensitivity and specificity at the same time. In that sense, the middle of the curve is actually the most important part, as it is the part that comes closest to that upper left corner. It is around this point that the most tradeoff is made between sensitivity and specificity.
>>> In reality, you can get your live baby and live mother discharged home on time with many different approaches to care, but what will fluctuate is the amount of additional risk or minor (often unquantified) injury or waste that you have added to the mix.
The concept of a ROC ignores an individual case – in our metaphor it is not about getting any single baby and mother discharged home. It is about the decision making process in general, and how wherever we choose to set our threshold for making a decision, we will always be trading false positives for false negatives and vice versa.
Also, “unnecessary caesareans”, thought of in relation to the ROC curve, may not be the right concept. I have seen very, very few, if any, caesarean (or instrumental) deliveries that were truly medically unjustifiable at the time the decision for caesarean was made. However, I have been at many caesareans that, on the basis of current evidence, may have been preventable a couple of steps prior to their becoming necessary. For example, we know that induction with a low Bishop score increases risk of caesarean section, we know that women are more likely to have a spontaneous vaginal birth if they have intermittent monitoring rather than continuous, we know that 1-2-1 labour support increases spontaneous birth, and that where you place action lines on a partogram and whether you admit while not in established labour also make a difference. So, in that sense, there are plenty of preventable caesareans resulting from suboptimal care, which are nonetheless necessary caesareans at the time the decision to go for section is made.
I like the way you put that, and agree in part that at least some cesareans are preventable by changing the starting conditions. Unfavorable inductions are a problem, particularly when there is not a strong maternal indication for delivery. Your other points are well taken as well – and most of them go together. Intermittent monitoring is fine for spontaneous labor, but we get off that track by inducing people and admitting people who are not yet in labor, thus leading us to augment labor with pitocin to get things going. Pre-labor admissions are almost always driven by patient request and dislike of being sent home when they perceive themselves to be in labor. This often originates, though, in a lack of education during the pregnancy about the labor process and when is the best time to get admitted for labor.
It’s interesting. In discussion with colleagues working at different units, I find that prelabour admissions (or not) are in some large measure dependent on unit norms. Where I am, we “encourage” women to go home, but we also will give them a shot of pethidine and a bed on the antenatal ward if they really “won’t” go home. (So, when we discuss how they should go home, that option is always in the back of our mind.) Consequently, we have a fair number of women admitted when not in established labour. On the other hand, we are absolutely rigorous about not starting the partogram (and hence any augmentation) until someone is in active labour.
Friends who work in hospitals that simply DO NOT admit in prelabour tell me that women don’t show up until they are in active labour, they don’t come in when they think they ‘might be’ in labour, but when they know they are. Now, I’d like to see the figures from audit, not from anecdote, but they are genuinely incredulous when I tell them about our difficulties in keeping women not in established labour out of the hospital!
I think women do ‘get it’ when you tell them that hospital is not the best place to be in latent phase of labour, you are right – it comes down to education. And the motivation of the caregivers in giving that education.
Just wanted to say I think you did a great job of explaining ROC. For an understandable description of usage and reason you did a better job than the textbook I learned it from!
I agree with your comment that midwives tend to “run to the left” in decision making about whether or not or when to call a C-section. I would argue that it relates directly to the idea of how success is defined. Most OBs consider any birth that results in a healthy mother and a healthy baby to be successful. However, there are many mothers who have a healthy baby yet are extremely dissatisfied with their birth experience, some to the point of experiencing PTSD. Most midwives would argue that a healthy baby and mother is a necessary but insufficient definition of success. The midwifery model of care by definition is concerned that the mother’s choices and experience are respected (autonomy) as well as promoting conditions that favor early attachment behaviors and breastfeeding. Most of my clients who have C-sections are convinced that the C-section was necessary because they know that as long as FHT indicate that baby is doing well, we have tried everything possible to promote a vaginal birth before recommending a C-section. I have never had a woman refuse or resist a C-section when FHT indicate that baby is not tolerating labor. Any laboring woman who requests a C-section is immediately evaluated by my back up OB and usually has her C-section. Women who feel that their autonomy has been respected are more likely to feel satisfied with their birth regardless of the method of delivery. Studies have shown that nurse-midwives have very good outcomes with higher maternal satisfaction scores than OBs (acknowledging that there is selection bias in this outcome since midwives care for mostly low-risk clients who choose midwifery care). Nevertheless, rather than trying to change any provider’s individual tolerance/decision point, I think that what will most effectively reduce the incidence of C-sections is avoidance of those factors that we know increase the risk for C-section as discussed above.
It is especially important to stand firm on not inducing an unfavorable cervix unless immediate delivery is medically indicated!