Gospel science: We found only one-third of published psychology research is reliable – now what?

What does it mean if the majority of what’s published in journals can’t be reproduced?

By Maggie Villiger | The Conversation | August 27, 2015

The ability to repeat a study and find the same results twice is a prerequisite for building scientific knowledge. Replication allows us to ensure empirical findings are reliable and refines our understanding of when a finding occurs. It may surprise you to learn, then, that scientists do not often conduct – much less publish – attempted replications of existing studies.

Journals prefer to publish novel, cutting-edge research. And professional advancement is determined by making new discoveries, not painstakingly confirming claims that are already on the books. As one of our colleagues recently put it, “Running replications is fine for other people, but I have better ways to spend my precious time.”

Once a paper appears in a peer-reviewed journal, it acquires a kind of magical, unassailable authority. News outlets, and sometimes even scientists themselves, will cite these findings without a trace of skepticism. Such unquestioning confidence in new studies is likely undeserved, or at least premature.

A small but vocal contingent of researchers – addressing fields ranging from physics to medicine to economics – has maintained that many, perhaps most, published studies are wrong. But how bad is this problem, exactly? And what features make a study more or less likely to turn out to be true?

We are two of the 270 researchers who together have just published in the journal Science the first-ever large-scale effort to answer these questions by attempting to reproduce 100 previously published psychological science findings.

Attempting to re-find psychology findings

Publishing together as the Open Science Collaboration and coordinated by social psychologist Brian Nosek from the Center for Open Science, research teams from around the world each ran a replication of a study published in one of three top psychology journals: Psychological Science; Journal of Personality and Social Psychology; and Journal of Experimental Psychology: Learning, Memory, and Cognition. To ensure each replication was as exact as possible, research teams obtained study materials from the original authors and worked closely with those authors whenever they could.

Almost all of the original published studies (97%) had statistically significant results. This is as you’d expect – while many experiments fail to uncover meaningful results, scientists tend only to publish the ones that do.

What we found is that when these 100 studies were rerun by other researchers, only 36% reached statistical significance. This number is alarmingly low. Put another way, only around one-third of the rerun studies came out with the same results that were found the first time around. That rate is especially low when you consider that, once published, findings tend to be held as gospel.

The bad news doesn’t end there. Even when the new study found evidence for the existence of the original finding, the magnitude of the effect was much smaller — half the size of the original, on average.
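To see why a halved effect so often means a failed replication, consider a rough power simulation. This is a minimal sketch in Python using entirely hypothetical numbers (30 participants per group, true effects of d = 0.5 versus d = 0.25), not the project's actual data:

```python
# Hypothetical power sketch: how often does a two-group study reach
# p < .05 when the true effect is d = 0.5, versus half that size
# (d = 0.25), with 30 participants per group? Numbers are invented
# for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def replication_rate(true_d, n_per_group=30, n_sims=5000, alpha=0.05):
    """Fraction of simulated studies that reach p < alpha."""
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(true_d, 1.0, n_per_group)
        _, p = stats.ttest_ind(treatment, control)
        if p < alpha:
            hits += 1
    return hits / n_sims

print("chance of significance at d = 0.50:", replication_rate(0.50))  # roughly 0.48
print("chance of significance at d = 0.25:", replication_rate(0.25))  # roughly 0.16
```

Under these invented assumptions, an effect half the original size drops the chance of reaching p < .05 from roughly one in two to roughly one in six, even though the effect is perfectly real.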

One caveat: just because something fails to replicate doesn’t mean it isn’t true. Some of these failures could be due to luck, or poor execution, or an incomplete understanding of the circumstances needed to show the effect (scientists call these “moderators” or “boundary conditions”). For example, having someone practice a task repeatedly might improve their memory, but only if they didn’t know the task well to begin with. In a way, these replications (and failed replications) simply highlight the inherent uncertainty of any single study – original or new.

More robust findings more replicable

Given how low these numbers are, is there anything we can do to predict the studies that will replicate and those that won’t? The results from this Reproducibility Project offer some clues.

There are two major ways that researchers quantify the nature of their results. The first is a p-value, which estimates the probability that the result was arrived at purely by chance and is a false positive. (Technically, the p-value is the chance that the result, or a stronger result, would have occurred even when there was no real effect.) Generally, if a statistical test shows that the p-value is lower than 5%, the study’s results are considered “significant” – most likely due to actual effects.

Another way to quantify a result is with an effect size – not how reliable the difference is, but how big it is. Let’s say you find that people spend more money in a sad mood. Well, how much more money do they spend? This is the effect size.
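To make the two numbers concrete, here is a small worked example in Python. The “spending” figures are invented purely for illustration, not drawn from any study; the code computes both a p-value (via a two-sample t-test) and an effect size (Cohen's d):

```python
# Illustrative only: invented spending data for a sad-mood group and a
# neutral-mood group, used to show how a p-value and an effect size
# (Cohen's d) answer different questions.
import numpy as np
from scipy import stats

sad_mood = np.array([22.0, 35.0, 28.0, 41.0, 30.0, 38.0, 27.0, 33.0])
neutral  = np.array([18.0, 25.0, 21.0, 30.0, 24.0, 27.0, 20.0, 23.0])

# p-value: how surprising would a difference this large be if mood
# actually had no effect on spending?
t_stat, p_value = stats.ttest_ind(sad_mood, neutral)

# Effect size (Cohen's d): how big is the difference, in units of the
# pooled standard deviation?
pooled_sd = np.sqrt((sad_mood.var(ddof=1) + neutral.var(ddof=1)) / 2)
cohens_d = (sad_mood.mean() - neutral.mean()) / pooled_sd

print(f"p-value: {p_value:.3f}")     # how reliable the difference is
print(f"Cohen's d: {cohens_d:.2f}")  # how big the difference is
```

In short, the p-value speaks to how confident we can be that there is any difference at all, while the effect size tells us how much extra money the sad group actually spent relative to the ordinary spread in spending.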

We found that the smaller the original study’s p-value and the larger its effect size, the more likely it was to replicate. Strong initial statistical evidence was a good marker of whether a finding was reproducible.

Studies that were rated as more challenging to conduct were less likely to replicate, as were findings that were considered surprising. For instance, if a study shows that reading lowers IQs, or if it uses a very obscure and unfamiliar methodology, we would do well to be skeptical of such data. Scientists are often rewarded for delivering results that dazzle and defy expectation, but extraordinary claims require extraordinary evidence.

Although our replication effort is novel in its scope and level of transparency – the methods and data for all replicated studies are available online – our results are consistent with previous work from other fields. Cancer biologists, for instance, have reported replication rates as low as 11%–25%.

We have a problem. What’s the solution?

Some conclusions seem warranted here.

We must stop treating single studies as unassailable authorities of the truth. Until a discovery has been thoroughly vetted and repeatedly observed, we should treat it with the measure of skepticism that scientific thinking requires. After all, the truly scientific mindset is critical, not credulous. There is a place for breakthrough findings and cutting-edge theories, but there is also merit in the slow, systematic checking and refining of those findings and theories.

Of course, adopting a skeptical attitude will take us only so far. We also need to provide incentives for reproducible science by rewarding those who conduct replications and who conduct replicable work. For instance, at least one top journal has begun to give special “badges” to articles that make their data and materials available, and the Berkeley Initiative for Transparency in the Social Sciences has established a prize for practicing more transparent social science.

Better research practices are also likely to ensure higher replication rates. There is already evidence that taking certain concrete steps – such as making hypotheses clear prior to data analysis, openly sharing materials and data, and following transparent reporting standards – decreases false positive rates in published studies. Some funding organizations are already demanding hypothesis registration and data sharing.

Although perfect replicability in published papers is an unrealistic goal, current replication rates are unacceptably low. The first step, as they say, is admitting you have a problem. What scientists and the public now choose to do with this information remains to be seen, but our collective response will guide the course of future scientific progress.

August 29, 2015 | Corruption, Deception, Science and Pseudo-Science

Five Important Questions About DEA’s Vehicle Surveillance Program

By Rachel Levinson-Waldman | Just Security | January 30, 2015

With each week, we seem to learn about a new government location tracking program. This time, it’s the expanded use of license plate readers. According to the Wall Street Journal, relying on interviews with officials and documents obtained by the ACLU through a FOIA request, the Drug Enforcement Administration has been collecting hundreds of millions of records about cars traveling on U.S. roads. The uses for the data sound compelling: combating drug and weapons trafficking and finding suspects in serious crimes. But as usual, the devil is in the details, and plenty of important questions remain about those details.

First, who approved the program, and under what circumstances? We don’t know. The DEA is an arm of the Department of Justice, so presumably the Attorney General’s office has been involved, but details aren’t yet available. Also unknown is whether there has been any judicial oversight.

Second, are there any limitations on how the data can be used? This is also unknown. The emails obtained by the ACLU indicate that the main purpose of the program was to assist in seizures of cars, money, and other assets, often from people not charged with any crime – a practice that has come under withering criticism. But the history of data collection programs is that information collected for one purpose quickly becomes attractive for other purposes. And the more information available (even for proper purposes), the more is available for misuse as well. Indeed, license plate information has been abused in the past, with peaceful protestors’ data shared with the FBI.

Third, how long can it be kept? The article reports that the DEA holds the data for three months, a significant drop from its previous two-year retention period. Much of this data is coming from readers set up by state and local law enforcement, though, and the retention periods for those jurisdictions are an inconsistent patchwork, with deletion times ranging from immediate (Ohio state patrol) to 90 days (Boston) to two years (Los Angeles County) to five years (New York City) to never (New York State Police). This is especially alarming given that a vanishingly small percentage of the millions of license plates scanned are actually connected to any crime or wrongdoing. At the same time, data collected by DEA reportedly goes back to state and local jurisdictions as well, setting up an endless loop of information with inadequate oversight. 

Fourth, where else does the data go? Some of it is sent to fusion centers, which are state- or regional-based hubs that centralize information for sharing among the federal government, states, and private partners. Originally established in the wake of 9/11, fusion centers have largely abandoned their focus on terrorism for want of credible threats; they have instead transformed into an “all threats” model. In the process, they have been roundly criticized for wasting money, contributing little to counterterrorism efforts, and endangering both civil liberties and Privacy Act protections. Maryland and Vermont are known to feed their plate data to fusion centers, and the numbers are likely higher, given fusion centers’ voracity for data.

Finally, which other federal agencies are using license plate readers? We know that the Department of Homeland Security is using them as part of its border enforcement. As of early 2009, nearly 100% of cars crossing the border were scanned with a license plate reader. And both DEA and DHS license plate readers can be coupled with cameras that provide pictures of the occupants of vehicles being scanned.

Of course, the DEA database is only the latest in a string of disclosures that, taken together, reveal a web of powerful surveillance capabilities. Late last year, the Wall Street Journal revealed that the U.S. Marshals Service is using a secretive technology that sweeps up information about thousands of innocent Americans’ cell phones in the process of searching for suspects. As with the license plate reader scheme, little is known about the specifics of this program.

And just last week, USA Today revealed that at least 50 law enforcement agencies, including the FBI and the U.S. Marshals Service, have obtained radar devices that allow them to detect any human movements inside a house, even motion as minimal as breathing, from more than 50 feet away. In at least one case, the device was used without a warrant to case a home for the presence of a suspected parolee.

Senators Chuck Grassley (R-Iowa) and Patrick Leahy (D-Vt.) have already expressed concern about this technology, and it’s hard to see how its use without a warrant passes constitutional muster. As the Tenth Circuit observed in a recently published case weighing the use of the radar technology, the Supreme Court has already disapproved of the use of a thermal imaging device to capture details of life within a home. Perhaps even more salient, the Court earlier established that tracking technology (known as a beeper) cannot be used without a warrant to confirm a person’s presence inside a private home, if obtaining that information would otherwise require entry into the home. It’s a little mystifying that using a high-powered radar for the same purpose would be kosher.

Taken together, these stories suggest a zone of privacy that is narrowing so much as to be almost imperceptible. Separate from the question of how these technologies are actually being used, the breadth of surveillance capabilities they provide is staggering. You can be tracked on the streets; in your home; on your phone; and almost anywhere else. We seem to forever be caught in a kind of vicious cycle: it’s too early to criticize technologies when they’ve just been introduced and there’s no record of misuse, but once they’ve been in place for even a year or two, they take on an air of inevitability. … Full article

Rachel Levinson-Waldman serves as Counsel to the Brennan Center’s Liberty and National Security Program, which seeks to advance effective national security policies that respect constitutional values and the rule of law.

February 2, 2015 | Civil Liberties, Full Spectrum Dominance