Alexandru Marcoci

About

I am a Postdoctoral Researcher in the Centre for Argument Technology (ARG-Tech) at the University of Dundee studying collective decision-making and argumentation. My work often crosses traditional disciplinary boundaries and is situated at the intersection of philosophy, public policy and decision theory.

At ARG-Tech I mainly work on the AHRC-DFG-funded project Trajectories of Conflict: The Dynamics of Argumentation in the UN Security Council. You can watch a video, produced by Pindex and narrated by Stephen Fry, that contextualizes our work here.

I collaborated with the DARPA-funded repliCATS project on using structured expert elicitation techniques to predict the reliability of research in the social and behavioural sciences (including research into COVID-19). You can watch a video that summarises the aims and achievements of the repliCATS project here.

Before coming to ARG-Tech I was a Teaching Assistant Professor and a core faculty member in the Philosophy, Politics and Economics Program at the University of North Carolina, Chapel Hill (UNC) and a Fellow in Political Theory in the Department of Government at the London School of Economics and Political Science (LSE). I have a PhD in Philosophy from LSE (2018).

Teaching

PHIL 165 Bioethics (UNC: Spring 2020, 2021)

PHIL/POLI/PWAD 272 The Ethics of Peace, War, and Defense (UNC: Spring, Fall 2019; Fall 2020)

PHIL/POLI/ECON 698 Philosophy, Politics and Economics: Capstone Course (UNC: Spring 2020)

PHIL 273 Justice, Rights, and the Common Good: Philosophical Perspectives on Social and Economic Issues (UNC: Fall 2019)

PHIL 273 Philosophical Perspectives on Justice (UNC: Fall 2018)

GV100 Introduction to Political Theory (LSE: 2015-2018)

PH231 Evidence and Policy (LSE: Lent Term 2015)

PH201 Philosophy of Science (LSE: 2013-2014)

PH100 Logic (LSE: 2012-2015)

Publications

Hannah Fraser, Martin Bush, Bonnie Wintle, Fallon Mody, Eden Smith, Anca Hanea, Elliot Gould, Victoria Hemming, Dan Hamilton, Libby Rumpff, David Peter Wilkinson, Ross Pearson, Felix Singleton Thorn, Raquel Ashton, Aaron Willcox, Charles T Gray, Andrew Head, Melissa Ross, Rebecca Groenewegen, Alexandru Marcoci, Ans Vercammen, Timothy H Parker, Rink Hoekstra, Shinichi Nakagawa, David R Mandel, Don van Ravenzwaaij, Marissa McBride, Richard O Sinnott, Peter Vesk, Mark Burgman and Fiona Fidler. Predicting reliability through structured expert elicitation with the repliCATS (Collaborative Assessments for Trustworthy Science) process. Forthcoming in PLoS ONE Abstract

As replications of individual studies are resource intensive, techniques for predicting replicability are required. We introduce the repliCATS (Collaborative Assessments for Trustworthy Science) process, a new method for eliciting expert predictions about the replicability of research. This process is a structured expert elicitation approach based on a modified Delphi technique applied to the evaluation of research claims in social and behavioural sciences. The utility of processes to predict replicability is their capacity to test scientific claims without the costs of full replication. Experimental data supports the validity of this process, with accuracy that meets or exceeds that of other techniques used to predict replicability while providing additional benefits. The repliCATS process is highly scalable, able to be deployed for both rapid assessment of small numbers of claims, and assessment of high volumes of claims over an extended period through an online elicitation platform. It is available to be implemented in a range of ways and we describe one such implementation. An important advantage of the repliCATS process is that it collects qualitative data that has the potential to assist with problems like understanding the limits of generalizability of scientific claims. The primary limitation of the repliCATS process is its reliance on human-derived predictions, with consequent costs in terms of participant fatigue, although careful design can minimise these costs. The repliCATS process has potential applications in alternative peer review and in the allocation of effort for replication studies.

Mark Burgman, Rafael Chiaravalloti, Fiona Fidler, Yizhong Huan, Marissa McBride, Alexandru Marcoci, Juliet Norman, Ans Vercammen, Bonnie C. Wintle and Yurong Yu. A toolkit for open and pluralistic conservation science. Forthcoming in Conservation Letters Abstract

Conservation science practitioners seek to pre-empt irreversible impacts on species, ecosystems, and social-ecological systems, requiring efficient and timely action even when data and understanding are unavailable, incomplete, dated, or biased. These challenges are exacerbated by limits on the scientific community's capacity to consistently distinguish between reliable and unreliable evidence, including the recognition of questionable research practices (QRPs, or ‘questionable practices’), which may threaten the credibility of research and harm trust in well-designed and reliable scientific research. In this paper, we propose a ‘toolkit’ for open and pluralistic conservation science, highlighting common questionable practices and sources of bias and indicating where remedies for these problems may be found. The toolkit provides an accessible resource for anyone conducting, reviewing, or using conservation research, to identify sources of false claims or misleading evidence that arise unintentionally, or through misunderstandings or carelessness in the application of scientific methods and analyses. We aim to influence editorial and review practices and hopefully to remedy problems before they are published or deployed in policy or conservation practice.

Luc Bovens and Alexandru Marcoci. The Gender-Neutral Bathroom: A New Frame and Some Nudges. Forthcoming in Behavioural Public Policy, 1-24 Abstract

Gender-neutral bathrooms are usually framed as an accommodation for trans and other gender non-conforming individuals. In this paper we show that the benefits of gender-neutral bathrooms are much broader. First, our simulations show that gender-neutral bathrooms reduce average waiting times: while waiting times for women go down invariably, waiting times for men either go down or slightly increase depending on usage intensity, occupancy time differentials, and the presence of urinals. Second, our result can be turned on its head: firms have an opportunity to reduce the number of facilities and cut costs by making them all gender-neutral without increasing waiting times. These observations can be used to reframe the gender-neutral bathrooms debate so that they appeal to a larger constituency, cutting across the usual dividing lines in the “bathroom wars”. Finally, there are improved designs and behavioural strategies that can help overcome resistance. We explore what strategies can be invoked to mitigate the objections that gender-neutral bathrooms (1) are unsafe; (2) elicit discomfort; and (3) are unhygienic.
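The pooling effect behind these results can be illustrated with a toy discrete-event simulation. This is a minimal sketch, not the paper's model; the arrival rates, occupancy times, and stall counts below are invented assumptions for illustration only:

```python
import heapq
import random

def sim_wait(n_servers, arrivals, service_times):
    """Average wait when arrivals (sorted times) are served FIFO by n_servers stalls."""
    free_at = [0.0] * n_servers              # when each stall next becomes free
    heapq.heapify(free_at)
    total_wait = 0.0
    for t, s in zip(arrivals, service_times):
        free = heapq.heappop(free_at)        # earliest-available stall
        start = max(t, free)
        total_wait += start - t
        heapq.heappush(free_at, start + s)
    return total_wait / len(arrivals)

def poisson_arrivals(rate, horizon, rng):
    t, out = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return out
        out.append(t)

rng = random.Random(0)
horizon = 10_000                              # minutes of simulated time
women = poisson_arrivals(0.8, horizon, rng)   # hypothetical arrivals per minute
men = poisson_arrivals(0.8, horizon, rng)
w_service = [rng.expovariate(1 / 1.5) for _ in women]   # longer mean occupancy
m_service = [rng.expovariate(1 / 1.0) for _ in men]

# Segregated: 3 stalls per gender
seg_w = sim_wait(3, women, w_service)
seg_m = sim_wait(3, men, m_service)

# Gender-neutral: all 6 stalls pooled, arrival streams merged in time order
merged = sorted(zip(women + men, w_service + m_service))
pooled = sim_wait(6, [t for t, _ in merged], [s for _, s in merged])

print(f"segregated: women {seg_w:.2f} min, men {seg_m:.2f} min; pooled {pooled:.2f} min")
```

Because the pooled configuration shares every stall across the merged arrival stream, idle capacity on one side absorbs queues on the other, so women's average wait falls; whether men's wait rises slightly depends on the occupancy-time differential, as the abstract notes.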

Alexandru Marcoci, Margaret E. Webb, Luke Rowe, Ashley Barnett, Tamar Primoratz, Ariel Kruger, Benjamin Stone, Morgan Saletta, Tim van Gelder, Simon Dennis. (2022). Measuring Quality of General Reasoning. In J. Culbertson, A. Perfors, H. Rabagliati & V. Ramenzoni (Eds.), Proceedings of the 44th Annual Conference of the Cognitive Science Society (CogSci 2022), 3229-3235 Abstract

Machine learning models that automatically assess reasoning quality are trained on human-annotated written products. These “gold-standard” corpora are typically created by prompting annotators to choose, using a forced choice design, which of two products presented side by side is the most convincing, contains the strongest evidence or would be adopted by more people. Despite the increase in popularity of using a forced choice design for assessing quality of reasoning (QoR), no study to date has established the validity and reliability of such a method. In two studies, we simultaneously presented two products of reasoning to participants and asked them to identify which product was ‘better justified’ through a forced choice design. We investigated the criterion validity and inter-rater reliability of the forced choice protocol by assessing the relationship between QoR, measured using the forced choice protocol, and accuracy in objectively answerable problems using naive raters sampled from MTurk (Study 1) and experts (Study 2), respectively. In both studies products that were closer to the correct answer and products generated by larger teams were consistently preferred. Experts were substantially better at picking the reasoning products that corresponded to accurate answers. Perhaps the most surprising finding was just how rapidly raters made judgements regarding reasoning: On average, both novices and experts made reliable decisions in under 15 seconds. We conclude that forced choice is a valid and reliable method of assessing QoR.

Alexandru Marcoci, Ans Vercammen, Martin Bush, Daniel Hamilton, Anca Hanea, Victoria Hemming, Bonnie C. Wintle, Mark Burgman and Fiona Fidler. (2022). Reimagining peer review as an expert elicitation process. BMC Research Notes 15, 127 (SI: Reproducibility and Research Integrity) Abstract

Journal peer review regulates the flow of ideas through an academic discipline and thus has the power to shape what a research community knows, actively investigates, and recommends to policymakers and the wider public. We might assume that editors can identify the ‘best’ experts and rely on them for peer review. But decades of research on both expert decision-making and peer review suggest they cannot. In the absence of a clear criterion for demarcating reliable, insightful, and accurate expert assessors of research quality, the best safeguard against unwanted biases, uneven power distributions and general inefficiencies is to introduce greater transparency and structure into the process. This paper argues that peer review would therefore benefit from applying a series of evidence-based recommendations from the empirical literature on structured expert elicitation. We highlight individual and group characteristics that contribute to higher quality judgements, and elements of elicitation protocols that reduce bias, promote constructive discussion, and enable opinions to be objectively and transparently aggregated.

Ans Vercammen, Alexandru Marcoci and Mark Burgman. (2021). Pre-screening workers to overcome bias amplification in online labour markets. PLoS ONE 16(3), e0249051. Abstract

Groups have access to more diverse information and typically outperform individuals on problem solving tasks. Crowdsolving utilises this principle to generate novel and/or superior solutions to intellective tasks by pooling the inputs from a distributed online crowd. However, it is unclear whether this particular instance of “wisdom of the crowd” can overcome the influence of potent cognitive biases that habitually lead individuals to commit reasoning errors. We empirically test the prevalence of cognitive bias on a popular crowdsourcing platform, examining susceptibility to bias of online panels at the individual and aggregate levels. We then investigate the use of the Cognitive Reflection Test, notable for its predictive validity for real-life reasoning, as a screening tool to improve collective performance. We find that systematic biases in crowdsourced answers are not as prevalent as anticipated, but when they occur, biases are amplified with increasing group size, as predicted by the Condorcet Jury Theorem. The results further suggest that pre-screening individuals with the Cognitive Reflection Test can substantially enhance collective judgement and improve crowdsolving performance.
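The amplification effect follows directly from the Condorcet Jury Theorem: when each voter is correct with probability above 1/2, majority accuracy grows with group size, but when a bias pushes individual accuracy below 1/2, larger majorities are wrong more often than any individual. A minimal numeric illustration (the probabilities are invented, not the study's data):

```python
import math

def majority_correct(p, n):
    """P(majority of n independent voters is correct) when each is correct w.p. p."""
    k = n // 2 + 1                       # votes needed for a strict majority
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

for p in (0.6, 0.4):                     # 0.6: competent crowd; 0.4: biased toward error
    for n in (1, 11, 101):
        print(f"p={p}, n={n}: majority correct {majority_correct(p, n):.3f}")
```

With p = 0.6 the majority's accuracy climbs toward certainty as the group grows; with p = 0.4 it collapses toward zero, which is the amplification of bias the study observed in larger crowds.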

Diana Popescu and Alexandru Marcoci. (2020). Coronavirus: allocating ICU beds and ventilators based on age is discriminatory. The Conversation, April 22 Lead

Being a member of a certain age group shouldn't be a liability.

Alexandru Marcoci and James Nguyen. (2020). Judgement aggregation in scientific collaborations: The case for waiving expertise. Studies in History and Philosophy of Science Part A 84, 66-74 Abstract

The fragmentation of academic disciplines forces individuals to specialise. In doing so, they become experts over their narrow area of research. However, ambitious scientific projects, such as the search for gravitational waves, require them to come together and collaborate across disciplinary borders. How should scientists with expertise in different disciplines treat each others' expert claims? An intuitive answer is that the collaboration should defer to the opinions of experts. In this paper we show that under certain seemingly innocuous assumptions, this intuitive answer gives rise to an impossibility result when it comes to aggregating the beliefs of experts to deliver the beliefs of a collaboration as a whole. We then argue that when experts' beliefs come into conflict, they should waive their expert status.

Alexandru Marcoci. (2020). Monty Hall saves Dr. Evil: On Elga's restricted principle of indifference. Erkenntnis 85(1), 65-76 Abstract

In this paper I show that Elga's argument for a restricted principle of indifference for self-locating belief relies on the kind of mistaken reasoning that recommends the 'staying' strategy in the Monty Hall problem.
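The Monty Hall fact the argument turns on is easy to verify by simulation (a standalone illustration, not taken from the paper): 'staying' wins only when the initial pick was the car, which happens with probability 1/3.

```python
import random

def monty_trial(switch, rng):
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(42)
n = 100_000
stay = sum(monty_trial(False, rng) for _ in range(n)) / n
swap = sum(monty_trial(True, rng) for _ in range(n)) / n
print(f"stay wins {stay:.3f}, switch wins {swap:.3f}")   # ≈ 1/3 vs ≈ 2/3
```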

Gregg Willcox, Louis Rosenberg, Mark Burgman and Alexandru Marcoci. (2020). Prioritizing Policy Objectives in Polarized Societies using Artificial Swarm Intelligence. In the Proceedings of the IEEE Conference on Cognitive and Computational Aspects of Situation Management (CogSIMA 2020), 1-9 Abstract

Groups often struggle to reach decisions, especially when populations are strongly divided by conflicting views. Traditional methods for collective decision-making involve polling individuals and aggregating results. In recent years, a new method called Artificial Swarm Intelligence (ASI) has been developed that enables networked human groups to deliberate in real-time systems, moderated by artificial intelligence algorithms. While traditional voting methods aggregate input provided by isolated participants, Swarm-based methods enable participants to influence each other and converge on solutions together. In this study we compare the output of traditional methods such as Majority vote and Borda count to the Swarm method on a set of divisive policy issues. We find that the rankings generated using ASI and the Borda Count methods are often rated as significantly more satisfactory than those generated by the Majority vote system (p<0.05). This result held for both the population that generated the rankings (the “in-group”) and the population that did not (the “out-group”): the in-group ranked the Swarm prioritizations as 9.6% more satisfactory than the Majority prioritizations, while the out-group ranked the Swarm prioritizations as 6.5% more satisfactory than the Majority prioritizations. This effect also held even when the out-group was subject to a demographic sampling bias of 10% (i.e. the out-group was composed of 10% more Labour voters than the in-group). The Swarm method was the only method to be perceived as more satisfactory to the “out-group” than the voting group.
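For readers unfamiliar with the two baselines, plurality ("Majority vote" over first preferences) and Borda count can be computed from ranked ballots in a few lines. This is a generic sketch with an invented electorate, not the study's data:

```python
from collections import Counter

def plurality(ballots):
    """Rank options by first-place votes only."""
    firsts = Counter(b[0] for b in ballots)
    return sorted(firsts, key=firsts.get, reverse=True)

def borda(ballots):
    """Rank options by Borda score: m-1 points for 1st place, m-2 for 2nd, ..."""
    m = len(ballots[0])
    scores = Counter()
    for b in ballots:
        for rank, option in enumerate(b):
            scores[option] += m - 1 - rank
    return sorted(scores, key=scores.get, reverse=True)

# A polarized hypothetical electorate: two opposed camps, plus a bloc
# that puts the broadly acceptable compromise C first.
ballots = [["A", "C", "B"]] * 40 + [["B", "C", "A"]] * 35 + [["C", "A", "B"]] * 25
print(plurality(ballots))
print(borda(ballots))
```

In this toy profile the plurality winner A is ranked last by a large minority, while Borda elevates the compromise option C; divergences of this kind are what the study's in-group and out-group satisfaction ratings probe.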

Alexandru Marcoci and James Nguyen. (2019). Objectivity, ambiguity and theory choice. Erkenntnis 84(2), 343–357 Abstract

Kuhn argued that scientific theory choice is, in some sense, a rational matter, but one that is not fully determined by shared objective scientific virtues like accuracy, simplicity, and scope. Okasha imports Arrow's impossibility theorem into the context of theory choice to show that rather than not fully determining theory choice, these virtues cannot determine it at all. If Okasha is right, then there is no function (satisfying certain desirable conditions) from 'preference' rankings supplied by scientific virtues over competing theories (or models, or hypotheses) to a single all-things-considered ranking. This threatens the rationality of science. In this paper we show that if Kuhn's claims about the role that subjective elements play in theory choice are taken seriously, then the threat dissolves.

Alexandru Marcoci, Ans Vercammen and Mark Burgman. (2019). ODNI as an analytic ombudsman: Is Intelligence Community Directive 203 up to the task? Intelligence and National Security 34(2), 205-224 Abstract

In the wake of 9/11 and the war in Iraq, the Office of the Director of National Intelligence adopted Intelligence Community Directive (ICD) 203 – a list of analytic tradecraft standards – and appointed an ombudsman charged with monitoring their implementation. In this paper, we identify three assumptions behind ICD203: (1) tradecraft standards can be employed consistently; (2) tradecraft standards sufficiently capture the key elements of good reasoning; and (3) good reasoning leads to more accurate judgments. We then report on two controlled experiments that uncover operational constraints in the reliable application of the ICD203 criteria for the assessment of intelligence products.

Alexandru Marcoci, Mark Burgman, Ariel Kruger, Elizabeth Silver, Marissa McBride, Felix Singleton Thorn, Hannah Fraser, Bonnie Wintle, Fiona Fidler and Ans Vercammen. (2019). Better together: Reliable application of the post-9/11 and post-Iraq US intelligence tradecraft standards requires collective analysis. Frontiers in Psychology 9, 2634 (SI: Judgment and Decision Making Under Uncertainty) Abstract

Background. The events of 9/11 and the October 2002 National Intelligence Estimate on Iraq's Continuing Programs for Weapons of Mass Destruction precipitated fundamental changes within the US Intelligence Community. As part of the reform, analytic tradecraft standards were revised and codified into a policy document – Intelligence Community Directive (ICD) 203 – and an analytic ombudsman was appointed in the newly created Office for the Director of National Intelligence to ensure compliance across the intelligence community. In this paper we investigate the untested assumption that the ICD203 criteria can facilitate reliable evaluations of analytic products.
Method. Fifteen independent raters used a rubric based on the ICD203 criteria to assess the quality of reasoning of 64 analytical reports generated in response to hypothetical intelligence problems. We calculated the intra-class correlation coefficients for single and group-aggregated assessments.
Results. Despite general training and rater calibration, the reliability of individual assessments was poor. However, aggregate ratings showed good to excellent reliability.
Conclusions. Given that real problems will be more difficult and complex than our hypothetical case studies, we advise that groups of at least three raters are required to obtain reliable quality control procedures for intelligence products. Our study sets limits on assessment reliability and provides a basis for further evaluation of the predictive validity of intelligence reports generated in compliance with the tradecraft standards.

Alexandru Marcoci. (2018). On a dilemma of redistribution. Dialectica 72(3), 453-460 Abstract

McKenzie Alexander presents a dilemma for a social planner who wants to correct the unfair distribution of an indivisible good between two equally worthy individuals or groups: either she guarantees a fair outcome, or she follows a fair procedure (but not both). In this paper I show that this dilemma only holds if the social planner can redistribute the good in question at most once. To wit, the bias of the initial distribution always washes out when we allow for sufficiently many redistributions.

Luc Bovens and Alexandru Marcoci. (2018). Gender-neutral restrooms require new (choice) architecture. Behavioural Public Policy Blog, April 17 Lead

"What’s not to love about gender-neutral restrooms?" ask Bovens and Marcoci. Their spread could only come about trough a sensitive mix of good design and nudges; working on social norms and behaviours. Some discomforts may, however, prove to be beyond nudging, and an incremental, learning approach is probably required.

Luc Bovens and Alexandru Marcoci. (2017). To those who oppose gender-neutral toilets: they’re better for everybody. The Guardian, December 1 Lead

Bovens and Marcoci's research into the economics of these facilities shows they cut waiting for women, and address the concerns of trans and disabled people.

Alexandru Marcoci and James Nguyen. (2017). Scientific rationality by degrees. In M. Massimi, J.W. Romeijn, and G. Schurz (Eds.), EPSA15 Selected Papers. European Studies in Philosophy of Science, Vol. 5 (Cham: Springer), 321-333 Abstract

In a recent paper, Samir Okasha imports Arrow's impossibility theorem into the context of theory choice. He shows that there is no function (satisfying certain desirable conditions) from profiles of preference rankings over competing theories, models or hypotheses provided by scientific virtues to a single all-things-considered ranking. This is a prima facie threat to the rationality of theory choice. In this paper we show this threat relies on an all-or-nothing understanding of scientific rationality and articulate instead a notion of rationality by degrees. The move from all-or-nothing rationality to rationality by degrees will allow us to argue that theory choice can be rational enough.

Alexandru Marcoci. (2015). Review of Quitting Certainties: A Bayesian Framework Modeling Degrees of Belief, by Michael G. Titelbaum. Economics and Philosophy 31(1), 194–200

Zoé Christoff, Paolo Galeazzi, Nina Gierasimczuk, Alexandru Marcoci and Sonja Smets (Eds.). Logic and Interactive RAtionality Yearbook 2012: Volumes 1 & 2. Institute for Logic, Language and Computation, University of Amsterdam

Alexandru Baltag, Davide Grossi, Alexandru Marcoci, Ben Rodenhäuser and Sonja Smets (Eds.). Logic and Interactive RAtionality Yearbook 2011. Institute for Logic, Language and Computation, University of Amsterdam

Centre for Argument Technology, School of Science & Engineering (Computing), University of Dundee, Dundee DD1 4HN, United Kingdom.