Empirical evidence: A definition

Empirical evidence is information that is acquired by observation or experimentation.

Empirical evidence is information acquired by observation or experimentation. Scientists record and analyze this data. The process is a central part of the scientific method, leading to the proving or disproving of a hypothesis and our better understanding of the world as a result.

Empirical evidence might be obtained through experiments that seek to provide a measurable or observable reaction, trials that repeat an experiment to test its efficacy (such as a drug trial) or other forms of data gathering against which a hypothesis can be tested and reliably measured.

"If a statement is about something that is itself observable, then the empirical testing can be direct. We just have a look to see if it is true. For example, the statement, 'The litmus paper is pink', is subject to direct empirical testing," wrote Peter Kosso in "A Summary of Scientific Method" (Springer, 2011).

"Science is most interesting and most useful to us when it is describing the unobservable things like atoms, germs, black holes, gravity, the process of evolution as it happened in the past, and so on," wrote Kosso. Scientific theories, meaning theories about nature that are unobservable, cannot be proven by direct empirical testing, but they can be tested indirectly, according to Kosso. "The nature of this indirect evidence, and the logical relation between evidence and theory, are the crux of scientific method," wrote Kosso.

The scientific method begins with scientists forming questions, or hypotheses, and then acquiring the knowledge through observations and experiments to either support or disprove a specific theory. "Empirical" means "based on observation or experience," according to the Merriam-Webster Dictionary. Empirical research is the process of finding empirical evidence. Empirical data is the information that comes from the research.

Before any pieces of empirical data are collected, scientists carefully design their research methods to ensure the accuracy, quality and integrity of the data. If there are flaws in the way that empirical data is collected, the research will not be considered valid.

The scientific method often involves lab experiments that are repeated over and over, and these experiments result in quantitative data in the form of numbers and statistics. However, that is not the only process used for gathering information to support or refute a theory. 

This methodology mostly applies to the natural sciences. "The role of empirical experimentation and observation is negligible in mathematics compared to natural sciences such as psychology, biology or physics," wrote Mark Chang, an adjunct professor at Boston University, in "Principles of Scientific Methods" (Chapman and Hall, 2017).

"Empirical evidence includes measurements or data collected through direct observation or experimentation," said Jaime Tanner, a professor of biology at Marlboro College in Vermont. There are two research methods used to gather empirical measurements and data: qualitative and quantitative.

Qualitative research, often used in the social sciences, examines the reasons behind human behavior, according to the National Center for Biotechnology Information (NCBI). It involves data that can be found using the human senses. This type of research is often done at the beginning of an experiment. "When combined with quantitative measures, qualitative study can give a better understanding of health related issues," wrote Dr. Sanjay Kalra for NCBI.

Quantitative research involves methods that are used to collect numerical data and analyze it using statistical methods. "Quantitative research methods emphasize objective measurements and the statistical, mathematical, or numerical analysis of data collected through polls, questionnaires, and surveys, or by manipulating pre-existing statistical data using computational techniques," according to LeTourneau University. This type of research is often used at the end of an experiment to refine and test the previous research.

Identifying empirical evidence in another researcher's experiments can sometimes be difficult. According to the Pennsylvania State University Libraries, there are some things one can look for when determining if evidence is empirical:

  • Can the experiment be recreated and tested?
  • Does the experiment have a statement about the methodology, tools and controls used?
  • Is there a definition of the group or phenomena being studied?

The objective of science is that all empirical data that has been gathered through observation, experience and experimentation is without bias. The strength of any scientific research depends on the ability to gather and analyze empirical data in the most unbiased and controlled fashion possible. 

However, in the 1960s, scientific historian and philosopher Thomas Kuhn promoted the idea that scientists can be influenced by prior beliefs and experiences, according to the Center for the Study of Language and Information.

"Missing observations or incomplete data can also cause bias in data analysis, especially when the missing mechanism is not random," wrote Chang.

Because scientists are human and prone to error, empirical data is often gathered by multiple scientists who independently replicate experiments. This also guards against scientists who unconsciously, or in rare cases consciously, veer from the prescribed research parameters, which could skew the results.
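Chang's point about missing data can be made concrete with a small sketch. The readings, the dropout threshold and the variable names below are hypothetical, chosen only for illustration; the comparison simply shows that losing observations in a way that depends on their value shifts an average more than losing them in a way that does not.

```python
from statistics import mean

# Hypothetical blood-pressure readings for ten trial participants.
readings = [110, 115, 120, 125, 130, 135, 140, 150, 160, 170]
full_mean = mean(readings)

# Loss unrelated to the measured value: every other participant is missing.
# This fixed pattern is only a stand-in for random loss.
value_independent = readings[::2]

# Loss that depends on the measured value: participants with high readings
# (at or above an assumed threshold) drop out of the study.
DROPOUT_THRESHOLD = 150
value_dependent = [r for r in readings if r < DROPOUT_THRESHOLD]

print("all data:", full_mean)
print("value-independent loss:", mean(value_independent))
print("value-dependent loss:", mean(value_dependent))
```

In this toy data the value-dependent loss pulls the average well below the full-sample figure, which is the kind of distortion that careful study design and independent replication are meant to catch.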

The recording of empirical data is also crucial to the scientific method, as science can only be advanced if data is shared and analyzed. Peer review of empirical data is essential to protect against bad science, according to the University of California.

Empirical laws and scientific laws are often the same thing. "Laws are descriptions — often mathematical descriptions — of natural phenomenon," Peter Coppinger, associate professor of biology and biomedical engineering at the Rose-Hulman Institute of Technology, told Live Science. 

Empirical laws are scientific laws that can be proven or disproved using observations or experiments, according to the Merriam-Webster Dictionary. So, as long as a scientific law can be tested using experiments or observations, it is considered an empirical law.

Empirical, anecdotal and logical evidence should not be confused. They are separate types of evidence that can be used to try to prove or disprove an idea or claim.

Logical evidence is used to prove or disprove an idea using logic. Deductive reasoning may be used to come to a conclusion to provide logical evidence. For example, "All men are mortal. Harold is a man. Therefore, Harold is mortal."
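As a rough illustration of how such a deduction works, the syllogism can be written as a check on set membership. This is only a sketch: the function name and the example sets are hypothetical, and real logical arguments are not limited to this form.

```python
def entails_mortal(person, men, mortals):
    """Return True when the two premises together establish that `person` is mortal."""
    major_premise = men <= mortals  # "All men are mortal": every man is in the set of mortals
    minor_premise = person in men   # "<person> is a man"
    return major_premise and minor_premise


# Hypothetical example data.
men = {"Harold", "Socrates"}
mortals = men | {"Rex the dog"}

print(entails_mortal("Harold", men, mortals))
# A False result means only that these premises do not establish the
# conclusion, not that the conclusion is false.
print(entails_mortal("Rex the dog", men, mortals))
```

The conclusion follows from the premises alone; no new observation of Harold is needed, which is what separates logical evidence from empirical evidence.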

Anecdotal evidence consists of stories that have been experienced by a person that are told to prove or disprove a point. For example, many people have told stories about their alien abductions to prove that aliens exist. Often, a person's anecdotal evidence cannot be proven or disproven. 

There are some things in nature that science is still working to build evidence for, such as the hunt to explain consciousness.

Meanwhile, in other scientific fields, efforts are still being made to improve research methods, such as the plan by some psychologists to fix the science of psychology.

"A Summary of Scientific Method" by Peter Kosso (Springer, 2011)

"Empirical," Merriam-Webster Dictionary

"Principles of Scientific Methods" by Mark Chang (Chapman and Hall, 2017)

"Qualitative research" by Dr. Sanjay Kalra, National Center for Biotechnology Information (NCBI)

"Quantitative Research and Analysis: Quantitative Methods Overview," LeTourneau University

"Empirical Research in the Social Sciences and Education," Pennsylvania State University Libraries

"Thomas Kuhn," Center for the Study of Language and Information

"Misconceptions about science," University of California

Alina Bradford

Scientific Method

Science is an enormously successful human enterprise. The study of scientific method is the attempt to discern the activities by which that success is achieved. Among the activities often identified as characteristic of science are systematic observation and experimentation, inductive and deductive reasoning, and the formation and testing of hypotheses and theories. How these are carried out in detail can vary greatly, but characteristics like these have been looked to as a way of demarcating scientific activity from non-science, where only enterprises which employ some canonical form of scientific method or methods should be considered science (see also the entry on science and pseudo-science). Others have questioned whether there is anything like a fixed toolkit of methods which is common across science and only science. Some reject privileging one view of method as part of rejecting broader views about the nature of science, such as naturalism (Dupré 2004); some reject any restriction in principle (pluralism).

Scientific method should be distinguished from the aims and products of science, such as knowledge, predictions, or control. Methods are the means by which those goals are achieved. Scientific method should also be distinguished from meta-methodology, which includes the values and justifications behind a particular characterization of scientific method (i.e., a methodology) — values such as objectivity, reproducibility, simplicity, or past successes. Methodological rules are proposed to govern method and it is a meta-methodological question whether methods obeying those rules satisfy given values. Finally, method is distinct, to some degree, from the detailed and contextual practices through which methods are implemented. The latter might range over: specific laboratory techniques; mathematical formalisms or other specialized languages used in descriptions and reasoning; technological or other material means; ways of communicating and sharing results, whether with other scientists or with the public at large; or the conventions, habits, enforced customs, and institutional controls over how and what science is carried out.

While it is important to recognize these distinctions, their boundaries are fuzzy. Hence, accounts of method cannot be entirely divorced from their methodological and meta-methodological motivations or justifications. Moreover, each aspect plays a crucial role in identifying methods. Disputes about method have therefore played out at the detail, rule, and meta-rule levels. Changes in beliefs about the certainty or fallibility of scientific knowledge, for instance (which is a meta-methodological consideration of what we can hope for methods to deliver), have meant different emphases on deductive and inductive reasoning, or on the relative importance attached to reasoning over observation (i.e., differences over particular methods). Beliefs about the role of science in society will affect the place one gives to values in scientific method.

The issue which has shaped debates over scientific method the most in the last half century is the question of how pluralist we need to be about method. Unificationists continue to hold out for one method essential to science; nihilism is a form of radical pluralism, which considers the effectiveness of any methodological prescription to be so context sensitive as to render it not explanatory on its own. Some middle degree of pluralism regarding the methods embodied in scientific practice seems appropriate. But the details of scientific practice vary with time and place, from institution to institution, across scientists and their subjects of investigation. How significant are the variations for understanding science and its success? How much can method be abstracted from practice? This entry describes some of the attempts to characterize scientific method or methods, as well as arguments for a more context-sensitive approach to methods embedded in actual scientific practices.

1. Overview and organizing themes

This entry could have been given the title Scientific Methods and gone on to fill volumes, or it could have been extremely short, consisting of a brief summary rejection of the idea that there is any such thing as a unique Scientific Method at all. Both unhappy prospects are due to the fact that scientific activity varies so much across disciplines, times, places, and scientists that any account which manages to unify it all will either consist of overwhelming descriptive detail, or trivial generalizations.

The choice of scope for the present entry is more optimistic, taking a cue from the recent movement in philosophy of science toward a greater attention to practice: to what scientists actually do. This “turn to practice” can be seen as the latest form of studies of methods in science, insofar as it represents an attempt at understanding scientific activity, but through accounts that are neither meant to be universal and unified, nor singular and narrowly descriptive. To some extent, different scientists at different times and places can be said to be using the same method even though, in practice, the details are different.

Whether the context in which methods are carried out is relevant, or to what extent, will depend largely on what one takes the aims of science to be and what one’s own aims are. For most of the history of scientific methodology the assumption has been that the most important output of science is knowledge and so the aim of methodology should be to discover those methods by which scientific knowledge is generated.

Science was seen to embody the most successful form of reasoning (but which form?) to the most certain knowledge claims (but how certain?) on the basis of systematically collected evidence (but what counts as evidence, and should the evidence of the senses take precedence, or rational insight?) Section 2 surveys some of the history, pointing to two major themes. One theme is seeking the right balance between observation and reasoning (and the attendant forms of reasoning which employ them); the other is how certain scientific knowledge is or can be.

Section 3 turns to 20th century debates on scientific method. In the second half of the 20th century the epistemic privilege of science faced several challenges and many philosophers of science abandoned the reconstruction of the logic of scientific method. Views changed significantly regarding which functions of science ought to be captured and why. For some, the success of science was better identified with social or cultural features. Historical and sociological turns in the philosophy of science were made, with a demand that greater attention be paid to the non-epistemic aspects of science, such as sociological, institutional, material, and political factors. Even outside of those movements there was an increased specialization in the philosophy of science, with more and more focus on specific fields within science. The combined upshot was very few philosophers arguing any longer for a grand unified methodology of science. Sections 3 and 4 survey the main positions on scientific method in 20th century philosophy of science, focusing on where they differ in their preference for confirmation or falsification or for waiving the idea of a special scientific method altogether.

In recent decades, attention has primarily been paid to scientific activities traditionally falling under the rubric of method, such as experimental design and general laboratory practice, the use of statistics, the construction and use of models and diagrams, interdisciplinary collaboration, and science communication. Sections 4–6 attempt to construct a map of the current domains of the study of methods in science.

As these sections illustrate, the question of method is still central to the discourse about science. Scientific method remains a topic for education, for science policy, and for scientists. It arises in the public domain where the demarcation or status of science is at issue. Some philosophers have recently returned, therefore, to the question of what it is that makes science a unique cultural product. This entry will close with some of these recent attempts at discerning and encapsulating the activities by which scientific knowledge is achieved.

2. Historical review: Aristotle to Mill

Attempting a history of scientific method compounds the vast scope of the topic. This section briefly surveys the background to modern methodological debates. What can be called the classical view goes back to antiquity, and represents a point of departure for later divergences. [1]

We begin with a point made by Laudan (1968) in his historical survey of scientific method:

Perhaps the most serious inhibition to the emergence of the history of theories of scientific method as a respectable area of study has been the tendency to conflate it with the general history of epistemology, thereby assuming that the narrative categories and classificatory pigeon-holes applied to the latter are also basic to the former. (1968: 5)

To see knowledge about the natural world as falling under knowledge more generally is an understandable conflation. Histories of theories of method would naturally employ the same narrative categories and classificatory pigeon holes. An important theme of the history of epistemology, for example, is the unification of knowledge, a theme reflected in the question of the unification of method in science. Those who have identified differences in kinds of knowledge have often likewise identified different methods for achieving that kind of knowledge (see the entry on the unity of science).

Different views on what is known, how it is known, and what can be known are connected. Plato distinguished the realms of things into the visible and the intelligible (The Republic, 510a, in Cooper 1997). Only the latter, the Forms, could be objects of knowledge. The intelligible truths could be known with the certainty of geometry and deductive reasoning. What could be observed of the material world, however, was by definition imperfect and deceptive, not ideal. The Platonic way of knowledge therefore emphasized reasoning as a method, downplaying the importance of observation. Aristotle disagreed, locating the Forms in the natural world as the fundamental principles to be discovered through the inquiry into nature (Metaphysics Z, in Barnes 1984).

Aristotle is recognized as giving the earliest systematic treatise on the nature of scientific inquiry in the western tradition, one which embraced observation and reasoning about the natural world. In the Prior and Posterior Analytics, Aristotle reflects first on the aims and then the methods of inquiry into nature. A number of features can be found which are still considered by most to be essential to science. For Aristotle, empiricism, careful observation (but passive observation, not controlled experiment), is the starting point. The aim is not merely recording of facts, though. For Aristotle, science (epistêmê) is a body of properly arranged knowledge or learning: the empirical facts, but also their ordering and display, are of crucial importance. The aims of discovery, ordering, and display of facts partly determine the methods required of successful scientific inquiry. Also determinant is the nature of the knowledge being sought, and the explanatory causes proper to that kind of knowledge (see the discussion of the four causes in the entry on Aristotle on causality).

In addition to careful observation, then, scientific method requires a logic as a system of reasoning for properly arranging, but also inferring beyond, what is known by observation. Methods of reasoning may include induction, prediction, or analogy, among others. Aristotle’s system (along with his catalogue of fallacious reasoning) was collected under the title the Organon. This title would be echoed in later works on scientific reasoning, such as Novum Organum by Francis Bacon, and Novum Organon Renovatum by William Whewell (see below). In Aristotle’s Organon reasoning is divided primarily into two forms, a rough division which persists into modern times. The division, known most commonly today as deductive versus inductive method, appears in other eras and methodologies as analysis/synthesis, non-ampliative/ampliative, or even confirmation/verification. The basic idea is that there are two “directions” to proceed in our methods of inquiry: one away from what is observed, to the more fundamental, general, and encompassing principles; the other, from the fundamental and general to instances or implications of principles.

The basic aim and method of inquiry identified here can be seen as a theme running throughout the next two millennia of reflection on the correct way to seek after knowledge: carefully observe nature and then seek rules or principles which explain or predict its operation. The Aristotelian corpus provided the framework for a commentary tradition on scientific method independent of science itself (cosmos versus physics). During the medieval period, figures such as Albertus Magnus (1206–1280), Thomas Aquinas (1225–1274), Robert Grosseteste (1175–1253), Roger Bacon (1214/1220–1292), William of Ockham (1287–1347), Andreas Vesalius (1514–1564), and Giacomo Zabarella (1533–1589) all worked to clarify the kind of knowledge obtainable by observation and induction, the source of justification of induction, and best rules for its application. [2] Many of their contributions we now think of as essential to science (see also Laudan 1968). As Aristotle and Plato had employed a framework of reasoning either “to the forms” or “away from the forms”, medieval thinkers employed directions away from the phenomena or back to the phenomena. In analysis, a phenomenon was examined to discover its basic explanatory principles; in synthesis, explanations of a phenomenon were constructed from first principles.

During the Scientific Revolution these various strands of argument, experiment, and reason were forged into a dominant epistemic authority. The 16th–18th centuries were a period of not only dramatic advance in knowledge about the operation of the natural world (advances in mechanical, medical, biological, political, and economic explanations) but also of self-awareness of the revolutionary changes taking place, and intense reflection on the source and legitimation of the method by which the advances were made. The struggle to establish the new authority included methodological moves. The Book of Nature, according to the metaphor of Galileo Galilei (1564–1642) or Francis Bacon (1561–1626), was written in the language of mathematics, of geometry and number. This motivated an emphasis on mathematical description and mechanical explanation as important aspects of scientific method. Through figures such as Henry More and Ralph Cudworth, a neo-Platonic emphasis on the importance of metaphysical reflection on nature behind appearances, particularly regarding the spiritual as a complement to the purely mechanical, remained an important methodological thread of the Scientific Revolution (see the entries on Cambridge platonists; Boyle; Henry More; Galileo).

In Novum Organum (1620), Bacon was critical of the Aristotelian method for leaping from particulars to universals too quickly. The syllogistic form of reasoning readily mixed those two types of propositions. Bacon aimed at the invention of new arts, principles, and directions. His method would be grounded in methodical collection of observations, coupled with correction of our senses (and particularly, directions for the avoidance of the Idols, as he called them, kinds of systematic errors to which naïve observers are prone). The community of scientists could then climb, by a careful, gradual and unbroken ascent, to reliable general claims.

Bacon’s method has been criticized as impractical and too inflexible for the practicing scientist. Whewell would later criticize Bacon for paying too little attention to the practices of scientists. It is hard to find convincing examples of Bacon’s method being put into practice in the history of science, but there are a few who have been held up as real examples of 17th century scientific, inductive method, even if not in the rigid Baconian mold: figures such as Robert Boyle (1627–1691) and William Harvey (1578–1657) (see the entry on Bacon).

It is to Isaac Newton (1642–1727), however, that historians of science and methodologists have paid greatest attention. Given the enormous success of his Principia Mathematica and Opticks, this is understandable. The study of Newton’s method has had two main thrusts: the implicit method of the experiments and reasoning presented in the Opticks, and the explicit methodological rules given as the Rules for Philosophising (the Regulae) in Book III of the Principia. [3] Newton’s law of gravitation, the linchpin of his new cosmology, broke with explanatory conventions of natural philosophy, first for apparently proposing action at a distance, but more generally for not providing “true”, physical causes. The argument for his System of the World (Principia, Book III) was based on phenomena, not reasoned first principles. This was viewed (mainly on the continent) as insufficient for proper natural philosophy. The Regulae counter this objection, re-defining the aims of natural philosophy by re-defining the method natural philosophers should follow. (See the entry on Newton’s philosophy.)

To his list of methodological prescriptions should be added Newton’s famous phrase “hypotheses non fingo” (commonly translated as “I frame no hypotheses”). The scientist was not to invent systems but infer explanations from observations, as Bacon had advocated. This would come to be known as inductivism. In the century after Newton, significant clarifications of the Newtonian method were made. Colin Maclaurin (1698–1746), for instance, reconstructed the essential structure of the method as having complementary analysis and synthesis phases, one proceeding away from the phenomena in generalization, the other from the general propositions to derive explanations of new phenomena. Denis Diderot (1713–1784) and editors of the Encyclopédie did much to consolidate and popularize Newtonianism, as did Francesco Algarotti (1712–1764). The emphasis was often the same, as much on the character of the scientist as on their process, a character which is still commonly assumed. The scientist is humble in the face of nature, not beholden to dogma, obeys only his eyes, and follows the truth wherever it leads. It was certainly Voltaire (1694–1778) and du Chatelet (1706–1749) who were most influential in propagating the latter vision of the scientist and their craft, with Newton as hero. Scientific method became a revolutionary force of the Enlightenment. (See also the entries on Newton, Leibniz, Descartes, Boyle, Hume, enlightenment, as well as Shank 2008 for a historical overview.)

Not all 18th century reflections on scientific method were so celebratory. Famous also are George Berkeley’s (1685–1753) attack on the mathematics of the new science, as well as the over-emphasis of Newtonians on observation; and David Hume’s (1711–1776) undermining of the warrant offered for scientific claims by inductive justification (see the entries on: George Berkeley; David Hume; Hume’s Newtonianism and Anti-Newtonianism). Hume’s problem of induction motivated Immanuel Kant (1724–1804) to seek new foundations for empirical method, though as an epistemic reconstruction, not as any set of practical guidelines for scientists. Both Hume and Kant influenced the methodological reflections of the next century, such as the debate between Mill and Whewell over the certainty of inductive inferences in science.

The debate between John Stuart Mill (1806–1873) and William Whewell (1794–1866) has become the canonical methodological debate of the 19th century. Although often characterized as a debate between inductivism and hypothetico-deductivism, the role of the two methods on each side is actually more complex. On the hypothetico-deductive account, scientists work to come up with hypotheses from which true observational consequences can be deduced (hence, hypothetico-deductive). Because Whewell emphasizes both hypotheses and deduction in his account of method, he can be seen as a convenient foil to the inductivism of Mill. However, equally if not more important to Whewell’s portrayal of scientific method is what he calls the “fundamental antithesis”. Knowledge is a product of the objective (what we see in the world around us) and subjective (the contributions of our mind to how we perceive and understand what we experience, which he called the Fundamental Ideas). Both elements are essential according to Whewell, and he was therefore critical of Kant for too much focus on the subjective, and John Locke (1632–1704) and Mill for too much focus on the senses. Whewell’s fundamental ideas can be discipline relative. An idea can be fundamental even if it is necessary for knowledge only within a given scientific discipline (e.g., chemical affinity for chemistry). This distinguishes fundamental ideas from the forms and categories of intuition of Kant. (See the entry on Whewell.)

Clarifying fundamental ideas would therefore be an essential part of scientific method and scientific progress. Whewell called this process “Discoverer’s Induction”. It was induction, following Bacon or Newton, but Whewell sought to revive Bacon’s account by emphasising the role of ideas in the clear and careful formulation of inductive hypotheses. Whewell’s induction is not merely the collecting of objective facts. The subjective plays a role through what Whewell calls the Colligation of Facts, a creative act of the scientist, the invention of a theory. A theory is then confirmed by testing, where more facts are brought under the theory, called the Consilience of Inductions. Whewell felt that this was the method by which the true laws of nature could be discovered: clarification of fundamental concepts, clever invention of explanations, and careful testing. Mill, in his critique of Whewell, and others who have cast Whewell as a fore-runner of the hypothetico-deductivist view, seem to have under-estimated the importance of this discovery phase in Whewell’s understanding of method (Snyder 1997a,b, 1999). Down-playing the discovery phase would come to characterize methodology of the early 20th century (see section 3).

Mill, in his System of Logic, put forward a narrower view of induction as the essence of scientific method. For Mill, induction is the search first for regularities among events. Among those regularities, some will continue to hold for further observations, eventually gaining the status of laws. One can also look for regularities among the laws discovered in a domain, i.e., for a law of laws. Which “law of laws” will hold is time and discipline dependent and open to revision. One example is the Law of Universal Causation, and Mill put forward specific methods for identifying causes, now commonly known as Mill’s methods. These five methods look for circumstances which are common among the phenomena of interest, those which are absent when the phenomena are, or those for which both vary together. Mill’s methods are still seen as capturing basic intuitions about experimental methods for finding the relevant explanatory factors (System of Logic (1843); see the entry on Mill). The methods advocated by Whewell and Mill, in the end, look similar. Both involve inductive generalization to covering laws. They differ dramatically, however, with respect to the necessity of the knowledge arrived at; that is, at the meta-methodological level (see the entries on Whewell and Mill).
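The search for common and absent circumstances described above can be illustrated with a small sketch. It covers only two of the five methods (agreement and difference) in a deliberately simplified set-based form; the function names and the toy cases are assumptions introduced for illustration and are not taken from Mill.

```python
def method_of_agreement(cases):
    """Circumstances shared by every case in which the phenomenon occurs."""
    positive = [set(c["circumstances"]) for c in cases if c["phenomenon"]]
    if not positive:
        return set()
    return set.intersection(*positive)


def method_of_difference(positive_case, negative_case):
    """Circumstances present when the phenomenon occurs and absent when it does not."""
    return set(positive_case) - set(negative_case)


# Hypothetical toy data: which diners fell ill after a meal.
cases = [
    {"circumstances": {"oysters", "wine", "bread"}, "phenomenon": True},
    {"circumstances": {"oysters", "beer", "salad"}, "phenomenon": True},
    {"circumstances": {"wine", "bread", "salad"}, "phenomenon": False},
]

agreement = method_of_agreement(cases)
difference = method_of_difference(
    cases[0]["circumstances"], cases[2]["circumstances"]
)
print(agreement, difference)
```

In this toy data both methods single out the same candidate circumstance. Real applications are rarely so clean, since the relevant circumstances must first be chosen and described, which is where theory enters.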

3. Logic of method and critical responses

The quantum and relativistic revolutions in physics in the early 20th century had a profound effect on methodology. Conceptual foundations of both theories were taken to show the defeasibility of even the most seemingly secure intuitions about space, time and bodies. Certainty of knowledge about the natural world was therefore recognized as unattainable. Instead a renewed empiricism was sought which rendered science fallible but still rationally justifiable.

Analyses of the reasoning of scientists emerged, according to which the aspects of scientific method which were of primary importance were the means of testing and confirming theories. A distinction in methodology was made between the contexts of discovery and justification. The distinction could be used as a wedge between the particularities of where and how theories or hypotheses are arrived at, on the one hand, and the underlying reasoning scientists use (whether or not they are aware of it) when assessing theories and judging their adequacy on the basis of the available evidence, on the other. By and large, for most of the 20th century, philosophy of science focused on the second context, although philosophers differed on whether to focus on confirmation or refutation as well as on the many details of how confirmation or refutation could or could not be brought about. By the mid-20th century these attempts at defining the method of justification and the context distinction itself came under pressure. During the same period, philosophy of science developed rapidly, and from section 4 this entry will therefore shift from a primarily historical treatment of the scientific method towards a primarily thematic one.

Advances in logic and probability held out promise of the possibility of elaborate reconstructions of scientific theories and empirical method, the best example being Rudolf Carnap’s The Logical Structure of the World (1928). Carnap attempted to show that a scientific theory could be reconstructed as a formal axiomatic system, that is, a logic. That system could refer to the world because some of its basic sentences could be interpreted as observations or operations which one could perform to test them. The rest of the theoretical system, including sentences using theoretical or unobservable terms (like electron or force) would then either be meaningful because they could be reduced to observations, or they had purely logical meanings (called analytic, like mathematical identities). This has been referred to as the verifiability criterion of meaning. According to the criterion, any statement not either analytic or verifiable was strictly meaningless. Although the view was endorsed by Carnap in 1928, he would later come to see it as too restrictive (Carnap 1956). Another familiar version of this idea is the operationalism of Percy William Bridgman. In The Logic of Modern Physics (1927) Bridgman asserted that every physical concept could be defined in terms of the operations one would perform to verify the application of that concept. Making good on the operationalisation of a concept even as simple as length, however, can easily become enormously complex (for measuring very small lengths, for instance) or impractical (for measuring large distances like light years).

Carl Hempel’s (1950, 1951) criticisms of the verifiability criterion of meaning had enormous influence. He pointed out that universal generalizations, such as most scientific laws, were not strictly meaningful on the criterion. Verifiability and operationalism both seemed too restrictive to capture standard scientific aims and practice. The tenuous connection between these reconstructions and actual scientific practice was criticized in another way. In both approaches, scientific methods are instead recast in methodological roles. Measurements, for example, were looked to as ways of giving meanings to terms. The aim of the philosopher of science was not to understand the methods per se, but to use them to reconstruct theories, their meanings, and their relation to the world. When scientists perform these operations, however, they will not report that they are doing them to give meaning to terms in a formal axiomatic system. This disconnect between methodology and the details of actual scientific practice would seem to violate the empiricism the Logical Positivists and Bridgman were committed to. The view that methodology should correspond to practice (to some extent) has been called historicism, or intuitionism. We turn to these criticisms and responses in section 3.4.

Positivism also had to contend with the recognition that a purely inductivist approach, along the lines of Bacon-Newton-Mill, was untenable. There was no pure observation, for starters. All observation was theory laden. Theory is required to make any observation, therefore not all theory can be derived from observation alone. (See the entry on theory and observation in science.) Even granting an observational basis, Hume had already pointed out that one could not deductively justify inductive conclusions without begging the question by presuming the success of the inductive method. Likewise, positivist attempts at analyzing how a generalization can be confirmed by observations of its instances were subject to a number of criticisms. Goodman (1965) and Hempel (1965) both point to paradoxes inherent in standard accounts of confirmation. Recent attempts at explaining how observations can serve to confirm a scientific theory are discussed in section 4 below.

The standard starting point for a non-inductive analysis of the logic of confirmation is known as the Hypothetico-Deductive (H-D) method. In its simplest form, a sentence of a theory which expresses some hypothesis is confirmed by its true consequences. As noted in section 2, this method had been advanced by Whewell in the 19th century, as well as Nicod (1924) and others in the 20th century. Often, Hempel’s (1966) description of the H-D method, illustrated by the case of Semmelweis’ inferential procedures in establishing the cause of childbed fever, has been presented as a key account of H-D as well as a foil for criticism of the H-D account of confirmation (see, for example, Lipton’s (2004) discussion of inference to the best explanation; also the entry on confirmation). Hempel described Semmelweis’ procedure as examining various hypotheses explaining the cause of childbed fever. Some hypotheses conflicted with observable facts and could be rejected as false immediately. Others needed to be tested experimentally by deducing which observable events should follow if the hypothesis were true (what Hempel called the test implications of the hypothesis), then conducting an experiment and observing whether or not the test implications occurred. If the experiment showed the test implication to be false, the hypothesis could be rejected. If the experiment showed the test implications to be true, however, this did not prove the hypothesis true. The confirmation of a test implication does not verify a hypothesis, though Hempel did allow that “it provides at least some support, some corroboration or confirmation for it” (Hempel 1966: 8). The degree of this support then depends on the quantity, variety and precision of the supporting evidence.

Another approach that took off from the difficulties with inductive inference was Karl Popper’s critical rationalism or falsificationism (Popper 1959, 1963). Falsification is deductive and similar to H-D in that it involves scientists deducing observational consequences from the hypothesis under test. For Popper, however, the important point was not the degree of confirmation that successful prediction offered to a hypothesis. The crucial thing was the logical asymmetry between confirmation, based on inductive inference, and falsification, which can be based on a deductive inference. (This simple opposition was later questioned, by Lakatos, among others. See the entry on historicist theories of scientific rationality.)

Popper stressed that, regardless of the amount of confirming evidence, we can never be certain that a hypothesis is true without committing the fallacy of affirming the consequent. Instead, Popper introduced the notion of corroboration as a measure for how well a theory or hypothesis has survived previous testing—but without implying that this is also a measure for the probability that it is true.

Popper was also motivated by his doubts about the scientific status of theories like the Marxist theory of history or psycho-analysis, and so wanted to demarcate between science and pseudo-science. Popper saw this as an importantly different distinction than demarcating science from metaphysics. The latter demarcation was the primary concern of many logical empiricists. Popper used the idea of falsification to draw a line instead between pseudo and proper science. Science was science because its method involved subjecting theories to rigorous tests which offered a high probability of failing and thus refuting the theory.

A commitment to the risk of failure was important. Avoiding falsification could be done all too easily. If a consequence of a theory is inconsistent with observations, an exception can be added by introducing auxiliary hypotheses designed explicitly to save the theory, so-called ad hoc modifications. This Popper saw done in pseudo-science where ad hoc theories appeared capable of explaining anything in their field of application. In contrast, science is risky. If observations showed the predictions from a theory to be wrong, the theory would be refuted. Hence, scientific hypotheses must be falsifiable. Not only must there exist some possible observation statement which could falsify the hypothesis or theory, were it observed (Popper called these the hypothesis’ potential falsifiers); it is also crucial to the Popperian scientific method that such falsifications be sincerely attempted on a regular basis.

The more potential falsifiers of a hypothesis, the more falsifiable it would be, and the more the hypothesis claimed. Conversely, hypotheses without falsifiers claimed very little or nothing at all. Originally, Popper thought that this meant the introduction of ad hoc hypotheses only to save a theory should not be countenanced as good scientific method. These would undermine the falsifiability of a theory. However, Popper later came to recognize that the introduction of modifications (immunizations, he called them) was often an important part of scientific development. Responding to surprising or apparently falsifying observations often generated important new scientific insights. Popper’s own example was the observed motion of Uranus which originally did not agree with Newtonian predictions. The ad hoc hypothesis of an outer planet explained the disagreement and led to further falsifiable predictions. Popper sought to reconcile the view by blurring the distinction between falsifiable and not falsifiable, and speaking instead of degrees of testability (Popper 1985: 41f.).

From the 1960s on, sustained meta-methodological criticism emerged that drove philosophical focus away from scientific method. A brief look at those criticisms follows, with recommendations for further reading at the end of the entry.

Thomas Kuhn’s The Structure of Scientific Revolutions (1962) begins with a well-known shot across the bow for philosophers of science:

History, if viewed as a repository for more than anecdote or chronology, could produce a decisive transformation in the image of science by which we are now possessed. (1962: 1)

The image Kuhn thought needed transforming was the a-historical, rational reconstruction sought by many of the Logical Positivists, though Carnap and other positivists were actually quite sympathetic to Kuhn’s views. (See the entry on the Vienna Circle.) Kuhn shares with others of his contemporaries, such as Feyerabend and Lakatos, a commitment to a more empirical approach to philosophy of science. Namely, the history of science provides important data, and necessary checks, for philosophy of science, including any theory of scientific method.

The history of science reveals, according to Kuhn, that scientific development occurs in alternating phases. During normal science, the members of the scientific community adhere to the paradigm in place. Their commitment to the paradigm means a commitment to the puzzles to be solved and the acceptable ways of solving them. Confidence in the paradigm remains so long as steady progress is made in solving the shared puzzles. Method in this normal phase operates within a disciplinary matrix (Kuhn’s later concept of a paradigm) which includes standards for problem solving, and defines the range of problems to which the method should be applied. An important part of a disciplinary matrix is the set of values which provide the norms and aims for scientific method. The main values that Kuhn identifies are prediction, problem solving, simplicity, consistency, and plausibility.

An important by-product of normal science is the accumulation of puzzles which cannot be solved with resources of the current paradigm. Once accumulation of these anomalies has reached some critical mass, it can trigger a communal shift to a new paradigm and a new phase of normal science. Importantly, the values that provide the norms and aims for scientific method may have transformed in the meantime. Method may therefore be relative to discipline, time or place.

Feyerabend also identified the aims of science as progress, but argued that any methodological prescription would only stifle that progress (Feyerabend 1988). His arguments are grounded in re-examining accepted “myths” about the history of science. Heroes of science, like Galileo, are shown to be just as reliant on rhetoric and persuasion as they are on reason and demonstration. Others, like Aristotle, are shown to be far more reasonable and far-reaching in their outlooks than they are given credit for. As a consequence, the only rule that could provide what he took to be sufficient freedom was the vacuous “anything goes”. More generally, even the methodological restriction that science is the best way to pursue knowledge, and to increase knowledge, is too restrictive. Feyerabend suggested instead that science might, in fact, be a threat to a free society, because it and its myth had become so dominant (Feyerabend 1978).

An even more fundamental kind of criticism was offered by several sociologists of science from the 1970s onwards who rejected the methodology of providing philosophical accounts for the rational development of science and sociological accounts of the irrational mistakes. Instead, they adhered to a symmetry thesis on which any causal explanation of how scientific knowledge is established needs to be symmetrical in explaining truth and falsity, rationality and irrationality, success and mistakes, by the same causal factors (see, e.g., Barnes and Bloor 1982, Bloor 1991). Movements in the Sociology of Science, like the Strong Programme, or in the social dimensions and causes of knowledge more generally led to extended and close examination of detailed case studies in contemporary science and its history. (See the entries on the social dimensions of scientific knowledge and social epistemology.) Well-known examinations by Latour and Woolgar (1979/1986), Knorr-Cetina (1981), Pickering (1984), Shapin and Schaffer (1985) seem to bear out that it was social ideologies (on a macro-scale) or individual interactions and circumstances (on a micro-scale) which were the primary causal factors in determining which beliefs gained the status of scientific knowledge. As they saw it therefore, explanatory appeals to scientific method were not empirically grounded.

A late, and largely unexpected, criticism of scientific method came from within science itself. Beginning in the early 2000s, a number of scientists attempting to replicate the results of published experiments could not do so. There may be a close conceptual connection between reproducibility and method. For example, if reproducibility means that the same scientific methods ought to produce the same result, and all scientific results ought to be reproducible, then whatever it takes to reproduce a scientific result ought to be called scientific method. Space limits us to the observation that, insofar as reproducibility is a desired outcome of proper scientific method, it is not strictly a part of scientific method. (See the entry on reproducibility of scientific results.)

By the close of the 20th century the search for the scientific method was flagging. Nola and Sankey (2000b) could introduce their volume on method by remarking that “For some, the whole idea of a theory of scientific method is yester-year’s debate …”.

Despite the many difficulties that philosophers encountered in trying to provide a clear methodology of confirmation (or refutation), important progress has still been made on understanding how observation can provide evidence for a given theory. Work in statistics has been crucial for understanding how theories can be tested empirically, and in recent decades a huge literature has developed that attempts to recast confirmation in Bayesian terms. Here these developments can be covered only briefly, and we refer to the entry on confirmation for further details and references.

Statistics has come to play an increasingly important role in the methodology of the experimental sciences from the 19th century onwards. At that time, statistics and probability theory took on a methodological role as an analysis of inductive inference, and attempts to ground the rationality of induction in the axioms of probability theory have continued throughout the 20th century and into the present. Developments in the theory of statistics itself, meanwhile, have had a direct and immense influence on the experimental method, including methods for measuring the uncertainty of observations such as the Method of Least Squares developed by Legendre and Gauss in the early 19th century, criteria for the rejection of outliers proposed by Peirce by the mid-19th century, and the significance tests developed by Gosset (a.k.a. “Student”), Fisher, Neyman & Pearson and others in the 1920s and 1930s (see, e.g., Swijtink 1987 for a brief historical overview; and also the entry on C.S. Peirce).

These developments within statistics then in turn led to a reflective discussion among both statisticians and philosophers of science on how to perceive the process of hypothesis testing: whether it was a rigorous statistical inference that could provide a numerical expression of the degree of confidence in the tested hypothesis, or if it should be seen as a decision between different courses of actions that also involved a value component. This led to a major controversy between Fisher on the one side and Neyman and Pearson on the other (see especially Fisher 1955, Neyman 1956 and Pearson 1955, and for analyses of the controversy, e.g., Howie 2002, Marks 2000, Lenhard 2006). On Fisher’s view, hypothesis testing was a methodology for when to accept or reject a statistical hypothesis, namely that a hypothesis should be rejected by evidence if this evidence would be unlikely relative to other possible outcomes, given the hypothesis were true. In contrast, on Neyman and Pearson’s view, the consequence of error also had to play a role when deciding between hypotheses. Introducing the distinction between the error of rejecting a true hypothesis (type I error) and accepting a false hypothesis (type II error), they argued that it depends on the consequences of the error to decide whether it is more important to avoid rejecting a true hypothesis or accepting a false one. Hence, Fisher aimed for a theory of inductive inference that enabled a numerical expression of confidence in a hypothesis. To him, the important point was the search for truth, not utility. In contrast, the Neyman-Pearson approach provided a strategy of inductive behaviour for deciding between different courses of action. Here, the important point was not whether a hypothesis was true, but whether one should act as if it was.
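The contrast between the two views can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions: a coin tossed 20 times, a null hypothesis that the coin is fair, an alternative that heads has probability 0.7, and a 5% threshold. None of these numbers comes from Fisher or from Neyman and Pearson. The first part computes a Fisher-style p-value for one observed outcome; the second fixes a rejection rule in advance and reports its type I and type II error rates.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k, n, p):
    """Probability of k or more successes in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


n = 20            # assumed number of tosses
observed = 15     # assumed number of heads actually seen
p_null = 0.5      # null hypothesis: fair coin
p_alt = 0.7       # assumed specific alternative

# Fisher-style: how improbable is an outcome at least this extreme under the null?
p_value = upper_tail(observed, n, p_null)

# Neyman-Pearson-style: choose the rejection rule before seeing the data.
alpha = 0.05
critical = next(c for c in range(n + 2) if upper_tail(c, n, p_null) <= alpha)
type_i_rate = upper_tail(critical, n, p_null)      # reject a true null
type_ii_rate = 1 - upper_tail(critical, n, p_alt)  # retain a false null

print(f"p-value = {p_value:.4f}")
print(f"reject when heads >= {critical}")
print(f"type I rate = {type_i_rate:.4f}, type II rate = {type_ii_rate:.4f}")
```

The p-value summarises the evidence from one experiment, whereas the error rates describe how the decision rule would behave over many repetitions. Lowering the threshold reduces the type I rate at the cost of a higher type II rate, which is why, on the Neyman-Pearson view, the relative cost of the two errors bears on the choice.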

Similar discussions are found in the philosophical literature. On the one side, Churchman (1948) and Rudner (1953) argued that because scientific hypotheses can never be completely verified, a complete analysis of the methods of scientific inference includes ethical judgments in which the scientists must decide whether the evidence is sufficiently strong or that the probability is sufficiently high to warrant the acceptance of the hypothesis, which again will depend on the importance of making a mistake in accepting or rejecting the hypothesis. Others, such as Jeffrey (1956) and Levi (1960) disagreed and instead defended a value-neutral view of science on which scientists should bracket their attitudes, preferences, temperament, and values when assessing the correctness of their inferences. For more details on this value-free ideal in the philosophy of science and its historical development, see Douglas (2009) and Howard (2003). For a broad set of case studies examining the role of values in science, see e.g. Elliott & Richards 2017.

In recent decades, philosophical discussions of the evaluation of probabilistic hypotheses by statistical inference have largely focused on Bayesianism, which understands probability as a measure of a person’s degree of belief in an event, given the available information, and frequentism, which instead understands probability as a long-run frequency of a repeatable event. Hence, for Bayesians probabilities refer to a state of knowledge, whereas for frequentists probabilities refer to frequencies of events (see, e.g., Sober 2008, chapter 1 for a detailed introduction to Bayesianism and frequentism as well as to likelihoodism). Bayesianism aims at providing a quantifiable, algorithmic representation of belief revision, where belief revision is a function of prior beliefs (i.e., background knowledge) and incoming evidence. Bayesianism employs a rule based on Bayes’ theorem, a theorem of the probability calculus which relates conditional probabilities. The probability that a particular hypothesis is true is interpreted as a degree of belief, or credence, of the scientist. There will also be a probability and a degree of belief that a hypothesis will be true conditional on a piece of evidence (an observation, say) being true. Bayesianism prescribes that it is rational for the scientist to update their belief in the hypothesis to that conditional probability should it turn out that the evidence is, in fact, observed (see, e.g., Sprenger & Hartmann 2019 for a comprehensive treatment of Bayesian philosophy of science). Originating in the work of Neyman and Pearson, frequentism aims at providing the tools for reducing long-run error rates, such as the error-statistical approach developed by Mayo (1996) that focuses on how experimenters can avoid both type I and type II errors by building up a repertoire of procedures that detect errors if and only if they are present.
Both Bayesianism and frequentism have developed over time, they are interpreted in different ways by their various proponents, and their relations to previous criticism of attempts at defining scientific method are seen differently by proponents and critics. The literature, surveys, reviews and criticism in this area are vast and the reader is referred to the entries on Bayesian epistemology and confirmation.
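The Bayesian updating rule described above can be sketched in a few lines. This is a minimal illustration, assuming a single hypothesis, its negation, and made-up numbers for the prior and the likelihoods; the function name is hypothetical. It applies Bayes’ theorem, P(H|E) = P(E|H)P(H) / P(E), with P(E) expanded over the hypothesis and its negation, and repeats the update for three independent confirming observations.

```python
def update_credence(prior, likelihood_if_true, likelihood_if_false):
    """Posterior credence in a hypothesis after observing the evidence.

    prior               -- credence in the hypothesis before the observation
    likelihood_if_true  -- probability of the evidence if the hypothesis holds
    likelihood_if_false -- probability of the evidence if it does not
    """
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence


# Assumed illustrative values: a sceptical prior and evidence that is three
# times as probable if the hypothesis is true as if it is false.
credence = 0.2
history = [credence]
for _ in range(3):
    credence = update_credence(credence, 0.9, 0.3)
    history.append(credence)

print([round(c, 3) for c in history])
```

Each confirming observation raises the credence, but it never reaches 1, which mirrors the point that confirmation supports a hypothesis without verifying it. The result also depends on the prior, one of the features of Bayesianism that critics have pressed.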

5. Method in Practice

Attention to scientific practice, as we have seen, is not itself new. However, the turn to practice in the philosophy of science of late can be seen as a correction to the pessimism with respect to method in philosophy of science in later parts of the 20th century, and as an attempted reconciliation between sociological and rationalist explanations of scientific knowledge. Much of this work sees method as detailed and context specific problem-solving procedures, and methodological analyses to be at the same time descriptive, critical and advisory (see Nickles 1987 for an exposition of this view). The following section contains a survey of some of the practice focuses. In this section we turn fully to topics rather than chronology.

A problem with the distinction between the contexts of discovery and justification that figured so prominently in philosophy of science in the first half of the 20th century (see section 2) is that no such distinction can be clearly seen in scientific activity (see Arabatzis 2006). Thus, in recent decades, it has been recognized that the study of conceptual innovation and change should not be confined to psychology and sociology of science, since innovation and change are also important aspects of scientific practice which philosophy of science should address (see also the entry on scientific discovery). Looking for the practices that drive conceptual innovation has led philosophers to examine both the reasoning practices of scientists and the wide realm of experimental practices that are not directed narrowly at testing hypotheses, that is, exploratory experimentation.

Examining the reasoning practices of historical and contemporary scientists, Nersessian (2008) has argued that new scientific concepts are constructed as solutions to specific problems by systematic reasoning, and that analogy, visual representation and thought-experimentation are among the important reasoning practices employed. These ubiquitous forms of reasoning are reliable, but also fallible, methods of conceptual development and change. On her account, model-based reasoning consists of cycles of construction, simulation, evaluation and adaptation of models that serve as interim interpretations of the target problem to be solved. Often, this process will lead to modifications or extensions, and a new cycle of simulation and evaluation. However, Nersessian also emphasizes that

creative model-based reasoning cannot be applied as a simple recipe, is not always productive of solutions, and even its most exemplary usages can lead to incorrect solutions. (Nersessian 2008: 11)

Thus, while on the one hand she agrees with many previous philosophers that there is no logic of discovery, discoveries can derive from reasoned processes, such that a large and integral part of scientific practice is

the creation of concepts through which to comprehend, structure, and communicate about physical phenomena …. (Nersessian 1987: 11)

Similarly, work on heuristics for discovery and theory construction by scholars such as Darden (1991) and Bechtel & Richardson (1993) presents science as problem solving and investigates scientific problem solving as a special case of problem-solving in general. Drawing largely on cases from the biological sciences, much of their focus has been on reasoning strategies for the generation, evaluation, and revision of mechanistic explanations of complex systems.

Addressing another aspect of the context distinction, namely the traditional view that the primary role of experiments is to test theoretical hypotheses according to the H-D model, other philosophers of science have argued for additional roles that experiments can play. The notion of exploratory experimentation was introduced to describe experiments driven by the desire to obtain empirical regularities and to develop concepts and classifications in which these regularities can be described (Steinle 1997, 2002; Burian 1997; Waters 2007). However, the difference between theory driven experimentation and exploratory experimentation should not be seen as a sharp distinction. Theory driven experiments are not always directed at testing hypotheses, but may also be directed at various kinds of fact-gathering, such as determining numerical parameters. Vice versa, exploratory experiments are usually informed by theory in various ways and are therefore not theory-free. Instead, in exploratory experiments phenomena are investigated without first limiting the possible outcomes of the experiment on the basis of extant theory about the phenomena.

The development of high throughput instrumentation in molecular biology and neighbouring fields has given rise to a special type of exploratory experimentation that collects and analyses very large amounts of data, and these new ‘omics’ disciplines are often said to represent a break with the ideal of hypothesis-driven science (Burian 2007; Elliott 2007; Waters 2007; O’Malley 2007) and instead described as data-driven research (Leonelli 2012; Strasser 2012) or as a special kind of “convenience experimentation” in which many experiments are done simply because they are extraordinarily convenient to perform (Krohs 2012).

5.2 Computer methods and ‘new ways’ of doing science

The field of omics just described is possible because of the ability of computers to process, in a reasonable amount of time, the huge quantities of data required. Computers allow for more elaborate experimentation (higher speed, better filtering, more variables, sophisticated coordination and control), but also, through modelling and simulations, might constitute a form of experimentation themselves. Here, too, we can pose a version of the general question of method versus practice: does the practice of using computers fundamentally change scientific method, or merely provide a more efficient means of implementing standard methods?

Because computers can be used to automate measurements, quantifications, calculations, and statistical analyses where, for practical reasons, these operations cannot be otherwise carried out, many of the steps involved in reaching a conclusion on the basis of an experiment are now made inside a “black box”, without the direct involvement or awareness of a human. This has epistemological implications, regarding what we can know, and how we can know it. To have confidence in the results, computer methods are therefore subjected to tests of verification and validation.

The distinction between verification and validation is easiest to characterize in the case of computer simulations. In a typical computer simulation scenario computers are used to numerically integrate differential equations for which no analytic solution is available. The equations are part of the model the scientist uses to represent a phenomenon or system under investigation. Verifying a computer simulation means checking that the equations of the model are being correctly approximated. Validating a simulation means checking that the equations of the model are adequate for the inferences one wants to make on the basis of that model.
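The two checks can be illustrated with a toy simulation. The sketch below makes several simplifying assumptions: the model is exponential decay, dy/dt = -k·y, chosen because it does have an analytic solution against which the numerical scheme can be checked; the integrator is the basic forward Euler method; and the “observations” are made-up numbers used only to show the shape of a validation check. Function names are hypothetical.

```python
import math


def euler_decay(k, y0, t_end, steps):
    """Integrate dy/dt = -k*y from t=0 to t_end with the forward Euler method."""
    dt = t_end / steps
    y = y0
    for _ in range(steps):
        y += dt * (-k * y)
    return y


# Verification: are the model equations being approximated correctly?
# Compare against the known solution y0*exp(-k*t) and refine the step size.
exact = math.exp(-1.0)
errors = [abs(euler_decay(1.0, 1.0, 1.0, n) - exact) for n in (10, 100, 1000)]


# Validation: are the equations adequate for the system being studied?
# Compare model output with measurements of that system.
def max_misfit(k, y0, observations, steps_per_unit=1000):
    """Largest absolute gap between simulated and observed values."""
    return max(
        abs(euler_decay(k, y0, t, max(1, round(t * steps_per_unit))) - y_obs)
        for t, y_obs in observations
    )


# Hypothetical measurements as (time, observed value) pairs, for illustration only.
observations = [(0.5, 0.62), (1.0, 0.35), (2.0, 0.15)]
misfit = max_misfit(1.0, 1.0, observations)

print(errors, misfit)
```

A shrinking error under step refinement speaks to verification only. A simulation can approximate its equations very accurately and still fail validation if those equations misdescribe the target system, and a small misfit on one data set does not by itself establish adequacy for other inferences.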

A number of issues related to computer simulations have been raised. The identification of verification and validation as the testing methods has been criticized. Oreskes et al. (1994) raise concerns that “validation”, because it suggests deductive inference, might lead to over-confidence in the results of simulations. The distinction itself is probably too clean, since actual practice in the testing of simulations mixes and moves back and forth between the two (Weissart 1997; Parker 2008a; Winsberg 2010). Computer simulations do seem to have a non-inductive character, given that the principles by which they operate are built in by the programmers, and any results of the simulation follow from those in-built principles in such a way that those results could, in principle, be deduced from the program code and its inputs. The status of simulations as experiments has therefore been examined (Kaufmann and Smarr 1993; Humphreys 1995; Hughes 1999; Norton and Suppe 2001). This literature considers the epistemology of these experiments: what we can learn by simulation, and also the kinds of justifications which can be given in applying that knowledge to the “real” world (Mayo 1996; Parker 2008b). As pointed out, part of the advantage of computer simulation derives from the fact that huge numbers of calculations can be carried out without requiring direct observation by the experimenter/simulator. At the same time, many of these calculations are approximations to the calculations which would be performed first-hand in an ideal situation. Both factors introduce uncertainties into the inferences drawn from what is observed in the simulation.

For many of the reasons described above, computer simulations do not seem to belong clearly to either the experimental or theoretical domain. Rather, they seem to crucially involve aspects of both. This has led some authors, such as Fox Keller (2003: 200), to argue that we ought to consider computer simulation a “qualitatively different way of doing science”. The literature in general tends to follow Kaufmann and Smarr (1993) in referring to computer simulation as a “third way” for scientific methodology (theoretical reasoning and experimental practice are the first two ways). It should also be noted that the debates around these issues have tended to focus on the form of computer simulation typical in the physical sciences, where models are based on dynamical equations. Other forms of simulation might not have the same problems, or have problems of their own (see the entry on computer simulations in science).

In recent years, the rapid development of machine learning techniques has prompted some scholars to suggest that the scientific method has become “obsolete” (Anderson 2008, Carrol and Goodstein 2009). This has resulted in an intense debate on the relative merit of data-driven and hypothesis-driven research (for samples, see e.g. Mazzocchi 2015 or Succi and Coveney 2018). For a detailed treatment of this topic, we refer to the entry scientific research and big data.

6. Discourse on scientific method

Despite philosophical disagreements, the idea of the scientific method still figures prominently in contemporary discourse on many different topics, both within science and in society at large. Often, reference to scientific method is used in ways that either convey the legend of a single, universal method characteristic of all science, or grant a particular method or set of methods privilege as a special ‘gold standard’, often with reference to particular philosophers to vindicate the claims. Discourse on scientific method also typically arises when there is a need to distinguish between science and other activities, or to justify the special status conferred on science. In these areas, the philosophical attempts at identifying a set of methods characteristic of scientific endeavors are closely related to the philosophy of science’s classical problem of demarcation (see the entry on science and pseudo-science) and to the philosophical analysis of the social dimension of scientific knowledge and the role of science in democratic society.

One of the settings in which the legend of a single, universal scientific method has been particularly strong is science education (see, e.g., Bauer 1992; McComas 1996; Wivagg & Allchin 2002). [5] Often, ‘the scientific method’ is presented in textbooks and educational web pages as a fixed four- or five-step procedure: starting from observation and description of a phenomenon, progressing to the formulation of a hypothesis which explains the phenomenon, designing and conducting experiments to test the hypothesis, analyzing the results, and ending with drawing a conclusion. Such references to a universal scientific method can be found in educational material at all levels of science education (Blachowicz 2009), and numerous studies have shown that the idea of a general and universal scientific method often forms part of both students’ and teachers’ conception of science (see, e.g., Aikenhead 1987; Osborne et al. 2003). In response, it has been argued that science education needs to focus more on teaching about the nature of science, although views have differed on whether this is best done through student-led investigations, contemporary cases, or historical cases (Allchin, Andersen & Nielsen 2014).

Although occasionally phrased with reference to the H-D method, important historical roots of the legend in science education of a single, universal scientific method are the American philosopher and psychologist Dewey’s account of inquiry in How We Think (1910) and the British mathematician Karl Pearson’s account of science in Grammar of Science (1892). On Dewey’s account, inquiry is divided into the five steps of

(i) a felt difficulty, (ii) its location and definition, (iii) suggestion of a possible solution, (iv) development by reasoning of the bearing of the suggestions, (v) further observation and experiment leading to its acceptance or rejection. (Dewey 1910: 72)

Similarly, on Pearson’s account, scientific investigations start with measurement of data and observation of their correlation and sequence, from which scientific laws can be discovered with the aid of creative imagination. These laws have to be subject to criticism, and their final acceptance will have equal validity for “all normally constituted minds”. Both Dewey’s and Pearson’s accounts should be seen as generalized abstractions of inquiry and not restricted to the realm of science, although both Dewey and Pearson referred to their respective accounts as ‘the scientific method’.

Occasionally, scientists make sweeping statements about a simple and distinct scientific method, as exemplified by Feynman’s simplified version of a conjectures and refutations method presented, for example, in the last of his 1964 Cornell Messenger lectures. [6] However, just as often scientists have come to the same conclusion as recent philosophy of science that there is not any unique, easily described scientific method. For example, the physicist and Nobel Laureate Weinberg described in the paper “The Methods of Science … And Those By Which We Live” (1995) how

The fact that the standards of scientific success shift with time does not only make the philosophy of science difficult; it also raises problems for the public understanding of science. We do not have a fixed scientific method to rally around and defend. (1995: 8)

Interview studies with scientists on their conception of method show that scientists often find it hard to figure out whether available evidence confirms their hypothesis, and that there are no direct translations between general ideas about method and specific strategies to guide how research is conducted (Schickore & Hangel 2019, Hangel & Schickore 2017).

Reference to the scientific method has also often been used to argue for the scientific nature or special status of a particular activity. Philosophical positions that argue for a simple and unique scientific method as a criterion of demarcation, such as Popperian falsification, have often attracted practitioners who felt that they had a need to defend their domain of practice. For example, references to conjectures and refutation as the scientific method are abundant in much of the literature on complementary and alternative medicine (CAM)—alongside the competing position that CAM, as an alternative to conventional biomedicine, needs to develop its own methodology different from that of science.

Also within mainstream science, reference to the scientific method is used in arguments regarding the internal hierarchy of disciplines and domains. A frequently seen argument is that research based on the H-D method is superior to research based on induction from observations because in deductive inferences the conclusion follows necessarily from the premises. (See, e.g., Parascandola 1998 for an analysis of how this argument has been made to downgrade epidemiology compared to the laboratory sciences.) Similarly, based on an examination of the practices of major funding institutions such as the National Institutes of Health (NIH), the National Science Foundation (NSF) and the Biotechnology and Biological Sciences Research Council (BBSRC) in the UK, O’Malley et al. (2009) have argued that funding agencies seem to have a tendency to adhere to the view that the primary activity of science is to test hypotheses, while descriptive and exploratory research is seen as merely preparatory activities that are valuable only insofar as they fuel hypothesis-driven research.

In some areas of science, scholarly publications are structured in a way that may convey the impression of a neat and linear process of inquiry from stating a question, devising the methods by which to answer it, collecting the data, to drawing a conclusion from the analysis of data. For example, the codified format of publications in most biomedical journals known as the IMRAD format (Introduction, Methods, Results, and Discussion) is explicitly described by the journal editors as “not an arbitrary publication format but rather a direct reflection of the process of scientific discovery” (see the so-called “Vancouver Recommendations”, ICMJE 2013: 11). However, scientific publications do not in general reflect the process by which the reported scientific results were produced. For example, under the provocative title “Is the scientific paper a fraud?”, Medawar argued that scientific papers generally misrepresent how the results have been produced (Medawar 1963/1996). Similar views have been advanced by philosophers, historians and sociologists of science (Gilbert 1976; Holmes 1987; Knorr-Cetina 1981; Schickore 2008; Suppe 1998) who have argued that scientists’ experimental practices are messy and often do not follow any recognizable pattern. Publications of research results, they argue, are retrospective reconstructions of these activities that often do not preserve the temporal order or the logic of these activities, but are instead often constructed in order to screen off potential criticism (see Schickore 2008 for a review of this work).

Philosophical positions on the scientific method have also made it into the courtroom, especially in the US where judges have drawn on philosophy of science in deciding when to confer special status to scientific expert testimony. A key case is Daubert v. Merrell Dow Pharmaceuticals (92–102, 509 U.S. 579, 1993). In this case, the Supreme Court argued in its 1993 ruling that trial judges must ensure that expert testimony is reliable, and that in doing this the court must look at the expert’s methodology to determine whether the proffered evidence is actually scientific knowledge. Further, referring to works of Popper and Hempel, the court stated that

ordinarily, a key question to be answered in determining whether a theory or technique is scientific knowledge … is whether it can be (and has been) tested. (Justice Blackmun, Daubert v. Merrell Dow Pharmaceuticals; see Other Internet Resources for a link to the opinion)

But as argued by Haack (2005a,b, 2010) and by Foster & Huber (1999), by equating the question of whether a piece of testimony is reliable with the question of whether it is scientific as indicated by a special methodology, the court was producing an inconsistent mixture of Popper’s and Hempel’s philosophies, and this has led to considerable confusion in subsequent case rulings that drew on the Daubert case (see Haack 2010 for a detailed exposition).

The difficulties around identifying the methods of science are also reflected in the difficulties of identifying scientific misconduct in the form of improper application of the method or methods of science. One of the first and most influential attempts at defining misconduct in science was the US definition from 1989 that defined misconduct as

fabrication, falsification, plagiarism, or other practices that seriously deviate from those that are commonly accepted within the scientific community. (Code of Federal Regulations, part 50, subpart A., August 8, 1989, italics added)

However, the “other practices that seriously deviate” clause was heavily criticized because it could be used to suppress creative or novel science. For example, the National Academy of Sciences stated in its report Responsible Science (1992) that it

wishes to discourage the possibility that a misconduct complaint could be lodged against scientists based solely on their use of novel or unorthodox research methods. (NAS: 27)

This clause was therefore later removed from the definition. For an entry into the key philosophical literature on conduct in science, see Shamoo & Resnik (2009).

The question of the source of the success of science has been at the core of philosophy since the beginning of modern science. If viewed as a matter of epistemology more generally, scientific method is a part of the entire history of philosophy. Over that time, science and whatever methods its practitioners may employ have changed dramatically. Today, many philosophers have taken up the banners of pluralism or of practice to focus on what are, in effect, fine-grained and contextually limited examinations of scientific method. Others hope to shift perspectives in order to provide a renewed general account of what characterizes the activity we call science.

One such perspective has been offered recently by Hoyningen-Huene (2008, 2013), who argues from the history of philosophy of science that after three lengthy phases of characterizing science by its method, we are now in a phase where the belief in the existence of a positive scientific method has eroded and what has been left to characterize science is only its fallibility. First was a phase from Plato and Aristotle up until the 17th century where the specificity of scientific knowledge was seen in its absolute certainty established by proof from evident axioms; next was a phase up to the mid-19th century in which the means to establish the certainty of scientific knowledge had been generalized to include inductive procedures as well. In the third phase, which lasted until the last decades of the 20th century, it was recognized that empirical knowledge was fallible, but it was still granted a special status due to its distinctive mode of production. But now in the fourth phase, according to Hoyningen-Huene, historical and philosophical studies have shown how “scientific methods with the characteristics as posited in the second and third phase do not exist” (2008: 168) and there is no longer any consensus among philosophers and historians of science about the nature of science. For Hoyningen-Huene, this is too negative a stance, and he therefore urges that the question about the nature of science be asked anew. His own answer to this question is that “scientific knowledge differs from other kinds of knowledge, especially everyday knowledge, primarily by being more systematic” (Hoyningen-Huene 2013: 14). Systematicity can have several different dimensions: among them are more systematic descriptions, explanations, predictions, defense of knowledge claims, epistemic connectedness, ideal of completeness, knowledge generation, representation of knowledge and critical discourse.
Hence, what characterizes science is the greater care in excluding possible alternative explanations, the more detailed elaboration with respect to data on which predictions are based, the greater care in detecting and eliminating sources of error, the more articulate connections to other pieces of knowledge, etc. On this position, what characterizes science is not that the methods employed are unique to science, but that the methods are more carefully employed.

Another, similar approach has been offered by Haack (2003). Like Hoyningen-Huene, she sets off from a dissatisfaction with the recent clash between what she calls Old Deferentialism and New Cynicism. The Old Deferentialist position is that science progressed inductively by accumulating true theories confirmed by empirical evidence or deductively by testing conjectures against basic statements; while the New Cynics’ position is that science has no epistemic authority and no uniquely rational method and is merely politics. Haack insists that contrary to the views of the New Cynics, there are objective epistemic standards, and there is something epistemologically special about science, even though the Old Deferentialists pictured this in a wrong way. Instead, she offers a new Critical Commonsensist account on which standards of good, strong, supportive evidence and well-conducted, honest, thorough and imaginative inquiry are not exclusive to the sciences, but are the standards by which we judge all inquirers. In this sense, science does not differ in kind from other kinds of inquiry, but it may differ in the degree to which it requires broad and detailed background knowledge and a familiarity with a technical vocabulary that only specialists may possess.

  • Aikenhead, G.S., 1987, “High-school graduates’ beliefs about science-technology-society. III. Characteristics and limitations of scientific knowledge”, Science Education, 71(4): 459–487.
  • Allchin, D., H.M. Andersen and K. Nielsen, 2014, “Complementary Approaches to Teaching Nature of Science: Integrating Student Inquiry, Historical Cases, and Contemporary Cases in Classroom Practice”, Science Education, 98: 461–486.
  • Anderson, C., 2008, “The end of theory: The data deluge makes the scientific method obsolete”, Wired magazine, 16(7): 16–07.
  • Arabatzis, T., 2006, “On the inextricability of the context of discovery and the context of justification”, in Revisiting Discovery and Justification, J. Schickore and F. Steinle (eds.), Dordrecht: Springer, pp. 215–230.
  • Barnes, J. (ed.), 1984, The Complete Works of Aristotle, Vols I and II, Princeton: Princeton University Press.
  • Barnes, B. and D. Bloor, 1982, “Relativism, Rationalism, and the Sociology of Knowledge”, in Rationality and Relativism, M. Hollis and S. Lukes (eds.), Cambridge: MIT Press, pp. 1–20.
  • Bauer, H.H., 1992, Scientific Literacy and the Myth of the Scientific Method, Urbana: University of Illinois Press.
  • Bechtel, W. and R.C. Richardson, 1993, Discovering complexity, Princeton, NJ: Princeton University Press.
  • Berkeley, G., 1734, The Analyst in De Motu and The Analyst: A Modern Edition with Introductions and Commentary, D. Jesseph (trans. and ed.), Dordrecht: Kluwer Academic Publishers, 1992.
  • Blachowicz, J., 2009, “How science textbooks treat scientific method: A philosopher’s perspective”, The British Journal for the Philosophy of Science, 60(2): 303–344.
  • Bloor, D., 1991, Knowledge and Social Imagery, Chicago: University of Chicago Press, 2nd edition.
  • Boyle, R., 1682, New experiments physico-mechanical, touching the air, Printed by Miles Flesher for Richard Davis, bookseller in Oxford.
  • Bridgman, P.W., 1927, The Logic of Modern Physics, New York: Macmillan.
  • –––, 1956, “The Methodological Character of Theoretical Concepts”, in The Foundations of Science and the Concepts of Psychology and Psychoanalysis, Herbert Feigl and Michael Scriven (eds.), Minneapolis: University of Minnesota Press, pp. 38–76.
  • Burian, R., 1997, “Exploratory Experimentation and the Role of Histochemical Techniques in the Work of Jean Brachet, 1938–1952”, History and Philosophy of the Life Sciences, 19(1): 27–45.
  • –––, 2007, “On microRNA and the need for exploratory experimentation in post-genomic molecular biology”, History and Philosophy of the Life Sciences, 29(3): 285–311.
  • Carnap, R., 1928, Der logische Aufbau der Welt, Berlin: Bernary, transl. by R.A. George, The Logical Structure of the World, Berkeley: University of California Press, 1967.
  • –––, 1956, “The methodological character of theoretical concepts”, Minnesota studies in the philosophy of science, 1: 38–76.
  • Carrol, S., and D. Goodstein, 2009, “Defining the scientific method”, Nature Methods, 6: 237.
  • Churchman, C.W., 1948, “Science, Pragmatics, Induction”, Philosophy of Science, 15(3): 249–268.
  • Cooper, J. (ed.), 1997, Plato: Complete Works, Indianapolis: Hackett.
  • Darden, L., 1991, Theory Change in Science: Strategies from Mendelian Genetics, Oxford: Oxford University Press.
  • Dewey, J., 1910, How we think, New York: Dover Publications (reprinted 1997).
  • Douglas, H., 2009, Science, Policy, and the Value-Free Ideal, Pittsburgh: University of Pittsburgh Press.
  • Dupré, J., 2004, “Miracle of Monism”, in Naturalism in Question, Mario De Caro and David Macarthur (eds.), Cambridge, MA: Harvard University Press, pp. 36–58.
  • Elliott, K.C., 2007, “Varieties of exploratory experimentation in nanotoxicology”, History and Philosophy of the Life Sciences, 29(3): 311–334.
  • Elliott, K.C., and T. Richards (eds.), 2017, Exploring inductive risk: Case studies of values in science, Oxford: Oxford University Press.
  • Falcon, Andrea, 2005, Aristotle and the science of nature: Unity without uniformity, Cambridge: Cambridge University Press.
  • Feyerabend, P., 1978, Science in a Free Society, London: New Left Books.
  • –––, 1988, Against Method, London: Verso, 2nd edition.
  • Fisher, R.A., 1955, “Statistical Methods and Scientific Induction”, Journal of The Royal Statistical Society. Series B (Methodological), 17(1): 69–78.
  • Foster, K. and P.W. Huber, 1999, Judging Science. Scientific Knowledge and the Federal Courts, Cambridge: MIT Press.
  • Fox Keller, E., 2003, “Models, Simulation, and ‘computer experiments’”, in The Philosophy of Scientific Experimentation, H. Radder (ed.), Pittsburgh: University of Pittsburgh Press, pp. 198–215.
  • Gilbert, G., 1976, “The transformation of research findings into scientific knowledge”, Social Studies of Science, 6: 281–306.
  • Gimbel, S., 2011, Exploring the Scientific Method, Chicago: University of Chicago Press.
  • Goodman, N., 1965, Fact, Fiction, and Forecast, Indianapolis: Bobbs-Merrill.
  • Haack, S., 1995, “Science is neither sacred nor a confidence trick”, Foundations of Science, 1(3): 323–335.
  • –––, 2003, Defending science—within reason , Amherst: Prometheus.
  • –––, 2005a, “Disentangling Daubert: an epistemological study in theory and practice”, Journal of Philosophy, Science and Law, 5. doi:10.5840/jpsl2005513
  • –––, 2005b, “Trial and error: The Supreme Court’s philosophy of science”, American Journal of Public Health, 95: S66–S73.
  • –––, 2010, “Federal Philosophy of Science: A Deconstruction-and a Reconstruction”, NYUJL & Liberty, 5: 394.
  • Hangel, N. and J. Schickore, 2017, “Scientists’ conceptions of good research practice”, Perspectives on Science, 25(6): 766–791.
  • Harper, W.L., 2011, Isaac Newton’s Scientific Method: Turning Data into Evidence about Gravity and Cosmology, Oxford: Oxford University Press.
  • Hempel, C., 1950, “Problems and Changes in the Empiricist Criterion of Meaning”, Revue Internationale de Philosophie, 41(11): 41–63.
  • –––, 1951, “The Concept of Cognitive Significance: A Reconsideration”, Proceedings of the American Academy of Arts and Sciences, 80(1): 61–77.
  • –––, 1965, Aspects of scientific explanation and other essays in the philosophy of science, New York–London: Free Press.
  • –––, 1966, Philosophy of Natural Science, Englewood Cliffs: Prentice-Hall.
  • Holmes, F.L., 1987, “Scientific writing and scientific discovery”, Isis, 78(2): 220–235.
  • Howard, D., 2003, “Two left turns make a right: On the curious political career of North American philosophy of science at midcentury”, in Logical Empiricism in North America, G.L. Hardcastle & A.W. Richardson (eds.), Minneapolis: University of Minnesota Press, pp. 25–93.
  • Hoyningen-Huene, P., 2008, “Systematicity: The nature of science”, Philosophia, 36(2): 167–180.
  • –––, 2013, Systematicity. The Nature of Science, Oxford: Oxford University Press.
  • Howie, D., 2002, Interpreting probability: Controversies and developments in the early twentieth century, Cambridge: Cambridge University Press.
  • Hughes, R., 1999, “The Ising Model, Computer Simulation, and Universal Physics”, in Models as Mediators, M. Morgan and M. Morrison (eds.), Cambridge: Cambridge University Press, pp. 97–145.
  • Hume, D., 1739, A Treatise of Human Nature, D. Fate Norton and M.J. Norton (eds.), Oxford: Oxford University Press, 2000.
  • Humphreys, P., 1995, “Computational science and scientific method”, Minds and Machines, 5(1): 499–512.
  • ICMJE, 2013, “Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals”, International Committee of Medical Journal Editors, available online, accessed August 13, 2014.
  • Jeffrey, R.C., 1956, “Valuation and Acceptance of Scientific Hypotheses”, Philosophy of Science, 23(3): 237–246.
  • Kaufmann, W.J., and L.L. Smarr, 1993, Supercomputing and the Transformation of Science, New York: Scientific American Library.
  • Knorr-Cetina, K., 1981, The Manufacture of Knowledge, Oxford: Pergamon Press.
  • Krohs, U., 2012, “Convenience experimentation”, Studies in History and Philosophy of Biological and Biomedical Sciences, 43: 52–57.
  • Kuhn, T.S., 1962, The Structure of Scientific Revolutions, Chicago: University of Chicago Press.
  • Latour, B. and S. Woolgar, 1986, Laboratory Life: The Construction of Scientific Facts, Princeton: Princeton University Press, 2nd edition.
  • Laudan, L., 1968, “Theories of scientific method from Plato to Mach”, History of Science, 7(1): 1–63.
  • Lenhard, J., 2006, “Models and statistical inference: The controversy between Fisher and Neyman-Pearson”, The British Journal for the Philosophy of Science, 57(1): 69–91.
  • Leonelli, S., 2012, “Making Sense of Data-Driven Research in the Biological and the Biomedical Sciences”, Studies in the History and Philosophy of the Biological and Biomedical Sciences, 43(1): 1–3.
  • Levi, I., 1960, “Must the scientist make value judgments?”, The Journal of Philosophy, 57(11): 345–357.
  • Lindley, D., 1991, Theory Change in Science: Strategies from Mendelian Genetics, Oxford: Oxford University Press.
  • Lipton, P., 2004, Inference to the Best Explanation, London: Routledge, 2nd edition.
  • Marks, H.M., 2000, The progress of experiment: science and therapeutic reform in the United States, 1900–1990, Cambridge: Cambridge University Press.
  • Mazzocchi, F., 2015, “Could Big Data be the end of theory in science?”, EMBO reports, 16: 1250–1255.
  • Mayo, D.G., 1996, Error and the Growth of Experimental Knowledge, Chicago: University of Chicago Press.
  • McComas, W.F., 1996, “Ten myths of science: Reexamining what we think we know about the nature of science”, School Science and Mathematics, 96(1): 10–16.
  • Medawar, P.B., 1963/1996, “Is the scientific paper a fraud?”, in The Strange Case of the Spotted Mouse and Other Classic Essays on Science, Oxford: Oxford University Press, pp. 33–39.
  • Mill, J.S., 1963, Collected Works of John Stuart Mill, J.M. Robson (ed.), Toronto: University of Toronto Press.
  • NAS, 1992, Responsible Science: Ensuring the integrity of the research process, Washington DC: National Academy Press.
  • Nersessian, N.J., 1987, “A cognitive-historical approach to meaning in scientific theories”, in The process of science, N. Nersessian (ed.), Berlin: Springer, pp. 161–177.
  • –––, 2008, Creating Scientific Concepts, Cambridge: MIT Press.
  • Newton, I., 1726, Philosophiae naturalis Principia Mathematica (3rd edition), in The Principia: Mathematical Principles of Natural Philosophy: A New Translation, I.B. Cohen and A. Whitman (trans.), Berkeley: University of California Press, 1999.
  • –––, 1704, Opticks or A Treatise of the Reflections, Refractions, Inflections & Colors of Light, New York: Dover Publications, 1952.
  • Neyman, J., 1956, “Note on an Article by Sir Ronald Fisher”, Journal of the Royal Statistical Society. Series B (Methodological), 18: 288–294.
  • Nickles, T., 1987, “Methodology, heuristics, and rationality”, in Rational changes in science: Essays on Scientific Reasoning, J.C. Pitt (ed.), Berlin: Springer, pp. 103–132.
  • Nicod, J., 1924, Le problème logique de l’induction, Paris: Alcan. (Engl. transl. “The Logical Problem of Induction”, in Foundations of Geometry and Induction, London: Routledge, 2000.)
  • Nola, R. and H. Sankey, 2000a, “A selective survey of theories of scientific method”, in Nola and Sankey 2000b: 1–65.
  • –––, 2000b, After Popper, Kuhn and Feyerabend. Recent Issues in Theories of Scientific Method, London: Springer.
  • –––, 2007, Theories of Scientific Method, Stocksfield: Acumen.
  • Norton, S., and F. Suppe, 2001, “Why atmospheric modeling is good science”, in Changing the Atmosphere: Expert Knowledge and Environmental Governance, C. Miller and P. Edwards (eds.), Cambridge, MA: MIT Press, pp. 88–133.
  • O’Malley, M., 2007, “Exploratory experimentation and scientific practice: Metagenomics and the proteorhodopsin case”, History and Philosophy of the Life Sciences, 29(3): 337–360.
  • O’Malley, M., C. Haufe, K. Elliott, and R. Burian, 2009, “Philosophies of Funding”, Cell, 138: 611–615.
  • Oreskes, N., K. Shrader-Frechette, and K. Belitz, 1994, “Verification, Validation and Confirmation of Numerical Models in the Earth Sciences”, Science, 263(5147): 641–646.
  • Osborne, J., S. Simon, and S. Collins, 2003, “Attitudes towards science: a review of the literature and its implications”, International Journal of Science Education, 25(9): 1049–1079.
  • Parascandola, M., 1998, “Epidemiology: Second-Rate Science?”, Public Health Reports, 113(4): 312–320.
  • Parker, W., 2008a, “Franklin, Holmes and the Epistemology of Computer Simulation”, International Studies in the Philosophy of Science, 22(2): 165–83.
  • –––, 2008b, “Computer Simulation through an Error-Statistical Lens”, Synthese, 163(3): 371–84.
  • Pearson, K., 1892, The Grammar of Science, London: J.M. Dent and Sons, 1951.
  • Pearson, E.S., 1955, “Statistical Concepts in Their Relation to Reality”, Journal of the Royal Statistical Society, B, 17: 204–207.
  • Pickering, A., 1984, Constructing Quarks: A Sociological History of Particle Physics, Edinburgh: Edinburgh University Press.
  • Popper, K.R., 1959, The Logic of Scientific Discovery, London: Routledge, 2002.
  • –––, 1963, Conjectures and Refutations, London: Routledge, 2002.
  • –––, 1985, Unended Quest: An Intellectual Autobiography, La Salle: Open Court Publishing Co.
  • Rudner, R., 1953, “The Scientist Qua Scientist Making Value Judgments”, Philosophy of Science, 20(1): 1–6.
  • Rudolph, J.L., 2005, “Epistemology for the masses: The origin of ‘The Scientific Method’ in American Schools”, History of Education Quarterly, 45(3): 341–376.
  • Schickore, J., 2008, “Doing science, writing science”, Philosophy of Science, 75: 323–343.
  • Schickore, J. and N. Hangel, 2019, “‘It might be this, it should be that…’ uncertainty and doubt in day-to-day science practice”, European Journal for Philosophy of Science, 9(2): 31. doi:10.1007/s13194-019-0253-9
  • Shamoo, A.E. and D.B. Resnik, 2009, Responsible Conduct of Research, Oxford: Oxford University Press.
  • Shank, J.B., 2008, The Newton Wars and the Beginning of the French Enlightenment, Chicago: The University of Chicago Press.
  • Shapin, S. and S. Schaffer, 1985, Leviathan and the air-pump, Princeton: Princeton University Press.
  • Smith, G.E., 2002, “The Methodology of the Principia”, in The Cambridge Companion to Newton, I.B. Cohen and G.E. Smith (eds.), Cambridge: Cambridge University Press, pp. 138–173.
  • Snyder, L.J., 1997a, “Discoverers’ Induction”, Philosophy of Science, 64: 580–604.
  • –––, 1997b, “The Mill-Whewell Debate: Much Ado About Induction”, Perspectives on Science, 5: 159–198.
  • –––, 1999, “Renovating the Novum Organum: Bacon, Whewell and Induction”, Studies in History and Philosophy of Science, 30: 531–557.
  • Sober, E., 2008, Evidence and Evolution. The logic behind the science, Cambridge: Cambridge University Press.
  • Sprenger, J. and S. Hartmann, 2019, Bayesian philosophy of science, Oxford: Oxford University Press.
  • Steinle, F., 1997, “Entering New Fields: Exploratory Uses of Experimentation”, Philosophy of Science (Proceedings), 64: S65–S74.
  • –––, 2002, “Experiments in History and Philosophy of Science”, Perspectives on Science, 10(4): 408–432.
  • Strasser, B.J., 2012, “Data-driven sciences: From wonder cabinets to electronic databases”, Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 43(1): 85–87.
  • Succi, S. and P.V. Coveney, 2018, “Big data: the end of the scientific method?”, Philosophical Transactions of the Royal Society A, 377: 20180145. doi:10.1098/rsta.2018.0145
  • Suppe, F., 1998, “The Structure of a Scientific Paper”, Philosophy of Science, 65(3): 381–405.
  • Swijtink, Z.G., 1987, “The objectification of observation: Measurement and statistical methods in the nineteenth century”, in The probabilistic revolution. Ideas in History, Vol. 1, L. Kruger (ed.), Cambridge MA: MIT Press, pp. 261–285.
  • Waters, C.K., 2007, “The nature and context of exploratory experimentation: An introduction to three case studies of exploratory research”, History and Philosophy of the Life Sciences , 29(3): 275–284.
  • Weinberg, S., 1995, “The methods of science… and those by which we live”, Academic Questions , 8(2): 7–13.
  • Weissert, T., 1997, The Genesis of Simulation in Dynamics: Pursuing the Fermi-Pasta-Ulam Problem , New York: Springer Verlag.
  • William H., 1628, Exercitatio Anatomica de Motu Cordis et Sanguinis in Animalibus , in On the Motion of the Heart and Blood in Animals , R. Willis (trans.), Buffalo: Prometheus Books, 1993.
  • Winsberg, E., 2010, Science in the Age of Computer Simulation , Chicago: University of Chicago Press.
  • Wivagg, D. & D. Allchin, 2002, “The Dogma of the Scientific Method”, The American Biology Teacher , 64(9): 645–646
  • Blackmun opinion , in Daubert v. Merrell Dow Pharmaceuticals (92–102), 509 U.S. 579 (1993).
  • Scientific Method at philpapers. Darrell Rowbottom (ed.).
  • Recent Articles | Scientific Method | The Scientist Magazine

al-Kindi | Albert the Great [= Albertus magnus] | Aquinas, Thomas | Arabic and Islamic Philosophy, disciplines in: natural philosophy and natural science | Arabic and Islamic Philosophy, historical and methodological topics in: Greek sources | Arabic and Islamic Philosophy, historical and methodological topics in: influence of Arabic and Islamic Philosophy on the Latin West | Aristotle | Bacon, Francis | Bacon, Roger | Berkeley, George | biology: experiment in | Boyle, Robert | Cambridge Platonists | confirmation | Descartes, René | Enlightenment | epistemology | epistemology: Bayesian | epistemology: social | Feyerabend, Paul | Galileo Galilei | Grosseteste, Robert | Hempel, Carl | Hume, David | Hume, David: Newtonianism and Anti-Newtonianism | induction: problem of | Kant, Immanuel | Kuhn, Thomas | Leibniz, Gottfried Wilhelm | Locke, John | Mill, John Stuart | More, Henry | Neurath, Otto | Newton, Isaac | Newton, Isaac: philosophy | Ockham [Occam], William | operationalism | Peirce, Charles Sanders | Plato | Popper, Karl | rationality: historicist theories of | Reichenbach, Hans | reproducibility, scientific | Schlick, Moritz | science: and pseudo-science | science: theory and observation in | science: unity of | scientific discovery | scientific knowledge: social dimensions of | simulations in science | skepticism: medieval | space and time: absolute and relational space and motion, post-Newtonian theories | Vienna Circle | Whewell, William | Zabarella, Giacomo

Copyright © 2021 by Brian Hepburn < brian . hepburn @ wichita . edu > Hanne Andersen < hanne . andersen @ ind . ku . dk >


The Stanford Encyclopedia of Philosophy is copyright © 2023 by The Metaphysics Research Lab , Department of Philosophy, Stanford University

Library of Congress Catalog Data: ISSN 1095-5054


  • University of Memphis Libraries
  • Research Guides

Empirical Research: Defining, Identifying, & Finding

Defining empirical research, what is empirical research, quantitative or qualitative.


Calfee & Chambliss (2005)  (UofM login required) describe empirical research as a "systematic approach for answering certain types of questions."  Those questions are answered "[t]hrough the collection of evidence under carefully defined and replicable conditions" (p. 43). 

The evidence collected during empirical research is often referred to as "data." 

Characteristics of Empirical Research

Emerald Publishing's guide to conducting empirical research identifies a number of common elements to empirical research: 

  • A  research question , which will determine research objectives.
  • A particular and planned  design  for the research, which will depend on the question and which will find ways of answering it with appropriate use of resources.
  • The gathering of  primary data , which is then analysed.
  • A particular  methodology  for collecting and analysing the data, such as an experiment or survey.
  • The limitation of the data to a particular group, area or time scale, known as a sample [emphasis added]: for example, a specific number of employees of a particular company type, or all users of a library over a given time scale. The sample should be somehow representative of a wider population.
  • The ability to  recreate  the study and test the results. This is known as  reliability .
  • The ability to  generalize  from the findings to a larger sample and to other situations.

If you see these elements in a research article, you can feel confident that you have found empirical research. Emerald's guide goes into more detail on each element. 
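The ideas of a sample and of generalizing to a wider population can be made concrete with a minimal Python sketch. It is an illustration only, not part of Emerald's guide: the function name `sample_mean_estimate`, the population of library users, and all numbers are made-up assumptions.

```python
import random
import statistics

def sample_mean_estimate(population, sample_size, seed=0):
    """Draw a simple random sample and return (sample_mean, standard_error)."""
    rng = random.Random(seed)  # fixed seed so the study can be recreated
    sample = rng.sample(population, sample_size)
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    return mean, std_error

# Hypothetical population: weekly library visits for 1,000 users (made-up data).
data_rng = random.Random(42)
population = [data_rng.randint(0, 10) for _ in range(1000)]

estimate, std_error = sample_mean_estimate(population, sample_size=100)
print(f"sample mean = {estimate:.2f} (standard error {std_error:.2f})")
print(f"population mean = {statistics.mean(population):.2f}")
```

Fixing the random seed is one simple way to make the sampling step repeatable, which loosely mirrors the reliability element in the list above.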

Empirical research methodologies can be described as quantitative, qualitative, or a mix of both (usually called mixed-methods).

Ruane (2016)  (UofM login required) gets at the basic differences in approach between quantitative and qualitative research:

  • Quantitative research  -- an approach to documenting reality that relies heavily on numbers both for the measurement of variables and for data analysis (p. 33).
  • Qualitative research  -- an approach to documenting reality that relies on words and images as the primary data source (p. 33).

Both quantitative and qualitative methods are empirical. If you can recognize that a research study is a quantitative or qualitative study, then you have also recognized that it is an empirical study. 

Below is information on the characteristics of quantitative and qualitative research. This video from Scribbr also offers a good overall introduction to the two approaches to research methodology: 

Characteristics of Quantitative Research 

Researchers test hypotheses, or theories, based on assumptions about causality, i.e. we expect variable X to cause variable Y. Variables have to be controlled as much as possible to ensure validity. The results explain the relationship between the variables. Measures are based on pre-defined instruments.

Examples: experimental or quasi-experimental design, pretest & post-test, survey or questionnaire with closed-ended questions. Also studies that identify factors that influence an outcome, assess the utility of an intervention, or seek to understand predictors of outcomes. 

Characteristics of Qualitative Research

Researchers explore the “meaning individuals or groups ascribe to social or human problems” (Creswell & Creswell, 2018, p. 3). Questions and procedures emerge rather than being prescribed. Complexity, nuance, and individual meaning are valued. Research is both inductive and deductive. Data sources are multiple and varied, e.g. interviews, observations, documents, photographs, etc. The researcher is a key instrument and must reflect on how their background, culture, and experiences influence the research.

Examples: open question interviews and surveys, focus groups, case studies, grounded theory, ethnography, discourse analysis, narrative, phenomenology, participatory action research.

Calfee, R. C. & Chambliss, M. (2005). The design of empirical research. In J. Flood, D. Lapp, J. R. Squire, & J. Jensen (Eds.),  Methods of research on teaching the English language arts: The methodology chapters from the handbook of research on teaching the English language arts (pp. 43-78). Routledge.  http://ezproxy.memphis.edu/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=125955&site=eds-live&scope=site .

Creswell, J. W., & Creswell, J. D. (2018).  Research design: Qualitative, quantitative, and mixed methods approaches  (5th ed.). Thousand Oaks: Sage.

How to... conduct empirical research . (n.d.). Emerald Publishing.  https://www.emeraldgrouppublishing.com/how-to/research-methods/conduct-empirical-research .

Scribbr. (2019). Quantitative vs. qualitative: The differences explained  [video]. YouTube.  https://www.youtube.com/watch?v=a-XtVF7Bofg .

Ruane, J. M. (2016).  Introducing social research methods : Essentials for getting the edge . Wiley-Blackwell.  http://ezproxy.memphis.edu/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=1107215&site=eds-live&scope=site .  

  • Last Updated: Apr 2, 2024 11:25 AM
  • URL: https://libguides.memphis.edu/empirical-research

Philosophy Institute

Understanding the Empirical Method in Research Methodology


Have you ever wondered how scientists gather evidence to support their theories? Or what steps researchers take to ensure that their findings are reliable and not just based on speculation? The answer lies in a cornerstone of scientific investigation known as the empirical method . This approach to research is all about collecting data and observing the world to form solid, evidence-based conclusions. Let’s dive into the empirical method’s fascinating world and understand why it’s so critical in research methodology.

What is the empirical method?

The empirical method is a way of gaining knowledge by means of direct and indirect observation or experience. It’s fundamentally based on the idea that knowledge comes from sensory experience and can be acquired through observation and experimentation. This method stands in contrast to approaches that rely solely on theoretical or logical means.

The role of observation in the empirical method

Observation is at the heart of the empirical method. It involves using your senses to gather information about the world. This could be as simple as noting the color of a flower or as complex as using advanced technology to observe the behavior of microscopic organisms. The key is that the observations must be systematic and replicable, providing reliable data that can be used to draw conclusions.

Data collection: qualitative and quantitative

Different types of data can be collected using the empirical method:

  • Qualitative data – This data type is descriptive and conceptual, often collected through interviews, observations, and case studies.
  • Quantitative data – This involves numerical data collected through methods like surveys, experiments, and statistical analysis.

Empirical vs. experimental methods

While the empirical method is often associated with experimentation, it’s important to distinguish between the two. Experimental methods involve controlled tests where the researcher manipulates one variable to observe the effect on another. The empirical method is broader: it does not require manipulation and also covers observing and collecting data in natural settings, offering a view of phenomena as they occur in real life.

Why the distinction matters

Understanding the difference between empirical and experimental methods is crucial because it affects how research is conducted and how results are interpreted. Empirical research can provide a more naturalistic view of the subject matter, whereas experimental research can offer more control over variables and potentially more precise outcomes.

The significance of experiential learning

The empirical method has deep roots in experiential learning, which emphasizes learning through experience. This connection is vital because it underlines the importance of engaging with the subject matter at a practical level, rather than just theoretically. It’s a hands-on approach to knowledge that has been valued since the time of Aristotle.

Developing theories from empirical research

One of the most significant aspects of the empirical method is its role in theory development . Researchers collect and analyze data, and from these findings, they can formulate or refine theories. Theories that are supported by empirical evidence tend to be more robust and widely accepted in the scientific community.

Applying the empirical method in various fields

The empirical method is not limited to the natural sciences. It’s used across a range of disciplines, from social sciences to humanities, to understand different aspects of the world. For instance:

  • In psychology , researchers might use the empirical method to observe and record behaviors to understand the underlying mental processes.
  • In sociology , it could involve studying social interactions to draw conclusions about societal structures.
  • In economics , empirical data might be used to test the validity of economic theories or to measure market trends.

Challenges and limitations

Despite its importance, the empirical method has its challenges and limitations. One major challenge is ensuring that observations and data collection are unbiased. Additionally, not all phenomena are easily observable, and some may require more complex or abstract approaches.

The empirical method is a fundamental aspect of research methodology that has stood the test of time. By relying on observation and data collection, it allows researchers to ground their theories in reality, providing a solid foundation for knowledge. Whether it’s used in the hard sciences, social sciences, or humanities, the empirical method continues to be a critical tool for understanding our complex world.

How do you think the empirical method affects the credibility of research findings? And can you think of a situation where empirical methods might be difficult to apply but still necessary for advancing knowledge? Let’s discuss these thought-provoking questions and consider the breadth of the empirical method’s impact on the pursuit of understanding.



Purdue University


Research: Overview & Approaches


Introduction to Empirical Research

Databases for finding empirical research, guided search, google scholar, examples of empirical research, sources and further reading.


  • Introductory Video This video covers what empirical research is, what kinds of questions and methods empirical researchers use, and some tips for finding empirical research articles in your discipline.

Video Tutorial

  • Guided Search: Finding Empirical Research Articles This is a hands-on tutorial that will allow you to use your own search terms to find resources.

Examples of Empirical Research

  • Study on radiation transfer in human skin for cosmetics
  • Long-Term Mobile Phone Use and the Risk of Vestibular Schwannoma: A Danish Nationwide Cohort Study
  • Emissions Impacts and Benefits of Plug-In Hybrid Electric Vehicles and Vehicle-to-Grid Services
  • Review of design considerations and technological challenges for successful development and deployment of plug-in hybrid electric vehicles
  • Endocrine disrupters and human health: could oestrogenic chemicals in body care cosmetics adversely affect breast cancer incidence in women?

  • Last Updated: Apr 25, 2024 4:11 PM
  • URL: https://guides.lib.purdue.edu/research_approaches

Empirical Research

  • Living reference work entry
  • First Online: 22 May 2017

  • Emeka Thaddues Njoku


The term “empirical” entails gathered data based on experience, observations, or experimentation. In empirical research, knowledge is developed from factual experience as opposed to theoretical assumption and usually involves the use of data sources like datasets or fieldwork, but can also be based on observations within a laboratory setting. Testing hypotheses or answering definite questions is a primary feature of empirical research. Empirical research, in other words, involves the process of employing working hypotheses that are tested through experimentation or observation. Hence, empirical research is a method of uncovering empirical evidence.

Through the process of gathering valid empirical data, scientists from a variety of fields, ranging from the social to the natural sciences, have to carefully design their methods. This helps to ensure quality and accuracy of data collection and treatment. However, any error in empirical data collection process could inevitably render such...




Author information

Authors and affiliations.

Department of Political Science, University of Ibadan, Ibadan, Oyo, 200284, Nigeria

Emeka Thaddues Njoku


Corresponding author

Correspondence to Emeka Thaddues Njoku .

Editor information

Editors and affiliations.

Rhinebeck, New York, USA

David A. Leeming


Copyright information

© 2017 Springer-Verlag GmbH Germany

About this entry

Cite this entry.

Njoku, E.T. (2017). Empirical Research. In: Leeming, D. (eds) Encyclopedia of Psychology and Religion. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27771-9_200051-1


DOI : https://doi.org/10.1007/978-3-642-27771-9_200051-1

Received : 01 April 2017

Accepted : 08 May 2017

Published : 22 May 2017

Publisher Name : Springer, Berlin, Heidelberg

Print ISBN : 978-3-642-27771-9

Online ISBN : 978-3-642-27771-9

eBook Packages : Springer Reference Behavioral Science and Psychology Reference Module Humanities and Social Sciences Reference Module Business, Economics and Social Sciences

  • QuestionPro


Empirical Research: Definition, Methods, Types and Examples


Empirical research: Definition

Empirical research: origin, quantitative research methods, qualitative research methods, steps for conducting empirical research, empirical research methodology cycle, advantages of empirical research, disadvantages of empirical research, why is there a need for empirical research.

Empirical research is defined as any research where the conclusions of the study are strictly drawn from concrete empirical evidence, and therefore “verifiable” evidence.

This empirical evidence can be gathered using quantitative market research and  qualitative market research  methods.

For example: Suppose a study is conducted to find out whether listening to happy music while working promotes creativity. An experiment is conducted using a music website survey with one group of participants who are exposed to happy music and another group who listen to no music at all, and the subjects are then observed. The results of such a study provide empirical evidence on whether happy music promotes creativity or not.
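As a rough illustration of how the two groups in the music example might be compared, the sketch below computes the difference in mean creativity scores and Welch's t-statistic. This is an assumed analysis, not one described in the article: the scores are invented, the 1 to 10 creativity scale is hypothetical, and `welch_t` is a helper written for this sketch.

```python
import statistics

def welch_t(group_a, group_b):
    """Welch's t-statistic for two independent samples (unequal variances allowed)."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Standard error of the difference between the two sample means.
    std_error = (var_a / len(group_a) + var_b / len(group_b)) ** 0.5
    return (mean_a - mean_b) / std_error

# Made-up creativity scores on a hypothetical 1-10 scale.
happy_music = [7, 8, 6, 9, 7, 8]
no_music = [6, 5, 7, 6, 5, 6]

difference = statistics.mean(happy_music) - statistics.mean(no_music)
print(f"difference in means = {difference:.2f}")
print(f"Welch t-statistic   = {welch_t(happy_music, no_music):.2f}")
```

A large t-statistic only suggests that the difference is unlikely to be due to chance; a full analysis would also report degrees of freedom and a p-value, which this sketch leaves out.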

Empirical research: Origin

You must have heard the quote “I will not believe it unless I see it.” This idea came from the ancient empiricists, a fundamental understanding that powered the emergence of science during the Renaissance and laid the foundation of modern science as we know it today. The word itself has its roots in Greek: it is derived from the Greek word empeirikos, which means “experienced.”

In today’s world, the word empirical refers to the collection of data using evidence gathered through observation or experience, or by using calibrated scientific instruments. All of the above origins have one thing in common: dependence on observation and experiments to collect data and test them to reach conclusions.


Types and methodologies of empirical research

Empirical research can be conducted and analysed using qualitative or quantitative methods.

  • Quantitative research: Quantitative research methods are used to gather information through numerical data. They are used to quantify opinions, behaviors or other defined variables. These methods are predetermined and more structured in format. Some of the commonly used methods are surveys, longitudinal studies, polls, etc.
  • Qualitative research: Qualitative research methods are used to gather non-numerical data. They are used to find meanings, opinions, or the underlying reasons from subjects. These methods are unstructured or semi-structured. The sample size for such research is usually small, and the method is conversational in order to provide more insight or in-depth information about the problem. Some of the most popular methods are focus groups, experiments, interviews, etc.

Data collected from these methods will need to be analysed. Empirical evidence can be analysed either quantitatively or qualitatively. Using this analysis, the researcher can answer empirical questions, which have to be clearly defined and answerable with the findings obtained. The type of research design used will vary depending on the field in which it is going to be used. Many researchers choose to combine quantitative and qualitative methods to better answer questions that cannot be studied in a laboratory setting.

Quantitative research methods

Quantitative research methods aid in analyzing the empirical evidence gathered. By using these, researchers can find out whether their hypothesis is supported or not.

  • Survey research: Survey research generally involves a large audience to collect a large amount of data. This is a quantitative method having a predetermined set of closed questions which are pretty easy to answer. Because of the simplicity of such a method, high responses are achieved. It is one of the most commonly used methods for all kinds of research in today’s world.

Previously, surveys were conducted only face to face, perhaps with a recorder. However, with advances in technology and for ease of use, new mediums such as email and social media have emerged.

For example: Depletion of energy resources is a growing concern and hence there is a need for awareness about renewable energy. According to recent studies, fossil fuels still account for around 80% of energy consumption in the United States. Even though there is a rise in the use of green energy every year, there are certain parameters because of which the general population is still not opting for green energy. In order to understand why, a survey can be conducted to gather opinions of the general population about green energy and the factors that influence their choice of switching to renewable energy. Such a survey can help institutions or governing bodies to promote appropriate awareness and incentive schemes to push the use of greener energy.
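To show how answers to a closed-ended survey question like the one in this example could be summarized, here is a small Python sketch. The question, the answer options, and the responses are all invented for illustration, and `response_shares` is a hypothetical helper, not a feature of any survey tool.

```python
from collections import Counter

def response_shares(responses):
    """Return each answer option's share of all responses, most common first."""
    counts = Counter(responses)
    total = len(responses)
    return {option: count / total for option, count in counts.most_common()}

# Made-up answers to a hypothetical closed question:
# "What is the main reason you have not switched to green energy?"
responses = [
    "cost", "cost", "availability", "cost",
    "no interest", "availability", "cost", "lack of information",
]

for option, share in response_shares(responses).items():
    print(f"{option}: {share:.0%}")
```

Because the questions are closed, every answer falls into a predefined option, which is what makes this kind of simple counting possible.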

  • Experimental research: In experimental research, an experiment is set up and a hypothesis is tested by creating a situation in which one of the variables is manipulated. It is also used to check cause and effect: the researcher tests what happens to the dependent variable when the independent variable is removed or altered. The process usually involves proposing a hypothesis, experimenting on it, analysing the findings and reporting them to show whether they support the theory.

For example: A product company wants to find out why it is failing to capture the market. The organisation makes changes in each of its processes, such as manufacturing, marketing, sales and operations. Through the experiment it finds that sales training directly affects market coverage: when salespeople are well trained, the product achieves better coverage.

  • Correlational research: Correlational research is used to find the relationship between two sets of variables. Regression analysis is generally used to predict outcomes with this method. The correlation can be positive, negative or zero (no correlation).

For example: Individuals with more education tend to hold higher-paying jobs. This is a positive correlation: more education is associated with higher pay, and less education with lower pay. Correlation alone does not show that education causes the difference.
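
The strength and direction of such a relationship is commonly summarised with the Pearson correlation coefficient, which ranges from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). The sketch below is a minimal illustration only; the function name `pearson_r` and the education and income figures are hypothetical and are not taken from any study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sample has zero variance")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: years of education and annual income (in thousands).
years_of_education = [10, 12, 12, 14, 16, 16, 18, 20]
annual_income = [28, 35, 31, 42, 50, 47, 61, 66]

r = pearson_r(years_of_education, annual_income)
print(f"r = {r:.3f}")
```

A value of r close to +1 here would indicate a strong positive correlation in the made-up sample; it would still say nothing about cause and effect.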

  • Longitudinal study: A longitudinal study is used to understand the traits or behavior of a subject by testing the subject repeatedly over a period of time. Data collected with this method can be qualitative or quantitative.

For example: A study to find out the benefits of exercise. Participants are asked to exercise every day for a set period, and the results show higher endurance, stamina and muscle growth. This supports the claim that exercise benefits the body.

  • Cross sectional: A cross-sectional study is an observational method in which a group of subjects is observed at a given point in time. The subjects are chosen so that they are similar in all variables except the one being researched. This type of study does not allow the researcher to establish a cause-and-effect relationship, because the subjects are not observed over a continuous period. It is mainly used in the healthcare sector and the retail industry.

For example: A medical study to find the prevalence of under-nutrition disorders in children of a given population. This involves looking at a wide range of parameters such as age, ethnicity, location, income and social background. If a significant number of children from poor families show under-nutrition disorders, the researcher can investigate further. A cross-sectional study is usually followed by a longitudinal study to find out the exact reason.

  • Causal-comparative research: This method is based on comparison. It is mainly used to find cause-and-effect relationships between two or more variables.

For example: A researcher measures the productivity of employees at a company that gives breaks during work and compares it with the productivity of employees at a company that gives no breaks at all.
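
A comparison like this usually comes down to comparing group means. The sketch below computes the mean difference and Welch's t statistic, which is one common choice when the two groups may have unequal variances; it is not the only option. The productivity scores and the function name `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means."""
    # statistics.variance returns the sample variance (n - 1 denominator).
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical productivity scores (tasks completed per day).
with_breaks = [24, 27, 25, 29, 26, 28]
without_breaks = [21, 23, 22, 25, 20, 24]

difference = mean(with_breaks) - mean(without_breaks)
t = welch_t(with_breaks, without_breaks)
print(f"mean difference = {difference:.2f}, t = {t:.2f}")
```

Because the groups already exist and are not randomly assigned, even a large t statistic only suggests a difference. Other differences between the two companies could still explain it.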

Some research questions need to be analysed qualitatively because quantitative methods do not apply. In many cases in-depth information is needed, or the researcher needs to observe the behavior of a target audience, so the results take a descriptive form. Qualitative research results are descriptive rather than predictive. They enable the researcher to build or support theories for potential future quantitative research. In such situations, qualitative research methods are used to reach a conclusion that supports the theory or hypothesis being studied.

  • Case study: The case study method is used to find more information by carefully analysing existing cases. It is very often used in business research or to gather empirical evidence for investigative purposes. It investigates a problem within its real-life context through existing cases. The researcher has to make sure that the parameters and variables in the existing case match those of the case being investigated. Using the findings from the case study, conclusions can be drawn about the topic being studied.

For example: A report describing the solution a company provided to its client, including the challenges faced during initiation and deployment, the findings of the case and the solutions offered for the problems. Most companies use such case studies as empirical evidence to promote themselves and win more business.

  • Observational method: The observational method is a process of observing and gathering data from a target. Since it is a qualitative method, it is time consuming and very personal. Observational research can be considered part of ethnographic research, which is also used to gather empirical evidence. It is usually a qualitative form of research, although in some cases it can be quantitative, depending on what is being studied.

For example: Setting up a study to observe a particular animal in the rainforests of the Amazon. Such research usually takes a lot of time, as observation has to be done for a set period to study the patterns or behavior of the subject. Another example widely used nowadays is observing people shopping in a mall to understand consumer buying behavior.

  • One-on-one interview: This method is purely qualitative and one of the most widely used, because it enables a researcher to get precise, meaningful data if the right questions are asked. It is a conversational method in which in-depth data can be gathered depending on where the conversation leads.

For example: A one-on-one interview with the finance minister to gather data on the country’s financial policies and their implications for the public.

  • Focus groups: Focus groups are used when a researcher wants to find answers to why, what and how questions. A small group is generally chosen, and it is not necessary to interact with the group in person. A moderator is generally needed when the group is addressed in person. The method is widely used by product companies to collect data about their brands and products.

For example: A mobile phone manufacturer wants feedback on the dimensions of a model that is yet to be launched. Such studies help the company meet customer demand and position the model appropriately in the market.

  • Text analysis: Text analysis is relatively new compared to the other methods. It is used to analyse social life by examining the words or images an individual uses. With social media playing a major part in everyday life, the method enables the researcher to follow patterns that relate to the study.

For example: Many companies ask customers for detailed feedback on how satisfied they are with the customer support team. Such data enables the researcher to make appropriate decisions to improve the support team.

Sometimes a combination of methods is needed for questions that cannot be answered with one method alone, especially when a researcher needs a complete understanding of a complex subject.

Since empirical research is based on observation and capturing experiences, it is important to plan the steps to conduct the experiment and how to analyse it. This will enable the researcher to resolve problems or obstacles which can occur during the experiment.

Step #1: Define the purpose of the research

In this step the researcher has to answer questions such as: What exactly do I want to find out? What is the problem statement? Are there any issues with the availability of knowledge, data, time or resources? Will the research be more beneficial than it costs?

Before going ahead, the researcher has to clearly define the purpose of the research and set up a plan to carry out further tasks.

Step #2: Supporting theories and relevant literature

The researcher needs to find out whether there are theories that can be linked to the research problem and whether any of them can help support the findings. Relevant literature of all kinds will show whether others have researched the problem before and what problems they faced. The researcher will also have to set up assumptions and find out whether there is any history regarding the research problem.

Step #3: Creation of Hypothesis and measurement

Before beginning the actual research, the researcher needs a working hypothesis, or a guess at the probable result. The researcher has to set up variables, decide the environment for the research and work out how the variables relate to one another.

The researcher will also need to define the units of measurement and the tolerable margin of error, and find out whether the chosen measurement will be accepted by others.

Step #4: Methodology, research design and data collection

In this step, the researcher has to define a strategy for conducting the research and set up experiments to collect data that will make it possible to test the hypothesis. The researcher decides whether an experimental or non-experimental method is needed. The research design will vary depending on the field in which the research is conducted. Last but not least, the researcher has to identify the parameters that will affect the validity of the research design. Data should be collected from appropriate samples chosen according to the research question, using one of the many sampling techniques. Once data collection is complete, the researcher has empirical data that needs to be analysed.
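
Two of the many sampling techniques can be sketched briefly: simple random sampling, where every unit has the same chance of selection, and proportional stratified sampling, where the same fraction is drawn from each subgroup. This is only an illustration; the function names, the urban and rural population and the 10% fraction are all hypothetical.

```python
import random
from collections import defaultdict


def simple_random_sample(population, k, seed=None):
    """Draw k distinct units, each with the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def stratified_sample(population, key, fraction, seed=None):
    """Draw the same fraction from every stratum (at least one unit each)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[key(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 urban and 40 rural respondents.
population = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]

srs = simple_random_sample(population, 10, seed=1)
strat = stratified_sample(population, key=lambda unit: unit[0], fraction=0.1, seed=1)
print(len(srs), len(strat))
```

Stratifying guarantees that each subgroup appears in the sample in proportion to its size, which a simple random sample of the same size does not.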

Step #5: Data Analysis and result

Data analysis can be done in two ways: qualitatively and quantitatively. The researcher needs to decide which qualitative or quantitative method is needed, or whether a combination of both is required. Based on the analysis of the data, the researcher will know whether the hypothesis is supported or rejected. Analysing the data is the most important part of testing the hypothesis.

Step #6: Conclusion

A report will need to be made with the findings of the research. The researcher can give the theories and literature that support his research. He can make suggestions or recommendations for further research on his topic.

Empirical research methodology cycle

A.D. de Groot, a famous Dutch psychologist and chess expert, conducted some of the most notable experiments using chess in the 1940s. During his studies he came up with a cycle that is consistent and now widely used to conduct empirical research. It consists of five phases, each as important as the next. The empirical cycle captures the process of coming up with hypotheses about how certain subjects work or behave and then testing these hypotheses against empirical data in a systematic and rigorous way. It can be said to characterize the deductive approach to science. The empirical cycle is as follows.

  • Observation: In this phase an idea is sparked for proposing a hypothesis, and empirical data is gathered through observation. For example: A particular species of flower blooms in a different color only during a specific season.
  • Induction: Inductive reasoning is then used to form a general conclusion from the data gathered through observation. For example: As stated above, the species of flower is observed to bloom in a different color during a specific season. A researcher may ask, “Does the temperature in that season cause the color change in the flower?” The researcher can assume that this is the case, but it is mere conjecture, so an experiment needs to be set up to test the hypothesis. The researcher tags a few sets of flowers, keeps them at a different temperature and observes whether they still change color.
  • Deduction: This phase helps the researcher deduce a conclusion from the experiment. The reasoning has to be based on logic and rationality to arrive at specific, unbiased results. For example: If the tagged flowers kept at a different temperature do not change color, it can be concluded that temperature plays a role in changing the color of the bloom.
  • Testing: In this phase the researcher returns to empirical methods to put the hypothesis to the test. The researcher now needs to make sense of the data and uses statistical analysis to determine the relationship between temperature and bloom color. If most flowers bloom in a different color when exposed to a certain temperature and do not when the temperature is different, the researcher has found support for the hypothesis. Note that this is not proof, only support for the hypothesis.
  • Evaluation: This phase is often forgotten but is important for continuing to gain knowledge. The researcher presents the data collected, the supporting argument and the conclusion. The researcher also states the limitations of the experiment and the hypothesis, and offers suggestions so that others can pick up the work and continue more in-depth research in the future.
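
The statistical analysis in the testing phase could, for instance, be a chi-square test of independence on a 2x2 table of flower counts. The sketch below is one possible approach under that assumption, not a prescribed part of the cycle. The counts and the function name `chi_square_2x2` are hypothetical; 3.841 is the standard chi-square critical value for one degree of freedom at the 5% significance level.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of observed counts."""
    (a, b), (c, d) = table
    total = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed_count - expected) ** 2 / expected
    return statistic


# Hypothetical counts of tagged flowers.
# Rows: seasonal temperature, altered temperature.
# Columns: bloomed in the new color, kept the usual color.
observed = [[18, 2],
            [5, 15]]

CRITICAL_VALUE = 3.841  # chi-square, 1 degree of freedom, 5% significance level
statistic = chi_square_2x2(observed)
print(f"chi-square = {statistic:.2f}")
if statistic > CRITICAL_VALUE:
    print("the data support a temperature and color relationship")
else:
    print("no support at the 5% level")
```

As the testing phase stresses, exceeding the critical value is support for the hypothesis, not proof of it.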

There is a reason why empirical research is one of the most widely used methods: it has several advantages. Following are a few of them.

  • It is used to authenticate traditional research through various experiments and observations.
  • This research methodology makes the research being conducted more competent and authentic.
  • It enables a researcher to understand the dynamic changes that can happen and change strategy accordingly.
  • The level of control in such research is high, so the researcher can control multiple variables.
  • It plays a vital role in increasing internal validity.

Even though empirical research makes the research more competent and authentic, it does have a few disadvantages. Following are a few of them.

  • Such research needs patience, as it can be very time consuming. The researcher has to collect data from multiple sources and there are many parameters involved.
  • Most of the time, a researcher will need to conduct research at different locations or in different environments, which can be expensive.
  • There are rules governing how experiments can be performed, so permissions are needed. It is often very difficult to get the permissions required to carry out certain methods of this research.
  • Collecting data can sometimes be a problem, as it has to be gathered from a variety of sources through different methods.

Empirical research is important in today’s world because most people believe only in what they can see, hear or experience. It is used to validate multiple hypotheses and increase human knowledge, and it continues to do so to keep advancing various fields.

For example: Pharmaceutical companies use empirical research to try out a specific drug on controlled or random groups to study cause and effect. In this way they find support for the theories they had proposed for the drug. Such research is very important, as it can sometimes lead to a cure for a disease that has existed for many years. It is useful in science and in many other fields, such as history, the social sciences and business.

With the advancement in today’s world, empirical research has become critical and a norm in many fields for supporting hypotheses and gaining knowledge. The methods mentioned above are very useful for carrying out such research. However, new methods will keep emerging as the nature of investigative questions keeps changing.

  • Review Article
  • Published: 01 June 2023

Data, measurement and empirical methods in the science of science

  • Lu Liu,
  • Benjamin F. Jones (ORCID: orcid.org/0000-0001-9697-9388),
  • Brian Uzzi (ORCID: orcid.org/0000-0001-6855-2854) &
  • Dashun Wang (ORCID: orcid.org/0000-0002-7054-2206)

Nature Human Behaviour volume 7, pages 1046–1058 (2023)

The advent of large-scale datasets that trace the workings of science has encouraged researchers from many different disciplinary backgrounds to turn scientific methods into science itself, cultivating a rapidly expanding ‘science of science’. This Review considers this growing, multidisciplinary literature through the lens of data, measurement and empirical methods. We discuss the purposes, strengths and limitations of major empirical approaches, seeking to increase understanding of the field’s diverse methodologies and expand researchers’ toolkits. Overall, new empirical developments provide enormous capacity to test traditional beliefs and conceptual frameworks about science, discover factors associated with scientific productivity, predict scientific outcomes and design policies that facilitate scientific progress.

Scientific advances are a key input to rising standards of living, health and the capacity of society to confront grand challenges, from climate change to the COVID-19 pandemic 1 , 2 , 3 . A deeper understanding of how science works and where innovation occurs can help us to more effectively design science policy and science institutions, better inform scientists’ own research choices, and create and capture enormous value for science and humanity. Building on these key premises, recent years have witnessed substantial development in the ‘science of science’ 4 , 5 , 6 , 7 , 8 , 9 , which uses large-scale datasets and diverse computational toolkits to unearth fundamental patterns behind scientific production and use.

The idea of turning scientific methods into science itself is long-standing. Since the mid-20th century, researchers from different disciplines have asked central questions about the nature of scientific progress and the practice, organization and impact of scientific research. Building on these rich historical roots, the field of the science of science draws upon many disciplines, ranging from information science to the social, physical and biological sciences to computer science, engineering and design. The science of science closely relates to several strands and communities of research, including metascience, scientometrics, the economics of science, research on research, science and technology studies, the sociology of science, metaknowledge and quantitative science studies 5 . There are noticeable differences between some of these communities, mostly around their historical origins and the initial disciplinary composition of researchers forming these communities. For example, metascience has its origins in the clinical sciences and psychology, and focuses on rigour, transparency, reproducibility and other open science-related practices and topics. The scientometrics community, born in library and information sciences, places a particular emphasis on developing robust and responsible measures and indicators for science. Science and technology studies engage the history of science and technology, the philosophy of science, and the interplay between science, technology and society. The science of science, which has its origins in physics, computer science and sociology, takes a data-driven approach and emphasizes questions on how science works. Each of these communities has made fundamental contributions to understanding science. While they differ in their origins, these differences pale in comparison to the overarching, common interest in understanding the practice of science and its societal impact.

Three major developments have encouraged rapid advances in the science of science. The first is in data 9 : modern databases include millions of research articles, grant proposals, patents and more. This windfall of data traces scientific activity in remarkable detail and at scale. The second development is in measurement: scholars have used data to develop many new measures of scientific activities and examine theories that have long been viewed as important but difficult to quantify. The third development is in empirical methods: thanks to parallel advances in data science, network science, artificial intelligence and econometrics, researchers can study relationships, make predictions and assess science policy in powerful new ways. Together, new data, measurements and methods have revealed fundamental new insights about the inner workings of science and scientific progress itself.

With multiple approaches, however, comes a key challenge. As researchers adhere to norms respected within their disciplines, their methods vary, with results often published in venues with non-overlapping readership, fragmenting research along disciplinary boundaries. This fragmentation challenges researchers’ ability to appreciate and understand the value of work outside of their own discipline, much less to build directly on it for further investigations.

Recognizing these challenges and the rapidly developing nature of the field, this paper reviews the empirical approaches that are prevalent in this literature. We aim to provide readers with an up-to-date understanding of the available datasets, measurement constructs and empirical methodologies, as well as the value and limitations of each. Owing to space constraints, this Review does not cover the full technical details of each method, referring readers to related guides to learn more. Instead, we will emphasize why a researcher might favour one method over another, depending on the research question.

Beyond a positive understanding of science, a key goal of the science of science is to inform science policy. While this Review mainly focuses on empirical approaches, with its core audience being researchers in the field, the studies reviewed are also germane to key policy questions. For example, what is the appropriate scale of scientific investment, in what directions and through what institutions 10 , 11 ? Are public investments in science aligned with public interests 12 ? What conditions produce novel or high-impact science 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 ? How do the reward systems of science influence the rate and direction of progress 13 , 21 , 22 , 23 , 24 , and what governs scientific reproducibility 25 , 26 , 27 ? How do contributions evolve over a scientific career 28 , 29 , 30 , 31 , 32 , and how may diversity among scientists advance scientific progress 33 , 34 , 35 , among other questions relevant to science policy 36 , 37 .

Overall, this review aims to facilitate entry to science of science research, expand researcher toolkits and illustrate how diverse research approaches contribute to our collective understanding of science. Section 2 reviews datasets and data linkages. Section 3 reviews major measurement constructs in the science of science. Section 4 considers a range of empirical methods, focusing on one study to illustrate each method and briefly summarizing related examples and applications. Section 5 concludes with an outlook for the science of science.

Historically, data on scientific activities were difficult to collect and were available in limited quantities. Gathering data could involve manually tallying statistics from publications 38 , 39 , interviewing scientists 16 , 40 , or assembling historical anecdotes and biographies 13 , 41 . Analyses were typically limited to a specific domain or group of scientists. Today, massive datasets on scientific production and use are at researchers’ fingertips 42 , 43 , 44 . Armed with big data and advanced algorithms, researchers can now probe questions previously not amenable to quantification and with enormous increases in scope and scale, as detailed below.

Publication datasets cover papers from nearly all scientific disciplines, enabling analyses of both general and domain-specific patterns. Commonly used datasets include the Web of Science (WoS), PubMed, CrossRef, ORCID, OpenCitations, Dimensions and OpenAlex. Datasets incorporating papers’ text (CORE) 45 , 46 , 47 , data entities (DataCite) 48 , 49 and peer review reports (Publons) 33 , 50 , 51 have also become available. These datasets further enable novel measurement, for example, representations of a paper’s content 52 , 53 , novelty 15 , 54 and interdisciplinarity 55 .

Notably, databases today capture more diverse aspects of science beyond publications, offering a richer and more encompassing view of research contexts and of researchers themselves (Fig. 1 ). For example, some datasets trace research funding to the specific publications these investments support 56 , 57 , allowing high-scale studies of the impact of funding on productivity and the return on public investment. Datasets incorporating job placements 58 , 59 , curriculum vitae 21 , 59 and scientific prizes 23 offer rich quantitative evidence on the social structure of science. Combining publication profiles with mentorship genealogies 60 , 61 , dissertations 34 and course syllabi 62 , 63 provides insights on mentoring and cultivating talent.

Fig. 1. This figure presents commonly used data types in science of science research, information contained in each data type and examples of data sources. Datasets in the science of science research have not only grown in scale but have also expanded beyond publications to integrate upstream funding investments and downstream applications that extend beyond science itself.

Finally, today’s scope of data extends beyond science to broader aspects of society. Altmetrics 64 captures news media and social media mentions of scientific articles. Other databases incorporate marketplace uses of science, including through patents 10 , pharmaceutical clinical trials and drug approvals 65 , 66 . Policy documents 67 , 68 help us to understand the role of science in the halls of government 69 and policy making 12 , 68 .

While datasets of the modern scientific enterprise have grown exponentially, they are not without limitations. As is often the case for data-driven research, drawing conclusions from specific data sources requires scrutiny and care. Datasets are typically based on published work, which may favour easy-to-publish topics over important ones (the streetlight effect) 70 , 71 . The publication of negative results is also rare (the file drawer problem) 72 , 73 . Meanwhile, English language publications account for over 90% of articles in major data sources, with limited coverage of non-English journals 74 . Publication datasets may also reflect biases in data collection across research institutions or demographic groups. Despite the open science movement, many datasets require paid subscriptions, which can create inequality in data access. Creating more open datasets for the science of science, such as OpenAlex, may not only improve the robustness and replicability of empirical claims but also increase entry to the field.

As today’s datasets become larger in scale and continue to integrate new dimensions, they offer opportunities to unveil the inner workings and external impacts of science in new ways. They can enable researchers to reach beyond previous limitations while conducting original studies of new and long-standing questions about the sciences.


Here we discuss prominent measurement approaches in the science of science, including their purposes and limitations.

Modern publication databases typically include data on which articles and authors cite other papers and scientists. These citation linkages have been used to engage core conceptual ideas in scientific research. Here we consider two common measures based on citation information: citation counts and knowledge flows.

First, citation counts are commonly used indicators of impact. The term ‘indicator’ implies that it only approximates the concept of interest. A citation count is defined as how many times a document is cited by subsequent documents and can proxy for the importance of research papers 75 , 76 as well as patented inventions 77 , 78 , 79 . Rather than treating each citation equally, measures may further weight the importance of each citation, for example by using the citation network structure to produce centrality 80 , PageRank 81 , 82 or Eigenfactor indicators 83 , 84 .
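
As a minimal sketch of the definition above, citation counts can be tallied from a list of citation links. The paper identifiers and the `(citing, cited)` pair layout are hypothetical and do not reflect the schema of any particular database.

```python
from collections import Counter

# Hypothetical citation links as (citing_paper, cited_paper) pairs.
citations = [
    ("P4", "P1"), ("P4", "P2"), ("P5", "P1"),
    ("P5", "P3"), ("P6", "P1"), ("P6", "P4"),
]


def citation_counts(links):
    """Count how many distinct documents cite each document."""
    # Deduplicate links so that repeated references from one paper count once.
    return Counter(cited for _, cited in set(links))


counts = citation_counts(citations)
for paper, count in counts.most_common():
    print(paper, count)
```

Every citation is weighted equally here. The weighted indicators mentioned above (centrality, PageRank, Eigenfactor) instead use the structure of the whole citation network.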

Citation-based indicators have also faced criticism 84 , 85 . Citation indicators necessarily oversimplify the construct of impact, often ignoring heterogeneity in the meaning and use of a particular reference, the variations in citation practices across fields and institutional contexts, and the potential for reputation and power structures in science to influence citation behaviour 86 , 87 . Researchers have started to understand more nuanced citation behaviours ranging from negative citations 86 to citation context 47 , 88 , 89 . Understanding what a citation actually measures matters in interpreting and applying many research findings in the science of science. Evaluations relying on citation-based indicators rather than expert judgements raise questions regarding misuse 90 , 91 , 92 . Given the importance of developing indicators that can reliably quantify and evaluate science, the scientometrics community has been working to provide guidance for responsible citation practices and assessment 85 .

Second, scientists use citations to trace knowledge flows. Each citation in a paper is a link to specific previous work from which we can proxy how new discoveries draw upon existing ideas 76 , 93 and how knowledge flows between fields of science 94 , 95 , research institutions 96 , regions and nations 97 , 98 , 99 , and individuals 81 . Combinations of citation linkages can also approximate novelty 15 , disruptiveness 17 , 100 and interdisciplinarity 55 , 95 , 101 , 102 . A rapidly expanding body of work further examines citations to scientific articles from other domains (for example, patents, clinical drug trials and policy documents) to understand the applied value of science 10 , 12 , 65 , 66 , 103 , 104 , 105 .


Analysing individual careers allows researchers to answer questions such as: How do we quantify individual scientific productivity? What is a typical career lifecycle? How are resources and credits allocated across individuals and careers? A scholar’s career can be examined through the papers they publish 30 , 31 , 106 , 107 , 108 , with attention to career progression and mobility, publication counts and citation impact, as well as grant funding 24 , 109 , 110 and prizes 111 , 112 , 113 .

Studies of individual impact focus on output, typically approximated by the number of papers a researcher publishes and citation indicators. A popular measure for individual impact is the h -index 114 , which takes both volume and per-paper impact into consideration. Specifically, a scientist is assigned the largest value h such that they have h papers that were each cited at least h times. Later studies build on the idea of the h -index and propose variants to address its limitations 115 . These variants range from emphasizing highly cited papers in a career 116 , to accounting for field differences 117 and normalizations 118 , to weighting the relative contribution of an individual in collaborative works 119 .
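The h-index definition above translates directly into a few lines of code. This is a minimal sketch; the function name `h_index` and the example citation lists are illustrative.

```python
def h_index(citations):
    """Largest h such that h papers have each been cited at least h times."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Two hypothetical careers with five papers each.
steady = h_index([10, 8, 5, 4, 3])
one_hit = h_index([25, 8, 5, 3, 3])
```

The second career has one very highly cited paper but a lower h-index, which illustrates why some variants put extra weight on highly cited work.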

To study dynamics in output over the lifecycle, individuals can be studied according to age, career age or the sequence of publications. A long-standing literature has investigated the relationship between age and the likelihood of outstanding achievement 28 , 106 , 111 , 120 , 121 . Recent studies further decouple the relationship between age, publication volume and per-paper citation, and measure the likelihood of producing highly cited papers in the sequence of works one produces 30 , 31 .

As simple as it sounds, representing careers using publication records is difficult. Collecting the full publication list of a researcher is the foundation to study individuals yet remains a key challenge, requiring name disambiguation techniques to match specific works to specific researchers. Although algorithms are increasingly capable of identifying millions of career profiles 122 , they vary in accuracy and robustness. ORCID can help to alleviate the problem by offering researchers the opportunity to create, maintain and update individual profiles themselves, and it goes beyond publications to collect broader outputs and activities 123 . A second challenge is survivorship bias. Empirical studies tend to focus on careers that are long enough to afford statistical analyses, which limits the applicability of the findings to scientific careers as a whole. A third challenge is the breadth of scientists’ activities, where focusing on publications ignores other important contributions such as mentorship and teaching, service (for example, refereeing papers, reviewing grant proposals and editing journals) or leadership within their organizations. Although researchers have begun exploring these dimensions by linking individual publication profiles with genealogical databases 61 , 124 , dissertations 34 , grants 109 , curriculum vitae 21 and acknowledgements 125 , scientific careers beyond publication records remain under-studied 126 , 127 . Lastly, citation-based indicators serve only as an approximation of individual performance, with limitations similar to those discussed above. The scientific community has called for more appropriate practices 85 , 128 , ranging from incorporating expert assessment of research contributions to broadening the measures of impact beyond publications.

Over many decades, science has exhibited a substantial and steady shift away from solo authorship towards coauthorship, especially among highly cited works 18 , 129 , 130 . In light of this shift, a research field, the science of team science 131 , 132 , has emerged to study the mechanisms that facilitate or hinder the effectiveness of teams. Team size can be proxied by the number of coauthors on a paper, which has been shown to predict distinctive types of advance: whereas larger teams tend to develop ideas, smaller teams tend to disrupt current ways of thinking 17 . Team characteristics can be inferred from coauthors’ backgrounds 133 , 134 , 135 , allowing quantification of a team’s diversity in terms of field, age, gender or ethnicity. Collaboration networks based on coauthorship 130 , 136 , 137 , 138 , 139 offer nuanced network-based indicators to understand individual and institutional collaborations.
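A rough sketch of how team size and a collaboration network can be derived from author lists follows. The `PAPERS` dictionary and author names are hypothetical, and the sketch assumes author names have already been disambiguated.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: paper id -> list of (already disambiguated) authors.
PAPERS = {
    "p1": ["Ana", "Ben"],
    "p2": ["Ana", "Ben", "Chen"],
    "p3": ["Dee"],
}


def team_sizes(papers):
    """Proxy team size by the number of coauthors on each paper."""
    return {paper_id: len(set(authors)) for paper_id, authors in papers.items()}


def coauthor_network(papers):
    """Weighted, undirected coauthorship edges: (author_a, author_b) -> joint papers."""
    weights = Counter()
    for authors in papers.values():
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights


sizes = team_sizes(PAPERS)
network = coauthor_network(PAPERS)
```

Edge weights such as `network[("Ana", "Ben")]` can then feed standard network indicators (degree, centrality and so on).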

However, there are limitations to using coauthorship alone to study teams 132 . First, coauthorship can obscure individual roles 140 , 141 , 142 , which has prompted institutional responses to help to allocate credit, including authorship order and individual contribution statements 56 , 143 . Second, coauthorship does not reflect the complex dynamics and interactions between team members that are often instrumental for team success 53 , 144 . Third, collaborative contributions can extend beyond coauthorship in publications to include members of a research laboratory 145 or co-principal investigators (co-PIs) on a grant 146 . Initiatives such as CRediT may help to address some of these issues by recording detailed roles for each contributor 147 .


Research institutions, such as departments, universities, national laboratories and firms, encompass wider groups of researchers and their corresponding outputs. Institutional membership can be inferred from affiliations listed on publications or patents 148 , 149 , and the output of an institution can be aggregated over all its affiliated researchers 150 . Institutional research information systems (CRIS) contain more comprehensive research outputs and activities from employees.

Some research questions consider the institution as a whole, investigating the returns to research and development investment 104 , inequality of resource allocation 22 and the flow of scientists 21 , 148 , 149 . Other questions focus on institutional structures as sources of research productivity by looking into the role of peer effects 125 , 151 , 152 , 153 , how institutional policies impact research outcomes 154 , 155 and whether interdisciplinary efforts foster innovation 55 . Institution-oriented measurement faces similar limitations as with analyses of individuals and teams, including name disambiguation for a given institution and the limited capacity of formal publication records to characterize the full range of relevant institutional outcomes. It is also unclear how to allocate credit among multiple institutions associated with a paper. Moreover, relevant institutional employees extend beyond publishing researchers: interns, technicians and administrators all contribute to research endeavours 130 .

In sum, measurements allow researchers to quantify scientific production and use across numerous dimensions, but they also raise questions of construct validity: Does the proposed metric really reflect what we want to measure? Testing the construct’s validity is important, as is understanding a construct’s limits. Where possible, using alternative measurement approaches, or qualitative methods such as interviews and surveys, can improve measurement accuracy and the robustness of findings.

Empirical methods

In this section, we review two broad categories of empirical approaches (Table 1 ), each with distinctive goals: (1) to discover, estimate and predict empirical regularities; and (2) to identify causal mechanisms. For each method, we give a concrete example to help to explain how the method works, summarize related work for interested readers, and discuss contributions and limitations.

Descriptive and predictive approaches

Empirical regularities and generalizable facts.

The discovery of empirical regularities in science has had a key role in driving conceptual developments and the directions of future research. By observing empirical patterns at scale, researchers unveil central facts that shape science and present core features that theories of scientific progress and practice must explain. For example, consider citation distributions. de Solla Price first proposed that citation distributions are fat-tailed 39 , indicating that a few papers have extremely high citations while most papers have relatively few or even no citations at all. de Solla Price proposed that citation distribution was a power law, while researchers have since refined this view to show that the distribution appears log-normal, a nearly universal regularity across time and fields 156 , 157 . The fat-tailed nature of citation distributions and its universality across the sciences has in turn sparked substantial theoretical work that seeks to explain this key empirical regularity 20 , 156 , 158 , 159 .

Empirical regularities are often surprising and can contest previous beliefs of how science works. For example, it has been shown that the age distribution of great achievements peaks in middle age across a wide range of fields 107 , 121 , 160 , rejecting the common belief that young scientists typically drive breakthroughs in science. A closer look at the individual careers also indicates that productivity patterns vary widely across individuals 29 . Further, a scholar’s highest-impact papers come at a remarkably constant rate across the sequence of their work 30 , 31 .

The discovery of empirical regularities has had important roles in shaping beliefs about the nature of science 10 , 45 , 161 , 162 , sources of breakthrough ideas 15 , 163 , 164 , 165 , scientific careers 21 , 29 , 126 , 127 , the network structure of ideas and scientists 23 , 98 , 136 , 137 , 138 , 139 , 166 , gender inequality 57 , 108 , 126 , 135 , 143 , 167 , 168 , and many other areas of interest to scientists and science institutions 22 , 47 , 86 , 97 , 102 , 105 , 134 , 169 , 170 , 171 . At the same time, care must be taken to ensure that findings are not merely artefacts due to data selection or inherent bias. To differentiate meaningful patterns from spurious ones, it is important to stress test the findings through different selection criteria or across non-overlapping data sources.

Regression analysis

When investigating correlations among variables, a classic method is regression, which estimates how one set of variables explains variation in an outcome of interest. Regression can be used to test explicit hypotheses or predict outcomes. For example, researchers have investigated whether a paper’s novelty predicts its citation impact 172 . Adding additional control variables to the regression, one can further examine the robustness of the focal relationship.
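To make the regression idea concrete, here is a minimal single-predictor ordinary least squares sketch. The `novelty` and `log_citations` values are invented for illustration, and a real analysis would add control variables and standard errors, which this sketch omits.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares; also return R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical data: a novelty score and log citation count for five papers.
novelty = [0.1, 0.2, 0.3, 0.4, 0.5]
log_citations = [1.0, 1.4, 1.3, 1.9, 2.0]

slope, intercept, r_squared = ols_fit(novelty, log_citations)
```

The slope describes an association only; nothing in the fit distinguishes a causal effect of novelty from the influence of unmeasured factors.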

Although regression analysis is useful for hypothesis testing, it bears substantial limitations. If the question one wishes to ask concerns a ‘causal’ rather than a correlational relationship, regression is poorly suited to the task as it is impossible to control for all the confounding factors. Failing to account for such ‘omitted variables’ can bias the regression coefficient estimates and lead to spurious interpretations. Further, regression models often have low goodness of fit (small R 2 ), indicating that the variables considered explain little of the outcome variation. As regressions typically focus on a specific relationship in simple functional forms, regressions tend to emphasize interpretability rather than overall predictability. The advent of predictive approaches powered by large-scale datasets and novel computational techniques offers new opportunities for modelling complex relationships with stronger predictive power.

Mechanistic models

Mechanistic modelling is an important approach to explaining empirical regularities, drawing from methods primarily used in physics. Such models predict macro-level regularities of a system by modelling micro-level interactions among basic elements with interpretable and modifiable formulae. While theoretical by nature, mechanistic models in the science of science are often empirically grounded, and this approach has developed together with the advent of large-scale, high-resolution data.

Simplicity is the core value of a mechanistic model. Consider, for example, why citations follow a fat-tailed distribution. de Solla Price modelled the citing behaviour as a cumulative advantage process on a growing citation network 159 and found that if the probability a paper is cited grows linearly with its existing citations, the resulting distribution would follow a power law, broadly aligned with empirical observations. The model is intentionally simplified, ignoring myriad factors. Yet the simple cumulative advantage process is by itself sufficient to explain a power law distribution of citations. In this way, mechanistic models can help to reveal key mechanisms that can explain observed patterns.
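The cumulative advantage mechanism can be simulated in a few lines. This is a simplified sketch rather than a reproduction of the original model: the network size, the number of references per paper and the choice to give every paper one baseline "ticket" (so that uncited papers can still be cited) are assumptions made here for illustration.

```python
import random


def simulate_cumulative_advantage(n_papers=2000, refs_per_paper=3, seed=0):
    """Grow a citation network where P(cite paper i) is proportional to citations_i + 1."""
    rng = random.Random(seed)
    citations = [0]  # citation count of each paper, starting with one seed paper
    urn = [0]        # paper i appears (citations[i] + 1) times in the urn
    for new_paper in range(1, n_papers):
        # Drawing uniformly from the urn is a draw proportional to citations + 1.
        cited = {rng.choice(urn) for _ in range(refs_per_paper)}
        for old_paper in cited:
            citations[old_paper] += 1
            urn.append(old_paper)
        citations.append(0)
        urn.append(new_paper)
    return citations


citations = simulate_cumulative_advantage()
top_share = sum(sorted(citations, reverse=True)[:20]) / sum(citations)
```

Inspecting `top_share` (the fraction of all citations held by the 20 most cited papers) or a histogram of `citations` on log axes gives a quick view of how unequal the simulated distribution becomes.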

Moreover, mechanistic models can be refined as empirical evidence evolves. For example, later investigations showed that citation distributions are better characterized as log-normal 156 , 173 , prompting researchers to introduce a fitness parameter to encapsulate the inherent differences in papers’ ability to attract citations 174 , 175 . Further, older papers are less likely to be cited than expected 176 , 177 , 178 , motivating more recent models 20 to introduce an additional aging effect 179 . By combining the cumulative advantage, fitness and aging effects, one can already achieve substantial predictive power not just for the overall properties of the system but also the citation dynamics of individual papers 20 .

In addition to citations, mechanistic models have been developed to understand the formation of collaborations 136 , 180 , 181 , 182 , 183 , knowledge discovery and diffusion 184 , 185 , topic selection 186 , 187 , career dynamics 30 , 31 , 188 , 189 , the growth of scientific fields 190 and the dynamics of failure in science and other domains 178 .

At the same time, some observers have argued that mechanistic models are too simplistic to capture the essence of complex real-world problems 191 . While it has been a cornerstone for the natural sciences, representing social phenomena in a limited set of mathematical equations may miss complexities and heterogeneities that make social phenomena interesting in the first place. Such concerns are not unique to the science of science, as they represent a broader theme in computational social sciences 192 , 193 , ranging from social networks 194 , 195 to human mobility 196 , 197 to epidemics 198 , 199 . Other observers have questioned the practical utility of mechanistic models and whether they can be used to guide decisions and devise actionable policies. Nevertheless, despite these limitations, several complex phenomena in the science of science are well captured by simple mechanistic models, showing a high degree of regularity beneath complex interacting systems and providing powerful insights about the nature of science. Mixing such modelling with other methods could be particularly fruitful in future investigations.

Machine learning

The science of science seeks in part to forecast promising directions for scientific research 7 , 44 . In recent years, machine learning methods have substantially advanced predictive capabilities 200 , 201 and are playing increasingly important parts in the science of science. In contrast to the previous methods, machine learning does not emphasize hypotheses or theories. Rather, it leverages complex relationships in data and optimizes goodness of fit to make predictions and categorizations.

Traditional machine learning models include supervised, semi-supervised and unsupervised learning. The model choice depends on data availability and the research question, ranging from supervised models for citation prediction 202 , 203 to unsupervised models for community detection 204 . Take for example mappings of scientific knowledge 94 , 205 , 206 . The unsupervised method applies network clustering algorithms to map the structures of science. Related visualization tools make sense of clusters from the underlying network, allowing observers to see the organization, interactions and evolution of scientific knowledge. More recently, supervised learning, and deep neural networks in particular, have witnessed especially rapid developments 207 . Neural networks can generate high-dimensional representations of unstructured data such as images and texts, which encode complex properties difficult for human experts to perceive.

Take text analysis as an example. A recent study 52 utilizes 3.3 million paper abstracts in materials science to predict the thermoelectric properties of materials. The intuition is that the words currently used to describe a material may predict its hitherto undiscovered properties (Fig. 2 ). Compared with a random material, the materials predicted by the model are eight times more likely to be reported as thermoelectric in the next 5 years, suggesting that machine learning has the potential to substantially speed up knowledge discovery, especially as data continue to grow in scale and scope. Indeed, predicting the direction of new discoveries represents one of the most promising avenues for machine learning models, with neural networks being applied widely to biology 208 , physics 209 , 210 , mathematics 211 , chemistry 212 , medicine 213 and clinical applications 214 . Neural networks also offer a quantitative framework to probe the characteristics of creative products ranging from scientific papers 53 , journals 215 , organizations 148 , to paintings and movies 32 . Neural networks can also help to predict the reproducibility of papers from a variety of disciplines at scale 53 , 216 .

Fig. 2. This figure illustrates the word2vec skip-gram method 52 , where the goal is to predict useful properties of materials using previous scientific literature. a , The architecture and training process of the word2vec skip-gram model, where the 3-layer, fully connected neural network learns the 200-dimensional representation (hidden layer) from the sparse vector for each word and its context in the literature (input layer). b , The top two principal components of the word embedding. Materials with similar features are close in the 2D space, allowing prediction of a material’s properties. Different targeted words are shown in different colours. Reproduced with permission from ref. 52 , Springer Nature Ltd.
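As a small sketch of the skip-gram setup described in the caption, the code below builds the (target word, context word) pairs that such a model is trained on, and the cosine similarity typically used to compare the learned vectors. Training the network itself is omitted; the window size, the example sentence and the function names are assumptions for illustration and are not taken from ref. 52.

```python
import math


def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for every word and its neighbours within `window`."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical tokenized fragment of an abstract.
tokens = ["bi2te3", "is", "a", "thermoelectric", "material"]
pairs = skipgram_pairs(tokens, window=1)
```

In a trained model, a material whose vector has high cosine similarity to the vector for a property word such as "thermoelectric" would be a candidate prediction.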

While machine learning can offer high predictive accuracy, successful applications to the science of science face challenges, particularly regarding interpretability. Researchers may value transparent and interpretable findings for how a given feature influences an outcome, rather than a black-box model. The lack of interpretability also raises concerns about bias and fairness. In predicting reproducible patterns from data, machine learning models inevitably include and reproduce biases embedded in these data, often in non-transparent ways. The fairness of machine learning 217 is heavily debated in applications ranging from the criminal justice system to hiring processes. Effective and responsible use of machine learning in the science of science therefore requires thoughtful partnership between humans and machines 53 to build a reliable system accessible to scrutiny and modification.

Causal approaches

The preceding methods can reveal core facts about the workings of science and develop predictive capacity. Yet, they fail to capture causal relationships, which are particularly useful in assessing policy interventions. For example, how can we test whether a science policy boosts or hinders the performance of individuals, teams or institutions? The overarching idea of causal approaches is to construct some counterfactual world where two groups are identical to each other except that one group experiences a treatment that the other group does not.

Towards causation

Before engaging in causal approaches, it is useful to first consider the interpretative challenges of observational data. As observational data emerge from mechanisms that are not fully known or measured, an observed correlation may be driven by underlying forces that were not accounted for in the analysis. This challenge makes causal inference fundamentally difficult in observational data. An awareness of this issue is the first step in confronting it. It further motivates intermediate empirical approaches, including the use of matching strategies and fixed effects, that can help to confront (although not fully eliminate) the inference challenge. We first consider these approaches before turning to more fully causal methods.

Matching. Matching utilizes rich information to construct a control group that is similar to the treatment group on as many observable characteristics as possible before the treatment group is exposed to the treatment. Inferences can then be made by comparing the treatment and the matched control groups. Exact matching applies to categorical values, such as country, gender, discipline or affiliation 35 , 218 . Coarsened exact matching considers percentile bins of continuous variables and matches observations in the same bin 133 . Propensity score matching estimates the probability of receiving the ‘treatment’ on the basis of the controlled variables and uses the estimates to match treatment and control groups, which reduces the matching task from comparing the values of multiple covariates to comparing a single value 24 , 219 . Dynamic matching is useful for longitudinally matching variables that change over time 220 , 221 .
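The following sketch combines exact matching on a categorical variable with coarsened matching on a continuous one. The records, the field names (`field`, `prior_papers`, `treated`) and the fixed cutpoints are hypothetical; in practice the bins might be percentile-based, as described above.

```python
from collections import defaultdict

# Hypothetical researchers, with pre-treatment characteristics only.
UNITS = [
    {"id": 1, "field": "physics", "prior_papers": 4, "treated": True},
    {"id": 2, "field": "physics", "prior_papers": 5, "treated": False},
    {"id": 3, "field": "physics", "prior_papers": 40, "treated": False},
    {"id": 4, "field": "biology", "prior_papers": 12, "treated": True},
    {"id": 5, "field": "biology", "prior_papers": 3, "treated": False},
    {"id": 6, "field": "biology", "prior_papers": 15, "treated": False},
]


def coarsen(value, cutpoints):
    """Map a continuous value to a bin index given ascending cutpoints."""
    return sum(value >= cut for cut in cutpoints)


def coarsened_exact_match(units, cutpoints):
    """Group units by (field, coarsened prior output); keep strata with both groups."""
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for unit in units:
        key = (unit["field"], coarsen(unit["prior_papers"], cutpoints))
        group = "treated" if unit["treated"] else "control"
        strata[key][group].append(unit["id"])
    return {key: groups for key, groups in strata.items()
            if groups["treated"] and groups["control"]}


matched = coarsened_exact_match(UNITS, cutpoints=[10, 30])
```

Units 3 and 5 are dropped because no treated researcher shares their stratum, which shows the usual trade-off between match quality and sample size.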

Fixed effects. Fixed effects are a powerful and now standard tool in controlling for confounders. A key requirement for using fixed effects is that there are multiple observations on the same subject or entity (person, field, institution and so on) 222 , 223 , 224 . The fixed effect works as a dummy variable that accounts for the role of any fixed characteristic of that entity. Consider the finding where gender-diverse teams produce higher-impact papers than same-gender teams do 225 . A confounder may be that individuals who tend to write high-impact papers may also be more likely to work in gender-diverse teams. By including individual fixed effects, one accounts for any fixed characteristics of individuals (such as IQ, cultural background or previous education) that might drive the relationship of interest.
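A minimal sketch of the idea uses the "within" transformation, which subtracts each entity's own mean and yields the same slope as including one dummy variable per entity. The data are constructed for illustration: each researcher's outcome equals 2 times the regressor plus a fixed personal offset, and the researcher with the larger offset also has larger regressor values.

```python
from collections import defaultdict

# Hypothetical panel: (researcher, team_diversity_score, paper_impact).
ROWS = [
    ("A", 1.0, 2.0), ("A", 2.0, 4.0), ("A", 3.0, 6.0),      # offset 0
    ("B", 4.0, 18.0), ("B", 5.0, 20.0), ("B", 6.0, 22.0),   # offset 10
]


def pooled_slope(rows):
    """OLS slope ignoring who produced each observation."""
    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def within_slope(rows):
    """Fixed-effects slope: demean x and y within each entity before regressing."""
    groups = defaultdict(list)
    for entity, x, y in rows:
        groups[entity].append((x, y))
    sxy = sxx = 0.0
    for observations in groups.values():
        mean_x = sum(x for x, _ in observations) / len(observations)
        mean_y = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    return sxy / sxx


naive = pooled_slope(ROWS)
fixed_effects = within_slope(ROWS)
```

The pooled estimate is inflated by the fixed difference between the two researchers, whereas the within estimate recovers the slope of 2 built into the data.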

In sum, matching and fixed effects methods reduce potential sources of bias in interpreting relationships between variables. Yet, confounders may persist in these studies. For instance, fixed effects do not control for unobserved factors that change with time within the given entity (for example, access to funding or new skills). Identifying causal effects convincingly will then typically require distinct research methods that we turn to next.


Researchers in economics and other fields have developed a range of quasi-experimental methods to construct treatment and control groups. The key idea here is exploiting randomness from external events that differentially expose subjects to a particular treatment. Here we review three quasi-experimental methods: difference-in-differences, instrumental variables and regression discontinuity (Fig. 3 ).

Fig. 3. a – c , This figure presents illustrations of ( a ) difference-in-differences, ( b ) instrumental variables and ( c ) regression discontinuity methods. The solid line in b represents causal links and the dashed line represents the relationships that are not allowed, if the IV method is to produce causal inference.

Difference-in-differences. Difference-in-differences regression (DiD) investigates the effect of an unexpected event, comparing the affected group (the treated group) with an unaffected group (the control group). The control group is intended to provide the counterfactual path, that is, what would have happened were it not for the unexpected event. Ideally, the treated and control groups are on virtually identical paths before the treatment event, but DiD can also work if the groups are on parallel paths (Fig. 3a ). For example, one study 226 examines how the premature death of superstar scientists affects the productivity of their previous collaborators. The control group are collaborators of superstars who did not die in the time frame. The two groups do not show significant differences in publications before a death event, yet upon the death of a star scientist, the treated collaborators on average experience a 5–8% decline in their quality-adjusted publication rates compared with the control group. DiD has wide applicability in the science of science, having been used to analyse the causal effects of grant design 24 , access costs to previous research 155 , 227 , university technology transfer policies 154 , intellectual property 228 , citation practices 229 , evolution of fields 221 and the impacts of paper retractions 230 , 231 , 232 . The DiD literature has grown especially rapidly in the field of economics, with substantial recent refinements 233 , 234 .
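In its simplest two-group, two-period form, the DiD estimate is the change in the treated group minus the change in the control group. The sketch below uses invented publication rates; it is not the data or the regression specification of the cited study.

```python
def mean(values):
    return sum(values) / len(values)


def did_estimate(rows):
    """(treated post - treated pre) - (control post - control pre)."""
    cell = {}
    for treated in (False, True):
        for post in (False, True):
            cell[(treated, post)] = mean(
                [r["outcome"] for r in rows
                 if r["treated"] == treated and r["post"] == post]
            )
    treated_change = cell[(True, True)] - cell[(True, False)]
    control_change = cell[(False, True)] - cell[(False, False)]
    return treated_change - control_change


# Hypothetical yearly publication rates before and after the event.
ROWS = [
    {"treated": True, "post": False, "outcome": 10.0},
    {"treated": True, "post": False, "outcome": 12.0},
    {"treated": True, "post": True, "outcome": 9.0},
    {"treated": True, "post": True, "outcome": 11.0},
    {"treated": False, "post": False, "outcome": 8.0},
    {"treated": False, "post": False, "outcome": 10.0},
    {"treated": False, "post": True, "outcome": 8.5},
    {"treated": False, "post": True, "outcome": 10.5},
]

effect = did_estimate(ROWS)
```

The treated group falls by 1.0 while the control group rises by 0.5, so the estimate is -1.5. The groups start at different levels, which is acceptable as long as their pre-event trends are parallel.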

Instrumental variables. Another quasi-experimental approach utilizes ‘instrumental variables’ (IV). The goal is to determine the causal influence of some feature X on some outcome Y by using a third, instrumental variable. This instrumental variable is a quasi-random event that induces variation in X and, except for its impact through X , has no other effect on the outcome Y (Fig. 3b ). For example, consider a study of astronomy that seeks to understand how telescope time affects career advancement 235 . Here, one cannot simply look at the correlation between telescope time and career outcomes because many confounds (such as talent or grit) may influence both telescope time and career opportunities. Now consider the weather as an instrumental variable. Cloudy weather will, at random, reduce an astronomer’s observational time. Yet, the weather on particular nights is unlikely to correlate with a scientist’s innate qualities. The weather can then provide an instrumental variable to reveal a causal relationship between telescope time and career outcomes. Instrumental variables have been used to study local peer effects in research 151 , the impact of gender composition in scientific committees 236 , patents on future innovation 237 and taxes on inventor mobility 238 .
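With a single instrument, the IV estimate reduces to a ratio of covariances: cov(Z, Y) / cov(Z, X). The sketch below builds a tiny, fully hypothetical dataset in the spirit of the telescope example, where an unobserved `talent` variable raises both telescope time and career outcomes, and cloudy weather is balanced across talent levels by construction. The true effect of telescope time is set to 3.

```python
def covariance(a, b):
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


def ols_slope(x, y):
    """Naive regression slope of y on x (biased when a confounder is omitted)."""
    return covariance(x, y) / covariance(x, x)


def iv_estimate(z, x, y):
    """Instrumental variable (Wald) estimate of the effect of x on y, using z."""
    return covariance(z, y) / covariance(z, x)


# Hypothetical data: cloudy is the instrument, talent is unobserved in practice.
cloudy = [0, 0, 1, 1]
talent = [-1, 1, -1, 1]
telescope_time = [10 - 4 * z + 2 * u for z, u in zip(cloudy, talent)]
career_outcome = [3 * x + 5 * u for x, u in zip(telescope_time, talent)]

naive = ols_slope(telescope_time, career_outcome)
iv = iv_estimate(cloudy, telescope_time, career_outcome)
```

The naive slope (4.25) overstates the effect because talent drives both variables, while the IV estimate recovers 3. This only works because the weather is unrelated to talent and affects careers solely through telescope time, which are exactly the assumptions that are hard to verify in real data.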

Regression discontinuity. In regression discontinuity, policies with an arbitrary threshold for receiving some benefit can be used to construct treatment and control groups (Fig. 3c ). Take the funding paylines for grant proposals as an example. Proposals with scores increasingly close to the payline are increasingly similar in both their observable and unobservable characteristics, yet only those projects with scores above the payline receive the funding. For example, a study 110 examines the effect of winning an early-career grant on the probability of winning a later, mid-career grant. The probability has a discontinuous jump across the initial grant’s payline, providing the treatment and control groups needed to estimate the causal effect of receiving a grant. This example utilizes the ‘sharp’ regression discontinuity that assumes treatment status to be fully determined by the cut-off. If we assume treatment status is only partly determined by the cut-off, we can use ‘fuzzy’ regression discontinuity designs. Here the probability of receiving a grant is used to estimate the future outcome 11 , 110 , 239 , 240 , 241 .
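A sharp regression discontinuity estimate can be sketched by fitting a separate line on each side of the cut-off and comparing the two fitted values at the cut-off. The scores, outcome values, bandwidth and the built-in jump of 0.3 below are invented for illustration; real applications also need bandwidth selection and standard errors.

```python
def linear_fit(points):
    """Least-squares line through (x, y) points; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sharp_rd_jump(scores, outcomes, cutoff, bandwidth):
    """Difference between the right and left fitted outcomes at the cut-off."""
    left = [(s - cutoff, y) for s, y in zip(scores, outcomes)
            if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    _, left_at_cutoff = linear_fit(left)
    _, right_at_cutoff = linear_fit(right)
    return right_at_cutoff - left_at_cutoff


# Hypothetical proposals: score relative to the payline, funded if score >= 0.
scores = list(range(-5, 6))
later_grant_prob = [0.2 + 0.02 * s + (0.3 if s >= 0 else 0.0) for s in scores]

jump = sharp_rd_jump(scores, later_grant_prob, cutoff=0, bandwidth=5)
```

Because the outcome is linear in the score on each side, the estimated jump equals the 0.3 discontinuity placed at the payline.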

Although quasi-experiments are powerful tools, they face their own limitations. First, these approaches identify causal effects within a specific context and often engage small numbers of observations. How representative the samples are for broader populations or contexts is typically left as an open question. Second, the validity of the causal design is typically not ironclad. Researchers usually conduct different robustness checks to verify whether observable confounders have significant differences between the treated and control groups, before treatment. However, unobservable features may still differ between treatment and control groups. The quality of instrumental variables, and the specific claim that they have no effect on the outcome except through the variable of interest, is also difficult to assess. Ultimately, researchers must rely partly on judgement to tell whether appropriate conditions are met for causal inference.

This section emphasized popular econometric approaches to causal inference. Other empirical approaches, such as graphical causal modelling 242 , 243 , also represent an important stream of work on assessing causal relationships. Such approaches usually represent causation as a directed acyclic graph, with nodes as variables and arrows between them as suspected causal relationships. In the science of science, the directed acyclic graph approach has been applied to quantify the causal effect of journal impact factor 244 and gender or racial bias 245 on citations. Graphical causal modelling has also triggered discussions on strengths and weaknesses compared to the econometrics methods 246 , 247 .


In contrast to quasi-experimental approaches, laboratory and field experiments conduct direct randomization in assigning treatment and control groups. These methods engage explicitly in the data generation process, manipulating interventions to observe counterfactuals. These experiments are crafted to study mechanisms of specific interest and, by designing the experiment and formally randomizing, can produce especially rigorous causal inference.

Laboratory experiments. Laboratory experiments build counterfactual worlds in well-controlled laboratory environments. Researchers randomly assign participants to the treatment or control group and then manipulate the laboratory conditions to observe different outcomes in the two groups. For example, consider laboratory experiments on team performance and gender composition 144 , 248 . The researchers randomly assign participants into groups to perform tasks such as solving puzzles or brainstorming. Teams with a higher proportion of women are found to perform better on average, offering evidence that gender diversity is causally linked to team performance. Laboratory experiments can allow researchers to test forces that are otherwise hard to observe, such as how competition influences creativity 249 . Laboratory experiments have also been used to evaluate how journal impact factors shape scientists’ perceptions of rewards 250 and gender bias in hiring 251 .

Laboratory experiments allow for precise control of settings and procedures to isolate causal effects of interest. However, participants may behave differently in synthetic environments than in real-world settings, raising questions about the generalizability and replicability of the results 252 , 253 , 254 . To assess causal effects in real-world settings, researchers use randomized controlled trials.

Randomized controlled trials. A randomized controlled trial (RCT), or field experiment, is a staple for causal inference across a wide range of disciplines. RCTs randomly assign participants into the treatment and control conditions 255 and can be used not only to assess mechanisms but also to test real-world interventions such as policy change. The science of science has witnessed growing use of RCTs. For instance, a field experiment 146 investigated whether lower search costs for collaborators increased collaboration in grant applications. The authors randomly allocated principal investigators to face-to-face sessions in a medical school, and then measured participants’ chance of writing a grant proposal together. RCTs have also offered rich causal insights on peer review 256 , 257 , 258 , 259 , 260 and gender bias in science 261 , 262 , 263 .
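The core mechanics of an RCT, random assignment followed by a comparison of group outcomes, can be sketched as follows. The participant list, the equal split, the outcome values and the use of a permutation test are illustrative assumptions, not the design of the cited field experiment.

```python
import random


def randomize(participants, seed=0):
    """Randomly split participants into equal-sized treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(treated_outcomes, control_outcomes):
    return (sum(treated_outcomes) / len(treated_outcomes)
            - sum(control_outcomes) / len(control_outcomes))


def permutation_p_value(treated_outcomes, control_outcomes,
                        n_permutations=2000, seed=0):
    """Share of random relabellings with a gap at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(difference_in_means(treated_outcomes, control_outcomes))
    pooled = list(treated_outcomes) + list(control_outcomes)
    n_treated = len(treated_outcomes)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        gap = abs(difference_in_means(pooled[:n_treated], pooled[n_treated:]))
        if gap >= observed - 1e-12:
            extreme += 1
    return extreme / n_permutations


treatment_group, control_group = randomize(range(10))

# Hypothetical outcomes: 1 if the investigator later co-wrote a grant proposal.
effect = difference_in_means([1, 1, 0, 1], [0, 0, 1, 0])
p_value = permutation_p_value([1, 1, 0, 1], [0, 0, 1, 0])
```

Because assignment is random, the difference in means is an unbiased estimate of the average treatment effect, and the permutation test asks how often a gap that large would arise from the random assignment alone.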

While powerful, RCTs are difficult to conduct in the science of science, mainly for two reasons. The first concerns potential risks in a policy intervention. For instance, while randomizing funding across individuals could generate crucial causal insights for funders, it may also inadvertently harm participants’ careers 264 . Second, key questions in the science of science often require a long time horizon to trace outcomes, which makes RCTs costly and their findings harder to replicate. A relative advantage of the quasi-experimental methods discussed earlier is that one can identify causal effects over potentially long periods of time in the historical record. On the other hand, quasi-experiments must be found as opposed to designed, and they are often unavailable for questions of interest. While the best approaches are context dependent, a growing community of researchers is building platforms to facilitate RCTs for the science of science, aiming to lower their costs and increase their scale. Performing RCTs in partnership with science institutions can also contribute to timely, policy-relevant research that may substantially improve science decision-making and investments.

Research in the science of science has been empowered by the growth of high-scale data, new measurement approaches and an expanding range of empirical methods. These tools provide enormous capacity to test conceptual frameworks about science, discover factors impacting scientific productivity, predict key scientific outcomes and design policies that better facilitate future scientific progress. A careful appreciation of empirical techniques can help researchers to choose effective tools for questions of interest and propel the field. A better and broader understanding of these methodologies may also build bridges across diverse research communities, facilitating communication and collaboration, and better leveraging the value of diverse perspectives. The science of science is about turning scientific methods on the nature of science itself. The fruits of this work, with time, can guide researchers and research institutions to greater progress in discovery and understanding across the landscape of scientific inquiry.

Bush, V. Science–the Endless Frontier: A Report to the President on a Program for Postwar Scientific Research (National Science Foundation, 1990).

Mokyr, J. The Gifts of Athena (Princeton Univ. Press, 2011).

Jones, B. F. in Rebuilding the Post-Pandemic Economy (eds Kearney, M. S. & Ganz, A.) 272–310 (Aspen Institute Press, 2021).

Wang, D. & Barabási, A.-L. The Science of Science (Cambridge Univ. Press, 2021).

Fortunato, S. et al. Science of science. Science 359 , eaao0185 (2018).

Azoulay, P. et al. Toward a more scientific science. Science 361 , 1194–1197 (2018).

Clauset, A., Larremore, D. B. & Sinatra, R. Data-driven predictions in the science of science. Science 355 , 477–480 (2017).

Zeng, A. et al. The science of science: from the perspective of complex systems. Phys. Rep. 714 , 1–73 (2017).

Lin, Z., Yin, Y., Liu, L. & Wang, D. SciSciNet: a large-scale open data lake for the science of science research. Sci. Data https://doi.org/10.1038/s41597-023-02198-9 (2023).

Ahmadpoor, M. & Jones, B. F. The dual frontier: patented inventions and prior scientific advance. Science 357 , 583–587 (2017).

Azoulay, P., Graff Zivin, J. S., Li, D. & Sampat, B. N. Public R&D investments and private-sector patenting: evidence from NIH funding rules. Rev. Econ. Stud. 86 , 117–152 (2019).

Yin, Y., Dong, Y., Wang, K., Wang, D. & Jones, B. F. Public use and public funding of science. Nat. Hum. Behav. 6 , 1344–1350 (2022).

Merton, R. K. The Sociology of Science: Theoretical and Empirical Investigations (Univ. Chicago Press, 1973).

Kuhn, T. The Structure of Scientific Revolutions (Princeton Univ. Press, 2021).

Uzzi, B., Mukherjee, S., Stringer, M. & Jones, B. Atypical combinations and scientific impact. Science 342 , 468–472 (2013).

Zuckerman, H. Scientific Elite: Nobel Laureates in the United States (Transaction Publishers, 1977).

Wu, L., Wang, D. & Evans, J. A. Large teams develop and small teams disrupt science and technology. Nature 566 , 378–382 (2019).

Wuchty, S., Jones, B. F. & Uzzi, B. The increasing dominance of teams in production of knowledge. Science 316 , 1036–1039 (2007).

Foster, J. G., Rzhetsky, A. & Evans, J. A. Tradition and innovation in scientists’ research strategies. Am. Sociol. Rev. 80 , 875–908 (2015).

Wang, D., Song, C. & Barabási, A.-L. Quantifying long-term scientific impact. Science 342 , 127–132 (2013).

Clauset, A., Arbesman, S. & Larremore, D. B. Systematic inequality and hierarchy in faculty hiring networks. Sci. Adv. 1 , e1400005 (2015).

Ma, A., Mondragón, R. J. & Latora, V. Anatomy of funded research in science. Proc. Natl Acad. Sci. USA 112 , 14760–14765 (2015).

Ma, Y. & Uzzi, B. Scientific prize network predicts who pushes the boundaries of science. Proc. Natl Acad. Sci. USA 115 , 12608–12615 (2018).

Azoulay, P., Graff Zivin, J. S. & Manso, G. Incentives and creativity: evidence from the academic life sciences. RAND J. Econ. 42 , 527–554 (2011).

Schor, S. & Karten, I. Statistical evaluation of medical journal manuscripts. JAMA 195 , 1123–1128 (1966).

Platt, J. R. Strong inference: certain systematic methods of scientific thinking may produce much more rapid progress than others. Science 146 , 347–353 (1964).

Ioannidis, J. P. Why most published research findings are false. PLoS Med. 2 , e124 (2005).

Simonton, D. K. Career landmarks in science: individual differences and interdisciplinary contrasts. Dev. Psychol. 27 , 119 (1991).

Way, S. F., Morgan, A. C., Clauset, A. & Larremore, D. B. The misleading narrative of the canonical faculty productivity trajectory. Proc. Natl Acad. Sci. USA 114 , E9216–E9223 (2017).

Sinatra, R., Wang, D., Deville, P., Song, C. & Barabási, A.-L. Quantifying the evolution of individual scientific impact. Science 354 , aaf5239 (2016).

Liu, L. et al. Hot streaks in artistic, cultural, and scientific careers. Nature 559 , 396–399 (2018).

Liu, L., Dehmamy, N., Chown, J., Giles, C. L. & Wang, D. Understanding the onset of hot streaks across artistic, cultural, and scientific careers. Nat. Commun. 12 , 5392 (2021).

Squazzoni, F. et al. Peer review and gender bias: a study on 145 scholarly journals. Sci. Adv. 7 , eabd0299 (2021).

Hofstra, B. et al. The diversity–innovation paradox in science. Proc. Natl Acad. Sci. USA 117 , 9284–9291 (2020).

Huang, J., Gates, A. J., Sinatra, R. & Barabási, A.-L. Historical comparison of gender inequality in scientific careers across countries and disciplines. Proc. Natl Acad. Sci. USA 117 , 4609–4616 (2020).

Gläser, J. & Laudel, G. Governing science: how science policy shapes research content. Eur. J. Sociol. 57 , 117–168 (2016).

Stephan, P. E. How Economics Shapes Science (Harvard Univ. Press, 2012).

Garfield, E. & Sher, I. H. New factors in the evaluation of scientific literature through citation indexing. Am. Doc. 14 , 195–201 (1963).

de Solla Price, D. J. Networks of scientific papers. Science 149 , 510–515 (1965).

Etzkowitz, H., Kemelgor, C. & Uzzi, B. Athena Unbound: The Advancement of Women in Science and Technology (Cambridge Univ. Press, 2000).

Simonton, D. K. Scientific Genius: A Psychology of Science (Cambridge Univ. Press, 1988).

Khabsa, M. & Giles, C. L. The number of scholarly documents on the public web. PLoS ONE 9 , e93949 (2014).

Xia, F., Wang, W., Bekele, T. M. & Liu, H. Big scholarly data: a survey. IEEE Trans. Big Data 3 , 18–35 (2017).

Evans, J. A. & Foster, J. G. Metaknowledge. Science 331 , 721–725 (2011).

Milojević, S. Quantifying the cognitive extent of science. J. Informetr. 9 , 962–973 (2015).

Rzhetsky, A., Foster, J. G., Foster, I. T. & Evans, J. A. Choosing experiments to accelerate collective discovery. Proc. Natl Acad. Sci. USA 112 , 14569–14574 (2015).

Poncela-Casasnovas, J., Gerlach, M., Aguirre, N. & Amaral, L. A. Large-scale analysis of micro-level citation patterns reveals nuanced selection criteria. Nat. Hum. Behav. 3 , 568–575 (2019).

Hardwicke, T. E. et al. Data availability, reusability, and analytic reproducibility: evaluating the impact of a mandatory open data policy at the journal Cognition. R. Soc. Open Sci. 5 , 180448 (2018).

Nagaraj, A., Shears, E. & de Vaan, M. Improving data access democratizes and diversifies science. Proc. Natl Acad. Sci. USA 117 , 23490–23498 (2020).

Bravo, G., Grimaldo, F., López-Iñesta, E., Mehmani, B. & Squazzoni, F. The effect of publishing peer review reports on referee behavior in five scholarly journals. Nat. Commun. 10 , 322 (2019).

Tran, D. et al. An open review of open review: a critical analysis of the machine learning conference review process. Preprint at https://doi.org/10.48550/arXiv.2010.05137 (2020).

Tshitoyan, V. et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571 , 95–98 (2019).

Yang, Y., Wu, Y. & Uzzi, B. Estimating the deep replicability of scientific findings using human and artificial intelligence. Proc. Natl Acad. Sci. USA 117 , 10762–10768 (2020).

Mukherjee, S., Uzzi, B., Jones, B. & Stringer, M. A new method for identifying recombinations of existing knowledge associated with high‐impact innovation. J. Prod. Innov. Manage. 33 , 224–236 (2016).

Leahey, E., Beckman, C. M. & Stanko, T. L. Prominent but less productive: the impact of interdisciplinarity on scientists’ research. Adm. Sci. Q. 62 , 105–139 (2017).

Sauermann, H. & Haeussler, C. Authorship and contribution disclosures. Sci. Adv. 3 , e1700404 (2017).

Oliveira, D. F. M., Ma, Y., Woodruff, T. K. & Uzzi, B. Comparison of National Institutes of Health grant amounts to first-time male and female principal investigators. JAMA 321 , 898–900 (2019).

Yang, Y., Chawla, N. V. & Uzzi, B. A network’s gender composition and communication pattern predict women’s leadership success. Proc. Natl Acad. Sci. USA 116 , 2033–2038 (2019).

Way, S. F., Larremore, D. B. & Clauset, A. Gender, productivity, and prestige in computer science faculty hiring networks. In Proc. 25th International Conference on World Wide Web 1169–1179 (ACM, 2016).

Malmgren, R. D., Ottino, J. M. & Amaral, L. A. N. The role of mentorship in protégé performance. Nature 465 , 622–626 (2010).

Ma, Y., Mukherjee, S. & Uzzi, B. Mentorship and protégé success in STEM fields. Proc. Natl Acad. Sci. USA 117 , 14077–14083 (2020).

Börner, K. et al. Skill discrepancies between research, education, and jobs reveal the critical need to supply soft skills for the data economy. Proc. Natl Acad. Sci. USA 115 , 12630–12637 (2018).

Biasi, B. & Ma, S. The Education-Innovation Gap (National Bureau of Economic Research Working papers, 2020).

Bornmann, L. Do altmetrics point to the broader impact of research? An overview of benefits and disadvantages of altmetrics. J. Informetr. 8 , 895–903 (2014).

Cleary, E. G., Beierlein, J. M., Khanuja, N. S., McNamee, L. M. & Ledley, F. D. Contribution of NIH funding to new drug approvals 2010–2016. Proc. Natl Acad. Sci. USA 115 , 2329–2334 (2018).

Spector, J. M., Harrison, R. S. & Fishman, M. C. Fundamental science behind today’s important medicines. Sci. Transl. Med. 10 , eaaq1787 (2018).

Haunschild, R. & Bornmann, L. How many scientific papers are mentioned in policy-related documents? An empirical investigation using Web of Science and Altmetric data. Scientometrics 110 , 1209–1216 (2017).

Yin, Y., Gao, J., Jones, B. F. & Wang, D. Coevolution of policy and science during the pandemic. Science 371 , 128–130 (2021).

Sugimoto, C. R., Work, S., Larivière, V. & Haustein, S. Scholarly use of social media and altmetrics: a review of the literature. J. Assoc. Inf. Sci. Technol. 68 , 2037–2062 (2017).

Dunham, I. Human genes: time to follow the roads less traveled? PLoS Biol. 16 , e3000034 (2018).

Kustatscher, G. et al. Understudied proteins: opportunities and challenges for functional proteomics. Nat. Methods 19 , 774–779 (2022).

Rosenthal, R. The file drawer problem and tolerance for null results. Psychol. Bull. 86 , 638 (1979).

Franco, A., Malhotra, N. & Simonovits, G. Publication bias in the social sciences: unlocking the file drawer. Science 345 , 1502–1505 (2014).

Vera-Baceta, M.-A., Thelwall, M. & Kousha, K. Web of Science and Scopus language coverage. Scientometrics 121 , 1803–1813 (2019).

Waltman, L. A review of the literature on citation impact indicators. J. Informetr. 10 , 365–391 (2016).

Garfield, E. & Merton, R. K. Citation Indexing: Its Theory and Application in Science, Technology, and Humanities (Wiley, 1979).

Kelly, B., Papanikolaou, D., Seru, A. & Taddy, M. Measuring Technological Innovation Over the Long Run Report No. 0898-2937 (National Bureau of Economic Research, 2018).

Kogan, L., Papanikolaou, D., Seru, A. & Stoffman, N. Technological innovation, resource allocation, and growth. Q. J. Econ. 132 , 665–712 (2017).

Hall, B. H., Jaffe, A. & Trajtenberg, M. Market value and patent citations. RAND J. Econ. 36 , 16–38 (2005).

Yan, E. & Ding, Y. Applying centrality measures to impact analysis: a coauthorship network analysis. J. Am. Soc. Inf. Sci. Technol. 60 , 2107–2118 (2009).

Radicchi, F., Fortunato, S., Markines, B. & Vespignani, A. Diffusion of scientific credits and the ranking of scientists. Phys. Rev. E 80 , 056103 (2009).

Bollen, J., Rodriguez, M. A. & Van de Sompel, H. Journal status. Scientometrics 69 , 669–687 (2006).

Bergstrom, C. T., West, J. D. & Wiseman, M. A. The eigenfactor™ metrics. J. Neurosci. 28 , 11433–11434 (2008).

Cronin, B. & Sugimoto, C. R. Beyond Bibliometrics: Harnessing Multidimensional Indicators of Scholarly Impact (MIT Press, 2014).

Hicks, D., Wouters, P., Waltman, L., De Rijcke, S. & Rafols, I. Bibliometrics: the Leiden Manifesto for research metrics. Nature 520 , 429–431 (2015).

Catalini, C., Lacetera, N. & Oettl, A. The incidence and role of negative citations in science. Proc. Natl Acad. Sci. USA 112 , 13823–13826 (2015).

Alcacer, J. & Gittelman, M. Patent citations as a measure of knowledge flows: the influence of examiner citations. Rev. Econ. Stat. 88 , 774–779 (2006).

Ding, Y. et al. Content‐based citation analysis: the next generation of citation analysis. J. Assoc. Inf. Sci. Technol. 65 , 1820–1833 (2014).

Teufel, S., Siddharthan, A. & Tidhar, D. Automatic classification of citation function. In Proc. 2006 Conference on Empirical Methods in Natural Language Processing 103–110 (Association for Computational Linguistics, 2006).

Seeber, M., Cattaneo, M., Meoli, M. & Malighetti, P. Self-citations as strategic response to the use of metrics for career decisions. Res. Policy 48 , 478–491 (2019).

Pendlebury, D. A. The use and misuse of journal metrics and other citation indicators. Arch. Immunol. Ther. Exp. 57 , 1–11 (2009).

Biagioli, M. Watch out for cheats in citation game. Nature 535 , 201 (2016).

Jo, W. S., Liu, L. & Wang, D. See further upon the giants: quantifying intellectual lineage in science. Quant. Sci. Stud. 3 , 319–330 (2022).

Boyack, K. W., Klavans, R. & Börner, K. Mapping the backbone of science. Scientometrics 64 , 351–374 (2005).

Gates, A. J., Ke, Q., Varol, O. & Barabási, A.-L. Nature’s reach: narrow work has broad impact. Nature 575 , 32–34 (2019).

Börner, K., Penumarthy, S., Meiss, M. & Ke, W. Mapping the diffusion of scholarly knowledge among major US research institutions. Scientometrics 68 , 415–426 (2006).

King, D. A. The scientific impact of nations. Nature 430 , 311–316 (2004).

Pan, R. K., Kaski, K. & Fortunato, S. World citation and collaboration networks: uncovering the role of geography in science. Sci. Rep. 2 , 902 (2012).

Jaffe, A. B., Trajtenberg, M. & Henderson, R. Geographic localization of knowledge spillovers as evidenced by patent citations. Q. J. Econ. 108 , 577–598 (1993).

Funk, R. J. & Owen-Smith, J. A dynamic network measure of technological change. Manage. Sci. 63 , 791–817 (2017).

Yegros-Yegros, A., Rafols, I. & D’este, P. Does interdisciplinary research lead to higher citation impact? The different effect of proximal and distal interdisciplinarity. PLoS ONE 10 , e0135095 (2015).

Larivière, V., Haustein, S. & Börner, K. Long-distance interdisciplinarity leads to higher scientific impact. PLoS ONE 10 , e0122565 (2015).

Fleming, L., Greene, H., Li, G., Marx, M. & Yao, D. Government-funded research increasingly fuels innovation. Science 364 , 1139–1141 (2019).

Bowen, A. & Casadevall, A. Increasing disparities between resource inputs and outcomes, as measured by certain health deliverables, in biomedical research. Proc. Natl Acad. Sci. USA 112 , 11335–11340 (2015).

Li, D., Azoulay, P. & Sampat, B. N. The applied value of public investments in biomedical research. Science 356 , 78–81 (2017).

Lehman, H. C. Age and Achievement (Princeton Univ. Press, 2017).

Simonton, D. K. Creative productivity: a predictive and explanatory model of career trajectories and landmarks. Psychol. Rev. 104 , 66 (1997).

Duch, J. et al. The possible role of resource requirements and academic career-choice risk on gender differences in publication rate and impact. PLoS ONE 7 , e51332 (2012).

Wang, Y., Jones, B. F. & Wang, D. Early-career setback and future career impact. Nat. Commun. 10 , 4331 (2019).

Bol, T., de Vaan, M. & van de Rijt, A. The Matthew effect in science funding. Proc. Natl Acad. Sci. USA 115 , 4887–4890 (2018).

Jones, B. F. Age and great invention. Rev. Econ. Stat. 92 , 1–14 (2010).

Newman, M. Networks (Oxford Univ. Press, 2018).

Mazloumian, A., Eom, Y.-H., Helbing, D., Lozano, S. & Fortunato, S. How citation boosts promote scientific paradigm shifts and Nobel prizes. PLoS ONE 6 , e18975 (2011).

Hirsch, J. E. An index to quantify an individual’s scientific research output. Proc. Natl Acad. Sci. USA 102 , 16569–16572 (2005).

Alonso, S., Cabrerizo, F. J., Herrera-Viedma, E. & Herrera, F. h-index: a review focused in its variants, computation and standardization for different scientific fields. J. Informetr. 3 , 273–289 (2009).

Egghe, L. An improvement of the h-index: the g-index. ISSI Newsl. 2 , 8–9 (2006).

Kaur, J., Radicchi, F. & Menczer, F. Universality of scholarly impact metrics. J. Informetr. 7 , 924–932 (2013).

Majeti, D. et al. Scholar plot: design and evaluation of an information interface for faculty research performance. Front. Res. Metr. Anal. 4 , 6 (2020).

Sidiropoulos, A., Katsaros, D. & Manolopoulos, Y. Generalized Hirsch h-index for disclosing latent facts in citation networks. Scientometrics 72 , 253–280 (2007).

Jones, B. F. & Weinberg, B. A. Age dynamics in scientific creativity. Proc. Natl Acad. Sci. USA 108 , 18910–18914 (2011).

Dennis, W. Age and productivity among scientists. Science 123 , 724–725 (1956).

Sanyal, D. K., Bhowmick, P. K. & Das, P. P. A review of author name disambiguation techniques for the PubMed bibliographic database. J. Inf. Sci. 47 , 227–254 (2021).

Haak, L. L., Fenner, M., Paglione, L., Pentz, E. & Ratner, H. ORCID: a system to uniquely identify researchers. Learn. Publ. 25 , 259–264 (2012).

Malmgren, R. D., Ottino, J. M. & Amaral, L. A. N. The role of mentorship in protégé performance. Nature 465 , 622–626 (2010).

Oettl, A. Reconceptualizing stars: scientist helpfulness and peer performance. Manage. Sci. 58 , 1122–1140 (2012).

Morgan, A. C. et al. The unequal impact of parenthood in academia. Sci. Adv. 7 , eabd1996 (2021).

Morgan, A. C. et al. Socioeconomic roots of academic faculty. Nat. Hum. Behav. 6 , 1625–1633 (2022).

San Francisco Declaration on Research Assessment (DORA) (American Society for Cell Biology, 2012).

Falk‐Krzesinski, H. J. et al. Advancing the science of team science. Clin. Transl. Sci. 3 , 263–266 (2010).

Cooke, N. J. et al. Enhancing the Effectiveness of Team Science (National Academies Press, 2015).

Börner, K. et al. A multi-level systems perspective for the science of team science. Sci. Transl. Med. 2 , 49cm24 (2010).

Leahey, E. From sole investigator to team scientist: trends in the practice and study of research collaboration. Annu. Rev. Sociol. 42 , 81–100 (2016).

AlShebli, B. K., Rahwan, T. & Woon, W. L. The preeminence of ethnic diversity in scientific collaboration. Nat. Commun. 9 , 5163 (2018).

Hsiehchen, D., Espinoza, M. & Hsieh, A. Multinational teams and diseconomies of scale in collaborative research. Sci. Adv. 1 , e1500211 (2015).

Koning, R., Samila, S. & Ferguson, J.-P. Who do we invent for? Patents by women focus more on women’s health, but few women get to invent. Science 372 , 1345–1348 (2021).

Barabási, A.-L. et al. Evolution of the social network of scientific collaborations. Physica A 311 , 590–614 (2002).

Newman, M. E. Scientific collaboration networks. I. Network construction and fundamental results. Phys. Rev. E 64 , 016131 (2001).

Newman, M. E. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys. Rev. E 64 , 016132 (2001).

Palla, G., Barabási, A.-L. & Vicsek, T. Quantifying social group evolution. Nature 446 , 664–667 (2007).

Ross, M. B. et al. Women are credited less in science than men. Nature 608 , 135–145 (2022).

Shen, H.-W. & Barabási, A.-L. Collective credit allocation in science. Proc. Natl Acad. Sci. USA 111 , 12325–12330 (2014).

Merton, R. K. Matthew effect in science. Science 159 , 56–63 (1968).

Ni, C., Smith, E., Yuan, H., Larivière, V. & Sugimoto, C. R. The gendered nature of authorship. Sci. Adv. 7 , eabe4639 (2021).

Woolley, A. W., Chabris, C. F., Pentland, A., Hashmi, N. & Malone, T. W. Evidence for a collective intelligence factor in the performance of human groups. Science 330 , 686–688 (2010).

Feldon, D. F. et al. Postdocs’ lab engagement predicts trajectories of PhD students’ skill development. Proc. Natl Acad. Sci. USA 116 , 20910–20916 (2019).

Boudreau, K. J. et al. A field experiment on search costs and the formation of scientific collaborations. Rev. Econ. Stat. 99 , 565–576 (2017).

Holcombe, A. O. Contributorship, not authorship: use CRediT to indicate who did what. Publications 7 , 48 (2019).

Murray, D. et al. Unsupervised embedding of trajectories captures the latent structure of mobility. Preprint at https://doi.org/10.48550/arXiv.2012.02785 (2020).

Deville, P. et al. Career on the move: geography, stratification, and scientific impact. Sci. Rep. 4 , 4770 (2014).

Edmunds, L. D. et al. Why do women choose or reject careers in academic medicine? A narrative review of empirical evidence. Lancet 388 , 2948–2958 (2016).

Waldinger, F. Peer effects in science: evidence from the dismissal of scientists in Nazi Germany. Rev. Econ. Stud. 79 , 838–861 (2012).

Agrawal, A., McHale, J. & Oettl, A. How stars matter: recruiting and peer effects in evolutionary biology. Res. Policy 46 , 853–867 (2017).

Fiore, S. M. Interdisciplinarity as teamwork: how the science of teams can inform team science. Small Group Res. 39 , 251–277 (2008).

Hvide, H. K. & Jones, B. F. University innovation and the professor’s privilege. Am. Econ. Rev. 108 , 1860–1898 (2018).

Murray, F., Aghion, P., Dewatripont, M., Kolev, J. & Stern, S. Of mice and academics: examining the effect of openness on innovation. Am. Econ. J. Econ. Policy 8 , 212–252 (2016).

Radicchi, F., Fortunato, S. & Castellano, C. Universality of citation distributions: toward an objective measure of scientific impact. Proc. Natl Acad. Sci. USA 105 , 17268–17272 (2008).

Waltman, L., van Eck, N. J. & van Raan, A. F. Universality of citation distributions revisited. J. Am. Soc. Inf. Sci. Technol. 63 , 72–77 (2012).

Barabási, A.-L. & Albert, R. Emergence of scaling in random networks. Science 286 , 509–512 (1999).

de Solla Price, D. A general theory of bibliometric and other cumulative advantage processes. J. Am. Soc. Inf. Sci. 27 , 292–306 (1976).

Cole, S. Age and scientific performance. Am. J. Sociol. 84 , 958–977 (1979).

Ke, Q., Ferrara, E., Radicchi, F. & Flammini, A. Defining and identifying sleeping beauties in science. Proc. Natl Acad. Sci. USA 112 , 7426–7431 (2015).

Bornmann, L., de Moya Anegón, F. & Leydesdorff, L. Do scientific advancements lean on the shoulders of giants? A bibliometric investigation of the Ortega hypothesis. PLoS ONE 5 , e13327 (2010).

Mukherjee, S., Romero, D. M., Jones, B. & Uzzi, B. The nearly universal link between the age of past knowledge and tomorrow’s breakthroughs in science and technology: the hotspot. Sci. Adv. 3 , e1601315 (2017).

Packalen, M. & Bhattacharya, J. NIH funding and the pursuit of edge science. Proc. Natl Acad. Sci. USA 117 , 12011–12016 (2020).

Zeng, A., Fan, Y., Di, Z., Wang, Y. & Havlin, S. Fresh teams are associated with original and multidisciplinary research. Nat. Hum. Behav. 5 , 1314–1322 (2021).

Newman, M. E. The structure of scientific collaboration networks. Proc. Natl Acad. Sci. USA 98 , 404–409 (2001).

Larivière, V., Ni, C., Gingras, Y., Cronin, B. & Sugimoto, C. R. Bibliometrics: global gender disparities in science. Nature 504 , 211–213 (2013).

West, J. D., Jacquet, J., King, M. M., Correll, S. J. & Bergstrom, C. T. The role of gender in scholarly authorship. PLoS ONE 8 , e66212 (2013).

Gao, J., Yin, Y., Myers, K. R., Lakhani, K. R. & Wang, D. Potentially long-lasting effects of the pandemic on scientists. Nat. Commun. 12 , 6188 (2021).

Jones, B. F., Wuchty, S. & Uzzi, B. Multi-university research teams: shifting impact, geography, and stratification in science. Science 322 , 1259–1262 (2008).

Chu, J. S. & Evans, J. A. Slowed canonical progress in large fields of science. Proc. Natl Acad. Sci. USA 118 , e2021636118 (2021).

Wang, J., Veugelers, R. & Stephan, P. Bias against novelty in science: a cautionary tale for users of bibliometric indicators. Res. Policy 46 , 1416–1436 (2017).

Stringer, M. J., Sales-Pardo, M. & Amaral, L. A. Statistical validation of a global model for the distribution of the ultimate number of citations accrued by papers published in a scientific journal. J. Assoc. Inf. Sci. Technol. 61 , 1377–1385 (2010).

Bianconi, G. & Barabási, A.-L. Bose-Einstein condensation in complex networks. Phys. Rev. Lett. 86 , 5632 (2001).

Bianconi, G. & Barabási, A.-L. Competition and multiscaling in evolving networks. Europhys. Lett. 54 , 436 (2001).

Yin, Y. & Wang, D. The time dimension of science: connecting the past to the future. J. Informetr. 11 , 608–621 (2017).

Pan, R. K., Petersen, A. M., Pammolli, F. & Fortunato, S. The memory of science: inflation, myopia, and the knowledge network. J. Informetr. 12 , 656–678 (2018).

Yin, Y., Wang, Y., Evans, J. A. & Wang, D. Quantifying the dynamics of failure across science, startups and security. Nature 575 , 190–194 (2019).

Candia, C. & Uzzi, B. Quantifying the selective forgetting and integration of ideas in science and technology. Am. Psychol. 76 , 1067 (2021).

Milojević, S. Principles of scientific research team formation and evolution. Proc. Natl Acad. Sci. USA 111 , 3984–3989 (2014).

Guimera, R., Uzzi, B., Spiro, J. & Amaral, L. A. N. Team assembly mechanisms determine collaboration network structure and team performance. Science 308 , 697–702 (2005).

Newman, M. E. Coauthorship networks and patterns of scientific collaboration. Proc. Natl Acad. Sci. USA 101 , 5200–5205 (2004).

Newman, M. E. Clustering and preferential attachment in growing networks. Phys. Rev. E 64 , 025102 (2001).

Iacopini, I., Milojević, S. & Latora, V. Network dynamics of innovation processes. Phys. Rev. Lett. 120 , 048301 (2018).

Kuhn, T., Perc, M. & Helbing, D. Inheritance patterns in citation networks reveal scientific memes. Phys. Rev. X 4 , 041036 (2014).

Jia, T., Wang, D. & Szymanski, B. K. Quantifying patterns of research-interest evolution. Nat. Hum. Behav. 1 , 0078 (2017).

Zeng, A. et al. Increasing trend of scientists to switch between topics. Nat. Commun. https://doi.org/10.1038/s41467-019-11401-8 (2019).

Siudem, G., Żogała-Siudem, B., Cena, A. & Gagolewski, M. Three dimensions of scientific impact. Proc. Natl Acad. Sci. USA 117 , 13896–13900 (2020).

Petersen, A. M. et al. Reputation and impact in academic careers. Proc. Natl Acad. Sci. USA 111 , 15316–15321 (2014).

Jin, C., Song, C., Bjelland, J., Canright, G. & Wang, D. Emergence of scaling in complex substitutive systems. Nat. Hum. Behav. 3 , 837–846 (2019).

Hofman, J. M. et al. Integrating explanation and prediction in computational social science. Nature 595 , 181–188 (2021).

Lazer, D. et al. Computational social science. Science 323 , 721–723 (2009).

Lazer, D. M. et al. Computational social science: obstacles and opportunities. Science 369 , 1060–1062 (2020).

Albert, R. & Barabási, A.-L. Statistical mechanics of complex networks. Rev. Mod. Phys. 74 , 47 (2002).

Newman, M. E. The structure and function of complex networks. SIAM Rev. 45 , 167–256 (2003).

Song, C., Qu, Z., Blumm, N. & Barabási, A.-L. Limits of predictability in human mobility. Science 327 , 1018–1021 (2010).

Alessandretti, L., Aslak, U. & Lehmann, S. The scales of human mobility. Nature 587 , 402–407 (2020).

Pastor-Satorras, R. & Vespignani, A. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86 , 3200 (2001).

Pastor-Satorras, R., Castellano, C., Van Mieghem, P. & Vespignani, A. Epidemic processes in complex networks. Rev. Mod. Phys. 87 , 925 (2015).

Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).

Bishop, C. M. Pattern Recognition and Machine Learning (Springer, 2006).

Dong, Y., Johnson, R. A. & Chawla, N. V. Will this paper increase your h-index? Scientific impact prediction. In Proc. 8th ACM International Conference on Web Search and Data Mining 149–158 (ACM, 2015).

Xiao, S. et al. On modeling and predicting individual paper citation count over time. In IJCAI 2676–2682 (IJCAI, 2016).

Fortunato, S. Community detection in graphs. Phys. Rep. 486 , 75–174 (2010).

Chen, C. Science mapping: a systematic review of the literature. J. Data Inf. Sci. 2 , 1–40 (2017).

Van Eck, N. J. & Waltman, L. Citation-based clustering of publications using CitNetExplorer and VOSviewer. Scientometrics 111 , 1053–1070 (2017).

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521 , 436–444 (2015).

Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577 , 706–710 (2020).

Krenn, M. & Zeilinger, A. Predicting research trends with semantic and neural networks with an application in quantum physics. Proc. Natl Acad. Sci. USA 117 , 1910–1916 (2020).

Iten, R., Metger, T., Wilming, H., Del Rio, L. & Renner, R. Discovering physical concepts with neural networks. Phys. Rev. Lett. 124 , 010508 (2020).

Guimerà, R. et al. A Bayesian machine scientist to aid in the solution of challenging scientific problems. Sci. Adv. 6 , eaav6971 (2020).

Segler, M. H., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555 , 604–610 (2018).

Ryu, J. Y., Kim, H. U. & Lee, S. Y. Deep learning improves prediction of drug–drug and drug–food interactions. Proc. Natl Acad. Sci. USA 115 , E4304–E4311 (2018).

Kermany, D. S. et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172 , 1122–1131.e9 (2018).

Peng, H., Ke, Q., Budak, C., Romero, D. M. & Ahn, Y.-Y. Neural embeddings of scholarly periodicals reveal complex disciplinary organizations. Sci. Adv. 7 , eabb9004 (2021).

Youyou, W., Yang, Y. & Uzzi, B. A discipline-wide investigation of the replicability of psychology papers over the past two decades. Proc. Natl Acad. Sci. USA 120 , e2208863120 (2023).

Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. & Galstyan, A. A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR) 54 , 1–35 (2021).

Way, S. F., Morgan, A. C., Larremore, D. B. & Clauset, A. Productivity, prominence, and the effects of academic environment. Proc. Natl Acad. Sci. USA 116 , 10729–10733 (2019).

Li, W., Aste, T., Caccioli, F. & Livan, G. Early coauthorship with top scientists predicts success in academic careers. Nat. Commun. 10 , 5170 (2019).

Hendry, D. F., Pagan, A. R. & Sargan, J. D. Dynamic specification. Handb. Econ. 2 , 1023–1100 (1984).

Jin, C., Ma, Y. & Uzzi, B. Scientific prizes and the extraordinary growth of scientific topics. Nat. Commun. 12 , 5619 (2021).

Azoulay, P., Ganguli, I. & Zivin, J. G. The mobility of elite life scientists: professional and personal determinants. Res. Policy 46 , 573–590 (2017).

Slavova, K., Fosfuri, A. & De Castro, J. O. Learning by hiring: the effects of scientists’ inbound mobility on research performance in academia. Organ. Sci. 27 , 72–89 (2016).

Sarsons, H. Recognition for group work: gender differences in academia. Am. Econ. Rev. 107 , 141–145 (2017).

Campbell, L. G., Mehtani, S., Dozier, M. E. & Rinehart, J. Gender-heterogeneous working groups produce higher quality science. PLoS ONE 8 , e79147 (2013).

Azoulay, P., Graff Zivin, J. S. & Wang, J. Superstar extinction. Q. J. Econ. 125 , 549–589 (2010).

Furman, J. L. & Stern, S. Climbing atop the shoulders of giants: the impact of institutions on cumulative research. Am. Econ. Rev. 101 , 1933–1963 (2011).

Williams, H. L. Intellectual property rights and innovation: evidence from the human genome. J. Polit. Econ. 121 , 1–27 (2013).

Rubin, A. & Rubin, E. Systematic Bias in the Progress of Research. J. Polit. Econ. 129 , 2666–2719 (2021).

Lu, S. F., Jin, G. Z., Uzzi, B. & Jones, B. The retraction penalty: evidence from the Web of Science. Sci. Rep. 3 , 3146 (2013).

Jin, G. Z., Jones, B., Lu, S. F. & Uzzi, B. The reverse Matthew effect: consequences of retraction in scientific teams. Rev. Econ. Stat. 101 , 492–506 (2019).

Azoulay, P., Bonatti, A. & Krieger, J. L. The career effects of scandal: evidence from scientific retractions. Res. Policy 46 , 1552–1569 (2017).

Goodman-Bacon, A. Difference-in-differences with variation in treatment timing. J. Econ. 225 , 254–277 (2021).

Callaway, B. & Sant’Anna, P. H. Difference-in-differences with multiple time periods. J. Econ. 225 , 200–230 (2021).

Hill, R. Searching for Superstars: Research Risk and Talent Discovery in Astronomy Working Paper (Massachusetts Institute of Technology, 2019).

Bagues, M., Sylos-Labini, M. & Zinovyeva, N. Does the gender composition of scientific committees matter? Am. Econ. Rev. 107 , 1207–1238 (2017).

Sampat, B. & Williams, H. L. How do patents affect follow-on innovation? Evidence from the human genome. Am. Econ. Rev. 109 , 203–236 (2019).

Moretti, E. & Wilson, D. J. The effect of state taxes on the geographical location of top earners: evidence from star scientists. Am. Econ. Rev. 107 , 1858–1903 (2017).

Jacob, B. A. & Lefgren, L. The impact of research grant funding on scientific productivity. J. Public Econ. 95 , 1168–1177 (2011).

Li, D. Expertise versus bias in evaluation: evidence from the NIH. Am. Econ. J. Appl. Econ. 9 , 60–92 (2017).

Pearl, J. Causal diagrams for empirical research. Biometrika 82 , 669–688 (1995).

Pearl, J. & Mackenzie, D. The Book of Why: The New Science of Cause and Effect (Basic Books, 2018).

Traag, V. A. Inferring the causal effect of journals on citations. Quant. Sci. Stud. 2 , 496–504 (2021).

Traag, V. & Waltman, L. Causal foundations of bias, disparity and fairness. Preprint at https://doi.org/10.48550/arXiv.2207.13665 (2022).

Imbens, G. W. Potential outcome and directed acyclic graph approaches to causality: relevance for empirical practice in economics. J. Econ. Lit. 58 , 1129–1179 (2020).

Heckman, J. J. & Pinto, R. Causality and Econometrics (National Bureau of Economic Research, 2022).

Aggarwal, I., Woolley, A. W., Chabris, C. F. & Malone, T. W. The impact of cognitive style diversity on implicit learning in teams. Front. Psychol. 10 , 112 (2019).

Balietti, S., Goldstone, R. L. & Helbing, D. Peer review and competition in the Art Exhibition Game. Proc. Natl Acad. Sci. USA 113 , 8414–8419 (2016).

Paulus, F. M., Rademacher, L., Schäfer, T. A. J., Müller-Pinzler, L. & Krach, S. Journal impact factor shapes scientists’ reward signal in the prospect of publication. PLoS ONE 10 , e0142537 (2015).

Williams, W. M. & Ceci, S. J. National hiring experiments reveal 2:1 faculty preference for women on STEM tenure track. Proc. Natl Acad. Sci. USA 112 , 5360–5365 (2015).

Collaboration, O. S. Estimating the reproducibility of psychological science. Science 349 , aac4716 (2015).

Camerer, C. F. et al. Evaluating replicability of laboratory experiments in economics. Science 351 , 1433–1436 (2016).

Camerer, C. F. et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat. Hum. Behav. 2 , 637–644 (2018).

Duflo, E. & Banerjee, A. Handbook of Field Experiments (Elsevier, 2017).

Tomkins, A., Zhang, M. & Heavlin, W. D. Reviewer bias in single versus double-blind peer review. Proc. Natl Acad. Sci. USA 114 , 12708–12713 (2017).

Blank, R. M. The effects of double-blind versus single-blind reviewing: experimental evidence from the American Economic Review. Am. Econ. Rev. 81 , 1041–1067 (1991).

Boudreau, K. J., Guinan, E. C., Lakhani, K. R. & Riedl, C. Looking across and looking beyond the knowledge frontier: intellectual distance, novelty, and resource allocation in science. Manage. Sci. 62 , 2765–2783 (2016).

Lane, J. et al. When Do Experts Listen to Other Experts? The Role of Negative Information in Expert Evaluations for Novel Projects Working Paper #21-007 (Harvard Business School, 2020).

Teplitskiy, M. et al. Do Experts Listen to Other Experts? Field Experimental Evidence from Scientific Peer Review (Harvard Business School, 2019).

Moss-Racusin, C. A., Dovidio, J. F., Brescoll, V. L., Graham, M. J. & Handelsman, J. Science faculty’s subtle gender biases favor male students. Proc. Natl Acad. Sci. USA 109 , 16474–16479 (2012).

Forscher, P. S., Cox, W. T., Brauer, M. & Devine, P. G. Little race or gender bias in an experiment of initial review of NIH R01 grant proposals. Nat. Hum. Behav. 3 , 257–264 (2019).

Dennehy, T. C. & Dasgupta, N. Female peer mentors early in college increase women’s positive academic experiences and retention in engineering. Proc. Natl Acad. Sci. USA 114 , 5964–5969 (2017).

Azoulay, P. Turn the scientific method on ourselves. Nature 484 , 31–32 (2012).

Download references


The authors thank all members of the Center for Science of Science and Innovation (CSSI) for invaluable comments. This work was supported by the Air Force Office of Scientific Research under award number FA9550-19-1-0354, National Science Foundation grant SBE 1829344, and the Alfred P. Sloan Foundation G-2019-12485.

Author information

Authors and affiliations.

Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA

Lu Liu, Benjamin F. Jones, Brian Uzzi & Dashun Wang

Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA

Kellogg School of Management, Northwestern University, Evanston, IL, USA

College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, USA

National Bureau of Economic Research, Cambridge, MA, USA

Benjamin F. Jones

Brookings Institution, Washington, DC, USA

McCormick School of Engineering, Northwestern University, Evanston, IL, USA


Corresponding author

Correspondence to Dashun Wang.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Human Behaviour thanks Ludo Waltman, Erin Leahey and Sarah Bratt for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article.

Liu, L., Jones, B.F., Uzzi, B. et al. Data, measurement and empirical methods in the science of science. Nat Hum Behav 7 , 1046–1058 (2023). https://doi.org/10.1038/s41562-023-01562-4


Received : 30 June 2022

Accepted : 17 February 2023

Published : 01 June 2023

Issue Date : July 2023

DOI : https://doi.org/10.1038/s41562-023-01562-4





What is Empirical Research? Definition, Methods, Examples

Appinio Research · 09.02.2024 · 36min read


Ever wondered how we gather the facts, unveil hidden truths, and make informed decisions in a world filled with questions? Empirical research holds the key.

In this guide, we'll delve deep into the art and science of empirical research, unraveling its methods, mysteries, and manifold applications. From defining the core principles to mastering data analysis and reporting findings, we're here to equip you with the knowledge and tools to navigate the empirical landscape.

What is Empirical Research?

Empirical research is the cornerstone of scientific inquiry, providing a systematic and structured approach to investigating the world around us. It is the process of gathering and analyzing empirical or observable data to test hypotheses, answer research questions, or gain insights into various phenomena. This form of research relies on evidence derived from direct observation or experimentation, allowing researchers to draw conclusions based on real-world data rather than purely theoretical or speculative reasoning.

Characteristics of Empirical Research

Empirical research is characterized by several key features:

  • Observation and Measurement : It involves the systematic observation or measurement of variables, events, or behaviors.
  • Data Collection : Researchers collect data through various methods, such as surveys, experiments, observations, or interviews.
  • Testable Hypotheses : Empirical research often starts with testable hypotheses that are evaluated using collected data.
  • Quantitative or Qualitative Data : Data can be quantitative (numerical) or qualitative (non-numerical), depending on the research design.
  • Statistical Analysis : Quantitative data often undergo statistical analysis to determine patterns, relationships, or significance.
  • Objectivity and Replicability : Empirical research strives for objectivity, minimizing researcher bias. It should be replicable, allowing other researchers to conduct the same study to verify results.
  • Conclusions and Generalizations : Empirical research generates findings based on data and aims to make generalizations about larger populations or phenomena.

Importance of Empirical Research

Empirical research plays a pivotal role in advancing knowledge across various disciplines. Its importance extends to academia, industry, and society as a whole. Here are several reasons why empirical research is essential:

  • Evidence-Based Knowledge : Empirical research provides a solid foundation of evidence-based knowledge. It enables us to test hypotheses, confirm or refute theories, and build a robust understanding of the world.
  • Scientific Progress : In the scientific community, empirical research fuels progress by expanding the boundaries of existing knowledge. It contributes to the development of theories and the formulation of new research questions.
  • Problem Solving : Empirical research is instrumental in addressing real-world problems and challenges. It offers insights and data-driven solutions to complex issues in fields like healthcare, economics, and environmental science.
  • Informed Decision-Making : In policymaking, business, and healthcare, empirical research informs decision-makers by providing data-driven insights. It guides strategies, investments, and policies for optimal outcomes.
  • Quality Assurance : Empirical research is essential for quality assurance and validation in various industries, including pharmaceuticals, manufacturing, and technology. It ensures that products and processes meet established standards.
  • Continuous Improvement : Businesses and organizations use empirical research to evaluate performance, customer satisfaction, and product effectiveness. This data-driven approach fosters continuous improvement and innovation.
  • Human Advancement : Empirical research in fields like medicine and psychology contributes to the betterment of human health and well-being. It leads to medical breakthroughs, improved therapies, and enhanced psychological interventions.
  • Critical Thinking and Problem Solving : Engaging in empirical research fosters critical thinking skills, problem-solving abilities, and a deep appreciation for evidence-based decision-making.

Empirical research empowers us to explore, understand, and improve the world around us. It forms the bedrock of scientific inquiry and drives progress in countless domains, shaping our understanding of both the natural and social sciences.

How to Conduct Empirical Research?

So, you've decided to dive into the world of empirical research. Let's begin by exploring the crucial steps involved in getting started with your research project.

1. Select a Research Topic

Selecting the right research topic is the cornerstone of a successful empirical study. It's essential to choose a topic that not only piques your interest but also aligns with your research goals and objectives. Here's how to go about it:

  • Identify Your Interests : Start by reflecting on your passions and interests. What topics fascinate you the most? Your enthusiasm will be your driving force throughout the research process.
  • Brainstorm Ideas : Engage in brainstorming sessions to generate potential research topics. Consider the questions you've always wanted to answer or the issues that intrigue you.
  • Relevance and Significance : Assess the relevance and significance of your chosen topic. Does it contribute to existing knowledge? Is it a pressing issue in your field of study or the broader community?
  • Feasibility : Evaluate the feasibility of your research topic. Do you have access to the necessary resources, data, and participants (if applicable)?

2. Formulate Research Questions

Once you've narrowed down your research topic, the next step is to formulate clear and precise research questions. These questions will guide your entire research process and shape your study's direction. To create effective research questions:

  • Specificity : Ensure that your research questions are specific and focused. Vague or overly broad questions can lead to inconclusive results.
  • Relevance : Your research questions should directly relate to your chosen topic. They should address gaps in knowledge or contribute to solving a particular problem.
  • Testability : Ensure that your questions are testable through empirical methods. You should be able to gather data and analyze it to answer these questions.
  • Avoid Bias : Craft your questions in a way that avoids leading or biased language. Maintain neutrality to uphold the integrity of your research.

3. Review Existing Literature

Before you embark on your empirical research journey, it's essential to immerse yourself in the existing body of literature related to your chosen topic. This step, often referred to as a literature review, serves several purposes:

  • Contextualization : Understand the historical context and current state of research in your field. What have previous studies found, and what questions remain unanswered?
  • Identifying Gaps : Identify gaps or areas where existing research falls short. These gaps will help you formulate meaningful research questions and hypotheses.
  • Theory Development : If your study is theoretical, consider how existing theories apply to your topic. If it's empirical, understand how previous studies have approached data collection and analysis.
  • Methodological Insights : Learn from the methodologies employed in previous research. What methods were successful, and what challenges did researchers face?

4. Define Variables

Variables are fundamental components of empirical research. They are the factors or characteristics that can change or be manipulated during your study. Properly defining and categorizing variables is crucial for the clarity and validity of your research. Here's what you need to know:

  • Independent Variables : These are the variables that you, as the researcher, manipulate or control. They are the "cause" in cause-and-effect relationships.
  • Dependent Variables : Dependent variables are the outcomes or responses that you measure or observe. They are the "effect" influenced by changes in independent variables.
  • Operational Definitions : To ensure consistency and clarity, provide operational definitions for your variables. Specify how you will measure or manipulate each variable.
  • Control Variables : In some studies, controlling for other variables that may influence your dependent variable is essential. These are known as control variables.

With these foundational aspects of empirical research in place, you have a solid base for the rest of your journey. Now that you've grasped the essentials of getting started, let's delve deeper into the intricacies of research design.

Empirical Research Design

Now that you've selected your research topic, formulated research questions, and defined your variables, it's time to delve into the heart of your empirical research journey – research design. This pivotal step determines how you will collect data and what methods you'll employ to answer your research questions. Let's explore the various facets of research design in detail.

Types of Empirical Research

Empirical research can take on several forms, each with its own unique approach and methodologies. Understanding the different types of empirical research will help you choose the most suitable design for your study. Here are some common types:

  • Experimental Research : In this type, researchers manipulate one or more independent variables to observe their impact on dependent variables. It's highly controlled and often conducted in a laboratory setting.
  • Observational Research : Observational research involves the systematic observation of subjects or phenomena without intervention. Researchers are passive observers, documenting behaviors, events, or patterns.
  • Survey Research : Surveys are used to collect data through structured questionnaires or interviews. This method is efficient for gathering information from a large number of participants.
  • Case Study Research : Case studies focus on in-depth exploration of one or a few cases. Researchers gather detailed information through various sources such as interviews, documents, and observations.
  • Qualitative Research : Qualitative research aims to understand behaviors, experiences, and opinions in depth. It often involves open-ended questions, interviews, and thematic analysis.
  • Quantitative Research : Quantitative research collects numerical data and relies on statistical analysis to draw conclusions. It involves structured questionnaires, experiments, and surveys.

Your choice of research type should align with your research questions and objectives. Experimental research, for example, is ideal for testing cause-and-effect relationships, while qualitative research is more suitable for exploring complex phenomena.

Experimental Design

Experimental research is a systematic approach to studying causal relationships. It's characterized by the manipulation of one or more independent variables while controlling for other factors. Here are some key aspects of experimental design:

  • Control and Experimental Groups : Participants are randomly assigned to either a control group or an experimental group. The independent variable is manipulated for the experimental group but not for the control group.
  • Randomization : Randomization is crucial to eliminate bias in group assignment. It ensures that each participant has an equal chance of being in either group.
  • Hypothesis Testing : Experimental research often involves hypothesis testing. Researchers formulate hypotheses about the expected effects of the independent variable and use statistical analysis to test these hypotheses.
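
The random assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `assign_groups`, the participant IDs, and the fixed seed are all assumptions made for the example, and it uses only the standard library.

```python
import random


def assign_groups(participant_ids, seed=None):
    """Randomly split participants into a control and an experimental group."""
    rng = random.Random(seed)        # seeded generator so the assignment can be reproduced
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)            # every ordering of participants is equally likely
    half = len(shuffled) // 2
    return {
        "control": shuffled[:half],        # independent variable is NOT manipulated
        "experimental": shuffled[half:],   # independent variable IS manipulated
    }


# Hypothetical study with 20 participants identified by the numbers 1..20
groups = assign_groups(range(1, 21), seed=42)
print(len(groups["control"]), len(groups["experimental"]))
```

Shuffling the full list and cutting it in half keeps the groups equal in size while still giving each participant the same chance of landing in either one.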

Observational Design

Observational research entails careful and systematic observation of subjects or phenomena. It's advantageous when you want to understand natural behaviors or events. Key aspects of observational design include:

  • Participant Observation : Researchers immerse themselves in the environment they are studying. They become part of the group being observed, allowing for a deep understanding of behaviors.
  • Non-Participant Observation : In non-participant observation, researchers remain separate from the subjects. They observe and document behaviors without direct involvement.
  • Data Collection Methods : Observational research can involve various data collection methods, such as field notes, video recordings, photographs, or coding of observed behaviors.

Survey Design

Surveys are a popular choice for collecting data from a large number of participants. Effective survey design is essential to ensure the validity and reliability of your data. Consider the following:

  • Questionnaire Design : Create clear and concise questions that are easy for participants to understand. Avoid leading or biased questions.
  • Sampling Methods : Decide on the appropriate sampling method for your study, whether it's random, stratified, or convenience sampling.
  • Data Collection Tools : Choose the right tools for data collection, whether it's paper surveys, online questionnaires, or face-to-face interviews.

Case Study Design

Case studies are an in-depth exploration of one or a few cases to gain a deep understanding of a particular phenomenon. Key aspects of case study design include:

  • Single Case vs. Multiple Case Studies : Decide whether you'll focus on a single case or multiple cases. Single case studies are intensive and allow for detailed examination, while multiple case studies provide comparative insights.
  • Data Collection Methods : Gather data through interviews, observations, document analysis, or a combination of these methods.

Qualitative vs. Quantitative Research

In empirical research, you'll often encounter the distinction between qualitative and quantitative research. Here's a closer look at these two approaches:

  • Qualitative Research : Qualitative research seeks an in-depth understanding of human behavior, experiences, and perspectives. It involves open-ended questions, interviews, and the analysis of textual or narrative data. Qualitative research is exploratory and often used when the research question is complex and requires a nuanced understanding.
  • Quantitative Research : Quantitative research collects numerical data and employs statistical analysis to draw conclusions. It involves structured questionnaires, experiments, and surveys. Quantitative research is ideal for testing hypotheses and, when paired with an experimental design, for establishing cause-and-effect relationships.

Understanding the various research design options is crucial in determining the most appropriate approach for your study. Your choice should align with your research questions, objectives, and the nature of the phenomenon you're investigating.

Data Collection for Empirical Research

Now that you've established your research design, it's time to roll up your sleeves and collect the data that will fuel your empirical research. Effective data collection is essential for obtaining accurate and reliable results.

Sampling Methods

Sampling methods are critical in empirical research, as they determine the subset of individuals or elements from your target population that you will study. Here are some standard sampling methods:

  • Random Sampling : Random sampling ensures that every member of the population has an equal chance of being selected. It minimizes bias and is often used in quantitative research.
  • Stratified Sampling : Stratified sampling involves dividing the population into subgroups or strata based on specific characteristics (e.g., age, gender, location). Samples are then randomly selected from each stratum, ensuring representation of all subgroups.
  • Convenience Sampling : Convenience sampling involves selecting participants who are readily available or easily accessible. While it's convenient, it may introduce bias and limit the generalizability of results.
  • Snowball Sampling : Snowball sampling is useful when studying hard-to-reach or hidden populations. One participant leads you to another, creating a "snowball" effect. This method is common in qualitative research.
  • Purposive Sampling : In purposive sampling, researchers deliberately select participants who meet specific criteria relevant to their research questions. It's often used in qualitative studies to gather in-depth information.

The choice of sampling method depends on the nature of your research, available resources, and the degree of precision required. It's crucial to carefully consider your sampling strategy to ensure that your sample accurately represents your target population.
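
To make the difference between random and stratified sampling concrete, here is a small Python sketch. The population, the `age_group` field, the 10% sampling fraction, and both function names are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every unit is equally likely to be picked."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, fraction, seed=None):
    """Sample the same fraction at random from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)      # group units by stratum
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 people aged 18-34 and 40 people aged 35+
population = [
    {"id": i, "age_group": "18-34" if i < 60 else "35+"} for i in range(100)
]

srs = simple_random_sample(population, 10, seed=1)
strat = stratified_sample(population, lambda p: p["age_group"], 0.1, seed=1)
print(len(srs), len(strat))
```

With these inputs the stratified sample always contains six people from the younger group and four from the older one, whereas the simple random sample only matches those proportions on average.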

Data Collection Instruments

Data collection instruments are the tools you use to gather information from your participants or sources. These instruments should be designed to capture the data you need accurately. Here are some popular data collection instruments:

  • Questionnaires : Questionnaires consist of structured questions with predefined response options. When designing questionnaires, consider the clarity of questions, the order of questions, and the response format (e.g., Likert scale, multiple-choice).
  • Interviews : Interviews involve direct communication between the researcher and participants. They can be structured (with predetermined questions) or unstructured (open-ended). Effective interviews require active listening and probing for deeper insights.
  • Observations : Observations entail systematically and objectively recording behaviors, events, or phenomena. Researchers must establish clear criteria for what to observe, how to record observations, and when to observe.
  • Surveys : Surveys are a common data collection instrument for quantitative research. They can be administered through various means, including online surveys, paper surveys, and telephone surveys.
  • Documents and Archives : In some cases, data may be collected from existing documents, records, or archives. Ensure that the sources are reliable, relevant, and properly documented.

To streamline your process and gather insights with precision and efficiency, consider leveraging innovative tools like Appinio. With Appinio's intuitive platform, you can harness the power of real-time consumer data to inform your research decisions effectively. Whether you're conducting surveys, interviews, or observations, Appinio empowers you to define your target audience, collect data from diverse demographics, and analyze results seamlessly.


Data Collection Procedures

Data collection procedures outline the step-by-step process for gathering data. These procedures should be meticulously planned and executed to maintain the integrity of your research.

  • Training : If you have a research team, ensure that they are trained in data collection methods and protocols. Consistency in data collection is crucial.
  • Pilot Testing : Before launching your data collection, conduct a pilot test with a small group to identify any potential problems with your instruments or procedures. Make necessary adjustments based on feedback.
  • Data Recording : Establish a systematic method for recording data. This may include timestamps, codes, or identifiers for each data point.
  • Data Security : Safeguard the confidentiality and security of collected data. Ensure that only authorized individuals have access to the data.
  • Data Storage : Properly organize and store your data in a secure location, whether in physical or digital form. Back up data to prevent loss.
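
As one possible way to implement the data recording step, the sketch below attaches a unique identifier and a UTC timestamp to each data point and writes the records as CSV. The field names, the participant codes such as `P001`, and the `make_record` helper are assumptions for the example; a real study would follow its own protocol.

```python
import csv
import io
import uuid
from datetime import datetime, timezone


def make_record(participant_code, item, response):
    """Build one data point with a unique identifier and a UTC timestamp."""
    return {
        "record_id": uuid.uuid4().hex,                          # unique per data point
        "recorded_at": datetime.now(timezone.utc).isoformat(),  # when it was captured
        "participant_code": participant_code,                   # a code, not a name
        "item": item,
        "response": response,
    }


# Hypothetical responses to one questionnaire item
records = [make_record("P001", "q1", 4), make_record("P002", "q1", 5)]

# Write to an in-memory buffer here; a real study would write to a secured file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
print(csv_text)
```

Using participant codes instead of names keeps the recorded data separate from identifying information, which also supports the data security point above.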

Ethical Considerations

Ethical considerations are paramount in empirical research, as they ensure the well-being and rights of participants are protected.

  • Informed Consent : Obtain informed consent from participants, providing clear information about the research purpose, procedures, risks, and their right to withdraw at any time.
  • Privacy and Confidentiality : Protect the privacy and confidentiality of participants. Ensure that data is anonymized and sensitive information is kept confidential.
  • Beneficence : Ensure that your research benefits participants and society while minimizing harm. Consider the potential risks and benefits of your study.
  • Honesty and Integrity : Conduct research with honesty and integrity. Report findings accurately and transparently, even if they are not what you expected.
  • Respect for Participants : Treat participants with respect, dignity, and sensitivity to cultural differences. Avoid any form of coercion or manipulation.
  • Institutional Review Board (IRB) : If required, seek approval from an IRB or ethics committee before conducting your research, particularly when working with human participants.

Adhering to ethical guidelines is not only essential for the ethical conduct of research but also crucial for the credibility and validity of your study. Ethical research practices build trust between researchers and participants and contribute to the advancement of knowledge with integrity.

With a solid understanding of data collection, including sampling methods, instruments, procedures, and ethical considerations, you are now well-equipped to gather the data needed to answer your research questions.

Empirical Research Data Analysis

Now comes the exciting phase of data analysis, where the raw data you've diligently collected starts to yield insights and answers to your research questions. We will explore the various aspects of data analysis, from preparing your data to drawing meaningful conclusions through statistics and visualization.

Data Preparation

Data preparation is the crucial first step in data analysis. It involves cleaning, organizing, and transforming your raw data into a format that is ready for analysis. Effective data preparation ensures the accuracy and reliability of your results.

  • Data Cleaning : Identify and rectify errors, missing values, and inconsistencies in your dataset. This may involve correcting typos, removing outliers, and imputing missing data.
  • Data Coding : Assign numerical values or codes to categorical variables to make them suitable for statistical analysis. For example, converting "Yes" and "No" to 1 and 0.
  • Data Transformation : Transform variables as needed to meet the assumptions of the statistical tests you plan to use. Common transformations include logarithmic or square root transformations.
  • Data Integration : If your data comes from multiple sources, integrate it into a unified dataset, ensuring that variables match and align.
  • Data Documentation : Maintain clear documentation of all data preparation steps, as well as the rationale behind each decision. This transparency is essential for replicability.

Effective data preparation lays the foundation for accurate and meaningful analysis. It allows you to trust the results that will follow in the subsequent stages.
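To make the preparation steps concrete, here is a minimal sketch using only Python's standard library. The records, field names, and the choice of median imputation are assumptions made for illustration; your own dataset and cleaning rules will differ.

```python
import math
from statistics import median

# Hypothetical raw survey records; field names are illustrative only.
raw = [
    {"id": 1, "smoker": "Yes", "income": "52000"},
    {"id": 2, "smoker": "no ", "income": ""},
    {"id": 3, "smoker": "YES", "income": "48000"},
    {"id": 4, "smoker": "No", "income": "61000"},
]

def prepare(records):
    """Clean, code, and transform records; return (rows, log of decisions)."""
    log = []
    # Data cleaning: treat blank income strings as missing values.
    observed = [float(r["income"]) for r in records if r["income"].strip()]
    fill = median(observed)
    # Data documentation: record the imputation decision for replicability.
    log.append(f"imputed missing income with median {fill}")
    rows = []
    for r in records:
        income = float(r["income"]) if r["income"].strip() else fill
        rows.append({
            "id": r["id"],
            # Data coding: normalise case and whitespace, then map Yes -> 1.
            # Simplification: any other answer is coded 0.
            "smoker": 1 if r["smoker"].strip().lower() == "yes" else 0,
            "income": income,
            # Data transformation: log transform to reduce right skew.
            "log_income": math.log(income),
        })
    return rows, log

rows, log = prepare(raw)
```

Keeping the log alongside the cleaned rows is one simple way to document each decision as it is made.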

Descriptive Statistics

Descriptive statistics help you summarize and make sense of your data by providing a clear overview of its key characteristics. These statistics are essential for understanding the central tendencies, variability, and distribution of your variables. Descriptive statistics include:

  • Measures of Central Tendency : These include the mean (average), median (middle value), and mode (most frequent value). They help you understand the typical or central value of your data.
  • Measures of Dispersion : Measures like the range, variance, and standard deviation provide insights into the spread or variability of your data points.
  • Frequency Distributions : Creating frequency distributions or histograms allows you to visualize the distribution of your data across different values or categories.

Descriptive statistics provide the initial insights needed to understand your data's basic characteristics, which can inform further analysis.
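As a rough illustration, the measures listed above can be computed with Python's built-in `statistics` module. The scores below are made up, and the sketch assumes you want sample (n - 1) variance and standard deviation rather than the population versions.

```python
from collections import Counter
from statistics import mean, median, mode, stdev, variance

# Hypothetical scores from eight participants.
scores = [2, 3, 3, 4, 5, 5, 5, 7]

summary = {
    # Measures of central tendency
    "mean": mean(scores),
    "median": median(scores),
    "mode": mode(scores),
    # Measures of dispersion (sample versions, dividing by n - 1)
    "range": max(scores) - min(scores),
    "variance": variance(scores),
    "std_dev": stdev(scores),
}

# Frequency distribution: how often each value occurs.
frequencies = Counter(scores)

for name, value in summary.items():
    print(f"{name}: {value}")
```

If your data covers the entire population of interest rather than a sample, `pvariance` and `pstdev` from the same module would be the matching choices.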

Inferential Statistics

Inferential statistics take your analysis to the next level by allowing you to make inferences or predictions about a larger population based on your sample data. These methods help you test hypotheses and draw meaningful conclusions. Key concepts in inferential statistics include:

  • Hypothesis Testing : Hypothesis tests (e.g., t-tests, chi-squared tests) help you assess whether observed differences or associations in your data are larger than would plausibly arise from chance alone.
  • Confidence Intervals : Confidence intervals provide a range of plausible values for a population parameter (e.g., the population mean), constructed so that a stated proportion of such intervals (e.g., 95%) would contain the true value across repeated samples.
  • Regression Analysis : Regression models (linear, logistic, etc.) help you explore relationships between variables and make predictions.
  • Analysis of Variance (ANOVA) : ANOVA tests are used to compare means across three or more groups, allowing you to assess whether differences among them are statistically significant.

Inferential statistics are powerful tools for drawing conclusions from your data and assessing the generalizability of your findings to the broader population.
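To show the logic of a hypothesis test without extra libraries, the sketch below uses an exact permutation test. This is a stand-in for the t-test named above, not the same procedure. The two groups and their values are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_test(a, b):
    """Exact two-sided permutation test for a difference in group means.

    Returns (observed absolute difference, p-value).
    """
    pooled = a + b
    total = sum(pooled)
    observed = abs(mean(a) - mean(b))
    extreme = 0
    count = 0
    # Enumerate every way of relabelling the pooled values into two groups.
    for idx in combinations(range(len(pooled)), len(a)):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / len(a) - (total - sum_a) / len(b))
        count += 1
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
    return observed, extreme / count

# Hypothetical outcome scores for two groups of five participants.
treatment = [12, 14, 15, 16, 18]
control = [8, 9, 10, 11, 13]

observed_diff, p_value = permutation_test(treatment, control)
print(f"difference in means: {observed_diff:.2f}, p = {p_value:.4f}")
```

The p-value here is the share of relabellings that produce a difference at least as large as the one observed. Full enumeration is only practical for small samples; larger studies would typically rely on random resampling or a parametric test instead.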

Qualitative Data Analysis

Qualitative data analysis is employed when working with non-numerical data, such as text, interviews, or open-ended survey responses. It focuses on understanding the underlying themes, patterns, and meanings within qualitative data. Qualitative analysis techniques include:

  • Thematic Analysis : Identifying and analyzing recurring themes or patterns within textual data.
  • Content Analysis : Categorizing and coding qualitative data to extract meaningful insights.
  • Grounded Theory : Developing theories or frameworks based on emergent themes from the data.
  • Narrative Analysis : Examining the structure and content of narratives to uncover meaning.

Qualitative data analysis provides a rich and nuanced understanding of complex phenomena and human experiences.
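The coding step in content analysis can be sketched very roughly in code. The codebook, keywords, and responses below are invented for illustration, and keyword matching is only a crude stand-in for the interpretive judgment a human coder applies.

```python
from collections import Counter

# Hypothetical codebook: theme -> keywords that signal it.
codebook = {
    "cost": {"price", "expensive", "afford"},
    "access": {"distance", "transport", "far"},
}

# Hypothetical open-ended survey responses.
responses = [
    "The price is too expensive for my family.",
    "The clinic is too far and transport is unreliable.",
    "I cannot afford it, and the distance is a problem.",
]

def code_response(text, codebook):
    """Return the set of themes whose keywords appear in the text."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return {theme for theme, keys in codebook.items() if words & keys}

# Count how many responses touch on each theme.
theme_counts = Counter(
    theme for r in responses for theme in code_response(r, codebook)
)
print(theme_counts)
```

A tally like this is best treated as a starting point for closer reading of the responses rather than as a finding in itself.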

Data Visualization

Data visualization is the art of representing data graphically to make complex information more understandable and accessible. Effective data visualization can reveal patterns, trends, and outliers in your data. Common types of data visualization include:

  • Bar Charts and Histograms : Bar charts display counts or values across categories, while histograms display the distribution of numerical data grouped into bins.
  • Line Charts : Ideal for showing trends and changes in data over time.
  • Scatter Plots : Visualize relationships and correlations between two variables.
  • Pie Charts : Display the composition of a whole in terms of its parts.
  • Heatmaps : Depict patterns and relationships in multidimensional data through color-coding.
  • Box Plots : Provide a summary of the data distribution, including outliers.
  • Interactive Dashboards : Create dynamic visualizations that allow users to explore data interactively.

Data visualization not only enhances your understanding of the data but also serves as a powerful communication tool to convey your findings to others.

As you embark on the data analysis phase of your empirical research, remember that the specific methods and techniques you choose will depend on your research questions, data type, and objectives. Effective data analysis transforms raw data into valuable insights, bringing you closer to the answers you seek.

How to Report Empirical Research Results?

At this stage, you get to share your empirical research findings with the world. Effective reporting and presentation of your results are crucial for communicating your research's impact and insights.

1. Write the Research Paper

Writing a research paper is the culmination of your empirical research journey. It's where you synthesize your findings, provide context, and contribute to the body of knowledge in your field.

  • Title and Abstract : Craft a clear and concise title that reflects your research's essence. The abstract should provide a brief summary of your research objectives, methods, findings, and implications.
  • Introduction : In the introduction, introduce your research topic, state your research questions or hypotheses, and explain the significance of your study. Provide context by discussing relevant literature.
  • Methods : Describe your research design, data collection methods, and sampling procedures. Be precise and transparent, allowing readers to understand how you conducted your study.
  • Results : Present your findings in a clear and organized manner. Use tables, graphs, and statistical analyses to support your results. Avoid interpreting your findings in this section; focus on reporting what you found.
  • Discussion : Interpret your findings and discuss their implications. Relate your results to your research questions and the existing literature. Address any limitations of your study and suggest avenues for future research.
  • Conclusion : Summarize the key points of your research and its significance. Restate your main findings and their implications.
  • References : Cite all sources used in your research following a specific citation style (e.g., APA, MLA, Chicago). Ensure accuracy and consistency in your citations.
  • Appendices : Include any supplementary material, such as questionnaires, data coding sheets, or additional analyses, in the appendices.

Writing a research paper is a skill that improves with practice. Ensure clarity, coherence, and conciseness in your writing to make your research accessible to a broader audience.

2. Create Visuals and Tables

Visuals and tables are powerful tools for presenting complex data in an accessible and understandable manner.

  • Clarity : Ensure that your visuals and tables are clear and easy to interpret. Use descriptive titles and labels.
  • Consistency : Maintain consistency in formatting, such as font size and style, across all visuals and tables.
  • Appropriateness : Choose the most suitable visual representation for your data. Bar charts, line graphs, and scatter plots work well for different types of data.
  • Simplicity : Avoid clutter and unnecessary details. Focus on conveying the main points.
  • Accessibility : Make sure your visuals and tables are accessible to a broad audience, including those with visual impairments.
  • Captions : Include informative captions that explain the significance of each visual or table.

Compelling visuals and tables enhance the reader's understanding of your research and can be the key to conveying complex information efficiently.

3. Interpret Findings

Interpreting your findings is where you bridge the gap between data and meaning. It's your opportunity to provide context, discuss implications, and offer insights. When interpreting your findings:

  • Relate to Research Questions : Discuss how your findings directly address your research questions or hypotheses.
  • Compare with Literature : Analyze how your results align with or deviate from previous research in your field. What insights can you draw from these comparisons?
  • Discuss Limitations : Be transparent about the limitations of your study. Address any constraints, biases, or potential sources of error.
  • Practical Implications : Explore the real-world implications of your findings. How can they be applied or inform decision-making?
  • Future Research Directions : Suggest areas for future research based on the gaps or unanswered questions that emerged from your study.

Interpreting findings goes beyond simply presenting data; it's about weaving a narrative that helps readers grasp the significance of your research in the broader context.

With your research paper written, structured, and enriched with visuals, and your findings expertly interpreted, you are now prepared to communicate your research effectively. Sharing your insights and contributing to the body of knowledge in your field is a significant accomplishment in empirical research.

Examples of Empirical Research

To solidify your understanding of empirical research, let's delve into some real-world examples across different fields. These examples will illustrate how empirical research is applied to gather data, analyze findings, and draw conclusions.

Social Sciences

In the realm of social sciences, consider a sociological study exploring the impact of socioeconomic status on educational attainment. Researchers gather data from a diverse group of individuals, including their family backgrounds, income levels, and academic achievements.

Through statistical analysis, they can identify correlations and trends, revealing whether individuals from lower socioeconomic backgrounds are less likely to attain higher levels of education. This empirical research helps shed light on societal inequalities and informs policymakers on potential interventions to address disparities in educational access.

Environmental Science

Environmental scientists often employ empirical research to assess the effects of environmental changes. For instance, researchers studying the impact of climate change on wildlife might collect data on animal populations, weather patterns, and habitat conditions over an extended period.

By analyzing this empirical data, they can identify correlations between climate fluctuations and changes in wildlife behavior, migration patterns, or population sizes. This empirical research is crucial for understanding the ecological consequences of climate change and informing conservation efforts.

Business and Economics

In the business world, empirical research is essential for making data-driven decisions. Consider a market research study conducted by a business seeking to launch a new product. They collect data through surveys, focus groups, and consumer behavior analysis.

By examining this empirical data, the company can gauge consumer preferences, demand, and potential market size. Empirical research in business helps guide product development, pricing strategies, and marketing campaigns, increasing the likelihood of a successful product launch.

Psychology

Psychological studies frequently rely on empirical research to understand human behavior and cognition. For instance, a psychologist interested in examining the impact of stress on memory might design an experiment. Participants are exposed to stress-inducing situations, and their memory performance is assessed through various tasks.

By analyzing the data collected, the psychologist can determine whether stress has a significant effect on memory recall. This empirical research contributes to our understanding of the complex interplay between psychological factors and cognitive processes.

These examples highlight the versatility and applicability of empirical research across diverse fields. Whether in the social sciences, environmental science, business, or psychology, empirical research serves as a fundamental tool for gaining insights, testing hypotheses, and driving advancements in knowledge and practice.

Conclusion for Empirical Research

Empirical research is a powerful tool for gaining insights, testing hypotheses, and making informed decisions. By following the steps outlined in this guide, you've learned how to select research topics, collect data, analyze findings, and effectively communicate your research to the world. Remember, empirical research is a journey of discovery, and each step you take brings you closer to a deeper understanding of the world around you. Whether you're a scientist, a student, or someone curious about the process, the principles of empirical research empower you to explore, learn, and contribute to the ever-expanding realm of knowledge.

1.1 Methods of Knowing

Learning Objectives

  • Describe the five methods of acquiring knowledge.
  • Understand the benefits and problems with each.

Take a minute to ponder some of what you know and how you acquired that knowledge. Perhaps you know that you should make your bed in the morning because your mother or father told you this is what you should do, perhaps you know that swans are white because all of the swans you have seen are white, or perhaps you know that your friend is lying to you because she is acting strange and won’t look you in the eye. But should we trust knowledge from these sources? The methods of acquiring knowledge can be broken down into five categories each with its own strengths and weaknesses.

The first method of knowing is intuition. When we use our intuition, we are relying on our guts, our emotions, and/or our instincts to guide us. Rather than examining facts or using rational thought, intuition involves believing what feels true. The problem with relying on intuition is that our intuitions can be wrong because they are driven by cognitive and motivational biases rather than logical reasoning or scientific evidence. While the strange behavior of your friend may lead you to think she is lying to you, it may just be that she is holding in a bit of gas or is preoccupied with some other issue that is irrelevant to you. However, weighing alternatives and thinking of all the different possibilities can be paralyzing for some people, and sometimes decisions based on intuition are actually superior to those based on analysis (people interested in this idea should read Malcolm Gladwell’s book Blink) [1].

Perhaps one of the most common methods of acquiring knowledge is through authority. This method involves accepting new ideas because some authority figure states that they are true. These authorities include parents, the media, doctors, priests and other religious authorities, the government, and professors. While in an ideal world we should be able to trust authority figures, history has taught us otherwise, and many instances of atrocities against humanity are a consequence of people unquestioningly following authority (e.g., Salem Witch Trials, Nazi War Crimes). On a more benign level, while your parents may have told you that you should make your bed in the morning, making your bed provides the warm, damp environment in which mites thrive. Keeping the sheets open provides a less hospitable environment for mites. These examples illustrate the problem with using authority to obtain knowledge: authority figures may be wrong, they may just be using their intuition to arrive at their conclusions, and they may have their own reasons to mislead you. Nevertheless, much of the information we acquire is through authority because we don’t have time to question and independently research every piece of knowledge we learn this way. But we can learn to evaluate the credentials of authority figures, to evaluate the methods they used to arrive at their conclusions, and to evaluate whether they have any reasons to mislead us.


Rationalism involves using logic and reasoning to acquire new knowledge. Using this method, premises are stated and logical rules are followed to arrive at sound conclusions. For instance, if I am given the premise that all swans are white and the premise that this is a swan, then I can come to the rational conclusion that this swan is white without actually seeing the swan. The problem with this method is that if the premises are wrong or there is an error in logic, then the conclusion may be false. For instance, the premise that all swans are white is incorrect; there are black swans in Australia. Also, unless formally trained in the rules of logic, it is easy to make an error. Nevertheless, if the premises are correct and logical rules are followed appropriately, then this is a sound means of acquiring knowledge.

Empiricism involves acquiring knowledge through observation and experience. Once again, many of you may have believed that all swans are white because you have only ever seen white swans. For centuries people believed the world was flat because it appears to be flat. These examples and the many visual illusions that trick our senses illustrate the problems with relying on empiricism alone to derive knowledge. We are limited in what we can experience and observe, and our senses can deceive us. Moreover, our prior experiences can alter the way we perceive events. Nevertheless, empiricism is at the heart of the scientific method. Science relies on observations, but not just any observations: it relies on structured observations, an approach known as systematic empiricism.

The Scientific Method

The scientific method is a process of systematically collecting and evaluating evidence to test ideas and answer questions. While scientists may use intuition, authority, rationalism, and empiricism to generate new ideas, they don’t stop there. Scientists go a step further by using systematic empiricism to make careful observations under various controlled conditions in order to test their ideas, and they use rationalism to arrive at valid conclusions. While the scientific method is the most likely of all of the methods to produce valid knowledge, like all methods of acquiring knowledge it also has its drawbacks. One major problem is that it is not always feasible to use the scientific method; this method can require considerable time and resources. Another problem with the scientific method is that it cannot be used to answer all questions. As described in the following section, the scientific method can only be used to address empirical questions. This book and your research methods course are designed to provide you with an in-depth examination of how psychologists use the scientific method to advance our understanding of human behavior and the mind.

  • Gladwell, M. E. (2007). Blink: The power of thinking without thinking. New York: Little, Brown & Company.

Enago Academy

Conceptual Vs. Empirical Research: Which Is Better?

Scientific research is often divided into two classes: conceptual research and empirical research. These used to be distinct ways of doing research, and a researcher would proudly claim to be one or the other, praising his method and scorning the alternative. Today the distinction is not so clear.

What is Conceptual Research?

Conceptual research focuses on the concept or theory that explains or describes the phenomenon being studied. What causes disease? How can we describe the motions of the planets? What are the building blocks of matter? The conceptual researcher sits at his desk with pen in hand and tries to solve these problems by thinking about them. He does no experiments but may make use of observations by others, since this is the mass of data that he is trying to make sense of. Until fairly recently, conceptual research methodology was considered the most honorable form of research—it required using the brain, not the hands. Researchers such as the alchemists who did experiments were considered little better than blacksmiths—“filthy empiricists.”

What is Empirical Research?

For all of their lofty status, conceptual researchers regularly produced theories that were wrong. Aristotle taught that heavy objects fall to earth faster than light ones, and many generations of professors repeated his teachings until Galileo proved them wrong. Galileo was an empiricist of the best sort, one who performed original experiments not merely to destroy old theories but to provide the basis for new theories. A reaction against the ivory tower theoreticians culminated in those who claimed to have no use for theory, arguing that empirical acquisition of knowledge was the only way to the truth. A pure empiricist would simply graph data and see if he got a straight-line relation between variables. If so, he had a good “empirical” relationship that would make useful predictions. The theory behind the correlation was irrelevant.
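The pure empiricist's straight-line check can be sketched as an ordinary least-squares fit. The measurements below are invented, and the function name `fit_line` is just an illustrative choice. The fit describes the data without saying anything about why the relationship holds.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares line through (xs, ys); returns slope, intercept, r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Pearson correlation: how closely the points follow a straight line.
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical measurements of two variables.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r = fit_line(xs, ys)

def predict(x):
    """Use the fitted empirical relationship to predict y for a new x."""
    return slope * x + intercept

print(f"y = {slope:.2f}x + {intercept:.2f}, r = {r:.3f}")
```

A correlation close to 1 indicates a usable empirical relationship for prediction within the measured range, which is all the pure empiricist asks of it.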

The Scientific Method: A Bit of Both

The modern scientific method is really a combination of empirical and conceptual research. Using known experimental data, a scientist formulates a working hypothesis to explain some aspect of nature. He then performs new experiments designed to test predictions of the theory, to support it or disprove it. Einstein is often cited as an example of a conceptual researcher, but he based his theories on experimental observations and proposed experiments, real and thought, which would test his theories. On the other hand, Edison is often considered an empiricist, the “Edisonian method” being a byword for trial and error. But Edison appreciated the work of theorists and hired some of the best. Random screening of myriad possibilities is still valuable: pharmaceutical companies looking for new drugs do this, sometimes with great success. Personally, I tend to be a semi-empiricist. In graduate school I used the Hammett linear free-energy relation (a semi-empirical equation) to gain insight into chemical transition states. So I don’t take sides in the “conceptual vs. empirical research” debate. There is a range of possibilities between the two forms, all of which have their uses.

Albert Einstein did theoretical work; he had no laboratory. Put simply, through new conceptual models, he re-interpreted the findings of others and expressed them mathematically.

Conceptual Research vs. Empirical Research

What's the Difference?

Conceptual research and empirical research are two distinct approaches to conducting research. Conceptual research focuses on exploring and developing theories, concepts, and ideas. It involves analyzing existing literature, theories, and concepts to gain a deeper understanding of a particular topic. Conceptual research is often used in the early stages of research to generate hypotheses and develop a theoretical framework. On the other hand, empirical research involves collecting and analyzing data to test hypotheses and answer research questions. It relies on observation, measurement, and experimentation to gather evidence and draw conclusions. Empirical research is more focused on obtaining concrete and measurable results, often through surveys, experiments, or observations. Both approaches are valuable in research, with conceptual research providing a foundation for empirical research and empirical research validating or refuting conceptual theories.

Further Detail


Research is a fundamental aspect of any field of study, providing a systematic approach to acquiring knowledge and understanding. In the realm of research, two primary methodologies are commonly employed: conceptual research and empirical research. While both approaches aim to contribute to the body of knowledge, they differ significantly in their attributes, methodologies, and outcomes. This article aims to explore and compare the attributes of conceptual research and empirical research, shedding light on their unique characteristics and applications.

Conceptual Research

Conceptual research, also known as theoretical research, focuses on the exploration and development of theories, concepts, and ideas. It is primarily concerned with abstract and hypothetical constructs, aiming to enhance understanding and generate new insights. Conceptual research often involves a comprehensive review of existing literature, analyzing and synthesizing various theories and concepts to propose new frameworks or models.

One of the key attributes of conceptual research is its reliance on deductive reasoning. Researchers start with a set of existing theories or concepts and use logical reasoning to derive new hypotheses or frameworks. This deductive approach allows researchers to build upon existing knowledge and propose innovative ideas. Conceptual research is often exploratory in nature, seeking to expand the boundaries of knowledge and provide a foundation for further empirical investigations.

Conceptual research is particularly valuable in fields where empirical data may be limited or difficult to obtain. It allows researchers to explore complex phenomena, develop theoretical frameworks, and generate hypotheses that can later be tested through empirical research. By focusing on abstract concepts and theories, conceptual research provides a theoretical foundation for empirical investigations, guiding researchers in their quest for empirical evidence.

Furthermore, conceptual research plays a crucial role in the development of new disciplines or interdisciplinary fields. It helps establish a common language and theoretical framework, facilitating communication and collaboration among researchers from different backgrounds. By synthesizing existing knowledge and proposing new concepts, conceptual research lays the groundwork for empirical studies and contributes to the overall advancement of knowledge.

Empirical Research

Empirical research, in contrast to conceptual research, is concerned with the collection and analysis of observable data. It aims to test hypotheses, validate theories, and provide evidence-based conclusions. Empirical research relies on the systematic collection of data through various methods, such as surveys, experiments, observations, or interviews. The data collected is then analyzed using statistical or qualitative techniques to draw meaningful conclusions.

One of the primary attributes of empirical research is its inductive reasoning approach. Researchers start with specific observations or data and use them to develop general theories or conclusions. This inductive approach allows researchers to derive broader implications from specific instances, providing a basis for generalization. Empirical research is often hypothesis-driven, seeking to test and validate theories or hypotheses through the collection and analysis of data.

Empirical research is highly valued for its ability to provide concrete evidence and support or refute existing theories. It allows researchers to investigate real-world phenomena, understand cause-and-effect relationships, and make informed decisions based on empirical evidence. By relying on observable data, empirical research enhances the credibility and reliability of research findings, contributing to the overall body of knowledge in a field.

Moreover, empirical research is particularly useful in applied fields, where practical implications and real-world applications are of utmost importance. It allows researchers to evaluate the effectiveness of interventions, assess the impact of policies, or measure the outcomes of specific actions. Empirical research provides valuable insights that can inform decision-making processes, guide policy development, and drive evidence-based practices.

Comparing Conceptual Research and Empirical Research

While conceptual research and empirical research differ in their methodologies and approaches, they are both essential components of the research process. Conceptual research focuses on the development of theories and concepts, providing a theoretical foundation for empirical investigations. Empirical research, on the other hand, relies on the collection and analysis of observable data to test and validate theories.

Conceptual research is often exploratory and aims to expand the boundaries of knowledge. It is valuable in fields where empirical data may be limited or difficult to obtain. By synthesizing existing theories and proposing new frameworks, conceptual research provides a theoretical basis for empirical studies. It helps researchers develop hypotheses and guides their quest for empirical evidence.

Empirical research, on the other hand, is hypothesis-driven and seeks to provide concrete evidence and support or refute existing theories. It allows researchers to investigate real-world phenomena, understand cause-and-effect relationships, and make informed decisions based on empirical evidence. Empirical research is particularly useful in applied fields, where practical implications and real-world applications are of utmost importance.

Despite their differences, conceptual research and empirical research are not mutually exclusive. In fact, they often complement each other in the research process. Conceptual research provides the theoretical foundation and guidance for empirical investigations, while empirical research validates and refines existing theories or concepts. The iterative nature of research often involves a continuous cycle of conceptual and empirical research, with each informing and influencing the other.

Both conceptual research and empirical research contribute to the advancement of knowledge in their respective fields. Conceptual research expands theoretical frameworks, proposes new concepts, and lays the groundwork for empirical investigations. Empirical research, on the other hand, provides concrete evidence, validates theories, and informs practical applications. Together, they form a symbiotic relationship, driving progress and innovation in various disciplines.

Conceptual research and empirical research are two distinct methodologies employed in the pursuit of knowledge and understanding. While conceptual research focuses on the development of theories and concepts, empirical research relies on the collection and analysis of observable data. Both approaches have their unique attributes, methodologies, and applications.

Conceptual research plays a crucial role in expanding theoretical frameworks, proposing new concepts, and providing a foundation for empirical investigations. It is particularly valuable in fields where empirical data may be limited or difficult to obtain. On the other hand, empirical research provides concrete evidence, validates theories, and informs practical applications. It is highly valued in applied fields, where evidence-based decision-making is essential.

Despite their differences, conceptual research and empirical research are not mutually exclusive. They often work in tandem, with conceptual research guiding the development of hypotheses and theoretical frameworks, and empirical research validating and refining these theories through the collection and analysis of data. Together, they contribute to the overall advancement of knowledge and understanding in various disciplines.

CBE Life Sci Educ, v.17(1); Spring 2018

Understanding the Complex Relationship between Critical Thinking and Science Reasoning among Undergraduate Thesis Writers

Jason E. Dowd

† Department of Biology, Duke University, Durham, NC 27708

Robert J. Thompson, Jr.

‡ Department of Psychology and Neuroscience, Duke University, Durham, NC 27708

Leslie A. Schiff

§ Department of Microbiology and Immunology, University of Minnesota, Minneapolis, MN 55455

Julie A. Reynolds

This study empirically examines the relationship between students’ critical-thinking skills and scientific reasoning as reflected in undergraduate thesis writing in biology. Writing offers a unique window into studying this relationship, and the findings raise potential implications for instruction.

Developing critical-thinking and scientific reasoning skills is a core learning objective of science education, but little empirical evidence exists regarding the interrelationships between these constructs. Writing effectively fosters students’ development of these constructs, and it offers a unique window into studying how they relate. In this study of undergraduate thesis writing in biology at two universities, we examine how scientific reasoning exhibited in writing (assessed using the Biology Thesis Assessment Protocol) relates to general and specific critical-thinking skills (assessed using the California Critical Thinking Skills Test), and we consider implications for instruction. We find that scientific reasoning in writing is strongly related to inference, while other aspects of science reasoning that emerge in writing (epistemological considerations, writing conventions, etc.) are not significantly related to critical-thinking skills. Science reasoning in writing is not merely a proxy for critical thinking. In linking features of students’ writing to their critical-thinking skills, this study 1) provides a bridge to prior work suggesting that engagement in science writing enhances critical thinking and 2) serves as a foundational step for subsequently determining whether instruction focused explicitly on developing critical-thinking skills (particularly inference) can actually improve students’ scientific reasoning in their writing.


Critical-thinking and scientific reasoning skills are core learning objectives of science education for all students, regardless of whether or not they intend to pursue a career in science or engineering. Consistent with the view of learning as construction of understanding and meaning ( National Research Council, 2000 ), the pedagogical practice of writing has been found to be effective not only in fostering the development of students’ conceptual and procedural knowledge ( Gerdeman et al. , 2007 ) and communication skills ( Clase et al. , 2010 ), but also scientific reasoning ( Reynolds et al. , 2012 ) and critical-thinking skills ( Quitadamo and Kurtz, 2007 ).

Critical thinking and scientific reasoning are similar but different constructs that include various types of higher-order cognitive processes, metacognitive strategies, and dispositions involved in making meaning of information. Critical thinking is generally understood as the broader construct ( Holyoak and Morrison, 2005 ), comprising an array of cognitive processes and dispositions that are drawn upon differentially in everyday life and across domains of inquiry such as the natural sciences, social sciences, and humanities. Scientific reasoning, then, may be interpreted as the subset of critical-thinking skills (cognitive and metacognitive processes and dispositions) that 1) are involved in making meaning of information in scientific domains and 2) support the epistemological commitment to scientific methodology and paradigm(s).

Although there has been an enduring focus in higher education on promoting critical thinking and reasoning as general or “transferable” skills, research evidence provides increasing support for the view that reasoning and critical thinking are also situational or domain specific ( Beyer et al. , 2013 ). Some researchers, such as Lawson (2010) , present frameworks in which science reasoning is characterized explicitly in terms of critical-thinking skills. There are, however, limited coherent frameworks and empirical evidence regarding either the general or domain-specific interrelationships of scientific reasoning, as it is most broadly defined, and critical-thinking skills.

The Vision and Change in Undergraduate Biology Education Initiative provides a framework for thinking about these constructs and their interrelationship in the context of the core competencies and disciplinary practice they describe ( American Association for the Advancement of Science, 2011 ). These learning objectives aim for undergraduates to “understand the process of science, the interdisciplinary nature of the new biology and how science is closely integrated within society; be competent in communication and collaboration; have quantitative competency and a basic ability to interpret data; and have some experience with modeling, simulation and computational and systems level approaches as well as with using large databases” ( Woodin et al. , 2010 , pp. 71–72). This framework makes clear that science reasoning and critical-thinking skills play key roles in major learning outcomes; for example, “understanding the process of science” requires students to engage in (and be metacognitive about) scientific reasoning, and having the “ability to interpret data” requires critical-thinking skills. To help students better achieve these core competencies, we must better understand the interrelationships of their composite parts. Thus, the next step is to determine which specific critical-thinking skills are drawn upon when students engage in science reasoning in general and with regard to the particular scientific domain being studied. Such a determination could be applied to improve science education for both majors and nonmajors through pedagogical approaches that foster critical-thinking skills that are most relevant to science reasoning.

Writing affords one of the most effective means for making thinking visible ( Reynolds et al. , 2012 ) and learning how to “think like” and “write like” disciplinary experts ( Meizlish et al. , 2013 ). As a result, student writing affords the opportunities to both foster and examine the interrelationship of scientific reasoning and critical-thinking skills within and across disciplinary contexts. The purpose of this study was to better understand the relationship between students’ critical-thinking skills and scientific reasoning skills as reflected in the genre of undergraduate thesis writing in biology departments at two research universities, the University of Minnesota and Duke University.

In the following subsections, we discuss in greater detail the constructs of scientific reasoning and critical thinking, as well as the assessment of scientific reasoning in students’ thesis writing. In subsequent sections, we discuss our study design, findings, and the implications for enhancing educational practices.

Critical Thinking

The advances in cognitive science in the 21st century have increased our understanding of the mental processes involved in thinking and reasoning, as well as memory, learning, and problem solving. Critical thinking is understood to include both a cognitive dimension and a disposition dimension (e.g., reflective thinking) and is defined as “purposeful, self-regulatory judgment which results in interpretation, analysis, evaluation, and inference, as well as explanation of the evidential, conceptual, methodological, criteriological, or contextual considerations upon which that judgment is based” ( Facione, 1990, p. 3 ). Although various other definitions of critical thinking have been proposed, researchers have generally coalesced on this consensus expert view ( Blattner and Frazier, 2002 ; Condon and Kelly-Riley, 2004 ; Bissell and Lemons, 2006 ; Quitadamo and Kurtz, 2007 ) and the corresponding measures of critical-thinking skills ( August, 2016 ; Stephenson and Sadler-McKnight, 2016 ).

Both the cognitive skills and dispositional components of critical thinking have been recognized as important to science education ( Quitadamo and Kurtz, 2007 ). Empirical research demonstrates that specific pedagogical practices in science courses are effective in fostering students’ critical-thinking skills. Quitadamo and Kurtz (2007) found that students who engaged in a laboratory writing component in the context of a general education biology course significantly improved their overall critical-thinking skills (and their analytical and inference skills, in particular), whereas students engaged in a traditional quiz-based laboratory did not improve their critical-thinking skills. In related work, Quitadamo et al. (2008) found that a community-based inquiry experience, involving inquiry, writing, research, and analysis, was associated with improved critical thinking in a biology course for nonmajors, compared with traditionally taught sections. In both studies, students who exhibited stronger presemester critical-thinking skills exhibited stronger gains, suggesting that “students who have not been explicitly taught how to think critically may not reach the same potential as peers who have been taught these skills” ( Quitadamo and Kurtz, 2007 , p. 151).

Recently, Stephenson and Sadler-McKnight (2016) found that first-year general chemistry students who engaged in a science writing heuristic laboratory, which is an inquiry-based, writing-to-learn approach to instruction ( Hand and Keys, 1999 ), had significantly greater gains in total critical-thinking scores than students who received traditional laboratory instruction. Each of the four components (inquiry, writing, collaboration, and reflection) has been linked to critical thinking ( Stephenson and Sadler-McKnight, 2016 ). Like the other studies, this work highlights the value of targeting critical-thinking skills and the effectiveness of an inquiry-based, writing-to-learn approach to enhance critical thinking. Across studies, authors advocate adopting critical thinking as the course framework ( Pukkila, 2004 ) and developing explicit examples of how critical thinking relates to the scientific method ( Miri et al. , 2007 ).

In these examples, the important connection between writing and critical thinking is highlighted by the fact that each intervention involves the incorporation of writing into science, technology, engineering, and mathematics education (either alone or in combination with other pedagogical practices). However, critical-thinking skills are not always the primary learning outcome; in some contexts, scientific reasoning is the primary outcome that is assessed.

Scientific Reasoning

Scientific reasoning is a complex process that is broadly defined as “the skills involved in inquiry, experimentation, evidence evaluation, and inference that are done in the service of conceptual change or scientific understanding” ( Zimmerman, 2007 , p. 172). Scientific reasoning is understood to include both conceptual knowledge and the cognitive processes involved with generation of hypotheses (i.e., inductive processes involved in the generation of hypotheses and the deductive processes used in the testing of hypotheses), experimentation strategies, and evidence evaluation strategies. These dimensions are interrelated, in that “experimentation and inference strategies are selected based on prior conceptual knowledge of the domain” ( Zimmerman, 2000 , p. 139). Furthermore, conceptual and procedural knowledge and cognitive process dimensions can be general and domain specific (or discipline specific).

With regard to conceptual knowledge, attention has been focused on the acquisition of core methodological concepts fundamental to scientists’ causal reasoning and metacognitive distancing (or decontextualized thinking), which is the ability to reason independently of prior knowledge or beliefs ( Greenhoot et al. , 2004 ). The latter involves what Kuhn and Dean (2004) refer to as the coordination of theory and evidence, which requires that one question existing theories (i.e., prior knowledge and beliefs), seek contradictory evidence, eliminate alternative explanations, and revise one’s prior beliefs in the face of contradictory evidence. Kuhn and colleagues (2008) further elaborate that scientific thinking requires “a mature understanding of the epistemological foundations of science, recognizing scientific knowledge as constructed by humans rather than simply discovered in the world,” and “the ability to engage in skilled argumentation in the scientific domain, with an appreciation of argumentation as entailing the coordination of theory and evidence” ( Kuhn et al. , 2008 , p. 435). “This approach to scientific reasoning not only highlights the skills of generating and evaluating evidence-based inferences, but also encompasses epistemological appreciation of the functions of evidence and theory” ( Ding et al. , 2016 , p. 616). Evaluating evidence-based inferences involves epistemic cognition, which Moshman (2015) defines as the subset of metacognition that is concerned with justification, truth, and associated forms of reasoning. Epistemic cognition is both general and domain specific (or discipline specific; Moshman, 2015 ).

There is empirical support for the contributions of both prior knowledge and an understanding of the epistemological foundations of science to scientific reasoning. In a study of undergraduate science students, advanced scientific reasoning was most often accompanied by accurate prior knowledge as well as sophisticated epistemological commitments; additionally, for students who had comparable levels of prior knowledge, skillful reasoning was associated with a strong epistemological commitment to the consistency of theory with evidence ( Zeineddin and Abd-El-Khalick, 2010 ). These findings highlight the importance of the need for instructional activities that intentionally help learners develop sophisticated epistemological commitments focused on the nature of knowledge and the role of evidence in supporting knowledge claims ( Zeineddin and Abd-El-Khalick, 2010 ).

Scientific Reasoning in Students’ Thesis Writing

Pedagogical approaches that incorporate writing have also focused on enhancing scientific reasoning. Many rubrics have been developed to assess aspects of scientific reasoning in written artifacts. For example, Timmerman and colleagues (2011) , in the course of describing their own rubric for assessing scientific reasoning, highlight several examples of scientific reasoning assessment criteria ( Haaga, 1993 ; Tariq et al. , 1998 ; Topping et al. , 2000 ; Kelly and Takao, 2002 ; Halonen et al. , 2003 ; Willison and O’Regan, 2007 ).

At both the University of Minnesota and Duke University, we have focused on the genre of the undergraduate honors thesis as the rhetorical context in which to study and improve students’ scientific reasoning and writing. We view the process of writing an undergraduate honors thesis as a form of professional development in the sciences (i.e., a way of engaging students in the practices of a community of discourse). We have found that structured courses designed to scaffold the thesis-writing process and promote metacognition can improve writing and reasoning skills in biology, chemistry, and economics ( Reynolds and Thompson, 2011 ; Dowd et al. , 2015a , b ). In the context of this prior work, we have defined scientific reasoning in writing as the emergent, underlying construct measured across distinct aspects of students’ written discussion of independent research in their undergraduate theses.

The Biology Thesis Assessment Protocol (BioTAP) was developed at Duke University as a tool for systematically guiding students and faculty through a “draft–feedback–revision” writing process, modeled after professional scientific peer-review processes ( Reynolds et al. , 2009 ). BioTAP includes activities and worksheets that allow students to engage in critical peer review and provides detailed descriptions, presented as rubrics, of the questions (i.e., dimensions, shown in Table 1 ) upon which such review should focus. Nine rubric dimensions focus on communication to the broader scientific community, and four rubric dimensions focus on the accuracy and appropriateness of the research. These rubric dimensions provide criteria by which the thesis is assessed, and therefore allow BioTAP to be used as an assessment tool as well as a teaching resource ( Reynolds et al. , 2009 ). Full details are available at www.science-writing.org/biotap.html .

Table 1. Theses assessment protocol dimensions

In previous work, we have used BioTAP to quantitatively assess students’ undergraduate honors theses and explore the relationship between thesis-writing courses (or specific interventions within the courses) and the strength of students’ science reasoning in writing across different science disciplines: biology ( Reynolds and Thompson, 2011 ); chemistry ( Dowd et al. , 2015b ); and economics ( Dowd et al. , 2015a ). We have focused exclusively on the nine dimensions related to reasoning and writing (questions 1–9), as the other four dimensions (questions 10–13) require topic-specific expertise and are intended to be used by the student’s thesis supervisor.

Beyond considering individual dimensions, we have investigated whether meaningful constructs underlie students’ thesis scores. We conducted exploratory factor analysis of students’ theses in biology, economics, and chemistry and found one dominant underlying factor in each discipline; we termed the factor “scientific reasoning in writing” ( Dowd et al. , 2015a , b , 2016 ). That is, each of the nine dimensions could be understood as reflecting, in different ways and to different degrees, the construct of scientific reasoning in writing. The findings indicated evidence of both general and discipline-specific components to scientific reasoning in writing that relate to epistemic beliefs and paradigms, in keeping with broader ideas about science reasoning discussed earlier. Specifically, scientific reasoning in writing is more strongly associated with formulating a compelling argument for the significance of the research in the context of current literature in biology, making meaning regarding the implications of the findings in chemistry, and providing an organizational framework for interpreting the thesis in economics. We suggested that instruction, whether occurring in writing studios or in writing courses to facilitate thesis preparation, should attend to both components.

Research Question and Study Design

The genre of thesis writing combines the pedagogies of writing and inquiry found to foster scientific reasoning ( Reynolds et al. , 2012 ) and critical thinking ( Quitadamo and Kurtz, 2007 ; Quitadamo et al. , 2008 ; Stephenson and Sadler-McKnight, 2016 ). However, there is no empirical evidence regarding the general or domain-specific interrelationships of scientific reasoning and critical-thinking skills, particularly in the rhetorical context of the undergraduate thesis. The BioTAP studies discussed earlier indicate that the rubric-based assessment produces evidence of scientific reasoning in the undergraduate thesis, but it was not designed to foster or measure critical thinking. The current study was undertaken to address the research question: How are students’ critical-thinking skills related to scientific reasoning as reflected in the genre of undergraduate thesis writing in biology? Determining these interrelationships could guide efforts to enhance students’ scientific reasoning and writing skills through focusing instruction on specific critical-thinking skills as well as disciplinary conventions.

To address this research question, we focused on undergraduate thesis writers in biology courses at two institutions, Duke University and the University of Minnesota, and examined the extent to which students’ scientific reasoning in writing, assessed in the undergraduate thesis using BioTAP, corresponds to students’ critical-thinking skills, assessed using the California Critical Thinking Skills Test (CCTST; August, 2016 ).

Study Sample

The study sample was composed of students enrolled in courses designed to scaffold the thesis-writing process in the Department of Biology at Duke University and the College of Biological Sciences at the University of Minnesota. Both courses complement students’ individual work with research advisors. The course is required for thesis writers at the University of Minnesota and optional for writers at Duke University. Not all students are required to complete a thesis, though it is required for students to graduate with honors; at the University of Minnesota, such students are enrolled in an honors program within the college. In total, 28 students were enrolled in the course at Duke University and 44 students were enrolled in the course at the University of Minnesota. Of those 72 students, two did not consent to participate in the study; additionally, five students did not validly complete the CCTST (i.e., attempted fewer than 60% of items or completed the test in less than 15 minutes). Thus, our overall rate of valid participation is 90%, with 27 students from Duke University and 38 students from the University of Minnesota. We found no statistically significant differences in thesis assessment between students with valid CCTST scores and invalid CCTST scores. Therefore, in most of this study, we focus on the 65 students who consented to participate and for whom we have complete and valid data. Additionally, in asking students for their consent to participate, we allowed them to choose whether to provide or decline access to academic and demographic background data. Of the 65 students who consented to participate, 52 students granted access to such data. Therefore, for additional analyses involving academic and background data, we focus on the 52 students who consented. We note that the 13 students who participated but declined to share additional data performed slightly lower on the CCTST than the 52 others (perhaps suggesting that they differ by other measures, but we cannot determine this with certainty). Among the 52 students, 60% identified as female and 10% identified as being from underrepresented ethnicities.
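As a quick check of the sample arithmetic reported above, the counts can be tallied in a few lines of Python. This is only an illustrative sketch: the numbers come from the paragraph, while the variable names are hypothetical and not part of the study materials.

```python
# Counts quoted in the Study Sample paragraph; names are illustrative only.
enrolled = {"Duke University": 28, "University of Minnesota": 44}
valid = {"Duke University": 27, "University of Minnesota": 38}

total_enrolled = sum(enrolled.values())
total_valid = sum(valid.values())

# Two students did not consent and five did not validly complete the CCTST.
excluded = total_enrolled - total_valid

participation_rate = total_valid / total_enrolled

# Students who participated but declined to share background data.
shared_background = 52
declined_background = total_valid - shared_background

print(f"enrolled={total_enrolled}, valid={total_valid}, excluded={excluded}")
print(f"valid participation: {participation_rate:.0%}")
print(f"declined background data: {declined_background}")
```

The tallies agree with the figures in the text: 72 enrolled, 65 valid participants, and 13 students who declined to share background data.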

In both courses, students completed the CCTST online, either in class or on their own, late in the Spring 2016 semester. This is the same assessment that was used in prior studies of critical thinking ( Quitadamo and Kurtz, 2007 ; Quitadamo et al. , 2008 ; Stephenson and Sadler-McKnight, 2016 ). It is “an objective measure of the core reasoning skills needed for reflective decision making concerning what to believe or what to do” ( Insight Assessment, 2016a ). In the test, students are asked to read and consider information as they answer multiple-choice questions. The questions are intended to be appropriate for all users, so there is no expectation of prior disciplinary knowledge in biology (or any other subject). Although actual test items are protected, sample items are available on the Insight Assessment website ( Insight Assessment, 2016b ). We have included one sample item in the Supplemental Material.

The CCTST is based on a consensus definition of critical thinking, measures cognitive and metacognitive skills associated with critical thinking, and has been evaluated for validity and reliability at the college level ( August, 2016 ; Stephenson and Sadler-McKnight, 2016 ). In addition to providing an overall critical-thinking score, the CCTST assesses seven dimensions of critical thinking: analysis, interpretation, inference, evaluation, explanation, induction, and deduction. Scores on each dimension are calculated based on students’ performance on items related to that dimension. Analysis focuses on identifying assumptions, reasons, and claims and examining how they interact to form arguments. Interpretation, related to analysis, focuses on determining the precise meaning and significance of information. Inference focuses on drawing conclusions from reasons and evidence. Evaluation focuses on assessing the credibility of sources of information and claims they make. Explanation, related to evaluation, focuses on describing the evidence, assumptions, or rationale for beliefs and conclusions. Induction focuses on drawing inferences about what is probably true based on evidence. Deduction focuses on drawing conclusions about what must be true when the context completely determines the outcome. These are not independent dimensions; the fact that they are related supports their collective interpretation as critical thinking. Together, the CCTST dimensions provide a basis for evaluating students’ overall strength in using reasoning to form reflective judgments about what to believe or what to do ( August, 2016 ). Each of the seven dimensions and the overall CCTST score are measured on a scale of 0–100, where higher scores indicate superior performance. Scores correspond to superior (86–100), strong (79–85), moderate (70–78), weak (63–69), or not manifested (62 and below) skills.
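The score bands listed above amount to a simple lookup. The following Python sketch shows one way to express it; the function name `cctst_band` is hypothetical, and the handling of non-integer scores (assigning, say, 85.5 to the lower band) is an assumption, since the text gives only integer ranges.

```python
def cctst_band(score: float) -> str:
    """Map a 0-100 CCTST score to the qualitative band quoted in the text.

    Assumption: a fractional score that falls between two listed integer
    ranges is assigned to the lower band.
    """
    if not 0 <= score <= 100:
        raise ValueError("CCTST scores are reported on a 0-100 scale")
    if score >= 86:
        return "superior"
    if score >= 79:
        return "strong"
    if score >= 70:
        return "moderate"
    if score >= 63:
        return "weak"
    return "not manifested"


# Example scores are made up for illustration.
for example_score in (91, 82, 74, 65, 50):
    print(example_score, cctst_band(example_score))
```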

Scientific Reasoning in Writing

At the end of the semester, students’ final, submitted undergraduate theses were assessed using BioTAP, which consists of nine rubric dimensions that focus on communication to the broader scientific community and four additional dimensions that focus on the exhibition of topic-specific expertise ( Reynolds et al. , 2009 ). These dimensions, framed as questions, are displayed in Table 1 .

Student theses were assessed on questions 1–9 of BioTAP using the same procedures described in previous studies ( Reynolds and Thompson, 2011 ; Dowd et al. , 2015a , b ). In this study, six raters were trained in the valid, reliable use of BioTAP rubrics. Each dimension was rated on a five-point scale: 1 indicates the dimension is missing, incomplete, or below acceptable standards; 3 indicates that the dimension is adequate but not exhibiting mastery; and 5 indicates that the dimension is excellent and exhibits mastery (intermediate ratings of 2 and 4 are appropriate when different parts of the thesis make a single category challenging). After training, two raters independently assessed each thesis and then discussed their independent ratings with one another to form a consensus rating. The consensus score is not an average score, but rather an agreed-upon, discussion-based score. On a five-point scale, raters independently assessed dimensions to be within 1 point of each other 82.4% of the time before discussion and formed consensus ratings 100% of the time after discussion.
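The "within 1 point" agreement statistic described above can be computed directly from paired ratings. The sketch below is illustrative only: the function name and the ratings are hypothetical, not the study's data, and it says nothing about how the raters reached consensus through discussion.

```python
def within_one_point_rate(rater_a, rater_b):
    """Fraction of paired ratings that differ by at most 1 point."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    close = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b))
    return close / len(rater_a)


# Hypothetical independent ratings of one thesis on BioTAP questions 1-9.
ratings_a = [5, 3, 4, 2, 5, 3, 1, 4, 3]
ratings_b = [4, 3, 2, 2, 5, 5, 1, 4, 4]

agreement = within_one_point_rate(ratings_a, ratings_b)
print(f"within 1 point: {agreement:.1%}")
```

In the study, this kind of rate would be pooled over all theses and dimensions before discussion; the consensus rating that follows is a negotiated score and cannot be derived from the two independent ratings.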

In this study, we consider both categorical (mastery/nonmastery, where a score of 5 corresponds to mastery) and numerical treatments of individual BioTAP scores to better relate the manifestation of critical thinking in BioTAP assessment to all of the prior studies. For comprehensive/cumulative measures of BioTAP, we focus on the partial sum of questions 1–5, as these questions relate to higher-order scientific reasoning (whereas questions 6–9 relate to mid- and lower-order writing mechanics [ Reynolds et al. , 2009 ]), and the factor scores (i.e., numerical representations of the extent to which each student exhibits the underlying factor), which are calculated from the factor loadings published by Dowd et al. (2016) . We do not focus on questions 6–9 individually in statistical analyses, because we do not expect critical-thinking skills to relate to mid- and lower-order writing skills.
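To make the two treatments of BioTAP scores concrete, the sketch below derives the mastery/nonmastery flags and the partial sum of questions 1–5 from one set of consensus ratings. The function names and the example ratings are hypothetical. Factor scores are not shown, because they depend on the published factor loadings, which are not reproduced here.

```python
def mastery_flags(scores):
    """Categorical treatment: True where a dimension scored 5 (mastery)."""
    return [score == 5 for score in scores]


def partial_sum_q1_to_q5(scores):
    """Numerical treatment: sum of the ratings on questions 1-5."""
    if len(scores) != 9:
        raise ValueError("expected consensus ratings for questions 1-9")
    return sum(scores[:5])


# Hypothetical consensus ratings for questions 1-9 of a single thesis.
thesis_scores = [5, 4, 3, 5, 4, 3, 4, 5, 3]

flags = mastery_flags(thesis_scores)
partial_sum = partial_sum_q1_to_q5(thesis_scores)

print("mastery by question:", flags)
print("partial sum (Q1-Q5):", partial_sum)
```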

The final, submitted thesis reflects the student’s writing, the student’s scientific reasoning, the quality of feedback provided to the student by peers and mentors, and the student’s ability to incorporate that feedback into his or her work. Therefore, our assessment is not the same as an assessment of unpolished, unrevised samples of students’ written work. While one might imagine that such an unpolished sample may be more strongly correlated with critical-thinking skills measured by the CCTST, we argue that the complete, submitted thesis, assessed using BioTAP, is ultimately a more appropriate reflection of how students exhibit science reasoning in the scientific community.

Statistical Analyses

We took several steps to analyze the collected data. First, to provide context for subsequent interpretations, we generated descriptive statistics for the CCTST scores of the participants based on the norms for undergraduate CCTST test takers. To determine the strength of relationships among CCTST dimensions (including overall score) and the BioTAP dimensions, partial-sum score (questions 1–5), and factor score, we calculated Pearson’s correlations for each pair of measures. To examine whether falling on one side of the nonmastery/mastery threshold (as opposed to a linear scale of performance) was related to critical thinking, we grouped BioTAP dimensions into categories (mastery/nonmastery) and conducted Student’s t tests to compare the mean scores of the two groups on each of the seven dimensions and overall score of the CCTST. Finally, for the strongest relationship that emerged, we included additional academic and background variables as covariates in multiple linear-regression analysis to explore questions about how much observed relationships between critical-thinking skills and science reasoning in writing might be explained by variation in these other factors.
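The correlation step can be sketched as follows. This is a plain-Python illustration of Pearson's r and the standard conversion of r to a t statistic with n − 2 degrees of freedom. The paired scores are invented for illustration, and nothing here should be read as the authors' actual analysis code.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def r_to_t(r, n):
    """t statistic (df = n - 2) for testing whether r differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical paired measures for five students.
biotap_partial_sums = [17, 18, 19, 20, 21]
cctst_overall = [72, 71, 74, 73, 75]

r = pearson_r(biotap_partial_sums, cctst_overall)
t = r_to_t(r, len(biotap_partial_sums))
print(f"r = {r:.2f}, t = {t:.2f}")
```

The p value would then be read from a t distribution with n − 2 degrees of freedom, which a statistics package normally handles.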

Although BioTAP scores represent discrete, ordinal bins, the five-point scale is intended to capture an underlying continuous construct (from inadequate to exhibiting mastery). It has been argued that five categories is an appropriate cutoff for treating ordinal variables as pseudo-continuous ( Rhemtulla et al. , 2012 ), and therefore for using continuous-variable statistical methods (e.g., Pearson’s correlations), as long as the underlying assumption that ordinal scores are linearly distributed is valid. Although we have no way to statistically test this assumption, we interpret adequate scores to be approximately halfway between inadequate and mastery scores, resulting in a linear scale. In part because this assumption is subject to disagreement, we also consider and interpret a categorical (mastery/nonmastery) treatment of BioTAP variables.

We corrected for multiple comparisons using the Holm-Bonferroni method ( Holm, 1979 ). At the most general level, where we consider the single, comprehensive measures for BioTAP (partial-sum and factor score) and the CCTST (overall score), there is no need to correct for multiple comparisons, because the multiple, individual dimensions are collapsed into single dimensions. When we considered individual CCTST dimensions in relation to comprehensive measures for BioTAP, we accounted for seven comparisons; similarly, when we considered individual dimensions of BioTAP in relation to overall CCTST score, we accounted for five comparisons. When all seven CCTST and five BioTAP dimensions were examined individually and without prior knowledge, we accounted for 35 comparisons; such a rigorous threshold is likely to reject weak and moderate relationships, but it is appropriate if there are no specific pre-existing hypotheses. All p values are presented in tables for complete transparency, and we carefully consider the implications of our interpretation of these data in the Discussion section.
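A minimal sketch of the Holm-Bonferroni step-down procedure may help make the correction concrete. The two smallest p values in the example (0.0012 and 0.0018) are the ones reported in the Results for the 35 t-test comparisons; the remaining 33 values are hypothetical placeholders, since the full set is only available in the tables.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    P values are tested from smallest to largest against alpha / (m - rank),
    and testing stops at the first p value that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        threshold = alpha / (m - rank)
        if p_values[idx] <= threshold:
            rejected[idx] = True
        else:
            break  # step-down: every larger p value is also retained
    return rejected


# Two p values from the text plus 33 hypothetical placeholders (35 in total).
p_values = [0.0012, 0.0018] + [0.5] * 33
decisions = holm_bonferroni(p_values)

first_cutoff = 0.05 / 35   # threshold for the smallest p value
second_cutoff = 0.05 / 34  # threshold for the next p value
print(decisions[0], decisions[1], round(first_cutoff, 5), round(second_cutoff, 5))
```

Under these assumptions the first comparison is rejected and the second is not, which is consistent with the cutoffs of 0.00143 and 0.00147 quoted in the Results.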

CCTST scores for students in this sample ranged from the 39th to 99th percentile of the general population of undergraduate CCTST test takers (mean percentile = 84.3, median = 85th percentile; Table 2 ); these percentiles reflect overall scores that range from moderate to superior. Scores on individual dimensions and overall scores were sufficiently normal and far enough from the ceiling of the scale to justify subsequent statistical analyses.

Descriptive statistics of CCTST dimensions a

a Scores correspond to superior (86–100), strong (79–85), moderate (70–78), weak (63–69), or not manifested (62 and lower) skills.
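The score bands in the table note can be expressed as a small lookup. This is only a sketch: the function name is hypothetical, and treating scores as values on a 0–100 scale is an assumption based on the ranges listed in the note.

```python
def cctst_band(score):
    """Map a CCTST score to the qualitative band given in the table note."""
    if not 0 <= score <= 100:
        # Assumption: scores lie on a 0-100 scale, as the listed ranges imply.
        raise ValueError("score is assumed to lie between 0 and 100")
    if score >= 86:
        return "superior"
    if score >= 79:
        return "strong"
    if score >= 70:
        return "moderate"
    if score >= 63:
        return "weak"
    return "not manifested"


# Hypothetical scores, one per band.
bands = [cctst_band(s) for s in (92, 81, 74, 65, 50)]
print(bands)
```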

The Pearson’s correlations between students’ cumulative scores on BioTAP (the factor score based on loadings published by Dowd et al. , 2016 , and the partial sum of scores on questions 1–5) and students’ overall scores on the CCTST are presented in Table 3 . We found that the partial-sum measure of BioTAP was significantly related to the overall measure of critical thinking ( r = 0.27, p = 0.03), while the BioTAP factor score was marginally related to overall CCTST ( r = 0.24, p = 0.05). When we looked at relationships between comprehensive BioTAP measures and scores for individual dimensions of the CCTST ( Table 3 ), we found significant positive correlations between both the BioTAP partial-sum and factor scores and CCTST inference ( r = 0.45, p < 0.001, and r = 0.41, p < 0.001, respectively). Although some other relationships have p values below 0.05 (e.g., the correlations between BioTAP partial-sum scores and CCTST induction and interpretation scores), they are not significant when we correct for multiple comparisons.

Correlations between dimensions of CCTST and dimensions of BioTAP a

a In each cell, the top number is the correlation, and the bottom, italicized number is the associated p value. Correlations that are statistically significant after correcting for multiple comparisons are shown in bold.

b This is the partial sum of BioTAP scores on questions 1–5.

c This is the factor score calculated from factor loadings published by Dowd et al. (2016) .

When we expanded comparisons to include all 35 potential correlations among individual BioTAP and CCTST dimensions—and, accordingly, corrected for 35 comparisons—we did not find any additional statistically significant relationships. The Pearson’s correlations between students’ scores on each dimension of BioTAP and students’ scores on each dimension of the CCTST range from −0.11 to 0.35 ( Table 3 ); although the relationship between discussion of implications (BioTAP question 5) and inference appears to be relatively large ( r = 0.35), it is not significant ( p = 0.005; the Holm-Bonferroni cutoff is 0.00143). We found no statistically significant relationships between BioTAP questions 6–9 and CCTST dimensions (unpublished data), regardless of whether we correct for multiple comparisons.

The results of Student’s t tests comparing scores on each dimension of the CCTST of students who exhibit mastery with those of students who do not exhibit mastery on each dimension of BioTAP are presented in Table 4 . Focusing first on the overall CCTST scores, we found that the difference between those who exhibit mastery and those who do not in discussing implications of results (BioTAP question 5) is statistically significant ( t = 2.73, p = 0.008, d = 0.71). When we expanded t tests to include all 35 comparisons (and, as above, corrected for 35 comparisons), we found a significant difference in inference scores between students who exhibit mastery on question 5 and students who do not ( t = 3.41, p = 0.0012, d = 0.88), as well as a marginally significant difference in these students’ induction scores ( t = 3.26, p = 0.0018, d = 0.84; the Holm-Bonferroni cutoff is p = 0.00147). Cohen’s d effect sizes, which reveal the strength of the differences for statistically significant relationships, range from 0.71 to 0.88.

The t statistics and effect sizes of differences in dimensions of CCTST across dimensions of BioTAP a

a In each cell, the top number is the t statistic for each comparison, and the middle, italicized number is the associated p value. The bottom number is the effect size. Differences that are statistically significant after correcting for multiple comparisons are shown in bold.
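The t statistics and effect sizes in this kind of comparison can be sketched with the pooled-variance forms of Student's t and Cohen's d. Whether the authors used exactly this pooled-SD definition of d is an assumption, and the two groups of scores below are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def pooled_sd(a, b):
    """Pooled standard deviation of two independent groups."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * sample_variance(a) + (nb - 1) * sample_variance(b)
    ) / (na + nb - 2)
    return math.sqrt(pooled_var)


def students_t(a, b):
    """Equal-variance two-sample t statistic (df = na + nb - 2)."""
    return (mean(a) - mean(b)) / (pooled_sd(a, b) * math.sqrt(1 / len(a) + 1 / len(b)))


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


# Hypothetical CCTST inference scores, grouped by mastery on BioTAP question 5.
mastery = [82, 86, 90, 84, 88]
nonmastery = [78, 80, 84, 82, 76]

t_stat = students_t(mastery, nonmastery)
d = cohens_d(mastery, nonmastery)
print(f"t = {t_stat:.2f}, d = {d:.2f}")
```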

Finally, we more closely examined the strongest relationship that we observed, which was between the CCTST dimension of inference and the BioTAP partial-sum composite score (shown in Table 3 ), using multiple regression analysis ( Table 5 ). Focusing on the 52 students for whom we have background information, we looked at the simple relationship between BioTAP and inference (model 1), a robust background model including multiple covariates that one might expect to explain some part of the variation in BioTAP (model 2), and a combined model including all variables (model 3). As model 3 shows, the covariates explain very little variation in BioTAP scores, and the relationship between inference and BioTAP persists even in the presence of all of the covariates.

Partial sum (questions 1–5) of BioTAP scores ( n = 52)

** p < 0.01.

*** p < 0.001.
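The three-model comparison described above (predictor only, covariates only, combined) can be sketched with ordinary least squares. Everything in this example is hypothetical: the data, the single stand-in `covariate`, and the helper names. The study's actual covariates and estimates are those reported in Table 5.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols_r_squared(predictors, y):
    """Fit y ~ intercept + predictors; return (coefficients, R squared)."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Hypothetical data for six students.
inference = [70, 75, 80, 85, 90, 95]
covariate = [3.2, 3.9, 3.1, 3.6, 3.3, 3.8]  # stand-in background variable
partial_sum = [15, 17, 18, 21, 22, 24]      # BioTAP questions 1-5

_, r2_model1 = ols_r_squared([[x] for x in inference], partial_sum)
_, r2_model2 = ols_r_squared([[c] for c in covariate], partial_sum)
_, r2_model3 = ols_r_squared(list(zip(inference, covariate)), partial_sum)
print(f"R2: model 1 = {r2_model1:.3f}, model 2 = {r2_model2:.3f}, model 3 = {r2_model3:.3f}")
```

Comparing the R squared of model 3 with that of model 1 gives a rough sense of how much additional variation the covariates explain. A full analysis would also report coefficient standard errors and p values.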

The aim of this study was to examine the extent to which the various components of scientific reasoning—manifested in writing in the genre of undergraduate thesis and assessed using BioTAP—draw on general and specific critical-thinking skills (assessed using CCTST) and to consider the implications for educational practices. Although science reasoning involves critical-thinking skills, it also relates to conceptual knowledge and the epistemological foundations of science disciplines ( Kuhn et al. , 2008 ). Moreover, science reasoning in writing , captured in students’ undergraduate theses, reflects habits, conventions, and the incorporation of feedback that may alter evidence of individuals’ critical-thinking skills. Our findings, however, provide empirical evidence that cumulative measures of science reasoning in writing are nonetheless related to students’ overall critical-thinking skills ( Table 3 ). The particularly significant roles of inference skills ( Table 3 ) and the discussion of implications of results (BioTAP question 5; Table 4 ) provide a basis for more specific ideas about how these constructs relate to one another and what educational interventions may have the most success in fostering these skills.

Our results build on previous findings. The genre of thesis writing combines pedagogies of writing and inquiry found to foster scientific reasoning ( Reynolds et al. , 2012 ) and critical thinking ( Quitadamo and Kurtz, 2007 ; Quitadamo et al. , 2008 ; Stephenson and Sadler-McKnight, 2016 ). Quitadamo and Kurtz (2007) reported that students who engaged in a laboratory writing component in a general education biology course significantly improved their inference and analysis skills, and Quitadamo and colleagues (2008) found that participation in a community-based inquiry biology course (that included a writing component) was associated with significant gains in students’ inference and evaluation skills. The shared focus on inference is noteworthy, because these prior studies actually differ from the current study; the former considered critical-thinking skills as the primary learning outcome of writing-focused interventions, whereas the latter focused on emergent links between two learning outcomes (science reasoning in writing and critical thinking). In other words, inference skills are impacted by writing as well as manifested in writing.

Inference focuses on drawing conclusions from argument and evidence. According to the consensus definition of critical thinking, the specific skill of inference includes several processes: querying evidence, conjecturing alternatives, and drawing conclusions. All of these activities are central to the independent research at the core of writing an undergraduate thesis. Indeed, a critical part of what we call “science reasoning in writing” might be characterized as a measure of students’ ability to infer and make meaning of information and findings. Because the cumulative BioTAP measures distill underlying similarities and, to an extent, suppress unique aspects of individual dimensions, we argue that it is appropriate to relate inference to scientific reasoning in writing . Even when we control for other potentially relevant background characteristics, the relationship is strong ( Table 5 ).

In taking the complementary view and focusing on BioTAP, when we compared students who exhibit mastery with those who do not, we found that the specific dimension of “discussing the implications of results” (question 5) differentiates students’ performance on several critical-thinking skills. To achieve mastery on this dimension, students must make connections between their results and other published studies and discuss the future directions of the research; in short, they must demonstrate an understanding of the bigger picture. The specific relationship between question 5 and inference is the strongest observed among all individual comparisons. Altogether, perhaps more than any other BioTAP dimension, this aspect of students’ writing provides a clear view of the role of students’ critical-thinking skills (particularly inference and, marginally, induction) in science reasoning.

While inference and discussion of implications emerge as particularly strongly related dimensions in this work, we note that the strongest contribution to “science reasoning in writing in biology,” as determined through exploratory factor analysis, is “argument for the significance of research” (BioTAP question 2, not question 5; Dowd et al. , 2016 ). Question 2 is not clearly related to critical-thinking skills. These findings are not contradictory, but rather suggest that the epistemological and disciplinary-specific aspects of science reasoning that emerge in writing through BioTAP are not completely aligned with aspects related to critical thinking. In other words, science reasoning in writing is not simply a proxy for those critical-thinking skills that play a role in science reasoning.

In a similar vein, the content-related, epistemological aspects of science reasoning, as well as the conventions associated with writing the undergraduate thesis (including feedback from peers and revision), may explain the lack of significant relationships between some science reasoning dimensions and some critical-thinking skills that might otherwise seem counterintuitive (e.g., BioTAP question 2, which relates to making an argument, and the critical-thinking skill of argument). It is possible that an individual’s critical-thinking skills may explain some variation in a particular BioTAP dimension, but other aspects of science reasoning and practice exert much stronger influence. Although these relationships do not emerge in our analyses, the lack of significant correlation does not mean that there is definitively no correlation. Correcting for multiple comparisons suppresses type 1 error at the expense of exacerbating type 2 error, which, combined with the limited sample size, constrains statistical power and makes weak relationships more difficult to detect. Ultimately, though, the relationships that do emerge highlight places where individuals’ distinct critical-thinking skills emerge most coherently in thesis assessment, which is why we are particularly interested in unpacking those relationships.

We recognize that, because only honors students submit theses at these institutions, this study sample is composed of a selective subset of the larger population of biology majors. Although this is an inherent limitation of focusing on thesis writing, links between our findings and results of other studies (with different populations) suggest that observed relationships may occur more broadly. The goal of improved science reasoning and critical thinking is shared among all biology majors, particularly those engaged in capstone research experiences. So while the implications of this work most directly apply to honors thesis writers, we provisionally suggest that all students could benefit from further study of them.

There are several important implications of this study for science education practices. Students’ inference skills relate to the understanding and effective application of scientific content. The fact that we find no statistically significant relationships between BioTAP questions 6–9 and CCTST dimensions suggests that such mid- to lower-order elements of BioTAP ( Reynolds et al. , 2009 ), which tend to be more structural in nature, do not focus on aspects of the finished thesis that draw strongly on critical thinking. In keeping with prior analyses ( Reynolds and Thompson, 2011 ; Dowd et al. , 2016 ), these findings further reinforce the notion that disciplinary instructors, who are most capable of teaching and assessing scientific reasoning and perhaps least interested in the more mechanical aspects of writing, may nonetheless be best suited to effectively model and assess students’ writing.

The goal of the thesis writing course at both Duke University and the University of Minnesota is not merely to improve thesis scores but to move students’ writing into the category of mastery across BioTAP dimensions. Recognizing that students with differing critical-thinking skills (particularly inference) are more or less likely to achieve mastery in the undergraduate thesis (particularly in discussing implications [question 5]) is important for developing and testing targeted pedagogical interventions to improve learning outcomes for all students.

The competencies characterized by the Vision and Change in Undergraduate Biology Education Initiative provide a general framework for recognizing that science reasoning and critical-thinking skills play key roles in major learning outcomes of science education. Our findings highlight places where science reasoning–related competencies (like “understanding the process of science”) connect to critical-thinking skills and places where critical thinking–related competencies might be manifested in scientific products (such as the ability to discuss implications in scientific writing). We encourage broader efforts to build empirical connections between competencies and pedagogical practices to further improve science education.

One specific implication of this work for science education is to focus on providing opportunities for students to develop their critical-thinking skills (particularly inference). Of course, as this correlational study is not designed to test causality, we do not claim that enhancing students’ inference skills will improve science reasoning in writing. However, as prior work shows that science writing activities influence students’ inference skills ( Quitadamo and Kurtz, 2007 ; Quitadamo et al. , 2008 ), there is reason to test such a hypothesis. Nevertheless, the focus must extend beyond inference as an isolated skill; rather, it is important to relate inference to the foundations of the scientific method ( Miri et al. , 2007 ) in terms of the epistemological appreciation of the functions and coordination of evidence ( Kuhn and Dean, 2004 ; Zeineddin and Abd-El-Khalick, 2010 ; Ding et al. , 2016 ) and disciplinary paradigms of truth and justification ( Moshman, 2015 ).

Although this study is limited to the domain of biology at two institutions with a relatively small number of students, the findings represent a foundational step in the direction of achieving success with more integrated learning outcomes. Hopefully, it will spur greater interest in empirically grounding discussions of the constructs of scientific reasoning and critical-thinking skills.

This study contributes to the efforts to improve science education, for both majors and nonmajors, through an empirically driven analysis of the relationships between scientific reasoning reflected in the genre of thesis writing and critical-thinking skills. This work is rooted in the usefulness of BioTAP as a method 1) to facilitate communication and learning and 2) to assess disciplinary-specific and general dimensions of science reasoning. The findings support the important role of the critical-thinking skill of inference in scientific reasoning in writing, while also highlighting ways in which other aspects of science reasoning (epistemological considerations, writing conventions, etc.) are not significantly related to critical thinking. Future research into the impact of interventions focused on specific critical-thinking skills (i.e., inference) for improved science reasoning in writing will build on this work and its implications for science education.

We acknowledge the contributions of Kelaine Haas and Alexander Motten to the implementation and collection of data. We also thank Mine Çetinkaya-Rundel for her insights regarding our statistical analyses. This research was funded by National Science Foundation award DUE-1525602.

  • American Association for the Advancement of Science. (2011). Vision and change in undergraduate biology education: A call to action. Washington, DC: Retrieved September 26, 2017, from https://visionandchange.org/files/2013/11/aaas-VISchange-web1113.pdf
  • August D. (2016). California Critical Thinking Skills Test user manual and resource guide. San Jose: Insight Assessment/California Academic Press.
  • Beyer C. H., Taylor E., Gillmore G. M. (2013). Inside the undergraduate teaching experience: The University of Washington’s growth in faculty teaching study. Albany, NY: SUNY Press.
  • Bissell A. N., Lemons P. P. (2006). A new method for assessing critical thinking in the classroom. BioScience, (1), 66–72. https://doi.org/10.1641/0006-3568(2006)056[0066:ANMFAC]2.0.CO;2
  • Blattner N. H., Frazier C. L. (2002). Developing a performance-based assessment of students’ critical thinking skills. Assessing Writing, (1), 47–64.
  • Clase K. L., Gundlach E., Pelaez N. J. (2010). Calibrated peer review for computer-assisted learning of biological research competencies. Biochemistry and Molecular Biology Education, (5), 290–295.
  • Condon W., Kelly-Riley D. (2004). Assessing and teaching what we value: The relationship between college-level writing and critical thinking abilities. Assessing Writing, (1), 56–75. https://doi.org/10.1016/j.asw.2004.01.003
  • Ding L., Wei X., Liu X. (2016). Variations in university students’ scientific reasoning skills across majors, years, and types of institutions. Research in Science Education, (5), 613–632. https://doi.org/10.1007/s11165-015-9473-y
  • Dowd J. E., Connolly M. P., Thompson R. J., Jr., Reynolds J. A. (2015a). Improved reasoning in undergraduate writing through structured workshops. Journal of Economic Education, (1), 14–27. https://doi.org/10.1080/00220485.2014.978924
  • Dowd J. E., Roy C. P., Thompson R. J., Jr., Reynolds J. A. (2015b). “On course” for supporting expanded participation and improving scientific reasoning in undergraduate thesis writing. Journal of Chemical Education, (1), 39–45. https://doi.org/10.1021/ed500298r
  • Dowd J. E., Thompson R. J., Jr., Reynolds J. A. (2016). Quantitative genre analysis of undergraduate theses: Uncovering different ways of writing and thinking in science disciplines. WAC Journal, , 36–51.
  • Facione P. A. (1990). Critical thinking: a statement of expert consensus for purposes of educational assessment and instruction. Research findings and recommendations. Newark, DE: American Philosophical Association; Retrieved September 26, 2017, from https://philpapers.org/archive/FACCTA.pdf
  • Gerdeman R. D., Russell A. A., Worden K. J. (2007). Web-based student writing and reviewing in a large biology lecture course. Journal of College Science Teaching, (5), 46–52.
  • Greenhoot A. F., Semb G., Colombo J., Schreiber T. (2004). Prior beliefs and methodological concepts in scientific reasoning. Applied Cognitive Psychology, (2), 203–221. https://doi.org/10.1002/acp.959
  • Haaga D. A. F. (1993). Peer review of term papers in graduate psychology courses. Teaching of Psychology, (1), 28–32. https://doi.org/10.1207/s15328023top2001_5
  • Halonen J. S., Bosack T., Clay S., McCarthy M., Dunn D. S., Hill G. W., Whitlock K. (2003). A rubric for learning, teaching, and assessing scientific inquiry in psychology. Teaching of Psychology, (3), 196–208. https://doi.org/10.1207/S15328023TOP3003_01
  • Hand B., Keys C. W. (1999). Inquiry investigation. Science Teacher, (4), 27–29.
  • Holm S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, (2), 65–70.
  • Holyoak K. J., Morrison R. G. (2005). The Cambridge handbook of thinking and reasoning. New York: Cambridge University Press.
  • Insight Assessment. (2016a). California Critical Thinking Skills Test (CCTST). Retrieved September 26, 2017, from www.insightassessment.com/Products/Products-Summary/Critical-Thinking-Skills-Tests/California-Critical-Thinking-Skills-Test-CCTST
  • Insight Assessment. (2016b). Sample thinking skills questions. Retrieved September 26, 2017, from www.insightassessment.com/Resources/Teaching-Training-and-Learning-Tools/node_1487
  • Kelly G. J., Takao A. (2002). Epistemic levels in argument: An analysis of university oceanography students’ use of evidence in writing. Science Education, (3), 314–342. https://doi.org/10.1002/sce.10024
  • Kuhn D., Dean D., Jr. (2004). Connecting scientific reasoning and causal inference. Journal of Cognition and Development, (2), 261–288. https://doi.org/10.1207/s15327647jcd0502_5
  • Kuhn D., Iordanou K., Pease M., Wirkala C. (2008). Beyond control of variables: What needs to develop to achieve skilled scientific thinking? Cognitive Development, (4), 435–451. https://doi.org/10.1016/j.cogdev.2008.09.006
  • Lawson A. E. (2010). Basic inferences of scientific reasoning, argumentation, and discovery. Science Education, (2), 336–364. https://doi.org/10.1002/sce.20357
  • Meizlish D., LaVaque-Manty D., Silver N., Kaplan M. (2013). Think like/write like: Metacognitive strategies to foster students’ development as disciplinary thinkers and writers. In Thompson R. J. (Ed.), Changing the conversation about higher education (pp. 53–73). Lanham, MD: Rowman & Littlefield.
  • Miri B., David B.-C., Uri Z. (2007). Purposely teaching for the promotion of higher-order thinking skills: A case of critical thinking. Research in Science Education, (4), 353–369. https://doi.org/10.1007/s11165-006-9029-2
  • Moshman D. (2015). Epistemic cognition and development: The psychology of justification and truth. New York: Psychology Press.
  • National Research Council. (2000). How people learn: Brain, mind, experience, and school. Expanded ed. Washington, DC: National Academies Press.
  • Pukkila P. J. (2004). Introducing student inquiry in large introductory genetics classes. Genetics, (1), 11–18. https://doi.org/10.1534/genetics.166.1.11
  • Quitadamo I. J., Faiola C. L., Johnson J. E., Kurtz M. J. (2008). Community-based inquiry improves critical thinking in general education biology. CBE—Life Sciences Education, (3), 327–337. https://doi.org/10.1187/cbe.07-11-0097
  • Quitadamo I. J., Kurtz M. J. (2007). Learning to improve: Using writing to increase critical thinking performance in general education biology. CBE—Life Sciences Education, (2), 140–154. https://doi.org/10.1187/cbe.06-11-0203
  • Reynolds J. A., Smith R., Moskovitz C., Sayle A. (2009). BioTAP: A systematic approach to teaching scientific writing and evaluating undergraduate theses. BioScience, (10), 896–903. https://doi.org/10.1525/bio.2009.59.10.11
  • Reynolds J. A., Thaiss C., Katkin W., Thompson R. J. (2012). Writing-to-learn in undergraduate science education: A community-based, conceptually driven approach. CBE—Life Sciences Education, (1), 17–25. https://doi.org/10.1187/cbe.11-08-0064
  • Reynolds J. A., Thompson R. J. (2011). Want to improve undergraduate thesis writing? Engage students and their faculty readers in scientific peer review. CBE—Life Sciences Education, (2), 209–215. https://doi.org/10.1187/cbe.10-10-0127
  • Rhemtulla M., Brosseau-Liard P. E., Savalei V. (2012). When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychological Methods, (3), 354–373. https://doi.org/10.1037/a0029315
  • Stephenson N. S., Sadler-McKnight N. P. (2016). Developing critical thinking skills using the science writing heuristic in the chemistry laboratory. Chemistry Education Research and Practice, (1), 72–79. https://doi.org/10.1039/C5RP00102A
  • Tariq V. N., Stefani L. A. J., Butcher A. C., Heylings D. J. A. (1998). Developing a new approach to the assessment of project work. Assessment and Evaluation in Higher Education, (3), 221–240. https://doi.org/10.1080/0260293980230301
  • Timmerman B. E. C., Strickland D. C., Johnson R. L., Payne J. R. (2011). Development of a “universal” rubric for assessing undergraduates’ scientific reasoning skills using scientific writing. Assessment and Evaluation in Higher Education, (5), 509–547. https://doi.org/10.1080/02602930903540991
  • Topping K. J., Smith E. F., Swanson I., Elliot A. (2000). Formative peer assessment of academic writing between postgraduate students. Assessment and Evaluation in Higher Education, (2), 149–169. https://doi.org/10.1080/713611428
  • Willison J., O’Regan K. (2007). Commonly known, commonly not known, totally unknown: A framework for students becoming researchers. Higher Education Research and Development, (4), 393–409. https://doi.org/10.1080/07294360701658609
  • Woodin T., Carter V. C., Fletcher L. (2010). Vision and Change in Biology Undergraduate Education: A Call for Action—Initial responses. CBE—Life Sciences Education, (2), 71–73. https://doi.org/10.1187/cbe.10-03-0044
  • Zeineddin A., Abd-El-Khalick F. (2010). Scientific reasoning and epistemological commitments: Coordination of theory and evidence among college science students. Journal of Research in Science Teaching, (9), 1064–1093. https://doi.org/10.1002/tea.20368
  • Zimmerman C. (2000). The development of scientific reasoning skills. Developmental Review, (1), 99–149. https://doi.org/10.1006/drev.1999.0497
  • Zimmerman C. (2007). The development of scientific thinking skills in elementary and middle school. Developmental Review, (2), 172–223. https://doi.org/10.1016/j.dr.2006.12.001


  1. Empirical research

    The result of empirical research using statistical hypothesis testing is never proof. It can only support a hypothesis, reject it, or do neither. These methods yield only probabilities. Among scientific researchers, empirical evidence (as distinct from empirical research) refers to objective evidence that appears the same regardless of the ...

  2. Empirical evidence: A definition

    Empirical evidence is information acquired by observation or experimentation. Scientists record and analyze this data. The process is a central part of the scientific method, leading to the ...

  3. Scientific Method

    Advances in logic and probability held out promise of the possibility of elaborate reconstructions of scientific theories and empirical method, the best example being Rudolf Carnap's The Logical Structure of the World (1928). Carnap attempted to show that a scientific theory could be reconstructed as a formal axiomatic system—that is, a logic.

  4. Scientific method

    The scientific method is critical to the development of scientific theories, which explain empirical (experiential) laws in a scientifically rational manner. In a typical application of the scientific method, a researcher develops a hypothesis, tests it through various means, and then modifies the hypothesis on the basis of the outcome of the ...

  5. Empirical Research: Defining, Identifying, & Finding

    Empirical research methodologies can be described as quantitative, qualitative, or a mix of both (usually called mixed-methods). Ruane (2016) gets at the basic differences in approach between quantitative and qualitative research: Quantitative research -- an approach to documenting reality that relies heavily on numbers both for the measurement of variables and for data ...

  6. Understanding the Empirical Method in Research Methodology

    The empirical method, central to scientific inquiry, relies on data collection and observation over theoretical speculation. It contrasts with experimental methods by focusing on natural data aggregation rather than controlled experiments, highlighting its roots in experiential learning and its significance in developing theories or conclusions.

  7. Research Method vs. Scientific Method

    However, research method is a broader term that encompasses various techniques and strategies used to gather and analyze data, while scientific method specifically refers to the process of formulating hypotheses, conducting experiments, and drawing conclusions based on empirical evidence. Both methods involve a systematic and logical approach ...

  8. Empirical Research

    This video covers what empirical research is, what kinds of questions and methods empirical researchers use, and some tips for finding empirical research articles in your discipline. ... Solving Everyday Problems with the Scientific Method: Thinking Like a Scientist. ISBN: 1282441132. Publication Date: 2009-01-01

  9. Research Methods vs. Scientific Methods

    Research methods and scientific methods are integral components of the scientific process, each with its own attributes and strengths. While research methods offer flexibility and in-depth exploration of complex phenomena, scientific methods prioritize objectivity, empirical evidence, and reproducibility.

  10. Empirical Research

    Hence, empirical research is a method of uncovering empirical evidence. Through the process of gathering valid empirical data, scientists from a variety of fields, ranging from the social to the natural sciences, have to carefully design their methods. This helps to ensure quality and accuracy of data collection and treatment.

  11. Empirical Research: Definition, Methods, Types and Examples

    Empirical research is defined as any research where the conclusions of the study are strictly drawn from concrete empirical evidence, and therefore "verifiable" evidence. This empirical evidence can be gathered using quantitative market research and qualitative market research methods. For example: A study is being conducted to find out if ...

  12. Data, measurement and empirical methods in the science of science

    Abstract. The advent of large-scale datasets that trace the workings of science has encouraged researchers from many different disciplinary backgrounds to turn scientific methods into science ...

  13. What is Empirical Research? Definition, Methods, Examples

    Empirical research is characterized by several key features: Observation and Measurement: It involves the systematic observation or measurement of variables, events, or behaviors. Data Collection: Researchers collect data through various methods, such as surveys, experiments, observations, or interviews.

  14. Empirical evidence

    Empirical evidence, information gathered directly or indirectly through observation or experimentation that may be used to confirm or disconfirm a scientific theory or to help justify, or establish as reasonable, a person's belief in a given proposition. A belief may be said to be justified if there is sufficient ...

  15. Perspective: Dimensions of the scientific method

    The scientific method has been guiding biological research for a long time. It not only prescribes the order and types of activities that give a scientific study validity and a stamp of approval but also has substantially shaped how we collectively think about the endeavor of investigating nature. The advent of high-throughput data generation ...

  16. What Is Empirical Research? Definition, Types & Samples in 2024

    Empirical research is defined as any study whose conclusions are exclusively derived from concrete, verifiable evidence. The term empirical basically means that it is guided by scientific experimentation and/or evidence. Likewise, a study is empirical when it uses real-world evidence in investigating its assertions.

  17. 1.1 Methods of Knowing

    The scientific method is a process of systematically collecting and evaluating evidence to test ideas and answer questions. ... the scientific method can only be used to address empirical questions. This book and your research methods course are designed to provide you with an in-depth examination of how psychologists use the scientific method ...

  18. Scientific method

    The scientific method is an empirical method for acquiring knowledge that has characterized the development of science since at least the 17th century. The scientific method involves careful observation coupled with rigorous scepticism, because cognitive assumptions can distort the interpretation of the observation. Scientific inquiry includes creating a hypothesis through inductive reasoning ...

  19. Conceptual Vs. Empirical Research: Which Is Better?

    The modern scientific method is really a combination of empirical and conceptual research. Using known experimental data, a scientist formulates a working hypothesis to explain some aspect of nature. He then performs new experiments designed to test predictions of the theory, to support it or disprove it. Einstein is often cited as an example of ...

  20. Experimental vs Empirical: Differences And Uses For Each One

    Experimental research is typically used to establish causality between variables. It involves manipulating one or more variables to see how they affect the outcome of interest. Empirical research, on the other hand, involves collecting data through observation, surveys, or other methods, without manipulating any ...

  21. Conceptual Research vs. Empirical Research

    Conceptual research focuses on the development of theories and concepts, providing a theoretical foundation for empirical investigations. Empirical research, on the other hand, relies on the collection and analysis of observable data to test and validate theories. Conceptual research is often exploratory and aims to expand the boundaries of ...

  22. Understanding the Complex Relationship between Critical Thinking and

    Empirical research demonstrates that specific pedagogical practices in science courses are effective in fostering students' critical-thinking skills. ... 2004) and developing explicit examples of how critical thinking relates to the scientific method (Miri et al., 2007).

  23. (PDF) Scientific Research Methodology Vs. Social Science Research

    Abstract. The multiple research methodologies used in scientific and social science research such as conceptual research, empirical research, model research methodology ...