This version: November 2, 2015.
Gov. 50/Govt E 1005 Introduction to Political Science Research Methods Spring Semester 2016 T/Th, 11-12, Location: TBA
Prof. Muhammet Bas CGIS K-209 [email protected]
617-495-4765 Office Hours: TBA
Teaching Fellows: TBA
Overview New research is the most exciting and important aspect of political science: we are able to pose novel questions, construct fresh theories, and provide new evidence about the way the world works. But before we start doing research, we have to learn how it is done. With this in mind, this class will introduce students to techniques used for research in the study of politics. Part of this task is conceptual: helping students to think sensibly and systematically about research design. To this end, students will learn how data and theory fit together, and how to measure the quantities we care about. But part of the task is practical too: students will learn a ‘toolbox’ of methods—including statistical software—that enable them to execute their plans. As a purely pragmatic matter, this class is highly recommended for students who plan to write senior theses: the statistical and computing material learned will be very helpful for those undertaking such projects.
Govt E 1005 Extension School This course is part of the Extension School’s Distance Education Program. The lectures, which are given at Harvard each week, will be recorded and made available to all Extension School registered students via the Internet. Please see the distance education website for details on the program: http://www.extension.harvard.edu/distanceed For students of the Extension School, there will be a (recorded) section, at
• TBA Students of the Extension School (but not students of the College) may attend this section in person if they wish. The information below concerning homeworks, exams, lateness penalties, academic honesty and the syllabus applies to Extension School students.
Preliminaries and Requirements for students of the College This course provides twice-weekly lectures, and a weekly recitation. Students are very strongly advised to attend all three. This is a lecture-based and section-based class: the information and skills that you need to complete your homework assignments and exams will be provided by the Professor or the TFs. Nonetheless, you should read the materials we assign: they will help you garner a deeper and more complete understanding of the class.
Sections: for students of the College, your TFs will hold sections in TBA (possibly moving to a computer lab, if or when required). As of Spring 2016, the possible section times are TBA.
We will have one midterm and one final exam. Your final grade will be based on a combination of:
• Homeworks (30%)
• Midterm exam (30%): The midterm exam is scheduled on TBA and will be in class.
• Final Exam (40%): The final exam will cover all the material discussed in class throughout the semester. The exam is scheduled to take place on TBA (Exam group TBA).
Once confirmed, the midterm and final exam times are firm. Athletes and other representatives who have scheduling conflicts will have their coaches administer the exams (and this is for the athletes to arrange). If you miss an exam due to some unavoidable documented illness or circumstance, we will administer one make-up. No documentation, no grade.
In general, the homeworks will be due one week after they are assigned. They must be turned in for grading on time. Students of the College must turn in their work on paper (no emails!). Extension School students may submit their homeworks electronically online via the arrangement we set up. Every day late results in a grade level drop: an A− becomes a B+, a B+ becomes a B and so on.
Academic Honesty and Plagiarism: the University is very clear that students’ work is expected to be their own and that plagiarism is not tolerated.
The same rules apply here: No assignment on which you receive a grade is to be collaborative. You may consult with others, but any work you hand in must be your own. Do not copy another individual’s work, answers or ideas. Do not allow another individual to copy your work, answers or ideas.
Disciplinary action follows for those who choose to disobey these instructions.
Syllabus and Plan: in general, we hope that this syllabus is an accurate plan of the classes and material that follow. Occasionally, changes may need to be made: your responsibility as a student is to keep yourself informed of all such changes and to be aware of exam and homework dates. Ignorance will not be an acceptable excuse.
Reading and Textbooks The following textbooks are either required or recommended. In terms of the recommended texts, there is no need to buy them before speaking to a TF, who can advise you on a purchase if you are struggling with certain parts of the material. We have ordered them for the Coop, and you should also be able to purchase or rent them via online sellers such as amazon.com. They will be available via the course reserves system. Note that we won’t always go in exactly the same order that the textbooks organize the material, and we will include some information that none of the books cover in detail.
Required Pollock, Philip H. 2015. The Essentials of Political Analysis, CQ Press, Washington DC. Fifth Edition. ISBN-13: 978-1506305837. This is the main textbook for the class, and covers most of the topics in a clear and concise manner. We refer to this text as ‘Pollock’ below.
Pollock, Philip H. 2015. A STATA Companion to Political Analysis, CQ Press, Washington DC. Third Edition. ISBN-13: 978-1452240428. This is the companion guide for the main textbook, and comes with a STATA data CD. You will need that for examples and exercises. We refer to this text as ‘STATA companion’ below.
Recommended Shively, W. Phillips. 2012. The Craft of Political Research, Pearson-Prentice Hall, Upper Saddle River NJ. Ninth Edition. ISBN-13: 978-0205854622. This book complements the main textbook and is particularly helpful for thinking about ‘theory’ and ‘experiments’ in political science research. We refer to this text as ‘Shively’ below. Paul S. Gray and John B. Williamson and David A. Karp and John R. Dalphin. 2007. The Research Imagination: An Introduction to Qualitative and Quantitative Methods. Cambridge University Press, New York. ISBN-13: 9780521705554. This book covers qualitative techniques well, and has chapters devoted to the comparative method. This may be particularly helpful towards the end of the class, and will be useful for students planning to learn more about methods in future. We refer to this text as ‘Imagination’ below. Neil J. Salkind. 2013. Statistics for People Who (Think They) Hate Statistics. Sage, Los Angeles. Fifth Edition. ISBN-13: 978-1452277714.
Worth a look for those struggling with the quantitative aspects of the class, this text offers a more humorous and ‘chatty’ approach than Pollock. It doesn’t cover research planning and design in much detail though. We refer to this text as ‘Salkind’ below. Gary King and Robert Keohane and Sidney Verba. 1994. Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton University Press, New Jersey. ISBN-13: 9780691034713. A helpful text for those considering case-study approaches, their strengths and limitations. We refer to this text as ‘KKV’ below.
Software We will be using STATA, a statistical package. The Science Center lab has the software (as do other labs), and you can download a version for on-campus or off-campus use. On the FAS Software Downloads page, it is listed as Stata MP (current version is 14), which can be obtained here: http://downloads.fas.harvard.edu/download Speak to your TF for more details.
COURSE SCHEDULE
Data and Theories
Aims Upon completing this topic you will understand. . . • what differentiates data types, and why it matters • what makes for a useful theory—that can actually be tested
Notes Data enables political scientists to test their theories of the world: without it, we cannot make progress as a discipline. It comes in many different forms: from historical cases to numbers on a spreadsheet; from surveys of countries, to experiments in a laboratory; from minutely detailed changes in individual participation behavior to huge events like revolutions and wars. Yet even before we gather our data, there are important prerequisites for our theories such that they are helpful to our understanding.
Reading Pollock, Ch 2.
Further Reading Shively, Ch 1 and 2. KKV, Ch 1. Imagination, Ch 1 and Ch 2. Salkind, Ch 6.
Key words and concepts • types of analysis: quantitative, qualitative • types of data: categorical, nominal, ordinal, interval/continuous. • characteristics of theories: parsimony, falsifiability, testability.
Measurement and Concepts
Aims Upon completing this topic you will understand. . . • the steps required to translate our theoretical ideas to ones that can be tested with real data • the problems inherent in this move, and how to guard against them
Notes Before we begin our analysis, we need to be clear about what we are studying: voters? women? wars? countries? revolutions? A separate question concerns the ‘level’ at which we obtain data about our subjects of interest. Our theories tell us that certain characteristics matter for behavior: perhaps ‘maleness’ is important for vote choice, or societal ‘openness’ is important for racial harmony in a country. To proceed scientifically, we must find ways to measure these aspects of our subjects so that they can be compared. This can be difficult in practice—which can make our analysis prone to error.
Reading Pollock, Ch 1 and 2.
Further Reading Shively, Ch 4 and 5. Imagination, Ch 4. Salkind, Ch 6.
Key Words and Concepts • subjects and data: the unit of analysis vs the level • testing our theories: theory to concept to measure • characteristics of subjects: variables and operationalization • variable ‘quality’: reliability, validity and measurement error
Sampling and Surveys
Aims Upon completing this topic you will understand. . . • how and why we use samples to tell us about populations • various sampling methods • problems with survey methodologies that can throw our inferences awry
Notes We often need to ask individuals about their political behavior: sometimes because decisions are secret or confidential (like voting), other times because decisions have not been documented (like who went to an anti-war protest). Unsurprisingly, this does not always produce satisfactory results; people forget, lie or are influenced by the person asking the question. Even if we accept the need to solicit the answers of individuals directly, we cannot usually gather data from every subject who might possibly be of interest: filling in a spreadsheet for every one of the three hundred million Americans would, for example, be both time consuming and expensive! Hence, we obtain information from a smaller, more manageable number who are representative of the whole. Working with only this data, we still attempt to conclude things about the behavior of everyone else in society. There are problems with this approach though, and we need to be careful about both selecting those we study and interpreting their responses.
Reading Pollock, Ch 6.
Further Reading Shively, Ch 7. Imagination, Ch 6 and 7.
Key Words and Concepts • inference: from sample to population, from statistic to parameter • using representative subjects: (simple) random samples • sampling problems: self-selection and response bias • survey problems: observer and ‘Hawthorne’ effects, ‘der Kluge Hans’ effects, ‘Shy Tories’, ‘Bradley’ effects
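As a minimal sketch of inference from sample to population, the Python snippet below draws a simple random sample and compares the sample statistic with the population parameter. Python is used here only for illustration (the course itself uses STATA), and the population, the 60% turnout figure, the sample size and the seed are all made-up assumptions.

```python
import random
import statistics

# Hypothetical population: 10,000 people, 1 = voted, 0 = did not vote.
# The true turnout (the population parameter) is set to 60% by construction.
population = [1] * 6000 + [0] * 4000

# Simple random sample: every person has the same chance of being selected.
rng = random.Random(2016)  # fixed seed so the sketch is reproducible
sample = rng.sample(population, 1000)

parameter = statistics.mean(population)  # what we want to know
statistic = statistics.mean(sample)      # what we actually observe

print(f"population turnout (parameter): {parameter:.3f}")
print(f"sample turnout (statistic):     {statistic:.3f}")
```

The statistic will rarely equal the parameter exactly; the gap between the two is sampling error, which is separate from the self-selection and response problems listed above.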
Causality and Experiments
Aims Upon completing this topic you will understand. . . • why we care about causation, and how it differs from association • why it is so difficult to ‘prove’ causal statements are ‘true’: at a theoretical and empirical level • the importance of ‘control’ groups within ‘experiments’ in social science
Notes It is easy to find variables that are related empirically: shoe size and height, ice cream sales and drowning, computer processing speed and the incidence of allergies among children. It is quite another matter to assert causation. To do so, we need to show that some ‘treatment’ (like an education program for high-schoolers, or the election of a particular President) produced an outcome (like a reduction in teenage pregnancies, or a war) that would not have occurred—or not to the same degree—otherwise. The latter point is very important and under-appreciated: we absolutely must have a comparable set of subjects—voters, women, citizens, countries, districts, schools—who did not receive the treatment (the policy change, the electoral system, the drug, the Federal funds). We also need to rule out coincidences and other factors that are causing both variables to change. Laboratory-style experiments might be the preferred way to proceed, though these can be tricky in social science situations: luckily, there are other things we can do.
Reading Pollock, Ch 4 and 5.
Further Reading Shively, Ch 6. Imagination, Ch 12. KKV, Ch 4, 5 and 6.
Key Words and Concepts • the fundamental problem: theory of causation cannot be observed • causation versus (spurious) associations and coincidences • understanding treatment: holding variables constant and the ‘control’ group • assessing policies: regression to the mean • approaches to causality: true experiments, ‘natural’ experiments, with and without premeasurement.
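The role of the ‘control’ group can be sketched with a small simulated experiment. This is an illustration in Python, not course material: the subjects, the baseline outcomes, the true effect of 5 points and the seed are all hypothetical.

```python
import random
import statistics

rng = random.Random(50)  # fixed seed so the sketch is reproducible
TRUE_EFFECT = 5.0        # hypothetical effect of the treatment on the outcome

# Hypothetical subjects: the outcome each would show WITHOUT the treatment.
baselines = [rng.gauss(50, 10) for _ in range(2000)]

treated, control = [], []
for baseline in baselines:
    # Random assignment makes the two groups comparable on average.
    if rng.random() < 0.5:
        treated.append(baseline + TRUE_EFFECT)
    else:
        control.append(baseline)

# The control group stands in for what the treated group would have looked like.
estimate = statistics.mean(treated) - statistics.mean(control)
print(f"estimated effect: {estimate:.2f} (true effect: {TRUE_EFFECT})")
```

Because assignment is random, the difference in group means lands close to the true effect. If subjects chose their own group, the same subtraction could pick up pre-existing differences instead.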
Large-n Methods I: Descriptive Statistics
Upon completing this topic you will understand. . . • some straightforward ways to describe data: its average value, and the way that it is spread • that it is easy to mislead our audience—with even simple statistics! • how we think about the ‘distribution’ of the data, and the importance of the ‘normal’ distribution within that framework
Notes Once we have our data, we need to describe it for both ourselves and others. Statisticians have standard ways to do this, and they concern the ‘average’ value of the data and the way it is ‘spread’. Depending on the nature of our data, we cannot meaningfully use certain measures—and even if we can, we have to be careful not to mislead our readers with our presentation choices! Statisticians also think about the way data ‘stacks up’—that is, the frequency of any particular value of a variable (like the curve that describes the number of people who are various heights, or the number of districts which have various percentages of Republican voters). One of the most important of these ‘distributions’ is the ‘normal.’
Reading Pollock, Ch 2 and 6. STATA companion, Ch 1 and 2.
Further Reading Salkind, Ch 2, 3 and 4.
Key Words and Concepts • central tendency: mean, median, mode • spread: variance and standard deviation • distributions: normal, Gaussian, bell-curve
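A short Python sketch of these summaries, using only the standard library (the course itself uses STATA). The nine district vote shares are invented for illustration, with one deliberately extreme district.

```python
import statistics

# Hypothetical data: Republican vote share (%) in nine districts.
vote_share = [38, 42, 45, 45, 47, 51, 55, 58, 96]

mean = statistics.mean(vote_share)          # arithmetic average
median = statistics.median(vote_share)      # middle value once sorted
mode = statistics.mode(vote_share)          # most frequent value
variance = statistics.variance(vote_share)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(vote_share)      # square root of the sample variance

print(f"mean={mean} median={median} mode={mode}")
print(f"variance={variance} standard deviation={std_dev:.2f}")
```

The single extreme district pulls the mean above the median, so reporting only the mean would overstate the ‘typical’ district. This is one simple way that an honest statistic can still mislead.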
Large-n Methods II: Assessing Hypotheses
Upon completing this topic you will understand. . . • how our theory leads us to scientific statements that will be tested with our data • that we can never ‘prove’ our theory is true. . . but we can provide evidence that supports (or refutes!) it • how to use special distributions to weigh the evidence for the posited relationship versus the evidence for no relationship at all
Notes Once we have our theory and our data, we want to use the latter to ‘test’ the former. In statistics, this operation takes a very particular form that seems odd at first: we assess the evidence for our theory by thinking about the counterfactual of what we would see if we were wrong. That is, we look at the possibility that there is no relationship between the variables of interest. It turns out that even though we can use well accepted and accurate techniques to do this, they still involve a judgement call from us (and our readers). We use special distributions to help us interpret our evidence here.
Reading Pollock, Ch 3. STATA companion, Ch 6 and 7.
Further Reading Salkind, Ch 7, 8 and 9. Shively, Ch 10.
Key Words and Concepts • hypotheses: the null and alternative • assessing the evidence for hypotheses: p-values and significance tests • cross tabulations and χ² (“chi-square”, pronounced “ki square”) tests
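As a rough sketch of how a cross tabulation is tested against the null hypothesis of no relationship, the Python snippet below computes a χ² statistic by hand. The 2x2 table is invented, no continuity correction is applied, and the p-value shortcut holds only for one degree of freedom (that is, a 2x2 table).

```python
import math

# Hypothetical cross tabulation: rows = gender, columns = vote choice.
#            Dem  Rep
observed = [[60, 40],   # women
            [45, 55]]   # men

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Count we would expect in this cell if the null hypothesis were true.
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# With 1 degree of freedom, the chi-square tail probability can be
# written using the complementary error function.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.3f}, p-value = {p_value:.4f}")
```

Whether the resulting p-value counts as ‘enough’ evidence against the null depends on the significance level chosen in advance, which is the judgement call mentioned in the notes.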
Large-n Methods III: t-tests for Comparison of Means
Upon completing this topic you will understand. . . • how to compare an observed sample mean to a hypothesized value • how to compare two means arising from paired data • how to compare two means from unpaired data
Notes Once we know how to assess hypotheses in the abstract we will want to put that knowledge to use. When dealing with data, we may be interested in drawing inferences from a single sample, or we may want to gather information about two different populations in order to compare them. Examples could be comparing the average income levels among men and women, or average growth rates among democracies versus autocracies. This means we need to think about how to calculate test statistics from the data and how to evaluate our results. We will also find out what assumptions we need to make for our inferences to be valid.
Reading Pollock, Ch 6/7 (pages 138-154). STATA companion, Ch 4 and 5.
Further Reading Salkind, Ch 10 and Ch 11.
Key Words and Concepts • t-statistic • one-sided vs. two-sided tests
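The test statistics for two of these comparisons can be sketched in a few lines of Python. The growth rates are invented, the function names are hypothetical, and the two-sample version shown is the unequal-variance (Welch) form, which may differ from the variant used in lecture.

```python
import math
import statistics

# Hypothetical annual growth rates (%) for two groups of countries.
democracies = [2.1, 3.4, 1.8, 2.9, 3.0, 2.4]
autocracies = [1.2, 4.5, 0.3, 2.0, 1.0, 1.8]

def one_sample_t(sample, hypothesized_mean):
    """t-statistic comparing a sample mean to a hypothesized value."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - hypothesized_mean) / standard_error

def welch_t(a, b):
    """t-statistic for two unpaired samples, not assuming equal variances."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

t_one = one_sample_t(democracies, 2.0)
t_two = welch_t(democracies, autocracies)

print(f"one-sample t = {t_one:.3f}, two-sample t = {t_two:.3f}")
```

Each statistic is then compared with a t distribution to obtain a p-value. A one-sided test looks at one tail of that distribution, a two-sided test at both.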
Large-n Methods IV: Linear Regression
Upon completing this topic you will understand. . . • how to assess the relationship between two variables • the difference between correlation and regression, and the assumptions behind each • how to specify a regression model • how to interpret a regression coefficient • how to make predictions using results from a regression
Notes Comparing the means of two populations was fun. What if we want to do more with our data? We will look at alternative ways to assess the association between two variables in paired data. We will start with correlation, which examines how closely large (or small) values of one variable are related to large (or small) values of another. When we want to assess causal claims, we need to think about the relationship between our dependent variable (the thing being caused) and independent variables (the things doing the causing). In the simplest form, we can use ‘regression’ techniques to examine the expected change in our dependent variable in response to an increase or decrease in our independent variable.
Reading Pollock, Ch 8. STATA companion, Ch 8.
Further Reading Shively, Ch 8. Imagination, Ch 18 and Ch 19. Salkind, Ch 14 and Ch 15.
Key Words and Concepts • correlation coefficient, regression coefficient, error term (residual) • assumptions of the linear regression (least squares estimation) • testing the significance of regression coefficients • prediction using regression results
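A minimal Python sketch of least squares with a single independent variable, written out by hand so each quantity is visible. The paired data (years of education and a knowledge score) and the `predict` helper are hypothetical.

```python
import statistics

# Hypothetical paired data: years of education (x), political knowledge score (y).
x = [8, 10, 12, 14, 16]
y = [3, 4, 6, 7, 10]

x_bar, y_bar = statistics.mean(x), statistics.mean(y)

# Sums of squares and cross-products around the means.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

slope = sxy / sxx                        # regression coefficient
intercept = y_bar - slope * x_bar        # value of y predicted at x = 0
correlation = sxy / (sxx * syy) ** 0.5   # correlation coefficient

def predict(x_new):
    """Predicted y for a new value of x, using the fitted line."""
    return intercept + slope * x_new

# Residuals: the part of each y the fitted line does not account for.
residuals = [yi - predict(xi) for xi, yi in zip(x, y)]

print(f"slope={slope:.2f} intercept={intercept:.2f} r={correlation:.3f}")
```

The slope is read as the expected change in y for a one-unit increase in x. The correlation uses the same ingredients but is unit-free and makes no distinction between dependent and independent variables.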
Large-n Methods V: Multiple Regression
Upon completing this topic you will understand. . . • how to use STATA to estimate a multiple regression model • methods to assess goodness of fit • testing overall significance in multiple regression • how to make predictions using results from multiple regression
Notes Our theories rarely specify a relationship between just two variables. Even when they do so, it is usually wise to control for confounding factors by including other independent variables. In this series of lectures, we will see how we can extend the bivariate regression framework to incorporate multiple independent variables. We will also talk about how to test the significance of individual coefficients and the overall significance of the model, and how to evaluate goodness of fit in a given model. At the end of the lecture, we will discuss potential problems with the linear regression framework, and alternative methods that can be used in situations in which linear regression is not appropriate.
Reading Pollock, Ch 8 (pages 187-197). STATA companion, Ch 8.
Further Reading Salkind, Ch 15. Shively, Ch 8.
Key Words and Concepts • R² (coefficient of determination), Adjusted R² • F-statistic test for overall significance
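These fit measures are simple functions of quantities that regression output already reports. The Python sketch below uses the standard textbook formulas; the function name `fit_statistics` and the input numbers are hypothetical.

```python
def fit_statistics(ss_residual, ss_total, n, k):
    """Fit summaries for a regression with n observations and k independent variables.

    ss_residual: sum of squared residuals (variation left unexplained)
    ss_total:    total sum of squares of the dependent variable
    """
    r2 = 1 - ss_residual / ss_total
    # Adjusted R-squared penalizes adding independent variables.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    # F-statistic for the null that all k slope coefficients are zero.
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    return r2, adj_r2, f_stat

# Hypothetical model: 53 observations, 2 independent variables.
r2, adj_r2, f_stat = fit_statistics(ss_residual=40.0, ss_total=100.0, n=53, k=2)
print(f"R2={r2:.3f} adjusted R2={adj_r2:.3f} F={f_stat:.1f}")
```

Adjusted R² is always at or below R² when k is at least 1, and the gap grows as more independent variables are added relative to the number of observations.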
Small-N Methods: Case Studies and the Comparative Research Strategy
Upon completing this topic you will understand. . . • how to make the most out of a single case • how to design a comparative study • how to select cases for analysis
Notes Sometimes there exist relatively few cases of the phenomenon that we want to study. Or it could be that the data needed to conduct a large-N analysis are not available, or are too costly to collect. In those situations scholars use case studies or the comparative method for theory-building or testing hypotheses. The case study approach employs an in-depth analysis of a single case, while the comparative approach refers to the close examination of a small number of cases for comparison. In these lectures, we will discuss the merits of these two methods with several examples. We will also look at potential issues related to these approaches such as selection bias and the “many variables, small-N” problem.
Further Reading Shively, Ch 7. Imagination, Ch 6 (pages 115-120) and Ch 15. KKV, Ch 4, Ch 5 and Ch 6.
Key Words and Concepts • Mill’s methods of difference and agreement • selection bias