4 Problem Sets
4.1 Introduction
Your team will conduct research on one of the topics listed below. Each group will be randomly assigned a specific topic from the list. The tasks are divided into two Problem Sets:
- Problem Set 1:
  - Data Management
  - Visualization
  - Descriptive Statistics
  - Conceptual Endogeneity
- Problem Set 2:
  - Formulate Testable Hypotheses
  - OLS Estimation
  - Diagnostics
  - Sensitivity/Robustness Checks
Important: All datasets must come from PSA OpenSTAT or World Bank Open Data.
If you want to use other datasets, you must inform the lecturer by Week 2.
Kaggle datasets and other pre-cleaned datasets are not allowed.
4.2 Assigned Topics List (General and Specific)
- Trade and Economic Outcomes
  a. Export volume and regional economic growth
  b. Imports, domestic production, and household income
  c. Trade openness and employment in key sectors
- Agriculture and Productivity
  a. Crop yield differences by farm size
  b. Fertilizer input and productivity
  c. Regional specialization in agriculture
- Money, Banking and Household Finance
  a. Household access to banking and income
  b. Regional credit availability and small business activity
  c. Income, savings, and household financial stability
- Stocks and Capital Markets
  a. Stock market index movements and GDP
  b. Stock volatility and investment or savings
  c. Economic shocks and market performance
- Education and Human Capital
  a. Educational attainment and earnings
  b. Regional schooling differences and income disparities
  c. Education spending and enrollment/completion rates
- Health and Economic Outcomes
  a. Health expenditure and labor productivity
  b. Regional health access and income
  c. Health outcomes and employment
4.3 Problem Set 1 - Data Management to Descriptive Statistics and Conceptual Endogeneity
Deadline: Week 6
Note: HW means handwritten.
- Data Management and Cleaning
  - Acquire the dataset(s) for your assigned topic
  - Clean the data (handle missing values, recode variables, reshape, etc.)
  - Report the number of observations before and after cleaning (HW)
- Data Visualization
  - Create at least 3 visualizations
  - For each visualization:
    - Explain why you chose it (HW)
    - Discuss what it shows in economic terms (HW)
    - Interpret patterns, trends, or anomalies (HW)
- Descriptive Statistics
  - Compute summary statistics (mean, median, SD)
  - Compare them across groups, regions, or categories
  - Discuss the findings in economic terms (HW)
- Conceptual Endogeneity / Confounding Variables
  - Identify at least one variable that might confound the relationship you are studying (HW)
  - Discuss how it could bias your interpretation (HW)
- Economic Discussion
  - Summarize your findings clearly (HW)
  - Link the patterns to economic reasoning or policy (HW)
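The cleaning, visualization, and descriptive-statistics steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not a required template: the small wide-format table (region names, years, values, and the use of blank cells for missing data) is hypothetical and stands in for a file you would download from PSA OpenSTAT or World Bank Open Data and read with, for example, read.csv(). The sketch reshapes the table to one row per region-year, recodes blank cells as missing, reports the number of observations before and after dropping incomplete rows, computes the mean, median, and SD overall and by a hypothetical area grouping, and draws three simple plots.

```r
# Minimal Problem Set 1 workflow sketch in base R.
# ASSUMPTION: the tiny table below is hypothetical (not real PSA/World Bank data);
# replace it with your downloaded file, e.g. raw <- read.csv("your_file.csv").

# 1. Hypothetical raw export in "wide" form: one row per region, one column per year
raw <- data.frame(
  region = c("Region A", "Region B", "Region C", "Region D"),
  y2021  = c("120.5", "98.2", NA, "143.0"),
  y2022  = c("125.1", "", "87.4", "150.2"),
  stringsAsFactors = FALSE
)

# 2. Reshape wide -> long so that each row is one region-year observation
long <- reshape(raw, direction = "long",
                varying = c("y2021", "y2022"), v.names = "value",
                timevar = "year", times = c(2021, 2022), idvar = "region")
rownames(long) <- NULL

# 3. Recode: blank cells become NA, then convert the text values to numbers
long$value[long$value %in% ""] <- NA
long$value <- as.numeric(long$value)

# 4. Attach a grouping variable (hypothetical area labels) for group comparisons
areas <- data.frame(region = c("Region A", "Region B", "Region C", "Region D"),
                    area   = c("North", "North", "South", "South"),
                    stringsAsFactors = FALSE)
long <- merge(long, areas, by = "region")

# 5. Count region-year observations before and after dropping incomplete rows
n_before <- nrow(long)
clean    <- long[complete.cases(long), ]
n_after  <- nrow(clean)
cat("Observations before cleaning:", n_before, "| after cleaning:", n_after, "\n")

# 6. Summary statistics overall and by group
summ <- function(x) c(mean = mean(x), median = median(x), sd = sd(x), n = length(x))
overall <- summ(clean$value)
by_area <- t(sapply(split(clean$value, clean$area), summ))  # one row per area
print(round(overall, 2))
print(round(by_area, 2))

# 7. Three basic plots, written to a temporary PDF so the sketch runs outside Quarto
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
hist(clean$value, main = "Distribution of value", xlab = "value")
boxplot(value ~ area, data = clean, main = "value by area")
barplot(by_area[, "mean"], main = "Mean value by area", ylab = "mean value")
invisible(dev.off())
```

In a Quarto document the plots can simply be drawn inline in a chunk; the temporary PDF is only there so the example runs on its own.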
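For the conceptual endogeneity item, the standard omitted-variable-bias result shows how a confounder distorts a simple relationship. As an illustrative assumption, suppose the true model contains a confounder z that is correlated with the variable of interest x, and the error u is uncorrelated with both. If y is regressed on x alone, the resulting slope (written here as the tilde estimator) converges to:

```latex
% True model (assumed): z is a confounder correlated with x
y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + u_i, \qquad \operatorname{Cov}(x_i, u_i) = \operatorname{Cov}(z_i, u_i) = 0
% Probability limit of the bivariate slope \tilde{\beta}_1 (y regressed on x only, z omitted)
\operatorname{plim} \tilde{\beta}_1 = \beta_1 + \beta_2 \, \frac{\operatorname{Cov}(x_i, z_i)}{\operatorname{Var}(x_i)}
```

The bias has the sign of β₂ times the sign of Cov(x, z). As a hypothetical example for the topic on household access to banking and income: if x is banking access, y is household income, and z is educational attainment, then education plausibly raises income (β₂ > 0) and is plausibly positively correlated with banking access, so the bivariate slope would tend to overstate the effect of banking access.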
4.4 Problem Set 2 - Testable Hypotheses, OLS, Diagnostics, Robustness
Deadline: Week 12
Note: HW means handwritten.
- Formulate Testable Hypotheses
  - Clearly define the dependent and independent variables
- OLS Estimation
  - Run a bivariate OLS regression
  - Run a multivariate OLS regression with controls
  - Report coefficients, standard errors, and R² (Note: This can be printed and pasted)
  - Discuss (HW):
    - Economic interpretation of the coefficients
    - Can the OLS estimates be interpreted causally? Why or why not?
    - Potential sources of bias
- Sensitivity/Robustness Checks
  - Test whether the results hold with different sets of control variables
  - Perform subsample analysis (e.g., by gender, region, or time period)
  - Discuss which results are robust, which change, and why (HW)
  - Ground your discussion in economic interpretation (HW)
- Diagnostics or Formal Tests
  - Multicollinearity (VIF)
  - Heteroskedasticity (Breusch-Pagan (BP) or White test)
  - Functional form (Ramsey RESET)
  - Discuss the implications of the results (HW)
- Economic Discussion (HW)
  - Compare the bivariate and multivariate results and the sensitivity checks
  - Summarize your findings and relate them to Problem Set 1, policy implications, and limitations
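The estimation and robustness steps above can be sketched in base R with lm() and summary(). The data here are simulated for a hypothetical version of the educational attainment and earnings topic; the variable names (log_wage, schooling, experience, female, region) and the coefficients used to generate them are assumptions for illustration, not estimates from PSA or World Bank data. The sketch fits a bivariate and a multivariate model, reports coefficients, standard errors, and R², re-estimates the schooling coefficient under different control sets, and repeats the regression within region and gender subsamples.

```r
# Minimal sketch of the Problem Set 2 OLS and robustness steps in base R.
# ASSUMPTION: simulated data with hypothetical variables and coefficients.
set.seed(123)
n <- 500
dat <- data.frame(
  schooling  = round(rnorm(n, mean = 10, sd = 3)),   # years of schooling
  experience = round(runif(n, min = 0, max = 30)),   # years of work experience
  female     = rbinom(n, size = 1, prob = 0.5),      # 1 = female
  region     = sample(c("North", "South"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
dat$log_wage <- 1 + 0.08 * dat$schooling + 0.02 * dat$experience -
  0.10 * dat$female + 0.15 * (dat$region == "North") + rnorm(n, sd = 0.4)

# Bivariate and multivariate OLS
m_biv   <- lm(log_wage ~ schooling, data = dat)
m_multi <- lm(log_wage ~ schooling + experience + female + region, data = dat)

# Coefficients, standard errors, and R-squared for a fitted model
report <- function(m) {
  s <- summary(m)
  list(coefs = round(s$coefficients[, c("Estimate", "Std. Error")], 4),
       r2    = round(s$r.squared, 3))
}
print(report(m_biv))
print(report(m_multi))

# Robustness 1: schooling coefficient under different sets of controls
specs <- list(
  none     = log_wage ~ schooling,
  exp_only = log_wage ~ schooling + experience,
  full     = log_wage ~ schooling + experience + female + region
)
coef_by_spec <- sapply(specs, function(f) unname(coef(lm(f, data = dat))["schooling"]))

# Robustness 2: subsample analysis by region and by gender
coef_by_region <- sapply(split(dat, dat$region), function(d)
  unname(coef(lm(log_wage ~ schooling + experience + female, data = d))["schooling"]))
coef_by_gender <- sapply(split(dat, dat$female), function(d)
  unname(coef(lm(log_wage ~ schooling + experience + region, data = d))["schooling"]))

print(round(coef_by_spec, 4))
print(round(coef_by_region, 4))
print(round(coef_by_gender, 4))
```

Because the simulated regressors are generated independently of one another, the schooling coefficient should stay close to the value used to generate it across specifications. With real data, movement in the coefficient as controls are added or as the subsample changes is exactly what the robustness discussion should explain.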
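The three formal tests can also be computed by hand in base R, which makes their logic explicit. This sketch again uses simulated, hypothetical earnings data. It computes each VIF as 1/(1 - R_j^2) from regressing one regressor on the others, the studentized (Koenker) Breusch-Pagan statistic as n times the R² from regressing the squared residuals on the regressors, and Ramsey's RESET as an F-test on added squared and cubed fitted values. The car package's vif() and the lmtest package's bptest() and resettest() provide ready-made versions; White's test extends the Breusch-Pagan auxiliary regression with squares and cross-products of the regressors.

```r
# Sketch of VIF, Breusch-Pagan, and RESET computed "by hand" in base R.
# ASSUMPTION: simulated data with hypothetical variables and coefficients.
set.seed(123)
n <- 500
dat <- data.frame(
  schooling  = round(rnorm(n, mean = 10, sd = 3)),
  experience = round(runif(n, min = 0, max = 30)),
  female     = rbinom(n, size = 1, prob = 0.5)
)
dat$log_wage <- 1 + 0.08 * dat$schooling + 0.02 * dat$experience -
  0.10 * dat$female + rnorm(n, sd = 0.4)
m <- lm(log_wage ~ schooling + experience + female, data = dat)

# 1. Multicollinearity: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
#    regressing regressor j on all the other regressors
X <- model.matrix(m)[, -1]   # drop the intercept column
vif <- sapply(colnames(X), function(v) {
  r2 <- summary(lm(X[, v] ~ X[, colnames(X) != v]))$r.squared
  1 / (1 - r2)
})

# 2. Heteroskedasticity: studentized (Koenker) Breusch-Pagan test,
#    LM = n * R^2 from regressing squared residuals on the regressors
aux   <- lm(residuals(m)^2 ~ X)
bp_lm <- nobs(m) * summary(aux)$r.squared
bp_p  <- pchisq(bp_lm, df = ncol(X), lower.tail = FALSE)

# 3. Functional form: Ramsey RESET, adding squared and cubed fitted values
#    and F-testing their joint significance against the original model
dat$yhat <- fitted(m)
m_reset  <- lm(log_wage ~ schooling + experience + female + I(yhat^2) + I(yhat^3),
               data = dat)
reset    <- anova(m, m_reset)
reset_p  <- reset[2, "Pr(>F)"]

print(round(vif, 3))
cat("Breusch-Pagan LM =", round(bp_lm, 3), "p =", round(bp_p, 4), "\n")
cat("RESET F =", round(reset[2, "F"], 3), "p =", round(reset_p, 4), "\n")
```

Since the simulated errors have constant variance and the model is linear by construction, the tests should usually not reject here. With real data, a small Breusch-Pagan p-value would point toward heteroskedasticity-robust standard errors, and a small RESET p-value would suggest reconsidering the functional form (for example, logs or quadratic terms).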
4.5 Reminders
- Only hard-copy submissions are accepted.
- All items marked (HW) must be handwritten. Please write legibly.
- Print your typed R code and include it in the submission.
- Print the plots and paste them alongside the handwritten discussion.
- Include only essential outputs; do not print the contents of the dataset.
- The first pages should contain your answers to all questions; the following pages should show the step-by-step process with the R code chunks.
- When printing the Quarto Markdown file for submission, change the header of every code chunk from {r} to {r, eval=FALSE}. This keeps the results from appearing in the rendered HTML, so the printed file shows the code clearly.
- Only do this when printing the R code.