Abstract:
|
In recent decades, data-driven approaches have been developed to analyze demographic and economic surveys on a large scale. The goal of this report is to apply multivariate analysis techniques to gain insight into the relationship between income and expenditure of American households, using the fmli161 dataset, which is based on the Consumer Expenditure Survey conducted by the Bureau of Labor Statistics. Initially, 35 variables are selected from three categories: demographics, income, and expenditure. Missing values and categorical variables are handled first, in a preliminary analysis. On the mathematical side, I propose to evaluate the data and the results for stability and reproducibility. Further interpretations beyond economics illustrate the potential of the dataset. In conclusion, sparse PCA suggests FINCBTXM, FSALARYM, TOTEXPCQ, FOODCQ, and HOUSCQ as the five most important of the selected variables, while cluster analysis gives more options depending on the number of clusters needed. CCA reveals a high correlation between income and expenditure for middle-class Americans, while correspondence analysis does not fully support suggestions of rebalancing higher-education rights based on race.
|