RAnalytics
This portfolio captures the work I completed for the course Programming in R for Analytics at Carnegie Mellon University in Fall 2020. The work involves manipulating data objects, producing graphics, analyzing data using popular statistical methods, and generating reproducible statistical reports. To view my course repository on GitHub, please click here.
Key Learnings
In Programming in R for Analytics, I learnt how to use RStudio, read R documentation, and write R scripts. I gained an understanding of the practical challenges of importing, exporting, and manipulating data. I also developed the skills required to produce statistical summaries of continuous and categorical data, generate basic graphics using standard functions, and produce more advanced graphics using the ggplot2 library. Finally, I learnt to perform popular hypothesis tests and run regression models, and to present the resulting statistical analyses in R Markdown and R Notebooks.
Portfolio
Here are the assignments that I completed during the course.
Assignments
To view the code and output files for each assignment, please click on the folder hyperlinks below.
I. Tabular Summaries and Data Cleaning: Homework II
II. Data Visualization: Homework III
III. Statistical Tests: Homework IV
Final Project
Brief Description: “Sex-related differences: Is there a significant difference in income between men and women? Does the difference vary depending on other factors (e.g., education, marital status, criminal history, drug use, childhood household factors, profession, etc.)?”
The main question of interest was answered in the project through the following steps:
1) Data processing and summarization: Insightful graphical and tabular summaries of the data
2) Methodology: Dealing with missing values and topcoded variables; exploring trends and correlations; variable selection
3) Findings: Tabular summaries; graphical summaries; regression and interpretation of coefficients; assessment of statistical significance
4) Discussion: Potential confounders; model fit limitations; confidence in results for policy makers
To view the different components of the project, please click on the hyperlinks for each part below.