DescriptiveAnalytics

This portfolio captures the work I completed for a course, Exploring and Visualizing Data, at Carnegie Mellon University in Spring 2022. The work done here involves statistical exploration, detailed visualization of data, and model fitting. To view my course repository on GitHub, please click here.

Key Learnings

In Exploring and Visualizing Data, I learnt how to use RStudio, read R documentation, and write R scripts. I gained an understanding of how to use R to produce graphics and to manipulate data, for example by filtering and aggregating. Moreover, I learnt how transformations, model fits, and residuals can be used to explore data and check statistical assumptions about it. Finally, I gained insight into how techniques such as simulation can be used to explore questions of model fit and statistical significance.

Portfolio

Here are the assignments and the final project that I completed for this class.

Assignments

To view the code and output files for each assignment, please click on the folder links below.

I. Data Cleaning and Aggregation: Homework I

II. Data Visualization: Homework II

III. Identifying Outliers Using p-values and Bonferroni-corrected CIs: Homework III

IV. Understanding Distributions Using Quantile Plots & QQ-Plots: Homework IV

Final Project

Brief Description: The first part of the project focuses on manipulating data to identify and explore outliers in a crime-related dataset. One example is detecting anomalous block-weeks, i.e. weeks in which the number of crimes in a particular block (modeled as a Poisson random variable) was unusually high compared to the weekly average for that block. The second part of the project relies on concepts from variable transformation, splines and cross-validation, and variable interactions. The final part draws on individual creativity to build an insightful visualization from a movie-related dataset.
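To give a flavor of the block-week idea, here is a minimal sketch rather than the project's actual code. It is written in Python instead of the R used in the course, and it makes several assumptions: each block's Poisson rate is estimated as that block's mean weekly count (including the week being tested), the significance level is Bonferroni-corrected by the total number of block-weeks, and the function names and toy counts are made up for illustration.

```python
import math
from collections import defaultdict


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def flag_block_weeks(counts, alpha=0.05):
    """Flag block-weeks whose count is unusually high for that block.

    counts maps (block, week) -> number of crimes (hypothetical layout).
    Returns a list of (block, week, count, p_value) tuples.
    """
    by_block = defaultdict(list)
    for (block, _week), n in counts.items():
        by_block[block].append(n)

    # Bonferroni correction: one test per block-week.
    threshold = alpha / len(counts)

    flagged = []
    for (block, week), n in sorted(counts.items()):
        lam = sum(by_block[block]) / len(by_block[block])  # block's weekly mean
        p = poisson_upper_tail(n, lam)
        if p < threshold:
            flagged.append((block, week, n, p))
    return flagged


# Toy data (invented): block "A" has one spike in week 10, block "B" is steady.
toy_a = [2, 3, 1, 2, 2, 3, 2, 1, 2, 20]
toy_b = [5, 4, 6, 5, 5, 4, 6, 5, 5, 5]
toy_counts = {("A", w + 1): n for w, n in enumerate(toy_a)}
toy_counts.update({("B", w + 1): n for w, n in enumerate(toy_b)})

for block, week, n, p in flag_block_weeks(toy_counts):
    print(block, week, n)  # only block A, week 10 (count 20) is flagged
```

With 20 block-weeks and alpha = 0.05, each test is compared against 0.0025, so ordinary week-to-week variation is not flagged while the spike in block A is.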

To view the project, please click here.