DataMining
This portfolio captures the work I completed for the Data Mining course at Carnegie Mellon University in Spring 2021. The work involves discovering structure and making predictions in large, complex data sets. It covers many commonly used methods for predictive and descriptive analytics, and assesses their predictive and practical utility. To view my course repository on GitHub, please click here.
Key Learnings
In Data Mining, I learnt how to apply popular data mining methods in R and weighed their advantages and disadvantages by comparing their utility across the assignments highlighted below. I also used real-world datasets to work through the stages of model building, such as feature selection, model evaluation and resampling, to assess the performance and reliability of these methods.
Portfolio
Here are the assignments I completed during this class.
Assignments
To view the data, code and output files for each assignment, please click on the folder hyperlinks below.
I. Assignment I
- Part 1: Qualitative predictors
- Part 2: Multiple linear regression
- Part 3: Dealing with collinearity
- Part 4: Exploring non-linearities
II. Assignment II
- Code to be updated.
- Part 1: Placing knots, choosing degrees of freedom
- Part 2: Cross-validation
III. Assignment III
- Part 1: Variable selection
- Part 2: Best subset selection
- Part 3: Forward stepwise selection
- Part 4: Lasso
- Part 5: Instability of logistic regression
IV. Assignment IV
- Part 1: Classifier performance metrics
- Part 2: Decision trees
- Part 3: Random forests
- Part 1: Exploring the dataset
- Part 2: Subset selection
- Part 3: Cross-validation and generalized additive model
- Part 4: Lasso
- Part 5: Final model evaluation
Final Project
Brief Description: Detecting Network Intrusions - “There is a large and profitable bank in Saint Louis, Missouri. Like any large corporation, this bank has a very large and intricate infrastructure that supports its networking system. A Network Analyst recently discovered unusual network activity. Then, poring over year’s worth of logs, their team of analysts discovered many instances of anomalous network activity that resulted in significant sums of money being siphoned from bank accounts. The Chief Networking Officer has come to your group for help in developing a system that can automatically detect and warn of such known, as well as other unknown, anomalous network activities.”
The following tasks were performed as part of this project: differentiating between the labeled intrusions and benign sessions; identifying different types of intrusions; developing and implementing a systematic approach to detect instances of intrusions in log files; evaluating the detection power of the designed system; and assessing the real-time use of the intrusion detector.
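As a rough illustration of what evaluating detection power can involve, the sketch below computes a detection rate, a false alarm rate and precision from paired true and predicted session labels. It is not the project's code: the course work was done in R, and this is written in Python purely as an example. The function name `detection_metrics`, the label strings and the sample data are all hypothetical.

```python
from collections import Counter


def detection_metrics(actual, predicted, positive="intrusion"):
    """Summarize detector performance from paired true and predicted labels."""
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if truth == positive:
            # True intrusion: either caught (tp) or missed (fn).
            counts["tp" if guess == positive else "fn"] += 1
        else:
            # Benign session: either falsely flagged (fp) or correctly passed (tn).
            counts["fp" if guess == positive else "tn"] += 1

    def ratio(num, den):
        # Return NaN rather than raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "detection_rate": ratio(counts["tp"], counts["tp"] + counts["fn"]),
        "false_alarm_rate": ratio(counts["fp"], counts["fp"] + counts["tn"]),
        "precision": ratio(counts["tp"], counts["tp"] + counts["fp"]),
    }


# Hypothetical labels for six sessions, made up for illustration only.
actual = ["intrusion", "benign", "intrusion", "benign", "benign", "intrusion"]
predicted = ["intrusion", "benign", "benign", "intrusion", "benign", "intrusion"]
metrics = detection_metrics(actual, predicted)
print(metrics)
```

Intrusions are usually rare relative to benign traffic, so overall accuracy on its own can look high for a detector that misses most attacks. Reporting the detection rate together with the false alarm rate gives a fuller picture.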
To view the different components of the project, please click on the hyperlinks for each part below: