Data Mining

This portfolio captures the work I completed for the Data Mining course at Carnegie Mellon University in Spring 2021. The work involves discovering structure and making predictions in large, complex datasets. The assignments cover many commonly used methods for predictive and descriptive analytics, and assess each method's predictive and practical utility. To view my course repository on GitHub, please click here.

Key Learnings

In the Data Mining course, I learnt how to apply popular data mining methods in R and to weigh their advantages and disadvantages by comparing their utility across the assignments highlighted below. I also used real-world datasets to work through the stages of model building, such as feature selection, model evaluation, and resampling, to assess the performance and reliability of these methods.

Portfolio

Here are the assignments I completed during this course.

Assignments

To view the data, code, and output files for each assignment, please click the folder links below.

I. Assignment I

II. Assignment II

III. Assignment III

IV. Assignment IV

V. Midterm Project

Final Project

Brief Description: Detecting Network Intrusions - “There is a large and profitable bank in Saint Louis, Missouri. Like any large corporation, this bank has a very large and intricate infrastructure that supports its networking system. A Network Analyst recently discovered unusual network activity. Then, poring over year’s worth of logs, their team of analysts discovered many instances of anomalous network activity that resulted in significant sums of money being siphoned from bank accounts. The Chief Networking Officer has come to your group for help in developing a system that can automatically detect and warn of such known, as well as other unknown, anomalous network activities.”

The following tasks were performed as part of this project: differentiating between the labeled intrusions and benign sessions; identifying different types of intrusions; developing and implementing a systematic approach to detect instances of intrusions in log files; evaluating the detection power of the designed system; assessing the real-time use of the intrusion detector.

To view the different components of the project, please click on the hyperlinks for each part below: