Flatiron Data Science


From Data Analyst to Data Scientist

AUC - ROC Curve

In Machine Learning, there are plenty of metrics used to measure the performance of the model. For classification problems, AUC - ROC curves are an essential evaluation metric. Before we dive into the AUC - ROC curve, I will give a quick introduction to the Confusion Matrix.


Module 4 - Forecasting House Prices

So here you are, a new and upcoming Data Scientist, hired by a real-estate investment firm to allocate millions of their dollars into various houses around the nation. You’ve got more than 14,000 zip codes and more than 20 years of monthly average house price data to work with. How would you convince the firm on what the 5 best zip codes are to invest in?


Northwind - The Data and The Money

This project is a treasure hunt. We’re given a database map and expected to search every nook and cranny of the data to find all the Easter eggs. Luckily for us, this is just an exercise and we only needed to do 4 hypothesis tests but even with a small sample database, the potential questions are endless.


Mod1 R-Squared and Feature Selection

In a scatterplot of independent and dependent variables, datapoints are usually not plotted in an easy straight line. The most you can hope for is a pattern. A Linear regression model attempts to define the relationship between the variables by best fitting a linear equation. In order to measure the accuracy of the linear equation, we use a statistical measure called R-Squared.