This is the first in a series of notes from my online courses that I plan to put online. This one is from the Udacity course Classification Models.
The course starts with a discussion of a problem in which we need to predict which items people are more likely to buy as the weather changes.
Lesson 1:
- Discussion about examples of classification problems. For example, determining from an image whether a soybean has a disease or not.
- Discussion about binary (one of two values) and non-binary (categorical variables with more than two values) examples.
- Discussion about Statistical Terms - Target and Predictor Variable.
  - Target Variable - Field we are trying to understand and predict - Dependent Variable.
  - Predictor Variable - Used to predict the target variable - Independent Variable.
- Remove duplicate variables (where one variable is a subset of another) by identifying correlations.
- Correlation - Measure of association between two variables; its value lies between -1 and 1.
- Three measures of association to be studied for understanding correlation:
  - Pearson correlation
  - Spearman’s rank correlation
  - Hoeffding’s independence test
- Pearson Correlation - Correlation Plots
- No issues during training but issues while testing or predicting.
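
To make the correlation points above concrete, here is a minimal sketch in plain Python of Pearson and Spearman's rank correlation. It is not taken from the course: the helper names (`pearson`, `ranks`, `spearman`) and the weather and sales numbers are made up for illustration. Hoeffding's test is left out because it needs more machinery than fits in a short example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data, invented for this example.
temperature_c = [1, 2, 3, 4, 5]
temperature_f = [c * 9 / 5 + 32 for c in temperature_c]  # same information, different unit
ice_cream_sales = [1, 8, 27, 64, 125]                    # rises with temperature, but not linearly

r_dup = pearson(temperature_c, temperature_f)    # duplicate variable: essentially 1
r_lin = pearson(temperature_c, ice_cream_sales)  # strong, but below 1
rho = spearman(temperature_c, ice_cream_sales)   # monotonic relationship: essentially 1

print(r_dup, r_lin, rho)
```

A Pearson value of essentially 1 between two predictors, as with the two temperature columns, suggests that one of them is a duplicate and can be dropped. Spearman works on ranks, so it picks up any monotonic relationship, while Pearson only measures how linear the relationship is.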
Key takeaways:
- For those who are already familiar with Machine Learning, a Predictor Variable is what we call a feature, and the Target Variable is the label, i.e. the value the model is trained to predict.
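
As a small illustration of the predictor and target terminology, the sketch below splits a few records into predictors (often written `X`) and a target (often written `y`). The column names and values are hypothetical, loosely based on the weather and shopping problem mentioned at the start of the course.

```python
# Hypothetical records: weather readings plus whether an umbrella was bought.
rows = [
    {"temperature_c": 30, "is_raining": 0, "bought_umbrella": 0},
    {"temperature_c": 18, "is_raining": 1, "bought_umbrella": 1},
    {"temperature_c": 12, "is_raining": 1, "bought_umbrella": 1},
    {"temperature_c": 25, "is_raining": 0, "bought_umbrella": 0},
]

target_name = "bought_umbrella"  # the field we are trying to predict
feature_names = [name for name in rows[0] if name != target_name]

# X holds the predictor variables, one list per record.
X = [[row[name] for name in feature_names] for row in rows]
# y holds the target variable, one value per record.
y = [row[target_name] for row in rows]

print(feature_names)
print(X)
print(y)
```

Here the target is binary (bought or not), which makes this a binary classification problem in the sense described in Lesson 1.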