Model Selection:
Finally, you learned four key points about using a simpler model wherever possible:
A simpler model is usually more generic than a complex model. This matters because generic models tend to perform better on unseen datasets.
A simpler model requires fewer training data points. This becomes extremely important because in many cases one has to work with limited data.
A simple model is more robust and does not change significantly if the training data points undergo small changes.
A simple model may make more errors in the training phase, but it often outperforms complex models on new data because complex models tend to overfit the training data (illustrated in the sketch below).
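To make this concrete, here is a minimal sketch (not part of the original notes; it assumes NumPy and scikit-learn and uses synthetic noisy sine data) that fits a simple and a complex model to the same small dataset and compares their training and test errors:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.RandomState(0)
    X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
    y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 30)   # noisy synthetic data
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

    for degree in (1, 15):   # degree 1 = simple model, degree 15 = complex model
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
        train_err = mean_squared_error(y_tr, model.predict(X_tr))
        test_err = mean_squared_error(y_te, model.predict(X_te))
        print(f"degree {degree}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")

Typically, the degree-15 model drives its training error close to zero but does far worse on the held-out half of the data, which is exactly the overfitting behaviour described above.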
Complexity:
The disadvantages the first person is likely to face because of a complex model are (mark all that apply):
1. He’ll need more training data to ‘learn’
2. Despite the training, he may still fail to ‘learn’ and may perform poorly in the real world
Overfitting:
The possibility of overfitting exists primarily because:
Models are trained on a set of training data, but their efficacy is determined by their ability to perform well on unseen (test) data.
Bias and Variance:
Variance: variance here refers to the degree to which the model itself changes with respect to changes in the training data.
Consider, for example, a model that memorizes the entire training dataset. If you change the dataset even slightly, this model will need to change drastically. The model is, therefore, unstable and sensitive to changes in the training data, and this is called high variance.
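As a rough illustration (an assumed example, not from the notes): refit a very flexible model after a tiny change to the training data and watch how much its predictions move, compared with a simple model.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.RandomState(1)
    X = np.linspace(0, 1, 20).reshape(-1, 1)
    y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 20)
    X_grid = np.linspace(0, 1, 100).reshape(-1, 1)   # points at which predictions are compared

    for degree in (1, 15):
        full = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
        perturbed = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X[:-2], y[:-2])  # drop 2 points
        shift = np.mean(np.abs(full.predict(X_grid) - perturbed.predict(X_grid)))
        print(f"degree {degree}: average change in predictions = {shift:.3f}")

The degree-15 model's predictions usually shift far more than the degree-1 model's, even though only two training points were removed; this instability is what high variance looks like in practice.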
Bias: bias quantifies how accurate the model is likely to be on future (test) data; an overly simple (naive) model typically suffers from high bias.
In an ideal case, we want to reduce both the bias and the variance,
because the expected total error of a model is the sum of the errors due to bias and the errors due to variance (see the decomposition below).
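For squared-error loss, the standard form of this decomposition (stated here for reference; strictly, it is the squared bias plus the variance plus an irreducible noise term) is:

    \mathbb{E}\big[(y - \hat{f}(x))^2\big]
      = \underbrace{\mathrm{Bias}\big[\hat{f}(x)\big]^2}_{\text{error due to bias}}
      + \underbrace{\mathrm{Var}\big[\hat{f}(x)\big]}_{\text{error due to variance}}
      + \underbrace{\sigma^2}_{\text{irreducible noise}}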
Regularization:
Having established that we need to find the right balance between model bias and variance, or simplicity and complexity, we need tools which can reduce or increase the complexity.
Regularization methods are used to keep model complexity in check.
Regularization is the process of deliberately simplifying models to achieve the correct balance between keeping the model simple and yet not too naive.
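A minimal sketch of regularization in action (assuming scikit-learn; the degree-10 features and alpha = 1.0 are arbitrary illustrative choices): ridge regression adds a penalty on large coefficients, which deliberately reins in an otherwise over-complex model.

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.RandomState(2)
    X = np.linspace(0, 1, 20).reshape(-1, 1)
    y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 20)

    plain = make_pipeline(PolynomialFeatures(10), LinearRegression()).fit(X, y)
    regularized = make_pipeline(PolynomialFeatures(10), Ridge(alpha=1.0)).fit(X, y)

    # The unregularized fit tends to need huge coefficients (a symptom of overfitting);
    # the ridge penalty keeps them small, i.e. it keeps the model simpler.
    print("largest |coef|, plain:      ", np.abs(plain[-1].coef_).max())
    print("largest |coef|, regularized:", np.abs(regularized[-1].coef_).max())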
Hyperparameters:
Hyperparameters are parameters that we pass on to the learning algorithm to control the complexity of the final model.
Hyperparameters are choices that the algorithm designer makes to ‘tune’ the behavior of the learning algorithm.
To summarize the concept of hyperparameters:
-Hyperparameters are used to 'fine-tune' or regularize the model so as to keep it optimally complex
-The learning algorithm is given the hyperparameters as an 'input' and returns the model parameters as the output
-Hyperparameters are not a part of the final model output
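A small assumed example of the last two points, using a decision tree from scikit-learn: max_depth is a hyperparameter supplied to the learning algorithm, while the tree's learned splits are the model parameters it returns.

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)    # tightly regularized
    deep = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X, y)    # fully grown tree

    # The hyperparameter controls how complex the learned model is allowed to become,
    # measured here by the number of nodes in the fitted tree.
    print(shallow.tree_.node_count, deep.tree_.node_count)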
Model Evaluation and Cross Validation:
- a model should never be evaluated on data it has already seen before
Two cases:
1) The training data is abundant: divide the data into train and test sets based on a percentage split.
2) The training data is limited: use cross-validation (CV) to make the most of the available data.
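For case 1, the split is a one-liner (a sketch assuming scikit-learn; the 70/30 ratio is just an illustrative choice):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    # Hold out 30% of the rows as unseen test data.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    print(X_train.shape, X_test.shape)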
To summarise, the problems with manual hyperparameter tuning are:
Split into train and test sets: Tuning a hyperparameter makes the model 'see' the test data. Also, the results are dependent on the specific train-test split.
Split into train, validation, test sets: The validation data would eat into the training set.
However, in cross-validation, you split the data into train and test sets and train multiple models by sampling the train set.
Finally, you use the test set just once, to evaluate the final model built with the chosen hyperparameters (a sketch of this workflow follows).
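A sketch of that workflow (assuming scikit-learn and its diabetes dataset; the candidate alpha values are arbitrary): hyperparameters are tuned with cross-validation on the training set only, and the held-out test set is touched exactly once at the end.

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV, train_test_split

    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # Tune the regularization strength alpha with 4-fold CV on the training set.
    search = GridSearchCV(Ridge(), param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=4)
    search.fit(X_train, y_train)

    print("chosen hyperparameter:", search.best_params_)
    print("test-set score (used only once):", search.score(X_test, y_test))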
Specifically, you will do k-fold cross-validation, wherein you divide the training data into k folds (groups of samples). If k = 4 (say), you use k-1 folds to build the model and test it on the remaining fold, repeating the process so that each fold serves as the test fold once.
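The mechanics can be spelled out with a splitter object (a sketch, again assuming scikit-learn and k = 4): each iteration trains on k-1 folds and evaluates on the remaining fold.

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import KFold

    X, y = load_diabetes(return_X_y=True)
    kf = KFold(n_splits=4, shuffle=True, random_state=0)

    for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
        model = LinearRegression().fit(X[train_idx], y[train_idx])      # fit on k-1 folds
        score = model.score(X[test_idx], y[test_idx])                   # evaluate on the held-out fold
        print(f"fold {fold}: R^2 = {score:.3f}")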
The various types of cross-validation are:
-K-fold cross-validation
-Leave-one-out (LOO)
-Leave-P-out (LPO)
-Stratified K-fold
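For reference, each variant above has a corresponding splitter class in scikit-learn (listing them here is an assumption about the reader's toolkit, not something the notes prescribe):

    from sklearn.model_selection import KFold, LeaveOneOut, LeavePOut, StratifiedKFold

    splitters = [
        KFold(n_splits=5),             # K-fold cross-validation
        LeaveOneOut(),                 # leave-one-out (LOO)
        LeavePOut(p=2),                # leave-P-out (LPO)
        StratifiedKFold(n_splits=5),   # K-fold that preserves class proportions in each fold
    ]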
1. Given two linear regression models built on the same dataset of 100 attributes, where model A uses ten attributes and model B uses ninety, what can you say about the two models?
- Model B has tried to memorize the data and when the training data changes a little, the expected results will change.
- The model B will have a much higher variance compared to model A.
- Model A will have a higher bias
2. Consider that data is generated via a polynomial equation of degree 4 (i.e. the said polynomial equation will perfectly fit the given data). Which of the following statements are correct in this case?
- Linear regression will have high bias and low variance
Linear regression would create a degree 1 polynomial which would be less complex than the degree four polynomial and hence would have a higher bias. Since the model is less complex and won’t overfit, it would have a low variance.
- Polynomial equation of degree 4 will have low bias and low variance
Since the equation fits the data perfectly, bias and variance will both be low here.
3. How do you measure the variance of a model?
A. By measuring how much the model's estimates on the test data change when the training data is changed.
Variance measures how much the model changes with respect to the training data.
4. In principle it is always possible to reduce the training error to zero
A. Yes, you could always make the model memorise the entire training dataset!
5. Regularization is a:
Technique used to strike a balance between model complexity and model accuracy on the training data (complexity vs. accuracy).
6. Which of the following statements is correct with respect to k-fold cross validation?
A. Training happens k times, so a higher k implies a higher run time for training with k-fold cross-validation.
Also, a higher k means that the training set in each iteration is larger and is therefore a better representation of the actual data.