Tuesday, January 29, 2019

Most commonly asked interview questions on Machine Learning

What is a feature?

Feature. A symbolic or numeric property of a real-world object that may be useful for determining its class; the word 'attribute' is used for this as well. Different objects may have different numbers of attributes, but usually the same set of features can be measured for every object in a given problem, so objects can be represented by a feature vector or by a set of attributes.
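
As a quick illustration (a hypothetical example, not part of the original answer, assuming NumPy is available), an object can be represented by a numeric feature vector, and a dataset is then a matrix whose rows are such vectors:

import numpy as np

# Hypothetical example: one flower described by four numeric features
# (sepal length, sepal width, petal length, petal width), in centimetres.
feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
feature_vector = np.array([5.1, 3.5, 1.4, 0.2])

# A dataset is a matrix where each row is one object's feature vector.
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [6.7, 3.1, 4.4, 1.4],
])
print(dict(zip(feature_names, feature_vector)))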

When do you use the Classification algorithm?

Classification. When the data are used to predict a category, supervised learning is also called classification.
This is the case when labelling an image as a picture of either a 'cat' or a 'dog'. When there are only two choices, this is called two-class or binary (binomial) classification. When there are more categories, as when predicting the winner of the NCAA March Madness tournament, the problem is known as multi-class classification.
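
A minimal two-class classification sketch, assuming scikit-learn is installed (the synthetic data here is only a stand-in for a 'cat' vs 'dog' problem):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic two-class problem standing in for 'cat' vs 'dog'.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))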

What is the curse of dimensionality?

The curse of dimensionality refers to the way certain learning algorithms can perform poorly as the number of features (dimensions) in the data grows.

Let's say you have a straight line 100 yards long and you dropped a penny somewhere on it. It wouldn't be too hard to find. You walk along the line and it takes two minutes.

Now let's say you have a square 100 yards on each side and you dropped a penny somewhere on it. It would be pretty hard, like searching across two football fields stuck together. It could take days.

Now a cube 100 yards across. That's like searching a 30-story building the size of a football stadium. Ugh.
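
The same idea can be shown numerically. In the rough sketch below (an illustration, not part of the original post, assuming NumPy is available), pairwise distances between random points concentrate as the dimension grows, which is one reason distance-based learners degrade in high dimensions:

import numpy as np

rng = np.random.default_rng(0)

# As the number of dimensions d grows, the spread of pairwise distances
# shrinks relative to their mean, so "near" and "far" become hard to tell apart.
for d in (1, 2, 10, 100, 1000):
    points = rng.uniform(size=(100, d))
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists = dists[np.triu_indices(100, k=1)]
    print(f"d={d:>4}  mean distance={dists.mean():.2f}  std/mean={dists.std() / dists.mean():.3f}")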


What is regularization?

In machine learning and statistics, a common task is to fit a model to a set of training data; the fitted model is then used to make predictions or to classify new data points.
When the model fits the training data well but predicts poorly on new data, i.e. it does not generalize, we have an overfitting problem.
Regularization is a technique used to avoid this overfitting problem. The idea behind regularization is that models that overfit the data are typically complex models, for example models with too many or too large parameters, so a penalty on model complexity is added to the training objective.
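
A minimal sketch of the idea, assuming scikit-learn is available: adding an L2 penalty (Ridge) to a linear model that has far more features than samples usually generalizes better than the unpenalized fit:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Illustrative setup: 200 features but only 50 samples, a recipe for overfitting.
X, y = make_regression(n_samples=50, n_features=200, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

plain = LinearRegression().fit(X_train, y_train)
regularized = Ridge(alpha=10.0).fit(X_train, y_train)  # L2 penalty on coefficients

print("Plain linear test R^2:       ", plain.score(X_test, y_test))
print("Ridge (regularized) test R^2:", regularized.score(X_test, y_test))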

How do you use categorical features with scikit-learn?

Categorical features have to be converted to dummy variables, also called indicator variables in Python parlance; the two terms mean the same thing. scikit-learn estimators expect numeric input, so the conversion is done before fitting.
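
A short sketch of the two common approaches, assuming pandas and scikit-learn are installed (the sparse_output argument requires scikit-learn 1.2 or newer; older versions use sparse instead):

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy frame with one categorical column.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"], "size": [1, 2, 3, 2]})

# Option 1: pandas creates the dummy/indicator columns directly.
dummies = pd.get_dummies(df, columns=["color"])
print(dummies)

# Option 2: scikit-learn's OneHotEncoder, which also fits into Pipelines.
encoder = OneHotEncoder(sparse_output=False)  # scikit-learn >= 1.2
encoded = encoder.fit_transform(df[["color"]])
print(encoder.get_feature_names_out())
print(encoded)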

How do you calculate Misclassification Error?
Misclassification error = (FP + FN) / N, where:
FP - number of false positives
FN - number of false negatives
N - total number of observations
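
A small worked example with illustrative data (not from the original post):

import numpy as np

# Misclassification error counts the wrong predictions (FP + FN)
# over the total number of observations N.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
n = len(y_true)                             # total observations

error = (fp + fn) / n
print(f"FP={fp}, FN={fn}, N={n}, misclassification error={error:.3f}")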

What are some of the ways to tune a model?
  1. Feature Selection
  2. Regularization
  3. Hyperparameters
  4. Cross Validation
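
A minimal sketch that ties several of these together, assuming scikit-learn is available: a grid search over the regularization strength C of a logistic regression, scored with 5-fold cross-validation:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scale features, then tune the regularization strength C with 5-fold CV.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))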

Which of the following are reasons for visualizing a dataset before attempting to build a supervised machine learning model?

  • Develop an understanding of the relationship between the features and the label to determine which features are likely to be predictive of the label and should be used in training the machine learning model.
  • Develop an understanding of which features are redundant or collinear with other features and should be eliminated from the dataset before training the machine learning model.
  • Find features that are not likely to be predictive of the label and should be removed from the dataset before training the machine learning model.
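
One possible sketch of such a visualization, assuming scikit-learn and matplotlib are installed: a feature-correlation heatmap can reveal collinear or redundant columns before any model is trained:

import matplotlib.pyplot as plt
from sklearn.datasets import load_wine

# Illustrative dataset; 'frame' holds the features plus the 'target' label column.
df = load_wine(as_frame=True).frame

corr = df.corr()
plt.figure(figsize=(8, 6))
plt.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
plt.colorbar(label="correlation")
plt.xticks(range(len(corr)), corr.columns, rotation=90)
plt.yticks(range(len(corr)), corr.columns)
plt.tight_layout()
plt.show()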

You are training a supervised machine learning model. You want to ensure that you do not introduce any unintentional bias when splitting the data into training and testing sets.

What should you do?
Split the dataset into two non-overlapping portions, then train the model using one portion and test it using the other.
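
A minimal sketch with scikit-learn's train_test_split on an illustrative dataset (not from the original answer); shuffling and stratifying on the label help avoid accidental ordering or class-imbalance bias between the two portions:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Hold out a non-overlapping test portion; stratify keeps class proportions
# similar in both splits, and shuffling removes any ordering effects.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, shuffle=True, random_state=42
)
print(len(X_train), "training rows,", len(X_test), "test rows")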
