10 most common interview questions for any aspiring Data Scientist

1. What are feature vectors?

Answer:

Feature vectors represent the numeric or symbolic characteristics of an object, called features, in a mathematical form that is easy to analyze. They are important in many areas of machine learning and pattern recognition. Machine learning algorithms typically require a numerical representation of objects in order to process them and perform statistical analysis. Feature vectors are the equivalent of the vectors of explanatory variables used in statistical procedures such as linear regression.
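
As a small illustration, the sketch below turns one object into a feature vector. The fruit attributes, the color vocabulary `COLORS`, and the helper `to_feature_vector` are all made up for this example; the symbolic feature (color) is one-hot encoded so that the whole vector is numeric.

```python
# Hypothetical example: describe a fruit by weight, diameter and color.
COLORS = ["red", "green", "yellow"]  # assumed vocabulary for the symbolic feature


def to_feature_vector(weight_g, diameter_cm, color):
    """Return [weight, diameter, one-hot color...] as a list of floats."""
    one_hot = [1.0 if color == c else 0.0 for c in COLORS]
    return [float(weight_g), float(diameter_cm)] + one_hot


apple = to_feature_vector(150, 7.5, "red")
print(apple)  # [150.0, 7.5, 1.0, 0.0, 0.0]
```

Each position in the list has a fixed meaning, which is what lets an algorithm compare one object with another.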

2. What is logistic regression?

Answer:

Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, logistic regression is a predictive analysis. It is used to describe data and to explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval or ratio-level independent variables.
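
A minimal sketch of how a fitted logistic regression produces a prediction: a linear combination of the features is passed through the logistic (sigmoid) function to give a probability between 0 and 1. The weights, the bias, and the 0.5 decision threshold below are assumptions chosen for illustration, not values from any real model.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability that the binary outcome equals 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients, as if they had already been learned from data.
weights = [0.8, -0.5]
bias = -0.2

p = predict_proba([2.0, 1.0], weights, bias)
label = 1 if p >= 0.5 else 0  # assumed decision threshold of 0.5
print(f"P(y = 1) = {p:.3f}, predicted class = {label}")
```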

3. What are Recommender Systems?

Answer:

A recommender system, or recommendation system, is a subclass of information filtering system that seeks to predict the "rating" or "preference" a user would give to an item. Recommender systems are one of the most successful and widespread applications of machine learning technologies in business.

4. Explain cross-validation.

Answer:

Cross-validation is a model validation technique for evaluating how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to set aside a data set on which to test the model during the training phase (i.e. a validation data set) in order to limit problems like overfitting and to gain insight into how the model will generalize to an independent data set.
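
One common variant is k-fold cross-validation, sketched below under simple assumptions: the data are split into k contiguous folds, and each fold serves once as the validation set while the rest is used for training. The helper `k_fold_splits` is hypothetical, and in practice the indices would usually be shuffled first.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


splits = list(k_fold_splits(10, 3))
for train, validation in splits:
    print("train:", train, "validation:", validation)
```

A model would be fitted on each `train` part and scored on the matching `validation` part; averaging the k scores gives the estimate of how well it generalizes.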

5. What is Collaborative Filtering?

Answer:

Collaborative filtering is the technique used by most recommender systems. It finds patterns by pooling the viewpoints of many users, data sources, and agents, and uses the preferences of similar users to predict what a given user will like.
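
The sketch below shows one simple flavor, user-based collaborative filtering with cosine similarity. This particular choice, the toy `ratings` table, and the helper names are assumptions made for illustration. A missing rating is predicted as the similarity-weighted average of the ratings given by other users.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2, "D": 5},
    "cat": {"A": 1, "B": 2, "C": 5, "D": 1},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += sim
    return num / den if den else None


pred = predict("ann", "D")
print(f"Predicted rating of item D for ann: {pred:.2f}")
```

Because ann's tastes are closer to bob's than to cat's, bob's high rating of item D carries more weight in the prediction.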

6. Do gradient descent methods always converge to the same point?

Answer:

No, they do not. In some cases they reach a local minimum (a local optimum) rather than the global optimum. Where they end up depends on the data and the starting conditions.
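
To make this concrete, here is a small sketch that runs plain gradient descent on a non-convex function from two starting points. The function f(x) = x^4 - 3x^2 + x, the learning rate, and the step count are arbitrary choices for illustration.

```python
def f(x):
    """A non-convex function with two minima of different depth."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(start, learning_rate=0.01, steps=1000):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x


left = gradient_descent(-2.0)
right = gradient_descent(2.0)
print(f"start -2.0 -> x = {left:.3f}, f(x) = {f(left):.3f}")
print(f"start  2.0 -> x = {right:.3f}, f(x) = {f(right):.3f}")
```

The run started at -2.0 settles near x = -1.30, which is the global minimum, while the run started at 2.0 settles near x = 1.13, a shallower local minimum.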

7. What is the goal of A/B Testing?

Answer:

A/B testing is statistical hypothesis testing for a randomized experiment with two variants, A and B. The objective of A/B testing is to determine whether a change to a web page improves an outcome of interest, such as the conversion rate.
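
One common way to analyze such an experiment is a two-proportion z-test on conversion rates, sketched below. The choice of test, the visitor and conversion counts, and the function name are assumptions for illustration; other tests may suit other metrics.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 5,000 visitors per variant.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests that the difference between the two variants is unlikely to be due to chance alone.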

8. What are the drawbacks of the linear model?

Answer:

Some drawbacks of the linear model are:

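It assumes a linear relationship between the features and the target, and that the errors are independent with constant variance.
It can overfit, and on its own it offers no way to address overfitting.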

9. What is the Law of Large Numbers?

Answer:

The law of large numbers is a principle of probability according to which the frequencies of events with the same likelihood of occurrence even out, given enough trials or instances. As the number of experiments increases, the actual ratio of outcomes will converge on the theoretical, or expected, ratio of outcomes.

For example, if a fair coin (where heads and tails come up equally often) is tossed 1,000,000 times, about half of the tosses will come up heads, and half will come up tails. The heads-to-tails ratio will be extremely close to 1:1. However, if the same coin is tossed only 10 times, the ratio will likely not be 1:1, and in fact might come out very different, say 3:7 or even 0:10.

The law of large numbers is sometimes called the law of averages and is mistakenly applied to situations with too few trials or instances for it to hold. This error in logic is known as the gambler’s fallacy.

If, for example, someone tosses a fair coin and gets several heads in a row, that person might think that the next toss is more likely to come up tails than heads because they expect frequencies of outcomes to become equal. But, because each coin toss is an independent event, the true probabilities of the two outcomes are still equal for the next coin toss and any coin toss that might follow.

Nevertheless, if the coin is tossed enough times, because the probability of either outcome is the same, the law of large numbers comes into play and the proportions of heads and tails will be close to equal.
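
A quick simulation illustrates the effect. The sketch below tosses a simulated fair coin for several sample sizes and reports the fraction of heads. The seed and the sample sizes are arbitrary choices, and the exact numbers depend on the random number generator.

```python
import random


def heads_fraction(n_tosses, rng):
    """Toss a fair coin n_tosses times and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


rng = random.Random(42)  # fixed seed so that the run is repeatable
fractions = {n: heads_fraction(n, rng) for n in (10, 1_000, 100_000)}
for n, frac in fractions.items():
    print(f"{n:>7} tosses: fraction of heads = {frac:.4f}")
```

With only 10 tosses the fraction can be far from 0.5, while with 100,000 tosses it is typically within a fraction of a percentage point of 0.5.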

10. How regularly must an algorithm be updated?

Answer:

You will want to update an algorithm when:

You want the model to evolve as data streams through infrastructure
The underlying data source is changing
The data are non-stationary, that is, their statistical properties change over time

ML-AI Buzz

Tuesday, January 1, 2019
