True or false: Non-linearity helps train your model much faster and more accurately, without losing important information.

False

During training, each additional layer in your network can successively reduce the signal-to-noise ratio. How can we fix this?

Use non-saturating, linear activation functions.

Use non-saturating, nonlinear activation functions such as ReLUs.

Use sigmoid or tanh activation functions.

None of the above

How can we stop ReLU layers from dying?

Smaller batch sizes

Batch normalization

Weight regularization

Lower your learning rates

Which activation function is linear in the positive domain and 0 in the negative domain?

Sigmoid

Tanh

ReLU

None of the above

How can we solve the problem called internal covariate shift?