Problems with ReLU include
slower training
dying gradients for positive inputs
dying gradients for negative inputs
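For reference, here is a minimal NumPy sketch (not part of the original exercise) that prints ReLU and its gradient on a few sample inputs, so you can reason about the options above; the helper names relu and relu_grad are just illustrative.

import numpy as np

def relu(z):
    # ReLU output: max(0, z)
    return np.maximum(0.0, z)

def relu_grad(z):
    # Derivative of ReLU: 0 for z < 0, 1 for z > 0
    return (z > 0).astype(float)

z = np.array([-3.0, -0.5, 0.5, 3.0])
print(relu(z))       # [0.  0.  0.5 3. ]
print(relu_grad(z))  # [0. 0. 1. 1.]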
1 Training Deep Neural Networks - Deep Neural Networks
2 MCQ - The backpropagation algorithm works by going from the output layer to the input layer, propagating the error gradient along the way.
3 MCQ - The value of sigmoid is between 0 and 0.25.
4 MCQ - The output of the derivative of the Sigmoid function is always between 0 and 0.25.
5 MCQ - Problems caused by gradient descent are
6 MCQ - If the input is large on the positive or negative axis, the Sigmoid function saturates at 0 or 1 and its derivative becomes extremely close to 0
7 MCQ - Early layers are responsible for detecting simple patterns and are the building blocks of the neural network, so it is important that the early layers are accurate
8 Training Deep Neural Networks - Activation Functions
9 MCQ - Problems with ReLU include
10 MCQ - Which of the following is smooth everywhere including around z = 0?
11 MCQ - ReLU has a nonzero gradient for z < 0, which avoids the dying units issue
12 MCQ - With the SELU activation function, even a 100-layer deep neural network preserves roughly mean 0 and standard deviation 1 across all layers, avoiding the exploding/vanishing gradients problem
13 Training Deep Neural Networks - Batch Normalization
14 MCQ - Batch Normalization helps in
15 MCQ - In batch normalization, we standardize the hidden layer inputs for each mini-batch using a common mean and variance.
16 MCQ - In batch normalization, the neural network makes slower predictions due to the extra computations required at each layer.
17 Training Deep Neural Networks - Gradient Clipping
18 MCQ - We can reduce the exploding gradients problem by clipping the gradients during backpropagation so that they never exceed some threshold.
19 MCQ - Using optimizer = keras.optimizers.SGD(clipvalue=1.0),
20 MCQ - If you want to ensure that gradient clipping does not change the direction of the gradient vector, you should clip by norm by setting clipnorm instead of clipvalue (see the clipping sketch after this list)
21 Training Deep Neural Networks - Reusing Pretrained Layers
22 MCQ - Which of the following hyperparameters could you not fine-tune in transfer learning?
23 MCQ - Transfer learning is preferable because
24 MCQ - In transfer learning, if the input pictures in our task do not have the same size as the ones in the existing network
25 MCQ - The deep learning models from Keras Applications
26 MCQ - Using autoencoders rather than RBMs (Restricted Boltzmann Machines) is still a good option when we have a complex task to solve and no similar pretrained model is available.
27 Training Deep Neural Networks - Faster Optimizers (Part I)
28 MCQ - Which of the following are ways to speed up training?
29 MCQ - Momentum optimization does not care about what previous gradients were, so gradient descent is better at converging faster.
30 MCQ - To simulate some sort of friction mechanism and prevent the momentum from growing too large, the algorithm introduces a new hyperparameter β, simply called the momentum, which must be set between 0 (high friction) and 1 (no friction).
31 MCQ - Momentum optimization can help roll past local optima.
32 MCQ - In deep neural networks that don’t use Batch Normalization, the upper layers will often end up having inputs with very different scales, so using Momentum optimization helps a lot.
33 MCQ - Momentum optimization doesn't have any hyperparameter to tune
34 MCQ - Nesterov Optimizer
35 MCQ - AdaGrad often performs well for simple quadratic problems, but unfortunately it often stops too early when training neural networks
36 Training Deep Neural Networks - Faster Optimizers (Part II)
37 MCQ - The RMSProp algorithm accumulates only the gradients from the most recent iterations, as opposed to all the gradients since the beginning of training.
38 MCQ - Adam, which stands for adaptive moment estimation, combines the ideas of Momentum optimization and RMSProp
39 MCQ - We could apply L1 regularization during training, which will reduce some weights to zero.
40 MCQ - If the learning rate is set slightly too high, convergence occurs very fast at the optimal point.
41 MCQ - Which of the following is/are adaptive learning rate optimization algorithm(s)?
42 MCQ - Measure the validation error every N steps, just like for early stopping, and reduce the learning rate by a factor of λ when the error stops dropping.
43 MCQ - Since AdaGrad, RMSProp, and Adam optimization do not automatically reduce the learning rate during training, it is not necessary to add an extra learning schedule.
44 Training Deep Neural Networks - Regularization
45 MCQ - In early stopping, we stop training as soon as the validation error reaches a maximum.
46 MCQ - Is dropout used on the test set?
47 MCQ - Using a dropout
48 MCQ - In data augmentation,
49 MCQ - It is preferable to generate new images on the fly during training rather than wasting storage space and network bandwidth
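As a companion to exercises 19 and 20 above, here is a minimal Keras sketch of the two gradient-clipping options; it assumes TensorFlow 2.x is installed, and the toy model is purely illustrative.

from tensorflow import keras

# Clip each gradient component to [-1.0, 1.0]; this can change the direction
# of the gradient vector.
optimizer_by_value = keras.optimizers.SGD(clipvalue=1.0)

# Clip the whole gradient vector so its L2 norm never exceeds 1.0; this
# preserves the gradient's direction.
optimizer_by_norm = keras.optimizers.SGD(clipnorm=1.0)

# Hypothetical toy model, only to show where the optimizer is plugged in.
model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer=optimizer_by_norm, loss="mse")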