lasagne.init

Functions to create initializers for parameter variables.

Examples
>>> from lasagne.layers import DenseLayer
>>> from lasagne.init import Constant, GlorotUniform
>>> l1 = DenseLayer((100,20), num_units=50,
... W=GlorotUniform('relu'), b=Constant(0.0))
Initializers
Constant([val]) : Initialize weights with a constant value.
Normal([std, mean]) : Sample initial weights from the Gaussian distribution.
Uniform([range, std, mean]) : Sample initial weights from the uniform distribution.
Glorot(initializer[, gain, c01b]) : Glorot weight initialization.
GlorotNormal([gain, c01b]) : Glorot with weights sampled from the Normal distribution.
GlorotUniform([gain, c01b]) : Glorot with weights sampled from the Uniform distribution.
He(initializer[, gain, c01b]) : He weight initialization.
HeNormal([gain, c01b]) : He initializer with weights sampled from the Normal distribution.
HeUniform([gain, c01b]) : He initializer with weights sampled from the Uniform distribution.
Orthogonal([gain]) : Initialize weights as an orthogonal matrix.
Sparse([sparsity, std]) : Initialize weights as a sparse matrix.
Detailed description
class lasagne.init.Initializer

Base class for parameter tensor initializers.

The Initializer class represents a weight initializer used to initialize weight parameters in a neural network layer. It should be subclassed when implementing new types of weight initializers.
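A minimal sketch of what such a subclass can look like. The ScaledUniform name and its scale parameter are hypothetical (not part of lasagne.init); the sketch assumes the convention used by the built-in initializers, where a subclass overrides a sample(shape) method that returns an array of the requested shape, and uses lasagne.utils.floatX to cast to the configured float type.

>>> import numpy as np
>>> from lasagne.init import Initializer
>>> from lasagne.utils import floatX
>>> class ScaledUniform(Initializer):
...     """Hypothetical example: uniform weights in [-scale, scale]."""
...     def __init__(self, scale=0.05):
...         self.scale = scale
...     def sample(self, shape):
...         # subclasses override sample(shape) and return an array of that shape
...         return floatX(np.random.uniform(-self.scale, self.scale, size=shape))
>>> ScaledUniform(scale=0.1).sample((3, 4)).shape
(3, 4)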
class lasagne.init.Constant(val=0.0)

Initialize weights with a constant value.

Parameters:
val : float
    Constant value for weights.
class lasagne.init.Normal(std=0.01, mean=0.0)

Sample initial weights from the Gaussian distribution.

Initial weight parameters are sampled from N(mean, std).

Parameters:
std : float
    Standard deviation of the initial parameters.
mean : float
    Mean of the initial parameters.
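A quick sanity check of the sampling behaviour, assuming the sample(shape) interface provided by the built-in initializers:

>>> from lasagne.init import Normal
>>> W = Normal(std=0.01, mean=0.0).sample((500, 500))
>>> W.shape
(500, 500)
>>> abs(float(W.mean())) < 1e-3 and abs(float(W.std()) - 0.01) < 1e-3
True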
class lasagne.init.Uniform(range=0.01, std=None, mean=0.0)

Sample initial weights from the uniform distribution.

Parameters are sampled from U(a, b).

Parameters:
range : float or tuple
    When std is None, range determines a and b. If range is a float, the weights are sampled from U(-range, range). If range is a tuple, the weights are sampled from U(range[0], range[1]).
std : float or None
    If std is a float, the weights are sampled from U(mean - np.sqrt(3) * std, mean + np.sqrt(3) * std).
mean : float
    See std for description.
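To illustrate the parameter modes described above (again assuming the sample(shape) interface):

>>> import numpy as np
>>> from lasagne.init import Uniform
>>> W1 = Uniform(range=0.05).sample((100, 50))          # U(-0.05, 0.05)
>>> W2 = Uniform(range=(0.0, 1.0)).sample((100, 50))    # U(0, 1)
>>> W3 = Uniform(std=0.01, mean=0.0).sample((100, 50))  # bounds from mean +/- sqrt(3)*std
>>> bool(W1.min() >= -0.05 and W1.max() <= 0.05 and W2.min() >= 0.0)
True
>>> abs(float(W3.std()) - 0.01) < 2e-3
True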
class lasagne.init.Glorot(initializer, gain=1.0, c01b=False)

Glorot weight initialization.

This is also known as Xavier initialization [R4].

Parameters:
initializer : lasagne.init.Initializer
    Initializer used to sample the weights; must accept std in its constructor to sample from a distribution with a given standard deviation.
gain : float or 'relu'
    Scaling factor for the weights. Set this to 1.0 for linear and sigmoid units, to 'relu' or sqrt(2) for rectified linear units, and to sqrt(2/(1+alpha**2)) for leaky rectified linear units with leakiness alpha. Other transfer functions may need different factors.
c01b : bool
    For a lasagne.layers.cuda_convnet.Conv2DCCLayer constructed with dimshuffle=False, c01b must be set to True to compute the correct fan-in and fan-out.

See also:
GlorotNormal : Shortcut with Gaussian initializer.
GlorotUniform : Shortcut with uniform initializer.

Notes

For a DenseLayer, if gain='relu' and initializer=Uniform, the weights are initialized as

\[\begin{split}a &= \sqrt{\frac{12}{fan_{in}+fan_{out}}}\\ W &\sim U[-a, a]\end{split}\]

If gain=1 and initializer=Normal, the weights are initialized as

\[\begin{split}\sigma &= \sqrt{\frac{2}{fan_{in}+fan_{out}}}\\ W &\sim N(0, \sigma)\end{split}\]

References

[R4] Xavier Glorot and Yoshua Bengio (2010): Understanding the difficulty of training deep feedforward neural networks. International conference on artificial intelligence and statistics.
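The first formula in the Notes can be checked numerically; a hedged sketch using the GlorotUniform shortcut, where gain='relu' corresponds to a factor of sqrt(2):

>>> import numpy as np
>>> from lasagne.init import GlorotUniform
>>> fan_in, fan_out = 100, 50
>>> W = GlorotUniform(gain='relu').sample((fan_in, fan_out))
>>> a = np.sqrt(12.0 / (fan_in + fan_out))  # bound a from the Notes above
>>> bool(np.all(np.abs(W) <= a))
True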
class lasagne.init.GlorotNormal(gain=1.0, c01b=False)

Glorot with weights sampled from the Normal distribution.

See Glorot for a description of the parameters.
class lasagne.init.GlorotUniform(gain=1.0, c01b=False)

Glorot with weights sampled from the Uniform distribution.

See Glorot for a description of the parameters.
class lasagne.init.He(initializer, gain=1.0, c01b=False)

He weight initialization.

Weights are initialized with a standard deviation of \(\sigma = gain \sqrt{\frac{1}{fan_{in}}}\) [R5].

Parameters:
initializer : lasagne.init.Initializer
    Initializer used to sample the weights; must accept std in its constructor to sample from a distribution with a given standard deviation.
gain : float or 'relu'
    Scaling factor for the weights. Set this to 1.0 for linear and sigmoid units, to 'relu' or sqrt(2) for rectified linear units, and to sqrt(2/(1+alpha**2)) for leaky rectified linear units with leakiness alpha. Other transfer functions may need different factors.
c01b : bool
    For a lasagne.layers.cuda_convnet.Conv2DCCLayer constructed with dimshuffle=False, c01b must be set to True to compute the correct fan-in and fan-out.

References

[R5] Kaiming He et al. (2015): Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. arXiv preprint arXiv:1502.01852.
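A hedged numerical check of the formula above, using the HeNormal shortcut. With gain='relu' (i.e. a factor of sqrt(2)) the target standard deviation is sqrt(2 / fan_in); for a 2D weight shape, fan_in corresponds to the first dimension.

>>> import numpy as np
>>> from lasagne.init import HeNormal
>>> fan_in, fan_out = 256, 128
>>> W = HeNormal(gain='relu').sample((fan_in, fan_out))
>>> sigma = np.sqrt(2.0 / fan_in)  # sigma = gain * sqrt(1 / fan_in)
>>> abs(float(W.std()) - sigma) < 0.01
True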
class lasagne.init.HeNormal(gain=1.0, c01b=False)

He initializer with weights sampled from the Normal distribution.

See He for a description of the parameters.
class lasagne.init.HeUniform(gain=1.0, c01b=False)

He initializer with weights sampled from the Uniform distribution.

See He for a description of the parameters.
class lasagne.init.Orthogonal(gain=1.0)

Initialize weights as an orthogonal matrix.

Orthogonal matrix initialization [R6]. For n-dimensional shapes where n > 2, the n-1 trailing axes are flattened. For convolutional layers, this corresponds to the fan-in, so this makes the initialization usable for both dense and convolutional layers.

Parameters:
gain : float or 'relu'
    Scaling factor for the weights. Set this to 1.0 for linear and sigmoid units, to 'relu' or sqrt(2) for rectified linear units, and to sqrt(2/(1+alpha**2)) for leaky rectified linear units with leakiness alpha. Other transfer functions may need different factors.

References

[R6] Saxe, Andrew M., James L. McClelland, and Surya Ganguli (2013): Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120.
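A hedged check that, for a square shape and gain=1.0, the sampled matrix is indeed approximately orthogonal (W.T W close to the identity):

>>> import numpy as np
>>> from lasagne.init import Orthogonal
>>> W = Orthogonal(gain=1.0).sample((64, 64))
>>> bool(np.allclose(W.T.dot(W), np.eye(64), atol=1e-4))
True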