1. Batch Normalization
batchnorm_forward ( x, gamma, beta, bn_param )
input
- x ; Data of shape (N, D)
- gamma ; Scale parameter of shape (D,)
- beta ; Shift parameter of shape (D,)
- bn_param
mode ; 'train' or 'test'.
eps ; constant for numeric stability.
momentum ; constant for running mean / variance.
running_mean ; giving running mean of features
running_var ; giving running variance of features
returns a tuple of
- out ; Output data, of shape (N, D)
- cache ; tuple of values needed in the backward pass
'''
At each timestep we update the running averages for mean and variance using an exponential decay based on the momentum parameter.
running_mean = momentum * running_mean + (1 - momentum) * sample_mean
running_var = momentum * running_var + (1 - momentum) * sample_var
'''
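A minimal sketch of batchnorm_forward under these conventions (assuming x of shape (N, D); illustrative, not the assignment's reference solution):

import numpy as np

def batchnorm_forward(x, gamma, beta, bn_param):
    # Minimal sketch; variable names are illustrative.
    mode = bn_param['mode']
    eps = bn_param.get('eps', 1e-5)
    momentum = bn_param.get('momentum', 0.9)
    N, D = x.shape
    running_mean = bn_param.get('running_mean', np.zeros(D, dtype=x.dtype))
    running_var = bn_param.get('running_var', np.zeros(D, dtype=x.dtype))

    if mode == 'train':
        sample_mean = x.mean(axis=0)                   # (D,)
        sample_var = x.var(axis=0)                     # (D,)
        x_norm = (x - sample_mean) / np.sqrt(sample_var + eps)
        out = gamma * x_norm + beta
        # Exponential moving averages, exactly as in the formulas above.
        running_mean = momentum * running_mean + (1 - momentum) * sample_mean
        running_var = momentum * running_var + (1 - momentum) * sample_var
        cache = (x_norm, gamma, beta, sample_mean, sample_var, x, eps)
    else:                                              # 'test': use the stored statistics
        out = gamma * (x - running_mean) / np.sqrt(running_var + eps) + beta
        cache = None

    bn_param['running_mean'] = running_mean
    bn_param['running_var'] = running_var
    return out, cache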
batchnorm_backward(dout, cache)
- dout ; Upstream derivatives
- cache ; Variable of intermediates from batchnorm_forward.
Returns a tuple of;
- dx ; Gradient with respect to inputs x, of shape (N, D)
- dgamma ; Gradient with respect to scale parameter gamma
- dbeta ; Gradient with respect to shift parameter beta
"Backward pass for batxch noramlization."
앞서 저장한 cache 에서 x_noramlized, gamma, beta, sample_mean , sample_var, x , eps 를 꺼내온다.
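A minimal sketch of this backward pass, assuming the cache is stored in exactly that order (the order depends on how batchnorm_forward was written):

import numpy as np

def batchnorm_backward(dout, cache):
    # Assumes cache = (x_normalized, gamma, beta, sample_mean, sample_var, x, eps).
    x_normalized, gamma, beta, sample_mean, sample_var, x, eps = cache
    N = x.shape[0]

    dbeta = np.sum(dout, axis=0)
    dgamma = np.sum(dout * x_normalized, axis=0)

    # Walk back through the computation graph of the forward pass.
    dx_norm = dout * gamma
    std_inv = 1.0 / np.sqrt(sample_var + eps)
    dvar = np.sum(dx_norm * (x - sample_mean), axis=0) * -0.5 * std_inv ** 3
    dmean = np.sum(-dx_norm * std_inv, axis=0) + dvar * np.mean(-2.0 * (x - sample_mean), axis=0)
    dx = dx_norm * std_inv + dvar * 2.0 * (x - sample_mean) / N + dmean / N

    return dx, dgamma, dbeta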
batchnorm_backward_alt(dout, cache)
#code and describe
import numpy as np

def batchnorm_backward_alt(dout, cache):
    """
    Alternative backward pass for batch normalization.
    For this implementation you should work out the derivatives for the batch
    normalization backward pass on paper and simplify as much as possible. You
    should be able to derive a simple expression for the backward pass.
    See the jupyter notebook for more hints.
    Note: This implementation should expect to receive the same cache variable
    as batchnorm_backward, but might not use all of the values in the cache.
    Inputs / outputs: Same as batchnorm_backward
    """
    dx, dgamma, dbeta = None, None, None
    ###########################################################################
    # TODO: Implement the backward pass for batch normalization. Store the    #
    # results in the dx, dgamma, and dbeta variables.                         #
    #                                                                         #
    # After computing the gradient with respect to the centered inputs, you   #
    # should be able to compute gradients with respect to the inputs in a     #
    # single statement; our implementation fits on a single 80-character line.#
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    # Cache layout here: gamma, normalized x, centered x, and 1/sqrt(var + eps).
    gamma, x_norm, x_mu, var_inv = cache
    N = x_mu.shape[0]
    dgamma = np.sum(dout * x_norm, axis=0)
    dbeta = np.sum(dout, axis=0)
    # Gradients with respect to the batch variance and mean.
    dvar = np.sum(dout * gamma * x_mu, axis=0) * -0.5 * (var_inv ** 3)
    dmu = np.sum(dout * gamma * -var_inv, axis=0) + dvar * -2 * np.mean(x_mu, axis=0)
    # Combine the three paths (normalized, variance, mean) into dx.
    dx = dout * gamma * var_inv + dvar * (2 / N) * x_mu + dmu * (1 / N)
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                            END OF YOUR CODE                             #
    ###########################################################################
    return dx, dgamma, dbeta
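Collapsing dvar and dmu algebraically yields the usual single-line expression for dx; reusing the names from the code above, the result is equivalent to:

# dx in a single statement, reusing dgamma and dbeta computed above:
dx = (gamma * var_inv / N) * (N * dout - dbeta - x_norm * dgamma)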
layernorm_forward
input ; x , gamma, beta , ln_param
- x ; Data
- gamma ; Scale parameter
- beta ; Shift parameter
- ln_param ; eps -> Constant for numeric stability
Note that in contrast to batch normalization, the behavior of layer normalization is identical during training and at test time,
so we do not need to keep track of running averages of any sort.
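A minimal sketch, assuming x of shape (N, D) and gamma, beta of shape (D,); not the reference solution:

import numpy as np

def layernorm_forward(x, gamma, beta, ln_param):
    eps = ln_param.get('eps', 1e-5)
    # Statistics are taken per sample (over the feature axis), so no running averages.
    mean = x.mean(axis=1, keepdims=True)               # (N, 1)
    var = x.var(axis=1, keepdims=True)                 # (N, 1)
    x_norm = (x - mean) / np.sqrt(var + eps)
    out = gamma * x_norm + beta
    cache = (x, x_norm, mean, var, gamma, eps)
    return out, cache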
layernorm_backward
input ; dout, cache
- dout ; Upstream derivatives
- cache ; Variable of intermediates from layernorm_forward
output ; dx , dgamma , dbeta
- dx ; Gradient with respect to inputs, x
- dgamma ; Gradient with respect to scale parameter gamma
- dbeta ; Gradient with respect to shift parameter beta
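A matching backward sketch, using the cache layout of the forward sketch above; the algebra is the same as for batchnorm, but reduced over the feature axis of each sample:

import numpy as np

def layernorm_backward(dout, cache):
    # Assumes the cache layout of the layernorm_forward sketch above.
    x, x_norm, mean, var, gamma, eps = cache
    N, D = x.shape

    dbeta = np.sum(dout, axis=0)
    dgamma = np.sum(dout * x_norm, axis=0)

    dx_norm = dout * gamma                             # (N, D)
    std_inv = 1.0 / np.sqrt(var + eps)                 # (N, 1)
    dx = (std_inv / D) * (D * dx_norm
                          - np.sum(dx_norm, axis=1, keepdims=True)
                          - x_norm * np.sum(dx_norm * x_norm, axis=1, keepdims=True))
    return dx, dgamma, dbeta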
dropout_forward
input ; x, dropout_param
- x ; Input data
- dropout_param ;
- p ; Dropout parameter
- mode ; 'test' or 'train' ; dropout is applied only in 'train' mode and is a no-op at test time
- seed ; Seed for the random number generator
output ; out, cache
- out ; Array of the same shape as x
- cache ; tuple of (dropout_param, mask)
  in 'train' mode, mask is the dropout mask that was multiplied with the input; in 'test' mode, mask is None
dropout_backward
input ; dout , cache
- dout ; Upstream derivatives
- cache ; (dropout_param, mask)
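A minimal inverted-dropout sketch, assuming p is the probability of keeping a unit (some versions of the assignment define p as the drop probability instead); not the reference solution:

import numpy as np

def dropout_forward(x, dropout_param):
    # Inverted dropout: scale at train time so test time is a plain identity.
    p, mode = dropout_param['p'], dropout_param['mode']
    if 'seed' in dropout_param:
        np.random.seed(dropout_param['seed'])

    mask = None
    if mode == 'train':
        mask = (np.random.rand(*x.shape) < p) / p      # keep with probability p, rescale by 1/p
        out = x * mask
    else:                                              # 'test': identity
        out = x
    return out, (dropout_param, mask)

def dropout_backward(dout, cache):
    dropout_param, mask = cache
    if dropout_param['mode'] == 'train':
        return dout * mask                             # gradient flows only through kept units
    return dout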
conv_forward_naive
"""
A naive implementation of the forward pass for a convolutional layer.
The input consists of N data points, each with C channels, height H and
width W. We convolve each input with F different filters, where each filter
spans all C channels and has height HH and width WW.
Input:
- x: Input data of shape (N, C, H, W)
- w: Filter weights of shape (F, C, HH, WW)
- b: Biases, of shape (F,)
- conv_param: A dictionary with the following keys:
- 'stride': The number of pixels between adjacent receptive fields in the
horizontal and vertical directions.
- 'pad': The number of pixels that will be used to zero-pad the input.
During padding, 'pad' zeros should be placed symmetrically (i.e., equally on both sides)
along the height and width axes of the input. Be careful not to modify the original
input x directly.
Returns a tuple of:
- out: Output data, of shape (N, F, H', W') where H' and W' are given by
H' = 1 + (H + 2 * pad - HH) / stride
W' = 1 + (W + 2 * pad - WW) / stride
- cache: (x, w, b, conv_param)
"""
import numpy as np

out = None
###########################################################################
# TODO: Implement the convolutional forward pass.                         #
# Hint: you can use the function np.pad for padding.                      #
###########################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
N, C, H, W = x.shape
F, C, HH, WW = w.shape
stride = conv_param['stride']
pad = conv_param['pad']
out_h = 1 + (H + 2 * pad - HH) // stride
out_w = 1 + (W + 2 * pad - WW) // stride

# Padding: zero-pad the height and width axes only, leaving x itself untouched.
x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
out = np.zeros((N, F, out_h, out_w))

for sample in range(N):
    for fil in range(F):
        for h in range(out_h):
            start_h = stride * h
            end_h = stride * h + HH
            for wi in range(out_w):
                start_w = stride * wi
                end_w = stride * wi + WW
                # Dot product between the filter and the current receptive field.
                x_conv = x_pad[sample, :, start_h:end_h, start_w:end_w]
                out[sample, fil, h, wi] = np.sum(x_conv * w[fil]) + b[fil]
'''
The input consists of N data points, each with C channels, height H and width W.
We convolve each input with F different filters,
where each filter spans all C channels and has height HH and width WW.
'''
input ; x , w , b , conv_param
x ; input data
w ; filter weights
b ; Biases
conv_param ;
'stride' ; The number of pixels between adjacent receptive fields in the horizontal and vertical directions.
'pad' ; The number of pixels that will be used to zero-pad the input.
During padding, 'pad' zeros should be placed symmetrically along the height and width axes of the input.
output ;
out ; output data
H' = 1 + (H + 2 * pad - HH) / stride
W' = 1 + (W + 2 * pad - WW) / stride
cache ; (x, w, b, conv_param)
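As a quick sanity check of the shape formula (numbers chosen only for illustration): with H = W = 32, HH = WW = 3, pad = 1 and stride = 1, H' = W' = 1 + (32 + 2*1 - 3) / 1 = 32, so the spatial size is preserved; with stride = 2 the same input gives 1 + (32 + 2 - 3) // 2 = 16.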
conv_backward_naive
input
dout ; Upstream derivatives.
cache ; Tuple of (x, w, b, conv_param) as in conv_forward_naive
output
dx ; Gradient with respect to x
dw ; Gradient with respect to w
db ; Gradient with respect to bias
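A naive backward sketch that mirrors the forward loops above (illustrative only, not the reference solution):

import numpy as np

def conv_backward_naive(dout, cache):
    x, w, b, conv_param = cache
    stride, pad = conv_param['stride'], conv_param['pad']
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    _, _, out_h, out_w = dout.shape

    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
    dx_pad = np.zeros_like(x_pad)
    dw = np.zeros_like(w)
    db = np.sum(dout, axis=(0, 2, 3))                  # each filter's bias touches every output

    for n in range(N):
        for f in range(F):
            for i in range(out_h):
                hs = i * stride
                for j in range(out_w):
                    ws = j * stride
                    window = x_pad[n, :, hs:hs + HH, ws:ws + WW]
                    dw[f] += window * dout[n, f, i, j]
                    dx_pad[n, :, hs:hs + HH, ws:ws + WW] += w[f] * dout[n, f, i, j]

    dx = dx_pad[:, :, pad:pad + H, pad:pad + W]        # strip the padding
    return dx, dw, db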
max_pool_forward_naive
input
x ; data
pool_param ; dictionary
- pool_height ; The height of each pooling region
- pool_width ; The width of each pooling region
- stride ; The distance between adjacent pooling regions
output
out ; output data
H' ; 1 + (H - pool_height) / stride
W' ; 1 + (W - pool_width) / stride
cache ; (x, pool_param)
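A minimal naive forward sketch (illustrative, not the reference solution):

import numpy as np

def max_pool_forward_naive(x, pool_param):
    N, C, H, W = x.shape
    ph, pw = pool_param['pool_height'], pool_param['pool_width']
    stride = pool_param['stride']
    out_h = 1 + (H - ph) // stride
    out_w = 1 + (W - pw) // stride

    out = np.zeros((N, C, out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        hs = i * stride
        for j in range(out_w):
            ws = j * stride
            # Max over each pooling window, vectorized over N and C.
            out[:, :, i, j] = x[:, :, hs:hs + ph, ws:ws + pw].max(axis=(2, 3))
    return out, (x, pool_param)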
max_pool_backward_naive
input
dout ; Upstream derivatives
cache ; Tuple of (x, pool_param), as in the forward pass
output
dx ; Gradient with respect to x
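A matching naive backward sketch: each upstream gradient is routed to the position that attained the max in the forward pass.

import numpy as np

def max_pool_backward_naive(dout, cache):
    x, pool_param = cache
    N, C, H, W = x.shape
    ph, pw = pool_param['pool_height'], pool_param['pool_width']
    stride = pool_param['stride']
    _, _, out_h, out_w = dout.shape

    dx = np.zeros_like(x)
    for n in range(N):
        for c in range(C):
            for i in range(out_h):
                hs = i * stride
                for j in range(out_w):
                    ws = j * stride
                    window = x[n, c, hs:hs + ph, ws:ws + pw]
                    mask = (window == window.max())    # 1 at the max position(s)
                    dx[n, c, hs:hs + ph, ws:ws + pw] += mask * dout[n, c, i, j]
    return dx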
spatial_batchnorm_forward
Computes the forward pass for spatial batch normalization
input ; x, gamma, beta, bn_param
* momentum : Constant for running mean / variance.
momentum = 0 means that old information is discarded completely at every time step, while
momentum = 1 means that new information is never incorporated.
The default of momentum = 0.9 should work well in most situations.
running_mean ; Array of shape (D, ) giving running mean of features.
running_var ; Array of shape (D, ) giving running variance of features.
output
out ; output data
cache ; values needed for the backward pass
spatial_batchnorm_backward
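One common way to implement both spatial functions is to fold the spatial dimensions into the batch dimension and reuse the vanilla batchnorm_forward / batchnorm_backward pair sketched earlier; a hedged sketch:

import numpy as np

def spatial_batchnorm_forward(x, gamma, beta, bn_param):
    # Reshape (N, C, H, W) so vanilla batchnorm sees one row per spatial
    # location and one column per channel, then reshape back.
    N, C, H, W = x.shape
    x_flat = x.transpose(0, 2, 3, 1).reshape(-1, C)    # (N*H*W, C)
    out_flat, cache = batchnorm_forward(x_flat, gamma, beta, bn_param)
    out = out_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2)
    return out, cache

def spatial_batchnorm_backward(dout, cache):
    N, C, H, W = dout.shape
    dout_flat = dout.transpose(0, 2, 3, 1).reshape(-1, C)
    dx_flat, dgamma, dbeta = batchnorm_backward(dout_flat, cache)
    dx = dx_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2)
    return dx, dgamma, dbeta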
spatial_groupnorm_forward
In contrast to layer normalization, group normalization splits each entry in the data into G contiguous pieces,
which it then normalizes independently.
Per feature shifting and scaling are then applied to the data, in a manner identical to that of batch normalization and layer normalization.
input ; x, gamma, beta, G, gn_param
G ; Integer number of groups to split into, should be a divisor of C
output ; out, cache
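A minimal forward sketch, assuming gamma and beta broadcast against (N, C, H, W) (e.g. shape (1, C, 1, 1)); not the reference solution:

import numpy as np

def spatial_groupnorm_forward(x, gamma, beta, G, gn_param):
    # Normalize each of the G channel groups independently per sample.
    eps = gn_param.get('eps', 1e-5)
    N, C, H, W = x.shape
    x_group = x.reshape(N, G, C // G, H, W)
    mean = x_group.mean(axis=(2, 3, 4), keepdims=True)
    var = x_group.var(axis=(2, 3, 4), keepdims=True)
    x_norm = ((x_group - mean) / np.sqrt(var + eps)).reshape(N, C, H, W)
    out = gamma * x_norm + beta                        # per-channel scale and shift
    cache = (x, x_norm, mean, var, gamma, G, eps)
    return out, cache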
spatial_groupnorm_backward
## Instancenorm, Batchnorm, Layernorm, Groupnorm
Instance Normalization ; computes statistics from each image of the mini-batch on its own, so every individual image is normalized with its own distribution
Batch Normalization ; computes the mean and standard deviation over the batch
Layer Normalization ; normalizes across the neurons within the same layer
Group Normalization ; can be seen as a compromise between Layer Normalization and Instance Normalization
-> the channels are divided into groups, and each group is normalized separately
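The difference between the four schemes is easiest to see as the set of axes the statistics are averaged over; an illustrative NumPy sketch for an (N, C, H, W) tensor (shapes and G chosen arbitrarily):

import numpy as np

x = np.random.randn(8, 6, 4, 4)                        # (N, C, H, W)
G = 3                                                  # number of groups for group norm

bn_mean = x.mean(axis=(0, 2, 3))                       # batch norm: per channel, over the batch
ln_mean = x.mean(axis=(1, 2, 3))                       # layer norm: per sample, over all features
in_mean = x.mean(axis=(2, 3))                          # instance norm: per sample and channel
gn_mean = x.reshape(8, G, 6 // G, 4, 4).mean(axis=(2, 3, 4))  # group norm: per sample and group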