
cs231n_assignment_fullyconnectedlayer

Hardy. 2021. 5. 25. 21:22

1. Batch Normalization

 

batchnorm_forward(x, gamma, beta, bn_param)

input

- x ; Data of shape (N, D)

- gamma ; Scale parameter of shape (D,)

- beta ; Shift parameter of shape (D,)

- bn_param ; Dictionary with the following keys:

mode ; 'train' or 'test'.

eps ; Constant for numeric stability.

momentum ; Constant for running mean / variance.

running_mean ; Array of shape (D,) giving running mean of features

running_var ; Array of shape (D,) giving running variance of features

 

returns a tuple of

- out ; Output of shape (N, D)

- cache ; tuple of values needed in the backward pass

 

'''

At each timestep we update the running averages for mean and variance using an exponential decay based on the momentum parameter.

 

running_mean = momentum * running_mean + (1 - momentum) * sample_mean

running_var = momentum * running_var + (1 - momentum) * sample_var

'''
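As a rough sketch of how this fits together (my own variable names and cache layout, not the assignment's reference solution), the forward pass and the running-average update could look like this:

import numpy as np

def batchnorm_forward(x, gamma, beta, bn_param):
    # x: (N, D), gamma / beta: (D,)
    mode = bn_param['mode']
    eps = bn_param.get('eps', 1e-5)
    momentum = bn_param.get('momentum', 0.9)
    N, D = x.shape
    running_mean = bn_param.get('running_mean', np.zeros(D, dtype=x.dtype))
    running_var = bn_param.get('running_var', np.zeros(D, dtype=x.dtype))

    if mode == 'train':
        sample_mean = x.mean(axis=0)                              # (D,)
        sample_var = x.var(axis=0)                                # (D,)
        x_norm = (x - sample_mean) / np.sqrt(sample_var + eps)
        out = gamma * x_norm + beta
        # exponential decay of the running statistics
        running_mean = momentum * running_mean + (1 - momentum) * sample_mean
        running_var = momentum * running_var + (1 - momentum) * sample_var
        cache = (x_norm, gamma, beta, sample_mean, sample_var, x, eps)
    else:  # 'test': normalize with the running statistics instead
        x_norm = (x - running_mean) / np.sqrt(running_var + eps)
        out = gamma * x_norm + beta
        cache = None

    bn_param['running_mean'] = running_mean
    bn_param['running_var'] = running_var
    return out, cache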

 

batchnorm_backward(dout, cache)

 

- dout ; Upstream derivatives

- cache ; Variable of intermediates from batchnorm_forward.

 

Returns a tuple of;

 

- dx ; Gradient with respect to inputs x, of shape (N, D)

- dgamma ; Gradient with respect to scale parameter gamma

- dbeta ; Gradient with respect to shift parameter beta

 

 

 

"Backward pass for batxch noramlization."

 

From the cache saved in the forward pass, unpack x_normalized, gamma, beta, sample_mean, sample_var, x, and eps.
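A staged sketch of the backward pass under that cache layout (chain rule applied node by node through the normalization; variable names are mine):

import numpy as np

def batchnorm_backward(dout, cache):
    x_norm, gamma, beta, sample_mean, sample_var, x, eps = cache
    N = x.shape[0]

    dbeta = np.sum(dout, axis=0)
    dgamma = np.sum(dout * x_norm, axis=0)

    dx_norm = dout * gamma
    std_inv = 1.0 / np.sqrt(sample_var + eps)
    x_mu = x - sample_mean

    # gradients of the sample variance and sample mean
    dvar = np.sum(dx_norm * x_mu, axis=0) * -0.5 * std_inv ** 3
    dmu = np.sum(dx_norm * -std_inv, axis=0) + dvar * np.mean(-2.0 * x_mu, axis=0)

    dx = dx_norm * std_inv + dvar * 2.0 * x_mu / N + dmu / N
    return dx, dgamma, dbeta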

 

batchnorm_backward_alt(dout, cache)

 

#code and describe

 

def batchnorm_backward_alt(dout, cache):
    """
    Alternative backward pass for batch normalization.

    For this implementation you should work out the derivatives for the batch
    normalization backward pass on paper and simplify as much as possible. You
    should be able to derive a simple expression for the backward pass. 
    See the jupyter notebook for more hints.
     
    Note: This implementation should expect to receive the same cache variable
    as batchnorm_backward, but might not use all of the values in the cache.

    Inputs / outputs: Same as batchnorm_backward
    """
    dx, dgamma, dbeta = None, None, None
    ###########################################################################
    # TODO: Implement the backward pass for batch normalization. Store the    #
    # results in the dx, dgamma, and dbeta variables.                         #
    #                                                                         #
    # After computing the gradient with respect to the centered inputs, you   #
    # should be able to compute gradients with respect to the inputs in a     #
    # single statement; our implementation fits on a single 80-character line.#
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # cache from the forward pass: gamma, normalized x, centered x, 1/sqrt(var + eps)
    gamma, x_norm, x_mu, var_inv = cache
    N = x_mu.shape[0]

    dgamma = np.sum(dout * x_norm, axis=0)
    dbeta = np.sum(dout, axis=0)

    # gradients of the batch variance and mean, expressed with the cached terms
    dvar = np.sum(dout * gamma * x_mu, axis=0) * -0.5 * (var_inv ** 3)
    dmu = np.sum(dout * gamma * -var_inv, axis=0) + dvar * -2 * np.mean(x_mu, axis=0)

    dx = dout * gamma * var_inv + dvar * (2 / N) * x_mu + dmu * (1 / N)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################

    return dx, dgamma, dbeta

 

layernorm_forward

 

input ; x , gamma, beta , ln_param

 

- x ; Data

- gamma ; Scale parameter

- beta ; Shift parameter

- ln_param ; eps -> Constant for numeric stability

 

Note that in contrast to batch normalization, the behavior during train and test time for

layer normalization is identical, and we do not need to keep track of running averages of any sort.
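A minimal sketch of the forward pass (statistics per sample instead of per feature; the cache layout is my own):

import numpy as np

def layernorm_forward(x, gamma, beta, ln_param):
    # x: (N, D); each row is normalized with its own mean and variance
    eps = ln_param.get('eps', 1e-5)
    mu = x.mean(axis=1, keepdims=True)          # (N, 1)
    var = x.var(axis=1, keepdims=True)          # (N, 1)
    x_norm = (x - mu) / np.sqrt(var + eps)
    out = gamma * x_norm + beta                 # gamma, beta: (D,)
    cache = (x, x_norm, mu, var, gamma, eps)
    return out, cache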

 

layernorm_backward

 

input ; dout, cache

 

- dout ; Upstream derivatives

- cache ; Variable of intermediates from layernorm_forward

 

output ; dx , dgamma , dbeta

 

- dx ; Gradient with respect to inputs, x

- dgamma ; Gradient with respect to scale parameter gamma

- dbeta ; Gradient with respect to shift parameter beta
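A sketch of the matching backward pass, assuming the cache layout used in the forward sketch above; it is the same derivation as batchnorm_backward but with the sums taken along axis 1 and divided by D:

import numpy as np

def layernorm_backward(dout, cache):
    x, x_norm, mu, var, gamma, eps = cache
    D = x.shape[1]

    dbeta = np.sum(dout, axis=0)
    dgamma = np.sum(dout * x_norm, axis=0)

    dx_norm = dout * gamma                       # (N, D)
    std_inv = 1.0 / np.sqrt(var + eps)           # (N, 1)
    x_mu = x - mu

    # per-row gradients of the variance and mean
    dvar = np.sum(dx_norm * x_mu, axis=1, keepdims=True) * -0.5 * std_inv ** 3
    dmu = np.sum(dx_norm * -std_inv, axis=1, keepdims=True) \
          + dvar * np.mean(-2.0 * x_mu, axis=1, keepdims=True)
    dx = dx_norm * std_inv + dvar * 2.0 * x_mu / D + dmu / D

    return dx, dgamma, dbeta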

 

dropout_forward

 

input ; x, dropout_param

 

- x ; Input data

- dropout_param ; 

  - p ; Dropout parameter

  - mode ; 'test' or 'train'. Dropout is applied only in 'train' mode; in 'test' mode the input passes through unchanged.

  - seed ; Seed for the random number generator

 

output ; out, cache

- out ; Array of the same shape as x

- cache ; 

  - (dropout_param, mask) ; in 'train' mode, mask is the dropout mask used on the input; in 'test' mode, mask is None (a sketch follows below)
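A sketch of inverted dropout, assuming p is the keep probability; some versions of the assignment define p as the drop probability, in which case the mask would use 1 - p instead:

import numpy as np

def dropout_forward(x, dropout_param):
    p, mode = dropout_param['p'], dropout_param['mode']
    if 'seed' in dropout_param:
        np.random.seed(dropout_param['seed'])

    mask = None
    if mode == 'train':
        # inverted dropout: scale by 1/p at train time so test time is a no-op
        mask = (np.random.rand(*x.shape) < p) / p
        out = x * mask
    else:
        out = x

    cache = (dropout_param, mask)
    return out.astype(x.dtype, copy=False), cache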

 

dropout_backward

 

input ; dout , cache

  - dout ; Upstream derivatives

  - cache ; (dropout_param, mask)
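The backward pass only has to reuse the stored mask (sketch):

def dropout_backward(dout, cache):
    dropout_param, mask = cache
    if dropout_param['mode'] == 'train':
        dx = dout * mask    # gradient flows only through the kept units
    else:
        dx = dout
    return dx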

 

conv_forward_naive

 

def conv_forward_naive(x, w, b, conv_param):
    """

    A naive implementation of the forward pass for a convolutional layer.

    The input consists of N data points, each with C channels, height H and
    width W. We convolve each input with F different filters, where each filter
    spans all C channels and has height HH and width WW.

    Input:
    - x: Input data of shape (N, C, H, W)
    - w: Filter weights of shape (F, C, HH, WW)
    - b: Biases, of shape (F,)
    - conv_param: A dictionary with the following keys:
      - 'stride': The number of pixels between adjacent receptive fields in the
        horizontal and vertical directions.
      - 'pad': The number of pixels that will be used to zero-pad the input. 
        

    During padding, 'pad' zeros should be placed symmetrically (i.e., equally on both sides)
    along the height and width axes of the input. Be careful not to modify the original
    input x directly.

    Returns a tuple of:
    - out: Output data, of shape (N, F, H', W') where H' and W' are given by
      H' = 1 + (H + 2 * pad - HH) / stride
      W' = 1 + (W + 2 * pad - WW) / stride
    - cache: (x, w, b, conv_param)
    """
    out = None
    ###########################################################################
    # TODO: Implement the convolutional forward pass.                         #
    # Hint: you can use the function np.pad for padding.                      #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    #pass
    N, C, H, W = x.shape
    F, C, HH, WW = w.shape
    stride = conv_param['stride']
    pad = conv_param['pad']
    
    out_h = 1 + (H + 2 * pad - HH) // stride
    out_w = 1 + (W + 2 * pad - WW) // stride

 

    # Zero-pad the input symmetrically along the height and width axes,
    # leaving the original x untouched, and allocate the output volume.
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
    out = np.zeros((N, F, out_h, out_w))

    for sample in range(N):
        for fil in range(F):
            for h in range(out_h):
                start_h = stride * h
                end_h = start_h + HH
                for wi in range(out_w):
                    start_w = stride * wi
                    end_w = start_w + WW
                    # dot product between the filter and the current receptive field
                    x_conv = x_pad[sample, :, start_h:end_h, start_w:end_w]
                    out[sample, fil, h, wi] = np.sum(x_conv * w[fil]) + b[fil]

    cache = (x, w, b, conv_param)
    return out, cache

 

'''

The input consists of N data points, each with C channels, height H and width W.

We convolve each input with F different filters,

where each filter spans all C channels and has height HH and width WW.

'''

 

input ; x , w , b , conv_param

x ; Input data of shape (N, C, H, W)

w ; Filter weights of shape (F, C, HH, WW)

b ; Biases, of shape (F,)

conv_param ;

  'stride' ; The number of pixels between adjacent receptive fields in the horizontal and vertical directions.

  'pad' ; The number of pixels that will be used to zero-pad the input.

 

During padding, 'pad' zeros should be placed symmetrically along the height and width axes of the input.

 

output ;

out ; Output data of shape (N, F, H', W'), where

H' = 1 + (H + 2 * pad - HH) / stride

W' = 1 + (W + 2 * pad - WW) / stride

cache ; (x, w, b, conv_param)

 

conv_backward_naive

 

input 

dout ; Upstream derivatives.

cache ; A tuple of (x, w, b, conv_param) as in conv_forward_naive

 

output

dx ; Gradient with respect to x

dw ; Gradient with respect to w

db ; Gradient with respect to bias
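A naive sketch of the backward pass: the bias gradient is a sum over everything except the filter axis, and each upstream value dout[n, f, i, j] contributes to dw through its receptive field and to dx through the filter weights:

import numpy as np

def conv_backward_naive(dout, cache):
    x, w, b, conv_param = cache
    stride, pad = conv_param['stride'], conv_param['pad']
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    _, _, out_h, out_w = dout.shape

    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
    dx_pad = np.zeros_like(x_pad)
    dw = np.zeros_like(w)
    db = dout.sum(axis=(0, 2, 3))                # one gradient per filter

    for n in range(N):
        for f in range(F):
            for i in range(out_h):
                hs = i * stride
                for j in range(out_w):
                    ws = j * stride
                    window = x_pad[n, :, hs:hs + HH, ws:ws + WW]
                    dw[f] += window * dout[n, f, i, j]
                    dx_pad[n, :, hs:hs + HH, ws:ws + WW] += w[f] * dout[n, f, i, j]

    dx = dx_pad[:, :, pad:pad + H, pad:pad + W]  # strip the zero padding again
    return dx, dw, db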

 

max_pool_forward_naive

 

input

x ; Input data of shape (N, C, H, W)

pool_param ; dictionary 

  - pool_height ; The height of each pooling region

  - pool_width ; The width of each pooling region

  - stride ; The distance between adjacent pooling regions

 

output

out ; Output data of shape (N, C, H', W'), where

H' ; 1 + (H - pool_height) / stride

W' ; 1 + (W - pool_width) / stride

cache ; (x, pool_param)
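A naive sketch: slide the pooling window with the given stride and take the max over each window, for all samples and channels at once:

import numpy as np

def max_pool_forward_naive(x, pool_param):
    N, C, H, W = x.shape
    ph, pw = pool_param['pool_height'], pool_param['pool_width']
    stride = pool_param['stride']
    out_h = 1 + (H - ph) // stride
    out_w = 1 + (W - pw) // stride

    out = np.zeros((N, C, out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        hs = i * stride
        for j in range(out_w):
            ws = j * stride
            # max over the current pooling window
            out[:, :, i, j] = x[:, :, hs:hs + ph, ws:ws + pw].max(axis=(2, 3))

    cache = (x, pool_param)
    return out, cache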

 

max_pool_backward_naive

 

input

dout ; Upstream derivatives

cache ; A tuple of (x, pool_param) as in max_pool_forward_naive

 

output

dx ; Gradient with respect to x
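Sketch: the gradient is routed only to the position that held the maximum inside each pooling window (ties would receive the gradient more than once in this simple version):

import numpy as np

def max_pool_backward_naive(dout, cache):
    x, pool_param = cache
    N, C, H, W = x.shape
    ph, pw = pool_param['pool_height'], pool_param['pool_width']
    stride = pool_param['stride']
    _, _, out_h, out_w = dout.shape

    dx = np.zeros_like(x)
    for n in range(N):
        for c in range(C):
            for i in range(out_h):
                hs = i * stride
                for j in range(out_w):
                    ws = j * stride
                    window = x[n, c, hs:hs + ph, ws:ws + pw]
                    is_max = (window == window.max())
                    dx[n, c, hs:hs + ph, ws:ws + pw] += is_max * dout[n, c, i, j]
    return dx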

 

spatial_batchnorm_forward

 

Computes the forward pass for spatial batch normalization

 

input ; x, gamma, beta, bn_param

* momentum : Constant for running mean / variance. 

momentum = 0 means that old information is discarded completely at every time step, while 

momentum = 1 means that new information is never incorporated. 

The default of momentum = 0.9 should work well in most situations.

 

running_mean ; Array of shape (C,) giving the running mean of features (one entry per channel)

running_var ; Array of shape (C,) giving the running variance of features (one entry per channel)

 

output

out ; Output data, of shape (N, C, H, W)

cache ; Values needed for the backward pass
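A sketch that reuses the batchnorm_forward sketched earlier: every (sample, height, width) position is treated as one row and each channel as a feature, so the statistics come out per channel:

import numpy as np

def spatial_batchnorm_forward(x, gamma, beta, bn_param):
    # relies on a vanilla batchnorm_forward(x, gamma, beta, bn_param) as above
    N, C, H, W = x.shape
    x_flat = x.transpose(0, 2, 3, 1).reshape(-1, C)        # (N*H*W, C)
    out_flat, cache = batchnorm_forward(x_flat, gamma, beta, bn_param)
    out = out_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2)
    return out, cache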

 

 

 

spatial_batchnorm_backward
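The backward pass can use the same reshape trick in reverse, handing the flattened upstream gradient to the vanilla batchnorm_backward from earlier (sketch):

import numpy as np

def spatial_batchnorm_backward(dout, cache):
    N, C, H, W = dout.shape
    dout_flat = dout.transpose(0, 2, 3, 1).reshape(-1, C)
    dx_flat, dgamma, dbeta = batchnorm_backward(dout_flat, cache)
    dx = dx_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2)
    return dx, dgamma, dbeta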

 

 

 

spatial_groupnorm_forward

 

In contrast to layer normalization, group normalization splits each entry in the data into G contiguous pieces,

which it then normalizes independently.

Per feature shifting and scaling are then applied to the data, in a manner identical to that of batch normalization and layer normalization.

 

input ; x, gamma, beta, G, gn_param

G ; Integer number of groups to split into, should be a divisor of C

 

output ; out, cache
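A sketch of the forward pass, assuming gamma and beta come in with shape (1, C, 1, 1) so they broadcast per channel:

import numpy as np

def spatial_groupnorm_forward(x, gamma, beta, G, gn_param):
    eps = gn_param.get('eps', 1e-5)
    N, C, H, W = x.shape

    # split the C channels into G contiguous groups and normalize each group
    x_group = x.reshape(N, G, C // G, H, W)
    mu = x_group.mean(axis=(2, 3, 4), keepdims=True)
    var = x_group.var(axis=(2, 3, 4), keepdims=True)
    x_norm = ((x_group - mu) / np.sqrt(var + eps)).reshape(N, C, H, W)

    out = gamma * x_norm + beta        # per-channel scale and shift
    cache = (x, x_norm, mu, var, gamma, G, eps)
    return out, cache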

 

spatial_groupnorm_backward

 


 

 

 

## Instance Norm, Batch Norm, Layer Norm, Group Norm

Instance Normalization ; computes statistics from one image at a time within each mini-batch, so each image is normalized with its own distribution

 

Batch Normalization ; computes the mean and standard deviation over the mini-batch

 

Layer Normalization ; normalizes across the neurons within the same layer

 

Group Normalization ; can be seen as a compromise between Layer Normalization and Instance Normalization.

-> splits the channels into groups and normalizes each group separately.
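The difference between the four can be summarized by which axes the statistics are computed over for an (N, C, H, W) tensor (small illustrative snippet, shapes chosen arbitrarily):

import numpy as np

x = np.random.randn(2, 6, 4, 4)   # (N, C, H, W)
G = 3                             # number of groups for Group Normalization

batch_mean    = x.mean(axis=(0, 2, 3))   # per channel, over the whole batch -> (C,)
layer_mean    = x.mean(axis=(1, 2, 3))   # per sample, over all channels     -> (N,)
instance_mean = x.mean(axis=(2, 3))      # per sample and per channel        -> (N, C)
group_mean    = x.reshape(2, G, 6 // G, 4, 4).mean(axis=(2, 3, 4))   # per sample and group -> (N, G)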