(Lecture 04) What is Convolution?

210811

Convolution

I is the input image and K is the kernel to be applied.

Convolving a 7x7 image with a 3x3 filter (stride 1, no padding) produces a 5x5 output.
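The 7x7 to 5x5 result can be checked with a minimal pure-Python sketch. It assumes stride 1 and no padding, and, like deep learning frameworks, it slides the kernel without flipping it (strictly speaking a cross-correlation). The function name `conv2d_valid` and the toy values are illustrative, not taken from the lecture.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = []
    for i in range(n_rows - k_rows + 1):
        row = []
        for j in range(n_cols - k_cols + 1):
            # Multiply the kernel with the patch under it, element by element, and sum.
            acc = 0.0
            for di in range(k_rows):
                for dj in range(k_cols):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


image = [[float(r * 7 + c) for c in range(7)] for r in range(7)]  # toy 7x7 input I
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]                      # toy 3x3 kernel K
output = conv2d_valid(image, kernel)
print(len(output), len(output[0]))  # 5 5
```

Each output pixel needs a full 3x3 patch, so only 7 - 3 + 1 = 5 positions fit along each axis.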

Depending on the shape of the filter applied to the same image, the output can be blurred, sharpened, or reduced to just its outlines.
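As a rough illustration of how the kernel values decide the effect, the sketch below applies two commonly used kernels to a toy image with one vertical edge. The 3x3 box (averaging) kernel and the Laplacian-style kernel are assumptions chosen for the example; the lecture does not say which kernels its figure uses.

```python
def conv2d_valid(image, kernel):
    """Stride-1, no-padding sliding window for square kernels (no kernel flip)."""
    k = len(kernel)
    n_rows = len(image) - k + 1
    n_cols = len(image[0]) - k + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(n_cols)]
            for i in range(n_rows)]


# Toy 5x5 image: dark (0) on the left two columns, bright (1) on the right three.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

box_blur = [[1 / 9] * 3 for _ in range(3)]           # averages each 3x3 patch
laplacian = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]    # responds to intensity changes

blurred = conv2d_valid(image, box_blur)
edges = conv2d_valid(image, laplacian)

print([round(v, 2) for v in blurred[0]])  # [0.33, 0.67, 1.0]
print(edges[0])                           # [-1, 1, 0]
```

The averaging kernel turns the sharp 0-to-1 step into a gradual ramp, while the Laplacian-style kernel is zero in the flat region and nonzero only next to the edge.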

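When we say that a 5x5 filter is applied to an image, the depth of the filter is implicitly taken to match the number of channels of the image, so the description is unambiguous. If the image is 32x32x3, the 5x5 filter is actually 5x5x3.

Multiple filters can be used in one layer, and such layers can be stacked one after another.

At this point it is important to be able to compute the number of parameters. (See the figure on the right.)

The first layer has 5x5x3x4 parameters: the kernel size 5x5, times the 3 input channels, times the 4 output channels.
The second layer has 5x5x4x10 parameters: the kernel size 5x5, times the 4 input channels, times the 10 output channels.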
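The two counts above can be reproduced with a small helper. This is a sketch: the name `conv_params` is hypothetical, and it counts weights only, as the lecture does. Passing `bias=True` adds one bias per output channel.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels, bias=False):
    """Number of parameters in a conv layer: one k_h x k_w x C_in filter per output channel."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)


first = conv_params(5, 5, 3, 4)    # 5*5*3*4  = 300
second = conv_params(5, 5, 4, 10)  # 5*5*4*10 = 1000
print(first, second)
```

Note that the spatial size of the input (32x32 here) does not appear anywhere in the count, because the same filters are reused at every position.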

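A Convolutional Neural Network (CNN) is made up of Conv layers, which stamp a kernel across the input, Pooling layers, and Fully Connected (FC) layers.

The recent trend is to remove the FC layers, because FC layers require a large number of parameters.
- As the number of parameters grows, training becomes harder and generalization suffers.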

Stride

With S = 1 the kernel moves one pixel at a time; with S = 2 it moves two pixels at a time.

Padding

Without padding, the kernel cannot be centered on the border pixels; adding padding around the input makes this possible.
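Stride and padding together determine the output size along each axis. The sketch below uses the standard formula floor((n + 2p - k) / s) + 1; the helper name `conv_output_size` is an assumption made for this example.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one axis for input length n and kernel length k."""
    return (n + 2 * padding - k) // stride + 1


print(conv_output_size(7, 3))              # 5: the 7x7 image, 3x3 filter case
print(conv_output_size(7, 3, stride=2))    # 3: moving two pixels at a time
print(conv_output_size(7, 3, padding=1))   # 7: padding keeps the size
print(conv_output_size(32, 5))             # 28: a 5x5 filter on a 32x32 image
```

With an odd kernel size k, stride 1, and padding k // 2, the output has the same spatial size as the input.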

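The number of parameters is kernel size * number of input channels * number of output channels. This matters because, when looking at a model, you should be able to tell right away whether its parameter count is roughly in the tens of thousands or the hundreds of thousands.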

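First layer

Filter size: 11x11
Input channels: 3
Output channels: 96
- At the time, GPU memory was insufficient, so the 96 channels were split into 2 streams of 48 channels each.
Number of parameters: 11 * 11 * 3 * 48 * 2 ≈ 35k

Second layer

Filter size: 5x5
Input channels: 48
Output channels: 128 * 2
Number of parameters: 5 * 5 * 48 * 128 * 2 ≈ 307k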

Third layer: (3x3, 128*2, 192*2)

Both the input and the output span the 2 streams, so each is multiplied by 2.
This is the layer where the two GPUs exchange information, that is, where the two streams cross over.

Fourth layer: (3x3, 192, 192*2)

Fifth layer: (3x3, 192, 128*2)

From the sixth layer on, the layers are dense layers (= Fully Connected layers).

Input size: 13x13
Input channels: 128*2
Output channels: 2048*2

Seventh layer: (1x1, 2048*2, 2048*2)

Eighth layer: (1x1, 2048*2, 1000)

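Comparing the parameter counts shows a sharp jump: the first Dense layer alone (about 177M) is hundreds to thousands of times larger than any single Conv layer (about 35k to 885k). To curb this growth, the trend is to use 1x1 Conv instead of Dense layers.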
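The per-layer counts can be tabulated with the same kernel size * input channels * output channels rule. The sketch below uses the layer sizes exactly as listed in these notes (including the 13x13 input to the sixth layer) and ignores biases. The layer names such as `conv1` and `dense6` are labels made up for this example, since the notes do not name the network.

```python
# (name, kernel_h, kernel_w, in_channels, out_channels), as listed in the notes
conv_layers = [
    ("conv1", 11, 11, 3, 48 * 2),
    ("conv2", 5, 5, 48, 128 * 2),
    ("conv3", 3, 3, 128 * 2, 192 * 2),
    ("conv4", 3, 3, 192, 192 * 2),
    ("conv5", 3, 3, 192, 128 * 2),
]
# A dense layer is counted like a conv whose "kernel" covers the whole input.
dense_layers = [
    ("dense6", 13, 13, 128 * 2, 2048 * 2),
    ("dense7", 1, 1, 2048 * 2, 2048 * 2),
    ("dense8", 1, 1, 2048 * 2, 1000),
]


def n_params(kernel_h, kernel_w, in_channels, out_channels):
    return kernel_h * kernel_w * in_channels * out_channels


counts = {name: n_params(*dims) for name, *dims in conv_layers + dense_layers}
for name, n in counts.items():
    print(f"{name}: {n:,}")

conv_total = sum(counts[name] for name, *_ in conv_layers)
dense_total = sum(counts[name] for name, *_ in dense_layers)
print(f"conv total:  {conv_total:,}")   # 2,332,704
print(f"dense total: {dense_total:,}")  # 198,082,560
```

Under these numbers, `dense6` alone holds about 200 times as many weights as the largest conv layer (`conv3`), and the three dense layers together hold roughly 85 times as many as all five conv layers combined.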

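Why use a 1x1 Conv.

It keeps the spatial size while reducing the number of channels, which in turn reduces the number of parameters in the layers that follow.
- This is also called dimension reduction.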
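A 1x1 convolution is a weighted sum over channels computed independently at every pixel, which is why height and width stay the same. The pure-Python sketch below illustrates this on a toy tensor; the function name `conv1x1`, the shapes, and the weights are all assumptions made for the example.

```python
def conv1x1(feature_map, weight):
    """feature_map: [C_in][H][W], weight: [C_out][C_in]. Returns [C_out][H][W]."""
    c_in = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [[[sum(w_row[c] * feature_map[c][y][x] for c in range(c_in))
              for x in range(width)]
             for y in range(height)]
            for w_row in weight]


# Toy input with 4 channels of size 2x3; channel c is filled with the value c + 1.
feature_map = [[[float(c + 1)] * 3 for _ in range(2)] for c in range(4)]
weight = [[0.25, 0.25, 0.25, 0.25],  # output channel 0: average of the 4 inputs
          [1.0, 0.0, 0.0, -1.0]]     # output channel 1: channel 0 minus channel 3

out = conv1x1(feature_map, weight)
print(len(out), len(out[0]), len(out[0][0]))  # 2 2 3: channels 4 -> 2, size unchanged

# Effect on a following 3x3 conv with 8 output channels (weights only):
direct = 3 * 3 * 4 * 8                    # 288 without reduction
reduced = 1 * 1 * 4 * 2 + 3 * 3 * 2 * 8   # 152 with the 1x1 reduction first
print(direct, reduced)
```

The saving comes from the later layer seeing fewer input channels; the 1x1 layer itself is cheap because its kernel covers a single pixel.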

Hands-on

Convolutional Neural Network (CNN)

import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
%matplotlib inline
%config InlineBackend.figure_format='retina'
print ("PyTorch version:[%s]."%(torch.__version__))
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print ("device:[%s]."%(device))

PyTorch version:[1.9.0+cu102].
device:[cuda:0].

Dataset

from torchvision import datasets,transforms
mnist_train = datasets.MNIST(root='./data/',train=True,transform=transforms.ToTensor(),download=True)
mnist_test = datasets.MNIST(root='./data/',train=False,transform=transforms.ToTensor(),download=True)
print ("mnist_train:\n",mnist_train,"\n")
print ("mnist_test:\n",mnist_test,"\n")
print ("Done.")

Data Iterator

BATCH_SIZE = 256
train_iter = torch.utils.data.DataLoader(mnist_train,batch_size=BATCH_SIZE,shuffle=True,num_workers=1)
test_iter = torch.utils.data.DataLoader(mnist_test,batch_size=BATCH_SIZE,shuffle=True,num_workers=1)
print ("Done.")

Define Model

class ConvolutionalNeuralNetworkClass(nn.Module):
    """
        Convolutional Neural Network (CNN) Class
    """
    def __init__(self,name='cnn',xdim=[1,28,28],
                 ksize=3,cdims=[32,64],hdims=[1024,128],ydim=10,
                 USE_BATCHNORM=False):
        super(ConvolutionalNeuralNetworkClass,self).__init__()
        self.name = name
        self.xdim = xdim
        self.ksize = ksize
        self.cdims = cdims
        self.hdims = hdims
        self.ydim = ydim
        self.USE_BATCHNORM = USE_BATCHNORM

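This defines the CNN class. By default, the input dimension is (1, 28, 28), the output dimension is (10,), and the hidden (dense) layer dimensions are (1024, 128). The output channels of the two Conv layers are set to 32 and 64.
The kernel size is 3.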

        # Convolutional layers
        self.layers = []
        prev_cdim = self.xdim[0]
        for cdim in self.cdims: # for each hidden layer
            self.layers.append(
                nn.Conv2d(
                    in_channels=prev_cdim,
                    out_channels=cdim,
                    kernel_size=self.ksize,
                    stride=(1,1),
                    padding=self.ksize//2)
                ) # convolution
            if self.USE_BATCHNORM:
                self.layers.append(nn.BatchNorm2d(cdim)) # batch-norm
            self.layers.append(nn.ReLU(True))  # activation
            self.layers.append(nn.MaxPool2d(kernel_size=(2,2), stride=(2,2))) # max-pooling 
            self.layers.append(nn.Dropout2d(p=0.5))  # dropout
            prev_cdim = cdim

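The variables defined at the start are used to build the CNN.
The channel dimension starts at 1 for the input, passes through 32 and 64, and ends at the 10-dimensional output.
- In detail, the shape goes from (1, 28, 28) to (32, 14, 14) and then to (64, 7, 7).
- The convolutional part outputs 64 channels, and the dense layers map this to 10 dimensions.
- The image size is halved at each block because of max pooling.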
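The shape progression described above can be traced without PyTorch. This sketch mirrors the settings used in the class (stride 1, padding `ksize // 2`, 2x2 max pooling with stride 2); the helper name `trace_shapes` is hypothetical.

```python
def trace_shapes(xdim, cdims, ksize=3, pool=2):
    """Return the (C, H, W) shape after the input and after each conv block."""
    channels, height, width = xdim
    shapes = [(channels, height, width)]
    for cdim in cdims:
        # Conv with stride 1 and padding ksize // 2: size is unchanged for odd ksize.
        height = height + 2 * (ksize // 2) - ksize + 1
        width = width + 2 * (ksize // 2) - ksize + 1
        # 2x2 max pooling with stride 2 halves height and width.
        height //= pool
        width //= pool
        channels = cdim
        shapes.append((channels, height, width))
    return shapes


shapes = trace_shapes([1, 28, 28], [32, 64])
print(shapes)  # [(1, 28, 28), (32, 14, 14), (64, 7, 7)]

c, h, w = shapes[-1]
flat_dim = c * h * w
print(flat_dim)  # 3136, the input size of the first dense layer
```

The value 3136 is the same quantity that `prev_hdim` computes in the class before the first `nn.Linear` is created.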

        # Dense layers
        self.layers.append(nn.Flatten())
        prev_hdim = prev_cdim*(self.xdim[1]//(2**len(self.cdims)))*(self.xdim[2]//(2**len(self.cdims)))
        for hdim in self.hdims:
            self.layers.append(nn.Linear(
                prev_hdim, hdim, bias=True
                               ))
            self.layers.append(nn.ReLU(True))  # activation
            prev_hdim = hdim
        # Final layer (without activation)
        self.layers.append(nn.Linear(prev_hdim,self.ydim,bias=True))

        # Concatenate all layers 
        self.net = nn.Sequential()
        for l_idx,layer in enumerate(self.layers):
            layer_name = "%s_%02d"%(type(layer).__name__.lower(),l_idx)
            self.net.add_module(layer_name,layer)
        self.init_param() # initialize parameters

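This defines the dense layers.
At the end, an nn.Sequential() is created and the layers appended to the list are added to it one by one, combining them into a single module. After that, the weights are initialized.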

    def init_param(self):
        for m in self.modules():
            if isinstance(m,nn.Conv2d): # init conv
                nn.init.kaiming_normal_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m,nn.BatchNorm2d): # init BN
                nn.init.constant_(m.weight,1)
                nn.init.constant_(m.bias,0)
            elif isinstance(m,nn.Linear): # init dense
                nn.init.kaiming_normal_(m.weight)
                nn.init.zeros_(m.bias)
            
    def forward(self,x):
        return self.net(x)

C = ConvolutionalNeuralNetworkClass(
    name='cnn',xdim=[1,28,28],ksize=3,cdims=[32,64],
    hdims=[32],ydim=10).to(device)
loss = nn.CrossEntropyLoss()
optm = optim.Adam(C.parameters(),lr=1e-3)
print ("Done.")

The Conv and Linear weights use He initialization with biases set to zero; for batch normalization, the weight is initialized to 1 and the bias to 0.
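To get a feel for the scale of the initial weights, the sketch below computes the standard deviation that He (Kaiming) normal initialization would use for each layer of the model `C`. It assumes the defaults of `nn.init.kaiming_normal_` (fan-in mode, gain sqrt(2)), so std = sqrt(2 / fan_in), with fan_in = in_channels * kernel_h * kernel_w for a conv layer and the input size for a linear layer. The helper name `he_normal_std` is made up for this example.

```python
import math


def he_normal_std(fan_in):
    """Std of He normal initialization with gain sqrt(2) in fan-in mode."""
    return math.sqrt(2.0 / fan_in)


fan_ins = {
    "conv2d_00": 1 * 3 * 3,    # 1 input channel, 3x3 kernel
    "conv2d_04": 32 * 3 * 3,   # 32 input channels, 3x3 kernel
    "linear_09": 64 * 7 * 7,   # flattened (64, 7, 7) feature map
    "linear_11": 32,           # hidden dimension
}
stds = {name: he_normal_std(fan_in) for name, fan_in in fan_ins.items()}
for name, std in stds.items():
    print("%s: fan_in=%d std=%.4f" % (name, fan_ins[name], std))
```

Layers with a larger fan-in start with smaller weights, which keeps the variance of the activations roughly constant from layer to layer under ReLU.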

Check Parameters

np.set_printoptions(precision=3)
n_param = 0
for p_idx,(param_name,param) in enumerate(C.named_parameters()):
    if param.requires_grad:
        param_numpy = param.detach().cpu().numpy() # to numpy array 
        n_param += len(param_numpy.reshape(-1))
        print ("[%d] name:[%s] shape:[%s]."%(p_idx,param_name,param_numpy.shape))
        print ("    val:%s"%(param_numpy.reshape(-1)[:5]))
print ("Total number of parameters:[%s]."%(format(n_param,',d')))

[0] name:[net.conv2d_00.weight] shape:[(32, 1, 3, 3)].
    val:[ 0.48   0.795 -0.328  0.003 -0.101]
[1] name:[net.conv2d_00.bias] shape:[(32,)].
    val:[0. 0. 0. 0. 0.]
[2] name:[net.conv2d_04.weight] shape:[(64, 32, 3, 3)].
    val:[0.155 0.017 0.136 0.019 0.046]
[3] name:[net.conv2d_04.bias] shape:[(64,)].
    val:[0. 0. 0. 0. 0.]
[4] name:[net.linear_09.weight] shape:[(32, 3136)].
    val:[-0.041 -0.032 -0.001  0.041 -0.015]
[5] name:[net.linear_09.bias] shape:[(32,)].
    val:[0. 0. 0. 0. 0.]
[6] name:[net.linear_11.weight] shape:[(10, 32)].
    val:[-0.072 -0.048 -0.105  0.251  0.523]
[7] name:[net.linear_11.bias] shape:[(10,)].
    val:[0. 0. 0. 0. 0.]
Total number of parameters:[119,530].

In the output, the index jumps straight from 00 to 04, because the layers in between have no learnable parameters.
- 01 : ReLU
- 02 : MaxPool2d
- 03 : Dropout2d
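Both the layer indices and the total of 119,530 parameters can be reproduced by hand. The sketch below rebuilds the layer naming scheme used in the class (`<type>_<index>`) and sums weights and biases for the configuration `cdims=[32,64]`, `hdims=[32]`, `ydim=10`; the helper `layer_names` is an illustrative stand-in, not part of the original code.

```python
def layer_names(n_conv_blocks, n_hidden, use_batchnorm=False):
    """Names in the order the class appends layers, formatted as '<type>_<index>'."""
    names = []
    for _ in range(n_conv_blocks):
        names.append("conv2d")
        if use_batchnorm:
            names.append("batchnorm2d")
        names += ["relu", "maxpool2d", "dropout2d"]
    names.append("flatten")
    for _ in range(n_hidden):
        names += ["linear", "relu"]
    names.append("linear")  # final layer, no activation
    return ["%s_%02d" % (name, idx) for idx, name in enumerate(names)]


names = layer_names(n_conv_blocks=2, n_hidden=1)
with_params = [n for n in names if n.startswith(("conv2d", "linear"))]
print(with_params)  # ['conv2d_00', 'conv2d_04', 'linear_09', 'linear_11']

# Weights + biases per parameterized layer.
conv_00 = 32 * 1 * 3 * 3 + 32       # 320
conv_04 = 64 * 32 * 3 * 3 + 64      # 18,496
linear_09 = 32 * 3136 + 32          # 100,384
linear_11 = 10 * 32 + 10            # 330
total = conv_00 + conv_04 + linear_09 + linear_11
print(format(total, ",d"))  # 119,530
```

Most of the parameters (about 84%) sit in `linear_09`, the first dense layer, which matches the earlier point that FC layers dominate the parameter count.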

Simple Forward Path of the CNN Model

np.set_printoptions(precision=3)
torch.set_printoptions(precision=3)
x_numpy = np.random.rand(2,1,28,28)
x_torch = torch.from_numpy(x_numpy).float().to(device)
y_torch = C.forward(x_torch) # forward path
y_numpy = y_torch.detach().cpu().numpy() # torch tensor to numpy array
print ("x_torch:\n",x_torch)
print ("y_torch:\n",y_torch)
print ("\nx_numpy %s:\n"%(x_numpy.shape,),x_numpy)
print ("y_numpy %s:\n"%(y_numpy.shape,),y_numpy)

x_torch:
 tensor([[[[0.216, 0.686, 0.449,  ..., 0.845, 0.632, 0.367],
          [0.423, 0.263, 0.057,  ..., 0.135, 0.180, 0.564],
          [0.438, 0.473, 0.898,  ..., 0.777, 0.365, 0.650],
          ...,
          [0.453, 0.744, 0.648,  ..., 0.873, 0.492, 0.284],
          [0.500, 0.825, 0.532,  ..., 0.899, 0.706, 0.611],
          [0.012, 0.561, 0.997,  ..., 0.676, 0.276, 0.328]]],


        [[[0.421, 0.828, 0.172,  ..., 0.137, 0.138, 0.450],
          [0.536, 0.576, 0.426,  ..., 0.309, 0.624, 0.366],
          [0.655, 0.762, 0.226,  ..., 0.279, 0.492, 0.777],
          ...,
          [0.554, 0.616, 0.794,  ..., 0.321, 0.287, 0.028],
          [0.486, 0.343, 0.304,  ..., 0.181, 0.804, 0.304],
          [0.771, 0.622, 0.573,  ..., 0.587, 0.940, 0.416]]]], device='cuda:0')
y_torch:
 tensor([[-0.054, -3.038,  3.234,  5.741,  1.936, -9.030,  3.322,  0.424, -3.799,
         -3.518],
        [ 0.431, -1.759,  2.307,  2.540,  0.906, -5.047, -1.595,  4.348, -8.021,
          2.194]], device='cuda:0', grad_fn=<AddmmBackward>)

x_numpy (2, 1, 28, 28):
 [[[[0.216 0.686 0.449 ... 0.845 0.632 0.367]
   [0.423 0.263 0.057 ... 0.135 0.18  0.564]
   [0.438 0.473 0.898 ... 0.777 0.365 0.65 ]
   ...
   [0.453 0.744 0.648 ... 0.873 0.492 0.284]
   [0.5   0.825 0.532 ... 0.899 0.706 0.611]
   [0.012 0.561 0.997 ... 0.676 0.276 0.328]]]


 [[[0.421 0.828 0.172 ... 0.137 0.138 0.45 ]
   [0.536 0.576 0.426 ... 0.309 0.624 0.366]
   [0.655 0.762 0.226 ... 0.279 0.492 0.777]
   ...
   [0.554 0.616 0.794 ... 0.321 0.287 0.028]
   [0.486 0.343 0.304 ... 0.181 0.804 0.304]
   [0.771 0.622 0.573 ... 0.587 0.94  0.416]]]]
y_numpy (2, 10):
 [[-0.054 -3.038  3.234  5.741  1.936 -9.03   3.322  0.424 -3.799 -3.518]
 [ 0.431 -1.759  2.307  2.54   0.906 -5.047 -1.595  4.348 -8.021  2.194]]

x_torch = torch.from_numpy(x_numpy).float().to(device)
- converts a numpy array to a torch tensor
y_numpy = y_torch.detach().cpu().numpy()
- converts a torch tensor to a numpy array

Evaluation Function

def func_eval(model,data_iter,device):
    with torch.no_grad():
        n_total,n_correct = 0,0
        model.eval() # evaluate (affects DropOut and BN)
        for batch_in,batch_out in data_iter:
            y_trgt = batch_out.to(device)
            model_pred = model(batch_in.view(-1,1,28,28).to(device))
            _,y_pred = torch.max(model_pred.data,1)
            n_correct += (y_pred==y_trgt).sum().item()
            n_total += batch_in.size(0)
        val_accr = (n_correct/n_total)
        model.train() # back to train mode 
    return val_accr
print ("Done")

Initial Evaluation

C.init_param() # initialize parameters
train_accr = func_eval(C,train_iter,device)
test_accr = func_eval(C,test_iter,device)
print ("train_accr:[%.3f] test_accr:[%.3f]."%(train_accr,test_accr))

train_accr:[0.113] test_accr:[0.104].

Train

print ("Start training.")
C.init_param() # initialize parameters
C.train() # to train mode 
EPOCHS,print_every = 10,1
for epoch in range(EPOCHS):
    loss_val_sum = 0
    for batch_in,batch_out in train_iter:
        # Forward path
        y_pred = C.forward(batch_in.view(-1,1,28,28).to(device))
        loss_out = loss(y_pred,batch_out.to(device))
        # Update
        optm.zero_grad()      # reset gradient 
        loss_out.backward()      # backpropagate
        optm.step()      # optimizer update
        loss_val_sum += loss_out.item()  # accumulate as a Python float
    loss_val_avg = loss_val_sum/len(train_iter)
    # Print
    if ((epoch%print_every)==0) or (epoch==(EPOCHS-1)):
        train_accr = func_eval(C,train_iter,device)
        test_accr = func_eval(C,test_iter,device)
        print ("epoch:[%d] loss:[%.3f] train_accr:[%.3f] test_accr:[%.3f]."%
               (epoch,loss_val_avg,train_accr,test_accr))
print ("Done")

Start training.
epoch:[0] loss:[0.566] train_accr:[0.960] test_accr:[0.960].
epoch:[1] loss:[0.163] train_accr:[0.977] test_accr:[0.977].
epoch:[2] loss:[0.121] train_accr:[0.981] test_accr:[0.980].
epoch:[3] loss:[0.098] train_accr:[0.985] test_accr:[0.984].
epoch:[4] loss:[0.087] train_accr:[0.987] test_accr:[0.985].
epoch:[5] loss:[0.077] train_accr:[0.989] test_accr:[0.986].
epoch:[6] loss:[0.072] train_accr:[0.990] test_accr:[0.987].
epoch:[7] loss:[0.066] train_accr:[0.991] test_accr:[0.987].
epoch:[8] loss:[0.060] train_accr:[0.992] test_accr:[0.989].
epoch:[9] loss:[0.055] train_accr:[0.992] test_accr:[0.988].
Done

Test

n_sample = 25
sample_indices = np.random.choice(len(mnist_test.targets),n_sample,replace=False)
test_x = mnist_test.data[sample_indices]
test_y = mnist_test.targets[sample_indices]
with torch.no_grad():
    C.eval() # to evaluation mode 
    y_pred = C.forward(test_x.view(-1,1,28,28).type(torch.float).to(device)/255.)
y_pred = y_pred.argmax(axis=1)
plt.figure(figsize=(10,10))
for idx in range(n_sample):
    plt.subplot(5, 5, idx+1)
    plt.imshow(test_x[idx], cmap='gray')
    plt.axis('off')
    plt.title("Pred:%d, Label:%d"%(y_pred[idx],test_y[idx]))
plt.show()    
print ("Done")

Done

