Hyperparameter Tuning

Naver Boostcamp AI Tech / Week 02

끵뀐꿩긘 2022. 10. 1. 18:30

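Hyperparameter tuning means searching for the best values of hyperparameters, the settings a model does not learn on its own (learning rate, loss function, batch size, weight initialization), by specifying them by hand.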

It matters less than training on more data or using a better model. It is usually applied at the very end of a deep learning workflow to squeeze out a small extra gain in performance.

Types of hyperparameter tuning

Manual Search
Grid Search
Random Search
Bayesian Optimization
Non-Probabilistic
Evolutionary Optimization
Gradient-based Optimization
Early Stopping

Manual Search

Setting hyperparameter values from experience or intuition.

For many models there are already commonly used hyperparameter values that give reasonably good results.

Grid Search

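The most basic hyperparameter optimization method: train with every possible combination of hyperparameter values and pick the best combination. Because it examines every possibility, it takes a very long time when there are many combinations or a lot of data.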
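
As a minimal sketch of the idea, the loop below tries every combination from a small grid. The `validation_loss` function is a hypothetical stand-in for "train the model and return its validation loss", and the grid values are made up for illustration.

```python
from itertools import product

def validation_loss(lr, batch_size):
    """Hypothetical stand-in for training a model and returning validation loss."""
    return (lr - 0.01) ** 2 * 1e4 + abs(batch_size - 64) / 64

# Candidate values for each hyperparameter (illustrative only).
grid = {
    "lr": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
}

def grid_search(objective, grid):
    """Evaluate every combination in the grid and keep the lowest loss."""
    names = list(grid)
    best_config, best_loss = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        loss = objective(**config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

best_config, best_loss = grid_search(validation_loss, grid)
print(best_config, best_loss)
```

The number of trials is the product of the list lengths (3 x 3 = 9 here), which is why the cost grows so quickly as hyperparameters are added.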

Random Search

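Sample random combinations from within given bounds and pick the best one. It usually gives better results than grid search for the same time budget, but it is still inefficient: it has to cover a wide range to find good parameters, and each sample is drawn without using the results of earlier trials.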
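
A minimal sketch of random search, under the same assumptions as before: `validation_loss` is a hypothetical stand-in for a real training run, and the bounds are chosen only for illustration.

```python
import random

def validation_loss(lr, batch_size):
    """Hypothetical stand-in for training a model and returning validation loss."""
    return (lr - 0.01) ** 2 * 1e4 + abs(batch_size - 64) / 64

def random_search(objective, num_trials, seed=0):
    """Sample configurations independently and keep the lowest loss."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(num_trials):
        config = {
            "lr": rng.uniform(0.0001, 0.1),           # continuous range
            "batch_size": rng.choice([32, 64, 128]),  # discrete choices
        }
        loss = objective(**config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

best_config, best_loss = random_search(validation_loss, num_trials=20)
print(best_config, best_loss)
```

Unlike grid search, the trial budget (`num_trials`) is set directly and does not depend on how many hyperparameters are searched.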

Bayesian Optimization

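Bayesian optimization is a method for finding the input that maximizes or minimizes an objective function.

Surrogate model: a model that makes a probabilistic estimate of the objective function. In other words, our current estimate of the objective.
Acquisition function: a function that recommends the next input (hyperparameter combination) to try, based on the surrogate model's probabilistic estimate.

Here the objective function is the metric we care about, such as loss or accuracy.

Bayesian optimization fits a surrogate model to the hyperparameter combinations tried so far and their objective values, lets the acquisition function recommend the next combination to evaluate, and repeats this process, updating the surrogate each time, until it arrives at a good combination.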
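
The loop can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not in the original post: a one-dimensional hyperparameter, a toy `objective`, a Gaussian-process surrogate with an RBF kernel, and an upper-confidence-bound acquisition function. Real libraries use more refined surrogates and acquisition functions.

```python
import numpy as np

def objective(x):
    """Hypothetical metric to maximize (e.g. accuracy vs. one hyperparameter)."""
    return -(x - 0.3) ** 2

def rbf_kernel(a, b, length_scale=0.2):
    """Similarity between two sets of 1-D points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def surrogate(x_obs, y_obs, x_new, noise=1e-4):
    """Gaussian-process posterior mean and standard deviation at x_new."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_on = rbf_kernel(x_obs, x_new)
    solved = np.linalg.solve(k_oo, k_on)
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_on * solved, axis=0)  # prior variance is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))

def acquisition(mean, std, kappa=2.0):
    """Upper confidence bound: favor high estimates and high uncertainty."""
    return mean + kappa * std

candidates = np.linspace(0.0, 1.0, 101)  # values the hyperparameter may take
x_obs = np.array([0.0, 1.0])             # two initial trials
y_obs = objective(x_obs)

for _ in range(10):
    mean, std = surrogate(x_obs, y_obs, candidates)           # update estimate
    x_next = candidates[np.argmax(acquisition(mean, std))]    # recommend input
    x_obs = np.append(x_obs, x_next)                          # run the trial
    y_obs = np.append(y_obs, objective(x_next))

best_x = x_obs[np.argmax(y_obs)]
print(best_x, y_obs.max())
```

Each iteration uses all earlier results to decide where to evaluate next, which is the key difference from grid and random search.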

It is said to be used more in classical ML than in DL.

HyperOpt, Optuna

These are frameworks that automate the hyperparameter optimization task.

Hyperparameter tuning with Ray

Ray is a universal API for building parallel and distributed applications.

It is widely treated as the de facto standard for distributed, parallel ML/DL workloads, and it provides a variety of modules for hyperparameter search.

Installing Ray, TensorBoard, and wandb

print("Install ray")
!pip uninstall -y -q pyarrow
!pip install -q -U ray[tune]
!pip install -q ray[debug]

Initial setup and loading the data

from functools import partial
import numpy as np
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import random_split
import torchvision
import torchvision.transforms as transforms
from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import ASHAScheduler

import wandb

# CIFAR10 data
def load_data(data_dir="./data"):
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    trainset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=transform)

    testset = torchvision.datasets.CIFAR10(
        root=data_dir, train=False, download=True, transform=transform)

    return trainset, testset

Model definition

class Net(nn.Module):
    def __init__(self, l1=120, l2=84):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

'''
Printed structure; the layer sizes below correspond to Net(l1=1, l2=2):
Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=1, bias=True)
  (fc2): Linear(in_features=1, out_features=2, bias=True)
  (fc3): Linear(in_features=2, out_features=10, bias=True)
)
'''

Controlled and manipulated variables

import math  # needed for math.sqrt below

# Controlled variables
## 1. ImageNet-pretrained ResNet-18 model
def get_imagenet_pretrained_model():
  imagenet_resnet18 = torchvision.models.resnet18(pretrained=True)
  target_model = imagenet_resnet18
  FASHION_INPUT_NUM = 1
  FASHION_CLASS_NUM = 10

  target_model.conv1 = torch.nn.Conv2d(FASHION_INPUT_NUM, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  target_model.fc = torch.nn.Linear(in_features=512, out_features=FASHION_CLASS_NUM, bias=True)

  torch.nn.init.xavier_uniform_(target_model.fc.weight)
  stdv = 1. / math.sqrt(target_model.fc.weight.size(1))
  target_model.fc.bias.data.uniform_(-stdv, stdv)

  return target_model

# Manipulated variables
## 1. Learning rate
def get_adam_by_learningrate(model, learning_rate:float):
  return torch.optim.Adam(model.parameters(), lr=learning_rate)
## 2. Number of epochs
def get_epoch_by_epoch(epoch:int):
  return epoch
## 3. Data loaders built for a given batch size
common_transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])
fashion_train_transformed = torchvision.datasets.FashionMNIST(root='./fashion', train=True, download=True, transform=common_transform)
fashion_test_transformed = torchvision.datasets.FashionMNIST(root='./fashion', train=False, download=True, transform=common_transform)

def get_dataloaders_by_batchsize(batch_size:int):
  # Attach the FashionMNIST datasets to DataLoaders
  BATCH_SIZE = batch_size
  fashion_train_dataloader = torch.utils.data.DataLoader(fashion_train_transformed, batch_size=BATCH_SIZE, shuffle=True, num_workers=2)
  fashion_test_dataloader = torch.utils.data.DataLoader(fashion_test_transformed, batch_size=BATCH_SIZE, shuffle=False, num_workers=2)

  dataloaders = {
      "train" : fashion_train_dataloader,
      "test" : fashion_test_dataloader
  }

  return dataloaders

Hyperparameter search space

from ray import tune

config_space = {
    "NUM_EPOCH" : tune.choice([4,5,6,7,8,9]),
    "LearningRate" : tune.uniform(0.0001, 0.001),
    "BatchSize" : tune.choice([32,64,128]),
}
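
To get a feel for what one trial's configuration looks like, the sketch below mimics the search space with the standard library only. It is not Ray code: `sample_config` is a hypothetical helper, and it assumes independent uniform draws, which matches what `tune.choice` and `tune.uniform` describe but not how a search algorithm such as HyperOpt picks later trials.

```python
import random

def sample_config(rng):
    """Draw one configuration using the same ranges as config_space."""
    return {
        "NUM_EPOCH": rng.choice([4, 5, 6, 7, 8, 9]),  # like tune.choice
        "LearningRate": rng.uniform(0.0001, 0.001),   # like tune.uniform
        "BatchSize": rng.choice([32, 64, 128]),       # like tune.choice
    }

rng = random.Random(0)
for trial_id in range(3):
    print(trial_id, sample_config(rng))
```

Each printed dictionary has the same keys that the training function reads from its `config` argument.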

Model training

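from tqdm import tqdm  # progress bar used in the training loop

# The original snippet never defines loss_fn; cross-entropy is assumed here,
# matching the "CrossEntropy" comment in the training loop.
loss_fn = torch.nn.CrossEntropyLoss()

def training(
    config # manipulated variables: learning rate, epoch, and batch size
):
  # Controlled variable
  target_model = get_imagenet_pretrained_model()

  device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") # use the GPU if available. In Colab, pick "GPU" under "Runtime" -> "Change runtime type"
  target_model.to(device)

  # Manipulated variables
  NUM_EPOCH = get_epoch_by_epoch(config["NUM_EPOCH"])
  dataloaders = get_dataloaders_by_batchsize(config["BatchSize"])
  optimizer = get_adam_by_learningrate(target_model, config["LearningRate"])
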
  ### Training code starts here
  best_test_accuracy = 0.
  best_test_loss = float("inf") # lower is better, so start from infinity
  for epoch in range(NUM_EPOCH):
    for phase in ["train", "test"]:
      running_loss = 0.
      running_acc = 0.
      if phase == "train":
        target_model.train() # train mode: submodules such as batch norm and dropout behave in training mode
      elif phase == "test":
        target_model.eval() # eval mode: submodules behave in evaluation mode

      for ind, (images, labels) in enumerate(tqdm(dataloaders[phase])):
        # (Exercise) tqdm currently shows only the progress. Check the tqdm docs to see how to also display the current epoch, running_loss, and running_acc.
        # hint - with, pbar
        images = images.to(device)
        labels = labels.to(device)

        optimizer.zero_grad() # reset parameter gradients before the update

        with torch.set_grad_enabled(phase == "train"): # compute gradients only in the train phase to save computation
          logits = target_model(images)
          _, preds = torch.max(logits, 1) # turn raw outputs such as [0.9, 1.2, 3.2, 0.1, -0.1, ...] into the predicted label ([2]) by taking the index of the largest value
          loss = loss_fn(logits, labels)

          if phase == "train":
            loss.backward() # compute gradients from the cross-entropy between predictions and targets
            optimizer.step() # update the model with the computed gradients

        running_loss += loss.item() * images.size(0) # accumulate the summed loss of this batch
        running_acc += torch.sum(preds == labels) # accumulate the number of correct predictions in this batch

      # At the end of each epoch
      epoch_loss = running_loss / len(dataloaders[phase].dataset)
      epoch_acc = running_acc / len(dataloaders[phase].dataset)

      if phase == "test" and best_test_accuracy < epoch_acc: # track the best (highest) test accuracy
        best_test_accuracy = epoch_acc
      if phase == "test" and best_test_loss > epoch_loss: # track the best (lowest) test loss
        best_test_loss = epoch_loss
  # End of all epochs
  tune.report(accuracy=float(best_test_accuracy), loss=best_test_loss)
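
The loop multiplies each batch loss by the batch size before summing, then divides by the dataset size. A small worked example with made-up numbers shows why: `CrossEntropyLoss` returns the mean loss of a batch by default, so when the last batch is smaller, a plain average of batch losses would give it too much weight.

```python
# (batch_size, mean_loss_of_batch, num_correct): made-up values for illustration
batches = [(4, 0.50, 3), (4, 0.30, 4), (2, 0.90, 1)]

running_loss = 0.0
running_correct = 0
num_samples = 0
for batch_size, mean_loss, num_correct in batches:
    running_loss += mean_loss * batch_size  # back to a summed loss per batch
    running_correct += num_correct
    num_samples += batch_size

epoch_loss = running_loss / num_samples     # per-sample average over the epoch
epoch_acc = running_correct / num_samples

# Naive alternative: average the batch means, ignoring batch sizes.
naive_mean = sum(mean_loss for _, mean_loss, _ in batches) / len(batches)

print(epoch_loss, epoch_acc, naive_mean)
```

Here the weighted epoch loss is 5.0 / 10 = 0.5 and the accuracy is 8 / 10 = 0.8, while the naive mean of batch losses is about 0.567 because the small final batch counts as much as the full ones.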

HyperOpt search

# Ray 1.x import path; in Ray 2.x the module is ray.tune.search.hyperopt.
# Requires the hyperopt package (pip install hyperopt).
from ray.tune.suggest.hyperopt import HyperOptSearch

optim = HyperOptSearch( # search with HyperOptSearch. For other search algorithms, see https://docs.ray.io/en/master/tune/api_docs/suggestion.html#bayesopt
    metric='accuracy', # the metric to optimize during tuning. This experiment targets test accuracy
    mode="max", # maximize the target objective
)

Finding the best values with Ray

from ray.tune import CLIReporter
import ray

NUM_TRIAL = 2 # maximum number of trials to run during the hyperparameter search

reporter = CLIReporter( # since this runs in a Jupyter notebook, print intermediate results to the command line
    parameter_columns=["NUM_EPOCH", "LearningRate", "BatchSize"],
    metric_columns=["accuracy", "loss"])

ray.shutdown() # reset any existing Ray session before running

analysis = tune.run(
    training,
    config=config_space,
    search_alg=optim,
    #verbose=1,
    progress_reporter=reporter,
    num_samples=NUM_TRIAL,
    resources_per_trial={'gpu': 1} # comment this line out if the Colab runtime is not using a GPU
)

Best result

best_trial = analysis.get_best_trial('accuracy', 'max')
print(f"Best config : {best_trial.config}")
print(f"Best test accuracy : {best_trial.last_result['accuracy']}")

'''
Best config : {'NUM_EPOCH': 9, 'LearningRate': 0.0006957755303870037, 'BatchSize': 64}
Best test accuracy : 0.9113999605178833
'''