
๋”ฅ๋Ÿฌ๋‹ CNN ์™„๋ฒฝ ๊ฐ€์ด๋“œ - Fundamental ํŽธ

๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ•์˜ ์ดํ•ด

  • ํŠน์ง•์ด ๋งŽ์„ ์ˆ˜๋ก ๊ฐ€์ค‘์น˜์˜ ๊ฐœ์ˆ˜๊ฐ€ ๋Š˜์–ด๋‚˜๊ฒŒ ๋œ๋‹ค.

  • ๊ฐ€์ค‘์น˜์˜ ๊ฐœ์ˆ˜๊ฐ€ ๋Š˜์–ด๋‚˜๋ฉด ๊ณ ์ฐจ์› ๋ฐฉ์ •์‹์œผ๋กœ ๋น„์šฉ ํ•จ์ˆ˜๊ฐ€ ์ตœ์†Œ๊ฐ€ ๋˜๋Š” ๊ฐ€์ค‘์น˜ W๋ฅผ ์ฐพ๊ธฐ๊ฐ€ ์–ด๋ ค์›Œ์ง„๋‹ค.

  • ๊ณ ์ฐจ์› ๋ฐฉ์ •์‹์— ๋Œ€ํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋ฉด์„œ RSS๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ง๊ด€์ ์œผ๋กœ ์ œ๊ณตํ•˜๋Š” ๋›ฐ์–ด๋‚œ ๋ฐฉ์‹

์†์‹คํ•จ์ˆ˜์˜ ํŽธ๋ฏธ๋ถ„

  • W์— ๋Œ€ํ•ด์„œ ๋ชจ๋“  ํ•ญ์„ ๋ฏธ๋ถ„ํ•  ์ˆ˜ ์—†๊ธฐ ๋•Œ๋ฌธ์— ๊ฐ W์— ๋Œ€ํ•œ ๋ฏธ๋ถ„์„ ํ•˜๊ฒŒ ๋œ๋‹ค.

Weights and Bias

  • The weights and bias are updated iteratively using the partial-derivative values of the loss function.

  • At each update the derivative is multiplied by a fixed coefficient; this coefficient is called the learning rate (see the update rule below).
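
Written out (the standard formulation, with learning rate $\eta$):

$$w_1 \leftarrow w_1 - \eta\,\frac{\partial L}{\partial w_1},\qquad b \leftarrow b - \eta\,\frac{\partial L}{\partial b}$$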

๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ•์„ ์ด์šฉํ•˜์—ฌ ์„ ํ˜•ํšŒ๊ท€ ๊ตฌํ˜„ํ•˜๊ธฐ - 01

# ๋ฐ์ดํ„ฐ ๊ฑด์ˆ˜
N = len(target)
# ์˜ˆ์ธก ๊ฐ’. 
predicted = w1 * rm + w2*lstat + bias
# ์‹ค์ œ๊ฐ’๊ณผ ์˜ˆ์ธก๊ฐ’์˜ ์ฐจ์ด 
diff = target - predicted
# bias ๋ฅผ array ๊ธฐ๋ฐ˜์œผ๋กœ ๊ตฌํ•˜๊ธฐ ์œ„ํ•ด์„œ ์„ค์ •. 
bias_factors = np.ones((N,))

# weight์™€ bias๋ฅผ ์–ผ๋งˆ๋‚˜ updateํ•  ๊ฒƒ์ธ์ง€๋ฅผ ๊ณ„์‚ฐ.  
w1_update = -(2/N)*learning_rate*(np.dot(rm.T, diff))
w2_update = -(2/N)*learning_rate*(np.dot(lstat.T, diff))
bias_update = -(2/N)*learning_rate*(np.dot(bias_factors.T, diff))

# Mean Squared Error๊ฐ’์„ ๊ณ„์‚ฐ. 
mse_loss = np.mean(np.square(diff))
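
Putting the step above into a loop gives the complete picture. The following is a minimal sketch (the function name gradient_descent and the zero initialization are my own choices; rm, lstat, and target are assumed to be 1-D NumPy arrays of equal length):

import numpy as np

def gradient_descent(rm, lstat, target, learning_rate=0.01, iterations=1000):
    N = len(target)
    w1, w2, bias = 0.0, 0.0, 0.0
    bias_factors = np.ones((N,))
    for _ in range(iterations):
        predicted = w1 * rm + w2 * lstat + bias
        diff = target - predicted
        w1_update = -(2/N) * learning_rate * np.dot(rm.T, diff)
        w2_update = -(2/N) * learning_rate * np.dot(lstat.T, diff)
        bias_update = -(2/N) * learning_rate * np.dot(bias_factors.T, diff)
        # Each *_update equals the learning-rate-scaled gradient,
        # so subtracting it performs one descent step
        w1, w2, bias = w1 - w1_update, w2 - w2_update, bias - bias_update
    return w1, w2, bias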

๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ•์„ ์ด์šฉํ•˜์—ฌ ์„ ํ˜•ํšŒ๊ท€ ๊ตฌํ˜„ํ•˜๊ธฐ - 02

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(bostonDF[['RM', 'LSTAT']])
  • ์‚ฌ์ดํ‚ท๋Ÿฐ์˜ minmax์Šค์นผ๋ผ๋ฅผ ์ด์šฉํ•˜๋ฉด ๋ฐ์ดํ„ฐ์˜ ๊ฐ’์„ 0์—์„œ 1 ์‚ฌ์ด๋กœ ์ •๊ทœํ™”ํ•  ์ˆ˜ ์žˆ๋‹ค.

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

model = Sequential([
    # ๋‹จ ํ•˜๋‚˜์˜ units ์„ค์ •. input_shape๋Š” 2์ฐจ์›, ํšŒ๊ท€์ด๋ฏ€๋กœ activation์€ ์„ค์ •ํ•˜์ง€ ์•Š์Œ. 
    # weight์™€ bias ์ดˆ๊ธฐํ™”๋Š” kernel_inbitializer์™€ bias_initializer๋ฅผ ์ด์šฉ. 
    Dense(1, input_shape=(2, ), activation=None, kernel_initializer='zeros', bias_initializer='ones')
])
# Train with the Adam optimizer; the loss function is Mean Squared Error, and MSE is also used as the performance metric.
model.compile(optimizer=Adam(learning_rate=0.01), loss='mse', metrics=['mse'])
model.fit(scaled_features, bostonDF['PRICE'].values, epochs=1000)
  • ์ผ€๋ผ์Šค ๋ชจ๋“ˆ์„ ์ด์šฉํ•˜๋ฉด ๋ชจ๋ธ ๊ตฌ์„ฑ์— ํ•„์š”ํ•œ ํ™˜๊ฒฝ๋“ค์„ ํ•œ๋ฒˆ์— ๊ตฌ์„ฑํ•  ์ˆ˜ ์žˆ๋‹ค.

Understanding Stochastic Gradient Descent and Mini-Batch Gradient Descent

GD, Gradient Descent

  • Uses the entire training dataset for each update

SGD, Stochastic GD

  • Randomly selects just one sample from the entire training dataset for each update

    • Picking a single sample from a very large dataset may seem dubious, but it performs surprisingly well.

Mini-Batch GD

  • Randomly selects a subset of a fixed size from the entire training dataset for each update

Implementing Stochastic Gradient Descent

batch_indexes = np.random.choice(target.shape[0], batch_size)
  • np.random.choice(n, size) randomly draws size values from range(n), so this picks batch_size random row indices (with replacement by default).

  • If batch_size is 1 this is SGD; if it is 2 or more, it is mini-batch GD. The selected indices are then used to slice out the batch, as sketched below.
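
A minimal sketch of how the sampled indices are used (the batch_* names are my own; rm, lstat, and target follow the earlier snippets):

# Slice the randomly selected rows out of each array, then run the
# same gradient update step as before on this smaller batch
batch_rm = rm[batch_indexes]
batch_lstat = lstat[batch_indexes]
batch_target = target[batch_indexes]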

Implementing Mini-Batch Gradient Descent

  • ์ถ”๊ฐ€์ ์œผ๋กœ BATCH_SIZE ๋งŒํผ์„ ์ง€์ •ํ•ด์ฃผ๋Š” ๊ฒƒ ์ด์™ธ์—๋Š” ํ™•๋ฅ ์  ๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ•๊ณผ ์ฐจ์ด๊ฐ€ ์—†๋‹ค

  • Keras๋Š” ๋ฐ˜๋“œ์‹œ ๋ฏธ๋‹ˆ ๋ฐฐ์น˜ ๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ•์„ ์ ์šฉํ•˜๋ฉฐ ๊ธฐ๋ณธ ๋ฐฐ์น˜ ์‚ฌ์ด์ฆˆ๋Š” 32์ด๋‹ค.
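
For example (a sketch reusing the model from above), the batch size is passed directly to fit:

# batch_size defaults to 32 when omitted
model.fit(scaled_features, bostonDF['PRICE'].values, epochs=1000, batch_size=64)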

๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ•์˜ ์ฃผ์š” ๋ฌธ์ œ

Learning Rate

  • If it is too small, convergence takes a long time.

  • If it is too large, the minimum may never be found, or the loss may diverge (see the toy example below).

Global Minimum and Local Minima

  • Not every cost function is an ideal convex function, so gradient descent can settle in a local minimum instead of the global minimum.
