
How to Plot a Loss Function in Python

Since a neural network typically has many parameters (hundreds of millions or more), its loss surface lives in a space too large to visualize. There are, however, some tricks we can use to get a good two-dimensional look at it and so gain a valuable source of intuition. I learned about these from Visualizing the Loss Landscape of Neural Nets by Li, et al. (arXiv).

The Linear Case

Let's start with the two-dimensional case to get an idea of what we're looking for.

A single neuron with one input computes \(y = w x + b\) and so has only two parameters: a weight \(w\) for the input \(x\) and a bias \(b\). Having only two parameters means we can view every dimension of the loss surface with a simple contour plot, the bias along one axis and the single weight along the other:

Animation: traversing the loss landscape of a linear model with SGD.

After training this simple linear model, we'll have a pair of weights \(w_c\) and \(b_c\) that should be approximately where the minimal loss occurs; it's nice to take this point \((w_c, b_c)\) as the center of the plot. By collecting the weights at every step of training, we can trace out the path taken by SGD across the loss surface towards the minimum.
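
As a rough illustration of the linear case, the sketch below uses NumPy only: it fits a single neuron with mini-batch SGD, records the weights at every step, and evaluates the mean squared error on a grid centered at the final weights. The synthetic data, the learning rate, the grid extent, and the helper name `mse` are all assumptions made for this example, not taken from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a known line y = 2x - 1 plus noise (assumed for illustration)
x = rng.normal(size=256)
y = 2.0 * x - 1.0 + rng.normal(scale=0.1, size=256)


def mse(w, b):
    """Mean squared error of the neuron y_hat = w * x + b on the data."""
    return np.mean((w * x + b - y) ** 2)


# Mini-batch SGD, recording the weights after every step
w, b, lr = 0.0, 0.0, 0.1
path = [(w, b)]
for step in range(200):
    idx = rng.choice(len(x), size=32, replace=False)
    residual = w * x[idx] + b - y[idx]
    w -= lr * 2.0 * np.mean(residual * x[idx])  # d(MSE)/dw on the batch
    b -= lr * 2.0 * np.mean(residual)           # d(MSE)/db on the batch
    path.append((w, b))
path = np.array(path)
w_c, b_c = path[-1]

# Loss surface on a grid centered at the trained weights (w_c, b_c);
# rows index the bias, columns index the weight
ws = w_c + np.linspace(-3.0, 3.0, 61)
bs = b_c + np.linspace(-3.0, 3.0, 61)
loss_grid = np.array([[mse(wi, bi) for wi in ws] for bi in bs])

print(f"trained weights: w={w_c:.2f}, b={b_c:.2f}")
print(f"loss at start: {mse(*path[0]):.3f}, at end: {mse(w_c, b_c):.3f}")
```

With matplotlib available, `plt.contour(ws, bs, loss_grid)` followed by `plt.plot(path[:, 0], path[:, 1])` would draw the surface with the SGD path on top of it.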

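Our goal now is to get similar kinds of images for networks with any number of parameters.

Making Random Slices

How can we view the loss landscape of a larger network? We can't get anything like a complete view of the loss surface, but we can take a two-dimensional slice out of it and look at the contours of that slice.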

This slice is basically a coordinate system: we need a center (the origin) and a pair of direction vectors (the axes). As before, let's take the weights \(W_c\) from the trained network to act as the center, and the direction vectors \(W_0\) and \(W_1\) we'll generate randomly.

Now the loss at some point \((a, b)\) on the plot is obtained by setting the weights of the network to \(W_c + a W_0 + b W_1\) and evaluating it on the given data. Plot these losses across some range of values for \(a\) and \(b\), and we can produce our contour plot.

Figure: random slices through a high-dimensional loss surface.

We might worry that the plot would be distorted if the random vectors we chose happened to be close together, even though we've plotted them as if they were at a right angle. It's a nice fact about high-dimensional vector spaces, though, that any two random vectors you choose from them will usually be close to orthogonal.
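
The near-orthogonality claim is easy to check numerically. The following NumPy sketch draws pairs of random Gaussian vectors and reports the average absolute cosine of the angle between them; the function name `mean_abs_cosine`, the dimensions tried, and the number of trials are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def mean_abs_cosine(dim, trials=200):
    """Average |cos(angle)| between pairs of random Gaussian vectors in `dim` dimensions."""
    u = rng.normal(size=(trials, dim))
    v = rng.normal(size=(trials, dim))
    cos = np.sum(u * v, axis=1) / (
        np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
    )
    return np.mean(np.abs(cos))


for dim in (2, 100, 10_000):
    print(f"dim={dim:>6}: mean |cos| = {mean_abs_cosine(dim):.3f}")
```

A cosine near zero means the vectors are close to a right angle. The value should shrink toward zero as the dimension grows, so for a network with many thousands of weights, drawing the two random directions as perpendicular axes distorts very little.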

Here's how we might implement the random slice:

import matplotlib as mpl  # needed for mpl.colors.LogNorm in LossSurface.plot
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import callbacks, layers

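class RandomCoordinates(object):
    """A random 2D coordinate system centered on a set of network weights."""

    def __init__(self, origin):
        self.origin_ = origin
        # Two random directions, one array per weight tensor, each rescaled
        # to the norm of the corresponding tensor in `origin`.
        self.v0_ = normalize_weights(
            [np.random.normal(size=w.shape) for w in origin], origin
        )
        self.v1_ = normalize_weights(
            [np.random.normal(size=w.shape) for w in origin], origin
        )

    def __call__(self, a, b):
        # Weights at the point (a, b) of the slice: W_c + a * W_0 + b * W_1
        return [
            a * w0 + b * w1 + wc
            for w0, w1, wc in zip(self.v0_, self.v1_, self.origin_)
        ]


def normalize_weights(weights, origin):
    """Rescale each array in `weights` to the norm of its partner in `origin`."""
    return [
        w * np.linalg.norm(wc) / np.linalg.norm(w)
        for w, wc in zip(weights, origin)
    ]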


class LossSurface(object):
    """Evaluates a model's loss over a 2D grid of points in weight space."""

    def __init__(self, model, inputs, outputs):
        self.model_ = model
        self.inputs_ = inputs
        self.outputs_ = outputs

    def compile(self, coords, range=1.0, points=24):
        # Cubing the linspace concentrates grid points near the origin
        a_grid = np.linspace(-1.0, 1.0, num=points) ** 3 * range
        b_grid = np.linspace(-1.0, 1.0, num=points) ** 3 * range
        loss_grid = np.empty([len(b_grid), len(a_grid)])
        for i, a in enumerate(a_grid):
            for j, b in enumerate(b_grid):
                self.model_.set_weights(coords(a, b))
                loss = self.model_.test_on_batch(
                    self.inputs_, self.outputs_, return_dict=True
                )["loss"]
                # Rows index b (y-axis), columns index a (x-axis)
                loss_grid[j, i] = loss
        # Restore the trained weights
        self.model_.set_weights(coords.origin_)
        self.a_grid_ = a_grid
        self.b_grid_ = b_grid
        self.loss_grid_ = loss_grid

    def plot(self, range=1.0, points=24, levels=20, ax=None, **kwargs):
        xs = self.a_grid_
        ys = self.b_grid_
        zs = self.loss_grid_
        if ax is None:
            _, ax = plt.subplots(**kwargs)
            ax.set_title("The Loss Surface")
            ax.set_aspect("equal")
        # Set levels, evenly spaced on a log scale between min and max loss
        min_loss = zs.min()
        max_loss = zs.max()
        levels = np.exp(
            np.linspace(np.log(min_loss), np.log(max_loss), num=levels)
        )
        # Create contour plot
        CS = ax.contour(
            xs,
            ys,
            zs,
            levels=levels,
            cmap="magma",
            linewidths=0.75,
            norm=mpl.colors.LogNorm(vmin=min_loss, vmax=max_loss * 2.0),
        )
        ax.clabel(CS, inline=True, fontsize=8, fmt="%1.2f")
        return ax

Let's try it out. We'll create a simple fully-connected network to fit a curve to this parabola:

# Create some data
NUM_EXAMPLES = 256
BATCH_SIZE = 64
x = tf.random.normal(shape=(NUM_EXAMPLES, 1))
err = tf.random.normal(shape=x.shape, stddev=0.25)
y = x ** 2 + err
y = tf.squeeze(y)
ds = (tf.data.Dataset
      .from_tensor_slices((x, y))
      .shuffle(NUM_EXAMPLES)
      .batch(BATCH_SIZE))
plt.plot(x, y, 'o', alpha=0.5)

# Fit a fully-connected network (ie, a multi-layer perceptron)
model = keras.Sequential([
  layers.Dense(64, activation='relu'),
  layers.Dense(64, activation='relu'),
  layers.Dense(64, activation='relu'),
  layers.Dense(1)
])
model.compile(
  loss='mse',
  optimizer='adam',
)
history = model.fit(
  ds,
  epochs=200,
  verbose=0,
)

# Look at fitted curve
grid = tf.linspace(-4.0, 4.0, 3000)[:, tf.newaxis]  # shape (3000, 1)
fig, ax = plt.subplots()
ax.plot(x, y, 'o', alpha=0.1)
ax.plot(grid, model.predict(grid).reshape(-1, 1), color='k')

Looks like we got an okay fit, so now we'll look at a random slice from the loss surface:

# Create loss surface
coords = RandomCoordinates(model.get_weights())
loss_surface = LossSurface(model, x, y)
loss_surface.compile(points=30, coords=coords, range=1.0)

# Look at loss surface
loss_surface.plot(dpi=100)

Improving the View

Getting a good plot of the path the parameters take during training requires one more trick. A path through a random slice of the landscape tends to show too little variation to give a good idea of how the training actually proceeded. A more representative view would show us the directions along which the parameters varied the most. We want, in other words, the first two principal components of the collection of parameters assumed by the network during training.

from sklearn.decomposition import PCA

# Some utility functions to reshape network weights
def vectorize_weights_(weights):
    """Flatten a list of weight arrays into a single 1D vector."""
    vec = [w.flatten() for w in weights]
    vec = np.hstack(vec)
    return vec


def vectorize_weight_list_(weight_list):
    """Stack several sets of weights as the columns of one matrix."""
    vec_list = []
    for weights in weight_list:
        vec_list.append(vectorize_weights_(weights))
    weight_matrix = np.column_stack(vec_list)
    return weight_matrix


def shape_weight_matrix_like_(weight_matrix, example):
    """Turn each column of `weight_matrix` back into a list of arrays
    shaped like the weights in `example`."""
    weight_vecs = np.hsplit(weight_matrix, weight_matrix.shape[1])
    sizes = [v.size for v in example]
    shapes = [v.shape for v in example]
    weight_list = []
    for net_weights in weight_vecs:
        # np.split leaves an empty trailing piece, which [:-1] drops
        vs = np.split(net_weights, np.cumsum(sizes))[:-1]
        vs = [v.reshape(s) for v, s in zip(vs, shapes)]
        weight_list.append(vs)
    return weight_list


def get_path_components_(training_path, n_components=2):
    # Vectorize network weights (one column per training step)
    weight_matrix = vectorize_weight_list_(training_path)
    # Create components
    pca = PCA(n_components=n_components, whiten=True)
    components = pca.fit_transform(weight_matrix)
    # Reshape to fit network
    example = training_path[0]
    weight_list = shape_weight_matrix_like_(components, example)
    return pca, weight_list


class PCACoordinates(object):
    """A coordinate system whose axes come from PCA of the training path."""

    def __init__(self, training_path):
        # Center the plot on the final (trained) weights
        origin = training_path[-1]
        self.pca_, self.components = get_path_components_(training_path)
        self.set_origin(origin)

    def __call__(self, a, b):
        # Weights at the point (a, b): W_c + a * W_0 + b * W_1
        return [
            a * w0 + b * w1 + wc
            for w0, w1, wc in zip(self.v0_, self.v1_, self.origin_)
        ]

    def set_origin(self, origin, renorm=True):
        self.origin_ = origin
        if renorm:
            self.v0_ = normalize_weights(self.components[0], origin)
            self.v1_ = normalize_weights(self.components[1], origin)
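
To see why principal components give a better view of a training path than random directions do, here is a small NumPy sketch on a synthetic path. It uses textbook PCA through an SVD of the centered path, with one row per training step, which is a simplification and not the same computation as `get_path_components_`. The path itself, its dimensions, and all variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_params = 40, 500

# Synthetic path: most movement happens in a hidden 2D plane, plus small noise
plane = np.linalg.qr(rng.normal(size=(n_params, 2)))[0]  # orthonormal, (n_params, 2)
t = np.linspace(0.0, 1.0, n_steps)
coeffs = np.column_stack([5.0 * (1.0 - t), 2.0 * np.cos(3.0 * t)])
path = coeffs @ plane.T + 0.01 * rng.normal(size=(n_steps, n_params))

# PCA by SVD: rows of `vt` are the principal directions in parameter space
centered = path - path.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)
pca_share = explained[:2].sum()

# For comparison: variance captured by two random orthonormal directions
random_dirs = np.linalg.qr(rng.normal(size=(n_params, 2)))[0]
random_share = np.sum((centered @ random_dirs) ** 2) / np.sum(centered**2)

print(f"variance captured by first two principal directions: {pca_share:.3f}")
print(f"variance captured by two random directions:          {random_share:.3f}")
```

On a path like this one, the two principal directions capture almost all of the movement while a random pair captures very little, which is the reason a random slice tends to show a nearly motionless path.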

Having defined PCACoordinates and its helpers, we'll train a model like before, but this time with a simple callback that collects the weights of the model while it trains:

# Create data
SEED = 0  # assumed value; any fixed integer seed will do
ds = (
    tf.data.Dataset.from_tensor_slices((x, y))  # the parabola data from before
    .repeat()
    .shuffle(1000, seed=SEED)
    .batch(BATCH_SIZE)
)


# Define Model
model = keras.Sequential(
    [
        layers.Dense(64, activation="relu", input_shape=[1]),
        layers.Dense(64, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(1),
    ]
)

model.compile(
    optimizer="adam", loss="mse",
)

# Start the path with the initial (untrained) weights
training_path = [model.get_weights()]
# Callback to collect weights as the model trains
collect_weights = callbacks.LambdaCallback(
    on_epoch_end=(
        lambda epoch, logs: training_path.append(model.get_weights())
    )
)

history = model.fit(
    ds,
    steps_per_epoch=1,
    epochs=40,
    callbacks=[collect_weights],
    verbose=0,
)

And now we can get a view of the loss surface that is more representative of where the optimization actually occurs:

# Create loss surface
coords = PCACoordinates(training_path)
loss_surface = LossSurface(model, x, y)
loss_surface.compile(points=30, coords=coords, range=0.2)
# Look at loss surface
loss_surface.plot(dpi=150)

Plotting the Optimization Path

All we're missing now is the path the neural network's weights took during training, expressed in the transformed coordinate system. Given the weights \(W\) for a neural network, in other words, we need to find the values of \(a\) and \(b\) that correspond to the direction vectors we found via PCA and the origin weights \(W_c\).

\[W - W_c = a W_0 + b W_1\]

We can't solve this using an ordinary inverse (the matrix \(\left[\begin{matrix} W_0 & W_1 \end{matrix}\right]\) isn't square), so instead we'll use the Moore-Penrose pseudoinverse, which will give us a least-squares optimal projection of \(W\) onto the coordinate vectors:

\[\left[\begin{matrix} W_0 & W_1 \end{matrix}\right]^+ (W - W_c) = (a, b)\]

This is the ordinary least squares solution to the equation above.

def weights_to_coordinates(coords, training_path):
    """Project the training path onto the first two principal components
    using the pseudoinverse."""
    components = [coords.v0_, coords.v1_]
    # direction vectors as the two columns of a tall matrix
    comp_matrix = vectorize_weight_list_(components)
    # the pseudoinverse
    comp_matrix_i = np.linalg.pinv(comp_matrix)
    # the origin vector
    w_c = vectorize_weights_(training_path[-1])
    # center the weights on the training path and project onto components
    coord_path = np.array(
        [
            comp_matrix_i @ (vectorize_weights_(weights) - w_c)
            for weights in training_path
        ]
    )
    return coord_path


def plot_training_path(coords, training_path, ax=None, end=None, **kwargs):
    path = weights_to_coordinates(coords, training_path)
    if ax is None:
        fig, ax = plt.subplots(**kwargs)
    # color each point by its position along the path
    colors = range(path.shape[0])
    end = path.shape[0] if end is None else end
    norm = plt.Normalize(0, end)
    ax.scatter(
        path[:, 0], path[:, 1], s=4, c=colors, cmap="cividis", norm=norm,
    )
    return ax
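
The claim that the pseudoinverse yields the least-squares coordinates can be checked on a small example. In this NumPy sketch the direction vectors, the origin, and the point being projected are all random stand-ins for real network weights, and the names `W01`, `a_true`, and `b_true` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_params = 1000

# Two direction vectors as columns of a tall matrix, and an origin (all assumed)
W01 = rng.normal(size=(n_params, 2))
W_c = rng.normal(size=n_params)

# A weight vector that lies close to, but not exactly in, the slice
a_true, b_true = 0.3, -1.2
W = W_c + a_true * W01[:, 0] + b_true * W01[:, 1] + 0.01 * rng.normal(size=n_params)

# Coordinates from the pseudoinverse ...
ab_pinv = np.linalg.pinv(W01) @ (W - W_c)
# ... and from an explicit least-squares solve
ab_lstsq, *_ = np.linalg.lstsq(W01, W - W_c, rcond=None)

print("pseudoinverse:", ab_pinv)
print("least squares:", ab_lstsq)
```

Both routes should agree to numerical precision and land close to the coordinates used to build `W`. The part of `W - W_c` that lies outside the slice is simply discarded by the projection.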

Applying these functions to the training path we saved means we can plot the path along with the loss landscape in the PCA coordinates.

How do I plot validation loss in Python?

How do I plot a validation curve in Python?

Import the Digits dataset and the required libraries

Import the validation curve function for visualization

Split the dataset into training and test sets

Plot the graph with matplotlib to analyze the model's validation

How do I plot training data in Python?

Step 1 - Import the libraries: import numpy as np; import matplotlib.pyplot as plt; from sklearn.ensemble import RandomForestClassifier; from sklearn import datasets; from sklearn.model_selection import learning_curve.

Step 2 - Set up the data.

Step 3 - Compute the learning curve and scores.

Step 4 - Plot the learning curve.
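
The steps above can be sketched with scikit-learn roughly as follows. The choice of the Digits dataset, the forest size, the number of folds, and the training-set fractions are assumptions for this example; the steps only name the imports.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

# Step 2: set up the data (the Digits dataset is an assumption here)
X, y = datasets.load_digits(return_X_y=True)

# Step 3: cross-validated scores at increasing training-set sizes
train_sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(n_estimators=20, random_state=0),
    X,
    y,
    cv=3,
    scoring="accuracy",
    train_sizes=np.linspace(0.1, 1.0, 5),
)
train_mean = train_scores.mean(axis=1)  # average over the folds
val_mean = val_scores.mean(axis=1)

for n, tr, va in zip(train_sizes, train_mean, val_mean):
    print(f"{n:5d} samples: train accuracy {tr:.3f}, validation accuracy {va:.3f}")
```

Step 4 then amounts to plotting `train_mean` and `val_mean` against `train_sizes`, for example with two calls to `plt.plot`.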

What is the output of a loss function?

The most commonly used loss function in image classification is cross-entropy loss (log loss): binary cross-entropy for classification between 2 classes, and sparse categorical cross-entropy for 3 or more classes. Here the model's output is a vector of probabilities that the input image belongs to each of the pre-set categories.
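
As a minimal sketch of what this looks like numerically, the NumPy code below turns raw class scores into probabilities with a softmax and computes the sparse categorical cross-entropy from integer labels. The scores and labels are made up, and the function names are chosen for this example rather than taken from any library.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def sparse_categorical_cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class of each example."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs))


# Made-up scores for 3 images over 4 classes, with integer class labels
logits = np.array([[4.0, 0.5, 0.1, 0.2],
                   [0.3, 0.2, 3.0, 0.1],
                   [1.0, 1.0, 1.0, 1.0]])
labels = np.array([0, 2, 3])

probs = softmax(logits)
loss = sparse_categorical_cross_entropy(probs, labels)
print("probabilities per class:\n", probs.round(3))
print("cross-entropy loss:", round(float(loss), 4))
```

The model's output is the probability vector in each row of `probs`; the loss itself is a single non-negative number that is small when the true class receives high probability.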
