Created
July 21, 2022 14:33
MathFund_NN_from_scratch_lesson.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "MathFund_NN_from_scratch_lesson.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyObFXrTJuZu43gfAdeDbmLw",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/rhiskey/4869f54aed408309410b39e84aba7581/mathfund_nn_from_scratch_lesson.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"## A simple fully connected (Dense) layer class\n",
"Earlier you learned that a Dense layer implements the following input transformation, where W and b are model parameters and activation is an element-wise function (usually relu, but softmax for the last layer):\n",
"\n",
"    output = activation(dot(W, input) + b)\n"
],
"metadata": {
"id": "-3PMy5G7rqgx"
}
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "EqbxZm3XrdAk"
},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"class NaiveDense:\n",
"    def __init__(self, input_size, output_size, activation):\n",
"        self.activation = activation\n",
"\n",
"        w_shape = (input_size, output_size)\n",
"        w_initial_value = tf.random.uniform(w_shape, minval=0, maxval=1e-1)\n",
"        self.W = tf.Variable(w_initial_value)\n",
"\n",
"        b_shape = (output_size,)\n",
"        b_initial_value = tf.zeros(b_shape)\n",
"        self.b = tf.Variable(b_initial_value)\n",
"\n",
"    def __call__(self, inputs):\n",
"        return self.activation(tf.matmul(inputs, self.W) + self.b)\n",
"\n",
"    @property\n",
"    def weights(self):\n",
"        return [self.W, self.b]\n"
]
},
{
"cell_type": "markdown",
"source": [
"## A simple Sequential model class\n",
"Its `__call__()` method simply calls the underlying layers on the inputs, in order."
],
"metadata": {
"id": "G6E3z9pKs9Yp"
}
},
{
"cell_type": "code",
"source": [
"class NaiveSequential:\n",
"    def __init__(self, layers):\n",
"        self.layers = layers\n",
"\n",
"    def __call__(self, inputs):\n",
"        x = inputs\n",
"        for layer in self.layers:\n",
"            x = layer(x)\n",
"        return x\n",
"\n",
"    @property\n",
"    def weights(self):\n",
"        weights = []\n",
"        for layer in self.layers:\n",
"            weights += layer.weights\n",
"        return weights"
],
"metadata": {
"id": "aVLfHW3KtG70"
},
"execution_count": 2,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Using these two classes, we can create a mock Keras model:"
],
"metadata": {
"id": "sVLu4CZRtLSR"
}
},
{
"cell_type": "code",
"source": [
"model = NaiveSequential([\n",
"    NaiveDense(input_size=28*28, output_size=512, activation=tf.nn.relu),\n",
"    NaiveDense(input_size=512, output_size=10, activation=tf.nn.softmax)\n",
"])\n",
"assert len(model.weights) == 4"
],
"metadata": {
"id": "yrXzd2xPtMrC"
},
"execution_count": 3,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## A batch generator\n"
],
"metadata": {
"id": "H3O1VWfMtRhJ"
}
},
{
"cell_type": "code",
"source": [
"import math\n",
"\n",
"class BatchGenerator:\n",
"    def __init__(self, images, labels, batch_size=128):\n",
"        assert len(images) == len(labels)\n",
"        self.index = 0\n",
"        self.images = images\n",
"        self.labels = labels\n",
"        self.batch_size = batch_size\n",
"        self.num_batches = math.ceil(len(images) / batch_size)\n",
"\n",
"    def next(self):\n",
"        images = self.images[self.index : self.index + self.batch_size]\n",
"        labels = self.labels[self.index : self.index + self.batch_size]\n",
"        self.index += self.batch_size\n",
"        return images, labels"
],
"metadata": {
"id": "yj-MPXk0tQ5F"
},
"execution_count": 4,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Running one training step\n",
"\n",
"1. Compute the model's predictions for the images in the batch.\n",
"2. Compute the loss value for these predictions, given the actual labels.\n",
"3. Compute the gradient of the loss with respect to the model's weights.\n",
"4. Move the weights a small amount in the direction opposite to the gradient.\n"
],
"metadata": {
"id": "lNpmzj4mtclA"
}
},
{
"cell_type": "code",
"source": [
"def one_training_step(model, images_batch, labels_batch):\n",
"    with tf.GradientTape() as tape:\n",
"        predictions = model(images_batch)\n",
"        per_sample_losses = tf.keras.losses.sparse_categorical_crossentropy(\n",
"            labels_batch, predictions)\n",
"        average_loss = tf.reduce_mean(per_sample_losses)\n",
"    gradients = tape.gradient(average_loss, model.weights)\n",
"\n",
"    update_weights(gradients, model.weights)\n",
"    return average_loss\n"
],
"metadata": {
"id": "yWTfeZ7AtupM"
},
"execution_count": 5,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The simplest way to implement the **update_weights** function is to subtract `gradient * learning_rate` from each weight:"
],
"metadata": {
"id": "2CvkHNcwuTCK"
}
},
{
"cell_type": "code",
"source": [
"learning_rate = 1e-3\n",
"\n",
"def update_weights(gradients, weights):\n",
"    for g, w in zip(gradients, weights):\n",
"        w.assign_sub(g * learning_rate)  # in-place equivalent of w -= g * learning_rate"
],
"metadata": {
"id": "Vn9FSY52uEr3"
},
"execution_count": 6,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"In practice you would rarely implement a weight update like this by hand; instead, you would use an Optimizer instance from Keras, which does the same job:"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"from tensorflow.keras import optimizers\n",
"\n",
"optimizer = optimizers.SGD(learning_rate=1e-3)\n",
"\n",
"def update_weights(gradients, weights):\n",
"    optimizer.apply_gradients(zip(gradients, weights))"
],
"metadata": {
"id": "iR7YSw9Buxwh"
},
"execution_count": 7,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# The full training loop"
],
"metadata": {
"id": "-H5fs2-ju41G"
}
},
{
"cell_type": "code",
"source": [
"def fit(model, images, labels, epochs, batch_size=128):\n",
"    for epoch_counter in range(epochs):\n",
"        print(f\"Epoch {epoch_counter}\")\n",
"        batch_generator = BatchGenerator(images, labels, batch_size)\n",
"        for batch_counter in range(batch_generator.num_batches):\n",
"            images_batch, labels_batch = batch_generator.next()\n",
"            loss = one_training_step(model, images_batch, labels_batch)\n",
"            if batch_counter % 100 == 0:\n",
"                print(f\"loss at batch {batch_counter}: {loss:.2f}\")\n"
],
"metadata": {
"id": "DsPiE3aFvDKF"
},
"execution_count": 8,
"outputs": []
},
{
"cell_type": "code",
"source": [
"from tensorflow.keras.datasets import mnist\n",
"\n",
"(train_images, train_labels), (test_images, test_labels) = mnist.load_data()\n",
"\n",
"train_images = train_images.reshape((60000, 28*28))\n",
"train_images = train_images.astype(\"float32\") / 255\n",
"\n",
"test_images = test_images.reshape((10000, 28*28))\n",
"test_images = test_images.astype(\"float32\") / 255\n",
"\n",
"fit(model, train_images, train_labels, epochs=10, batch_size=128)"
],
"metadata": {
"id": "xHGhHpuxvGRR",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "a7f7cffb-08f7-4f44-f809-e53f699595f7"
},
"execution_count": 10,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch 0\n",
"loss at batch 0: 5.54\n",
"loss at batch 100: 2.23\n",
"loss at batch 200: 2.19\n",
"loss at batch 300: 2.07\n",
"loss at batch 400: 2.21\n",
"Epoch 1\n",
"loss at batch 0: 1.90\n",
"loss at batch 100: 1.87\n",
"loss at batch 200: 1.80\n",
"loss at batch 300: 1.68\n",
"loss at batch 400: 1.81\n",
"Epoch 2\n",
"loss at batch 0: 1.57\n",
"loss at batch 100: 1.57\n",
"loss at batch 200: 1.48\n",
"loss at batch 300: 1.40\n",
"loss at batch 400: 1.49\n",
"Epoch 3\n",
"loss at batch 0: 1.31\n",
"loss at batch 100: 1.33\n",
"loss at batch 200: 1.21\n",
"loss at batch 300: 1.18\n",
"loss at batch 400: 1.27\n",
"Epoch 4\n",
"loss at batch 0: 1.11\n",
"loss at batch 100: 1.15\n",
"loss at batch 200: 1.02\n",
"loss at batch 300: 1.02\n",
"loss at batch 400: 1.11\n",
"Epoch 5\n",
"loss at batch 0: 0.97\n",
"loss at batch 100: 1.01\n",
"loss at batch 200: 0.88\n",
"loss at batch 300: 0.91\n",
"loss at batch 400: 0.99\n",
"Epoch 6\n",
"loss at batch 0: 0.86\n",
"loss at batch 100: 0.91\n",
"loss at batch 200: 0.78\n",
"loss at batch 300: 0.82\n",
"loss at batch 400: 0.90\n",
"Epoch 7\n",
"loss at batch 0: 0.78\n",
"loss at batch 100: 0.82\n",
"loss at batch 200: 0.70\n",
"loss at batch 300: 0.75\n",
"loss at batch 400: 0.83\n",
"Epoch 8\n",
"loss at batch 0: 0.71\n",
"loss at batch 100: 0.76\n",
"loss at batch 200: 0.64\n",
"loss at batch 300: 0.70\n",
"loss at batch 400: 0.78\n",
"Epoch 9\n",
"loss at batch 0: 0.66\n",
"loss at batch 100: 0.70\n",
"loss at batch 200: 0.60\n",
"loss at batch 300: 0.66\n",
"loss at batch 400: 0.74\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# Evaluating the model\n"
],
"metadata": {
"id": "-0WYGTqhvN9x"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"predictions = model(test_images)\n",
"predictions = predictions.numpy()\n",
"predicted_labels = np.argmax(predictions, axis=1)\n",
"matches = predicted_labels == test_labels\n",
"print(f\"accuracy: {matches.mean():.2f}\")"
],
"metadata": {
"id": "LLftplLwvTwV",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "11925b36-71e0-458e-a8cd-0516b94a1d0c"
},
"execution_count": 11,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"accuracy: 0.82\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# Summary\n",
"* ***Tensors*** form the foundation of modern machine learning systems. They come in various flavors of dtype, rank, and shape.\n",
"\n",
"* You can manipulate numerical tensors via ***tensor operations*** (such as addition, the tensor product, or element-wise multiplication), which can be interpreted as encoding geometric transformations. In general, everything in deep learning is amenable to a geometric interpretation.\n",
"\n",
"* Deep learning models consist of chains of simple tensor operations, parameterized by ***weights***, which are themselves tensors. A model's weights are where its \"knowledge\" is stored.\n",
"\n",
"* ***Learning*** means finding a set of values for the model's weights that minimizes a loss function for a given set of training samples and their corresponding targets.\n",
"\n",
"* Learning happens by drawing random batches of data samples and their targets and computing the gradient of the loss on the batch with respect to the model's parameters. The parameters are then moved a little (the step size is determined by the learning rate) in the direction opposite to the gradient. This is called ***mini-batch stochastic gradient descent***.\n",
"\n",
"* The entire learning process is made possible by the fact that all tensor operations in neural networks are differentiable, so the chain rule of differentiation can be applied to derive a gradient function mapping the current parameters and the current batch of data to a gradient value. This is the ***backpropagation algorithm***.\n",
"\n",
"* Two key concepts you will see frequently are the loss and the optimizer. These are the two things you need to define before you begin feeding data into a model.\n",
"\n",
"  - The ***loss*** is the quantity you attempt to minimize during training, so it represents a measure of success for the task you are trying to solve.\n",
"\n",
"  - The ***optimizer*** specifies the exact way the gradient of the loss will be used to update parameters: for instance, it could be the RMSProp optimizer, SGD with momentum, and so on."
],
"metadata": {
"id": "oZl1gbZ5veHC"
}
}
]
} |
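As a supplement to the notebook above: the two core pieces of math it implements, the Dense-layer transformation `output = activation(dot(input, W) + b)` and the plain SGD update `w -= learning_rate * g`, can be sketched without TensorFlow at all. The snippet below is a minimal NumPy-only illustration; the names `NumpyDense` and `sgd_update` are ours, not from the notebook.

```python
import numpy as np

def relu(x):
    # Element-wise ReLU: max(0, x)
    return np.maximum(x, 0.0)

class NumpyDense:
    """A NumPy analogue of the notebook's NaiveDense layer."""
    def __init__(self, input_size, output_size, activation):
        self.activation = activation
        rng = np.random.default_rng(0)
        # Same initialization scheme as the notebook: uniform in [0, 0.1)
        self.W = rng.uniform(0.0, 1e-1, size=(input_size, output_size))
        self.b = np.zeros(output_size)

    def __call__(self, inputs):
        # output = activation(dot(inputs, W) + b)
        return self.activation(inputs @ self.W + self.b)

def sgd_update(weights, gradients, learning_rate=1e-3):
    # Move each weight a small step against its gradient (in place).
    for w, g in zip(weights, gradients):
        w -= learning_rate * g

layer = NumpyDense(input_size=4, output_size=3, activation=relu)
x = np.ones((2, 4))          # a batch of 2 samples with 4 features each
y = layer(x)
print(y.shape)               # (2, 3): one 3-dimensional output per sample
```

Computing the gradients themselves is what `tf.GradientTape` does in the notebook; this sketch only shows the forward pass and the parameter update.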