AI Mini Series: What is Backpropagation and Gradient Descent in Neural Networks, Continued.


About this title

Briefing Document: Preparing Data for Neural Networks

Executive Summary

This document outlines critical considerations for preparing training data, initial weights, and output targets for neural networks. Proper preparation is crucial for successful training and for preventing issues such as saturation (where learning stagnates) and the inability to learn caused by zeroed values. The core idea is to keep values within a manageable range that matches the chosen activation function.

Main Themes and Key Ideas

The Importance of Data Preparation

Neural networks are not inherently robust; successful training requires careful consideration of inputs, outputs, and initial weights. "Not all attempts at using neural networks will work well, for many reasons. Some of those reasons can be addressed by thinking about the training data, the initial weights, and designing a good output scheme." Poor preparation can lead to ineffective learning and can even prevent the network from learning at all.

Input Data Scaling

Problem: Large input values can saturate the activation function (e.g. the sigmoid), which means its gradient becomes very small. A very small gradient reduces the network's ability to learn: "A very flat activation function is problematic because we use the gradient to learn new weights... A tiny gradient means we've limited the ability to learn."

Problem: Very small input values can also cause trouble, because computers lose accuracy when dealing with extremely small or large numbers.

Solution: Rescale inputs to a small range, typically between 0.0 and 1.0. Some practitioners add a small offset (e.g. 0.01) to avoid zero values, since zero inputs are "troublesome because they kill the learning ability by zeroing the weight update expression by setting that o_j = 0." The goal is to keep the input signals "well behaved" without either saturating the activation function or zeroing it out.

Output Target Scaling

Problem: Target values outside the range of the activation function lead to saturation. With a logistic (sigmoid) function, for example, the output is limited to (0, 1) and is asymptotic, never actually reaching 0 or 1. "If we do set target values in these inaccessible forbidden ranges, the network training will drive ever larger weights in an attempt to produce larger and larger outputs which can never actually be produced by the activation function."

Solution: Scale target output values to align with the possible outputs of the activation function. A common range for logistic functions is 0.01 to 0.99, avoiding the unattainable values of 0 and 1. Both scaling steps are illustrated in the short sketch below.
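To make the two scaling rules concrete, here is a minimal NumPy sketch that rescales raw 8-bit values (0 to 255) into the range 0.01 to 1.00 and builds target vectors of 0.01/0.99 for a logistic output layer. The pixel-style inputs, the class count, and the helper names prepare_inputs and prepare_targets are illustrative assumptions, not taken from the briefing itself.

```python
import numpy as np

def prepare_inputs(raw_values):
    """Rescale raw 0-255 values into 0.01-1.00 (illustrative assumption: 8-bit data).

    Dividing by 255 maps the values into 0.0-1.0; scaling by 0.99 and adding 0.01
    ensures no input is exactly zero (which would zero the weight-update expression)
    and none is large enough to push the sigmoid into its flat, saturated region.
    """
    raw_values = np.asarray(raw_values, dtype=float)
    return (raw_values / 255.0) * 0.99 + 0.01

def prepare_targets(label, n_classes):
    """Build a target vector that stays inside the reachable range of a logistic output.

    All entries are 0.01 and the correct class is 0.99, because the sigmoid can
    never actually output 0 or 1.
    """
    targets = np.full(n_classes, 0.01)
    targets[label] = 0.99
    return targets

# Example: a dummy four-value input record and a label of class 2 out of 3 classes.
print(prepare_inputs([0, 128, 200, 255]))  # smallest value maps to 0.01, largest to 1.00
print(prepare_targets(2, 3))               # [0.01 0.01 0.99]
```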
Random Initial Weights

Problem: Large initial weights cause saturation by feeding large signals into the activation function.

Problem: Constant or zero initial weights prevent effective learning. Zeroed weights kill the input signal and with it the ability to update the weights: "Zero weights are even worse because they kill the input signal... That kills the ability to update the weights completely."

Solution: Initialize the network with small random weights. A basic approach is to sample from the range -1.0 to +1.0.

More Sophisticated Approach: A commonly used rule of thumb is to initialize weights with values sampled from a normal distribution with a mean of zero and a standard deviation equal to the inverse of the square root of the number of incoming links into a node: "the weights are initialised randomly sampling from a range that is roughly the inverse of the square root of the number of links into a node." This method takes into account how many input signals a node receives and adjusts the weight range accordingly, in order to "support keeping those signals well behaved as they are combined and the activation function applied".

Avoid Symmetry

Setting all initial weights to the same value, especially zero, would mean that all nodes receive the same signal, creating an undesirable symmetry that prevents the network from learning properly, because all updates would be equal. "This symmetry is bad because if the properly trained network should have unequal weights (extremely likely for almost all problems) then you'd never get there."

Key Takeaway

"Neural networks don't work well if the input, output and initial weight data is not prepared to match the network design and the actual problem being solved." Saturation and zeroed values are the key issues to avoid during data preparation.

Key Recommendations

Scale Inputs: Rescale inputs to a small range such as 0.0 to 1.0 or 0.01 to 0.99 to prevent saturation and the problems that arise from extremely small values.

Scale Outputs: Ensure target outputs match the range of the activation function. For a logistic sigmoid function, a good range is 0.01 to 0.99.

Randomize Initial Weights: Initialize weights with small random values, avoiding constant or zero values. Prefer the more sophisticated method of sampling from a normal distribution with a mean of zero and a standard deviation equal to the inverse of the square root of the number of incoming links into a node, as sketched below.
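The weight-initialization rule of thumb above can be written in a few lines of NumPy. The layer sizes below are arbitrary placeholders chosen only for illustration; the rule itself is the one stated in the briefing: mean zero, standard deviation 1/sqrt(number of incoming links).

```python
import numpy as np

rng = np.random.default_rng()

# Illustrative layer sizes (placeholders, not from the briefing).
n_inputs, n_hidden, n_outputs = 784, 100, 10

# Normal distribution: mean 0, standard deviation 1/sqrt(incoming links),
# so the combined signal into each node stays in the sigmoid's sensitive range.
w_input_hidden = rng.normal(0.0, n_inputs ** -0.5, size=(n_hidden, n_inputs))
w_hidden_output = rng.normal(0.0, n_hidden ** -0.5, size=(n_outputs, n_hidden))

# Simpler alternative: small uniform random weights between -1.0 and +1.0.
# Either way, the weights differ from one another, avoiding the symmetry
# that constant or zero initial weights would create.
w_simple = rng.uniform(-1.0, 1.0, size=(n_hidden, n_inputs))
```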