Activation functions determine how the weighted sum computed inside a neuron is transformed into the neuron's output. The activation functions we usually use in artificial neural network (ANN) models are nonlinear functions. Nonlinear activation functions increase the representational power of our models. In this way, our ANN models learn more complex structures in data sets and deliver more successful results, finding more effective solutions to complex real-life problems.
Another task of activation functions is to compress the outputs of neurons into a certain range. Although it varies from function to function, this range is something like [-1, 1] or [0, 1]. These values show the effect each neuron has on the output we obtain from the model. Neurons whose activation is at or near 0 do not affect the prediction much, while those with values close to 1 or -1 contribute more to it.
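As a quick numerical illustration (a minimal sketch using Python's standard math module), here is how sigmoid squashes raw neuron outputs of very different magnitudes into (0, 1) and how tanh squashes them into (-1, 1):

```python
import math

# raw (pre-activation) neuron outputs of very different magnitudes
raw_outputs = [-6.0, -0.5, 0.0, 0.5, 6.0]

# sigmoid squashes any real number into the range (0, 1)
sigmoid_out = [1.0 / (1.0 + math.exp(-x)) for x in raw_outputs]

# tanh squashes any real number into the range (-1, 1)
tanh_out = [math.tanh(x) for x in raw_outputs]

print(sigmoid_out)  # values near 0 contribute little, values near 1 contribute strongly
print(tanh_out)
```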
Properties a Good Activation Function Should Have
There are several properties that a good activation function should have.
1-) Nonlinearity: As mentioned in the previous paragraphs, activation functions should be nonlinear in order for the model to find solutions to more complex problems.
2-) Differentiability: Activation functions must be continuous and have a first-order derivative. In later articles, we will talk in more detail about why derivatives are needed in optimization techniques.
3-) Bounded Range: For more effective model training, we want the values of activation functions to lie within a certain range rather than being unbounded.
4-) Monotonicity: When an activation function is monotonic, the error surface stays better behaved, so training can reliably move toward a minimum, which is an indication that learning will happen.
5-) Approximates the Identity Near the Origin: When an activation function behaves approximately like f(x) = x near the origin, learning works well even when the initial weights of ANN models are randomly assigned as small values. Otherwise, the weights of the model should be initialized with special methods.
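As a rough numerical sanity check (a sketch, not a formal proof), we can probe a function such as tanh for three of these properties: bounded range, monotonicity, and identity-like behavior near the origin:

```python
import math

# a grid of inputs from -10 to 10
xs = [x / 10.0 for x in range(-100, 101)]
ys = [math.tanh(x) for x in xs]

# bounded range: all outputs stay inside (-1, 1)
assert all(-1.0 < y < 1.0 for y in ys)

# monotonic: outputs never decrease as inputs increase
assert all(y1 <= y2 for y1, y2 in zip(ys, ys[1:]))

# approximately the identity near the origin: tanh(x) is close to x for small x
for x in [-0.01, 0.0, 0.01]:
    assert abs(math.tanh(x) - x) < 1e-4

print("tanh passes the checks")
```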
Types of Activation Functions
As we mentioned in our previous article, artificial neural networks consist of three types of layers: the input layer, hidden layers, and the output layer. No activation function is used in the input layer, because it transfers the training values we receive from the data set directly to the hidden layers. Activation functions are used in the hidden layers and the output layer. Usually all hidden layers use the same activation function. For the output layer, the type of prediction being made determines which activation function to choose. First, let's examine the activation functions used in the output layer.
Activation Functions Used in Output Layers
Artificial neural networks make three types of predictions: regression, binary classification, and multiclass classification. We can compare regression to a cause-and-effect relationship. For example, we can give the features of any house to an ANN model and obtain the estimated price of the house as output. As an example of binary classification, we can give the pixel values of an image to the ANN model and obtain, as output, the probability that a cat is in the picture. In multiclass classification, we can output the probability of finding a cat and a dog separately in a picture. We choose our activation function according to the type of prediction.
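The mapping described above can be summarized in a small lookup table (the dictionary and its keys here are just an illustrative summary of the text, not standard API names):

```python
# output-layer activation by prediction type (illustrative summary of the text)
output_activation = {
    "regression": "linear",                   # e.g. predicting a house price
    "binary_classification": "sigmoid",       # e.g. cat / no cat
    "multiclass_classification": "softmax",   # e.g. one probability per class
}

print(output_activation["regression"])
```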
Linear Activation Function
Used in regression. A linear activation function outputs its input value unchanged.
from matplotlib import pyplot

# linear function
def linear(x):
    return x

# inputs
entries = [x for x in range(-10, 10)]

# outputs
out = [linear(x) for x in entries]

# plot the graph
pyplot.plot(entries, out)
pyplot.show()
Sigmoid Activation Function
Used in binary classification. It compresses the incoming input value into the range (0, 1). The formula is sigmoid(x) = 1 / (1 + e^(-x)), and the graph of the function is shown below.
For example, if we give a picture of a cat to an ANN model and the sigmoid function outputs 0.85, we can interpret the probability of a cat being in this picture as 85%. Conversely, if we give our model a picture that does not contain a cat and get 0.05 as output, we can interpret the probability of there being no cat in this picture as 95%, or the probability of a cat being in the picture as 5%.
from math import exp
from matplotlib import pyplot

# sigmoid function
def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

# inputs
entries = [x for x in range(-10, 10)]

# outputs
out = [sigmoid(x) for x in entries]

# plot the graph
pyplot.plot(entries, out)
pyplot.show()
Softmax Activation Function
Used in multiclass classification. It returns a vector as output. The formula is softmax(x_i) = e^(x_i) / Σ_j e^(x_j).
The output vector holds the probability value of each class, and these probabilities sum to 1.0. For example, if we give a picture of one of the digits 0 through 9 to our ANN model and it outputs the vector [0.03, 0.07, 0.1, 0.04, 0.005, 0.001, 0.06, 0.67, 0.02, 0.004], we can say that there is a 67% probability that the digit in the picture is 7.
import numpy as np

# softmax function
def softmax(x):
    return np.exp(x) / np.exp(x).sum()

# inputs
entries = [1.0, 3.0, 2.0]

# outputs
out = softmax(np.array(entries))

# print the probabilities
print(out)

# print the sum of the probabilities
print(out.sum())
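Continuing the digit example above (a small usage sketch; the raw scores below are made up for illustration), the predicted class is simply the index with the highest probability:

```python
import numpy as np

# softmax function, as defined above
def softmax(x):
    return np.exp(x) / np.exp(x).sum()

# hypothetical raw scores for the digits 0-9
scores = np.array([0.1, 0.3, 0.5, 0.2, -1.0, -2.0, 0.4, 2.5, 0.0, -1.5])
probs = softmax(scores)

# the probabilities sum to 1, and the predicted digit is the argmax
print(probs.sum())       # -> 1.0 (up to floating point)
print(np.argmax(probs))  # -> 7, the most likely digit
```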
Activation Functions Used in Hidden Layers
The most commonly used activation functions in hidden layers are ReLU, sigmoid, and tanh. We covered the sigmoid function among those used in the output layer, so here we can examine the ReLU and tanh functions.
ReLU Activation Function
It is one of the most widely used activation functions in hidden layers. The formula is ReLU(x) = max(0, x), and the graph is shown below.
If the incoming input value is greater than 0, it is returned as output unchanged, just as in a linear function. If the input is less than or equal to 0, the output is 0.
from matplotlib import pyplot

# ReLU function
def ReLU(x):
    return max(0.0, x)

# inputs
entries = [x for x in range(-10, 10)]

# outputs
out = [ReLU(x) for x in entries]

# plot the graph
pyplot.plot(entries, out)
pyplot.show()
Tanh Activation Function
It is very similar to the sigmoid function. The only difference is that while sigmoid takes values in (0, 1), the tanh function takes values in (-1, 1). The formula is tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)).
Large positive input values produce outputs close to 1, while large negative input values produce outputs close to -1.
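Following the same pattern as the earlier examples, the tanh graph can be drawn with Python's built-in math.tanh:

```python
from math import tanh
from matplotlib import pyplot

# inputs
entries = [x for x in range(-10, 10)]

# outputs: tanh maps every input into the range (-1, 1)
out = [tanh(x) for x in entries]

# plot the graph
pyplot.plot(entries, out)
pyplot.show()
```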
How to Choose Activation Functions?
Choosing the function for the output layer is a little easier than for the hidden layers, because there we simply pick the activation function that matches our prediction type. In the hidden layers, we choose the function by trial and error: we can train our model with different activation functions and move on with the one that gives the best performance.
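The trial-and-error idea above can be sketched with plain NumPy: below, a tiny one-hidden-layer network is trained from scratch on a toy regression problem (learning y = x²) once for each candidate activation, and the final errors are compared. The network size, learning rate, and data set are illustrative assumptions, not recommendations from the article.

```python
import numpy as np

# toy regression data: learn y = x^2 on [-2, 2]
rng = np.random.default_rng(0)
X = np.linspace(-2, 2, 64).reshape(-1, 1)
y = X ** 2

# candidate hidden-layer activations and their derivatives
def relu(z):    return np.maximum(0.0, z)
def relu_d(z):  return (z > 0).astype(float)
def tanh_d(z):  return 1.0 - np.tanh(z) ** 2
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))
def sigmoid_d(z):
    s = sigmoid(z)
    return s * (1.0 - s)

candidates = {
    "relu": (relu, relu_d),
    "tanh": (np.tanh, tanh_d),
    "sigmoid": (sigmoid, sigmoid_d),
}

results = {}
for name, (act, act_d) in candidates.items():
    # one hidden layer with 8 neurons, linear output for regression
    W1 = rng.normal(scale=0.5, size=(1, 8)); b1 = np.zeros(8)
    W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
    lr = 0.01
    for _ in range(3000):
        z = X @ W1 + b1               # pre-activations of the hidden layer
        h = act(z)                    # hidden-layer outputs
        pred = h @ W2 + b2            # linear output layer
        grad = 2.0 * (pred - y) / len(X)   # dMSE/dpred
        dz = (grad @ W2.T) * act_d(z)      # backprop through the activation
        W2 -= lr * (h.T @ grad); b2 -= lr * grad.sum(0)
        W1 -= lr * (X.T @ dz);   b1 -= lr * dz.sum(0)
    final = act(X @ W1 + b1) @ W2 + b2
    results[name] = float(np.mean((final - y) ** 2))

# lower MSE = better fit on this particular toy task
for name, mse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: final MSE = {mse:.4f}")
```

The ranking this prints only holds for this toy task; on a real data set the same experiment is repeated with the actual model and the best-performing activation is kept.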