Activation Functions

Activation functions determine how the weighted sum computed inside a neuron is transformed into that neuron's output. The activation functions we usually use in our artificial neural network (ANN) models are nonlinear functions. Nonlinear activation functions increase the expressive power of our models. In this way, our ANN models can learn more complex structures in data sets and deliver more successful results, finding more effective solutions to complex real-life problems.

Another task of activation functions is to compress the outputs of neurons into a certain range of values. Although it varies from function to function, this range is typically something like [-1, 1] or [0, 1]. These values show the effect of a neuron on the output we obtain from the model. Neurons whose output is 0 or close to 0 do not affect the prediction much, while those whose output is close to 1 or -1 contribute more to the prediction.

Features that Should Be in Activation Functions

There are some features that a good activation function should have.

1 -) Nonlinearity: As mentioned in the previous paragraphs, activation functions should not be linear, so that the model can find solutions to more complex problems.

2 -) Differentiability: Activation functions must be continuous and differentiable, so that their first derivatives can be computed. In our next articles, we will talk in more detail about why derivatives are needed in optimization techniques.

3 -) Bounded Range: For more effective model training, we want the values of activation functions to stay within certain ranges rather than growing without bound.

4 -) Monotonicity: A monotonic activation function guarantees that there are clear minimum or maximum points to move toward, which is an indication that learning will happen.

5 -) Approximating the Identity Near the Origin: When an activation function behaves like the identity near the origin, learning works well when the initial weights of ANN models are randomly assigned as small values. Otherwise, the weights of the model must be initialized with different methods. The short sketch after this list illustrates these properties on a concrete function.
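As an illustrative check (not part of the original article), the snippet below evaluates tanh, a function we will meet again further down, at a few points: its outputs stay inside (-1, 1), they increase monotonically, its derivative 1 - tanh(x)^2 exists everywhere, and near the origin tanh(x) is approximately equal to x.

import math

# evaluate tanh at a few sample points
for x in [-5.0, -1.0, -0.1, 0.0, 0.1, 1.0, 5.0]:
	y = math.tanh(x)
	# bounded: every output stays inside (-1, 1)
	# monotonic: larger x always gives a larger tanh(x)
	# differentiable: the derivative is 1 - tanh(x)^2
	# near the origin tanh(x) is approximately x
	print(f"x = {x:5.1f}  tanh(x) = {y:7.4f}  tanh'(x) = {1 - y * y:.4f}")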

Types of Activation Functions

As we mentioned in our previous article, artificial neural networks consist of three types of layers: the input layer, hidden layers and the output layer. Activation functions are not used in the input layer, because it directly passes the values we receive from the data set on to the hidden layers. Activation functions are used in the hidden layers and in the output layer. Usually all hidden layers use the same activation function. For the output layer, the type of prediction being made is what matters when choosing the activation function. First, let's examine the activation functions for the output layer.

Activation Functions Used in Output Layers

Artificial neural networks make three types of predictions: regression, binary classification and multiclass classification. We can compare a regression prediction to a cause-and-effect relationship. For example, we can give the properties of any house to the ANN model and obtain the estimated price of the house as output. As an example of binary classification, we can give the pixel values of a cat image to the ANN model and, as output, obtain the probability of whether or not there is a cat in the picture. In multiclass classification, we can output the probability of finding a cat and a dog in a picture separately. We choose our activation function according to the type of prediction.

Linear Function

It is used in regression. The linear activation function outputs its input value unchanged.

from matplotlib import pyplot
 
# linear function
def linear(x):
	return x
 
entries = [x for x in range(-10, 10)]

out = [linear(x) for x in entries]

pyplot.plot(entries, out)
pyplot.show()

Sigmoid Function

Used in binary classification. It compresses the incoming input value into the range (0, 1). The graph and formula of the function are as follows.

[Figure: graph and formula of the sigmoid function, sigmoid(x) = 1 / (1 + e^(-x))]

For example, if we give a picture of a cat to an ANN model and get a value of 0.85 from the sigmoid function as output, we can interpret the probability that there is a cat in this picture as 85%. Conversely, if we give our model a picture that does not contain a cat and get 0.05 as output, we can interpret the probability that there is no cat in this picture as 95%, or the probability that there is a cat in the picture as 5%.

from math import exp
from matplotlib import pyplot
 
# sigmoid function
def sigmoid(x):
	return 1.0 / (1.0 + exp(-x))
 
# entries
entries = [x for x in range(-10, 10)]
# outputs
out = [sigmoid(x) for x in entries]
# plot the graph
pyplot.plot(entries, out)
pyplot.show()

Softmax Function

Used in multiclass classification. It returns a vector as output. Its formula is as follows.

[Figure: formula of the softmax function, softmax(x_i) = e^(x_i) / Σ_j e^(x_j)]

The output vector holds the probability value of each class. The sum of these probability values is 1.0. For example, if we give our ANN model a picture of a digit from 0 through 9 and it outputs the vector [0.03, 0.07, 0.1, 0.04, 0.005, 0.001, 0.06, 0.67, 0.02, 0.004], we can say that there is a 67% probability that the picture contains a 7.


import numpy as np
 
# softmax function
def softmax(x):
	return np.exp(x) / np.exp(x).sum()
 
# entries
entries = [1.0, 3.0, 2.0]
# outputs
out = softmax(entries)
# print the probability values
print(out)
# print the sum of the probabilities
print(out.sum())
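A small side note that is not covered in the original article: in practice, softmax is often implemented by subtracting the maximum input value before exponentiating. Mathematically the result is identical, but it avoids overflow when the inputs are large. A minimal sketch:

import numpy as np

# numerically stable softmax: shift inputs by their maximum first
def softmax_stable(x):
	x = np.asarray(x)
	shifted = x - x.max()
	return np.exp(shifted) / np.exp(shifted).sum()

# produces the same probabilities as the plain version, even for large inputs
print(softmax_stable([1.0, 3.0, 2.0]))
print(softmax_stable([1000.0, 1001.0, 1002.0]))  # the plain version would overflow here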

Activation Functions Used in Hidden Layers

The most commonly used activation functions in hidden layers are ReLU, sigmoid and tanh. We already covered the sigmoid function among the functions used in the output layer, so let's examine the ReLU and tanh functions.

ReLU Function

It is one of the most widely used activation functions in hidden layers. The formula and graph are as shown below.

[Figure: formula and graph of the ReLU function, ReLU(x) = max(0, x)]

If the incoming input value is greater than 0, it is returned as output unchanged, so in that region it behaves like a linear function. If the input is less than or equal to 0, the output is simply 0.

from matplotlib import pyplot
 
# ReLU function
def ReLU(x):
	return max(0.0, x)
 
# entries
entries = [x for x in range(-10, 10)]
# outputs
out = [ReLU(x) for x in entries]
# plot the graph
pyplot.plot(entries, out)
pyplot.show()

Tanh Function

It is very similar to the sigmoid function. The only difference is that while the sigmoid takes values in (0, 1), the tanh function takes values in (-1, 1). Its formula and graph are as follows.

[Figure: formula and graph of the tanh function, tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))]

High input values give outputs close to 1, while low input values give outputs close to -1.
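The original article does not include a code example for tanh; following the pattern of the earlier snippets, a minimal sketch using Python's built-in math.tanh might look like this.

from math import tanh
from matplotlib import pyplot

# tanh function
def tanh_activation(x):
	return tanh(x)

# entries
entries = [x for x in range(-10, 10)]
# outputs
out = [tanh_activation(x) for x in entries]
# plot the graph
pyplot.plot(entries, out)
pyplot.show()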

How to Choose Activation Functions?

It is a little easier to choose the function we will use in the output layer than in the hidden layers, because we pick the activation function that matches our prediction type. In the hidden layers, we choose the function by trial and error: we can train our model with different activation functions and continue with the one that gives the best performance.
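As a rough illustration of this trial-and-error idea (not from the original article, and assuming the TensorFlow/Keras library and synthetic data purely for demonstration), one might loop over candidate hidden-layer activations like this:

import numpy as np
import tensorflow as tf

# synthetic binary-classification data, only for demonstration
rng = np.random.default_rng(0)
x_train = rng.normal(size=(200, 4))
y_train = (x_train.sum(axis=1) > 0).astype("float32")

def build_model(activation):
	# one hidden layer whose activation we want to test
	return tf.keras.Sequential([
		tf.keras.Input(shape=(4,)),
		tf.keras.layers.Dense(16, activation=activation),
		tf.keras.layers.Dense(1, activation="sigmoid"),
	])

# try each candidate activation and compare the resulting accuracy
for activation in ["relu", "tanh", "sigmoid"]:
	model = build_model(activation)
	model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
	history = model.fit(x_train, y_train, epochs=10, verbose=0)
	print(activation, history.history["accuracy"][-1])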