# Optimization Techniques

Gradient descent is the basis of many machine learning algorithms and artificial neural networks. Therefore, even though more advanced versions of the gradient descent algorithm exist, it is important to understand how it works in order to build the foundation and logic for what follows.

Gradient means slope. In other words, the purpose of the gradient descent algorithm is to follow the slope of our loss function down to its minimum point. To understand this, let's examine the function f(x) = x^4 - 5x^3 + x^2 + 21x - 18, which is simpler than a real loss function. You can see the graph of the function below.

The minimum value of our function is -32, reached at x = -1. Our goal is to find the x value that minimizes the y value. Let's choose x = -2 as the starting point; our goal is to descend to x = -1. At the point x = -2, the value of our function is 0.
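These values are easy to verify with a couple of lines of Python (the function definition simply mirrors f(x) above; the name `f` is ours):

```python
def f(x):
    # f(x) = x^4 - 5x^3 + x^2 + 21x - 18
    return x**4 - 5*x**3 + x**2 + 21*x - 18

print(f(-1))  # -32, the minimum value
print(f(-2))  # 0, the value at our starting point
```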

Now we take the first derivative of our function: f'(x) = 4x^3 - 15x^2 + 2x + 21. The derivative at x = -2 is -75. The sign of this value shows us the direction in which the function increases. Since f'(-2) = -75 is negative, the function increases as we move in the -x direction and decreases in the +x direction. So we must always move in the direction opposite to our slope, which gives the following formula for updating our x value:

x_new = x - f'(x)

We update our x value in each iteration. According to the formula, our new x is -2 - (-75) = 73. The value of our function at x = 73 is about 26,460,000. The mistake here is that, by taking a very big step from x = -2, we overshot x = -1 and ended up with a much larger function value.
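The overshoot is easy to reproduce with the raw update x_new = x - f'(x) (the function names here are ours):

```python
def f(x):
    # f(x) = x^4 - 5x^3 + x^2 + 21x - 18
    return x**4 - 5*x**3 + x**2 + 21*x - 18

def f_prime(x):
    # f'(x) = 4x^3 - 15x^2 + 2x + 21
    return 4*x**3 - 15*x**2 + 2*x + 21

x = -2
x_new = x - f_prime(x)  # update without a learning rate
print(x_new)     # 73
print(f(x_new))  # 26460000: a huge overshoot past x = -1
```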

To avoid such a mistake, we must move toward our goal with smaller but surer steps. So we make a small change to our formula and multiply f'(x) by a learning rate coefficient. Our new formula is as follows:

x_new = x - learning_rate * f'(x)

The learning rate is a parameter we must choose when training artificial neural networks. Let's choose 0.01 for now. If we go back to x = -2 and update x with the new formula, our new x is -1.25. As you can see, we are moving toward our goal with surer steps, without overshooting x = -1. We calculate f(-1.25) ≈ -30.48 and f'(-1.25) = -12.75.
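One update with the learning rate in place, as a quick check (names are ours):

```python
def f(x):
    # f(x) = x^4 - 5x^3 + x^2 + 21x - 18
    return x**4 - 5*x**3 + x**2 + 21*x - 18

def f_prime(x):
    # f'(x) = 4x^3 - 15x^2 + 2x + 21
    return 4*x**3 - 15*x**2 + 2*x + 21

learning_rate = 0.01
x = -2
x = x - learning_rate * f_prime(x)  # -2 - 0.01 * (-75)
print(x)           # -1.25
print(f(x))        # about -30.48
print(f_prime(x))  # -12.75
```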

Instead of calculating the remaining steps by hand, let's minimize our function in Python. You can review the code below and also try it yourself with different starting x values and different learning rates. For example, if you start at x = 6, you can get stuck at the local minimum at x = 3. When training artificial neural networks, optimization can likewise get stuck at such local minimum points; in later articles we will discuss approaches for dealing with these situations.

```python
def function(x):
    # f(x) = x^4 - 5x^3 + x^2 + 21x - 18
    return pow(x, 4) - 5 * pow(x, 3) + pow(x, 2) + 21 * x - 18

def turev(x):
    # turev = derivative: f'(x) = 4x^3 - 15x^2 + 2x + 21
    return 4 * pow(x, 3) - 15 * pow(x, 2) + 2 * x + 21

def guncelle(x, ogrenme_hizi):
    # guncelle = update, ogrenme_hizi = learning rate
    return x - turev(x) * ogrenme_hizi

x = -2

iteration = 50
ogrenme_hizi = 0.01

print("X initial value : ", x, ", f({}) : ".format(x), function(x))

# We update our x value over 50 iterations
for i in range(iteration):
    x = guncelle(x, ogrenme_hizi)
    print(i + 1, ". iteration result x : ", x, " f({}) : ".format(x), function(x))
```
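As a quick check of the local-minimum remark, the same update rule started at x = 6 settles at x = 3 (where f(3) = 0) rather than at the global minimum of -32. A minimal sketch (names are ours):

```python
def f(x):
    return x**4 - 5*x**3 + x**2 + 21*x - 18

def f_prime(x):
    return 4*x**3 - 15*x**2 + 2*x + 21

x = 6.0
for _ in range(200):
    x -= 0.01 * f_prime(x)

print(x, f(x))  # converges to the local minimum x = 3, where f(3) = 0
```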

Coming to the artificial neural network side of things, the value of our loss function depends on the parameters of our model. These parameters are the weights and bias (threshold) values of the neurons. For example, there are a total of 41 parameters in the artificial neural network below.

In our example above, our function depended only on the x variable. That's why we could visualize it in 2 dimensions, but it is impossible to properly visualize our 41-variable loss function in 2 or 3 dimensions. Now think about the complexity of models with millions of parameters. At this point, we must find the value of each parameter that minimizes our loss function. This is computed with the help of more involved mathematics that we will not cover here, but the logic is the same as in our example above. This is where deep learning libraries come to our rescue and handle the optimization of our model in the background.
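The same logic with more than one parameter simply means updating every parameter with its own partial derivative. A minimal sketch, assuming a made-up two-parameter loss g(w, b) = (w - 2)^2 + (b + 1)^2 (everything here is illustrative, not a real network):

```python
def loss(w, b):
    # hypothetical loss with its minimum at w = 2, b = -1
    return (w - 2)**2 + (b + 1)**2

def gradient(w, b):
    # partial derivatives of the loss with respect to w and b
    return 2 * (w - 2), 2 * (b + 1)

w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(100):
    dw, db = gradient(w, b)
    w -= learning_rate * dw  # each parameter moves against its own slope
    b -= learning_rate * db

print(w, b)  # approaches w = 2, b = -1
```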

In the (batch) gradient descent algorithm, the parameters are updated only after every example in the training data set has been processed and the model's loss has been calculated. This approach has some drawbacks: on large data sets it is very slow and consumes a lot of memory. Besides batch gradient descent, there are also stochastic gradient descent and mini-batch gradient descent algorithms.
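For illustration, here is a minimal mini-batch gradient descent sketch fitting a made-up linear model y = 3x + 1 (the data, model, and names are all hypothetical; with a batch size of 1 this becomes stochastic gradient descent):

```python
import random

# hypothetical noise-free data generated from y = 3x + 1
data = [(k / 10, 3 * (k / 10) + 1) for k in range(20)]

w, b = 0.0, 0.0
learning_rate = 0.1
batch_size = 4

for epoch in range(200):
    random.shuffle(data)
    # update after each small batch instead of after the whole data set
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        dw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
        db = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
        w -= learning_rate * dw
        b -= learning_rate * db

print(w, b)  # approaches w = 3, b = 1
```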