📊 PYTHON + AI – MACHINE LEARNING
🧮 How Does the Machine Learn a Line? – The Mathematics Behind Gradient Descent


✨ Introduction

When we think of Artificial Intelligence, we often imagine complex algorithms capable of recognizing faces, predicting the weather, or automating human tasks. But at the heart of all these sophisticated models lies a simple yet powerful concept: learning through optimization. Among optimization methods, Gradient Descent is one of the most fundamental and widely used.

In this article, we will dive into this algorithm that allows AI to learn mathematical patterns – starting with the most classic example: learning the formula of a straight line. You will see how, through gradual adjustments in its parameters, AI can get closer to the ideal result, revealing the essence of supervised machine learning.


1. 🎯 Objective

To demonstrate, in a didactic, practical, and complete way, how an AI model learns a simple mathematical pattern (such as a line) using gradient descent, which adjusts its internal parameters to minimize error.


2. 🧠 Technique Concept

Gradient Descent is an optimization algorithm used to adjust the parameters of an AI model, such as weights and biases, to minimize the cost function (error). It calculates the slope (gradient) of the error function with respect to each parameter and updates them in the opposite direction to the gradient, reducing the error at each iteration.

🔍 What is it for?

  • Finding the minimum point of a function (minimum error).
  • Adjusting linear and logistic regression models.
  • Training neural networks by optimizing millions of parameters.
  • Solving curve fitting problems in physics, engineering, and economics.

🔧 Didactic summary: Gradient Descent is like going down a foggy mountain, taking small steps in the steepest direction until reaching the lowest valley (minimum).
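🧪 To make this concrete before the case study, here is a minimal numeric sketch of ours: gradient descent on the one-parameter function f(w) = (w - 5)², whose minimum sits at w = 5.

w = 0.0     # initial guess
lr = 0.1    # learning rate (step size)

for step in range(25):
    grad = 2 * (w - 5)   # derivative f'(w) = 2(w - 5)
    w -= lr * grad       # step in the direction opposite to the gradient

print(f"w after 25 steps: {w:.4f}")  # approaches 5.0

Each iteration covers a fraction of the remaining distance to the minimum – exactly the "small steps down the mountain" of the analogy above.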


3. 🧠 Core Concept

AI adjusts its internal parameters (weight and bias) to reduce the error between prediction and actual value. This is done through gradient descent, which “descends the slope of error” to find the best combination of parameters that brings the model closer to the true pattern.


4. 🔍 Understanding the Fundamentals of Machine Learning

A) 🧠 What is Gradient Descent?

It is an optimization algorithm that adjusts the model parameters to minimize error. It computes the gradient (the direction of steepest increase of the error) and updates the parameters in the opposite direction, approaching the optimal solution step by step.


B) 🪂 Why do we say it “descends the slope of error”?

Imagine the model's error as being at the top of a mountain. At each step, the algorithm calculates the slope and goes down in the steepest direction, gradually reducing the error until reaching the minimum.


C) 🎯 What are “guesses” (initial guesses)?

They are random values assigned to the parameters at the start of training, since the model does not yet know the ideal values (see the small illustration after this list).

  • Weight (w): defines the slope of the line (how much x influences y).
  • Bias (b): allows vertical shifting (e.g., +3 in y = 2x + 3).
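
For instance, the random initial guesses typically produce a line far from the target – a reproducible toy illustration of ours:

import numpy as np

np.random.seed(0)        # fixed seed so the illustration is reproducible
w = np.random.randn()    # random initial weight
b = np.random.randn()    # random initial bias
print(f"initial guess: y = {w:.2f}x + {b:.2f}   (target: y = 2x + 3)")

Training then exists precisely to move this arbitrary line toward y = 2x + 3.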


D) 🔢 What is a mathematical pattern?

It is a predictable relationship that can be modeled by a formula, such as lines, parabolas, or trigonometric functions. Machine learning consists of adjusting parameters until replicating this pattern with precision.


5. 🧮 Mathematical Formulas Involved

Article content

Loss function (MSE):

MSE = (1/n) Σ (y_i - y_pred_i)²

Gradients:

∂L/∂w = (1/n) Σ (y_pred - y) · x
∂L/∂b = (1/n) Σ (y_pred - y)

(Strictly, differentiating the square yields a factor of 2; the script below drops it, which is equivalent to halving the learning rate.)
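
🔍 To sanity-check these formulas, you can compare the analytic gradients against finite-difference approximations – a small verification sketch of ours, independent of the case-study script:

import numpy as np

x = np.linspace(0, 10, 100)
y = 2 * x + 3
w, b = 0.5, -1.0  # arbitrary test point

def mse(w, b):
    return np.mean((w * x + b - y) ** 2)

# Full analytic gradients (keeping the factor 2 from differentiating the square)
err = w * x + b - y
dw_analytic = 2 * np.mean(err * x)
db_analytic = 2 * np.mean(err)

# Central finite-difference approximations
eps = 1e-6
dw_numeric = (mse(w + eps, b) - mse(w - eps, b)) / (2 * eps)
db_numeric = (mse(w, b + eps) - mse(w, b - eps)) / (2 * eps)

print(dw_analytic, dw_numeric)  # each pair should agree to many decimal places
print(db_analytic, db_numeric)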


6. 📝 Case Study

In this case study, we will implement a Python script for AI to learn the formula of the line y = 2x + 3 from simulated data.

📈 About the line formula:

The general equation of a line is y = mx + b, where:

  • m (weight/w): slope coefficient, determines steepness.
  • b (bias): intercept coefficient, determines where it crosses the y-axis.

Our goal is to show how gradient descent adjusts m and b so that the AI-modeled line overlaps the real line.


7. 💻 Complete Python Script

import numpy as np
import matplotlib.pyplot as plt

# 1️⃣ Generating input data
x = np.linspace(0, 10, 100)
y_real = 2 * x + 3

# 2️⃣ Initializing model parameters
w = np.random.randn()
b = np.random.randn()
lr = 0.01

# 3️⃣ Loss function (mean squared error)
def loss(y_pred, y_true):
    return np.mean((y_pred - y_true)**2)

# 4️⃣ Training using gradient descent
for epoch in range(1000):
    y_pred = w * x + b
    error = y_pred - y_real

    # Error gradients
    dw = np.mean(error * x)
    db = np.mean(error)

    # Updating parameters
    w -= lr * dw
    b -= lr * db

    # Monitoring training progress with the loss function
    if epoch % 200 == 0:
        print(f"Epoch {epoch}: loss = {loss(y_pred, y_real):.4f}")

# 5️⃣ Final result
print(f"Weight (w): {w:.2f}, Bias (b): {b:.2f}")

# 6️⃣ Visualization
plt.plot(x, y_real, label="Real Pattern (2x+3)")
plt.plot(x, w*x + b, '--', label="Learned Model")
plt.legend()
plt.title("How AI Learns a Pattern")
plt.xlabel("x")
plt.ylabel("y")
plt.grid(True)
plt.show()

8. 🧩 Detailed Explanation of the Code and the Learning Moment


🔸 Generating data

x = np.linspace(0, 10, 100)
y_real = 2 * x + 3        

Generates 100 x values between 0 and 10 and calculates y according to the line formula.


🔸 Initializing parameters

w = np.random.randn()
b = np.random.randn()
lr = 0.01        

  • w and b start with random values.
  • lr (learning rate) controls the step size in gradient descent.


🔸 Loss function

def loss(y_pred, y_true):
    return np.mean((y_pred - y_true)**2)        

Calculates the mean squared error between predictions and actual values; the training loop calls it periodically to monitor progress.


🔸 Training loop

for epoch in range(1000):
    y_pred = w * x + b
    error = y_pred - y_real
    dw = np.mean(error * x)
    db = np.mean(error)
    w -= lr * dw
    b -= lr * db        

  • y_pred: current model prediction.
  • error: difference between prediction and actual.
  • dw/db: gradients (partial derivatives of error w.r.t. w and b).
  • w -= lr * dw, b -= lr * db: learning moment – updates w and b to reduce error.


🔸 Final result

print(f"Weight (w): {w:.2f}, Bias (b): {b:.2f}")        

Displays final learned values, expected to approximate 2 (weight) and 3 (bias).


🔸 Visualization

Compares the real line with the model learned by AI, validating learning success.
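
To quantify "learning success" beyond the visual overlap, you can reuse the loss() function right after training – a small addition of ours:

final_mse = loss(w * x + b, y_real)
print(f"Final MSE: {final_mse:.6f}")  # close to zero when the learned line matches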


9. 🔍 Analysis of the Results

The experiment aimed to demonstrate how AI learns the formula of a simple line using Gradient Descent, gradually adjusting its parameters to minimize error.

📈 Graph Interpretation


The graph titled “How AI Learns a Pattern” shows two curves:

  • Blue solid line: represents the real pattern, the true function used as a reference, defined by the formula y = 2x + 3.
  • Orange dashed line: represents the model learned by the AI, that is, the line calculated by the algorithm after training with gradient descent.

🔑 Key Observations:

  • Visually, both lines almost overlap, indicating that the model has learned the x–y relationship very well.
  • The proximity between the real line and the learned line demonstrates that the residual error (difference between true and predicted values) is minimal after training.


🖥️ Terminal (VSCode) Output Interpretation


According to the VSCode execution:

Weight (w): 2.03, Bias (b): 2.81

🔬 What does this mean?

  • Weight (w = 2.03): indicates the slope of the line learned by AI. The true value is 2, and the model found 2.03, meaning an error of only +0.03, nearly perfect.
  • Bias (b = 2.81): indicates the intercept of the learned line (where it crosses the y-axis). The true value is 3, and the model found 2.81, an error of -0.19. This is an excellent result for a plain training loop with a fixed learning rate and only 1,000 epochs.


💡 Technical Conclusion

✔️ Efficiency of Gradient Descent: The algorithm quickly adjusted the parameters (weight and bias) to values very close to the true ones, minimizing the error function (here, Mean Squared Error).

✔️ Essential Supervised Learning: This result demonstrates the foundation of supervised machine learning: adjusting parameters to reduce the error between predictions and actual values.

✔️ Residual Error: The small remaining difference is expected given the chosen learning rate and number of epochs. Fine-tuning (more epochs or an adjusted learning rate) could shrink it further.


Didactic Summary


This example shows in practice how small, successive parameter corrections lead AI to "discover" the true function underlying the data, capturing the very essence of learning via optimization.


10. 🧩 Discussion on Local Minima vs Global Minimum

In real problems, the error surface is not always a simple parabola (as in linear regression). In deep neural networks or complex functions, there are multiple valleys (local minima) and one deeper valley (global minimum).

🔍 Illustrative example:

  • Local minimum: point where the function reaches a minimum, but it is not the lowest possible value.
  • Global minimum: the lowest value of error in the entire domain.

📈 Suggested graphic: [Insert or design a function graph with multiple local minima and the global minimum clearly marked for readers’ understanding.]
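
Beyond a static graphic, the phenomenon fits in a few lines of code. In this toy sketch of ours, the same algorithm lands in different valleys depending only on its starting point:

def f(w):
    return w**4 - 3 * w**2 + w    # non-convex: one local and one global minimum

def grad(w):
    return 4 * w**3 - 6 * w + 1   # derivative of f

def descend(w, lr=0.01, steps=500):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

w1 = descend(2.0)    # starting on the right slope
w2 = descend(-2.0)   # starting on the left slope
print(f"from w=+2: w ≈ {w1:.2f}, f(w) ≈ {f(w1):.2f}  (local minimum)")
print(f"from w=-2: w ≈ {w2:.2f}, f(w) ≈ {f(w2):.2f}  (global minimum, lower f)")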


11. 🧩 Analysis of Learning Rate (lr) Impact

Choosing the right learning rate (lr) directly impacts training success:

  • ✅ Ideal lr: smooth, steady convergence to the minimum.

  • ⚠️ Too low lr: very slow convergence, high computational cost.

  • ⚠️ Too high lr: oscillations or divergence (overshooting).

🔍 Suggested exercise: Modify the lr value in the script and observe its effect on the model's learning curve.
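
As a starting point for that exercise, here is a sketch of ours that reruns the training loop with three learning rates; it assumes x, y_real, and loss() from the script above are already defined:

for test_lr in (0.0001, 0.01, 0.06):
    w, b = 0.0, 0.0
    for epoch in range(1000):
        error = (w * x + b) - y_real
        w -= test_lr * np.mean(error * x)
        b -= test_lr * np.mean(error)
    # expected: 0.0001 converges very slowly, 0.01 converges smoothly,
    # and 0.06 overshoots on this data, so the loss explodes
    print(f"lr={test_lr}: final loss = {loss(w * x + b, y_real):.3e}")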


12. 📌 When Is This Type of Learning Used?

This simple process is the foundation for more complex models such as:

• Deep Neural Networks

• Linear Classifiers

• Linear Regression

• Predictive models in marketing, sales, and industry


13. 🔎 Final Interpretation

AI does not memorize the data. It adjusts a mathematical function to approximate real results, allowing it to predict new values based on the learned pattern.


14. 🛠️ Real Applications

• Sales forecasting (e.g. revenue over time)

• Dynamic pricing

• Fault diagnostics (e.g. linear part wear)

• Basic robot learning


15. 💬 Extra Tip

Replace the pattern y = 2x + 3 with any other (e.g. exponential, logarithmic, sinusoidal) and observe how the model learns.

🔍 Powerful exercise: understand how different patterns impact the learning process.
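
One possible sketch of that exercise (our own illustration): fit the sinusoidal pattern y = 2·sin(x) + 1. The model remains linear in its parameters a and b, so the same update rule applies:

import numpy as np

x = np.linspace(0, 10, 200)
y_real = 2 * np.sin(x) + 1        # new target pattern

a, b = np.random.randn(), np.random.randn()
lr = 0.1

for epoch in range(2000):
    y_pred = a * np.sin(x) + b
    error = y_pred - y_real
    a -= lr * np.mean(error * np.sin(x))  # gradient w.r.t. a
    b -= lr * np.mean(error)              # gradient w.r.t. b

print(f"a: {a:.2f}, b: {b:.2f}")  # should approach 2 and 1

Learning a frequency inside sin(kx), by contrast, makes the problem non-convex – a nice bridge back to the local-minima discussion in section 10.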


16. 📌 How to Use It in VSCode?

  1. Create a file called ia_learning_line.py
  2. Paste the complete code
  3. Run it in the terminal with python ia_learning_line.py
  4. Observe the graph and learned values
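
💡 Note: the script assumes numpy and matplotlib are available; if they are not installed yet, run pip install numpy matplotlib in the terminal first.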


17. 🧠 Real-World Practice

Even in complex systems like product recommenders or search engines, the fundamental logic is the same: adjust model weights to minimize prediction error.


18. 📎 References

  1. Andrew Ng – Machine Learning (Coursera): https://www.coursera.org/learn/machine-learning
  2. Ian Goodfellow, Yoshua Bengio, Aaron Courville – Deep Learning: https://www.deeplearningbook.org/
  3. Towards Data Science – Gradient Descent Simplified: https://towardsdatascience.com/gradient-descent-for-machine-learning-5fdfdbec4a2f


19. 📅 CTA – Follow & Connect

💼 LinkedIn & Newsletters:
👉 https://www.linkedin.com/in/izairton-oliveira-de-vasconcelos-a1916351/
👉 https://www.linkedin.com/newsletters/scripts-em-python-produtividad-7287106727202742273
👉 https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7319069038595268608

💼 Company Page: 👉 https://www.linkedin.com/company/106356348/

💻 GitHub: 👉 https://github.com/IOVASCON