📊 PYTHON + AI – MACHINE LEARNING 🧮 How Does the Machine Learn a Line? – The Mathematics Behind Gradient Descent
📰 Edition #53 — PYTHON + MACHINE LEARNING TIP – How the Machine Learns a Line Using Gradient Descent
✨ Introduction
When we think of Artificial Intelligence, we often imagine complex algorithms capable of recognizing faces, predicting the weather, or automating human tasks. But at the heart of all these sophisticated models lies a simple yet powerful concept: learning through optimization. Among optimization methods, Gradient Descent is one of the most fundamental and widely used.
In this article, we will dive into this algorithm that allows AI to learn mathematical patterns – starting with the most classic example: learning the formula of a straight line. You will see how, through gradual adjustments in its parameters, AI can get closer to the ideal result, revealing the essence of supervised machine learning.
1. 🎯 Objective
To demonstrate, in a didactic, practical, and complete way, how an AI model learns a simple mathematical pattern (such as a line) using gradient descent, which adjusts its internal parameters to minimize error.
2. 🧠 Technique Concept
Gradient Descent is an optimization algorithm used to adjust the parameters of an AI model, such as weights and biases, to minimize the cost function (error). It calculates the slope (gradient) of the error function with respect to each parameter and updates them in the opposite direction to the gradient, reducing the error at each iteration.
🔍 What is it for?
It is the workhorse of model training: at each iteration it nudges the parameters in the direction that most reduces the error, which is how linear regression, logistic regression, and neural networks are fitted to data.
🔧 Didactic summary: Gradient Descent is like going down a foggy mountain, taking small steps in the steepest direction until reaching the lowest valley (minimum).
3. 🧠 Core Concept
AI adjusts its internal parameters (weight and bias) to reduce the error between prediction and actual value. This is done through gradient descent, which “descends the slope of error” to find the best combination of parameters that brings the model closer to the true pattern.
4. 🔍 Understanding the Fundamentals of Machine Learning
A) 🧠 What is Gradient Descent?
It is an optimization algorithm that adjusts the model's parameters to minimize error. It computes the direction of steepest descent (the negative gradient) and updates the values step by step to approach the optimal solution.
B) 🪂 Why do we say it “descends the slope of error”?
Imagine the model's error as being at the top of a mountain. At each step, the algorithm calculates the slope and goes down in the steepest direction, gradually reducing the error until reaching the minimum.
C) 🎯 What are “guesses” (initial guesses)?
They are random values assigned to parameters at the start of training, as AI does not know the ideal value yet.
D) 🔢 What is a mathematical pattern?
It is a predictable relationship that can be modeled by a formula, such as lines, parabolas, or trigonometric functions. Machine learning consists of adjusting parameters until replicating this pattern with precision.
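To make these four ideas concrete, here is a minimal numeric sketch of a single gradient descent update on one data point (the values are illustrative, not from the article's script):
# One hand-sized gradient step toward y = 2x + 3, using the single point (x=1, y=5)
w, b, lr = 0.0, 0.0, 0.1     # initial guesses and learning rate
x, y = 1.0, 5.0
y_pred = w * x + b           # prediction: 0.0
error = y_pred - y           # error: -5.0
w -= lr * error * x          # w: 0.0 - 0.1 * (-5.0) * 1.0 = 0.5
b -= lr * error              # b: 0.0 - 0.1 * (-5.0) = 0.5
print(w, b)                  # 0.5 0.5, one small step toward the true (2, 3)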
5. 🧮 Mathematical Formulas Involved
Loss function (MSE):
MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²
Gradients:
∂L/∂w = (1/n) Σᵢ (ŷᵢ − yᵢ) · xᵢ
∂L/∂b = (1/n) Σᵢ (ŷᵢ − yᵢ)
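For completeness, these gradients come from applying the chain rule to the MSE (a standard derivation, not spelled out in the original):
∂L/∂w = ∂/∂w [ (1/n) Σᵢ (w·xᵢ + b − yᵢ)² ] = (2/n) Σᵢ (ŷᵢ − yᵢ) · xᵢ, and likewise ∂L/∂b = (2/n) Σᵢ (ŷᵢ − yᵢ).
The constant factor 2 only rescales the step size, so it is conventionally absorbed into the learning rate, which is why the formulas above and the script below omit it.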
6. 📝 Case Study
In this case study, we will implement a Python script for AI to learn the formula of the line y = 2x + 3 from simulated data.
📈 About the line formula:
The general equation of a line is y = mx + b, where:
• m is the slope (how much y changes for each unit increase in x);
• b is the intercept (the value of y when x = 0).
Our goal is to show how gradient descent adjusts m and b so that the AI-modeled line overlaps the real line.
7. 💻 Complete Python Script
import numpy as np
import matplotlib.pyplot as plt
# 1️⃣ Generating input data
x = np.linspace(0, 10, 100)
y_real = 2 * x + 3
# 2️⃣ Initializing model parameters
w = np.random.randn()
b = np.random.randn()
lr = 0.01
# 3️⃣ Loss function (mean squared error)
def loss(y_pred, y_true):
    return np.mean((y_pred - y_true)**2)
# 4️⃣ Training using gradient descent
for epoch in range(1000):
    y_pred = w * x + b
    error = y_pred - y_real
    # Error gradients (the factor 2 is absorbed into the learning rate)
    dw = np.mean(error * x)
    db = np.mean(error)
    # Updating parameters
    w -= lr * dw
    b -= lr * db
    # Report training progress every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch {epoch}: loss = {loss(y_pred, y_real):.4f}")
# 5️⃣ Final result
print(f"Weight (w): {w:.2f}, Bias (b): {b:.2f}")
# 6️⃣ Visualization
plt.plot(x, y_real, label="Real Pattern (2x+3)")
plt.plot(x, w*x + b, '--', label="Learned Model")
plt.legend()
plt.title("How AI Learns a Pattern")
plt.xlabel("x")
plt.ylabel("y")
plt.grid(True)
plt.show()
8. 🧩 Detailed Explanation of the Code and Learning Moment
🔸 Generating data
x = np.linspace(0, 10, 100)
y_real = 2 * x + 3
Generates 100 x values between 0 and 10 and calculates y according to the line formula.
🔸 Initializing parameters
w = np.random.randn()
b = np.random.randn()
lr = 0.01
Starts w and b with random values (the initial guesses) and sets the learning rate, which controls the size of each update step.
🔸 Loss function
def loss(y_pred, y_true):
    return np.mean((y_pred - y_true)**2)
Calculates mean squared error between prediction and actual values.
🔸 Training loop
for epoch in range(1000):
    y_pred = w * x + b
    error = y_pred - y_real
    dw = np.mean(error * x)
    db = np.mean(error)
    w -= lr * dw
    b -= lr * db
At each of the 1000 epochs, the model predicts, measures the error, computes the gradients, and moves w and b in the opposite direction of the gradient (the loss printed every 100 epochs in the full script lets you watch the error fall).
🔸 Final result
print(f"Weight (w): {w:.2f}, Bias (b): {b:.2f}")
Displays final learned values, expected to approximate 2 (weight) and 3 (bias).
🔸 Visualization
Compares the real line with the model learned by AI, validating learning success.
9. 🔍 Analysis of the Results
The experiment aimed to demonstrate how AI learns the formula of a simple line using Gradient Descent, gradually adjusting its parameters to minimize error.
📈 Graph Interpretation
The graph titled “How AI Learns a Pattern” shows two curves: the solid line for the real pattern (y = 2x + 3) and the dashed line for the model learned by gradient descent.
🔑 Key Observations: after training, the dashed line practically overlaps the real one, showing that the learned parameters reproduce the underlying pattern.
🖥️ Terminal (VSCode) Output Interpretation
According to the VSCode execution:
Weight (w): 2.03, Bias (b): 2.81
🔬 What does this mean?
The learned weight (2.03) is almost exactly the true slope (2), and the bias (2.81) is close to the true intercept (3); the remaining gap is the residual error discussed below.
💡 Technical Conclusion
✔️ Efficiency of Gradient Descent: The algorithm quickly adjusted the parameters (weight and bias) to values very close to the real ones, minimizing the error function (Mean Squared Error, in this case).
✔️ Essential Supervised Learning: This result illustrates the foundation of supervised machine learning: adjusting parameters to reduce the error between predictions and actual values.
✔️ Residual Error: The small residual difference is expected due to the learning rate and number of iterations defined. Fine-tuning (smaller learning rate or more epochs) could reduce this small error further.
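As a quick sanity check (a hypothetical rerun, not part of the original experiment), simply increasing the number of epochs shrinks the residual error:
import numpy as np
# Same training loop as in Section 7, but with 10x more epochs
x = np.linspace(0, 10, 100)
y_real = 2 * x + 3
w, b, lr = 0.0, 0.0, 0.01
for epoch in range(10_000):
    error = (w * x + b) - y_real
    w -= lr * np.mean(error * x)
    b -= lr * np.mean(error)
print(f"w={w:.4f}, b={b:.4f}")   # lands very close to the true w=2, b=3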
✅ Didactic Summary
This example shows in practice how small successive parameter corrections lead AI to “discover” the true function underlying the data, translating the very essence of learning via optimization.
10. 🧩 Discussion on Local Minima vs Global Minimum
In real problems, the error surface is not always a simple parabola (as in linear regression). In deep neural networks or complex functions, there are multiple valleys (local minima) and one deeper valley (global minimum).
🔍 Illustrative example: the function f(x) = x⁴ − 3x² + x has two valleys, a shallower local minimum near x ≈ +1.13 and a deeper global minimum near x ≈ −1.30; where the descent ends depends on where it starts, as the sketch below shows.
📈 Suggested graphic: a plot of this function with both minima clearly marked helps readers visualize the difference.
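The sketch below is a self-contained toy (the function and hyperparameters are chosen for illustration): the same update rule used in the article ends in different valleys depending on the starting point.
# Gradient descent on the non-convex function f(x) = x^4 - 3x^2 + x
f = lambda x: x**4 - 3*x**2 + x
df = lambda x: 4*x**3 - 6*x + 1        # analytic derivative
def descend(x, lr=0.01, epochs=500):
    for _ in range(epochs):
        x -= lr * df(x)                # same rule: step against the gradient
    return x
for x0 in (2.0, -2.0):
    xf = descend(x0)
    print(f"start at x0={x0:+.1f} -> ends at x={xf:+.3f}, f(x)={f(xf):+.3f}")
# Starting at +2.0 gets trapped in the shallower local minimum (x ≈ +1.13);
# starting at -2.0 reaches the deeper global minimum (x ≈ -1.30).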
11. 🧩 Analysis of Learning Rate (lr) Impact
Choosing the right learning rate (lr) directly impacts training success:
✅ Ideal learning rate: smooth convergence to global minimum.
⚠️ Too low lr: very slow convergence, high computational cost.
❌ Too high lr: oscillations or divergence (overshooting).
🔍 Suggested exercise: Modify the lr value in the script and observe its effect on the model's learning curve.
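A minimal sketch of that exercise (the three values are illustrative; exact results vary) shows all three regimes at once:
import numpy as np
# Same linear problem as in Section 7, trained with three learning rates
x = np.linspace(0, 10, 100)
y_real = 2 * x + 3
for lr in (0.0001, 0.01, 0.1):
    w, b = 0.0, 0.0
    for epoch in range(1000):
        error = (w * x + b) - y_real
        w -= lr * np.mean(error * x)   # gradient step for the weight
        b -= lr * np.mean(error)       # gradient step for the bias
    print(f"lr={lr}: w={w:.3f}, b={b:.3f}")
# lr=0.0001 -> parameters still far from (2, 3): convergence too slow
# lr=0.01   -> close to (2, 3): smooth convergence
# lr=0.1    -> nan (NumPy overflow warnings): divergence by overshooting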
12. 📌 When Is This Type of Learning Used?
This simple process is the foundation for more complex models such as:
• Deep Neural Networks
• Linear Classifiers
• Linear Regression
• Predictive models in marketing, sales, and industry
13. 🔎 Final Interpretation
AI does not memorize the data. It adjusts a mathematical function to approximate real results, allowing it to predict new values based on the learned pattern.
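For instance, the parameters reported in Section 9 can already predict a point the model never saw (a small sketch using those values):
# Predicting an unseen value with the learned parameters from Section 9
w, b = 2.03, 2.81
x_new = 15.0                  # outside the 0-10 training range
print(w * x_new + b)          # 33.26, close to the true value 2*15 + 3 = 33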
14. 🛠️ Real Applications
• Sales forecasting (e.g. revenue over time)
• Dynamic pricing
• Fault diagnostics (e.g. linear part wear)
• Basic robot learning
15. 💬 Extra Tip
Replace the pattern y = 2x + 3 with any other (e.g. exponential, logarithmic, sinusoidal) and observe how the model learns.
🔍 Powerful exercise: understand how different patterns impact the learning process.
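A minimal sketch of this exercise, assuming a sinusoidal target (any other pattern works the same way):
import numpy as np
import matplotlib.pyplot as plt
# Same training loop as in Section 7, but the target pattern is now a sinusoid
x = np.linspace(0, 10, 100)
y_real = np.sin(x)
w, b, lr = 0.0, 0.0, 0.01
for epoch in range(1000):
    error = (w * x + b) - y_real
    w -= lr * np.mean(error * x)
    b -= lr * np.mean(error)
plt.plot(x, y_real, label="Real Pattern (sin x)")
plt.plot(x, w * x + b, '--', label="Learned Line")
plt.legend()
plt.title("A straight line cannot capture a sinusoid")
plt.show()
# The model settles on the best linear fit, revealing its limits:
# richer patterns require richer models (e.g. polynomial features).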
16. 📌 How to Use It in VSCode?
• Save the script as a .py file (e.g. gradient_descent.py).
• In the integrated terminal, install the dependencies: pip install numpy matplotlib
• Run it with python gradient_descent.py and check the terminal output and the plot window.
17. 🧠 Real-World Practice
Even in complex systems like product recommenders or search engines, the fundamental logic is the same: adjust model weights to minimize prediction error.
18. 📅 CTA – Follow & Connect
💼 LinkedIn & Newsletters: 👉 https://www.linkedin.com/in/izairton-oliveira-de-vasconcelos-a1916351/ 👉 https://www.linkedin.com/newsletters/scripts-em-python-produtividad-7287106727202742273 👉 https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7319069038595268608
💼 Company Page: 👉 https://www.linkedin.com/company/106356348/
💻 GitHub: 👉 https://github.com/IOVASCON