Introduction to Linear Regression: From Theory to Practice

What Is Linear Regression?

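Linear regression is a statistical method for predicting numerical outcomes. It works by modeling the relationship between one or more independent variables (input features) and a dependent variable (the output target) as a linear function. In its simplest form, simple linear regression, there is exactly one independent variable and one dependent variable. Multiple linear regression uses several independent variables.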

Mathematical Formulation

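Linear regression assumes that the independent variable $x$ and the dependent variable $y$ are linearly related:
$y = \theta_0 + \theta_1 x + \epsilon$
where $\theta_0$ is the intercept, $\theta_1$ is the slope, which describes how strongly $x$ affects $y$, and $\epsilon$ is the error term, representing the variability that the model does not capture.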
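A minimal sketch of this model, under assumptions chosen only for illustration: hypothetical true values $\theta_0 = 2$ and $\theta_1 = 3$, and Gaussian noise for $\epsilon$ (the model above does not itself require Gaussian noise). It draws synthetic data and recovers the two parameters with NumPy's `np.polyfit`, which for `deg=1` returns the slope first and then the intercept.

```python
import numpy as np

# Hypothetical "true" parameters, chosen only for this illustration
theta0_true, theta1_true = 2.0, 3.0

rng = np.random.default_rng(seed=0)
m = 200                                  # number of samples
x = rng.uniform(0.0, 10.0, size=m)       # single independent variable
eps = rng.normal(0.0, 0.5, size=m)       # error term epsilon (assumed Gaussian here)
y = theta0_true + theta1_true * x + eps  # y = theta_0 + theta_1 * x + epsilon

# Least-squares straight-line fit; coefficients come back highest power first
theta1_hat, theta0_hat = np.polyfit(x, y, deg=1)
print(f"theta0 ~ {theta0_hat:.3f}, theta1 ~ {theta1_hat:.3f}")
```

With 200 samples the estimates land close to, but not exactly on, the true values, because the noise $\epsilon$ is never observed directly.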

Loss Function

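To find the best-fitting line, we define a loss function, usually based on the mean squared error (MSE):
$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$
Here $h_\theta(x^{(i)})$ is the model's prediction for the $i$-th sample, $y^{(i)}$ is the actual value, and $m$ is the number of samples. The extra factor $\frac{1}{2}$ does not change where the minimum lies; it only cancels the 2 that appears when the square is differentiated.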
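As a quick numerical check of the formula, the sketch below computes $J(\theta)$ with NumPy. The function name `mse_loss` and the three-point data set ($x = 1, 2, 3$ with $y = 1 + x$) are illustrative assumptions, and $X$ is assumed to carry a leading column of ones so that $\theta_0$ acts as the intercept.

```python
import numpy as np

def mse_loss(theta: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    """J(theta) = 1/(2m) * sum_i (h_theta(x_i) - y_i)^2.

    Hypothetical helper. X has shape (m, n) and is assumed to include a
    leading column of ones, so theta[0] plays the role of theta_0.
    """
    m = X.shape[0]
    residuals = X @ theta - y          # h_theta(x^(i)) - y^(i) for every sample
    return float(residuals @ residuals) / (2 * m)

# Toy data: x = 1, 2, 3 and y = 1 + x (illustrative only)
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])

print(mse_loss(np.array([0.0, 0.0]), X, y))  # (4 + 9 + 16) / 6 = 4.8333...
print(mse_loss(np.array([1.0, 1.0]), X, y))  # perfect fit: 0.0
```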

Parameter Estimation

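To minimize the loss function $J(\theta)$, we estimate the parameters $\theta$ either iteratively, with gradient descent, or in closed form, with the normal equation. Both methods look for the parameter values that make the model's predictions on the training data as accurate as possible.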

Gradient Descent

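Gradient descent is an iterative optimization algorithm that gradually reduces the loss by adjusting the parameters in the direction of the negative gradient of the loss function. The update rule is:
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$
where $\alpha$ is the learning rate, which controls the size of each step. For linear regression, the update formula is derived as follows: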

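Define the loss function: first, define the loss function of linear regression (usually the mean squared error, MSE):
$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$
where:

$h_\theta(x^{(i)}) = \theta^T x^{(i)}$ is the model's prediction (with $x_0^{(i)} = 1$, so that $\theta_0$ acts as the intercept).
$y^{(i)}$ is the actual value.
$m$ is the number of training samples.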

Differentiate: to minimize $J(\theta)$ with gradient descent, we need the partial derivative of the loss with respect to each parameter $\theta_j$:
$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{\partial}{\partial \theta_j} \left( \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 \right)$
Compute the partial derivative: by the chain rule, the factor 2 produced by differentiating the square cancels the $\frac{1}{2}$: $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot \frac{\partial}{\partial \theta_j} h_\theta(x^{(i)})$
Since $h_\theta(x^{(i)}) = \theta^T x^{(i)} = \sum_k \theta_k x_k^{(i)}$, we have: $\frac{\partial}{\partial \theta_j} h_\theta(x^{(i)}) = x_j^{(i)}$
Therefore: $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$
Update rule: to minimize the loss, we update each parameter $\theta_j$ in the direction of the negative gradient:
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$
Substituting the partial derivative from above: $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$
Final Update Rule

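Putting it together, the final parameter update formula, applied to every $j$ simultaneously (all partial derivatives are computed from the same current $\theta$), is: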
$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$

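This is the formula for updating the parameters of linear regression with gradient descent. Because each step sums over all $m$ training samples, this variant is called batch gradient descent.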
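The derivation can be checked numerically. The sketch below (all helper names are hypothetical) evaluates the partial derivative three ways: a literal double loop over $j$ and $i$, the vectorized form $\frac{1}{m} X^T (X\theta - y)$, and central finite differences of $J(\theta)$ that do not rely on the derivation at all. It then takes one update step in which every $\theta_j$ is computed from the same current $\theta$. The random data are arbitrary.

```python
import numpy as np

def loss(theta, X, y):
    # J(theta) = 1/(2m) * sum of squared residuals
    r = X @ theta - y
    return (r @ r) / (2 * len(y))

def grad_per_component(theta, X, y):
    # Literal transcription: dJ/dtheta_j = 1/m * sum_i (h(x_i) - y_i) * x_ij
    m, n = X.shape
    g = np.zeros(n)
    for j in range(n):
        for i in range(m):
            g[j] += (X[i] @ theta - y[i]) * X[i, j]
    return g / m

def grad_vectorized(theta, X, y):
    # The same quantity in matrix form: 1/m * X^T (X theta - y)
    return X.T @ (X @ theta - y) / len(y)

def grad_finite_difference(theta, X, y, h=1e-6):
    # Central differences of J, independent of the analytic derivation
    g = np.zeros_like(theta)
    for j in range(len(theta)):
        e = np.zeros_like(theta)
        e[j] = h
        g[j] = (loss(theta + e, X, y) - loss(theta - e, X, y)) / (2 * h)
    return g

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])  # bias column + 2 features
y = rng.normal(size=5)
theta = rng.normal(size=3)

g_loop = grad_per_component(theta, X, y)
print(np.allclose(g_loop, grad_vectorized(theta, X, y)))         # True
print(np.allclose(g_loop, grad_finite_difference(theta, X, y)))  # True

# One gradient-descent step: the full gradient uses the old theta,
# so every theta_j is updated simultaneously.
alpha = 0.1
theta_new = theta - alpha * grad_vectorized(theta, X, y)
```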

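```python
import numpy as np

def linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:
    """Batch gradient descent for linear regression.

    X: (m, n) feature matrix; include a column of ones if an intercept theta_0 is wanted.
    y: (m,) targets. alpha: learning rate. iterations: number of update steps.
    """
    m, n = X.shape
    theta = np.zeros((n, 1))          # start from theta = 0
    y = y.reshape(-1, 1)              # column vector, to match X @ theta
    for _ in range(iterations):
        predictions = X @ theta       # h_theta(x^(i)) for every sample
        errors = predictions - y      # h_theta(x^(i)) - y^(i)
        gradient = X.T @ errors / m   # (1/m) * sum_i error_i * x_j^(i), for all j at once
        theta -= alpha * gradient     # simultaneous update of every theta_j
    return np.round(theta.flatten(), 4)
```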
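The learning rate $\alpha$ matters in practice. The sketch below uses a hypothetical helper `run_gd`, which repeats the same update as the function above, on a three-point toy data set ($y = 1 + x$, with a column of ones for the intercept). For this data the largest eigenvalue of $\frac{1}{m} X^T X$ is about 5.55, so gradient descent converges only for $\alpha$ below roughly $2 / 5.55 \approx 0.36$: $\alpha = 0.1$ converges, while $\alpha = 1.0$ diverges.

```python
import numpy as np

def run_gd(X, y, alpha, iterations):
    # Same update as above: theta := theta - alpha/m * X^T (X theta - y)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        theta -= alpha * X.T @ (X @ theta - y) / m
    return theta

def loss(theta, X, y):
    # J(theta) = 1/(2m) * sum of squared residuals
    r = X @ theta - y
    return (r @ r) / (2 * len(y))

# Toy data y = 1 + x, with a leading column of ones for the intercept
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])

theta_small = run_gd(X, y, alpha=0.1, iterations=5000)
theta_large = run_gd(X, y, alpha=1.0, iterations=50)

print(np.round(theta_small, 4))                           # [1. 1.]
print(loss(theta_large, X, y) > loss(np.zeros(2), X, y))  # True: the large step diverges
```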

Normal Equation

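For some problems, the optimal parameters can be computed directly, with no iteration, by solving the normal equation:
$\theta = (X^T X)^{-1} X^T y$
This approach suits cases where the number of features $n$ is not too large (inverting the $n \times n$ matrix $X^T X$ costs roughly $O(n^3)$ operations) and $X^T X$ is invertible. The derivation is as follows: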

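Objective: linear regression seeks parameters $\theta$ that minimize the error between the predictions $h_\theta(x)$ and the actual values $y$. The mean squared error (MSE) is typically used as the loss function:
$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$
where $h_\theta(x) = \theta^T x$ is the prediction, and $x^{(i)}$ and $y^{(i)}$ are the feature vector and target value of the $i$-th training sample.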
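Matrix form of the loss: let $X$ be the feature matrix (each row is a training sample, each column a feature) and $y$ the vector of target values. The loss function can then be written as:
$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$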
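Differentiate: to find the $\theta$ that minimizes the loss, we differentiate with respect to $\theta$ and set the derivative to zero. Expanding the loss gives
$J(\theta) = \frac{1}{2m} \left( \theta^T X^T X \theta - 2 y^T X \theta + y^T y \right)$
and, using $\nabla_\theta (\theta^T A \theta) = 2A\theta$ for symmetric $A$ and $\nabla_\theta (b^T \theta) = b$, the gradient is:
$\nabla_\theta J(\theta) = \frac{1}{m} X^T (X\theta - y)$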
Set the derivative to zero:
$\frac{1}{m} X^T (X\theta - y) = 0$
Simplifying gives:
$X^T X \theta = X^T y$
Solve the equation: assuming $X^T X$ is nonsingular (i.e., invertible), left-multiplying both sides by $(X^T X)^{-1}$ gives:
$\theta = (X^T X)^{-1} X^T y$

This is how the parameters $\theta$ of linear regression are obtained with the normal equation.
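A hand-checkable instance of the normal equation, using an assumed toy data set of three samples $x = 1, 2, 3$ with $y = 2, 3, 4$ and a leading column of ones in $X$ for the intercept:
$X = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad y = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}, \quad X^T X = \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}, \quad X^T y = \begin{bmatrix} 9 \\ 20 \end{bmatrix}$
$(X^T X)^{-1} = \frac{1}{3 \cdot 14 - 6 \cdot 6} \begin{bmatrix} 14 & -6 \\ -6 & 3 \end{bmatrix} = \frac{1}{6} \begin{bmatrix} 14 & -6 \\ -6 & 3 \end{bmatrix}$
$\theta = \frac{1}{6} \begin{bmatrix} 14 \cdot 9 - 6 \cdot 20 \\ -6 \cdot 9 + 3 \cdot 20 \end{bmatrix} = \frac{1}{6} \begin{bmatrix} 6 \\ 6 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$
So $\theta_0 = 1$ and $\theta_1 = 1$, i.e. the fitted line is $y = 1 + x$, which reproduces the data exactly.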

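```python
import numpy as np

def linear_regression_normal_equation(X: list[list[float]], y: list[float]) -> list[float]:
    # X is assumed to include a column of ones if an intercept is wanted
    X = np.array(X, dtype=float)
    y = np.array(y, dtype=float).reshape(-1, 1)
    X_T = X.T
    # theta = (X^T X)^{-1} X^T y; may raise np.linalg.LinAlgError (or give
    # unreliable results) if X^T X is singular or nearly singular
    theta = np.linalg.inv(X_T @ X) @ X_T @ y
    # Round to 4 decimal places and return a flat Python list
    return np.round(theta, 4).flatten().tolist()
```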
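Explicitly inverting $X^T X$ is usually avoided in numerical code. The sketch below shows two common NumPy alternatives on the same toy data ($y = 1 + x$), with illustrative helper names: `np.linalg.solve` applied to $X^T X \theta = X^T y$, which still assumes invertibility, and the pseudo-inverse `np.linalg.pinv`, which returns the minimum-norm least-squares solution even when $X^T X$ is singular, for instance when one feature is an exact multiple of another. NumPy's `np.linalg.lstsq` is included for comparison.

```python
import numpy as np

def normal_equation_solve(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    # Solve (X^T X) theta = X^T y without forming the inverse;
    # still assumes X^T X is invertible.
    return np.linalg.solve(X.T @ X, X.T @ y)

def normal_equation_pinv(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    # Moore-Penrose pseudo-inverse: a least-squares solution (the one with
    # minimum norm) even when X^T X is singular.
    return np.linalg.pinv(X) @ y

# Toy data y = 1 + x with a leading column of ones (illustrative only)
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
print(normal_equation_solve(X, y))           # approximately [1. 1.]
print(np.linalg.lstsq(X, y, rcond=None)[0])  # NumPy's least-squares solver agrees

# Collinear features: the third column is exactly twice the second,
# so X^T X is singular by construction and cannot be inverted.
X_col = np.column_stack([X, 2 * X[:, 1]])
theta_col = normal_equation_pinv(X_col, y)
print(np.allclose(X_col @ theta_col, y))     # True: predictions still fit the data
```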

Practical Applications

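Linear regression can also capture nonlinear relationships in data, provided the features are preprocessed appropriately. For example, when predicting the house price $y$ from the variables $(x_1, x_2)$, we can manually add nonlinearly transformed features such as $x_1^2$ and $x_1 x_2$. The model remains linear in the parameters $\theta$, but it can now learn a nonlinear relationship between the original inputs and $y$. For a more detailed treatment of linear regression, see the Linear Regression module of Google's Machine Learning Crash Course.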
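A minimal sketch of this idea, with a hypothetical noise-free target $y = 1 + 2x_1 - x_2 + 0.5x_1^2 + 3x_1x_2$ standing in for house prices: after adding the columns $x_1^2$ and $x_1 x_2$, the normal equation recovers all five coefficients, even though $y$ is nonlinear in $(x_1, x_2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 50
x1 = rng.uniform(-2.0, 2.0, size=m)
x2 = rng.uniform(-2.0, 2.0, size=m)

# Hypothetical noise-free target that is nonlinear in x1 and x2
y = 1 + 2 * x1 - x2 + 0.5 * x1**2 + 3 * x1 * x2

# Hand-crafted features [1, x1, x2, x1^2, x1*x2]; the model stays linear in theta
X = np.column_stack([np.ones(m), x1, x2, x1**2, x1 * x2])

# Normal equation, solved without an explicit inverse
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(theta, 4))  # recovers approximately [1, 2, -1, 0.5, 3]
```

With real, noisy data the recovered coefficients would only approximate the underlying relationship, and the choice of which transformed features to add remains a modeling decision.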