Gradient Descent For Perceptron
Algorithm
\[
\begin{array}{l}
\text{Gradient Descent} \\
\\
w_1^*, w_2^*, b^* = \underset{w_1, w_2, b}{\text{argmin}} \; L \\
\\
L = \frac{1}{n} \sum_{i=1}^{n} \max(0, -Y_i f(x_i)) \\
\text{where } f(x_i) = w_1 x_{i1} + w_2 x_{i2} + b \\
\\
\text{for each epoch:} \\
\quad w_1 = w_1 - \eta \frac{\partial L}{\partial w_1} \\
\quad w_2 = w_2 - \eta \frac{\partial L}{\partial w_2} \\
\quad b = b - \eta \frac{\partial L}{\partial b}
\end{array}
\]
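To make the objective concrete, here is a minimal NumPy sketch of \(f(x_i)\) and the loss \(L\) above. The names `f`, `perceptron_loss`, `X`, `Y`, `w`, and `b` are assumptions for illustration: `X` has shape `(n, 2)`, labels `Y` take values in \(\{-1, +1\}\), and `w = (w_1, w_2)`.

```python
import numpy as np

def f(X, w, b):
    # f(x_i) = w_1 * x_i1 + w_2 * x_i2 + b, computed for all samples at once
    return X @ w + b

def perceptron_loss(X, Y, w, b):
    # L = (1/n) * sum_i max(0, -Y_i * f(x_i))
    return np.mean(np.maximum(0.0, -Y * f(X, w, b)))
```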
Partial Derivative
\[
\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial f(x_i)} \cdot \frac{\partial f(x_i)}{\partial w_1}
\]
Partial Derivative of the Loss w.r.t. \(f(x_i)\)
\[ \frac{\partial L}{\partial f(x_i)} = \begin{cases} 0 & \text{if } Y_i f(x_i) \geq 0 \\ -Y_i & \text{if } Y_i f(x_i) < 0 \end{cases} \]
Partial Derivative of \(f(x_i)\) w.r.t. \(w_1\)
\[ \frac{\partial f(x_i)}{\partial w_1} = \frac{\partial}{\partial w_1} (w_1 x_{i1} + w_2 x_{i2} + b) = x_{i1} \]
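A quick numerical sanity check of these two chain-rule factors for a single sample; the values of `x`, `y`, `w`, `b`, and `eps` below are made up purely for illustration.

```python
import numpy as np

# Made-up single-sample values, purely for illustration
x = np.array([2.0, -1.0])
y = 1.0
w = np.array([0.1, 0.3])
b = -0.5
eps = 1e-6

def loss_w1(w1):
    # Per-sample loss as a function of w_1 only (w_2 and b held fixed)
    return max(0.0, -y * (w1 * x[0] + w[1] * x[1] + b))

# Finite-difference estimate of dL/dw_1 for this sample
numeric = (loss_w1(w[0] + eps) - loss_w1(w[0] - eps)) / (2 * eps)

# Analytic product (dL/df) * (df/dw_1) from the two cases above
f_x = w @ x + b
dL_df = 0.0 if y * f_x >= 0 else -y
analytic = dL_df * x[0]

print(numeric, analytic)  # both ≈ -2.0 here, since y * f(x) < 0
```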
Combining the two factors, and repeating the same steps for \(w_2\) and the bias \(b\), gives
\[
\begin{array}{l}
\frac{\partial L}{\partial w_1} = \begin{cases} 0 & \text{if } Y_i f(x_i) \geq 0 \\ -Y_i x_{i1} & \text{if } Y_i f(x_i) < 0 \end{cases} \\
\\
\frac{\partial L}{\partial w_2} = \begin{cases} 0 & \text{if } Y_i f(x_i) \geq 0 \\ -Y_i x_{i2} & \text{if } Y_i f(x_i) < 0 \end{cases} \\
\\
\frac{\partial L}{\partial b} = \begin{cases} 0 & \text{if } Y_i f(x_i) \geq 0 \\ -Y_i & \text{if } Y_i f(x_i) < 0 \end{cases}
\end{array}
\]
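Putting the pieces together, here is a minimal NumPy sketch of these gradients and the update loop from the algorithm above, assuming `X` of shape `(n, 2)` and labels `Y` in \(\{-1, +1\}\); `gradients`, `fit`, `eta`, and `epochs` are illustrative names and hyperparameters, and the \(\frac{1}{n}\) averaging from the loss is kept by using means.

```python
import numpy as np

def gradients(X, Y, w, b):
    # Per the case analysis above: a sample contributes -Y_i * x_i (and -Y_i for b)
    # only when Y_i * f(x_i) < 0; the 1/n from the loss becomes a mean.
    margins = Y * (X @ w + b)
    mis = margins < 0                                   # "< 0" branch indicator
    grad_w = -np.mean((mis * Y)[:, None] * X, axis=0)   # [dL/dw_1, dL/dw_2]
    grad_b = -np.mean(mis * Y)                          # dL/db
    return grad_w, grad_b

def fit(X, Y, eta=0.1, epochs=100):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        grad_w, grad_b = gradients(X, Y, w, b)
        w -= eta * grad_w    # descend along the negative gradient
        b -= eta * grad_b
    return w, b
```

Note that only misclassified samples (\(Y_i f(x_i) < 0\)) contribute to the gradient, so once the data are perfectly separated the updates leave the weights unchanged.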