{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Name:  **Your name here**  \n",
    "UID:  **Your student ID num here**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Homework 3:  Gradients \n",
    "\n",
    "For this assignment, you'll need the results of Homework 2.  We'll import your homework solutions.  Put the file `notebook_importer.py` into the directory with your Homework 3 file.  Then run the following cell.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from grad2d import grad2d, divergence2d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, setup the environment with a bunch of functions you'll need."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Setup the environment - do not modify this cell, but run it before anything else\n",
    "import numpy as np\n",
    "from numpy import sqrt, sum, abs, max, maximum, logspace, exp, log, log10, zeros\n",
    "from numpy.random import randn, normal, choice\n",
    "from numpy.linalg import norm\n",
    "import scipy\n",
    "from scipy.linalg import orth\n",
    "import urllib, io\n",
    "import matplotlib.pyplot as plt\n",
    "np.random.seed(0)\n",
    "def good_job(path):\n",
    "    f = urllib.request.urlopen(path)\n",
    "    a = plt.imread(io.BytesIO(f.read()))\n",
    "    fig = plt.imshow(a)\n",
    "    fig.axes.get_xaxis().set_visible(False)\n",
    "    fig.axes.get_yaxis().set_visible(False)\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Some useful tools\n",
    "Run the following code.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def buildmat(m,n,cond_number):\n",
    "    \"\"\"Build an mxn matrix with condition number cond.\"\"\"\n",
    "    if m<=n:\n",
    "        U = randn(m,m);\n",
    "        U = orth(U);\n",
    "        Vt = randn(n, m);\n",
    "        Vt = orth(Vt).T;\n",
    "        S = 1/logspace(0,log10(cond_number),num=m);\n",
    "        return (U*S[:,None]).dot(Vt)\n",
    "    else:\n",
    "        return buildmat(n,m,cond_number).T\n",
    "    \n",
    "def create_classification_problem(num_data, num_features, cond_number):\n",
    "    \"\"\"Build a simple classification problem.\"\"\"\n",
    "    X = buildmat(num_data, num_features, cond_number)\n",
    "    # The linear dividing line between the classes\n",
    "    w =  randn(num_features,1)\n",
    "    # create labels\n",
    "    prods = X@w\n",
    "    y = np.sign(prods)\n",
    "    #  mess up the labels on 10% of data\n",
    "    flip = choice(range(num_data),int(num_data/10))\n",
    "    y[flip] = -y[flip]\n",
    "    #  return result\n",
    "    return X,y\n",
    "# Visualize this classification problem\n",
    "X,y = create_classification_problem(100, 2, 5)\n",
    "y = y.ravel()\n",
    "plt.scatter(X[:,0][y>0], X[:,1][y>0], color='r')\n",
    "plt.scatter(X[:,0][y<0], X[:,1][y<0], color='g')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem 1 - Gradient checker\n",
    "Write a method for testing whether the function `grad` generates the gradient of `f`.  Do this by generating a random perturbation $\\delta$ and then testing whether\n",
    "  $$\\frac{f(x+\\delta) -f(x-\\delta)}{2} \\approx  \\delta^\\top \\nabla f(x).$$\n",
    "  The method should generate a random Gaussian $\\delta$, check the gradient condition, and then replace $\\delta \\gets \\delta/10.$  Do this for 10 different orders of magnitude of $\\delta.$ For each order, compute the **relative** error between the left and right side of the above equation.  Finally, print the minimum relative error achieved.  All of your print statement should label what they are printing, i.e., don't output random/unlabeled numbers to the console.  The method returns `True` if the gradient is correct up to 1 part in 1 million, and `False` otherwise.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def check_gradient(f, grad, x):\n",
    "    ## Your code here!\n",
    "    return min_error < 1e-6"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Now, run this unit test**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This test should pass\n",
    "f    = lambda x: 0.5*x.T@x\n",
    "grad = lambda x: x\n",
    "x = randn(10,1)\n",
    "did_pass = check_gradient(f, grad, x)\n",
    "assert did_pass, \"Test should have passed, but failed\"\n",
    "# This test should fail\n",
    "grad = lambda x: x+1e-5\n",
    "did_pass = check_gradient(f, grad, x)\n",
    "assert not did_pass, \"Test should have failed, but passed\"\n",
    "\n",
    "print(\"Tests passed!  Your gradient checker is like totally awesome!\")\n",
    "good_job(\"https://www.cs.umd.edu/~tomg/img/important_memes/congrats_work.png\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem 2 - Logistic regression\n",
    "Write a routine that evaluates the logistic loss function\n",
    "$$L(z) = \\sum_i \\ln(1+e^{-z_i}).$$\n",
    "Then, write a routine for evaluating the logistic regression objective function\n",
    "$$f(w) = L(YXw)$$\n",
    "where $Y$ is a diagonal matrix of labels, $X$ is matrix of training data, and $w$ is the slope vector.\n",
    "\n",
    "Your implementation must satisfy these criteria:\n",
    "\n",
    "-  You cannot use ANY `for` loops, or any other loops.\n",
    "-  You may not explicitly form the matrix $Y.$ You may only store the vector $y.$ \n",
    "-  You can **never** exponentiate a positive number.  In other words, you can't evaluate $e^z$ for $z>0.$   Computing $e^z$ is dangerous because you get `NaN` when $z$ is big, and this will crash your code. \n",
    "-  Your `logistic_loss` routine can be at most 6 (short) lines of code (excluding return line and signature).\n",
    "-  Your `logreg_objective` can be at most 2 (short) lines of code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def logistic_loss(z):\n",
    "    \"\"\"Return sum(log(1+exp(-z))). Your implementation can NEVER exponentiate a positive number.  No for loops.\"\"\"\n",
    "    # Your code here\n",
    "    return # return a scalar\n",
    "\n",
    "def logreg_objective(w,X,y):\n",
    "    \"\"\"Evaluate the logistic regression loss function on the data and labels, where the rows of D contain \n",
    "    feature vectors, and y is a 1D vector of +1/-1 labels.\"\"\"\n",
    "    # Your code here\n",
    "    return # return a scalar"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Now run this unit test...**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test logistic_loss using a for loop implementation\n",
    "X, Y = create_classification_problem(100, 10, 10)\n",
    "w = randn(10,1)\n",
    "output = logreg_objective(w,X,Y)\n",
    "loss = 0\n",
    "for x,y in zip(X,Y):\n",
    "    z = y*(x@w)\n",
    "    loss += log(1+exp(-z))\n",
    "assert abs(output-loss)<1e-10, 'Test FAILED:  your loss is incorrect'\n",
    "\n",
    "output = logreg_objective(1e9*w,X,Y)\n",
    "assert np.isfinite(output), \"Test FAILED:  Your routine is not numerically stable.  Maybe you exponentiated a positive number?\"\n",
    "\n",
    "print('Test PASSED!  Your logistic loss works!')\n",
    "good_job(\"https://www.cs.umd.edu/~tomg/img/important_memes/dog.png\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Now write a routine that produces the gradient of the logistic loss, and logistic objective.**  No `for` loops.  These routines should be short.  Remember, don't exponentiate a positive number."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def logistic_loss_grad(z):\n",
    "    \"\"\"Gradient of logistic loss\"\"\"\n",
    "    # Your code here\n",
    "    return # return a vector\n",
    "\n",
    "def logreg_objective_grad(w,X,y):\n",
    "    # Your code here\n",
    "    return # return a vector"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Run this routine to check that your gradients are correct.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test the logistic gradient accuracy\n",
    "z = randn(10,1)\n",
    "did_pass = check_gradient(logistic_loss, logistic_loss_grad, z)\n",
    "assert did_pass, \"Incorrect gradient for logistic_loss\"\n",
    "\n",
    "# Test the logistic gradient stability\n",
    "lossgrad = logistic_loss_grad(z*1e9)\n",
    "assert np.alltrue(np.isfinite(lossgrad)), \"FAILED: Logistic gradient is unstable.  Did you exponentiate a positive number?\"\n",
    "\n",
    "# Test the logreg objective gradient accuracy\n",
    "X, y = create_classification_problem(100, 10, 10)\n",
    "f = lambda w: logreg_objective(w,X,y)\n",
    "grad = lambda w: logreg_objective_grad(w,X,y)\n",
    "w = randn(10,1)\n",
    "did_pass = check_gradient(f, grad, w)\n",
    "assert did_pass, \"Incorrect gradient for logistic_loss\"\n",
    "\n",
    "print(\"Tests passed!  Your logistic gradients are perfect!\")\n",
    "good_job(\"https://www.cs.umd.edu/~tomg/img/important_memes/hey.png\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem 3 - TV\n",
    "Implement the total-variation denoising objective\n",
    "$$\\mu |\\nabla_d x| + \\frac{1}{2}\\|x-b\\|^2$$\n",
    "where $\\mu$ is an arbitrary scalar, $\\nabla_d$ denotes the discrete 2d gradient, and $b$ is a noisy image.  The symbol $\\nabla_d$ is the discrete 2D gradient operator, and produces all the first-order differences between adjacent pixels.\n",
    "\n",
    "Because the $\\ell_1$ norm is non-differentiable, replace it with it's hyperbolic regularization $|z| \\approx \\sum_i\\sqrt{z_i^2+\\epsilon^2}.$ \n",
    "\n",
    "You must use the grad2d routine from Homework 2.  It was imported by the first code cell in this assignment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def h(z, eps=.01):\n",
    "    \"\"\"The hyperbolic approximation to L1\"\"\"\n",
    "   # Your code here\n",
    "    return # return a scalar\n",
    "\n",
    "def tv_denoise_objective(x,mu,b):\n",
    "    # Your code here\n",
    "    return # return a scalar\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Now run this routine to check that your method runs**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a simple test image\n",
    "b = zeros((100,100))\n",
    "b[25:75,25:75] = 2\n",
    "x = zeros((100,100))\n",
    "mu = 1\n",
    "# Evaluate the loss\n",
    "tvobj = tv_denoise_objective(x,mu,b)\n",
    "assert tvobj==5200.0, \"FAILED!  Your TV objective is incorrect.\"\n",
    "print('Test PASSED!')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Now implement these gradient routines.**  Use the divergence2d method from Homework 2. Each routine should be a few lines of code (in my solution each routine is only 1 line)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def h_grad(z, eps=.01):\n",
    "    \"\"\"The gradient of h\"\"\"\n",
    "    # Your code here\n",
    "    return # return an array\n",
    "\n",
    "def tv_denoise_grad(x,mu,b):\n",
    "    \"\"\"The gradient of the TV objective\"\"\"\n",
    "    # Your code here\n",
    "    return # return an array"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Now run these unit tests**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b = randn(100,100)\n",
    "x = randn(100,100)\n",
    "mu = 1\n",
    "f = lambda x: tv_denoise_objective(x,mu,b)\n",
    "grad = lambda x: tv_denoise_grad(x,mu,b)\n",
    "did_pass = check_gradient(f,grad,x)\n",
    "assert did_pass, \"FAILED:  Your gradient operator is no good\"\n",
    "\n",
    "good_job(\"https://www.cs.umd.edu/~tomg/img/important_memes/you_rock.png\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}