{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Name: **Your name here**  \n",
    "UID: **Your student ID num here**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Optional Homework:  MCMC  \n",
    "In this homework you will create a loss function for a logistic regression.  Unlike your previous homeworks, where you \"solved\" for the optimal regression parameters using gradient optimization, in this assignment you create a confidence interval for the slope of the separation line between two classes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from utility import *\n",
    "import numpy as np\n",
    "from numpy.random import randn, rand\n",
    "import matplotlib.pyplot as plt\n",
    "np.random.seed(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a classification problem in two dimensions\n",
    "The two classes will be separated by the line\n",
    "  $$w^Tx = 0$$\n",
    "where $w$ is a 2-vector.  The slope of this line is given by $m=-w[0]/w[1]$."
   ]
  },
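  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check of the slope formula, write $x = (x[0], x[1])$ and expand the equation of the line (assuming $w[1] \\neq 0$):\n",
    "  $$w^Tx = w[0]\\,x[0] + w[1]\\,x[1] = 0 \\quad\\Longrightarrow\\quad x[1] = -\\frac{w[0]}{w[1]}\\,x[0].$$\n",
    "The line passes through the origin, and the coefficient of $x[0]$ is the slope $m$."
   ]
  },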
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a matrix of data points and a vector of labels\n",
    "X, y = create_classification_problem(100, 2, cond_number=3)\n",
    "\n",
    "# Define the logistic loss function, and its gradient\n",
    "nll = lambda w: logreg_objective(w,X,y)\n",
    "\n",
    "# An initial guess of the minimizer (may not be close to center of distribution)\n",
    "# Note: I'm choosing a \"bad\" initial guess to produce burn-in samples for instructional purposes\n",
    "w_guess = np.array([[-10],[10]])  \n",
    "\n",
    "# Test the negative log likelihood function\n",
    "f = nll(w_guess)\n",
    "print('The NLL of the initial guess is ', f)\n",
    "ind = y.ravel()==1\n",
    "plt.scatter(X[ind,0], X[ind,1], color='blue')\n",
    "plt.scatter(X[~ind,0], X[~ind,1], color='red')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generate many samples from the posterios distribution\n",
    "Note: the NLL function above generates $-\\log(p(w)).$ \n",
    "\n",
    "**You will have to fill in the formula for the acceptance probability, alpha.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "iters = 5000 #  number of MCMC samples to draw\n",
    "sigma = 3   #  sigma for the Guassian proposal distribution\n",
    "\n",
    "# Counters to keep track of how many rejected and accepted proposals there have been \n",
    "reject_count=0;\n",
    "accept_count=0;\n",
    "\n",
    "# Arrays to store all the iterates be produced\n",
    "samps  = np.zeros((iters,2))   # The samples of w from the distribution\n",
    "slopes = np.zeros((iters,1))  # The slopes of the samples\n",
    "nlls   = np.zeros((iters,1))  # The NLL values of the samples\n",
    "\n",
    "# Run the Metropolis sampler \n",
    "w = w_guess\n",
    "for i in range(iters):\n",
    "    # Make a proposal\n",
    "    wp = w+sigma*randn(2,1) \n",
    "    \n",
    "    # The acceptance probability\n",
    "    alpha =  ######## FiLL IN THIS LINE OF CODE #######\n",
    "    \n",
    "    # Should you accept this sample?\n",
    "    if rand()<alpha:\n",
    "        w=wp;\n",
    "        accept_count = accept_count+1;\n",
    "    else:\n",
    "        reject_count=reject_count+1;\n",
    "        \n",
    "    # Record sample and associated NLL\n",
    "    samps[i,:] = w.T\n",
    "    nlls[i]    = nll(w)\n",
    "    \n",
    "print('Accepted proposals: ', accept_count)\n",
    "print('Rejected proposals: ', reject_count)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plot results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('NLL values')\n",
    "plt.plot(nlls)\n",
    "plt.show()\n",
    "\n",
    "print('Samples')\n",
    "plt.scatter(samps[:,0], samps[:,1])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Remove the burn-in samples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "samps = samps[100:,:]\n",
    "nlls  = nlls[100:]\n",
    "\n",
    "print('NLL values')\n",
    "plt.plot(nlls)\n",
    "plt.show()\n",
    "\n",
    "print('Samples')\n",
    "plt.scatter(samps[:,0], samps[:,1])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Make a 95% credible interval for the slope of the decision line\n",
    "A \"credible\" interval is the Bayesian version of a confidence interval.  If the model weights are drawn from the posteriod distribution, which represents our uncertainty, then in what interval can we be 95% confident that the unknown slope lies?\n",
    "\n",
    "Recall that for a weight $w$ the corresponding slope is $-w[0]/w[1].$ Youn should hae to write about 5 lines of code to compute the lower and upper bounds."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# YOU CODE HERE\n",
    "\n",
    "lower = # YOUR CODE HERE\n",
    "upper = # YOUR CODE HERE\n",
    "\n",
    "print(f'The confidence interval is [{lower}, {upper}]')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Try tinkering with the $\\sigma$ value in the scripts above\n",
    "Then answer the following..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### What happens is sigma is too big?\n",
    "\n",
    "WRITE A SENTENCE OR TWO HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### What happens is sigma is too small?\n",
    "\n",
    "WRITE A SENTENCE OR TWO HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Why did you remove the burn-in samples? \n",
    "\n",
    "WRITE A SENTENCE OR TWO HERE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}