{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Missing data. \n",
"\n",
"Missing completely at random = MCAR\n",
"This is when the data is missing COMPLETELY, I MEAN TOTALLY, due to some variables that are not in the data. E.g. in the data set below the Sick data is missing because computers at the clinic had mold and in the remediation damaged the computers and some data. This data is missing = None becuase of reasons that are completely unrelated to our data.\n",
"\n",
"Question is anything really MCAR? Is anything truly random? Hyzenberg? \n"
]
},
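{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of what MCAR looks like in simulation, separate from the clinic example. The population size, the 30% sick rate and the 50% loss rate are made-up assumptions. Because every record has the same chance of being lost, the sick rate among the surviving records should stay close to the true rate.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical data: 10,000 students, about 30% of them sick (1.0 = sick, 0.0 = not sick)\n",
"rng = np.random.default_rng(0)\n",
"sick = pd.Series((rng.random(10_000) < 0.3).astype(float))\n",
"\n",
"# MCAR: every record has the same 50% chance of being lost,\n",
"# regardless of whether the student is sick\n",
"lost = rng.random(10_000) < 0.5\n",
"observed = sick.mask(lost)\n",
"\n",
"full_rate = sick.mean()\n",
"observed_rate = observed.mean()  # NaN values are skipped by default\n",
"print(full_rate, observed_rate)\n",
"```\n",
"\n",
"Under these assumptions the main cost of MCAR is a smaller sample, while the estimated rate stays roughly the same."
]
},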
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Exposed to mold | \n",
" Sick | \n",
"
\n",
" \n",
" Student | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" 1 | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" 2 | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" 3 | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" 4 | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" 5 | \n",
" True | \n",
" None | \n",
"
\n",
" \n",
" 6 | \n",
" True | \n",
" None | \n",
"
\n",
" \n",
" 7 | \n",
" True | \n",
" None | \n",
"
\n",
" \n",
" 8 | \n",
" True | \n",
" None | \n",
"
\n",
" \n",
" 9 | \n",
" True | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Exposed to mold Sick\n",
"Student \n",
"0 True True\n",
"1 True True\n",
"2 True True\n",
"3 True True\n",
"4 True True\n",
"5 True None\n",
"6 True None\n",
"7 True None\n",
"8 True None\n",
"9 True None"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"Student = pd.Series(range(0,10))\n",
"Mold = 10*[True]\n",
"Sick = 5*[True]\n",
"SickMissing = 5*[None]\n",
"\n",
"d = {'Student': Student, 'Exposed to mold': Mold,'Sick':Sick+SickMissing}\n",
"df = pd.DataFrame(data=d)\n",
"df.set_index('Student')\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Claim exposed to mold | \n",
" Test for mold exposure | \n",
" Sick due to mold exposure | \n",
"
\n",
" \n",
" Student | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" True | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" 1 | \n",
" True | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" 2 | \n",
" True | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" 3 | \n",
" True | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" 4 | \n",
" True | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" 5 | \n",
" True | \n",
" None | \n",
" None | \n",
"
\n",
" \n",
" 6 | \n",
" True | \n",
" None | \n",
" None | \n",
"
\n",
" \n",
" 7 | \n",
" True | \n",
" None | \n",
" None | \n",
"
\n",
" \n",
" 8 | \n",
" True | \n",
" None | \n",
" None | \n",
"
\n",
" \n",
" 9 | \n",
" True | \n",
" None | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Claim exposed to mold Test for mold exposure \\\n",
"Student \n",
"0 True True \n",
"1 True True \n",
"2 True True \n",
"3 True True \n",
"4 True True \n",
"5 True None \n",
"6 True None \n",
"7 True None \n",
"8 True None \n",
"9 True None \n",
"\n",
" Sick due to mold exposure \n",
"Student \n",
"0 True \n",
"1 True \n",
"2 True \n",
"3 True \n",
"4 True \n",
"5 None \n",
"6 None \n",
"7 None \n",
"8 None \n",
"9 None "
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#How about Missing at Random. Here the data is missing because of \"some\" reason that is related to our data\n",
"#set but not the data itself. \n",
"#E.g., below the Sick result was missing because the test for mold exposure was None/Inconclusive\n",
"\n",
"Student = pd.Series(range(0,10))\n",
"Mold = 10*[True]\n",
"TestForMoldResult = 5*[True]+5*[None]\n",
"\n",
"Sick = 5*[True]\n",
"SickMissing = 5*[None]\n",
"\n",
"d = {'Student': Student, 'Claim exposed to mold': Mold,'Test for mold exposure':TestForMoldResult,'Sick due to mold exposure':Sick+SickMissing}\n",
"df = pd.DataFrame(data=d)\n",
"df.set_index('Student')\n"
]
},
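{
"cell_type": "markdown",
"metadata": {},
"source": [
"One hedged way to look for a MAR pattern is to compare how often a column is missing across groups defined by another observed column. The sketch below rebuilds the two relevant columns from the cell above under a new name (`df_mar`, a name chosen here for illustration) and checks whether a missing Sick value goes together with a missing test result.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Same toy values as the cell above\n",
"df_mar = pd.DataFrame({\n",
"    'Test for mold exposure': 5*[True] + 5*[None],\n",
"    'Sick due to mold exposure': 5*[True] + 5*[None],\n",
"})\n",
"\n",
"# Fraction of missing Sick values, split by whether the test result is missing\n",
"sick_missing = df_mar['Sick due to mold exposure'].isna()\n",
"test_missing = df_mar['Test for mold exposure'].isna()\n",
"missing_rate = sick_missing.groupby(test_missing).mean()\n",
"print(missing_rate)\n",
"```\n",
"\n",
"In this toy data Sick is missing exactly when the test result is missing. That is consistent with MAR, although a check like this cannot rule out that the missing values themselves also play a role."
]
},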
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Exposed to mold | \n",
" Sick | \n",
"
\n",
" \n",
" Student | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" 1 | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" 2 | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" 3 | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" 4 | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" 5 | \n",
" True | \n",
" None | \n",
"
\n",
" \n",
" 6 | \n",
" True | \n",
" None | \n",
"
\n",
" \n",
" 7 | \n",
" True | \n",
" None | \n",
"
\n",
" \n",
" 8 | \n",
" True | \n",
" None | \n",
"
\n",
" \n",
" 9 | \n",
" True | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Exposed to mold Sick\n",
"Student \n",
"0 True True\n",
"1 True True\n",
"2 True True\n",
"3 True True\n",
"4 True True\n",
"5 True None\n",
"6 True None\n",
"7 True None\n",
"8 True None\n",
"9 True None"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#How bout Missing not completely at Random. Here the data is missing due to the data itself.\n",
"#E.g., the data is missing because the students were too sick to get tested for being sick.\n",
"\n",
"Student = pd.Series(range(0,10))\n",
"Mold = 10*[True]\n",
"Sick = 5*[True]\n",
"SickMissing = 5*[None]\n",
"\n",
"d = {'Student': Student, 'Exposed to mold': Mold,'Sick':Sick+SickMissing}\n",
"df = pd.DataFrame(data=d)\n",
"df.set_index('Student')"
]
},
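{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small simulation sketch of why MNAR is the troublesome case. The severity scores and the rule that the chance of being missing equals the severity are assumptions made up for illustration. Since the sickest students are the ones most likely to be missing, the average over the recorded students understates how sick the group is.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical severity scores between 0 (healthy) and 1 (very sick)\n",
"rng = np.random.default_rng(0)\n",
"severity = pd.Series(rng.random(10_000))\n",
"\n",
"# MNAR: the sicker the student, the more likely the value is missing.\n",
"# Here the chance of being missing equals the severity itself.\n",
"lost = rng.random(10_000) < severity\n",
"observed = severity.mask(lost)\n",
"\n",
"true_mean = severity.mean()\n",
"observed_mean = observed.mean()  # skips NaN, so only the recorded students count\n",
"print(true_mean, observed_mean)\n",
"```\n",
"\n",
"Under these assumptions the true mean is about 0.5 while the observed mean is noticeably lower, and nothing in the recorded values alone reveals the gap."
]
},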
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Shots | \n",
" Made | \n",
" % | \n",
"
\n",
" \n",
" Day | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 100 | \n",
" 50.0 | \n",
" 0.50 | \n",
"
\n",
" \n",
" 1 | \n",
" 100 | \n",
" 60.0 | \n",
" 0.60 | \n",
"
\n",
" \n",
" 2 | \n",
" 100 | \n",
" 50.0 | \n",
" 0.50 | \n",
"
\n",
" \n",
" 3 | \n",
" 100 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" 100 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 5 | \n",
" 100 | \n",
" 75.0 | \n",
" 0.75 | \n",
"
\n",
" \n",
" 6 | \n",
" 100 | \n",
" 80.0 | \n",
" 0.80 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Shots Made %\n",
"Day \n",
"0 100 50.0 0.50\n",
"1 100 60.0 0.60\n",
"2 100 50.0 0.50\n",
"3 100 NaN NaN\n",
"4 100 NaN NaN\n",
"5 100 75.0 0.75\n",
"6 100 80.0 0.80"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
"\n",
"Day = pd.Series(range(0,7))\n",
"Shots = 7*[100]\n",
"Made = pd.Series([50,60,50,None,None,75,80])\n",
"Percentage = Made/Shots\n",
"\n",
"d = {'Day': Day, 'Shots': Shots,'Made':Made,'%':Percentage}\n",
"df = pd.DataFrame(data=d)\n",
"df.set_index('Day')"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6300000000000001"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#if we take the mean with missing data\n",
"df['%'].mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#here the data was missing because I didn't have a pencil and paper to record the data, what kind of \n",
"#missing data is this?\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Shots | \n",
" Made | \n",
" % | \n",
"
\n",
" \n",
" Day | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 100 | \n",
" 50.000000 | \n",
" 0.500000 | \n",
"
\n",
" \n",
" 1 | \n",
" 100 | \n",
" 60.000000 | \n",
" 0.600000 | \n",
"
\n",
" \n",
" 2 | \n",
" 100 | \n",
" 50.000000 | \n",
" 0.500000 | \n",
"
\n",
" \n",
" 3 | \n",
" 100 | \n",
" 58.333333 | \n",
" 0.583333 | \n",
"
\n",
" \n",
" 4 | \n",
" 100 | \n",
" 66.666667 | \n",
" 0.666667 | \n",
"
\n",
" \n",
" 5 | \n",
" 100 | \n",
" 75.000000 | \n",
" 0.750000 | \n",
"
\n",
" \n",
" 6 | \n",
" 100 | \n",
" 80.000000 | \n",
" 0.800000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Shots Made %\n",
"Day \n",
"0 100 50.000000 0.500000\n",
"1 100 60.000000 0.600000\n",
"2 100 50.000000 0.500000\n",
"3 100 58.333333 0.583333\n",
"4 100 66.666667 0.666667\n",
"5 100 75.000000 0.750000\n",
"6 100 80.000000 0.800000"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#if we interpolate we get \n",
"Day = pd.Series(range(0,7))\n",
"Shots = 7*[100]\n",
"Made = pd.Series([50,60,50,None,None,75,80])\n",
"Made = Made.interpolate()\n",
"Percentage = Made/Shots\n",
"\n",
"d = {'Day': Day, 'Shots': Shots,'Made':Made,'%':Percentage}\n",
"df = pd.DataFrame(data=d)\n",
"df.set_index('Day')"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6285714285714287"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#if we take the mean with interpolated data\n",
"df['%'].mean()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"#it went down!!!\n"
]
},
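{
"cell_type": "markdown",
"metadata": {},
"source": [
"The small drop can be checked by hand. The sketch below recomputes both means from the same seven days and also averages only the two filled-in days. The lowercase variable names are new here so that nothing from the cells above is overwritten.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"made = pd.Series([50, 60, 50, None, None, 75, 80])\n",
"pct = made / 100\n",
"\n",
"# Mean over the 5 recorded days only (NaN is skipped)\n",
"mean_recorded = pct.mean()\n",
"\n",
"# Linear interpolation fills days 3 and 4 on a straight line between 0.50 and 0.75\n",
"filled = pct.interpolate()\n",
"mean_filled = filled.mean()\n",
"\n",
"# Average of the two filled-in days only\n",
"mean_of_filled_days = filled[pct.isna()].mean()\n",
"print(mean_recorded, mean_filled, mean_of_filled_days)\n",
"```\n",
"\n",
"The two filled-in days average 0.625, just below the recorded mean of 0.63, so including them pulls the overall mean down a little."
]
},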
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Shots | \n",
" Made | \n",
" % | \n",
"
\n",
" \n",
" Day | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 10 | \n",
" 5.0 | \n",
" 0.5 | \n",
"
\n",
" \n",
" 1 | \n",
" 10 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" 10 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" 10 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" 10 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 5 | \n",
" 10 | \n",
" 3.0 | \n",
" 0.3 | \n",
"
\n",
" \n",
" 6 | \n",
" 10 | \n",
" 4.0 | \n",
" 0.4 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Shots Made %\n",
"Day \n",
"0 10 5.0 0.5\n",
"1 10 NaN NaN\n",
"2 10 NaN NaN\n",
"3 10 NaN NaN\n",
"4 10 NaN NaN\n",
"5 10 3.0 0.3\n",
"6 10 4.0 0.4"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#less volume shooter\n",
"Day = pd.Series(range(0,7))\n",
"Shots = 7*[10]\n",
"Made = pd.Series([5,None,None,None,None,3,4])\n",
"Percentage = Made/Shots\n",
"\n",
"d = {'Day': Day, 'Shots': Shots,'Made':Made,'%':Percentage}\n",
"df = pd.DataFrame(data=d)\n",
"df.set_index('Day')"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.4000000000000001"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['%'].mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#perhaps it's reasonable after data analysis (association rules below) that we find that lower volume\n",
"#shooters don't record there made shots. Because the percentages will be lower. So why is data missing here?"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Shots | \n",
" Made | \n",
" % | \n",
"
\n",
" \n",
" Day | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 10 | \n",
" 5.0 | \n",
" 0.50 | \n",
"
\n",
" \n",
" 1 | \n",
" 10 | \n",
" 4.6 | \n",
" 0.46 | \n",
"
\n",
" \n",
" 2 | \n",
" 10 | \n",
" 4.2 | \n",
" 0.42 | \n",
"
\n",
" \n",
" 3 | \n",
" 10 | \n",
" 3.8 | \n",
" 0.38 | \n",
"
\n",
" \n",
" 4 | \n",
" 10 | \n",
" 3.4 | \n",
" 0.34 | \n",
"
\n",
" \n",
" 5 | \n",
" 10 | \n",
" 3.0 | \n",
" 0.30 | \n",
"
\n",
" \n",
" 6 | \n",
" 10 | \n",
" 4.0 | \n",
" 0.40 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Shots Made %\n",
"Day \n",
"0 10 5.0 0.50\n",
"1 10 4.6 0.46\n",
"2 10 4.2 0.42\n",
"3 10 3.8 0.38\n",
"4 10 3.4 0.34\n",
"5 10 3.0 0.30\n",
"6 10 4.0 0.40"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#less volume shooter with interpolation\n",
"Day = pd.Series(range(0,7))\n",
"Shots = 7*[10]\n",
"Made = Made.interpolate()\n",
"Percentage = Made/Shots\n",
"\n",
"d = {'Day': Day, 'Shots': Shots,'Made':Made,'%':Percentage}\n",
"df = pd.DataFrame(data=d)\n",
"df.set_index('Day')"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.3999999999999999"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#marginal drop\n",
"df['%'].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Brief intro into Association Rules\n",
"If you go to target and buy cocoa-butter lotion, a purse large enough to double as a diaper bag, zinc and magnesium supplements and a bright blue rug there is an 87% chance that ... "
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],\n",
" ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],\n",
" ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],\n",
" ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],\n",
" ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Apple | \n",
" Corn | \n",
" Dill | \n",
" Eggs | \n",
" Ice cream | \n",
" Kidney Beans | \n",
" Milk | \n",
" Nutmeg | \n",
" Onion | \n",
" Unicorn | \n",
" Yogurt | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" False | \n",
" False | \n",
" False | \n",
" True | \n",
" False | \n",
" True | \n",
" True | \n",
" True | \n",
" True | \n",
" False | \n",
" True | \n",
"
\n",
" \n",
" 1 | \n",
" False | \n",
" False | \n",
" True | \n",
" True | \n",
" False | \n",
" True | \n",
" False | \n",
" True | \n",
" True | \n",
" False | \n",
" True | \n",
"
\n",
" \n",
" 2 | \n",
" True | \n",
" False | \n",
" False | \n",
" True | \n",
" False | \n",
" True | \n",
" True | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 3 | \n",
" False | \n",
" True | \n",
" False | \n",
" False | \n",
" False | \n",
" True | \n",
" True | \n",
" False | \n",
" False | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" 4 | \n",
" False | \n",
" True | \n",
" False | \n",
" True | \n",
" True | \n",
" True | \n",
" False | \n",
" False | \n",
" True | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Apple Corn Dill Eggs Ice cream Kidney Beans Milk Nutmeg Onion \\\n",
"0 False False False True False True True True True \n",
"1 False False True True False True False True True \n",
"2 True False False True False True True False False \n",
"3 False True False False False True True False False \n",
"4 False True False True True True False False True \n",
"\n",
" Unicorn Yogurt \n",
"0 False True \n",
"1 False True \n",
"2 False False \n",
"3 True True \n",
"4 False False "
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from mlxtend.preprocessing import TransactionEncoder\n",
"\n",
"te = TransactionEncoder()\n",
"te_ary = te.fit(dataset).transform(dataset)\n",
"df = pd.DataFrame(te_ary, columns=te.columns_)\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" support | \n",
" itemsets | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0.8 | \n",
" (Eggs) | \n",
"
\n",
" \n",
" 1 | \n",
" 1.0 | \n",
" (Kidney Beans) | \n",
"
\n",
" \n",
" 2 | \n",
" 0.6 | \n",
" (Milk) | \n",
"
\n",
" \n",
" 3 | \n",
" 0.6 | \n",
" (Onion) | \n",
"
\n",
" \n",
" 4 | \n",
" 0.6 | \n",
" (Yogurt) | \n",
"
\n",
" \n",
" 5 | \n",
" 0.8 | \n",
" (Kidney Beans, Eggs) | \n",
"
\n",
" \n",
" 6 | \n",
" 0.6 | \n",
" (Onion, Eggs) | \n",
"
\n",
" \n",
" 7 | \n",
" 0.6 | \n",
" (Kidney Beans, Milk) | \n",
"
\n",
" \n",
" 8 | \n",
" 0.6 | \n",
" (Kidney Beans, Onion) | \n",
"
\n",
" \n",
" 9 | \n",
" 0.6 | \n",
" (Kidney Beans, Yogurt) | \n",
"
\n",
" \n",
" 10 | \n",
" 0.6 | \n",
" (Kidney Beans, Onion, Eggs) | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" support itemsets\n",
"0 0.8 (Eggs)\n",
"1 1.0 (Kidney Beans)\n",
"2 0.6 (Milk)\n",
"3 0.6 (Onion)\n",
"4 0.6 (Yogurt)\n",
"5 0.8 (Kidney Beans, Eggs)\n",
"6 0.6 (Onion, Eggs)\n",
"7 0.6 (Kidney Beans, Milk)\n",
"8 0.6 (Kidney Beans, Onion)\n",
"9 0.6 (Kidney Beans, Yogurt)\n",
"10 0.6 (Kidney Beans, Onion, Eggs)"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from mlxtend.frequent_patterns import apriori\n",
"\n",
"got = apriori(df, min_support=0.6, use_colnames=True)\n",
"got"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" antecedents | \n",
" consequents | \n",
" antecedent support | \n",
" consequent support | \n",
" support | \n",
" confidence | \n",
" lift | \n",
" leverage | \n",
" conviction | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" (Kidney Beans) | \n",
" (Eggs) | \n",
" 1.0 | \n",
" 0.8 | \n",
" 0.8 | \n",
" 0.8 | \n",
" 1.00 | \n",
" 0.00 | \n",
" 1.000000 | \n",
"
\n",
" \n",
" 1 | \n",
" (Eggs) | \n",
" (Kidney Beans) | \n",
" 0.8 | \n",
" 1.0 | \n",
" 0.8 | \n",
" 1.0 | \n",
" 1.00 | \n",
" 0.00 | \n",
" inf | \n",
"
\n",
" \n",
" 2 | \n",
" (Onion) | \n",
" (Eggs) | \n",
" 0.6 | \n",
" 0.8 | \n",
" 0.6 | \n",
" 1.0 | \n",
" 1.25 | \n",
" 0.12 | \n",
" inf | \n",
"
\n",
" \n",
" 3 | \n",
" (Milk) | \n",
" (Kidney Beans) | \n",
" 0.6 | \n",
" 1.0 | \n",
" 0.6 | \n",
" 1.0 | \n",
" 1.00 | \n",
" 0.00 | \n",
" inf | \n",
"
\n",
" \n",
" 4 | \n",
" (Onion) | \n",
" (Kidney Beans) | \n",
" 0.6 | \n",
" 1.0 | \n",
" 0.6 | \n",
" 1.0 | \n",
" 1.00 | \n",
" 0.00 | \n",
" inf | \n",
"
\n",
" \n",
" 5 | \n",
" (Yogurt) | \n",
" (Kidney Beans) | \n",
" 0.6 | \n",
" 1.0 | \n",
" 0.6 | \n",
" 1.0 | \n",
" 1.00 | \n",
" 0.00 | \n",
" inf | \n",
"
\n",
" \n",
" 6 | \n",
" (Kidney Beans, Onion) | \n",
" (Eggs) | \n",
" 0.6 | \n",
" 0.8 | \n",
" 0.6 | \n",
" 1.0 | \n",
" 1.25 | \n",
" 0.12 | \n",
" inf | \n",
"
\n",
" \n",
" 7 | \n",
" (Onion, Eggs) | \n",
" (Kidney Beans) | \n",
" 0.6 | \n",
" 1.0 | \n",
" 0.6 | \n",
" 1.0 | \n",
" 1.00 | \n",
" 0.00 | \n",
" inf | \n",
"
\n",
" \n",
" 8 | \n",
" (Onion) | \n",
" (Kidney Beans, Eggs) | \n",
" 0.6 | \n",
" 0.8 | \n",
" 0.6 | \n",
" 1.0 | \n",
" 1.25 | \n",
" 0.12 | \n",
" inf | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" antecedents consequents antecedent support \\\n",
"0 (Kidney Beans) (Eggs) 1.0 \n",
"1 (Eggs) (Kidney Beans) 0.8 \n",
"2 (Onion) (Eggs) 0.6 \n",
"3 (Milk) (Kidney Beans) 0.6 \n",
"4 (Onion) (Kidney Beans) 0.6 \n",
"5 (Yogurt) (Kidney Beans) 0.6 \n",
"6 (Kidney Beans, Onion) (Eggs) 0.6 \n",
"7 (Onion, Eggs) (Kidney Beans) 0.6 \n",
"8 (Onion) (Kidney Beans, Eggs) 0.6 \n",
"\n",
" consequent support support confidence lift leverage conviction \n",
"0 0.8 0.8 0.8 1.00 0.00 1.000000 \n",
"1 1.0 0.8 1.0 1.00 0.00 inf \n",
"2 0.8 0.6 1.0 1.25 0.12 inf \n",
"3 1.0 0.6 1.0 1.00 0.00 inf \n",
"4 1.0 0.6 1.0 1.00 0.00 inf \n",
"5 1.0 0.6 1.0 1.00 0.00 inf \n",
"6 0.8 0.6 1.0 1.25 0.12 inf \n",
"7 1.0 0.6 1.0 1.00 0.00 inf \n",
"8 0.8 0.6 1.0 1.25 0.12 inf "
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from mlxtend.frequent_patterns import association_rules\n",
"rules = association_rules(got)\n",
"rules"
]
},
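{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see where the numbers in the rules table come from, here is a plain-Python sketch that recomputes support, confidence, lift and leverage for the rule Onion -> Eggs. It uses the same five transactions written as sets. The helper `support` is a name introduced here for illustration and is not part of mlxtend.\n",
"\n",
"```python\n",
"transactions = [\n",
"    {'Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'},\n",
"    {'Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'},\n",
"    {'Milk', 'Apple', 'Kidney Beans', 'Eggs'},\n",
"    {'Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'},\n",
"    {'Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'},\n",
"]\n",
"\n",
"def support(items):\n",
"    # Fraction of transactions that contain every item in `items`\n",
"    return sum(items <= t for t in transactions) / len(transactions)\n",
"\n",
"antecedent, consequent = {'Onion'}, {'Eggs'}\n",
"both = support(antecedent | consequent)\n",
"\n",
"# Confidence: share of Onion baskets that also contain Eggs\n",
"confidence = both / support(antecedent)\n",
"# Lift: confidence relative to the overall share of baskets with Eggs\n",
"lift = confidence / support(consequent)\n",
"# Leverage: observed joint support minus what independence would predict\n",
"leverage = both - support(antecedent) * support(consequent)\n",
"print(both, confidence, lift, leverage)\n",
"```\n",
"\n",
"These values (0.6, 1.0, 1.25 and 0.12, up to floating-point rounding) agree with the Onion -> Eggs row in the table above."
]
},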
{
"cell_type": "markdown",
"metadata": {},
"source": [
"https://en.wikipedia.org/wiki/Association_rule_learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"if we somehow have time fun with 0"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}