2018-11-26 22:47:22 -04:00
{
"cells": [
{
"attachments": {
"nummatplot.png": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgMAAACLCAYAAAD8p5rLAAAAAXNSR0IArs4c6QAAQABJREFUeAHsnQeAXdV1rtftd/poNNKoI4kmqugdG2OaIa5gx92OnTixU2wncdp7cdwwL89ObMd5MW4kjgtxL4AN2BhMFQKEKAIkQF2jNr3cmdvf968zdzQzGlESsADtLd25556zzy7r7LPXv1fbsSrJQgoUCBQIFAgUCBQIFDhgKRA/YHseOh4oECgQKBAoECgQKOAUCGAgDIRAgUCBQIFAgUCBA5wCAQwc4AMgdD9QIFAgUCBQIFAggIEwBgIFAgUCBQIFAgUOcAoEMHCAD4DQ/UCBQIFAgUCBQIEABsIYCBQIFAgUCBQIFDjAKRDAwAE+AEL3AwUCBQIFAgUCBQIYCGMgUCBQIFAgUCBQ4ACnQAADB/gACN0PFAgUCBQIFAgUCGAgjIFAgUCBQIFAgUCBA5wCAQwc4AMgdD9QIFAgUCBQIFAggIEwBgIFAgUCBQIFAgUOcAoEMHCAD4DQ/UCBQIFAgUCBQIEABsIYCBQIFAgUCBQIFDjAKRDAwAE+AEL3AwUCBQIFAgUCBQIYCGMgUCBQIFAgUCBQ4ACnQAADB/gACN0PFAgUCBQIFAgUCGAgjIFAgUCBQIFAgUCBA5wCAQwc4AMgdD9QIFAgUCBQIFAggIEwBgIFAgUCBQIFAgUOcAoEMHCAD4DQ/UCBQIFAgUCBQIEABsIYCBQIFAgUCBQIFDjAKRDAwAE+AEL3AwUCBQIFAgUCBQIYCGMgUCBQIFAgUCBQ4ACnQAADB/gACN0PFAgUCBQIFAgUCGAgjIFAgUCBQIFAgUCBA5wCAQwc4AMgdD9QIFAgUCBQIFAggIEwBgIFAgUCBQIFAgUOcAoEMHCAD4DQ/UCBQIFAgUCBQIH9DwaqeghVq1bL/q3j6LPni6PnIHlF05RTq2/i9zTZwqlAgUCBQIFAgUCBlygFks9Pv8RYK2NFJyL2PpUXxwAA+ZjFE2axRMlKpRFLcMwvi8VoViULJojxu3bjnuNYTMdTUmJinVOuTSpDlQh4KNGG8XaqvVFdMUtz7I1RppACBQIFAgUCBQIFXtIUiFVJz30PJzLm6ZlquUwepAExLheKecumM1aqFC3ujD5uFYCACwtg0KkkAgxlVIpO8nsqIJi+nuimp/o7sa01UqisqeU/VRnhWqBAoECgQKBAoMCLlwLPExgQQWqMdRriVCOpQdGKEZOvxpEMxC0Rj1lyjKeXWb1Xq0VYctxS/CtXuCdWASwIB6jsmlphrPxqHQfPgIFPyTIdFNoLZ0zThXAqUCBQIFAgUCBQ4KVCgedJTSDy1LjuRKbNubGFuL4K6AVi1ZRt3dRnT6zbZulknc3qaLXm1ox/svUpUwOrlRgqBKGECClUrcRxTQ0Rfddq48LTpBpIeYo7lOUpLj9NBeFyoECgQKBAoECgwIuKAs+LZGAyuxUYqLDK15fAAACAfzIHGOTz+KNddu/dj1pfT9EyqWZf9cdSeWtoqlhTi1lrU7PNbungOGONDRmrq8/ySbgEwYsc0/8nJ+n4ay2g3jEJRY23y0bA69djIpvsD6Lcalt8TPtQy61MIQUKBAoECgQKBAq8tCnwvICBySSLJAOlitgyzBYwoEX9SL5st63caGvXbrNqKWvxagtMOgtoKAEUhsENA1aODVm1UMacD+afht0nYtbQUGczZjZafX3S5sydbUuWtluxVLW6lFk8iQQBPi5WHoEASQ3E/vkFGolxJNlCxPw5z7nIRkHwIDov0BKPZRwkcCqkQIFAgUCBQIFAgZc8BZ53MFD1lTs2AWCCfLli6VTCtneO2Kr719qGjQU4cAIm32DFQtKS8bQz8VgcW4F4Hq5cADyULVERC69gNyD1QMnKZQGMkrXMaLL2WbNsaGjQWlsrVgdQaGlqtOZmSQ+y1thcbykcAxLYHzrjp4w0XL8sju//BQuqAAjBBN
koRCqHZBwwMEnSQLaQAgUCBQIFAgUCBV6iFHh+wIB4rBKMV2CgBGPNCw3w/cT6blu16nHr7R7EHmAOq3qu4l8Yw1tA6/NKBQ8D7ovJgUB/YNMVmH+Z80nsBhKJFI4FMSvmc5YbGbF0BrkB+SqVYVb0rPzxPMhkMTnMJC2bTVgTgGBGW6s1NjagZkjazGbsEAAk+sQFBFKRmqCmLBAg0JVIYqBOhBQoECgQKBAoECjw0qbA8wcGtMh2jUAZn4E4kCBmDz22ze6840HL57VUrzME/mTCQyBOZqQBpXIeJiwAoDgDLOmJNyDVQtElAlVL436o6/l83np7uvFAKFp7ewf5Y/gb1PFXh4AOPjIyLFfyAAp5LERMP5koWzt2CMlk0pYuWWxLD51rA0gV6tAx1GOsmE1F9pRqQUiBAoECgQKBAoECBwoFnqU3gTTxMFtZ/9X8/idQypkxv6vlQS5LPMAqng+ZrYCEf2CoaH25gqXrZyK2F0AYhblTppg9K3LhAOn8K6gTkgkAAvYDsQp2BlrBJwEH8aT19Pfb4OAQaoWipVK4HFaTfCcBEpFgP04etVJShmQyY5USRzISBBOMjBRsdynldglDuT7buLVsu3Zt9cBHdXWoGFpmoGJA9dBeh9ohjVQhYdmMWh+1S9+1PkrtoHZKEhGdk+RDfZYqgyRVhFoicEIOpwNt9aQ+RwecJygT9JQ0JKRAgUCBQIFAgUCB/UGBZwkGaKIzub2b6q7/NYbGSr9MAKFEIsk37BH+mAQTpDKN8PMmK8VTVhKgkE0ARcXiYqhpi+NtUFA8ARK8HSYNQChWLIMtgZhvT9du6xvMcZ5moy6ogB6qoIcYRgGSAIjBVgAQLh3A1kAi/yRAQUXKHCCTmcG9jTacG7ChnXnb2VUAdDSQn7L7qrZtWx9SimF+92OwWLT6uoS1tGatuSlrc3B5nNHSiKoh5aqJhmySvkTRFb29GEiCPOiLjBIjY0UHA0gnKJDzAimoQWiIYJBiJtBDPi4+8S+VE1KgQKBAoECgQKDAb5sCzxoMeKjg6Voprh4td2G+jRxiK8Aq/JFHt9imbd3W1rHAtm/vt1KRtTBWfXEYeHxsEe3FCWQAEDgrKwF5+bHGhn3CN6XDHxkZdalAFQAgxlphqS8pQAUQUIEBywBQ2gb/E4MpU5zW4xXsDCR4cIkB+fuGBqyvbxcXq9Y+s43zAhAVIiCm4c8xK2PbkEjUg0YKNjJQtKH+QdtW7cUFciv1mjXU19viJYvtsMPnWFfXLqQSaQBCFilCHTYKWY+LIK8HSTjkwYDJIr1AaVFM0QvZRlCILBoVcIkzIQUKBAoECgQKBArsbwr8920GnPHXRONwPsT5UYqx6jcs/Mt2H4aCj63dQLhhrqcarX+4aF2s7GPpOsT7Zo1pGQTC6gUMWGXLQDDOORkM+qqb2xIFxP2FjPUM9NlQTqt2RP4w8grMFNU/Iv0291AQ81eiQ/zR6ltQgJU7ACGdbKR9CRsYGLIByqiwWs9kUtba3EwZVEJ9yqdYBQkAg/eEYsTEtYFSReVxsuKhkCmZeuWtMDy4zQFCBmPFetQM2WwasIC6obnBZuDp0NjUYM2NSc5FezBI8+FJFciWAQAj9UFVsRdoRkiBAoECgQKBAoEC+4MCz1oyAH+EkY01tcbcai33WMFmu3aWMBR8wjZt3mrZujbc+xIYAVJVFYM+vALKgAPtTZDLjTgTlPlBlXtjAgNY90vsn4XZJvEUqBKhMJNqQNyfBwSMOhCoIgmotaMKo66IU4ubk2LSCfCzDDiRt0AqWW+jqBqGBvoBKJShfQ4EFahLdfo9kjZQjssXpGbgqAqi0f4IMVQSsURGcgcHKLnRYevv7bemRqIi1M3APqJs+aESQEXAaNjtJRRSWUaKSaQGddkygZOSSA5mWBv2CB2zM0RZpD+SDjiCEUFDChQIFAgUCBQIFNh/FHj2YEBtdf6lPzBN6cpxDSzDjKusvjds6bQ77uq0rt
1lROitVixShbh9TMaEMN8qDB1+m+BcCiasoIRS6lf4lIolq+QxysPtTzy9MgxYyCdsFF3BCLsalmDQciv0ZToAQMxb
}
},
"cell_type": "markdown",
"metadata": {},
"source": [
"# *Data Manipulation and Analysis with Python*\n",
"![nummatplot.png](attachment:nummatplot.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"## Data pre-processing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"### Handling missing data"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>24.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>23.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>12.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>17.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>5.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A\n",
"0 24.0\n",
"1 NaN\n",
"2 NaN\n",
"3 23.0\n",
"4 NaN\n",
"5 12.0\n",
"6 NaN\n",
"7 17.0\n",
"8 NaN\n",
"9 2.0\n",
"10 5.0"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"datos1 = pd.DataFrame([24, np.nan, np.nan, 23, np.nan, 12, np.nan, 17, np.nan, 2, 5], columns=['A'])\n",
"datos1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"In `dropna()`, the `subset` argument lists the column labels in which to look for missing values, `axis=0` drops rows (`axis=1` drops columns), and `inplace=True` applies the changes directly to the `DataFrame` instead of returning a new one."
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>24.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>23.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>12.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>17.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>5.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A\n",
"0 24.0\n",
"3 23.0\n",
"5 12.0\n",
"7 17.0\n",
"9 2.0\n",
"10 5.0"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"datos1.dropna(subset=['A'], axis=0, inplace=True)\n",
"datos1"
]
},
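{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"A minimal sketch of the same operation without `inplace=True`, on a small hypothetical column: `dropna()` then returns a new `DataFrame` and leaves the original untouched. `isna().sum()` is a common way to count the missing values per column before deciding what to drop."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical sample data with two missing values\n",
"sample = pd.DataFrame({'A': [24, np.nan, 23, np.nan, 5]})\n",
"\n",
"# Count missing values per column before dropping anything\n",
"missing_per_column = sample.isna().sum()\n",
"print(missing_per_column)\n",
"\n",
"# Without inplace=True, dropna returns a new DataFrame\n",
"cleaned = sample.dropna(subset=['A'], axis=0)\n",
"print(cleaned)\n",
"print(len(sample), len(cleaned))  # the original keeps all its rows"
]
},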
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br> \n",
"The `replace()` function replaces values in the `DataFrame` with new ones. Here we use it to replace the missing values (`NaN`) with the column mean, computed with `mean()`. Because the previous cell dropped the missing rows in place, we first rebuild `datos1`."
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"13.833333333333334"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"datos1 = pd.DataFrame([24, np.nan, np.nan, 23, np.nan, 12, np.nan, 17, np.nan, 2, 5], columns=['A'])\n",
"media = datos1['A'].mean()\n",
"media"
]
},
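{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"`mean()` ignores `NaN` values by default (`skipna=True`), so the mean above uses only the six non-missing values: (24 + 23 + 12 + 17 + 2 + 5) / 6 = 83 / 6 ≈ 13.8333. A quick check on a hypothetical three-value `Series`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"values = pd.Series([24, np.nan, 23])\n",
"mean_skipping = values.mean()            # NaN ignored: (24 + 23) / 2\n",
"mean_strict = values.mean(skipna=False)  # any NaN makes the result NaN\n",
"print(mean_skipping, mean_strict)"
]
},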
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"Now we use `replace()`:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>24.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>13.833333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>13.833333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>23.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>13.833333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>12.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>13.833333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>17.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>13.833333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>2.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>5.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A\n",
"0 24.000000\n",
"1 13.833333\n",
"2 13.833333\n",
"3 23.000000\n",
"4 13.833333\n",
"5 12.000000\n",
"6 13.833333\n",
"7 17.000000\n",
"8 13.833333\n",
"9 2.000000\n",
"10 5.000000"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"datos1['A'] = datos1['A'].replace(np.nan, media)\n",
"datos1"
]
},
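{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"`fillna()` is another common way to perform the same imputation. A minimal sketch on a hypothetical column; it assigns the result back to the column rather than calling a method with `inplace=True` on a selected column such as `datos1['A']`, which recent pandas versions warn may not update the original `DataFrame`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"sample = pd.DataFrame({'A': [24, np.nan, 23, np.nan, 12]})\n",
"fill_value = sample['A'].mean()  # (24 + 23 + 12) / 3\n",
"\n",
"# Assign the filled column back instead of modifying it in place\n",
"sample['A'] = sample['A'].fillna(fill_value)\n",
"print(sample)"
]
},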
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"## Transforming the data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"### Merging and combining `DataFrames`"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Artículo comprado</th>\n",
" <th>Costo</th>\n",
" <th>Nombre</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Tienda 1</th>\n",
" <td>Libro</td>\n",
" <td>1200</td>\n",
" <td>Adelis</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Tienda 1</th>\n",
" <td>Raspberry pi 3</td>\n",
" <td>15000</td>\n",
" <td>Miguel</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Tienda 2</th>\n",
" <td>Balón</td>\n",
" <td>5000</td>\n",
" <td>Jaime</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Artículo comprado Costo Nombre\n",
"Tienda 1 Libro 1200 Adelis\n",
"Tienda 1 Raspberry pi 3 15000 Miguel\n",
"Tienda 2 Balón 5000 Jaime"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"compra_1 = pd.Series({'Nombre': 'Adelis',\n",
" 'Artículo comprado': 'Libro',\n",
" 'Costo': 1200})\n",
"compra_2 = pd.Series({'Nombre': 'Miguel',\n",
" 'Artículo comprado': 'Raspberry pi 3',\n",
" 'Costo': 15000})\n",
"compra_3 = pd.Series({'Nombre': 'Jaime',\n",
" 'Artículo comprado': 'Balón',\n",
" 'Costo': 5000})\n",
"df = pd.DataFrame([compra_1, compra_2, compra_3], index=['Tienda 1', 'Tienda 1', 'Tienda 2'])\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"We can add a new column to the `DataFrame` by assigning a list with one value per row:"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Artículo comprado</th>\n",
" <th>Costo</th>\n",
" <th>Nombre</th>\n",
" <th>Fecha</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Tienda 1</th>\n",
" <td>Libro</td>\n",
" <td>1200</td>\n",
" <td>Adelis</td>\n",
" <td>Diciembre 1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Tienda 1</th>\n",
" <td>Raspberry pi 3</td>\n",
" <td>15000</td>\n",
" <td>Miguel</td>\n",
" <td>Febrero 4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Tienda 2</th>\n",
" <td>Balón</td>\n",
" <td>5000</td>\n",
" <td>Jaime</td>\n",
" <td>Mediados de Julio</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Artículo comprado Costo Nombre Fecha\n",
"Tienda 1 Libro 1200 Adelis Diciembre 1\n",
"Tienda 1 Raspberry pi 3 15000 Miguel Febrero 4\n",
"Tienda 2 Balón 5000 Jaime Mediados de Julio"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Fecha'] = ['Diciembre 1', 'Febrero 4', 'Mediados de Julio']\n",
"df"
]
},
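{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"When a list is assigned to a column, it must contain exactly one value per row (matched by position), whereas a single scalar value is repeated in every row. A minimal sketch with a hypothetical three-row table:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical three-row table (duplicate index labels are allowed)\n",
"purchases = pd.DataFrame({'Nombre': ['Adelis', 'Miguel', 'Jaime']},\n",
"                         index=['Tienda 1', 'Tienda 1', 'Tienda 2'])\n",
"\n",
"# A list provides one value per row, matched by position\n",
"purchases['Fecha'] = ['Diciembre 1', 'Febrero 4', 'Mediados de Julio']\n",
"\n",
"# A scalar is broadcast to every row\n",
"purchases['Entregado'] = 'Sí'\n",
"\n",
"# A list of the wrong length is rejected\n",
"try:\n",
"    purchases['Costo'] = [1200, 15000]  # only 2 values for 3 rows\n",
"    length_error = False\n",
"except ValueError as exc:\n",
"    length_error = True\n",
"    print('ValueError:', exc)\n",
"\n",
"print(purchases)"
]
},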
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Artículo comprado</th>\n",
" <th>Costo</th>\n",
" <th>Nombre</th>\n",
" <th>Fecha</th>\n",
" <th>Entregado</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Tienda 1</th>\n",
" <td>Libro</td>\n",
" <td>1200</td>\n",
" <td>Adelis</td>\n",
" <td>Diciembre 1</td>\n",
" <td>Sí</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Tienda 1</th>\n",
" <td>Raspberry pi 3</td>\n",
" <td>15000</td>\n",
" <td>Miguel</td>\n",
" <td>Febrero 4</td>\n",
" <td>Sí</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Tienda 2</th>\n",
" <td>Balón</td>\n",
" <td>5000</td>\n",
" <td>Jaime</td>\n",
" <td>Mediados de Julio</td>\n",
" <td>Sí</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Artículo comprado Costo Nombre Fecha Entregado\n",
"Tienda 1 Libro 1200 Adelis Diciembre 1 Sí\n",
"Tienda 1 Raspberry pi 3 15000 Miguel Febrero 4 Sí\n",
"Tienda 2 Balón 5000 Jaime Mediados de Julio Sí"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Entregado'] = 'Sí'\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Artículo comprado</th>\n",
" <th>Costo</th>\n",
" <th>Nombre</th>\n",
" <th>Fecha</th>\n",
" <th>Entregado</th>\n",
" <th>Retroalimentación</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Tienda 1</th>\n",
" <td>Libro</td>\n",
" <td>1200</td>\n",
" <td>Adelis</td>\n",
" <td>Diciembre 1</td>\n",
" <td>Sí</td>\n",
" <td>Positiva</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Tienda 1</th>\n",
" <td>Raspberry pi 3</td>\n",
" <td>15000</td>\n",
" <td>Miguel</td>\n",
" <td>Febrero 4</td>\n",
" <td>Sí</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Tienda 2</th>\n",
" <td>Balón</td>\n",
" <td>5000</td>\n",
" <td>Jaime</td>\n",
" <td>Mediados de Julio</td>\n",
" <td>Sí</td>\n",
" <td>Negativa</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Artículo comprado Costo Nombre Fecha Entregado \\\n",
"Tienda 1 Libro 1200 Adelis Diciembre 1 Sí \n",
"Tienda 1 Raspberry pi 3 15000 Miguel Febrero 4 Sí \n",
"Tienda 2 Balón 5000 Jaime Mediados de Julio Sí \n",
"\n",
" Retroalimentación \n",
"Tienda 1 Positiva \n",
"Tienda 1 None \n",
"Tienda 2 Negativa "
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Retroalimentación'] = ['Positiva', None, 'Negativa']\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"The pandas `reset_index()` method resets the index of a `DataFrame`: it replaces the index with a default integer index running from 0 to the number of rows minus one, and moves the old index into a regular column (called `index` here, because the original index had no name)."
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>index</th>\n",
" <th>Artículo comprado</th>\n",
" <th>Costo</th>\n",
" <th>Nombre</th>\n",
" <th>Fecha</th>\n",
" <th>Entregado</th>\n",
" <th>Retroalimentación</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Tienda 1</td>\n",
" <td>Libro</td>\n",
" <td>1200</td>\n",
" <td>Adelis</td>\n",
" <td>Diciembre 1</td>\n",
" <td>Sí</td>\n",
" <td>Positiva</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Tienda 1</td>\n",
" <td>Raspberry pi 3</td>\n",
" <td>15000</td>\n",
" <td>Miguel</td>\n",
" <td>Febrero 4</td>\n",
" <td>Sí</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Tienda 2</td>\n",
" <td>Balón</td>\n",
" <td>5000</td>\n",
" <td>Jaime</td>\n",
" <td>Mediados de Julio</td>\n",
" <td>Sí</td>\n",
" <td>Negativa</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" index Artículo comprado Costo Nombre Fecha Entregado \\\n",
"0 Tienda 1 Libro 1200 Adelis Diciembre 1 Sí \n",
"1 Tienda 1 Raspberry pi 3 15000 Miguel Febrero 4 Sí \n",
"2 Tienda 2 Balón 5000 Jaime Mediados de Julio Sí \n",
"\n",
" Retroalimentación \n",
"0 Positiva \n",
"1 None \n",
"2 Negativa "
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adf = df.reset_index()\n",
"adf"
]
},
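{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"A minimal sketch of two related calls on a hypothetical table: `reset_index(drop=True)` discards the old index instead of keeping it as a column, and `set_index()` moves a column back into the index."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"stores = pd.DataFrame({'Costo': [1200, 15000, 5000]},\n",
"                      index=['Tienda 1', 'Tienda 1', 'Tienda 2'])\n",
"\n",
"with_column = stores.reset_index()              # old index kept as column 'index'\n",
"without_column = stores.reset_index(drop=True)  # old index discarded\n",
"restored = with_column.set_index('index')       # move the column back into the index\n",
"\n",
"print(with_column)\n",
"print(without_column)\n",
"print(restored)"
]
},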
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"Often we have a pair of tables that we want to join or combine into a single `DataFrame`."
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Función\n",
"Nombre \n",
"Adriana Gerente de ventas\n",
"Andrés Vendedor 1\n",
"Cristóbal Gerente de departamento\n",
"\n",
" Grado\n",
"Nombre \n",
"Andrés Nivel 3\n",
"Cristóbal Nivel 1\n",
"Adriana Nivel 2\n"
]
}
],
"source": [
"empleados_df = pd.DataFrame([{'Nombre': 'Adriana', 'Función': 'Gerente de ventas'},\n",
" {'Nombre': 'Andrés', 'Función': 'Vendedor 1'},\n",
" {'Nombre': 'Cristóbal', 'Función': 'Gerente de departamento'}])\n",
"empleados_df = empleados_df.set_index('Nombre')\n",
"grado_df = pd.DataFrame([{'Nombre': 'Andrés', 'Grado': 'Nivel 3'},\n",
" {'Nombre': 'Cristóbal', 'Grado': 'Nivel 1'},\n",
" {'Nombre': 'Adriana', 'Grado': 'Nivel 2'}])\n",
"grado_df = grado_df.set_index('Nombre')\n",
"print(empleados_df.head())\n",
"print()\n",
"print(grado_df.head())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
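"`pd.merge()` connects rows of `DataFrames` based on one or more keys. For readers familiar with SQL, it performs database-style joins on columns or indexes. Here `left_index=True` and `right_index=True` use the index (`Nombre`) of each table as the join key, and `how='outer'` keeps every key that appears in either table."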
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Función</th>\n",
" <th>Grado</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Nombre</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Adriana</th>\n",
" <td>Gerente de ventas</td>\n",
" <td>Nivel 2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Andrés</th>\n",
" <td>Vendedor 1</td>\n",
" <td>Nivel 3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Cristóbal</th>\n",
" <td>Gerente de departamento</td>\n",
" <td>Nivel 1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Función Grado\n",
"Nombre \n",
"Adriana Gerente de ventas Nivel 2\n",
"Andrés Vendedor 1 Nivel 3\n",
"Cristóbal Gerente de departamento Nivel 1"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_info_empleados = pd.merge(empleados_df, grado_df, how='outer', left_index=True, right_index=True)\n",
"df_info_empleados"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"More examples of how to vary the `how` parameter (`'inner'`, `'left'`, `'right'` and `'outer'`) can be found in the book *Python for Data Analysis* by Wes McKinney."
]
},
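{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"A minimal sketch on two hypothetical tables whose names only partly overlap, showing which keys each value of `how` keeps: `'inner'` only the common keys, `'left'` and `'right'` the keys of one side (with `NaN` where the other side has no match), and `'outer'` the union of both."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical tables whose keys (index Nombre) only partly overlap\n",
"functions = pd.DataFrame({'Función': ['Gerente de ventas', 'Vendedor 1']},\n",
"                         index=pd.Index(['Adriana', 'Andrés'], name='Nombre'))\n",
"grades = pd.DataFrame({'Grado': ['Nivel 3', 'Nivel 1']},\n",
"                      index=pd.Index(['Andrés', 'Cristóbal'], name='Nombre'))\n",
"\n",
"joined = {}\n",
"for how in ['inner', 'left', 'right', 'outer']:\n",
"    joined[how] = pd.merge(functions, grades, how=how,\n",
"                           left_index=True, right_index=True)\n",
"    print(how, '->', list(joined[how].index))"
]
},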
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"Now suppose we have two new `DataFrames` with the same rows (the same `Nombre` index) as the previous one. For example:"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Fecha de Ingreso\n",
"Nombre \n",
"Adriana 20/06/2013\n",
"Andrés 10/01/2018\n",
"Cristóbal 20/03/2011\n",
" Art.Vendidos/Total Art.\n",
"Nombre \n",
"Adriana 0.0123\n",
"Andrés 0.1450\n",
"Cristóbal 0.5000\n"
]
}
],
"source": [
"fecha_ingreso_df = pd.DataFrame([{'Nombre': 'Adriana', 'Fecha de Ingreso': '20/06/2013'},\n",
" {'Nombre': 'Andrés', 'Fecha de Ingreso': '10/01/2018'},\n",
" {'Nombre': 'Cristóbal', 'Fecha de Ingreso': '20/03/2011'}])\n",
"fecha_ingreso_df = fecha_ingreso_df.set_index('Nombre')\n",
"art_vendidos_df = pd.DataFrame([{'Nombre': 'Adriana', 'Art.Vendidos/Total Art.': 123/10000},\n",
" {'Nombre': 'Andrés', 'Art.Vendidos/Total Art.': 1450/10000},\n",
" {'Nombre': 'Cristóbal', 'Art.Vendidos/Total Art.': 5000/10000}])\n",
"art_vendidos_df = art_vendidos_df.set_index('Nombre')\n",
"\n",
"print(fecha_ingreso_df.head())\n",
"print(art_vendidos_df.head())\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"`pd.concat()` glues or stacks objects along an axis. With `axis=1` the tables are placed side by side, and rows are aligned by their index labels."
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Función</th>\n",
" <th>Grado</th>\n",
" <th>Fecha de Ingreso</th>\n",
" <th>Art.Vendidos/Total Art.</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Nombre</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Adriana</th>\n",
" <td>Gerente de ventas</td>\n",
" <td>Nivel 2</td>\n",
" <td>20/06/2013</td>\n",
" <td>0.0123</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Andrés</th>\n",
" <td>Vendedor 1</td>\n",
" <td>Nivel 3</td>\n",
" <td>10/01/2018</td>\n",
" <td>0.1450</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Cristóbal</th>\n",
" <td>Gerente de departamento</td>\n",
" <td>Nivel 1</td>\n",
" <td>20/03/2011</td>\n",
" <td>0.5000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Función Grado Fecha de Ingreso \\\n",
"Nombre \n",
"Adriana Gerente de ventas Nivel 2 20/06/2013 \n",
"Andrés Vendedor 1 Nivel 3 10/01/2018 \n",
"Cristóbal Gerente de departamento Nivel 1 20/03/2011 \n",
"\n",
" Art.Vendidos/Total Art. \n",
"Nombre \n",
"Adriana 0.0123 \n",
"Andrés 0.1450 \n",
"Cristóbal 0.5000 "
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_data = pd.concat([df_info_empleados, fecha_ingreso_df, art_vendidos_df], axis=1)\n",
"new_data"
]
},
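{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"`pd.concat()` with `axis=1` matches rows by index label, not by position. A minimal sketch with hypothetical tables that list the same names in different orders; it also uses `join='inner'`, which keeps only the labels present in every table (the default, `join='outer'`, keeps all of them)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical tables with the same names in a different order\n",
"sales = pd.DataFrame({'Ventas': [10, 20]}, index=['Adriana', 'Andrés'])\n",
"dates = pd.DataFrame({'Fecha de Ingreso': ['10/01/2018', '20/06/2013']},\n",
"                     index=['Andrés', 'Adriana'])\n",
"\n",
"# Rows are matched by index label, not by position\n",
"side_by_side = pd.concat([sales, dates], axis=1)\n",
"print(side_by_side)\n",
"\n",
"# join='inner' keeps only labels present in every table\n",
"extra = pd.DataFrame({'Grado': ['Nivel 2', 'Nivel 1']},\n",
"                     index=['Adriana', 'Cristóbal'])\n",
"common_only = pd.concat([sales, extra], axis=1, join='inner')\n",
"print(common_only)"
]
},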
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"There is much more to learn! For example, what happens with `axis=0`? **A**: pandas stacks the tables one below the other, keeping all the original index labels (so each name appears three times) and filling with `NaN` the columns that a given table does not have, as shown below:"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Art.Vendidos/Total Art.</th>\n",
" <th>Fecha de Ingreso</th>\n",
" <th>Función</th>\n",
" <th>Grado</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Nombre</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Adriana</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Gerente de ventas</td>\n",
" <td>Nivel 2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Andrés</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Vendedor 1</td>\n",
" <td>Nivel 3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Cristóbal</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Gerente de departamento</td>\n",
" <td>Nivel 1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Adriana</th>\n",
" <td>NaN</td>\n",
" <td>20/06/2013</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Andrés</th>\n",
" <td>NaN</td>\n",
" <td>10/01/2018</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Cristóbal</th>\n",
" <td>NaN</td>\n",
" <td>20/03/2011</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Adriana</th>\n",
" <td>0.0123</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Andrés</th>\n",
" <td>0.1450</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Cristóbal</th>\n",
" <td>0.5000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Art.Vendidos/Total Art. Fecha de Ingreso Función \\\n",
"Nombre \n",
"Adriana NaN NaN Gerente de ventas \n",
"Andrés NaN NaN Vendedor 1 \n",
"Cristóbal NaN NaN Gerente de departamento \n",
"Adriana NaN 20/06/2013 NaN \n",
"Andrés NaN 10/01/2018 NaN \n",
"Cristóbal NaN 20/03/2011 NaN \n",
"Adriana 0.0123 NaN NaN \n",
"Andrés 0.1450 NaN NaN \n",
"Cristóbal 0.5000 NaN NaN \n",
"\n",
" Grado \n",
"Nombre \n",
"Adriana Nivel 2 \n",
"Andrés Nivel 3 \n",
"Cristóbal Nivel 1 \n",
"Adriana NaN \n",
"Andrés NaN \n",
"Cristóbal NaN \n",
"Adriana NaN \n",
"Andrés NaN \n",
"Cristóbal NaN "
]
},
"execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.concat([df_info_empleados, fecha_ingreso_df, art_vendidos_df], axis=0)"
]
},
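{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"When stacking with `axis=0`, the `keys` argument adds an outer index level that records which table each row came from, so the repeated names can still be told apart. A minimal sketch on two hypothetical tables:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"grades = pd.DataFrame({'Grado': ['Nivel 2', 'Nivel 3']},\n",
"                      index=pd.Index(['Adriana', 'Andrés'], name='Nombre'))\n",
"dates = pd.DataFrame({'Fecha de Ingreso': ['20/06/2013', '10/01/2018']},\n",
"                     index=pd.Index(['Adriana', 'Andrés'], name='Nombre'))\n",
"\n",
"# keys adds an outer index level recording the source table of each row\n",
"stacked = pd.concat([grades, dates], axis=0, keys=['grado', 'fecha'])\n",
"print(stacked)\n",
"print(stacked.loc['fecha'])  # select the rows that came from one table"
]
},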
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"Another useful transformation is a calculation over an entire column. In our example, suppose we want to express the ratio of items sold as a **percentage of items sold** and relabel that column accordingly.\n"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Función</th>\n",
" <th>Grado</th>\n",
" <th>Fecha de Ingreso</th>\n",
" <th>% Art. Vendidos</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Nombre</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Adriana</th>\n",
" <td>Gerente de ventas</td>\n",
" <td>Nivel 2</td>\n",
" <td>20/06/2013</td>\n",
" <td>1.23</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Andrés</th>\n",
" <td>Vendedor 1</td>\n",
" <td>Nivel 3</td>\n",
" <td>10/01/2018</td>\n",
" <td>14.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Cristóbal</th>\n",
" <td>Gerente de departamento</td>\n",
" <td>Nivel 1</td>\n",
" <td>20/03/2011</td>\n",
" <td>50.00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Función Grado Fecha de Ingreso % Art. Vendidos\n",
"Nombre \n",
"Adriana Gerente de ventas Nivel 2 20/06/2013 1.23\n",
"Andrés Vendedor 1 Nivel 3 10/01/2018 14.50\n",
"Cristóbal Gerente de departamento Nivel 1 20/03/2011 50.00"
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_data['Art.Vendidos/Total Art.'] = new_data['Art.Vendidos/Total Art.'] * 100\n",
"new_data.rename(columns={'Art.Vendidos/Total Art.': '% Art. Vendidos'}, inplace=True)\n",
"new_data"
]
},
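{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"The cell above modifies `new_data` in place, so running it a second time raises a `KeyError`, because the original column has already been renamed. A minimal sketch of a non-mutating alternative on a hypothetical copy of that column, using `assign()` and `drop()` to build a new `DataFrame`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"ratios = pd.DataFrame({'Art.Vendidos/Total Art.': [0.0123, 0.1450, 0.5000]},\n",
"                      index=pd.Index(['Adriana', 'Andrés', 'Cristóbal'], name='Nombre'))\n",
"\n",
"# Build a new DataFrame instead of modifying ratios in place\n",
"percent = (ratios\n",
"           .assign(**{'% Art. Vendidos': ratios['Art.Vendidos/Total Art.'] * 100})\n",
"           .drop(columns='Art.Vendidos/Total Art.'))\n",
"print(percent)"
]
},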
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"### Normalizing data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"Consider a `DataFrame` holding the dimensions of boxes to be sold in a warehouse."
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Largo</th>\n",
" <th>Ancho</th>\n",
" <th>Alto</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>168.7</td>\n",
" <td>68.3</td>\n",
" <td>46.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>170.0</td>\n",
" <td>60.2</td>\n",
" <td>47.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>150.3</td>\n",
" <td>65.0</td>\n",
" <td>45.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>168.7</td>\n",
" <td>68.3</td>\n",
" <td>46.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>145.2</td>\n",
" <td>45.9</td>\n",
" <td>45.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>200.0</td>\n",
" <td>70.0</td>\n",
" <td>40.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>175.4</td>\n",
" <td>75.1</td>\n",
" <td>45.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>163.0</td>\n",
" <td>63.5</td>\n",
" <td>43.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>230.0</td>\n",
" <td>65.2</td>\n",
" <td>46.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>129.6</td>\n",
" <td>68.7</td>\n",
" <td>49.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>178.2</td>\n",
" <td>78.0</td>\n",
" <td>47.2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Largo Ancho Alto\n",
"0 168.7 68.3 46.8\n",
"1 170.0 60.2 47.0\n",
"2 150.3 65.0 45.0\n",
"3 168.7 68.3 46.8\n",
"4 145.2 45.9 45.3\n",
"5 200.0 70.0 40.9\n",
"6 175.4 75.1 45.6\n",
"7 163.0 63.5 43.8\n",
"8 230.0 65.2 46.8\n",
"9 129.6 68.7 49.0\n",
"10 178.2 78.0 47.2"
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dimension1 = pd.DataFrame([168.7, 170.0, 150.3, 168.7, 145.2, 200.0, 175.4, 163.0, 230.0, 129.6, 178.2], columns = list('L'))\n",
"dimension1.rename(columns = {'L': 'Largo'}, inplace = True)\n",
"\n",
"\n",
"dimension2 = pd.DataFrame([68.3, 60.2, 65.0, 68.3, 45.9, 70.0, 75.1, 63.5, 65.2, 68.7, 78], columns = list('A'))\n",
"dimension2.rename(columns = {'A': 'Ancho'}, inplace = True)\n",
"\n",
"\n",
"dimension3 = pd.DataFrame([46.8, 47.0, 45.0, 46.8, 45.3, 40.9, 45.6, 43.8, 46.8, 49.0, 47.2], columns = list('A'))\n",
"dimension3.rename(columns = {'A': 'Alto'}, inplace = True)\n",
"\n",
"\n",
"dimensiones = pd.concat([dimension1, dimension2, dimension3], axis=1)\n",
"dimensiones"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"**Simple feature scaling** method: divide each value by the maximum value of that feature,\n",
"$x_{new} = \\frac{x_{old}}{x_{max}}$. For positive data such as these, the results lie in the interval $(0, 1]$."
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Largo</th>\n",
" <th>Ancho</th>\n",
" <th>Alto</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.733478</td>\n",
" <td>0.875641</td>\n",
" <td>0.955102</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.739130</td>\n",
" <td>0.771795</td>\n",
" <td>0.959184</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.653478</td>\n",
" <td>0.833333</td>\n",
" <td>0.918367</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.733478</td>\n",
" <td>0.875641</td>\n",
" <td>0.955102</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.631304</td>\n",
" <td>0.588462</td>\n",
" <td>0.924490</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>0.869565</td>\n",
" <td>0.897436</td>\n",
" <td>0.834694</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>0.762609</td>\n",
" <td>0.962821</td>\n",
" <td>0.930612</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>0.708696</td>\n",
" <td>0.814103</td>\n",
" <td>0.893878</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1.000000</td>\n",
" <td>0.835897</td>\n",
" <td>0.955102</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>0.563478</td>\n",
" <td>0.880769</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>0.774783</td>\n",
" <td>1.000000</td>\n",
" <td>0.963265</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Largo Ancho Alto\n",
"0 0.733478 0.875641 0.955102\n",
"1 0.739130 0.771795 0.959184\n",
"2 0.653478 0.833333 0.918367\n",
"3 0.733478 0.875641 0.955102\n",
"4 0.631304 0.588462 0.924490\n",
"5 0.869565 0.897436 0.834694\n",
"6 0.762609 0.962821 0.930612\n",
"7 0.708696 0.814103 0.893878\n",
"8 1.000000 0.835897 0.955102\n",
"9 0.563478 0.880769 1.000000\n",
"10 0.774783 1.000000 0.963265"
]
},
"execution_count": 91,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Divide each column by its own maximum (this overwrites the columns of `dimensiones`)\n",
"dimensiones['Largo'] = dimensiones['Largo']/dimensiones['Largo'].max()\n",
"dimensiones['Ancho'] = dimensiones['Ancho']/dimensiones['Ancho'].max()\n",
"dimensiones['Alto'] = dimensiones['Alto']/dimensiones['Alto'].max()\n",
"dimensiones"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
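"<br>\n",
"**Min-max** method: subtract the minimum value of the feature from each value $x_{old}$ and divide by the range of that feature,\n",
"$x_{new} = \\frac{x_{old} - x_{min}}{x_{max} - x_{min}}$. The results lie in $[0, 1]$. Since the previous cell overwrote `dimensiones`, this formula is applied to the already scaled values; the result is the same, because dividing a column by a positive constant leaves $\\frac{x - x_{min}}{x_{max} - x_{min}}$ unchanged."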
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Largo</th>\n",
" <th>Ancho</th>\n",
" <th>Alto</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.389442</td>\n",
" <td>0.697819</td>\n",
" <td>0.728395</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.402390</td>\n",
" <td>0.445483</td>\n",
" <td>0.753086</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.206175</td>\n",
" <td>0.595016</td>\n",
" <td>0.506173</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.389442</td>\n",
" <td>0.697819</td>\n",
" <td>0.728395</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.155378</td>\n",
" <td>0.000000</td>\n",
" <td>0.543210</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>0.701195</td>\n",
" <td>0.750779</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>0.456175</td>\n",
" <td>0.909657</td>\n",
" <td>0.580247</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>0.332669</td>\n",
" <td>0.548287</td>\n",
" <td>0.358025</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1.000000</td>\n",
" <td>0.601246</td>\n",
" <td>0.728395</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>0.000000</td>\n",
" <td>0.710280</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>0.484064</td>\n",
" <td>1.000000</td>\n",
" <td>0.777778</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Largo Ancho Alto\n",
"0 0.389442 0.697819 0.728395\n",
"1 0.402390 0.445483 0.753086\n",
"2 0.206175 0.595016 0.506173\n",
"3 0.389442 0.697819 0.728395\n",
"4 0.155378 0.000000 0.543210\n",
"5 0.701195 0.750779 0.000000\n",
"6 0.456175 0.909657 0.580247\n",
"7 0.332669 0.548287 0.358025\n",
"8 1.000000 0.601246 0.728395\n",
"9 0.000000 0.710280 1.000000\n",
"10 0.484064 1.000000 0.777778"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# (x - min) / (max - min) for each column\n",
"dimensiones['Largo'] = (dimensiones['Largo']-dimensiones['Largo'].min())/(dimensiones['Largo'].max() - dimensiones['Largo'].min())\n",
"dimensiones['Ancho'] = (dimensiones['Ancho']-dimensiones['Ancho'].min())/(dimensiones['Ancho'].max() - dimensiones['Ancho'].min())\n",
"dimensiones['Alto'] = (dimensiones['Alto']-dimensiones['Alto'].min())/(dimensiones['Alto'].max() - dimensiones['Alto'].min())\n",
"dimensiones"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
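"<br>\n",
"**Standard score (z-score)** method: subtract the mean $\\mu$ of the feature from each value and divide by its standard deviation $\\sigma$,\n",
"$x_{new} = \\frac{x_{old} - \\mu}{\\sigma}$. In pandas, `std()` returns the sample standard deviation (`ddof=1`) by default, so each resulting column has mean 0 and sample standard deviation 1. Like min-max, the result does not depend on the earlier rescaling of `dimensiones`."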
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Largo</th>\n",
" <th>Ancho</th>\n",
" <th>Alto</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>-0.078861</td>\n",
" <td>0.249968</td>\n",
" <td>0.451435</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>-0.030668</td>\n",
" <td>-0.714195</td>\n",
" <td>0.545129</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>-0.760972</td>\n",
" <td>-0.142839</td>\n",
" <td>-0.391812</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>-0.078861</td>\n",
" <td>0.249968</td>\n",
" <td>0.451435</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>-0.950036</td>\n",
" <td>-2.416358</td>\n",
" <td>-0.251270</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1.081470</td>\n",
" <td>0.452323</td>\n",
" <td>-2.312540</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>0.169517</td>\n",
" <td>1.059389</td>\n",
" <td>-0.110729</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>-0.290167</td>\n",
" <td>-0.321388</td>\n",
" <td>-0.953976</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>2.193608</td>\n",
" <td>-0.119032</td>\n",
" <td>0.451435</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>-1.528347</td>\n",
" <td>0.297581</td>\n",
" <td>1.482070</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>0.273316</td>\n",
" <td>1.404583</td>\n",
" <td>0.638823</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Largo Ancho Alto\n",
"0 -0.078861 0.249968 0.451435\n",
"1 -0.030668 -0.714195 0.545129\n",
"2 -0.760972 -0.142839 -0.391812\n",
"3 -0.078861 0.249968 0.451435\n",
"4 -0.950036 -2.416358 -0.251270\n",
"5 1.081470 0.452323 -2.312540\n",
"6 0.169517 1.059389 -0.110729\n",
"7 -0.290167 -0.321388 -0.953976\n",
"8 2.193608 -0.119032 0.451435\n",
"9 -1.528347 0.297581 1.482070\n",
"10 0.273316 1.404583 0.638823"
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# (x - mean) / standard deviation for each column\n",
"dimensiones['Largo'] = (dimensiones['Largo']-dimensiones['Largo'].mean())/(dimensiones['Largo'].std())\n",
"dimensiones['Ancho'] = (dimensiones['Ancho']-dimensiones['Ancho'].mean())/(dimensiones['Ancho'].std())\n",
"dimensiones['Alto'] = (dimensiones['Alto']-dimensiones['Alto'].mean())/(dimensiones['Alto'].std())\n",
"dimensiones"
]
},
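{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"The three cells above normalize one column at a time. As a sketch, each method can also be written as a single expression over all columns. This works because arithmetic between a `DataFrame` and the `Series` returned by `max()`, `min()`, `mean()` or `std()` aligns on column labels. The sketch assumes the same box data, rebuilt here as a hypothetical `cajas` DataFrame so the cell does not depend on the overwritten `dimensiones`. The variable names `escala_simple`, `min_max` and `z_score` are also illustrative. The results should match the tables above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical copy of the box dimensions used above\n",
"cajas = pd.DataFrame({\n",
"    'Largo': [168.7, 170.0, 150.3, 168.7, 145.2, 200.0, 175.4, 163.0, 230.0, 129.6, 178.2],\n",
"    'Ancho': [68.3, 60.2, 65.0, 68.3, 45.9, 70.0, 75.1, 63.5, 65.2, 68.7, 78.0],\n",
"    'Alto': [46.8, 47.0, 45.0, 46.8, 45.3, 40.9, 45.6, 43.8, 46.8, 49.0, 47.2],\n",
"})\n",
"\n",
"# Each statistic is a Series indexed by column name, so the operations\n",
"# are applied column by column\n",
"escala_simple = cajas / cajas.max()\n",
"min_max = (cajas - cajas.min()) / (cajas.max() - cajas.min())\n",
"z_score = (cajas - cajas.mean()) / cajas.std()\n",
"\n",
"# Sanity check: every z-score column should have mean ~0 and standard deviation 1\n",
"z_score.agg(['mean', 'std']).round(6)"
]
},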
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"## Descriptive statistics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"### Statistical summary table"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>symboling</th>\n",
" <th>normalized-losses</th>\n",
" <th>make</th>\n",
" <th>fuel-type</th>\n",
" <th>aspiration</th>\n",
" <th>num-of-doors</th>\n",
" <th>body-style</th>\n",
" <th>drive-wheels</th>\n",
" <th>engine-location</th>\n",
" <th>wheel-base</th>\n",
" <th>...</th>\n",
" <th>engine-size</th>\n",
" <th>fuel-system</th>\n",
" <th>bore</th>\n",
" <th>stroke</th>\n",
" <th>compression-ratio</th>\n",
" <th>horsepower</th>\n",
" <th>peak-rpm</th>\n",
" <th>city-mpg</th>\n",
" <th>highway-mpg</th>\n",
" <th>price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>3</td>\n",
" <td>122</td>\n",
" <td>alfa-romero</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>convertible</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>88.6</td>\n",
" <td>...</td>\n",
" <td>130</td>\n",
" <td>mpfi</td>\n",
" <td>3.47</td>\n",
" <td>2.68</td>\n",
" <td>9.0</td>\n",
" <td>111.0</td>\n",
" <td>5000.0</td>\n",
" <td>21</td>\n",
" <td>27</td>\n",
" <td>13495.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>3</td>\n",
" <td>122</td>\n",
" <td>alfa-romero</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>convertible</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>88.6</td>\n",
" <td>...</td>\n",
" <td>130</td>\n",
" <td>mpfi</td>\n",
" <td>3.47</td>\n",
" <td>2.68</td>\n",
" <td>9.0</td>\n",
" <td>111.0</td>\n",
" <td>5000.0</td>\n",
" <td>21</td>\n",
" <td>27</td>\n",
" <td>16500.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>122</td>\n",
" <td>alfa-romero</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>hatchback</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>94.5</td>\n",
" <td>...</td>\n",
" <td>152</td>\n",
" <td>mpfi</td>\n",
" <td>2.68</td>\n",
" <td>3.47</td>\n",
" <td>9.0</td>\n",
" <td>154.0</td>\n",
" <td>5000.0</td>\n",
" <td>19</td>\n",
" <td>26</td>\n",
" <td>16500.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2</td>\n",
" <td>164</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>99.8</td>\n",
" <td>...</td>\n",
" <td>109</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>10.0</td>\n",
" <td>102.0</td>\n",
" <td>5500.0</td>\n",
" <td>24</td>\n",
" <td>30</td>\n",
" <td>13950.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2</td>\n",
" <td>164</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>4wd</td>\n",
" <td>front</td>\n",
" <td>99.4</td>\n",
" <td>...</td>\n",
" <td>136</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>8.0</td>\n",
" <td>115.0</td>\n",
" <td>5500.0</td>\n",
" <td>18</td>\n",
" <td>22</td>\n",
" <td>17450.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 26 columns</p>\n",
"</div>"
],
"text/plain": [
" symboling normalized-losses make fuel-type aspiration \\\n",
"0 3 122 alfa-romero gas std \n",
"1 3 122 alfa-romero gas std \n",
"2 1 122 alfa-romero gas std \n",
"3 2 164 audi gas std \n",
"4 2 164 audi gas std \n",
"\n",
" num-of-doors body-style drive-wheels engine-location wheel-base ... \\\n",
"0 two convertible rwd front 88.6 ... \n",
"1 two convertible rwd front 88.6 ... \n",
"2 two hatchback rwd front 94.5 ... \n",
"3 four sedan fwd front 99.8 ... \n",
"4 four sedan 4wd front 99.4 ... \n",
"\n",
" engine-size fuel-system bore stroke compression-ratio horsepower \\\n",
"0 130 mpfi 3.47 2.68 9.0 111.0 \n",
"1 130 mpfi 3.47 2.68 9.0 111.0 \n",
"2 152 mpfi 2.68 3.47 9.0 154.0 \n",
"3 109 mpfi 3.19 3.40 10.0 102.0 \n",
"4 136 mpfi 3.19 3.40 8.0 115.0 \n",
"\n",
" peak-rpm city-mpg highway-mpg price \n",
"0 5000.0 21 27 13495.0 \n",
"1 5000.0 21 27 16500.0 \n",
"2 5000.0 19 26 16500.0 \n",
"3 5500.0 24 30 13950.0 \n",
"4 5500.0 18 22 17450.0 \n",
"\n",
"[5 rows x 26 columns]"
]
},
"execution_count": 93,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"df = pd.read_csv('Automobile_data.csv')\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>symboling</th>\n",
" <th>normalized-losses</th>\n",
" <th>wheel-base</th>\n",
" <th>length</th>\n",
" <th>width</th>\n",
" <th>height</th>\n",
" <th>curb-weight</th>\n",
" <th>engine-size</th>\n",
" <th>bore</th>\n",
" <th>stroke</th>\n",
" <th>compression-ratio</th>\n",
" <th>horsepower</th>\n",
" <th>peak-rpm</th>\n",
" <th>city-mpg</th>\n",
" <th>highway-mpg</th>\n",
" <th>price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>0.834146</td>\n",
" <td>122.000000</td>\n",
" <td>98.756585</td>\n",
" <td>174.049268</td>\n",
" <td>65.907805</td>\n",
" <td>53.724878</td>\n",
" <td>2555.565854</td>\n",
" <td>126.907317</td>\n",
" <td>3.329751</td>\n",
" <td>3.255423</td>\n",
" <td>10.142537</td>\n",
" <td>104.256158</td>\n",
" <td>5125.369458</td>\n",
" <td>25.219512</td>\n",
" <td>30.751220</td>\n",
" <td>13207.129353</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>1.245307</td>\n",
" <td>31.681008</td>\n",
" <td>6.021776</td>\n",
" <td>12.337289</td>\n",
" <td>2.145204</td>\n",
" <td>2.443522</td>\n",
" <td>520.680204</td>\n",
" <td>41.642693</td>\n",
" <td>0.270844</td>\n",
" <td>0.313597</td>\n",
" <td>3.972040</td>\n",
" <td>39.519211</td>\n",
" <td>476.979093</td>\n",
" <td>6.542142</td>\n",
" <td>6.886443</td>\n",
" <td>7868.768212</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>-2.000000</td>\n",
" <td>65.000000</td>\n",
" <td>86.600000</td>\n",
" <td>141.100000</td>\n",
" <td>60.300000</td>\n",
" <td>47.800000</td>\n",
" <td>1488.000000</td>\n",
" <td>61.000000</td>\n",
" <td>2.540000</td>\n",
" <td>2.070000</td>\n",
" <td>7.000000</td>\n",
" <td>48.000000</td>\n",
" <td>4150.000000</td>\n",
" <td>13.000000</td>\n",
" <td>16.000000</td>\n",
" <td>5118.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>0.000000</td>\n",
" <td>101.000000</td>\n",
" <td>94.500000</td>\n",
" <td>166.300000</td>\n",
" <td>64.100000</td>\n",
" <td>52.000000</td>\n",
" <td>2145.000000</td>\n",
" <td>97.000000</td>\n",
" <td>3.150000</td>\n",
" <td>3.110000</td>\n",
" <td>8.600000</td>\n",
" <td>70.000000</td>\n",
" <td>4800.000000</td>\n",
" <td>19.000000</td>\n",
" <td>25.000000</td>\n",
" <td>7788.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>1.000000</td>\n",
" <td>122.000000</td>\n",
" <td>97.000000</td>\n",
" <td>173.200000</td>\n",
" <td>65.500000</td>\n",
" <td>54.100000</td>\n",
" <td>2414.000000</td>\n",
" <td>120.000000</td>\n",
" <td>3.310000</td>\n",
" <td>3.290000</td>\n",
" <td>9.000000</td>\n",
" <td>95.000000</td>\n",
" <td>5200.000000</td>\n",
" <td>24.000000</td>\n",
" <td>30.000000</td>\n",
" <td>10595.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>2.000000</td>\n",
" <td>137.000000</td>\n",
" <td>102.400000</td>\n",
" <td>183.100000</td>\n",
" <td>66.900000</td>\n",
" <td>55.500000</td>\n",
" <td>2935.000000</td>\n",
" <td>141.000000</td>\n",
" <td>3.580000</td>\n",
" <td>3.410000</td>\n",
" <td>9.400000</td>\n",
" <td>116.000000</td>\n",
" <td>5500.000000</td>\n",
" <td>30.000000</td>\n",
" <td>34.000000</td>\n",
" <td>16500.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>3.000000</td>\n",
" <td>256.000000</td>\n",
" <td>120.900000</td>\n",
" <td>208.100000</td>\n",
" <td>72.300000</td>\n",
" <td>59.800000</td>\n",
" <td>4066.000000</td>\n",
" <td>326.000000</td>\n",
" <td>3.940000</td>\n",
" <td>4.170000</td>\n",
" <td>23.000000</td>\n",
" <td>288.000000</td>\n",
" <td>6600.000000</td>\n",
" <td>49.000000</td>\n",
" <td>54.000000</td>\n",
" <td>45400.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" symboling normalized-losses wheel-base length width \\\n",
"count 205.000000 205.000000 205.000000 205.000000 205.000000 \n",
"mean 0.834146 122.000000 98.756585 174.049268 65.907805 \n",
"std 1.245307 31.681008 6.021776 12.337289 2.145204 \n",
"min -2.000000 65.000000 86.600000 141.100000 60.300000 \n",
"25% 0.000000 101.000000 94.500000 166.300000 64.100000 \n",
"50% 1.000000 122.000000 97.000000 173.200000 65.500000 \n",
"75% 2.000000 137.000000 102.400000 183.100000 66.900000 \n",
"max 3.000000 256.000000 120.900000 208.100000 72.300000 \n",
"\n",
" height curb-weight engine-size bore stroke \\\n",
"count 205.000000 205.000000 205.000000 205.000000 205.000000 \n",
"mean 53.724878 2555.565854 126.907317 3.329751 3.255423 \n",
"std 2.443522 520.680204 41.642693 0.270844 0.313597 \n",
"min 47.800000 1488.000000 61.000000 2.540000 2.070000 \n",
"25% 52.000000 2145.000000 97.000000 3.150000 3.110000 \n",
"50% 54.100000 2414.000000 120.000000 3.310000 3.290000 \n",
"75% 55.500000 2935.000000 141.000000 3.580000 3.410000 \n",
"max 59.800000 4066.000000 326.000000 3.940000 4.170000 \n",
"\n",
" compression-ratio horsepower peak-rpm city-mpg highway-mpg \\\n",
"count 205.000000 205.000000 205.000000 205.000000 205.000000 \n",
"mean 10.142537 104.256158 5125.369458 25.219512 30.751220 \n",
"std 3.972040 39.519211 476.979093 6.542142 6.886443 \n",
"min 7.000000 48.000000 4150.000000 13.000000 16.000000 \n",
"25% 8.600000 70.000000 4800.000000 19.000000 25.000000 \n",
"50% 9.000000 95.000000 5200.000000 24.000000 30.000000 \n",
"75% 9.400000 116.000000 5500.000000 30.000000 34.000000 \n",
"max 23.000000 288.000000 6600.000000 49.000000 54.000000 \n",
"\n",
" price \n",
"count 205.000000 \n",
"mean 13207.129353 \n",
"std 7868.768212 \n",
"min 5118.000000 \n",
"25% 7788.000000 \n",
"50% 10595.000000 \n",
"75% 16500.000000 \n",
"max 45400.000000 "
]
},
"execution_count": 94,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
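{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"By default, `describe()` summarizes only the numeric columns of a mixed-type `DataFrame`. That is why text columns such as `make` or `fuel-type` do not appear in the table above. The following minimal sketch assumes `df` is the DataFrame loaded from `Automobile_data.csv`. It selects the non-numeric columns with `select_dtypes(exclude='number')` and passes them to `describe()`. The result reports the count, the number of distinct values (`unique`), the most frequent value (`top`) and its frequency (`freq`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Summary of the non-numeric (text) columns of the automobile data\n",
"df.select_dtypes(exclude='number').describe()"
]
},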
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"### Box plots"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"Let's generate random data and draw a **box plot**."
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>-0.736260</td>\n",
" <td>0.440260</td>\n",
" <td>0.731900</td>\n",
" <td>0.254316</td>\n",
" <td>-1.807715</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1.053235</td>\n",
" <td>-2.661686</td>\n",
" <td>-1.076339</td>\n",
" <td>0.484691</td>\n",
" <td>0.170791</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.129711</td>\n",
" <td>-0.386408</td>\n",
" <td>1.199626</td>\n",
" <td>-1.070269</td>\n",
" <td>-1.157811</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>-0.271514</td>\n",
" <td>-0.071525</td>\n",
" <td>-0.862646</td>\n",
" <td>-0.789977</td>\n",
" <td>0.300384</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>-0.166081</td>\n",
" <td>0.298657</td>\n",
" <td>1.737711</td>\n",
" <td>0.311688</td>\n",
" <td>1.138495</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 1 2 3 4\n",
"0 -0.736260 0.440260 0.731900 0.254316 -1.807715\n",
"1 1.053235 -2.661686 -1.076339 0.484691 0.170791\n",
"2 0.129711 -0.386408 1.199626 -1.070269 -1.157811\n",
"3 -0.271514 -0.071525 -0.862646 -0.789977 0.300384\n",
"4 -0.166081 0.298657 1.737711 0.311688 1.138495"
]
},
"execution_count": 95,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXYAAAD8CAYAAABjAo9vAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAADBZJREFUeJzt3V+IXOUZx/Hfr0mkEkUvImtJxBUqYrBVcRDFiy5WSjSl0kJBoVKpsBQqKAg1wYviXaQgvajQhioKFaVgRcnaxEh3kIK1bmwU09USJGKokIpUnVSqqU8vduuJdLsz2XN23zPP+X5gYGf3zHuePEx+8+57/qwjQgCAPL5QugAAQLMIdgBIhmAHgGQIdgBIhmAHgGQIdgBIhmAHgGQIdgBIhmAHgGTWl9jppk2bYnJyssSuP3P8+HFt3LixaA1tQS8q9KJCLypt6cWBAwfejYhzhm1XJNgnJyc1NzdXYtef6ff7mpqaKlpDW9CLCr2o0ItKW3ph+61RtmMpBgCSIdgBIBmCHQCSIdgBIBmCHQCSIdgBIBmCHQCSIdgBIJkiFygBQJvYbmSctvwN6dozdtvn2Z61PW/7kO07migMANZKRCz7OP/uPUO3aUuoS83M2E9IuisiXrZ9pqQDtvdHxF8aGBsAcIpqz9gj4p2IeHnx6w8lzUvaXHdcAMDKNHrw1PakpMslvdjkuACA0TV28NT2GZKekHRnRHywxM+nJU1L0sTEhPr9flO7XpHBYFC8hragFxV6UaEXnzdOvXATC/62N0jaI2lfRNw/bPterxfctrc96EWFXlToRWVyx4yO7NpeugzZPhARvWHbNXFWjCU9KGl+lFAHAKyuJtbYr5F0i6RrbR9cfNzQwLgAgBWovcYeEX+Q1MzZ/QCA2rilAAAkQ7ADQDLcKwboqGz3R0GFGTvQUdnuj4IKwQ4AyRDsAJAMa+zoFNaV0QXM2NEprCujCwh2AEiGYAeAZAh2AEiGYAeAZAh2AEiGYAeAZNKex875ygC6Ku2MnfOVAXRV2mAHgK4i2AEgGYIdAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgGYIdAJJpJNhtP2T7mO3XmhgPALByTc3YH5a0raGxAAA1NBLsEfG8pPeaGAsAUM/6tdqR7WlJ05I0MTGhfr+/Vrv+v9pQQxsMBgN6cRJ6UaEXlXHqxZoFe0TslrRbknq9XkxNTa3Vrpe2d0bFa2iJfr9PL/6L90WFXlTGrBecFQMAyRDsAJBMU6c7PibpBUkX2T5q+7YmxgUAnLpG1tgj4uYmxgEA1MdSDAAks2ZnxQBAKZfe+6ze/+iTWmNM7pip9fqzTt+gV37yjVpjjIpgB5De+x99oiO7tq/49U2cElz3g+FUEOwdYLuRcSKikXEArK6xDfau/WpVx7BAntwxU2s2A6BdxjbYu/arFUbDBz4wxsEOLIUPfIDTHQEgHYIdAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgGe4VAyTFDdG6i2AHkuKGaN3FUgwAJMOMHUB6Z168Q195ZEe9QR6pW4Mkrc0ftCHYE2AtFVjeh/O7OrUsRbAnwFoqgJOxxg4AyTBjB5Lq2royKgQ7kFTX1pVRYSkGAJIh2AEgGYIdAJIh2AEgGYIdAJJpJNhtb7P9hu3DtmueXwUAqKN2sNteJ+kBSddL2irpZttb644LAFiZJmbsV0o6HBFvRsTHkh6XdGMD4wIAVqCJYN8s6e2Tnh9d/B4AoIAmrjz1Et+L/9nInpY0LUkTExPq9/u1d1xnjMFgULyGJtGLCr2o0ItKp3oREbUekq6WtO+k5zsl7VzuNVdccUXUdf7de2q9fnZ2tngNTaEXFXpRoReVLL2QNBcj5HITSzEvSbrQ9gW2T5N0k6SnGx
gXALACtZdiIuKE7dsl7ZO0TtJDEXGodmVDcOc6AFhaI3d3jIhnJD3TxFij4s51ALA0rjwFgGQIdgBIhmAHgGQIdgBIhj+NlwBnCAE4GcGeAGcIVfiQAwh2JMOHHMAaOwCkQ7ADQDIEOwAkQ7ADQDJjffC09kGqvfVef9bpG+rtHwBWwdgGe50zH6SFD4W6YwBAG7EUAwDJEOwAkAzBDgDJEOwAkAzBDgDJEOwAkAzBDgDJjO157ABwKrp0QSPBDiC9rl3QyFIMACRDsANAMgQ7ACSTdo3d9vBt7hs+TkQ0UA0ArJ20M/aIWPYxOzs7dBtCHcA4ShvsANBVBDsAJEOwA0AyBDsAJEOwA0AytYLd9ndtH7L9qe1eU0UBAFau7nnsr0n6jqRfNlALaujSDY4ALK9WsEfEvDTaxUBYPV27wRGA5aW98hQAv8l11dBgt/2cpHOX+NE9EfHUqDuyPS1pWpImJibU7/dHfemqGAwGxWtok0y9qPNvaep90YZ+PrxtY63X37r3eO0xpHb0ognj9O8YGuwRcV0TO4qI3ZJ2S1Kv14upqakmhl2xfr+v0jW0xt6ZPL3YO6Nb9x6vMYAl1Xn9wiw1RT8zvS/qGrNesBSDVDjeANQ/3fHbto9KulrSjO19zZQFAFipumfFPCnpyYZqAQA0gCtPASAZgh0AkiHYASAZgh0AkiHYASAZgh0AkiHYASAZgh0AkiHYASAZgh0AkiHYASAZgh0AkiHYASAZgh0AkiHYASAZgh0AkiHYASAZgh0AkiHYASAZgh0AkiHYASAZgh0AkiHYASAZgh0AkiHYASAZgh0AkiHYASAZgh0AkiHYASCZ9aULAFCG7eHb3Dd8nIhooJqysvWCGTvQURGx7GN2dnboNm0Jsrqy9YJgB4BkWIpBp2T7lRtYSq0Zu+2f2n7d9qu2n7R9dlOFAash26/cwFLqLsXsl3RJRHxV0l8l7axfEgCgjlrBHhHPRsSJxad/lLSlfkkAgDqaPHj6A0m/a3A8AMAKDD14avs5Secu8aN7IuKpxW3ukXRC0qPLjDMtaVqSJiYm1O/3V1JvYwaDQfEa2oReLOB9UaEXlXHrheseCLL9fUk/lPT1iPjnKK/p9XoxNzdXa7919ft9TU1NFa2hLSZ3zOjIru2ly2gF3hcVelFpSy9sH4iI3rDtap3uaHubpLslfW3UUAcArK66a+w/l3SmpP22D9r+RQM1AQBqqDVjj4gvN1UIAKAZ3FIAAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgGYIdAJIh2AEgmfWlC8Dqsz18m/uGjxMRDVQDYLUxY++AiFj2MTs7O3QbQh0YHwQ7ACRDsANAMgQ7ACRDsANAMgQ7ACRDsANAMgQ7ACRDsANAMi5x4Yntv0t6a813/HmbJL1buIa2oBcVelGhF5W29OL8iDhn2EZFgr0NbM9FRK90HW1ALyr0okIvKuPWC5ZiACAZgh0AkulysO8uXUCL0IsKvajQi8pY9aKza+wAkFWXZ+wAkFIng932Nttv2D5se0fpekqx/ZDtY7ZfK11LabbPsz1re972Idt3lK6pFNtftP0n268s9uLe0jWVZnud7T/b3lO6llF0Lthtr5P0gKTrJW2VdLPtrWWrKuZhSdtKF9ESJyTdFREXS7pK0o86/L74l6RrI+JSSZdJ2mb7qsI1lXaHpPnSRYyqc8Eu6UpJhyPizYj4WNLjkm4sXFMREfG8pPdK19EGEfFORLy8+PWHWvhPvLlsVWXEgsHi0w2Lj84ejLO9RdJ2Sb8qXcuouhjsmyW9fdLzo+rof2AszfakpMslvVi2knIWlx4OSjomaX9EdLYXkn4m6ceSPi1dyKi6GOxL/WXnzs5G8Hm2z5D0hKQ7I+KD0vWUEhH/jojLJG2RdKXtS0rXVI
Ltb0o6FhEHStdyKroY7EclnXfS8y2S/laoFrSI7Q1aCPVHI+K3petpg4j4h6S+unss5hpJ37J9RAvLttfa/nXZkob
"text/plain": [
"<matplotlib.figure.Figure at 0x1a1b233828>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"np.random.seed(1500)  # fix the seed so the random numbers are reproducible\n",
"dfb = pd.DataFrame(np.random.randn(10, 5))  # 10x5 DataFrame of standard normal values\n",
"dfb.boxplot(return_type='axes')  # one box per column\n",
"dfb.head()"
]
},
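{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"Each box spans from the first quartile (Q1) to the third quartile (Q3), with a line at the median. Matplotlib's default is `whis=1.5`. With that setting, the whiskers extend to the most extreme data points lying within 1.5 × IQR of the box, where IQR = Q3 − Q1. Points beyond those limits are drawn individually as outliers. The sketch below computes these quantities with `quantile`. It assumes `dfb` from the previous cell, and the column names of the result (`box_stats`) are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"q1 = dfb.quantile(0.25)\n",
"q3 = dfb.quantile(0.75)\n",
"iqr = q3 - q1  # interquartile range of each column\n",
"\n",
"box_stats = pd.DataFrame({\n",
"    'Q1': q1,\n",
"    'median': dfb.median(),\n",
"    'Q3': q3,\n",
"    'lower_limit': q1 - 1.5 * iqr,  # lower whisker: smallest value >= this limit\n",
"    'upper_limit': q3 + 1.5 * iqr,  # upper whisker: largest value <= this limit\n",
"})\n",
"box_stats"
]
},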
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"Let's use the data in `Automobile_data.csv` to draw a box plot of the 3 variables that describe the dimensions of the cars."
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x1a1b41bcc0>"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAD/CAYAAAD4xAEfAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAFoZJREFUeJzt3X20VXWdx/H3R4QgQbTUW6aAjj0QGT7cNTM55FzDmNKxUptJRiGNwoeVpcvVQqUHrYVZVq6eSHEYkUl0psZm8iFAklOSZgFKaZSjjoqpaSrgUTKB7/yx963D6T6ce86+nHvO7/Na66x79/799t6/fX/3nM/ZD+d3FBGYmVl6dml2A8zMrDkcAGZmiXIAmJklygFgZpYoB4CZWaIcAGZmiXIAmJklygFgZpYoB4CZWaJ2bXYD+rLXXnvFhAkTmt2MQfPCCy+w2267NbsZVif3X+tq975bs2bN7yNi7/7qDekAmDBhAqtXr252MwZNqVSiq6ur2c2wOrn/Wle7952kR2qp51NAZmaJcgCYmSXKAWBmligHgJlZohwAZmaJcgCYmSXKAWBmligHgJlZoob0B8FanaSG1+HvbDazweIjgEEUEX0+xs+5qd86ZmaDxQFgZpYoB4CZWaIcAGZmiXIAmJklygFgZpaofgNA0hck3Sdps6THJV0l6VVVdWZKelDSi5LuknR4VXmnpJ/l5Q9KOqXoHTEzs4Gp5QhgG3AK8GpgMrAfcHV3oaQpwLeAM4E9gf8CbpG0e14+FvhBPn9P4AzgCklvK243zMxsoPoNgIi4MCLujoiXI+Jp4BtAV0WVjwA3RMTyiHgJuAx4CTg+Lz8B2AJ8MSJeiohbge8BswvcDzMzG6B6Pgk8FfhFxfRkYFH3RESEpLvz+d3la2PHTzWtBWb0tHJJs8nDoaOjg1KpVEcTW0e77187K5fL7r8W5b7LDCgAJJ1I9o7/7ytmjwE2VVXdCOxeY/kOImIBsACgs7Mz2vl7O1l6c1t/L2m7a/fvlW1n7rtMzXcBSfon4CrgPRGxtqLoeWBsVfU9gM01lpuZWRPUFACSTgOuBI6LiJVVxeuAwyrqCjgkn99dfmjVModWlJuZWRPUchvox4AvAf8QET/pocpVwAmSpkoaAZwHjCS70Ev+85WSPiFphKSpZBeGFxSyB2ZmVpdargF8FdgKrKwc3jgiRuc/V0k6iywIXgv8EjgmIjbn5RslHQN8E/gs8ARwRkTcWeSOmJnZwPQbABHR76D2EbEYWNxH+c+Bvx5Y08zMbDB5KAgzs0Q5AMzMEuUAMDNLlAPAzCxRDgAzs0Q5AMzMEuUAMDNLlAPAzCxRDgAzs0Q5AMzMEuUAMDNLlAPAzCxRDgAzs0Q5AMzMElXPl8IbMPni5Wza8nLD65lw/s0NLT921HDWfWZaw+0ws/Q4AOq0acvLPHzpsQ2to4gvpm40QMwsXT4FZGaWKAeAmVmiHABmZolyAJiZJaqmAJB0kqTbJW2WtLWq7EJJ5apHSPpaRZ2HJf2hqs7BRe+MmZnVrtYjgOeA+cA51QURcUlEjO5+AIcCAXy7quqHK+tFxC8barmZmTWkpttAI2IZgKSuGqqfDtwTET9roF1mZjbICr0GIOkVwKnAFT0Uf0XSs5LukXR6kds1M7OBK/qDYO8HRgBLquZ/EFgDvAR0AddLIiKurF6BpNnAbICOjg5KpVLBTSxOo20rl8uF7N9Q/hu1s6L6z3Y+912m6AA4Hbg2IsqVMyPiRxWTt0r6CnAK8BcBEBELgAUAnZ2d0egnZQfN0psb/hRvEZ8ELqIdVp9C+s+awn2XKSwAJL0ZeDtwdg3VtwMqatvNMGbi+Rx8zfmNr+iaRtsB0NiQFGaWppoCQNIwYDjZ6R0kjcyLXoqIyH8/HfhpRKyrWnY8cCBwJ/AyMAU4F/hcw61voufXX+qxgMyspdV6BDADuLpiekv+8wDgYU
mj8jrn9rDsbsBXgIPIbg99FPhsRHyjrhabmVkhar0NdBGwqI/yLcCrein7FdlnA8zMbAjxUBBmZolyAJiZJcoBYGaWKAeAmVmiHABmZonydwI3oJB78Jc2/qXwZmb1cADUqdEPgUEWIEWsx8ysHj4FZGaWKAeAmVmiHABmZolyAJiZJcoBYGaWKAeAmVmiHABmZolyAJiZJcoBYGaWKAeAmVmiHABmZolyAJiZJcoBYGaWKAeAmVmiagoASSdJul3SZklbq8q6JIWkcsXjjqo6B0laIekFSY9JOq/InTAzs4Gr9fsAngPmA6OABT2Ub4uI0T0tKGkYcCOwAngP8CZgqaTHIuI/Bt5kMzMrQk1HABGxLCKuAx6qYxtHAuOBCyLixYhYC1wJnFHHuszMrCBFXQMYJmmDpCcl3SxpckXZZOD+iChXzFubzzczsyYp4ishfw0cAtwHjAbmALdJOjgiHgfGAJuqltkI7N7TyiTNBmYDdHR0UCqVCmji0NXu+9fOyuWy+69Fue8yDQdARDwJPJlPbgQukPR+4N3AQuB5YGzVYnsAm3tZ3wLy6wydnZ3R1dXVaBOHrqU309b71+ZKpZL7r0W57zKDdRvodkD57+uAN0jaraL80Hy+mZk1Sa23gQ6TNBIYkU+PzB+S9I78Ns9dJI2WdBHQASzLF/8x8AhwiaRRkg4BTie7EGxmZk1S6xHADGAL2Yv6sPz3LWR390wGfkh2quch4G+Bd0bEBoCI2AYcB7wFeAa4BbgsIq4vbjfMzGygaroGEBGLgEW9FF+eP/pa/gFg6kAaZmZmg8tDQZiZJaqI20CtF5L6r/OFvssjoqDWmJntyEcAgygienwsWbKESZMmscsuuzBp0iSWLFnSa10zs8HiI4Cd7LrrrmPu3LksXLiQbdu2MWzYMGbNmgXA9OnTm9w6M0uJjwB2snnz5rFw4UKOOuoodt11V4466igWLlzIvHnzmt00M0uMA2AnW79+PVOmTNlh3pQpU1i/fn2TWmRmqXIA7GQTJ05k1apVO8xbtWoVEydObFKLzCxVDoCdbO7cucyaNYuVK1eydetWVq5cyaxZs5g7d26zm2ZmifFF4J2s+0Lv2Wefzfr165k4cSLz5s3zBWAz2+kcAE0wffp0pk+f7hEJzaypfArIzCxRDgAzs0Q5AMzMEuUAMDNLlAPAzCxRDgAzs0Q5AMzMEuUAMDNLlAPAzCxRDgAzs0Q5AMzMElVTAEg6SdLtkjZL2lpVdoyk2yT9XtJzeb23V9UJSS9KKlc8xha5I2ZmNjC1HgE8B8wHzumhbE/g68BBwN7AEuAHkvavqjctIkZXPDbV22gzM2tcTaOBRsQyAEldPZRdWzXrW5I+C3QCGxptoJmZDY7CrwFIeivwauDeqqLv5KeJ7pJ0QtHbNTOzgSn0+wAk7QN8F/hiRPxvRdHRwE/y398LXCvp+IhY2sM6ZgOzATo6OiiVSkU2cUgpl8ttvX/tzv3Xutx3GUVE7ZWzU0ArIuIvgkPSvsCtwErg7OhjxZKuAkZGxIy+ttfZ2RmrV6+uuX2txl8I09rcf62r3ftO0pqI6OyvXiGngCRNAG4HfhARH+3rxT+3HVAR2zYzs/rUdApI0jBgODAinx6ZF70EvBFYASyKiE/2sOxbgFcC9wABHAvMAE5qtPFmZla/Wo8AZgBbgGXAsPz3LcB4YA7wOuCcqvv8T86X3Ru4muxW0qeATwIfiojvF7cbZmY2ULXeBroIWNRL8Wn5o7dlVwKTBtowMzMbXB4KwswsUQ4AM7NEOQDMzBLlADAzS5QDwMwsUQ4AM7NEOQDMzBLlADAzS5QDwMwsUQ4AM7NEOQDMzBLlADAzS5QDwMwsUQ4AM7NEOQDMzBLlADAzS5QDwMwsUQ4AM7NEOQDMzBLlADAzS5QDwMwsUTUFgKSTJN0uabOkrT2Uv0vSfZK2SLpX0rSq8oMkrZD0gqTHJJ1X1A6YmVl9aj0CeA6YD5xTXSDpQO
AG4PPA2Pzn9yRNyMuHATcC64G9gfcAcyR9oMG2m5lZA2oKgIhYFhHXAQ/1UPxBYE1EfDsi/hgR1wJr8/kARwLjgQs
"text/plain": [
"<matplotlib.figure.Figure at 0x1a1b2e2a20>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"dims = ['length', 'width', 'height']  # car length, width and height\n",
"dfbp = df[dims]  # DataFrame with the three car dimensions\n",
"dfbp.boxplot(fontsize=13, return_type='axes')  # box plot of the three variables\n",
"# Exercise: normalize these data and draw the box plot again"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"### Histograms"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"Let's generate some random data and plot a **histogram** of it, together with a kernel density estimate (KDE)."
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x1a1b38bb70>"
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZIAAAD/CAYAAADbn1DKAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3XmYVOWZ9/Hvr6qbRXalXVgbBBUXEGSJxl1jSIgx0WQSB2PGTMTMJPMmJi4k42SSWTKgb5x3ZrJJjE5UEo3RaIgLLnEhbtC0grIpIGA3qA3Y7FtX3e8f5zSWZXV3dXedPl3V9+e66qruc55zzn1Y6q5nOc8jM8M555xrq0TcATjnnCtunkicc861iycS55xz7eKJxDnnXLt4InHOOdcunkicc861iycS55xz7eKJxDnnXLt4InHOOdcuZXEH0BEGDhxolZWVcYfhnHNFZfHixZvNrKKlcl0ikVRWVlJVVRV3GM45V1Qkrc+nnDdtOeecaxdPJM4559rFE4lzzrl28UTinHOuXTyROOecaxdPJM4559rFE4lzzrl28UTinHOuXTyROOeca5cu8WS7c1GqnPlQu45fN2tagSJxLh6R1kgkJSXdJKlO0g5J90kamMdxfyfJJN2QtX2UpCck7ZJUI+k70UXvnHMuH1E3bc0ELgKmAEPCbXc2d4Ck4cB3gFeztieBecAKoAL4NHC9pC8UOGbnnHOtEHUimQHMNrO1ZrYNuA6YKqmymWN+BfwjsDVr+5nAcOC7ZrbbzKqBW4CvFTxq55xzeYsskUjqBwwDFjduM7M1wHZgbBPHXAXsNrN7cuweB7xuZjsztlWH251zzsUkys72vuH7tqzt9Rn7DpI0DLgB+EgT5+uT77nC880gqBExbNiw/CJ2LkK92MOJWsc79GedHRV3OM4VTJSJZEf43i9re3+CWkm2W4F/M7PaZs6X77kwsznAHICJEydaPgE7F5WPJxZyU/kc+mo3AA+nJvPdA19lG71jjsy59ousacvM6oENwITGbZJGEtQgluY45GPAjyRtlrQZ+CjwXUkLwv1LgGMk9co4Zny43blO66zEEn5a/t+ssUH8zf5rufnA5zgvUc3cbj+iD7vjDs+5dov6OZI5BCOrngK2ALOB+Wa2LkfZoVm/3wssAH4c/v4ssJ4g2cwEjgWuAr4ZQdzOFUQ/dnJT+S28YYOZvv977KYHTzOepTaSX5b/mFnlc8A+B1LcoTrXZlGP2ppFMGR3EVALJIHLACRNl3Sw49zMajJfwD5gu5m9E+5PARcCJxIkpYeBm8zs7ojvwbk2+1rZPAayjWsOfI3d9Di4/en0yfy44fNMSy6El++KMULn2i/SGkn44X9N+MreNxeY28yxZ+fYtho4r4AhOheZAWzn8uRjzEufyjIb8aH9t6Q+xTnJV5jy+PdhzKeg54AYonSu/XyuLecicklyAb20j582XJRzv5HgBwe+DHvr4akfdXB0zhWOJxLnImF8MfkUi9Ojed2yu//et8KGw8S/hUW3wubVHRifc4XjicS5CEzQG4xKbOSe1NktFz7rOkh2h2dvjDwu56LgicS5CHw6+Tx7rZyHUk09X5uh9+Ew+avw6r2w+Y3og3OuwDyROFdwxseSi3k2PZZd9MzvkNO+CWU94JnZ0YbmXAQ8kThXYCdoHYO1hcfTp+R/UO8KmPS38Np98N766IJzLgKeSJwrsAuSi0mZeDI1oeXCmab8HSgBL/48msCci4gnEucK7KzEEl620WzNPZ9o0/oNhhM/B9V3wJ73ognOuQh4InGugPqyi5O0lufSJ7btBKd9Aw7sgqrbCxuYcxHyROJcAU1OrCQp4/nUCW07wZEnwchz4KVboGFfYYNzLiKeSJwroNMSy9hj3XjZRrXjJN+AnW/Dsj8ULjDnIuSJxLkCOi2xjEXpY9lPedtPMvJcOGwULPpV4QJzLkKeSJwrkMPYxn
GJt3gh3cZmrUaJRDBtSs1C2JRr6R7nOhdPJM4VyIRE8FT6wvSx7T/ZyZdCWU+o8lqJ6/w8kThXIGMTa2mwBK/lmDK+1XoOgJMugaW/g73b2n8+5yLkicS5AhmnNayyoeyjW2FOOOmrcGA3LLmnMOdzLiKRJhJJSUk3SaqTtEPSfZIGNlH2DEnVkrZK2hb+fHFWGZO0W9LOjFe/KO/BufwY4xJrWJIeWbhTDhoPgybA4v8Fs8Kd17kCi3rN9pnARcAUguVxbwPuBD6Ro+wq4LPAhvD3M4BHJZ1iZisyyl1gZn+JLmTnWq9Sb9NPu1nShmG/lTMfanLf9OTJ/Hv5bUz73s9YZpUf2r9u1rRWX8+5Qou6aWsGMNvM1prZNuA6YKqkyuyCZvauma03MwMEpMP42jEg37mOMVZrAVhayBoJMC/1EfZZOZ9LPlPQ8zpXSJElkrDJaRiwuHGbma0BtgNjmzmuHtgHLABeAh7LKnKvpM2SXspu+so6zwxJVZKq6urq2nEnzrXs5MQa9lg3XrchBT3vdnrzePoULko+RzkNBT23c4USZY2kcca67CEn9Rn7PsTM+gO9CZq5HoYP/O85HxgBDAFuBuZKmtrEeeaY2UQzm1hRUdG2O3AuT2MTa3nNKkmRLPi5f586k0O1k3MTLxf83M4VQpSJZEf4nt0Z3p+gVtIkM9tnZg8AZwFfzdj+pJntDV/3AHcB0wsYs3OtVkYDJ2gdS9JHR3L+BemTeMf687nks5Gc37n2iiyRmFk9Qcf5wUUZJI0kqI3k+7huGTC6mf1pgv4U52JzjGroqf0sjSiRpEjyh9QZnJ14hcM+VMF3Ln5Rd7bPAa6XNEJSX2A2MN/M1mUXlHSJpJMklUnqIelK4Fxgfrj/REmTJXWTVC7pM8CXgN9FfA/ONWtsIuhoX2KF7WjPdH/qdMqV4hPJhZFdw7m2ijqRzALmAYuAWiAJXAYgabqknRlljwLuJ+hD2Qh8BbjUzB4P91cAtwPvAe8CNwBfMbM/RnwPzjVrnNZQb71Yb0dEdo3XbSivpwdzYfKFyK7hXFtF+hyJmaWAa8JX9r65wNyM338C/KSZcz0FtHM2POcKb1xibTjsN9pW1nmpU7m67D6OYCvvcGik13KuNXyKFOfaY/9ujtFbvGLR9I9k+lP6VBIypiVfivxazrWGJxLn2uPtpZQpHVlHe6Y37SheS1d685brdDyRONcetdUAhZ1jqxl/Sn2E8YnVDNG7HXI95/LhicS59qhdzEY7lDoGdMjl/pQ+FYALEy92yPWcy4cnEufaY2N1hzRrNaqxCqrTo/iUN2+5TsQTiXNttXsrbF0b2RPtTXkoNYUTEusZqnc69LrONcUTiXNttTGY+yrKBxFzmZ+eBMDHE1Udel3nmuKJxLm2CjvaX+2gjvZGNXY4y9LDmZpc1KHXda4pnkica6uN1XDYaHZwSIdfen5qEhP0Buzw5i0XP08kzrWFGdQuhsGnxHL5R9OTSMhgVdOrKzrXUTyRONcW2zfCzndg8ISWy0bgdRvCm+kjYMW8WK7vXCZPJM61xcagf4RB8SQSEPPTk+HNZ2FPfUwxOBfwROJcW9QuhkQZHHlSbCHMT02EdAO8Pj+2GJwDTyTOtU1tNRxxApT3iC2EV+xo6HMUrPTmLRcvTyTOtVY6HTxDElNHeyMjAcdNg9VPwoG9scbiurZIE4mkpKSbJNVJ2iHpPkkDmyh7hqRqSVslbQt/vjirzChJT0jaJalG0neijN+5nLaugX3bY+wfyXDMVDiwG9b/Je5IXBcWdY1kJnARMAUYEm67s4myq4DPAocB/YFvAXdJGgNBUiJYbXEFwWqJnyZYxvcLkUXvXC61i4P3mGskAFSeDmU94Y3HWy7rXESiTiQzgNlmttbMtgHXAVMlVWYXNLN3zWy9mRnBUnPpML5RYZEzgeHAd81st5lVA7cAX4v4Hpz7oNpqKD8EKo6NOxIo7wkjzoQ3Hos7EteFRZZIJP
UDhgGLG7eZ2RpgOzC2mePqgX3AAuAloPF/yDjgdTPLXOe9OtzuXMeprYJB4yGRjDuSwOiPwda1sHl13JG4LirKGkn
"text/plain": [
"<matplotlib.figure.Figure at 0x1a1b245860>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"np.random.seed(14000)  # fix the random seed so the results are reproducible\n",
"pdhist = pd.Series(np.random.randn(1000))  # 1000 draws from a standard normal distribution\n",
"pdhist.hist(density=True)  # histogram normalized so the bar areas sum to 1\n",
"pdhist.plot(fontsize=13, kind='kde')  # kernel density estimate (KDE) on the same axes; try kind='hist' as well"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"Let's use the data in `Automobile_data.csv` to plot a **histogram** of the `price` variable."
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0,0.5,'Frecuencia')"
]
},
"execution_count": 99,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAENCAYAAAAL98L+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAF0lJREFUeJzt3X+0J3V93/HniyX+YhUDXIwgsFpZbURDN1fBiorkhCYmxlZbdSsSTcweLNH+SuKpVqS2pv4gJ9GwdndrSahwNmo11qO2GqtUESS9sIu/iusPFheIcgHFgwYqu+/+MbPyncu9e+/c+/1xd3k+zvme78znMzOf9x1meX9nPjOfSVUhSdJ+h006AEnS6mJikCR1mBgkSR0mBklSh4lBktRhYpAkdZgYJEkdJgZJUoeJQZLUcfikA1iOY445ptatWzfpMCTpoHLttdfeXlVTiy13UCaGdevWMTMzM+kwJOmgkuSmpSznpSRJUoeJQZLUYWKQJHWYGCRJHSYGSVKHiUGS1GFikCR1mBgkSR1jSwxJfj3JjiQ7k3wpyYva8vVJrk6yq/0+eVwxSZIeaCxPPicJ8D7g2VX1lSRPA76Q5CPAFmBzVV2W5BxgK3DWyIK58MiRbXrxtu+aXNuStETjvJS0D9j/f+VHA38DHANsALa35duBDUkWHctDkjQaYzljqKpK8hLgvyf5EfBI4NeAE4Bbqmpvu9zeJLe25bOD20iyCdgEcOKJJ44jbEl6UBrLGUOSw4F/A7ywqk4CXgC8H1i71G1U1baqmq6q6akpTygkaVTGdSnpVOC4qvoCQPv9I+Ae4PgkawDa7+OAPWOKS5I0x7gSw83A45I8CSDJ3wV+DvgGsBPY2C63EdhRVbPzbkWSNHLj6mP4bpLXAP8tyb62+FVVdWeS84BLk1wAfB84dxwxSZLmN7YX9VTV5cDl85TfAJw2rjgkSQfmk8+SpA4TgySpw8QgSeowMUiSOkwMkqQOE4MkqcPEIEnqMDFIkjpMDJKkDhODJKnDxCBJ6jAxSJI6TAySpA4TgySpw8QgSeowMUiSOsbyop4k64CPDBQ9GnhUVR2VZD1wKXA0cAdwblV9YxxxSZIeaFyv9twNnLp/PsmfDLS9BdhcVZclOQfYCpw1jrgkSQ809ktJSR4CvBy4JMmxwAZge1u9HdiQZGrccUmSGpPoY/gN4Jaqug44oZ3eC9B+39qWS5ImYBKJ4beAS/qulGRTkpkkM7OzsyMIS5IEY04MSY4Dngtc3hbtAY5PsqatXwMc15Z3VNW2qpququmpKa80SdKojPuM4ZXAx6vqDoCqug3YCWxs6zcCO6rKUwJJmpCx3JU04JXA6+aUnQdcmuQC4PvAuWOOSZI0YKyJoarWz1N2A3DaOOOQJC3MJ58lSR0mBklSh4lBktRhYpAkdZgYJEkdJgZJUoeJQZLUYWKQJHWYGCRJHSYGSVKHiUGS1GFikCR1mBgkSR0mBklSh4lBktRhYpAkdZgYJEkdY0sMSR6W5D8l+UaSLyfZ1pavT3J1kl3t98njikmS9EDjfLXnO4B7gPVVVUke05ZvATZX1WVJzgG2AmeNMS5J0oCxJIYka4FzgcdVVQFU1feSHAtsAH65XXQ7cHGSqaqaHUdskqSucV1K+jvAHcCbk8wkuSLJGcAJwC1VtReg/b61Le9Isqldd2Z21pwhSaMyrsRwOPAEYEdVTQOvBz4MrF3qBqpqW1VNV9X01NTUiMKUJI0rMdwE3EdzqYiquga4Hfhb4PgkawDa7+OAPWOKS5I0x1gSQ1XdDnyWti8hyXrgWGAXsBPY2C66keaswmtFkjQh47wr6TzgkiR/BPwEeEVV/SDJecClSS4Avk/TSS1JmpCxJYaq+jZw5jzlNwCnjSsOSdKB+eSzJKnDxCBJ6jAxSJI6TAySpA4TgySpw8QgSeowMUiSOno9x5DkMJoxj6aA7C+vqq
uGHJckaUKWnBiSnAp8CHg8UDSJodrqNcMPTZI0CX0uJb0L+DhwFPBD4GeB9wIvH0FckqQJ6XMp6WnA2VV1b5JU1V1Jfg+4DviL0YQnSRq3PmcMPxmYvivJFHAv8NjhhiRJmqQ+ZwzXAb8EfAL4HHAp8GPgKyOIS5I0IX3OGH4H+Go7/a+A77XTrxpqRJKkiVryGUNV7RmYnsWEIEmHpAMmhiS/WFXXttPPWGi5qvrrYQcmSZqMxc4YrgAe2U5/cYFliiU8x5BkN3BP+wF4fVV9MsnpwFbg4cBu4Jyqum2x7UmSRmOxPoYjB6Z/ZoHPQ3q094+r6tT288kkAS4Dzq+q9TSd2m/rsT1J0pAdMDFU1b6B6b0LfVbQ/jRwT1Vd2c5vAV6ygu1JklZoyXclJflEkufNKTsrycd6tHd5ki8leU+SRwMnAjftr6yq24HDkhzVY5uSpCHqc7vqM4DPzyn7PHDaEtd/dlX9AvB0mnGWLu7RNkk2JZlJMjM7O9tnVUlSD30SQ/HAzuo1S93G/ttdq+pe4D3As4DvACftXybJMc0idec862+rqumqmp6amuoRtiSpjz6J4Trg/DllrwF2LLZikiOSHNlOB3gZsBO4Fnh4kjPaRc8DPtAjJknSkPUZEuP1wBVJXgTsAk4GTgGed8C1Go8BPpRkDc1ZxteAf1ZV+5K8Atia5GG0t6v2iEmSNGR9nnzemeTngd8E1tEMwf3SqrplCet+G/h7C9RdBTx1qXFIkkar1xvcqupW4D+OKBZJ0irQ99WeT6d59uCRg+VV9Y5hBiVJmpw+r/Z8E3AB8GXgRwNVBZgYJOkQ0eeM4XzgOVV19aiCkSRNXp/bVQ8DrhlVIJKk1aFPYrgEeOWI4pAkrRJ9LiWdCvzLJK8F/mawoqqeP9SoJEkT0ycx/J/2I0k6hPV5wO1NowzkQeHCIxdfZiTt3jWZdiUdlPo+x7CO5n0Jx1XVv0jyROBnqur/jiA2SdIE9Hkfwy/RPMNwJvDbbfFjgT8afliSpEnpc1fS24GXtR3N97VlM8CGoUclSZqYPonhiVX18Xa6AKrqb4GHDj0qSdLE9EkMN7ejq/5UkqfRDJUtSTpE9EkMFwMfTvIyYE2SFwL/FXjXSCKTJE1En9tVtyQ5DHgzzeWjtwN/UlV/PqLYJEkT0Pd9DO+heV+zJOkQ1ed21WMX+vRpMMmbk1SSU9r505Ncn2RXkk/13Z4kabj69DF8l2aMpPk+S5JkA3A68J12PsBlwPlVtR74HPC2HjFJkoasz6Wkk+fMHw+8Ebh8KSsneSiwGfinwGfb4mngnqq6sp3fQnOX02/1iEuSNER9Op+/NafoW0luAD5Nc3fSYt4CXFZVNzYnCgCcCNw00MbtSQ5LclRV3bnU2CRJw9PnUtJ87gbWLbZQkmcCT2cFHddJNiWZSTIzOzu73M1IkhbR553PL5lTdASwkaUNxf1c4MnA/rOFxwGfBN4NnDTQxjFAzXe2UFXbgG0A09PTtdS4JUn99OljmDtY3t00YyW9YbEVq+ptDHQqJ9kN/DrwNWBTkjPafobzgA/0iElLManhvsEhv6WDUJ8+hhOG3XhV7UvyCmBrkofRdDyfM+x2JElL1+dS0hOAu6vqtoGyY4G1VfXtPo1W1bqB6auAp/ZZX5I0On06n/8CeMycsp8Dtg8vHEnSpPVJDOur6stzyr4MPGmI8UiSJqxPYvhhkqPnlB0N/GiI8UiSJqxPYvg0sDnJIwDa73e15ZKkQ0SfxPB64PHAHUluBO6gGSbj90cRmCRpMvrcrjrbPsF8Os1DabuBa6pq34hikyRNQN/3MewDrkryjapyXApJOgT1eR/DI5JsTfJj2vc8J3lhkjeOKjhJ0vj16WO4CDgBeB7wk7bsWpphtCVJh4g+l5J+Azilqn6QZB9AVd2c5PjRhCZJmoQ+ZwxrgB8PFiQ5gmYwPUnSIaJPYrgK+IM5ZecD/3t44UiSJq3PpaR/DXwmyTnA2i
Q7gLU0fQ6SpENEn+cYdid5Ck1fw+NpXsn50apySAxJOoQsKTEkORz4EPDSqnr/aEOSJE3SkvoYquo+miee7xttOJK
"text/plain": [
"<matplotlib.figure.Figure at 0x1a1b61cb70>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"price = df['price']  # select the price column (already a pandas Series)\n",
"price.plot(kind='hist', fontsize=11)  # histogram: number of cars in each price bin\n",
"plt.xlabel('Price', fontsize=13)\n",
"plt.ylabel('Frequency', fontsize=13)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"Among other things, this **histogram** shows that many cars are priced below 10000. What else can you read from it? ;)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"### Scatter plot\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"This **scatter plot** shows the relationship between the `engine-size` and `price` variables."
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0,0.5,'Precio')"
]
},
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZ0AAAEZCAYAAABM/vhsAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJztvXuYHVWV9//5pmmgAaEJRIROQoJkggSEQA9EwflBFBIYIRFQQZGojHn1hfGeIagjoFyCUVEEGRGQmxIQmBBQjPy4jCNy6xAgBIyEa9IJJJgLtwi5rPePvU9SOV11+pzuc+1en+ep51Stfam9q+rUqr322nvLzHAcx3GcajCg1gVwHMdx+g+udBzHcZyq4UrHcRzHqRqudBzHcZyq4UrHcRzHqRqudBzHcZyq4UqnCCSNlfQ3STvmyadKekXSG5L+WdKdkv6jSmUaLMkkDStjnldIujpx/IakD5Qr/3KSVTZJIyT9VdIBZT7foZL63PgCSb+S9KykUZIekLRVmfNfLOnEIuPuFZ/pnctZBqc8SDpV0sO9zafPKx1JB0q6RdKy+KJ6IR6PLTL99sDFwMfMbGVCPhg4HzjczLYzs0fM7Cgz+0FlalJ9Yr0eqHU50kgrm6R3A9cCnzCzR2tTsq5I+q/47L0h6a34Yn0jsX26RuUSsBvwUeBS4C9m9nYtylIrJE2TdEcF8p0t6fKMsAck/bAM53hZ0pr4DK2Q9CdJh/Y23yzM7EozO6i3+fRppSPpCOB+4FmgHXgXsC/wG+BjBdI1Jw73Af7dzObnRRsGbDCzp8pZZqfL9S8aM1tmZh8wsyfKXabeYGZfjEpyO+DIKNsusf26RuUyMxtnZk+b2WFm9o1alKPRyXhefwGcJGm7vLj7AgcDqQqpB3wmPleDgaeAOyRtk1LGLeJHRu0xsz67AQuBK4qIdx/wE2Am8BowlXAT/wAsB1YD/wscGON/ElgDGPAG8Gwin+8k8h0G/BZYCqwiKMCdYtjuwG3Aq8CieP6WAmV8DzArluVvwL/F8w9LxPkC8GSMMxc4spt6f56gkF8DrgOuB65OhBtwaKIus2M9VgJzgJEx7Grg1zGP12Ken80714eAPwMrYvg3AMWww4B1wGeA54DXo/zLwPPA60AncH5a2eLx8cDjse6PE1qmubDPxmfhy8DiWP5fAE0Frs2IeD9fj/l9NfxdNoZvAXwr3ovcvT2wiGft0GQ+CXl7fMb+Hq/RHXn3dgZwFaEltzrW4wTgn4FHYznvAt6dSPNNYEEMexH4XuKabx2v4eRE+vuBPRPptyO0gBYDy4BbgLYCddsK+BnhP7ME+FpMe2IizuHAX+I9WEj4oMuF7RXLtHNG/l8kPN9nxPxXA+cBgwj/jdeA+cDBiTTNwPeBF+J1/SOwVwybBLwTn7034rZbDDuRzf9LH00px7cJz+WjKWXdgvC//0Ke/GfAPXF/ADA9xnud8KxPLuH99jJwQt4zZMCoxLX8bHwG3gZa4/X4LvAM4bn9E7BfIo8BwGnxOuaemy8k693T52NjumIr2Ggb8E/xon+4iLj3xQd2LCBgG2AoMCHutwA/jTegOaY5DFiXks934v42hBfopcAO8SH8AKG1tUV8aH8BbAu0AY8AlxYo493Af8e83kN4gW9UOoSXx0Jgv/jgHB3/RHtm5PchguI8IpbnFGAt2UrnN8AvCS+WJuD9wC4x7OqY9uSY1xEx7w/G8FHxAZ4Q0+5F+IOdkriWFs+xQ7x2/wS8BYyKcVqBMRll+wDwD+CoeP5/jccHx/DPxvKdF8u/J+EF9OmMa7MF8Nd471oICuivbK50zgceAvaIdTqV8AGxYzfPWpbSGQ38C7AlsCPhg+TeRPgM4M14bQcAXyG8NG4DdiW8AP4MXJxI8wnCx4IIL6S/A5NiWE7p3E94/rYmvLhvT6S/Jub5npj/dYTndEBG3c4jvKyGxXt4FeGFfm
IM35/wPzs6XrNRhA+uT8TwYpTOWoKybyYo3HUEJdYe8/wx8EQizTnx3u0Z63gB8BKwTQyfBtyRd57DEte6CZhIeGnvnyjHuph261xeGdfj4cRxC0HZ5q7HsYT/QU7RvSd3jiLfcRuVTrw/v4j3uCVxLf9AUMpbxefmx/Ge7k54zk8jKL3tYz5fi9dnTIz/bqA9Ue+k0inp+diYricv9EbYgEPiRd8rITuW8EddDfwjIb8PuKqb/N4V89s78WAWUjqfiDdzi5S8Phgf4m0TsnGEF7VS4rfFc783ITuCzZXOk8SXeCLO7SRaXnlhvwSuy5PdT7bSuTrm976UvK4G/jdPdj1wedy/JP/6Elo6/3/iWhowNBG+R7wenwC2SzlnsmyXA7/OC78B+EXc/yzhZdeUCP8tcFGBZ+dtEi8TQivS4r4ISvRf8tLNA07u5jlKVTop8doJL7bcR84M4JZE+MB4DY5JyL4OPFAgz0uAa+P+1inpjweWxv0t4/k/lAjfEdgAjM7IfxEJRU74gFjPppfsFcDP89J8m/jSpzils5zEfwR4AvhR4viAWMat4/FLBBNULnwLwlf5x+JxmtK5FrgyT/bfwE8T5VhNyn87L82wWP/94vGkWP4t4/F4guI4Atiqu2ciJf+XCcpxVdz/I5s+tHLX8qBE/CbCx9hBefk8wybl9RxwaoHr/2RPn4/c1pf7dF6Nv4NzAjObZWathC/hfC+dF5IHknaUdLmkBZIWEV4oEL4aimEY8JyZrUsJGwIsM7M3E7JnCS+CtPxzdXgxIXs+L85w4FJJq3IbwZTRllG+weTVOSXPJFNi+O2Slkr6WZ69Oj+vFxLlHk6wbyfLdhbhCz3HBsJLCwAzew74NOFlv0TSnyUdmVG2IYQ/S5JnozzHMjNbnzh+k/AhkcbgGP+thCx5bXYmfNndnlenPRJ1LglJIyXdJmmJpNeAewgviYGJaEsT+29lyDbWSdIpkubETubVBJNs/vOVTJ+8JrvG82+8rhYcaVaw+XXNnSvnlPBCIv5qwpd9juHA5/Ku2dSYrlheyWn+yFt0vQYCtotlasurwzqCIupShwTFPE+dGf/tjZjZCwST5+Qomgz8yszeieF/IJg8zwGWR+/X/QvlmcIkM2s1s/eY2ZFm9lBe+AuJ/d0I77278u5BGzAkXq+hBJNxd5T0fCTpy0rnb4QLUpS7JuGll2Qa4QaMMbMhBHMShAe6GF4AhktqSglbBLw7r8NvD8JXyKsp8Tvj7+4J2fC8OC8Cn48PYG7bzsy+lFG+ToJiTJKf50bMbLmZfdnM9iS0BA4Dku7h+XkNI9h6c2W7Kq9s25vZqM1PsdnLBDO71cyOILzkbwJuS+skJVzP/LLvQUKJlUgnXe9PMv9XCS/oj+TVaVszm9bDc15B+ALfx8y2J5h6ofjnbTMkjSC0QL9NMIPuEM9RbH5LCV/pG+stqZWgBLtc13jvlpJ4DiTtQPj6zfEioaWTvGbvMrOyurfnlakzrw5NhP91rg75/3so7nlKS5fGL4BPS2onmIF/mVfGn5vZBwkK4W+E57ycJMu5lNCHdWjePdjGzC6K1+slgjm5O0p6PpL0WaUTL+BpwGckXShpiALbELxHumNHgv34H/GL/sISi/A7wg2+SNIOkpokjZH0LuBhQv/LjyRtI2k3Qmfnr8ysy8NsZosJprsfSNpe0i7Af+ZFuwg4W9L+sZ4tCmNL9soo37XACZI+HD1bTgYy3SElfVLS8Pg1tJpNHbA5xkg6KdZzLMFUc20M+zlwoqRjJDXH8+0t6f8rcL6RksbH+7U2ntNI/7NfDRwvaVw8/1HAccCvsvLvhgcJL8hp8Tq+l2DrBjY+Wz8Ffhhf7kjaLp6/lK/2JNsT+uBWK7h+n93DfHLkWqHLgXXRlbbYDzDi1/hvgPMl7RL/Az8hOB08npHsOuBMSbvH+/Yjwj3LcQkwSdJRiedgn0q6+RKejW9Jeq/CGKTvEUynf4zhLwPD8j
zQria0zMfG5+lYgnXk6h6c/3ZC6+smQh/dM7kASR+Q9MFYrn8QTLYFW0+9IbbMLiG8k/aIZXhXvB+7xGiXAt+VdFB
"text/plain": [
"<matplotlib.figure.Figure at 0x1a1b9705c0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"x = df['engine-size']  # predictor variable\n",
"y = df['price']  # target variable we want to predict\n",
"plt.scatter(x, y)  # Matplotlib scatter plot\n",
"plt.title('Scatter plot of engine size vs. price', fontsize=13)  # plot title\n",
"plt.xlabel('Engine size', fontsize=13)  # x-axis label\n",
"plt.ylabel('Price', fontsize=13)  # y-axis label"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"### Correlation between variables"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's take the two variables from the previous example and fit a least-squares regression line to them with `stats.linregress()`."
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5,1,'Gráfico de dispersión de Tamaño del motor Vs. Precio')"
]
},
"execution_count": 119,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZkAAAEVCAYAAAAy15htAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJztnXmYFMX5xz8vy8IugiwqKiwgoIgXUXQVFGMQD8RzjRd4YcQr0XjEECH6U7xREs94ES9UPFAUUKNoRKISQEFEREVBUViIoBweXMtu/f6omp3eYe6dnp6ZfT/PM89MVVdXv9Xd09+uqreqxBiDoiiKovhBk6ANUBRFUQoXFRlFURTFN1RkFEVRFN9QkVEURVF8Q0VGURRF8Q0VGUVRFMU3VGSiICL9ROQLEWkTET9MRL4TkZ9FZH8ReU1E/pIlmzqIiBGRzhnM82ERedwT/llEDsxU/pkklm0i0k1EPheRfTN8vINFpOD8+0XkMRFZJCJ7ish0EWme4fyXisjAJNPu5u7p7TJpg5IZRGSIiLzf0HwKTmREZD8RGS8iK9yDabEL90ty/62Be4ATjTGrPfEdgFuAQ40xLY0xHxhjBhhjbvenJNnHlWt60HZEI5ptIrI98ARwqjHmw2As2xIRedDdez+LyDr3IP3Z8zkjILsEaA8cC9wH/NcYszEIW4JCREaKyCs+5DtZREbH2DZdRP6WgWP8T0TWu3tolYi8IyIHNzTfWBhjHjHGHNDQfApKZETkCGAasAioAFoBPYCngRPj7FfsCe4F/NEYMz8iWWeg1hjzaSZtVrY4/0ljjFlhjDnQGPNxpm1qCMaYi5wotgSOdHEtPZ+xAdlljDH9jTGfGWP6GmOuDMKOfCfG/foQMEhEWkak7QH0AqIKUBqc5e6rDsCnwCsi0iKKjU3dS0XwGGMK5gMsBB5OIt1U4C5gAvAjMAx70V4HVgJrgXeB/Vz604D1gAF+BhZ58rnGk29n4HlgObAGK3jbum07AROB74El7vilcWzcEZjkbPkCOM8dv7MnzfnAJy7NHODIBOU+FyvAPwJPAk8Bj3u2G+BgT1kmu3KsBmYD3d22x4GxLo8fXZ7nRBzr18B7wCq3/UpA3La+wGbgLOAr4CcXfynwNfATUAXcEs02Fz4JmOvKPhdb8wxtO8fdC5cCS539DwFFcc5NN3c9f3L5XW7/HnXbmwJ/ddcidG33S+JeO9ibjye+wt1jP7hz9ErEtX0WeBRbU1vrynEysD/wobPzTWB7zz5/Bha4bd8AN3jOeYk7hxd49p8G7OLZvyW2hrMUWAGMB8rjlK05cC/2P7MMuMLtO9CT5lDgv+4aLMS+wIW27eZs2i5G/hdh7++rXP5rgZuBttj/xo/AfKCXZ59i4EZgsTuvbwC7uW2DgU3u3vvZfdq7bQOp/186NoodV2Pvyw+j2NoU+78/PyL+XmCK+90EGOXS/YS91y9I4fn2P+DkiHvIAHt6zuU57h7YCJS583Et8CX2vn0H2NuTRxPgYnceQ/fN+d5yp3t/1O2XbAFz/QPs6k7yYUmknepu0H6AAC2ATsAJ7ncpcLc74cVun77A5ij5XON+t8A+MO8DWrub7kBsbaqpu0kfArYCyoEPgPvi2PgW8JLLa0fsA7tOZLAPi4XA3u5GOdr9aXaJkd+vsUJ5hLPnbKCa2CLzNPBP7IOkCPgVsIPb9rjb90yX1xEu74Pc9j3dDXuC23c37B/qbM+5NO4Yrd252xVYB+zp0pQBvWPYdiCwARjgjn+MC/dy289x9t3s7N8F+8A5I8a5aQp87q5dKVZwPqe+yNwCzAS6ujINwb4wtElwr8USmZ7AIUAzoA32BeRtz/ZngV/cuW0CXIZ9SEwE2mH/8O8B93j2ORX7ciDYB9APwGC3LSQy07D3Xwn2Qf2yZ/8xLs8dXf5PYu/TJjHKdjP24dTZXcNHsQ/wgW77Ptj/2dHunO
2JfcE61W1PRmSqseJejBXYzVjRqnB53gF87NnnenftdnFlvBX4Fmjhto8EXok4Tl/PuS4CKrEP6X08dmx2+5aE8opxPt73hEux4ho6H8dj/wchYdsxdIwkn3F1IuOuz0PuGpd6zuXrWBFu7u6bO9w13Ql7n1+MFbmtXT5XuPPT26XfHqjwlNsrMindH3X7pfNAz8UP0Med5N08ccdj/5hrgQ2e+KnAownya+Xy28NzI8YTmVPdxWsaJa+D3E27lSeuP/bBLFHSl7tj7+yJO4L6IvMJ7qHtSfMynppVxLZ/Ak9GxE0jtsg87vLbPUpejwPvRsQ9BYx2v/8ReX6xNZl/e86lATp5tnd15+NUoGWUY3ptGw2Mjdj+DPCQ+30O9uFW5Nn+PHBnnHtnI56HB7aWaNxvwYrmIRH7zQPOTHAfRRWZKOkqsA+y0EvNs8B4z/Zt3Dk4zhP3J2B6nDz/ATzhfpdE2f8kYLn73cwd/9ee7W2AWqBnjPyX4BFu7AtDDeGH6sPA/RH7XI17yJOcyKzE8x8BPgb+7gnv62wsceFvsU1Koe1NsW/dJ7pwNJF5AngkIu4l4G6PHWuJ8t+O2KezK//eLjzY2d/MhY/CCsURQPNE90SU/P+HFcM17vcbhF+sQufyAE/6IuzL1wER+XxJWKy+AobEOf+fpHt/hD6F1CfzvfvuEIowxkwyxpRh33QjvWgWewMi0kZERovIAhFZgn2AgH0rSIbOwFfGmM1RtnUEVhhjfvHELcL+8aPlHyrDN564ryPSdAHuE5E1oQ+2aaI8hn0diChzlDy9DHXbXxaR5SJyb0R7c2Reiz12d8G2T3ttuw77Bh6iFvuQAsAY8xVwBvbhvkxE3hORI2PY1hH75/CyyMWHWGGMqfGEf8G+OESjg0u/zhPnPTfbYd/cXo4oU1dPmVNCRLqLyEQRWSYiPwJTsA+FbTzJlnt+r4sRV1cmETlbRGa7TuG12CbWyPvLu7/3nLRzx687r8Y6vqyi/nkNHSvkRLDYk34t9s09RBfgdxHnbJjbL1m+Cym9Yx1bngMBWjqbyiPKsBkrPFuUwUMy91NVjP92HcaYxdgmzAtc1AXAY8aYTW7769gmzOuBlc47dZ94eUZhsDGmzBizozHmSGPMzIjtiz2/22Ofe29GXINyoKM7X52wTcCJSOn+8FJIIvMF9gQk5T6Jfch5GYk94b2NMR2xzUNgb+BkWAx0EZGiKNuWANtHdNB1xb5lfB8lfZX73skT1yUizTfAue6GC31aGmN+H8O+KqwQeonMsw5jzEpjzKXGmF2wb/p9Aa+7dmRenbFttSHbHo2wbWtjzJ71D1Hv4YEx5kVjzBHYh/o4YGK0Tk3s+Yy0vSse0UqRKra8Pt78v8c+kA+PKNNWxpiRaR7zYewb9l7GmK2xTbeQ/P1WDxHphq1hXo1t1mztjpFsfsuxb+F15RaRMqzobXFe3bVbjuc+EJHW2LfbEN9gazLec9bKGJNRd/MIm6oiylCE/V+HyhD5v4fk7qdo+0XjIeAMEanANuv+M8LG+40xB2EF4AvsfZ5JvHYux/ZBHRxxDVoYY+505+tbbPNwIlK6P7wUjMi4E3YxcJaI3CYiHcXSAuvdkYg22PbfDe6N/bYUTXgVe0HvFJHWIlIkIr1FpBXwPrb/5O8i0kJE2mM7Jx8zxmxx8xpjlmKb4m4Xka1FZAfg/yKS3QmMEJF9XDlLxY7t2C2GfU8AJ4vIYc7z5EwgpnuiiJwmIl3c285awh2mIXqLyCBXzn7Yppcn3Lb7gYEicpyIFLvj7SEiv4lzvO4icpS7XtXumIbof+7HgZNEpL87/gDgt8BjsfJPwAzsA3GkO487Y9uqgbp7627gb+5hjoi0dMdP5a3cy9bYPrS1Yl2xR6SZT4hQLXMlsNm5tib7woV7234auEVEdnD/gbuwTgJzY+z2JDBcRHZy1+3v2GsW4h/AYBEZ4LkP9vLT7RZ7b/xVRHYWOwboBmxT6Btu+/
+AzhEeYo9ja9793P10PLb14/E0jv8ytnY1DtvH9mVog4gcKCIHObs2YJtg49aOGoKref0D+0zq6mxo5a7HDi7ZfcC
"text/plain": [
"<matplotlib.figure.Figure at 0x1a1bd55a20>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"from scipy import stats\n",
"x = df['engine-size']  # predictor variable\n",
"y = df['price']  # target variable we want to predict\n",
"# least-squares regression line; r_value is Pearson's correlation coefficient\n",
"slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)\n",
"line = slope * x + intercept  # fitted price for each engine size\n",
"plt.plot(x, y, 'o', x, line)  # data points plus the fitted line\n",
"plt.xlabel('Engine size', fontsize=9)  # x-axis label\n",
"plt.ylabel('Price', fontsize=9)  # y-axis label\n",
"plt.title('Scatter plot of engine size vs. price', fontsize=13)  # plot title"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The scatter plot above shows a positive linear relationship between engine size and car price: as engine size increases, price tends to increase."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The scatter plot below shows a negative linear relationship between highway fuel efficiency (`highway-mpg`, miles per gallon) and price: cars that travel more miles per gallon tend to be cheaper."
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5,1,'Gráfico de dispersión de Millas por galón en autopista Vs. Precio')"
]
},
"execution_count": 121,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAbsAAAEWCAYAAAD/6zkuAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3XmYFMX5wPHvu8tyKHKoaJDDRUQUREWJYjCKeABe4C2oiHcS/WmMQUGJoCGKwcSYxBhRDCCC4oV4onhEg4qCaBDFCIpyqKAcgnLsLu/vj6phe4eZ6dmdmZ1j38/z7LMz1Vd1T0+/U9VV1aKqGGOMMYWsKNsZMMYYYzLNgp0xxpiCZ8HOGGNMwbNgZ4wxpuBZsDPGGFPwLNgZY4wpeDkV7ESkl4j8T0SaR6UPFZFvRGSDiPxURJ4XketqKU+tRURFpDSN67xfRMYH3m8QkcPTtf50ipc3EekgIgtF5OA0b+8IEamV/jAi8nMRWRt4P1JEZgbevyYiw2sjL7lIRAaLyKI4064RkedEpGFt5yvfiEhb/z3aI9t5yVfpuEamLdiJyCEi8riIrPQZW+Lf90py+SbAX4FTVXVNIL01cCtwtKo2VtV3VbWvqv4xXXnPNr9fb2U7H7HEypuI7AZMBM5S1feyk7PY/AVaReS5GNM+8tN6AqjqG6rarNYzmedE5AzgSKC/qm7Kdn7SSUTGi8j96Vynqn7pv0crMr19ETnVX393ijHtAhH5VkQa1HT9fj0jRaTcb2e9iCwWkZtERFJZbyLpuEamJdiJyHHALGAx0A3YCegCTAZOTbBcSeDt/sD/qeqCqNlKga2q+lE68moqRR3/pKnqSlU9XFX/m+48pckKoLuItI0kiMgRQD2gImu5yhARKRaRWqulUdXHVPVUVd1SW9s0SXsa+B4YGGPaZcAEVd2chu28pqqNgSbApcAw4MJYM9b0OpN2qpryH7AIuD+J+V4D/gJMw30gQ4HWwAvAKmAd8AZwiJ//bGAjoMAGYHFgPcMD6y0FHgW+AtbiAu8uftqewFPAt8BSv/1GCfL4E2C6z8v/gEv89ksD81wKfOjnmQccH7LfF+F+CHwPPAhMAsYHpitwRGBfZvj9WAPMBTr6aeOBh/w6vvfrHBy1rZ8D/wFW++nXAuKn9QTKgfOBz4D1Pv0q4HNgPbAcuDVW3vz704EP/L5/gCuJR6YN9ufCVcAyn/97geIEx6aD/zzX+/X92p2W26bXA27wn0Xksz0kwfoiefgHMDKQPgEY4ve/Z/B4BOYZCcyMOl+D59m//Dm0HvgIGBiY1hx3Dn7nj82HwM9D8ng97pxdCfwJKAnMcwDwij+GnwHDI8fRnyMKXOzzsRn4SYztlAB3+vV/DVzntzvYT4/73QvmM/B+B+Aufwy+xX2P20Ydrz8Bj/tjtBjoF/LdSOZ8PdtPWwdMBXZKsL6rgYV++18CtxE4/9j+fN52DvjjU+b/Nvi/yDH/JfCJz8Pbwc/Wnzcv+2P9He7cHxp1fVKgtX/f1e/zOr/fb/rzJ+b2gQOBf/tjvgZ4Hmif4BiMAuZGpXUGtlJ5LTkWd+363q93Zrz1xVj/yOj5gTnA3wLnQZXrfNhnHTjnI+fjauClmlyH4uY72R1MsOP7+Iwck8S8r/md7wUI7svTFujnXzfCfZm+wH/xibogRV+E/HKfAXcDTXEXx8Nxpct6uIvOvcCOQCvgXeDuBHl8GXjSr+sn/sPZFuxwv44W+ROwCDjBn5R7J/gybwSO8/kZ5E/meMFuMnAf0MCf6AcAu/tp4/2y5/l1HefX/bPACb3eH89iYF9cEBsUOJbqt9HUH7t9gB+Bzn6eZkD3OHk7HNgE9PXbP9G/PyxwcSwD/uDzvzfupD03zrGph7sw3e0/+w7+vQbmuRWYDezl9+li3JezeZx1DvafT1fcxa7I7+taYDdSC3YXA7v4fJwDbAE6BfL5LNAYd2
7vA7RLkMeywH63xwXzYX56U+Ab4Hf+OO6HO8eHRF08X8ado/WJ8YMCuMkfz738dv7qtxsJdm2BU4j/3RtM1WB3r/8sWuG+T/fjLjTFgeP1LdDDH/dr/HHfIc5xSPZ8HeeP6+7Ap8CNCb6/pwPt/GfQ1R/HyxNcNKPPgfFE/XAHBvj9Ogx3zl4M/ADsGThvynA/3usDh+B+YAyI+rwiwe5N/9kU436QdAd2TLD9A4Cj/bnQFPej6q0Ex2BPXA3GwYG0u3Clscj7FbiSmPj1Hl2Na/5I/PfEf87H4K4Dkc/tNba/zod91i1xgXyYP7fqA8fW5DoUN9/J7mCCHe/hM7JvIO0U3Em+DtgUdfF4IGR9O/n1RS4iVU7G6IsQcBbu13G9GOv6Ge5X746BtN64ACEx5m/lt90+kHYcVYPdh5EPKDDP0wQuilHT7gMejEqbRfxgN96vb78Y6xoPvBGVNgkY61//Pfr44n49RU7Mnn5bwV/je/njcRbQOMY2g3kbCzwUNX0KcK9/PRh3kgd/ST8K3Jng3NlM4GKIKzWrfy24L8iRUcvNB86Ls87B+As07tfmicCVwKM+rcbBLsa25gC/Ciz7Nu5CVxRyjg+Osd+XAP/zrwfiSk/BX72XA5/416X+czkyZDuLgIsC7xv57Q5O8rsXPJZF/jw5LjB/Y1zAPzxwvO4OTN/Rr+/AONtL9nxtEZg+Bngy0X5Hre8OYGqs8znOOTCe7YPNi8AfotLeovLHyUjcD6vg5zUKeDHq84oEu9dwPxRK43zHE9aS4W75KIHrWox5ngPu8a8b4n50DghMX+LzuF2NQBLHdCQuuK/1610AXBf1vYn+XMM+6+uAdxNsM+nrULy/dNTzf+v/t44kqOp0dTf+T8T9aghaEnwjIs1FZKyIfCIiS3EXMoAWSW6/FPhMVctjTGsDrFTVHwJpi3Effqz1R/bhi0Da51HztAPuFpG1kT/cr65WcfLXmqh9jrHOoCF++tMi8pWI/E1EGgemR69rSSDf7YABUXkbgfvVFLEVdyEFQFU/A87FBZkVIvIfETk+Tt7a4EoYQYt9esRKVQ3eF/sBdxGNpbWf/8dAWvDY7Iq7oD4dtU97BfY5kftw+3Wpf11jIlIkIrf483Sdz8eBVJ5HY3AlrQnAKhGZICK7J1hl9H4voXKf2gBL1H+LvejjHFkmkVYEzmVV3YirIors064iMlFEvhSR76k8L2J9N1rgvjfbPn9V3YArwQTz9VVgeuR7F+/zT+Z8rVDVVYH3ic4nRGSAiLwrIt+JyDrgijj7Ux3JnPdfRH1eS4h/jl6I+/HwHxH5XER+LyL14m1cRNqLyBMistx/TrP8pF0T5HksMFBEdgTOxJX0nghM74erSZnvG279OsG6Yvm3qjZT1Z1VtbNu32BwSdT7sM+6FFe7kYxkPo/tpCPY/c9v+Jwk598a9X40rjqlu6q2wRXZwf2qT8YSoJ2IFMeYthTYTUR2CKTthSvyfhtj/uX+/56BtHZR83yB+7XcLPDXWFV/GSd/y3EfZFD0OrdR1VWqepWq7o0r+fTE/eqJiF5XKe4eQSRvD0TlrYmqdq66iSpfSlT1CVU9DvflmQo8FXXMIpbGyPteBIJnNS1n+88nuP5vcRe3Y6P2aUdVHZ3E+ifjqlKaAC/VMI8RA3Clr9NxVajNcFV4Au7Crqo3qur+uCqbVrgAGE/0fpdS+TkuBfaMat0W6zhHf5eiLSdwLotII6pe+G/DXWwOU9UmVF4sYn33VuFKhds+H/8jbLcY+UpWMudr0kSkDa6mYxTQUlWb4qqKg/vzA67EGRHdHSDWMU3mvI/+vEqp/DyrUNXPVfUiVW2NqwW7BHd7I972/4mr4TjAf049fHqia+TTuNsrZxOjYYqqfqCqZ+M+v8uB25JtOZ+k6P0I+6yX4IJvMmp0HUo52PkL5xXA+SJyu4i0EWcHXB13mOa4IvEm/+W5vZpZeBZXlXKniDT1LdO6+6a37+Cqcv
4kIjv4fi6/B/6lqtudVKq6DFcE/6OINPG/zH8XNdudwEgROcjvZyNxfcP2jZO/icAZInKMiNQTkfOAQ+PtjIicLSL
"text/plain": [
"<matplotlib.figure.Figure at 0x1a1cc20400>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"from scipy import stats\n",
"x = df['highway-mpg']  # predictor variable\n",
"y = df['price']  # target variable we want to predict\n",
"# least-squares regression line; r_value is Pearson's correlation coefficient\n",
"slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)\n",
"line = slope * x + intercept  # fitted price for each mpg value\n",
"plt.plot(x, y, 'o', x, line)  # data points plus the fitted line\n",
"plt.xlabel('Highway miles per gallon', fontsize=9)  # x-axis label\n",
"plt.ylabel('Price', fontsize=9)  # y-axis label\n",
"plt.title('Scatter plot of highway miles per gallon vs. price', fontsize=13)  # plot title"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's compute the Pearson correlation coefficient and its p-value for `horsepower` and `price` using `stats.pearsonr()`."
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0.757916953745141, 1.6076704005409566e-39)"
]
},
"execution_count": 118,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from scipy import stats\n",
"stats.pearsonr(df['horsepower'], df['price'])  # returns (Pearson r, two-sided p-value)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two variables are strongly and positively correlated. The correlation coefficient (about 0.76) is fairly close to 1. The p-value (about 1.6e-39) is far below 0.001, so the correlation is statistically significant."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 1
}