Created
March 17, 2017 03:26
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Independent practice" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import plotly.tools as tls\n", | |
"#tls.set_credentials_file(username='sebasggx', api_key='uLZskgQnsV7QPNGSOGbq')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Import the required packages\n", | |
"\n", | |
"Don't forget to run this cell" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"# Numerical and statistical packages:\n", | |
"import numpy as np\n", | |
"import scipy.stats as stats\n", | |
"\n", | |
"# pandas handles loading and manipulating the dataset\n", | |
"import pandas as pd\n", | |
"\n", | |
"import plotly\n", | |
"import plotly.graph_objs as go\n", | |
"\n", | |
"# Initialize plotly in offline mode for the notebook\n", | |
"plotly.offline.init_notebook_mode()\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# 1. The salary.csv dataset: once more\n", | |
"\n", | |
"We want to generate a plot similar to the one below using the salary.csv dataset and ``plotly``." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"\n", | |
"![This is the plot we want to generate]()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"salary = pd.read_csv('salary.csv')\n", | |
"salary.columns = ['gender', 'professor_rank', 'years_in_job', 'degree_level', 'years_since_degree', 'yearly_salary']\n", | |
"salary.head()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Get the 6 subsets of the dataset given by crossing gender and professor_rank (2 genders * 3 professor ranks = 6 subsets)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"genders = salary.gender.unique()\n", | |
"ranks = salary.professor_rank.unique()\n", | |
"\n", | |
"# One marker symbol per professor rank and one color per gender\n", | |
"symbols = [\"circle\", \"cross\", \"x\"]\n", | |
"colors = [\"#F22\", \"#22F\"]\n", | |
"\n", | |
"traces = []\n", | |
"for i, gender in enumerate(genders):\n", | |
"    for j, rank in enumerate(ranks):\n", | |
"        data = salary[(salary['gender'] == gender) & (salary['professor_rank'] == rank)]\n", | |
"\n", | |
"        traces.append(go.Scatter(\n", | |
"            name=gender + '-' + rank,\n", | |
"            x=data.years_in_job.values,\n", | |
"            y=data.yearly_salary.values,\n", | |
"            mode='markers',\n", | |
"            marker=dict(\n", | |
"                color=colors[i],\n", | |
"                symbol=symbols[j],\n", | |
"                size=data.years_since_degree.values\n", | |
"            )\n", | |
"        ))\n", | |
"\n", | |
"layout = go.Layout(\n", | |
"    title='Salaries'\n", | |
")\n", | |
"\n", | |
"fig = go.Figure(data=traces, layout=layout)\n", | |
"plotly.offline.iplot(fig)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# 2. NBA\n", | |
"\n", | |
"We will use the 'nba.tsv' dataset, which gives the probability that a given team wins at a given minute of the game. Each row represents a team and each column a minute of the game (from 0 to 48).\n", | |
"\n", | |
"We want a plot with the minutes of an NBA game on the x axis and, on the y axis, each team's probability of winning at that minute.\n", | |
"\n", | |
"We will use the ``Scatter`` trace type from ``plotly``." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"nba = pd.read_csv('nba.tsv', sep='\\t')\n", | |
"\n", | |
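"# The first column holds the team name; the remaining columns are the minutes\n", | |
"xaxis = nba.columns.values[1:]\n", | |
"\n", | |
"# One line trace per team (one row of the dataset per team)\n", | |
"data = [go.Scatter(name=row.team, x=xaxis, y=row.values[1:]) for _, row in nba.iterrows()]\n", | |
"\n", | |
"layout = go.Layout(\n", | |
"    title='NBA',\n", | |
"    xaxis=dict(title='Minutes'),\n", | |
"    yaxis=dict(title='Percentage')\n", | |
")\n", | |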
"\n", | |
"fig = go.Figure(data=data, layout=layout)\n", | |
"plotly.offline.iplot(fig)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# 3. Maps\n", | |
"\n", | |
"Using the dataset from Challenge 1, generate a plot of the map of the United States that shows, by color intensity/divergence, the average student scores in each state for one of the tests.\n", | |
"\n", | |
"Use the ``choropleth`` structure and base it on the example from the demo." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"df = pd.read_csv('sat_scores.csv')\n", | |
"df.head()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"scl = [\n", | |
"    [0.0, 'rgb(242,240,247)'],\n", | |
"    [0.2, 'rgb(218,218,235)'],\n", | |
"    [0.4, 'rgb(188,189,220)'],\n", | |
"    [0.6, 'rgb(158,154,200)'],\n", | |
"    [0.8, 'rgb(117,107,177)'],\n", | |
"    [1.0, 'rgb(84,39,143)']\n", | |
"]\n", | |
"\n", | |
"df['text'] = df['State'] + '<br>' + \\\n", | |
"    'Rate ' + df['Rate'].astype(str) + '<br>' + \\\n", | |
"    'Math ' + df['Math'].astype(str) + '<br>' + \\\n", | |
"    'Verbal ' + df['Verbal'].astype(str)\n", | |
"\n", | |
"data = [\n", | |
"    dict(\n", | |
"        type='choropleth',\n", | |
"        colorscale=scl,\n", | |
"        autocolorscale=False,\n", | |
"        locations=df['State'],\n", | |
"        z=df['Rate'],\n", | |
"        locationmode='USA-states',\n", | |
"        text=df['text'],\n", | |
"        marker=dict(\n", | |
"            line=dict(\n", | |
"                color='rgb(255,255,255)',\n", | |
"                width=2\n", | |
"            )\n", | |
"        ),\n", | |
"        colorbar=dict(\n", | |
"            title=\"Rate\"\n", | |
"        )\n", | |
"    )\n", | |
"]\n", | |
"\n", | |
"layout = dict(\n", | |
"    title='2011 SAT scores',\n", | |
"    geo=dict(\n", | |
"        scope='usa',\n", | |
"        projection=dict(type='albers usa'),\n", | |
"        showlakes=True,\n", | |
"        lakecolor='rgb(255, 255, 255)'\n", | |
"    ),\n", | |
")\n", | |
"\n", | |
"fig = dict(data=data, layout=layout)\n", | |
"plotly.offline.iplot(fig)" | |
] | |
} | |
], | |
"metadata": { | |
"anaconda-cloud": {}, | |
"kernelspec": { | |
"display_name": "Python 2", | |
"language": "python", | |
"name": "python2" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 2 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython2", | |
"version": "2.7.13" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |