Skip to content

Instantly share code, notes, and snippets.

@scottyla19
Last active March 31, 2022 10:01
Show Gist options
  • Save scottyla19/7f7a2572e02f152226a8cdf444ef6394 to your computer and use it in GitHub Desktop.
Save scottyla19/7f7a2572e02f152226a8cdf444ef6394 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import sqlalchemy\n",
"sqlalchemy.create_engine('mysql+mysqlconnector://root:secret@localhost:3306/lahmansbaseballdb')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"%reload_ext sql"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"%sql mysql+mysqlconnector://root:secret@localhost:3306/lahmansbaseballdb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lahman's Baseball db SQL practice\n",
"I wanted to brush off some of the rust for my SQL skills and I figured the Lahman Baseball database offers a decent database to practice. I downloaded the install script from here and found some practice exercises [here](https://github.com/saifislam1/Lahman-Baseball). The rest of this post are my attempts to solve the problems. ~~Play Ball!~~ Query data!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 1: \n",
"#### What range of years does the provided database cover?"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * mysql+mysqlconnector://root:***@localhost:3306/lahmansbaseballdb\n",
"1 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>first_year</th>\n",
" <th>last_year</th>\n",
" </tr>\n",
" <tr>\n",
" <td>1871</td>\n",
" <td>2019</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(1871, 2019)]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql \n",
"SELECT \n",
" MIN(yearID) AS first_year, MAX(yearID) AS last_year\n",
"FROM\n",
" batting;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 2: \n",
"#### Find the name and height of the shortest player in the database. How many games did he play in? What is the name of the team for which he played?"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * mysql+mysqlconnector://root:***@localhost:3306/lahmansbaseballdb\n",
"1 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>playerID</th>\n",
" <th>nameFirst</th>\n",
" <th>nameLast</th>\n",
" <th>height</th>\n",
" <th>games_played</th>\n",
" <th>name</th>\n",
" </tr>\n",
" <tr>\n",
" <td>gaedeed01</td>\n",
" <td>Eddie</td>\n",
" <td>Gaedel</td>\n",
" <td>43</td>\n",
" <td>52</td>\n",
" <td>St. Louis Browns</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[('gaedeed01', 'Eddie', 'Gaedel', 43, Decimal('52'), 'St. Louis Browns')]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT \n",
" p1.playerID,\n",
" p1.nameFirst,\n",
" p1.nameLast,\n",
" p1.height,\n",
" SUM(apps.G_all) AS games_played,\n",
" t.name\n",
"FROM\n",
" people p1\n",
" INNER JOIN\n",
" (SELECT \n",
" MIN(people.height) AS min_height\n",
" FROM\n",
" people) p2 ON p1.height = p2.min_height\n",
" INNER JOIN\n",
" appearances AS apps ON p1.playerID = apps.playerID\n",
" INNER JOIN\n",
" teams t ON t.teamID = apps.teamID\n",
"GROUP BY p1.playerID , p1.nameFirst , p1.nameLast , p1.height , t.name;\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Question 3: \n",
"#### Find all players in the database who played at Vanderbilt University. Create a list showing each player’s first and last names as well as the total salary they earned in the major leagues. Sort this list in descending order by the total salary earned. Which Vanderbilt player earned the most money in the majors?"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * mysql+mysqlconnector://root:***@localhost:3306/lahmansbaseballdb\n",
"15 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>nameFirst</th>\n",
" <th>nameLast</th>\n",
" <th>total_salary</th>\n",
" </tr>\n",
" <tr>\n",
" <td>David</td>\n",
" <td>Price</td>\n",
" <td>245553888.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Pedro</td>\n",
" <td>Alvarez</td>\n",
" <td>62045112.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Scott</td>\n",
" <td>Sanderson</td>\n",
" <td>21500000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Mike</td>\n",
" <td>Minor</td>\n",
" <td>20512500.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Joey</td>\n",
" <td>Cora</td>\n",
" <td>16867500.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Mark</td>\n",
" <td>Prior</td>\n",
" <td>12800000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Ryan</td>\n",
" <td>Flaherty</td>\n",
" <td>12183000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Josh</td>\n",
" <td>Paul</td>\n",
" <td>7920000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Sonny</td>\n",
" <td>Gray</td>\n",
" <td>4627500.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Mike</td>\n",
" <td>Baxter</td>\n",
" <td>4188836.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Jensen</td>\n",
" <td>Lewis</td>\n",
" <td>3702000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Matt</td>\n",
" <td>Kata</td>\n",
" <td>3180000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Nick</td>\n",
" <td>Christiani</td>\n",
" <td>2000000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Jeremy</td>\n",
" <td>Sowers</td>\n",
" <td>1154400.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Scotti</td>\n",
" <td>Madison</td>\n",
" <td>540000.0</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[('David', 'Price', 245553888.0),\n",
" ('Pedro', 'Alvarez', 62045112.0),\n",
" ('Scott', 'Sanderson', 21500000.0),\n",
" ('Mike', 'Minor', 20512500.0),\n",
" ('Joey', 'Cora', 16867500.0),\n",
" ('Mark', 'Prior', 12800000.0),\n",
" ('Ryan', 'Flaherty', 12183000.0),\n",
" ('Josh', 'Paul', 7920000.0),\n",
" ('Sonny', 'Gray', 4627500.0),\n",
" ('Mike', 'Baxter', 4188836.0),\n",
" ('Jensen', 'Lewis', 3702000.0),\n",
" ('Matt', 'Kata', 3180000.0),\n",
" ('Nick', 'Christiani', 2000000.0),\n",
" ('Jeremy', 'Sowers', 1154400.0),\n",
" ('Scotti', 'Madison', 540000.0)]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT \n",
" p.nameFirst, p.nameLast, SUM(s.salary) total_salary\n",
"FROM\n",
" people p\n",
" INNER JOIN\n",
" salaries s ON s.playerID = p.playerID\n",
" INNER JOIN\n",
" collegeplaying cp ON p.playerID = cp.playerID\n",
" INNER JOIN\n",
" schools sc ON sc.schoolID = cp.schoolID\n",
"WHERE\n",
" sc.name_full = 'Vanderbilt University'\n",
"GROUP BY p.nameFirst , p.nameLast\n",
"ORDER BY total_salary DESC;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Question 4:\n",
"#### Using the fielding table, group players into three groups based on their position: label players with position OF as \"Outfield\", those with position \"SS\", \"1B\", \"2B\", and \"3B\" as \"Infield\", and those with position \"P\" or \"C\" as \"Battery\". Determine the number of putouts made by each of these three groups in 2016."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * mysql+mysqlconnector://root:***@localhost:3306/lahmansbaseballdb\n",
"3 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>POS_Group</th>\n",
" <th>Putouts</th>\n",
" </tr>\n",
" <tr>\n",
" <td>Battery</td>\n",
" <td>41424</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Infield</td>\n",
" <td>58935</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Outfield</td>\n",
" <td>29560</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[('Battery', Decimal('41424')),\n",
" ('Infield', Decimal('58935')),\n",
" ('Outfield', Decimal('29560'))]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT \n",
" CASE\n",
" WHEN POS = 'OF' THEN 'Outfield'\n",
" WHEN POS IN ('P' , 'C') THEN 'Battery'\n",
" ELSE 'Infield'\n",
" END AS POS_Group,\n",
" SUM(PO) AS 'Putouts'\n",
"FROM\n",
" fielding\n",
"WHERE\n",
" yearID = 2016\n",
"GROUP BY POS_Group;\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 5:\n",
"#### Find the average number of strikeouts per game by decade since 1920. Round the numbers you report to 2 decimal places. Do the same for home runs per game. Do you see any trends?"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * mysql+mysqlconnector://root:***@localhost:3306/lahmansbaseballdb\n",
"10 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>decade</th>\n",
" <th>K per Game</th>\n",
" <th>HR per Game</th>\n",
" </tr>\n",
" <tr>\n",
" <td>1920</td>\n",
" <td>2.81490875</td>\n",
" <td>0.40173063</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1930</td>\n",
" <td>3.31660500</td>\n",
" <td>0.54575125</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1940</td>\n",
" <td>3.54989625</td>\n",
" <td>0.52308313</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1950</td>\n",
" <td>4.39932063</td>\n",
" <td>0.84299375</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1960</td>\n",
" <td>5.71242879</td>\n",
" <td>0.82020455</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1970</td>\n",
" <td>5.14521423</td>\n",
" <td>0.74564065</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1980</td>\n",
" <td>5.34239077</td>\n",
" <td>0.80403538</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1990</td>\n",
" <td>6.15079065</td>\n",
" <td>0.96031367</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2000</td>\n",
" <td>6.56109333</td>\n",
" <td>1.07335433</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2010</td>\n",
" <td>7.81854300</td>\n",
" <td>1.06862133</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(1920, Decimal('2.81490875'), Decimal('0.40173063')),\n",
" (1930, Decimal('3.31660500'), Decimal('0.54575125')),\n",
" (1940, Decimal('3.54989625'), Decimal('0.52308313')),\n",
" (1950, Decimal('4.39932063'), Decimal('0.84299375')),\n",
" (1960, Decimal('5.71242879'), Decimal('0.82020455')),\n",
" (1970, Decimal('5.14521423'), Decimal('0.74564065')),\n",
" (1980, Decimal('5.34239077'), Decimal('0.80403538')),\n",
" (1990, Decimal('6.15079065'), Decimal('0.96031367')),\n",
" (2000, Decimal('6.56109333'), Decimal('1.07335433')),\n",
" (2010, Decimal('7.81854300'), Decimal('1.06862133'))]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT \n",
" FLOOR(yearID / 10) * 10 AS decade,\n",
" AVG(SO / G) 'K per Game',\n",
" AVG(HR / G) 'HR per Game'\n",
"FROM\n",
" teams\n",
"WHERE\n",
" yearID >= 1920\n",
"GROUP BY decade;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 6:\n",
"#### Find the player who had the most success stealing bases in 2016, where success is measured as the percentage of stolen base attempts which are successful. (A stolen base attempt results either in a stolen base or being caught stealing.) Consider only players who attempted at least 20 stolen bases."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * mysql+mysqlconnector://root:***@localhost:3306/lahmansbaseballdb\n",
"47 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>nameFirst</th>\n",
" <th>nameLast</th>\n",
" <th>sb_perc</th>\n",
" </tr>\n",
" <tr>\n",
" <td>Chris</td>\n",
" <td>Owings</td>\n",
" <td>0.91</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Brian</td>\n",
" <td>Dozier</td>\n",
" <td>0.90</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Rajai</td>\n",
" <td>Davis</td>\n",
" <td>0.88</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Billy</td>\n",
" <td>Hamilton</td>\n",
" <td>0.88</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Kevin</td>\n",
" <td>Kiermaier</td>\n",
" <td>0.88</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Mookie</td>\n",
" <td>Betts</td>\n",
" <td>0.87</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Paul</td>\n",
" <td>Goldschmidt</td>\n",
" <td>0.86</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Trea</td>\n",
" <td>Turner</td>\n",
" <td>0.85</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Keon</td>\n",
" <td>Broxton</td>\n",
" <td>0.85</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Hernan</td>\n",
" <td>Perez</td>\n",
" <td>0.83</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Josh</td>\n",
" <td>Harrison</td>\n",
" <td>0.83</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Eduardo</td>\n",
" <td>Nunez</td>\n",
" <td>0.82</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Wil</td>\n",
" <td>Myers</td>\n",
" <td>0.82</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Jarrod</td>\n",
" <td>Dyson</td>\n",
" <td>0.81</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Mike</td>\n",
" <td>Trout</td>\n",
" <td>0.81</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Alcides</td>\n",
" <td>Escobar</td>\n",
" <td>0.81</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Dee</td>\n",
" <td>Gordon</td>\n",
" <td>0.81</td>\n",
" </tr>\n",
" <tr>\n",
" <td>B. J.</td>\n",
" <td>Upton</td>\n",
" <td>0.80</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Brett</td>\n",
" <td>Gardner</td>\n",
" <td>0.80</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Leonys</td>\n",
" <td>Martin</td>\n",
" <td>0.80</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Starling</td>\n",
" <td>Marte</td>\n",
" <td>0.80</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Francisco</td>\n",
" <td>Lindor</td>\n",
" <td>0.79</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Jonathan</td>\n",
" <td>Villar</td>\n",
" <td>0.78</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Ian</td>\n",
" <td>Desmond</td>\n",
" <td>0.78</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Odubel</td>\n",
" <td>Herrera</td>\n",
" <td>0.78</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Jean</td>\n",
" <td>Segura</td>\n",
" <td>0.77</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Ryan</td>\n",
" <td>Braun</td>\n",
" <td>0.76</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Jose</td>\n",
" <td>Ramirez</td>\n",
" <td>0.76</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Jose</td>\n",
" <td>Altuve</td>\n",
" <td>0.75</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Todd</td>\n",
" <td>Frazier</td>\n",
" <td>0.75</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Elvis</td>\n",
" <td>Andrus</td>\n",
" <td>0.75</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Gregory</td>\n",
" <td>Polanco</td>\n",
" <td>0.74</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Freddy</td>\n",
" <td>Galvis</td>\n",
" <td>0.74</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Cameron</td>\n",
" <td>Maybin</td>\n",
" <td>0.71</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Jacoby</td>\n",
" <td>Ellsbury</td>\n",
" <td>0.71</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Travis</td>\n",
" <td>Jankowski</td>\n",
" <td>0.71</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Ian</td>\n",
" <td>Kinsler</td>\n",
" <td>0.70</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Ender</td>\n",
" <td>Inciarte</td>\n",
" <td>0.70</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Kevin</td>\n",
" <td>Pillar</td>\n",
" <td>0.70</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Jose</td>\n",
" <td>Peraza</td>\n",
" <td>0.68</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Bryce</td>\n",
" <td>Harper</td>\n",
" <td>0.68</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Rougned</td>\n",
" <td>Odor</td>\n",
" <td>0.67</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Mallex</td>\n",
" <td>Smith</td>\n",
" <td>0.67</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Charlie</td>\n",
" <td>Blackmon</td>\n",
" <td>0.65</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Brandon</td>\n",
" <td>Phillips</td>\n",
" <td>0.64</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Cesar</td>\n",
" <td>Hernandez</td>\n",
" <td>0.57</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Danny</td>\n",
" <td>Santana</td>\n",
" <td>0.57</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[('Chris', 'Owings', Decimal('0.91')),\n",
" ('Brian', 'Dozier', Decimal('0.90')),\n",
" ('Rajai', 'Davis', Decimal('0.88')),\n",
" ('Billy', 'Hamilton', Decimal('0.88')),\n",
" ('Kevin', 'Kiermaier', Decimal('0.88')),\n",
" ('Mookie', 'Betts', Decimal('0.87')),\n",
" ('Paul', 'Goldschmidt', Decimal('0.86')),\n",
" ('Trea', 'Turner', Decimal('0.85')),\n",
" ('Keon', 'Broxton', Decimal('0.85')),\n",
" ('Hernan', 'Perez', Decimal('0.83')),\n",
" ('Josh', 'Harrison', Decimal('0.83')),\n",
" ('Eduardo', 'Nunez', Decimal('0.82')),\n",
" ('Wil', 'Myers', Decimal('0.82')),\n",
" ('Jarrod', 'Dyson', Decimal('0.81')),\n",
" ('Mike', 'Trout', Decimal('0.81')),\n",
" ('Alcides', 'Escobar', Decimal('0.81')),\n",
" ('Dee', 'Gordon', Decimal('0.81')),\n",
" ('B. J.', 'Upton', Decimal('0.80')),\n",
" ('Brett', 'Gardner', Decimal('0.80')),\n",
" ('Leonys', 'Martin', Decimal('0.80')),\n",
" ('Starling', 'Marte', Decimal('0.80')),\n",
" ('Francisco', 'Lindor', Decimal('0.79')),\n",
" ('Jonathan', 'Villar', Decimal('0.78')),\n",
" ('Ian', 'Desmond', Decimal('0.78')),\n",
" ('Odubel', 'Herrera', Decimal('0.78')),\n",
" ('Jean', 'Segura', Decimal('0.77')),\n",
" ('Ryan', 'Braun', Decimal('0.76')),\n",
" ('Jose', 'Ramirez', Decimal('0.76')),\n",
" ('Jose', 'Altuve', Decimal('0.75')),\n",
" ('Todd', 'Frazier', Decimal('0.75')),\n",
" ('Elvis', 'Andrus', Decimal('0.75')),\n",
" ('Gregory', 'Polanco', Decimal('0.74')),\n",
" ('Freddy', 'Galvis', Decimal('0.74')),\n",
" ('Cameron', 'Maybin', Decimal('0.71')),\n",
" ('Jacoby', 'Ellsbury', Decimal('0.71')),\n",
" ('Travis', 'Jankowski', Decimal('0.71')),\n",
" ('Ian', 'Kinsler', Decimal('0.70')),\n",
" ('Ender', 'Inciarte', Decimal('0.70')),\n",
" ('Kevin', 'Pillar', Decimal('0.70')),\n",
" ('Jose', 'Peraza', Decimal('0.68')),\n",
" ('Bryce', 'Harper', Decimal('0.68')),\n",
" ('Rougned', 'Odor', Decimal('0.67')),\n",
" ('Mallex', 'Smith', Decimal('0.67')),\n",
" ('Charlie', 'Blackmon', Decimal('0.65')),\n",
" ('Brandon', 'Phillips', Decimal('0.64')),\n",
" ('Cesar', 'Hernandez', Decimal('0.57')),\n",
" ('Danny', 'Santana', Decimal('0.57'))]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT \n",
" p.nameFirst,\n",
" p.nameLast,\n",
" ROUND(b.SB / (b.SB + b.CS), 2) AS sb_perc\n",
"FROM\n",
" batting b\n",
" INNER JOIN\n",
" people p ON p.playerID = b.playerID\n",
"WHERE\n",
" b.SB + b.CS >= 20 AND b.yearID = 2016\n",
"ORDER BY sb_perc DESC;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 7:\n",
"#### From 1970 – 2019, what is the largest number of wins for a team that did not win the world series? What is the smallest number of wins for a team that did win the world series? Doing this will probably result in an unusually small number of wins for a world series champion – determine why this is the case. Then redo your query, excluding the problem year. How often from 1970 – 2016 was it the case that a team with the most wins also won the world series? What percentage of the time?"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * mysql+mysqlconnector://root:***@localhost:3306/lahmansbaseballdb\n",
"2 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>teamID</th>\n",
" <th>yearID</th>\n",
" <th>W</th>\n",
" </tr>\n",
" <tr>\n",
" <td>SEA</td>\n",
" <td>2001</td>\n",
" <td>116</td>\n",
" </tr>\n",
" <tr>\n",
" <td>SLN</td>\n",
" <td>2006</td>\n",
" <td>83</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[('SEA', 2001, 116), ('SLN', 2006, 83)]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT \n",
" t1.teamID, t1.yearID, t1.W\n",
"FROM\n",
" teams t1\n",
" INNER JOIN\n",
" (SELECT \n",
" t2.W, t2.teamID, t2.yearID\n",
" FROM\n",
" teams t2\n",
" WHERE\n",
" t2.WSWin = 'N' AND t2.yearID >= 1970\n",
" ORDER BY t2.W DESC\n",
" LIMIT 1) Max_w_no_WS ON t1.teamID = Max_w_no_WS.teamID\n",
" AND Max_w_no_WS.yearID = t1.yearID \n",
"UNION ALL SELECT \n",
" t1.teamID, t1.yearID, t1.W\n",
"FROM\n",
" teams t1\n",
" INNER JOIN\n",
" (SELECT \n",
" t2.W, t2.teamID, t2.yearID\n",
" FROM\n",
" teams t2\n",
" WHERE\n",
" t2.WSWin = 'Y' AND t2.yearID >= 1970\n",
" AND t2.G > 150\n",
" ORDER BY t2.W ASC\n",
" LIMIT 1) Min_w_WS ON t1.teamID = Min_w_WS.teamID\n",
" AND Min_w_WS.yearID = t1.yearID;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Part 2: How often from 1970 – 2016 was it the case that a team with the most wins also won the world series? What percentage of the time?"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * mysql+mysqlconnector://root:***@localhost:3306/lahmansbaseballdb\n",
"1 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>perc_most_wins_and_WS_champs</th>\n",
" </tr>\n",
" <tr>\n",
" <td>0.2653</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(Decimal('0.2653'),)]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT \n",
" COUNT(*) / (2019 - 1970) AS perc_most_wins_and_WS_champs\n",
"FROM\n",
" teams t1\n",
"WHERE\n",
" (t1.yearID , t1.W) IN (SELECT \n",
" t2.yearID, MAX(t2.W) AS max_wins\n",
" FROM\n",
" teams t2\n",
" GROUP BY t2.yearID)\n",
" AND t1.yearID >= 1970\n",
" AND t1.WSWin = 'Y'\n",
"ORDER BY t1.yearID DESC , t1.W DESC;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 8:\n",
"#### Using the attendance figures from the homegames table, find the teams and parks which had the top 5 average attendance per game in 2016 (where average attendance is defined as total attendance divided by number of games). Only consider parks where there were at least 10 games played. Report the park name, team name, and average attendance. Repeat for the lowest 5 average attendance."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * mysql+mysqlconnector://root:***@localhost:3306/lahmansbaseballdb\n",
"30 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>parkname</th>\n",
" <th>name</th>\n",
" <th>average_attendance</th>\n",
" </tr>\n",
" <tr>\n",
" <td>Dodger Stadium</td>\n",
" <td>Los Angeles Dodgers</td>\n",
" <td>49065.5432</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Busch Stadium III</td>\n",
" <td>St. Louis Cardinals</td>\n",
" <td>42967.8148</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Yankee Stadium II</td>\n",
" <td>New York Yankees</td>\n",
" <td>40795.1111</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Wrigley Field</td>\n",
" <td>Chicago Cubs</td>\n",
" <td>38208.2099</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Angel Stadium of Anaheim</td>\n",
" <td>Los Angeles Angels of Anaheim</td>\n",
" <td>37812.9241</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Coors Field</td>\n",
" <td>Colorado Rockies</td>\n",
" <td>36953.6296</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Miller Park</td>\n",
" <td>Milwaukee Brewers</td>\n",
" <td>36090.5309</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Fenway Park</td>\n",
" <td>Boston Red Sox</td>\n",
" <td>35402.3291</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Minute Maid Park</td>\n",
" <td>Houston Astros</td>\n",
" <td>35276.1358</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Citizens Bank Park</td>\n",
" <td>Philadelphia Phillies</td>\n",
" <td>33671.8642</td>\n",
" </tr>\n",
" <tr>\n",
" <td>AT&amp;T Park</td>\n",
" <td>San Francisco Giants</td>\n",
" <td>33429.1358</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Suntrust Park</td>\n",
" <td>Atlanta Braves</td>\n",
" <td>32776.7901</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Citi Field</td>\n",
" <td>New York Mets</td>\n",
" <td>30154.7160</td>\n",
" </tr>\n",
" <tr>\n",
" <td>PETCO Park</td>\n",
" <td>San Diego Padres</td>\n",
" <td>29585.1728</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Target Field</td>\n",
" <td>Minnesota Twins</td>\n",
" <td>28435.7901</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Nationals Park</td>\n",
" <td>Washington Nationals</td>\n",
" <td>27898.5309</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Chase Field</td>\n",
" <td>Arizona Diamondbacks</td>\n",
" <td>26364.3210</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Rangers Ballpark in Arlington</td>\n",
" <td>Texas Rangers</td>\n",
" <td>26333.2593</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Great American Ballpark</td>\n",
" <td>Cincinnati Reds</td>\n",
" <td>22473.3671</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Safeco Field</td>\n",
" <td>Seattle Mariners</td>\n",
" <td>22112.4568</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Rogers Centre</td>\n",
" <td>Toronto Blue Jays</td>\n",
" <td>21606.7160</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Progressive Field</td>\n",
" <td>Cleveland Indians</td>\n",
" <td>21464.7160</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Oakland-Alameda County Coliseum</td>\n",
" <td>Oakland Athletics</td>\n",
" <td>20626.3457</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Guaranteed Rate Field</td>\n",
" <td>Chicago White Sox</td>\n",
" <td>20622.1875</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Comerica Park</td>\n",
" <td>Detroit Tigers</td>\n",
" <td>18536.1728</td>\n",
" </tr>\n",
" <tr>\n",
" <td>PNC Park</td>\n",
" <td>Pittsburgh Pirates</td>\n",
" <td>18412.8272</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Kauffman Stadium</td>\n",
" <td>Kansas City Royals</td>\n",
" <td>18177.5625</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Oriole Park at Camden Yards</td>\n",
" <td>Baltimore Orioles</td>\n",
" <td>16145.7654</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Tropicana Field</td>\n",
" <td>Tampa Bay Rays</td>\n",
" <td>14552.2840</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Marlins Park</td>\n",
" <td>Miami Marlins</td>\n",
" <td>10016.0741</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[('Dodger Stadium', 'Los Angeles Dodgers', Decimal('49065.5432')),\n",
" ('Busch Stadium III', 'St. Louis Cardinals', Decimal('42967.8148')),\n",
" ('Yankee Stadium II', 'New York Yankees', Decimal('40795.1111')),\n",
" ('Wrigley Field', 'Chicago Cubs', Decimal('38208.2099')),\n",
" ('Angel Stadium of Anaheim', 'Los Angeles Angels of Anaheim', Decimal('37812.9241')),\n",
" ('Coors Field', 'Colorado Rockies', Decimal('36953.6296')),\n",
" ('Miller Park', 'Milwaukee Brewers', Decimal('36090.5309')),\n",
" ('Fenway Park', 'Boston Red Sox', Decimal('35402.3291')),\n",
" ('Minute Maid Park', 'Houston Astros', Decimal('35276.1358')),\n",
" ('Citizens Bank Park', 'Philadelphia Phillies', Decimal('33671.8642')),\n",
" ('AT&T Park', 'San Francisco Giants', Decimal('33429.1358')),\n",
" ('Suntrust Park', 'Atlanta Braves', Decimal('32776.7901')),\n",
" ('Citi Field', 'New York Mets', Decimal('30154.7160')),\n",
" ('PETCO Park', 'San Diego Padres', Decimal('29585.1728')),\n",
" ('Target Field', 'Minnesota Twins', Decimal('28435.7901')),\n",
" ('Nationals Park', 'Washington Nationals', Decimal('27898.5309')),\n",
" ('Chase Field', 'Arizona Diamondbacks', Decimal('26364.3210')),\n",
" ('Rangers Ballpark in Arlington', 'Texas Rangers', Decimal('26333.2593')),\n",
" ('Great American Ballpark', 'Cincinnati Reds', Decimal('22473.3671')),\n",
" ('Safeco Field', 'Seattle Mariners', Decimal('22112.4568')),\n",
" ('Rogers Centre', 'Toronto Blue Jays', Decimal('21606.7160')),\n",
" ('Progressive Field', 'Cleveland Indians', Decimal('21464.7160')),\n",
" ('Oakland-Alameda County Coliseum', 'Oakland Athletics', Decimal('20626.3457')),\n",
" ('Guaranteed Rate Field', 'Chicago White Sox', Decimal('20622.1875')),\n",
" ('Comerica Park', 'Detroit Tigers', Decimal('18536.1728')),\n",
" ('PNC Park', 'Pittsburgh Pirates', Decimal('18412.8272')),\n",
" ('Kauffman Stadium', 'Kansas City Royals', Decimal('18177.5625')),\n",
" ('Oriole Park at Camden Yards', 'Baltimore Orioles', Decimal('16145.7654')),\n",
" ('Tropicana Field', 'Tampa Bay Rays', Decimal('14552.2840')),\n",
" ('Marlins Park', 'Miami Marlins', Decimal('10016.0741'))]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT \n",
" parks.parkname,\n",
" teams.name,\n",
" homegames.attendance / homegames.games average_attendance\n",
"FROM\n",
" homegames\n",
" INNER JOIN\n",
" parks ON parks.ID = homegames.park_ID\n",
" INNER JOIN\n",
" teams ON teams.ID = homegames.team_ID\n",
"WHERE\n",
" games > 10 AND yearkey = 2019\n",
"ORDER BY average_attendance DESC;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 9:\n",
"#### Which managers have won the TSN Manager of the Year award in both the National League (NL) and the American League (AL)? Give their full name and the teams that they were managing when they won the award."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * mysql+mysqlconnector://root:***@localhost:3306/lahmansbaseballdb\n",
"6 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>Manager</th>\n",
" <th>award</th>\n",
" <th>year</th>\n",
" <th>league</th>\n",
" <th>teamID</th>\n",
" </tr>\n",
" <tr>\n",
" <td>Jim Leyland</td>\n",
" <td>TSN Manager of the Year</td>\n",
" <td>1988</td>\n",
" <td>NL</td>\n",
" <td>PIT</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Jim Leyland</td>\n",
" <td>TSN Manager of the Year</td>\n",
" <td>1990</td>\n",
" <td>NL</td>\n",
" <td>PIT</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Jim Leyland</td>\n",
" <td>TSN Manager of the Year</td>\n",
" <td>1992</td>\n",
" <td>NL</td>\n",
" <td>PIT</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Jim Leyland</td>\n",
" <td>TSN Manager of the Year</td>\n",
" <td>2006</td>\n",
" <td>AL</td>\n",
" <td>DET</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Davey Johnson</td>\n",
" <td>TSN Manager of the Year</td>\n",
" <td>1997</td>\n",
" <td>AL</td>\n",
" <td>BAL</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Davey Johnson</td>\n",
" <td>TSN Manager of the Year</td>\n",
" <td>2012</td>\n",
" <td>NL</td>\n",
" <td>WAS</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[('Jim Leyland', 'TSN Manager of the Year', 1988, 'NL', 'PIT'),\n",
" ('Jim Leyland', 'TSN Manager of the Year', 1990, 'NL', 'PIT'),\n",
" ('Jim Leyland', 'TSN Manager of the Year', 1992, 'NL', 'PIT'),\n",
" ('Jim Leyland', 'TSN Manager of the Year', 2006, 'AL', 'DET'),\n",
" ('Davey Johnson', 'TSN Manager of the Year', 1997, 'AL', 'BAL'),\n",
" ('Davey Johnson', 'TSN Manager of the Year', 2012, 'NL', 'WAS')]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT \n",
" CONCAT(p.nameFirst, ' ', p.nameLast) Manager,\n",
" am1.awardID award,\n",
" am1.yearID year,\n",
" am1.lgID league,\n",
" m.teamID\n",
"FROM\n",
" awardsmanagers am1\n",
" INNER JOIN\n",
" people p ON p.playerID = am1.playerID\n",
" INNER JOIN\n",
" managers m ON m.playerID = am1.playerID\n",
" AND m.yearID = am1.yearID\n",
"WHERE\n",
" am1.playerID IN (SELECT \n",
" am2.playerID\n",
" FROM\n",
" awardsmanagers am2\n",
" WHERE\n",
" am2.awardID = 'TSN Manager of the Year'\n",
" AND am2.lgID = 'NL')\n",
" AND am1.playerID IN (SELECT \n",
" am2.playerID\n",
" FROM\n",
" awardsmanagers am2\n",
" WHERE\n",
" am2.awardID = 'TSN Manager of the Year'\n",
" AND am2.lgID = 'AL')\n",
" AND am1.awardID = 'TSN Manager of the Year'"
]
}
],
"metadata": {
"interpreter": {
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
},
"kernelspec": {
"display_name": "Python 3.8.2 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment