@tervay
Created April 23, 2021 07:32
Articles

[[toc]]

How does TBA serve data?

The Blue Alliance (TBA) stores its data in a database, such as Google Cloud Datastore. Databases themselves are typically heavily secured so that bad actors can't edit them directly, corrupt the data, and break the site. Instead, TBA exposes special URLs that return raw data, just as other URLs return a webpage. That data comes back as JSON, a widely standardized format that many programming languages can understand.

You can view an example of a JSON response here: https://www.thebluealliance.com/api/v3/status

However, you'll see an error:

{"Error": "X-TBA-Auth-Key is a required header or URL param. Please get an access key at http://www.thebluealliance.com/account."}

This is because our request to the server didn't have all the information that the server wanted. Specifically, we are missing a header that tells the server who we are. We have to identify ourselves so that TBA (and many other sites that use API authentication) can track which users are requesting the most data. While TBA doesn't really do anything with this information, more popular APIs, such as the official League of Legends API or the Google Maps API, can limit how much data is returned to each user in order to prevent servers from being overwhelmed.
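To make the idea of a "header" a bit more concrete, here is a minimal sketch that builds (but does not send) a request carrying the X-TBA-Auth-Key header, using only Python's standard library. The helper name build_status_request and the placeholder key are assumptions made for illustration.

```python
import urllib.request

STATUS_URL = "https://www.thebluealliance.com/api/v3/status"


def build_status_request(api_key):
    """Build (but do not send) a request that carries the auth header."""
    return urllib.request.Request(STATUS_URL, headers={"X-TBA-Auth-Key": api_key})


# "ThisIsMyAPIKey" is a placeholder, not a real key.
request = build_status_request("ThisIsMyAPIKey")
print(request.full_url)
print(request.header_items())

# Sending it needs a real key and network access, so it is left commented out:
# import json
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Note that urllib changes the capitalisation of the header name when it stores it. HTTP header names are case-insensitive, so the server should treat it the same way.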

Don't worry about the error right now - we'll fix it in our upcoming code.

Setting up our development environment

Let's stick to the basics. We'll use Python since it's easy to pick up. If you already have a Python environment set up, feel free to skip ahead.

We'll use the PyCharm educational IDE, since it's a great way for new Python programmers to learn. There is both an educational and a regular version of PyCharm, so feel free to grab whichever one you would prefer. The educational version has some simpler UIs and easier-to-navigate options, as well as coming with a Python version in the installer.

When installing, you'll come across this menu screen:

[Screenshot: installer options, including Choose Python version]

Note that my options for Choose Python version are greyed out, since I already have both of those Python versions installed. You should choose to install Python 3.8 if you have not installed it already (Python 2.7 is no longer officially supported). If you don't install any Python version, you won't be able to run the code!

[Screenshot: role selection with the Learner option]

Make sure you select Learner here.

[Screenshot: welcome screen with the New Project button]

Click New Project.

[Screenshot: New Project dialog]

Let's take a moment to examine what all this means:

  • Location is simply the folder on your hard drive where your project will be saved.
  • You'll see 3 options for what to use for a "new environment" - for this, we'll simply use Virtualenv, or "virtual environment" - this will create a copy of the Python installation inside the folder you specified in Location. The reason for this is a bit out of scope for this tutorial, but effectively it allows for easier separation of concerns when having multiple Python projects on your hard drive that each use their own external libraries.
  • The Base interpreter is simply which Python version will be copied to the virtual environment. This is used when you have multiple versions of Python installed, such as Python 3.8 and Python 2.7.
  • Make sure you select the last option to create a welcome script.
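Once the project exists, one way to confirm that your script really runs inside the new virtual environment is to compare two interpreter paths. This is a small sketch, assuming Python 3.3 or newer and an environment created by a recent virtualenv or venv. The function name in_virtual_environment is made up for this example.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Python version:", sys.version.split()[0])
print("Virtual environment:", in_virtual_environment())
```

If the last line prints False, the script is probably running with the base interpreter rather than the one in your project folder.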

You'll see a window like this:

[Screenshot: PyCharm window with the welcome script open]

The first thing to do to make your life a little easier is go to View -> Appearance -> Toolbar:

[Screenshot: View -> Appearance -> Toolbar menu]

Now, we can click the magical green play button to run the script!

[Screenshot: green play button in the toolbar]

And you should see the console output this message:

Hi, PyCharm

A quick note on Python

As much as I would love to teach you the basics of Python syntax, it's out of scope of what this guide is intended for. There are tons and tons of resources for learning the basics of Python, and it's a very easy language to learn. If you aren't familiar with Python at all, you're free to continue, but I won't be explaining every aspect of the code (such as iterating through dictionaries and lists).

Installing a third-party library

While it's possible to get data from a web API with the Python standard library, it's a lot easier with third-party libraries. My personal favorite and recommended library is tbapy. In order to install it, follow these steps:

  1. Go to File -> Settings.
  2. Navigate to Project: myProject -> Python Interpreter.
  3. You'll see that we have two default libraries installed - pip and setuptools. pip is used to download other libraries, and setuptools is used to install and distribute libraries.
  4. Click the + on the right side.
  5. Search for tbapy.
  6. Click Install Package.
  7. When it's installed successfully, you can close these windows. You may note that you have several more libraries installed, such as CacheControl, certifi, chardet, and others - these are all libraries used by tbapy, so they had to be installed too.

Getting a TBA API Key

You'll want to get a Read API Key from here: https://www.thebluealliance.com/account

Copy the value under X-TBA-Auth-Key - that's your API key (and what the error earlier was complaining about you not having!).

[Screenshot: Read API Keys on the TBA account page]

Getting our first piece of data

Back to PyCharm. The first thing we have to do is import tbapy so that our script knows what we are talking about:

import tbapy

After that, we have to tell tbapy what our API key is so that it can go and get data for us:

tba = tbapy.TBA("ThisIsMyAPIKey")

This creates an instance of the TBA class and assigns it to the tba variable. The TBA class has lots of helper methods to help us get data, such as status:

print(tba.status())

After running, your IDE should look like this:

[Screenshot: PyCharm showing the script and its console output]

You'll see that the console successfully output some information regarding the status of the TBA API! This probably isn't information you actually care about, so let's try getting information about a team...

import tbapy  
  
tba = tbapy.TBA("ThisIsMyAPIKey")  
print(tba.team(2791))

Feel free to swap out 2791 for your own team number. You should get an output like this:

Team({'address': None, 'city': 'Latham', 'country': 'USA', 'gmaps_place_id': None, 'gmaps_url': None, 'home_championship': {'2020': 'Detroit'}, 'key': 'frc2791', 'lat': None, 'lng': None, 'location_name': None, 'motto': None, 'name': 'GE Energy (Power and Water)/gcom Software/Google/PVA/The Colden Company Inc/CAPCOM Federal Credit Union/NYSUT/Crisafulli Brothers/Atlas Copco/Price Chopper/Market 32&Shaker High School', 'nickname': 'Shaker Robotics', 'postal_code': '12110', 'rookie_year': 2009, 'school_name': 'Shaker High School', 'state_prov': 'New York', 'team_number': 2791, 'website': 'http://www.team2791.org'})

Which looks great! You'll notice that it looks pretty similar to JSON, except it's wrapped in a Team( ... ) block -- this is just because it's actually printing an instance of a Team class that tbapy has implemented, and it simply internally stores the JSON representation of a team. You can interact with this Team instance just as if it were a plain JSON object.
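As a rough illustration of "interacting with it like JSON", the sketch below uses a plain dict as a stand-in for the Team instance, filled with a few of the fields from the output above. The describe helper is hypothetical. With tbapy, the assumption is that the same square-bracket lookups work on the object returned by tba.team(2791).

```python
# Stand-in for the Team instance: a plain dict with some fields from the output above.
# With tbapy this would come from: team = tba.team(2791)
team = {
    "city": "Latham",
    "key": "frc2791",
    "nickname": "Shaker Robotics",
    "rookie_year": 2009,
    "state_prov": "New York",
    "team_number": 2791,
}


def describe(team):
    """Build a short one-line summary from a team blob (hypothetical helper)."""
    return f"{team['nickname']} ({team['key']}) from {team['city']}, {team['state_prov']}"


print(team["nickname"])   # look up a single field by its key
print(describe(team))

# Walk over every field, the same way you would with any dictionary.
for field, value in team.items():
    print(field, "=", value)
```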

One minor nitpick is that it's all printed on one line, so it can be a little hard for the human eye to parse. Luckily, Python has a helper for that! We'll use pprint (short for "pretty print"):

[Screenshot: script using pprint and its output]
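The original script is only available as a screenshot, so here is a minimal sketch of what using pprint can look like. To keep it runnable without an API key, it pretty-prints a plain dict holding some of team 2791's fields from the earlier output. With tbapy, the assumption is that you would pass the result of tba.team(2791) to pprint in the same way.

```python
from pprint import pformat, pprint

# A few fields copied from the team output shown earlier.
team = {
    "city": "Latham",
    "country": "USA",
    "key": "frc2791",
    "nickname": "Shaker Robotics",
    "postal_code": "12110",
    "rookie_year": 2009,
    "school_name": "Shaker High School",
    "state_prov": "New York",
    "team_number": 2791,
    "website": "http://www.team2791.org",
}

# print() would put everything on one long line; pprint() breaks it up
# once the output no longer fits within its default line width.
pprint(team)

# pformat() returns the same text as a string instead of printing it.
text = pformat(team)
```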

Now we can much more easily see the JSON blob representing team 2791.

How much is in the API?

A lot! You can see the full API documentation here: https://www.thebluealliance.com/apidocs/v3

And the tbapy documentation here: https://github.com/frc1418/tbapy#retrieval-functions

So, let's look at the tbapy API docs and explain what those function arguments mean.

For example:

[Screenshot: tbapy documentation entry for tba.team]

Here, tba.team is exactly what we used in the last script we ran! 2791 (or whichever team you put) is the team parameter. You'll note that the tbapy developers documented that you can provide either the team number (2791) or the team key, which would be the string "frc2791". Team keys are pretty common within the TBA API and the official FRC APIs.
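The relationship between a team number and a team key is simple enough to sketch in a few lines. The two helpers below are hypothetical (they are not part of tbapy) and assume keys always look like "frc" followed by digits.

```python
def team_key(team):
    """Return the 'frcNNNN' key for a team number; pass existing keys through."""
    if isinstance(team, int):
        return f"frc{team}"
    return team


def team_number(key):
    """Return the numeric part of a key such as 'frc2791'."""
    if not key.startswith("frc"):
        raise ValueError(f"not a team key: {key!r}")
    return int(key[3:])


print(team_key(2791))
print(team_number("frc2791"))
```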

Next, you'll notice [simple] - the brackets simply mean it's an optional parameter. Since the official TBA API has multiple team API endpoints, tbapy simply condensed it into one method. You can see the endpoints on the official TBA docs page:

[Screenshot: team endpoints in the TBA API docs]

Clicking on each of those tells you what type of data they each return - in this case, the first method returns a Team object, while the second returns a TeamSimple object. To find out what these mean, scroll all the way down to the bottom of the TBA page to find these:

[Screenshot: Team and TeamSimple models in the TBA API docs]

You'll see that TeamSimple just returns less data, in case you don't need all of it from the full Team model.

So let's print the simple version of our team. To provide the optional simple parameter, simply replace tba.team(2791) with tba.team(2791, simple=True):

[Screenshot: script and output using simple=True]
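As a sketch of that change in code form, the helper below takes the tba object as a parameter so the example can be read (and run) without an API key. The function name print_simple_team is made up for illustration, and the commented usage lines assume tbapy is installed and you have a valid key.

```python
def print_simple_team(tba, team):
    """Fetch the reduced (simple) team model and print it."""
    # simple=True asks tbapy for the smaller TeamSimple data instead of the full Team.
    simple_team = tba.team(team, simple=True)
    print(simple_team)
    return simple_team


# Usage with a real client (requires tbapy, a real API key, and network access):
# import tbapy
# tba = tbapy.TBA("ThisIsMyAPIKey")
# print_simple_team(tba, 2791)
```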

Analyzing the data

Let's say we want to find the average number of district points our team has earned across our 2019 events. First, we need to find all our 2019 events:

import tbapy  
from pprint import pprint  
  
tba = tbapy.TBA("ThisIsMyAPIKey")  
all_events = tba.team_events(2168, year=2019)

Here, we are getting all events that team 2168 attended in 2019. An example event model looks like:

{'address': '100 Institute Rd, Worcester, MA 01609, USA',
  'city': 'Worcester',
  'country': 'USA',
  'district': {'abbreviation': 'ne',
               'display_name': 'New England',
               'key': '2019ne',
               'year': 2019},
  'division_keys': [],
  'end_date': '2019-04-13',
  'event_code': 'necmp',
  'event_type': 2,
  'event_type_string': 'District Championship',
  'first_event_code': 'NECMP',
  'first_event_id': None,
  'gmaps_place_id': 'ChIJdzY3EFkG5IkRrW3cc4Yhw8Y',
  'gmaps_url': 'https://maps.google.com/?cid=14322328101321469357',
  'key': '2019necmp',
  'lat': 42.2745754,
  'lng': -71.8062724,
  'location_name': 'Worcester Polytechnic Institute',
  'name': 'New England District Championship',
  'parent_event_key': None,
  'playoff_type': 0,
  'playoff_type_string': 'Elimination Bracket (8 Alliances)',
  'postal_code': '01609',
  'short_name': 'New England',
  'start_date': '2019-04-10',
  'state_prov': 'MA',
  'timezone': 'America/New_York',
  'webcasts': [{'channel': 'nefirst_red', 'type': 'twitch'},
               {'channel': 'nefirst_blue', 'type': 'twitch'}],
  'website': 'http://www.nefirst.org/',
  'week': 6,
  'year': 2019}

There is some important info here - notably, it tells us that this was a district championship, and that it was in the 2019ne district, and also various other details. However, team_events gets us all events - including offseasons, which we don't want district points for.

Comparably, an offseason event blob may look like this:

{'address': '7802 Hague Rd, Indianapolis, IN 46256, USA',
  'city': 'Indianapolis',
  'country': 'USA',
  'district': None,
  'division_keys': [],
  'end_date': '2019-07-13',
  'event_code': 'iri',
  'event_type': 99,
  'event_type_string': 'Offseason',
  'first_event_code': 'IRI',
  'first_event_id': None,
  'gmaps_place_id': 'ChIJt-fTNsJMa4gRk20afFmPaQU',
  'gmaps_url': 'https://maps.google.com/?cid=390000457241226643',
  'key': '2019iri',
  'lat': 39.8961663,
  'lng': -86.0349536,
  'location_name': 'Lawrence North High School',
  'name': 'Indiana Robotics Invitational',
  'parent_event_key': None,
  'playoff_type': 0,
  'playoff_type_string': 'Elimination Bracket (8 Alliances)',
  'postal_code': '46256',
  'short_name': 'Indiana Robotics Invitational',
  'start_date': '2019-07-12',
  'state_prov': 'IN',
  'timezone': 'America/Indiana/Indianapolis',
  'webcasts': [{'channel': 'firstinspires', 'type': 'twitch'}],
  'website': 'http://indianaroboticsinvitational.org/',
  'week': None,
  'year': 2019}

You'll note that 'district' is None, which is Python's equivalent of null in Java or nullptr in C++.

So let's iterate over all the events and print which ones are district events:

[Screenshot: loop that prints the district events]

You'll note that we can iterate over all events with for event in all_events. We can access a given event's district value with event["district"]. We check whether it's not None, which means it's an event that belongs to a district. We can then print a string with the name of the event embedded in it by using "f-strings": f'{event_name}' simply embeds the event_name variable into the string.
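The loop described here can be sketched as follows. To keep it runnable without an API key, it uses two trimmed-down event blobs taken from the examples above instead of a live tba.team_events call. The helper name district_events is an assumption for this example.

```python
def district_events(all_events):
    """Keep only the events whose 'district' value is not None."""
    return [event for event in all_events if event["district"] is not None]


# Trimmed-down copies of the two example event blobs shown earlier.
sample_events = [
    {
        "key": "2019necmp",
        "name": "New England District Championship",
        "district": {"abbreviation": "ne", "key": "2019ne", "year": 2019},
    },
    {
        "key": "2019iri",
        "name": "Indiana Robotics Invitational",
        "district": None,
    },
]

for event in district_events(sample_events):
    event_name = event["name"]
    print(f"{event_name} is a district event")
```

With real data you would pass all_events from tba.team_events(2168, year=2019) instead of sample_events.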

Unfortunately, our event blobs don't include district points information. That information is available elsewhere in the API, so we need to make another call. The district points data is structured like this:

[Screenshots: district points data]
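Since the structure is only shown as screenshots, here is a hedged sketch of its rough shape. Only the 'total' field is confirmed by this article. The other field names reflect my reading of the TBA API docs for an event's district points and should be treated as assumptions. The team key and all numbers are placeholders, not real results. The team_total helper is hypothetical.

```python
# Placeholder data: the team key and every number here are made up.
example_response = {
    "points": {
        "frc0000": {              # one entry per team key
            "alliance_points": 0,
            "award_points": 0,
            "elim_points": 0,
            "qual_points": 0,
            "total": 0,           # the field we care about
        },
    },
    "tiebreakers": {},
}


def team_total(district_points, team_key):
    """Return the team's 'total' for one event, or None if the team has no entry."""
    entry = district_points.get("points", {}).get(team_key)
    return None if entry is None else entry["total"]


print(team_total(example_response, "frc0000"))
```

With tbapy, my understanding is that a response of this shape comes from tba.event_district_points(event_key). Check the tbapy docs before relying on that name.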

We're nearly there! Now we just need to calculate an average of the 'total' field...

[Screenshot: script that computes the average]
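The averaging step can be sketched like this. It assumes tbapy exposes tba.event_district_points(event_key) and that the response has a "points" mapping keyed by team key. Both are assumptions, and the helper names are made up. The sketch keeps the same "district is not None" check the article describes, so it has the same limitation discussed next.

```python
from statistics import mean


def collect_totals(tba, team_key, events):
    """Gather the team's 'total' at every event that belongs to a district."""
    totals = []
    for event in events:
        if event["district"] is not None:
            points = tba.event_district_points(event["key"])
            # Treat a missing entry as 0 points (an assumption of this sketch).
            totals.append(points["points"].get(team_key, {}).get("total", 0))
    return totals


def average_total(totals):
    """Average the per-event totals; an empty list averages to 0.0."""
    return mean(totals) if totals else 0.0


# Usage with a real client (requires tbapy, a real API key, and network access):
# import tbapy
# tba = tbapy.TBA("ThisIsMyAPIKey")
# all_events = tba.team_events(2168, year=2019)
# print(average_total(collect_totals(tba, "frc2168", all_events)))
```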

Now here's the issue - we're including out-of-district events! 2168 is a New England team that participated in the 2019 Springside Chestnut Hill FMA event, where they didn't earn any district points. As a challenge to the reader, try calculating the average district points that 2168 earned at each New England (NE) event in 2019. Your answer should be 116.67.

Hint: Check the if statement on line 11. :)

[[toc]]

Accessing Bulk TBA Data

As you may have noticed, accessing the TBA API is pretty easy if you need to see one team or one event. But what if you wanted to see something else, like how often Team 254 has red bumpers? For questions like that, TBA mirrors its data to a publicly queryable data set available at https://thebluealliance.com/bigquery.

Simply put, it's a database that you can ask questions of. If you're familiar with querying other databases using SQL, it should feel fairly similar.

Description of Some Tables

Team

This table describes information about any FRC team in the database. Being listed doesn't necessarily mean that the team ever competed, and some teams only existed at offseason events. So, sadly, you can't get the list of all FRC teams just by looking here.

Event

This should be fairly obvious; the only big catch is that it includes offseason events. As a general trick, you can filter by event_type_enum < 7 to exclude unofficial events, or filter on specific values to include only certain types of events. There is also an official column that works just as well, but it doesn't let you exclude things like Einstein, which can sometimes skew data.

EventTeam

This is the join table that lets you go between Event and Team.

Match

This is where things get fun: it contains every match TBA has. The catch is that it's stored oddly because of how the Google infrastructure that TBA runs on holds data. This means that getting things like whether a team won a match can be difficult. This table joins back to Event, and through alliances_json you can also join back to Team.

Querying Data

You'll need to create a project and enable BigQuery in it; the good news is that this is free. At the time of writing, it's in a dropdown next to the words "Google Cloud Platform". Let's start with a simple query.

SELECT
  team.name
FROM
  `tbatv-prod-hrd.the_blue_alliance.eventTeam`
WHERE
  event.name = "2014abca"

This query is about as simple as it gets. It selects the name field of the team record from the EventTeam table where the name field of the event record is "2014abca". If we wanted to see a different list of teams, we could change the event name. The thing to note is that in both of these cases we are actually looking at a RECORD type stored on this table, so those names are really the names of the keys.
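If you would rather run this query from Python than from the web console, a rough sketch with the google-cloud-bigquery client library could look like the following. It assumes that package is installed and that you have a Google Cloud project with credentials configured. The function name teams_at_event is made up, and the event key is passed as a query parameter (@event_key) instead of being pasted into the SQL text.

```python
TEAMS_AT_EVENT_SQL = """
SELECT
  team.name
FROM
  `tbatv-prod-hrd.the_blue_alliance.eventTeam`
WHERE
  event.name = @event_key
"""


def teams_at_event(client, event_key):
    """Run the query above and return the team keys as a list of strings."""
    # Imported here so the rest of the sketch loads without the package installed.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("event_key", "STRING", event_key),
        ]
    )
    rows = client.query(TEAMS_AT_EVENT_SQL, job_config=job_config).result()
    return [row["name"] for row in rows]


# Usage (requires google-cloud-bigquery and configured credentials):
# from google.cloud import bigquery
# client = bigquery.Client()
# print(teams_at_event(client, "2014abca"))
```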

Joining Data

Listing teams that competed in a season is fairly easy: you just need to join to the Event table and filter on Event.year.

SELECT
 DISTINCT team.name
FROM
 `tbatv-prod-hrd.the_blue_alliance.eventTeam` AS ET
JOIN `tbatv-prod-hrd.the_blue_alliance.event` as E
 ON E.__key__.name = ET.event.name
WHERE
 E.year = 2016
AND E.event_type_enum < 7

When you join, you need to tell it which table you are joining in and on what. I typically find it nice to alias the tables with AS to minimize my typing.
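If joins are new to you, it may help to see the same idea in plain Python. The sketch below mimics the query above on simplified, flat rows. The row layout and the sample values are hypothetical and much simpler than the real tables.

```python
def teams_in_season(event_teams, events, year):
    """Mimic the JOIN: match each eventTeam row to its event, then filter."""
    # Index the event rows by key so each lookup plays the role of "ON E.key = ET.event".
    events_by_key = {event["key"]: event for event in events}
    teams = set()  # a set gives us DISTINCT for free
    for row in event_teams:
        event = events_by_key.get(row["event"])
        if event is None:
            continue  # no matching event row, so an inner join drops it
        if event["year"] == year and event["event_type_enum"] < 7:
            teams.add(row["team"])
    return teams


# Hypothetical rows, only to show the mechanics.
sample_event_teams = [
    {"team": "frc1", "event": "e1"},
    {"team": "frc2", "event": "e2"},
    {"team": "frc3", "event": "e3"},
]
sample_events = [
    {"key": "e1", "year": 2016, "event_type_enum": 0},
    {"key": "e2", "year": 2015, "event_type_enum": 0},
    {"key": "e3", "year": 2016, "event_type_enum": 99},
]
print(teams_in_season(sample_event_teams, sample_events, 2016))
```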

Helpful Queries

Create a table of team wins

Helpful if you need to find all wins by a team; mostly a base for aggregating the data with GROUP BY and COUNT clauses.

-- This is a gnarly function to determine if a team won a match. Bear in mind it specifically checks for a win:
-- !won can mean lost or tied, and it doesn't account for red cards.
CREATE TEMP FUNCTION
  won(team STRING,
    json STRING)
  RETURNS BOOL
  LANGUAGE js AS """
    var alliances = JSON.parse(json);
     var alliance = alliances.blue.teams.indexOf(team) >= 0? "blue": "red"
     var opp_alliance = alliance == "blue" ? "red" : "blue"
     var our_score = alliances[alliance].score;
     var opp_score = alliances[opp_alliance].score;
     return our_score > opp_score;
  """;
SELECT
  tm,
  M.__key__.name as match_id,
  won(tm,
    alliances_json) AS won,
  event.name as event_id,
FROM
  `tbatv-prod-hrd.the_blue_alliance.match` AS M,
  UNNEST(M.team_key_names) AS tm
JOIN `tbatv-prod-hrd.the_blue_alliance.event` AS E
  ON M.event.name = E.__key__.name
WHERE E.year >= 2005
-- Let's be honest, TBA data is sketchy prior to 2005 at best. 
-- Uncomment to limit to a specific team
-- AND tm = 'frc33'

-- Uncomment to limit to a specific event
-- AND E.__key__.name = "2014abca"

-- Uncomment to limit to a specific year
-- AND E.year = 2018

-- Filter out any off seasons, though you could just as easily only grab districts or such
-- https://github.com/the-blue-alliance/the-blue-alliance/blob/master/consts/event_type.py#L2
AND E.event_type_enum < 7
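For readers more comfortable with Python than with JavaScript inside SQL, here is a rough Python equivalent of the won function above, plus a small counter that plays the role of GROUP BY and COUNT. It assumes alliances_json has the shape the JavaScript relies on (a "blue" and a "red" entry, each with "teams" and "score"). The count_wins helper is hypothetical.

```python
import json
from collections import Counter


def won(team, alliances_json):
    """Return True only if the team's alliance scored strictly more than the other."""
    alliances = json.loads(alliances_json)
    # Same fallback as the JavaScript: anyone not on blue is treated as red.
    alliance = "blue" if team in alliances["blue"]["teams"] else "red"
    opp_alliance = "red" if alliance == "blue" else "blue"
    return alliances[alliance]["score"] > alliances[opp_alliance]["score"]


def count_wins(rows):
    """Count wins per team from (team, alliances_json) pairs."""
    return Counter(team for team, alliances_json in rows if won(team, alliances_json))
```

As with the SQL version, a tie counts as "not won" for both alliances.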

List teams at an event

Helpful for finding the list of teams at an event; just update the event name.

SELECT
  team.name
FROM
  `tbatv-prod-hrd.the_blue_alliance.eventTeam`
WHERE
  event.name = "2014abca"

Teams that attended BOTH events

This one shows a handy trick: if you construct two sets of things, you can do set intersections on them. This would work with most of the queries here.

SELECT
  team.name
FROM
  `tbatv-prod-hrd.the_blue_alliance.eventTeam`
WHERE
  event.name = "2018mimil"
  
INTERSECT DISTINCT 

SELECT
  team.name
FROM
  `tbatv-prod-hrd.the_blue_alliance.eventTeam`
WHERE
  event.name = "2018misou"
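If you already have the two team lists in Python (for example, from running each half of the query separately), the same intersection can be sketched with Python sets. The team keys below are hypothetical and only show the mechanics.

```python
def teams_at_both(teams_at_first, teams_at_second):
    """Python analogue of INTERSECT DISTINCT: keys present in both lists, no duplicates."""
    return set(teams_at_first) & set(teams_at_second)


# Hypothetical team keys.
first = ["frc1", "frc2", "frc3", "frc3"]
second = ["frc3", "frc4", "frc2"]
print(sorted(teams_at_both(first, second)))  # ['frc2', 'frc3']
```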

Teams in a season

Maybe less useful for most, but still handy to have. Bear in mind that ordering will be alphanumeric, not numeric.

SELECT
  DISTINCT team.name
FROM
  `tbatv-prod-hrd.the_blue_alliance.eventTeam` AS ET
JOIN `tbatv-prod-hrd.the_blue_alliance.event` as E
  ON E.__key__.name = ET.event.name
WHERE
  E.year = 2016
AND E.event_type_enum < 7
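The ordering caveat is easy to see in Python. This small sketch compares plain string sorting with sorting by the numeric part of each key. The helper name numeric_order is made up, and it assumes every key is "frc" followed only by digits.

```python
def numeric_order(team_keys):
    """Sort team keys by their team number instead of alphabetically."""
    return sorted(team_keys, key=lambda key: int(key[3:]))


keys = ["frc33", "frc254", "frc1114"]
print(sorted(keys))         # ['frc1114', 'frc254', 'frc33']
print(numeric_order(keys))  # ['frc33', 'frc254', 'frc1114']
```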

Events a team attended

SELECT
  DISTINCT event.name
FROM
  `tbatv-prod-hrd.the_blue_alliance.eventTeam` as ET
JOIN `tbatv-prod-hrd.the_blue_alliance.event` as E
  ON E.__key__.name = ET.event.name
WHERE
  E.year = 2016
  AND team.name = "frc33"
  AND E.event_type_enum < 7

Awards a team won

SELECT
  name_str,
  event.name
FROM
  `tbatv-prod-hrd.the_blue_alliance.award`,
  UNNEST(team_list) AS TM
WHERE
  TM.name = "frc33"
  AND year > 2005