Last active March 26, 2024 10:25
Python: create a confusion matrix across two columns in a Pandas dataframe having only categorical data
import pandas as pd


def confusion_matrix(df: pd.DataFrame, col1: str, col2: str) -> pd.DataFrame:
    """
    Given a dataframe with at least two categorical columns,
    create a confusion matrix of their cross-counts: one row
    per value of col1, one column per value of col2.

    Use like:

    >>> confusion_matrix(test_df, 'actual_label', 'predicted_label')
    """
    return (
        df
        # count the rows for each (col1, col2) pair
        .groupby([col1, col2])
        .size()
        # pivot col2 into columns; pairs that never occur become 0
        .unstack(fill_value=0)
    )
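A small usage sketch may help show what the function returns. The data in `test_df` is made up for illustration, and the function is repeated so the block runs on its own. The `pd.crosstab` call at the end is only a sanity check against a built-in that produces the same counts, not part of the gist.

```python
import pandas as pd


def confusion_matrix(df: pd.DataFrame, col1: str, col2: str) -> pd.DataFrame:
    # Same logic as the gist: count each (col1, col2) pair, then pivot col2 into columns.
    return df.groupby([col1, col2]).size().unstack(fill_value=0)


# Hypothetical example data (not from the gist).
test_df = pd.DataFrame({
    "actual_label": ["cat", "cat", "dog", "dog", "dog"],
    "predicted_label": ["cat", "dog", "dog", "dog", "cat"],
})

cm = confusion_matrix(test_df, "actual_label", "predicted_label")
print(cm)

# Sanity check: pandas' built-in cross-tabulation yields the same counts.
check = pd.crosstab(test_df["actual_label"], test_df["predicted_label"])
print((cm.values == check.values).all())
```

Rows are indexed by the values of the first column and columns by the values of the second, so `cm.loc["dog", "cat"]` reads as "actual dog, predicted cat".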
Very nice. I came up with
and was looking for a way to turn the resulting multi-index Series into a DataFrame, but I like your solution much better!
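The commenter's own snippet is not shown above, so the following is not a reconstruction of it. It is only a minimal sketch, on made-up data, of the step the comment describes: `groupby(...).size()` gives a Series with a two-level MultiIndex, and `.unstack(fill_value=0)` turns that Series into a DataFrame.

```python
import pandas as pd

# Hypothetical data; the column names are illustrative.
df = pd.DataFrame({
    "actual": ["a", "a", "b"],
    "predicted": ["a", "b", "b"],
})

# Step 1: a Series indexed by (actual, predicted) pairs.
# The pair ("b", "a") never occurs, so it is simply absent here.
counts = df.groupby(["actual", "predicted"]).size()
print(type(counts.index).__name__)

# Step 2: move the inner index level into columns.
# fill_value=0 supplies a count of 0 for the absent pair.
table = counts.unstack(fill_value=0)
print(table)
```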