College Demographics

As I’ve applied for faculty positions in different departments at colleges / universities / institutes across the country, I’ve thought a bit about the demographics of the students at each place. I remembered that the National Center for Education Statistics has a pretty nice searchable dataset for finding out exactly this information. So I pulled data for a subset of colleges / universities (a much larger set than the schools I have actually applied to), looking specifically at their seven self-identified racial / ethnic categories. I ran a PCA to reduce the dimensionality down to something I can visualize in a scatterplot (schools are colored according to somewhat arbitrary geographical groupings, except for Ivy League schools):
Some things to note:
1) Some points were clear outliers, and I pulled them in somewhat to keep the plot interpretable. The University of Hawaii was really different due to its number of self-identified mixed-race students, whereas the historically black colleges were really different due to their large number of self-identified black students. I didn’t feel I needed to manually adjust the scale for the University of New Mexico and New Mexico State, though they had a large proportion of self-identified Hispanic students (without the correspondingly high number of Asian students that you observe at many of the California schools).
2) Coming from California, I had never quite appreciated the general lack of self-identified racial / ethnic diversity at the schools in the Mountain region and the Midwest (collectively, the minority populations typically make up less than a quarter of the student body).
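
For anyone who wants to try this kind of analysis, here is a minimal Python sketch of a two-component PCA on per-school category shares. Several things in it are assumptions rather than anything taken from the actual analysis: the school shares are invented placeholders (three categories instead of seven, and not NCES numbers), the function names `pca_2d` and `clip` are hypothetical, power iteration is swapped in for a library PCA routine so the example has no dependencies, and clipping the scores is only one guess at what "pulling in" an outlier could look like.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(rows):
    """Subtract each column's mean so PCA describes variation around the average school."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[dot(ci, cj) / (n - 1) for cj in cols] for ci in cols]

def top_component(matrix, iterations=500):
    """Power iteration: returns (unit eigenvector, eigenvalue) for the largest eigenvalue."""
    d = len(matrix)
    # Non-uniform start: shares sum to 1, so an all-ones vector would be
    # mapped to zero by the covariance matrix.
    v = [1.0 / (k + 1) for k in range(d)]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in matrix])
    return v, eigenvalue

def pca_2d(rows):
    """Return (scores, components) for the first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    pc1, lam1 = top_component(cov)
    d = len(cov)
    # Deflation: remove the variance explained by pc1, then find the next direction.
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2, _ = top_component(deflated)
    scores = [(dot(row, pc1), dot(row, pc2)) for row in centered]
    return scores, (pc1, pc2)

def clip(value, limit):
    """Pull an extreme score back to +/- limit (an assumed way to tame outliers)."""
    return max(-limit, min(limit, value))

# Invented placeholder shares for five hypothetical schools; each row sums to 1.
shares = [
    [0.6, 0.3, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.2, 0.6],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
]

scores, components = pca_2d(shares)
for pc1_score, pc2_score in scores:
    # Clip to an arbitrary limit of 0.4 purely for illustration.
    print(round(clip(pc1_score, 0.4), 3), round(clip(pc2_score, 0.4), 3))
```

The sign of each component is arbitrary, so a plot made from these scores may come out mirrored relative to one produced by another PCA implementation. With real data, a library routine would be the more practical choice.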

Here’s what the plot of the first two principal components look like once I remove UH, Howard, and Morehouse:

Lastly, here’s the data table if anyone else wants to look at the actual numbers.

Edit 11/14/2020: I added this GitHub repo here for educational purposes.