Semicircle of semisegregation

In my previous plot, I noted that the ratio of black to white students drove most of the variability in the school demographic data. This can be seen here in this principal component analysis of the local school demographics data. Interestingly, the datapoints form a curved arc in two-dimensional PCA space, which distills down the information in the higher-dimensional space of traditional demographic labels.

Seems kind of crazy that the first two principal components capture almost all of the variability within the local school demographics data, yet that is supported by the corresponding scree plot.
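
For anyone wanting to check this kind of claim on their own data, here is a minimal sketch of how the scree plot values (fraction of variance per principal component) can be computed. It is not the code behind the plots in this post. It assumes the data is a table of per-school proportions across seven demographic groups, and it uses randomly generated stand-in data, so the group names, the number of schools, and the resulting percentages are all hypothetical.

```python
import numpy as np

# Hypothetical stand-in data: 200 schools x 7 demographic groups.
# Each row is a set of proportions that sums to 1 (an assumption about the data format).
groups = ["asian", "black", "hispanic", "native_american",
          "pacific_islander", "mixed", "white"]
rng = np.random.default_rng(0)
proportions = rng.dirichlet(alpha=np.ones(len(groups)), size=200)

# PCA by hand: center each column, then take the singular value decomposition.
# The rows of vt are the principal axes; s holds the singular values.
centered = proportions - proportions.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Fraction of total variance captured by each component (the scree plot values).
explained = s**2 / np.sum(s**2)

# Coordinates of each school in principal component space.
scores = centered @ vt.T

for i, frac in enumerate(explained[:3], start=1):
    print(f"PC{i}: {frac:.1%} of variance")
```

With real data, plotting `scores[:, 0]` against `scores[:, 1]` would give the two-dimensional PCA view, and `explained` would give the bar heights for the scree plot. Because the proportions in each row sum to 1, the last component carries essentially zero variance.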

So what are the loadings for the first two principal components? Here’s a plot of that now:

So in other words: The first principal component reflects how many non-white and non-black students are in a given school. Generally speaking, the higher the value of PC1, the fewer Asian, Hispanic, Native American, Pacific Islander, or mixed-background students are in that school. The second principal component separates the schools based on the relative proportions of black vs white students. Generally speaking, a positive PC2 value means there are more black students at that school, whereas a negative value means there are mostly white students.
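
As a rough illustration of how loadings like these can be read off, here is a small sketch that prints the weight each demographic group gets in the first two principal components. Again, this is not the code used for this post: the data is randomly generated, and the group names and the proportions-per-school format are assumptions, so the printed numbers will not match the loadings described above.

```python
import numpy as np

# Hypothetical stand-in data: 200 schools x 7 demographic groups, rows sum to 1.
groups = ["asian", "black", "hispanic", "native_american",
          "pacific_islander", "mixed", "white"]
rng = np.random.default_rng(1)
proportions = rng.dirichlet(alpha=np.ones(len(groups)), size=200)

# Center the columns and take the SVD; row k of vt is the loading vector for PC(k+1).
centered = proportions - proportions.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)

# Pair each group name with its weight in the first two components.
loadings = {f"PC{k + 1}": dict(zip(groups, vt[k])) for k in range(2)}

for pc, weights in loadings.items():
    print(pc)
    # Sort from most negative to most positive to make the contrast easy to see.
    for group, weight in sorted(weights.items(), key=lambda kv: kv[1]):
        print(f"  {group:>17s} {weight:+.3f}")
```

Two things are worth keeping in mind when reading loadings from proportion data. The sign of a principal component is arbitrary, so "higher PC1 means fewer students of group X" could just as well come out flipped in another run or library. And because each school's proportions sum to 1, the loadings within a component sum to zero, so every component is necessarily a contrast between some groups and others.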

But ya, kind of crazy that those two principal components explain almost all the demographic variability of schools (and cities?) in this area: 1) What is the relative ratio of black and white students, and 2) What total proportion of non-black and non-white students there are at the school.