Inferring cultural regions from correlation networks of given baby names
Abstract
We report investigations on the statistical characteristics of the baby names given between 1910 and 2010 in the United States of America. For each year, the 100 most frequent names in the USA are selected. For these names, the correlations between the name profiles are calculated for all pairs of states (excluding Hawaii and Alaska). The correlations are used to form a weighted network, which is found to vary only mildly in time. In fact, the community structure of the network remains quite stable until about 1980. The aim is for the calculated structure to approximately reproduce the usually accepted geopolitical regions: the Northeast, the South, and "Midwest + West" as the third one. Furthermore, the dataset reveals that the name distribution satisfies the Zipf law, separately for each state and each year, i.e. the name frequency f ∝ r^(-α), where r is the name rank. Between 1920 and 1980, the exponent α is largest for the set of states classified as "the South" and smallest for the set of states classified as "Midwest + West". Our interpretation is that the pool of selected names was quite narrow in the Southern states. The data are compared with some related statistics of names in Belgium, a country that also has distinct regions but is of a quite different scale than the USA. There, the Zipf exponent is low for young people and for Brussels citizens.
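The two quantities named in the abstract, the pairwise correlation between state name profiles and the Zipf exponent α, can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient or fitting procedure the authors used, so the Pearson coefficient and the least-squares fit on log-log axes below are assumptions. The function names and the data layout (a dict mapping each state to its counts over one shared, ordered list of names) are hypothetical as well.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long count profiles (assumed measure)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_network(profiles):
    """Weighted edges {(state_a, state_b): correlation} for every pair of states.

    `profiles` maps a state to its name counts, listed in the same name
    order for every state (hypothetical layout, not taken from the paper).
    """
    states = sorted(profiles)
    return {
        (s, t): pearson(profiles[s], profiles[t])
        for i, s in enumerate(states)
        for t in states[i + 1:]
    }


def zipf_exponent(counts):
    """Estimate alpha in f ~ r^(-alpha) by least squares on log(f) vs log(r)."""
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope  # the fitted slope is -alpha


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not data from the paper.
    toy_profiles = {
        "NY": [10, 20, 30, 40],
        "NJ": [20, 40, 60, 80],
        "TX": [40, 30, 20, 10],
    }
    for pair, weight in correlation_network(toy_profiles).items():
        print(pair, round(weight, 3))
    print(round(zipf_exponent([1000 * r ** -1.5 for r in range(1, 101)]), 3))
```

Under this reading, a larger α means the frequency falls off faster with rank, so the most popular names account for a larger share of births, which is consistent with the abstract's interpretation of a narrow pool of names. Detecting communities in the resulting weighted network would be a further step that this sketch does not cover.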
- Publication:
- Physica A: Statistical Mechanics and its Applications
- Pub Date:
- March 2016
- DOI:
- 10.1016/j.physa.2015.11.003
- arXiv:
- arXiv:1512.02159
- Bibcode:
- 2016PhyA..445..169P
- Keywords:
- Networks;
- Communities;
- Names;
- Physics - Physics and Society;
- Computer Science - Social and Information Networks
- E-Print:
- Physica A 445 (2016) 169-175