Data

Each year, the Social Security Administration, a government agency, collects data on the frequency of baby names in the US based on applications for Social Security cards. While the dataset covers the entire country, we elected to focus on two states with historically high immigration rates: California and New York.

Within each state, the data is organized by year, gender, name, and frequency of the name. A name is included only if its frequency in a given year is at least 5; this restricts the dataset to approximately the top 250 male and female names each year.
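To make this organization concrete, the following is a minimal sketch of how such records could be read and queried. The comma-separated layout, the column order (state, gender, year, name, count), the helper names load_records and top_names, and every row in SAMPLE are assumptions made up for illustration; none of them are taken from the actual files.

```python
import csv
import io

# Hypothetical rows for illustration only. The column order
# (state, gender, year, name, count) is an assumed layout, and the
# counts are invented, not real figures.
SAMPLE = """CA,F,1990,Jessica,120
CA,F,1990,Ashley,95
CA,M,1990,Michael,150
NY,F,1990,Jessica,80
NY,M,1990,Michael,110
"""

FIELDS = ["state", "gender", "year", "name", "count"]


def load_records(text):
    """Parse comma-separated rows into dicts keyed by field name."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), fieldnames=FIELDS):
        row["year"] = int(row["year"])
        row["count"] = int(row["count"])
        rows.append(row)
    return rows


def top_names(records, state, year, gender, n=3):
    """Return up to n (name, count) pairs, most frequent first."""
    matches = [
        r for r in records
        if r["state"] == state and r["year"] == year and r["gender"] == gender
    ]
    matches.sort(key=lambda r: r["count"], reverse=True)
    return [(r["name"], r["count"]) for r in matches[:n]]


records = load_records(SAMPLE)
print(top_names(records, "CA", 1990, "F"))
```

Keeping each record as a flat (state, gender, year, name, count) row makes it straightforward to filter by any combination of fields, which suits the state-by-state comparison described above.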

Significance

Baby names are not only a reflection of popular ideals within familial structures, but also a representation of American culture during specific time periods. As a result, this dataset is useful in illuminating certain events and phenomena in modern American history. A noticeable rise in the popularity of certain names during specific years can reflect significant historical or pop cultural events that led families to name their children after important figures. Additionally, the appearance of names of non-European origin may illustrate the changing demographics of the US, specifically how ethnic groups may find community in certain states over others. By tracking the popularity of certain names over a period of about 110 years, we are able to examine the impacts of social, political, historical, and pop cultural events on baby-naming trends in the US.

Critique

In examining the sources and structure of the dataset, we identified several limitations that informed how we proceeded with our analysis. These limitations stem from implicit biases in the very concept of what it means to be considered “American”, and thus included in the dataset, as well as the continual struggle to present data that is clear and concise yet still captures the vast diversity of the US.

The Politics of Social Security Numbers

Because the data is sourced from applications for Social Security numbers (SSNs), it is important to examine barriers, both past and present, that may have kept certain groups from completing these applications and left them underrepresented in this dataset. When SSNs were first created in 1936, they were meant for a limited set of worker groups, to maintain records of retirement payments. Until the Enumeration at Birth program in 1989, parents often had difficulty completing an application for their child’s SSN. As a result, the frequencies collected before this program may not accurately represent the true popularity of certain names, as there is no way to verify that every baby born during that time was assigned an SSN.

Further, because obtaining an SSN requires submitting documents to the government, parents without US citizenship may refrain from applying for fear of exposing their immigration status. This creates a silence in the data for certain children of immigrant parents, which ultimately affects the types of names present in the dataset. While these children could be named according to popular naming trends, they could also be given names that better reflect their cultural and ethnic heritage. The absence of such names means the dataset fails to accurately capture the rich diversity of the US.

The Threshold for the Inclusion of a Name

Building on the idea of silences in this dataset, rare names, including those held by only one child, are not present, as a name is only included if it was recorded at least 5 times in that year. This makes it difficult to determine when a name was first used, and therefore to trace the origins of certain names. Consequently, this dataset is better suited for exploring trends in popularity than for determining where names originated. The threshold for including a name disproportionately affects ethnic and cultural enclaves in certain states, as their less common names, and the stories behind them, are not captured. This speaks to a larger discussion of why certain names are considered “normal”, while others are viewed as “unique”. Sociocultural factors, such as race and ethnicity, shape the way names are perceived, which may influence parents as they consider what to name their children.
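The effect of the threshold described above can be sketched with a small example. The names and counts in raw_counts are invented for illustration and the helper apply_threshold is hypothetical; only the cutoff of 5 comes from the dataset description.

```python
THRESHOLD = 5  # minimum yearly count for a name to appear in the dataset

# Hypothetical raw counts for one state and year, invented for illustration.
raw_counts = {
    "Maria": 40,
    "John": 25,
    "Xochitl": 4,
    "Oluwaseun": 2,
    "Zephyrine": 1,
}


def apply_threshold(counts, threshold=THRESHOLD):
    """Split counts into published names and names hidden by the threshold."""
    published = {name: c for name, c in counts.items() if c >= threshold}
    suppressed = {name: c for name, c in counts.items() if c < threshold}
    return published, suppressed


published, suppressed = apply_threshold(raw_counts)

# Share of births in this toy example whose names never reach the dataset.
hidden_share = sum(suppressed.values()) / sum(raw_counts.values())

print("Published:", sorted(published))
print("Suppressed:", sorted(suppressed))
print(f"Share of births under suppressed names: {hidden_share:.1%}")
```

In this toy case the suppressed names account for a small share of births but a majority of the distinct names, which is the kind of silence the threshold introduces.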