Nick Young

Analysis and visualizations - for the interesting and inane.

29 Sep 2020

Beyoncé and Taylor Swift Lyrics

Photo credits: Beyonce (public domain); Taylor Swift (Creative Commons, taken by David Shankbone)

TidyTuesday

Join the R4DS Online Learning Community in the weekly #TidyTuesday event! Every week we post a raw dataset, a chart or article related to that dataset, and ask you to explore the data. While the dataset will be “tamed”, it will not always be tidy! As such you might need to apply various R for Data Science techniques to wrangle the data into a true tidy format. The goal of TidyTuesday is to apply your R skills, get feedback, explore others’ work, and connect with the greater #RStats community! As such, we encourage everyone of all skill levels to participate!

Acknowledgement

This week’s Tidy Tuesday data is provided by Rosie Baillie and Dr. Sara Stoudt.

Personal Intent

My intentions with Tidy Tuesday projects are to learn new R packages for data mining and visualization, to practice tidying data, to refine best practices in writing code, and lastly to uncover new insights in the data I’m working with and convey them to readers in effective and meaningful ways.

Load the weekly Data

Download the weekly data and make it available in the tt object or in variables of your choosing.

The Shape of the Data and Possible Explorations

We’re kicking off this analysis with four data frames that give us context around the lyrics of Beyonce’s and Taylor Swift’s songs and their albums’ performance in terms of sales and chart position. The Taylor Swift lyrics data already included an album variable identifying which album each song came from; the Beyonce lyrics data did not. I originally didn’t plan to add album information to the Beyonce data frame, but I later decided to add it myself so I could use her lyrics as a rough map of her emotional state, cross-referenced against the major personal events that are public knowledge given her life as a celebrity.
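Adding the album variable amounts to a join against a hand-built song-to-album lookup. Below is a minimal base-R sketch of that step. It assumes hypothetical column names song_name and song_lyrics and a three-song lookup table; the real lookup was assembled from Wikipedia and isn't shown. With dplyr, the join step would be left_join(album_lookup, by = "song_name").

```r
# Hypothetical song -> album lookup, listed in release order; the real one was
# assembled by hand from Wikipedia track listings
album_lookup <- data.frame(
  song_name = c("Crazy in Love", "Halo", "Formation"),
  album     = c("Dangerously in Love", "I Am... Sasha Fierce", "Lemonade"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for beyonce_lyrics: one row per lyric line
lyrics <- data.frame(
  song_name   = c("Crazy in Love", "Crazy in Love", "Halo", "Unlisted Song"),
  song_lyrics = c("line 1", "line 2", "line 3", "line 4"),
  stringsAsFactors = FALSE
)

# Left join: every lyric line is kept; songs missing from the lookup get NA
lyrics_with_album <- merge(lyrics, album_lookup, by = "song_name", all.x = TRUE)

# Factor levels in release order, so as.integer(album) counts albums chronologically
lyrics_with_album$album <- factor(lyrics_with_album$album,
                                  levels = unique(album_lookup$album))

# Songs that still need an album assigned by hand
unmatched <- unique(lyrics_with_album$song_name[is.na(lyrics_with_album$album)])
```

Listing the unmatched songs after the join is a cheap way to see which tracks still need an album assigned by hand.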

Furthermore, this lets us compare sentiment between Beyonce and Taylor Swift at the album level, to see whether any distinguishing linguistic characteristics of their lyrics might explain their sales or chart positions in English-speaking countries.

Taking all songs into consideration, does Taylor Swift write ‘happier’ music than Beyonce, or is she angrier? How do these perceptions affect their performance in English-speaking countries? Does Canada like songs about love more than the US? Do Beyonce albums consistently chart lower in New Zealand than in the US because New Zealanders find her lyrics about slavery and emancipation a twinge unrelatable? I’ve got no idea; I personally don’t like either of these artists. Still, we’ve got data, and with that I can suggest something plausible that has little merit beyond the numbers I’m building on. Sometimes that’s all the merit needed, and sometimes domain knowledge is everything. Let’s code!

Chart Positions by Album

library(tidyverse)

charts %>%
  filter(!is.na(chart_position)) %>%
  arrange(artist) %>%
  mutate(title = fct_inorder(title)) %>%  # keep each artist's albums together on the x-axis
  ggplot(aes(title, factor(chart_position), fill = artist)) +
  geom_col() +
  facet_wrap(~chart, scales = 'free_y') +
  labs(title = 'Chart Positions of Beyonce & Taylor Swift Albums',
       subtitle = 'Lower is better',
       x = 'Album Title', y = 'Chart Position #') +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = -45, vjust = 0.2, hjust = 0.1, size = 6),
        axis.text.y = element_text(size = 6)) +
  scale_fill_manual(values = c("#999999", "#E69F00"))

Taylor Swift has released more US #1 albums than Beyonce, but Beyonce has the better #1 rate: all 6 of her albums have reached #1 in the US, while 7 of Taylor Swift’s 8 albums have.
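A count and a rate can rank the two artists differently, and both are easy to compute side by side. The base-R sketch below mirrors the US figures quoted above (6 of 6 and 7 of 8). The artist labels are plain ASCII, and the peak of the one non-#1 Swift album is a placeholder, not the real value.

```r
# US peak positions mirroring the counts quoted above: 6 of 6 Beyonce albums
# and 7 of 8 Taylor Swift albums at #1. The one non-#1 peak (5) is a placeholder.
us_peaks <- data.frame(
  artist = c(rep("Beyonce", 6), rep("Taylor Swift", 8)),
  chart_position = c(rep(1, 6), rep(1, 7), 5),
  stringsAsFactors = FALSE
)

# Split positions by artist, then count #1s, count albums, and take the ratio
by_artist   <- split(us_peaks$chart_position, us_peaks$artist)
number_ones <- sapply(by_artist, function(p) sum(p == 1))
albums      <- sapply(by_artist, length)
one_rate    <- number_ones / albums
```

Swift comes out ahead on the count (7 vs 6), while Beyonce comes out ahead on the rate (1 vs 0.875).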

How many #1 placements have Beyonce and Taylor Swift had each globally?

charts %>%
  filter(chart_position == 1) %>%
  select(artist) %>% table()
## .
##      Beyoncé Taylor Swift 
##           23           38

Across all charts, Taylor Swift’s albums have earned more #1 placements (38) than Beyonce’s (23).

Which countries buy so many of Swift’s albums that she lands the #1 spot? Are they different from the countries that propel Beyonce to #1?

charts %>% 
  filter(chart_position ==1, artist == 'Beyoncé') %>%
  select(chart) %>% table()
## .
## AUS CAN GER IRE JPN  NZ SWE  UK  US 
##   2   3   1   4   2   1   1   3   6
charts %>% 
  filter(chart_position ==1, artist == 'Taylor Swift') %>%
  select(chart) %>% table()
## .
## AUS CAN IRE  NZ SWE  UK  US 
##   6   7   5   7   1   5   7

In every country where both artists have reached #1, Taylor Swift has as many #1 placements as Beyonce or more, and considerably more in most of them. The two exceptions are Germany and Japan, where Taylor Swift hasn’t made it to #1 at all. Conversely, 7 of Taylor Swift’s albums have reached #1 in New Zealand, where Beyonce has done so only once.
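This comparison can be checked mechanically by lining up the two tables above. The base-R sketch below uses the counts copied from those outputs. A chart missing from one artist's table is treated as zero #1s.

```r
# #1 counts per chart, copied from the two table() outputs above
bey <- c(AUS = 2, CAN = 3, GER = 1, IRE = 4, JPN = 2, NZ = 1, SWE = 1, UK = 3, US = 6)
ts  <- c(AUS = 6, CAN = 7, IRE = 5, NZ = 7, SWE = 1, UK = 5, US = 7)

# Align both artists on the union of charts; indexing a missing name gives NA
charts_all <- union(names(bey), names(ts))
side_by_side <- data.frame(
  chart   = charts_all,
  beyonce = unname(bey[charts_all]),
  swift   = unname(ts[charts_all]),
  stringsAsFactors = FALSE
)
side_by_side[is.na(side_by_side)] <- 0  # absent from a table = no #1s there

# Charts Swift has never topped, and whether she matches or beats Beyonce elsewhere
swift_never_first <- side_by_side$chart[side_by_side$swift == 0]
swift_at_least_as_many <- with(side_by_side, all(swift[swift > 0] >= beyonce[swift > 0]))
```

The column sums also reproduce the global totals printed earlier (23 and 38).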

However, this only shows how many #1 placements each artist has had and where. The more important question is whether they ever competed directly with each other. If we’re trying to compare how broadly agreeable Swift’s and Beyonce’s music is, it doesn’t really matter that Taylor Swift had a string of #1 albums in years when Beyonce hadn’t released anything.

# Release years in which both artists put out an album
inner_join(
  charts %>%
    filter(artist == 'Taylor Swift') %>%
    distinct(release_year),
  charts %>%
    filter(artist == 'Beyoncé') %>%
    distinct(release_year),
  by = "release_year"
)
## # A tibble: 2 x 1
##   release_year
##          <dbl>
## 1         2006
## 2         2008

Judging by release years, Taylor Swift and Beyonce albums have almost never had to compete: only 2006 and 2008 saw releases from both. Which albums were these that may have competed with each other for chart position and sales?

## # A tibble: 4 x 3
##   title                artist       release_year
##   <chr>                <chr>               <dbl>
## 1 B'Day                Beyoncé              2006
## 2 Taylor Swift         Taylor Swift         2006
## 3 Fearless             Taylor Swift         2008
## 4 I Am... Sasha Fierce Beyoncé              2008
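The code behind this table isn't shown. One way to produce a similar listing is to intersect the two artists' release years and keep the albums from those years. The base-R sketch below does this on a small hypothetical stand-in for charts that includes real titles and release years for seven albums but none of the chart columns. The real data has one row per album per chart, so there the result would also need de-duplicating, for example with distinct(title, artist, release_year).

```r
# Small stand-in for `charts`: one row per album (real titles and years for a
# subset of albums, no chart columns)
albums <- data.frame(
  title = c("Dangerously in Love", "B'Day", "I Am... Sasha Fierce", "Lemonade",
            "Taylor Swift", "Fearless", "Red"),
  artist = c(rep("Beyonce", 4), rep("Taylor Swift", 3)),
  release_year = c(2003, 2006, 2008, 2016, 2006, 2008, 2012),
  stringsAsFactors = FALSE
)

# Years in which both artists released an album
shared_years <- intersect(albums$release_year[albums$artist == "Beyonce"],
                          albums$release_year[albums$artist == "Taylor Swift"])

# The albums released in those years, ordered by year and then artist
overlap <- albums[albums$release_year %in% shared_years, ]
overlap <- overlap[order(overlap$release_year, overlap$artist), ]
```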

The layout is a little messy, but the key point is that in 2006 Beyonce outplaced Taylor Swift on every chart where Swift placed at all. By 2008, however, Taylor Swift outperformed or matched Beyonce’s chart position in 5 of 9 countries. Since 2008, the two have never released an album in the same year.

Could that be an intentional choice on the part of either artist’s management? I may not know much about either of their music, but I know enough about music and competition to know that release dates are often contentious. Just recall all the hullabaloo around Kanye West’s Graduation and 50 Cent’s Curtis: the release dates were a week apart until Kanye West bumped his release date up to coincide with Curtis. Famously, Kanye West outperformed 50 Cent, but was this a risky move that cannibalized both of their sales? So couldn’t it be possible that Beyonce hasn’t released an album in the same year as Taylor Swift since 2008 to keep her sales and chart positions high? Again, a plausible possibility worth questioning, but not for this write-up.
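A tally like “outperformed or matched in 5 of 9 countries” is a per-chart, head-to-head comparison in which the lower position wins. The base-R sketch below uses two hypothetical albums with made-up peaks (not the real 2006 or 2008 figures). It treats an album that didn't chart as losing to one that did.

```r
# Hypothetical peak positions for two same-year albums, one row per chart
# (NA = did not chart). Not the real 2006 or 2008 figures.
peaks <- data.frame(
  chart   = c("US", "UK", "AUS", "CAN"),
  album_a = c(1, 3, 4, 1),
  album_b = c(1, NA, 2, 2),
  stringsAsFactors = FALSE
)

# Lower is better; map "did not chart" to Inf so it loses to any real position
a_pos <- ifelse(is.na(peaks$album_a), Inf, peaks$album_a)
b_pos <- ifelse(is.na(peaks$album_b), Inf, peaks$album_b)
peaks$winner <- ifelse(a_pos < b_pos, "album_a",
                       ifelse(b_pos < a_pos, "album_b", "tie"))

# Number of charts where album_b placed at least as well as album_a
b_at_least_equal <- sum(b_pos <= a_pos)
```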

Is there a Lyrical Cause for their Chart Placement?

We could certainly attribute Taylor Swift’s consistent #1 placements to a matter of taste: maybe her pop sound is just more catholic than the R&B and rap sound Beyonce sports. But then I wouldn’t get to do all this cool sentiment and emotion analysis of their lyrics! Let’s wrongly assume there’s no difference in their sound and the lyrical content is a perfectly valid, even the only, explanation for chart positions and album sales.

Word Frequency for Taylor Swift

Not at all shocked that ‘love’ and ‘time’ are the top two words. It also makes me laugh that ‘bad’ and ‘girl’ appear with the same frequency. Perhaps Swift has been a bit less than good.
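These word-frequency charts rest on a tokenize, drop stop words, count pipeline. The post's own code does this with tidytext's unnest_tokens() and anti_join(stop_words). The sketch below is a dependency-free base-R version of the same idea, using made-up lyrics and a tiny hypothetical stop-word list in place of tidytext's.

```r
# Made-up lyrics and a tiny hypothetical stop-word list
lyrics <- c("Love is a bad girl at night",
            "I love the time we had, love",
            "Bad girl, bad time")
stop_words_small <- c("is", "a", "at", "i", "the", "we", "had")

# Tokenize: lowercase, then split on anything that isn't a letter or apostrophe
tokens <- unlist(strsplit(tolower(lyrics), "[^a-z']+"))
tokens <- tokens[tokens != "" & !tokens %in% stop_words_small]

# Count occurrences and keep the most frequent words
word_counts <- sort(table(tokens), decreasing = TRUE)
top_words <- head(word_counts, 3)
```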

Word Frequency for Beyonce

For how boy-crazy I’d always assumed Taylor Swift was, it’s pretty interesting that ‘boy’ is in Beyonce’s top 10, but not Swift’s.

Top 10 Words Compared

There’s some overlap, but enough uniqueness to make some characterizations of both artists. Swift has ‘girl’ in her top 10 but never ‘boy’. It could be that her lyrics are a bit more introspective or deal more with feminine identity. She also uses the word ‘night’ pretty frequently, slightly more than Beyonce. Is she a night owl? Beyonce, on the other hand, says ‘baby’ considerably more than Swift. Is ‘baby’ more than just a romantic pet name to Beyonce? Taylor Swift doesn’t have any children, whereas Beyonce has three.
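Finding which words are shared and which are unique is a set comparison. The base-R sketch below uses hypothetical, shortened top-word lists built from words mentioned in this post rather than from the actual top 10s.

```r
# Hypothetical, shortened top-word lists (not the real top 10s)
swift_top <- c("love", "time", "girl", "night", "bad")
bey_top   <- c("love", "baby", "boy", "night", "time")

shared     <- intersect(swift_top, bey_top)  # in both lists, in Swift's order
swift_only <- setdiff(swift_top, bey_top)    # Swift's words missing from Beyonce's list
bey_only   <- setdiff(bey_top, swift_top)    # Beyonce's words missing from Swift's list
```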

Positive/Negative Analysis of Beyonce

As I understand it, these are Beyonce’s strongest words ranked by positive/negative connotation. The size of the word represents how often it is used.
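A comparison cloud needs each word's count split by polarity, with one row per word and one column per sentiment. The base-R sketch below builds that matrix from made-up word counts and a tiny hand-made lexicon standing in for a real one such as tidytext's get_sentiments("bing"). Whether the clouds above used the Bing lexicon is an assumption.

```r
# Made-up word counts and a tiny hand-made polarity lexicon (hypothetical)
word_counts <- c(love = 12, hurt = 5, crazy = 7, perfect = 3, baby = 9)
lexicon <- data.frame(
  word      = c("love", "hurt", "crazy", "perfect"),
  sentiment = c("positive", "negative", "negative", "positive"),
  stringsAsFactors = FALSE
)

# Inner join keeps only words the lexicon knows ("baby" drops out)
scored <- merge(data.frame(word = names(word_counts), n = unname(word_counts),
                           stringsAsFactors = FALSE),
                lexicon, by = "word")

# Word x sentiment table of counts; zeros where a word lacks that polarity
term_matrix <- xtabs(n ~ word + sentiment, data = scored)

# wordcloud::comparison.cloud(as.matrix(term_matrix)) would then draw the cloud
```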

Positive/Negative Analysis of Taylor Swift

Let’s not put too much weight on either of these. Language is too subjective to say these clouds represent the true distribution of positive and negative words for either artist. A neat visualization nonetheless.

Sentiment Analysis for Beyonce

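library(syuzhet)  # provides get_nrc_sentiment()

bey_lyrics_vector <- as.character(beyonce_lyrics_total)
# One row per input string, one column per NRC category
bey_lyrics_sentiment <- get_nrc_sentiment(bey_lyrics_vector)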
# Total each NRC category across the whole discography
bey_sent_scores <- data.frame(Score = colSums(bey_lyrics_sentiment))
bey_sent_scores <- cbind(sentiment = rownames(bey_sent_scores), bey_sent_scores)

ggplot(bey_sent_scores %>% mutate(sentiment = fct_inorder(sentiment)),
       aes(sentiment, Score)) +
  geom_col(aes(fill = sentiment), show.legend = FALSE) +
  labs(title = "Emotions in Beyonce's Entire Discography")

Sentiment Analysis for Taylor Swift

ts_lyrics_vector <- as.character(taylor_swift_lyrics_total)
ts_lyrics_sentiment <- get_nrc_sentiment(ts_lyrics_vector)

# Total each NRC category across the whole discography
ts_sent_scores <- data.frame(Score = colSums(ts_lyrics_sentiment))
ts_sent_scores <- cbind(sentiment = rownames(ts_sent_scores), ts_sent_scores)

ggplot(ts_sent_scores %>% mutate(sentiment = fct_inorder(sentiment)),
       aes(sentiment, Score)) +
  geom_col(aes(fill = sentiment), show.legend = FALSE) +
  labs(title = "Emotions in Taylor Swift's Entire Discography")

Judging from this, Taylor Swift’s music seems sadder in general and evokes more negativity. Keep in mind that these are raw totals, so part of the gap may simply reflect how many words each discography contains.
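For context, get_nrc_sentiment() works by lexicon lookup. Each word can belong to several of the NRC lexicon's ten categories, each score is a count of hits, and colSums() then totals those counts. The base-R sketch below shows that mechanism with a tiny hypothetical lexicon that uses a subset of the categories and made-up word assignments. It also adds the normalization that makes two discographies of different sizes comparable.

```r
# Tiny hypothetical emotion lexicon: a word can carry several categories
lexicon <- list(
  love  = c("joy", "positive", "trust"),
  hurt  = c("sadness", "negative", "anger", "fear"),
  cry   = c("sadness", "negative"),
  trust = c("trust", "positive")
)
categories <- c("anger", "fear", "joy", "sadness", "trust", "negative", "positive")

# Count lexicon hits per category for a vector of words
score_words <- function(words) {
  counts <- setNames(integer(length(categories)), categories)
  for (w in words[words %in% names(lexicon)]) {
    cats <- lexicon[[w]]
    counts[cats] <- counts[cats] + 1L
  }
  counts
}

# Two hypothetical "discographies" of different lengths
artist_a <- score_words(c("love", "love", "hurt", "trust", "love", "cry"))
artist_b <- score_words(c("hurt", "cry", "love"))

# Raw totals scale with volume; shares of all hits do not
share_a <- artist_a / sum(artist_a)
share_b <- artist_b / sum(artist_b)
```

Both hypothetical artists score 2 for sadness, but sadness is a larger share of the second artist's hits. Raw totals hide exactly this kind of difference.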

Beyonce’s Lyrics Chronologically

It’s time to put any mirage of my impartiality to rest. I may not listen to either of these artists actively, but if I had to pick one over the other I’d choose Beyonce. 1000%. I think I just enjoy Beyonce’s collaborators, producers, and general creative circle much more than Taylor Swift’s. Also, Beyonce rose to prominence much earlier in my life, when pop music reached me regardless of my actual taste. Taylor Swift came up when I had more say in choosing my musical environment. So, that said, I will admit I know a bit more about Beyonce and her life. This is also thanks to all the Wikipedia scouring I had to do to see which songs belonged to which albums when transforming the beyonce_lyrics data frame.

For example, I know she has had three children, born in (checks Wikipedia) 2012 and 2017. Only her first child, Blue Ivy, born in 2012, was likely conceived during the time when she may have been writing music for her 2013 self-titled album. The quick wiki check also told me she had a grievous miscarriage around 2010 or 2011, which spurred a bout of songwriting as catharsis, the progeny of which is most likely included on her album 4. Lastly, it’s relatively common knowledge that Jay-Z allegedly had an affair, and some of the content on Lemonade is written about it. Without outlining Beyonce’s life any further, let’s see whether the word counts of her albums, taken chronologically, can give us any insight into her life during those times.

It’s hard to say whether the top words used can indicate events in Beyonce’s personal life, but we can point out some interesting connections nonetheless, as long as we refrain from assuming any statistical significance. For example, Beyonce used the word ‘love’ a bit more than 125 times on her debut album, the most she’s used it on any album. This album also happens to date from around when her relationship with Jay-Z was only inchoate and perhaps still in a phase of ‘puppy love.’ It should be noted that she says ‘love’ 22 times in her single ‘Crazy In Love,’ which accounts for about 17% of the total. On 4, the album that followed her miscarriage, Beyonce’s second most used word is ‘baby.’ Lemonade, the only album containing lyrics about Jay-Z’s infidelity (that we know of), is the only album to feature ‘hurt’ as a top 10 word.
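Per-album top words come from counting within groups. In dplyr that would be something like count(album, word) followed by group_by(album) and slice_max(n, n = 10). The base-R sketch below uses made-up tokens and n = 2.

```r
# Made-up tokenized lyrics: one row per word occurrence, tagged by album
words <- data.frame(
  album = c(rep("Dangerously in Love", 5), rep("4", 4), rep("Lemonade", 4)),
  word  = c("love", "love", "crazy", "love", "baby",
            "baby", "love", "baby", "night",
            "hurt", "love", "hurt", "sorry"),
  stringsAsFactors = FALSE
)

# Split words by album, count each album separately, keep its n most frequent
top_n_per_album <- function(df, n) {
  lapply(split(df$word, df$album),
         function(w) head(sort(table(w), decreasing = TRUE), n))
}

top2 <- top_n_per_album(words, 2)
```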

library(tidyverse)
library(tidytext)
library(syuzhet)

albums <- sort(unique(beyonce_lyrics$album))  # album is a factor (the original indexed it with as.integer())

for (i in seq_along(albums)) {
  # Distinct non-stop-words on this album. The original passed as.character() of a
  # word/count tibble, which effectively scored each distinct word once as well.
  album_words <- beyonce_lyrics %>%
    filter(album == albums[i]) %>%
    unnest_tokens(word, song_lyrics) %>%
    anti_join(stop_words, by = 'word') %>%
    filter(!word %in% custom_ts_stop_words, !word %in% custom_bey_stop_words) %>%
    distinct(word)

  # NRC scores for the album's vocabulary, largest category first
  album_sentiment <- get_nrc_sentiment(paste(album_words$word, collapse = ' ')) %>%
    pivot_longer(everything(), names_to = 'emotion') %>%
    arrange(desc(value))

  print(
    ggplot(album_sentiment, aes(fct_inorder(emotion), value)) +
      geom_col(aes(fill = emotion)) +
      theme(legend.position = 'none') +
      labs(title = paste('Sentiment Analysis of', albums[i]), x = 'Sentiment')
  )
}

Overall, Beyonce’s albums are more positive than negative, except for Lemonade. Trust also ranks high on all her albums (#1 once the negative/positive scores are set aside), but it drops suddenly on Lemonade.
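Whether an album is “more positive than negative” can be summarized as a net score: positive minus negative. Dividing that difference by the sum of the two makes albums of different lengths comparable. The base-R sketch below uses placeholder totals, not the values behind the charts above.

```r
# Placeholder NRC positive/negative totals per album (not the real values)
polarity <- data.frame(
  album    = c("Dangerously in Love", "4", "Lemonade"),
  positive = c(120, 95, 60),
  negative = c(70, 60, 75),
  stringsAsFactors = FALSE
)

# Net > 0 means more positive than negative hits; net_share lies in [-1, 1]
polarity$net       <- polarity$positive - polarity$negative
polarity$net_share <- polarity$net / (polarity$positive + polarity$negative)

# Albums that lean negative overall
more_negative <- polarity$album[polarity$net < 0]
```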

Reflections and Learnings

I’m deciding to bring this analysis to an end. Tidy Tuesday is a way for me to practice new R packages, learn new methodologies, brush up on viz techniques, or gain experience in data wrangling. At this point in the project, I feel like I’m going in circles with my visualizations and am no longer interested in exploring the data further.

In retrospect, I feel satisfied nonetheless: I’ve had my first foray into NLP and sentiment analysis, I’ve ventured outside of ggplot for the first time with wordcloud, I’ve created my first interactive table with reactable, and I’ve written some long for loops that I look back on in awe. It can be hard to uncover meaningful learnings from data you lack domain knowledge in. Even worse, it can be especially difficult to uncover meaningful learnings from data you lack any real interest in. Nonetheless, I feel like I’ve learned from this Tidy Tuesday, if not a little about Beyonce and Taylor Swift, then at least a bit about NLP and sentiment analysis.

Next time, I’d like to get more practice with other viz tools, particularly functions like ggarrange() or other packages that collate multiple plots and graphs.