Sentiment Analysis with songs

Sentiment analysis is the interpretation and classification of emotions (positive, negative, and neutral) in text data using natural language processing (NLP) techniques.

In this session, we will analyze song lyrics downloaded with the R package genius, using the bing lexicon. The bing lexicon classifies words in a binary fashion as either positive or negative. Unlike lexicons such as AFINN, it does not assign weighted sentiment scores.

First, load the libraries:

library(pacman)
# p_load() installs any missing packages, then attaches them
p_load(tidyr, tidyverse, tidytext, forcats, genius)
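Before downloading any lyrics, it can help to look at the bing lexicon itself. The snippet below is a minimal sketch, assuming tidytext and dplyr are installed; the object name bing is only illustrative.

```r
library(dplyr)
library(tidytext)

# The bing lexicon ships with tidytext: one row per word, with a sentiment label
bing <- get_sentiments("bing")

# First few entries
head(bing)

# Number of words carrying each label
count(bing, sentiment)
```

The lexicon has only two columns, word and sentiment, which is why the joins in the song examples match on word.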

Then download each song's lyrics with genius_lyrics() and run the sentiment analysis. The steps are shown below for four songs.

  1. Aerosmith - I don’t want to miss a thing
#get song lyrics
aero_smith_i_don <- genius_lyrics(artist = "Aerosmith", song = "I don't want to miss a thing")
aero_smith_i_don
## # A tibble: 59 x 3
##    track_title                line lyric                                        
##    <chr>                     <int> <chr>                                        
##  1 I Don’t Want to Miss a T…     1 I could stay awake just to hear you breathin'
##  2 I Don’t Want to Miss a T…     2 Watch you smile while you are sleeping       
##  3 I Don’t Want to Miss a T…     3 While you're far away and dreaming           
##  4 I Don’t Want to Miss a T…     4 I could spend my life in this sweet surrender
##  5 I Don’t Want to Miss a T…     5 I could stay lost in this moment forever     
##  6 I Don’t Want to Miss a T…     6 Where every moment spent with you is a momen…
##  7 I Don’t Want to Miss a T…     7 Don't want to close my eyes                  
##  8 I Don’t Want to Miss a T…     8 I don't want to fall asleep                  
##  9 I Don’t Want to Miss a T…     9 'Cause I'd miss you, baby                    
## 10 I Don’t Want to Miss a T…    10 And I don't wanna miss a thing               
## # … with 49 more rows
#tidy up lyrics
i_dont_tidy <- aero_smith_i_don %>%
  select(lyric, track_title) %>%
  unnest_tokens(word, lyric)
i_dont_tidy
## # A tibble: 390 x 2
##    track_title                  word    
##    <chr>                        <chr>   
##  1 I Don’t Want to Miss a Thing i       
##  2 I Don’t Want to Miss a Thing could   
##  3 I Don’t Want to Miss a Thing stay    
##  4 I Don’t Want to Miss a Thing awake   
##  5 I Don’t Want to Miss a Thing just    
##  6 I Don’t Want to Miss a Thing to      
##  7 I Don’t Want to Miss a Thing hear    
##  8 I Don’t Want to Miss a Thing you     
##  9 I Don’t Want to Miss a Thing breathin
## 10 I Don’t Want to Miss a Thing watch   
## # … with 380 more rows
# join with sentiment lexicon
i_dont_sentiments <- i_dont_tidy %>%
  inner_join(get_sentiments("bing"), by = "word")
i_dont_sentiments
## # A tibble: 35 x 3
##    track_title                  word      sentiment
##    <chr>                        <chr>     <chr>    
##  1 I Don’t Want to Miss a Thing smile     positive 
##  2 I Don’t Want to Miss a Thing sweet     positive 
##  3 I Don’t Want to Miss a Thing surrender negative 
##  4 I Don’t Want to Miss a Thing lost      negative 
##  5 I Don’t Want to Miss a Thing treasure  positive 
##  6 I Don’t Want to Miss a Thing fall      negative 
##  7 I Don’t Want to Miss a Thing miss      negative 
##  8 I Don’t Want to Miss a Thing miss      negative 
##  9 I Don’t Want to Miss a Thing miss      negative 
## 10 I Don’t Want to Miss a Thing miss      negative 
## # … with 25 more rows
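# bar graph of each word's contribution to sentiment
i_dont_sentiments %>%
  # number of times each word occurs, by sentiment
  count(sentiment, word) %>%
  ungroup() %>%
  # flip the sign of negative words so they plot on the opposite side of zero
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>%
  # order the bars by signed count
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_bar(stat = "identity") +
  ylab("Contribution to sentiment") +
  coord_flip()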

  2. The Weeknd - Earned It
the_weeknd_earned_it <- genius_lyrics(artist = "The Weeknd", song = "Earned It")
the_weeknd_earned_it
## # A tibble: 51 x 3
##    track_title  line lyric                                        
##    <chr>       <int> <chr>                                        
##  1 Earned It       1 I'ma care for you                            
##  2 Earned It       2 I'ma care for you, you, you, you             
##  3 Earned It       3 You make it look like it's magic (Oh yeah)   
##  4 Earned It       4 'Cause I see nobody, nobody but you, you, you
##  5 Earned It       5 I'm never confused                           
##  6 Earned It       6 Hey, hey                                     
##  7 Earned It       7 I'm so used to bein' used                    
##  8 Earned It       8 So I love when you call unexpected           
##  9 Earned It       9 'Cause I hate when the moment's expected     
## 10 Earned It      10 So I'ma care for you, you, you               
## # … with 41 more rows
earned_it_tidy <- the_weeknd_earned_it %>%
  select(lyric, track_title) %>%
  unnest_tokens(word, lyric)
earned_it_tidy
## # A tibble: 325 x 2
##    track_title word 
##    <chr>       <chr>
##  1 Earned It   i'ma 
##  2 Earned It   care 
##  3 Earned It   for  
##  4 Earned It   you  
##  5 Earned It   i'ma 
##  6 Earned It   care 
##  7 Earned It   for  
##  8 Earned It   you  
##  9 Earned It   you  
## 10 Earned It   you  
## # … with 315 more rows
earned_it_sentiments <- earned_it_tidy %>%
  inner_join(get_sentiments("bing"), by = "word")
earned_it_sentiments
## # A tibble: 36 x 3
##    track_title word       sentiment
##    <chr>       <chr>      <chr>    
##  1 Earned It   like       positive 
##  2 Earned It   magic      positive 
##  3 Earned It   confused   negative 
##  4 Earned It   love       positive 
##  5 Earned It   unexpected negative 
##  6 Earned It   hate       negative 
##  7 Earned It   perfect    positive 
##  8 Earned It   worth      positive 
##  9 Earned It   work       positive 
## 10 Earned It   love       positive 
## # … with 26 more rows
earned_it_sentiments %>%
  count(sentiment, word) %>%
  ungroup() %>%
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_bar(stat = "identity") +
  ylab("Contribution to sentiment") +
  coord_flip()

  3. Pharrell Williams - Happy
pharrell_will_happy <- genius_lyrics(artist = "Pharrell Williams", song = "Happy")
pharrell_will_happy
## # A tibble: 67 x 3
##    track_title  line lyric                                             
##    <chr>       <int> <chr>                                             
##  1 Happy           1 <NA>                                              
##  2 Happy           2 It might seem crazy what I'm 'bout to say         
##  3 Happy           3 Sunshine she's here, you can take a break         
##  4 Happy           4 I'm a hot air balloon that could go to space      
##  5 Happy           5 With the air, like I don't care, baby, by the way 
##  6 Happy           6 (Because I'm happy)                               
##  7 Happy           7 Clap along if you feel like a room without a roof 
##  8 Happy           8 (Because I'm happy)                               
##  9 Happy           9 Clap along if you feel like happiness is the truth
## 10 Happy          10 (Because I'm happy)                               
## # … with 57 more rows
happy_tidy <- pharrell_will_happy %>%
  select(lyric, track_title) %>%
  unnest_tokens(word, lyric)
happy_tidy
## # A tibble: 473 x 2
##    track_title word 
##    <chr>       <chr>
##  1 Happy       <NA> 
##  2 Happy       it   
##  3 Happy       might
##  4 Happy       seem 
##  5 Happy       crazy
##  6 Happy       what 
##  7 Happy       i'm  
##  8 Happy       bout 
##  9 Happy       to   
## 10 Happy       say  
## # … with 463 more rows
happy_sentiments <- happy_tidy %>%
  inner_join(get_sentiments("bing"), by = "word")
happy_sentiments
## # A tibble: 63 x 3
##    track_title word      sentiment
##    <chr>       <chr>     <chr>    
##  1 Happy       crazy     negative 
##  2 Happy       break     negative 
##  3 Happy       hot       positive 
##  4 Happy       like      positive 
##  5 Happy       happy     positive 
##  6 Happy       like      positive 
##  7 Happy       happy     positive 
##  8 Happy       like      positive 
##  9 Happy       happiness positive 
## 10 Happy       happy     positive 
## # … with 53 more rows
happy_sentiments %>%
  count(sentiment, word) %>%
  ungroup() %>%
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_bar(stat = "identity") +
  ylab("Contribution to sentiment") +
  coord_flip()
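The first row of the Happy lyrics above is <NA>, and it survives tokenization as an <NA> word. The inner join then drops it because the lexicon has no matching entry, but the row can also be removed explicitly before tokenizing. A minimal sketch on a hand-typed three-line tibble (the names lyrics_with_gap and tidy_words are illustrative, not part of the original analysis):

```r
library(dplyr)
library(tidytext)

# Toy input mimicking the first rows of the Happy lyrics, including the empty line
lyrics_with_gap <- tibble(
  track_title = "Happy",
  line = 1:3,
  lyric = c(NA, "It might seem crazy what I'm 'bout to say", "(Because I'm happy)")
)

# Drop lines with no lyric text, then split the rest into one word per row
tidy_words <- lyrics_with_gap %>%
  filter(!is.na(lyric)) %>%
  unnest_tokens(word, lyric)

tidy_words
```

Filtering first keeps the word count honest if you later report the total number of tokens per song.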

  4. Lana Del Rey - Summertime Sadness
Summertime_Sadness <- genius_lyrics(artist = "Lana Del Rey", song = "Summertime Sadness")
Summertime_Sadness
## # A tibble: 54 x 3
##    track_title         line lyric                                          
##    <chr>              <int> <chr>                                          
##  1 Summertime Sadness     1 Kiss me hard before you go                     
##  2 Summertime Sadness     2 Summertime sadness                             
##  3 Summertime Sadness     3 I just wanted you to know                      
##  4 Summertime Sadness     4 That, baby, you're the best                    
##  5 Summertime Sadness     5 I got my red dress on tonight                  
##  6 Summertime Sadness     6 Dancin' in the dark in the pale moonlight      
##  7 Summertime Sadness     7 Done my hair up real big, beauty queen style   
##  8 Summertime Sadness     8 High heels off, I'm feelin' alive              
##  9 Summertime Sadness     9 Oh, my God, I feel it in the air               
## 10 Summertime Sadness    10 Telephone wires above are sizzlin' like a snare
## # … with 44 more rows
Summertime_Sadness_tidy <- Summertime_Sadness %>%
  select(lyric, track_title) %>%
  unnest_tokens(word, lyric)
Summertime_Sadness_tidy
## # A tibble: 312 x 2
##    track_title        word      
##    <chr>              <chr>     
##  1 Summertime Sadness kiss      
##  2 Summertime Sadness me        
##  3 Summertime Sadness hard      
##  4 Summertime Sadness before    
##  5 Summertime Sadness you       
##  6 Summertime Sadness go        
##  7 Summertime Sadness summertime
##  8 Summertime Sadness sadness   
##  9 Summertime Sadness i         
## 10 Summertime Sadness just      
## # … with 302 more rows
Summertime_Sadness_sentiments <- Summertime_Sadness_tidy %>%
  inner_join(get_sentiments("bing"), by = "word")
Summertime_Sadness_sentiments
## # A tibble: 39 x 3
##    track_title        word    sentiment
##    <chr>              <chr>   <chr>    
##  1 Summertime Sadness hard    negative 
##  2 Summertime Sadness sadness negative 
##  3 Summertime Sadness best    positive 
##  4 Summertime Sadness dark    negative 
##  5 Summertime Sadness pale    negative 
##  6 Summertime Sadness beauty  positive 
##  7 Summertime Sadness like    positive 
##  8 Summertime Sadness snare   negative 
##  9 Summertime Sadness hard    negative 
## 10 Summertime Sadness sadness negative 
## # … with 29 more rows
Summertime_Sadness_sentiments %>%
  count(sentiment, word) %>%
  ungroup() %>%
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_bar(stat = "identity") +
  ylab("Contribution to sentiment") +
  coord_flip()
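The same three steps (select, tokenize, join) are repeated for every song above, so they could be wrapped in one function and the songs compared on a single number. The sketch below is one possible way to do that, not part of the original analysis: song_sentiments(), toy_lyrics, and net_sentiment are hypothetical names, and the input is a few hand-typed lines from the lyrics shown above rather than a live genius_lyrics() call.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper: tokenize a lyrics tibble and keep only words found in bing
song_sentiments <- function(lyrics) {
  lyrics %>%
    filter(!is.na(lyric)) %>%              # skip empty lines
    select(track_title, lyric) %>%
    unnest_tokens(word, lyric) %>%         # one word per row, lower-cased
    inner_join(get_sentiments("bing"), by = "word")
}

# Toy input with the same columns that genius_lyrics() returned above
toy_lyrics <- tibble(
  track_title = c("Earned It", "Earned It",
                  "Summertime Sadness", "Summertime Sadness"),
  lyric = c("So I love when you call unexpected",
            "'Cause I hate when the moment's expected",
            "Summertime sadness",
            "That, baby, you're the best")
)

# Net sentiment per song: positive word count minus negative word count
net_sentiment <- song_sentiments(toy_lyrics) %>%
  group_by(track_title) %>%
  summarise(
    positive = sum(sentiment == "positive"),
    negative = sum(sentiment == "negative"),
    net = positive - negative
  )

net_sentiment
```

With the full lyrics, the four tibbles downloaded above could be stacked with bind_rows() and passed to the same helper, giving one row per song.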

This is a glimpse of sentiment analysis and its use. With a few steps, one can classify the words in a text as positive or negative and present the result in a form the general public can interpret. Imagine the significance of sentiment analysis in social media monitoring, brand monitoring, market research, and customer service.

Min Tamang
Statistics | Data Science | QA Engineer
