July 20, 2017

Analyzing K-pop Top Chart Data (2008-2016)

Looking through the top charts is one of the most intuitive ways to grasp trends in music. However, I thought it would be interesting to venture beyond mere intuition and analyze the data in a more concrete form. In this post, I share the process and results of my exploration of K-pop’s top 100 songs for each year from 2008 to 2016.

1. Obtaining the data

Although information on songs and artists is abundant on various K-pop music streaming/download websites, I could not find a source that provides this data in a downloadable format. I therefore decided that my best option would be to scrape those websites, each of which displays a top chart in daily, weekly, and annual time frames.

There are 4 popular K-pop music websites:

  1. Bugs
  2. Melon
  3. Naver Music
  4. Mnet

I explored the websites quite extensively, focusing on the top chart and song details pages. My planned strategy was to (1) fetch the top 100 songs in a specified time range (e.g. 5 years), then (2) crawl into the details page for each of those songs to scrape fields such as the composers, lyricists, etc. After multiple implementations of different crawlers (which I might cover in a future post), I found that Mnet provided the best scraping experience due to its logical HTML markup.

Link to the Mnet Top 100 chart

I fetched the top 100 songs in each year from 2008 to 2016, a total of 900 records. After preprocessing, the final data was structured with the following fields:

year | rank | id* | title | artist | featuring | composer | lyricist | arranger | time**
2009 | 50 | 1906097 | In The Club | 2NE1 | (none) | TEDDY,KUSH | TEDDY,KUSH | TEDDY,KUSH | 239
  • *id: a unique number assigned by Mnet.
  • **time: the length of the song in seconds.

The web crawler I used can be found on GitHub. I used the Scrapy framework to easily export fetched data to JSON and CSV formats.
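To make the record layout concrete, here is a minimal sketch that writes the sample row above to CSV and JSON using only the Python standard library. It is not the Scrapy crawler itself (Scrapy handles the export on its own); the names `FIELDS`, `to_csv`, and `to_json` are made up for this illustration.

```python
import csv
import io
import json

# Field order follows the table above; "time" is the song length in seconds.
FIELDS = ["year", "rank", "id", "title", "artist", "featuring",
          "composer", "lyricist", "arranger", "time"]

# The sample row from the post; "featuring" is empty for this song.
record = {
    "year": 2009, "rank": 50, "id": 1906097, "title": "In The Club",
    "artist": "2NE1", "featuring": "", "composer": "TEDDY,KUSH",
    "lyricist": "TEDDY,KUSH", "arranger": "TEDDY,KUSH", "time": 239,
}


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_json(records):
    """Serialize records to JSON, keeping Korean text human-readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


csv_text = to_csv([record])
json_text = to_json([record])
```

Because the credit fields hold comma-separated names, the CSV writer quotes them automatically, which is worth remembering when opening the file in a spreadsheet.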

2. Preprocessing

Preprocessing consumed an extensive amount of time. There were two main problems: inconsistency and missing values. The first problem was due to the information being fetched from a public web UI rather than a database; as for the second, I can’t imagine a reason other than pure laziness.

Inconsistency occurred in many places. By far the most annoying were the artists’ names. For example, the group TWICE appeared—fittingly, in two variants—either as “TWICE” or “TWICE (트와이스)”. MC the MAX was sometimes more boldly proclaimed as “MC THE MAX”. Taeyeon appeared as “태연”, “태연 (TAEYEON)”, or “태연 (Taeyeon)”. To fix this, I went through each row manually in Excel and modified each variation to match the majority label.
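The manual Excel pass could in principle be approximated in code. The sketch below is an assumption-laden alternative, not what I actually did: it groups names that differ only by a trailing parenthetical or by letter case, then rewrites each one to the most frequent spelling in its group. The helper names and the sample list are hypothetical.

```python
import re
from collections import Counter, defaultdict


def variant_key(name):
    """Group key: drop a trailing parenthetical, collapse spaces, ignore case."""
    base = re.sub(r"\s*\([^)]*\)\s*$", "", name)
    return " ".join(base.split()).casefold()


def majority_labels(names):
    """Map each group key to its most frequent spelling."""
    groups = defaultdict(Counter)
    for name in names:
        groups[variant_key(name)][name] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in groups.items()}


def normalize(names):
    """Replace every name with the majority label of its group."""
    labels = majority_labels(names)
    return [labels[variant_key(name)] for name in names]


# Made-up sample; the counts are for illustration only.
artists = ["TWICE", "TWICE (트와이스)", "TWICE",
           "태연 (TAEYEON)", "태연", "태연 (TAEYEON)",
           "MC the MAX", "MC THE MAX", "MC the MAX"]
cleaned = normalize(artists)
```

This heuristic only catches variants that share the text before the parenthetical, so a pair like “아이유(IU)” and “IU” would still need a hand-written mapping.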

Composer names were even more inconsistent. Park Jin-young was listed variously as “JYP”, “박진영”, or “J.Y. Park `TheAsiansoul`”. Unlike artists, composers and lyricists do not have much incentive to stick to a single name because they are less publicly visible. I did not think I knew enough about composers to map between the ever-changing names they gave themselves, so I ultimately left the values as they were.

Another inconsistency arose in the listing of contributors to each song. The labels were predominantly “vocals”, “featuring”, “lyricists”, “composers”, and “arrangers”; however, in very rare cases, there were other labels as well. One was “rapper”, which appeared a mere 7 times out of 900, even though most songs that featured rap did not carry it! “Producer” appeared only 5 times in 900 records. I got rid of both columns, as well as “electric guitar” (1 appearance), “piano” (1 appearance), and “narration”.

The “vocals” column was nearly identical to “artists”. The few times they differed were due to a featuring artist or a collaboration. Consequently, I transferred the necessary values from “vocals” to “featuring” and erased the “vocals” column as well.

The fields that contained missing values were “composers”, “lyricists”, and “arrangers”. A total of 47 records lacked values in these columns, and while I could probably have filled them in by doing 47 Google searches, I decided to save the fun for a later date.
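Finding the incomplete records is a simple filter. The following is a minimal sketch, assuming each record is a dictionary keyed by the singular field names from the table in section 1; `CREDIT_FIELDS`, `missing_credits`, and the sample rows are invented for illustration.

```python
CREDIT_FIELDS = ("composer", "lyricist", "arranger")


def missing_credits(record):
    """Return the credit fields that are absent, None, or blank in a record."""
    return [f for f in CREDIT_FIELDS if not (record.get(f) or "").strip()]


# Made-up rows for illustration.
rows = [
    {"title": "A", "composer": "X", "lyricist": "Y", "arranger": "Z"},
    {"title": "B", "composer": "", "lyricist": "Y", "arranger": None},
    {"title": "C", "composer": "X", "lyricist": "  ", "arranger": "Z"},
]

# Titles of incomplete records, with the fields each one lacks.
incomplete = {r["title"]: missing_credits(r) for r in rows if missing_credits(r)}
```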

3. Results

The names and titles of songs are unfortunately mostly in Korean; this is due to the source of the dataset (Mnet) being a Korean website. I’ll try to upload a translated version of the results soon.

Songs that make multiple appearances

title | artist | ranked years(rank)
벚꽃 엔딩 | 버스커 버스커 | 2012(6); 2013(42); 2014(100)
Lost Stars | Adam Levine | 2014(47); 2015(54)
Problem(Feat. Iggy Azalea) | Ariana Grande | 2014(93); 2015(67)
붉은 노을 | BIGBANG | 2008(18); 2009(75)
I`m Not The Only One | Sam Smith | 2015(20); 2016(72)
U R Man | SS501 | 2008(76); 2009(80)
Officially missing you, too | 긱스(Geeks),소유 | 2012(56); 2013(34)
바람기억 | 나얼 | 2012(18); 2015(38)
하지 못한 말 | 노을 | 2012(68); 2013(69)
야생화 | 박효신 | 2014(2); 2015(90)
총 맞은 것처럼 | 백지영 | 2008(23); 2009(83)
너랑 나 | 아이유(IU) | 2011(89); 2012(64)
너 때문에 | 애프터스쿨(After School) | 2009(82); 2010(62)
오늘부터 우리는 (Me gustas tu) | 여자친구(GFRIEND) | 2015(47); 2016(22)
되돌리다 | 이승기 | 2012(66); 2013(53)
또 다시 사랑 | 임창정 | 2015(35); 2016(15)
? (물음표)(Feat. 최자 of 다이나믹듀오, Zion.T) | 프라이머리(Primary) | 2012(95); 2013(56)
가슴 시린 이야기(Feat. 용준형 of BEAST) | 휘성 | 2011(72); 2014(60)

In the dynamic K-pop scene, where new songs by popular artists are released at a rapid pace, it is not easy to hold on to a spot on the top charts. Nevertheless, a total of 18 songs appeared in the Top 100 chart in at least two different years. Of these, only one appears THREE times: “벚꽃 엔딩” by 버스커 버스커.

However, this metric probably isn’t a good measure of long-lasting popularity, since songs released late in a year are likely to chart in two consecutive years.
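The multi-year table above can be derived by grouping chart entries per song and keeping those with more than one entry. Below is a minimal sketch of that idea; it assumes a song is identified by its (title, artist) pair, and the function names are hypothetical. The sample rows are taken from the tables in this post.

```python
from collections import defaultdict


def repeat_entries(records):
    """Return {(title, artist): [(year, rank), ...]} for songs charting in 2+ years."""
    seen = defaultdict(list)
    for r in records:
        seen[(r["title"], r["artist"])].append((r["year"], r["rank"]))
    return {song: sorted(hits) for song, hits in seen.items() if len(hits) > 1}


def format_hits(hits):
    """Render [(2014, 47), (2015, 54)] as '2014(47); 2015(54)'."""
    return "; ".join(f"{year}({rank})" for year, rank in hits)


records = [
    {"title": "Lost Stars", "artist": "Adam Levine", "year": 2014, "rank": 47},
    {"title": "Lost Stars", "artist": "Adam Levine", "year": 2015, "rank": 54},
    {"title": "Let It Go", "artist": "Idina Menzel", "year": 2014, "rank": 9},
    {"title": "Sugar", "artist": "Maroon 5", "year": 2015, "rank": 15},
]
repeats = repeat_entries(records)
```

Grouping on the title and artist only works once the artist names are consistent, which is one more reason the preprocessing step mattered.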

Songs by foreign artists

title | artist | year | rank
Lost Stars | Adam Levine | 2014 | 47
Lost Stars | Adam Levine | 2015 | 54
Rolling In The Deep | Adele | 2012 | 69
Hello | Adele | 2016 | 67
Problem(Feat. Iggy Azalea) | Ariana Grande | 2014 | 93
Problem(Feat. Iggy Azalea) | Ariana Grande | 2015 | 67
Nothin` On You(Feat. Bruno Mars) | B.O.B. | 2010 | 20
Call Me Maybe | Carly Rae Jepsen | 2013 | 66
One Call Away | Charlie Puth | 2016 | 44
Bad (Radio Edit)(Feat. Vassy) | David Guetta,Showtek | 2014 | 92
How Long Will I Love You | Ellie Goulding | 2014 | 57
Love The Way You Lie(Feat. Rihanna) | Eminem | 2010 | 53
Let It Go | Idina Menzel | 2014 | 9
Moves Like Jagger (Studio Recording From The Voice Performance)(Feat. Christina Aguilera) | Maroon 5 | 2011 | 86
Payphone(Feat. Wiz Khalifa) | Maroon 5 | 2012 | 57
One More Night | Maroon 5 | 2012 | 84
Maps | Maroon 5 | 2014 | 40
Sugar | Maroon 5 | 2015 | 15
Bang Bang | Nicki Minaj,Jessie J,Ariana Grande | 2016 | 76
I`m Not The Only One | Sam Smith | 2015 | 20
I`m Not The Only One | Sam Smith | 2016 | 72

The Mnet Top 100 chart also includes songs by foreign artists. There are 18 such songs in total in the charts of 2008-2016, accounting for 21 chart entries because three of them were ranked in two years. Let It Go by Idina Menzel actually placed within the Top 10 songs of 2014.

Artists with the most ranked songs

[Figure: Artists with the most ranked songs]

One thing to consider here is that members of BIGBANG have released additional songs individually or as units. As can be seen here, G-DRAGON has 7 listed songs; however, not shown are GD&TOP (2 songs), GD X TAEYANG (1 song), and two more songs where G-DRAGON collaborated with members of Infinity Challenge (무한도전). TAEYANG also has 3 songs listed from his solo albums. If you sum up all those appearances, various members of BIGBANG have landed over 40 songs in the Top 100 charts of 2008-2016.

The top 10 songs of each year

Artist frequency distribution

[Figure: Artist frequency distribution]

This chart shows the distribution of how many times each artist appears in the dataset. I had expected a small number of popular artists to dominate, making multiple appearances. However, the data shows this not to be the case; over half of Top 100 songs from 2008-2016 are by artists who appear only once.

But there is a catch: members of groups often release songs as individuals or as units. For example, Tiffany of SNSD is listed only once in the data, while SNSD appears 12 times. Moreover, subsets of a group often release songs under different names, such as EXO-K, and collaborations between artists often use a group name as well, such as 이유 갓지 않은 이유, a one-time collaboration between IU and Myungsoo Park (박명수). As a consequence, the actual number of “one-hit wonders” in the data is probably considerably smaller.
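For reference, a frequency distribution like the one discussed above boils down to counting twice: first songs per artist, then artists per song count. The sketch below shows this with the standard library; the function names and the single-letter artist column are made up, and it does nothing about the group and unit caveat just described.

```python
from collections import Counter


def frequency_distribution(artists):
    """Return {appearances: number of artists with that many appearances}."""
    per_artist = Counter(artists)
    return dict(sorted(Counter(per_artist.values()).items()))


def share_of_single_appearance(artists):
    """Fraction of chart entries that belong to artists appearing exactly once."""
    per_artist = Counter(artists)
    singles = sum(1 for n in per_artist.values() if n == 1)
    return singles / len(artists)


# Made-up artist column, one entry per charting song.
artists = ["A", "A", "A", "B", "B", "C", "D", "E"]
dist = frequency_distribution(artists)
single_share = share_of_single_appearance(artists)
```

Merging solo and unit releases into their parent group would require an extra mapping applied before the first count.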

Songwriters with the most ranked songs

[Figure: Songwriters with the most ranked songs]

There are a lot more collaborations between songwriters than artists—hence the larger numbers.

JYP is listed under three different names (“JYP”, “박진영”, and “J.Y. Park `TheAsiansoul`”). Summing up all his appearances would rank him somewhere around G-DRAGON and 용감한 형제, with 28 Top 100 songs.
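Per-songwriter counts, including the merging of aliases, can be sketched as follows. This is an illustration rather than the exact code behind the charts: it assumes credits are comma-separated strings as in the table in section 1, the `ALIASES` table covers only the three spellings mentioned above, the choice of “JYP” as the canonical label is arbitrary, and the sample rows are invented.

```python
from collections import Counter

# Alias table built from the spellings mentioned above; the canonical
# label "JYP" is an arbitrary choice for this sketch.
ALIASES = {
    "박진영": "JYP",
    "J.Y. Park `TheAsiansoul`": "JYP",
}


def credit_counts(records, field="composer"):
    """Count songs per person, splitting comma-separated credits."""
    counts = Counter()
    for r in records:
        names = {ALIASES.get(n.strip(), n.strip())
                 for n in (r.get(field) or "").split(",") if n.strip()}
        counts.update(names)  # a set, so each person counts once per song
    return counts


# Made-up rows for illustration.
rows = [
    {"composer": "TEDDY,KUSH"},
    {"composer": "JYP"},
    {"composer": "박진영,TEDDY"},
    {"composer": "J.Y. Park `TheAsiansoul`"},
    {"composer": ""},
]
counts = credit_counts(rows)
```

Every co-writer on a song gets full credit for it, which is why the totals across songwriters exceed the number of songs. Splitting on commas would mis-handle any name that itself contains a comma.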

Lyricists with the most ranked songs

Composers with the most ranked songs

4. Reflections

Some results were expected; others were not. Overall, the project was a fun way to explore K-pop. One regret is that the data could have been a lot cleaner; if all the artists, composers, and lyricists had been listed individually under consistent names, it would have been possible to get accurate per-person statistics.

Also, if the data had been categorized with more labels such as gender, genre, etc., many more interesting results could have been extracted. This would take a lot of time to do manually; hopefully there is a useful data source lying around somewhere. I felt that I only scratched the surface of music data analysis with this small project. If time allows, I think it would be fun to continue the analysis in more depth.

Most of the exploration and visualizations were done using Python; I shared the code on my GitHub.