Analyzing K-Pop Top Charts (2008–2016)

Looking through the top charts is one of the most intuitive ways to grasp trends in music. However, I thought it would be interesting to venture beyond mere intuition and analyze the data in a more concrete form. In this post, I share the process and results of my exploration of K-pop’s top 100 songs for each year from 2008 to 2016.

1. Obtaining the data

Although information on songs and artists is abundant on various K-pop music streaming and download websites, I could not find a source that provides it in a downloadable format. I therefore decided that my best option would be to scrape these websites, each of which displays daily, weekly, and annual top charts.

There are 4 popular K-pop music websites:

  1. Bugs
  2. Melon
  3. Naver Music
  4. Mnet

I explored the websites quite extensively, focusing on the top chart and song details pages. My planned strategy was to (1) fetch the top 100 songs in a specified time range (e.g. 5 years), then (2) crawl into the details page for each of those songs to scrape fields such as the composers, lyricists, etc. After multiple implementations of different crawlers (which I might cover in a future post), I found that Mnet provided the best scraping experience due to its logical HTML markup.

Link to the Mnet Top 100 chart

I fetched the top 100 songs in each year from 2008 to 2016, a total of 900 records. After preprocessing, the final data was structured with the following fields:

year | rank | id* | title | artist | featuring | composer | lyricist | arranger | time**
2009 | 50 | 1906097 | In The Club | 2NE1 | (none) | TEDDY,KUSH | TEDDY,KUSH | TEDDY,KUSH | 239
  • *id: a unique number assigned by Mnet.
  • **time: the length of the song in seconds.

The web crawler I used can be found on GitHub. I used the Scrapy framework to easily export fetched data to JSON and CSV formats.
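Scrapy took care of the JSON and CSV export in the original crawler. For readers without Scrapy, here is a minimal standard-library sketch of the same record layout and export step. The `SongRecord` class and the `to_csv`/`to_json` helpers are hypothetical names; the field list simply mirrors the sample row above.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class SongRecord:
    """One chart entry (hypothetical class; fields follow the sample row above)."""
    year: int
    rank: int
    id: int          # unique number assigned by Mnet
    title: str
    artist: str
    featuring: str
    composer: str    # comma-separated names, e.g. "TEDDY,KUSH"
    lyricist: str
    arranger: str
    time: int        # song length in seconds


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(SongRecord)])
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))
    return buf.getvalue()


def to_json(records):
    """Serialize records to a JSON array, keeping Korean text unescaped."""
    return json.dumps([asdict(r) for r in records], ensure_ascii=False)


example = SongRecord(2009, 50, 1906097, "In The Club", "2NE1", "",
                     "TEDDY,KUSH", "TEDDY,KUSH", "TEDDY,KUSH", 239)
print(to_json([example]))
```

The CSV writer quotes fields that contain commas, so multi-person credits such as "TEDDY,KUSH" survive a round trip intact.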

2. Preprocessing

Preprocessing consumed an extensive amount of time. There were two main problems: inconsistency and missing values. The first problem was due to the information being fetched from a public web UI rather than a database; as for the second, I can’t imagine a reason other than pure laziness.

Inconsistency occurred in many places. By far the most annoying were the artists’ names. For example, the group TWICE appeared—fittingly, in two variants—either as “TWICE” or “TWICE (트와이스)”. MC the MAX was sometimes more boldly proclaimed as “MC THE MAX”. Taeyeon appeared as “태연”, “태연 (TAEYEON)”, or “태연 (Taeyeon)”. To fix this, I went through each row manually in Excel and modified each variation to match the majority label.
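The cleanup above was done by hand. A sketch of how the same "majority label" rule could be automated follows. The grouping heuristic (drop a trailing parenthetical alias, ignore case) is an assumption that happens to cover the examples above. Its output would still need a manual check, since it could merge distinct artists or miss variants written in a different script.

```python
import re
from collections import Counter, defaultdict

# Trailing parenthetical alias, e.g. the "(트와이스)" in "TWICE (트와이스)"
_ALIAS = re.compile(r"\s*\([^)]*\)\s*$")


def variant_key(name):
    """Heuristic grouping key: drop a trailing '(...)' alias and ignore case."""
    return _ALIAS.sub("", name).casefold()


def majority_labels(names):
    """Map every raw name to the most frequent spelling within its group.

    Ties go to the spelling seen first, because most_common(1) returns the
    first maximal element in insertion order.
    """
    groups = defaultdict(Counter)
    for name in names:
        groups[variant_key(name)][name] += 1
    canonical = {key: counts.most_common(1)[0][0] for key, counts in groups.items()}
    return {name: canonical[variant_key(name)] for name in names}


raw = ["TWICE", "TWICE (트와이스)", "TWICE",
       "MC the MAX", "MC THE MAX", "MC the MAX",
       "태연", "태연 (TAEYEON)", "태연 (Taeyeon)", "태연"]
mapping = majority_labels(raw)
print(mapping)
```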

Composer names were even more inconsistent. Park Jin-young was listed as “JYP”, “박진영”, or “J.Y. Park `TheAsiansoul`”. Unlike artists, composers and lyricists have little incentive to stick to a single name, since they are less publicly visible. I did not know enough about composers to reliably map between the many names they go by, so I ultimately left the values as they were.

Another inconsistency arose in how contributors to each song were labeled. The credits predominantly used the labels “vocals”, “featuring”, “lyricists”, “composers”, and “arrangers”; in very rare cases, however, other labels appeared as well. One was “rapper”, which appeared a mere 7 times out of 900, and it was missing from most songs that actually featured rap! “Producer” appeared only 5 times in 900 records. I dropped the “producer” column, along with “electric guitar” (1 appearance), “piano” (1 appearance), and “narration”.
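Before dropping columns like these, a quick tally of how often each credit label occurs makes the rare ones obvious. The sketch below assumes each record's credits are held as a hypothetical dict mapping a label to a list of names; the records are placeholders, not rows from the Mnet data.

```python
from collections import Counter

# Main credit labels named in the text; anything else is treated as rare.
MAIN_ROLES = {"vocals", "featuring", "lyricists", "composers", "arrangers"}


def label_counts(credit_records):
    """Count how many records carry each credit label."""
    return Counter(label for record in credit_records for label in record)


def keep_main_roles(credit_records, keep=MAIN_ROLES):
    """Return new records containing only the main credit labels."""
    return [{label: names for label, names in record.items() if label in keep}
            for record in credit_records]


credit_records = [  # placeholder data
    {"vocals": ["Artist A"], "composers": ["Writer X"], "lyricists": ["Writer X"]},
    {"vocals": ["Artist B"], "rapper": ["Rapper R"], "producer": ["Producer P"],
     "composers": ["Writer Y"]},
]
print(label_counts(credit_records).most_common())
cleaned = keep_main_roles(credit_records)
```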

The “vocals” column was nearly identical to “artists”. The few differences were due to a featuring artist or a collaboration, so I transferred the relevant values from “vocals” to “featuring” and dropped the “vocals” column as well.
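This transfer was done by hand. A rough automated version is sketched below. It assumes a hypothetical layout in which "artist", "featuring", and "vocals" each hold a list of names. It moves any vocalist who is not already credited as the artist into "featuring". The rule is a simplification: for a collaboration credited under a one-off group name, it would push every member into "featuring", which is why a manual pass is still sensible.

```python
def fold_vocals_into_featuring(record):
    """Return a copy of `record` without 'vocals'.

    Any vocalist not already credited as the artist or as a featuring
    artist is appended to 'featuring'. The list-of-names layout is a
    hypothetical assumption, not the structure of the original data.
    """
    merged = dict(record)  # shallow copy so the input dict is left untouched
    vocals = merged.pop("vocals", [])
    featuring = list(merged.get("featuring", []))
    for name in vocals:
        if name not in merged["artist"] and name not in featuring:
            featuring.append(name)
    merged["featuring"] = featuring
    return merged


song = {"artist": ["Artist A"], "featuring": [], "vocals": ["Artist A", "Guest B"]}
print(fold_vocals_into_featuring(song))
```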

The fields with missing values were “composers”, “lyricists”, and “arrangers”. A total of 47 records lacked values in these columns; while I could probably have filled them in with 47 Google searches, I decided to save the fun for a later date.
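Counting such records is a one-line filter once the CSV is loaded. The sketch below uses the column names from the final schema (composer, lyricist, arranger) and a two-row sample: the first row is the sample record from earlier in this post, and the second is a placeholder with empty credits.

```python
import csv
import io

CREDIT_COLUMNS = ("composer", "lyricist", "arranger")


def rows_missing_credits(rows, columns=CREDIT_COLUMNS):
    """Return rows where at least one credit column is empty or absent."""
    return [row for row in rows
            if any(not (row.get(col) or "").strip() for col in columns)]


sample_csv = """year,rank,title,composer,lyricist,arranger
2009,50,In The Club,"TEDDY,KUSH","TEDDY,KUSH","TEDDY,KUSH"
2010,1,Placeholder Song,,,
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
print(len(rows_missing_credits(rows)))
```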

3. Results

The names and titles of songs are unfortunately mostly in Korean; this is due to the source of the dataset (Mnet) being a Korean website. I’ll try to upload a translated version of the results soon.

Songs that make multiple appearances

title | artist | ranked years (rank)
벚꽃 엔딩 | 버스커 버스커 | 2012 (6); 2013 (42); 2014 (100)
Lost Stars | Adam Levine | 2014 (47); 2015 (54)
Problem (Feat. Iggy Azalea) | Ariana Grande | 2014 (93); 2015 (67)
붉은 노을 | BIGBANG | 2008 (18); 2009 (75)
I'm Not The Only One | Sam Smith | 2015 (20); 2016 (72)
U R Man | SS501 | 2008 (76); 2009 (80)
Officially missing you, too | 긱스(Geeks), 소유 | 2012 (56); 2013 (34)
바람기억 | 나얼 | 2012 (18); 2015 (38)
하지 못한 말 | 노을 | 2012 (68); 2013 (69)
야생화 | 박효신 | 2014 (2); 2015 (90)
총 맞은 것처럼 | 백지영 | 2008 (23); 2009 (83)
너랑 나 | 아이유(IU) | 2011 (89); 2012 (64)
너 때문에 | 애프터스쿨(After School) | 2009 (82); 2010 (62)
오늘부터 우리는 (Me gustas tu) | 여자친구(GFRIEND) | 2015 (47); 2016 (22)
되돌리다 | 이승기 | 2012 (66); 2013 (53)
또 다시 사랑 | 임창정 | 2015 (35); 2016 (15)
? (물음표) (Feat. 최자 of 다이나믹듀오, Zion.T) | 프라이머리(Primary) | 2012 (95); 2013 (56)
가슴 시린 이야기 (Feat. 용준형 of BEAST) | 휘성 | 2011 (72); 2014 (60)

In the fast-moving K-pop scene, where popular artists release new songs at a rapid pace, it is not easy to hold on to a spot on the top charts. Still, 18 songs appeared in the Top 100 chart in at least two different years. Of these, only one song appears THREE times: “벚꽃 엔딩” by 버스커 버스커.

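However, this metric probably isn't a good measure of long-lasting popularity: a song released late in the year likely keeps accumulating plays after New Year's and so lands in two consecutive annual charts. Indeed, only two of the 18 songs, 바람기억 (2012, 2015) and 가슴 시린 이야기 (2011, 2014), returned after a gap; all the others charted in consecutive years.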
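Finding repeat entries and checking whether they fall in consecutive years is a simple group-by. The minimal sketch below assumes the data is available as (year, rank, title, artist) tuples. The sample rows are taken from the tables in this post. Matching on the exact (title, artist) pair is an assumption that depends on the name cleanup described in the preprocessing section.

```python
from collections import defaultdict


def songs_in_multiple_years(entries):
    """Map (title, artist) -> sorted [(year, rank), ...] for songs ranked in 2+ years."""
    by_song = defaultdict(list)
    for year, rank, title, artist in entries:
        by_song[(title, artist)].append((year, rank))
    return {song: sorted(hits) for song, hits in by_song.items()
            if len({year for year, _ in hits}) >= 2}


def consecutive_only(hits):
    """True if each appearance follows the previous one by exactly one year."""
    years = [year for year, _ in hits]
    return all(later - earlier == 1 for earlier, later in zip(years, years[1:]))


# A few rows from the tables in this post: (year, rank, title, artist)
entries = [
    (2012, 6, "벚꽃 엔딩", "버스커 버스커"),
    (2013, 42, "벚꽃 엔딩", "버스커 버스커"),
    (2014, 100, "벚꽃 엔딩", "버스커 버스커"),
    (2012, 18, "바람기억", "나얼"),
    (2015, 38, "바람기억", "나얼"),
    (2014, 9, "Let It Go", "Idina Menzel"),
]
repeats = songs_in_multiple_years(entries)
for (title, artist), hits in repeats.items():
    print(title, artist, hits, consecutive_only(hits))
```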

Songs by foreign artists

title | artist | year | rank
Lost Stars | Adam Levine | 2014 | 47
Lost Stars | Adam Levine | 2015 | 54
Rolling In The Deep | Adele | 2012 | 69
Hello | Adele | 2016 | 67
Problem (Feat. Iggy Azalea) | Ariana Grande | 2014 | 93
Problem (Feat. Iggy Azalea) | Ariana Grande | 2015 | 67
Nothin' On You (Feat. Bruno Mars) | B.O.B. | 2010 | 20
Call Me Maybe | Carly Rae Jepsen | 2013 | 66
One Call Away | Charlie Puth | 2016 | 44
Bad (Radio Edit) (Feat. Vassy) | David Guetta, Showtek | 2014 | 92
How Long Will I Love You | Ellie Goulding | 2014 | 57
Love The Way You Lie (Feat. Rihanna) | Eminem | 2010 | 53
Let It Go | Idina Menzel | 2014 | 9
Moves Like Jagger (Studio Recording From The Voice Performance) (Feat. Christina Aguilera) | Maroon 5 | 2011 | 86
Payphone (Feat. Wiz Khalifa) | Maroon 5 | 2012 | 57
One More Night | Maroon 5 | 2012 | 84
Maps | Maroon 5 | 2014 | 40
Sugar | Maroon 5 | 2015 | 15
Bang Bang | Nicki Minaj, Jessie J, Ariana Grande | 2016 | 76
I'm Not The Only One | Sam Smith | 2015 | 20
I'm Not The Only One | Sam Smith | 2016 | 72

The Mnet Top 100 chart also includes songs by foreign artists: 18 distinct songs (21 chart entries) across 2008-2016. Let It Go by Idina Menzel placed 9th in 2014, the only foreign song to reach any year's Top 10.

Artists with the most ranked songs

[Chart: artists with the most ranked songs]

One thing to consider here is that members of BIGBANG have released additional songs individually or as units. As can be seen here, G-DRAGON has 7 listed songs; however, not shown are GD&TOP (2 songs), GD X TAEYANG (1 song), and two more songs where G-DRAGON collaborated with members of Infinity Challenge (무한도전). TAEYANG also has 3 songs listed from his solo albums. If you sum up all those appearances, various members of BIGBANG have landed over 40 songs in the Top 100 charts of 2008-2016.

The top 10 songs of each year

[Charts: the top 10 songs of each year, one chart per year from 2008 to 2016]

Artist frequency distribution

[Chart: artist frequency distribution]

This chart shows the distribution of how many times each artist appears in the dataset. I had expected a small number of popular artists to dominate, making multiple appearances. However, the data shows this not to be the case; over half of Top 100 songs from 2008-2016 are by artists who appear only once.
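The distribution in the chart comes from two nested counts: songs per artist, then artists per song count. The sketch below runs on placeholder data. Note that "share of artists who appear once" and "share of songs by such artists" are different quantities, so the helper returns both.

```python
from collections import Counter


def appearance_distribution(artists):
    """Map k -> number of artists with exactly k ranked songs."""
    return Counter(Counter(artists).values())


def one_timer_shares(artists):
    """Return (share of artists seen once, share of songs by those artists)."""
    per_artist = Counter(artists)
    once = sum(1 for n in per_artist.values() if n == 1)
    return once / len(per_artist), once / len(artists)


# Placeholder artist column: one entry per ranked song.
artists = ["Group G"] * 3 + ["Solo S"] * 2 + ["Artist A", "Artist B", "Artist C"]
print(sorted(appearance_distribution(artists).items()))
print(one_timer_shares(artists))
```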

But there is a catch: many artists who are part of groups often release songs as individuals or units. For example, Tiffany of SNSD is listed only once in the data while SNSD appears 12 times. Moreover, subsets of a group often release songs under different names, such as EXO-K; collaborations between artists also often use a group name, such as 이유 갓지 않은 이유, which is a one-time collaboration between IU and Myungsoo Park (박명수). As a consequence, the actual number of “one-hit wonders” in the data would be considerably smaller.

Songwriters with the most ranked songs

[Chart: songwriters with the most ranked songs]

There are a lot more collaborations between songwriters than artists—hence the larger numbers.

JYP is listed under three different names (“JYP”, “박진영”, and “J.Y. Park `TheAsiansoul`”). Summing up all his appearances would rank him somewhere around G-DRAGON and 용감한 형제, with 28 Top 100 songs.
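Merging aliases before counting is a small mapping step. The sketch below assumes the composer column holds comma-separated names, as in the sample record near the top of this post. The alias table and the sample column are illustrative, not the author's data or code. Picking "박진영" as the canonical form is an arbitrary choice.

```python
from collections import Counter

# Illustrative alias table: the JYP spellings come from the text above.
ALIASES = {"JYP": "박진영", "J.Y. Park `TheAsiansoul`": "박진영"}


def credit_counts(credit_fields, aliases=ALIASES):
    """Count ranked songs per person from comma-separated credit strings.

    Names are mapped through `aliases`, and each person is counted at most
    once per song even if two of their aliases appear in the same credit.
    """
    counts = Counter()
    for field in credit_fields:
        names = {aliases.get(n.strip(), n.strip()) for n in field.split(",") if n.strip()}
        counts.update(names)
    return counts


# Placeholder composer column (one string per song).
composer_column = ["JYP", "박진영", "J.Y. Park `TheAsiansoul`,TEDDY", "TEDDY,KUSH"]
print(credit_counts(composer_column).most_common(3))
```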

Lyricists with the most ranked songs

Composers with the most ranked songs

4. Reflections

Some results were expected; others were not. Overall, the project was a fun way to explore K-pop. One regret is that the data could have been a lot cleaner: had every artist, composer, and lyricist been listed under a single consistent name, it would have been possible to compute accurate per-person statistics.

Also, if the data had been categorized with more labels, such as gender or genre, many more interesting results could have been extracted. Adding such labels manually would take a lot of time; hopefully there is a useful data source lying around somewhere. I felt that I only scratched the surface of music data analysis with this small project. If time allows, I think it would be fun to continue the analysis in more depth.

Most of the exploration and visualizations were done in Python; I shared the code on my GitHub.