Analyzing K-Pop Top Charts (2008–2016)
In this post, I share the process and results of my exploration of K-pop’s top 100 songs for each year from 2008 to 2016.
July 20, 2017
Looking through the top charts is one of the most intuitive ways to grasp trends in music. However, I thought it would be interesting to venture beyond mere intuition and analyze the data in a more concrete form. In this post, I share the process and results of my exploration of K-pop’s top 100 songs for each year from 2008 to 2016.
1. Obtaining the data
Although information on songs and artists is abundant on various K-pop music streaming/download websites, I could not find a source that provides this data in a downloadable format. I therefore decided that my best option would be to scrape these websites, each of which displays a top chart in daily, weekly, and annual time frames.
There are 4 popular K-pop music websites:
I explored the websites quite extensively, focusing on the top chart and song details pages. My planned strategy was to (1) fetch the top 100 songs in a specified time range (e.g. 5 years), then (2) crawl into the details page for each of those songs to scrape fields such as the composers, lyricists, etc. After multiple implementations of different crawlers (which I might cover in a future post), I found that Mnet provided the best scraping experience due to its logical HTML markup.
I fetched the top 100 songs in each year from 2008 to 2016, a total of 900 records. After preprocessing, the final data was structured with the following fields:
year | rank | id * | title | artist | featuring | composer | lyricist | arranger | time ** |
---|---|---|---|---|---|---|---|---|---|
2009 | 50 | 1906097 | In The Club | 2NE1 | | TEDDY,KUSH | TEDDY,KUSH | TEDDY,KUSH | 239 |
- * id: a unique number assigned by Mnet.
- ** time: the length of the song in seconds.
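To make the record layout concrete, here is a minimal sketch of loading such rows in Python. It assumes the exported CSV uses the ten field names listed above as its header and that contributor cells hold comma-separated names; the `Song` class and the `split_names` and `load_songs` helpers are hypothetical names for illustration, not part of the crawler.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class Song:
    year: int
    rank: int
    id: int              # number assigned by Mnet
    title: str
    artist: str          # kept as one string; it may itself contain commas
    featuring: List[str]
    composer: List[str]
    lyricist: List[str]
    arranger: List[str]
    time: int            # length of the song in seconds


def split_names(cell: str) -> List[str]:
    """Split a comma-separated cell into names; an empty cell gives []."""
    return [name.strip() for name in cell.split(",") if name.strip()]


def load_songs(handle) -> List[Song]:
    """Read rows from a CSV file object whose header matches the fields above."""
    songs = []
    for row in csv.DictReader(handle):
        songs.append(Song(
            year=int(row["year"]),
            rank=int(row["rank"]),
            id=int(row["id"]),
            title=row["title"],
            artist=row["artist"],
            featuring=split_names(row["featuring"]),
            composer=split_names(row["composer"]),
            lyricist=split_names(row["lyricist"]),
            arranger=split_names(row["arranger"]),
            time=int(row["time"]),
        ))
    return songs


# The sample row from the table above, written as CSV (assumed layout).
SAMPLE_CSV = (
    "year,rank,id,title,artist,featuring,composer,lyricist,arranger,time\n"
    '2009,50,1906097,In The Club,2NE1,,"TEDDY,KUSH","TEDDY,KUSH","TEDDY,KUSH",239\n'
)

songs = load_songs(io.StringIO(SAMPLE_CSV))
print(songs[0].composer)  # ['TEDDY', 'KUSH']
```

Splitting the contributor cells into lists up front makes the per-person counts later in the post straightforward.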
The web crawler I used can be found on GitHub. I used the Scrapy framework to easily export fetched data to JSON and CSV formats.
2. Preprocessing
Preprocessing consumed an extensive amount of time. There were two main problems: inconsistency and missing values. The first problem was due to the information being fetched from a public web UI rather than a database; as for the second, I can’t imagine a reason other than pure laziness.
Inconsistency occurred in many places. By far the most annoying were the artists’ names. For example, the group TWICE appeared—fittingly, in two variants—either as “TWICE” or “TWICE (트와이스)”. MC the MAX was sometimes more boldly proclaimed as “MC THE MAX”. Taeyeon appeared as “태연”, “태연 (TAEYEON)”, or “태연 (Taeyeon)”. To fix this, I went through each row manually in Excel and modified each variation to match the majority label.
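The same majority-label rule can be sketched in a few lines of Python. This is only an illustration of the idea, not what was actually run (the cleanup was done by hand in Excel): `VARIANT_GROUPS` is a hypothetical, hand-curated list of which spellings belong together, and the `artists` column is toy data.

```python
from collections import Counter

# Hypothetical, hand-curated groups: each set lists spellings of one artist.
VARIANT_GROUPS = [
    {"TWICE", "TWICE (트와이스)"},
    {"태연", "태연 (TAEYEON)", "태연 (Taeyeon)"},
]


def build_canonical_map(artist_column, variant_groups):
    """Map every known variant to the most frequent spelling in its group."""
    counts = Counter(artist_column)
    mapping = {}
    for group in variant_groups:
        # sorted() makes the choice deterministic when two spellings tie.
        majority = max(sorted(group), key=lambda name: counts[name])
        for variant in group:
            mapping[variant] = majority
    return mapping


# Toy column, not the real data.
artists = [
    "TWICE", "TWICE (트와이스)", "TWICE",
    "태연 (TAEYEON)", "태연", "태연 (TAEYEON)",
]
canonical = build_canonical_map(artists, VARIANT_GROUPS)
cleaned = [canonical.get(name, name) for name in artists]
print(cleaned)
```

Deciding which spellings refer to the same artist still needs a human; the code only automates picking the majority label once the groups are known.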
Composer names were even more inconsistent. Park Jin-young was listed as “JYP”, “박진영”, or “J.Y. Park `TheAsiansoul`”. Unlike artists, composers and lyricists have little incentive to stick to a single name because they are less publicly visible. I did not know enough about composers to map between the ever-changing names they gave themselves, so I ultimately left the values as they were.
Another inconsistency arose in the labels for contributors to a song. The labels were predominantly “vocals”, “featuring”, “lyricists”, “composers”, and “arrangers”; however, in very rare cases, other labels appeared as well. One was “rapper”, which appeared a mere 7 times out of 900; however, it did not appear in most songs that featured rap! “Producer” made an appearance only 5 times in 900 records. I got rid of these columns, as well as “electric guitar” (1 appearance), “piano” (1 appearance), and “narration”.
The “vocals” column was nearly identical to “artists”. The few times they differed were due to a featuring artist or a collaboration. Consequently, I transferred the necessary values from “vocals” to “featuring” and erased the “vocals” column as well.
The fields that contained missing values were “composers”, “lyricists”, and “arrangers”. A total of 47 records lacked values in these columns; and while I could probably have filled them in by doing 47 Google searches, I decided to save the fun for a later date.
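As a rough sketch of that transfer, assuming each row stores its contributor fields as lists of names: any vocalist who is not the main artist and is not already credited as featuring gets appended to “featuring”, and “vocals” is then dropped. The function name and the placeholder names in the toy row are hypothetical.

```python
def fold_vocals_into_featuring(row):
    """Move vocalists who are not the main artist into 'featuring', then drop 'vocals'."""
    main_artists = set(row["artist"])
    featuring = list(row["featuring"])
    for name in row["vocals"]:
        if name not in main_artists and name not in featuring:
            featuring.append(name)
    # Copy every field except 'vocals', then overwrite 'featuring'.
    cleaned = {key: value for key, value in row.items() if key != "vocals"}
    cleaned["featuring"] = featuring
    return cleaned


# Toy row with placeholder names, not real data.
row = {
    "title": "Some Song",
    "artist": ["Artist A"],
    "vocals": ["Artist A", "Artist B"],
    "featuring": [],
}
cleaned_row = fold_vocals_into_featuring(row)
print(cleaned_row["featuring"])  # ['Artist B']
```

A rule like this cannot tell a featuring credit from an equal collaboration, which is why the few differing rows are worth checking by eye.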
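Counting such gaps is simple once the rows are loaded. The sketch below assumes that a record counts as incomplete when any of the three credit fields is blank (the text does not say whether “any” or “all” was meant) and uses toy rows and singular field names.

```python
CREDIT_FIELDS = ("composer", "lyricist", "arranger")


def has_missing_credit(row):
    """True if at least one credit field is absent or blank."""
    return any(not row.get(field, "").strip() for field in CREDIT_FIELDS)


# Toy rows, not the real data.
rows = [
    {"title": "Song A", "composer": "X", "lyricist": "Y", "arranger": "Z"},
    {"title": "Song B", "composer": "", "lyricist": "Y", "arranger": ""},
    {"title": "Song C", "composer": "", "lyricist": "", "arranger": ""},
]

incomplete = [row["title"] for row in rows if has_missing_credit(row)]
blank_per_field = {
    field: sum(1 for row in rows if not row.get(field, "").strip())
    for field in CREDIT_FIELDS
}
print(incomplete)       # ['Song B', 'Song C']
print(blank_per_field)  # {'composer': 2, 'lyricist': 1, 'arranger': 2}
```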
3. Results
The names and titles of songs are unfortunately mostly in Korean; this is due to the source of the dataset (Mnet) being a Korean website. I’ll try to upload a translated version of the results soon.
Songs that make multiple appearances
title | artist | ranked years(rank) |
---|---|---|
벚꽃 엔딩 | 버스커 버스커 | 2012(6); 2013(42); 2014(100) |
Lost Stars | Adam Levine | 2014(47); 2015(54) |
Problem(Feat. Iggy Azalea) | Ariana Grande | 2014(93); 2015(67) |
붉은 노을 | BIGBANG | 2008(18); 2009(75) |
I`m Not The Only One | Sam Smith | 2015(20); 2016(72) |
U R Man | SS501 | 2008(76); 2009(80) |
Officially missing you, too | 긱스(Geeks),소유 | 2012(56); 2013(34) |
바람기억 | 나얼 | 2012(18); 2015(38) |
하지 못한 말 | 노을 | 2012(68); 2013(69) |
야생화 | 박효신 | 2014(2); 2015(90) |
총 맞은 것처럼 | 백지영 | 2008(23); 2009(83) |
너랑 나 | 아이유(IU) | 2011(89); 2012(64) |
너 때문에 | 애프터스쿨(After School) | 2009(82); 2010(62) |
오늘부터 우리는 (Me gustas tu) | 여자친구(GFRIEND) | 2015(47); 2016(22) |
되돌리다 | 이승기 | 2012(66); 2013(53) |
또 다시 사랑 | 임창정 | 2015(35); 2016(15) |
? (물음표)(Feat. 최자 of 다이나믹 듀오, Zion.T) | 프라이머리(Primary) | 2012(95); 2013(56) |
가슴 시린 이야기(Feat. 용준형 of BEAST) | 휘성 | 2011(72); 2014(60) |
In the dynamic K-pop scene, where new songs by popular artists are released at a rapid pace, it is not easy to hold on to a spot on the top charts. However, a total of 18 songs appeared in the Top 100 chart in at least two years. Of these, only one song appears THREE times: “벚꽃 엔딩” by 버스커 버스커.
However, this metric probably isn’t a good measure of long-lasting popularity, as songs that are released later in the year are likely to be listed in two consecutive years.
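A table like the one above can be built by grouping chart entries by song and keeping the songs that show up in more than one year. Here is a minimal sketch that groups on the (title, artist) pair, which is an assumption; the entries are a handful of rows copied from the tables in this post.

```python
from collections import defaultdict

# (year, rank, title, artist): a few entries from the tables in this post.
entries = [
    (2012, 6, "벚꽃 엔딩", "버스커 버스커"),
    (2013, 42, "벚꽃 엔딩", "버스커 버스커"),
    (2014, 100, "벚꽃 엔딩", "버스커 버스커"),
    (2014, 47, "Lost Stars", "Adam Levine"),
    (2015, 54, "Lost Stars", "Adam Levine"),
    (2014, 9, "Let It Go", "Idina Menzel"),  # charted in one year only
]


def repeated_songs(entries):
    """Return {(title, artist): [(year, rank), ...]} for songs charting in 2+ years."""
    by_song = defaultdict(list)
    for year, rank, title, artist in entries:
        by_song[(title, artist)].append((year, rank))
    return {song: sorted(hits) for song, hits in by_song.items() if len(hits) > 1}


repeats = repeated_songs(entries)
for (title, artist), hits in repeats.items():
    ranked = "; ".join(f"{year}({rank})" for year, rank in hits)
    print(f"{title} | {artist} | {ranked}")
```

Grouping on title and artist only works after the artist names have been made consistent, which is one reason the preprocessing step mattered.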
Songs by foreign artists
title | artist | year | rank |
---|---|---|---|
Lost Stars | Adam Levine | 2014 | 47 |
Lost Stars | Adam Levine | 2015 | 54 |
Rolling In The Deep | Adele | 2012 | 69 |
Hello | Adele | 2016 | 67 |
Problem(Feat. Iggy Azalea) | Ariana Grande | 2014 | 93 |
Problem(Feat. Iggy Azalea) | Ariana Grande | 2015 | 67 |
Nothin` On You(Feat. Bruno Mars) | B.O.B. | 2010 | 20 |
Call Me Maybe | Carly Rae Jepsen | 2013 | 66 |
One Call Away | Charlie Puth | 2016 | 44 |
Bad (Radio Edit)(Feat. Vassy) | David Guetta,Showtek | 2014 | 92 |
How Long Will I Love You | Ellie Goulding | 2014 | 57 |
Love The Way You Lie(Feat. Rihanna) | Eminem | 2010 | 53 |
Let It Go | Idina Menzel | 2014 | 9 |
Moves Like Jagger (Studio Recording From The Voice Performance)(Feat. Christina Aguilera) | Maroon 5 | 2011 | 86 |
Payphone(Feat. Wiz Khalifa) | Maroon 5 | 2012 | 57 |
One More Night | Maroon 5 | 2012 | 84 |
Maps | Maroon 5 | 2014 | 40 |
Sugar | Maroon 5 | 2015 | 15 |
Bang Bang | Nicki Minaj,Jessie J,Ariana Grande | 2016 | 76 |
I`m Not The Only One | Sam Smith | 2015 | 20 |
I`m Not The Only One | Sam Smith | 2016 | 72 |
The Mnet Top 100 chart also includes songs by foreign artists. There are 18 such songs in total in the charts of 2008-2016, accounting for 21 chart entries because three of them placed in two different years. Let It Go by Idina Menzel actually placed within the Top 10 songs of 2014.
Artists with the most ranked songs
One thing to consider here is that members of BIGBANG have released additional songs individually or as units. As can be seen here, G-DRAGON has 7 listed songs; however, not shown are GD&TOP (2 songs), GD X TAEYANG (1 song), and two more songs where G-DRAGON collaborated with members of Infinity Challenge (무한도전). TAEYANG also has 3 songs listed from his solo albums. If you sum up all those appearances, various members of BIGBANG have landed over 40 songs in the Top 100 charts of 2008-2016.
The top 10 songs of each year
2008
2009
2010
2011
2012
2013
2014
2015
2016
Artist frequency distribution
This chart shows the distribution of how many times each artist appears in the dataset. I had expected a small number of popular artists to dominate, making multiple appearances. However, the data shows this not to be the case; over half of Top 100 songs from 2008-2016 are by artists who appear only once.
But there is a catch: many artists who are part of groups also release songs as individuals or units. For example, Tiffany of SNSD is listed only once in the data while SNSD appears 12 times. Moreover, subsets of a group often release songs under different names, such as EXO-K; collaborations between artists also often use a group name, such as 이유 갓지 않은 이유, which is a one-time collaboration between IU and Myungsoo Park (박명수). As a consequence, the actual number of “one-hit wonders” in the data is probably considerably smaller.
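For reference, a distribution like this is a count of counts: first count the songs per artist, then count how many artists share each song count. The sketch below uses a toy artist column with placeholder names and also shows that "share of artists with one song" and "share of songs by such artists" are two different numbers.

```python
from collections import Counter

# Toy artist column with placeholder names, one entry per charted song.
artists = ["A", "A", "A", "B", "B", "C", "D", "E"]

songs_per_artist = Counter(artists)                 # artist -> number of songs
distribution = Counter(songs_per_artist.values())   # number of songs -> number of artists

one_song_artists = distribution[1]
share_of_artists = one_song_artists / len(songs_per_artist)
share_of_songs = one_song_artists / len(artists)

print(sorted(distribution.items()))  # [(1, 3), (2, 1), (3, 1)]
print(share_of_artists)              # 0.6
print(share_of_songs)                # 0.375
```

In the toy data, 60% of artists appear once but they account for only 37.5% of the songs, so it is worth being explicit about which of the two a chart reports.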
Songwriters with the most ranked songs
There are a lot more collaborations between songwriters than artists—hence the larger numbers.
JYP is listed under three different names (“JYP”, “박진영”, and “J.Y. Park `TheAsiansoul`”). Summing up all his appearances would rank him somewhere around G-DRAGON and 용감한 형제, with 28 Top 100 songs.
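Merging the aliases before counting could look like the sketch below. The alias strings come from the data as described above; choosing "JYP" as the canonical label is arbitrary, the `count_credits` helper is hypothetical, and the composer cells are toy values rather than real rows.

```python
from collections import Counter

# Map each alternate spelling to one canonical label (the choice of "JYP" is arbitrary).
ALIASES = {
    "박진영": "JYP",
    "J.Y. Park `TheAsiansoul`": "JYP",
}


def count_credits(composer_cells, aliases):
    """Count songs per composer after resolving aliases.

    Each cell is a comma-separated list of names for one song. A set is used
    per song so that one person credited under two names is counted once.
    """
    counts = Counter()
    for cell in composer_cells:
        names = {
            aliases.get(name.strip(), name.strip())
            for name in cell.split(",")
            if name.strip()
        }
        counts.update(names)
    return counts


# Toy composer cells, not the real data.
cells = ["JYP", "박진영,TEDDY", "J.Y. Park `TheAsiansoul`", "TEDDY,KUSH"]
credits = count_credits(cells, ALIASES)
print(credits.most_common())
```

The alias table has to be curated by hand, which is the part that requires knowing the songwriters well.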
Lyricists with the most ranked songs
4. Reflections
Some results were expected; others were not. Overall, the project was a fun way to explore K-pop. One regret is that the data could have been a lot cleaner; if all the artists were listed by name, as well as the composers and lyricists, it would have been possible to get accurate per-person statistics.
Also, if the data had been categorized with more labels such as gender, genre, etc., more interesting results could have been extracted. This would take a lot of time to do manually; hopefully there is a useful data source lying around somewhere. I felt that I only scratched the surface of music data analysis with this small project. If time allows, I think it would be fun to continue the analysis in more depth.
Most of the exploration and visualizations were done using Python; I shared the code on my GitHub.