2021 American Collegiate Mathematical Contest in Modeling, Problem D (The Influence of Music): Full Problem-Solving Documentation and Code

2021 American Collegiate Mathematical Contest in Modeling

Problem D: The Influence of Music

Original problem statement:

  Music is part of human society and an important part of cultural heritage. As part of an effort to understand the role music plays in the collective human experience, we were asked to develop a method to quantify musical evolution. When artists create a new piece of music, there are many factors that can influence them, including their innate creativity, current social or political events, access to a new instrument or tool, or other personal experiences. Our goal is to understand and measure the influence of previously produced music on new music and music artists.
  Some artists can list a dozen or more other artists who they say have influenced their own music. It has also been suggested that influence can be measured by the degree of similarity between song features, such as structure, rhythm or lyrics. Music sometimes undergoes revolutionary shifts that offer new sounds or rhythms, such as when a new genre emerges or an existing genre (e.g., classical, pop/rock, jazz) is reinvented. This can be due to a series of small changes, a collaborative effort of artists, a series of influential artists, or a shift within society.
  Many songs have a similar sound, and many artists have contributed to major shifts in the genre of music. Sometimes these changes are due to one artist influencing another. Sometimes it is a change that occurs in response to external events (such as major world events or technological advances). By considering song networks and their musical characteristics, we can begin to capture the influence of musical artists on each other. Perhaps, we can also better understand how music has evolved in society over time.
  Your team has been selected by the Integrative Collective Music (ICM) Society to develop a model that measures musical influence. The problem asks you to examine evolutionary and revolutionary trends of artists and genres. To do this, the ICM has given your team several datasets:
  1) "influence_data" represents musical influencers and followers, as reported by the artists themselves and by industry experts. The data cover influencers and followers of 5,854 artists over the past 90 years.
  2) "full_music_data" provides 16 variable entries, including musical features such as danceability, tempo, loudness and key, along with artist_name and artist_id, for each of 98,340 songs. These data were used to create two aggregated datasets:
  a. mean values by artist, "data_by_artist",
  b. mean values by year, "data_by_year".
  Note: the data provided in these files are a subset of a larger dataset. These files contain the only data you should use for this problem.
  To carry out this challenging project, the ICM Society asks your team to explore the evolution of music through the influence of musical artists over time by doing the following:
  Use the influence_data dataset, or parts of it, to create one or more directed networks of musical influence in which influencers are connected to followers. Develop parameters that capture "music influence" within this network. Explore a subset of musical influence by creating a subnetwork of your directed influencer network. Describe this subnetwork. What do your "music influence" measures reveal in this subnetwork?
  Use the music features from full_music_data and/or the two aggregated datasets (by artist and by year) to develop measures of musical similarity. Using your measure, are artists within a genre more similar than artists in different genres?
  Compare similarities and influences between and within genres. What distinguishes a genre, and how do genres change over time? Are some genres related to others?
  Indicate whether the similarity data, as reported in the data_influence dataset, suggest that the identified influencers actually influence their respective artists. Do the "influencers" really affect the music created by their followers? Are some musical traits more "contagious" than others, or do they all play a similar role in influencing a particular artist's music?
  Determine from these data whether there are characteristics that signify a revolution (major leap) in musical evolution. Which artists represent revolutionaries (influencers of major change) in your network?
  Analyze the influence processes of musical evolution that occurred over time within one genre. Can your team identify indicators that reveal the dynamic influencers and explain how genres or artists changed over time?
  How does your work express information about the cultural influence of music in time or circumstances? Alternatively, how can the effects of social, political or technological change (such as the Internet) be identified within the network?
  Write a one-page document to the ICM Society on the value of using your approach to understand the influence of music through networks. Given that the two problem datasets are limited to certain genres, and then to artists common to both datasets, how would your work or solutions change with more or richer data? Recommend further study of music and its effect on culture.
  The ICM Society, drawn from an interdisciplinary and diverse set of fields in music, history, social science, technology and mathematics, looks forward to your final report.

Overview of the solution process (abstract)

  To understand the evolution of music, we combine network science, cosine similarity, cooling and gravity models borrowed from physics, and other statistical methods to explore how music evolves through influence across artists and genres.
  First, we build a directed influencer network to visualize the relationships between influencers and followers. We then define musical influence with the help of a cooling model, which describes how influence declines as the time interval grows. With that in place, we show the musical influence of the artists in a subnetwork by node size and analyze them in detail.
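  As a rough illustration of this step, the sketch below builds a small directed influence network and scores each influencer with an exponentially decaying weight. The rows, the decay constant `DECAY_RATE` and the scoring rule (sum of `exp(-k * gap)` over outgoing edges) are assumptions made for illustration; the paper's exact cooling formula is not given in this excerpt.

```python
import math
from collections import defaultdict

# Hypothetical rows: (influencer, influencer_start_year, follower, follower_start_year).
# These values are invented; the real data come from influence_data.csv.
edges = [
    ("A", 1960, "B", 1970),
    ("A", 1960, "C", 1990),
    ("B", 1970, "C", 1990),
    ("B", 1970, "D", 1975),
]

DECAY_RATE = 0.05  # assumed cooling constant k (per year), not taken from the paper


def build_network(rows):
    """Return adjacency lists mapping each influencer to (follower, year gap) pairs."""
    graph = defaultdict(list)
    for influencer, start_i, follower, start_f in rows:
        gap = max(start_f - start_i, 0)  # treat negative gaps as zero
        graph[influencer].append((follower, gap))
    return graph


def influence_scores(graph, k=DECAY_RATE):
    """Score each influencer as the sum of exp(-k * gap) over its outgoing edges."""
    return {
        artist: sum(math.exp(-k * gap) for _, gap in followers)
        for artist, followers in graph.items()
    }


network = build_network(edges)
scores = influence_scores(network)
for artist, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{artist}: {score:.3f}")
```

  In this toy network, B outranks A even though both have two followers, because B's followers started closer in time to B. Such scores could then drive the node sizes in a network plot.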
  Next, after data preprocessing, we use PCA to reduce the dimensionality of the music features and extract four factors. From the feature vector formed by these four factors, we compute the cosine similarity between works or genres. The similarity measure behaves as expected: artists within a genre are more similar to one another than artists in different genres.
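  The within-genre versus between-genre comparison can be sketched as follows. This is a simplified stand-in rather than the paper's pipeline: it z-scores a few made-up features instead of running PCA, and the artists, genres and values are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical artists: (name, genre, feature vector). Values are illustrative only.
artists = [
    ("a1", "Jazz", [0.50, 0.30, 0.80]),
    ("a2", "Jazz", [0.55, 0.35, 0.75]),
    ("a3", "Pop/Rock", [0.70, 0.85, 0.10]),
    ("a4", "Pop/Rock", [0.65, 0.90, 0.15]),
]


def standardize(vectors):
    """Z-score each column so that no single feature dominates the cosine."""
    columns = list(zip(*vectors))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) or 1.0 for col in columns]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(vec, mus, sigmas)] for vec in vectors]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


scaled = standardize([vec for _, _, vec in artists])
within, between = [], []
for i, j in combinations(range(len(artists)), 2):
    sim = cosine(scaled[i], scaled[j])
    # Same genre goes to the within-genre pool, otherwise to the between-genre pool
    (within if artists[i][1] == artists[j][1] else between).append(sim)

print("mean within-genre similarity: ", round(mean(within), 3))
print("mean between-genre similarity:", round(mean(between), 3))
```

  Standardizing first matters because raw features such as tempo and loudness live on very different scales, and without it the cosine would be dominated by the largest column.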
  We then build on the basic network and similarity models to address the tasks on musical evolution. The analysis proceeds from the following four perspectives.
  For the task on influencers and followers, we compare the similarity density curve of follower-influencer pairs with that of follower/non-influencer pairs and confirm that influencers do affect their followers' music. We then recompute the similarity with each feature removed in turn from the feature vector, to see which features are more contagious in the influence process of a particular genre.
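  A minimal sketch of the leave-one-feature-out idea is shown below. The pairs and feature values are hypothetical, and reading a positive drop as "this feature is shared between influencer and follower" is an assumption about how the comparison is interpreted, not a statement of the paper's exact criterion.

```python
import math
from statistics import mean

FEATURES = ["danceability", "energy", "valence"]  # a subset of the dataset's columns

# Hypothetical (influencer vector, follower vector) pairs on comparable scales.
pairs = [
    ([0.60, 0.80, 0.30], [0.62, 0.78, 0.70]),
    ([0.40, 0.50, 0.90], [0.41, 0.52, 0.20]),
    ([0.70, 0.30, 0.50], [0.69, 0.33, 0.95]),
]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pair_similarity(pairs, skip=None):
    """Average influencer-follower similarity, optionally ignoring one feature index."""
    sims = []
    for inf, fol in pairs:
        u = [x for idx, x in enumerate(inf) if idx != skip]
        v = [x for idx, x in enumerate(fol) if idx != skip]
        sims.append(cosine(u, v))
    return mean(sims)


baseline = mean_pair_similarity(pairs)
# drop > 0: similarity falls without the feature, so followers tend to share it.
# drop < 0: similarity rises without the feature, so followers tend to differ on it.
drops = {
    name: baseline - mean_pair_similarity(pairs, skip=idx)
    for idx, name in enumerate(FEATURES)
}
for name, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>12}: drop in similarity when removed = {drop:+.4f}")
```

  In this toy data the followers copy danceability and energy but not valence, so valence ends up with the lowest (negative) drop.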
  For the task on genre trends, we first compute the similarity matrix of genres and use it to describe the relations and differences between genres. Next, based on the gravity model, we estimate the dominant genres in each year and analyze how genres trend over time. In addition, we analyze the evolution of country music in particular, using popularity and counts to identify dynamic influencing factors and to explain the fusion of country with other genres during its evolution.
  For the task on musical revolutions, we compute the cosine similarity between each year and the next and treat significant changes as revolutions. We find that the 1940s, 1960s and 1980s were periods of revolution, and a closer look at the revolution from 1965 to 1967 shows that the Beatles and Bob Dylan were the two artists who represented it.
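  The year-over-year comparison could look roughly like the sketch below. The yearly vectors are invented stand-ins for rows of data_by_year, and the threshold (more than `z` standard deviations below the mean similarity) is an assumed rule for what counts as a "significant change".

```python
import math
from statistics import mean, pstdev

# Hypothetical yearly mean feature vectors in the spirit of data_by_year.
yearly = {
    1963: [0.50, 0.40, 0.60],
    1964: [0.51, 0.41, 0.59],
    1965: [0.52, 0.42, 0.58],
    1966: [0.30, 0.70, 0.35],
    1967: [0.31, 0.71, 0.34],
    1968: [0.32, 0.72, 0.33],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def year_over_year(series):
    """Map each year to its cosine similarity with the previous year in the series."""
    years = sorted(series)
    return {cur: cosine(series[prev], series[cur]) for prev, cur in zip(years, years[1:])}


def flag_revolutions(sims, z=1.0):
    """Return years whose similarity is more than z standard deviations below the mean."""
    values = list(sims.values())
    mu, sigma = mean(values), pstdev(values)
    return [year for year, s in sims.items() if s < mu - z * sigma]


sims = year_over_year(yearly)
for year, s in sims.items():
    print(year, round(s, 4))
print("candidate revolution years:", flag_revolutions(sims))
```

  Only the year where the feature profile jumps is flagged; the years on either side stay close to a similarity of 1.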
  For tasks on other influences on the evolution of music, we analyze in detail genre timelines and periods of revolution as well as major cultural, social, political or technological changes, explaining how events such as the Cold War, the Civil Rights Movement, etc. affected the development of music.
  In summary, our model is instructive for further research thanks to its comprehensiveness, its innovation and its good performance in sensitivity analysis.

Model assumptions:

  Musical influence decreases as the time interval grows. Works generally have a limited shelf life, and their popularity decays over time. The amount of decay depends on the length of the time interval, not on the release year or the year of peak popularity. The longer the interval between the year of judging and the year of release, the less impact the work has on today's musicians.
  Every follower was influenced by an influencer the year they started making music. It is difficult to define when a person's peak period is, so it is fair and reasonable to use a person's active start year to calculate.
  Each artist belongs to only one genre, and throughout his career, the genre does not change. Generally speaking, an artist who decides to choose a genre must be influenced by his predecessors' works, so he is unlikely to change genres in his later creations.
  An artist can reflect the characteristics of his genre. An artist needs to reflect the characteristics of his category, otherwise he would not be classified as such.
  Existing data sets can reflect the situation of the music market. We fully believe in the statistical data of various indicators such as the number of followers and popularity, and believe that the trends reflected are correct.

Question restatement:

  Music has developed rapidly over the past 100 years. To reveal the evolution of music from the perspective of artists' influence, we need to address the following tasks:
  construct a directed network of influence relations from the provided "influence_data" dataset and develop a measure of musical influence; the network construction should cover both the parent network and subnetworks, with detailed descriptions;
  build a model that measures musical similarity from the remaining datasets, and validate it by comparing the similarity between artists within a genre with that between artists across genres;
  explore whether influencers really influence the music of their followers, and whether every musical feature plays an equally important role in the influence process;
  identify differences and connections between genres, and explore how genres develop over time, covering both the succession of representative genres in different periods and the development of a single genre across periods; determine indicators that reveal the most dynamic influencing factors and analyze the evolution of a specific genre;
  introduce an indicator that signals major changes in the evolutionary process, and identify the artists who represent the revolutionaries;
  discuss the cultural influence of music and other factors that may have influenced its evolution.

Model establishment and solution

[Figures: thumbnails of the full paper]


Part of the program code:

import pandas as pd
from collections import Counter
from matplotlib import pyplot as plt

# Read the influence data
data = pd.read_csv('influence_data.csv')

# Count how often each genre appears as influencer and as follower
influence_counts = Counter(data['influencer_main_genre'])
follower_counts = Counter(data['follower_main_genre'])

# Use one sorted label list for both series so the bars stay aligned
# even if a genre appears in only one of the two columns
labels = sorted(set(influence_counts) | set(follower_counts))
influence_data_datas = [influence_counts.get(label, 0) for label in labels]
follower_data_datas = [follower_counts.get(label, 0) for label in labels]

# Bar width
bar_width = 0.2

# x positions of the two bar groups
x_i = list(range(len(labels)))
x_f = [i + bar_width for i in x_i]

# Figure size
plt.figure(figsize=(10, 8), dpi=100)

# Bars for the influencer counts
plt.bar(x_i, influence_data_datas, width=bar_width, label="influence", color='red')

# Bars for the follower counts
plt.bar(x_f, follower_data_datas, width=bar_width, label="follow", color='gray')

# Legend
plt.legend(fontsize=10)

# x-axis tick labels, rotated for readability
plt.xticks(x_f, labels, rotation=70)

# Tick label size
plt.tick_params(axis='both', which='major', labelsize=14)

# Show the bar chart
plt.show()
# --- Similarity between consecutive songs of each artist ---
import numpy as np
import pandas as pd
from collections import Counter
from sklearn.metrics.pairwise import cosine_similarity

# Read the CSV file
data = pd.read_csv('full_music_data.csv')

# Artist name of every song
art_list = list(data['artist_names'])
# Count the number of songs per artist
artist = Counter(art_list)
# Sort the artists by song count, most songs first
artist_sort = sorted(artist.items(),key=lambda item:item[1],reverse=True)

# Read the 13 feature columns
danceability = list(data['danceability'])
energy = list(data['energy'])
valence = list(data['valence'])
tempo = list(data['tempo'])
loudness = list(data['loudness'])
mode = list(data['mode'])
key = list(data['key'])
acousticness = list(data['acousticness'])
instrumentalness = list(data['instrumentalness'])
liveness = list(data['liveness'])
speechiness = list(data['speechiness'])
explicit = list(data['explicit'])
popularity = list(data['popularity'])

# Read the song titles
song_title = list(data['song_title (censored)'])

# Compute the similarity between consecutive songs of each artist
def getSortData(artist_sort,num):
    # The 13 feature lists read above, in a fixed order
    feature_lists = [danceability,energy,valence,tempo,loudness,mode,key,
                     acousticness,instrumentalness,liveness,speechiness,
                     explicit,popularity]

    for i in range(num):
        artist_name,length = artist_sort[i]
        # Skip artists with fewer than three songs
        if length <= 2:
            continue

        # Slice bounds of this artist's songs
        # (assumes an artist's songs are stored contiguously in the file)
        startindex = art_list.index(artist_name)
        endindex = startindex+length
        print('The ',(i+1),'th artist:',artist_name,'songs num:',length)

        # One 13-dimensional feature vector per song; the slice includes the last song
        songs = [[feature[k] for feature in feature_lists]
                 for k in range(startindex,endindex)]
        # Titles of this artist's songs
        song_title_pic = song_title[startindex:endindex]

        # Cosine similarity between each song and the next one
        sim_list = [cosine_similarity(np.array([songs[j]]),np.array([songs[j+1]]))[0][0]
                    for j in range(len(songs)-1)]

        # Smallest similarity and the position of the pair where it occurs
        m = min(sim_list)
        index = sim_list.index(m)

        # Append the artist, all similarities and the least similar pair to the file;
        # sim_list[index] compares song index with song index+1
        with open('music_sim.txt','a') as f:
            f.write('The '+str(i+1)+'th artist:'+str(artist_name)+' songs num:'+str(length)+' songs similarity:')
            f.write(''.join(str(sim)+' ' for sim in sim_list))
            f.write(' minimum similarity is:'+str(m))
            f.write(' minimum similarity songs are:'+str(song_title_pic[index])+'--->'+str(song_title_pic[index+1])+'\n')
        print()

# Run the computation for all artists
getSortData(artist_sort,len(artist_sort))

# Compute one similarity score per artist:
# the mean cosine similarity between that artist's consecutive songs
def getArtistSimilarity(artist_sort,num):
    # Per-artist mean similarity
    artist_sim_list = []
    # Artist names, in the same order
    artist_list = []

    # The 13 feature lists read above, in a fixed order
    feature_lists = [danceability,energy,valence,tempo,loudness,mode,key,
                     acousticness,instrumentalness,liveness,speechiness,
                     explicit,popularity]

    for i in range(num):
        artist_name,length = artist_sort[i]
        # Skip artists with fewer than three songs
        if length <= 2:
            continue

        # Slice bounds of this artist's songs
        # (assumes an artist's songs are stored contiguously in the file)
        startindex = art_list.index(artist_name)
        endindex = startindex+length
        print('The ',(i+1),'th artist:',artist_name,'songs num:',length)

        # One row of 13 features per song; the slice includes the last song
        songs = np.array([[feature[k] for feature in feature_lists]
                          for k in range(startindex,endindex)])

        # Cosine similarity between each song and the next one
        sim_list = [cosine_similarity(songs[j:j+1],songs[j+1:j+2])[0][0]
                    for j in range(len(songs)-1)]

        # The artist's score is the mean over all consecutive song pairs
        artist_sim = np.mean(sim_list)
        artist_sim_list.append(artist_sim)
        artist_list.append(artist_name)

        # Write the artist name and score
        with open('artist_sim.txt','a') as f:
            f.write(str(artist_name)+' '+str(artist_sim)+'\n')
        print()

    # Return the scores and the matching artist names
    return artist_sim_list,artist_list

artist_sim_list,artist_list = getArtistSimilarity(artist_sort,len(artist_sort))

# Join with the influence data to compute genre-level similarity
influencedata = pd.read_csv('influence_data.csv')

# influence_data has one row per influencer-follower pair, so keep one row
# per influencer; otherwise artists with many followers are counted many times
influencers = influencedata.drop_duplicates(subset='influencer_name')

# artist_names in full_music_data are stored as strings such as "['Name']",
# so the lookup table uses the same format
genre_by_artist = {"['"+str(name)+"']": genre
                   for name,genre in zip(influencers['influencer_name'],
                                         influencers['influencer_main_genre'])}

# Genre of each matched artist
artist_type = []
# Similarity score of each matched artist
artist_type_sim = []

# Store the genre and similarity score of every artist found in both datasets
for name,sim in zip(artist_list,artist_sim_list):
    if name in genre_by_artist:
        artist_type.append(genre_by_artist[name])
        artist_type_sim.append(sim)

# Unique genres (sorted so the output order is reproducible)
artist_set_type = sorted(set(artist_type))

# Genre-level similarity: mean of the artist scores in each genre
type_sim = []
for item in artist_set_type:
    tmp = [s for g,s in zip(artist_type,artist_type_sim) if g == item]
    print(item,len(tmp))
    type_sim.append(np.mean(tmp))
    # Write the genre name and its mean similarity
    with open('Typesim.txt','a') as f:
        f.write(item+' '+str(np.mean(tmp))+'\n')

# Within-genre detail: write every artist score of a genre on one line
for item in artist_set_type:
    scores = [s for g,s in zip(artist_type,artist_type_sim) if g == item]
    with open('TypeZhongsim.txt','a') as f:
        f.write(item+''.join(' '+str(s)+' ' for s in scores)+'\n')

# Import the plotting library
from matplotlib import pyplot as plt
import random

# Genre-level similarity values
y_1 = type_sim
# Genre names
x = artist_set_type

# Figure size
plt.figure(figsize=(20,8),dpi=80)

# Candidate line colors
color = ['red', 'orange', 'slategrey', 'green', 'cyan', 'blue', 'purple']

# Draw the line plot in a randomly chosen color
plt.plot(x,y_1,color=color[random.randint(0,len(color)-1)])

# Set the x-axis ticks and rotate the labels
plt.xticks(x,rotation=45,fontsize=10)

# Draw the grid; alpha sets its transparency
plt.grid(alpha=0.1)

# Show the figure
plt.show()

Origin blog.csdn.net/weixin_43292788/article/details/131696862