Programming the colors of movie posters / counting movie poster colors / movie poster data visualization

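Assignment 1: design a color visualization of movie posters. Modeled on the visualization of movie poster colors since 1914 shown on slide 43 of Chapter 4 of the course PPT, design and implement a color visualization of Chinese domestic movie posters for the decade from 2008 to 2018. The specific requirements are: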

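    Write a crawler in Python that fetches the promotional poster images of Chinese domestic movies from 2008 to 2018 from the internet and stores them grouped by year;
    For these images, get the color value of every pixel in every image;
    Count the pixel color values that occur in each poster image;
    The vertical axis runs from 2008 at the top to 2018 at the bottom; the horizontal axis shows the colors contained in that year's posters (simply using the seven spectral colors as the counting unit), ordered left to right as red, orange, yellow, green, cyan, blue, purple;
    Analyze and interpret the work at the perceptual and cognitive levels;
    Following the general workflow on slide 4 of this chapter's course PPT, describe how the work was completed and write up your reflections.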
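
The requirement to count colors in units of the seven spectral colors needs a rule that maps an RGB pixel to one of the seven names. A minimal sketch of one possible rule follows, using the hue from the standard-library colorsys module. The hue boundaries in HUE_BANDS and the function name classify_pixel are assumptions made for illustration; the assignment does not define them. Gray, black, and white pixels have no meaningful hue (colorsys reports 0 for them), so this sketch would count them as red unless they are filtered out first.

```python
import colorsys

# Assumed upper hue bounds in degrees for each color band.
# These cut-offs are illustrative and are not specified by the assignment.
HUE_BANDS = [
    (15, "red"),
    (45, "orange"),
    (75, "yellow"),
    (165, "green"),
    (195, "cyan"),
    (255, "blue"),
    (330, "purple"),
]


def classify_pixel(r, g, b):
    """Map an 8-bit RGB pixel to one of the seven color names."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue = h * 360.0
    for upper, name in HUE_BANDS:
        if hue < upper:
            return name
    return "red"  # hues from 330 to 360 degrees wrap around to red
```

Working in hue rather than raw RGB keeps the rule to a single comparison per band, at the cost of ignoring brightness and saturation.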

1. Scraping data from the 1905 website with Python

I built on existing code and made some modifications to meet the instructor's requirements.

import os
import re

import requests
from lxml import etree


# Download a movie poster
def download_img(title, img_addr, headers, time):

    # Create the per-year image folder (and its parent) if it does not exist yet
    os.makedirs("./Top250_movie_images/" + time + "/", exist_ok=True)

    # Fetch the binary image data
    image_data = requests.get(img_addr, headers=headers).content
    # Build the path and file name under which the poster is stored
    image_path = "./Top250_movie_images/" + time + "/" + title[0] + '.jpg'
    # Save the poster image
    with open(image_path, "wb") as f:
        f.write(image_data)


# Fetch data from the URL, print it to the screen, and save it to a file
def get_movies_data(url, headers):

    # Get the page response
    db_response = requests.get(url, headers=headers)

    # Parse the page source into an etree
    db_reponse_etree = etree.HTML(db_response.content)

    # Extract all movie entries
    db_movie_items = db_reponse_etree.xpath('//*[@class="fl line"]/a')
    print(len(db_movie_items))
    # Iterate over the movie entries
    for db_movie_item in db_movie_items:

        # XPath expressions relative to the current <a> element
        db_title = db_movie_item.xpath('img/@alt')
        print(db_title)
        db_date = db_movie_item.xpath('img/@data-original')
        db_img_addr = db_movie_item.xpath('img/@src')

        # The year is the 4 characters that follow "uploadfile/" (11 characters) in the URL
        word = 'uploadfile'
        index = [m.start() + 11 for m in re.finditer(word, str(db_date[0]))]
        print(index)
        db_movie_date = db_date[0][index[0]:index[0]+4]
        print("Title:", db_title[0] + " Year:", db_movie_date + " URL:", db_date[0])
        # "a" appends and creates the file if it does not exist; "b" writes bytes
        with open("./douban_movie_top250.txt", "ab") as f:
            tmp_data = "Title:" + str(db_title) + "-" + str(db_movie_date) + "\n"
            f.write(tmp_data.encode("utf-8"))

        db_img_addr = str(db_img_addr[0].replace("\'", ""))
        download_img(db_title, db_img_addr, headers, str(db_movie_date))
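
The year lookup in get_movies_data relies on a fixed offset of 11 characters after the start of "uploadfile". A regular expression can express the same idea and also handle URLs that do not match. The sketch below assumes the poster URLs have the shape .../uploadfile/<year>/..., which is what the offset implies; the helper name extract_year and the example URL are hypothetical.

```python
import re


def extract_year(url):
    """Return the four-digit year that follows 'uploadfile/' in url, or None."""
    match = re.search(r"uploadfile/(\d{4})", url)
    return match.group(1) if match else None


# Hypothetical URL, used only to illustrate the assumed path layout
print(extract_year("http://example.com/uploadfile/2015/0312/poster.jpg"))
```

Returning None for a URL without a year lets the caller skip that entry instead of failing on an empty index list.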

The screenshot shows the movie posters scraped for 2008 through 2019.

2. Getting the color value of every pixel in each image

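    Use list_all_files('./Top250_movie_images') to walk through the downloaded posters
    Import Image with from PIL import Image and use toRGB(name) to produce the color values for each poster
    Use data_write_csv(file_name, datas) or text_save(filename, data) to write the corresponding file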

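import codecs
import csv
import os

from PIL import Image


def data_write_csv(file_name, datas):  # file_name: path of the CSV file to write; datas: list of rows to write
    # 'w' overwrites any existing file; the with block closes it when done
    with codecs.open(file_name, 'w', 'utf-8') as file_csv:
        # Keep the quote character distinct from the space delimiter
        writer = csv.writer(file_csv, delimiter=' ',
                            quotechar='"', quoting=csv.QUOTE_MINIMAL)
        for data in datas:
            writer.writerow(data)
    print("File saved; processing finished")


def text_save(filename, data):  # filename: path of the text file to write; data: list of values to write
    # 'a' appends, so running the script twice duplicates the lines
    with open(filename, 'a') as file:
        for item in data:
            # Remove square brackets; adjust these two lines to suit the data
            s = str(item).replace('[', '').replace(']', '')
            s = s.replace("'", '').replace(',', '') + '\n'  # remove quotes and commas, end each value with a newline
            file.write(s)
    print("File saved")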


def list_all_files(rootdir):
    import os
    _files = []
    entries = os.listdir(rootdir)  # list every directory and file under rootdir
    for entry in entries:
        path = os.path.join(rootdir, entry)
        if os.path.isdir(path):
            _files.extend(list_all_files(path))
        if os.path.isfile(path):
            _files.append(path)
            print(path)
            # Drop the 22-character prefix "./Top250_movie_images/" to get "<year>/<title>.jpg"
            name = path[22:]
            toRGB(name)
            # print(name)
    return _files


def toRGB(name):
    time = name[:4]     # year folder name
    title = name[5:-4]  # file name without the year prefix and the ".jpg" suffix

    print(title + " " + time)
    # Open the poster relative to the download folder and convert it to RGB
    # so that every pixel is an (R, G, B) tuple
    img = Image.open(os.path.join("./Top250_movie_images", name)).convert("RGB")
    img_array = img.load()
    width, height = img.size
    all_pixels = []
    for x in range(width):
        for y in range(height):
            cpixel = img_array[x, y]
            all_pixels.append(cpixel)
    # print(img_array[6, 4])
    print(len(all_pixels))

    # Create the output folder for this year if it does not exist yet
    os.makedirs("./Top250_movie_images/RGBFiles/" + time + "/", exist_ok=True)

    # data_write_csv("./Top250_movie_images/RGBFiles/" + time + "/" + title + ".csv", all_pixels)
    text_save("./Top250_movie_images/RGBFiles/" + time + "/" + title + ".txt", all_pixels)
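
The assignment's next step, counting the pixel color values that occur in each poster, can work directly on the files written by text_save. The sketch below assumes each line of such a file looks like (255 0 0), which is what text_save produces for an RGB tuple once commas are removed. The helper names parse_pixel_line and count_pixel_values are hypothetical and not part of the original code.

```python
from collections import Counter


def parse_pixel_line(line):
    """Turn a line such as '(255 0 0)' into the tuple (255, 0, 0)."""
    return tuple(int(part) for part in line.strip().strip("()").split())


def count_pixel_values(path):
    """Count how often each pixel value occurs in one saved poster file."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            if line.strip():  # skip blank lines
                counts[parse_pixel_line(line)] += 1
    return counts
```

Calling count_pixel_values(path).most_common(5) on one of the generated .txt files would list the five most frequent pixel values in that poster.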

3. The resulting data visualization

Code: https://fgk.pw/i/pz0ohi73031

Because the website may be updated, I recommend running task3.py first to see the result.
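
Since task3.py is not reproduced here, the following is only a sketch of how the layout required by the assignment could be drawn, not the author's implementation. It assumes the per-year counts are already available as a dictionary that maps each year to a dictionary of pixel counts per color name, and it uses matplotlib. The names COLOR_ORDER, to_proportions, plot_color_bands, and the output file name are hypothetical.

```python
COLOR_ORDER = ["red", "orange", "yellow", "green", "cyan", "blue", "purple"]


def to_proportions(counts):
    """Convert per-color pixel counts into fractions that sum to 1."""
    total = sum(counts.get(color, 0) for color in COLOR_ORDER)
    if total == 0:
        return [0.0] * len(COLOR_ORDER)
    return [counts.get(color, 0) / total for color in COLOR_ORDER]


def plot_color_bands(yearly_counts, out_path="poster_colors.png"):
    """Draw one stacked horizontal bar per year, earliest year at the top."""
    # Imported here so that to_proportions stays usable without matplotlib
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    years = sorted(yearly_counts)
    fig, ax = plt.subplots(figsize=(8, 5))
    for row, year in enumerate(years):
        left = 0.0
        shares = to_proportions(yearly_counts[year])
        for color, share in zip(COLOR_ORDER, shares):
            # Each segment starts where the previous color ended
            ax.barh(row, share, left=left, color=color, height=0.9)
            left += share
    ax.set_yticks(range(len(years)))
    ax.set_yticklabels([str(year) for year in years])
    ax.invert_yaxis()  # put the earliest year at the top
    ax.set_xlim(0, 1)
    ax.set_xlabel("Share of pixels by color")
    fig.savefig(out_path)
    plt.close(fig)
```

Normalizing each year to proportions keeps the rows comparable even when the number of posters or their resolution differs from year to year.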
