2023 Mathematical Contest in Modeling (MCM/ICM) Problem E, Light Pollution: Complete Dataset and Solution Code

Table of Contents

Dataset Collection

GeoNames Geographic Dataset

Latitude and Longitude Datasets for Countries Around the World

A Harmonized Global Nighttime Lights (1992-2018) Dataset

NASA's Earth at Night (Black Marble) 2016 Dataset

Global Nighttime Dataset (Globe at Night)

Reading the Dataset

Drawing the Heatmap

Light Pollution Analysis

Dataset and Code Address


2023 Mathematical Contest in Modeling Problem E progress: the analysis of the light pollution dataset and the related code for Problem E has been completed. The full dataset is 1.2 GB.

Dataset Collection

GeoNames Geographic Dataset

The GeoNames geographic database covers all countries and contains over eleven million place names, available for free download. For each country it includes key fields such as continent, area (km^2), and population.
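A minimal loading sketch, assuming the countryInfo.txt dump from download.geonames.org/export/dump/ has been saved under data/geonames/ (a hypothetical path); the column positions follow the file's documented header:

import pandas as pd

# Comment lines in countryInfo.txt start with '#', including the header row.
raw = pd.read_csv("data/geonames/countryInfo.txt",
                  sep="\t", comment="#", header=None)

# Columns 4, 6, 7, 8 hold the country name, area in square kilometres,
# population, and continent code.
geocountries = raw.iloc[:, [4, 6, 7, 8]]
geocountries.columns = ["Country", "Area_km2", "Population", "Continent"]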

Latitude and Longitude Datasets for Countries Around the World

This dataset comes from Google Developers and includes a latitude and longitude for each country, which serves as a plausible geographic center for that country.
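A minimal loading sketch, assuming the Google Developers countries table (developers.google.com/public-data/docs/canonical/countries_csv) has been saved locally as data/countries_latlong.csv (a hypothetical path) with its original country, latitude, longitude, and name columns:

import pandas as pd

# Hypothetical local copy of the Google Developers countries table.
geocountries_latlong = (
    pd.read_csv("data/countries_latlong.csv")
      .rename(columns={"name": "Country",
                       "latitude": "Latitude",
                       "longitude": "Longitude"})
)

The resulting geocountries_latlong frame is the one compared against the Globe at Night country names later on.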

A Harmonized Global Nighttime Lights (1992-2018) Dataset

This dataset is particularly large, containing over 20 billion data points (20,322,960,028 to be exact), so the data must be obtained in chunks. Everything can be downloaded as a single zip file; the steps are listed below, and a sketch automating them follows the list.

  1. Download the zip file

  2. Create a directory related to this notebook called data/nightLight

  3. Extract all the contents of the zip file into the data/nightLight directory created in step 2.

  4. Delete the zip file to save disk space.
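A minimal sketch automating these four steps, assuming ZIP_URL is replaced with the actual download link for the archive:

import os
import zipfile
import requests

ZIP_URL = "https://example.com/harmonized_ntl.zip"  # hypothetical placeholder URL
zip_path = "data/nightLight.zip"
os.makedirs("data/nightLight", exist_ok=True)

# Step 1: stream the zip file to disk in 1 MiB chunks.
with requests.get(ZIP_URL, stream=True) as r:
    r.raise_for_status()
    with open(zip_path, "wb") as f:
        for chunk in r.iter_content(chunk_size=1 << 20):
            f.write(chunk)

# Steps 2-3: extract everything into data/nightLight.
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall("data/nightLight")

# Step 4: delete the zip to save disk space.
os.remove(zip_path)

With the GeoTIFFs in place, a single year of the harmonized data can be previewed together with a histogram of its pixel values: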

import matplotlib.pyplot as plt
import rasterio as rs
from rasterio.plot import show, show_hist

# Plot the 2013 simulated-VIIRS raster next to a histogram of its pixel values.
fig, (axim, axhist) = plt.subplots(1, 2, figsize=(40, 10),
                                   gridspec_kw={'width_ratios': [3, 1]})
rf = rs.open("data/nightLight/DN_NTL_2013_simVIIRS.tif", "r")
show(rf, ax=axim, cmap="inferno")
show_hist(rf, ax=axhist)
axim.set(xlabel="Longitude", ylabel="Latitude", title="Image of 2013 VIIRS Data")
axhist.set_title("Color Histogram of 2013 VIIRS Data")
rf.close()

NASA's Earth at Night (Black Marble) 2016 Dataset

Satellite images of Earth at night (often referred to as "night lights") have been a tool of public curiosity and basic research for at least 25 years. They provide a vast and beautiful picture of how humans have shaped the planet and illuminated darkness. Produced every decade or so, these maps have spawned hundreds of pop culture uses and dozens of economic, social science and environmental research projects.

These images show Earth's nighttime lights as observed in 2016. The data were reprocessed with a new synthesis technique that selected the best cloud-free nights for each month on each landmass.

The images are provided in JPEG and GeoTIFF formats at three different resolutions: 0.1 degrees (3600x1800), 3 kilometers (13500x6750), and 500 meters (86400x43200). The 500-meter global map is also divided into tiles (21600x21600 each) according to NASA's gridding scheme.
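As a sketch of how such a gridding scheme can be used, assuming the conventional 4x2 layout of 21600x21600 tiles (columns A to D running west to east, rows 1 to 2 running north to south), a latitude/longitude pair maps to its tile name like this:

def blackmarble_tile(lat: float, lon: float) -> str:
    """Map a latitude/longitude pair to a tile name such as 'B1'.

    Assumes the 4x2 tiling scheme: each tile spans 90 degrees of
    longitude and 90 degrees of latitude.
    """
    col = "ABCD"[min(int((lon + 180) // 90), 3)]  # clamp lon = 180 into 'D'
    row = 1 if lat >= 0 else 2                    # northern vs southern half
    return f"{col}{row}"

print(blackmarble_tile(40.7, -74.0))  # New York City -> 'B1'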

Global Nighttime Dataset (Globe at Night)

Globe at Night collects observations tied to specific locations; each record includes a column called LimitingMag (the limiting stellar magnitude, i.e. the faintest star visible to the observer), which can be related to the level of light pollution at that location.

The following code demonstrates a way to download the yearly CSV files programmatically while dropping unneeded columns and invalid rows.

import re
from io import BytesIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Scrape the Globe at Night maps page for links to the yearly CSV files.
gan_url = "https://www.globeatnight.org/"
soup = BeautifulSoup(requests.get(gan_url + "maps.php").content, "lxml")
files = [gan_url + a["href"] for a in soup.find_all(href=re.compile(r"\.csv$"))]

gan = []
for file in files:
    filename = "data/gan/" + file.split("/")[-1]
    print(file, "==>", filename)
    file = BytesIO(requests.get(file, allow_redirects=True).content)
    # Keep only the columns we need; skip malformed rows
    # (on_bad_lines requires pandas >= 1.3; older versions use error_bad_lines=False).
    data = pd.read_csv(file, on_bad_lines="skip")[
        ["Latitude", "Longitude", "LocalDate", "LocalTime",
         "UTDate", "UTTime", "LimitingMag", "Country"]]
    # A LimitingMag of zero or below marks an invalid observation.
    data = data[data.LimitingMag > 0]
    # Merge the separate date and time columns into single timestamps.
    data.LocalTime = pd.to_datetime(data.LocalDate + " " + data.LocalTime,
                                    format='%Y-%m-%d %H:%M')
    data.UTTime = pd.to_datetime(data.UTDate + " " + data.UTTime,
                                 format='%Y-%m-%d %H:%M')
    # The observation year is encoded in the file name, e.g. GaN2018.csv.
    data.loc[:, "Year"] = int(filename[-8:-4])
    data = data[["Latitude", "Longitude", "LocalTime", "UTTime",
                 "LimitingMag", "Country", "Year"]]
    data.to_csv(filename)
    gan.append(data)

gan = pd.concat(gan, ignore_index=True)
gan.to_csv("data/gan/GaN.csv", index=False)

Reading the Dataset

gan = pd.read_csv("data/gan/GaN.csv").sort_values(["Year", "Country"], ignore_index=True)

# Normalize country names so they match the naming used by the other datasets.
gan.Country = gan.Country.str.replace(r"United States.*", "United States", regex=True)
gan.Country = gan.Country.replace({
    "Republic of the Union of Myanmar": "Myanmar",
    "Myanmar (Burma)": "Myanmar",
    "Republic of the Congo": "Congo Republic",
    "Democratic Republic of the Congo": "DR Congo",
    "Czechia": "Czech Republic",
    "Republic of Kosovo": "Kosovo",
    "Brunei Darussalam": "Brunei",
    "The Bahamas": "Bahamas",
    "Macedonia (FYROM)": "North Macedonia",
    "Reunion": "Réunion",
    "Virgin Islands": "U.S. Virgin Islands",
    "St Vincent and the Grenadines": "St Vincent and Grenadines",
    "Kingdom of Norway": "Norway",
    "The Netherlands": "Netherlands",
})

import geopandas as gpd

# `geocountries_latlong` is the country latitude/longitude table loaded above;
# `countries` is assumed to be a GeoDataFrame of world borders loaded earlier.
# Check which country names do not line up between the two sources:
gan_countries = set(gan.Country.unique())
geolatlong_countries = set(geocountries_latlong.Country.unique())
print(gan_countries - geolatlong_countries)
print(geolatlong_countries - gan_countries)

# The CSV has no geometry column; build point geometries from the coordinates
# (the full notebook presumably does this in an earlier cell).
gan = gpd.GeoDataFrame(gan, geometry=gpd.points_from_xy(gan.Longitude, gan.Latitude))

base = countries.plot(color='white', edgecolor='black')
gan[["geometry"]].plot(ax=base, marker='o', color='red', markersize=2)
_ = (base.set_xlabel("Longitude"), base.set_ylabel("Latitude"),
     base.set_title("Plot of GaN Data Points Around the World"))

Drawing the Heatmap

import numpy as np
import scipy.ndimage

# Bin the observation coordinates into a 2-D histogram.
heatmap, xedges, yedges = np.histogram2d(gan.Latitude, gan.Longitude, bins=250)

# Work in log space so dense regions do not swamp the rest, then smooth.
logheatmap = np.log(heatmap)
logheatmap[np.isneginf(logheatmap)] = 0
logheatmap = scipy.ndimage.gaussian_filter(logheatmap, 2, mode='nearest')

plt.figure(figsize=(20, 10))

plt.imshow(logheatmap, cmap="jet",
           extent=[yedges[0], yedges[-1], xedges[-1], xedges[0]])
plt.colorbar()

ax = plt.gca()
ax.invert_yaxis()
ax.set_xlim(-175, 180)

# Overlay country borders on top of the heatmap.
countries.boundary.plot(edgecolor='white', ax=ax)
_ = ax.set_title("Heat Map of GaN Data")

Light Pollution Analysis

Here, we use the following two different approaches to get an overview of light pollution. The code is shown below:

# Pivot to a Country x Year table of average light pollution, ordered by the
# 2018 values. `nightLightMean` is assumed to have been computed earlier from
# the harmonized night-lights rasters.
pivotNightLight = (nightLightMean
                   .pivot(index="Country", columns="Year", values="Average Light Pollution")
                   .sort_values(2018)
                   .rename(columns="nightLight{}".format))
pivotNightLight

import seaborn as sns
from scipy import stats
from sklearn.metrics import mean_squared_error

def summary(data, xloc, yloc):
    """Fit a linear regression of average light pollution on year and
    annotate the plot with the fit statistics at (xloc, yloc)."""
    x, y = data.Year, data["Average Light Pollution"]
    m, c, r, p, stderr = stats.linregress(x=x, y=y)
    mspe = mean_squared_error(y, m*x + c)
    sns.regplot(x=x, y=y)
    plt.text(xloc, yloc, f"$y = {m} x + {c}$\nCorrelation, $r = {r}$\n"
                         f"Confidence, $p = {p}$\n$R^2 = {r**2}$\n$MSPE = {mspe}$")

# Average over countries to get one global mean per year.
yr_based = (pivotNightLight
            .rename(columns=lambda yrstr: int(yrstr[-4:]))
            .mean(axis=0)
            .reset_index()
            .rename(columns={"index": "Year", 0: "Average Light Pollution"}))
summary(data=yr_based, xloc=2005, yloc=6)
plt.title(r"Regression plots of Double Average Light Pollution, $\mu_1$ per Year")

# Weight each country's yearly mean by its pixel count to get an overall
# weighted average per year. `nightLight` is assumed to hold per-country
# summary statistics (mean, count, ...) keyed by Quantity and Year.
nightLightByQuan = (nightLight[nightLight.Quantity.isin(["mean", "count"])]
                    .reset_index().set_index(["Quantity", "Year"]))
fitted_mean_by_yr = ((nightLightByQuan.loc["mean"] * nightLightByQuan.loc["count"]).sum(axis=1)
                     / nightLightByQuan.loc["count"].sum(axis=1)).reset_index().rename(
                         columns={0: "Average Light Pollution"})
summary(fitted_mean_by_yr, 2004, 2.5)
# Restrict the second fit to 1992-2013, before the shaded later years.
summary(fitted_mean_by_yr[fitted_mean_by_yr.Year.isin(range(1992, 2014))], 2004, 0.6)
# Shade 2014-2018 in the time-series plot.
sns.lineplot(data=fitted_mean_by_yr, x="Year", y="Average Light Pollution") \
   .axvspan(xmin=2013.5, xmax=2018.5, color="r", alpha=0.2)
plt.title(r"Regression plots of Overall Weighted Average Light Pollution, $\mu_2$ per Year")

# Find the five countries with the largest swing (max minus min) in yearly
# mean light pollution, then feed them to the notebook's `predict` helper.
nightLightHighLow = (nightLight.reset_index().set_index(["Quantity", "Year"])
                     .loc["mean"].T.stack().reset_index()
                     .rename(columns={"level_0": "Country", 0: "Value"})
                     .groupby("Country").Value.agg(["max", "min"]))
nightLightHighLow = (nightLightHighLow["max"] - nightLightHighLow["min"]).sort_values(ascending=False).iloc[:5]
predict(nightLightHighLow)
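The predict helper is defined elsewhere in the full notebook and is not reproduced in the post. A minimal, hypothetical sketch of one plausible implementation, fitting a per-country linear trend to the yearly means (the layout of nightLight follows its usage above) and extrapolating it forward:

from scipy import stats

def predict(countries_of_interest, to_year=2019):
    """Hypothetical stand-in for the notebook's `predict` helper: fit a
    linear trend to each country's yearly mean light pollution and
    extrapolate it to `to_year`."""
    yearly = (nightLight.reset_index()
              .set_index(["Quantity", "Year"])
              .loc["mean"])  # rows: Year, columns: Country
    for country in countries_of_interest.index:
        y = yearly[country].dropna()
        m, c, *_ = stats.linregress(x=y.index, y=y.values)
        print(f"{country}: predicted mean for {to_year} = {m * to_year + c:.2f}")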

def nightLightFilter(idx):
    """Select the countries at positions `idx` after sorting by 2018 mean,
    and return their summary statistics in long format."""
    return (nightLight.reset_index().set_index(["Year", "Quantity"]).T
            .sort_values((2018, "mean"), ascending=False)
            .iloc[idx].T.stack().reset_index()
            .set_index(["Quantity", "Year"])
            .loc[["mean", "min", "max", "median", "mode"]]
            .reset_index()
            .rename(columns={"level_2": "Country", 0: "Value"}))

nightLightMax = nightLightFilter(slice(0, 5))
# Line, regression, and residual plots per country and statistic.
for alg in [sns.lineplot, sns.regplot, sns.residplot]:
    sns.FacetGrid(nightLightMax, col="Quantity", row="Country").map(alg, "Year", "Value")

# Correlation between years, across countries.
fig, ax = plt.subplots(1, figsize=(15, 12))
sns.heatmap(pivotNightLight.corr().dropna(how="all", axis=0).dropna(how="all", axis=1),
            cmap="RdBu_r", ax=ax)

# Pairwise comparison of the first (1992) and last (2018) years.
sns.PairGrid(pivotNightLight[["nightLight1992", "nightLight2018"]], height=8) \
   .map_diag(sns.histplot).map_lower(sns.regplot).map_upper(sns.kdeplot)
 
 

Dataset and Code Address

2023 Mathematical Contest in Modeling Problem E: Light Pollution Dataset


Source: blog.csdn.net/qq_45857113/article/details/129099437