Data Visualization (5): Making a Global Earthquake Scatter Plot from JSON Data

1. Earthquake data
2. View JSON data
3. Create a list of earthquakes
4. Extract magnitude
5. Extract location data
6. Draw magnitude scatter plot
7. Another way to specify chart data

In this section you'll download a dataset of all the earthquakes that occurred worldwide over one month and make a scatter plot showing the location and magnitude of each one. The data is stored in JSON format, so we'll use the json module to process it. Plotly provides beginner-friendly tools for drawing maps from location data, and you'll use it to visualize the global distribution of earthquakes.

1. Earthquake data

Copy the file eq_data_1_day_m1.json into a folder named data inside the folder where you store the programs for this chapter (the code below loads it from data/eq_data_1_day_m1.json). Earthquakes are categorized by their magnitude on the Richter scale, and this file records all earthquakes of magnitude 1 or greater that occurred worldwide in the 24 hours before this section was written.


2. View JSON data

If you open the file eq_data_1_day_m1.json, you will find that its content is dense and difficult to read:

{"type":"FeatureCollection","metadata":{"generated":1550361461000,...
{"type":"Feature","properties":{"mag":1.2,"place":"11km NNE of Nor...
{"type":"Feature","properties":{"mag":4.3,"place":"69km NNW of Ayn...
{"type":"Feature","properties":{"mag":3.6,"place":"126km SSE of Co...
{"type":"Feature","properties":{"mag":2.1,"place":"21km NNW of Teh...
{"type":"Feature","properties":{"mag":4,"place":"57km SSW of Kakto...
--snip--

This data is meant to be read by machines, not humans. Still, you can see that the file contains some dictionaries, along with information we're interested in, such as magnitudes and locations.

The json module provides various tools for exploring and working with JSON data. Some of them can reformat this file, letting us see the raw data more clearly before we decide how to process it programmatically.

Let's start by loading the data and displaying it in a human-readable format. The data file is quite long, so instead of printing it, we'll write the data to a new file, which we can then open and scroll through easily: eq_explore_data.py

  import json

  # Explore the structure of the data.
  filename = 'data/eq_data_1_day_m1.json'
  with open(filename) as f:
❶     all_eq_data = json.load(f)

❷ readable_file = 'data/readable_eq_data.json'
  with open(readable_file, 'w') as f:
❸     json.dump(all_eq_data, f, indent=4)

We first import the json module so we can load the data from the file properly, and then store the loaded data in all_eq_data (see ❶). The function json.load() converts the data into a format Python can work with: in this case, a giant dictionary. At ❷ we create a file to write this same data into in a human-readable format. The function json.dump() takes a Python object containing the data and a file object, and writes the data to that file (see ❸). The argument indent=4 tells dump() to format the data using indentation that matches the data's structure.
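The effect of indent=4 is easy to see on a small scale. The sketch below uses json.loads() and json.dumps(), the string-based counterparts of json.load() and json.dump(), on a short, made-up JSON string shaped like the start of the earthquake file, so it runs without the data file.

```python
import json

# A compact, single-line JSON string shaped like the start of
# eq_data_1_day_m1.json (made up and trimmed down for illustration).
compact = '{"type":"FeatureCollection","metadata":{"count":158},"features":[]}'

# json.loads() parses a string; json.load() does the same for a file object.
data = json.loads(compact)

# json.dumps() serializes to a string; indent=4 adds line breaks and four
# spaces per nesting level, just as json.dump(..., indent=4) does for a file.
readable = json.dumps(data, indent=4)
print(readable)
```

The printed text has the same layout as readable_eq_data.json: one key per line, with nested dictionaries indented one level further.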

If you now look in the data directory and open readable_eq_data.json, you'll see that the beginning of the file looks like this: readable_eq_data.json

  {
      "type": "FeatureCollection",
❶     "metadata": {
          "generated": 1550361461000,
          "url": "https://earthquake.usgs.gov/earthquakes/.../1.0_day.geojson",
          "title": "USGS Magnitude 1.0+ Earthquakes, Past Day",
          "status": 200,
          "api": "1.7.0",
          "count": 158
      },
❷     "features": [
      --snip--

The file begins with a section under the key "metadata" (see ❶), which tells us when the data file was generated and where the data can be found online. It also gives a human-readable title and the number of earthquakes recorded in the file: 158 earthquakes in the past 24 hours.

This GeoJSON file has a structure designed for location-based data. The data is stored in a list associated with the key "features" (see ❷). Because this file contains earthquake data, each item in the list corresponds to a single earthquake. The structure may look confusing, but it is powerful: it lets geologists store as much information as they need about each earthquake in a dictionary, and then put all those dictionaries into one big list.
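To make this layout concrete, here is a minimal sketch that builds a one-earthquake FeatureCollection by hand, reusing the values from the sample earthquake dictionary shown in this section, and then walks its two top-level parts. The metadata count of 1 is an assumption chosen to match this single feature; the real file reports 158.

```python
# A minimal, hand-built FeatureCollection with the same layout as the USGS file.
# The single feature reuses values from readable_eq_data.json; the count of 1
# is an illustrative assumption matching this one-feature sample.
all_eq_data = {
    "type": "FeatureCollection",
    "metadata": {"title": "USGS Magnitude 1.0+ Earthquakes, Past Day", "count": 1},
    "features": [
        {
            "type": "Feature",
            "properties": {"mag": 0.96, "title": "M 1.0 - 8km NE of Aguanga, CA"},
            "geometry": {"type": "Point", "coordinates": [-116.7941667, 33.4863333, 3.22]},
            "id": "ci37532978",
        }
    ],
}

# Top level: a metadata dictionary plus a list of features.
print(all_eq_data["metadata"]["title"])
features = all_eq_data["features"]

# Each item in the features list is a dictionary describing one earthquake.
for feature in features:
    print(sorted(feature.keys()))
```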

Let's take a look at the dictionary representing a specific earthquake: readable_eq_data.json

  --snip--
      {
          "type": "Feature",
❶         "properties": {
              "mag": 0.96,
              --snip--
❷             "title": "M 1.0 - 8km NE of Aguanga, CA"
          },
❸         "geometry": {
              "type": "Point",
              "coordinates": [
❹                 -116.7941667,
❺                 33.4863333,
                  3.22
              ]
          },
          "id": "ci37532978"
      },

The key "properties" holds a lot of information about each earthquake (see ❶). We're mainly interested in the magnitude, stored under the key "mag", and in the title, which gives a good summary of the earthquake's magnitude and location (see ❷).

The key "geometry" tells us where the earthquake occurred (see ❸); we need this information to place each earthquake on the scatter plot. In the list associated with the key "coordinates" you'll find the longitude (see ❹) and latitude (see ❺) of the earthquake; the third value is its depth in kilometers.
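Reading a position comes down to unpacking the coordinates list. A small sketch, assuming the "geometry" dictionary of the Aguanga earthquake shown above; in the USGS feed the third value is the depth in kilometers.

```python
# The "geometry" dictionary of the Aguanga earthquake from readable_eq_data.json.
geometry = {"type": "Point", "coordinates": [-116.7941667, 33.4863333, 3.22]}

# GeoJSON order: longitude first, then latitude; the USGS feed adds depth (km).
lon, lat, depth = geometry["coordinates"]
print(f"lon={lon}, lat={lat}, depth={depth} km")
```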

This file has more levels of nesting than the data structures we've built in our own code so far. If that seems confusing, don't worry: Python will handle most of the complexity for you. We'll only work with one or two levels of nesting at a time. We'll start by pulling out a dictionary for each earthquake recorded in the past 24 hours.

Note that when we talk about locations, we usually give the latitude first and then the longitude. This habit probably arose because humans worked out latitude long before the concept of longitude was developed. However, many geospatial frameworks list the longitude first and the latitude second, because this corresponds to the (x, y) convention used in mathematics. The GeoJSON format follows the (longitude, latitude) convention; when you use another framework, it's important to find out which convention it follows.
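If you pass GeoJSON coordinates to a tool that expects latitude first, the order has to be swapped. The helper below, to_lat_lon(), is a hypothetical name used only to illustrate the conversion.

```python
def to_lat_lon(coordinates):
    """Convert a GeoJSON position [lon, lat, ...] to a (lat, lon) tuple.

    Hypothetical helper for tools that expect latitude first; any extra
    values such as depth are ignored.
    """
    lon, lat = coordinates[0], coordinates[1]
    return (lat, lon)


print(to_lat_lon([-116.7941667, 33.4863333, 3.22]))
```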

3. Create a list of earthquakes

First, we'll make a list that contains all the information about every earthquake: eq_explore_data.py

import json

# Explore the structure of the data.
filename = 'data/eq_data_1_day_m1.json'
with open(filename) as f:
    all_eq_data = json.load(f)

all_eq_dicts = all_eq_data['features']
print(len(all_eq_dicts))

We pull the data associated with the key 'features' and store it in all_eq_dicts. We know this file contains records for 158 earthquakes, and the output confirms that we've captured every earthquake in the file:

158

Notice how short this code is. The neatly formatted file readable_eq_data.json has more than 6,000 lines, yet in just a few lines of code we can read all that data and store it in a Python list. Next, we'll extract the magnitude of each earthquake.
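A quick sanity check is to compare the number of features with the "count" value in "metadata". The sketch below writes a tiny made-up stand-in for the data file to a temporary file so it runs anywhere, loads it with json.load(), and compares the two numbers; with the real file, both would be 158.

```python
import json
import os
import tempfile

# Write a tiny, made-up stand-in for eq_data_1_day_m1.json to a temporary
# file so this example runs without the real data.
sample = {"metadata": {"count": 2}, "features": [{"type": "Feature"}, {"type": "Feature"}]}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(sample, tmp)
    path = tmp.name

# Load it exactly as eq_explore_data.py loads the real file.
with open(path) as f:
    all_eq_data = json.load(f)
os.remove(path)

all_eq_dicts = all_eq_data["features"]

# Compare the number of extracted dictionaries with the count in metadata.
expected = all_eq_data["metadata"]["count"]
print(len(all_eq_dicts), expected, len(all_eq_dicts) == expected)
```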

4. Extract magnitude

Now that we have a list of all earthquake data, we can iterate through the list to extract the data we need. Let's extract the magnitude of each earthquake: eq_explore_data.py

  --snip--
  all_eq_dicts = all_eq_data['features']

❶ mags = []
  for eq_dict in all_eq_dicts:
❷     mag = eq_dict['properties']['mag']
      mags.append(mag)

  print(mags[:10])

We make an empty list to store the magnitudes and then loop through the list all_eq_dicts (see ❶). Inside the loop, each earthquake is represented by the dictionary eq_dict, and its magnitude is stored in the 'properties' section of that dictionary under the key 'mag' (see ❷). We assign each magnitude to the variable mag and append it to the end of the list mags.

To make sure the extracted data is correct, print the magnitudes of the first 10 earthquakes:

[0.96, 1.2, 4.3, 3.6, 2.1, 4, 1.06, 2.3, 4.9, 1.8]
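The same loop can also be written as a list comprehension. This sketch uses two abbreviated earthquake dictionaries containing only the keys it needs, with magnitudes taken from the output above.

```python
# Two abbreviated earthquake dictionaries (only the keys used here).
all_eq_dicts = [
    {"properties": {"mag": 0.96}},
    {"properties": {"mag": 1.2}},
]

# Equivalent to the for loop: build the list of magnitudes in one expression.
mags = [eq_dict["properties"]["mag"] for eq_dict in all_eq_dicts]
print(mags)
```

The explicit loop is easier to extend when you extract several values at once, which is why it is used here; the comprehension is a compact alternative when you need just one list.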

Next, we will extract the location information of each earthquake, and then we can draw the earthquake scatterplot.

5. Extract location data

Location data is stored under the key "geometry". The dictionary associated with "geometry" contains a "coordinates" key, and the first two values in the list associated with it are the longitude and latitude. Here's how to extract the location data, along with each earthquake's title: eq_explore_data.py

  --snip--
  all_eq_dicts = all_eq_data['features']

  mags, titles, lons, lats = [], [], [], []
  for eq_dict in all_eq_dicts:
      mag = eq_dict['properties']['mag']
❶     title = eq_dict['properties']['title']
❷     lon = eq_dict['geometry']['coordinates'][0]
      lat = eq_dict['geometry']['coordinates'][1]
      mags.append(mag)
      titles.append(title)
      lons.append(lon)
      lats.append(lat)

  print(mags[:10])
  print(titles[:2])
  print(lons[:5])
  print(lats[:5])

We make a list titles to store each earthquake's title, extracting the value of the 'title' key from the 'properties' dictionary (see ❶), and lists lons and lats to store the longitudes and latitudes. The code eq_dict['geometry'] accesses the dictionary associated with the "geometry" key (see ❷). The second key, 'coordinates', retrieves the list associated with "coordinates", and the index 0 retrieves the first value in that list, which is the longitude of the earthquake. Index 1 retrieves the latitude in the same way.

When we print the first 10 magnitudes, the first 2 titles, and the first 5 longitudes and latitudes, the output shows that we're extracting the correct data:

[0.96, 1.2, 4.3, 3.6, 2.1, 4, 1.06, 2.3, 4.9, 1.8]
['M 1.0 - 8km NE of Aguanga, CA', 'M 1.2 - 11km NNE of North Nenana, Alaska']
[-116.7941667, -148.9865, -74.2343, -161.6801, -118.5316667]
[33.4863333, 64.6673, -12.1025, 54.2232, 35.3098333]
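Since all four values come from the same dictionary, another option is to pull them out with one function and then split the results into four lists. The helper extract() is hypothetical, and the two abbreviated dictionaries reuse values printed above (the second omits depth, which this section does not show).

```python
def extract(eq_dict):
    """Return (mag, title, lon, lat) for one earthquake dictionary."""
    props = eq_dict["properties"]
    lon, lat = eq_dict["geometry"]["coordinates"][:2]  # ignore depth if present
    return props["mag"], props["title"], lon, lat


# Two abbreviated earthquake dictionaries using values printed above.
all_eq_dicts = [
    {"properties": {"mag": 0.96, "title": "M 1.0 - 8km NE of Aguanga, CA"},
     "geometry": {"coordinates": [-116.7941667, 33.4863333, 3.22]}},
    {"properties": {"mag": 1.2, "title": "M 1.2 - 11km NNE of North Nenana, Alaska"},
     "geometry": {"coordinates": [-148.9865, 64.6673]}},
]

# zip(*rows) transposes the per-earthquake tuples into four parallel sequences.
mags, titles, lons, lats = map(list, zip(*(extract(d) for d in all_eq_dicts)))
print(mags, lons, lats)
```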

6. Draw magnitude scatter plot

With the data we've extracted, we can now draw the visualization. We'll start with a simple scatter plot, and once we're sure it displays the correct information, we'll turn our attention to style and appearance. The following code, which assumes the lists lons and lats have already been built with the extraction code from eq_explore_data.py, draws the initial scatter plot: eq_world_map.py

❶ import plotly.express as px

  fig = px.scatter(
      x=lons,
      y=lats,
      labels={"x": "Longitude", "y": "Latitude"},
      range_x=[-200, 200],
      range_y=[-90, 90],
      width=800,
      height=800,
      title="Global Earthquake Scatter Plot",
❷ )
❸ fig.write_html("global_earthquakes.html")
❹ fig.show()

First, we import plotly.express under the alias px. Plotly Express is Plotly's high-level interface; it is easy to use and its syntax is similar to Matplotlib's (see ❶). Next, we call px.scatter() with a set of parameters to create a figure object, fig. The x-axis shows longitude with the range [-200, 200], slightly wider than ±180° so that earthquakes near 180° east or west longitude are fully visible; the y-axis shows latitude with the range [-90, 90]. We set the width and height of the chart to 800 pixels and set the title to "Global Earthquake Scatter Plot" (see ❷).

With just 14 lines of code we've configured a simple scatter plot, which px.scatter() returns as the fig object. The fig.write_html() method saves the visualization as an HTML file; find global_earthquakes.html in the program folder and open it in a browser (see ❸). If you use Jupyter Notebook, fig.show() displays the scatter plot directly in the notebook cell (see ❹).

The resulting chart looks like this:

[Figure: the initial global earthquake scatter plot]

There are many ways to modify this scatter plot to make it more meaningful and easier to read. Let's make some of these changes.
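The axis ranges passed to px.scatter() (range_x=[-200, 200] and range_y=[-90, 90]) should contain every point, or some earthquakes would fall outside the visible area. This plain-Python sketch checks that for the first five longitudes and latitudes printed in section 5; fits() is a hypothetical helper.

```python
# First five longitudes and latitudes as printed in section 5.
lons = [-116.7941667, -148.9865, -74.2343, -161.6801, -118.5316667]
lats = [33.4863333, 64.6673, -12.1025, 54.2232, 35.3098333]


def fits(values, lo, hi):
    """Return True if every value lies within the closed range [lo, hi]."""
    return all(lo <= v <= hi for v in values)


# Check the data against the ranges used for the x- and y-axes.
print(fits(lons, -200, 200), fits(lats, -90, 90))
```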

7. Another way to specify chart data

Before we customize this chart further, let's look at a slightly different way to specify the data for a Plotly chart. Currently, the longitude and latitude data are passed in directly as separate lists:

--snip--
    x=lons,
    y=lats,
    labels={"x": "Longitude", "y": "Latitude"},
--snip--

This is one of the simplest ways to define data for a chart in Plotly Express, but it isn't the best choice when you need to work with the data. Here is an equivalent way to define the chart's data using the pandas data analysis library. First, create a DataFrame that holds the data we need:

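import pandas as pd

# Build a table with one row per earthquake and one named column per list.
data = pd.DataFrame(
    data=zip(lons, lats, titles, mags), columns=["Longitude", "Latitude", "Location", "Magnitude"]
)
data.head()  # in Jupyter, this displays the first five rows

Then, the arguments passed to px.scatter() can be changed to:

--snip--
    data,
    x="Longitude",
    y="Latitude",
--snip--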

In this approach, all the data lives in a single DataFrame, and each column is selected by its name, much like looking up a key in a dictionary. If you use this code in eq_world_map.py, the resulting plot is the same. Compared with passing separate lists, this format makes it easier to analyze the data and to customize the chart.
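To see what pd.DataFrame receives, here is a plain-Python sketch of the rows produced by zip() and of how each row pairs with a column name. The English column names and the two sample earthquakes (values from section 5) are illustrative.

```python
# Two earthquakes' values as printed in section 5.
lons = [-116.7941667, -148.9865]
lats = [33.4863333, 64.6673]
titles = ["M 1.0 - 8km NE of Aguanga, CA", "M 1.2 - 11km NNE of North Nenana, Alaska"]
mags = [0.96, 1.2]
columns = ["Longitude", "Latitude", "Location", "Magnitude"]

# zip() yields one tuple per earthquake; each tuple becomes one DataFrame row.
rows = list(zip(lons, lats, titles, mags))

# Pairing each row with the column names gives records addressable by name,
# which mirrors how x="Longitude" selects a column in the DataFrame version.
records = [dict(zip(columns, row)) for row in rows]
print(records[0]["Longitude"], records[0]["Magnitude"])
```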
