How to scale multiple KDE plots with different frequencies?

Sofía Contreras :

I'm using Seaborn to plot KDEs of two datasets, but each KDE is normalized to unit area, so both curves come out the same size no matter how many observations they contain.

My code:

sns.kdeplot(CDMX['EDAD'], shade=True)
sns.kdeplot(eduacion_superior['EDAD'], shade=True)

which is giving me:

enter image description here

But I'd like to have them scaled proportionally to the data they represent. So, something like:

enter image description here

Any suggestions?

JohanC :

A count only makes sense relative to some bins. As far as I know, seaborn's distplot can show a histogram with counts, but as soon as you also ask for a KDE, both the histogram and the KDE are rescaled so that the total area is 1.

To get a plot like the one requested, plain matplotlib can draw a KDE computed with SciPy. To turn the density into a count, you have to decide how the data is binned, because the count depends on the bin width of the corresponding histogram. The simplest choice is one bin per unit on the x-axis (here, one per year of age).
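The dependence on bin width can be made explicit. A KDE integrates to 1, so multiplying it by the number of samples times the bin width puts it on the same scale as the histogram counts. Below is a minimal sketch, assuming SciPy's `gaussian_kde` with its default bandwidth; the helper name `kde_as_counts` and the synthetic data are made up for illustration.

```python
import numpy as np
from scipy import stats


def kde_as_counts(data, x, bin_width=1.0):
    """Evaluate a Gaussian KDE of `data` at `x`, rescaled to counts per bin.

    Hypothetical helper (not part of SciPy or seaborn): the density has
    area ~1, so the returned curve has area ~ len(data) * bin_width.
    """
    data = np.asarray(data, dtype=float)
    density = stats.gaussian_kde(data)(x)
    return density * data.size * bin_width


rng = np.random.default_rng(0)
ages = rng.chisquare(15, 10000) + 10   # synthetic "ages", not real data
x = np.linspace(0, 100, 2001)          # wide enough to cover both tails

for width in (1.0, 5.0):
    curve = kde_as_counts(ages, x, bin_width=width)
    # Trapezoidal rule: area under the rescaled curve
    area = np.sum((curve[1:] + curve[:-1]) / 2 * np.diff(x))
    print(f"bin width {width}: area under curve is about {area:.0f}")
```

With one-year bins the factor `bin_width` is 1, which is why multiplying by the sample size alone is enough in that case.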

Here is some sample code. First, random test data is generated. Then two histograms are drawn with one bin per year of age. In a second plot, the KDEs for the same data are drawn, each scaled by the size of its dataset.

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Random test data standing in for the two age columns
cdmx_edad = np.random.chisquare(15, 10000)+10
ed_sup_edad = np.random.chisquare(20, 5000)+10

# Top plot: histograms with one bin per year of age (bin edges 10, 11, ..., 60)
fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
bins = np.arange(10,61,1)
ax1.hist(cdmx_edad, bins=bins, color='r', alpha=0.4, label='CDMX edad')
ax1.hist(ed_sup_edad, bins=bins, color='b', alpha=0.4, label='Educación superior edad')
ax1.legend()

# Bottom plot: each KDE (area 1) multiplied by its sample size,
# which gives counts per unit of x and so matches the one-year bins above
cdmx_kde = stats.gaussian_kde(cdmx_edad)
ed_sup_kde = stats.gaussian_kde(ed_sup_edad)
x = np.linspace(10,61,500)
cdmx_curve = cdmx_kde(x)*cdmx_edad.shape[0]
ed_sup_curve = ed_sup_kde(x)*ed_sup_edad.shape[0]
# Uncomment the ax2.plot lines to also draw an outline on top of each filled area
# ax2.plot(x, cdmx_curve, color='r')
ax2.fill_between(x, 0, cdmx_curve, color='r', alpha=0.4, label='CDMX edad')
# ax2.plot(x, ed_sup_curve, color='b')
ax2.fill_between(x, 0, ed_sup_curve, color='b', alpha=0.4, label='Educación superior edad')
ax2.legend()
plt.show()

resulting plot
