python – Can I mimic a log scale of an axis in matplotlib without transforming the associated data?

I'm trying to display a Zipf plot, which is usually shown on a log-log scale.

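I'm using a library that gives the ranks on a linear scale and the frequencies on a log scale. I have the following code, which plots my data correctly: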

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

ranks = [3541, 60219, 172644, 108926, 733215, 1297533, 1297534, 1297535]
# These frequencies are already log-scale
freqs = [-10.932271003723145, -15.213129043579102, -17.091760635375977, -16.27560806274414, 
        -19.482173919677734, -19.502029418945312, -19.502029418945312, -19.502029418945312]

data = {
    'ranks': ranks, 
    'freqs': freqs,
}

df = pd.DataFrame(data=data)

_, ax = plt.subplots(figsize=(7, 7))
ax.set(xscale="log", yscale="linear")
ax.set_title("Zipf plot")
sns.regplot(x="ranks", y="freqs", data=df, ax=ax, fit_reg=False)
ax.set_xlabel("Frequency rank of token")
ax.set_ylabel("Absolute frequency of token")
ax.grid(True, which="both")
plt.show()

The resulting plot is:

[Figure: resulting Zipf plot, logarithmic x-axis and linear y-axis]

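The plot looks fine, but the y labels are odd. I would like them to be shown in log increments as well. My current workaround is to raise 10 to the power of each element in the freqs list, i.e.,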

freqs = [10**freq for freq in freqs]
# ...

and to change yscale in ax.set to log, i.e.,

_, ax = plt.subplots(figsize=(7, 7))
ax.set(xscale="log", yscale="log")
ax.set_title("Zipf plot")
# ...

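This gives me the expected plot (below), but it requires transforming the data, which is a) relatively expensive, b) redundant, and c) lossy.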

[Figure: expected Zipf plot with both axes on a log scale]

Is there a way to mimic the log scale of an axis in a matplotlib plot without transforming the data?

Answer:

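First, a comment: personally I would prefer rescaling the data, since it makes everything a lot easier at the cost of a bit more memory/CPU time, and accuracy shouldn't matter here.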
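To put a number on the "lossy" concern, here is a minimal sketch that rescales the question's log-scale frequencies into a separate column, leaving the original `freqs` column untouched, and then measures the round-trip error of `log10(10**x)`. It assumes, as the question's own workaround does, that the values are base-10 logarithms; the column name `freqs_linear` is hypothetical.

```python
import numpy as np
import pandas as pd

# Log-scale frequencies from the question (assumed to be base-10 logarithms).
freqs = [-10.932271003723145, -15.213129043579102, -17.091760635375977, -16.27560806274414,
         -19.482173919677734, -19.502029418945312, -19.502029418945312, -19.502029418945312]
df = pd.DataFrame({"freqs": freqs})

# Rescale into a new (hypothetical) column; the original log values are not modified.
df["freqs_linear"] = 10.0 ** df["freqs"]

# Undo the transform and measure how far the round trip drifts from the original values.
roundtrip = np.log10(df["freqs_linear"])
max_abs_error = float(np.max(np.abs(roundtrip - df["freqs"])))
print(f"max round-trip error: {max_abs_error:.3e}")
```

For float64 values in this range the error should be on the order of machine epsilon, which is why the rescaling approach is usually acceptable in practice.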

Now to the actual question: how to mimic a log scale on a linear axis.

Solution 1: mimic the log scale

This is not easy. Setting an axis to log scale changes a lot in the background, and all of that has to be mimicked.
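To see part of what changes in the background, the following sketch (assuming matplotlib 2.0 or later and the non-interactive Agg backend) switches the y-axis of an empty plot to log scale and inspects the transform, locators and formatter that matplotlib installs. Solution 1 has to reproduce the locator and formatter part by hand; the helper `describe` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, nothing is displayed
import matplotlib.pyplot as plt
from matplotlib.ticker import LogFormatterSciNotation, LogLocator

fig, ax = plt.subplots()

def describe(axis):
    """Hypothetical helper: class names of the axis transform, locators and major formatter."""
    return (type(axis.get_transform()).__name__,
            type(axis.get_major_locator()).__name__,
            type(axis.get_minor_locator()).__name__,
            type(axis.get_major_formatter()).__name__)

print("linear:", describe(ax.yaxis))
ax.set_yscale("log")  # swaps in a log transform plus log-aware locators and formatters
print("log:   ", describe(ax.yaxis))

major_locator = ax.yaxis.get_major_locator()
minor_locator = ax.yaxis.get_minor_locator()
major_formatter = ax.yaxis.get_major_formatter()
transform_name = type(ax.yaxis.get_transform()).__name__
plt.close(fig)
```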

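>The easy part is setting the major tickmark spacing to 1 using matplotlib.ticker.MultipleLocator().
>Creating minor tickmarks at positions that look logarithmic is harder. The best solution I could come up with is to set them manually using matplotlib.ticker.FixedLocator().
>Finally, we need to change the ticklabels to represent the actual numbers, meaning they should look like 10^(-x) instead of -x. I know of two options here:

>Use a FuncFormatter to format the value 10**x in scientific notation.
>Use a FuncFormatter to format the value as 10^x in LaTeX (MathText). This looks nicer, but contrasts strongly with the rest of the plot.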
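The minor-tick positions from the second step can also be derived instead of typed in. On a real log axis the minor ticks between 10^d and 10^(d+1) sit at k·10^d for k = 2…9, so on a linear axis that holds log10 values they belong at d + log10(k). A minimal sketch; the helper name `log_minor_positions` is hypothetical, and its result is meant to be passed to `FixedLocator` in place of a hard-coded position array.

```python
import numpy as np
from matplotlib.ticker import FixedLocator

def log_minor_positions(lo, hi):
    """Hypothetical helper: log-style minor tick positions (in log10 units) for decades lo..hi-1.

    lo and hi are integer exponents, e.g. lo=-20, hi=-10 covers 1e-20 ... 1e-10.
    """
    decades = np.arange(lo, hi)            # decade exponents d
    offsets = np.log10(np.arange(2, 10))   # log10(2), ..., log10(9)
    # Outer sum: every decade start plus every in-decade offset, flattened and sorted.
    return np.sort((decades[:, None] + offsets[None, :]).ravel())

positions = log_minor_positions(-20, -10)
minor_locator = FixedLocator(positions)
```

For the range -20 to -10 this yields the same 80 positions as the answer's hard-coded `ra` array, but it can be reused for any range of exponents.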

I don't know of a better solution for the last point, but maybe someone else does.
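Both formatter options can be checked without drawing anything, since a `FuncFormatter` is callable with a tick value (and an optional position). This sketch uses the same two format strings as the answer's code; using `int(round(x))` instead of `int(x)` in the MathText variant is an assumption on my part, to guard against tick values carrying floating-point noise.

```python
from matplotlib.ticker import FuncFormatter

# Option 1: scientific notation of the linear value 10**x.
sci_formatter = FuncFormatter(lambda x, pos: "{:.1e}".format(10 ** x))
# Option 2: MathText power of ten; only sensible for integer tick values.
mathtext_formatter = FuncFormatter(lambda x, pos: r"$10^{%d}$" % int(round(x)))

for tick in (-20.0, -15.0, -11.0):
    print(tick, sci_formatter(tick), mathtext_formatter(tick))
```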

Here is the code and what it looks like.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from matplotlib.ticker import MultipleLocator, FixedLocator, FuncFormatter

###### Locators for Y-axis
# set tickmarks at multiples of 1.
majorLocator = MultipleLocator(1.)
# create custom minor tick positions that look logarithmic:
# for each decade 10**-(n+1) .. 10**-n, the ticks sit at -(n+1) + log10(i), i = 2..9
ra = np.array([[n + (1. - np.log10(i))] for n in range(10, 20) for i in [2, 3, 4, 5, 6, 7, 8, 9][::-1]]).flatten() * -1.
minorLocator = FixedLocator(ra)
###### Formatter for Y-axis (choose one of the following two; the last assignment wins)
# show labels as 10**x in scientific notation (looks ugly)
majorFormatter = FuncFormatter(lambda x, p: "{:.1e}".format(10**x))
# or using MathText (looks nice, but does not match the rest of the layout)
majorFormatter = FuncFormatter(lambda x, p: r"$10^{" + "{x:d}".format(x=int(round(x))) + r"}$")

ranks = [3541, 60219, 172644, 108926, 733215, 1297533, 1297534, 1297535]
# These frequencies are already log-scale
freqs = [-10.932271003723145, -15.213129043579102, -17.091760635375977, -16.27560806274414, 
        -19.482173919677734, -19.502029418945312, -19.502029418945312, -19.502029418945312]

data = {
    'ranks': ranks, 
    'freqs': freqs,
}

df = pd.DataFrame(data=data)

_, ax = plt.subplots(figsize=(6, 6))
ax.set(xscale="log", yscale="linear")
ax.set_title("Zipf plot")

sns.regplot(x="ranks", y="freqs", data=df, ax=ax, fit_reg=False)

# Set the locators
ax.yaxis.set_major_locator(majorLocator)
ax.yaxis.set_minor_locator(minorLocator)
# Set the formatter if you'd like the ticklabels consistently in power-of-ten notation
ax.yaxis.set_major_formatter(majorFormatter)

ax.set_xlabel("Frequency rank of token")
ax.set_ylabel("Absolute frequency of token")
ax.grid(True, which="both")
plt.show()

[Figure: Zipf plot produced by the Solution 1 code]

Solution 2: use two different axes

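A different solution, which I didn't think of at first, is to use two different axes: one with a log-log scale, which looks nice and produces the correct labels and ticks, and another one to plot the data on.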
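Why the two axes line up: the linear axis maps its y-limits (lo, hi) linearly onto the axes height, while a log axis with limits (10**lo, 10**hi) maps values through log10. Since log10(10**y) = y, a log10 value y lands at the same relative height (y - lo)/(hi - lo) on both axes, which is what the `ax.set_ylim(10**np.array(ax2.get_ylim()))` line relies on. A small numeric check using matplotlib's own transforms; the limits (-20, -10) and sample values are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.set_yscale("log")   # axis that only provides log ticks and labels
ax2 = ax.twinx()       # linear axis that would hold the log10 data

lo, hi = -20.0, -10.0              # example limits of the linear axis
ax2.set_ylim(lo, hi)
ax.set_ylim(10 ** lo, 10 ** hi)    # same mapping as in Solution 2

ys = np.array([-19.5, -17.09, -15.21, -10.93])  # sample log10 values
# Display-space heights: y on the linear axis versus 10**y on the log axis.
h_linear = ax2.transData.transform(np.column_stack([np.ones_like(ys), ys]))[:, 1]
h_log = ax.transData.transform(np.column_stack([np.ones_like(ys), 10 ** ys]))[:, 1]
print(np.max(np.abs(h_linear - h_log)))
plt.close(fig)
```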

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

ranks = [3541, 60219, 172644, 108926, 733215, 1297533, 1297534, 1297535]
# These frequencies are already log-scale
freqs = [-10.932271003723145, -15.213129043579102, -17.091760635375977, -16.27560806274414, 
        -19.482173919677734, -19.502029418945312, -19.502029418945312, -19.502029418945312]

data = {
    'ranks': ranks, 
    'freqs': freqs,
}

df = pd.DataFrame(data=data)

fig, ax = plt.subplots(figsize=(6, 6))
# use 2 axes
# ax has a log-log scale, which produces nice labels and ticks
ax.set(xscale="log", yscale="log")
ax.set_title("Zipf plot")
# ax2 is the axes the values are plotted on
ax2 = ax.twinx()

# plot values to ax2
sns.regplot(x="ranks", y="freqs", data=df, ax=ax2, fit_reg=False)
# set the limits of the log log axis to 10 to the power of the label of ax2
ax.set_ylim(10**np.array(ax2.get_ylim())  )


ax.set_xlabel("Frequency rank of token")
ax.set_ylabel("Absolute frequency of token")
# remove ticklabels and axislabel from ax2
ax2.set_yticklabels([])
ax2.set_ylabel("")
ax.grid(True, which="both")
plt.show()
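One limitation of this solution is that the log axis limits are copied from ax2 only once, so zooming or panning ax2 later (or changing its limits in code) leaves the labels out of sync. A possible refinement, sketched here as an assumption rather than as part of the original answer, is to re-copy the limits from a `ylim_changed` callback on ax2. The helper name `sync_log_limits` is hypothetical, and the sketch uses plain `ax2.plot` with a few of the question's values (rounded) to stay independent of seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set(xscale="log", yscale="log")   # axis that only provides log ticks and labels
ax2 = ax.twinx()                     # linear axis that holds the log10 data

def sync_log_limits(changed_ax):
    """Hypothetical helper: mirror changed_ax's linear y-limits onto ax as powers of ten."""
    lo, hi = changed_ax.get_ylim()
    ax.set_ylim(10 ** lo, 10 ** hi)

# Re-run the sync every time ax2's y-limits change (set_ylim, zoom, pan).
ax2.callbacks.connect("ylim_changed", sync_log_limits)

ax2.plot([3541, 60219, 1297535], [-10.93, -15.21, -19.50], "o")
ax2.set_ylim(-21, -10)   # fires the callback
print(ax.get_ylim())
```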

[Figure: Zipf plot produced by the Solution 2 code]
