Drawing Venn Diagrams in Python

pyvenn

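This post introduces a way to draw Venn diagrams in Python. There are two common ways to draw them; this post covers the second, pyvenn, which supports Venn diagrams of 2 to 6 sets.

Here is a demo: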

import matplotlib
matplotlib.use('Agg')

import matplotlib.pyplot as plt
import venn

labels = venn.get_labels([range(10), range(5, 15)], fill=['number', 'logic'])
fig, ax = venn.venn2(labels, names=['list 1', 'list 2'])
fig.savefig('betterpyvenn/images/venn2.png', bbox_inches='tight')
plt.close()

labels = venn.get_labels([range(10), range(5, 15), range(3, 8)], fill=['number', 'logic'])
fig, ax = venn.venn3(labels, names=['list 1', 'list 2', 'list 3'])
fig.savefig('betterpyvenn/images/venn3.png', bbox_inches='tight')
plt.close()

labels = venn.get_labels([range(10), range(5, 15), range(3, 8), range(8, 17)], fill=['number', 'logic'])
fig, ax = venn.venn4(labels, names=['list 1', 'list 2', 'list 3', 'list 4'])
fig.savefig('betterpyvenn/images/venn4.png', bbox_inches='tight')
plt.close()

labels = venn.get_labels([range(10), range(5, 15), range(3, 8), range(8, 17), range(10, 20)], fill=['number', 'logic'])
fig, ax = venn.venn5(labels, names=['list 1', 'list 2', 'list 3', 'list 4', 'list 5'])
fig.savefig('betterpyvenn/images/venn5.png', bbox_inches='tight')
plt.close()

labels = venn.get_labels([range(10), range(5, 15), range(3, 8), range(8, 17), range(10, 20), range(13, 25)], fill=['number', 'logic'])
fig, ax = venn.venn6(labels, names=['list 1', 'list 2', 'list 3', 'list 4', 'list 5', 'list 6'])
fig.savefig('betterpyvenn/images/venn6.png', bbox_inches='tight')
plt.close()

Two sets
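
To make the numbers in each region of the diagram concrete, here is a minimal sketch of how the size of every region can be computed from the input lists. It is not pyvenn's own code: the function name `region_counts` is hypothetical, and the binary-string keys (one character per set, `1` meaning "inside") are only assumed to resemble what the `'logic'` fill option displays.

```python
from itertools import product


def region_counts(collections):
    """Return {membership key: element count} for every region of a Venn diagram.

    Each key has one character per input collection: '1' if the region lies
    inside that collection, '0' if it lies outside. (Hypothetical helper.)
    """
    sets = [set(c) for c in collections]
    universe = set().union(*sets)
    counts = {}
    for flags in product("01", repeat=len(sets)):
        key = "".join(flags)
        if "1" not in key:
            continue  # skip the region that lies outside every set
        # Keep the elements whose membership pattern matches this key exactly.
        members = {
            x for x in universe
            if all((x in s) == (f == "1") for s, f in zip(sets, flags))
        }
        counts[key] = len(members)
    return counts


# The two lists from the venn2 demo above.
print(region_counts([range(10), range(5, 15)]))
# {'01': 5, '10': 5, '11': 5}
```

With the three lists from the venn3 demo, the same function reports an empty `'001'` region, because every element of the third list also belongs to the first.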

Other ways to present a Venn diagram

Visualizing with Python's upsetplot library

Installation

pip install upsetplot

A simple example

from upsetplot import generate_counts
example = generate_counts()
example

out:
cat0   cat1   cat2
False  False  False      56
              True      283
       True   False    1279
              True     5882
True   False  False      24
              True       90
       True   False     429
              True     1957
Name: value, dtype: int64

from upsetplot import plot
plot(example)
out:
{'matrix': <Axes: >,
'shading': <Axes: >,
'totals': <Axes: >,
'intersections': <Axes: ylabel='Intersection size'>}

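X axis: each combination of the sets, with every set in one of two logical states (present or absent).

Y axis: the number of elements in each combination.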
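
As a rough illustration of how the bars relate to the data, the sketch below recomputes the quantities by hand from the counts printed above. It uses plain Python rather than upsetplot, and it assumes that the `totals` panel shows, for each set, the sum over all combinations in which that set is present; the variable names are illustrative only.

```python
# Counts copied from the generate_counts() output above,
# keyed by (cat0, cat1, cat2) membership.
counts = {
    (False, False, False): 56,
    (False, False, True): 283,
    (False, True, False): 1279,
    (False, True, True): 5882,
    (True, False, False): 24,
    (True, False, True): 90,
    (True, True, False): 429,
    (True, True, True): 1957,
}
names = ["cat0", "cat1", "cat2"]

# Size of each set: add up every combination in which that set is present.
totals = {
    name: sum(size for key, size in counts.items() if key[i])
    for i, name in enumerate(names)
}

# The tallest intersection bar corresponds to the largest single count.
largest = max(counts, key=counts.get)

print(totals)   # {'cat0': 2500, 'cat1': 9547, 'cat2': 8212}
print(largest)  # (False, True, True)
```

Each entry of `counts` corresponds to one intersection bar, while `totals` collapses the table to one number per set.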

A more complex example

import pandas as pd
from sklearn.datasets import load_diabetes
from matplotlib import pyplot as plt
from upsetplot import UpSet

# Load the dataset into a DataFrame
diabetes = load_diabetes()
diabetes_df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)

# Get the five features most correlated with the target (disease progression)
correls = diabetes_df.corrwith(pd.Series(diabetes.target), method='spearman').sort_values()
top_features = correls.index[-5:]

# Build a boolean indicator of whether each top feature is above its median
diabetes_above_avg = diabetes_df > diabetes_df.median(axis=0)
diabetes_above_avg = diabetes_above_avg[top_features]
diabetes_above_avg = diabetes_above_avg.rename(columns=lambda x: x + '>')

# Use these indicator columns as the index of diabetes_df
diabetes_df = pd.concat([diabetes_df, diabetes_above_avg], axis=1)
diabetes_df = diabetes_df.set_index(list(diabetes_above_avg.columns))

# Also attach the target variable (disease progression)
diabetes_df = diabetes_df.assign(progression=diabetes.target)

# Visualize with upsetplot
upset = UpSet(diabetes_df, subset_size='count', intersection_plot_elements=3)
upset.add_catplot(value='progression', kind='strip', color='blue')
print(diabetes_df)
upset.add_catplot(value='bmi', kind='strip', color='black')
upset.plot()
plt.title("UpSet with catplots, for orientation='horizontal'")
plt.savefig('./upset2.jpg')
plt.show()

out:
                                     age       sex       bmi        bp  \
s6>   bp>   s4>   bmi>  s5>
False True  False True  True    0.038076  0.050680  0.061696  0.021872
      False False False False  -0.001882 -0.044642 -0.051474 -0.026328
                  True  True    0.085299  0.050680  0.044451 -0.005670
            True  False True   -0.089063 -0.044642 -0.011595 -0.036656
True  False False False False   0.005383 -0.044642 -0.036385  0.021872
...                                  ...       ...       ...       ...
True  True  False True  True    0.041708  0.050680  0.019662  0.059744
      False True  False False  -0.005515  0.050680 -0.015906 -0.067642
      True  False False False   0.041708  0.050680 -0.015906  0.017293
False True  True  True  True   -0.045472 -0.044642  0.039062  0.001215
True  False False False False  -0.045472 -0.044642 -0.073030 -0.081413

                                      s1        s2        s3        s4  \
s6>   bp>   s4>   bmi>  s5>
False True  False True  True   -0.044223 -0.034821 -0.043401 -0.002592
      False False False False  -0.008449 -0.019163  0.074412 -0.039493
                  True  True   -0.045599 -0.034194 -0.032356 -0.002592
            True  False True    0.012191  0.024991 -0.036038  0.034309
True  False False False False   0.003935  0.015596  0.008142 -0.002592
...                                  ...       ...       ...       ...
True  True  False True  True   -0.005697 -0.002566 -0.028674 -0.002592
      False True  False False   0.049341  0.079165 -0.028674  0.034309
      True  False False False  -0.037344 -0.013840 -0.024993 -0.011080
...
False True  True  True  True    0.044529 -0.025930   220.0
True  False False False False  -0.004222  0.003064    57.0

[442 rows x 11 columns]

Vertical example

# Vertical orientation

upset = UpSet(diabetes_df, subset_size='count', intersection_plot_elements=3,
              orientation='vertical')
upset.add_catplot(value='progression', kind='strip', color='blue')
upset.add_catplot(value='bmi', kind='strip', color='black')
upset.plot()
plt.title("UpSet with catplots, for orientation='vertical'")
plt.savefig('./upset3.jpg')
plt.show()

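Summary

In practice the two can be combined. A Venn diagram catches the eye and gives a qualitative impression, while an UpSet plot shows the reader the actual sizes of the intersections between the sets, which is more rigorous.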