FAQ

Why is pandas not enough?

pandas is a fantastic library for analyzing low-dimensional labelled data: if the data can be sensibly described as "rows and columns", pandas is probably the right choice. However, sometimes we want higher-dimensional arrays (ndim > 2), or arrays for which the order of dimensions (e.g., columns vs. rows) shouldn't really matter. For example, climate and weather data are often natively expressed in four or more dimensions: time, x, y and z.
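As a rough illustration of the four-dimensional case, the sketch below builds a small array with dimensions named time, z, y and x. The variable name temperature, the array sizes and the zero-filled values are all made up for the example; only the idea of addressing dimensions by name comes from the text.

```python
import numpy as np
import xarray as xr

# Hypothetical 4-D field: 3 time steps, 2 vertical levels, a 4 x 5 horizontal grid.
temperature = xr.DataArray(
    np.zeros((3, 2, 4, 5)),
    dims=("time", "z", "y", "x"),
    name="temperature",
)

# Pick the first vertical level and average over time, using names instead of axis numbers.
surface_mean = temperature.isel(z=0).mean(dim="time")

print(temperature.ndim)
print(surface_mean.dims)
```

The same selection in plain NumPy would require remembering that z is axis 1 and time is axis 0.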

pandas has historically supported N-dimensional panels, but deprecated them in version 0.20 in favor of xarray data structures. There are now built-in methods on both sides to convert between pandas and xarray, allowing for more focused development effort. xarray objects have a much richer model of dimensionality. If you were using Panels:

You need to create a new factory type for each dimensionality.

You can't do math between NDPanels with different dimensionality.

Each dimension in an NDPanel has a name (e.g., 'labels', 'items', 'major_axis', etc.), but the dimension names refer to order, not to meaning. You can't specify that an operation should be applied along the "time" axis. (Translator's note: for example, if the third axis of array A holds "name" while the fourth axis of array B holds "name", a single instruction cannot pick out both; you can only ask for "axis 3" or "axis 4".)

You often have to manually convert collections of pandas arrays (Series, DataFrames, etc.) to have the same number of dimensions. In contrast, this sort of data structure fits very naturally in an xarray Dataset.
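To make the last point concrete, here is a minimal sketch of a Dataset that holds variables with different numbers of dimensions. The variable names (temperature, elevation), the station labels and the 0.0065 factor are illustrative assumptions, not part of the original FAQ.

```python
import numpy as np
import pandas as pd
import xarray as xr

# One 2-D variable and one 1-D variable share the "station" dimension.
ds = xr.Dataset(
    {
        "temperature": (("time", "station"), np.arange(6.0).reshape(3, 2)),
        "elevation": (("station",), np.array([10.0, 250.0])),
    },
    coords={
        "time": pd.date_range("2000-01-01", periods=3),
        "station": ["a", "b"],
    },
)

# Arithmetic between variables of different dimensionality aligns on dimension names.
adjusted = ds["temperature"] - 0.0065 * ds["elevation"]

print(adjusted.dims)
```

No manual reshaping is needed here: the 1-D elevation variable is broadcast along time because the two variables share the dimension name "station".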

You can read about switching from Panels to xarray here. pandas gets a lot of things right, but scientific users need fully multi-dimensional data structures.

How do xarray data structures differ from those found in pandas?

The main distinguishing feature of xarray's DataArray over labeled arrays in pandas is that dimensions can have names (e.g., "time", "latitude", "longitude"). Names are much easier to keep track of than axis numbers, and xarray uses dimension names for indexing, aggregation and broadcasting. Not only can you write x.sel(time='2000-01-01') and x.mean(dim='time'), but operations like x - x.mean(dim='time') always work, no matter where the "time" dimension sits in the array. You never need to reshape arrays (e.g., with np.newaxis) to align them for arithmetic operations in xarray.
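A small sketch of the claim that name-based operations do not depend on dimension order. The arrays a and b below are invented for the example: they hold the same data, with b stored transposed.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01", periods=3)

# Same data, two different dimension orders.
a = xr.DataArray(
    np.arange(6.0).reshape(3, 2),
    dims=("time", "x"),
    coords={"time": times},
)
b = a.transpose("x", "time")

# Label-based selection and a time anomaly, both written with dimension names only.
first_day = a.sel(time="2000-01-01")
anom_a = a - a.mean(dim="time")
anom_b = b - b.mean(dim="time")

# After putting the dimensions in the same order, the two results hold the same values.
same = np.array_equal(anom_a.values, anom_b.transpose("time", "x").values)
print(same)
```

With plain NumPy, the second subtraction would need an explicit np.newaxis (or keepdims=True) because the mean along the last axis no longer lines up with the original shape.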

Should I use xarray instead of pandas?

It's not an either/or choice! xarray provides robust support for converting back and forth between the tabular data structures of pandas and its own multi-dimensional data structures.
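The round trip can be sketched as follows, assuming a DataFrame with a two-level MultiIndex. The column name temperature and the index level names time and station are made up for the example; DataFrame.to_xarray and Dataset.to_dataframe are the conversion methods used.

```python
import pandas as pd
import xarray as xr  # required by DataFrame.to_xarray

index = pd.MultiIndex.from_product(
    [["2000-01-01", "2000-01-02"], ["a", "b"]],
    names=["time", "station"],
)
df = pd.DataFrame({"temperature": [11.0, 12.5, 9.0, 10.5]}, index=index)

# Each index level becomes a dimension; each column becomes a data variable.
ds = df.to_xarray()

# Convert back to a tabular layout.
back = ds.to_dataframe()

print(dict(ds.sizes))
print(back)
```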

That said, you should only bother with xarray if some aspect of your data is fundamentally multi-dimensional. If your data is unstructured or one-dimensional, stick with pandas.

Why don't aggregations return Python scalars?

xarray tries hard to be self-consistent: operations on a DataArray (resp. Dataset) return another DataArray (resp. Dataset) object. In particular, operations returning scalar values (e.g., indexing, or aggregations like mean or sum applied to all axes) will also return xarray objects.

Unfortunately, this means we sometimes have to explicitly cast our results from xarray when using them in other libraries. As an illustration, the following code fragment

In [1]: import pandas as pd
   ...: import xarray as xr
   ...: arr = xr.DataArray([1, 2, 3])

In [2]: pd.Series({
           'x': arr[0], 
           'mean': arr.mean(), 
           'std': arr.std()
           })

Out[2]: 
mean          <xarray.DataArray ()>\narray(2.)
std     <xarray.DataArray ()>\narray(0.816497)
x              <xarray.DataArray ()>\narray(1)
dtype: object

does not yield the pandas Series of floats we expected. We need to specify the type conversion ourselves:


In [3]: pd.Series({'x': arr[0], 
        'mean': arr.mean(), 
        'std':  arr.std()}, 
        dtype=float)   # explicit dtype added

Out[3]: 
mean    2.000000
std     0.816497
x       1.000000
dtype: float64

Alternatively, we could use the item method or the float constructor to convert values one at a time:

In [4]: pd.Series({'x': arr[0].item(), 
                   'mean': float(arr.mean())})

Out[4]: 
mean    2.0
x       1.0
dtype: float64

What is your approach to metadata?

We are firm believers in the power of labeled data! In addition to dimensions and coordinates, xarray supports arbitrary metadata in the form of global (Dataset) and variable-specific (DataArray) attributes (attrs).

Automatic interpretation of labels is powerful but also reduces flexibility. With xarray, we draw a firm line between labels that the library understands (dims and coords) and labels for users and user code (attrs). For example, we do not automatically interpret and enforce units or CF (Climate and Forecast) conventions. (An exception is serialization to and from netCDF files.)

An implication of this choice is that we do not propagate attrs through most operations unless explicitly flagged (some methods have a keep_attrs option). Similarly, xarray does not check for conflicts between attrs when combining arrays and datasets, unless explicitly requested with the option compat='identical'. The guiding principle is that metadata should not be allowed to get in the way.
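A minimal sketch of the keep_attrs option mentioned above. The attribute {"units": "K"} is invented for the example, and keep_attrs is passed explicitly in both calls because the default behaviour depends on the installed xarray version and its global options.

```python
import xarray as xr

arr = xr.DataArray([1.0, 2.0, 3.0], dims="x", attrs={"units": "K"})

# Explicitly drop the attributes on the result.
dropped = arr.mean(dim="x", keep_attrs=False)

# Explicitly carry the attributes over to the result.
kept = arr.mean(dim="x", keep_attrs=True)

print(dropped.attrs)
print(kept.attrs)
```

Note that xarray only copies the attributes: it does not check whether "K" is still a sensible unit for the result, which is consistent with treating attrs as labels for users rather than for the library.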

What other netCDF related Python libraries should I know about?

(Translator's note: I am not going through this section for now and will look things up when I need them.)
netCDF4-python provides a lower level interface for working with netCDF and OpenDAP datasets in Python. We use netCDF4-python internally in xarray, and have contributed a number of improvements and fixes upstream. xarray does not yet support all of netCDF4-python’s features, such as modifying files on-disk.

Iris (supported by the UK Met Office) provides similar tools for in-memory manipulation of labeled arrays, aimed specifically at weather and climate data needs. Indeed, the Iris Cube was the direct inspiration for xarray's DataArray. xarray and Iris take very different approaches to handling metadata: Iris strictly interprets CF conventions. Iris particularly shines at mapping, thanks to its integration with Cartopy.

UV-CDAT is another Python library that implements in-memory netCDF-like variables and tools for working with climate data.

We think the design decisions we have made for xarray (namely, basing it on pandas) make it a faster and more flexible data analysis tool. That said, Iris and CDAT have some great domain specific functionality, and xarray includes methods for converting back and forth between xarray and these libraries. See to_iris() and to_cdms2() for more details.

What other projects leverage xarray?

Here are several existing libraries that build functionality upon xarray. (Translator's note: there are too many to translate one by one.)

Geosciences

aospy: Automated analysis and management of gridded climate data.
infinite-diff: xarray-based finite-differencing, focused on gridded climate/meteorology data
marc_analysis: Analysis package for CESM/MARC experiments and output.
MPAS-Analysis: Analysis for simulations produced with Model for Prediction Across Scales (MPAS) components and the Accelerated Climate Model for Energy (ACME).
OGGM: Open Global Glacier Model
Oocgcm: Analysis of large gridded geophysical datasets
Open Data Cube: Analysis toolkit of continental scale Earth Observation data from satellites.
Pangaea: xarray extension for gridded land surface & weather model output.
Pangeo: A community effort for big data geoscience in the cloud.
PyGDX: Python 3 package for accessing data stored in GAMS Data eXchange (GDX) files. Also uses a custom subclass.
Regionmask: plotting and creation of masks of spatial regions
salem: Adds geolocalised subsetting, masking, and plotting operations to xarray’s data structures via accessors.
Spyfit: FTIR spectroscopy of the atmosphere
windspharm: Spherical harmonic wind analysis in Python.
wrf-python: A collection of diagnostic and interpolation routines for use with output of the Weather Research and Forecasting (WRF-ARW) Model.
xarray-simlab: xarray extension for computer model simulations.
xarray-topo: xarray extension for topographic analysis and modelling.
xbpch: xarray interface for bpch files.
xESMF: Universal Regridder for Geospatial Data.
xgcm: Extends the xarray data model to understand finite volume grid cells (common in General Circulation Models) and provides interpolation and difference operations for such grids.
xmitgcm: a python package for reading MITgcm binary MDS files into xarray data structures.
xshape: Tools for working with shapefiles, topographies, and polygons in xarray.

Machine Learning

cesium: machine learning for time series analysis
Elm: Parallel machine learning on xarray data structures
sklearn-xarray (1): Combines scikit-learn and xarray (1).
sklearn-xarray (2): Combines scikit-learn and xarray (2).

Extend xarray capabilities

Collocate: Collocate xarray trajectories in arbitrary physical dimensions
eofs: EOF analysis in Python.
xarray_extras: Advanced algorithms for xarray objects (e.g. integrations/interpolations).
xrft: Fourier transforms for xarray data.
xr-scipy: A lightweight scipy wrapper for xarray.
X-regression: Multiple linear regression from Statsmodels library coupled with Xarray library.
xyzpy: Easily generate high dimensional data, including parallelization.

Visualization

Datashader, geoviews, holoviews: visualization packages for large data
psyplot: Interactive data visualization with python.

Other

ptsa: EEG Time Series Analysis
pycalphad: Computational Thermodynamics in Python

More projects can be found at the "xarray" GitHub topic.

How should I cite xarray?

If you are using xarray and would like to cite it in an academic publication, we would certainly appreciate it. We recommend two citations.

At a minimum, we recommend citing the xarray overview journal article, published in the Journal of Open Research Software.
Hoyer, S. & Hamman, J., (2017). xarray: N-D labeled Arrays and Datasets in Python. Journal of Open Research Software. 5(1), p.10. DOI: http://doi.org/10.5334/jors.148

Here's an example of a BibTeX entry:

@article{hoyer2017xarray,
    title     = {xarray: {N-D} labeled arrays and datasets in {Python}},
    author    = {Hoyer, S. and J. Hamman},
    journal   = {Journal of Open Research Software},
    volume    = {5},
    number    = {1},
    year      = {2017},
    publisher = {Ubiquity Press},
    doi       = {10.5334/jors.148},
    url       = {http://doi.org/10.5334/jors.148}
}

You may also want to cite a specific version of the xarray package. We provide a Zenodo citation and DOI for this purpose:

https://zenodo.org/badge/doi/10.5281/zenodo.598201.svg

An example BibTeX entry:

@misc{xarray_v0_8_0,
    author = {Stephan Hoyer and Clark Fitzgerald and Joe Hamman and others},
    title  = {xarray: v0.8.0},
    month  = aug,
    year   = 2016,
    doi    = {10.5281/zenodo.59499},
    url    = {https://doi.org/10.5281/zenodo.59499}
}

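[Side note] This Markdown editor turned out to be quite easy to pick up, and the formatting stays consistent, which is convenient. The one problem is that it is easy to slip while setting formatting during translation: typing an opening code fence (three backticks), for example, makes the preview pane treat everything after it as code, which increases the rendering load and slows down typing. Maybe next time I will paste the text into Notepad first, add the markup there, and then move it over here? Worth a try.
Update: it seems possible to create the code as a single line first and then add the line breaks.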

xarray official documentation translation notes (part 2): FAQ
