xarray official documentation, translation notes (1): Overview: Why xarray?

Overview: Why xarray?

Features

Adding dimension names and coordinate indexes to numpy’s ndarray makes many powerful array operations possible:

  • Apply operations over dimensions by name: x.sum('time').

  • Select values by label instead of integer location: x.loc['2014-01-01'] or x.sel(time='2014-01-01').

  • Mathematical operations (e.g., x - y) vectorize across multiple dimensions (array broadcasting) based on dimension names, not shape.

  • Flexible split-apply-combine operations with groupby: x.groupby('time.dayofyear').mean().

  • Database-like alignment based on coordinate labels that smoothly handles missing values: x, y = xr.align(x, y, join='outer').

  • Keep track of arbitrary metadata in the form of a Python dictionary: x.attrs.
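The features listed above can be tried on a small array. The following is only a minimal sketch: the data, the station names, and the "units" attribute are made up for illustration, and it groups by 'time.month' rather than 'time.dayofyear' so that the grouped mean actually combines several days.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data: 4 daily values at 3 stations (all names are illustrative).
times = pd.date_range("2014-01-01", periods=4)
x = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=("time", "station"),
    coords={"time": times, "station": ["a", "b", "c"]},
    attrs={"units": "degC"},
)
y = xr.DataArray(
    [10.0, 20.0, 30.0],
    dims="station",
    coords={"station": ["a", "b", "c"]},
)

# Reduce over a dimension by name instead of by axis number.
total = x.sum("time")

# Select by label; both spellings pick the first day.
first_day = x.sel(time="2014-01-01")
same_day = x.loc["2014-01-01"]

# Arithmetic broadcasts by dimension name: y is matched on "station".
diff = x - y

# Split-apply-combine: average all days that fall in the same month.
monthly = x.groupby("time.month").mean()

# Outer alignment: the union of station labels, with NaN where data is missing.
a, b = xr.align(x.isel(station=[0, 1]), y.isel(station=[1, 2]), join="outer")

# Arbitrary metadata travels with the array as a plain dict.
units = x.attrs["units"]
```

After the outer alignment, both a and b carry all three station labels; a holds NaN at station "c" and b holds NaN at station "a".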

pandas provides many of these features, but it does not make use of dimension names, and its core data structures are fixed-dimensional arrays.


The N-dimensional nature of xarray’s data structures makes it suitable for dealing with multi-dimensional scientific data, and its use of dimension names instead of axis labels (dim='time' instead of axis=0) makes such arrays much more manageable than the raw numpy ndarray: with xarray, you don’t need to keep track of the order of an array’s dimensions or insert dummy dimensions (e.g., np.newaxis) to align arrays.
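To make the comparison with raw numpy concrete, here is a small side-by-side sketch. The array contents and the dimension names "time" and "space" are assumptions chosen for illustration only.

```python
import numpy as np
import xarray as xr

# Hypothetical data: a (time, space) field and one offset per time step.
field_np = np.arange(6.0).reshape(2, 3)   # axis 0 = time, axis 1 = space
offset_np = np.array([10.0, 20.0])

# numpy: the caller must remember the axis order and insert a dummy axis
# so that the shapes (2, 3) and (2, 1) broadcast against each other.
anom_np = field_np - offset_np[:, np.newaxis]
mean_np = field_np.mean(axis=0)

# xarray: dimensions are matched and reduced by name.
field = xr.DataArray(field_np, dims=("time", "space"))
offset = xr.DataArray(offset_np, dims="time")
anom = field - offset
mean = field.mean(dim="time")

# The same expression still works if the field is stored transposed,
# because alignment depends on the names, not on the axis positions.
anom_t = field.transpose("space", "time") - offset
```

In the numpy version, transposing field_np would require changing both the np.newaxis placement and the axis argument; the xarray expressions stay the same.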


Core data structures

xarray has two core data structures. Both are fundamentally N-dimensional:

  • DataArray is our implementation of a labeled, N-dimensional array. It is an N-D generalization of a pandas.Series. The name DataArray itself is borrowed from Fernando Perez’s datarray project, which prototyped a similar data structure.

  • Dataset is a multi-dimensional, in-memory array database. It is a dict-like container of DataArray objects aligned along any number of shared dimensions, and serves a similar purpose in xarray to the pandas.DataFrame.
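As a rough illustration of the two structures, the sketch below builds one DataArray and wraps it in a Dataset together with a second variable. The variable names ("temperature", "elevation") and the values are hypothetical.

```python
import numpy as np
import xarray as xr

# A labeled 2-D array: values plus dimension names and coordinate labels.
temperature = xr.DataArray(
    np.zeros((2, 3)),
    dims=("time", "location"),
    coords={"time": [0, 1], "location": ["x", "y", "z"]},
    name="temperature",
)

# A Dataset is a dict-like container of arrays that share dimensions.
# "elevation" only varies along "location", so it is 1-D.
ds = xr.Dataset(
    {
        "temperature": temperature,
        "elevation": ("location", [5.0, 12.0, 30.0]),
    }
)

# Dict-style access returns a DataArray again.
elevation = ds["elevation"]
variable_names = list(ds.data_vars)
sizes = dict(ds.sizes)
```

Here ds.sizes reports each shared dimension once, even though the two variables have different numbers of dimensions.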

The value of attaching labels to numpy’s ndarray may be fairly obvious, but the dataset may need more motivation.


The power of the dataset over a plain dictionary is that, in addition to pulling out arrays by name, it is possible to select or combine data along a dimension across all arrays simultaneously. Like a DataFrame, datasets facilitate array operations with heterogeneous data; the difference is that the arrays in a dataset can not only have different data types, but can also have different numbers of dimensions.
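A short sketch of what this looks like in practice, assuming a made-up dataset with a 2-D float variable, a 1-D integer variable, and a 1-D string variable (all variable names here are hypothetical):

```python
import numpy as np
import xarray as xr

# Heterogeneous variables: different dtypes and different numbers of dimensions.
ds = xr.Dataset(
    {
        "temperature": (("time", "location"), np.arange(6.0).reshape(2, 3)),
        "elevation": ("location", np.array([5, 12, 30])),
        "site_label": ("location", np.array(["north", "mid", "south"])),
    },
    coords={"time": [0, 1], "location": ["x", "y", "z"]},
)

# One selection along "location" is applied to every variable at once.
subset = ds.sel(location=["x", "z"])

# Combining works the same way: concatenate two pieces along "location".
left = ds.sel(location=["x"])
right = ds.sel(location=["y", "z"])
combined = xr.concat([left, right], dim="location")
```

With a plain dict of numpy arrays, the same selection would have to be repeated per array, each time with the right axis number.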


This data model is borrowed from the netCDF file format, which also provides xarray with a natural and portable serialization format. NetCDF is very popular in the geosciences, and there are existing libraries for reading and writing netCDF in many programming languages, including Python.


xarray distinguishes itself from many tools for working with netCDF data insofar as it provides data structures for in-memory analytics that both utilize and preserve labels. You only need to do the tedious work of adding metadata once, not every time you save a file.


Goals and aspirations

pandas excels at working with tabular data. That suffices for many statistical analyses, but physical scientists rely on N-dimensional arrays, which is where xarray comes in.


xarray aims to provide a data analysis toolkit as powerful as pandas but designed for working with homogeneous N-dimensional arrays instead of tabular data. When possible, we copy the pandas API and rely on pandas’s highly optimized internals (in particular, for fast indexing).


Importantly, xarray has robust support for converting its objects to and from a numpy ndarray or a pandas DataFrame or Series, providing compatibility with the full PyData ecosystem.
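The conversions mentioned here can be sketched as a round trip. The array below and its name "temperature" are illustrative assumptions; note that DataFrame.to_xarray() is a pandas method that needs xarray installed.

```python
import numpy as np
import pandas as pd
import xarray as xr

arr = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("time", "location"),
    coords={"time": [0, 1], "location": ["x", "y", "z"]},
    name="temperature",
)

# xarray -> numpy / pandas
as_numpy = arr.values            # plain numpy.ndarray, labels are dropped
as_series = arr.to_series()      # pandas.Series indexed by (time, location)
as_frame = arr.to_dataframe()    # pandas.DataFrame with a "temperature" column

# pandas -> xarray
from_series = xr.DataArray.from_series(as_series)
from_frame = as_frame.to_xarray()   # returns an xarray.Dataset
```

The Series and DataFrame use a MultiIndex built from the dimension coordinates, which is what lets the labels survive the trip back to xarray.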


Our target audience is anyone who needs N-dimensional labeled arrays, but we are particularly focused on the data analysis needs of physical scientists, especially geoscientists who already know and love netCDF.

Reposted from blog.csdn.net/weixin_39781307/article/details/80791702