Quick start

xarrera provides a simple class-based API for defining schemas and validating Xarray objects (and their components).

All schema objects have .validate() and .to_json() methods.

[1]:
import numpy as np
import xarray as xr

from xarrera import DataArraySchema

We’ll start with a simple example that uses the DataArraySchema to validate the following DataArray:

[2]:
da = xr.DataArray(np.ones((4, 10), dtype='i4'), dims=['x', 't'], name='foo')

We can create a schema for this DataArray that includes the datatype, name, and shape. Note that for the shape schema, we’ve used None as a wildcard.

[3]:
schema = DataArraySchema(dtype=np.integer, name='foo', shape=(4, None))
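
To make the None wildcard concrete, here is a minimal pure-Python sketch of how a shape comparison with wildcards could work. The helper shape_matches is hypothetical and is not part of xarrera; it also assumes that the number of dimensions has to match, which the text above does not state.

```python
def shape_matches(actual, expected):
    """Return True if `actual` fits `expected`; None in `expected` matches any size.

    Hypothetical helper for illustration only, not xarrera's implementation.
    """
    # Assumption: a different number of dimensions is always a mismatch.
    if len(actual) != len(expected):
        return False
    # Compare axis by axis, skipping the axes marked with the None wildcard.
    return all(e is None or a == e for a, e in zip(actual, expected))


print(shape_matches((4, 10), (4, None)))  # second axis is a wildcard
print(shape_matches((5, 10), (4, None)))  # first axis differs
```

Under these assumptions the (4, 10) array from above fits the (4, None) pattern, while an array whose first axis has length 5 does not.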

With our schema created, we can now validate our DataArray:

[4]:
schema.validate(da)

When we validate an object that doesn’t conform to our schema, we get a SchemaError:

[5]:
da2 = xr.DataArray(np.ones((4, 10), dtype='f4'), dims=['x', 't'], name='foo')
schema.validate(da2)
---------------------------------------------------------------------------
SchemaError                               Traceback (most recent call last)
Cell In[5], line 2
      1 da2 = xr.DataArray(np.ones((4, 10), dtype='f4'), dims=['x', 't'], name='foo')
----> 2 schema.validate(da2)

File ~/Repos/xarrera/xarrera/dataarray.py:199, in DataArraySchema.validate(self, da)
    196     raise ValueError('Input must be a xarray.DataArray')
    198 if self.dtype is not None:
--> 199     self.dtype.validate(da.dtype)
    201 if self.name is not None:
    202     self.name.validate(da.name)

File ~/Repos/xarrera/xarrera/components.py:74, in DTypeSchema.validate(self, dtype)
     66 '''Validate dtype
     67
     68 Parameters
   (...)     71     Dtype of the DataArray.
     72 '''
     73 if not np.issubdtype(dtype, self.dtype):
---> 74     raise SchemaError(f'dtype {dtype} != {self.dtype}')

SchemaError: dtype float32 != <class 'numpy.integer'>
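
The traceback shows that the dtype check goes through np.issubdtype, which is why the abstract np.integer in the schema accepts int32 but rejects float32. The same comparison can be reproduced with NumPy alone; the variable names below are only for illustration.

```python
import numpy as np

# np.issubdtype tests membership in NumPy's dtype hierarchy, so the
# abstract type np.integer matches any concrete integer dtype.
int_ok = np.issubdtype(np.dtype('i4'), np.integer)    # int32 vs. integer
float_ok = np.issubdtype(np.dtype('f4'), np.integer)  # float32 vs. integer

print(int_ok, float_ok)
```

The first result is True and the second is False, matching the passing validation of da and the SchemaError raised for da2.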

Schemas can also be exported to JSON:

[6]:
schema.json
[6]:
{'dtype': 'integer', 'shape': [4, None], 'name': 'foo'}
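
The output above is a Python dict in which the wildcard appears as None. As a small illustration that uses only the standard library's json module (it says nothing about how xarrera serializes internally), such a dict becomes a JSON string with null in place of None and survives a round trip:

```python
import json

# The dict shown in the output above, copied by hand.
schema_dict = {'dtype': 'integer', 'shape': [4, None], 'name': 'foo'}

# Python's None is written as JSON null.
schema_str = json.dumps(schema_dict)
print(schema_str)

# Loading the string back restores None for the wildcard entry.
restored = json.loads(schema_str)
print(restored == schema_dict)
```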

Components

Each component of the Xarray data model is implemented as a standalone class:

[7]:
from xarrera.components import (
    DTypeSchema,
    DimsSchema,
    ShapeSchema,
    NameSchema,
    ChunksSchema,
    ArrayTypeSchema,
    AttrSchema,
    AttrsSchema
)

# Example constructions
dtype_schema = DTypeSchema('i4')
dims_schema = DimsSchema(('x', 'y', None))  # None is used as a wildcard
shape_schema = ShapeSchema((5, 10, None))  # None is used as a wildcard
name_schema = NameSchema('foo')
chunk_schema = ChunksSchema({'x': None, 'y': -1})  # None is used as a wildcard, -1 is used as
array_type_schema = ArrayTypeSchema(np.ndarray)  # lowercase name avoids shadowing the class

# Example usage
dtype_schema.validate(da.dtype)

# Each object schema can be exported to JSON format
chunk_json = chunk_schema.to_json()  # keep the schema object; store the JSON separately
print(chunk_json)
{"x": null, "y": -1}