Writing Components

Note

This section is intended for model developers. If you intend to use only components that are already written, you can probably ignore it.

Perhaps the best way to learn how to write components is to read components someone else has written. For example, you can look at the CliMT project. Here we will go over a couple of examples of physically simple, made-up components to talk about the parts of their code.

Writing an Example

Let’s start with a Prognostic component which relaxes temperature towards some target temperature.

from sympl import (
    Prognostic, get_numpy_arrays_with_properties,
    restore_data_arrays_with_properties)

class TemperatureRelaxation(Prognostic):

    input_properties = {
        'air_temperature': {
            'dims': ['*'],
            'units': 'degK',
        },
        'eastward_wind': {
            'dims': ['*'],
            'units': 'm/s',
            'match_dims_like': ['air_temperature']
        }
    }

    diagnostic_properties = {}

    tendency_properties = {
        'air_temperature': {
            'dims_like': 'air_temperature',
            'units': 'degK/s',
        }
    }

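    def __init__(self, damping_timescale=1., target_temperature=300.):
        """
        damping_timescale is the damping timescale in seconds.
        target_temperature is the temperature that will be relaxed to,
        in degrees Kelvin.
        """
        self._tau = damping_timescale
        self._T0 = target_temperature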

    def __call__(self, state):
        # we get numpy arrays with specifications from input_properties
        raw_arrays = get_numpy_arrays_with_properties(
            state, self.input_properties)
        T = raw_arrays['air_temperature']
        # here the actual computation happens
        raw_tendencies = {
            'air_temperature': (self._T0 - T)/self._tau,
        }
        # now we re-format the data in a way the host model can use
        diagnostics = {}
        tendencies = restore_data_arrays_with_properties(
            raw_tendencies, self.tendency_properties,
            state, self.input_properties)
        return tendencies, diagnostics

Imports

There are a lot of parts to that code, so let’s go through some of them step-by-step. First we have to import objects and functions from Sympl that we plan to use. The import statement should always go at the top of your file so that it can be found right away by anyone reading your code.

from sympl import (
    Prognostic, get_numpy_arrays_with_properties,
    restore_data_arrays_with_properties)

Define an Object

Once these are imported, there’s this line:

class TemperatureRelaxation(Prognostic):

This is the syntax for defining a class (a new type of object) in Python. TemperatureRelaxation will be the name of the new class. The Prognostic in parentheses tells Python that TemperatureRelaxation is a subclass of Prognostic, which in turn tells Sympl that it can expect your object to behave like a Prognostic.
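As a quick illustration of what subclassing provides, the sketch below uses a stand-in base class rather than Sympl's real Prognostic, so it runs without Sympl installed. FakePrognostic and describe are hypothetical names made up for this example.

```python
class FakePrognostic(object):
    """Stand-in for a base class such as Prognostic (hypothetical)."""

    def describe(self):
        # Methods defined on the base class are inherited by subclasses.
        return 'I am a {}'.format(type(self).__name__)


class TemperatureRelaxation(FakePrognostic):
    pass


component = TemperatureRelaxation()
# Code that expects the base class can check for it with isinstance.
is_prognostic = isinstance(component, FakePrognostic)
description = component.describe()
```

Any code written against the base class can treat the subclass instance the same way, which is the sense in which the object "behaves like" its parent.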

Define Attributes

The next few lines define attributes of your object:

input_properties = {
    'air_temperature': {
        'dims': ['*'],
        'units': 'degK',
    },
    'eastward_wind': {
        'dims': ['*'],
        'units': 'm/s',
        'match_dims_like': ['air_temperature']
    }
}

diagnostic_properties = {}

tendency_properties = {
    'air_temperature': {
        'dims_like': 'air_temperature',
        'units': 'degK/s',
    }
}

Note

‘eastward_wind’ wouldn’t normally make sense as an input for this object; it’s only included so we can talk about match_dims_like.

These attributes will be attributes both of the class object you’re defining and of any instances of that object. That means you can access them using:

TemperatureRelaxation.input_properties

or on an instance, as when you do:

prognostic = TemperatureRelaxation()
prognostic.input_properties

These properties are described in Component Types. They are very useful! They clearly document your code. Here we can see that air_temperature will be used as a 1-dimensional flattened array in units of degrees Kelvin. Sympl can also understand these properties, and use them to automatically acquire arrays in the dimensions and units that you need. It can also test that some of these properties are accurate. It’s your responsibility, though, to make sure that the input units are the units you want to acquire in the numpy array data, and that the output units are the units of the values in the raw output arrays that you want to convert to DataArray objects.

It is possible that some of these attributes won’t be known until you create the object (they may depend on things passed in on initialization). If that’s the case, you can write the __init__ method (see below) so that it sets any relevant properties like self.input_properties to have the correct values.
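A minimal sketch of that pattern, assuming a hypothetical vertical_dim argument that selects which vertical dimension the component works on. ColumnRelaxation is a made-up name, and the Sympl base class is left out so the snippet runs on its own.

```python
class ColumnRelaxation(object):
    """Hypothetical component whose properties depend on an __init__ argument."""

    def __init__(self, vertical_dim='mid_levels'):
        # Set the properties on the instance, since they are not known
        # until the component is created.
        self.input_properties = {
            'air_temperature': {
                'dims': ['*', vertical_dim],
                'units': 'degK',
            },
        }
        self.diagnostic_properties = {}
        self.tendency_properties = {
            'air_temperature': {
                'dims_like': 'air_temperature',
                'units': 'degK/s',
            },
        }


on_mid_levels = ColumnRelaxation()
on_interfaces = ColumnRelaxation(vertical_dim='interface_levels')
```

Each instance then carries its own property dictionaries, so two components created with different arguments can describe different inputs.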

Initialization Method

Next we see a method being defined for this class, which may seem to have a weird name:

def __init__(self, damping_timescale=1., target_temperature=300.):
    """
    damping_timescale is the damping timescale in seconds.
    target_temperature is the temperature that will be relaxed to,
    in degrees Kelvin.
    """
    self._tau = damping_timescale
    self._T0 = target_temperature

This is the function that is called when you create an instance of your object. All methods on objects take in a first argument called self. You don’t see it when you call those methods; it gets added in automatically. The name self refers to the object on which the method is being called - it’s the object itself! When you store attributes on self, as we see in this code, they stay there. You can access them when the object is called later.
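The following dependency-free sketch shows self being supplied implicitly. Scaler is a hypothetical class invented for this illustration and has nothing to do with Sympl.

```python
class Scaler(object):

    def __init__(self, factor):
        # Stored on the instance, so it is still available in __call__.
        self._factor = factor

    def __call__(self, value):
        return self._factor * value


double = Scaler(2.)
implicit = double(21.)                   # Python supplies self for you
explicit = Scaler.__call__(double, 21.)  # the same call with self written out
```

Both calls return the same value; the second form is only shown to make the hidden first argument visible.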

Notice some things about the way variables have been named in this __init__. The parameters are fairly verbose names which almost fully describe what they are (apart from the units, which are in the documentation string). This is best because it is entirely clear what these values are when others are using your object. You write code for people, not computers! Compilers write code for computers.

Then we take these inputs and store them as attributes with shorter names. This is also good practice, because what these attributes mean is clearly defined in the two lines:

self._tau = damping_timescale
self._T0 = target_temperature

Obviously self._tau is the damping timescale, and self._T0 is the target temperature for the relaxation. Now you can use these shorter variables in the actual code to keep long lines for equations short, knowing that your variables are well-documented.
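As a rough illustration of how the short names keep the equation compact, here is a Sympl-free sketch that computes a relaxation tendency and applies it with a simple forward Euler step. RelaxationSketch, its tendency method, and the time-stepping loop are assumptions made for this example and are not part of Sympl. The sign is chosen so that temperature is pulled towards the target.

```python
class RelaxationSketch(object):
    """Hypothetical, Sympl-free stand-in that only computes the tendency."""

    def __init__(self, damping_timescale=1., target_temperature=300.):
        self._tau = damping_timescale   # seconds
        self._T0 = target_temperature   # degrees Kelvin

    def tendency(self, T):
        # Short attribute names keep the equation readable.
        return [(self._T0 - value) / self._tau for value in T]


relaxation = RelaxationSketch(damping_timescale=1.)
temperatures = [280., 300., 320.]
timestep = 0.1  # seconds
for _ in range(50):
    dTdt = relaxation.tendency(temperatures)
    # Forward Euler: new value = old value + timestep * tendency
    temperatures = [T + timestep * rate
                    for T, rate in zip(temperatures, dTdt)]
```

After the loop, every value has moved close to the 300 K target while staying on the side it started from.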

The Computation

That brings us to the __call__ method. This is what’s called when you use the object as though it is a function. In Sympl components, this is the method which takes in a state dictionary and returns dictionaries with outputs.

def __call__(self, state):
    # we get numpy arrays with specifications from input_properties
    raw_arrays = get_numpy_arrays_with_properties(
        state, self.input_properties)
    T = raw_arrays['air_temperature']
    # here the actual computation happens
    raw_tendencies = {
        'air_temperature': (self._T0 - T)/self._tau,
    }
    # now we re-format the data in a way the host model can use
    diagnostics = {}
    tendencies = restore_data_arrays_with_properties(
        raw_tendencies, self.tendency_properties,
        state, self.input_properties)
    return tendencies, diagnostics

There are two helper functions used in this code that we strongly recommend using. They take care of the work of making sure you get variables that are in the units your component needs, and have the dimensions your component needs.

get_numpy_arrays_with_properties() uses the input_properties dictionary you give it to extract numpy arrays with those properties from the input state. It will convert units to ensure the numbers are in the specified units, and it will reshape the data to give it the shape specified in dims. For example, if dims is ['*', 'z'] then it will give you a 2-dimensional array whose second axis is the vertical, and first axis is a flattening of any other dimensions. If you specify ['*', 'mid_levels'] then the result is similar, but only ‘mid_levels’ is an acceptable vertical dimension. The match_dims_like property on eastward_wind tells Sympl that any wildcard-matched dimensions (ones that match ‘x’, ‘y’, ‘z’, or ‘*’) should be the same between the two quantities, meaning they’re on the same grid for those wildcards. You can still, however, have one be on, say, ‘mid_levels’ and another on ‘interface_levels’ if those dimensions are explicitly listed (instead of listing ‘z’).
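To picture what dims of ['*', 'z'] means, the sketch below collapses the horizontal axes of (x, y, z) data into one leading axis. It is a pure-Python stand-in using nested lists, not Sympl's implementation, which works on numpy arrays. flatten_to_columns is a hypothetical helper name.

```python
def flatten_to_columns(data_xyz):
    """Collapse the leading (x, y) axes of a nested list into a single axis.

    data_xyz[i][j] is the list of values along z at horizontal point (i, j).
    """
    return [column for row in data_xyz for column in row]


nx, ny, nz = 2, 3, 4
# Each value encodes its (x, y, z) position so the ordering is easy to inspect.
data_xyz = [[[100 * i + 10 * j + k for k in range(nz)]
             for j in range(ny)]
            for i in range(nx)]

columns = flatten_to_columns(data_xyz)
shape = (len(columns), len(columns[0]))  # (nx * ny, nz)
```

The result has one row per horizontal point and one entry per vertical level, which is the layout a column component would loop over.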

restore_data_arrays_with_properties() does something fairly magical. In this example, it takes the raw_tendencies dictionary and converts the value for ‘air_temperature’ from a numpy array to a DataArray that has the same dimensions as air_temperature had in the input state. That means that you could pass this object a state with whatever dimensions you want, whether it’s (x, y, z), or (z, x, y), or (x, y), or (station_number, z), etc. and this component will be able to take in that state, and return a tendency dictionary with the same dimensions (and order) that the model uses! And internally you can work with a simple 1-dimensional array. This is particularly useful for writing pointwise components using ['*'] or column components with ['*', 'z'] or ['z', '*'].
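The round trip described above can be sketched without Sympl as flatten, compute, restore. The helpers flatten and restore below are hypothetical and handle only a rectangular 2-dimensional nested list. The real function also deals with dimension names, ordering, and DataArray attributes.

```python
def flatten(nested):
    """Flatten a rectangular 2-d nested list into a 1-d list, row by row."""
    return [value for row in nested for value in row]


def restore(flat, n_rows, n_cols):
    """Rebuild the 2-d nested list that flatten() took apart."""
    return [flat[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]


T0, tau = 300., 10.
temperature_xy = [[290., 300.], [310., 320.], [300., 280.]]  # dims (x, y)
n_rows, n_cols = len(temperature_xy), len(temperature_xy[0])

T = flatten(temperature_xy)                          # 1-d, like dims ['*']
raw_tendency = [(T0 - value) / tau for value in T]   # pointwise computation
tendency_xy = restore(raw_tendency, n_rows, n_cols)  # back to dims (x, y)
```

The computation in the middle never needs to know how many dimensions the host model uses, which is the point of working on the flattened array.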

You can read more about properties in the section Input/Output Properties.

sympl.get_numpy_arrays_with_properties(state, property_dictionary)
Parameters:
  • state (dict) – A state dictionary.
  • property_dictionary (dict) – A dictionary whose keys are quantity names and values are dictionaries with properties for those quantities. The property “dims” must be present, indicating the dimensions that the quantity must have when it is returned as a numpy array. The property “units” must be present, and will be used to check the units on the input state and perform a conversion if necessary. If the optional property “match_dims_like” is present, its value should be a quantity also present in property_dictionary, and it will be ensured that any shared wildcard dimensions (‘x’, ‘y’, ‘z’, ‘*’) for this quantity match the same dimensions as the specified quantity.
Returns:

out_dict – A dictionary whose keys are quantity names and values are numpy arrays containing the data for those quantities, as specified by property_dictionary.

Return type:

dict

Raises:
  • InvalidStateError – If a DataArray in the state is missing an explicitly-specified dimension defined in its properties (dimension names other than ‘x’, ‘y’, ‘z’, or ‘*’), or if the state is missing a required quantity.
  • InvalidPropertyDictError – If a quantity in property_dictionary is missing values for “dims” or “units”.
sympl.restore_data_arrays_with_properties(raw_arrays, output_properties, input_state, input_properties)
Parameters:
  • raw_arrays (dict) – A dictionary whose keys are quantity names and values are numpy arrays containing the data for those quantities.
  • output_properties (dict) – A dictionary whose keys are quantity names and values are dictionaries with properties for those quantities. The property “dims_like” must be present, and specifies an input quantity that the dimensions of the output quantity should be like. All other properties are included as attributes on the output DataArray for that quantity, including “units” which is required.
  • input_state (dict) – A state dictionary that was used as input to a component for which DataArrays are being restored.
  • input_properties (dict) – A dictionary whose keys are quantity names and values are dictionaries with input properties for those quantities. The property “dims” must be present, indicating the dimensions that the quantity was transformed to when taken as input to a component.
Returns:

out_dict – A dictionary whose keys are quantities and values are DataArrays corresponding to those quantities, with data, shapes and attributes determined from the inputs to this function.

Return type:

dict

Raises:

InvalidPropertyDictError – When an output property is specified to have dims_like an input property, but the arrays for the two properties have incompatible shapes.

Aliases

Note

Using aliases isn’t necessary, but it may make your code easier to read if you have long quantity names.

Let’s say that instead of the properties we set before, we have:

input_properties = {
    'air_temperature': {
        'dims': ['*'],
        'units': 'degK',
        'alias': 'T',
    },
    'eastward_wind': {
        'dims': ['*'],
        'units': 'm/s',
        'match_dims_like': ['air_temperature'],
        'alias': 'u',
    }
}

The difference here is we’ve set ‘T’ and ‘u’ to be aliases for ‘air_temperature’ and ‘eastward_wind’. What does that mean? Well, in the computational code, we can write:

def __call__(self, state):
    # we get numpy arrays with specifications from input_properties
    raw_arrays = get_numpy_arrays_with_properties(
        state, self.input_properties)
    T = raw_arrays['T']
    # here the actual computation happens
    raw_tendencies = {
        'T': (self._T0 - T)/self._tau,
    }
    # now we re-format the data in a way the host model can use
    diagnostics = {}
    tendencies = restore_data_arrays_with_properties(
        raw_tendencies, self.tendency_properties,
        state, self.input_properties)
    return tendencies, diagnostics

Instead of using ‘air_temperature’ in the raw_arrays and raw_tendencies dictionaries, we can use ‘T’. This doesn’t matter much for a name as short as air_temperature, but it might matter for longer names like ‘correlation_of_eastward_wind_and_liquid_water_potential_temperature_on_interface_levels’.

Also notice that even though the alias is set in input_properties, it is also used when restoring DataArrays. If there is an output that is not also an input, the alias could instead be set in diagnostic_properties, tendency_properties, or output_properties, wherever is relevant.
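To make the renaming concrete, here is a small Sympl-free sketch of what looking up quantities by alias amounts to. rename_with_aliases is a hypothetical helper that mimics only the key renaming; it does no unit conversion or dimension handling, and it is not how Sympl implements aliases internally.

```python
def rename_with_aliases(arrays, properties):
    """Return a copy of arrays keyed by alias where one is defined.

    Quantities without an 'alias' property keep their full name.
    """
    renamed = {}
    for name, value in arrays.items():
        alias = properties.get(name, {}).get('alias', name)
        renamed[alias] = value
    return renamed


input_properties = {
    'air_temperature': {'dims': ['*'], 'units': 'degK', 'alias': 'T'},
    'eastward_wind': {'dims': ['*'], 'units': 'm/s'},
}
arrays = {'air_temperature': [290., 300.], 'eastward_wind': [5., -3.]}

raw_arrays = rename_with_aliases(arrays, input_properties)
```

Here air_temperature becomes available under 'T', while eastward_wind, which has no alias in this sketch, keeps its original key.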