motornet.environments

class motornet.environment.Environment(effector, q_init=None, name: str = 'Env', differentiable: bool = True, max_ep_duration: float = 1.0, action_noise: float = 0.0, obs_noise: float | list = 0.0, action_frame_stacking: int = 0, proprioception_delay: float = None, vision_delay: float = None, proprioception_noise: float = 0.0, vision_noise: float = 0.0, **kwargs)

Bases: Env, Module

Base class for environments.

Parameters:

effector – motornet.effector.Effector object class or subclass. This is the effector that will evolve in the environment.
q_init – Tensor or numpy.ndarray, the desired initial joint states for the environment, if a single set of pre-defined initial joint states is desired. If None, random initial joint states are drawn at each call of reset(). This parameter will be ignored on reset() calls where a joint_state is provided as an option.
name – String, the name of the environment object instance.
differentiable – Boolean, whether the environment will be differentiable or not. Setting this to False is usually useful for reinforcement learning, where differentiability is not needed.
max_ep_duration – Float, the maximum duration of an episode, in seconds.
action_noise – Float, the standard deviation of the Gaussian noise added to the action input at each step of the simulation.
obs_noise – Float or list, the standard deviation of the Gaussian noise added to the observation vector at each step of the simulation. If this is a list, it should have as many elements as the observation vector and will indicate the standard deviation of each observation element independently.
action_frame_stacking – Integer, the number of past action steps to add to the observation vector.
proprioception_delay – Float, the delay in seconds for the proprioceptive feedback to be added to the observation vector. If None, no delay will occur.
vision_delay – Float, the delay in seconds for the visual feedback to be added to the observation vector. If None, no delay will occur.
proprioception_noise – Float, the standard deviation of the Gaussian noise added to the proprioceptive feedback at each step of the simulation.
vision_noise – Float, the standard deviation of the Gaussian noise added to the visual feedback at each step of the simulation.
**kwargs – This is passed as-is to the torch.nn.Module parent class.

apply_noise(loc: Tensor, noise: float | list) → Tensor

Applies element-wise Gaussian noise to the input loc.

Parameters:

loc – Tensor, the input on which the Gaussian noise is applied, which in probabilistic terms makes it the mean of the Gaussian distribution.
noise – Float or list, the standard deviation (spread or “width”) of the distribution. Must be non-negative. If this is a list, it must contain as many elements as the second axis of loc, and the Gaussian distribution for each column of loc will have a different standard deviation. Note that the elements within each column of loc will still be independent and identically distributed (i.i.d.).

Returns:

A noisy version of loc as a tensor.
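The per-column behaviour described above can be illustrated with a minimal sketch that uses nested Python lists in place of tensors. The name apply_noise_sketch and its rng argument are hypothetical and are not part of motornet, which operates on tensors.

```python
import random


def apply_noise_sketch(loc, noise, rng=None):
    """Add zero-mean Gaussian noise to a (batch, n_features) nested list.

    `noise` is either one standard deviation for every element,
    or a list with one standard deviation per column of `loc`.
    """
    if rng is None:
        rng = random.Random()
    n_cols = len(loc[0])
    # Broadcast a scalar standard deviation to every column.
    stds = list(noise) if isinstance(noise, (list, tuple)) else [noise] * n_cols
    if len(stds) != n_cols:
        raise ValueError("noise must have one entry per column of loc")
    if any(s < 0 for s in stds):
        raise ValueError("standard deviations must be non-negative")
    # Each element gets its own independent draw, centred on the input value.
    return [[rng.gauss(x, s) for x, s in zip(row, stds)] for row in loc]


loc = [[0.0, 1.0], [2.0, 3.0]]
# First column is noise-free, second column has a standard deviation of 0.5.
noisy = apply_noise_sketch(loc, [0.0, 0.5], rng=random.Random(0))
```

With a standard deviation of 0 the first column is returned unchanged, while the second column is perturbed independently in each row.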

detach(x: Any) → Any

Converts a tensor to a numpy.ndarray on the CPU, or returns x unchanged if it is not a tensor.

Parameters:

x – The value to detach.

Returns:

A numpy.ndarray if x is a tensor, otherwise x unchanged.

get_attributes() → tuple[list[str], list]

Gets all non-callable attributes declared in the object instance, excluding gym.spaces.Space attributes, the effector, muscle, and skeleton attributes.

Returns:

A list of attribute names as string elements.
A list of attribute values.
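The filtering described above can be sketched in plain Python. This is an illustration under assumptions, not the library's implementation: get_attributes_sketch, its two exclusion arguments, and the ToyEnv stand-in class are all hypothetical names.

```python
def get_attributes_sketch(obj, excluded_names=(), excluded_types=()):
    """Return ([names], [values]) for the non-callable instance attributes."""
    names, values = [], []
    for name, value in vars(obj).items():
        if callable(value):
            continue  # skip functions or other callables stored on the instance
        if name in excluded_names or isinstance(value, excluded_types):
            continue  # skip excluded attributes, e.g. the effector or space objects
        names.append(name)
        values.append(value)
    return names, values


class ToyEnv:
    """Stand-in object for the example; not a motornet class."""

    def __init__(self):
        self.name = "Env"
        self.max_ep_duration = 1.0
        self.effector = object()  # excluded by name in the call that follows
        self.callback = print     # callable, so it is skipped


names, values = get_attributes_sketch(ToyEnv(), excluded_names=("effector",))
```

The two returned lists are aligned, so zip(names, values) yields name-value pairs.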

get_obs(action: Tensor | None = None, deterministic: bool = False) → Tensor | ndarray

Returns a (batch_size, n_features) tensor containing the (potentially time-delayed) observations. By default, this is the task goal, followed by the output of the get_proprioception() method, the output of the get_vision() method, and finally the last action_frame_stacking action sets, if a non-zero action_frame_stacking keyword argument was passed at initialization of this class instance. Independent and identically distributed (i.i.d.) Gaussian noise is added to each element in the tensor, using the obs_noise attribute.

Parameters:

action – Tensor or None, the action taken at the current step. Used to update the action frame buffer when action_frame_stacking is non-zero. Default: None.
deterministic – Boolean, if True, observation noise is not applied. Default: False.

Returns:

The observation vector as tensor or numpy.ndarray, depending on whether the Environment is set as differentiable or not.
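As a worked example of the default layout described above, the sketch below computes the width of the observation vector and concatenates one row of it. The helper names are hypothetical. The example also assumes a two-dimensional goal, a two-dimensional fingertip position, and an action dimension equal to the number of muscles; none of these values is stated in this reference.

```python
def observation_width(goal_dim, n_muscles, vision_dim, action_dim,
                      action_frame_stacking=0):
    """Number of features in one observation row under the default layout."""
    # Proprioception: normalized length, then normalized velocity, per muscle.
    proprioception_dim = 2 * n_muscles
    # Past actions are appended only when frame stacking is non-zero.
    stacked_action_dim = action_frame_stacking * action_dim
    return goal_dim + proprioception_dim + vision_dim + stacked_action_dim


def concatenate_observation(goal, proprioception, vision, past_actions=()):
    """Build one observation row: goal, proprioception, vision, past actions."""
    obs = list(goal) + list(proprioception) + list(vision)
    for action in past_actions:
        obs.extend(action)
    return obs


# Assumed example: 6 muscles, 2D goal, 2D vision, no action stacking.
width_plain = observation_width(goal_dim=2, n_muscles=6, vision_dim=2, action_dim=6)
# Same setup with the last 2 actions stacked onto the observation.
width_stacked = observation_width(2, 6, 2, 6, action_frame_stacking=2)
row = concatenate_observation([0.1, 0.2], [0.0] * 12, [0.3, 0.4])
```

Under these assumptions the observation has 2 + 12 + 2 = 16 features without stacking, and 16 + 2 × 6 = 28 features with two stacked actions.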

get_proprioception() → Tensor

Returns a (batch_size, n_features) tensor containing the instantaneous (non-delayed) proprioceptive feedback. By default, this is the normalized muscle length for each muscle, followed by the normalized muscle velocity for each muscle. Independent and identically distributed (i.i.d.) Gaussian noise is added to each element in the tensor, using the proprioception_noise attribute.

get_save_config() → dict

Gets the environment object’s configuration as a dictionary.

Returns:

A dictionary containing the parameters of the environment’s configuration. All parameters held as non-callable attributes by the object instance will be included in the dictionary, excluding gym.spaces.Space attributes, the effector, muscle, and skeleton attributes.

get_vision() → Tensor

Returns a (batch_size, n_features) tensor containing the instantaneous (non-delayed) visual feedback. By default, this is the Cartesian position of the end-point effector, that is, the fingertip. Independent and identically distributed (i.i.d.) Gaussian noise is added to each element in the tensor, using the vision_noise attribute.

joint2cartesian(joint_states: Tensor) → Tensor

Shortcut to the motornet.effector.Effector.joint2cartesian() method.

print_attributes() → None

Prints all non-callable attributes declared in the object instance, excluding gym.spaces.Space attributes, the effector, muscle, and skeleton attributes.

reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[Tensor | ndarray, dict[str, Any]]

Initializes the task goal and effector states for a (batch of) simulation episode(s). The effector states (joint, cartesian, muscle, geometry) are initialized to be biomechanically compatible with each other. This method is likely to be overridden by subclasses to implement user-defined computations, such as defining a custom initial goal or initial states.

Parameters:

seed – Integer, the seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again.
options – Dictionary, optional kwargs specific to motornet environments. This is mainly useful to pass batch_size, joint_state, and deterministic kwargs if desired, as described below.

Options:

batch_size: Integer, the desired batch size. Default: 1.
joint_state: The joint state from which the other state values are inferred. If None, the q_init value declared during the class instantiation will be used. If q_init is also None, random initial joint states are drawn, from which the other state values are inferred. Default: None.
deterministic: Boolean, if True, observation, proprioception, and vision noise are not applied. Default: False.

Returns:

The observation vector, as a tensor if the Environment is set as differentiable and as a numpy.ndarray otherwise. It has dimensionality (batch_size, n_features).
A dictionary containing the initial step’s information.
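The option defaults and the fallback order for the initial joint state described above can be sketched as follows. The function names are hypothetical, and draw_random stands in for whatever routine draws random joint states; this is a sketch of the documented behaviour, not the library's code.

```python
def parse_reset_options(options):
    """Apply the documented defaults to the `options` dictionary of reset()."""
    options = options or {}
    batch_size = options.get("batch_size", 1)
    joint_state = options.get("joint_state", None)
    deterministic = options.get("deterministic", False)
    return batch_size, joint_state, deterministic


def resolve_joint_state(joint_state, q_init, draw_random):
    """Pick the initial joint state using the documented order of precedence."""
    if joint_state is not None:
        return joint_state  # explicit per-call option takes priority
    if q_init is not None:
        return q_init       # value declared at class instantiation
    return draw_random()    # otherwise fall back to a random draw


batch_size, joint_state, deterministic = parse_reset_options({"batch_size": 8})
from_option = resolve_joint_state([0.1, 0.2], [1.0, 1.0], lambda: "random")
from_q_init = resolve_joint_state(None, [1.0, 1.0], lambda: "random")
from_random = resolve_joint_state(None, None, lambda: "random")
```

A joint_state passed through options therefore overrides q_init for that call only; later calls without it fall back to q_init or to a random draw.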

step(action: Tensor | ndarray, deterministic: bool = False, **kwargs) → tuple[Tensor | ndarray, ndarray | None, bool, bool, dict[str, Any]]

Performs one simulation step. This method is likely to be overridden by subclasses to implement user-defined computations, such as reward value calculation for reinforcement learning, custom truncation or termination conditions, or time-varying goals.

Parameters:

action – Tensor or numpy.ndarray, the input drive to the actuators.
deterministic – Boolean, if True, observation, action, proprioception, and vision noise are not applied. Default: False.
**kwargs – Passed as-is to the motornet.effector.Effector.step() call. Mainly useful to pass endpoint_load or joint_load kwargs.

Returns:

The observation vector, as a tensor if the Environment is set as differentiable and as a numpy.ndarray otherwise. It has dimensionality (batch_size, n_features).
A numpy.ndarray with the reward information for the step, with dimensionality (batch_size, 1). This is None if the Environment is set as differentiable. In the base Environment class, the reward is always 0.
A boolean indicating if the simulation has been terminated or truncated. If the Environment is set as differentiable, this returns True when the simulation time reaches max_ep_duration provided at initialization.
A boolean indicating if the simulation has been truncated early or not. This always returns False if the Environment is set as differentiable.
A dictionary containing this step’s information.
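The five-element return value lends itself to a simple rollout loop. The sketch below uses a stand-in ToyEnv class that only mimics the documented return shapes of reset() and step(); it is not a motornet class, and its dt timestep argument is a hypothetical parameter introduced for the illustration.

```python
class ToyEnv:
    """Stand-in that mimics the documented reset()/step() return shapes."""

    def __init__(self, max_ep_duration=1.0, dt=0.01):
        # Count whole steps to avoid accumulating floating-point time.
        self.max_steps = round(max_ep_duration / dt)
        self.n_steps = 0

    def reset(self, *, seed=None, options=None):
        self.n_steps = 0
        batch_size = (options or {}).get("batch_size", 1)
        obs = [[0.0] for _ in range(batch_size)]
        return obs, {}

    def step(self, action, deterministic=False):
        self.n_steps += 1
        obs = [[float(self.n_steps)] for _ in action]
        # The episode ends once the maximum episode duration is reached.
        terminated = self.n_steps >= self.max_steps
        return obs, None, terminated, False, {}


env = ToyEnv(max_ep_duration=0.05, dt=0.01)
obs, info = env.reset(options={"batch_size": 2})
n_steps = 0
terminated = False
while not terminated:
    action = [[0.0], [0.0]]  # one action row per batch element
    obs, reward, terminated, truncated, info = env.step(action)
    n_steps += 1
```

With a real environment the action would come from a policy network, and the loop body is where a loss or reward would be accumulated.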

to(*args, **kwargs)

Move and/or cast the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)

to(dtype, non_blocking=False)

to(tensor, non_blocking=False)

to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters:

device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns:

self

Return type:

Module

Examples:

>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA1)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)

update_obs_buffer(action: Tensor | None = None) → None

Shifts each observation buffer by one step and appends the current observation.

Parameters:

action – Tensor or None, the most recent action, used to update the action frame buffer when action_frame_stacking is non-zero. Default: None.
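The shift-and-append update described above is what allows a feedback delay to be expressed as a buffer length. The sketch below shows one possible way to do this with a fixed-length deque. The DelayBuffer class, its dt timestep argument, and the rounding of the delay to whole steps are assumptions made for illustration, not details taken from motornet.

```python
from collections import deque


class DelayBuffer:
    """Fixed-length history whose oldest entry is the delayed observation."""

    def __init__(self, delay, dt, initial):
        # Convert the delay from seconds to whole steps; None means no delay.
        self.delay_steps = int(round(delay / dt)) if delay else 0
        length = self.delay_steps + 1
        # Pre-fill so that a delayed value exists from the first step onwards.
        self.buffer = deque([initial] * length, maxlen=length)

    def update(self, current):
        # Appending to a full deque drops the oldest entry:
        # this is the "shift by one step, then append" operation.
        self.buffer.append(current)

    def delayed(self):
        return self.buffer[0]


buf = DelayBuffer(delay=0.02, dt=0.01, initial=0)
seen = []
for t in range(1, 5):
    buf.update(t)
    seen.append(buf.delayed())
```

With a delay of two steps, the value read at step t is the one written at step t - 2, and the initial value is returned until enough history has accumulated.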

class motornet.environment.RandomTargetReach(*args, **kwargs)

Bases: Environment

A reach-to-target environment in which both the starting position and the target are drawn uniformly at random from the reachable workspace at each episode reset.

Parameters:

*args – Positional arguments passed as-is to the parent Environment class.
**kwargs – Keyword arguments passed as-is to the parent Environment class.

reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[Any, dict[str, Any]]

Overrides Environment.reset() to draw a random fingertip target position as the goal, sampled uniformly from the full reachable workspace.

Parameters:

seed – Integer, the seed that is used to initialize the environment’s PRNG. See Environment.reset() for full details.
options – Dictionary, optional kwargs. Accepts the same keys as Environment.reset().

Returns:

The observation vector as tensor or numpy.ndarray.
A dictionary containing the initial step’s information.
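One way to guarantee that a random target is reachable is to draw joint angles uniformly within their limits and map them to a fingertip position through forward kinematics. The sketch below does this for a planar two-link arm. Whether RandomTargetReach samples targets in exactly this way is an assumption, and the link lengths, joint bounds, and function names are hypothetical. Note that a draw that is uniform in joint space is generally not uniform in Cartesian space.

```python
import math
import random


def two_link_fingertip(q_shoulder, q_elbow, l1=0.3, l2=0.3):
    """Forward kinematics of a planar two-link arm (angles in radians)."""
    x = l1 * math.cos(q_shoulder) + l2 * math.cos(q_shoulder + q_elbow)
    y = l1 * math.sin(q_shoulder) + l2 * math.sin(q_shoulder + q_elbow)
    return x, y


def draw_random_target(rng, bounds=((0.0, math.pi / 2), (0.0, math.pi / 2))):
    """Draw joint angles uniformly within bounds, then map to Cartesian space."""
    q = [rng.uniform(lower, upper) for lower, upper in bounds]
    return two_link_fingertip(*q)


rng = random.Random(0)
targets = [draw_random_target(rng) for _ in range(100)]
```

Because every target comes from a valid joint configuration, no target can lie farther from the shoulder than the combined link length l1 + l2.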