superpacman package

Submodules

superpacman.commands module

superpacman.commands.command_parser()[source]
superpacman.commands.main()[source]
usage: superpacman_prog [-h] {play,train,enjoy} ...

Positional Arguments

command

Possible choices: play, train, enjoy

Sub-commands

play

Run the game

superpacman_prog play [-h] [--partial_radius PARTIAL_RADIUS]
Named Arguments
--partial_radius

distance the agent can see

Default: 0
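
For example, to run the game with a limited field of view (the radius value here is illustrative):

superpacman_prog play --partial_radius 5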

train

Train the agent

superpacman_prog train [-h] [--exp_name EXP_NAME] [--device DEVICE]
                       [--seed SEED] [--env_batch_size ENV_BATCH_SIZE]
                       [--steps_per_batch STEPS_PER_BATCH]
                       [--train_steps TRAIN_STEPS]
                       [--clip_epsilon CLIP_EPSILON] [--gamma GAMMA]
                       [--lmbda LMBDA] [--entropy_eps ENTROPY_EPS]
                       [--max_grad_norm MAX_GRAD_NORM] [--power POWER]
                       [--hidden_dim HIDDEN_DIM]
                       [--max_steps_per_traj MAX_STEPS_PER_TRAJ] [--lr LR]
                       [--lr_sched_step_size LR_SCHED_STEP_SIZE]
                       [--lr_sched_gamma LR_SCHED_GAMMA]
                       [--ppo_steps PPO_STEPS] [--eval_freq EVAL_FREQ]
                       [--eval_len EVAL_LEN]
                       [--logger {csv,wandb,mlflow,tensorboard}]
                       [--warmup_steps WARMUP_STEPS]
                       [--load_checkpoint LOAD_CHECKPOINT]
Named Arguments
--exp_name

experiment name

Default: 'superpacman'

--device

cuda or cpu

Default: 'cpu'

--seed

random seed

Default: 42

--env_batch_size

number of environments

Default: 2048

--steps_per_batch

number of steps to take in env per batch

Default: 8

--train_steps

number of PPO updates to run

Default: 1000

--clip_epsilon

PPO clipping parameter

Default: 0.1

--gamma

GAE gamma parameter

Default: 0.99

--lmbda

GAE lambda parameter

Default: 0.99

--entropy_eps

policy entropy bonus weight

Default: 0.08

--max_grad_norm

gradient clipping

Default: 1.0

--power

power of squeezenet

Default: 5

--hidden_dim

hidden dim size of MLP

Default: 64

--max_steps_per_traj

maximum length of a trajectory

--lr

Adam learning rate

Default: 0.001

--lr_sched_step_size

decay lr after this many steps

Default: 100000.0

--lr_sched_gamma

multiplicative factor applied to the lr at each decay step

Default: 0.7

--ppo_steps

number of PPO updates per batch

Default: 6

--eval_freq

run eval after this many training steps

Default: 100

--eval_len

length of each eval rollout

Default: 400

--logger

Possible choices: csv, wandb, mlflow, tensorboard

supported loggers

Default: 'csv'

--warmup_steps

delay before starting to learn

Default: 16

--load_checkpoint

load the checkpoint
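
For example, a typical training invocation (flag values are illustrative, not tuned recommendations):

superpacman_prog train --device cuda --train_steps 1000 --logger tensorboard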

enjoy

Write a video of the policy in action

superpacman_prog enjoy [-h] [--device DEVICE] [--seed SEED] [--length LENGTH]
                       [--max_steps_per_traj MAX_STEPS_PER_TRAJ]
                       checkpoint
Positional Arguments
checkpoint

path to the checkpoint to load

Named Arguments
--device

Default: 'cpu'

--seed

Default: 42

--length

rollout length

Default: 400

--max_steps_per_traj

maximum length of a trajectory
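
For example (the checkpoint path is hypothetical):

superpacman_prog enjoy --length 400 checkpoint.pt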

superpacman.play module

superpacman.play.play(args)[source]

superpacman.superpacman module

class superpacman.superpacman.Actions(value)[source]

Bases: IntEnum

An enumeration.

E = 1
N = 0
S = 2
W = 3
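
Since Actions is an IntEnum, its members compare equal to their integer values, e.g.:

>>> from superpacman.superpacman import Actions
>>> Actions.E == 1
True
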
class superpacman.superpacman.CenterPlayerTransform(in_keys, out_keys, patch_radius=2, fill_value=None)[source]

Bases: ObservationTransform

forward(tensordict)[source]

Reads the input tensordict, and for the selected keys, applies the transform.

By default, this method:

  • calls directly _apply_transform().

  • does not call _step() or _call().

This method is not called within env.step at any point. However, it is called within sample().

Note

forward also works with regular keyword arguments, using dispatch to map the argument names to the keys.

Examples

>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t) # works within envs
>>> t(TensorDict(a=0))  # Works offline too.
transform_observation_spec(observation_spec)[source]

Transforms the observation spec so that the resulting spec matches the transform mapping.

Parameters:

observation_spec (TensorSpec) – spec before the transform

Returns:

expected spec after the transform

class superpacman.superpacman.DistanceTransform(in_keys, out_keys, normalize=True, h=0.35)[source]

Bases: ObservationTransform

Manhattan distance transform of a 2D image. The image must be binary, meaning only zeroes and ones in the image. Uses kornia.contrib distance_transform, which returns the manhattan distance.
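
A minimal usage sketch; the out_keys name is a hypothetical choice, while wall_tiles is a binary grid documented under gen_params:

>>> t = DistanceTransform(in_keys=['wall_tiles'], out_keys=['wall_distance'])
>>> env = env.append_transform(t)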

forward(tensordict)[source]

Reads the input tensordict, and for the selected keys, applies the transform.

By default, this method:

  • calls directly _apply_transform().

  • does not call _step() or _call().

This method is not called within env.step at any point. However, it is called within sample().

Note

forward also works with regular keyword arguments, using dispatch to map the argument names to the keys.

Examples

>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t) # works within envs
>>> t(TensorDict(a=0))  # Works offline too.
transform_observation_spec(observation_spec)[source]

Transforms the observation spec so that the resulting spec matches the transform mapping.

Parameters:

observation_spec (TensorSpec) – spec before the transform

Returns:

expected spec after the transform

class superpacman.superpacman.FlatTileTransform(in_keys, out_key='flat_obs')[source]

Bases: ObservationTransform

Takes all the input keys and outputs a single flat tensor.

in_keys: tensors to use
out_key: name of the flat tensor, defaults to flat_obs
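
A minimal usage sketch, assuming the tile keys documented under gen_params:

>>> t = FlatTileTransform(in_keys=['player_tiles', 'wall_tiles'])  # output lands in ['flat_obs'] by default
>>> env = env.append_transform(t)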

forward(tensordict)[source]

Reads the input tensordict, and for the selected keys, applies the transform.

By default, this method:

  • calls directly _apply_transform().

  • does not call _step() or _call().

This method is not called within env.step at any point. However, it is called within sample().

Note

forward also works with regular keyword arguments, using dispatch to map the argument names to the keys.

Examples

>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t) # works within envs
>>> t(TensorDict(a=0))  # Works offline too.
transform_observation_spec(observation_spec)[source]

Transforms the observation spec so that the resulting spec matches the transform mapping.

Parameters:

observation_spec (TensorSpec) – spec before the transform

Returns:

expected spec after the transform

class superpacman.superpacman.Ghost(value)[source]

Bases: IntEnum

An enumeration.

BLINKY = 0
CLAUDE = 3
INKY = 2
PINKY = 1
class superpacman.superpacman.GhostMode(value)[source]

Bases: IntEnum

An enumeration.

CHASE = 2
FRIGHTENED = 3
SCATTER = 1
WAIT = 0
class superpacman.superpacman.RGBFullObsTransform[source]

Bases: ObservationTransform

Converts the state to an N, 3, H, W uint8 image tensor and adds it to the tensordict under the key ['image'].

forward(tensordict)[source]

Reads the input tensordict, and for the selected keys, applies the transform.

By default, this method:

  • calls directly _apply_transform().

  • does not call _step() or _call().

This method is not called within env.step at any point. However, it is called within sample().

Note

forward also works with regular keyword arguments, using dispatch to map the argument names to the keys.

Examples

>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t) # works within envs
>>> t(TensorDict(a=0))  # Works offline too.
transform_observation_spec(observation_spec)[source]

Transforms the observation spec so that the resulting spec matches the transform mapping.

Parameters:

observation_spec (TensorSpec) – spec before the transform

Returns:

expected spec after the transform

class superpacman.superpacman.RGBPartialObsTransform[source]

Bases: ObservationTransform

Converts the state to an N, 3, H, W uint8 image tensor and adds it to the tensordict under the key ['image'].

forward(tensordict)[source]

Reads the input tensordict, and for the selected keys, applies the transform.

By default, this method:

  • calls directly _apply_transform().

  • does not call _step() or _call().

This method is not called within env.step at any point. However, it is called within sample().

Note

forward also works with regular keyword arguments, using dispatch to map the argument names to the keys.

Examples

>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t) # works within envs
>>> t(TensorDict(a=0))  # Works offline too.
transform_observation_spec(observation_spec)[source]

Transforms the observation spec so that the resulting spec matches the transform mapping.

Parameters:

observation_spec (TensorSpec) – spec before the transform

Returns:

expected spec after the transform

class superpacman.superpacman.StackTileTransform(in_keys, out_key='image')[source]

Bases: ObservationTransform

Stacks all the in_keys into an N, C, H, W tensor and outputs it under the out_key.

in_keys: must be N, H, W
out_key: string, default "image"
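
A minimal usage sketch, assuming the N, H, W tile keys documented under gen_params:

>>> t = StackTileTransform(in_keys=['player_tiles', 'wall_tiles', 'reward_tiles'])  # stacked under ['image'] by default
>>> env = env.append_transform(t)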

forward(tensordict)[source]

Reads the input tensordict, and for the selected keys, applies the transform.

By default, this method:

  • calls directly _apply_transform().

  • does not call _step() or _call().

This method is not called within env.step at any point. However, it is called within sample().

Note

forward also works with regular keyword arguments, using dispatch to map the argument names to the keys.

Examples

>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t) # works within envs
>>> t(TensorDict(a=0))  # Works offline too.
transform_observation_spec(observation_spec)[source]

Transforms the observation spec so that the resulting spec matches the transform mapping.

Parameters:

observation_spec (TensorSpec) – spec before the transform

Returns:

expected spec after the transform

class superpacman.superpacman.SuperPacman(*args, **kwargs)[source]

Bases: EnvBase

batch_locked = False
static gen_params(batch_size=None)

To change the layout of the gridworld, change these parameters.

walls: 1 indicates the position of a wall. The boundary grid cells must have a wall.

rewards: the amount of reward for entering the tile. Rewards are only received the first time the agent enters the tile.

terminal_states: indicated by 1; when this tile is entered, the terminated flag is set true.

Parameters:

batch_size – the number of environments to run simultaneously

Returns:

a batch_size tensordict, with the following entries

"player_pos": N, 2 tensor of indices that correspond to the player's location
"player_tiles": N, 21, 21 tensor, with a single tile set to 1 that indicates the player position
"wall_tiles": N, 21, 21 tensor, 1 indicates a wall
"reward_tiles": N, 21, 21 tensor, rewards remaining in the environment
"energizer_tiles": N, 21, 21 tensor, episode will terminate when a tile with value True is entered

metadata = {'render_fps': 30, 'render_modes': ['human', '']}
superpacman.superpacman.gen_params(batch_size=None)[source]

To change the layout of the gridworld, change these parameters.

walls: 1 indicates the position of a wall. The boundary grid cells must have a wall.

rewards: the amount of reward for entering the tile. Rewards are only received the first time the agent enters the tile.

terminal_states: indicated by 1; when this tile is entered, the terminated flag is set true.

Parameters:

batch_size – the number of environments to run simultaneously

Returns:

a batch_size tensordict, with the following entries

"player_pos": N, 2 tensor of indices that correspond to the player's location
"player_tiles": N, 21, 21 tensor, with a single tile set to 1 that indicates the player position
"wall_tiles": N, 21, 21 tensor, 1 indicates a wall
"reward_tiles": N, 21, 21 tensor, rewards remaining in the environment
"energizer_tiles": N, 21, 21 tensor, episode will terminate when a tile with value True is entered
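
A doctest-style sketch of the description above; that batch_size takes an int and that the listed keys sit at the top level of the returned tensordict are assumptions:

>>> from superpacman.superpacman import gen_params
>>> td = gen_params(batch_size=4)
>>> td['wall_tiles'].shape  # expected: (4, 21, 21), per the entries above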

superpacman.superpacman.has_transform(env, transform_class)[source]
superpacman.superpacman.hex_to_rgb(hex_color)[source]
superpacman.superpacman.make_env(env_batch_size, obs_keys=None, device='cpu', ego_patch_radius=10, log_video=False, log_stats=True, seed=None, logger=None, max_steps=None)[source]

Configures and returns a SuperPacman environment, ready to use.

Parameters:
  • env_batch_size – the number of environments N to run in parallel

  • obs_keys – a string or list of strings. Valid values are:

    "image" => adds a (N, 9, 21, 21) tensor to the state tensordict in key ['image']

    "ego_image" => adds a (N, 9, ego_patch_radius * 2 + 1, ego_patch_radius * 2 + 1) tensor centered on pacman in key ['ego_image']

    "pixels" => adds a (N, 3, 21, 21) RGB image in key ['pixels']

    "ego_pixels" => adds a (N, 3, ego_patch_radius * 2 + 1, ego_patch_radius * 2 + 1) tensor centered on pacman in key ['ego_pixels']

  • device – device to use, “cuda” or “cpu”

  • ego_patch_radius – if "ego_image" or "ego_pixels" is requested, this sets the visible radius around pacman

  • log_video – if true, the environment will enable logging RGB video:

    >>> env = make_env(9, log_video=True)
    >>> env.rollout(100)
    >>> env.recorder.dump("hello")

    By default logs will be created in logs/superpacman/videos/pacman_hello_0.mp4; set a logger to control where the logs will be written. Note: logging video will increase memory utilization, so don't use it in the training loop unless debugging.

  • log_stats – adds ['step_count'] and ['episode_reward'] to the state dict, step_count being the total number of steps in the trajectory and episode_reward the sum of reward received so far. Make sure you use the "next" key to get episode_reward, e.g.

    >>> env = make_env(...)
    >>> state = env.rollout(...)
    >>> state[0][-1]['next', 'episode_reward']

  • seed – random seed to use. Default: None

  • logger – optional logger; controls where logs and videos are written

  • max_steps – maximum number of steps per trajectory

Returns:

a configured SuperPacman environment, ready to use
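
A minimal end-to-end sketch based on the examples above:

>>> env = make_env(8, obs_keys='image')
>>> state = env.rollout(100)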

superpacman.superpacman.pos_to_grid(pos, H, W, device='cpu', dtype=torch.float32)[source]

Converts positions to a grid where 1 indicates a position.

Parameters:
  • pos – N, …, 2 tensor of grid positions (x = H, y = W), or a single 2 tensor

  • H – height

  • W – width

  • device – device

  • dtype – type of tensor

Returns:

N, H, W tensor, or a single H, W tensor
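
A doctest-style sketch of the single-position form; the exact output is an assumption based on the description above:

>>> import torch
>>> from superpacman.superpacman import pos_to_grid
>>> grid = pos_to_grid(torch.tensor([1, 2]), H=4, W=4)
>>> grid[1, 2]  # 1.0 marks the position, 0 everywhere else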

superpacman.train_ppo module

class superpacman.train_ppo.Policy(in_features, in_channels, hidden_dim, actions_n, batchnorm_momentum=0.1)[source]

Bases: Module

Policy network for flat observation

forward(flat_obs, image)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
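
A hedged sketch of the calling convention described in the note above. in_features=441 (a flattened 21 x 21 grid) is an illustrative assumption; in_channels=9 and actions_n=4 follow from the documented 'image' observation shape and the Actions enum:

>>> net = Policy(in_features=441, in_channels=9, hidden_dim=64, actions_n=4)
>>> out = net(flat_obs, image)  # call the instance, not net.forward(...)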

class superpacman.train_ppo.VGGConvBlock(in_channels, batchnorm_momentum=0.1)[source]

Bases: Module

forward(image)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class superpacman.train_ppo.Value(in_features, in_channels, hidden_dim, batchnorm_momentum=0.1)[source]

Bases: Module

MLP value function

forward(flat_obs, image)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

superpacman.train_ppo.enjoy_checkpoint(args)[source]
superpacman.train_ppo.make_policy_module(policy_net, in_keys, device)[source]
superpacman.train_ppo.rollout_checkpoint(checkpoint_filename, suffix, logger, device='cpu', seed=42, len=400, max_steps_per_trajectory=None)[source]
superpacman.train_ppo.train(args)[source]

Optimize the agent using Proximal Policy Optimization (Actor-Critic). The Generalized Advantage Estimation module is used to compute the advantage.
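
For reference, the standard definitions behind the --gamma, --lmbda, and --clip_epsilon hyperparameters (textbook GAE and PPO, not superpacman-specific code):

\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t), \qquad \hat{A}_t = \sum_{l=0}^{\infty} (\gamma\lambda)^{l}\, \delta_{t+l}

L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t\big)\Big], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}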

Module contents