superpacman package
Submodules
superpacman.commands module
usage: superpacman_prog [-h] {play,train,enjoy} ...
Positional Arguments
- command
Possible choices: play, train, enjoy
Sub-commands
play
Run the game
superpacman_prog play [-h] [--partial_radius PARTIAL_RADIUS]
Named Arguments
- --partial_radius
distance the agent can see
Default:
0
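A typical invocation, with an illustrative radius value:
superpacman_prog play --partial_radius 2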
train
Train the agent
superpacman_prog train [-h] [--exp_name EXP_NAME] [--device DEVICE]
[--seed SEED] [--env_batch_size ENV_BATCH_SIZE]
[--steps_per_batch STEPS_PER_BATCH]
[--train_steps TRAIN_STEPS]
[--clip_epsilon CLIP_EPSILON] [--gamma GAMMA]
[--lmbda LMBDA] [--entropy_eps ENTROPY_EPS]
[--max_grad_norm MAX_GRAD_NORM] [--power POWER]
[--hidden_dim HIDDEN_DIM]
[--max_steps_per_traj MAX_STEPS_PER_TRAJ] [--lr LR]
[--lr_sched_step_size LR_SCHED_STEP_SIZE]
[--lr_sched_gamma LR_SCHED_GAMMA]
[--ppo_steps PPO_STEPS] [--eval_freq EVAL_FREQ]
[--eval_len EVAL_LEN]
[--logger {csv,wandb,mlflow,tensorboard}]
[--warmup_steps WARMUP_STEPS]
[--load_checkpoint LOAD_CHECKPOINT]
Named Arguments
- --exp_name
experiment name
Default:
'superpacman'
- --device
cuda or cpu
Default:
'cpu'
- --seed
random seed
Default:
42
- --env_batch_size
number of environments
Default:
2048
- --steps_per_batch
number of steps to take in the environment per batch
Default:
8
- --train_steps
number of PPO updates to run
Default:
1000
- --clip_epsilon
PPO clipping parameter
Default:
0.1
- --gamma
GAE gamma parameter
Default:
0.99
- --lmbda
GAE lambda parameter
Default:
0.99
- --entropy_eps
policy entropy bonus weight
Default:
0.08
- --max_grad_norm
gradient clipping
Default:
1.0
- --power
power of squeezenet
Default:
5
- --hidden_dim
hidden dim size of MLP
Default:
64
- --max_steps_per_traj
maximum length of a trajectory
- --lr
Adam learning rate
Default:
0.001
- --lr_sched_step_size
decay lr after this many steps
Default:
100000.0
- --lr_sched_gamma
factor by which the lr is multiplied at each decay step
Default:
0.7
- --ppo_steps
number of PPO updates per batch
Default:
6
- --eval_freq
run eval after this many training steps
Default:
100
- --eval_len
number of steps in each evaluation rollout
Default:
400
- --logger
Possible choices: csv, wandb, mlflow, tensorboard
supported loggers
Default:
'csv'
- --warmup_steps
delay before starting to learn
Default:
16
- --load_checkpoint
path of a checkpoint to load before training
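A typical training invocation (all flag values illustrative; omitted flags fall back to the defaults above):
superpacman_prog train --exp_name baseline --device cuda --train_steps 1000 --logger tensorboard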
enjoy
Write a video of the policy in action
superpacman_prog enjoy [-h] [--device DEVICE] [--seed SEED] [--length LENGTH]
[--max_steps_per_traj MAX_STEPS_PER_TRAJ]
checkpoint
Positional Arguments
- checkpoint
checkpoint
Named Arguments
- --device
Default:
'cpu'
- --seed
Default:
42
- --length
rollout length
Default:
400
- --max_steps_per_traj
maximum length of a trajectory
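For example, to write a 400-step video from a saved checkpoint (the path is illustrative):
superpacman_prog enjoy --length 400 checkpoints/superpacman.pt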
superpacman.play module
superpacman.superpacman module
- class superpacman.superpacman.Actions(value)[source]
Bases:
IntEnum
An enumeration.
- E = 1
- N = 0
- S = 2
- W = 3
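Because Actions is an IntEnum, its members behave as plain integers, e.g.:
>>> from superpacman.superpacman import Actions
>>> Actions.N == 0
True
>>> int(Actions.W)
3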
- class superpacman.superpacman.CenterPlayerTransform(in_keys, out_keys, patch_radius=2, fill_value=None)[source]
Bases:
ObservationTransform
- forward(tensordict)[source]
Reads the input tensordict and, for the selected keys, applies the transform.
By default, this method calls _apply_transform() directly, and does not call _step() or _call().
This method is not called within env.step at any point; however, it is called within sample().
Note
forward also works with regular keyword arguments, using dispatch to cast the argument names to keys.
Examples
>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t)  # works within envs
>>> t(TensorDict(a=0))  # works offline too
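A usage sketch based on the signature above; the key names are illustrative:
>>> t = CenterPlayerTransform(in_keys=["wall_tiles"], out_keys=["ego_wall_tiles"], patch_radius=5)
>>> env = env.append_transform(t)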
- class superpacman.superpacman.DistanceTransform(in_keys, out_keys, normalize=True, h=0.35)[source]
Bases:
ObservationTransform
Manhattan distance transform of a 2D image. The image must be binary, containing only zeros and ones. Uses kornia.contrib distance_transform, which returns the Manhattan distance.
- forward(tensordict)[source]
Reads the input tensordict and, for the selected keys, applies the transform.
By default, this method calls _apply_transform() directly, and does not call _step() or _call().
This method is not called within env.step at any point; however, it is called within sample().
Note
forward also works with regular keyword arguments, using dispatch to cast the argument names to keys.
Examples
>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t)  # works within envs
>>> t(TensorDict(a=0))  # works offline too
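A usage sketch; the key names are illustrative, and the input layer must be binary as noted above:
>>> t = DistanceTransform(in_keys=["wall_tiles"], out_keys=["wall_distance"], normalize=True)
>>> env = env.append_transform(t)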
- class superpacman.superpacman.FlatTileTransform(in_keys, out_key='flat_obs')[source]
Bases:
ObservationTransform
Takes all the input keys and outputs a single flat tensor.
in_keys: tensors to use
out_key: name of the flat tensor; defaults to flat_obs
- forward(tensordict)[source]
Reads the input tensordict and, for the selected keys, applies the transform.
By default, this method calls _apply_transform() directly, and does not call _step() or _call().
This method is not called within env.step at any point; however, it is called within sample().
Note
forward also works with regular keyword arguments, using dispatch to cast the argument names to keys.
Examples
>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t)  # works within envs
>>> t(TensorDict(a=0))  # works offline too
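A usage sketch; the tile key names are illustrative:
>>> t = FlatTileTransform(in_keys=["wall_tiles", "reward_tiles"], out_key="flat_obs")
>>> env = env.append_transform(t)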
- class superpacman.superpacman.Ghost(value)[source]
Bases:
IntEnum
An enumeration.
- BLINKY = 0
- CLAUDE = 3
- INKY = 2
- PINKY = 1
- class superpacman.superpacman.GhostMode(value)[source]
Bases:
IntEnum
An enumeration.
- CHASE = 2
- FRIGHTENED = 3
- SCATTER = 1
- WAIT = 0
- class superpacman.superpacman.RGBFullObsTransform[source]
Bases:
ObservationTransform
Converts the state to an N, 3, H, W uint8 image tensor and adds it to the tensordict under the key ['image'].
- forward(tensordict)[source]
Reads the input tensordict and, for the selected keys, applies the transform.
By default, this method calls _apply_transform() directly, and does not call _step() or _call().
This method is not called within env.step at any point; however, it is called within sample().
Note
forward also works with regular keyword arguments, using dispatch to cast the argument names to keys.
Examples
>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t)  # works within envs
>>> t(TensorDict(a=0))  # works offline too
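A usage sketch, constructing the transform without arguments as the signature above suggests:
>>> env = env.append_transform(RGBFullObsTransform())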
- class superpacman.superpacman.RGBPartialObsTransform[source]
Bases:
ObservationTransform
Converts the state to an N, 3, H, W uint8 image tensor and adds it to the tensordict under the key ['image'].
- forward(tensordict)[source]
Reads the input tensordict and, for the selected keys, applies the transform.
By default, this method calls _apply_transform() directly, and does not call _step() or _call().
This method is not called within env.step at any point; however, it is called within sample().
Note
forward also works with regular keyword arguments, using dispatch to cast the argument names to keys.
Examples
>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t)  # works within envs
>>> t(TensorDict(a=0))  # works offline too
- class superpacman.superpacman.StackTileTransform(in_keys, out_key='image')[source]
Bases:
ObservationTransform
Stacks all the in_keys into an N, C, H, W tensor and outputs it under the out_key.
in_keys: must be N, H, W tensors
out_key: string, default "image"
- forward(tensordict)[source]
Reads the input tensordict and, for the selected keys, applies the transform.
By default, this method calls _apply_transform() directly, and does not call _step() or _call().
This method is not called within env.step at any point; however, it is called within sample().
Note
forward also works with regular keyword arguments, using dispatch to cast the argument names to keys.
Examples
>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t)  # works within envs
>>> t(TensorDict(a=0))  # works offline too
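A usage sketch; the tile key names are illustrative, and each referenced tensor must be N, H, W:
>>> t = StackTileTransform(in_keys=["wall_tiles", "reward_tiles", "energizer_tiles"], out_key="image")
>>> env = env.append_transform(t)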
- class superpacman.superpacman.SuperPacman(*args, **kwargs)[source]
Bases:
EnvBase
- batch_locked = False
- static gen_params(batch_size=None)
To change the layout of the gridworld, change these parameters:
walls: 1 indicates the position of a wall; the boundary grid cells must have a wall
rewards: the amount of reward for entering the tile; rewards are only received the first time the agent enters the tile
terminal_states: indicated by 1; when such a tile is entered, the terminated flag is set true
- Parameters:
batch_size – the number of environments to run simultaneously
- Returns:
a batch_size tensordict with the following entries:
"player_pos": N, 2 tensor of indices giving the player's location
"player_tiles": N, 21, 21 tensor with a single tile set to 1 indicating the player's position
"wall_tiles": N, 21, 21 tensor, 1 indicates a wall
"reward_tiles": N, 21, 21 tensor, rewards remaining in the environment
"energizer_tiles": N, 21, 21 tensor, the episode will terminate when a tile with value True is entered
- metadata = {'render_fps': 30, 'render_modes': ['human', '']}
- superpacman.superpacman.gen_params(batch_size=None)[source]
To change the layout of the gridworld, change these parameters:
walls: 1 indicates the position of a wall; the boundary grid cells must have a wall
rewards: the amount of reward for entering the tile; rewards are only received the first time the agent enters the tile
terminal_states: indicated by 1; when such a tile is entered, the terminated flag is set true
- Parameters:
batch_size – the number of environments to run simultaneously
- Returns:
a batch_size tensordict with the following entries:
"player_pos": N, 2 tensor of indices giving the player's location
"player_tiles": N, 21, 21 tensor with a single tile set to 1 indicating the player's position
"wall_tiles": N, 21, 21 tensor, 1 indicates a wall
"reward_tiles": N, 21, 21 tensor, rewards remaining in the environment
"energizer_tiles": N, 21, 21 tensor, the episode will terminate when a tile with value True is entered
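For example, generating parameters for four parallel environments:
>>> td = gen_params(batch_size=4)
>>> td["wall_tiles"].shape  # expected (4, 21, 21) per the entries above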
- superpacman.superpacman.make_env(env_batch_size, obs_keys=None, device='cpu', ego_patch_radius=10, log_video=False, log_stats=True, seed=None, logger=None, max_steps=None)[source]
Configures and returns a SuperPacman environment, ready to use.
- Parameters:
env_batch_size – the number of environments N to run in parallel
obs_keys – a string or list of strings; valid values are:
"image" => adds an (N, 9, 21, 21) tensor to the state tensordict under key ['image']
"ego_image" => adds an (N, 9, ego_patch_radius * 2 + 1, ego_patch_radius * 2 + 1) tensor centered on pacman under key ['ego_image']
"pixels" => adds an (N, 3, 21, 21) RGB image under key ['pixels']
"ego_pixels" => adds an (N, 3, ego_patch_radius * 2 + 1, ego_patch_radius * 2 + 1) tensor centered on pacman under key ['ego_pixels']
device – device to use, “cuda” or “cpu”
ego_patch_radius – if "ego_image" or "ego_pixels" is requested, this sets the visible radius around pacman
log_video – if true, the environment will enable logging RGB video:
>>> env = make_env(9, log_video=True)
>>> env.rollout(100)
>>> env.recorder.dump("hello")
By default, logs are created at logs/superpacman/videos/pacman_hello_0.mp4; set a logger to control where logs are written. Note that logging video increases memory utilization, so don't use it in the training loop unless debugging.
log_stats – adds ['step_count'] and ['episode_reward'] to the state dict; step_count is the total number of steps on the trajectory so far, and episode_reward is the sum of reward received so far. Make sure you use the "next" key to get episode_reward, e.g.
>>> env = make_env(...)
>>> state = env.rollout(...)
>>> state[0][-1]['next', 'episode_reward']
seed – random seed to use; default None
logger
max_steps
- Returns:
the configured SuperPacman environment
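A minimal end-to-end sketch (argument values illustrative):
>>> env = make_env(8, obs_keys="image", device="cpu", seed=42)
>>> state = env.rollout(100)  # state["image"] holds the (N, 9, 21, 21) observations documented above, batched over the rollout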
- superpacman.superpacman.pos_to_grid(pos, H, W, device='cpu', dtype=torch.float32)[source]
Converts positions to a grid where 1 indicates a position.
- Parameters:
pos – N, ..., 2 tensor of grid positions (x = H, y = W), or a single 2 tensor
H – height
W – width
device – device
dtype – type of tensor
- Returns:
N, H, W tensor, or a single H, W tensor
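For example (position values illustrative):
>>> import torch
>>> pos = torch.tensor([[2, 3]])
>>> grid = pos_to_grid(pos, H=21, W=21)
>>> grid.shape  # (1, 21, 21), with grid[0, 2, 3] == 1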
superpacman.train_ppo module
- class superpacman.train_ppo.Policy(in_features, in_channels, hidden_dim, actions_n, batchnorm_momentum=0.1)[source]
Bases:
Module
Policy network for flat observation
- forward(flat_obs, image)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class superpacman.train_ppo.VGGConvBlock(in_channels, batchnorm_momentum=0.1)[source]
Bases:
Module
- forward(image)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class superpacman.train_ppo.Value(in_features, in_channels, hidden_dim, batchnorm_momentum=0.1)[source]
Bases:
Module
MLP value function
- forward(flat_obs, image)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
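A construction sketch; the sizes are assumptions chosen to match shapes documented elsewhere on this page (441 = 21 * 21 flattened tiles, 9 image channels, 4 actions, hidden_dim matching the CLI default):
>>> import torch
>>> policy = Policy(in_features=441, in_channels=9, hidden_dim=64, actions_n=4)
>>> value = Value(in_features=441, in_channels=9, hidden_dim=64)
>>> flat_obs, image = torch.zeros(2, 441), torch.zeros(2, 9, 21, 21)
>>> logits = policy(flat_obs, image)  # assumed input shapes: (N, in_features) and (N, in_channels, 21, 21)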