# Example Authoring Guide

This guide defines the expected shape of a PythonInteractiveRobotics example.
Examples should feel like readable experiment notes, not framework plugins.

## Required Shape

Each user-facing example should provide:

- a short module docstring that names the loop and declares success / failure
- a tiny world or environment
- a tiny agent or policy
- a `run(...)` function that returns `Trace`
- a `main()` function for script execution
- a `--no-render` flag for headless tests
- a focused smoke test in `tests/test_examples_smoke.py`
- a README section in the matching `examples/<category>/README.md`

The core loop should be visible:

```python
obs = env.reset(seed=seed)
agent.reset()
trace = Trace()

for _ in range(max_steps):
    action = agent.act(obs)
    result = env.step(action)
    obs, reward, done, info = result.as_tuple()
    agent.update(obs, reward, info)
    trace.append(obs, action, reward, info)

    if render:
        env.render(agent, info)

    if done:
        break
```
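
To show how the required pieces fit together, here is a minimal sketch of a complete example module. `StepResult` and `Trace` below are simplified stand-ins written for this illustration, not the repo's real classes, and `LineWorld` / `ForwardAgent` are hypothetical names.

```python
from __future__ import annotations

import argparse
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StepResult:
    """Stand-in for the repo's StepResult (assumed fields)."""
    obs: int
    reward: float
    done: bool
    info: dict[str, Any]

    def as_tuple(self) -> tuple[int, float, bool, dict[str, Any]]:
        return self.obs, self.reward, self.done, self.info


@dataclass
class Trace:
    """Stand-in for the repo's Trace: parallel lists, one entry per step."""
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    rewards: list = field(default_factory=list)
    infos: list = field(default_factory=list)

    def append(self, obs, action, reward, info) -> None:
        self.observations.append(obs)
        self.actions.append(action)
        self.rewards.append(reward)
        self.infos.append(info)


class LineWorld:
    """Tiny 1-D world: the robot starts at cell 0 and the goal is cell 5."""
    goal = 5

    def reset(self, seed: int = 0) -> int:
        self.pos = 0  # deterministic world, so the seed is unused here
        return self.pos

    def step(self, action: int) -> StepResult:
        self.pos += action
        success = self.pos == self.goal
        return StepResult(self.pos, 1.0 if success else 0.0, success, {"success": success})

    def render(self, agent, info) -> None:
        print(f"pos={self.pos} success={info['success']}")


class ForwardAgent:
    """Tiny policy: always move one cell forward."""
    def reset(self) -> None:
        pass

    def act(self, obs: int) -> int:
        return 1

    def update(self, obs, reward, info) -> None:
        pass


def run(seed: int = 0, render: bool = True, max_steps: int = 20) -> Trace:
    env, agent = LineWorld(), ForwardAgent()
    obs = env.reset(seed=seed)
    agent.reset()
    trace = Trace()
    for _ in range(max_steps):
        action = agent.act(obs)
        obs, reward, done, info = env.step(action).as_tuple()
        agent.update(obs, reward, info)
        trace.append(obs, action, reward, info)
        if render:
            env.render(agent, info)
        if done:
            break
    return trace


def main(argv: list[str] | None = None) -> Trace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--no-render", action="store_true", help="run headless")
    args = parser.parse_args(argv)
    return run(render=not args.no_render)
```

A real script would call `main()` under an `if __name__ == "__main__":` guard. Accepting `argv` as a parameter is a design choice of this sketch so the CLI path can be exercised from a test.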

## Docstring Header

The first thing a reader sees should be enough to understand the loop without
reading code. Every example module docstring should answer three questions:

1. **What is the loop?** One paragraph naming the perception-action cycle.
2. **What counts as success?** Concrete condition that flips
   `info["success"]` to `True`.
3. **What counts as failure?** Each `Failure.kind` the example can emit,
   labelled `(recoverable)` or `(terminal)`.

Example:

```python
"""Reactive obstacle avoidance with a fake lidar.

The agent reads a discrete lidar reading at every step, picks a direction
that keeps the closest obstacle outside the safety radius, and moves.

Success: robot reaches the goal cell before max_steps.
Failure: collision (recoverable - the agent retries from a safe cell) or
timeout (terminal).
"""
```

A short success/failure block lets `Trace.summary()` and the README captions
stay aligned with the runnable example.
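
One way to keep headers honest is a small lint helper. The sketch below is hypothetical (the name `header_problems` and its checks are not part of the repo) and only assumes the `Success:` / `Failure:` layout shown in the example above.

```python
from __future__ import annotations

import re


def header_problems(docstring: str | None) -> list[str]:
    """Return a list of problems found in an example module docstring."""
    text = docstring or ""
    if not text.strip():
        return ["missing module docstring"]

    problems = []
    # A line that starts with "Success:" followed by some content.
    if not re.search(r"^Success:\s*\S", text, flags=re.MULTILINE):
        problems.append("no 'Success:' line")

    # Everything from the first "Failure:" line to the end of the docstring.
    failure = re.search(r"^Failure:(.*)", text, flags=re.MULTILINE | re.DOTALL)
    if failure is None:
        problems.append("no 'Failure:' line")
    elif not re.search(r"\((recoverable|terminal)\b", failure.group(1)):
        problems.append("failure kinds are not labelled (recoverable) or (terminal)")
    return problems


GOOD_HEADER = """Reactive obstacle avoidance with a fake lidar.

Success: robot reaches the goal cell before max_steps.
Failure: collision (recoverable - the agent retries from a safe cell) or
timeout (terminal).
"""
```

Calling `header_problems(module.__doc__)` on a loaded example would return an empty list when the header follows the shape above.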

## Failure Contract

Failures are part of the API. Do not hide them in print output.

Use `Failure` in `info["failure"]`:

```python
info["failure"] = Failure(
    "grasp_miss",
    "gripper closed without lifting the object",
    recoverable=True,
)
```

Common failure kinds:

- `collision`
- `blocked_path`
- `target_not_visible`
- `object_not_visible`
- `grasp_miss`
- `blocked_grasp`
- `suction_miss`
- `container_closed`
- `precondition_blocked`
- `model_error`
- `timeout`
- `invalid_action`

Recoverable failures should leave `done=False`. Terminal failures should
set `done=True` and `recoverable=False`. Use `timeout` as the canonical
terminal failure when the agent runs out of steps without success.
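
The `done` / `recoverable` rule can be checked mechanically. The following is a minimal sketch: the `Failure` dataclass is a stand-in for the repo's class (the field name `message` is an assumption), and `contract_violations` and `timeout_failure` are hypothetical helpers.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Failure:
    """Stand-in for the repo's Failure; `message` is an assumed field name."""
    kind: str
    message: str
    recoverable: bool


def contract_violations(done: bool, info: dict[str, Any]) -> list[str]:
    """Check one step's (done, info) pair against the failure contract."""
    failure = info.get("failure")
    if failure is None:
        return []
    if failure.recoverable and done:
        return [f"{failure.kind}: recoverable failure must leave done=False"]
    if not failure.recoverable and not done:
        return [f"{failure.kind}: terminal failure must set done=True"]
    return []


def timeout_failure(max_steps: int) -> Failure:
    """Build the canonical terminal failure for an exhausted step budget."""
    return Failure("timeout", f"no success within {max_steps} steps", recoverable=False)
```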

The agent should usually update memory, belief, state, or plan after a
recoverable failure. A retry should differ from blindly repeating the same
action.
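
As an illustration of a retry that learns from the failure, the sketch below remembers which action failed for which observation and picks a different one next time. `RetryAgent` is a hypothetical agent, and `Failure` is again a stand-in with an assumed `message` field.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """Stand-in for the repo's Failure; `message` is an assumed field name."""
    kind: str
    message: str
    recoverable: bool


class RetryAgent:
    """Tries actions in order and skips ones that already failed for this observation."""

    def __init__(self, actions: tuple[str, ...] = ("forward", "left", "right")) -> None:
        self.actions = actions
        self.reset()

    def reset(self) -> None:
        self.failed = set()     # (observation, action) pairs that failed
        self.retry_count = 0    # teaching counter, surfaced via info in a real example
        self.last = None        # the (observation, action) pair chosen most recently

    def act(self, obs) -> str:
        untried = [a for a in self.actions if (obs, a) not in self.failed]
        # Fall back to the first action only when every option has failed here.
        action = untried[0] if untried else self.actions[0]
        self.last = (obs, action)
        return action

    def update(self, obs, reward, info) -> None:
        failure = info.get("failure")
        if failure is not None and failure.recoverable and self.last is not None:
            self.failed.add(self.last)  # memory update that changes the next choice
            self.retry_count += 1
```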

### New Failure Kind Checklist

When introducing a new `Failure.kind`:

- pick a snake_case name that names the *cause*, not the symptom
- check it does not duplicate an existing kind in the list above
- decide `recoverable=True` or `recoverable=False` and stay consistent
  across the example
- add a regression assertion in
  `tests/test_failure_contracts.py` if the kind has nontrivial recovery
  semantics
- document the kind in the example's docstring header and category README

## Loop Counters

Counters surfaced through `info` should:

- end with `_count` so `Trace.summary()` picks them up automatically
- be monotonic over the run
- represent a teaching event (retries, replans, recoveries, persistences),
  not raw step ticks

Conventional names already in the repo: `retry_count`, `replan_count`,
`recovery_count`, `miss_count`, `servo_steps`, `belief_updates`,
`avoidance_count`, `target_switches`, `info_gain_step_count`,
`memory_persistence_count`. Reuse these when the concept matches.
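
To make the suffix and monotonicity rules concrete, here is a sketch of how `_count` keys could be collected from a run's `info` dicts. It is not the implementation of `Trace.summary()`; `max_counters` and `non_monotonic_counters` are hypothetical helpers.

```python
from __future__ import annotations

from typing import Any


def max_counters(infos: list[dict[str, Any]]) -> dict[str, int]:
    """Maximum value seen for every key that ends with `_count`."""
    result: dict[str, int] = {}
    for info in infos:
        for key, value in info.items():
            if key.endswith("_count"):
                result[key] = max(result.get(key, value), value)
    return result


def non_monotonic_counters(infos: list[dict[str, Any]]) -> list[str]:
    """Names of `_count` keys whose value ever decreases during the run."""
    last: dict[str, int] = {}
    bad: set[str] = set()
    for info in infos:
        for key, value in info.items():
            if key.endswith("_count"):
                if key in last and value < last[key]:
                    bad.add(key)
                last[key] = value
    return sorted(bad)
```

A key such as `servo_steps` would be ignored by both helpers because it lacks the suffix.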

## Visualization Checklist

Render internal state, not only motion. Show the teaching concept directly.

Useful visual elements:

- true pose or true object position
- noisy observation or visual token
- belief / memory / estimated state
- current goal
- planned path or predicted rollout
- executed trajectory
- failure event
- retry count
- action arrow or skill marker
- state machine state
- reward, cost, entropy, or model error when relevant

Every major README GIF should be generated by `scripts/make_gifs.py` from
the same runnable example. The GIF caption in the root README should name
the *internal* state that is being shown (belief, novelty, predicted
rollout, ...), not just the visible motion.

## Example README Section

Use this shape in `examples/<category>/README.md`. The section headings
are fixed so contributors can scan the file consistently:

````markdown
## `NN_example_name.py`

### What this teaches

One short paragraph naming the loop and the lesson. Mention the closest
existing example and how this one differs.

Success: ...
Failure: kind_a (recoverable), kind_b (terminal).

### Run

```bash
python examples/<category>/NN_example_name.py
```

### Key loop

```text
observe -> update belief -> act -> observe failure -> retry
```

### Simplifications

- short list of fake or simplified parts

### Things to try

- concrete modification ideas
````

The `Success:` and `Failure:` lines under "What this teaches" mirror the
docstring header so the README block stays in sync with the runnable
example.

## Smoke Test Pattern

Add a focused test in `tests/test_examples_smoke.py`:

```python
def test_new_example_runs_headless() -> None:
    module = load_example("examples/category/NN_new_example.py")

    trace = module.run(seed=0, render=False, max_steps=40)

    final = trace.infos[-1]
    assert final["success"] is True
    assert final["replan_count"] >= 1
    assert any(failure.kind == "blocked_path" for failure in trace.failures())
```

Smoke tests should assert the concept being taught, not the implementation.
Useful patterns:

- assert `info["success"] is True` for the golden seed
- assert a counter is at least the expected lower bound (use `>=`, not `==`)
- assert a recoverable `Failure.kind` appears in `trace.failures()`
- assert the loop terminates (`len(trace.actions) <= max_steps`)
- use `trace.summary()` when a test or README needs compact run-level
  statistics such as total reward, failure counts, or maximum `*_count`
  loop counters
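
Example file names such as `NN_new_example.py` start with digits, so they cannot be imported with a plain `import` statement. The sketch below shows one way a loader like `load_example` could work using `importlib`. It is an assumption about the approach, not the repo's actual helper, and the demo file it writes is hypothetical.

```python
from __future__ import annotations

import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def load_example(path: str) -> ModuleType:
    """Load a Python file as a module, even if its name is not a valid identifier."""
    file = Path(path)
    name = "example_" + file.stem  # prefix so the module name starts with a letter
    spec = importlib.util.spec_from_file_location(name, file)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load example from {file}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so code that looks the module up by name
    # (for example dataclasses) can find it.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


# Demo: write a tiny throwaway example file and load it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "01_demo.py"
    demo_path.write_text(
        "def run(seed=0, render=True, max_steps=10):\n"
        "    return {'seed': seed, 'render': render, 'max_steps': max_steps}\n"
    )
    demo = load_example(str(demo_path))
    demo_result = demo.run(seed=0, render=False, max_steps=40)
```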

## GIF Addition Pattern

If the example should appear in the root README:

1. Add a `make_<name>()` function to `scripts/make_gifs.py` that imports
   the example via `load_example(...)` and renders frames using the same
   draw code the example uses.
2. Register the maker in `MAKERS` with a short key.
3. Add the generated GIF to the matching category README **and** the root
   `README.md`.
4. Install contributor dependencies and run:

```bash
pip install -e ".[dev]"
python scripts/run_all_smoke_tests.py --gifs --check-gifs
```

The check verifies frame count and nonblank pixels for every generated
GIF, and the Markdown asset check verifies that every linked image
exists.
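
The kind of check described above can be sketched as follows. This is an illustration only: `gif_problems` is a hypothetical helper that works on already-decoded frames (nested sequences of pixel values), and it assumes "nonblank" means a frame contains at least two distinct pixel values. The repo's real check may define both differently.

```python
from __future__ import annotations

from typing import Sequence

Frame = Sequence[Sequence[int]]  # rows of grayscale pixel values


def gif_problems(frames: Sequence[Frame], min_frames: int = 2) -> list[str]:
    """Report too-short animations and frames that are a single flat color."""
    problems = []
    if len(frames) < min_frames:
        problems.append(f"only {len(frames)} frame(s), expected at least {min_frames}")
    for index, frame in enumerate(frames):
        values = {pixel for row in frame for pixel in row}
        if len(values) < 2:
            problems.append(f"frame {index} is blank")
    return problems
```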

## Package Boundary

Keep agent code in the example. Keep environment code in the example *or*
in `pir/worlds/`. The package boundary should make examples easier to
read, not turn the repo into a framework.

Move a world from an example into `pir/worlds/` when:

- another example or adapter needs to import it
- the world has a clean `reset()` / `observe()` / `step()` surface
- moving it does not require lifting agent code

Patterns to follow: `pir/worlds/blocked_path.py` and
`pir/worlds/moving_obstacle.py`. Both expose `reset()`, `observe()`, and
a `step()` that returns `StepResult`, and both keep their `draw_*_scene`
helper next to the world so example modules stay focused on the agent.

`step()` should return `StepResult` so the unwrap pattern
`obs, reward, done, info = result.as_tuple()` works the same way across
the codebase.

## When To Add Shared Code

Keep logic local to the example first. Move code into `pir/` only when:

- the same idea appears naturally in at least three examples
- the shared code makes examples easier to read
- the abstraction does not hide the interaction loop

Small duplication is acceptable when it keeps examples readable. Two
examples that each inline the same helper are fine. Once a third example
wants the same thing, extract it.

The repo's first cross-cutting extraction is `pir/planning/`. Ten
examples shared a grid A* and a small `bfs_reachable_count` skeleton,
so both helpers were lifted into `pir.planning.astar` and
`pir.planning.bfs_reachable_count`. New examples should import those
rather than re-implement them:

```python
from pir.planning import astar, bfs_reachable_count

walkable = known_map != OCCUPIED   # bool 2D array per the example's own conventions
path = astar(walkable, start, goal)
```

The shared `astar` accepts an optional `edge_cost` (per-target-cell
shaping cost, useful for empowerment / IRL examples) and an optional
`blocked` set (extra cells to avoid even when walkable, useful for the
blocked-path recovery example). Stick to the canonical signature when
you write a new shaped planner instead of inventing a new variant.
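
To illustrate what the two optional arguments do, here is a self-contained sketch of a 4-connected grid A*. It is not the code in `pir.planning`: the name `astar_sketch`, the choice of a callable for `edge_cost`, the unit step cost, and the `None` return for unreachable goals are all assumptions made for this example.

```python
from __future__ import annotations

import heapq
from typing import Callable, Optional

Cell = tuple[int, int]


def astar_sketch(
    walkable,                                   # 2D grid of truthy/falsy cells
    start: Cell,
    goal: Cell,
    edge_cost: Optional[Callable[[Cell], float]] = None,  # extra cost >= 0 for entering a cell
    blocked: Optional[set] = None,              # cells to avoid even if walkable
) -> Optional[list]:
    blocked = blocked or set()
    rows, cols = len(walkable), len(walkable[0])

    def heuristic(cell: Cell) -> int:
        # Manhattan distance stays admissible because every step costs at least 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0.0, start)]
    best = {start: 0.0}
    parent = {start: None}
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if cost > best[cell]:
            continue  # stale queue entry; a cheaper route was found already
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if not walkable[nr][nc] or nxt in blocked:
                continue
            new_cost = cost + 1.0 + (edge_cost(nxt) if edge_cost else 0.0)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = cell
                heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None  # goal unreachable


open_grid = [[True] * 3 for _ in range(3)]
direct = astar_sketch(open_grid, (0, 0), (0, 2))
detour = astar_sketch(open_grid, (0, 0), (0, 2), blocked={(0, 1)})
```

In this sketch `blocked` removes cells outright, while `edge_cost` only makes them more expensive, so a shaped planner can still pass through a costly cell when no cheaper route exists.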

## Contributor Checklist

Before opening a PR with a new example:

- [ ] docstring header declares loop, success condition, and each failure
      kind
- [ ] `run(seed, render, max_steps)` returns `Trace`
- [ ] `--no-render` flag exists on the CLI entry point
- [ ] `env.step(...)` returns `StepResult`
- [ ] new `Failure.kind`s (if any) follow the snake_case cause-naming rule
- [ ] new counters end with `_count`
- [ ] smoke test asserts the concept being taught
- [ ] category README block uses the fixed headings above
- [ ] root README and category README link the GIF if one was added
- [ ] `python scripts/run_all_smoke_tests.py --check-gifs` is green
