Outputs & Artifacts
All commands write their artifacts under `outputs/`, organized by purpose and map. This page explains what to expect after running the Quickstart sequence and where to look when reproducing results.
Maps
The `preview-maps` command creates a `maps/` directory and, for each configured world, a folder named `map_<rows>x<cols>_<hash>`. Inside you will find `map.png` and `abstraction.png`, rendered by `core_rust.visualize_world_map` and `core_rust.visualize_abstraction`. The file `map_abstractability.csv` summarizes each map's abstractability label computed by `llm_abstraction.utils.classify.classify_abstraction`.
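As a quick sanity check after `preview-maps`, you can list the generated map folders and dump the abstractability summary. The sketch below is not part of the project; it only assumes the layout described above, and it assumes `map_abstractability.csv` sits at the `maps/` root (its column names are read from disk rather than assumed).

```python
# Minimal sketch: verify that each map folder contains the expected images
# and print the abstractability labels. Paths are assumptions based on the
# layout documented above.
import csv
from pathlib import Path

maps_root = Path("outputs/maps")  # assumed location of the maps/ directory

# Each configured world gets a folder named map_<rows>x<cols>_<hash>.
for folder in sorted(maps_root.glob("map_*")):
    has_map = (folder / "map.png").exists()
    has_abstraction = (folder / "abstraction.png").exists()
    print(f"{folder.name}: map.png={has_map}, abstraction.png={has_abstraction}")

# Print the per-map abstractability labels as raw rows.
with open(maps_root / "map_abstractability.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row)
```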
LLM Scoring and Benchmarking
The `score-prompts` and `benchmark-llm` commands write to `llm_scoring/`. For every map, the best-scoring abstraction and supporting data are saved under the map's folder. The file `<promptIndex>_<model>_out.json` records the raw and cleaned responses and their similarity scores. If you run MCTS via `benchmark-llm`, two additional files appear: `<promptIndex>_<model>_raw_results.csv` contains agent returns per budget and depth, and `<promptIndex>_<model>_mcts_results.png` plots those curves along with the optimal-return baseline from `core_rust.max_returns`. A `log_summary.csv` at the root aggregates high-level run information.
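To inspect a single run without re-executing the pipeline, you can load the scoring JSON and the MCTS results CSV directly. In the sketch below the map folder name and the `0`/`gpt-4` prompt-index and model placeholders are hypothetical, and no JSON keys or CSV columns are assumed beyond what is described above.

```python
# Minimal sketch: peek at one scoring run. File names follow the
# <promptIndex>_<model>_* pattern described above; the concrete values are
# placeholders, and the record structure is read from disk, not assumed.
import csv
import json
from pathlib import Path

run_dir = Path("outputs/llm_scoring") / "map_8x8_deadbeef"  # hypothetical map folder

# Raw and cleaned responses plus their similarity scores.
with open(run_dir / "0_gpt-4_out.json") as f:
    scoring_record = json.load(f)
print("scoring record top level:", list(scoring_record))

# Agent returns per budget and depth, written by benchmark-llm with MCTS enabled.
with open(run_dir / "0_gpt-4_raw_results.csv", newline="") as f:
    header = next(csv.reader(f), [])
print("MCTS result columns:", header)
```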
Analysis
The `analysis` command merges model-based and planning metrics into ranking tables and produces a series of plots (violin, interaction, and gain curves) under `outputs/analysis/`. It also writes AN(C)OVA summaries and Tukey HSD comparisons as CSVs to support statistical inspection.
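The CSV outputs make it easy to spot-check the statistics outside the plotting code. A minimal sketch follows, assuming only that the AN(C)OVA and Tukey HSD tables land somewhere under `outputs/analysis/`; the exact file names and columns are read from disk rather than assumed.

```python
# Minimal sketch: enumerate the statistical tables produced by `analysis`
# and show their column headers for a quick inspection.
import csv
from pathlib import Path

analysis_root = Path("outputs/analysis")

for path in sorted(analysis_root.rglob("*.csv")):
    with open(path, newline="") as f:
        header = next(csv.reader(f), [])
    print(f"{path.relative_to(analysis_root)}: columns={header}")
```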
Determinism
To improve reproducibility, keep simulation limits, depths, and the discount `gamma` fixed and, where possible, set random seeds in runners. The combination of saved prompts, JSON clusterings, and CSV results should let you retrace and verify any figure from the Thesis.
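If you add seeding to a runner, the usual pattern is to pin every source of randomness up front. The following is a minimal sketch for the Python side using `random` and NumPy; whether and how the Rust core (`core_rust`) accepts a seed depends on its API and is not assumed here.

```python
# Minimal sketch: seed the Python-side RNGs before a run so repeated runs
# produce the same rollouts. If core_rust exposes its own seed, pass the
# same value there as well.
import random

import numpy as np

SEED = 0

random.seed(SEED)                   # Python's built-in RNG
np.random.seed(SEED)                # legacy NumPy global RNG
rng = np.random.default_rng(SEED)   # preferred: pass this Generator to runners explicitly
```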