CLI Reference

The command-line interface in main.py collects every entry point for prompt preview, map rendering, scoring, benchmarking, and analysis. Each subcommand mirrors a function in llm_abstraction, so the CLI doubles as documentation for the Python API.
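
For orientation, below is a minimal sketch of how such a subcommand dispatcher could be wired up with argparse; the parser layout and omitted handler calls are illustrative only, and the actual functions in llm_abstraction are not reproduced here.

    import argparse

    def build_parser() -> argparse.ArgumentParser:
        # Hypothetical sketch: one subparser per CLI command, mirroring main.py's layout.
        parser = argparse.ArgumentParser(prog="main.py")
        sub = parser.add_subparsers(dest="command", required=True)

        sub.add_parser("preview-prompts")
        sub.add_parser("preview-maps")

        mcts = sub.add_parser("mcts")
        mcts.add_argument("-d", "--debug", action="store_true")

        score = sub.add_parser("score-prompts")
        score.add_argument("-i", "--index", type=int, nargs="+", required=True)
        score.add_argument("-m", "--model", nargs="+", required=True)
        score.add_argument("-d", "--debug", action="store_true")

        return parser  # benchmark-llm and analysis would be registered the same way

    if __name__ == "__main__":
        args = build_parser().parse_args()
        print(args.command)  # dispatch to the matching llm_abstraction function here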

Commands

preview-prompts

Preview the generated prompts for each configured map to verify instruction wording, contexts, and the inserted representation.
- Example: ./env/bin/python main.py preview-prompts
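
As a toy illustration of the pieces a prompt combines, a hypothetical template might look like the following; the wording and field names are not the project's actual prompt.

    def build_prompt(instruction: str, context: str, representation: str) -> str:
        # Hypothetical template: instruction, then task context, then the map representation.
        return f"{instruction}\n\nContext:\n{context}\n\nMap:\n{representation}"

    print(build_prompt("Group the cells into regions.", "3x3 grid world", "X.X\n...\nX.X"))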

preview-maps

Render maps, compute abstractability labels, and save map.png and abstraction.png per map to outputs/maps/.
- Example: ./env/bin/python main.py preview-maps
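
A rough sketch of how the two figures per map might be written out; the per-map subdirectory layout under outputs/maps/ is an assumption made for illustration.

    from pathlib import Path
    import matplotlib.pyplot as plt

    def save_map_figures(map_hash, map_grid, abstraction_grid, out_root="outputs/maps"):
        # Assumed layout: outputs/maps/<map_hash>/map.png and abstraction.png.
        out_dir = Path(out_root) / map_hash
        out_dir.mkdir(parents=True, exist_ok=True)
        for name, grid in (("map.png", map_grid), ("abstraction.png", abstraction_grid)):
            fig, ax = plt.subplots()
            ax.imshow(grid)  # render the grid (or label array) as an image
            ax.set_axis_off()
            fig.savefig(out_dir / name, bbox_inches="tight")
            plt.close(fig)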

mcts

Run ground and ideal-abstraction agents across a grid of simulation budgets and depths. Use the debug flag to print the MCTS tree for inspection.
- Flags: -d/--debug
- Example: ./env/bin/python main.py mcts -d
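
The budget-by-depth sweep could be organized along these lines; run_mcts and the listed budget and depth values are placeholders, not values taken from main.py.

    from itertools import product

    BUDGETS = [50, 100, 200]  # simulation budgets (illustrative values)
    DEPTHS = [5, 10, 20]      # search depths (illustrative values)

    def sweep(agents, run_mcts, debug=False):
        # Run every agent at every (budget, depth) combination and collect results.
        results = []
        for agent, (budget, depth) in product(agents, product(BUDGETS, DEPTHS)):
            outcome = run_mcts(agent, simulations=budget, depth=depth, debug=debug)
            results.append({"agent": agent, "budget": budget, "depth": depth, "outcome": outcome})
        return results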

score-prompts

Generate prompts, query the LLM, clean responses, and score clusterings with the bisimulation-style similarity metric.
- Required: -i/--index (one or more prompt indices) and -m/--model (one or more Ollama model names)
- Optional: -d/--debug
- Example: ./env/bin/python main.py score-prompts -i 0 1 -m deepseek-r1:7b llama3.1:8b
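
A condensed sketch of this prompt-query-clean-score pipeline, assuming the ollama Python client; generate_prompt, clean_response, and similarity_score are hypothetical stand-ins for the project's own functions.

    import ollama

    def score_prompt(prompt_index, model, generate_prompt, clean_response, similarity_score):
        # Hypothetical pipeline: build the prompt, query the model, clean the reply,
        # then score the proposed clustering with the similarity metric.
        prompt = generate_prompt(prompt_index)
        reply = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
        clustering = clean_response(reply["message"]["content"])
        return similarity_score(clustering)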

benchmark-llm

Like score-prompts, but also run MCTS with the best-scoring abstraction and write per-map CSVs and plots.
- Required: -i/--index, -m/--model
- Optional: -g/--maps (one or more map hashes) and -d/--debug
- Example: ./env/bin/python main.py benchmark-llm -i 0 -m deepseek-r1:7b -g map_3x3_36a9049c34
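
The per-map CSV output might be produced along these lines; the column names are assumptions for illustration, not the project's actual schema.

    import csv
    from pathlib import Path

    def write_benchmark_csv(map_hash, rows, out_root="outputs"):
        # Hypothetical: one CSV per map, one row per MCTS run at a given budget and depth.
        out_path = Path(out_root) / f"{map_hash}.csv"
        with out_path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["agent", "budget", "depth", "outcome"])
            writer.writeheader()
            writer.writerows(rows)
        return out_path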

analysis

Aggregate prior runs into ranking tables, AN(C)OVA outputs, and violin/interaction plots under outputs/analysis/.
- Example: ./env/bin/python main.py analysis
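
A minimal sketch of the kind of aggregation this step performs, assuming the earlier runs left CSVs under outputs/; the column names are illustrative, and a plain one-way ANOVA stands in for the AN(C)OVA actually reported.

    from pathlib import Path
    import pandas as pd
    from scipy import stats

    def aggregate(runs_dir="outputs"):
        # Concatenate every per-run CSV, rank agents by mean outcome,
        # and run a one-way ANOVA across agents as a simplified example test.
        frames = [pd.read_csv(p) for p in Path(runs_dir).glob("**/*.csv")]
        df = pd.concat(frames, ignore_index=True)
        ranking = df.groupby("agent")["outcome"].mean().sort_values(ascending=False)
        groups = [g["outcome"].values for _, g in df.groupby("agent")]
        f_stat, p_value = stats.f_oneway(*groups)
        return ranking, f_stat, p_value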