Gathered Human Language

To supplement our benchmark with human language, we developed an Amazon Mechanical Turk task to collect natural-language instructions for each of our 300 tasks. More details are in the appendix of our paper.

All of the collected instructions can be found in this CSV, listed by task ID. To preserve a wide distribution of paraphrases and human noise in the dataset, we do not correct spelling or grammar errors.
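
As a rough sketch of how the CSV might be consumed, the snippet below groups instructions by task ID while leaving the text exactly as collected. The column names `task_id` and `instruction` are assumptions, not confirmed by this README, so check the CSV header and adjust them. The sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def group_instructions(csv_file, id_column="task_id", text_column="instruction"):
    """Group instruction strings by task ID.

    The column names are hypothetical defaults; pass the real header names.
    The text is kept verbatim, so spelling and grammar noise is preserved.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row[id_column]].append(row[text_column])
    return dict(grouped)


# Made-up rows for illustration only (note the deliberate typo "teh").
sample = io.StringIO(
    "task_id,instruction\n"
    "0,open the drawer\n"
    "0,pull teh drawer open\n"
    "1,push the button\n"
)
grouped = group_instructions(sample)
print(grouped["0"])
```

To read the actual file, pass a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever you saved the CSV. Task IDs come back as strings because `csv.DictReader` does not convert types.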