What is CoAuthor?
CoAuthor is a human-AI collaborative writing dataset that captures rich interaction between 63 writers and four instances of GPT-3 across 1445 writing sessions in English.
In each session, we provided writers with a prompt (black text) and an instance of GPT-3. They then wrote freely (brown), asked GPT-3 for suggestions (blue), accepted or dismissed the suggestions, and edited the accepted suggestions or previous text in any order they chose. All interactions between the writers and the system were recorded at the keystroke level (e.g., inserting text, moving the cursor, and receiving suggestions) with timestamps, so that we could replay them precisely. Using this rich, nuanced, and fine-grained interaction dataset, designers can examine the same sessions from multiple analytical perspectives to better understand the generative capabilities of large language models.
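Because every keystroke is timestamped, a session can be replayed by applying its edits in time order to the prompt. The Python below is a minimal sketch of that idea, not the dataset's own replay code. The `Event` class, its fields (`timestamp`, `kind`, `pos`, `text`, `length`), and the `replay` function are hypothetical simplifications; real CoAuthor events also cover cursor moves and suggestion handling, which are ignored here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    """A hypothetical, simplified text-edit event (NOT the dataset's actual schema)."""
    timestamp: float  # seconds since the session started
    kind: str         # "insert" or "delete"
    pos: int          # character offset in the document
    text: str = ""    # text to insert (used by "insert")
    length: int = 0   # number of characters to remove (used by "delete")


def replay(prompt: str, events: List[Event], until: Optional[float] = None) -> str:
    """Apply events in timestamp order to the prompt and return the resulting text.

    If `until` is given, events later than that time are skipped, which
    reproduces the document as it looked at that moment.
    """
    doc = prompt
    for ev in sorted(events, key=lambda e: e.timestamp):
        if until is not None and ev.timestamp > until:
            break
        if ev.kind == "insert":
            doc = doc[:ev.pos] + ev.text + doc[ev.pos:]
        elif ev.kind == "delete":
            doc = doc[:ev.pos] + doc[ev.pos + ev.length:]
        else:
            raise ValueError(f"unknown event kind: {ev.kind!r}")
    return doc


# Usage: the writer extends the prompt, then replaces "dog" with "cat".
prompt = "Once upon a time"
events = [
    Event(1.0, "insert", 16, " there"),
    Event(2.0, "insert", 22, " was a dog."),
    Event(3.0, "delete", 29, length=3),
    Event(4.0, "insert", 29, "cat"),
]
print(replay(prompt, events))             # Once upon a time there was a cat.
print(replay(prompt, events, until=2.0))  # Once upon a time there was a dog.
```

Replaying up to an intermediate timestamp is what makes it possible to look at the same session from different angles, for example inspecting the text just before a suggestion was requested.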
- Types of writing - Creative & Argumentative
  - Creative writing: 830 stories written by 58 writers
  - Argumentative writing: 615 essays written by 49 writers
- Writing prompts - WritingPrompts subreddit & The New York Times
- Writers - Qualified crowd workers from Amazon Mechanical Turk
- Interface - Text editor in which writers can press the Tab key to get five suggestions from GPT-3
- Length of stories and essays: 418 words on average
- Number of queries: 11.8 queries per writing session on average
- Acceptance rate of suggestions: 72.3%
- Percentage of text written by humans: 72.6%
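Summary figures such as the acceptance rate and the share of human-written text are ratios over sessions. The sketch below shows one way such ratios could be computed; it is illustrative only. The `SessionSummary` class and its fields are hypothetical, the ratios are pooled across all sessions (a micro-average), and text is measured in characters. The figures listed above may have been computed differently, for example by averaging per-session ratios or by counting words, so this code is not expected to reproduce them exactly.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SessionSummary:
    """Hypothetical per-session summary; field names are illustrative only."""
    num_queries: int   # times the writer pressed Tab to request suggestions
    num_accepted: int  # queries where the writer accepted a suggestion
    # Final text split into (text, source) pieces; source is "user" or "api".
    segments: List[Tuple[str, str]]


def acceptance_rate(sessions: List[SessionSummary]) -> float:
    """Accepted queries divided by all queries, pooled over sessions."""
    total_queries = sum(s.num_queries for s in sessions)
    total_accepted = sum(s.num_accepted for s in sessions)
    return total_accepted / total_queries if total_queries else 0.0


def human_written_share(sessions: List[SessionSummary]) -> float:
    """Fraction of final characters typed by the writer rather than taken from suggestions."""
    human = sum(len(t) for s in sessions for t, src in s.segments if src == "user")
    total = sum(len(t) for s in sessions for t, _ in s.segments)
    return human / total if total else 0.0


def mean_queries_per_session(sessions: List[SessionSummary]) -> float:
    """Average number of suggestion requests per session."""
    return sum(s.num_queries for s in sessions) / len(sessions) if sessions else 0.0


# Usage with two made-up sessions.
sessions = [
    SessionSummary(4, 3, [("The dog ran", "user"), (" home.", "api")]),
    SessionSummary(6, 4, [("Hi", "user")]),
]
print(round(acceptance_rate(sessions), 3))      # 0.7
print(round(human_written_share(sessions), 3))  # 0.684
print(mean_queries_per_session(sessions))       # 5.0
```

Pooling weights long or query-heavy sessions more heavily than averaging per-session ratios would. Which choice is appropriate depends on whether the question concerns queries or writers.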
An example replay of a writing session in CoAuthor:
Download the dataset, metadata, and survey responses:
If you are familiar with Python, get started by downloading the dataset, reading writing sessions from files, and examining events:
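As a rough sketch of those steps, the Python below reads session files and tallies their events by name. It assumes that each session is stored as a JSON Lines file (one JSON object per event, per line) with a `.jsonl` extension, and that each event carries an `eventName` field with values such as `text-insert` or `suggestion-get`. Treat these names as assumptions and check them against the released files. The helper functions are hypothetical, not part of any official loader.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, List


def parse_events(lines: Iterable[str]) -> List[Dict]:
    """Parse JSON Lines: one JSON-encoded event per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def read_session(path) -> List[Dict]:
    """Read all events of one session from a .jsonl file."""
    with open(path, encoding="utf-8") as f:
        return parse_events(f)


def load_sessions(directory) -> Dict[str, List[Dict]]:
    """Map each session file's name (without extension) to its events."""
    return {p.stem: read_session(p) for p in sorted(Path(directory).glob("*.jsonl"))}


def count_event_names(events: List[Dict], key: str = "eventName") -> Counter:
    """Tally events by name; `key` is an assumed field name."""
    return Counter(e.get(key, "<missing>") for e in events)


# Usage with in-memory sample lines; for real files, something like
# load_sessions("path/to/sessions") would be used (hypothetical path).
sample = [
    '{"eventName": "text-insert"}',
    '',
    '{"eventName": "suggestion-get"}',
    '{"eventName": "text-insert"}',
]
print(count_event_names(parse_events(sample)).most_common())
# [('text-insert', 2), ('suggestion-get', 1)]
```

Keeping parsing separate from file access makes it easy to test the parsing logic or stream events from another source.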
More tutorials are coming up! Stay tuned!
Ask us questions at firstname.lastname@example.org!
- If you are interested in intelligent writing assistants: The First Workshop on Intelligent and Interactive Writing Assistants @ ACL 2022
- If you want to learn more about story writing: STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story Generation
- If you want to see other keystroke-level datasets: Keystroke metrics & mouse movement datasets for NLP