To support continued engagement with the SoundBox Collection, and to demonstrate our commitment to open scholarship, we have made the datasets that underpin this website available for download.
If you have used one of our datasets, whether in passing or in a substantial way, or if you have any questions, we would love to hear from you! Please let us know here.
As we make this data public, we would like to thank all those who have worked on the SpokenWeb Project and UBCO SoundBox Collection. This includes SpokenWeb’s Principal Investigator Jason Camlot, the lead researchers and librarians/archivists at all the partner institutions, the governing board, collaborators and members of the SpokenWeb network, current and former students, as well as the numerous speakers whose voices appear on the tapes themselves: thank you.
Dataset Information
JSON Dataset
The JSON dataset is organized into the following metadata fields used by SWALLOW and developed by the SpokenWeb project:
- ID
- Version
- Timestamp
- Score
- Creator names
- Creator names search
- Material designations
- Physical compositions
- Recording type
- AV type
- Cataloger name
- Partner institution
- Collection source collection
- Source collection label
- Collection contributing unit
- Source collection url
- Collection image url
- Collection source collection ID
- Persistent url
- Item title
- Item identifiers
- Creators
- Contributors
- Material description
- Digital description
- Dates
- Location
- Note
- Related works
- Rights
- Item title source
- Item language
- City
- Item production context
- Item title note
- Contents
- Performance date
- Address
- Venue
- Item series title
- Collection source collection description
- Contributors names
- Contributors names search
- Playback mode
- Author name
- Content notes
- Item subseries title
- Publication date
- Speaker name
- Reader name
- Production date
- Performer name
- Rights notes
- Recordist name
- Producer name
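Many of these fields are optional, so any given item will leave some of them empty. The following minimal Python sketch counts how many records actually populate each key, which is a quick way to see which fields are dense and which are sparse. It assumes the JSON export is a top-level array of item objects and that it is UTF-8 encoded. The exact JSON key names are not listed on this page, so the keys in the demo (`id`, `item_title`, `venue`) are hypothetical placeholders.

```python
import json
from collections import Counter
from typing import Any


def load_records(path: str) -> list[dict[str, Any]]:
    """Load the export, assuming it is a JSON array of item objects (an assumption)."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of records")
    return data


def field_coverage(records: list[dict[str, Any]]) -> Counter:
    """Count how many records have a non-empty value for each top-level key."""
    coverage: Counter = Counter()
    for record in records:
        for key, value in record.items():
            # Treat None, "", [] and {} as missing so sparse fields stand out.
            if value not in (None, "", [], {}):
                coverage[key] += 1
    return coverage


if __name__ == "__main__":
    # Hypothetical sample records; real key names may differ.
    sample = [
        {"id": "1", "item_title": "Reading", "venue": ""},
        {"id": "2", "item_title": "Lecture", "venue": "Library"},
    ]
    for key, count in field_coverage(sample).most_common():
        print(f"{key}: {count}/{len(sample)}")
```

To run it on the downloaded file, pass the result of `load_records("path/to/file.json")` to `field_coverage`; the path is a placeholder.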
CSV Dataset
The CSV dataset is organized into the following narrower set of fields, drawn from those used by SWALLOW and developed by the SpokenWeb project:
- SWALLOW ID
- Partner Institution
- Source Collection
- Item Title
- Title Source
- Language
- Creators
- Contributors
- Related Works
- Production Context
- Contents
- Performance Date
- Material Designations
- AV Type
- Location
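Because the CSV has a fixed set of columns, it can be read with Python's standard `csv` module. The minimal sketch below checks that the header contains the fifteen columns listed above and then tallies items by any one column (for example, `Language` or `AV Type`). It assumes the header row uses exactly these labels and that the file is UTF-8; both are assumptions, not documented facts. The sample values in the demo are invented for illustration.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO

# Column labels as listed on this page; assumed to match the CSV header exactly.
EXPECTED_COLUMNS = [
    "SWALLOW ID", "Partner Institution", "Source Collection", "Item Title",
    "Title Source", "Language", "Creators", "Contributors", "Related Works",
    "Production Context", "Contents", "Performance Date",
    "Material Designations", "AV Type", "Location",
]


def read_items(fh: TextIO) -> list[dict[str, str]]:
    """Read all rows, failing early if any expected column is absent."""
    reader = csv.DictReader(fh)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"CSV is missing expected columns: {missing}")
    return list(reader)


def count_by(rows: Iterable[dict[str, str]], column: str) -> Counter:
    """Tally rows by the value in one column; empty cells count as "(blank)"."""
    return Counter((row.get(column) or "").strip() or "(blank)" for row in rows)


if __name__ == "__main__":
    # Build a tiny in-memory CSV with hypothetical values.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=EXPECTED_COLUMNS)
    writer.writeheader()
    writer.writerow({"SWALLOW ID": "1", "Language": "English", "AV Type": "Audio"})
    writer.writerow({"SWALLOW ID": "2", "Language": "", "AV Type": "Audio"})
    buf.seek(0)
    print(count_by(read_items(buf), "Language"))
```

For the downloaded file, open it with `open(path, newline="", encoding="utf-8")` before passing it to `read_items`; `newline=""` is what the `csv` module documentation recommends so that line breaks inside quoted fields are handled correctly.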
Dataset Downloads
To cite this dataset:
Shearer, Karis, Marjorie Mitchell, Jason Camlot, Tomasz Neugebauer, Francisco Breztebeita, Megan Butchart, and Sarah Cipes. SpokenWeb UBCO: SoundBox Collection. Version 1.0, University of British Columbia, 2026.
