Collection Data – The SoundBox Collection

To support continued engagement with the SoundBox Collection, and to demonstrate our commitment to open scholarship, we have made the datasets that underpin this website available for download. Although we have tried to make the metadata as accurate as possible, it may contain errors. Please treat the datasets as preliminary and use them with this in mind.

If you have used one of our datasets, whether in passing or substantially, or if you have any questions, we would love to hear from you! Please let us know here.

As we make this data public, we would like to thank everyone who has worked on the SpokenWeb Project and the UBCO SoundBox Collection. This includes SpokenWeb’s Principal Investigator Jason Camlot, the lead researchers and librarians/archivists at all the partner institutions, the governing board, collaborators and members of the SpokenWeb network, current and former students, and the numerous speakers whose voices appear on the tapes themselves: thank you.

Dataset Information

JSON Dataset

The JSON dataset is organized into the following metadata fields, which were developed by the SpokenWeb project and are used by SWALLOW:

ID
Version
Timestamp
Score
Creator names
Creator names search
Material designations
Physical compositions
Recording type
AV type
Cataloger name
Partner institution
Collection source collection
Source collection label
Collection contributing unit
Source collection url
Collection image url
Collection source collection ID
Persistent url
Item title
Item identifiers
Creators
Contributors
Material description
Digital description
Dates
Location
Note
Related works
Rights
Item title source
Item language
City
Item production context
Item title note
Contents
Performance date
Address
Venue
Item series title
Collection source collection description
Contributors names
Contributors names search
Playback mode
Author name
Content notes
Item subseries title
Publication date
Speaker name
Reader name
Production date
Performer name
Rights notes
Recordist name
Producer name
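The field names above are display labels; the keys in the JSON file itself may be spelled differently (for example in lowercase with underscores), so the minimal Python sketch below does not hard-code them. It assumes the file is a single top-level JSON array of record objects, which should be checked against the actual download. `load_records` and `field_coverage` are hypothetical helper names, not part of SWALLOW or any SpokenWeb tool. Counting the non-empty values for each key is one quick way to see which fields are well populated, which is useful given that the metadata should be treated as preliminary.

```python
import json
from collections import Counter
from typing import Any


def load_records(text: str) -> list[dict[str, Any]]:
    """Parse the JSON export.

    ASSUMPTION: the file is one top-level JSON array of record objects.
    Non-object entries, if any, are skipped.
    """
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of records")
    return [record for record in data if isinstance(record, dict)]


def field_coverage(records: list[dict[str, Any]]) -> Counter:
    """Count how many records carry a non-empty value for each key.

    Keys are whatever appears in the file; no field names are assumed.
    None, empty strings, empty lists and empty objects count as empty.
    """
    coverage: Counter = Counter()
    for record in records:
        for key, value in record.items():
            if value not in (None, "", [], {}):
                coverage[key] += 1
    return coverage
```

To use it, pass the file contents (for example `pathlib.Path(path).read_text(encoding="utf-8")`) to `load_records`, then call `field_coverage(records).most_common()` to list keys from most to least populated.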

CSV Dataset

The CSV dataset contains a smaller selection of the metadata fields developed by the SpokenWeb project and used by SWALLOW:

SWALLOW ID
Partner Institution
Source Collection
Item Title
Title Source
Language
Creators
Contributors
Related Works
Production Context
Contents
Performance Date
Material Designations
AV Type
Location
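For readers who want to work with the CSV file programmatically, here is a minimal Python sketch using only the standard library. It assumes the header row uses exactly the column names listed above; that spelling is an assumption to verify against the downloaded file. `load_items` and `count_by` are hypothetical helper names, not part of any SpokenWeb tool.

```python
import csv
import io
from typing import Iterable, TextIO

# Column headers as listed on this page. ASSUMPTION: the downloaded
# file uses these exact names in its header row.
EXPECTED_COLUMNS = [
    "SWALLOW ID", "Partner Institution", "Source Collection", "Item Title",
    "Title Source", "Language", "Creators", "Contributors", "Related Works",
    "Production Context", "Contents", "Performance Date",
    "Material Designations", "AV Type", "Location",
]


def load_items(stream: TextIO) -> list[dict[str, str]]:
    """Read every CSV row into a dict keyed by column header.

    Raises ValueError if any expected column is missing from the header.
    """
    reader = csv.DictReader(stream)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return list(reader)


def count_by(items: Iterable[dict[str, str]], column: str) -> dict[str, int]:
    """Tally how many items share each value of one column (e.g. 'AV Type').

    Blank or absent values are grouped under '(blank)' instead of dropped.
    """
    counts: dict[str, int] = {}
    for item in items:
        key = (item.get(column) or "").strip() or "(blank)"
        counts[key] = counts.get(key, 0) + 1
    return counts
```

To use it, open the downloaded file with `open(path, newline="", encoding="utf-8-sig")` (the `-sig` variant strips a byte-order mark if one is present, which would otherwise be glued to the first header) and pass the handle to `load_items`; `count_by(items, "AV Type")` then gives a quick tally of item types.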

Dataset Downloads

DOWNLOAD CSV

DOWNLOAD JSON

To cite this dataset:

Shearer, Karis, Marjorie Mitchell, Jason Camlot, Tomasz Neugebauer, Francisco Breztebeita, Megan Butchart, and Sarah Cipes. SpokenWeb UBCO: SoundBox Collection. Version 1.0, University of British Columbia, March 2026.
