Digital Mammography Dataset


This digital mammography dataset includes data derived from a random sample of 20,000 digital and 20,000 film-screen mammograms performed between January 2005 and December 2008 among women in the Breast Cancer Surveillance Consortium (BCSC). The dataset does not include images. Some women contribute multiple examinations to the data. The dataset may be useful for teaching data analysis, epidemiological study design, or statistical methods for binary outcomes or correlated data. It includes the mammogram assessment, subsequent breast cancer diagnosis within one year, and participant characteristics previously shown to be associated with mammography performance, including age, family history of breast cancer, breast density, use of hormone therapy, body mass index, history of biopsy, receipt of prior mammography, and presence of comparison films. See the Digital Mammography Dataset Documentation for more information about the variables included in the dataset.
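As an illustration of the kind of binary-outcome exercise the dataset could support, the sketch below cross-tabulates a mammogram assessment against cancer diagnosis within one year and computes sensitivity and specificity. The records are invented toy values, not BCSC data, and the field names `woman_id`, `assessment_positive`, and `cancer_within_1yr` are assumptions made for this example; the actual variable names and codings are given in the dataset documentation.

```python
from collections import Counter

# Toy records invented for illustration only; these are NOT BCSC data.
# Field names are hypothetical; check the dataset documentation for the
# real variable names and codings.
records = [
    {"woman_id": 1, "assessment_positive": 1, "cancer_within_1yr": 1},
    {"woman_id": 1, "assessment_positive": 0, "cancer_within_1yr": 0},
    {"woman_id": 2, "assessment_positive": 1, "cancer_within_1yr": 0},
    {"woman_id": 3, "assessment_positive": 0, "cancer_within_1yr": 0},
    {"woman_id": 4, "assessment_positive": 0, "cancer_within_1yr": 1},
    {"woman_id": 5, "assessment_positive": 0, "cancer_within_1yr": 0},
]


def confusion_counts(rows):
    """Count true/false positives and negatives across examinations."""
    counts = Counter(
        (r["assessment_positive"], r["cancer_within_1yr"]) for r in rows
    )
    return {
        "tp": counts[(1, 1)],  # positive assessment, cancer diagnosed
        "fp": counts[(1, 0)],  # positive assessment, no cancer
        "fn": counts[(0, 1)],  # negative assessment, cancer diagnosed
        "tn": counts[(0, 0)],  # negative assessment, no cancer
    }


def sensitivity(c):
    """Proportion of cancers with a positive assessment."""
    return c["tp"] / (c["tp"] + c["fn"])


def specificity(c):
    """Proportion of non-cancers with a negative assessment."""
    return c["tn"] / (c["tn"] + c["fp"])


if __name__ == "__main__":
    c = confusion_counts(records)
    print(c, sensitivity(c), specificity(c))
```

Because some women contribute more than one examination (as `woman_id` 1 does in the toy records), examinations are not independent. Point estimates like these are unaffected, but standard errors and confidence intervals would need a method that accounts for the correlation within women.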

These data are recommended for use as a teaching tool only; they should not be used to conduct primary research. View an example biostatistics data analysis exam question based on these data.

Acknowledge the BCSC

The following must be cited when using this dataset:

"Data collection and sharing was supported by the National Cancer Institute-funded Breast Cancer Surveillance Consortium (HHSN261201100031C). You can learn more about the BCSC at:"

Information about the BCSC may also be included in the methods section using language such as:

"Data for this study was obtained from the BCSC:"

Access the Data

To access this dataset, enter the information below and submit a request for a download link. The link, along with any future notices about data updates, will be sent by e-mail to the address you provide.
