Digital Mammography Dataset


This digital mammography dataset includes data derived from a random sample of 20,000 digital and 20,000 film-screen mammograms performed between January 2005 and December 2008 on women in the Breast Cancer Surveillance Consortium (BCSC). The dataset does not include images. Some women contribute multiple examinations to the data. The dataset may be useful to people interested in teaching data analysis, epidemiological study design, or statistical methods for binary outcomes or correlated data. It includes the mammogram assessment, subsequent breast cancer diagnosis within one year, and participant characteristics previously shown to be associated with mammography performance, including age, family history of breast cancer, breast density, use of hormone therapy, body mass index, history of biopsy, receipt of prior mammography, and presence of comparison films.

These data are recommended for use as a teaching tool only; they should not be used to conduct primary research. View an example biostatistics data analysis exam question based on these data.



See the Digital Mammography Dataset Documentation for more information about the variables included in the dataset.
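
As a teaching illustration of the binary-outcome and correlated-data uses mentioned above, the sketch below computes sensitivity and specificity of the mammogram assessment against cancer diagnosis within one year, and lists women who contribute more than one examination. The records are invented toy values, not BCSC data, and the field names (woman_id, assessment_positive, cancer_within_1yr) are hypothetical; the real variable names and codings are given in the dataset documentation.

```python
from collections import Counter

# Invented toy records for illustration only; NOT taken from the BCSC dataset.
# Field names are hypothetical; consult the dataset documentation for the real ones.
exams = [
    {"woman_id": 1, "assessment_positive": True,  "cancer_within_1yr": True},
    {"woman_id": 1, "assessment_positive": False, "cancer_within_1yr": False},
    {"woman_id": 2, "assessment_positive": True,  "cancer_within_1yr": False},
    {"woman_id": 3, "assessment_positive": False, "cancer_within_1yr": False},
    {"woman_id": 4, "assessment_positive": False, "cancer_within_1yr": True},
    {"woman_id": 4, "assessment_positive": True,  "cancer_within_1yr": True},
]


def sensitivity_specificity(records):
    """Return (sensitivity, specificity); a value is None if its denominator is zero."""
    tp = fn = tn = fp = 0
    for r in records:
        if r["cancer_within_1yr"]:
            if r["assessment_positive"]:
                tp += 1  # cancer present, assessment positive
            else:
                fn += 1  # cancer present, assessment negative
        else:
            if r["assessment_positive"]:
                fp += 1  # no cancer, assessment positive
            else:
                tn += 1  # no cancer, assessment negative
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec


def women_with_repeat_exams(records):
    """Return sorted IDs of women contributing more than one examination."""
    counts = Counter(r["woman_id"] for r in records)
    return sorted(w for w, n in counts.items() if n > 1)


sens, spec = sensitivity_specificity(exams)
repeats = women_with_repeat_exams(exams)
print(sens, spec, repeats)
```

Because some women appear more than once, examinations are not independent; an analysis that treats each row as independent would likely understate uncertainty, which is one reason the dataset suits teaching methods for correlated data.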


Acknowledge the BCSC

The following must be cited when reproducing these data:

"The Breast Cancer Surveillance Consortium and its data collection and sharing activities are funded by the National Cancer Institute (P01CA154292). Downloaded xx/xx/xxxx from the Breast Cancer Surveillance Consortium Web site -"

Please acknowledge the BCSC:

"We thank the participating women, mammography facilities, and radiologists for the data they have provided. You can learn more about the BCSC at:"

Information about the BCSC may also be included in the methods section using language such as:

"Data for this study were obtained from the BCSC:"



Investigators can access this dataset by submitting the information below.
