Digital Mammography Dataset Documentation

This digital mammography dataset includes information from 20,000 digital and 20,000 film screening mammograms performed between January 2005 and December 2008 on women included in the Breast Cancer Surveillance Consortium (BCSC). Some women contribute more than one examination to the dataset. These data are recommended only for use in teaching data analysis or epidemiological concepts. Because the data represent only a small sample of the mammography data available from the BCSC, they should not be used to conduct primary research.
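Because some women contribute more than one examination, rows in the dataset are not independent observations. The following is a minimal sketch of counting examinations per patient. It assumes the records have already been loaded as dictionaries keyed by the variable names documented here. The sample rows and the function name exams_per_patient are hypothetical and serve only as illustration.

```python
from collections import Counter


def exams_per_patient(rows):
    """Count how many examinations each patient id (ptid) contributes."""
    return Counter(row["ptid"] for row in rows)


# Hypothetical records; real rows would come from the dataset file.
sample_rows = [
    {"ptid": 101, "mammtype": 1},
    {"ptid": 101, "mammtype": 2},
    {"ptid": 202, "mammtype": 2},
]

counts = exams_per_patient(sample_rows)

# Patients with more than one examination in the sample.
repeat_patients = sorted(p for p, n in counts.items() if n > 1)
print(repeat_patients)
```

In a teaching exercise, a count like this could be used to decide whether to keep one examination per woman or to use a method that accounts for repeated measures.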

Variable name     Description     Coding
age_c     patient's age in years at time of mammogram     Numerical
assess_c     radiologist's assessment based on the BI-RADS scale     0 = Needs additional imaging
1 = Negative
2 = Benign finding(s)
3 = Probably benign
4 = Suspicious abnormality
5 = Highly suggestive of malignancy
cancer_c     binary indicator of cancer diagnosis within one year of screening mammogram     0 = no cancer diagnosis
1 = cancer diagnosis
compfilm_c     comparison mammogram from prior mammography examination available     0 = no
1 = yes
9 = missing
density_c     patient's BI-RADS breast density as recorded at time of mammogram     1 = Almost entirely fatty
2 = Scattered fibroglandular densities
3 = Heterogeneously dense
4 = Extremely dense
family history of breast cancer in a first-degree relative     0 = no
1 = yes
9 = missing
hrt_c     current use of hormone therapy at time of mammogram     0 = no
1 = yes
9 = missing
prvmam_c     binary indicator of whether the woman had ever received a prior mammogram     0 = no
1 = yes
9 = missing
biophx_c     history of breast biopsy     0 = no
1 = yes
9 = missing
mammtype     film or digital mammogram     1 = film mammogram
2 = digital mammogram
CaTypeO     cancer type     1 = ductal carcinoma in situ
2 = invasive cancer
8 = no cancer diagnosis
bmi_c     body mass index at time of mammogram     Numerical or -99 if missing
ptid     patient's study ID
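Several variables use a sentinel code for missing values (9 for the yes/no indicators, -99 for bmi_c), and these codes should not be treated as real values in an analysis. The following is a minimal sketch of replacing them with None, assuming each record is a dictionary keyed by the variable names above. The names MISSING_CODES and clean_record and the example record are hypothetical. The family history variable also uses 9 for missing, but its column name is not given in the table, so it is left out of the mapping here.

```python
# Sentinel codes taken from the coding column of the table above.
MISSING_CODES = {
    "compfilm_c": {9},
    "hrt_c": {9},
    "prvmam_c": {9},
    "biophx_c": {9},
    "bmi_c": {-99},
}


def clean_record(record):
    """Return a copy of record with documented missing codes replaced by None."""
    cleaned = {}
    for name, value in record.items():
        if value in MISSING_CODES.get(name, ()):
            cleaned[name] = None
        else:
            cleaned[name] = value
    return cleaned


# Hypothetical record for illustration only.
example = {"ptid": 1, "age_c": 62, "hrt_c": 9, "biophx_c": 0,
           "bmi_c": -99, "CaTypeO": 8}
cleaned = clean_record(example)
print(cleaned)
```

Note that CaTypeO = 8 means "no cancer diagnosis" rather than a missing value, so the sketch leaves it unchanged.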

The following must be cited when using this dataset:

"Data collection and sharing was supported by the National Cancer Institute-funded Breast Cancer Surveillance Consortium (HHSN261201100031C). You can learn more about the BCSC at:"

Information about the BCSC may also be included in the methods section using language such as:

"Data for this study was obtained from the BCSC:"