Example Data Analysis Exam Question


The Breast Cancer Surveillance Consortium (BCSC) collects longitudinal data on mammography examinations and breast cancer outcomes. The BCSC receives data from community mammography facilities. These data include patient characteristics, such as age, family history of breast cancer, and use of hormone therapy, and mammography characteristics, such as the woman's breast density and the radiologist's assessments and recommendations. Data from mammography facilities are linked to cancer registries that collect data on all cancer cases occurring in the region. Thus, if a woman is diagnosed with breast cancer following her mammogram, this information is available to the BCSC.

One of the objectives of the BCSC is to evaluate screening mammography accuracy. Screening mammography accuracy is assessed using the following measures: the proportion of women with cancer whose cancer is detected by mammography (sensitivity); the proportion of women without cancer who are correctly identified as cancer-free by mammography (specificity); and the proportion of women recalled for additional evaluation who actually have cancer (positive predictive value). In studies of mammography, any cancer diagnosed within one year after a screening mammogram is typically considered to have been detectable at the time of the screening examination. Thus, all cancers diagnosed within a year of the screening mammogram are used in assessing mammography accuracy. Previous studies have shown that accuracy varies with patient characteristics, including age, breast density, menopausal status, and availability of prior mammograms for comparison.
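As an illustration of the three measures defined above, the following minimal Python sketch computes them from the four cells of a 2x2 table of mammography result by cancer status within one year. The function name and the counts are hypothetical and are not taken from the BCSC data.

```python
import math

def accuracy_measures(tp, fn, fp, tn):
    """Compute sensitivity, specificity, and PPV from 2x2 counts.

    tp: recalled (positive) mammogram, cancer within one year
    fn: negative mammogram, cancer within one year
    fp: recalled (positive) mammogram, no cancer within one year
    tn: negative mammogram, no cancer within one year
    """
    def ratio(num, den):
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den > 0 else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),   # detected cancers / all cancers
        "specificity": ratio(tn, tn + fp),   # correct negatives / all non-cancers
        "ppv": ratio(tp, tp + fp),           # cancers among recalls / all recalls
    }

# Hypothetical counts, for illustration only.
example = accuracy_measures(tp=80, fn=20, fp=900, tn=9000)
```

Note that each measure has a different denominator: all women with cancer, all women without cancer, and all recalled women, respectively.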

Radiologists' assessments of a mammogram are coded using the Breast Imaging Reporting and Data System (BI-RADS) scale. The assessment scores a radiologist can give a mammogram are:

  • 1: Negative, no significant abnormality to report
  • 2: Benign finding, no cancer is present but a non-cancerous abnormality such as a lymph node has been identified
  • 3: Probably benign, cancer is unlikely to be present but an abnormality has been identified that should be examined carefully at future screening examinations
  • 0: Needs additional imaging, a possibly cancerous abnormality may have been identified and needs further evaluation with additional tests
  • 4: Suspicious abnormality, an abnormality has been identified that the radiologist believes may be cancer
  • 5: Highly suggestive of malignancy, an abnormality has been identified that the radiologist believes is likely to be cancer

Codes of 1, 2, or 3 indicate that a mammogram is normal or negative and codes of 0, 4, or 5 indicate that the mammogram is abnormal or positive. A positive mammography assessment is also called a "recall" and results in the woman being contacted by the radiology facility for diagnostic evaluation including additional imaging and possibly biopsy.
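The coding rule above can be sketched in Python as follows. This is a minimal sketch, not a prescribed approach: the function names and the choice to return None for missing or unrecognized codes are assumptions made for illustration. Cancer status refers to a cancer diagnosis within one year of the screening mammogram.

```python
POSITIVE_CODES = {0, 4, 5}  # abnormal mammogram ("recall")
NEGATIVE_CODES = {1, 2, 3}  # normal mammogram

def is_positive(birads):
    """Return True for a recall, False for a normal exam, None otherwise."""
    if birads in POSITIVE_CODES:
        return True
    if birads in NEGATIVE_CODES:
        return False
    return None  # missing or unrecognized assessment code

def classify_exam(birads, cancer_within_year):
    """Label one exam as 'TP', 'FP', 'FN', or 'TN' (None if undetermined)."""
    positive = is_positive(birads)
    if positive is None or cancer_within_year is None:
        return None
    if positive:
        return "TP" if cancer_within_year else "FP"
    return "FN" if cancer_within_year else "TN"
```

For example, an exam coded 3 (probably benign) followed by a cancer diagnosis within the year counts as a false negative, while an exam coded 0 with no subsequent cancer counts as a false positive.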

The companies that manufacture mammography machines have been working to develop new technologies that will improve mammography accuracy. Over the past several years, they have introduced digital mammography machines, which have been replacing the traditional film mammography machines. Digital mammography has been shown to have superior accuracy to film-screen mammography for younger women and those with dense breasts. However, there is some evidence that digital mammography may not perform as well as film in older women. Questions remain as to how this new technology performs in women over 60 years of age.

Breast tissue is composed of fat and fibroglandular tissue. Fat is largely transparent to x-rays so tumors occurring in fatty tissue are easily visualized with mammography. Fibroglandular tissue is opaque to x-rays, making it more difficult to determine if a tumor is present in a breast with a higher proportion of fibroglandular tissue. Breasts with a higher proportion of fibroglandular tissue are termed "dense" breasts. Women with dense breasts are also more likely to have breast cancer. Because digital mammography is believed to perform better than film for women with dense breasts, these women may have been selectively seeking out digital mammography or may have been triaged to receive digital mammography by radiology facilities with both types of machines. As a result, breast density is associated with receipt of digital mammography as well as cancer risk and positive mammography results.

Analysis Question

Using data from the BCSC on screening mammograms and subsequent cancer diagnoses from 2005-2008 for women aged 60-89 years, you will assess the accuracy of screening mammography and identify patient, mammography, and cancer characteristics that may be associated with accuracy. The variables in this dataset were measured at each mammogram visit (with some missing data). Missing values are due either to a woman failing to answer a question or to the mammography facility failing to report these data to the BCSC. Women are recommended to undergo mammography screening annually or biennially, so some women have multiple mammograms included in the dataset.
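Before any modeling, it may help to quantify the two data features noted above: repeated mammograms per woman and missing values. The following Python sketch shows one way to do this. The record layout and the field names `woman_id` and `density` are hypothetical, as is the convention that a missing value is stored as None.

```python
from collections import Counter

def exams_per_woman(records, id_key="woman_id"):
    """Count how many mammograms each woman contributes."""
    return Counter(r[id_key] for r in records)

def missing_fraction(records, field):
    """Fraction of exams with a missing (None or absent) value for a field."""
    if not records:
        return float("nan")
    n_missing = sum(1 for r in records if r.get(field) is None)
    return n_missing / len(records)

# Hypothetical records, for illustration only.
records = [
    {"woman_id": 1, "density": 2},
    {"woman_id": 1, "density": None},
    {"woman_id": 2, "density": 3},
]
counts = exams_per_woman(records)
n_repeat = sum(1 for n in counts.values() if n > 1)  # women with more than one exam
```

Women with more than one exam contribute observations that are unlikely to be independent, which is relevant when choosing methods for standard errors.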

Specific questions to answer

  1. Conduct a descriptive analysis of radiologists' assessments on film and digital mammography examinations in a way that informs whether and how the three measures of mammography accuracy differ by age and mammography type.
  2. Compare the three measures of mammography accuracy for digital and film mammography. Does the accuracy of digital mammography relative to film vary by age, and, if so, by how much?
  3. Is there evidence that accuracy differs for DCIS compared to invasive cancer? Does the relative accuracy of digital and film mammography differ depending on whether the cancer is DCIS or invasive, and, if so, by how much?
  4. Construct a model to predict occurrence of any cancer using patient characteristics and mammography assessment. Which patient characteristics provide additional information beyond mammography assessment?