TreeScan™ is free data mining software that implements the tree-based scan statistic. This data mining method simultaneously looks for excess risk in any of a large number of individual cells in a database, as well as in groups of closely related cells, while adjusting for the multiple testing inherent in the large number of overlapping groups evaluated. Developed for disease surveillance, it can be used for the following types of problems:
- In pharmacovigilance, it can be used to simultaneously evaluate hundreds or thousands of potential adverse events and groups of adverse events, to determine whether any of them occur with higher probability among patients exposed to a particular pharmaceutical drug, device or vaccine, adjusting for the multiple testing inherent in the many adverse events evaluated.
- Also in pharmacovigilance, for a particular disease outcome such as liver failure, it can be used to simultaneously evaluate whether that outcome occurs with increased risk among people exposed to any of hundreds of pharmaceutical drugs, or groups of related drugs, adjusting for the multiple testing inherent in the many drugs evaluated.
- In occupational disease surveillance, it can be used to evaluate, for a particular disease, whether certain occupations, or groups of related occupations, are at higher risk of dying from that disease.
It can also be used for data mining in other subject areas unrelated to disease surveillance or medicine.
Three key features of the tree-based scan statistic data mining method are:
- It will simultaneously look for an excess risk in any of a large number of cells in a database. This is what makes it a data mining method.
- It evaluates not only single cells but also overlapping groups of cells that are closely related to each other in a predefined tree structure, so it is not necessary to pre-specify the granularity of the analysis.
- The analysis is adjusted for the multiple testing inherent in the hundreds, thousands or millions of cells and overlapping cell groupings that are evaluated. With an alpha level of 0.05, this means that if the events occur randomly with equal risk in each cell, there is only a 5% probability of detecting a significant excess risk in any of the cells or cell groupings, and a 95% probability that no cell or cell grouping will show a statistically significant excess risk.
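The three features above can be illustrated with a small sketch. It is not TreeScan's code: the toy tree, the function names (`cut_totals`, `poisson_llr`, `max_llr`, `tree_scan`) and the data are hypothetical. It assumes a conditional Poisson likelihood ratio per node and adjusts for multiple testing with a Monte Carlo test on the maximum statistic over all nodes. That is one standard way to obtain the family-wise 5% guarantee described above, but the text does not say how TreeScan implements it.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy tree (child -> parent). Leaves are database cells;
# internal nodes such as "A" group closely related cells.
PARENT = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "A": "root", "B": "root"}


def ancestors(leaf):
    """Yield the leaf itself and every ancestor except the root."""
    node = leaf
    while node != "root":
        yield node
        node = PARENT[node]


def cut_totals(leaf_values):
    """Sum leaf values into every node, so each node (a 'cut') covers
    either a single cell or a whole group of related cells."""
    totals = defaultdict(float)
    for leaf, value in leaf_values.items():
        for node in ancestors(leaf):
            totals[node] += value
    return totals


def poisson_llr(c, u, total):
    """Conditional Poisson log-likelihood ratio for an excess in one cut
    (c observed, u expected, total = all cases). Assumes u > 0."""
    if c <= u:
        return 0.0
    llr = c * math.log(c / u)
    if total > c:
        llr += (total - c) * math.log((total - c) / (total - u))
    return llr


def max_llr(cases, expected):
    """Test statistic: the largest LLR over all single cells and groups."""
    total = sum(cases.values())
    c_cut, u_cut = cut_totals(cases), cut_totals(expected)
    return max(poisson_llr(c_cut[node], u_cut[node], total) for node in u_cut)


def tree_scan(cases, population, replicates=999, seed=1):
    """Return the observed max LLR and a Monte Carlo p-value that accounts
    for every cell and group evaluated."""
    rng = random.Random(seed)
    total = sum(cases.values())
    pop_total = sum(population.values())
    # Conditional on the total, expected cases are proportional to population.
    expected = {leaf: total * pop / pop_total for leaf, pop in population.items()}
    observed = max_llr(cases, expected)
    leaves = list(population)
    weights = [population[leaf] for leaf in leaves]
    at_least_as_large = 0
    for _ in range(replicates):
        # Null hypothesis: equal risk everywhere, so each case falls in a
        # leaf with probability proportional to that leaf's population.
        simulated = dict.fromkeys(leaves, 0)
        for leaf in rng.choices(leaves, weights=weights, k=total):
            simulated[leaf] += 1
        if max_llr(simulated, expected) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (replicates + 1)


cases = {"A1": 12, "A2": 3, "B1": 4, "B2": 5}
population = {"A1": 100, "A2": 100, "B1": 100, "B2": 100}
llr, p_value = tree_scan(cases, population)
# The largest LLR comes from cell A1: 12 * ln(4/3), about 3.45.
print(round(llr, 2), p_value)
```

Because the p-value compares the observed maximum against the maxima of null replicates, it is already adjusted for the number of cells and groups scanned. No separate Bonferroni-style correction is applied.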
Data Types and Probability Models
TreeScan uses either a Poisson-based probability model, where the number of events (or cases) in a cell is Poisson-distributed according to a known underlying population at risk, or a binomial model, for 0/1 event data such as cases and controls. Both conditional and unconditional analyses can be performed. In a conditional analysis, the analysis is conditioned on the total number of cases observed.
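For a single cut (one cell or one group of cells), the difference between these model choices can be sketched with textbook one-sided log-likelihood ratios. The function names are hypothetical. The unconditional binomial version assumes a known null probability `p` that an observation is a case. The conditional binomial version compares the case proportion inside the cut with the proportion outside it. These are standard formulas written for illustration, not taken from TreeScan, whose exact statistics may differ.

```python
import math


def _xlogx_over(x, d):
    """x * log(x / d), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x / d)


def poisson_unconditional(c, u):
    """c observed vs u expected cases; the total number of cases is not fixed."""
    if c <= u:
        return 0.0
    return _xlogx_over(c, u) - (c - u)


def poisson_conditional(c, u, total):
    """Conditioned on the observed total, so expected counts across all
    cells are assumed to be scaled to sum to `total`."""
    if c <= u:
        return 0.0
    return _xlogx_over(c, u) + _xlogx_over(total - c, total - u)


def bernoulli_unconditional(c, n, p):
    """c cases among n case/control observations in the cut; 0 < p < 1 is the
    assumed null probability that an observation is a case."""
    if c <= n * p:
        return 0.0
    return _xlogx_over(c, n * p) + _xlogx_over(n - c, n * (1 - p))


def bernoulli_conditional(c, n, total_cases, total_n):
    """Compares the case proportion inside the cut with the proportion
    outside it, conditioning on the total numbers of cases and observations."""
    c_out, n_out = total_cases - c, total_n - n
    if n_out == 0 or c * n_out <= c_out * n:
        return 0.0  # no excess inside the cut
    inside = _xlogx_over(c, n) + _xlogx_over(n - c, n)
    outside = _xlogx_over(c_out, n_out) + _xlogx_over(n_out - c_out, n_out)
    null = _xlogx_over(total_cases, total_n) + _xlogx_over(total_n - total_cases, total_n)
    return inside + outside - null


print(poisson_unconditional(12, 6), poisson_conditional(12, 6, 24))
print(bernoulli_unconditional(30, 40, 0.5), bernoulli_conditional(30, 40, 50, 100))
```

For the same cut of 12 observed against 6 expected cases (out of 24 in total), the unconditional and conditional Poisson statistics differ (about 2.32 versus 3.45). The choice between a conditional and an unconditional analysis therefore changes the statistic being maximized.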
Developers and Funders
The TreeScan™ software was developed by Martin Kulldorff together with Information Management Services Inc. Financial support for TreeScan has been received from:
- Agency for Healthcare Research and Quality, Centers for Education and Research on Therapeutics
- National Institutes of Health, National Library of Medicine
- Food and Drug Administration, Center for Biologics Evaluation and Research, Mini-Sentinel Post-Licensure Rapid Immunization Safety Monitoring Program
- Alfred P. Sloan Foundation, through a grant to the Fund for Public Health in New York City
- CDC Foundation, through a grant to the Fund for Public Health in New York City
- Centers for Disease Control and Prevention, through ELC CARES grant NU50CK000517-01-09 to the New York City Department of Health and Mental Hygiene
Their financial support is greatly appreciated. The contents of TreeScan are the responsibility of the developer and do not necessarily reflect the official views of the funders.