Using big data for improving two surveillance systems: influenza surveillance using Google flu-related search query data and probationers absconding surveillance using chronological case notes data




Liu, Jialiang





Abstract

Objectives: The overall goal of this dissertation is to explore the feasibility of using big data to create innovative strategies for improving two surveillance systems: influenza surveillance and probationer absconding surveillance.

Flu surveillance - Under the current gold standard for flu surveillance in the US, conducted by the CDC, there is a delay of up to three weeks between the onset of flu season and the dissemination of this information. The first goal of this dissertation was therefore to test an innovative strategy that applies a statistical detection algorithm to near real-time seasonal flu activity data to predict the onset of flu season weeks before the season begins.

Probationer absconding surveillance - In the US legal system, probation is the most widely used alternative sanction to incarceration. However, a significant segment of probationers fail to complete probation because they abscond from supervision. Because of limited financial resources and a growing probationer population, little effort has been made to locate and examine these probation absconders. Our second goal was to explore words and phrases associated with probation absconders by applying natural language processing (NLP) techniques to official chronological case notes written by probation officers.

Methods: Flu surveillance - We applied a modified Bayesian online change point detection (BOCPD) algorithm to real-time flu activity data obtained from the AutoRegression with General Online (ARGO) data model. The ARGO model uses Google flu-related search query data and historical CDC flu activity data to estimate flu activity in real time.
We used change point detection methods on the ARGO data to predict the dates of flu season onset and compared them with the onset dates reported by the CDC from 2007 to 2015. In applying the BOCPD algorithm to the ARGO data, we developed systematic ways to satisfy the algorithm's necessary assumptions, making it more robust and practical for flu surveillance, and we proposed a method for determining informative change points that may signal the onset of flu season.

Probationer absconding surveillance - We applied a text regression method known as concise comparative summarization (CCS) to text data generated from the case notes of a random sample of adult misdemeanor and felony offenders who received probation in Tarrant County, TX.

Results and conclusions: Flu surveillance - Our flu surveillance strategy achieved high predictive accuracy, with 86% of predictions correct. In addition, it detected flu season onset on average three weeks before the official flu season onset.

Probationer absconding surveillance - We found phrases such as "cannabinoids", "technical violations", "failed pay", and "transfer intake" to be associated with probation absconding. This suggests that probationers who had a history of using cannabinoids, who violated probation conditions, who failed to pay supervision fees during their probation periods, or who were transfer cases were more likely to abscond. Meanwhile, phrases such as "everything going well", "travel", and "fees paid full" were associated with probation completers. This implies that successful completers tended to have positive attitudes, to be willing to share their personal lives and feelings, and to have a stable source of income with which to pay supervision fees. Currently, the case notes are kept only for record-keeping purposes.
Our study identified previously unknown commonalities in the case notes of absconders and completers and may contribute to a new surveillance system that uses case notes systematically.
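
To make the change point step in the Methods more concrete, the following is a minimal sketch of textbook Bayesian online change point detection. It is not the modified algorithm developed in the dissertation. It assumes Gaussian observations with known variance, a conjugate normal prior on the level of each segment, and a constant hazard rate. The function names, parameter values, and the synthetic `weekly_activity` series are all hypothetical stand-ins and are not ARGO or CDC data.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bocpd_map_run_lengths(data, hazard=0.05, mu0=2.0, var0=4.0, obs_var=0.25):
    """Return the most probable run length after each observation.

    Assumptions (illustrative only): observations are normal with known
    variance obs_var, the segment level has prior N(mu0, var0), and a
    change point occurs at each step with constant probability hazard.
    """
    run_probs = [1.0]    # P(run length = r | data so far), indexed by r
    means = [mu0]        # posterior mean of the segment level, per run length
    variances = [var0]   # posterior variance of the segment level, per run length
    map_run_lengths = []
    for x in data:
        # Predictive density of x under each run-length hypothesis.
        pred = [normal_pdf(x, m, v + obs_var) for m, v in zip(means, variances)]
        # Either the current run grows by one, or a change point resets it to 0.
        growth = [p * q * (1.0 - hazard) for p, q in zip(run_probs, pred)]
        change = sum(p * q * hazard for p, q in zip(run_probs, pred))
        joint = [change] + growth
        total = sum(joint)
        run_probs = [j / total for j in joint]
        # Conjugate update of the segment level for every grown run;
        # run length 0 starts again from the prior.
        new_means, new_vars = [mu0], [var0]
        for m, v in zip(means, variances):
            post_var = 1.0 / (1.0 / v + 1.0 / obs_var)
            new_vars.append(post_var)
            new_means.append(post_var * (m / v + x / obs_var))
        means, variances = new_means, new_vars
        map_run_lengths.append(max(range(len(run_probs)), key=run_probs.__getitem__))
    return map_run_lengths


def segment_starts(map_run_lengths):
    """Indices where the most probable run length drops, i.e. a new segment begins."""
    starts = []
    for i in range(1, len(map_run_lengths)):
        if map_run_lengths[i] <= map_run_lengths[i - 1]:
            starts.append(i - map_run_lengths[i] + 1)
    return starts


# Synthetic series: 20 low-activity weeks followed by 10 elevated weeks.
weekly_activity = [1.0, 1.2, 0.9, 1.1, 0.8] * 4 + [4.0, 4.3, 3.8, 4.1, 3.9] * 2
run_lengths = bocpd_map_run_lengths(weekly_activity)
starts = segment_starts(run_lengths)
print(starts)
```

On this synthetic series the most probable run length grows week by week during the low-activity period and collapses when the level jumps, which is the kind of signal a surveillance rule could treat as a candidate season onset. Deciding which detected change points are informative, as the dissertation proposes, would need additional criteria that this sketch does not attempt.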