A geometry-driven longitudinal topic model
Published in Harvard Data Science Review, 2021
A simple and scalable framework for longitudinal analysis of Twitter data is developed that combines latent topic models with computational geometric methods. Dimension reduction tools from computational geometry are applied to learn the intrinsic manifold on which the latent, temporal topics reside. Shortest path distances on the manifold are used to link these topics together. The proposed framework permits visualization of the low-dimensional embedding, which provides a clear interpretation of the complex, high-dimensional trajectories that may exist among latent topics. Practical application of the proposed framework is demonstrated through its ability to capture and effectively visualize the natural progression of latent COVID-19-related topics learned from Twitter data. Interpretability of the trajectories is achieved by comparing them to real-world events. In addition, the framework permits study of spatial variation in Twitter behavior for learned topics. The analysis demonstrates that the proposed framework is able to capture the granular-level impact of COVID-19 on public discussions. We end by arguing that Twitter data, when analyzed within the proposed framework, can serve as a valuable supplementary data stream for COVID-related studies.
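To make the topic-linking idea concrete, here is a minimal sketch and not the paper's implementation. It assumes each topic is a word distribution over a shared vocabulary, approximates the manifold with a k-nearest-neighbor graph under Hellinger distance, and links each topic to the closest topic in the next time period by shortest-path distance. The function names, the choice of Hellinger distance, the neighbor count `k`, and the toy data are all illustrative assumptions.

```python
import heapq
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (assumed metric)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph: node -> {neighbor: edge length}."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted((hellinger(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_from(graph, source):
    """Dijkstra shortest-path distances from `source` along graph edges."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def link_topics(topics, periods, k=2):
    """Map each topic to the geodesically nearest topic in the next period."""
    graph = knn_graph(topics, k)
    links = {}
    for i, t in enumerate(periods):
        dist = geodesic_from(graph, i)
        candidates = [j for j, s in enumerate(periods) if s == t + 1 and j in dist]
        if candidates:
            links[i] = min(candidates, key=lambda j: dist[j])
    return links


# Toy data (hypothetical): four topics over a three-word vocabulary, two periods.
topics = [
    [0.8, 0.1, 0.1],  # period 0
    [0.1, 0.1, 0.8],  # period 0
    [0.7, 0.2, 0.1],  # period 1
    [0.1, 0.2, 0.7],  # period 1
]
periods = [0, 0, 1, 1]
links = link_topics(topics, periods)
print(links)
```

Chaining these links across consecutive periods gives one possible notion of a topic trajectory. A real analysis would fit the topics with a topic model and use a dedicated manifold-learning method for the embedding and visualization.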