University of Paris Seminar Series on Data Analytics
in collaboration with the diNo group

Invited Seminar Talk

Next-Generation of Big Multi-Dimensional Data Analytics
Prof John Paparrizos, The Ohio State University (USA)

when: 15 December 2022, 2pm
where: online (email the organizer for connection details), and in-person:
room Turing Conseil, 7th floor, Paris Descartes University, 45 Rue Des Saints Peres, Paris 75006


Today, automated processes, Internet-of-Things deployments, and Web and mobile applications generate an overwhelming amount of high-dimensional data. Meanwhile, computational resources remain limited, and advances in machine learning (ML) create a pressing need to support increasingly complex analytical tasks. Unfortunately, traditional techniques for data analytics offer limited support for high-dimensional data, ML tasks, and adaptation to data properties, often resulting in reduced performance. In my work, I ask how we can address the lack of task-aware and data-driven adaptations in data analytics and ML methods. Specifically, I will discuss computational methods that exploit similarities, shapes, densities, and distributions in data. Motivated by the ubiquity of high-dimensional sequential data (i.e., time series), I will present a method for similarity-preserving representation learning to accelerate various forms of analytics. Then, I will discuss a variance-aware quantization method for indexing high-dimensional data that enables similarity search queries at scale. In both examples, the proposed methods substantially improve accuracy and runtime performance (i.e., higher task accuracy with reduced operational costs), demonstrating the benefit of designing task-aware and data-driven solutions for large-scale data science applications.

Short Bio

John Paparrizos is an assistant professor at The Ohio State University, leading The Datum Lab. His research in data analytics, data science, and applied machine learning focuses on adaptive solutions for managing and analyzing structured and unstructured data to enable the next generation of data-intensive and machine-learning applications. His doctoral work was recognized at the 2019 ACM SIGKDD Doctoral Dissertation Award competition. He is a recipient of multiple other distinctions, including the inaugural ACM SIGMOD Research Highlight Award and a NetApp Faculty Award. John's work has been featured in popular media outlets, including The New York Times, Washington Post, Guardian, and MIT Technology Review. His ideas have been widely adopted across scientific areas, Fortune 100-500 companies (e.g., Exelon, Nokia, and many financial firms), and organizations such as the European Space Agency. John completed his Ph.D. at Columbia University and received postdoctoral training at the University of Chicago.

Hosted by: Themis Palpanas
