Over the past fifteen years, ontology-mediated query answering has grown into a very active research topic within the AI and database theory communities. While enriching data with an ontology offers many advantages (e.g. simplifying query formulation, integrating data from different sources, providing more complete answers to queries), it also renders the query answering task more computationally involved, spurring the development of new algorithmic techniques. The aim of this talk is to provide a gentle introduction to ontologies and ontology-mediated query answering, while also highlighting some recent results and research directions.
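The abstract notes that an ontology can provide more complete answers to queries. A minimal sketch of that idea, using a hypothetical example not taken from the talk: a single subclass axiom ("every Professor is a Researcher") lets a query for researchers also return individuals only asserted to be professors, via simple forward chaining.

```python
# Toy illustration of ontology-mediated query answering (hypothetical data).
# Facts are (individual, class) pairs; the ontology holds subclass axioms
# as (SubClass, SuperClass) pairs.

facts = {("alice", "Professor"), ("bob", "Researcher")}
ontology = {("Professor", "Researcher")}  # "every Professor is a Researcher"

def saturate(facts, ontology):
    """Apply subclass axioms until no new facts are derived (forward chaining)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (ind, cls) in list(derived):
            for (sub, sup) in ontology:
                if cls == sub and (ind, sup) not in derived:
                    derived.add((ind, sup))
                    changed = True
    return derived

def answer(query_class, facts, ontology):
    """Return all individuals that belong to query_class after saturation."""
    return sorted(ind for (ind, cls) in saturate(facts, ontology)
                  if cls == query_class)

# Without the ontology, only "bob" matches; with it, "alice" is also an answer.
print(answer("Researcher", facts, ontology))  # ['alice', 'bob']
```

Saturation-based (materialization) approaches like this are just one family of techniques; much of the research surveyed in such talks concerns rewriting the query instead of the data, which avoids modifying the database.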
Meghyn Bienvenu is a CNRS researcher and member of the LaBRI laboratory at the University of Bordeaux. Born in Canada, she obtained her undergraduate degree from the University of Toronto before moving to France to continue her studies at the University of Toulouse. Her PhD thesis, defended in 2009, was awarded the AFIA Prize for best French dissertation in artificial intelligence. Her research interests span a range of topics in knowledge representation and reasoning and database theory, with a main focus on description logic ontologies and their use in querying data. She currently leads an ANR AI Chair on the topic of intelligent handling of imperfect data. Bienvenu is an associate editor of ACM Transactions on Computational Logic and will serve as PC co-chair for KR 2021, the leading conference on knowledge representation and reasoning. Her research has been recognized by an invited Early Career Spotlight talk at IJCAI’16, the world’s premier AI conference, and the 2016 CNRS Bronze Medal in the area of computer science.
Melanie Herschel: Tackling data quality issues: on getting in shape, playing them offense or defense, and analyzing results
Quality issues are omnipresent in data, the foundation of many business operations and of any data analysis. Efforts to address these issues target different steps of the data analysis pipeline. To get data into the best possible shape for further use, methods that prevent erroneous data entry or misinterpretation are crucial. Processing the data can introduce further quality issues, which are best recognized and rectified early. Yet even with top-notch solutions for mitigating data quality issues, the final result should always be critically examined, which calls for techniques that help us better understand data-driven results.
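The first pipeline step mentioned above, catching wrong data entry, is often realized with rule-based validation. A minimal sketch with hypothetical rules and records (not from the talk): each record is checked against plausibility constraints before it enters the analysis pipeline.

```python
# Minimal rule-based data-quality check (hypothetical rules and records).

records = [
    {"id": 1, "age": 34, "email": "a@example.org"},
    {"id": 2, "age": -5, "email": "b@example.org"},   # implausible age
    {"id": 3, "age": 51, "email": "not-an-email"},    # malformed email
]

def violations(record):
    """Return a list of human-readable quality issues for one record."""
    issues = []
    if not (0 <= record["age"] <= 130):
        issues.append("age out of plausible range")
    if "@" not in record["email"]:
        issues.append("malformed email")
    return issues

# Map each problematic record's id to its issues.
flagged = {r["id"]: violations(r) for r in records if violations(r)}
print(flagged)  # {2: ['age out of plausible range'], 3: ['malformed email']}
```

Checks like these address only the entry step; as the abstract stresses, later processing can introduce new issues, so validation alone does not make the final result trustworthy.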
In this talk, I will introduce, through a series of anecdotes, the importance and difficulty of data quality in various disciplines and highlight some of our recent contributions to address specific problems at different steps of the data analysis pipeline. The discussion will highlight that poor data quality is and will continue to be a persistent opponent in an increasingly data-driven world, but pushing the limits to improve it is worthwhile, for technical reasons and beyond.
Melanie Herschel is a full professor of Data Engineering at the University of Stuttgart. She was previously an associate professor at Université Paris Sud. In her early career, she was a member of the research staff of the Database Systems group at the University of Tübingen, at IBM Research – Almaden, and at the Hasso-Plattner-Institut Potsdam. She has also held a secondary appointment as Visiting Research Professor at the National University of Singapore. She obtained her PhD from Humboldt University Berlin in 2008. Her research interests in data management include data quality, data integration, metadata management, and data exploration and analysis. She has participated in the organization of several conferences and workshops, notably as PC chair of EDBT 2019. She is an associate editor for the VLDB Journal and ACM/IMS Transactions on Data Science, and has regularly served as a reviewer for journals and conferences.
Data science is driven by a large number of data-related assets, such as datasets, algorithms, ML models, and processing systems. Although we all benefit from the latest results in data science, building a proper data science ecosystem requires a significant investment from organizations and individuals. As a result, only a few players can afford such investments, which leads to a small data science “world” dominating the latest technologies. This naturally causes lock-in effects and hinders features that require a flexible exchange of assets among users.
This talk presents Agora, our current effort to “democratise” data science. We are building a unified ecosystem that brings together data, algorithms, models, and computational resources and provides them to a broad audience. In particular, this talk presents the execution layer of Agora. I will talk about a series of works that allow us to: (i) leverage existing execution engines to run tasks efficiently; (ii) run tasks at different geo-distributed sites without violating the constraints/policies that each site may impose on its data; and (iii) securely execute tasks at large scale without leaking any data or code (application logic).
I shall conclude this talk with a roadmap of open problems on the way to a world-wide data processing layer, thereby taking a big step toward fulfilling our Agora vision.
Jorge Quiané is Principal Researcher at the DIMA group (TU Berlin) and Scientific Coordinator of the Berlin Institute for the Foundations of Learning and Data (BIFOLD). He is also Scientific Advisor at the IAM group (DFKI). Earlier in his career, he was a Senior Scientist at the Qatar Computing Research Institute (QCRI) and a Research Associate at Saarland University. Jorge’s research interests are in the broad area of scalable data management, including cross-platform data management and big data analytics. He has published numerous research papers on query and data processing as well as on novel system architectures. He also holds 5 patents in core database areas, such as join processing and data storage. He did his PhD in Computer Science at INRIA and the University of Nantes, France. He received an M.Sc. in Computer Science with a speciality in Networks and Distributed Systems from Joseph Fourier University, Grenoble, France. He also obtained, with highest honours, an M.Sc. in Computer Science from the National Polytechnic Institute, Mexico.