Our tutorial will consist of five sessions grouped into two parts. In the first part (the first four sessions), we will present the state of the art in link discovery; in the second part (the last session), participants will carry out link discovery tasks of growing complexity. Throughout the sessions, participants will be able to ask questions and interact with the presenters. The sessions will be as follows:
- Introduction to Link Discovery (15 min): In this session, we will present the basic link discovery problem as modelled in a large number of publications. Thereafter, we will present the two main challenges behind link discovery, i.e., the runtime and accuracy challenges. The goal here is to ensure that the audience understands the Link Discovery problem and the need for efficient and effective solutions.
- Efficient Algorithms for Link Discovery (45 min): In this session, we will focus on techniques for executing Link Discovery tasks efficiently. We will concentrate on reduction-ratio-optimal link discovery while also providing insights into other types of approaches such as blocking. The main goal here is to give the audience a taste of (1) the kinds of similarity measures used in Link Discovery and (2) how the approaches presented achieve good runtimes even on large datasets.
- Accuracy in Link Discovery (30 min): This session will cover the use of machine learning to find accurate specifications for a given Link Discovery problem. We will focus on genetic programming approaches, as they have been implemented in several tools, including SILK, LIMES and KnoFuss. We will present batch, active and unsupervised versions of the algorithm to familiarize the audience with these concepts.
- Benchmarking Link Discovery (30 min): In this session, we will present means to evaluate the performance of a Link Discovery system. In particular, we will focus on existing measures and benchmarks. We will present state-of-the-art results and point out current challenges for link discovery.
- Hands-on Session (60 min): In the hands-on session, we will present the audience with tasks of growing difficulty to familiarize them with the declarative Link Discovery framework LIMES. We will begin with a simple task, i.e., linking with one property and one class. We will then add preprocessing, several properties, property paths and calls to the machine-learning approaches.
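
To make the topics of the first sessions concrete, the following is a minimal Python sketch and not code from LIMES or any other tool named above. It assumes the common formulation of the basic link discovery problem: given a source set, a target set, a similarity measure and a threshold, return all pairs whose similarity reaches the threshold. The trigram Jaccard measure, the first-letter blocking key and all function names are hypothetical choices made only for illustration. The first-letter key can miss links that the naive comparison finds, so it does not stand in for the reduction-ratio-optimal approaches covered in the tutorial.

```python
from itertools import product


def trigrams(text):
    """Return the set of character trigrams of a padded, lower-cased string."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard(a, b):
    """Trigram-based Jaccard similarity in [0, 1] (illustrative measure)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def naive_links(source, target, threshold):
    """Compare every source label with every target label: |S| * |T| comparisons."""
    return {(s, t) for s, t in product(source, target)
            if jaccard(s, t) >= threshold}


def blocked_links(source, target, threshold):
    """Compare only labels that share a block key (here: the first letter).

    Fewer comparisons than naive_links, but pairs whose first letters
    differ are never compared, so some links may be lost.
    """
    blocks = {}
    for t in target:
        blocks.setdefault(t[:1].lower(), []).append(t)
    return {(s, t)
            for s in source
            for t in blocks.get(s[:1].lower(), [])
            if jaccard(s, t) >= threshold}


def precision_recall_f1(found, reference):
    """Standard precision, recall and F-measure of found links vs. a reference."""
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels, invented for this sketch.
    source = ["Leipzig", "Berlin", "Paris"]
    target = ["Leipzig", "berlin", "Pariss", "London"]
    links = blocked_links(source, target, threshold=0.5)
    reference = {("Leipzig", "Leipzig"), ("Berlin", "berlin")}
    print(sorted(links))
    print(precision_recall_f1(links, reference))
```

The naive variant illustrates the runtime challenge, since its cost grows with the product of the two dataset sizes. The evaluation function illustrates the kind of measure by which the accuracy of a link discovery result is typically judged.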