Contact:
borsmose@ruc.dk

PhD Course: Concept Analysis and Concept Based Retrieval
Course Plan

Lectures: John Old -- Uta Priss -- Helge Dyvik -- Lars Kai Hansen

Monday, May 10

Morning session chair: Per Anker Jensen
9:00 - 9:45 Welcome & Brief presentations (1 slide, 5 minutes) by: Alessio Bosca, Torino Bonino Dario, Kendall Lister, Davide Martinenghi, Jaak Simm, Antoine Doucet
Break
9:55 - 10:40 Presentation by: Eija Airio, Pirkko Saatsi, Puay Leng Lee, Kean Huat Soon, Sotiris Rompas, Jesper Vinther Christensen
Coffee break
11:00 - 11:45 Presentation by: Gunn Inger Lyse, Kadri Vider, Olatz Ansa, Jesper Matthiesen, Rasmus Knappe, Henrik Bulskov
Break
11:55 - 12:40 Presentation by: Till Lech, Lone Bo Sisseck, Päivi Pasanen, Henrik Oxhammar, Frédéric Hallot, Paulo Gottgtroy
Lunch
Afternoon session chair: Jørgen Fischer Nilsson
13:40 - 14:25 H. Dyvik: Meaning, semantic representations and translation
Break
14:35 - 15:20 U. Priss: Introduction to formal concept analysis
Coffee break
15:40 - 16:25 L. K. Hansen: Statistical aspects of context in multimedia: Representations
Break
16:35 - 17:20 J. Old: The conceptual structure of Roget’s Thesaurus, a conceptual hierarchy

Tuesday, May 11

Morning session chair: Hanne Erdmann Thomsen
9:00 - 9:45 H. Dyvik: Sense individuation based on a parallel corpus
Break
9:55 - 10:40 J. Old: Practical session 1: Web access to the database version of Roget’s Thesaurus
Coffee break
11:00 - 11:45 L. K. Hansen: The independent context hypothesis
Break
11:55 - 12:40 U. Priss: Practical 1: Formal concept analysis examples
Lunch
Afternoon session chair: Bodil Nistrup Madsen
13:40 - 14:25 H. Dyvik: Deriving semantic wordnets from a parallel corpus
Break
14:35 - 15:20 J. Old: Visualization of semantic/conceptual data
Coffee break
15:40 - 16:25 L. K. Hansen: Text classification
Break
16:35 - 17:20 U. Priss: Linguistic applications of formal concept analysis

Wednesday, May 12

Morning session chair: Hanne Erdmann Thomsen
9:00 - 9:45 H. Dyvik: A toy semantic field
Break
9:55 - 10:40 J. Old: Practical session 2: Generation and visualization of semantic/conceptual data
Coffee break
11:00 - 11:45 L. K. Hansen: Networks
Break
11:55 - 12:40 U. Priss: A semiotic-conceptual framework for ontologies
Lunch
Afternoon session chair: Bodil Nistrup Madsen
13:40 - 14:25 H. Dyvik: Evaluating the results; future perspectives
Break
14:35 - 15:20 J. Old
Coffee break
15:40 - 16:25 L. K. Hansen: Review and perspectives
Break
16:35 - 17:20 U. Priss:

19:00 Dinner at restaurant La Buca

Thursday, May 13

Morning session chair: Per Anker Jensen
9:30 - 10:30 Presentation of OntoQuery
Coffee break
11:00 - 12:30 Panel session with course teachers:
Summary, loose ends and cross-breeds; discussion with audience.
Lunch
13:30 - ca. 15:00 Closed project meeting

John Old, School of Computing, Napier University

Slides: Lecture 1, Practical 1, Lecture 2, Practical 2, Lecture 3

Lecture One:
The conceptual structure of Roget's Thesaurus, a conceptual hierarchy

This lecture will discuss the explicit organization of Roget's Thesaurus (conceptual hierarchy/ontology; body of the text), and the internal structures (synonymy versus polysemy; lexical (word) and conceptual (sense) relations; and semantic neighbourhoods). The thesaurus structure will be contrasted with dictionaries, WordNet (which was derived from Roget's Thesaurus), and word-association data.

Practical Session One:

Web access to the database version of Roget's Thesaurus
Client-side Software: Conexp and Pajek

Lecture Two:
Visualization of semantic/conceptual data

This lecture will focus on the visualization of hidden patterns and structures within Roget's Thesaurus. Using Bryan's model of abstract thesauri, one thread of analysis of implicit patterns in the semantics of Roget's Thesaurus will be explored to illustrate the use of graphs, lattices and algorithmic constraints in support of semantic analysis.

Practical Session Two:
Generation and visualization of semantic/conceptual data

Readings/Materials:

The conceptual structure of Roget's Thesaurus, a conceptual hierarchy
Web access to the database version of Roget's Thesaurus: IR and browsing
Client-side Software: Conexp and Pajek

Generation and visualization of semantic/conceptual data
Lattices (Conexp/Anaconda), graphs (Pajek), and Roget 2000: http://ella.slis.indiana.edu/~jold/Roget2000/graphics.htm

--------------

Uta Priss, School of Computing, Napier University

Slides: Lecture 1, Lecture 2, Lecture 3, Lecture 4

Lecture 1: Introduction to formal concept analysis

Formal concept analysis is a theory of data analysis which identifies conceptual structures among data sets. It was introduced by Rudolf Wille in 1982 and has since then grown rapidly. Its method of formal data analysis has successfully been applied to many fields, such as medicine, psychology, musicology, linguistic databases, library and information science, software re-engineering, civil engineering, ecology, and others. A feature of formal concept analysis is its capability of producing graphical visualizations of the inherent structures among data. This lecture will provide an introduction to the theory underlying formal concept analysis and show examples of its applications.
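To make the idea concrete: a formal context lists which objects have which attributes, and a formal concept is a pair (extent, intent) in which the extent is exactly the set of objects sharing every attribute of the intent, and the intent is exactly the set of attributes common to every object of the extent. The following is a minimal sketch only, not material from the lecture. The toy context and all function names are assumptions made for illustration, and the brute-force enumeration over all object subsets is only practical for very small contexts.

```python
from itertools import combinations

# Toy formal context (invented for illustration): object -> set of attributes.
CONTEXT = {
    "duck": {"flies", "swims"},
    "eagle": {"flies", "hunts"},
    "shark": {"swims", "hunts"},
}

def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    result = set().union(*context.values())
    for obj in objects:
        result &= context[obj]
    return result

def objects_having(attributes, context):
    """Objects that have every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    objs = sorted(context)
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

if __name__ == "__main__":
    ordered = sorted(formal_concepts(CONTEXT), key=lambda c: (len(c[0]), sorted(c[0])))
    for extent, intent in ordered:
        print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by inclusion of their extents gives the concept lattice that tools for formal concept analysis draw as a line diagram.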

Formal Concept Analysis Homepage at http://www.upriss.org.uk/fca/fca.html

Practical 1: Formal concept analysis examples

In this practical session, examples of formal concept analysis will be explored. The participants are asked to bring laptops with them to this session if possible. A CD with formal concept analysis software and examples will be distributed.

Lecture 2: Linguistic applications of formal concept analysis

Modelling and storage of lexical information is becoming increasingly important for natural language processing tasks. This creates a growing need for detailed lexical databases, which should preferably be constructed automatically. This lecture describes the role that formal concept analysis can play in the automated or semi-automated construction of lexical databases from corpora. Formal concept analysis can be used to model hierarchical components of lexical databases, such as hyponymy or type hierarchies. Other semantic relations, such as meronymy, will also be discussed in this lecture. No clear distinction between lexical databases and ontologies will be made, as the structures and problems of the two are similar. Thus the structures discussed in this lecture also apply to ontology engineering.

Lecture 3: Ontology Visualisation

Ontologies and lexical databases usually contain large hierarchies or networks. Visualisations of such structures are useful for all stages of ontology development: from the design stages to the end-user stages. But in general, it is a challenge to visualise large hierarchies or networks. This lecture discusses different visualisation techniques and provides examples of applications.

Lecture 4: A semiotic-conceptual framework for ontologies

Formal concept analysis is contrasted with other existing conceptual formalisms, such as John Sowa's Conceptual Structures. But a more general approach to knowledge representation should not stop at the conceptual level, and instead should also consider aspects related to the naming of concepts. A semiotic-conceptual framework, which combines conceptual and sign-related aspects, provides insights into how ontologies evolve and how they are shaped and interacted with by their human users. It is argued in this lecture that if such semiotic aspects are overlooked in ontology design, then phenomena such as namespaces and identities become difficult to manage.

Readings/Materials:

Lecture 1: Introduction to formal concept analysis
Wolff, Karl Erich (1994). A First Course in Formal Concept Analysis. Proceedings SoftStat'93. Gustav Fischer Verlag.
Ganter, Bernhard; Wille, Rudolf (1997). Applied Lattice Theory: Formal Concept Analysis. (This introduction is only for people who are interested in the mathematical details of FCA.)

Lecture 2: Linguistic applications of formal concept analysis
Priss, Uta. Linguistic Applications of Formal Concept Analysis. Proceedings of ICFCA 2003. (to appear)

Lecture 3: Ontology Visualisation
The first seven pages in the paper by Fluit; Sabou; van Harmelen (2003). Supporting User Tasks through Visualisation of Light-weight Ontologies. In: Staab; Studer (Eds.). Handbook on Ontologies in Information Systems, Springer-Verlag.

Lecture 4: A semiotic-conceptual framework for ontologies
Conceptual Graphs website

-----------------------------------------

Helge Dyvik, Section for Linguistic Studies, University of Bergen

Semantic Mirrors: Thesaurus Derivation from Parallel Corpora

Slides: Dyvik's Slides

The lectures will present a method for the automatic extraction of wordnet-type information from translational data. The basic insight behind the method is that much information about the semantic relations among the words in a language resides in the way in which the sets of their possible translations into some other language overlap. Therefore, taking the translational relation as a given, languages can serve as each other's "semantic mirrors". The implemented method takes words with their sets of possible translations in the corpus as input and outputs complex lattice structures displaying relations of hyperonymy/hyponymy and near-synonymy among word senses in each language. Thesaurus-like entries can then be derived from these lattice structures.
In addition to presenting and demonstrating the method itself, the lectures will discuss the relationship between translation and semantics. Furthermore, we will look at a method for automatic word alignment of a parallel corpus, and compare thesaurus entries derived from automatically aligned data with entries derived from manually aligned data. Evaluation will also be discussed.
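The central observation, that overlapping translation sets signal semantic relatedness, can be illustrated with a very small sketch. This is not the "Mirrors" method itself, which builds lattice structures over word senses. It only scores how strongly the translation sets of two words overlap, using the Jaccard coefficient as a stand-in measure. The translation data and the function names are invented for illustration.

```python
# Toy translation sets (invented for illustration, not corpus data):
# each source-language word maps to the target-language words it was translated as.
TRANSLATIONS = {
    "big": {"stor", "diger"},
    "large": {"stor", "omfattende"},
    "great": {"stor", "flott"},
    "nice": {"fin", "flott"},
}

def overlap(word_a, word_b, translations):
    """Jaccard overlap of two translation sets (0.0 = disjoint, 1.0 = identical)."""
    a, b = translations[word_a], translations[word_b]
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def related_words(word, translations):
    """Other words sharing at least one translation, strongest overlap first."""
    scores = {
        other: overlap(word, other, translations)
        for other in translations
        if other != word
    }
    return sorted(
        ((other, score) for other, score in scores.items() if score > 0),
        key=lambda pair: (-pair[1], pair[0]),
    )

if __name__ == "__main__":
    for other, score in related_words("big", TRANSLATIONS):
        print(f"big ~ {other}: {score:.2f}")
```

In this toy data "big" is related to "large" and "great" through the shared translation "stor", while "nice" is reachable only indirectly via "great".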

--------------

Readings/Materials:

Another approach to the use of translational data for semantics:
Nancy Ide, Tomas Erjavec & Dan Tufis (2002): Sense Discrimination with Parallel Corpora. Proceedings of ACL'02 Workshop on Word Sense Disambiguation: Recent Successes and Future Directions, Philadelphia, 54-60.

A presentation of the "Mirrors" approach
Helge Dyvik: Translations as a Semantic Knowledge Source. (2003)

A web-based demonstrator of the "Mirrors" method: http://ling.uib.no/~helge/mirrwebguide.html


Lars Kai Hansen, Informatics and Mathematical Modelling, Technical University of Denmark

Slides: Lecture 1, Lecture 2, Lecture 3


Statistical aspects of context in multi-media

The purpose is to give an overview of statistical approaches to context representation and detection in multi-media: text, images, sound, networks.

Content:

  • Latent semantic analysis
  • The independent context hypothesis
  • Hierarchical clustering
  • Web and other networks
  • Search engine tools and methods
  • Futuristic extrapolations: A vision for the information sciences

Lecture 1. Representations

  • Data vs Features
  • The curse of dimensionality
  • Text representations: Bag-of-words, language features
  • MPEG (moving pictures expert group)
  • Naive Bayes methods
  • Latent semantic analysis (Principal components)
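As a rough illustration of the bag-of-words representation listed above, the sketch below turns each text into a vector of word counts and compares two texts by the cosine of the angle between their vectors. It is a minimal example under simplifying assumptions (lower-casing, ASCII letters only, no stop-word removal or weighting), and the function names are chosen here for illustration.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-case the text, keep runs of ASCII letters, and count each word."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * bag_b[word] for word, count in bag_a.items())
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    a = bag_of_words("The cat sat")
    b = bag_of_words("The cat ran")
    print(cosine_similarity(a, b))
```

Word order is discarded entirely in this representation. Latent semantic analysis starts from the same kind of count vectors, collected into a term-document matrix, and then reduces their dimensionality.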

Lecture 2. The independent context hypothesis

  • Supervised vs unsupervised learning
  • Independent component analysis (ICA): the concept
  • Statistical approaches to ICA
  • Examples

Lecture 3. Text classification

  • Statistical approaches: Clustering
  • Unsupervised: Hierarchical clustering
  • Supervised: Artificial neural networks
  • Examples
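Hierarchical clustering can be sketched in a few lines. The example below is an assumption-laden illustration rather than the lecture's own material: it uses agglomerative clustering with single linkage and Euclidean distance, starts with every point in its own cluster, and repeatedly merges the two closest clusters until the requested number remains. The function name and the toy points are hypothetical.

```python
def single_linkage(points, num_clusters):
    """Agglomerative clustering of points (tuples of numbers) with single linkage."""

    def dist(a, b):
        # Euclidean distance between two points of equal dimension.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    clusters = [[p] for p in points]
    target = max(num_clusters, 1)
    while len(clusters) > target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # j > i, so removing cluster j does not shift the position of cluster i.
        clusters[i].extend(clusters.pop(j))
    return clusters

if __name__ == "__main__":
    toy_points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
    print(single_linkage(toy_points, 3))
```

Recording the order of the merges, rather than stopping at a fixed number of clusters, yields the tree (dendrogram) that gives the method its name.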

Lecture 4. Networks

  • Search engines
  • Google's PageRank
  • Authorities and hubs
  • Social networks: power laws on the web
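For readers new to link analysis, the following is a minimal sketch of PageRank computed by power iteration on a small link graph. It is a textbook-style illustration, not the production algorithm of any search engine. The damping factor 0.85 is the commonly quoted value, while the function name, the fixed iteration count and the toy graph are assumptions made here.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration on a dict mapping each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each page passes an equal share of its damped rank to every target.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(toy_graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

In the toy graph, page "c" collects links from three other pages and ends up with the highest score, while "d", which nobody links to, keeps only the baseline share.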

Lecture 5. Review and Perspectives

  • Review of lectures 1-4
  • Future: Human constraints in multi-modal search
  • Future: Deep search

Readings/Material

Latent Semantic Analysis
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990): Indexing by latent semantic analysis, Journal of the American Society for Information Science, 41(6), 391-407.

MPEG Moving pictures expert group
http://www.chiariglione.org/mpeg/

Independent component analysis
J. Larsen, L. K. Hansen, T. Kolenda, F. Å. Nielsen: Independent Component Analysis in Multimedia Modeling

ICA Matlab toolbox with text demos
http://mole.imm.dtu.dk/toolbox/ica/index.html

PageRank
http://citeseer.nj.nec.com/page98pagerank.html

Deep web
www.brightplanet.com/technology/deepweb.asp