Contact:
borsmose@ruc.dk
PhD Course: Concept Analysis and Concept Based Retrieval
Course Plan
Lecturers: John Old, Uta Priss, Helge Dyvik, Lars Kai Hansen
Monday, May 10
Morning session chair: Per Anker Jensen
9:00 - 9:45 Welcome & Brief presentations (1 slide, 5 minutes) by:
Alessio Bosca, Torino Bonino Dario, Kendall Lister, Davide Martinenghi,
Jaak Simm, Antoine Doucet
Break
9:55 - 10:40 Presentation by:
Eija Airio,
Pirkko Saatsi,
Puay Leng Lee,
Kean Huat Soon,
Sotiris Rompas,
Jesper Vinther Christensen
Coffee break
11:00 - 11:45 Presentation by:
Gunn Inger Lyse,
Kadri Vider,
Olatz Ansa,
Jesper Matthiesen,
Rasmus Knappe,
Henrik Bulskov
Break
11:55 - 12:40 Presentation by:
Till Lech,
Lone Bo Sisseck,
Päivi Pasanen,
Henrik Oxhammar,
Frédéric Hallot,
Paulo Gottgtroy
Lunch
Afternoon session chair: Jørgen Fischer Nilsson
13:40 - 14:25 H. Dyvik: Meaning, semantic representations and translation
Break
14:35 - 15:20 U. Priss: Introduction to formal concept analysis
Coffee break
15:40 - 16:25 L. K. Hansen: Statistical aspects of context in multimedia:
Representations
Break
16:35 - 17:20 J. Old: The conceptual structure of Roget's Thesaurus,
a conceptual hierarchy
Tuesday, May 11
Morning session chair: Hanne Erdmann Thomsen
9:00 - 9:45 H. Dyvik: Sense individuation based on
a parallel corpus
Break
9:55 - 10:40 J. Old: Practical session 1: Web access to the database
version of Roget's Thesaurus
Coffee break
11:00 - 11:45 L. K. Hansen: The independent context hypothesis
Break
11:55 - 12:40 U. Priss: Practical 1: Formal concept analysis examples
Lunch
Afternoon session chair: Bodil Nistrup Madsen
13:40 - 14:25 H. Dyvik: Deriving semantic wordnets from a parallel
corpus
Break
14:35 - 15:20 J. Old: Visualization of semantic/conceptual data
Coffee break
15:40 - 16:25 L. K. Hansen: Text classification
Break
16:35 - 17:20 U. Priss: Linguistic applications of formal concept analysis
Wednesday, May 12
Morning session chair: Hanne Erdmann Thomsen
9:00 - 9:45 H. Dyvik: A toy semantic field
Break
9:55 - 10:40 J. Old: Practical session 2: Generation and visualization
of semantic/conceptual data
Coffee break
11:00 - 11:45 L. K. Hansen: Networks
Break
11:55 - 12:40 U. Priss: A semiotic-conceptual framework for ontologies
Lunch
Afternoon session chair: Bodil Nistrup Madsen
13:40 - 14:25 H. Dyvik: Evaluating the results; future perspectives
Break
14:35 - 15:20 J. Old
Coffee break
15:40 - 16:25 L. K. Hansen: Review and perspectives
Break
16:35 - 17:20 U. Priss:
19:00 Dinner at restaurant La Buca
Thursday, May 13
Morning session chair: Per Anker Jensen
9:30 - 10:30 Presentation of OntoQuery
Coffee break
11:00 - 12:30 Panel session with course teachers:
Summary, loose ends and cross-breeds; discussion with audience.
Lunch
13:30 - ca. 15 Closed project meeting
John Old, School of Computing, Napier University
Slides: Lecture 1, Practical 1, Lecture 2, Practical 2, Lecture 3
Lecture One:
The conceptual structure of Roget's Thesaurus, a conceptual hierarchy
This lecture will discuss the explicit organization of Roget's Thesaurus
(conceptual hierarchy/ontology; body of the text), and the internal structures
(synonymy versus polysemy; lexical (word) and conceptual (sense) relations;
and semantic neighbourhoods). The thesaurus structure will be contrasted
with dictionaries, WordNet (which was derived from Roget's Thesaurus),
and word-association data.
Practical Session One:
Web access to the database version of Roget's Thesaurus
Client-side Software:
Lecture Two:
Visualization of semantic/conceptual data
This lecture will focus on the visualization of hidden patterns and structures
within Roget's Thesaurus. Using Bryan's model of abstract thesauri, one
thread of analysis of implicit patterns in the semantics of Roget's Thesaurus
will be explored to illustrate the use of graphs, lattices and algorithmic
constraints in support of semantic analysis.
Practical Session Two:
Generation and visualization of semantic/conceptual data
Readings/Materials:
The conceptual structure of Roget's Thesaurus, a conceptual hierarchy
Web access to the database version of Roget's Thesaurus:
IR and browsing
Client-side Software: Conexp and Pajek
Generation and visualization of semantic/conceptual data
Lattices (Conexp/Anaconda)
Graphs (Pajek), and Roget 2000:
http://ella.slis.indiana.edu/~jold/Roget2000/graphics.htm
--------------
Uta Priss, School of Computing, Napier University
Slides: Lecture 1, Lecture 2, Lecture 3, Lecture 4
Lecture 1: Introduction to formal concept analysis
Formal concept analysis is a theory of data analysis which identifies
conceptual structures among data sets. It was introduced by Rudolf Wille
in 1982 and has since then grown rapidly. Its method of formal data analysis
has successfully been applied to many fields, such as medicine, psychology,
musicology, linguistic databases, library and information science, software
re-engineering, civil engineering, ecology, and others. A feature of formal
concept analysis is its capability of producing graphical visualizations
of the inherent structures among data. This lecture will provide an introduction
to the theory underlying formal concept analysis and show examples of
its applications.
Formal Concept Analysis Homepage at http://www.upriss.org.uk/fca/fca.html
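As a rough illustration of the kind of structure formal concept analysis identifies, the sketch below enumerates the formal concepts of a small context. A formal concept is a pair of a set of objects (the extent) and a set of attributes (the intent) that determine each other. The animals and attributes are hypothetical toy data, and the brute-force enumeration is only an assumption made for readability; it is not taken from the course software.

```python
from itertools import combinations

# Toy formal context (hypothetical data): each object mapped to its attributes.
CONTEXT = {
    "duck": {"flies", "swims", "lays eggs"},
    "eagle": {"flies", "lays eggs"},
    "trout": {"swims", "lays eggs"},
    "dog": {"has fur"},
}


def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def common_objects(attributes, context):
    """Objects that have every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Return all (extent, intent) pairs by closing every subset of objects.

    Brute force: exponential in the number of objects, so toy data only.
    """
    concepts = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    ordered = sorted(formal_concepts(CONTEXT),
                     key=lambda c: (len(c[0]), sorted(c[0])))
    for extent, intent in ordered:
        print(sorted(extent), "<->", sorted(intent))
```

Ordering the resulting concepts by inclusion of their extents gives the concept lattice, which is what the graphical visualizations mentioned above display.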
Practical 1: Formal concept analysis examples
In this practical session, examples of formal concept analysis will be explored.
The participants are asked to bring laptops with them to this session if
possible. A CD with formal concept analysis software and examples will be
distributed.
Lecture 2: Linguistic applications of formal concept analysis
Modelling and storage of lexical information is becoming increasingly
important for natural language processing tasks. This causes a growing
need for detailed lexical databases, which should preferably be automatically
constructed. This lecture describes the role that formal concept analysis
can play in the automated or semi-automated construction of lexical databases
from corpora. Formal concept analysis can be used to model hierarchical
components of lexical databases, such as hyponymy or type hierarchies.
Other semantic relations, such as meronymy, will also be discussed in
this lecture. No clear distinction between lexical databases and ontologies
will be made, as the structures and problems are similar for both.
Thus the structures discussed in this lecture also apply to ontology engineering.
Lecture 3: Ontology Visualisation
Ontologies and lexical databases usually contain large hierarchies
or networks. Visualisations of such structures are useful for
all stages of ontology development: from the design stages to the
end-user stages. But in general, it is a challenge to visualise
large hierarchies or networks. This lecture discusses different
visualisation techniques and provides examples of applications.
Lecture 4: A semiotic-conceptual framework for ontologies
Formal concept analysis is contrasted with other existing conceptual
formalisms, such as John Sowa's Conceptual Structures. But a more general
approach to knowledge representation should not stop at the conceptual
level, and instead should also consider aspects related to the naming
of concepts. A semiotic-conceptual framework, which combines conceptual
and sign-related aspects, provides insights into how ontologies evolve
and are shaped and interacted with by their human users. This lecture
argues that if such semiotic aspects are overlooked in ontology
design, phenomena such as namespaces and identities become difficult
to manage.
Readings/Materials:
Lecture 1: Introduction to formal concept analysis
Wolff, Karl Erich (1994).
A First Course in Formal Concept Analysis.
Proceedings SoftStat'93. Gustav Fischer Verlag.
Ganter, Bernhard; Wille, Rudolf (1997).
Applied Lattice Theory: Formal Concept Analysis.
(This introduction is only for people who are interested
in the mathematical details of FCA.)
Lecture 2: Linguistic applications of formal concept analysis
Priss, Uta.
Linguistic Applications of Formal Concept Analysis.
Proceedings of ICFCA 2003. (to appear)
Lecture 3: Ontology Visualisation
The first seven pages in the paper by
Fluit; Sabou; van Harmelen (2003).
Supporting User Tasks through Visualisation of Light-weight Ontologies
In: Staab; Studer (Eds.). Handbook on Ontologies in Information Systems,
Springer-Verlag.
Lecture 4: A semiotic-conceptual framework for ontologies
Conceptual Graphs website
-----------------------------------------
Helge Dyvik, Section for Linguistic Studies, University of Bergen
Semantic Mirrors: Thesaurus Derivation from Parallel Corpora
Slides: Dyvik's Slides
The lectures will present a method for the automatic extraction of wordnet-type
information from translational data. The basic insight behind the method
is that much information about the semantic relations among the words
in a language resides in the way in which the sets of their possible translations
into some other language overlap. Therefore, taking the translational
relation as a given, languages can serve as each other's "semantic mirrors".
The implemented method takes words with their sets of possible translations
in the corpus as input and outputs complex lattice structures displaying
relations of hyperonymy/hyponymy and near-synonymy among word senses in
each language. Thesaurus-like entries can then be derived from these lattice
structures.
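The overlap idea can be sketched on toy data. The code below is not the implemented Mirrors method; it is a simplified illustration that groups the translations of one word according to whether some other source word shares them, which hints at how separate senses might show up as separate groups. The word lists, the function names, and the grouping rule are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy data (Norwegian -> English); not taken from the course corpus.
TRANSLATIONS = {
    "tak": {"roof", "ceiling", "cover", "grip", "hold"},
    "hvelv": {"roof", "ceiling", "vault"},
    "dekke": {"cover", "roof"},
    "grep": {"grip", "hold", "grasp"},
}


def invert(translations):
    """Map each target-language word to the source words it translates."""
    inverse = defaultdict(set)
    for source, targets in translations.items():
        for target in targets:
            inverse[target].add(source)
    return dict(inverse)


def overlaps(word, translations):
    """For every other source word, the translations it shares with `word`."""
    own = translations[word]
    return {
        other: own & targets
        for other, targets in translations.items()
        if other != word and own & targets
    }


def group_translations(word, translations):
    """Partition the translations of `word` into overlap-linked groups.

    Two translations end up in the same group when some other source word
    has both of them as translations, directly or through a chain.
    """
    inverse = invert(translations)
    remaining = set(translations[word])
    groups = []
    while remaining:
        group = {remaining.pop()}
        frontier = set(group)
        while frontier:
            target = frontier.pop()
            for source in inverse[target] - {word}:
                linked = (translations[source] & translations[word]) - group
                group |= linked
                frontier |= linked
        remaining -= group
        groups.append(group)
    return groups


if __name__ == "__main__":
    for group in group_translations("tak", TRANSLATIONS):
        print(sorted(group))
```

On this toy input the translations of "tak" split into one group around "roof" and one around "grip", because no other source word links the two.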
In addition to presenting and demonstrating the method itself, the lectures
will discuss the relationship between translation and semantics. Furthermore,
we will look at a method for automatic word alignment of a parallel corpus,
and compare thesaurus entries derived from automatically aligned data
with entries derived from manually aligned data. Evaluation will also
be discussed.
--------------
Readings/Materials:
Another approach to the use of translational data for semantics:
Nancy Ide, Tomas Erjavec & Dan Tufis (2002):
Sense Discrimination with Parallel Corpora.
Proceedings of ACL'02 Workshop on Word Sense Disambiguation: Recent Successes and Future
Directions, Philadelphia, 54-60.
A presentation of the "Mirrors" approach
Helge Dyvik: Translations as a Semantic Knowledge Source. (2003)
A web-based demonstrator of the "Mirrors" method:
http://ling.uib.no/~helge/mirrwebguide.html
--------------
Lars Kai Hansen, Informatics and Mathematical Modelling, Technical University of Denmark
Slides: Lecture 1, Lecture 2, Lecture 3
Statistical aspects of context in multimedia
The lectures give an overview of statistical approaches to context
representation and detection in multimedia: text, images, sound, networks.
Content:
- Latent semantic analysis
- The independent context hypothesis
- Hierarchical clustering
- Web and other networks
- Search engine tools and methods
- Futuristic extrapolations: A vision for the information sciences
Lecture 1. Representations
- Data vs Features
- The curse of dimensionality
- Text representations: Bag-of-words, language features
- MPEG (Moving Picture Experts Group)
- Naive Bayes methods
- Latent semantic analysis (Principal components)
Lecture 2. The independent context hypothesis
- Supervised vs unsupervised learning
- Independent component analysis (ICA): the concept
- Statistical approaches to ICA
- Examples
Lecture 3. Text classification
- Statistical approaches: Clustering
- Unsupervised: Hierarchical clustering
- Supervised: Artificial neural networks
- Examples
Lecture 4. Networks
- Search engines
- Google's PageRank
- Authorities and hubs
- Social networks: power laws on the web
Lecture 5. Review and Perspectives
- Review of lectures 1-4
- Future: Human constraints in multi-modal search
- Future: Deep search
Readings/Materials:
Latent Semantic Analysis
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990):
Indexing by latent semantic analysis.
Journal of the American Society for Information Science, 41(6), 391-407.
MPEG (Moving Picture Experts Group)
http://www.chiariglione.org/mpeg/
Independent component analysis
J. Larsen, L. K. Hansen, T. Kolenda, F. Å. Nielsen: Independent
Component Analysis in Multimedia Modeling
ICA Matlab toolbox with text demos
http://mole.imm.dtu.dk/toolbox/ica/index.html
PageRank
http://citeseer.nj.nec.com/page98pagerank.html
Deep web
www.brightplanet.com/technology/deepweb.asp