7. 11. 2006 The architecture of a dictionary

Discussion of homework

  • definition: a short text consisting of two parts: the definiendum (what is to be defined) and the definiens, which consists of the genus proximum and the differentia specifica

  • hip hop: an urban kind of lifestyle (genus proximum: lifestyle; differentia specifica: urban)



Dictionary Information

  • Metadata: catalogue information about the production of the dictionary, intended for dictionary identification

  • Types of lexical information in lexical entries

  • form (appearance): spelling, pronunciation

  • structure (formulation): construction of words, place of words in larger constructions (sentences)

  • content (meaning): definition, relations with other words, examples



Organisation of lexical information

  • Semasiological dictionary: reader's dictionary, decoding dictionary

  • Onomasiological dictionary: writer's dictionary, encoding dictionary
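The difference between the two kinds of organisation can be sketched in a few lines of Python. This is only an illustration under assumptions: the toy lexicon of (word form, concept) pairs is invented here. The semasiological view goes from form to meaning in alphabetical order; the onomasiological view starts from a concept and collects the words that express it, as a thesaurus does.

```python
from collections import defaultdict

# Hypothetical toy lexicon: (word form, concept) pairs.
ENTRIES = [
    ("shirt", "clothing"),
    ("trousers", "clothing"),
    ("love", "emotion"),
    ("anger", "emotion"),
]

def semasiological(entries):
    """Reader's view: look up a form, get its meaning (alphabetical list)."""
    return sorted(entries)

def onomasiological(entries):
    """Writer's view: start from a concept, get the words expressing it."""
    by_concept = defaultdict(list)
    for word, concept in entries:
        by_concept[concept].append(word)
    return {concept: sorted(words) for concept, words in sorted(by_concept.items())}

print(semasiological(ENTRIES))
# [('anger', 'emotion'), ('love', 'emotion'), ('shirt', 'clothing'), ('trousers', 'clothing')]
print(onomasiological(ENTRIES))
# {'clothing': ['shirt', 'trousers'], 'emotion': ['anger', 'love']}
```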



Parts of a dictionary

  • Megastructure (overall structure)

  • Macrostructure (organisation of content)

  • Mesostructure (what links entries: words linked within definitions, cross-references)

  • Microstructure (organisation of lexical entries)



Megastructure

  • entire structure of the dictionary

  • front matter (metadata)

  • abbreviations and explanations of grammar, pronunciations, ...

  • the body of the dictionary (content)

  • back matter (references to printer, ...)

    Quiz

    Give examples of the kinds of information contained in each of these structure types

    Oxford Advanced Learner's dictionary

  • front matter: A S Hornby, seventh edition, chief editor: Sally Wehmeier

  • abbreviations, ...: abbr.: abbreviation, adj.: adjective, BrE: British English

  • the body of the dictionary: content from a to zygote

  • back matter: Oxford University Press 2005

     

    Macrostructure

  • organisation of lexical entries in the body of a dictionary

  • trees, networks, list

  • types of macrostructure : onomasiological, semasiological

     

    Quiz

  • Are semasiological macrostructures more like lists, trees or networks? -> Lists

     

    QUIZ: megastructure, macrostructure

  • What is the megastructure? (definition and examples: see above)

  • What is the macrostructure? (list: lexical order, e.g. all words beginning with "a" before the words beginning with "b"; network; tree: e.g. all words belonging to one word field)

  • What is a semasiological dictionary? (e.g. bilingual and monolingual dictionaries such as the "Oxford Advanced Learner's Dictionary")

  • What is an onomasiological dictionary? (e.g. a thesaurus)



Microstructure

  • consistent organisation of lexical information within lexical entries in the dictionary

  • Meaning: semantics, pragmatics

  • Structure: syntax (text, phrase), morphology (inflexion, word formation)

  • Appearance: form (spelling, orthography)

     


Quiz

  • How many different types of lexical information can you find?

  • spelling, pronunciation, genus proximum, differentia specifica, picture, synonyms/antonyms, translation, etymological information, ...

  • Is the microstructure of a semasiological dictionary typically a list, a tree or a network?

  • list

  • What kind of structure do the combined macrostructure and microstructure of a semasiological dictionary have?

  • Hierarchical structure, embedded list, table

  • And an onomasiological dictionary?

  • Tree structure


Mesostructure

  • set of relations between lexical entries and other entities such as other parts of the dictionary or a text corpus


QUIZ:

  • How do lexical entries relate to each other?

    When a lexical entry says "noun", this is implicitly a link to the entry for "noun".

  • How do lexical entries relate to the mini-grammar in the megastructure?

    For example, a number that refers to the table in the mini-grammar where the irregular verb can be found.

  • How do lexical entries relate to text corpora?

    Contextual definitions


Lexicon mesostructure

  • Data category subvectors (embedded lists): modality, grammar, object semantics

  • Description references: use of abbreviations for parts of speech, characterization of spelling

  • cross-references between related entries (co-hyponyms)

  • corpus references (concordance)
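Cross-references and description references can be pictured as pointers from one entry to another. The following is a minimal Python sketch, assuming a hypothetical mini-lexicon stored as a dictionary; it collects the links an entry makes and checks for "dangling" references whose target has no entry of its own, a typical consistency problem of a dictionary's mesostructure.

```python
# Hypothetical mini-lexicon: each entry may point to other entries.
LEXICON = {
    "noun": {"definition": "a word that names a person, place or thing", "refs": []},
    "shirt": {"pos": "noun", "definition": "a piece of clothing for the upper body",
              "refs": ["blouse"]},  # cross-reference to a co-hyponym
    "blouse": {"pos": "noun",
               "definition": "a piece of clothing for the upper body, worn by women",
               "refs": ["shirt"]},
}

def references(headword, lexicon):
    """Collect every entry that a given entry points to (its mesostructure links)."""
    entry = lexicon[headword]
    targets = list(entry["refs"])
    if "pos" in entry:  # description reference: the POS label is itself an entry
        targets.append(entry["pos"])
    return targets

def dangling(lexicon):
    """Return (source, target) pairs whose target has no entry of its own."""
    return [(head, target) for head in lexicon
            for target in references(head, lexicon) if target not in lexicon]

print(references("shirt", LEXICON))  # ['blouse', 'noun']
print(dangling(LEXICON))             # []
```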


  • What is the mesostructure of a dictionary? (see above)

  • Give examples of mesostructural elements concerning:

  • Types of information with reference to the sign model

    a picture next to the entry, for example for „shirt“

  • Linguistic description references

    noun, verb

  • Cross-references between related entries: small numbers indicating that there is another meaning for the same word form

    fitting room (NAmE also dressing room)

  • Corpus references
    mobile: Call me on my mobile.



Homework

  • Work out optimal answers to the quizzes ( see above)

  • Take one of your dictionaries, and describe in as much detail as possible its

  • megastructure

  • macrostructure

  • microstructure

  • mesostructure


Megastructure

  • Oxford Advanced Learner's dictionary

  • A S Hornby

  • 7th edition

  • chief editor: Sally Wehmeier

  • editor: Colin McIntosh, Joanna Turnbull

  • Phonetics Editor: Michael Ashby

  • Oxford University Press 2005

  • ISBN 978- ...

  • Cornelsen

  • Abbreviations, symbols, labels, ... used in the dictionary

  • key to verb patterns

  • Contents (abbreviations, ..., the dictionary, maps, coloured pages, reference section, ...)

Macrostructure

  • alphabetical order

  • list


Microstructure

  • definiendum/ orthography/ spelling

  • pronunciation

  • genus proximum

  • differentia specifica

  • picture (icon, model)

  • co-hyponym

  • antonym/synonym

  • (translation)

  • etymological information


Mesostructure

  • co-hyponyms

  • modality

  • grammar

  • object semantics

  • use of abbreviations for parts of speech

  • characterization of spelling

  • corpus references

14. 11. 2006 Lexical databases



Surface structure (appearance, rendering) of dictionaries

  • semasiological dictionary (reader's dictionary, decoding dictionary)

  • onomasiological dictionary (writer's dictionary, encoding dictionary)



An overview of surface structures



The deep structure of dictionaries

semasiological dictionary

  • basic form : table

  • rows: lexical entries with specific microstructure

  • columns: single types of lexical information

  • if the orthography or phonology is ambiguous:

  • either the item is repeated with the new information

  • or a sub-table is used

  • which one depends on the kind of ambiguity:

  • homonymy (homography, homophony)

  • polysemy

  • homonym

  • a word that has the same pronunciation and spelling as another word, but a different meaning.

    Example: The word stalk, meaning either part of a plant or to follow (someone) around.

  • Homograph

  • a word that has the same spelling as another word, but a different meaning. Example: The spelling to cleave may denote to adhere to or to divide or split.

  • Homophone

  • a word that has the same pronunciation as another word, but whose meaning and/or spelling are different. Example: All of to, too, and two, or there, their, and they’re ( http://en.wikipedia.org/wiki/Homonym )

  • polyseme

  • a word or phrase with multiple, related meanings. http://en.wikipedia.org/wiki/Polysemy
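The two ways of handling ambiguity in the table model can be sketched as follows. This is a Python illustration under assumptions: the entries and column names are made up. Homonyms (unrelated meanings) get separate rows distinguished by a homonym number, i.e. the item is repeated; a polyseme keeps a single row whose "senses" column holds a small sub-table of related meanings.

```python
# Hypothetical deep-structure table: one dict per row, one key per column.
TABLE = [
    {"headword": "stalk", "homonym": 1, "pos": "noun",
     "senses": ["the stem of a plant"]},
    {"headword": "stalk", "homonym": 2, "pos": "verb",
     "senses": ["to follow someone around"]},
    {"headword": "mouth", "homonym": 1, "pos": "noun",
     "senses": ["the opening in the face", "the place where a river enters the sea"]},
]

def lookup(headword, table):
    """Return all rows for a headword: several rows signal homonymy,
    several senses within one row signal polysemy."""
    return [row for row in table if row["headword"] == headword]

for row in lookup("stalk", TABLE):
    print(row["headword"], row["homonym"], row["pos"], row["senses"])
```

Repeating rows keeps every row flat and uniform, which suits a simple table; the sub-table keeps related senses together under one headword, which matches how learner's dictionaries number senses within one entry.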



Dictionary Information

  • Metadata: catalogue information about the production of the dictionary, intended for dictionary identification

  • Types of lexical information in dictionary entries:

  • FORM (cf. appearance), e.g. spelling, pronunciation

  • STRUCTURE (cf. formulation), e.g. construction of words, place of words in larger constructions (e.g. sentences)

  • CONTENT (cf. meaning): definition, relations with other words, examples



The task

Exercise: to understand what a database basically is, create a table with one of the following:

  • a list of your CDs (well, some of them), with name, artist, ...

  • a list of your friends, with names, addresses, etc.


Surname     First name    Birthday
Bechtloff   Corinna       20. 01. 1986
Bryczek     Natalia       04. 03. 1987
Höppner     Marie Luise   12. 09. 1986
Kerker      Kristina      14. 06. 1986
Schneider   Sarah         21. 06. 1987



Basic model of a table

  • table: a list of rows

  • row: a list of fields

  • column: a list of fields occupying the same position in every row
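The basic table model maps directly onto nested lists. Here is a minimal Python sketch using the friends table from the exercise (the variable names and the column heading "First name" for the second column are choices made for this illustration): the table is a list of rows, each row is a list of fields, and a column is obtained by taking the field at the same position from every row.

```python
HEADER = ["Surname", "First name", "Birthday"]
FRIENDS = [  # table = list of rows; row = list of fields
    ["Bechtloff", "Corinna", "20. 01. 1986"],
    ["Bryczek", "Natalia", "04. 03. 1987"],
    ["Höppner", "Marie Luise", "12. 09. 1986"],
    ["Kerker", "Kristina", "14. 06. 1986"],
    ["Schneider", "Sarah", "21. 06. 1987"],
]

def column(table, index):
    """A column: the field at the same position in every row."""
    return [row[index] for row in table]

print(column(FRIENDS, HEADER.index("First name")))
# ['Corinna', 'Natalia', 'Marie Luise', 'Kristina', 'Sarah']
```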



How to ...

... create tables in OpenOffice Writer / Microsoft Word

  • Table -> Insert -> Table -> choose the number of rows/columns -> create the table


How to ...

... create tables in MS Excel / OpenOffice Calc

  • start the program


The html table model


<html>
<head>
<title>Example of the HTML table model</title>
</head>
<body>
<!-- one table row (tr) = one lexical entry; one cell (td) = one type of lexical information -->
<table border="20">
<tr>
<td>love</td>
<td>noun</td>
<td>a feeling of strong affection</td>
</tr>
</table>
</body>
</html>
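Writing such markup by hand quickly becomes tedious for a whole lexicon. A minimal Python sketch of generating it from rows instead (the function name `html_table` is an invention for this example) emits one `<tr>` per entry and one `<td>` per field, escaping characters such as "<" and "&" with the standard library's `html.escape`.

```python
from html import escape

def html_table(rows, border=1):
    """Render rows (lists of strings) as an HTML table: one <tr> per row, one <td> per field."""
    lines = [f'<table border="{border}">']
    for row in rows:
        cells = "".join(f"<td>{escape(field)}</td>" for field in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)

print(html_table([["love", "noun", "a feeling of strong affection"]]))
```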

21. 11. 2006 Lexicon Data and their Structure



Lexicon structures and their data types

  • Microstructure

  • number of lexicon articles/entries/records

  • order of DatCats (data categories)

  • Mesostructure

  • Interrelation of lexicon entries

  • relation to external information

  • Macrostructure

  • order of lexicon entries

  • selection of sort key

  • sorting order is not trivial! (cf. languages that are only spoken -> IPA)


    Sorting is NOT trivial, example "@"

  • you would expect "h@me" close to the word "home"

  • you would expect "intern@t" close to the word "internet"

  • you would expect "@" (as in "@ home") close to the word "at"

    so where do you sort "@"?
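A small Python experiment shows why there is no obvious answer. Plain code-point order, which is what a naive computer sort does, puts "@" (U+0040) before every lowercase letter. The alternative key below, which files "@" as if it were spelled "at", is purely an assumption for illustration and not a lexicographic standard.

```python
WORDS = ["about", "at", "home", "@home", "h@me"]

# Plain code-point order: "@" precedes all lowercase letters,
# so "@home" lands at the very beginning, before "about".
plain = sorted(WORDS)

def at_as_word(word):
    """Hypothetical sort key: file "@" as if it were spelled "at"."""
    return word.replace("@", "at")

by_reading = sorted(WORDS, key=at_as_word)

print(plain)       # ['@home', 'about', 'at', 'h@me', 'home']
print(by_reading)  # ['about', 'at', '@home', 'h@me', 'home']
```

The key moves "@home" next to "at", but "h@me" (read as "o") and "intern@t" (read as "e") would each need a different reading of the same character, so any single sorting rule is a compromise.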


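German Haus ("house"), singular -> plural (nominative, genitive, dative, accusative):

Haus -> Häuser

Hauses -> Häuser

Hause -> Häusern

Haus -> Häuser (which form would you find in a dictionary? -> Haus)


Latin a-declension

flamm-a, -ae, -ae, -am, -a; -ae, -arum, -is, -as, -is (singular: nominative, genitive, dative, accusative, ablative; plural in the same order. Which form would you find here? -> flamma)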
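To find the dictionary entry for an inflected form, a lookup tool has to map the form back to its lemma. One simple way, assumed here for illustration rather than taken from the lecture, is a full-form table listing every inflected form with its headword; real systems usually derive such mappings with a morphological analyser instead of listing them by hand.

```python
# Hypothetical full-form table: inflected form -> lemma (dictionary form).
FORMS = {
    "haus": "Haus", "hauses": "Haus", "hause": "Haus",
    "häuser": "Haus", "häusern": "Haus",
    "flamma": "flamma", "flammae": "flamma", "flammam": "flamma",
    "flammarum": "flamma", "flammis": "flamma", "flammas": "flamma",
}

def lemma(form):
    """Map an inflected word form to the headword under which a dictionary lists it."""
    return FORMS.get(form.lower())  # None if the form is unknown

print(lemma("Häusern"))    # Haus
print(lemma("flammarum"))  # flamma
```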


Microstructure

  • words (in most dictionaries; picture dictionaries are the exception)

  • grammatical information: syntax

  • part of speech (POS)

  • inflectional class

  • valence (which verbs take how many objects: transitive, intransitive)

  • representation of meaning (formats differ)

  • semantics

  • definition

  • corpus reference := usage examples



Detour: CORPUS

-> collection of language material

  • texts

  • transcripts

  • speech (transcription in IPA)

  • examples: Oxford corpus, Longman corpus

-> with additional information

  • part of speech

  • lemma (de-grammaticalized form of a word, i.e. its base or citation form)

  • transcription

  • annotations

-> with a specific structure

  • interlinear glossing

  • special markup




Other types of lexicons


  • Word frequency lexicon

  • ordered by frequency, the most frequent word first

  • Lexicon of "phrasal verbs"

  • by part of speech and a special structure

  • rhyming lexicon

  • by word ending

  • picture lexicon

  • by prototype
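Two of these orderings are easy to compute from a word list. In the following Python sketch the mini-corpus is invented: the frequency lexicon is ordered by count, most frequent first, and the rhyming lexicon sorts the words alphabetically from the end (by the reversed word), so that words with the same ending land next to each other.

```python
from collections import Counter

TEXT = "the cat sat on the mat and the dog sat on the log"  # hypothetical mini-corpus
tokens = TEXT.split()

# Word frequency lexicon: most frequent word first.
frequency_lexicon = Counter(tokens).most_common()

# Rhyming lexicon: sort alphabetically on the reversed word,
# so that words with the same ending end up next to each other.
rhyming_lexicon = sorted(set(tokens), key=lambda word: word[::-1])

print(frequency_lexicon[:3])
print(rhyming_lexicon)
```

In the rhyming order, "dog"/"log" and "cat"/"mat"/"sat" end up as neighbours, which is exactly what a rhyming lexicon is for.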




Problematic issues in lexicography


  • ambiguity

  • synonyms (two word forms, same meaning)

  • polysemy (one word form, two or more (slightly) different meanings)

  • homonyms (one word form, completely different meanings)


  • word search

  • languages with inflectional prefixes

  • orthographic ambiguity

  • picture lexicons?

     

  • Language change

  • new words

  • new meaning



  • Solutions to problems

  • ambiguity: enumeration

  • search word: "arbitrary" definition

  • language change: new edition

  • more fundamental solutions




Methods of creating lexicons

  • introspection

  • looking inside (by a trained linguist)

  • reflecting on one's own language use

  • "social" filter: relevance, importance, adequacy

  • Questionnaire

  • in comparative linguistics

  • typology

  • unknown language -> picture dictionary

  • pointing at pictures (pointing may be considered rude in some countries)

     

  • requirements and limitations

  • intended use: researching morphology, use in computer systems, translation

  • intended user group: experts, laypeople, translators, linguists

  • intended coverage: general, special purpose

  • available sources: availability of language experts (native speakers)

  • example questionnaire:

  • Asking questions for translation, explanation

  • Social filters apply

  • http://www.spectrum.unibielefeld.de/~ttrippel/htmd/questionnaire_short.html


     

  • corpus

     

     


Corpus based lexicon creation

  • "reflect the evidence"

  • include "words" found

  • exclude items not in corpus

  • based on corpora

  • list all words: wordlist

  • words in context: concordance

  • distribution analysis: HMM (hidden Markov models)

  • flat tabular lexicon

  • generalizations in the lexicon

  • declarative lexicons
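The first two corpus-based steps, the wordlist and the concordance, can be sketched in Python. Both the function names and the one-sentence corpus are illustrative assumptions. The wordlist records every word form actually found, so items absent from the corpus are excluded automatically; the concordance shows each occurrence of a keyword in context (KWIC) with a few words on either side.

```python
from collections import Counter

def wordlist(tokens):
    """All word forms found in the corpus, with their frequencies."""
    return Counter(tokens)

def concordance(tokens, keyword, width=2):
    """Keyword in context (KWIC): each occurrence with `width` words on either side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}")
    return lines

corpus = "call me on my mobile or leave a message on my desk".split()
print(sorted(wordlist(corpus)))
for line in concordance(corpus, "my"):
    print(line)
# me on [my] mobile or
# message on [my] desk
```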




Hierarchy of lexicon and corpus types



Corpus based lexicon creation application

  • SIL Toolbox

  • Summer Institute of Linguistics

  • famous for fieldwork tools

  • language database: www.ethnologue.org

  • previously named "Shoebox"

  • future: FieldWorks

  • Interlinearization of text

  • one line "base" text

  • one line gloss

  • one line morphology

  • ....




Lexicon Database Applications

  • Lists

  • Table

  • Tables

  • Relational Database Management Systems (RDBMS)

  • samples

  • Corpus based lexicon management

  • Graph based lexicons




Relational Model for a Lexicon

  • table structures

  • efficient storage and retrieval in Relational Database Management Systems (RDBMS)

  • often used for technological applications

  • used for some web based lexicons

  • translation = mapping of two different columns

  • example: http://dict.tu-chemnitz.de
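The idea of "translation = mapping of two different columns" can be tried out with SQLite, which ships with Python's standard library. This is only a sketch: the table name, column names and word pairs are assumptions, not the schema of the dictionary linked above. Looking up one column and returning the other translates in one direction; swapping the columns translates back.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE translation (german TEXT, english TEXT)")
con.executemany(
    "INSERT INTO translation VALUES (?, ?)",
    [("Haus", "house"), ("Haus", "home"), ("Liebe", "love"), ("Hemd", "shirt")],
)

def translate(word):
    """German -> English: look up the first column, return the second."""
    rows = con.execute(
        "SELECT english FROM translation WHERE german = ? ORDER BY english", (word,))
    return [english for (english,) in rows]

def translate_back(word):
    """English -> German: the same table, read in the other direction."""
    rows = con.execute(
        "SELECT german FROM translation WHERE english = ? ORDER BY german", (word,))
    return [german for (german,) in rows]

print(translate("Haus"))       # ['home', 'house']
print(translate_back("love"))  # ['Liebe']
```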




Graph based lexicon

  • Lexical information = nodes in a graph

  • microstructure = (labeled) arcs between nodes

  • cross-references = arcs between nodes

  • mesostructure = reference to external knowledge

  • macrostructure = access structure, starting at each node
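A graph-based lexicon can be sketched in Python with an adjacency list of labelled arcs. The node names, arc labels and the placeholder corpus node are hypothetical. Microstructure and cross-references are arcs between nodes, and the macrostructure becomes an access path: any node can serve as a starting point, and everything reachable from it is "in the entry".

```python
from collections import defaultdict

# Nodes are pieces of lexical information; arcs carry a label.
arcs = defaultdict(list)

def add_arc(source, label, target):
    arcs[source].append((label, target))

add_arc("shirt", "pos", "noun")                    # microstructure
add_arc("shirt", "pronunciation", "ʃɜːt")          # microstructure
add_arc("shirt", "co-hyponym", "blouse")           # cross-reference
add_arc("blouse", "co-hyponym", "shirt")           # cross-reference
add_arc("shirt", "corpus", "corpus example #1")    # hypothetical external reference

def reachable(start):
    """Access structure: every node reachable from the chosen start node."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for _label, target in arcs.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

print(sorted(reachable("blouse")))
```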



Summary

  • Lexicon structures and data types

  • microstructure data types

  • different macrostructures

  • Lexicon creation

  • questionnaire

  • corpus

  • Lexicon representation formats

  • RDBMS

  • graphs