Lexical data are stored in three separate databases for Dutch, English, and German. The Dutch database, version N3.1, was released in March 1990 and contains information on 381,292 present-day Dutch wordforms, corresponding to 124,136 lemmata.

SGML-encoded text files: the text of the Cambridge International Dictionary of English CD-ROM, the English Pronouncing Dictionary, the Cambridge Dictionary of American English, the Cambridge International Dictionary of Idioms, the Cambridge International Dictionary of Phrasal Verbs, and the Word Routes/Selector series of parallel bilingual mini-thesauri in French, Spanish, Portuguese, Italian, Greek and Catalan, plus sound files from the CIDE CD-ROM.

Links (many dead ones!) to on-line dictionaries, including parallel/multilingual ones.

A collection of pronunciations captured in individual audio files for more than 50,000 of the most common words in English (the words were extracted from newswire and telephone conversation).

A machine-readable pronunciation dictionary for North American English that contains over 125,000 words and their ASCII phonemic transcriptions.

Pretty good listing of lexicons and electronic dictionaries.

Hundreds of dictionaries for more than 260 languages.

A bookmarks page by the Special Interest Group on the Lexicon of the Association for Computational Linguistics.

Roget’s Thesaurus (1911 edition) in Java, designed for Natural Language Processing. It includes four examples of NLP applications: (1) detecting lexical chains in text, (2) determining semantic distance between words and phrases, (3) clustering words based on their meaning, and (4) solving a word quiz.

Comprehensive listing of on-line dictionaries.

V.Roget’s Thesaurus as an Electronic Lexical Knowledge Base