Identification and Disambiguation of Cognates, False Friends, and Partial Cognates Using Machine Learning Techniques

Oana Frunza, Diana Inkpen

Abstract


Cognates are words in different languages that have similar spelling and meaning. They can help second-language learners with vocabulary expansion and reading comprehension. Learners must also watch for false friends: pairs of words that appear similar but have different meanings. Partial cognates are pairs of words in two languages that have the same meaning in some but not all contexts. Detecting the actual meaning of a partial cognate in context can be useful for Machine Translation tools and for Computer-Assisted Language Learning tools. In this article we present a method to automatically classify a pair of words as cognates or false friends. We use several measures of orthographic similarity as features for classification. We study the impact of selecting different features, averaging them, and combining them through machine learning techniques. We also present a supervised and a semi-supervised method to disambiguate partial cognates between two languages. The methods applied to the partial cognate disambiguation task use only automatically labeled data, so they can be applied to other pairs of languages as well. We also show that our methods perform well when using corpora from different domains. We applied all our methods to French and English.
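To make the idea of orthographic similarity features concrete, the following is a minimal sketch in Python. The abstract does not name the measures used, so the three chosen here (normalized edit distance, longest common subsequence ratio, and a Dice coefficient over character bigrams) are assumptions, as are all function names. The sketch shows only how a word pair might be turned into a feature vector and how the features might be averaged; it is not the authors' implementation.

```python
from typing import Dict, Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def bigrams(word: str) -> Set[str]:
    """Set of distinct character bigrams in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def similarity_features(w1: str, w2: str) -> Dict[str, float]:
    """Hypothetical feature vector for a word pair; every value lies in [0, 1]."""
    a, b = w1.lower(), w2.lower()
    longest = max(len(a), len(b)) or 1  # avoid division by zero on empty input
    ba, bb = bigrams(a), bigrams(b)
    dice = 2 * len(ba & bb) / (len(ba) + len(bb)) if (ba or bb) else 0.0
    return {
        "ned": 1.0 - edit_distance(a, b) / longest,  # normalized edit similarity
        "lcsr": lcs_length(a, b) / longest,          # LCS ratio
        "dice": dice,                                # bigram Dice coefficient
    }


def average_score(features: Dict[str, float]) -> float:
    """Unweighted mean of the features, one simple way to combine them."""
    return sum(features.values()) / len(features)


if __name__ == "__main__":
    # Illustrative French/English pair chosen for this sketch.
    feats = similarity_features("librairie", "library")
    print(feats, average_score(feats))
```

A feature dictionary like this could be passed to any standard classifier in place of the simple average; which measures and which learner work best is the kind of question the article studies.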



DOI: https://doi.org/10.5296/ijl.v1i1.309


International Journal of Linguistics  ISSN 1948-5425  Email: ijl@macrothink.org

Copyright © Macrothink Institute ISSN 1948-5425
