The Swadesh wordlist has been used for more than half a century to collect data for studies in comparative and historical linguistics. The current study compares the classification results of the Swadesh 100-item wordlist with those of its subsets to determine whether reducing the size of the wordlist affects its effectiveness. For the comparison, the 100-, 50- and 40-item wordlists were used to compute lexical distances among 29 Cushitic and Semitic languages spoken in Ethiopia and neighboring countries. Gabmap, a web-based application, was employed to compute the lexical distances and to divide the languages into related clusters. The comparison shows that the subsets are not as effective as the 100-item wordlist in clustering languages into smaller related subgroups, but they are equally effective in dividing languages into larger groups such as subfamilies. The subsets may also produce erroneous classifications in which unrelated languages form a cluster by chance, a grouping not attested by comparative study. The chance of such an error is higher when the subsets are used to classify languages that are not closely related. Although further study is needed to settle the questions surrounding the size of the Swadesh wordlist, this study indicates that the 50- and 40-item wordlists cannot be recommended as reliable substitutes for the 100-item wordlist under all circumstances. The choice appears to depend on the researcher's objective and the degree of relatedness among the languages to be classified.
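The method summarized above (compute lexical distances over wordlists of different sizes, then cluster the languages from those distances) can be sketched as follows. This is a minimal illustration under stated assumptions, not Gabmap's implementation: it assumes a length-normalized Levenshtein distance averaged over the concepts both languages attest, and average-linkage (UPGMA-style) clustering. The abstract does not say which distance measure or clustering method was configured in Gabmap. The function names, the four languages `A` to `D`, the concept labels and all word forms are hypothetical placeholders, not data from the study; passing a shorter `concepts` list stands in for moving from the 100-item list to a 50- or 40-item subset.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def lexical_distance(words_a, words_b, concepts):
    """Mean length-normalized edit distance over concepts both languages attest."""
    scores = []
    for concept in concepts:
        wa, wb = words_a.get(concept), words_b.get(concept)
        if wa and wb:  # skip concepts missing from either list
            scores.append(levenshtein(wa, wb) / max(len(wa), len(wb)))
    return sum(scores) / len(scores) if scores else float("nan")


def distance_matrix(languages, concepts):
    """Distances for every unordered language pair, keyed by a sorted name tuple."""
    return {(x, y): lexical_distance(languages[x], languages[y], concepts)
            for x, y in combinations(sorted(languages), 2)}


def average_linkage(names, dist, k):
    """Agglomerative clustering with average linkage, stopping at k clusters."""
    def pair(x, y):
        return dist[tuple(sorted((x, y)))]

    def between(c1, c2):
        values = [pair(a, b) for a in c1 for b in c2]
        return sum(values) / len(values)

    clusters = [frozenset([n]) for n in names]
    while len(clusters) > k:
        # Merge the two clusters whose members are closest on average.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: between(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical toy forms for four invented languages and four placeholder concepts.
LANGS = {
    "A": {"c1": "kala", "c2": "mira", "c3": "tosu", "c4": "nebi"},
    "B": {"c1": "kala", "c2": "mila", "c3": "tosa", "c4": "nebu"},
    "C": {"c1": "pur", "c2": "dek", "c3": "sag", "c4": "lom"},
    "D": {"c1": "pur", "c2": "dak", "c3": "sig", "c4": "lom"},
}
FULL = ["c1", "c2", "c3", "c4"]  # stands in for the 100-item list
SUBSET = ["c1", "c2"]            # stands in for a 50- or 40-item subset

if __name__ == "__main__":
    for label, concepts in (("full", FULL), ("subset", SUBSET)):
        dist = distance_matrix(LANGS, concepts)
        print(label, average_linkage(sorted(LANGS), dist, k=2))
```

Because each pairwise distance is an average over the shared concepts, a shorter list averages over fewer items, so a handful of chance resemblances carries more weight. This is offered only as one plausible mechanism behind the abstract's observation that subsets can group unrelated languages by chance, not as a finding of the study.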