In this analysis, the text segments of the corpus are classified according to their respective vocabularies, and the set of segments is partitioned according to the frequency of the reduced forms. Starting from matrices that cross text segments with words (in repeated chi-square tests), the DHC method is applied and a stable, definitive classification is obtained (Reinert, 1990). The analysis aims to obtain classes of text segments whose vocabulary is similar within each class and different from that of the text segments in other classes.
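The chi-square test that underlies this classification can be illustrated with a minimal sketch. It is not the full DHC procedure of Reinert (1990), only the association step: given a candidate class of text segments, it measures how strongly one reduced form is tied to that class. The function names, the toy segments, and the class labels are all hypothetical and chosen for illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so no association can be measured.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def term_class_chi2(segments, labels, term, target):
    """Chi-square association between `term` and the class `target`.

    segments: list of sets of reduced forms, one set per text segment
    labels:   class label of each segment (hypothetical, for illustration)
    """
    a = b = c = d = 0
    for forms, label in zip(segments, labels):
        present = term in forms
        if label == target and present:
            a += 1  # in the class, contains the term
        elif label == target:
            b += 1  # in the class, lacks the term
        elif present:
            c += 1  # outside the class, contains the term
        else:
            d += 1  # outside the class, lacks the term
    return chi_square_2x2(a, b, c, d)


# Toy corpus: four text segments already reduced to their root forms.
segments = [
    {"care", "nurse"},
    {"care", "patient"},
    {"school", "teacher"},
    {"school", "child"},
]
labels = [0, 0, 1, 1]

care_score = term_class_chi2(segments, labels, "care", 0)
nurse_score = term_class_chi2(segments, labels, "nurse", 0)
```

In this toy corpus, "care" appears in every segment of class 0 and in none of class 1, so it scores higher than "nurse", which appears in only one segment. A high value marks a form as characteristic of a class, which is the sense in which classes have a vocabulary similar within and distinct between.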
Lexical analysis, after reduction of the words to their roots, showed that the corpus contained a total of 2,782 word occurrences with 757 distinct forms. 4% of the corpus.