MEASURING LINGUISTIC EFFICIENCY ACROSS TEXT GENRES USING SHANNON ENTROPY AND ZIPF’S LAW
Abstract
This study examines linguistic efficiency across four genres (academic papers, news reports, fiction, and tweets) using Shannon entropy and Zipf’s Law, drawing on the COCA and Twitter corpora. Salient findings are that academic text exhibits the highest entropy (H = 10.2 bits, indicative of dense information), that the social media data is nearly Zipfian (α = −1.03), and that fiction strikes a balance between creativity and readability. The analysis combines lexical diversity, word-frequency distributions, and compression efficiency to show how different genres optimize communication.
Keywords: Shannon entropy, Zipf’s Law, linguistic efficiency, genre analysis, computational linguistics
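The two core measures named in the abstract can be sketched as follows. This is a minimal illustrative implementation, not the paper’s actual pipeline: it computes Shannon word entropy (H = −Σ pᵢ log₂ pᵢ over relative word frequencies) and estimates a Zipf exponent as the least-squares slope of log frequency against log rank; the toy text is purely a stand-in for a real corpus.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy in bits: H = -sum(p_i * log2 p_i) over word frequencies."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) vs. log(rank); near -1 is Zipfian."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Toy example; a real study would tokenize a full corpus sample.
text = "the cat sat on the mat and the dog sat on the log".split()
print(round(shannon_entropy(text), 3))  # entropy in bits for this tiny sample
print(round(zipf_slope(text), 3))       # fitted log-log slope
```

On corpora this small the slope estimate is unstable; with genre-sized samples, slopes approaching −1 (as reported for the tweets) indicate a near-Zipfian frequency distribution.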