Language of Stress: A Corpus-Based Study to Detect Early Signs of Suicide Through Lexical Choice

Afshan Ishfaq; Nida Sultan

doi:10.5281/

Authors

Afshan Ishfaq, Assistant Professor and Head of Academics, Institute of Law, Lahore
Nida Sultan, Lecturer in English, NAMAL, Mianwali


Abstract

Suicide is one of the most devastating, and most preventable, causes of premature death worldwide, yet its early detection remains stubbornly elusive. This study advances the hypothesis that language, and specifically the spontaneous lexical choices individuals make in everyday written and digital discourse, is one of the most sensitive and accessible markers of suicidal ideation. Using corpus-based methods, we conduct a systematic, quantitative investigation of how the written language of individuals experiencing suicidal ideation differs from that of a matched non-suicidal population. A purpose-built Suicide Discourse Corpus (SDC) of approximately 452,000 tokens was compiled from four heterogeneous sources: anonymized crisis helpline transcripts, Reddit posts from mental health disclosure communities, published first-person narratives of suicidal crises, and archival farewell notes. A Matched Control Corpus (MCC) of 449,800 tokens drawn from general online discourse serves as the baseline. Analytical methods include keyness analysis (log-likelihood, G²), semantic domain profiling with the UCREL Semantic Analysis System (USAS), collocational analysis (Mutual Information scoring), and frequency analysis of grammatical and function-word categories. The findings reveal a statistically robust lexical signature in suicidal discourse, marked by: (a) sharply elevated vocabulary of pain, suffering, and death; (b) absolutist and negation-heavy language reflecting cognitive constriction; (c) depleted future-oriented temporal reference and fewer positive evaluative terms; (d) heightened first-person singular pronoun use alongside reduced vocabulary of social solidarity; and (e) distinctive collocational frames encoding inescapable, internally directed suffering. These patterns align with major psychological theories of suicide, including Shneidman's psychache theory, Joiner's Interpersonal Theory of Suicide (IPTS), and Beck's cognitive model of hopelessness. Implications for the design of NLP-assisted early-warning systems, the ethical governance of mental health corpus research, and future multilingual extensions of this work are discussed at length.
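The two core statistics named above, log-likelihood (G²) keyness and Mutual Information, can be illustrated with a minimal Python sketch. It assumes the common two-corpus formulation of G² (observed versus expected frequency in a study corpus and a reference corpus) and the pointwise, base-2 form of MI; the function names are hypothetical, and the sketch does not reproduce the study's actual tooling (USAS tagging, collocation window spans, or frequency cut-offs).

```python
import math


def log_likelihood_g2(freq_study: int, size_study: int,
                      freq_ref: int, size_ref: int) -> float:
    """Two-corpus log-likelihood (G2) keyness for a single word type.

    freq_*: raw frequency of the word in each corpus
    size_*: total token count of each corpus
    (Hypothetical helper; assumes the standard two-cell G2 formulation.)
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies under the null hypothesis that the word is
    # equally likely, per token, in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # the term 0 * ln(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def mutual_information(freq_pair: int, freq_node: int,
                       freq_collocate: int, corpus_size: int) -> float:
    """Pointwise Mutual Information (log base 2) for a node-collocate pair.

    Compares the observed co-occurrence count with the count expected
    if node and collocate were independent. No window-span correction
    is applied here (an assumption of this sketch).
    """
    expected = freq_node * freq_collocate / corpus_size
    return math.log2(freq_pair / expected)
```

For example, `log_likelihood_g2(120, 452_000, 20, 449_800)` scores an invented word occurring 120 times in a corpus the size of the SDC and 20 times in one the size of the MCC; the counts are illustrative, not the study's data. G² itself is non-directional, so whether a word is over- or under-used in the study corpus is read from comparing relative frequencies, and scores above 3.84 are conventionally taken as significant at p < 0.05 with one degree of freedom.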