Annotation Quality in Aspect-Based Sentiment Analysis: A Case Study Comparing Experts, Students, Crowdworkers, and Large Language Model
A new study benchmarks annotation quality across four sources (expert annotators, students, crowdworkers, and LLMs) for German aspect-based sentiment analysis (ABSA), using inter-annotator agreement and downstream task performance as metrics. The work addresses a critical gap in non-English ABSA datasets and shows how LLM-generated labels compare to human annotation at scale. For practitioners building multilingual NLP systems, it offers empirical guidance on whether to invest in expert annotation, crowd labor, or synthetic LLM labeling for low-resource languages, with direct implications for dataset construction costs and model reliability.
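
To make the inter-annotator agreement metric concrete, here is a minimal Python sketch of Cohen's kappa, one common chance-corrected agreement score for two annotators. The summary does not say which agreement statistic the study uses, so the choice of kappa is an assumption, and the function name `cohen_kappa` and the toy sentiment labels are hypothetical illustrations, not data from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumption: the study's actual agreement metric is not stated in the
    summary; kappa is used here only as a familiar example.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")

    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels, not taken from the study.
expert = ["positive", "negative", "neutral", "positive", "negative", "positive"]
llm = ["positive", "negative", "positive", "positive", "neutral", "positive"]

kappa = cohen_kappa(expert, llm)
print(f"{kappa:.3f}")  # prints 0.429
```

In this toy case the two label sets match on 4 of 6 items (raw agreement 0.667), but kappa is lower because part of that overlap is expected by chance given how often each annotator uses "positive".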























