Corpus data across languages and disciplines