Multilingual corpora and multilingual corpus analysis