Spoken language corpus and linguistic informatics