Text this: From language to multimodality: