Video object extraction and representation