The document presents MCSE (multimodal contrastive learning of sentence embeddings), which extends unsupervised SimCSE with an additional multimodal contrastive objective that pairs sentences with their corresponding images. In experiments that draw paired image-caption data from Flickr30K and MS-COCO, MCSE outperforms text-only training on semantic textual similarity (STS) tasks. The study attributes this gain to better alignment and uniformity of the resulting embedding space compared with text-only baselines.
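The sketch below shows the two ideas the summary relies on, in plain Python with no external dependencies. It is not the authors' implementation. The first part is an InfoNCE loss with in-batch negatives, as used in unsupervised SimCSE, where two dropout-perturbed encodings of the same sentence form a positive pair. The second part adds a text-image term to form a combined objective. It assumes that sentence and image embeddings have already been projected into a shared space. The weight `lam`, the function names, and the toy vectors are hypothetical. The last part computes the alignment and uniformity metrics (Wang and Isola, 2020) that are commonly used to analyze such embedding spaces.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.05) -> float:
    """Mean InfoNCE loss: anchors[i] pairs with positives[i]; other positives[j] act as in-batch negatives."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]  # equals -log softmax of the positive pair
    return total / n


def mcse_style_loss(view1, view2, text_shared, image_shared, lam: float = 1.0, temperature: float = 0.05) -> float:
    """Hypothetical combined objective: SimCSE term plus a weighted text-image contrastive term."""
    text_term = info_nce(view1, view2, temperature)               # two dropout views of each sentence
    multimodal_term = info_nce(text_shared, image_shared, temperature)  # sentence vs. its paired image (assumed shared space)
    return text_term + lam * multimodal_term


def sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def alignment(x: List[Vector], y: List[Vector], alpha: float = 2.0) -> float:
    """Mean ||f(x) - f(x+)||^alpha over positive pairs of normalized embeddings (lower is better)."""
    dists = [math.sqrt(sq_dist(normalize(a), normalize(b))) ** alpha for a, b in zip(x, y)]
    return sum(dists) / len(dists)


def uniformity(x: List[Vector], t: float = 2.0) -> float:
    """log of the mean Gaussian potential exp(-t * ||z_i - z_j||^2) over distinct pairs (lower is better)."""
    z = [normalize(v) for v in x]
    vals = [math.exp(-t * sq_dist(z[i], z[j])) for i in range(len(z)) for j in range(i + 1, len(z))]
    return math.log(sum(vals) / len(vals))


if __name__ == "__main__":
    views = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce(views, views))                       # near 0: each anchor matches its own positive
    print(mcse_style_loss(views, views, views, views, lam=0.5))
    print(alignment(views, views), uniformity([[1.0, 0.0], [-1.0, 0.0]]))
```

The temperature default of 0.05 matches the value commonly used for unsupervised SimCSE. A real implementation would compute these quantities on batched tensors in a deep learning framework rather than on Python lists.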