This document examines approximation methods for reducing the computational cost of training speech synthesis models based on generative moment matching networks (GMMNs). It investigates two approximations of the Gram matrices, a block diagonal approach and a random Fourier feature approach, and two minibatch selection methods, one random and one clustering-based. In subjective tests, the random Fourier feature approximation and clustering-based minibatch selection achieved higher scores for inter-utterance variation while maintaining naturalness.
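
To make the random Fourier feature idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a Gaussian kernel with bandwidth `sigma` and the biased maximum mean discrepancy (MMD) estimate used as the moment matching criterion; the kernel choice, all function names, and all parameter values are hypothetical and chosen only for illustration. It contrasts the exact estimate, which needs full Gram matrices, with an estimate computed from the means of randomized feature vectors.

```python
import numpy as np


def gaussian_gram(x, y, sigma=1.0):
    """Exact Gram matrix: K[i, j] = exp(-||x_i - y_j||^2 / (2 * sigma^2))."""
    sq = (np.sum(x ** 2, axis=1)[:, None]
          + np.sum(y ** 2, axis=1)[None, :]
          - 2.0 * x @ y.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def sample_rff_params(dim, n_features, sigma, rng):
    """Draw frequencies and phases for a Gaussian kernel (assumed kernel)."""
    w = rng.normal(0.0, 1.0 / sigma, size=(dim, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return w, b


def rff_features(x, w, b):
    """Map x to features z(x) so that z(x) . z(y) approximates k(x, y)."""
    n_features = w.shape[1]
    return np.sqrt(2.0 / n_features) * np.cos(x @ w + b)


def mmd2_exact(x, y, sigma=1.0):
    """Biased MMD^2 from full Gram matrices (quadratic in the sample count)."""
    return float(gaussian_gram(x, x, sigma).mean()
                 + gaussian_gram(y, y, sigma).mean()
                 - 2.0 * gaussian_gram(x, y, sigma).mean())


def mmd2_rff(x, y, w, b):
    """Approximate biased MMD^2 from mean feature vectors (no Gram matrix)."""
    diff = rff_features(x, w, b).mean(axis=0) - rff_features(y, w, b).mean(axis=0)
    return float(diff @ diff)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    generated = rng.normal(0.0, 1.0, size=(200, 5))   # stand-in for generated features
    natural = rng.normal(0.5, 1.0, size=(200, 5))     # stand-in for natural features
    w, b = sample_rff_params(dim=5, n_features=2000, sigma=2.0, rng=rng)
    print("exact MMD^2:", mmd2_exact(generated, natural, sigma=2.0))
    print("RFF   MMD^2:", mmd2_rff(generated, natural, w, b))
```

In this sketch the exact estimate evaluates the kernel for every pair of samples, whereas the feature-based estimate scales linearly in the number of samples for a fixed number of random features. Increasing `n_features` tightens the approximation at higher cost.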