The document presents a grammar-based pre-processing method for the prediction by partial matching (PPM) compression algorithm that improves compression across a range of natural languages. Frequent two-character and three-character sequences (bigraphs and trigraphs) are replaced with non-terminal symbols generated from the text itself before the text is passed to PPM. Experimental results show up to 35% better compression for languages such as Chinese, and improvements of roughly 11% to 20% on the Calgary corpus.
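
To make the substitution step concrete, the following is a minimal sketch of bigraph replacement only. It is not the authors' implementation: the function names (`build_rules`, `encode`, `decode`), the greedy left-to-right matching, the choice of the most frequent bigraphs as rules, and the use of Unicode Private Use Area code points as non-terminals are all assumptions made for illustration. Trigraph handling and the PPM compressor itself are left out.

```python
from collections import Counter

# Assumption: non-terminals are taken from the Unicode Private Use Area,
# starting at U+E000, and the input text does not already use them.
NONTERMINAL_BASE = 0xE000
NONTERMINAL_LIMIT = 0xF8FF


def build_rules(text: str, num_rules: int) -> dict:
    """Map the num_rules most frequent bigraphs to fresh non-terminal symbols."""
    if any(NONTERMINAL_BASE <= ord(ch) <= NONTERMINAL_LIMIT for ch in text):
        raise ValueError("input already contains Private Use Area characters")
    # Count overlapping two-character windows over the whole text.
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    return {
        bigraph: chr(NONTERMINAL_BASE + k)
        for k, (bigraph, _) in enumerate(counts.most_common(num_rules))
    }


def encode(text: str, rules: dict) -> str:
    """Greedily replace bigraphs with their non-terminals, left to right."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in rules:
            out.append(rules[pair])
            i += 2  # consume both characters of the bigraph
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def decode(encoded: str, rules: dict) -> str:
    """Expand each non-terminal back into the bigraph it stands for."""
    inverse = {symbol: bigraph for bigraph, symbol in rules.items()}
    return "".join(inverse.get(ch, ch) for ch in encoded)


if __name__ == "__main__":
    sample = "the theme then there"
    rules = build_rules(sample, num_rules=3)
    packed = encode(sample, rules)
    print(len(sample), len(packed))
    print(decode(packed, rules) == sample)
```

In this sketch the encoded string, not the original text, would be handed to a PPM compressor, and the decoder would need the same rule table to restore the input. How the rule table is chosen and conveyed in the actual method is not specified here.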