🚀 Just shipped: a high-performance, multi-format file compression engine in C++!

Tired of one-size-fits-all compression tools that don't understand your data? I've been building a professional-grade File Compressor designed to intelligently reduce file sizes by applying specialized algorithms tailored to each format's unique characteristics. This wasn't just about applying zip to everything; the core challenge was selecting and integrating best-in-class libraries to achieve maximum efficiency for each file type:

✅ Text & data (TXT, CSV, JSON, XML, LOG): Zlib for robust lossless compression, achieving 60-85% reduction.
✅ Images (BMP, TIFF, PSD): stb and LibTIFF to convert BMPs to PNGs and compress TIFF/PSD files, with up to 85% reduction.
✅ Audio (WAV, AIFF): LAME & AudioFile to transform uncompressed audio into efficient MP3 files, slashing size by a massive 85-95%.

🛠️ Under the hood:
Modular C++ architecture: each format has its own dedicated compression module, making it easy to maintain and extend.
CMake build system: seamless, cross-platform compilation.
Professional code structure: clean separation between sources, headers, and external libraries.

Although this is just the initial version of the project, it was still fun to take a deep dive into the nuances of data formats, compression theory, and native library integration. It reinforced the principle that the right tool for the job will always outperform a generic solution.

I'm excited to share the code and see how others might extend it. Think it could be useful? Check out the repo; I'd love your feedback on which other formats you'd want it to support. Star ⭐ it if you like it, and I'm always open to feedback and collaboration!

🔗 Repository: https://lnkd.in/ge7k-zmv
Built a high-performance file compressor in C++ for various formats
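The project itself is C++, but for a rough sense of the text/log path, here is a minimal sketch using Python's zlib bindings (the same DEFLATE family as the Zlib library named above); the file name is hypothetical and this is only an illustration, not the repo's code.

```python
# Minimal sketch (not the project's C++ code): DEFLATE-based text compression
# via Python's zlib bindings, to illustrate the kind of reduction the text
# path is aiming for. The file name is hypothetical.
import zlib

def compress_text_file(path: str, level: int = 9) -> None:
    with open(path, "rb") as f:
        raw = f.read()
    packed = zlib.compress(raw, level)          # DEFLATE at maximum compression level
    with open(path + ".z", "wb") as f:
        f.write(packed)
    ratio = 100 * (1 - len(packed) / max(len(raw), 1))
    print(f"{path}: {len(raw)} -> {len(packed)} bytes ({ratio:.1f}% reduction)")

if __name__ == "__main__":
    compress_text_file("server.log")            # repetitive logs compress very well
```

Logs, CSVs, and similar text are highly repetitive, which is why deflate-family codecs often land in the 60-85% range quoted above.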
More Relevant Posts
-
Not really sure yet if it's at all useful (possibly not 😅), but I started experimenting with the idea of a super basic folder/git-based schema catalog (for things like OpenAPI, Avro, Protobuf, etc.). The idea: a directory structure following a few conventions, plus files in predefined formats that hold metadata about each schema. Then bolt an API on top to index the relevant metadata, and a CLI that talks to it to keep a synchronized local copy of the schemas relevant to a given application/service/whatever.
- Any thoughts on the relevance of something like this?
- Do you know of any existing tools that could help with this? (I looked into Event Catalog, but it didn't really feel like what I'm looking for.)
https://lnkd.in/drJ7NcvJ
-
load vs lazy_load in LangChain:

load() - eager loading
The load() method is the straightforward approach. When you call it, the document loader reads the entire source (e.g., a file, a directory of files, a website) and parses everything into a list of Document objects immediately.
When to use it:
- You are working with a small number of files or a small amount of text.
- Your entire dataset easily fits into your application's memory (RAM).
- You want simplicity and plan to use all documents immediately (e.g., for splitting and embedding).

lazy_load() - lazy loading
The lazy_load() method is designed for memory efficiency. Instead of returning a list, it returns a generator that yields one Document at a time. This means you can process each document (e.g., split, embed, store in a database) without ever having the entire dataset in RAM.
When to use it:
- You are working with a large number of files or a very large single file.
- The full dataset is too large to load into memory at once.
- You want to process documents in a streaming fashion.
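A minimal sketch of the two calls, assuming the langchain_community loaders are installed; the ./docs directory and glob pattern are hypothetical.

```python
# Sketch of eager vs. lazy loading, assuming langchain_community is installed.
# The "./docs" directory and glob pattern are hypothetical.
from langchain_community.document_loaders import DirectoryLoader, TextLoader

loader = DirectoryLoader("./docs", glob="**/*.txt", loader_cls=TextLoader)

# Eager: everything is parsed into a list of Document objects up front.
docs = loader.load()
print(f"Loaded {len(docs)} documents into memory")

# Lazy: a generator that yields one Document at a time, so the full corpus
# never has to sit in RAM; each document can be processed and discarded.
total_chars = 0
for doc in loader.lazy_load():
    total_chars += len(doc.page_content)   # splitting/embedding/storing would go here
print(f"Streamed {total_chars} characters without holding all docs at once")
```

The only structural difference is a list versus a generator, but for large corpora that difference decides whether the pipeline fits in memory.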
-
Ever wondered how file reading actually works at the lowest level?

At the OS layer, even something as simple as opening a file eventually becomes a syscall like open, read, or write. These syscalls bridge our program with the kernel, which then talks to the filesystem and the underlying storage hardware. In day-to-day coding we rarely think about this complexity: higher-level abstractions (standard libraries, frameworks, databases) hide the raw mechanics so developers can focus on logic rather than hardware details.

Databases, however, go deeper. They fine-tune read and write operations with caching, buffering, log-structured storage, and clever data-movement algorithms. All of this is done not just for speed but also to uphold the sacred ACID properties (Atomicity, Consistency, Isolation, Durability).

✨ Abstraction makes software development simple. ⚙️ Optimization makes databases powerful. Both coexist to turn raw system calls into the seamless experiences we rely on every day.

A small file-read demo program is shared here: https://lnkd.in/grz2_8FH
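The linked demo isn't reproduced here, but as a minimal sketch of the same idea, Python's os module exposes thin wrappers over the open/read/close syscalls (the file path below is hypothetical).

```python
# Minimal sketch of file reading close to the syscall layer, using Python's os
# module (thin wrappers over open/read/close). The path is hypothetical.
import os

fd = os.open("data.txt", os.O_RDONLY)   # open(2): returns a file descriptor
try:
    chunks = []
    while True:
        chunk = os.read(fd, 4096)       # read(2): pull up to 4 KiB from the kernel
        if not chunk:                   # empty bytes object means EOF
            break
        chunks.append(chunk)
finally:
    os.close(fd)                        # close(2): release the descriptor

print(b"".join(chunks).decode("utf-8", errors="replace"))
```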
-
The Live Editor supports a new plain-text Live Code file format (.m) for live scripts as an alternative to the default binary Live Code file format (.mlx), and you can make .m the default in the settings. In the plain-text format, live scripts use a custom markup based on markdown for formatted text, plus an appendix that stores the data associated with output and other controls. You can learn more here ➡️ https://spr.ly/6048AHQp6
-
Using static blocks for frequently changing content? The result: manual maintenance of EVERY block instance.
The solution: dynamic blocks.
Better for shared content: synced patterns → global updates.
When logic is needed: dynamic blocks → PHP-powered output.
Stop editing one by one. Use the right tool. Think dynamic. https://lnkd.in/d37kuigX
-
🦀 Mini Project: Parallel Log Analyzer (Week 5 focus) ⚡📊

📚 Key learnings:
- Data parallelism in Rust using the rayon crate.
- Splitting large workloads (log files) into chunks for concurrent processing.
- Better understanding of iterators, map/reduce patterns, and thread safety.
- Reinforced Rust's fearless-concurrency approach.

🛠 Mini project: Parallel Log Analyzer
- Reads large log files.
- Splits them into lines and processes entries in parallel.
- Extracts metrics (e.g., error counts, most common warnings, request frequency).
- Outputs a summary report much faster than a sequential approach.

💻 Repo: https://lnkd.in/gBzcYySW

This was a fun step into scalable systems programming; getting comfortable with concurrency is a key milestone in Rust! #Rust #Rayon #Concurrency #ParallelComputing #LearningInPublic #RustLang #SideProjects
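The repo itself is Rust with rayon; purely as an illustration of the same chunk/map/reduce idea, here is a rough Python analogue using multiprocessing rather than the rayon API (the log file name and level keywords are hypothetical).

```python
# Analogy only: the project above uses Rust + rayon; this Python sketch just
# illustrates the same chunk -> map -> reduce pattern with multiprocessing.
# The log file name and the "ERROR"/"WARN" keywords are hypothetical.
from collections import Counter
from multiprocessing import Pool

def count_levels(lines: list) -> Counter:
    """Map step: count log levels in one chunk of lines."""
    counts = Counter()
    for line in lines:
        if "ERROR" in line:
            counts["error"] += 1
        elif "WARN" in line:
            counts["warn"] += 1
    return counts

if __name__ == "__main__":
    with open("app.log", "r", errors="replace") as f:
        lines = f.readlines()

    # Split the workload into roughly equal chunks, one per worker.
    n_chunks = 8
    size = max(1, len(lines) // n_chunks)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]

    with Pool() as pool:
        partials = pool.map(count_levels, chunks)   # parallel map over chunks

    totals = sum(partials, Counter())               # reduce the partial counts
    print(totals.most_common())
```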
-
I killed 3 backend APIs with one change ⚡. Recently, I reworked our file upload flow and the results were eye-opening.

Initially, we followed the "classic" approach:
➡️ Client uploads file to our backend API
➡️ Backend stores it temporarily
➡️ Backend pushes it to S3
➡️ Another API serves the file

It worked, but…
❌ Our servers became a bottleneck
❌ Upload speeds were slower due to multiple hops
❌ Costs went up since all traffic passed through the backend
❌ More APIs = more code to maintain & scale

So I switched to S3 pre-signed URLs (PUT/POST). Here's what changed:
✅ The client now uploads directly to S3 using a secure, time-limited URL
✅ The backend only generates the URL & validates permissions
✅ No file data flows through our servers anymore
✅ Uploads got faster, infra costs dropped, and scaling became easier

The shift was simple but powerful: instead of making the backend do everything, I let storage handle storage. The backend just became the gatekeeper for security & access control. This small change improved performance, reduced complexity, and gave us a much leaner architecture. If you're still handling uploads via backend APIs, I highly recommend trying out pre-signed URLs. It's a game changer.

Reference to docs: https://lnkd.in/gHaMbmpb

#AWS #S3 #BackendDevelopment #CloudComputing #SoftwareEngineering #ScalableArchitecture #FileUploads #PresignedURLs #DevCommunity #BuildInPublic
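A minimal sketch of the backend's new role, assuming boto3 and configured AWS credentials; the bucket name, object key, and expiry below are hypothetical.

```python
# Minimal sketch, assuming boto3 and configured AWS credentials.
# Bucket name, object key, and expiry are hypothetical.
import boto3

s3 = boto3.client("s3")

def create_upload_url(bucket: str, key: str, expires_in: int = 300) -> str:
    """Backend's only job: validate the caller, then hand out a short-lived URL."""
    return s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={
            "Bucket": bucket,
            "Key": key,
            "ContentType": "application/octet-stream",  # client must send the same header
        },
        ExpiresIn=expires_in,  # seconds until the URL stops working
    )

url = create_upload_url("my-uploads-bucket", "user-123/report.pdf")
# The client PUTs the file bytes straight to this URL; no file data
# ever passes through the backend.
print(url)
```

The client then issues an HTTP PUT with the file body (and the matching Content-Type header) directly to that URL before it expires.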