🚀 Just shipped: a high-performance, multi-format file compression engine in C++!

Tired of one-size-fits-all compression tools that don't understand your data? I've been building a professional-grade File Compressor designed to intelligently reduce file sizes by applying specialized algorithms tailored to each format's unique characteristics. This wasn't just about applying zip to everything; the core challenge was selecting and integrating best-in-class libraries to achieve maximum efficiency for each file type:

✅ Text & data (TXT, CSV, JSON, XML, LOG): Zlib for robust lossless compression, achieving 60-85% reduction.
✅ Images (BMP, TIFF, PSD): stb and LibTIFF to convert BMPs to PNGs and compress TIFF/PSD files, with up to 85% reduction.
✅ Audio (WAV, AIFF): LAME & AudioFile to transform uncompressed audio into efficient MP3 files, slashing size by a massive 85-95%.

🛠️ Under the hood:
Modular C++ architecture: each format has its own dedicated compression module, making it easy to maintain and extend.
CMake build system: seamless, cross-platform compilation.
Professional code structure: clean separation between sources, headers, and external libraries.

Although this is just the initial version of the project, it was still fun to take a deep dive into the nuances of data formats, compression theory, and native library integration. It reinforced the principle that the right tool for the job will always outperform a generic solution.

I'm excited to share the code and see how others might extend it. Think it could be useful? Check out the repo; I'd love your feedback on which other formats you'd want it to support. Star ⭐ it if you like it, and I'm always open to feedback and collaboration!

🔗 Repository: https://lnkd.in/ge7k-zmv
Built a high-performance file compressor in C++ for various formats
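The project itself is C++, but for a rough sense of the text/log path, here is a minimal sketch using Python's zlib bindings (the same DEFLATE family as the Zlib library named above); the file name is hypothetical and this is only an illustration, not the repo's code.

```python
# Minimal sketch (not the project's C++ code): DEFLATE-based text compression
# via Python's zlib bindings, to illustrate the kind of reduction the text
# path is aiming for. The file name is hypothetical.
import zlib

def compress_text_file(path: str, level: int = 9) -> None:
    with open(path, "rb") as f:
        raw = f.read()
    packed = zlib.compress(raw, level)          # DEFLATE at maximum compression level
    with open(path + ".z", "wb") as f:
        f.write(packed)
    ratio = 100 * (1 - len(packed) / max(len(raw), 1))
    print(f"{path}: {len(raw)} -> {len(packed)} bytes ({ratio:.1f}% reduction)")

if __name__ == "__main__":
    compress_text_file("server.log")            # repetitive logs compress very well
```

Logs, CSVs, and similar text are highly repetitive, which is why deflate-family codecs often land in the 60-85% range quoted above.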
More Relevant Posts
-
Not really sure yet if it's at all useful (possibly not 😅), but I started experimenting with the idea of a super basic folder/git-based schema catalog (for things like OpenAPI, Avro, Protobuf, etc.). The idea: a directory structure following a few conventions, plus files in predefined formats that hold metadata about each schema. Then bolt an API on top to index the relevant metadata, and a CLI that talks to it to keep a synchronized local copy of the schemas relevant to a given application/service/whatever.
- Any thoughts on the relevance of something like this?
- Do you know of any existing tools that could help with this? (I looked into Event Catalog, but it didn't really feel like what I'm looking for.)
https://lnkd.in/drJ7NcvJ
-
load vs lazy_load in LangChain:

load() - eager loading
The load() method is the straightforward approach. When you call it, the document loader reads the entire source (e.g., a file, a directory of files, a website) and parses everything into a list of Document objects immediately.
When to use it:
- You are working with a small number of files or a small amount of text.
- Your entire dataset easily fits into your application's memory (RAM).
- You want simplicity and plan to use all documents immediately (e.g., for splitting and embedding).

lazy_load() - lazy loading
The lazy_load() method is designed for memory efficiency. Instead of returning a list, it returns a generator that yields one Document at a time. This means you can process each document (e.g., split, embed, store in a database) without ever having the entire dataset in RAM.
When to use it:
- You are working with a large number of files or a very large single file.
- The full dataset is too large to load into memory at once.
- You want to process documents in a streaming fashion.
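A minimal sketch of the two calls, assuming the langchain_community loaders are installed; the ./docs directory and glob pattern are hypothetical.

```python
# Sketch of eager vs. lazy loading, assuming langchain_community is installed.
# The "./docs" directory and glob pattern are hypothetical.
from langchain_community.document_loaders import DirectoryLoader, TextLoader

loader = DirectoryLoader("./docs", glob="**/*.txt", loader_cls=TextLoader)

# Eager: everything is parsed into a list of Document objects up front.
docs = loader.load()
print(f"Loaded {len(docs)} documents into memory")

# Lazy: a generator that yields one Document at a time, so the full corpus
# never has to sit in RAM; each document can be processed and discarded.
total_chars = 0
for doc in loader.lazy_load():
    total_chars += len(doc.page_content)   # splitting/embedding/storing would go here
print(f"Streamed {total_chars} characters without holding all docs at once")
```

The only structural difference is a list versus a generator, but for large corpora that difference decides whether the pipeline fits in memory.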
-
Ever wondered how file reading actually works at the lowest level?

At the OS layer, even something as simple as opening a file eventually becomes a syscall like open, read, or write. These syscalls bridge our program with the kernel, which then talks to the filesystem and the underlying storage hardware. In day-to-day coding we rarely think about this complexity: higher-level abstractions (standard libraries, frameworks, databases) hide the raw mechanics so developers can focus on logic rather than hardware details.

Databases, however, go deeper. They fine-tune read and write operations with caching, buffering, log-structured storage, and clever data-movement algorithms. All of this is done not just for speed but also to uphold the sacred ACID properties (Atomicity, Consistency, Isolation, Durability).

✨ Abstraction makes software development simple. ⚙️ Optimization makes databases powerful. Both coexist to turn raw system calls into the seamless experiences we rely on every day.

A small file-read demo program is shared here: https://lnkd.in/grz2_8FH
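The linked demo isn't reproduced here, but as a minimal sketch of the same idea, Python's os module exposes thin wrappers over the open/read/close syscalls (the file path below is hypothetical).

```python
# Minimal sketch of file reading close to the syscall layer, using Python's os
# module (thin wrappers over open/read/close). The path is hypothetical.
import os

fd = os.open("data.txt", os.O_RDONLY)   # open(2): returns a file descriptor
try:
    chunks = []
    while True:
        chunk = os.read(fd, 4096)       # read(2): pull up to 4 KiB from the kernel
        if not chunk:                   # empty bytes object means EOF
            break
        chunks.append(chunk)
finally:
    os.close(fd)                        # close(2): release the descriptor

print(b"".join(chunks).decode("utf-8", errors="replace"))
```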
-
The Live Editor supports a new plain-text Live Code file format (.m) for live scripts as an alternative to the default binary Live Code file format (.mlx), and you can make .m the default in the settings. In the plain-text format, live scripts use a custom markup based on markdown for formatted text, plus an appendix that stores the data associated with output and other controls. You can learn more here ➡️ https://spr.ly/6048AHQp6
-
Using static blocks for frequently changing content? The result: manual maintenance of EVERY block instance.
The solution: dynamic blocks.
Better for shared content: synced patterns → global updates.
When logic is needed: dynamic blocks → PHP-powered output.
Stop editing one by one. Use the right tool. Think dynamic. https://lnkd.in/d37kuigX
-
🦀 Mini Project: Parallel Log Analyzer (Week 5 focus) ⚡📊

📚 Key learnings:
- Data parallelism in Rust using the rayon crate.
- Splitting large workloads (log files) into chunks for concurrent processing.
- Better understanding of iterators, map/reduce patterns, and thread safety.
- Reinforced Rust's fearless-concurrency approach.

🛠 Mini project: Parallel Log Analyzer
- Reads large log files.
- Splits them into lines and processes entries in parallel.
- Extracts metrics (e.g., error counts, most common warnings, request frequency).
- Outputs a summary report much faster than a sequential approach.

💻 Repo: https://lnkd.in/gBzcYySW

This was a fun step into scalable systems programming; getting comfortable with concurrency is a key milestone in Rust! #Rust #Rayon #Concurrency #ParallelComputing #LearningInPublic #RustLang #SideProjects
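The repo itself is Rust with rayon; purely as an illustration of the same chunk/map/reduce idea, here is a rough Python analogue using multiprocessing rather than the rayon API (the log file name and level keywords are hypothetical).

```python
# Analogy only: the project above uses Rust + rayon; this Python sketch just
# illustrates the same chunk -> map -> reduce pattern with multiprocessing.
# The log file name and the "ERROR"/"WARN" keywords are hypothetical.
from collections import Counter
from multiprocessing import Pool

def count_levels(lines: list) -> Counter:
    """Map step: count log levels in one chunk of lines."""
    counts = Counter()
    for line in lines:
        if "ERROR" in line:
            counts["error"] += 1
        elif "WARN" in line:
            counts["warn"] += 1
    return counts

if __name__ == "__main__":
    with open("app.log", "r", errors="replace") as f:
        lines = f.readlines()

    # Split the workload into roughly equal chunks, one per worker.
    n_chunks = 8
    size = max(1, len(lines) // n_chunks)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]

    with Pool() as pool:
        partials = pool.map(count_levels, chunks)   # parallel map over chunks

    totals = sum(partials, Counter())               # reduce the partial counts
    print(totals.most_common())
```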
-
I killed 3 backend APIs with one change ⚡. Recently, I reworked our file upload flow and the results were eye-opening.

Initially, we followed the "classic" approach:
➡️ Client uploads file to our backend API
➡️ Backend stores it temporarily
➡️ Backend pushes it to S3
➡️ Another API serves the file

It worked, but…
❌ Our servers became a bottleneck
❌ Upload speeds were slower due to multiple hops
❌ Costs went up since all traffic passed through the backend
❌ More APIs = more code to maintain & scale

So I switched to S3 pre-signed URLs (PUT/POST). Here's what changed:
✅ The client now uploads directly to S3 using a secure, time-limited URL
✅ The backend only generates the URL & validates permissions
✅ No file data flows through our servers anymore
✅ Uploads got faster, infra costs dropped, and scaling became easier

The shift was simple but powerful: instead of making the backend do everything, I let storage handle storage. The backend just became the gatekeeper for security & access control. This small change improved performance, reduced complexity, and gave us a much leaner architecture. If you're still handling uploads via backend APIs, I highly recommend trying out pre-signed URLs. It's a game changer.

Reference to docs: https://lnkd.in/gHaMbmpb

#AWS #S3 #BackendDevelopment #CloudComputing #SoftwareEngineering #ScalableArchitecture #FileUploads #PresignedURLs #DevCommunity #BuildInPublic
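A minimal sketch of the backend's new role, assuming boto3 and configured AWS credentials; the bucket name, object key, and expiry below are hypothetical.

```python
# Minimal sketch, assuming boto3 and configured AWS credentials.
# Bucket name, object key, and expiry are hypothetical.
import boto3

s3 = boto3.client("s3")

def create_upload_url(bucket: str, key: str, expires_in: int = 300) -> str:
    """Backend's only job: validate the caller, then hand out a short-lived URL."""
    return s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={
            "Bucket": bucket,
            "Key": key,
            "ContentType": "application/octet-stream",  # client must send the same header
        },
        ExpiresIn=expires_in,  # seconds until the URL stops working
    )

url = create_upload_url("my-uploads-bucket", "user-123/report.pdf")
# The client PUTs the file bytes straight to this URL; no file data
# ever passes through the backend.
print(url)
```

The client then issues an HTTP PUT with the file body (and the matching Content-Type header) directly to that URL before it expires.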