What is the next big leap in #mathematicaloptimization: quantum approaches, AI/ML/RL, GPUs? There are many trends to follow and evaluate, so which one to bet on? Since early this year it has become clear that for some large (especially gigantically large) linear optimization problems, GPU acceleration can help tremendously, thanks to a proper mathematical foundation (first-order methods) combined with the strengths of GPUs over CPUs in memory bandwidth and parallelism. With the October 2025 release, #Xpress 9.8 now incorporates GPU acceleration of PDHG that makes your large-scale linear programming solutions fly! 🔥

What's got us excited:
• 30x speedups in single precision and 25x in double precision!
• A full-algorithm GPU implementation, not just the matrix operations
• Great for problems with over 100,000 nonzeros; even better for problems with over 10,000,000 nonzeros

Thanks to our partners who submitted instances for testing and evaluation. 💡 Read more here: https://lnkd.in/eUq69x8G

Don't get addicted purely to GPUs yet! There are still many instances on which a barrier or dual simplex method outperforms current GPU implementations, while being less sensitive to numerical tolerances. Let's keep researching, and start enjoying!
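For the curious: PDHG is a first-order method, so each iteration is just matrix-vector products and projections, exactly the operations GPUs are built for. Below is a minimal NumPy sketch of the vanilla PDHG iteration for min cᵀx s.t. Ax = b, x ≥ 0. It is a toy illustration only, not the Xpress implementation (which adds restarts, preconditioning, adaptive step sizes, and a full GPU port); swapping numpy for cupy would run the same loop on a GPU.

```python
import numpy as np

def pdhg_lp(c, A, b, iters=5000):
    """Vanilla PDHG for  min c^T x  s.t.  A x = b,  x >= 0.

    Toy sketch only: production first-order LP solvers add restarts,
    diagonal preconditioning, and adaptive step sizes on top of this.
    """
    m, n = A.shape
    # Step sizes must satisfy tau * sigma * ||A||_2^2 <= 1 for convergence.
    norm_A = np.linalg.norm(A, 2)
    tau = sigma = 0.9 / norm_A
    x = np.zeros(n)
    y = np.zeros(m)
    for _ in range(iters):
        # Primal step: move along c - A^T y, then project onto x >= 0.
        x_new = np.maximum(0.0, x - tau * (c - A.T @ y))
        # Dual step: ascent using the extrapolated point 2*x_new - x.
        y = y + sigma * (b - A @ (2.0 * x_new - x))
        x = x_new
    return x, y

# Tiny example:  min x1 + 2*x2  s.t.  x1 + x2 = 1,  x >= 0  (optimum [1, 0]).
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
x, y = pdhg_lp(c, A, b)
print(x)  # approximately [1, 0]
```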
This is a great example of how a hybrid compute architecture can drive efficiency for inference, which is key to the best TCO and lower energy usage. The setup combines GPU systems like NVIDIA DGX with CPUs built by Apple. Here's a technical breakdown:

1. Workload Specialization
• GPU (DGX Spark): ideal for parallel-heavy tasks like prefill (matrix multiplications, attention blocks).
• CPU (M3 Ultra): excels at sequential, low-latency tasks like token-by-token decode, helped by unified memory and fast single-core performance.
By splitting inference stages, you reduce contention and maximize hardware utilization (a toy sketch follows below).

2. TCO (Total Cost of Ownership) Benefits
• Lower hardware costs: DGX systems are expensive and power-hungry. Offloading decode to a cheaper, efficient CPU system reduces the number of DGX units needed, and the M3 Ultra offers high performance per watt at a fraction of the cost.
• Reduced GPU overprovisioning: decode is typically bottlenecked by latency, not throughput, so running it on a GPU wastes parallelism. Offloading decode frees up the GPU and improves throughput per dollar.
• Scalable deployment: CPU nodes can be scaled independently for decode-heavy workloads, allowing elastic scaling based on workload profiles.

3. Power Efficiency Gains
• Energy-optimized decode: CPUs like the M3 Ultra consume far less power than GPUs for sequential tasks.
• Thermal and cooling savings: DGX systems require active cooling and high-density power delivery; offloading decode reduces the GPU duty cycle, lowering thermal output and cooling costs.
• Idle power reduction: decode often involves waiting for token generation. CPUs can idle efficiently, while GPUs consume power even when underutilized.

With the announcement of NVIDIA NVLink integration with Intel CPU architectures, we may witness more such deployments, a win-win for AI optimization and for CPUs across AI inference!
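To make the prefill/decode split concrete, here is a toy NumPy sketch of one attention layer: prefill is a single big batched matmul over the whole prompt (GPU-friendly), while decode is a per-token matvec loop over the cached K/V (CPU-friendly). All names and shapes are illustrative only, not any vendor's API.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # toy model width

# Hypothetical weights for one attention layer; in a real hybrid deployment
# the prefill weights would live on the GPU and the decode path in the
# CPU's unified memory.
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

def prefill(prompt_embeds):
    """Parallel-heavy stage: one big batched matmul over the whole prompt.
    This is the part that benefits from GPU throughput."""
    K = prompt_embeds @ Wk  # (T, D)
    V = prompt_embeds @ Wv  # (T, D)
    return K, V             # KV cache handed over to the decode stage

def decode_step(x, K, V):
    """Sequential stage: a single token attends over the cached K/V.
    Latency-bound, small matvecs: a good fit for a fast CPU."""
    q = x @ Wq                          # (D,)
    scores = (q @ K.T) / np.sqrt(D)     # attention over cached positions
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()
    return w @ V                        # next hidden state

prompt = rng.standard_normal((128, D))  # a 128-token prompt
K, V = prefill(prompt)                  # "GPU" stage
x = prompt[-1]
for _ in range(16):                     # "CPU" stage, token by token
    x = decode_step(x, K, V)
    # a real decoder would also append the new token's K/V rows to the cache
```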
Accelerate large-scale vector search with the NVIDIA cuVS integration in Faiss, delivering up to 12x faster index builds and 8x lower search latency on GPUs. Effortlessly scale and deploy across CPU and GPU to power real-time AI and retrieval applications. https://lnkd.in/g6QHaziT
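As a starting point, here is a minimal sketch using the classic Faiss Python API: build an exact index on the CPU, move it to the GPU, and search. Whether the GPU path dispatches to the cuVS kernels depends on how your Faiss build was configured, so treat that part of the comment as an assumption rather than a guarantee.

```python
import numpy as np
import faiss  # requires a GPU-enabled Faiss build (e.g. faiss-gpu)

d, nb, nq = 128, 100_000, 10
rng = np.random.default_rng(0)
xb = rng.random((nb, d), dtype=np.float32)  # database vectors
xq = rng.random((nq, d), dtype=np.float32)  # query vectors

# Build a flat (exact) L2 index on the CPU...
cpu_index = faiss.IndexFlatL2(d)
cpu_index.add(xb)

# ...then move it to GPU 0 for accelerated search. In a cuVS-enabled
# Faiss build, the GPU index build and search paths are the ones that
# benefit from the accelerated backends.
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)

distances, ids = gpu_index.search(xq, 5)  # 5 nearest neighbours per query
print(ids[0])
```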
🎥 New video: Performance-portable GPUs with SYCL (oneAPI & AdaptiveCpp) vs CUDA/HIP

We've just published a short explainer on our JPDC paper, where we stress-test SYCL on real HPC workloads (single- and multi-GPU, even mixed NVIDIA+AMD) from one codebase.

What's inside:
• Why SYCL (modern C++) is a pragmatic path to "write once, run fast (almost) everywhere."
• A real application (UVaFTLE for flow analysis), with both memory-bound and compute-light kernels.
• Results: SYCL ≈ HIP on AMD; competitive on NVIDIA (sometimes wins on lighter kernels); mixed-vendor multi-GPU works in practice.
• Practical guidance: prefer device-resident data (USM-device or buffers); avoid shared/managed memory for performance-critical paths; oneAPI tends to shine on NVIDIA, AdaptiveCpp on AMD.

Why it matters: teams juggling heterogeneous clusters and evolving hardware roadmaps can maintain a single well-performing codebase without vendor lock-in, trading a sliver of peak performance for portability and longevity.

▶ Watch this short video!
📄 Preprint (JPDC, 2025): https://lnkd.in/dDPRiPeE
💻 Code (UVaFTLE: CUDA/HIP/SYCL): https://lnkd.in/dxjqC_wK

If you're working on GPU portability, we'd love your feedback!
Can your CUDA code run anywhere? For nearly two decades, NVIDIA's CUDA has set the standard for GPU programming: a powerful framework that turned GPUs into the engine of the AI revolution. Now, as demand for compute soars, others are following in its footsteps. From AMD's HIP to Spectral's SCALE and even new Chinese GPU stacks, developers are reimagining what "CUDA compatibility" means in a more open, multi-vendor world. Our latest deep dive retraces the path from the first GeForce 256 to today's emerging efforts to make CUDA code run agnostically, and what this shift could mean for developers everywhere. 👉 Read the full story, written by Antoine Radet & Cédric Courtaud, Ph.D.: https://lnkd.in/egKSbBFM
A very interesting NVIDIA blog post on floating-point emulation in cuBLAS. The latest cuBLAS update in NVIDIA CUDA Toolkit 13.0 Update 2 introduces new APIs and implementations that significantly boost the performance of double-precision (FP64) matrix multiplications through floating-point emulation on the Tensor Cores of GPU architectures such as NVIDIA GB200 NVL72. The cuBLAS library includes an automatic dynamic precision (ADP) framework that analyzes inputs to determine whether emulation can be safely leveraged for increased performance, and automatically configures emulation parameters to achieve accuracy equal to or better than native FP64 matrix multiplication. Applications such as ecTrans, BerkeleyGW, and Quantum Espresso have seen significant performance improvements from FP emulation, with speedups ranging from 1.5x to 3x while maintaining accuracy within acceptable ranges. https://lnkd.in/dvFbfuq7
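The underlying idea is easier to see in miniature. The toy NumPy sketch below splits each FP64 operand into two FP32 pieces and combines four partial products with high-precision accumulation (low-precision inputs, high-precision accumulate, mirroring the Tensor Core design). It illustrates the principle only; the actual cuBLAS emulation uses an Ozaki-style mantissa-slicing scheme plus the ADP accuracy analysis described in the blog.

```python
import numpy as np

def split32(a64):
    """Split an FP64 matrix into high + low FP32 parts, a64 ~= hi + lo."""
    hi = a64.astype(np.float32)
    lo = (a64 - hi.astype(np.float64)).astype(np.float32)
    return hi, lo

def emulated_fp64_matmul(a, b):
    """Approximate an FP64 matmul from FP32-representable pieces.

    Toy illustration of the principle only: the inputs to each partial
    product are FP32-representable, but accumulation happens in FP64,
    as Tensor Cores accumulate low-precision products in higher precision.
    """
    a_hi, a_lo = split32(a)
    b_hi, b_lo = split32(b)
    f64 = np.float64
    return (a_hi.astype(f64) @ b_hi.astype(f64)
            + a_hi.astype(f64) @ b_lo.astype(f64)
            + a_lo.astype(f64) @ b_hi.astype(f64)
            + a_lo.astype(f64) @ b_lo.astype(f64))

rng = np.random.default_rng(0)
a = rng.standard_normal((256, 256))
b = rng.standard_normal((256, 256))
exact = a @ b
err_fp32 = np.abs((a.astype(np.float32) @ b.astype(np.float32)) - exact).max()
err_emul = np.abs(emulated_fp64_matmul(a, b) - exact).max()
print(f"plain FP32 error: {err_fp32:.1e}  vs  split-FP32 error: {err_emul:.1e}")
```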
Learn when to use CPUs vs. GPUs for AI inference. Compare performance, cost, and energy efficiency to choose the right hardware for your AI workloads. Read more. #CloudComputing https://ow.ly/4leu50X84pz
AI Inference Hardware Decisions: When to Choose CPUs vs. GPUs
🔥 GPU health monitoring just got native in Kubernetes 1.34! No more “Running” Pods with dead GPUs. Kubernetes now tracks per-resource health, surfacing GPU or accelerator failures directly in Pod status. ✅ Detect GPU faults in real time ✅ Automate recovery with controllers ✅ Stop wasting compute on broken devices Read the full blog 👇 👉 https://zurl.co/mA50F #Kubernetes #AI #GPU #ML #CloudNative
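A hedged sketch of how you might read the new per-device health from Pod status with the official Kubernetes Python client. The pod name and namespace are hypothetical, the allocatedResourcesStatus field only appears when the resource-health feature is enabled on your cluster, and the snake_case field names follow the usual client conventions but may vary by client version, hence the defensive getattr calls.

```python
# Poll a Pod and print per-device health from
# status.containerStatuses[].allocatedResourcesStatus.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a cluster
v1 = client.CoreV1Api()

# "train-job-0" / "ml" are hypothetical placeholders.
pod = v1.read_namespaced_pod(name="train-job-0", namespace="ml")
for cs in pod.status.container_statuses or []:
    # Field absent on older clusters/clients; getattr keeps this safe.
    for res_status in getattr(cs, "allocated_resources_status", None) or []:
        # res_status.name is the resource (e.g. nvidia.com/gpu); each entry
        # in res_status.resources carries a device ID and its health.
        for dev in getattr(res_status, "resources", None) or []:
            print(cs.name, res_status.name, dev.resource_id, dev.health)
```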