🤖 AI K-news #41

First newsletter of July. Let's go!

➡️ Introducing Gemini CLI: Open‑Source AI Agent for Developers

Gemini CLI is Google’s new open‑source AI agent that embeds the Gemini 2.5 Pro model directly into the command line. It offers a massive 1 million‑token context window, enabling deep reasoning and powerful code understanding. With natural‑language prompts, developers can write, debug, and manage code, perform deep research, generate content, automate tasks, and even create images or video using Veo and Imagen tools. Gemini CLI also integrates with Google Search and supports the Model Context Protocol (MCP) for extensibility. Available now in preview, it’s free via a personal Google account tied to Gemini Code Assist, offering generous usage limits (60 model requests per minute; 1,000 per day). Code is Apache‑licensed, enabling inspection and community contributions. Gemini CLI transforms the developer terminal into an AI‑powered environment, complementing Gemini Code Assist in VS Code and rivaling tools like Copilot and OpenAI’s Codex CLI.

More info: https://blog.google/technology/developers/introducing-gemini-cli-open-source-ai-agent/

➡️ New Qwen VLo Unifies Multimodal Understanding and Generation

The Alibaba Qwen team has released Qwen VLo, a new AI model that unifies multimodal understanding and generation within a single framework, allowing it to both interpret and create high-quality visual content. The model supports generating and editing images from text or sketches in multiple languages, and introduces features like "progressive generation" that builds images step-by-step for more user control. It is designed to handle complex, open-ended instructions for on-the-fly visual editing, such as changing styles or adding elements, positioning it as a powerful creative engine for a wide array of content-driven industries. The model is available for free use on the Qwen Chat platform.

More info: https://qwenlm.github.io/blog/qwen-vlo/

➡️ New Method Approximates LLM Training Data from Model Weights

Researchers from Cornell University have developed a method to approximate the finetuning data of language models that have open weights but closed training data. This gradient-based approach, called SELECT, addresses the challenge of data approximation by greedily selecting the highest-matching documents from a large public corpus, using only the weights of the original base model and the final finetuned model. Even when the selected data is completely different from the true training set, the method is effective enough to train a new model that approaches the original model's performance on tasks like text classification and supervised finetuning. The research shows that releasing model weights, even without the corresponding data, reveals sufficient information to enable the selection of effective substitute training datasets.

More info: https://arxiv.org/abs/2506.15553
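The paper's exact procedure isn't reproduced here, but the core idea — scoring public documents by how well their gradient direction aligns with the weight delta between the base and finetuned model, then greedily keeping the best matches — can be sketched in a toy form. All names are illustrative, the "gradients" are plain vectors, and the greedy loop is collapsed into a one-shot top-k for brevity:

```python
import math

def score_documents(base_w, finetuned_w, doc_grads):
    """Score each candidate document by how well its gradient
    direction aligns with the base -> finetuned weight delta."""
    delta = [f - b for b, f in zip(base_w, finetuned_w)]
    scores = []
    for grad in doc_grads:
        dot = sum(d * g for d, g in zip(delta, grad))
        norm = math.sqrt(sum(g * g for g in grad)) or 1.0
        scores.append(dot / norm)  # cosine-like alignment score
    return scores

def select_greedy(base_w, finetuned_w, doc_grads, k):
    """Keep the k documents whose gradients best match the weight delta."""
    scores = score_documents(base_w, finetuned_w, doc_grads)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Toy example: the weight delta points along dimension 0, so the
# document whose gradient also points that way scores highest.
base = [0.0, 0.0, 0.0]
tuned = [1.0, 0.1, 0.0]
grads = [[0.9, 0.0, 0.1],   # well aligned with the delta
         [0.0, 1.0, 0.0],   # mostly orthogonal
         [-1.0, 0.0, 0.0]]  # opposed
print(select_greedy(base, tuned, grads, 2))  # → [0, 1]
```

The actual SELECT method works over real model gradients and re-scores as it selects; this sketch only conveys why a weight delta alone leaks enough signal to rank substitute training documents.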

➡️ Advanced Image Editing Goes Open-Weight with FLUX.1 Kontext [dev]

Black Forest Labs has released FLUX.1 Kontext [dev], a 12B parameter open-weight model that brings proprietary-level image editing performance to consumer hardware. Focusing exclusively on editing tasks, the model enables iterative changes and excels at character preservation across diverse scenes. Human preference evaluations show it outperforms existing open and closed models like Bytedance Bagel and Google's Gemini-Flash Image in many categories. Optimized for the NVIDIA Blackwell architecture with specific TensorRT weights, FLUX.1 Kontext [dev] is available under a non-commercial license on Hugging Face, with commercial licenses offered via a new self-serve portal.

More info: https://bfl.ai/announcements/flux-1-kontext-dev

➡️ Baidu Open-Sources ERNIE 4.5, a Family of Multimodal MoE Models

Baidu has open-sourced ERNIE 4.5, a new family of 10 large-scale multimodal models, including Mixture-of-Experts (MoE) variants with up to 424 billion parameters. A key innovation is the use of multimodal heterogeneous MoE pre-training, which jointly trains on text and visual data to improve understanding across modalities. Built on the PaddlePaddle framework, the models demonstrate state-of-the-art results, with the largest version outperforming comparable models on numerous benchmarks. The entire ERNIE 4.5 family is available under the Apache 2.0 license, permitting commercial use, and is supported by open-source toolkits for fine-tuning and deployment.

More info: https://ernie.baidu.com/blog/posts/ernie4.5/

➡️ Natural Language Queries Arrive with Text-to-MQL Feature

MongoDB has introduced Text-to-MQL, a new capability within its LangChain integration designed to translate natural language queries into the MongoDB Query Language (MQL). This feature allows developers to build intuitive, LLM-powered application interfaces, such as internal chatbots or AI agents, that enable users to interact with and retrieve data from MongoDB using everyday language. The integration provides pre-defined tools that form a comprehensive development toolkit, usable standalone or within more complex agentic systems, aiming to democratize data access and enhance developer productivity.

More info: https://www.mongodb.com/blog/post/product-release-announcements/introducing-text-to-mql-langchain-query-mongodb-using-natural-language
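The LangChain integration supplies the translation tooling, but the end product is ordinary MQL. To make the target concrete, here is a hand-written example of the kind of aggregation pipeline an LLM-backed Text-to-MQL layer would be expected to emit for a request like "average order total per customer, highest first" — the `orders` collection and its field names are invented for illustration:

```python
# Natural-language request: "average order total per customer, highest first"
# The MQL aggregation pipeline a Text-to-MQL layer would be expected to
# produce for a hypothetical `orders` collection:
pipeline = [
    {"$group": {"_id": "$customer_id", "avg_total": {"$avg": "$total"}}},
    {"$sort": {"avg_total": -1}},
]

# Sanity-checking the pipeline shape without a live database:
assert pipeline[0]["$group"]["avg_total"] == {"$avg": "$total"}
assert pipeline[1]["$sort"]["avg_total"] == -1
```

In a real deployment the pipeline would be passed to a MongoDB collection's `aggregate()` call; the value of Text-to-MQL is that users never have to write (or see) this syntax themselves.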

➡️ Bridging the Gap Between Speed and Accuracy in Vector Search

Google Research has introduced MUVERA, a novel retrieval algorithm designed to make complex multi-vector search as fast and efficient as single-vector search without sacrificing accuracy. The method works by transforming multi-vector sets into single, fixed-dimensional encodings (FDEs), allowing highly optimized Maximum Inner Product Search (MIPS) algorithms to perform an initial candidate retrieval, followed by a more precise re-ranking step. This approach has been shown to achieve high retrieval accuracy with significantly reduced latency compared to previous state-of-the-art methods, and an open-source implementation of the FDE construction algorithm has been made available on GitHub.

More info: https://research.google/blog/muvera-making-multi-vector-retrieval-as-fast-as-simple-vector-search/
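A much-simplified sketch of the FDE idea: partition the embedding space with random hyperplanes (SimHash-style), aggregate each set's vectors bucket by bucket, and concatenate the buckets into one fixed-length vector, after which candidate retrieval is a single inner product per document. The real construction differs in important details (asymmetric query/document aggregation, multiple repetitions), so treat this purely as an intuition aid:

```python
import random

DIM, NUM_PLANES = 4, 3          # 2**NUM_PLANES = 8 buckets
random.seed(0)
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PLANES)]

def bucket(v):
    """SimHash-style bucket id: the sign pattern against random hyperplanes."""
    return sum((1 << i) for i, p in enumerate(planes)
               if sum(a * b for a, b in zip(v, p)) > 0)

def fde(vectors):
    """Collapse a multi-vector set into one fixed-dimensional encoding
    by summing the vectors that fall into each bucket, then concatenating."""
    out = [[0.0] * DIM for _ in range(1 << NUM_PLANES)]
    for v in vectors:
        b = bucket(v)
        for j in range(DIM):
            out[b][j] += v[j]
    return [x for chunk in out for x in chunk]  # 8 buckets * 4 dims = 32

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Candidate retrieval is now one inner product per document:
query = [[1, 0, 0, 0], [0, 1, 0, 0]]
docs = [[[1, 0, 0, 0]], [[0, 0, 0, 1]]]
q = fde(query)
best = max(range(len(docs)), key=lambda i: dot(q, fde(docs[i])))  # doc 0 wins
```

The payoff is that the expensive many-to-many similarity between vector sets is replaced by standard single-vector MIPS, with the exact multi-vector score only computed during re-ranking.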

➡️ REGEN Benchmark Aims for More Conversational AI Recommendations

To improve personalized recommendations, Google Research has introduced REGEN (Reviews Enhanced with GEnerative Narratives), a new benchmark dataset that integrates natural language interactions. The system augments existing review datasets by using Gemini 1.5 Flash to synthesize conversational elements like "critiques" and "narratives," allowing for the development of recommender systems that can adapt to user feedback in real time. By releasing the REGEN dataset, researchers aim to foster the creation of more sophisticated models that can engage in multi-turn dialogues and provide more intuitive, human-like recommendation experiences.

More info: https://research.google/blog/regen-empowering-personalized-recommendations-with-natural-language/

➡️ Seamless Aims for Real-Time, Expressive Multilingual Communication

Meta has unveiled major updates to its Seamless communication models, a suite of AI tools designed to enable more natural and expressive real-time conversations across different languages. The system now incorporates non-verbal cues like laughter and pauses to create more authentic interactions. The core of the technology is SeamlessExpressive, a model that preserves a speaker's vocal style and emotional nuances when translating their speech. This is complemented by a new streaming model for real-time translation and a text-to-text model, all working together to power more dynamic and lifelike cross-lingual communication.

More info: https://ai.meta.com/blog/seamless-interaction-natural-conversational-dynamics/

➡️ New Embedding Model Boosts Multimodal RAG Pipeline Accuracy

NVIDIA has introduced the Llama 3.2 NeMo Retriever Embedding Model, a 1.6B parameter multimodal tool designed to improve the accuracy of Retrieval-Augmented Generation (RAG) pipelines by directly processing both text and images. Available as an NVIDIA NIM (NVIDIA Inference Microservice), the model generates 2,048-dimensional embeddings and has been shown to outperform other small vision embedding models in retrieval accuracy on various benchmarks. This new retriever is designed for easy integration and is accessible via the API catalog in NVIDIA's hosted environment.

More info: https://developer.nvidia.com/blog/best-in-class-multimodal-rag-how-the-llama-3-2-nemo-retriever-embedding-model-boosts-pipeline-accuracy/
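The NIM API itself isn't reproduced here, but the retrieval step downstream of any such embedding model is the same regardless of modality: rank stored embeddings by similarity to the query embedding. A minimal stdlib sketch, with toy 3-d vectors standing in for the model's 2,048-dimensional output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def retrieve(query_emb, corpus_embs, top_k=2):
    """Return indices of the top_k most similar corpus embeddings."""
    ranked = sorted(range(len(corpus_embs)),
                    key=lambda i: cosine(query_emb, corpus_embs[i]),
                    reverse=True)
    return ranked[:top_k]

# Toy corpus: each vector would come from embedding a text chunk or image.
corpus = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.1, 0.0, 0.9]]
assert retrieve([1.0, 0.0, 0.0], corpus, top_k=1) == [0]
```

In a multimodal RAG pipeline the only change is that both images and text pass through the same embedding model, so a text query can retrieve image-derived chunks directly — no OCR or captioning stage required.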

➡️ Tencent Open-Sources Hunyuan-A13B, an Efficient MoE Model

Tencent's Hunyuan team has introduced Hunyuan-A13B, an open-source large language model built on a fine-grained Mixture-of-Experts (MoE) architecture with 80 billion total parameters, of which only 13 billion are active per token, significantly reducing inference cost and memory usage relative to dense models of comparable capability. The model supports a hybrid reasoning mode that switches between fast responses and slower, step-by-step "thinking," and offers a native 256K context window for long-context tasks. Hunyuan-A13B has demonstrated strong performance against models of similar scale on various benchmarks, particularly agentic and tool-use tasks, and its weights are openly available on GitHub and Hugging Face.

More info: https://github.com/Tencent-Hunyuan/Hunyuan-A13B

➡️ New Framework Gives Developers More Control Over Agentic Workflows

LlamaIndex has announced the official release of Workflows 1.0, a lightweight, async-first framework for building complex, multi-step agentic systems in Python and TypeScript. Now available as its own standalone package to encourage broader adoption outside the core LlamaIndex ecosystem, Workflows allows developers to orchestrate AI models and APIs while maintaining a high level of control. The 1.0 release introduces new features including typed workflow state for improved safety, dynamic resource injection, and enhanced observability through integrations with tools like OpenTelemetry. The framework is designed for a wide range of use cases, from document processing pipelines and multimodal applications to role-based multi-agent systems.

More info: https://www.llamaindex.ai/blog/announcing-workflows-1-0-a-lightweight-framework-for-agentic-systems

➡️ Path to Medical Superintelligence (Microsoft AI Diagnostic Orchestrator)

Microsoft's MAI-DxO, a medical AI "orchestrator," achieved an impressive 85% accuracy on 304 New England Journal of Medicine case studies, about four times better than the 20% accuracy achieved by 21 experienced physicians working without external aids. Using a novel "chain-of-debate" method, MAI-DxO coordinates multiple top AI models (like OpenAI's o3, Meta's Llama, Anthropic's Claude, Google's Gemini, and xAI's Grok) to collaboratively diagnose complex conditions. It also reduced unnecessary testing, cutting estimated costs by roughly 20%. While heralded as a step toward "medical superintelligence," the system remains experimental and unverified in clinical settings. Microsoft emphasizes it will complement, not replace, human clinicians, who provide critical empathy and contextual judgment.

More info: https://microsoft.ai/new/the-path-to-medical-superintelligence/

➡️ Modular 'Unmute' System Brings Voice to Any Text LLM

The French AI research lab Kyutai has introduced Unmute, an open-source system designed to add real-time, natural-sounding voice interaction to any text-based Large Language Model (LLM). The modular framework wraps a text LLM with Kyutai's own streaming speech-to-text (STT) and text-to-speech (TTS) models. Key features include semantic Voice Activity Detection (VAD) that understands pauses to avoid premature interruptions, the ability to be interrupted for smart turn-taking, and support for voice cloning from a short audio sample. The entire system is available on GitHub under an MIT license and is optimized for low latency, enabling more natural conversational experiences with existing powerful text models.

More info: https://kyutai.org/2025/05/22/unmute.html | https://github.com/kyutai-labs/unmute

➡️ How Korean Teachers Are Embracing AI in Classrooms to Drive Change

Korean educators are leveraging AI to create more engaging and inclusive classrooms. Teachers like Hyunsik Cho and Sangmin Lee are using tools such as Microsoft Copilot, Minecraft, and Power BI to simplify lesson prep, personalize learning, and visualize data. These technologies save time and help tailor content to students’ needs. At Gunseo Future International School, teachers adapt materials to students’ language levels and cultural backgrounds with AI support. Across Busan, Chungbuk, and Gyeonggi, educators are shifting their mindset to integrate AI not just as a tool, but as a partner in teaching, fostering student creativity, collaboration, and deeper understanding.

More info: https://news.microsoft.com/source/asia/features/how-korean-teachers-are-embracing-ai-in-classrooms-to-drive-change/

Credits to Héctor Castillejo and José Francisco Pardo García
