Release Notes 03.05.2026

AI Model Portfolio Update

To keep the PHOENIQS AI Model Service current and to provide customers with access to the latest model capabilities, we are updating the model portfolio as part of our regular lifecycle management process. 

Over the next 20 business days, selected models will be retired from service. A recommended replacement has been identified for each, offering stronger capability coverage, improved performance, and better long-term support, so that the service remains aligned with the rapidly evolving model landscape.

As part of this refresh, we are pleased to introduce Kimi K2.6 and Gemma 4.1. Additional models are also planned for release on a rolling basis, including: 

  • Qwen 3.5 35B-A3B
  • GLM-5
  • Quest Coder V1 7B Instruct
  • Voxtral 4B TTS
  • FLUX.2 [klein] 4B
  • Qwen-Image-2512
  • Qwen-Image-Edit-2511

Recommended replacement models have been identified to support a smooth transition based on workload type and capability profile. Because behavior may vary across models, customers should validate prompts, tool behavior, output formats, latency, and response quality before migration.  

This portfolio update reflects our continued investment in keeping the service up to date and expanding customer access to new model capabilities as they become available. 

If short-term continued access to a retiring model is required for transition planning, customers may submit a service desk ticket outlining the use case, expected business impact, and requested timeframe. Requests will be reviewed in accordance with the standard decommissioning process; however, continued availability cannot be guaranteed. Further model additions will be communicated as they are released into service. 

Recommended replacements

Decommissioned model: apertus-8B
Recommended replacement: inference-apertus-70B
Replacement basis: A large general-purpose instruct model optimized for complex enterprise workloads, including conversational AI, content generation, summarization, question answering, and multi-step reasoning, with deployment and optimization tailored for Switzerland.
Source: Hugging Face model card

Decommissioned model: deepseekr1-70b
Recommended replacement: inference-qwq-32b
Replacement basis: Qwen describes QwQ-32B as its medium-sized reasoning model, built for harder downstream problems and capable of competitive performance against state-of-the-art reasoning models, including DeepSeek-R1 and o1-mini.
Source: Hugging Face model card

Decommissioned model: qwq25-vl-72b
Recommended replacement: inference-qwen3-vl-235b
Replacement basis: Qwen describes Qwen3-VL as the most powerful vision-language model in the Qwen series to date, with stronger text understanding and generation, deeper visual reasoning, longer context, and stronger agent interaction capabilities.
Source: Hugging Face model card

Decommissioned model: kimi-K2
Recommended replacement: inference-kimi-K2.6
Replacement basis: Moonshot AI describes Kimi K2.6 as an open-source, native multimodal agentic model with advances in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration.
Source: Hugging Face model card

Decommissioned model: llama 3.3 70B
Recommended replacement (primary): inference-llama4-maverick
Replacement basis: Meta Llama 4 Maverick is a natively multimodal mixture-of-experts model designed for text and image understanding, with strong performance on multilingual tasks, coding, tool calling, and agentic use cases, while aiming for fast responses at relatively low cost.
Source: Hugging Face model card

Decommissioned model: llama 3.3 70B
Recommended replacement (secondary): inference-llama4-scout-17b
Replacement basis: Meta Llama 4 Scout is a natively multimodal model designed for text and image understanding, with single-H100-GPU efficiency and a 10 million token context window for long-document and long-context workloads. Meta says the Llama 4 family is optimized for multimodal understanding, multilingual tasks, coding, tool calling, and agentic systems.
Source: Hugging Face model card

Decommissioned model: deepseek-670B
Recommended replacement: inference-deepseek-V32
Replacement basis: DeepSeek introduces DeepSeek-V3.2 as a model that combines high computational efficiency with strong reasoning and agent performance.
Source: Hugging Face model card



What this means for you 

If you are currently using any of the affected models, please plan to migrate applications, prompts, and API calls to the recommended alternatives within the next 20 business days, based on the Basel, Switzerland business calendar. 

Because output characteristics and tool behavior may vary by model, we recommend validating prompt compatibility, structured outputs, latency, and overall response quality before cutover. 

Recommended next steps 
  • Identify applications and workflows currently using the affected models. 
  • Update model references to the recommended replacement models. 
  • Test prompt compatibility, latency, output quality, and downstream integrations. 
  • Complete migration before the communicated decommissioning date. 
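One way to structure the prompt-compatibility and latency testing described above is a small harness that runs the same prompts against the retiring model and its recommended replacement and records each model's output and latency side by side. The client wrapper, model identifiers, and stub below are illustrative assumptions for sketching purposes, not part of the PHOENIQS service API:

```python
import time

def compare_models(call_fn, prompts, old_model, new_model):
    """Run each prompt against both models and collect output plus latency.

    call_fn(model, prompt) -> str is whatever client wrapper your
    deployment uses (for example, a thin wrapper around an
    OpenAI-compatible chat endpoint); it is an assumption here.
    """
    report = []
    for prompt in prompts:
        row = {"prompt": prompt}
        for label, model in (("old", old_model), ("new", new_model)):
            start = time.perf_counter()
            output = call_fn(model, prompt)          # one call per model
            row[label] = {
                "output": output,
                "latency_s": time.perf_counter() - start,
            }
        report.append(row)
    return report

# Demonstration with a stub in place of a real client call,
# so the harness can be exercised without service credentials:
def stub_call(model, prompt):
    return f"{model}:{prompt.upper()}"

report = compare_models(
    stub_call, ["hello"], "llama-3.3-70b", "inference-llama4-maverick"
)
```

In practice the per-prompt rows can be diffed or scored (for example, checking that structured outputs still parse and that latency stays within the workload's budget) before committing to the cutover.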
Support 

Please contact your account team or support representative if you would like assistance selecting or validating the most appropriate replacement for a specific workload.