SageMaker JumpStart now offers optimized deployments, enabling customers to deploy foundation models with pre-configured settings tailored to specific use cases and performance constraints. SageMaker JumpStart optimized deployments simplify model deployment by offering task-aware configurations that optimize for cost, throughput, or latency based on your workload requirements - whether content generation, summarization, or Q&A. This launch includes support for 30+ popular models from Meta, Microsoft, Mistral AI, Qwen, Google, and TII, with visibility into key performance metrics like P50 latency, time-to-first token (TTFT), and throughput before deployment.
With SageMaker JumpStart optimized deployments, customers can select from use case-specific configurations (such as generative writing or chat-style interactions) and choose optimization targets including cost-optimized, throughput-optimized, latency-optimized, or balanced performance. Models deploy to SageMaker AI Managed Inference endpoints or SageMaker HyperPod clusters with pre-set configurations that eliminate guesswork while maintaining full visibility into deployment details. Available models include Meta Llama 3.1 and 3.2 variants, Microsoft Phi-3, Mistral AI models including the new Mistral-Small-24B-Instruct-2501, Qwen 2 and 3 series including multimodal Qwen2-VL, Google Gemma, and TII Falcon3. All deployments leverage SageMaker's VPC deployment capabilities, ensuring data control and production-ready infrastructure with enterprise-grade security. The feature is available in all AWS regions where SageMaker JumpStart is currently supported.
To get started with optimized deployments, navigate to Models in SageMaker Studio, select your desired foundation model in the JumpStart Models tab, choose "Deploy," and select your use case and performance optimization target. For details, visit the SageMaker JumpStart documentation. AWS is actively expanding support to include additional models.
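If you prefer to script the deployment instead of using the Studio flow, the SageMaker Python SDK offers a comparable path through its JumpStartModel class. The sketch below is a minimal example under assumptions: the model ID and instance type are illustrative, and the use case and optimization target selection described above is made in the Studio deployment flow rather than in this call.

```python
# Minimal sketch of a programmatic JumpStart deployment with the SageMaker Python SDK.
# The model ID and instance type are illustrative; pick them from the JumpStart catalog.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-3-1-8b-instruct")

# Llama models require accepting the end user license agreement at deploy time.
predictor = model.deploy(
    accept_eula=True,
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# Send a simple text generation request to the new endpoint.
print(predictor.predict({"inputs": "Summarize the benefits of optimized deployments."}))
```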
AWS Clean Rooms now supports configurable Spark properties for PySpark jobs, offering customers the ability to optimize their workloads based on their performance and scale requirements. With this launch, customers can customize Spark settings such as memory overhead, task concurrency, and network timeouts for each analysis that uses PySpark, the Python API for Apache Spark. For example, a pharmaceutical research company collaborating with healthcare organizations for real-world clinical trial data can set specific memory tuning for large-scale workloads to improve performance and optimize costs.
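As a rough illustration of the settings involved, the snippet below collects a few standard Apache Spark properties of the kind mentioned above; exactly how they are supplied when starting a Clean Rooms PySpark job is covered in the AWS Clean Rooms documentation and is not reproduced here.

```python
# Sketch: standard Apache Spark properties of the kind you can now tune per analysis.
# These are ordinary Spark settings; see the AWS Clean Rooms documentation for how to
# attach them to a PySpark protected job.
spark_properties = {
    "spark.executor.memoryOverhead": "4g",  # extra off-heap memory for large-scale joins
    "spark.default.parallelism": "400",     # task concurrency for wide shuffles
    "spark.network.timeout": "600s",        # tolerate slow stages on very large datasets
}
```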
AWS Clean Rooms helps companies and their partners easily analyze and collaborate on their collective datasets without revealing or copying one another’s underlying data. For more information about the AWS Regions where AWS Clean Rooms is available, see the AWS Regions table. To learn more about collaborating with AWS Clean Rooms, visit AWS Clean Rooms.
Amazon Managed Grafana now supports creating new workspaces with Grafana version 12.4. This release includes features launched as part of open source Grafana versions 11.0 to 12.4, including Drilldown apps, Scenes-powered dashboards, variables in transformations, visualization enhancements, and new features in the Amazon CloudWatch plugin.
Queryless Drilldown apps enable customers to perform point-and-click exploration of Prometheus metrics, Loki logs, Tempo traces, and Pyroscope profiles. The Scenes-powered rendering engine boosts dashboard performance. The Amazon CloudWatch plugin adds support for PPL and SQL queries in CloudWatch Logs, cross-account Metrics Insights, and log anomaly detection. The rebuilt table visualization improves performance and adds CSS cell styling and interactive Actions buttons, while trendline transformations and navigation bookmarks enhance data exploration. Grafana 12.4 is supported in all AWS regions where Amazon Managed Grafana is generally available.
You can create a new Amazon Managed Grafana workspace from the AWS Console, SDK, or CLI. To explore the complete list of new features, please refer to the user documentation. Follow the instructions here to create workspaces with version 12.4. To learn more about Amazon Managed Grafana features and its pricing, visit the product page and pricing page.
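For scripted setups, the workspace can also be created with boto3; the sketch below assumes illustrative names and an IAM Identity Center (AWS SSO) configuration, so adjust the access and authentication settings to your account.

```python
# Sketch: create a Grafana 12.4 workspace with boto3. Names and the role ARN are
# placeholders; adjust authentication and access settings for your account.
import boto3

grafana = boto3.client("grafana")

response = grafana.create_workspace(
    workspaceName="observability-12-4",
    grafanaVersion="12.4",
    accountAccessType="CURRENT_ACCOUNT",
    authenticationProviders=["AWS_SSO"],
    permissionType="SERVICE_MANAGED",
    workspaceRoleArn="arn:aws:iam::123456789012:role/GrafanaWorkspaceRole",
)
print(response["workspace"]["id"], response["workspace"]["status"])
```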
Amazon Connect now supports the use of flow modules across all Connect flows, allowing you to reuse common logic and functionality beyond inbound customer experiences. Flow modules organize repeatable logic and create common reusable functions across the customer experiences you build with flows. For example, you can now use a module to share information about a customer’s recent transactions in an agent whisper flow, preparing the agent with relevant details and leveraging functionality that was previously only available as part of inbound flows.
Additionally, you can now use flow modules within other modules, enabling you to build complex logic by stitching together pre-built intermediary steps under a single module. For example, a credit card eligibility module can invoke other modules that check credit scores, verify income, and review payment history before making a final determination. This modular approach allows you to build reusable components that can be combined and extended as your business requirements evolve.
To learn more about these features, see the Amazon Connect Administrator Guide. To understand recent enhancements to flow module capabilities, see our AWS blog post. This feature is available in all AWS regions where Amazon Connect is offered. To learn more about Amazon Connect, the AWS cloud-based contact center, please visit the Amazon Connect website.
Today, AWS Deadline Cloud announces an AI-powered troubleshooting assistant that helps you quickly diagnose and resolve render job failures. AWS Deadline Cloud is a fully managed service that simplifies render management for computer-generated 2D/3D graphics and visual effects for films, TV shows, commercials, games, and industrial design.
Render job failures from missing assets, software errors, configuration mismatches, and resource constraints can stall production pipelines and waste compute resources. Previously, diagnosing these issues required specialized technical staff to manually parse logs and identify root causes — a process that is time-consuming, difficult to scale, and often unavailable to smaller studios. The new Deadline Cloud assistant investigates failed jobs you identify, analyzes logs and metrics, detects common issues, and provides troubleshooting recommendations based on industry best practices and a pre-trained knowledge base covering Deadline Cloud, common render farm issues, and popular digital content creation applications including Autodesk Maya, 3ds Max, VRED, Blender, SideFX Houdini, Maxon Cinema 4D, Foundry Nuke, and Adobe After Effects. The assistant runs within your AWS account using Amazon Bedrock, keeping all data and analysis within your control.
The Deadline Cloud assistant is available today in all AWS Regions where AWS Deadline Cloud is supported. Watch a demo on YouTube to see it in action, or visit the AWS Deadline Cloud documentation to learn more.
Amazon SageMaker HyperPod now supports on-demand deep health checks for Amazon EKS and Slurm-orchestrated clusters, enabling you to proactively verify GPU accelerator health on running instances at any time. HyperPod Slurm-orchestrated clusters now also support deep health checks during node provisioning, at the time of cluster creation. This capability addresses a critical challenge where even a single unhealthy node can waste hours of compute time and delay critical workloads.
With on-demand deep health checks, you can target entire instance groups or specific instances to run comprehensive hardware stress tests and connectivity tests before committing compute resources to a job. Progress and results are visible at both the instance group and instance level through the SageMaker console and APIs, providing complete visibility into GPU health, network connectivity, and multi-node communication performance. Instances undergoing checks are automatically isolated from workload scheduling and returned to service upon passing. When paired with HyperPod's automatic node recovery capability, instances that fail are automatically rebooted or replaced, ensuring cluster health.
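The provisioning-time checks can be declared directly in the cluster definition; the sketch below is an illustrative boto3 example of an instance group that runs stress and connectivity checks as nodes come online. Names, counts, and ARNs are placeholders, and the on-demand trigger described above is invoked separately through the console or cluster APIs.

```python
# Illustrative sketch: an instance group that runs deep health checks (hardware stress
# and connectivity) as nodes are provisioned. Names, counts, and ARNs are placeholders.
import boto3

sagemaker = boto3.client("sagemaker")

sagemaker.create_cluster(
    ClusterName="training-cluster",
    InstanceGroups=[
        {
            "InstanceGroupName": "gpu-workers",
            "InstanceType": "ml.p5.48xlarge",
            "InstanceCount": 16,
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/lifecycle/",
                "OnCreate": "on_create.sh",
            },
            "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodExecutionRole",
            # Verify GPU and network health before nodes join the cluster.
            "OnStartDeepHealthChecks": ["InstanceStress", "InstanceConnectivity"],
        }
    ],
)
```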
This capability is available in all regions where Amazon SageMaker HyperPod is available. To learn more about on-demand health checks, see the documentation.
Amazon Elastic Container Registry (Amazon ECR) now automatically discovers and syncs OCI referrers, such as image signatures, SBOMs, and attestations, from upstream registries into your Amazon ECR private repositories with its pull through cache feature.
Previously, when you listed referrers on a repository with a matching pull through cache rule, Amazon ECR would not return or sync referrers from the upstream repository. This meant that you had to manually list and fetch the upstream referrers.
With today's launch, Amazon ECR's pull through cache will now reach upstream during referrers API requests and automatically cache related referrer artifacts in your private repository. This enables end-to-end image signature verification, SBOM discovery, and attestation retrieval workflows to work seamlessly with pull through cache repositories without requiring any client-side workarounds.
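If you have not set up pull through cache yet, a rule like the sketch below is all that is needed; once the rule exists, referrer artifacts for cached images are synced automatically when clients query the referrers API.

```python
# Sketch: a pull through cache rule for a public upstream registry. With this launch,
# signatures, SBOMs, and attestations for cached images are fetched automatically when
# the OCI referrers API is called against the private repository.
import boto3

ecr = boto3.client("ecr")

ecr.create_pull_through_cache_rule(
    ecrRepositoryPrefix="ecr-public",      # cached images land under this prefix
    upstreamRegistryUrl="public.ecr.aws",  # upstream registry to cache from
)
```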
This feature is available today in all AWS Regions where Amazon ECR pull through cache is supported. To learn more, visit the Amazon ECR documentation.
In this release, AWS Neuron SDK 2.29.0 promotes the Neuron Kernel Interface (NKI) from Beta to Stable with version 0.3.0. NKI gives developers direct, low-level programming access to AWS Trainium and AWS Inferentia NeuronCores using a Python-based syntax. This release introduces the NKI Standard Library, which exposes developer-visible source code for all NKI APIs and native language objects. It also contains a new CPU Simulator that lets developers write, test, and debug NKI kernels locally on a standard CPU, without requiring Trainium hardware, using standard Python debugging tools. NKI 0.3.0 also adds new ISA-level features including a dedicated exponential instruction, matmul accumulation control, DMA priority settings for Trn3, and variable-length all-to-all collectives.
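For readers new to NKI, a minimal elementwise kernel looks roughly like the sketch below; it follows the published getting-started examples, so module paths and helpers may differ slightly in the 0.3.0 Standard Library.

```python
# Minimal NKI sketch based on the published getting-started examples; module paths
# may differ slightly in the 0.3.0 Standard Library.
import neuronxcc.nki as nki
import neuronxcc.nki.language as nl


@nki.jit
def tensor_add_kernel(a_input, b_input):
    # Output tensor in device HBM with the same shape and dtype as the inputs.
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)
    # Load tiles from HBM into on-chip memory, compute, and store the result back.
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)
    nl.store(c_output, value=a_tile + b_tile)
    return c_output
```

Kernels written this way can also be exercised off-device with the CPU Simulator mentioned above, which is what enables local debugging with standard Python tools.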
The NKI Library expands with 7 new experimental kernels covering Conv1D, a multi-layer Transformer token generation megakernel, fused communication-compute primitives for Trainium2, and dynamic tiling operations. Existing kernels also receive improvements. Attention CTE scales to larger batch sizes and sequence lengths, MLP adds mixed-precision quantization paths, and MoE TKG introduces a dynamic all-expert algorithm.
For inference, NxD Inference improves vision language model support with optimizations for Qwen3 VL and Qwen2 VL, including text-model sequence parallelism and vision data parallelism. The vLLM Neuron Plugin has been updated to version 0.5.0.
Neuron Explorer, Neuron’s profiling and debugging suite of tools, also moves from Beta to Stable. The System Trace Viewer now supports the full set of Device widgets for multi-device profile analysis, and the tool is available on the VS Code Extension Marketplace for streamlined installation. For full release details, see the AWS Neuron SDK 2.29.0 release notes.
The SDK is available in all AWS Regions supporting Inferentia and Trainium instances.
Amazon SageMaker HyperPod now supports flexible instance groups, enabling customers to specify multiple instance types and multiple subnets within a single instance group. Customers running training and inference workloads on HyperPod often need to span multiple instance types and availability zones for capacity resilience, cost optimization, and subnet utilization. Previously, they had to create and manage a separate instance group for every instance type and availability zone combination, resulting in operational overhead across cluster configuration, scaling, patching, and monitoring.
With flexible instance groups, you can define an ordered list of instance types using the new InstanceRequirements parameter and provide multiple subnets across availability zones in a single instance group. HyperPod provisions instances using the highest-priority type first and automatically falls back to lower-priority types when capacity is unavailable, eliminating the need for customers to manually retry across individual instance groups. Training customers benefit from multi-subnet distribution within an availability zone to avoid subnet exhaustion. Inference customers scaling manually get automatic priority-based fallback across instance types without needing to retry each instance group individually, while those using Karpenter autoscaling can reference a single flexible instance group. Karpenter automatically detects supported instance types from the flexible instance group and provisions the optimal type and availability zone based on pod requirements. You can create flexible instance groups using the CreateCluster and UpdateCluster APIs, the AWS CLI, or the AWS Management Console.
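The request shape below is a hypothetical sketch of how such a group could be defined with boto3: the InstanceRequirements parameter is named in the announcement, but the exact field names and nesting shown here are illustrative, so confirm them against the API reference.

```python
# Hypothetical sketch only: InstanceRequirements is the parameter named in the
# announcement, but the field names and nesting below are illustrative.
import boto3

sagemaker = boto3.client("sagemaker")

sagemaker.update_cluster(
    ClusterName="training-cluster",
    InstanceGroups=[
        {
            "InstanceGroupName": "flexible-gpu-workers",
            "InstanceCount": 32,
            # Ordered by priority: HyperPod tries the first type and falls back as needed.
            "InstanceRequirements": [
                {"InstanceType": "ml.p5.48xlarge"},
                {"InstanceType": "ml.p4d.24xlarge"},
            ],
            # Multiple subnets across Availability Zones for capacity and IP headroom.
            "OverrideVpcConfig": {
                "Subnets": ["subnet-0aaa0000", "subnet-0bbb1111", "subnet-0ccc2222"],
                "SecurityGroupIds": ["sg-0123456789abcdef0"],
            },
            "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodExecutionRole",
        }
    ],
)
```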
Flexible instance groups are available for SageMaker HyperPod clusters using the EKS orchestrator in all AWS Regions where SageMaker HyperPod is supported. To learn more, see Flexible instance groups.
Amazon EC2 High Memory U7i-8TB instances (u7i-8tb.112xlarge) and U7i-12TB instances (u7i-12tb.224xlarge) are now available in the AWS Asia Pacific (Singapore) Region. U7i instances are part of the AWS 7th generation of EC2 High Memory instances and are powered by custom fourth generation Intel Xeon Scalable Processors (Sapphire Rapids). U7i-8tb instances offer 8TiB of DDR5 memory, and U7i-12tb instances offer 12TiB of DDR5 memory, enabling customers to scale transaction processing throughput in a fast-growing data environment.
U7i-8tb instances deliver 448 vCPUs; U7i-12tb instances deliver 896 vCPUs. Both instances support up to 100 Gbps of Amazon EBS bandwidth for faster data loading and backups, 100 Gbps of network bandwidth, and ENA Express. U7i instances are ideal for customers using mission-critical in-memory databases like SAP HANA, Oracle, and SQL Server.
To learn more about U7i instances, visit the High Memory instances page.
Security logs capture essential security-related activities, such as user sign-ins, file access, network traffic, and application usage. These logs are important for monitoring, detecting, and responding to potential security events, but they typically arrive in inconsistent, vendor-specific formats that are difficult to correlate. The Open Cybersecurity Schema Framework (OCSF) addresses this challenge by providing a standardized format to represent security events, ensuring consistent and efficient data handling across […]
Bulletin ID: 2026-016-AWS
Scope: AWS
Content Type: Important (requires attention)
Publication Date: 2026/04/17 11:15 AM PDT
Description:
The Amazon EFS CSI Driver is a Container Storage Interface driver that allows Kubernetes clusters to use Amazon Elastic File System.
We identified CVE-2026-6437, where an actor with PersistentVolume creation privileges can inject arbitrary mount options via two unsanitized fields: the Access Point ID in volumeHandle and the mounttargetip volumeAttribute. In both cases, appending comma-separated values causes the mount utility to parse them as separate mount options.
No AWS service is affected.
Impacted versions: EFS CSI Driver <= v3.0.0
Please refer to the article below for the most up-to-date and complete information related to this AWS Security Bulletin.
In this post, we share how AWS Marketing’s Technology, AI, and Analytics (TAA) team worked with Gradial to build an agentic AI solution on Amazon Bedrock for accelerating content publishing workflows.
This hands-on guide walks through every step of fine-tuning an Amazon Nova model with the Amazon Nova Forge SDK, from data preparation, to training with data mixing, to evaluation, giving you a repeatable playbook you can adapt to your own use case. This is the second part in our Nova Forge SDK series, building on the SDK introduction and first part, which covered kicking off customization experiments.
In this post, we show you how to build a video semantic search solution on Amazon Bedrock using Nova Multimodal Embeddings that intelligently understands user intent and retrieves accurate video results across all signal types simultaneously. We also share a reference implementation you can deploy and explore with your own content.
In this post, we show you how to use Model Distillation, a model customization technique on Amazon Bedrock, to transfer routing intelligence from a large teacher model (Amazon Nova Premier) into a much smaller student model (Amazon Nova Micro). This approach cuts inference cost by over 95% and reduces latency by 50% while maintaining the nuanced routing quality that the task demands.
In this post, we share how Amazon Bedrock's granular cost attribution works and walk through example cost tracking scenarios.