OpenAI launches compact models, competing with DeepSeek, accessible for the first time on Amazon Web Services (AWS)

Tailored for resource-constrained settings, though independent benchmark results are not yet available

Artificial intelligence company OpenAI is introducing compact, open-weight models to compete with DeepSeek, marking the first time its models will be accessible on the Amazon Web Services (AWS) platform.

OpenAI, a leading AI research company, has made a significant stride in the AI landscape by releasing two open-weight large language models (LLMs), gpt-oss-120B and gpt-oss-20B. These models are now available on Amazon Web Services (AWS) through Amazon Bedrock and Amazon SageMaker AI platforms.
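
For developers on AWS, the call path is the standard Amazon Bedrock runtime API. The sketch below is a minimal, hypothetical example using boto3's Converse API; the model identifier is an assumption and should be verified against the Bedrock model catalog for your region.

```python
# Minimal sketch (not official documentation): invoking a gpt-oss model
# through Amazon Bedrock's Converse API with boto3.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

response = bedrock.converse(
    # Hypothetical model ID -- check the Bedrock console for the real one.
    modelId="openai.gpt-oss-120b-1:0",
    messages=[
        {"role": "user",
         "content": [{"text": "Summarize the trade-offs of MoE models."}]},
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.7},
)

print(response["output"]["message"]["content"][0]["text"])
```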

These state-of-the-art models deliver strong performance in reasoning, tool use, and coding tasks, outperforming similarly sized open models and approaching the performance of OpenAI's proprietary o4-mini and o3-mini models.

The 120B model achieves near-parity with OpenAI's o4-mini on key reasoning benchmarks while running efficiently on a single 80GB GPU, making it suitable for high-capability agentic and autonomous applications. The 20B model, fitting within 16GB VRAM, matches o3-mini-level performance and is optimized for consumer and edge hardware where resources are limited.
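
To illustrate the 20B model's modest footprint, the sketch below loads it on a single GPU with Hugging Face Transformers. It assumes the weights are published under the openai/gpt-oss-20b repository and that the quantized checkpoint fits in roughly 16GB of VRAM.

```python
# Minimal sketch: loading gpt-oss-20B on a single consumer GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed Hugging Face repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # place layers on the available GPU automatically
)

messages = [{"role": "user",
             "content": "Explain test-time scaling in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:],
                       skip_special_tokens=True))
```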

Performance highlights include strong reasoning benchmark scores, particularly in few-shot chain-of-thought (CoT) settings, and smooth test-time scaling. Both models show strong tool-use and coding capabilities, with the 120B outperforming earlier open-source models and some proprietary GPT-4o variants on HealthBench and on agentic tasks.
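
For context, "few-shot chain-of-thought" means the prompt includes worked examples whose answers spell out intermediate reasoning before the final answer. A made-up illustration:

```python
# Illustrative sketch of a few-shot chain-of-thought prompt: the worked
# example shows the step-by-step reasoning pattern the model should follow.
few_shot_cot = """Q: A train travels 120 km in 2 hours. What is its average speed?
A: Let's think step by step. Speed = distance / time = 120 km / 2 h = 60 km/h.
The answer is 60 km/h.

Q: A store sells pens at 3 for $2. How much do 12 pens cost?
A: Let's think step by step."""

# The model is expected to continue: 12 pens is 4 groups of 3,
# so 4 * $2 = $8, then state the final answer.
print(few_shot_cot)
```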

The models pair Mixture-of-Experts (MoE) architectures, in which only a subset of parameters is active for each token, with 4-bit quantization (mxfp4), achieving fast and cost-effective inference on consumer-grade GPUs. They also offer a 128K context window, supporting longer interactions such as document analysis or technical support tasks.
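
To make the MoE idea concrete, here is a toy sketch of top-k expert routing, where a learned gate activates only a few experts per token; the dimensions, expert count, and k are illustrative, not the actual gpt-oss configuration.

```python
# Toy sketch of top-k Mixture-of-Experts routing. Only the k experts
# chosen by the gate run for each token, so active parameters per token
# are a small fraction of the total parameter count.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # learned router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.gate(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):             # run only selected experts
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot:slot+1] * self.experts[e](x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```

Because only k of the n experts run per token, compute scales with active rather than total parameters, which is what keeps inference on consumer hardware affordable.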

OpenAI's gpt-oss models are licensed under Apache 2.0, a permissive open-source license that allows extensive freedom for reuse, modification, and commercial integration without strict restrictions or royalties. This move aims to lower access barriers for AI development in high-security or resource-limited environments.

The release positions OpenAI as a key player in open model infrastructure and strengthens its ties with Amazon Web Services. OpenAI and AWS are promoting this release as a developer-ready foundation for building scalable AI applications, with features like Guardrails and custom model import.
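
As a sketch of how Guardrails attach to a request, Bedrock's Converse API accepts a guardrail configuration alongside the model call; the guardrail ID, version, and model ID below are placeholders standing in for values from your own AWS account setup.

```python
# Minimal sketch: attaching a pre-configured Bedrock Guardrail to a
# gpt-oss request. All identifiers are placeholders.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

response = bedrock.converse(
    modelId="openai.gpt-oss-20b-1:0",  # hypothetical model ID
    messages=[{"role": "user",
               "content": [{"text": "Draft a customer support reply."}]}],
    guardrailConfig={
        "guardrailIdentifier": "YOUR_GUARDRAIL_ID",  # placeholder
        "guardrailVersion": "1",
    },
)
print(response["output"]["message"]["content"][0]["text"])
```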

This release marks OpenAI's entry into the open-weight model segment, a space that until now has been dominated by competitors such as Mistral AI and Meta. External evaluations of the models' performance across varied workloads are not yet available.

The models work with common developer tooling, including vLLM, llama.cpp, and Hugging Face. This compatibility eases adoption across platforms, toolchains, and open-source projects, encouraging experimentation.
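
Because vLLM exposes an OpenAI-compatible HTTP endpoint, a locally served copy of gpt-oss-20B (for example, started with the vllm serve command) can be queried with the standard openai Python client; the sketch below assumes vLLM's default local address.

```python
# Minimal sketch: querying a local vLLM server through its
# OpenAI-compatible API. Assumes defaults: http://localhost:8000/v1.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="openai/gpt-oss-20b",  # name the server was launched with
    messages=[{"role": "user",
               "content": "What is a 128K context window good for?"}],
    max_tokens=200,
)
print(completion.choices[0].message.content)
```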

In summary, OpenAI's gpt-oss models combine competitive real-world performance close to proprietary GPT models with efficient compute requirements and permissive open licensing, making them influential for AI research and deployment in both mainstream and sensitive or resource-constrained settings.

  1. The large language models gpt-oss-120B and gpt-oss-20B, released by OpenAI, show strong coding and agentic performance, with the 120B outperforming earlier open-source models and some proprietary GPT-4o variants on HealthBench and on agentic tasks.
  2. With their Mixture-of-Experts (MoE) architectures, the gpt-oss models are optimized for a range of technological environments, including consumer and edge hardware: the 20B model fits within 16GB of VRAM, while the 120B runs efficiently on a single 80GB GPU, making them suitable for a wide range of computing applications.
