
OpenAI has officially launched its first open-weight models since GPT‑2 with the release of gpt‑oss‑120b and gpt‑oss‑20b, delivering high-end reasoning and tool-use capabilities that developers can run locally under the Apache 2.0 license. These models challenge the proprietary status quo by offering performance on par with OpenAI's own o4‑mini and o3‑mini, while letting users customize them, run them offline, and deploy them on private infrastructure.
GPT-OSS: High Performance, Low Cost, Full Control

The gpt‑oss‑120b, a 117 billion‑parameter Transformer using a mixture‑of‑experts (MoE) architecture, delivers near‑parity with OpenAI’s o4‑mini on core reasoning benchmarks like Codeforces, MMLU, TauBench agentic tasks, and HealthBench—while running efficiently on a single 80 GB GPU.
Meanwhile, gpt‑oss‑20b, at 21 billion parameters, matches or outperforms o3‑mini and runs on consumer hardware with just 16 GB of memory, making it usable on laptops or desktops.
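As a back-of-the-envelope check on that figure, assuming the MXFP4 quantization described below stores weights at roughly 4.25 bits per parameter (4-bit values plus shared block scales; the exact checkpoint layout may differ):

```python
params = 21e9                    # gpt-oss-20b total parameter count
bits_per_param = 4.25            # assumed ~MXFP4: 4-bit values + shared block scales
weight_gb = params * bits_per_param / 8 / 1e9
print(f"~{weight_gb:.1f} GB of weights")  # ~11.2 GB, leaving headroom in 16 GB for the KV cache
```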
Both models support chain‑of‑thought (CoT) reasoning, few‑shot function calling, structured output, and agentic tool use—letting them browse the web, reason mathematically, call APIs, or execute Python code as part of intelligent workflows.
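To make the agentic pattern concrete, here is a minimal sketch of a tool-calling loop. Everything in it is a hypothetical stand-in (the real models emit tool calls in OpenAI's harmony format, and `fake_model` replaces actual inference); it only illustrates the generate, execute, append cycle that tool use relies on.

```python
import json

def get_weather(city: str) -> str:
    # Hypothetical tool; a real deployment would register it with the runtime.
    return json.dumps({"city": city, "temp_c": 21})

TOOLS = {"get_weather": get_weather}

def run_agent(model, messages):
    """Generate; if the model requests a tool, execute it and feed the result back."""
    while True:
        reply = model(messages)                 # stand-in for gpt-oss inference
        if reply["tool_call"] is None:          # plain answer: done
            return reply["content"]
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "name": call["name"], "content": result})

def fake_model(messages):
    # Stub: asks for the weather tool once, then answers using its output.
    if not any(m["role"] == "tool" for m in messages):
        return {"content": None,
                "tool_call": {"name": "get_weather", "arguments": {"city": "Berlin"}}}
    return {"content": "It is about 21 °C in Berlin.", "tool_call": None}

print(run_agent(fake_model, [{"role": "user", "content": "Weather in Berlin?"}]))
```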
GPT-OSS: Built for Real‑World Use & Safety

These models are not experimental toys—they’re built for efficient deployment on consumer hardware and enterprise systems alike. The 120B model is optimized for a single 80 GB accelerator, while the 20B model is tailor‑made for edge or on‑device inference.
OpenAI has released the weights under Apache 2.0, granting developers full flexibility to fine‑tune, redistribute, or commercialize the models without vendor lock‑in.
Safety was a top priority: OpenAI conducted external evaluations using its Preparedness Framework, including internal adversarial fine‑tuning tests to simulate misuse scenarios. The models performed comparably to proprietary models in safety benchmarks, and the methodology was peer‑reviewed by experts.
OpenAI also launched a Red Teaming Challenge with a $500,000 prize to crowdsource novel safety issues and publish findings along with an evaluation dataset for the developer community.
GPT-OSS: Architecture & Deployment
Both models leverage mixture‑of‑experts (MoE) routing, activating only a subset of parameters per token: gpt‑oss‑120b activates roughly 5.1B parameters and gpt‑oss‑20b roughly 3.6B, a design choice that balances capability and efficiency.
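A toy illustration of the idea: a small router scores the experts for each token, and only the top-k expert MLPs actually run, so compute per token scales with k rather than with the total parameter count. The sizes below are made up for readability and are not gpt‑oss's real dimensions.

```python
import torch

d_model, n_experts, top_k = 64, 8, 2   # toy sizes, not gpt-oss's real config
router = torch.nn.Linear(d_model, n_experts)
experts = torch.nn.ModuleList(
    torch.nn.Sequential(torch.nn.Linear(d_model, 4 * d_model),
                        torch.nn.GELU(),
                        torch.nn.Linear(4 * d_model, d_model))
    for _ in range(n_experts))

def moe_forward(x):                                   # x: (tokens, d_model)
    weights, idx = router(x).topk(top_k, dim=-1)      # pick the k best experts per token
    weights = torch.softmax(weights, dim=-1)          # normalize their mixing weights
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):                       # only k expert MLPs run per token
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[int(e)](x[t])
    return out

print(moe_forward(torch.randn(3, d_model)).shape)     # torch.Size([3, 64])
```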
Both models also support 128K‑token context windows, grouped multi‑query attention with a group size of 8, alternating dense and locally banded sparse attention, rotary positional embeddings (RoPE), and efficient MXFP4 quantization.
These innovations mean the gpt‑oss models can scale efficiently and remain usable on modest hardware while preserving high reasoning capacity.
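As an illustration of one ingredient from that list, here is a compact RoPE sketch: position is injected by rotating pairs of query/key channels by position-dependent angles. The base of 10000 is the common default from the RoPE paper, not necessarily the value gpt‑oss uses.

```python
import torch

def rope(x, base=10000.0):
    """Rotary embeddings for x of shape (seq_len, dim); dim must be even."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half) / half)      # rotation rate per channel pair
    angles = torch.arange(seq_len)[:, None] * freqs   # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]                 # pair up the channels
    return torch.cat([x1 * cos - x2 * sin,            # 2-D rotation of each pair
                      x1 * sin + x2 * cos], dim=-1)

q = rope(torch.randn(128, 64))   # queries now carry their position in their phase
```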
Use Cases & Deployment Partners
OpenAI is already working with early partners like Microsoft (Azure AI Foundry and Windows AI Foundry), Databricks, AWS Bedrock, Hugging Face, vLLM, llama.cpp, Ollama, and others to make deployment seamless across cloud, local, and edge environments.
NVIDIA has also optimized the models for RTX AI PCs and GeForce RTX 5090 GPUs, delivering inference speeds up to 256 tokens/second locally.
Developers can run the models via Hugging Face Transformers, utilize the open-sourced tokenizer (o200k_harmony), and integrate with agentic workflows using the Responses API or open tools like openai‑harmony and LangChain.
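A minimal local-inference sketch using Transformers, assuming the Hugging Face model id openai/gpt-oss-20b and a transformers release recent enough to support the architecture; exact arguments may vary:

```python
from transformers import pipeline

# Load gpt-oss-20b locally; device_map="auto" places weights on available GPUs.
pipe = pipeline("text-generation",
                model="openai/gpt-oss-20b",
                torch_dtype="auto",
                device_map="auto")

messages = [{"role": "user",
             "content": "Explain mixture-of-experts in one sentence."}]
out = pipe(messages, max_new_tokens=128)     # chat template applied automatically
print(out[0]["generated_text"][-1]["content"])
```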
What’s Next: Scaling Open AI Innovation
With the release of gpt‑oss‑120b and 20b, OpenAI is doubling down on the promise of democratized, transparent AI. By making high-end reasoning models open-weight and deployable, they reduce reliance on closed APIs and bring advanced capabilities to developers, enterprises, governments, and researchers who lack the budget or infrastructure for proprietary systems.
As CEO Sam Altman framed it, the release is a “triumph of technology” designed to get AI into the hands of as many people as possible, while pushing ethical and safety standards forward in the open-source AI ecosystem.