Models
Browse 100+ text, image, video, and audio generation models. Click any model to try it in the playground or call it through the API.
Showing 91 models
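As a rough orientation for the "get the API" path, here is a minimal sketch of assembling a request body for any of these models. The field names (`model`, `prompt`, extra keyword parameters) are illustrative placeholders, not this platform's documented schema — check each model's API page for the real one.

```python
import json

def build_request(model: str, prompt: str, **params) -> str:
    """Assemble a JSON request body for a model call.

    Hypothetical helper: field names are placeholders, not the
    platform's actual schema.
    """
    body = {"model": model, "prompt": prompt}
    body.update(params)  # e.g. image size, step count, video duration
    return json.dumps(body)

payload = build_request("flux-dev", "a lighthouse at dusk", steps=28)
```

Most models accept the same prompt-plus-parameters shape; only the parameter names differ per modality.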
Image · HOT · Black Forest Labs
Flux Dev
FLUX.1 [dev] is Black Forest Labs' open-weights guidance-distilled model that achieves near-Pro quality at a fraction of the cost. It's the go-to model for non-commercial experimentation and rapid iteration.
Image · HOT · Black Forest Labs
Flux Pro
FLUX.1 [pro] is Black Forest Labs' flagship commercial model, delivering state-of-the-art text adherence, photorealism, and fine detail. It's the benchmark for portrait, product, and advertising photography.
Video · HOT · WanVideo
Wan 2.6
Wan 2.6 is WanVideo's most advanced video model, producing 720p clips with strong temporal consistency and natural motion dynamics. It supports both text-to-video and image-to-video pipelines at a competitive price.
Video · HOT · Kuaishou
Kling 1.6
Kling 1.6 is Kuaishou's flagship video model with exceptional motion quality and precise prompt adherence across complex scene dynamics and human motion. It's widely considered the best value-to-quality model for cinematic video generation.
Text · HOT · DeepSeek
DeepSeek R1
DeepSeek R1 is a 671B open-source reasoning model trained with reinforcement learning, matching or beating OpenAI o1 on math, science, and coding benchmarks. At $0.55/M input tokens versus o1's $15, it delivers frontier reasoning at a 96% cost reduction.
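The quoted saving can be sanity-checked directly from the two prices in the blurb above:

```python
o1_price = 15.00  # $ per million input tokens (OpenAI o1)
r1_price = 0.55   # $ per million input tokens (DeepSeek R1)

# Percentage cost reduction relative to o1
reduction = (1 - r1_price / o1_price) * 100
print(round(reduction))  # → 96
```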
Text · HOT · Anthropic
Claude Sonnet 4.5
Claude Sonnet 4.5 is Anthropic's best balance of intelligence and speed in the Claude 4 family, with a 200K context window and top-tier coding and analysis capabilities. It's the recommended model for production agentic workflows and complex document processing.
Video · HOT · Runway
Gen-3 Alpha
Gen-3 Alpha is Runway's most capable text-to-video model and widely considered the industry benchmark for visual fidelity and motion quality. It excels at cinematic scenes with complex lighting, camera movements, and realistic character motion.
Image · FAST · Black Forest Labs
Flux Schnell
FLUX.1 [schnell] is the fastest Flux model, generating high-quality images in as few as 1–4 inference steps. It's optimized for local development, batch workflows, and rapid prompt iteration.
Image · Black Forest Labs
Flux Pro Ultra
FLUX.1 Pro Ultra extends Flux Pro to natively generate 4-megapixel images up to 2048×2048 resolution without post-processing upscaling. It's the right choice for print, billboard, and professional editorial work.
Image · Stability AI
Stable Diffusion 3.5 Large
Stable Diffusion 3.5 Large is Stability AI's most capable open model, using an 8-billion parameter MMDiT architecture with a 16-channel VAE. It excels at complex compositional scenes and is one of the few open models with reliable in-image text rendering.
Image · CHEAP · Stability AI
SDXL
Stable Diffusion XL is Stability AI's battle-tested 3.5B parameter model, tuned for 1024×1024 native resolution outputs. It has the largest LoRA and fine-tune ecosystem of any image generation model.
Image · OpenAI
DALL·E 3
DALL·E 3 is OpenAI's most capable image model, with unmatched prompt-following fidelity among commercial models. It's natively integrated into ChatGPT and supports conversational image refinement.
Image · NEW · Ideogram
Ideogram 3.0
Ideogram 3.0 is the best commercially available model for rendering accurate, legible text within images — a task most diffusion models fail at. It's the top choice for posters, logos, signage, and typography-heavy branded content.
Image · Recraft
Recraft V3
Recraft V3 offers professional-grade control over style consistency, making it the leading model for brand asset creation at scale. It supports vector outputs and leads on the Hugging Face image generation leaderboard.
Image · Google
Imagen 3
Imagen 3 is Google's state-of-the-art text-to-image model with dramatic improvements in lighting, fine detail, and artifact reduction over previous versions. It's particularly strong at photorealistic lifestyle, nature, and architectural imagery.
Video · ByteDance
Seedance 2.0
Seedance 2.0 is ByteDance's most advanced video model, producing photorealistic 720p video with strong narrative coherence and scene-level consistency. It supports text-to-video, image-to-video, and reference-subject-to-video generation.
Video · CHEAP · MiniMax
Hailuo Mini
Hailuo Mini is MiniMax's efficient text-to-video model, delivering consistent 6-second clips with smooth motion at an accessible price point. It's well-suited for social media content creators who need volume without sacrificing quality.
Video · CHEAP · WanVideo
Wan 2.1 I2V
Wan 2.1 I2V specializes in animating static images into fluid video with natural-looking motion while preserving fine details from the reference image. At $0.018/sec, it's one of the lowest-cost image-to-video models available.
Video · Zhipu AI
CogVideoX
CogVideoX is Zhipu AI's open-source video generation model built on a DiT architecture with strong temporal coherence on text-to-video tasks. It's a reliable open option for researchers and developers who need transparent model weights.
Text · FAST · DeepSeek
DeepSeek V3
DeepSeek V3 is a 685B Mixture-of-Experts model with only 37B active parameters per forward pass, delivering frontier-class performance on coding and general tasks. It offers the best price-to-performance ratio among non-reasoning models at $0.27/M input tokens.
Text · Alibaba
Qwen 3 235B
Qwen 3 235B is Alibaba's largest model with hybrid thinking capabilities and top-tier multilingual performance across 100+ languages. It achieves best-in-class results on coding, math, and long-context document understanding benchmarks.
Text · NEW · Moonshot
Kimi K2
Kimi K2 is Moonshot AI's MoE model with 32B active parameters, built specifically for agentic tool-use workflows and long-horizon coding tasks. It achieves top results on SWE-bench and tool-use benchmarks at a competitive price.
Text · HOT · Google
Gemini 2.0 Flash
Gemini 2.0 Flash is Google's best multimodal model for high-throughput use cases, handling text, images, audio, and video in a single API call. It offers a 1M token context window and is significantly faster and cheaper than Gemini 1.5 Flash.
Text · Meta
Llama 3.3 70B
Llama 3.3 70B is Meta's most capable 70B model, delivering performance competitive with much larger models on instruction-following and coding tasks. Its weights are openly available under Meta's commercial-friendly community license, making it a default choice for open deployments.
Text · Mistral
Mistral Large 2
Mistral Large 2 is Mistral AI's frontier model with strong reasoning, multilingual capabilities, and a 128K context window. Built in France, it's particularly strong on European languages and compliance-sensitive enterprise use cases.
Text · xAI
Grok 3 Mini
Grok 3 Mini is xAI's efficient thinking model with strong performance on math, science, and STEM benchmarks relative to its inference cost. It supports a configurable thinking budget that lets developers trade compute for accuracy.
Text · FAST · Nova
Nova Turbo
Nova Turbo is Nova's proprietary ultra-low-latency model optimized for real-time streaming applications, achieving time-to-first-token under 300ms. It's the best choice for live chat, voice interfaces, and any latency-critical production use case.
Text · Anthropic
Claude Opus 4.7
Claude Opus 4.7 is Anthropic's most capable model, trained with extended thinking and achieving frontier-level performance on complex reasoning, long-document analysis, and advanced code generation. Use it when maximum intelligence matters more than cost.
Text · FAST · Anthropic
Claude Haiku 4.5
Claude Haiku 4.5 is Anthropic's fastest and most compact model, delivering near-instant responses with a 200K context window. It maintains strong performance on instruction-following, summarization, and classification tasks despite its speed-first design.
Text · HOT · OpenAI
GPT-4o
GPT-4o is OpenAI's flagship multimodal model, handling text, images, and audio with state-of-the-art accuracy across a broad range of tasks. It offers the best balance of capability and speed among OpenAI's frontier models.
Text · CHEAP · OpenAI
GPT-4o Mini
GPT-4o Mini is OpenAI's cost-efficient frontier model, outperforming GPT-3.5 Turbo on most benchmarks at a fraction of the cost. It's the default choice for classification, extraction, and high-volume applications where cost matters.
Text · OpenAI
o1
OpenAI o1 is a reasoning model that spends time thinking through complex problems step by step before responding, achieving leading results on competition math and PhD-level science questions. It's the right choice when problem difficulty justifies the premium cost.
Text · OpenAI
o3 Mini
OpenAI o3 Mini is a compact reasoning model that achieves strong STEM performance at much lower cost and latency than o1. It supports a configurable effort parameter (low/medium/high) to tune the thinking depth per request.
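Assuming the effort parameter matches OpenAI's documented `reasoning_effort` field for o-series models (worth verifying against the current API reference), a request that dials thinking depth down for a simpler problem might be shaped like this:

```python
# Sketch of an o3-mini chat request with a per-call effort setting.
# Field names follow OpenAI's chat completions API as commonly
# documented; verify against the current reference before relying on them.
request = {
    "model": "o3-mini",
    "reasoning_effort": "low",  # one of "low" | "medium" | "high"
    "messages": [
        {"role": "user", "content": "Factor 391 into primes."}
    ],
}
```

Raising the effort to "high" trades latency and cost for deeper deliberation on harder problems.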
Text · Google
Gemini 1.5 Pro
Gemini 1.5 Pro has a 2-million token context window — the largest of any generally available model — enabling processing of entire codebases, books, or hours of audio in a single call. It's the definitive choice for long-document analysis and large-codebase retrieval.
Text · FAST · Google
Gemini 1.5 Flash
Gemini 1.5 Flash is Google's speed-optimized multimodal model with a 1M token context window at 5× lower cost than Gemini 1.5 Pro. It offers excellent cost-to-performance for high-volume summarization, classification, and extraction pipelines.
Text · Google
Gemma 2 27B
Gemma 2 27B is Google's most capable open-weight model, using a sliding-window attention mechanism and outperforming many larger models on reasoning and instruction-following benchmarks. It's a strong open-source alternative at competitive pricing.
Text · CHEAP · Google
Gemma 2 9B
Gemma 2 9B is Google's compact open-weight model with strong language understanding for its size, suitable for deployment on consumer GPUs and edge devices. At $0.05/M tokens, it offers exceptional value for classification and lightweight generation tasks.
Text · Meta
Llama 3.1 405B
Llama 3.1 405B is Meta's largest open-source model and the only fully open model competitive with GPT-4 on a broad range of benchmarks. It's the foundation for many fine-tuned specialized models across the open-source community.
Text · Meta
Llama 3.1 70B
Llama 3.1 70B is the production workhorse of the Llama 3.1 family, offering an excellent balance of capability and inference cost for RAG pipelines and chat applications. It's widely deployed in production and benefits from the largest open fine-tune ecosystem.
Text · CHEAP · Meta
Llama 3.1 8B
Llama 3.1 8B is Meta's fastest and cheapest Llama model, operating at very high throughput for classification, extraction, and lightweight summarization tasks. At $0.05/M tokens with a 128K context window, it's one of the best-value models on the platform.
Text · Meta
Llama 3.2 90B Vision
Llama 3.2 90B Vision is Meta's largest multimodal open model, enabling high-quality image understanding across documents, charts, and natural scenes. It maintains strong text capability while adding best-in-class open vision performance.
Text · CHEAP · Mistral
Mistral 7B Instruct
Mistral 7B Instruct is Mistral's compact instruction-following model, widely known for punching well above its weight class on coding and reasoning benchmarks. It's fully open-source under Apache 2.0 and efficient enough to run on a single consumer GPU.
Text · Mistral
Mixtral 8×7B
Mixtral 8×7B is Mistral's sparse MoE model that uses only 13B active parameters per forward pass from a 47B total pool, delivering GPT-3.5-class performance at open-source prices. It's widely used in production RAG and function-calling pipelines.
Text · Mistral
Codestral
Codestral is Mistral's dedicated code model trained on 80+ programming languages, leading on HumanEval and other code generation benchmarks among openly available models. It's optimized for both code completion in IDEs and full code generation via API.
Text · Alibaba
Qwen 3 32B
Qwen 3 32B is Alibaba's mid-size model with hybrid thinking mode and strong multilingual coding capability at a competitive price. It offers an excellent cost-to-performance ratio for teams needing reasoning and analysis without the cost of frontier models.
Text · Alibaba
Qwen 2.5 72B
Qwen 2.5 72B is Alibaba's proven previous-generation flagship with strong math, coding, and multilingual capabilities across 29 languages. It's a battle-tested choice for production deployments requiring reliable performance at scale.
Text · xAI
Grok 3
Grok 3 is xAI's frontier model with real-time knowledge access through X's data platform and leading performance on AIME math and GPQA science benchmarks among publicly available models. It combines frontier reasoning with up-to-the-minute world knowledge.
Text · xAI
Grok 2
Grok 2 is xAI's previous frontier model with real-time X/Twitter knowledge access and competitive performance on coding and reasoning tasks. It remains a reliable option for applications that require current event awareness.
Text · Cohere
Command R+
Command R+ is Cohere's enterprise-focused model purpose-built for Retrieval-Augmented Generation and multi-step tool use with reliable citation-grounded generation. It's the model of choice for enterprise knowledge management, search, and document intelligence.
Text · CHEAP · Cohere
Command R
Command R is Cohere's cost-effective RAG model supporting 10 languages, optimized for citation-grounded generation and business document processing. It's designed for production knowledge workflows where reliability and grounding matter more than raw capability.
Text · DeepSeek
DeepSeek R1 Zero
DeepSeek R1 Zero is the base RL-trained checkpoint of DeepSeek R1 before supervised fine-tuning, demonstrating raw chain-of-thought reasoning emerging directly from reinforcement learning. It's primarily used for research into reasoning emergence and RL-based training methods.
Text · CHEAP · DeepSeek
DeepSeek Coder V2
DeepSeek Coder V2 is a 236B MoE coding model with 21B active parameters, achieving state-of-the-art performance on HumanEval and LiveCodeBench among open models. At $0.14/M input tokens, it offers frontier code generation at a price comparable to much smaller models.
Text · Microsoft
Phi 4
Phi 4 is Microsoft's 14B parameter small language model with remarkable STEM reasoning and mathematical capabilities for its size, matching or beating much larger models on reasoning benchmarks. It runs efficiently on consumer hardware while delivering strong performance on structured problem-solving.
Text · CHEAP · Microsoft
Phi 4 Mini
Phi 4 Mini is Microsoft's 3.8B parameter model optimized for reasoning tasks in resource-constrained environments, achieving competitive results on math and coding benchmarks for its class. At $0.04/M tokens, it's among the cheapest capable models available.
Text · NEW · Nova
Nova Star
Nova Star is Nova's flagship proprietary model combining frontier-level reasoning with a 256K context window and strong coding performance. It's price-optimized relative to comparable models from other providers, making it the default for demanding Nova workloads.
Text · CHEAP · Nova
Nova Mini
Nova Mini is Nova's most cost-efficient proprietary model, designed for classification, routing, and lightweight instruction-following tasks at high throughput. It's the right pick for pipelines where millions of short requests need fast, cheap decisions.
Text · CHEAP · Yi
Yi Lightning
Yi Lightning is 01.AI's fastest model with exceptional Chinese-English bilingual capability, well-suited for multilingual classification, summarization, and Q&A at minimal cost. It delivers strong throughput for Asian-language workloads at a competitive price.
Image · NEW · Black Forest Labs
Flux 1.1 Pro
FLUX 1.1 [pro] is an enhanced version of Flux Pro with 6× faster generation speed and improved prompt adherence across photorealistic and artistic outputs. It maintains the same image quality as Flux Pro while enabling significantly higher throughput.
Image · Stability AI
Stable Image Ultra
Stable Image Ultra is Stability AI's highest-quality image offering, combining the SD 3.5 Large model with their most advanced upscaling and refinement pipeline. It produces ultra-detailed, photorealistic outputs at up to 4-megapixel resolution, suitable for professional publishing.
Image · FAST · Stability AI
Stable Diffusion 3.5 Medium
Stable Diffusion 3.5 Medium is a 2.5B parameter multimodal diffusion transformer optimized for speed and efficient local deployment on consumer GPUs. At $0.012/image it's ideal for rapid prompt iteration and is fast enough to run interactively.
Image · FAST · Stability AI
SDXL Turbo
SDXL Turbo uses adversarial diffusion distillation to generate SDXL-quality images in a single inference step, enabling sub-second interactive image generation. It's the fastest model on the platform for real-time previewing and live UX applications.
Image · Ideogram
Ideogram 2.0
Ideogram 2.0 is the previous generation of Ideogram's text-rendering model with strong accuracy for typography and design elements embedded in images. It remains a reliable and affordable option for branded and marketing content requiring legible in-image text.
Image · NEW · Playground AI
Playground v3
Playground v3 is Playground AI's latest model delivering exceptional quality on aesthetic, graphic design, and illustration-focused outputs. It leads on the community Elo leaderboard for overall image quality and aesthetic preference among creative use cases.
Image · CHEAP · Kuaishou
Kolors
Kolors is Kuaishou's text-to-image model trained on a large-scale Chinese-English dataset, producing images with vibrant colors and strong aesthetic quality. It has notable competency in East Asian cultural styles and character-based image generation.
Image · CHEAP · Tencent
Hunyuan Image
Hunyuan Image is Tencent's text-to-image model with superior understanding of Chinese cultural aesthetics, compositional principles, and traditional art styles. It performs well across both photorealistic and artistic outputs at a low price point.
Image · CHEAP · Zhipu AI
CogView 4
CogView 4 is Zhipu AI's most capable image generation model, featuring accurate text rendering and rich compositional detail with strong Chinese-language prompt understanding. It achieves competitive quality on image generation benchmarks at one of the lowest price points available.
Image · CHEAP · DeepSeek
Janus Pro
Janus Pro is DeepSeek's unified multimodal model that uses a dual-encoder architecture to separately handle visual understanding and image generation. It delivers strong prompt alignment and competitive quality relative to its low price.
Image · HiDream
HiDream I1
HiDream I1 is a high-resolution image generation model with exceptional detail preservation at large resolutions, optimized for professional photography and product visualization workflows. It achieves strong results on fine-detail preservation benchmarks.
Image · CHEAP · Fal AI
AuraFlow
AuraFlow is an open-source flow-matching generative model trained on aesthetic image datasets, producing high-quality outputs at a very low price. At $0.005/image it's the most cost-effective model on the platform for volume workflows.
Image · CHEAP · Lightricks
LTX Image
LTX Image is Lightricks' text-to-image model optimized for commercial content creation, product photography, and on-brand asset generation. It delivers consistent, photorealistic outputs with strong prompt adherence at a budget-friendly price.
Video · FAST · Runway
Gen-3 Alpha Turbo
Gen-3 Alpha Turbo is a 4× faster and significantly cheaper variant of Gen-3 Alpha, preserving most of the visual quality for rapid iteration. It's the preferred option when generation speed matters more than achieving peak cinematic quality.
Video · NEW · Pika
Pika 2.1
Pika 2.1 is Pika's latest video model with improved motion naturalness and stronger scene-level consistency compared to 2.0. It's designed for social media and short-form content creators who need polished, expressive video from simple text prompts.
Video · Pika
Pika 2.0
Pika 2.0 is a stable and widely-deployed video model with good performance on short-form social media content. It offers a reliable balance of quality and affordability for consumer-facing AI applications.
Video · Luma AI
Dream Machine 1.5
Dream Machine 1.5 is Luma AI's accessible video model that generates realistic motion with strong physics simulation at a competitive price. It's a reliable choice for storytelling, product visualization, and immersive content creation.
Video · Luma AI
Ray 2
Ray 2 is Luma AI's flagship video model with state-of-the-art visual quality and precise cinematographic camera control for pan, tilt, and tracking shots. It produces high-fidelity video with remarkable consistency across multi-shot scenes.
Video · Google
Veo 2
Veo 2 is Google's frontier video generation model with breakthrough quality in physics simulation, fine motion detail, and cinematographic camera control. It's the most capable model for high-end cinematic production requiring realistic physics and lighting.
Video · CHEAP · Genmo
Mochi 1
Mochi 1 is Genmo's fully open-source video model with strong motion realism and fluid character animation at a fraction of proprietary model costs. It's the most capable open-source video model available for self-hosted deployments.
Video · MiniMax
MiniMax Video 01
MiniMax Video 01 is MiniMax's text-to-video model with strong narrative coherence and consistent character representation across multiple shots. It's suited for storytelling applications requiring scene-to-scene character and style consistency.
Video · MiniMax
Hailuo 01 Director
Hailuo 01 Director is MiniMax's camera-controlled video model, offering precise pan, zoom, dolly, and tracking camera movements via structured text prompts. It's designed for directors and cinematographers who need frame-accurate control over virtual camera motion.
Video · Zhipu AI
CogVideoX 1.5
CogVideoX 1.5 is Zhipu AI's upgraded video model with improved temporal coherence and motion naturalness over the original CogVideoX. It supports longer video durations and higher prompt fidelity, making it the stronger open-source option from Zhipu.
Video · FAST · Lightricks
LTX Video
LTX Video is the world's first video generation model that produces video faster than real-time playback speed, using a highly efficient DiT architecture. It enables interactive video generation workflows and rapid iteration at a very competitive price.
Video · Tencent
Hunyuan Video
Hunyuan Video is Tencent's open-source video model with strong motion dynamics and scene composition, using a dual-stream to single-stream transformer architecture. It delivers high visual quality with full model transparency for self-hosted deployments.
Video · CHEAP · Stability AI
Stable Video Diffusion
Stable Video Diffusion is Stability AI's image-to-video model, animating any still photograph into fluid video with natural-looking motion. It's particularly effective on product photography and lifestyle imagery where subtle, realistic motion is needed.
Video · Kuaishou
Kling 2.0 Master
Kling 2.0 Master is Kuaishou's most advanced video model, producing cinematic-quality output with highly realistic physics, lighting, and precise human motion simulation. It's the premium option in the Kling family for high-stakes creative and commercial production.
Video · Kuaishou
Kling 1.5
Kling 1.5 is the previous generation of Kuaishou's Kling model with proven production quality across a wide range of video generation tasks. It offers reliable performance at a competitive price for teams that don't need the latest Kling capabilities.
Video · CHEAP · WanVideo
Wan 2.1 T2V 480p
Wan 2.1 T2V 480p is WanVideo's budget-friendly text-to-video model for 480p generation, offering the lowest price per second of any video model on the platform. It's well-suited for prototyping and validating video concepts before committing to higher-quality generation.
Video · WanVideo
Wan 2.1 T2V 720p
Wan 2.1 T2V 720p is WanVideo's HD text-to-video model generating 720p clips with smooth motion and reliable prompt adherence at a budget-friendly price. It's the value-focused default for HD video generation when Wan 2.6 quality is not required.
Audio · HOT · ElevenLabs
ElevenLabs Flash v2.5
ElevenLabs Flash v2.5 is the fastest and most cost-efficient multilingual text-to-speech model, with sub-75ms latency and support for 32 languages. It's the go-to model for real-time voice applications, interactive agents, and high-volume narration pipelines.
Audio · ElevenLabs
ElevenLabs Multilingual v2
ElevenLabs Multilingual v2 is the highest-quality TTS model from ElevenLabs, delivering emotionally expressive, natural-sounding speech across 29 languages with deep voice cloning fidelity. It's the production standard for audiobooks, dubbing, and professional narration.
Audio · FAST · OpenAI
Whisper Large v3
Whisper Large v3 is OpenAI's most accurate speech-to-text model, supporting transcription in 99 languages and translation into English with state-of-the-art word error rates. It's the standard for production transcription pipelines, meeting notes, and accessibility tooling.
Audio · OpenAI
OpenAI TTS-1
OpenAI TTS-1 is a real-time text-to-speech model optimized for low-latency streaming, with six distinct voices and natural-sounding output suited to assistants and interactive applications. It's the simplest way to add voice to any OpenAI-powered product.