<h2 align="center">Awesome Prompt Engineering 🧙‍♂️</h2>
<p align="center">
A hand-curated collection of resources for Prompt Engineering and Context Engineering, covering papers, tools, models, APIs, benchmarks, courses, and communities for working with Large Language Models.
https://promptslab.github.io
Master Prompt Engineering. Join the Course at https://promptslab.github.io
Start Here
New to prompt engineering? Follow this path:
- Learn the basics: ChatGPT Prompt Engineering for Developers (free, ~90 min)
- Read the guide: Prompt Engineering Guide by DAIR.AI (open-source, comprehensive)
- Study provider docs: OpenAI Prompt Engineering Guide · Anthropic Prompt Engineering Guide
- Understand where the field is heading: Anthropic: Effective Context Engineering for AI Agents
- Read the research: The Prompt Report, a taxonomy of 58+ prompting techniques from 1,500+ papers
Table of Contents
- Papers
- Major Surveys
- Prompt Optimization and Automatic Prompting
- Prompt Compression
- Reasoning Advances
- In-Context Learning
- Agentic Prompting and Multi-Agent Systems
- Multimodal Prompting
- Structured Output and Format Control
- Prompt Injection and Security
- Applications of Prompt Engineering
- Text-to-Image Generation
- Text-to-Music/Audio Generation
- Foundational Papers (Pre-2024)
- Tools and Code
- APIs
- Datasets and Benchmarks
- Models
- AI Content Detectors
- Books
- Courses
- Tutorials and Guides
- Videos
- Communities
- How to Contribute
Papers
Major Surveys
- The Prompt Report: A Systematic Survey of Prompting Techniques [2024] - Most comprehensive survey: taxonomy of 58 text and 40 multimodal prompting techniques from 1,500+ papers. Co-authored with OpenAI, Microsoft, Google, Stanford.
- A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications [2024] - 44 techniques across application areas with per-task performance summaries.
- A Survey of Prompt Engineering Methods in LLMs for Different NLP Tasks [2024] - 39 prompting methods across 29 NLP tasks.
- A Survey of Automatic Prompt Engineering: An Optimization Perspective [2025] - Formalizes auto-PE methods as discrete/continuous/hybrid optimization problems.
- Efficient Prompting Methods for Large Language Models: A Survey [2024] - Survey of efficiency-oriented prompting (compression, optimization, APE) for reducing compute and latency.
- Navigate through Enigmatic Labyrinth: A Survey of Chain of Thought Reasoning [2023, ACL 2024] - Systematic CoT survey.
- Demystifying Chains, Trees, and Graphs of Thoughts [2024] - Unified framework for multi-prompt reasoning topologies.
- Towards Goal-oriented Prompt Engineering for Large Language Models: A Survey [2024] - Focuses on prompts designed around explicit task goals.
- Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning LLMs [2025] - Distinguishes Long CoT from Short CoT in o1/R1-era models.
Prompt Optimization and Automatic Prompting
- OPRO: Large Language Models as Optimizers [2023, ICLR 2024] - Uses LLMs as optimizers via meta-prompts; optimized prompts outperform human-designed ones by up to 50% on BBH.
- DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines [2023, ICLR 2024] - Framework for programming (not prompting) LLMs with automatic prompt optimization.
- MIPRO: Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs [2024, EMNLP 2024] - Bayesian optimization for multi-stage LM programs; up to 13% accuracy gains.
- TextGrad: Automatic "Differentiation" via Text [2024] - Treats compound AI systems as computation graphs with textual feedback as gradients. Published in Nature.
- EvoPrompt [2023, ACL 2024] - Evolutionary algorithm approach for automatically optimizing discrete prompts.
- Meta Prompting for AI Systems [2023, ICLR 2024 Workshop] - Example-agnostic structural templates formalized using category theory.
- Prompt Engineering a Prompt Engineer (PE²) [2024, ACL Findings] - Uses LLMs to meta-prompt themselves, refining prompts with step-by-step templates to significantly improve reasoning.
- Large Language Models Are Human-Level Prompt Engineers [2022] - Automatic prompt generation via APE.
- Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning [2023]
- SPO: Self-Supervised Prompt Optimization [2025] - Competitive performance at 1-6% of the cost of prior methods.
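The OPRO entry above describes using an LLM as an optimizer through a meta-prompt. As a rough illustration only, the sketch below assembles such a meta-prompt from previously scored instructions, listing the best ones in ascending score order. The template wording, the `build_meta_prompt` name, and the `keep` parameter are assumptions made for this example and are not taken from the paper's code.

```python
from typing import List, Tuple


def build_meta_prompt(
    history: List[Tuple[str, float]],
    task_description: str,
    keep: int = 3,
) -> str:
    """Assemble an OPRO-style meta-prompt from (instruction, score) pairs.

    Hypothetical template: the wording below is illustrative only.
    """
    # Keep the `keep` highest-scoring instructions, in ascending score order,
    # so the strongest candidate appears last, right before the request.
    top = sorted(history, key=lambda pair: pair[1])[-keep:]
    lines = [f"Task: {task_description}", "", "Previous instructions with their scores:"]
    for instruction, score in top:
        lines.append(f"text: {instruction}")
        lines.append(f"score: {score:.1f}")
    lines.append("")
    lines.append("Write a new instruction that differs from the ones above and scores higher.")
    return "\n".join(lines)


if __name__ == "__main__":
    scored = [
        ("Solve the problem.", 41.0),
        ("Think carefully, then answer.", 58.5),
        ("Break the problem into steps and check each one.", 66.0),
    ]
    print(build_meta_prompt(scored, "grade-school math word problems"))
```

In a full loop, the model's reply would be scored on a held-out set and appended to `history` before the next round; that evaluation step is left out here.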
Prompt Compression
- LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression [2024, ACL 2024] - 3x-6x faster than LLMLingua with GPT-4 data distillation.
- LongLLMLingua [2023, ACL 2024] - Question-aware compression for long contexts; 21.4% performance boost with 4x fewer tokens.
- Prompt Compression for Large Language Models: A Survey [2024] - Comprehensive survey of hard and soft prompt compression methods.
Reasoning Advances
- Scaling LLM Test-Time Compute Optimally [2024] - Shows optimal test-time compute allocation can outperform 14x larger models.
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [2025] - Pure RL-trained reasoning model matching o1; open-source with distilled variants.
- s1: Simple Test-Time Scaling [2025] - SFT on just 1,000 examples yields a competitive reasoning model via "budget forcing."
- Reasoning Language Models: A Blueprint [2025] - Systematic framework organizing reasoning LM approaches.
- Demystifying Long Chain-of-Thought Reasoning in LLMs [2025] - Analyzes long CoT behavior in modern reasoning models.
- Graph of Thoughts: Solving Elaborate Problems with LLMs [2023, AAAI 2024] - Models thoughts as arbitrary graphs; 62% quality improvement over ToT on sorting.
- Tree of Thoughts: Deliberate Problem Solving with LLMs [2023, NeurIPS 2023] - Tree search over reasoning paths.
- Everything of Thoughts [2023] - Integrates CoT, ToT, and external solvers via MCTS.
- Skeleton-of-Thought [2023] - Parallel decoding via answer skeleton generation for up to 2.69x speedup.
- Chain of Thought Prompting Elicits Reasoning in Large Language Models [2022] - The foundational CoT paper.
- Self-Consistency Improves Chain of Thought Reasoning [2022] - Aggregating multiple CoT outputs for reliability.
- Large Language Models are Zero-Shot Reasoners [2022] - "Let's think step by step" as a zero-shot reasoning trigger.
- ReAct: Synergizing Reasoning and Acting in Language Models [2022] - Interleaving reasoning and tool use.
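Two of the techniques listed above are easy to picture in code: the zero-shot "Let's think step by step" trigger and self-consistency, which aggregates several sampled chains of thought. The sketch below is a minimal illustration, not any paper's reference implementation. The `Answer:` marker convention and all function names are assumptions, and the completions are hard-coded stand-ins for model samples.

```python
from collections import Counter
from typing import List

COT_TRIGGER = "Let's think step by step."


def zero_shot_cot_prompt(question: str) -> str:
    """Append the zero-shot reasoning trigger to a question."""
    return f"Q: {question}\nA: {COT_TRIGGER}"


def extract_answer(completion: str, marker: str = "Answer:") -> str:
    """Return the text after the last marker (assumed output convention)."""
    return completion.rsplit(marker, 1)[-1].strip()


def self_consistent_answer(completions: List[str]) -> str:
    """Majority vote over the final answers of several sampled completions."""
    answers = [extract_answer(c) for c in completions]
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    print(zero_shot_cot_prompt("A farm has 3 pens with 6 hens each. How many hens?"))
    # Hard-coded stand-ins for completions sampled at non-zero temperature.
    samples = [
        "3 pens times 6 hens is 18. Answer: 18",
        "6 + 6 + 6 = 18. Answer: 18",
        "3 plus 6 is 9, doubled is 20. Answer: 20",
    ]
    print(self_consistent_answer(samples))
```

The vote only helps when answers can be normalized to comparable strings, which is why the sketch extracts a short final answer rather than comparing whole completions.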
In-Context Learning
- Many-Shot In-Context Learning [2024, NeurIPS 2024 Spotlight] - Significant gains scaling ICL to hundreds/thousands of examples; introduces Reinforced and Unsupervised ICL.
- Many-Shot In-Context Learning in Multimodal Foundation Models [2024] - Scales multimodal ICL to ~2,000 examples across 14 datasets.
- Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? [2022]
- Fantastically Ordered Prompts and Where to Find Them [2021] - Overcoming few-shot prompt order sensitivity.
- Calibrate Before Use: Improving Few-Shot Performance of Language Models [2021]
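Several entries above concern few-shot demonstrations and how sensitive results are to their order. As a small illustration, the sketch below builds a few-shot prompt and enumerates every ordering of the demonstrations, so each variant could be scored separately. The `Input:`/`Label:` template and the function names are assumptions chosen for the example.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Demo = Tuple[str, str]  # (input text, label)


def few_shot_prompt(demos: Sequence[Demo], query: str) -> str:
    """Format demonstrations followed by the unlabeled query."""
    blocks = [f"Input: {text}\nLabel: {label}" for text, label in demos]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


def all_orderings(demos: Sequence[Demo], query: str) -> List[str]:
    """One prompt per permutation of the demonstrations (n! prompts)."""
    return [few_shot_prompt(order, query) for order in permutations(demos)]


if __name__ == "__main__":
    demos = [
        ("great film", "positive"),
        ("dull plot", "negative"),
        ("loved the cast", "positive"),
    ]
    prompts = all_orderings(demos, "forgettable ending")
    print(len(prompts))
    print(prompts[0])
```

The number of orderings grows factorially, so this exhaustive form is only practical for a handful of demonstrations; larger sets would need sampling.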
Agentic Prompting and Multi-Agent Systems
- Agentic Large Language Models: A Survey [2025] - Comprehensive survey organizing agentic LLMs by reasoning, acting, and interacting capabilities.
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges [2024] - Covers profiling, communication, and growth mechanisms.
- Multi-Agent Collaboration Mechanisms: A Survey of LLMs [2025] - Reviews debate and cooperation strategies in LLM-based multi-agent systems.
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation [2023] - Microsoft's foundational multi-agent framework paper.
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-World APIs [2023, ICLR 2024] - Trains LLMs to use massive real-world API collections.
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [2023, ICLR 2024] - The benchmark driving agentic coding progress.
- AgentBench: Evaluating LLMs as Agents [2023, ICLR 2024] - Benchmark across 8 environments.
- PAL: Program-aided Language Models [2023] - Offloading computation to code interpreters.
Multimodal Prompting
- Visual Prompting in Multimodal Large Language Models: A Survey [2024] - First comprehensive survey on visual prompting methods in MLLMs.
- Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V [2023] - Visual markers dramatically improve visual grounding.
- A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks [2024] - Covers text, image, video, audio MLLMs.
- Multimodal Chain-of-Thought Reasoning in Language Models [2023]
- From Prompt Engineering to Prompt Craft [2024] - Design-research view of prompt "craft" for diffusion models.
Structured Output and Format Control
- Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of LLMs [2024] - Examines how constraining outputs to structured formats impacts reasoning performance.
- Batch Prompting: Efficient Inference with LLM APIs [2023]
- Structured Prompting: Scaling In-Context Learning to 1,000 Examples [2022]
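When a prompt asks for structured output, the reply usually still has to be checked before downstream code trusts it. The sketch below shows one minimal way to do that with the standard library only. The two-field schema (`sentiment`, `confidence`) and the function name are hypothetical, picked just for illustration.

```python
import json
from typing import Any, Dict

# Hypothetical schema for this example: field name -> expected Python type.
REQUIRED_FIELDS = {"sentiment": str, "confidence": float}


def parse_structured_reply(reply: str) -> Dict[str, Any]:
    """Parse a model reply as JSON and check the required fields and types."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name!r} should be {expected_type.__name__}")
    return data


if __name__ == "__main__":
    print(parse_structured_reply('{"sentiment": "positive", "confidence": 0.92}'))
```

A failed check is a natural point to retry the request or fall back to a free-text path, which is one way to weigh format constraints against answer quality.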
Prompt Injection and Security
- Formalizing and Benchmarking Prompt Injection Attacks and Defenses [2023, USENIX Security 2024] - Formal framework with systematic evaluation of 5 attacks and 10 defenses across 10 LLMs.
- The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions [2024] - OpenAI's priority-level training for injection defense.
- AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses [2024] - Realistic agent scenario benchmark.
- InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated LLM Agents [2024]
- SecAlign: Defending Against Prompt Injection with Preference Optimization [2024] - DPO-based defense.
- WASP: Benchmarking Web Agent Security Against Prompt Injection [2025] - Security benchmark for web/computer-use agents.
- Many-Shot Jailbreaking [2024] - Scaling harmful examples in long-context windows enables jailbreaking (Anthropic Technical Report).
- Constitutional AI: Harmlessness from AI Feedback [2022]
- Ignore Previous Prompt: Attack Techniques For Language Models [2022]
- Artificial Intelligence and Cybersecurity: Documented Risks, Enterprise Guardrails, and Emerging Threats in 2024-2025 [2025] - Survey of real prompt-injection incidents with practical governance prompt patterns.
Applications of Prompt Engineering
- Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [2023]
- Legal Prompt Engineering for Multilingual Legal Judgement Prediction [2023]
- Conversing with Copilot: Exploring Prompt Engineering for Solving CS1 Problems [2022]
- Commonsense-Aware Prompting for Controllable Empathetic Dialogue Generation [2023]
- PLACES: Prompting Language Models for Social Conversation Synthesis [2023]
- Medical Image Segmentation Using Transformer Encoders and Prompt-Based Learning: A Systematic Review [2025]
- TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning [2025] - SQL-based interface preserving tabular structure for multi-hop queries.
Text-to-Image Generation
- A Taxonomy of Prompt Modifiers for Text-To-Image Generation [2022]
- Design Guidelines for Prompt Engineering Text-to-Image Generative Models [2021]
- High-Resolution Image Synthesis with Latent Diffusion Models [2021]
- DALL·E: Creating Images from Text [2021]
- Investigating Prompt Engineering in Diffusion Models [2022]
Text-to-Music/Audio Generation
- MusicLM: Generating Music From Text [2023]
- ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models [2023]
- AudioLM: A Language Modeling Approach to Audio Generation [2023]
- Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models [2023]
Foundational Papers (Pre-2024)
These papers established the core concepts that modern prompt engineering builds on:
- Language Models are Few-Shot Learners (GPT-3) [2020] - Demonstrated few-shot prompting at scale.
- Prefix-Tuning: Optimizing Continuous Prompts for Generation [2021]
- The Power of Scale for Parameter-Efficient Prompt Tuning [2021]
- Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm [2021]
- Show Your Work: Scratchpads for Intermediate Computation with Language Models [2021]
- Generated Knowledge Prompting for Commonsense Reasoning [2021]
- Making Pre-trained Language Models Better Few-shot Learners [2021]
- AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts [2020]
- How Can We Know What Language Models Know? [2020]
- A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT [2023]
- Synthetic Prompting: Generating Chain-of-Thought Demonstrations for LLMs [2023]
- Progressive Prompts: Continual Learning for Language Models [2023]
- Successive Prompting for Decomposing Complex Questions [2022]
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks [2022]
- PromptChainer: Chaining Large Language Model Prompts through Visual Programming [2022]
- Ask Me Anything: A Simple Strategy for Prompting Language Models [2022]
- Prompting GPT-3 To Be Reliable [2022]
- On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning [2022]
Tools and Code
Prompt Management and Testing
| Name | Description | Link |
|---|---|---|
| Promptfoo | Open-source CLI for testing, evaluating, and red-teaming LLM prompts. YAML configs, CI/CD integration, adversarial testing. ~9K+ ⭐ | GitHub |
| Promptify | Solve NLP problems with LLMs and generate task-specific prompts for popular generative models such as GPT and PaLM. | GitHub |
| Agenta | Open-source LLM developer platform for prompt management, evaluation, human feedback, and deployment. | GitHub |
| PromptLayer | Version, test, and monitor every prompt and agent with robust evals, tracing, and regression sets. | Website |
| Helicone | Production prompt monitoring and optimization platform. | Website |
| LangGPT | Framework for structured and meta-prompt design. 10K+ ⭐ | GitHub |
| ChainForge | Visual toolkit for building, testing, and comparing LLM prompt responses without code. | GitHub |
| LMQL | A query language for LLMs making complex prompt logic programmable. | GitHub |
| Promptotype | Platform for developing, testing, and managing structured LLM prompts. | Website |
| PromptPanda | AI-powered prompt management system for streamlining prompt workflows. | Website |
| Promptimize AI | Browser extension to automatically improve user prompts for any AI model. | Website |
| PROMPTMETHEUS | Web-based "Prompt Engineering IDE" for iteratively creating and running prompts. | Website |
| Better Prompt | Test suite for LLM prompts before pushing to production. | GitHub |
| OpenPrompt | Open-source framework for prompt-learning research. | GitHub |
| Prompt Source | Toolkit for creating, sharing, and using natural language prompts. | GitHub |
| Prompt Engine | NPM utility library for creating and maintaining prompts for LLMs (Microsoft). | GitHub |
| PromptInject | Framework for quantitative analysis of LLM robustness to adversarial prompt attacks. | GitHub |
LLM Evaluation Tools
| Name | Description | Link |
|---|---|---|
| DeepEval | Open-source evaluation framework covering RAG, agents, and conversations with CI/CD integration. ~7K+ ⭐ | GitHub |
| Ragas | RAG evaluation with knowledge-graph-based test set generation and 30+ metrics. ~8K+ ⭐ | GitHub |
| LangSmith | LangChain's platform for debugging, testing, evaluating, and monitoring LLM applications. | Website |
| Langfuse | Open-source LLM observability with tracing, prompt management, and human annotation. ~7K+ ⭐ | GitHub |
| Braintrust | End-to-end AI evaluation platform, SOC2 Type II certified. | Website |
| Arize AI / Phoenix | Real-time LLM monitoring with drift detection and tracing. | GitHub |
| TruLens | Evaluating and explaining LLM apps; tracks hallucinations, relevance, groundedness. | GitHub |
| InspectAI | Purpose-built for evaluating agents against benchmarks (UK AISI). | GitHub |
| Opik | Evaluate, test, and ship LLM applications across dev and production lifecycles. | GitHub |
Agent Frameworks
| Name | Description | Link |
|---|---|---|
| LangChain / LangGraph | Most widely adopted LLM app framework; LangGraph adds graph-based multi-step agent workflows. ~100K+ / ~10K+ ⭐ | GitHub · LangGraph |
| CrewAI | Role-playing AI agent orchestration with 700+ integrations. ~44K+ ⭐ | GitHub |
| AutoGen (AG2) | Microsoft's multi-agent conversational framework. ~40K+ ⭐ | GitHub |
| DSPy | Stanford's framework for programming LLMs with automatic prompt/weight optimization. ~22K+ ⭐ | GitHub |
| OpenAI Agents SDK | Official agent framework with function calling, guardrails, and handoffs. ~10K+ ⭐ | GitHub |
| Semantic Kernel | Microsoft's AI framework powering M365 Copilot; C#, Python, Java. ~24K+ ⭐ | GitHub |
| LlamaIndex | Data framework for RAG and agent capabilities. ~40K+ ⭐ | GitHub |
| Haystack | Open-source NLP framework with pipeline architecture for RAG and agents. ~20K+ ⭐ | GitHub |
| Agno (formerly Phidata) | Python agent framework with microsecond instantiation. ~20K+ ⭐ | GitHub |
| Smolagents | Hugging Face's minimalist code-centric agent framework (~1000 LOC). ~15K+ ⭐ | GitHub |
| Pydantic AI | Type-safe agent framework using Pydantic for structured validation. ~8K+ ⭐ | GitHub |
| Mastra | TypeScript AI agent framework with assistants, RAG, and observability. ~20K+ ⭐ | GitHub |
| Google ADK | Agent Development Kit deeply integrated with Gemini and Google Cloud. | GitHub |
| Strands Agents (AWS) | Model-agnostic framework with deep AWS integrations. | GitHub |
| Langflow | Node-based visual agent builder with drag-and-drop. ~50K+ ⭐ | GitHub |
| n8n | Workflow automation with AI agent capabilities and 400+ integrations. ~60K+ ⭐ | GitHub |
| Dify | All-in-one backend for agentic workflows with tool-using agents and RAG. | GitHub |
| PraisonAI | Multi-AI Agents framework with 100+ LLM support, MCP integration, and built-in memory. | GitHub |
| Neurolink | Multi-provider AI agent framework unifying 12+ providers with workflow orchestration. | GitHub |
| Composio | Connect 100+ tools to AI agents with zero setup. | GitHub |
Prompt Optimization Tools
| Name | Description | Link |
|---|---|---|
| DSPy | Multiple optimizers (MIPROv2, BootstrapFewShot, COPRO) for automatic prompt tuning. ~22K+ ⭐ | GitHub |
| TextGrad | Automatic differentiation via text (Stanford). ~2K+ ⭐ | GitHub |
| OPRO | Google DeepMind's optimization by prompting. | GitHub |
Red Teaming and Prompt Security
| Name | Description | Link |
|---|---|---|
| Garak (NVIDIA) | LLM vulnerability scanner for hallucination, injection, and jailbreaks; the "nmap for LLMs." ~3K+ ⭐ | GitHub |
| PyRIT (Microsoft) | Python Risk Identification Tool for automated red-teaming. ~3K+ ⭐ | GitHub |
| DeepTeam | 40+ vulnerabilities, 10+ attack methods, OWASP Top 10 support. | GitHub |
| LLM Guard | Security toolkit for LLM I/O validation. ~2K+ ⭐ | GitHub |
| NeMo Guardrails (NVIDIA) | Programmable guardrails for conversational systems. ~5K+ ⭐ | GitHub |
| Guardrails AI | Define strict output formats (JSON schemas) to ensure system reliability. | Website |
| Lakera | AI security platform for real-time prompt injection detection. | Website |
| Purple Llama (Meta) | Open-source LLM safety evaluation including CyberSecEval. | GitHub |
| GPTFuzz | Automated jailbreak template generation achieving >90% success rates. | GitHub |
| Rebuff | Open-source tool for detection and prevention of prompt injection. | GitHub |
MCP (Model Context Protocol)
MCP is an open standard developed by Anthropic (Nov 2024, donated to Linux Foundation Dec 2025) for connecting AI assistants to external data sources and tools through a standardized interface. It has 97M+ monthly SDK downloads and has been adopted by GitHub, Google, and most major AI providers.
| Name | Description | Link |
|---|---|---|
| MCP Specification | The core protocol specification and SDKs. ~15K+ ⭐ | GitHub |
| MCP Reference Servers | Official implementations: fetch, filesystem, GitHub, Slack, Postgres. | GitHub |
| FastMCP (Python) | High-level Pythonic framework for building MCP servers. ~5K+ ⭐ | GitHub |
| GitHub MCP Server | GitHub's official MCP server for repo, issue, PR, and Actions interaction. ~15K+ ⭐ | GitHub |
| Awesome MCP Servers | Curated list of 10,000+ community MCP servers. ~30K+ ⭐ | GitHub |
| Context7 | MCP server providing version-specific documentation to reduce code hallucination. | GitHub |
| GitMCP | Creates remote MCP servers for any GitHub repo by changing the domain. | Website |
| MCP Inspector | Visual testing tool for MCP server development. | GitHub |
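To make the "standardized interface" in the MCP description more concrete: MCP messages are JSON-RPC 2.0, and a client asks a server to run a tool with a `tools/call` request. The sketch below only builds that request payload with the standard library; it does not use any MCP SDK, and the tool name `read_file` and its arguments are hypothetical.

```python
import json
from typing import Any, Dict


def tool_call_request(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # `read_file` is a made-up tool name used only for illustration.
    print(tool_call_request(1, "read_file", {"path": "README.md"}))
```

In practice an SDK or a framework from the table above would handle transport, initialization, and tool discovery; this shows only the shape of a single message.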
Vibe Coding and AI Coding Assistants
| Name | Description | Link |
|---|---|---|
| Claude Code | Anthropic's command-line AI coding tool; widely considered one of the best AI coding assistants (2026). | Docs |
| Cursor | AI-native code editor; Composer feature generates entire applications from natural language. | Website |
| Windsurf (Codeium) | "First agentic IDE" with multi-file editing and project-wide context. | Website |
| GitHub Copilot | AI pair programmer; ~30% of new GitHub code comes from Copilot. | Website |
| Aider | Open-source terminal AI pair programmer with Git integration. ~25K+ ⭐ | GitHub |
| Cline | Open-source VS Code AI assistant connecting editor and terminal through MCP. ~20K+ ⭐ | GitHub |
| Continue | Open-source IDE extensions for custom AI code assistants. ~22K+ ⭐ | GitHub |
| OpenAI Codex CLI | Lightweight terminal coding agent. | GitHub |
| Gemini CLI | Google's open-source terminal AI agent. | GitHub |
| Bolt.new | Browser-based prompt-to-app generation with one-click deployment. | Website |
| Lovable | Full-stack apps from natural language descriptions. | Website |
| v0 (Vercel) | AI assistant for building Next.js frontend components from text. | Website |
| Firebase Studio | Google's agentic cloud-based development environment. | Website |
Other Notable Repositories
| Name | Description | Link |
|---|---|---|
| Prompt Engineering Guide (DAIR.AI) | The definitive open-source guide and resource hub. 3M+ learners. ~55K+ ⭐ | GitHub |
| Awesome ChatGPT Prompts / Prompts.chat | World's largest open-source prompt library. Thousands of prompts for all major models. | GitHub |
| 12-Factor Agents | Principles for building production-grade LLM-powered software. ~17K+ ⭐ | GitHub |
| NirDiamant/Prompt_Engineering | 22 hands-on Jupyter Notebook tutorials. ~3K+ ⭐ | GitHub |
| Context Engineering Repository | First-principles handbook for moving beyond prompt engineering to context design. | GitHub |
| AI Agent System Prompts Library | Collection of system prompts from production AI coding agents (Claude Code, Gemini CLI, Cline, Aider, Roo Code). | GitHub |
| Awesome Vibe Coding | Curated list of 245+ tools and resources for building software through natural language prompts. | GitHub |
| OpenAI Cookbook | Official recipes for prompts, tools, RAG, and evaluations. | GitHub |
| Embedchain | Framework to create ChatGPT-like bots over your dataset. | GitHub |
| ThoughtSource | Framework for the science of machine thinking. | GitHub |
| Promptext | Extracts and formats code context for AI prompts with token counting. | GitHub |
| Price Per Token | Compare LLM API pricing across 200+ models. | Website |
APIs
OpenAI
| Model | Context | Price (Input/Output per 1M tokens) | Key Feature |
|---|---|---|---|
| GPT-5.2 / 5.2 Thinking | 400K | $1.75 / $14 | Latest flagship, 90% cached discount, configurable reasoning |
| GPT-5.1 | 400K | $1.25 / $10 | Previous generation flagship |
| GPT-4.1 / 4.1 mini / nano | 1M | $2 / $8 | Best non-reasoning model, 40% faster and 80% cheaper than GPT-4o |
| o3 / o3-pro | 200K | Varies | Reasoning models with native tool use |
| o4-mini | 200K | Cost-efficient | Fast reasoning, best on AIME at its cost class |
| GPT-OSS-120B / 20B | 128K | $0.03 / $0.30 | First open-weight models, Apache 2.0 |
Key features: Responses API, Agents SDK, Structured Outputs, function calling, prompt caching (90% discount), Batch API (50% discount), MCP support. Platform Docs
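The per-token prices and discounts above combine into a simple cost calculation. The sketch below works one example with the GPT-5.2 row ($1.75 input / $14 output per 1M tokens) and the 90% cached-input discount. The function name is made up for this example, and treating the Batch API discount as a flat 50% applied on top of caching is an assumption, since the table does not say whether the two stack.

```python
def request_cost_usd(
    input_tokens: int,
    cached_tokens: int,
    output_tokens: int,
    input_price: float,   # USD per 1M input tokens
    output_price: float,  # USD per 1M output tokens
    cache_discount: float = 0.90,
    batch: bool = False,
    batch_discount: float = 0.50,
) -> float:
    """Estimate the cost of one request from per-million-token prices."""
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    cost = (
        fresh_tokens * input_price
        + cached_tokens * input_price * (1.0 - cache_discount)
        + output_tokens * output_price
    ) / 1_000_000
    if batch:
        # Assumption: the batch discount applies to the whole request.
        cost *= 1.0 - batch_discount
    return cost


if __name__ == "__main__":
    # 1M input tokens (800K of them cached) and 100K output tokens:
    # 0.35 (fresh input) + 0.14 (cached input) + 1.40 (output) = 1.89 USD.
    print(round(request_cost_usd(1_000_000, 800_000, 100_000, 1.75, 14.0), 2))
```

Listed prices change often, so the numbers here are only as current as the table they come from.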
Anthropic (Claude)
| Model | Context | Price (Input/Output per 1M tokens) | Key Feature |
|---|---|---|---|
| Claude Opus 4.6 | 1M (beta) | $5 / $25 | Most powerful, state-of-the-art coding and agentic tasks |
| Claude Sonnet 4.5 | 200K | $3 / $15 | Best coding model, 61.4% OSWorld (computer use) |
| Claude Haiku 4.5 | 200K | Fast tier | Near-frontier, fastest model class |
| Claude Opus 4 / Sonnet 4 | 200K | $15/$75 (Opus) | Opus: 72.5% SWE-bench, Sonnet 4 powers GitHub Copilot |
Key features: Extended Thinking with tool use, Computer Use, MCP (originated here), prompt caching, Claude Code CLI, available on AWS Bedrock and Google Vertex AI. API Docs
Google (Gemini)
| Model | Context | Price (Input/Output per 1M tokens) | Key Feature |
|---|---|---|---|
| Gemini 3 Pro Preview | 1M | $2 / $12 | Most intelligent Google model, deployed to 2B+ Search users |
| Gemini 2.5 Pro | 1M | $1.25 / $10 | Best for coding/agentic tasks, thinking model |
| Gemini 2.5 Flash / Flash-Lite | 1M | $0.30/$1.50 · $0.10/$0.40 | Price-performance leaders |
Key features: Thinking (all 2.5+ models), Google Search grounding, code execution, Live API (real-time audio/video), context caching. Google AI Studio
Meta (Llama)
| Model | Architecture | Context | Key Feature |
|---|---|---|---|
| Llama 4 Scout | 109B MoE / 17B active | 10M | Fits single H100, multimodal, open-weight |
| Llama 4 Maverick | 400B MoE / 17B active, 128 experts | 1M | Beats GPT-4o, open-weight |
| Llama 3.3 70B | Dense | 128K | Matches Llama 3.1 405B |
Available on 25+ cloud partners, Hugging Face, and inference APIs. Llama
Other Notable Providers
| Provider | Description | Link |
|---|---|---|
| Mistral AI | Mistral Large 3 (675B MoE), Devstral 2, Ministral 3. Apache 2.0. | Website |
| DeepSeek | V3.2 (671B MoE), R1 (reasoning, MIT license). $0.15/$0.75 per 1M tokens. | Website |
| xAI (Grok) | Grok 4.1 Fast: 2M context, $0.20/$0.50 per 1M tokens. | Website |
| Cohere | Command A (111B, 256K context), Embed v4, Rerank 4.0. Excels at RAG. | Website |
| Together AI | 200+ open models with sub-100ms latency. | Website |
| Groq | LPU hardware with ~300+ tokens/sec inference. | Website |
| Fireworks AI | Fast inference with HIPAA + SOC2 compliance. | Website |
| OpenRouter | Unified API for 300+ models from all providers. | Website |
| Cerebras | Wafer-scale chips with best total response time. | Website |
| Perplexity AI | Search-augmented API with citations. | Website |
| Amazon Bedrock | Managed multi-model service with Claude, Llama, Mistral, Cohere. | Website |
| Hugging Face Inference | Access to open models via API. | Website |
Datasets and Benchmarks
Major Benchmarks (2024-2026)
| Name | Description | Link |
|---|---|---|
| Chatbot Arena / LM Arena | 6M+ user votes for Elo-rated pairwise LLM comparisons. De facto standard for human preference. | Website |
| MMLU-Pro | 12,000+ graduate-level questions across 14 domains. NeurIPS 2024 Spotlight. | GitHub |
| GPQA | 448 "Google-proof" STEM questions; non-expert validators achieve only 34%. | arXiv |
| SWE-bench Verified | Human-validated 500-task subset for real-world GitHub issue resolution. | Website |
| SWE-bench Pro | 1,865 tasks across 41 professional repos; best models score only ~23%. | Leaderboard |
| Humanity's Last Exam (HLE) | 2,500 expert-vetted questions; top AI scores only ~10-30%. | Website |
| BigCodeBench | 1,140 coding tasks across 7 domains; AI achieves ~35.5% vs. 97% human success. | Leaderboard |
| LiveBench | Contamination-resistant with frequently updated questions. | Paper |
| FrontierMath | Research-level math; AI solves only ~2% of problems. | Research |
| ARC-AGI v2 | Abstract reasoning measuring fluid intelligence. | Research |
| IFEval | Instruction-following evaluation with formatting/content constraints. | arXiv |
| MLE-bench | OpenAI's ML engineering evaluation via Kaggle-style tasks. | GitHub |
| PaperBench | Evaluates AI's ability to replicate 20 ICML 2024 papers from scratch. | GitHub |
Leaderboards and Meta-Benchmarks
| Name | Description | Link |
|---|---|---|
| Hugging Face Open LLM Leaderboard v2 | Evaluates open models on MMLU-Pro, GPQA, IFEval, MATH. | Leaderboard |
| Artificial Analysis Intelligence Index v3 | Aggregates 10 evaluations. | Website |
| SEAL by Scale AI | Hosts SWE-bench Pro and agentic evaluations. | Leaderboard |
Prompt and Instruction Datasets
| Name | Description | Link |
|---|---|---|
| P3 (Public Pool of Prompts) | Prompt templates for 270+ NLP tasks used to train T0 and similar models. | HuggingFace |
| System Prompts Dataset | 944 system prompt templates for agent workflows (by Daniel Rosehill, Aug 2025). | HuggingFace |
| OpenAssistant Conversations (OASST) | 161,443 messages in 35 languages with 461,292 quality ratings. | HuggingFace |
| UltraChat / UltraFeedback | Large-scale synthetic instruction and preference datasets for alignment training. | HuggingFace |
| SoftAge Prompt Engineering Dataset | 1,000 diverse prompts across 10 categories for benchmarking prompt performance. | HuggingFace |
| Text Transformation Prompt Library | Comprehensive collection of text transformation prompts (May 2025). | HuggingFace |
| Writing Prompts | ~300K human-written stories paired with prompts from r/WritingPrompts. | Kaggle |
| Midjourney Prompts | Text prompts and image URLs scraped from MidJourney's public Discord. | HuggingFace |
| CodeAlpaca-20k | 20,000 programming instruction-output pairs. | HuggingFace |
| ProPEX-RAG | Dataset for prompt optimization in RAG workflows. | HuggingFace |
| NanoBanana Trending Prompts | 1,000+ curated AI image prompts from X/Twitter, ranked by engagement. | GitHub |
Red Teaming and Adversarial Datasets
| Name | Description | Link |
|---|---|---|
| HarmBench | 510 harmful behaviors across standard, contextual, copyright, and multimodal categories. | Website |
| JailbreakBench | Open robustness benchmark for jailbreaking with 100 prompts. | Research |
| AgentHarm | 110 malicious agent tasks across 11 harm categories. | arXiv |
| DecodingTrust | 243,877 prompts evaluating trustworthiness across 8 perspectives. | Research |
| SafetyPrompts.com | Aggregator tracking 50+ safety/red-teaming datasets. | Website |
Models
Frontier Models (2025-2026)
| Model | Provider | Context | Key Strength |
|---|---|---|---|
| GPT-5.2 | OpenAI | 400K | General intelligence, 100% AIME 2025 |
| Claude Opus 4.6 | Anthropic | 1M (beta) | Coding, agentic tasks, extended thinking |
| Gemini 3 Pro | Google | 1M | #1 LMArena (~1500 Elo), multimodal |
| Grok 4.1 | xAI | 2M | #2 LMArena (1483 Elo), low hallucination |
| Mistral Large 3 | Mistral AI | 256K | Best open-weight (675B MoE/41B active), Apache 2.0 |
| DeepSeek-V3.2 | DeepSeek | 128K | Best value (671B MoE/37B active), MIT license |
| Llama 4 Maverick | Meta | 1M | Beats GPT-4o (400B MoE/17B active), open-weight |
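The Elo figures in the table can be read as head-to-head win probabilities. The sketch below applies the standard Elo expected-score formula to the two listed ratings (about 1500 and 1483). It assumes the classic base-10, 400-point scale; the arena's actual rating procedure is not described here and may differ.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # A 17-point gap gives the higher-rated model only a slight edge,
    # a little over one half.
    print(round(elo_expected_score(1500, 1483), 3))
```

A gap this small means the two models would be preferred at nearly equal rates in pairwise votes, which is worth keeping in mind when reading rank positions.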
Reasoning Models
| Model | Key Detail |
|---|---|
| OpenAI o3 / o3-pro | 87.7% GPQA Diamond. Native tool use. |
| OpenAI o4-mini | Best AIME at its cost class with visual reasoning. |
| DeepSeek-R1 / R1-0528 | Open-weight, RL-trained. 87.5% on AIME 2025. MIT license. |
| QwQ (Qwen with Questions) | 32B reasoning model. Apache 2.0. Comparable to R1. |
| Gemini 2.5 Pro/Flash (Thinking) | Built-in reasoning with configurable thinking budget. |
| Claude Extended Thinking | Hybrid mode with visible chain-of-thought and tool use. |
| Phi-4 Reasoning / Plus | 14B reasoning models rivaling much larger models. Open-weight. |
| GPT-OSS-120B | OpenAI's open-weight with CoT. Near-parity with o4-mini. Apache 2.0. |
Notable Open-Source Models
| Model | Provider | Key Detail |
|---|---|---|
| Qwen3-235B-A22B | Alibaba | Flagship MoE. Strong reasoning/code/multilingual. Apache 2.0. Most downloaded family on HuggingFace. |
| Gemma 3 | Google | 270M to 27B. Multimodal. 128K context. 140+ languages. |
| OLMo 2/3 | Allen AI | Fully open (data, code, weights, logs). OLMo 2 32B surpasses GPT-3.5. Apache 2.0. |
| SmolLM3-3B | Hugging Face | Outperforms Llama-3.2-3B. Dual-mode reasoning. 128K context. |
| Kimi K2 | Moonshot AI | 32B active. Open-weight. Tailored for coding/agentic use. |
| Llama 4 Scout | Meta | 109B MoE/17B active. 10M token context. Fits single H100. |
Code-Specialized Models
| Model | Key Detail |
|---|---|
| Qwen3-Coder (480B-A35B) | 69.6% SWE-bench, a milestone for open-source coding. 256K context. Apache 2.0. |
| Devstral 2 (123B) | 72.2% SWE-bench Verified. 7x more cost-efficient than Claude Sonnet. |
| Codestral 25.01 | Mistral's code model. 80+ languages. Fill-in-the-Middle support. |
| DeepSeek-Coder-V2 | 236B MoE / 21B active. 338 programming languages. |
| Qwen 2.5-Coder | 7B/32B. 92 programming languages. 88.4% HumanEval. Apache 2.0. |
Foundational Models (Historical Reference)
These models established key concepts but are largely superseded for practical use:
| Model | Provider | Significance |
|---|---|---|
| BLOOM 176B | BigScience | First major open multilingual LLM (2022) |
| GLM-130B | Tsinghua | Open bilingual English/Chinese LLM (2023) |
| Falcon 180B | TII | Large open generative model (2023) |
| Mixtral 8x7B | Mistral AI | Pioneered MoE architecture for open models (2023) |
| GPT-NeoX-20B | EleutherAI | Early open autoregressive LLM |
| GPT-J-6B | EleutherAI | Early open causal language model |
AI Content Detectors
Leading Commercial Detectors
| Name | Accuracy | Key Feature | Link |
|---|---|---|---|
| GPTZero | 99% claimed | 10M+ users, #1 on G2 (2025). Detects GPT-4/5, Gemini, Claude, Llama. Free tier available. | Website |
| Originality.ai | 98–100% (peer-reviewed) | Consistently rated most accurate. Combines AI detection + plagiarism + fact checking. From $14.95/month. | Website |
| Turnitin AI Detection | 98%+ on unmodified AI text | Dominant in academia. Launched AI bypasser/humanizer detection (Aug 2025). Institutional licensing. | Website |
| Copyleaks | 99%+ claimed | Enterprise tool detecting AI in 30+ languages. LMS integrations. | Website |
| Winston AI | 99.98% claimed | OCR for scanned documents, AI image/deepfake detection. 11 languages. | Website |
| Pangram Labs | 99.3% (COLING 2025) | Highest score in COLING 2025 Shared Task. 100% TPR on "humanized" text. 97.7% adversarial robustness. | Website |
Free and Research Detectors
| Name | Description | Link |
|---|---|---|
| Binoculars | Open-source research detector using cross-perplexity between two LLMs. | arXiv |
| DetectGPT / Fast-DetectGPT | Statistical method comparing log-probabilities of original text vs. perturbations. | arXiv |
| OpenAI Detector | Classifier that flags likely AI-written text (Python wrapper for the OpenAI detector). | GitHub |
| Sapling AI Detector | Free browser-based detector (up to 2,000 chars). 97% accuracy in some studies. | Website |
| QuillBot AI Detector | Free, no sign-up required. | Website |
| Writer AI Content Detector | Free tool with color-coded results. | Website |
| ZeroGPT | Popular free detector evaluated in multiple academic studies. | Website |
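The two research detectors in the table above (DetectGPT and Binoculars) both score text from model log-probabilities. The sketch below illustrates the two ideas on toy numbers. It is a minimal illustration under stated assumptions, not the authors' implementations: the function names, the toy distributions, and the `scorer`/`reference` role assignment are hypothetical, and a real setup would take log-probabilities from actual language models and perturbations from a mask-filling model.

```python
import math
from statistics import mean


def detectgpt_discrepancy(logp_original, logp_perturbed):
    """DetectGPT-style score: log p(x) minus the mean log p of perturbed copies.

    A large positive gap means the original text is much more probable than
    its perturbations under the scoring model.
    """
    return logp_original - mean(logp_perturbed)


def binoculars_style_score(tokens, scorer_dists, reference_dists):
    """Binoculars-style ratio: log-perplexity divided by cross-perplexity.

    scorer_dists[i] and reference_dists[i] are toy next-token distributions
    (dict: token -> probability) at position i. Which model plays which role
    is simplified here and is an assumption of this sketch.
    """
    n = len(tokens)
    # Average negative log-likelihood of the observed tokens under the scorer.
    log_ppl = -sum(math.log(scorer_dists[i][tok]) for i, tok in enumerate(tokens)) / n
    # Average cross-entropy between the two models' next-token distributions.
    cross_ppl = -sum(
        sum(p * math.log(scorer_dists[i][t]) for t, p in reference_dists[i].items())
        for i in range(n)
    ) / n
    return log_ppl / cross_ppl


# Toy inputs (made up for illustration).
tokens = ["the", "cat"]
scorer = [{"the": 0.5, "a": 0.5}, {"cat": 0.25, "dog": 0.75}]
reference = [{"the": 0.5, "a": 0.5}, {"cat": 0.5, "dog": 0.5}]

print(detectgpt_discrepancy(-10.0, [-12.0, -14.0]))
print(binoculars_style_score(tokens, scorer, reference))
```

Both scores are then compared against a threshold chosen on held-out data; the threshold itself is not shown here because it depends on the models used.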
Watermarking Approaches
| Name | Description | Link |
|---|---|---|
| SynthID (Google DeepMind) | Watermarking for AI text, images, and audio via statistical token sampling. Deployed in Google products. | Website |
| OpenAI Text Watermarking | Developed but still experimental as of 2025. Research shows fragility concerns. | Experimental |
Important caveat: No detector claims 100% accuracy. Mixed human/AI text remains hardest to detect (50–70% accuracy). Adversarial robustness varies widely. The AI detection market is projected to grow from ~$2.3B (2025) to $15B by 2035.
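For readers who want the market projection above as an annual rate, the arithmetic can be sketched as follows. This assumes smooth compound growth between the two quoted endpoints, which the source does not state; `implied_cagr` is a hypothetical helper name.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the caveat: ~$2.3B in 2025, $15B by 2035.
rate = implied_cagr(2.3, 15.0, 2035 - 2025)
print(f"Implied CAGR: {rate:.1%}")
```

Under that assumption the projection works out to roughly 20.6% growth per year.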
Books
Prompt Engineering
| Title | Author(s) | Publisher | Year |
|---|---|---|---|
| Prompt Engineering for LLMs | John Berryman & Albert Ziegler | O'Reilly | 2024 |
| Prompt Engineering for Generative AI | James Phoenix & Mike Taylor | O'Reilly | 2024 |
| Prompt Engineering for LLMs | Thomas R. Caldwell | Independent | 2025 |
LLM Application Development
| Title | Author(s) | Publisher | Year |
|---|---|---|---|
| AI Engineering: Building Applications with Foundation Models | Chip Huyen | O'Reilly | 2025 |
| Build a Large Language Model (From Scratch) | Sebastian Raschka | Manning | 2024 |
| Building LLMs for Production | Louis-François Bouchard & Louie Peters | O'Reilly | 2024 |
| LLM Engineer's Handbook | Paul Iusztin & Maxime Labonne | Packt | 2024 |
| The Hundred-Page Language Models Book | Andriy Burkov | Self-Published | 2025 |
AI Agents
| Title | Author(s) | Publisher | Year |
|---|---|---|---|
| Building Applications with AI Agents | Michael Albada | O'Reilly | 2025 |
| AI Agents and Applications | Roberto Infante | Manning | 2025 |
| AI Agents in Action | Micheal Lanham | Manning | 2025 |
Production, Reliability, and Security
| Title | Author(s) | Publisher | Year |
|---|---|---|---|
| LLMs in Production | Christopher Brousseau & Matthew Sharp | Manning | 2025 |
| Building Reliable AI Systems | Rush Shahani | Manning | 2025 |
| The Developer's Playbook for LLM Security | Steve Wilson | O'Reilly | 2024 |
Courses
Free Short Courses
- ChatGPT Prompt Engineering for Developers - Co-taught by Andrew Ng and OpenAI's Isa Fulford. The foundational starting point. (DeepLearning.AI)
- Building Systems with the ChatGPT API - Multi-step LLM system design for production. (DeepLearning.AI)
- AI Agents in LangGraph - Agentic dataflows with tool use and research agents. (DeepLearning.AI)
- Building Agentic RAG with LlamaIndex - RAG research agent construction. (DeepLearning.AI)
- Functions, Tools and Agents with LangChain - Function calling and agent building. (DeepLearning.AI)
- Prompt Engineering for Vision Models - Visual prompting techniques. (DeepLearning.AI)
University and Platform Courses
- Prompt Engineering Specialization (Vanderbilt) - 3-course series by Dr. Jules White covering foundational to advanced prompt engineering. (Coursera)
- Generative AI with LLMs (DeepLearning.AI + AWS) - LLM lifecycle, transformers, RLHF, deployment. (Coursera)
- Stanford CS336: Language Modeling from Scratch - Build an LLM end-to-end. (Stanford, 2024–2026)
- MIT 6.S191: Introduction to Deep Learning - Annual course including LLMs and generative AI. (MIT, 2024–2026)
- The Complete Prompt Engineering for AI Bootcamp - Covers GPT-5, DSPy, LangGraph, agent architectures. 58K+ ratings. (Udemy, updated Feb 2026)
Free Platform Courses
- Google Prompting Essentials - 5-step prompt design, meta-prompting, Gemini. Under 6 hours.
- Microsoft Azure AI Fundamentals: Generative AI - Free learning path covering LLMs, prompts, agents, Azure OpenAI.
- Hugging Face LLM Course - Community-driven course covering transformers, fine-tuning, building reasoning models.
- Hugging Face AI Agents Course - Agent theory to practice. 100K+ registered students.
Learn Prompting Courses
- ChatGPT for Everyone
- Introduction to Prompt Engineering
- Advanced Prompt Engineering
- Introduction to Prompt Hacking
- Advanced Prompt Hacking
- Introduction to Generative AI Agents for Business Professionals
- AI Safety
Tutorials and Guides
Official Provider Guides
- OpenAI Prompt Engineering Guide - Comprehensive, covering GPT-4.1/5 prompting, reasoning models, structured outputs, agentic workflows. Continuously updated.
- OpenAI GPT-4.1 Prompting Guide [2025] - Structured agent-like prompt design: goal persistence, tool integration, long-context processing.
- Anthropic Prompt Engineering Overview - Iterative prompt design, XML tags, chain-of-thought, role assignment. Includes prompt generator.
- Anthropic Claude 4 Best Practices [2025–2026] - Parallel tool execution, thinking capabilities, image processing.
- Anthropic: Effective Context Engineering for AI Agents [2025] - The evolution from prompt engineering to context engineering: agent state, memory, tools, MCP.
- Google Gemini Prompting Strategies - Multimodal prompting for Gemini via Vertex AI and AI Studio.
- Microsoft Prompt Engineering in Azure AI Studio - Tool calling, function design, few-shot prompting, prompt chaining.
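Several of the guides above recommend the same building blocks: role assignment, XML-tagged sections, few-shot examples, and a chain-of-thought cue. A minimal sketch of how these might be combined into one prompt string is shown below. The function name `build_prompt` and the tag names (`instructions`, `examples`, `document`) are illustrative assumptions, not a format prescribed by any provider.

```python
def build_prompt(role, task, examples, document):
    """Assemble a prompt from a role line, XML-tagged sections, and few-shot examples.

    examples is a list of (input, output) pairs. All tag names are hypothetical.
    """
    parts = [
        f"You are {role}.",                            # role assignment
        f"<instructions>\n{task}\n</instructions>",    # tagged instructions
    ]
    if examples:
        shots = "\n".join(
            f"<example>\n<input>{q}</input>\n<output>{a}</output>\n</example>"
            for q, a in examples
        )
        parts.append(f"<examples>\n{shots}\n</examples>")  # few-shot block
    parts.append(f"<document>\n{document}\n</document>")   # the text to work on
    parts.append("Think step by step before giving the final answer.")  # CoT cue
    return "\n\n".join(parts)


prompt = build_prompt(
    role="a support-ticket triage assistant",
    task="Classify the sentiment of the document as positive or negative.",
    examples=[("Works great, thanks!", "positive"), ("Crashes on launch.", "negative")],
    document="The update fixed my issue.",
)
print(prompt)
```

Keeping each section in its own tag makes it easy to swap examples or documents without touching the instructions; the provider guides listed above describe which conventions each model family responds to best.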
Community and Independent Guides
- Prompt Engineering Guide (DAIR.AI / promptingguide.ai) - Most comprehensive open-source guide. 18+ techniques, model-specific guides, research papers. 3M+ learners. Now includes context engineering.
- Learn Prompting (learnprompting.org) - Structured free platform. Beginner to advanced prompt engineering, AI security, HackAPrompt competition.
- IBM 2026 Guide to Prompt Engineering [2026] - Curated tools, tutorials, real-world examples with Python code.
- Anthropic Interactive Tutorial - 9-chapter Jupyter notebook course with hands-on exercises.
- Lilian Weng's Prompt Engineering Guide [2023] - Highly respected technical blog post from an OpenAI researcher.
- Google Prompt Engineering Guide (68-page PDF) [2025] - Internal-style best-practice guide for Gemini with concrete patterns.
- DigitalOcean: Prompt Engineering Best Practices [2025] - Updated guide summarizing techniques: few-shot, chain-of-thought, role prompting, etc.
- Aakash Gupta: Prompt Engineering in 2025 [2025] - Practical guide with lessons from shipping AI at OpenAI, Shopify, and Google.
- Best practices for prompt engineering with OpenAI API - OpenAI's introductory best practices.
- OpenAI Cookbook - Official recipes for function calling, RAG, evaluation, and complex workflows.
- Microsoft Prompt Engineering Docs - Microsoft's open prompt engineering resources.
- DALLE Prompt Book - Visual guide for text-to-image prompting.
- Best 100+ Stable Diffusion Prompts - Community-curated image generation prompts.
- Vibe Engineering (Manning) - Book by Tomasz Lelek & Artur Skowronski on building software through natural language prompts.
Videos
- Andrej Karpathy: "Deep Dive into LLMs" & "How I Use LLMs" [2024–2025] - Two of the most influential AI videos of 2024–2025. Comprehensive technical deep dive followed by practical usage patterns.
- Karpathy: "Software in the Era of AI" (YC AI Startup School) [2025] - Talk by the researcher who coined "vibe coding" (Feb 2025) and championed "context engineering" (Jun 2025).
- Karpathy: Neural Networks: Zero to Hero [2023–2024] - Full lecture series building from backpropagation to GPT.
- 3Blue1Brown: Neural Networks Series [Updated 2024] - Iconic animated visual explanations of transformers and attention mechanisms. 7M+ subscribers.
- AI Explained [2024–2025] - Long-form analysis breaking down papers, model capabilities, and prompt engineering developments.
- Sam Witteveen [2024–2025] - Practical tutorials on prompt engineering, LangChain, RAG, and agents.
- Matthew Berman [2024–2025] - Popular channel covering model releases and practical LLM usage. 600K+ subscribers.
- DeepLearning.AI YouTube [2024–2026] - Structured lessons, course previews, and Andrew Ng talks on agents and AI careers.
- Lex Fridman Podcast (AI Episodes) [2024–2025] - Long-form interviews with Altman, Hinton, Amodei on LLMs, prompting, and safety.
- ICSE 2025: AIware Prompt Engineering Tutorial [2025] - Conference tutorial covering prompt patterns, fragility, anti-patterns, and optimization DSLs.
- CMU Advanced NLP 2022: Prompting - Foundational academic lecture on prompting methods.
- ChatGPT: 5 Prompt Engineering Secrets For Beginners - Accessible intro for beginners.
Communities
Discord Servers
- Learn Prompting - 40,000+ members. Largest prompt engineering Discord with courses, hackathons, HackAPrompt competitions.
- PromptsLab Discord - Community
- Midjourney - 1M+ members. Primary hub for text-to-image prompt sharing.
- OpenAI Discord - Official community with channels for GPTs, Sora, DALL-E, and API help.
- Anthropic Discord - Official Claude community for AI development collaboration.
- Hugging Face Discord - Model discussions, library support, community events.
- FlowGPT - 33K+ members. 100K+ prompts across ChatGPT, DALL-E, Stable Diffusion, Claude.
Reddit
- r/PromptEngineering - Dedicated subreddit for prompt crafting techniques and discussions.
- r/ChatGPT - 10M+ members. Primary hub for ChatGPT users and prompt sharing.
- r/LocalLLaMA - Highly technical community for running open-source LLMs locally.
- r/ClaudeAI - Anthropic's Claude community: prompt sharing, API tips, model comparisons.
- r/MachineLearning - Academic-oriented ML research discussions.
- r/OpenAI - OpenAI product and API discussions.
- r/StableDiffusion - 450K+ members for AI art prompting and workflows.
- r/ChatGPTPromptGenius - 35K+ members sharing and refining prompts.
Forums and Platforms
- OpenAI Developer Community - Official forum for API help, best practices, project sharing.
- Hugging Face Community - Hub for open-source AI collaboration.
- DeepLearning.AI Community - Forum for learners discussing courses and AI careers.
- LessWrong - In-depth technical posts on AI capabilities and safety.
- AI Alignment Forum - Specialized alignment research discussions.
- CivitAI - Generative AI creators platform for sharing models, LoRAs, and prompts.
GitHub Organizations
- LangChain - Open-source LLM app framework. 100K+ stars.
- Promptslab - Generative Models | Prompt-Engineering | LLMs
- Hugging Face - Central hub: Transformers, Diffusers, Datasets, TRL.
- DSPy (Stanford NLP) - Growing community for systematic prompt optimization.
- OpenAI - Open-source models, benchmarks, and tools.
How to Contribute
We welcome contributions to this list! Before contributing, please take a moment to review our contribution guidelines. These guidelines will help ensure that your contributions align with our objectives and meet our standards for quality and relevance.
What we're looking for:
- New high-quality papers, tools, or resources with a brief description of why they matter
- Updates to existing entries (broken links, outdated information)
- Corrections to star counts, pricing, or model details
- Translations and accessibility improvements
Quality standards:
- All tools should be actively maintained (updated within the last 6 months)
- Papers should be from peer-reviewed venues or have significant community adoption
- Datasets should be publicly accessible
- Please include a one-line description explaining why the resource is valuable
Thank you for your interest in contributing to this project!
Maintained by PromptsLab · Star this repo if you find it useful!