{"id":22035,"date":"2026-03-26T20:40:58","date_gmt":"2026-03-26T20:40:58","guid":{"rendered":"https:\/\/codegen.com\/blog\/glossary\/swe-bench\/"},"modified":"2026-03-26T20:40:58","modified_gmt":"2026-03-26T20:40:58","slug":"swe-bench","status":"publish","type":"glossary-term","link":"https:\/\/codegen.com\/blog\/glossary\/swe-bench\/","title":{"rendered":"SWE-bench"},"content":{"rendered":"<p>SWE-bench is a benchmark for evaluating AI coding agents against real GitHub issues. It presents agents with actual bug reports and feature requests from open-source repositories and measures whether the agent can produce a correct fix.<\/p>\n<p>SWE-bench Verified is a curated subset with human-validated solutions. Top scores currently exceed 70% on the verified set. The benchmark has become the standard evaluation for autonomous coding capabilities.<\/p>\n<p>A key finding from SWE-bench research is that scaffolding matters as much as the model. In one test, three different agent frameworks running the same underlying model scored 17 issues apart on 731 problems, a gap of about 2.3 percentage points. 
The architecture around the model does real work.<\/p>\n","protected":false},"template":"","glossary-category":[46],"class_list":["post-22035","glossary-term","type-glossary-term","status-publish","hentry","glossary-category-coding-agents"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.3 (Yoast SEO v27.3) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>SWE-bench - The Codegen Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/codegen.com\/blog\/glossary\/swe-bench\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"SWE-bench\" \/>\n<meta property=\"og:description\" content=\"SWE-bench is a benchmark for evaluating AI coding agents against real GitHub issues. It presents agents with actual bug reports and feature requests from open-source repositories and measures whether the agent can produce a correct fix. SWE-bench Verified is a curated subset with human-validated solutions. Top scores currently exceed 70% on the verified set. The [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/codegen.com\/blog\/glossary\/swe-bench\/\" \/>\n<meta property=\"og:site_name\" content=\"The Codegen Blog\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@codegen\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/codegen.com\\\/blog\\\/glossary\\\/swe-bench\\\/\",\"url\":\"https:\\\/\\\/codegen.com\\\/blog\\\/glossary\\\/swe-bench\\\/\",\"name\":\"SWE-bench - The Codegen Blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/codegen.com\\\/blog\\\/#website\"},\"datePublished\":\"2026-03-26T20:40:58+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/codegen.com\\\/blog\\\/glossary\\\/swe-bench\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/codegen.com\\\/blog\\\/glossary\\\/swe-bench\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/codegen.com\\\/blog\\\/glossary\\\/swe-bench\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/codegen.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Glossary\",\"item\":\"https:\\\/\\\/codegen.com\\\/blog\\\/glossary\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"SWE-bench\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/codegen.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/codegen.com\\\/blog\\\/\",\"name\":\"The Codegen Blog\",\"description\":\"What we\u2019re building, how we\u2019re building it, and what we\u2019re learning along the 
way.\",\"publisher\":{\"@id\":\"https:\\\/\\\/codegen.com\\\/blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/codegen.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/codegen.com\\\/blog\\\/#organization\",\"name\":\"Codegen\",\"url\":\"https:\\\/\\\/codegen.com\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/codegen.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/codegenblog.kinsta.cloud\\\/wp-content\\\/uploads\\\/2025\\\/07\\\/Codegen_Lockup-Black-1024h-scaled.png\",\"contentUrl\":\"https:\\\/\\\/codegenblog.kinsta.cloud\\\/wp-content\\\/uploads\\\/2025\\\/07\\\/Codegen_Lockup-Black-1024h-scaled.png\",\"width\":2560,\"height\":528,\"caption\":\"Codegen\"},\"image\":{\"@id\":\"https:\\\/\\\/codegen.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/codegen\"]}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"SWE-bench - The Codegen Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/codegen.com\/blog\/glossary\/swe-bench\/","og_locale":"en_US","og_type":"article","og_title":"SWE-bench","og_description":"SWE-bench is a benchmark for evaluating AI coding agents against real GitHub issues. It presents agents with actual bug reports and feature requests from open-source repositories and measures whether the agent can produce a correct fix. SWE-bench Verified is a curated subset with human-validated solutions. Top scores currently exceed 70% on the verified set. 
The [&hellip;]","og_url":"https:\/\/codegen.com\/blog\/glossary\/swe-bench\/","og_site_name":"The Codegen Blog","twitter_card":"summary_large_image","twitter_site":"@codegen","twitter_misc":{"Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/codegen.com\/blog\/glossary\/swe-bench\/","url":"https:\/\/codegen.com\/blog\/glossary\/swe-bench\/","name":"SWE-bench - The Codegen Blog","isPartOf":{"@id":"https:\/\/codegen.com\/blog\/#website"},"datePublished":"2026-03-26T20:40:58+00:00","breadcrumb":{"@id":"https:\/\/codegen.com\/blog\/glossary\/swe-bench\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/codegen.com\/blog\/glossary\/swe-bench\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/codegen.com\/blog\/glossary\/swe-bench\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/codegen.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Glossary","item":"https:\/\/codegen.com\/blog\/glossary\/"},{"@type":"ListItem","position":3,"name":"SWE-bench"}]},{"@type":"WebSite","@id":"https:\/\/codegen.com\/blog\/#website","url":"https:\/\/codegen.com\/blog\/","name":"The Codegen Blog","description":"What we\u2019re building, how we\u2019re building it, and what we\u2019re learning along the 
way.","publisher":{"@id":"https:\/\/codegen.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/codegen.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/codegen.com\/blog\/#organization","name":"Codegen","url":"https:\/\/codegen.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/codegen.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/codegenblog.kinsta.cloud\/wp-content\/uploads\/2025\/07\/Codegen_Lockup-Black-1024h-scaled.png","contentUrl":"https:\/\/codegenblog.kinsta.cloud\/wp-content\/uploads\/2025\/07\/Codegen_Lockup-Black-1024h-scaled.png","width":2560,"height":528,"caption":"Codegen"},"image":{"@id":"https:\/\/codegen.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/codegen"]}]}},"_links":{"self":[{"href":"https:\/\/codegen.com\/blog\/wp-json\/wp\/v2\/glossary-term\/22035","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/codegen.com\/blog\/wp-json\/wp\/v2\/glossary-term"}],"about":[{"href":"https:\/\/codegen.com\/blog\/wp-json\/wp\/v2\/types\/glossary-term"}],"wp:attachment":[{"href":"https:\/\/codegen.com\/blog\/wp-json\/wp\/v2\/media?parent=22035"}],"wp:term":[{"taxonomy":"glossary-category","embeddable":true,"href":"https:\/\/codegen.com\/blog\/wp-json\/wp\/v2\/glossary-category?post=22035"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}