ByteDance Seed Releases NL2Repo-Bench, a Benchmark for Long-Horizon Repository-Level Code Generation
2026-03-02 19:00:57

In AI programming, many people seem to be at the height of a cognitive illusion: as the difficulty and scale of the tasks Coding Agents can complete on their own keep growing, does that mean AGI for coding is within reach?

Yet any working engineer knows that the soul of writing code lies not in file- or function-level code creation but in project-level code completion. Spending a long time writing code does not mean a project is finished, much less that it is finished well.

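Building a complete project requires a developer to start from an empty folder, understand requirements running to tens of thousands of tokens, design the architecture, manage logic spread across multiple modules, and deliver a code repository that can be installed and run. Existing code benchmarks, however, focus mainly on local code generation (e.g., HumanEval, MBPP) or on fixing issues in an existing codebase (e.g., SWE-bench).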

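Recently, NL2Repo-Bench, the first benchmark dedicated to evaluating the end-to-end repository generation ability of coding agents, was officially released. It was built jointly by researchers from ByteDance Seed, Nanjing University, Peking University, and other institutions, and it has drawn wide attention since its release.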

Paper title: NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents
Paper page: https://huggingface.co/papers/2512.12730
Project link: https://github.com/multimodal-art-projection/NL2RepoBench
ArXiv paper: https://arxiv.org/pdf/2512.12730

Show me your Repo: how does NL2Repo test a Coding Agent's ability to build from zero to one?

In OpenAI's definition of artificial general intelligence (AGI), AGI must match or exceed human performance on most economically valuable tasks. In software engineering, this vision implies a disruptive shift in how software is built: humans only supply the requirements, and a Coding Agent independently handles development, debugging, deployment, and every other stage, so humans no longer need to write code directly.

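Unlike earlier benchmarks that rely on LLM-based scoring or on modifying an existing repository, NL2Repo-Bench is designed around the end goal that "humans no longer need to write code directly," and it adopts a very strict "zero-code, execution-based evaluation" mechanism. The agent starts from a completely empty workspace and, guided only by a long requirements document averaging more than 18,000 tokens, must autonomously carry out the full pipeline: understanding the requirements, developing, testing, and coordinating work across many files.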

Put simply, the NL2Repo team selected 104 open-source Python projects from GitHub that ship with complete pytest test suites. In the experiments, each Coding Agent must reproduce an entire repository from scratch based on a high-quality requirements document written by experts, and the reproduction is scored against the project's original test cases.

How is NL2Repo-Bench constructed?

The first step is task selection.

The main challenge in building the NL2Repo-Bench dataset is extracting technically substantial, verifiable "golden" samples from the vast pool of open-source GitHub repositories.

To evaluate repository-level code generation against verifiable ground truth, NL2Repo-Bench draws its tasks from real Python libraries that have a modular architecture and an authoritative pytest test suite. The Coding Agent receives only a natural-language specification and must rebuild the complete repository from scratch, including both the file structure and the functional logic. Correctness is measured strictly by running the generated code against the original upstream test suite.
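A minimal sketch of the execution-based scoring described above. It assumes the upstream suite is run with pytest's `--junitxml` option, which writes results to a JUnit-style XML report; `score_junit_xml`, `RunOutcome`, and the choice to leave skipped tests out of the denominator are illustrative assumptions, not NL2Repo-Bench's published code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class RunOutcome:
    total: int   # test cases that actually ran (skipped ones excluded)
    passed: int  # test cases with no <failure> or <error> child

    @property
    def pass_rate(self) -> float:
        # 0.0 when no tests ran at all.
        return self.passed / self.total if self.total else 0.0


def score_junit_xml(xml_text: str) -> RunOutcome:
    """Count passing test cases in a pytest --junitxml report (assumed format)."""
    root = ET.fromstring(xml_text)
    total = passed = 0
    # iter() finds <testcase> elements under either <testsuites> or <testsuite>.
    for case in root.iter("testcase"):
        if case.find("skipped") is not None:
            continue  # assumption: skipped tests count neither as pass nor fail
        total += 1
        if case.find("failure") is None and case.find("error") is None:
            passed += 1
    return RunOutcome(total=total, passed=passed)


report = """<testsuites><testsuite name="pytest" tests="4">
  <testcase classname="tests.test_core" name="test_parse"/>
  <testcase classname="tests.test_core" name="test_dump"><failure message="AssertionError"/></testcase>
  <testcase classname="tests.test_io" name="test_read"/>
  <testcase classname="tests.test_io" name="test_net"><skipped message="no network"/></testcase>
</testsuite></testsuites>"""
outcome = score_junit_xml(report)
print(outcome.passed, outcome.total, round(outcome.pass_rate, 3))  # 2 3 0.667
```

Parsing the report rather than pytest's console output keeps the scorer independent of pytest's terminal formatting.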

To ensure the benchmark data have practical relevance and technical depth, the team set multi-dimensional admission thresholds in the screening process:

Activity: at least one update within the last 3 years.
Authority: at least 10 GitHub stars.
Completeness: a clear directory structure and a complete test suite (pytest/unittest), and the source repository must pass its own tests.
Difficulty: at least 300 lines of code in total (the vast majority of tasks exceed 1,000 lines, and some exceed 10,000).
Representativeness: coverage of several different kinds of Python libraries, such as utilities (e.g., data-cleaning libraries), frameworks (e.g., lightweight web frameworks), and algorithms (e.g., image-processing libraries).
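The per-repository thresholds above can be expressed as a simple filter. This is a sketch under assumptions: `RepoCandidate`, its fields, and the example repository names are hypothetical, "3 years" is approximated as 3 * 365 days, and representativeness is left out because it is a property of the whole collection rather than of a single repository.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RepoCandidate:
    # Hypothetical metadata record; field names are illustrative.
    name: str
    last_commit: datetime
    stars: int
    code_lines: int
    has_test_suite: bool    # pytest/unittest tests present
    passes_own_tests: bool  # upstream code passes its own suite


def meets_admission_criteria(repo: RepoCandidate, now: datetime) -> bool:
    """Per-repository thresholds from the article (representativeness is checked per dataset)."""
    active = now - repo.last_commit <= timedelta(days=3 * 365)  # roughly 3 years
    authoritative = repo.stars >= 10
    complete = repo.has_test_suite and repo.passes_own_tests
    hard_enough = repo.code_lines >= 300
    return active and authoritative and complete and hard_enough


now = datetime(2025, 12, 1)
candidates = [
    RepoCandidate("tidy-utils", datetime(2024, 6, 1), 120, 4200, True, True),
    RepoCandidate("tiny-lib", datetime(2025, 1, 1), 45, 180, True, True),
    RepoCandidate("stale-web", datetime(2019, 3, 1), 900, 8000, True, True),
]
print([c.name for c in candidates if meets_admission_criteria(c, now)])  # ['tidy-utils']
```

Here `tiny-lib` fails the size threshold and `stale-web` fails the activity threshold.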

Python libraries were chosen as targets precisely because their open-source nature and degree of standardization fit this verification mechanism well: they come with complete test suites, which makes them a sound experimental ground for measuring how large models really perform at repository-level code generation.

Figure: Benchmark construction pipeline

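In terms of task coverage, NL2Repo-Bench contains 104 real repository-level Python tasks spanning utility, framework, algorithm, and other mainstream categories of Python libraries. It rigorously tests whether an Agent can start from a natural-language document and independently develop a software repository that runs directly and can be deployed.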

How can randomness be removed from Coding Agent evaluation?

Requirements documents + evaluation environment + end-to-end QC

To guarantee the quality of NL2Repo-Bench task documents, the team established a rigorous verification system that combines automated tooling with in-depth human involvement.

Figure: Example NL2Repo task document

1. To pinpoint each repository's core functional nodes, the team first runs static scanning tools over the source code to analyze its topology and extract the key architectural information the project depends on.

2. Building on this, the task documents are written to be highly rigorous and comprehensive. A dual check by "human experts + AI tools" ensures that every core functional node is covered in the requirements description, giving the model accurate guidance for code generation.

3. A stable evaluation environment is the foundation of reproducible results. The team therefore configured each task's image environment in fine detail and minimized non-functional dependencies, removing noise caused by environment fluctuations.
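Step 1 above mentions static scanning to analyze a repository's topology and architecture. The article does not name the tools used; as one assumed illustration, Python's standard `ast` module can recover a package's public top-level classes and functions and the import edges between its modules. `scan_package` is a hypothetical helper, not part of NL2Repo-Bench.

```python
import ast
from pathlib import Path


def scan_package(package_dir: str) -> dict[str, dict[str, list[str]]]:
    """For each .py file, list its top-level public defs and the modules it imports."""
    root = Path(package_dir)
    report: dict[str, dict[str, list[str]]] = {}
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        # Top-level classes and functions whose names do not start with "_".
        defs = [
            node.name
            for node in tree.body
            if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
            and not node.name.startswith("_")
        ]
        # Import edges anywhere in the module. Relative imports keep only the
        # module name (the leading-dot level is dropped); "from . import x" is skipped.
        imports: set[str] = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                imports.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                imports.add(node.module)
        report[path.relative_to(root).as_posix()] = {"defs": defs, "imports": sorted(imports)}
    return report
```

Calling `scan_package` on a package directory yields a per-file map of public definitions plus a module dependency list, which is the kind of skeleton a requirements document would need to cover.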

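From initial draft to final inclusion in the benchmark, every task must pass four mandatory stages: human document review, static tool checks, image environment validation, and pilot-experiment validation. This closed quality-control loop across the whole task lifecycle effectively rules out the influence of low-quality tasks on the benchmark's reliability and ensures that NL2Repo-Bench genuinely reflects a Coding Agent's core competence in complex engineering scenarios.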
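The four mandatory stages behave like a sequence of gates: a task moves on only after passing the current one. A minimal sketch, assuming each stage can be reduced to a boolean check on a task record; the stage checks and record keys below are hypothetical stand-ins, not the team's actual tooling.

```python
from typing import Callable

# A stage is a (name, check) pair; each check inspects a task record and returns True on pass.
Stage = tuple[str, Callable[[dict], bool]]


def run_quality_gates(task: dict, stages: list[Stage]) -> tuple[bool, list[str]]:
    """Run stages in order and stop at the first failure; return (accepted, log)."""
    log: list[str] = []
    for name, check in stages:
        ok = check(task)
        log.append(f"{name}: {'pass' if ok else 'fail'}")
        if not ok:
            return False, log
    return True, log


# Hypothetical stand-ins for the four stages named in the article.
STAGES: list[Stage] = [
    ("human document review", lambda t: t["doc_approved"]),
    ("static tool check", lambda t: t["static_issues"] == 0),
    ("image environment validation", lambda t: t["image_builds"] and t["upstream_tests_pass"]),
    ("pilot experiment", lambda t: t["pilot_ran"]),
]

task = {"doc_approved": True, "static_issues": 0, "image_builds": False,
        "upstream_tests_pass": True, "pilot_ran": True}
accepted, log = run_quality_gates(task, STAGES)
print(accepted, log[-1])  # False image environment validation: fail
```

Stopping at the first failed gate keeps the log short and points directly at the stage that rejected the task.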

The repos are in: how do frontier Coding Agents actually perform?

The NL2Repo-Bench team ran the first complete evaluation of today's strongest Coding Agents. The results show that even the best performer, Claude 4.5, has an overall pass rate below 40%, and most models score only around 20% overall.

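As task difficulty rises, model performance drops quickly, showing that the benchmark effectively captures how hard real, complex project development is.
The Claude family leads by a wide margin, while GPT5 unexpectedly lags behind: shortcomings in its interaction strategy noticeably drag down its performance.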

The NL2Repo-Bench team further analyzed the models' tool-calling preferences and development strategies and identified the following typical problems:

Early-Stop: some models lack long-horizon planning and end development too soon.
Non-Finish: models frequently fall into a state of waiting for user instructions, leaving development unfinished.
Blind editing and navigation traps: some Agents lack systematic planning and waste a large number of turns on meaningless operations.
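To make the three failure patterns concrete, here is a hypothetical heuristic that labels one agent run from a summary of its trajectory. The `Trajectory` fields, the 200-turn budget (borrowed from the turn-count ablation), and the 25% / 50% thresholds are all assumptions for illustration; the article does not describe how the team actually detected these patterns.

```python
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    EARLY_STOP = "Early-Stop"
    NON_FINISH = "Non-Finish"
    BLIND_EDITING = "blind editing / navigation trap"
    NONE = "no pattern flagged"


@dataclass
class Trajectory:
    # Hypothetical per-run summary; fields are illustrative assumptions.
    turns_used: int
    declared_done: bool      # agent explicitly ended the task
    awaiting_user: bool      # final message asks the user for instructions
    unproductive_turns: int  # turns with no file change (e.g. repeated ls/cat)
    all_tests_pass: bool


def classify(t: Trajectory, turn_budget: int = 200,
             early_frac: float = 0.25, waste_frac: float = 0.5) -> FailureMode:
    """Heuristic labels; thresholds are arbitrary placeholders, not from the paper."""
    if t.awaiting_user and not t.declared_done:
        return FailureMode.NON_FINISH
    # Finishing early is only a problem if the repository is still failing tests.
    if t.declared_done and not t.all_tests_pass and t.turns_used < early_frac * turn_budget:
        return FailureMode.EARLY_STOP
    if t.turns_used and t.unproductive_turns / t.turns_used > waste_frac:
        return FailureMode.BLIND_EDITING
    return FailureMode.NONE


print(classify(Trajectory(30, True, False, 5, False)))  # FailureMode.EARLY_STOP
```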

Ablation 1: Effect of the number of turns on model performance

The NL2Repo-Bench team found that raising the number of interaction turns to around 200 significantly improves model performance. In addition, even under "open-book" conditions (with test cases provided), models struggle to break 60 points, which shows how hard real repository-level development tasks are.

Figure: Claude 4.5 score trend

Ablation 2: Effect of leaking test cases on model performance

In the main experiment, the Coding Agent receives no input other than the task document and instructions. To determine whether test cases can meaningfully help a model's development work, the NL2Repo-Bench team took Claude 4.5 + Claude Code and injected all test files used in the evaluation stage into the task's workspace.

Result: with test cases provided during generation, the model improves noticeably on tasks at every difficulty level, but its overall score remains low (59.4, below 60). On one hand, this shows that providing test cases does help the model's development work; on the other, the still-low all-pass rate shows that current coding agents, even in an "open-book" setting, still struggle to complete long-horizon development of a full repository.
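The gap between a headline score like 59.4 and a low all-pass rate comes down to how per-task results are aggregated. Assuming the score averages each task's test pass rate while the all-pass rate counts only repositories whose entire suite passes (an interpretation, not a definition quoted from the paper), the two metrics can be computed side by side; `summarize` and the repository names are hypothetical.

```python
def summarize(results: dict[str, tuple[int, int]]) -> dict[str, float]:
    """results maps task name -> (passed, total) test counts; returns percentages."""
    if not results:
        raise ValueError("no tasks to summarize")
    rates = []
    all_pass = 0
    for passed, total in results.values():
        rates.append(passed / total if total else 0.0)
        if total and passed == total:
            all_pass += 1
    n = len(results)
    return {
        "mean_pass_rate": 100 * sum(rates) / n,  # partial credit per task
        "all_pass_rate": 100 * all_pass / n,     # only fully passing repos count
    }


summary = summarize({"repo_a": (120, 120), "repo_b": (45, 90), "repo_c": (0, 60)})
print({k: round(v, 1) for k, v in summary.items()})
# {'mean_pass_rate': 50.0, 'all_pass_rate': 33.3}
```

A repository that passes half its tests adds partial credit to the mean but nothing to the all-pass rate, which is why the two numbers can diverge sharply.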