
Author | Zhou Yixiao    Email | zhouyixiao@pingwest.com
In April 2025, Yao Shunyu, then still at OpenAI, published a blog post titled "The Second Half." Its thesis: AI has entered its second half, and the contest from here on is not over who has the bigger model but over who can define problems better.
Half a year later he joined Tencent. Two months after that, the first research project he led was released. It did not ship a new model. It put out a number instead: GPT-5.1 scored just 23.7% on a new test.
The setup is simple: put all the information needed into the context, then ask the model to complete a task. What is being tested is whether the model can learn something new from the material in front of it.
The result: the models read the material but did not learn from it.
1
An exam you can't cheat on
The paper is called CL-bench, short for Context Learning Benchmark, and was released jointly by Tencent's Hunyuan team and Fudan University on February 3, 2026. As project lead, Yao Shunyu is listed last among the authors.

Context learning is not a new concept, but the paper defines it very strictly: the model must learn, from the context, new knowledge that did not exist during pretraining, and apply it correctly. Put simply, the model has to learn something it has never seen on the spot, rather than "recall" content it has seen before.
To achieve this, the research team put serious effort into building the data.
The industry's usual defenses against data contamination are fairly blunt: set a time cutoff (for example, only test on news from after 2024), keep the question bank private, or deduplicate algorithmically. CL-bench does something else entirely: it fabricates the material.
The team assembled a group of domain experts who invented multiple parallel universes and bodies of fake knowledge from scratch. For example, they wrote an interstellar legal code called the "Sol Accord." It does not exist in reality, so no model can have memorized its articles from pretraining data. They also made up a SkyNet drone SDK whose function names and calling conventions are all fictional, so a model that writes code from its remembered Python knowledge is bound to get it wrong. In addition, they modified real-world content to create variants, for instance by changing the course of historical events or altering scientific definitions, and included extremely niche long-tail content that is almost impossible to find in pretraining data.
Even the best performer, GPT-5.1, averages only 23.7%.
Building a benchmark from counterfactual and fully fictional material is the most thorough defense against leaderboard gaming, and also the hardest to pull off. The validation is direct: asked to do these tasks with no context at all, GPT-5.1 gets fewer than 1% right. That shows the models really have not seen this knowledge and must learn it from the context they are given. The impossibility of cheating is also the core reason the average solve rate is only 17.2%.
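The no-context check described above can be sketched as a simple ablation: run the same tasks with and without their context and compare solve rates. The snippet below is a minimal illustration only. `Task`, `solve_rate`, `toy_model`, the toy tasks and the exact-match grading are all hypothetical stand-ins, not the paper's evaluation harness.

```python
from typing import Callable, List, NamedTuple


class Task(NamedTuple):
    context: str   # material the task depends on
    question: str
    answer: str    # toy ground truth (assumption: exact-match grading)


# Hypothetical model interface: prompt string in, answer string out.
Model = Callable[[str], str]


def solve_rate(model: Model, tasks: List[Task], with_context: bool) -> float:
    """Fraction of tasks answered correctly, with or without the context."""
    solved = 0
    for task in tasks:
        prompt = f"{task.context}\n\n{task.question}" if with_context else task.question
        if model(prompt).strip() == task.answer:
            solved += 1
    return solved / len(tasks)


def toy_model(prompt: str) -> str:
    """Stand-in model that can only repeat a fact present in the prompt."""
    for line in prompt.splitlines():
        if line.startswith("FACT:"):
            return line[len("FACT:"):].strip()
    return "unknown"


# Invented toy tasks; the facts exist nowhere except in the context.
tasks = [
    Task("FACT: 7 days", "What is the filing deadline under the made-up statute?", "7 days"),
    Task("FACT: blue", "What colour code does the made-up manual assign to a fault?", "blue"),
]

with_ctx = solve_rate(toy_model, tasks, with_context=True)
without_ctx = solve_rate(toy_model, tasks, with_context=False)
print(with_ctx, without_ctx)
```

A near-zero rate without context, like the one the article reports for GPT-5.1, is the signal that the knowledge is absent from pretraining data and must come from the context.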
In the end, CL-bench comprises 500 complex contexts, 1,899 tasks and 31,607 verification criteria. Each context took about 20 hours to annotate on average, and all of them were produced by senior domain experts. The workload alone signals the team's ambition: they set out to build not another leaderboard to climb, but a ruler that can actually measure a model's ability to learn.
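For a sense of scale, the figures quoted above can be turned into per-context and per-task averages. This is plain arithmetic on the article's numbers. The 20-hour figure is an approximate average, so the total is only an estimate.

```python
# Figures quoted in the article.
contexts = 500
tasks = 1899
criteria = 31607
hours_per_context = 20  # approximate average, per the article

tasks_per_context = tasks / contexts
criteria_per_task = criteria / tasks
total_annotation_hours = contexts * hours_per_context

print(f"{tasks_per_context:.1f} tasks per context")
print(f"{criteria_per_task:.1f} verification criteria per task")
print(f"about {total_annotation_hours:,} expert hours in total")
```

That comes to roughly 3.8 tasks per context, about 16.6 criteria per task, and on the order of 10,000 hours of expert annotation.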
2
Four roles, four exams
In this test, the AI has to play four roles.
Sometimes it is a judge, deciding a case under a fictional legal code it has never seen. It is handed a newly enacted law running to 23,000 characters and asked to rule on a realistic dispute. The articles are new and the precedents are new, so the model has to read, understand and apply them on the spot.
Sometimes it is a programmer, required to write code in an entirely new syntax. For example, working from the specification of a newly designed programming language, it must implement a periodic program that terminates on a time condition. If the model falls back on syntax it remembers, it is bound to get it wrong. It has to follow the rules of this "fake documentation" strictly.
Sometimes it is an operator, completing tasks inside a workflow system it has never seen. Working from a brand-new product manual, it carries out the operations step by step. The flowcharts are new, the terminology is new and the constraints are new.
In the hardest case it has to act like a scientist: faced with a pile of complex experimental data, it must derive the underlying regularities itself. One example is analyzing 300 raw experiment logs to derive a relation and estimate a resonance constant. The first three roles are essentially deductive reasoning: you are given the rules and asked to apply them. This one is inductive reasoning: you have to discover the rules from the data.
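The difference between the two modes of reasoning can be shown with a toy example. In the deductive case the rule is given and only has to be applied. In the inductive case the constant has to be recovered from noisy observations. The linear law `y = k * x`, the data points and both function names are invented for illustration and have nothing to do with the benchmark's actual tasks.

```python
from typing import Sequence


def apply_rule(k: float, x: float) -> float:
    """Deduction: the rule y = k * x is given; just apply it."""
    return k * x


def infer_constant(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Induction: estimate k from observations by least squares through the origin."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


# Invented noisy measurements of a law that is roughly y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

k_hat = infer_constant(xs, ys)
print(f"estimated k = {k_hat:.2f}")
print(f"prediction at x = 5: {apply_rule(k_hat, 5.0):.2f}")
```

Even in this toy, the inductive step involves a modelling choice (the form of the law) that the deductive step never has to make. That choice is one reason discovering a rule is harder than applying one.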
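These four scenarios cover most of the learning ability that real work requires: reading documentation, learning rules, following procedures and finding patterns. That is why the CL-bench results are so worrying: if models cannot handle even these basic learning tasks, it is easy to imagine how they perform in real working environments.
Distribution of task categories in CL-bench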
3
Frontier models fail across the board
The research team tested ten state-of-the-art language models on CL-bench, and the results were bleak.
The average task solve rate was only 17.2%. The best performer, GPT-5.1 (High), reached just 23.7%. Bear in mind that all the information needed to complete each task was given explicitly, right there in the context, and the models still failed on the vast majority of tasks.

The paper analyzes the causes of failure in detail, and several findings stand out.
Ignoring or misusing the context is the main cause of failure. Errors usually stem not from missing information but from the model overlooking key details in the context. More interesting still, in many cases the model tends to solve the task with "old experience" acquired during pretraining. Even when the context explicitly defines new rules, concepts or procedures, the model does not learn or use them. It behaves like a stubborn veteran employee who would rather stick to old habits than read the new documentation.
Long-context processing and instruction following are necessary but not sufficient. Models that struggle to track dependencies across the context or to follow constraints precisely do perform worse. But even models that can handle long inputs and follow instructions reliably still fail on many tasks. Context learning evidently demands far more than the ability to process long text and to "do as told."
Inductive reasoning is much harder than deductive reasoning. On the scientist-style tasks, models perform markedly worse, with solve rates typically below 10% and large variance in the results. Discovering regularities from data is far harder than applying rules that are given. This may point to a fundamental limitation of current large-model architectures.
The paper also finds that higher reasoning effort usually improves context learning. GPT-5.1, for example, improved by about 6% on some tasks at the high reasoning-effort setting. Other models, however, gained little or even got worse, which suggests that thinking longer is not enough on its own: the model also has to absorb and organize the contextual information correctly.
4
Yao Shunyu's forecast
In April 2025, Yao Shunyu set out a central claim in his blog post "The Second Half": AI development is moving from its "first half" into its "second half." The theme of the first half was how to train stronger models: more parameters, more data, more compute. The theme of the second half is different: how to define the right problems and how to evaluate real progress.
Evaluation, he wrote, will matter more than training. We are no longer just asking "Can we train a model that solves X?" but "What should we train AI to do, and how do we measure real progress?"
In an interview he elaborated: the question of method is now largely settled, and what really matters is which problems we use this general method to solve.
What problem does CL-bench define? The problem it defines is this: can a model learn from the context in front of it?
That question had been overlooked. The industry's implicit assumption was that as long as the context is supplied properly (that is, context engineering is done well), the model will complete the task. CL-bench's data breaks that assumption: supplying the context properly does not mean the model gets it right. Context learning, as a basic model capability, has been seriously underestimated.
In 2024 Yao Shunyu led another benchmark, τ-bench (ICLR 2025). That test looked at whether an agent can follow domain rules and carry on multi-turn interactions with users. CL-bench goes a step further and tests whether a model can learn new knowledge from context. Together they point to one judgment: what the real world needs is the ability to learn, not the ability to take exams.
One passage in the CL-bench paper puts it precisely: large language models rely mainly on "parametric knowledge," the static memory compressed into the model weights during pretraining. At inference time, models mostly draw on this stored internal knowledge rather than actively absorbing what is in the new input. As a result, today's optimized models are good at reasoning about what they already "know," whereas users need models to solve tasks that depend on complex and dynamically changing context.
5
What is changing in the industry
A quick sketch of the main themes of AI development in recent years would look roughly like this: the theme of 2024 was scaling, meaning bigger models, more data and more compute; the theme of 2025 was reasoning, meaning the gains in reasoning ability represented by o1, R1 and Deep Research.
And 2026? CL-bench points to one possible new direction: context learning.
The path from Prompt Engineering to Context Learning
Interestingly, the big Western labs are currently focused on a different problem. Anthropic released MCP (Model Context Protocol) in late 2024, and OpenAI and Google followed. The protocol has been called "the USB-C of AI," and its purpose is to make it easier for models to connect to external tools and data sources. In December 2025, Anthropic, OpenAI and Block jointly founded the Agentic AI Foundation and donated MCP to the Linux Foundation to promote open-source standardization. In the same month, Anthropic also published the Agent Skills open standard, which lets AI carry out more specific tasks.
All of these efforts address how to get context into the model: how to connect the model to more data sources, how to let it call more tools, how to let it run more complex workflows.
CL-bench asks what happens after that: once the context is in, can the model learn from it?
Anthropic's own research has touched on a similar issue. Its blog post on context engineering describes "context rot": as the context grows longer, the model's ability to recall information declines. But the problem CL-bench exposes is that even when the context is not long, the model may still fail to "learn" the new knowledge in it. This is a matter of learning ability and has nothing to do with retrieval.
In its outlook section the paper raises a more distant challenge. Even if context learning improves, what is learned remains ephemeral: once the context window is cleared, it is gone. The next challenge is memory consolidation: how can knowledge learned from context be made persistent? That may be the new battleground after 2026.
6
What this means for Tencent
For the first research output he led after joining Tencent, Yao Shunyu chose to redefine the problem with a benchmark.
Tencent Hunyuan does not currently lead China's large-model market by share: ByteDance's Doubao and Alibaba's Tongyi are ahead. Against that backdrop, Tencent has chosen to focus on a more basic question: the model's ability to learn.
The choice may have something to do with Tencent's business DNA. Tencent is a social and gaming giant, and its core business is in essence a vast volume of "dynamic context": chat histories, game states, user behavior. By emphasizing context learning, Yao Shunyu may be laying the groundwork for Tencent's most central business scenarios, so that AI understands the user in the present moment rather than understanding past users through pretraining.
After joining, he said that Tencent's To C DNA is stronger, and that the company needs to think about how large models can deliver more value to users. Often what is needed is not a bigger model or stronger reinforcement learning, but additional context.
That may be AI's real ticket into human society: to stop being merely a well-read bystander.
