
Author | Zhou Yixiao  Email | zhouyixiao@pingwest.com
In April 2025, Yao Shunyu, then still at OpenAI, published a blog post titled "The Second Half" that made one claim: AI has entered its second half, and the contest from here on is not over who has the bigger model but over who can define the problem better.
Half a year later he joined Tencent. Two months after that, the first research result he led was released. It did not introduce a new model. It simply put one number on the table: GPT-5.1 scored only 23.7% on a new test.
The setup of the test is simple: put all the necessary information into the context and ask the model to complete the task. What is being tested is whether the model can learn something new from the material in front of it.
The result: the models read the material but did not learn from it.
1
An exam you can't cheat on
ÕâÆªÂÛÎĽÐCL-bench£¬£¬£¬£¬£¬È«³ÆContext Learning Benchmark£¬£¬£¬£¬£¬2026Äê2ÔÂ3ÈÕÓÉÌÚѶ»ìÔªÍŶӺ͸´µ©´óѧÍŽáÐû²¼¡£¡£¡£¡£¡£×÷ΪÏîÄ¿ÈÏÕæÈË£¬£¬£¬£¬£¬Ò¦Ë³ÓêÅÅÔÚ×÷ÕßÁбíµÄ×îºóһλ¡£¡£¡£¡£¡£

Context Learning is not a new concept, but this paper defines it very strictly: the model must learn, from the context, new knowledge that did not exist during pretraining, and apply it correctly. Put simply, the model has to learn on the spot something it has never seen, rather than merely "recall" content it has seen before.
To achieve this, the research team put heavy effort into building the data.
The industry's most common defenses against data contamination are fairly blunt: set a time cutoff (for example, only test on news from after 2024), keep the question bank private, or deduplicate algorithmically. CL-bench does something else entirely: it invents its material from scratch.
The research team brought in a group of domain experts who made up multiple parallel universes and bodies of fake knowledge. For example, they wrote an interstellar legal code called the "Sol Accord". It does not exist in reality, so no model could have memorized its articles from pretraining data. They also invented a SkyNet drone SDK whose function names and calling conventions are all fictional, so a model that writes code from its remembered Python knowledge is bound to get it wrong. In addition, they modified real-world content to create variants, such as changing the course of historical events or adjusting scientific definitions, and included extremely niche long-tail content that is almost never found in pretraining data.
Even the best performer, GPT-5.1, averages a score of only 23.7.
Building a benchmark out of "counterfactual" and "fully fictional" material is the most thorough way to resist leaderboard gaming, and also the hardest. The validation is direct: when GPT-5.1 is given these tasks with no context at all, it gets fewer than 1% right. This shows the model really has not seen this knowledge and must learn it from the given context. The fact that cheating is impossible is also the core reason the average pass rate is only 17.2%.
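The contamination check described above, running the same tasks with and without their context, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's evaluation harness: `toy_model`, the `key=value` context format, and the two sample tasks are all hypothetical stand-ins invented here.

```python
from typing import Callable, Dict, List, Optional

Model = Callable[[Optional[str], str], str]


def solve_rate(model: Model, tasks: List[Dict[str, str]], use_context: bool) -> float:
    """Fraction of tasks answered correctly, with or without the context."""
    solved = 0
    for task in tasks:
        context = task["context"] if use_context else None
        if model(context, task["question"]) == task["answer"]:
            solved += 1
    return solved / len(tasks)


def toy_model(context: Optional[str], question: str) -> str:
    """Hypothetical model: it can only answer by reading key=value facts
    out of the context, and has no prior knowledge of them."""
    if context is None:
        return "unknown"
    facts = dict(line.split("=", 1) for line in context.splitlines() if "=" in line)
    return facts.get(question, "unknown")


# Made-up tasks whose answers exist only in the supplied context.
tasks = [
    {"context": "max_altitude=120\nreturn_code=RTB7",
     "question": "return_code", "answer": "RTB7"},
    {"context": "quorum=5\nappeal_window=30",
     "question": "appeal_window", "answer": "30"},
]

with_context = solve_rate(toy_model, tasks, use_context=True)
without_context = solve_rate(toy_model, tasks, use_context=False)
print(f"with context: {with_context:.0%}, without context: {without_context:.0%}")
```

A near-zero score without context indicates the answers are not in the model's memory. The gap between the two scores is what the benchmark attributes to learning from the context.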
In the end, CL-bench contains 500 complex contexts, 1,899 tasks, and 31,607 verification criteria. Annotating each context took about 20 hours on average, and all of it was done by senior domain experts. The workload alone shows the team's ambition: they did not want another leaderboard to climb, but a ruler that can actually measure a model's "learning ability."
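A quick back-of-the-envelope calculation makes the scale of these figures concrete. It uses only the numbers quoted above; the per-context and per-task averages are derived here and are not reported in the article, and the total assumes the 20-hour average applies evenly to all 500 contexts.

```python
# Figures quoted in the article for CL-bench.
contexts = 500
tasks = 1899
criteria = 31607
hours_per_context = 20  # approximate average, per the article

# Derived averages (not stated in the article).
tasks_per_context = tasks / contexts
criteria_per_task = criteria / tasks
total_hours = contexts * hours_per_context

print(f"tasks per context:  {tasks_per_context:.2f}")
print(f"criteria per task:  {criteria_per_task:.2f}")
print(f"annotation hours:   about {total_hours:,}")
```

That works out to roughly four tasks per context, between 16 and 17 verification criteria per task, and on the order of 10,000 expert hours in total.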
2
Four roles, four exams
In this test, the AI has to play four roles.
Sometimes it is a judge, ruling on a case under a fictional law it has never seen. It is handed a newly enacted statute some 23,000 words long and asked to decide a realistic dispute. The articles are all new, the precedents are all new, and the model must read, understand, and apply them on the spot.
Sometimes it is a programmer who must write code in an entirely new syntax, for example implementing a periodic program with a time-based termination condition according to the specification of a newly designed programming language. If the model falls back on the syntax it remembers, it is bound to fail. It has to follow the rules of this "fake documentation" strictly.
Sometimes it is an operator who must complete a task inside a workflow system it has never seen, following a brand-new product manual step by step. The flowchart is new, the terminology is new, and the constraints are new.
In the hardest case, it has to act like a scientist: faced with a pile of messy experimental data, it must derive the underlying law itself, for example by analyzing 300 raw experiment logs, deriving a relation, and estimating a resonance constant. The first three roles are essentially deductive reasoning: you are given the rules and asked to apply them. This one is inductive reasoning: you have to discover the rules from the data yourself.
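The contrast between applying a given rule and discovering one from data can be illustrated with a toy example. This is only a sketch under a strong assumption: the hidden law is a simple proportional relation y = k * x, and the four "log" entries are made up. The benchmark's actual scientist tasks, such as estimating a resonance constant from 300 logs, are far more involved.

```python
from typing import List, Tuple


def apply_rule(k: float, x: float) -> float:
    """Deduction: the rule y = k * x is handed over; just apply it."""
    return k * x


def infer_constant(observations: List[Tuple[float, float]]) -> float:
    """Induction: only (x, y) observations are given; recover k with a
    least-squares fit of a line through the origin."""
    numerator = sum(x * y for x, y in observations)
    denominator = sum(x * x for x, _ in observations)
    return numerator / denominator


# Hypothetical noisy measurements standing in for raw experiment logs.
logs = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0), (4.0, 8.1)]

k_hat = infer_constant(logs)
print(f"estimated constant: {k_hat:.2f}")
print(f"prediction at x=5:  {apply_rule(k_hat, 5.0):.2f}")
```

In the deductive case the hard part is reading the rule correctly. In the inductive case the model must first decide what form the rule takes, which this toy skips by assuming a linear law up front.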
ÕâËÄÀೡ¾°ÁýÕÖÁ˴󲿷ÖÕæÊµÊÂÇéÖÐÐèÒªµÄѧϰÄÜÁ¦£º¶ÁÎĵµ¡¢Ñ§¹æÔò¡¢ÕÕÁ÷³Ì¡¢ÕÒ¼ÍÂÉ¡£¡£¡£¡£¡£ÕâÒ²ÊÇΪʲôCL-benchµÄЧ¹ûÔÆÔÆÁîÈ˵£ÐÄ£¬£¬£¬£¬£¬ÈôÊÇÄ£×ÓÁ¬ÕâЩ»ù±¾µÄѧϰʹÃü¶¼×öÇ·ºÃ£¬£¬£¬£¬£¬ËüÔÚÕæÊµÊÂÇ鳡¾°ÖеÄÌåÏÖ¿ÉÏë¶øÖª¡£¡£¡£¡£¡£
Distribution of task categories in CL-bench
3
Frontier models fail across the board
The research team tested ten state-of-the-art language models on CL-bench, and the results were bleak.
The average task-solving rate was only 17.2%. The best performer, GPT-5.1 (High), reached only 23.7%. Keep in mind that all the information needed to complete each task was explicitly provided in the context, and the models still failed on the vast majority of tasks.

The paper analyzes the causes of failure in detail, and several findings stand out.
Ignoring or misusing the context is the main cause of failure. Errors usually do not come from missing information; the real problem is that the model overlooks key details in the context. More interesting still, in many cases the model tends to fall back on the "old experience" it acquired during pretraining. Even when the context explicitly defines new rules, concepts, or procedures, the model does not learn or use them. It is like a stubborn veteran employee who would rather stick to old habits than read the new documentation.
Long-context handling and instruction following are necessary but not sufficient. Models that struggle to track dependencies across the context or to follow constraints precisely do perform worse. But even models that can handle long inputs and follow instructions reliably still fail on many tasks. Context learning therefore demands far more than handling long text and being "obedient."
Inductive reasoning is much harder than deductive reasoning. On the scientist-type tasks, models perform markedly worse, with task-solving rates typically below 10% and large variance in results. Discovering a pattern from data is far harder than applying a given rule. This may point to a fundamental limitation of current large-model architectures.
The paper also finds that higher reasoning effort usually improves context learning. GPT-5.1, for instance, improved by about 6% on some tasks under its high reasoning-effort setting. Other models, however, gained little or even got worse, which suggests that thinking longer is not enough on its own: the model must also be able to absorb and organize the information in the context correctly.
4
Yao Shunyu's prediction
In April 2025, in his blog post "The Second Half," Yao Shunyu put forward a central idea: AI development is moving from its "first half" into its "second half." The theme of the first half was how to train stronger models: more parameters, more data, more compute. The theme of the second half is different: how to define the right problems and how to evaluate real progress.
He wrote that evaluation will become more important than training. We no longer just ask "Can we train a model that solves X?" but rather "What should we train AI to do, and how do we measure real progress?"
In an interview he elaborated further: the question of method is now largely settled, and what really matters is which problems we use this general method to solve.
What problem does CL-bench define? It asks whether a model can learn from its current context.
This question had been overlooked. The industry's implicit assumption was that as long as the context is supplied properly (that is, context engineering is done well), the model can complete the task. CL-bench's data breaks that assumption: supplying the right context does not mean the model gets it right. Context learning, as a basic model capability, has been seriously underestimated.
In 2024 Yao Shunyu led another benchmark, τ-bench (ICLR 2025). That test looks at whether an agent can follow domain rules and carry on multi-turn interactions with users. CL-bench goes a step further and tests whether a model can learn new knowledge from the context. Together they point to one judgment: the real world needs the ability to learn, not the ability to pass exams.
One passage in the CL-bench paper puts it precisely: large language models rely mainly on "parametric knowledge," the static memory compressed into the model weights during pretraining. At inference time, models mostly draw on this stored internal knowledge rather than actively absorbing what is in the new input. As a result, today's optimized models are good at reasoning about what they "know," whereas users need models that can solve tasks depending on messy, dynamically changing context.
5
What is changing in the industry
A rough summary of the main themes of AI development in recent years would go like this: the theme of 2024 was Scaling, meaning bigger models, more data, and more compute; the theme of 2025 was Reasoning, meaning the gains in reasoning ability represented by o1, R1, and Deep Research.
And 2026? CL-bench points to one possible new direction: Context Learning.
The evolution from Prompt Engineering to Context Learning
Interestingly, the big Western labs are currently focused on a different problem. Anthropic released MCP (Model Context Protocol) at the end of 2024, and OpenAI and Google followed. The protocol has been called the "USB-C of AI," and its goal is to make it easier for models to connect to external tools and data sources. In December 2025, Anthropic, OpenAI, and Block jointly founded the Agentic AI Foundation and donated MCP to the Linux Foundation to push for open-source standardization. In the same month, Anthropic also released the Agent Skills open standard, which lets AI carry out more specific tasks.
All of these efforts address how to get context into the model: how to connect the model to more data sources, how to let it call more tools, and how to have it run more complex workflows.
What CL-bench asks is this: once the context is in, can the model actually learn from it?
Anthropic's own research has touched on a similar issue. In a blog post on context engineering, the company described the phenomenon of context rot: as context length grows, the model's ability to recall information declines. But the problem CL-bench exposes is different: even when the context is not long, the model may still fail to "learn" the new knowledge in it. This is a matter of learning ability, not retrieval.
In its outlook section the paper raises a longer-term challenge. Even if context learning improves, it remains "ephemeral": once the context window is cleared, whatever was learned is gone. The next challenge is Memory Consolidation, that is, how to make knowledge learned from context persist. That may be the new battleground after 2026.
6
What this means for Tencent
For the first research output he led after joining Tencent, Yao Shunyu chose to use a benchmark to redefine the problem.
Tencent Hunyuan currently does not lead China's large-model market by share; ByteDance's Doubao and Alibaba's Tongyi rank ahead of it. Against that backdrop, Tencent has chosen to focus on a more fundamental question: a model's ability to learn.
This choice may have to do with Tencent's business DNA. Tencent is a social and gaming giant, and its core business is essentially a vast amount of "dynamic context": chat histories, game states, user behavior. By emphasizing Context Learning, Yao Shunyu may be laying the foundation for Tencent's most central business scenarios, so that AI understands the user as they are right now rather than understanding past users through pretraining.
After joining, he said that Tencent's To C DNA is stronger and that it needs to think about how large models can deliver more value to users. Often what is needed is not a bigger model or stronger reinforcement learning, but additional context.
That may be AI's real ticket into human society: to stop being a well-read bystander.
