Yao Shunyu's first research at Tencent: when it comes to "context," everyone in the room fails
2026-02-26 19:51:43

Author | Zhou Yixiao   Email | zhouyixiao@pingwest.com

In April 2025, Yao Shunyu, then still at OpenAI, published a blog post titled "The Second Half" that made one call: AI has entered its second half, and the contest from here on is not about who has the bigger model but about who can define the problem better.

Half a year later he joined Tencent. Two months after that, the first research result he led was released. It did not introduce a new model. Instead it put a single number on the table: GPT-5.1 scored only 23.7% on a new test.

The setup is simple: put all the information needed into the context and ask the model to complete the task. What is being tested is whether the model can learn something new from the material in front of it.

The result: the model read it, but did not learn it.

1

An exam you cannot cheat on

ÕâÆªÂÛÎĽÐCL-bench£¬£¬ £¬£¬ £¬È«³ÆContext Learning Benchmark£¬£¬ £¬£¬ £¬2026Äê2ÔÂ3ÈÕÓÉÌÚѶ»ìÔªÍŶӺ͸´µ©´óѧÍŽáÐû²¼¡£¡£¡£¡£¡£×÷ΪÏîÄ¿ÈÏÕæÈË£¬£¬ £¬£¬ £¬Ò¦Ë³ÓêÅÅÔÚ×÷ÕßÁбíµÄ×îºóһλ¡£¡£¡£¡£¡£

Context Learning is not a new concept, but this paper defines it very strictly: the model must learn, from the context, new knowledge that did not exist during pretraining, and apply it correctly. Put simply, the model has to learn on the spot something it has never seen, rather than "recall" content it has seen before.

To achieve this, the research team put serious effort into how the data was built.

The industry's most common defenses against data contamination are fairly blunt: set a time cutoff (for example, only test on news from after 2024), keep the question bank private, or deduplicate algorithmically. CL-bench does something else entirely: it "creates worlds."

The team assembled a group of domain experts who invented multiple parallel universes and bodies of fake knowledge from scratch. For example, they wrote an interstellar legal code called the "Sol Accord," which does not exist in reality, so no model could have memorized its articles from pretraining data. They also invented a SkyNet drone SDK whose function names and calling rules are all fictitious, so a model that writes code from its remembered Python knowledge is certain to get it wrong. In addition, they modified real-world content to create variants, for example changing the course of historical events or adjusting scientific definitions, and included extremely niche long-tail content that is almost impossible to find in pretraining data.

Even the best performer, GPT-5.1, averages a score of only 23.7.

Building a benchmark from "counterfactual" and "fully fictional" material is the most thorough, and also the hardest, way to fight leaderboard gaming. The validation is direct: when GPT-5.1 is given these tasks with no context at all, it gets fewer than 1% of them right. That shows the model really has not seen this knowledge and must learn it from the context provided. The impossibility of cheating is also the core reason the pass rate is only 17.2%.

In the end, CL-bench contains 500 complex contexts, 1,899 tasks, and 31,607 verification criteria. Each context took about 20 hours to annotate on average, and all of them were produced by senior domain experts. The workload alone shows the team's ambition: what they want to build is not a leaderboard for chasing scores, but a ruler that can actually measure a model's "ability to learn."
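To get a feel for that scale, the figures quoted above can be turned into a few per-item averages. This is only back-of-the-envelope arithmetic on the numbers in this article; the variable names are illustrative, and the 20-hour figure is the article's approximate average, so the total is a rough estimate rather than a reported number.

```python
# Figures quoted in the article for CL-bench.
NUM_CONTEXTS = 500
NUM_TASKS = 1899
NUM_RUBRICS = 31607
HOURS_PER_CONTEXT = 20  # approximate average, per the article

# Simple averages derived from the quoted totals.
tasks_per_context = NUM_TASKS / NUM_CONTEXTS
rubrics_per_task = NUM_RUBRICS / NUM_TASKS
total_annotation_hours = NUM_CONTEXTS * HOURS_PER_CONTEXT

print(f"tasks per context:  {tasks_per_context:.1f}")
print(f"rubrics per task:   {rubrics_per_task:.1f}")
print(f"annotation hours:   {total_annotation_hours}")
```

On these numbers, each context carries roughly four tasks, each task is checked against roughly seventeen criteria, and the annotation effort comes to something on the order of ten thousand expert hours.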

2

Four roles, four exams

In this test, the AI has to play four roles.

Sometimes it is a judge who must decide a case under a fictional law it has never seen. It is handed a newly enacted statute 23,000 words long and asked to rule on an actual dispute. The articles are all new and the precedents are all new, so the model has to read, understand, and apply them on the spot.

ÓÐʱËüÊdzÌÐòÔ±£¬£¬ £¬£¬ £¬±ØÐèÓÃÒ»ÖÖȫеÄÓ﷨д´úÂë¡£¡£¡£¡£¡£ºÃ±È»ùÓÚÒ»ÃÅÐÂÉè¼ÆµÄ±à³ÌÓïÑԹ淶£¬£¬ £¬£¬ £¬ÊµÏÖÒ»¸ö´øÓÐʱ¼äÌõ¼þÖÕÖ¹µÄÖÜÆÚÐÔ³ÌÐò¡£¡£¡£¡£¡£Ä£×ÓÈôÊÇÓÃËüÓ°ÏóÀïµÄÓï·¨£¬£¬ £¬£¬ £¬±Ø´íÎÞÒÉ¡£¡£¡£¡£¡£Ëü±ØÐèÑÏ¿á×ñÊØÕâ¸ö"¼ÙÎĵµ"µÄ¹æÔò¡£¡£¡£¡£¡£

ÓÐʱËüÊDzÙ×÷Ô±£¬£¬ £¬£¬ £¬ÐèÒªÔÚÒ»Ì×´Óδ¼û¹ýµÄÊÂÇéÁ÷ϵͳÀïÍê³ÉʹÃü¡£¡£¡£¡£¡£Æ¾Ö¤Ò»·ÝȫеIJúÆ·Êֲᣬ£¬ £¬£¬ £¬Ò»²½²½Ö´ÐвÙ×÷¡£¡£¡£¡£¡£Á÷³ÌͼÊÇÐµģ¬£¬ £¬£¬ £¬ÊõÓïÊÇÐµģ¬£¬ £¬£¬ £¬Ô¼ÊøÌõ¼þÊÇеġ£¡£¡£¡£¡£

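At its hardest, it has to act like a scientist: faced with a pile of messy experimental data, it must derive the underlying laws itself. For example, it analyzes 300 raw experiment logs, derives the governing relation, and estimates a resonance constant. The first three roles are essentially deductive reasoning: you are given the rules and asked to apply them. This one is inductive reasoning: you have to discover the rules from the data yourself.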

ÕâËÄÀೡ¾°ÁýÕÖÁ˴󲿷ÖÕæÊµÊÂÇéÖÐÐèÒªµÄѧϰÄÜÁ¦£º¶ÁÎĵµ¡¢Ñ§¹æÔò¡¢ÕÕÁ÷³Ì¡¢ÕÒ¼ÍÂÉ¡£¡£¡£¡£¡£ÕâÒ²ÊÇΪʲôCL-benchµÄЧ¹ûÔÆÔÆÁîÈ˵£ÐÄ£¬£¬ £¬£¬ £¬ÈôÊÇÄ£×ÓÁ¬ÕâЩ»ù±¾µÄѧϰʹÃü¶¼×öÇ·ºÃ£¬£¬ £¬£¬ £¬ËüÔÚÕæÊµÊÂÇ鳡¾°ÖеÄÌåÏÖ¿ÉÏë¶øÖª¡£¡£¡£¡£¡£

Distribution of task categories in CL-bench

3

Frontier models stumble across the board

The team tested ten state-of-the-art language models on CL-bench, and the results were bleak.

The average task solving rate was only 17.2%. The best performer, GPT-5.1 (High), reached only 23.7%. Keep in mind that all the information needed to complete each task had been given explicitly, right there in the context, and the models still failed on the vast majority of tasks.
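One way to picture why a task solving rate can be this low is to separate "passing most verification criteria" from "solving the task." The sketch below is illustrative only. It assumes, and this is an assumption rather than something this article states, that a task counts as solved only when every one of its criteria passes. `TaskResult`, `solve_rate`, `rubric_pass_rate`, and the toy data are all hypothetical and are not taken from the CL-bench code or results.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical record of one model answer graded against its criteria."""
    task_id: str
    rubric_passed: list[bool]  # one pass/fail flag per verification criterion


def is_solved(result: TaskResult) -> bool:
    # Assumption: a task is solved only if it has criteria and all of them pass.
    return bool(result.rubric_passed) and all(result.rubric_passed)


def solve_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks that are fully solved."""
    if not results:
        return 0.0
    return sum(is_solved(r) for r in results) / len(results)


def rubric_pass_rate(results: list[TaskResult]) -> float:
    """Fraction of individual criteria that pass, pooled over all tasks."""
    flags = [flag for r in results for flag in r.rubric_passed]
    if not flags:
        return 0.0
    return sum(flags) / len(flags)


# Toy data (made up): three of four answers miss exactly one criterion.
toy_results = [
    TaskResult("t1", [True, True, True]),
    TaskResult("t2", [True, True, False]),
    TaskResult("t3", [True, False, True]),
    TaskResult("t4", [True, True, False]),
]

print(f"criterion-level pass rate: {rubric_pass_rate(toy_results):.2f}")
print(f"task solving rate:         {solve_rate(toy_results):.2f}")
```

In this toy case three quarters of the individual criteria pass, yet only one task in four is fully solved. Under an all-or-nothing rule, overlooking a single detail in the context is enough to fail the whole task.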

The paper analyzes the causes of failure in detail, and several findings stand out.

Ignoring or misusing the context is the main cause of failure. Errors usually stem not from missing information but from the model overlooking key details in the context. More interesting still, in many cases the model tends to solve the task with the "old experience" it acquired during pretraining. Even when the context explicitly defines new rules, concepts, or procedures, the model does not learn or use them. It is like a stubborn veteran employee who would rather stick to his old methods than read the new documentation.

Long-context processing and instruction following are necessary but not sufficient. Models that struggle to track dependencies across the context or to follow constraints precisely do perform worse. But even models that can handle long inputs and follow instructions reliably still fail on many tasks. This shows that context learning requires much more than handling long text and being "obedient."

Inductive reasoning is far harder than deductive reasoning. On the scientist-type tasks, models perform markedly worse: the task solving rate is usually below 10%, and results fluctuate widely. Discovering patterns from data is much harder than applying given rules. This may point to a fundamental limitation of current large-model architectures.

The paper also finds that higher reasoning effort usually improves context learning. For example, with the high reasoning-effort setting, GPT-5.1 improved by about 6% on some tasks. But other models improved only slightly or even got worse, which shows that thinking a little longer is not enough on its own: the model must also be able to absorb and organize the information in the context correctly.

4

Yao Shunyu's prediction

In April 2025, Yao Shunyu put forward a core idea in his blog post "The Second Half": AI development is moving from the "first half" into the "second half." The theme of the first half was how to train stronger models: more parameters, more data, more compute. The theme of the second half is different: how to define the right problems and how to evaluate real progress.

He wrote that evaluation will become more important than training. We no longer just ask "can we train a model that solves X," but rather "what should we be training AI to do, and how do we measure real progress."

In an interview he explained further that the question of method is now largely settled, and what really matters is: what problems should we solve with this general method?

What problem does CL-bench define? The problem it defines is this: can a model learn from the current context?

This question had been overlooked. The industry's implicit assumption was that as long as the right context is provided (that is, context engineering is done well), the model can complete the task. CL-bench's data breaks that assumption: providing the right context is not the same as getting the task right. Context learning, as a basic model capability, has been seriously underestimated.

In 2024 Yao Shunyu led another benchmark, τ-bench (ICLR 2025). That test focused on whether an agent can follow domain rules and carry out multi-turn interactions with users. CL-bench goes a step further and tests whether a model can learn new knowledge from the context. Both point to the same judgment: what the real world needs is the ability to learn, not the ability to do exam questions.

One passage in the CL-bench paper puts it precisely: large language models rely mainly on "parametric knowledge," the static memory compressed into model weights during pretraining. At inference time, models mostly call on this stored internal knowledge rather than actively absorbing what is in the new input. As a result, models as currently optimized are good at reasoning about what they "know," whereas users need models to solve tasks that depend on messy, dynamically changing context.

5

What is changing in the industry

A rough summary of the main themes of AI development over the past few years looks like this: the theme of 2024 was Scaling, meaning bigger models, more data, and more compute; the theme of 2025 was Reasoning, meaning the gains in reasoning ability represented by o1, R1, and Deep Research.

And 2026? CL-bench points to one possible new direction: Context Learning.

The evolution from Prompt Engineering to Context Learning

Interestingly, the big Western labs are currently focused on a different problem. Anthropic released MCP (Model Context Protocol) at the end of 2024, and OpenAI and Google followed. The protocol has been called the "USB-C of AI," and its purpose is to make it easier for models to connect to external tools and data sources. In December 2025, Anthropic, OpenAI, and Block jointly founded the Agentic AI Foundation and donated MCP to the Linux Foundation to promote open-source standardization. In the same month, Anthropic also released the Agent Skills open standard, which lets AI carry out more specific tasks.

All of these efforts address how to get context into the model: how to connect models to more data sources, how to let them call more tools, and how to let them run more complex workflows.

What CL-bench asks is: once the context is in, can the model actually learn from it?

Anthropic's own research has touched on a similar issue. In its blog post on context engineering, it describes the phenomenon of context rot: as context length grows, the model's ability to recall information declines. But the problem CL-bench reveals is different: even when the context is not long, the model does not necessarily "learn" the new knowledge in it. This is a matter of learning ability and has nothing to do with retrieval.

In its outlook section, the paper raises a more distant challenge. Even if context learning improves, what is learned is still "ephemeral": once the context window is cleared, it is gone. The next challenge is Memory Consolidation: how can knowledge learned from the context be made persistent? This may be the new battleground after 2026.

6

What this means for Tencent

For the first research output he led after joining Tencent, Yao Shunyu chose to redefine the problem with a benchmark.

Tencent Hunyuan does not currently lead China's large-model market by share; ByteDance's Doubao and Alibaba's Tongyi are ahead. Against that backdrop, Tencent has chosen to focus on a more fundamental question: the model's ability to learn.

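This choice may have to do with Tencent's business DNA. Tencent is a giant in social networking and gaming, and its core businesses are essentially vast amounts of "dynamic context": chat histories, game states, user behavior. By emphasizing Context Learning, Yao Shunyu may be laying the foundation for Tencent's most central business scenarios, so that AI understands the user of this very moment rather than understanding the user of the past through pretraining.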

After joining, he said that Tencent's To C DNA is stronger, and that it needs to think about how large models can deliver more value to users. Often what is needed is not a bigger model or stronger reinforcement learning, but additional context.

This may be AI's real ticket into human society: no longer being a learned bystander.
