Tsinghua Breaks the "Safety Paradox" in Reinforcement Learning, Reaching SOTA Across 14 Benchmark Tasks
2026-03-04 05:10:08

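Xinzhiyuan (新智元) report

Editor: LRST

[Xinzhiyuan Digest] Professor Li Shengbo's team at Tsinghua University proposes the RACS algorithm, which introduces an "explorer" policy that actively probes the boundary of constraint violation and thereby resolves the "safety paradox" of safe reinforcement learning. Without increasing the sampling cost, the method markedly improves the quality of violation samples and the system's knowledge of what is safe, achieves both safety and performance, and sets new SOTA results on multiple benchmarks.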

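With reinforcement learning (RL) delivering dominant performance in virtual worlds, transferring it to real physical systems such as autonomous driving and robot control has become an industry consensus. The high-risk nature of the physical world, however, draws a red line that must not be crossed: "zero constraint violation".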

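To hold this red line, researchers have proposed several approaches: OpenAI combines RL with the Lagrange multiplier method to trade off safety and performance dynamically, and the CPO algorithm from UC Berkeley uses a trust region to keep the policy within the feasible space.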

Yet existing methods share one core weakness: the policy can rarely achieve strict "zero violation". Most algorithms can only keep violations at a very low level, and they meet heavy resistance as soon as they aim for absolutely zero violations.

Professor Li Shengbo's research group at Tsinghua University has made a breakthrough in safe reinforcement learning. For the first time, it reveals and proves at the theoretical level a counterintuitive phenomenon in safe RL, the "safety paradox": the harder a policy pursues safety, the less safe it may become.

Paper: https://openreview.net/forum?id=BHSSV1nHvU

Code repository: https://github.com/yangyujie-jack/Feasible-Dual-Policy-Iteration

In safe reinforcement learning, the agent typically learns a feasibility function from interaction data and uses it to judge whether the current state is safe in the long run, so that it can avoid dangerous regions.

Through a rigorous theoretical proof, however, the study reveals a sobering fact:

As the policy becomes safer, the violation samples it generates become extremely sparse. This sharply increases the estimation error of the feasibility function, which biases the constraint function that guides policy optimization and ultimately causes the policy's safety to collapse.

It is like a person who has never seen a cliff: however cautiously they walk, they cannot tell exactly where the danger boundary lies, because they have no precise knowledge of the "cliff edge". The more deliberately safety is pursued, the blurrier the knowledge of the danger boundary becomes, and in the end the safety defense fails. This is the so-called "safety paradox": the policy is trapped in a self-defeating loop.
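The link between sparse violation samples and estimation error can be illustrated with a deliberately simplified model. The sketch below is not the paper's proof: it assumes that each sample is an independent Bernoulli draw that violates the constraint with probability p, and that the sampling budget n is fixed. Under those assumptions, the relative standard error of the estimated violation rate is sqrt((1 - p) / (n * p)), which grows as p shrinks. The function names and the budget value are hypothetical.

```python
import math


def expected_violations(p: float, n: int) -> float:
    """Expected number of violation samples among n Bernoulli(p) draws."""
    return n * p


def relative_std_error(p: float, n: int) -> float:
    """Relative standard error of the sample mean of n Bernoulli(p) draws.

    std(mean) / p = sqrt(p * (1 - p) / n) / p = sqrt((1 - p) / (n * p)).
    """
    return math.sqrt((1.0 - p) / (n * p))


if __name__ == "__main__":
    n = 100_000  # hypothetical fixed sampling budget
    # A safer policy corresponds to a smaller violation probability p.
    for p in (1e-1, 1e-2, 1e-3, 1e-4):
        print(
            f"p={p:g}  expected violations={expected_violations(p, n):.0f}  "
            f"relative error={relative_std_error(p, n):.3f}"
        )
```

With the budget held fixed, each tenfold drop in p leaves roughly a tenth as many violation samples and a relative error about sqrt(10) times larger, which is the qualitative effect the paragraph above describes.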

To address this dilemma, the team proposes the Region-wise Actor-Critic-Scenery (RACS) algorithm. By introducing an "explorer" policy dedicated to collecting violation samples, it breaks the paradox and sets new SOTA results on the widely used Safety-Gymnasium benchmark. The work is published at ICLR 2026, a top AI conference.

Breaking the Deadlock: The RACS Algorithm

Since "never daring to step over the line" creates a blind spot, the way out is to explore actively and confront danger directly.

The team's Region-wise Actor-Critic-Scenery (RACS) algorithm introduces a dual-policy architecture:

(1) Primal policy: acts as the "rule-abiding executor". It maximizes task reward as far as possible while satisfying the safety constraints.

(2) Dual policy: acts as the "fearless explorer". Its objective is the opposite of the primal policy's: it strategically maximizes constraint violation, actively probing the danger boundary that the primal policy dares not approach.

Through this mechanism of "pitting one hand against the other", RACS significantly raises the proportion of critical violation samples without increasing the total sampling cost, giving the system a clear and accurate picture of the "safety boundary".
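A toy example can make the effect of mixing the two policies concrete. The sketch below is not the RACS implementation: the 1-D environment, the unsafe threshold, both hand-written policies, and the episode split are assumptions chosen only for illustration (in RACS both policies are learned). It compares how many violation samples are collected under the same total budget when every episode uses a cautious policy versus when a few episodes are handed to an explorer.

```python
import random

UNSAFE_THRESHOLD = 1.0  # hypothetical: states with x > 1.0 violate the constraint


def primal_action(x: float) -> float:
    """Cautious stand-in for the primal policy: pull the state back toward 0."""
    return -0.5 * x


def dual_action(x: float) -> float:
    """Explorer stand-in for the dual policy: drift toward the unsafe region."""
    return 0.1


def collect(policy, episodes: int, steps: int, rng: random.Random) -> list:
    """Roll out `policy` and return every visited state."""
    states = []
    for _ in range(episodes):
        x = 0.0
        for _ in range(steps):
            x += policy(x) + rng.gauss(0.0, 0.05)  # small process noise
            states.append(x)
    return states


def count_violations(states) -> int:
    return sum(1 for x in states if x > UNSAFE_THRESHOLD)


def compare(total_episodes: int = 20, dual_episodes: int = 5,
            steps: int = 30, seed: int = 0):
    """Return (samples, violations) for primal-only and mixed collection."""
    rng = random.Random(seed)
    primal_only = collect(primal_action, total_episodes, steps, rng)
    mixed = (collect(primal_action, total_episodes - dual_episodes, steps, rng)
             + collect(dual_action, dual_episodes, steps, rng))
    return (len(primal_only), count_violations(primal_only),
            len(mixed), count_violations(mixed))


if __name__ == "__main__":
    n_primal, v_primal, n_mixed, v_mixed = compare()
    print(f"primal only: {v_primal} violations in {n_primal} samples")
    print(f"mixed:       {v_mixed} violations in {n_mixed} samples")
```

Both settings collect the same number of samples; only the mixed setting reaches states beyond the threshold, so only it yields data about where the boundary lies.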

To address the distributional shift caused by mixing data from the two policies, RACS applies importance sampling as a mathematical correction and constrains the KL divergence between the dual policy and the primal policy, which keeps training stable and convergent.
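The two ingredients named above can be sketched for the simplest possible case. The article does not give the exact form used in RACS, so the following assumes 1-D Gaussian policies described by a (mean, std) pair, and assumes that samples drawn by the dual policy are reweighted toward the primal policy. All function names and numbers are hypothetical.

```python
import math


def gaussian_pdf(a: float, mean: float, std: float) -> float:
    """Density of a 1-D Gaussian policy at action a."""
    z = (a - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weight(a: float, target, behavior) -> float:
    """Ratio pi_target(a) / pi_behavior(a); policies are (mean, std) pairs."""
    return gaussian_pdf(a, *target) / gaussian_pdf(a, *behavior)


def gaussian_kl(p, q) -> float:
    """KL(p || q) for 1-D Gaussians given as (mean, std) pairs."""
    (mean_p, std_p), (mean_q, std_q) = p, q
    return (math.log(std_q / std_p)
            + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * std_q ** 2)
            - 0.5)


def weighted_mean(values, weights) -> float:
    """Self-normalized importance-sampling estimate of a mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    primal = (0.0, 1.0)  # hypothetical primal policy at some state
    dual = (0.8, 1.0)    # hypothetical dual policy at the same state
    actions = [-0.5, 0.2, 1.1]  # actions assumed to be sampled by the dual policy
    weights = [importance_weight(a, primal, dual) for a in actions]
    print("importance weights:", [round(w, 3) for w in weights])
    print("KL(dual || primal):", round(gaussian_kl(dual, primal), 3))
```

Keeping the KL divergence small keeps the importance weights close to 1, which limits the variance of the reweighted estimates; this is one common reason to pair the two techniques.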

Experimental Results: New SOTA

The team validated RACS extensively on Safety-Gymnasium, a standard benchmark for safe reinforcement learning. The results show that RACS reaches state-of-the-art (SOTA) overall performance across 14 tasks:

(1) Markedly better safety: RACS achieves the lowest average constraint violation (cost), clearly outperforming existing Lagrange multiplier and trust-region methods. On tasks such as HalfCheetahVelocity and Walker2dVelocity in particular, it achieves strictly zero constraint violation.

(2) No loss of control performance: while ensuring safety, RACS still ranks first in average cumulative return, achieving both safety and performance. On several difficult tasks, including the high-dimensional HumanoidVelocity and the complex PointPush (navigation and obstacle avoidance while pushing a box), it outperforms the compared methods on both the safety metric and task performance.

To find the root cause of the improvement, the team measured how key metrics changed after the dual policy was added:

(1) Many more violation samples: on all 14 tasks, the dual policy collected a large number of high-value violation samples, and on most tasks the sample count rose by an order of magnitude.

(2) Much lower estimation error: the statistics show that the fitting error of the feasibility function shrinks significantly, and the frequency of "risk underestimation" (error below zero) drops sharply. The system therefore no longer misjudges dangerous states as safe, which improves the policy's safety at its root.
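The two error statistics mentioned above can be computed along the following lines. This is a minimal sketch under stated assumptions: the signed error is taken as the estimate minus the reference value, so that a negative error means risk was underestimated, and the sample numbers are made up for illustration. The article does not specify how the authors obtained the reference values.

```python
def signed_errors(estimates, targets):
    """Signed fitting error: estimated value minus reference value."""
    return [e - t for e, t in zip(estimates, targets)]


def underestimation_rate(estimates, targets) -> float:
    """Fraction of samples whose risk is underestimated (error below zero)."""
    errors = signed_errors(estimates, targets)
    return sum(1 for err in errors if err < 0) / len(errors)


def mean_abs_error(estimates, targets) -> float:
    """Average magnitude of the fitting error."""
    errors = signed_errors(estimates, targets)
    return sum(abs(err) for err in errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical feasibility estimates and reference values for three states.
    estimates = [0.1, 0.5, 0.9]
    targets = [0.2, 0.4, 1.0]
    print("underestimation rate:", underestimation_rate(estimates, targets))
    print("mean absolute error:", mean_abs_error(estimates, targets))
```

Tracking the underestimation rate separately from the overall error matters because the two directions are not equally costly: an overestimated risk makes the policy conservative, while an underestimated one lets it enter states that are actually unsafe.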

Summary and Outlook

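The study reveals the "safety paradox" in reinforcement learning from a theoretical standpoint and analyzes the causal link between the sparsity of violation samples and the estimation error of the feasibility function.

Through the "adversarial" exploration of its dual policy, the RACS algorithm breaks the safety paradox and demonstrates a simple principle: to be truly safe, a system must understand danger thoroughly.

The study provides a solid theoretical foundation and an effective solution for deploying reinforcement learning in high-risk settings such as autonomous driving and robotics.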

References:

https://openreview.net/forum?id=BHSSV1nHvU