Alibaba Amap Releases SpatialGenEval, Revealing Which Text-to-Image Models Are the True Masters
2026-03-01 06:19:02

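Although current text-to-image (T2I) models excel at generating high-fidelity images, they often fall short on the complex spatial-intelligence tasks found in real-world scenes, such as spatial perception, spatial logical reasoning, and multi-object spatial interaction. Existing evaluation benchmarks rely mainly on short or information-sparse prompts, which cannot cover complex spatial logic. As a result, the models' deficiencies along these key dimensions of spatial intelligence are seriously underestimated.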

A new paper from Alibaba Amap accepted at ICLR 2026, "Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models", proposes SpatialGenEval, a systematic benchmark for the spatial intelligence of text-to-image generation. It combines long, information-dense T2I prompts with 10 spatial-intelligence capability dimensions built around spatial perception, spatial reasoning, and spatial interaction, in order to probe the limits of what T2I models can do spatially.

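SpatialGenEval divides spatial intelligence in image generation into 4 major dimensions and 10 sub-dimensions, covering 25 real-world application scenes. Evaluation results on 23 SOTA models show that the spatial intelligence of current models still needs substantial improvement.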

ÂÛÎÄÎÊÌ⣺Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image ModelsÂÛÎÄÁ´½Ó£ºhttps://arxiv.org/abs/2601.20354ÂÛÎÄ´úÂ룺https://github.com/AMAP-ML/SpatialGenEval

Core challenge: "superficial" spatial cognition and missing logic in current T2I models

Existing T2I models handle the semantic alignment of "what to generate" (What) well. When it comes to "where things are in space" (Where), "how things are arranged in space" (How), and the "logic of spatial interaction" in the physical world (Why), they face challenges across the board, from basic perception to higher-order reasoning, including:

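1. "Attribute drift" and omissions in spatial foundation: models can draw objects, but under information-dense prompts they often omit objects or bind attributes to the wrong object, and cannot maintain the basic alignment in which every element plays its assigned role.

2. "Geometric bias" in spatial perception: when handling precise object positions, orientations, and specific arrangement structures, models tend to generate a "default pose" (such as a frontal view) and struggle to go beyond the 2D canvas to place objects precisely in space.

3. "Logical blind spots" in spatial reasoning: this is the biggest weakness of current models. On relative numerical comparison (for example, "the red chair is twice as large as the blue chair"), 3D occlusion relations, and physical proximity, model scores approach random guessing, which indicates a lack of awareness of the layering and depth of the real physical world.

4. "Dynamic distortion" in spatial interaction: models struggle to capture dynamic moments between objects (such as a soccer ball in mid-bounce) or physical causal logic (such as shattering caused by an impact), and cannot turn implicit physical dynamics into logically self-consistent images.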

Top: error examples from current generative models in perception, reasoning, and interaction. Bottom: current evaluation benchmarks suffer from sparse information and coarse-grained yes-or-no evaluation.

SpatialGenEval: a "full-body scan" of spatial intelligence across spatial foundation, perception, reasoning, and interaction

To define and evaluate the "spatial intelligence" of T2I models systematically, the research team built a hierarchical framework that breaks spatial intelligence down into 4 major domains and 10 key sub-dimensions:

1. Spatial foundation (S1/S2): multi-object category (S1) and multi-object attribute binding (S2).

2. Spatial perception (S3/S4/S5): spatial position (S3), spatial orientation (S4), and spatial layout (S5).

3. Spatial reasoning (S6/S7/S8): spatial comparison of size, length, height, and so on (S6), spatial proximity (S7), and spatial occlusion (S8).

4. Spatial interaction (S9/S10): spatial motion interaction (S9) and spatial causal interaction (S10).

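The benchmark covers 25 real-world scenes, including nature, indoor, outdoor, human activity, and art and design, for which the team carefully constructed 1,230 long, information-dense prompts. Each prompt integrates all 10 spatial sub-domains listed above, from basic attributes and layout to higher-order occlusion and causal reasoning, together with corresponding questions covering every dimension. Notably, each prompt is about 60 words long, which keeps it usable with models that rely on a CLIP encoder (77-token limit) while remaining highly information-dense.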
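The taxonomy and prompt count above can be written down as a small data structure. The following is a minimal sketch, not the benchmark's actual data format: the dictionary layout and the helper names `sub_dimensions` and `total_questions` are illustrative assumptions, and the question count assumes exactly one question per sub-dimension per prompt.

```python
# Taxonomy as described in the article; the dict layout is an illustrative assumption.
SPATIAL_TAXONOMY = {
    "Spatial Foundation": {
        "S1": "multi-object category",
        "S2": "multi-object attribute binding",
    },
    "Spatial Perception": {
        "S3": "spatial position",
        "S4": "spatial orientation",
        "S5": "spatial layout",
    },
    "Spatial Reasoning": {
        "S6": "spatial comparison (size / length / height)",
        "S7": "spatial proximity",
        "S8": "spatial occlusion",
    },
    "Spatial Interaction": {
        "S9": "spatial motion interaction",
        "S10": "spatial causal interaction",
    },
}

NUM_PROMPTS = 1230  # prompt count quoted in the article


def sub_dimensions(taxonomy):
    """Flatten the taxonomy into an ordered list of sub-dimension IDs."""
    return [sid for domain in taxonomy.values() for sid in domain]


def total_questions(num_prompts, taxonomy):
    """Assumes one question per sub-dimension for every prompt."""
    return num_prompts * len(sub_dimensions(taxonomy))


print(sub_dimensions(SPATIAL_TAXONOMY))
print(total_questions(NUM_PROMPTS, SPATIAL_TAXONOMY))
```

Under that one-question-per-sub-dimension assumption, the 1,230 prompts yield 12,300 questions in total.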

The SpatialGenEval evaluation data construction pipeline

Example prompts and questions for all 10 spatial dimensions of SpatialGenEval

Key findings: spatial reasoning remains the main bottleneck

The research team ran a detailed evaluation of 23 leading open-source and closed-source T2I models, which revealed the following state of the field:

¿Õ¼äÍÆÀíÊǽ¹µã±¡Èõ»·½Ú£ºÔÚÉæ¼°½ÏÁ¿ºÍÕÚµ²µÄ¿Õ¼äÍÆÀí×ÓʹÃüÖУ¬£¬£¬£¬£¬£¬£¬´ó¶¼Ä£×ӵĵ÷ֽöÔÚ 30% ×óÓÒ£¬£¬£¬£¬£¬£¬£¬¿¿½üËæ»úÍÆ²âˮƽ£¨20%£©£¬£¬£¬£¬£¬£¬£¬ÕâÅú×¢ÏÖÔÚµÄÄ£×ÓÆÕ±éȱ·¦¶Ô 3D ³¡¾°½á¹¹ºÍÂß¼­¹ØÏµµÄÃ÷È·¡£¡£¡£¡£¡£¿£¿£¿£¿£¿£¿ªÔ´Ä£×ÓÕý¿ìËÙ×·¸Ï£ºÆÀ²âÏÔʾ£¬£¬£¬£¬£¬£¬£¬×îÇ¿µÄ¿ªÔ´Ä£×Ó Qwen-Image (60.6%) ÌåÏÖÒÑÓë¶¥¼¶±ÕÔ´Ä£×Ó Seed Dream 4.0 (62.7%) »ù±¾³Öƽ£¬£¬£¬£¬£¬£¬£¬µ«¾ù½öµÖ´ï¼°¸ñÏßˮƽ£¬£¬£¬£¬£¬£¬£¬¿Õ¼äÖÇÄÜÈÔÓÐÖØ´óÌáÉý¿Õ¼ä¡£¡£¡£¡£¡£Ç¿Ê¢µÄÎı¾±àÂëÆ÷ÖÁ¹ØÖ÷Òª£ºÊ¹ÓøßÐÔÄÜ LLM£¨Èç T5 »ò´óÐÍÓïÑÔÄ£×Ó£©×÷ΪÎı¾±àÂëÆ÷µÄÄ£×Ó£¨Èç FLUX.1£©£¬£¬£¬£¬£¬£¬£¬ÔÚÆÊÎöÖØ´ó¿Õ¼äÖ¸ÁîʱÏÔÖøÓÅÓÚ½öÒÀÀµ CLIP µÄÄ£×Ó¡£¡£¡£¡£¡£

Automated evaluation results based on Qwen2.5-VL-72B-Instruct

Left: distribution of error types across all evaluated models. Right: distribution of error types for the top-performing models.

A data-centric paradigm: an effective path to improving models' spatial intelligence

Beyond evaluation, the study also proposes an improvement scheme based on already-generated images. The team uses a multimodal large language model (MLLM) to rewrite prompts so that text and image stay consistent, and builds the SpatialT2I dataset of 15,400 image-text pairs. Supervised fine-tuning on this data for the three mainstream model families (diffusion-based, AR-based, and unified models) yields significant gains on the spatial evaluation metrics, and the generated images are more realistic in physical logic and spatial structure.

Comparison of generation results after fine-tuning

Summary and outlook

SpatialGenEval establishes a new evaluation path for T2I models as they move from "aesthetic generation" toward "logical perception". Only when models truly understand "everything in its place" can generative AI unlock real productivity in fields with strict spatial requirements, such as robot assistance, interior design, and autonomous-driving simulation.

About the author team

The Machine Learning R&D department at Alibaba Amap supports the company's key businesses, including ad creative generation, product understanding, and intelligent content creation and distribution for local-services scenarios, as well as AI for travel scenarios. The department's research areas are broad and include, but are not limited to, the following directions: (1) multimodal large models; (2) image generation / editing and enhancement; (3) video generation / understanding; (4) agents; (5) spatio-temporal data mining; (6) intelligent recommendation; (7) high-performance inference. The team has a strong technical culture, ample room for growth, and access to abundant R&D resources and large volumes of business application data, and several of its papers have been included in Paper Digest's lists of most influential papers.