现在，我们需要一个人工智能限制协议

2017-09-28

对大部分人而言，人工智能带来的威胁似乎只存在于科幻小说中，如《我，机器人》、《黑客帝国》及《终结者》等。

但人工智能的威胁是真实存在的。多年以来，顶尖计算机科学家们一直在发出警告，而最近世界上一些最聪明的人也开始大声疾呼。比尔·盖茨认为人工智能比核灾难更为危险，伊隆·马斯克说人工智能可能是人类面临的“生死攸关的最大威胁”，史蒂芬·霍金则称人工智能或许“标志着人类的终结”。

首先，我们应当给“人工智能”下个定义。从某种意义上说，人工智能早已存在。它存在于网络搜索引擎中，存在于电子游戏中，存在于垃圾邮件拦截器中，存在于苹果手机Siri中。

所有这些都是弱人工智能（artificial narrow intelligence，ANI）的例子。所谓弱人工智能，是只能完成几个特定任务的人工智能。设计精良的弱人工智能完成一些特殊任务的能力能够与人类比肩，甚至超过人类，但与人类不同的是，弱人工智能无法胜任其他任务。谷歌的阿尔法狗或许能在围棋上击败任何人类，但它能做的也仅仅是下围棋。这类人工智能对我们帮助颇多，而且似乎不会给人类带来生存威胁。

而在强人工智能（artificial general intelligence，AGI）层面，情况就变得危险了。一个强人工智能可以全方位与人类比肩。与弱人工智能不同，强人工智能可以被用于任何领域。虽然还没有任何人开发出强人工智能，但从理论上说，强人工智能能够在任何任务上等同人类，自然也可以不费吹灰之力完成执行复杂计算、数秒之内大量复制、瞬时传输等任务。

超人工智能（artificial super intelligence，ASI）则完全不同。它将在所有层面上击败人类，而它究竟比人类聪明多少，有可能超出我们的估算。

人类的最终发明

在发表于Wait, But Why网站的著名文章《人工智能革命：通往超级智能之路》里，蒂姆·厄本解释了为何人工智能认知能力的发展很可能令我们震惊。

人类倾向于认为，在最聪明的人和最愚笨的人之间存在巨大的智力差。援引牛津大学哲学家尼克·博斯特罗姆举出的例子，阿尔伯特·爱因斯坦远比村里的白痴聪明。但若将非人类的动物也包括进来，爱因斯坦与白痴的智力差就变得微乎其微了。一个人与一只黑猩猩的智力差，要比爱因斯坦与白痴的智力差大上无数倍。而黑猩猩与老鼠之间的智力差则更是相差千里。

这意味着，虽然让人工智能拥有与黑猩猩同样的智力水平需要花费数年甚至数十年，但一旦达到这个水平，人工智能成长到与人类比肩的强人工智能的速度就会快很多，这就是一些人所称的“智能爆炸”。

此外，我们还应当将递归自我改善因素考虑进来，这一概念在研究如何增进智能的人工智能研究者之间非常流行。一个能够进行递归自我改善的人工智能会找到令自身更加聪明的方法，而一旦做到这一步，它就能找到令自身变得愈发聪明的更多方法，从而提升自身智能。这样一个人工智能将独立令自身的认知能力呈指数级增长。

这样，一个智力水平接近人类的人工智能将加速进展，不仅不会停留在博斯特罗姆所谓的“人类城站”（Humanville Station），反而会呼啸而过。一个能够进行递归自我改善的人工智能如果在早上达到了村里白痴的智力水平，那么到了下午它就有可能达到爱因斯坦的智力水平。而到了晚上，它的智力水平或许已经远远超过了任何人类。为开发出强人工智能而欢呼的研究者们或许会发现，在他们还没喝完杯中的香槟时，强人工智能就已经演变成了超人工智能。

正如人类远比黑猩猩聪明一样，超人工智能可能远比人类聪明。我们甚至无法想象这样一个存在。我们认为，智商为70的人蠢笨，智商为130的人聪明，但我们无从想象一个智商高达10000的人是什么样，或者一个拥有如此高认知能力的存在到底能做些什么。对于我们来说，它的能力无法估算：很多我们认为根本不可能或异想天开的事情，对于它们来说简直如同儿戏。治愈疾病就如倒出药丸一样得心应手，星际旅行好比从一个房间步入另一个房间一样轻而易举，毁灭地球上的所有生物如同捻灭一支蜡烛一样信手拈来。

我们能够用来形容这样一种存在的词语就是上帝，但正如厄本不无担忧地指出的，我们应当追问的是：它会是一个仁慈的上帝么？

驯服上帝

一些计算机科学家似乎信心满满，他们认为人类能够令一个强人工智能或超人工智能变得“善良”，驯服人类创造出来的上帝只是一个编程问题。

编程设计一个强人工智能或超人工智能很可能是一个极具挑战的任务。谁知道如果不向它发出具体指令或赋予它价值观的话，它会有何行动，或者即便我们发出了具体指令或赋予了其价值观，它的行动依然难以预测。牛津大学人类未来研究所创始主任尼克·博斯特罗姆举了一个例子，一个人工智能被要求尽可能多地制造回形针，在某一时刻，它可能会决定为了最大化生产出回形针应当阻止人类对它进行重新编程或切断电源。而为了达成这一目标，它杀死了所有人类，这样它就可以继续不受影响地不停制造回形针。

我们应当注意到，在这一情境里，人工智能并非出于对人类的憎恶而对人类实施了灭绝行动（正如我们使用抗生素也并非由于憎恶细菌一样），而仅仅是因为人类对它来说无所谓。同样，当谷歌的Deep Mind人工智能程序随着自身越来越聪明而变得越来越具有进攻性时，它很可能在模拟游戏中使用激光攻击对手，这并不是因为它对对手抱有任何恶意，它这么做只是因为这一战略可以令其赢得比赛的几率最大化。

为了阻止发生类似事情，一些人建议给人工智能植入对人类有益的程序。然而，这种努力同样也可能导致意想不到的后果。

例如，一个被设计成“令人快乐”的人工智能或许会意识到，达成这一目标最有效的方法就是捕获人类，将电极植入人类大脑，刺激控制快感的中枢。

同样，一个被设计成遵守伊萨克·阿西莫夫机器人三定律的人工智能——

1）机器人不得伤害人类，或因不作为而令人类受到伤害。

2）机器人必须服从人类指令，除非该指令违背第一定律。

3）机器人必须保护自己，只要这样做不违背第一和第二定律。

或许决定，既然人类无时无刻不在伤害彼此，那么遵守这些定律的最佳方法就是将所有人类都温柔地监禁起来。

还有人建议，将一套预设的价值观上传至人工智能——比如实用主义或自由民主。但即便我们假设人们能够就选择哪套哲学观达成一致，就连向人类灌输价值观都实属不易，更何况人工智能。我们无法判断，一个超人工智能将如何解读人类的价值观或这套价值观的内在矛盾。

目前，我们还未能找到一个可靠的方法来确保超人工智能的目标或价值观符合人类自身的目标或价值观。一个大意的假设、疏忽或模糊指令都可能导致意料之外或计划之外的结果。

囚禁上帝

还有一些人建议，在强人工智能或超人工智能周围设立防护措施。他们提出了复杂程度不同的方法，从阻断人工智能连接网络，到限制人工智能接触外部世界，到将人工智能困在一系列同心虚拟世界中。但这些防护措施无一能够真正让人安心。

首先，正如路易斯维尔大学计算机工程和计算机科学副教授罗马·V·亚普尔斯基指出的那样，人类发明的每种安全措施最终都会被规避。

“签名可以被伪造，锁可以被撬开，最高安全级别的监狱也发生过越狱，拥有保镖的领导人被暗杀，银行金库被洗劫一空，法律被钻了空子……密码被破解，网络被渗透，电脑被黑客入侵，生物识别系统被骗，信用卡被克隆，加密货币被盗用 ……验证码被破解，密码协议被破坏，”他写道。“数千年的人类历史充斥着数百万试图以技术和逻辑增强安全的例证，然而这些无一不以失败告终。”

任何安全措施最终不是被人类黑客规避，就是被自然力量破坏（如在福岛核电站引发核泄漏的海啸）。当一个项目的风险可控时，一定程度的失败率是可以容忍的。但当一个小小失误就能让人工智能终结人类支配地位时，这是无法令人接受的。

那么就存在这样一种可能，即任何安全措施都有可能被人工智能规避。事实上，人类最优秀的计算机科学家能够发明出的任何安全措施，在一个超人工智能看来都极其可笑，因为超人工智能显然远比任何人类都要聪明。

想象一个被黑猩猩囚禁的人类。假设这是一群极其聪明的黑猩猩，它们利用最先端的猴子科技来囚禁人类——或许它们利用树枝建造出了一个简易牢笼。我们难道会怀疑人类不会最终以黑猩猩完全无法想象的方式逃脱么？他可以在笼子下方挖地洞，也可以利用手边东西打造工具，也可以拆除笼子栏杆把它当做武器，或者生火烧毁部分笼子。无论以哪种方式，他想方设法逃出牢笼都只是时间问题。

同样，一个超人工智能将远比人类更为聪明。亚普尔斯基在《防范奇异：人工智能囚禁问题》一文中指出，一个超人工智能可以轻轻松松地操控人类守卫放它逃生。它可以瞄准人类守卫的弱点，许诺赋予他力量或长生不老，或承诺治愈其身患绝症的心爱之人。

它也可能检测到系统漏洞，并利用该漏洞（人类黑客一直都在从事着类似活动）。或者佯装发生故障，然后当人类守卫放松警备准备检查故障时一举逃走。又或者它甚至可以用人类认为完全不可能的方式逃走。博斯特罗姆认为，当与外部世界完全隔绝时，人工智能或许能发明一种方式，通过在其电路系统中以特殊方式排列电子而产生无线电波。当然，这些都只是人类微不足道的大脑想出来的点子，比我们聪明数千倍的人工智能会想出更多方法。有效的安全措施取决于实力，而当我们面对智力远为高超，因而实力远为强大的对手时，这些措施无一生效。认为我们能够控制这样一种存在只是狂妄自大罢了。

伊隆·马斯克在TED大会一次演讲中将发展人工智能比作召唤恶魔。“你知道，在所有故事中都有这么一个人，他一手拿着五芒星，一手捧着圣水，信心满满地认为自己可以控制恶魔。但事实是他不能。”

我们如何囚禁上帝？对这个问题的简短回答是：“我们做不到。”

我们需要一个协议

开发出强人工智能和超人工智能或许离我们不远了，主流计算机科学家们预测为2040年。这或许听起来还很遥远，但我们需要从现在开始未雨绸缪。

“如果一个超级先进的外星文明向我们发送一条短信，'我们将在几十年之后到达，'我们会仅仅回答'好的，等你们到了之后打电话，我们会把灯留着'？ ”史蒂芬·霍金在与加州大学伯克利分校的斯图尔特·拉塞尔、麻省理工学院的马克思·泰格马克和弗兰克·维尔切克合作的文章中写道。“很可能不会。但这一情景或多或少都将在人工智能身上重现。”

任何一个大国若想在21世纪继续前进，都不能无视人工智能这样的技术。尤其是美国和中国，它们正在向公营和私营的人工智能研究领域投入大量资源，希望能够获取下一个重大突破。

但同时，人工智能给人类带来了真正生死攸关的威胁。人类面临的其他生存威胁，从全球变暖到大规模杀伤性武器，都已经存在着某种形式的协议来管控相关风险。现在是时候出台一个人工智能协议了。

各方合作建立一个国际框架至关重要，在人工智能研究仍处于初期的现在，在人工智能产业还未飞速发展之前，在我们习惯了人工智能带给我们的便利之前，在相关既得利益者和游说集团攫取巨大权力之前。应对全球变暖危机的困境展示出人类不思进取的倾向，哪怕是我们正面临着生死存亡的威胁。“人类很可能轻易就让自己沦落到依赖机器的境地，当这种依赖达到一定程度，人类除了接受机器作出的所有决定外，别无选择，”太阳微系统公司联合创始人比尔·乔伊在题为《未来为何不需要我们》的文章中写道。他警告，到那时“人们将无法关掉机器，因为我们是如此依赖它们，以至于关闭机器即意味着自杀”。

当我向顶尖计算机科学家提出出台限制人工智能相关协议的想法时，很多人都对此持怀疑态度，有些人甚至抛出了宿命论。

“一台'在所有方面都胜过人类'的机器将与世界GDP价值相当，大约为100万亿美元，”拉塞尔说。“阻止人们建造这样一台机器会很难。”

“人工智能不同于核武器，”伊利诺伊大学厄巴纳-香槟分校前计算机科学教授史蒂夫·奥莫亨德罗说。奥莫亨德罗现为Self-Aware System总裁，这是一家旨在推动安全使用人工智能的智库。“鉴于当下的技术水平，很难查证签约方在遵守（人工智能）协议上是否合规。”

然而，我们必须为此作出努力。人工智能的快速发展不仅给人类的繁荣带来了巨大潜力，同时也给人类的毁灭带来了巨大威胁。我们没有借口放弃遏制后者的努力。

至此，我们可以总结出几条结论：

1）一个超人工智能无法被驯服或囚禁。

2）一个能够进行递归自我改善的强人工智能将很快进化为超人工智能。

3）即便无法进行递归自我改善，一个强人工智能依然能够给人类带来生存威胁，因为强人工智能除了能够在执行任何任务上比肩人类，它还能够从事只有计算机才能胜任的工作。

如果我们要在一个限制人工智能的协议中划出一条分界线，那么这条分界线应当划在强人工智能级别：任何人都不应被允许开发在所有领域与人类比肩或胜过人类的人工智能，也不应被允许开发可独立进化出这种能力的人工智能。弱人工智能的相关研究——我们当下使用的人工智能的进阶版——可以不受阻碍地得到进一步发展。重要的区别在于领域的特殊性，一个弱人工智能无法解决超出其能力范围的问题，但是一个强人工智能可以被用于任何领域。“每个系统都有一个特殊的领域，除非进行大规模重新设计，否则它无法切换到其他领域，”亚普尔斯基解释说。“（IBM国际象棋人工智能）“深蓝”无法被用于分拣邮件，（参加“Jeopardy”综艺节目的IBM人工智能）“沃森”无法驾驶车辆。但一个强人工智能（就其定义来说）能够切换领域。”

而基于以上这些原则的协议将会是什么样子？

可能的条款

一个控制人工智能的国际框架应当包含一些与控制大规模杀伤性武器框架同样的元素：

1）承诺不开发相关技术，或支持任何人开发相关技术，或允许任何人开发相关技术。

2）在遵守协议条款的缔约国之间建立一个分享信息和技术的渠道。

3）一个监控该领域发展的国际组织。

4）一个核查违约者的检查机构。

5）将任何违反规则的人提交联合国安理会接受惩罚。

6）一个移除并处理违禁品的机制。

关于承诺和信息技术分享条款，其意义不言自明。一言以蔽之，签约国必须承诺，不仅要规避开发强人工智能研究，还应确保各自国内的私营部门不会参与相关开发。

显然，这将极其困难。强人工智能的研究成果很可能会带来丰厚的利润，尤其是公司会有极大的动力参与开发，哪怕这种开发是违法行为。

《我们的最终发明：人工智能与人类时代的终结》一书作者詹姆斯·巴拉特列举了很多受贪婪驱使、不负责任的公司行为。

“公司行事就仿佛是肆意妄为的精神病患者，”他对我说。“我脑海中浮现的是美国联合碳化物的博帕尔事件，福特公司的斑马车油箱爆炸事件，安然公司的加州大停电事件。脸书、谷歌、IBM或百度也并不比这些公司更正直。我不觉得他们会对创新实施管理。”

各国必须承诺对国内的人工智能研究进行严密监控，同时应对任何可能导致开发出强人工智能的研究实施惩罚，这些惩罚必须严厉到足以令任何潜在收益化为乌有。

而对人工智能发展实施监控，可以被控制在一个合理的范围内。

“虽然一些作者指出，开发强人工智能远比开发核武器更容易避人耳目，”机器智能研究所的卡伊·苏塔拉写道，但“先进的高科技研究的确倾向于需要大规模投资，而大规模投资很可能被哪怕不那么严密的监控捕捉到。”

“识别出那些在强人工智能研究上拥有长期严肃兴趣的能人并不困难，”博斯特罗姆在《超智能：路径、危险和战略》一书中写道。“这样的人才通常会留下足迹。他们可能已经发表了学术论文，或在网络论坛上发布过帖子，或在顶尖计算机科学系获得了学位。他们也可能与其他人工智能研究者保持联系，这就可以让我们通过绘制社交网络图找到他们。”

因此，致力于开发强人工智能的研究者们可以被监控。或许可以设立一个国际机构，推动人工智能的安全使用并进行检查，正如国际原子能机构对核原料的检查一样。

当然，协议的细节还需要专家来敲定。正如全球巨灾风险研究所副主管G.S. 威尔逊提议的那样，由一个专家组来决定在强人工智能或其他存在潜在风险的研究中什么是“合理的关切水平”。

这样一个协议必然会引发抑制创新的担忧。这些担忧合情合理。人工智能创新的确会极大地受制于这些措施，而这个领域的创新将会扩展知识，拯救生命，将我们的生活水平提高到一个前所未有的高度。然而，人类灭绝的切实威胁令我们放弃部分利益的决定更为明智。

缺陷

这样一个协议的缺陷也是显而易见的。

一些隐秘的强人工智能研究是否能躲避最严密的监控？是的，正如一个恐怖分子可以在当局完全不知情的情况下组装出一个核炸弹。但这并不意味着核武器控制条约毫无意义。

会不会有一些国家作弊？当然，而且一份条约只有在能确保被严格执行的情况下才有效。

强人工智能和弱人工智能间的微妙区别也是一个漏洞。一个弱人工智能只能完成一些任务（而“一些”究竟是多少？），弱人工智能在不进行大规模重新设计的情况下无法从事不同任务（而究竟什么算是“大规模”？）。

最重要的是，令各国签署这样一个协议也非常困难。但如果人工智能研究竞赛中的领跑者——美国、中国和日本——强力推动，其他国家将会效仿。

冷战期间，世界一直生活在人类灭亡的恐惧中。但正是这份恐惧促使大国设立了最大化降低大规模杀伤性武器危害的协议。尤其是美国和苏联，这两个大国选择终结各自的生物武器项目，因为与核原料和化学品不同，生物武器使用的病毒和细菌极其变幻莫测，它们很可能会演化成为更强大、更危险的毒株。这一点与人工智能相似。

眼下，世界正面临着人工智能带来的生死攸关的新威胁。没有什么框架协议能够完全消除它带来的威胁，但如果一个协议能够在最大程度上减少这种威胁，那么这就足以让我们为此付出努力。未来正在迫近，时不我待。

For most, the threat of artificial intelligence seems like science fiction, the stuff of movies like I, Robot, The Matrix, and The Terminator.

But the threat it poses is real. Prominent computer scientists have warned of it for years, and recently some of the smartest people on the planet have taken up the call. Bill Gates considers AI more dangerous than a nuclear catastrophe, Elon Musk said it was probably humanity's “biggest existential threat,” and Stephen Hawking said it could “spell the end of the human race.”

We should start by defining what's meant by the term “AI.” AI, in a sense, is already here. It's in online search engines, the computer opponents in video games, the spam filter in our emails, and the Siri assistant in our iPhones.

All of these are examples of artificial narrow intelligence (ANI) – AI that's only capable of a few specific tasks. Well-designed ANIs can match or surpass humans at particular tasks, but, unlike humans, they can't be applied to much else. Google's AlphaGo may be able to beat any human at Go, but that's all it can do. Such AIs are useful, and don't seem to pose an existential threat.

It's at the level of artificial general intelligence (AGI) that things get dangerous. An AGI would be as smart as a human across the board. Unlike an ANI, an AGI could be applied to anything. No one's been able to develop one yet, but in theory, an AGI would be able to match a human at any task, and, naturally, would also be able to do things like perform complicated calculations effortlessly, make countless copies of itself in seconds, and transmit itself across the world instantaneously.

An artificial superintelligence (ASI) would be something else entirely. It would be smarter than humans across the board, and the extent to which it's smarter may be beyond our reckoning.

Our final invention

In his great article “The AI Revolution: The Road to Superintelligence” in Wait But Why, Tim Urban explained why growth in AI cognitive power is likely to take us by surprise.

Humans tend to think that the difference in intelligence between the smartest human and the dumbest human is large, that is, to use Oxford philosopher Nick Bostrom's example, that someone like Albert Einstein is much smarter than the village idiot. On the grand scale of intelligence including non-human animals, however, this difference is minuscule. The difference between the intelligence of a human and that of a chimpanzee is many, many times larger than the difference between the intelligence of Einstein and that of the village idiot. The difference between the intelligence of a chimpanzee and that of a mouse is larger still.

This means that whilst it may take years or decades to get an AI to chimpanzee-level intelligence, for example, once that level is reached the transition to general human-level intelligence (AGI) will be much faster, resulting in what some have termed an “intelligence explosion.”

Furthermore, we should factor in recursive self-improvement, a popular idea amongst AI researchers for boosting intelligence. An AI capable of recursive self-improvement would be able to find ways to make itself smarter; once it's done that, it'll be able to find even more ways to make itself smarter still, thereby bootstrapping its own intelligence. Such an AI would independently and exponentially increase in cognitive power.
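To make the compounding concrete, here is a toy sketch, not a model of any real system. It assumes a made-up scalar "capability" score and hypothetical `step` and `gain` parameters, none of which come from the researchers cited here. It contrasts improvements of a fixed size added from outside with improvements whose size scales with the system's current capability, which is the assumption behind the word "exponentially."

```python
def external_improvement(start: float, step: float, rounds: int) -> list[float]:
    """Capability when a fixed-size improvement is added each round (linear growth)."""
    levels = [start]
    for _ in range(rounds):
        levels.append(levels[-1] + step)
    return levels


def self_improvement(start: float, gain: float, rounds: int) -> list[float]:
    """Capability when each improvement is proportional to current capability
    (compounding growth). `gain` is a hypothetical per-round rate."""
    levels = [start]
    for _ in range(rounds):
        levels.append(levels[-1] * (1.0 + gain))
    return levels


if __name__ == "__main__":
    # Illustrative numbers only; they are assumptions, not estimates.
    linear = external_improvement(start=1.0, step=0.5, rounds=20)
    compounding = self_improvement(start=1.0, gain=0.5, rounds=20)
    for r in (0, 5, 10, 20):
        print(f"round {r:2d}: external {linear[r]:6.1f}   self-improving {compounding[r]:10.1f}")
```

With these illustrative numbers the two curves are identical after the first round, but after 20 rounds the fixed-step curve has reached 11 while the compounding one is in the thousands. How fast any real system could improve itself, if at all, is exactly what is uncertain; the sketch only shows why the shape of the curve matters.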

An AI approaching general human-level intelligence, therefore, would pick up speed, and, far from stopping at Humanville Station, as Bostrom puts it, would whoosh past it. An AI capable of recursive self-improvement that had attained village idiot intelligence level in the morning might hit Einstein-level by the afternoon. By evening, it could have reached a level of intelligence far beyond any human. AI researchers, celebrating their success at creating an AGI, might find themselves faced with a superintelligence before they'd even finished the champagne.

A superintelligence could be smarter than humans in the same way that humans are smarter than chimpanzees. We wouldn't even be able to comprehend an entity like that. We think of an IQ of 70 as dumb and an IQ of 130 as smart, but we have no idea what an IQ of 10,000 would be like, or what a being with that cognitive capacity would be capable of. Its power, for us anyway, would be incalculable: many things we deem impossible or fantastical would be child's play for it. Curing all disease would be as easy for it as popping a pill, interstellar travel as easy as stepping from room to room, and extinguishing all life on earth as easy as snuffing out a candle.

The only term we have that comes close to describing something like that is God, and, as Urban ominously puts it, the question we should ask then is: Will it be a nice God?

Taming God

Some computer scientists seem confident that we can make an AGI or a superintelligence be “nice,” that taming the god we created is a matter of programming.

Programming an AI of human intelligence or above will likely be a daunting task. Who knows what it might do if it isn't given specific goals or values? And even if it is given them, its actions might still be unpredictable. Nick Bostrom, who is also the founding director of the Future of Humanity Institute at the University of Oxford, gives the example of an AI being tasked with the seemingly boring and innocuous goal of making as many paperclips as possible. At some point, it may decide that in order to maximize the number of paperclips it should prevent humans from reprogramming it or switching it off, upon which it kills all the humans so it can continue making endless amounts of paperclips unimpeded.

Note, of course, that in that scenario the AI wouldn't exterminate humans because of any malice it had towards them (no more than we hate bacteria when we take antibiotics), but because they don't matter to it. Likewise, when Google's DeepMind AI program grew increasingly aggressive as it got smarter, and was more likely to attack opponents with lasers in simulated games, it wasn't because of any malice towards those opponents; it was just because that strategy maximized its chances of winning.

In order to prevent something like that from happening, some have suggested programming AIs with goals specifically beneficial to humans. Such attempts, however, can also lead to unexpected results.

For example, an AI programmed to “make people happy” might realize that the most efficient way to do this is to capture humans, implant electrodes into their brains and stimulate their pleasure centers.

Likewise, an AI programmed with Isaac Asimov's Three Laws of Robotics—

1) A robot may not injure a human being or, through inaction, allow a human being to come to harm.

2) A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

3) A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.

--might decide that, since humans are constantly harming each other, the best way to obey these laws would be to gently imprison all of them.

Another suggestion is to upload a pre-existing set of values into an AI – utilitarianism, say, or liberal democracy. But even assuming people could agree on which philosophy to go with, it's hard enough to imbue humans with human values as it is. There's no telling how a superintelligence might interpret it, or the contradictions within it.

There's no reliable way to ensure a superintelligence's goals or values accord with our own. A single careless assumption or oversight or ambiguity could lead to results no one expected or intended.

Caging God

Others have suggested building safeguards around the AGI or superintelligence. They've mooted measures of varying degrees of complexity, from denying it access to the internet, to restricting its contact with the outside world, to trapping it in a series of concentric virtual worlds. None of these safeguards inspire confidence.

First, as Roman V. Yampolskiy, Associate Professor of Computer Engineering and Computer Science at the University of Louisville, noted, every security measure ever invented has eventually been circumvented.

“Signatures have been faked, locks have been picked, supermax prisons had escapes, guarded leaders have been assassinated, bank vaults have been cleaned out, laws have been bypassed…passwords have been brute-forced, networks have been penetrated, computers have been hacked, biometric systems have been spoofed, credit cards have been cloned, cryptocurrencies have been double spent…CAPTCHAs have been cracked, cryptographic protocols have been broken,” he wrote. “Millennia long history of humanity contains millions of examples of attempts to develop technological and logistical solutions to increase safety and security, yet not a single example exists which has not eventually failed.”

Any safeguards would eventually be circumvented, either by human hackers or by acts of nature (for example, the tsunami that caused the radiation leak at the Fukushima nuclear reactor). Whilst a certain failure rate may be acceptable in an enterprise where the stakes are lower, it's unacceptable where a single leak might be all the AI needs to end humanity's dominance.

Then, there's the likelihood that any safeguards would be circumvented by the AI itself. Indeed, any security measures our best computer scientists could devise would be laughable to a superintelligence, which by definition would be many times smarter than any human.

Imagine a human being held captive by chimpanzees. Suppose that these are unusually intelligent chimpanzees that use state-of-the-art monkey technology to keep the human prisoner – perhaps they manage to construct a rudimentary cage out of sticks. Is there any doubt that the human would eventually escape in ways the chimpanzees couldn't possibly think of? Perhaps he'd dig a hole under the cage, or fashion tools out of nearby objects to help him, or remove the bars of the cage and use them as weapons, or make a fire that burns down a portion of the cage. One way or another, it would only be a matter of time before he found a way free.

A superintelligence would be smarter than humans in a similar fashion. In his article “Leakproofing the Singularity: Artificial Intelligence Confinement Problem,” Yampolskiy suggested that a superintelligence could easily manipulate a human guard into letting it escape. It could target a guard's weaknesses, offering him power or immortality, or promising a cure for a loved one with a terminal disease.

It could also find a bug in the system and exploit it (something even human hackers do all the time). Or pretend to malfunction, and then escape when its jailors lower safeguards to investigate. Or it could escape in ways humans aren't even aware are possible. Insulated from the outside world, Bostrom suggested, it might find a way to generate radio waves by shuffling the electrons in its circuitry in particular patterns. Of course, these are just the methods our puny human brains can imagine – an entity thousands of times smarter would be able to come up with a lot more. Effective safeguards are built around power – they're not possible against a being that's smarter, and therefore more powerful, than us. Thinking we could contain something like that would be hubris.

At a talk at MIT, Elon Musk compared developing AI to summoning a demon. “You know all the stories where there's a guy with the pentagram and the holy water and he's like, yeah, he's sure he can control the demon? Doesn't work out.”

How do you cage a god? The short answer to that question is “You can't.”

The Need for a Treaty

The development of AGI and superintelligence may be approaching. The median realistic estimate among leading computer scientists for when it will happen is 2040. While this might seem far off, we need to start preparing for it now.

“If a superior alien civilization sent us a text message saying, 'We'll arrive in a few decades,' would we just reply, 'Ok, call us when you get here – we'll leave the lights on'?” asked Stephen Hawking in an article co-written with Stuart Russell of the University of California, Berkeley, and Max Tegmark and Frank Wilczek of MIT. “Probably not – but this is more or less what is happening with AI.”

AI is a technology no major power can afford to ignore if it wants to advance in the 21st century. The U.S. and China in particular are pouring vast resources into AI research in both the public and private sectors in hopes of achieving the next breakthrough.

At the same time, however, AI presents a real existential threat to humanity. All other existential threats, from global warming to weapons of mass destruction, have some sort of treaty in place to manage the associated risks. It's time we had one for AI too.

It's vital we work on establishing an international framework now, in what are relatively early days, before the AI industry develops too far, before we become too used to its benefits, before associated vested interests and lobby groups gain too much power. The difficulties in addressing the global warming crisis show humans' tendency toward inertia, even when faced with existential threat. “[T]he human race might easily permit itself to drift into a position of such dependence on the machines that it would have no practical choice but to accept all of the machines' decisions,” wrote Bill Joy, co-founder of Sun Microsystems, in his essay “Why the Future Doesn't Need Us.” At that point, he warned, “People won't be able to just turn the machines off, because they will be so dependent on them that turning them off would amount to suicide.”

When I put the idea of an AI limitation treaty to top computer scientists, many were skeptical, some even fatalistic.

“A machine that is 'smarter than humans across the board' would be worth something comparable to world GDP, approximately $100 trillion,” said Russell. “It's not going to be easy to stop people building that.”

“[U]nlike [with] nuclear weapons,” said Steve Omohundro, formerly professor of computer science at the University of Illinois at Urbana-Champaign, and now President of Self-Aware Systems, a think tank promoting the safe uses of AI, “it is not easy to verify compliance with any [AI] agreement given today's technologies.”

Yet an effort must be made. The growing field of AI offers vast potential, both for humanity's flourishing and for its extinction. We have no excuse for not trying to stave off the latter.

There seem to be a few conclusions that can be drawn:

1) A superintelligence cannot be tamed or caged.

2) An AGI capable of recursive self-improvement would soon become a superintelligence.

3) Even without recursive self-improvement, an AGI might pose an existential threat simply because in addition to being able to perform any task at a human level, it would also be able to do things only computers can do.

The line, if one is to be drawn in an AI limitation treaty, then, should be at the AGI level: no one should be allowed to develop an AI that's as smart as or smarter than a human across the board, nor one that could independently become so. Research into ANI – better versions of the AI we use today – can continue unimpeded. The important difference is domain specificity; an ANI cannot be used for problems beyond a narrow scope, whilst an AGI can be used for anything. “A system is domain specific if it cannot be switched to a different domain without significant redesigning effort,” explained Yampolskiy. “Deep Blue [IBM's chess AI] cannot be used to sort mail. Watson [IBM's Jeopardy! AI] cannot drive cars. An AGI (by definition) would be capable of switching domains.”

What might such a treaty based on these principles look like?

Possible Provisions

An international AI control framework could contain some of the same elements as control frameworks for weapons of mass destruction:

1) Commitments not to pursue that kind of technology, or to abet anyone in pursuing such technology, or to allow anyone to do so

2) An information and technology-sharing channel between signatories who abide by the provisions

3) An international organization to monitor developments

4) An inspections regime to catch cheaters

5) Recourse to the UN Security Council for punishment of anyone who breaches these rules

6) A mechanism to remove and dispose of any forbidden material

The commitments and information and technology sharing are self-explanatory enough. Suffice it to say that states would have to commit not just to eschewing research that may result in AGI themselves, but also to ensuring that private entities within their borders do the same.

This will obviously be difficult. The fruits of AGI research are likely to be lucrative, and corporations, in particular, have great incentives to pursue it, even illegally.

James Barrat, author of Our Final Invention: Artificial Intelligence and the End of the Human Era, points to many instances of irresponsible corporate behavior driven by greed.

“Corporations behave like psychopaths turned loose on society,” he told me. “I'm thinking of Union Carbide (Bhopal), Ford (the exploding Pinto), Enron (causing rolling blackouts in California). Facebook, Google, IBM, [and] Baidu are no more upright than these corporations. I don't expect them…to temper innovation with stewardship.”

States will have to commit to strict monitoring of AI research domestically, and to penalizing any research that could lead to AGI harshly enough to outweigh any potential benefits.
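What “harsh enough” means depends on how likely a violation is to be caught. The sketch below is a back-of-the-envelope illustration under loud assumptions: a risk-neutral actor, a single expected benefit figure, and a known detection probability. The function names and the numbers are hypothetical and are not drawn from any treaty or from the experts quoted here.

```python
def minimum_deterrent_penalty(expected_benefit: float, detection_probability: float) -> float:
    """Smallest penalty whose expected cost matches the expected benefit.

    Assumes a risk-neutral actor: deterrence requires
    detection_probability * penalty >= expected_benefit.
    """
    if not 0.0 < detection_probability <= 1.0:
        raise ValueError("detection_probability must be in (0, 1]")
    if expected_benefit < 0.0:
        raise ValueError("expected_benefit must be non-negative")
    return expected_benefit / detection_probability


def is_deterred(expected_benefit: float, detection_probability: float, penalty: float) -> bool:
    """True if the expected penalty at least offsets the expected benefit."""
    return detection_probability * penalty >= expected_benefit


if __name__ == "__main__":
    # Hypothetical figures in arbitrary units, for illustration only.
    benefit = 100.0
    for p in (1.0, 0.5, 0.25):
        print(f"detection probability {p:.2f}: penalty must be at least "
              f"{minimum_deterrent_penalty(benefit, p):.0f}")
```

Under these assumptions, the weaker the monitoring, the larger the penalty has to be: halving the chance of detection doubles the penalty needed to cancel out the same benefit. This is one reason the monitoring and penalty commitments have to be designed together.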

Monitoring AI developments can, to an extent, be done successfully.

“Although several authors make the point that AGI is much easier to develop unnoticed than something like nuclear weapons,” wrote Yampolskiy and Kaj Sotala of the Machine Intelligence Research Institute, “cutting-edge high-tech research does tend to require major investments which might plausibly be detected even by less elaborate surveillance efforts.”

“[I]t would not be too difficult to identify capable individuals with a serious long-standing interest in artificial general intelligence research,” wrote Bostrom in Superintelligence: Paths, Dangers, Strategies. “Such individuals usually leave visible trails. They may have published academic papers, posted on internet forums, or earned degrees from leading computer science departments. They may also have had communications with other AI researchers, allowing them to be identified by mapping the social graph.”

Thus, researchers working on projects that may result in an AGI can be monitored. Perhaps an international agency can be established to promote safe AI practices and to carry out inspections, similar to what the International Atomic Energy Agency does for nuclear material.

The specifics would of course have to be decided by experts. As G. S. Wilson, Deputy Director of the Global Catastrophic Risk Institute, proposed, a body of experts could determine what constitutes a “reasonable level of concern” involving AGI or other possibly dangerous research.

Such a treaty would of course raise concerns that it would stifle innovation. These concerns are justified. AI innovations would be significantly constrained by these measures, innovations that could improve knowledge, save lives, and raise our standard of living to an unprecedented degree. Yet the very real risk of human extinction makes it wiser to forfeit some of these benefits.

Shortcomings

The shortcomings of such a treaty are obvious.

Will some clandestine AGI-related research elude even the most vigilant watchdogs? Yes, in the same way that a terrorist somewhere could probably build a dirty nuclear bomb without the authorities' knowledge. But that doesn't mean nuclear control treaties aren't worthwhile.

Will some countries cheat? Certainly, and any treaty is only as good as its enforcement.

A loophole also lies in the thin distinction between AGI and ANI: an ANI can only perform a few tasks (how many are “a few”?), and an ANI cannot be reconfigured for different tasks without significant redesign (what counts as “significant”?).

Most of all, there's the difficulty of getting states to sign on to such a treaty. But if the leaders in the AI race – America, China, Japan – push for it, others will follow.

During the Cold War, the world lived under an existential threat for decades. That threat, however, prompted leading powers to create treaties to minimize the risks WMDs posed. Notably, the US and the Soviet Union chose to end their biological weapons programs, because, unlike nuclear material and chemicals, and like AI, viruses and bacteria are extremely unpredictable, capable of growing and evolving into stronger and more virulent strains.

The world now faces a new existential risk in the form of AI. No framework can remove that risk entirely, but if it can significantly reduce it then that's more than enough reason to forge one. The future is coming, and it waits for no one.

现在，我们需要一个人工智能限制协议

We Need an AI Limitation Treaty. Now.
