By Demis Hassabis
(Chinese translation by Zhuang Xiaoxu; reviewed by Yan Ran)
The game of Go originated in China more than 2,500 years ago. Confucius wrote about the game, and it is considered one of the four essential arts required of any true Chinese scholar. Played by more than 40 million people worldwide, the rules of the game are simple: Players take turns to place black or white stones on a board, trying to capture the opponent’s stones or surround empty space to make points of territory. The game is played primarily through intuition and feel, and because of its beauty, subtlety and intellectual depth it has captured the human imagination for centuries.
[2] But as simple as the rules are, Go is a game of profound complexity. There are 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 possible positions—that’s more than the number of atoms in the universe, and more than a googol times larger than chess.
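The scale comparison above can be checked with quick back-of-the-envelope arithmetic. The round figures below (roughly 10^170 legal Go positions, roughly 10^47 chess positions, roughly 10^80 atoms in the observable universe) are commonly cited estimates, not exact values:

```python
# Back-of-the-envelope check of the scale claims in the text.
# All round figures are commonly cited estimates, not exact values.

raw_go_configs = 3 ** 361       # each of 361 points: empty, black, or white
legal_go_positions = 10 ** 170  # commonly cited count of legal Go positions
chess_positions = 10 ** 47      # commonly cited estimate for chess
atoms_in_universe = 10 ** 80    # commonly cited estimate

# 3^361 is a crude upper bound on board configurations: 173 digits long.
print(len(str(raw_go_configs)))

# Go exceeds chess by a factor of 10^123, well beyond a googol (10^100).
print(legal_go_positions // chess_positions == 10 ** 123)

# And it dwarfs the atom count by a factor of 10^90.
print(legal_go_positions // atoms_in_universe == 10 ** 90)
```

Python's arbitrary-precision integers make this exact; no floating-point approximation is involved.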
[3] This complexity is what makes Go hard for computers to play, and therefore an irresistible challenge to artificial intelligence (AI) researchers, who use games as a testing ground to invent smart, flexible algorithms that can tackle problems, sometimes in ways similar to humans. The first game mastered by a computer was noughts and crosses (also known as tic-tac-toe) in 1952. Then fell checkers in 1994. In 1997 Deep Blue famously beat Garry Kasparov at chess. It’s not limited to board games either—IBM’s Watson bested two champions at Jeopardy in 2011, and in 2014 our own algorithms learned to play dozens of Atari games just from the raw pixel inputs. But to date, Go has thwarted AI researchers.
[4] Traditional AI methods—which construct a search tree over all possible positions—don’t have a chance in Go. So when we set out to crack Go, we took a different approach. We built a system, AlphaGo, that combines an advanced tree search with deep neural networks. These neural networks take a description of the Go board as an input and process it through 12 different network layers containing millions of neuron-like connections. One neural network, the “policy network,” selects the next move to play. The other neural network, the “value network,” predicts the winner of the game.
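As a minimal sketch of how the two networks divide the work during search, consider the toy one-ply lookahead below. The function bodies, the flattened board string, and the `select_move` heuristic are all hypothetical stand-ins invented for illustration; the real policy and value networks are 12-layer deep neural networks, and the real search is a far more elaborate tree search:

```python
import random

# Hypothetical stand-ins for AlphaGo's two networks; only the
# interfaces (board in, move priors / win probability out) mirror
# the description in the text.

def legal_moves(board):
    """Toy rule: any empty point of the flattened board string is legal."""
    return [i for i, point in enumerate(board) if point == "."]

def policy_network(board):
    """Select candidate moves: a prior probability per legal move.
    Uniform here; the real network learns sharp, informed priors."""
    moves = legal_moves(board)
    return {move: 1.0 / len(moves) for move in moves}

def value_network(board):
    """Predict the winner: probability that the side to move wins.
    Random here; the real network evaluates the position."""
    return random.random()

def select_move(board, stone="B"):
    """One-ply search combining both networks: weight each candidate's
    estimated win probability by its policy prior."""
    best_move, best_score = None, -1.0
    for move, prior in policy_network(board).items():
        child = board[:move] + stone + board[move + 1:]
        # In the child position the opponent is to move, so our win
        # probability is one minus the value network's estimate.
        score = prior * (1.0 - value_network(child))
        if score > best_score:
            best_move, best_score = move, score
    return best_move

print(select_move("." * 9))  # toy 3x3 board, flattened to a string
```

The design point the sketch preserves is the division of labour: the policy network prunes the search to promising candidates, while the value network replaces exhaustive lookahead with a learned evaluation of each resulting position.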
[5] We trained the neural networks on 30 million moves from games played by human experts, until it could predict the human move 57 percent of the time (the previous record before AlphaGo was 44 percent). But our goal is to beat the best human players, not just mimic them. To do this, AlphaGo learned to discover new strategies for itself, by playing thousands of games between its neural networks, and adjusting the connections using a trial-and-error process known as reinforcement learning. Of course, all of this requires a huge amount of computing power, so we made extensive use of Google Cloud Platform.
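The trial-and-error loop can be illustrated on a toy one-move game rather than Go. The REINFORCE-style policy-gradient update below is a standard reinforcement learning method; the game, its win probabilities, and the learning rate are all invented for this sketch, which is not AlphaGo's actual training pipeline:

```python
import math
import random

random.seed(0)

# Toy reinforcement learning: a one-move "game" with three possible
# moves. The policy plays, observes a win or loss, and nudges its
# weights toward moves that won. All numbers here are invented.

WIN_PROB = [0.2, 0.2, 0.9]    # hypothetical chance each move wins
weights = [0.0, 0.0, 0.0]     # policy parameters, one per move
LEARNING_RATE = 0.1

def softmax(ws):
    """Turn raw weights into a probability distribution over moves."""
    exps = [math.exp(w) for w in ws]
    total = sum(exps)
    return [e / total for e in exps]

for game in range(5000):
    probs = softmax(weights)
    move = random.choices(range(3), probs)[0]   # sample a move to try
    won = random.random() < WIN_PROB[move]
    reward = 1.0 if won else -1.0
    # REINFORCE-style update: raise the chosen move's weight after a
    # win and lower it after a loss, scaled by how unexpected the
    # choice was under the current policy.
    for m in range(3):
        grad = (1.0 if m == move else 0.0) - probs[m]
        weights[m] += LEARNING_RATE * reward * grad

best = max(range(3), key=lambda m: softmax(weights)[m])
print("learned best move:", best)
```

After a few thousand trial games the policy concentrates on the move with the highest win rate, with no human examples involved; this is the sense in which self-play lets a system discover strategies for itself.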
[6] After all that training it was time to put AlphaGo to the test. First, we held a tournament between AlphaGo and the other top programs at the forefront of computer Go. AlphaGo won all but one of its 500 games against these programs. So the next step was to invite the reigning three-time European Go champion Fan Hui—an elite professional player who has devoted his life to Go since the age of 12—to our London office for a challenge match. In a closed-doors match last October, AlphaGo won by 5 games to 0. It was the first time a computer program has ever beaten a professional Go player.
[7] We are thrilled to have mastered Go and thus achieved one of the grand challenges of AI. However, the most significant aspect of all this for us is that AlphaGo isn’t just an “expert” system built with hand-crafted rules; instead it uses general machine learning techniques to figure out for itself how to win at Go. While games are the perfect platform for developing and testing AI algorithms quickly and efficiently, ultimately we want to apply these techniques to important real-world problems. Because the methods we’ve used are general-purpose, our hope is that one day they could be extended to help us address some of society’s toughest and most pressing problems, from climate modelling to complex disease analysis. We’re excited to see what we can use this technology to tackle next! ■