CLC number: TP391   Document code: A
Article ID: 1001-3695(2023)08-023-2393-06
doi:10.19734/j.issn.1001-3695.2022.11.0783
Multi-turn dialogue response selection model based on deep multi-matching network
Liu Chao, Li Wan
(School of Computer Science & Engineering, Chongqing University of Technology, Chongqing 400054, China)
Abstract:Existing works have constructed a variety of retrieval models using neural networks with some success, but problems remain: the information fed into the model is insufficiently screened, which introduces noise, and the latent semantic information and temporal relationships of the known content are insufficiently mined. To overcome these problems, this paper proposed a multi-turn dialogue response selection model based on a deep multi-matching network (DMMN). The model took the context and the knowledge as queries to the candidate responses and, after encoding all three, introduced a pre-matching layer that used a one-way cross-attention mechanism to filter out knowledge-aware context and context-aware knowledge respectively, identifying the important information in both. After the candidate response had interacted with these two representations, a feature aggregation stage enhanced the matching features: an additional BiLSTM network captured the temporal information among the response-aware context utterances, while a gated attention mechanism mined the semantic information among the response-aware knowledge. Finally, the above representation features were fused. Performance evaluation on the original and revised Persona-Chat datasets shows that, compared with existing methods, the model further improves the recall rate and retrieves better responses.
Key words:multi-turn response selection; deep multi-matching network; semantic mining; gated attention mechanism
0 Introduction
建立一個(gè)智能的對(duì)話系統(tǒng)一直是人工智能中一個(gè)頗具難度的研究領(lǐng)域,人與機(jī)器之間能夠智能對(duì)話是人工智能的目標(biāo)之一?;跈z索的對(duì)話系統(tǒng)利用給定的用戶輸入信息,從回復(fù)候選集中選擇出最相關(guān)的回復(fù),其關(guān)鍵任務(wù)是衡量輸入信息和回復(fù)候選集的匹配度[1]?,F(xiàn)有大型對(duì)話平臺(tái)如百度的小度、微軟的小冰[2]和阿里巴巴的AliMe[3]等設(shè)備依舊偏向于采用基于檢索的對(duì)話模型,主要原因是它的回復(fù)是來(lái)自于人的真實(shí)對(duì)話,語(yǔ)句質(zhì)量高,語(yǔ)法錯(cuò)誤少,更為流暢。
開放域?qū)υ挼牡湫吞攸c(diǎn)是話題廣泛且多樣,對(duì)于用戶輸入信息可能存在多個(gè)恰當(dāng)回復(fù)。早期研究集中在將用戶輸入視為查詢的單輪對(duì)話,然而輸入語(yǔ)句本身攜帶的信息不足會(huì)極大程度限制機(jī)器理解用戶輸入語(yǔ)義的能力。與單輪檢索對(duì)話模型相比,多輪檢索對(duì)話系統(tǒng)需要整合當(dāng)前用戶輸入和對(duì)話語(yǔ)境信息。序列匹配網(wǎng)絡(luò)(sequential matching network,SMN) [4]將回復(fù)語(yǔ)句和上下文進(jìn)行匹配,研究表明引進(jìn)上下文的確會(huì)使多輪回復(fù)選擇模型的性能得到巨大提升。深度表達(dá)融合網(wǎng)絡(luò)(deep utterance aggregation,DUA)[5]用網(wǎng)格細(xì)化處理對(duì)話后用注意力機(jī)制挖掘關(guān)鍵信息并忽略冗余信息。深度注意匹配(deep attention matching,DAM)網(wǎng)絡(luò)[6]使用堆疊的自注意力獲取多粒度的語(yǔ)義表示,利用交叉注意力來(lái)依賴信息進(jìn)行匹配,分別對(duì)其進(jìn)行改善。交互匹配網(wǎng)絡(luò)(interactive matching network,IMN)[7]增強(qiáng)word-level和sentence-level的上下文—回復(fù)對(duì)表示,并將上下文和回復(fù)進(jìn)行雙向、全局交互,以得到匹配的特征向量。秦漢忠等人[8]提出的擴(kuò)展DAM模型(ex-DAM),對(duì)DAM模型進(jìn)行改進(jìn),引入多頭注意力機(jī)制使模型更適合處理有細(xì)微變化的數(shù)據(jù)。Whang等人[9]使用預(yù)訓(xùn)練語(yǔ)言模型將回復(fù)檢索任務(wù)定于為對(duì)話—回復(fù)二分類問(wèn)題,提出話語(yǔ)操縱策略來(lái)解決話語(yǔ)之間時(shí)間依賴被忽視的問(wèn)題。Liu等人[10]通過(guò)基于Transformer預(yù)訓(xùn)練模型的掩碼機(jī)制來(lái)解耦語(yǔ)境化的單詞表示,使得每個(gè)單詞分別只關(guān)注當(dāng)前話語(yǔ)、其他話語(yǔ)以及說(shuō)話角色。Zhang等人[11]利用監(jiān)督對(duì)比損失將對(duì)比學(xué)習(xí)應(yīng)用在回復(fù)選擇當(dāng)中,學(xué)習(xí)到的正、負(fù)例在嵌入空間中得到更遠(yuǎn)分離,從而提高匹配性能。
人們談話內(nèi)容通常圍繞話題背景知識(shí)進(jìn)行展開。例如,兩個(gè)人談?wù)撃骋槐緯鴷r(shí),他們大腦中已經(jīng)存在許多關(guān)于這本書籍的先驗(yàn)知識(shí)。缺乏先驗(yàn)知識(shí)會(huì)使對(duì)話系統(tǒng)與人的對(duì)話交互遭受語(yǔ)義和一致性等問(wèn)題困擾[12]。研究證明將外部知識(shí)作為對(duì)話基礎(chǔ),有利于生成更合理、信息性更豐富的回復(fù)[13~15]。因此,對(duì)話系統(tǒng)的研究重點(diǎn)逐漸轉(zhuǎn)向?qū)⑼獠恐R(shí)納入對(duì)話系統(tǒng)。Zhang等人[16]將人物角色描述(personal)作為個(gè)性化知識(shí)表征來(lái)增強(qiáng)上下文表示,提高模型回復(fù)選擇能力。Gu等人[17]采用IMN作為基礎(chǔ)架構(gòu),采取Zhang等人[16]人物角色融合方法,提出了改進(jìn)的個(gè)性化回復(fù)模型。Thulke等人[18]通過(guò)檢索無(wú)結(jié)構(gòu)文本知識(shí)來(lái)增強(qiáng)對(duì)話系統(tǒng)生成效果。Yang等人[19]提出一種新的圖結(jié)構(gòu)(ground graph)G2,對(duì)對(duì)話上下文和知識(shí)文檔的語(yǔ)義結(jié)構(gòu)進(jìn)行建模,以促進(jìn)任務(wù)的知識(shí)選擇和集成。Wu等人[20]提出一種知識(shí)源感知多頭解碼方法(knowledge source aware multi-head decoding)KSAM,更有效地將多源頭知識(shí)注入到對(duì)話生成中。Gu等人[21]設(shè)計(jì)了多種角色融合策略,深入探討了如何利用自身和同伴personal知識(shí)更好地進(jìn)行回復(fù)檢索。
現(xiàn)有的基于檢索的對(duì)話回復(fù)選擇工作已經(jīng)努力利用神經(jīng)網(wǎng)絡(luò)構(gòu)建了各種文本匹配模型,仍存在兩個(gè)突出問(wèn)題:a)將知識(shí)和上下文直接全用于匹配過(guò)程,這些信息并不全是有用的,不可避免地引入了噪聲信息,那些過(guò)度無(wú)用的信息會(huì)影響匹配過(guò)程;b)對(duì)已知內(nèi)容的潛在語(yǔ)義信息和時(shí)序關(guān)系采取的挖掘不夠充分。為了解決這些問(wèn)題,本文引入個(gè)性化角色知識(shí)(personal),提出了一個(gè)深度多匹配網(wǎng)絡(luò)(deep multi-matching network,DMMN)多輪對(duì)話回復(fù)選擇。該模型將知識(shí)和上下文的編碼信息相結(jié)合,使兩者軟對(duì)齊,篩選上下文與知識(shí)得到基于知識(shí)感知的上下文信息和基于上下文感知的知識(shí)信息。然后,將上述兩者與候選回復(fù)同時(shí)通過(guò)雙向交叉注意力機(jī)制進(jìn)行雙交互匹配,使用BiLSTM(bi-directional long short-term memory)[22,23]分別將匹配特征信息進(jìn)行整合之后,借助一個(gè)單獨(dú)的BiLSTM網(wǎng)絡(luò)和一個(gè)帶門控的注意力機(jī)制分別進(jìn)一步增強(qiáng)基于回復(fù)的上下文匹配信息和基于回復(fù)的知識(shí)匹配信息。最后將信息整合傳入最終預(yù)測(cè)層,進(jìn)行回復(fù)選擇。
The main contributions of this paper are: a) it proposes DMMN for multi-turn dialogue response selection and demonstrates its effectiveness on the original and revised Persona-Chat datasets; b) it proposes a pre-matching layer that uses cross-attention to filter the important information in the knowledge and the context and softly align the two, resolving the highly asymmetric information between knowledge and context so that dual interactive matching works better; c) during feature aggregation, it enhances the response-aware context matching information with a BiLSTM network, helping the model learn the semantic relations and temporal information in the sequence, and enhances the response-aware knowledge matching information with a gated attention mechanism (a sketch follows this list), better mining useful information and discarding redundancy.
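As an illustration of contribution c), below is a minimal TensorFlow sketch of a gated attention pooling layer. The layer name, the gate placement, and the pooling scheme are assumptions for exposition rather than the published DMMN equations.

```python
import tensorflow as tf

class GatedAttentionPooling(tf.keras.layers.Layer):
    """Attention pooling with a sigmoid gate: the gate decides how much of
    each knowledge sentence's matching feature to keep before pooling,
    suppressing redundant information (a sketch, not the paper's exact layer)."""
    def __init__(self, dim):
        super().__init__()
        self.attn = tf.keras.layers.Dense(1)                       # scalar score
        self.gate = tf.keras.layers.Dense(dim, activation="sigmoid")

    def call(self, features):                                      # [batch, n, dim]
        gated = features * self.gate(features)                     # element-wise gate
        weights = tf.nn.softmax(self.attn(gated), axis=1)          # [batch, n, 1]
        return tf.reduce_sum(weights * gated, axis=1)              # [batch, dim]
```

The design intuition is that the gate performs fine-grained (per-dimension) filtering while the attention weights perform coarse-grained (per-sentence) weighting, so useful knowledge is emphasized twice and noise is attenuated twice.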
1 任務(wù)定義
2 DMMN模型
2.1 模型概述
2.2 表示層
2.3 編碼層
2.4 預(yù)匹配層
2.5 匹配層
2.6 聚合層
2.7 預(yù)測(cè)層
2.8 損失函數(shù)優(yōu)化策略
3 實(shí)驗(yàn)
本章具體介紹了驗(yàn)證模型效果的實(shí)驗(yàn),包括使用的數(shù)據(jù)集、實(shí)驗(yàn)數(shù)值設(shè)置、使用的評(píng)價(jià)指標(biāo)、對(duì)比模型、消融實(shí)驗(yàn)以及實(shí)驗(yàn)結(jié)果分析。
3.1 數(shù)據(jù)集
The proposed model is compared with the baselines mainly on the original and revised Persona-Chat datasets. The data contains 162,064 dialogue utterances from humans, with at most 15 words per sentence in a single utterance. The dataset's authors randomly paired two people; each knew only their own persona and not the partner's, conversed naturally according to the assigned persona, and got to know the other during the conversation. The original Persona-Chat dataset contains 8,939 complete dialogues for training, 1,000 for validation, and 968 for testing. As shown in Table 1 (bold text marks the persona entries used), it was found that people unconsciously repeat words from their persona descriptions during conversation; the dataset's authors rewrote this information to create the revised Persona-Chat dataset. In the original Persona-Chat dataset, each persona profile contains 4.49 sentences on average. The revised Persona-Chat has the same number of sentences as the original; profile sentences average 7.33 words in the original data and 7.32 words in the revised data.
Experiments are conducted on both the original and the revised Persona-Chat.
3.2 基線模型
以下模型將作為基線模型,在Persona-Chat數(shù)據(jù)集上與本文模型進(jìn)行比較。
a)Profile memory [16]. Uses the context as the query to attend over the profile sentences, and measures the similarity between the fused query and the response with cosine similarity.
b)KV profile memory [16]. Improves profile memory by extending it into a multi-hop model: the first hop uses the profile to obtain a fused query, and the second hop uses the dialogue history as keys to help predict the current response.
c)Transformer [27]. A variant of the Transformer that encodes the context and the response candidates, showing good performance on the Persona-Chat dataset.
d)DGMN [1]. Encodes the dialogue history and the knowledge with self-attention, and lets each interact with the candidate response through hierarchical attention.
e)DIM [17]. Uses cross-attention to match the context and the document bidirectionally and interactively with the response candidates. Among all the baselines above, this model performs best.
f)BERT-based. Gu et al. [21] exploited personas in depth for response selection and designed multiple persona fusion strategies, applied to the hierarchical recurrent encoder (HRE) [28], the interactive matching network (IMN) [7], and BERT [29]. Among the three, the BERT-based variants performed best, so this paper adopts their three BERT configurations as baselines: BERT-NA (non-aware), BERT-CA (context-aware), and BERT-RA (response-aware).
3.3 實(shí)驗(yàn)設(shè)置
實(shí)驗(yàn)在TensorFlow[30]框架上實(shí)現(xiàn),使用顯卡為Tesla T4的GPU機(jī)器進(jìn)行加速。本文模型參數(shù)配置參照DIM模型的參數(shù)部署。在表示層中,使用GloVe、word2vec和字符嵌入聯(lián)合生成表示。GloVe嵌入設(shè)置為300維,特定訓(xùn)練集的word2vec設(shè)置為100維,窗口為{3,4,5}大小的字符級(jí)的嵌入設(shè)置為150維。編碼層和聚合層都使用了BiLSTM網(wǎng)絡(luò)分別進(jìn)行編碼和聚合,其中批次維度為batch_size,在訓(xùn)練中將batch_size參數(shù)設(shè)置為16,此數(shù)值根據(jù)顯卡算力可以酌情增添,所有BiLSTM網(wǎng)絡(luò)的隱藏狀態(tài)設(shè)置為200維度,dropout_keep_prob設(shè)置為0.8。多層感知分類器(MLP)的隱藏層中隱藏單元設(shè)置為256。實(shí)驗(yàn)中,將輪次num_epochs設(shè)置為10,每100輪評(píng)估一次。
在數(shù)據(jù)集的相關(guān)數(shù)值設(shè)置中,每個(gè)上下文中最大話語(yǔ)數(shù)設(shè)置為15,最大話語(yǔ)長(zhǎng)度設(shè)置為20;每個(gè)回復(fù)候選最大話語(yǔ)數(shù)值設(shè)置為20,最大的回復(fù)候選長(zhǎng)度設(shè)置為20;每個(gè)知識(shí)最大話語(yǔ)數(shù)為5,最大的知識(shí)句子長(zhǎng)度為15。如果在話語(yǔ)中少于數(shù)值,將用零填充。
3.4 對(duì)比實(shí)驗(yàn)結(jié)果與分析
3.5 消融實(shí)驗(yàn)
綜合表3、4消融實(shí)驗(yàn)的結(jié)果可以明確觀察到,在原始的和修改后的Persona-Chat數(shù)據(jù)集上,添加預(yù)匹配層的模型指標(biāo)分別最高提高了0.7%和1.2%;聚合層采用帶門控的注意力機(jī)制模塊的模型指標(biāo)提高了1.7%和0.2%。預(yù)匹配后效果優(yōu)于未添加的模型,帶門控的注意機(jī)制對(duì)知識(shí)的聚合起到了重要作用,增加此模塊在兩種數(shù)據(jù)集上結(jié)果上升皆明顯。任何一個(gè)新增的模塊都會(huì)導(dǎo)致性能的提高,完整模型綜合了兩者,分別最高提高了2.3%、最少提高了0.6%和最高提高了1.3%、最少提高了0.4%,這證明了本文模型每個(gè)組件的有效性和必要性。
3.6 Case study
為了進(jìn)一步理解DMMN模型預(yù)匹配的作用性,本文隨機(jī)一段Persona-Chat數(shù)據(jù)集中的樣例,計(jì)算上下文語(yǔ)句的相關(guān)性得分,并將其可視化。表5顯示了每個(gè)話語(yǔ)與正確回復(fù)之間的相似性得分,表6為表5中對(duì)話語(yǔ)境文本對(duì)應(yīng)的知識(shí)集與正確回復(fù)之間的相似性得分??梢钥闯?,不同語(yǔ)句對(duì)回復(fù)選擇作用不同,預(yù)先對(duì)其進(jìn)行匹配選擇是必要的。
圖5為上下文與知識(shí)集相關(guān)性得分的可視化,圖6為知識(shí)與選擇出的回復(fù)相似性得分可視化??梢钥闯?,U1和U5分別用K1和K4獲得了較大的注意權(quán)重。同時(shí)一些不相關(guān)的項(xiàng),如相對(duì)于U1和U3的K4相關(guān)性得分較小,可以通過(guò)適當(dāng)閾值進(jìn)行過(guò)濾。這些實(shí)驗(yàn)結(jié)果證明了預(yù)匹配的有效性。
4 結(jié)束語(yǔ)
This paper proposed a knowledge-fused deep multi-matching network for multi-turn dialogue response selection. Compared with DIM, it adds a pre-matching layer that aligns the knowledge and the context and screens the data before everything enters the matching layer for interactive computation, and in the aggregation layer it separately enhances the response-aware context information and the response-aware knowledge information for deeper information mining. Experimental results show that these modifications and additions improve the accuracy of response selection.
希望未來(lái)可以在預(yù)匹配層進(jìn)一步細(xì)化篩選的數(shù)據(jù),提高挖掘?qū)υ挌v史中全局、局部信息的能力,嘗試更好地將數(shù)據(jù)信息互相融合。
參考文獻(xiàn):
[1]Zhao Xueliang, Tao Chongyang, Wu Wei, et al. A document-grounded matching network for response selection in retrieval-based chatbots[C]//Proc of the 28th International Joint Conference on Artificial Intelligence. 2019: 5443-5449.
[2]Shum H Y, He Xiaodong, Li Di. From Eliza to XiaoIce: challenges and opportunities with social chatbots[J]. Frontiers of Information Technology & Electronic Engineering, 2018,19(1): 10-26.
[3]Li Fenglin, Qiu Minghui, Chen Haiqing, et al. AliMe assist: an intelligent assistant for creating an innovative e-commerce experience[C]//Proc of ACM on Conference on Information and Knowledge Management. New York: ACM Press, 2017: 2495-2498.
[4]Wu Yu, Wu Wei, Xing Chen, et al. Sequential matching network: a new architecture for multi-turn response selection in retrieval-based chatbots[C]//Proc of the 55th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2017: 496-505.
[5]Zhang Zhuosheng, Li Jiangtong, Zhu Pengfei, et al. Modeling multi-turn conversation with deep utterance aggregation[C]//Proc of the 27th International Conference on Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2018: 3740-3752.
[6]Zhou Xiangyang, Li Lu, Dong Daxiang, et al. Multi-turn response selection for chatbots with deep attention matching network[C]//Proc of the 56th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2018: 1118-1127.
[7]Gu Jiachen, Ling Zhenhua, Liu Quan. Interactive matching network for multi-turn response selection in retrieval-based chatbots[C]//Proc of the 28th ACM International Conference on Information and Knowledge Management. New York: ACM Press, 2019: 2321-2324.
[8]Qin Hanzhong, Yu Chongchong, Jiang Weijie, et al. Improved DAM model based on multi-headed attention and Bi-LSTM for Chinese question and answer matching[J]. Journal of Chinese Information Processing, 2021,35(11): 118-126. (in Chinese)
[9]Whang T, Lee D, Oh D, et al. Do response selection models really know what’s next? Utterance manipulation strategies for multi-turn response selection[C]//Proc of AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2021: 14041-14049.
[10]Liu Longxiang, Zhang Zhuosheng, Zhao Hai, et al. Filling the gap of utterance-aware and speaker-aware representation for multi-turn dialogue[C]//Proc of AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2021: 13406-13414.
[11]Zhang Wentao, Xu Shuang, Huang Haoran. Two-level supervised contrastive learning for response selection in multi-turn dialogue[EB/OL]. (2022-03-01). https://arxiv.org/abs/2203.00793.
[12]Hua Kai, Feng Zhiyuan, Tao Chongyang, et al. Learning to detect relevant contexts and knowledge for response selection in retrieval-based dialogue systems[C]//Proc of the 29th ACM International Conference on Information & Knowledge Management. New York: ACM Press, 2020: 525-534.
[13]Majumder B P, Jhamtani H, Berg-Kirkpatrick T, et al. Like hiking? You probably enjoy nature: persona-grounded dialog with commonsense expansions[C]//Proc of Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2020: 9194-9206.
[14]Xu Lin, Zhou Qixian, Fu Jinlan, et al. CorefDiffs: co-referential and differential knowledge flow in document grounded conversations[C]//Proc of the 29th International Conference on Computational Linguistics. [S.l.]: International Committee on Computational Linguistics, 2022: 471-484.
[15]Oh M S, Kim M S. Persona-knowledge dialogue multi-context retrieval and enhanced decoding methods[EB/OL]. (2022-07-28). https://arxiv.org/abs/2207.13919.
[16]Zhang Saizheng, Dinan E, Urbanek J, et al. Personalizing dialogue agents: I have a dog, do you have pets too?[C]//Proc of the 56th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2018: 2204-2213.
[17]Gu Jiachen, Ling Zhenhua, Zhu Xiaodan, et al. Dually interactive matching network for personalized response selection in retrieval-based chatbots[C]//Proc of Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2019: 1845-1854.
[18]Thulke D, Daheim N, Dugast C, et al. Efficient retrieval augmented generation from unstructured knowledge for task-oriented dialog[EB/OL]. (2021-02-09). https://arxiv.org/abs/2102.04643.
[19]Yang Yizhe, Gao Yang, Li Jiawei, et al. G2: enhance knowledge grounded dialogue via ground graph[EB/OL]. (2022-04-27). https://arxiv.org/abs/2204.12681.
[20]Wu Sixing, Li Ying, Zhang Dawei, et al. KSAM: infusing multi-source knowledge into dialogue generation via knowledge source aware multi-head decoding[C]//Proc of Findings of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2022: 353-363.
[21]Gu Jiachen, Liu Hui, Ling Zhenhua, et al. Partner matters! An empirical study on fusing personas for personalized response selection in retrieval-based chatbots[C]//Proc of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021: 565-574.
[22]Graves A, Mohamed A R, Hinton G. Speech recognition with deep recurrent neural networks[C]//Proc of IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway, NJ: IEEE Press, 2013: 6645-6649.
[23]Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997,9(8): 1735-1780.
[24]Pennington J, Socher R, Manning C. GloVe: global vectors for word representation[C]//Proc of Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2014: 1532-1543.
[25]Mikolov T, Sutskever I, Chen Kai, et al. Distributed representations of words and phrases and their compositionality[C]//Proc of the 26th International Conference on Neural Information Processing Systems. 2013: 3111-3119.
[26]Hao Yanchao, Zhang Yuanzhe, Liu Kang, et al. An end-to-end model for question answering over knowledge-base with cross-attention combining global knowledge[C]//Proc of the 55th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2017: 221-231.
[27]Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proc of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 5998-6008.
[28]Serban I V, Sordoni A, Bengio Y, et al. Building end-to-end dialogue systems using generative hierarchical neural network models[C]//Proc of the 30th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2016: 3776-3783.
[29]Devlin J, Chang Mingwei, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proc of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019: 4171-4186.
[30]Abadi M, Barham P, Chen Jianmin, et al. TensorFlow: a system for large-scale machine learning[C]//Proc of the 12th USENIX Conference on Operating Systems Design and Implementation. [S.l.]: USENIX Association, 2016: 265-283.