Lightweight object detection algorithm based on improved YOLOv4
ZHONG Zhifeng, XIA Yifan*, ZHOU Dongping, YAN Yangtian
(School of Computer and Information Engineering, Hubei University, Wuhan 430062, China) (*Corresponding author, e-mail: 1479099354@qq.com)
The YOLOv4 (You Only Look Once version 4) object detection network has a complex structure, many parameters, high training requirements, and a low real-time detection frame rate in Frames Per Second (FPS). To address these problems, a lightweight object detection algorithm based on YOLOv4, named ML-YOLO (MobileNetv3Lite-YOLO), was proposed. First, the backbone feature extraction network of YOLOv4 was replaced by MobileNetv3, whose depthwise separable convolutions greatly reduce the number of backbone parameters. Then, the feature fusion network of YOLOv4 was replaced by a simplified weighted Bi-directional Feature Pyramid Network (Bi-FPN) structure, so that the attention mechanism in Bi-FPN improves detection accuracy. Finally, the final prediction boxes were generated by the YOLOv4 decoding algorithm to complete object detection. Experimental results on the VOC (Visual Object Classes) 2007 dataset show that the mean Average Precision (mAP) of ML-YOLO reaches 80.22%, which is 3.42 percentage points lower than that of YOLOv4 and 2.82 percentage points higher than that of YOLOv5m, while the model size of ML-YOLO is only 44.75 MB, 199.54 MB smaller than that of YOLOv4 and only 2.85 MB larger than that of YOLOv5m. These results demonstrate that the proposed ML-YOLO model greatly reduces model size compared with YOLOv4 while maintaining high detection accuracy, and can therefore meet the lightweight and accuracy requirements of object detection on mobile or embedded devices.
object detection; lightweight network; YOLOv4 (You Only Look Once version 4); MobileNetv3; weighted Bi-directional Feature Pyramid Network (Bi-FPN)
Object detection methods fall mainly into one-stage detection, represented by the YOLO (You Only Look Once) family of algorithms [1-4], and two-stage detection, represented by the R-CNN (Region-Convolutional Neural Network) family [5-7][8]. A two-stage detector first extracts candidate regions that may contain objects, and then regresses the size and position of the candidate boxes generated in those regions to produce the final prediction boxes. A one-stage detector instead generates candidate boxes directly over the whole image and regresses their class, size, and position in a single pass. One-stage detection is faster than two-stage detection but less accurate, so scenarios that require fast object detection currently tend to use one-stage algorithms.
YOLOv4 (You Only Look Once version 4) and YOLOv5 (You Only Look Once version 5) are one-stage detectors that improve on YOLOv3 (You Only Look Once version 3). YOLOv5 was released by a different team two months after YOLOv4 and comes in five versions, from the lightweight s model to the relatively complex x model. YOLOv5 has not been published as a paper; both YOLOv4 and YOLOv5 use CSPDarknet (Cross Stage Partial Darknet) as the backbone feature extraction network and PANet (Path Aggregation Network) for feature fusion, so their performance is in fact very close, while YOLOv4 is highly customizable. According to current experimental results [9], the Darknet-based YOLOv4 remains the most accurate algorithm in the YOLO family. As for the more lightweight YOLOv5s model, although it is only 14 MB in size, its mean Average Precision (mAP) is low, only about 68%, which cannot satisfy the relatively high precision required in this paper.
Therefore, considering the abundance of available resources for the algorithm and this paper's combined demand for a lightweight model with a high mAP, YOLOv4 was chosen as the basis for lightweight improvement.
To build lightweight networks, the studies in [10-13] replaced the backbone of YOLO (YOLO-Tiny) with backbones from the MobileNet family [14-16]; this greatly reduces the number of network parameters, but the mAP also drops considerably. The studies in [17-18] added dense modules to YOLO backbones to reduce network depth; although this reduces the parameter count, it also complicates the network structure, since every module is connected to several others, which increases the computational cost. The study in [19] used network pruning to remove feature layers that contribute little to feature extraction, improving network efficiency; however, the effect of this pruning is hard to compare with that of other pruning methods, and the optimization is unstable across datasets. None of these approaches can reliably preserve accuracy while greatly reducing model size.
To address these problems, this paper proposes a lightweight network structure called ML-YOLO (MobileNetv3Lite-YOLO). ML-YOLO replaces the backbone of YOLOv4 with MobileNetv3 to greatly reduce the parameter count, and simplifies the weighted Bi-directional Feature Pyramid Network (Bi-FPN) structure of EfficientDet [20], adding an attention-like mechanism to the feature fusion network of ML-YOLO to compensate for the mAP loss caused by making the network lightweight. ML-YOLO reaches an mAP of 80.22% on the VOC (Visual Object Classes) 2007 dataset, only 3.42 percentage points lower than YOLOv4, with a model size of 44.75 MB, an 81.68% reduction compared with YOLOv4.
Fig. 1 YOLOv4 network framework
Fig. 2 Comparison of standard convolution and depthwise separable convolution
The parameter counts of the two convolution types are calculated separately, as given by Eqs. (1) and (2):

$$P_{std} = D_K \times D_K \times M \times N \tag{1}$$

$$P_{ds} = D_K \times D_K \times M + M \times N \tag{2}$$

where $D_K$ is the side length of the convolution kernel, $M$ is the number of input channels, and $N$ is the number of output channels. Since the number of input channels in a standard convolution is usually far smaller than the number of output channels, comparing Eqs. (1) and (2) yields Eq. (3):

$$\frac{P_{ds}}{P_{std}} = \frac{D_K^2 M + M N}{D_K^2 M N} = \frac{1}{N} + \frac{1}{D_K^2} \tag{3}$$

so for the common $3 \times 3$ kernel the term $1/N$ is negligible, and a depthwise separable convolution needs only about $1/9$ of the parameters of a standard convolution.
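As a concrete check of Eqs. (1)-(3), the following sketch (an illustration, not code from the paper; the kernel size and channel counts are assumed values) counts the parameters of both convolution types in PyTorch:

```python
import torch.nn as nn

k, m, n = 3, 64, 128  # assumed kernel size D_K, input channels M, output channels N

# standard convolution: one k x k x M filter per output channel
standard = nn.Conv2d(m, n, kernel_size=k, padding=1, bias=False)

# depthwise separable convolution: depthwise (groups=M) followed by 1x1 pointwise
depthwise_separable = nn.Sequential(
    nn.Conv2d(m, m, kernel_size=k, padding=1, groups=m, bias=False),
    nn.Conv2d(m, n, kernel_size=1, bias=False),
)

def count(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

print(count(standard))             # 3*3*64*128 = 73728, Eq. (1)
print(count(depthwise_separable))  # 3*3*64 + 64*128 = 8768, Eq. (2)
print(count(depthwise_separable) / count(standard))  # ~0.119 = 1/128 + 1/9, Eq. (3)
```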
The Bi-FPN used in EfficientDet is a typical complex bidirectional feature-fusion FPN (Feature Pyramid Network) structure, as shown in Fig. 3. Compared with a conventional bidirectional FPN such as PANet, Bi-FPN removes the two intermediate nodes of the highest- and lowest-level feature layers entering the FPN, and adds to each intermediate feature level a residual edge that connects the input feature map directly to the output feature map, which simplifies the FPN structure to some extent.
Because the input feature maps at each fusion node should contribute differently to the node's output, Bi-FPN introduces trainable weights to adjust the contribution of each input to the output feature map.
For these weights, Bi-FPN adopts a fast normalized fusion strategy, which achieves results similar to a Softmax-based fusion while being about 30% faster. Fast normalized fusion is given by Eq. (4):

$$O = \sum_{i=1}^{n} \frac{w_i}{\epsilon + \sum_{j=1}^{n} w_j} \cdot I_i \tag{4}$$

where $i$ and $j$ ($i, j = 1, 2, \dots, n$) index the feature maps entering the fusion node; $I_i$ is the matrix of the $i$-th input feature map; $\epsilon$ is a constant that keeps the denominator from being zero, set to $10^{-4}$; and $w_i$, $w_j$ are the weights of the individual input feature maps, which after every training step are passed through the ReLU (Rectified Linear Unit) activation so that their values always stay non-negative. After many rounds of training, the inputs at each feature fusion node end up with the weights that give the best object detection performance.
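A minimal PyTorch sketch of Eq. (4) follows (an assumed implementation for illustration; the module name and tensor shapes are not from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FastNormalizedFusion(nn.Module):
    """Fuse same-shape feature maps with learnable non-negative weights, Eq. (4)."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))  # initial weights set to 1
        self.eps = eps  # keeps the denominator away from zero

    def forward(self, inputs):
        w = F.relu(self.weights)      # ReLU keeps every weight non-negative
        w = w / (w.sum() + self.eps)  # fast normalization instead of Softmax
        return sum(wi * x for wi, x in zip(w, inputs))

# fuse two feature maps at one fusion node (shapes are illustrative)
fuse = FastNormalizedFusion(num_inputs=2)
p1, p2 = torch.randn(1, 256, 52, 52), torch.randn(1, 256, 52, 52)
print(fuse([p1, p2]).shape)  # torch.Size([1, 256, 52, 52])
```

Here the ReLU is applied inside the forward pass, a common way to realize the constraint that the trained weights stay non-negative.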
The proposed ML-YOLO algorithm makes two main improvements: first, the backbone of YOLOv4 is replaced with MobileNetv3; second, the FPN structure is improved into Bi-FPN-Lite. The network framework is shown in Fig. 4.
Fig. 4 ML-YOLO network framework
Table 1 ML-YOLO backbone network structure
Fig. 5 bneck structure
In the backbone, ML-YOLO uses the h-swish activation function, which is an improvement of the swish activation. The swish activation is given by Eq. (5):

$$\mathrm{swish}(x) = x \cdot \sigma(x) \tag{5}$$

where $\sigma$ is the sigmoid function. h-swish replaces the comparatively expensive sigmoid with the piecewise-linear approximation $\mathrm{ReLU6}(x + 3)/6$, giving $\text{h-swish}(x) = x \cdot \mathrm{ReLU6}(x + 3)/6$, which is cheaper to compute on mobile hardware.
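The two activations can be compared directly; a small sketch (assumed code, not the paper's implementation):

```python
import torch
import torch.nn.functional as F

def swish(x: torch.Tensor) -> torch.Tensor:
    return x * torch.sigmoid(x)  # Eq. (5)

def h_swish(x: torch.Tensor) -> torch.Tensor:
    # ReLU6(x + 3) / 6 is a piecewise-linear approximation of sigmoid(x)
    return x * F.relu6(x + 3.0) / 6.0

x = torch.linspace(-4.0, 4.0, steps=9)
print(torch.max(torch.abs(swish(x) - h_swish(x))))  # small approximation error
```

PyTorch also ships this activation directly as torch.nn.Hardswish.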
The experiments in this paper use the VOC2007 dataset, shown in Fig. 6, which contains 9,963 images in 20 categories such as person, bicycle, and car; the position, size, and class of the ground-truth boxes required by YOLO-family algorithms have already been annotated manually.
Training was performed on a workstation with an Intel Xeon CPU, two NVIDIA GeForce RTX 2080 Ti GPUs, and 128 GB of memory, running Windows 10. The algorithms were tested on a laptop with an Intel Core i7-6700HQ CPU, an NVIDIA GeForce GTX 960M GPU, and 8 GB of memory. For all experiments, the number of training epochs was set to 100, the batch size to 16, and the initial learning rate to 1E-3, with all initial weights in Bi-FPN-Lite set to 1. The training set contains 5,012 images; in each epoch, 90% of the images are used for training and the remaining 10% are used to monitor the training effect in real time, as shown in Fig. 7. The weight file with the lowest loss across all epochs was selected for the comparison of mAP, network parameter count, model size, and real-time detection frame rate in Frames Per Second (FPS).
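A sketch of the per-epoch 90/10 split described above (plain Python; the identifier format is hypothetical):

```python
import random

image_ids = [f"{i:06d}" for i in range(1, 5013)]  # 5012 training images

def split_epoch(ids, train_frac=0.9, seed=None):
    """Shuffle and split the image ids anew for one training epoch."""
    rng = random.Random(seed)
    shuffled = ids[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]  # training ids, validation ids

train_ids, val_ids = split_epoch(image_ids, seed=0)
print(len(train_ids), len(val_ids))  # 4510 502
```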
Fig. 6 VOC2007 dataset
Fig. 7 Training effect validation on the VOC2007 dataset
Table 2 Ablation experiments of ML-YOLO
Fig. 8 compares the real-time detection FPS of ML-YOLO and YOLOv4. ML-YOLO reaches 14.15 FPS, 123.54% higher than YOLOv4, which greatly increases the speed of real-time object detection.
Fig. 8 Comparison of real-time detection FPS between ML-YOLO and YOLOv4
Some detection results of ML-YOLO and YOLOv4 are shown in Fig. 9. Fig. 9(a) shows, for each category of the VOC test set, the number of True Positive and False Positive detections by ML-YOLO. A True Positive is a detection whose predicted category matches the actual category of the object; a False Positive is a detection whose predicted category does not match the object, or for which no such object exists. Together these two counts measure the detection accuracy of the algorithm. Fig. 9(b) shows the false detection rate of ML-YOLO for each category of the VOC test set, which reflects the probability that the algorithm generates a wrong prediction box on the test set. Fig. 9(c) shows the mAP of ML-YOLO and its Average Precision (AP) in each category, where each bar corresponds to the AP of one category; Fig. 9(d) shows the mAP and per-category AP of YOLOv4. Comparing Fig. 9(c) and Fig. 9(d) shows that the mAP of the proposed ML-YOLO drops only slightly, maintaining good performance.
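To make the role of these counts concrete, here is a small worked example with hypothetical numbers (the exact definition of the false detection rate in Fig. 9(b) is assumed to be FP/(TP+FP)):

```python
tp, fp = 240, 60            # hypothetical detections for one class
precision = tp / (tp + fp)  # fraction of predictions that are correct
false_detection_rate = fp / (tp + fp)
print(precision, false_detection_rate)  # 0.8 0.2
```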
In addition, to examine how ML-YOLO behaves on images outside the VOC2007 dataset, an image of a complex scene containing bicycles, people, and cars was selected for detection, and the practical effect of the improved algorithm was compared; the results are shown in Fig. 10.
The number above each box is the confidence of the detection box. Confidence is the metric YOLO-family algorithms use to assess the accuracy of a detection box: when a prediction box detects an object of some class, it expresses the degree of overlap between the prediction box and the ground-truth box of the object, as in Eq. (11), following the definition in [1]:

$$C = \Pr(\mathrm{Object}) \times \mathrm{IoU}_{\mathrm{pred}}^{\mathrm{truth}} \tag{11}$$

where $\Pr(\mathrm{Object})$ is the probability that the box contains an object and $\mathrm{IoU}_{\mathrm{pred}}^{\mathrm{truth}}$ is the Intersection over Union (IoU) between the prediction box and the ground-truth box.
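A minimal sketch of the IoU term in Eq. (11) for axis-aligned boxes in (x1, y1, x2, y2) form (an illustrative implementation, not the paper's code):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)  # intersection rectangle
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```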
As shown in Fig. 11, a detailed comparison shows that ML-YOLO detects more distant people than YOLOv4 and is more sensitive to small objects, although its sensitivity to occluded cars drops somewhat.
Fig. 11 Comparison of detection details between ML-YOLO and YOLOv4
To compare the performance of ML-YOLO with other mainstream object detection algorithms, comparison experiments were conducted on the VOC2007 dataset; the results are shown in Table 3.
Table 3 Comparison of ML-YOLO and other algorithms
The results show that, compared with very lightweight algorithms such as YOLOv5s and EfficientDet-d0, ML-YOLO has a slightly larger model but a considerably higher mAP at a similar FPS. Compared with larger models such as YOLOv4 and CenterNet, the mAP of ML-YOLO is 3.42 percentage points lower than that of YOLOv4 and 3.12 percentage points higher than that of CenterNet, while its processing speed is much higher. Considering model size, mAP, and FPS together, ML-YOLO indeed performs well.
The ML-YOLO proposed in this paper is a lightweight network obtained by replacing the backbone of YOLOv4 with MobileNetv3 and simplifying Bi-FPN. MobileNetv3 greatly reduces the parameters needed for convolution through depthwise separable convolutions, and its attention mechanism optimizes backbone feature extraction by assigning a different weight to each channel of the feature map. The simplified Bi-FPN structure inherits the bidirectional feature fusion of PANet while appropriately reducing the number of nodes in the structure, and its attention mechanism improves the robustness of feature fusion across layers while reducing the number of convolutions, which raises the mAP of the algorithm.
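As an illustration of the channel attention just described, the following sketch implements a squeeze-and-excitation block of the kind MobileNetv3 places inside its bneck units [16,23] (the channel count and reduction ratio are assumed values):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight each feature-map channel."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Hardsigmoid(),  # MobileNetv3 gates with a hard sigmoid
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # squeeze: global average pooling
        w = self.fc(w).view(b, c, 1, 1)  # excitation: one weight per channel
        return x * w                     # reweight the channels

se = SEBlock(64)
print(se(torch.randn(2, 64, 26, 26)).shape)  # torch.Size([2, 64, 26, 26])
```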
Future work will consider further optimizing the backbone structure of the ML-YOLO family, experimenting with different FPN structures to raise the mAP while further reducing model size and parameter count, and applying the algorithm in engineering practice.
[1] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 779-788.
[2] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6517-6525.
[3] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08) [2021-04-08]. https://arxiv.org/pdf/1804.02767.pdf.
[4] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23) [2021-04-08]. https://arxiv.org/pdf/2004.10934.pdf.
[5] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 580-587.
[6] GIRSHICK R. Fast R-CNN[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 1440-1448.
[7] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6):1137-1149.
[8] ZOU Z X, SHI Z W, GUO Y H, et al. Object detection in 20 years: a survey[EB/OL]. (2019-05-16) [2021-04-08]. https://arxiv.org/pdf/1905.05055.pdf.
[9] JOCHER G. YOLOv5[EB/OL]. [2021-04-08]. https://github.com/ultralytics/yoloV5/issues/6.
[10] QI R, JIA R S, XU Z F, et al. Lightweight object detection network based on YOLOv3[J]. Computer Applications and Software, 2020, 37(10): 208-213. (in Chinese)
[11] JIN Y Z, WEN Y X, LIANG J T. Embedded real-time pedestrian detection system using YOLO optimized by LNN[C]// Proceedings of the 2020 International Conference on Electrical, Communication, and Computer Engineering. Piscataway: IEEE, 2020: 1-5.
[12] HUANG J S, ZUO H R, ZHANG J L. Research and application on lightweight object detection algorithm[J]. Computer Engineering, 2021, 47(10): 236-241. (in Chinese)
[13] SHAO W P, WANG X, CAO Z R, et al. Design of lightweight convolutional neural network based on MobileNet and YOLOv3[J]. Journal of Computer Applications, 2020, 40(S1): 8-13. (in Chinese)
[14] HOWARD A G, ZHU M L, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. (2017-04-17) [2021-04-08]. https://arxiv.org/pdf/1704.04861.pdf.
[15] SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4510-4520.
[16] HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 1314-1324.
[17] FENG Y, LI J Z. Improved convolutional neural network pedestrian detection method[J]. Computer Engineering and Design, 2020, 41(5): 1452-1457. (in Chinese)
[18] HU X L, LIU Y, ZHAO Z X, et al. Real-time detection of uneaten feed pellets in underwater images for aquaculture using an improved YOLO-V4 network[J]. Computers and Electronics in Agriculture, 2021, 185: No.106135.
[19] FANG W, WANG L, REN P M. Tinier-YOLO: a real-time object detection method for constrained environments[J]. IEEE Access, 2020, 8: 1935-1944.
[20] TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 10778-10787.
[21] HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
[22] WANG W H, XIE E Z, SONG X G, et al. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 8439-8448.
[23] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141.
ZHONG Zhifeng, born in Huanggang, Hubei in 1971, Ph. D., professor. His research interests include artificial intelligence, signal processing, and system integration.
XIA Yifan, born in Huanggang, Hubei in 1998, M. S. candidate. His research interests include object detection and machine vision.
ZHOU Dongping, born in Suizhou, Hubei in 1997, M. S. candidate. His research interests include recommender systems and knowledge graphs.
YAN Yangtian, born in Xiaogan, Hubei in 1997, M. S. candidate. His research interests include deep learning and natural language processing.
This work is partially supported by Hubei Province Technological Innovation Special Project (2018ACA13).
CLC number: TP391.4
Document code: A
Article ID: 1001-9081(2022)07-2201-09
DOI: 10.11772/j.issn.1001-9081.2021050734
Received: 2021-05-10; revised: 2021-09-22; accepted: 2021-09-24.