

        Efficient Object Segmentation and Recognition Using Multi-Layer Perceptron Networks

Computers, Materials & Continua, 2024, Issue 1

Aysha Naseer, Nouf Abdullah Almujally, Saud S. Alotaibi, Abdulwahab Alazeb and Jeongmin Park

1 Department of Computer Science, Air University, Islamabad, 44000, Pakistan

2 Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia

3 Information System Department, Umm Al-Qura University, Makkah, Saudi Arabia

4 Department of Computer Science, College of Computer Science and Information System, Najran University, Najran, 55461, Saudi Arabia

5 Department of Computer Engineering, Tech University of Korea, Gyeonggi-do, 15073, South Korea

ABSTRACT Object segmentation and recognition is an imperative area of computer vision and machine learning that identifies and separates individual objects within an image or video and determines their classes or categories based on their features. The proposed system presents a distinctive approach to object segmentation and recognition using Artificial Neural Networks (ANNs). The system takes RGB images as input and uses a k-means clustering-based segmentation technique to fragment the intended parts of the images into different regions and label them based on their characteristics. Two distinct kinds of features are then obtained from the segmented images to help identify the objects of interest. An Artificial Neural Network (ANN) is then used to recognize the objects based on their features. Experiments were carried out on three standard datasets extensively used in object recognition research, MSRC, MS COCO, and Caltech 101, to measure the performance of the suggested approach. The experimental findings support the suggested system's validity, as it achieved class recognition accuracies of 89%, 83%, and 90.30% on the MSRC, MS COCO, and Caltech 101 datasets, respectively.

KEYWORDS K-region fusion; segmentation; recognition; feature extraction; artificial neural network; computer vision

        1 Introduction

Segmenting and recognizing objects of interest in images and videos is vital for various applications, including video and security surveillance [1,2], hyperspectral imaging [3], human detection [4], video streaming [5], emotion recognition [6], traffic flow prediction [7], the medical field [8], and self-driving cars [9]. This makes posture recognition [10] and scene understanding [11] pressing topics in artificial intelligence (AI) and computer vision (CV). The aim of the field is to teach machines to understand (recognize) the content of images in the same way humans do. The scope of this research is constrained to object segmentation and recognition. While there has been significant progress in object detection and segmentation techniques, more can yet be done. Researchers are driven to create algorithms that are more precise, reliable, and scalable than those used today in order to overcome the drawbacks and difficulties of existing methods. To produce innovative findings and earn the respect of the academic community, researchers in the fields of object detection and segmentation compete to surpass one another in benchmarks and challenges. The outcome of this research could influence a variety of areas and lead to better monitoring and security systems, safer autonomous vehicles, and increased industrial automation. To comprehend visual scenes, acquire data, and draw conclusions, segmentation delineates object borders while object detection identifies specific objects in images or videos.

We outline a thorough approach to accurate object recognition in complex real-world settings. Through a series of carefully designed steps, our technique ensures accurate and successful object identification. To decrease noise while keeping important edge features, we first preprocess all RGB images using spatial domain filtering. Then, using our special "k-region fusion" method, which combines region-based segmentation and k-means clustering, we perform image segmentation to extract the necessary objects from the background. By creating more meaningful and coherent object portions, this fusion technique raises the quality of segmentation. Next, we extract features using two distinct descriptors: SIFT (Scale Invariant Feature Transform) and ORB (Oriented FAST and Rotated BRIEF). Finally, object classification is accomplished using an artificial neural network. Large, publicly accessible datasets were used in the experiments, including MSRC-v2, MS COCO 2017, and Caltech 101. This work addresses the need for improved object identification in challenging environments; current solutions sometimes lack the precision and robustness required for critical applications such as security and vehicle autonomy. The motivations behind this work include the industry's competitiveness, its potential for widespread influence, and the need to enhance critical performance metrics. Over time, these elements will contribute to the development of safer and more efficient systems in several fields.

Our research's primary contributions can be summarized as follows:

• The incorporation of a spatial domain filter during pre-processing successfully reduces noise in the images while maintaining crucial edge information, producing improved segmentation results.

• The development of "k-region fusion" illustrates the potency of merging region-based segmentation with k-means clustering as a strategy for segmenting images.

• Our robust object recognition approach is demonstrated by our high-performance classification system, which is driven by an Artificial Neural Network and features that originate from ORB and SIFT descriptors.

• We have substantially elevated the precision, sensitivity, F1 score, and mean accuracy performance measures for object recognition when compared to prior approaches.

• In the experimental results, the suggested model's impact has been confirmed across three publicly accessible datasets, displaying exceptional performance.

The remainder of this article is structured into several units that provide a comprehensive overview of the proposed system, its approach, and experimental results. In Section 2, we discuss and analyze relevant work related to the presented system, providing a comprehensive review of the existing literature. Section 3 details the whole methodology of our system, which includes a general pre-classification procedure. Section 4 examines the datasets used in our recommended approach and demonstrates the framework's strength across various tests. Finally, in Section 5, we summarize our significant results and contributions to the research. Overall, this work presents a thorough description of our proposed approach and its potential consequences for the research field.

        2 Related Work

Object detection and recognition have been progressively developed by researchers for several years [12]. They have investigated complex images for anomaly detection [13], as well as depth and RGB+D [14] (Red, Green, Blue, Depth) videos, to improve the effectiveness of their processes in addition to ordinary RGB images. The complete image is typically utilized as input, and characteristics are extracted from it, which is a modest and effective method of identifying multiple objects in a single image. Segmentation is an essential first step in many techniques, including ours. It involves dividing the image into distinctive, meaningful regions with identical characteristics based on image component incoherence or similarities. The performance of succeeding processes is strongly dependent on the accuracy of the segmentation findings. Furthermore, segmentation and classification tools have been combined with feature detection methods [15]. In the proposed approach, objects are generated through an image segmentation process, wherein pixels with similar spectral characteristics are grouped to form a segment. Neural network implementation for object recognition then leads to more precise results. Accordingly, related work can be classified into object recognition using Red, Green, Blue (RGB) images and depth images.

        2.1 Object Detection and Recognition over RGB Images

In the past, image-based approaches were frequently exploited. Chaturvedi et al. [16] used a combination of different classifiers with the Viola-Jones algorithm and the You Only Look Once version 3 (YOLOv3) algorithm for object recognition. The focus of their research was on selecting an algorithm that provides a good balance of accuracy and efficiency. Li et al. [17] introduced an object identification system centered around the best Bag of Words model and Area of Interest (AOI). They estimated the Region of Interest (ROI) using a saliency map and Shi-Tomasi corners. To recognize and classify objects, they used Scale Invariant Feature Transform (SIFT) feature descriptors, a visual codebook, a Gaussian Mixture Model (GMM), and a Support Vector Machine (SVM). Deshmukh et al. [18] employed a novel processing strategy to find objects in original images by merging an object detection API, a mixture of identified edges, and an edge detection algorithm.

        2.2 Object Detection and Recognition over Depth Images

Many researchers have engaged in identifying objects of interest in images over the last couple of years. As depth images are insensitive to lighting variations and intrinsically encode 3D data, numerous intensity-based detection methods have been suggested. Lin et al. [19] presented a strong and reliable system for object identification in their research article. Their approach applies probabilistic image segmentation to remove the background from images. Cupec et al. [20] developed a method for recognizing fruits using depth image analysis. This strategy builds a set of triangles from depth images using Delaunay triangulation, a geometric technique for generating a standard triangular grid from a given point collection. Then, using a region growth mechanism, convex surfaces were formed by joining triangles, each of which represented a possible fruit. Ahmed et al. [21] recommended a different approach, using a Histogram of Oriented Gradients (HOG) to extract features and detecting objects by applying Nearest Neighbor Search (NNS). Finally, they used the Hough voting algorithm to recognize the objects.

        3 Suggested Method

This section outlines the proposed object recognition framework. The schematic architecture is portrayed in Fig. 1. Each RGB image was pre-processed; to smooth the images, a spatial domain filter is utilized. The images were then fragmented into foreground and background using the k-region fusion technique, with the desired objects present in the resulting foreground. For feature extraction, two kinds of features, Oriented FAST and Rotated BRIEF (ORB) and Scale Invariant Feature Transform (SIFT), were fused after extraction. Finally, the requested objects are classified using an artificial neural network. The subsections that follow describe each phase of the framework.

Figure 1: An overall description of the suggested system

3.1 Image Pre-Processing

As part of the pre-processing, all RGB images in the datasets have undergone a filtering approach. Fig. 2 illustrates the outcome of the filtered images, demonstrating that pre-processing at this stage increases the overall efficiency of the system. The image normalization and median filter employed in pre-processing are explored in further detail in the subsections that follow.

Figure 2: Original images (a) MS COCO (b) MSRC-v2

        3.1.1 Image Normalization

Normalization of an image is the process of altering the intensity values of pixels within an image to increase its contrast. The initial images in the datasets were gathered under various conditions, such as radiance variations and dispersion of contrast [22], yielding more objects, greater intensity values, and different object scales in the images. To eliminate this unwanted variation, we initially reduced the resolution to 213×213 by using fixed-window resizing. Fig. 2 shows the normalized images.
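The paper does not list its exact normalization routine, so the following is a minimal sketch, assuming OpenCV, of fixed-window resizing to 213×213 followed by min-max contrast stretching; the file name is hypothetical.

```python
import cv2
import numpy as np

def normalize_image(path, size=(213, 213)):
    """Resize an RGB image to a fixed window and stretch its contrast."""
    img = cv2.imread(path).astype(np.float32)   # BGR image as loaded by OpenCV
    img = cv2.resize(img, size)                 # fixed-window resizing to 213x213
    # Min-max normalization stretches intensities to the full [0, 255] range.
    img = (img - img.min()) / (img.max() - img.min() + 1e-8) * 255.0
    return img.astype(np.uint8)

normalized = normalize_image('sample.jpg')      # hypothetical input file
```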

        3.1.2 Noise Removal

A median filter has been employed to enhance image quality and minimize noise. The median filter smooths the images while keeping all of the objects' edges [23]. It is a nonlinear digital filter applied to eradicate distortion from an image or signal, and this form of noise reduction is a popular pre-processing approach for improving the results of subsequent processing (such as edge identification in an image) [24]. The approach works by replacing every pixel value with the median value calculated from the adjacent pixels. Eqs. (1) and (2) define the smoothed image obtained after applying the median filter:

J_k_sorted(i, j) = sort(vec(J_k(i, j)))  (1)

J_filtered(i, j) = M(i, j)  (2)

where J_k(i, j) represents the submatrix centered at (i, j), J_k_sorted(i, j) represents the sorted vector of pixel values achieved by flattening the submatrix J_k(i, j), J_filtered(i, j) is the output filtered image, and M(i, j) is the median value of the sorted vector. Fig. 3 shows the pre-processed images of some classes from the mentioned datasets.
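As a concrete illustration of Eqs. (1) and (2), the following plain-NumPy sketch sorts each k×k neighborhood and keeps its median; in practice a library call such as OpenCV's cv2.medianBlur does the same thing far faster.

```python
import numpy as np

def median_filter(J, k=3):
    """Median filter per Eqs. (1)-(2): flatten and sort each k x k
    submatrix, then take its median as the output pixel value."""
    pad = k // 2
    padded = np.pad(J, pad, mode='edge')
    out = np.empty_like(J)
    for i in range(J.shape[0]):
        for j in range(J.shape[1]):
            window = padded[i:i + k, j:j + k].ravel()      # vec(J_k(i, j))
            out[i, j] = np.sort(window)[window.size // 2]  # M(i, j)
    return out
```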

Figure 3: Outcomes after pre-processing (a) MS COCO (b) MSRC-v2

3.2 K-Means Clustering

K-means clustering is a standard unsupervised machine learning technique for grouping data based on similarities, and it can be applied to a variety of data types, including images. In image processing, k-means clustering groups similar pixels together to simplify the image or extract useful information from it; in image segmentation, pixels are grouped based on their color or texture, thereby separating regions of the image with different color or texture characteristics [24]. To cluster homogeneous color regions, the k-means algorithm is used, and it only requires the number of clusters k at the start, with no other prior knowledge required [25,26]. K-means clustering uses the Euclidean distance, as in Eq. (3), to measure the similarity between a pixel p and a cluster centroid c:

d(p, c) = √( Σ_i (p_i − c_i)² )  (3)

K-means proceeds by assigning random centroids to clusters and updating them based on the mean of the objects in each cluster until convergence [27,28]. Fig. 4 shows the combined flowchart of preprocessing and clustering on the incorporated datasets.
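A minimal sketch, assuming scikit-learn, of clustering the pixels of an RGB image into k homogeneous color groups; the cluster count and random seed are illustrative choices, not values from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_pixels(img, k=4):
    """Group pixels into k color clusters using Euclidean distance (Eq. (3))."""
    pixels = img.reshape(-1, 3).astype(np.float32)   # one RGB row per pixel
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixels)
    return labels.reshape(img.shape[:2])             # per-pixel cluster map
```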

        3.3 Object Segmentation

Image segmentation was carried out after pre-processing the images. The goal of region-based segmentation is to create a set of homogeneous regions based on similarity criteria [16]. In this article, to obtain better segmentation results, we first apply k-region fusion (clustering an image using k-means and then performing region-based segmentation on the generated clusters). Huda et al. suggested a related method for region-merging segmentation [29]. K-means clustering groups similar pixels together based on their color or texture, reducing image complexity and improving the effectiveness of region-based segmentation [30]. By grouping similar pixels into clusters, we can identify regions with similar color or texture characteristics, which can then be used as inputs for region-based segmentation (see Eq. (4)).

Figure 4: Combined flowchart for preprocessing and clustering

The similarities between neighboring pixels (i, j) are ascertained using region-based segmentation; pixels with similar properties form a unique region [19,20]. Ahmed et al. also used regions to detect objects rather than the traditional sliding window method [21]. Adjacent pixels in an image are compared to the region's reference intensity values at each pixel [22], and an adjacent pixel is added to the region if the difference is less than or equal to the difference threshold. Fig. 5 displays the resultant clustered segmented images.

Figure 5: Resultant segmentation of k-region fusion
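Since the paper does not publish reference code, the following is a hedged sketch of the k-region fusion idea: k-means clusters pixels by color, and each color cluster is then split into spatially connected regions. Connected-component labeling stands in for the region-based step here; the paper's exact region-growing criterion is not fully specified.

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

def k_region_fusion(img, k=4):
    """k-means color clustering followed by a region step: every color
    cluster is broken into spatially connected regions."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
        img.reshape(-1, 3).astype(np.float32)).reshape(img.shape[:2])
    regions = np.zeros(img.shape[:2], dtype=int)
    next_id = 1
    for c in range(k):
        comp, n = ndimage.label(labels == c)   # connected regions inside cluster c
        regions[comp > 0] = comp[comp > 0] + next_id - 1
        next_id += n
    return regions                             # per-pixel region identifiers
```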

        3.4 Feature Extraction

The process of identifying and extracting useful information, or features, from an image for further analysis or processing is known as feature extraction. Feature extraction aims to reduce the number of features in a dataset by creating new features from the existing ones (and subsequently removing the original features); this new, more condensed collection of features should still represent the vast majority of the details in the original set. The extracted features should capture important aspects of the image, such as texture, color, shape, or edges [31]. In this article, we incorporate both Oriented FAST and Rotated BRIEF (ORB) features and Scale Invariant Feature Transform (SIFT) features; the following subsections give the specifics and outcomes of each.

3.4.1 Oriented FAST and Rotated BRIEF

Oriented FAST and Rotated BRIEF (ORB) [32] is a high-performance feature detector that combines the FAST (Features from Accelerated Segment Test) keypoint detector, extended with orientation and rotational resilience, with the BRIEF (Binary Robust Independent Elementary Features) appearance descriptor [33]. It detects key characteristics efficiently and provides a quick and reliable solution for feature extraction in computer vision applications. The locations of key points are determined by Eq. (5),

where I(a, b) is the pixel intensity analyzed at location (a, b), and (u, v) ranges over a circular patch of radius r around the FAST feature point, with u, v ∈ [−r, r]. The patch moments m_pq = Σ_{u,v} u^p v^q I(u, v) then give the intensity centroid ("center of mass"), as presented in Eq. (6):

C = (m_10 / m_00, m_01 / m_00)  (6)

Binary descriptors are computed from BRIEF using Eqs. (7) and (8), which compare pairs of pixel intensities inside the patch:

τ(I; a, b) = 1 if I(a) < I(b), else 0  (7)

f_n(I) = Σ_{i=1}^{n} 2^{i−1} τ(I; a_i, b_i)  (8)

where I is an image and (a, b) are pixel locations within the patch. The patch's orientation is then obtained from the vector pointing from the corner's center to the centroid (Eq. (9)):

θ = atan2(m_01, m_10)  (9)

Fig. 6 depicts the features retrieved using ORB.

Figure 6: Features extracted using ORB on images from both datasets
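A minimal sketch of ORB extraction, assuming OpenCV; the file name and feature budget are illustrative.

```python
import cv2

orb = cv2.ORB_create(nfeatures=500)                    # FAST corners + rotated BRIEF
img = cv2.imread('sample.jpg', cv2.IMREAD_GRAYSCALE)   # hypothetical input file
keypoints, descriptors = orb.detectAndCompute(img, None)
# descriptors: N x 32 uint8 array; each row packs a 256-bit binary string (Eq. (8)).
vis = cv2.drawKeypoints(img, keypoints, None, color=(0, 255, 0))
```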

        3.4.2 Scale Invariant Feature Transform

To construct the set of image features, SIFT (see Algorithm 1) first builds a Gaussian scale space (Eq. (10)):

L(a, b, σ) = G(i, j, σ) ∗ H(a, b), with G(i, j, σ) = (1 / (2πσ²)) e^(−(i² + j²) / (2σ²))  (10)

where H(a, b) is an input image, i and j are the distances from points a and b [34], respectively, and σ is the Gaussian scale. After fitting a model to determine scale and location (Eq. (11)), key points are chosen based on stability [35].

SIFT assigns a direction to each key point in order to define a feature vector for each key point [36]. The SIFT technique (Algorithm 1) is useful for 3D reconstruction and object detection, and it withstands variations in illumination, rotation, and image scale. Each key point's direction is normalized by SIFT, creating a feature vector [37]. To maintain robustness against rotation variations, an orientation is assigned to each key point by calculating the gradient magnitude G.M(a, b) (Eq. (12)) and gradient rotation G.R(a, b) (Eq. (13)) [28] around the collected key points:

G.M(a, b) = √( (H(a+1, b) − H(a−1, b))² + (H(a, b+1) − H(a, b−1))² )  (12)

G.R(a, b) = tan⁻¹( (H(a, b+1) − H(a, b−1)) / (H(a+1, b) − H(a−1, b)) )  (13)

Fig. 7 shows the features extracted by SIFT.

Figure 7: Features extracted using SIFT on images from both datasets
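A matching sketch for SIFT, assuming OpenCV 4.4 or later (where SIFT is available in the main module as cv2.SIFT_create):

```python
import cv2

sift = cv2.SIFT_create()
img = cv2.imread('sample.jpg', cv2.IMREAD_GRAYSCALE)   # hypothetical input file
keypoints, descriptors = sift.detectAndCompute(img, None)
# descriptors: N x 128 float32 array of gradient-orientation histograms
# built from the magnitudes and rotations of Eqs. (12)-(13).
```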

        3.4.3 Feature Fusion

While SIFT is dependable and invariant, handling scale, rotation, and lighting changes using histograms of gradient magnitudes and orientations, ORB uses binary strings for efficient and compact keypoint encoding. Feature fusion combines the binary character of ORB and the stability of SIFT into a potent keypoint representation that captures crucial visual information, including texture, edges, and shape. Using these combined attributes, operations like object detection and image matching are greatly enhanced. The concatenated feature vector F_combined is given by Eq. (14):

F_combined = [F_SIFT, F_ORB]  (14)

where F_SIFT is the feature vector obtained from the SIFT descriptor, with dimensionality D_SIFT, and F_ORB is the feature vector obtained from the ORB descriptor, with dimensionality D_ORB.

Here, [F_SIFT, F_ORB] denotes the concatenation of the SIFT feature vector F_SIFT with the ORB feature vector F_ORB to create a single feature vector with a combined dimensionality of D_combined, as given in Eq. (15):

D_combined = D_SIFT + D_ORB  (15)
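A sketch of the fusion step under one stated assumption: the paper does not specify how the variable number of keypoint descriptors per image is pooled into a fixed-length vector, so mean pooling is used here as a placeholder.

```python
import numpy as np

def fuse_features(f_sift, f_orb):
    """Concatenate per-image SIFT and ORB descriptors (Eqs. (14)-(15)).
    Mean pooling over keypoints is an assumption, not the paper's method."""
    f_sift = f_sift.mean(axis=0)                   # D_SIFT = 128
    f_orb = f_orb.astype(np.float32).mean(axis=0)  # D_ORB = 32
    return np.concatenate([f_sift, f_orb])         # D_combined = D_SIFT + D_ORB
```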

        3.5 Artificial Neural Network

Neurons in the input (X), hidden, and output (Y) layers make up Artificial Neural Networks (ANNs). ANNs can be divided into three groups: feedback networks, multi-layer feedforward networks [38], and single-layer feedforward networks. In this study, artificial neural networks are used to process data through artificial neurons. A neuron's activation function, comparable to decision-making in the brain, produces the result passed from one neuron to the next. ANNs consist of interconnected layers of nodes (neurons), which receive inputs, process them, and then output the results [39]. Weights reflect the neural connections and are modified throughout learning to improve task performance. Algorithm 2 describes the functionality of the ANN.

The principle of a neuron can be presented as Eqs. (16) and (17), a weighted sum of the inputs followed by the addition of a bias term:

s_k = Σ_{i=1}^{n} w_ki a_i  (16)

y_k = s_k + b_k  (17)

where y_k is the weighted sum for the kth neuron, n is the number of input features, w_ki is the weight connecting the ith input to the kth neuron, a_i is the value of the ith input feature, and b_k is the bias term. The output o_k of a neuron is defined in Eq. (18):

o_k = 1 / (1 + e^(−y_k))  (18)

where o_k is the output of the kth neuron and e is Euler's number (approximately 2.718).
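The neuron equations translate directly into code; below is a minimal sketch, with the scikit-learn classifier at the end shown as one plausible way to stack such neurons into the multi-layer network described above (the hidden-layer size is a hypothetical choice).

```python
import numpy as np

def neuron_output(a, w, b):
    """Eqs. (16)-(18): weighted sum of inputs plus bias, then a sigmoid."""
    y = np.dot(w, a) + b             # y_k = sum_i w_ki * a_i + b_k
    return 1.0 / (1.0 + np.exp(-y))  # o_k = 1 / (1 + e^(-y_k))

# A full network stacks such neurons into layers, e.g., with scikit-learn:
from sklearn.neural_network import MLPClassifier
clf = MLPClassifier(hidden_layer_sizes=(128,), activation='logistic', max_iter=500)
# clf.fit(train_features, train_labels); predictions = clf.predict(test_features)
```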

        4 Evaluation Metrics

The study assesses the performance of the suggested system using three datasets for object recognition and contrasts it with current object recognition technologies.

        4.1 Dataset Overview

4.1.1 Microsoft Common Objects in Context (MS COCO)

To segment and identify objects in images, Microsoft developed the MS COCO dataset [40]. Across 330,000 images, it contains 2.5 million object instances organized into 80 classes, including zebras, bears, and other typical objects. With many occurrences of each category, the dataset is widely utilized to evaluate recognition tasks.

4.1.2 Microsoft Research in Cambridge (MSRC-v2)

The MSRC-v2 dataset comprises 591 high-resolution images [41] spanning 21 different object categories (cow, sheep, grass, tree, horse, car, bicycle, plane, face) as well as one background category [42]; ten distinct classes were used in our experiments. Each 213×320-pixel image has a different color scheme and context. Due to the complex backgrounds and illumination, the dataset is demanding.

        4.1.3 Caltech 101

The Caltech 101 dataset contains images in numerous categories, divided into object and background categories. The resolution of each image is roughly 300×200 pixels. Numerous object classes, such as camera, barrel, cup, bike, panda, chair, rhino, airplane, tree, and water, are included in the Caltech 101 dataset.

        4.2 Experimentations and Results

Python 3.7 was employed for training and evaluating the system on an Intel Core i7 PC running 64-bit Windows 10. The machine has 16 GB of RAM (random access memory) and a 5 GHz CPU.

4.2.1 Experiment I: Class Recognition Accuracy

The classification accuracies over ten arbitrarily selected classes of the employed datasets are shown as confusion matrices in Table 1 for MSRC-v2, Table 2 for MS COCO, and Table 3 for Caltech 101.

        Table 1: Recognition accuracy confusion matrix over MSRC-v2 using ANN

        Table 2: Recognition accuracy confusion matrix over MS COCO using ANN

        Table 3: Recognition accuracy confusion matrix over Caltech 101 using ANN

Table 4: Precision, recall, F1 measure and computation time for MSRC-v2 dataset

Table 5: Precision, recall, F1 measure and computation time for MS COCO dataset

Table 6: Precision, recall, F1 measure and computation time for Caltech 101 dataset

4.2.2 Experiment II: Precision, Sensitivity, and F1 Measure

We report the precision, recall, and F1 measures for ten randomly selected classes from the datasets in this section. The results demonstrate that the presented recognition system is highly precise at identifying a variety of complicated objects. Eqs. (19)-(21) were used to compute the precision, recall, and F1 scores for each object class in the datasets:

Precision = TP / (TP + FP)  (19)

Recall = TP / (TP + FN)  (20)

F1 = 2 × (Precision × Recall) / (Precision + Recall)  (21)

Tables 4-6 report the precision, sensitivity, and F1 measure using the ANN for all datasets, i.e., MSRC-v2, MS COCO, and Caltech 101, respectively, where TP stands for True Positive, FP stands for False Positive, and FN for False Negative.
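A small worked example of Eqs. (19)-(21), with illustrative counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall, and F1 from Eqs. (19)-(21)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# A class with 90 true positives, 10 false positives, and 12 false negatives:
print(precision_recall_f1(90, 10, 12))  # -> (0.900, 0.882, 0.891), rounded
```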

We also compared different classifiers on the three datasets, i.e., MS COCO, MSRC-v2, and Caltech 101; the Artificial Neural Network gives the best results of the three. The results produced by all three classifiers are shown in Table 7.

Table 7: Comparison of ANN, RF, and AdaBoost

        Table 8: A comparative analysis against contemporary methods over the MSRC dataset

        Table 9: A comparative analysis against contemporary methods over the MS COCO dataset

        Table 10: A comparative analysis against contemporary methods over the Caltech 101 dataset

Finally, Tables 8-10 contrast the proposed system's object recognition performance with other state-of-the-art methodologies over the mentioned RGB object datasets.

        5 Conclusion

This study presents a useful technique for identifying intricate real-world objects. RGB images are first normalized and median filtered, and then the targeted objects are segmented using k-means clustering and region-based segmentation, jointly called k-region fusion. Then, to extract important details from the segmented objects, ORB and SIFT are fused to extract the key points. Finally, object labeling and recognition are accomplished using an Artificial Neural Network (ANN). Comparative analyses against cutting-edge systems illustrate the superiority of our suggested approach, highlighting its outstanding performance on object recognition tasks. The proposed solution is intended to work with a variety of real-world applications such as security systems, the medical field, self-driving cars, assisted living, and online learning. As a direction for future work, including depth information would improve object segmentation and identification. Depth adds a new dimension that enhances spatial comprehension; it aids in separating objects at various distances, managing occlusion situations, and lessening their effect on recognition. Localization in 3D space streamlines segmentation, and depth-based features can be added to RGB data to provide a more accurate representation. Depth also improves scene comprehension, resistance to changes in lighting, and handling of texture-less objects. In general, depth integration enhances accuracy in challenging situations.

Acknowledgement: The authors are thankful to the Princess Nourah bint Abdulrahman University Researchers Supporting Project, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Funding Statement: This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) Support Program (IITP-2023-2018-0-01426) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation). Funding for this work was provided by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2023R410), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors are thankful to the Deanship of Scientific Research at Najran University for funding this work under the Research Group Funding Program Grant Code (NU/RG/SERC/12/6).

Author Contributions: Study conception and design: Aysha Naseer, Jeongmin Park; data collection: Nouf Abdullah Almujally; analysis and interpretation of results: Aysha Naseer, Saud S. Alotaibi and Abdulwahab Alazeb; draft manuscript preparation: Aysha Naseer. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: All publicly available datasets are used in the study.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
