

        Using Semantic Web Technologies to Improve the Extract Transform Load Model

        Computers, Materials & Continua, 2021, Issue 8

        Amena Mahmoud, Mahmoud Y. Shams, O. M. Elzeki and Nancy Awadallah Awad

        1 Department of Computer Science, Kafrelshiekh University, Kafrelshiekh, Egypt

        2 Department of Machine Learning, Kafrelsheikh University, Kafrelshiekh, Egypt

        3 Department of Computer Science, Mansoura University, Mansoura, Egypt

        4 Department of Computer and Information Systems, Sadat Academy for Management Sciences, Cairo, Egypt

        Abstract: Semantic Web (SW) provides new opportunities for the study and application of big data: massive ranges of data sets in varied formats from multiple sources. Related studies focus on potential SW technologies for resolving big data problems, such as the structural and semantic heterogeneity that results from the variety of data formats (structured, semi-structured, numeric, unstructured text, email, video, audio, stock ticker). SW offers information semantically to both people and machines, retaining the vast volume of data and providing meaningful output from unstructured data. In the current research, we implement a new semantic Extract Transform Load (ETL) model that uses SW technologies for aggregating, integrating, and representing data as linked data. First, geospatial data resources are aggregated from the internet, and then a semantic ETL model is used to store the aggregated data in a semantic model after converting it to Resource Description Framework (RDF) format for successful integration and representation. The principal contribution of this research is the synthesis, aggregation, and semantic representation of geospatial data. A case study of city data is used to illustrate the semantic ETL model's functionalities. The results show that the proposed model solves the structural and semantic heterogeneity problems in diverse data sources, enabling successful data aggregation, integration, and representation.

        Keywords: Semantic web; big data; ETL model; linked data; geospatial data

        1 Introduction

        Big Data consists of data generated by billions of people, drawn from various sources (e.g., the web, customer contact centers, social media, mobile data, sales, etc.). Usually, the material is loosely structured and frequently outdated or unavailable. Big Data is transforming science, engineering, medicine, healthcare, finance, business, and ultimately society itself. Huge volumes of data are available for strategic economic gain, public policy, and new insight into a wide variety of domains (including healthcare, biomedicine, energy, smart cities, genomics, transportation, etc.). Most of this knowledge, however, is inaccessible to users, because we need technologies and resources to discover, transform, interpret, and visualize data to make it consumable for decision-making [1,2].

        Due to the variety of data, which spans formats such as structured, semi-structured, and unstructured data, it is difficult to process with traditional databases and software techniques. Therefore, efficient technologies and tools are needed to process the data so that it is consumable for decision-making, especially since most of it is inaccessible to users, as shown in Fig. 1 [2].

        Figure 1:Processing data using traditional techniques

        Nevertheless, meaningful data integration in a schema-less and complex big data world of databases is a big open challenge. Big data challenges lie not only in storing and managing this variety of data but also in extracting and analyzing consistent information from it. Researchers are working on creating a common conceptual model for the integrated data [3]. The method of publishing and linking structured data on the web is called Linked Data [4]. This data is machine-readable, its meaning is explicitly defined, it is linked to other external data sets, and it can be linked to from other data sets as well [5].

        The Extract-Transform-Load (ETL) procedure is one of the most popular techniques in data integration. It covers the process of loading data from the source system into the data warehouse and consists of three consecutive stages: extracting, transforming, and loading, as shown in Fig. 2.

        Figure 2:Traditional ETL model
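        As an illustration of the three consecutive stages, the following minimal Python sketch walks one batch of rows through extract, transform, and load. All function names and sample records are hypothetical, not part of the proposed model.

```python
def extract(source):
    """Extract: pull raw rows out of a source system (here an in-memory list)."""
    return list(source)

def transform(rows):
    """Transform: normalize field names and clean values before loading."""
    return [{"city": r["name"].strip().title(), "lat": float(r["LAT"])} for r in rows]

def load(rows, warehouse):
    """Load: append the cleaned rows to the target store."""
    warehouse.extend(rows)
    return warehouse

source = [{"name": " cairo ", "LAT": "30.04"}, {"name": "giza", "LAT": "30.01"}]
warehouse = load(transform(extract(source)), [])
print(warehouse)
# [{'city': 'Cairo', 'lat': 30.04}, {'city': 'Giza', 'lat': 30.01}]
```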

        To accurately exploit web data, a system must be capable of reading the exact semantic meaning of web-published information. An acknowledged way to publish machine-readable information is to use Semantic Web (SW) technologies. The purpose of SW technologies is to fix a common vocabulary and a set of interpretation constraints (inference rules) to express metadata semantically over web information and to allow reasoning over it. More specifically, SW presents human knowledge through structured collections of information and sets of inference rules [6,7].
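        The idea of combining structured collections of information with inference rules can be sketched as a tiny forward-chaining loop over RDF-style triples. The vocabulary and facts below are illustrative only, not drawn from the case study.

```python
# Facts are (subject, predicate, object) triples; a single inference rule
# derives new "type" facts from "subClassOf" facts until a fixed point.
facts = {
    ("City", "subClassOf", "PopulatedPlace"),
    ("Cairo", "type", "City"),
}

def infer(facts):
    """Forward-chain the rule: x type C, C subClassOf D  =>  x type D."""
    derived = set(facts)
    while True:
        new = {(x, "type", d)
               for (x, p, c) in derived if p == "type"
               for (c2, q, d) in derived if q == "subClassOf" and c2 == c}
        if new <= derived:          # fixed point reached, nothing new to add
            return derived
        derived |= new

closure = infer(facts)
print(("Cairo", "type", "PopulatedPlace") in closure)  # prints True
```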

        By using SW formats, web resources can be enriched with annotations and other markup capturing the semantic metadata of resources. The first motivator for SW is data integration, which is a significant bottleneck in many IT applications. Current solutions to this problem are mostly ad hoc: each time, a specific mapping is made between the data models (schemas) of the data sources involved. If, instead, the data sources' semantics were described in a machine-interpretable way, the mappings could be constructed at least semi-automatically. The second motivator is more intelligent support for end users. If computer programs can infer consequences of information on the web, they can give better support in finding information, selecting information sources, personalizing information, combining information from different sources, and so on.

        Unlike the documentation of semantics, approaches to the comprehensive description of ETL problems come mainly from the field of graphical modeling; however, their scope of application is essentially limited, and the resulting benefits are modest.

        Figure 3:Semantic ETL model

        Currently, we are moving from the era of “data on the web” to the era of the “web of data” (linked data). Linked Data (LD) is introduced as a step in transforming the web into a global database. The term LD refers to a group of best practices for publishing and interlinking data on the web [8,9]. Creating LD requires having data available on the web in a standard, reachable, and manageable format; in addition, the relationships among data are required [10], as shown in Fig. 3. LD depends on some SW technologies and the Hypertext Transfer Protocol (HTTP) to publish structured data on the web and to connect data from different data sources, allowing data in one data source to be linked effectively to data in another [11,12]. SW contains design principles for sharing machine-readable interlinked data on the web. These links between different datasets make them clearly understandable not only to humans but to machines as well.
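        A minimal sketch of the LD principles above: resources are identified by HTTP URIs, triples connect them, and a link into an external dataset makes the resource part of the web of data. The example.org URIs are hypothetical; the WGS84 geo vocabulary and owl:sameAs are standard W3C terms.

```python
# Resources get HTTP URIs; triples connect them; an owl:sameAs link ties the
# local resource to the same entity in an external dataset (DBpedia here).
EX = "http://example.org/city/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

triples = [
    (EX + "Cairo", GEO + "lat", "30.04"),
    (EX + "Cairo", GEO + "long", "31.24"),
    (EX + "Cairo", SAME_AS, "http://dbpedia.org/resource/Cairo"),
]

# Following a resource's links leads into other datasets: the "web of data".
links = [o for (s, p, o) in triples if p == SAME_AS]
print(links)  # ['http://dbpedia.org/resource/Cairo']
```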

        LD facilitates data integration and navigation through complex data owing to the standards to which it adheres. Guidelines allow easy upgrades and extensions to data models. Besides, representation under a set of global principles also increases data quality. Moreover, the database of semantic graphs representing LD creates semantic links between varied sources and disparate formats [13,14].

        2 Related Work

        Most of the technical difficulties that typically appear when dealing with big data integration result from the variety of data formats and the resulting structural and semantic heterogeneity. Many existing studies depend on SW and metadata. Semantic technologies have recently been added to the ETL process to alleviate these problems.

        Srividya et al. [15] designed a semantic ETL process using ontologies to capture the semantics of a domain model and resolve semantic heterogeneity; this model assumes that the data resources are exclusively relational databases. Huang et al. [1] automatically extracted data from different marine data resources and transformed them into unified schemas, relying on an applied database to integrate them semantically. Sonia et al. [16] and Lihong et al. [17] produced a semantic ETL process for integrating and publishing structured data from various sources as LD by inserting a semantic model and instances into a transforming layer using OWL, RDF, and SPARQL technologies. Mahmoud et al. [18] enhanced the ETL definitions by allowing semi-automatic semantic transformation of inter-attributes through the identification of data source schemas and the semantic grouping of attribute values.

        Mei et al. [19] introduced a semantic approach for extracting, linking, and integrating geospatial data from several structured data sources; it also solves the individuals' redundancy problem facing data integration. The basic idea of this model is to use ontologies to convert data extracted from different sources to RDF format, followed by linking similar entities in the generated RDF files using a linking algorithm. The next step is to use SPARQL queries to eliminate data redundancy and combine complementary properties using an integration algorithm. Isabel et al. [20] developed a technique for solving the redundancy problems between individuals in data integration using SW technologies.

        Boury et al. [21] and Saradha et al. [22] discussed the mapping between data schemas, in which the mapping between column names is adjusted manually. Jadhao et al. [23] proposed and implemented a new model to aggregate online educational data sources from the internet and mobile networks, using semantic techniques such as ontologies and metadata to enhance the aggregation results. Ying et al. [24] built a combined data lake using semantic technologies within an architecture for aggregating data from numerous sources.

        Kang et al. developed a semantic big data model that reduces context for storing data semantically in line with a map; however, this model does not facilitate the inclusion of data from existing database structures. In the literature, semantic models for data aggregation, integration, and representation are still uncommon and face many obstacles, such as semantic and structural heterogeneity. Here, we suggest using semantic strategies to resolve these issues and improve the aggregation, integration, and representation of big data [25,26].

        3 A Case Study

        A case study of city data is used to explain the new workflow features. The internet contains numerous data services, such as the MapCruzin group [27], Data.gov [28], the United States Census [29], the OST/SEK Map group [30], USCitiesList.org [31], and Gaslamp Media [32]. Data are stored in these resources in different formats, such as shape_file, comma-separated values (CSV), and DBF data files. The OST/SEC GIS map group data resource provides data such as country_fip, ST, LON, LAT, STATE, name, and PROG_DISC, while Data.gov provides country, countryfips, longitude, latitude, PopPlLat, PopPlLong, state, and state_fip. The United States Census provides countryFP, name, Aland, and Awater. Besides, country, name, longitude, latitude, land area, water area, zip_codes, and area code are provided by USCitiesList.org. The last data resource, from Gaslamp Media, contains zip_code, longitude, latitude, city, and state.

        Tab. 1 represents the semantic heterogeneity in these sources. However, these data will be more useful if represented and stored in a semantic model after being integrated semantically and having duplications removed. Some data in these resources are the same but are referred to by different names, such as (city, name), (Aland, land area), (Awater, water area), (country_fip, countryFP, countryfips), (LON, longitude), and (LAT, latitude). This incompatibility causes many problems in data integration; hence, the generic geospatial ontology is applied to transform these data into RDF format for easy integration using Jena and SPARQL queries. The following step is to represent these data and store them semantically in the semantic big data model.

        Table 1:Semantic heterogeneity from diverse databases
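        The synonym groups above can be illustrated with a hand-written mapping from source attributes to canonical names. This is a sketch only; the proposed model derives such correspondences from the geospatial ontology via alignment rather than from a fixed table, and the canonical names chosen here are hypothetical.

```python
# Source attribute -> canonical name, covering the synonym groups from Tab. 1.
CANONICAL = {
    "city": "name", "name": "name",
    "Aland": "land_area", "land area": "land_area",
    "Awater": "water_area", "water area": "water_area",
    "country_fip": "country_fips", "countryFP": "country_fips",
    "countryfips": "country_fips",
    "LON": "longitude", "longitude": "longitude",
    "LAT": "latitude", "latitude": "latitude",
}

def normalize_record(record):
    """Rename heterogeneous source attributes to one canonical vocabulary."""
    return {CANONICAL.get(k, k): v for k, v in record.items()}

row = {"city": "Cairo", "LON": "31.24", "LAT": "30.04"}
print(normalize_record(row))
# {'name': 'Cairo', 'longitude': '31.24', 'latitude': '30.04'}
```

After this renaming step, records from all five sources share one schema, so duplicates become directly comparable.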

        4 Proposed Approach

        The proposed approach first collects geospatial data services using the geospatial ontology shown in Fig. 4. The suggested semantic ETL model, shown in Fig. 5, aims to aggregate various geospatial data services from the network semantically and to combine the derived resource data semantically, storing it as a geospatial semantic big data model. Next, metadata from various geospatial data resources is combined over the internet, as shown in Fig. 6. Then, the three steps of the ETL are performed.

        The first phase is extracting data from the aggregated geospatial resources. The extracted data differ from each other and have different schemas; hence, they carry no shared semantic meaning, and their structures differ. We use SW technologies in the second phase to align and link these data.

        Figure 4:The geospatial ontology used for aggregation of heterogeneous services

        The principal purpose of the second phase is to prepare the extracted data and transform it into RDF format for linking. This phase consists of five procedures. The first procedure is data preparation, which contains typical transformation activities such as normalizing data, removing duplicates, checking for integrity violations, filtering, sorting, grouping, and dividing data according to format (structured, semi-structured, or unstructured). Additionally, it transforms data to RDF format. The RDF generation, shown in Fig. 7 for structured and semi-structured data, is based on the standardized geospatial ontology shown in Fig. 8. The derived data are then converted to RDF format.

        The RDF generation algorithm used for this transformation is as follows:

        Trans-Data-to-RDF algorithm
        Input:   src1: Geospatial CSV file
                 src2: Geospatial Ontology file
        Output:  real: alignmentmeasure_value   // matching value between entities in the src1 and src2 files
                 RDF data file generation
        Variables:
            file:   CSV_File              // read the CSV data
            string: RDF_className         // class name of the generated RDF data
            array:  Column_Listing        // list for storing all column names
            string: column_name           // a column name in the input CSV file
            string: column_data           // each value of the CSV data
            object: csvModel              // model holding the RDF data generated from the geospatial CSV input file
            object: geospatial_ontology   // model holding the Geospatial Ontology file
            object: dataPropertys         // object of the DatatypeProperty class
            object: csvIndividual         // object for creating individuals
            string: property              // data property name taken from Column_Listing
            object: First_ontology        // object of JENAOntology
            object: Second_ontology       // object of JENAOntology
            object: alignmentmeasure      // creates the alignment between First_ontology and Second_ontology
            int:    counter               // initial value 0
            string: entity1               // data property name of First_ontology in each cell
            string: entity2               // data property name of Second_ontology in each cell
        Processing:
        Begin
            // First Stage
            // Step 1: Read the geospatial CSV file
            CSV_File <- src1
            // Step 2: Convert the CSV data into RDF format
            RDF class name = src1 name
            // Create the data properties from the column names
            for all column_names in src1 {
                column_name <- column value
                DatatypeProperty dataPropertys = csvModel.createDatatypeProperty(column_name)
                Column_Listing.add(column_name)
            }
            // Set every row of data as a new individual
            while (!End-of-file(src1)) {
                column_data <- column value
                Individual csvIndividual = csvClass.createIndividual()
                String property = Column_Listing.get(counter++)
                csvIndividual.addProperty(csvModel.getDatatypeProperty(property), column_data)
            }
            OntModel geospatial_ontology <- read src2
            // Use the Alignment API to calculate similarities
            JENAOntology First_ontology = new JENAOntologyFactory().newOntology(csvModel, true)
            JENAOntology Second_ontology = new JENAOntologyFactory().newOntology(geospatial_ontology, true)
            // Align the data properties between the two ontologies
            AlignmentProcess alignmentmeasure = new SMOANameAlignment()
            alignmentmeasure.init(First_ontology, Second_ontology)   // source and target ontologies for the alignment
            alignmentmeasure.align(First_ontology.getdataproperties(), Second_ontology.Countryclass.getdataproperties())
            for all cells c in alignmentmeasure {
                alignmentmeasure_value = cell.getStrength()          // get the measure value
                entity1 = cell.getObject1().toString()               // data property name of First_ontology
                entity2 = cell.getObject2().toString()               // data property name of Second_ontology
                if (alignmentmeasure_value > 0.5) {
                    entity1.value <- entity2.value
                }
            }
            // Second Stage
            Save First_ontology in RDF format as an XML file
        End
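        A condensed, runnable sketch of the same idea is shown below: read a small CSV, align column names against ontology property names with a string-similarity score, and emit one set of property/value triples per row. The ontology property names are hypothetical, and difflib's character ratio stands in for the Alignment API's SMOA measure, so it only catches near-spelling matches.

```python
import csv
import io
from difflib import SequenceMatcher

# Hypothetical data-property names taken to come from the geospatial ontology.
ontology_properties = ["name", "longitude", "latitude"]

def align(column, threshold=0.5):
    """Best-matching ontology property for a column name, or None if too weak."""
    score = lambda p: SequenceMatcher(None, column.lower(), p).ratio()
    best = max(ontology_properties, key=score)
    return best if score(best) >= threshold else None

raw = "city,LON,LAT\nCairo,31.24,30.04\n"
rows = list(csv.DictReader(io.StringIO(raw)))

triples = []
for i, row in enumerate(rows):
    subject = f"row{i}"                      # one individual per CSV row
    for col, value in row.items():
        prop = align(col) or col             # fall back to the source column name
        triples.append((subject, prop, value))
print(triples)
```

Note that a purely character-based ratio maps LON and LAT to their ontology properties but cannot match a synonym pair such as city/name, which is one reason the model relies on the SMOA measure and a shared ontology; here "city" simply keeps its source name.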

        Figure 5:Semantic ETL model

        Figure 6:Geospatial data resources aggregation

        Figure 7:RDF file generation from data files of structured and semi-structured data

        Using Alignment API 4.0 [33,34], this algorithm transforms the CSV data file into an RDF file according to the default geospatial ontology. Structured and semi-structured data files (such as XML, Excel, and JSON) are first translated into CSV data files before the RDF generation algorithm is applied. For unstructured data, structuring analysis is used to remove the noisy components and generate metadata information, followed by a data mining operation consisting of two procedures, linguistic and semantic analysis, as shown in Fig. 9.
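        The flattening of a semi-structured input to CSV ahead of RDF generation can be sketched as follows; the field names and the sample JSON record are hypothetical.

```python
import csv
import io
import json

# A semi-structured (JSON) record, flattened to one CSV row.
raw = '[{"city": "Cairo", "LON": 31.24, "LAT": 30.04}]'
records = json.loads(raw)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(records[0]))
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()
print(csv_text)  # header line "city,LON,LAT" followed by the data row
```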

        Figure 8:Generic geospatial ontology

        The linguistic analysis method involves two steps. The first is phrase splitting, which includes a part-of-speech tagger, morphological analysis, a JAPE transducer, and an OntoRoot gazetteer. The JAPE transducer implements specific rules based on regular expressions over the annotated corpus. The OntoRoot gazetteer takes the domain ontology as input to construct an annotated corpus with the geospatial entities. The second step is semantic analysis, which is used to catch the hidden relationships between the annotated entities in the textual details. The output of the linguistic analysis is used as the input to the semantic analysis system, which uses fundamental semantic rules adapted from [34] to extract relationships from unstructured textual data.

        The final procedure is gathering the geospatial data for the semantic model. The data linking algorithm is used in this procedure to link the RDF data files semantically before merging them, addressing the problem of semantic heterogeneity. Next, the linkage and integration algorithms are used to compare entities and avoid their redundancy before the integration process, followed by merging all data from the RDF files into a single file. Finally, the semantic model for storing the integrated geospatial data semantically is built. The generated semantic model is stored in the data warehouse in the last phase of the semantic ETL model.
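        The linking-and-integration step can be sketched as grouping triples from several RDF files by a shared key and merging complementary properties into one individual. Here the name property serves as a hypothetical stand-in for the linking algorithm's matching criterion, and the two triple lists stand in for RDF files.

```python
# Triples from two hypothetical RDF files describing the same city.
file_a = [("a:Cairo", "name", "Cairo"), ("a:Cairo", "longitude", "31.24")]
file_b = [("b:Cairo", "name", "Cairo"), ("b:Cairo", "latitude", "30.04")]

def integrate(*triple_files):
    """Link subjects that share a 'name' value, then merge their properties."""
    key_of = {}                                   # source subject -> linking key
    for triples in triple_files:
        for s, p, o in triples:
            if p == "name":
                key_of[s] = o
    merged = {}                                   # linking key -> merged properties
    for triples in triple_files:
        for s, p, o in triples:
            merged.setdefault(key_of[s], {})[p] = o
    return merged

merged = integrate(file_a, file_b)
print(merged)
# {'Cairo': {'name': 'Cairo', 'longitude': '31.24', 'latitude': '30.04'}}
```

The two source subjects collapse into one individual carrying the complementary longitude and latitude properties, which mirrors the redundancy-elimination goal of the integration algorithm.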

        Figure 9:RDF file generation from unstructured data files, adapted from [34]

        5 Discussion of Experiment

        5.1 Experimental Setup

        Specific SW technologies are used to implement the proposed approach, as follows:

        1. Uniform Resource Identifier (URI): defines and locates resources such as web pages, offers a baseline for representing the characters used in most of the world's languages, and classifies resources [34].


        2. RDF: an internet data-sharing model that defines the metadata of websites and ensures interoperability between applications. It facilitates merging data from various schemes and allows structured and semi-structured data to be mixed, exposed, and shared across different applications [35].

        3. SPARQL: the RDF query language and protocol used to query, retrieve, and process data in RDF format [36].

        4. OWL: an SW language built on top of RDF. Written in XML, it represents things, classes of things, and links between items of knowledge about things [37].

        5. Alignment API: offers abstractions for ontology networks, alignments, and correspondences, as well as building blocks such as matchers, evaluators, renderers, and parsers.

        6. XML (eXtensible Markup Language): an extensible format that enables users to construct their own document identifiers and provides the syntax for structuring content inside documents [38].

        7. The Eclipse IDE.

        8. Protégé ontology editor: an open-source editor for the construction of domain ontology models and knowledge-based applications [39].

        5.2 Results

        Table 2:Matching attributes between ontology and data in source

        Table 3:Matching attributes between ontology and data in source

        Table 4:Matching attributes between ontology and data in source

        Table 5:Matching attributes between ontology and data in source

        The CSV data are translated into an RDF data file using the proposed RDF generation algorithm. If the data are unstructured, a SPARQL query extracts the attributes from the RDF file, and the proposed linking algorithm is then applied to match the attributes, which are translated into RDF format to validate the algorithm. Fig. 10 shows a case illustrating this procedure.

        Figure 10:Example of matched attributes

        The next stage is to align the attributes extracted from both the constructed ontology and the data, as in Fig. 10. Since the attribute pairs “name, city,” “lon, longitude,” and “lat, latitude” correspond to the same details, they are matched. This suggests that the issues of textual and structural variability have been solved and that the data services are combined and semantically processed.

        Table 6:Comparison between existing semantic models and the proposed model

        The emphasis in large-scale data studies is primarily on volume, velocity, and variety. SW technology was not introduced for such data, and the velocity and volume of SW developments remain major challenges. The semantic heterogeneity issue created by the variety of big data is overcome in the proposed model. The matching performance shows which components of the input schemas have been integrated effectively. Tab. 6 lists the differences between the existing models and the proposed model.

        The confidence in a correspondence between attributes produced by the alignment is given by its strength value (a float between 0.0 and 1.0): the greater the strength, the higher the confidence. The various interfaces of the Alignment API are described in [39].

        6 Conclusion

        This study presents a new semantic ETL model that allows for combining, associating, integrating, and representing geospatial data from numerous geospatial resources on the internet using semantic technology. Geospatial data services are first aggregated semantically, and then the three ETL steps are applied so that the data are combined, represented, and processed as LD. Besides, we addressed problems of structural and semantic heterogeneity before the integration cycle. SW technology solves the big data variety problem, but not the volume problem.

        Funding Statement:The authors received no specific funding for this study.

        Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.
