The proliferation of internet access has led to a meteoric rise in the amount of digital text. As a result, document clustering has become a crucial technique for extracting relevant information from large document collections. Document clustering automatically sorts documents into groups whose members are highly similar to one another. Traditional clustering approaches cannot adequately describe a collection of texts because they do not take into account the semantic relationships that exist between them. To overcome this limitation, semantic information has been widely incorporated so that documents are clustered according to their meaning rather than their shared keywords. In this investigation, we examined a total of 27 distinct papers published over the previous five years that categorize documents based on semantic similarity. A detailed literature evaluation is provided for each of the selected publications, and comparative research is carried out across a wide variety of evaluation strategies, including algorithms, similarity metrics, tools, and processes. An extended discussion then analyzes the similarities and differences between these works.
Due to the availability of vast amounts of unstructured data in various forms (e.g., the web, social networks), the clustering of text documents has become increasingly important. Traditional clustering algorithms have not been able to solve this problem because, without the semantic relationships between words, they cannot accurately represent the meaning of the documents. Semantic document clustering, an unsupervised learning approach that groups documents based on their meaning rather than on common keywords, has therefore been extensively utilized to enhance the quality of text clustering. This paper introduces a new method that groups documents from online laboratory repositories based on a semantic similarity approach. In this work, the dataset is first collected by crawling the short real-time descriptions of the online laboratories' repositories from the Web. A vector space is created using term frequency-inverse document frequency (TF-IDF), and clustering is performed using the K-Means and Hierarchical Agglomerative Clustering (HAC) algorithms with different linkages. Three scenarios are considered: without preprocessing (WoPP), preprocessing with stemming (PPwS), and preprocessing without stemming (PPWoS). Experiments on five datasets (online labs, 20 NewsGroups, Txt_sentoken, NLTK_Brown and NLTK_Reuters) are evaluated with several metrics: silhouette average, purity, V-measure, F1-measure, accuracy score, homogeneity score, completeness, and NMI score. Finally, the results of the proposed work are contrasted and visualized on an interactive webpage.
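A minimal sketch of the pipeline described above, written with scikit-learn; the abstract does not name an implementation, and the toy corpus, cluster count, and linkage choice here are illustrative assumptions rather than the paper's actual data or settings.

```python
# Illustrative TF-IDF + K-Means / HAC pipeline; corpus and parameters are
# assumptions for demonstration, not the paper's dataset or configuration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

docs = [
    "remote electronics lab with oscilloscope access",
    "online chemistry lab for titration experiments",
    "virtual physics lab simulating pendulum motion",
    "web-based circuit simulation laboratory",
]

# Build the TF-IDF vector space from the short lab descriptions.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# K-Means clustering.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

# HAC; the linkage argument ("ward", "complete", "average", "single")
# can be varied, mirroring the different linkages compared in the paper.
hac = AgglomerativeClustering(n_clusters=2, linkage="average").fit(X.toarray())

print("K-Means labels:", kmeans.labels_)
print("HAC labels:   ", hac.labels_)
print("Silhouette (K-Means):", silhouette_score(X, kmeans.labels_))
```

Stemming (the PPwS scenario) would be applied to each document before vectorization, e.g. with an NLTK stemmer.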
In the era of digitalization, the number of electronic text documents on the Internet has been increasing rapidly. Organizing these documents into meaningful clusters has become a necessity, using methods such as TF-IDF and word embeddings as the basis for document clustering. Document clustering is the process of automatically arranging documents into clusters such that the documents within a cluster are very similar to one another and dissimilar to the documents in other clusters. Traditional clustering algorithms do not take semantic relationships between words into account and therefore do not accurately represent the meaning of documents. Semantic information has been widely used to improve the quality of document clusters by grouping documents according to their meaning rather than their keywords. In this paper, twenty-five papers published in the last seven years (from 2016 to 2022) and linked to semantic similarity-based document clustering have been systematically reviewed. The algorithms, similarity measures, tools, and evaluation methods they use are discussed as well. The survey shows that researchers applied semantic similarity-based clustering to different datasets depending on the text similarity task. Building on this, the paper proposes methods of semantic similarity-based clustering that can be applied to the short texts contained in an online laboratories repository.
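To make the contrast with keyword matching concrete, here is a hedged sketch of short-text semantic similarity using sentence embeddings; the sentence-transformers package, the model name, and the example sentences are assumptions chosen for illustration, not taken from the surveyed papers.

```python
# Semantic similarity between two short texts via sentence embeddings.
# Model name and inputs are illustrative; requires `pip install sentence-transformers`.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

a = "remote oscilloscope laboratory"
b = "online lab for measuring electrical signals"

# Encode both texts and compare with cosine similarity.
emb = model.encode([a, b], convert_to_tensor=True)
score = util.cos_sim(emb[0], emb[1]).item()
print(f"semantic similarity: {score:.3f}")  # high despite few shared keywords
```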
Data mining is the process of finding knowledge by processing massive amounts of data from different viewpoints and combining them into valuable information; it has become a crucial part of various aspects of human life and is used to uncover hidden patterns in large volumes of data. Classification methods are supervised learning methods that categorize data items into known categories. Creating classification models from an input dataset is one of the most beneficial techniques in data mining; these methods typically build models that are used to forecast future trends in the data. This work assesses the effectiveness of different classification algorithms, namely Support Vector Machine (SVM), Naïve Bayes (NB), J48, and Neural Network (NN), applied to several datasets to determine each algorithm's performance. All techniques were run with 10-fold cross-validation in the machine learning platform WEKA. According to the study's findings, no single algorithm consistently performed best on every dataset.
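The paper ran this comparison in WEKA; the following is an analogous sketch in Python with scikit-learn, where a generic decision tree stands in for J48 (a C4.5 implementation) and the iris dataset is a placeholder, not one of the paper's datasets.

```python
# Comparing classifiers with 10-fold cross-validation (scikit-learn analogue
# of the WEKA setup described above; dataset and models are stand-ins).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier  # C4.5-like stand-in for J48
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)  # placeholder dataset

models = {
    "SVM": SVC(),
    "NB": GaussianNB(),
    "J48-like tree": DecisionTreeClassifier(random_state=0),
    "NN": MLPClassifier(max_iter=1000, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)  # 10-fold CV, as in the study
    print(f"{name:14s} mean accuracy: {scores.mean():.3f}")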
The Internet of Things (IoT) is one of today's most rapidly growing technologies. It allows billions of smart devices or objects, known as "Things", to collect different types of data about themselves and their surroundings using various sensors. They may then share these data with authorized parties for various purposes, including controlling and monitoring industrial services or improving business services and functions. However, the IoT currently faces more security threats than ever before. Machine Learning (ML) has seen critical technological breakthroughs, which have opened several new research avenues for solving current and future IoT challenges; in particular, ML is a powerful technology for identifying threats and suspicious activities in intelligent devices and networks. In this paper, following a thorough literature review on Machine Learning methods and the significance of IoT security in the context of various types of potential attacks, various ML algorithms are compared in terms of attack detection and anomaly detection. Furthermore, possible ML-based IoT protection technologies are introduced.
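As a hedged illustration of the anomaly-detection side of this comparison, the sketch below applies an Isolation Forest to synthetic IoT traffic features; the feature choices and values are assumptions, since a real deployment would use flow statistics drawn from network telemetry.

```python
# Unsupervised anomaly detection on synthetic IoT traffic features.
# Columns: packet size (bytes), packets/sec. Values are illustrative only.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=[200, 10], scale=[30, 2], size=(500, 2))   # benign traffic
attack = rng.normal(loc=[1500, 90], scale=[100, 10], size=(10, 2))  # burst-like anomaly
X = np.vstack([normal, attack])

detector = IsolationForest(contamination=0.02, random_state=0).fit(X)
labels = detector.predict(X)  # -1 = anomaly, 1 = normal
print("flagged anomalies:", int((labels == -1).sum()))
```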
The Internet has given rise to a digital society in which almost everything is connected and available from any place. Yet despite their extensive adoption, traditional IP networks remain complicated and arduous to operate: it is difficult to configure the network in line with predefined policies and to respond to load changes and faults by reconfiguring it. Current networks are also vertically integrated, which complicates matters further: the control and data planes are bundled together. Software-Defined Networking (SDN) is an emerging concept that aims to change this situation by breaking this vertical integration, promoting the logical centralization of network control, separating the network's control logic from the underlying switches and routers, and enabling the network to be programmed. The separation of concerns between the definition of network policies, their implementation in switching hardware, and the forwarding of data is essential to the required flexibility: by breaking the network control problem into tractable parts, SDN makes it simpler to create and introduce new concepts in networking, simplifies network management, and facilitates network evolution. In this paper, SDN is reviewed: we introduce SDN, explaining its core concepts, how it differs from traditional networking, and its architectural principles. Furthermore, we present the crucial advantages and challenges of SDN, focusing on scalability, security, flexibility, and performance. Finally, a brief conclusion on SDN is provided.
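The control/data-plane split can be illustrated with a deliberately simplified toy, not a real OpenFlow implementation; the class names, rule format, and addresses below are all illustrative assumptions.

```python
# Conceptual toy of SDN's separation of concerns: the controller holds the
# global view and control logic; switches only match installed rules and forward.
class Switch:
    def __init__(self, name):
        self.name = name
        self.flow_table = {}  # match (dst) -> action, installed by the controller

    def handle(self, dst):
        # Data plane: pure table lookup; unknown flows are punted to the controller.
        return self.flow_table.get(dst, "send_to_controller")

class Controller:
    def install_route(self, switch, dst, port):
        # Control plane: decides the policy and pushes it down to the switch.
        switch.flow_table[dst] = f"forward:{port}"

ctrl = Controller()
s1 = Switch("s1")
ctrl.install_route(s1, "10.0.0.2", port=3)
print(s1.handle("10.0.0.2"))  # forward:3  (data plane applies the rule)
print(s1.handle("10.0.0.9"))  # send_to_controller (no rule installed yet)
```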
Air pollution, water pollution, and radiation pollution are significant environmental factors that need to be addressed. Proper monitoring is crucial so that the planet can achieve sustainable development while preserving a healthy society. With advancements in the Internet of Things (IoT) and improvements in modern sensors, environmental monitoring has in recent years evolved into smart environment monitoring (SEM) systems. This article provides a critical overview of significant contributions and SEM research, including the monitoring of air quality, water pollution, radiation pollution, and agricultural systems. The review is organized by the objectives to which SEM methods are applied, analyzing each objective in terms of the sensors used and the machine learning and classification methods employed. Moreover, the authors thoroughly examine how advancements in sensor technology, the Internet of Things, and machine learning methods have turned environmental monitoring into a truly smart monitoring system.
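A hedged sketch of the kind of ML classification step a SEM pipeline might apply after sensor data collection; the features, thresholds, and synthetic readings are assumptions for illustration, not values from any reviewed system.

```python
# Classifying synthetic air-quality sensor readings into two categories,
# the kind of ML step a SEM pipeline could run on collected sensor data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Features: [PM2.5 ug/m3, CO ppm]; labels: 0 = good, 1 = unhealthy (illustrative).
X = np.vstack([rng.normal([12, 0.5], [4, 0.2], (200, 2)),
               rng.normal([80, 4.0], [15, 1.0], (200, 2))])
y = np.array([0] * 200 + [1] * 200)

clf = RandomForestClassifier(random_state=1).fit(X, y)
print(clf.predict([[10, 0.4], [95, 5.2]]))  # expected: [0 1]
```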
Whether you are dealing with a real-life problem or creating a software product, optimization is always the ultimate goal, and that goal is achieved by utilizing an optimization algorithm. The increasingly popular Gradient Descent (GD) optimization algorithms are frequently used as black-box optimizers when solving unconstrained optimization problems. Each iteration of a gradient-based algorithm moves toward the minimizer/maximizer of the cost function by using the gradient information of the objective function. This paper presents a comparative study of the GD variants: Gradient Descent (GD), Batch Gradient Descent (BGD), Stochastic Gradient Descent (SGD), and mini-batch GD. Additionally, it outlines the challenges of those algorithms and presents the most widely used optimization algorithms built on GD, including Momentum, Nesterov Momentum, Adaptive Gradient (AdaGrad), Adaptive Delta (AdaDelta), Root Mean Square Propagation (RMSProp), Adaptive Moment Estimation (Adam), Maximum Adaptive Moment Estimation (AdaMax), and Nesterov-accelerated Adaptive Moment Estimation (Nadam), each of which is presented separately. Finally, these GD-based optimization algorithms are compared in terms of training speed, convergence rate, performance, and their pros and cons.
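A minimal sketch of three of the update rules named above (plain GD, Momentum, and Adam) minimizing the toy function f(x) = x^2; the learning rates and hyperparameters are common textbook defaults, not values taken from the paper.

```python
# Three GD-based update rules on f(x) = x^2, whose gradient is f'(x) = 2x.
import math

def grad(x):
    return 2 * x

# Plain gradient descent: x <- x - lr * grad(x)
x = 5.0
for _ in range(100):
    x -= 0.1 * grad(x)
print("GD:      ", x)

# Momentum: a velocity term accumulates past gradients to damp oscillation.
x, v = 5.0, 0.0
for _ in range(100):
    v = 0.9 * v - 0.1 * grad(x)
    x += v
print("Momentum:", x)

# Adam: bias-corrected first and second moment estimates scale the step.
x, m, s = 5.0, 0.0, 0.0
b1, b2, lr, eps = 0.9, 0.999, 0.1, 1e-8
for t in range(1, 101):
    g = grad(x)
    m = b1 * m + (1 - b1) * g          # first moment (mean of gradients)
    s = b2 * s + (1 - b2) * g * g      # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)          # bias correction
    s_hat = s / (1 - b2 ** t)
    x -= lr * m_hat / (math.sqrt(s_hat) + eps)
print("Adam:    ", x)
```

All three drive x toward the minimizer at 0; the adaptive methods differ in how they scale the step per iteration.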