Petanux - From Knowledge to Market


List of Publications

The main question in today’s rapidly changing world is how fast an agent should adapt, and to what sort of knowledge. This can be defined as a knowledge mapping problem for decision making based on large-scale datasets, with veracity and accuracy as key criteria, especially in safety-critical systems. This paper proposes a hybrid and scalable approach for Multi-Criteria Decision Making (MCDM) problems that is deployed in MapReduce. The sector-specific problem it solves is recommending training resources that efficiently close the skill gaps of job seekers. The main innovations of this work are: (1) the use of a large-scale semi-real skill analytics and training resources dataset (Dataset Perspective); (2) a hybrid MCDM approach that resolves skill gaps by matching required skills to training resources (Decision Support Perspective), which can be applied to matching problems in any other sector; and (3) the use of MapReduce as a scalable processing approach to deliver lower processing latency and higher quality for large-scale datasets (Big Data and Scalability Perspective). The experimental results showed 89% accuracy in the clustering and matching results. The recommendation results have been tested and verified with the industrial partner.

Mahdi Bohlouli; Martin Schrage 2020 | IEEE International Conference on Big Data (Big Data)

A cloud-enabled Knowledge modeling as a Service (KaaS) supports acquiring, managing and sharing knowledge from different types of sources. Cloud computing enables its users to access and edit their data easily, without the expense and difficulty of administration and software development. This paper gives an overview of a concept called Knowledge Forge and describes its implementation results. The proposed concept has been successfully tested and applied to the representation and management of job knowledge (competences) for assessing the proficiency of talent. It can also be used to model the proficiency of graduates and students in order to identify their competence gaps and assign suitable learning sources to them, improving their competitiveness. The Knowledge Forge can easily be extended to further knowledge formats and application areas. New formats use their own graphical interface and methods, while no changes to the system itself are required. The main goal of this research is to enable skilled workers and knowledge engineers to search for, modify and connect different formats of knowledge (e.g. unstructured knowledge). Neither users nor administrators have to care about the data format, so a search can draw on all stored data along with external results provided by web search engines. The result of this work is a framework for all KaaS purposes. Although its use is limited at the moment, it creates a basis for new applications in several areas. Further work can enhance the functionality or simplify the enhancement process so that even users can create their own applications.

Mahdi Bohlouli, Sebastian Hellekes 2020 | IEEE International Conference on Big Data (Big Data)

Considering the diversity of cloud computing services offered in federated clouds, users should be well aware of their currently required and expected future resources, and of the values of the quality-of-service parameters, in order to compose suitable services from a pool of clouds. Various approaches and methods have been proposed to address this issue and predict the quality-of-service parameters, which are stored in the form of time series. Those works mostly discover patterns either between separate time series or inside a specific time series, but not both together. The main research gap covered in this work is measuring similarities inside the current time series as well as between various time series. This work proposes a novel hybrid approach that combines time-series clustering, minimum description length, and dynamic time warping similarity to analyze user needs and provide the best-fit quality-of-service prediction to users across the multi-cloud. Time is treated as an important factor, and the system analyzes changes over time. Furthermore, the proposed method is a shape-based prediction that uses dynamic time warping to cover geographical time zone differences, together with a novel preprocessing method that uses statistically generated semi-real data to compensate for noisy data. The experimental results show predictions very close to the real values observed in practice, with a mean absolute error of about 0.5 on average. For this work, we used the WS-DREAM dataset, which is widely used in this area.

Amin Keshavarzi, Abolfazl Torghi Haghighat, Mahdi Bohlouli 2020 | Iranian Journal of Science and Technology, Transactions of Electrical Engineering
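
As a rough illustration of why dynamic time warping suits series that are shifted in time (for example by time zone differences), the following is a minimal textbook sketch in Python. It is not the authors' implementation; the function name and the sample series are assumptions made for this example only.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical series: the same peak, observed one time step apart.
series_a = [0, 0, 1, 2, 1, 0]
series_b = [0, 1, 2, 1, 0, 0]

# Point-by-point comparison penalizes the shift; DTW warps it away.
pointwise = sum(abs(x - y) for x, y in zip(series_a, series_b))
warped = dtw_distance(series_a, series_b)
```

With these sample values the point-by-point difference is greater than zero while the warped distance is zero, which is the property a shape-based comparison relies on.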

Service monitoring in federated clouds generates large-scale QoS time series data with various unknown, frequent and abnormal patterns. These patterns can be associated with inaccurate resource provisioning, and discovering them helps avoid violations through predictive and preventive actions. Sufficient intelligence in the form of an expert system for decision support is needed in such situations. The main challenge is therefore to efficiently discover unknown frequent and abnormal patterns in the QoS time series data of federated clouds. Moreover, QoS time series data in federated clouds is unlabeled and consists of frequent and abnormal structures. Studies have shown that clustering is the most common and efficient method for discovering interesting patterns and structures in unlabeled data. However, clustering normally carries a time overhead that should be reduced, as well as accuracy issues, mainly in connection with convergence and finding the optimum number of clusters. This work proposes a new genetic-based clustering algorithm that shows better accuracy and speed than state-of-the-art methods. Furthermore, the proposed algorithm can find the optimum number of clusters concurrently with the clustering itself. The accuracy and convergence achieved in the experimental results support its use in expert systems, mainly for resource provisioning and other autonomous decision-making situations in federated clouds. In addition to the scientific impact of this paper, the proposed method can be used in practice by federated cloud service providers.

Amin Keshavarzi, Abolfazl Torghi Haghighat, Mahdi Bohlouli 2020 | Journal of Expert Systems with Applications

Chronic Kidney Disease (CKD) is widely regarded as a serious health threat, especially in developing countries, where receiving proper treatment is very expensive. Early prediction of CKD, which protects the kidney and halts the gradual progress of the disease, has therefore become an important issue for physicians and scientists. The Internet of Things (IoT), a paradigm in which low-cost body sensors and smart multimedia medical devices provide remote monitoring of kidney function, plays an important role, especially where medical care centers are hard to reach for most people. To this end, this paper proposes a diagnostic prediction model for CKD and its severity that uses IoT multimedia data. Since many features influence CKD and the volume of IoT multimedia data is usually very large, different features are selected across different groups of multimedia datasets, based on physicians’ clinical observations and experience and on previous CKD studies, in order to assess the performance of CKD prediction and severity-level determination with different classification techniques. The experimental results reveal that the applied dataset with the proposed selected features yields 97% accuracy, 99% sensitivity and 95% specificity with the decision tree (J48) classifier, compared with the Support Vector Machine (SVM), Multi-Layer Perceptron (MLP) and Naïve Bayes classifiers. The proposed feature set can also improve the execution time in comparison to other datasets with different features.

Mehdi Hosseinzadeh, Jalil Koohpayehzadeh, Ahmed Omar Bali, Parvaneh Asghari, Alireza Souri, Ali Mazaherinezhad, Mahdi Bohlouli, Reza Rawassizadeh
2020 | Multimedia Tools and Applications
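
For readers less familiar with the three reported metrics, the sketch below shows how accuracy, sensitivity and specificity are conventionally derived from a binary confusion matrix. The function name and the counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard definitions from a binary confusion matrix.

    tp/fn: diseased cases predicted as diseased / missed.
    tn/fp: healthy cases predicted as healthy / falsely flagged.
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts chosen only to make the arithmetic easy to follow.
accuracy, sensitivity, specificity = classification_metrics(tp=90, fn=10, tn=80, fp=20)
```

In a screening setting such as this one, sensitivity is usually the number to watch, since a missed case (false negative) tends to cost more than a false alarm.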

Children with autism spectrum disorders (ASDs) show some behavioral disturbances. Usually, they cannot speak fluently. Instead, they use gestures and pointing words to communicate. Hence, understanding their needs is one of the most challenging tasks for caregivers, but early diagnosis can make it much easier. The lack of verbal and nonverbal communication can be eliminated by assistive technologies and the Internet of Things (IoT). IoT-based systems help to diagnose patients and improve their lives by applying Deep Learning (DL) and Machine Learning (ML) algorithms. This paper provides a systematic review of ASD approaches in the context of IoT devices. The main goal of this review is to identify significant research trends in the field of IoT-based healthcare. A technical taxonomy is also presented to classify the existing papers on ASD methods and algorithms. A statistical and functional analysis of the reviewed ASD approaches is provided, based on evaluation metrics such as accuracy and sensitivity.

Mehdi Hosseinzadeh, Jalil Koohpayehzadeh, Ahmed Omar Bali, Farnoosh Afshin Rad, Alireza Souri, Ali Mazaherinezhad, Aziz Rezapour, Mahdi Bohlouli
2020 | The Journal of Supercomputing

Outlier detection has received special attention in various fields, mainly those dealing with machine learning and artificial intelligence. As strong outliers, anomalies are divided into point, contextual and collective outliers. The most important challenges in outlier detection include the thin boundary between the remote points and the natural area, the tendency of new data and noise to mimic the real data, unlabeled datasets, and different definitions of outliers in different applications. Considering these challenges, we defined new types of anomalies, called Collective Normal Anomaly and Collective Point Anomaly, in order to better detect the thin boundary between different types of anomalies. Basic domain-independent methods are introduced to detect these anomalies in both unsupervised and supervised datasets. The Multi-Layer Perceptron Neural Network is enhanced using the Genetic Algorithm to detect the newly defined anomalies with higher precision, so that the test error is lower than that of the conventional Multi-Layer Perceptron Neural Network. Experimental results on benchmark datasets indicated a reduced error in the anomaly detection process in comparison to baselines.

Rasoul Kiani, Amin Keshavarzi, Mahdi Bohlouli 2020 | The Applied Artificial Intelligence Journal

Cloud service providers should be able to predict the future states of their infrastructure in order to avoid any violation of the Service Level Agreement. This becomes more complex when vendors have to deal with services from various providers in multi-clouds. As a result, QoS prediction can significantly help service providers better understand the future states of their resources. Users should also be well aware of their resource needs, as well as the relative Quality of Service values. This paper proposes a hybrid approach to predicting the future values of QoS features. The hybrid approach uses a modified version of the k-medoids algorithm to cluster large time-series datasets, as well as a proposed algorithm inspired by lazy learning and the lower-bound Dynamic Time Warping (LB-Keogh) for pruned DTW computations. The proposed method is a shape-based QoS prediction with a novel pre-processing method, which fills in missing data with statistically generated semi-real data. In order to solve the cold start problem, we propose a new algorithm based on the DTW Barycenter Averaging (DBA) algorithm. The results showed that our predicted values are very close to the real values, achieving a normalized mean absolute error of only 0.35, on average, for the WS-DREAM dataset and 0.07 for the Alibaba dataset.

Amin Keshavarzi, Abolfazl Torghi Haghighat, Mahdi Bohlouli 2019 | Computing Journal
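
To give an idea of how an LB-Keogh lower bound can prune DTW computations, the sketch below shows the generic technique in Python: a cheap envelope-based bound is computed first, and the full banded DTW is only run when the bound cannot rule the candidate out. This is the standard textbook scheme for equal-length series, not the algorithm proposed in the paper; all function names, the window size `r` and the sample data are assumptions for illustration.

```python
import math


def lb_keogh(query, candidate, r):
    """LB_Keogh lower bound on banded DTW (equal-length series, window r)."""
    total = 0.0
    for i, q in enumerate(query):
        # Envelope of the candidate within the warping window around i.
        window = candidate[max(0, i - r): i + r + 1]
        upper, lower = max(window), min(window)
        if q > upper:
            total += (q - upper) ** 2
        elif q < lower:
            total += (q - lower) ** 2
    return math.sqrt(total)


def dtw_banded(a, b, r):
    """DTW with squared cost, restricted to a Sakoe-Chiba band of width r."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def nearest_neighbor(query, candidates, r):
    """Return (index, distance, pruned_count) of the closest candidate."""
    best_dist, best_idx, pruned = float("inf"), None, 0
    for idx, cand in enumerate(candidates):
        # Skip the expensive DTW when the lower bound already exceeds the best.
        if lb_keogh(query, cand, r) >= best_dist:
            pruned += 1
            continue
        dist = dtw_banded(query, cand, r)
        if dist < best_dist:
            best_dist, best_idx = dist, idx
    return best_idx, best_dist, pruned


# Hypothetical data: one shifted copy, one distant series, one exact match.
query = [0, 1, 2, 1, 0]
candidates = [[0, 0, 1, 2, 1], [5, 5, 5, 5, 5], [0, 1, 2, 1, 0]]
best_idx, best_dist, pruned = nearest_neighbor(query, candidates, r=1)
```

The bound costs one linear pass per candidate, whereas banded DTW fills a table of roughly `n * (2r + 1)` cells, so pruning pays off when many candidates are far from the query.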

Language recognition has advanced significantly in recent years through modern machine learning methods such as deep learning and benchmarks with rich annotations. However, research on low-resource formal languages is still limited. This leaves a significant gap in describing colloquial language, especially for low-resource languages such as Persian. To address this gap, we propose a “Large Scale Colloquial Persian Dataset” (LSCP). LSCP is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a comprehensive problem. This encompasses the recognition of multiple semantic aspects in human-level sentences, which are naturally captured from real-world sentences. We believe that further investigation and processing, as well as the application of novel algorithms and methods, can strengthen and enrich the computerized understanding and processing of low-resource languages. The proposed corpus consists of 120M sentences derived from 27M tweets, annotated with parse trees, part-of-speech tags, sentiment polarity and translations in five different languages.

Hadi Abdi Khojasteh, Ebrahim Ansari and Mahdi Bohlouli 2020 | the 12th Language Resources and Evaluation Conference (LREC)

This book presents outstanding theoretical and practical findings in data science and associated interdisciplinary areas. Its main goal is to explore how data science research can revolutionize society and industries in a positive way, drawing on pure research to do so. The topics covered range from pure data science to fake news detection, as well as Internet of Things in the context of Industry 4.0. Data science is a rapidly growing field and, as a profession, incorporates a wide variety of areas, from statistics, mathematics and machine learning, to applied big data analytics. According to Forbes magazine, “Data Science” was listed as LinkedIn’s fastest-growing job in 2017. This book presents selected papers from the International Conference on Contemporary Issues in Data Science (CiDaS 2019), a professional data science event that provided a real workshop (not “listen-shop”) where scientists and scholars had the chance to share ideas, form new collaborations, and brainstorm on major challenges; and where industry experts could catch up on emerging solutions to help solve their concrete data science problems. Given its scope, the book will benefit not only data scientists and scientists from other domains, but also industry experts, policymakers and politicians.

Mahdi Bohlouli, Bahram Sadeghi Bigham, Zahra Narimani, Mehdi Vasighi, Ebrahim Ansari 2020 | Springer Verlag, Berlin