
Supervised and Semi-Supervised Learning in Vision using Deep Neural Networks

Roychowdhury, Soumali (2019) Supervised and Semi-Supervised Learning in Vision using Deep Neural Networks. Advisor: Diligenti, Prof. Michelangelo. Coadvisor: De Nicola, Prof. Rocco. pp. 312. [IMT PhD Thesis]

Roychowdhury_phdthesis.pdf - Published Version
Restricted to IMT staff and National library only until July 2022.

Abstract

Deep learning has been highly successful in vision tasks such as classification, object detection and segmentation, making it possible to go from raw pixels to integrated deep neural models. This thesis addresses some of these vision problems using several deep neural network architectures. The first and core part of the thesis focuses on a learning framework that extends previous work on Semantic Based Regularization (SBR) to integrate prior knowledge into deep learners. Deep neural networks are empirical learners and therefore depend heavily on labeled examples, whereas knowledge-based learners are not very efficient at solving complex vision problems. SBR is therefore designed as a semi-supervised framework that tightly integrates empirical learners with any available background knowledge, gaining the advantages of learning from both perception and reasoning/knowledge. The framework is learner agnostic, so any learning machinery can be used. Earlier work on SBR used kernel machines or shallow networks as learners. The approach to the problem and the concept of using multi-task logic functions are borrowed from that earlier work, but this research integrates logic constraints with deep neural networks for the first time. The thesis defines a novel backpropagation schema for optimizing deep neural networks in SBR and uses several heuristics to integrate convex and concave logic constraints into the deep learners. It also presents extensive experimental evaluations on multiple image classification datasets, showing how integrating prior knowledge into deep learners can boost the accuracy of several neural architectures over their individual counterparts. SBR is also applied to a video classification problem: automatically annotating surgical and non-surgical tools in videos of cataract surgery. 
This framework achieves high accuracy compared to human annotators and the state-of-the-art DResSys by enforcing temporal consistency among consecutive video frames, using prior knowledge in deep neural networks through collective classification at inference time. DResSys uses an ensemble of deep convolutional neural networks together with a Markov Random Field (CNN-MRF), whereas SBR replaces the MRF graph with logical constraints to enforce regularization in the temporal domain. SBR and DResSys, the two deep learning based frameworks discussed in this thesis, are thus able to distill prior knowledge into deep neural networks and become useful tools for intraoperative decision support in cataract surgery, for report generation, for surgical training and so on. The first part of the thesis therefore designs scientific frameworks that exploit the wealth of domain knowledge and integrate it with deep convolutional neural networks to solve real-world vision problems, and that can be used in several industrial applications. Today, many businesses hold huge visual databases that are difficult to manage and use. Without an effective method to make sense of the visual data, it may end up uncategorized and useless. If a visual database contains no metadata about its images or videos, categorizing it is a huge hassle. Classifying images and videos with useful domain information through unified frameworks like SBR is a key solution. The second part of the thesis focuses on another vision problem, image segmentation, and is more application-specific. However, it can still be viewed as applying some universal and basic domain knowledge techniques with deep learning models. 
It designs two deep learning based frameworks and compares the two approaches head to head in terms of speed, efficiency and cost. The frameworks are built for automatic segmentation and classification of contaminants for cleanliness analysis in the automobile, aerospace or manufacturing industries. They are designed to meet the foremost industry requirement: an end-to-end solution that is cheap, reliable, fast and accurate compared with the traditional techniques currently used in contaminant analysis and quality control. When integrated with simple optical acquisition systems, these end-to-end solutions will help replace the expensive, slow systems currently on the market.
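
The idea of turning prior knowledge into a training signal, as the abstract describes for SBR, can be illustrated with a minimal sketch. The code below is not taken from the thesis: it assumes the Łukasiewicz fuzzy-logic semantics as one possible way to relax a logic rule into a penalty, and the function names, the example rule "A(x) => B(x)" and the frame-to-frame rule "tool(t) <=> tool(t+1)" are hypothetical. The inputs are assumed to be network outputs in [0, 1].

```python
def implication_penalty(p_premise, p_conclusion):
    """Violation of 'A => B' under the (assumed) Lukasiewicz semantics.

    The truth value of the implication is min(1, 1 - a + b);
    the penalty is one minus that truth value.
    """
    truth = min(1.0, 1.0 - p_premise + p_conclusion)
    return 1.0 - truth


def constraint_loss(premise_scores, conclusion_scores):
    """Average violation of 'for all x: A(x) => B(x)' over a batch.

    No labels are needed, so unlabeled examples can contribute too.
    """
    penalties = [
        implication_penalty(a, b)
        for a, b in zip(premise_scores, conclusion_scores)
    ]
    return sum(penalties) / len(penalties)


def temporal_consistency_loss(frame_scores):
    """Average violation of the hypothetical rule 'tool(t) <=> tool(t+1)'.

    Under the Lukasiewicz semantics the biconditional has truth value
    1 - |a - b|, so the penalty for one pair of frames is |a - b|.
    """
    pairs = list(zip(frame_scores, frame_scores[1:]))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)
```

In a semi-supervised setting such a penalty would typically be added to the usual supervised loss with a weighting factor, so that labeled examples and background knowledge both shape the learner. The weight and the exact form of the combined objective are left open here, since the abstract does not specify them.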

Item Type: IMT PhD Thesis
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
PhD Course: Computer science
Identification Number: 10.6092/imtlucca/e-theses/273
Date Deposited: 31 Jul 2019 14:32
URI: http://e-theses.imtlucca.it/id/eprint/273
