DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture | Scientific Reports

In the post-genomic era, although an abundance of data is accessible, the information is spread indiscriminately across a high-dimensional data space, making it challenging to differentiate phenotypes. The same problem of associating relevant features with a class label arises for other kinds of data (e.g. vowels, text). It becomes critical to arrange elements in a manner that enables the extraction of relevant features for analysis. Accordingly, arranging the information, by sorting and positioning the elements in the right order for the subsequent step, becomes an important phase. We refer to this phase as the element arrangement step. The identification or classification of phenotypes or class labels can conceivably be improved by following three steps: element arrangement, feature extraction, and development of a suitable classifier.

Conventional machine learning (ML) techniques for classification or detection problems require a sample in the form of a feature vector (i.e., a column vector of size p × 1). This feature vector, obtained from a feature extraction technique, is processed and assigned to one of the defined groups. ML techniques generally treat the features in this vector as mutually independent, particularly with respect to their order of appearance. Consequently, changing the order of features has no direct impact on classification or phenotype detection, which makes the element arrangement step redundant for many state-of-the-art ML classifiers such as random forest1,2 and decision trees3. However, the reliability of ML techniques depends on the feature extraction technique.
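The order-independence claim can be illustrated with a small sketch. It uses a nearest-centroid classifier as a stand-in (an assumption of this example, not one of the classifiers cited above), synthetic Gaussian data, and a hypothetical fixed permutation `order`. Applying the same permutation to the training and test features should leave the predictions unchanged.

```python
import random

def fit_centroids(X, y):
    """Return the per-class mean feature vector."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict(model, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, model[lab])))

def permute(rows, order):
    """Reorder the features of every sample according to `order`."""
    return [[row[j] for j in order] for row in rows]

def make_data(rng, n_per_class, p):
    """Synthetic two-class data: class 0 centred at 0.0, class 1 centred at 3.0."""
    X, y = [], []
    for label, mu in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mu, 1.0) for _ in range(p)])
            y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(rng, 20, 5)
X_test, _ = make_data(rng, 10, 5)

order = [3, 0, 4, 1, 2]  # hypothetical feature permutation

model = fit_centroids(X_train, y_train)
preds_original = [predict(model, row) for row in X_test]

model_perm = fit_centroids(permute(X_train, order), y_train)
preds_permuted = [predict(model_perm, row) for row in permute(X_test, order)]

print(preds_original == preds_permuted)
```

The distance to each centroid is a sum over features, so reordering the terms changes nothing beyond floating-point rounding. That is the sense in which element arrangement carries no information for such classifiers.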

On the other hand, the convolutional neural network (CNN) architecture, a type of deep neural network, accepts a sample as an image (i.e. a matrix of size m × n) and performs feature extraction and classification via hidden layers (such as convolutional layers, ReLU layers, and max-pooling layers). It does not require additional feature extraction techniques, as it automatically derives features from the raw elements. The second advantage is that it finds higher-order statistics of the image and nonlinear correlations. Third, convolutional neurons process data only within their receptive fields (restricted subareas), which removes the need for a very large number of neurons for large input sizes and therefore enables the network to be much deeper with fewer parameters4. Another distinguishing attribute of CNNs is weight sharing; i.e., many receptive fields share the same weights and biases (or filter), which reduces the memory footprint compared with conventional neural networks. The CNN architecture deals with images effectively and shows promising accuracy for industrial applications (such as driverless cars). An image consists of spatially coherent pixels in a local region; i.e., pixels close to each other share similar information. Consequently, arbitrarily arranged pixels can adversely affect the feature extraction and classification performance of the CNN architecture. Therefore, unlike the features used by ML techniques, the neighboring pixels in an image used by a CNN are not independent of their order. Because CNNs process a collection of neighboring pixels at a time, they capture additional information that ML techniques, which use features individually, do not. Credit for this success also goes to hardware advancements such as GPUs, which allow very complex models to be trained much faster and more affordably. In addition, the development of new deep learning architectures and libraries enables models to be built and trained rapidly. Fortunately, for CNNs, captured images are generally depictions of physical objects and do not require rearrangement of pixels, as camera lenses project the corresponding shades of objects onto the right pixels.
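The effect of weight sharing on the parameter count can be made concrete with simple arithmetic. The sketch below compares a fully connected layer with a convolutional layer on the same input. The image size (100 × 100, one channel), the 64 units or filters, and the 3 × 3 kernel are illustrative assumptions, not values taken from this paper.

```python
def dense_params(height, width, units):
    """Fully connected layer: one weight per (pixel, unit) pair plus one bias per unit."""
    return height * width * units + units

def conv_params(kernel, in_channels, filters):
    """Convolutional layer: each filter shares one kernel across all positions,
    so the count does not depend on the image size."""
    return kernel * kernel * in_channels * filters + filters

# Hypothetical single-channel 100 x 100 input, 64 units / 64 filters, 3 x 3 kernels.
dense = dense_params(100, 100, 64)
conv = conv_params(3, 1, 64)

print(dense)  # 640064
print(conv)   # 640
```

Under these assumptions the convolutional layer needs about a thousand times fewer parameters, and its count would stay the same if the input image grew.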

Much data, such as genomic, transcriptomic, methylation, mutation, text, spoken-word, financial, and banking data, is in non-image form, and ML techniques are predominantly used in these fields. CNNs cannot be applied directly because they require an image as input. However, if we can transform non-image data into a well-organized image form, then CNNs can be used to achieve higher classification performance. For this, we need to develop a method that performs element arrangement effectively. To improve the detection rate, we integrated all three steps (element arrangement, feature extraction and classification) in the proposed DeepInsight method. DeepInsight constructs an image by placing similar elements or features together and dissimilar ones further apart, enabling the collective use of neighboring elements. This collective approach to element arrangement can be useful in uncovering hidden mechanisms (e.g. pathways) or understanding the relationships among a set of features (e.g. for texts, vowels). Therefore, converting a sample to an image by placing similar features (or raw elements) in clusters is more meaningful and robust than dealing with individual features (ignoring neighborhood information), as important information (from weak elements) can be integrated. This has the potential to reveal the relative importance of features towards a target or outcome. Element arrangement is key to unlocking crucial information, and it is pertinent to consider strategies that may retrieve more information from a given dataset. Furthermore, DeepInsight allows feature extraction and classification via a CNN. This increases the versatility of CNNs by opening them to non-image cases and thus generalizes their applicability. We show in this paper that DeepInsight is useful for various kinds of data, such as gene expression, vowel, text, and artificial data.
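The idea of placing similar features close together in an image can be sketched as follows. This is a minimal illustration under stated assumptions, not the DeepInsight implementation: it uses linear PCA (computed by power iteration) on feature profiles to obtain 2-D feature locations, scales them onto a hypothetical 8 × 8 grid, and averages features that land on the same pixel. All function names and the synthetic data are invented for the example.

```python
import math
import random

def top_eigvec(G, iters=500, seed=0):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric PSD matrix."""
    rng = random.Random(seed)
    p = len(G)
    v = [rng.uniform(-1, 1) for _ in range(p)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(G[i][j] * v[j] for j in range(p)) for i in range(p)]
        lam = math.sqrt(sum(x * x for x in w))
        if lam == 0.0:
            break
        v = [x / lam for x in w]
    return v, lam

def feature_coordinates(X):
    """2-D location for each feature from linear PCA on feature profiles."""
    n, p = len(X), len(X[0])
    # Row j of F is the profile of feature j across the n samples.
    F = [[X[i][j] for i in range(n)] for j in range(p)]
    mean = [sum(F[j][i] for j in range(p)) / p for i in range(n)]
    F = [[F[j][i] - mean[i] for i in range(n)] for j in range(p)]
    # Gram matrix between features; its eigenvectors give the PCA scores.
    G = [[sum(a * b for a, b in zip(F[r], F[c])) for c in range(p)] for r in range(p)]
    v1, l1 = top_eigvec(G, seed=0)
    # Deflate the first component, then extract the second.
    G2 = [[G[r][c] - l1 * v1[r] * v1[c] for c in range(p)] for r in range(p)]
    v2, l2 = top_eigvec(G2, seed=1)
    return [(math.sqrt(l1) * v1[j], math.sqrt(l2) * v2[j]) for j in range(p)]

def pixel_map(coords, size):
    """Scale 2-D coordinates onto integer pixel positions of a size x size grid."""
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    def scale(v, lo, hi):
        return 0 if hi == lo else round((v - lo) / (hi - lo) * (size - 1))
    return [(scale(x, min(xs), max(xs)), scale(y, min(ys), max(ys))) for x, y in coords]

def to_image(sample, pixels, size):
    """Write one sample's feature values into the grid, averaging collisions."""
    total = [[0.0] * size for _ in range(size)]
    count = [[0] * size for _ in range(size)]
    for value, (cx, cy) in zip(sample, pixels):
        total[cy][cx] += value
        count[cy][cx] += 1
    return [[total[r][c] / count[r][c] if count[r][c] else 0.0
             for c in range(size)] for r in range(size)]

# Synthetic data: features 0-2 follow one latent signal, features 3-5 another.
rng = random.Random(42)
X = []
for _ in range(30):
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    X.append([a + rng.gauss(0, 0.05) for _ in range(3)]
             + [b + rng.gauss(0, 0.05) for _ in range(3)])

coords = feature_coordinates(X)
pixels = pixel_map(coords, size=8)
image = to_image(X[0], pixels, size=8)
print(pixels)
```

In this toy setting, the three features of each correlated group should land near one another and far from the other group, so a convolutional filter would see related features within the same receptive field. The feature locations are computed once from the training data and reused for every sample.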

Different versions of CNNs have been proposed to deal with images effectively5,6,7,8,9,10,11,12,13,14,15,16. For example, He et al.8 proposed a residual network architecture to make it easier to train very deep networks. They used residual networks up to 152 layers deep on the ImageNet dataset. Singh et al.17 developed a CNN-based technique to classify gene expression using histone modification data as input. Liu et al.18 used tumor gene expression samples as column vectors and employed a 1-dimensional CNN to perform classification. They did not convert samples to images. Zeng et al.19 applied a CNN to extract features from in situ hybridization gene expression patterns. The input samples were natural images. Gao et al.20 converted DNA sequences into 4-dimensional binary codes. These binary codes are arranged according to the DNA sequence and then fed to a CNN to predict polyadenylation sites. Xu et al.21 applied a CNN to text hashing, where texts are converted into binary codes and then fed to 1-dimensional convolution; i.e., these features are not treated as images in the convolutional layers. Zhang et al.22 treated text as a raw signal and applied a 1-dimensional CNN for classification. Lyu and Haque23 recently applied a CNN to RNA-seq data by first performing gene selection and then constructing an image based on chromosome location. This method is perhaps the first to convert gene expression into image samples and apply a CNN for classification. Since this method requires chromosome location information, it cannot be used for other kinds of datasets. Most of the methods discussed above either apply images as input to a CNN or use a 1-dimensional CNN. Therefore, little literature is available on general-purpose conversion of non-image samples to images for use with CNNs.