Analyzing the Influence of Contrast in Large-Scale Recognition of Natural Images

Ángel Sánchez a,*, A. Belén Moreno a, Daniel Vélez b and José F. Vélez a
a Department of Computer Science and Statistics, Universidad Rey Juan Carlos, Campus de Móstoles, c/ Tulipán, s/n, 28933 Móstoles (Madrid), Spain
b Department of Statistics and Operations Research I, Universidad Complutense, Ciudad Universitaria, Plaza Ciencias, 3, 28040 Madrid, Spain

Abstract. This paper analyzes both the isolated influence of illumination quality in 2D facial recognition and the influence of contrast measures in the large-scale recognition of low-resolution natural images. First, using the Yale Face Database B, we show that, by separately estimating the illumination quality of facial images (through a fuzzy inference system that combines the average brightness and global contrast of the patterns) and by recognizing the same images using a multilayer perceptron, there exists a nearly-linear correlation between illumination and recognition results. Second, we introduce a new contrast measure, called Harris Points Measured Contrast (HPMC), which assigns contrast values to images more consistently with their recognition rates than the other global and local contrast analysis methods compared. For our experiments on image contrast analysis, we use the CIFAR-10 dataset, with 60,000 images, and convolutional neural networks as classification models. Our results can be used to decide whether a given test image, according to the contrast calculated with the proposed HPMC metric, is worth using for further recognition tasks.

Keywords: Image quality assessment, contrast measures, multilayer perceptron, convolutional neural network, 2D face images, CIFAR-10 images.

* Corresponding author. E-mail: angel.sanchez@urjc.es

1. Introduction

Digital image processing can introduce different types of distortions in images during acquisition, transmission, storage or compression, which may result in a loss of visual quality [46]. These distortions include the addition of Gaussian noise, illumination and contrast variations, shadows, blurring, or artifacts produced by compression algorithms. Subjective estimation of image quality is a tedious task when the number of analyzed images is high. Consequently, objective image quality assessment (IQA) measures [14, 23, 46] have been proposed, and these must agree with the human visual system. The most traditional quality measures include the mean-squared error (MSE) and the peak signal-to-noise ratio (PSNR), which work pixel-wise and compute differences of intensity values at the same image positions between the original image and the corresponding distorted one.

Image quality does not depend on a single factor but is a composite of several, such as contrast, brightness, blur, noise and artifacts [30]. According to Rahman et al. [32], images can be broadly grouped into two categories: those having a sufficient signal-to-noise ratio and those not having it. The former present brightness and contrast levels good enough to be processed directly, while the latter have poor levels and would require an image enhancement stage.

Illumination changes in a scene produce many complex effects on the image of its objects [13, 44]. For the case of facial images, different methods and measures have been proposed to determine how the images are affected by the illumination conditions [34, 53].
Such variations in the appearance of faces can be even larger than those caused by a change of personal identity [16]. Adini, Moses and Ullman [1] studied these important image variations due to changes in illumination direction. Many 2D and 3D standard face datasets consider the different types of facial variability in order to evaluate the merits of the analyzed face recognition algorithms [13, 53]. However, to the best of our knowledge, there are no studies that quantify how the isolated illumination level of the patterns affects the 2D recognition of these faces. In the first part of this paper, we evaluate the effect of illumination on facial recognition results without considering other possible intrinsic conditions (such as pose, expressions or partial occlusions of the faces) or extrinsic ones (such as indoor/outdoor scenarios, different camera orientations or varying scales). For our purpose, it is important to select an appropriate test database in which the illumination conditions of the patterns can be isolated from other types of facial variations.

In the second part of the paper, we extend our study to the influence of image contrast in the recognition of other classes of natural images. Image contrast is perhaps the most important factor in any subjective evaluation of image quality [14, 23]. Experimental studies show that our visual system is more sensitive to contrast than to absolute luminance [29, 12]. Contrast is produced by the difference in luminance reflected from two adjacent surfaces, and it is what makes an object distinguishable from other objects and from the background. It is well known that factors such as image resolution, lighting conditions, viewing distance or image content affect how human observers perceive contrast (i.e. the perceived contrast). Since subjective contrast analysis is not practical in real-world applications, a number of objective contrast metrics, which aim to mimic how humans evaluate image contrast, have been proposed in the literature [19, 39, 23]. A recent survey by Simone et al. [39] analyzes a collection of proposed perceptual contrast measures for digital images. These metrics are roughly classified as global or local. Global metrics are based on absolute luminance values (i.e. the darkest and brightest values of the whole image), while local ones consider the relations among local luminance variations. In general, global contrast metrics are more compact and easier to compute than local ones. However, they have difficulties describing the perceived contrast since, for the human visual system, absolute brightness differences are less important than the relative brightness relations between pixels and their close local surroundings [39]. Some well-studied global contrast measures are: the Michelson formula [27], which uses the maximum and minimum luminance values; the Whittle metric [47], which considers the mean luminance of the image; and the Calabria and Fairchild equation [6], which models the perceived contrast as a linear relation among image chroma, lightness and high-pass filtered lightness.
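As a concrete illustration of the simplest global measure mentioned above, the following minimal sketch (our own, not taken from the original paper; the function name and the small epsilon used to avoid division by zero are assumptions) computes the Michelson contrast of a grayscale image with NumPy.

    import numpy as np

    def michelson_contrast(gray):
        """Global Michelson contrast: (Lmax - Lmin) / (Lmax + Lmin).
        `gray` is a 2D array of luminance values."""
        lmax = float(gray.max())
        lmin = float(gray.min())
        return (lmax - lmin) / (lmax + lmin + 1e-12)  # epsilon avoids 0/0

    # Example: a synthetic 32x32 horizontal gradient has near-maximal global contrast.
    img = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
    print(michelson_contrast(img))  # close to 1.0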
Some of the most widely used local contrast measures include the following. Peli's pyramidal frequency-band contrast [31] calculates, for each image band, the contrast as the ratio of the band-pass filtered image (at that frequency) to the low-pass filtered one (at an octave below the same frequency, which corresponds to the local luminance mean). Ahumada and Beard [2] measure the local contrast at every pixel by dividing the image convolved with a Gaussian filter by the image convolved twice with the same Gaussian filter, and subtracting one from the quotient at each position. Tadmor and Tolhurst [42] used the Difference of Gaussians (DOG) model as the basis of their contrast analysis. Their proposed contrast measure is computed as the average local contrast of 1000 pixel locations (for images of 256×256) taken at random, ensuring that the center mask (of the receptive field) and the surround mask do not exceed the edges of the image. Rizzi et al. [35] proposed a pyramid-based metric, called RAMM, which transforms the original image to the CIELab color space [15] and subsamples it into several levels, computes the 8-neighborhood local contrast at each pixel for each scale, and finally obtains the overall contrast by averaging the contrast values of the different pyramid levels.

Many algorithms for image contrast enhancement have been proposed in image processing applications [12, 37, 33, 26], since this improvement increases image quality and produces better results for further processing tasks (mainly visualization or recognition). Unlike other works [51, 1, 36], where poor illumination conditions (and more specifically the contrast) are corrected to improve image recognition, in this study we also aim to quantify how the isolated measured contrast of an image influences its recognition result.

In image classification, an image is assigned to a category according to its visual content [41]. This process is essential for bridging the semantic gap between image pixels and the objects present in the image. Although the task can, in general, be easily performed by humans, it is still a complex problem for computers. The two major categories of image classification techniques are supervised and unsupervised methods. In supervised learning the classes are known a priori and a function is inferred from labeled training examples. Conversely, in unsupervised learning the problem consists of finding the hidden structure of unlabeled data. Different surveys on techniques for object class recognition in images can be found in the literature [10, 25].

Nowadays, large-scale image classification [24, 49] is becoming an important challenge in pattern recognition. Most works on image classification have focused on medium-size datasets (i.e. data that fit into the memory of a desktop computer). However, large datasets like ImageNet [7], SVHN [28] or CIFAR-10/100 [17] have recently attracted the attention of many researchers, since these datasets focus the challenge on how to achieve efficiency in both feature extraction and classifier training without affecting the overall recognition results. To learn from datasets with hundreds of thousands or even millions of images, it is necessary to develop effective computational models with a large learning capacity. The deep learning paradigm [4, 8, 21] is a modern branch of machine learning which aims to create and learn better representations from large-scale unlabeled data. Related to this paradigm, Convolutional Neural Networks (CNN) [20] are neuroscience-inspired models, variants of multilayer perceptrons (MLP), which are designed by enforcing local sparse connections and tied weights, followed by some form of pooling that produces translation-invariant features.
CNN are easy to train and have fewer parameters than fully connected networks with the same number of hidden units. Unlike other classification models (e.g. SVM, boosting, decision trees) in which manually designed features are extracted from the images (e.g. using methods like SIFT, LBP or HOG [41, 16]), with CNN the image features are "learned" by the architecture itself. Consequently, the joint learning of features and classifier carried out by the CNN makes processing faster for applications with large datasets.

This paper extends our previous work [38] on the influence of illumination conditions in 2D facial recognition using multilayer perceptrons (MLP). In that paper, we detected a nearly-linear correlation between the illumination quality and the recognition rates of the facial images tested. Now, we focus more precisely on how the contrast of an image is related to its recognition result, using a CNN as the classification technique. Due to the computational complexity of convolutional neural networks (CNN), low-resolution images are commonly used to train these feature-learning systems. One issue that arises when working with tiny (e.g. 32×32) images is whether local or global methods are more convenient for measuring their contrast. We have analyzed different contrast measures on low-resolution images and propose a new hybrid interest-point-based measure which better defines their contrast. Results show an improvement in the correlation between measured contrast and recognition for such images with respect to the other global and local measures analyzed.

The rest of the paper is organized as follows: Section 2 summarizes the implemented system to analyze the quantitative influence of illumination level on the 2D facial recognition result. Section 3 generalizes our approach to natural low-resolution images in order to evaluate how, specifically, the contrast of these images affects their recognition results, using convolutional neural networks as classifiers. In Section 4, we describe the databases and the experiments performed on the two systems of Sections 2 and 3. Finally, Section 5 outlines the conclusions of this study.

2. Influence of illumination in 2D facial recognition

Fig. 1 shows a block diagram of the proposed system designed to automatically determine how illumination quality and recognition results are related for 2D face images. Unlike other works, we neither compensate for nor correct the illumination effect by making assumptions about the light source, nor do we carry out any illumination normalization stage [13, 22]. We used the images of the Yale Face Database B [11] (described in subsection 4.1) as input data. Our approach includes two subsystems, which work separately on the facial images: the recognition estimation subsystem and the illumination estimation subsystem. After that, both components are combined in order to determine a trend relation between illumination and recognition using a regression analysis approach.

In the considered 2D facial model [52], a collection of twenty relevant anthropometric facial feature points, corresponding to a subset of those defined by the MPEG-4 standard, was automatically extracted from each facial image. For each of these points, a 21×21-sized square texture region, centered on the respective feature point [43], was selected to create the input pattern vectors of each 2D face image. The appropriate size of the regions was determined through experimentation.
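To make the pattern construction concrete, the sketch below (our own illustration, with hypothetical variable names) extracts the 21×21 texture windows centered on the twenty detected feature points and concatenates them in a fixed order, as detailed in the next subsection; it assumes the points lie far enough from the image border.

    import numpy as np

    def build_feature_vector(gray_face, points, half=10):
        """Concatenate, in a fixed point order, the pixels of the 21x21
        windows (21 = 2*half + 1) centered on the facial feature points.
        `gray_face` is a 2D grayscale image and `points` a list of twenty
        (row, col) coordinates, both assumed to be given."""
        windows = []
        for (r, c) in points:
            w = gray_face[r - half:r + half + 1, c - half:c + half + 1]
            windows.append(w.astype(np.float32).ravel())
        return np.concatenate(windows)  # length 20 * 21 * 21 = 8820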
2.1. Recognition estimation subsystem

The input feature vectors describing each face image were built by concatenating, in a fixed order, the texture region pixels of the selected facial points. We performed the experiments with three different types of classification methods: K-Nearest Neighbors (KNN), Support Vector Machines (SVM) and Multilayer Perceptron neural networks (MLP). Moreover, we tested two configurations of 2-hidden-layer MLP networks: the fully connected MLP (FC-MLP), where all neurons of the input layer are connected to all neurons of the first hidden layer, and the non-fully connected MLP (NFC-MLP), where the input neurons are grouped into blocks (each block corresponding to a facial feature region) and these blocks are separately connected to different and disjoint sets of neurons in the first hidden layer. We also applied a k-fold cross-validation method for all tested classifiers.

[Fig. 1. Block diagram of the proposed system to analyze the influence of illumination in 2D facial recognition.]

2.2. Illumination estimation subsystem

To automatically determine the illumination quality of a 2D facial image [34], we computed the corresponding normalized 8-bin histogram for each feature-point window w_i (i = 1..20) of the image I. After that, the average brightness I_av and the contrast C of each window were calculated as follows:

(1)  I_av(w_i) = (1/8) · Σ_{hist=1..8} I_hist(w_i)

(2)  C(w_i) = [I_max(w_i) − I_min(w_i)] / [I_max(w_i) + I_min(w_i)]

where I_hist(w_i) is the frequency value of each histogram bin, and I_max(w_i) and I_min(w_i) are the maximum and minimum values of the 8-bin histogram used, respectively. Note that Eq. (2) recalls the Michelson contrast measure [27].

To determine the illumination quality of each feature window, we first fuzzify the values of the two input variables I_av(w_i) and C(w_i). Triangular-shaped fuzzy sets were used for the input variables. Then, a zero-order Takagi-Sugeno (TS-0) fuzzy inference system [40] was created by considering the fuzzy rules presented in Table 1.

Table 1. Fuzzy inference rules for determining image quality using the brightness and contrast input variables.

                           Contrast
    Brightness    Very Low   Low   Medium   High   Very High
    Very Bad      F          F     F        F      E
    Bad           F          F     E        D      D
    Fair          E          D     C        C      B
    Good          D          C     B        B      A
    Very Good     D          C     B        A      A

The output variable in this table represents the illumination quality, and it classifies each feature window into six possible categories labeled from 'A' to 'F'. Class 'A' corresponds to the "best" illumination for a feature window, while class 'F' corresponds to the "worst" one. The defuzzification process is carried out using the weighted average method. Each feature window is then given a score, which is highest for class 'A' and lowest for class 'F'. In particular, class 'A' was assigned a score of 5, class 'B' a score of 4, and so on, down to class 'F', which was scored with 0. In this way, the objective quality of each given 2D facial image, represented as a sequence of 20 feature-centered windows, is a normalized integer value between 0 and 100, computed as the sum of the scores of the 20 considered feature windows.
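The per-window computations of Eqs. (1) and (2), and the final 0-100 image score, are straightforward; the sketch below is our own (helper names are assumptions) and deliberately omits the fuzzy inference step that maps each (brightness, contrast) pair to a quality class.

    import numpy as np

    SCORES = {'A': 5, 'B': 4, 'C': 3, 'D': 2, 'E': 1, 'F': 0}

    def window_brightness_and_contrast(window):
        """Average brightness (Eq. 1) and contrast (Eq. 2) of a feature
        window, both computed from its normalized 8-bin histogram, as
        stated in the text (sketch only)."""
        hist, _ = np.histogram(window, bins=8, range=(0, 256))
        hist = hist / hist.sum()                                    # normalized 8-bin histogram
        i_av = hist.mean()                                          # Eq. (1)
        c = (hist.max() - hist.min()) / (hist.max() + hist.min())   # Eq. (2)
        return i_av, c

    def image_quality_score(window_classes):
        """Sum of the per-window scores ('A'..'F') over the 20 feature
        windows; the result lies between 0 and 100."""
        return sum(SCORES[c] for c in window_classes)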
3. Influence of contrast in the recognition of low-resolution natural images

Fig. 2 shows the block diagram of the proposed system to analyze the influence of contrast in low-resolution natural image recognition. The whole CIFAR-10 dataset [17, 18] (described in subsection 4.1) was used as input data. CIFAR-10 contains color images at a very low resolution (32×32), and no feature extraction is applied to them.

3.1. Recognition estimation subsystem

An adaptation of the Convolutional Neural Network (CNN) described in [9] was trained and then tested to predict the recognition result for each test image. The topology of our network is shown in Fig. 3. CNN take translated versions of the same basis function and "pool" over them to build translation-invariant features. By sharing the same basis function across different image locations (i.e. weight-tying), CNN have significantly fewer learnable parameters. A CNN consists of several layers, which can be of three different types: convolutional, sub-sampling (also called max-pooling) and fully-connected. Convolutional layers consist of a rectangular grid of neurons, where each neuron takes inputs from a rectangular section of the previous layer, and the weights of this rectangular section are the same for every neuron in the layer. This kind of layer therefore computes a convolution of the previous layer, where the weights specify the convolution filter applied. In addition, each convolutional layer may contain several grids (or feature maps), where each grid takes inputs from all the grids of the previous layer using potentially different filters. Sub-sampling layers take small rectangular blocks from the convolutional layers and subsample each block to produce a single output. Fully-connected layers connect all neurons of the previous layer to each of their output neurons; these layers are no longer spatially arranged.

Our CNN takes the three 32×32 RGB color planes (feature maps) of the original image as inputs. The lower layers of the CNN are composed of alternating convolution and sub-sampling layers, and finally a fully-connected MLP network (with a hidden layer of 100 neurons and an output layer of 10 neurons) completes the architecture (as shown in Fig. 3). It is important to remark that the designed CNN has not been optimized to improve the recognition results; our main goal was to have an effective recognition system available for a huge number of images.

[Fig. 2. Block diagram of the proposed system to analyze the influence of contrast in low-resolution natural image recognition.]
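The CNN of Fig. 3 was implemented with Theano in the original work; purely as an illustrative sketch of a comparable topology (the feature-map counts, pooling sizes and tanh activations are our assumptions inferred from the figure description, not confirmed by the paper), an equivalent network could be written in PyTorch as follows.

    import torch.nn as nn

    # Rough analogue of the CNN in Fig. 3: 3x32x32 input, two conv/pool
    # stages (30x30 -> 15x15 -> 13x13 -> 4x4 maps), then an MLP with a
    # 100-neuron hidden layer and 10 outputs (one per CIFAR-10 class).
    cnn = nn.Sequential(
        nn.Conv2d(3, 4, kernel_size=3),   # 3x32x32 -> 4x30x30 (4 maps assumed)
        nn.Tanh(),
        nn.MaxPool2d(2),                  # -> 4x15x15
        nn.Conv2d(4, 8, kernel_size=3),   # -> 8x13x13 (8 maps assumed)
        nn.Tanh(),
        nn.MaxPool2d(3),                  # -> 8x4x4
        nn.Flatten(),
        nn.Linear(8 * 4 * 4, 100),        # fully-connected hidden layer
        nn.Tanh(),
        nn.Linear(100, 10),               # output layer: 10 classes
    )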
3.2. Contrast estimation subsystem

The contrast quality confidence of a given test image is computed using the proposed measure presented in the pseudo-code of Fig. 4. Our metric is called Harris Points Measured Contrast (HPMC) and works in two main stages, as follows.

First, the original image is converted into a gray-level one, and a local detector of interest points (in our case, the Harris corner detector [41]) is used to extract the most significant features of the image. These interest or key points have clearly defined spatial positions in the image space. They are rich in information content (i.e. they present a high gradient value in more than one direction), and they are also stable under local and global changes in the image domain. After computing these feature points, the N "strongest" ones (according to a scalar interest measure) are selected.

[Fig. 3. Architecture of the convolutional neural network (CNN) used to recognize low-resolution natural images: 3×32×32 RGB input; alternating 3×3 convolution and sub-sampling layers producing 30×30, 15×15, 13×13 and 4×4 feature maps; and a final fully-connected MLP with 100 hidden neurons and 10 output neurons.]

Second, the local contrast value of each selected interest point is computed using Eq. (2) over a 3×3 window centered at the point. Finally, the image contrast is computed by averaging the local contrast values of the selected interest points.

The complexity of our HPMC algorithm is bounded by the image size n (in pixels), the number m of interest points detected, and the inherent complexity of corner detection by the Harris method, which depends on a chosen window size w. In consequence, the whole complexity of the proposed HPMC metric is given by O(n) + O(m) + O(n·w) = O(max(m, n·w)). Since w and m are much smaller than n, the final complexity can be reduced to O(n).

We have introduced this interest-point-based contrast measure (HPMC) for two main reasons. First, interest points have been successfully used in content-based image retrieval and object recognition applications [48, 3]. Second, since the tiny images of the CIFAR-10 database present only one object class per image and are assumed to be easily classified without ambiguity by a human [17], the perceived contrast of these images can be roughly defined by the contour points separating the object from the background.

    function HPMC_Analysis(image: GrayImage) → float
        // Compute the five local-maximum Harris points in the image
        harrisPoints = HarrisPoints(image, blockSize=2, aperture=3, k=0.04)
        maxPoints = localMaximum(harrisPoints, windowSize=5)
        best5 = getHighestN(maxPoints, N=5)
        // Compute the contrast around the selected Harris points
        output = 0
        for each point in best5 do
            maxVal = getMaximumNeighbour(point, image, windowSize=3)
            minVal = getMinimumNeighbour(point, image, windowSize=3)
            pointContrast = (maxVal - minVal) / (1 + maxVal + minVal)
            output += pointContrast
        return output / 5   // average value

Fig. 4. Algorithm of the proposed HPMC method to compute the contrast of low-resolution natural images.
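A rough Python transcription of the Fig. 4 pseudo-code is sketched below; it relies on OpenCV's Harris detector and SciPy's maximum filter, and the clipping of neighbourhoods at the image border is our own simplification rather than part of the published method.

    import cv2
    import numpy as np
    from scipy.ndimage import maximum_filter

    def hpmc(gray, n_points=5):
        """Harris Points Measured Contrast (sketch). `gray` is a uint8
        grayscale image (e.g. a 32x32 CIFAR-10 image converted to gray)."""
        g = np.float32(gray)
        response = cv2.cornerHarris(g, 2, 3, 0.04)   # blockSize=2, aperture=3, k=0.04
        # Keep only local maxima of the Harris response in 5x5 windows.
        local_max = (response == maximum_filter(response, size=5))
        candidates = np.argwhere(local_max)
        strengths = response[local_max]
        # Select the N strongest candidate points.
        best = candidates[np.argsort(strengths)[::-1][:n_points]]
        contrasts = []
        for r, c in best:
            # 3x3 neighbourhood, clipped at the border (our simplification).
            win = g[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            mx, mn = float(win.max()), float(win.min())
            contrasts.append((mx - mn) / (1.0 + mx + mn))
        return float(np.mean(contrasts))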
Fig. 5 shows a detail of applying the HPMC metric to a sample "cat" class image of CIFAR-10. Note that the positions of the N=5 Harris points lie mainly in the contour regions.

4. Experimental results

This section first describes the image datasets used in our experiments. Then, we present and analyze the different experiments performed on the two respective datasets to determine the correlation between illumination/contrast estimations and recognition results.

[Fig. 5. Detail of the extraction of the five maximal-strength Harris feature points: (a) original "cat" image, (b) transformed gray-level image, (c) Harris heat map, and (d) positions of the 5 strongest Harris corner points in the image.]

4.1. Datasets

In our experiments, we selected two image datasets which are appropriate to study the relations between image lighting conditions and recognition results: the Yale Face Database B and CIFAR-10.

The Yale Face Database B [11] is well suited for illumination-varying 2D facial recognition experiments. It contains 5,760 images taken from 10 subjects under 576 viewing conditions (that is, 9 poses and 64 illumination conditions per subject's pose). The images in the database are divided into 5 subsets based on the angle of the light source direction. This set of images was captured using a purpose-built illumination rig fitted with 64 computer-controlled strobes. The 64 images of each subject in a particular pose were acquired almost simultaneously by the system, so only a very small change in head pose can appear across these 64 images. Among them, one image has only ambient illumination and was captured without any strobe going off. Since in this work we were only concerned with the illumination of the facial images (and not with pose changes), only the 640 frontal-pose images of the 10 subjects were used in our experiments (these images cover the 64 illumination conditions under study).

The CIFAR-10 database [17, 18] contains 10 classes and 60,000 color natural images (each class has 6,000 images). All the images are in RGB format and at the very low spatial resolution of 32×32. These tiny images were labeled by hand. The corresponding class labels are: "airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship" and "truck". In this dataset, a subset of 50,000 images is used for training and the remaining 10,000 for testing. The images included in the dataset are assumed to be easy to classify without ambiguity by a human. The dataset was collected by Krizhevsky, Nair and Hinton [18]. In recent years, CIFAR-10 has been frequently tested using different classification methods (see, for example, [18, 45, 50]). Figs. 6 and 7 show some images of the two datasets used.

[Fig. 6. Yale Face Database B: four sample images of the same subject, presenting different illumination conditions.]

4.2. Experiments using the Yale Face Database B

Facial recognition experiments were repeated for the four types of considered classifiers (see subsection 2.1): K-NN, SVM, FC-MLP and NFC-MLP. The best facial recognition results for the different parameter configurations and pattern classification methods are detailed in [38]. Classifiers were tested using both single-validation and cross-validation approaches. The best classification results for the test patterns of this database were achieved with a fully-connected multilayer perceptron (FC-MLP) classifier, which produced 100% correct classification (using 70.76% of the images for training and 29.24% for testing). This MLP network contained two hidden layers with 400 and 30 neurons, respectively, and an output layer of 10 neurons (corresponding to the subject classes). It was trained in Matlab using standard feedforward backpropagation with the following parameter values: learning_rate=0.001, goal=0.01 and tansig as the activation function.
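As an approximate Python counterpart of the Matlab setup just described (a sketch only: scikit-learn is our substitution, tanh stands in for tansig, and the stopping criterion differs from Matlab's goal parameter), the FC-MLP could be configured as follows.

    from sklearn.neural_network import MLPClassifier

    # Two hidden layers (400 and 30 neurons); the 10 output classes are
    # inferred from the labels, and the learning rate matches the reported 0.001.
    fc_mlp = MLPClassifier(hidden_layer_sizes=(400, 30),
                           activation='tanh',          # stands in for Matlab's tansig
                           solver='sgd',
                           learning_rate_init=0.001,
                           max_iter=1000)
    # Typical usage: fc_mlp.fit(X_train, y_train); fc_mlp.score(X_test, y_test)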
[Fig. 7. Five sample images of each of the ten object classes contained in the CIFAR-10 database: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck.]

Next, the recognition results produced by the FC-MLP for each test pattern were combined, in the form of 2-tuples, with their corresponding independently estimated illumination qualities (computed as explained in subsection 2.2). Fig. 8 plots these (illumination, recognition) tuples for the 260 test patterns. Computing the regression line for all the points in Fig. 8 shows a linear correlation between both variables. We obtained y = 0.764x + 0.272, a near-linear tendency indicating that, when the illumination quality of the 2D facial images increases, the corresponding recognition result achieved by the FC-MLP classifier also increases approximately linearly.

[Fig. 8. Illumination quality vs. recognition results for the FC-MLP neural network configuration using the Yale Face Database B.]

4.3. Experiments using CIFAR-10

To analyze the influence of contrast on the recognition results for low-resolution natural images, the 10,000 CIFAR-10 test patterns were used. In general, the CIFAR-10 images have correct levels of brightness and contrast. In order to analyze how light and contrast conditions influence the recognition, we produced a set of altered images for each original one by transforming (and, in general, worsening) their contrast values. Thus, a set of 8 additional contrast-changed images was generated for each CIFAR-10 test image by applying the three types of point contrast transformations shown in Fig. 9: darkening gamma corrections (three images, created for parameter values n=2, n=3 and n=4), brightening gamma corrections (another three images, created for n=0.25, n=0.5 and n=0.75), and a brightening piecewise transformation (two new images, created for s=2 and s=4). Usually, applying darkening filters (n>1) increases the image contrast, while brightening filters (n<1) decrease it. The piecewise transformation of Fig. 9(c), for s=2 and s=4, increases the contrast value of the images for some metrics.

CNN have outperformed other classification systems when applied to different recognition problems such as digit recognition, natural language processing and object recognition [18]. The architecture of our CNN is shown in Fig. 3. It takes the three 32×32 RGB color planes as inputs, followed by two pairs of alternating convolution and sub-sampling layers (see the details in Fig. 3) and, finally, a fully-connected MLP network (with one hidden layer of 100 neurons and 10 neurons in the output layer). To build and train the CNN, we used Theano [5]. This numerical computation library for Python takes advantage of both the GPU and the CPU to optimize the large number of computations required to train the network. Our CNN was trained for 50 epochs, and the parameter values were set to 0.01 for learning_rate and 0.9 for momentum.

Table 2 shows the resulting contrast-altered images after applying the contrast transformations of Fig. 9 to seven sample CIFAR-10 images. In this way, we now have 90,000 test images available to estimate their contrast and evaluate their recognition results.
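The point transforms of Fig. 9 are simple per-pixel mappings on intensities normalized to [0, 1]; the sketch below is our own reconstruction of how the eight altered versions of a test image could be generated, and the exact piecewise expression, recovered from a garbled figure, should be taken as an assumption.

    import numpy as np

    def gamma_correction(img, n):
        """Gamma mapping y = x**n on an image normalized to [0, 1]:
        n > 1 darkens the image, n < 1 brightens it."""
        return img ** n

    def piecewise_transform(img, s):
        """Piecewise transform of Fig. 9(c), reconstructed (assumed form) as
        y = 0.5 * (1 + (2*trunc(x + 0.5) - 1) * (2*x - 1)**s)."""
        sign = 2.0 * np.trunc(img + 0.5) - 1.0
        return 0.5 * (1.0 + sign * (2.0 * img - 1.0) ** s)

    def altered_versions(img):
        """Eight contrast-altered copies of a [0, 1] float image, following
        the parameter values reported in the text."""
        gammas = [2, 3, 4, 0.25, 0.5, 0.75]
        return [gamma_correction(img, n) for n in gammas] + \
               [piecewise_transform(img, s) for s in (2, 4)]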
We have compared our proposed HPMC metric with the Michelson [27], RAMM [35], and Tadmor & Tolhurst [42] contrast measures on the whole extended CIFAR-10 image collection of 90,000 images. Tables 3, 4, 5 and 6 respectively show the contrast values calculated by the four measures for the images contained in Table 2.

[Fig. 9. Histogram-based contrast transformations used in the experiments: (a) darkening gamma corrections y = x^n with n = 2, 3, 4; (b) brightening gamma corrections y = x^n with n = 0.25, 0.5, 0.75; and (c) brightening piecewise transformation y = (1/2)·(1 + (2·trunc(x + 1/2) − 1)·(2x − 1)^s) with s = 2, 4.]

Table 2. Resulting contrast-altered images after applying the contrast transformations of Fig. 9 to seven sample CIFAR-10 images ("cat", "boat 1", "boat 2", "plane", "frog 1", "frog 2" and "auto"): columns (1)-(3) correspond to the brightening gamma corrections of Fig. 9(b) for n=0.25, n=0.5 and n=0.75; column (4) is the original image (n=1); columns (5)-(7) correspond to the darkening gamma corrections of Fig. 9(a) for n=2, n=3 and n=4; and columns (8)-(9) correspond to the brightening piecewise transformations of Fig. 9(c) for s=2 and s=4.

Table 3. Contrast values obtained by applying the Michelson contrast measure to the images in Table 2.

             n=0.25  n=0.5  n=0.75  n=1   n=2   n=3   n=4   s=2   s=4
    cat      0.24    0.46   0.65    0.81  0.98  1.00  1.00  1.00  0.55
    boat 1   0.49    0.83   0.95    1.00  1.00  1.00  1.00  1.00  1.00
    boat 2   0.39    0.71   0.91    0.98  1.00  1.00  1.00  1.00  1.00
    plane    0.20    0.39   0.56    0.73  0.98  1.00  1.00  1.00  0.39
    frog 1   0.31    0.59   0.80    0.94  1.00  1.00  1.00  1.00  1.00
    frog 2   0.37    0.67   0.87    0.97  1.00  1.00  1.00  1.00  1.00
    auto     0.42    0.74   0.92    0.98  1.00  1.00  1.00  1.00  1.00

Table 4. Contrast values obtained by applying the RAMM contrast measure to the images in Table 2.

             n=0.25  n=0.5  n=0.75  n=1   n=2   n=3   n=4   s=2   s=4
    cat      0.25    0.43   0.54    0.61  0.65  0.54  0.41  0.36  0.13
    boat 1   0.42    0.63   0.74    0.80  0.86  0.85  0.81  0.38  0.33
    boat 2   0.30    0.51   0.65    0.75  0.85  0.79  0.70  0.37  0.17
    plane    0.27    0.47   0.62    0.72  0.87  0.81  0.73  0.30  0.09
    frog 1   0.31    0.51   0.63    0.69  0.67  0.50  0.34  0.50  0.25
    frog 2   0.32    0.49   0.57    0.59  0.45  0.27  0.15  0.63  0.40
    auto     0.56    0.82   0.91    0.92  0.75  0.59  0.48  0.77  0.82

Table 5. Contrast values obtained by applying the Tadmor & Tolhurst contrast measure to the images in Table 2.

             n=0.25  n=0.5  n=0.75  n=1    n=2    n=3    n=4    s=2    s=4
    cat      0.09    0.09   0.09    0.10   0.10   0.06   0.03   0.09   0.08
    boat 1   0.09    0.09   0.10    0.09   0.04   -0.02  -0.07  0.13   0.13
    boat 2   0.08    0.08   0.08    0.08   0.07   0.04   0.00   0.08   0.08
    plane    0.08    0.08   0.08    0.07   0.06   0.04   0.02   0.08   0.08
    frog 1   0.08    0.08   0.07    0.06   0.01   -0.05  -0.11  0.08   0.09
    frog 2   0.07    0.06   0.04    0.01   -0.05  -0.08  -0.10  -0.10  -0.04
    auto     0.07    0.06   0.04    0.02   -0.05  -0.11  -0.15  -0.01  0.01

Table 6. Contrast values obtained by applying our HPMC contrast measure to the images in Table 2.
             n=0.25  n=0.5  n=0.75  n=1   n=2   n=3   n=4   s=2   s=4
    cat      0.11    0.21   0.31    0.42  0.68  0.85  0.93  0.73  0.32
    boat 1   0.28    0.53   0.51    0.69  0.62  0.74  0.83  0.94  0.98
    boat 2   0.14    0.30   0.37    0.35  0.65  0.86  0.74  0.66  0.57
    plane    0.12    0.25   0.29    0.38  0.68  0.79  0.83  0.56  0.17
    frog 1   0.18    0.36   0.53    0.64  0.61  0.75  0.78  0.99  0.85
    frog 2   0.25    0.49   0.67    0.85  0.84  0.79  0.82  0.95  0.97
    auto     0.27    0.52   0.62    0.77  0.94  0.97  0.92  0.93  0.98

Table 7. Estimated contrast intervals versus percentage of correctly recognized test images for the Michelson, RAMM, Tadmor & Tolhurst (T&T) and proposed HPMC methods.

              Number of images with this contrast      Recognized images percentage (%)
    Contrast  Michelson  RAMM   T&T    HPMC            Michelson  RAMM    T&T      HPMC
    0.0       10         10     1      10              50.00      50.00   100.00*  50.00
    0.1       478        2995   19     2434            19.46      28.38*  26.32    23.01
    0.2       2213       5212   35     8267            18.35      31.35*  20.00    22.25
    0.3       4464       8201   76     8526            19.38      30.51   22.37    32.56*
    0.4       4771       10840  171    8211            24.73      33.96   24.56    40.97*
    0.5       5597       11765  391    8107            30.18      36.35   28.13    45.73*
    0.6       4097       11688  839    8266            41.57      40.19   36.47    47.21*
    0.7       4795       11175  1911   9464            42.59      42.26   37.21    45.31*
    0.8       4917       9675   4649   11979           46.98*     44.69   39.49    43.57
    0.9       6048       7421   20809  13055           49.01*     48.32   41.65    41.62
    1.0       52610      4732   61099  11681           43.07      50.95*  39.63    41.30

A summary of the estimated contrast histograms versus the correctly recognized test images for the Michelson, RAMM, Tadmor & Tolhurst and proposed HPMC methods is presented in Table 7. Note that, with the transformations shown in Fig. 9, new images that modify the contrast of the original ones have been created. In this way, we have tried to produce a more or less uniform distribution of images according to their contrast. The fifth column of Table 7, corresponding to the HPMC measure, produced a better distribution of the image contrasts than the other three compared measures (columns 2-4 of the table). Note that the values marked with * in the four last columns correspond to the best percentage of recognized images for each contrast interval.

Using the data of the first five columns of Table 7, Fig. 10 visualizes the distributions of the contrast values produced by the compared measures for the 90,000 test images used. Note that the output of the Tadmor & Tolhurst implementation [42] has been normalized to [0, 1] for counting the number of cases. Observe the peak produced by the Michelson measure, which projects most of the contrast values into the interval [0.9, 1.0]. It can be noticed that our HPMC metric distributes the images more uniformly across the contrast intervals than the other compared measures.

The four curves of Fig. 11 respectively show the correlations of the predicted contrast (by the compared contrast measures) with the corresponding percentages of recognition results. To build these curves, we used the data shown in the last four columns of Table 7. Note that the recognition result of each test image is computed independently of the estimation of its contrast, and this recognition value is obtained using the trained CNN. Due to the very high number of test images, for better visualization the recognition results in Fig. 11 have been grouped into contrast intervals and then averaged.

[Fig. 10. Distributions of the contrast values produced by the compared measures for the 90,000 test images used.]
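The grouping used for Table 7 and Fig. 11 amounts to binning the per-image contrast estimates into 0.1-wide intervals and averaging the CNN hit/miss outcomes in each bin; a minimal sketch (our own, with hypothetical array names and an assumed rounding convention for the bins) is shown below.

    import numpy as np

    def recognition_by_contrast_interval(contrasts, correct, n_bins=11):
        """`contrasts` holds one contrast estimate per test image (in [0, 1])
        and `correct` holds 1 if the CNN classified that image correctly,
        0 otherwise. Returns, per 0.1-wide interval, the number of images
        and the percentage of correctly recognized ones."""
        bins = np.clip(np.round(np.asarray(contrasts) * 10).astype(int), 0, n_bins - 1)
        counts = np.bincount(bins, minlength=n_bins)
        hits = np.bincount(bins, weights=np.asarray(correct, dtype=float), minlength=n_bins)
        percentage = 100.0 * hits / np.maximum(counts, 1)
        return counts, percentage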
The fact that images with an intermediate contrast value should be better recognized is illustrated in Fig. 11. Note that, for intermediate contrast values between 0.3 and 0.75, the HPMC values correspond to higher recognition results than the Michelson, RAMM and Tadmor & Tolhurst ones. Related to this, it can be observed that the "better recognized" images of Table 2 correspond to those placed in columns 3 to 5, and these images present intermediate contrast values. Finally, the respective complexities of the compared contrast metrics are O(n) for the Michelson and HPMC metrics, and O(n·log(n)) for the RAMM and Tadmor & Tolhurst metrics (as reported in the corresponding papers [27, 35, 42]).

[Fig. 11. Predicted contrast vs. recognition results (in %) for the analyzed contrast measures.]

5. Conclusion

This paper extends our previous study on the isolated influence of illumination quality in 2D facial recognition (using the Yale Face Database B). We showed that, by separately estimating the illumination quality of the facial images (using a fuzzy inference system that combines the average brightness and contrast values of the patterns) and by recognizing them using a trained MLP, there was a nearly-linear correlation between the illumination and recognition variables.

We have extended our study to evaluate the influence of measured contrast in the large-scale recognition of natural images. For this work, we used the CIFAR-10 database, containing tens of thousands of low-resolution images. Due to their successful application to large-scale recognition problems, trained Convolutional Neural Networks (CNN) were included in our system as the recognition model. We created eight new images for each CIFAR-10 test pattern with the aim of having the whole collection of images uniformly distributed according to their contrast value. After analyzing different contrast metrics in the literature, a new contrast measure called HPMC, based on the computation of image interest points (more specifically, the Harris corner points), has been proposed and favorably compared with three other well-studied contrast metrics (Michelson, RAMM and Tadmor & Tolhurst). Our HPMC metric assigns intermediate contrast values (between 0.3 and 0.75) to the images correctly recognized by the CNN, and it does so for a higher number of images than the other compared methods. This behavior is close to how contrast is perceived by humans. Our results can be used as a contrast/illumination quality measure for natural images and also to decide whether or not it is worth using a given test image for further recognition tasks.

As future work, we aim to extend our study to normal-resolution natural images. The proposed CNN classification model can be properly trained to gain computational efficiency when evaluating the effect of contrast on the recognition of the object classes corresponding to these larger-resolution images. It is also interesting to quantify how close the proposed contrast measure is to the contrast perceived by human observers for the same images. Another planned continuation of this research is the use of the chrominance information of images in order to analyze how they are recognized according to this property.

Acknowledgements

This work was partially supported by the Spanish Ministerio de Economía y Competitividad under project number TIN2014-57458-R. The author A. Sánchez gratefully thanks the institutional support provided by the Prometeo Programme and the SENESCYT (Ecuador).

References
[1] Y. Adini, Y. Moses and S. Ullman, Face recognition: the problem of compensating for changes in illumination direction, IEEE Trans. Pattern Analysis and Machine Intelligence 19 (1997), 721-732.
[2] A.J. Ahumada and B.L. Beard, A Simple Vision Model for Inhomogeneous Image Quality Assessment, in: SID Digest of Technical Papers, vol. 29, 1998, pp. 1109-1111.
[3] P. Azad, T. Asfour and R. Dillmann, Combining Harris interest points and the SIFT descriptor for fast scale-invariant object recognition, Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems, 2009, pp. 4275-4280.
[4] Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning 2 (2009), 1-127.
[5] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley and Y. Bengio, Theano: A CPU and GPU Math Expression Compiler, Proc. of the Python for Scientific Computing Conference (SciPy), 2010, pp. 1-7.
[6] A.J. Calabria and M.D. Fairchild, Perceived image contrast and observer preference II. Empirical modeling of perceived image contrast and observer preference data, The Journal of Imaging Science and Technology 47 (2003), 494-508.
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, ImageNet: A Large-Scale Hierarchical Image Database, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09), 2009, pp. 248-255.
[8] S. Dieleman, Tutorial on Deep Learning & Theano, 2015.
[9] L. Deng and D. Yu, Deep Learning: Methods and Applications, Foundations and Trends in Signal Processing 7 (2014), 197-387.
[10] R. Fergus, P. Perona and A. Zisserman, Object class recognition by unsupervised scale-invariant learning, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR'03), vol. 2, 2003, pp. 264-271.
[11] A. Georghiades, P.N. Belhumeur and D. Kriegman, From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose, IEEE Trans. Pattern Analysis and Machine Intelligence 23 (2001), 643-660.
[12] R.C. Gonzalez and R.E. Woods, Digital Image Processing, Third Edition, Prentice Hall, 2008.
[13] R. Gross, S. Baker, I. Matthews and T. Kanade, Face Recognition across Pose and Illumination, in: S.Z. Li and A.K. Jain (eds.), Handbook of Face Recognition, Springer-Verlag, 2004, pp. 193-216.
[14] K. Gu, G. Zhai, X. Yang, W. Zhang and M. Liu, Subjective and objective quality assessment for images with contrast change, Proc. 20th IEEE International Conference on Image Processing (ICIP'13), 2013, pp. 383-387.
[15] G. Hoffmann, CIELab Color Space, PDF document, 2003. URL: http://docs-hoffmann.de/cielab03022003.pdf
[16] J. Hou, Z. Chen, X. Qin and D. Zhang, Automatic image search based on improved feature descriptors and decision tree, Integrated Computer-Aided Engineering 18:2 (2011), 167-180.
[17] A. Krizhevsky, Learning multiple layers of features from tiny images, Technical Report, 2009. URL: http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
[18] A. Krizhevsky, I. Sutskever and G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
[19] E.C. Larson and D.M. Chandler, Most apparent distortion: full-reference image quality assessment and the role of strategy, Journal of Electronic Imaging 19(1) (2010).
[20] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE 86 (1998), 2278-2324.
[21] Y. LeCun, Y. Bengio and G. Hinton, Deep learning, Nature 521 (2015), 436-444.
[22] Y. Li, C. Wang and X. Ao, Illumination Processing in Face Recognition, in: M. Oravec (ed.), Face Recognition, InTech, 2010. URL: http://www.intechopen.com/books/face-recognition/illumination-processing-in-face-recognition
[23] W. Lin and C.-C. Jay Kuo, Perceptual visual quality metrics: A survey, Journal of Visual Communication and Image Representation 22 (2011), 297-312.
[24] Y. Lin, F. Lv, S. Zhu, M. Yang, T. Cour, K. Yu, L. Cao and T. Huang, Large-scale Image Classification: Fast Feature Extraction and SVM Training, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR'11), 2011, pp. 1689-1696.
[25] D.A. Lisin et al., Combining Local and Global Image Features for Object Class Recognition, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 3, 2005, pp. 47-54.
[26] J.J. McCann and A. Rizzi, The Art and Science of HDR Imaging, John Wiley & Sons, 2012.
[27] A. Michelson, Studies in Optics, University of Chicago Press, 1927.
[28] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu and A.Y. Ng, Reading Digits in Natural Images with Unsupervised Feature Learning, NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011, pp. 1-9.
[29] S. Ohayon, W.A. Freiwald and D.Y. Tsao, What makes a cell face selective? The importance of contrast, Neuron 74 (2013), 567-581.
[30] M. Pedersen, N. Bonnier, J.Y. Hardeberg and F. Albregtsen, Attributes of image quality for color prints, Journal of Electronic Imaging 19 (2010), 011016-1-011016-13.
[31] E. Peli, Contrast in complex images, Journal of the Optical Society of America 7 (1990), 2032-2040.
[32] Z. Rahman, D.J. Jobson and G.A. Woodell, Multi-scale retinex for color image enhancement, Proc. IEEE International Conference on Image Processing (ICIP'96), vol. 3, 1996, pp. 1003-1006.
[33] E. Reinhard, G. Ward, S. Pattanaik and P. Debevec, High Dynamic Range Imaging - Acquisition, Display and Image-Based Lighting, Morgan Kaufmann, 2005.
[34] D. Rizo-Rodríguez, H. Mendez-Vazquez and E. García-Reyes, An Illumination Quality Measure for Face Recognition, Proc. 20th International Conference on Pattern Recognition (ICPR'10), 2010, pp. 1477-1480.
[35] A. Rizzi, T. Algeri, G. Medeghini and D. Marini, A proposal for contrast measure in digital images, Proc. Second European Conference on Color in Graphics, Imaging, and Vision (CGIV'04), 2004, pp. 187-192.
[36] J. Ruiz del Solar and J. Quinteros, Illumination compensation and normalization in eigenspace-based face recognition: A comparative study of different pre-processing approaches, Pattern Recognition Letters 29 (2008), 1966-1979.
[37] F. Saitoh, Image contrast enhancement using genetic algorithm, Proc. 1999 IEEE International Conference on Systems, Man, and Cybernetics (SMC'99), vol. 4, 1999, pp. 899-904.
[38] A. Sánchez, J.F. Vélez and A.B. Moreno, On the Influence of Illumination Quality in 2D Facial Recognition, in: Bioinspired Computation in Artificial Systems, LNCS 9108, 2015, pp. 79-87.
[39] G. Simone, M. Pedersen and J.Y. Hardeberg, Measuring perceptual contrast in digital images, Journal of Visual Communication and Image Representation 23 (2012), 491-506.
[40] M. Sugeno, Industrial Applications of Fuzzy Control, Elsevier Publishing Company, 1985.
[41] R. Szeliski, Computer Vision: Algorithms and Applications, Springer, 2010.
[42] Y. Tadmor and D.J. Tolhurst, Calculating the contrasts that retinal ganglion cells and LGN neurones encounter in natural scenes, Vision Research 40 (2000), 3145-3157.
[43] Q.D. Tran and P. Liatsis, Improving Fusion with Optimal Weight Selection in Face Recognition, Integrated Computer-Aided Engineering 19:3 (2012), 229-237.
[44] M. Turk and A. Pentland, Face recognition using eigenfaces, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR'91), 1991, pp. 586-591.
[45] L. Wan, M. Zeiler, S. Zhang, Y. LeCun and R. Fergus, Regularization of neural networks using DropConnect, Proc. 30th International Conference on Machine Learning (ICML'13), 2013, pp. 1058-1066.
[46] Z. Wang, A.C. Bovik, H.R. Sheikh and E.P. Simoncelli, Image Quality Assessment: From Error Visibility to Structural Similarity, IEEE Trans. Image Processing 13 (2004), 600-612.
[47] P. Whittle, Increments and decrements: luminance discrimination, Vision Research 26 (1986), 1677-1691.
[48] C. Wolf, J. Jolion, W. Kropatsch and H. Bischof, Content based image retrieval using interest points and texture features, Proc. 15th International Conference on Pattern Recognition, vol. 4, 2000, pp. 234-237.
[49] Y. Yu and T. McKelvey, A Robust Subspace Classification Scheme Based on Empirical Intersection Removal and Sparse Approximation, Integrated Computer-Aided Engineering 22:1 (2015), 59-69.
[50] M.D. Zeiler and R. Fergus, Stochastic pooling for regularization of deep convolutional neural networks, Proc. International Conference on Learning Representations, 2013.
[51] J. Zhang and X. Xie, A study on the effective approach to illumination-invariant face recognition based on a single image, Proc. 7th Chinese Conference on Biometric Recognition (CCBR'12), LNCS 7701, Springer, 2012, pp. 33-41.
[52] L. Zhang, M. Yu Yan and Y. Zeng, Fatigue Detection with 3D Facial Features based on Binocular Stereo Vision Technology, Integrated Computer-Aided Engineering 21:4 (2014), 387-397.
[53] X. Zou, J. Kittler and K. Messer, Illumination Invariant Face Recognition: A Survey, Proc. First IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS'07), 2007, pp. 1-8.