Quality as an image-specific characteristic perceived by an average human observer
Non-reference image quality measures. Blur as an important factor in its perception. Determination of the intensity of each segment. Research design, data collection, image markup. Linear regression with known target variable. Comparing feature weights.
Category: Programming, computers and cybernetics
Type: diploma thesis
Language: English
Date added: 23.12.2015
Posted at http://www.stud.wiki/
Introduction
Ranking images by their quality is one of the most common challenges in many areas of applied science and technology. For example, the images returned by a web search may be highly relevant, but if the relevant images are also of the best quality, users' impression will certainly improve. Another area is medicine, where patient examinations produce terabytes of visual data that are hard to analyze all at once. Preprocessing these data by extracting the best images for further diagnostics would therefore save physicians time. Finally, if we know what defines image quality numerically, we can start developing quality-enhancing filters that make imaging data more appealing to the human visual system.
Image quality is a complex concept that may have different interpretations. In our work we consider quality as an image-specific characteristic perceived by an average human observer. Thus, an image of good quality corresponds to our general idea of an image that is regular, informative, and well presented. At present, image quality is commonly measured with a single metric such as contrast or blurriness.
Our goal is to provide a more complex and formal definition of human quality perception by identifying the top factors responsible for visual quality. To eliminate any subjectivity, we consider quality as an objective, non-reference, multidimensional measure that can be computed for each image independently, without comparing it to other images. Our practical goal is to find a restricted set of features that are most responsible for quality perception. Such a set would be a first step toward a practical tool that displays medical images with improved quality.
1. Non-reference image quality measures
Most published research on image quality uses quality measures estimated for an original image and its distorted copies []. In this study we use so-called non-reference measures, where quality is estimated for a single image independently. We use a number of previously developed measures as well as several basic measures such as contrast, as described below.
1.1 Blurriness measures
Even a partly blurred image affects the human perception of quality, which is why we consider blurriness an important factor of image quality perception. In this work we use two different blurriness measures.
The first measure, described by F. Crete and T. Dolmiere [1], uses a low-pass filter and is based on the principle that the gray levels of neighboring pixels vary more strongly in a sharp image than in its blurred copy. The authors therefore compute the absolute vertical and horizontal differences D between neighboring pixels in the original image and in its blurred copy (Eq. 1):
(Eq. 1a)
(Eq. 1b)
where I(x,y) is the intensity value of pixel (x,y), and h and w are the height and width of the image. Next, the variation of neighboring pixels before and after blurring is analyzed: if this variation is high, the original image is considered sufficiently sharp. To evaluate the variation, only the differences that decreased are considered, which gives the variation V for the vertical and horizontal directions (Eq. 2):
(Eq. 2)
where D_{B_ver}(x,y) is the absolute vertical difference for the blurred image B.
The blurriness for the vertical direction is then computed as:
(Eq. 3)
The horizontal blurriness is computed in the same way, and the maximum of the two is taken as the final blurriness measure: F_{blur} = max(F_{blur_hor}, F_{blur_ver}). Below we denote it F_{blur_1}.
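As an illustration of the steps above, the following minimal pure-Python sketch computes F_{blur_1}. It is not the reference implementation of [1]: the 9-tap averaging filter used as the low-pass filter, the form (S_D − S_V)/S_D assumed for Eq. 3 (S_D and S_V being the sums of the differences D and of the variations V), and returning 0 for a direction without any variation are all assumptions of this sketch. Images are plain lists of rows of gray levels.

```python
def box_blur_1d(img, size=9, vertical=True):
    """Low-pass filter: average `size` neighbours along one axis, clamping at borders."""
    h, w = len(img), len(img[0])
    half = size // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for k in range(-half, half + 1):
                if vertical:
                    acc += img[min(max(r + k, 0), h - 1)][c]
                else:
                    acc += img[r][min(max(c + k, 0), w - 1)]
            out[r][c] = acc / size
    return out


def directional_blur(img, vertical, size=9):
    """Fraction of neighbour variation that survives re-blurring along one direction."""
    blurred = box_blur_1d(img, size, vertical)
    h, w = len(img), len(img[0])
    dr, dc = (1, 0) if vertical else (0, 1)
    s_d = s_v = 0.0
    for r in range(dr, h):
        for c in range(dc, w):
            d_orig = abs(img[r][c] - img[r - dr][c - dc])          # Eq. 1, original image
            d_blur = abs(blurred[r][c] - blurred[r - dr][c - dc])  # Eq. 1, blurred copy
            s_d += d_orig
            s_v += max(0.0, d_orig - d_blur)                       # Eq. 2: only decreases count
    if s_d == 0:
        return 0.0  # no variation along this axis (convention of this sketch)
    return (s_d - s_v) / s_d  # assumed form of Eq. 3, in [0, 1]; higher means blurrier


def crete_blur(img, size=9):
    """F_blur_1 = max(F_blur_hor, F_blur_ver)."""
    return max(directional_blur(img, True, size), directional_blur(img, False, size))
```

On a 32-pixel-wide vertical step edge the sketch gives 1/9 (about 0.11), and on the same edge after one pass of the filter it gives 61/81 (about 0.75): re-blurring changes an already blurred image less, so its score is higher.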
Another blurriness measure, presented by Min Goo Choi [2], is based on edge extraction using the intensity gradient. The authors define the horizontal (vertical) absolute difference of a pixel as the difference between its left and right (upper and lower) neighbors. They then compute the mean horizontal and vertical absolute differences over the entire image, e.g. D_{hor_mean} (Eq. 4).
(Eq. 4)
Next, the absolute horizontal difference of each pixel is compared with the mean D_{hor_mean} computed for the whole image to select the edge candidates C_{hor}(x,y):
(Eq. 5)
If a candidate pixel C_{hor}(x,y) has an absolute horizontal difference larger than those of its horizontal neighbors, it is classified as an edge pixel E_{hor}(x,y) (Eq. 6).
(Eq. 6)
Each edge pixel is then examined to determine whether it belongs to a blurred edge. First, the horizontal blurriness of the pixel is computed (Eq. 7).
(Eq. 7)
The vertical value is obtained in the same way, and the maximum of the two is used for the final decision: a pixel is considered blurred if this value is larger than a predefined threshold (0.1, as suggested in [2]).
(Eq. 8)
Finally, the blurriness measure for the whole image, called inverse blurriness, is computed as the ratio of the number of blurred edge pixels to the number of edge pixels (Eq. 9).
(Eq. 9)
Below we denote this measure F_{blur_2} to distinguish it from the blur measure of [1].
We assume that increased blurriness negatively affects quality perception, because a heavily blurred image loses important information and is less attractive.
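The candidate and edge-pixel steps of F_{blur_2} (Eq. 4–6) can be sketched directly. The snippet below covers only the horizontal direction and stops at the set of edge pixels; the per-pixel blur test of Eq. 7–8 and the ratio of Eq. 9 are left out because their exact form is not reproduced in this text. Leaving the border columns at zero difference is an assumption of the sketch.

```python
def horizontal_edges(img):
    """Edge pixels E_hor as (row, col) pairs following Eq. 4-6; needs width >= 3."""
    h, w = len(img), len(img[0])
    # Absolute difference between the right and left neighbours; border columns stay 0.
    d = [[abs(img[r][c + 1] - img[r][c - 1]) if 0 < c < w - 1 else 0.0
          for c in range(w)] for r in range(h)]
    # Eq. 4: mean absolute horizontal difference over the interior of the image.
    interior = [d[r][c] for r in range(h) for c in range(1, w - 1)]
    d_mean = sum(interior) / len(interior)
    # Eq. 5: edge candidates keep only differences above the mean.
    cand = [[v if v > d_mean else 0.0 for v in row] for row in d]
    # Eq. 6: a candidate is an edge pixel if it exceeds both horizontal neighbours.
    return {(r, c) for r in range(h) for c in range(1, w - 1)
            if cand[r][c] > cand[r][c - 1] and cand[r][c] > cand[r][c + 1]}
```

For a one-pixel bright vertical line, the differences peak on both sides of the line, so the columns immediately left and right of it are reported as edge pixels.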
1.2 Image entropy
The basic idea behind entropy is to measure the uncertainty of the image. The more information and the less noise an image contains, the more useful it is, and we may relate image usefulness to its objective quality. In our study, Shannon entropy (the formula was taken from Wikipedia) was computed for the entire image, its foreground, and its background according to (Eq. 10).
(Eq. 10)
where p(I_{k}) is the probability of the intensity value I_{k}.
We assume that higher entropy means that the image contains more signal. For example, an image with fewer details and more plain surfaces has lower entropy. However, a noisy image also has higher entropy, so we consider entropy on three levels of the image.
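A short sketch of Eq. 10 and of the whole/background/foreground split follows. Two assumptions: the logarithm base is 2 (entropy in bits), and background and foreground are separated by the mean-intensity threshold that Section 1.3 uses for segmentation; the text does not state which split was used for entropy.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Eq. 10 with log base 2: -sum p(I_k) log2 p(I_k) over the observed intensities."""
    if not pixels:
        return 0.0
    n = len(pixels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(pixels).values())


def entropies(img):
    """Entropy of the whole image, its background and its foreground (mean threshold)."""
    flat = [v for row in img for v in row]
    mean = sum(flat) / len(flat)
    background = [v for v in flat if v < mean]
    foreground = [v for v in flat if v >= mean]
    return shannon_entropy(flat), shannon_entropy(background), shannon_entropy(foreground)
```

Four equally frequent gray levels give 2 bits, and a constant region gives 0, which matches the intuition that plain surfaces carry little information.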
1.3 Segmentation
The separability measure presented in [3] shows how well the various segments of an image can be separated. We use the simplest yet most intuitive implementation, comparing two major segments: the image background (seg1) and foreground (seg2). In this study we computed the average intensity of the image and classified all pixels with lower intensity as background and the remaining pixels as foreground. To compute the segmentation measure, the average difference U between neighboring pixels in a 3×3 sliding window is computed for each image segment (Eq. 11):
(Eq. 11)
leading to the following measure W:
(Eq. 12)
We then compute the average pixel intensity in each segment and the squared difference between the average intensities of every pair of segments (in our case there is only one pair). The inverse of the sum of squared differences of average intensities is called B:
(Eq. 13)
Resulting measure is obtained as:
F_{sep} = 1000*W+B (Eq. 14)
This measure is high for images with high separability between segments and low separability within segments. In our case it makes sense only for the set of tree images, because the medical images mostly have a dark background that is clearly separated from the foreground.
1.4 Flatness
This measure, described in [4], uses the two-dimensional discrete Fourier transform (DFT) of the image. First, the 2D DFT of the image is computed and rearranged into a one-dimensional vector F_{V}. Next, the spectral flatness S_{F} is computed as the ratio of the geometric mean to the arithmetic mean of F_{V}:
(Eq. 15)
The resulting measure, called entropy power in [4], is the product of the spectral flatness S_{F} from (Eq. 15) and the image variance:
(Eq. 16)
where the variance is taken around the average intensity value of the image. This measure is assumed to be higher for less informative, non-predictive and redundant images.
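A naive sketch of Eqs. 15–16 is given below. Assumptions: the flatness is taken over the power spectrum |F_V|², a small eps guards the logarithm against zero coefficients, the variance is the population variance, and the DFT is computed directly in O(h²w²) time, which is only practical for tiny images.

```python
import cmath
import math


def dft2(img):
    """Naive 2D discrete Fourier transform (O(h^2 w^2), tiny images only)."""
    h, w = len(img), len(img[0])
    return [[sum(img[x][y] * cmath.exp(-2j * math.pi * (u * x / h + v * y / w))
                 for x in range(h) for y in range(w))
             for v in range(w)] for u in range(h)]


def spectral_flatness(img, eps=1e-12):
    """Eq. 15: geometric over arithmetic mean of the flattened power spectrum F_V."""
    power = [abs(coef) ** 2 + eps for row in dft2(img) for coef in row]
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic


def entropy_power(img):
    """Eq. 16 as described: spectral flatness times the image variance."""
    flat = [v for row in img for v in row]
    mean = sum(flat) / len(flat)
    variance = sum((v - mean) ** 2 for v in flat) / len(flat)
    return spectral_flatness(img) * variance
```

A single bright pixel has a perfectly flat spectrum (S_F close to 1), while a constant image puts all its energy in the DC term (S_F close to 0) and has zero variance.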
1.5 Sharpness
This measure [5] is based on the assumption that differences between neighboring pixels change more in areas with sharp edges. The authors therefore compute the second-order difference of neighboring pixels, a discrete analog of the second derivative, for the image passed through a denoising median filter:
(Eq. 17)
where I_{m} is the original image passed through a median filter.
The authors then define the vertical sharpness S_{ver} of each pixel as:
(Eq. 18)
and a pixel is treated as sharp if its sharpness exceeds 0.0001. The number of sharp pixels N_{Sver} is counted, and the edge pixels are found with the Canny method, N_{Ever} being their count. The same process is repeated in the horizontal direction, and the ratio of sharp to edge pixels for the vertical and horizontal directions is computed as:
(Eq. 19)
We assume that a sharper image is perceived as more attractive and informative.
1.6 Blockness measure
This measure evaluates an image with respect to block artifacts [6]. Absolute intensity differences between neighboring pixels are computed for the vertical and horizontal directions as in (Eq. 1), and each element of the resulting matrix is then normalized:
(Eq. 20)
Averaging each column of the matrix gives the horizontal profile of the image P_{hor} (Eq. 21):
(Eq. 21)
The vertical profile is obtained in the same way, and a 1D DFT is applied to both profiles. The magnitude M of the DFT coefficients is then considered:
(Eq. 22)
where 0 ≤ T ≤ w/2.
The horizontal blockness measure Bl for block size Z is computed as shown in (Eq. 23). Owing to the nature of the DFT, M_{hor}(T) has peaks at T = b·w/Z, where b = 1, 2, …, Z. The values of M_{hor}(T) at these peak points correspond to the horizontal blockness of the image Bl_{hor}:
(Eq. 23)
The vertical blockness measure is obtained similarly. In our study, block widths of 2, 4, 6 and 8 pixels were used. The resulting measure is:
(Eq. 24)
where r and 1 − r are the weights of the horizontal and vertical measures; we use r = 0.5. This measure is higher for images distorted by block artifacts.
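A hedged sketch of the blockness computation for one block size Z is shown below. Several details are assumptions because Eq. 20 and Eq. 23 are not reproduced here: differences are taken circularly so that the profile length matches the DFT bins, the profile is not normalized, and Bl is taken as the share of non-DC spectral magnitude that falls on the block frequencies T = b·n/Z (n being the profile length, T ≤ n/2). The vertical measure reuses the horizontal one on the transposed image, and r = 0.5 as in the text.

```python
import cmath
import math


def horizontal_profile(img):
    """Column means of absolute horizontal neighbour differences (circular, unnormalized)."""
    h, w = len(img), len(img[0])
    return [sum(abs(img[r][(c + 1) % w] - img[r][c]) for r in range(h)) / h
            for c in range(w)]


def dft_magnitude(seq):
    """|DFT| of a 1D sequence for frequencies T = 0 .. n // 2."""
    n = len(seq)
    return [abs(sum(seq[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def directional_blockness(img, block):
    """Share of non-DC spectral magnitude located at the block frequencies."""
    mag = dft_magnitude(horizontal_profile(img))
    n = len(img[0])
    total = sum(mag[1:])  # skip the DC term T = 0
    if total == 0:
        return 0.0
    peaks = {b * n // block for b in range(1, block + 1)
             if (b * n) % block == 0 and b * n // block <= n // 2}
    return sum(mag[t] for t in peaks) / total


def blockness(img, block, r=0.5):
    """F_block = r * Bl_hor + (1 - r) * Bl_ver; the vertical part uses the transpose."""
    transposed = [list(col) for col in zip(*img)]
    return r * directional_blockness(img, block) + (1 - r) * directional_blockness(transposed, block)
```

An image made of 4×4 blocks puts almost all of its profile energy at the block frequencies and scores close to 1 for Z = 4, while a flat image scores 0.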
1.7 Fractal dimension
The possible relation between image quality and the amount of image detail leads us to measures of fractal dimension. We detect the main contours in the image with the Canny method and then estimate the fractal dimension of the obtained curves by box counting (Eq. 25), where N is the number of square boxes of side ε that contain contour pixels, with ε = 2, 3, 4 and 5.
(Eq. 25)
We assume that a higher fractal dimension corresponds to more informative images.
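Box counting can be sketched as follows, assuming that the Canny contours are already available as a binary map (edge detection itself is not reproduced) and that the dimension is the least-squares slope of log N(ε) against log(1/ε) over ε = 2, 3, 4, 5. The map must contain at least one contour pixel.

```python
import math


def box_count(edges, eps):
    """N(eps): number of eps x eps boxes containing at least one contour pixel."""
    return len({(r // eps, c // eps)
                for r, row in enumerate(edges) for c, v in enumerate(row) if v})


def fractal_dimension(edges, sizes=(2, 3, 4, 5)):
    """Least-squares slope of log N(eps) against log(1 / eps)."""
    xs = [math.log(1.0 / e) for e in sizes]
    ys = [math.log(box_count(edges, e)) for e in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A straight contour yields a dimension of 1 and a completely filled region yields 2; real contours fall in between, and more intricate ones score higher.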
1.8 Noise level
It is natural to assume that the presence of noise can be detrimental to perceived image quality. We therefore included a noise measure developed by Masayuki T. [7]. In this work, the noise level is described as the standard deviation of Gaussian noise. The authors propose a patch-based algorithm. First, the image is decomposed into overlapping patches, and the model for the whole image is written as p_{i} = z_{i} + n_{i}, where z_{i} is the original image patch with the i-th pixel at its center, rearranged into a one-dimensional vector, p_{i} is the observed patch (also rearranged into a vector) distorted by Gaussian noise, and n_{i} is the noise vector. To estimate the noise level, the unknown standard deviation must be obtained using only the observed noisy image.
All image patches are treated as data in Euclidean space, and their variance can be projected onto a single axis whose direction is defined by a vector u. The variance V of the data projected onto u can be written as:
(Eq. 26)
where is standard deviation of the Gaussian noise.
The direction of minimum data variance is then found using principal component analysis (PCA). First, the data covariance matrix р is defined as:
(Eq. 27)
where b is the number of patches and m is the mean of the dataset {p_{i}}. The variance of the data projected onto the minimum-variance direction then equals the minimum eigenvalue of the covariance matrix:
(Eq. 28)
where ? is the covariance matrix of the noise-free patches z.
The noise level could be estimated by decomposing the minimum eigenvalue of the covariance matrix of the noisy patches, but this is an ill-posed problem because the minimum eigenvalue of the covariance matrix of the noise-free patches is unknown. The authors therefore suggest selecting weakly textured patches from the noisy image: such patches span a low-dimensional subspace, so the minimum eigenvalue of their noise-free covariance matrix is close to zero, and their noise level F_{noise} can be estimated as:
(Eq. 29)
where р' is the covariance matrix of the weakly textured patches.
The most important part of the proposed algorithm is the selection of weakly textured patches. The main idea is to compare the maximum eigenvalue of the gradient covariance matrix of a patch with a threshold. The gradient covariance matrix C_{j} of patch j is computed as:
(Eq. 30)
where G_{j} = [D_{hor}p_{j}, D_{ver}p_{j}], and D_{hor} and D_{ver} are the horizontal and vertical derivative operators.
To select weakly textured patches, a statistical hypothesis is tested. The null hypothesis (the patch has a weak, flat texture) is accepted if the maximum eigenvalue of its gradient covariance matrix C_{j} is less than a threshold. The threshold ф for the maximum eigenvalue of the gradient covariance matrix can be found as:
(Eq. 31)
Here the significance level is set to 0.99, and the threshold uses the inverse gamma cumulative distribution function with shape parameter b/2 and a scale parameter. The inverse gamma cumulative distribution function is defined as:
(Eq. 32)
where Γ(·) denotes the gamma function and the other two parameters are the scale and shape parameters. For a positive integer n, the gamma function is defined as:
(Eq. 33)
We assume that noisier images have worse quality and are less informative.
1.9 Average gradient and edge intensity
Both measures are taken from [8]. The average gradient F_{AG} shows how much pixel values change on average in the vertical and horizontal directions:
(Eq. 34)
Edge intensity F_{EI} is computed as:
(Eq. 35)
where G_{ver} and G_{hor} are the vertical and horizontal gradients obtained as:
(Eq. 36)
(Eq. 37)
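Because Eqs. 34–37 are not reproduced here, the sketch below uses forms that are widely used for these two measures, as an assumption: the average gradient as the mean of sqrt((ΔI_x² + ΔI_y²)/2) over forward differences, and edge intensity as the mean Sobel gradient magnitude over interior pixels.

```python
import math


def average_gradient(img):
    """Assumed form of Eq. 34: mean of sqrt((dx^2 + dy^2) / 2) over forward differences."""
    h, w = len(img), len(img[0])
    total = 0.0
    for r in range(h - 1):
        for c in range(w - 1):
            dx = img[r][c + 1] - img[r][c]
            dy = img[r + 1][c] - img[r][c]
            total += math.sqrt((dx * dx + dy * dy) / 2.0)
    return total / ((h - 1) * (w - 1))


def edge_intensity(img):
    """Assumed form of Eqs. 35-37: mean Sobel gradient magnitude over interior pixels."""
    h, w = len(img), len(img[0])
    mags = []
    for r in range(1, h - 1):
        up, mid, down = img[r - 1], img[r], img[r + 1]
        for c in range(1, w - 1):
            g_hor = (up[c + 1] + 2 * mid[c + 1] + down[c + 1]) - (up[c - 1] + 2 * mid[c - 1] + down[c - 1])
            g_ver = (down[c - 1] + 2 * down[c] + down[c + 1]) - (up[c - 1] + 2 * up[c] + up[c + 1])
            mags.append(math.hypot(g_hor, g_ver))
    return sum(mags) / len(mags)
```

For a horizontal ramp rising by 2 gray levels per pixel, the average gradient is sqrt(2) and the Sobel response is 16 everywhere in the interior.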
Finally, we use a number of simple image quality metrics. First, the average intensity F_{AI} is computed as:
(Eq. 38)
Overall image contrast F_{C} and contrast per pixel F_{CPP} are obtained as:
(Eq. 39)
(Eq. 40)
Table 1. Correspondence between the described measures and the feature names in our dataset. The suffixes 0, 1 and 2 refer to images on the three levels of the Laplacian pyramid.
Metric | Name | Corresponding variables
No-reference blur metric | F_{blur_1} | Blur10, blur11, blur12
Min Goo Choi method | F_{blur_2} | Blur20, blur21, blur22
Shannon entropy | F_{ent} | Ent10, ent11, ent12
Local Shannon entropy | - | EntB0, entF0
Separability measure | F_{sep} | Sep0, sep1, sep2
Flatness measure | F_{flat} | Flat0, Flat1, Flat2
Sharpness | F_{sharp} | Sharp0, sharp1, sharp2
Contrast | F_{C} | Contr20, contr21, contr22
Blockness measure | F_{block} | Block20, block40, block60, block80, block21, etc.
Fractal dimension | F_{frac} | Frac0, frac1, frac2
Average intensity | F_{AI} | Intens0, intens1, intens2
Noise level | F_{noise} | Noise0, noise1, noise2
Contrast per pixel (CPP) | F_{CPP} | Contr10, contr11, contr12
Average gradient | F_{AG} | AG0, AG1, AG2
Edge intensity | F_{EI} | EI0, EI1, EI2

2. Research design, data collection and image markup
In order to evaluate the performance of various quality measures and validate the results, we used two datasets of grayscale images of different nature and quality. The quality of each image was assessed twice: first by human observers (capturing our visual perception of image quality), and second by the set of metrics described above. The metrics were applied to the original images as well as to their lower-resolution copies derived with Laplacian pyramid decomposition, which produced a total of 57 quality measurements per image. Our main goal was to find the best sets of numerical metrics that explain the observed human perception of image quality.
Each dataset used in this work consisted of similar images: the first set contained 50 medical images (abdominal CT scans), and the second, 50 landscape photographs of trees and forests. We intentionally chose images of a rather abstract and “emotion-free” nature to exclude any subjective bias in human perception.
Human perception rankings for the images were obtained from pairwise comparisons between all images in each dataset. The images were presented in random pairs to 15 human observers, who were asked to choose the better of the two. This task was implemented with Amazon Mechanical Turk; Figure 1 shows a screenshot of the assignment.
To make the comparisons robust, we used markup with triple overlap: each pair of images was compared three times by different observers, and the final choice was made by the majority rule. As a result, more than 7000 pairs were presented and compared.
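The markup arithmetic and the majority rule can be checked in a few lines. If every unordered pair of the 50 images in each set was judged three times, the total is 2 × 3 × C(50, 2) = 7350 judgements, consistent with the figure of more than 7000 given above. The helper name majority_winner is illustrative.

```python
from collections import Counter
from math import comb


def majority_winner(votes):
    """Image preferred by most observers for one pair (odd number of votes, no ties)."""
    winner, _ = Counter(votes).most_common(1)[0]
    return winner


pairs_per_set = comb(50, 2)              # every unordered pair of 50 images
judgements = 2 * 3 * pairs_per_set       # two datasets, triple overlap
print(pairs_per_set, judgements)         # 1225 7350
```

With three votes per pair, a 2:1 split always decides the pair, which is why an odd overlap was convenient for the majority rule.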
To obtain image features, 19 basic quality measures were computed for three copies of each image: the original image and its two lower-resolution copies derived as two levels of the Laplacian pyramid. The resulting 57 measurements were treated as 57-dimensional image feature vectors and used as independent variables in the models.
Figure 1. Mechanical Turk assignment for image markup
3. Experimental Results
3.1 Linear regression with known target variable
In the first step of the research, we solve our task using known quality measures for every image, i.e., we fit models to predict a known outcome.
Based on the pairwise comparison results, we computed a quality index for every image as the number of its wins divided by the number of its comparisons. This allowed us to put the images in a linear quality order. Note that in general this linear order cannot agree with all recorded comparisons: in some instances an image with a higher quality index may have been perceived as inferior when compared with a lower-quality image. This non-linearity in image grades originates from differences in quality perception between human observers, and we call such image pairs inverted. Overall, 10% of pairs were inverted in the medical dataset and 14% in the trees dataset.
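A minimal sketch of the quality index and of the share of inverted pairs follows. Comparisons are given as (winner, loser) index pairs after majority voting; counting only strict inversions (ties are not counted) is an assumption, since the text does not say how ties were handled.

```python
def quality_index(outcomes, n):
    """Wins divided by comparisons for each of n images; outcomes are (winner, loser)."""
    wins, games = [0] * n, [0] * n
    for winner, loser in outcomes:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return [wins[i] / games[i] if games[i] else 0.0 for i in range(n)]


def inverted_share(outcomes, scores):
    """Share of comparisons whose winner has a strictly lower score than the loser."""
    inverted = sum(scores[winner] < scores[loser] for winner, loser in outcomes)
    return inverted / len(outcomes)
```

For a transitive set of comparisons the share is 0; adding an upset such as image 3 beating image 0 while 0 beats 1 and 1 beats 3 closes a cycle, so no linear order can respect every comparison.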
Using the linear quality indices (rankings) as the target variable, we used linear regression with the L2 norm as the basic model. We considered regression models built on all combinations of k features, k = 1…57, and for each k selected the models with the smallest regression error. Because this amounts to an exhaustive search through millions of possible models (feature combinations), we used a branch-and-bound algorithm to speed up the search.
The normalized regression error E for L2 regression was defined as:
(Eq. 41)
where W_{p} is the model-predicted image quality and W is the observed quality.
One of the main goals of the study was to find a set of factors responsible for the human perception of image quality. We validated our feature-modeling results on the medical (MS) and trees (TS) image datasets separately, to make sure that models that perform well on one dataset are also good on the other.
Figure 2 shows various models with 1, 2, 3, 4 and 5 features. We evaluated each model with R squared, the fraction of the variation in the original data explained by the model. Treating image quality as a function of our visual perception rather than of the image selection, we assumed that a good model should perform well on both the MS and TS datasets. As the figure shows, R squared does not increase dramatically beyond 6 features, so we show only the models with up to 5 predictors. Circle sizes correspond to the average error of each model: the largest circles correspond to errors close to 0.27, while the best models have errors close to 0.08.
Figure 2. Regression models for both datasets
The circles on the plot also tend to cluster along the diagonal, which means that most models perform similarly on the MS and TS datasets. Moreover, the higher k (the number of model features/predictors), the closer the circles are to the diagonal. As a result, higher k generally corresponds to more accurate and more image-independent models, which provide good quality predictions for both the MS and TS sets.
Figures 3a and 3b show the best models obtained for MS and TS independently. As the figures indicate, the models selected as best for one dataset also perform well on the other. This can already be viewed as strong evidence of objectivity in human image quality perception: despite the obvious differences between CT scans and forest landscapes, the models optimal for one set were among the best performers on the other.
Figure 3 a, b
Finally, Figure 4 shows the top ten models for each model size, sorted by the mean error on the two datasets. Most models lie on the diagonal, and models with 4, 5 and 6 features come increasingly close to each other owing to the high R squared on both datasets.
Figure 4. Best ten models for both sets
Table 2 summarizes the best predictors (named as in Table 1) selected for each number of features. It provides several significant insights. First, a limited set of quality measures occurs in most optimal models derived for the MS and TS data. We assume that these factors play the most important role in our perception of image quality:
· Entropy power of the image on the first and second levels of the Laplacian pyramid (features flat0, flat1). It is the product of the spectral flatness and the variance of the image and indicates how compressible the image signal is, reflecting how much useful signal the image contains.
· Entropy of the background (entB0, entB1) and of the whole image, which appear in many optimal models for both sets.
· Blockness measures for all block sizes (2, 4, 6 and 8 pixels) are important for both image sets on all three levels of the pyramid.
· Both blur measures, sharpness, contrast and edge intensity measures on all resolution levels are significant for both datasets, indicating that the perception of contrast and blurriness is one of the major factors of image quality.
· Fractal dimension on all levels of image resolution can be found in models for both sets.
· Average gradient is especially important for the trees dataset. This measure shows how much pixel values change on average; according to it, images with higher-contrast edges between objects get a higher score.
· Object separability on the first and second levels of the pyramid can be found in models for both sets. This measure is higher for images with distinguishable, higher-contrast parts.
As a result, we identify the following major factors responsible for the human perception of image quality:
· The amount of information contained in the image, which can be described by the spectral flatness and entropy measures. Remarkably, random noise does not seem to be taken into account, while larger objects have some impact.
· Contrast, average gradient and blurriness are the most important non-reference quality measures affecting the visual perception of the whole image, while sharpness and noise level hardly appear in the best models. This might be explained by the sensitivity of the metrics used.
· Artifact measures such as blockness appear to be significant in most models.
· Background entropy performs well only as an add-on factor that explains variance not already covered by the other factors.
All things considered, we obtained models with restricted sets of features that are able to explain quality perception. However, the basic matrix of comparisons is our ground truth and main source of information. To measure the quality of the described approach, we compared each pair of images by the quality measures predicted with the best five-feature models mentioned above. To obtain the vector of predicted values, we performed leave-one-out cross-validation for each of the two sets, which gave a more stable resulting vector of quality measures. At each step one image was held out, the model weights were learned on the remaining images, and the model predicted the quality measure of the held-out image. The final vector of model quality measures was assembled from these predictions and normalized.
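The leave-one-out procedure described above can be sketched with ordinary least squares. The solver here (normal equations with Gaussian elimination, intercept included) is an assumption made for a self-contained example and is less numerically stable than a QR-based solver; the final normalization of the predicted vector is omitted because its form is not specified.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares weights [intercept, w_1, ..., w_k] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, row):
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


def leave_one_out(rows, y):
    """Predict each image's quality from a model trained on all the other images."""
    preds = []
    for i in range(len(rows)):
        weights = fit_ols(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(weights, rows[i]))
    return preds


def r_squared(y, pred):
    mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot
```

On exactly linear synthetic data the held-out predictions reproduce the targets, so R squared is 1; on real image features the same computation measures out-of-sample fit.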
The average share of inverted pairs for the predicted quality measures, relative to the initial comparison matrix, is 31% for medical images and 29% for trees. This is far from the 10% and 14% obtained for the original quality indices and could be improved.
Table 2. Best predictors for models with restricted sets of factors. For each model size, the table contains the best three models according to the average error on the two datasets.
Model size N | Best L2 predictors for both datasets | Best L2 predictors for trees dataset | Best L2 predictors for medical dataset
1 | Blur10, Blur12 · Sep0 · Blur20, blur22 · Intens2 · EntF0 | AG1 · Sharp1 · EI1, EI0 | Ent10 · ent11 · sep0
2 | Blur20, sep0 · Blur20, sep1 · EntF1, blur20 · Blur20/21, intens0/1/2 | Blur20, entF0 · EntB0, block60 · EntB0, frac0 | Blur20, sep0 · Blur20, sep1 · Blur20, intens0
3 | Blur20, EntB0, sharp1 · Blur10, Blur11, blur22 · Block measures + blur · Blur20, EntB0, frac0 | Blur20, entB0, frac2 · entB0, sep0, flat2 | Blur10, blur11, blur22 · Contr20, blur21, noise2 · Contr20, intens0, ent11 · Blur20, block22, block62
4 | Blur20 + blockness measures · Blur11, entB1, intens1, block22 · Contr22, noise2, blur21, entB0 | entB0, sep0, block80, flat2 · blur10, entB0, sep0, flat2 | block62, blur20, contr10, block22 · blur20, contr20, block62, block22
5, 6 | entB0, blur21, flat1, EI1, frac2 · entB0, blur21, flat1, EI1, block62 | blur10, entB0, sep0, block40, flat2 · blur10, entB0, sep0, block80, flat2 | Blur20, block60, block62, block22 · Blur20, EI0, EI1, block22, block42

3.2 Checking linearity of image quality perception
As mentioned above, reducing the pairwise comparison scores to one-dimensional linear quality indices resulted in 10–14% of inverted pairs, i.e., instances where the linear quality values disagree with the outcome of the pairwise comparison. OLS regression models with five features resulted in 29–30% of inverted pairs.
To improve our results and to allow more flexible ways of defining image quality indices, we considered a scenario with no predefined quality order. That is, the basic idea was to treat quality measures as unknown variables and find their optimal values that satisfy two major criteria: good predictability with linear regression and the lowest number of inverted image pairs.
There is another issue: in the previous part we used a linear model of quality, but a linear dependence is not obvious and should be checked. To do this, we used a simple method based on the best models obtained in the previous step. The idea was to keep the linear models while increasing R squared and minimizing the error without increasing the number of inverted pairs. Known quality measures from the previous step were used as starting values. If the linear model is appropriate, then we should be able to adjust the target vector to obtain a higher R squared without violating the constraints of the initial comparison matrix.
To start, we looked for the set of quality values with the lowest regression error that does not increase the number of inversions with respect to the initial pairwise comparison matrix. In addition, we tried to decrease the number of inverted pairs with the new set of values.
To check this, we implemented the simple algorithm described below.
Pseudocode:
Initialization: the starting set q_1…q_n, where q_i is the fraction of wins of the i-th image in pairwise comparisons.
Iterative process:
WHILE not terminal condition* DO:
    FOR each q_i, 1 ≤ i ≤ n:
    1. Get the interval for q_i_new: [q_i_min, q_i_max], where
        q_i_min = max of all q_j such that q_j < q_i and i ≻ j (image i won against image j);
        q_i_max = min of all q_k such that q_k > q_i and k ≻ i (image k won against image i).
        IF q_i_min > q_i_max:
            take the sorted array [q_1, q_2, …, q_m, q_m+1, …, q_n];
            for each interval [q_m, q_m+1] set q_t = (q_m + q_m+1)/2;
            choose q_t = argmin(N_inverted_pairs);
            q_i_min ← q_t, q_i_max ← q_t+1.
            IF q_t gives no fewer inverted pairs: q_i_new = q_i.
        IF q_i_min < q_i_max: go to step 2.
    2. For each q_i in [q_i_min, q_i_max], taken with a step of 0.1 × interval length:
        find the optimal q_i = argmin(MSE) of the linear regression model.
END WHILE
* Repeat steps 1 and 2 until R squared exceeds a threshold and the difference between the squared errors on steps s and s−1 is below a threshold. To compare the error on step s with that on step s−1, fit the feature weights using the vector Q_s as the target, obtain the model vector Q_s_mod, and compute the errors of Q_s−1 and Q_s against it.
We assumed that if the dependence between quality and features were non-linear, this algorithm would not converge: the idea of the algorithm is to move the initial quality measures closer to the model line. If this is possible without violating the constraints of the comparison matrix, the mean squared error (MSE) decreases because the model line fits the new quality measures better.
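A runnable sketch of one sweep of the algorithm is given below, under simplifying assumptions: comparisons are (winner, loser) pairs, the relation between i and j in step 1 is read as "image i won against image j", the model predictions are held fixed during a sweep so that minimizing the MSE for q_i reduces to picking the grid value closest to its prediction, only interior grid points are tried so that no ties are created, and the fallback branch for an empty interval is not implemented (q_i is simply kept). The function names are hypothetical.

```python
def feasible_interval(i, q, outcomes):
    """[q_min, q_max] keeping image i above images it beat and below images that beat it."""
    lower = [q[loser] for winner, loser in outcomes if winner == i and q[loser] <= q[i]]
    upper = [q[winner] for winner, loser in outcomes if loser == i and q[winner] >= q[i]]
    # Quality indices are win fractions, so [0, 1] bounds images without constraints.
    return (max(lower) if lower else 0.0), (min(upper) if upper else 1.0)


def sweep(q, outcomes, predicted, steps=10):
    """Move each q_i toward its (fixed) model prediction without adding inversions."""
    q = list(q)
    for i in range(len(q)):
        lo, hi = feasible_interval(i, q, outcomes)
        if lo >= hi:
            continue  # empty interval: the re-placement fallback is not sketched here
        # Interior grid points with a step of 0.1 * interval length (steps = 10).
        grid = [lo + k * (hi - lo) / steps for k in range(1, steps)]
        q[i] = min(grid, key=lambda v: (v - predicted[i]) ** 2)
    return q


def inverted_pairs(q, outcomes):
    """Number of comparisons whose winner has a strictly lower quality value."""
    return sum(q[winner] < q[loser] for winner, loser in outcomes)
```

Because every new value stays strictly between the images a given image beat and the images that beat it, a sweep can bring the values closer to the model line but never creates a new inversion.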
We used the ten best models with five features and the quality measures from the previous section as initial values. In all cases, however, it was impossible to decrease the fraction of inverted pairs by more than 2 percentage points. We suggest that this may be caused by peculiarities of human perception and a lack of transitivity in pairwise comparisons: a person who compares images two at a time naturally cannot keep all previously seen images in mind and produce a perfect linear order of them.
We increased R squared by up to 0.2 without violating the constraints. However, we hardly achieved an R squared above 0.8. This result still suggests that a linear model is adequate for explaining quality perception. Figures 5a and 5b show the average new and old quality values obtained with the best models for MS and TS, respectively. The Pearson correlation between the old and new values is around 0.8, which means that the new values remain strongly linearly related to the initial vector. This result allowed us to use linear models in the next step, where quality measures are treated as unknown.
Figures 5a, b. Old and new quality values for MS (a) and TS (b)
3.3 Computing quality measures of images using Elo ratings approach
To further improve the assignment of quality indices beyond the results of the previous step, we tried one more approach that uses no initial target vector of quality measures and is based directly on the initial comparison matrix.
This approach is based on the Elo rating system used in chess tournaments [9]. Each comparison is treated as an independent Bernoulli trial in which image A wins over image B with probability p, and all comparisons are seen as a series of such trials. Each image in a pair has a rating that determines the outcome of the comparison, so that the image with the higher rating is more likely to win. The rating of image K is a weighted linear combination of its L features:
(Eq. 42)
The probability of choosing image A in pairwise comparison i, i.e., the probability that the rating of image A exceeds that of image B, is written as a logistic function:
(Eq. 43)
The optimal feature weights yield ratings under which the observed pairwise comparisons are most likely. The outcome x of each comparison is 0 or 1, which can be written with the Bernoulli formula, where P is the probability given in (Eq. 43):
x ∈ {0, 1} (Eq. 44)
The likelihood function is written as:
(Eq. 45)
To obtain ratings under which the comparisons in the initial matrix are most likely, we iteratively adjust the feature weights to maximize the log-likelihood, which is the sum of the logarithms of P_{i}(x) from (Eq. 44). The optimization was performed with a gradient descent method from the SciPy library (the implementation is described on its web page).
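To make Eqs. 42–45 concrete, here is a pure-Python sketch of the rating model: ratings are weighted feature sums, P(A beats B) is the logistic function of the rating difference, and the weights are fitted by plain gradient ascent on the log-likelihood (equivalently, gradient descent on its negative). This replaces the SciPy optimizer used in the study; the learning rate, the iteration count, and the natural-logarithm logistic scale (rather than the base-10, 400-point scale of chess Elo) are assumptions.

```python
import math


def rating(weights, feats):
    """Eq. 42: rating as a weighted sum of image features."""
    return sum(w * f for w, f in zip(weights, feats))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(weights, features, outcomes):
    """Sum of log P(winner beats loser) over (winner, loser) pairs (Eqs. 43-45)."""
    return sum(math.log(sigmoid(rating(weights, features[a]) - rating(weights, features[b])))
               for a, b in outcomes)


def fit_weights(features, outcomes, lr=0.1, iters=500):
    """Gradient ascent on the mean log-likelihood, starting from zero weights."""
    k = len(features[0])
    w = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        for a, b in outcomes:
            d = rating(w, features[a]) - rating(w, features[b])
            g = 1.0 - sigmoid(d)  # derivative of log(sigmoid(d)) with respect to d
            for j in range(k):
                grad[j] += g * (features[a][j] - features[b][j])
        w = [wj + lr * gj / len(outcomes) for wj, gj in zip(w, grad)]
    return w


def correct_rate(weights, features, outcomes):
    """Share of comparisons whose winner receives the strictly higher rating."""
    return sum(rating(weights, features[a]) > rating(weights, features[b])
               for a, b in outcomes) / len(outcomes)
```

With one upset among seven comparisons of images ordered by a single feature, the fitted weight is positive and six of the seven outcomes are reproduced, which is the "rate of correct pairwise comparisons" used to compare models.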
We applied this method to various combinations of five features used in the previous method, independently on each image set, in order to compare the features and estimate their importance for image quality perception. In addition, we obtained the best models for the mixed set of images. We looked at the models that perform best on each set separately and at those that perform well on both sets. When testing a model on both sets, we use the sum of the log-likelihoods computed separately for the two sets and average the feature weights obtained for the two sets. To compare models, we used the ratio of pairwise outcomes predicted correctly by the ratings; the results are presented in Table 3.
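The model comparison can be sketched as follows. It assumes one feature row per image, 0-based index arrays `a` and `b`, and outcomes `x` equal to 1 when image `a` wins; `fit` is a hypothetical placeholder for any weight-fitting routine, such as a maximum-likelihood fit of the Elo-style model. A pair counts as correct under the same rule as the source listing: `ratings[a] > ratings[b]` must agree with `x == 1`.

```python
import itertools

import numpy as np


def correct_ratio(ratings, a, b, x):
    """Share of comparisons where (ratings[a] > ratings[b]) agrees with x == 1."""
    return float(np.mean((ratings[a] > ratings[b]) == (x == 1)))


def best_subsets(features, a, b, x, fit, k=5, top=3):
    """Rank all k-feature subsets by their ratio of correct pairwise comparisons.

    `fit(sub_features, a, b, x)` is a hypothetical callback returning weights.
    """
    scores = []
    for cols in itertools.combinations(range(features.shape[1]), k):
        sub = features[:, list(cols)]
        w = fit(sub, a, b, x)
        scores.append((correct_ratio(sub @ w, a, b, x), cols))
    scores.sort(reverse=True)
    return scores[:top]


r = np.array([3.0, 1.0, 2.0])
a, b, x = np.array([0, 1, 2]), np.array([1, 2, 0]), np.array([1, 1, 0])
print(correct_ratio(r, a, b, x))  # 2 of 3 pairs agree
```

The search is exhaustive, so its cost grows as C(n, k) with the number n of candidate features.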
Table 3. Best predictor values for models with restricted sets of factors. The table contains the best three models according to the average ratio of correct pairwise comparisons
Model                                    | Ratio of correct comparisons
                                         | Both  | Med  | Trees
ent11, sharp2, block42, block62, intens2 | 0.667 | 0.63 | 0.67
entB0, entF0, ent11, ent12, block82      | 0.69  | 0.73 | 0.75
Ent10, entF1, block20, noise1, block22   | 0.66  | 0.74 | 0.75
Contr20, ent11, entF2, sharp2, contr12   | 0.69  | 0.76 | 0.73
Blur20, ent10, ent11, block21, block22   | 0.69  | 0.75 | 0.74
Ent10, entF1, block20, noise1, block22   | 0.66  | 0.75 | 0.76

According to the table, some of the best models that perform well on each set separately give worse results on the mixed set of images. This can be clearly seen in the 3D plot (Figure 6) and the 2D plots (Figure 7a-c) of the models, where each axis corresponds to model quality on one of the sets: TS, MS, or the mixed set containing both. Most models perform better on MS and TS separately but worse on the mixed set. This means that the models are quite good even with five features; however, these features are sensitive to image content, so using average weights degrades model quality. Moreover, in many cases the feature weights for different sets have opposite signs.
Another interesting finding is that putting all 57 features into one model seriously degrades the result: it yields only about 40-50% correctly ordered pairs, which is no better than random choice.
Looking at the features contained in the best five models, we see that the features occurring in most models repeat the results obtained with OLS regression. One of the most important is the entropy of the whole image and of its background and foreground at all levels of the pyramid. Blurriness, blockness, noise, average intensity, and contrast also occur in the top models, which is consistent with the OLS regression results in Section 4.1.
Compared with the previous approach based on known quality measures, the Elo rating approach yields 24-27% inverted pairs on the separate sets, which is better than linear regression. This is expected, since this approach uses the initial comparison matrix directly as ground truth. On the mixed set, however, the models cannot provide good results because the feature weights differ between the sets. We take a closer look at this question in the next section.
Figure 6. Models in three-dimensional space
Figure 7a-c. Two-dimensional plots of the models
3.4 Comparing feature weights
After obtaining the sets of the most important features, we intended to check which features are invariant to the scene and to try to derive a single quality formula from the separate models for the two image sets. In addition, we tested the best models for each image set separately. Using the initial comparison matrix as ground truth, we trained a linear classifier with a binary outcome to check the results obtained in the previous steps. The first part of this experiment consisted of training a model on one set and testing it on the other. If the feature weights derived from the first image set also provided a good prediction for the second set, we would conclude that the selected features give a good representation of human image quality perception.
The second part was to check model performance on each set, and to draw training and testing samples from the mixed set, to make sure that a restricted number of features can provide acceptable results. In both parts, the main requirement was to use linear classifiers, in line with the earlier assumption that image quality depends linearly on the image features.
We used a logistic regression classifier, which assumes a linear dependence between the log-odds of the outcome and the features. For every pair, we use the differences of the features between the left and right images and a binary target variable that equals 1 if the left image wins. We used the scikit-learn implementation of the logistic regression classifier, which is described on the library's webpage. To evaluate model performance and see whether the selected features provide good results, we studied quality metrics such as the accuracy score and the area under the ROC curve. In the final step, we took the best ten models of five features and performed a number of binary classification experiments using a logistic regression classifier with an intercept.
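A minimal scikit-learn sketch of this setup follows. The pair construction (feature differences, target 1 when the left image wins) follows the text; the synthetic data, the hypothetical weights [1.5, -1, 0.5], the 1200/300 split, and the helper names are assumptions for illustration only. With difference features, the intercept can absorb any systematic preference for the left or right position.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score


def pair_dataset(features, left, right, left_wins):
    """One row per comparison: feature differences (left - right), target 1 if left wins."""
    X = features[left] - features[right]
    y = np.asarray(left_wins, dtype=int)
    return X, y


def train_and_score(X_train, y_train, X_test, y_test):
    clf = LogisticRegression(fit_intercept=True)
    clf.fit(X_train, y_train)
    prob = clf.predict_proba(X_test)[:, 1]  # estimated P(left image wins)
    return accuracy_score(y_test, clf.predict(X_test)), roc_auc_score(y_test, prob)


# hypothetical data: 40 images, 3 features, 1500 comparisons from a logistic model
rng = np.random.default_rng(0)
features = rng.normal(size=(40, 3))
left = rng.integers(0, 40, size=1500)
right = rng.integers(0, 40, size=1500)
z = (features[left] - features[right]) @ np.array([1.5, -1.0, 0.5])
X, y = pair_dataset(features, left, right, rng.random(1500) < 1.0 / (1.0 + np.exp(-z)))
accuracy, auc = train_and_score(X[:1200], y[:1200], X[1200:], y[1200:])
print(accuracy, auc)
```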
The first part of the experiment consisted of training the classifier on one homogeneous set of images and testing it on the other. These experiments showed very low quality regardless of the number of features in the model: accuracy is below 45%, and precision and recall are close to 50%, i.e. no better than random choice. This held for all experiments with the same design. As an example, Table 4 shows the feature weights for the same model learned on each set of images; the coefficients differ between the sets.
Table 4. Feature weights learned on each set
Feature  | MS     | TS     | Mixed set
EntF0    | 0.7244 | 0.745  | 0.632
Spec0    | 0.5731 | 0.01   | 0.312
Block60  | 1.293  | 0.4741 | 0.65
Spec2    | 1.195  | 0.424  | 0.251
Contr22  | 1.634  | 0.396  | 0.56
F1 score | 0.75   | 0.75   | 0.63

When training and testing on the same set of images, better results were achieved even with a five-feature set. For example, the fifth model from Table 3 gives better results on both sets: using shuffle-split cross-validation with a test size of 20%, it reaches an average accuracy of 72% on the trees dataset and 71% on the medical dataset. On the mixed dataset, where examples from both sets were included in the training and testing samples, the average accuracy is about 59%.
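The two evaluation designs (shuffle-split cross-validation within one set with a 20% test size, and training on one set while testing on the other) can be sketched with scikit-learn as below. The two simulated sets are hypothetical: their weights differ in the sign of the second feature, mimicking the sign disagreement noted for the real sets, and none of these numbers come from the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score


def simulate(rng, true_w, n_images=40, n_pairs=1000):
    """Hypothetical comparison set: difference features X and target y (1 = left wins)."""
    features = rng.normal(size=(n_images, len(true_w)))
    left = rng.integers(0, n_images, size=n_pairs)
    right = rng.integers(0, n_images, size=n_pairs)
    X = features[left] - features[right]
    y = (rng.random(n_pairs) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(int)
    return X, y


def same_set_accuracy(X, y, n_splits=10, seed=0):
    """Mean accuracy over random splits with 20% of the pairs held out for testing."""
    cv = ShuffleSplit(n_splits=n_splits, test_size=0.2, random_state=seed)
    return cross_val_score(LogisticRegression(), X, y, cv=cv, scoring="accuracy").mean()


def cross_set_accuracy(X_train, y_train, X_test, y_test):
    """Train on one set of images and test on the other."""
    return LogisticRegression().fit(X_train, y_train).score(X_test, y_test)


rng = np.random.default_rng(0)
X_a, y_a = simulate(rng, np.array([2.0, 2.0]))   # set A
X_b, y_b = simulate(rng, np.array([2.0, -2.0]))  # set B: second weight has the opposite sign
same = same_set_accuracy(X_a, y_a)
cross = cross_set_accuracy(X_a, y_a, X_b, y_b)
print(same, cross)  # same-set accuracy should be clearly higher than cross-set accuracy
```

In this toy setting the weights learned on set A carry no information about set B, so the cross-set accuracy stays near chance level, which mirrors the pattern reported for the real image sets.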
Another set of experiments considered models including all 57 features. In this case, the average accuracy is 76% for the mixed dataset, 80% for the medical dataset, and 77% for the trees dataset. This suggests that the best five-feature models contain most of the useful signal needed for classification.
When training on one set and testing on the other with all 57 features, the model still achieves only 50% accuracy.
These results show that the selected models with restricted feature sets are good enough for each of the two image sets. However, there is no universal quality formula for both sets at once, because the feature weights differ between them.
Figure 8. ROC curves for five-feature classifiers
Conclusion and further research
Using two datasets of very different nature, we identified the most important image quality factors explaining human perception of image quality. We used two major approaches. The first approach uses a vector of known quality measures obtained from the initial comparisons with an ad hoc formula, the ratio of wins to comparisons. The second approach treats quality as an unknown variable and estimates its values using the raw comparison matrix as the source of information. Comparing the two approaches by their fraction of falsely predicted pairwise comparisons (inverted image pairs), we obtained 29-30% for the first and 24-27% for the second approach.
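The first approach's quality measure and the inverted-pair rate can be illustrated as follows. This sketch assumes that the win-rate formula is simply wins divided by the number of comparisons an image took part in, and that comparisons are stored as arrays of 0-based winner and loser indices; the toy comparisons are hypothetical, and pairs with equal scores are not counted as inverted.

```python
import numpy as np


def win_rate(n_images, winners, losers):
    """Quality of each image as wins / comparisons it took part in (assumed formula)."""
    wins = np.bincount(winners, minlength=n_images)
    games = wins + np.bincount(losers, minlength=n_images)
    return np.divide(wins, games, out=np.zeros(n_images), where=games > 0)


def inverted_fraction(quality, winners, losers):
    """Share of comparisons whose loser has a strictly higher quality score."""
    return float(np.mean(quality[losers] > quality[winners]))


winners = np.array([0, 0, 1, 2])
losers = np.array([1, 2, 2, 0])
q = win_rate(3, winners, losers)              # [2/3, 1/2, 1/3]
print(inverted_fraction(q, winners, losers))  # 0.25: only the pair won by image 2 is inverted
```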
We also observed that some factors were conceptually similar, which allowed us to select a limited set of truly important quality factors. For medical images, this is a very useful finding: it enables us to interpret quality perception and not only to rank images by a number of features, but also to try to build a framework that improves particular image features. Such a tool could be one potential practical extension of this study.
We would still like to extend and generalize these results by validating them on more datasets. Another possible direction lies in ranking and classifying images by quality. After enlarging the dataset of manually ranked images, we could compare the ranking produced by a neural network, which can use a large number of all possible features, with that of a classifier using a restricted set of the most important features. However, such a comparison would only be fair on a dataset of neutral monochrome images, which makes it useful only for specific fields such as medicine and medical images.
All things considered, our results demonstrate that image quality perception can be modeled with a small set of non-reference factors that are easy to interpret. This can lead to new useful tools for image quality control.
Works cited
[1] Crete F., Dolmiere T., Ladret P. The Blur Effect: Perception and Estimation with a New No-Reference Perceptual Blur Metric. Grenoble: SPIE Electronic Imaging Symposium, Conf. Human Vision and Electronic Imaging, États-Unis d'Amérique, 2007.
[2] Kerouh F., Serir A. A no-reference blur image quality measure based on wavelet transform. Digital Information Processing and Communications, 2012.
[3] De K. A new no-reference image quality measure to determine the quality of a given image using object separability. Taipei: 2012 International Conference on Machine Vision and Image Processing (MVIP), 2012.
[4] Woodard J. P., Carley-Spencer M. P. No-reference image quality metrics for structural MRI. Neuroinformatics, vol. 4, 2006.
[5] Kumar J., Chen F., Doermann D. Sharpness estimation for document and scene images. Tsukuba: 21st International Conference on Pattern Recognition (ICPR), 2012, pp. 3292-3295.
[6] Chen C., Bloom J. A. A blind reference-free blockiness measure. Shanghai: Proceedings of the Pacific Rim Conference on Advances in Multimedia Information Processing, Part I, 2010.
[7] Liu X., Tanaka M., Okutomi M. Noise level estimation using weak textured patches of a single noisy image. IEEE International Conference on Image Processing (ICIP), 2012.
[8] Yuan T., Zheng X., Hu X., Zhou W., Wang W. A method for the evaluation of image quality according to the recognition effectiveness of objects in the optical remote sensing image using machine learning algorithm. PLoS ONE, 2014.
[9] Elo A. E. 8.4 Logistic Probability as a Rating Basis. In: The Rating of Chessplayers, Past and Present. NY, United States: Ishi Press International, 2008.
Source code
A. Elo rating approach
import math
import random

import numpy
import pandas as pd
import scipy.optimize


def softplus(x):
    # numerically stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    # numerically stable logistic function 1 / (1 + exp(-x))
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class LikelihoodCalculator:
    # features: one feature vector per image
    # comparisons: (a, b, v) with 0-based image indices; v == 1 if image a won
    def __init__(self, features, comparisons):
        self.features = features
        self.comparisons = comparisons

    def getRating(self, weights, features1):
        # rating of one image: weighted sum of its features (Eq. 42)
        return sum(weight * feature for (weight, feature) in zip(weights, features1))

    def getRatings(self, weights):
        return [self.getRating(weights, features1) for features1 in self.features]

    def getLogLikelihood(self, ratings):
        logLikelihoodSum = 0.0
        for (i1, i2, v) in self.comparisons:
            diff = ratings[i2] - ratings[i1]
            # log P(i1 wins) = -log(1 + exp(diff)), log P(i2 wins) = -log(1 + exp(-diff))
            logLikelihoodSum -= softplus(diff) if v == 1 else softplus(-diff)
        return logLikelihoodSum

    def updateDerivatives(self, weights, featuresA, featuresB, v, derivatives):
        # probability that A wins (Eq. 43)
        p = sigmoid(self.getRating(weights, featuresA) - self.getRating(weights, featuresB))
        # derivative of this comparison's log-likelihood term with respect to weight j
        for j in range(len(weights)):
            derivatives[j] += (v - p) * (featuresA[j] - featuresB[j])

    def __call__(self, weights):
        weights = list(weights)
        value = self.getLogLikelihood(self.getRatings(weights))
        derivatives = [0.0] * len(weights)
        for (a, b, v) in self.comparisons:
            self.updateDerivatives(weights, self.features[a], self.features[b], v, derivatives)
        print("Value: " + str(value))
        # fmin_l_bfgs_b minimizes, so return the negated log-likelihood and gradient
        return (-value, numpy.array([-d for d in derivatives]))


def findOptimalWeights(features, comparisons):
    weightsCount = len(features[0])
    weights0 = [random.random() for j in range(weightsCount)]
    print('START')
    (weights, f, d) = scipy.optimize.fmin_l_bfgs_b(LikelihoodCalculator(features, comparisons), weights0)
    print(f)
    print(d)
    return weights


def checkDerivative(obj, point, u):
    # compare the analytic derivative along coordinate u with finite differences
    (initialValue, gradient) = obj(point)
    print("Calculated derivative for u = " + str(u) + ": " + str(gradient[u]))
    for power in range(-7, -4):
        delta = 10.0 ** power
        pointWithDelta = list(point)
        pointWithDelta[u] += delta
        (value, gradient1) = obj(pointWithDelta)
        print("delta: " + str(delta) + ", value: " + str(value) + ", derivative: " + str((value - initialValue) / delta))


def main():
    features_df = pd.read_csv("/Users/nephidei/Documents/imgproc/final/reit/trees_features.csv", sep=';')
    comparisons_df = pd.read_csv("/Users/nephidei/Documents/imgproc/final/reit/trees_comp_sure.csv", sep=';')
    for nabor in [[3, 15, 2, 50, 6, 11, 21]]:
        # feature columns are selected by position
        features = features_df.iloc[:, nabor].values.tolist()
        # image indices in the file are 1-based; convert them to 0-based
        comparisons = [[int(a) - 1, int(b) - 1, int(v)] for (a, b, v) in comparisons_df.values.tolist()]
        weights = findOptimalWeights(features, comparisons)
        print("Weights: " + str(weights))
        ratings = LikelihoodCalculator(features, comparisons).getRatings(weights)
        okCount = 0
        badCount = 0
        for (a, b, v) in comparisons:
            # a comparison is predicted correctly if the higher-rated image won
            if (ratings[a] > ratings[b]) == (v == 1):
                okCount += 1
            else:
                badCount += 1
        print("OK: " + str(okCount) + ", bad: " + str(badCount))


if __name__ == "__main__":
    main()
B. Code for non-reference quality measures
Average gradient
function AG = avgGrad(image)
% original image F:
imageF = im2double(image);
[m, n] = size(imageF);
Gx = zeros(m-1,n-1);
for i=1:m-1
    for j=1:n-1
        a1 = imageF(i,j);
        a2 = imageF(i+1,j);
        a3 = imageF(i, j+1);
        sum1 = ((a1-a2)^2 + (a1-a3)^2);
        Gx(i,j) = sqrt((sum1/2));
    end
end
C = 1/((m-1)*(n-1));
S = sum(Gx(:));
AG = C*S;
Blockness
% blockness
% A Blind Reference-Free Blockiness Measure [6]
function blockness = blockness(image, bl)
% convert to double: uint8 subtraction saturates at zero and would lose negative differences
imageF = double(rgb2gray(image));
[m, n] = size(imageF);
% window width
w = 8;
% block size parameter: bl
% absolute differences between neighbouring rows (hor) and columns (vert)
diff_hor = abs(imageF(1:m-1, :) - imageF(2:m, :));
diff_vert = abs(imageF(:, 1:n-1) - imageF(:, 2:n));
% normalization
d_norm_hor = zeros(m,n);
for ii=1+w:m-w-1
    for j=1:n
        expr1 = sum(diff_hor(ii-w:ii+w,j).^2) - diff_hor(ii,j)^2;
        koren = double(expr1 / (2 * w + 0.0))^0.5;
        d_norm_hor(ii,j) = diff_hor(ii,j)/koren;
    end
end
% horizontal profile
prof_hor = 1/n*(sum(d_norm_hor,2));
PH_values = zeros(1, m-1);
sum_FPH = 0.0;
for ii = 1:m-1
    X = ii*(m-1)/bl-1.0;
    sum_PH = 0.0;
    for xi = 1:m-1
        sum_PH = sum_PH + prof_hor(xi) * exp(1i*2*pi*xi*X/(m-1));
    end
    FPH = abs(sum_PH);
    PH_values(ii) = FPH;
    sum_FPH = sum_FPH + FPH^2;
end
bm_h = 1/sum(prof_hor(1:m-1))*sqrt((1/(bl-1))*sum_FPH);
% normalization vert
d_norm_vert = zeros(m,n);
for ii=1:m
    for j=1+w:n-w-1
        expr1 = sum(diff_vert(ii,j-w:j+w).^2) - diff_vert(ii,j)^2;
        koren = double(expr1 / (2 * w + 0.0))^0.5;
        d_norm_vert(ii,j) = diff_vert(ii,j)/koren;
    end
end
% vertical profile
prof_vert = 1/n*(sum(d_norm_vert,1));
PH_values_vert = zeros(1, n-1);
sum_FPH_vert = 0.0;
for j = 1:n-1
    X_vert = j*(n-1)/bl-1.0;
    sum_PH_vert = 0.0;
    for xj = 1:n-1
        sum_PH_vert = sum_PH_vert + prof_vert(xj) * exp(1i*2*pi*xj*X_vert/(n-1));
    end
    FPH_vert = abs(sum_PH_vert);
    PH_values_vert(j) = FPH_vert;
    sum_FPH_vert = sum_FPH_vert + FPH_vert^2;