See5 Algorithm versus Discriminant Analysis. An Application to the Prediction of Insolvency in Spanish Non-life Insurance Companies

Zuleyka Díaz Martínez
Department of Financial Economics and Accounting I, Universidad Complutense de Madrid. Campus de Somosaguas, s/n. 28223 Madrid, Spain. Telephone: +34 913942577, fax: +34 913942570; e-mail: zuleyka@ccee.ucm.es

José Fernández Menéndez
Department of Business Administration, Universidad Complutense de Madrid. Campus de Somosaguas, s/n. 28223 Madrid, Spain. Telephone: +34 913942971, fax: +34 913942371; e-mail: jfernan@ccee.ucm.es

Mª Jesús Segovia Vargas
Department of Financial Economics and Accounting I, Universidad Complutense de Madrid. Campus de Somosaguas, s/n. 28223 Madrid, Spain. Telephone: +34 913942564, fax: +34 913942570; e-mail: mjsegovia@ccee.ucm.es

Abstract

Prediction of insurance companies' insolvency has arisen as an important problem in the field of financial research, due to the necessity of protecting the general public whilst minimizing the costs associated with this problem. Most methods applied in the past to tackle this question are traditional statistical techniques which use financial ratios as explanatory variables. However, these variables do not usually satisfy statistical assumptions, which complicates the application of these methods. In this paper, a comparative study of the performance of a well-known parametric statistical technique (Linear Discriminant Analysis) and a non-parametric machine learning technique (See5) is carried out. We have applied the two methods to the problem of predicting the insolvency of Spanish non-life insurance companies on the basis of a set of financial ratios. Results indicate a higher performance of the machine learning technique, which shows that this method can be a useful tool to evaluate the insolvency of insurance firms.

Keywords: Insolvency, Insurance Companies, Discriminant Analysis, See5.

1.
Introduction

Unlike other financial problems, a great number of agents face business failure, so research in this topic has attracted growing interest in recent decades. Insolvency, early detection of financial distress, and the conditions leading to insolvency of insurance companies have been a concern of parties such as insurance regulators, investors, management, financial analysts, banks, auditors, policy holders and consumers. This concern has arisen from the necessity of protecting the general public against the consequences of insurers' insolvencies, as well as of minimizing the costs associated with this problem, such as the effects on state insurance guaranty funds or the liabilities of management and auditors. It has long been recognized that some form of supervision of such entities is needed to minimize the risk of failure. Nowadays, the Solvency II project is intended to lead to the reform of the existing solvency rules in the European Union.

Many insolvency cases appeared after the insurance cycles of the 1970s and 1980s in the United States and the European Union. Several surveys have been devoted to identifying the main causes of insurers' insolvency; in particular, the Müller Group Report (1997) analyses the main identified causes of insurance insolvencies in the European Union. These causes can be summarized as follows: operational risks (operational failures related to inexperienced or incompetent management, fraud); underwriting risks (inadequate reinsurance programmes and failure to recover from reinsurers, higher losses due to rapid growth, excessive operating costs, poor underwriting processes); insufficient provisions; and imprudent investments. On the other hand, many insurance companies, especially the larger ones, have developed internal risk models for a number of purposes. In Spain, whilst not a formal requirement, many insurers use internal risk models, developed to greater or lesser degrees.
In general, these models are partial in nature and do not cover all the risks. Whilst the Spanish insurance supervisor analyses the models, difficulties are noted in relation to verification of their level of reliability, congruency with accounting data, lack of harmonization, and limited application as a management tool at business unit level. Nevertheless, the Spanish supervisor is working on the development of an early warning system based on insurers' internal models (KPMG, 2002). Therefore, developing new methods for the prudential supervision of insurance companies is a highly topical question, especially for the countries of the European Union, as is the case of Spain.

A large number of methods have been proposed to predict business failure; however, the special characteristics of the insurance sector have made most of them unfeasible, and just a few have been applied to this sector. Most approaches applied to the prediction of failure in insurance companies are statistical methods such as discriminant or logit analysis (Ambrose and Carroll, 1994; Bar-Niv and Smith, 1987; Mora, 1994; Sanchís et al., 2003), which use financial ratios as explanatory variables. However, these variables do not usually satisfy statistical assumptions. In order to avoid these problems, a number of non-parametric techniques have been developed, most of them belonging to the field of Machine Learning, such as neural networks (Serrano and Martín, 1993; Tam, 1991), which have been successfully applied to this kind of problem. However, their black-box character makes them difficult to interpret, and hence the results obtained cannot be clearly analysed and related to the economic variables for discussion. Other machine learning methods, such as the one we test in this paper (the See5 algorithm), are more useful for economic analysis, because the models they provide can be easily understood and interpreted by human analysts.
We will compare the accuracy of the See5 algorithm and Linear Discriminant Analysis (LDA) in predicting the insolvency of insurance companies. Some previous research has compared machine learning methods with traditional statistical approaches (Altman et al., 1994; De Andrés, 2001; Dimitras et al., 1999; Dizdarevic et al., 1999), but only in a few papers have the comparisons focused on the insurance sector (Martínez de Lejarza, 1999; Segovia et al., 2003). In this paper a sample of Spanish non-life insurance firms is used, employing general financial ratios as well as ratios specifically proposed for evaluating insolvency in the insurance sector. The results of See5 are very encouraging in comparison with LDA and show that this technique can be a useful tool for parties interested in evaluating the insolvency of an insurance firm.

The rest of the paper is structured as follows: section 2 introduces some concepts of the tested techniques. In section 3 we describe the data and input variables. In section 4 the results of the two approaches are presented; the discussion and comparison of these results are also provided in this section. Finally, section 5 closes the paper with some concluding remarks.

2. A brief overview of the tested techniques

2.1. The See5 algorithm

Perhaps learning systems based on decision trees are the easiest to use and to understand of all machine learning methods. Moreover, the condition-and-branch structure of a decision tree is well suited to classification problems, and the prediction of insolvency is a classification problem, as we try to classify firms as solvent or insolvent. The automatic construction of decision trees begins with the studies developed in the social sciences by Morgan and Sonquist (1963) and Morgan and Messenger (1973). In statistics, the CART (Classification and Regression Trees) algorithm to generate decision trees, proposed by Breiman et al. (1984), is one of the most important contributions.
At around the same time decision tree induction began to be used in the field of machine learning, notably by Quinlan (1979, 1983, 1986, 1988, 1993 and 1997), and in engineering by Henrichon and Fu (1969) and Sethi and Sarvarayudu (1982). The successive branches of a decision tree produce a series of exhaustive and exclusive partitions of the set of objects that a decision maker wants to classify. The main difference among the various algorithms used is the criterion followed to carry out these partitions.

The See5 algorithm (Quinlan, 1997) is the latest version of the ID3 and C4.5 algorithms developed by this author over the last two decades. The criterion employed by See5 to carry out the partitions is based on concepts from Information Theory and has been improved significantly over time. The main idea, shared with similar algorithms, is to choose at each branch the variable that provides the most information for making the appropriate partition in order to classify the training set.

The information provided by a message, or by the realization of a random variable x, is inversely proportional to its probability (Reza, 1994). This quantity is usually measured in bits, obtained through the relation log₂(1/pₓ). The average of this quantity over all the possible outcomes of the random variable x is called the entropy of x:

    H(x) = Σ_x p(x) log₂(1/p(x))

The entropy is a measure of the randomness or uncertainty of x, or a measure of the average amount of information supplied by the knowledge of x. In the same way, we can define the joint entropy of two random variables x and y:

    H(x, y) = Σ_{x,y} p(x, y) log₂(1/p(x, y))

which represents the average amount of information supplied by the joint knowledge of x and y.
The conditional entropy of x given the variable y, H(x|y), is defined as

    H(x|y) = Σ_{x,y} p(x, y) log₂(1/p(x|y))

and this relation is a measure of the uncertainty about x when y is known; that is, the amount of information still necessary to know x completely once the information provided by y is available. Naturally, H(x|y) ≤ H(x), because if y is known we have additional information that can help to reduce the uncertainty about x. This reduction in uncertainty is called the mutual information between x and y:

    I(x; y) = H(x) − H(x|y)

which is the information provided by one of the variables about the other. It always holds that I(x; y) = I(y; x); consequently, the amounts of information that each variable provides about the other are equal. The mutual information is similar to the covariance, but the former satisfies some properties that make it preferable.

We can consider x to be a random variable representing the category to which an object belongs, while yᵢ, i = 1, 2, ..., n, represents the set of attributes describing the objects we want to classify. Initially, Quinlan chose for each partition the yᵢ-variable that provided the maximum information about x, that is, he maximized I(x; yᵢ) (which he called the gain). Although this procedure provided good results, it introduces a bias in favour of yᵢ-variables with many outcomes. In order to avoid this drawback, subsequent releases of the algorithm choose the yᵢ-variable that maximizes the following relation, called the gain ratio:

    I(x; yᵢ) / H(yᵢ)

This ratio represents the percentage of the information provided by yᵢ that is useful for characterizing x. Notice that I(x; yᵢ) should be large enough to prevent an attribute from being chosen merely because it has a low entropy, which would inflate the gain ratio.
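To make these definitions concrete, the entropy, conditional entropy, mutual information and gain ratio can all be computed from frequency counts. The sketch below (plain Python, with the class labels x and one attribute y given as parallel lists) follows the formulas above directly; it is an illustration, not See5's internal implementation:

```python
import math
from collections import Counter

def entropy(values):
    """H(x) = sum_x p(x) * log2(1/p(x))."""
    n = len(values)
    return sum(c / n * math.log2(n / c) for c in Counter(values).values())

def conditional_entropy(x, y):
    """H(x|y) = sum_{x,y} p(x,y) * log2(1/p(x|y)), with p(x|y) = p(x,y)/p(y)."""
    n = len(x)
    joint = Counter(zip(x, y))        # counts of each (x, y) pair
    y_counts = Counter(y)             # counts of each y value
    return sum(c / n * math.log2(y_counts[yv] / c)
               for (xv, yv), c in joint.items())

def gain_ratio(x, y):
    """I(x;y) / H(y), where I(x;y) = H(x) - H(x|y)."""
    mutual_info = entropy(x) - conditional_entropy(x, y)
    return mutual_info / entropy(y)
```

For instance, an attribute that perfectly separates the two classes gives a gain ratio of 1, while an attribute independent of the class gives 0; the tree-growing step would pick, at each branch, the attribute with the highest gain ratio.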
A common problem for the majority of rule and tree induction systems is that the generated models can become too closely adapted to the training set, so that the classification obtained on it is nearly perfect. The derived model will then be very specific, and when we want to classify new objects it will not provide good results, especially if the training set contains noise. In that case the model would be influenced by errors (noise), leading to a lack of generalization. This problem is known as overfitting. The most frequent way of limiting this problem in the context of decision trees and rule sets consists of eliminating some conditions from the branches of the tree or from the rules, in order to obtain more general models. In the case of decision trees, this procedure can be considered a pruning process. This will increase the misclassifications on the training set but, at the same time, will probably decrease the misclassifications on the test set that has not been used to derive the decision tree.

Quinlan incorporates a post-pruning method applied to an originally fitted tree, instead of developing a pruned tree directly. This method consists of replacing a branch of the tree by a leaf, conditional on a predicted error rate. Suppose that a leaf covers N objects and misclassifies E of them. This can be modelled as a binomial distribution in which the experiment is repeated N times, obtaining E errors. From this, the probability of error p_e is estimated, and it will be taken as the aforementioned predicted error rate. It is thus necessary to estimate a confidence interval for the probability of error of the binomial distribution; the upper limit of this interval is taken as p_e (a pessimistic estimate).
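A rough sketch of this pessimistic estimate is given below. We assume the Wilson score upper bound as the confidence limit and z ≈ 0.69, which roughly corresponds to the 25% default confidence level commonly quoted for C4.5-style pruning; See5's exact procedure may differ in detail:

```python
import math

def pessimistic_error(E, N, z=0.69):
    """Upper limit of a confidence interval for the binomial error
    probability, given E errors observed in N cases (Wilson score bound).
    z = 0.69 approximates C4.5's default 25% confidence level (assumption)."""
    f = E / N  # observed error rate
    num = f + z * z / (2 * N) + z * math.sqrt(f / N - f * f / N + z * z / (4 * N * N))
    return num / (1 + z * z / N)

def predicted_errors(E, N, z=0.69):
    """Predicted number of errors for a leaf covering N cases: N * p_e."""
    return N * pessimistic_error(E, N, z)
```

A branch is then replaced by a leaf whenever the leaf's predicted errors are lower than the sum of the predicted errors over the branch's leaves. Note that even a leaf with zero observed errors gets a strictly positive predicted error rate, which is what makes the estimate pessimistic.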
Then, for a leaf that covers N objects, the number of predicted errors will be N · p_e. If we consider a branch instead of a leaf, the number of predicted errors associated with the branch is just the sum of the predicted errors of its leaves. Therefore, a branch will be replaced by a leaf when the number of predicted errors for the latter is lower than that for the branch.

Furthermore, the See5 algorithm includes additional features, such as a method to convert the obtained tree into a set of classification rules, which are generally easier to understand than the tree. For a more detailed description of the features and workings of the See5 algorithm, see Quinlan (1993 and 1997).

2.2. Linear Discriminant Analysis (LDA)

Although the classical methods of multivariate analysis have been superseded by methods from pattern recognition (Duda et al., 2001; Venables and Ripley, 2002), they still have a place. In this paper, we have used one of these classical methods, LDA, as a benchmark against which to compare the performance of the aforementioned machine learning method, the See5 algorithm. LDA was introduced by Fisher in 1936. Its aim is to classify a new object according to the value of an estimated linear function of some attributes of that object. Geometrically, the new object is mapped to the same class as the objects located in its neighbourhood. LDA is subject to certain restrictive assumptions: each group follows a multivariate normal distribution, the covariance matrices of the groups are identical (homoscedasticity), and prior probabilities and misclassification costs are known. If these theoretical assumptions are violated, the results obtained may be questionable. When this happens, LDA can still be seen as a non-parametric classification method, not optimal, but quite good in many situations (Krzanowski, 1996).

3. Methodological aspects

In this section, we show the main characteristics of the data and variables that will be used to develop our models.
We have used the sample of Spanish firms used by Sanchís et al. (2003). This sample consists of data on non-life insurance firms up to five years prior to failure. The firms were in operation or went bankrupt between 1983 and 1994. From this period, 72 firms (36 failed and 36 non-failed) were selected. As a control measure, each failed firm is matched with a non-failed one in terms of industry and size (premium volume). We have developed three models using data from one, two and three years before the firms declared bankruptcy. Thus, it has to be noted that the prediction of insolvency achieved by each of them will be one, two and three years in advance, respectively. We refer to these models as Model 1, Model 2 and Model 3.

In order to test the predictive accuracy of the models, we have split the original data into training sets to build the models and holdout samples (test sets) to validate them. For Model 1, the training set consisted of 54 randomly selected firms (27 failed and 27 non-failed), leaving 18 firms (9 failed and 9 non-failed) for testing. The sample size differs for each of the other years because data did not exist for all the firms. The following table shows these sample sizes, as well as the sizes of the randomly generated training sets used to develop the models and of the test sets used to validate them.

Model   Sample size (firms)              Training set (firms)             Test set (firms)
1       72 (36 failed, 36 non-failed)    54 (27 failed, 27 non-failed)    18 (9 failed, 9 non-failed)
2       68 (34 failed, 34 non-failed)    52 (26 failed, 26 non-failed)    16 (8 failed, 8 non-failed)
3       54 (27 failed, 27 non-failed)    40 (20 failed, 20 non-failed)    14 (7 failed, 7 non-failed)

As for the variables, each firm is described by 21 financial ratios drawn from a detailed analysis of the variables and from previous bankruptcy studies for insurance companies.
Table 1 shows the 21 ratios which describe the firms. Note that the special financial characteristics of insurance companies require general financial ratios as well as ratios specifically proposed for evaluating insolvency in the insurance sector. The ratios have been calculated from the financial statements (balance sheets and income statements) issued one, two and three years before the firms declared bankruptcy. Ratios 15 and 16 have been removed from our study because most of the firms do not have "other income", so there is no sense in using them for an economic analysis. This reduces the total number of ratios to 19. We want to mention that the Linear Discriminant Analysis has been performed using SPSS 11.0, and the software used to implement the See5 algorithm is See5 by RULEQUEST RESEARCH.

4. Results

4.1. See5 algorithm

We have developed three models (three decision trees). We refer to them as Model 1, Model 2 and Model 3. They have been developed using, respectively, the previously mentioned training sets 1, 2 and 3, and we have tested them with test sets 1, 2 and 3, as shown below:

Model 1

    R13 > 0.68:
    :...R9 <= 0.59: failed (14)
    :   R9 > 0.59:
    :   :...R17 <= 0.99: failed (3)
    :       R17 > 0.99: healthy (3)
    R13 <= 0.68:
    :...R1 > 0.29: healthy (20/2)
        R1 <= 0.29:
        :...R2 > 0.04: failed (3)
            R2 <= 0.04:
            :...R6 > 0.64: healthy (3)
                R6 <= 0.64:
                :...R9 <= 0.85: failed (4)
                    R9 > 0.85: healthy (4/1)

Evaluation on training data (54 cases):

    Decision Tree
    ----------------
    Size      Errors
       8    3 (5.6%)

    (a)  (b)    <-classified as
    ---- ----
     27         (a): class healthy
      3   24    (b): class failed

Evaluation on test data (18 cases):

    Decision Tree
    ----------------
    Size      Errors
       8   5 (27.8%)

    (a)  (b)    <-classified as
    ---- ----
      7    2    (a): class healthy
      3    6    (b): class failed

As we can see, only 6 ratios appear in the tree instead of the 19 initial ones.
This indicates that these 6 variables are the most relevant ones for discriminating between solvent and insolvent firms in our sample and, consequently, it shows the strong support this approach provides for feature selection. The tree is read in the following way:

- If ratio R13 is greater than 0.68 and ratio R9 is less than or equal to 0.59, then the company is classified as "failed". This condition is satisfied by 14 firms in our sample.
- If ratio R13 is greater than 0.68, ratio R9 is greater than 0.59 and ratio R17 is less than or equal to 0.99, then the company is classified as "failed"; 3 companies fulfil these conditions.
- And so on.

Every leaf of the tree is followed by a number n or n/m. The value of n is the number of cases in the sample that are mapped to this leaf, and m (if it appears) is the number of them that are classified incorrectly by the leaf. The section under the tree concerns the evaluation of the decision tree, first on the cases of the training set from which it was constructed, and then on the new cases of the test set. The size of the tree is its number of leaves, and the column headed "Errors" shows the number and percentage of cases misclassified. The tree, with 8 leaves, misclassifies 3 of the 54 given cases, which implies an error rate of 5.6%, that is, 94.4% of firms correctly classified. Performance on the training cases is further analyzed in a confusion matrix that pinpoints the kinds of errors made. A similar report of performance is given for the test cases, showing the model's accuracy on unseen data: an error rate of 27.8%, that is, 72.2% of firms correctly classified.

Though the tree we have derived is quite easy to understand, the trees developed are sometimes difficult to interpret. An important feature of See5 is its ability to generate unordered collections of if-then rules, which are simpler and easier to understand than decision trees.
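As an illustration of how such a tree classifies a firm, the Model 1 tree above can be transcribed directly into nested conditions (representing a firm as a dictionary keyed by ratio names is our own convention, not See5's):

```python
def classify(r):
    """Direct transcription of the Model 1 tree; r maps ratio names
    (e.g. 'R13') to their values for one firm."""
    if r['R13'] > 0.68:
        if r['R9'] <= 0.59:
            return 'failed'
        return 'failed' if r['R17'] <= 0.99 else 'healthy'
    # R13 <= 0.68 branch
    if r['R1'] > 0.29:
        return 'healthy'
    if r['R2'] > 0.04:
        return 'failed'
    if r['R6'] > 0.64:
        return 'healthy'
    return 'failed' if r['R9'] <= 0.85 else 'healthy'
```

For example, a firm with R13 = 0.7 and R9 = 0.5 falls into the first leaf and is classified as "failed", matching the first bullet above.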
The rules obtained from the previous tree are:

    Rule 1: (20/2, lift 1.7)
        R1 > 0.29
        R13 <= 0.68
        -> class healthy [0.864]

    Rule 2: (12/1, lift 1.7)
        R2 <= 0.04
        R6 > 0.64
        R13 <= 0.68
        -> class healthy [0.857]

    Rule 3: (7/1, lift 1.6)
        R9 > 0.85
        -> class healthy [0.778]

    Rule 4: (14, lift 1.9)
        R9 <= 0.59
        R13 > 0.68
        -> class failed [0.938]

    Rule 5: (7, lift 1.8)
        R13 > 0.68
        R17 <= 0.99
        -> class failed [0.889]

    Rule 6: (26/6, lift 1.5)
        R1 <= 0.29
        -> class failed [0.750]

    Default class: healthy

Each rule consists of:

- Statistics (n, lift x, or n/m, lift x) that summarize the performance of the rule. As for a leaf, n is the number of training cases covered by the rule and m, if it appears, shows how many of them do not belong to the class predicted by the rule. The lift x is the result of dividing the estimated accuracy of the rule by the relative frequency of the predicted class in the training set. The accuracy of the rule is estimated by the Laplace ratio (n - m + 1)/(n + 2) (Clark and Boswell, 1991; Niblett, 1987).
- One or more conditions that must all be satisfied for the rule to be applicable.
- The class predicted by the rule.
- A value between 0 and 1 that indicates the confidence with which this prediction is made.

There is also a default class, here "healthy", that is used when an object does not match any rule. In this model, performance on the training cases and on the test cases is the same with this ruleset as with the previous tree, but this will not always be the case.

Although these results are satisfactory, they can be improved by appealing to the boosting option that See5 incorporates, based on the research of Freund and Schapire (1997). Boosting is a technique for generating and combining multiple classifiers to improve predictive accuracy. Very briefly, the idea is to generate several classifiers (either decision trees or rulesets) rather than just one.
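The rule statistics listed above can be reproduced with the Laplace ratio. For instance, Rule 4 covers 14 training cases with no errors, and the "failed" class has relative frequency 27/54 = 0.5 in training set 1:

```python
def laplace_accuracy(n, m=0):
    """Laplace ratio (n - m + 1) / (n + 2) for a rule covering n
    training cases, m of which it misclassifies."""
    return (n - m + 1) / (n + 2)

def lift(n, m, class_freq):
    """Estimated accuracy of the rule divided by the relative
    frequency of the predicted class in the training set."""
    return laplace_accuracy(n, m) / class_freq

# Rule 4: n = 14, m = 0 -> accuracy 15/16 = 0.9375, i.e. [0.938]
# Rule 1: n = 20, m = 2 -> accuracy 19/22 = 0.864..., lift 1.7
```

Both values match the listing above, which is a useful sanity check on how See5 reports its rules.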
As the first step, a single decision tree or ruleset is constructed as before from the training data. This classifier will usually make mistakes on some cases in the data. When the second classifier is constructed, more attention is paid to these cases in an attempt to get them right. As a consequence, the second classifier will generally be different from the first. It will also make errors on some cases, and these become the focus of attention during construction of the third classifier. This process continues for a pre-determined number of iterations or trials. Finally, when a new case is to be classified, each classifier votes for its predicted class and the votes are counted to determine the final class. The results obtained with this method are frequently very good. Starting from the previous tree, the results reached by means of the boosting option with 18 trials are shown in the following table, as percentages of correctly classified firms:

Correct classifications   Training set   Test set
"healthy" firms           100%           77.78%
"failed" firms            100%           88.89%
Total                     100%           83.33%

The sets of variables in the trees that constitute the rest of the models are shown in the next table, which also displays performance on the training and test cases as percentages of correctly classified firms. Trees 2 and 3 have been pruned, because we previously observed that the error rates were much smaller on the training sets than on the test sets, which could be due to an overfitting problem. However, pruning does not improve performance on the first tree.
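The boosting loop described above can be sketched in miniature. Here decision stumps on a single synthetic feature stand in for See5's trees, and the reweighting scheme is the AdaBoost.M1 rule of Freund and Schapire (1997); See5's own boosting variant may differ in detail:

```python
import math

def train_stump(X, y, w):
    """Best weighted threshold classifier on a single feature.
    Labels are +1/-1; returns (weighted error, threshold, sign)."""
    best = None
    for thr in sorted(set(X)):
        for sign in (1, -1):
            pred = [sign if x > thr else -sign for x in X]
            err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def adaboost(X, y, trials=10):
    """Reweight misclassified cases each trial, then combine the
    stumps by weighted vote (AdaBoost.M1-style)."""
    n = len(X)
    w = [1 / n] * n
    ensemble = []
    for _ in range(trials):
        err, thr, sign = train_stump(X, y, w)
        err = max(err, 1e-10)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)  # classifier's voting weight
        ensemble.append((alpha, thr, sign))
        # increase the weight of cases this classifier got wrong
        w = [wi * math.exp(-alpha * yi * (sign if x > thr else -sign))
             for wi, x, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def vote(ensemble, x):
    """Final class by weighted vote of all classifiers."""
    score = sum(a * (s if x > t else -s) for a, t, s in ensemble)
    return 1 if score > 0 else -1
```

On a toy sample such as X = [1, 2, 3, 4] with labels [-1, -1, 1, 1], the ensemble separates the two classes after the first trial; the point of the loop is that later classifiers concentrate on the cases earlier ones got wrong.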
Model  Set of variables            Size  Training set                   Test set
                                         "Healthy"  "Failed"  Total     "Healthy"  "Failed"  Total
1      R13, R9, R17, R1, R2, R6    8     100%       88.89%    94.44%    77.78%     66.67%    72.22%
2      R1, R13, R20, R7, R3        6     96.15%     84.62%    90.39%    87.5%      75%       81.25%
3      R4, R19, R1                 5     100%       70%       85%       100%       57.14%    78.57%

As previously mentioned, classification accuracy can often be improved by means of boosting. For example, for Model 2, the results obtained by boosting with 11 trials are shown in the following table, as percentages of correctly classified firms:

Correct classifications   Training set   Test set
"healthy" firms           100%           87.5%
"failed" firms            100%           87.5%
Total                     100%           87.5%

4.2. Linear Discriminant Analysis

As a preliminary step, we detected the univariate outliers and, due to the shortage of data, substituted them with the median of the corresponding attribute instead of eliminating them. We have verified that the great majority of variables follow a normal univariate distribution and that these variables are not discriminatory; in other words, their means are not significantly different between groups. Next, the discriminant analysis was carried out using the stepwise method for the selection of the variables to introduce into the models. The variables were always chosen from those presenting the most significant difference of means between the groups. Furthermore, using Box's M test we have checked that the homoscedasticity assumption is not satisfied. The results obtained are shown in the following table:

Model  Set of variables  Training set                   Test set
                         "Healthy"  "Failed"  Total     "Healthy"  "Failed"  Total
1      R1, R7            77.78%     59.26%    68.52%    77.78%     44.44%    61.11%
2      R12, R17          73.08%     65.38%    69.23%    25%        75%       50%
3      R4                90%        60%       75%       57.14%     42.86%    50%

4.3.
Results comparison

To facilitate the comparison between the two approaches, the following table shows the results on the test samples, as percentages of correctly classified firms:

Model  Technique  Set of variables            "Healthy"  "Failed"  Total
1      See5       R13, R9, R17, R1, R2, R6    77.78%     66.67%    72.22%
       LDA        R1, R7                      77.78%     44.44%    61.11%
2      See5       R1, R13, R20, R7, R3        87.5%      75%       81.25%
       LDA        R12, R17                    25%        75%       50%
3      See5       R4, R19, R1                 100%       57.14%    78.57%
       LDA        R4                          57.14%     42.86%    50%

Roughly speaking, See5 clearly outperforms LDA. In fact, the latter works like a random classifier in Models 2 and 3. The machine learning technique selects many more ratios than LDA, so it makes better use of the available information, which leads to a higher correct classification rate. Probably the structure of the data space is too complex to achieve a good classification with a linear hypersurface, as LDA attempts. The more complex rules generated by the machine learning technique adapt better to the data structure; it is a very powerful tool for capturing the peculiarities of the data in detail. Moreover, as we saw previously, the See5 results for some models can be clearly improved by means of boosting.

Each technique uses a quite different set of variables in its models. However, the differences between the models are not as great as they seem, because of the correlations between the variables: if different variables are correlated, they can provide the same information to the models. Naturally, the ratios which appear in the solutions are not the same for each year, because the prediction of insolvency achieved by each model is one, two and three years in advance, respectively. We can consider the ratios which appear in both the See5 and the LDA solutions to be highly discriminatory variables between solvent and insolvent firms.
Consequently, parties interested in evaluating the solvency of non-life insurance companies should take into account the following points:

a) R1. One of the most important requirements for the proper functioning of any firm is sufficient liquidity. But in the case of an insurance firm, a lack of liquidity should not arise, owing to the inversion of the production cycle, which implies that premiums are collected before claims occur. If an insurance firm cannot pay the incurred claims, its clients and the public in general could lose faith in the company. Moreover, this ratio is a measure of financial equilibrium when it is positive, as this implies that the working capital is also positive.

b) R4. This ratio is a general measure of profitability. The variable in the numerator is the cash flow (cash flow plus extraordinary results), because it is sometimes preferable to profits, being less subject to manipulation. In any case, it is necessary to generate sufficient profitability to sustain adequate self-financing.

c) R7. This ratio is considered a solvency ratio in the strict sense. The numerator reflects the risk exposure through earned premiums, and the denominator reflects the real financial support, since technical provisions are considered together with capital and reserves. This demonstrates the need to hold sufficient shareholders' funds and to set the technical provisions correctly in order to guarantee the financial viability of the insurance company. This ratio belongs to the IRIS (Insurance Regulatory Information System) ratios, tests developed by the National Association of Insurance Commissioners (USA) as an early warning system.

d) R17. Combined ratio. This is a traditional measure of underwriting profitability, and it indicates whether the firm is following a correct rating process in order to calculate premiums that take the whole of its costs into account.
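For illustration, the combined ratio (R17) is conventionally computed as incurred claims plus expenses over earned premiums; Table 1's exact definition is not reproduced here, so this is the standard textbook form, not necessarily the paper's precise formula:

```python
def combined_ratio(claims_incurred, expenses, earned_premiums):
    """Conventional combined ratio: (claims + expenses) / earned premiums.
    Values above 1 (100%) indicate an underwriting loss."""
    return (claims_incurred + expenses) / earned_premiums
```

For example, a firm with 70 in incurred claims and 25 in expenses against 100 of earned premiums has a combined ratio of 0.95, i.e. an underwriting profit before investment income.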
5. Conclusions

In this paper we have applied the See5 algorithm and Linear Discriminant Analysis to a real problem: the classification of non-life insurance companies as healthy or failed. We have used a sample of Spanish companies described by a set of 19 financial ratios, and we have compared the results obtained for each model. In the light of the experiments carried out, the machine learning approach (See5) is a competitive alternative to existing bankruptcy prediction models in the insurance sector and has great potential, which undoubtedly makes it attractive for application to the field of business classification. Our empirical results show that this method offers better predictive accuracy than the Linear Discriminant Analysis we have developed. Moreover, this technique does not require the adoption of restrictive assumptions about the statistical distributions of the variables and errors of the models, and the decision models it provides are easily understandable and interpretable.

In practical terms, the trees and decision rules generated could be used to preselect companies for more thorough examination, quickly and inexpensively, thereby managing the financial user's time efficiently. They can also be used to check and monitor insurance firms as a "warning system" for insurance regulators, investors, management, financial analysts, banks, auditors, policy holders and consumers. However, our work has some limitations, such as the few available cases and the uncertain quality of some information. Furthermore, if we want to use these models for predicting insolvency, we should take into account that they have been developed without including some aspects which could be relevant for this purpose, such as size and industry. In spite of these problems, our objective is to show the suitability of this machine learning technique as a decision support method for the insurance sector.
In short, we believe that this method, without replacing the analyst's opinion and in combination with other techniques, can play a valuable role in the decision-making process in the insurance sector.

References

ALTMAN, E.I., MARCO, G. and VARETTO, F. (1994): "Corporate distress diagnosis: comparisons using linear discriminant analysis and neural networks (the Italian experience)", Journal of Banking and Finance, 18, 505-529.
AMBROSE, J.M. and CARROLL, A.M. (1994): "Using Best's Ratings in Life Insurer Insolvency Prediction", The Journal of Risk and Insurance, 61 (2), 317-327.
BAR-NIV, R. and SMITH, M.L. (1987): "Underwriting, Investment and Solvency", Journal of Insurance Regulation, 5, 409-428.
BREIMAN, L., FRIEDMAN, J.H., OLSHEN, R.A. and STONE, C.J. (1984): Classification and Regression Trees, Wadsworth, Belmont.
CLARK, P. and BOSWELL, R. (1991): "Rule Induction with CN2: Some Recent Improvements", in KODRATOFF, Y. (Ed.): Machine Learning - Proceedings of the Fifth European Conference (EWSL-91), Springer-Verlag, Berlin, 151-163.
DE ANDRÉS, J. (2001): "Statistical Techniques vs. SEE5 Algorithm. An Application to a Small Business Environment", International Journal of Digital Accounting Research, 1 (2), 153-179.
DIMITRAS, A.I., SLOWINSKI, R., SUSMAGA, R. and ZOPOUNIDIS, C. (1999): "Business failure prediction using Rough Sets", European Journal of Operational Research, 114, 263-280.
DIZDAREVIC, S., LARRAÑAGA, P., PEÑA, J.M., SIERRA, B., GALLEGO, M.J. and LOZANO, J.A. (1999): "Predicción del fracaso empresarial mediante la combinación de clasificadores provenientes de la estadística y el aprendizaje automático", in BONSÓN, E. (Ed.): Tecnologías Inteligentes para la Gestión Empresarial, RA-MA Editorial, Madrid, 71-113.
DUDA, R.O., HART, P.E. and STORK, D.G. (2001): Pattern Classification, John Wiley & Sons, Inc., New York.
FREUND, Y. and SCHAPIRE, R.E. (1997): "A decision-theoretic generalization of on-line learning and an application to boosting", Journal of Computer and System Sciences, 55 (1), 119-139.
HENRICHON, Jr., E.G. and FU, K.S. (1969): "A nonparametric partitioning procedure for pattern classification", IEEE Transactions on Computers, 18, 614-624.
KPMG (2002): "Study into the methodologies to assess the overall financial position of an insurance undertaking from the perspective of prudential supervision" (at http://europa.eu.int/comm/internal_market/en/finances/insur/index.htm).
KRZANOWSKI, W.J. (1996): Principles of Multivariate Analysis. A User's Perspective, Oxford University Press, Oxford.
MARTÍNEZ DE LEJARZA, I. (1999): "Previsión del fracaso empresarial mediante redes neuronales: un estudio comparativo con el análisis discriminante", in BONSÓN, E. (Ed.): Tecnologías Inteligentes para la Gestión Empresarial, RA-MA Editorial, Madrid, 53-70.
MORA, A. (1994): "Los modelos de predicción del fracaso empresarial: una aplicación empírica del logit", Revista Española de Financiación y Contabilidad, 78, enero-marzo, 203-233.
MORGAN, J.N. and MESSENGER, R.C. (1973): THAID: A Sequential Search Program for the Analysis of Nominal Scale Dependent Variables, Survey Research Center, Institute for Social Research, University of Michigan.
MORGAN, J.N. and SONQUIST, J.A. (1963): "Problems in the analysis of survey data, and a proposal", Journal of the American Statistical Association, 58, 415-434.
MÜLLER GROUP (1997): Müller Group Report. Solvency of Insurance Undertakings, Conference of Insurance Supervisory Authorities of the Member States of the European Union.
NIBLETT, T. (1987): "Constructing decision trees in noisy domains", in BRATKO, I. and LAVRAČ, N. (Eds.): Progress in Machine Learning (Proceedings of the 2nd European Working Session on Learning), Sigma, Wilmslow, UK, 67-78.
QUINLAN, J.R. (1979): "Discovering rules by induction from large collections of examples", in MICHIE, D. (Ed.): Expert Systems in the Microelectronic Age, Edinburgh University Press, Edinburgh.
QUINLAN, J.R. (1983): "Learning efficient classification procedures", in Machine Learning: An Artificial Intelligence Approach, Tioga Press, Palo Alto.
QUINLAN, J.R. (1986): "Induction of decision trees", Machine Learning, 1 (1), 81-106.
QUINLAN, J.R. (1988): "Decision trees and multivalued attributes", Machine Intelligence, 11, 305-318.
QUINLAN, J.R. (1993): C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, Inc., California.
QUINLAN, J.R. (1997): See5 (available from http://www.rulequest.com/see5-info.html).
REZA, F.M. (1994): An Introduction to Information Theory, Dover Publications, Inc., New York.
SANCHÍS, A., GIL, J.A. and HERAS, A. (2003): "El análisis discriminante en la previsión de la insolvencia en las empresas de seguros no vida", Revista Española de Financiación y Contabilidad, 116, enero-marzo, 183-233.
SEGOVIA, M.J., GIL, J.A., HERAS, A. and VILAR, J.L. (2003): "La metodología Rough Set frente al Análisis Discriminante en los problemas de clasificación multiatributo", XI Jornadas ASEPUMA, Oviedo, Spain.
SERRANO, C. and MARTÍN, B. (1993): "Predicción de la crisis bancaria mediante el empleo de redes neuronales artificiales", Revista Española de Financiación y Contabilidad, 74, enero-marzo, 153-176.
SETHI, I.K. and SARVARAYUDU, G.P.R. (1982): "Hierarchical classifier design using mutual information", IEEE Transactions on Pattern Analysis and Machine Intelligence, 4, 441-445.
TAM, K.Y. (1991): "Neural network models and the prediction of bankruptcy", Omega, 19 (5), 429-445.
VENABLES, W.N. and RIPLEY, B.D. (2002): Modern Applied Statistics with S, Springer-Verlag, New York.
Table 1: List of Ratios

R1: Working Capital / Total Assets
R2: Earnings before Taxes (EBT) / (Capital + Reserves)
R3: Investment Income / Investments
R4: EBT* / Total Liabilities, where EBT* = EBT + Reserves for Depreciation + Provisions + (Extraordinary Income - Extraordinary Charges)
R5: Earned Premiums / (Capital + Reserves)
R6: Earned Premiums Net of Reinsurance / (Capital + Reserves)
R7: Earned Premiums / (Capital + Reserves + Technical Provisions)
R8: Earned Premiums Net of Reinsurance / (Capital + Reserves + Technical Provisions)
R9: (Capital + Reserves) / Total Liabilities
R10: Technical Provisions / (Capital + Reserves)
R11: Claims Incurred / (Capital + Reserves)
R12: Claims Incurred Net of Reinsurance / (Capital + Reserves)
R13: Claims Incurred / (Capital + Reserves + Technical Provisions)
R14: Claims Incurred Net of Reinsurance / (Capital + Reserves + Technical Provisions)
R15: Combined Ratio 1 = (Claims Incurred / Earned Premiums) + (Other Charges and Commissions / Other Income)
R16: Combined Ratio 2 = (Claims Incurred Net of Reinsurance / Earned Premiums Net of Reinsurance) + (Other Charges and Commissions / Other Income)
R17: (Claims Incurred + Other Charges and Commissions) / Earned Premiums
R18: (Claims Incurred Net of Reinsurance + Other Charges and Commissions) / Earned Premiums Net of Reinsurance
R19: Technical Provisions of Assigned Reinsurance / Technical Provisions
R20: Claims Incurred / Earned Premiums
R21: Claims Incurred Net of Reinsurance / Earned Premiums Net of Reinsurance