Rev. bus. manag., São Paulo, Vol. 17, No. 57, pp. 1228-1245, Jul./Sept. 2015
REVISTA BRASILEIRA DE GESTÃO DE NEGÓCIOS (REVIEW OF BUSINESS MANAGEMENT), ISSN 1806-4892, © FECAP, RBGN
DOI: 10.7819/rbgn.v17i57.1741
Received on August 22, 2013. Approved on July 7, 2015.
Responsible editor: João Maurício Gama Boaventura, Dr.
Evaluation process: Double Blind Review

Risk factor selection in automobile insurance policies: a way to improve the bottom line of insurance companies

María-Jesús Segovia-Vargas, María-del-Mar Camacho-Miñano and David Pascual-Ezama
Financial economics and accounting II, Universidad Complutense de Madrid, Madrid, Spain

1. María-Jesús Segovia-Vargas, Doctor of Financial and Actuarial Economics, Universidad Complutense de Madrid (Spain) [mjsegovia@ccee.ucm.es]
2. María-del-Mar Camacho-Miñano, Doctor of Accounting, Universidad Complutense de Madrid (Spain) [marcamacho@ccee.ucm.es]
3. David Pascual-Ezama, Doctor of Psychology, Universidad Autónoma de Madrid (Spain) [david.pascual@ccee.ucm.es]

ABSTRACT
Objective – The objective of this paper is to test the validity of using ‘bonus-malus’ (BM) levels to classify policyholders satisfactorily.
Design/methodology/approach – In order to achieve the proposed objective and to provide empirical evidence, an artificial intelligence method, Rough Set theory, has been employed.
Findings – The empirical evidence shows that the common risk factors employed by insurance companies are good explanatory variables for classifying automobile insurance policies. In addition, the BM level variable slightly increases the explanatory power of the a priori risk factors.
Practical implications – To increase the predictive capacity of the BM level, psychological questionnaires could be used to measure policyholders’ hidden characteristics.
Contributions – The main contribution is that the methodology used to carry out the research, Rough Set theory, had not previously been applied to this problem.
Keywords – automobile insurance company, risk factors, bonus-malus system, rough set theory, artificial intelligence.

1 INTRODUCTION: CURRENT PROBLEMS CONCERNING AUTOMOBILE POLICIES IN SPANISH INSURANCE COMPANIES

Spain is one of the European countries hardest hit by the financial crisis that affected Europe while this study was being prepared, particularly due to the lack of financial funding. In addition, it is one of the most relevant countries within the European Union (EU), since it was the 5th largest in terms of GDP in 2011, according to the International Monetary Fund (IMF) (2011). Spanish companies are trying to overcome the crisis through different approaches. For insurance companies, defining risk involves identifying events, their likelihood, and their costs; and, although this is easier for frequent events such as road accidents (Johnson, 2006), it is also important to be efficient from a management point of view, specifically with regard to operational efficiency (PwC, 2012). As well as helping to retain customers, this is necessary in order to manage risks properly. However, although the service sector has acquired growing importance in national economies (Resende & Guimarães, 2012), there is little research concerning insurance services, particularly on the behavior of the parties involved in the supply of or demand for insurance services (Silva, 2004).

Insurance companies try to classify their policies into homogeneous tariff classes, assigning the same premium to all policies that belong to the same class, in order to charge fair premiums to drivers. In fact, ‘accuracy is therefore crucial’, as Arvidsson (2010) notes.
Thus, it is extremely important for the insurance company to select an adequate set of risk factors to correctly predict future claim rates, for two main reasons. Firstly, competition in the insurance market is currently increasing overall due to special offers from internet companies (Segovia-González, Contreras, & Mar-Molero, 2009). Secondly, the accident rate has decreased significantly over the last ten years, especially in Spain, with a reduction of 50% (see Figure 1). Due to these circumstances, the claim premium could effectively be readjusted, since the probability of high indemnities has decreased as a result. These two readjustments need to be limited by insurance companies for reasons of economic viability and to avoid bankruptcy. Consequently, insurance companies need to reconcile their insurance activity with the fostering of financial stability.

Figure 1 – Evolution of accident rate in Spain (years: 2001-2010). Series plotted: number of fatalities and number of injured persons.
Source: Spanish National Institute of Statistics (INE) (2015)

It should be mentioned that the premium paid for an automobile insurance policy depends on the class to which the principal driver is assigned. This assignment has clear consequences for the two parties affected by the choice of classification system: the insurance company, due to the costs and revenues incurred; and the insured driver, due to the premium rate. This policy classification is based on the selection of so-called ‘risk factors’.
These factors are policy characteristics or features that help companies predict their claim amounts over a given period of time (usually one year). In automobile insurance, these are observable variables concerning the driver, the vehicle and traffic. The main classificatory variables used by the insurance industry are as follows: age, gender and accident or claim record of the principal driver, driving license date, type of vehicle and place of residence. These variables are correlated with claim rates, and can therefore be useful for predicting future claims. The usual approach to selecting risk factors is based on multivariate statistical techniques, although these techniques still leave a great deal of heterogeneity within tariff classes. There is already an extensive scientific literature dealing with the risk classification of policyholders (Arvidsson, 2010; Denuit, Maréchal, Pitrebois, & Walhim, 2007).

However, when motor insurance products are being priced, there are many important factors that cannot be taken into account a priori, for instance swiftness of reflexes or the extent of aggressive behavior behind the wheel. Indeed, psychologists have been able to demonstrate that road crashes are related to drivers’ behavior (Aberg & Rimmö, 1998) and to driving violations (Arvidsson, 2010; Forward, 2008). It is therefore considered that these ‘hidden characteristics’ are partly revealed by the number of claims reported by policyholders (Pitrebois, Denuit, & Walhim, 2006). Hence, the premium can be readjusted according to the number of claims reported by policyholders. This is usually done by integrating past claims history into a so-called ‘bonus-malus system’ (BMS). Thus, a BMS is a merit-demerit rating system with a twofold purpose: to encourage policyholders to drive more carefully, and to better assess individual risks, so that everyone pays a premium according to his or her own claim frequency history (Lemaire, 1988).
BMS is used in several countries such as Spain or Brazil; therefore, the conclusions of this research are especially interesting for countries that have adopted this merit-demerit system. However, it is relevant at this stage to note that BMS is not used in all countries, due to differences in insurance market maturity and national culture (Park, Lemaire & Chua, 2009). In this context it is noteworthy that BMS schemes ‘force’ policyholders to decide whether the magnitude of an accident is sufficiently great to justify a claim, since making a claim necessarily involves a future loss of discount. In addition, policyholders may have information, unobservable to the insurer, which predicts the ex-post risk (Arvidsson, 2010). There is empirical evidence that ‘drivers who were involved in traffic accidents or crashes in the last year took more risks when driving’ (Iversen, 2004). However, while there is a continuing debate as to the effects, problems and benefits of BMS, its use may improve market efficiency (Heras, Vilar, & Gil, 2002; Hey, 1985; Richaudeau, 1999).

When BMS is applied, the premium is calculated by multiplying the original premium by a percentage attached to the policyholder’s level in the scale. This percentage is known as the bonus-malus coefficient. Therefore, the BMS refines the a priori tariff risk classification. That is, the a posteriori scheme provided by the BMS can be used to redefine the a priori rating (Dionne & Ghali, 2005; Pitrebois et al., 2006). This assignment is essential from a financial point of view, because if high-risk policyholders are inadequately assigned in the BMS, the company could incur high costs. Such a situation could jeopardise the future of the insurance company.
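The premium adjustment described above can be illustrated with a minimal sketch. The scale below is purely hypothetical: the number of levels, the coefficient values and the names `BM_COEFFICIENTS` and `adjusted_premium` are assumptions made for illustration and do not reproduce any company's actual bonus-malus scale.

```python
# Hypothetical bonus-malus scale: level -> coefficient applied to the base premium.
# The values are invented for illustration; a lower level is assumed to mean a
# lower premium, and a coefficient above 1.0 represents a malus.
BM_COEFFICIENTS = {1: 0.50, 2: 0.75, 3: 1.00, 4: 1.25}


def adjusted_premium(base_premium: float, bm_level: int) -> float:
    """Return the a posteriori premium: base premium times the BM coefficient."""
    if bm_level not in BM_COEFFICIENTS:
        raise ValueError(f"unknown bonus-malus level: {bm_level}")
    return base_premium * BM_COEFFICIENTS[bm_level]


# Two policyholders in the same a priori tariff class (same base premium)
# end up paying different premiums because of their claims history.
careful_driver = adjusted_premium(500.0, 2)
frequent_claimant = adjusted_premium(500.0, 4)
```

In this sketch the a priori classification fixes the base premium, and the bonus-malus level only rescales it, which is the sense in which the BMS refines rather than replaces the tariff classes.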
Consequently, another variable (BM level) will be considered in this study together with the original risk factors. Bearing all this in mind, this paper tests how well accidents can be predicted from the risk factors with and without the BM classes, and compares the predictive power of the two models. We hypothesize that the BM level can add information that improves the classification of automobile policies into tariff classes. Moreover, this will be done to explain the ‘hidden factors’ for accurate insurance pricing. If the model with BM level significantly improves the explanation of the claims rate relative to the model without BM level, then the ‘hidden factors’ are sufficiently explained by the BM level variable.

The remainder of this paper is organized as follows. Section 2 presents the Rough Set (RS) method, which will be employed to test the model because of its advantages. To date, there is no study that has applied this methodology to classify insurance policies. Section 3 describes the data and variables. Section 4 sets out the methodology, and Section 5 presents and discusses the results. Finally, the conclusions and proposals are outlined in Section 6.

2 ROUGH SET METHODOLOGY

The RS methodology used to test the proposed models belongs to the domain of Artificial Intelligence (AI). AI has demonstrated very high performance in classification problems such as the one under study. Yet there is little AI research devoted to the insurance industry, although it plays a growing and crucial role in modern economies.
As is the case with other artificial intelligence methodologies, the RS method has been successfully employed to investigate financial problems such as financial distress (Ahn, Cho, & Kim, 2000; Beynon & Peel, 2001; Dimitras, Slowinski, Susmaga, & Zopounidis, 1999; Sanchís, Segovia, Gil, Heras, & Vilar, 2007; Slowinski & Zopounidis, 1995), activity-based travel modeling (Witlox & Tindemans, 2004) or travel demand analysis (Goh & Law, 2003). Within the financial sector, banking has received more attention from AI researchers. However, the business peculiarities of the insurance sector make it impossible to transfer the findings of banking sector analyses to insurance. Therefore, a specific analysis is needed (D’Arcy, 2005). Most AI studies devoted to the insurance sector tackle insolvency problems, with satisfactory results (Brockett, Golden, Jang, & Yang, 2006; Brockett, Cooper, Golden, & Pitaktong, 1994; Díaz, Segovia, Fernández & Pozo, 2005; Kramer, 1997; Martinez de Lejarza Esparducer, 1996; Salcedo Sanz, Fernández Villacañas, Segovia Vargas, & Bousoño Calzón, 2005; Salcedo Sanz, Prado Cumplido, Segovia Vargas, Perez Cruz, & Bousoño Calzón, 2004; Segovia-Vargas, Salcedo-Sanz, & Bousoño-Calzón, 2004).

More recently, RS has been applied in the insurance domain. Indeed, Sanchís et al. (2007) apply an RS model to tackle insolvency in the insurance industry in order to minimize the risk of failure. A decision model with 30 rules was generated, with high performance in terms of classification accuracy (80.56%). From a solvency viewpoint, the rule model shows the importance of the following issues: sufficient liquidity, correct rating, proper reinsurance and the need for adequate technical provisions. On the other hand, Shyng, Wang, Tzeng, and Wu (2007) focus on discovering customers’ needs in the insurance market in Taiwan. A questionnaire about insurance products, with single-choice and multi-choice answers, was designed to understand customer needs for the year 2005. The authors apply RS theory to investigate the relationship between a single value and a combination of values of attributes. The results obtained with the RS analysis are satisfactory: a hit test applied to check the feasibility of the decision rules yielded a 100% hit rate. The decision rules show the following customer insurance needs: the purchase purpose is endowment, the average annual premium is under US$ 938, the target customers are aged 25-35 years and the most purchased product is a mixture of products. Moreover, the reasons given for not purchasing are lack of interest and age (too young, below 25 years old).

The selection of the RS method is based here not only on its being a high-performance classification method, but also on its explanatory character. If the classification result is satisfactory, then the conclusions derived from the methodology shall be analyzed. This methodology has become a valuable new way to analyze financial problems since it presents some fundamental advantages, such as the fact that it does not usually need variables to satisfy any assumptions. Statistical methods need explanatory variables to satisfy statistical assumptions, which can be quite difficult to achieve when working with real problems. In the variable selection proposed here there are both qualitative and quantitative factors to consider, which can complicate the analysis and the results obtained. With the RS method, redundant variables are also eliminated, so that the cost of the decision-making process and the time employed by the decision-makers are reduced.

Indeed, the RS method has not been applied to this problem yet. At the time this research was carried out, there was only one research paper on alternative procedures for risk factor selection.
However, that paper is based on black-box AI methods (Bousoño, Heras, & Tolmos, 2008). That is, although the results obtained in that paper are satisfactory, the AI methods employed are not as explanatory as RS.

RS theory was first developed by Pawlak (1991) in the 1980s as a mathematical tool to deal with the uncertainty inherent in decision-making processes. Although this theory has since been extended (Greco, Matarazzo, & Slowinski, 1998, 2001), this paper will use the classical approach. RS theory involves a calculus of partitions; therefore, it is related in some respects to other tools that deal with uncertainty, such as statistical probability or fuzzy set theory. Unlike the RS method, there is a considerable literature on fuzzy set theory in insurance classification (Ebanks, Karwowski, & Ostaszewski, 1992; Horgby, 1998; Lemaire, 1990; Shapiro, 2005; Wit, 1982; Young, 1996).

The RS approach is somewhat different from both statistical probability and fuzzy set theory. Three general categories of imprecision can be distinguished within scientific analyses. The first occurs when events are random in nature; this kind of imprecision is described by statistical probability theory. The second occurs with objects that may belong not to just one category, but to more than one category to differing degrees. In this case, imprecision takes the form of fuzziness in set membership, and it is the field of fuzzy logic. Finally, RS theory deals with the uncertainty produced when some objects described by the same data or knowledge (and therefore indiscernible) can be classified into different classes, that is, when there is not only one classification of these indiscernible objects. For example, two companies may have the same values for certain financial variables (they are indiscernible), and yet one of them goes bankrupt while the other continues in operation. This fact prevents their precise assignment to a set.
Therefore, the classes into which the objects are to be classified are imprecise, but they can be approximated with precise sets (McKee, 2000; Nurmi, Kacprzyk, & Fedrizzi, 1996). These differences show one of the main advantages of RS theory: an agent is not required to establish any preliminary or additional information about the data. In the other two categories of imprecision, it is necessary to assign precise numerical values to express the imprecision of the knowledge, such as probability distributions in statistics, or the grade of membership or the value of possibility in fuzzy set theory (Pawlak, Grzymala-Busse, Slowinski, & Ziarko, 1995).

The main concept of this approach is based on the assumption that some knowledge and data can be associated with every object in the universe. Knowledge is regarded in this context as the ability to classify objects. RS theory represents knowledge about the objects as a data table, that is, an information table in which rows are labelled by objects (states, processes, firms, patients, candidates…) and columns are labelled by attributes. The entries of the table are attribute values. Consequently, for each object-attribute pair (x, q), a value called the descriptor, f(x, q), is known. In the problem analyzed here, the information table consists of the policies and the risk factors: the rows are the policyholders and the columns are the risk factors used (see Table 1). Therefore, the descriptor is the value of a risk factor for a given policyholder. Occasionally, objects described by the same data or knowledge are indiscernible in view of such knowledge. The indiscernibility relation provides the mathematical basis for RS theory.
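The information table and the indiscernibility relation can be made concrete with a small sketch. The four policyholders and their values below are invented for illustration; only the attribute codes (GEN, CA, DoG) are borrowed from Table 1, and the function names `descriptor` and `indiscernibility_classes` are hypothetical helpers, not part of any software used in the study.

```python
from collections import defaultdict

# Toy information table: rows are policyholders (objects), columns are risk
# factors (attributes). All values are invented for illustration.
TABLE = {
    "p1": {"GEN": "M", "CA": "urban", "DoG": "D"},
    "p2": {"GEN": "M", "CA": "urban", "DoG": "D"},
    "p3": {"GEN": "F", "CA": "rural", "DoG": "G"},
    "p4": {"GEN": "F", "CA": "urban", "DoG": "G"},
}


def descriptor(x, q):
    """f(x, q): the value of attribute q for object x."""
    return TABLE[x][q]


def indiscernibility_classes(objects, attributes):
    """Group objects whose descriptors agree on every attribute in `attributes`."""
    groups = defaultdict(list)
    for x in objects:
        key = tuple(descriptor(x, q) for q in attributes)
        groups[key].append(x)
    return sorted(groups.values())


# With all three attributes, only p1 and p2 are indiscernible.
all_attrs = indiscernibility_classes(TABLE, ["GEN", "CA", "DoG"])
# With gender alone, the table collapses into two coarser classes.
gender_only = indiscernibility_classes(TABLE, ["GEN"])
```

The partition depends on which attributes are used: fewer attributes give coarser classes, which is why the choice of risk factors determines which policyholders the insurer can tell apart.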
Intuitively, a rough set is a collection of objects that, in general, cannot be precisely characterized in terms of the values of a set of attributes. In real problems or databases, inconsistencies in classification usually occur. In the case under study there are two classes in the database (drivers with and without accidents). If a good driver (without accidents) has the same values for all attributes (risk factors) as a bad one, it is difficult to classify them properly into the corresponding classes. Mathematically, the indiscernibility relation can be expressed in terms of descriptors: two objects, x and y, are indiscernible if, and only if, all their descriptors in the table have the same values, that is, f(x, q) = f(y, q) for every attribute q.

There are several ways to deal with such inconsistencies. The first consists in increasing the information (for example, by considering more attributes), which is sometimes not easy or even possible. Another possibility is to eliminate the inconsistencies, which is not appropriate because at least some information will be lost. Finally, the inconsistencies can be dealt with by incorporating them into the analysis (this is the RS approach). The RS methodology incorporates these inconsistencies by creating approximations to the decision classes. The lower approximation of a class or category consists of all objects that certainly belong to this class, that is, objects that can be classified into this category with certainty using the set of attributes (in the case under study, the risk factors). The upper approximation of a class contains the objects that possibly belong to this class, that is, objects that can possibly be classified into this category using the set of attributes. The difference between the upper and the lower approximation, if it exists, is called the boundary or doubtful region: the set of elements that cannot be classified into a class with certainty, taking into account the set of attributes.
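The lower approximation, upper approximation and boundary region can be computed directly from the indiscernibility classes. The sketch below uses an invented decision table in which two policyholders share the same risk factor values but fall into different classes; the class labels `A` (accident) and `N_A` (non-accident) and the function name `approximations` are assumptions made for illustration.

```python
from collections import defaultdict

# Toy decision table (invented values): risk factors plus the observed class.
# p1 and p2 have identical descriptions but different classes (an inconsistency).
ROWS = {
    "p1": ({"GEN": "M", "CA": "urban"}, "A"),
    "p2": ({"GEN": "M", "CA": "urban"}, "N_A"),
    "p3": ({"GEN": "F", "CA": "rural"}, "A"),
    "p4": ({"GEN": "F", "CA": "urban"}, "N_A"),
}


def approximations(rows, attributes, target_class):
    """Return (lower, upper, boundary) approximations of `target_class`."""
    # Build the indiscernibility classes induced by the chosen attributes.
    blocks = defaultdict(set)
    for name, (values, _) in rows.items():
        blocks[tuple(values[q] for q in attributes)].add(name)

    target = {name for name, (_, cls) in rows.items() if cls == target_class}
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:      # every member is in the class: certain
            lower |= block
        if block & target:       # at least one member is in the class: possible
            upper |= block
    return lower, upper, upper - lower


lower, upper, boundary = approximations(ROWS, ["GEN", "CA"], "A")
```

Here only p3 can be assigned to the accident class with certainty, while p1 and p2 form the doubtful region because the available risk factors cannot separate them.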
Using the lower and the upper approximation, those classes that cannot be expressed exactly (there is a doubtful region) can be defined precisely using the available attributes. Figure 2 graphically represents the upper approximation, the lower approximation and the boundary region for a class or category.

Figure 2 – Approximations in Rough Set Theory (regions shown: upper approximation, class, lower approximation)
Note. Source: Adapted from “Rough Sets, Their Extensions and Applications”, by Q. Shen and R. Jensen, Journal of Automation and Computing, 4, p. 218.

A fundamental problem of the Rough Set approach is identifying dependencies between attributes in a database, since this enables the reduction of a set of attributes by removing those that are not essential to characterizing knowledge. This problem will be referred to as knowledge reduction or, in more general terms, as a feature selection problem. Solving the feature selection problem implies the possibility of correctly classifying objects without using all the attributes that were originally taken into account. This is a very useful property, because it enables a decision maker to classify by focusing on the relevant variables, which reduces the time, effort and cost of a decision-making process. For instance, in medicine a patient could be diagnosed more quickly if some tests (especially the most painful or time-consuming ones) could be avoided because experience demonstrates that they do not provide additional information for diagnosing an illness. In the problem considered here, the risk factors could be reduced without misclassifying the policies. In RS theory, there are several models to reduce the number of attributes. One of the most popular is the one suggested by Skowron and Rauszer (1992).
They proposed representing the information table in a differentiation matrix (usually called a discernibility matrix). It is a symmetric matrix in which the rows and the columns are the objects (policies in this case, for instance x_i and x_j). Each entry in the matrix (c_ij) represents the attribute or set of all attributes (risk factors, in this case) that can differentiate x_i from x_j. By comparing each object with the rest in terms of attributes, it is possible to calculate this matrix, from which the core and the reducts are obtained. A reduct is a minimal subset of attributes which provides the same classification as the set of all attributes. If there is more than one reduct, the intersection of all of them is called the core, and it is the collection of the most relevant attributes in the table. If none of the attributes is redundant, it is impossible to obtain any reduct and, therefore, it will be necessary to use all the variables. But if there is at least one reduct, it is possible to eliminate all the attributes that do not belong to it because they are redundant, that is, they do not provide any additional information.

Once the redundant variables have been eliminated, the model can be developed in the format of decision rules. This technique is explanatory and generates decision rules with the following format: ‘if conditions then decisions’. That is, the rules state what decisions (actions, classifications) should be undertaken when some conditions are satisfied. The number of objects that satisfy the condition part of a rule is called the strength of the rule. The rules obtained do not usually need to be interpreted by an expert, as they are easily understandable by the user or decision maker. Several algorithms can develop rules based on RS theory.
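The construction of the differentiation matrix, the reducts and the core described above can be sketched on a toy table. This is a brute-force illustration under stated assumptions, not the algorithm used in the study: it treats a reduct as a minimal attribute subset that still distinguishes every pair of objects the full attribute set distinguishes, it enumerates all subsets (feasible only for a handful of attributes), and the data and function names are invented.

```python
from itertools import combinations

# Toy information table (invented values); attribute codes follow Table 1.
TABLE = {
    "p1": {"GEN": "M", "CA": "urban", "DoG": "D", "PvP": "private"},
    "p2": {"GEN": "M", "CA": "rural", "DoG": "D", "PvP": "private"},
    "p3": {"GEN": "F", "CA": "urban", "DoG": "G", "PvP": "private"},
    "p4": {"GEN": "F", "CA": "rural", "DoG": "G", "PvP": "private"},
}
ATTRIBUTES = ["GEN", "CA", "DoG", "PvP"]


def discernibility_entries(table, attributes):
    """Entry c_ij: attributes on which x_i and x_j differ (upper triangle only,
    since the matrix is symmetric)."""
    return {
        (x, y): {q for q in attributes if table[x][q] != table[y][q]}
        for x, y in combinations(sorted(table), 2)
    }


def reducts(table, attributes):
    """Minimal attribute subsets that intersect every non-empty matrix entry."""
    entries = [e for e in discernibility_entries(table, attributes).values() if e]
    found = []
    for size in range(len(attributes) + 1):
        for subset in combinations(attributes, size):
            chosen = set(subset)
            if any(r <= chosen for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if all(chosen & e for e in entries):
                found.append(chosen)
    return found


def core(all_reducts):
    """Attributes shared by every reduct."""
    return set.intersection(*all_reducts) if all_reducts else set()


all_reducts = reducts(TABLE, ATTRIBUTES)
essential = core(all_reducts)
```

In this toy table GEN and DoG always vary together, so either one can stand in for the other, PvP never varies and is redundant, and CA appears in every reduct and therefore forms the core.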
Bazan, Nguyen, Nguyen, Synak and Wróblewski (2000) developed the one implemented in the software employed in the empirical part of this paper. In sum, the most important result of the RS approach is the generation of decision rules, because they can be used to assign new objects to a class by matching the condition part of one of the decision rules to the description of the object. Therefore, rules can be used for decision support.

3 DATA AND VARIABLE SELECTION

A real sample of 5,500 Spanish automobile policies observed during the year 2005 was employed. These data were provided by a large automobile insurance company from Spain, although they are not generally available due to privacy legislation and industrial confidentiality issues. The risk factors (variables) employed by the company are the following fourteen (Table 1), which are commonly accepted in the insurance business and include a mixture of qualitative and quantitative variables:

Table 1 – Definitions of the variables used
Risk factor | Computer code | Explanation
Kind of vehicle | KoV | Six values: car, van, all-terrain vehicle, mixed-car, mixed-van and adapted vehicle
Use | USE | Use to which the vehicle is devoted. Twelve values: private, taxi, emergency vehicles, driving school car, car company, temporal car company, exhibition, distribution, transport, rental with a driver, rental without a driver and agrarian use
CV | POW | Vehicle power (horsepower, HPA)
Private | PvP | Vehicle use: private or public
Tare | TAR | Tare (weight)
Seats | NoS | Number of vehicle seats. Values: 2, 3, 4, 5, 6, 7, 8, 9
Ambit | CA | Vehicle circulation area. Six values: international, national, regional, interurban, urban and rural
Years of the vehicle | AoV | Age of the vehicle in years
Policyholder age | AoD | Age of the policyholder in years
Driving license | EXP | Years of driver’s license validity
Gender | GEN | Male (M) or female (F)
Region | REG | The policyholder’s geographical area, where the vehicle is registered. All autonomous Spanish regions (Andalucía, Ceuta, Castilla León, Castilla La Mancha, Cantabria, Baleares, Madrid, País Vasco, Murcia, Extremadura, Comunidad Valenciana, Navarra, Aragón, Cataluña, Asturias, Galicia, Melilla, Canarias, Rioja) were included, as well as certain large Spanish cities such as Valencia, Barcelona and Seville. This variable includes 22 values
Fuel | DoG | Diesel (D) or gasoline (G)
Bonus-Malus | BON | Bonus-malus levels in which policyholders are classified by the company. There are fourteen levels. A lower level indicates a lower premium (best bonus); therefore level one is the starting place
Source: Company data.

The sample is described in Table 2. Most policies in the database belong to men (74.3%). In all, 80% of the policies cover cars, and the main use is private in almost 100% of cases. The vehicles in the sample run on gasoline in almost 6 out of 10 cases. The circulation area is urban in 74.2% of cases and rural in 22.3%. The sample is concentrated in two regions: Andalusia (28.3%) and Madrid (10.5%).
TABLE 2 – Frequencies of variables referring to policyholder, vehicle and circulation area

| Code | Category | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| GEN | Male | 4084 | 74.3 | 74.3 | 74.3 |
| | Female | 1416 | 25.7 | 25.7 | 100.0 |
| KoV | Car | 4394 | 79.9 | 79.9 | 79.9 |
| | Van | 2 | 0.0 | 0.0 | 79.9 |
| | All-terrain vehicle | 394 | 7.2 | 7.2 | 87.1 |
| | Mixed-car | 347 | 6.3 | 6.3 | 93.4 |
| | Mixed-van | 85 | 1.5 | 1.5 | 94.9 |
| | Adapted vehicle | 278 | 5.1 | 5.1 | 100.0 |
| PvP | Public | 47 | 0.9 | 0.9 | 0.9 |
| | Private | 5453 | 99.1 | 99.1 | 100.0 |
| USE | Private | 5358 | 97.4 | 97.4 | 97.4 |
| | Taxi | 18 | 0.3 | 0.3 | 97.7 |
| | Emergency vehicle | 1 | 0.0 | 0.0 | 97.8 |
| | Driving school car | 13 | 0.2 | 0.2 | 98.0 |
| | Car company | 48 | 0.9 | 0.9 | 98.9 |
| | Temporal car company | 2 | 0.0 | 0.0 | 98.9 |
| | Exhibition | 2 | 0.0 | 0.0 | 98.9 |
| | Distribution | 10 | 0.2 | 0.2 | 99.1 |
| | Transport | 1 | 0.0 | 0.0 | 99.1 |
| | Renting with driver | 29 | 0.5 | 0.5 | 99.7 |
| | Renting without driver | 17 | 0.3 | 0.3 | 100.0 |
| | Agrarian use | 1 | 0.0 | 0.0 | 100.0 |
| DoG | Diesel | 2271 | 41.3 | 41.3 | 41.3 |
| | Gasoline | 3229 | 58.7 | 58.7 | 100.0 |
| CA | Rural | 1224 | 22.3 | 22.3 | 22.3 |
| | Urban | 4081 | 74.2 | 74.2 | 96.5 |
| | Interurban | 125 | 2.3 | 2.3 | 98.7 |
| | Regional | 28 | 0.5 | 0.5 | 99.2 |
| | National | 2 | 0.0 | 0.0 | 99.3 |
| | International | 40 | 0.7 | 0.7 | 100.0 |
| REG | Melilla | 17 | 0.3 | 0.3 | 0.3 |
| | Ceuta | 18 | 0.3 | 0.3 | 0.6 |
| | Rioja | 38 | 0.7 | 0.7 | 1.3 |
| | Cantabria | 106 | 1.9 | 1.9 | 3.2 |
| | Navarra | 134 | 2.4 | 2.4 | 5.6 |
| | Asturias | 151 | 2.7 | 2.7 | 8.3 |
| | Baleares | 145 | 2.6 | 2.6 | 10.9 |
| | Extremadura | 446 | 8.1 | 8.1 | 19.0 |
| | Aragón | 203 | 3.7 | 3.7 | 22.7 |
| | Murcia | 175 | 3.2 | 3.2 | 25.9 |
| | C. Mancha | 336 | 6.1 | 6.1 | 32.0 |
| | Canarias | 2 | 0.0 | 0.0 | 32.0 |
| | P. Vasco | 218 | 4.0 | 4.0 | 36.0 |
| | C. León | 298 | 5.4 | 5.4 | 41.4 |
| | Galicia | 272 | 4.9 | 4.9 | 46.3 |
| | Valencia | 346 | 6.3 | 6.3 | 52.6 |
| | Madrid | 575 | 10.5 | 10.5 | 63.1 |
| | Barcelona | 285 | 5.2 | 5.2 | 68.3 |
| | Cataluña | 176 | 3.2 | 3.2 | 71.5 |
| | Sevilla | 373 | 6.8 | 6.8 | 78.3 |
| | Andalucía | 1186 | 21.6 | 21.6 | 100.0 |
| Total | | 5500 | 100.0 | 100.0 | 100.0 |

Table 3 gives further information on the sample: the mean power of the vehicles is 91.36 horsepower, the mean weight is 1,198 kilos, the average number of seats is almost 5 and the mean age of the vehicle is 6.5 years. The mean age of the policyholder is 44 years, and the average driving experience is over 20 years.

TABLE 3 – Descriptive statistics of the main variables of the sample

| CLAIM | Statistic | POW | TAR | NoS | AoV | AoD | EXP | BON |
|---|---|---|---|---|---|---|---|---|
| Non-accident | N | 2853 | 2853 | 2853 | 2853 | 2853 | 2853 | 2853 |
| | Mean | 90.4048 | 1166.2471 | 4.9930 | 6.9113 | 44.8591 | 20.7774 | 3.0277 |
| | Std. Deviation | 28.66927 | 292.12331 | 0.58973 | 21.10206 | 14.11033 | 11.58152 | 2.59847 |
| Accident | N | 2647 | 2647 | 2647 | 2647 | 2647 | 2647 | 2647 |
| | Mean | 92.3993 | 1232.9686 | 4.8515 | 6.0650 | 44.0147 | 20.1980 | 3.2116 |
| | Std. Deviation | 28.32268 | 359.00620 | 0.80968 | 5.01882 | 12.79749 | 10.82368 | 3.14247 |
| Total | N | 5500 | 5500 | 5500 | 5500 | 5500 | 5500 | 5500 |
| | Mean | 91.3647 | 1198.3584 | 4.9249 | 6.5040 | 44.4527 | 20.4985 | 3.1162 |
| | Std. Deviation | 28.51782 | 327.69976 | 0.70768 | 15.59640 | 13.49982 | 11.22590 | 2.87437 |

These policies are assigned to two classes: accident (A) or non-accident (N_A). It is important to note that policyholders are assigned to these classes according to reported claims (that is, when an accident is reported, the policyholder is reclassified to the accident class), but crucially not according to the cost of those claims. This is because the BMS in force throughout the world (with very few exceptions, such as Korea) penalize only the number of claims (Lemaire, 1995). The accident class might therefore seem very heterogeneous, but it is qualified by the fourteen BM levels used by the company.

4 RESEARCH METHODOLOGY

The problem tackled is a classification problem: new policyholders described by a set of risk factors are assigned to a category (accident or non-accident). To achieve this goal, two models are developed: first one without the BM level, and then another with the BM level. If the classification accuracy (percentage of correctly classified policyholders) of the first model is higher than that of the second, then the BM level is a redundant variable.
On the contrary, if the classification accuracy of the second model is higher, then the BM level captures the ‘hidden factors’ needed for accurate insurance pricing. The size of the difference between the two models (with and without the BM level) indicates the explanatory power of the BM level variable. Both RS models explain the dependent variable (claims), one without and one with the BM level variable.

If a classification model is developed and tested on the entire sample, the results obtained could be biased. To avoid this, two random samples were selected: a training set to develop the model (4,400 policies, or 80% of the whole sample) and a holdout sample to validate the rules (1,100 policies, or 20% of the total sample). The software used for the analysis can randomly split a table into two disjoint subtables. It requires a split factor that determines the size of the first subtable; the second subtable is the complement of the first. The split factor was set at 0.8.

Rough set analysis was performed using RSES2 (see note 1). This software follows, step by step, all the RS concepts explained previously. Before running the software, the continuous variables (vehicle power, tare, age of the vehicle, age of the driver and years of driving licence) were recoded in qualitative terms by dividing the original domain into subintervals. Although RS theory does not require this recoding, it is very useful for drawing general conclusions from the decision rules and for interpreting them (Dimitras et al., 1999).
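The random split into a training subtable and a holdout subtable described above can be illustrated with a minimal Python sketch. It is not the RSES2 implementation: the function name `split_table`, the seed and the placeholder policy identifiers are assumptions made only for illustration.

```python
import random


def split_table(rows, split_factor=0.8, seed=None):
    """Randomly split rows into two disjoint subtables.

    The first subtable receives round(split_factor * len(rows)) rows;
    the second subtable is its complement.
    """
    if not 0.0 < split_factor < 1.0:
        raise ValueError("split_factor must lie strictly between 0 and 1")
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = round(split_factor * len(rows))
    first = [rows[i] for i in indices[:cut]]
    second = [rows[i] for i in indices[cut:]]
    return first, second


# Placeholder identifiers standing in for the 5,500 policy records (hypothetical).
policies = list(range(5500))
train, holdout = split_table(policies, split_factor=0.8, seed=2005)
print(len(train), len(holdout))  # 4400 1100
```

Fixing the seed makes the split reproducible, which is convenient when two models (with and without the BM level) are to be compared on the same partitions.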
The company, based on its internal analyses, has established groups (intervals) for most of the continuous variables (power, years of driving licence, policyholder age, age of the vehicle) in order to manage the risk of its policies, and these groups have been adopted for the analyses. The only exception is the tare (weight) variable, for which the percentiles (10 to 90) have been employed in order to avoid subjective bias (see also Laitinen, 1992; McKee, 2000). Optimal boundary values for the subintervals are usually defined by experts according to their experience, knowledge, habits or conventions (Dimitras et al., 1999; Slowinski & Zopounidis, 1995), which is why the groups established by the company have been adopted in this paper. Where no expert is available to recode a variable according to experience or to the standards of financial analysis, it is desirable to avoid subjective bias to the greatest extent possible, hence the use of percentiles for the tare variable.

After recoding the risk factors, two training tables were obtained. Each consisted of 4,400 policies described by 13 (without the Bonus-Malus variable) or 14 (with the Bonus-Malus variable) risk factors and assigned to a decision class (accident or non-accident). Both tables were entered into input files in RSES2.

The first result of the RS analysis is the reduct calculation. One reduct was obtained from the sample in both models, with and without the BM level variable. The only variable that does not appear in the reduct is vehicle use (PvP), which has therefore been eliminated.
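To make the idea of a redundant variable concrete, the following Python sketch checks whether a condition attribute is dispensable in the usual rough-set sense: dropping it leaves unchanged the set of objects whose decision class is uniquely determined by the remaining attributes (the positive region). This is only an illustration of the concept, not the reduct algorithm used by RSES2; the function names and the toy decision table are assumptions, and the values are invented rather than taken from the company's data.

```python
from collections import defaultdict


def positive_region(rows, attributes, decision):
    """Indices of rows whose values on `attributes` determine the decision uniquely."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attributes)
        classes[key].append(i)
    region = set()
    for members in classes.values():
        decisions = {rows[i][decision] for i in members}
        if len(decisions) == 1:
            region.update(members)
    return region


def dispensable(rows, attributes, decision, candidate):
    """True if dropping `candidate` leaves the positive region unchanged."""
    reduced = [a for a in attributes if a != candidate]
    full = positive_region(rows, attributes, decision)
    return full == positive_region(rows, reduced, decision)


# Toy decision table with invented values (hypothetical, for illustration only).
toy = [
    {"CA": "Urban", "AoD": "30-40", "PvP": "Private", "CLASS": "A"},
    {"CA": "Interurban", "AoD": "30-40", "PvP": "Private", "CLASS": "N_A"},
    {"CA": "Urban", "AoD": "50-65", "PvP": "Public", "CLASS": "N_A"},
    {"CA": "Interurban", "AoD": "50-65", "PvP": "Private", "CLASS": "N_A"},
]
conditions = ["CA", "AoD", "PvP"]
print(dispensable(toy, conditions, "CLASS", "PvP"))  # True
print(dispensable(toy, conditions, "CLASS", "CA"))   # False
```

In this toy table, PvP can be dropped without losing any discriminating power, whereas dropping CA makes the first two policies indistinguishable although they belong to different classes.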
The reduct thus removes only one variable: although RS theory is a very strong tool for feature selection, in this particular case the company had already selected its variables carefully in order to base its decisions on a few risk factors. In this way the time and the cost of the decision-making process are minimized.

After eliminating the redundant variable from both tables, RSES2 induced two decision rule models (with and without the Bonus-Malus variable). Before analyzing the rules obtained, both models were validated on the two randomly selected test samples (1,100 policies each), using classification accuracy (the percentage of correctly classified policyholders). RS model 1, without the BM variable, has an average classification accuracy of 72%, while RS model 2, with the BM level variable, has an average classification accuracy of 74.5%. Both models are satisfactory, because the percentage of correct classifications is higher than 70%, and the rules obtained for both models can therefore be interpreted.

5 RESULTS AND DISCUSSION

RS model 1, without the BM variable, is shown in Table 4. The variables are defined in Table 1.

TABLE 4 – RS model 1 without BM level

Column headings: Rules, KoV, USE, POW, TAR, CA, AoV, AoD, EXP, GEN, NoS, REG, Class

| Rule | Conditions (non-empty cells, in column order) | Class |
|---|---|---|
| 1 | 54-75; Interurban; 5; D | N_A |
| 2 | 20-30; 50-65; G | A |
| 3 | Regional; 16-20 | N_A |
| 4 | 1010-1078; Interurban; 30-40; G | N_A |
| 5 | Interurban; 10-15; F; D | N_A |
| 6 | 76-118; Urban; 30-40; Murcia | A |
| 7 | 1215-1290; Interurban; 30-40 | N_A |
| 8 | Regional; 30-40 | N_A |
| 9 | Car; Urban; 30-40; M; Murcia | A |
| 10 | < 940; 20-30; 50-65 | A |
| 11 | 30-40; F | N_A |
| 12 | Private; 20-30; 50-65 | A |
| 13 | Van; 3 | N_A |
| 14 | > 20; 2; D | N_A |
| 15 | 3; 3 | N_A |
| 16 | 54-75; Interurban; 5; M | N_A |
| 17 | 54-75; < 1; > 20; 5 | A |
| 18 | 9; Interurban; D | N_A |
| 19 | < 1; > 20; G | A |

RS model 2, with the BM level variable, is shown in Table 5.
TABLE 5 – RS model 2 with BM level

Column headings: Rules, KoV, POW, TAR, CA, AoV, AoD, EXP, GEN, NoS, REG, DoG, BON, Class

| Rule | Conditions (non-empty cells, in column order) | Class |
|---|---|---|
| 1 | Interurban; 8 | N_A |
| 2 | < 940; 20-30; 20; 1 | A |
| 3 | Interurban; 20-30; 1 | A |
| 4 | 54-75; Interurban; 5; D | N_A |
| 5 | Interurban; G; 4 | N_A |
| 6 | Van; 119-215; 1 | N_A |
| 7 | 119-215; 3; 1 | N_A |
| 8 | 20-30; 50-65; G | A |
| 9 | 20-30; 50-65; 1 | A |
| 10 | Regional; 16-20 | N_A |
| 11 | Interurban; M; 6 | N_A |
| 12 | 1010-1078; Interurban; 30-40; G | N_A |
| 13 | 76-118; Interurban; 4 | N_A |
| 14 | Interurban; 10-15; F; D | N_A |
| 15 | 76-118; Urban; 30-40; Murcia | A |
| 16 | 1215-1290; Interurban; 30-40 | N_A |
| 17 | Interurban; D; 6 | N_A |
| 18 | Regional; 30-40 | N_A |
| 19 | Car; 76-118; Urban; 40-50; > 20; M; 2 | A |
| 20 | Car; Urban; 30-40; M; Murcia | A |

The rules for the first model can be read as follows:
a) If power (POW) = (54-75 HPA) and circulation area (CA) = interurban and the age of the vehicle (AoV) = (5 years) and fuel (DoG) = D (diesel), then N_A (non-accident).
b) If the age of the vehicle (AoV) = (20-30 years) and the age of the driver (AoD) = (50-65 years) and fuel (DoG) = G (gasoline), then A (accident).
c) If the circulation area (CA) = regional and the experience (EXP) = (16-20 years), then N_A (non-accident).
d) If… and so on.
RS model 2 can be read in the same way.

The comparison between the classification accuracies of the two models shows that the BM level variable increases the classification rate by 2.5 percentage points (from 72% to 74.5%). This result is in line with Bousoño et al. (2008). Including one more variable (BM) increases the number of variables by 7.7% (one variable added to thirteen) in exchange for a gain of only 2.5 points in classification, which calls into question the value of including the BM variable. These results are also in line with Hey (1985), since BMS do not greatly improve the explanation of ‘hidden factors’. In fact, psychological research identifies critical variables in traffic incidents, such as risk aversion, personality and stress.
There are also empirical studies which show that males were twice as likely as females to have reported at least one crash as a driver, and nearly three times as likely to have reported two or more crashes. Additionally, drivers aged 17-29 were twice as likely to have reported at least one crash as those aged over 50 years (Glendon, Dorn, Davies, Matthews, & Taylor, 1996). In line with these results, two of the risk factors measured by insurance companies are age and gender. However, when risk-taking behaviors were introduced into the model, the likelihood of males or 17-29 year olds being involved in at least one crash was substantially reduced (Turner & McClure, 2003). On the other hand, personality is a very important variable in car incident prediction (Schwebel, Severson, Ball, & Rizzo, 2006). Aggression, traditionalism and alienation were the personality scales most frequently associated with risky driving behavior and crash risk. Above all, high levels of aggression predict that a driver could be involved in a crash (Gulliver & Begg, 2007). Finally, driver stress is correlated with accident involvement as well. Moreover, stress is also linked with other variables that could be correlated with accidents, such as frequency of daily hassles and aggressiveness, poorer self-rated attention or mood states (Matthews, Dorn, & Glendon, 1991). It seems unlikely that the importance of all these variables can be captured by the 2.5-point increase in classification accuracy attributable to BM in the model.

With this in mind, and given the satisfactory classification results, the decision rule sets can be interpreted. Taking the strongest rules into consideration, the following results are found for both models:
a) In Tables 4 and 5, there are more rules for the non-accident class than for the accident class (Table 4: 12 N_A rules and 7 A rules; Table 5: 13 N_A rules and 7 A rules).
Having few rules facilitates interpretation, because the model is more compact and it is possible to be more concrete. It is therefore easier to draw standards for car accidents than for the other class.
b) All the rules are deterministic. This means that the two classes are well discriminated from each other. The number of attributes in the rules varies from 2 to 7; in some rules, the explanation can be given using only 2 risk factors.
c) The most relevant risk factors (those appearing in more than 50% of the rules) for classifying the policies in RS model 1 are circulation area, vehicle age and driver age. In RS model 2 (with BM level), the circulation area is again the most relevant risk factor, and the BM level is the second most relevant variable. The circulation area variable takes on the value ‘interurban’ for the majority of the rules that belong to class A (accident) in both models. On the contrary, this variable takes on the value ‘urban’ for the majority of rules referring to class N_A (non-accident). This fact shows that the majority of reported accidents, and probably also the most serious ones, occurred in the interurban circulation area. The BM level is assigned by the company based on past experience. The rules show that the best two levels (1 and 2) are assigned to the non-accident class. This finding confirms that the company is assigning the bonus levels correctly, which is especially important given that the BM level is a key variable in the premium calculation.
d) The vehicle use risk factor does not appear in RS model 2, whilst in RS model 1 it appears in only one rule. This shows that the use of the vehicle is not a determining factor, but it cannot be eliminated because it belongs to the core.
e) The gender risk factor appears in just four rules, which belong to both classes, in both models. Therefore, no conclusions regarding this variable can be drawn.

6 CONCLUSIONS AND FUTURE RESEARCH

The objective of this paper was to test the validity of using ‘bonus-malus’ (BM) levels to classify policyholders satisfactorily. To provide this empirical evidence, Rough Set theory, a method that is novel in insurance, has been employed.

According to the data, the empirical evidence shows that the common risk factors employed by the insurance company are good explanatory variables for classifying car policyholders’ policies. Furthermore, the BM level variable increases the explanatory power of the a priori risk factors. The differences between the models with and without the BM level are not very notable. In fact, the empirical evidence in the sample shows that the explanatory power of the BM level risk factor is small (only 2.5 percentage points). However, the literature consulted finds that there are many important factors that cannot be taken into account a priori. It is considered that these ‘hidden characteristics’ are partly revealed by the number of claims reported by the policyholders, that is, by the BM level. Indeed, as has been mentioned, many factors relevant to predicting dangerous driving behavior, such as drivers’ personality, risk-taking or stress (among others), should be taken into account by the automobile sector. To increase the prediction capacity of the BM level, psychological questionnaires could be used to measure ‘hidden characteristics’. Specifically, in Spain drivers need to renew their driving licence at regular intervals.
This requires a medical examination that certifies physical fitness to drive, and it could also be used to test the psychological factors mentioned above using, for instance, the ‘Driving Behavior Inventory’ for driving stress (Gulian, Matthews, Glendon, Davies, & Debney, 1989); the ‘NEO-FFI’ for personality (Costa & McCrae, 1992); the ‘Zuckerman-Kuhlman five-factor’ questionnaire (Zuckerman & Kuhlman, 2000) for personality-related risk-taking; or a new, simple questionnaire combining traditional questionnaires. Another suggestion would be to take into account the ‘points-based driving licence system’ used in some European countries, such as the UK, Germany, France, Italy, Ireland, Luxembourg and Spain, as a proxy to improve the BM level classification. There are also cultural factors that could affect the study (Nordfjaern, Simsekoglu, & Rundmo, 2012), for instance whether drivers pay more attention to written information and sounds in road traffic or to oral and visual traffic information. Another important variable is how fatalistic drivers are. Therefore, more studies are required to generalize the results obtained.

NOTE

1 RSES2 software was developed by the Institute of Mathematics, Warsaw, Poland. For download details, see Warsaw University (2005).

REFERENCES

Åberg, L., & Rimmö, P. A. (1998). Dimensions of aberrant driver behavior. Ergonomics, 41(1), 39-56.
Ahn, B. S., Cho, S. S., & Kim, C. Y. (2000). The integrated methodology rough set theory and artificial neural network for business failure prediction. Expert Systems with Applications, 18(2), 65-74.
Arvidsson, S. (2010). Does private information affect the insurance risk? Working paper, The Geneva Association, 396, 2010. Retrieved from http://www.transportportal.se/SWoPEc/Essay_1_Arvidsson_Does_private_information.pdf
Bazan, J., Nguyen, H. S., Nguyen, S. H., Synak, P., & Wróblewski, J. (2000). Rough set algorithms in classification problem. In L. Polkowski, S. Tsumoto, & T. Y. Lin (Eds.), Rough set methods and applications (pp. 49-88). New York: Physica-Verlag.
Beynon, M. J., & Peel, M. J. (2001). Variable precision rough set theory and data discrimination: An application to corporate failure prediction. OMEGA: The International Journal of Management Science, 29(6), 561-576.
Bousoño, C., Heras, A., & Tolmos, P. (2008). Factores de riesgo y cálculo de primas mediante técnicas de aprendizaje. Madrid, España: Ed. MAPFRE.
Brockett, P., Cooper, W., Golden, L., & Pitaktong, U. (1994). A neural network method for obtaining an early warning of insurer insolvency. The Journal of Risk and Insurance, 61(3), 402-424.
Brockett, P., Golden, L., Jang, J., & Yang, C. (2006). A comparison of neural network, statistical methods, and variable choice for life insurers’ financial distress prediction. The Journal of Risk and Insurance, 73(3), 397-419.
Costa, P. T., & McCrae, R. R. (1992). NEO PI-R professional manual. Odessa, FL: Psychological Assessment Resources.
D’Arcy, S. (2005). Predictive modeling in automobile insurance: A preliminary analysis. [Working Paper, 302]. World Risk and Insurance Economics Congress, August, Salt Lake City. Retrieved from http://business.illinois.edu/ormir/Predictive%20Modeling%20in%20Automobile%20Insurance%207-1-05(PDF).pdf
Denuit, M., Maréchal, X., Pitrebois, S., & Walhin, J. F. (2007). Actuarial modeling of claim counts: Risk classification, credibility and bonus-malus systems. Chichester, UK: John Wiley & Sons.
Díaz, Z., Segovia, M. J., Fernández, J., & Pozo, E. (2005). Machine learning and statistical techniques: An application to the prediction of insolvency in Spanish non-life insurance companies. The International Journal of Digital Accounting Research, 5(9), 1-45. Retrieved from http://www.uhu.es/ijdar/10.4192/1577-8517-v5_1.pdf
Dimitras, A., Slowinski, R., Susmaga, R., & Zopounidis, C. (1999). Business failure prediction using Rough Sets. European Journal of Operational Research, 114(2), 263-280.
Dionne, G., & Ghali, O. (2005). The bonus-malus system in Tunisia: An empirical evaluation. Journal of Risk and Insurance, 72(4), 609-633.
Ebanks, B., Karwowski, W., & Ostaszewski, K. (1992). Application of measures of fuzziness to risk classification in insurance. Paper presented at Fourth International Conference on Computing and Information ICCI’92, Toronto.
Forward, S. (2008). Driving violations: Investigating forms of irrational rationality. Uppsala: Universitetsbiblioteket. Retrieved from http://uu.diva-portal.org/smash/get/diva2:172720/FULLTEXT01
Glendon, A. I., Dorn, L., Davies, D. R., Matthews, G., & Taylor, R. G. (1996). Age and gender differences in perceived accident likelihood and driver competences. Risk Analysis, 16(6), 755-762. doi: 10.1111/j.1539-6924.1996.tb00826.x
Goh, C., & Law, R. (2003). Incorporating the rough sets theory into travel demand analysis. Tourism Management, 24(5), 511-517.
Greco, S., Matarazzo, B., & Slowinski, R. (1998). A new rough set approach to evaluation of bankruptcy risk. In C. Zopounidis (Ed.), New operational tools in the management of financial risks (pp. 121-136). Dordrecht: Kluwer Academic Publishers.
Greco, S., Matarazzo, B., & Slowinski, R. (2001). Rough sets theory for multicriteria decision analysis. European Journal of Operational Research, 129(1), 1-47.
Gulian, E., Matthews, G., Glendon, A. I., Davies, D. R., & Debney, L. M. (1989). Dimensions of driver stress. Ergonomics, 32(6), 585-602.
Gulliver, P., & Begg, D. (2007). Personality factors as predictors of persistent risky driving behavior and crash involvement among young adults. Injury Prevention, 13(6), 376-381.
Heras, A., Vilar, J. L., & Gil, J. A. (2002). Asymptotic fairness of Bonus-Malus systems and optimal scales premiums. The Geneva Papers on Risk and Insurance Theory, 27(1), 61-82.
Hey, J. (1985). No claim bonus? The Geneva Papers on Risk and Insurance, 10(36), 209-228.
Horgby, P.-J. (1998). Risk classification by fuzzy inference. The Geneva Papers on Risk and Insurance Theory, 23(1), 63-82.
International Monetary Fund. (2011). World economic outlook database. Retrieved from www.imf.org
Iversen, H. (2004). Risk-taking attitudes and risky driving behavior. Transportation Research Part F, 7(3), 135-150.
Johnson, J. (2006). Can complexity help us better understand risk? Risk Management, 8(4), 227-267.
Kramer, B. (1997). N.E.W.S.: A model for the evaluation of non-life insurance companies. European Journal of Operational Research, 98(2), 419-430.
Laitinen, E. K. (1992). Prediction of failure of a newly founded firm. Journal of Business Venturing, 7(4), 323-340.
Lemaire, J. (1988). A comparative analysis of most European and Japanese Bonus-malus Systems. Journal of Risk and Insurance, 55(4), 660-681.
Lemaire, J. (1990). Fuzzy insurance. ASTIN Bulletin, 20(1), 33-55.
Lemaire, J. (1995). Bonus-malus systems in automobile insurance. Boston: Kluwer Academic Publishers.
Martinez de Lejarza Esparducer, I. (1996, September). Forecasting company failure: Neural approach versus discriminant analysis: An application to Spanish insurance companies of the 80´s. International Conference on Artificial Intelligence in Accounting, Finance and Tax, Punta Umbria (Huelva), Spain, 2.
Matthews, G., Dorn, L., & Glendon, A. (1991). Personality correlates of driver stress. Personality and Individual Differences, 12(6), 535-549.
McKee, T. (2000). Developing a bankruptcy prediction model via rough sets theory. International Journal of Intelligent Systems in Accounting, Finance and Management, 14(3), 159-173.
Nordfjaern, T., Simsekoglu, O., & Rundmo, T. (2012). A comparison of road traffic culture, risk assessment and speeding predictors between Norway and Turkey. Risk Management, 14(3), 202-221.
Nurmi, H., Kacprzyk, J., & Fedrizzi, M. (1996). Probabilistic, fuzzy and rough concepts in social choice. European Journal of Operational Research, 95(2), 264-277.
Park, S., Lemaire, J., & Chua, C. T. (2009). Is the design of Bonus-Malus Systems influenced by insurance maturity or national culture? Evidence from Asia. The Geneva Papers, 35(S1), 7-27.
Pawlak, Z. (1991). Rough sets: Theoretical aspects of reasoning about data. Dordrecht: Kluwer Academic Publishers.
Pawlak, Z., Grzymala-Busse, J., Slowinski, R., & Ziarko, W. (1995). Rough Sets. Communications of the ACM, 38(11), 89-97. Retrieved from http://dl.acm.org/ft_gateway.cfm?id=277421&ftid=17537&dwn=1&CFID=220789019&CFTOKEN=72446287
Pitrebois, S., Denuit, M., & Walhin, J. F. (2006). Multi-event Bonus-malus scales. The Journal of Risk and Insurance, 73(3), 517-528.
PwC. (2012). The five keys to the industry. Retrieved from http://www.pwc.es/en/financiero-seguros/claves-sector-seguros.jhtml
Resende, P. C., Jr., & Guimarães, T. (2012). Inovação em serviços: O estado da arte e uma proposta de agenda de pesquisa. Revista Brasileira de Gestão de Negócios, 14(44), 293-313.
Richaudeau, D. (1999). Automobile insurance contracts and risk of accident: An empirical test using French individual data. The Geneva Papers on Risk and Insurance Theory, 24(1), 97-114.
Salcedo Sanz, S., Fernández Villacañas, J. L., Segovia Vargas, M. J., & Bousoño Calzón, C. (2005). Genetic programming for the prediction of insolvency in non-life insurance companies. Computers and Operations Research, 32(4), 749-765.
Salcedo Sanz, S., Prado Cumplido, M., Segovia Vargas, M. J., Perez Cruz, F., & Bousoño Calzón, C. (2004). Feature selection methods involving Support Vector Machines for prediction of insolvency in non-life insurance companies. Intelligent Systems in Accounting, Finance and Management, 12(4), 261-281.
Sanchís, A., Segovia, M. J., Gil, J. A., Heras, A., & Vilar, J. L. (2007). Rough Sets and the Role of Monetary Policy in Financial Stability (Macroeconomic Problem) and the Prediction of Insolvency in the Insurance Sector (Microeconomic Problem). European Journal of Operational Research, 181(3), 1554-1573.
Schwebel, D. C., Severson, J., Ball, K. K., & Rizzo, M. (2006). Individual difference factors in risky driving: The roles of anger/hostility, conscientiousness, and sensation-seeking. Accident Analysis and Prevention, 38(4), 801-810.
Segovia-González, M. M., Contreras, I., & Mar-Molinero, C. A. (2009). DEA analysis of risk, cost, and revenues in insurance. Journal of Operational Research Society, 60(11), 1483-1494.
Segovia-Vargas, M. J., Salcedo-Sanz, S., & Bousoño-Calzón, C. (2004). Prediction of insolvency in non-life insurance companies using support vector machines and genetic algorithms. Fuzzy Economic Review, 9(1), 79-94.
Shapiro, A. (2005). Fuzzy logic in insurance: The first 20 years. Actuarial Research Clearing House, 39(1), 1-32. Retrieved from https://www.soa.org/News-and-Publications/Publications/Proceedings/Arch/pub-arch-table-of-contents-2005-1.aspx
Shen, Q., & Jensen, R. (2007). Rough sets, their extensions and applications. International Journal of Automation and Computing, 4(3), 217-228.
Shyng, J.-Y., Wang, F.-K., Tzeng, G.-H., & Wu, K.-S. (2007). Rough Set Theory in analyzing the attributes of combination values for the insurance market. Expert Systems with Applications, 32(1), 56-64.
Silva, J. C. B. (2004). A escolha da seguradora para o seguro fiança locatícia na óptica dos corretores de seguros. Revista Brasileira de Gestão de Negócios, 6(15), 49-68.
Skowron, A., & Rauszer, C. M. (1992). The discernibility matrices and functions in information systems. In R. W. Slowinski (Ed.), Intelligent decision support (Chap. 2, pp. 331-362). Dordrecht: Kluwer Academic Publishers.
Slowinski, R., & Zopounidis, C. (1995). Application of the rough set approach to evaluation of bankruptcy risk. International Journal of Intelligent Systems in Accounting, Finance and Management, 4(1), 27-41.
Spanish National Institute of Statistics. (2015). Accidentes. Serie 2004-2012. Retrieved from http://www.ine.es/jaxi/menu.do?type=pcaxis&path=/t10/a109/a04/&file=pcaxis
Turner, C., & McClure, R. (2003). Age and gender differences in risk-taking behavior as an explanation for high incidence of motor vehicle crashes as a driver in young males. Injury Control and Safety Promotion, 10(3), 123-130.
Warsaw University. (2005). RSES 2.2 User’s Guide. Retrieved from http://logic.mimuw.edu.pl/~rses/RSES_doc_eng.pdf
Wit, G. W. (1982). Underwriting and uncertainty. Insurance: Mathematics and Economics, 1(4), 277-285.
Witlox, F., & Tindemans, H. (2004). The application of rough sets analysis in activity-based modeling, opportunities and constraints. Expert Systems with Application, 27(4), 585-592.
Young, V. (1996). Insurance rate changing: A fuzzy logic approach. Journal of Risk and Insurance, 63(3), 461-484.
Zuckerman, M., & Kuhlman, M. (2000). Personality and risk-taking: Common biosocial factors. Journal of Personality, 68(6), 999-1029.