AN AGGREGATION OF SURVEY QUESTIONNAIRES IN A FUZZY ENVIRONMENT AND ITS APPLICATION

In the response process of survey questionnaires, respondents generally express their opinions (assessments) in a fuzzy environment, i.e., as assessments with a degree of membership, and in many cases the aggregation of survey questionnaires is computed mathematically. In this paper, we propose a new aggregation method that uses the FEV (Fuzzy Expected Value) to summarize survey questionnaires. As an application of the proposed method, to quantify accurately the current state of opinions and the reasons national IT investment increases the rate of economic growth, we circulated a survey questionnaire on national IT investment to interested parties throughout research institutes, government, and the IT industry. As is common with most survey questionnaires, there were few responses. Moreover, according to interested parties, the results reflected biased views on government IT investment. We therefore consider big data analysis for selecting the proper IT items for national investment. The proposed aggregation method consists of two phases. In phase 1, keywords for the proper IT items are selected based on big data such as Facebook, Twitter, blogs, Google, etc. This phase is optional and depends on the characteristics of the survey. In phase 2, the responses are aggregated using the FEV. The proposed method is particularly useful for finding current focal issues, such as trends and what's new, in big data such as Facebook, Twitter, blogs, Google, etc. Moreover, in many cases, aggregation using the FEV yields a better representative value than the arithmetic mean. Generally, the FEV is more suitable than averaging for finding the representative value of a fuzzy set.


Introduction
A survey questionnaire is a set of questions used in a survey. It is a data-gathering method used to collect, analyze, and interpret the different views of a group of people from a particular population. Survey questionnaires have been used in many fields, such as research, marketing, political opinion, and psychology. People use survey questionnaires to gather information that benefits a group of individuals: the collected data are analyzed statistically, and the results are used in the development of an individual or a community, as described on the Web site "what is a survey questionnaire?" In this paper, for each survey questionnaire, the following question was asked: "what is the rate between 0 and 1 to which you agree, that 'necessity on national IT investment' is good". In general, respondents evaluate the appropriateness of each item and answer with a degree of agreement. That is, they answer fuzzily and subjectively with a rate between 0 (disagree) and 1 (agree).
Aggregation of information is a major problem for all kinds of knowledge-based systems, from image processing to decision making and from pattern recognition to machine learning. Aggregation operators are mathematical objects that reduce a set of numbers to a single representative (or meaningful) number. Informally, the aggregation problem consists of aggregating n-tuples of objects, all belonging to a given set, into a single object of the same set. For mathematical aggregation operators, this set is the real numbers. In this setting, an aggregation operator is simply a function that assigns a real number $y$ to any n-tuple $(x_1, x_2, \dots, x_n)$ of real numbers: $y = \mathrm{Agg}(x_1, x_2, \dots, x_n)$. Aggregation plays a central role as a means of combining all the opinions expressed by group members (Choi, 2008).
In this paper, we propose an aggregation method that uses the FEV (Fuzzy Expected Value) to summarize survey questionnaires. As an application of the proposed method, to quantify accurately the current state of opinions and the reasons national IT investment increases the rate of economic growth, we circulated a survey questionnaire on national IT investment to interested parties throughout research institutes, government, and the IT industry. We consider big data analysis for selecting the proper IT items for national investment. The proposed aggregation method consists of two phases. In phase 1, keywords for the proper IT items are selected based on big data such as Facebook, Twitter, blogs, Google, etc. This phase is optional and depends on the characteristics of the survey. In phase 2, the responses are aggregated using the FEV. The proposed method is particularly useful for finding current focal issues, such as trends and what's new, in big data. Moreover, in many cases, aggregation using the FEV yields a better representative value than the arithmetic mean (see Examples 1 and 2). Generally, the FEV is more suitable than averaging for finding the representative value of a fuzzy set (Choi, 1999, Friedman et al., 1997, Friedman et al., 1989, Kandel, 1981, Schneider et al., 1987). However, for better aggregation, the two methods are complementary rather than competitive.

In recent years, social networking has grown enormously, driven by the rapid adoption of smartphones and SNS (Social Network Services) (Manyika et al., 2011). 'Big data' refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyse. Big data generally includes semi-structured and unstructured data.
Such data do not conform to the formal structure of data models associated with existing relational databases or other forms of data tables (Manyika et al., 2011). Big data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations, and other useful information. Such information can provide competitive advantages over rival organizations and result in business benefits such as more effective marketing and increased revenue. Big data sources may include credit card usage data, Web server logs, Internet clickstream data, social media activity, mobile phone call detail records, information captured by sensors, photos in SNS, etc. (Manyika et al., 2011). In this paper, however, the domain of big data is restricted to Internet and SNS data such as Facebook, Twitter, etc.

Existing Aggregation Operators and the FEV
Fuzzy set theory provides attractive aggregation connectives for integrating membership values that represent uncertain information. These connectives can be categorized into three classes: union, intersection, and compensation connectives. In a fuzzy environment, the existing aggregation operators are, in general, t-norms, t-conorms, mean operators, Ordered Weighted Averaging (OWA) operators (Yager, 1993), and the γ-operator (Zimmermann and Zysno, 1980). These operators have a problem in that they do not properly reflect the situation in the aggregation process. To solve this problem, the fuzzy situation assessment model (FSAM) was proposed (Choi, 2008). An overview of existing aggregation operators is given in (Choi, 1999, Klir and Folger, 1988).

Arithmetic Mean
The arithmetic mean is often used because it is simple and satisfies the properties of monotonicity, continuity, symmetry, associativity, idempotence, and stability for linear transformations (Choi, 1999). The mean always lies between the 'and' (minimum) and 'or' (maximum) operators. In many applications, especially multicriteria decision making (MCDM), union and intersection do not always capture the necessary aggregation of fuzzy sets; in some of these cases, a mean-type aggregation is more appropriate.
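A minimal Python sketch of the arithmetic mean as an aggregation operator on membership degrees. The helper name `arithmetic_mean` is ours, not from the cited sources; the two checks illustrate idempotence and the compensation property of lying between the minimum and the maximum.

```python
from typing import Sequence


def arithmetic_mean(values: Sequence[float]) -> float:
    """Aggregate membership degrees by their arithmetic mean (illustrative helper)."""
    if len(values) == 0:
        raise ValueError("at least one value is required")
    return sum(values) / len(values)


degrees = [0.2, 0.5, 0.9]
m = arithmetic_mean(degrees)

# Compensation: the mean lies between the minimum ('and') and the maximum ('or').
assert min(degrees) <= m <= max(degrees)
# Idempotence: aggregating identical values returns that same value.
assert arithmetic_mean([0.5, 0.5, 0.5]) == 0.5
```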

Median
An aggregation operator that follows the idea of obtaining "a middle value" is the median. It orders the arguments from the smallest to the largest and then takes the element in the middle. This operator satisfies the boundary conditions, monotonicity, symmetry, idempotence, and, evidently, compensation behavior. A generalization of this operator is the k-order statistic, which selects the element in the kth position of the ordered list (Choi, 1999).
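The median and the k-order statistic can be sketched in a few lines of Python. The function names are illustrative. For an even number of arguments there is no single middle element; this sketch assumes the common convention of averaging the two middle elements, which the text does not specify.

```python
from typing import Sequence


def k_order_statistic(values: Sequence[float], k: int) -> float:
    """Return the element in the kth position (1-based) of the ascending list."""
    if not 1 <= k <= len(values):
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    return sorted(values)[k - 1]


def median(values: Sequence[float]) -> float:
    """Middle element of the ordered arguments.

    Assumed convention: for an even count, average the two middle elements.
    """
    n = len(values)
    if n == 0:
        raise ValueError("at least one value is required")
    if n % 2 == 1:
        return k_order_statistic(values, n // 2 + 1)
    lower = k_order_statistic(values, n // 2)
    upper = k_order_statistic(values, n // 2 + 1)
    return (lower + upper) / 2
```

With k = 1 the k-order statistic is the minimum and with k = n it is the maximum, so both extremes are special cases of the same ordered-list idea.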

Ordered Weighted Averaging Operators
Ordered Weighted Averaging (OWA) operators provide a means of aggregating scores associated with the satisfaction of multiple criteria, unifying conjunctive and disjunctive behavior in a single operator (Yager, 1993):

$\mathrm{OWA}(x_1, x_2, \dots, x_n) = \sum_{j=1}^{n} w_j \, x_{\sigma(j)}$   (1)

where $\sigma$ is a permutation that orders the elements: $x_{\sigma(1)} \le x_{\sigma(2)} \le \dots \le x_{\sigma(n)}$. The weights are all non-negative ($w_j \ge 0$) and sum to one ($\sum_{j=1}^{n} w_j = 1$). This operator has proved very useful because of its versatility (Yager, 1993). OWA operators provide a parametrized family of aggregation operators that includes many well-known operators, such as the maximum, the minimum, the k-order statistics, the median, and the arithmetic mean; each is obtained simply by choosing particular weights. For example, if $w_j = 1/n$ for all $j$, the OWA operator becomes the arithmetic mean. OWA operators are commutative, monotone, and idempotent. They are stable for positive linear transformations and have compensatory behavior, in that the result of an OWA operator always lies between the minimum and the maximum. The properties of OWA operators are explained in considerable detail in (Yager, 1993).
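A minimal Python sketch of the OWA operator, assuming the values are sorted in ascending order before weighting (with a descending convention, the weight vectors below would simply be reversed). The weight choices show how the minimum, maximum, median, and arithmetic mean arise as special cases; the function name `owa` is ours.

```python
from typing import Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """OWA: sum_j w_j * x_sigma(j) over the ascending-sorted values (assumed order)."""
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and of equal length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    ordered = sorted(values)  # x_sigma(1) <= x_sigma(2) <= ... <= x_sigma(n)
    return sum(w * x for w, x in zip(weights, ordered))


scores = [0.7, 0.1, 0.4]
print(owa(scores, [1, 0, 0]))        # minimum
print(owa(scores, [0, 0, 1]))        # maximum
print(owa(scores, [0, 1, 0]))        # median (also the 2-order statistic here)
print(owa(scores, [1/3, 1/3, 1/3]))  # arithmetic mean
```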

T-norms and t-conorms
T-norms generalize the conjunctive 'AND' operator and t-conorms generalize the disjunctive 'OR' operator, which allows them to define the intersection and union operations in fuzzy logic. T-norms and t-conorms play a notable role in fuzzy logic theory, but they do not admit compensating behavior. Comprehensive treatments are given in (Klir and Folger, 1988).
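To make the lack of compensation concrete, here is a small Python sketch of two standard t-norm/t-conorm pairs: minimum/maximum and algebraic product/probabilistic sum. The helper names are illustrative. Every t-norm stays at or below the minimum of its arguments and every t-conorm at or above the maximum, so neither can land in between.

```python
from functools import reduce
from typing import Callable, Sequence

BinaryOp = Callable[[float, float], float]


def t_norm_min(a: float, b: float) -> float:
    return min(a, b)      # standard (Zadeh) intersection


def t_norm_product(a: float, b: float) -> float:
    return a * b          # algebraic product


def t_conorm_max(a: float, b: float) -> float:
    return max(a, b)      # standard (Zadeh) union


def t_conorm_prob_sum(a: float, b: float) -> float:
    return a + b - a * b  # algebraic (probabilistic) sum


def aggregate(op: BinaryOp, values: Sequence[float]) -> float:
    """Extend an associative binary t-norm or t-conorm to n arguments."""
    return reduce(op, values)


degrees = [0.3, 0.8]
# No compensation: the t-norm result is <= min, the t-conorm result is >= max.
assert aggregate(t_norm_product, degrees) <= min(degrees)
assert aggregate(t_conorm_prob_sum, degrees) >= max(degrees)
```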

γ-operator
Zimmermann and Zysno (1980) suggest that compensatory behavior is crucial in the aggregation process; however, t-norms and t-conorms lack compensation behavior. They found that in a decision-making context, humans do not follow exactly the behavior of a t-norm (nor of a t-conorm) when aggregating. To get closer to the human aggregation process, they proposed an operator on the unit interval based on t-norms and t-conorms (Zimmermann and Zysno, 1980):

$\mu_{\theta} = \left( \prod_{i=1}^{n} \mu_i \right)^{1-\gamma} \left( 1 - \prod_{i=1}^{n} (1 - \mu_i) \right)^{\gamma}$   (2)

where $i = 1, 2, \dots, n$, $n$ is the number of sets to be connected, $0 \le \mu_i \le 1$, and $0 \le \gamma \le 1$.
Here the parameter $\gamma$ indicates the degree of compensation. If $\gamma = 0$, then $\mu_{\theta} = \prod_{i=1}^{n} \mu_i$; this is the product and provides the truth values for the connective 'AND'. If $\gamma = 1$, then $\mu_{\theta} = 1 - \prod_{i=1}^{n} (1 - \mu_i)$; this is the generalized algebraic sum and provides the truth values for the connective 'OR'. The γ-operator is pointwise injective, continuous, monotonic, commutative, and in accordance with the truth tables of dual logic (Klir and Folger, 1988).
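A minimal Python sketch of the compensatory γ-operator, computing $\mu_{\theta} = (\prod_i \mu_i)^{1-\gamma}(1 - \prod_i (1 - \mu_i))^{\gamma}$ in the unweighted form used here. The function name is ours. The checks confirm the two limiting cases described above: γ = 0 gives the product ('AND') and γ = 1 gives the algebraic sum ('OR').

```python
import math
from typing import Sequence


def gamma_operator(memberships: Sequence[float], gamma: float) -> float:
    """Zimmermann-Zysno compensatory operator, unweighted form."""
    if len(memberships) == 0:
        raise ValueError("at least one membership value is required")
    if not 0.0 <= gamma <= 1.0 or any(not 0.0 <= m <= 1.0 for m in memberships):
        raise ValueError("memberships and gamma must lie in [0, 1]")
    and_part = math.prod(memberships)                        # product ('AND')
    or_part = 1.0 - math.prod(1.0 - m for m in memberships)  # algebraic sum ('OR')
    # Python defines 0.0 ** 0.0 == 1.0, so the limiting cases gamma = 0 and 1 are exact.
    return and_part ** (1.0 - gamma) * or_part ** gamma


mu = [0.6, 0.9]
print(gamma_operator(mu, 0.0))  # product: 0.6 * 0.9
print(gamma_operator(mu, 1.0))  # algebraic sum: 1 - 0.4 * 0.1
print(gamma_operator(mu, 0.5))  # a compromise between the two
```

Intermediate values of γ move the result continuously between the 'AND' and 'OR' extremes, which is the compensation the t-norms and t-conorms alone cannot express.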
In this paper, however, the domain of aggregation operators is restricted to averaging functions, which assign a real number $y$ to any n-tuple $(x_1, x_2, \dots, x_n)$ of real numbers: $y = \mathrm{Agg}(x_1, x_2, \dots, x_n)$.

Fuzzy Expected Value (FEV)
Definition 1. Let $\mathcal{B}$ be a Borel field of subsets of the real line $\Omega$. A set function $\mu(\cdot)$ defined on $\mathcal{B}$ is called a fuzzy measure if it has the following properties:
(1) $\mu(\emptyset) = 0$ ($\emptyset$ is the empty set);
(2) $\mu(\Omega) = 1$;
(3) if $A, B \in \mathcal{B}$ and $A \subseteq B$, then $\mu(A) \le \mu(B)$;
(4) if $\{A_n\}$ is a monotone sequence in $\mathcal{B}$, then $\lim_{n \to \infty} \mu(A_n) = \mu(\lim_{n \to \infty} A_n)$.

In Definition 1, (1) and (2) mean that the fuzzy measure is bounded and nonnegative, (3) means monotonicity, and (4) means continuity. Note that if $\Omega$ is a finite set, the continuity requirement can be deleted. Let $h$ be a $\mathcal{B}$-measurable function such that $h: \Omega \to [0, 1]$. The FEV of $h$ over the set $A$, with respect to the fuzzy measure $\mu(\cdot)$, is defined as follows (Friedman et al., 1989):

$\mathrm{FEV}(h) = \sup_{T \in [0, 1]} \min[T, \mu(\xi_T)]$, where $\xi_T = \{x \in A : h(x) \ge T\}$.

The ability to summarize data provides an important method for grasping the meaning of a larger collection of data. It enables humans to understand the environment in a manner amenable to future useful manipulation. It also provides a starting point for making useful inferences from large collections of data. The mean does help in understanding the content of data, but in some respects it may be too terse a summary (Choi, 2008, Choi, 1999). The FEV is usually used to evaluate the most 'representative' or 'typical' value of a fuzzy set as a measure of general tendency (Choi, 2008, Choi, 1999, Friedman et al., 1997, Friedman et al., 1989, Kandel, 1981, Schneider et al., 1987).
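When $h$ takes only finitely many values on $A$, as questionnaire ratings do, the supremum in the FEV definition reduces to a maximum over the observed values. A short derivation, assuming the distinct values of $h$ on $A$ are $0 \le v_1 < v_2 < \dots < v_m \le 1$ and $\xi_T = \{x \in A : h(x) \ge T\}$:

```latex
\begin{aligned}
&\text{For } T \in [0, v_1]:\ \xi_T = \xi_{v_1}; \qquad
 \text{for } T \in (v_{k-1}, v_k],\ 2 \le k \le m:\ \xi_T = \xi_{v_k}; \qquad
 \text{for } T > v_m:\ \xi_T = \emptyset. \\
&\text{On each such interval } \mu(\xi_T) \text{ is constant and } T \le v_k,
 \text{ so } \sup_{T} \min[T, \mu(\xi_T)] = \min[v_k, \mu(\xi_{v_k})]; \\
&\text{for } T > v_m \text{ the minimum is } \min[T, \mu(\emptyset)] = 0. \text{ Hence} \\
&\mathrm{FEV}(h) = \max_{1 \le k \le m} \min[v_k, \mu(\xi_{v_k})].
\end{aligned}
```

If $\mu(\xi_T)$ is taken as the proportion of answers at or above $T$, this is the threshold-pair procedure used in Example 1.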

Examples of FEV
Example 1. In a certain decision situation, for computational simplicity, we assume that a decision-maker considers 100 situation factors to solve his/her decision problem. For each situation factor, the following question was asked (Choi, 1999): "what is the rate between 0 and 1 to which you agree, that 'situation factor' is good". The results were as follows:
45 situation factors are rated 0.0;
40 situation factors are rated 0.2;
15 situation factors are rated 0.25.
We thus have three different thresholds (0.0, 0.2, 0.25). The first step is to check how many answers are at or above each threshold (as a proportion). Obviously, 100 answers are above or equal to 0.0, 55 answers are above or equal to 0.2, and 15 answers are above or equal to 0.25. Pairing these data and arranging them in increasing order of the threshold value, we obtain the following three $[T, \mu(\xi_T)]$ pairs, where $\mu(\xi_T)$ is the proportion of answers at or above $T$:
[0.0, 1.00], [0.2, 0.55], [0.25, 0.15].
The minima of these pairs are 0.0, 0.2, and 0.15, so the FEV, which is the maximum of these minima, is max(0.0, 0.2, 0.15) = 0.2. This means that the representative value of "good" in 'decision situation is good' is 0.2, i.e., FEV = 0.2. In this case, the arithmetic mean of these data is 0.1175, which is skewed toward 0.0 more than the FEV. Here the FEV is a better representative value than the arithmetic mean. Generally, the FEV is more suitable than averaging for finding the representative value of a fuzzy set (Choi, 2008, Choi, 1999, Friedman et al., 1997, Friedman et al., 1989, Kandel, 1981, Schneider et al., 1987).

Example 2. … Thus, the FEV, which is the maximum of all these minima, is max(0.1, 1/6) = 1/6. Note that the FEV is a better representative value than the arithmetic mean.
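The computation in Example 1 can be reproduced with a short Python sketch. It assumes, as the example does, that $\mu(\xi_T)$ is the share of ratings at or above the threshold $T$; the helper names `threshold_pairs` and `fev` are ours.

```python
from typing import List, Sequence, Tuple


def threshold_pairs(values: Sequence[float]) -> List[Tuple[float, float]]:
    """[T, mu(xi_T)] pairs: each distinct rating T with the share of ratings >= T."""
    n = len(values)
    return [(t, sum(1 for v in values if v >= t) / n) for t in sorted(set(values))]


def fev(values: Sequence[float]) -> float:
    """Maximum over the pairs of min(T, mu(xi_T)); raises on empty input."""
    return max(min(t, share) for t, share in threshold_pairs(values))


# Example 1: 45 factors rated 0.0, 40 rated 0.2, 15 rated 0.25.
ratings = [0.0] * 45 + [0.2] * 40 + [0.25] * 15
pairs = threshold_pairs(ratings)    # [(0.0, 1.0), (0.2, 0.55), (0.25, 0.15)]
mean = sum(ratings) / len(ratings)  # about 0.1175
representative = fev(ratings)       # 0.2
```

Because only the observed ratings are used as thresholds, the loop visits three candidates here, matching the three pairs listed in the example.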

An Application of Proposed Aggregation Method
To quantify accurately the current state of opinions and the reasons national IT investment increases the rate of economic growth, we circulated a survey questionnaire on national IT investment to interested parties throughout research institutes and the IT industry. As is common with most survey questionnaires, there were few responses. Moreover, according to interested parties, the results reflected biased views on government IT investment. We therefore consider big data analysis for selecting the proper IT items for national investment. This experiment was conducted under the following conditions. First, open APIs, crawling tools, etc., are used to collect data related to the keywords (IT items). Second, the crawling boundary is domestic. Third, the crawling targets for big data are Facebook, Twitter, blogs, and Internet portals. After the keyword selection process for national IT investment, we obtain keywords for IT items such as 'AI', 'IoT', 'big data', etc. These keywords for IT items can be used implicitly as trends. Using the selected IT items, we constructed the survey questionnaire on national IT investment in Table 1 and circulated it to interested parties such as research institutes and the IT industry. Our aim is to rank the IT items by trend for national IT investment. For each item in Table 1, the following question was asked: "What is the rate between 0 and 1 to which you agree, that 'necessity on national IT investment' is good".

Table 1. Survey questionnaire on national IT investment
Question 1: National IT investment on Mobile App.
…
Question 9: National IT investment on Cloud
Question 10: National IT investment on Platform
* Assessment: 0 means disagree and 1 means agree.

For each survey question, for computational simplicity, we assume that 100 different views are collected from a group of people in a particular population, such as research institutes and the IT industry.
For example, consider question 1 ("National IT investment on Mobile App.") in Table 1. The aggregation results using the average and the FEV on question 1 are shown in Tables 2 and 3, respectively. Similarly, the aggregation results using the average and the FEV on questions 2-10 are shown in Tables 2 and 3, respectively. Depending on the aggregation method, we may obtain different aggregation results, as shown in Tables 2 and 3. The proposed aggregation method for survey questionnaires can be explained in two phases as follows:
Phase 1 (Keyword selection based on big data): Based on big data such as Facebook, Twitter, blogs, and Google, keywords for the proper IT items are selected as described in this Section. They become trend keywords, such as what's new and real-time keywords in big data. This phase is optional and depends on the characteristics of the survey.
Phase 2 (Aggregation): Aggregation using the FEV is obtained.
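A minimal sketch of Phase 2, assuming, as in Example 1, that $\mu(\xi_T)$ is the share of respondents whose rating is at least $T$, and that items are ranked by FEV with ties broken by the mean (our assumption; no tie rule is specified). The item names and ratings are hypothetical and only illustrate that the two aggregations can order items differently.

```python
from typing import Dict, List, Sequence, Tuple


def fev(ratings: Sequence[float]) -> float:
    """FEV with mu(xi_T) taken as the share of ratings >= T (assumed measure)."""
    n = len(ratings)
    return max(min(t, sum(1 for r in ratings if r >= t) / n) for t in set(ratings))


def aggregate_survey(responses: Dict[str, Sequence[float]]) -> List[Tuple[str, float, float]]:
    """Return (item, mean, FEV) rows, ranked by FEV and then by mean."""
    rows = []
    for item, ratings in responses.items():
        rows.append((item, sum(ratings) / len(ratings), fev(ratings)))
    rows.sort(key=lambda row: (row[2], row[1]), reverse=True)
    return rows


# Hypothetical ratings in [0, 1]; these are NOT the survey data behind Tables 2 and 3.
responses = {
    "item A": [0.0, 0.0, 0.8, 0.8, 0.9],
    "item B": [0.5, 0.5, 0.5, 0.6, 0.6],
}
for item, mean, value in aggregate_survey(responses):
    print(f"{item}: mean = {mean:.2f}, FEV = {value:.2f}")
```

In this toy input the mean ranks item B above item A, while the FEV ranks item A first, because 60% of the respondents rate item A at 0.8 or higher.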
The proposed aggregation method for survey questionnaires is particularly useful for finding current focal issues, such as trends and what's new, in big data such as Facebook, Twitter, blogs, and Google. Moreover, aggregation using the FEV yields a better representative value than the arithmetic mean (see Examples 1 and 2). Generally, the FEV is more suitable than averaging for finding the representative value of a fuzzy set (Choi, 2008, Choi, 1999, Friedman et al., 1997, Friedman et al., 1989, Kandel, 1981, Schneider et al., 1987). However, for better aggregation, the two methods are complementary rather than competitive. Table 4 briefly summarizes the differences between the proposed aggregation method and the existing aggregation using the average.

Conclusions
In this paper, we proposed a new aggregation method that uses the FEV to summarize survey questionnaires. It consists of two phases. In phase 1, keywords are selected based on big data such as Facebook, Twitter, blogs, Google, etc. This phase is optional and depends on the characteristics of the survey. In phase 2, the responses are aggregated using the FEV. The proposed method is particularly useful for finding current focal issues, such as trends and what's new, in big data such as Facebook, Twitter, blogs, and Google. Moreover, in many cases, aggregation using the FEV yields a better representative value than the arithmetic mean. Generally, the FEV is more suitable than averaging for finding the representative value of a fuzzy set (Choi, 2008, Choi, 1999, Friedman et al., 1997, Friedman et al., 1989, Kandel, 1981, Schneider et al., 1987). However, in many cases, the use of the FEV as an aggregation method may also lead to improper results; for better aggregation, the two methods are complementary rather than competitive. Fieldwork is also needed to test the value of the proposed approach. In addition, further research is needed on other aggregation methods, such as the clustering FEV (CFEV) and the most typical value (MTV) (Friedman et al., 1997) for n-dimensional sets.