Learning Decision Sequence [Graduate Thesis]
Machine learning is the ability of a computer to improve its problem-solving capability based on previous results. It is a scientific discipline concerned with the development of theories and algorithms that allow computers to learn from data collected from prior solution instances. A major focus of machine learning is recognizing complex patterns and making intelligent decisions by analyzing those patterns. Many disciplines, including artificial intelligence, data mining, neural networks, and decision tree theory, pursue the goal of investigating and developing new methods of machine learning.
The task of this thesis is the development of a learning system capable of learning from prior solution instances of a problem and of developing new solution procedures when a similar problem is introduced to the system. This research considers a particular type of learning system, in which the steps taken to solve the problem by various experts/decision makers (DM) are documented and stored in the system. Note that a DM is an abstraction that may represent a human, a machine, or a system. Every step toward the goal represents an intermediate decision, and the outcome is the result of a combination of decisions. The only information available to the system for evaluating the correctness of decisions (actions) is the quality of the outcome.
Sutton and Barto (1998) define reinforcement learning (RL) as learning what to do and how to map situations to actions so as to maximize a reward. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them.
Our approach is based on the fundamentals of reinforcement learning (Barreto and Anderson 2008, Whiteson and Stone 2006, Ribeiro 2002, Ghavamzadeh and Mahadevan 2007), which is often used for modeling real-life applications such as games (Fujita and Ishii 2007, Duan et al. 2007). The approach proposed in this thesis assumes that information about the success or failure of an action sequence is provided only when the sequence is complete. In contrast, each decision in RL problem solving is assigned a “reward” or “punishment” by the system depending on the correctness of the decision. Also, in RL the process is carried out by agents, while in the proposed approach graph theory is used for knowledge representation and inference. The similarity of the developed model to RL is that both approaches use dynamic programming algorithms to find the best solution.
The thesis consists of two major sections. The initial approach builds a learning system that documents the steps of decision sequences in problem solving, along with the information/data used for decision making. A Directed Acyclic Graph (DAG) (Stanley 1973, Bang-Jensen 2008, Czumaj et al. 2007, Pearce and Kelly 2006) is used to represent these decision sequences, where the nodes of the graph represent the actions taken by the problem solver and directed edges represent the order of the decisions (steps). It is assumed that multiple decision sequences are documented in the system, since with a single documented decision sequence no learning takes place and the “learned” solution of the problem is simply the input to the system. As mentioned earlier, typically many decision sequences are documented in the system, and decisions that are based on the same arguments are mapped to the same node of the DAG. The algorithm presented in the thesis for optimal DAG construction is analyzed in terms of computational time and storage space.
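As a rough illustration of this idea (a minimal sketch, not the DAG-construction algorithm developed in the thesis), decision sequences can be merged into one graph by mapping decisions with the same label to the same node and recording their order as directed edges. The step names below are hypothetical:

```python
from collections import defaultdict

def build_dag(sequences):
    """Merge decision sequences (lists of decision labels) into a DAG.

    Decisions with the same label are mapped to the same node;
    a directed edge (a, b) records that b was taken right after a.
    """
    edges = defaultdict(set)
    nodes = set()
    for seq in sequences:
        nodes.update(seq)
        for a, b in zip(seq, seq[1:]):
            edges[a].add(b)
    return nodes, edges

# Two hypothetical experts solving the same problem with partly shared steps:
sequences = [
    ["collect_data", "test_A", "test_B", "diagnosis_X"],
    ["collect_data", "test_B", "diagnosis_X"],
]
nodes, edges = build_dag(sequences)
```

Because the two sequences share the steps "collect_data", "test_B", and "diagnosis_X", the merged graph has only four nodes instead of seven, which is the compression that makes learning over many documented sequences feasible.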
The second part of the thesis is concerned with finding the best solution in the learning space of the DAG. The notion of the “best” solution depends on the problem and the application (shortest completion time, shortest distance traveled, least load or cost required, and so on). A dynamic programming approach (Bellman and Dreyfus 1962, Bird and de Moor 1993, Powell 2007) is proposed to quickly examine the various solution alternatives documented in the DAG and develop a new, improved solution for the problem. An algorithm is developed to initialize the best solution sequence; it also serves as the basis of the recursion used in the dynamic programming approach. Finally, a new algorithm is developed that recursively searches for the best solution using dynamic programming.
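For intuition, the dynamic-programming recursion over a DAG can be sketched as a shortest-cost computation in topological order, where best[v] is the minimum over predecessors u of best[u] + w(u, v). The graph, edge weights, and function names below are illustrative assumptions, not the algorithms developed in the thesis:

```python
def topo_order(nodes, edges):
    """Return the nodes of a DAG in topological order (Kahn's algorithm)."""
    indeg = {n: 0 for n in nodes}
    for u in edges:
        for v in edges[u]:
            indeg[v] += 1
    stack = [n for n in nodes if indeg[n] == 0]
    order = []
    while stack:
        u = stack.pop()
        order.append(u)
        for v in edges.get(u, ()):
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    return order

def best_cost(nodes, edges, weight, source):
    """DP over the DAG: best[v] = min over predecessors u of best[u] + w(u, v)."""
    best = {n: float("inf") for n in nodes}
    best[source] = 0
    for u in topo_order(nodes, edges):
        for v in edges.get(u, ()):
            best[v] = min(best[v], best[u] + weight[(u, v)])
    return best

# Toy DAG with two alternative routes from "s" to "goal":
nodes = {"s", "a", "b", "goal"}
edges = {"s": {"a", "b"}, "a": {"goal"}, "b": {"goal"}}
weight = {("s", "a"): 2, ("s", "b"): 1, ("a", "goal"): 1, ("b", "goal"): 3}
print(best_cost(nodes, edges, weight, "s")["goal"])  # 3, via s -> a -> goal
```

Because each edge is relaxed exactly once in topological order, the search runs in time linear in the size of the DAG, which is why dynamic programming can examine all documented alternatives quickly.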
The task of finding the diagnosis of a patient in medicine is presented in detail as an application of the developed approach. In this case the entities are medical tests, the decision makers are doctors, and the goal outcomes are diagnoses. Different doctors might come to the same conclusion by performing different medical tests. Here our goal is to learn from the diagnosis history of different doctors and find the best sequence of tests, so that a minimum number of tests is performed.
Situations in which different sequences lead to the same goal outcome are common in real life, and examples are not limited to the medical field. Other application areas include road planning, manufacturing, product assembly, and so on. These examples are discussed in a later section of the thesis, after the presentation of the approach and algorithms.
In mathematics, a directed acyclic graph (DAG) is a directed graph with no directed cycles; that is, for any vertex v, there is no nonempty directed path that starts and ends on v.
The focus of machine learning research is the construction of systems that can automatically recognize knowledge patterns and make intelligent decisions based on prior accumulated knowledge. The learning process takes place whenever the system under consideration changes its structure or accumulates more knowledge (based on its inputs or in response to external information). The newly gathered knowledge is then used to improve the future performance of the system. The improvements might be either enhancements to already-performing systems or the synthesis of new systems (Nilsson 1996). Machine learning systems are used in many scientific disciplines, such as robotics, medical science, and bioinformatics. They are also of increasing interest in real-life applications such as building personalized GPS navigation systems or creating high-performance speech-recognition systems.
2.1 Artificial Intelligence
The basic unit of an intelligent system is the agent. An agent is a computational mechanism that exhibits a high degree of autonomy and performs actions based on information (sensors, feedback) it receives from the environment it operates in (Panait and Luke 2005). Russell and Norvig (2003) define an agent as anything that can sense, recognize, and understand its environment through sensors and can act in that environment through effectors. AI-based algorithms use the concept of a rational agent in problem solving. A rational agent acts rationally and has a high probability of making the right decision at any moment in time. Several learning models (Goldman and Rosenschein 1996, Garland and Alterman 2004, Haynes et al. 1996, Doniec et al. 2008, Montano et al. 2009) use a multi-agent environment, where multiple agents actively interact with one another to solve a common problem. In a realistic multi-agent environment it is assumed that, at any given time, an agent does not know everything about the world that the other agents know (including the internal states of the other agents themselves).
Multi-agent systems (MAS) emphasize the joint behavior of agents that cooperate to solve a problem. MAS has two main learning models: cooperative and concurrent learning. Cooperative learning uses a team-learning approach in which all agents serve one common goal. In a team-learning system a single learner is involved; however, this learner is discovering a set of behaviors for a team of agents rather than for a single agent (Haynes et al. 1996, Haynes and Sen 1995(a), (b), 1997). Although the system involves a single learner, it still poses challenges, as the agents interact with one another and their joint behavior can be unexpected. Concurrent learning uses multiple concurrent learning processes, where each agent has its own individual goal. Several learning processes attempt to improve parts of the team simultaneously in order to solve the given problem (Iba 1996, Iba 1998, Miconi 2003). Typically, each agent has its own unique learning process that modifies its behavior. In contrast to cooperative learning, concurrent learning approaches typically have a learner for each team member, the assumption being that this reduces the joint problem-solving space by projecting the problem into N separate spaces.
There are three main learning approaches in an agent-based environment: supervised, unsupervised, and reinforcement learning. These methods are distinguished by the type of feedback available to the learner. In supervised learning the correct output is provided. In unsupervised learning no feedback is provided. In reinforcement learning the agent is rewarded for good responses and punished for bad ones (Russell and Norvig 2003).
Supervised learning includes elements of two theories: 1) classification, the ability to determine which category an instance belongs to, and 2) regression, developing a function that maps inputs to outputs based on a set of input-output pairs. Due to the complexity of interactions between multiple agents, supervised learning methods are not easily applied to most multi-agent problems, as they assume that the system provides agents with the “correct” behavior for the situation. An exception involving teaching in the context of mutually supervised learners is presented in (Garland and Alterman 2004, Goldman and Rosenschein 1996). Unsupervised learning has the ability to find patterns in a stream of input; its goal is to determine how the data is organized. The most common forms of unsupervised learning include data clustering, Self-Organizing Maps, and Adaptive Resonance Theory models of neural networks. Reinforcement learning (RL) methods are especially useful in cases where reinforcement information (in terms of punishments and rewards) is provided after the actions take place in the environment (Barto et al. 1990, Kaelbling et al. 1996). RL methods are inspired by dynamic programming concepts and define formulas for updating expected utilities and for using them in the exploration of the state space. These methods have theoretical proofs of convergence. Alongside the RL methods that estimate value functions, there are stochastic search methods that learn behaviors directly without appealing to value functions.
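As one representative example of such a value-update formula (shown only for illustration; it is not the method used in this thesis), the Q-learning rule Q(s,a) ← Q(s,a) + α(r + γ max Q(s',a') − Q(s,a)) can be applied to a toy two-state chain whose states, actions, and rewards are made up for the example:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions) if actions else 0.0
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)
actions = ["left", "right"]
# Toy chain 0 -> 1 -> goal; reward 1 only on the final transition:
for _ in range(50):
    q_update(Q, 0, "right", 0.0, 1, actions)
    q_update(Q, 1, "right", 1.0, "goal", actions)
```

After repeated updates Q(1, right) approaches the reward 1.0 and Q(0, right) approaches the discounted value 0.9, illustrating how dynamic-programming-style backups propagate reward information through the state space.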
Semi-supervised learning uses elements of both the supervised and unsupervised learning models. It is a class of machine learning techniques that uses both labeled (supervised learning) and unlabeled (unsupervised learning) data for training; typically, a small amount of labeled data is combined with a large amount of unlabeled data (Abney 2008, Altun et al. 2005, Blum et al. 1998). A number of semi-supervised learning applications use graph-based models (Zhu 2005 (a), (b)). Blum and Mitchell (1998) introduced the combination of labeled and unlabeled data with co-training, and later introduced algorithms based on finding minimum cuts in graphs to learn from both labeled and unlabeled data (Blum and Mitchell 2001).
Recently there has been increased interest in non-centralized approaches to solving complex real-world problems. Many of these approaches use distributed systems techniques, in which a number of entities cooperate toward achieving a common goal. The combination of AI and distributed systems led to distributed problem solving, in which a problem is decomposed and assigned to many processors or computers.
2.2 Decision Tree Learning
Decision trees are used for data classification and learning based on the attributes of the data. Every branch in a decision tree represents a choice between a number of alternatives, and each leaf node represents a classification or decision. During object classification, the values of the object's attributes are propagated through the nodes of the tree to a leaf. Each leaf in a decision tree is a class outcome (not necessarily unique, as two distinct routes may lead to the same class). The learning stage involves creating new branches from the root to a leaf with the expected classification. After the learning stage is complete, a new input arriving without class data is propagated from the root to the leaf with the most likely outcome. Thus larger learning data leads to more accurate classification, while unreliable probability estimates from a small number of training instances often produce high error rates. The current-best-hypothesis approach traditionally used in decision tree learning has the disadvantage of using a single hypothesis and relying solely on probability for evaluating the generated inductive rules against input data. The version space method (Mitchell 1977, 1982) improves the current-best-hypothesis approach by maintaining the set of all consistent hypotheses and eliminating those found to be inconsistent with new examples. The approach was used in an expert system for chemistry (Buchanan and Mitchell 1978), and later in the LEX system (Mitchell 1983) to solve calculus problems. Decision tree algorithms such as ID3 and C4.5 (Quinlan 1993, Murthy 1998) and the rule learner CN2 (Clark and Boswell 1989) are widely used in industry. However, they require large computing power and storage space and thus are limited to small-scale problems. To address the memory-consumption issue, more recent decision tree algorithms, such as SLIQ and SPRINT (Han and Kamber 2006, Shafer et al. 1996), use presorting techniques to handle datasets that are too large to fit in memory.
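The attribute-selection criterion behind ID3-style algorithms is information gain: the reduction in label entropy achieved by splitting on an attribute. The sketch below computes it for a tiny made-up two-attribute dataset (attribute 0 is an "outlook", attribute 1 a "humidity"); the data and names are purely illustrative:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction from partitioning the examples on one attribute."""
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr_index], []).append(y)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
    return entropy(labels) - remainder

rows = [("sunny", "high"), ("sunny", "low"), ("rainy", "high"), ("rainy", "low")]
labels = ["no", "yes", "no", "yes"]
gains = [information_gain(rows, labels, i) for i in (0, 1)]
print(gains)  # attribute 1 separates the classes perfectly, attribute 0 not at all
```

ID3 would place attribute 1 at the root because its gain (1.0 bit here) is highest; C4.5 refines this criterion with a gain-ratio correction for many-valued attributes.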
2.3 Explanation Based Learning
Explanation-Based Learning (EBL) is a type of analytic learning that uses deductive mechanisms. Its advantage is that a single example, together with a model of the domain, can be sufficient for learning. In EBL one can use examples to directly solve new problems through an analogical reasoning approach. To utilize an example, the approach requires a set of axioms about the domain of interest, a goal concept, and criteria for determining which features in the domain are efficiently recognizable. EBL has its basis in the techniques used by the automated STRIPS planner (Fikes et al. 1972). STRIPS uses a divide-and-conquer approach to achieve a conjunction of goals, i.e., it creates a plan to achieve a preliminary goal and then creates a plan to achieve the rest of the goals. Upon construction of a plan, its generalized version is saved in a plan library and used to support future planning as a macro-operator. EBL involves reformulating the domain theory to produce general rules that classify examples in a single inference step (Mitchell 1997). Classifying future examples that are similar to the training examples is then performed very quickly. Later analysis has led to a better understanding of the potentially high costs of EBL in terms of the problem-solving speed required for applying the learned rules (Minton 1988).
2.4 Relevance Based Learning
Relevance-Based Learning (RBL) is a deductive learning approach that generalizes information from prior (background) knowledge and uses hypotheses to create new learning examples. Relevance information in the form of functional dependencies was first developed, and is still mainly used, in the database community, where its main application is structuring large sets of attributes into manageable subsets (Russell and Norvig 2003). Tadepalli (1993) describes an ingenious algorithm for learning with determinations that shows improvements in learning speed.
2.5 Inductive Logic Programming
Given an input set of examples represented as a logical database of facts and an encoding of known background knowledge, an Inductive Logic Programming (ILP) system can derive a hypothesized logic program that entails all of the positive and none of the negative examples. ILP is particularly useful in bioinformatics and natural language processing. It uses logic programming as a uniform representation for examples, background knowledge, and hypotheses. The first major ILP program was FOIL (Quinlan 1990), which allowed practical induction of relational rules. The next major system was CIGOL, by Muggleton and Buntine (1988), which uses inverse resolution and is capable of generating new predicates. The more recent system PROGOL, by Muggleton (1995), has been applied to a number of practical problems, particularly in biology and natural language processing. Muggleton (2000) describes an extension of PROGOL that handles uncertainty in the form of stochastic logic programs.
2.6 Neural Networks
Neural Networks (NN) were inspired by and modeled on the neurons of the brain. A neural network is a mathematical model that tries to simulate structural aspects of biological neural networks. NNs consist of an interconnected group of artificial neurons and process information using a connectionist approach to computation. They have a wide range of applications, including digital signal processing, detection of medical phenomena, stock market prediction, and so on.
The feed-forward neural network was the first and simplest type of artificial neural network devised. In this network, information moves only in the forward direction: from the input nodes, through the hidden nodes (if any), to the output nodes. There are no cycles or loops in the network. The elementary type of feed-forward network, consisting of only one node, is the perceptron. The back-propagation technique, which is capable of adjusting the network's weights, was introduced by Bryson and Ho (1969) but came into wide use only after later publications (Werbos 1974, Parker 1985).
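A minimal sketch of the single-node perceptron mentioned above, trained with the classic perceptron update rule on the AND function; the learning rate and epoch count are arbitrary choices for the example:

```python
def predict(w, b, x):
    """Threshold unit: fire (1) if the weighted sum exceeds zero."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def train(samples, epochs=20, lr=0.1):
    """Perceptron rule: nudge weights by lr * error for each misclassified sample."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            err = target - predict(w, b, x)
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

# The AND function is linearly separable, so the perceptron converges:
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(samples)
print([predict(w, b, x) for x, _ in samples])  # [0, 0, 0, 1]
```

A single perceptron can only represent linearly separable functions (it cannot learn XOR), which is precisely the limitation that hidden layers and back-propagation were introduced to overcome.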
In contrast to feed-forward networks, Recurrent Neural Networks are models with bi-directional data flow. While a feed-forward network propagates data linearly from input to output, Recurrent Neural Networks also propagate data from later processing layers of the network back to earlier layers. Hopfield networks (Hopfield 1982) are the most widely used class of recurrent networks; a Hopfield network functions as an associative memory.
Other popular learning models include Support Vector Machines and Boltzmann machines (Vapnik 1998, Cristianini and Shawe-Taylor 2000, Scholkopf and Smola 2002). Support Vector Machines are used in text categorization (Joachims 2001), bioinformatics research (Brown et al. 2000), and text recognition (DeCoste and Scholkopf 2002).
NNs are statistical learning models that are particularly useful for prediction and for learning from continuous data, where most of the logical methods discussed above hardly succeed. The main drawbacks of the method are its computational complexity and the difficulty of finding the best cost function. Since our research deals with discrete data, where the learning system knows whether it succeeded or failed, logical programming methods such as modeling on graphs are more suitable and time-efficient.