path-dependence and routines

The emergence of path-dependent behaviors in cooperative contexts

Massimo Egidi and

Alessandro Narduzzo[1]

Abstract

The issue of path-dependence in organizational learning is explored by analyzing human behaviors in an artificial context in which many agents must cooperate to achieve a common goal without being allowed to use verbal communication. The artificial context is based on Target the Two, a game created by M. Cohen and P. Bacdayan to explore in the laboratory the emergence of rules of coordination and the routinization of behaviors.

The game has very different starting configurations, depending on the card distributions. There exist two sub-optimal strategies which allow players to achieve the final goal by coordinating their efforts.

The efficiency of each strategy, measured in terms of the (lowest) number of moves required to achieve the goal, depends upon the starting configuration: some initial configurations of the game can be more efficiently solved by one strategy, while others can be easily solved by the other.

The working hypothesis of the experiment was that if a group of players was exposed to a set of preliminary runs characterized by starting configurations all of which could be easily solved by the same strategy, they could be "induced" to discover this solution more easily than the alternative one and to memorize it more deeply.

To test this hypothesis two groups of players were asked to play a tournament. During the first part of the tournament (the training phase) every group was exposed to a set of starting configurations which could more easily be played using one strategy only. After the training phase both groups were exposed to the same (random) configurations.

We observed the emergence of a persistent differentiation in players' behavior. The group of players exposed to a set of configurations which led more easily to one strategy continued to use it more frequently in the second part of the tournament, and symmetric behavior arose in the other group. Moreover, in both groups there emerged a subset of players with strongly routinized behaviors, i.e. groups of players who, after the training phase, used one sub-optimal strategy for all runs of the tournament: they adopted a strategy once and for all and insisted on using it even when the configurations could not be efficiently played with the strategy adopted.

These results are used to define precisely and test experimentally the degree of routinization in players' behaviors, the lock-in effect of the learning process, and the sub-optimality of routinized behaviors.

While the experiment was based only on the observation of micro-behaviors, after the tournament subjects were required to verbalize their ideas about the strategies they adopted. Their answers permitted comparison between the micro behaviors and the "mental models" that emerged from verbalization. They explained routinization in terms of the triggering of actions induced by sets of condition-action rules, and they yielded data on the extent of tacitness. The paper ends with a brief exploration of the implications for both the cognitive microfoundations and the institutional aspects of the theory of the firm and organization.

1. Introduction.

1.1 The experimental setting.

The persistence of diversity among economic organizations has received a considerable amount of attention in the recent literature. Considerable light was shed on the matter when it became clear that one important source of diversity lies in path dependent features of organizational learning: similar organizations - for example, small firms competing within the same industry - can increase their differences over time if they respond with different strategies to environmental changes. If we interpret the different ways in which firms respond to changes as the outcome of their different and idiosyncratic accumulations of knowledge, then learning - here regarded as a form of knowledge acquisition - becomes the key element in explanation of such differentiation.

One of the most distinctive features emphasized in both individual and organizational learning is path dependency. A number of important studies in different theoretical areas - technological change (Kauffman 1988, Arthur 1988, 1989, Dosi and Kaniovski 1994, David 1988, 1989), organizational learning (March 1981, Levitt and March 1988, Levinthal 1994), economic and institutional change (North 1991, Denzau and North 1994) - claim that path dependency plays a key role in explaining the evolution of, and differentiation among, economic organizations and institutions.

In some important research areas the role of path dependency has been successfully modeled using a growing array of sophisticated mathematical models, like Landscape Theory and Polya Urns (Wright 1932, Kauffman 1989, 1993, Kauffman and Johnsen 1992, Hill, Lane and Sudderth 1980, Arthur, Ermoliev and Kaniovski 1983). Nevertheless, the empirical evidence on the way in which path dependent processes develop over time is still meager.

One reason for this lack of evidence is the considerable difficulty encountered by researchers in selecting and collecting the relevant data from a huge amount of information: when learning and path dependency are observed at the level of the rules that govern organizations - by monitoring the planning activities of managers and employees - the analysis is usually based on a high stylization of facts. Field experiments do not enable researchers to retrieve all the relevant information on the learning process and on the formation of the rules adopted by individuals at the micro-micro level: the process is too dense in information, and it cannot be easily scanned and broken down into its elementary components. The overabundance of information forces observers to select information according to some prior stylization.

This scarcity of experimental evidence at the micro-micro level is especially odd when one considers the huge number of experiments on rational behavior in decision making designed by Allais (1953) and Kahneman and Tversky (1986). One reason for the lack of experimental evidence on individual and interactive learning in cooperative processes lies in the "technical" difficulties and obstacles involved in handling flows of information. A second constraint lies in the opaque nature of the learning process, much of which is unclear to the subjects involved: even though they endeavor to clarify the rationale of their actions, there is no clear evidence that these attempts faithfully reflect their will and beliefs.

That human beings are generally unable to achieve perfect awareness of their mental processes, and that their behaviors are based on partially opaque deliberations, was first contended by Michael Polanyi in his 1958 book Personal Knowledge. Since Polanyi's work, the idea of tacit knowledge has been transferred into the theory of the firm and revisited in the context of the creation of routines by Nelson and Winter (1982), whose analysis, conducted at the level of the tacit features of organizations, is complementary to the results of studies on the transfer of cognitive skills (Singley and Anderson 1989).

Therefore, even though the opaqueness of knowledge has been recognized and detected in the field of cognitive psychology, problems arise when standard psychological experimental methods are used to analyze phenomena which involve opaqueness. These problems stem from the fact that the most widely used techniques of experimentation (like protocol analysis, for example) are based on direct responses of individuals to tests. They fail to take account of the partial unawareness of actors and of the limits on the verbalization of mental processes (Nisbett and Wilson 1977, Ericsson and Simon 1984).

An alternative experimental device - one which enables experimenters to move beyond the limits of classical tests - has been devised by Michael Cohen and Paul Bacdayan. This device consists in the creation of an artificial context for collective action: a game in which two agents must cooperate to achieve a common goal without being subjected to verbal tests. This artificial game, called "Target the Two" (Cohen and Bacdayan 1994), permits analysis of the emergence of rules of coordination without involving the players' verbal competence.

1.2 Routinized behaviors and routinized thinking.

Previous experiments with Target the Two (Cohen and Bacdayan 1992, Egidi 1994) provide evidence that in tournaments, after an initial learning period, players' behavior grows increasingly routinized. Of course the problem is to define "routinization of behaviors" in such a way as to allow experimenters to test routinization precisely. We contend that in the context of games like Target the Two, the routinization of behaviors can be precisely defined and therefore experimentally tested.

Cohen and Bacdayan define organizational routines as "patterned sequences of learned behavior involving multiple actors who are linked by relations of communication and/or authority" (1994, page 555). They therefore consider the occurrence of repeated sequences of action to be the most salient feature of routinized behaviors (although they cite three further features: reliability, speed, and occasional sub-optimality). In their experiments all four indicators of routinization are statistically significant. A second relevant finding of their research is that routinized behaviors are stored as procedural memory; a property which directly relates to the opaque nature of the knowledge embodied in routinized behaviors and their partially inarticulate nature.

This finding suggests that the automaticity with which players repeat the same sequences of actions can be explained in terms of automaticity in their mental processes. Studies on the mechanization of thinking - the so-called "Einstellung effect" - have a long tradition in psychology (Luchins 1942, 1950). The literature has suggested that routinized behaviors are based on "routinized thinking", i.e. on the automatic use of "chunks" which enable individuals to save on mental effort (Weisberg 1980, Simon and Newell 1972, Laird, Newell and Rosenbloom 1987, Newell 1990).

Following this tradition, we assume that behind routinized behaviors there lie particular features in terms of mental models (Johnson-Laird 1983): subjects who behave in a repetitive (routinized) way follow a set of rules sedimented in long term memory which enables them to perform their actions with reduced mental effort. We therefore consider - as a preliminary hypothesis - routinized behaviors to be the outcome of routinized thinking. With this assumption "automaticity" is considered important not only at the behavioral level but also, and mainly, at the level of mental models. Interestingly, this property of mental activity - i.e. the need to save on mental effort, to lighten the load on short term memory by creating mental building blocks and to store new elements of knowledge in long term memory - is not only widely analyzed in the context of experimental psychology; it was also emphasized by Hayek in his "The Sensory Order" (1952).

We shall use the term routinized behaviors to denote sequences of actions performed by players over time as they obey a given set of conditional rules of action. As we shall show later, it may happen that repetitive behavioral sequences do not emerge even though players rigidly adhere to a set of condition-action rules. It thus becomes impossible to experimentally reveal the routinization of thinking by looking for repetitive sequences of actions. This happens when - as in Target the Two - there is a huge number of different initial conditions of the game which are given at random, and which produce quite different behavioral sequences even though players rigidly follow the same set of rules.

Consequently, even though testing for routinization via analysis of repeated sequences may be successful in many situations, we suggest a more general approach based on testing for the existence of systems of rules which trigger appropriate actions in response to a given condition of the game. Note that this approach to "routinized behaviors" is consistent with the definition provided by March and Simon:

"We will regard a set of activities as routinized, [then,] to the degree that choice has been simplified by the development of a fixed response to defined stimuli. If search has been eliminated, but a choice remains in the form of clearly defined and systematic computing routine, we will say that the activities are routinized" (March and Simon 1993, page 142)[2].

One aspect of this definition should be particularly stressed: routinized behaviors take place when "search has been eliminated", i.e. when the individual learning process ceases. Of course the problem is to clarify where and how to show that "search has been eliminated". If only behavioral data are used, without applying psychological tests, we obtain necessary but not sufficient conditions for concluding that the subject has stopped searching. As we shall show, when players try to discover appropriate rules of action, at the beginning of the tournament, their reactions to the same board conditions change over time, and these changes reveal their learning activity. Conversely, if players always follow the same rules of action in different runs of the game - as many of them do midway through a tournament - this is evidence that they are no longer searching for new rules.

Of course this is indirect evidence, because we observe a stable use of behavioral rules without investigating at the level of mental models. To obtain a direct proof that individuals no longer learn as they are playing, we must employ psychological tests. We have noted above that the automatic use of action rules enables individuals to save on mental effort. The psychological literature suggests that one can experimentally verify whether thinking is automatized by checking if subjects performing a repetitive task are simultaneously able to perform a different mental activity, like problem solving. If, for example, they can solve a puzzle while playing a game like Target the Two, we may infer that their behaviors are routinized. It is possible, therefore, to use the experimental tests of cognitive psychology as complementary tools to check the routinization of behaviors.

1.3 Path dependency and cognitive limits in learning activity.

A key property of many real and artificial contexts in which individuals cooperate to achieve a common goal is the variety and multiplicity of the possible solutions. Actors are able to devise many alternative ways to cooperate and solve the problems; the challenge for the experimenters is to understand how subjects come to learn one solution rather than another.

Simon's idea of bounded rationality suggests that learning and problem solving are subject to severe cognitive limitations (Simon 1971). The psychological literature, in particular studies of chess, provides ample evidence that during problem solving activity the complexity of the problem may generate a mental overload. When too many symbolic manipulations are required to explore the alternatives, players are unable to create a comprehensive internal model of the actions required to play optimally. They fail to acquire all the knowledge needed to play the best strategy and consequently explore only a limited part of the space of strategies, and learn and memorize a sub-optimal strategy.

Therefore, mental overload provides an explanation for path dependency in learning, since it prevents players from achieving full exploration of the space of the problems. It is reasonable to suppose that if a game admits different solutions, i.e. different sub-optimal sets of rules with which to achieve the goal, different players come up with these different solutions in relation to the way in which they explore the space of the possible solutions.

One of the main concerns of this paper is to provide experimental evidence that - at least in the context of games like Target the Two - players explore only a limited part of the space of the solutions. They therefore learn and memorize bounded sub-sets of rules which allow them to behave in a satisficing, only locally optimal, way.

In the experiments discussed below, two different groups were exposed to initially different sets of game configurations and induced to discover and memorize different sets of behavioral rules: the memorization showed persistency, insofar as it later induced the two groups to react in very different ways to the same game configurations.

It was possible to conduct this experiment because the game Target the Two admits multiple sub-optimal solutions: two different, locally optimal strategies exist, while the optimal one is a "mixed" strategy (in a sense we will make clear later). To coordinate their efforts in achieving the common goal, the players must discover one of these strategies (or both). The extent to which they are able to extend their mental exploration is strongly influenced by the way in which they initially learn.

Each strategy is defined by a set of simple action rules which allow players to trigger the appropriate action in a coordinated way in response to every condition of the game. There is a large variety of initial configurations. Which of the two strategies is the more efficient depends upon the initial distribution of the cards.

The working hypothesis of the experiment is therefore that by exposing a group of players to a set of preliminary runs characterized by starting configurations all easily solved by the same strategy, they will be "induced" to discover this solution more easily than the alternative one and to memorize it more deeply.

To test this hypothesis we compared the behaviors of two groups of players. Both groups were made to play a tournament. During the first part of the tournament (the training phase) one group was exposed to a set of starting configurations which could more easily be played using one strategy only. The opposite was the case for the other group. After the training phase both groups were exposed to the same (random) configurations.

We observed the rise of a persistent differentiation in players' behavior. The group of players exposed to a set of configurations which led more easily to one strategy continued to use it more frequently in the second part of the tournament, and symmetric behavior arose in the other group. Moreover, in both groups there emerged a subset of players with strongly routinized behaviors, i.e. groups of players who, after the training phase, used one sub-optimal strategy for all runs of the tournament: they adopted one strategy once and for all, and insisted on using it even when the configurations could not be efficiently played with the strategy adopted.

These results are used to define precisely and to test experimentally the degree of routinization in players' behaviors, the lock-in effect of the learning process, and the sub-optimality of the routinized behaviors.

While the experiment was based only on the observation of micro-behaviors, after the tournament subjects were required to verbalize their ideas about the strategies they adopted. Their answers permitted comparison between micro behaviors and the "mental models" that emerged from verbalization, and an explanation of routinization in terms of the triggering of actions induced by sets of condition-action rules; they also yielded some data on the extent of tacitness.

2. Target the Two.

2.1 Rules of the game.

Since Cohen and Bacdayan (1994) provide a detailed description of the rules of the Target the Two[3] game, it will only be briefly introduced here. The deck consists of 6 cards: 2♣ 3♣ 4♣ and 2♥ 3♥ 4♥. All six cards are used in each game. Each player has one card and the other four are placed in four positions on the board named Target, Up, DownC and DownN (see Figure 1).

Figure 1. The board for the Target the Two game.

In the Target and Up areas the cards are face-up, while in the DownC and DownN areas they are face-down. Therefore, as soon as the cards are dealt, each player can see his own card and the cards occupying the Target and the Up areas. The game ends when one of the players puts 2♥ in Target position. In order to do this, the players alternately exchange their cards for one of the cards placed on the board. There are no restrictions on exchange with cards occupying the Up, DownC and DownN areas except that the card placed in the first position must always be face upwards while the other two must be face-downwards. Exchange with the card in Target area is constrained and the rules are different for the two players. One player may exchange his card with the card placed in Target position only if the two cards are of the same color (e.g. exchange 2♥ with 4♥, or 2♣ with 3♣). The other player may exchange his own card with the card placed in Target position only if they have the same number (e.g. exchange 3♣ with 3♥, or 2♣ with 2♥). Because of these constraints, the two players are respectively called Colorkeeper and Numberkeeper. Colorkeeper always moves first, then it is Numberkeeper's turn, and so on. They move alternately and exchange their cards with one of the cards on the board until one of them is able to put 2♥ in the Target area. There is an additional move, called "Pass", which is always available to the players. When a player decides to "Pass" he skips the move so that it is once again the partner's turn.

To simplify, henceforth the following symbols are used:

U - exchange the card with the card Up

C - exchange the card with the face-down card on the left of Colorkeeper's card (DownC)

N - exchange the card with the face-down card on the left of Numberkeeper's card (DownN)

T - exchange the card with Target

P - pass
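The constraint on exchanges with the Target can be sketched in a few lines of code. This is only an illustrative encoding, not part of the original experimental software: cards are written as a number followed by a suit letter, assuming the two suits are clubs ("C") and hearts ("H"), and the function names are hypothetical.

```python
# Illustrative encoding (an assumption of this sketch): a card is a
# two-character string, number then suit, e.g. "2H" or "4C".

def can_exchange_with_target(role, hand, target):
    """Return True if `role` may swap the card in `hand` with the Target card."""
    if role == "Colorkeeper":
        return hand[1] == target[1]   # same color (suit)
    if role == "Numberkeeper":
        return hand[0] == target[0]   # same number
    raise ValueError(f"unknown role: {role}")

def legal_moves(role, hand, target):
    """List the move symbols (U, C, N, P, T) available to `role`."""
    moves = ["U", "C", "N", "P"]      # unconstrained exchanges and Pass
    if can_exchange_with_target(role, hand, target):
        moves.append("T")             # Target exchange only when allowed
    return moves
```

For example, `legal_moves("Numberkeeper", "4H", "4C")` includes `"T"`, whereas the same hand held by Colorkeeper does not allow the Target exchange.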

The reward system is based on the number of moves players make to achieve the goal and on the time that elapses: at the beginning of each hand a given amount of money is assigned to each pair of players. Every move has a fixed cost. Therefore at the end of each hand each pair is rewarded with the difference between the initial amount and the cost of the moves they have made. The session consists of 40 runs, and players have a time limit of forty minutes. Therefore, to maximize their reward, the subjects must use the fewest moves possible for every hand, and play the highest possible number of runs within the forty minutes.
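The arithmetic of this reward scheme can be sketched as follows. The monetary values are placeholders chosen only for illustration (the text does not report the actual amounts), and the function names are hypothetical.

```python
# Placeholder values: the initial amount and the cost per move are
# assumptions of this sketch, expressed in arbitrary integer units.

def hand_reward(n_moves, initial_amount=100, cost_per_move=10):
    """Reward for one hand: initial amount minus the cost of the moves made."""
    return initial_amount - cost_per_move * n_moves

def session_reward(moves_per_hand, initial_amount=100, cost_per_move=10):
    """Total reward over the hands actually completed within the time limit."""
    return sum(hand_reward(n, initial_amount, cost_per_move)
               for n in moves_per_hand)
```

Under these placeholder values a pair that completes two hands in 3 and 4 moves earns 70 + 60 = 130 units, so both shorter solutions and more completed hands raise the total.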

In the first experiments a real board and real cards were used. Afterwards, a computerized version of the game was written to run under NeXTStep; each player has the board reproduced on his screen and may exchange his card using a mouse. All the relevant data regarding moves, times, mistakes are recorded and are available for analysis. Any form of verbal communication and other "physical" expressions of player beliefs and expectations are excluded from the experiment. Nevertheless, an experimental design of this kind establishes a powerful and controlled environment, and provides an extremely "fine grain" source of data for the study of micro behaviors and their evolution, for exploration of the way in which beliefs, expectations and decisions are created, and for analysis of how coordination rules emerge without involving players' verbal competence.

2.2 Searching for routines

The first experiments conducted by Cohen and Bacdayan (1994) with Target the Two provide evidence that after the initial learning period players' behavior becomes more and more routinized.

As noted in the introduction there are two complementary ways to measure routinization: one may either focus on the repetitive sequences of patterned behaviors, as in Cohen and Bacdayan (1994, page 558), or also include considerations regarding the mental model involved. This second experimental level is suggested by the incompleteness that arises if observation is restricted to the sequences over time.

In Cohen and Bacdayan's experiment, routines are revealed as repeated action sequences. As we noted before, what it means for two sequences to be "the same" depends upon the representation of the problem and the level of observation involved. Cohen and Bacdayan point out, for example, that the sequence of moves Up-Up-anything-Target (UU*T) is often used by subjects (the median pair solves more than one game in four playing UU*T), and that there is a statistically significant positive correlation between the number of times UU*T is played and the number of moves used in the tournament. However, other relevant repetitive sequences exist, like Search DownC-Search DownN-Pass-Up (CNPU), which is the typical reaction of over-cooperative players to particular configurations. The players realize that their partner is searching for a card (C) which is presumed to be useful. They search in the opposite covered position (N) in order to offer the useful card to the partner (U) after he has passed. The sequence is therefore CNPU. Of course the sequence Search DownN-Search DownC-Pass-Up reflects the same procedure in terms of the mental model involved, but it corresponds to a different sequence of actions. The point is, therefore, that CNPU should be considered as perfectly equivalent to NCPU in terms of routinized thinking, although they differ as regards the sequence of actions. Therefore, there is a limit to defining routinized behaviors as repetitive behaviors which arise in similar conditions. As the above example shows, two phenomena can be considered as similar in relation to the level of abstraction at which the observers are thinking, and more generally in relation to their mental representation of the phenomena.
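One possible way to make the dependence on the level of abstraction concrete is to rewrite each move sequence in a canonical form before comparing it with others. The sketch below is an assumption of ours, not the coding used by Cohen and Bacdayan: the first face-down position searched is relabelled "a" and the other "b", so that sequences differing only in which covered position is searched first become identical.

```python
def canonical(moves):
    """Relabel face-down searches by order of first use: first -> 'a', other -> 'b'.

    Moves U, T and P are left unchanged.
    """
    labels = {}
    out = []
    for m in moves:
        if m in ("C", "N"):
            if m not in labels:
                labels[m] = "ab"[len(labels)]  # next unused abstract label
            out.append(labels[m])
        else:
            out.append(m)
    return "".join(out)

def same_procedure(seq1, seq2):
    """True if two action sequences coincide at this level of abstraction."""
    return canonical(seq1) == canonical(seq2)
```

At this level CNPU and NCPU are the same procedure, while CCPU (searching the same covered position twice) remains distinct.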

To overcome this difficulty it is convenient to define routinized behaviors as behaviors based on fixed condition-action rules, following March and Simon (1993). As we have seen, March and Simon refer to routinized behaviors as a sequence of choices reduced to a "systematic computing routine", instead of appealing to a set of condition-action rules: yet it is well known in Computation Theory that the two computing devices are equivalent (Cutland 1988), in the sense that it is always possible to build a system of condition-action rules that is equivalent to a "computing routine", i.e. to a Turing machine.[4]

In defining "routinized behaviors" we must therefore carefully distinguish between the sequence of actions realized over time, and the set of rules which generate this sequence.

If we identify a set of rules governing the behavior of a player, we can resolve the conceptual difficulty raised by defining routinized behaviors as "repetitive": in fact a set of condition-action rules prescribes the action to be made for every condition of the game, and therefore implicitly defines the abstraction level. As we shall see, in our experiment identification of a set of rules governing players' behaviors is possible and statistically significant.

To find out the set of condition-action rules which allow players to coordinate their efforts and achieve the common goal, we must find a way to represent the problem (the goal to be achieved) in terms of sub-goals and discover the logic of collective action which is involved.

2.3 Decomposition of the game into sub-goals and coordination

A well known approach to problem solving suggests that the overwhelming complexity of a problem can be mastered by decomposing it into independent sub-problems (March and Simon 1958, Simon 1963, Laird, Newell and Rosenbloom 1987, Nilsson 1971, 1980).

The decomposition of problems into sub-problems proceeds recursively until elementary problems, i.e. problems which are already solved, are reached. This procedure has been used (Egidi 1994, page 7) to analyze the structure of Target the Two, and we briefly recall and describe it below.

According to the rules of the game the problem is solved when 2♥ is placed in the Target position by one of the players. This is the final goal and all the configurations with 2♥ in Target are possible final configurations of the board. In order to solve the game and put 2♥ in Target, players must take a series of intermediate steps. In the configuration shown in Figure 1, for instance, one of the players must find 2♥; moreover, since neither of them can exchange 4♣ with 2♥, they must place another card in Target which allows the final exchange with 2♥. For each configuration it is possible to identify sequences of intermediate steps that must be accomplished to solve the game. These sequences of intermediate steps can be conceived of as a decomposition of the problem of the game (2♥ in Target) into sub-goals.

A graph (see Figure 2) has been introduced in order to represent the space of sub-goals for any possible configuration. This representation focuses only on the card occupying the Target area and illustrates all possible transitions in the Target area (Egidi 1994, pages 9-11).

Figure 2. The graph of sub-goals.

Note that a change between two configurations of the Target marked by horizontal lines can legally be performed only by Numberkeepers, while a change of configurations marked by vertical lines is available only to Colorkeepers. By using the graph of sub-goals represented in Figure 2 it is possible to follow the progress of the solution to a game and to keep track of the sequence of cards occupying the Target area from the initial board to its solution. Reasoning backwards and given the rules of the game, the only possible configurations that immediately precede the final one (that is, 2♥ in Target) are those in which the Target position is occupied by 3♥, 4♥ or 2♣. In fact, because of the rules constraining the exchange with the Target, Colorkeeper can end the game by exchanging his card (that is, 2♥) with the card in Target only if they are of the same suit (that is, 3♥ or 4♥). On the other hand, Numberkeeper is bound by the "number" constraint, and if he has 2♥ in his hand and wants to end the game, the exchange will be possible only if the card in Target is 2♣. Referring to Figure 2, the final goal is in the lower right corner. The sub-goal to be accomplished in order to achieve the final goal is to have the Target position occupied by one of the cards that are adjacent to 2♥ in Figure 2. Inspection of the graph of sub-goals clearly shows the decomposable nature of the problem; in fact, if the Target is occupied by either 3♣ or 4♣, it is necessary first to put in Target either 3♥, 4♥ or 2♣. It is possible to record the solution of the game in terms of the sequence of sub-goals and goal that the pairs of players follow to solve the game. For instance, the string 4♣ 2♣ 2♥ identifies a specific path in Figure 2 and a well defined solution to the game: when the cards are dealt, the card in Target position is 4♣, Colorkeeper eventually exchanges his card (2♣) with the Target, leaving Numberkeeper to end the game. On the other hand, these paths may also be very long (e.g. 4♣ 3♣ 3♥ 4♥ 4♣ 2♣ 2♥).
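The transitions of the graph of sub-goals follow directly from the two exchange constraints, so a recorded Target path can be checked and annotated mechanically. The following is a minimal sketch under our own encoding (cards as number plus suit letter, assuming clubs "C" and hearts "H"); the function names are hypothetical.

```python
CARDS = ["2C", "3C", "4C", "2H", "3H", "4H"]

def transition_player(a, b):
    """Which player can change the Target from card a to card b (None if nobody)."""
    if a == b:
        return None
    if a[0] == b[0]:
        return "Numberkeeper"   # same number: the "horizontal" transitions
    if a[1] == b[1]:
        return "Colorkeeper"    # same suit: the "vertical" transitions
    return None

def predecessors(card):
    """Cards from which `card` can be reached in one Target exchange."""
    return sorted(c for c in CARDS if transition_player(c, card))

def label_path(path):
    """Return, for each transition of a Target path, the player performing it."""
    players = []
    for a, b in zip(path, path[1:]):
        who = transition_player(a, b)
        if who is None:
            raise ValueError(f"illegal Target transition {a} -> {b}")
        players.append(who)
    return players
```

Here `predecessors("2H")` yields the three first-step cards named in the text, and `label_path(["4C", "2C", "2H"])` reports that Colorkeeper makes the first transition and Numberkeeper the last.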

As a consequence of the above remarks on the structure of goal and sub-goals, all possible configurations of the board are classified into two levels. A first level configuration is any board in which the Target area is occupied by one of the three cards that in Figure 2 are adjacent to 2♥, that is, any board with 3♥, 4♥ or 2♣ in Target. With regard to boards in which 4♣ is in Target, the shortest paths in Figure 2 that mark changes in the Target area require two transformations. Such configurations have two alternative shortest paths in Figure 2, that is, respectively 4♣ 2♣ 2♥ (henceforth 422) and 4♣ 4♥ 2♥ (henceforth 442). The same reasoning applies to boards in which 3♣ is in the Target. Therefore, a second level configuration is any board where either 3♣ or 4♣ is in Target.
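The classification into levels can be reproduced by a breadth-first search on the graph of sub-goals. The sketch below is illustrative only; it uses our own card encoding (number plus suit letter, assuming clubs "C" and hearts "H") and hypothetical function names.

```python
from collections import deque

CARDS = ["2C", "3C", "4C", "2H", "3H", "4H"]

def neighbours(card):
    """Target cards reachable in one exchange: same number or same suit."""
    return [c for c in CARDS
            if c != card and (c[0] == card[0] or c[1] == card[1])]

def shortest_paths(start, goal="2H"):
    """All shortest Target paths from `start` to `goal`, by breadth-first search."""
    best, paths = None, []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break                      # only longer paths remain in the queue
        if path[-1] == goal:
            best = len(path)
            paths.append(path)
            continue
        for nxt in neighbours(path[-1]):
            if nxt not in path:        # never revisit a Target card
                queue.append(path + [nxt])
    return paths

def level(card):
    """Minimum number of Target transformations needed to reach the goal."""
    return len(shortest_paths(card)[0]) - 1
```

With 4C in Target the search returns exactly the two paths called 422 and 442 in the text, and `level` separates the first level cards (one transformation) from the second level cards (two transformations).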

Figure 3. The board for the Target the Two game.

An example (see Figure 3) should clarify these differences. Colorkeeper sees that 2© is in the Up area and may decide to take it. After Colorkeeper's first move, Numberkeeper can exchange his card, i.e. 4©, with the Target card. At this point Colorkeeper can exchange 2© in Target and end the game. Summarizing, the sequence of moves required to play the game in this way is: exchange with Up, exchange with Target, exchange with Target. In terms of the path of goals, the sequence of cards occupying the Target area during the game is: 4§ 4© 2©.

On the other hand, Colorkeeper may exchange his card, i.e. 2§, with the Target, setting up the game to be finished by his partner. Numberkeeper sees that 2© is in Up and that he can take it. On his next turn, Colorkeeper passes and Numberkeeper exchanges his 2© in Target and ends the game. In this second case the sequence of moves was: exchange with Target, exchange with Up, Pass, exchange with Target. The path of sub-goals described by this solution is 4§ 2§ 2©. Note that the number of moves needed to solve a game depends on the specific board and in general differs for the solutions provided by the two paths. There are many other ways to solve the game; therefore, the sequence of cards placed in the Target area before the end of the game can be different (for instance, 4§ 3§ 3© 2©). In general, all sequences that differ from 442 and 422 require a higher number of transitions in the Target position and are less efficient.
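The division of labor implied by a path of sub-goals can be read off the path itself. The short Python sketch below assumes only the exchange constraints stated earlier (Numberkeeper changes the Target between cards of the same number, Colorkeeper between cards of the same suit); the function name movers and the card encoding are illustrative.

```python
S1, S2 = "\u00a7", "\u00a9"   # the two suit marks, printed as § and © in the text

def movers(path):
    """For each change of the Target card, name the player allowed to make it."""
    players = []
    for before, after in zip(path, path[1:]):
        if before == after:
            raise ValueError("the Target card did not change")
        if before[0] == after[0]:        # same number: Numberkeeper's exchange
            players.append("Numberkeeper")
        elif before[1] == after[1]:      # same suit: Colorkeeper's exchange
            players.append("Colorkeeper")
        else:
            raise ValueError("illegal transition: " + before + " -> " + after)
    return players

PATH_422 = ["4" + S1, "2" + S1, "2" + S2]
PATH_442 = ["4" + S1, "4" + S2, "2" + S2]
```

Applied to the two paths, movers(PATH_422) gives Colorkeeper followed by Numberkeeper, and movers(PATH_442) gives the reverse order, which matches the two examples just described.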

2.4 Different representations of a strategy.

So far, we have referred to strategies as dealing with the space of problems and sub-problems. According to this approach, problem solving is a process of decomposition of the original problem into a hierarchical system of sub-problems; this decomposition is iterated until the sub-problems are primitive, i.e. situations that have evident solutions. A strategy is therefore defined by a path in a graph. The path starts from the node defining the problem to be solved and ends at the "terminals", i.e. the primitive problems. Figure 2 is a summary representation of this graph. Obviously, this representation of a strategy is fully compatible with the definition of strategy in game theory: according to the traditional Von Neumann approach, a game, and more generally a problem, can be represented in terms of state-space by defining the starting state (configuration) and the transition rules with which to change from one state to another. The two definitions of strategy - in terms of sub-goal space and in terms of state space - are equivalent and it is always possible to shift from one representation to the other (Nilsson 1980). To see briefly how this is possible, consider the formal features of a strategy in a game. A run is made up of a sequence of game configurations generated by the moves of players: every move modifies the game configuration, and the entire process can be described as a dynamic sequence by which a starting configuration is transformed, via the sequence of moves, into the final one. A strategy is therefore equivalent to a set of instructions which describe the action to be taken in relation to every game configuration in order to achieve the final goal. These instructions (rules of action), which allow players to trigger the appropriate action for every configuration of the game, can be formally described as "condition-action rules" in the Computation Theory sense (see for example Cutland 1988).
There exists an instruction for making a (right) move for every game configuration; this means that a local solution has been provided for the sub-problem, which allows a player to take one (or more) steps ahead in the run.
To summarize, a strategy can be represented in summary form as a path in a sub-problem graph, or in detailed form as a set of rules to trigger an action for every game configuration. With regard to Target the Two, the 422 and 442 strategies can be described as paths in the problem space (Figure 2); any possible path in the graph defines a sequence of sub-goals to be achieved in order to solve the problem. This representation of a strategy does not mention specific moves, but gives the players directions about the goal to be achieved at each step. Corresponding to every path in the graph, which defines a sequence of goals, are many different ways to realize the goals. For example, suppose that Colorkeeper plays a 422 strategy; he must search for his key-card (2§) and put it in the Target area. Depending on the distribution of the cards, 2§ may be in many different (covered or clear) positions on the board, and therefore the sequences of actions corresponding to the same goal may differ considerably.

2.5 Coordination and rationality.

So far two competing, alternative strategies, i.e. 422 and 442, have been defined. Both require coordination and entail a division of labor that specifies two roles: one player must exchange his key-card in Target and the partner must end the game. In the 422 strategy these roles are played by Colorkeeper and Numberkeeper, respectively; in the 442 strategy the roles are reversed and they are played by Numberkeeper and Colorkeeper, respectively. With regard to coordination, it is to be stressed that the players are not allowed to communicate before and during the tournament; therefore the emerging division of labor is not the outcome of a verbal agreement, and the only source of coordinative information is the game situation: the position of the cards on the board and the partner's moves. Nevertheless, in some particular situations such information is not sufficient to coordinate the pair; at other times it is ambiguous and generates misunderstanding. For instance, consider the board shown in Figure 1: if Colorkeeper exchanges his card with the Up card, Numberkeeper receives ambiguous information, since he may think that Colorkeeper has taken his key-card, or that Colorkeeper has revealed Numberkeeper's key-card. In the first case they play the 422 strategy, in the second case they play the 442 one. Therefore, if a pair plays in a perfectly rational manner and uses all available information, it always coordinates its actions, except in ambiguous situations. Of course, playing in a perfectly rational way entails a high computational effort. We may therefore expect that, at least in the first stages of a tournament, novice players will try to find simpler strategies of behavior. This they can do by learning and using only one strategy, say 422, whatever the initial (second level) configuration of the game. Players can therefore avoid using all available information by adopting a single strategy, like 442, which automatically coordinates their actions.
Such not fully rational behavior may paradoxically yield an advantage in terms of coordination because the ambiguity is reduced by reducing the source of relevant information. For instance, a pair of players that invariably adopts the 422 strategy will not perceive any ambiguity in the previous example. Colorkeeper sees his key card in Up and takes it; Numberkeeper looks for 2© and ends the game. This pair may be even more efficient than a rational one in situations in which too much information produces ambiguity. Before exploring the problem involved, i.e. the relation between rationality and efficiency, at a deeper level, a better understanding is required of how repetitive behaviors, and in particular coordination, can be realized via sets of fixed rules of action.

2.6 Routinized behaviors
In order to show that routinized behaviors can be represented in terms of condition-action rules, it is convenient to return to the representation of the game in terms of sub-goals. Table 1 shows the moves that players are expected to make if they follow the 422 or the 442 division into sub-goals respectively. For every distribution of cards in the three visible positions, Hand, Up and Target, the table reports the action expected of Colorkeeper and Numberkeeper. An analogous table can be written by substituting 4§ with 3§ and correspondingly 4© with 3© everywhere.

Table 1. The two sub optimal sets (442 and 422) of Condition-action rules for Colorkeeper and Numberkeeper.
How can the condition-action representation of the behaviors be connected with the representation in terms of repeated actions? The answer is now quite simple: given the initial configuration of the game, by applying the two sets (one for Colorkeeper and one for Numberkeeper) of condition-action rules in Table 1 we generate theoretical sequences which represent the routines in terms of repeated actions. Table 2 reports some of the configurations previously considered (in Figure 4) to analyze the strategy played during the tournament in Egidi's experiment.
By applying the condition-action rules reported in Table 1 to the configurations introduced in the left part of Table 2 we obtain the sequences of actions listed in the right part of the table.
Table 2 illustrates what was suggested in the introduction, i.e. that there are conditions under which routinized behavior cannot be detected by looking for repeated sequences of actions: even though players blindly follow the same set of rules (instructions), the sequences of actions over time differ greatly because of the initial random distribution of the cards. Players who strictly adhere to the 422 strategy, for example, play sequences like UUTT, TUPT, TCPNPT, CPTT (Figure 4), which are very different. Therefore, the condition-action representation becomes necessary to detect all regularities in behaviors.
We can now finally give a precise definition of "routinized behaviors". Routinized behaviors are characterized by the use of a fixed set of conditional rules automatically triggered by players.
Automaticity means that when subjects identify environmental conditions, they consider them as belonging to a repertoire of familiar conditions. This therefore triggers a familiar reaction without the players having to conduct further mental exploration of the problems involved.

Table 2. A sample of the sequences of moves made over time following the two alternative strategies.
This definition implies that players' actions may be sub optimal, because there is no a priori reason why the action rules should give rise to fully rational behavior. Nor is there anything to prevent the contrary, i.e. that behaviors exist which are both routinized and optimal: suppose in fact that there exists one optimal strategy which can be described by a set of action rules. A player able to learn and memorize all of these rules would behave optimally. The problem is that normally the action rule system contains a huge number of rules and players have incomplete knowledge of them.
In our particular context, the complete system of rules with which to play optimally is composed of the two sets of rules 442 and 422 and in addition a set of dynamic rules (Egidi, 1994 page 26) which prescribe the choice of the 442 or 422 strategy in relation to the information flowing from the board and from the partner's action. If a player fails to take account of the dynamic rules, his behavior will in consequence be sub optimal. The same happens a fortiori if he learns and uses only one of the available strategies.
It is therefore clear that there are many different degrees of routinization. In our context we consider particularly relevant - and call strongly routinized - the behavior of subjects that always play the same strategy, say 422 (the rules are given in Table 1), whatever the starting (second level) distribution of cards.
This latter condition implies that players do not take into account information that may serve to improve their performance: new information does not induce these players to discover new rules of action. Therefore their mental activity does not entail a learning process (following March and Simon), or at least their learning activity is not "organized" enough to allow the discovery of new rules of action even when these are clearly more efficient.
Summing up, the definition of routinization we have provided allows us to discriminate different degrees of routinization among players. We have two extreme situations: on the one hand there are pairs of players who learn to play only one strategy. Something prevents them from learning the alternative strategy available as well. They are locked in the set of rules defined by this strategy and they will use it even when it is not efficient. On the other hand we have fully rational players who know both strategies and use all available information to decide which strategy is the most efficient for every specific condition of the game. Why should we not consider the behaviors of these players routinized? In principle they can be routinized, because it is possible that they automatically execute the set of rules prescribed by the optimal strategy. Of course, using analysis at the behavioral level, it is impossible to discern whether the pairs of players who behave with full rationality are routinized or not. In order to discriminate whether or not they are learning and calculating their actions, we should ask players to verbalize their thinking, and use psychological tests such as protocol analysis. This complementary research goes beyond the limits of our paper. More modestly, in the next section we shall identify pairs of players who behave rationally and pairs of players who exhibit deviations from rational behavior, and we will classify these deviations in terms of different degrees of routinization.

2.7 Experimental evidence on routinization.
Let us now turn to the problem of fully rational behaviors addressed in the previous section. One way to determine whether players deviate from fully rational behaviors is to consider the starting configurations of a tournament and whether all of them trigger the same reaction in players who are not strongly routinized. We shall use the data on the replication of Cohen and Bacdayan's experiment discussed in a paper by Egidi (1994).
The data are plotted in Figure 4. For each second level run, the percentage of pairs playing the 442 strategy is reported. Almost all of the players reacted by activating the same strategy in some runs, for instance 10, 16, 23, 24, 29 and 37. It is therefore advisable to verify whether there is some relationship between the strategy chosen by the players and the position of particular cards on the board at the beginning of the runs. A natural hypothesis is that players gradually increase their ability to discover their sub goals and therefore grow increasingly able to react to the key-cards which define the subgoals, i.e. the cards 4©, 2§, and 2©.

Figure 4. Percent of pairs that played the 442 strategy to solve the hands of the tournament.
A first step in verifying this assumption is to check the starting configurations of the tournament where only one key card is visible to one of the players, to ascertain whether all players choose the same strategy. Some of the hands are reported in Table 2, which also shows (middle of the table) the positions of the visible key card on the board at the beginning of each hand. Comparison of Figure 4 with these features of the game configurations (Table 2) shows that the following hands fulfill our requirement: in hands 3, 6 and 28 Colorkeeper has his key-card (2§) in his hand; in hands 4 and 29, Colorkeeper has the double key-card (2©) in his hand. It is clear that in response to this pure information, players behave in a very regular manner: the majority of them choose the 422 path in the former cases, while the majority choose the 442 path in the latter cases.
Having observed that the reaction of players to some of the pure configurations is stable, the next step is to explore how players react to all possible combinations of the elementary relevant information, thereby assessing whether they are able to generate a complete system of rules of action.
Consider this new simple rule (A): a player picks up his key card, when it is face up on the board, or uses it when he has the card in hand. If Colorkeeper follows this rule A, he must pick up his key card 2§ - and in consequence he starts to play the 422 strategy - when 2§ is visible on the board; this condition arises in hands 6, 7, 14, 28, 34, 36 (see Figure 4). We should therefore expect these hands to be played with 422 strategy. This hypothesis fits perfectly with the data in Figure 4, with the sole exception of hand n.7. But remember that 442 was played in hands 4 and 29, where Colorkeeper had the double key-card (2©) in his hand. We can therefore add a new rule: if a player has the 2© in his hand he passes, waiting until the partner has been able to find his key card. If we give priority to this latter rule, when also rule A can be applied, we can explain the choice of the strategy for all boards reported on Table 2.
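The two rules of thumb and their priority ordering can be written as a small set of condition-action rules. The Python sketch below covers Colorkeeper only and is an illustration, not the rule system of Table 1. It assumes that "using" the key card held in hand means exchanging it with the Target, that the single letters T, U and P stand for exchange with Target, exchange with Up and Pass as in the sequences quoted earlier, and that no rule fires in any other case; the function name is hypothetical.

```python
S1, S2 = "\u00a7", "\u00a9"    # the two suit marks, printed as § and © in the text
KEY_CARD = "2" + S1            # Colorkeeper's key card
GOAL_CARD = "2" + S2           # the double key-card that ends the game

def colorkeeper_rule_of_thumb(hand, up):
    """Return 'P', 'T', 'U', or None when neither rule of thumb applies."""
    # Priority rule: holding the goal card, pass and wait for the partner.
    if hand == GOAL_CARD:
        return "P"
    # Rule A, first half: use the key card when it is already in hand
    # (assumed here to mean an exchange with the Target).
    if hand == KEY_CARD:
        return "T"
    # Rule A, second half: pick up the key card when it is face up.
    if up == KEY_CARD:
        return "U"
    return None                # the rules are silent on this board
```

The None branch makes the incompleteness of the rules explicit: for boards on which no key card is visible to Colorkeeper the sketch prescribes nothing.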
The problem now is to determine whether these rules are "rationally complete", i.e. whether they prescribe choice of the best strategy for every board configuration. To do so we take a more careful look at the comparative efficiency of the two strategies (next section). But even if we limit discussion to the example in section 2.6 it is evident that there are many boards for which the rules are conflicting and many boards to which they are not applicable. The set of rationally complete rules exists, and it is more complex than the set considered above (see Egidi 1994, pages 24-26). It can be straightforwardly shown, at least in the experiment in question, that very few players have discovered this set of rules (see section 4). As a consequence, the two rules we experimentally discovered can be considered as "rules of thumb", which prescribe satisficing but not optimal behaviors.
This finding has some important consequences.
First, the players are strongly limited in their ability to explore the consequences of their moves: in order to decide which strategy to apply, they mainly use the information directly visible, i.e. the cards in their hands or face up on the board. (This information is shown in the middle of Table 2.) We thus have a first explanation for the fact that the initial conditions of the game trigger one solution or its alternative: the great majority of players use the information available at the beginning of a hand to decide the first move, and this first move is crucial for the triggering of a strategy. They decide on the basis of the simple "rules of thumb" we cited, which are a clear example of boundedly rational behavior in the exploration of the problem.
Second, it is clear that the majority of players in the experiment are not strongly routinized because they are able to decide a different strategy according to the information on the board.
What we found out about routinization by inspecting the global behavior of players across time can be confirmed or disproved by analyzing the behaviors of individual pairs of players over time. Accordingly we now move to a discussion of the behavior of single pairs of players over time and try to establish whether there exists a typology of differently routinized pairs. If we take account of the fact that during the first runs all the pairs must learn at least one strategy, and that in any case numerous errors are produced by lack of coordination, we may expect different behaviors to range between the two extremes of purely routinized and perfectly rational behavior.
For any run in a tournament we may expect a pair of fully rational subjects to play the most efficient of the 442 and 422 strategies. We may thus compute the proportions of games in the tournament which can be played more efficiently by using one strategy, say 442. In our case the proportion of times that the 442 strategy proved most efficient is 35%. We may therefore expect to find the same distribution in the data if players are fully rational.
On the other hand, we have players who are locked in a sub-optimal situation, and who are expected to use one strategy only for every initial configuration[5]. Let us now inspect the empirical data (Figure 5): from the small distribution over the tails it is clear that only a few pairs solved the games of the tournament playing only one strategy. The majority of them played both 422 and 442, and with the proportions we predicted, i.e. 35% and 65%. In consequence we have confirmation that the majority of players adopt behaviors which are not fully routinized.
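The two extremes and the mixed case discussed above can be told apart from a pair's record of strategies. What follows is a minimal Python sketch, assuming that each solved hand has already been labeled "442" or "422"; the labeling step, the function names and the sample records are hypothetical and are not data from the experiment.

```python
def share_of_442(strategies):
    """Fraction of a pair's hands that were solved with the 442 strategy."""
    return strategies.count("442") / len(strategies)

def classify_pair(strategies):
    """Place a pair in one of the tails or in the middle of the distribution."""
    share = share_of_442(strategies)
    if share == 0.0:
        return "strongly routinized on 422"
    if share == 1.0:
        return "strongly routinized on 442"
    return "plays both strategies"

# Hypothetical records, one label per second level hand.
locked_pair = ["422"] * 20
mixed_pair = ["442"] * 7 + ["422"] * 13
```

Here share_of_442(mixed_pair) is 0.35, the proportion expected of a fully rational pair in the tournament described in the text, while locked_pair falls in the lower tail.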

Figure 5. Frequency distribution of 442 strategy.
3. Building cognitive traps
We have noted that the two available strategies 442 and 422 are by no means equally efficient, since the number of moves required to solve a game using each of them differs and depends on the initial configuration of the game. Considering all the possible combinations of the cards in the six positions, there are 120 different starting configurations of second level; 12 of them are neutral (both strategies are equally efficient), 54 are solvable more efficiently with the 442 strategy (because the number of moves required to achieve the goal using 442 is lower than the number of moves required by 422), and 54 are more efficiently solvable with 422. Table 3 provides some examples.

Table 3. Sample of second level configurations showing the strategy efficiency.
Δ is the difference between the number of moves required by the 442 and 422 strategies to achieve the goal. The key card distributions corresponding to the upper and lower values of Δ in Table 3 are antithetical: the two extreme situations occur when Colorkeeper has his key card 2§ in his hand and Numberkeeper has 2© (422's efficiency is maximum), and likewise when Numberkeeper has his key card 4© in his hand and Colorkeeper has 2© (442's efficiency is maximum). This suggests that if players are exposed more frequently to one of these extreme configurations, they will learn and memorize it more easily.
Table 3 shows how many moves are required to solve the game playing the 442 and 422 strategies for some of the 120 possible initial configurations. If we order all the 120 configurations by increasing distance, and complete Table 3, we obtain the result plotted in Figure 6. The abscissa consists of all the different second level starting configurations, codified with integers. The vertical axis denotes the number of moves required to achieve the goal by using the 442 or 422 strategy. The right part of Figure 6 shows the starting configurations which can be solved more efficiently by using 442. The left part of the figure depicts the opposite relation (422 is more efficient).
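The comparison underlying Table 3 and Figure 6 reduces to the sign of the difference in moves. The Python sketch below assumes the sign convention "moves with 442 minus moves with 422", which the text does not state explicitly; the function names are illustrative and the move counts used in the usage note are made up.

```python
def delta(moves_442, moves_422):
    # Assumed convention: positive when 422 needs fewer moves.
    return moves_442 - moves_422

def more_efficient(moves_442, moves_422):
    """Name the strategy that solves a configuration in fewer moves."""
    d = delta(moves_442, moves_422)
    if d > 0:
        return "422"
    if d < 0:
        return "442"
    return "neutral"

# Counts of second level configurations reported in the text.
REPORTED = {"neutral": 12, "442": 54, "422": 54}
TOTAL = sum(REPORTED.values())   # should equal the 120 configurations
```

For a hypothetical board needing 3 moves with 442 and 5 with 422, more_efficient(3, 5) returns "442"; the three reported counts add up to the 120 second level configurations.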

Figure 6. Landscape of strategy efficiency. The vertical axis consists of number of moves required by strategies 442 and 422 to solve the games.
Given the initial board of the game, what is the relation between the efficiency of the strategy and the complexity of the problem subspaces to be explored in order to discover the strategy? We assume that there is an inverse correlation between the two terms. In fact a strategy X is considered efficient, for a given starting configuration, if the number of moves required to solve the problem is small: but if, by using a strategy X, a hand can be played with a very small number of moves, it is reasonable to expect that a player exposed to this hand during the learning period should be able to discover it easily.
Vice versa, we expect that the higher the number of alternatives to be evaluated, the harder it will be to build a mental model of the situation and to understand how well this model fits with the actual situation. This assumption is complementary to the hypothesis of bounded rationality made in section 2.5 during discussion of Figure 4.
The working hypothesis of the experiment was therefore that by exposing a group of players to a set of preliminary runs characterized by starting configurations all easily solved by the same strategy, they would be "induced" to discover this solution more easily than the alternative one and to memorize it more deeply.
We selected a set of starting configurations of the board all of which required very few moves to be played with the same strategy 422, while the alternative strategy required a higher number of moves. Analogously we prepared a set of starting configurations which were easy to play using the 442 strategy. A tournament was organized as follows: in the first phase one group of players (called the 422 group) was exposed to the set of configurations that was more efficiently solved playing the 422 strategy. Another group (called the 442 group) was exposed to hands more easily played with the 442 strategy. Both groups were exposed to the same boards in the second part of the tournament, which consisted of 27 hands with randomly distributed cards. By comparing how the two groups exposed to the different training sets played the second part of the tournament, we can evaluate whether their behaviors were significantly different and therefore check if path-dependency occurred.[6]

4. Path dependent behaviors emerging from experimental data.
The first interesting question is whether differentiated training was able to create a difference between the two groups and whether subjects actually explored only a part of the strategy space. Figure 7 is analogous to Figure 5 and reports how many times each pair used the 442 strategy to solve the first 15 hands. The distributions of the two groups are markedly different, and within each group the majority of pairs played a single strategy in a routinized way. Moreover, this result confirms the hypothesis that the subjects look at the board to decide the strategy to play.

Figure 7. Frequency distribution of the 442 strategy, played in the first 15 runs of the tournament.
On the other hand, in order to judge whether or not rule-based behavior is a product of a path-dependent process of learning, analysis must focus on the last 27 games. In terms of the number of moves needed by the two groups to solve the 27 games, there is no difference: the average number of moves is around 165. In other words, there is no difference in the efficiency of the two groups. Nevertheless, the quality of the solutions played is clearly different, as we shall see. Table 4 and Figure 8 report the percentage of pairs in each group that solved the games by playing a 422 strategy. The path-dependency effect is extraordinarily strong. The different use of the 422 and 442 strategies in the two groups can be assessed by statistical tests. The Mann-Whitney U test calculated for the two independent experimental groups is statistically significant (U = 21.5; Z = -6.3375; p < 0.0001, two-tailed).
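For readers unfamiliar with the test, the following Python sketch shows how a Mann-Whitney U statistic and its normal approximation can be computed for two independent samples. It is a generic textbook computation, not a reconstruction of the analysis reported above: the unit of observation is not specified in the text, the approximation omits the correction for ties, and the sample values below are invented.

```python
import math

def mann_whitney_u(x, y):
    """Return (U, z, two-tailed p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    # U for sample x: count pairs in which x exceeds y, ties count one half.
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = n1 * n2 - u_x
    u = min(u_x, u_y)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # no tie correction
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))           # two-tailed p-value
    return u, z, p

# Invented example: number of hands (out of 27) solved with 422 by each pair.
group_a = [25, 27, 24, 26, 22, 27]
group_b = [3, 8, 5, 10, 2, 6]
u, z, p = mann_whitney_u(group_a, group_b)
```

With two completely separated samples such as these, U is 0 and z is negative, the same direction as the Z value reported in the text.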
As they were trained on two different sets of games, the subjects more frequently use the strategy that they learned first. Some pairs are able to discover and play the other strategy as well, but the difference between the two groups remains clear. In each group there is a large number of players who play the first learned strategy even when it is less efficient. This behavior implies that subjects stop exploring the sub-goaling space and that they cope with the new situations introduced in the second part of the tournament simply by using the strategies and the rules learned at the beginning.

Table 4. Percent of pairs within each group that played the two strategies in the last 27 runs of the tournament.

The path dependency effect is so strong that the reactions of players to almost all the starting configurations in the second part of the tournament are opposite across groups: for example, when 2© is in Up position, and Colorkeeper has his key card 2§ in hand (see hands 17 and 32 in Table 4), almost all the players in the 422 group use the 422 strategy, while in the 442 group the majority of pairs play the other strategy.
In order to discriminate more sharply among different behaviors, it is convenient to check how many pairs behaved in a strongly routinized way and how many discovered the other strategy as they played the last 27 games. Figure 9 depicts this variety and provides a way to measure the strength of the routinized behavior in the last 27 games[7].

Figure 8. Percent of pairs within each group that played 442 strategy in the last 27 runs of the tournament.
An interesting feature of the distribution is that, between the two extremes [...] a system of condition-action rules (see Table 1) enabling the players to solve the game. The same system of conditions was used to classify the moves actually performed by the sixty pairs of subjects to solve the last 27 hands and to form clusters of pairs, based on Ward's method. Table 5 shows the pairs grouped into five clusters (rescaled distance of agglomeration = 2.6)[8]. The first cluster (the 422 routinized) consists of 12 pairs; they all belong to the 422 experimental group and they play the 422 strategy in a routinized way. The second cluster (the 442 routinized) is made up of 15 pairs, all belonging to the 442 experimental group. Throughout the tournament these pairs learn only one strategy and play according to one single system of condition-action rules; they are identified in the tails of Figure 9 and their moves closely resemble the actions of routinized behavior reported in Table 1.

Cluster                           Cases   Av. 422%   Std Dev   Av. moves   Std Dev
422 routinized behaviors             12         92       7.4         6.1       0.4
442 routinized behaviors             15         14      12.2         6.0       0.4
rational behaviors                   11         42      15.8         6.2       0.7
helper behaviors                     12         60       8.6         6.6       1.4
not well coordinated behaviors        8         24      20.2         6.1       0.3
Table 5. Cluster analysis based on the moves performed to solve the last 27 hands.
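Ward's method merges, at each step, the two clusters whose union produces the smallest increase in within-cluster sum of squares. The Python sketch below is a naive illustration of that criterion only. It assumes each pair is summarized by a numeric feature vector; the single feature used in the example (share of hands played with 422) and the values are invented, whereas the original analysis classified the moves through the system of condition-action rules.

```python
def ward_clusters(points, k):
    """Agglomerate points into k clusters with Ward's minimum-variance criterion."""
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[i], clusters[j]
                na, nb = len(ma), len(mb)
                dist2 = sum((a - b) ** 2 for a, b in zip(ca, cb))
                # Increase in within-cluster sum of squares if i and j merge.
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (ma, ca), (mb, cb) = clusters[i], clusters[j]
        na, nb = len(ma), len(mb)
        centroid = [(na * a + nb * b) / (na + nb) for a, b in zip(ca, cb)]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append((ma + mb, centroid))
    return sorted(sorted(members) for members, _ in clusters)

# Invented one-dimensional features: share of hands played with 422.
pairs = [(0.90,), (0.95,), (0.10,), (0.15,), (0.50,)]
groups = ward_clusters(pairs, 3)
```

On this toy input the two pairs with high shares, the two with low shares and the intermediate pair end up in three separate clusters.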
During the second part of the tournament, the other pairs of subjects are able to explore the problem space further and to learn how to play a different strategy. As a consequence, the discovery of a new strategy dramatically raises a coordination problem, and the way pairs deal with it makes the difference among the next three clusters. The third cluster (rational behaviors) aggregates the pairs playing both strategies in a coordinated way. These subjects are able to change roles according to the strategy played and to use the information flowing from the partner's action to coordinate their actions. Many pairs trying to use the two strategies find it extremely difficult to coordinate, in particular when the partner's actions can be interpreted in many different ways (see section 2.5 above): information is sometimes ambiguous and players have great difficulty decoding it.
The fourth cluster (not well-coordinated behaviors) aggregates pairs in which only one subject discovered that there is another and different strategy with which to solve the games. Analysis of the condition-action rules of these pairs shows that one player is strongly routinized while his partner sometimes attempts to play the other strategy as well.
Finally, the fifth cluster (helper behaviors) consists of players behaving in a particular way: when one has the partner's key-card in his hand, he first offers the partner the card by playing Up, and only afterwards does he look for the complementary card (2) he needs. Even though this behavior is rather inefficient, it may be the best solution when a player realizes that his partner has not understood how the game works.
A striking difference can also be observed between the present and the previous experiment. The difference can be seen by comparing the distribution in the tails (Figure 9 and Figure 5). In the new experiment (Figure 9) a larger number of pairs behaved in a routinized way and solved the games by using only one strategy. This difference can be ascribed to the path dependent character of the learning process.

5. Final remarks.
Target the Two admits two alternative sub optimal strategies for playing all the (second level) games. The working hypothesis of the experiment was that by exposing a group of players to a set of preliminary runs characterized by starting configurations all easily solved by the same strategy, they would be "induced" to discover this solution more easily than the alternative to it, to memorize it more deeply, and to routinize their behaviors accordingly.
The experiment shows the onset of persistent differentiation in players' behavior. The group of players exposed to a set of configurations which led more easily to one strategy continued to use it more frequently in the second part of the tournament, and symmetric behavior arose in the other group. Moreover, in both groups there emerged a subset of players with strongly routinized behaviors, i.e. groups of players who, after the training phase, adopted one strategy once and for all, and insisted on using it even when hands could not be efficiently played with the strategy adopted.
We have experimental evidence that these routinized players were locked into a sub-optimal strategy insofar as they used the same set of rules of action even when these were inefficient, being unable or unwilling to find alternative rules of action. Furthermore, a variety of different types of behavior, all of them imperfectly rational and routinized, have been revealed by the application of cluster analysis methods.
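The notion of a strongly routinized pair can be made concrete with a minimal sketch. It assumes, purely for illustration, that each run played after the training phase has been labeled with the strategy the pair used. The labels "442" and "422" follow the names used in this paper for the two sets of rules of action; the function names and the sample sequences are hypothetical and are not the experimental data.

```python
from collections import Counter

def dominant_strategy_share(strategies_played):
    """Return the most frequent strategy label and the share of runs that used it."""
    if not strategies_played:
        raise ValueError("at least one run is required")
    label, count = Counter(strategies_played).most_common(1)[0]
    return label, count / len(strategies_played)

def is_strongly_routinized(strategies_played):
    """A pair counts as strongly routinized if every run used the same strategy."""
    return len(set(strategies_played)) == 1

# Hypothetical post-training sequences, one label per run (not experimental data).
pair_a = ["442"] * 6
pair_b = ["442", "422", "442", "442", "422", "442"]

print(is_strongly_routinized(pair_a), dominant_strategy_share(pair_a))
print(is_strongly_routinized(pair_b), dominant_strategy_share(pair_b))
```

Under this assumed encoding, a pair such as pair_a would fall in the tail of the distribution, while pair_b would be counted among the pairs that play both strategies.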
These findings give rise to new problems and suggest new directions for future research, of which we shall try to draw up a short list before concluding.
First: methodological issues related to tacit knowledge. One of the most relevant features of our results is that they are independent of the verbal explanations that players may provide of their actions. We simply interpret the "logic of action" of the players by observing behaviors at a fine-grained level: the elementary micro-behaviors (actions corresponding to different board conditions) of the subjects are observed, retrieved and compared with the environmental conditions. Therefore we can assume that behaviors are the consequence of mental models and, more stringently, that they are the outcome of the execution of a set of condition-action rules. This assumption is useful if we wish to extend our ability to explore the features of routinized behaviors. We would stress, however, that the differentiation we have shown is at the level of micro-behaviors. Therefore the results we have achieved (sub-optimality, path dependency, asymmetry in routinization) remain valid whatever model of thinking we assume.
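The idea of retrieving condition-action rules from observed micro-behaviors can be sketched as follows. This is only an illustration under stated assumptions: board conditions are encoded as hypothetical string labels, the action "Up" is taken from the description of the game while "Pass" is used here as a placeholder action name, and the observations are invented. For each condition, the sketch keeps the action most often taken and the share of observations in which it was taken, so that a share of 1.0 indicates a condition handled in a fully consistent way.

```python
from collections import Counter, defaultdict

def extract_rules(observations):
    """Map each observed condition to its most frequent action and that action's share."""
    counts = defaultdict(Counter)
    for condition, action in observations:
        counts[condition][action] += 1
    rules = {}
    for condition, actions in counts.items():
        action, n = actions.most_common(1)[0]
        rules[condition] = (action, n / sum(actions.values()))
    return rules

# Hypothetical (condition, action) observations for one player (not experimental data).
observations = [
    ("partner_key_card_in_hand", "Up"),
    ("partner_key_card_in_hand", "Up"),
    ("partner_key_card_in_hand", "Pass"),
    ("no_key_card_in_hand", "Pass"),
]

rules = extract_rules(observations)
for condition, (action, share) in rules.items():
    print(condition, "->", action, round(share, 2))
```

Nothing in this procedure relies on what the player says about his own play: the rules are read off behavior alone, which is the sense in which the results are independent of verbal explanations.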
This does not rule out an explanation of these behaviors which relies on a model of individual rationality and thought. We noted before that it is possible to use the experimental tests of cognitive psychology as complementary tools in verification of many aspects of the routinization process. We made some preliminary exploration in this direction by submitting players to a sequence of tests after the tournament which required them to verbalize their decisions in relation to some board configurations of Transform the Target.
The preliminary findings seem to indicate that, even when following a fixed set of behavioral rules, players neither have a clear symbolic representation of these rules nor store the complete set of them in long-term memory. They seem to confirm that players are only partially aware of the set of rules they apply, and that their knowledge of the "logic of the game" is incomplete.
Second: are fully rational behaviors more efficient than routinized ones? The previous results suggest that attention should be paid to the relationship between rationality and efficiency. We have shown that routinized behaviors imply that players' decisional links are disentangled: if they follow the same strategy for all runs, they must search for two complementary key-cards without interacting directly. The two goals can be achieved independently, and coordination is embodied in the strategy adopted via complementarity of the sub-goals. By contrast, fully rational behavior requires a dynamic coordination between the two players.
Therefore, if we measure the efficiency of a pair by counting the number of moves they make to play the game, we find pairs that play in a fully rational style but are inefficient, because their errors are amplified by the dynamic rules of coordination and their performances are seriously impaired. Hence players who try to adopt the rational strategy but are not fully familiar with it may be less efficient than purely routinized players.
Conversely, expert players who are familiar with the two sub-optimal strategies and use them without errors should play more efficiently than strongly routinized players. But this is not always true.
In fact, on the one hand, an expert player greatly reduces the computational effort required to play rationally by using the two sets of alternative rules of action (442 and 422) as routinized building blocks in his mental exploration of the problem. But, on the other hand, not all game configurations can be played efficiently by experts: as we have shown in section 2.4, there are game configurations in which the actions generated by one player cannot clearly be interpreted by the partner; information in these situations is ambiguous, and routinized behavior is more efficient precisely because it does not use ambiguous information and therefore reduces errors.
Summing up, there are situations in which behaving in a routinized way is more efficient than using all available information. This happens when players cannot decode all available information or, to use a different expression, when players cannot reduce the Competence Gap (Heiner 1983).[9]
Third: knowledge incompleteness and persistency of differentiation. The above considerations suggest that in real organizations micro-learning activity is the fundamental force at work, insofar as it either enables actions to be stabilized into routinized behaviors or gives rise to a search for new alternative routines. Situations in which an individual's activity is fully routinized, i.e. when all possible contingencies are covered by memorized action rules, are extreme cases, while in general memorization and routinization are partial and incomplete, and the learning process allows subjects to repair and complete the gaps in memory. This feature of human learning shows that different degrees of routinization can exist in human behaviors. In our experiments we observed that many players did not merely execute a set of condition-action rules. Their activity was supported by a persistent micro-learning activity. This activity enabled some players to escape a lock-in and move toward exploration of a new set of rules. It is intuitive that the more solid the sets of cooperative rules that the two players establish, the lower will be the incentive to escape the lock-in. In fact, a set of well consolidated rules can solve every situation, and even though there are configurations that can be solved more efficiently by using the alternative strategy, the effort of jointly discovering the alternative strategy may be higher than the "price" of using a well known strategy under unfavorable circumstances. We suggest that this element, the "sunk costs" of the search process and of the accumulated knowledge, may explain how it happens that path dependency gives rise to persistent differentiation in mental models and behavioral rules (Denzau and North 1994).
The elements which prevent or activate the learning process are therefore key to a deeper understanding of what characterizes the path-dependent creation of different "mental models", and their persistence over time.
References

Allais M. (1953) "Le comportement de l'homme rationnel devant le risque: Critique des postulats et axiomes de l'Ecole Américaine", Econometrica, 21, pages 503-546.
Anderson, J.R. (1983). The architecture of cognition, Cambridge, MA: Harvard University Press.
Arthur W. B. (1988) "Self-reinforcing mechanisms in Economics", in Anderson P.W., Arrow K.J. and Pines D. (editors) The Economy as an Evolving Complex System, Redwood City, CA: Addison Wesley, pages 9-31.
Arthur W. B. (1989) "Competing Technologies, Increasing Returns and Lock-in by Historical Events", The Economic Journal, vol. 99, n. 394, pages 116-131.
Arthur W. B. (1994) Increasing returns and path dependence in the economy. Ann Arbor: University of Michigan Press.
Arthur W.B., Ermoliev Y.M., Kaniovsky Y.M. (1983) "The Generalized Urn Problem and Its Applications", Kibernetika, 1, pages 49-56 (in Russian).
Arthur W.B., Ermoliev Y.M., Kaniovsky Y.M. (1987) "Path Dependent Processes and the Emergence of Macro-Structure", European Journal of Operational Research, 1, pages 294-303.
Cohen M. D. (1991) "Individual learning and organizational routine: Emerging connections", Organization Science, 2(1), pages 135-139.
Cohen M. D. and Bacdayan P. (1994) "Organizational Routines Are Stored as Procedural Memory: Evidence from a Laboratory Study", Organization Science, vol. 5, n. 4, pages 554-568.
Cohen M. D., Burkhart R., Dosi G., Egidi M., Marengo L., Warglien M., Winter S. (1995) Routines and Other Recurring Action Patterns of Organizations: Contemporary Research Issues. Santa Fe Institute Working Paper, Santa Fe, New Mexico.
Cutland N. J. (1988) Computability - An Introduction to Recursive Function Theory, Cambridge: Cambridge University Press.
David P. A. (1989) "The Future of Path-Dependent Equilibrium Economics", in Stanford Center for Economic Policy Research Discussion Paper Series: 155, August.
David P. A. (1988) "Path-Dependence: Putting The Past Into The Future of Economics", Stanford Institute for Mathematical Studies in the Social Sciences (Economic Series) Technical Report: 533, August.
Denzau A.T. and North D. C. (1994) "Shared Mental Models: Ideologies and Institutions", Kyklos, 47(1), pages 3-31.
Dosi G. and Kaniovski Y. (1994) "On 'Badly Behaved' Dynamics: Some Applications of Generalized Urn Schemes to Technological and Economic Change", Journal of Evolutionary Economics, 4(2), June, pages 93-123.
Egidi, M. (1994) "Routines, hierarchies of problems, procedural behavior: some evidence from experiments", IIASA Working Paper WP-94-58, July. To appear in K. Arrow et alii (editors) The Rational Foundations of Economic Behavior, MacMillan, in press.
Ericsson, K.A., Simon, H.A. (1984) Protocol Analysis, The MIT Press: Cambridge, MA.
Galor, O., Tsiddon, D. (1989) "Technological Breakthroughs and Development Traps", Brown University Department of Economics Working Paper: 89-31.
Heiner, R. A. (1983) "The Origin of Predictable Behavior", American Economic Review, 73(4), pages 560-595.
Hill B.M., Lane D., Sudderth W. (1980) "A Strong Law for Some Generalized Urn Processes", Annals of Probability, 8, pages 214-226.
Holland, J. H. (1975) Adaptation in natural and artificial systems, Ann Arbor: University of Michigan Press.
Holland, J. H., Holyoak, K.J., Nisbett, R.E., Thagard, P.R. (1988) Induction - Processes of Inference, Learning, and Discovery, Cambridge (Mass): MIT Press.
Johnson-Laird, P.N. (1983) Mental Models, Harvard University Press.
Kahneman, D., Tversky, A. (1986) "Rational Choice and the Framing of Decisions", in Hogarth R. M., Reder M. W. (editors) Rational Choice - The Contrast between Economics and Psychology, Chicago: The University of Chicago Press.
Kauffman, S.A. (1993) The origins of order: self-organization and selection in evolution, Oxford University Press.
Kauffman, S. A. (1988) "The evolution of economic webs", in Anderson P.W., Arrow K.J. and Pines D. (editors) The Economy as an Evolving Complex System, Santa Fe Institute Studies in the Sciences of Complexity, Vol. 5, Reading, Mass.: Addison Wesley.
Kauffman, S. A. (1989) "Adaptation on Rugged Fitness Landscapes" in Lectures in the Sciences of Complexity, edited by Stein D. L. Santa Fe Institute Studies in the Sciences of Complexity, Vol I, pp.619-712. Redwood City, CA: Addison Wesley.
Kauffman, S. A., Johnsen, S. (1992) "Co-evolution to the Edge of Chaos: Coupled Fitness Landscapes, Poised States, and Co-Evolutionary Avalanches", in Langton C.G., Taylor C., Farmer J. D., Rasmussen S. (editors) Artificial Life II, Redwood City, CA: Addison Wesley.
Laird, J.E., Newell, A., Rosenbloom, P.S. (1987) "Soar: An architecture for general intelligence", Artificial Intelligence, 33, pages 1-64.
Levinthal, D., (1994), "Adaptation in Rugged Landscapes", mimeo.
Levitt, B., March, J.G. (1988) "Organizational Learning", Annual Review of Sociology, 14, pages 319-340.
Luchins, A.S. (1942) "Mechanization in Problem-Solving", Psychological Monographs, 54, pages 1-95.
Luchins, A.S., Luchins, E.H. (1950) "New Experimental Attempts in Preventing Mechanization in Problem-Solving", The Journal of General Psychology, 42, pages 279-291.
March, J. G. (1990) "Exploration and exploitation in Organizational Learning", Organization Science, 2, pages 71-87.
March, J. G. (1981) "Footnotes to organizational change", Administrative Science Quarterly, 26, pages 563-577.
March, J. G., Simon H. A. (1958) Organizations, (1993 2nd edition) New York: John Wiley.
Minsky, M. (1967) Computation. Finite and Infinite Machines, Englewood Cliffs, Prentice-Hall.
Nelson, R. R., Winter, S. (1974) "Neoclassical vs. Evolutionary Theories of Economic Growth: Critique and Prospectus", The Economic Journal n. 4 (336) pp. 886-905.
Nelson, R. R., Winter, S. (1982) An Evolutionary Theory of Economic Change, Cambridge (Mass): The Belknap Press of Harvard University Press.
Newell, A. (1990) Unified Theories of Cognition, Cambridge (Mass): Harvard University Press.
Newell A., Simon H.A. (1972) Human Problem Solving, Englewood Cliffs, N.J. : Prentice Hall.
Nilsson, N. J. (1971) Problem Solving Methods in Artificial Intelligence, New York: McGraw Hill.
Nilsson, N. J. (1980) Principles of Artificial Intelligence, Palo Alto (Calif.): Tioga.
Nisbett, R. Wilson, T.D. (1977) "Telling More than We Can Know: Verbal Reports on Mental Processes", Psychological Review, 84, pages 231-259.
North, D. C. (1991) "Towards a Theory of Institutional Change", Quarterly Review of Economics and Business, 31(4), pages 3-11.
Polanyi, M. (1958) Personal Knowledge: Towards a Post-Critical Philosophy, London: Routledge and Kegan Paul.
Rosenberg, N. (1994) Exploring the Black Box: Technology, Economics, and History, Cambridge: Cambridge University Press.
Simon, H. A. (1971) "Theories of Bounded Rationality", in McGuire B. and Radner R. (editors) Decision and Organisation, Amsterdam: North-Holland.
Singley, M. K., Anderson, J.R. (1989) The Transfer of Cognitive Skill, Cambridge (Mass): Harvard University Press.
Weisberg, R. (1980) Memory: Thought and Behavior, New York: Oxford University Press.
Wright, S. (1932) "The roles of mutation, inbreeding, crossbreeding and selection in evolution", Proceedings of the Sixth International Congress on Genetics, 1, 356.
