However, the obstacle was the latch on the door. The cat made random movements inside the box, indicating trial-and-error behavior: biting the box, scratching it, walking around, pulling, jumping, and so on, in its attempts to get out and reach the food.
In the course of these movements, the latch was manipulated accidentally and the cat came out to get the food. Over a series of successive trials, the cat took less and less time, committed fewer errors, and was eventually able to manipulate the latch as soon as it was put in the box; it had learnt the art of opening the door.
An analysis of the cat's learning behavior in the box shows that, besides trial and error, the principles of goal, motivation, exploration and reinforcement are involved in the process of learning by trial and error.
E. L. Thorndike had a powerful impact on both psychology and education. He experimented on a variety of animals, including cats, fish, chicks and monkeys. His classic experiment used a hungry cat as the subject, a piece of fish as the reward, and a puzzle box as the instrument for studying trial-and-error learning.
1. Multiple Response: In any given situation, the organism will respond in a variety of ways if the first response does not immediately lead to a more satisfying state of affairs. Problem solving proceeds through trial and error: the learner keeps trying multiple responses until the problem is actually solved.
Thorndike also established that learning is the result of a trial-and-error process, one that takes time but requires no conscious thought. He studied and developed our initial concepts of operant conditioning and reinforcement, and of how various types of reinforcement influence learning.
When the learner approaches the desired position in stages, and each stage is free of the consequences of fear and anxiety, fear of the desired position gradually disappears. It is therefore possible to use this method for procedures of student training. Another theory proposed in this subset is Thorndike's, which describes learning as selecting a response from the set of responses available to the organism and connecting that response to the stimulus situation. Thorndike's learning method was therefore named learning through trial and error. In summary, Thorndike stated that in a learning or problem-solving situation, the learner responds repeatedly until one of the responses proves appropriate (or solves the problem). According to Thorndike, this response brings the learner to a satisfying state of affairs; it is learned, and in similar learning situations it is repeated by the learner again.
In nursing education, it is possible to teach skills by having students perform the procedures on mannequins. Through this harmless trial-and-error method, the students gain the desired skills. Satisfying results strengthen the learned response, while unpleasant results cause the students to seek alternative answers through trial and error until they eventually reach the correct answer for each question. Such satisfying results may be the observation of satisfied clients, or applause from classmates or the teacher.
Trial-and-error learning is a universal strategy for establishing which actions are beneficial or harmful in new environments. However, learning stimulus-response associations solely via trial-and-error is often suboptimal, as in many settings dependencies among stimuli and responses can be exploited to increase learning efficiency. Previous studies have shown that in settings featuring such dependencies, humans typically engage high-level cognitive processes and employ advanced learning strategies to improve their learning efficiency. Here we analyze in detail the initial learning phase of a sample of human subjects (N = 85) performing a trial-and-error learning task with deterministic feedback and hidden stimulus-response dependencies. Using computational modeling, we find that the standard Q-learning model cannot sufficiently explain human learning strategies in this setting. Instead, newly introduced deterministic response models, which are theoretically optimal and transform stimulus sequences unambiguously into response sequences, provide the best explanation for 50.6% of the subjects. Most of the remaining subjects either show a tendency towards generic optimal learning (21.2%) or at least partially exploit stimulus-response dependencies (22.3%), while a few subjects (5.9%) show no clear preference for any of the employed models. After the initial learning phase, asymptotic learning performance during the subsequent practice phase is best explained by the standard Q-learning model. Our results show that human learning strategies in the presented trial-and-error learning task go beyond merely associating stimuli and responses via incremental reinforcement. Specifically during initial learning, high-level cognitive processes support sophisticated learning strategies that increase learning efficiency while keeping memory demands and computational efforts bounded. 
The good asymptotic fit of the Q-learning model indicates that these cognitive processes are successively replaced by the formation of stimulus-response associations over the course of learning.
Humans and other animals can learn how to respond to novel stimuli by incrementally strengthening or weakening associations between stimuli and responses based on feedback. Q-learning, which is based on a delta learning rule, has been established as the standard computational model for associative learning. By comparing the Q-learning model with alternative computational models, we investigate human learning strategies in a simple trial-and-error learning task, where stimuli mapped onto responses one-to-one and correct responses were invariably rewarded. We find that humans can learn more efficiently than predicted by the Q-learning model in this setting. Specifically, we show that some subjects systematically went through the response options and made inferences across stimuli to improve their learning speed and avoid unnecessary errors during the initial learning phase. However, after the initial learning phase, the Q-learning model provided a better prediction than the competing models. We conclude that human learning behavior in our experimental task can be best explained as a mixture of sophisticated learning strategies involving high-level cognitive processes at the beginning of learning, and associative learning facilitating further performance improvements at later learning stages.
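The delta-rule Q-learning model described above can be sketched as follows. This is a minimal illustrative simulation, not the paper's exact implementation: the parameter names (`alpha` for the learning rate, `tau` for the softmax temperature), the 4-stimulus/4-response task size, and all numeric values are assumptions chosen only to show the mechanics of incrementally strengthening stimulus-response associations from binary feedback.

```python
import math
import random

def softmax(qs, tau):
    """Softmax response-selection probabilities with temperature tau."""
    exps = [math.exp(q / tau) for q in qs]
    z = sum(exps)
    return [e / z for e in exps]

def simulate_q_learning(mapping, alpha=0.3, tau=0.3, n_trials=200, seed=1):
    """Delta-rule Q-learning on a deterministic one-to-one S-R task.

    mapping: dict stimulus -> index of the correct response.
    Returns the fraction of correct responses over the last 50 trials,
    i.e. an estimate of asymptotic performance.
    """
    rng = random.Random(seed)
    stimuli = list(mapping)
    n_resp = len(set(mapping.values()))
    q = {s: [0.0] * n_resp for s in stimuli}  # one Q-value per S-R pair
    correct_late = 0
    for t in range(n_trials):
        s = rng.choice(stimuli)
        probs = softmax(q[s], tau)
        r = rng.choices(range(n_resp), weights=probs)[0]
        reward = 1.0 if r == mapping[s] else 0.0
        # delta rule: move the chosen Q-value toward the observed reward
        q[s][r] += alpha * (reward - q[s][r])
        if t >= n_trials - 50:
            correct_late += int(reward == 1.0)
    return correct_late / 50

acc = simulate_q_learning({"A": 0, "B": 1, "C": 2, "D": 3})
```

Because every update only nudges one Q-value toward the feedback just received, the model learns each stimulus-response association independently; it cannot exploit the one-to-one structure of the mapping, which is exactly the inefficiency the alternative models address.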
Citation: Mohr H, Zwosta K, Markovic D, Bitzer S, Wolfensteller U, Ruge H (2018) Deterministic response strategies in a trial-and-error learning task. PLoS Comput Biol 14(11): e1006621.
The trial-and-error learning task was performed by N = 85 subjects. For each subject, it was tested in descending order (see main text for details) which model provided the best fit for the initial learning phase. For 43 subjects (50.6%), the DRP models outperformed the FOP, BP and Q-learning models, with 36 subjects following the dfkl response pattern and 7 subjects following the lkfd pattern. Of the remaining subjects, 18 subjects (21.2%) showed a tendency towards generic optimal learning, while 19 subjects (22.3%) partially exploited stimulus-response dependencies. Q-learning was never significantly better than FOP or BP on the initial learning phase. Five subjects (5.9%) could not be assigned to a model-specific subsample.
A: Learning curves of the initial learning phase from trial 1 to 17. For the DRP subsample, the DRP, FOP and BP models provided a markedly better fit to the human learning curve than the Q-learning model. The DRP models improved the fit compared to the FOP and BP models for the first few trials. Within the FOP subsample, again both FOP and BP outperformed Q-learning, with the FOP model providing a marginally better fit than the BP model. For the BP subsample, the FOP and BP learning curves were indistinguishable but again fitted markedly better than Q-learning. Vertical lines indicate standard errors of the mean. B: Learning curves of the initial learning phase from trial 1 to 32. These data are shown for the sake of completeness in addition to the truncated learning curves shown in A. As the initial learning phase ended in 75% of the blocks before trial 18, estimates became increasingly unreliable after trial 17, see also S5 Fig. C: Learning curves including trials of the initial learning phase and the subsequent practice phase. While the DRP, FOP and BP models became stationary when the initial learning phase ended, the Q-learning model further strengthened its associations between stimuli and responses, resulting in the best asymptotic fit on all three subsamples. Note that maximum likelihood estimates of the response selection noise parameter τ were consistently larger than zero, thus the asymptotic performance of the DRP, FOP and BP models was below 100%.
Using computational models to analyze the initial learning phase of a trial-and-error learning task with deterministic feedback and hidden stimulus-response dependencies, we found that about 50% of the subjects employed deterministic response patterns to increase learning efficiency. Most of the remaining subjects either showed a tendency towards generic optimal learning, or performed better than predicted by pure associative learning by partially exploiting stimulus-response dependencies. A detailed analysis of specific error types showed that only the DRP model could generate the variability found in the human data, whereas the other three models were unable to reproduce this variability.
The novel contribution of the results presented here is that they demonstrate that human learning strategies can be characterized beyond a general trend towards the optimal learning strategy. For 50% of the subjects, the initial learning phase was better explained by the DRP models than by FOP. Thus, these subjects did not select responses arbitrarily from the set of theoretically optimal responses, as predicted by FOP, but instead implemented a response selection procedure that determined a unique response in every trial. On the presented trial-and-error learning task with deterministic feedback, this was a highly adaptive learning strategy: Although equivalent to FOP from a theoretical point of view, DRPs were more efficient from the human perspective as they considerably reduced working memory and computational demands. Indeed, using DRPs, only the correct or designated response for each stimulus had to be maintained in working memory, whereas FOP required tracking all 24 S-R mappings. Computational costs were also significantly reduced, as the DRPs only required counting up to the next free response in case of negative feedback or storing the correct response in case of positive feedback, whereas FOP required computing response probabilities by averaging across all S-R mappings consistent with the S-R-O history. Moreover, subjects could choose their preferred response order, which was arbitrary from a theoretical point of view, but not from the human perspective, as evidenced by the strongly non-uniform distribution across response orders (S1 Fig).
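The DRP mechanics described above (store the correct response on positive feedback, count up to the next free response on negative feedback, follow a fixed preferred response order) can be sketched as a small agent. This is an illustrative reconstruction under stated assumptions, not the authors' code; the function and variable names are hypothetical, and "free" is taken to mean a response not yet confirmed as correct for another stimulus.

```python
def drp_agent(preferred_order):
    """Deterministic response pattern (DRP) agent.

    preferred_order: list of response indices in the subject's
    preferred order, e.g. [0, 1, 2, 3].
    Returns (respond, feedback) closures. Per stimulus, only a
    pointer or the confirmed correct response is kept in memory.
    """
    taken = set()    # responses already confirmed correct for some stimulus
    known = {}       # stimulus -> confirmed correct response
    pointer = {}     # stimulus -> index of next candidate in preferred_order

    def respond(stimulus):
        if stimulus in known:
            return known[stimulus]
        i = pointer.get(stimulus, 0)
        # skip responses already assigned to other stimuli ("next free")
        while preferred_order[i] in taken:
            i += 1
        pointer[stimulus] = i
        return preferred_order[i]

    def feedback(stimulus, response, correct):
        if correct:
            known[stimulus] = response   # store and always repeat
            taken.add(response)
        else:
            pointer[stimulus] += 1       # count up to the next candidate

    return respond, feedback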