A. E. Henninger, A. J. Gonzalez, M. Georgiopoulos, and R. F. DeMara, "The Limitations of Static Performance Metrics for Dynamic Tasks Learned Through Observation," in Proceedings of the Tenth Conference on Computer Generated Forces and Behavioral Representation (CGF-BR'01), pp. 147-154, Norfolk, Virginia, U.S.A., May 14-17, 2001.

Abstract: A recent report developed by the National Research Council (NRC) for the Defense Modeling and Simulation Office (DMSO) encourages the use of real-world, war-gaming, and laboratory data in support of the development and validation of human behavioral models for military simulations. This paper reviews existing validation metrics used in human behavioral modeling exercises and discusses their limitations. Common to the metrics examined is that they have been applied to a specific type of human behavioral model: a low-level, reactive skill model. These models, in turn, share the fact that they were developed through some form of learning by observation. Thus, the scope of this paper is constrained to reviewing fidelity metrics of low-level, reactive skill models created from observational data. To this end, it is assumed that model fidelity is correlated with similarity to true human performance. The paper is organized around two metrics found in the literature on skill models developed through learning by observation, and it gives detailed illustrations of where these metrics are deficient. It is anticipated that by explaining where these metrics fall short, improved metrics can be developed. Concepts for improving these metrics and candidate metrics are presented, but a completely functional alternative has not yet been investigated.