Evaluation of the biceps tendon reflex in dogs
- 1. Small Animal Clinic (WE20), Department of Veterinary Medicine, Freie Universität Berlin, Germany
- 2. Southern Counties Veterinary Specialists, Unit 6 Forest Corner Farm, UK
- 3. Department of Clinical Science and Services, Royal Veterinary College, University of London, UK
Abstract
The biceps tendon reflex (BTR) of thirty-two dogs with a median age of 5 (0.5-15) years and a median weight of 17.5 (5.8-57) kg was assessed by two examiners. The examinations were videotaped and evaluated by 12 observers. The observers were divided into three groups depending on level of expertise (neurologists, veterinary surgeons and students). Each group evaluated the reflex-presence and reflex-briskness. Kappa analysis and the intraclass correlation coefficient (ICC) were applied for analysis of interobserver-agreement.
Logistic regression analysis was used to investigate the influence of sex, age, weight, fur length and examiner on the interobserver-agreement. The interobserver-agreement was highest for the neurologist group and lowest for the student group. Neither sex, weight, age, fur length nor the examiner influenced the interobserver-agreement. The level of expertise is an influencing factor on interobserver-agreement of canine BTR evaluation. In healthy dogs the BTR can be reliably assessed by veterinary neurologists. The clinical significance is still unknown as the BTR was only assessed in healthy dogs.
Keywords
• Biceps tendon reflex
• Dog
• Interobserver-agreement
Citation
Giebels F, Kohn B, Shihab N, Volk HA, Loderstedt S (2014) Evaluation of the biceps tendon reflex in dogs. J Vet Med Res 1(3): 1013
ABBREVIATIONS
ANOVA: Analysis of Variance; BTR: Biceps Tendon Reflex; CI (95%): 95% Confidence Interval; ICC: Intraclass Correlation Coefficient; KC: Cohen's Kappa; KF: Fleiss' Kappa; KW: Weighted Kappa; SE: Standard Error; SEM: Standard Error of the Mean
INTRODUCTION
In veterinary literature, authors rarely report the biceps tendon reflex (BTR) when describing the neurological examination findings in dogs. The BTR has been described as inconsistent and challenging to elicit [1-4]. In human medicine, on the other hand, the BTR is thought to be reliable and is commonly used for the assessment of the integrity of the cervical segments C5-C6 and the brachial plexus and also for the diagnosis and follow-up of cervical myelopathies [5,6]. In dogs, the musculocutaneous nerve, the function of which is tested by this reflex, originates from the spinal cord segments C6-C8 and innervates the canine biceps brachii muscle, a flexor of the elbow. The reflex response involves an elbow flexion and/or movement over the biceps brachii muscle [3].
In general, segmental spinal cord reflexes can be influenced by a number of factors [4,7] and their evaluation is highly subjective [8,9]. Their evaluation remains, however, an integral part of the neurological examination for determining the neuroanatomical localization of a lesion and therefore needs to be comparable between different observers [10]. Several studies in human medicine have aimed to identify such influencing factors and to make the reflex-activity more objective by standardizing both the examination and the evaluation of the reflex-activity [8,11-13].
The aims of this study were: (1) evaluation of the interobserver-agreement of BTR assessment depending on the observer's level of expertise and (2) detection of influencing factors for the BTR response.
MATERIALS AND METHODS
Thirty-two dogs of different breeds with a median age of 5 (0.5-15) years and a median weight of 17.5 (5.8-57) kg were included. There were twenty female and twelve male dogs. Eleven (34.4%) dogs were mixed breed (Table 1). Inclusion criteria were a normal clinical, orthopedic and neurological examination and no history of neurological disorders. Examinations were performed by two of the authors (FG, SL) and videotaped under standardised conditions: same room, fixed camera position, lateral recumbency of the dog, same reflex hammer. Each dog was examined by one or both examiners within one hour; each examination took about 3 minutes. The owner was watching the dog's head during the examination. Dogs were anonymised by randomised numbering. The examination footage was evaluated by 12 observers. Observers were divided into three groups of four depending on their level of expertise: veterinary neurologists (Group 1), veterinary surgeons without special affinity to neurology and with three to four years of work experience (Group 2) and final-year veterinary students (Group 3). Both examiners were included in Group 1 and evaluated the anonymised videos in the same manner as the other observers. Each observer evaluated whether the reflex was present or absent (0=absent, 1=present) and scored the level of reflex briskness using a scoring scale (0=absent; 1=reduced; 2=normal; 3=exaggerated; 4=clonus) [14]. One dog had an amputated right forelimb and two examination videos had to be excluded due to poor quality, resulting in sixty-one examined thoracic limbs. Forty-two of these thoracic limbs were examined by both examiners and nineteen by only one examiner, so that altogether one hundred and three examination sequences were observed.
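The count of examination sequences can be retraced with a few lines of arithmetic. The sketch below only restates the numbers given above; the variable names are illustrative, and it assumes that each of the two excluded videos removed one limb from the data set.

```python
# Bookkeeping of the examination sequences reported in the text.
dogs = 32
limbs_possible = dogs * 2      # two thoracic limbs per dog
amputated_limbs = 1            # one dog with an amputated right forelimb
excluded_videos = 2            # excluded because of poor video quality

# Assumption: each excluded video corresponds to exactly one limb.
limbs_examined = limbs_possible - amputated_limbs - excluded_videos

limbs_both_examiners = 42      # examined by both examiners (two sequences each)
limbs_one_examiner = 19        # examined by one examiner (one sequence each)
sequences = 2 * limbs_both_examiners + limbs_one_examiner

print(limbs_examined, sequences)
```

Under these assumptions the two figures agree with the sixty-one limbs and one hundred and three sequences stated in the text.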
Data analysis
For statistical data analysis SigmaPlot 11.1 (Systat Software Inc.) and SPSS Statistics 22.0 (IBM) were used. The results were tabulated in a ‘table of agreement’ [15] according to the level of agreement within each group. Interobserver-agreement between the observers and between the groups was analysed using Kappa analysis. Cohen's Kappa (KC) and weighted Kappa (KW) values were calculated for each pair of observers, including the pair of examiners, within each group. According to Landis and Koch (1977) the strength of agreement was designated as ‘poor’ (K<0.0), ‘slight’ (0.0≤K≤0.2), ‘fair’ (0.21≤K≤0.4), ‘moderate’ (0.41≤K≤0.6), ‘substantial’ (0.61≤K≤0.8), and ‘near perfect to perfect’ (0.81≤K≤1.0). ANOVA was used to test for significant differences in the mean KC, mean KW and Fleiss' Kappa (KF) values between the three groups of observers. For every K value the standard error (SE) and the 95% confidence interval (CI 95%) were calculated. KF does not take into account the degree of discrepancy between the observers, so the intraclass correlation coefficient (ICC; two-way random, absolute agreement definition) was additionally calculated for the reflex-briskness for each group. The ICCs of the groups were compared with each other using one-way ANOVA under estimation of the standard error of the mean (SEM). The p-values were Holm-Šidak adjusted. Furthermore, data from the reflex-presence evaluation were used for regression analysis. The dogs were subdivided into groups depending on sex, weight, age, fur length and the examining person (Table 2). The interobserver-agreement was set as the dependent variable; all others were covariates. P<0.05 was considered significant.
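As an illustration of the pairwise Kappa statistics and the Landis and Koch designation, a minimal sketch follows. It is not the SigmaPlot or SPSS procedure used in the study. The function names and the ratings are hypothetical, linear weights are assumed for the weighted Kappa (the weighting scheme is not stated in the text), and values are rounded to two decimals before banding so that a value such as 0.401 falls into the ‘fair’ band, which is also an assumption.

```python
def cohen_kappa(r1, r2, categories, weighted=False):
    """Cohen's kappa for two raters; linear weights if weighted=True (assumed scheme)."""
    n = len(r1)
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    # Joint counts of (rater 1 category, rater 2 category).
    table = [[0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        table[idx[a]][idx[b]] += 1
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    def weight(i, j):
        if weighted:
            return 1.0 - abs(i - j) / (k - 1)
        return 1.0 if i == j else 0.0

    p_obs = sum(weight(i, j) * table[i][j] / n for i in range(k) for j in range(k))
    p_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return (p_obs - p_exp) / (1.0 - p_exp)  # undefined when p_exp equals 1


def landis_koch(kappa):
    """Strength-of-agreement label following the bands quoted in the text."""
    if kappa < 0.0:
        return "poor"
    value = round(kappa, 2)  # assumption: band limits are read at two decimals
    if value <= 0.2:
        return "slight"
    if value <= 0.4:
        return "fair"
    if value <= 0.6:
        return "moderate"
    if value <= 0.8:
        return "substantial"
    return "near perfect to perfect"


# Hypothetical reflex-presence ratings (0=absent, 1=present) of two observers.
obs_a = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1]
obs_b = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
kc = cohen_kappa(obs_a, obs_b, categories=[0, 1])
print(round(kc, 3), landis_koch(kc))
```

For the five-point briskness scale the same function can be called with `categories=[0, 1, 2, 3, 4]` and `weighted=True`, so that a one-point discrepancy is penalised less than a two-point discrepancy.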
Number | Sex | Breed | Age (years) | Weight (kg) |
1 | f a | Mixed | 0.5 | 16 |
2 | f | Bavarian Scenthound | 2 | 19 |
3 | f | Marshall Beagle | 4 | 10 |
4 | f | Mixed | 3 | 8 |
5 | f | Mixed | 2 | 16 |
6 | f | Dalmatian | 6 | 26.5 |
7 | f | Giant Schnauzer | 2 | 18 |
8 | f | Mixed | 6 | 17 |
9 | m b | Golden Retriever | 9 | 39 |
10 | m | Mixed | 12 | 18.5 |
11 | m | Mixed | 7 | 30 |
12 | f | Mixed | 2 | 27.5 |
13 | f | Labrador | 2.5 | 25.5 |
14 | f | Mixed | 1.25 | 10 |
15 | f | Mixed | 8.5 | 23 |
16 | f | Labradoodle | 4.5 | 23 |
17 | f | Belgian Malinois | 7.5 | 24.7 |
18 | f | French Bulldog | 4.5 | 10.6 |
19 | m | Greater Swiss Mountain Dog | 7 | 57 |
20 | f | Mixed | 6 | 11.8 |
21 | m | Labrador | 1.5 | 28 |
22 | m | Australian Shepherd | 1 | 12 |
23 | m | Australian Shepherd | 2.5 | 15 |
24 | f | Australian Cattle Dog | 5.5 | 20 |
25 | m | Boston Terrier | 14 | 6.4 |
26 | m | Bernese Mountain Dog | 6.75 | 40.5 |
27 | f | Dachshund | 12.2 | 14.75 |
28 | m | German Shorthaired Pointer | 2.5 | 40 |
29 | m | Wire-haired Dachshund | 12 | 5.8 |
30 | m | Yorkshire Terrier | 4.5 | 5.8 |
31 | f | Australian Terrier | 11.25 | 10 |
32 | f | Mixed | 6 | 8.4 |
RESULTS
Analysis of reliability
a) Reflex-presence: The mean KC (0.706) and KF (0.753) values are significantly highest (‘substantial’) for Group 1, where the highest KC value (0.852) is ‘near perfect to perfect’. Group 2 shows the second highest (‘fair’) mean KC (0.401) and KF (0.380) values, and Group 3 has the lowest interobserver-agreement in the reflex-presence evaluation (‘fair’) with a mean KC value of 0.313 and a KF value of 0.304. The KC value for the pair of examiners was 0.658 (‘substantial’) (Figure 1).
b) Reflex-briskness: The number of complete-agreement evaluations was highest for Group 1 (65.1%; n=67) and lowest for Group 3 (24.3%; n=25). Group 2 most often shows a discrepancy of one point among the observers; Group 3 has the highest number of two-point-discrepancy decisions. The total number of non-agreement decisions (sum of partial (dis)agreement and complete disagreement decisions) is 36/103 (35%) for Group 1, 66/103 (64.1%) for Group 2 and 79/103 (76.7%) for Group 3 (Figure 2).
The mean KW (0.542) and KF (0.331) values are highest for Group 1. For Groups 1 and 2 the mean KW is within the ‘moderate’ level, whereas the KF values for both groups reach the ‘fair’ level. Group 3 shows the significantly lowest interobserver-agreement among the groups, with a ‘fair’ mean KW (0.286) and a ‘slight’ KF (0.17). No significant difference could be shown between Groups 1 and 2. The KW value for the pair of examiners was ‘moderate’ (0.445), which is the lowest KW value within Group 1 (Figure 3).
The ICC of Group 3 (0.321) is significantly lower than that of Groups 1 and 2. Group 1 has the highest ICC (0.557). The difference compared to Group 2 (0.483) is not significant (Figure 3).
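For readers unfamiliar with the ICC, the following sketch computes a two-way random, absolute-agreement ICC from a subjects-by-raters table of scores. It is a hedged illustration rather than the SPSS routine used here: the single-measures form is assumed (the text does not state whether single or average measures were reported), the function name is hypothetical, and the ratings are illustrative values that do not come from the study.

```python
def icc_two_way_random_absolute(ratings):
    """Single-measures ICC, two-way random effects, absolute agreement.

    ratings: list of rows, one row per subject, one score per rater.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way layout without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative scores of four raters for six subjects (not study data).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_example = icc_two_way_random_absolute(example)
print(round(icc_example, 2))
```

Because the absolute-agreement definition includes the between-rater mean square in the denominator, systematic differences between raters lower the coefficient, which is why it complements KF for an ordinal briskness scale.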
Correlation analysis
None of the parameters sex, age, weight, fur length or examiner showed a significant influence on the interobserver-agreement of the reflex-presence.
DISCUSSION
Our data show that the interobserver-agreement of the BTR assessment in dogs increases with the observer’s level of expertise, with experienced observers having a high level of agreement. None of the examined parameters influenced the interobserver-agreement in any of the three groups.
Several studies examined the reliability of different spinal reflexes in veterinary medicine [16-20]. De Lahunta and Glass (2009) for veterinary and Litvan et al. (1996) for human medicine stated that the patellar reflex has the highest reliability. De Lahunta and Glass (2009) stated their doubts regarding the reliability of tendon reflexes in the thoracic limbs. However, this statement is based on personal experience rather than on a systematic investigation of the interobserver-agreement. To the authors’ knowledge the present study is the first prospective analysis on interobserver-agreement of a tendon reflex in veterinary medicine.
In general, three factors can influence the interobserver-agreement: the examining person, the examined subject and the examination itself [21]. Perfect agreement is highly unlikely in clinical studies [22] and many medical studies using Kappa analysis for reflex evaluation report a ‘moderate’ interobserver-agreement [10, 22-27]. Our data show that the canine BTR can be assessed reliably by experienced observers. However, comparing the results of different studies is difficult since they differ in methodology, examiner's expertise, task characteristics or the applied scoring scale [25]. Besides the naturally occurring interobserver variability, other factors have been described in both human and veterinary medicine as influencing the reflex-activity, such as intramuscular temperature, muscle tone, positioning of the subject, patient's age, body weight, and the instruments used [7,28,29]. Increased muscle tone, fear or anxiety [30,31] and a stress-induced hyperthermia of the patient [29] might influence the neurological examination of the veterinary patient. Several studies in human medicine attempted to quantify the reflex response more objectively and in this way make it more reliable and comparable between observers [32,33]. Most of the studies that identified influencing factors of the reflex-activity used electromyography (EMG) [7,15,20,31]. Due to its high vulnerability to confounding factors such as the activity of adjacent muscles [9,31,34] or the alertness [29] of the examined subject, EMG needs to be performed under general anesthesia in veterinary medicine [34] and thus is not performed routinely during the neurological exam. The aim of this study was to test the reliability of the BTR in a clinical setting to establish whether this test is useful for the routine neurological examination; therefore, EMG studies were not performed.
We standardized the examination procedure by using standardized conditions for each subject, a scoring scale and video analysis. Although video analysis does not reflect clinical settings, several studies used it for interobserver-agreement calculation [8,27,35]. When evaluating a videotape the setup is the same for each observer and thus its results can be considered comparable. ‘Myotatic reflex scales’ [13] were developed as an instrument to increase the comparability of reflex evaluation among different observers [32] and are routinely used in veterinary and human medicine [11-14]. Manschot et al. (1998) named three criteria for a scoring scale to increase interobserver-agreement: not too many categories, unambiguously formulated categories and the possibility to distinguish between ‘normal’ and ‘abnormal’. The higher the number of categories a scoring scale has and the greater the number of observers that are included, the lower the interobserver-agreement [12,22,24]. This is in agreement with the results of the current study, where mean KC values are higher than mean KW values for each of the three groups. Additionally, KF values are higher for the dichotomous scoring of the reflex-presence than for the five-point scoring scale of reflex-briskness.
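The effect of the number of categories on KF can be illustrated with a small sketch of Fleiss' Kappa. The function name and the ratings are hypothetical (six sequences scored by four observers, invented for illustration), and the example shows only how the same scores behave on the five-point scale and after dichotomisation; it does not reproduce or prove the study result.

```python
def fleiss_kappa(ratings, categories):
    """Fleiss' kappa; ratings holds one list of scores per subject,
    with the same number of raters for every subject."""
    n_subjects = len(ratings)
    m = len(ratings[0])  # raters per subject
    counts = [[row.count(c) for c in categories] for row in ratings]

    # Overall proportion of assignments to each category.
    p_cat = [
        sum(cnt[j] for cnt in counts) / (n_subjects * m)
        for j in range(len(categories))
    ]
    # Agreement among the raters for each subject.
    p_subj = [(sum(c * c for c in cnt) - m) / (m * (m - 1)) for cnt in counts]

    p_bar = sum(p_subj) / n_subjects
    p_exp = sum(p * p for p in p_cat)
    return (p_bar - p_exp) / (1.0 - p_exp)


# Hypothetical briskness scores (0=absent ... 4=clonus), four observers each.
briskness = [
    [2, 2, 2, 2],
    [2, 2, 1, 2],
    [0, 0, 0, 0],
    [2, 3, 2, 2],
    [1, 1, 2, 1],
    [0, 0, 1, 0],
]
# Dichotomised reflex-presence derived from the same scores (0=absent, 1=present).
presence = [[1 if score > 0 else 0 for score in row] for row in briskness]

kf_briskness = fleiss_kappa(briskness, categories=[0, 1, 2, 3, 4])
kf_presence = fleiss_kappa(presence, categories=[0, 1])
print(round(kf_briskness, 3), round(kf_presence, 3))
```

In this toy data set most disagreements are one-point differences between neighbouring briskness grades, which disappear once the scores are collapsed to present or absent.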
O'Keeffe et al. (1994) stated that the technique performed influences the reliability between observers. The examination technique differed slightly between the two examiners in our study; however, the scoring did not show any significant difference between them, so this effect can most likely be neglected [22]. Different authors showed that knowledge of the patient's history significantly improves the reliability [10,26]. Except for the two examiners, none of the observers was aware of the unremarkable clinical, orthopaedic and neurological examination and absence of any history of neurological disorders within the study population. Nevertheless, the agreement for the non-blinded pair of examiners was only medium for the reflex-presence (KC=0.658) and lowest for the reflex-briskness (KW=0.445) within Group 1. Additionally, a high interobserver variability could be seen in studies where the examiner was also an observer [12,15,26]. Pairwise comparison of the KC and KW values in this study found no significant difference between pairs of observers within each group, which agrees with previous studies [10]. In contrast to previous investigations [8,27,35], the observers in our study were not instructed in the evaluation of the reflex response since this was part of the hypothesis.
Dafkin et al. (2013) evaluated the patellar reflex in humans and found that most of the observers rely on the change in knee angle and the maximum angular acceleration. Interestingly, none of the mechanical factors examined in that study correlated with the scoring within the neurologist group, and the authors assumed that the observers rely on a non-examined variable or an unquantifiable clinical skill. In the present study, the factors that influenced the observers' scoring were not analyzed, but some observers mentioned difficulties in evaluating longhaired dogs and in distinguishing between breathing movement and the contraction of the biceps muscle in panting dogs. Correlation analysis showed that the fur length did not influence the interobserver-agreement.
An increase of interobserver-agreement through training of the examining person is often assumed [22,37-40]. Nevertheless, in the present study the KC and KW values of two of the authors, a PhD student (FG) and one of the board-certified neurologists (HV), within Group 1 are at least ‘moderate’ (0.445), with the highest KC value (0.852) of all observer pairs reaching the ‘near perfect to perfect’ level. Dafkin et al. (2013) found no correlation between the accuracy of reflex assessment and the observer's level of expertise; our data show a difference in interobserver-agreement that was significant between Groups 1 and 3. The interobserver-agreement for reflex-presence and reflex-briskness evaluation increases with the observer's level of expertise.
Our results suggest that in healthy dogs the canine BTR can be reliably assessed by veterinary neurologists. Further examinations on dogs with lower motor neuron lesions in the thoracic limbs are needed to prove the usefulness of the BTR in the neurological examination in dogs.
Table 2: Subdivision of the dogs depending on the covariates sex, weight, age, fur length and the examining person.
Covariate | Category | Total | Percentage (%) |
Sex | male | 39 | 37.9 |
Sex | female | 64 | 62.1 |
Weight | <15 kg | 42 | 40.8 |
Weight | 15-30 kg | 46 | 44.7 |
Weight | >30 kg | 15 | 14.6 |
Age | <3 years | 38 | 36.9 |
Age | 3-7 years | 43 | 41.8 |
Age | >7 years | 22 | 21.4 |
Fur length | longhaired | 38 | 36.9 |
Fur length | shorthaired | 65 | 63.1 |
Examiner | SL | 47 | 45.6 |
Examiner | FG | 56 | 54.4 |
CONCLUSIONS
The interobserver-agreement of canine BTR evaluation depends on the observer’s level of expertise. Neither sex, weight, age, fur length nor the examining person influenced the interobserver-agreement of the BTR evaluation. In healthy dogs the BTR can reliably be assessed by veterinary neurologists.
CONFLICT OF INTEREST
None of the authors of this paper has a financial or personal relationship with other people or organisations that could inappropriately influence or bias the content of the paper.