Volume 4, Issue 6 p. 1603-1615
RESEARCH ARTICLE
Open Access

Towards more effective identification keys: A study of people identifying plant species characters

Jana Wäldchen

Corresponding Author

Max Planck Institute for Biogeochemistry, Jena, Germany

German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany

Correspondence

Jana Wäldchen

Email: [email protected]

Hans Christian Wittich

Data Intensive Systems and Visualisation, Technische Universität Ilmenau, Ilmenau, Germany

Michael Rzanny

Max Planck Institute for Biogeochemistry, Jena, Germany

Alice Fritz

Max Planck Institute for Biogeochemistry, Jena, Germany

Patrick Mäder

German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany

Data Intensive Systems and Visualisation, Technische Universität Ilmenau, Ilmenau, Germany

Faculty of Biological Sciences, Friedrich Schiller University, Jena, Germany

First published: 27 October 2022
Citations: 1
Handling Editor Helen Roy

Abstract

  1. Accurate species identification is essential for ecological monitoring and biodiversity conservation. Interactive plant identification keys have been considerably improved in recent years, mainly by providing iconic symbols, illustrations, or images for the users, as these keys are also commonly used by people with relatively little plant knowledge. Only a few studies have investigated how well morphological characteristics can be recognized and correctly identified by people, which is ultimately the basis of an identification key's success.
  2. This study consists of a systematic evaluation of people's abilities in identifying plant-specific morphological characters. We conducted an online survey where 484 participants were asked to identify 25 different plant character states on six images showing a plant from different perspectives.
  3. We found that survey participants correctly identified 79% of the plant characters, with botanical novices with little or no previous experience in plant identification performing slightly worse than experienced botanists. We also found that flower characters are more often correctly identified than leaf characteristics and that characters with more states resulted in higher identification errors. Additionally, the longer the time a participant needed for answering, the higher the probability of a wrong answer.
  4. Understanding what influences users' plant character identification abilities can improve the development of interactive identification keys, for example, by designing keys that adapt to novices as well as experts. Furthermore, our study can act as a blueprint for the empirical evaluation of identification keys.


1 INTRODUCTION

Accurate species identification is essential for ecological monitoring and underpins biodiversity conservation (Austen et al., 2016; Farnsworth et al., 2013). Many activities, such as studying the biodiversity of a region, monitoring populations of endangered species, implementing and evaluating population management plans and assessing the health of ecosystems, depend on accurate identification skills (Elphick, 2008; Farnsworth et al., 2013). The aim of identification keys is to provide an accurate approach to species identification (Drinkwater, 2009), that is, following a series of questions based on contrasting morphological characteristics towards an unknown taxon (Kirchoff et al., 2008). Dichotomous keys, in which users choose between two opposing character states at a time, are possibly the oldest method for species identification; developed long before computers were available, they are still widely applied today (Scharf, 2009; Walter & Winterton, 2007). However, these keys were designed and mainly applied by experts. Studies show that their application is difficult, time-consuming and, due to the use of technical biological terms, frustrating for novices and sometimes even for skilled biologists (Fermanian et al., 1989; Silva et al., 2011; Stevenson et al., 2003; Tilling, 1984). Consequently, inexperienced laypersons tend to avoid familiarizing themselves with dichotomous keys and use image-based browsable field guides instead (Stevenson et al., 2003). Additionally, there is evidence that amateur botanists and non-specialists perform worse than specialists using traditional taxonomic resources (Ahrends et al., 2011; Scott & Hallam, 2003). Considering that a continuously growing number of monitoring programs involves and relies on citizen scientists who often have little or no identification skills and species knowledge (Dickinson et al., 2012; Pocock et al., 2017), new approaches to support beginners are needed. Providing better tools to develop species identification skills is desirable to maintain high levels of accuracy while expanding participation in monitoring programs.

Therefore, there is motivation for creating more effective identification methods. In recent years, DNA barcoding has gained momentum and begun to supersede more traditional identification needs (Kress, 2017; Porter & Hajibabaei, 2018). Even more recently, researchers started to address the issue more systematically by providing identification tools that employ image recognition techniques (Jones, 2020; Mäder et al., 2021; Wäldchen et al., 2018; Wäldchen & Mäder, 2018). However, barcoding initiatives are still very expensive and slow, while computer vision solutions are not yet precise enough to replace the biologist in identifying critical species (Jones, 2020; Pärtel et al., 2021; Rzanny et al., 2019, 2021; Wäldchen et al., 2018). Furthermore, the benefit of automating identifications may also be a drawback since users are no longer required to study and recognize plant characteristics. Therefore, the usage of identification keys dependent on the human ability to recognize states of morphological characteristics will be essential even in combination with, for example, computer vision or DNA barcoding (Bruni et al., 2012). We argue that current and future research should also focus on easy-to-use and effective identification keys, which are needed alongside modern communication technologies.

Biologists as well as computer scientists have made various attempts to advance the structure, identification procedure and usability of identification keys (Burkmar et al., 2014; Kirchoff et al., 2011). With the help of new technologies, dichotomous identification keys have evolved from their static form to become more dynamic, flexible and interactive in recent years, thus attracting more non-expert users (Bodin et al., 2019; Jouveau et al., 2018; Kirchoff et al., 2011; Nimis et al., 2012). A number of online tools focused on the creation of identification keys have helped to make identification tools more accessible (e.g. Lucid, a software platform for producing keys, and Free DELTA, an open-source software system for processing taxonomic descriptions and producing keys and interactive identification tools). However, while people's ability to recognize and correctly identify morphological characteristics should ultimately be the basis for a successful identification process, it has been investigated in only a few studies. One main focus of these studies is the comparison of different field guides with respect to species identification accuracy and user friendliness (Hawthorne et al., 2014; Sharma et al., 2019). Another important focus is phylogeny, as identifying characters and defining character states is also highly relevant to systematics research (Kirchoff, 2001; Kirchoff et al., 2004, 2007). To improve identification keys and make them user-friendly even for less experienced people, it is important to obtain more information about the individual identification steps during an identification process. In this paper, we report on a systematic study evaluating participants' abilities in identifying morphological plant character states by focusing on the following key questions: How is identification correctness of a character state affected by (a) the person's previous knowledge, (b) the plant organ and (c) the number of states per character? Our comparative assessment allows us to suggest a set of design principles for intuitive and user-friendly identification keys. Our results are expected to improve the design of future interactive identification keys.

2 METHODS

2.1 Deriving a core set of plant characters

The fundamental principle of identification keys is that individuals of the same plant species share a combination of relevant morphological characters that differentiate them from other species. Two main categories of such characters are generally distinguished: (1) vegetative parts and (2) reproductive parts. The former are typically related to leaf morphology, while the latter refer to flower, fruit, or seed morphology (Duminil & Di Michele, 2009). These characters may be qualitative, meristic or quantitative. Quantitative characters can be measured, such as plant height and flower width; meristic characters are countable, such as the number of petals or stamens per flower; and qualitative characters are described by a distinct set of character states, such as leaf shape, flower colour, or ovary position. To determine a plant species, a number of sequential decisions regarding the state of certain plant characters are required.

We defined 43 basic morphological plant characters suitable to distinguish broad taxonomic plant groups. These characters describe fundamental morphological differences of plants and are typically used in plant identification keys to distinguish plants on the family or genus level. We explicitly chose characters that are recognizable from images; in other words, we intentionally left out potentially informative but hardly noticeable characters, such as the number of stamens, the position of the ovary and any characters that require manipulation or the physical presence of the plant, for example, smell or the presence of milky sap. All characters can broadly be attributed to one of the two groups discussed above, that is, 20 characters relate to a plant's flower or inflorescence and 23 characters relate to its leaves. Each plant character is described by 2–9 character states amounting to 134 states across all 43 characters; in addition, we chose five typical plant species to exhibit each character state. The definition of characters and corresponding states, as well as the selection of representative species, was carried out in a systematic and collaborative procedure by three botanists (JW, MR and AF). For the character state ‘multiple incisions of the petals’ (CF9-4), we could choose only two species, while for orbicular leaf shape (CL22-4) and lanceolate leaf shape (CL22-2), we could select only four species each, since no additional native species of the central European flora exhibits these character states. In all, we compiled 665 character state-species combinations to be used in our study (Figure 1). All characters, their corresponding character states and the selected species are listed and described in Table S1.

FIGURE 1. We studied 43 basic morphological plant characters, 20 related to flowers or inflorescences and 23 related to leaves, which altogether result in 134 unique character states. Each character state was represented by five plant species.
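The total of 665 character state-species combinations follows from the counts given above. The short check below is only an illustrative sketch; the function and variable names are hypothetical and not part of the study's materials.

```python
# Sanity check of the number of character state-species combinations.
# All names here are illustrative; only the numbers come from the text.
TOTAL_STATES = 134        # character states across all 43 characters
SPECIES_PER_STATE = 5     # default number of representative species per state

# States for which fewer than five species could be selected (Section 2.1).
EXCEPTIONS = {
    "CF9-4": 2,   # multiple incisions of the petals
    "CL22-4": 4,  # orbicular leaf shape
    "CL22-2": 4,  # lanceolate leaf shape
}


def count_combinations(total_states, species_per_state, exceptions):
    """Return the number of state-species combinations.

    States listed in `exceptions` contribute their own species count;
    every other state contributes `species_per_state` species.
    """
    regular_states = total_states - len(exceptions)
    return regular_states * species_per_state + sum(exceptions.values())


combinations = count_combinations(TOTAL_STATES, SPECIES_PER_STATE, EXCEPTIONS)
print(combinations)  # 665
```

In other words, 131 states with five species each give 655 combinations, and the three exceptional states add 2 + 4 + 4 = 10.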

2.2 Study design and task description

The goal of our study was to systematically evaluate people's abilities to identify morphological plant characters with a focus on the following research questions:
  • RQ 1: How well can people identify morphological plant characters?
  • RQ 2: How does identification correctness differ among flower and leaf characteristics, and which plant characters are the easiest or most difficult to identify?
  • RQ 3: Are all states of a certain character recognized equally well?
  • RQ 4: How does prior botanical knowledge affect identification correctness?
  • RQ 5: How does the number of character states per character affect identification correctness?
  • RQ 6: How much time do participants spend on identifying a character, and does it correlate to identification correctness?

In order to cover all combinations of character states and species with a reasonable number of replicates, we needed several hundred participants. We conducted the study online for two reasons: (1) the selected species do not usually occur together in nature and differ greatly in their phenology; and (2) participants should determine plant characters from the same individual, avoiding bias due to intraspecific character variability.

In the online form, the participants were informed about the project, the aims of the study and data protection. After they had given their consent, they could start the survey. This online survey was structured into two sections. In the first section, we inquired about participants' education and preexisting plant knowledge. In the second section, participants were asked to identify plant characters from a number of plant images (Figure 2). For this purpose we displayed all character states as pictograms accompanied by a brief explanation; however, no further definitions of botanical terms were provided to the participants. The pictograms and the corresponding brief explanations are available in Table S1.1. Participants were asked to click on the pictogram that best matched the depicted plant species. To simulate the observation of real plants as closely as possible, we represented each plant by images showing them from six different perspectives based on the recommendations of Baskauf and Kirchoff (2008).

FIGURE 2. Study workflow and example of a character identification page. On the left side of the screen, characters with their related character states are displayed as a combination of icons and accompanying text. On the right, six images showing a plant from different perspectives are shown. Each image could be clicked to magnify it and to zoom in and out.

The following images were shown. First, the entire plant: an image capturing the general appearance of the whole plant taken in its natural surroundings. Second, flower frontal: an image of the flower from a frontal perspective with the image plane perpendicular to the flower axis. Third, flower lateral: an image of the flower from a lateral perspective with the floral axis parallel to the image plane. In the case of composite flowers and flower heads forming a functional unit (i.e. Asteraceae), the flower heads were treated as a single flower. Fourth, leaf top: an image showing the entire upper surface of a leaf. In the case of compound leaves, all leaflets were covered by the image. Fifth, leaf back: the same as before but showing the leaf's lower surface. Sixth, an image of the inflorescence or another characteristic image of the flower. The plant images were taken with the Flora Capture app (Boho et al., 2020) and independently validated by three botanists (JW, MR and AF). Only pictures that all three botanists found suitable for the study were used; in the case of a disagreement, other pictures were chosen. All plant images are shown in Table S1.2. Initially, participants saw thumbnails of all six images arranged one below the other. Clicking on any of these images opened a magnified version that could be zoomed by hovering over it or using the scroll wheel. We selected the images in such a way that the character in question was clearly recognizable in at least one of them. To simulate a real identification situation as closely as possible, all pictures were always visible, regardless of the character that had to be identified. After each question, the participants were asked to rate the difficulty of the question on a Likert scale from 1 (easy) to 4 (difficult) and how certain they felt about their answer on a scale from 1 (certain) to 4 (uncertain). We logged the time spent per question and which of the six images were displayed or zoomed.

Participants had to answer a total of 26 questions. The first question served as a warm-up question, making them familiar with the questionnaire and the survey environment. The remaining 25 questions were dynamically assigned from the 665 character state-taxon combinations. Assignments had to: (a) contain 25 randomly shuffled and distinct characters of the 43 available ones, and (b) balance the number of responses per character state-taxon combination across all completed questionnaires, that is, prioritize those with the lowest number of responses so far. Applying these criteria, we could limit survey length, avoid repetition by never showing a participant the same character or taxon more than once, and alleviate possible signs of fatigue during the completion of the questionnaire.
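One way to satisfy these assignment criteria is a simple greedy procedure. The sketch below is not the survey platform's actual logic, which the text does not describe. The function name, the tuple layout (character, state, species) and the toy data are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def assign_questions(combinations, response_counts, n_questions=25, rng=None):
    """Pick question combinations for one participant (hypothetical sketch).

    combinations:     iterable of (character, state, species) tuples
    response_counts:  dict mapping a combination to its responses so far
    Characters are visited in random order; for each one, the least-answered
    combination whose species has not been shown yet is chosen.
    """
    rng = rng or random.Random()
    by_character = defaultdict(list)
    for combo in combinations:
        by_character[combo[0]].append(combo)

    characters = list(by_character)
    rng.shuffle(characters)  # criterion (a): random order, distinct characters

    chosen, used_species = [], set()
    for character in characters:
        if len(chosen) == n_questions:
            break
        candidates = [c for c in by_character[character]
                      if c[2] not in used_species]
        if not candidates:
            continue  # all species of this character were already shown
        # Criterion (b): prefer combinations with the fewest responses so far.
        fewest = min(response_counts.get(c, 0) for c in candidates)
        pick = rng.choice([c for c in candidates
                           if response_counts.get(c, 0) == fewest])
        chosen.append(pick)
        used_species.add(pick[2])
    return chosen


# Toy data (made up): 43 characters, 2 states each, 5 species per state,
# drawn from a pool of 100 placeholder species names.
toy_combinations = [
    (f"C{c}", f"C{c}-{s}", f"species_{(c * 10 + s * 5 + k) % 100}")
    for c in range(43) for s in range(2) for k in range(5)
]
chosen = assign_questions(toy_combinations, {}, rng=random.Random(0))
print(len(chosen))  # 25
```

In a running survey, the caller would increment `response_counts` for each chosen combination once a questionnaire is completed, so that later participants are steered towards under-sampled combinations.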

We ran a pilot study with ten colleagues to evaluate the integrity of the questionnaire prior to the actual study; based on their feedback, we improved the questionnaire by excluding ambiguous plant photos and rewording questions. As a result of this pilot study, we also set the number of questions to 25 as a compromise between participant effort and the required number of repetitions. To attract as many participants as possible, we shared the survey link via social media, e-mail lists, newsletters and personal communication; the link remained online for 2 weeks. We were not focused on attracting any specific group of people but rather tried to achieve a broad cross-section of the population. Participants giving only partial responses or not finishing the survey were removed from the dataset. Furthermore, repeated submissions from the same IP address were not allowed, to prevent the same person from taking part more than once. The online study was performed with the SurveyGizmo platform (SurveyGizmo, 2019), and our initial goal was to collect at least 10 identifications per character state-taxon combination. The survey was carried out in German, since it was geared towards a German-speaking participant group.

2.3 Quality assurance and analysis procedure

In total, 492 participants completed the survey; two were excluded because their responses indicated that their German language skills were below conversational level. Furthermore, we excluded six questionnaires that had been completed perfunctorily, that is, the participants had not selected an image other than the one initially displayed for over half of the questions, indicating that they simply browsed the questions rather than answering them properly. The resulting 484 questionnaires contained 12,100 answered questions, of which we excluded two where a temporary server error had prevented the display of plant images. We excluded two more answers with response times longer than 4 min, which indicate an interruption or the use of external help and material. During analysis, we discovered one falsely assigned character state-species combination (the respective participants saw a species not matching the character states asked about) and removed the 19 affected answers. These exclusions resulted in a total of 12,077 answered questions for further analysis. On average, each character state-taxon combination was identified by 18 participants, with a minimum of 15 and a maximum of 19, thereby almost doubling our initial goal of 10 repetitions per combination.
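The exclusion steps can be retraced with a few lines of arithmetic. The variable names below are illustrative. Dividing by all 665 combinations when computing the mean is an assumption, since the text does not say whether the mis-assigned combination was dropped from the denominator.

```python
# Retracing the exclusion steps reported above (numbers taken from the text).
completed = 492
excluded_participants = {
    "insufficient German language skills": 2,
    "perfunctory completion": 6,
}
participants = completed - sum(excluded_participants.values())

questions_per_participant = 25  # the warm-up question is not counted
answers = participants * questions_per_participant

excluded_answers = {
    "server error (images not displayed)": 2,
    "response time longer than 4 min": 2,
    "mis-assigned state-species combination": 19,
}
analysed = answers - sum(excluded_answers.values())

# Assumption: the mean is taken over all 665 combinations.
combinations = 665
mean_per_combination = analysed / combinations

print(participants, answers, analysed, round(mean_per_combination, 1))
# 484 12100 12077 18.2
```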

To compare the answers of persons with different skill levels, we split the participants into three expertise groups depending on their self-reported experience with plant identification (Figure 4), based on cluster analysis (details explained in Supporting Information S2). To find out whether our measured variables, that is, answer correctness, self-assessed difficulty, self-assessed certainty, number of image views and response time, differed significantly between expertise groups and between flower- and leaf-related questions, we conducted multiple statistical significance tests. Furthermore, we tested the influence of different variables on the probability of a correct character identification using a generalized linear mixed-effect model. The aim was to find out which factors influence whether the character states were recognized correctly or incorrectly. A detailed description of these tests and models is available in Tables S2.1 and S2.2.

2.4 Ethics statement

Permission for this survey was granted by the responsible ethics committee at Friedrich Schiller University Jena. All participants were provided with a brief description of the study on the first page of the questionnaire and gave their consent before entering the survey. The survey was anonymous.

3 RESULTS

3.1 Demographics of study participants

Figure 3 provides an overview of the participants' ages and educational levels. The participants were 17–77 years old (median 36 years), with the largest age group being 20–30 years old. In terms of their levels of education, the majority held a university degree (55%). Through four additional questions, we inquired about participants' prior knowledge of plant species and their identification and found that the majority of participants (67%) were not professionally concerned with these topics. Nearly half of the respondents (49%) stated that they know 20–100 wild growing herbal plant species by name, while 17% reported knowing more than 100 species, and 34% claimed to know fewer than 20 species. In a self-assessment question about their plant knowledge, 11% replied that they have no knowledge of plants, 61% reported little knowledge, 25% intermediate knowledge and 3% described themselves as experts. The fourth question concerned the participants' previous use of identification keys for plant identification. Roughly half of the participants (51%) had used identification keys several times before or used them regularly, while 49% responded that they had never used an identification key before or had only used one once (Figure 4). Cluster analysis distinguished three expertise groups that reasonably represent the participants' differences in experience with plant identification (Supporting Information S2). We refer to these groups as: little plant knowledge (novice; n = 197), moderate plant knowledge (intermediate; n = 168) and established plant knowledge (expert; n = 119).

FIGURE 3. Distribution of participants according to age and education.

FIGURE 4. Self-assessed experience with plant identification reported by participants. (a) Does knowledge of plant species play a role in your professional activity? (b) How many wild growing herbal plant species do you know by name? (c) Have you ever identified a plant with a dichotomous identification key? (d) Please rate yourself in terms of plant knowledge. (e) Expertise groups based on a clustering of the self-reported information provided by the participants (a–d).

3.2 Character identification and prior knowledge

In total, participants correctly identified 79% of the character states they were shown (Table 1(I)). Participants with expert knowledge correctly identified significantly more character states than those with intermediate or little prior knowledge. However, the absolute difference among the three groups was small, with novices answering on average only about 4.6 percentage points less correctly than experts (Table 1(II)). Each participant not only identified the character shown, but also reported how difficult they found the identification and how certain they felt about their answer, both on a Likert scale from 1 (easy, certain) to 4 (difficult, uncertain). Character identifications were rated with a difficulty of 1.7 on average, although novices found them significantly more difficult (1.8) than experts (1.6). The participants also reported an average certainty of 1.7, with novices being significantly less certain (1.8) than experts (1.5). On average, 3.4 of the six images showing the plant from different perspectives were used for a character identification. Participants belonging to the groups ‘intermediate’ and ‘expert’ consulted significantly fewer images than participants classified as ‘novice’. We argue that prior knowledge helped the participants in the sense that they only had to look at the pictures of the relevant perspectives without needing to see every single image. This interpretation is supported by response times that differed significantly across the groups, from 23.1 s for novices to 20.9 s for experts; on average, response time was 22 s.

TABLE 1. Correctness, self-assessed difficulty and certainty, number of different images viewed and time needed for single character identifications (mean and standard error), (I) in total and dependent on (II) prior plant knowledge and (III) organ. Lowercase letters next to the reported values (a, b, c) indicate significant differences within the sub-groups of each column, based on Kruskal–Wallis tests with p < 0.05 for prior knowledge and on Mann–Whitney U tests with p < 0.05 for plant organ: values annotated with different letters differ significantly, and values annotated with the same letter do not.

                      Correctness [%]  Difficulty (1–4)  Certainty (1–4)  # viewed images  Response time [s]
(I) Total             79.28            1.73 (±0.01)a     1.73 (±0.01)a    3.41 (±0.02)a    22.09 (±0.15)a
(II) Prior knowledge
  Novice              77.85a           1.79 (±0.01)a     1.82 (±0.01)a    3.61 (±0.05)a    23.10 (±0.25)a
  Intermediate        78.70a           1.76 (±0.01)a     1.76 (±0.01)b    3.46 (±0.05)b    21.78 (±0.25)b
  Expert              82.48b           1.61 (±0.01)b     1.53 (±0.01)c    3.00 (±0.05)c    20.87 (±0.30)c
(III) Organ
  Flower              82.52a           1.73 (±0.01)a     1.73 (±0.01)a    3.28 (±0.04)a    20.54 (±0.23)a
  Leaf                76.79b           1.74 (±0.01)a     1.73 (±0.01)a    3.50 (±0.04)b    23.30 (±0.21)b

3.3 Character identification and plant organs

We observed that the average correctness across characters was always higher than 50% (see Figure S2.1). The most accurately identified character (98% correct) was that of a nodding or upright flower (CF10), while the character with the lowest correctness (58%) was petal margin structure (CF19). Table 1(III) aggregates the results of characters per organ and shows that flower-related characters were identified correctly significantly more often (82.5%) than leaf-related characters (76.8%). To identify flower characters, the participants studied 3.3 images on average, while they consulted 3.5 images for leaf-related characters. We observed a similar relationship in the participants' response times, where flower-related characters were identified significantly faster (20.54 s) than leaf-related characters (23.30 s). The performance of the participants may be related not only to the organ a character is associated with, but also to the number of different states a participant has to choose from. Figure 5 shows average identification correctness in relation to the number of states per character; the higher the number of states per character, the lower the identification correctness, for flower as well as leaf characters.

FIGURE 5. Identification accuracy as a function of the number of character states (black line) and organ (red = flower, blue = leaf). The shaded area shows the 95% confidence intervals of the leaf and flower curves. The binomial logistic regression used is statistically significant (p < 0.01) for flower, for leaf and for both together.

3.4 Character state identification and species

Except for character states ‘multiple incisions of the petals’ (CF9-4), ‘orbicular leaf shape’ (CL22-4) and ‘lanceolate leaf shape’ (CL22-2) (Section 2.1), we evaluated each character state based on its occurrence in five different species. For instance, with the character state ‘yellow color of the flower’ (CF03-3), we used Hypericum perforatum, Ranunculus repens, Potentilla anserina and Lapsana communis.

The most general observation is that identification correctness depends strongly on the character state to be identified as well as on the species assessed. Identification correctness across the states of a character could be similar (e.g. elongated or rounded inflorescence [CF14]) but also vastly different. Some individual states were considerably more difficult to identify than others (e.g. the shape of the leaf margin [CL27] or the margin of the petals [CF19]). Whether a character state was correctly identified also depended strongly on the species shown. In some cases, the character states were identified equally poorly or well across all five species (e.g. the presence of tendrils [CL24] or the shape of the inflorescence [CF14]); in others, the character state was identified very well in one species but very poorly in others (e.g. the presence of spikes or thorns [CL23] or the structure of the composite flowerhead [CF16]). An extreme example is spherical inflorescence (CF05-5), which was identified correctly considerably less often than the other inflorescence types. Furthermore, individual species, namely Phyteuma orbiculare and Prunella vulgaris, were even more challenging when identifying the character state spherical inflorescence (CF05-5); we found that this form of inflorescence was most often confused with single flowers (CF05-1). However, single flowers were not confused with spherical inflorescences in turn. Figure S2.2 gives a comprehensive overview of how correctness can vary depending on the character states and species.

Figure 6 shows matrices visualizing this confusion among states of the worst-recognized leaf and flower characters with at least four character states, highlighting whether a confusion is unidirectional or bidirectional. It appears that very specific character states are interchanged; for example, capitulum flowers (CF07-5) were often determined as flat outspread single flowers (CF07-1). There was also high confusion between papilionaceous flowers (CF07-3) and flowers with upper and/or lower lip (CF07-2). However, bell-/jug-shaped flowers (CF07-4) were recognized effectively and not confused with other flower shapes. For the shape of the petals' front margin (CF09) character, we observe that deeply 2-lobed petals (CF09-3) were often confused with straight/rounded petals (CF09-1), while the straight/rounded petals were in turn often interpreted as indented (CF09-2). All other character states were recognized very well. Notably, the character CF05's state of spherical inflorescence shape (CF05-5) created confusion, as discussed above. For the three most poorly identified leaf characters, the overall confusion was somewhat more arbitrary than for flower characteristics. Across all three confusion matrices, we find that the participants had more difficulties in determining the leaf-related character states correctly than for flower-related characters.

FIGURE 6. Confusion matrices of the most poorly recognized leaf and flower characters with at least four character states. Each matrix compares correct character states (y-axis) to the states that participants identified them as (x-axis), with quantities visualized as colour shading. The upper left matrix shows that, for example, a capitulum flower (CF07-5) is often misinterpreted as a flat outspread flower (CF07-1), but not vice versa.
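The distinction between unidirectional and bidirectional confusion can be made explicit by counting (correct state, chosen state) pairs. The following is a minimal sketch with made-up toy answers, not the study's data. The function names and the 20% cut-off are arbitrary assumptions chosen for illustration.

```python
from collections import Counter


def confusion_counts(answers):
    """Count (correct_state, chosen_state) pairs from an iterable of answers."""
    return Counter(answers)


def confusion_direction(counts, a, b, threshold=0.2):
    """Classify the confusion between states a and b.

    A direction counts as confused when at least `threshold` of the answers
    for the correct state fall on the other state. The 0.2 default is an
    arbitrary illustrative cut-off, not a value from the study.
    """
    def rate(correct, chosen):
        total = sum(n for (t, _), n in counts.items() if t == correct)
        return counts[(correct, chosen)] / total if total else 0.0

    a_as_b = rate(a, b) >= threshold
    b_as_a = rate(b, a) >= threshold
    if a_as_b and b_as_a:
        return "bidirectional"
    if a_as_b or b_as_a:
        return "unidirectional"
    return "none"


# Made-up toy answers as (correct state, chosen state); NOT the survey results.
toy_answers = (
    [("CF07-5", "CF07-1")] * 6 + [("CF07-5", "CF07-5")] * 4
    + [("CF07-1", "CF07-1")] * 10
    + [("CF07-3", "CF07-2")] * 4 + [("CF07-3", "CF07-3")] * 6
    + [("CF07-2", "CF07-3")] * 3 + [("CF07-2", "CF07-2")] * 7
)
counts = confusion_counts(toy_answers)

print(confusion_direction(counts, "CF07-5", "CF07-1"))  # unidirectional
print(confusion_direction(counts, "CF07-3", "CF07-2"))  # bidirectional
print(confusion_direction(counts, "CF07-1", "CF07-2"))  # none
```

Each row of a confusion matrix corresponds to the normalized counts for one correct state, so an asymmetric pair of off-diagonal cells indicates a unidirectional confusion.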

When evaluating individual character state-species combinations, we observed that identification correctness for 81 out of 665 combinations was below 50%, nine were below 10%, and for five, not a single participant identified them correctly. These five combinations are: (1) Centaurea scabiosa was not identified as consisting only of disc flowers (CF16-3); (2) Aconitum lycoctonum's flower was consistently confused as ‘with wings’ (CF18-1); (3) Stellaria media was not associated with petals that are strongly incised and thus show certain structures (CF19-2); (4) Nymphaea alba's orbicular leaf shape (CL22-4) was not recognized correctly even once; and (5) the basic shape of Anthriscus sylvestris's entire compound leaf was always determined as elongated (CL34-1) rather than rounded (CL34-2).

There might be various reasons why characteristics of certain species were recognized better or worse than the average. First, certain species show very pronounced characteristics that almost iconically resemble a character and are therefore easier to identify (e.g. Rosa canina for the indented petal margin (CF09-2) or Impatiens parviflora for the spur (CF08-1)). Second, other species exhibit ambiguous characteristics that fall between character states. For example, Nymphaea alba is described as having round leaves by prominent identification keys (Jäger, 2016; Lauber et al., 2001; Spohn, 2021), while our participants almost uniformly identified it as being reniform; however, this may have been caused by a rather short descriptive text or an icon that was difficult to understand. Similarly, Centaurea scabiosa is very challenging to identify as having an inflorescence consisting solely of disc flowers, since what may initially appear as rays are in fact elongated disc flowers. Identifying this character is very difficult in general, and even more so without being able to see an actual specimen. In general, the generalizations made by analogue and digital keys when forming characters bear the risk that users without special training misinterpret them.

4 DISCUSSION

Quantitative research on botanical knowledge within the population is scarce, and most studies on botanical knowledge have been conducted with students. Regardless of the participants' differing educational stages and the varying experimental designs of the studies, all studies found that in general, children and young adults have poor floristic knowledge and abilities to identify plants (Balmford et al., 2002; Bashan et al., 2021; Bebbington, 2005; Buck et al., 2019; Cooper, 2008; Hawthorne et al., 2014; Hesse, 1984; Lehnert et al., 1999; Lindemann-Matthies, 2006; Robinson et al., 2016). Confirming these earlier findings, a considerable share of our participants (34%) claimed to know fewer than 20 species. However, despite the poor species identification knowledge reported by the participants, we observe that they were able to identify species-specific characters well. On average, they identified 79% of the characters correctly, and even participants who had little experience with plant identification performed only marginally worse; other studies have reported similar results.

Hawthorne et al. (2014) compared users' species identification accuracy based on different image formats, including drawings, specimen photos, living plant photos and paintings, and found that a typical user attained a 70%–95% accuracy across all species in their study. One study conducted in the United Kingdom estimated the overall misidentification rates to be 5.9% at the species level, with much higher values (25.6%) for less experienced botanists (Moody, 2009). We argue that carefully derived plant characters illustrated by a combination of icons and explanatory texts are recognizable even for amateurs (Scott & Hallam, 2003). This is an important finding underlining the effectiveness of properly constructed identification keys as a tool for identifying plant species in the field. Nevertheless, our results also show that future design of identification keys should focus on users and their capabilities, a fact that has already been considered in the designs of some existing keys (Kirchoff et al., 2011; Leggett & Kirchoff, 2011). Interactive identification keys that can be used on mobile devices could be instrumental in making the process more interactive and dynamic and thus appealing to a wider range of users. Existing examples are, among many others, the iFlora App (identification of German flora), the Flora Helvetica App (identification of Swiss flora), or the app Tree ID (identification of British trees). These technologies can bring us one step closer to eventually overcoming the existing cultural gap, which Lobanov aptly described by stating that ‘keys are compiled by those who do not need them for those who cannot use them’ (Lobanov, 2003).

4.1 Deriving principles for identification keys

Based on the results of this study, we argue that the following principles should be considered when designing and studying identification keys:

4.1.1 Adapt to users with different levels of prior knowledge

Current technology for mobile devices allows interactive identification keys to effortlessly adapt to different groups of users or even individuals. Our study shows that users with prior plant knowledge are better able to identify plant characters than users with poor prior knowledge. We observed significant but small performance differences among user groups, potentially making user adaptation challenging. However, our study focused on rather simple and easily recognizable characters, and we hypothesize that further studies with more complex characters distinguishable on the species-level may show larger differences (Kirchoff et al., 2011; Leggett & Kirchoff, 2011; Scott & Hallam, 2003).

Thus, an identification algorithm that computes a sequence of characters for the user to consider should balance this selection, which is based on a character's discriminative information, with user-centric information. For instance, suitable algorithms could select more challenging but better discriminating characters for experts, while users with little to no prior knowledge would be presented with simpler characters and fewer states, even if this leads to a longer identification process. This identification scheme could also be adapted by enabling the user to transition between novice and expert modes depending on their increasing knowledge over time. Additionally, the user-provided metrics of perceived difficulty and certainty can inform a more appropriate prioritization of characters.
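As a minimal sketch of such a balance, the following example ranks characters by their discriminative information (Shannon entropy over the remaining taxa) multiplied by an assumed probability that a given type of user answers correctly. The weighting rule, the class and function names, the taxa and all probabilities are hypothetical and serve only to illustrate the idea; they are not derived from the survey results.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class Character:
    name: str
    states_by_taxon: dict   # taxon -> state of this character
    p_correct: dict         # user level -> assumed probability of a correct answer

def entropy_bits(character, taxa):
    """Shannon entropy (in bits) of the character's states over the remaining taxa."""
    counts = Counter(character.states_by_taxon[t] for t in taxa)
    n = len(taxa)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def rank_characters(characters, taxa, user_level):
    """Order characters by entropy weighted by the assumed chance of a correct answer.

    This product is a simple heuristic, not an exact expected information gain.
    """
    def score(ch):
        return entropy_bits(ch, taxa) * ch.p_correct[user_level]
    return sorted(characters, key=score, reverse=True)

# Invented example: four taxa, one four-state and one two-state character.
taxa = ["A", "B", "C", "D"]
flower_shape = Character(
    "flower_shape",
    {"A": "s1", "B": "s2", "C": "s3", "D": "s4"},
    {"novice": 0.4, "expert": 0.9},
)
leaf_margin = Character(
    "leaf_margin",
    {"A": "entire", "B": "entire", "C": "serrate", "D": "serrate"},
    {"novice": 0.9, "expert": 0.95},
)
chars = [flower_shape, leaf_margin]
print([c.name for c in rank_characters(chars, taxa, "novice")])
print([c.name for c in rank_characters(chars, taxa, "expert")])
```

In this invented setting, the novice is first asked about the simpler two-state character, whereas the expert is first asked about the more discriminating four-state character, at the cost of a longer identification sequence for the novice.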

4.1.2 Consider user behaviour during the identification process

Our study shows that response time is an important indicator of correctness. When a user takes a long time to identify a character state, they are less confident in their decision, making this identification less dependable. Such a relationship has also been shown in similar studies concerning other areas (Lasry et al., 2013; Wilding, 1971). For example, Lasry et al. (2013) measured the time needed by students to respond to physics-related questions. Similar to our study, they examined response time differences between correct and incorrect answers. Response times were longer for incorrect answers than for correct ones, indicating that the answers were not randomly given. Furthermore, response times were inversely related to students' expressed confidence; the lower their confidence, the longer it took them to respond. Therefore, an algorithm may use response time as a signal to request additional, more dependable characters to verify the correctness of an identification.
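One simple way to operationalize this idea is to compare each response time with the user's own typical response time and to request a verifying character when the answer was unusually slow. The sketch below assumes a median-based rule; the factor of 1.5, the minimum history length and the function name are hypothetical tuning choices, not values estimated in this study.

```python
from statistics import median

def needs_verification(previous_times, latest, factor=1.5, min_history=3):
    """Return True if the latest answer took markedly longer than the user's median.

    factor and min_history are hypothetical tuning parameters.
    """
    if len(previous_times) < min_history:
        return False  # too little history to judge what is 'slow' for this user
    return latest > factor * median(previous_times)

history = [4.2, 5.0, 3.8, 4.6]  # invented response times in seconds
print(needs_verification(history, 12.0))  # slow answer, ask for a verifying character
print(needs_verification(history, 4.9))   # typical answer, no verification needed
```

Using the user's own median rather than a global threshold accounts for the fact that some users are generally slower than others without being less accurate.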

4.1.3 Incorporate uncertainty and error tolerance in the identification process

Our study shows that an identification algorithm needs to anticipate and tolerate roughly 20% incorrectly identified characters even from expert users; in fact, we find that identification correctness varies across characters as well as across individual character states. Given an empirically evaluated identification key arising from a study like ours, these individual error rates can also be considered to form an even more precise hypothesis about a user's expected correctness for a character in question, thereby making the identification more reliable. For more challenging characters a reduced confidence in user responses and higher error tolerance could be adopted by the identification process. This applies to the character level as well as individual character states for which a variable error tolerance would be desirable. This concept has already inspired key implementations, for example, the LucidKey software (Lucid Key, 2021). LucidKey allows identification key authors to add misinterpretation scores per character reflecting typical confusions of novice users, a function that more experienced users can turn off. Authors can also score particular characters as ‘present, but rare’ influencing the rank order of remaining taxa in an identification process. Finally, LucidKey provides an adjustable error tolerance that users can adapt based on their experience (Lucid Key, 2021).
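A probabilistic treatment is one way to realize such an error tolerance: instead of discarding every taxon that contradicts a user's answer, the algorithm lowers its probability according to the expected error rate of the character. The following minimal sketch assumes that errors are spread evenly over the wrong states and uses an error rate of 20% as an example; the function name, the taxa and the states are hypothetical, and the sketch does not describe how LucidKey or any other existing software is implemented.

```python
def update_posterior(prior, taxon_states, observed_state, error_rate, n_states):
    """Perform one Bayesian update of taxon probabilities after a user answer.

    Assumes errors are spread evenly over the wrong states; a measured
    confusion matrix could replace this simplification.
    """
    posterior = {}
    for taxon, p in prior.items():
        if taxon_states[taxon] == observed_state:
            likelihood = 1.0 - error_rate
        else:
            likelihood = error_rate / (n_states - 1)
        posterior[taxon] = p * likelihood
    total = sum(posterior.values())
    return {t: p / total for t, p in posterior.items()}

# Invented example: three taxa, a three-state character, uniform prior.
prior = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
taxon_states = {"A": "s1", "B": "s2", "C": "s1"}
post = update_posterior(prior, taxon_states, "s1", error_rate=0.2, n_states=3)
print(post)
```

Taxon B, which contradicts the answer, keeps a small non-zero probability and can be recovered if later answers support it. A character-specific or state-specific error rate can be passed for each update, which corresponds to the variable error tolerance described above.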

4.1.4 Design characters with fewer states

Our study shows that a character's identification correctness is significantly influenced by its number of states, a finding that is consistent with other studies. For example, Dallwitz (1974) argued that most taxonomists prefer two-level characters in keys because it was felt that errors are more likely in determining the values of multistate characters, especially if the characters in question involve long or complicated descriptions. In addition, Martellos and Nimis (2015) argued that a high number of options can confuse the user. To ease identification, especially for non-experts, an algorithm should select characters with fewer states before others in order to obtain as much low-error information as possible. Furthermore, characters with many states could be split into multiple smaller ones. Further systematic research based on the present study is still required to substantiate or possibly revise these statements, under the condition that the latest multimedia technologies can be used. This is because, in some cases, multilevel characters in combination with icons and example images are not more difficult to use than the corresponding two-level characters, and their use can lead to considerably shorter keys.

4.1.5 Design identification keys with iconic symbols, illustrations, or images

Although most older keys are text-based with relatively few illustrations, recent advances in digital technology have made the creation of visually enhanced identification guides a reality (Farnsworth et al., 2013; Kirchoff et al., 2011; Leggett & Kirchoff, 2011). The keys themselves follow the usual format, which means that users choose between different character states. However, these character states are always additionally represented with corresponding icons, realistic drawings of the characters, or photographs, to supplement the written descriptions. For example, Ribeiro et al. (1999) and Dellinger-Johnston (2015) use images, or more specifically photographs, in their keys; Dellinger-Johnston (2015) used photographs to create a survey on how botanical experts and botanical novices rate the pair-wise similarity of different oak leaves. The mean of each rating was summarized into a distance matrix, which was then converted into a dendrogram. Next, from the resulting dendrogram, a visual key was constructed using the standardized photographs of oak leaves. This key was then tested with an existing dichotomous key. The results showed that users of the visual key gave 22%–30% more correct answers than users of the traditional key. This clearly indicates that user studies are of great importance during the creation of identification keys and the use of images can simplify the identification process. The iFlora app, focused on the identification of German flora; the Flora Helvetica app, targeting the identification of the Swiss flora; and the Tree ID app, specialized in the identification of British trees, are all examples of keys that support the identification process with iconic symbols. For non-experts, these kinds of keys can help them overcome their considerable problems with terminology, since plants can be identified solely by visual means (Kirchoff et al., 2011). 
Therefore, enabling visual assessment can greatly enhance the usability of taxonomic studies and can also be expected to result in more reliable identifications, especially when it comes to non-experts (Dellinger-Johnston, 2015; Martellos & Nimis, 2015). However, these iconic symbols and illustrations should also be empirically evaluated in advance to ensure that they are understandable, thus contributing to better usability of the identification key.

4.2 Limitations

Our study has several limitations. First, the setting in which we evaluated the participants' identification abilities was artificial and may have been unable to convey important specifics of its real counterpart, thereby biasing our results. Using real plants, preferably in their natural habitat, would be more reminiscent of the way people are used to identifying and engaging with plants; for example, they would be able to smell and touch a plant and get a better understanding of its size and shape. In addition, certain characteristics are more difficult to recognize from photographs than from actual specimens, and participants with prior knowledge were much more likely to identify them correctly (see the Centaurea scabiosa example in Section 3.4). However, we argue that a study of this magnitude would not have been possible with real plants in their respective habitats due to plants being strongly impacted by repeated participant inspection, different flowering and growing periods, and spatially distinct habitats. Given this conclusion, we aimed to design a setting that came as close as possible to providing all the information available during an identification in situ. However, it is conceivable that correctness will vary depending on how and which plant images are presented. Aiming for minimal bias arising from the way we presented the survey, we posed the same initial question to each participant without analysing their answers to it. This concept arises from a previous study where we found it effective in getting participants acquainted with the experimental setup (Mäder & Egyed, 2015). Second, for the descriptions of characters and their states, we used only short verbal descriptions and icons illustrating states; however, this could potentially pose a threat to the validity of our study if the participants did not correctly understand their meaning.
To mitigate this threat, we ran a think-aloud pilot study with 10 participants who provided feedback on the comprehensibility of the characters and suggestions for improving the descriptions in three iterations. Third, the assignment between character states and taxa, as well as the identification of taxa shown on images, might have been incorrect. To mitigate this threat, one expert botanist created and curated this information, while two others independently reviewed all materials. Fourth, we cannot prove that the participants performed their tasks on their own and without external help; however, the short time spent per question makes it unlikely that external resources were used. Fifth, by creating an online survey and advertising it across various channels, that is, e-mail distribution lists, newsletters, social media and personal communication, we aimed to minimize bias arising from participant selection. Statistics on the participants' demographics show that we attracted a wide variety of people of different ages and experience levels, which we consider to be relevant attributes of potential identification key users. Finally, the findings of this study do not allow us to draw overarching conclusions about people's abilities in eventual taxa identification due to the observed differences in character identification.

5 CONCLUSIONS

Our study systematically evaluated people's abilities in identifying morphological characters based on photo sets showing plant individuals from different perspectives. The participants' task was to identify the specific character expressions (i.e. character states), with the help of graphical icons and accompanying text, for the species depicted in the images. On average, participants identified 79% of the characters correctly. We observed that experts identified characters more correctly than non-experts by a small but significant margin. Furthermore, our results indicate that identification correctness differs across individual character expressions and individual species. For example, we found that the character state of one species was identified nearly 100% correctly, while the same character state was hardly identified for another species at all. Additionally, the longer the participants spent on identifying a character state, the more uncertain they were and the more likely their answer was to be incorrect. We argue that our study is relevant for the development of interactive identification keys; furthermore, we believe that user behaviour should be integrated into the identification process, that identification keys should be extended with images and symbols, and that error tolerance should be adapted to the user's prior knowledge in the design of future identification keys. Our measured relative performances in identifying individual characters, also in relation to the participants' different knowledge levels, can be used to inform the development of interactive plant identification keys that take user behaviour into account more accurately. In addition, we recommend that identification keys should be empirically evaluated at the time of their creation. Our study can serve as a basis for evaluating new identification keys for further species groups.

AUTHOR CONTRIBUTIONS

Writing manuscript: Jana Wäldchen, Patrick Mäder, Hans Christian Wittich and Michael Rzanny. Data analysis and visualization: Jana Wäldchen and Michael Rzanny. Programming online study: Hans Christian Wittich. Defining plant characteristics and assigning them to species, selecting species images: Alice Fritz, Michael Rzanny and Jana Wäldchen. Funding acquisition: Patrick Mäder and Jana Wäldchen. All authors read and approved the final manuscript.

ACKNOWLEDGEMENTS

We are grateful to all those who participated in the study for their generous assistance. Furthermore, we thank our colleagues Nedal Alaqraa, Angelika Thuille and Marco Seeland for support in designing and evaluating the survey. Thanks to Annett Börner for designing the iconic symbols and Anke Bebber for improving the language. We are funded by the German Ministry of Education and Research (BMBF) grants: 01LC1319A and 01LC1319B; the German Federal Ministry for the Environment, Nature Conservation, Building and Nuclear Safety (BMUB) grants: 3514685C19, 3519685A08 and 3519685B08; the Thuringian Ministry for Environment, Energy and Nature Conservation Grant: 68678; Stiftung Naturschutz Thüringen (SNT) grant: SNT-082-248-03/2014.

FUNDING INFORMATION

German Federal Ministry for the Environment, Nature Conservation, Building and Nuclear Safety (BMUB) grants: 3514685C19, 3519685A08 and 3519685B08; the German Ministry of Education and Research (BMBF) grants: 01LC1319, 6PGF0334; and the Thuringian Ministry for Environment, Energy and Nature Conservation grant: 0901-44-8652; Stiftung Naturschutz Thüringen (SNT) grant: SNT-082-248-03/2014.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT

Data sets of the survey responses of the participants can be found in: https://doi.org/10.7910/DVN/ZGDH9G.