Descriptive Analysis in Specialty Coffee III

Sample Sizing and Presentation

Screw a nail or hammer a screw, that fastener don’t drive. In addition to their design and construction, descriptive analysis techniques require certain benchmarks to be met before they can be deployed. Key among these is what’s known as a test’s n, or sample size. Sample size here doesn’t refer to the amount of coffee sampled, but to the number of independent tests performed on it. We use the word “sample” in many contexts in specialty coffee. In descriptive analysis, “sample size” is a technical term. The n is the number of qualified assessors multiplied by the number of tests, or replications, that each performs.

Unsurprisingly, descriptive analysis has minimum expected values for both of these. While we might define a minimum n, it’s not ideal to have only one assessor doing a ton of tests, or a ton of assessors doing only one test each. Obviously, having few testers doing only one test each is insufficient. For replications, the expectation is that panelists will see each sample at least two or three times. Note that the SCA’s 5-cup presentation only qualifies as a single instance. If you use the 5-cup presentation, you still need to create at least two or three 5-cup place settings per green** (10 or 15 cups total, for each coffee).

**We use the conventions Lot, Green, Roast, Extraction, Test where a lot is a pile of green coffee; a green is a little pile of green coffee (a sample pulled from a lot); a roast is a roast of some or all of a green; an extraction is a place setting on a cupping table of a roast; and a test is an instance of a cupper testing an extraction.
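The cup arithmetic is easier to see in a small example. This minimal Python sketch counts the cups that one green needs when each 5-cup place setting counts as a single replication. The function name and constant are hypothetical and exist only for illustration.

```python
# Hypothetical helper: cups needed for one green when each 5-cup place
# setting (the SCA presentation) counts as only ONE replication.
CUPS_PER_PLACE_SETTING = 5

def cups_per_green(replications: int, cups_per_setting: int = CUPS_PER_PLACE_SETTING) -> int:
    """Total cups for one green: one full place setting per replication."""
    return replications * cups_per_setting

# Two or three replications per green, as described above.
for reps in (2, 3):
    print(f"{reps} replications -> {cups_per_green(reps)} cups per green")
```

Running the loop prints 10 cups for two replications and 15 cups for three, which matches the totals given above.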

Whatever presentation is used, these place settings need to be assessed independently of one another and need to be replicated. Descriptive analysis sample size and replication can become quite extensive and complicated to manage, depending on the sensitivity of your test, your statistical requirements, and your tolerance for sensory error. Because placement impacts perception, some more involved replication schemes go so far as to replicate each sample in every possible position relative to each other sample.
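Enumerating presentation orders shows how quickly position balancing grows. This sketch is a hypothetical illustration, not a protocol that Cafe Imports or anyone else necessarily runs. It generates every possible order for a set of samples. As a lighter alternative, it also builds a cyclic Latin square, in which each sample appears once in each position but what precedes what is not fully balanced.

```python
from itertools import permutations

def all_presentation_orders(samples):
    """Every ordering of the samples, so each sample lands in every position
    relative to every other sample (a full-permutation scheme)."""
    return list(permutations(samples))

def cyclic_latin_square(samples):
    """Each row is one presentation order. Every sample appears once per
    position (column), but neighbors stay fixed, so carryover is unbalanced."""
    k = len(samples)
    return [[samples[(row + col) % k] for col in range(k)] for row in range(k)]

samples = ["A", "B", "C", "D"]
print(len(all_presentation_orders(samples)))  # 24 orders for 4 samples
for order in cyclic_latin_square(samples):
    print(order)
```

Four samples already yield 24 orders, and six samples yield 720. This growth is part of why fully replicated placement schemes become so demanding to manage.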

Cupping Spoon #5


Keeping it real.

One of the big take-aways here, no matter how you’re cupping, is that blinding, replication, and panel collaboration are invaluable tools not just for generating better outputs, but for making us more humble and authentic coffee cuppers, tasters, communicators, professionals, and appreciators.

As paradoxical as it might seem, blinding and replication are essential to transparency in cupping. When you don’t know what you’re tasting, there’s less room for doubt about the basis of your assessment. That is, there’s less room for doubt about whether you described how the coffee tasted or how you thought the coffee should have tasted.

Transparency in cupping isn’t about us knowing what we’re assessing and getting the right answer. It’s about our counterparts knowing how we’re assessing and arriving at our answers. Where do the valuations come from and how do our partners know that the coffee was the variable that we tested? Coffees of course have extrinsic attributes. Some may be surprised to learn that they always have. It is imperative that we don’t allow knowledge of those attributes to influence our assessment of coffee cup quality if we are claiming to assess coffee cup quality.

When you can be wrong about what you’re tasting and being wrong matters, when you can only taste what you taste and you have to rely on others to fill in the gaps, that’s when you’re tasting the cups that professional tasters taste.

Focus Q: Does my assessment change if this is a different coffee than I think it is? How?


In terms of panelists, minimum recommendations are for 5 or 6 trained assessors. Note that this range describes a minimum, and that this minimum is in reference to qualified or validated assessors as opposed to initially screened ones. Opting to use the minimum panel size should be balanced by increasing the replications of greens (extractions). While larger panels are more expensive to train and operate, increasing replications with smaller panels can quietly limit the reliability of your results over time due to fatigue.
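The trade-off between panel size and replications can be sketched with the definition of n given earlier (assessors multiplied by replications). The target n below is a hypothetical value chosen only for illustration, because this article does not specify one. The floor of two replications reflects the minimum replication expectation described above.

```python
import math

TARGET_N = 36         # hypothetical target n, NOT a published minimum
MIN_REPLICATIONS = 2  # panelists should see each sample at least 2-3 times

def sample_size(assessors: int, replications: int) -> int:
    """n = number of qualified assessors x replications each performs."""
    return assessors * replications

def replications_needed(target_n: int, assessors: int,
                        min_reps: int = MIN_REPLICATIONS) -> int:
    """Smallest replication count that reaches target_n, never below min_reps."""
    return max(min_reps, math.ceil(target_n / assessors))

for panel in (5, 6, 9, 12):
    reps = replications_needed(TARGET_N, panel)
    print(f"{panel} assessors x {reps} replications -> n = {sample_size(panel, reps)}")
```

Under these assumptions, a 5-person panel needs 8 replications, while a 12-person panel needs only 3. Smaller panels reach the same n only by seeing each coffee more often, which is where fatigue starts to erode reliability.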

I can’t speak to anyone else’s cupping program, but I can say that at Cafe Imports we have an exceptionally skilled and well-supported sensory team, and there is next to zero chance that we would ever achieve the minimum n required for descriptive analysis outside of intentionally special circumstances (e.g., competitions, auctions, or other purpose-designed, closed tests). Staffing a team of trained cuppers in a dedicated sample room is hard. Staffing enough of them to perform ongoing, rolling, or on-demand descriptive analysis for any real volume or diverse range of coffees is much harder. The suggestion that any of us can do it just by adopting a special cupping form and maybe taking some classes is disingenuous.

Core Concepts

  1. Descriptive tests pay careful attention to sample size and presentation.
  2. They require independent data outputs and, relative to cupping, a high minimum number of responses.
  3. Response rates are achieved through panel size and presentation frequency.
  4. Assessors see replicated tests in part to account for errors of placement or timing and to facilitate statistics.
  5. Multiple assessors are used in part to account for personal biases and limitations and to facilitate statistics.
  6. Minimums for descriptive testing are critical and hard to achieve.

Mind the Gap

Making matters worse, or better, or more hilarious depending on your point of view, descriptive analysis is not only expensive and hard to implement, it’s arguably not even appropriate for specialty coffee, outside of specific scenarios. The cultural gulf between descriptive analysis (and much of sensory science generally) and specialty coffee is large. Understanding it is critical if we want to bridge it in a healthy way and draw on the very real benefits that sensory science has to offer us.

There’s a saying in NBA circles that availability is the best ability. The point is that conversations about who or what is best can only begin once the contestants are established as real participants. A player can have all the talent and upside in the world, but if they don’t play, whether because they don’t fit the system, because of injury, or for any other reason, they’re not in the conversation. The same goes for cupping. The best white room protocol that you can devise still falls short of a practical and effective sample room protocol that you can actually deploy.

We have to use our brains here to consider what we want to improve, what we want to retain, and what we want to jettison in specialty coffee. Reflecting on the development of the Coffee Rose back in early 2022 I wrote:

“In many ways, a strict sensory science approach (of the sort that I initially imagined) runs the risk of taking the “specialty” out of specialty coffee, at least for the specialists. Do we want that? By the same token, we have to ask ourselves if we’re okay with the “specialty” of specialty coffee meaning anecdotal, imprecise, and personal. As much as we may prefer to deny that it is these things, as much as we may want to project rigorous (and dare I say “scientific”) precision and validity, at bottom I suspect that in large part we are not quite willing to let these things go entirely. And beyond shoring up some fundamental issues, I’m not sure that we should be (Rich Content CATA for Specialty Coffee Cupping, 50).”

Sensory science has a lot to offer specialty coffee cupping. We don’t have to role-play its most obvious and least fitting parts (parts that, if implemented competently, will undermine core values of the industry that we’ve built). There are plenty of less flashy principles in sensory science that we can draw on to support and give structure to the anecdotal, imprecise, and personal foundations of specialty coffee. We can address the common sensory errors, improve our blinding, and introduce sample replication and independent presentation protocols.

We can do better than just apologizing for the humanity of specialty coffee and looking for sensory science band-aids for our perceived imperfections. We can celebrate, support, and strengthen the things that make coffee specialty. And we can use sensory science to do so as long as we don’t fall into the trap of devaluing the former in favor of the latter. Specialty coffee professionals have always been creative. We can reclaim that creativity and apply it to our understanding of sensory and specialty.

Consider that imprecision can still be accurate. Accuracy can become too accurate. An anecdotal and personal description may connect your audience with a coffee better than a technically descriptive one. We can become too myopic with our view of precision and accuracy. We might assume that our goals in cupping and sensory science should look like the target on the left:


There’s a lure to the terms accuracy and precision. Hitting the bullseye seems inherently correct. But what if the coffee that we’re tasting is more complex than that? What if it changes or shows different aspects over the course of a brew? What if the same cluster on the left target landed somewhere near the target’s outer edge? Could that not represent the repeated identification of a key point of interest, while failing to nail the coffee itself? In specialty coffee I think that could very well be the case, and I think that would be interesting.

What if the target in the middle, which is less precise, is actually more accurate to the coffee and not just to our concepts of accuracy and precision? What if the coffee lacks precision…or is exceptionally complex? In terms of being too myopic, cuppers often get stuck in the weeds debating the difference between two similar or conceptually related descriptions. Stepping back from the middle target, we might realize that it looks like the right target (without yet leaving the specialty realm).

The historical imprecision of specialty coffee description is almost certainly in need of a little refinement and tightening up. Even so, it may not have done specialty coffees, which are famously complex, as much of a disservice as is sometimes suggested.

At Cafe Imports we moved on from descriptive analysis in 2021 not only because its core application was out of our and many of our industry peers’ reach, but also because we found it to be poorly aligned with the deeper values and interests of our industry.

Descriptive and affective testing are interesting, and they do have potential applications within specialty coffee, but we need to be clear that they are better designed to do things like determine if too many people will notice when a company reformulates a product more cheaply, or help rush the release of a suitably innocuous RTD matcha before the trend dies. The contextual genesis, historical funding, goals, and application of these methods have influenced what they are, what they do, and how they work.

This doesn’t mean that we cannot create uses from them, but it does mean that we need to apply serious creativity and care when we pick them up. It was Dr. Ian Malcolm who said "Your ~~scientists~~ association leaders were so preoccupied with whether or not they could, they didn't stop to think if they should."

Even though we moved on from descriptive analysis, we didn’t throw the bath water out with the baby. Far from it. Our Coffee Rose cupping form and protocol draw on numerous learnings from sensory science, all applied with the goal of supporting our efforts in specialty coffee rather than supplanting them.

It’s a journey. Once you get past the Boromian dream of wielding sensory science to the glory of a new era for specialty coffee, you realize that we could instead just study sensory science to inform, improve, and support specialty coffee where appropriate. “Sensory Science is really an interesting field. Included in all the logic, statistics, and best practices is a foundational commitment to a project’s stakeholders and the realities of their business environment, along with a patient and iterative approach to discovering the best possible solution to the stated problem. In the real world of coffee assessment, there is rarely time for the sample sizing and replication common to most sensory science studies, which are highly oriented around statistical validation (Rich Content, 50).” Emphasis added, for emphasis.

There’s no reason to use cupping forms that don’t fit into your workflow or output the information that you need to know about coffees. You don’t have to imitate or adopt technical assessment forms to benefit from sensory science. You don’t have to remove the specialty from specialty coffee to use sensory science, but you can if you’re not careful.

Cupping Spoon #6


Ope.

One of the big take-aways here, no matter how you’re cupping, is that even without adopting poorly fitting sensory science tests, we can improve our processes by studying and addressing core sensory learnings like the common sensory errors (CSE). The CSEs are the troubleshooting guide for human tasters that you didn’t know you needed. They are readily available to read about and easy to see and test for yourself.

Identifying and addressing the common sensory errors in your tasting program is probably the single most powerful thing that you can do to improve it. It’s not flashy. It’s not dramatic. But it not only makes you better the moment you make the updates, it makes your entire team better.

What could my cupping/tasting outputs derive from other than the coffee attributes or treatments that I’m tasting for?


Core Concepts

  1. There are technical and cultural gaps between specialty coffee and sensory science.
  2. Misusing sensory science may be somewhat innocuous on its own, beyond wasting the time and money of participants.
  3. The appropriate application of inappropriate sensory science presents a risk to specialty coffee and specialty coffee people: (A) to the degree that emergent specialty properties and values are seen as sensory errors, and (B) to the degree that training and other principles are upheld to the exclusion of access and in ways that consolidate power.

Describing Coffee

Descriptive analysis requires both a descriptive assessment form and an appropriate assessment protocol-methodology. We’ve discussed the use of the term “descriptive” in “descriptive assessment”, as well as provided a general outline of the steps that go into setting up a descriptive test. Examples of descriptive methodology applications are linked in the first part of this series. We can also take a quick look at descriptive assessment as it pertains to specialty coffee.

Descriptive attributes for specialty coffee need to be more descriptive than “Flavor” and “Aftertaste” or “Fragrance” and “Aroma”. We need to understand both what “descriptive” generally means in application, and also how efficient or concentrated a given descriptive method is. Even if we focus on coffee attributes, failing to differentiate coffees based on those attributes means a test is not descriptive. Failing to measure the intensity of attributes drastically decreases descriptiveness. We can easily understand the gist of the “descriptive” in descriptive assessment by considering how we naturally talk about coffees. This is an area where the concept is not at all hard to understand.

If someone tells you that they’re excited about a coffee and says that “the flavor and aroma are really strong”... and that’s it, that’s not particularly descriptive in an area as broad and deep as specialty coffee. I think almost all of us would follow up by asking what those really strong flavors and aromas are.

Dude- “Bro. I just had this amazing coffee?”

Bro- “Oh, fr real Dude? What did it taste like?”

Dude- “Flavor.”

Bro- “Oh, no shhhhh. That’s my favorite kind of coffee.”

Dude- “Ya man. Not only that, but that flavor was like medium-high, like a 10.”

Bro- “10?! Out of 15? Oh hell yeah. Medium-high is awesome, but you know I’m more of a high-medium 8 - 9 guy. What about aroma? Did it?”

Dude- “You know it did, Bro. It totally had smells. Aroma and fragrance. Not only that, but after I swallowed it, I still tasted it.” 

Bro- “Daaaang. It aftertasted too? Call Best of Someone, that’s comp worthy.”

When you consider a descriptive cupping form as a tool specifically intended to describe and differentiate specialty coffees, saying that sample A has strong flavor and sample B has moderate flavor leaves much to be desired. One might think that adding elements like a free-text field and a low-resolution CATA would help. In practice they just clutter the UX and confuse the output. Such form elements still fail to differentiate coffees when the base resolution of the form is lacking. Remember the core concept from the first section: attributes must categorize products, and intensity must tie to that categorization, for quantification to be descriptive.

If sample A has intense jasmine and mild peanut and sample B has intense peanut and mild jasmine, most would agree that they are very different coffees. Unfortunately, a descriptive form that does not measure jasmine and peanut individually cannot distinguish them as flavors, and therefore fails to describe or differentiate coffees that share them.
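A toy comparison shows why per-attribute measurement matters. The intensities below are hypothetical values on an assumed 0-10 scale. The single “Flavor” score is modeled, as an assumption, as the strongest individual attribute. Summing the attributes instead would collapse the two samples in the same way.

```python
# Hypothetical intensities on an assumed 0-10 scale (illustrative only).
sample_a = {"jasmine": 8, "peanut": 2}
sample_b = {"jasmine": 2, "peanut": 8}
ATTRIBUTES = ("jasmine", "peanut")

def flavor_only_score(profile):
    """A coarse form that records one overall 'Flavor' intensity,
    modeled here (an assumption) as the strongest single attribute."""
    return max(profile.values())

def attribute_vector(profile, attributes):
    """A form that scores each attribute individually."""
    return tuple(profile.get(attr, 0) for attr in attributes)

# The flavor-only form sees no difference between the two coffees.
print(flavor_only_score(sample_a) == flavor_only_score(sample_b))  # True
# The per-attribute form separates them.
print(attribute_vector(sample_a, ATTRIBUTES), attribute_vector(sample_b, ATTRIBUTES))
```

The per-attribute vectors, (8, 2) and (2, 8), separate the coffees. The single “Flavor” score of 8 cannot.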

As an example, here’s what the modified QDA (quantitative descriptive analysis) cupping form that we designed and used from 2016 to 2021 looked like:

2016-2017 Cafe Imports' "Analytic Cupping Scorecard"

Already in 2016 we understood that it wasn’t enough to assess “Flavor” but rather that if we were going to be authentic to the tasting, quality discovery, and valuation experience we needed to assess flavors. We experimented with the addition of both an open text field and a few minor CATA options. Our form was somewhat able to solve the jasmine-peanut problem, but only with the use of the free text field and an acquired understanding of how we viewed the “Caramel” category. Very imperfect.

A step beyond the jasmine-peanut example, there was no way for our form to differentiate stone fruit and citrus. So while it could easily tell characteristic Yirgacheffes from other coffees, our form could not distinguish between those same Yirgs on its own basis. Note that the numbers in the CATA boxes were a value embedding system, and so had no descriptive use.

Despite the flaws, we began with this form to understand the process of building qualitative attributes into a quantitative form, so that our cuppers could focus on a more descriptive and coffee-oriented quantitative assessment. We introduced a measure for bitterness here, which quickly showed us the importance of uniform scaling and localized assessor training. Assessors were unfamiliar with explicitly assessing bitterness in coffee, let alone scaling it. We had also inverted the bitterness scale, and the two factors combined to make bitterness the attribute that people most often reported, and were most often observed, having trouble applying.

Importantly, this cupping form was still just a modified QDA. It, and we, suffered from versions of many of the problems already described. We used this modified QDA form for around 5 years, and we’ve learned a lot since. Note that we were not alone in running descriptive forms in 2016 and 2017. Around that same time I had the good fortune to meet Dr. Shawn Steiman and to participate in a competition under his direction, using his descriptive analysis form and protocol. His form differed from and surpassed my own in a number of substantive ways.

Dr. Steiman’s protocol had us (cuppers) establish target attributes and their scales prior to judging (as above), and then rate the proximity of each sample’s attributes to the target. If the target for the intensity of floral flavor was a 7, a coffee with a 6 was equal to a coffee with an 8, both being 1 unit away from the target. The description output in this case was twofold. We were able to describe each coffee by virtue of the intensity of its attributes, and we were also able to describe each coffee in terms of its overall attribute proximity to our target profile.
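The proximity scoring described above can be sketched in a few lines. Per-attribute distance is the absolute difference from the target, which reproduces the 6-versus-8 example. This section does not state how the protocol combined attributes into an overall proximity. The sum of absolute distances used below is therefore an assumption, and the target profile and coffee scores are hypothetical.

```python
def attribute_distance(observed: float, target: float) -> float:
    """Distance of one attribute from its target, regardless of direction."""
    return abs(observed - target)

def profile_distance(observed_profile: dict, target_profile: dict) -> float:
    """Overall proximity to the target profile. Summing absolute
    per-attribute distances is an ASSUMED aggregation rule."""
    return sum(attribute_distance(observed_profile[attr], target)
               for attr, target in target_profile.items())

# Hypothetical target profile and cupping scores.
target = {"floral": 7, "acidity": 6, "body": 5}
coffee_x = {"floral": 6, "acidity": 6, "body": 4}
coffee_y = {"floral": 8, "acidity": 4, "body": 5}

print(attribute_distance(6, 7), attribute_distance(8, 7))  # 1 1
ranked = sorted({"X": coffee_x, "Y": coffee_y}.items(),
                key=lambda item: profile_distance(item[1], target))
print([name for name, _ in ranked])  # ['X', 'Y']
```

Under this rule, coffee X sits 2 units from the target and coffee Y sits 3, so X ranks first. A different aggregation, such as squared distances, could reorder close calls.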

Once we had finished the preparatory work and were at the cupping table, Dr. Steiman’s form was fast and very intuitive to use. Its descriptive component was highly tuned. I do not think that the ultimate winner of the competition necessarily matched every judge’s favorite coffee. I do think that we all agreed it was the best match for what we, along with other stakeholders, had defined as the target profile at the outset. We also found that the subsequent place rankings were similarly good representations of each coffee’s proximity to that target.

Descriptive systems like this have a distinctive dual nature to consider. They simultaneously offer exceptional flexibility and nearly unyielding rigidity. It may seem obvious that if we get as fine-grained as “fish flavor” for our croquette form, there’s going to be a hard limit on the breadth of experiences that we can describe. At some point the form will become unwieldy and will detract from an assessor’s ability to generate good output.

On the other hand, a panel or an administrator can easily learn and adapt future targets. This adaptivity is a strength of descriptive analysis, even if it limits the throughput or “on-the-flyness” of the method. Within a category of similar products, descriptive forms can be honed to identify and describe very fine distinctions. Within a context of broad descriptive potential, discrete descriptive tests can be created to specifically address sections of the required breadth, as opposed to being made so general as to embrace all but catch none.

Dr. Steiman’s form was bespoke. Descriptive forms tend to be. It’s one of their great strengths and limitations. The terms of that bespoke creation are not accidental. As stakeholders we enter into an agreement over those terms. If we later find that we don’t agree with the output, that could just as likely be because the methodology worked exactly as agreed and we, the stakeholders, failed to understand the ramifications of the agreement or the process. We’re not stuck with it forever. We can create a new lexicon and form that looks at different things, or looks in a different way…if we’re willing to do the work of designing, setting up, and running another descriptive test.

Note that descriptive forms and lexicons can be pre-fab. The trade-off is that pre-fab forms are less targeted to both the products being tested and the assessors doing the testing. They are less descriptive. Even with pre-fab materials, descriptive panels remain subject to the other listed requirements. They likely also need to increase the robustness of their panel, screening, validation, and training regimens.

Cupping Spoon #7


Emperor Clothing

Cupping processes do not have to be complicated. In fact, they specifically should not be. A good cupping or tasting process, the cupping itself, will be straightforward and simple to understand and execute. As careful and involved as design and planning need to be, the tasting process itself should be as painless as possible. After an initial learning curve to develop familiarity, a tasting form or protocol should help focus you on your coffees, not on itself.

Taking Stock Q: Is there a fluency point to this form/protocol, or will I always feel like I’m trying to bend it toward my workflow, coffees, and coffee experiences?


Core Concepts

  1. Bro
  2. Dude
  3. Ya man
  4. Boots

Conclusion

Descriptive analysis testing is a widely accepted form of sensory evaluation. As suggested early in this paper, it is considered the gold standard for such evaluation. At the same time, it has achieved that status and recognition within a specific milieu. We cannot just set aside its foundations and context and assume that we can then proceed to use it unchanged as a “gold standard.” Much that we honor and treasure in specialty coffee is incompatible with descriptive analysis.

Descriptive analysis has a number of methods, most of which share common principles. Careful implementation of these principles is critical to the functioning of descriptive analysis. These can be challenging to achieve and maintain. Violation or exclusion of those principles must be countered intentionally. Free Choice Profiling is an example of a descriptive analysis technique that replaces many of these norms. Failure to account for omitted components will add cost and complexity to your cupping program while undermining its quality.

While the legwork surrounding descriptive analysis is rigorous, the actual performance of a good descriptive test should be relatively easy. These tests deliberately front-load preparation and process so that testers can focus as much as possible on performing the test. The testing form and procedure should not add appreciable drag to the user’s experience of testing coffees. They should describe and differentiate the coffees.

Descriptive testing methods are powerful tools for better and for worse. We do a disservice to ourselves and to sensory professionals by pretending that they’re easy to adopt but hard to understand. The reality is that they are easier to understand and much, much harder to adopt than many seem to think. If they are a fit for your cupping needs, it’s crucial to understand both the validity of a proposed process as well as its running and maintenance requirements. If they are not a fit, they can still offer valuable threads to follow and insights to glean along the path to improving your cupping program.

When we look at the CVA, we need to be clear about what it is, and what it is not. It is not a sensory science tool. It is not a food technologist tool. We can debate whether or not it is a specialty coffee tool. Q graders and instructors took the legacy form around the globe and made it into a specialty coffee tool, before it and their work were taken away from them. We will have to wait and see if the same plays out for the CVA. Without a doubt it is a specialty coffee association tool that is intended to assess and value coffee and specialty coffee people. This is ok, as long as we’re clear about what it is and what it does.

When we look at any cupping form or protocol, we need to have this same clarity. Missing technical distinctions can undermine every good faith effort that we make to apply new ideas, and can even take us in directions that we did not intend to go. Even the best examples of descriptive analysis in specialty coffee have limitations and requirements for setting and application. Within that scope these methods can be incredibly powerful.

However, that power comes with significant cultural concerns and introduces a real risk for the consolidation and gatekeeping of coffee value assessment. Making technical descriptive testing part of the standard for cupping specialty coffee also means making the requirements for technical descriptive testing part of the standard for cupping specialty coffee. Unless the specialty coffee industry wants to restrict coffee value assessment to a small group of technically trained specialists, as is more common in commodity and related food product markets, we need to take stock of our goals and capacities before adopting protocols that suggest that direction. If we are broadly unable or unwilling to meet the requirements for technical testing procedures ourselves, we should not adopt the protocols or even the facsimiles of the protocols that require them.

If we wish to use sensory science, we should take care to augment and support specialty coffee rather than undermine and supplant it. Sensory science offers both possibilities. We are non-technical users and the stewards of a specialty coffee industry that embodies a delicate, living, and emergent balance of human values and coffee qualities. As such, we must critically examine the terms and conditions of the techniques that we adopt.
