
LinkedUp: Linking Web Data for Education Project – Open Challenge in
Web-scale Data Integration
http://linkedup-project.eu/
Coordination and Support Action (CSA)
Grant Agreement No: 317620

D2.2.1

Evaluation Framework

Deliverable Coordinator: Drachsler, Hendrik
Deliverable Coordinating Institution: Open University of the Netherlands (OUNL)
Other Authors: Wolfgang Greller (OUNL), Slavi Stoyanov (OUNL)
Document Identifier: LinkedUp/2013/D2.2.1/v1.
Date due: 30.04.2013
Class Deliverable: LinkedUp 317620
Submission date: 26.04.2013
Project start date: November 1, 2012
Version: V0.5
Project duration: 2 years
State:
Distribution: Public

© Copyright lies with the respective authors and their institutions.

LinkedUp Support Action – 317620

LinkedUp Consortium
This document is part of the LinkedUp Support Action funded by the ICT Programme of the Commission of the European Communities under grant number 317620. The following partners are involved in the project:
Leibniz Universität Hannover (LUH)
Forschungszentrum L3S
Appelstrasse 9a
30169 Hannover
Germany
Contact person: Stefan Dietze
E-mail address: dietze@L3S.de

The Open University
Walton Hall, MK7 6AA
Milton Keynes
United Kingdom
Contact person: Mathieu d'Aquin
E-mail address: m.daquin@open.ac.uk

Open Knowledge Foundation Limited LBG
Panton Street 37,
CB2 1HL Cambridge
United Kingdom
Contact person: Sander van der Waal
E-mail address: sander.vanderwaal@okfn.org

ELSEVIER BV
Radarweg 29,
1043NX AMSTERDAM
The Netherlands
Contact person: Michael Lauruhn
E-mail address: M.Lauruhn@elsevier.com

Open Universiteit Nederland
Valkenburgerweg 177,
6419 AT Heerlen
The Netherlands
Contact person: Hendrik Drachsler
E-mail address: Hendrik.Drachsler@ou.nl

EXACT Learning Solutions SPA
Viale Gramsci 19
50121 Firenze
Italy
Contact person: Elisabetta Parodi
E-mail address: e.parodi@exactls.com

Work package participants
The following partners have taken an active part in the work leading to the elaboration of this
document, even if they might not have directly contributed to the writing of this document or its
parts:
- LUH
- OU
- EXT
- ELSV

Change Log

Version   Date         Amended by          Changes
0.1       15.03.2013   Hendrik Drachsler   Initial structure
0.2       25.03.2013   Hendrik Drachsler   Enrichment
0.3       26.03.2013   Wolfgang Greller    Minor corrections
0.3       01.04.2013   Hendrik Drachsler   Minor corrections
0.4       23.04.2013   Slavi Stoyanov      Reviewers' feedback incorporated
0.5       25.04.2013   Hendrik Drachsler   Minor corrections

Executive Summary
The main purpose of the current deliverable D2.2.1 is to capture the current version of the Evaluation Framework and to operationalise it into a concrete evaluation instrument for the LinkedUp challenge judges. This deliverable is not intended as an elaborate report but rather as a summary of the current version of the Evaluation Framework, based on the extensive studies in deliverable D2.1 – Evaluation Methods and Metrics. D2.2.1 will be reconsidered in the final report of WP2 to demonstrate the development of the Evaluation Framework during the life cycle of the LinkedUp project. For this purpose it is helpful to have the first version of the Evaluation Framework as a tangible outcome and a self-contained entity, as delivered in this document.


Table of Contents
1. Introduction
2. Overview of the first version of the Evaluation Framework
3. Evaluation of the LinkedUp scoring sheet
5. Conclusions
References
Appendix A – The LinkedUp scoring sheet
  

1. Introduction
The deliverable D2.1 – Evaluation Criteria and Metrics of Task 2.1 of WP2 describes the foundations for the first version of the LinkedUp Evaluation Framework (EF). This first version of the EF is based on a Group Concept Mapping (GCM) approach, which identified consensus about criteria and methods for the evaluation of Open Web Data applications in Education, and on a state-of-the-art analysis of available evaluation metrics.
The main purpose of the current deliverable D2.2.1 is to freeze the current version of the EF and to operationalise it into a concrete evaluation instrument for the LinkedUp challenge judges. The EF is one of the main outcomes of the FP7 LinkedUp project and will be further developed and improved throughout the duration of the project, especially after each round of a data competition in the LinkedUp Challenge (see D1.2). Therefore, this deliverable is not intended to be an elaborate report but rather a summary of the current version of the EF, which will be reconsidered in the final report of WP2 to demonstrate the development of the EF during the life cycle of the LinkedUp project. For this purpose it is important to have the first version of the EF as a tangible outcome and a self-contained entity, as delivered in this document.
In Task 2.2 – Validation of the evaluation criteria and methods of WP2 (DoW, p. 8), the EF will be further developed and amended according to the experiences collected in the three LinkedUp data competitions. These content validation steps of the EF after each data competition cycle are the main responsibility of WP2 in the LinkedUp project. Each of the content validation reviews will be reported in an amended version of D2.2.1 (D2.3.1, D2.3.2) and consolidated in the final version of the EF in deliverable D2.2.2.

2. Overview of the first version of the Evaluation Framework
The information shown in this section is based on the extensive analysis reported in D2.1. Before reporting on the main findings, we briefly describe the procedure for deriving the set of evaluation criteria and indicators, to give readers who are unfamiliar with D2.1 an idea of the background of the EF. The Evaluation Framework is based on an empirical study applying the Group Concept Mapping (GCM) approach. 57 experts generated 212 evaluation indicators. 26 experts then sorted the generated ideas into groups by similarity of meaning and rated the indicators on two values: priority and applicability. Multidimensional scaling and hierarchical cluster analysis identified six criteria. The LinkedUp Consortium discussed the results of the study. The final, shared vision of the Consortium is presented in Figure 1. The six criteria are: 1. Educational Innovation, 2. Usability, 3. Performance, 4. Data, 5. Legal aspects, and 6. Audience. In the following we shortly introduce each evaluation criterion and its aligned evaluation method.
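To illustrate the kind of data the GCM sorting step produces, the following Python sketch builds the indicator co-occurrence matrix on which multidimensional scaling and hierarchical cluster analysis typically operate. The indicator IDs and sorting data here are invented for illustration; this is not the study's actual analysis code.

```python
from itertools import combinations

def cooccurrence_matrix(sorts, n_items):
    """Build the GCM similarity matrix: entry [i][j] counts how many
    experts sorted indicators i and j into the same pile."""
    m = [[0] * n_items for _ in range(n_items)]
    for piles in sorts:                      # one expert's sorting = a list of piles
        for pile in piles:
            for i, j in combinations(pile, 2):
                m[i][j] += 1
                m[j][i] += 1
    return m

# Two hypothetical experts sorting four indicators (IDs 0-3) into piles.
sorts = [
    [[0, 1], [2, 3]],
    [[0, 1, 2], [3]],
]
m = cooccurrence_matrix(sorts, 4)
```

In the real study this matrix would be 212 × 212, aggregated over the 26 sorters, and then fed into multidimensional scaling before the clusters are cut.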


Figure 1: Comprehensive version of the LinkedUp Evaluation Framework based on the deliverable D2.1 of the LinkedUp
project.

Educational Innovation

‘Educational Innovation’ is based on a list of indicators that innovative educational tools should support, derived from an expert survey and a recent report of the Institute for Prospective Technological Studies (IPTS), an EC research institute. In the first version of the EF, judges of the data challenge will be able to check whether data applications address the set of indicators composing this criterion. In addition, we will ask the judges to provide a short statement on how innovative the application is and a rating on a scale from 1–5 stars.
Usability

‘Usability’ is a well-known and well-elaborated concept with clear evaluation indicators. There is also a wide range of standardised instruments that can be applied to measure this criterion. The two most applicable methods for the evaluation of the LinkedUp challenge are the Open Source Desirability Kit (Storm, 2012) and the SUS method (Tullis and Stetson, 2004). SUS is often used for comparing the usability of different software; it is quick to administer and yields a single benchmarking score on a scale of 0–100 that provides an objective indication of the usability of a tool. This makes it highly relevant for the LinkedUp challenge, especially in the later stages of the data competition, where more advanced systems are expected to be entered.
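The SUS score follows Brooke's standard scoring rule: each of the ten items is answered on a 1–5 scale, odd-numbered items contribute (response − 1), even-numbered items are reverse-scored as (5 − response), and the raw sum is multiplied by 2.5 to reach the 0–100 scale. A minimal sketch of that computation:

```python
def sus_score(responses):
    """Compute the System Usability Scale score from ten item
    responses, each on a 1-5 Likert scale (Brooke's standard scoring)."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for i, r in enumerate(responses):
        if not 1 <= r <= 5:
            raise ValueError("responses must be on a 1-5 scale")
        # Items 1, 3, 5, 7, 9 (even index): contribution is r - 1.
        # Items 2, 4, 6, 8, 10 (odd index): reverse-scored as 5 - r.
        total += (r - 1) if i % 2 == 0 else (5 - r)
    return total * 2.5  # scale the 0-40 raw sum to 0-100

print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # best-case answers: 100.0
```

The single 0–100 number is what makes SUS suitable for ranking competition entries against one another.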
The Desirability Kit is relatively easy for the judges to apply. However, it provides a more general description of user satisfaction with the tool rather than a comparative score. Nevertheless, this

approach might be very helpful for evaluating participants, especially in the open track, where no clear task is provided.
Performance

The ‘Performance’ criterion provides very clear measurement indicators derived from both the GCM study and the literature review. For the first version of the EF we will ask participants to report suitable indicators for their systems and ask the judges to review those descriptions.
For a future version of the EF we are considering developing a gold-standard benchmark based on the data pool of the LinkedUp project. Such a benchmark could build on standard algorithms, such as those included in the Mahout system [1], and provide clear metrics to the participants indicating where improvements by their tools are expected.
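As an illustration of the kind of clear metric such a benchmark could expose, the sketch below computes precision@k, a standard evaluation measure for the recommendation algorithms that ship with Mahout. The function and sample data are our own illustration, not part of the EF or of Mahout's API.

```python
def precision_at_k(recommended, relevant, k):
    """Precision@k: the fraction of the top-k recommended items
    that appear in the set of relevant (ground-truth) items."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)

# Hypothetical system output versus a gold-standard relevance set.
ranked = ["resource_a", "resource_b", "resource_c", "resource_d"]
gold = {"resource_a", "resource_c", "resource_e"}
score = precision_at_k(ranked, gold, 3)  # 2 of the top 3 are relevant
```

A gold-standard benchmark would fix the relevance sets in advance, so every participant's tool is scored against identical ground truth.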
Data

The indicators of the ‘Data’ criterion can be partly evaluated through statistics about the data sources used, a description of some of the indicators by the participants, and an evaluation of the same indicators by the judges of the LinkedUp challenge. For the first version of the EF, we are considering providing tick boxes to indicate whether certain information is supplied, open review fields, and a rating scale from 1–5 stars for the judges.
Legal and Privacy

Privacy was a very consistent cluster in the GCM study and was also rated as important by the LinkedUp consortium. We can populate the scoring sheet with specific questionnaire items reported in the related literature. The judges will then rate these items on ordinal and nominal scales.
Audience

Audience is a very relevant aspect of the LinkedUp competition, as we aim to promote Linked Data applications that have the potential to change current educational practices. An application can score very high on the technical aspects of data and user interface, but if it does not address the educational problems that learners, teachers and educational managers have, it is useless. User characteristics simply cannot be ignored when developing a linked educational data application. In addition, when looking at the impact of applications, we tend to value more highly those that address the issues of larger user groups. This analysis can easily be done with reports from common analytics tools (e.g. Google Analytics) or indicators from social media applications. Thus, for the evaluation of this criterion we expect the participants to provide indicators from analytics tools and to describe their future development and marketing plans. Finally, we will rely on the expertise of the judges to estimate the potential of the tool and the user scenario for the near future of 1–3 years.

3. Evaluation of the LinkedUp scoring sheet
Based on the first version of the LinkedUp EF, we created a scoring sheet in Google Forms [2] that allows an effective and efficient comparison of the judges’ ranked reviews of the participants’ performance in the LinkedUp challenge. The scoring sheet will support the members of the review board in evaluating the participating projects and awarding the cash prizes. Another advantage is that we can integrate survey-based instruments such as SUS for Usability and directly compute the SUS score for
[1] http://mahout.apache.org/
[2] https://docs.google.com/forms/d/1-LhIS_wmoQNKFHZvod1JFMCqm-o9EevaL7ABD6-aSl4/edit
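The comparison of judges' reviews that such a scoring sheet enables could, for example, aggregate the per-criterion 1–5 star ratings into a weighted overall ranking of the entries. The sketch below is purely illustrative: the weights, criterion names, and data structures are assumptions on our part, not part of the actual LinkedUp scoring sheet.

```python
def rank_entries(scores, weights):
    """Rank challenge entries by the weighted mean of the judges'
    1-5 star ratings per criterion (hypothetical aggregation rule)."""
    def overall(entry_scores):
        # Average the judges' ratings per criterion, then weight them.
        weighted = sum(weights[c] * sum(ratings) / len(ratings)
                       for c, ratings in entry_scores.items())
        return weighted / sum(weights[c] for c in entry_scores)
    return sorted(scores, key=lambda e: overall(scores[e]), reverse=True)

# Hypothetical judges' ratings for two entries across three criteria.
weights = {"innovation": 2, "usability": 1, "performance": 1}
scores = {
    "AppA": {"innovation": [5, 4], "usability": [3, 4], "performance": [4, 4]},
    "AppB": {"innovation": [3, 3], "usability": [5, 5], "performance": [3, 4]},
}
ranking = rank_entries(scores, weights)  # AppA's weighted mean is higher
```

Whatever rule the review board finally adopts, making it explicit like this is what allows the ranked reviews of different judges to be compared consistently.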

