GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3
Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics
[Digital Preservation]
“This project has received funding from the European Union’s Seventh
Framework Programme for research, technological development and
demonstration under grant agreement no. 601138”.
Choice of IE Technique
Anna-Grit Eggers (University of Goettingen)
Encapsulation Techniques: Features
Range of uses
• IE techniques cover a wide range of uses which differ regarding the:
•processing velocity
•required disk space
•location of storage
•accessibility and perceptibility of the payload (by human / by machine analysis)
•preservation level of the carrier / digital object
•processability of digital object and payload file formats and file sizes
•provided compression mechanisms.
Criteria
• We identified criteria to distinguish between the techniques.
• An encapsulation technique that fits a specific use
scenario can be chosen based on the technique-specific
characteristics of these criteria.
• Definition of criterion in this context:
A property or feature of information encapsulation techniques that can be
used to compare different techniques on the basis of the criterion
characteristics.
• For example:
•Robustness of the encapsulated information against processing of the carrier,
once it has been encapsulated with an algorithm.
•Perceptibility of the encapsulated information when the carrier is observed.
Criterion characteristics
Examples of characteristics for these criteria:
• The characteristic of a technique for the criterion “Robustness” can be:
• “robust” (“true”)
• “not robust” (“false”)
• The characteristic of a technique for the criterion “Perceptibility” can be:
• “visible”
• “not perceptible by humans”
• “detectable by computers”
• “not perceptible at all”.
• The assignment for the “Perceptibility” criterion is harder, because the threshold for
“human perceptibility” or “computer detectability” is blurred and not a
“true/false” value.
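One possible way to represent criteria and their characteristics in code is sketched below in Python. The class names, the `matching` helper and the three example profiles are illustrative assumptions, not part of the PERICLES tooling; the assigned values only mirror general statements made later in these slides. Perceptibility is stored as a set because its characteristics are not a simple true/false value.

```python
from dataclasses import dataclass
from enum import Enum


class Perceptibility(Enum):
    """The four perceptibility characteristics listed on the slide."""
    VISIBLE = "visible"
    NOT_PERCEPTIBLE_BY_HUMANS = "not perceptible by humans"
    DETECTABLE_BY_COMPUTERS = "detectable by computers"
    NOT_PERCEPTIBLE_AT_ALL = "not perceptible at all"


@dataclass(frozen=True)
class TechniqueProfile:
    """Characteristics of one IE technique (hypothetical structure)."""
    name: str
    robust: bool               # "Robustness": true / false
    perceptibility: frozenset  # "Perceptibility": one or more characteristics


HIDDEN_BUT_DETECTABLE = frozenset({
    Perceptibility.NOT_PERCEPTIBLE_BY_HUMANS,
    Perceptibility.DETECTABLE_BY_COMPUTERS,
})

# Illustrative profiles only; real assignments depend on the concrete algorithm.
PROFILES = [
    TechniqueProfile("visible watermark", True, frozenset({Perceptibility.VISIBLE})),
    TechniqueProfile("steganography", True, HIDDEN_BUT_DETECTABLE),
    TechniqueProfile("fragile invisible watermark", False, HIDDEN_BUT_DETECTABLE),
]


def matching(profiles, robust, wanted):
    """Return the names of techniques with the requested characteristics."""
    return [p.name for p in profiles
            if p.robust == robust and wanted in p.perceptibility]


robust_and_visible = matching(PROFILES, True, Perceptibility.VISIBLE)
```

Comparing techniques then reduces to comparing these characteristic values criterion by criterion.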
PERICLES - Choice of Information Encapsulation (IE) Technique
Encapsulation Techniques: User scenario
Creating a scenario
• IE techniques are used for a specific purpose in the context of a use
scenario.
• This usage scenario defines the features and characteristics that a
candidate IE technique should fulfil.
• The overall task is to find the best IE technique by capturing and evaluating
the scenario defined by the user.
• The user could aim to encapsulate messages or metadata with digital
files.
• The aim could also be to add legal information, ownership information,
corporate designs, or information to ensure the authenticity of a digital
object.
Creating a scenario (cont.)
• Some scenarios need a visible payload, others prefer to hide it.
• The most valuable information might be either the digital object or the payload
itself (the latter is often the case with steganographic messages).
Implementing a scenario
• Three procedures are required to implement the scenario:
•scenario capturing
•weighting
•decision calculation mechanism
• Capturing:
a questionnaire asks for the importance of each of a set of scenario criteria
for the given scenario. The criteria are chosen in a way that they can be
mapped to the features of the IE techniques.
• The number of investigated criteria correlates with the number of available
IE techniques: It should be high enough to distinguish between
all main techniques, but low enough that the user won’t be overwhelmed
while filling in the questionnaire.
Implementing a scenario
• Weighting:
Another crucial aspect is to ask the user
• which of the criteria are important for the scenario,
• how important they are,
• which should be excluded because they are not pertinent.
• The Analytic Hierarchy Process is a sophisticated but complex method:
criteria are compared to each other and the user has to decide for each
comparison which criterion is the more important one.
• A simpler approach is to include an option to exclude unimportant
criteria and add a weighting mechanism for the user to indicate how
desirable a characteristic of a criterion is, or how important it is for the
scenario.
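Both weighting approaches can be sketched in a few lines of Python. The function names, the criteria and all numbers below are hypothetical and are not taken from the PERICLES tools. `ahp_weights` uses the row geometric mean, a common approximation of the Analytic Hierarchy Process priorities. `rank_techniques` is the simpler approach: it drops excluded criteria and computes a weighted average of how well each technique fulfils the remaining ones.

```python
import math


def ahp_weights(criteria, comparisons):
    """Derive weights from pairwise comparisons.

    comparisons[(a, b)] states how many times more important a is than b.
    Missing pairs are filled with the reciprocal of the stored pair.
    """
    def ratio(a, b):
        if a == b:
            return 1.0
        if (a, b) in comparisons:
            return float(comparisons[(a, b)])
        return 1.0 / comparisons[(b, a)]

    n = len(criteria)
    geo_mean = {a: math.prod(ratio(a, b) for b in criteria) ** (1.0 / n)
                for a in criteria}
    total = sum(geo_mean.values())
    return {a: g / total for a, g in geo_mean.items()}


def rank_techniques(fulfilment, weights, excluded=frozenset()):
    """Rank techniques by the weighted average of their criterion scores.

    fulfilment[technique][criterion] is a score between 0 and 1.
    """
    active = {c: w for c, w in weights.items() if c not in excluded and w > 0}
    total = sum(active.values())
    if total == 0:
        raise ValueError("at least one criterion must remain")
    scores = {
        tech: sum(w * marks.get(c, 0.0) for c, w in active.items()) / total
        for tech, marks in fulfilment.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scenario: robustness matters three times as much as disk space,
# and hiding the payload is not pertinent, so that criterion is excluded.
fulfilment = {
    "packaging":       {"robustness": 0.5, "disk_space": 0.8, "hidden_payload": 0.0},
    "steganography":   {"robustness": 0.9, "disk_space": 1.0, "hidden_payload": 1.0},
    "metadata_fields": {"robustness": 1.0, "disk_space": 0.9, "hidden_payload": 0.0},
}
direct_weights = {"robustness": 3, "disk_space": 1, "hidden_payload": 2}
ranking = rank_techniques(fulfilment, direct_weights, excluded={"hidden_payload"})
pairwise = ahp_weights(["robustness", "disk_space"], {("robustness", "disk_space"): 3})
```

With only two criteria the pairwise route yields the same 3:1 weighting as the direct weights; the extra effort of pairwise comparison grows quickly as criteria are added, which is the complexity the slide refers to.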
Decision criteria
● There are two types of criteria to consider:
Technical criteria
• must be fulfilled to be able to use a
specific algorithm.
• File formats are an example of a
technical criterion, because some
algorithms can only be used for
specific file formats.
Scenario criteria
• depend on a usage scenario or the
user preferences.
Technical decision criteria
Technical criteria:
◦ File formats (for carrier files as well as the payload files)
◦ Number of files
◦ Capacity
Technical decision criteria
File formats
• Embedding algorithms usually support a set of file formats and cannot
be used on files in other formats.
• For packaging, the metadata has to be mapped to one of the standard
XML packaging formats. The created package reduces the risk of data
loss.
Number of files
• A growing number of files increases the risk of losing one of them.
• With more than one file, the files can help to identify which files
belong together by indicating the formats used. That
reduces the impact of file format obsolescence.
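Because technical criteria must be fulfilled before a technique can be used at all, they can be applied as a filter before any scenario weighting takes place. The Python sketch below illustrates this for the file format criterion. The technique names and the format table are invented for illustration, and treating packaging as format-independent is an assumption.

```python
from pathlib import Path

# Hypothetical table of supported carrier formats per technique.
# None means "no restriction on the carrier format" (an assumption here).
SUPPORTED_CARRIERS = {
    "lsb_steganography": {".png", ".bmp"},
    "metadata_fields": {".jpg", ".png", ".pdf"},
    "packaging": None,
}


def applicable_techniques(carrier, supported=SUPPORTED_CARRIERS):
    """Return the techniques whose format constraint the carrier file meets."""
    suffix = Path(carrier).suffix.lower()
    return sorted(name for name, formats in supported.items()
                  if formats is None or suffix in formats)


candidates = applicable_techniques("scan.PNG")
```

Only the techniques that pass this filter need to be ranked against the scenario criteria.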
Technical criteria (cont.)
Capacity
• Capacity is the message size constraint: the number of payload bits that can be
embedded by an embedding algorithm into a specific digital object.
• It is influenced by the data format and the method used. With some methods the
risk of damaging the carrier file increases, or the payload becomes visible, if the
message size is too big.
• While packaging methods have no limit for the size of the payload files,
embedding methods mostly have a maximum payload size.
• With invisible watermarking and steganographic embedding methods, the payload
not only becomes visible if the payload size is too high; the cost of algorithmic
calculations also increases strongly with the size of the payload.
• The use of an information frame can, in theory, scale well to large payload file
sizes. However, it might be unproductive if the frame outsizes the original
digital object. In such a case the use of a packaging method can be considered.
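As a worked example of the capacity criterion, the Python sketch below estimates the capacity of an embedding method that overwrites the least significant bit of each colour sample of an uncompressed image. The method, the function names and the image size are assumptions chosen for illustration; real algorithms reserve part of this space for headers or error correction.

```python
def lsb_capacity_bits(width, height, channels=3, bits_per_sample=1):
    """Upper bound on payload bits when bits_per_sample low bits are replaced."""
    return width * height * channels * bits_per_sample


def fits(payload: bytes, capacity_bits: int) -> bool:
    """Check the technical criterion: does the payload fit into the carrier?"""
    return len(payload) * 8 <= capacity_bits


# A 640 x 480 RGB image offers 640 * 480 * 3 = 921600 bits, i.e. 115200 bytes.
capacity = lsb_capacity_bits(640, 480)
```

A payload that fails this check would have to be compressed, split, or handled by a packaging method instead.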
Scenario criteria
● Processability and robustness
● Complexity (space/time) of the algorithm
● Used disk space of the output
● Restorability of the carrier
● Risk of data loss
● Perceptibility
● Location of the encapsulated information
● Spreading, standardization
● Security, confidentiality
● Authenticity
Processability
• To be re-usable, digital objects need to be processable normally by
applications, unhindered by their encapsulated information.
• Packaging techniques might require unpacking before processing
the digital object, thus consuming additional calculation power.
• Embedding techniques do not change the file format of a digital
object which therefore can usually be processed directly.
• A method that allows for encapsulated information to survive
processing steps is considered “robust”.
• In an ideal case the embedded metadata survives even file format
conversions.
Robustness
• Robustness can be strong or weak. Weak robustness implies an additional
extraction step to keep the metadata safe.
• In a scenario where the digital object is frequently viewed and
processed, the metadata has to be embedded with a robust method.
• Steganographic methods are often very robust:
• they take an attacker into account.
• the digital objects can be processed normally, because a usage restriction would betray the
hidden messages.
• Visible digital watermarks can be very robust.
• Imperceptible watermarks are often fragile or semi-fragile,
to allow the recognition of authenticity violations.
• The use of available metadata fields is a very robust method that can survive
even object conversions. This can be valid for information frames, too.
Complexity of algorithms
Parameters for calculating costs for algorithmic calculations and
resources for the encapsulation process:
● The time and space requirements related to the complexity of the algorithm
used for the encapsulation and recovery of the original digital object and
the environment information.
● This includes costs for validation calculations and for the unpacking
algorithms of packaging strategies.
● Big O notation can help to express the algorithm’s behaviour in relation to
the embedding payload size:
http://web.mit.edu/16.070/www/lecture/big_o.pdf
● Frequent use of a digital object and its metadata requires a faster
restoration time.
Complexity of algorithms (cont.)
• The time for decompression has to be added if the data was compressed.
• Packaging mostly needs a lot more time for this than embedding.
• With robust methods, an extraction is not necessary before the reuse of the
object.
• Editing of available metadata fields is integrated in many programs, and
therefore not very time-intensive.
• The extension of an information frame in itself is not necessarily
time-intensive, whereas the embedding method used on the frame might be.
Disk space requirements
• The difference between the disk space needed for the enriched digital object
and that needed for the original digital object can be an important parameter when
preserving a large amount of data.
• Some methods offer the possibility of compressing the data, so that disk
space can be saved.
• Packaging containers compress both payload and digital object.
• Embedding methods offering compression only compress the embedded
metadata and not the carrier.
• Packaging can save more disk space than embedding with compression.
• An integrity check is needed to detect possible damage during compression.
• Compression, decompression and validation require extra calculation time.
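The trade-off between saved disk space and the extra validation step can be illustrated with Python's standard `zlib` module. This is only a sketch: the `compress_payload` helper and the sample metadata are made up, and a real tool would choose its own compression format.

```python
import zlib


def compress_payload(payload: bytes) -> bytes:
    """Compress the payload and validate it with an immediate round trip."""
    compressed = zlib.compress(payload, 9)
    # Integrity check: decompressing must give back exactly the original bytes.
    if zlib.decompress(compressed) != payload:
        raise ValueError("round trip changed the payload")
    return compressed


# Repetitive, text-based metadata usually compresses well.
metadata = b"<creator>example</creator>\n" * 500
compressed = compress_payload(metadata)
saved_bytes = len(metadata) - len(compressed)
```

The round trip doubles as the validation mentioned above, and it is exactly the part that costs additional calculation time.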
Disk space requirements (cont.)
• Embedding methods mostly do not need much extra disk space.
• With steganography algorithms changing only single bits, the size of
the digital object remains constant. This method, however, limits the
capacity.
• The use of available metadata fields doesn’t need much disk space.
Compression can extend the capacity for these methods.
• Information frames need additional disk space proportional to the size of
the metadata files that should be stored.
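The point that single-bit changes leave the object size constant while limiting the capacity can be seen in a minimal least-significant-bit sketch in Python. It treats the carrier as a plain sequence of sample bytes, which is a simplifying assumption; real image or audio files have headers and encodings that must be respected, and the function names are invented here.

```python
def embed_lsb(carrier: bytes, payload: bytes) -> bytes:
    """Write the payload bits into the lowest bit of successive carrier bytes."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(carrier):
        raise ValueError("payload exceeds the capacity of the carrier")
    out = bytearray(carrier)
    for index, bit in enumerate(bits):
        out[index] = (out[index] & 0xFE) | bit  # replace only the lowest bit
    return bytes(out)


def extract_lsb(carrier: bytes, payload_length: int) -> bytes:
    """Read payload_length bytes back from the lowest bits of the carrier."""
    out = bytearray()
    for i in range(payload_length):
        byte = 0
        for sample in carrier[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (sample & 1)
        out.append(byte)
    return bytes(out)


carrier = bytes(range(256))            # stand-in for 256 raw samples
stego = embed_lsb(carrier, b"PERICLES")
recovered = extract_lsb(stego, 8)
```

The carrier grows by nothing, but each payload byte consumes eight carrier bytes, so 256 samples can hold at most 32 payload bytes.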
Restoration
• The encapsulation method has to ensure that the digital object and the
metadata can be restored.
• The digital object has to be restored to its original state unscathed.
• The integrity has to be verified by checksum comparisons.
• There are different levels of integrity, e.g. just ensuring that the significant
properties survive, or a bit-exact replica.
• The significant properties have to survive in any case.
• A validation of the whole object is often simpler than the validation of the
significant properties.
• It is highly improbable that packaging damages the digital object. A
validation is easy if the checksum was added to the metadata file.
Restoration (cont.)
• Not all algorithms that are used for embedding are completely
reversible.
• Reversible embedding algorithms often embed the information of
how to reverse them into a defined location of the digital object.
• If metadata is converted in the encapsulation, for example by
compression, encryption, or format conversion, it might be
necessary to validate the metadata.
• The methods using available metadata fields or information frames
offer easy restoration.
Risk of data loss
• The following factors increase the risk of damage to the digital object or the
metadata:
•encryption usage
•information hiding
•compression
•processing
•conversion of the digital object
• Packaging stores the metadata in separate files. This guarantees access to
embedded information which in turn may help identify the related digital data.
• Data containers used for packaging mostly have standard formats that are
not as vulnerable as non-standardised formats.
• At the same time the risk of data loss is higher for separated files when
unpacking.
• For some embedding methods object modifications are inevitable.
Visibility
• The term “Data Hiding” describes methods to embed information in such a way that it
is not perceptible by humans.
• Steganography and invisible digital watermarks are mostly detectable by
machines.
• For most preservation scenarios it is necessary to be able to detect the
encapsulated information.
• Data hiding increases the risk of losing the knowledge about the existence
of this data.
• To avoid this, the carrier can be tagged with a visible method.
• Packaging is always visible, whereas steganographic methods are usually
invisible.
Location
• The location where the metadata is encapsulated can be a decision
criterion:
•a separate file
•an exact location within a file
•the time dimension of an audio file
•the background noise.
Location (cont.)
• The location of storage is the main difference between packaging
and embedding methods:
•Packaging stores the information in a separate file, mostly in a standardised XML
format.
•Embedding stores the environment information directly in the digital object.
• The embedding into the background noise has no influence on the
significant properties and doesn't need additional disk space.
• For this, the noise has to be clearly identifiable, to prevent damage to
the digital object.
• Using available metadata fields, or an information frame, does not
influence the significant properties of the digital object directly.
• Some embedding methods store information by changing elements of
the object, e.g. by inverting single bits of an image pixel, or by using
an imperceptible frequency of audio files.
• Some data formats offer extra space for the storage of additional
information.
Security
• Some encapsulation tools offer security features, like encryption.
• If encryption requires a secret key for accessing the data, there is
a high risk of losing the data by losing the key.
• The preservation and re-use of confidential objects or encapsulated
information requires adequate care. Encryption makes sense for
this purpose.
• The confidentiality of steganographic methods is based on
withholding knowledge or authorisation, if no additional encryption is
used. In this respect it constitutes a very weak kind of confidentiality.
Authenticity
• Authenticity and integrity of the digital object and its environment
information are paramount for many usages.
• Authenticity can be important if the digital object has special legal
requirements.
• Fragile or semi-fragile digital watermarks can be used in some
cases to ensure the integrity of a delivery copy of an object. The
object is changed slightly by the application of the watermarking
algorithm.
• The mark would be destroyed if the file were altered; an
intact mark can therefore ensure that no third party changed the object.
• Authenticity plays a major role in the archive context, in which
the provenance and chain of custody of an object are also important.
Integrity
• To guarantee the integrity of a digital object, it is often kept apart in
its original state and context; all changes to the original are avoided.
• The BagIt directory structure can be used without applying an
additional packaging or compression method to prevent object
alterations.
• Metadata is added into other defined directories of the structure, so
that the digital object remains untouched, even when information is
added at a future date.
• The integrity of the encapsulated information can be verified by
adding the checksum of its originals to the restoration metadata.
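A minimal Python sketch of this idea follows: the digital object is copied unchanged into a BagIt-style directory, and its SHA-256 checksum is recorded in a manifest so that later alterations can be detected. The helper names `make_bag` and `verify_bag` are hypothetical, and the sketch covers only the bag declaration, the `data/` payload directory and one manifest, not the full BagIt specification.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_file, bag_dir):
    """Copy the object unchanged into data/ and record its checksum."""
    bag = Path(bag_dir)
    data_dir = bag / "data"
    data_dir.mkdir(parents=True)
    target = data_dir / Path(source_file).name
    shutil.copy2(source_file, target)
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    # Bag declaration and payload manifest live next to data/, not inside it.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8")
    (bag / "manifest-sha256.txt").write_text(
        f"{digest}  data/{target.name}\n", encoding="utf-8")
    return bag


def verify_bag(bag_dir):
    """Return True if every file listed in the manifest still matches."""
    bag = Path(bag_dir)
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        if hashlib.sha256((bag / rel_path).read_bytes()).hexdigest() != digest:
            return False
    return True
```

Additional metadata files can be placed beside `data/` at any later date without touching the object, and `verify_bag` confirms that the object itself is still bit-identical.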

More Related Content

PPT
The digital preservation technical context
PPTX
The Building of Thai Grid
PDF
A Survey Paper on an Integrated Approach for Privacy Preserving In High Dimen...
PDF
Secure Image Encryption Using Filter Bank and Addition Modulo 28 with Exclusi...
PDF
A Survey: Enhanced Block Level Message Locked Encryption for data Deduplication
PDF
1699 1704
PDF
PERFORMANCE ANALYSIS OF TEXT AND IMAGE STEGANOGRAPHY WITH RSA ALGORITHM IN CL...
PPTX
The PeriCAT Framework
The digital preservation technical context
The Building of Thai Grid
A Survey Paper on an Integrated Approach for Privacy Preserving In High Dimen...
Secure Image Encryption Using Filter Bank and Addition Modulo 28 with Exclusi...
A Survey: Enhanced Block Level Message Locked Encryption for data Deduplication
1699 1704
PERFORMANCE ANALYSIS OF TEXT AND IMAGE STEGANOGRAPHY WITH RSA ALGORITHM IN CL...
The PeriCAT Framework

Similar to PERICLES - Choice of Information Encapsulation (IE) Technique (20)

PDF
Preservation Planning: Choosing a suitable digital preservation strategy
PDF
Benchmarks for Digital Preservation tools. Kresimir Duretec, Artur Kulmukhame...
PDF
File compression sunzip (huffman algorithm)
PDF
Andrew Waugh presentation
PDF
Advanced image processing notes ankita_dubey
PPT
Andrew Waugh presentation
PDF
Avoiding Over Design and Under Design
PDF
International Journal of Engineering Research and Development (IJERD)
PDF
International Journal of Engineering Research and Development (IJERD)
PPT
Know the user
PPT
InSPECT Significant Properties Framework (SPs part 2), by Stephen Grace and G...
PDF
DURAARK presentation at DEDICATE final seminar, October 21st 2013, Michelle L...
PDF
Invisible water marking within media files using state of-the-art technology
PDF
Ijeee 16-19-digital media hidden data extracting
PDF
Oos Short Q N
Preservation Planning: Choosing a suitable digital preservation strategy
Benchmarks for Digital Preservation tools. Kresimir Duretec, Artur Kulmukhame...
File compression sunzip (huffman algorithm)
Andrew Waugh presentation
Advanced image processing notes ankita_dubey
Andrew Waugh presentation
Avoiding Over Design and Under Design
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
Know the user
InSPECT Significant Properties Framework (SPs part 2), by Stephen Grace and G...
DURAARK presentation at DEDICATE final seminar, October 21st 2013, Michelle L...
Invisible water marking within media files using state of-the-art technology
Ijeee 16-19-digital media hidden data extracting
Oos Short Q N
Ad

More from PERICLES_FP7 (20)

PPTX
Digital Ecosystem and Process Compiler - IDCC17
PPTX
Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...
PPTX
Technical appraisal and change impact analysis - IDCC17 workshop
PDF
ForgetIT: human memory inspired Information Model
PPTX
Data quality, preservation and access: a DANS perspective
PPTX
Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...
PPTX
Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...
PPTX
Detecting Semantic Drift for ontology maintenance - Acting on Change 2016
PPTX
Filling the Digital Preservation Gap - Acting on Change
PPTX
Risk assessment for preservation in the active life of complex digital object...
PPTX
Technical Appraisal Tool, MICE - Acting on Change 2016
PPTX
PERICLES Workflow for the automated updating of Digital Ecosystem Models with...
PDF
Capability gap - Preservation isn't just throwing tools at the problem - Acti...
PPTX
Automatic policy application and change management - Acting on Change 2016
PPTX
Reproducibile scientific workflows - Acting on Change 2016
PPTX
Pro-active solutions for higher reproducibility of scientific experiments - A...
PPTX
PERICLES Policy management & ontology supported preservation - Acting on Chan...
PPTX
PERICLES Modelling Policies - Acting on Change 2016
PPTX
PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016
PPTX
PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...
Digital Ecosystem and Process Compiler - IDCC17
Technical Appraisal of Complex Digital Objects in Evolving Environments - IDC...
Technical appraisal and change impact analysis - IDCC17 workshop
ForgetIT: human memory inspired Information Model
Data quality, preservation and access: a DANS perspective
Proactive Evolution management in Data-centric SW ecosystems - Acting on Chan...
Digital Preservation in the era of Big Data - The Diachron Platform - Acting ...
Detecting Semantic Drift for ontology maintenance - Acting on Change 2016
Filling the Digital Preservation Gap - Acting on Change
Risk assessment for preservation in the active life of complex digital object...
Technical Appraisal Tool, MICE - Acting on Change 2016
PERICLES Workflow for the automated updating of Digital Ecosystem Models with...
Capability gap - Preservation isn't just throwing tools at the problem - Acti...
Automatic policy application and change management - Acting on Change 2016
Reproducibile scientific workflows - Acting on Change 2016
Pro-active solutions for higher reproducibility of scientific experiments - A...
PERICLES Policy management & ontology supported preservation - Acting on Chan...
PERICLES Modelling Policies - Acting on Change 2016
PERICLES Ecosystem Modelling (NCDD use case) - Acting on Change 2016
PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...
Ad

Recently uploaded (20)

PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
DOCX
The AUB Centre for AI in Media Proposal.docx
PPT
Teaching material agriculture food technology
PDF
Empathic Computing: Creating Shared Understanding
PPTX
Programs and apps: productivity, graphics, security and other tools
PPTX
MYSQL Presentation for SQL database connectivity
PPTX
Spectroscopy.pptx food analysis technology
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
Approach and Philosophy of On baking technology
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
cuic standard and advanced reporting.pdf
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPTX
Big Data Technologies - Introduction.pptx
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
The AUB Centre for AI in Media Proposal.docx
Teaching material agriculture food technology
Empathic Computing: Creating Shared Understanding
Programs and apps: productivity, graphics, security and other tools
MYSQL Presentation for SQL database connectivity
Spectroscopy.pptx food analysis technology
Mobile App Security Testing_ A Comprehensive Guide.pdf
Spectral efficient network and resource selection model in 5G networks
Building Integrated photovoltaic BIPV_UPV.pdf
Dropbox Q2 2025 Financial Results & Investor Presentation
Approach and Philosophy of On baking technology
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
cuic standard and advanced reporting.pdf
Advanced methodologies resolving dimensionality complications for autism neur...
Big Data Technologies - Introduction.pptx
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
The Rise and Fall of 3GPP – Time for a Sabbatical?
Diabetes mellitus diagnosis method based random forest with bat algorithm

PERICLES - Choice of Information Encapsulation (IE) Technique

  • 1. GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3 Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics [Digital Preservation] “This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no601138”. Choice of IE Technique Anna-Grit Eggers (University of Goettingen)
  • 3. • IE techniques cover a wide range of uses which differ regarding the: •processing velocity •required disk space •location of storage •accessibility and perceptibility of the payload (by human / by machine analysis) •preservation level of the carrier / digital object •processability of digital object and payload file formats and file sizes •provided compression mechanisms. Range of uses
  • 4. • We identified criteria to distinguish between the techniques. Criteria • An encapsulation technique that fits for a specific use scenario can be chosen based on the technique specific characteristics of these criteria. • Definition of criterion in this context: A property or feature of information encapsulation techniques that can be used to compare different techniques on the basis of the criterion characteristics. • For example: •Robustness of encapsulated information after encapsulation with an algorithm towards processing of the carrier. •Perceptibility of the encapsulated information by an observation of the carrier.
  • 5. Examples of characteristics for this example: • The characteristic of a technique for the criterion “Robustness” can be: • “robust” (“true”) • “not robust” (“false”) • The characteristic of a technique for the criterion “Perceptibility” can be: • “visible” • “not perceptible by humans” • “detectable by computers” • “not perceptible at all”. Criterion characteristics • The assignment for this criterion is harder, because the threshold for “human perceptibility” or “computer detectability” is blurred and not a “true/false” value.
  • 8. • IE techniques are used for a specific purpose in the context of a use scenario. • This usage scenario defines features and characteristics that are desired to be fulfilled by a potential IE technique. • The overall task is to find the best IE technique by capturing and evaluating the scenario defined by the user. • The user could aim for encapsulating messages or metadata with digital files. • The aim could also be to add legal information, ownership information, corporate designs, or information to ensure the authenticity of a digital object. Creating a scenario
  • 9. • Some need a visible payload, others prefer to hide it. • The most valuable information might be the digital object or the payload itself (this is often the case with steganographic messages). Creating a scenario (cont.)
  • 10. • Three procedures are required to implement the scenario: •scenario capturing •weighting •decision calculation mechanism Implementing a scenario • Capturing: a questionnaire which requests the importance of a set of scenario criteria for the given scenario. The criteria are chosen in a way that they can be mapped to the features of the IE techniques. • The amount of investigated criteria correlates with the amount of available IE techniques: It should be high enough to be able to distinguish between all main techniques, but low enough that the user won’t be overwhelmed while filling the questionnaire.
  • 11. • Weighting: Another crucial aspect is to ask the user • which of the criteria are important for the scenario • how important they are, • which should be excluded because not pertinent. Implementing a scenario • The Analytic Hierarchy Process is a sophisticated but complex method: criteria are compared to each other and the user has to decide for each comparison which criterion is the more important one. • A simpler approach is to include an option to exclude unimportant criteria and add a weighting mechanism for the user to indicate how desirable a characteristic of a criterion is, or how important it is for the scenario.
  • 12. ● There are two types of criteria to consider: Decision criteria • must be fulfilled to be able to use a specific algorithm. • File formats are an example of a technical criterion, because some algorithms can only be used for specific file formats. Technical criteria • depend on a usage scenario or the user preferences. Scenario criteria
  • 13. Technical criteria: ◦ File formats (for carrier files as well as the payload files) ◦ Number of files ◦ Capacity Technical decision criteria
  • 14. File formats • Embedding algorithms usually supports a set of file formats and cannot be used on files with the wrong formats. • For packaging, the metadata has to be mapped to one of the standard XML packaging formats. The created packet reduces the risk of data loss. Technical decision criteria Number of files • A growing number of files increases the risk of losing one of them. • With more than one file, the files can facilitate the identification process of the belonging files by providing indications for the used formats. That reduces the impact of file format obsolescence.
  • 15. Capacity • Capacity is the message size constraint: the number of payload bits that can be embedded by an embedding algorithm into a specific digital object. Technical criteria (cont.) • It is influenced by the data format and used method. Some methods increase the risk of damaging the carrier file, or the payload becomes visible if the message size is too big. • While packaging methods have no limit for the size of the payload files, embedding methods mostly have a maximum payload size. • Invisible watermarking and steganography embedding methods not only become visible, if the payload size is too high, the cost for algorithmic calculations will also strongly increase with the size of the payload. • The use of an information frame can scale theoretically well for big payload file size. Though it might be unproductive, if the data frame outsizes the original digital object. In such a case the use of a packaging method can be considered
  • 16. ● Processability and robustness ● Complexity (space/time) of the algorithm ● Used disk space of the output ● Restorability of the carrier ● Risk of data loss ● Perceptibility ● Location of the encapsulated information ● Spreading, standardization ● Security, confidentiality ● Authenticity Scenario criteria
  • 17. • To be re-usable, digital objects need to be processable normally by applications unhindered by its encapsulated information. Processability • Packaging techniques might require unpacking before processing the digital object, thus consuming additional calculation power. • Embedding techniques do not change the file format of a digital object which therefore can usually be processed directly. • A method that allows for encapsulated information to survive processing steps is considered “robust”. • In an ideal case the embedded metadata survives even file format conversions.
  • 18. • Robustness can be strong or weak. Weakness implies an additional extraction step to keep the metadata safe. Robustness • In a scenario where the digital object is frequently viewed and processed, the metadata has to be embedded with a robust method. • Steganographic methods are often very robust: • they take an attacker into account. • the digital objects can be processed normally, because a usage restriction would betray the hidden messages. • Visible digital watermarks can be very robust. • Imperceptible watermarks are often fragile or semi fragile, to allow the recognition of authenticity violations. • The use of available metadata fields is a very robust method that allows even object conversions. This can be valid for information frames, too.
  • 19. Parameters for calculating costs for algorithmic calculations and resources for the encapsulation process: Complexity of algorithms ● The time and space requirements related to the complexity of the algorithm used for the encapsulation and recovery of the original digital object and the environment information ● This includes costs for validation calculations and for the unpacking algorithms of packaging strategies. ● Big O notation can help to express the algorithms behaviour in relation to the embedding payload size: http://web.mit.edu/16.070/www/lecture/big_o.pdf ● Frequent use a digital object and its metadata requires faster the restoration time.
  • 20. • The time for decompression has to be added if the data was compressed. • Packaging mostly needs a lot more time for this than embedding. • With robust methods, an extraction is not necessary before the reuse of the object. • Edition of available metadata fields is integrated in many programs, and therefore not very time intensive. • The extension of an information frame in itself is not necessarily time intensive, whereas the embedding method used on the frame might. Complexity of algorithms (cont)
  • 21. Disk space requirements • The difference in disk space between the enriched digital object and the original digital object can be an important parameter when preserving a large amount of data. • Some methods offer the possibility of compressing the data, so that disk space can be saved. • Packaging containers compress both the payload and the digital object. • Embedding methods that offer compression only compress the embedded metadata and not the carrier. • Packaging can save more disk space than embedding with compression. • An integrity check is needed to detect possible damage during compression. • Compression, decompression and validation require extra calculation time.
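The packaging case above can be illustrated with a minimal sketch, assuming a ZIP container with DEFLATE compression built with Python's standard `zipfile` module. The member names `object.bin` and `metadata.xml` and the toy data are hypothetical; the slides do not prescribe a container format.

```python
import io
import zipfile


def package_compressed(digital_object: bytes, metadata: bytes) -> bytes:
    """Pack the digital object and its metadata into one DEFLATE-compressed ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as container:
        container.writestr("object.bin", digital_object)    # hypothetical member name
        container.writestr("metadata.xml", metadata)        # hypothetical member name
    return buffer.getvalue()


def container_is_undamaged(container_bytes: bytes) -> bool:
    """Integrity check: testzip() re-reads every member and compares its stored CRC-32."""
    with zipfile.ZipFile(io.BytesIO(container_bytes)) as container:
        return container.testzip() is None


def unpack(container_bytes: bytes) -> tuple:
    """Unpacking step that packaging requires before the object can be reused."""
    with zipfile.ZipFile(io.BytesIO(container_bytes)) as container:
        return container.read("object.bin"), container.read("metadata.xml")


# Toy, highly redundant data, so that compression has a visible effect.
digital_object = b"ABCD" * 10_000
metadata = b"<environment><software>viewer</software></environment>" * 50

container = package_compressed(digital_object, metadata)
saved_bytes = len(digital_object) + len(metadata) - len(container)
```

Here both the object and the metadata are compressed together, and the unpacking and CRC check are the extra calculation steps mentioned on the slide. How much space is actually saved depends entirely on how compressible the real data is.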
  • 22. Disk space requirements (cont.) • Embedding methods mostly do not need much extra disk space. • With steganography algorithms that change only single bits, the size of the digital object remains constant. This method, however, limits the capacity. • The use of available metadata fields doesn’t need much disk space. Compression can extend the capacity for these methods. • Information frames need additional disk space proportional to the size of the metadata files that should be stored.
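The trade-off between constant size and limited capacity can be sketched with a simple least-significant-bit (LSB) scheme. This is an illustrative assumption, not a technique named in the slides: the carrier is modelled as a plain sequence of 8-bit samples, and one payload bit is stored per sample.

```python
def embed_lsb(carrier: bytes, payload: bytes) -> bytes:
    """Write the payload bits into the least significant bit of successive samples."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(carrier):
        # Capacity limit: one payload bit per carrier sample.
        raise ValueError("payload exceeds the capacity of the carrier")
    stego = bytearray(carrier)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(stego: bytes, payload_length: int) -> bytes:
    """Rebuild payload_length bytes from the least significant bits."""
    payload = bytearray()
    for start in range(0, payload_length * 8, 8):
        value = 0
        for sample in stego[start:start + 8]:
            value = (value << 1) | (sample & 1)
        payload.append(value)
    return bytes(payload)


carrier = bytes(range(256)) * 4      # 1024 toy samples standing in for pixel values
payload = b"format=TIFF"             # hypothetical metadata snippet
stego = embed_lsb(carrier, payload)
```

The enriched object has exactly the size of the carrier, but this toy carrier can hold at most 128 payload bytes. The extractor also needs to know the payload length, which a real scheme would have to store or signal somewhere.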
  • 23. Restoration • The encapsulation method has to ensure that the digital object and the metadata can be restored. • The digital object has to be restored unscathed in its original state. • The integrity has to be verified by checksum comparisons. • There are different levels of integrity, e.g. ensuring only that the significant properties survive, or requiring a bit-exact replica. • The significant properties have to survive in any case. • A validation of the whole object is often simpler than the validation of the significant properties. • It is highly improbable that packaging damages the digital object. A validation is easy if the checksum was added to the metadata file.
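A bit-exact validation by checksum comparison can be sketched as follows. SHA-256 and the dictionary layout of the metadata record are assumptions chosen for illustration; the slides do not name a checksum algorithm or a metadata format.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def is_bit_exact(restored: bytes, recorded_checksum: str) -> bool:
    """True only if the restored object matches the checksum recorded before encapsulation."""
    return checksum(restored) == recorded_checksum


original = b"example digital object"

# Hypothetical metadata record written at encapsulation time.
restoration_metadata = {"algorithm": "sha256", "checksum": checksum(original)}

restored_ok = is_bit_exact(original, restoration_metadata["checksum"])
restored_damaged = is_bit_exact(original + b"\x00", restoration_metadata["checksum"])
```

This covers only the strictest integrity level, the bit-exact replica. Checking that significant properties survive would need format-specific comparison that a plain checksum cannot provide.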
  • 24. Restoration (cont.) • Not all algorithms that are used for embedding are completely reversible. • Reversible embedding algorithms often embed the information on how to reverse them into a defined location of the digital object. • If metadata is converted during encapsulation, for example by compression, encryption, or format conversion, it might be necessary to validate the metadata. • The methods using available metadata fields or information frames offer easy restoration.
  • 25. Risk of data loss • The following factors increase the risk of damage to the digital object or the metadata: •encryption usage •information hiding •compression •processing •conversion of the digital object • Packaging stores the metadata in separate files. This guarantees access to the encapsulated information, which in turn may help identify the related digital data. • Data containers used for packaging mostly have standard formats that are not as vulnerable as non-standardised formats. • At the same time, the risk of data loss is higher for separated files when unpacking. • For some embedding methods, object modifications are inevitable.
  • 26. Visibility • The term ’Data Hiding’ describes methods to embed information in such a way that it is not perceptible by humans. • Steganography and invisible digital watermarks are mostly detectable by machines. • For most preservation scenarios it is necessary to be able to detect the encapsulated information. • Data hiding increases the risk of losing the knowledge about the existence of this data. • To avoid this, the carrier can be tagged with a visible method. • Packaging is always visible, whereas steganographic methods are usually invisible.
  • 27. Location • The location where the metadata is encapsulated can be a decision criterion: •a separate file •the exact location in a file •the time dimension of an audio file •the background noise. • The location of storage is the main difference between packaging and embedding methods: •Packaging stores the information in a separate file, mostly in a standardised XML format. •Embedding stores the environment information directly in the digital object.
  • 28. Location (cont.) • Embedding into the background noise has no influence on the significant properties and doesn't need additional disk space. • For this, the noise has to be clearly identifiable, to prevent damage to the digital object. • Using available metadata fields or an information frame does not influence the significant properties of the digital object directly. • Some embedding methods store information by changing elements of the object, e.g. by inverting single bits of an image pixel, or by using an imperceptible frequency of audio files. • Some data formats offer extra space for the storage of additional information.
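As one concrete case of a format that offers extra space, the PNG format defines `tEXt` chunks for keyword/value text. The following is a minimal sketch using only `struct` and `zlib`; the 1×1 test image and the stored value are hypothetical, and the helper assumes a well-formed PNG whose first chunk is `IHDR`.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialise one PNG chunk: length, type, data, CRC-32 over type and data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for every chunk in a PNG byte string."""
    offset = len(PNG_SIGNATURE)
    while offset < len(png):
        (length,) = struct.unpack(">I", png[offset:offset + 4])
        yield png[offset + 4:offset + 8], png[offset + 8:offset + 8 + length]
        offset += 12 + length  # 4 length + 4 type + 4 CRC bytes plus the data


def add_text_field(png: bytes, keyword: str, text: str) -> bytes:
    """Insert a tEXt chunk directly after IHDR, leaving all other chunks untouched."""
    ihdr_end = len(PNG_SIGNATURE) + 12 + 13  # IHDR always carries 13 data bytes
    data = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    return png[:ihdr_end] + make_chunk(b"tEXt", data) + png[ihdr_end:]


def read_text_fields(png: bytes) -> dict:
    """Collect all tEXt keyword/value pairs."""
    fields = {}
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            fields[keyword.decode("latin-1")] = text.decode("latin-1")
    return fields


# Hypothetical 1x1 greyscale test image built in memory.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
png = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
       + make_chunk(b"IDAT", zlib.compress(b"\x00\x7f"))
       + make_chunk(b"IEND", b""))

enriched = add_text_field(png, "Software", "example viewer 1.0")
```

The image data chunk is byte-identical before and after, so the significant properties are not touched, and the file grows only by the size of the added chunk.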
  • 29. Security • Some encapsulation tools offer security features, like encryption. • If encryption requires a secret key for accessing the data, there is a high risk of losing the data by losing the key. • The preservation and re-use of confidential objects or encapsulated information requires adequate care. Encryption makes sense for this purpose. • If no additional encryption is used, the confidentiality of steganographic methods is based only on the retention of knowledge or authorisation. This constitutes a very weak kind of confidentiality.
  • 30. Authenticity • Authenticity and integrity of the digital object and its environment information are paramount for many usages. • Authenticity can be important if the digital object has special legal requirements. • Fragile or semi-fragile digital watermarks can be used in some cases to ensure the integrity of a delivery copy of an object; the watermarking algorithm changes the object slightly in the process. • The mark is destroyed if the file is altered, so an intact mark can show that no third party changed the object. • Authenticity plays a major role in the archive context, in which the provenance and chain of custody of an object are also important.
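The idea of a fragile mark can be sketched with a deliberately simple scheme, chosen here for illustration and not taken from the slides: a SHA-256 digest of the samples (with their least significant bits cleared) is written into the least significant bits of the first 256 samples. The carrier is again modelled as a sequence of 8-bit samples.

```python
import hashlib

MARK_BITS = 256  # length of a SHA-256 digest in bits


def _digest_bits(samples: bytes) -> list:
    """Bits of the SHA-256 digest over the samples with their lowest bits cleared."""
    digest = hashlib.sha256(bytes(s & 0xFE for s in samples)).digest()
    return [(byte >> shift) & 1 for byte in digest for shift in range(7, -1, -1)]


def apply_fragile_mark(samples: bytes) -> bytes:
    """Store the digest bits in the lowest bit of the first 256 samples."""
    if len(samples) < MARK_BITS:
        raise ValueError("carrier too small for the mark")
    marked = bytearray(samples)
    for index, bit in enumerate(_digest_bits(samples)):
        marked[index] = (marked[index] & 0xFE) | bit
    return bytes(marked)


def mark_is_intact(samples: bytes) -> bool:
    """Recompute the digest and compare it with the bits stored in the object."""
    if len(samples) < MARK_BITS:
        return False
    return all((samples[i] & 1) == bit for i, bit in enumerate(_digest_bits(samples)))


samples = bytes(range(256)) * 4          # toy carrier
marked = apply_fragile_mark(samples)     # delivery copy, changed by at most 1 per sample

tampered = bytearray(marked)
tampered[500] ^= 0x10                    # a third party alters one sample
tampered = bytes(tampered)
```

Marking changes each sample by at most one, and altering a sample afterwards breaks the mark. This sketch uses no secret key, so anyone could re-apply the mark after a change, and changes confined to the lowest bits outside the first 256 samples go unnoticed. A scheme meant to resist deliberate manipulation would need a keyed construction.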
  • 31. Integrity • To guarantee the integrity of a digital object, it is often kept apart in its original state and context, and all changes to the original are avoided. • To prevent object alterations, the BagIt directory structure can be used without applying an additional packaging or compression method. • Metadata is added into other defined directories of the structure, so that the digital object remains untouched, even when information is added at a future date. • The integrity of the encapsulated information can be verified by adding the checksums of the originals to the restoration metadata.
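A minimal sketch of such a directory layout, loosely following the BagIt convention of a `data/` payload directory, a `bagit.txt` declaration and a checksum manifest, might look as follows. The `metadata/` directory name, the file names and the contents are hypothetical, and the result is not validated against the BagIt specification.

```python
import hashlib
import tempfile
from pathlib import Path


def write_minimal_bag(bag_dir: Path, name: str, digital_object: bytes, metadata: bytes) -> None:
    """Lay out a BagIt-style directory; the payload is stored unmodified under data/."""
    (bag_dir / "data").mkdir(parents=True)
    (bag_dir / "metadata").mkdir()                      # hypothetical tag directory
    (bag_dir / "data" / name).write_bytes(digital_object)
    (bag_dir / "metadata" / "environment.xml").write_bytes(metadata)
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")
    digest = hashlib.sha256(digital_object).hexdigest()
    (bag_dir / "manifest-sha256.txt").write_text(
        f"{digest}  data/{name}\n", encoding="utf-8")


def payload_is_intact(bag_dir: Path) -> bool:
    """Recompute every payload checksum and compare it with the manifest entry."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, relative_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / relative_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp) / "example_bag"
    write_minimal_bag(bag, "report.pdf", b"%PDF-1.4 example payload", b"<environment/>")

    # Adding information later touches only the metadata directory.
    (bag / "metadata" / "added_later.xml").write_bytes(b"<note/>")
    intact_after_adding_metadata = payload_is_intact(bag)

    # Any change to the payload itself is detected through the manifest.
    (bag / "data" / "report.pdf").write_bytes(b"altered")
    intact_after_tampering = payload_is_intact(bag)
```

No container or compression step is involved: the digital object sits in the directory as-is, and metadata added later does not affect the payload check.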