US 20030145022A1
(19) United States
(12) Patent Application Publication
(10) Pub. No.: US 2003/0145022 A1
Dingley
(43) Pub. Date: Jul. 31, 2003

(54) STORAGE AND MANAGEMENT OF SEMI-STRUCTURED DATA

(75) Inventor: Andrew Peter Dingley, Bristol (GB)

Correspondence Address:
HEWLETT-PACKARD COMPANY
Intellectual Property Administration
P.O. Box 272400
Fort Collins, CO 80527-2400 (US)

(73) Assignee: HEWLETT-PACKARD COMPANY

(21) Appl. No.: 10/303,137

(22) Filed: Nov. 21, 2002

(30) Foreign Application Priority Data
Jan. 31, 2002 (GB) 02021780

Publication Classification
(51) Int. Cl.7: G06F 12/00
(52) U.S. Cl.: 707/204

(57) ABSTRACT
Data having a describable and machine-readable structure, but which is not known in advance, may be thought of as semi-structured data. Semi-structured data may be represented in Resource Description Framework (RDF) format, and such documents may be parsed to form a table of triples. Relatively small amounts of data give rise to a substantial number of triples, meaning that a triple store for relatively small amounts of data will have a relatively large number of rows. A management programme for a triple store monitors the number of occasions on which a given query is executed, and if the frequency of the query exceeds a given threshold, then the triples forming the result set of the query are migrated to an auxiliary triple store, thus reducing the number of rows that must be searched when the given query is executed.
Patent Application Publication Jul. 31, 2003 Sheet 1 of 3 US 2003/0145022 A1
1  PATENT NO.: 1234  INVENTOR: DINGLEY  AUTHOR: CHEESEMAN
2  PATENT NO.: 5678  INVENTOR: DINGLEY  AUTHOR: FORMAGGIO
Fig. 1
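[Fig. 2: RDF graph of the records of Fig. 1. The Resources for patents 5678 and 1234 each have Pat. No., Author, Inventor and rdf:type properties; both Inventor arcs point to a single Resource named A. DINGLEY, and the Author arcs point to Resources named FORMAGGIO and CHEESEMAN.]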
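[Fig. 4: flowchart of the management programme. 402 database query; 404 QCOUNT = QC + 1; 406 triple accessed before?; 408 initialise RnX; 410 RnX = RnX + 1; 412 store RnX; QC; 414 i = i + 1; 416 i = 100?; 417 i = 0; 418 calculate number of queries in last period accessing Rn (box 420: RnX;QC - RnX;(QC - 100)); 422 threshold exceeded?; 424 migrate to separate store; 426 repatriate to principal store; END.]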
STORAGE AND MANAGEMENT OF
SEMI-STRUCTURED DATA
BACKGROUND TO THE INVENTION
[0001] 1. Field of the Invention
[0002] The present invention relates to the storage of
semi-structured data, for example in a database, and to the
management of such data storage.
[0003] 2. Description of Related Art
[0004] A database typically contains a plurality of records, and may be thought of as tabular in architecture, with each row of the table relating to a different record, and each attribute of a record, such as “name” or “date of birth”, being stored in a different column of the row. Traditionally, databases have been used to store what may be termed structured data; that is to say, each column of the table is designated specifically for the storage of a particular attribute. Thus, for example, where a column in a database storing the personal details of employees is designated for “date of birth” data, all entries in that column will relate only to date of birth. This ostensibly self-evident database architecture works well where the nature of the data being stored can be defined accurately before the system is configured, and where any changes to the attributes of a record are notified in advance, enabling the database to be reconfigured to take account of them, for example by re-designating one or more existing columns to store the changed attributes.
[0005] However, such inflexibility is regarded as a significant handicap to the easy maintenance of contemporary records, and is wholly inappropriate in circumstances where it is not possible to define accurately in advance the attributes of the data to be stored, or where these may change frequently and/or without prior notice. Data whose attributes may change in this way may be termed semi-structured data. Semi-structured data thus has a describable and machine-processable structure, but this structure may not be known in advance. It is possible to represent semi-structured data using a data model known as Resource Description Framework (RDF), which represents data in the form of a mathematical graph, that is to say a graph of nodes and directed arcs, and in doing so illustrates any interrelationship of different attributes, whether between attributes of the same record or attributes of different records. In accordance with the terminology of the RDF data model, data is represented as a Resource, a Property or a Value. It is possible to deconstruct, or “parse”, the RDF graphical representation of data into tabular form, where the table has three columns, subject, verb and object, corresponding to Resource, Property and Value. The parsing and subsequent storage of records is performed in such a manner that no data is lost. Thus it is possible to reconstruct the RDF graphical representation from the information present in the table, i.e. the data within the table together with the column and row in which the data is stored. Records which are stored as “Subject, Verb, Object” are known in the art as “triples”, and complete parsing (i.e. such that all the information within the RDF document is transferred into the resulting table of triples) of an RDF document of any size results in a relatively large table (i.e. one having many rows) of triples. Consequently, searching a given column for a given attribute is likely to take a substantial amount of time as a result of the relatively large number of rows in the table.
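As a minimal sketch of the lossless parsing described above, assuming a graph held as a Python dict that maps each subject node to a list of (property, value) arcs (a representation chosen here for illustration, not one defined in this specification), the following flattens the graph into three-column (subject, verb, object) rows and rebuilds the graph from those rows. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory graph: node -> list of (property, value) arcs.
Graph = Dict[str, List[Tuple[str, str]]]
Triple = Tuple[str, str, str]  # (subject, verb, object)


def parse_to_triples(graph: Graph) -> List[Triple]:
    """Flatten every directed arc of the graph into one (subject, verb, object) row."""
    return [(subject, prop, value)
            for subject, arcs in graph.items()
            for prop, value in arcs]


def rebuild_graph(triples: List[Triple]) -> Graph:
    """Reconstruct the node -> arcs mapping from the table of triples.

    Nodes with no outgoing arcs (e.g. literal values) appear only as objects,
    so the round trip is exact for graphs whose keys all have at least one arc.
    """
    graph: Graph = {}
    for subject, verb, obj in triples:
        graph.setdefault(subject, []).append((verb, obj))
    return graph


if __name__ == "__main__":
    record = {"#A1": [("Pat. No.", "5678"), ("Inventor", "#B2")],
              "#B2": [("Name", "A. Dingley")]}
    table = parse_to_triples(record)
    for row in table:
        print(row)
    assert rebuild_graph(table) == record  # nothing was lost in the table
```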
SUMMARY OF THE INVENTION
[0006] A first aspect of the present invention relates to the management of a store of triples in order to ameliorate the problem of searching large numbers of rows of a triple store on each occasion a search query is executed. Accordingly, the first aspect of the present invention provides a database having a principal table of triples, and a management programme adapted to monitor operation of the principal table and to migrate triples from the principal table to one or more auxiliary tables when at least one criterion tested by the programme is met.
[0007] In migrating triples to an auxiliary table, which may already exist or may have been created especially to accommodate the migrating triples, the management programme reduces the number of rows which have to be searched in order to execute a query whose result set includes the migrated triples, since the size, i.e. the number of rows, of the table in which the migrated triples are stored will typically be smaller than that of the principal table.
[0008] In one embodiment the management programme migrates triples on the basis of the frequency with which individual sets of triples (a set containing any number of triples, from zero upwards) are accessed as a result of a query being executed. In a further embodiment, the management programme operates on the basis of the frequency of particular queries, for example migrating triples which form the result set of frequent queries.
[0009] The frequency with which sets of triples are accessed may be determined in a number of ways. For example, in one embodiment it may be calculated as a proportion of the queries made of the triple store as a whole over the course of an interval determined by a preset number of queries. Alternatively, it may be determined simply with reference to the passage of time.
[0010] Other criteria, applied either alone or in combination, may be used to determine whether triples are to be migrated.
[0011] Preferably the management programme also operates continually to monitor auxiliary tables, and to repatriate sets of triples to the principal table when one or more of the criteria tested by the programme fail to be met, thus, for example, removing the unnecessary overhead of maintaining an auxiliary table containing triples which are never accessed during execution of a search query. Typically, the same criterion or criteria are tested in determining whether migration and repatriation ought to take place.
BRIEF DESCRIPTION OF DRAWINGS
[0012] An embodiment of the invention will now be described, by way of example, with reference to the accompanying drawings in which:
[0013] FIG. 1 shows two conventional database entries;
[0014] FIG. 2 shows the representation of the data forming the entries of FIG. 1 in Resource Description Framework (RDF);
[0015] FIG. 3 is a triple store resulting from the complete parsing of the RDF document of FIG. 2; and
[0016] FIG. 4 is a flowchart illustrating the operation of a database management programme, used for example with the triple store of FIG. 3.
DESCRIPTION OF PREFERRED
EMBODIMENTS
[0017] Referring now to FIG. 1, two records whose data it is desired to store in a database are illustrated. Each record has three attributes: the publication number of a patent, the inventor designated on the patent, and the author of the specification of the patent. As can be seen from the records, the inventor in each case is the same, and so, to this extent at least, the two records are interrelated.
[0018] Referring now to FIG. 2, both records and their interrelationship can be represented in a graphical document format known as Resource Description Framework (RDF), and an RDF document representative of the two records is shown in FIG. 2. The RDF document may be thought of as a graphical representation of the data in FIG. 1 which also describes the structure of that data, and it contains essentially three elements: Resources, Properties and Values. Thus, for example, the document in FIG. 2 has a Resource labelled #A1; in the event that the Resource could be named by a Uniform Resource Identifier (URI), such as a web page address, this would also appear in the name of the Resource. In this example the Resource has no such name, but has four different properties which, inter alia, serve to characterise it: Pat. No., Author and Inventor (all of which may intuitively be related to one of the records in FIG. 1), and “rdf:type”. The first three properties are simply the different attributes of one of the records shown in FIG. 1, while the fourth indicates the type or nature of the Resource, which in this instance is a patent. It follows that a patent (which is the “type” of the Resource) has the properties of Author, Inventor and Number, and while this may not be the most intuitive way to describe a record in FIG. 1 from a lay person's perspective, it is nonetheless possible to see that all of the information shown in a record in FIG. 1 is replicated in this format. Thus the two Resources #A1 and #B1 relate to the patents 5678 and 1234 respectively.
[0019] The properties of Inventor and Author for each of these two Resources are respectively represented by further Resources: #B2, which corresponds to the inventor (since the inventor is the same in each case), and #A2 and #C2, which correspond to the two authors. The Resource #B2 is thus the Value of the Inventor Property for each of the Resources #A1 and #B1, and itself has two further properties, one of which is its rdf:type, indicating that the Inventor is a person, and the other of which is the name of the inventor, its “literal” Value, A. Dingley. The Author Properties of the Resources #A1 and #B1 are respectively the Resources #A2 and #C2, each of which has an rdf:type property signifying that the Author is a person, and a Name Property having a literal Value, which is the name of the Author, “Formaggio” and “Cheeseman” respectively.
[0020] Thus an RDF document describes completely the data in a record, its nature, and any interrelationship with data in another record. The purpose of representing data in such a manner is essentially to provide a common format, independent of the source format of the data, which may be manipulated by computers and which contains all of the original data.
[0021] In order to store data having the form of an RDF document, it must be converted into a tabular form, and this is achieved by a process known in the art as parsing, which in this example is the analysis of the RDF document to yield a table of what are known as “triples”. A triple may be thought of as the smallest part of the RDF document illustrated in FIG. 2 which has any meaning in isolation (i.e. an “atomic” part of an RDF document). Thus, for example, the Value “1234” is essentially meaningless on its own; it only starts to take on some meaning when it exists within a context which indicates that it is the Publication Number of a particular Resource, and this combination of Resource, Property and Value is an example of a triple.
[0022] The RDF document of FIG. 2 is parsed to generate triples in a tabular form by considering the various elements of the document and their interrelationship as either “Subject”, “Verb” or “Object”, corresponding generally to Resource, Property and Value. Thus, referring now to FIG. 3, the table of triples generated from the complete parsing of the RDF document of FIG. 2 is shown, and it can be seen that the first triple has the Subject #A1, the Verb Publn. No. and the Object 1234, corresponding to the Resource, Property and Value from the RDF document of FIG. 2. The category of the Verb in a given row, that is to say whether the Verb points to an Object which is a literal Value or to an Object which is a Resource, is also indicated within the Verb column with an appropriate letter (i.e. “L” or “R”).
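The following sketch encodes the FIG. 2 graph, as described in [0018] and [0019], as rows of (subject, verb, category, object), where the category is “R” when the Object is another Resource and “L” when it is a literal Value. The Resource labels follow the text; the labels “#Patent” and “#Person” for the rdf:type classes, and the rule that any object beginning with “#” is a Resource, are assumptions. Because FIG. 3 itself is not reproduced here, these rows are reconstructed from the prose and need not match the figure row for row.

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str, str]  # (subject, verb, category, object)

# Arcs of FIG. 2 as described in [0018]-[0019]. "#Patent" and "#Person" are
# assumed labels for the rdf:type classes.
FIG2_ARCS = [
    ("#A1", "Pat. No.", "5678"), ("#A1", "Inventor", "#B2"),
    ("#A1", "Author", "#A2"), ("#A1", "rdf:type", "#Patent"),
    ("#B1", "Pat. No.", "1234"), ("#B1", "Inventor", "#B2"),
    ("#B1", "Author", "#C2"), ("#B1", "rdf:type", "#Patent"),
    ("#B2", "rdf:type", "#Person"), ("#B2", "Name", "A. Dingley"),
    ("#A2", "rdf:type", "#Person"), ("#A2", "Name", "Formaggio"),
    ("#C2", "rdf:type", "#Person"), ("#C2", "Name", "Cheeseman"),
]


def categorise(arcs: Iterable[Tuple[str, str, str]]) -> List[Row]:
    """Tag each triple "R" if its object is a Resource label, else "L" (literal)."""
    return [(s, v, "R" if o.startswith("#") else "L", o) for s, v, o in arcs]


if __name__ == "__main__":
    for row in categorise(FIG2_ARCS):
        print(row)
```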
[0023] In total, the table of FIG. 3 contains 13 triples, which are the result of the complete parsing of the RDF document of FIG. 2, which in turn is generated from merely two database entries, each of which has only three attributes. It is thus apparent that relatively small amounts of data may result in the creation of a relatively large triple store when the data is represented as an RDF document. One of the premises underlying the use of RDF is that the inevitable increase in the amount of data as a consequence of converting data into RDF is offset by the advantages gained from representing data in a standard form (assuming, of course, that RDF becomes widely adopted), and by the increased flexibility which operating on data in RDF offers. Another premise is that advances in computing power and memory may be used to deal with the additional data arising from the adoption of RDF.
[0024] However, it remains the case that, in order to execute a query on a triple store, each row of a particular column of the triple store must be searched for attributes in that column which match the query. The length of the triple store is thus one of the principal factors determining the time required to execute a query on such a store. One aspect of the present invention provides dynamic management of a triple store to migrate particular sets of triples (or “rows” in database nomenclature) into a separate store in the event that they are frequently accessed when a query is executed, and (if they are located in a separate store) to re-migrate sets of triples back into the principal triple store when they cease to be accessed frequently. This means that frequently accessed triples are located in one or more separate tables having fewer rows, on which queries may therefore be executed more rapidly. In addition, this removes triples from the principal store, thus improving performance there for the remaining triples.
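A minimal sketch of the cost argument, assuming a naive store that answers a query by scanning every row of one column with no index: it reports the rows inspected, and shows that once the matching triples are moved to a smaller auxiliary table, the same query inspects only that table. The `TripleTable` class, `scan` method and `migrate` helper are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]
COLUMNS = {"subject": 0, "verb": 1, "object": 2}


@dataclass
class TripleTable:
    rows: List[Triple] = field(default_factory=list)

    def scan(self, column: str, value: str) -> Tuple[List[Triple], int]:
        """Linear scan of one column; returns the matches and the rows inspected."""
        idx = COLUMNS[column]
        matches = [row for row in self.rows if row[idx] == value]
        return matches, len(self.rows)


def migrate(src: TripleTable, dst: TripleTable, selected: List[Triple]) -> None:
    """Move the selected triples from src to dst (hypothetical helper)."""
    chosen = set(selected)
    dst.rows.extend(selected)
    src.rows = [row for row in src.rows if row not in chosen]


if __name__ == "__main__":
    principal = TripleTable([(f"#R{i}", "Name", f"value{i}") for i in range(1000)])
    principal.rows.append(("#X", "Inventor", "#B2"))
    hits, cost = principal.scan("verb", "Inventor")
    print(len(hits), cost)   # 1 match after inspecting 1001 rows
    auxiliary = TripleTable()
    migrate(principal, auxiliary, hits)
    hits, cost = auxiliary.scan("verb", "Inventor")
    print(len(hits), cost)   # 1 match after inspecting 1 row
```

A real triple store would normally index its columns; the sketch deliberately keeps the linear scan that the paragraph above takes as its premise.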
[0025] In one embodiment of the invention the criterion for determining whether a given triple is migrated to a separate store is whether it is accessed to form a part of the result set of a query on a predetermined number of occasions over the course of either a predetermined period of time (i.e. measured in, for example, years, days, hours, minutes and seconds), or alternatively as a proportion of a predetermined number of queries performed on the database (whether or not their execution accesses the given triple).
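The time-based variant of this criterion can be sketched as a sliding window of access timestamps per triple, assuming timestamps in seconds recorded in non-decreasing order (for example from a monotonic clock). The window length, the threshold and the class name `TimeWindowCounter` are illustrative assumptions; “exceeds” is read as strictly greater than the threshold, as in step 422 of FIG. 4.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable


class TimeWindowCounter:
    """Counts, per triple, the accesses that fall inside the last `window` seconds."""

    def __init__(self, window: float, threshold: int) -> None:
        self.window = window
        self.threshold = threshold
        self._hits: Dict[Hashable, Deque[float]] = defaultdict(deque)

    def record(self, triple_id: Hashable, now: float) -> None:
        """Note that triple_id formed part of a result set at time `now`."""
        self._hits[triple_id].append(now)

    def count(self, triple_id: Hashable, now: float) -> int:
        """Drop timestamps that have left the window, then return the remaining count."""
        hits = self._hits[triple_id]
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        return len(hits)

    def should_migrate(self, triple_id: Hashable, now: float) -> bool:
        return self.count(triple_id, now) > self.threshold


if __name__ == "__main__":
    counter = TimeWindowCounter(window=60.0, threshold=2)
    for t in (0.0, 10.0, 20.0):
        counter.record("R7", t)
    print(counter.should_migrate("R7", 30.0))  # True: 3 accesses in the last minute
    print(counter.should_migrate("R7", 75.0))  # False: only the access at 20.0 remains
```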
[0026] Referring now to FIG. 4, a database management programme operates to manage the triple store and, where appropriate, to migrate selected triples within the store into a separate store when the selected sets of triples are accessed frequently in the course of executing queries on the store. The programme is invoked automatically by the receipt of a query by the database at step 402, and receipt of the query causes the programme, at step 404, to increment by one a variable QCOUNT (QC) representing the total number of queries made of the triple store. At step 406 the programme determines, for each triple forming part of the result set of the query, whether it has been accessed pursuant to a query before. If this is the first time the triple has been accessed, then a variable RnX is initialised with a value of one at step 408. The variable RnX is simply an identifier for the triple which is unique within the database, and which in this example comprises the row number of the triple (Rn) together with the number of times (X) the triple Rn has been accessed. If the triple has been accessed before, then the variable RnX will already be initialised, and is incremented by one at step 410. At step 412 the variable RnX is then stored in conjunction with the value QC. These two variables denote the same event, i.e. a given query of the triple store, but with reference to different things: the variable QC refers to the total number of queries, so each value of QC is unique within the database, while the variable RnX denotes the Xth occasion on which row n of the database has been accessed. In combination, these two variables enable an evaluation of the frequency with which row n of the database is accessed in the course of a given number of queries of the triple store as a whole or, put another way, the proportion of queries of the triple store as a whole which access the nth row of the database. This may be measured, for example, by reference to the aggregate number of queries ever received by the database, or by reference to an interval defined by a set number of queries. In the present example, the frequency with which a given triple is accessed is measured as the number of queries, within an interval of 100 queries of the triple store as a whole, which accessed that triple. At step 414 a variable i, representing the total number of queries within the current interval of 100 queries, is incremented by 1, and at step 416 a decision is taken as to whether the interval total of 100 queries for the database as a whole has been reached. If it has, i is reset to zero at step 417 to restart the count, and a calculation is then performed at step 418 for each set of triples accessed over the course of the most recent interval to determine how often it has been accessed in this interval. This calculation is shown in box 420, and is simply the difference between the number of occasions on which the triple Rn had been part of the result set of a query when the total number of queries (of the triple store as a whole) was QC, and the corresponding number when the total number of queries was QC-100. A decision is then taken at step 422 to determine whether the number of occasions the triple has been accessed during the interval exceeds the predetermined number set as the threshold for migrating the triple into a separate store. If it has, the triple in question is denoted as a candidate for migration to a separate store, and at step 424 the triple is migrated. Conversely, if the threshold is not exceeded, then the triple is repatriated to the principal table at step 426 if it is in a separate store, or is not migrated if it is already in the principal store.
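The FIG. 4 loop can be sketched as follows. This is a minimal, single-process reading of steps 402 to 426, assuming the stored (RnX, QC) pairs are kept in memory as, for each row n, the list of QC values at which it was accessed, so that the length of the list is RnX. The class and method names and the threshold value are assumptions. One deliberate deviation: this sketch evaluates every triple with a recorded access, not only those accessed in the latest interval, so that a migrated triple which was not accessed at all is still considered for repatriation at step 426.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Set


class TripleStoreManager:
    """Sketch of the FIG. 4 programme; step numbers refer to the flowchart."""

    def __init__(self, threshold: int, interval: int = 100) -> None:
        self.threshold = threshold   # accesses per interval above which a triple migrates
        self.interval = interval     # queries per measurement interval (100 in FIG. 4)
        self.qc = 0                  # QCOUNT: total queries received by the store
        self.i = 0                   # queries received in the current interval
        # history[n] holds the QC value stored with each RnX (step 412);
        # its length is the current RnX count for row n.
        self.history: Dict[int, List[int]] = {}
        self.separate_store: Set[int] = set()    # rows currently migrated

    def on_query(self, result_rows: Iterable[int]) -> None:
        self.qc += 1                                         # step 404
        for n in set(result_rows):                           # step 406
            self.history.setdefault(n, []).append(self.qc)   # steps 408/410/412
        self.i += 1                                          # step 414
        if self.i == self.interval:                          # step 416
            self.i = 0                                       # step 417
            self._evaluate()                                 # steps 418-426

    def _accesses_up_to(self, n: int, qc: int) -> int:
        """Value RnX had reached when the query total was qc."""
        return bisect_right(self.history.get(n, []), qc)

    def _evaluate(self) -> None:
        for n in self.history:
            # Box 420: RnX at QC minus RnX at QC - 100.
            recent = (self._accesses_up_to(n, self.qc)
                      - self._accesses_up_to(n, self.qc - self.interval))
            if recent > self.threshold:         # step 422
                self.separate_store.add(n)      # step 424: migrate
            else:
                self.separate_store.discard(n)  # step 426: repatriate, or leave in place


if __name__ == "__main__":
    manager = TripleStoreManager(threshold=30)
    for q in range(100):                        # row 1 is hot in the first interval
        manager.on_query([1] if q < 40 else [])
    print(manager.separate_store)               # {1}
    for q in range(100):                        # row 1 then goes quiet
        manager.on_query([1] if q < 5 else [])
    print(manager.separate_store)               # set()
```

Keeping every QC value lets box 420 be evaluated exactly with a binary search; a long-running version would presumably discard entries older than QC - 100.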
[0027] It should be noted that the steps of measuring, deciding and then migrating may be performed by separate processes. Their description here as part of one process is not essential, but is convenient for the purpose of describing them. Slow processes such as migration may also be delayed or deferred until times of low system load. It is also possible to switch off monitoring during periods of extremely high load.
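One way to separate deciding from migrating, as this paragraph permits, is to queue migration decisions and apply them only when a load figure is low, while skipping monitoring altogether above a higher load. The load measure (a number between 0 and 1), the two thresholds and the `DeferredMigrator` name are assumptions; the specification does not say how load is measured.

```python
from collections import deque
from typing import Callable, Deque, Tuple

Move = Tuple[int, str]  # (row number, "migrate" or "repatriate")


class DeferredMigrator:
    def __init__(self, low_load: float, high_load: float) -> None:
        self.low_load = low_load      # apply queued moves only below this load
        self.high_load = high_load    # stop monitoring at or above this load
        self.pending: Deque[Move] = deque()

    def monitoring_enabled(self, load: float) -> bool:
        """The measuring process checks this before recording an access."""
        return load < self.high_load

    def request(self, move: Move) -> None:
        """Called by the deciding process; the move is queued, not applied."""
        self.pending.append(move)

    def flush(self, load: float, apply: Callable[[Move], None]) -> int:
        """Apply all queued moves if the system is quiet; return how many were applied."""
        if load >= self.low_load:
            return 0
        applied = 0
        while self.pending:
            apply(self.pending.popleft())
            applied += 1
        return applied


if __name__ == "__main__":
    migrator = DeferredMigrator(low_load=0.3, high_load=0.9)
    migrator.request((7, "migrate"))
    print(migrator.flush(0.5, print))  # busy: nothing applied, prints 0
    print(migrator.flush(0.1, print))  # quiet: prints (7, 'migrate') then 1
```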
[0028] In a programme such as the one illustrated herein, in which management of the triple store is performed principally on the basis of the frequency with which a triple is accessed, a difficulty exists in deciding on an appropriate destination for migrating triples. In its simplest form the present invention provides simply that all sets of triples which, over the course of the previous 100 queries of the triple store as a whole, were accessed more than a predetermined number of times (the “threshold access frequency”) are migrated to a single separate store. However, further improvements to this approach include, in one embodiment, providing a plurality of separate stores for sets of triples having different access frequencies, with the number of triples in each separate store being determined by the access frequency of the triples in that store. Thus, for example, a store of triples with a high access frequency holds a maximum of only a few triples, whereas a store of triples having a relatively low access frequency, though still in excess of the threshold, will hold a relatively large number of triples. In addition, the management programme preferably groups the triples for migration so that, where possible, triples are stored with other triples having a common subject, verb or object.
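A sketch of the tiered arrangement, assuming each separate store is described by a minimum access frequency and a maximum size, with higher-frequency stores holding fewer triples. Triples are placed hottest first into the most selective store they qualify for that still has room. The tier boundaries and capacities are invented for illustration, and the grouping preference is only approximated, by breaking frequency ties on subject.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Tier:
    min_freq: int                  # accesses per interval needed to qualify
    capacity: int                  # maximum number of triples in this store
    triples: List[Triple] = field(default_factory=list)


def place(freqs: Dict[Triple, int], tiers: List[Tier], threshold: int) -> List[Triple]:
    """Move triples above `threshold` into tiers; return those left in the principal store."""
    ordered_tiers = sorted(tiers, key=lambda t: t.min_freq, reverse=True)
    principal: List[Triple] = []
    # Hottest triples first, so they claim the small high-frequency stores;
    # ties are broken on subject so triples sharing a subject stay adjacent.
    for triple in sorted(freqs, key=lambda t: (-freqs[t], t[0])):
        freq = freqs[triple]
        target: Optional[Tier] = None
        if freq > threshold:
            target = next((tier for tier in ordered_tiers
                           if freq >= tier.min_freq and len(tier.triples) < tier.capacity),
                          None)
        if target is None:
            principal.append(triple)
        else:
            target.triples.append(triple)
    return principal


if __name__ == "__main__":
    hot, warm = Tier(min_freq=50, capacity=1), Tier(min_freq=10, capacity=10)
    freqs = {("#A1", "Pat. No.", "5678"): 80, ("#B1", "Pat. No.", "1234"): 60,
             ("#C2", "Name", "Cheeseman"): 12, ("#A2", "Name", "Formaggio"): 3}
    left = place(freqs, [warm, hot], threshold=10)
    print(hot.triples, warm.triples, left)
```

In the usage above the hot store fills with one triple, so the second hot triple overflows into the warm store, and the triple below the threshold stays in the principal table.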
[0029] Alternatively, triples migrated from the triple store are grouped by reference to rdf:type, either that of the migrated triples themselves or possibly that of their parent, or even grandparent.
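Grouping by rdf:type can be sketched as a lookup from each migrating triple's subject to that subject's rdf:type in the store, with one auxiliary table per type. Using the subject's own type, rather than a parent's or grandparent's, is the simplest of the options above; the `"untyped"` bucket for subjects without a type, and taking the last type seen when a subject has several, are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def group_by_type(migrating: List[Triple], all_triples: List[Triple]) -> Dict[str, List[Triple]]:
    """Bucket migrating triples by the rdf:type of their subject (one table per type)."""
    # If a subject has several rdf:type triples, the last one seen wins here.
    type_of = {s: o for s, v, o in all_triples if v == RDF_TYPE}
    tables: Dict[str, List[Triple]] = defaultdict(list)
    for triple in migrating:
        tables[type_of.get(triple[0], "untyped")].append(triple)
    return dict(tables)


if __name__ == "__main__":
    store = [("#A1", "rdf:type", "#Patent"), ("#A1", "Pat. No.", "5678"),
             ("#A2", "rdf:type", "#Person"), ("#A2", "Name", "Formaggio")]
    print(group_by_type([("#A1", "Pat. No.", "5678"), ("#A2", "Name", "Formaggio")], store))
```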
[0030] In a modification of the programme illustrated and described above, the management programme operates by using queries of the triple store to identify triples to be migrated. Thus, in accordance with this modification, the number of occasions on which a given query is executed is recorded, and in the event that the frequency of the given query exceeds a predetermined threshold, the sets of triples which form the result set of this given query are migrated to a separate store. This approach has the advantage of more straightforward migration and management of triples, since the process of identifying the triples to be migrated inherently groups them together for storage into a new store.
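The query-based modification can be sketched by keying the counter on the query itself and, when a query's count within the current interval exceeds the threshold, moving its whole result set out of the principal table into a store of its own. Representing a query as a normalised string, the interval length, and the class name `QueryFrequencyMigrator` are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


class QueryFrequencyMigrator:
    def __init__(self, principal: List[Triple], threshold: int, interval: int = 100) -> None:
        self.principal = principal
        self.threshold = threshold
        self.interval = interval
        self.counts: Counter = Counter()             # executions per query in this interval
        self.seen = 0                                # queries received in this interval
        self.auxiliary: Dict[str, List[Triple]] = {}  # one new store per hot query

    def record(self, query: str, result: List[Triple]) -> bool:
        """Count one execution of `query`; return True if it triggered a migration."""
        self.counts[query] += 1
        self.seen += 1
        migrated = False
        if self.counts[query] > self.threshold and query not in self.auxiliary:
            moved = set(result)
            # The result set is already grouped, so it becomes one new store.
            # Triples already moved for another query stay where they are.
            self.auxiliary[query] = [t for t in self.principal if t in moved]
            self.principal = [t for t in self.principal if t not in moved]
            migrated = True
        if self.seen == self.interval:               # start a new measurement interval
            self.seen = 0
            self.counts.clear()
        return migrated
```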
[0031] The dynamic management exemplified in the examples described above is particularly beneficial when storing semi-structured data, since documents in RDF format may be used to represent all manner of data. It is thus quite possible that, upon the addition of further triples to the triple store following the parsing of an amended document, for example, the Verbs of the newly resulting triples may be Verbs not previously stored, whose triples are accessed more frequently than triples previously stored. In such a circumstance it would make sense to migrate such new triples to an auxiliary table, which the present invention enables.
[0032] In a further modification, repatriation of a triple to the principal store is determined on the basis of one or more criteria which differ from the criterion or criteria used to determine whether the triple should be migrated. Thus, for example, the management programme may be configured to include some in-built inertia against repatriation once migration has occurred. For example, where both migration and repatriation are determined on the basis of the proportion of queries which access a triple, the programme may be configured so that, once the triple has been migrated, it must fail to be accessed the requisite number of times in, for example, two intervals of 100 queries of the database as a whole before being repatriated. Alternatively, an entirely different criterion may be used to determine repatriation, so that, for example, the proportion of queries is monitored to determine whether migration ought to take place, whereas the number of occasions a migrated triple is accessed is monitored to determine whether repatriation takes place. Typically, repatriation is likely to be less frequent than migration, and in one embodiment repatriation may simply not be possible.
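The in-built inertia described above can be sketched as a per-triple streak counter: one busy interval is enough to migrate a triple, but a migrated triple is repatriated only after falling below the threshold in a given number of consecutive intervals (two here). Requiring the quiet intervals to be consecutive, and the `HysteresisPolicy` name, are assumptions; the text says only “two intervals of 100 queries”.

```python
from typing import Dict, Set


class HysteresisPolicy:
    def __init__(self, threshold: int, quiet_intervals: int = 2) -> None:
        self.threshold = threshold
        self.quiet_intervals = quiet_intervals
        self.migrated: Set[int] = set()
        self._quiet: Dict[int, int] = {}   # consecutive quiet intervals per migrated row

    def end_of_interval(self, accesses: Dict[int, int]) -> None:
        """Apply the policy given each row's access count over the interval just ended."""
        for row, count in accesses.items():
            if count > self.threshold and row not in self.migrated:
                self.migrated.add(row)                # one busy interval: migrate
                self._quiet[row] = 0
        for row in list(self.migrated):
            if accesses.get(row, 0) > self.threshold:
                self._quiet[row] = 0                  # busy again: reset the streak
            else:
                self._quiet[row] += 1
                if self._quiet[row] >= self.quiet_intervals:
                    self.migrated.discard(row)        # repatriate to the principal table
                    del self._quiet[row]


if __name__ == "__main__":
    policy = HysteresisPolicy(threshold=30)
    for counts in ({5: 40}, {5: 3}, {}):
        policy.end_of_interval(counts)
        print(policy.migrated)   # {5}, then {5}, then set()
```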
1. A database having a principal table of triples, and a management programme adapted to monitor operation of the principal table and to migrate triples from the principal table to at least one newly-generated auxiliary table when at least one criterion tested by the programme is met.
2. A database according to claim 1 wherein the management programme is additionally adapted to monitor operation of an auxiliary table and to repatriate one or more triples from the monitored auxiliary table to the principal table in the event that at least one criterion tested by the programme is not met.
3. A database according to claim 2 wherein the programme is adapted to test the same at least one criterion in determining whether a triple is to be migrated to an auxiliary table and in determining whether a triple is to be repatriated to the principal table from an auxiliary table.
4. A database according to claim 2 wherein the programme is adapted to test different criteria in determining whether a triple is to be migrated to an auxiliary table and in determining whether a triple is to be repatriated to the principal table from an auxiliary table.
5. A database according to claim 1 wherein the management programme is adapted to test the number of occasions on which a triple is accessed as a result of execution of a query, as a proportion of a number of queries received by the database as a whole.
6. A database according to claim 5 wherein the management programme is adapted to test the number of occasions on which a triple is accessed as a result of execution of a query, as a proportion of a predetermined number of queries received by the database as a whole.
7. A database according to claim 1 wherein the management programme is adapted to test the number of occasions on which a triple is accessed as a result of execution of a query within a given period of time.
8. A database according to claim 1 wherein the management programme is adapted to test the number of occasions a given query is executed as a proportion of all queries executed.
9. A database according to claim 8 wherein the management programme is adapted to test the number of occasions a given query is executed during the course of execution of a predetermined total number of queries.
10. A database according to claim 1 wherein the management programme is adapted to test the number of occasions on which a given query is executed within a predetermined period of time.
11. A database according to claim 8 wherein, in the event that the at least one criterion tested by the management programme is met, all triples forming the result set of a given query are migrated to an auxiliary table.
12. A database according to claim 1, wherein migrated triples of the same rdf:type are migrated to a common auxiliary table.

More Related Content

PDF
Comparison between riss and dcharm for mining gene expression data
PDF
International Journal of Engineering Research and Development
PDF
Grouping & Summarizing Data in R
PDF
Pivoting approach-eav-data-dinu-2006
ODP
System of a Down
PPTX
Tinnitus, how affected my life
PDF
PPT
Chapter 12 – section 1 - FINAL
Comparison between riss and dcharm for mining gene expression data
International Journal of Engineering Research and Development
Grouping & Summarizing Data in R
Pivoting approach-eav-data-dinu-2006
System of a Down
Tinnitus, how affected my life
Chapter 12 – section 1 - FINAL

Viewers also liked (16)

PPTX
Love fest
PPT
Chapter 12 – Section 1 - Africa
PPTX
Financial scams impact on stock market or
ODP
Cambridge powerpoint
PDF
prueba 84 09-anexo01
PPT
Tercer encuentro - soporte y lenguaje gráfico
PDF
Unified business mailbox
PDF
Analyse Oasis
PDF
Good Friday
PDF
urgent sale chd golf avenue sector 106 ,size-1940@4800 per sq.ft more details...
PDF
How NOT To Become A Successful Entrepreneur
PDF
P L_09_15
PDF
MIKE CAD Level 4.PDF
PPTX
20150922 vloggers zijn de nieuwe helden
PDF
Comunicare Live Conference: L’esperienza del web al femminile: un magazine, t...
PDF
Khuloud Sakkal 3524336
Love fest
Chapter 12 – Section 1 - Africa
Financial scams impact on stock market or
Cambridge powerpoint
prueba 84 09-anexo01
Tercer encuentro - soporte y lenguaje gráfico
Unified business mailbox
Analyse Oasis
Good Friday
urgent sale chd golf avenue sector 106 ,size-1940@4800 per sq.ft more details...
How NOT To Become A Successful Entrepreneur
P L_09_15
MIKE CAD Level 4.PDF
20150922 vloggers zijn de nieuwe helden
Comunicare Live Conference: L’esperienza del web al femminile: un magazine, t...
Khuloud Sakkal 3524336
Ad

Similar to Storage and management of semi structured data (20)

PDF
Data Retrieval and Preparation Business Analytics.pdf
PDF
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
PPTX
codd rules of dbms given by E F codd who is called father of dbms
DOCX
Bc0041
DOCX
Database Concepts
PPT
D B M S Animate
PPTX
Codds rules & keys
PPTX
Chapter 4 Chapter Relational DB - Copy.pptx
PPT
DatabaseFundamentals.ppt
PPT
DatabaseFundamentals.ppt
PPTX
DBMS (1).pptx
PPTX
Relational Database Design Functional Dependency – definition, trivial and no...
DOCX
COMPUTERS Database
PPT
Nunes database
PDF
Storage and management of semi-structured data

FIG. 1: two conventional database records.
Record 1: Patent No. 1234; Inventor: Dingley; Author: Cheeseman
Record 2: Patent No. 5678; Inventor: Dingley; Author: Formaggio

[FIG. 2: RDF graph of the two records, showing the Pat. No., rdfs:type, Author and Inventor properties and the Name values A. Dingley, Formaggio and Cheeseman]
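[FIG. 4 (Sheet 3 of 3): flowchart of the database management programme. Database query (402); QCOUNT = QC + 1 (404); triple accessed before? (406); if not, initialise RnX (408), otherwise RnX + 1 (410); store RnX with QC (412); i = i + 1 (414); i = 100? (416); if so, i = 0 (417) and calculate the number of queries in the last period that accessed Rn (418), i.e. RnX at QC minus RnX at QC - 100 (420); threshold exceeded? (422); if yes, migrate to separate store (424); if no, repatriate to principal store (426); end.]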
STORAGE AND MANAGEMENT OF SEMI-STRUCTURED DATA

BACKGROUND TO THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention relates to the storage of semi-structured data, for example in a database, and to the management of such data storage.

[0003] 2. Description of Related Art

[0004] A database typically contains a plurality of records, and may be thought of as tabular in architecture, with each row of the table relating to a different record, and each attribute of a record, such as “name” or “date of birth”, being stored in a different column of a row. Traditionally, databases have been used to store what may be termed structured data. That is to say, each column of the table is designated specifically for the storage of a particular attribute. Thus, for example, where a column in a database which stores personal details of employees is designated for the storage of “date of birth” data, all entries in that column will relate only to date of birth. This ostensibly self-evident database architecture works well where the nature of the data being stored may be defined accurately prior to configuration of the system, and where any changes to the nature of the attributes of a record are pre-notified, thereby enabling the database to be reconfigured to take account of them, for example by re-designating one or more existing columns to provide for the storage of changed attributes.

[0005] However, such inflexibility is regarded as a significant handicap to the easy maintenance of contemporary records, and is wholly inappropriate in circumstances where it is not possible to define accurately in advance the attributes of the data to be stored, or where these may change frequently and/or without prior notice. Data whose attributes may change in this way may be termed semi-structured data.
Semi-structured data thus has a describable and machine-processable structure, but this structure may not be known in advance. It is possible to represent semi-structured data using a data model known as Resource Description Framework (RDF), which represents data in the form of a mathematical graph, that is to say a graph of nodes and directed arcs, and in doing so illustrates any interrelationship of different attributes, whether between attributes of the same record or attributes of different records. In the terminology of the RDF data model, data is represented as a Resource, a Property or a Value. It is possible to deconstruct, or “parse”, the RDF graphical representation of data into tabular form, where the table has three columns: subject, verb, object, corresponding to Resource, Property and Value. The parsing and subsequent storage of records is performed in such a manner that no data is lost. Thus it is possible to reconstruct the RDF graphical representation from the information present in the table, i.e. the data within the table, together with the column or row in which the data is stored. Records which are stored as “Subject, Verb, Object” are known in the art as “triples”, and complete parsing (i.e. so that all the information within the RDF document is transferred into the resulting table of triples) of an RDF document of any size results in a relatively large table (i.e. one having many rows) of triples. Consequently, searching a given column for a given attribute is likely to take a substantial amount of time as a result of the relatively large number of rows in the table.

SUMMARY OF THE INVENTION

[0006] A first aspect of the present invention relates to the management of a store of triples in order to ameliorate the problem of searching large numbers of rows of a triple store on each occasion a search query is executed.
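To make the row-scanning cost concrete, here is a minimal Python sketch. It assumes, as the text does, that a query scans every row of the searched column with no index; the `scan_column` helper and the table contents are hypothetical. It counts the rows examined when the same lookup runs against a large principal table and against a small auxiliary table holding only the triples that lookup needs.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, verb, object)


def scan_column(table: List[Triple], column: int, value: str) -> Tuple[List[Triple], int]:
    """Full scan of one column: every row is examined, so the cost grows with table length."""
    matches: List[Triple] = []
    rows_examined = 0
    for row in table:
        rows_examined += 1
        if row[column] == value:
            matches.append(row)
    return matches, rows_examined


# Hypothetical data: 10,000 filler triples plus two triples that a frequent query needs.
hot = [("#A1", "Inventor", "#B2"), ("#B1", "Inventor", "#B2")]
principal = [(f"#X{i}", "Name", f"N{i}") for i in range(10_000)] + hot
auxiliary = list(hot)  # the same two triples held in a small separate table

found_principal, cost_principal = scan_column(principal, 2, "#B2")
found_auxiliary, cost_auxiliary = scan_column(auxiliary, 2, "#B2")
print(cost_principal, cost_auxiliary)  # 10002 2
```

Real database engines usually index such columns, which changes the arithmetic; the sketch only illustrates the scanning model the text assumes.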
Accordingly, a first aspect of the present invention provides a database having a principal table of triples, and a management programme adapted to monitor operation of the principal table and to migrate triples from the principal table to one or more auxiliary tables when at least one criterion tested by the programme is met.

[0007] In migrating triples to an auxiliary table, which may already exist or may have been created especially to accommodate the migrating triples, the management programme reduces the number of rows which have to be searched in order to execute a query whose result set includes the migrated triples, since the size (i.e. the number of rows) of the table in which the migrated triples are stored will typically be smaller than that of the principal table.

[0008] In one embodiment, the management programme migrates triples on the basis of the frequency with which individual sets of triples (a set containing any number of triples, from zero upwards) are accessed as a result of a query being executed. In a further embodiment, the management programme operates on the basis of the frequency of particular queries, for example migrating triples which form the result set of frequent queries.

[0009] The frequency with which sets of triples are accessed may be determined in a number of ways. For example, in one embodiment it may be calculated as a proportion of the queries for the triple store as a whole over the course of an interval determined by a preset number of queries. Alternatively, it may be determined simply with reference to the passage of time.

[0010] Other criteria may be applied, either alone or in conjunction, to determine whether triples are to be migrated.
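Paragraph [0009] names two ways of measuring access frequency without fixing either. A minimal Python sketch of both follows, assuming each access to a set of triples is logged with the store-wide query number and a timestamp; the `AccessLog` class and its methods are hypothetical. A migration criterion would then compare one or both figures with a threshold.

```python
from typing import List, Tuple


class AccessLog:
    """Hypothetical record of accesses to one set of triples.

    Each entry is (query_number, timestamp): the store-wide query count at the
    time of the access and the wall-clock time in seconds.
    """

    def __init__(self) -> None:
        self.entries: List[Tuple[int, float]] = []

    def record(self, query_number: int, timestamp: float) -> None:
        self.entries.append((query_number, timestamp))

    def proportion_of_last_queries(self, current_query: int, interval: int) -> float:
        """Share of the last `interval` queries on the whole store that accessed this set."""
        hits = sum(1 for q, _ in self.entries if q > current_query - interval)
        return hits / interval

    def accesses_in_last_seconds(self, now: float, window: float) -> int:
        """Number of accesses within the last `window` seconds."""
        return sum(1 for _, t in self.entries if t > now - window)


log = AccessLog()
for query_number, timestamp in [(3, 10.0), (50, 20.0), (95, 95.0), (120, 130.0)]:
    log.record(query_number, timestamp)

by_queries = log.proportion_of_last_queries(current_query=150, interval=100)
by_time = log.accesses_in_last_seconds(now=140.0, window=60.0)
print(by_queries, by_time)  # 0.02 2
```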
[0011] Preferably, the management programme also operates continually to monitor auxiliary tables, and to repatriate sets of triples to the principal table when one or more of the criteria tested by the programme fail to be met, thus, for example, removing the unnecessary overhead of maintaining an auxiliary table containing triples which are never accessed during execution of a search query. Typically, the same criterion or criteria are tested for determining whether migration and repatriation ought to take place.

BRIEF DESCRIPTION OF DRAWINGS

[0012] An embodiment of the invention will now be described, by way of example, and with reference to the accompanying drawings in which:

[0013] FIG. 1 shows two conventional database entries;

[0014] FIG. 2 shows the representation of the data forming the entries of FIG. 1 in Resource Description Framework (RDF) format;

[0015] FIG. 3 is a triple store resulting from the complete parsing of the RDF document of FIG. 2;
[0016] FIG. 4 is a flowchart illustrating the operation of a database management programme, used for example with the triple store of FIG. 3.

DESCRIPTION OF PREFERRED EMBODIMENTS

[0017] Referring now to FIG. 1, two records whose data it is desired to store in a database are illustrated. Each record has three attributes: the publication number of a patent, the inventor designated on the patent, and the author of the specification of the patent. As can be seen from the records, the inventor in each case is the same, and so, to this extent at least, the two records are interrelated.

[0018] Referring now to FIG. 2, both records and their interrelationship can be represented in a graphical document format known as Resource Description Framework (RDF), and an RDF document representative of the two records is shown in FIG. 2. The RDF document may be thought of as a graphical representation of the data in FIG. 1 which also describes the structure of that data, and contains essentially three elements: Resources, Properties and Values. Thus, for example, the document in FIG. 2 has a Resource labelled #A1. If the Resource could be named by a Uniform Resource Identifier (URI), such as a web page address, that name would also appear in the name of the Resource. In this example the Resource has no such name, but has four different properties which, inter alia, serve to characterise it: Pat. No., Author, Inventor (all of which may intuitively be related to one of the records in FIG. 1), and “rdf:type”. The first three properties are simply the different attributes of one of the records shown in FIG. 1, while the fourth indicates the type or nature of the Resource, which in this instance is a patent.
With this in mind, it follows that a patent (which is the “type” of the Resource) has the properties of Author, Inventor and Number. While this may not be the most intuitive way to describe a record in FIG. 1 from a lay person’s perspective, it is nonetheless possible to see that all of the information shown in a record in FIG. 1 is replicated in this format. Thus the two Resources #A1 and #B1 relate to the patents 5678 and 1234 respectively.

[0019] The Inventor and Author properties of each of these two Resources are in turn represented by further Resources: #B2, which corresponds to the inventor (since the inventor is the same in each case), and #A2 and #C2, which correspond to the two authors. The Resource #B2 is thus the Value of the Inventor Property for each of the Resources #A1 and #B1, and itself has two further properties: its rdfs:type, indicating that the Inventor is a person, and the name of the inventor, which is its “literal” Value, the inventor’s name A. Dingley. The Author Properties of the Resources #A1 and #B1 are respectively the Resources #A2 and #C2, each of which has an rdfs:type property signifying that the Author is a person, and a Name Property having a literal Value, namely the names of the Authors, “Formaggio” and “Cheeseman” respectively.

[0020] Thus an RDF document completely describes the data in a record, its nature and any interrelationship with data in another record. The purpose of representing data in such a manner is essentially to provide a common format, independent of the source format of the data, which may be manipulated by computers and which contains all of the original data.

[0021] In order to store data having the form of an RDF document, it must be converted into a tabular form, and this is achieved by a process known in the art as parsing, which in this example is the analysis of the RDF document to yield a table of what are known as “triples”.
A triple may be thought of as the smallest part of the RDF document illustrated in FIG. 2 which has any meaning in isolation (i.e. an “atomic” part of an RDF document). Thus, for example, the Value “1234” is essentially meaningless on its own; it only starts to take on some meaning when it exists within a context which indicates that it is the Publication Number of a particular Resource. This is an example of a triple.

[0022] The RDF document of FIG. 2 is parsed to generate triples in tabular form by treating the various elements of the document, and their interrelationship, as “Subject”, “Verb” or “Object”, corresponding generally to Resource, Property and Value. Referring now to FIG. 3, the table of triples generated from the complete parsing of the RDF document of FIG. 2 is shown, and it can be seen that the first triple has the Subject #A1, the Verb Publn. No., and the Object 1234, corresponding to the Resource, Property and Value from the RDF document of FIG. 2. The category of the Verb in each triple, that is to say whether the Property in the Verb points to an Object which is a literal Value or to a Value which is a Resource, is also indicated within the Verb column with an appropriate letter (i.e. “L” or “R”).

[0023] In total, the table of FIG. 3 contains 13 triples, which are the result of the complete parsing of the RDF document of FIG. 2, which in turn is generated from merely two database entries, each of which has only three attributes. It is thus apparent that relatively small amounts of data may result in the creation of a relatively large triple store when the data is represented as an RDF document.
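The parsing described above can be sketched in Python, using the FIG. 2 graph as described in [0018] and [0019]. This is a minimal illustration under stated assumptions, not the parser the text refers to: a value beginning with “#” is taken to name a Resource and anything else to be a literal, and the L/R category is kept in its own field rather than inside the Verb column. FIG. 3 itself is not reproduced here, so the rows this encoding produces are not claimed to match the figure’s 13 triples exactly. The `parse` and `rebuild` functions are hypothetical; `rebuild` shows that the graph can be recovered from the table.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# FIG. 2 as described in [0018] and [0019]: resource -> [(property, value), ...].
# Assumed convention for this sketch: a value beginning with "#" names a Resource,
# anything else is a literal.
Graph = Dict[str, List[Tuple[str, str]]]
Row = Tuple[str, str, str, str]  # (subject, verb, category "L" or "R", object)

fig2: Graph = {
    "#A1": [("rdf:type", "Patent"), ("Pat. No.", "5678"), ("Author", "#A2"), ("Inventor", "#B2")],
    "#B1": [("rdf:type", "Patent"), ("Pat. No.", "1234"), ("Author", "#C2"), ("Inventor", "#B2")],
    "#B2": [("rdfs:type", "Person"), ("Name", "A. Dingley")],
    "#A2": [("rdfs:type", "Person"), ("Name", "Formaggio")],
    "#C2": [("rdfs:type", "Person"), ("Name", "Cheeseman")],
}


def parse(graph: Graph) -> List[Row]:
    """Flatten the graph into one row per arc, tagging the verb as pointing to a Literal or a Resource."""
    rows: List[Row] = []
    for subject, arcs in graph.items():
        for verb, obj in arcs:
            category = "R" if obj.startswith("#") else "L"
            rows.append((subject, verb, category, obj))
    return rows


def rebuild(rows: List[Row]) -> Graph:
    """Reassemble the graph from the table, showing that parsing loses nothing."""
    graph: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, verb, _category, obj in rows:
        graph[subject].append((verb, obj))
    return dict(graph)


table = parse(fig2)
for row in table[:4]:
    print(row)
assert rebuild(table) == fig2
```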
One of the premises underlying the use of RDF is that the inevitable increase in the amount of data as a consequence of converting data into RDF is offset by the advantages gained from representing data in a standard form (assuming, of course, that RDF becomes widely adopted), and by the increased flexibility which operating on data in RDF offers. Another premise is that advances in computing power and memory may be used to deal with the additional data arising from the adoption of RDF.

[0024] However, it remains the case that, in order to execute a query on a triple store, each row of a particular column of the triple store must be searched for attributes in that column which match the query. The length of the triple store is thus one of the principal factors determining the time required to execute a query on such a store. One aspect of the present invention provides dynamic management of a triple store to migrate particular sets of triples (or “rows” in database nomenclature) into a separate store in the event that they are frequently accessed when a query is executed, and (if they are located in a separate store) to re-migrate sets of triples back into the principal triple store when they cease to be accessed frequently. This means that frequently accessed triples are located in one or more separate tables having fewer rows, on which queries may therefore be executed more rapidly. In addition, this removes triples from the principal store, thus improving performance there for the remaining triples.

[0025] In one embodiment of the invention, the criterion for determining whether a given triple is migrated to a
separate store is whether it is accessed to form part of the result set to a query on a predetermined number of occasions over the course of either a predetermined period of time (i.e. determined in terms of, for example, years, days, hours, minutes and seconds) or, alternatively, as a proportion of a predetermined number of queries performed on the database (whether or not their execution accesses the given triple).

[0026] Referring now to FIG. 4, a database management programme operates to manage the triple store and, where appropriate, to migrate selected triples within the store into a separate store when the selected sets of triples are accessed frequently in the course of executing queries on the store. The programme’s operation is invoked automatically by the receipt of a query by the database at step 402, and receipt of the query causes the programme, at step 404, to increment by one a variable QCOUNT (QC), representative of the total number of queries made of the triple store. At step 406 the programme determines, for each triple forming part of the result set of the query, whether it has been accessed pursuant to a query before. If this is the first time the triple has been accessed, then a variable RnX is initialised with a value of one at step 408. The variable RnX is simply an identifier for the triple which is unique within the database, which in this example is the row number of the triple (Rn), together with the number of times X that the triple Rn has been accessed. If the triple has been accessed before, then the variable RnX will already be initialised, and is incremented by one at step 410. At step 412, the variable RnX is then stored in conjunction with the value QC.
These two variables denote the same event, i.e. a given query of the triple store, but with reference to different things: the variable QC refers to the total number of queries, so each value of QC is unique within the database, while the variable RnX denotes the Xth occasion on which row n of the database has been accessed. In combination, these two variables enable an evaluation of the frequency with which row n of the database is accessed in the course of a given number of queries of the triple store as a whole or, put another way, the proportion of queries of the triple store as a whole which access the nth row of the database. This may be measured, for example, by reference to the aggregate number of queries ever received by the database, or by reference to an interval defined by a set number of queries. In the present example, the frequency with which a given triple is accessed is measured over an interval of 100 queries of the triple store as a whole. At step 414 a variable i, representing the total number of queries within the current interval of 100 queries, is incremented by 1, and at step 416 a decision is taken as to whether the interval total of 100 queries for the database as a whole has been reached. If it has, i is reset to zero at step 417 to restart the count, and a calculation is then performed at step 418 for each set of triples accessed over the course of the most recent interval, to determine how often it has been accessed in this interval. This calculation is shown in box 420, and is simply the difference between the number of occasions on which the triple Rn was part of the result set to a query when the total number of queries (of the triple store as a whole) is QC, and the same number when the total number of queries was QC - 100, i.e. RnX at QC minus RnX at QC - 100.
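The counting in steps 402 to 418 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the programme of FIG. 4 itself: result sets arrive as row numbers, and a per-row list of QC values stands in for the stored (RnX, QC) pairs, from which the value RnX had at any earlier query count is recovered by binary search. The class, its methods and the usage data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List

INTERVAL = 100  # queries per interval, as in the example of FIG. 4


class AccessMonitor:
    """Sketch of steps 402 to 418 of FIG. 4; names are hypothetical."""

    def __init__(self) -> None:
        self.qc = 0  # QCOUNT: total queries received by the store
        self.i = 0   # queries seen in the current interval
        # history[n] lists the QC value stored at each access of row n (step 412),
        # so len(history[n]) is RnX, the number of accesses so far.
        self.history: DefaultDict[int, List[int]] = defaultdict(list)

    def on_query(self, result_rows: Iterable[int]) -> Dict[int, int]:
        """Process one query; return per-row counts when an interval completes, else {}."""
        self.qc += 1                          # step 404
        for n in result_rows:                 # steps 406 to 412: start or extend RnX, store with QC
            self.history[n].append(self.qc)
        self.i += 1                           # step 414
        if self.i < INTERVAL:                 # step 416: interval not yet complete
            return {}
        self.i = 0                            # step 417
        # Step 418, evaluated for every row seen so far so that rows which went quiet report 0.
        return {n: self.accesses_in_last_interval(n) for n in self.history}

    def rnx_at(self, n: int, qc: int) -> int:
        """Value RnX had when the store-wide query count was qc (history is sorted ascending)."""
        return bisect_right(self.history[n], qc)

    def accesses_in_last_interval(self, n: int) -> int:
        """Box 420: RnX at QC minus RnX at QC - 100."""
        return self.rnx_at(n, self.qc) - self.rnx_at(n, self.qc - INTERVAL)


mon = AccessMonitor()
reports = []
for q in range(1, 201):
    if q <= 100:
        rows = [3, 7] if q % 2 == 0 else [3]  # row 3 every query, row 7 every other query
    else:
        rows = [7]                           # afterwards only row 7 is hit
    report = mon.on_query(rows)
    if report:
        reports.append(report)
print(reports)  # [{3: 100, 7: 50}, {3: 0, 7: 100}]
```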
A decision is then taken at step 422 to determine whether the number of occasions the triple has been accessed during the interval exceeds the predetermined number set as the threshold for migrating the triple into a separate store. If it does, the triple in question is denoted as a candidate for migration to a separate store, and at step 424 the triple is migrated. Conversely, if the threshold is not exceeded, then at step 426 the triple is repatriated to the principal table if it is in a separate store, or is not migrated if it is already in the principal store.

[0027] It should be noted that the steps of measuring, deciding and then migrating may be performed by separate processes. Describing them here as part of one process is not essential, but is convenient for the purpose of description. Slow processes such as migration may also be delayed or deferred until times of low system load. It is also possible to switch off monitoring during periods of extremely high load.

[0028] In a programme such as the one illustrated herein, in which management of the triple store is performed principally on the basis of the frequency with which a triple is accessed, a difficulty exists in deciding on an appropriate destination for migrating triples. In its simplest form, the present invention provides that all sets of triples which, over the course of the previous 100 queries of the triple store as a whole, were accessed more than a predetermined number of times (the “threshold access frequency”) are migrated to a single separate store. However, further improvements on this approach include, in one embodiment, providing a plurality of separate stores for sets of triples having different access frequencies, with the number of triples in each separate store being determined by the access frequency of the triples in that store.
Thus, for example, a store for triples with a high access frequency holds a maximum of only a few triples, whereas a store for triples having a relatively low access frequency, but still in excess of the threshold, will hold a relatively large number of triples. In addition, the management programme preferably groups the triples for migration so that, where possible, triples are stored with other triples having a common subject, verb or object.

[0029] Alternatively, triples migrated from the triple store are grouped by reference to rdf type, either that of the migrated triples themselves or possibly that of their parent, or even grandparent.

[0030] In a modification of the programme illustrated and described above, the management programme operates by using queries of the triple store to identify triples to be migrated. Thus, in accordance with this modification, the number of occasions a given query is executed is recorded, and in the event that the frequency of the given query exceeds a predetermined threshold, the sets of triples which form the result set to this given query are migrated to a separate store. This approach has the advantage of more straightforward migration and management of triples, since the process of identifying the triples to be migrated inherently groups them together for storage in a new store.

[0031] The dynamic management exemplified in the examples described above is particularly beneficial when storing semi-structured data, since documents in RDF format may be used to represent all manner of data. It is thus quite possible that, upon addition of further triples to the triple store, for example subsequent to further parsing of an amended document, the Verbs of the newly resultant triples may be Verbs not previously stored, whose triples are accessed more frequently than triples previously stored. In such a circumstance, it would make sense to migrate such new triples to an auxiliary table, which the present invention enables.
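The query-based modification of [0030] might be sketched in Python along these lines. The assumptions are all hypothetical and chosen for brevity: a query is identified by a string key plus a predicate over triples, a raw execution count stands in for query frequency, the triple set is static, and the result sets of different frequent queries do not overlap. A real design would also have to keep the separate stores consistent with later inserts and updates.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, verb, object)
Predicate = Callable[[Triple], bool]


class QueryDrivenStore:
    """Sketch of the [0030] modification; the class and its parameters are hypothetical.

    Assumes a static triple set and non-overlapping result sets for frequent queries.
    """

    def __init__(self, principal: List[Triple], threshold: int) -> None:
        self.principal = list(principal)
        self.auxiliary: Dict[str, List[Triple]] = {}  # one separate store per frequent query
        self.counts = Counter()                       # executions recorded per query key
        self.threshold = threshold

    def execute(self, query_key: str, predicate: Predicate) -> List[Triple]:
        self.counts[query_key] += 1                   # record this execution of the query
        if query_key in self.auxiliary:
            # Already migrated: only the small separate store is searched.
            return [t for t in self.auxiliary[query_key] if predicate(t)]
        # Otherwise search the principal store and every separate store.
        candidates = self.principal + [t for store in self.auxiliary.values() for t in store]
        result = [t for t in candidates if predicate(t)]
        if self.counts[query_key] > self.threshold:   # frequency exceeds the threshold
            self.auxiliary[query_key] = result         # the result set becomes a new store
            moved = set(result)
            self.principal = [t for t in self.principal if t not in moved]
        return result


def inventor_is_b2(t: Triple) -> bool:
    return t[1] == "Inventor" and t[2] == "#B2"


store = QueryDrivenStore(
    [("#A1", "Inventor", "#B2"), ("#B1", "Inventor", "#B2"), ("#B2", "Name", "A. Dingley")],
    threshold=2,
)
for _ in range(3):
    hits = store.execute("inventor=#B2", inventor_is_b2)
print(len(store.principal), len(store.auxiliary["inventor=#B2"]))  # 1 2
```

Because the query’s own result set defines the new store, no separate grouping step is needed, which mirrors the advantage claimed in [0030].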
[0032] In a further modification, repatriation of a triple to the principal store is determined on the basis of one or more criteria which differ from the or each criterion used to determine whether the triple should be migrated. Thus, for example, the management programme may be configured to include some in-built inertia against repatriation once migration has occurred. For example, where both migration and repatriation are determined on the basis of the proportion of queries which access a triple, the programme may be configured so that, once the triple has been migrated, queries accessing it must fail to be executed the requisite number of times on, for example, two intervals of 100 queries of the database as a whole before it is repatriated. Alternatively, an entirely different criterion may be used to determine repatriation, so that, for example, the proportion of queries is monitored to determine whether migration ought to take place, whereas the number of occasions a migrated triple is accessed is monitored to determine whether repatriation takes place. Typically, repatriation is likely to be less frequent than migration, and in one embodiment repatriation may simply not be possible.

1. A database having a principal table of triples, and a management programme adapted to monitor operation of the principal table and to migrate triples from the principal table to at least one newly-generated auxiliary table when at least one criterion tested by the programme is met.

2. A database according to claim 1 wherein the management programme is additionally adapted to monitor operation of an auxiliary table and to repatriate one or more triples from the monitored auxiliary table to the principal table in the event that at least one criterion tested by the programme is not met.

3.
A database according to claim 2 wherein the programme is adapted to test the same at least one criterion in determining whether a triple is to be migrated to an auxiliary table and in determining whether a triple is to be repatriated to the principal table from an auxiliary table.

4. A database according to claim 2 wherein the programme is adapted to test different criteria in determining whether a triple is to be migrated to an auxiliary table and in determining whether a triple is to be repatriated to the principal table from an auxiliary table.

5. A database according to claim 1 wherein the management programme is adapted to test the number of occasions on which a triple is accessed as a result of execution of a query, as a proportion of a number of queries received by the database as a whole.

6. A database according to claim 5 wherein the management programme is adapted to test the number of occasions on which a triple is accessed as a result of execution of a query, as a proportion of a predetermined number of queries received by the database as a whole.

7. A database according to claim 1 wherein the management programme is adapted to test the number of occasions on which a triple is accessed as a result of execution of a query within a given period of time.

8. A database according to claim 1 wherein the management programme is adapted to test the number of occasions a given query is executed as a proportion of all queries executed.

9. A database according to claim 8 wherein the management programme is adapted to test the number of occasions a given query is executed during the course of execution of a predetermined total number of queries.

10. A database according to claim 1 wherein the management programme is adapted to test the number of occasions on which a given query is executed within a predetermined period of time.

11.
A database according to claim 8 wherein, in the event that the at least one criterion tested by the management programme is met, all triples forming the result set to a given query are migrated to an auxiliary table.

12. A database according to claim 1, wherein migrated triples of the same rdf type are migrated to a common auxiliary table.
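The decision at steps 422 to 426 of FIG. 4, combined with the in-built inertia against repatriation described in [0032] (a migrated triple must fail the threshold on, for example, two intervals of 100 queries before it is repatriated), can be sketched in Python as follows. This is a minimal illustration, not the patented programme: the per-row counts are assumed to come from the box 420 calculation, and the class, its fields and the usage data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PlacementPolicy:
    """Sketch of steps 422 to 426 of FIG. 4 with [0032]-style inertia; parameters are hypothetical."""

    threshold: int                     # an interval count above this triggers migration
    quiet_intervals_required: int = 2  # consecutive below-threshold intervals before repatriation
    migrated: Set[int] = field(default_factory=set)
    quiet_streak: Dict[int, int] = field(default_factory=dict)

    def end_of_interval(self, counts: Dict[int, int]) -> Dict[int, str]:
        """counts maps row number -> accesses during the last interval (box 420)."""
        actions: Dict[int, str] = {}
        for n, hits in counts.items():
            if hits > self.threshold:                  # step 422: threshold exceeded
                self.quiet_streak[n] = 0
                if n not in self.migrated:
                    self.migrated.add(n)
                    actions[n] = "migrate"             # step 424
            elif n in self.migrated:
                self.quiet_streak[n] = self.quiet_streak.get(n, 0) + 1
                if self.quiet_streak[n] >= self.quiet_intervals_required:
                    self.migrated.discard(n)
                    del self.quiet_streak[n]
                    actions[n] = "repatriate"          # step 426
        return actions


policy = PlacementPolicy(threshold=20)
first = policy.end_of_interval({3: 100, 7: 5})    # {3: 'migrate'}
second = policy.end_of_interval({3: 0, 7: 100})   # {7: 'migrate'}; row 3 has one quiet interval
third = policy.end_of_interval({3: 0, 7: 100})    # {3: 'repatriate'}
print(first, second, third)
```

With `quiet_intervals_required=1` the policy reduces to the plain FIG. 4 behaviour, in which a migrated triple is repatriated as soon as an interval passes without the threshold being exceeded.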