On Semi-Supervised Learning
Of Legal Semantics
L. Thorne McCarty
Rutgers University
Three Papers
● 1998: Structured Casenotes: How Publishers Can Add Value to
Public Domain Legal Materials on the World Wide Web.
● 2007: Deep Semantic Interpretations of Legal Texts.
● 2015: How to Ground a Language for Legal Discourse in a
Prototypical Perceptual Semantics.
And a Proposal:
A research strategy to produce a computational summary
of a legal case, which can be scaled up to a realistic legal
corpus.
The Challenge
A structured casenote is a computational summary of the
procedural history of a case along with the substantive legal
conclusions articulated at each stage of the process. It would play
the same role in the legal information systems of the 21st century
that West Headnotes and Key Numbers have played in the 20th
century.
From my 1998 paper:
Why focus on procedural history?
Think about the traditional “brief” that students are taught to write in their first year of law school. The traditional case brief focuses on the procedural context first: Who is suing whom, and for what? What is the plaintiff's legal theory? What facts does the plaintiff allege to support this theory? How does the defendant respond? How does the trial court dispose of the case? What is the basis of the appeal? What issues of law are presented to the appellate court? How does the appellate court resolve these issues, and with what justification?
Within this procedural framework, we would represent the substantive issues at stake in the decision.
● For the computational summary, we need an expressive
Knowledge Representation (KR) language.
● How can we build a database of structured casenotes at the
appropriate scale?
● Fully automated processing of legal texts?
● Semi-automated, with a human editor in the loop?
● For either approach, we need a Natural Language (NL)
technology that can handle the complexity of legal cases.
● But in 1998, neither the NL nor the KR technology was
sufficiently advanced.
Two Steps Toward a Solution:
ICAIL '07
Contributions:
● Showed that a “state-of-the-art statistical parser ... can handle
even the complex syntactic constructions of an appellate court
judge.”
● Showed that the “semantic interpretation of the full text of a
judicial opinion can be computed automatically from the output
of the parser.”
Technical specifications:
● Quasi-Logical Form (QLF).
● Definite Clause Grammar (DCG).
She has also brought this ADA suit in which
she claims that her former employer, Policy
Management Systems Corporation,
discriminated against her on account of her
disability.
526 U.S. 795 (1999)
Terms:
term(lex, var, list)
...
“She has also brought this ADA suit ...”
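As a rough illustration, here is what a QLF term for this fragment might look like, written as a Prolog fact. The functor names (sterm, nterm) follow the slides, but the exact structure is my reconstruction, not the output of the ICAIL '07 system:

    % Hypothetical QLF for "She has also brought this ADA suit":
    % the head verb becomes an sterm whose argument list holds the
    % nterms for the subject and the object noun phrase.
    qlf_example( sterm(brought, _B,
                   [ nterm(she, _S, []),
                     nterm(suit, _T, [nterm(ada, _A, [])]) ]) ).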
The petitioner contends that the regulatory
takings claim should not have been decided by
the jury and that the Court of Appeals adopted an
erroneous standard for regulatory takings liability.
526 U.S. 687 (1999)
sterm(decided,C,[_,_])
...
AND
sterm(adopted,J,[_,_])
...
[modal(should),negative,perfect,passive]
The court ruled that sufficient evidence had
been presented to the jury from which it
reasonably could have decided each of these
questions in Del Monte Dunes' favor.
526 U.S. 687 (1999)
Semantics of 'WDT' and 'WHNP': W^nterm(which,W,[])
Semantics of 'IN': Obj^Subj^P^pterm(in,P,[Subj,Obj])
Unify: Obj = nterm(which,W,[])
       Term = pterm(in,P,[Subj,Obj])
Semantics of 'WHPP':
W^Subj^P^pterm(in,P,[Subj,nterm(which,W,[])])
Semantics of 'S': E^sterm(claims,E,[_,_])
Unify: Term = pterm(in,P,[E,nterm(which,W,[])])
       Tense = [present]
Semantics of 'SBAR':
W^(E^(P^pterm(in,P,[E,nterm(which,W,[])]) & sterm(claims,E,[_,_]))/[present])
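The same composition can be reproduced in a few lines of Prolog. This is a minimal sketch of my own, not McCarty's grammar, but it shows how the '^' abstractions above are applied by unification inside a DCG:

    % Lexical entries carry their semantics as an argument; '^' is the
    % standard right-associative infix operator, used here for abstraction.
    wdt(W^nterm(which, W, []))                 --> [which].
    prep(Obj^Subj^P^pterm(in, P, [Subj, Obj])) --> [in].

    % WHPP --> IN WHNP: unifying Obj with the wh-term performs the
    % "application" step shown on the slide.
    whpp(W^Subj^P^Term) -->
        prep(Obj^Subj^P^Term),
        wdt(W^Obj).

    % ?- phrase(whpp(Sem), [in, which]).
    % Sem = W^Subj^P^pterm(in, P, [Subj, nterm(which, W, [])]).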
● How accurate are these semantic interpretations?
● Unfortunately, we do not have the data to answer this
question.
● Consider a different strategy:
● Write hand-coded extraction patterns to map information
from the QLF interpretations into the format of a structured
casenote (a hypothetical example follows this list).
● Generalize these extraction patterns by the unsupervised
learning of the legal semantics implicit in a large set of
unannotated legal cases.
● The total system would thus be engaged in a form of
semi-supervised learning of legal semantics.
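As a hypothetical example of such an extraction pattern, again in Prolog (the casenote format and the predicate names here are assumptions for illustration, not an actual target schema):

    % Map a QLF term for "X contends that S" into a structured-casenote
    % entry recording a party's contention.
    extract_contention(sterm(contends, _E, [nterm(Party, _Var, _Mods), Claim]),
                       casenote_entry(contention, Party, Claim)).

    % ?- extract_contention(
    %        sterm(contends, e1, [nterm(petitioner, p1, []),
    %                             sterm(decided, c1, [_, _])]),
    %        Entry).
    % Entry = casenote_entry(contention, petitioner, sterm(decided, c1, [_, _])).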
Two Steps Toward a Solution:
ICAIL '15
● New Article (less technical, more intuitive):
“How to Ground a Language for Legal Discourse in a
Prototypical Perceptual Semantics”
(An edited transcript of a presentation at the Legal Quanta
Symposium at Michigan State University College of Law on
October 29, 2015)
Forthcoming in 2016 Michigan State Law Review _____.
Includes links to my more technical papers.
● Prototype Coding:
● The basic idea is to represent a point in an n-dimensional
space by measuring its distance from a prototype in several
specified directions (a one-line formalization follows this list).
● Furthermore, assuming that our initial space is Euclidean,
we want to select a prototype that lies at the origin of an
embedded, low-dimensional, nonlinear subspace, which is in
some sense “optimal”.
● The second point leads to a theory of
● Manifold Learning
● Deep Learning
● The theory has three components, drawn from:
Probability, Geometry, Logic.
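In the Euclidean case, the basic idea of prototype coding can be written in one line (my formalization of the description above): fix a prototype $c$ and unit directions $u_1, \dots, u_k$ spanning the embedded subspace, and recode a point $x$ by its displacements from the prototype along those directions,

    $x \mapsto (d_1, \dots, d_k)$, where $d_i = \langle x - c,\, u_i \rangle$.

In the nonlinear case, the straight directions $u_i$ are replaced by the coordinate curves of the embedded subspace, with the $d_i$ measured along them.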
● The Probabilistic Model:
This is a diffusion process determined by a potential function,
U(x), and its gradient, ∇U(x), in an arbitrary n-dimensional
Euclidean space.
The invariant probability measure for the diffusion process is
proportional to e^{2U(x)}, which means that ∇U(x) is proportional to
the gradient of the log of the stationary probability density. (A short
derivation follows.)
● The Geometric Model:
This is a Riemannian manifold with a Riemannian metric, g_ij(x),
which we interpret as a measure of dissimilarity.
Using this dissimilarity metric, we can define a radial coordinate,
ρ, and the directional coordinates, θ₁, θ₂, ..., θₙ₋₁, in our original
n-dimensional space, and then compute an optimal nonlinear
k-dimensional subspace.
The radial coordinate is defined to follow the gradient vector,
∇U(x), and the directional coordinates are defined to be
orthogonal to ∇U(x). (One possible formalization follows.)
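One way to make these definitions concrete (my formalization, not necessarily the paper's exact construction): the radial coordinate $\rho$ is arc length along the flow of the normalized gradient,

    $\dfrac{dx}{d\rho} = \dfrac{\nabla U(x)}{\lVert \nabla U(x) \rVert}$,

starting from the prototype, where $\nabla U = 0$; and the directional coordinates $\theta_1, \dots, \theta_{n-1}$ parameterize the level sets $\{\, x : \rho(x) = c \,\}$, so that their tangent vectors satisfy $\langle \partial x / \partial \theta_i,\, \nabla U(x) \rangle = 0$.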
[Figure: the encoding pipeline for handwritten digit images. A sample of 60,000 images is scanned into 600,000 7×7 patches (49 dimensions each) and encoded down to 12 dimensions; 14×14 patches (48 dimensions, assembled from four encoded 7×7 patches) are encoded to 12 dimensions; and a final encoding yields a category label, e.g., “Category: 4”.]
● ∇U(x) is estimated from the data using the mean shift
algorithm. (The standard identity behind this is recorded below.)
● ∇U(x) = 0 at a prototype.
● The prototypical clusters partition the space of
600,000 patches.
[Figure: the 35 prototypes.]
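The connection between mean shift and this gradient is standard, and worth recording here (my addition). With a Gaussian kernel density estimate over the patches,

    $\hat p(x) = \dfrac{1}{N} \sum_{i=1}^{N} \dfrac{1}{(2\pi h^2)^{n/2}} \exp\!\Big( -\dfrac{\lVert x - x_i \rVert^2}{2h^2} \Big)$,

the mean shift vector

    $m(x) = \dfrac{\sum_i x_i\, e^{-\lVert x - x_i \rVert^2 / 2h^2}}{\sum_i e^{-\lVert x - x_i \rVert^2 / 2h^2}} - x$

satisfies $m(x) = h^2\, \nabla \log \hat p(x) = 2h^2\, \nabla U(x)$, so the mean shift iteration ascends the density and halts at a mode, where $\nabla U(x) = 0$: a prototype.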
[Figure: Prototypes 09, 27, and 30, shown with their principal axes and the radial coordinate ρ.]
[Figure: geodesic coordinate curves for the directional coordinates θ around a prototype.]
● The Logical Language:
The proposed logical language is a categorical logic based on
the category of differential manifolds (Man), which is weaker
than a logic based on the category of sets (Set) or the category
of topological spaces (Top).
For an intuitive understanding of what this means, assume that
we have replaced the standard semantics of classical logic,
based on sets and their elements, with a semantics based on
manifolds and their points. The atomic formulas can then be
interpreted as prototypical clusters, and the geometric properties
of these clusters can be propagated throughout the rest of the
language.
The same strategy can be applied to the entirety of my
Language for Legal Discourse (LLD).
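To give one concrete reading of this replacement (my gloss, following standard categorical semantics rather than the paper's exact construction): interpret a one-place predicate $P$ as a prototypical cluster, i.e., an open region $[\![P]\!] \subseteq M$ around a prototype in a manifold $M$, rather than as an arbitrary subset of a set. The connectives then act geometrically, e.g.,

    $[\![P \wedge Q]\!] = [\![P]\!] \cap [\![Q]\!]$, and $[\![\exists y\, R(x,y)]\!] = \pi([\![R]\!])$,

where $\pi : M \times M' \to M$ is the smooth projection. Because images and intersections of manifolds need not be manifolds, fewer logical laws survive, and this is one sense in which a logic over Man is weaker than one over Set or Top.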
● Constraints:
[Diagram: three layers, Logic above Geometry above Probability, linked by mutual constraints.]
Logic is constrained by the geometry.
The geometric model is constrained by the probabilistic model.
The probability measure is constrained by the data.
Conjecture: The existence of these mutual constraints makes
it possible to learn the semantics of a complex knowledge
representation language.
● Why is this a “prototypical perceptual semantics”?
● It is a prototypical semantics because it is based on a
representation of prototypical clusters.
● It is a prototypical perceptual semantics because the primary
illustrations of the theory are drawn from the field of image
processing.
● Claim: If we can build a logical language on these
foundations, we will have a plausible account of how
human cognition could be grounded in human
perception.
Can We Learn
A Grounded Semantics
Without a Perceptual Ground?
● Two reasons to think this is possible:
● The theory of differential similarity is not really sensitive to
the precise details of the representations used at the lower
levels.
● There is increasing evidence that the semantics of lexical
items can be represented, approximately, as a vector in a
high-dimensional vector space, using only the information
available in the texts.
● Research Strategy:
● We initialize our model with a word embedding computed
from legal texts.
● We learn the higher-level concepts in a legal domain by
applying the theory of differential similarity (the chain of
steps is summarized below).
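Schematically, the proposal chains together the pieces described above (my summary; the notation is an assumption):

    $w \mapsto v_w \in \mathbb{R}^n \;\longrightarrow\; \hat p(x) \propto e^{2U(x)} \;\longrightarrow\; \{\, x : \nabla U(x) = 0 \,\} \;\longrightarrow\; (\rho, \theta_1, \dots, \theta_{k-1})$

That is: embed the legal lexicon; estimate the potential $U$ on the embedded points (e.g., by mean shift); locate the prototypes and fit the low-dimensional subspaces around them; and interpret the resulting prototypical clusters as the denotations of atomic predicates in the Language for Legal Discourse.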
● Discussion?
