Heaviness vs. Newness in Constituent Ordering @cite{arnold-wasow-losongco-ginstrom-2000} #
@cite{arnold-wasow-losongco-ginstrom-2000} use a corpus analysis (the Aligned-Hansard corpus, with verbs give, bring...to, take...into account) and an elicitation experiment (with give) to disentangle two confounded predictors of English postverbal constituent ordering:
- Heaviness — relative word count: heavier constituents come later (Behaghel's Gesetz der wachsenden Glieder, the "law of growing constituents", @cite{behaghel-1909}; "end-weight" in @cite{quirk-greenbaum-leech-svartvik-1972}).
- Newness — discourse status: given material precedes new information. The principle predates @cite{prince-1981}'s given/inferable/new taxonomy, which the paper uses for coding; @cite{gundel-hedberg-zacharski-1993} situate it within the broader accessibility-hierarchy literature.
The paper's central empirical claim is that both factors independently predict ordering in dative alternation and heavy NP shift; neither reduces to the other (§2-3). The deeper theoretical point of §5 is that the two factors interact: heaviness exerts more influence when newness is non-discriminating, and vice versa, consistent with a constraint-based architecture in which "the strength of a constraint is greater when competing constraints are weak".
Formalization strategy #
Following @cite{coetzee-pater-2011}, we model the system as a MaxEnt
grammar (@cite{goldwater-johnson-2003}) over the binary ordering
candidates {themeLast, goalLast} with two markedness constraints:
- *HEAVY-FIRST — penalize the order whose first constituent is strictly heavier than the second
- *NEW-FIRST — penalize the order whose first constituent is discourse-new while the second is discourse-given
The harmony-score difference between the two orderings decomposes
additively (score_diff_eq_components) into a heaviness term
wH * heavyDiff p and a newness term wN * newDiff p, where
heavyDiff p, newDiff p ∈ {-1, 0, +1} record the per-constraint signed
preference for themeLast over goalLast. Every prediction theorem
below is a one-line consequence of this decomposition.
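To fix ideas, here is a minimal, self-contained Lean sketch of the decomposition. It is not the library's code: Givenness stands in for Features.BinaryGivenness, and the weights are Int rather than real so everything stays #eval-able; heavyDiff copies the rendered equation below, and newDiff follows its documented mirror image.

```lean
-- Local sketch, not the library's definitions.
inductive Givenness | given | new

structure Phrase where
  wordCount : Nat
  discourse : Givenness

-- Copies the rendered heavyDiff equation below.
def heavyDiff (p : Phrase × Phrase) : Int :=
  (if p.1.wordCount > p.2.wordCount then 1 else 0)
    - (if p.2.wordCount > p.1.wordCount then 1 else 0)

-- Documented mirror image: +1 iff theme new / goal given, -1 iff the reverse.
def newDiff (p : Phrase × Phrase) : Int :=
  match p.1.discourse, p.2.discourse with
  | .new, .given => 1
  | .given, .new => -1
  | _, _ => 0

-- score_diff_eq_components, read as a definition: the harmony difference
-- for themeLast over goalLast is the weighted sum of the two signals.
def scoreDiff (wH wN : Int) (p : Phrase × Phrase) : Int :=
  wH * heavyDiff p + wN * newDiff p

-- Light theme, heavy goal, both new: only heaviness fires.
#eval scoreDiff 2 1 ({ wordCount := 1, discourse := .new },
                     { wordCount := 6, discourse := .new })  -- -2, goalLast wins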
The §5 interaction story falls out as
heaviness_dominates_when_newness_neutral: when the newness constraint
is silent on a pair, the harmony difference is exactly wH * heavyDiff
— so any change in heaviness is undiluted by competing pressure. The
sigmoid shape of MaxEnt softmax then turns this into the empirical
"newness has more effect when heaviness is balanced" pattern.
Non-reducibility of the two factors #
heaviness_and_newness_genuinely_independent exhibits two pairs that
witness the §2-3 claim that neither factor reduces to the other:
- heavyGoalContrast has newDiff = 0 but heavyDiff ≠ 0 — only heaviness fires
- newThemeContrast has heavyDiff = 0 but newDiff ≠ 0 — only newness fires
A theory operationalizing only one of the two factors must give the same prediction at one of these contrasts as at the trivial baseline, contradicting the paper's findings.
Bridges #
- Core.Constraint.MaxEntGrammar — the grammar is packaged as a generic MaxEnt grammar (maxEntGrammar), making the softmax probability machinery available without redefinition.
- Features.BinaryGivenness — the discourse-status partition. The paper collapses @cite{prince-1981}'s three-way given/inferable/new into two categories (inferable → given). Focus marking is on a separate axis (Features.InformationStructure.Focus) and not consumed here.
- Theories.Syntax.DependencyGrammar.Formal.DependencyLength — Dependency Locality (@cite{futrell-gibson-2020}) provides the positive derivation of the heaviness signal: §9 below shows heavyDiff is exactly the sign of the DLM cost difference between the two orderings, so *HEAVY-FIRST is not a stipulation but a theorem about which order minimizes total dependency length on a binary postverbal pair. The same module's word-invariance (dlm_word_invariant) shows DLM cannot, on its own, also derive the newness effect — motivating §10's UID derivation.
- Theories.Processing.MemorySurprisal — under @cite{futrell-2019}'s information-locality framework, DLM and UID are reductions of a single mutual-information-weighted cost (§11), so the two independent constraints *HEAVY-FIRST and *NEW-FIRST reflect variation along orthogonal axes of one underlying processing theory.
- Theories.Processing.Memory — MemorySurprisal's information locality is itself the behavioural profile of a finite-capacity MemoryProcess (@cite{futrell-gibson-levy-2020}'s lossy-context surprisal). §11 below traces the full grounding chain MemoryProcess → MemorySurprisal → {DLM, UID} → {*HEAVY-FIRST, *NEW-FIRST}, so Arnold's two constraints are not just co-justified at the cost-function level but anchored in a single architectural primitive: a predictor reading from a lossily-encoded memory.
A constituent characterized by the two dimensions @cite{arnold-wasow-losongco-ginstrom-2000} measure: word count (heaviness) and discourse status (newness). Concrete syntactic structure is abstracted away — these two scalars exhaust what the paper's regressions condition on.
- wordCount : ℕ
- discourse : Features.BinaryGivenness
The two constituents of a binary postverbal alternation. For the
dative alternation, (theme, goal); for heavy NP shift, (direct object, prepositional phrase). The constraints below are
construction-neutral.
*HEAVY-FIRST: violated when the first (verb-adjacent) constituent
is strictly heavier than the second. The OT-style markedness
encoding of @cite{behaghel-1909}'s law of growing constituents:
avoid placing the longer constituent first.
*NEW-FIRST: violated when the first constituent is discourse-new
while the second is discourse-given. A markedness encoding of the
given-before-new principle the paper draws from
@cite{prince-1981}/@cite{gundel-hedberg-zacharski-1993}.
The two-constraint MaxEnt grammar parameterized by weights.
wH weights *HEAVY-FIRST; wN weights *NEW-FIRST.
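The equations for both constraint evaluators and the weighted grammar are unrendered above, so here is a hedged, self-contained sketch. The linear-order reading (themeLast = V goal theme, goalLast = V theme goal) is taken from §9 below; harmony as negated weighted violations is the standard MaxEnt convention, assumed here rather than quoted from the library.

```lean
-- Self-contained sketch; the library's actual definitions presumably differ
-- only in naming. The "first" (verb-adjacent) constituent is p.2 under
-- themeLast and p.1 under goalLast.
inductive Givenness
  | given
  | new
deriving DecidableEq

structure Phrase where
  wordCount : Nat
  discourse : Givenness

inductive Order | themeLast | goalLast

def firstOf : Order → Phrase × Phrase → Phrase
  | .themeLast, p => p.2
  | .goalLast,  p => p.1

def secondOf : Order → Phrase × Phrase → Phrase
  | .themeLast, p => p.1
  | .goalLast,  p => p.2

-- *HEAVY-FIRST: one violation iff the first constituent is strictly heavier.
def heavyFirst (o : Order) (p : Phrase × Phrase) : Nat :=
  if (firstOf o p).wordCount > (secondOf o p).wordCount then 1 else 0

-- *NEW-FIRST: one violation iff the first is new while the second is given.
def newFirst (o : Order) (p : Phrase × Phrase) : Nat :=
  if (firstOf o p).discourse = .new ∧ (secondOf o p).discourse = .given
  then 1 else 0

-- Standard MaxEnt convention (assumed): harmony = negated weighted violations.
def harmony (wH wN : Float) (o : Order) (p : Phrase × Phrase) : Float :=
  -(wH * (heavyFirst o p).toFloat + wN * (newFirst o p).toFloat)
```

With two candidates, the probability of themeLast is then the logistic of harmony wH wN .themeLast p - harmony wH wN .goalLast p, exactly the sigmoid sketched above.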
The heaviness constraint's signed preference for themeLast over
goalLast on a pair: +1 when the theme (p.1) is heavier (so
placing it last avoids violation), -1 when the goal (p.2) is
heavier, 0 when they are equal.
Equations
- ArnoldEtAl2000.heavyDiff p = (if p.1.wordCount > p.2.wordCount then 1 else 0) - if p.2.wordCount > p.1.wordCount then 1 else 0
The newness constraint's signed preference for themeLast over
goalLast on a pair: +1 when the theme is new and the goal given
(so placing the theme last respects given-before-new), -1 when
the goal is new and the theme given, 0 otherwise.
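The rendered equation is missing here. By symmetry with heavyDiff above, the definition is presumably the mirrored pair of guarded indicators; the following is a hedged reconstruction, not verbatim doc-gen output.

```lean
-- Hedged reconstruction (assumption, mirroring heavyDiff above):
-- ArnoldEtAl2000.newDiff p =
--   (if p.1.discourse = .new ∧ p.2.discourse = .given then 1 else 0)
--     - (if p.2.discourse = .new ∧ p.1.discourse = .given then 1 else 0)
```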
The harmony-score difference decomposes additively into per-constraint signed preferences scaled by their weights. This is the foundational identity of the formalization — every prediction theorem below is a one-step consequence.
newDiff is positive iff the theme is new while the goal is
given — i.e., *NEW-FIRST prefers themeLast.
Heaviness independently predicts ordering. With the newness
weight zeroed out, a positive heaviness weight is enough to make
the order placing the heavier constituent last strictly more
probable. Symmetric in the heavier-side direction: when heavyDiff p
is non-zero, its sign determines which order wins.
Newness independently predicts ordering. With the heaviness weight zeroed out, a positive newness weight is enough to make the order placing the new constituent last strictly more probable.
Both factors compose additively. When neither factor opposes
themeLast (both per-constraint contributions are non-negative)
and at least one strictly favors it, themeLast wins — without
requiring the caller to compute the combined sum. No separate
interaction term is needed to reproduce the experiment's significant
heaviness × newness term in logistic regression: it falls out of
additive harmony plus the sigmoid shape of MaxEnt probability.
Tradeoff theorem. When heaviness and newness conflict — one
favors themeLast, the other goalLast — the prediction depends
on which side has the larger weighted contribution. This is the
constraint-based architecture @cite{arnold-wasow-losongco-ginstrom-2000}
argue for in §5.
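A two-line arithmetic illustration with made-up weights: the same conflicting signals flip the winner as the weights cross.

```lean
-- Conflicting signals: heavyDiff = -1 (goal heavier, prefer goalLast),
-- newDiff = +1 (theme new, prefer themeLast). The sign of the weighted
-- sum, and hence the winner, tracks whichever weight is larger.
#eval (2 * (-1) + 1 * 1 : Int)  -- -1: wH = 2 dominates, goalLast wins
#eval (1 * (-1) + 2 * 1 : Int)  --  1: wN = 2 dominates, themeLast wins
```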
@cite{arnold-wasow-losongco-ginstrom-2000} §5 observe that "heaviness
had the largest effect on utterances where both constituents were given"
— and in general, "the strength of a constraint is greater when competing
constraints are weak". The next two theorems derive this directly from
MaxEnt's additive harmony: when one constraint contributes 0 to the
harmony difference (its eval is identical on both candidates), the
entire difference is borne by the other constraint, undiluted. The
sigmoid then translates a larger harmony differential into a larger
probability shift.
Constraint-interaction theorem (heaviness side). When the
newness constraint is neutral on a pair (newDiff p = 0, i.e., the
two constituents share the same givenness status), the harmony
difference between orderings is determined entirely by the weighted
heaviness term.
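The content is one step of algebra over the decomposition; an illustrative core-Lean example, stated over Int for executability:

```lean
-- Undiluted heaviness: with the newness term contributing zero, the
-- score difference is the bare weighted heaviness term.
example (wH wN h : Int) : wH * h + wN * 0 = wH * h := by
  rw [Int.mul_zero, Int.add_zero]
```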
Constraint-interaction theorem (newness side). When the
heaviness constraint is neutral on a pair (heavyDiff p = 0, i.e.,
the two constituents are equally long), the harmony difference is
determined entirely by the weighted newness term. The paper's
elicitation experiment (where give stimuli held NP length roughly
constant) is exactly this regime — and unsurprisingly newness
showed a larger effect there than in the corpus study.
Give the carrot to the white rabbit who lived in the briar patch. Heavy goal (8 words), light theme (1 word), both new — the heaviness contrast (newness is silent here).
Equations
- ArnoldEtAl2000.heavyGoalContrast = ({ wordCount := 1, discourse := Features.BinaryGivenness.new }, { wordCount := 8, discourse := Features.BinaryGivenness.new })
Give Alice the carrot. (Theme new, goal given.) Equal length, pure newness contrast (heaviness is silent here).
Equations
- ArnoldEtAl2000.newThemeContrast = ({ wordCount := 1, discourse := Features.BinaryGivenness.new }, { wordCount := 1, discourse := Features.BinaryGivenness.given })
Non-reducibility witness. The two contrast pairs jointly
establish the paper's central claim that neither factor reduces to
the other: heavyGoalContrast activates only heaviness (newness
differential is zero), newThemeContrast activates only newness
(heaviness differential is zero). Any theory that operationalizes
only one of the two dimensions must collapse the prediction at one
contrast to the trivial baseline, contradicting
@cite{arnold-wasow-losongco-ginstrom-2000}.
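Applying the sketch definitions to the two pairs (definitions repeated so the block stands alone; pair values copied from the rendered equations above) confirms the complementary activation pattern:

```lean
inductive Givenness | given | new

structure Phrase where
  wordCount : Nat
  discourse : Givenness

def heavyDiff (p : Phrase × Phrase) : Int :=
  (if p.1.wordCount > p.2.wordCount then 1 else 0)
    - (if p.2.wordCount > p.1.wordCount then 1 else 0)

def newDiff (p : Phrase × Phrase) : Int :=
  match p.1.discourse, p.2.discourse with
  | .new, .given => 1
  | .given, .new => -1
  | _, _ => 0

def heavyGoalContrast : Phrase × Phrase :=
  ({ wordCount := 1, discourse := .new }, { wordCount := 8, discourse := .new })

def newThemeContrast : Phrase × Phrase :=
  ({ wordCount := 1, discourse := .new }, { wordCount := 1, discourse := .given })

#eval (heavyDiff heavyGoalContrast, newDiff heavyGoalContrast)  -- (-1, 0)
#eval (heavyDiff newThemeContrast,  newDiff newThemeContrast)   -- (0, 1)
```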
Pure-heaviness MaxEnt grammar predicts goal-last (heavier-last) when
the goal is heavier than the theme. Direct application of the
heavyDiff-symmetric independence theorem on the swapped pair.
Pure-newness MaxEnt grammar predicts theme-last (given-first) when the theme is new and the goal is given.
The two-constraint grammar packaged as a generic MaxEntGrammar
over pairs (input) and orderings (output). This makes the library's
softmax probability infrastructure (MaxEntGrammar.prob, the
ConstraintSystem bridge, softmax_argmax_limit for the OT limit,
etc.) available without redefinition.
totalDepLength (from DependencyLength.lean) is a candidate
formalization of @cite{behaghel-1909}'s end-weight effect — and
@cite{arnold-wasow-losongco-ginstrom-2000} discuss
@cite{hawkins-1990}'s parsing-theoretic version (Early Immediate
Constituents) as an instance. The next three lemmas show that any such
purely structural account cannot, on its own, reproduce the newness
effect: dependency length is a function of the dependency structure
alone — it never reads the words. So no DLM-derived predictor
distinguishes a sentence with discourse-given NPs from a sentence with
discourse-new NPs sharing the same dependency tree.
Combined with newness_independently_predicts, this implies any
adequate theory of postverbal ordering must combine a weight constraint
with at least one further dimension — here, discourse status.
totalDepLength ignores word identity: it depends only on the
dependency structure (head index × dep index × relation).
Even at the single-dependency level, depLength is |head − dep|
— the grammatical relation is irrelevant.
Corollary of dlm_word_invariant: trees that differ only in whether
their NPs are discourse-given or discourse-new receive identical
DLM cost. So Dependency Locality, as a pure tree-structural cost,
cannot reproduce the newness effect that
@cite{arnold-wasow-losongco-ginstrom-2000} demonstrate.
@cite{futrell-gibson-2020} establish dependency length minimization (DLM) as the explanatory principle behind a wide range of word-order universals, including @cite{behaghel-1909}'s law of growing constituents (their §2.3). The argument: in a head-initial language, when a head V has multiple right-dependents, total dependency length from V is minimized by ordering them shortest-first, because the head→dep distance to the second constituent equals the length of the first plus one.
Specialized to Arnold's binary postverbal alternation:
- Order V Y X (head-initial): V→head(Y) = 1, V→head(X) = |Y|+1. V-side DLM cost = |Y| + 2.
So goalLast (V theme goal) costs |theme|+2 and themeLast (V goal
theme) costs |goal|+2. DLM picks the order whose first constituent is
shorter — and that is what the *HEAVY-FIRST constraint operationalizes.
The heavyDiff sign is therefore not a free parameter of the
formalization but a theorem about DLM.
DLM cost contribution from the verb to its two postverbal complement heads, under a head-initial binary structure. The verb sits at position 0; the first constituent occupies positions 1…|first|, so its head (also at position 1, head-initial) is distance 1 from V, and the second constituent's head is at position |first|+1.
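A sketch inferred from the docstring's position arithmetic; the name verbDlmCost is the library's, but the body and the two order-specific wrappers are assumptions added for illustration.

```lean
-- Inferred from the docstring: V at 0, first constituent at 1..|first|,
-- its head at distance 1, the second head at |first| + 1.
def verbDlmCost (firstLen : Nat) : Nat := 1 + (firstLen + 1)

-- goalLast = V theme goal (theme first); themeLast = V goal theme (goal first).
def costGoalLast  (themeLen _goalLen : Nat) : Nat := verbDlmCost themeLen
def costThemeLast (_themeLen goalLen : Nat) : Nat := verbDlmCost goalLen

-- The documented gap: cost(goalLast) - cost(themeLast) = |theme| - |goal|.
example (t g : Nat) :
    (costGoalLast t g : Int) - costThemeLast t g = (t : Int) - g := by
  simp only [costGoalLast, costThemeLast, verbDlmCost]
  omega

-- heavyGoalContrast (theme 1 word, goal 8): goalLast is 7 cheaper, matching
-- heavyDiff = -1 (prefer goalLast).
#eval (costGoalLast 1 8, costThemeLast 1 8)  -- (3, 10)
```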
Heaviness is DLM, not a stipulation. The MaxEnt grammar's
*HEAVY-FIRST constraint signal heavyDiff is exactly the sign of
the DLM cost difference between the two orderings. With this
bridge, heavyDiff is no longer a primitive of the formalization —
it is a theorem about which ordering @cite{futrell-gibson-2020}'s
dependency-length cost minimizes on a binary postverbal pair.
The DLM cost difference matches heavyDiff numerically up to scale:
cost(goalLast) - cost(themeLast) = p.1.wordCount - p.2.wordCount,
which has the same sign as heavyDiff. This is the "exact"
arithmetic version of heavyDiff_eq_dlm_signal and makes the
DLM-cost gap directly computable from word counts.
Genzel & Charniak's uniform information density (UID), elaborated in @cite{levy-2008}'s expectation-based parsing, predicts that high-surprisal material should be placed late in an utterance: by then more context has been processed, so high-information words can be integrated with greater predictability and lower per-step processing load.
For Arnold's binary postverbal pair, this maps cleanly: discourse-new
material is high-surprisal (the listener must construct a fresh referent),
discourse-given material is low-surprisal (the referent is already
active). UID therefore prefers placing the new constituent last — exactly
the direction *NEW-FIRST operationalizes.
Unlike the DLM/heaviness bridge, this is an implication rather than a
biconditional: whenever newDiff p > 0, UID strictly prefers the same
ordering as *NEW-FIRST. Focus marking lives on its own axis
(Features.InformationStructure.Focus) and would enter UID via a
separate Focus-parameterized cost, not by extending this givenness
surprisal.
A coarse two-level surprisal proxy keyed on givenness:
.new is high-information (1), .given is low (0). This matches
the asymmetric pattern of Arnold's *NEW-FIRST constraint, which
fires only when one side is .new and the other is .given.
UID cost for the binary postverbal pair: the surprisal of whichever constituent occupies the verb-adjacent (first) position. UID prefers delaying high-surprisal material, so this should be minimized.
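A self-contained sketch of both definitions under the stated coarse proxy; the names mirror the docstrings, while the local Givenness and Order types stand in for the library's.

```lean
inductive Givenness | given | new

structure Phrase where
  wordCount : Nat
  discourse : Givenness

inductive Order | themeLast | goalLast

-- Two-level proxy: .new is high-information (1), .given is low (0).
def givennessSurprisal : Givenness → Nat
  | .new   => 1
  | .given => 0

-- Surprisal of whichever constituent sits verb-adjacent (first):
-- the goal (p.2) under themeLast, the theme (p.1) under goalLast.
def uidCost (o : Order) (p : Phrase × Phrase) : Nat :=
  match o with
  | .themeLast => givennessSurprisal p.2.discourse
  | .goalLast  => givennessSurprisal p.1.discourse

-- newThemeContrast (theme new, goal given): UID prefers themeLast,
-- the direction *NEW-FIRST encodes.
#eval uidCost .themeLast ({ wordCount := 1, discourse := .new },
                          { wordCount := 1, discourse := .given })  -- 0
#eval uidCost .goalLast  ({ wordCount := 1, discourse := .new },
                          { wordCount := 1, discourse := .given })  -- 1
```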
With §9 and §10 in place, Arnold's two MaxEnt constraints are no longer free stipulations — each is the boundary signal of an independently-motivated processing cost already formalized in linglib:
| Constraint | Bridge | Cost lives in |
|---|---|---|
| *HEAVY-FIRST | heavyDiff_eq_dlm_signal | Theories.Syntax.DependencyGrammar.Formal.DependencyLength |
| *NEW-FIRST | newDiff_pos_implies_uid_prefers_themeLast | Theories.Processing.MemorySurprisal (information locality) |
The two costs unify under @cite{futrell-2019}'s information locality
framework (see Theories.Processing.MemorySurprisal.Basic,
MutualInfoProfile.weightedSum): both DLM and UID are special cases of
minimizing Σ (memory cost × mutual information) across the utterance.
- DLM is the limit where mutual information is uniform across positions (only structural distance matters).
- UID is the limit where structural distance is held constant and only per-word surprisal varies.
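The two limits can be pictured with a toy weighted sum (purely illustrative; MutualInfoProfile.weightedSum in the library is the real object, and the numbers here are invented):

```lean
-- Toy Σ (memory cost at distance i) × (mutual information at distance i).
def weightedCost (memCosts mutualInfos : List Float) : Float :=
  (List.zipWith (· * ·) memCosts mutualInfos).foldl (· + ·) 0.0

-- DLM limit: information uniform, only structural distance varies.
#eval weightedCost [1.0, 2.0, 3.0] [1.0, 1.0, 1.0]  -- 6.0
-- UID limit: distance cost uniform, only per-word information varies.
#eval weightedCost [1.0, 1.0, 1.0] [0.0, 1.0, 0.5]  -- 1.5
```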
The fact that *HEAVY-FIRST and *NEW-FIRST are both needed in the
MaxEnt grammar — and neither reduces to the other empirically
(@cite{arnold-wasow-losongco-ginstrom-2000} §2-3,
heaviness_and_newness_genuinely_independent) — reflects that real
utterances vary along both the structural-distance and surprisal
axes. The MaxEnt weights wH / wN are then empirical estimates of
how much each pressure dominates in a given construction, with the
underlying processing theory supplying the constraint definitions
themselves rather than leaving them as stipulated penalties.
Architectural anchor: the lossy-memory predictor #
MutualInfoProfile.weightedSum is itself a behavioural profile of a
deeper substrate: a MemoryProcess (@cite{futrell-gibson-levy-2020},
formalized in Theories.Processing.Memory.Basic) — a predictor that
reads from a lossily-encoded summary of the past rather than from the
raw history. Classical surprisal arises as the lossless special case
(MemoryProcess.expectedSurprisal_eq_surprisal_of_lossless in
Memory.LossyContext); finite-capacity memory shifts it upward by an
amount controlled by which information the encoder retains.
Both Arnold constraints are diagnostic of this finite memory:
- *HEAVY-FIRST (DLM) penalises orderings that stress retention — long first constituents force the encoder to carry more history before integration, increasing per-bit memory cost.
- *NEW-FIRST (UID) penalises orderings that stress prediction — high-surprisal items at sentence-initial position face a memory state with no informative context yet to condition on.
The 4-level grounding chain is therefore explicit:
MemoryProcess (lossy substrate; Theories.Processing.Memory)
↓ (behavioural profile across distances)
MemorySurprisal.MutualInfoProfile (information locality)
↓ (specialise to one axis)
DLM (uniform info, distance varies) │ UID (uniform distance, info varies)
↓ (sign-of-cost-difference signal on a binary postverbal pair)
*HEAVY-FIRST │ *NEW-FIRST
What the new substrate buys here is not new theorems about Arnold's
data — heavyDiff_eq_dlm_signal and newDiff_pos_implies_uid_prefers_themeLast
already do that work — but a common architectural source for both
constraints. Where information locality says "DLM and UID are limits
of the same cost function", MemoryProcess says "and that cost
function is the expected surprisal of a memory-bottlenecked
predictor". The two-constraint MaxEnt grammar is then a quantitative
read-out of how that bottleneck shows up in postverbal ordering.