
Linglib.Phenomena.WordOrder.Studies.ArnoldEtAl2000

Heaviness vs. Newness in Constituent Ordering @cite{arnold-wasow-losongco-ginstrom-2000} #

@cite{arnold-wasow-losongco-ginstrom-2000} use a corpus analysis (the Aligned-Hansard corpus, with verbs give, bring...to, take...into account) and an elicitation experiment (with give) to disentangle two confounded predictors of English postverbal constituent ordering:

  1. Heaviness — relative word count: heavier constituents come later (Behaghel's Gesetz der wachsenden Glieder, the "law of growing constituents" @cite{behaghel-1909}; "end-weight" in @cite{quirk-greenbaum-leech-svartvik-1972}).
  2. Newness — discourse status: given material precedes new information. The principle predates @cite{prince-1981}'s given/inferable/new taxonomy, which the paper uses for coding; @cite{gundel-hedberg-zacharski-1993} place it in the broader accessibility-hierarchy literature.

The paper's central empirical claim is that both factors independently predict ordering in dative alternation and heavy NP shift; neither reduces to the other (§2-3). The deeper theoretical point of §5 is that the two factors interact: heaviness exerts more influence when newness is non-discriminating, and vice versa, consistent with a constraint-based architecture in which "the strength of a constraint is greater when competing constraints are weak".

Formalization strategy #

Following @cite{coetzee-pater-2011}, we model the system as a MaxEnt grammar (@cite{goldwater-johnson-2003}) over the binary ordering candidates {themeLast, goalLast} with two markedness constraints, *HEAVY-FIRST and *NEW-FIRST, defined below.

The harmony-score difference between the two orderings decomposes additively (score_diff_eq_components) into a heaviness term wH * heavyDiff p and a newness term wN * newDiff p, where heavyDiff p, newDiff p ∈ {-1, 0, +1} record the per-constraint signed preference for themeLast over goalLast. Every prediction theorem below is a one-line consequence of this decomposition.

The §5 interaction story falls out as heaviness_dominates_when_newness_neutral: when the newness constraint is silent on a pair, the harmony difference is exactly wH * heavyDiff p — so any change in heaviness is undiluted by competing pressure. The sigmoid shape of the MaxEnt softmax then turns this into the empirical pattern that each factor exerts its largest effect when the other is non-discriminating (the symmetric newness-side statement holds as well).
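For two candidates, the MaxEnt probability is the logistic of that harmony difference. Schematically (a sketch assuming Mathlib's Real.exp; probThemeLast is an illustrative name, not linglib's API):

```lean
import Mathlib

-- Two-candidate MaxEnt/softmax collapses to a logistic in the harmony difference:
-- the larger |wH * heavyDiff + wN * newDiff|, the further the probability sits
-- from 1/2 — the sigmoid shape behind the §5 interaction pattern.
noncomputable def probThemeLast (wH wN heavyDiff newDiff : ℝ) : ℝ :=
  1 / (1 + Real.exp (-(wH * heavyDiff + wN * newDiff)))
```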

Non-reducibility of the two factors #

heaviness_and_newness_genuinely_independent exhibits two pairs that witness the §2-3 claim that neither factor reduces to the other: one pair on which only the heaviness differential is non-zero, and one on which only the newness differential is non-zero.

A theory operationalizing only one of the two factors must give the same prediction at one of these contrasts as at the trivial baseline, contradicting the paper's findings.

Bridges #

A constituent characterized by the two dimensions @cite{arnold-wasow-losongco-ginstrom-2000} measure: word count (heaviness) and discourse status (newness). Concrete syntactic structure is abstracted away — these two scalars exhaust what the paper's regressions condition on.
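One plausible shape for this record, suggested by the derived DecidableEq/Repr instances and the two dimensions above (field names illustrative, not verified linglib source):

```lean
inductive Givenness | given | new
  deriving DecidableEq, Repr

structure Phrase where
  wordCount : Nat        -- heaviness: relative word count
  givenness : Givenness  -- newness: discourse status
  deriving DecidableEq, Repr
```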


The two constituents of a binary postverbal alternation. For the dative alternation, (theme, goal); for heavy NP shift, (direct object, prepositional phrase). The constraints below are construction-neutral.
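Given the p.1 / p.2 projections used below and the reducibility attribute, Pair is plausibly an abbreviation for a product (a sketch, not verified source):

```lean
abbrev Pair := Phrase × Phrase  -- (theme, goal): p.1 = theme, p.2 = goal
```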


Which of the two constituents occupies the second (sentence-final) slot. themeLast is the prepositional dative for DA and the shifted V PP DO for HNPS; goalLast is the double object for DA and the canonical V DO PP for HNPS.
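A plausible shape, with constructor names as documented above (a sketch, not verified source):

```lean
inductive Order
  | themeLast  -- prepositional dative (DA) / shifted V PP DO (HNPS)
  | goalLast   -- double object (DA) / canonical V DO PP (HNPS)
  deriving DecidableEq, Repr
```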


*HEAVY-FIRST: violated when the first (verb-adjacent) constituent is strictly heavier than the second. The OT-style markedness encoding of @cite{behaghel-1909}'s law of growing constituents: avoid placing the longer constituent first.


*NEW-FIRST: violated when the first constituent is discourse-new while the second is discourse-given. A markedness encoding of the given-before-new principle the paper draws from @cite{prince-1981}/@cite{gundel-hedberg-zacharski-1993}.


The two-constraint MaxEnt grammar parameterized by weights. wH weights *HEAVY-FIRST; wN weights *NEW-FIRST.


The heaviness constraint's signed preference for themeLast over goalLast on a pair: +1 when the theme (p.1) is heavier (so placing it last avoids violation), -1 when the goal (p.2) is heavier, 0 when they are equal.


The newness constraint's signed preference for themeLast over goalLast on a pair: +1 when the theme is new and the goal given (so placing the theme last respects given-before-new), -1 when the goal is new and the theme given, 0 otherwise.
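The elided equations for the two constraints and their signed diffs plausibly take the following shape, and the concrete example then checks the additive decomposition (score_diff_eq_components) by computation on one pair. Everything here is an illustrative sketch assuming the Pair/Order types above (with decidable equality on givenness), not verified linglib source:

```lean
-- 1 violation iff the first (verb-adjacent) constituent is strictly heavier (*HEAVY-FIRST)
def heavyFirstViol (p : Pair) : Order → Int
  | .themeLast => if p.2.wordCount > p.1.wordCount then 1 else 0  -- V goal theme
  | .goalLast  => if p.1.wordCount > p.2.wordCount then 1 else 0  -- V theme goal

-- 1 violation iff the first constituent is new while the second is given (*NEW-FIRST)
def newFirstViol (p : Pair) : Order → Int
  | .themeLast => if p.2.givenness = .new ∧ p.1.givenness = .given then 1 else 0
  | .goalLast  => if p.1.givenness = .new ∧ p.2.givenness = .given then 1 else 0

-- harmony = negated weighted violation sum (integer weights for the sketch)
def harmony (wH wN : Int) (p : Pair) (o : Order) : Int :=
  -(wH * heavyFirstViol p o + wN * newFirstViol p o)

-- per-constraint signed preference for themeLast over goalLast
def heavyDiff (p : Pair) : Int := heavyFirstViol p .goalLast - heavyFirstViol p .themeLast
def newDiff (p : Pair) : Int := newFirstViol p .goalLast - newFirstViol p .themeLast

-- the additive decomposition, checked by computation on one pair
-- (light new theme ⟨1, .new⟩, heavy new goal ⟨8, .new⟩; weights 2 and 3)
example :
    harmony 2 3 (⟨1, .new⟩, ⟨8, .new⟩) .themeLast
      - harmony 2 3 (⟨1, .new⟩, ⟨8, .new⟩) .goalLast
    = 2 * heavyDiff (⟨1, .new⟩, ⟨8, .new⟩) + 3 * newDiff (⟨1, .new⟩, ⟨8, .new⟩) := by
  decide
```

In the library the identity is proved once for all pairs and weights; the `decide` here only spot-checks the arithmetic (both sides evaluate to -2 on this pair).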


The harmony-score difference decomposes additively into per-constraint signed preferences scaled by their weights. This is the foundational identity of the formalization — every prediction theorem below is a one-step consequence.

heavyDiff is positive iff the theme (p.1) is strictly heavier than the goal — i.e., *HEAVY-FIRST prefers themeLast.

newDiff is positive iff the theme is new while the goal is given — i.e., *NEW-FIRST prefers themeLast.

Heaviness independently predicts ordering. With the newness weight zeroed out, a positive heaviness weight is enough to make the order placing the heavier constituent last strictly more probable. Symmetric in the heavier-side direction: when heavyDiff p is non-zero, its sign determines which order wins.

Newness independently predicts ordering. With the heaviness weight zeroed out, a positive newness weight is enough to make the order placing the new constituent last strictly more probable.

theorem ArnoldEtAl2000.both_factors_compose {p : Pair} {wH wN : ℝ} (hH : 0 ≤ wH) (hN : 0 ≤ wN) (hHeavy : 0 ≤ heavyDiff p) (hNew : 0 ≤ newDiff p) (hStrict : 0 < wH * heavyDiff p ∨ 0 < wN * newDiff p) :

Both factors compose additively. When neither factor opposes themeLast (both per-constraint contributions are non-negative) and at least one strictly favors it, themeLast wins — without requiring the caller to compute the combined sum. No separate interaction term is needed to reproduce the experiment's significant heaviness × newness term in logistic regression: it falls out of additive harmony plus the sigmoid shape of MaxEnt probability.

Tradeoff theorem. When heaviness and newness conflict — one favors themeLast, the other goalLast — the prediction depends on which side has the larger weighted contribution. This is the constraint-based architecture @cite{arnold-wasow-losongco-ginstrom-2000} argue for in §5.

@cite{arnold-wasow-losongco-ginstrom-2000} §5 observe that "heaviness had the largest effect on utterances where both constituents were given" — and in general, "the strength of a constraint is greater when competing constraints are weak". The next two theorems derive this directly from MaxEnt's additive harmony: when one constraint contributes 0 to the harmony difference (its eval is identical on both candidates), the entire difference is borne by the other constraint, undiluted. The sigmoid then translates a larger harmony differential into a larger probability shift.

Constraint-interaction theorem (heaviness side). When the newness constraint is neutral on a pair (newDiff p = 0, i.e., the two constituents share the same givenness status), the harmony difference between orderings is determined entirely by the weighted heaviness term.

Constraint-interaction theorem (newness side). When the heaviness constraint is neutral on a pair (heavyDiff p = 0, i.e., the two constituents are equally long), the harmony difference is determined entirely by the weighted newness term. The paper's elicitation experiment (where give stimuli held NP length roughly constant) is exactly this regime — and unsurprisingly newness showed a larger effect there than in the corpus study.
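Given the additive decomposition, both interaction theorems are one-liners. A schematic version, with the weighted terms as abstract integers:

```lean
-- When one constraint is neutral (its signed diff is 0), the harmony difference
-- is carried entirely, undiluted, by the other constraint's weighted term.
example (wH wN hD nD : Int) (hNeutral : nD = 0) :
    wH * hD + wN * nD = wH * hD := by
  simp [hNeutral]
```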

Give the carrot to the white rabbit who lived in the briar patch. Heavy goal (8 words), light theme (1 word), both new — the heaviness contrast (newness is silent here).


Give Alice the carrot. (Theme new, goal given.) Equal length, pure newness contrast (heaviness is silent here).
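One plausible encoding of the two contrast pairs, assuming the module's Pair of (wordCount, givenness) phrases (field order and exact counts are illustrative):

```lean
-- "Give the carrot to the white rabbit who lived in the briar patch."
-- Light theme (1 word), heavy goal (8 words), both new: only heaviness discriminates.
def heavyGoalContrast : Pair := (⟨1, .new⟩, ⟨8, .new⟩)   -- heavyDiff = -1, newDiff = 0

-- "Give Alice the carrot."
-- Equal length, theme new, goal given: only newness discriminates.
def newThemeContrast : Pair := (⟨1, .new⟩, ⟨1, .given⟩)  -- heavyDiff = 0, newDiff = +1
```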


Non-reducibility witness. The two contrast pairs jointly establish the paper's central claim that neither factor reduces to the other: heavyGoalContrast activates only heaviness (newness differential is zero), newThemeContrast activates only newness (heaviness differential is zero). Any theory that operationalizes only one of the two dimensions must collapse the prediction at one contrast to the trivial baseline, contradicting @cite{arnold-wasow-losongco-ginstrom-2000}.

Pure-heaviness MaxEnt grammar predicts goal-last (heavier-last) when the goal is heavier than the theme. Direct application of the heavyDiff-symmetric independence theorem on the swapped pair.

Pure-newness MaxEnt grammar predicts theme-last (given-first) when the theme is new and the goal is given.

The two-constraint grammar packaged as a generic MaxEntGrammar over pairs (input) and orderings (output). This makes the library's softmax probability infrastructure (MaxEntGrammar.prob, the ConstraintSystem bridge, softmax_argmax_limit for the OT limit, etc.) available without redefinition.


totalDepLength (from DependencyLength.lean) is a candidate formalization of @cite{behaghel-1909}'s end-weight effect — and @cite{arnold-wasow-losongco-ginstrom-2000} discuss @cite{hawkins-1990}'s parsing-theoretic version (Early Immediate Constituents) as an instance. The next three lemmas show that any such purely structural account cannot, on its own, reproduce the newness effect: dependency length is a function of the dependency structure alone — it never reads the words. So no DLM-derived predictor distinguishes a sentence with discourse-given NPs from a sentence with discourse-new NPs sharing the same dependency tree.

Combined with newness_independently_predicts, this implies any adequate theory of postverbal ordering must combine a weight constraint with at least one further dimension — here, discourse status.

theorem ArnoldEtAl2000.dlm_word_invariant (deps : List DepGrammar.Dependency) (rootIdx : ℕ) (words₁ words₂ : List Word) :
DepGrammar.DependencyLength.totalDepLength { words := words₁, deps := deps, rootIdx := rootIdx } = DepGrammar.DependencyLength.totalDepLength { words := words₂, deps := deps, rootIdx := rootIdx }

totalDepLength ignores word identity: it depends only on the dependency structure (head index × dep index × relation).

theorem ArnoldEtAl2000.depLength_ignores_relation (h d : ℕ) (r₁ r₂ : UD.DepRel) :
DepGrammar.DependencyLength.depLength { headIdx := h, depIdx := d, depType := r₁ } = DepGrammar.DependencyLength.depLength { headIdx := h, depIdx := d, depType := r₂ }

Even at the single-dependency level, depLength is |head − dep| — the grammatical relation is irrelevant.

theorem ArnoldEtAl2000.dlm_discourse_blind (deps : List DepGrammar.Dependency) (rootIdx : ℕ) (givenWords newWords : List Word) :
DepGrammar.DependencyLength.totalDepLength { words := givenWords, deps := deps, rootIdx := rootIdx } = DepGrammar.DependencyLength.totalDepLength { words := newWords, deps := deps, rootIdx := rootIdx }

Corollary of dlm_word_invariant: trees that differ only in whether their NPs are discourse-given or discourse-new receive identical DLM cost. So Dependency Locality, as a pure tree-structural cost, cannot reproduce the newness effect that @cite{arnold-wasow-losongco-ginstrom-2000} demonstrate.

@cite{futrell-gibson-2020} establish dependency length minimization (DLM) as the explanatory principle behind a wide range of word-order universals, including @cite{behaghel-1909}'s law of growing constituents (their §2.3). The argument: in a head-initial language, when a head V has multiple right-dependents, total dependency length from V is minimized by ordering them shortest-first, because the head→dep distance to the second constituent equals the length of the first plus one.

Specialized to Arnold's binary postverbal alternation:

So goalLast (V theme goal) costs |theme|+2 and themeLast (V goal theme) costs |goal|+2. DLM picks the order whose first constituent is shorter — and that is what the *HEAVY-FIRST constraint operationalizes. The heavyDiff sign is therefore not a free parameter of the formalization but a theorem about DLM.

DLM cost contribution from the verb to its two postverbal complement heads, under a head-initial binary structure. The verb sits at position 0; the first constituent occupies positions 1…|first|, so its head (also at position 1, head-initial) is distance 1 from V, and the second constituent's head is at position |first|+1.
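The elided equations plausibly take the following shape (a sketch assuming the module's Pair/Order, not the verified source):

```lean
-- V at position 0; the first constituent's head sits at position 1 (head-initial),
-- the second constituent's head at |first| + 1, so:
--   cost (goalLast)  = 1 + (|theme| + 1) = |theme| + 2   -- V theme goal
--   cost (themeLast) = 1 + (|goal| + 1)  = |goal| + 2    -- V goal theme
def dlmCost (p : Pair) : Order → Nat
  | .goalLast  => 1 + (p.1.wordCount + 1)
  | .themeLast => 1 + (p.2.wordCount + 1)
```

On the heavy-goal example (theme 1 word, goal 8 words) this gives cost 3 for goalLast and 10 for themeLast, so DLM shifts the heavy goal to the end.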


Heaviness is DLM, not a stipulation. The MaxEnt grammar's *HEAVY-FIRST constraint signal heavyDiff is exactly the sign of the DLM cost difference between the two orderings. With this bridge, heavyDiff is no longer a primitive of the formalization — it is a theorem about which ordering @cite{futrell-gibson-2020}'s dependency-length cost minimizes on a binary postverbal pair.

The DLM cost difference matches heavyDiff numerically up to scale: cost(goalLast) - cost(themeLast) = p.1.wordCount - p.2.wordCount, which has the same sign as heavyDiff. This is the "exact" arithmetic version of heavyDiff_eq_dlm_signal and makes the DLM-cost gap directly computable from word counts.

Genzel & Charniak's uniform information density (UID), elaborated in @cite{levy-2008}'s expectation-based parsing, predicts that high-surprisal material should be placed late in an utterance: by then more context has been processed, so high-information words can be integrated with greater predictability and lower per-step processing load.

For Arnold's binary postverbal pair, this maps cleanly: discourse-new material is high-surprisal (the listener must construct a fresh referent), discourse-given material is low-surprisal (the referent is already active). UID therefore prefers placing the new constituent last — exactly the direction *NEW-FIRST operationalizes.

Unlike the DLM/heaviness bridge, this is an implication rather than a biconditional: whenever newDiff p > 0, UID strictly prefers the same ordering as *NEW-FIRST. Focus marking lives on its own axis (Features.InformationStructure.Focus) and would enter UID via a separate Focus-parameterized cost, not by extending this givenness surprisal.

A coarse two-level surprisal proxy keyed on givenness: .new is high-information (1), .given is low (0). This matches the asymmetric pattern of Arnold's *NEW-FIRST constraint, which fires only when one side is .new and the other is .given.

def ArnoldEtAl2000.uidCost (p : Pair) : Order → ℕ

UID cost for the binary postverbal pair: the surprisal of whichever constituent occupies the verb-adjacent (first) position. UID prefers delaying high-surprisal material, so this should be minimized.
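The elided definitions plausibly take this shape (a sketch assuming the module's Givenness/Pair/Order, not the verified source):

```lean
-- Two-level surprisal proxy: .new is high-information, .given is low.
def surprisal : Givenness → Nat
  | .new   => 1
  | .given => 0

-- UID cost = surprisal of the constituent in the verb-adjacent (first) slot.
def uidCost (p : Pair) : Order → Nat
  | .themeLast => surprisal p.2.givenness  -- V goal theme: goal is first
  | .goalLast  => surprisal p.1.givenness  -- V theme goal: theme is first
```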


Newness is UID, in the direction *NEW-FIRST cares about. Whenever the MaxEnt grammar's newDiff signal favors themeLast (theme new + goal given), UID strictly prefers the same order.

With the DLM and UID bridges in place, Arnold's two MaxEnt constraints are no longer free stipulations — each is the boundary signal of an independently-motivated processing cost already formalized in linglib:

| Constraint | Bridge | Cost lives in |
| --- | --- | --- |
| *HEAVY-FIRST | heavyDiff_eq_dlm_signal | Theories.Syntax.DependencyGrammar.Formal.DependencyLength |
| *NEW-FIRST | newDiff_pos_implies_uid_prefers_themeLast | Theories.Processing.MemorySurprisal (information locality) |

The two costs unify under @cite{futrell-2019}'s information locality framework (see Theories.Processing.MemorySurprisal.Basic, MutualInfoProfile.weightedSum): both DLM and UID are special cases of minimizing Σ (memory cost × mutual information) across the utterance.

The fact that *HEAVY-FIRST and *NEW-FIRST are both needed in the MaxEnt grammar — and neither reduces to the other empirically (@cite{arnold-wasow-losongco-ginstrom-2000} §2-3, heaviness_and_newness_genuinely_independent) — reflects that real utterances vary along both the structural-distance and surprisal axes. The MaxEnt weights wH / wN are then empirical estimates of how much each pressure dominates in a given construction, with the underlying processing theory supplying the constraint definitions themselves rather than leaving them as stipulated penalties.

Architectural anchor: the lossy-memory predictor #

MutualInfoProfile.weightedSum is itself a behavioural profile of a deeper substrate: a MemoryProcess (@cite{futrell-gibson-levy-2020}, formalized in Theories.Processing.Memory.Basic) — a predictor that reads from a lossily-encoded summary of the past rather than from the raw history. Classical surprisal arises as the lossless special case (MemoryProcess.expectedSurprisal_eq_surprisal_of_lossless in Memory.LossyContext); finite-capacity memory shifts it upward by an amount controlled by which information the encoder retains.

Both Arnold constraints are diagnostic of this finite memory.

The 4-level grounding chain is therefore explicit:

    MemoryProcess (lossy substrate; Theories.Processing.Memory)
       ↓  (behavioural profile across distances)
    MemorySurprisal.MutualInfoProfile (information locality)
       ↓  (specialise to one axis)
    DLM (uniform info, distance varies) │ UID (uniform distance, info varies)
       ↓  (sign-of-cost-difference signal on a binary postverbal pair)
    *HEAVY-FIRST                        │ *NEW-FIRST

What the new substrate buys here is not new theorems about Arnold's data — heavyDiff_eq_dlm_signal and newDiff_pos_implies_uid_prefers_themeLast already do that work — but a common architectural source for both constraints. Where information locality says "DLM and UID are limits of the same cost function", MemoryProcess says "and that cost function is the expected surprisal of a memory-bottlenecked predictor". The two-constraint MaxEnt grammar is then a quantitative read-out of how that bottleneck shows up in postverbal ordering.