Pullum & Gazdar (1982) @cite{gazdar-pullum-1982} #
Natural Languages and Context-Free Languages. Linguistics and Philosophy, 4(4), 471–504.
Core Argument #
@cite{gazdar-pullum-1982} systematically examine every published argument purporting to demonstrate the non-context-freeness of some natural language, and show that each is invalid — either empirically or formally. The five arguments examined:
Comparatives (§2): @cite{chomsky-1963} claims English comparative constructions form an xy language (requiring nonidentity between substrings x and y). P&G show the empirical premise is wrong (the data doesn't support the nonidentity claim) and the formal premise is false: infinitely many xy languages are context-free. They exhibit a CF-PSG (grammar 9) generating {αxβyγ | x ≠ y, x,y ∈ (a+b)*}.
Pi (§3): Elster (1978) applies the pumping lemma directly to English sentences about the decimal expansion of π, arguing that the requirement for digits to match π's expansion makes English non-CF. P&G show this confuses grammaticality with arithmetic truth — the sentences are grammatical regardless of whether the digits are correct.
Respectively (§4): Bar-Hillel & Shamir (1960) and @cite{langendoen-1978} argue that respectively constructions require matching the number of elements in two lists, creating non-CF dependencies. P&G show both the empirical characterization of English respectively data and the formal reasoning are flawed.
Dutch (§5): Huybregts (1976) argues that Dutch cross-serial verb clusters create non-CF cross-serial dependencies of the form a₁...aₙb₁...bₙ. P&G construct explicit CF-PSGs (grammars 29, 32) that generate the correct Dutch subordinate clause strings with proper verb subcategorization, showing that cross-serial word ORDER alone does not exceed CF power.
Mohawk (§6): Postal (1964) and Langendoen (1977) argue that Mohawk noun incorporation creates non-CF string-copying (the incorporated noun-stem must match the external NP). P&G show that possessed incorporation (examples 40a–e) violates the stem-matching premise: incorporated stems need not match external NPs.
Key Formal Results #
Two constructive results are formalized here:
Grammar (29): A CF-PSG generating Dutch subordinate clause verb phrases with correct NP-verb valency, formalized using mathlib's
ContextFreeGrammar. This demonstrates that cross-serial word ORDER is context-free.The critical distinction: Cross-serial word order alone is CF (grammar 29). What is non-CF is cross-serial order PLUS case agreement across unbounded depth — proven for Swiss German by @cite{shieber-1985}, formalized in
Shieber1985.
Architectural Note #
This file uses mathlib's Language α and Language.IsContextFree /
Language.IsRegular natively. linglib's CFG-extension machinery
(ContextFreeGrammar.Tree, ContextFreeGrammar.Pumping, etc.) layers on
top of Mathlib.Computability.ContextFreeGrammar.
§2: xy Languages Can Be Context-Free #
@cite{gazdar-pullum-1982} refute @cite{chomsky-1963}'s claim that xy languages are inherently non-context-free. The language {aⁿbᵐ | n ≠ m} is both an xy language (nonidentity between the a-block and b-block) and context-free.
P&G's grammar (9) handles the more general language {αxβyγ | x,y ∈ (a+b)*, x ≠ y} with 8 nonterminals and ~15 rules. We use the simpler {aⁿbᵐ | n ≠ m} here, which makes the same point with 3 nonterminals and 6 rules.
Equations
- PullumGazdar1982.instDecidableEqXYNT x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
- PullumGazdar1982.instReprXYNT = { reprPrec := PullumGazdar1982.instReprXYNT.repr }
Equations
- PullumGazdar1982.instReprXYNT.repr PullumGazdar1982.XYNT.S prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "PullumGazdar1982.XYNT.S")).group prec✝
- PullumGazdar1982.instReprXYNT.repr PullumGazdar1982.XYNT.S₁ prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "PullumGazdar1982.XYNT.S₁")).group prec✝
- PullumGazdar1982.instReprXYNT.repr PullumGazdar1982.XYNT.S₂ prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "PullumGazdar1982.XYNT.S₂")).group prec✝
Instances For
Terminals for the {aⁿbᵐ | n ≠ m} grammar.
Instances For
Equations
- PullumGazdar1982.instDecidableEqXYT x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
- PullumGazdar1982.instReprXYT = { reprPrec := PullumGazdar1982.instReprXYT.repr }
Equations
- PullumGazdar1982.instReprXYT.repr PullumGazdar1982.XYT.a prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "PullumGazdar1982.XYT.a")).group prec✝
- PullumGazdar1982.instReprXYT.repr PullumGazdar1982.XYT.b prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "PullumGazdar1982.XYT.b")).group prec✝
Instances For
A CF-PSG generating {aⁿbᵐ | n ≠ m}.
Rules:
- S → S₁ | S₂
- S₁ → aS₁ | aS₁b | a (n > m: at least one extra a)
- S₂ → S₂b | aS₂b | b (n < m: at least one extra b)
This refutes the claim that xy languages are inherently non-CF. @cite{gazdar-pullum-1982} §2, with grammar (9) for the more general case.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Terminal symbols for Dutch subordinate clause verb phrases.
Following @cite{gazdar-pullum-1982} grammar (29b), these are the syntactic categories of words that appear in Dutch verb-final subordinate clauses. Each category determines the verb's valency (transitive vs intransitive) and whether it takes a VP complement.
- np : DutchT
- transInf : DutchT
- intrans : DutchT
- trVPInf : DutchT
- inVPInf : DutchT
- trVPFin : DutchT
- inVPFin : DutchT
Instances For
Equations
- PullumGazdar1982.instDecidableEqDutchT x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
- PullumGazdar1982.instReprDutchT = { reprPrec := PullumGazdar1982.instReprDutchT.repr }
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- PullumGazdar1982.instDecidableEqDutchNT x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
- PullumGazdar1982.instReprDutchNT = { reprPrec := PullumGazdar1982.instReprDutchNT.repr }
Equations
- One or more equations did not get rendered due to their size.
Instances For
@cite{gazdar-pullum-1982} grammar (29): a context-free grammar generating Dutch subordinate clause verb phrases with correct NP-verb valency.
Syntactic rules (29a):
- A → BCD | CE
- C → BCF | CG | BH | I
The grammar ensures each transitive verb has an NP argument and each intransitive verb does not, producing cross-serial NP-verb dependencies in the surface string. The cross-serial ORDER (NP₁...NPₙ V₁...Vₙ) is a consequence of the right-branching VP structure.
Crucially, this grammar makes NO case-agreement demands: the NPs are untyped. This is why the string set is context-free. Adding case agreement (requiring NP case to match verb case across unbounded depth) takes the language beyond CF — that is @cite{shieber-1985}'s argument.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Two-verb cross-serial clause: Marie Pieter laat schrijven ("Marie let Pieter write").
Surface: NP₁ NP₂ V₁ V₂ = np np trVPFin transInf
Cross-serial binding: NP₁(Marie) ──── V₁(laat) NP₂(Pieter) ──── V₂(schrijven)
Derivation: A ⇒ np·C·transInf ⇒ np·np·trVPFin·transInf
Three-verb cross-serial clause: Marie Pieter Arabisch laat zien schrijven ("Marie let Pieter see Arabisch write").
Surface: NP₁ NP₂ NP₃ V₁ V₂ V₃ = np np np trVPFin trVPInf transInf
Cross-serial binding: NP₁ ─────────────── V₁(laat) NP₂ ─────────────── V₂(zien) NP₃ ─────────────── V₃(schrijven)
Derivation: A ⇒ np·C·transInf ⇒ np·np·C·trVPInf·transInf ⇒ np·np·np·trVPFin·trVPInf·transInf
Intransitive clause with modal: wil zwemmen ("wants to swim").
Surface: V₁ V₂ = inVPFin intrans (no NPs — both verbs are intransitive)
Derivation: A ⇒ C·intrans ⇒ inVPFin·intrans
The critical distinction. Cross-serial word order alone is context-free (@cite{gazdar-pullum-1982} grammar 29, demonstrated above). What takes the language beyond CF is cross-serial order PLUS case agreement — the requirement that dative NPs match dative verbs and accusative NPs match accusative verbs across unbounded depth.
This distinction resolves the apparent contradiction:
- @cite{bresnan-etal-1982} argue Dutch cross-serial dependencies are non-CF
- @cite{gazdar-pullum-1982} show CF grammars CAN generate cross-serial strings
- @cite{shieber-1985} proves Swiss German IS non-CF, using case-marking
The resolution: @cite{bresnan-etal-1982}'s argument relied on constituency assumptions. The valid proof (@cite{shieber-1985}) uses case agreement, which grammar (29) deliberately omits.
The question of whether natural languages are context-free remained open as of 1982. @cite{gazdar-pullum-1982} conclude: "Notice that this paper has not claimed that all natural languages are CFL's. What it has shown is that every published argument purporting to demonstrate the non-context-freeness of some natural language is invalid."
The question was settled by @cite{shieber-1985}, who provided the first valid proof (for Swiss German) using case-marking — a purely string-based argument making no constituency assumptions.
The xy grammar language is context-free (by construction: we exhibit the CFG).
The Dutch cross-serial order grammar language is context-free.
The critical distinction, strengthened with mathlib predicates:
- The Dutch cross-serial order language IS context-free
- {aⁿbⁿcⁿdⁿ} (cross-serial order + case-matching) is NOT context-free