Shieber (1985) @cite{shieber-1985} #
Evidence against the Context-Freeness of Natural Language. Linguistics and Philosophy, 8(3), 333–343.
Core Argument #
@cite{shieber-1985} proves that Swiss German is not weakly context-free,
using a purely string-based argument that makes no assumptions about
constituent structure or semantics. The proof rests on four empirical
claims about Swiss German subordinate clauses, plus the closure of
context-free languages under string homomorphism and intersection with
regular languages (proven in Linglib.Core.Computability.ContextFreeGrammar.{Map, InterRegular}
following @cite{bar-hillel-perles-shamir-1961} / @cite{hopcroft-motwani-ullman-2000}
Theorems 7.24, 7.27).
The Four Claims #
- Swiss German subordinate clauses have structures where all Vs follow all NPs.
- Among such sentences, those with all DAT-NPs before all ACC-NPs, and all DAT-Vs before all ACC-Vs, exist as a regular-intersection-filterable subset. This is not a grammaticality condition on Swiss German itself — SG freely allows interleaved NP orderings — but the case-sorted subset is what Shieber's homomorphism + regular intersection isolates.
- The number of DAT-Vs equals the number of DAT-NPs (and similarly for ACC).
- An arbitrary number of Vs can occur (subject to performance).
What Is Formalized Here #
swiss_german_not_contextFree proves: Swiss German subordinate clauses are
not weakly context-free. The language swissGermanLang allows any token
sequence whose cross-serial core (after stripping matrix boundary tokens —
matrix subjects, complementizers, auxiliaries, finite verbs of the embedding
clause) has shape nps ++ vs with case counts matched. Matrix tokens may
appear interleaved anywhere; Shieber's homomorphism erases them.
The proof exercises both legs of the Bar-Hillel schema
(Language.not_isContextFree_via_witness from Closure.lean):
- Homomorphism leg (
tokenStringHom): cross-serial NPs and Vs map to singleton {a,b,c,d} letters; matrix tokens map to[](Shieber's ε-erasure). - Regular-intersection leg (
caseSorted = a*b*c*d*): selects only case-sorted images, isolating the canonical cross-serial shape from any source-string interleaving of cross-serial tokens.
The witness language stringMap h L ⊓ R = ambncmdn — non-CF by
ambncmdn_not_contextFree (two-parameter pumping). Per the Bar-Hillel
schema, source SG is non-CF.
Proof bottoms out at the two CFL closure theorems in
Linglib.Core.Computability.ContextFreeGrammar.{Map, InterRegular}
(homomorphism + intersection-with-regular, both proven; see those files for
@cite{bar-hillel-perles-shamir-1961} / @cite{hopcroft-motwani-ullman-2000}
construction details) and ambncmdn_not_contextFree (the two-parameter
pumping result) in Linglib.Core.Computability.NonContextFree.
A weaker pedagogical waypoint, swiss_german_diagonal_not_contextFree, uses
only the simpler one-parameter anbncndn substrate; it covers just the
diagonal (m = n) subset of SG clauses with no matrix material.
Remaining idealizations #
The matrix token is a single abstract constructor, not a richer model of
SG matrix material's internal structure (specific lexical items, agreement,
finite-verb fronting). For Shieber's purpose — showing non-CFness of the
string set — this abstraction is faithful: he too treats matrix material as
an erasable substring under the homomorphism. A richer model that ties
matrix-token positions to specific SG fragment data would be a substantive
extension; the current formalization captures Shieber's full argument
without it.
Contrast with @cite{bresnan-etal-1982} #
@cite{bresnan-etal-1982}'s earlier argument for Dutch non-context-freeness relied on linguistic assumptions about constituent structure, which @cite{gazdar-pullum-1982} contested. @cite{shieber-1985}'s argument is purely formal — it rests entirely on the string set of Swiss German and the case-marking facts, making no claims about phrase structure.
Subsequent literature #
- @cite{huybregts-1976} gave a contemporaneous Swiss German argument.
- Joshi (1985) introduced TAG and the mild-CS hierarchy; Vijay-Shanker–Weir (1994) established the equivalence of TAG/CCG/MCFG/MG within mild-CS. The MCS classification of cross-serial dependencies is post-Shieber and not attributable to him; he proved only not weakly CF.
- Manaster-Ramer (1987) raised concerns about the universality of Shieber's case-matching premise.
- Stabler (1997) and Kobele (2006) provide MG analyses of the same data.
The processing-side dissociation (cross-serial easier than nested despite
greater formal complexity) is the subject of @cite{bach-brown-marslen-wilson-1986},
formalized in Phenomena.WordOrder.Studies.BachBrownMarslenWilson1986 —
not duplicated here.
A Swiss German subordinate clause token, abstracting over specific lexical items to their role in the cross-serial construction.
@cite{shieber-1985}'s argument projects SG sentences to four case-marked
classes (DAT-NP / ACC-NP / DAT-V / ACC-V) plus boundary material. The
matrix constructor abstracts non-cross-serial tokens — matrix subjects,
complementizers, auxiliaries, finite verbs of the embedding clause, etc.
Shieber's homomorphism erases these to ε; our tokenStringHom does the
same by mapping matrix to []. This is what makes the proof apply to
full Swiss German rather than just the NPV sub-shape — matrix tokens
can appear interleaved anywhere in a sentence, and the homomorphism
strips them, so the regex-filtered image picks up the cross-serial
core regardless of how it's wrapped in the source string.
- datNP : Token
Dative NP (e.g., em Hans)
- accNP : Token
Accusative NP (e.g., d'chind, de Hans)
- datV : Token
Dative-subcategorizing verb (e.g., hälfe "help")
- accV : Token
Accusative-subcategorizing verb (e.g., lönd "let", aastriiche "paint")
- matrix : Token
Boundary material: matrix subject, complementizer, auxiliary, finite verb of the embedding clause, etc. Erased by
tokenStringHom(Shieber's projection).
Instances For
Equations
- Shieber1985.instDecidableEqToken x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
- Shieber1985.instReprToken.repr Shieber1985.Token.datNP prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Shieber1985.Token.datNP")).group prec✝
- Shieber1985.instReprToken.repr Shieber1985.Token.accNP prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Shieber1985.Token.accNP")).group prec✝
- Shieber1985.instReprToken.repr Shieber1985.Token.datV prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Shieber1985.Token.datV")).group prec✝
- Shieber1985.instReprToken.repr Shieber1985.Token.accV prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Shieber1985.Token.accV")).group prec✝
- Shieber1985.instReprToken.repr Shieber1985.Token.matrix prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Shieber1985.Token.matrix")).group prec✝
Instances For
Equations
- Shieber1985.instReprToken = { reprPrec := Shieber1985.instReprToken.repr }
Whether a token is a case-marked NP (vs a verb or matrix material).
Equations
- Shieber1985.Token.datNP.isNP = true
- Shieber1985.Token.accNP.isNP = true
- Shieber1985.Token.datV.isNP = false
- Shieber1985.Token.accV.isNP = false
- Shieber1985.Token.matrix.isNP = false
Instances For
Whether a token is a case-subcategorizing verb (vs an NP or matrix material).
Equations
- Shieber1985.Token.datNP.isV = false
- Shieber1985.Token.accNP.isV = false
- Shieber1985.Token.datV.isV = true
- Shieber1985.Token.accV.isV = true
- Shieber1985.Token.matrix.isV = false
Instances For
Whether a token is matrix (boundary) material (vs a cross-serial NP/V).
Equations
- Shieber1985.Token.datNP.isMatrix = false
- Shieber1985.Token.accNP.isMatrix = false
- Shieber1985.Token.datV.isMatrix = false
- Shieber1985.Token.accV.isMatrix = false
- Shieber1985.Token.matrix.isMatrix = true
Instances For
The abstract shape of a case-sorted cross-serial clause: counts of each (case × NP/V) combination, in the canonical order DAT-NPs, ACC-NPs, DAT-Vs, ACC-Vs.
This is not a grammaticality condition on Swiss German — SG freely allows interleaved NP orderings — but the case-sorted shape is what Shieber's homomorphism + regular intersection isolates from the full SG language.
- datNPs : ℕ
- accNPs : ℕ
- datVs : ℕ
- accVs : ℕ
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
Equations
- One or more equations did not get rendered due to their size.
Instances For
Claim 3: case matching — the number of dative verbs equals the number of dative NPs, and similarly for accusative.
Equations
- c.caseMatches = (c.datNPs = c.datVs ∧ c.accNPs = c.accVs)
Instances For
A case-sorted clause that satisfies case matching. Renamed from the
misleading GrammaticalClause — case matching is part of grammaticality,
but case-sorting is not (per Claim 2 framing above).
- datNPs : ℕ
- accNPs : ℕ
- datVs : ℕ
- accVs : ℕ
- matching : self.caseMatches
Instances For
Claim 4: any combination of dative and accusative verb counts can occur
(we can produce a CaseMatchedClause for any m, n).
Equations
- Shieber1985.arbitraryDepth m n = { datNPs := m, accNPs := n, datVs := m, accVs := n, matching := ⋯ }
Instances For
Shieber's homomorphism f: maps Swiss German cross-serial clause tokens
to the abstract alphabet {a, b, c, d}. Matrix tokens have no meaningful
image; the placeholder .a is never observed by any consumer (every use
site filters via isMatrix or constructs from non-matrix tokens). The
real homomorphism tokenStringHom ERASES matrix tokens to [].
Equations
- Shieber1985.tokenToSymbol Shieber1985.Token.datNP = FourSymbol.a
- Shieber1985.tokenToSymbol Shieber1985.Token.accNP = FourSymbol.b
- Shieber1985.tokenToSymbol Shieber1985.Token.datV = FourSymbol.c
- Shieber1985.tokenToSymbol Shieber1985.Token.accV = FourSymbol.d
- Shieber1985.tokenToSymbol Shieber1985.Token.matrix = FourSymbol.a
Instances For
Shieber's homomorphism, lifted to lists. Cross-serial NPs and Vs map to
singleton {a,b,c,d} letters; matrix tokens are erased. This []-valued
case is essential for full-SG fidelity: it lets the schema apply to
sentences with arbitrary boundary material wrapping the cross-serial
core. The function is no longer a StringHom.letterMap for that reason.
Equations
Instances For
The token sequence corresponding to a case-matched clause: NPs first (DAT then ACC), then Vs (DAT then ACC).
Equations
- One or more equations did not get rendered due to their size.
Instances For
The string image of a case-matched clause's token sequence under
tokenStringHom: a string in {a,b,c,d}*.
Equations
- Shieber1985.clauseImage c = List.replicate c.datNPs FourSymbol.a ++ List.replicate c.accNPs FourSymbol.b ++ List.replicate c.datVs FourSymbol.c ++ List.replicate c.accVs FourSymbol.d
Instances For
The image of a case-matched clause is the StringHom.apply of
tokenStringHom on its case-sorted token sequence.
A case-matched clause with m DAT-pairs and n ACC-pairs maps to
aᵐ bⁿ cᵐ dⁿ.
Setting m = n in the clause image gives aⁿbⁿcⁿdⁿ.
The diagonal clause images are in {aⁿbⁿcⁿdⁿ}.
The diagonal subset of Swiss German cross-serial clauses (encoded as
token sequences): case-matched, case-sorted, with m = n.
Equations
- Shieber1985.swissGermanDiagonalLang = {ts : List Shieber1985.Token | ∃ (n : ℕ), ts = Shieber1985.caseSortedTokens (Shieber1985.arbitraryDepth n n)}
Instances For
The homomorphic image of the diagonal SG language under tokenStringHom
is exactly anbncndn.
This is the load-bearing equality for Shieber's argument: it sits at the
interface between linguistic data (the diagonal SG clauses) and the
formal-language witness (anbncndn) whose non-CF status is established
in Linglib.Core.Computability.NonContextFree.
Diagonal Swiss German is not context-free. A formal corollary of
@cite{shieber-1985}'s argument, using only the one-parameter substrate
{aⁿbⁿcⁿdⁿ}. The image of the diagonal SG language under tokenStringHom
equals anbncndn (not CF), so by closure under string homomorphism
(Linglib.Core.Computability.ContextFreeGrammar.Closure), the source is not CF.
The Swiss German cross-serial language. A token list ts is in this
language iff:
- stripping matrix tokens leaves a list of form
nps ++ vs(NPs precede Vs in the cross-serial core), and - the case counts match (#DAT-NP = #DAT-V, #ACC-NP = #ACC-V).
Matrix tokens (subjects, complementizers, auxiliaries) may appear anywhere
interleaved with cross-serial NPs/Vs — Shieber's homomorphism erases them.
The case-sorted canonical form caseSortedTokens (arbitraryDepth m n)
(no matrix tokens, NPs and Vs each in canonical case order) is properly
contained.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Every case-matched, case-sorted clause (no matrix material) is in the (more general) free SG language.
DFA states for the case-sorted shape a*b*c*d*.
- sA : CaseSortedState
- sB : CaseSortedState
- sC : CaseSortedState
- sD : CaseSortedState
- sDead : CaseSortedState
Instances For
Equations
- Shieber1985.instDecidableEqCaseSortedState x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
- One or more equations did not get rendered due to their size.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- Shieber1985.instReprCaseSortedState = { reprPrec := Shieber1985.instReprCaseSortedState.repr }
DFA recognizing a*b*c*d* over FourSymbol.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Concrete witness: every aᵐbⁿcᵐdⁿ is case-sorted.
Any case-sorted FourString decomposes uniquely into block-sorted
aᵖ ++ bᵠ ++ cʳ ++ dˢ.
The Bar-Hillel-style intersection equality: full SG (with arbitrary matrix
material), projected to {a,b,c,d} and filtered to case-sorted shape,
equals exactly the non-CF language aᵐbⁿcᵐdⁿ.
@cite{shieber-1985}'s main theorem. Swiss German subordinate clauses are not weakly context-free.
Proof structure (the actual Shieber schema, both legs exercised):
apply Language.not_isContextFree_via_witness with the tokenStringHom
homomorphism + caseSorted regular filter; the witness is that
stringMap h L ⊓ R = ambncmdn, which is non-CF by
ambncmdn_not_contextFree (two-parameter pumping). The R-leg matters:
swissGermanLang allows interleaved NP/V orderings, and the regular
filter forces the case-sorted canonical shape needed to land in ambncmdn.
Verbs in the Swiss German fragment have case requirements that match their Token-level classification (DAT-V or ACC-V). Collapsed from three near-identical theorems into one decide-checked conjunction.
Example (1): mer em Hans es huus hälfed aastriiche "we helped Hans paint the house"
em Hans (DAT) → hälfed (DAT-verb "helped"); es huus (ACC) → aastriiche (ACC-verb "paint").
Equations
Instances For
Example (5): triply embedded cross-serial clause mer d'chind em Hans es huus lönd hälfe aastriiche "we let the children help Hans paint the house"
d'chind (ACC) → lönd (ACC-verb "let"); em Hans (DAT) → hälfe (DAT-verb "help"); es huus (ACC) → aastriiche (ACC-verb "paint").
With case sorting: 1 DAT-NP, 2 ACC-NPs, 1 DAT-V, 2 ACC-Vs.
Equations
Instances For
The homomorphic image of example (5) is abbcdd = a¹b²c¹d².