2

 Rule system, Principles & Parameters, Minimalist Ideas

 

2.0   Introduction

Now a more detailed account of Chomsky’s notion of language is in order.  Throughout the years of Chomsky’s linguistic enterprise, it has been thought that syntax, that is, the core of I-language, is the basis of the phonological (sound) and semantic (meaning) properties of linguistic expressions.  However, this ‘language machine’ has undergone great conceptual changes. 

In this chapter, we shall trace the steps in the development of Chomsky’s linguistics.  Following Chomsky, we may divide it into three phases with reference to three different notions of the generative procedure: (1) the rule-system phase, S0 being considered as a format of rule systems and SL as a selected instance of the format given at S0, that is, a particular rule system, in accordance with some evaluation measure; (2) the principles-and-parameters (pre-minimalist) phase, S0 being considered as an innate schema of universal principles and parameters (or options) and SL as an instantiation of this schema, with the values of the parameters fixed; (3) the minimalist program, advancing (2)’s P&P approach along the minimalist line, that is, introducing economy principles and minimising or simplifying (2)’s theoretical framework, a framework that requires further idealisation of the language faculty.[1]  In 2.1, we shall see how (1) is developed into (2).  Then we shall highlight, in 2.2, some main features advanced in (3), and in 2.3, demonstrate briefly how the computational system of the language faculty as conceived in phase (3) operates.  Finally, in 2.4, the chapter closes with a recapitulation of the minimalist assumptions.

The persistent goals of Chomsky’s linguistics are descriptive and explanatory adequacy and, above all, reducing the tension between them.  Our attention will be restricted to its various frameworks, past and present, especially the present one, which is known as the Minimalist Program (MP)[2].  Only simple examples will be given in what follows, since our interest lies chiefly in the evolving concept of the generative procedure and the basic ideas that motivated the approaches of the different stages.

 

2.1   From specific rules to principles and parameters

The rise of generative grammar to the dominant position in the field indicates a basic shift of focus: “from behavior or the products of behavior to states of the mind/brain that enter into behavior.”[3]  In the 1950s there occurred a revival of the view that language is acquired on the basis of innate capacities and that language is essentially concerned with the biologically endowed mental structure rather than with training.  Traditional or pedagogical grammar provided useful information about particular languages, including lists of exceptions, irregularities, paradigms and examples, but ignored the fact that speaking and learning to speak have very much to do with intelligence (or cognitive capacities).  That the use of language depends on a certain kind of knowledge was simply taken for granted, and yet no specification of this knowledge was given.  Structuralist grammar made explicit the procedures for studying linguistic data, but its scope was limited to topics of phonology and morphology.  Both of these grammars were data-oriented, and neglected the relevant mental mechanism (competence) that generates an infinite set of linguistic expressions, that is, the ‘creative’ nature of language.[4]

            What is knowledge of language?  How does this knowledge arise?  Answering these two elementary questions has been the task of generative grammarians since the earliest stage.  The first question asks for a description, while the second question demands an explanation, of such knowledge.  Knowledge of language is by no means a long-term memory of a full list of sentences.  The human mind does not allow storage on such a scale.  In addition, the relative ease and speed of learning a language, and the ability to understand and produce novel sentences, also suggest that knowledge of language consists in grasping a definite set of rules.  This conception motivates the first phase of Chomsky’s linguistics, in which linguistic competence was formalised in a format of rule systems.

 

2.1.1  The rule-system phase

As Chomsky remarks on the early generative grammar,

UG provides a format for permissible rule systems; any instantiation of this format constitutes a specific language.  Each language is a rich and intricate system of rules that are, typically, construction-particular and language-particular: the rules forming verb phrases or passives or relative clauses in English, for example, are specific to these constructions in this language.  Similarities across constructions and languages derive from properties of the format for rule systems.[5]

 

Two types of rules were permitted according to this idea: (1) phrase structure (PS) rules and (2) transformational rules.

The format proposed in the earliest work allowed for two types of rules: phrase structure rules that form phrase-markers ..., namely, representations in which categorial structure (noun phrase, prepositional phrase, clause, etc.) is indicated; and transformational rules that convert phrase markers into other phrase-markers.  This proposed format for rules was adapted from the traditional descriptive and historical grammar, recast in terms of ideas developed in the theory of computation (recursive function theory, the theory of algorithms.)[6]

 

Consider the following examples, which illustrate the earliest proposal (adapted from the one in Syntactic Structures).

            (1) David bought the house

(1a)

 

            (1b) [S [NP [N David]] [VP [V bought] [NP [DET the] [N house]]]]

 

The constituent structure of (1) is represented in (1a) (known as tree diagram or Phrase-marker) or (1b) (labelled bracketing) showing the hierarchical relations and linear order of the sentence’s constituents.[7]  It is supposed that underlying (1) is a basic string associated with a structural description (SD) called a base Phrase-marker.[8]

            The following simple system of phrase structure (rewrite) rules is the base of the syntactic component generating (1)’s underlying base Phrase-marker (where S stands for sentence, NP for noun phrase, VP for verb phrase, DET for determiner, N for noun and → for ‘is rewritten as’):

(2) (i)     S → NP  VP

     (ii)    VP → V  NP

     (iii)   NP → DET  N

     (iv)   NP → N

     (v)    V → bought, (buy, took, take, walk, walked, etc.)

     (vi)   N → house, (cat, sincerity, orange, etc.)

     (vii)  N → David, (Mary, Tom, He, etc.)

     (viii) DET → the

 

(i) to (iv) are syntactic rules, and (v) to (viii) are lexical rules ((vi) and (vii) may be collapsed into one single rule).  We can get the following derivation by this rule system:

(3) Sentence (S)

NP     +    VP                                                   (i)

N       +    VP                                                   (iv)

N       +    V     +    NP                                     (ii)

David  +   V     +    NP                                    (vii)

David  + bought +    NP                                 (v)

David  + bought +  DET  +  N                       (iii)

David  + bought +  the  +  N                          (viii)

David  + bought +  the  +  house                   (vi)
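Derivation (3) can be replayed mechanically.  The following Python sketch is only an illustration under our own assumptions: it encodes rule system (2) with a single lexical item per lexical rule, it rewrites the leftmost occurrence of a rule's left-hand symbol at each step, and the names RULES, apply_rule and derive are ours rather than part of Chomsky's formalism.

```python
# Rule system (2), restricted to the lexical items used in sentence (1).
# Each rule pairs a left-hand symbol with the symbols that replace it.
RULES = {
    "i": ("S", ["NP", "VP"]),
    "ii": ("VP", ["V", "NP"]),
    "iii": ("NP", ["DET", "N"]),
    "iv": ("NP", ["N"]),
    "v": ("V", ["bought"]),
    "vi": ("N", ["house"]),
    "vii": ("N", ["David"]),
    "viii": ("DET", ["the"]),
}


def apply_rule(string, rule_id):
    """Rewrite the leftmost occurrence of the rule's left-hand symbol."""
    lhs, rhs = RULES[rule_id]
    i = string.index(lhs)  # raises ValueError if the rule cannot apply
    return string[:i] + rhs + string[i + 1:]


def derive(rule_ids, start="S"):
    """Return every line of the derivation, beginning with the start symbol."""
    lines = [[start]]
    for rule_id in rule_ids:
        lines.append(apply_rule(lines[-1], rule_id))
    return lines


# The order of rule applications recorded in the right-hand column of (3).
DERIVATION_3 = ["i", "iv", "ii", "vii", "v", "iii", "viii", "vi"]

for line in derive(DERIVATION_3):
    print(" + ".join(line))
```

Each printed line depends only on the line before it and on one rule, which is the property of PS derivations discussed later in this section.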

 

Applying certain lexical-phonological rules representing the phonological properties of the lexical items, we know how the sentence is pronounced.

            More complicated sentences can be specified by expanding the rule system (2).  For instance, we have: 

            (4)        (i)  S → NP (Aux) VP,

                        (ii) VP → V (NP) (PP),

                        (iii) PP → P NP,

                        (iv) VP → (ADV) V (NP) (ADV),

                        (v)  NP → NP PP,    [e.g. the picture of a mountain]

                        (vi) PP → PP PP,    [e.g. under the desk of the pupil][9]

                        (vii) Aux → will, can, have, may, would ...

where (v) and (vi) are recursive rules, each reintroducing on its right-hand side the category that it rewrites.  Applying rules of this kind repeatedly can form indefinitely long sentences.  In (4)(i), (ii) and (iv), some category symbols are put in brackets, which means that they are optional.
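The effect of a recursive rule can be made concrete with a small Python sketch.  It is an illustration only: the function names are ours, and we assume that the recursive rule (4)(v) is applied a chosen number of times before each NP is closed off by (2)(iii) and each PP by (4)(iii).

```python
def rewrite_leftmost(string, lhs, rhs):
    """Replace the leftmost occurrence of lhs with the symbols in rhs."""
    i = string.index(lhs)
    return string[:i] + rhs + string[i + 1:]


def grow_np(times):
    """Apply the recursive rule (4)(v), NP -> NP PP, the given number of times."""
    string = ["NP"]
    for _ in range(times):
        string = rewrite_leftmost(string, "NP", ["NP", "PP"])
    return string


def close_off(string):
    """Expand NP by (2)(iii) and PP by (4)(iii) followed by (2)(iii)."""
    out = []
    for symbol in string:
        if symbol == "NP":
            out += ["DET", "N"]
        elif symbol == "PP":
            out += ["P", "DET", "N"]
        else:
            out.append(symbol)
    return out


for n in range(4):
    print(n, " ".join(close_off(grow_np(n))))
```

Every further application of the recursive rule adds one more PP, so the rule system places no upper bound on the length of the phrase.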

            In order to exclude ungrammatical sentences like David bought, lexical rules like

(5)  V → bought / __ NP

are employed to replace (2)(v), restricting the transitive verb bought to positions where it is followed by an NP.  Rules of this kind, which concern the environment of a particular category in a string, are called context-sensitive rules.  Those that ignore the context are context-free rules.
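A context-sensitive lexical rule such as (5) can be pictured as a rewrite that first inspects the symbol to its right.  The sketch below is a hypothetical illustration: the table FRAMES and the function insert_verb are our own devices, and the entry for walked (no contextual restriction) is an assumption added for contrast.

```python
# Hypothetical table: the symbol that must follow each verb,
# or None when the verb carries no contextual restriction.
FRAMES = {"bought": "NP", "walked": None}


def insert_verb(string, verb):
    """Rewrite the leftmost V as verb only if its context requirement is met."""
    i = string.index("V")
    required = FRAMES[verb]
    following = string[i + 1] if i + 1 < len(string) else None
    if required is not None and following != required:
        return None  # the rule is inapplicable in this context
    return string[:i] + [verb] + string[i + 1:]


print(insert_verb(["David", "V", "NP"], "bought"))
print(insert_verb(["David", "V"], "bought"))
print(insert_verb(["David", "V"], "walked"))
```

The second call returns None, which corresponds to the exclusion of the string David bought.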

            Phrase structure rules satisfy the demand of generative grammar, since they are explicitly formulated.  Chomsky thinks, however, that they have limitations.  They work rather mechanically, in the manner of an immediate constituent analysis.[10]  Each step of a derivation depends on the last step, and every operation of a PS rule can only concern one element of a string at a time.  One can imagine that the PS rules which purport to specify a more complicated sentence than (1), such as a conjunction, question or passive construction, will be very complex.  Moreover, the following apparently related sentences would have to be derived from five different specific sets of PS rules:

(6)  David bought the house

(7)  The house was bought by David

(8)  Did David buy the house

(9)  What did David buy

(10)  The house, David bought

            To enhance the simplicity and descriptive power of generative grammar, Chomsky introduced transformational rules.  Let us see how generative grammar could be made much simpler with the addition of these rules.  To deal with passive constructions, the PS rule for the auxiliary phrase (Aux), that is, (4)(vii), must include the element be+en.  Chomsky observes that we must specify special conditions for selecting this passive element.  For instance, in a passive structure the V must be transitive but must not be followed immediately by an NP.  He explains:

Passive sentences are formed by selecting the element be+en ... .  But there are heavy restrictions on this element that make it unique among the elements of the auxiliary phrase.  For one thing, be+en can be selected only if the following V is transitive (e.g., was+eaten is permitted, but not was+occurred); but with a few exceptions the other elements of the auxiliary phrase can occur freely with verbs.  Furthermore, be+en cannot be selected if the verb V is followed by a noun phrase ... (e.g., we cannot in general have NP+is+V+en+NP, even when V is transitive -- we cannot have “lunch is eaten John”).[11]

 

Furthermore, there are selectional restrictions governing verbs.  For instance, the verb frighten may not take any non-animate object though it may take an abstract subject; and the verb admire may not take any abstract or non-animate subject or any non-animate object. However, these restrictions can hardly be adapted to the context where be+en is selected.

[We have already placed] many restrictions on the choice of V in terms of subject and object in order to permit such sentences as: “John admires sincerity,” “sincerity frightens John,” “John plays golf,” “John drinks wine,” while excluding the ‘inverse’ non-sentences “sincerity admires John,” “John frightens sincerity,” “golf plays John,” “wine drinks John”.   But this whole network of restrictions fails completely when we choose be+en as part of the auxiliary verb.  In fact, in this case the same selectional dependencies hold, but in the opposite order.[12]

 

The consequence of incorporating the be+en element in the PS rule system is that there will be two sets of selectional restrictions for the choice of V, one specific to the active constructions, another specific to the passive ones.  This makes the grammar unacceptably complicated.  To pursue a simpler model of grammar, Chomsky suggests adding a transformational rule for passivization.

If we try to include passives directly in the [PS rule] grammar, we shall have to restate all of these restrictions in the opposite order for the case in which be+en is chosen as part of the auxiliary verb.  This inelegant duplication, as well as the special restrictions involving the element be+en, can be avoided only if we deliberately exclude passives from the grammar of phrase structure, and reintroduce them by a rule such as:

[(11)]   If S1 is a grammatical sentence of the form

                       NP1 -- Aux -- V -- NP2

            then the corresponding string of the form

                       NP2 -- Aux + be + en -- V -- by + NP1

             is also a grammatical sentence.[13]

 

            Transformational rules have two components: structural analysis of a given sentence and its structural change.  (11) can be formulated in the following way:

(12) Passive -- optional:

Structural analysis:  NP -- Aux -- V -- NP

Structural change: X1 -- X2 -- X3 -- X4  →  X4 -- X2 + be + en -- X3 -- by + X1 [14]

 

The form of PS rules is X → Y.  In a derivation using PS rules, a single symbol is acted upon at each step.  But a transformational rule, for instance one of movement, XY → YX, might affect more than one symbol at a time.  There are also transformational rules of deletion, XY → Y, and of addition, XY → XYZ.  Chomsky has drawn two distinctions since his early work: (a) between singulary transformations (simply speaking, those dealing with only one sentence, such as transforming an active sentence into a passive one, or a declarative sentence into an interrogative one) and generalised transformations (those combining two sentences into one, such as constructing a sentence with a relative clause, or constructing a conjunction); (b) between obligatory transformations (such as number agreement) and optional transformations (such as passive).[15]

            We must first obtain the structural analysis of (6), to which the passive transformational rule (12) applies.  To do this, it is necessary to go back from the surface structure of (6) to the following string of category symbols:

            (13)  NP1 -- Aux -- V --  NP2    

The application of the transformational rule (12), unlike that of the PS rules, which always deal with the last step, “requires a more powerful machine, which can ‘look back’ to earlier strings in the derivation in order to determine how to produce the next step in the derivation.”[16]  (13) presupposes information about the constituency of (6), that is, about the category and lexical rules involved in generating that syntactic structure.  Moreover, rule (12) affects more than one element at a time: X1 and X4 exchange their positions, and be+en is inserted between Aux and V.  However, the introduction of this rule greatly simplifies the PS-rule grammar.  The above-mentioned restrictions and complications accompanying the selection of be+en are no longer needed.  Indeed, be+en may be eliminated from the PS rules.

            We get the passive transform by operating rule (12):

            (14) The house -- Aux + be + en -- buy -- by + David.
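The structural change of the passive rule can be sketched as a function on a four-term analysis.  This is an illustration under our own assumptions: the analysis is represented as a Python tuple, the function name passive is ours, and no check is made that the four terms really belong to the categories NP, Aux, V and NP.

```python
def passive(analysis):
    """Apply the structural change X1 - X2 - X3 - X4 -> X4 - X2 + be + en - X3 - by + X1."""
    x1, x2, x3, x4 = analysis  # fails if the analysis does not have four terms
    return (x4, f"{x2} + be + en", x3, f"by + {x1}")


# Structural analysis of sentence (6), with Aux left unexpanded.
active = ("David", "Aux", "buy", "the house")

print(" -- ".join(passive(active)))
```

Unlike a PS rule, the function reads and rearranges all four terms in a single step: the two noun phrases exchange positions while be + en and by are introduced.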

Further transformational rules must be applied in order to derive the surface structure of the passive construction (7).  The PS rule system combining (2) and (4) does not accommodate the inflectional character of (auxiliary) verbs such as buys, has bought, have bought, has been bought and is being bought, which behave rather regularly.  Even the pattern of affixation of the past-tense verb form, -ed, is omitted.  To deal with these linguistic facts, Chomsky’s transformational grammar isolated the verbal roots from their affixes and auxiliary verbs and introduced transformational rules to bring them together.  Consider the following elementary rule system (15), which supplements and modifies the PS rules of (2) and (4), together with the affix-movement rule (16):

(15)         (i) Verb → Aux + V

             (ii) V → hit, take, walk, read, [buy], etc.

             (iii) Aux → C(M)(have+en)(be+ing)(be+en)

             (iv) M → will, can, may, shall, must

 

(16)      (i) C → { S in the context NPsing_ , ∅ in the context NPpl_ , past }

            (ii) Let Af stand for any of the affixes past, S, ∅, en, ing.  Let v stand for any M or V, or have or be (i.e., for any non-affix in the phrase Verb).  Then:

                      Af + v → v + Af #,

            where # is interpreted as word boundary.[17]

 

Note that, in Chomsky’s view at that time, (16ii) reveals a feature of English: Af obligatorily moves from the position of Aux to attach to the verbal root (Affix-hopping), rather than the verbal root raising to Aux (V-raising).[18]

            Applying (16i), we derive

(16)  The house  --  past + be + en  --  buy  --  by + David.

      X4             X2                  X3       X1

Then, applying (16ii) twice, we obtain:

(17)  The house -- be+past # buy+en # by + David.
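The two applications of the affix rule that yield (17) can be sketched as a single left-to-right pass over the elements of the verbal sequence.  The sketch is ours and rests on stated assumptions: every element not listed in AFFIXES counts as v, the label "ZERO" stands in for the null affix, and the rule applies wherever an affix immediately precedes a non-affix.

```python
# The affixes named in rule (16ii); "ZERO" stands in for the null affix.
AFFIXES = {"past", "S", "ZERO", "en", "ing"}


def affix_hop(elements):
    """Rewrite every Af + v as v + Af #, scanning from left to right."""
    out = []
    i = 0
    while i < len(elements):
        current = elements[i]
        has_next = i + 1 < len(elements)
        if current in AFFIXES and has_next and elements[i + 1] not in AFFIXES:
            out += [elements[i + 1], current, "#"]  # the affix hops over v
            i += 2
        else:
            out.append(current)
            i += 1
    return out


# Verbal elements of the passive string before the affix rule applies.
verbal_elements = ["past", "be", "en", "buy"]

print(affix_hop(verbal_elements))
```

The pass turns past + be into be + past # and en + buy into buy + en #, which are the two verbal words of (17).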

(17) will be the input to the component of morphophonemic rules, including the following:

(18) be+past → was in the context NPsing_; were in the context NPpl_

(19) buy+en → bought

(20) the → /ðə/

(21) house → /hɑʊs/

(22) was → /wəz/

(23) bought → /bɔt/

(24) by → /bɑɪ/

(25) David → /devɪd/

From these rules, we obtain the passive sentence (7), repeated here as (26):

            (26)  The house was bought by David

            (17) is a derived string called surface structure, with abstract symbols past and en, whereas (26) is the sentence we actually hear and use.  In Syntactic Structures (1957), it was thought that both the active sentence (6) and the passive sentence (26) shared the same ‘underlying structure’ generated by certain PS rules (including the lexical ones).[19]  (6) is taken to be more basic, as (26) is derived from it on the basis of some further transformational rules.  In this early work, the relation between syntax and semantics had not been made very clear.  It was argued that syntactic knowledge is autonomous and does not depend on any insecure semantic notions.[20]  Later, in Aspects of the Theory of Syntax (1965) (that is, the Standard Theory), rules of semantic interpretation were introduced.  They were supposed to operate on the deep structure, that is, the phrase marker generated directly by the base (categorial rules and the lexicon) and serving as the input to the transformational component.[21]  Still later, the Extended Standard Theory (EST) in the mid-1970s and the Revised Extended Standard Theory (REST) in the late 1970s no longer took the D-structure (deep structure) to be the only semantically relevant level.[22]  As traces were introduced in REST, the predicate-argument structure is indirectly shown at the S-structure (that is, a slightly more abstract level than that of the previous surface structure, with traces added to the original position of the moved element).  Further, it was realised that, apart from the S-structure, the level of Logical Form (LF) also contains semantic information such as anaphora and scope.  Chomsky writes,

It has, however, become clear that other features of semantic interpretation having to do with anaphora, scope, and the like are not represented at the level of D-structure but rather at some level closer to surface level, perhaps S-structure or a level of representation derived directly from it -- a level sometimes called “LF” to suggest “logical form,” with familiar provisos to avoid possible misinterpretation.  The term is used because this level of representation has many of the properties of logical form in the sense of other usage.[23]

 

            At this stage, linguistic expressions are analysed into four levels: D-structure, S-structure, LF and PF.  PF stands for Phonetic Form, that is, the output of the operation of (mainly) phonological rules on S-structure representations.  Representations at each level are generated by a particular type of rule: D-structure representations are yielded by (I) PS rules, S-structure representations by (II) transformational rules, PF representations by (III) PF rules and LF representations by (IV) LF rules.  These representational levels and rules form a system shown in the following diagram:[24]

                                                                        

                           Figure 2    The (R)EST Rule System

            A linguistic theory, as Chomsky maintains, must deal with two types of universals.  Traditional universal grammar has provided us with the vocabulary required to describe a language, to represent the so-called substantive universals, that is, the formal elements present in every grammar.  In syntax, we have terms to indicate fixed categories such as verbs, verb phrases, nouns and adjectives; in phonology and semantics, too, we have terms to indicate the limited number of fixed, universal, phonetic and semantic elements such as voiced, anterior, coronal and physical object, feeling, behaviour, respectively.  Another kind of universal, called formal universals, is more abstract.  These universals were not given sufficient attention until the generative grammar of the twentieth century emerged.  They “involve rather the character of the rules that appear in grammars and the ways in which they can be interconnected.”[25]  The proposal presented in the foregoing, in particular, that there must be transformational rules in a linguistic theory, is concerned with formal universals.

            Substantive and formal universals constitute the most elementary assumptions of a general linguistic theory, which postulates definitions to represent them, including:

(i) a universal phonetic theory that defines the notion “possible sentence”

(ii) a definition of “structural description”

(iii) a definition of “generative grammar”[26]

 

In addition, the application (or relation) of these definitions must be specified. There must be:

(iv) a method for determining the structural description of a sentence, given a grammar.

Having fulfilled these “formal conditions” (i)-(iv), a grammar of English, say, must further be in agreement with the intuitions of native English speakers.  However, even when these internal and external justifications or conditions obtain, there might still be mutually inconsistent proposals competing for the title of the correct grammar, unless (1) extremely abundant and rich data (of English) are accessible to us and/or (2) our theory restricts potential grammars to a narrow range, so narrow that ideally a unique grammar is permitted on the basis of certain empirical data.  In Chomsky’s view, the linguist need not worry so much about the availability of linguistic data.[27]  In this respect, the situation for a linguist is different from the situation for a child who learns a language.  The linguist’s “real problem is almost always to restrict the range of possible hypotheses by adding additional structure to the notion ‘generative grammar’”,[28] through abstracting universal ‘linguistic forms’ from the various languages.  However, it will be extremely difficult to reconcile this reduction procedure with the increasing detail discovered in various languages.  (See the issue of the tension between descriptive and explanatory adequacy in 1.3.1.)  In his early works, Chomsky tended to think that a strong linguistic theory like a discovery or decision procedure (which allows an extremely narrow range of possible languages, perhaps only one) is not feasible.[29]  He proposed to postulate a general theory whose formal conditions are not too restrictive and to incorporate an evaluation procedure to select descriptively adequate grammars, that is,

(v) a way of evaluating alternative proposed grammars.[30]                                

            The ranking of a proposed grammar (or rule) relies first of all on whether it captures a linguistic fact of nature (concerning the competence of the native speakers of a language) and secondly, on the generality of such a fact, that is, whether the grammar (or rule) in question can be generalised to other empirically given languages.[31]  Thirdly, the rule might be a reduction of a set of rules, and thus have a generality within a rule system; the degree of this generality or simplicity has a bearing on the evaluation procedure, too.  Chomsky remarks, “descriptions of particular subsystems of the grammar must be evaluated in terms of their effect on the entire system of rules”.[32]  An evaluation measure defined in terms of generality or simplicity is motivated by the idea that a more general or simple hypothesis provides a better insight into the relatedness of certain rules as well as of certain facts.  In addition, the fact that children acquire a language quickly also supports linguistic hypotheses that are simple and easily learned.  In Chomsky’s terms, a preferred descriptively adequate grammar is one that expresses some “linguistically significant generalization”, which involves “a decision as to what are ‘similar processes’ and ‘natural classes’”.[33]

            Chomsky in Aspects gave the following case:

As a concrete illustration, consider the question of whether the rules of a grammar should be unordered (let us call this the linguistic theory Tu) or ordered in some specific way (the theory To). ... For example, if Tus is the familiar theory of phrase structure grammar and Tos is the same theory, with the further condition that the rules are linearly ordered and apply cyclically, with at least one rule A → X being obligatory for each category A, so as to guarantee that each cycle is nonvacuous, then it can be shown that Tus and Tos are incomparable in descriptive power ... .  Consequently, we might ask whether natural languages in fact fall under Tus or Tos, these being non-equivalent and empirically distinguishable theories.[34]

 

If the theory of unordered rules Tus is applicable to other natural languages while the theory of ordered rules Tos is not, we have a reason to choose Tus rather than Tos: the former represents a (more general) “natural class” of linguistic facts.  Selecting Tus further means placing the data under structural analysis in the “natural class”, and in this way the data are not only described but also, to some extent, explained.

            The evaluation measure depending on the degree of “linguistically significant generalizations” may be quantified.  Chomsky maintains, “The obvious numerical measure to be applied to a grammar is length, in terms of number of symbols.”[35]  An example concerning the generality of the rule for the English Verbal Auxiliary, within the system of rules (English grammar), serves to show this:

(27)    Aux  →  Tense (Modal) (Perfect) (Progressive)
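Because parentheses mark optional symbols, the rules that (27) abbreviates can be enumerated mechanically and their length measured in symbols.  The following sketch is an illustration only: the function name expand is ours, and it counts right-hand-side symbols alone, leaving Aux out of the count.

```python
from itertools import combinations


def expand(obligatory, optional):
    """List every right-hand side abbreviated by the parenthesis notation.

    Optional symbols may each be present or absent; their relative
    order is preserved in every expansion.
    """
    expansions = []
    for size in range(len(optional) + 1):
        for chosen in combinations(optional, size):
            expansions.append(obligatory + list(chosen))
    return expansions


full_rules = expand(["Tense"], ["Modal", "Perfect", "Progressive"])

for rule in full_rules:
    print("Aux ->", " ".join(rule))

print("number of rules:", len(full_rules))
print("symbols when stated in full:", sum(len(rule) for rule in full_rules))
print("symbols in the abbreviated rule:", 1 + 3)
```

The difference between the two symbol counts is the quantity that the evaluation measure takes as an index of generalization.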

Rule (27) is an abbreviation for eight rules that analyse the element Aux into its eight possible forms.  Stated in full, these eight rules would involve twenty symbols, whereas rule (27) involves four (not counting Aux, in both cases).  The parenthesis notation, in this case, has the following meaning.  It asserts that the difference between four and twenty symbols is a measure of the degree of linguistically significant generalization achieved in a language that has the forms given in list (28), for the Auxiliary Phrase, as compared with a language that has, for example, the forms given in list (29), as the representatives of this category:

(28)  [i] Tense, [ii] Tense