Volume 300, March 2024, 103675

Syntactic functions of words grammatically related to verbs in interlanguage: A valency perspective

https://doi.org/10.1016/j.lingua.2024.103675


  • The extraction and classification of collocations are based on syntactic relations, not n-grams.

  • The analysis covers a wider variety of syntactic functions.

  • Distributions of syntactic functions show significant differences between interlanguage and the target language.

  • Distributions of syntactic functions reflect progress in interlanguage proficiency.


This study examines the syntactic functions of words grammatically related to verbs in interlanguage. Using probabilistic valency patterns, we investigated a wider variety and a larger number of syntactic functions than are usually covered in a single study. The distributional data of these syntactic functions show significant differences between interlanguage and the target language and between interlanguage at different proficiency levels. First, verbs in interlanguage have stronger associations with subjects, adverbials and complements, whereas verbs in the target language are more strongly associated with prepositions and objects. Second, verbs in lower-level interlanguage have stronger associations with subjects and complements, while verbs in higher-level interlanguage are more strongly associated with auxiliaries, connections and subordinations. Third, the distributions of the syntactic functions in interlanguage gradually approach those in the target language as the proficiency level of interlanguage improves. Our analysis shows that the syntactic functions in interlanguage have unique, progressive, and probabilistic characteristics.


Syntactic functions or grammatical relations are familiar concepts in the literature on language studies. Some researchers have debated their status in linguistic theories (Butler, 2012, Holvoet and Nau, 2014); some have discussed their role as typological features (Bickel, 2010, Yan and Liu, 2022, Saito and Liu, 2022); and some have focused on analyzing specific syntactic functions in particular structures (Deshors, 2020, Wang and Luo, 2021, Lee and Noh, 2023). The syntactic functions of words in interlanguage have also drawn much attention, and the common categories of syntactic functions under investigation include subjects and objects (Quesada, 2015, Chang and Zheng, 2018, Eguchi and Kyle, 2023). However, the scope of research has remained rather limited, since a single study usually covers only one or two types of syntactic functions, and analyses of the syntactic functions of words in a language from a panoramic viewpoint are rare. This situation can be attributed to a lack of corpora that include specific information about the grammatical relations between words. As a result, many studies on lexical collocations in learner corpora are based on n-grams, i.e., word sequences extracted according to the linear positions of words (Yoon, 2016, Chen and Zhang, 2022, Saito and Liu, 2022, Yan and Liu, 2022). Building a syntactically annotated corpus requires considerable time and effort, but such a corpus provides more valuable and detailed information about how language learners use collocations. Specifically, it facilitates the identification of collocations with grammatical relations, excluding sequences such as "is the" or "have beautiful", in which the words are not grammatically related. A syntactically annotated corpus also makes it possible to extract the syntactic functions of words, and thus yields more informative results than n-grams do.

Interlanguage (Selinker, 1972) refers to a language system produced by second language learners that differs from the target language. Interlanguage is a dynamic system that changes as second language learners keep learning the target language. Previous studies have compared the combinations of words in interlanguage with those identified in the target language (Erman et al., 2015, Appel and Trofimovich, 2017) and have also compared various multi-word expressions used by second language learners at different proficiency levels (Granger and Bestgen, 2014, Tsai, 2020). These studies of authentic language use found that the frequency distributions of collocations in learner corpora differ from those in the target language, and that students at different proficiency levels use collocations in different ways. The learners’ use of collocations reflects the development of interlanguage proficiency since it indicates the idiomaticity and fluency of language use (García Salido and Garcia, 2017, Uchihara et al., 2022). However, the combinations under investigation are commonly categorized according to part of speech, and their extraction is based on the linear relations among words. For instance, many previous studies have focused on collocations such as verb + noun (Cangır and Durrant, 2021, Gyllstad and Snoder, 2021) or adjective + noun (Szudarski and Carter, 2016, Macis et al., 2021). Yet a noun next to a verb can play various syntactic roles, such as subject, object, adverbial, or complement. Empirical studies on collocations like verb + subject, verb + object and verb + complement at different proficiency levels of interlanguage and in the target language are still lacking.

As mentioned above, collocations in learner corpora are usually extracted based on the locations of words rather than the grammatical relations between them. One problem with the adjacent or linear extraction of collocations can be illustrated with sentence (1). Because the search for collocating words is limited to a fixed span of positions, it is difficult to locate a verb-noun collocation in a long sentence like example (1).

(1) The ordinary people like you and me might just as well have only one modest and achievable goal.

The range of the extracted collocational words is usually restricted to bigrams or n-grams (n is often set to 4; Wood, 2015). When we look for nouns that combine with the verb "have" on its left and right sides, the two nouns "people" and "goal" neither occur next to the verb nor fall within the usual scope of extraction. Therefore, the extraction results are unreliable, and analyses based solely on linear relations are problematic. If, however, the extraction is carried out according to the grammatical relations between words, this problem can be avoided. Instead of searching for nouns that appear around the verb "have", one can search for its subject and object, free of any restriction imposed by the linear distance between them.
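The contrast between the two extraction strategies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dependency arcs for sentence (1) are hand-written here rather than taken from the authors' treebank, the relation labels `subj`, `aux` and `obj` are placeholder names, and the helper functions `nouns_in_window` and `dependents_by_relation` are hypothetical. The window size follows the n = 4 setting mentioned above.

```python
# Sentence (1), tokenized by whitespace.
TOKENS = ("The ordinary people like you and me might just as well "
          "have only one modest and achievable goal").split()

VERB = TOKENS.index("have")
# Assumption: only the two common nouns are treated as noun candidates.
NOUN_POSITIONS = {TOKENS.index("people"), TOKENS.index("goal")}

# Hand-annotated, partial dependency arcs (dependent, head, relation).
# These arcs and labels are illustrative assumptions, not treebank data.
ARCS = [
    (TOKENS.index("people"), VERB, "subj"),
    (TOKENS.index("might"), VERB, "aux"),
    (TOKENS.index("goal"), VERB, "obj"),
]


def nouns_in_window(tokens, noun_positions, verb_idx, n=4):
    """Linear extraction: nouns that can share an n-gram with the verb."""
    reach = n - 1  # farthest position an n-gram containing the verb can cover
    return [tokens[i] for i in sorted(noun_positions)
            if i != verb_idx and abs(i - verb_idx) <= reach]


def dependents_by_relation(tokens, arcs, verb_idx, relations):
    """Relation-based extraction: dependents of the verb, keyed by relation."""
    return {rel: tokens[dep] for dep, head, rel in arcs
            if head == verb_idx and rel in relations}


print(nouns_in_window(TOKENS, NOUN_POSITIONS, VERB))
# → []
print(dependents_by_relation(TOKENS, ARCS, VERB, {"subj", "obj"}))
# → {'subj': 'people', 'obj': 'goal'}
```

In this toy setup, "people" lies nine tokens to the left of "have" and "goal" six tokens to the right, so a 4-gram window finds neither, while the relation-based lookup returns both regardless of distance.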

Given the limitations of previous research, the present study attempts a comprehensive analysis of the distributions of the syntactic functions of words grammatically related to verbs in the interlanguage produced by Chinese EFL learners. Verbs play an important role in determining the elements that constitute a sentence, and verbs are therefore central to many studies on syntactic functions (Bürsgens et al., 2021, Keshev and Meltzer-Asscher, 2021). How is each syntactic function (such as subjects, objects, adverbials, and complements) distributed in interlanguage? What is the relationship between the distributions of syntactic functions and the proficiency level of interlanguage? To answer these questions, we propose using probabilistic valency patterns (Liu, 2009) to investigate the syntactic functions in interlanguage. Valency theory maintains that lexis should not be separated from syntax: the syntactic role of a word in a sentence is determined by its valency. Valency patterns are combinations of words based on grammatical relations, and they contain information about the syntactic functions of the grammatically related words. Extracting verb collocations on the basis of valency patterns therefore meets our research requirements. Furthermore, probabilistic valency patterns provide a quantitative analytical framework that contributes empirical evidence on how language learners use syntactic functions. In the next section, we provide a more detailed introduction to probabilistic valency patterns.

Section snippets

Probabilistic valency patterns

The introduction of valency into linguistic research is generally credited to Lucien Tesnière (1959). Valency refers to the capacity of a word to combine with other words (Liu, 2009). It is an item-specific (Herbst, 2018) feature of words reflecting an idiosyncratic orientation to analyzing multi-word expressions. Valency was initially applied to verbs, and it was believed that only verbs have valency. Accordingly, many analyses have focused on the valency of verbs rather than other types of

Participants and treebanks

We built two dependency treebanks (corpora annotated with syntactic properties, Liang and Sang, 2022) for this study: the interlanguage treebank and the target language treebank. The interlanguage treebank consists of compositions written by Chinese EFL learners whose first language (L1) is Chinese and who have never lived abroad. There were three groups of students, classified by their proficiency level of interlanguage. The first and the youngest group of students consisted of

General distributions of syntactic functions in interlanguage and the target language

Before examining the syntactic functions of words collocating with verbs at different proficiency levels of interlanguage, we compared the target language and interlanguage as a whole. Table 3 shows the probabilistic distributions of the syntactic functions in the target language and interlanguage treebanks.
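One plausible way to read a "probabilistic distribution" of syntactic functions is as the share of each function among all words grammatically related to verbs in a treebank. The sketch below follows that reading; the exact computation used in the study is not shown in this excerpt, so the function name `function_distribution`, the relation labels, and the toy input are all assumptions made purely for illustration and do not reproduce any figure from Table 3.

```python
from collections import Counter


def function_distribution(relations):
    """Return the proportion of each syntactic function in a list of labels.

    `relations` holds one label per verb dependent, e.g. as collected from
    the arcs of a dependency treebank (hypothetical input format).
    """
    counts = Counter(relations)
    total = sum(counts.values())
    if total == 0:
        return {}  # no verb dependents observed
    return {rel: count / total for rel, count in counts.items()}


# Toy labels, invented for illustration only (not data from the study).
toy_relations = ["subj", "obj", "obj", "prep"]
distribution = function_distribution(toy_relations)

for rel, share in sorted(distribution.items()):
    print(f"{rel}: {share:.2%}")
```

Computing one such distribution per treebank (or per proficiency group) would give proportions that can be compared directly, because each distribution sums to 1 whatever the size of the treebank.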

The most apparent gap lay in the preposition group (i.e., the valency pattern V + PREP). These elements accounted for 23.06% in the articles written by English L1 speakers while accounting


The present study compared the probabilistic distributions of the syntactic functions of words collocating with verbs in the target language with those in interlanguage as a whole, investigated their changes at different proficiency levels of interlanguage, and contrasted interlanguage at different proficiency levels with the target language. To sum up, the types of syntactic functions identified in interlanguage and the target language were the same (eight types in total) but the distributions


This work was supported by the Introduction of Talents Research Foundation of Guizhou University [grant number GZUTF(2021)24].

CRediT authorship contribution statement

Qianying Zhao: Writing – original draft, Methodology, Formal analysis, Data curation. Jingyang Jiang: Writing – review & editing, Supervision, Data curation, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

