Corpora in Translation Studies
17 January 2014
School of French, Aristotle University of Thessaloniki
Abstracts
Dr. Vassiliki Foufi (School of Modern Greek), Eleni Kogitsidou (Université de
Grenoble), Dr. Athanasios Mavropoulos (Centre for the Greek Language),
Dr. Olympia Tsaknaki (Aristotle University of Thessaloniki)
Compilation of a Literary Text Corpus
The ultimate goal of our research, carried out within the project
"Compilation of a parallel corpus of French fiction translated into Greek", led
by Prof. Titika Dimitroulia and financed by the AUTH Research Committee, is the
construction of a literary bitext. Its aims are to enrich Greek digital content
and to study key issues in translation and Translation Studies, such as the
contribution of translation to shaping the language of its time, the imprint of
its time on translation, and the translator's style.
Parallel texts are considered very important for natural language processing
applications and for linguistic research, as they help resolve semantic
ambiguities and contribute to terminology extraction and contrastive corpus
studies.
We will present in detail the steps we followed to build the parallel
corpus from electronic texts using several tools. First, we converted the files
into editable formats with ABBYY FineReader, an optical character recognition
(OCR) application (http://www.abbyy.com.gr/). Next, we edited and corrected the
texts to fix problems introduced by the conversion (unrecognized accented
characters, typographical errors, etc.). Finally, we aligned the texts at
sentence level with the open-source LF Aligner
(http://sourceforge.net/projects/aligner/). We will also present mismatches
between the source text and its translation, such as differences in formatting,
improper text segmentation in the source or the target language, and mismatches
between textual units.
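The final step, sentence alignment, can be illustrated with a simplified length-based aligner. This is not LF Aligner's own algorithm: it is a minimal sketch in the spirit of Gale and Church's length-based method, assuming that sentence segmentation is already done and that a French sentence and its Greek translation have roughly proportional character lengths. The `ratio` parameter and the bead penalties are hypothetical illustration values, not tuned on this corpus.

```python
import math

# Alignment "beads": (source sentences, target sentences) consumed in one step.
# Penalties are hypothetical illustration values, not tuned on any corpus:
# 1-1 is favoured, merges/splits cost more, unmatched sentences cost most.
BEAD_PENALTIES = {(1, 1): 0.0, (2, 1): 2.0, (1, 2): 2.0, (1, 0): 4.0, (0, 1): 4.0}


def length_cost(src_len, tgt_len, ratio=1.0):
    """Penalise a pairing whose target length diverges from the expected length.

    `ratio` is an assumed average target/source character-length ratio.
    """
    expected = src_len * ratio
    return abs(tgt_len - expected) / math.sqrt(max(expected, 1.0))


def align(src, tgt, ratio=1.0):
    """Return the cheapest sequence of (source_sentences, target_sentences) groups."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees cost[i][j] is final before we extend from it.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEAD_PENALTIES.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                src_len = sum(len(s) for s in src[i:ni])
                tgt_len = sum(len(t) for t in tgt[j:nj])
                total = cost[i][j] + penalty + length_cost(src_len, tgt_len, ratio)
                if total < cost[ni][nj]:
                    cost[ni][nj] = total
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end to recover the cheapest path.
    groups = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        groups.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    groups.reverse()
    return groups


if __name__ == "__main__":
    fr = ["Il pleuvait, et nous sommes restés à la maison."]
    el = ["Έβρεχε.", "Μείναμε στο σπίτι."]
    for source_group, target_group in align(fr, el):
        print(source_group, "<->", target_group)
```

In this example the translator has split one French sentence into two Greek ones, and the 1-2 bead absorbs the split instead of forcing a wrong 1-1 pairing; this is the kind of segmentation mismatch the abstract mentions.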
Prof. Dionysis Goutsos
(National and Kapodistrian University of Athens)
Greek corpus building and analysis: The story so far and what is to follow
The paper offers a state-of-the-art account of corpus research on Greek,
focusing on both corpus compilation and analysis. It outlines the main phases in
the development of Modern Greek corpora and presents, with specific examples,
the most important corpus-derived findings on the description of Greek. The
paper focuses mainly on the relevance of these findings for the study of
translation in Greek, as well as their implications for translation theory and
practice. Finally, it outlines the prospects for corpus-related research on
Greek and points out some translation hypotheses for further exploration.
Prof. Titika Dimitroulia
(Aristotle University of Thessaloniki)
Design and compilation of a literary parallel corpus: aims and applications
The compilation of a parallel corpus of French literary fiction translated
into Greek is situated in the context of Corpus-based Translation Studies (CTS)
in the Greek-speaking world and its applications to translation didactics. At
the same time, the corpus, which contains works published after 1974,
constitutes a first sample of contemporary translated Greek literary discourse.
We therefore believe that it can, and indeed should, be incorporated, along with
other similar works in progress, into monolingual Greek corpora (HNC, SEK,
Diachronic Greek Corpora) and comparable corpora.
The selection of works for this small-scale corpus, financed by the AUTH
Research Committee, follows the criterion of representativeness/balance: we
sought to include works spanning three centuries (18th-20th) and different
genres, so that the interference of the source language and of genre can be
studied more effectively. For the same reason, we tried to include the different
categories of translators (occasional mediators and established writers,
academics, professional translators). The texts are released with the permission
of their publishers, always within the copyright regime. This is why the corpus,
which will be online and open access, will allow users to access and download
the full texts.
Based on the corpus and taking into account extra-textual parameters, we
plan to study the style of individual translators, collective stylistic
features, authorship attribution, translationese and translation universals
(explicitation, simplification, normalization, etc.), among other topics. In
translation didactics, we examine the translation of culturemes and
intertextuality, and discuss the concept of quality in literary translation.
Finally, the corpus will be used in the context of the current debate on digital
literature and the digital humanities.
Prof. Fryni Kakoyianni-Doa &
dr. Eleni Tziafa (University of Cyprus)
The SOURCe Project
The SOURCe project was developed in three parts: (a) the search engine for
the Searchable Online French-Greek Parallel Corpus for the University of Cyprus
(SOURCe), (b) the Pencil tool and (c) the Library tool. These are designed as
freely available language-processing resources, distributed together with the
data to be processed in formats usable by teachers, learners and translators.
Our aim is to describe the design principles and properties of the SOURCe
Project and to outline its future prospects and applications. The project is
led by Fryni Kakoyianni-Doa and is fully funded by the University of Cyprus.
The core of the project is a collection of parallel corpora: original and
translated texts in French and Greek, aligned at sentence level. To spare
teachers and translators lengthy preparation and complex corpus building, we
propose simple online corpora with basic text-search facilities, rather than
machine-annotated, tagged or parsed corpora, which are better suited to detailed
linguistic research. We designed a simple interface through which users can
search existing corpora, upload texts and view them online. We also provide
different "viewpoints", so that different types of users can see different views
of the same underlying datasets.
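The kind of basic text search described above can be sketched in a few lines. SOURCe's actual interface and data model are not described in the abstract, so the `SentencePair` type and the `search` and `show` functions below are hypothetical: a minimal sketch assuming the corpus is held as a list of already aligned French-Greek sentence pairs, searched by case-insensitive substring matching with no tagging or parsing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """Hypothetical aligned unit: a French original and its Greek translation."""
    fr: str
    el: str


def search(corpus, query, side="fr"):
    """Return every pair whose French ('fr') or Greek ('el') side contains `query`.

    Matching is a plain case-insensitive substring test: no lemmatisation,
    tagging or parsing, in keeping with a deliberately simple search tool.
    """
    if side not in ("fr", "el"):
        raise ValueError("side must be 'fr' or 'el'")
    needle = query.casefold()
    return [pair for pair in corpus if needle in getattr(pair, side).casefold()]


def show(hits, side="fr"):
    """Print each hit with the searched side first, then its aligned counterpart."""
    other = "el" if side == "fr" else "fr"
    for pair in hits:
        print(f"{getattr(pair, side)}  |  {getattr(pair, other)}")


if __name__ == "__main__":
    corpus = [
        SentencePair("Je lis un livre.", "Διαβάζω ένα βιβλίο."),
        SentencePair("Le livre est sur la table.", "Το βιβλίο είναι στο τραπέζι."),
        SentencePair("Il pleut.", "Βρέχει."),
    ]
    show(search(corpus, "livre"))
    show(search(corpus, "βιβλίο", side="el"), side="el")
```

Because every hit is a whole pair, a user searching either language immediately sees the aligned sentence in the other, which is the core service a teacher or translator needs from such a corpus.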
Prof. Rudy Loock (Université Lille 3)
Intra-language differences and translation quality
The aim of this presentation is to raise the question of whether
measuring intra-language differences between original and translated language
can serve as a tool for translation quality assessment.
To ask such a question is to enter the thorny debate on the interpretation
of intra-language differences: should we regard translated language as a
variety comparable to a dialect, or should we take the over- or
under-representation of a given linguistic construction as a sign that the
translation needs improving? From an even more general perspective, should we
consider translated language to be intrinsically different, representing what
researchers have called a third code, or should we consider that “the utopian
goal is to make it virtually impossible to tell the translation from an original
text in that language” (Teubert 1996: 241)?
Through the analysis of a learner corpus (translation tasks from English
into French performed by first-year and master’s students) for two case studies
(derived adverbs and existential constructions), we examine whether any
correlation can be found between the observed intra-language differences and
the overall quality of the translation tasks.
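One common way to quantify whether a construction is over- or under-represented in translated language is a log-likelihood (G2) test on its frequency in a translated corpus versus a corpus of original texts. The abstract does not say which statistic the study uses, so this is an assumption and not the author's method; the counts and corpus sizes in the usage example are hypothetical.

```python
import math

CRITICAL_P05 = 3.84  # chi-square critical value, 1 degree of freedom, p < 0.05


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """G2 statistic for one item's frequency in corpus A versus corpus B.

    freq_* are raw counts of the construction; size_* are corpus sizes in words.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # by convention 0 * ln(0) contributes nothing
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def per_10k(freq, size):
    """Relative frequency per 10,000 words."""
    return freq / size * 10_000


def compare(freq_translated, size_translated, freq_original, size_original):
    """Return (G2, 'over' | 'under' | 'equal', significant?) for translated text."""
    g2 = log_likelihood(freq_translated, size_translated, freq_original, size_original)
    rel_t = per_10k(freq_translated, size_translated)
    rel_o = per_10k(freq_original, size_original)
    if rel_t > rel_o:
        direction = "over"
    elif rel_t < rel_o:
        direction = "under"
    else:
        direction = "equal"
    return g2, direction, g2 > CRITICAL_P05


if __name__ == "__main__":
    # Hypothetical: 30 existential constructions in 10,000 words of student
    # translations versus 10 in 10,000 words of original French.
    print(compare(30, 10_000, 10, 10_000))
```

With these invented counts G2 is about 10.46, above the 3.84 threshold, so the construction would count as significantly over-represented. Whether such a difference signals poorer quality is exactly the interpretive question the abstract raises; the statistic only locates the difference.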
Prof. Sofia Malamatidou (University of Birmingham)
Translation and Language Change: The Interplay of Diachronic and Synchronic
Corpus-Based Studies
Corpus-based research has yielded important insights into translation in
recent years, but most studies in the field have focused on synchronic
analyses, thus neglecting the potential for diachronic analysis to enhance our
understanding of how translation might contribute to important phenomena such
as language change. Recently, a number of scholars have adopted a corpus-based
approach in the investigation of translation as a form of language contact and
its impact on the target language. However, no diachronic corpus-based study of
translation involving Modern Greek has so far been attempted. Similarly,
comparable and parallel corpora have not been used effectively by linguists for
the analysis of diachronic phenomena.
This study aims to combine synchronic and diachronic corpus-based
approaches, as well as parallel and comparable corpora for the analysis of
linguistic features of translated texts and their impact on non-translated
texts. Unlike most studies employing comparable corpora, which focus on
revealing recurrent features of translated language independently of the source
language (SL) and target language (TL), this study approaches texts with the
intention of revealing features that
are dependent on the specific language pair involved in the translation
process, i.e. English and Modern Greek.
The study involves the diachronic analysis of the TROY Corpus: a corpus of
Modern Greek non-translated and translated popular science articles, along with
their English source texts, covering a 20-year period (1990-2010) and
consisting of approximately half a million words. The corpus is divided into
three sections. The first subcorpus consists of non-translated Modern Greek
popular science articles published in 1990-1991. The second subcorpus consists
of non-translated and translated Modern Greek popular science articles
published in 2003-2004, as well as the source texts of the translations. The
third subcorpus includes non-translated as well as translated texts and their
source texts, all published in 2010-2011. The linguistic feature analysed in
this study is the frequency of reporting verbs in the passive voice.
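Comparing a feature across subcorpora of different sizes requires normalised frequencies. The abstract does not describe how passive reporting verbs were identified, so the following is a minimal sketch under stated assumptions: it counts tokens matching a small, hypothetical and non-exhaustive list of Greek passive reporting-verb forms and reports the rate per 10,000 words. A real study would need manual checking, since a form such as "αναφέρεται" can also mean "refers to".

```python
import re
import unicodedata

# Hypothetical, non-exhaustive list of passive forms of Greek reporting verbs.
PASSIVE_REPORTING_FORMS = frozenset({
    "θεωρείται", "θεωρούνται", "πιστεύεται", "λέγεται",
    "αναφέρεται", "αναφέρονται", "εκτιμάται", "υποστηρίζεται",
})


def passive_reporting_rate(text, forms=PASSIVE_REPORTING_FORMS, per=10_000):
    """Return (hits, tokens, hits per `per` words) for one text or subcorpus."""
    # NFC normalisation keeps precomposed accented letters comparable.
    normalized = unicodedata.normalize("NFC", text).lower()
    tokens = re.findall(r"\w+", normalized)  # \w matches Greek letters in str patterns
    targets = {unicodedata.normalize("NFC", f).lower() for f in forms}
    hits = sum(1 for token in tokens if token in targets)
    rate = hits / len(tokens) * per if tokens else 0.0
    return hits, len(tokens), rate


if __name__ == "__main__":
    # Placeholder texts standing in for the three TROY subcorpora.
    subcorpora = {
        "1990-1991": "Πιστεύεται ότι το φαινόμενο είναι σπάνιο.",
        "2003-2004": "Θεωρείται σημαντικό και αναφέρεται συχνά.",
        "2010-2011": "Οι ερευνητές λένε ότι το φαινόμενο είναι σπάνιο.",
    }
    for period, text in subcorpora.items():
        print(period, passive_reporting_rate(text))
```

Applied separately to the translated and non-translated texts of each period, the same rate allows the diachronic comparison the study describes.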
Spyridon Pilos (Head of Sector “Language Applications”, Informatics Unit,
Resources Directorate, Directorate-General for Translation, European Commission,
Luxembourg)
The public translation memories and corpora of DG Translation of the
European Commission
The Directorate-General for Translation (DGT) has made available two data
sets relevant for translation: the DGT-TM and the DGT-Acquis. The DGT-TM,
i.e. "DGT Translation Memory", is a collection of compressed
multilingual files in the TMX format, an XML-based standard used for the
exchange of Translation Memory data. The present update covers the entire body
of EU law as published in the L-Series of the Official Journal between 1972 and
2012, in 23 official languages of the EU. Updates are scheduled to be released
on a yearly basis. The DGT-Acquis is a paragraph-aligned parallel corpus
consisting of full text documents with added meta-information on which
paragraphs are aligned with which others in the other languages. In this
corpus, one can thus see each sentence in its context, while in translation
memories, each sentence is in isolation, i.e. out of context. DGT-Acquis
contains not only the L series of the Official Journal but also the LM, C, CA and
CE collections. Both resources can be downloaded from the Joint Research Centre's
website on language technology resources. DGT-TM is also accessible through the
EU Open Data Portal.
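In TMX, each translation unit (`<tu>`) holds one variant (`<tuv>`) per language, each wrapping a segment (`<seg>`). The sketch below uses only Python's standard `xml.etree.ElementTree` to turn such a file into sentence pairs for a chosen language pair. It assumes nothing about DGT-TM beyond the TMX structure: it accepts both the TMX 1.4 `xml:lang` attribute and the bare `lang` attribute of older TMX versions, reduces codes such as `EN-GB` to their primary subtag, and keeps only the text of any inline markup. The function names and the sample file are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how ElementTree names xml:lang


def units_from_root(root):
    """Return one {primary_language_code: segment_text} dict per <tu>."""
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang is None or seg is None:
                continue
            # "EN-GB" -> "EN", "el" -> "EL"; inline tags are flattened to text.
            unit[lang.split("-")[0].upper()] = "".join(seg.itertext()).strip()
        if unit:
            units.append(unit)
    return units


def load_tmx(path):
    """Parse a TMX file from disk (the parser reads the XML declaration/BOM)."""
    return units_from_root(ET.parse(path).getroot())


def pairs(units, src, tgt):
    """Return (source, target) segment pairs for units containing both languages."""
    src, tgt = src.upper(), tgt.upper()
    return [(u[src], u[tgt]) for u in units if src in u and tgt in u]


SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="EN-GB" datatype="plaintext"/>
  <body>
    <tu>
      <tuv lang="EN-GB"><seg>This Regulation shall enter into force.</seg></tuv>
      <tuv lang="EL-GR"><seg>Ο παρών κανονισμός αρχίζει να ισχύει.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Done at Brussels.</seg></tuv>
      <tuv xml:lang="fr"><seg>Fait à Bruxelles.</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    units = units_from_root(ET.fromstring(SAMPLE_TMX))
    print(pairs(units, "en", "el"))
    print(pairs(units, "en", "fr"))
```

For the full multi-year, multilingual release, `ET.iterparse` would be the more memory-friendly choice, since it lets each `<tu>` be processed and discarded instead of holding the whole tree in memory.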
Prof. Mojca Schlamberger Brezar (University of Ljubljana)
Arguing for or against: connectives in translation from French into Slovene
in a parallel journalistic corpus
Since the 1990s, corpus linguistics has made revolutionary progress in data
processing, and translation studies is no exception. We will discuss the rise of
corpus linguistics for Slovene, a language with just over 2 million speakers.
After a brief account of the state of corpus linguistics for Slovene, we will
present research on French argumentative and counter-argumentative connectives
such as parce que ('because'), puisque ('since'), mais ('but') and pourtant
('yet'), in both the journalistic and the literary parts of the French-Slovene
parallel corpus FraSloK, and collect their equivalents in the Slovene
translations. We will examine the argumentative strategies used in the source
text and their impact on the choice of connectives in translation. We will
analyse the strategies used to translate these connectives and study how they
depend on text type, on the different language registers present in the corpus,
and on the translators' personal choices.
Prof. Federico Zanettin (Università di Perugia)
Corpora and literary translation research: Issues and challenges
In this presentation, I consider applications of corpus linguistics
tools and methodologies to descriptive translation studies. More specifically, I
discuss ways in which corpora of different types can help investigate
literary translation, both from a quantitative and a qualitative perspective. I
provide an overview of the main research lines, namely research on so-called
translation universals, translation norms and translator style. Finally, I
consider the main stages of corpus compilation and use, from corpus design and
annotation to search and visualization techniques, with a focus on parallel
corpora.