Jörg Tiedemann

  1. Detecting hospital-acquired infections: A document classification approach using support vector machines and gradient tree boosting

    General information

    Publication status: Published
    MoE publication type: A1 Journal article-refereed
    Organisations: Department of Digital Humanities, Language Technology, University of Helsinki, Stockholm University, KTH Royal Institute of Technology
    Contributors: Ehrentraut, C., Ekholm, M., Tanushi, H., Tiedemann, J., Dalianis, H.
    Number of pages: 19
    Pages: 24-42
    Publication date: Mar 2018
    Peer-reviewed: Yes

    Publication information

    Journal: Health Informatics Journal
    Volume: 24
    Issue number: 1
    ISSN (Print): 1460-4582
    Ratings:
    • Scopus rating (2018): SJR 0.62, SNIP 1.071
    Original language: English
    Fields of Science: clinical decision-making, databases and data mining, ehealth, electronic health records, secondary care, care-associated infections, automated surveillance, 6121 Languages, 113 Computer and information sciences, 3141 Health care science
    Source: WOS
    Source-ID: 000424053900003

    Research output: Contribution to journal, Article, Scientific, peer-review

  2. What do Language Representations Really Represent?

    A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just as it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on the one hand, and genetic, geographical, and several levels of structural similarity between languages on the other. Of these, structural similarity is found to correlate most strongly with language representation similarity, while genetic relationships (a convenient benchmark used for evaluation in previous work) appear to be a confounding factor. Apart from implications about translation effects, we see this more generally as a case where NLP and linguistic typology can interact and benefit one another.

    General information

    Publication status: Accepted/In press
    MoE publication type: A1 Journal article-refereed
    Organisations: Department of Digital Humanities, Language Technology, Doctoral Programme in Language Studies, University of Zurich, University of Copenhagen
    Contributors: Bjerva, J., Östling, R. M., Han Veiga, M., Tiedemann, J., Augenstein, I.
    Publication date: 2019
    Peer-reviewed: Yes

    Publication information

    Journal: Computational Linguistics
    ISSN (Print): 0891-2017
    Original language: English
    Fields of Science: 6121 Languages (language technology, computational linguistics), 113 Computer and information sciences (natural language processing)

    Research output: Contribution to journal, Article, Scientific, peer-review

  3. Opus-MontenegrinSubs 1.0: First electronic corpus of the Montenegrin language

    General information

    Publication status: Published
    MoE publication type: A4 Article in conference proceedings
    Organisations: Department of Digital Humanities, Language Technology, Doctoral Programme in Language Studies
    Contributors: Bozovic, P., Erjavec, T., Tiedemann, J., Ljubesic, N., Gorjanc, V.
    Number of pages: 5
    Pages: 24-28
    Publication date: 2018

    Host publication information

    Title of host publication: Proceedings of the conference on Language Technologies & Digital Humanities 2018
    Place of publication: Ljubljana
    Publisher: Ljubljana University Press
    Editors: Fišer, D., Pančur, A.
    ISBN (Electronic): 978-961-06-0111-1
    Fields of Science: 6121 Languages (language technology, computational linguistics), 113 Computer and information sciences (natural language processing)
    Source: Bibtex
    Source-ID: urn:86555f7fdb212929b4997e4aee2661e5

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

  4. Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

    General information

    Publication status: Published
    MoE publication type: C2 Edited book
    Organisations: Department of Digital Humanities, Language Technology, Doctoral Programme in Language Studies
    Contributors: Zampieri, M. (ed.), Nakov, P. (ed.), Ljubesic, N. (ed.), Tiedemann, J. (ed.), Malmasi, S. (ed.), Ali, A. (ed.)
    Publication date: 2018

    Publication information

    Place of publication: Stroudsburg
    Publisher: Association for Computational Linguistics
    ISBN (Electronic): 978-1-945626-43-2
    Original language: English
    Fields of Science: 6121 Languages (language technology, computational linguistics), 113 Computer and information sciences (natural language processing)
    Source: Bibtex
    Source-ID: urn:f0b0b1324aaf33258d97b54f25bb3767

    Research output: Book/Report, Anthology or special issue, Scientific, peer-review

  5. Creating a Dataset for Multilingual Fine-grained Emotion-detection Using Gamification-based Annotation

    This paper introduces a gamified framework for fine-grained sentiment analysis and emotion detection. We present a flexible tool, Sentimentator, that can be used for efficient annotation based on crowdsourcing and a self-perpetuating gold standard. We also present a novel dataset with multi-dimensional annotations of emotions and sentiments in movie subtitles that enables research on sentiment preservation across languages and the creation of robust multilingual emotion detection tools. The tools and datasets are public and open source and can easily be extended and applied for various purposes.

    General information

    Publication status: Published
    MoE publication type: A4 Article in conference proceedings
    Organisations: Department of Digital Humanities, Language Technology, Doctoral Programme in Language Studies, University of Helsinki
    Contributors: Öhman, E. S., Tiedemann, J., Honkela, T. U., Kajava, K.
    Number of pages: 7
    Pages: 24-30
    Publication date: 31 Oct 2018

    Host publication information

    Title of host publication: Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
    Place of publication: Stroudsburg
    Publisher: Association for Computational Linguistics
    ISBN (Electronic): 9781948087803
    Fields of Science: 113 Computer and information sciences, 6121 Languages, 6160 Other humanities

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

  6. An Analysis of Encoder Representations in Transformer-Based Machine Translation

    General information

    Publication status: Published
    MoE publication type: A4 Article in conference proceedings
    Organisations: Department of Digital Humanities, Language Technology, Doctoral Programme in Language Studies
    Contributors: Raganato, A., Tiedemann, J.
    Number of pages: 11
    Pages: 287-297
    Publication date: 2018

    Host publication information

    Title of host publication: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
    Place of publication: Stroudsburg
    Publisher: Association for Computational Linguistics
    Editors: Linzen, T., Chrupała, G., Alishahi, A.
    ISBN (Electronic): 978-1-948087-71-1
    Fields of Science: 113 Computer and information sciences, 6121 Languages
    Source: Bibtex
    Source-ID: urn:bcd8c985c6805219a8ec18bb7843c066

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

  7. The University of Helsinki submissions to the WMT18 news task

    General information

    Publication status: Published
    MoE publication type: A4 Article in conference proceedings
    Organisations: Department of Digital Humanities, Language Technology, Doctoral Programme in Language Studies, University of Helsinki, Helsinki
    Contributors: Raganato, A., Scherrer, Y., Nieminen, T., Hurskainen, A., Tiedemann, J.
    Number of pages: 8
    Pages: 488-495
    Publication date: 2018

    Host publication information

    Title of host publication: Proceedings of the Third Conference on Machine Translation: Shared Task Papers
    Place of publication: Stroudsburg
    Publisher: Association for Computational Linguistics
    Editors: Bojar, O., Chatterjee, R., Federmann, C., Fishel, M., Graham, Y., Haddow, B., Huck, M., Yepes, A. J., Koehn, P., Monz, C., Negri, M., Névéol, A., Neves, M., Post, M., Specia, L., Turchi, M., Verspoor, K.
    ISBN (Electronic): 978-1-948087-81-0
    Fields of Science: 113 Computer and information sciences, 6121 Languages
    Source: Bibtex
    Source-ID: urn:d3fc27f1298c2d81f6d65df0f4db387e

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

  8. The MeMAD Submission to the IWSLT 2018 Speech Translation Task

    This paper describes the MeMAD project entry to the IWSLT Speech Translation Shared Task, addressing the translation of English audio into German text. Between the pipeline and end-to-end model tracks, we participated only in the former, with three contrastive systems. We also tried the latter, but were not able to finish our end-to-end model in time.

    All of our systems start by transcribing the audio into text through an automatic speech recognition model trained on the TED-LIUM English Speech Recognition Corpus. Afterwards, we feed the transcripts into English-German text-based neural machine translation (NMT) models. Our systems employ three different translation models trained on separate training sets compiled from the English-German part of the TED Speech Translation Corpus and the OpenSubtitles2018 section of the OPUS collection.

    In this paper, we also describe the experiments leading up to our final systems. Our experiments indicate that using OpenSubtitles2018 in training significantly improves translation performance. We also experimented with various pre- and postprocessing routines for the NMT module, but we did not have much success with these.

    Our best-scoring system attains a BLEU score of 16.45 on the test set for this year’s task.

    General information

    Publication status: Published
    MoE publication type: D3 Professional conference proceedings
    Organisations: Department of Digital Humanities, Doctoral Programme in Language Studies, Language Technology, Aalto University
    Contributors: Sulubacak, U., Tiedemann, J., Rouhe, A., Grönroos, S.-A., Kurimo, M.
    Number of pages: 6
    Pages: 89-94
    Publication date: 30 Oct 2018

    Host publication information

    Title of host publication: Proceedings of the 15th International Workshop on Spoken Language Translation (IWSLT 2018)
    Place of publication: Bruges
    Editors: Turchi, M., Niehues, J., Federico, M.
    Fields of Science: 113 Computer and information sciences, 6121 Languages

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Professional

  9. The MeMAD Submission to the WMT18 Multimodal Translation Task

    This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task.

    We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems leading us up to this choice.

    We have the top-scoring system for both English-to-German and English-to-French, according to the automatic metrics for flickr18.

    Our experiments show that the effect of the visual features in our system is small. Our largest gains come from the quality of the underlying text-only NMT system. We find that appropriate use of additional data is effective.

    General information

    Publication status: Published
    MoE publication type: A4 Article in conference proceedings
    Organisations: Department of Digital Humanities, Doctoral Programme in Language Studies, Language Technology, Aalto University, Institut Eurecom
    Contributors: Grönroos, S.-A., Huet, B., Kurimo, M., Laaksonen, J., Merialdo, B., Pham, P., Sjöberg, M., Sulubacak, U., Tiedemann, J., Troncy, R., Vázquez Carrillo, J. R.
    Number of pages: 9
    Pages: 603-611
    Publication date: 1 Nov 2018

    Host publication information

    Title of host publication: Proceedings of the Third Conference on Machine Translation (WMT): Shared Task Papers
    Place of publication: Stroudsburg
    Publisher: Association for Computational Linguistics
    Editors: Bojar, O., Chatterjee, R., Federmann, C., Fishel, M., Graham, Y., Haddow, B., Huck, M., Yepes, A. J., Koehn, P., Monz, C., Negri, M., Névéol, A., Neves, M., Post, M., Specia, L., Turchi, M., Verspoor, K.
    ISBN (Electronic): 978-1-948087-81-0
    Fields of Science: 113 Computer and information sciences, 6121 Languages

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

  10. Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign

    We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects. The campaign was organized as part of the fifth edition of the VarDial workshop, co-located with COLING 2018. This year, the campaign comprised five shared tasks: two task re-runs, Arabic Dialect Identification (ADI) and German Dialect Identification (GDI), and three new tasks, Morphosyntactic Tagging of Tweets (MTT), Discriminating between Dutch and Flemish in Subtitles (DFS), and Indo-Aryan Language Identification (ILI). A total of 24 teams submitted runs across the five shared tasks and contributed 22 system description papers, which were included in the VarDial workshop proceedings and are referred to in this report.

    General information

    Publication status: Published
    MoE publication type: B3 Article in conference proceedings
    Organisations: Department of Digital Humanities, Language Technology, Doctoral Programme in Language Studies, University of Wolverhampton, Harvard Medical School, Harvard University, Qatar Computing Research Institute, Massachusetts Institute of Technology (MIT), Jozef Stefan Institute, Department of Knowledge Technologies, University of Zagreb, Tilburg University, Radboud University Nijmegen, University of Leuven, Meertens Instituut, Bhim Rao Ambedkar University, Jadavpur University, Kolkata, Jawaharlal Nehru University, Zurich University
    Contributors: Zampieri, M., Malmasi, S., Nakov, P., Ali, A., Shon, S., Glass, J., Scherrer, Y., Samardžić, T., Ljubešić, N., Tiedemann, J., van der Lee, C., Grondelaers, S., Oostdijk, N., Speelman, D., van den Bosch, A., Kumar, R., Lahiri, B., Jain, M.
    Number of pages: 17
    Pages: 1-17
    Publication date: 2018

    Host publication information

    Title of host publication: Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects
    Place of publication: Santa Fe
    Publisher: Association for Computational Linguistics
    Editors: Zampieri, M., Nakov, P., Ljubešić, N., Tiedemann, J., Malmasi, S., Ali, A.
    ISBN (Electronic): 978-1-948087-55-1
    Fields of Science: 6121 Languages

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific
