Data are the new gold, and this includes language data. For language professionals and language entrepreneurs, this can only mean good news.

But, in these times of AI, who is really making money from your language data? From your words? With or without your agreement? It is time to explore the business of monetizing language data and become aware of the ethical questions that this activity can raise. Join our online knowledge exchange on 31 May 2023, from 12:00 to 13:00, and expand your awareness.


Nidiyare Frutos, a master's student at VUB and intern at De Taalsector, took a look at the kinds of language data involved (blog posts and forums, open sources such as Wikipedia, translations and translation memories, glossaries, interpretation recordings and transcriptions, language handbooks, etc.), at how language data are processed, and at the revenue being generated from them.

During this session, she will raise serious questions, invite you to consider the revenue models involved, and encourage you to swap ideas and opinions with the other professionals present.

Content and Objectives

This session will include the following topics:

Types of language data

Open sources such as Wikipedia, blog posts and forums, translations and translation memories, glossaries, language data for AI, speech databases, transcripts of interpreted meetings, content for language manuals, and other structured and unstructured data.

Language data processing

Methods for cleaning and refining language data, enrichment, anonymization, pseudonymization, and customization.

Language data quality

The methods and criteria used to assess the quality of language data.

Language data market

Overview of the "language data sector": key players, stakeholders, and supply and demand dynamics.

Revenue models

The strategies used to generate income from language data.

Ethical considerations

Intellectual property rights (IPR), copyright issues, impact on market forces, and challenges facing languages with smaller amounts of language data.


The aim of the session is to show that, in the language sector, money is made not only from finished products or services but also from the basic resources fed into language production processes. It also aims to show the possible implications of these practices for you as a professional, company, or organization working in the language sector.

Language policy: The working language for this session is English, although questions can also be asked in Dutch, Spanish, or French.

Sign up now!

Date: Wednesday, 31 May 2023, from 12:00 to 13:00, online (via Teams)

Registration is required via this online form. You will then receive an immediate confirmation e-mail followed by a second one with all necessary practical information.

