Gema Ramírez-Sánchez

CEO

at Prompsit Language Engineering

Business Data

Work Phone:

+965457549

Twitter

https://twitter.com/altlang

Personal Bio

Gema Ramírez holds a Bachelor's degree in Translation and Interpreting and a Master's degree in Computer Applications from the University of Alicante, where she started her professional career as a computational linguist. She is the CEO of Prompsit, a language technology provider with a strong focus on tailored MT services and multilingual applications of natural language technologies. She is currently the product manager of AltLang, the language variety converter that will help you take better care of local audiences. You will frequently find her running workshops or linguistic olympiads, thanks to the European project Abu-MaTran. She is a co-author of various research papers, an active developer of the Apertium MT platform, and the vice president of ARTES Cultura y Ocio, an association for inclusive leisure for disabled and non-disabled people.

Presentation title:

Curating and Analysing Massive Amounts of Multilingual Data for Open and High-Performance Language Modelling

Presentation description:

We will present the pipeline and tools used in the HPLT project (https://hplt-project.org), which have enabled the release of a massive multilingual dataset collection for LLM training. The second release of the HPLT data, still in the oven, will also be described, along with some of the thorough and practical per-language analytics reports produced with the HPLT Analytics tool. All outputs from the HPLT project, both software and data, are released under free/open-source licences. Through this presentation, we will encourage community adoption of and contributions to HPLT, a project committed to laying a solid foundation for building open and high-performance LLMs and MT models.