The Valencian translation company Pangeanic has recently won a new contract to develop a Europe-wide anonymization project.

Last January, the Innovation and Networks Executive Agency (INEA) awarded Pangeanic’s consortium one million euros to develop a new project for the European Union. The project’s purpose is to build a multilingual anonymization toolkit driven by artificial intelligence (AI) and based on Named Entity Recognition.

The final deliverable will be used in the fields of life sciences, health and justice. Once the project is finished, the final version of the anonymization toolkit will be available for download as a fully deployable Docker container, free for developers under an open-source license.

The consortium that will lead the project includes several public administrations, and further use cases are likely among the EU Member States, as the need for anonymization in public sector administrations is rising. Chambers of commerce, embassies and other European public administrations around the globe will also benefit from this project: they work under a mandate for open information and transparency, yet they must keep personal data completely safe and cannot share it with third parties.

MAPA stands for Multilingual Anonymization toolkit for Public Administrations. The MAPA Project will be highly sophisticated, as it involves state-of-the-art Natural Language Processing tools for developing the open-source toolkit, which will focus on two specific domains: legal and medical. This toolkit will be used in numerous public administrations all over the European continent.

Manuel Herranz is the CEO of Pangeanic. He is convinced that the MAPA Project will provide large-scale data de-identification, so that institutions can release or share data while preserving privacy. Implementation cases will focus on pseudonymizing, de-identifying or obfuscating individually identifiable data.
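The difference between pseudonymizing and obfuscating can be illustrated with a minimal Python sketch. It is not the MAPA implementation: the function names, the label names and the hand-written entity list are assumptions made for illustration, and in a real system the entities would come from a recognizer rather than being supplied by hand.

```python
import re
from collections import defaultdict


def _entity_pattern(surfaces):
    # Longest surface forms first, so a longer name wins over a shorter one it contains.
    ordered = sorted(surfaces, key=len, reverse=True)
    return re.compile("|".join(re.escape(s) for s in ordered))


def pseudonymize(text, entities):
    """Replace each entity with a stable placeholder such as PERSON_1.

    `entities` is a list of (surface_form, label) pairs (hypothetical input).
    The same surface form always maps to the same placeholder, so the text
    stays readable and references stay consistent.
    """
    counters = defaultdict(int)
    mapping = {}
    for surface, label in entities:
        if surface not in mapping:
            counters[label] += 1
            mapping[surface] = f"{label}_{counters[label]}"
    if not mapping:
        return text, mapping
    replaced = _entity_pattern(mapping).sub(lambda m: mapping[m.group(0)], text)
    return replaced, mapping


def obfuscate(text, entities):
    """Mask each entity with asterisks, keeping no link between occurrences."""
    surfaces = {surface for surface, _ in entities}
    if not surfaces:
        return text
    return _entity_pattern(surfaces).sub(lambda m: "*" * len(m.group(0)), text)


if __name__ == "__main__":
    sample = "Ana Ruiz met Dr. Marc Petit. Ana Ruiz was discharged in Valencia."
    found = [("Ana Ruiz", "PERSON"), ("Marc Petit", "PERSON"), ("Valencia", "LOCATION")]
    print(pseudonymize(sample, found)[0])
    print(obfuscate(sample, found))
```

Whoever holds the mapping returned by `pseudonymize` can reverse the replacement, which is why pseudonymized data is generally still treated as personal data, whereas the masked output of `obfuscate` keeps no such link.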

Additionally, he states that the toolkit will be completely language independent. This allows it to remove personal data regardless of the language the institution works in, or even the names mentioned in the text. He considers that the GDPR (General Data Protection Regulation) has definitely changed the way data is shared. Likewise, he sees a rising interest in the protection of people’s privacy.

Apart from the Spanish translation agency Pangeanic, the MAPA partners are the Spanish Language Plan Government Office (SEDIA), represented by the Barcelona Supercomputing Center; Tilde; LIMSI at CNRS (the French National Centre for Scientific Research); the R&D center Vicomtech; the language resource center ELRA; and the University of Malta. This multidisciplinary and multicultural team will be in charge of addressing all the European languages.

Why should data be anonymized?

The GDPR requires institutions to protect people’s personal information so that it is not shared with third parties. The MAPA data anonymization toolkit will enable language data sharing while keeping personal and sensitive information safe.

Releasing large amounts of anonymized information can benefit the community, for instance by providing more training data for machine learning. It can also be a great help to institutions with offices around the globe that need to transfer data securely across different jurisdictions.

On a more tangible level, health authorities, justice departments and healthcare businesses will be able to share information and implement a de-identification strategy. Tailored deployment cases will validate the customization and flexibility of the solution.

Most notably, MAPA will help meet GDPR requirements at scale. Although no software can guarantee 100% accuracy in anonymization, because flawless machine translation has not been developed yet, it will make document sharing much simpler.

Anonymization Technical Approach

Fundamentally, the MAPA anonymization toolkit will rely on Named-Entity Recognition and Classification (NERC) techniques. Both use neural networks and deep learning.

The multilingual NERC approach will be especially helpful for the very difficult challenge of working with under-resourced languages such as Estonian, Croatian, Slovenian, Latvian and Lithuanian. It will also help with the difficult task of working with Irish and Maltese, which are considered ultra-under-resourced languages.

Likewise, thanks to the transfer learning capabilities shown by modern deep learning models, advanced systems can be trained with fairly small datasets of manually labeled data. The knowledge acquired for a specific language or domain can be reused in others through cross-domain or cross-language techniques. MAPA will be trained to detect named entities that contain sensitive information.

MAPA will have many valuable features, and the NERC approach will be supplemented with other configurable procedures such as pattern detection using regular expressions (passport or ID numbers, blood groups, telephone numbers, street addresses, marital status, sex, bank accounts, age, email addresses, etc.).
User-definable dictionaries for specific needs will also allow exact matching of entity names known in advance.
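
As a rough illustration of how regular-expression patterns and a user-defined dictionary could complement a NERC model, here is a minimal Python sketch. It is not MAPA code: the pattern set, the placeholder labels and the `redact` function are assumptions made for illustration, and the patterns are deliberately simplified (real phone, ID and address formats vary by country).

```python
import re

# Hypothetical, simplified patterns; a production rule set would be far stricter.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+\d{1,3}(?:[ -]?\d{2,4}){2,4}"),
    "BLOOD_GROUP": re.compile(r"\b(?:AB|A|B|O)[+-](?!\w)"),
}


def build_dictionary_pattern(terms):
    """Compile user-supplied names into one case-insensitive pattern."""
    # Longest terms first, so a full name wins over a shorter name it contains.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def redact(text, dictionary_terms=()):
    """Replace dictionary names first, then every regex pattern, with labels."""
    if dictionary_terms:
        text = build_dictionary_pattern(dictionary_terms).sub("[NAME]", text)
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = (
        "Contact Maria Garcia at maria.garcia@example.org "
        "or +34 600 123 456. Blood group: AB+."
    )
    print(redact(sample, dictionary_terms=["Maria Garcia"]))
    # Contact [NAME] at [EMAIL] or [PHONE]. Blood group: [BLOOD_GROUP].
```

Rules like these are cheap and predictable for well-structured identifiers, while the dictionary covers names an institution already knows it must hide; free-text entities that fit neither would still depend on the NERC model.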
