Tan, Zhiyin

NLP & Knowledge Graph Researcher

Hannover, Lower Saxony, Germany
zhiyin121@gmail.com

Work Experience

Research Assistant

02.2024 - current

L3S Research Center, Leibniz University Hannover

- Extracting structured knowledge from scientific articles (topic modelling, ontology supplementation, etc.) under the Hybrint project.

R&D NLU Language Specialist

10.2020 - 09.2022

Cerence, Remote

- Processed unstructured data (intent and slot annotation, syntax and discourse correction and diversification, ASR annotation) used for dialogue system training and testing, and analysed errors to improve intent and slot recognition accuracy by 10%.

- Significantly increased the speed of annotating new data by developing an annotation tool with Windows Presentation Foundation (WPF). The tool gathers multiple reference files on a single page with click-to-link navigation and lets annotators pick tags from clickable use cases instead of typing them manually.

Marketing Data Analyst - Intern

07.2019 - 08.2019

TINNO Mobile/Wiko Mobile, China

- Enabled the marketing department to move from purely manual collection of competitors' product and sales information to real-time automated scraping by delivering a set of Python scraping scripts and workflows.

Teaching Assistant

01.2017 - 05.2017

University of Macau, Macao

- Prepared teaching materials, organized classroom group discussions, and provided feedback.

- Assisted in organizing several large-scale academic conferences with more than 100 participants, with full responsibility for publicity planning (The 10th Cross-Strait Symposium on Modern Chinese Language, ULS15&SDP2, etc.).

Research Analyst - Intern

12.2016 - 06.2017

E-Research & Solutions, Macao

- Analyzed social media text data to gauge public opinion, focusing on finer-grained classification in the sentiment analysis model.

- Wrote opinion analysis reports for delivery to clients, participated in competitions, and wrote "Data Mining" brochures.

Languages

- Mandarin (Native), Cantonese (Native)

- English (C1), German (A2)

Education

M.Sc. Computational Linguistics

10.2018 - 03.2023

University of Stuttgart

Natural Language Processing, Machine Learning, Knowledge Graph

M.A. Chinese Linguistics

09.2016 - 10.2018

University of Macau

"Excellent Master's Thesis", Teaching Assistant (Scholarship)

B.A. Chinese Language and Literature (Education)

09.2012 - 06.2016

Wuyi University

Outstanding Graduate (Top 10%), Outstanding Student Leader (Top 2%), Second-class Scholarship (Top 6%)

Publications

LaTeX Rainbow: Universal LaTeX to PDF Document Semantic & Layout Annotation Framework, 11.2023, WIESP, IJCNLP-AACL 2023

Changxu Duan, Zhiyin Tan, Sabine Bartsch - A framework for annotating PDF files produced from LaTeX.

Text-based Personality Prediction, 08.2021, MLSS 2021

Individual Work - Developed a dataset of 22,647 anonymous posts tagged with the author's MBTI type and used it to train a DistilBERT model, yielding an average accuracy of 73.25% for binary classification on each of the four MBTI dimensions.

The use of Cantonese Discourse Markers by Legislative Council Members in Hong Kong and Macau, 06.2018, WICL-4

Individual Work - Developed a 360,000-word Cantonese corpus.

Used a sociolinguistic approach (SPSS) to explore the relationship between the use of Cantonese discourse markers and the social background (gender, age, education, etc.) of the speakers (link to abstract).

A Comparative Study of Mandarin Pitch Range in Northern and Central Taiwan, 08.2017, NTU Summer Research Program

Individual Work - Analyzed single-word phonetic data from 3 men and 3 women with Praat, showing that Taiwanese Mandarin in Taipei and Taichung can be distinguished by pitch range.

Project Experience

Applied Contrastive Learning to Fine-grained Entity Type Classification

07.2021 - 03.2023

- Master's thesis

- Used SPARQL to extract artificial product entity data from DBpedia, along with the associated types and properties from Wikidata.

- Evaluated CNN, LSTM, BERT, ALBERT, and RoBERTa models and improved their performance through contrastive learning.

Deep Learning Graph-based Dependency Parser

12.2020 - 03.2021

- Used the Chu-Liu-Edmonds algorithm as the baseline and a Bi-LSTM model with Deep Biaffine Attention as the improved model. Achieved UAS 87.1% and LAS 85% on the English dataset, and UAS 85.9% and LAS 81.9% on the German dataset.

Predicting Author's Personality by Text

07.2020 - 09.2020

- Crawled user comments and users' self-tagged MBTI personality labels from personality type forums, collecting a total of 22,422 texts.

- Replaced a single 16-class model with four binary Bi-LSTM models, achieving an average accuracy of 20%.

- Using BERT as the pre-trained model, obtained an F1 of 31% for the 16-class model and an F1 of 70% for the four binary classification models.

Optimizing Reinforcement Learning Policies with Emotion Signals

04.2020 - 07.2020

- Optimized RL policies by capturing emotion signals from the log files of rule-based simulator-system conversations in a dialogue system.

Twitter Sentiment Analysis with Ordered Neuron LSTM

04.2019 - 07.2019

- Built eight classification models using Naïve Bayes and Ordered Neuron LSTM, with the Stance Sentiment Emotion Corpus (SSEC) as training and test data, obtaining an average accuracy of 60%.

Language Learning Chatbot

12.2018 - 01.2019

- Used Django (HTML, CSS, JS, Python) to implement a web application that lets users learn basic vocabulary in Arabic, Chinese and German.

Programs

AITalents Competition - Empowering Business with AI Technology

11.2020 - 01.2021

TechQuartier

- Collaborated with the AI company SONEAN on a real-time platform that uses AI technology to improve the efficiency of machine supply chain operations.

- Proposed a global intelligent business system solution for building partnership networks and ESG indices for each entity:

  - Set up real-time crawling of business news to tag environmental, social, and governance (ESG) signal indices and sentiment indices for each company entity, helping buyers select suppliers.

  - Used Docker and Neo4j to display company network graphs on the web, showing competitors' partnership networks and suppliers' various indices in real time.

- Personal responsibilities:

  - Wrote Python code for crawling news and business reports and for building a sentiment analysis system.

  - Built graphs and wrote Cypher queries for Neo4j.

  - Created an animated pitch video.

IBM Female Mentoring Program

04.2020 - 12.2020

IBM & University of Stuttgart

- A nine-month training program for twelve selected female students in computer science-related disciplines on topics including AI, data science and technology consulting.

Other Skills

- Graphic Design, Painting (>20 years), Presentation (Demo video).

- News Interview (5 years).