Identifying key customers' problems based on Twitter messages


Sebastian Muszyński, Adrian Zdobylak, Paweł Mielniczuk {212727, 212824, 212714} @ student.pwr.edu.pl

Wrocław University of Science and Technology




Eurostar is a high-speed railway service that connects London with major European cities, e.g. Amsterdam, Brussels, Lyon, Paris and Rotterdam. All the trains traverse the Channel Tunnel between the United Kingdom and France.

The company collects feedback from its passengers through multiple channels, but one of the most popular is Twitter. Although it is easy for travelers to express their dissatisfaction on social media, it is hard to address their issues and infer meaningful insights. The current process of addressing the passengers' feedback consumes a lot of time since it is done manually by a Eurostar employee.


A range of major passenger problems has been successfully identified. In the automatically created groups of tweets, the following issues could be easily recognised:

•    Internet connection/WiFi-related complaints,

•    compensation and refund queries,

•    online booking error reports,

•    immediate help requests,

•    and direct message (DM) inquiries.


[Figure: word clouds of the automatically created tweet groups]

Example tweet:

Paweł Mielniczuk
@TheRealMielniczukPawel

@Eurostar Third time this month riding your train from Paris to London and again no WiFi!

#Eurostar #angry #dissapointed #noWifi

5:11 PM 04 Dec 18

13 RETWEETS 15 LIKES


The main goal of this project was to automatically identify the most common types of passenger problems reported through Twitter messages and to filter out spam and unimportant issues.



The data used in the project was collected directly from Twitter using its public API in November 2018. The fetched tweets include not only messages sent by Eurostar passengers but also responses from Eurostar employees. Even though many of them express gratitude and happiness about their travels, the vast majority of messages are either complaints or direct questions regarding Eurostar train services. Over 290 thousand tweets were collected in total.


[Figure: chart with an hourly x-axis (hours 0 to 23)]




Dozens of spam messages were filtered out, as well as a considerable number of tweets containing only words of thanks. There were plenty of tweets with an @Eurostar mention that did not include any relevant information or requests. Photos uploaded by happy passengers during their travels were not taken into account either; such messages were simply ignored.
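The poster does not say how uninformative tweets were detected. The sketch below is one simple, assumed heuristic: strip mentions, hashtags and links, then drop a tweet if nothing but filler words remains. The `FILLER_WORDS` set and the `is_informative` helper are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical list of words that carry no actionable request on their own.
FILLER_WORDS = {
    "thanks", "thank", "you", "so", "much", "very",
    "cheers", "great", "amazing", "lovely", "hello", "hi",
}

# Mentions, hashtags and links are ignored when judging the content.
NOISE_RE = re.compile(r"https?://\S+|[@#]\w+")


def is_informative(tweet: str) -> bool:
    """Return True if the tweet contains at least one non-filler word."""
    text = NOISE_RE.sub(" ", tweet.lower())
    words = re.findall(r"[a-z']+", text)
    return any(word not in FILLER_WORDS for word in words)


tweets = [
    "@Eurostar thanks so much!",
    "@Eurostar #travelpics https://t.co/x",
    "@Eurostar my booking failed, please help",
]
kept = [t for t in tweets if is_informative(t)]
```

With these toy inputs only the booking complaint is kept. A real filter would probably need a larger vocabulary and manual review of borderline cases.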


Methods & Tools


The final solution was implemented in Python 3 using, among others, scikit-learn, pandas, spaCy, textacy, NumPy and NLTK.




English tweets

Since Eurostar supports multiple languages, English tweets had to be extracted.
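The poster does not name the language-detection method. As a rough illustration only, the sketch below guesses the language by counting common function words; the word lists and the `guess_language` and `keep_english` helpers are assumptions made for this example, and a dedicated language-identification library would likely be more robust on short tweets.

```python
import re

# Small, hand-picked stopword lists (illustrative, far from complete).
STOPWORDS = {
    "en": {"the", "and", "is", "to", "of", "in", "on", "my", "your", "for",
           "not", "no", "this", "from", "with", "was", "are", "we", "you",
           "it", "have", "again"},
    "fr": {"le", "la", "les", "et", "est", "de", "des", "du", "un", "une",
           "pour", "pas", "je", "nous", "vous", "ce", "avec", "sur", "mon",
           "votre"},
    "nl": {"de", "het", "een", "en", "is", "van", "niet", "ik", "je", "we",
           "met", "voor", "op", "mijn", "uw", "dat"},
}


def guess_language(text: str) -> str:
    """Return the language whose stopwords occur most often, or 'unknown'."""
    tokens = re.findall(r"[a-zà-ÿ]+", text.lower())
    counts = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"


def keep_english(tweets):
    """Keep only the tweets guessed to be English."""
    return [t for t in tweets if guess_language(t) == "en"]
```

For example, `keep_english` applied to an English complaint and a French one would return only the English tweet.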


Twitter

Data collected from the @Eurostar Twitter account over a period of 6 years.




Takeaways



Clustering

GloVe trained on Twitter data was used to convert text to vectors. The following clustering methods were used: KMeans, DBSCAN and Agglomerative Clustering.
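A minimal sketch of this step, under stated assumptions: the poster does not describe how word vectors were combined into tweet vectors, so averaging is assumed here, and a tiny made-up 3-dimensional lookup table (`TOY_VECTORS`) stands in for the real pre-trained GloVe vectors. All parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering

# Made-up 3-d vectors standing in for pre-trained GloVe Twitter embeddings.
TOY_VECTORS = {
    "wifi": np.array([1.0, 0.1, 0.0]),
    "internet": np.array([0.9, 0.2, 0.0]),
    "connection": np.array([0.8, 0.1, 0.1]),
    "refund": np.array([0.0, 1.0, 0.1]),
    "compensation": np.array([0.1, 0.9, 0.0]),
    "delay": np.array([0.1, 0.8, 0.2]),
}


def embed(tokens, vectors=TOY_VECTORS, dim=3):
    """Average the vectors of known tokens; zero vector if none is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    return np.mean(known, axis=0) if known else np.zeros(dim)


tweets = [
    ["no", "wifi", "again"],
    ["internet", "connection", "down"],
    ["wifi", "connection", "terrible"],
    ["refund", "for", "delay"],
    ["compensation", "delay"],
    ["refund", "compensation", "please"],
]
X = np.vstack([embed(t) for t in tweets])

# The three clustering methods named on the poster, with illustrative settings.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.3, min_samples=2).fit_predict(X)
agglo_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
```

On this toy data all three methods separate the WiFi tweets from the refund tweets; the cluster ids themselves are arbitrary.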




Data cleaning

Replacements of emojis, numbers, timestamps, train numbers, URLs and mentions are only a few of the preprocessing methods that were used to clean the raw data. Names of cities and train stations served by Eurostar have also been handled.
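The replacements listed above could be sketched with regular expressions as follows. The patterns, the placeholder tokens (`<url>`, `<user>`, and so on) and the short `PLACES` list are assumptions made for illustration; the poster does not give the authors' exact rules.

```python
import re

# Order matters: specific patterns run before the generic number pattern.
URL_RE = re.compile(r"https?://\S+|www\.\S+|pic\.twitter\.com/\S+")
MENTION_RE = re.compile(r"@\w+")
TRAIN_RE = re.compile(r"\bES\s?\d{4}\b", re.IGNORECASE)  # e.g. ES9114
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}(?:\s?(?:am|pm))?\b", re.IGNORECASE)
NUMBER_RE = re.compile(r"\b\d+(?:[.,]\d+)?\b")
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# Hypothetical, incomplete list of cities and stations.
PLACES = ["st pancras", "gare du nord", "london", "paris", "brussels",
          "amsterdam", "rotterdam", "lyon"]
PLACE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in PLACES) + r")\b",
    re.IGNORECASE,
)


def clean_tweet(text: str) -> str:
    """Replace noisy spans with placeholder tokens and normalise whitespace."""
    text = URL_RE.sub(" <url> ", text)
    text = MENTION_RE.sub(" <user> ", text)
    text = TRAIN_RE.sub(" <train> ", text)
    text = TIME_RE.sub(" <time> ", text)
    text = NUMBER_RE.sub(" <number> ", text)
    text = EMOJI_RE.sub(" <emoji> ", text)
    text = PLACE_RE.sub(" <place> ", text)
    return " ".join(text.lower().split())


raw = "@Eurostar train ES9114 from Paris delayed 45 min, see https://t.co/abc at 5:11 PM"
cleaned = clean_tweet(raw)
```

Mapping every train number or place to a single token keeps tweets about the same kind of problem close together regardless of which train or station is mentioned.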


•    Planning the data cleaning process ahead of time is difficult because it is highly contextual and may require more iterations than assumed.

•    Twitter data contains a lot of clutter. Some tweets do not include any relevant information, hence many of them cannot be addressed automatically.

•    There is no straightforward way to evaluate clusters automatically.

•    The DBSCAN algorithm is not well suited to clustering data with varying density, such as Twitter messages, but may perform really well at filtering out noise.

•    Agglomerative clustering gives much better results than K-Means or DBSCAN since it is able to analyse every cluster more deeply, yielding more detailed categories.
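The last two takeaways suggest one possible combination, sketched below on synthetic 2-d points: DBSCAN is used only to flag noise (label `-1`), and agglomerative clustering is then run on the remaining points. This pairing and all parameter values are assumptions for illustration, not necessarily the authors' pipeline.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering

# Synthetic stand-in for tweet vectors: two dense groups plus two outliers.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],   # dense group A
    [2.0, 2.0], [2.1, 2.0], [2.0, 2.1], [2.1, 2.1],   # dense group B
    [5.0, -3.0], [-4.0, 6.0],                         # isolated points
])

# Step 1: DBSCAN marks points without enough close neighbours as noise (-1).
noise_mask = DBSCAN(eps=0.5, min_samples=3).fit_predict(X) == -1
kept = X[~noise_mask]

# Step 2: hierarchical (agglomerative) clustering on the denoised points.
labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(kept)
```

Because agglomerative clustering builds a hierarchy, rerunning step 2 with a larger `n_clusters` splits existing groups into finer categories instead of starting from scratch.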


The clustering algorithms were run both on a set of tweets containing passengers' questions and on direct Eurostar responses. The clusters obtained with the two approaches were then compared; unfortunately, clustering by answers showed no significant improvement over clustering by questions.
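The poster does not state how the two clusterings were compared. If each question is paired with its response, one assumed option is the adjusted Rand index, which measures agreement between two labelings regardless of how cluster ids are named. The label lists below are made up for illustration.

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical cluster ids for the same six conversations.
labels_by_question = [0, 0, 1, 1, 2, 2]
labels_by_answer = [1, 1, 0, 0, 2, 2]      # same grouping, different ids
labels_unrelated = [0, 1, 0, 1, 0, 1]      # a very different grouping

# Close to 1.0 when the groupings agree, around 0 or below when they do not.
agreement = adjusted_rand_score(labels_by_question, labels_by_answer)
disagreement = adjusted_rand_score(labels_by_question, labels_unrelated)
```

A high score would only show that both approaches group conversations the same way; judging which grouping is more useful still requires manual inspection.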


Scan this QR code for an online version of this poster:



