Understanding customers is essential to a successful business. Many IT companies have launched location recommendation services that use a person's past behavioral patterns to predict where they might go next and what they might like.
Similarly, understanding travel patterns is key to designing personalized travel recommendations and to improving local tourism business.
In particular, we are interested in how travel patterns differ across subpopulations of tourists, and in using those differences to predict a tourist's next location.
Thanks to the availability of call detail records (CDR) data from the 2014 tourism seasons, we were able to analyze the location-time series of more than 20,000 anonymized cell phone users in Andorra, a country bordering Spain and France.
The topics we explore are:
How travel patterns differ across subpopulations of tourists and across times of the year
Whether it is possible to predict a tourist's future route choice based on their own or their group's travel history
How to develop a better-targeted tourist attraction recommendation system, based on potential places of interest for each traveler
We first carried out an exploratory analysis by mapping Twitter data related to travel activities. This allowed us to identify places of interest and the types of events associated with them.
What are the most important groups of tourists visiting Andorra, and when do they usually visit? The answers to these questions matter both for identifying the appropriate target group for our recommendation system and for launching the system at the appropriate time. We plotted the number of cell phone users in the region over the entire nine-month period, aggregated by nationality. The general trend follows a stable weekly cycle throughout, with some variation during holiday seasons (winter and summer) and festivals (e.g. Easter). Spanish and French visitors are the two main groups of travellers.
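The aggregation behind such a plot can be sketched as follows. This is a minimal illustration, not the pipeline we actually ran: the record layout (user id, nationality, date) and the sample values are assumptions, since the CDR schema is not described here.

```python
from collections import defaultdict

# Hypothetical, already-anonymized records: (user_id, nationality, ISO date).
# The real CDR schema is an assumption made for this sketch.
records = [
    ("u1", "Spain", "2014-07-01"),
    ("u1", "Spain", "2014-07-01"),  # same user, same day: counted once
    ("u2", "Spain", "2014-07-01"),
    ("u3", "France", "2014-07-01"),
    ("u3", "France", "2014-07-02"),
]


def daily_users_by_nationality(records):
    """Return {(date, nationality): number of distinct users seen that day}."""
    seen = defaultdict(set)
    for user_id, nationality, date in records:
        seen[(date, nationality)].add(user_id)
    return {key: len(users) for key, users in seen.items()}


counts = daily_users_by_nationality(records)
for (date, nationality), n_users in sorted(counts.items()):
    print(date, nationality, n_users)
```

Counting distinct users rather than raw records keeps heavy phone users from inflating the daily totals.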
How different are these two main groups of tourists from each other? Do they naturally form two distinct clusters in terms of route choice? We are interested in this question because, if nationality is closely related to route choice, knowing a person's nationality alone would substantially improve recommendation efficacy. Here we plotted the number of long-term versus short-term visitors from each of the two groups, as well as their choice of cities to visit. Spanish short-term visitors usually choose city 0 and city 1; only long-term visitors go to cities 4, 5, and 6. French visitors, on the other hand, are more likely to stay longer: one-day visitors almost always choose city 6, whereas long-term visitors also choose other cities.
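A comparison of this kind could be tabulated roughly as below. The visit tuples, the helper names, and the one-day threshold separating short-term from long-term stays are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical visits: (user_id, nationality, city_index, ISO date).
visits = [
    ("s1", "Spain", 0, "2014-03-01"),
    ("s2", "Spain", 0, "2014-03-01"),
    ("s2", "Spain", 4, "2014-03-02"),
    ("s2", "Spain", 5, "2014-03-03"),
    ("f1", "France", 6, "2014-03-01"),
    ("f2", "France", 6, "2014-03-01"),
    ("f2", "France", 3, "2014-03-02"),
]


def stay_type(n_days, short_max_days=1):
    """Label a stay as short or long; the one-day cutoff is an assumption."""
    return "short" if n_days <= short_max_days else "long"


def cities_by_group(visits):
    """Count, per (nationality, stay type), how many users visited each city."""
    days = defaultdict(set)    # user -> distinct dates on which they were seen
    cities = defaultdict(set)  # user -> distinct cities they visited
    nationality = {}
    for user, nat, city, date in visits:
        days[user].add(date)
        cities[user].add(city)
        nationality[user] = nat
    table = defaultdict(Counter)
    for user in days:
        group = (nationality[user], stay_type(len(days[user])))
        table[group].update(cities[user])  # each city counted once per user
    return table


table = cities_by_group(visits)
for group in sorted(table):
    print(group, dict(table[group]))
```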
Within each country, we looked further into internal variations in travel style. We applied K-Means clustering to the number of days each traveler spent in each city, and plotted the 8 resulting styles.
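For readers unfamiliar with K-Means, here is a minimal standard-library sketch of Lloyd's algorithm on days-per-city vectors. It is not the code we ran: the toy travelers, the seven-city vector layout, and the choice of two clusters (instead of the 8 styles above) are assumptions that keep the example small.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def k_means(points, k, iterations=50, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each traveler joins the nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical travelers: days spent in cities 0..6 (one row per traveler).
travelers = [
    [9, 1, 0, 0, 0, 0, 0],
    [10, 0, 0, 0, 0, 0, 0],
    [8, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 9],
    [0, 0, 0, 0, 0, 0, 10],
    [0, 0, 0, 0, 0, 2, 8],
]

labels, centroids = k_means(travelers, k=2)
print(labels)
```

On real data the number of clusters and the initialization matter much more than in this well-separated toy case, so results should be checked across several seeds.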
Taking into account the spatial layout of these cities, we found that Spanish visitors concentrate in the southwest, whereas French visitors concentrate in the city in the northeast. Overlaying a time-sequence analysis on top of that, we discovered that most Spanish tourists arrive at city 0 first and are very likely to return to city 0 and stay there until they leave. French visitors, by contrast, arrive at city 6 (in the northeast) but in most cases move on to the next city shortly afterwards.
Based on the exploratory analysis, we concluded that it may be possible to predict a person's next location from their previous travel history and their nationality.
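To make the idea concrete, the sketch below uses a simple transition-count baseline: for each nationality and current city, it predicts the next city seen most often in past trips. This is a deliberately simpler stand-in chosen for illustration, not the model used in this project, and the trips are made-up examples.

```python
from collections import Counter, defaultdict

# Hypothetical trips: (nationality, ordered list of city indices visited).
trips = [
    ("Spain", [0, 1, 0]),
    ("Spain", [0, 1, 0]),
    ("Spain", [0, 4, 0]),
    ("France", [6, 5]),
    ("France", [6, 5, 3]),
    ("France", [6, 3]),
]


def fit_transitions(trips):
    """Count how often each (nationality, current city) leads to each next city."""
    counts = defaultdict(Counter)
    for nationality, cities in trips:
        for current, following in zip(cities, cities[1:]):
            counts[(nationality, current)][following] += 1
    return counts


def predict_next(counts, nationality, current):
    """Return the most frequent next city, or None if the state was never seen."""
    options = counts.get((nationality, current))
    if not options:
        return None
    return options.most_common(1)[0][0]


counts = fit_transitions(trips)
print(predict_next(counts, "Spain", 0))
print(predict_next(counts, "France", 6))
```

A baseline like this is also useful as a yardstick: a more complex model is only worth its cost if it beats these frequency-based guesses.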
By combining these results with a TripAdvisor comments dataset, we were able to identify the most popular places for Spanish and French tourists. Not surprisingly, the locations of popular places correspond closely to the frequency and duration of visits by people from these two countries.
We implemented a random forest algorithm on this dataset to predict a person's next location from the location-time series of their previous travel records. In addition, we implemented an association rule algorithm to find out which other places people are likely to visit once they have already visited certain cities.
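The association rule idea can be illustrated with the two standard measures, support and confidence, computed over the set of cities each traveler visited. The sketch below scores only single-city rules of the form "visited A, so likely visits B"; it is not a full rule-mining algorithm, and the baskets and thresholds are assumptions.

```python
# Hypothetical baskets: the set of cities each traveler visited.
baskets = [
    {0, 1},
    {0, 1, 4},
    {0, 4},
    {6},
    {5, 6},
    {3, 5, 6},
]


def support(baskets, items):
    """Fraction of travelers whose visited cities include all of `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimate P(consequent visited | antecedent visited) from the baskets."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0
    return support(baskets, set(antecedent) | set(consequent)) / base


def pair_rules(baskets, min_support, min_confidence):
    """List rules a -> b between single cities that pass both thresholds."""
    cities = sorted(set().union(*baskets))
    rules = []
    for a in cities:
        for b in cities:
            if a == b:
                continue
            s = support(baskets, {a, b})
            c = confidence(baskets, {a}, {b})
            if s >= min_support and c >= min_confidence:
                rules.append((a, b, s, c))
    return rules


rules = pair_rules(baskets, min_support=0.3, min_confidence=0.9)
for a, b, s, c in rules:
    print(f"city {a} -> city {b}: support={s:.2f}, confidence={c:.2f}")
```

Note that rules are directional: in this toy data everyone who visited city 1 also visited city 0, but not the other way round.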
Click on one of the thumbnails to see interactive maps of predicted next locations