Cheng, Zhiyuan (2014-05). Toward Geo-social Information Systems: Methods and Algorithms. Doctoral Dissertation. Thesis

abstract

  • The widespread adoption of GPS-enabled tagging of social media content via
    smartphones and social media services (e.g., Facebook, Twitter, Foursquare) uncovers
    a new window into the spatio-temporal activities of hundreds of millions of people.
    These "footprints" open new possibilities for understanding how people can organize
    for societal impact and lay the foundation for new crowd-powered geo-social systems.
    However, several key challenges stand in the way of delivering on this promise: the slow
    adoption of location sharing, the inherent bias among the users who do share location,
    imbalanced location granularity, and the need to respect location privacy, among many
    others. With these challenges in mind, this dissertation aims to develop the framework,
    algorithms, and methods for a new class of geo-social information systems. The
    dissertation is structured in two main parts: the first focuses on understanding the
    capacity of existing footprints; the second demonstrates the potential of new geo-social
    information systems through two concrete prototypes.

    First, we investigate the capacity of using these geo-social footprints to build new
    geo-social information systems. (i): we propose and evaluate a probabilistic framework
    for estimating a microblog user's location based purely on the content of the
    user's posts. With the help of a classification component for automatically identifying
    words in tweets with a strong local geo-scope, the location estimator places 51%
    of Twitter users within 100 miles of their actual location. (ii): we investigate a set of
    22 million check-ins across 220,000 users and report a quantitative assessment of human
    mobility patterns by analyzing the spatial, temporal, social, and textual aspects
    associated with these footprints. Concretely, we observe that users follow simple,
    reproducible mobility patterns. (iii): we compare a set of 35 million publicly shared
    check-ins with a set of over 400 million private query logs recorded by a commercial
    hotel search engine. Although the two data sources are generated by users with
    fundamentally different intentions, we find that common conclusions may be drawn from
    both, indicating the viability of publicly shared location information to complement
    (and, in some cases, replace) privately held location information.

    Second, we introduce two prototypes of new geo-social information systems that
    utilize the collective intelligence drawn from the emerging geo-social footprints.
    Concretely, we propose an activity-driven search system and a local expert finding
    system, both of which take advantage of this collective intelligence. Specifically, we
    study location-based activity patterns revealed through location sharing services and
    find that these activity patterns can identify semantically related locations and
    support both unsupervised location clustering and supervised location categorization
    with high confidence. Based on these results, we show how activity-driven semantic
    organization of locations may be naturally incorporated into location-based web search.
    In addition, we propose a local expert finding system that identifies the top local
    experts for a topic in a location. Concretely, the system uses the semantic labels that
    people assign to one another and people's locations in current location-based social
    networks, and it identifies top local experts with high precision. We also observe that
    the proposed local authority metrics, which draw on the collective intelligence of
    expert candidates' core audience (list labelers), significantly improve local expert
    finding compared with the more intuitive approach that considers only candidates'
    locations.
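
The content-based location estimator summarized in the abstract can be illustrated with a minimal sketch. It assumes a simplified model in which each city has word counts from training tweets, p(city | word) is estimated from those counts, and a user's score for a city is the frequency-weighted sum of p(city | word) over the user's local words. The `LOCAL_WORDS` set is a hypothetical stand-in for the classification component that identifies words with a strong local geo-scope; all cities, counts, and function names are hypothetical, and this is not presented as the dissertation's exact model. The haversine helper shows one way a "within 100 miles" accuracy figure could be computed.

```python
import math
from collections import Counter

# Hypothetical training data: word counts from geotagged tweets in each city.
CITY_WORD_COUNTS = {
    "Houston": Counter({"rockets": 40, "texans": 30, "coffee": 50}),
    "Boston": Counter({"celtics": 45, "redsox": 35, "coffee": 50}),
}

# Hypothetical city centers (lat, lon), used only for the distance check.
CITY_COORDS = {"Houston": (29.76, -95.37), "Boston": (42.36, -71.06)}

# Hypothetical stand-in for the classifier that flags words with a strong
# local geo-scope; "coffee" is left out because it is used everywhere.
LOCAL_WORDS = {"rockets", "texans", "celtics", "redsox"}

EARTH_RADIUS_MILES = 3958.8


def p_city_given_word(word, city_word_counts):
    """Estimate p(city | word) as each city's share of the word's occurrences."""
    total = sum(counts[word] for counts in city_word_counts.values())
    if total == 0:
        return {}
    return {city: counts[word] / total for city, counts in city_word_counts.items()}


def estimate_location(posts, local_words, city_word_counts):
    """Return the city maximizing sum_w p(w | user) * p(city | w), or None."""
    words = [w for post in posts for w in post.lower().split() if w in local_words]
    if not words:
        return None
    freq = Counter(words)
    scores = Counter()
    for word, count in freq.items():
        for city, p in p_city_given_word(word, city_word_counts).items():
            scores[city] += (count / len(words)) * p
    if not scores:
        return None
    return scores.most_common(1)[0][0]


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def fraction_within(pairs, miles=100.0):
    """Share of (estimated, actual) coordinate pairs at most `miles` apart."""
    if not pairs:
        return 0.0
    hits = sum(1 for est, actual in pairs if haversine_miles(est, actual) <= miles)
    return hits / len(pairs)


posts = ["Go rockets tonight", "texans game and coffee"]
print(estimate_location(posts, LOCAL_WORDS, CITY_WORD_COUNTS))  # Houston
```

In this toy run the non-local word "coffee" contributes nothing, so the estimate rests entirely on the words that the assumed local-word filter keeps.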

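The contrast in the abstract between audience-based local authority and a location-only baseline can be sketched in the same hedged spirit. The toy scoring below assumes that a candidate's topical score is the number of list labels for the topic, and contrasts two hypothetical locality signals: a baseline that checks only whether the candidate is near the query location, and an audience-based signal that measures what share of the candidate's topic labelers are near it. The 100-mile radius, the multiplicative combination, and all names and coordinates are illustrative assumptions, not the dissertation's actual metrics.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_MILES = 3958.8


@dataclass
class Label:
    topic: str          # topic the labeler assigned (e.g. via a list name)
    labeler_loc: tuple  # (lat, lon) of the user who applied the label


@dataclass
class Candidate:
    name: str
    loc: tuple                                   # (lat, lon) of the candidate
    labels: list = field(default_factory=list)   # labels others gave this candidate


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def topical_score(c, topic):
    """Number of labels tying the candidate to the topic."""
    return sum(1 for lab in c.labels if lab.topic == topic)


def candidate_proximity(c, query_loc, radius):
    """Baseline locality: 1.0 if the candidate is within `radius` miles, else 0.0."""
    return 1.0 if haversine_miles(c.loc, query_loc) <= radius else 0.0


def audience_locality(c, topic, query_loc, radius):
    """Audience-based locality: share of the topic's labelers within `radius` miles."""
    labelers = [lab for lab in c.labels if lab.topic == topic]
    if not labelers:
        return 0.0
    near = sum(1 for lab in labelers if haversine_miles(lab.labeler_loc, query_loc) <= radius)
    return near / len(labelers)


def local_expert_score(c, topic, query_loc, use_audience=True, radius=100.0):
    """Topical score multiplied by one of the two locality signals."""
    if use_audience:
        local = audience_locality(c, topic, query_loc, radius)
    else:
        local = candidate_proximity(c, query_loc, radius)
    return topical_score(c, topic) * local


def rank_experts(candidates, topic, query_loc, use_audience=True):
    """Candidates sorted from highest to lowest local expert score."""
    return sorted(candidates,
                  key=lambda c: local_expert_score(c, topic, query_loc, use_audience),
                  reverse=True)


HOU, BOS = (29.76, -95.37), (42.36, -71.06)
chef = Candidate("local_chef", HOU, [Label("food", HOU)] * 3)
blogger = Candidate("travel_blogger", HOU, [Label("food", BOS)] * 5)

print([c.name for c in rank_experts([chef, blogger], "food", HOU, use_audience=False)])
# ['travel_blogger', 'local_chef']
print([c.name for c in rank_experts([chef, blogger], "food", HOU, use_audience=True)])
# ['local_chef', 'travel_blogger']
```

Under these toy assumptions, the baseline ranks the blogger first only because the candidate lives near the query location, while the audience-based signal favors the candidate whose topic labelers are themselves local.
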
publication date

  • May 2014