Issues in the Management of Spatio-Textual Data

ABSTRACT

The explicit specification of location has traditionally been geometric (e.g., as latitude-longitude pairs of numbers). However, few people know these values, nor do they use them to communicate locations to others or to receive locations from them. Instead, people are used to specifying a location textually, which includes verbally, often on a mobile device where a virtual keyboard is always present. Textual specifications have numerous benefits. First, they are very useful for searching text documents for relevance to a particular location or set of locations. Second, textual location specifications are general in the sense that when they are used in a query, there is no need to be concerned about their internal representation (e.g., a city like Los Angeles can be a point, an area, or even a boundary). In this project, the underlying textual data is accessed via a map query interface using direct manipulation actions such as pan and zoom to navigate the data. The advantage of these actions is that pointing at a location (e.g., by appropriately positioning a pointing device or gesturing) and making the interpretation of the precision of this positioning dependent on the zoom level is equivalent to permitting spatial synonyms.
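The idea that the precision of a pointing action depends on the zoom level can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-entry gazetteer, the function names, and the rule that the tolerance radius halves with each zoom level are hypothetical and are not taken from the project's systems.

```python
import math

# Hypothetical mini-gazetteer: name -> approximate (latitude, longitude).
GAZETTEER = {
    "Los Angeles": (34.05, -118.24),
    "Santa Monica": (34.02, -118.49),
    "San Diego": (32.72, -117.16),
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude-longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def tolerance_km(zoom, base_km=2000.0):
    """Assumed rule: the pointing tolerance halves with every zoom level."""
    return base_km / (2 ** zoom)

def locations_at(lat, lon, zoom):
    """Names of gazetteer entries within the zoom-dependent tolerance, nearest first."""
    radius = tolerance_km(zoom)
    hits = [
        (haversine_km(lat, lon, glat, glon), name)
        for name, (glat, glon) in GAZETTEER.items()
    ]
    return [name for distance, name in sorted(hits) if distance <= radius]

# The same pointing position is interpreted more coarsely when zoomed out.
for zoom in (3, 6, 10):
    print(zoom, locations_at(34.05, -118.24, zoom))
```

Under these assumptions, pointing at the same position while zoomed out matches a whole region's worth of place names, while zooming in narrows the match to a single place, which is one way to read the "spatial synonyms" remark above.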

This project is a response to shortcomings observed in systems and applications such as NewsStand and TwitterStand, which make use of a map interface to access documents such as news and tweets, respectively. It addresses the following issues. (1) Detecting tweets about local events. This is difficult because only a few people may be posting related tweets, in contrast to global events, where many people post tweets, thereby making them easier to detect. (2) Improving the resolution of ambiguous location names when retrieving documents using textually-specified locations by developing more appropriate precision and recall evaluation metrics. (3) Enabling domain-specific tracking over time of mentions of events such as crimes and diseases in news and social media such as Twitter, with the aid of heat maps, which may have an impact on public safety and health. (4) Allowing users to specify the desired domain in (3) as well as to infer it by use of exemplars. (5) Improving NewsStand's clustering by using word2vec, which makes better use of semantics than the currently used TF-IDF. The clustering is applied to the actual documents and to their associated images and videos. This has the advantage of enabling the detection of similar images on the basis of semantics, namely the contents of the related news articles and tweets, rather than local features such as color and texture. This has the potential for much more sophisticated retrieval than just using image captions or tagging the images with human-generated keywords. Note that no humans are involved in the image similarity process.
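To make the contrast in item (5) concrete, the following is a minimal sketch of a generic TF-IDF document similarity of the kind a clustering step could build on. It is a textbook formulation written for illustration, not NewsStand's actual implementation; the function names and the three toy headlines are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {term: (count / len(doc)) * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy headlines (hypothetical data).
docs = [
    "earthquake shakes los angeles".split(),
    "tremor rattles los angeles".split(),
    "election results announced in paris".split(),
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # small but positive: only "los" and "angeles" overlap
print(cosine(vecs[0], vecs[2]))  # no shared terms
```

In this toy example the first two headlines are judged similar only because they share the words "los" and "angeles"; "earthquake" and "tremor" contribute nothing because TF-IDF treats them as unrelated terms. An embedding-based representation such as word2vec is intended to capture that kind of semantic closeness.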

NSF Grant IIS-18-1816889

PI: Hanan Samet

Relevant Publications:

  1. Ayhan, Samet and Costas, Pablo and Samet, Hanan. (2018). Prescriptive analytics system for long-range aircraft conflict detection and resolution. 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. 239 to 248. doi:10.1145/3274895.3274947.

  2. Cao, Hancheng and Sankaranarayanan, Jagan and Feng, Jie and Li, Yong and Samet, Hanan. (2019). Understanding metropolitan crowd mobility via mobile cellular accessing data. ACM Transactions on Spatial Algorithms and Systems. 5 (2) 1 to 18. doi:10.1145/3323345.

  3. Peng, Shangfu and Sankaranarayanan, Jagan and Samet, Hanan. (2018). DOS: A spatial system offering extremely high-throughput road distance computations. 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. 199 to 208. doi:10.1145/3274895.3274898.

  4. Wei, Hong and Anjaria, Janit and Samet, Hanan. Learning embeddings of spatial, textual and temporal entities in geotagged tweets. Submitted for publication.

  5. Wei, Hong and Fellegara, Riccardo and Wang, Yin and De Floriani, Leila and Samet, Hanan. (2018). Multi-level filtering to retrieve similar trajectories under the Frechet distance. 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. 600 to 603. doi:10.1145/3274895.3274978.

  6. Wei, Hong and Sankaranarayanan, Jagan and Samet, Hanan. (2018). Enhancing Local Live Tweet Stream to Detect News. Second ACM SIGSPATIAL Workshop on Analytics for Local Events and News (LENS 2018). doi:10.1145/3282866.3282868.

  7. Wei, Hong and Zhou, Hao and Sankaranarayanan, Jagan and Sengupta, Sudipta and Samet, Hanan. (2018). Detecting latest local events from geotagged tweet streams. 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. 520 to 523. doi:10.1145/3274895.3274977.

Theses

  1. Janit Anjaria: DistLearn: Learning to Compute Distance between Trajectories. M.S. Scholarly Paper.

  2. Samet Ayhan: Airspace Planning for Optimal Capacity, Efficiency, and Safety Using Analytics. Ph.D. Thesis.

  3. Hao Li (with Tom Goldstein): Toward Fast and Efficient Representation Learning. Ph.D. Thesis.

  4. Shangfu Peng: High-Throughput Network Distance Computations for Spatial Analytics Inside Any Store. Ph.D. Thesis.

Programs/Software

  1. NewsStand. An example application of a general framework that enables people to search for information with a map-query interface. The NewsStand system monitors the output of more than 10,000 RSS news feeds and incorporates new articles within minutes of publication. Each article undergoes a geotagging procedure, where location references are identified and interpreted, allowing us to associate each article with the geographic locations that it mentions.
  2. An article describing the NewsStand system appears as the cover article of the October 2014 issue of the Communications of the ACM. It can be found at http://tinyurl.com/newsstand-cacm. A cached version can be found at http://www.cs.umd.edu/~hjs/pubs/cacm-newsstand.pdf. The original NewsStand article in the 2008 SIGSPATIAL conference received the 2018 SIGSPATIAL 10-Year Impact Award.

    ACM also made a video about NewsStand to accompany the above article which can be viewed at https://vimeo.com/106352925

    For a tour of the various functions of NewsStand, click on the "Tour" button in the upper right corner of the NewsStand system at http://newsstand.umiacs.umd.edu, or alternatively, go to http://newsstand.umiacs.umd.edu/web/#tour.

  3. TwitterStand. The TwitterStand system monitors a tweet stream in which each tweet undergoes a geotagging procedure that identifies and interprets location references. This allows us to associate each tweet with the geographic locations that it mentions explicitly, or with which it is associated through the news or news tweet cluster to which it belongs, based on data from the NewsStand system.

  4. PhotoStand. An image-based browser that enables the use of a map query interface to retrieve news photos associated with news articles that are in turn associated with the principal locations that they mention, based on the data from the NewsStand system.
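The geotagging procedure mentioned for NewsStand and TwitterStand has two steps: location references are identified, and then they are interpreted. A minimal sketch of those two steps follows. It is not the systems' actual method: the tiny gazetteer, its approximate coordinates and populations, the `geotag` function, and the disambiguation heuristic (prefer a candidate whose containing region is also mentioned, otherwise the most populous one) are all assumptions made for illustration.

```python
import re

# Hypothetical gazetteer: lowercase toponym -> candidate interpretations.
# Coordinates and populations are rough illustrative values.
GAZETTEER = {
    "paris": [
        {"name": "Paris, France", "container": "france",
         "lat": 48.86, "lon": 2.35, "population": 2_100_000},
        {"name": "Paris, Texas", "container": "texas",
         "lat": 33.66, "lon": -95.56, "population": 25_000},
    ],
    "london": [
        {"name": "London, England", "container": "england",
         "lat": 51.51, "lon": -0.13, "population": 8_900_000},
        {"name": "London, Ontario", "container": "ontario",
         "lat": 42.98, "lon": -81.25, "population": 400_000},
    ],
}

def geotag(text):
    """Return (name, lat, lon) for each single-word toponym found in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    token_set = set(tokens)
    results = []
    for token in tokens:
        # Step 1: identify a location reference by gazetteer lookup.
        candidates = GAZETTEER.get(token)
        if not candidates:
            continue
        # Step 2: interpret it. Prefer candidates whose containing region is
        # also mentioned; otherwise fall back to the most populous candidate.
        supported = [c for c in candidates if c["container"] in token_set]
        best = max(supported or candidates, key=lambda c: c["population"])
        results.append((best["name"], best["lat"], best["lon"]))
    return results

print(geotag("Storm damages homes in Paris, Texas"))
print(geotag("Protests continue in Paris"))
```

The sketch handles only single-word place names and a single context cue. It is meant to show why interpretation is the hard part: the same string "Paris" maps to different coordinates depending on what else the document says.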