Michael P. Cutter

Graduate Student Researcher


The Decapod Project

The Decapod project's goal is to provide an end-to-end book capture system. Two consumer-grade digital cameras will be calibrated in order to capture a 3D representation of a book's surface. The surface will be dewarped and then output as a searchable PDF. The text of the digital document will be recognized with the open source optical character recognition engine OCRopus. The PDF will be compressed with token compression and a novel method of font reconstruction.

Font reconstruction is under development for the purpose of faithfully reproducing documents with vector Type 1 fonts, without any prior knowledge of the fonts. The digital representation of a Type 1 font is a set of mathematical instructions that define the shape of each glyph and how the glyphs are spaced when typeset on a page. Font reconstruction is the inference of this mathematical representation from the way the font appears typeset in a document image.
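
To make the idea of a font as "a set of mathematical instructions" concrete, here is a minimal sketch of a toy font model. It is an illustration only, not the representation used in Decapod: the `Glyph` class, the `layout` function, and all coordinates are hypothetical, and the operator names merely mirror the kind of path operators a Type 1 outline uses (straight lines and cubic Bézier curves).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# One outline instruction: an operator name plus its coordinates.
# Toy model; names and coordinates are assumptions for illustration.
PathOp = Tuple[str, Tuple[float, ...]]


@dataclass
class Glyph:
    outline: List[PathOp]  # shape of the glyph as path instructions
    advance: float         # horizontal distance to the next glyph origin


def layout(text: str, font: Dict[str, Glyph]) -> List[Tuple[str, float]]:
    """Return (character, x-origin) pairs for a single line of text."""
    x = 0.0
    placed = []
    for ch in text:
        placed.append((ch, x))
        x += font[ch].advance  # spacing comes from the font, not the image
    return placed


# Hypothetical two-glyph font.
toy_font = {
    "l": Glyph(
        outline=[
            ("moveto", (100, 0)),
            ("lineto", (100, 700)),
            ("lineto", (180, 700)),
            ("lineto", (180, 0)),
            ("closepath", ()),
        ],
        advance=280.0,
    ),
    "o": Glyph(
        outline=[
            ("moveto", (280, 0)),
            # curveto takes two control points and an end point (cubic Bezier)
            ("curveto", (120, 0, 40, 120, 40, 250)),
            ("curveto", (40, 380, 120, 500, 280, 500)),
            ("curveto", (440, 500, 520, 380, 520, 250)),
            ("curveto", (520, 120, 440, 0, 280, 0)),
            ("closepath", ()),
        ],
        advance=560.0,
    ),
}

if __name__ == "__main__":
    for ch, x in layout("lol", toy_font):
        print(ch, x)
```

In these terms, font reconstruction runs the other way: given only the rendered glyphs and their positions on a page image, it has to estimate the outlines and the advance widths.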

The goal of the font generation model is to completely reconstruct the font(s) found in a scanned document image. The reconstructed font is intended to have the same characteristics as the original font so that the original document can be reproduced in a visually faithful way.

Font reconstruction has been split into three tasks:

Once these steps are completed with a high degree of accuracy, it will be possible to combine a reconstructed font with page image segmentation to produce high-quality visual representations of scanned documents. The source code for this project is available in Decapod’s Google Code Mercurial repository. This specific work is available in the gen-pdf subrepository.

The Decapod project is supported by a grant from The Andrew W. Mellon Foundation.

Personalized Recommendation Engine


Internet users expect to find interesting articles online easily. They have the same expectation while driving; however, it is difficult to find relevant content on news radio. The solution is RECOO, an application that streams personalized content directly to the user's mobile phone. A driver must focus on the road and cannot complete complex search tasks, so RECOO, a personalized news recommendation system, learns user preferences while reading the news aloud to the user as he/she drives. Web and mobile interfaces automatically log implicit and explicit feedback, which is then used to update the user’s profile. A user can create a profile either on the web or on his/her cellphone, and the system uses it to recommend news articles based on a hybrid of collaborative and content-based filtering.

  • RSS parser (spider and story extractor) courtesy of FeedParser
  • Search Engine (index and news categorizer) functionality courtesy of Lemur
  • Mobile application written in J2ME [source code is available upon request]
  • Text to Speech is performed by The MARY Text-to-Speech System
  • Collaborative filtering (search and personalize) is performed with the Flexible Mixture Model [report and source code]
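
The feedback loop and the hybrid of collaborative and content filtering described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not RECOO's implementation: all function names are hypothetical, the collaborative part is a plain similarity-weighted average standing in for the Flexible Mixture Model, and the blend weight `alpha` and the signed feedback values are made up.

```python
import math
from typing import Dict

Vector = Dict[str, float]  # term -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def update_profile(profile: Vector, article: Vector,
                   feedback: float, rate: float = 0.5) -> Vector:
    """Move the profile toward (feedback > 0) or away from (feedback < 0)
    an article. Feedback is an assumed signed score, e.g. an explicit
    "like" as +1.0 or an implicit skip as -0.5."""
    updated = dict(profile)
    for term, w in article.items():
        updated[term] = updated.get(term, 0.0) + rate * feedback * w
    return updated


def collaborative_score(neighbor_ratings: Dict[str, float],
                        similarity: Dict[str, float]) -> float:
    """Similarity-weighted mean of other users' ratings for one article
    (a simple stand-in, not the Flexible Mixture Model)."""
    total = sum(abs(similarity.get(u, 0.0)) for u in neighbor_ratings)
    if total == 0:
        return 0.0
    weighted = sum(similarity.get(u, 0.0) * r
                   for u, r in neighbor_ratings.items())
    return weighted / total


def hybrid_score(content: float, collaborative: float,
                 alpha: float = 0.5) -> float:
    """Linear blend of the content-based and collaborative scores."""
    return alpha * content + (1.0 - alpha) * collaborative


if __name__ == "__main__":
    profile = update_profile({}, {"football": 1.0, "league": 0.5}, feedback=1.0)
    article = {"football": 1.0, "transfer": 1.0}
    content = cosine(profile, article)
    collab = collaborative_score({"u1": 1.0, "u2": 0.0},
                                 {"u1": 0.5, "u2": 0.5})
    print(hybrid_score(content, collab))
```

A linear blend is only one way to combine the two signals; it is used here because it keeps the contribution of each component easy to inspect.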

The RECOO system is part of the Proactive Personalized Information Integration and Retrieval project funded by the National Science Foundation.