Yitzhak Mandelbaum
Princeton University
The Theory and Practice of Data Description
May 1, 2006 10:00am
Abstract:
Massive amounts of useful data are stored and processed in non-standard or ad hoc formats, for which critical tools like parsers and formatters do not exist. Traditional databases and XML systems provide rich infrastructure for processing well-behaved data, but are of little help when dealing with data in ad hoc formats.
I will describe my attempts to address the challenges of ad hoc data through my work on the PADS project. I will begin with an introduction to PADS/ML, a declarative data description language that lets analysts describe both the physical layout of their data and its semantic properties. From a description, the PADS compiler automatically generates a collection of useful data-processing tools for the data source described, including parsing routines, statistical profiling tools, and translators to standard formats like XML. I will then discuss the formal semantics of the PADS language and two of its essential properties. Finally, I will describe PADX, the PADS tool that supports querying ad hoc data. I will present PADX from the user’s perspective and review the main challenges encountered in implementing it, along with their solutions.
If you have questions, or would like to meet the speaker, please contact Ponda at 4-1994 or pondabarnes@tti-c.org. For information on future TTI-C talks or events, please go to the TTI-C Events page.