It’s been quite a while since my last post about Google Summer of code activities. I was trying to understand the inner working of the library which I use and gathering deeper knowledge about spatial coordinate and mapping. As I may previously mentioned in earlier post, I am working with GDAL and PROJ.4 Library. GDAL/OGR is mainly used for supporting many raster/vector format and also dealing with WKT (Well Known Text) representation of coordinate system. The Coordinate system is defined in WKT Format although in the GDAL/OGR Library, the main coordinate system transformation is done in the PROJ.4 library which is wrapped through an interface which consists of OGRSpatialReference and OGRCoordinateTransformation.
A little bit of Geospatial Coordinate System
A Spatial coordinate is a set of numbers relating to a specific point in space. These coordinate is defined relative to some coordinate system which is used as reference. In a geospatial application there are three kind of systems which are :
- Geocentric Coordinate System,
This is a 3D rectangular/cartesian coordinate system in a sense that the coordinate is defined as a relative position from a common origin (usually the center of mass of the planet).
- Geographic Coordinate System,
A geographic coordinate system is where coordinates are defined over more local concept such as horizontal and vertical components. Horizontal coordinates are defined as a position in a 3D Surface (usually Ellipsoid as a model of planet’s shape). While vertical component is what we understand as ‘height’ (We’ll come back to this later) which in our common understanding is a displacement in normal direction of the surface.
- Projected Coordinate System,
Although Horizontal component in Geographic Coordinate System is already “2D”, it is actually defined in a curved surface while the medium we usually interacting is flat (paper, screen). Therefore these geographic coordinate is required to be ‘flattened’ by projecting to the flat surface. Since the projection is done from curved to flat surface, it introduces distortion. There are various projection method which is developed to minimize distortion or to preserve specific information (length/angle) with respect to the task at hand.
Geospatial Coordinate Transformation Chain
A full chain transformation from a projected coordinate system to another projected coordinate system might be composed of a series of steps if the source and target coordinate system use different reference/datum. Suppose we want to transform coordinate P from projected coordinate system (CS) to another CS (P’). The steps might be the following :
- transform from source Projected CS to source Geographic CS
- transform from source Geographic CS to source Geocentric CS
- transform from source Geocentric CS to target Geocentric CS
- transform from target Geocentric CS to target Geographic CS
- transform from target Geographic CS to target Projected CS
Of course the steps required to transform from one CS to another might not involve all above steps. for example, if source and target geographic CS are using the same datum then there is no need to transform to Geocentric CS for datum transformation. Thus, we follow a shortcut or skip some steps.
Information about vertical measurement are usually called Elevation. Our intuitive notion for vertical information is height. But in Geodesy, there are two (at least what I’ve understand so far) concept of height. The first concept is geometric, which is measured as radius from the center of the earth. The second is based on gravity potential. The distinction is explained by example where water flows. water might flow from low geometric height to higher geometric height if the point in higher geometric height has higher gravity potential. In short, there are some names for height:
- Ellipsoidal Height,
from it’s name, the height is defined as distance from the surface to the surface of the reference ellipsoid (as rough model of the planet’s shape).
- Orthometric Height,
Orthometric height is defined as distance from the surface to the reference ‘zero surface’ which is usually based on gravity model (gravimetric), reference point (tropometric). Reference surface based on gravity potential are called Geoid. This zero surface called Geoid also known as MSL (mean sea level).
The main problem which become the focus of work in the google summer of code is that in existing system (GDAL/OGR-PROJ.4), there is limited support for coordinate transformation involving vertical information. As mentioned before, this transformation is implemented in PROJ.4 which then wrapped in GDAL/OGR. The only supported transformation is vertical grid shift which assumes the use of the same datum and also homogeneity of the height system. If there is a inhomogeneity (as the case in Austria), it might be cumbersome to fix each point using other information. Therefore the problem which is tried to be tackled in this summer of code is to implement a rigorous support for coordinate transformation involving vertical datum and vertical correction model for homogeneity.
Geoids are usually stored as raster file which GDAL supports many formats and a common interface for manipulation. But the coordinate transformation is done in PROJ.4 which makes the dependency become circular if the modification is implemented in PROJ. To overcome this circular dependency, a redundancy is introduced by duplicating a part of PROJ function which then modified and put in the GDAL/OGR codebase (this is the first idea).
Other thing considered in this GSoC2013 is the efficiency while handling raster data for coordinate transformation. The predicted use-case scenario is involving a large set of point therefore seeking the raster file for each point is not considered efficient but naively loading the raster also not a better option if the raster file is large. Some Ideas are to load a portion of the raster in memory (caching) and reuse this portion for further transformation as much as possible to reduce disk access. In general there are some approaches :
- one-pass cache-free approach,
This is a regular approach where raster file is accessed for each point. This is good for small set of points (about less than 10 pts.)
- one-pass naive caching approach,
A cache is updated if the current point to be transformed is outside the covered region of the cache. This is more efficient from one-pass but the worst case is if the points are not in order and next point lies outside current point which is less efficient than one-pass because of loading larger chunk of data from disk.
- two-pass (indexed caching) approach,
This approach consists of two steps. first it traverse the set of points to build an index then it process each point based on this index. using the cache-update mechanism as previous approach. this method optimize the processing order to avoid worst-case. There are some alternative to how indexing is done. One can calculate the extent of all the points and make partitions to group of points (which make it a little bit like three pass) or one can build partition while traversal which basically reorganizing the order by skipping some points that are not in the immediate vicinity of preceding points. I will explain this algorithm in more detail after implementation.
Last Wednesday (24.07.2013 15:00 CEST) I had a meeting with the mentors to discuss things that are done so far and the direction to proceed.
- Move implemented code into separate library which depends on GDAL and PROJ library (done)
- Refactor raster object management and interpolation into separate class for further optimization (ongoing)
- Transformation chain shortcut (pending after mid-term eval.)
- Implement unsupported geocentric coordinate transformation (pending after mid-term eval.)