Generalizing hydrosheds

GeneralizingHydrosheds

The hydrosheds 3" (~90m) DEM is a depressionless DEM built from the SRTM dataset and vector hydrography. Our group at the University of Washington needed the data at 5-minute resolution to run the VIC model on the Mekong and Zambezi Rivers. Such a dataset in not now available from hydrosheds, so I wrote the code to create one in the a very rigorous manner. Some preparation was done with commercial software, but the actual computation was done with a python program which executes (on an old computer running RHEL4 linux) in nine minutes.

The essential algorithm

We start with a coarse binary file of flags that represent the area of interest, a large (> 2GB) binary file (with the same geographic extent) containing flow accumulation values, and a large file of flowdirection flags. For each 5-minute block in the area of interest, flows from the 396 perimeter cells are examined. If water flows back into itself, it is ignored. Otherwise, the flow is traced up to 62 steps to see if it flows back into itself (ignore it), flows into a diagonal neighbor, or remains in the adjacent block. These split flows are buffered for the entire landscape. Then the flows between pairs of blocks are calculated, and flow goes to the neighbor cell with the largest flow. This algorithm may generate some imperfections where flows if crisscross. When these are encountered, the smaller of the flows is directed to the same cell as its neighbor.

How can one aggregate flowaccumulation?

I don't see how this program can render an improved flowaccumulation. It is best to calculate flowaccumulation from the new coarse flowdirection grid. It can be improved by weighting it by the number of 3" basin cells contained within each block. It can be further refined by weighting by the cosine of the latitude and a factor to get square km.

Notes on the program.

The main loop is driven by a mask of the area of interest. I used a bil file created by ArcGIS from a buffered basin grid. I made sure that the mask was an 8-bit file, as python 2.5.1 provides no handy tools for unpacking a one-bit mask into an array. It would be simple to alter the program to process all the blocks--except for the edges--in the array. With no bounds checking, water in the edge cells could be directed off the edge of the array, causing index overflow. The cells at the perimeter of the area of interest do not see flows from outside that area, and should therefore be discarded, especially since bad bad can lead to loops.

The program uses arrays for speed, but arrays are always one-dimensional, so it generates index pointers for these arrays. It is a bit schizoid, using two-dimensional lists for alltots and mask. Speaking of pointers and craziness, I encountered bizarre errors occured while reading the large flowdirection file. After reading n bytes, the file pointer was advanced more than n bytes. This occurred on an NFTS filesystem under XP, and disappeared after I copied the file to an ext2 system under RHEL4. The program now writes to an Errorfile if it sees such a problem.

Three diagnostic files are written. The file a8 has a line for each processed cell, listing row, col, and the flows to each neighboring cell, clockwise from eastward. This is equivalent to the alltots list with the program. The files pointaccs.txt and arrows.gen can be imported into arcmap for visual verification of what the program is doing. The first contains the geographic location and flowaccumulation for cells on the perimeter of each block. It can be imported to arcamp with tools:add_XY_data. The second file shows the flowpath traced for each perimeter cell. It can be converted to an arc/info line coverage with the generate command. As these two files could be very large, they are produced only for those rows listed in the diagrows list.