The hydrosheds 3" (~90m) DEM is a depressionless DEM built from the
SRTM dataset and vector hydrography. Our group at the
University of Washington needed the data at
5-minute resolution to run the VIC model on the Mekong and Zambezi Rivers.
Such a
dataset in not now available from hydrosheds, so I wrote the code to create
one in the a very rigorous manner. Some preparation was done with
commercial software, but the actual computation was done with
a python program which executes (on an old
computer running RHEL4 linux) in nine minutes.
The essential algorithm
We start with a coarse binary file of flags that represent the area of
interest, a large (> 2GB) binary file (with the same geographic extent)
containing flow accumulation values, and a large file of flowdirection
flags. For each 5-minute block in the area of interest, flows from the
396 perimeter cells are examined. If water flows back into itself,
it is ignored. Otherwise, the flow is traced up to 62 steps to see if it
flows back into itself (ignore it), flows into a diagonal neighbor,
or remains in the adjacent block. These split flows are buffered for the
entire landscape. Then the flows between pairs of blocks are calculated,
and flow goes to the neighbor cell with the largest flow.
This algorithm may generate some imperfections where flows if crisscross.
When these are encountered, the smaller of the flows is directed to the
same cell as its neighbor.
How can one aggregate flowaccumulation?
I don't see how this program can render an improved flowaccumulation.
It is best to calculate flowaccumulation from the new coarse flowdirection
grid. It can be improved by weighting it by the number of 3" basin cells
contained within each block. It can be further refined by weighting by the
cosine of the latitude and a factor to get square km.
Notes on the program.
The main loop is driven by a mask of the area of interest. I used a bil file
created by ArcGIS from a buffered basin grid. I made sure that the mask was
an 8-bit file, as python 2.5.1 provides no handy tools for unpacking a
one-bit mask into an array. It would be simple to alter the program to
process all the blocks--except for the edges--in the array. With no bounds
checking, water in the edge cells could be directed off the edge of the array,
causing index overflow. The cells at the perimeter of the area of interest
do not see flows from outside that area, and should therefore be discarded,
especially since bad bad can lead to loops.
The program uses arrays for speed, but arrays are always one-dimensional, so
it generates index pointers for these arrays. It is a bit schizoid,
using two-dimensional lists for alltots and mask. Speaking of pointers
and craziness, I encountered bizarre errors occured while reading the
large flowdirection file. After reading n bytes, the file pointer
was advanced more than n bytes. This occurred on an NFTS filesystem under XP,
and disappeared after I copied the file to an ext2 system under RHEL4.
The program now writes to an Errorfile if it sees such a problem.
Three diagnostic files are written. The file a8 has a line for each
processed cell, listing row, col, and the flows to each neighboring cell, clockwise from eastward. This is equivalent to the alltots list with the program.
The files pointaccs.txt and arrows.gen can be imported into arcmap for
visual verification of what the program is doing. The first contains the
geographic location and flowaccumulation for cells on the perimeter of each
block. It can be imported to arcamp with tools:add_XY_data. The second
file shows the flowpath traced for each perimeter cell.
It can be converted to an arc/info line coverage with the generate command.
As these two files could be very large, they are produced only for those rows
listed in the diagrows list.