Network is defined as global interconnected things or objects.

And there are many examples of spatial data represented by network.

The best example is transportation network,

such as road network and railroad network, and so on.

Network data can be stored in network data model,

which is basically a graph data structure,

which you studied in the third week.

Network analysis can be defined as data processing and analysis of network data.

It includes data pre-processing,

such as Geocoding and map-matching and data analysis,

such as routing, service area delineation,

resource allocation, and so on.

In fact, you already used network analysis in proximity and accessibility analysis,

in which floating catchment analysis defines service area by travel time or travel distance,

which are based on transportation network analysis.

Geocoding is the first example of network analysis.

Precisely speaking, one method of Geocoding

is based on network analysis, but others are not.

The definition of Geocoding is the process of

exchanging between coordinates and description of place.

The example is that,

the exchange between coordinates and addresses,

also known as address matching, which is based on network analysis.

Geocoding includes not only the address matching but also other methods,

such as c-squares and GeoHash.

Address matching is a critical issue in spatial data analytics where

address-based data and coordinate-based data are integrated.

C-squares and GeoHash are not exactly related to

network analysis but they are very important

particularly in spatial big data management and processing.

So, I, on purpose, added them here for more explanation.

The conversion to address with respect to a given coordinate,

is to interpolate the position of the address based on its numeric distance,

between the starting and the ending addresses on the block of road network.

In urban area, each road segment corresponds to a block.

And it has attributes of the left from to and the right from to addresses.

In the example, the left from and to is 100 and 200.

On the other hand, the right from and to is 101 and 201.

So, if the coordinate is located on the left side,

and the proportion of the location on the road network is 60 percent,

then the address is expected to be 160 main street by interpolation.

This is a simple example of address matching from coordinate to address.

It looks simple and straightforward.

However, in reality, it will not work out very well in rural area.

At the same time,

it should be noted that,

the example is for road-based address matching in the US.

The opposite way around,

conversion to coordinate with respect to given address should take a series

of text processing steps.

The process includes parsing the input address

into address components such as street names,

street types, city, zip code,

standardizing abbreviated values,

assigning each address element to category known as match key,

searching the reference data, assigning a score to each potential candidate,

and delivering the best match

and each coordinate as the outcome of the whole process.

This is not domain-specific process.

In other words, different countries have different Geocoding solutions.

Surprisingly, the accuracy of the address matching is not that high.

In most countries, 90 percent,

more or less, is the typical range.

C-squares stands for Concise Spatial Query and Representation System,

which provides the basis for simple spatial indexing of geographic features.

C-squares divides the surface of the earth

into rectangular cells at a different level of scales,

and assigns a unique c-squares code to each cell.

The code is basically encoding of latitude and longitude.

Here's an example of c-squares code for one degree cell with origin

at 37 degree north and 127 degree east.

The first number represents the global sector,

one of northeast, southeast, southwest, and northwest.

The second digit stands for tens of degrees in the latitude.

And third and fourth digit, collectively represent tens of degrees in the longitude.

After colon, the first digit stand for five degree quadrant.

The second and the third digit denote

additional degrees in latitude and longitude, respectively.

In such way, all the rectangular cell can have its unique c-squares code.

Now you are looking at one degree by one degree cell around Seoul in Korea.

Where 1312:477 is highlighted.

GeoHash is another Geocoding system, and similar to c-squares,

it divides the surface of the earth into rectangular cells,

and assign a unique ID which can be

automatically generated by transforming latitude and longitude to symbol,

composed of numbers and alphabets.

It is hierarchical spatial data structure based on

Z-order curve which is one of spaced-filling curves.

The symbol is based on Base32 which is composed of 32 numbers and characters.

It is easy to program and it is a popular Geocoding method.

The table shows the matching between base 32 and decimal number 0 to 31.

For example, wydm8 in base 32 is 28, 30,

12, 19, and 8 in decimal number,

which can be converted to a binary code.

With respect to the given binary code,

digits on odd position represents longitude,

and digits on even position does latitude.

The slide explains the decoding methods of GeoHash.

Each binary code is used to determine a series of dividends of a given interval.

The first digit represents the interval of minus 90 to plus 90,

and it is divided by two.

producing 2 intervals, minus 90 to zero and zero to plus 90.

Since the first digit is one,

the higher interval zero to plus 90 is chosen.

The procedure is repeated for all the bits in the code.

Longitudes are processed in the same way.

Keeping in mind that the initial interval is minus 180 to plus 180.

The table summarizes the decoding

of a given code for latitude 1 0 1 1 0 1 0 1 0 1 1 0.

The first digit is 1.

So the interval zero to plus 90 degree is chosen in yellow color,

and zero to plus 90 degree interval is divided by 2 intervals,

and the second digit is zero,

so the lower intervals,

which is 0 to plus 45 is chosen.

The third digit this 1,

so the higher interval 22.5 to 45 is chosen.

The procedure is applied for the other digits, and finally,

the interval of the code is between 37.529297 and 37.573243.

That figure illustrates the areas of which GeoHash code wydm8 represents,

region around Seoul Korea.

The given longitude and latitude are the center coordinates of the given cell of wydm8.

The value of GeoHash

as well as C-squares is that

they can provide one-dimensional indexing for two-dimensional spatial data.

Why is this important?

Because spatial big data process GeoHash or C-squares

can be used for the key value pairs in the Hadoop MapReduce.

Particularly GeoHash is popularly used for the propose,

because GeoHash code contains spatial semantics,

locations, and it is rather easy to calculate.

There is a value of GeoHash in spatial big data processing.