5 Data Design Separating Data, Part 1


[MUSIC] Welcome back everyone. In this lesson, we're going to cover data structures and joins in database tables. This sounds maybe a little more technical than some of you thought you signed up for. But it's a really critical component of GIS that you should understand. And it's easy to understand at the level that you need to know it. To begin with, let's take a data table similar to something that we might have in my line of work of watershed sciences. And it's going to be field data which I hope you can easily adapt to the type of work you're doing. It's where someone goes out and collects some information using a sensor and a GPS at some remote location and then brings it back and needs to analyze it. So we'll start by creating a table here that'll look just like your data tables in ArcGIS. And I'm going to create a column here. I'm just going to call it ID. That'll be our primary key column like you're used to from the datasets you've been working with. And then, if we think about it, probably the next most basic column we need in a GIS is a location field of some sort. We're used to that being called Shape in ArcGIS, but you could have your latitude and longitude coordinates or something like that. For now, I'm going to call it Shape just so it mirrors what you'll probably see in ArcGIS. And then, what's left is the data we actually captured when we sent somebody out into the field. So in this case, we're going to have Velocity or for stream velocity, the measure of flow. And then we'll have Depth for the stream depth at that location and then we may also want to know who took that measurement. So I'm going to create a field called Observer here. And that's who was sent out in the field. That, in this case, could be something like the name. So let's create a couple records. I'm just going to give them IDs really quickly, 1, 2, 3 and Shape will just be some sort of binary data. We can't actually read but ArcGIS can display it for us and that'll go all the way through there. And then for Velocity and Depth, I'm just going to make up some numbers so we can work with them. 12, 8, 7 and Depth in whatever units they're collecting in, might be 4, 17, 8. Something like that. And then Observer, give these people names. Liam, Sarah and Kelly. Okay, so we have a really basic data table here. Records our information in a way that seems acceptable to me and doesn't really have any special cases. We could just keep adding more and more records, more locations, more velocity, more depth, more observers. But where it starts to get concerning is as we grow out in specific directions, we run into potential complications where we want to start separating our data into multiple tables that we bring back together with things called joins or relates in ArcGIS. In database parlance, it tends to be joins across table relationships, but in ArcGIS there are joins and relates, which we'll explain at some point. So, the next thing we might want to do with this data table would be to add a field indicating when this information was collected so that we can track observations across time. So I'm just going to remove the end here, So we can extend our table out a little further. And add another field called Date. As a side note, you don't generally want to actually call your fields Date. That's what's called the reserved word and the database might choke on it. You might call it something like observation date. But for this particular example we'll call it Date just to keep it short. And I'll put 2016-01-14, which, it says 14, for today's date. And then we'll do 2015-12-7-07 there and then we'll do 2015-11-18 or something like that for those observation dates. If I start adding more records here with more dates and more locations, this is fine. But think about what happens if I start wanting to record multiple observations for the same location here. I start duplicating information about my location and if we do that, we're going to get multiple features in ArcGIS at the same location. And they're going to overlap each other and that's going to be inconvenient. Because if all the sudden I want to say edit the name of a location or the actual location I need to correct it, then we run into issues where we have to do that across multiple records and that's time consuming and error-prone. So what do we do about it? We're going to use that thing called joins we were talking about. And we're going to create a new table and separate it out so that we can attach it back when we need it but that the data is kind of in its own object. We have data tables that have a theme, that they can characterize a specific phenomenon and then we can bring them together to understand something larger. So, I'll do that right over here for now, and we'll call this table Observations. And then I'll create a table over here that I'll call Sites. And in the Sites table, I'm also going to have an ID, just like in the Observations table, but it's going to be an ID that refers to its own things. And, in that Sites table, we're basically going to take this Shape field and move it over here too. So, we'll have ID, we're going to have Shape and maybe we'll have a site name now that we have its own table for it. Site Name, and we'll draw those fields down and I'll give it a few IDs here. 3, 4, and then this is still binary data here. It represents feature information in ArcGIS. And then we have a site name. I'm going to call them Site A and Site B. And maybe this one will be like B downstream or something. Indicating it's still near B, but it's not exactly at B. And then we'll call this one Site C. Now that I have it in this table, I can remove it from the Observations table. And we'll just strip off this whole field in here, And we'll rewrite that in a moment. We're going to modify this table now. In programming terms, a lot of times that's called refactoring, when you take something you already have and you change it to update it to your new parameters. You call that refactoring. So we're going to have two fields still, because we still need an ID field for our Observations. And if you were going to do this in place, they'd keep their same existing IDs if we were moving it out. The Sites would get this new ID and our Observations field would have their same IDs so we have our ID field. And then we have our Site_ID field. And what we get with that is we're going to reference the ID field of the Sites table in our Site_ID field to say, this Velocity, this Depth measurement occurred at the site with this ID. And then, we're going to bring those together in ArcGIS when we need to know information about the site, such as its location or its name. And in ArcGIS in particular, we'd actually go in the direction of starting with the sites table that has the location and attaching this information in that direction. And a lot of database systems, you might start on this other side. So we'll reference it back and say that this velocity and flow measurement occurred at Site with Site_ID 1. And this one occurred at Site with Site_ID 2 and this one occurred at Site with Site_ID 3. These happen to correspond right now just to keep it consistent with what it was before. But we can add a new record down here with ID 4 and let's say that this one also happened at SiteID 2. So we have multiple records at the same site and now we can attach those back to one site over here. And 11 and 13 and let's say this one also was done by Sarah. And it occurred on some date. So, now we have additional records to show that these aren't unique. This ID field is a reference to this ID field. And one way that we can keep this straight in our heads is if you make your table names plural, so instead of just being Site, this is Sites. This has multiple different sites in this table and then our foreign key here which references the key field, the ID field in a foreign table. It starts with the non-plural version of the table name and then ends with ID. You can come up with your own system. This isn't a rule. It's just a way to keep it straight so that if I send this to somebody, they automatically know without much additional documentation that the Site_ID field probably references the ID field in Sites and can be joined that way. Now, for a moment, just to make this a little stronger for you, let's take a moment to visualize how that join would occur. If I have these Sites over here and these Observations. If I have Sites in ArcGIS and I want to attach the data, I'd use the join tool, which we'll show you in a screencast. I'd use the join tool and attach this information. And how that would work is record here with Site_ID gets appended to the end of the record for Site_ID 1. And the record here with Site_ID 2 that occurred at Site_ID 2 rather goes and gets appended to Site_ID 2. And same for Site_ID 3. But then what happens, and this is what makes this something called a one-to-many join, is this one, since it also has Site_ID of 2, in ArcGIS, we'd use a relate to connect both of these records with Site_ID 2 here. And get the records to show up for us in our attribute table. Where this information with the Shape shows up with both of these records. And that way we can go query for information at Site_ID 2 and see everything that's happening at Site_ID 2 in one shot. I'm going to erase these now so you can see me again. We've covered a number of concepts in this video already, so let's pause it here and we'll continue on with some more advanced concepts in the next video. In this video, we covered how to separate our data into multiple tables with a type of join called a one-to-many join. We utilized primary keys and foreign keys to reference data in the Sites table from our Observations table. And add additional information to our observation records. In the next lesson, we'll make this even more powerful with a new type of join called the many-to-many join, where we associate large quantities of new data with our Observations table.

Wyszukiwarka

Podobne podstrony:
6 Data Design Separating Data, Part 2
function mailparse msg get part data
2 Data Design Attribute Types
4 Data Design Joins and Relates
Data shortcut
Data quotes
data sequence
data (3)
DOD Net Centric Data Strategy and Community of Interest (COI) Training Glossary
IEEE Finding Patterns in Three Dimensional Graphs Algorithms and Applications to Scientific Data M
SHSpec 224 6212C13 R2 12 Data Needle Behavior
rc183 010d data mining 1
function mysql data seek
Data Store Read
3 4 Screw Compressor Technical Data

więcej podobnych podstron