Andy White Anthropology

It's Time to Build an Eastern Woodlands Megabase

10/9/2017


 
Back in the late 2000's, I took the terrifying step of creating folders on my computer to start pursuing my formal dissertation research. Around the same time, I realized that my system for organizing my paper files had become a sandbag. The physical compartments I was using to segregate "different" aspects of my work were hurting my ability to see and explore the overlapping areas of several inter-connected problems. I tore everything apart and put it back together again so the overall structure was different, the grains of information were different, and the "bins" were collapsed into a single well that I could draw from. In order to stop blindly analyzing the different parts of the elephant and start trying to understand the whole animal, you first have to understand that you're looking at pieces of a much larger puzzle.

It was more of a strategy than an epiphany. 

Last week I got into the nitty-gritty of a SEAC paper I'm writing with David G. Anderson (University of Tennessee). We're using various large datasets to try to describe and interpret patterns of change in archaeological remains that could be related to changes in the size, structure, and distribution of human populations in the Eastern Woodlands during the Late Pleistocene and Early Holocene.

As I started pulling together information (from PIDBA, DINAA, and my ongoing radiocarbon compilation) and thinking about how to organize it, I realized that keeping the databases separate was both a logistical hassle and an analytical problem. So I invested the time to dump all of the information into a single relational database that we can use for this paper and that I'll continue to update in the future. I've been calling it "Megabase" in my head, so that's what it is until it gets a better name.

Here is an illustration that I'll briefly discuss:
  • DINAA is a compilation of state-curated site data, one entry per Smithsonian Trinomial;
  • PIDBA has county-by-county counts of various kinds of Paleoindian projectile points;
  • EWHADP is a compilation of prehistoric structure data (keyed to both county and Smithsonian Trinomial);
  • The Kirk Project is point-by-point attribute data, with most entries having county-level provenience;
  • Most of the entries in the radiocarbon compilation have a Smithsonian Trinomial.
On the left is what I'm building now. I used GIS to generate a listing of "center" UTM coordinates (n=2097) for every county in the eastern US (everything east of the first tier of states west of the Mississippi River) and much of eastern Canada. I'm calling that the "County Core." That coordinate list lets me easily create a spatially referenced file for whatever information I want from any of the other databases without needing to know the exact locations of archaeological sites. Making a county-level map of all the eastern US radiocarbon dates in the database (9,533 and counting) is just a matter of a few button clicks in Access, Excel, and GIS. The same is true of the PIDBA data, the Kirk Project data, the household archaeology data, and the DINAA data.
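A minimal sketch of the County Core idea follows. It uses Python's built-in `sqlite3` as a stand-in for the Access database described above. The table names (`county_core`, `radiocarbon`), the `county_id` key format, and every value in it are hypothetical placeholders, not real coordinates or dates. The point is that any dataset carrying a county key can be joined to the county center points, which yields a table a GIS can map without exact site locations.

```python
import sqlite3

# Hypothetical schema and placeholder rows; none of these values are real data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE county_core (
    county_id TEXT PRIMARY KEY,  -- hypothetical key: state + county name
    utm_zone  INTEGER,
    easting   REAL,              -- placeholder "center" coordinate, meters
    northing  REAL
);
CREATE TABLE radiocarbon (
    date_id   INTEGER PRIMARY KEY,
    county_id TEXT REFERENCES county_core(county_id),
    rcybp     INTEGER            -- placeholder uncalibrated radiocarbon age
);
""")
conn.executemany(
    "INSERT INTO county_core VALUES (?, ?, ?, ?)",
    [("SC-Allendale", 17, 1000.0, 2000.0),
     ("SC-Fairfield", 17, 3000.0, 4000.0),
     ("SC-Richland", 17, 5000.0, 6000.0)],
)
conn.executemany(
    "INSERT INTO radiocarbon (county_id, rcybp) VALUES (?, ?)",
    [("SC-Fairfield", 7200), ("SC-Fairfield", 5100), ("SC-Allendale", 9800)],
)

# One row per county: its center point plus a count of dates.
# LEFT JOIN keeps counties with no dates, so gaps show up on the map as zeros.
rows = conn.execute("""
SELECT c.county_id, c.utm_zone, c.easting, c.northing, COUNT(r.date_id) AS n_dates
FROM county_core AS c
LEFT JOIN radiocarbon AS r ON r.county_id = c.county_id
GROUP BY c.county_id, c.utm_zone, c.easting, c.northing
ORDER BY c.county_id
""").fetchall()

for row in rows:
    print(row)
```

Swapping `radiocarbon` for a table of PIDBA point counts or Kirk Project records would change only the second table. The County Core side stays the same.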

The Megabase of Today will be fine for the SEAC paper and for the near future. It will be able to do a lot. Ideally, however, the Megabase of the Future will have DINAA serving as both a "router" for data that is attached to a Smithsonian Trinomial and an analytical tool in its own right. One issue is that not all states are currently participating (and therefore not all Smithsonian Trinomials -- the "addresses" for sites -- are in the system).  Another issue is that the site forms (and therefore the site information that is collected and stored) differ by state. To reach its full potential, DINAA data will have to be supplemented by additional data about the materials recovered from sites, how sites were recorded, etc. Ensuring that we're making "apples to apples" comparisons will be a significant chore -- DINAA currently has information on somewhere in the neighborhood of half a million sites. You can't just sit on your couch and cross-check all that.
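The "router" idea amounts to using a normalized Smithsonian Trinomial as a shared key. Below is a minimal sketch, assuming the common digits-letters-digits pattern (for example, 38FA608). Actual trinomial conventions vary by state, so the regular expression is a simplification, and the dataset names and records are hypothetical placeholders.

```python
import re
from collections import defaultdict

# Assumed pattern: state number, county letters, site number, with optional
# hyphens or spaces between the parts. Real state conventions vary.
TRINOMIAL = re.compile(r"^\s*(\d{1,2})[\s-]*([A-Z]{1,3})[\s-]*(\d+)\s*$")


def normalize_trinomial(raw):
    """Return a canonical trinomial (e.g. '38FA608'), or None if it won't parse."""
    match = TRINOMIAL.match(raw.upper())
    if match is None:
        return None
    state, county, site = match.groups()
    # int() drops leading zeros so '38FA0608' and '38FA608' collapse together
    return f"{int(state)}{county}{int(site)}"


def build_router(datasets):
    """Group records from several datasets under one normalized trinomial.

    `datasets` maps a dataset name to a list of (trinomial, record) pairs.
    Returns (index, unparsed): index[trinomial][dataset] -> list of records.
    """
    index = defaultdict(lambda: defaultdict(list))
    unparsed = []
    for name, records in datasets.items():
        for raw, record in records:
            key = normalize_trinomial(raw)
            if key is None:
                unparsed.append((name, raw))
            else:
                index[key][name].append(record)
    return index, unparsed


# Hypothetical datasets and records.
datasets = {
    "dataset_a": [("38FA608", "record a1"), ("38-fa-0608", "record a2")],
    "dataset_b": [("38FA608", "record b1"), ("no number", "record b2")],
}
index, unparsed = build_router(datasets)
print(dict(index["38FA608"]))
print(unparsed)
```

In a fuller system, the index would also be checked against the list of trinomials DINAA actually holds. Sites from non-participating states would then surface as gaps instead of silently dropping out.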

I know enough to be dangerous with a computer, but I'm not sufficiently sophisticated to know the nuts-and-bolts options for building the Megabase of the Future. In 2015 we did a sort of proof-of-concept run to demonstrate that the EWHADP and DINAA could be linked together. I'm not sure if that is the way to go or not. Perhaps there's something that can be done with blockchain technology -- it sure sounds cool.

Anyway, I'm going to get the Megabase of Today functional in time to do the analysis for the SEAC paper we'll give in a month. If you're interested in talking about the Megabase of the Future, please let me know.

EWHADP Coming Out of Mothballs (Again)

10/4/2017


 
Readers of this blog from the pre-Swordgate era may remember the Eastern Woodlands Household Archaeology Data Project (EWHADP) that I initiated in February of 2014. The goal of the EWHADP is to assemble and make available information about prehistoric residential structures in eastern North America. The project has its own website where you can read all about it and access the data.

After an initial startup pulse, I had to put the project on hold when I started teaching at Grand Valley (Fall of 2014). Emily Gilhooly, one of my undergraduate students at Grand Valley, worked on the project during the winter of 2015 and began the long process of updating the database by checking and re-coding every single entry. As I prepared to make the move to the University of South Carolina in the spring of 2015, I started a GoFundMe campaign to raise some cash to pay a research assistant to work on the project. Thanks to the generosity of several donors, that campaign was successful. With the hiring of USC graduate student Laura Clifford, the project was up and running again in the fall of 2015. Laura made a lot of headway tracking down references and moving the database forward, but she soon found a better long-term employment situation. So I hit the pause button again on the EWHADP.

The project stayed paused as I moved from the SCIAA building into my new lab space, worked my way through my first two years of teaching at South Carolina, and initiated some excavation fieldwork at a site on the Broad River. After that field school, I hired one of my students (Sam McDorman) to do the bulk of the basic laboratory processing of the materials we recovered. With the artifacts from 38FA608 washed and cataloged, I can start analysis. And I can move Sam on to another project: the EWHADP.

Of the $3400 in donated EWHADP funds that I came with, there was about $2870 left at the beginning of the semester.  That money should be sufficient to get through at least 2.5 of the three goals I have:

1) assemble primary references for all information in the database;
2) check and re-code existing information in the database, supplying missing information and adding greater detail;
3) add new information to the database.

I don't expect to get through these goals quickly, but the wheels are now in motion again. The EWHADP is staffed, funded, and exists in a dedicated office space with room for files, books, piles of stuff, a scanner, and a computer. When the second goal is completed, you'll get an updated database that should be an order of magnitude better than the one that exists now. And then you'll start getting updates to the website as we begin adding in new information.

I'd like to again thank those who donated to this project and haven't questioned why it has periodically slowed down over the years: anyone who juggles knows that it's difficult to keep everything up in the air at all times. Your patience is appreciated. Thank you. And stay tuned.
EWHADP World Headquarters: staffed up and open for business.

Three-Headed Research Monster: A Brief Update

9/8/2016


 
We're now into the fourth week of the semester here at the University of South Carolina. As usual I've been writing for this blog less than I'd like (I have several unfinished draft posts and ideas for several more, and there's currently a backlog of Fake Hercules Swords). A good chunk of my time/energy is going into the Forbidden Archaeology class (you can follow along on the course website if you like -- I've been writing short synopses, and student-produced content will begin to appear a few weeks from now). Much of the remainder has gone into pushing forward the inter-locking components of my research agenda. This is a brief update about those pieces.


Small-Scale Archaeological Data

At the beginning of the summer I spent a little time in the field doing some preliminary excavation work at a site that contains (minimally) an intact Archaic component buried about 1.9 meters below the surface (see this quick summary). Based on the general pattern here in the Carolina Piedmont and a couple of projectile points recovered from the slump at the base of the profile, my guess is that the buried cultural zone dates to the Middle Archaic period (i.e., about 8000-5000 years ago).

My daughter washed some of the artifacts from the site over the summer, and I've now got an undergraduate student working on finishing up the washing before moving on to cataloging and labeling. Once the lithics are labeled we'll be able to spread everything out and start fitting the quartz chipping debris back together. Because I piece-plotted the large majority of the lithic debris, fitting it back together will help us understand how the deposit was created. I'm hoping we can get some good insights into the very small-scale behaviors that created the lithic deposit (i.e., perhaps the excavated portion of the deposit was created by just one or two people over the course of less than an hour).
Drawing of the deposits exposed in profile. The numbers in the image are too small to read, but the (presumably) Middle Archaic zone is the second from the bottom if you look at the left edge of the drawing. Woodland/Mississippian pit features are also exposed in the profile nearer the current ground surface.
When the archaeology faculty met to discuss the classes we'd be offering in the spring semester, I pitched the idea of running a one-day-per-week field school at the site. Assuming I can get sufficient enrollment numbers, that looks like it's going to happen. The site is within driving distance of Columbia, so we'll be commuting every Friday (leaving campus at 8:00 and returning by 4:00). The course will be listed as ANTH 322/722. It's sand, it's three dimensional, and it's pretty complicated -- it's going to be a fun excavation. I'll be looking to hire a graduate student to assist me on Fridays, and I'll be applying for grant monies to cover the field assistant's wages, transportation, and the other costs associated with putting a crew in the field.

Large-Scale Archaeological Data

Some parts of my quest to assemble several different large-scale datasets are creeping along, some are moving forward nicely, and some are still on pause.
In the "creeping along" department is the Eastern Woodlands Radiocarbon Compilation. My daughter did some work on the bibliography over the summer, so that was helpful. I'm still missing data from big chunks of the Southeast and Midwest. I've got some sources in mind to fill some of those gaps, and I've also got a list of co-conspirators. Our plan is to combine everything we've got ASAP and make it available ASAP. I don't really have a timeline in mind for doing that, but for selfish reasons I'm going to try to make it sooner rather than later: I'm going to be using information from the radiocarbon compilation in the paper I'm going to give at this year's SEAC meeting in October. So . . . Georgia, Alabama, Arkansas, Missouri, Indiana, Illinois . . . I'll be coming for you.
I've got two undergraduate students working on processing the Larry Strong Collection, a large collection of artifacts (mostly chipped stone projectile points) from Allendale County, South Carolina. Mr. Strong, who gathered the materials himself over the course of decades, donated the collection to SCIAA in the 1990's. Large surface collections such as this have significant research potential. I'm most interested in this collection for two reasons: (1) it provides a large sample of Kirk points from a single geographical area made from a single raw material, improving the possibility of teasing apart functional, stylistic, and temporal dimensions of variability (the large majority of 3D models of Kirk points I've produced so far have come from the Larry Strong collection for just this reason); (2) it provides a basis for making robust statements about the relative frequencies of various point types. When you have an n in the many thousands, you can have some confidence that the patterns you're seeing (such as the drop in the numbers of points following the Kirk Horizon) are real. That will also factor into my SEAC paper. Curation of the Larry Strong collection is being funded by a grant from the Archaeological Research Trust.
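To make the sample-size point concrete, here is a sketch using the Wilson score interval, one standard way to put a 95% confidence interval on a relative frequency such as the share of one point type in a collection. The counts are hypothetical. They hold the proportion at 20% while n grows and are not figures from the Larry Strong Collection.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a proportion k/n (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half


# The same hypothetical 20% frequency at increasing sample sizes.
for k, n in [(10, 50), (100, 500), (1000, 5000)]:
    low, high = wilson_interval(k, n)
    print(f"n={n:>5}: {low:.3f} to {high:.3f} (width {high - low:.3f})")
```

The interval narrows roughly in proportion to 1/√n. A shift in frequency that is ambiguous in a sample of 50 points can therefore be clear in a sample of several thousand.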
Finally, in the "paused" category there is the Eastern Woodlands Household Archaeology Data Project. That effort has been on hold since early last year (I have money to support it and I had an assistant hired, but she moved on to a greener pasture). I'd really like to get this going again but I need to find someone who can work on it more-or-less independently. And I need a bit more office furniture and another computer. Hopefully I can get the EWHADP moving again after things stabilize with my new crop of employees and I have time to take a trip to the surplus building and see what I can scrape up.

Complex Systems Theory and Computer Modeling

Complex systems theory is what will make it possible to bridge the small and large scales of data that I'm collecting. Last year, I invested some effort into transferring my latest computer model (FN3_D_V3) into Repast Simphony and getting it working. I also started building a brand new, simpler model to look at equifinality issues associated with interpreting patterns of lithic transport (specifically to address the question of whether or not we can differentiate patterns of transport produced via group mobility, personal mobility between groups, and exchange).  

As it currently sits, the FN3_D_V3 model is mainly demographic, lacking a spatial component. Over the summer I used it to produce data relevant to understanding the minimum viable population (MVP) size of human groups. Those data, which I'm currently in the process of analyzing, suggest to me that the "magic number of 500" is probably much too large: I have yet to find evidence in my data that human populations limited to about 150 people are not demographically viable over spans of several hundred years even under constrained marriage rules. But I've just started the analysis, so we'll see. I submitted a paper on this topic years ago with a much cruder model and didn't have the stomach to attempt to use that model to address the reviewers' comments. I'm hoping to utilize much of the background and structure of that earlier paper and produce a new draft for submission quickly. I also plan to put the FN3_D_V3 code online here and at OpenABM.org once I get it cleaned up a bit. I also discuss this model in a paper in a new edited volume titled Uncertainty and Sensitivity Analysis in Archaeological Computational Modeling (edited by Marieka Brouwer Burg, Hans Peeters, and William Lovis). 
How big does a human population have to be to remain demographically viable over a long span of time? Perhaps not as big as we think. The numbers along the bottom axis code for marriage rules (which will be explained in the paper). Generally, the rules get more strict from left to right within each category: 2-0-1 basically means there are no rules, while 2-3-8 means that you are prohibited from marrying people within a certain genetic distance and are compelled to choose marriage partners from within certain "divisions" of the population.
It will be a relatively simple thing to use the FN3_D_V3 model in its non-spatial configuration to produce new data relevant to the Middle Paleolithic mortality issue I discussed at the SAA meetings a couple of years ago. I'm also going to be working toward putting the guts of the demographic model into a spatial context. That's going to take some time.

My American Antiquity Review of "Building the Past"

4/26/2016


 
American Antiquity is the flagship journal of archaeology in the United States. While I haven't yet published a research paper in it, I'm happy to say that I've got a book review in the current issue. Building the Past: Prehistoric Wooden Post Architecture in the Ohio Valley-Great Lakes is a collection of papers edited by Brian G. Redmond (Cleveland Museum of Natural History) and Robert A. Genheimer (Cincinnati Museum Center). The book contains a lot of useful data and some really interesting insights, and I enjoyed reading it.  I'll be going through the chapters again in detail when I finally get back around to working on the database for the Eastern Woodlands Household Archaeology Data Project.


EWHADP Cited, Pledge Fulfilled

4/11/2016


 
Shortly after I built the website for the Eastern Woodlands Household Archaeology Data Project (EWHADP) in the spring of 2014, I talked about the project at a workshop sponsored by the Digital Index of North American Archaeology (DINAA) (there's a link to that presentation here and some short blog posts about how my work fits into DINAA here and here). In that 2014 presentation, delivered while I was unemployed, I offered an incentive program that I hoped would spur interest in the EWHADP: beer to the first person who cited the database. With an uncertain future, I scaled my offer to whatever my employment situation happened to be at the time of citation:
At the SAA meetings last week, I met Jayur Mehta and bought him a beer.  Mehta cited the EWHADP database in a paper about the Carson site, a Mississippian site in Coahoma County, Mississippi. If I remember correctly, the paper is currently under review, so I won't say anything else about it at this point. I'm glad he was able to make use of the database, and I look forward to reading the paper.
Jayur Mehta enjoying his $6 Disney beer.
The alert reader will notice that Dr. Mehta is holding but a single plastic cup of beer in his hand, while my incentive program clearly shows that six beers will be awarded if I am employed in a non-tenure-track position, which I am. I will point out that the cost of the beer he is holding, purchased in a Disney facility, actually exceeded the cost of a six-pack of PBR purchased in a normal part of the world (here is a link to a $4.99 six-pack of PBR on sale at Binny's Beverage Depot, the first place that came up when I Googled "PBR six pack cost"). So I think I fulfilled my pledge.

Mehta was unaware of my offer when he cited the database in his paper, suggesting that my incentive program was not behind his decision to use the EWHADP dataset.  Rather, it appears, he recognized the usefulness of the data for the project he was working on.  I hope that more people use the data. It's possible that someone else already has but I'm not aware of the citation - I'm not sure how I would ever know unless I see it or someone tells me about it.

The EWHADP has been largely dormant for a while now. It's stalled midway through an effort to re-code several of the key variables related to structure size and shape. I'm hoping to get the update done at some point this summer and get a newer, larger version online. I plan to use it myself for some research that will build on the 2013 JAA paper that spawned the original dataset.

If you cite the database, please let me know. I'll create a page for those citations on the EWHADP site. I'm interested to see how people use the data, and tracking and understanding use will help me to enlarge and improve the database.


EWHADP: Fresh Data (and Bumper Stickers) Available Soon

9/29/2015


 
I hope to be able to announce the latest iteration of the Eastern Woodlands Household Archaeology Data Project (EWHADP) database soon.  The last database release (containing information on over 2100 structures) was all the way back in March of 2014.  Gah!
The project got off to a quick start in February of 2014, but stalled when I had to direct my energies elsewhere later that year. I spent the 2014-2015 academic year teaching a 4-4 load at Grand Valley State University, and it was difficult to find time to do anything with the EWHADP other than teach, look at the box of files, and participate in a trial linking of the EWHADP database with DINAA. GVSU undergraduate Emily Gilhooly did make some progress on the database, continuing the process of consulting the original records in order to re-code some fields and add new data to others.

With donated cash in hand from a successful GoFundMe campaign, I was able to hire University of South Carolina doctoral student Laura Clifford to work on the project this semester. Laura's first job is to finish the checking and re-coding of all the records currently in the database. She's working on that now. When that task is complete, we'll make the new database available and she'll move on to the next job: adding new records. That will involve tracking down leads from publications we've already seen as well as finding new sources of data in print publications and online. Eventually I hope to give Laura the keys to the EWHADP website so that she can write the "What's New" blurbs as she adds new structures, update the maps, and keep the online bibliography up to date.

I would like to keep this project going next semester, but I'm not going to ask for more money until I have some results from this semester to demonstrate success.  

I would like to again thank those who made a cash contribution to help get the EWHADP out of mothballs and running again: David Cusack, Ken Kosidlo, Josh Wells, and a donor who wishes to remain anonymous. In addition to my sincere gratitude, I have (as promised) a limited edition bumper sticker for each of you. And I owe you beers.


EWHADP Research Assistant Funded for Fall of 2015

5/5/2015


 
The GoFundMe campaign I set up to support a research assistant for the Eastern Woodlands Household Archaeology Data Project (EWHADP) has ended successfully! It got three donations early on (from David Cusack, Ken Kosidlo, and Josh Wells) and then a single person (who has requested anonymity) contributed the remaining funds yesterday. That wasn't how I thought this project was going to get funded, but it was a very pleasant surprise. I'm really grateful to those who took an interest in the project this time around, and I think you'll be impressed by how it moves forward with someone working on it steadily. And think of how happy the lucky South Carolina graduate student I hire will be: he/she will get some good experience working with grey literature, databases, website management, and GIS, and will also be able to purchase some groceries. It's a win-win. Thank you for your support, donors!


Crowdfunding the Eastern Woodlands Household Archaeology Data Project

4/23/2015


 
I started the Eastern Woodlands Household Archaeology Data Project (EWHADP) a little over a year ago.  The goal was/is to build a website that serves to assemble and freely distribute information about prehistoric house structures in eastern North America.  The current database contains information and county-level spatial data for 2130 prehistoric structures. I've started a campaign on GoFundMe to raise money to support a research assistant to work on the project for a semester. This post explains why.

As I learned when writing this paper, much of the information about prehistoric houses in eastern North America resides in the so-called "gray literature" of CRM reports, theses, dissertations, and unpublished manuscripts. I hoped that the EWHADP would function as a magnet to identify information locked up in the gray literature and make it known and available, allowing us as an archaeological community to capitalize on the work that's already been done. What's the point of information stored in a publication that only a handful of people even know exists? I really think we can do better than that, and we can save ourselves the wasted effort of repeated searches for the same information in the same stacks of legacy materials.

I was able to put a lot of time into the project to get it going, and as it sits now the website is functioning and is visited daily by people who make use of the information there.  I have no idea how much time I put into the endeavor (both to collect the original dataset and to get the website up and running), but it surely runs into the many hundreds of hours. 


With the demands of my job this year and other commitments, I haven't been able to devote any serious time to the EWHADP. There was some forward progress this semester, however, thanks to the efforts of GVSU undergraduate student Emily Gilhooly. She was able to spend a couple of hours per week on the database, consulting primary sources and re-coding the information (primarily reclassifying structure shape and applying a finer chronological scheme). For her trouble she got some experience that will hopefully be useful to her, and she'll be added as a contributor to the database when a new version is released. Thanks Emily!


Emily's work on the database gave me some insight into what it will take to get it fully updated.  She worked perhaps 25 hours and got through about 200 records (about 8 records per hour).  At that rate, it will take about 230 hours to get through the 1850 or so records that haven't been re-coded. Some records go faster than others, of course, and I'm hoping it will go faster rather than slower.  A few hours difference here or there won't change the reality, however, that a significant time commitment will be required to get the database ready for the next release.
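The estimate above is a simple rate calculation. This short sketch reproduces it from the post's own figures (about 200 records in about 25 hours, about 1,850 records left) and shows how sensitive the total is to the rate. The alternative rates of 6 and 10 records per hour are hypothetical.

```python
def hours_remaining(records_left, records_per_hour):
    """Hours needed to re-code the remaining records at a constant rate."""
    return records_left / records_per_hour


observed_rate = 200 / 25  # records per hour from about 25 hours of work
print(observed_rate)                         # 8.0
print(hours_remaining(1850, observed_rate))  # 231.25

# Hypothetical slower and faster rates, to show the spread.
for rate in (6, 8, 10):
    print(rate, round(hours_remaining(1850, rate)))
```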

I would love to have the EWHADP up and running in high gear again for a couple of different reasons: it's an important component of my research agenda for the job I'll be starting at South Carolina in August, and I know that a lot of archaeologists out there are using and will continue to use the information that is being assembled. The EWHADP is also being knit into a larger effort to build an infrastructure of linked archaeological data in North America. None of the effort put into these kinds of projects is wasted when everyone can use it.

I've never done a GoFundMe campaign before, but I thought I'd give it a shot and see if it's a viable way to support something like this. I'm looking for funds to support a graduate student research assistant to bring the EWHADP database and the website up to where it should be (i.e., incorporating all the information I currently have in a clear, consistent format that is useful to others). The goal of $3400 is based on a $12/hour rate for 280 hours (20 hours per week).
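The arithmetic behind the goal, worked out:

$$
\$12\ \text{per hour} \times 280\ \text{hours} = \$3{,}360
\qquad
\frac{280\ \text{hours}}{20\ \text{hours per week}} = 14\ \text{weeks}
$$

So the wage total comes to $3,360, a little under the $3,400 goal, and 280 hours at 20 hours per week works out to roughly a 14-week semester. That also leaves some room beyond the roughly 230 hours estimated above for re-coding the remaining records.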

I'll have some start-up funds at South Carolina that I could potentially use if this campaign falls short or doesn't work at all, but I thought this would be worth a try.  Projects like the EWHADP are on the ground floor of what is going to emerge as a new architecture for using our previously-collected archaeological data to address questions with big temporal and spatial scales.
The data collected by the EWHADP are, and always will be, open access. If I saw someone building a similar database that would add another component (radiocarbon dates, mortuary data, copper artifacts, etc.), I would support it. I hope some of you will support the effort to continue to build this tool.

If you think that it's time we start really leveraging the archaeological information that we've spent untold dollars and person-hours collecting in this part of the country, please consider contributing to this project.


Linking the Eastern Woodlands Household Archaeology Data Project (EWHADP) Database to DINAA: Work in Progress

4/13/2015


 
In previous posts (here, here, and, most recently here), I have discussed what I see as the benefits of building a system of linking archaeological datasets together.  In February of 2014, I started the Eastern Woodlands Household Archaeology Data Project (EWHADP), an effort to assemble information about prehistoric residential structures in eastern North America.  I got drawn into the DINAA project through that and we've been working on building the architecture to link together independent archaeological datasets through DINAA (when I say "we" it's really "them" - I'm a participant but the DINAA people are doing 99% of the work). I haven't been able to spend much time this academic year on the EWHADP, but the people at DINAA have been forging ahead.  So I'm happy to report their progress.

I am third author on a poster that will be presented at the SAA meetings next week that will discuss what they've done to use DINAA to cross-link datasets:

  • Sarah Kansa, Eric Kansa, Andrew White, Stephen Yerka and David Anderson--DINAA and Bootstrapping Archaeology’s Information Ecosystem

The poster will be at the session titled "The Afterlife of Archaeological Information: Use and Reuse of Digital Archaeological Data" on Thursday, April 16, from 6:00-8:00 pm in Grand Ballroom A. I can't be there, but many of the cool kids involved with the project will be, and you should go and talk to them. Linking together independent datasets is going to be a real game changer for archaeological research in this country, and these are the people that are making that happen.

We've done a "pilot" run linking the entries in the most recent published version of the EWHADP dataset to the entries in DINAA.  The electronic matching was not complete: several states remain to be included in DINAA and the attempt to link the datasets revealed some other issues that will need to be resolved (both on my end and their end).  That's exactly the point of doing this sort of thing, though: someone has to go first and figure it out.  I've created an entry in my Database section to provide an Excel file that contains the automatically-generated hyperlinks to site records in DINAA.  The interface from the DINAA end is here (it also references data from the Paleoindian Database of the Americas).
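Part of what a pilot run like this reveals is where the misses cluster. Below is a minimal sketch of that check. It assumes both sides have already been reduced to normalized trinomial strings. The trinomials are hypothetical placeholders, and grouping misses by state number is just one plausible way to separate "state not yet in DINAA" gaps from other mismatches.

```python
import re
from collections import Counter


def state_number(trinomial):
    """Leading digits of a trinomial (the state number), or '?' if absent."""
    match = re.match(r"\d+", trinomial)
    return match.group() if match else "?"


def match_report(structure_sites, index_sites):
    """Compare one dataset's site numbers against an index of known sites.

    Returns (matched, unmatched, unmatched counts by state, match rate).
    """
    matched = [s for s in structure_sites if s in index_sites]
    unmatched = [s for s in structure_sites if s not in index_sites]
    by_state = Counter(state_number(s) for s in unmatched)
    rate = len(matched) / len(structure_sites) if structure_sites else 0.0
    return matched, unmatched, by_state, rate


# Hypothetical placeholders: state 40 stands in for a state not yet indexed.
structures = ["38AA1", "38AA2", "38BB7", "40CC3", "40CC9"]
indexed = {"38AA1", "38AA2", "38BB5"}
matched, unmatched, by_state, rate = match_report(structures, indexed)
print(rate)      # 0.4
print(by_state)  # Counter({'40': 2, '38': 1})
```

Here the two state-40 misses point to a missing state, while the lone state-38 miss (38BB7) is the kind of record-level issue that has to be resolved by hand on one end or the other.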

This step of engineering the first links is important. It is moving linked data from the realm of the hypothetical to the world of the actual. There is much work ahead to really get things knit together, but what they've done so far is not insignificant. I will be able to devote some time to the EWHADP after I'm moved down to South Carolina in the Fall. Stay tuned!

Big Steps, Baby Steps, and the Potential Power of Linked Data

8/7/2014

I just returned from several days at the DINAA (Digital Index of North American Archaeology) workshop that is happening at Indiana University South Bend this week.  The first two years of the DINAA project have focused on building a comprehensive, accessible database of archaeological sites in Eastern North America.  What the PIs and their core team (Josh Wells, David Anderson, Eric Kansa, Sarah Kansa, Steve Yerka, R. Carl DeMuth, Kelsey Noack Myers, and Thad Bissett) have accomplished to date is pretty remarkable:  in the pilot phase of the project, the team has assembled and made available primary data on over 340,000 recorded archaeological sites from ten states (with more data on the way).  This endeavor required navigating numerous technical, logistical, and political challenges.  What a great job they've done.  Bravo!

The DINAA project will benefit the archaeological community (and other constituencies) in a number of ways. Some of those are obvious now, and others will become apparent as we begin to think in practical terms about the power and potential of a large, unified dataset that integrates and crosscuts the traditional (i.e., state-level) territories within which archaeological site data are managed.


The benefit that I am most excited about is the potential of DINAA to act as a "bridge" among otherwise disconnected datasets. The key to this inter-linking is DINAA's granularity: because DINAA uses the archaeological site number as the primary means of organizing information, any dataset that references a site number could be inter-linked through DINAA, regardless of what primary information DINAA itself holds about the site. I'm compiling data on prehistoric house structures, for example, that could be linked through DINAA to other datasets using the key attribute of "site number."

Imagine being able to click on the site number associated with a structure in the Eastern Woodlands Household Archaeology Data Project (EWHADP) database and being led to a record in DINAA that provides links to a database of radiocarbon dates, a spreadsheet of feature contents, images from museum collections or field note archives, or bibliographic references for reports, dissertations, or academic papers that are also associated with that site number. Or imagine being able to query for floral remains from Middle Woodland features within a 100 km radius of the site with that structure. To say that this kind of inter-linking would be a powerful tool for research and scholarship is a great understatement.

As I argued in a presentation via Skype to the DINAA workshop held at the University of Tennessee in March, inter-linking datasets would: (1) allow us to greatly expand the scale of questions we can address; (2) allow us to gather data for addressing those "big scale" questions much more efficiently; and (3) be a catalyst for developing and testing new interpretations of the past. Engineering a system of links to connect diverse datasets would be a game changer.
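Here is a minimal Python sketch of the "bridge" idea, assuming each dataset is a simple list of records carrying a `site` field. Every dataset, site number, coordinate, and field name below is hypothetical. The datasets stay separate; the bridge only indexes them by site number, so a lookup can gather whatever each one holds for a given site. The radius query uses the standard haversine great-circle formula, which treats the Earth as a sphere, so distances are approximate.

```python
import math
from collections import defaultdict

# Hypothetical, independently maintained datasets. The only attribute they
# share is the site number, which is all the "bridge" needs to link them.
structures = [
    {"site": "33XX1", "structure": "House 1", "period": "Middle Woodland"},
]
radiocarbon = [
    {"site": "33XX1", "lab_no": "LAB-0001", "bp": 1850},
]
features = [
    {"site": "33XX2", "feature": "F12", "period": "Middle Woodland", "floral": ["nutshell"]},
    {"site": "33XX3", "feature": "F3", "period": "Middle Woodland", "floral": ["seeds"]},
    {"site": "33XX2", "feature": "F14", "period": "Late Woodland", "floral": ["seeds"]},
]

# Hypothetical site coordinates in decimal degrees (latitude, longitude).
site_coords = {"33XX1": (39.0, -83.0), "33XX2": (39.5, -83.2), "33XX3": (41.0, -81.0)}


def index_by_site(records, key="site"):
    """Group one dataset's records by site number without copying them elsewhere."""
    index = defaultdict(list)
    for rec in records:
        index[rec[key]].append(rec)
    return index


# The bridge maps dataset name -> site-number index; each dataset stays separate.
bridge = {
    "structures": index_by_site(structures),
    "radiocarbon": index_by_site(radiocarbon),
    "features": index_by_site(features),
}


def records_for_site(site):
    """Collect every linked record, from every dataset, that references a site number."""
    return {name: idx[site] for name, idx in bridge.items() if site in idx}


def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points on a spherical Earth."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


def sites_within(origin, km):
    """Site numbers (including the origin) whose coordinates lie within km of origin."""
    center = site_coords[origin]
    return sorted(s for s, xy in site_coords.items() if haversine_km(center, xy) <= km)


# "Floral remains from Middle Woodland features within 100 km of the site":
nearby = sites_within("33XX1", 100)
floral_hits = [
    f for s in nearby for f in bridge["features"].get(s, [])
    if f["period"] == "Middle Woodland" and f["floral"]
]
print(records_for_site("33XX1"))
print([(f["site"], f["feature"]) for f in floral_hits])
```

Nothing is merged here: each dataset keeps its own records, and the bridge holds only pointers keyed by site number.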

At the workshop this week, we spent some time thinking about how to actually accomplish such linking. The structure data I've been gathering make a good "test case," primarily because they span the same area as the DINAA data and include information from many sites (and are also open and freely accessible). Information needs to flow both ways. First, when a site record is called up in DINAA, DINAA should be able to query the EWHADP (and any other linked datasets) to see whether there are records associated with that site number. Second, site/structure records in the EWHADP should include a pointer to the appropriate record in DINAA.

The second part is simple: since the URLs associated with site records in DINAA are "stable," I can just put hyperlinks in my database that point to the DINAA records. The first part is a little trickier. Our first step was to put the current (as of March 2014) EWHADP database on GitHub (here) so that it would be open and accessible. GitHub automatically tracks when changes are made to a file. The next step (I think) will be to configure the GitHub repository so that it sends a message to DINAA when the dataset changes. This will allow the records in DINAA to be updated as new records are added to the EWHADP.
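A change notification is only useful if the receiving end can tell which records actually changed. GitHub can notify another service through a webhook when a repository is pushed to; what happens after that is up to the receiver. Below is a minimal Python sketch of that receiving step only (it does not implement the webhook), assuming the dataset is a CSV with a unique `record_id` column and a `site_number` column; both column names and all records are hypothetical. It compares two versions of the file and lists the site numbers whose linked references should be refreshed.

```python
import csv
import io


def load(text, key="record_id"):
    """Parse one CSV version of the dataset into {record_id: row}."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def diff_versions(old, new):
    """Classify record IDs as added, removed, or changed between two versions."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
    }


def sites_to_refresh(old, new, site_col="site_number"):
    """Site numbers whose linked references should be re-checked after an update."""
    d = diff_versions(old, new)
    sites = {new[k][site_col] for k in d["added"] + d["changed"]}
    # A changed record may have been reassigned to another site, so include its old site too.
    sites |= {old[k][site_col] for k in d["removed"] + d["changed"]}
    return sorted(sites)


# Two hypothetical versions of the structure dataset.
OLD = """record_id,site_number,structure
R1,33XX1,House 1
R2,33XX2,House A
"""
NEW = """record_id,site_number,structure
R1,33XX1,House 1
R2,33XX2,House B
R3,33XX4,House 2
"""

old, new = load(OLD), load(NEW)
print(diff_versions(old, new))
print(sites_to_refresh(old, new))
```

Working from site numbers rather than whole files keeps the update small: only the references for affected sites need to be touched.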

The linking mechanisms are what matter. The EWHADP data do not become a "part" of DINAA; they are simply referenced by DINAA. I maintain control of and responsibility for the EWHADP part of the equation. This is important because compilation of that dataset is ongoing: it's a database of records that I'm still collecting, rather than something like a static set of measurements relating to a single assemblage or site. I hope others who are developing similar datasets will think about how we might link them all together through DINAA. The corpus of scholarship relevant to archaeological work in this part of the world is simply too large and diverse to live in a single place. A distributed approach that uses a "bridge" such as DINAA to link datasets is going to be much more effective and useful than trying to house everything in one central place.

We all have a lot to gain by supporting the construction of this tool.  It is going to unleash the potential energy stored in the work we've already done and provide a "living" structure that will significantly increase our capacities to find, share, utilize, and build on archaeological information. 
The DINAA project needs to move forward in a big way.


    All views expressed in my blog posts are my own. The views of those who comment are their own. That's how it works.

    I reserve the right to take down comments that I deem to be defamatory or harassing. 

    Andy White

    Follow me on Twitter: @Andrew_A_White

    Email me: andy.white.zpm@gmail.com


    Sick of the woo? Want to help sustain honest and open dialogue about pseudo-archaeology on the internet? Please consider contributing to Woo War Two.

    Follow updates on posts related to giants on the Modern Mythology of Giants page on Facebook.

    Archives

    January 2023
    January 2022
    November 2021
    September 2021
    August 2021
    March 2021
    June 2020
    April 2020
    March 2020
    January 2020
    December 2019
    November 2019
    October 2019
    September 2019
    May 2019
    April 2019
    January 2019
    December 2018
    November 2018
    October 2018
    September 2018
    August 2018
    July 2018
    June 2018
    May 2018
    April 2018
    March 2018
    February 2018
    January 2018
    December 2017
    November 2017
    October 2017
    September 2017
    August 2017
    July 2017
    June 2017
    May 2017
    April 2017
    March 2017
    February 2017
    January 2017
    December 2016
    November 2016
    October 2016
    September 2016
    August 2016
    July 2016
    June 2016
    May 2016
    April 2016
    March 2016
    February 2016
    January 2016
    December 2015
    November 2015
    October 2015
    September 2015
    August 2015
    July 2015
    June 2015
    May 2015
    April 2015
    March 2015
    February 2015
    January 2015
    December 2014
    November 2014
    September 2014
    August 2014
    June 2014
    May 2014
    April 2014
    March 2014

    Categories

    All
    3D Models
    AAA
    Adena
    Afrocentrism
    Agent Based Modeling
    Agent-based Modeling
    Aircraft
    Alabama
    Aliens
    Ancient Artifact Preservation Society
    Androgynous Fish Gods
    ANTH 227
    ANTH 291
    ANTH 322
    Anthropology History
    Anunnaki
    Appalachia
    Archaeology
    Ardipithecus
    Art
    Atlantis
    Australia
    Australopithecines
    Aviation History
    Bigfoot
    Birds
    Boas
    Book Of Mormon
    Broad River Archaeological Field School
    Bronze Age
    Caribou
    Carolina Bays
    Ceramics
    China
    Clovis
    Complexity
    Copper Culture
    Cotton Mather
    COVID-19
    Creationism
    Croatia
    Crow
    Demography
    Denisovans
    Diffusionism
    DINAA
    Dinosaurs
    Dirt Dance Floor
    Double Rows Of Teeth
    Dragonflies
    Early Archaic
    Early Woodland
    Earthworks
    Eastern Woodlands
    Eastern Woodlands Household Archaeology Data Project
    Education
    Egypt
    Europe
    Evolution
    Ewhadp
    Fake Hercules Swords
    Fetal Head Molding
    Field School
    Film
    Florida
    Forbidden Archaeology
    Forbidden History
    Four Field Anthropology
    Four-field Anthropology
    France
    Genetics
    Genus Homo
    Geology
    Geometry
    Geophysics
    Georgia
    Giants
    Giants Of Olden Times
    Gigantism
    Gigantopithecus
    Graham Hancock
    Grand Valley State
    Great Lakes
    Hollow Earth
    Homo Erectus
    Hunter Gatherers
    Hunter-gatherers
    Illinois
    India
    Indiana
    Indonesia
    Iowa
    Iraq
    Israel
    Jim Vieira
    Jobs
    Kensington Rune Stone
    Kentucky
    Kirk Project
    Late Archaic
    Lemuria
    Lithic Raw Materials
    Lithics
    Lizard Man
    Lomekwi
    Lost Continents
    Mack
    Mammoths
    Mastodons
    Maya
    Megafauna
    Megaliths
    Mesolithic
    Michigan
    Middle Archaic
    Middle Pleistocene
    Middle Woodland
    Midwest
    Minnesota
    Mississippi
    Mississippian
    Missouri
    Modeling
    Morphometric
    Mound Builder Myth
    Mu
    Music
    Nazis
    Neandertals
    Near East
    Nephilim
    Nevada
    New Mexico
    Newspapers
    New York
    North Carolina
    Oahspe
    Oak Island
    Obstetrics
    Ohio
    Ohio Valley
    Oldowan
    Olmec
    Open Data
    Paleoindian
    Paleolithic
    Pilumgate
    Pleistocene
    Pliocene
    Pre Clovis
    Pre-Clovis
    Prehistoric Families
    Pseudo Science
    Pseudo-science
    Radiocarbon
    Reality Check
    Rome
    Russia
    SAA
    Sardinia
    SCIAA
    Science
    Scientific Racism
    Sculpture
    SEAC
    Search For The Lost Giants
    Sexual Dimorphism
    Sitchin
    Social Complexity
    Social Networks
    Solutrean Hypothesis
    South Africa
    South America
    South Carolina
    Southeast
    Stone Holes
    Subsistence
    Swordgate
    Teaching
    Technology
    Teeth
    Television
    Tennessee
    Texas
    Topper
    Travel
    Travel Diaries
    Vaccines
    Washington
    Whatzit
    White Supremacists
    Wisconsin
    Woo War Two
    World War I
    World War II
    Writing
    Younger Dryas
