We have had a lot of feedback and interest about our blog post from a couple of weeks ago and the issue of the dating of early modern humans at the site of Ksar Akil, so we thought we would look a bit more at the recent modeling in the PNAS paper of Bosch et al. We should say from the outset that we are hoping to write a reply to the paper but after discovering that PNAS will allow only 500 words and no images, we are thinking that it will have to be somewhere else!! In the meantime we think that it is important to discuss our thoughts now and that if there is a problem with a paper, people should know about it sooner rather than later. Some people will say why not wait, publish a response and go through peer review? Why write up something on a blog like this? Well, our feeling is that waiting months and months to submit to go through a review process is not good enough, especially since that review process is often so poor (as in the PNAS case, where we are told that 'a radiocarbon expert' was involved in the reviewing, but clearly did not know anything about modelling). There is a very interesting discussion at the moment over the merits of publishing online responses to papers where something is seriously wrong and which undermines the conclusions of the paper. We will post more on this later. But back to the Ksar Akil.
So, we have decided that we are going to go through some of the models that we have been looking at over the last few days to see what they can shed on this important question of early humans and their date in this region of the Near East. The first thing to say is that Ksar Akil is a very complex site. According to our colleague Chris Bergman, who knows the site better than pretty much anyone, Tixier once said in Lyons, in a fit of pique, "speak about what you know." Tixier also commented to Chris that, "Ksar Akil is not for beginners". Lots of people talk about the site, but have no knowledge about it or cite secondary sources that are clearly erroneous. So we have to tread very carefully as far as this site is concerned; it was excavated a long time ago and there are known problems and unknown problems. At the risk of veering into a Donald Rumsfeld soliloquy, however, lets move on to discuss the Bayesian modelling of new data from the site.
[Incidentally, for those of you that are perhaps not so interested by the Bayesian side of this and want to just cut to the chase, our best estimate of the age of the human remains and of humans in the Near East, you should go to the last section of this blog].
In the first figure just below we show the final age model of Bosch et al. It looks pretty nice, with a seemingly good stratigraphic sequence of dates running from the bottom to the top. In this case the bottom represents not quite the earliest Initial Upper Palaeolithic (IUP) layers, but rather the start of the dated IUP, ie where they have got their lowest dates from. (For those of you not familiar with models like this, what you see are two different types of distribution. The black outlined distributions are the so-called 'posterior' distributions. These are the distributions that we get after the mathematical modeling has been done. The other ones, the lighter outlines, are the so-called 'likelihoods', basically the calibrated radiocarbon age ranges. These are the data before the modeling takes place. The basis of Bayesian modeling is that you analyse the likelihoods with respect to 'prior information', producing new posterior distributions. For archaeologists most of the time prior information means the relative archaeological sequence data, the layers in the site which are laid down one atop the other. Other information can be used too, more refined stratigraphic information, other dating methods, volcanic ash, numismatic evidence, and lots more, but for the Palaeolithic there is usually only the stratigraphic sequence information).
Anyway, the other thing you can see in the figure is the Lab code (eg GrA-57544, GrA meaning Groningen accelerator) and a bracket to the right of it showing something like "(O:2/5)". These are the outlier analysis prior and posterior values and you can read all about this here. Put simply, and without going into too much detail, outlier analysis is a neat way of allowing OxCal to find the most likely outliers within a sequence model like this, and there are a number of tools in the programme to do this depending on the model. The 'prior' outlier value, set by the user before modelling starts, is the second of the two numbers (the 5 in the first date of the model). This is the value we give to the likelihood being an outlier (as a %) and is usually set at 5% by OxCal. It can be changed by the user, if they think there might be a problem with a particular date but you have to be careful with this, and have a reason for doing so. The first value is the 'posterior' outlier value. This is what the model tells us is the probability of the date being an outlier in the model. You can see that some of these are small; 1% and others are large; 100%. In terms of the modeling itself, a value of 100% means that the model is not including this distribution 100% of the time in the analysis. For a 5% outlier, the result is including in the calculation 95% of the time.
In this Figure on the left you can see the outlier data results from the Bosch et al model 1. Remember the prior is set by the user and the posterior is the OxCal output value. This shows that Bosch et al. have increased the values for certain of the results to decrease their impact on the model. The little black lines in these lines are the users prior outlier setting, so you can see that Bosch et al. have set some of the radiocarbon likelihoods as 50% or even 100% likely to be outliers.
How is this done? How do they select which dates are outliers and which are not?
In their Supplementary methods the authors say : Initially, all measurements were given a prior outlier probability of 0.05. In subsequent steps, dates that were in poor agreement with the model resulting from the previous step were given increasingly higher prior outlier probabilities until the resulting Bayesian model and individual measurements had acceptable agreement indices (i.e., A_model and A_ind >60).
This is a mistaken approach. With OxCal it is recommended practise to use either the Agreement index or the Outlier detection method to identify outliers and consider the robusticity of the model. In the case of agreement indices, the idea is that you run the model, look at results which have low agreement and then steadily 'question' them (basically by removing them) in the model, until you have a model with an agreement index that is more than 60%. You don't look at the outlier model results, you don't actually need to even include outliers in the model when using agreement indices. In the case of outlier analysis, however, there is a subtle difference in that samples are progressively down-weighted as the model runs as they are recognised as being more likely to be outliers. Outlier analysis models are essentially an average between a model in which the measurement is accepted and one in which it is rejected. At the end of the analysis the model tells you a posteriori which results were outlying and by how much. That's why we tend to prefer this approach, it is less subjective in many respects. So, looking at the agreement indices as being a meaningful way of telling you something about the model while using outlier detection analysis is not good practice.
If we start again with the model of Bosch et al. and run it as they would have run it at the start, with all outlier priors set at 0.05 then we get this model shown in the figure below:
You can see that the mid-section of the model has a lot of probability distributions that are bi-model (two humps). In this case this means that the model is having difficulty deciding exactly where the posterior values should go. The mathematical algorithm is having problems converging on a value, so there are two ranges and a lack of precision. We call this 'poor convergence', and you can see the convergence values in the Table of results in OxCal when you use it under the column headed 'C'. Values under about 95% indicate problems. Many of the boundaries (the distributions marking the end and start of phases in the model) in this model are slow to converge, and have low values. This is a warning sign in a Bayesian model that one should tread carefully. There is a problem somewhere in the model, either the dates are a bit wrong or the prior sequence is not right, or the model is not set up right. Either way, it's saying take care. To us it seems clear that the convergence problem is the wide variation in the dates.
The outliers detected in this analysis are shown below, you can see several in the mid-section of the model are quite high, but they are downweighted in the model and approximate what you would get were you to look at the results which have been removed due to low agreement indices.
So, what to do next? Well, probably this would have been the place for Bosch et al. to have stopped and thought carefully about the results and the low convergence issue described earlier. However, they used this combination of outliers and agreement indexes mentioned earlier to raise the prior outlier values and increasingly refine the precision of their model. Before talking too much more about that, however, it is worthwhile looking at another problem there is in their approach.
It is very important that the model is set up in a way that reflects the archaeological interpretion that has been made. In the case of the so-called Phases used in these models, this means that you need to take into account the assumption that there is a 'start' and an 'end' to the activity that is happening in each phase. A phase is simply a group of results that are assumed to have no relative sequencing in them. They are unordered. Intuitively the idea of a start and end date to a phase feels right, we are simply saying that the events we are considering are likely to come from a finite period of time with a start and an end. In the case of OxCal these start and ends are called 'boundaries' and they assume a lot of importance in modeling. This is mathematically described here. Without a boundary, posterior distributions can be unrealistic and stretch the resultant distributions. In most parts of the Bosch et al. model there are no boundaries included. Between levels XIX, XVIII, XVII and XVI, for example, there are no boundaries, they are only included for the start and end of the dated IUP and start and end of the Ahmarian and UP. There are also no boundaries either side of the Sequence command which should also be there. The importance of the boundaries can be demonstrated when we look at the way Bosch et al. sensitivity tested their model by varying some of the parameters.
Sensitivity testing is simply a way of looking at how reproducible your model is. We try and vary the priors to see what effects they have and run the models multiple times to look at repeatability. In the EUP sections there are large differences between their models, one of which is caused by a boundary that they put between XIX and XVIII to simulate a gap between the two in the second of their two models. You can see this quite clearly in their two published models:
It's easy to spot the difference between the two; model 1 on the left and model 2 on the right. Simply adding this break or boundary in the model at one place produces a very different result. This tells us that the model is quite sensitive to the priors. Again, this cautions us to be wary.
One interesting effect of a model that has a lot of outliers and is not set up properly that that they exhibit a tendency to shift around when some of the parameters are changed. When we set up the model of Bosch et al. properly, with boundaries included in the right places, the model simply does not run. This is because OxCal cannot effectively order the data due to the wide variation in the radiocarbon results. In cases like this you've either got to give up or find ways to make the model run, for example by relaxing some of the model priors, reducing the resolution or making the boundaries more uniform and less constraining. There's no space to go into this now but over the next week we will try and work on that some more.
One of things we originally critiqued in our previous blog post was the lack of any constraints in the sections to the early end of the model, where the estimated Date of the key fossil of Etheruda is placed. In the models you can see the long-tailed distribution of the estimate for the age of Ethelruda. We suggested this was wrong and misleading, because the Date command is not constrained and unrealistically spreads earlier than it probably should. The main conclusions of the paper are largely based on the information that this distribution is held to provide, namely, the early date for modern humans before 45,900 cal BP in the Near East. So how can we go about improving on this? Well one thing we can do is to introduce additional chronometric information to the model. Douka et al. published a suite of dates from the site in 2013 in PLoS One. These are mentioned in the Bosch et al. paper but all of the data is ignored in their analysis, despite the fact that the shell dates were obtained using exactly the same methods as the Bosch et al. results on shell, albeit in different labs (the Douka Ksar Akil results were not CarDS analysed as they suggest). Leaving this issue aside for the time being, there are 2 very important determinations in the Douka et al. work that are worth looking at again. These are the 2 that come from the upper Mousterian levels at the site. Bosch et al. didn't date anything in this part of the sequence, their earliest dates are from Layer XXII. We know for sure that the shells from the Mousterian are from here because they had the remains of red clay on them, which is the distinct sediment matrix for this part of the sequence, and on one of the shells was written "XXVIIIA sq. F5 depth 16.55-16.75". This seems pretty good in terms of its context. The previous Mousterian dates, discussed by Bosch et al., are dates of sediment, from an unknown exact context and reported differently in two publications. They chose not to include them in their model, probably for the best.
So, we can use these two Oxford Mousterian determinations to help constrain the age estimate of Etheruda and avoid the unhelpful tail on the age estimate. They seem to us to be good dates from a reliable context that definitely predates Ethelruda. We used the final age model (model 1) of Bosch et al. and added these Mousterian results, to see what would happen. Here is that model :
Essentially the only difference here between their Model 1 and this one is the addition of the two Mousterian dates at the base. You can see the effect on the date estimate for Ethelruda. The two Oxford Mousterian dates have quite high outlier probabilities, of 45 and 36%, so they are downweighted by this amount in the model. They look to be marginally too young in this sense. Again, there are many outliers in the model overall (8 out of 14) so we need to be cautious. The overall effect is fairly dramatic with the inclusion of the new date, with the distribution function calculated for Ethelruda now sitting between 44,380—42,880 cal BP at 95.4% probability rather than >45,900 cal BP as Bosch et al. had it.
In this figure below you can see the comparison rather more clearly.
This analysis shows once more that the model is sensitive to small and subtle changes, and therefore one must be careful with over interpreting the results. In this case it is risky to use these date estimates too much, especially when the estimates are constrained by only a small number of determinations above and below the fossil, and in the case of the Bosch et al. paper by no determinations below the date estimate.
In summary, then, it looks to us as though the original Bosch et al. date range for Etheruda is not as old as >45,900 cal BP. One thing is clear, though, there does seem to be an awful lot of scatter in the data from the site and a high number of outliers. Quite why this is, however, is probably a subject for another post or paper. Our view is that it's to do not with pretreatment chemistry (identical methods are used in both labs), or sample selection, but with issues of curation and complexities in the excavation and site, as alluded to at the beginning. What seems clear to us simply on the basis of the Bayesian modeling is that the claim by Bosch et al. that Ethelruda pre-dates to 45,900 cal BP is not supportable. In the absence of constraints lower in the model it might have been wiser simply to say that Ethelruda couldn't be dated. At the very least it should have been stated that Ethelruda's age was equivalent to, or earlier than, the boundary of the 'start dated IUP' at 45,900—43,210 cal BP. It looks to us as though the real age is younger, when we include constraining data from the Mousterian in the model.
We will keep looking at alternative age models to see whether we can make the model we have set up run in the meantime. It is worthwhile to note finally that there is no one 'right' model, all models are to some extent wrong - but some are useful, to paraphrase Box (1979). In the case of the site of Ksar Akil we are trying to date when different events happened using shell carbonates for dating, because there is no collagen in any of the bones and almost no charcoal. Under ideal circumstances this ought to be a good substrate for AMS dating. In this case there is a lot of variation in the data, and that seems to be telling us something. More on that soon.