Friday, December 6, 2013

Does a baby's eye gaze really predict future autism?

Note: Due to the considerable interest in this post, I've uploaded a PDF version to figshare. This can be cited as:
Brock, Jon (2013): Does a baby's eye gaze really predict future autism?. figshare. 

Baby's gaze may signal autism, study finds. That was the headline in the New York Times. The BBC declared that Autism signs present in first months of life. Turning the hype up to 11, a Canadian website boldly announced that Researchers prove that autism can be diagnosed right at the infant stage and that intervention is possible.

Nature, the journal that published the study, ran with Autism symptoms seen in babies, summarising the findings thus:
Children with autism make less eye contact than others of the same age, an indicator that is used to diagnose the developmental disorder after the age of two years. But a paper published today in Nature reports that infants as young as two months can display signs of this condition, the earliest detection of autism symptoms yet.
Certainly, being able to identify infants who were likely to develop autism would be a ground-breaking advance, opening up the possibility of very early diagnosis and intervention. It would also allow researchers to study the very earliest stages of autism development.

But, as with many studies that receive the full media treatment, there are caveats a-plenty. In fact, it could be argued that the results show the exact opposite of what the authors and the media coverage have suggested.

The study was conducted by Warren Jones and Ami Klin from Emory University. Back in 2002, Klin and colleagues published a study showing that adolescents with autism spent less time looking at the eyes of people in a movie clip than did typically developing adolescents. Although, like all things autism, this seems to be true of some but not all people with a diagnosis.

Since then, Klin and Jones have reported similar results in two-year-old children with autism. Their new study was an attempt to push that all the way back to the very earliest months of life.

Jones and Klin began with a sample of 64 infant boys, 38 of whom had an older sibling with autism, putting them at increased risk of having autism themselves [1]. They also tested 46 girls but later excluded all of them from the analyses [2].

At various time points between the ages of 2 months and 2 years, the infants were eye-tracked as they viewed short video clips of a female caregiver’s face and upper body.

Sample scanpaths for a baby later diagnosed with autism (red) and a typically developing control (blue). Reprinted by permission from Macmillan Publishers Ltd: NATURE, doi:10.1038/nature12715, copyright 2013

Then, at 3 years of age, the by-now toddlers were assessed for autism [3]. 11 boys (10 from the high risk group) were identified as having an autism spectrum disorder (ASD). They were then compared with the remaining 25 typically developing (TD) boys from the low-risk group. 

The 64 boys were divided into 4 groups at their 3-year assessment: ASD; low risk typically developing (TD); high-risk with some autism symptoms (BAP); and high-risk with no autism symptoms (no-Dx). Forty-six girls were also tested but were not included in the reported analyses.

Jones and Klin began their analyses by building developmental trajectories for the two groups, similar to the growth charts that doctors use to tell, for example, whether a baby is putting on sufficient weight or not. Except here, what mattered was the percentage of time the babies were looking at the eyes in the video.

What they found was that the ASD boys showed a steady decline in eye gaze across time. However, there were no significant differences between the trajectories of the two groups until the final test session at 24 months [4].

Reprinted by permission from Macmillan Publishers Ltd: NATURE, doi:10.1038/nature12715, copyright 2013

The obvious interpretation of these data is that, in fact, eye gaze in infancy does not predict which infants will go on to develop autism (at least using Jones and Klin's set-up). Both from a practical and a theoretical point of view, that's an important finding. [5]

So how do we end up with "Baby's eye gaze signals autism"?

Having failed to find evidence for reduced eye gaze in ASD infants, Jones and Klin looked instead at the slopes of the developmental trajectories. In other words, not the amount of eye gaze at a particular time but the change in eye gaze relative to earlier and later time points. Here, they did find significant differences throughout the early months.

Reprinted by permission from Macmillan Publishers Ltd: NATURE, copyright 2013

When the analyses were restricted to the data from 2 to 6 months, the boys who developed autism showed a negative slope (declining eye gaze) while the low-risk boys had a positive slope (increasing eye gaze).

This, in essence, is what all the excitement is about. The study suggests that if you measure a baby's eye gaze at multiple time points before the age of 6 months and notice a decline over time, then that baby is at heightened risk for developing autism. It doesn't matter how awkward the data collection process or convoluted the analysis, there's information about autism in a baby's eye gaze.

However, there was always something about this story that didn't quite add up for me - and it's taken a while to put my finger on it. But before I get to that, there are a few other more obvious points that need to be mentioned.

First of all, the final sample size is very small. This is understandable because, even with a high risk sample, you need to test a lot of babies just to get a dozen or so who develop ASD. But it doesn't change the fact that a study with only 11 participants in one group has to be considered preliminary at best. It wouldn't have taken much to get wildly different results and we really need a replication of this before we get excited.

Second, the most useful comparison would be between the high risk children who develop autism and the high risk children who don't. That's because, in practice, it's very unlikely that such an eye-tracking measure would be rolled out as a universal screening measure. Not only would that be extremely expensive and time-consuming (especially if babies had to be tested on multiple occasions under highly controlled conditions to work out the slope of their trajectory), it would also throw up a huge number of false positives - babies who the eye-tracking test said were likely to develop autism but were never really at risk. So, in reality, only babies with a family history of autism would be tested.  Unfortunately, Jones and Klin don't report this direct comparison [6].

But the biggest issue for me is this: The claim is that babies who develop ASD start off with typical eye gaze but it's the decline in eye gaze in the first 6 months that is the signal of impending autism. However, if the ASD babies start off at the same level as the TD babies then there should also be a significant difference in the amount of eye gaze at 6 months. In fact, the two groups are almost identical in terms of their eye gaze at 6 months. And the only reason there's a difference in slope is that the ASD babies actually start off with greater eye gaze than the TD babies [7]. 
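The logic of that argument can be laid out in a minimal Python sketch. It assumes straight-line trajectories between 2 and 6 months, which is a simplification and not the analysis used in the paper, and every number in it is hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Hypothetical illustration only: straight-line trajectories and made-up
# percentages, not values taken from Jones and Klin's paper.

def level_at(start_pct, slope_pct_per_month, month, start_month=2):
    """Eye-gaze level (%) at `month` on a straight line starting at `start_month`."""
    return start_pct + slope_pct_per_month * (month - start_month)


def implied_start(level_pct, slope_pct_per_month, month, start_month=2):
    """Starting level (%) implied by a known level at `month` and a known slope."""
    return level_pct - slope_pct_per_month * (month - start_month)


# Case 1: both groups start at the same level but have opposite slopes.
# The levels must then differ by 6 months.
asd_at_6 = level_at(45.0, -1.0, 6)
td_at_6 = level_at(45.0, +1.0, 6)
gap_at_6 = td_at_6 - asd_at_6

# Case 2: both groups are at the same level at 6 months but have opposite
# slopes. The declining group must then have started higher at 2 months.
asd_start = implied_start(45.0, -1.0, 6)
td_start = implied_start(45.0, +1.0, 6)

print(f"Case 1 gap at 6 months: {gap_at_6} percentage points")
print(f"Case 2 starting levels: ASD {asd_start}%, TD {td_start}%")
```

With equal levels at 6 months, a difference in slope can only come from a difference in where the two lines begin, which is the second case.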

It would be incredibly interesting if this were true - if boys who go on to develop autism make more eye contact at 2 months than is typical. However, we have to be very careful because, despite the media coverage focusing on the fact that the babies were tested from 2 months, there is in fact very little data from 2-month-olds in either group.

How do we know this?

Figure 1d in the paper includes the data from a single ASD baby. Each dot corresponds to the amount of time spent gazing at the caregiver's eyes in a single video clip. There are, I think, 16 data points for this baby at 2 months.

Matching this up with Supplementary Figure 2b, which combines the data from all the ASD babies, we can see that there are a total of 24 data points. This means that the remaining 10 babies provided just 8 data points between them at 2 months. In other words, this one boy provides the majority of the data points at 2 months. This in turn suggests that there are only two, perhaps three boys with data at 2 months.

It could be that these 8 points all come from one baby. It could also be that they come from lots of babies all providing one data point but, given how much our first baby's eye gaze varies from video clip to video clip, this is going to be an extremely unreliable measure of eye gaze [8].

Adapted by permission from Macmillan Publishers Ltd: NATURE, doi:10.1038/nature12715 copyright 2013

Extended Data Figure 9 in the paper shows the trajectories excluding the data from the recording at 2 months. With the trajectories now beginning at 3 months, there is hardly any difference between the two groups before 6 months. In other words, everything hangs on the rather sketchy data from the babies at 2 months.

Here I've overlaid Extended Data Figure 9a (TD in blue) and 9b (ASD in red) and faded out the irrelevant data, leaving the trajectories for eye gaze based on data from 3 to 24 months (ie excluding data collected at 2 months). Adapted by permission from Macmillan Publishers Ltd: NATURE, doi:10.1038/nature12715 copyright 2013

So, to sum up, it's far too early to be declaring that a baby's eye gaze predicts future autism. Given that I've been pulling the paper apart, I probably haven't emphasised enough quite how ground-breaking and sophisticated Jones and Klin's study is. It's an extremely impressive effort to answer an important question. It suggests that, contrary to expectations, there is in fact very little difference between babies who do and don't develop autism in terms of their eye gaze (at least when the eyes are on a computer screen). This is something that appears to emerge very gradually over the first two years of life.

Future studies beginning even earlier and having even larger sample sizes may uncover evidence of atypical eye gaze in infancy - and there's a tantalising and intriguing suggestion that ASD babies may actually begin life with excessive eye gaze. [9]

However, not for the first time - and certainly not the last - the headlines obscure a far messier reality.


Jones W, & Klin A (2013). Attention to eyes is present but in decline in 2-6-month-old infants later diagnosed with autism. Nature PMID: 24196715


Some great comments below the line. See also the discussion thread on reddit started by Noah Gray, one of the Nature editors, and the comments on the Thinking Person's Guide to Autism Facebook page.


1. It's usually the case in these studies that "low-risk" infants have older non-autistic siblings but this isn't explicitly stated.

2. It's not clear why the authors went to the trouble of testing 46 girls for 3 years only to exclude them later. The rationale given was that only 2 developed ASD, but this should have been expected given that autism is much less common in girls. My suspicion is that reviewers made them take the girls out of the analyses - but this is another reason why having an open peer review process is important.

Update 09/12/13: Ami Klin has informed me that more girls are being tested and their data will be analysed when the sample size is sufficiently large.

3. The ADOS was used as a severity measure in one analysis so was presumably included as part of the assessment. Beyond that, I can't find anywhere in the paper that actually says how the authors determined which kids had autism.

Update 09/12/13: My mistake here. Although in my defence Nature doesn't make things easy by having three variations on the Methods section, each of increasing level of detail. Full details of diagnosis are buried in the Supplementary Information (which is separate from the Methods supplement, which is where I was searching for diagnostic information). In short, diagnosis was a clinical best estimate based on all available information, including ADI-R and ADOS scores (and direct or recorded observation of the ADOS session), language and cognitive assessments, history, and "any other relevant information". Diagnoses were made independently by two experienced clinicians and positive diagnoses were reviewed by a third clinician.

4. This is based on the lower panel of Figure 2e.

5. It's certainly possible that eye gaze discriminates between ASD and TD infants earlier than the 24 months reported and that the lack of statistical significance is just a reflection of the small sample sizes. With more babies tested, perhaps there would be a significant reduction much earlier. But that is speculation based on data that we currently don't have.

6. The high risk kids who didn't meet the ASD criteria were divided further into two subgroups - those who had no diagnosis (noDx) and those who had the "broader autism phenotype" (ie some autism symptoms) but didn't meet the full ASD criteria (BAP). The noDx group appear to resemble the TD group (at least in terms of their trajectory between 2 and 6 months) while the BAP group are intermediate between the TD and ASD groups. Unfortunately, nowhere does the paper say how the BAP was defined or whether the cut-offs were decided upon before the study started.

Update 09/12/13: Again, there were in fact details provided in the Supplementary Information. BAP infants were those for whom there were clinical concerns documented at any one of the clinical assessments. They differed significantly from the noDx group in terms of their ADOS Total scores. It would still have been useful to include a direct comparison between the high-risk babies who did develop ASD and those who didn't.

7. Based on the lower panel of Figure 2e, this increase in eye gaze is statistically significant.

8. It seems as though one of the reviewers spotted this issue because the supplement to the paper includes a re-analysis looking at just the data from 3 to 6 months. This suggests that there was still a difference in the slope between the two groups, but this was barely significant. This is based on the Receiver Operating Characteristic (ROC) curves in Extended Data Figure 9d, where the 95% confidence intervals include chance at low levels of sensitivity (high specificity) but not at high levels (low specificity).

Update 09/12/13: I hadn't noticed that Extended Data Figure 9a has the trajectories based on data from 3 to 24 months. With the data from 2 months excluded, the early trajectories look very similar. For the ASD group, there's a drop from about 47.5% eye gaze at 3 months to 45% eye gaze at 6 months. The TD group are steady at about 45% throughout. If there is a real difference, it's very subtle.

9. By this I mean "more than is typical".


  1. I think that it would be amazing if we could diagnose autism this early in a baby's life, but there is the fear of misdiagnosing a baby. In the Psychology 101 class that I am currently in, we learned that it takes months for a baby to understand their environment and the people around them. Babies need to get a feel for how the world works. I think we should wait until the brain is more fully developed than it is when a baby is born, so that the baby can be diagnosed correctly.

  2. I think that this is a really interesting way to think about the diagnosis of autism. I think that some of the ideas and some of the studies were a little skeptical. However, I agree that there is something that can be said with this research. I think that we need to look further and determine whether or not this is a good way to try to find autism in someone. At the end of the day everyone is just trying to look for the fastest and easiest way to diagnose a person with autism. If this is the quickest and most efficient then I think that we should go with it.

  3. Thanks Jon for a very incisive commentary. One thing that still puzzles me, though. You argue that most data points at 2 mo are coming from one baby. I had assumed there'd been just one testing session per child. But if there were several (with number varying from child to child) how were the data managed for the regression? Did they compute a mean for each baby? With your comparison of fig 1d and the supplementary figure, I wondered if you were saying that they actually threw in all data points, ignoring the fact that they weren't independent? I assume that is not right, as it would be such a basic flaw, but it would be good if you could clarify. Thanks.

  4. Thanks Dorothy. Very good question - and there aren't really any clues in the paper for those (like me) who aren't familiar with the analyses they conducted.

    My assumption is that the FDA and PACE analyses are similar to the mixed random effects analyses I've done in my own studies, where you feed into the analysis each individual observation (rather than the average of all the subject's data points) and the non-independence is taken into account by having "crossed" random factors for item and subject at the same time.

    That may not be exactly how it works but I'm assuming that the analyses are able to deal with non-independence at a particular time point as well as non-independence across time points that you inevitably get in any longitudinal study.

    But it does raise another interesting issue. The videos were drawn pseudo-randomly from a sample of 35 at each time point. It's not clear whether that means the same pseudo-random order for each participant (and if so, what happened when a baby missed a testing session). Reading between the lines, I think that each baby got a different random selection.

    Looking at Figure 1, with the four babies' data, there's a lot of variation from trial to trial even with the same baby at the same time point. How much of this is just noise, and how much is systematic variation across videos isn't clear. If the analyses take this into account (by having the identity of the video as a random factor) then I wouldn't be too worried. If they didn't then everything could just come down to the random selection of videos for the first testing session for the four or five babies who were actually tested at 2 months.

  5. Thank you for your review of this study; it's very helpful! You've read the British studies on gaze aversion (specifically from faces) as a cognitive load management strategy which is similar (and is used for similar purposes) in autistic, typical, and Williams test subjects?

  6. This comment has been removed by a blog administrator.

  7. I think we need to realize that all children deserve better treatment. It is not important to diagnose any child that has or may have Autism, and provide them with different treatment. Stop devaluing disabled children by giving them drugs that are not safe enough for other children. Stop devaluing normal children by denying them helpful accommodations.

  8. Individuals with ASD have also been shown to have a preference for when sound and motion are in sync. This draws them away from looking at eyes. When they watch a speaker's mouth, sound and motion are in sync.

  9. Similar results in eye gaze tests have been seen in unaffected parents and siblings. Autism researchers' great failure is in not looking at the parents and siblings, which gives greater insight into understanding the complexity of autism.

  10. Great post. You should consider reformulating the main points and posting them to PubMed Commons. This will probably prompt a discussion with the authors which should be interesting. And most importantly, everybody who finds the paper on PubMed will have access to your comments!

  11. Thank you for this incredibly detailed and subtle analysis of the data. IMO, this isn't the first time this lab has produced impressive data with questionable interpretations. Interesting to see how slowly eye gaze differences develop over the first 2 years - would love to see some replications! And, if the finding pans out, it'd be interesting to see what characteristics of the child and their visual environment influence these changes for individual kids.

  12. One thing to add to this already great discussion. Take a look at the background of the video frames above - they are perceptually extremely salient. This may mean that what is being measured with this stimulus is not per se a preference for a face, but the ability to attend consistently to faces in the presence of salient competitors. There are some good reasons to suspect this ability is changing during the period being measured (e.g. our work on salience vs. faces, Frank, Vul, & Johnson, 2009, though that would predict an increase in face looking in the TD group). It might well be that these attentional processes are developing differently in the ASD group.

    1. Thanks Michael. In fact Figure 2h shows that the group differences in eye gaze are fairly closely offset by differences in object gaze. "Objects" are defined here as "surrounding inanimate stimuli". It's not clear whether this includes the background (i.e. the white space in Figure 1c) or whether that refers to actual objects in the video (and Figure 1c doesn't have any objects).

      So I think there are two questions really. One is whether there are differences in eye-movements that discriminate between the groups - and the answer seems to be definitely yes at 24 months, probably not before 6 months. And then the second question is why are there these differences? Is it a social avoidance or disinterest thing, or a perceptual / attention thing?

    2. I am often skeptical of these splashy papers, and initially was of this one too, for all of the reasons you cite (but I am not sure you are correct about limited data at 2 months - hard for me to discern).

      Two points: (1) I know Ami Klin and Warren Jones well and they have active plans to roll out eye tracking in the general pediatrician's office throughout Atlanta. So, his ROC curves of ASD vs controls is the target for him.

      (2) While cumbersome, measuring % change in fixation to eyes vs. mouths across age 2 to 6 mo is doable under research lab conditions. And the figures in the paper that were most impressive were Fig 3m and 3n showing group separation when looking at the combined change in eye fixation vs mouth. Yes, a small sample, and yes an apparently arbitrary window (i.e., why not 2 to 9 months?). But still very provocative and worthy of more study.

  13. I've recently reviewed an abstract for IMFAR from what I presume was this lab group (though it was anonymous) where they report 70% sensitivity and specificity for using infant eye-tracking as an early diagnostic marker for ASD. This, combined with the comment above about rolling out eye-tracking in clinics, concerns me greatly. If this method of "diagnosis" is used too soon we will miss 3 in 10 children with autism but also risk misidentifying 3 in 10 children who don't fit the diagnosis. Early diagnosis may well be a valid goal for research (though I question the resources poured in to this given the absence of suitable early interventions to go alongside, and the absence of suitable outcome measures for those interventions...) but we must be sure we are doing it right.

    1. Whichever lab the abstract is from (and I don't want to make any assumptions), 70% specificity and sensitivity would be terrible. It's actually far worse than you think because of the base rate issue.

      Let's assume that this is only used for screening boys, and that the prevalence of ASD in boys is as high as 1 in 55 (1.8%) which I think is the latest CDC estimate. And let's take the 70% sensitivity and specificity at face value (acknowledging that things rarely work out as well in real life as they do in the lab).

      The fact that 30% of non-autistic babies also test positive means that the positive predictive value of the test (that is, the proportion of babies who test positive and actually turn out to have autism) is still only 4%. So for every baby that you're intervening with (perhaps successfully, who knows), there are another 24 families for whom you're rolling out intensive early intervention for no good reason at all.

      From a scientific point of view, it's interesting that the screening tool is above chance. From a clinical point of view, it would almost certainly do more harm than good.
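The base-rate arithmetic in the reply above can be checked with a short Python sketch. The 1 in 55 prevalence and the 70% sensitivity and specificity are the figures assumed in that reply; the function names are illustrative only.

```python
# Sketch of the base-rate calculation, using the figures assumed in the
# reply above (prevalence 1 in 55, sensitivity and specificity both 70%).

def positive_predictive_value(prevalence, sensitivity, specificity):
    """Proportion of positive test results that are true positives."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


def false_positives_per_true_positive(prevalence, sensitivity, specificity):
    """Number of false alarms for each correctly identified case."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return false_pos / true_pos


prevalence = 1 / 55  # assumed prevalence of ASD in boys
ppv = positive_predictive_value(prevalence, 0.70, 0.70)
ratio = false_positives_per_true_positive(prevalence, 0.70, 0.70)

print(f"Positive predictive value: {ppv:.1%}")
print(f"False positives per true positive: {ratio:.1f}")
```

Run as written, this gives a positive predictive value of about 4.1% and roughly 23 false positives for each true positive, in line with the rounded figures of 4% and 24 quoted above.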

  14. Even in situations where we are dealing with serious medical conditions and sensitivity/specificity are reasonably high (eg screening for breast cancer) there is much debate about its value, not least because of adverse consequences of detecting false positives. See Margaret McCartney's book "The Patient Paradox". The idea of this method being used for infant screening is so seriously ill-judged that I wonder if it can be true. Makes me glad we have the NHS in the UK, which would not allow such a thing without a proper consideration of costs/benefits, plus a much more solid evidence base. Also makes me cynically wonder whether this is all about money: pediatricians can charge their patients for a whizzy new test of dubious validity?

  15. Great post Jon, and great comments. I have nothing to add except keep it up!