I’m an astronomer by profession and I’ve taken up a small side project to analyze the data file that allegedly contains a record of activity between a bank in Russia, the Trump Organization and a server at a health company in Grand Rapids, Michigan during the months leading up to the election. This is Part III of my series of diaries describing this analysis. (Follow links for Part I and Part II and background links at end, including to the original Slate article on this story.)
Why am I doing this?
As I wrote in Part I, there are many aspects of this story that remain unexplained (including the story behind the story) and I believe that a thorough and objective description of what is contained in the data set will better inform those who continue to learn more.
Today’s diary — A focus on the Spectrum Health events: Periodicities, glitches and correlations
Astronomers are scavengers of data. We know so little about the Universe that we are constantly developing new ways to mine the sparse data we collect from our telescopes. Ever since record-keeping began, documented cyclical or periodic events have been among the most sought-after. We now know that periodic behavior is often related to the simple physical laws of gravity and motion that govern the orbits and rotations observed within the Solar System and beyond. Among the most celebrated periodicities of the modern era are those from radio “pulsars,” rapidly spinning neutron stars that emit pulses with a very precise period. Neutron star timing rivals our best atomic clocks. When one of these precisely spinning stars suddenly speeds up, we call the event a “glitch.”
What does this have to do with the server data set? Events on the internet won’t be governed by simple physical laws, but there are still periodicities that can occur. Computer software often includes loops, with actions occurring on a cycle and at regular time intervals. Other fixed timing processes, time-outs, and delays can also produce events with observable periodicities.
When I first looked at the data from the Slate article and produced the plot featured in Part II, the first thing I noticed was the odd periodicities in the data. I was curious about these and decided to analyze them further, particularly in the data set associated with Spectrum Health, a Michigan company.
The Spectrum Health data
One unusual aspect of this story is that along with “pings” associated with a Russian bank, there were others, less frequent, that came from a server at Spectrum Health, in Grand Rapids, Michigan.
What do we see in those data? It turns out that the events from Spectrum Health are the timing clock or “pulsar” of this story. The events occur intermittently, but when they do occur, they follow a 61.01 minute period that remains accurate to within a few seconds over all four months. This is shown in the top plot above. The green triangles indicate when each Spectrum Health event occurred relative to a 61.01 minute clock. Don’t get too caught up in the meaning of the y-axis (“Phase Time”). The key point is that the flat lines, and even the steady diagonal, show that the timing of events remained quite precise throughout.
For comparison the Alfa Bank data are shown in red in the bottom plot. Events occur somewhat randomly with respect to the clock, although there are brief intervals of stability.
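For readers who want to try this kind of plot themselves, the “Phase Time” of an event is simply its arrival time modulo the clock period. What follows is a minimal sketch, not the code behind the plots above. The function name `phase_times` and the timestamps are made up for illustration, and event times are assumed to be in minutes since an arbitrary start.

```python
def phase_times(event_minutes, period=61.01):
    """Fold event times (minutes since an arbitrary start) on a fixed period.

    Returns each event's offset, in minutes, from the most recent tick of
    a clock with the given period.
    """
    return [t % period for t in event_minutes]


# Synthetic timestamps for illustration only (not the real data set).
on_clock = [17.0 + k * 61.01 for k in range(48)]   # follows the 61.01 min clock
off_clock = [17.0 + k * 60.00 for k in range(48)]  # follows a 60 min clock instead

on_phase = phase_times(on_clock)
off_phase = phase_times(off_clock)

# A source that keeps to the clock gives a flat line (tiny spread in phase);
# a source on a different period drifts steadily across the plot.
print(max(on_phase) - min(on_phase))
print(max(off_phase) - min(off_phase))
```

Plotting the folded values against calendar date would give the flat line, diagonal, or scatter described above, depending on how well the source keeps to the clock.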
The Three “Glitches”
When timing is sufficiently precise you can use small timing offsets to learn more about the internals of a system. In this simple analysis of the Spectrum Health data we’re just scratching the surface, but we can easily see three dates on which data “glitches” occurred. At one point a (ratty) flat line shifts to a diagonal, indicating a speed up in frequency. The diagonal then flattens and sharpens, with a return to the original frequency. This is followed by a break in activity, after which the subsequent timing returns with a new offset (phase time). On what dates did these occur?
- Glitch 1: June 23 (Brexit)
- Glitch 2: July 14 (???)
- Glitch 3: July 24-25 (Weekend between RNC/DNC)
Brexit and the intra-convention weekend are called out for reference as they dominated the news on those specific dates. Correlation can’t be assumed, although any explanation for these data should explain the changes on these dates.
In Part II, I noted how the Alfa Bank data showed increases in event rate after June 23 and July 24-25. So it is interesting to find that those same dates show up in a completely different timing analysis of the Spectrum Health data set.
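The link between a diagonal in the phase plot and a change in frequency can be made concrete. If events actually repeat every P' minutes but are folded on a clock of period P, each successive event shifts in phase by P' − P, so the slope of phase against time is (P' − P)/P', and P' = P/(1 − slope). The sketch below is an illustration under stated assumptions: the helper names `fit_slope` and `period_from_drift` are hypothetical, the segment is synthetic, and the phases are assumed not to wrap around within the segment being fitted.

```python
def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def period_from_drift(event_minutes, fold_period=61.01):
    """Estimate the true period from the slope of phase versus time.

    Assumes the phases in this segment do not wrap around the fold period.
    """
    phases = [t % fold_period for t in event_minutes]
    slope = fit_slope(event_minutes, phases)  # minutes of phase per minute
    return fold_period / (1.0 - slope)


# Synthetic segment for illustration: events every 60.90 minutes,
# slightly faster than the 61.01 minute folding clock.
segment = [30.0 + k * 60.90 for k in range(100)]
estimate = period_from_drift(segment)
print(round(estimate, 2))
```

A downward-sloping diagonal corresponds to a period shorter than the folding clock (a higher frequency), and an upward slope to a longer one.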
Was Spectrum Health activity correlated with the Alfa Bank activity?
Yes, on certain timescales. There are correlations in the event rate of the Alfa Bank and Spectrum Health servers, shown in the plot on the right. Over the weeks, the frequency of pings increased, changing most dramatically on the two dates described above and in Part II.
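One simple way to quantify a statement like this is to count events per day for each server and compute the correlation between the two daily series. The sketch below is a minimal illustration, not the analysis behind the plot. The names `daily_counts` and `pearson` are hypothetical, the event lists are synthetic, and times are assumed to be in minutes since the start of the first day.

```python
from collections import Counter
from math import sqrt


def daily_counts(event_minutes, n_days):
    """Count events in each 24-hour bin (times in minutes since day 0)."""
    counts = Counter(int(t // 1440) for t in event_minutes)
    return [counts.get(day, 0) for day in range(n_days)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic event lists for illustration only: both rates rise over ten
# days, but the individual events occur at different times of day.
n_days = 10
server_a = [day * 1440 + 60.0 * i
            for day in range(n_days) for i in range(day + 1)]
server_b = [day * 1440 + 45.0 * i + 7.0
            for day in range(n_days) for i in range(2 * day + 3)]

rate_a = daily_counts(server_a, n_days)
rate_b = daily_counts(server_b, n_days)
print(rate_a, rate_b, pearson(rate_a, rate_b))
```

Two series that both simply rise over time will correlate strongly whatever the cause, so a high coefficient on its own says nothing about why the rates move together.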
The 61 minute periodicity within the Spectrum Health data may also serve as a unique fingerprint. This same periodicity also shows up erratically throughout the latter half of the Alfa Bank data. Is this unusual? I'm no expert, and there are few public data sets to explore to determine what features are common in comparable data sets. However, I did run my analysis on a large database of de-identified DNS lookup events from Los Alamos National Laboratory. While ~60 minute periods are common, I found no examples of a comparable, long-lasting, 61 minute periodicity in the tens of thousands of event data sets that I inspected.
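A common way to search a list of event times for a periodicity like this is to scan a grid of trial periods and measure how tightly the events cluster in phase at each one. The sketch below uses a Rayleigh-style statistic as a stand-in and is offered as an illustration, not as a description of the analysis run on the Los Alamos data. The function names, the 55 to 65 minute search range, the 0.01 minute step, and the synthetic event train are all assumptions.

```python
from math import cos, sin, pi


def rayleigh_power(event_minutes, period):
    """Rayleigh-style power, normalized so that a perfectly periodic
    train gives N (the number of events) and random times give about 1."""
    c = sum(cos(2 * pi * t / period) for t in event_minutes)
    s = sum(sin(2 * pi * t / period) for t in event_minutes)
    return (c * c + s * s) / len(event_minutes)


def best_period(event_minutes, start=55.0, stop=65.0, step=0.01):
    """Scan trial periods (minutes) and return the one with the highest power."""
    n_trials = int(round((stop - start) / step)) + 1
    trials = [start + i * step for i in range(n_trials)]
    return max(trials, key=lambda p: rayleigh_power(event_minutes, p))


# Synthetic, intermittent event train for illustration (not the real data):
# a 61.01 minute clock with every seventh event missing.
events = [12.0 + k * 61.01 for k in range(200) if k % 7 != 3]
print(round(best_period(events), 2))
```

Running the same scan over many independent event lists and comparing where the peaks fall is one way to judge whether a 61 minute peak is unusual.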
Getting back to the question of correlation, I will note that on timescales of hours, the erratic Alfa Bank data are qualitatively different from the Spectrum Health data. While there are similar increases in overall event rate and similarities in characteristic timing, it does not appear as if the two servers were doing the same thing (e.g. as in responding with lookups to an identical stream of e-mails from the Trump server).
A list of questions about these data
On the subject of the Spectrum Health activity, the Slate article reported:
The company said in a statement: “Spectrum Health does not have a relationship with Alfa Bank or any of the Trump organizations. We have concluded a rigorous investigation with both our internal IT security specialists and expert cyber security firms. Our experts have conducted a detailed analysis of the alleged internet traffic and did not find any evidence that it included any actual communications (no emails, chat, text, etc.) between Spectrum Health and Alfa Bank or any of the Trump organizations. While we did find a small number of incoming spam marketing emails, they originated from a digital marketing company, Cendyn, advertising Trump Hotels.”
With this in mind, I thought it might be useful to end this diary with a summary of questions relevant to the above analysis. I divide these into two categories: questions relegated to techno-nerds and/or reporters, and questions for everyone else. Feel free to place yourself in either category, or both.
Questions for techno-nerds and/or reporters:
- Unique fingerprint? Is there any reason to expect a 61-minute period in DNS lookup log data?
- Correlated event rates and timing? Spectrum Health events occur with precise timing. Alfa Bank events are either non-periodic or follow a ragged clock. And yet, the overall event rates are correlated with associated timing “glitches”. Is there any simple explanation for this?
- Is this a closed system or is there significant activity between more than three servers? During this period, were any other servers pinging the Trump server at a rate comparable to that of the Alfa Bank and Spectrum Health servers? There have been reports of other activity, but these claims provide no quantitative detail. The only tabulation I’ve seen suggests that the activity rate from other servers is low (<25% overall).
- Spectrum Health server as a communication channel? Some have reported that the computer scientists behind this story wrote in a white paper that the Spectrum Health server was used as a Tor exit node, which would allow anonymous transfer of data through that server, and that this was exclusively used by Alfa Bank. Reporters searched the Tor exit node archives, which are public, and didn’t turn up a match. Either the computer scientists are wrong, or there’s some misunderstanding about this. Tor archives aren’t that easy to search. Can this be clarified?
Questions for everyone else:
- Mass-marketing? Slate wrote that the Trump server “had a history of sending mass emails on behalf of Trump-branded properties and products.” Reporters obtained from Spectrum Health a Trump marketing e-mail the company received in November 2015. Alfa Bank dug up a Trump e-mail from Feb 2016. Both predate the activity considered above. Nevertheless, other e-mails like these might provide clues for further analysis. Can anyone find other e-mails sent by mail1.Trump-Email.com (to any recipient)?
- Mass-marketing during May-Sep 2016? It would be most helpful to find a Trump marketing e-mail sent by mail1.Trump-Email.com during the dates considered above—May to September 2016. Can anyone find one? (Admittedly, I’m a bit surprised that these haven’t been found, as they could easily falsify the majority of hypotheses about this story.)
- Questions from you? I’ve explored patterns in the data that have caught my attention. Are there any other questions that you’re curious about? Are there dates or patterns that you think worth investigating? Feel free to add thoughts in the comments.
Now that we have our precise ‘clock’ we can use it. In my next diary, I’ll discuss the unusual timing of the Alfa Bank server data, and what we might learn from it.
The original F. Foer Slate story: Was a Trump Server Communicating With Russia? (dkos diary) and follow-up Slate posting. Rebuttal stories (incomplete list): Vox, Intercept, Verge, ErrataSec (follow-up), Medium (N. Jeewa), Logs (J. Camp). Earlier dkos analysis with a different emphasis.