From YouTube: IETF 110 IRTF Open
Description
The Internet Research Task Force (IRTF) Open session will be held at IETF 110 on 8 March 2021 at 12:00 UTC and will include Applied Networking Research Prize (ANRP) presentations by Francis Y. Yan for his work on applying machine learning to video bit-rate adaptation ("Learning in situ: a randomized experiment in video streaming", Proceedings of USENIX NSDI 2020), on "Network topology design at 27,000 km/hour", Georgia Fragkouli on "MorphIT: Morphing Packet Reports for Internet Transparency", and Audrey Randall for her work on DNS caching and privacy ("Trufflehunter: Cache Snooping Rare Domains at Large Public DNS Resolvers", Proceedings of ACM IMC 2020).
A (Colin Perkins, IRTF Chair): All right, so I make it about five past, and people seem to be joining slower, so I guess we'll get started. So welcome, everybody, to IETF 110; we're online again rather than in Prague, and I guess we're getting used to this.
So, for perhaps the first time this week, a reminder that the IRTF follows the IETF intellectual property disclosure rules, and that by participating in the IRTF meetings and the IETF meetings you agree to follow the intellectual property disclosure procedures, and that if you make a contribution to the meeting then you must disclose if there is a patent or a patent application relating to that contribution.
In addition, a reminder that we may be taking audio, video, or photographic records of the meetings, and certainly this session is being recorded and is going out live on YouTube, as well as being recorded for the proceedings, and will be available on the IETF websites afterwards.
In addition, please remember that we encourage people to work respectfully with the other participants, and if you have any questions or concerns about the behavior of participants in the IETF, we have the ombudsteam, who will help deal with that, or please talk to me if you have any concerns. The IETF code of conduct and the anti-harassment procedures also apply to the IRTF and the IRTF meetings.

So, the goals of the IRTF: the IRTF is a parallel organization to the IETF, which focuses on some longer-term research issues relating to the Internet as a whole, while the IETF does engineering and standards making.
The IRTF is a research organization; it's not a standards development organization. While many of the IRTF research groups co-locate with the IETF meetings, to encourage discussion and cross-fertilization between the two communities, the IRTF is not a standards development organization, and while the IRTF can publish informational or experimental RFCs, the primary goal is to promote collaboration and teamwork and to explore some of the research issues relating to Internet protocols, applications, architectures and technologies.
We have a number of ways you can stay informed about the work the IRTF is doing. We have the IRTF-Announce mailing list, and you can see the URL for joining at the bottom of the slide here; it's a low-volume list for announcements. We also have the IRTF-Discuss list for discussion about Internet research related topics.
We have the main irtf.org website, which has information about the organization, links to all the research groups, the prize-winning talks, and the workshops we organize. We're also on social media: we're on Twitter, we're on Facebook, we have a LinkedIn page (which unfortunately isn't linked from the slide), and we have a presence on the SIGCOMM Slack channel as well, so do look out for us on the various social media forums.
The IRTF is organized as a set of research groups. There are 14 research groups currently, and 13 of them, all of the research groups apart from the Decentralized Internet Infrastructure group, are meeting later this week, so please do look out for those sessions in the agenda. I believe the next one is the Measurement and Analysis for Protocols group, which will be meeting in the slot immediately after this one.
And, as I said, the IRTF also publishes RFCs. The only RFC published in this cycle on the IRTF stream is RFC 8975, which talks about network coding for satellite systems, and which came out of the Network Coding research group. But I expect there will be a number of RFCs from the Crypto Forum group and from the Information-Centric Networking group published in the relatively near future; there are a number of them getting close to that point.
The Applied Networking Research Prize is awarded to recognize the best recent results in applied networking research. It's awarded to recognize interesting new research ideas that are potentially of relevance to the Internet standards community, and to recognize up-and-coming people who are likely to have an impact on Internet standards and technologies. In particular, we're trying to bring in people who would not otherwise engage with the IRTF and with the IETF community, and to try to give exposure to ideas or people that would otherwise not get it.
The details of the prize: you can see them at the URL, irtf.org, and you can find links to all the past prize-winning talks there. We award six prizes each year, typically two for each meeting, and we always get a large number of nominations; I think there were around 70 nominations for the prizes this time. So this is reasonably selective, and we've got some really nice talks coming up and some people doing some really nice work.
The prize-winning talks for today, I'm very pleased to announce, are from Francis Yan, who will talk about his work on machine learning for video bitrate adaptation first of all, followed later in this meeting by a talk from Audrey Randall, who will be talking about DNS caching and privacy, and using DNS snooping to detect malware. So we've got two really nice talks coming up later, and they're also archived on the website; you can see the URL on the slide. We'd like to thank the Internet Society, and Comcast and NBCUniversal, for helping organize and for sponsoring these prizes.
In addition to the Applied Networking Research Prize, we also run the Applied Networking Research Workshop. This is organized in conjunction with ACM SIGCOMM and is an academic workshop that co-locates with the IETF meeting in July each year. I'm pleased to announce that the ANRW this year will be chaired by, the program chairs will be, Andra Lutu and Nick Feamster, and it will be happening in conjunction with IETF 111 in July this year; the call for papers should appear later this week. The paper submission deadline will be the 21st of April. This is again a forum for applied networking research: a forum for the Internet research community, network operators, and the standards community to come together and discuss recent results and emerging ideas in applied networking research.
So if you're doing networking research, please do consider submitting your research to this workshop. We're looking for both academic and industry-related research, across the broad spectrum of applied networking research.
And that's about all I have to say. For the remainder of this meeting we have the two Applied Networking Research Prize winning talks, starting with Francis Yan, who will be presenting the learning-in-situ work, and then Audrey, who will be presenting on Trufflehunter. But before we get to that, I just want to pass over to Mat Ford from the Internet Society, who sponsor the Applied Networking Research Prize, to say a few words. Mat, over to you.
C (Mat Ford, Internet Society): Thanks very much, Colin. As Colin mentioned, most of today's meeting is taken up with the talks from the Applied Networking Research Prize winners, and I want to say that it's an honor for the Internet Society to be able to support the IRTF in delivering the ANRP, and in this, its 10th anniversary year. I had to go back through the website and look at the past prize winners, and realized that this is in fact 10 years since we started this initiative. I think the ANRP prize-winning talks remain a highlight of the IETF week, whether that's in person or virtually, as it is now.
As Colin mentioned, we've received sponsorship from Comcast and NBCUniversal; they are long-standing supporters of the ANRP. If you know of, or can think of, another potential sponsor for this, do please get in touch with me. My email address is pretty easy to find; it's ford@isoc.org. I'd be very keen to add to that list of sponsors for the ANRP, if at all possible.
So congratulations to Francis and Audrey, and I look forward to listening to your talks today. Thanks.
A: Okay, thanks Mat. So with that, we should get started with the talks. The first of the two talks today is by Francis Yan. Francis is a senior researcher at Microsoft Research and Azure for Operators, Office of the CTO.
A
His
research
seeks
to
improve
network
systems
by
creating
learning-based
algorithms
that
can
prep
that
can
be
practically
deployed
and
he
completed
his
phd
in
computer
science
at
stanford
university
recently
he's
received
the
nsdi
community
award
and
the
atc
best
paper
award
and
for
his
phd
research
and
before
his
phd,
he
graduated
from
chiang
mai
university,
where
he
received
a
bs
in
computer
science
and
a
ba
in
economics
and
he's
going
to
talk
to
us
today
about
learning
in
situ
a
randomized
experiment
in
video
streaming
and
the
the
talks
are
pre-recorded.
A
D (Francis Yan): This work was completed back at Stanford, advised by my former PhD advisors, Keith Winstein and Phil Levis. Now let's get started. The problem we're tackling here is adaptive bitrate streaming, or ABR, which is a critical algorithm used to carry a large portion of the video traffic on today's Internet. At a high level...
This problem is non-trivial because, let's say, ABR believes it's okay to send 1080p all the time, which gives the highest video quality. But what if the network capacity suddenly drops to a level that's unable to deliver 1080p anymore? From that moment on, the playback buffer in the client's player will be drained slowly, eventually resulting in video stalls.
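(As a rough sketch of the buffer dynamics being described, with illustrative symbols that are not from the talk itself: suppose each chunk carries T seconds of video encoded at bitrate R, downloaded over a link of capacity C. Then one chunk changes the playback buffer by)

    \[
      \Delta B \;=\; T - \frac{R\,T}{C},
    \]

which is negative whenever C < R, so the buffer shrinks chunk by chunk and playback stalls once it reaches zero.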
D
This
talk
will
be
in
three
parts.
First
I'll
describe
puffer
a
live
streaming
platform
for
video
streaming
research,
then
I'll
show
a
surprising
finding
from
a
randomized
experiment.
We
performed
on
puffer.
That
is
the
confidence
intervals
on
the
performance
of
avr.
Algorithms
are
much
bigger
than
we
realized.
It's a live TV streaming website, open to the public in late 2018, allowing users to watch six TV channels for free. Our goal was to create a realistic testbed and a learning environment for the community to investigate video streaming algorithms, and we also operate Puffer as a randomized experiment of ABR schemes.
D
But
you
won't
be
aware
of
this
assignment,
while
you're
watching
tv
on
buffer,
our
server
will
record,
which
apr
algorithm
is
used
along
with
some
other
client
telemetry
on
video
quality
and
playback
buffer
for
analysis
purposes
to
recruit
users.
We
purchased
ads
on
google
and
reddit
for
keywords.
Like
live
tv,
the
other
users
we
attracted
came
from
the
press
articles
covering
puffer,
for
example,
new
york
times
recommended
puffer
to
those
who
need
free
tv
to
watch
at
home
during
the
pandemic.
Not only are the results in the paper reproducible: all the user data collected on Puffer is automatically posted to the website every day, after anonymization. You can select any of the data and view the algorithm performance we plotted in figures, but you can also download the data and do the analysis yourself using our scripts.
This figure shows the algorithm performance using the data collected on a single day in January 2019. Puffer streamed more than 17 days of video to about 600 users watching TV on that day. Since I'm going to present this type of figure several times, let's take a closer look. First, the y-axis shows the video quality, measured by a standard metric, SSIM; higher is better. On the x-axis is the time spent stalled.
So we left the experiment running for a week and collected 42 days of video. Now the confidence intervals became smaller, but still not enough. For instance, let's look at the scheme MPC-HM: its mean stall ratio is about 0.4 percent, but the confidence interval ranges from 0.1 percent to 0.9 percent, twice as large as the mean value.
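(To make the "twice as large" concrete, using just the numbers quoted above:)

    \[
      0.9\% - 0.1\% = 0.8\% = 2 \times 0.4\%,
    \]

so the width of the confidence interval is double the estimated mean stall ratio itself.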
It selects video bitrate based only on the user's playback buffer level. MPC-HM and RobustMPC are two variants of an ABR scheme from SIGCOMM 2015.
Unfortunately, its performance did not generalize from simulation to the real Internet. For instance, although its rebuffering ratio is lower than BBA's and MPC's, its average video quality is worse than theirs.
Let's now move on to their cold-start performance, that is, how well they perform in new sessions to which they hadn't streamed any video. We plot the average video quality of the first chunk served in such new sessions on the y-axis, and the startup delay on the reversed x-axis, on a cold start to a new session.
Prior work argues that, since the ABR algorithm knows nothing about the network conditions of the new session, it needs some session-clustering algorithm to determine the initial chunk quality based on other, similar sessions. Otherwise ABR algorithms will have to choose the first chunk blindly, which could be too conservative or too aggressive; we don't know.
By contrast, Fugu provides an alternative option. Recall that one of the input features of Fugu's TTP is TCP statistics, such as RTT measurements, which are actually available as soon as the underlying HTTP, TLS, or TCP connection is established. Knowing this information turns out to allow Fugu to begin safely at a higher first-chunk quality than the other schemes, while maintaining roughly the same level of startup delay.
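(A minimal, hypothetical sketch, not Puffer's or Fugu's actual code, of reading the kind of kernel TCP statistics a predictor like the TTP can use as soon as a connection exists. It assumes a Linux struct tcp_info layout in which tcpi_rtt is the 16th 32-bit field; the field position is an assumption and can differ across kernels.)

    # Illustrative only: read the kernel's smoothed RTT for a brand-new connection (Linux).
    import socket
    import struct

    sock = socket.create_connection(("example.com", 443), timeout=5)
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)

    # 8 one-byte fields, then 32-bit counters; tcpi_rtt (microseconds) assumed at index 8 + 15.
    fields = struct.unpack("8B24I", raw)
    rtt_us = fields[8 + 15]
    print(f"RTT known before sending any video: {rtt_us / 1000:.1f} ms")
    sock.close()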
A: Okay, thank you; an excellent talk from Francis. Francis, if you want to turn on your video: I see we have a whole bunch of questions and conversation in the chat. I hope not everyone has spent their whole time asking questions in the chat and that there are some left for Francis.
D: Okay, so regarding the watch time: time on site is definitely a useful metric for the industry. But the observation we had is that users chose to just watch Fugu longer than the other algorithms; there's a CDF graph in our paper, and typically, for sessions longer than five minutes, users tend to stay with it. But other than that we really don't understand the reasons and what other factors there might be affecting that user behavior. The other question is: can Fugu predict whether having a lower base quality in the manifest of encoded qualities would eliminate more stalls? Okay, yeah, that's a good point; thanks, David, for the question.
We have 10 versions for each video track, including four resolutions with different CRF encoding parameters, and we spread out the bitrates. We have a dashboard to monitor whether the 10 versions are evenly spread out in terms of their bitrates, their sizes, and their SSIM values. So we picked 10 levels, and as for the lowest base quality, I believe our base quality is already low enough for users. Yeah, okay.
E: Perfect. So I'm not a specialist on video coders, but you have a couple of others there that you compared with; are these used in practice, or do you have any knowledge about what the big video platforms are using and how this would compare?
D: The other ABR algorithms; so, I would say all the other four ABR algorithms are research algorithms, and in terms of industry adoption, I think probably a variant of BBA is used by Netflix, because it was proposed by one of my friends, actually my colleagues, at Netflix. That's why we compared with them, but I'm not sure if any real video service providers are using them.
E: Yeah, I mean, I read some papers where they try to kind of understand how these algorithms work, and it's usually kind of a black-box thing. But the one thing I got from that is that these algorithms also change very frequently, so it's probably interesting to find out more there, but also hard, as a researcher. Thank you.
D: Yeah, no problem. And actually, I would say they're not as black-box as we thought. For instance, BBA is pretty simple: below a threshold on the playback buffer size, maybe three seconds, we ask the video server to send the lowest quality, and above maybe 12 seconds, the highest.
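(A toy sketch of the buffer-based idea just described: lowest quality below a small reservoir, highest above an upper threshold, and a simple interpolation in between. The thresholds are only the illustrative numbers from the answer above, and this is not Netflix's or Puffer's implementation.)

    # Toy BBA-style chooser: pick a rung of the bitrate ladder from the buffer level alone.
    def choose_bitrate(buffer_s, ladder, low=3.0, high=12.0):
        """ladder: available bitrates in ascending order (e.g. kbit/s)."""
        if buffer_s <= low:
            return ladder[0]           # low buffer: be conservative
        if buffer_s >= high:
            return ladder[-1]          # plenty of buffer: send the best quality
        frac = (buffer_s - low) / (high - low)
        return ladder[int(frac * (len(ladder) - 1))]

    # Example with a 10-rung ladder and 7.5 s of buffered video.
    print(choose_bitrate(7.5, [300, 500, 800, 1200, 1800, 2500, 3500, 5000, 7000, 9000]))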
D: Yeah, yeah, thanks. And, Ali in the chat: Netflix never confirmed or denied using BBA or anything else. I think when we gave a talk... so BBA was proposed by, you know, T.-Y. Huang when she studied at Stanford, advised by Nick, and then, I believe...
F: That was a decade ago, so I'm sure Netflix has advanced since then.
D: Yeah, and when we gave the Puffer talk at Netflix, they didn't reveal any confidential information, so I'm not saying anything, not a lot. But you are right: maybe they have deployed or are using a variant of BBA, or maybe not; I don't really know. Actually, has there been similar work on video conferencing systems, Zoom and so on? Yes: ABR's corresponding work in video conferencing is bandwidth estimation or, broadly speaking, congestion control for real-time video. It also adapts to varying network conditions, by changing the sending bitrate of the video encoder, and actually I'm working with Ali on that.
A: So Jonathan had a question, I think.
D: Let me see. I don't think we reported any percentile or tail performance in the paper, but we did look at them, and especially in those figures we have included the confidence intervals, so we're confident that the interval contains the mean value.
A: Thanks. So, can you maybe say something about some of the difficulties or challenges in running this sort of research experiment, as a grad student, at this sort of scale, essentially one of the larger experiments in the space?
D: Yeah. So, in my experience, having real users is like having real impact on the world, and that's super exciting to me, but the availability, I would say, is the biggest challenge. Usually when we write some code it can work 99% of the time, but if real users are watching all the time, then the system should never go down too frequently, and as soon as Puffer stops working I receive user complaints in emails, so I'm essentially on call 24/7. That's why we built a monitoring system: any time there's a bug making the service crash, I receive an alert, ahead of the user emails (thanks to our users for letting us know, of course). I will always try to fix things: maybe I'm in the middle of the night, and I would still get up and fix the bug immediately. So that's the biggest challenge, availability, and especially for me, being almost the only engineer working on developing the platform, it's really hard to maintain it.
A: Yeah, it's a challenge. My other question: to what extent, I mean, obviously the specific results you've got relate to video, but to what extent do you think the types of issues you're running into, with needing to do very large-scale, very long-running measurement studies, apply to other types of network measurement research? Do you think you'd find the same issues of confidence intervals and so on if you repeated other types of network measurement experiments?
D: Yes, I would expect the findings to generalize to other network measurements, because we did see heavy-tailed user behavior, and we believe part of the reason why we observed such a noisy sliver of the Internet was the network per se: the inherent issues and heavy-tailed nature of the network. My past research studied congestion control, and we also observed very different findings from those reported in previous research papers, because when you measure congestion control on a larger real-world testbed, and over many, many runs, we tend to see different and noisy results.
I'm not sure how to solve this issue, like writing research code on a production system. I think typically people will just reproduce, or fix potential issues or bugs in the research code, but in our case we wanted to faithfully compare with the other algorithms and report and evaluate their performance.
For our algorithm, and the other deep reinforcement learning based algorithm, yes, we have to take into account the compute time and the compute resources required, but fortunately none of the algorithms consumes too much compute, and it typically takes just several milliseconds to compute the ABR decisions online, so that's not a bottleneck.
I think you're asking about Fugu's model-based controller, right? So the information includes the past eight chunks, their transmission times and their sizes, and also the size of the chunk to send, plus low-level TCP statistics. Those are the input features, as the state, and they're also the updates provided to the controller; sorry, those are the inputs to the TTP, the transmission time predictor. For the model-based controller, the input is the current playback buffer level and all the necessary chunk sizes, because it needs to run this dynamic programming, also known as value iteration, algorithm online.
Yeah, that's great. So, does it leave slow start often? First of all, the Pensieve paper, in SIGCOMM 2017, if I remember correctly, disabled the timeout, the TCP timeout, such that it never, or hardly ever, leaves slow start; sorry, it hardly ever leaves the congestion avoidance phase, so it doesn't return to the slow start phase. Because when you send a video chunk every two to four seconds, we don't want congestion control to ramp up every time starting from the slow start phase, so we can definitely disable that. In our case, I believe we also disabled it, so it shouldn't go back to, or stay in, slow start; we're better off that way.
A: All right. So thank you, Francis, a really nice talk and some really good discussion there. I would normally say that Francis will be around for the rest of the week and you should grab him in the break if you have any questions; clearly this is a little bit difficult, but hopefully Francis will be able to join the Gather town in some of the breaks, or drop him an email or a message in the chat if you want to talk further about this work.
A: All right, so at this point we will move on to the second of the prize-winning talks today, which is by Audrey Randall. Audrey is a third-year PhD student studying Internet measurement and security at the University of California, San Diego. Her research interests center around measuring and understanding harmful behavior on the Internet, from underground crime to stalkerware to DNS hijacking.
She received her bachelor's degree in computer science from the University of Colorado Boulder in 2018, and her talk today is on cache snooping rare domains at large public DNS resolvers to detect malware. So, if we can play the video.
B (Audrey Randall): If you could observe enough DNS requests, you could presumably study these types of harm in more detail: you could figure out how prevalent they are, where they occur, how frequently they occur. But to do that, you're going to need to observe a lot of DNS requests, because you're looking for the needle in the haystack; you're looking for a very small amount of signal in a large amount of noise.
It used to be that it was only power users and people who were really tech-savvy who would be using public resolvers, but we're starting to see them get hard-coded by default. For example, Google home routers all use Google's quad-8 service by default, and Firefox routes all of its DNS queries to Cloudflare.
Well, of course the answer is yes: there is a well-known technique that's been around since at least 2004 called DNS cache snooping. But in the past it's been presented as an attack, and it's considered a privacy threat, and for good reason. Most of the time, what researchers were doing when they did cache snooping was scanning the whole Internet and trying to see which devices would answer a DNS request.
So for the remainder of this talk, I'm first going to go over some brief background on cache snooping, for anyone who hasn't seen the details in a while, and then I'm going to talk about how to do it on public resolvers. To do that, you need to understand their caching strategies, so we as researchers had to reverse-engineer the caching strategies of four large public resolvers.
What you can do, if you are a snooper, is make a request for example.com but set a flag that tells the resolver it is not allowed to check the authoritative name server. That way, if you get a valid response back, with a valid IP and a valid TTL (time-to-live) value, then you know that the domain was cached.
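(A minimal sketch of that probe using dnspython; the domain and resolver address are placeholders, and this is not the Trufflehunter code itself.)

    # Cache-snooping probe: clear the RD (recursion desired) flag, so a valid answer
    # with a TTL can only have come from the resolver's cache.
    import dns.flags
    import dns.message
    import dns.query

    query = dns.message.make_query("example.com", "A")
    query.flags &= ~dns.flags.RD
    reply = dns.query.udp(query, "8.8.8.8", timeout=2)

    if reply.answer:
        print(f"cached, TTL = {reply.answer[0].ttl}")
    else:
        print("not cached (or the resolver refused the non-recursive query)")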
But the thing about cache snooping as a measurement technique is that it only provides a lower bound on the number of users that are accessing a domain. If multiple users have hit the same cache for the same domain before that TTL expires and the record is removed from the cache, you won't be able to observe them.
But cache snooping on a single resolver is actually reasonably straightforward; in order to do it on a public resolver, things get significantly more complicated. So let me talk next about how public resolvers work, in broad terms and then in more specific terms. When a user wants to send a request to a public resolver,
that query is first routed, using IP anycast, to the first available, or the closest, point of presence, or PoP. Once there, it can be routed to one of any number of front-end caches (there are a lot of these), and if it misses in those front-end caches, it will be sent to one of usually several back-end resolvers.
At this point I have to introduce the concept of a TTL line, which is just our model of how a TTL decreases in a cache. A TTL in a cache ought to decrease by about one second per second, so if you plot a bunch of measurements that have all hit the same cache, plotting their timestamps against their TTLs, you ought to see the TTL decrease by one second per second; that's this green line in the figure here.
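(One way to read that model in code, a simplified sketch rather than the paper's implementation: responses that hit the same cache share, up to measurement noise, the same implied expiry time, timestamp + TTL, so grouping on that value recovers the TTL lines.)

    # Group (timestamp, ttl) observations into TTL lines: observations from the same
    # cache have a roughly constant timestamp + ttl (the record's implied expiry time).
    from collections import defaultdict

    def ttl_lines(observations, tolerance=2):
        lines = defaultdict(list)
        for ts, ttl in observations:
            expiry = ts + ttl
            key = next((k for k in lines if abs(k - expiry) <= tolerance), expiry)
            lines[key].append((ts, ttl))
        return lines

    # Two probes hitting one cache, then a probe that filled a fresh cache.
    print(ttl_lines([(0, 3600), (100, 3500), (200, 3600)]))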
We assume that each of these measurements filled a new cache, because they came back with the maximum TTL value, and we were able to confirm this because we controlled the authoritative name server for the domain we were querying. So we could confirm that, each time, our authoritative name server got a new request.
You'll also notice that all of the measurements that are not in the top row of circled dots lie on one of the TTL lines, so they look like they came from one of the caches that we observed being filled. So that's great: it means that OpenDNS's and Quad9's caching architecture is reasonably straightforward.
When we ran this experiment on Cloudflare, we got a very different-looking graph. We do get a first measurement which came back with the maximum TTL, but all of the measurements after that look like they came from the same cache, which we would have thought would be unusual in a resolver of Cloudflare's size.
You can also see that, for a while, the measurements look like they're exactly on the TTL line, but then they start to drift over time, and we noticed that they would always drift upward. So what we think is happening is that Cloudflare has a shared, distributed front-end cache: as soon as a measurement arrives in one cache, it is shared with all of the others.
There is a question, with Cloudflare's strategy, of whether or not it is completely compliant with the DNS RFC for TTLs. The maximum drift away from the true TTL value that we saw was about 80 seconds. We were using a domain at the time with a TTL of about three hours, and we saw that there were still records in cache whose TTLs hadn't yet expired for about 80 seconds after they should have expired.
Now, it's important to note that the drift scales with the maximum TTL, so probably, even if you have a 60-second TTL, you're only going to have a drift of a few seconds, and that's probably not going to be an issue for you even with such a short TTL; and if you have a long TTL, you're probably tolerant of more drift. So we concluded that the actual problems here are likely to be very small.
One prior study found that they could make requests and get an accurate TTL back on the original request, but then they would keep making requests and would find subsequent TTLs to be wrong, because it looked like those TTLs were coming from caches that had never been filled. Another study noticed the same effect and called these mystery caches "ghost caches", which we thought was a great name for them. So why on earth are these caches getting filled without being queried?
So what we think is happening is this: Google is using what we call a dynamic caching strategy. When a request comes into Google and it misses in a front-end cache (that's this light blue cache here), then it's going to get forwarded to a back-end cache; and let's assume that that back-end cache already has the record, with a TTL less than the maximum value.
B
So
you
can
think
of
it
like
this.
Every
request
that
comes
into
google
dns
has
a
chance
to
spawn
a
new
cache
that
is
visible
to
cash
snipping.
So
that's
great
news
for
us
as
researchers
running
a
measurement
study,
because
we
will
see
a
much
greater
percentage
of
unique
queries
on
google
than
we
will
anywhere
else.
They did expire, even if the TTL had not yet reached zero, when the original back-end cache expired, so that's good. But we noticed that a user could make a request just before the TTL of the back-end cache expires and get a cache that had just been filled, and that could lead to extending the TTL to twice as long as it should be.
Whether or not this is actually a problem is not for us to decide. We couldn't think of a use case where it would be super problematic, but we do have the question of why this would be a useful strategy: why store the maximum TTL in the front-end caches, rather than just copying the TTL from the back-end caches?
Now, it's great that Google did this, from our point of view, because it really enabled our measurement study, but we couldn't come up with a reason why it would be more efficient or more performant to do that. So if anyone is here from Google, or if anybody wants to weigh in, I would love to get somebody's thoughts on that when I'm done with this talk.
So, to summarize: OpenDNS and Quad9 appear to have a pretty straightforward caching strategy, and we don't think that that caching strategy ever manipulates the TTLs of the responses at all. Cloudflare has this shared and distributed front-end cache, and we do notice that the TTLs are affected slightly by it.
But we don't think that's likely to be too much of an issue, because the drift is so small compared to the length of the maximum TTL. And Google has what we call a dynamic caching strategy, and that can result in the TTL received by the client being about twice as long as it should be, because you could receive a maximum TTL right before the back-end cache's TTL was set to expire; so you should have been receiving a very small TTL, and instead you receive one that's closer to the maximum.
All it does is send continuous DNS queries across the US for the domains that we're interested in. When it gets the responses back, it interprets them according to our models, to try to figure out how many caches were filled, and we go from there to estimating counts of users. In some cases we have three months of data, or did at the time we wrote this paper, from March to May 2020.
So we ran an experiment where, from 900 different RIPE Atlas probes, we placed a domain we controlled into the caches of public resolvers. The idea was just to put it there as if people across the US had done it naturally. Then we used Trufflehunter to try to observe it in those public resolver caches. Because it's a domain we control, we could conclude that the number of requests that came into our authoritative name server should be the true number of filled caches, except in the case of Google, which of course does its own thing.
We found that we performed best on OpenDNS and Cloudflare, except in the case of one particular Cloudflare PoP, where we think there was some routing going on that we didn't account for. During our experiment on Google DNS, it turned out to be difficult to accurately remove all the front-end caches that had been filled by our own probes, so we conservatively removed more than we had actually created, in order to ensure we never over-counted.
So if it turned out that Unbound was the software that had cached our domain, that record essentially became invisible to us, so it does mean we can't observe about half of the filled caches at any Quad9 PoP. The takeaways here are: first, we were able to tune our algorithm so that we almost always underestimate, which is good, because our goal is to provide these lower-bound estimates of prevalence; and second, even on the resolvers where we have high error, we do see at least half of the filled caches.
If you haven't heard the term before, it's kind of an emerging spyware threat: software that can be installed on a target's device, usually a phone, or a desktop computer, and it tracks them. It can record location; it often has keyloggers to record texts, social media, browsing history, things like that; and oftentimes it can record ambient sound and video from the device as well, and it can hide its presence on the device.
The prevalence of overt stalkerware is hard to estimate by any other means, because it's very difficult to observe it in the wild. Prior work in this space has mostly been conducted in clinical settings: researchers will conduct individual one-on-one interviews with targets, and unfortunately that gives them a low sample size. During these interviews they found few to zero of these overt apps in the wild, but a simple Google query will turn up dozens of them, and there are ads all over the place.
So it really does raise the question of how much of this overt stalkerware is out there. Additionally, by the time a target has come in to talk to a professional, they have often already reset their devices, so it's difficult for a clinic to tell which apps were on there before the reset. And finally, clinics often lack technical expertise, so if they aren't working with someone who does have expertise in this space, it can be very difficult for them to tell if a device has stalkerware installed.
So if you want to know how many devices have stalkerware installed, all you have to do is measure that request rate and then divide the number of filled caches you see by the request rate of the app. That's the technique that we used to come up with this figure here. This graph shows the maximum targets that we ever observed with stalkerware installed at any one time across the United States.
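(Read as arithmetic, with made-up numbers rather than figures from the talk: if an app checks in with its server r times per hour per device, and F filled caches per hour are observed for its domain, then)

    \[
      \text{devices} \;\gtrsim\; \frac{F}{r},
    \]

a lower bound, since users hidden behind an already-filled cache are not counted. For example, r = 6 check-ins per device per hour and F = 300 observed cache fills per hour would imply at least 300 / 6 = 50 devices.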
The two most interesting apps that we found were called Mobile Tracker Free and Spy To Mobile; those are the two most frequent, which you can see at the top of this chart. Mobile Tracker Free, we suspect, is so popular because, out of all the overt apps we studied, it was the only one that wasn't subscription-based.
We again see that Mobile Tracker Free is the one that was visited most frequently, but Spy To Mobile has fallen down in the rankings a little bit. So clearly it is the case that the popularity of the app does not necessarily correspond to how many times somebody is checking the dashboard. We theorize that might be because of differing app capabilities: Mobile Tracker Free has a lot more features than Spy To Mobile, which is mostly good at tracking location.
It's pretty hard to detect when a student is using this, because they're not actually plagiarizing; they're not copying work that already exists, they're hiring someone to create original content for them. Of course, your mileage may vary, and some of these services are better than others, but there are a few that are good enough to get A's in most cases, even including college and sometimes graduate classes.
So it's of course hard to observe in the wild, because students aren't going to just admit that they have cheated, even on anonymous surveys, which is how a lot of this work has been done in the past. So we had Trufflehunter look for it, and we observed that, yes, you see a lot of requests per day to these contract cheating websites.
Now, of course, a request made for the website doesn't necessarily mean that a student bought anything, but it's still an interesting number to observe. We saw that some of these services, which we measured over the last couple of weeks of May, were decreasing over time, which we thought was interesting; it might indicate that schools were letting out for summer break, so demand for cheating was going down.
And then, finally, because we had some of these domains, we looked for typosquatting. These domains are pretty old; we don't expect that they are being used to phish anybody anymore, because received wisdom in prior work says that phishing domains and typosquatting domains usually roll over very quickly: they get blacklisted and then the miscreants move on to other domains.
So the takeaway, from our point of view, is that cache snooping on public resolvers shouldn't actually be gotten rid of yet. We argue that there are minimal privacy concerns when you're cache snooping on public resolvers, because there are too many users to figure out which user put a domain into the cache.
And if you allow cache snooping on these resolvers, then you can measure types of harm that are otherwise very difficult to study, in particular stalkerware: it's very difficult to figure out how much of this stuff exists in the wild, and each instance of stalkerware represents a significant amount of harm being done.
Furthermore, contract cheating is difficult to study, because students are just not honest about whether or not they've bought these cheating services. And then there are other phenomena which we didn't get a chance to measure very well, which we would like to look into more in the future, such as these new hack-for-hire services, and phishing, which by all accounts is quite common, but we would like to see how much of it is happening in various places around the world.
A: Okay, thank you very much, Audrey; excellent talk. I see there's a bunch of discussion in the chat there. If anyone wishes to ask questions on the audio, then please go ahead.
B: Yes, I believe so. Okay, so there...
All right, so to Andrew's question about resolvers offering malicious content filtering:
we were using the services that do provide content filtering, but the thing about stalkerware is that it usually isn't filtered, because most threat feeds didn't consider it a threat until very recently; it's kind of an emerging threat, so people haven't been focusing on it too much until just the last couple of years. And the problem with trying to block stalkerware is that, first of all, you have the dual-use issue, so you could be blocking a legitimate app that is being used for a legal purpose.
And second, if you block it, then you have the risk of making a stalker think that the person being stalked is trying to leave the toxic relationship, or trying to take action against the stalker, and research suggests that that is the point at which surveillance can turn violent.
So it's not necessarily a good thing to block this stuff, so most threat feeds don't. Let's see: Jim asked where we got the domain names for contract cheating services. They're very easy to find if you Google them. So what we're measuring is hits on the main landing page for each of these services, when you just get there from a Google search.
G: Hi, the number of dual-use technologies seemed very low compared to the amount of stalkerware. Is that what you actually measured: that dual use is less common than stalkerware?
B: The particular dual-use apps that we measured did seem to be less common than the most common types of overt stalkerware, but there could be a number of reasons for that. First of all, we didn't measure as many dual-use apps, by choice, because people already know more about the dual-use ones, about their usage and their prevalence, and what we really wanted to know about was the overt ones, so we may not have found the most popular ones.
We do find, based on previous literature in this space, that most surveillance is actually not done by the overt apps; it's done by, you know, misconfigured sharing settings or whatnot. So that was the really interesting thing we wanted to find out: why the overt stuff was there.
H: Great work, Audrey; I think this is an excellent presentation. Could you say a little bit more about the RD-bit setting stuff? Has that had any influence on your work, or on the behavior of the caches that you're measuring?
B: Google was the exception to this, because we found that if we had hit a back-end resolver that had the query cached, and then we hit a front-end cache that did not have the query cached, then the record would be copied from the back-end resolver to the front-end cache. So our own probes would fill the caches there, and we did find that it was a challenge to remove the poisoning that we had done. And then, as someone just pointed out in the chat...
Yes, so Unbound will return a REFUSED response when you try to make a query with the recursion-desired flag cleared, which was why we could only measure half of the caches at Quad9: if they had a resolver running Unbound software that received the original query, then it became invisible to Trufflehunter. Does that answer your question?
A: So I see some discussion in the chat about whether it's desirable to enable the ability to respond to RD=0 queries.
B: Yeah, so I think the issue with whether or not to enable answering queries without recursion desired is: you want to enable it if you want to enable these types of measurement studies on resolvers where there are few to no de-anonymization risks for users, and those are the large resolvers. If you are running DNS resolver software and you know that it's going to get put on small home routers, which can be used...
Yes, Jonathan is saying it would be... sorry, I'm getting some echo again. Did I do something?
Okay, yeah, Jonathan was saying it would be great to see results from around the world. Yes. The measurement platform that we're using, which is how we get results from around the United States, doesn't have quite as many nodes around the world, but yes, it would be great to expand to that.
No, it would actually probably be reasonably straightforward. We were keeping our experiment to the United States in the first place so that we wouldn't add too much load on their system, and then it just ended up that that was the data we had when we published our results. We are at the moment working on expanding a similar tool to Trufflehunter, but instead of measuring domain usage, we're trying to measure DNS hijacking, and we do want to expand that to worldwide, using similar techniques.
Yes, yes, I think having the ability to enable it or disable it, based on what kind of resolver you're running and the size of the resolver you're running, is probably the best solution.
A: Okay, so I had a question. I mean, we're obviously seeing people making increasing use of, you know, different types of DNS transports, over TLS or over HTTPS, and we're seeing new techniques like Oblivious DNS being proposed. Do any of these make a difference to the type of work you're doing?
B: DNSSEC and DNS over HTTPS or DNS over TLS do not affect our work, because the queries still get cached the same way. You mentioned Oblivious; I haven't actually heard of that before.
A: Yeah, this is a new thing which, I have to say, I don't know a whole lot about. It seems to be using encryption and proxy resolvers to anonymize the people making the queries.
B: Okay. I suspect that, as long as those queries are still arriving at public resolvers, then it won't interfere with our technique; and if it provides an extra layer of anonymization for the people making the queries, then awesome.
A: Presumably DNS over HTTPS would maybe allow some of these stalkerware apps to make their DNS queries in a more controlled way, and so maybe avoid the resolvers?
B: Yes, it might. The thing about stalkerware apps is that they tend to be incredibly unsophisticated. They don't try to obfuscate their code at all, half of them crash as soon as they get installed on the phone, at least on an older device, and they seem to have a lot of bugs and problems in general. So I would be surprised to see them adopting any sophisticated techniques like that in the near future, but there are some that are certainly ahead of the game, like FlexiSPY.
Okay, Jonathan is pointing out that this might mess with our geolocation. Yes, but I think, as we realized, anycast is not a great way to do geolocation in any case, so our geolocation is a little bit suspect. We can say that users, possibly in a very broad region, are experiencing more stalkerware or whatnot than users across the country, but anycast is not a great way to figure out where users actually are, to the best of my knowledge, in any case.
A: Okay. I think it would also be good, if any of the operators of these services are around, if we could try to put you in touch with them as well; that would be helpful.
All right, are there any last questions for Audrey before we finish? Nope, I guess that's everyone. Well, thank you again to Audrey and to Francis for two really nice talks and some really good discussion, and I know it's a fairly unpleasant hour of the day for both of them.
So thank you for getting up so early to participate in this. As I said at the end of Francis's talk, I'm sure both Audrey and Francis will be around for the rest of the week and will be available if you want to chat with them, so please do get in touch with them
if you have any questions or want to talk about any of this work further. And that's all we have for this session today. Look out for the Applied Networking Research Workshop call for papers coming up, and look out for the rest of the IRTF sessions later this week. The recordings of these talks and the links to the papers are on the IRTF web page, if you want to look into them in more detail. All right, thanks.