Pwn2Win CTF 2017 – Differential Privacy – Crypto

Is it possible to have privacy on these days? The Rebelious Fingers do not think so. Get the flag.

nc 9999

Right off the bat we know this has something to do with differential privacy (Such a surprise!) and we have a server so lets connect to it.

Hello, chose an option:
[1] Info
[2] Query the flag (in ASCII)
[3] Quit

Basic menu, lets see the options.

You can query the flag, but the characters are private (indistinguishable).
Differential privacy mechanism: Laplace
Sensitivity: ||125 – 45|| = 80
Epsilon: 6.5

[85, 89, 108, -16, 78, 67, 123, 61, 110, 96, 103, 85, 107, 116, 122, 130, 96, 83, 108, 136, 100, 94, 103, 90, 117, 102, 73, 118, 118, 112, 112, 105, 101, 110, 143, 103, 75]

Mhm, well not sure what to do with this yet. I don’t really know much about differential privacy, so let’s research. Differential privacy is used to encode data in such a way that you can’t discern any single persons input while keeping the actual statistical findings of the data intact. It does this by adding a Laplace random number set by some parameters to the value. Because the average of the Laplace function is 0, the average of the data set stays about the same.

At this point, I was at a loss. I would work on some other problems and come back to this one now and then. Then I noticed that the query values change on successive connections. Because the center of the Laplace function is 0, if you take the average of a lot of different encodings, it should be the original value! Python time!

from pwn import *
import ast

host = ""
port = 9999

#This is the ammount of values to keep after throwing out the largest and smallest values.
#[100,50,10,7,6,5,4,3,2,1,-10,-50,-100] will keep [7,6,5,4,3,2,1]
sample = 7

def get_list():
  #get a new data set from the server
  #turn it into a list
  conn = remote(host,port)
  data = ast.literal_eval(conn.recvline().decode().strip())
  return data

gotten = None

while True:
  d = get_list()

  if not gotten:
    #initialize the gotten array if it hasn't been
    gotten = [[] for j in d]

  for j,v in enumerate(d):
    #add each new value to the corresponding sub-array

  if len(gotten[0]) > sample + 1:
    #After we've gotten enough data
    r = max([1,int((len(gotten[0])-sample)/2)])
    for vals in gotten:
      #For each character, sort it's values and average the sample
      work = vals[:]
      work = sorted(work)[r:-r]
      print(chr(int(round(sum(work)/len(work)))), end="")
    print(" ", len(gotten[0]))

Running this code will print out an estimate of the correct string after each ‘download.’ After 20 updates, the output is something like this


This makes almost no sense, so lets make a bigger data set.

After 400 updates, it’s looking a bit better


General flag structure is there, and maybe the beginnings of some words.
Let’s keep going. 1622 updates later we get


Only one incomplete word! Looks like it’s probably “just” let’s try that! It worked!