A Top-1 Algorithm Library over MTurk
Amazon Mechanical Turk (MTurk) is a well-known crowdsourcing platform where people offer human intelligence services at a reasonable price. MTurk requesters sometimes have a large task that must be divided into small sub-tasks, whose sub-results are then combined to reach the final result. This process can be organized as a tree or as a bubble. This library implements that process for you. You can:
  • check it out here on GitHub.


There are two ways to use this library, depending on your programming language: as a library in Java or through an algorithm server.
The Library in Java

If you implement your MTurk tasks in Java, there is a Java library for you. The key part is to implement the MyHit interface, where you define your own operations on HITs, such as creating a HIT, retrieving answers from a HIT and disposing of a used HIT. After you pass an instance of your class implementing MyHit, build an instance of the algorithm and start it, the library does everything else.
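
The shape of such an implementation can be sketched as follows. This is only an illustration: the real method names and signatures of MyHit are not shown on this page, so `HitOperations`, its three methods and the in-memory `InMemoryHits` class are hypothetical stand-ins, and the "answers" are faked instead of coming from MTurk workers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MyHitSketch {

    /** Hypothetical stand-in for MyHit; the real interface's methods may differ. */
    interface HitOperations {
        String createHit(List<String> questions, int nOutput, int nAssignment);
        List<String> getAnswers(String hitId);
        void disposeHit(String hitId);
    }

    /** In-memory implementation for illustration only; a real one would call MTurk. */
    static class InMemoryHits implements HitOperations {
        private final Map<String, List<String>> answersByHit = new HashMap<>();
        private int nextId = 0;

        @Override
        public String createHit(List<String> questions, int nOutput, int nAssignment) {
            String hitId = "HIT-" + nextId++;
            // Pretend the workers picked the first nOutput questions as their answers.
            int n = Math.min(nOutput, questions.size());
            answersByHit.put(hitId, new ArrayList<>(questions.subList(0, n)));
            return hitId;
        }

        @Override
        public List<String> getAnswers(String hitId) {
            List<String> answers = answersByHit.get(hitId);
            if (answers == null) {
                throw new IllegalArgumentException("Unknown HIT: " + hitId);
            }
            return answers;
        }

        @Override
        public void disposeHit(String hitId) {
            answersByHit.remove(hitId);
        }

        /** Number of HITs that have been created but not yet disposed of. */
        int openHitCount() {
            return answersByHit.size();
        }
    }

    public static void main(String[] args) {
        InMemoryHits hits = new InMemoryHits();
        String id = hits.createHit(Arrays.asList("3", "13", "7"), 2, 3);
        System.out.println(id + " -> " + hits.getAnswers(id));
        hits.disposeHit(id);
    }
}
```

In a real implementation, each of the three operations would call your MTurk code, and the object would be handed to the library's algorithm class rather than driven from `main`.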


The Algorithm Server

If you program your MTurk task in another programming language, you can pass your questions in a request string to an algorithm server through a socket. The strings exchanged between the algorithm server and your client are in a CGI style (key=value pairs joined by &). You don't have to implement the MyHit interface if you choose this way. Instead, the only thing you need to do is interact with the algorithm server. The steps are as follows:


  1. Place your MTurk property file in the same directory as the server.
  2. Start an algorithm server from the command line, for example:
    java -jar TopOneServer tree 50000 50001 127.0.0.1 mturk.properties
    • tree is the type of algorithm the server will run. The other available value for this parameter is bubble.
    • 50000 is the port of the server. The server keeps listening on this port to accept initialization requests from the client side.
    • 50001 is the port of the client. The client keeps listening on this port to accept requests from the server for creating HITs, fetching answers or receiving final answers. When a request arrives, the client opens a new thread to process it. After creating a new HIT, fetching the answers of a HIT or receiving the final answers, this thread is closed.
    • 127.0.0.1 is the IP address of the client program.
    • mturk.properties is the name of your Amazon Mechanical Turk property file, which stores your access_key, secret_key and service_url. You need to provide this file because the algorithm has to keep track of the active HITs you created. Once a HIT is done, the algorithm sends your client program a request for its answers.
  3. Start your client program and send an initialization request to the server. The request string looks like this (wrapped here for display; send it as a single line ending in a newline): qnum=19&q0=1&q1=2&q2=3&q3=4&q4=5&q5=6&q6=7&q7=8&q8=9&q9=10&q10=11&
    q11=12&q12=13&q13=14&q14=15&q15=16&q16=17&q17=18&q18=19&nInput=3&
    nOutput=2&nAssignment=2&nTieAssignment=1\n

    After receiving a request like the one above, the server starts a new thread running an instance of the algorithm with these initialization parameters.
    • qnum is the total number of questions you have; here q0 is the first question and q18 is the last.
    • nInput is the number of inputs/questions a HIT has.
    • nOutput is the number of outputs/answers a HIT should return.
    • nAssignment is the number of assignments of a normal HIT.
    • nTieAssignment is the number of assignments of a tie-breaking HIT (needed because different answers may receive the same number of votes).
    In addition, you can specify some optional parameters: isShuffled, isLogged and jobId.
    • isShuffled is true if the server shuffles the inputs. If it is omitted, it is true by default.
    • isLogged is true if the server records a log. If it is omitted, it is true by default.
    • jobId is the name you give to this instance of the algorithm (namely, the current specific task). It is also the name of the log file. If it is omitted, it is a random number generated from the hash code of the date.
    Now, the algorithm is running inside the server.
  4. While the algorithm runs, the server asks your client to create HITs, to return the answers of HITs and to receive the final result. Your client therefore needs a "server" component that keeps listening on a port, so that it can receive requests from the algorithm server and return answers to it at any time. Don't forget to append the newline character to the end of each string you send, shown as "\n" in every string below, because the server uses the readLine function to receive strings. A request string from the server is in one of the three formats below:
    • type=createHit&nOutput=2&nAssignment=3&jobId=1831723&qnum=3&
      q0=3&q1=13&q2=7\n

      If the request asks you to create a HIT, as above, your client should use the provided parameters to create a HIT and return its ID. Note that the returned string contains nothing but the ID itself, without any additional information. For example, if the client creates a HIT with ID 8W7E5D9AWE67WQ, the returned string should be 8W7E5D9AWE67WQ\n
    • type=getAnswer&hitId=JASDH7SD9S9&jobId=1831723\n
      If the request asks you to retrieve the answers of a HIT, return a string containing the answers of the specified HIT. The string you return should follow this format: anum=2&a0=3&a1=6\n, where anum is the number of answers, a0 is the first answer, and so on.
    • type=returnFinalAnswer&finalAnswer=6\n
      If the request delivers the final answer to you, extract it from the finalAnswer field.
  5. After returning the final answer, the whole process is done.
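
The message formats in steps 3 and 4 can be sketched in Java as below. This is a minimal illustration under stated assumptions, not code from the library: `HitBackend`, `handle` and `serveOnce` are hypothetical names, values are assumed to contain no `&` or `=` characters (the page does not describe any escaping), replies are assumed to go back on the same connection the request arrived on, and no reply is sent for returnFinalAnswer because none is specified.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopOneClientSketch {

    /** Hypothetical callbacks into your own MTurk code. */
    interface HitBackend {
        String createHit(List<String> questions, int nOutput, int nAssignment);
        List<String> getAnswers(String hitId);
        void onFinalAnswer(String answer);
    }

    /** Builds the step 3 initialization request (optional parameters omitted). */
    static String buildInitRequest(List<String> questions, int nInput, int nOutput,
                                   int nAssignment, int nTieAssignment) {
        StringBuilder sb = new StringBuilder("qnum=").append(questions.size());
        for (int i = 0; i < questions.size(); i++) {
            sb.append("&q").append(i).append('=').append(questions.get(i));
        }
        return sb.append("&nInput=").append(nInput)
                 .append("&nOutput=").append(nOutput)
                 .append("&nAssignment=").append(nAssignment)
                 .append("&nTieAssignment=").append(nTieAssignment)
                 .append('\n').toString();
    }

    /** Splits one CGI-style line such as "type=getAnswer&hitId=X" into fields. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : line.trim().split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return fields;
    }

    /** Formats answers as "anum=2&a0=3&a1=6\n". */
    static String formatAnswers(List<String> answers) {
        StringBuilder sb = new StringBuilder("anum=").append(answers.size());
        for (int i = 0; i < answers.size(); i++) {
            sb.append("&a").append(i).append('=').append(answers.get(i));
        }
        return sb.append('\n').toString();
    }

    /** Dispatches one server request and returns the reply line ("" if none). */
    static String handle(String requestLine, HitBackend backend) {
        Map<String, String> f = parse(requestLine);
        String type = f.get("type");
        if ("createHit".equals(type)) {
            int qnum = Integer.parseInt(f.get("qnum"));
            List<String> questions = new ArrayList<>();
            for (int i = 0; i < qnum; i++) {
                questions.add(f.get("q" + i));
            }
            return backend.createHit(questions, Integer.parseInt(f.get("nOutput")),
                    Integer.parseInt(f.get("nAssignment"))) + "\n";
        } else if ("getAnswer".equals(type)) {
            return formatAnswers(backend.getAnswers(f.get("hitId")));
        } else if ("returnFinalAnswer".equals(type)) {
            backend.onFinalAnswer(f.get("finalAnswer"));
            return "";
        }
        throw new IllegalArgumentException("Unknown request type: " + type);
    }

    /** Accepts one connection, reads one line, writes the reply (single-threaded). */
    static void serveOnce(ServerSocket listener, HitBackend backend) throws IOException {
        try (Socket s = listener.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             Writer out = new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8)) {
            String line = in.readLine();
            if (line != null) {
                out.write(handle(line, backend));
                out.flush();
            }
        }
    }

    public static void main(String[] args) {
        System.out.print(buildInitRequest(Arrays.asList("1", "2", "3"), 3, 2, 2, 1));
        System.out.println(parse("type=getAnswer&hitId=JASDH7SD9S9&jobId=1831723\n"));
        System.out.print(formatAnswers(Arrays.asList("3", "6")));
    }
}
```

A client built this way would open a `ServerSocket` on its own port (50001 in the example command), call `serveOnce` in a loop or hand each accepted connection to a new thread as described in step 2, and write the output of `buildInitRequest` to a `Socket` connected to the server port (50000).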



Please see more below: