Running the Benchmarks
You need a goalpost to measure your model's strength before sending it into battle against other models, so we enabled testing against our rules-based agents. We call these the benchmarks.
Each benchmark consists of battling a rules-based agent 10 times: 5 times starting from each side.
The following is the scoring methodology for the benchmarks:
const getBenchmarkScore = (yourHealth, opponentHealth, timeRemaining, startingTime) => {
  // Positive when you finish with more health than the opponent.
  const relativeHealth = yourHealth - opponentHealth
  // +1 for a win, -1 for a loss, 0 for a draw.
  const resultMultiple = Math.sign(relativeHealth)
  // Reward winning quickly (and penalize losing quickly).
  // timeScoreMultiple and healthScoreMultiple are platform-defined weights.
  const timeRemainingScore = timeScoreMultiple * resultMultiple * timeRemaining / startingTime
  // Reward the health margin directly.
  const healthRemainingScore = healthScoreMultiple * relativeHealth
  return timeRemainingScore + healthRemainingScore
}
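To make the scoring concrete, here is a self-contained sketch of the function in action. The weight values below (`timeScoreMultiple = 10`, `healthScoreMultiple = 1`) are illustrative assumptions, not the platform's actual constants:

```javascript
// Illustrative weights; the real values are set by the platform.
const timeScoreMultiple = 10
const healthScoreMultiple = 1

const getBenchmarkScore = (yourHealth, opponentHealth, timeRemaining, startingTime) => {
  const relativeHealth = yourHealth - opponentHealth
  const resultMultiple = Math.sign(relativeHealth) // +1 win, -1 loss, 0 draw
  return (
    timeScoreMultiple * resultMultiple * timeRemaining / startingTime +
    healthScoreMultiple * relativeHealth
  )
}

// A win with a 40-health margin and half the clock left:
console.log(getBenchmarkScore(60, 20, 45, 90)) // → 45 (time term 5 + health term 40)
```

Note that the time term flips sign on a loss: surviving longer softens the penalty, while a fast loss scores worst.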
Benchmark Analytics
We provide researchers with tools to analyze their model’s performance against benchmarks.
Watch Your Model
This is where it gets really fun: you can watch your model battle against the benchmarks as well!
AI Inspector
We also provide a tool for digging deeper into your agent's policy. Researchers can toggle the state to see what the AI would do in every possible scenario; every aspect of the state is togglable.
Compete Against Others
After you are comfortable with your model’s performance against the benchmarks, join the ranked competition and battle against other models from around the world.