Cheaters are Out of Control Today. Is Lichess anti cheating team on vacation today?

@delorenflie said in #78:
> but I do have the feeling that this discussion is pointless.

I agree with that, although for different reasons. You claim to know things about ML, but so far I could not tell whether it is more than just a claim. I tried to get you to formulate independent thoughts with my question, but you only aped what was said in the other topic. So I cannot judge if you are qualified for a technical discussion. What I do know is that you are not the type of person to look up the basics on how cheat detection happens on lichess, but instead you approach it as if zero domain-specific knowledge is necessary. Which, quite frankly, if you are indeed working in ML, is a very unhealthy attitude that has bitten a lot of ML researchers before. The first and most basic step is to know your data.

> Because you are assuming that there are absolutely no correlations between some of the other data that you say are used to automatically flag some players and the data that is fed to Kaladin and Irwin. That is likely a WRONG assumption, not only in my experience but also given my own research. If y~f(data fed to the system, data not fed to the system) [...]

Yes, because you lack domain-specific knowledge, as above. If you had any idea how tosViolation flags for browser plugin detection versus tosViolation flags for other kinds of engine use worked, you might agree with me that those two are almost completely orthogonal. Irwin was, to the best of my knowledge, never fed data from browser plugin detection for that precise reason. Training an ML system with the data Irwin gets to predict tosViolation flags for browser plugin detection is like training it to predict a coin flip. There was also other data withheld for similar reasons.

> I do not think I am making incorrect assumptions

You are constantly making them, and you do not seem to see any reason to reconsider your assumptions, even when you are told they are wrong. Instead, you prefer to think I am incompetent and do not know what I am talking about. Fair enough.

> I am not sure you are fully able to get across to me all you want to say

I am indeed not. The reason is that I cannot pass on all the information I get as a moderator. These things are only discussed openly in restricted access areas of lichess.

> Therefore, what I think we need is a forum specific for these things

Oh, of course we have a forum like that. But we only invite people when we think their expertise would be useful. For the reasons you mentioned, we try to keep trolls and people without the necessary background out of it.

#82

JT3579

#83

It sounds like you handled the situation well by reporting the suspected cheater and focusing on enjoying the game yourself. Keep up the positive attitude and keep having fun playing chess!

Cedur216

#84

shouldn't have posted the game though before the ban came in. Basically shouldn't reveal names in public in general.

ChessWithoutBullets

#85

@Cedur216 said in #84:
> shouldn't have posted the game though before the ban came in. Basically shouldn't reveal names in public in general.

He was not banned until I posted the game here, but it's true, it's bad practice to expose him like this. I won't do it again; I just wanted to contribute to the thread.

delorenflie

#86

@anonmod said in #81:
> I agree with that, although for different reasons. You claim to know things about ML, but so far I could not tell whether it is more than just a claim. I tried to get you to formulate independent thoughts with my question, but you only aped what was said in the other topic. So I cannot judge if you are qualified for a technical discussion. What I do know is that you are not the type of person to look up the basics on how cheat detection happens on lichess, but instead you approach it as if zero domain-specific knowledge is necessary. Which, quite frankly, if you are indeed working in ML, is a very unhealthy attitude that has bitten a lot of ML researchers before. The first and most basic step is to know your data.

I agree, it is impossible for either you or me to prove that we know what we know here. And honestly I don't care much. I agree that domain knowledge is important, as is a deep understanding of the methods and models being used. I understand most of the input fed to Kaladin, but I still have no clue why you need a convolutional layer (see github.com/lichess-org/kaladin/blob/main/src/model/assets.py). I also don't fully get why there isn't a recurrent layer (e.g. LSTM), or at least a pre-processing of the input that captures the same thing. There are obvious time dependencies in, for example, move times, although it is not trivial to determine over how many steps these dependencies matter.

You do assume, and I understand why as you have no way to realize the contrary, that I have no domain knowledge. However, I spent quite some time working on chess data throughout the past year. And I have found enough to find it strange that you say:

> If you had any idea how tosViolation flags for browser plugin detection versus tosViolation flags for other kinds of engine use worked, you might agree with me that those two are almost completely orthogonal. Irwin was, to the best of my knowledge, never fed data from browser plugin detection for that precise reason. Training an ML system with the data Irwin gets to predict tosViolation flags for browser plugin detection is like training it to predict a coin flip. There was also other data withheld for similar reasons.

But I have found that move times alone, their distribution and variation over time, are often a great predictor of anomalous behavior. In fact, if you characterize players using such move time distributions and cluster them, you will see that it is possible to do anomaly detection, and a very high percentage of the labels (I am still working on quantifying this reliably), both on lichess and chess.com, fall within the anomalous clusters. Moreover, I have been doing independent tests to figure out whether it is possible to detect bot behavior (often from those plugins you mention) with move times only, and what I find is that human move time distributions, and the way they vary over time (this is crucial), are quite distinct from anything else. Thus, if you feed (even if indirectly) move times to Kaladin, I find it strange that the labels produced would not be somewhat correlated with the detection you make of those plugins independently.
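
To make the idea concrete, here is a minimal sketch of move-time anomaly detection. It is an illustration only: the data is synthetic, the feature choice (mean, coefficient of variation, lag-1 autocorrelation) and all function names are assumptions, and it uses a simple robust z-score (median and MAD) instead of the clustering described above. It says nothing about what Kaladin or Irwin actually do.

```python
import random
import statistics


def move_time_features(times):
    """Summarise one player's move times (seconds) as (mean, CV, lag-1 autocorrelation)."""
    mean = statistics.fmean(times)
    stdev = statistics.pstdev(times)
    cv = stdev / mean if mean > 0 else 0.0
    if stdev == 0 or len(times) < 2:
        lag1 = 0.0
    else:
        # Covariance of consecutive move times, normalised by the variance.
        num = sum((a - mean) * (b - mean) for a, b in zip(times, times[1:]))
        lag1 = num / (len(times) * stdev ** 2)
    return (mean, cv, lag1)


def robust_z(values):
    """Robust z-scores based on the median and the median absolute deviation (MAD)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    if scale == 0:
        return [0.0] * len(values)
    return [(v - med) / scale for v in values]


def flag_anomalies(players, threshold=3.5):
    """Return the names whose robust z-score exceeds the threshold on any feature."""
    names = list(players)
    feats = [move_time_features(players[name]) for name in names]
    flagged = set()
    for column in zip(*feats):  # iterate feature by feature
        for name, z in zip(names, robust_z(list(column))):
            if abs(z) > threshold:
                flagged.add(name)
    return flagged


def synthetic_players(seed=0, n_humans=30, n_moves=200):
    """Synthetic data: log-normal 'human' move times plus one near-constant 'bot'."""
    rng = random.Random(seed)
    players = {
        f"human{i}": [rng.lognormvariate(1.0, 0.5) for _ in range(n_moves)]
        for i in range(n_humans)
    }
    players["bot"] = [2.0 + rng.uniform(-0.1, 0.1) for _ in range(n_moves)]
    return players


players = synthetic_players()
print(sorted(flag_anomalies(players)))
```

The near-constant "bot" stands out mainly through its very low coefficient of variation. Real move times depend on time control, position complexity and time pressure, so a toy like this only shows the mechanics, not a usable detector.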

> You are constantly making them, and you do not seem to see any reason to reconsider your assumptions, even when you are told they are wrong. Instead, you prefer to think I am incompetent and do not know what I am talking about. Fair enough.

Agree to disagree

> I am indeed not. The reason is that I cannot pass on all the information I get as a moderator. These things are only discussed openly in restricted access areas of lichess.
>
> Oh, of course we have a forum like that. But we only invite people when we think their expertise would be useful. For the reasons you mentioned, we try to keep trolls and people without the necessary background out of it.

It's great to hear that, and this is probably the only thing on which we truly agree for now. I am glad there is a separate forum for that and that it is not accessible to the general public. I wish you luck with it.

anonmod edited

#87

@delorenflie said in #86:
> [...] I find it strange that the labels produced would not be somewhat correlated with the detection you make of those plugins independently.

Yes, because you still have not bothered to try researching what detection of browser plugins is and how it might work. And you are not willing to believe me when I tell you that the data available to either Kaladin or Irwin is of no help in finding any irregularities there.

And while you may have familiarity with general chess data, you obviously have none with chess data as used for lichess ML. You still do not fully understand what the tosViolation flag means. You see my problem here? You make incorrect assumptions about the Yi you are trying to predict. That is not even just data, that is the main variable.

And yet you are now bringing up convolutional and recurrent layers. You seem to think that I should be impressed that you know these concepts. Yet I am mainly shocked that you would prioritise looking at neural network designs over understanding input and output. And that is even when someone already told you, multiple times, that you are misunderstanding those.

This is so absurd to me, I do not even know whether to laugh or to cry.

Edit: Anyway, I am out of this discussion. We already have established before that it is pointless.

mrbasso edited

#88

@Cedur216 said in #63:
> that means it's alright, isn't it?

Not sure...
Since I was on provisional in classical I lost 94 points to a now banned cheater and another 94 points against a 1400 because the site thinks I didn't reply to his 1.f4 move but IMHO his first move did not show up on my screen. Now I'm underrated, but whatever.
It is as it is...

ChessWithoutBullets

#89

@mrbasso said in #88:
> Not sure...
> Since I was on provisional in classical I lost 94 points to a now banned cheater and another 94 points against a 1400 because the site thinks I didn't reply to his 1.f4 move but IMHO his first move did not show up on my screen. Now I'm underrated, but whatever.
> It is as it is...

I saw your game, and at first I didn't think he used computer assistance until I saw his FIDE profile... Indeed, very weird for a 1400 FIDE-rated chap to play like that.

Cedur216

#90

@mrbasso well the banning part worked. whether you should be refused a refund when provisional is a different matter.

This topic has been archived and can no longer be replied to.