I am trying to create a machine learning model to predict the position of each team, but I am having trouble organizing the data in a way the model can train on.
I want the pandas DataFrame to look something like this, where each tournament has team members constantly shifting between teams.
Based on the input teammates, the model would then predict the team's position. Does anyone have any suggestions on how I can build a pandas DataFrame like this that a model can use as training data? I'm completely stumped.
As for how to create this sheet: you can collect the data and store it in the format you described without much trouble. The tricky part is turning it into training data, because a model needs numerical input rather than player names. Since the maximum team size is 3 in most cases, you can split the names into three columns (leaving a column blank when a team has fewer than 3 members). Then use either label encoding or one-hot encoding to convert the names to numbers. For label encoding, fit a single `LabelEncoder` on the combined list of names from all three columns, and then call its `transform` method on each column individually. Because the same name can appear in any of the three columns, this ensures a player gets the same code wherever they appear.

After label encoding, tree-based models are a straightforward choice, since they can split on individual code values and are generally less sensitive than linear models to the arbitrary order of the codes. One-hot encoding creates a separate column for every distinct name, which could lead to the curse of dimensionality when there are many players, so I would avoid it for an initial simple model.
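A minimal sketch of the steps above, assuming each team is stored as a list of player names plus a numeric placement. The column names (`members`, `position`, `member_1` to `member_3`), the made-up players, and the choice of `RandomForestRegressor` (treating position as a number) are illustrative assumptions, not part of the original data. Blank slots are filled with an empty string because `LabelEncoder` cannot sort a mix of strings and `NaN`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestRegressor

# Hypothetical example data: one row per team per tournament.
# Player names and placements are made up for illustration.
raw = pd.DataFrame(
    {
        "tournament": [1, 1, 1, 2, 2, 2],
        "members": [
            ["alice", "bob", "carol"],
            ["dave", "erin"],  # a team with only two members
            ["frank", "grace", "heidi"],
            ["alice", "dave", "grace"],
            ["bob", "erin", "frank"],
            ["carol", "heidi"],
        ],
        "position": [1, 3, 2, 2, 1, 3],
    }
)

MAX_TEAM_SIZE = 3
member_cols = [f"member_{i + 1}" for i in range(MAX_TEAM_SIZE)]

# Split each member list into three columns; pad short teams with "" so the
# blank slot has the same (string) type as the names.
padded = raw["members"].apply(lambda m: m + [""] * (MAX_TEAM_SIZE - len(m)))
df = pd.DataFrame(padded.tolist(), columns=member_cols, index=raw.index)
df["tournament"] = raw["tournament"]
df["position"] = raw["position"]

# Fit ONE encoder on the names from all three columns combined, so a player
# gets the same code no matter which column they appear in.
encoder = LabelEncoder()
encoder.fit(pd.concat([df[c] for c in member_cols]))

# Transform each column individually with the shared encoder.
for c in member_cols:
    df[c + "_code"] = encoder.transform(df[c])

code_cols = [c + "_code" for c in member_cols]
X = df[code_cols]
y = df["position"]

# A tree-based model can split on the integer codes directly.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Predict the position of a new (hypothetical) team of already-seen players.
new_team = pd.DataFrame([encoder.transform(["alice", "erin", "heidi"])], columns=code_cols)
print(model.predict(new_team))
```

Note that `transform` raises a `ValueError` for a name the encoder has never seen, so new players need separate handling (for example, refitting the encoder). Also, with this layout the same team listed in a different member order produces different features; sorting the names within each team before splitting them into columns is one way to avoid that.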
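To make the dimensionality point concrete, the sketch below compares the width of the feature matrix under both encodings. It assumes a hypothetical pool of 300 players and 1,000 randomly drawn three-person teams. `pd.get_dummies` creates one indicator column per distinct name in each member column, so the one-hot matrix ends up with several hundred columns, while the label-encoded version keeps just three.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical pool of 300 distinct player names.
players = np.array([f"player_{i}" for i in range(300)])

# 1,000 hypothetical teams of three distinct players each.
teams = [rng.choice(players, size=3, replace=False) for _ in range(1000)]
df = pd.DataFrame(teams, columns=["member_1", "member_2", "member_3"])

# Label encoding would keep one integer column per member slot (3 columns);
# one-hot encoding adds one column per (slot, name) pair actually present.
one_hot = pd.get_dummies(df)
print(f"label-encoded columns: {df.shape[1]}, one-hot columns: {one_hot.shape[1]}")
```

With N distinct players, the one-hot version can grow to roughly 3 × N columns, which is why it is worth postponing until a simpler label-encoded baseline is working.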