How do I balance power level in a custom set?

Hi everyone,
I’m in the process of designing my first custom set for Magic: The Gathering, but I’m having a hard time balancing the power levels of the cards.
A few things I’m struggling with:
Some of my 2-mana creatures seem to be too strong for Limited (they almost always win fights).
The removal spells are either too weak, or they just wipe things out too easily.
When I was building my test deck, I noticed that the set's combos became “surefire” wins too easily once I drew the right piece.
I tried a few solutions:
Compared my cards to similar ones in recent sets (Modern Horizons 3, Wilds of Eldraine…).
Created a scoreboard that rates each card on mana efficiency, board impact, and flexibility.
Ran a mini-playtest with friends, but the results were still quite different.
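For anyone unsure what a scoreboard like this looks like, here is a minimal sketch in Python. The 1 to 5 rating scale, the weights, the outlier tolerance, and all names (`CardRating`, `power_score`, `flag_outliers`) are illustrative assumptions, not a tested formula.

```python
from dataclasses import dataclass

# Hypothetical weights for the three axes; illustrative assumptions only.
WEIGHTS = {"mana_efficiency": 0.4, "board_impact": 0.4, "flexibility": 0.2}


@dataclass
class CardRating:
    name: str
    mana_efficiency: int  # subjective rating, 1 (poor) to 5 (excellent)
    board_impact: int     # same 1-5 scale
    flexibility: int      # same 1-5 scale


def power_score(card: CardRating) -> float:
    """Weighted sum of the three ratings."""
    return sum(getattr(card, axis) * weight for axis, weight in WEIGHTS.items())


def flag_outliers(cards: list[CardRating], tolerance: float = 1.0) -> list[str]:
    """Names of cards whose score is more than `tolerance` away from the mean score."""
    scores = {card.name: power_score(card) for card in cards}
    mean = sum(scores.values()) / len(scores)
    return [name for name, score in scores.items() if abs(score - mean) > tolerance]
```

Rating every card in one mana slot (for example, all 2-mana creatures) and running `flag_outliers` on that group gives a rough list of cards to look at first. The ratings are still subjective, so this only organizes judgment calls and does not replace playtesting.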
What methods do people usually apply to keep the power level reasonable when creating custom sets?
Is there a tool/sheet that helps quantify the power of cards?
How many playtests are enough to determine whether a card is "standard"?
I would love to hear your experiences, especially from those who have finished a complete fan-made set.
Thanks, everyone, in advance!
Crazy Cattle 3D