C# (CSharp) QValueStore.GetBestAction Examples

Programming language: C# (CSharp)

Class/type: QValueStore

Method/function: GetBestAction

Examples on hotexamples.com: 2

C# (CSharp) QValueStore.GetBestAction - 2 examples found. These are the top-rated real-world C# (CSharp) examples of QValueStore.GetBestAction, extracted from open source projects.

Frequently used methods

GetBestAction(2)

GetQValue(2)

StoreQValue(2)

getBestAction(1)

getQValue(1)

readMatrix(1)

storeQValue(1)

writeMatrix(1)

Example #1

File: QLearning.cs Project: raulcuth/coin-collector

    public IEnumerator Learn(ReinforcementProblem problem,
                             int numIterations,
                             float alpha,
                             float gamma,
                             float explorationRandomness,
                             float walkLength)
    {
        if (store == null)
        {
            yield break;
        }
        //get a random state
        GameState state = problem.GetRandomState();

        for (int i = 0; i < numIterations; i++)
        {
            //yield return null for the current frame to keep running
            yield return(null);

            //with probability walkLength, abandon the current walk and restart from a random state
            if (Random.value < walkLength)
            {
                state = problem.GetRandomState();
            }
            //get the available actions from the current game state
            GameAction[] actions = problem.GetAvailableActions(state);
            GameAction   action;

            //with probability explorationRandomness pick a random action; otherwise pick the best known action
            if (Random.value < explorationRandomness)
            {
                action = GetRandomAction(actions);
            }
            else
            {
                action = store.GetBestAction(state);
            }

            //calculate the new state for taking the selected action on the current state and the resulting reward value
            float     reward   = 0f;
            GameState newState = problem.TakeAction(state, action, ref reward);

            //get the q value for the current state and the chosen action, then the best
            //action for the new state and the q value of that action
            float      q          = store.GetQValue(state, action);
            GameAction bestAction = store.GetBestAction(newState);
            float      maxQ       = store.GetQValue(newState, bestAction);

            //apply the q-learning formula
            q = (1f - alpha) * q + alpha * (reward + gamma * maxQ);
            //store the updated q value, indexed by the state and action it belongs to
            store.StoreQValue(state, action, q);
            state = newState;
        }
    }
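
Neither example shows the QValueStore class itself. As a rough illustration of what GetBestAction could be doing, here is a minimal sketch of a table-backed store. Everything in it is an assumption made for illustration: the class name SimpleQValueStore, the use of plain int values in place of GameState and GameAction, the default q value of 0 for unseen pairs, and the tie-breaking rule. The actual class in the project above may work differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for QValueStore. States and actions are plain ints here,
// not the GameState/GameAction types used in the examples.
public class SimpleQValueStore
{
    private readonly Dictionary<(int state, int action), float> table =
        new Dictionary<(int state, int action), float>();
    private readonly int actionCount;

    public SimpleQValueStore(int actionCount)
    {
        if (actionCount <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(actionCount));
        }
        this.actionCount = actionCount;
    }

    // Returns the stored q value, or 0 for a pair that was never stored (assumed default).
    public float GetQValue(int state, int action)
    {
        return table.TryGetValue((state, action), out float q) ? q : 0f;
    }

    // Returns the action with the highest q value for the given state.
    // Ties resolve to the lowest action index (an arbitrary choice for this sketch).
    public int GetBestAction(int state)
    {
        int best = 0;
        float bestQ = GetQValue(state, 0);
        for (int a = 1; a < actionCount; a++)
        {
            float q = GetQValue(state, a);
            if (q > bestQ)
            {
                best = a;
                bestQ = q;
            }
        }
        return best;
    }

    // Overwrites the q value for the given state/action pair.
    public void StoreQValue(int state, int action, float value)
    {
        table[(state, action)] = value;
    }
}
```

With a store created as `new SimpleQValueStore(3)`, calling `StoreQValue(0, 2, 1.5f)` makes `GetBestAction(0)` return 2, while a state with no stored values falls back to action 0.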

Example #2

    public IEnumerator Learn(
        ReinforcementProblem problem,
        int numIterations,
        float alpha,
        float gamma,
        float rho,
        float nu)
    {
        if (store == null)
        {
            yield break;
        }

        GameState state = problem.GetRandomState();

        for (int i = 0; i < numIterations; i++)
        {
            yield return(null);

            // with probability nu, restart the walk from a random state
            if (Random.value < nu)
            {
                state = problem.GetRandomState();
            }
            GameAction[] actions;
            actions = problem.GetAvailableActions(state);
            GameAction action;
            // with probability rho, explore with a random action; otherwise use the best known action
            if (Random.value < rho)
            {
                action = GetRandomAction(actions);
            }
            else
            {
                action = store.GetBestAction(state);
            }
            float     reward = 0f;
            GameState newState;
            newState = problem.TakeAction(state, action, ref reward);
            float      q          = store.GetQValue(state, action);
            GameAction bestAction = store.GetBestAction(newState);
            float      maxQ       = store.GetQValue(newState, bestAction);
            // perform QLearning
            q = (1f - alpha) * q + alpha * (reward + gamma * maxQ);
            store.StoreQValue(state, action, q);
            state = newState;
        }
        yield break;
    }
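
The update line in the loop blends the previous estimate with the newly observed sample. To make the arithmetic easy to check by hand, the sketch below pulls that one expression into a pure function. The class name QUpdate and the method name Apply are hypothetical and do not appear in either project. Only the formula itself is taken from the example.

```csharp
using System;

// Hypothetical helper, not part of the examples above.
public static class QUpdate
{
    // Returns (1 - alpha) * q + alpha * (reward + gamma * maxQ), where
    // q is the current estimate for (state, action) and
    // maxQ is the q value of the best action in the new state.
    public static float Apply(float q, float reward, float maxQ, float alpha, float gamma)
    {
        return (1f - alpha) * q + alpha * (reward + gamma * maxQ);
    }
}
```

For instance, with q = 2, reward = 1, maxQ = 4, alpha = 0.5 and gamma = 0.5, the result is 0.5 * 2 + 0.5 * (1 + 0.5 * 4) = 2.5. With alpha = 0 the old estimate is returned unchanged, and with alpha = 1 it is replaced entirely by reward + gamma * maxQ.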