[OpenShift] machine-config error: Failed to resync 4.12.0 because: error during syncRequiredMachineConfigPools

Linux/OpenShift | 2023. 2. 13. 13:40

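While checking the status of the Cluster Operators, machine-config reported the error shown below.

The message indicates that an error occurred while synchronizing the required machine config pools.

It can be handled as follows.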

 

 

[ Error ]

# oc get co machine-config
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.12.0    True        False         True       4d17h   Failed to resync 4.12.0 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 2)]
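
When the full operator list is printed, the degraded entries can be filtered out of the table. The following is a minimal sketch that assumes the default `oc get co` column order shown above; the function name `degraded_operators` is made up for illustration.

```shell
# degraded_operators: reads `oc get co` table output on stdin and prints the
# NAME of every operator whose DEGRADED column is "True".
# Assumes the default column order:
#   NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
# and that none of the first five columns is empty.
degraded_operators() {
  awk 'NR > 1 && $5 == "True" { print $1 }'
}

# Usage against a live cluster:
#   oc get co | degraded_operators
```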

 

 

[ Attempts ]

First, check the Machine Config objects.

# oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          2b3eba74dd9e4371f35ab41dbda02642f60707ec   3.2.0             4d18h
00-worker                                          2b3eba74dd9e4371f35ab41dbda02642f60707ec   3.2.0             4d18h
01-master-container-runtime                        2b3eba74dd9e4371f35ab41dbda02642f60707ec   3.2.0             4d18h
01-master-kubelet                                  2b3eba74dd9e4371f35ab41dbda02642f60707ec   3.2.0             4d18h
01-worker-container-runtime                        2b3eba74dd9e4371f35ab41dbda02642f60707ec   3.2.0             4d18h
01-worker-kubelet                                  2b3eba74dd9e4371f35ab41dbda02642f60707ec   3.2.0             4d18h
99-master-generated-registries                     2b3eba74dd9e4371f35ab41dbda02642f60707ec   3.2.0             4d18h
99-master-ssh                                                                                 3.2.0             4d20h
99-worker-generated-registries                     2b3eba74dd9e4371f35ab41dbda02642f60707ec   3.2.0             4d18h
99-worker-ssh                                                                                 3.2.0             4d20h
rendered-master-66a1694f13f84151e53205fd83d336d0   2b3eba74dd9e4371f35ab41dbda02642f60707ec   3.2.0             16h
rendered-worker-bb1960e620ab292f59adbe8549700341   2b3eba74dd9e4371f35ab41dbda02642f60707ec   3.2.0             20h
rendered-worker-e04ed26b510360ba9a8ccbba6aa7382e   2b3eba74dd9e4371f35ab41dbda02642f60707ec   3.2.0             4d18h
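
The `rendered-<pool>-<hash>` entries are the merged configs that a pool actually rolls out, so it can help to list only those for one pool together with their age. A small sketch, assuming the `oc get mc` layout shown above; `rendered_configs` is a hypothetical helper name.

```shell
# rendered_configs POOL: reads `oc get mc` table output on stdin and prints
# the name and AGE (last column) of every rendered config for POOL.
# Limitation: a prefix match, so "worker" would also match a pool whose
# name starts with "worker-".
rendered_configs() {
  awk -v prefix="rendered-$1-" 'index($1, prefix) == 1 { print $1, $NF }'
}

# Usage against a live cluster:
#   oc get mc | rendered_configs worker
```

In the listing above this would print two rendered worker configs, one 20h old and one 4d18h old, which suggests that a new worker configuration was generated well after installation.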

 

Check the Machine Config Pools.

# oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master                                                      False     True       True       3              0                   0                     3                      16h
worker   rendered-worker-e04ed26b510360ba9a8ccbba6aa7382e   False     True       True       2              0                   0                     1                      4d19h
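
Note that the master row has an empty CONFIG column, which shifts whitespace-separated fields by one. The sketch below picks out degraded pools by counting columns from the right instead; it assumes the header layout shown above and a non-empty AGE column, and `degraded_pools` is an illustrative name.

```shell
# degraded_pools: reads `oc get mcp` table output on stdin and prints the
# NAME of every pool whose DEGRADED column is "True".
degraded_pools() {
  awk '
    NR == 1 {
      # Find DEGRADED in the header and remember its distance from the last column.
      for (i = 1; i <= NF; i++) if ($i == "DEGRADED") from_end = NF - i
      next
    }
    # Index from the right so that an empty CONFIG column does not shift the lookup.
    NF > from_end && $(NF - from_end) == "True" { print $1 }
  '
}

# Usage against a live cluster:
#   oc get mcp | degraded_pools
```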

 

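If only the master pool were affected, deleting the pool with the command below would be enough to clear the degraded state (the pool is recreated automatically after a while).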

# oc delete mcp master

 

In the actual case, however, the worker nodes turned out to be affected as well.

Inspecting the machine config pool helps determine whether or not a configuration change caused the problem.

# oc describe mcp worker |grep -i node
        f:nodeSelector:
            f:node-role.kubernetes.io/worker:
  Node Selector:
      node-role.kubernetes.io/worker:  
    Message:               All nodes are updating to rendered-worker-bb1960e620ab292f59adbe8549700341
    Message:               Node worker01.az1.sysdocu.kr is reporting: "Error checking type of update image: failed to run command podman (6 tries): [timed out waiting for the condition, running podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6db665511f305ef230a2c752d836fe073e80550dc21cede3c55cf44db01db365 failed: Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6db665511f305ef230a2c752d836fe073e80550dc21cede3c55cf44db01db365: reading manifest sha256:6db665511f305ef230a2c752d836fe073e80550dc21cede3c55cf44db01db365 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized\n: exit status 125]"
    Reason:                1 nodes are reporting degraded status on sync
    Type:                  NodeDegraded
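
The node name is buried inside a long message, so it can be handy to extract it. A minimal sketch, assuming the `Node <name> is reporting:` wording seen in the output above; `reporting_nodes` is a hypothetical helper.

```shell
# reporting_nodes: reads `oc describe mcp POOL` output on stdin and prints the
# node named in each "Message: Node <name> is reporting: ..." line.
# Limitation: only the first node on each Message line is printed.
reporting_nodes() {
  sed -n 's/^[[:space:]]*Message:[[:space:]]*Node \([^[:space:]]*\) is reporting:.*/\1/p'
}

# Usage against a live cluster:
#   oc describe mcp worker | reporting_nodes
```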

 

Check the state of node worker01.az1.sysdocu.kr in more detail.

# oc describe node/worker01.az1.sysdocu.kr

...

Annotations:        machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-e04ed26b510360ba9a8ccbba6aa7382e
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-bb1960e620ab292f59adbe8549700341
                    machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-worker-e04ed26b510360ba9a8ccbba6aa7382e
                    machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-worker-e04ed26b510360ba9a8ccbba6aa7382e
                    machineconfiguration.openshift.io/reason:
                      Error checking type of update image: failed to run command podman (6 tries): [timed out waiting for the condition, running podman pull -q ...
                      : exit status 125]
                    machineconfiguration.openshift.io/ssh: accessed
                    machineconfiguration.openshift.io/state: Degraded
                    volumes.kubernetes.io/controller-managed-attach-detach: true

...

 

The node state is still reported as Degraded.
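
The annotations also show that currentConfig and desiredConfig differ, meaning the node never finished moving to the new rendered config. A small sketch that makes this comparison explicit; it assumes the annotation layout shown in the `oc describe node` output above, and `config_drift` is an illustrative name.

```shell
# config_drift: reads `oc describe node NODE` output on stdin, prints the
# currentConfig and desiredConfig annotations, and returns 0 only if both
# are present and equal (1 otherwise).
config_drift() {
  awk '
    /machineconfiguration.openshift.io\/currentConfig:/ { current = $NF }
    /machineconfiguration.openshift.io\/desiredConfig:/ { desired = $NF }
    END {
      print "current: " current
      print "desired: " desired
      exit (current != "" && current == desired ? 0 : 1)
    }
  '
}

# Usage against a live cluster:
#   oc describe node/worker01.az1.sysdocu.kr | config_drift || echo "node has not reached its desired config"
```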

 

[ Solution ]

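To get rid of the error reported by the insights operator, I had earlier disabled Insights so that it would no longer send error events,

and that turned out to be what caused the machine-config error.

Since the error information cannot be sent anyway even with Insights enabled, I reverted the setting to the state shown below,

and machine-config returned to normal.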

 

# oc get co insights
NAME       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
insights   4.12.0    False       False         True       7d16h   Unable to report: unable to build request to connect to Insights server: Post "https://console.redhat.com/api/ingress/v1/upload": x509: certificate is valid for *.apps.az1.sysdocu.kr, not console.redhat.com

 

# oc get co machine-config
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.12.0    True        False         False      7d18h  
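
The post does not say how Insights was disabled. One common way to opt out of remote health reporting is to edit the global cluster pull secret, and the earlier podman error was an authorization failure against quay.io using `/var/lib/kubelet/config.json`. If the pull secret was the thing that changed (an assumption, not something confirmed above), a quick check is whether it still carries credentials for the registry in question. The helper name `has_registry_auth` is hypothetical.

```shell
# has_registry_auth REGISTRY: reads a decoded .dockerconfigjson document on
# stdin and succeeds if REGISTRY appears as a key in it.
# This is a crude textual check, not a real JSON parser.
has_registry_auth() {
  tr -d '[:space:]' | grep -qF "\"$1\":"
}

# Usage against a live cluster (secret/pull-secret in openshift-config is
# the global cluster pull secret):
#   oc get secret/pull-secret -n openshift-config \
#     -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d \
#     | has_registry_auth quay.io \
#     && echo "quay.io credentials present" \
#     || echo "quay.io credentials missing"
```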
