
Troubleshooting EraSearch

Estimated reading time: 3 minutes
  • reference
  • self-hosted
  • eracloud

This page lists common EraSearch errors and issues and explains how to fix them.

Debugging Helm deployments for self-hosted EraSearch

There are several reasons Helm deployments fail. To start, get deployment details with the commands below, replacing NAMESPACE_NAME and NAME with the namespace and release name you used to install EraSearch.

Review your deployments, checking the READY and AVAILABLE columns:

$ kubectl get deployments -n NAMESPACE_NAME

Get deployment-object and status information:

$ kubectl describe deployment NAME -n NAMESPACE_NAME

Get pod-specific information:

$ kubectl describe pod POD_NAME -n NAMESPACE_NAME
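
If you don't yet know the pod name, list the pods in the EraSearch namespace first and use one of the names from the output as POD_NAME:

$ kubectl get pods -n NAMESPACE_NAME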

View warnings and other notifications related to EraSearch's namespace:

$ kubectl get events -n NAMESPACE_NAME
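
If the event list is long, you can optionally narrow it to warnings only, for example:

$ kubectl get events -n NAMESPACE_NAME --field-selector type=Warning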

Here are some common deployment issues and how to fix them:

  • Bad image pull secrets

    To identify this issue, run kubectl get pods or kubectl describe pod. Then check the output for the ImagePullBackOff pod status.

    To fix this issue, make sure the imagePullSecrets name in values-eradb.yaml matches the secret you created in the EraSearch namespace (see the example after this list).

  • Over-provisioned clusters

    To identify this issue, check for Insufficient error events, such as Insufficient CPU. Also, run kubectl get pods or kubectl describe pod to see if pods are stuck in a Pending state.

    This issue suggests there aren't enough Kubernetes cluster resources to support the deployment. To fix it, do one of the following:

    • Update the resources and replicaCount values in values-eradb.yaml to remain under the cluster resource limits. Note that reducing the resources available to EraSearch will decrease overall performance.
    • Increase the resources available to the Kubernetes cluster (or node group) to accommodate the EraSearch deployment.
  • No available persistent volumes

    To identify this issue, check for errors such as No persistent volumes available for this claim. Also, run kubectl get pods or kubectl describe pod to see if Cache Service pods are stuck in a Pending state.

    This issue suggests you have a misconfigured Kubernetes storage layer or you're using an invalid storage class identifier. To fix this issue, do the following:

    • Review the quarry.persistence.storageClass value in values-eradb.yaml. Make sure it's set to a valid cluster storage class. You can use kubectl get storageclass to see the available classes.
    • Make sure a storage class is available for pod storage.
    • Make sure your cluster has enough storage capacity to satisfy the persistence settings in values-eradb.yaml.
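
For reference, the values-eradb.yaml fragment below sketches where the image pull secret and storage class settings live, based on the keys mentioned above. The secret name must match the one you created in the EraSearch namespace, and gp2 is only a placeholder storage class; substitute a class returned by kubectl get storageclass on your cluster.

quarry:
  imagePullSecrets:
    - name: eradb-registry   # must match the secret in the EraSearch namespace
  persistence:
    storageClass: gp2        # placeholder; use a class from `kubectl get storageclass`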

Enabling debug logging for self-hosted EraSearch

Use debug logging to get in-depth database information and troubleshoot issues. To enable debug logging, add the value logLevel: debug to any EraSearch service in your values-eradb.yaml file.

For example, this values-eradb.yaml file enables debug logging for the Cache Service (also known as quarry):

quarry:
  logLevel: debug # ⭐️
  imagePullSecrets:
    - name: eradb-registry
  replicaCount: 4
  resources:
    cpu: 4
    memory: 8Gi
    disk: 2.5T

To deploy your changes, save the updated Helm chart and enter the command below, replacing:

  • NAME with the EraSearch database release name (for example, era).
  • X.X.X with the version of the Helm chart you got from Era Software.
  • VALUES_FILE with the path to the updated values-eradb.yaml file.
  • NAMESPACE_NAME with the relevant Kubernetes namespace.

$ helm upgrade NAME ./eradb-X.X.X.tgz \
    --values VALUES_FILE \
    --namespace NAMESPACE_NAME

A successful upgrade returns Release NAME has been upgraded. Happy Helming! along with other deployment details.
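
Once the upgrade finishes, one way to confirm the new log level is to tail the logs of one of the service's pods, replacing POD_NAME with a pod name from kubectl get pods:

$ kubectl logs POD_NAME -n NAMESPACE_NAME --tail=100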

Slow write throughput

Slow write throughput can have several different causes. To start:

  • Make sure the CPU-bound task latency metric is under 1s per pod. To improve this metric, add more CPU resources to the API and Cache tiers.
  • Make sure the disk-bound task latency metric is under 5ms per pod. To improve this metric, add more disk resources or higher IOPS to your Cache Service tier.
  • If insertion times are much larger than the maxwell.treasurer.batch_delay_ms setting, reduce the batch_delay_ms setting and increase the monthly_budget setting (see the sketch after this list). Note that this change increases your object storage costs, but it also increases overall system throughput.
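
As a rough sketch, and assuming the maxwell.treasurer settings are nested under a maxwell section of values-eradb.yaml (the exact location and defaults may differ in your chart version), the change might look like this. The values shown are placeholders, not recommendations:

maxwell:
  treasurer:
    batch_delay_ms: 500    # placeholder; lower this if insertion times far exceed the batch delay
    monthly_budget: 2000   # placeholder; raise alongside the lower batch delay (higher object storage costs)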