Error 210: NETWORK_ERROR
This error occurs when network communication fails due to connection or other I/O problems. It indicates that data could not be sent or received over the network, typically because the connection was closed, refused, or reset.
Most common causes
- Broken pipe (client disconnected)
  - Client closed connection while the server was sending data
  - Client crashed or was terminated during query execution
  - Client timeout shorter than query duration
  - Client application restarted or connection pool recycled connection
- Connection refused
  - Target server not listening on specified port
  - Server pod not ready or being restarted
  - Firewall blocking connection
  - Wrong hostname or port in configuration
- Socket not connected
  - Client disconnected prematurely
  - Connection closed before response could be sent
  - Network interruption during data transfer
  - Client-side connection timeout
- Connection reset by peer
  - Remote side forcibly closed connection (TCP RST)
  - Network equipment reset connection
  - Remote server crashed or restarted
  - Firewall or security device dropped connection
- Distributed query failures
  - Cannot connect to remote shard in cluster
  - Network partition between cluster nodes
  - Remote node down or unreachable
  - All connection attempts to replicas failed
- Network infrastructure issues
  - Load balancer health check failures
  - Pod restarts or rolling updates
  - Network policy blocking traffic
  - DNS resolution succeeds but the subsequent connection fails
Common solutions
1. Check if client disconnected early
For "broken pipe" errors:
Cause: The query took longer than the client's timeout.
Solution:
- Increase the client-side timeout
- Optimize the query to run faster
- Add LIMIT to reduce result size
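The mechanics of a broken pipe can be reproduced with plain sockets: one side (the "client") closes while the other (the "server") still has data to write. This is only a local sketch using a socket pair; ClickHouse hits the same condition (errno EPIPE) on its TCP connection to a departed client.

```python
import socket

def broken_pipe_demo():
    # A connected pair of sockets stands in for the client/server TCP link.
    server_sock, client_sock = socket.socketpair()
    client_sock.close()  # client disconnects (timeout, crash, restart)
    try:
        # The first write may land in the buffer; a subsequent write fails
        # with EPIPE (BrokenPipeError) or ECONNRESET, depending on timing.
        for _ in range(100):
            server_sock.sendall(b"result row\n" * 1000)
    except (BrokenPipeError, ConnectionResetError) as exc:
        return type(exc).__name__
    finally:
        server_sock.close()
    return None

print(broken_pipe_demo())
```

Note that from the server's point of view the query itself may already have succeeded; only the delivery of results failed.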
2. Verify server availability
For "connection refused" errors:
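A minimal reachability probe, using only the standard library, answers the first question for "connection refused": is anything listening on that host and port at all? The self-check below binds a local listener so the example runs anywhere; substitute your server's host and port in practice.

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError, timeouts, DNS failures
        return False

# Self-contained check: listen on an ephemeral port, probe it, then close it.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
print(port_is_open("127.0.0.1", port))   # server listening -> True
listener.close()
print(port_is_open("127.0.0.1", port))   # nothing listening -> refused -> False
```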
3. Check cluster connectivity
4. Increase client timeout
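The timeout mismatch behind many of these errors can be seen at the socket level: a client that allows less time than the server needs gives up, and the server later fails when writing to the abandoned connection. A sketch with a local socket pair:

```python
import socket

client, server = socket.socketpair()

client.settimeout(0.2)   # client allows only 0.2 s for a response
try:
    client.recv(1)       # server is still "executing the query"
except socket.timeout:
    print("client timed out waiting for the server")

client.settimeout(5.0)   # a timeout matched to the expected duration
server.sendall(b"ok")    # server finally responds
print(client.recv(2))    # response arrives within the generous timeout

client.close()
server.close()
```

The fix is to size the client timeout to the expected query duration, not the other way around.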
5. Check for pod restarts
6. Verify network policies and firewall
7. Handle gracefully in application
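Graceful handling usually means retry with exponential backoff and jitter. This is a sketch: `run_query` is a placeholder for your real client call, and the flaky backend below merely simulates two transient network failures before succeeding.

```python
import random
import time

def with_retries(run_query, max_attempts=5, base_delay=0.5):
    """Call run_query, retrying transient network errors with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except (ConnectionError, OSError):
            if attempt == max_attempts:
                raise  # exhausted: surface the error to the caller
            # Exponential backoff with a little jitter to avoid thundering herds.
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1)
            time.sleep(delay)

# Simulated flaky backend: fails twice, then succeeds.
attempts = {"n": 0}
def flaky_query():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionResetError("connection reset by peer")
    return "result"

print(with_retries(flaky_query, base_delay=0.01))
```

Only retry operations that are safe to repeat; retrying a non-idempotent INSERT can duplicate data unless deduplication is in place.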
Common scenarios
Scenario 1: Broken pipe during long query
Cause: Client disconnected after 3+ hours; query completed on server but client was gone.
Solution:
- Increase client timeout to match expected query duration
- Set realistic timeout expectations
- For very long queries (>1 hour), consider using INSERT INTO ... SELECT to materialize results
- ClickHouse Cloud gracefully terminates connections with a 1-hour timeout during drains
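A hedged illustration of the INSERT INTO ... SELECT pattern; all table and column names here are hypothetical:

```sql
-- Destination table for the materialized result (names are illustrative).
CREATE TABLE query_results
(
    day Date,
    events UInt64
)
ENGINE = MergeTree
ORDER BY day;

-- The long-running query executes entirely server-side; no client
-- connection has to stay open for hours to stream a large result set.
INSERT INTO query_results
SELECT toDate(event_time) AS day, count() AS events
FROM big_events
GROUP BY day;

-- Later, fetch the small materialized result with a fast query.
SELECT * FROM query_results ORDER BY day;
```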
Scenario 2: Connection refused in distributed query
Cause: Cannot connect to remote shard; pod may be restarting.
Solution:
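One way to see which hosts a distributed query cannot reach is the system.clusters table, which includes a per-host error counter (the cluster name below is hypothetical):

```sql
-- Inspect every shard/replica the server knows about in this cluster;
-- a non-zero errors_count points at the unreachable host.
SELECT cluster, shard_num, replica_num, host_name, port, errors_count
FROM system.clusters
WHERE cluster = 'my_cluster';
```

If a pod is restarting, the error is usually transient and a retry succeeds once the pod is ready again.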
Scenario 3: Socket not connected after query completes
Cause: Client closed connection before server could send response.
Solution:
- This often appears in logs after successful query completion
- Usually harmless - query already processed successfully
- Client may have closed connection early due to timeout or crash
- Check client logs for why disconnect occurred
Scenario 4: Connection reset by peer
Cause: Remote side forcibly terminated connection.
Solution:
- Check if remote server crashed or restarted
- Verify network stability
- Check firewall or security appliance logs
- Test with simpler queries
Scenario 5: All connection tries failed
Cause: Cannot establish connection to any replica.
Solution:
- Check if all cluster nodes are down
- Verify network connectivity
- Check ClickHouse server status
- Review cluster configuration
Prevention tips
- Set appropriate client timeouts: Match client timeout to expected query duration
- Handle connection errors: Implement retry logic with exponential backoff
- Monitor network health: Track connection failures and latency
- Use connection pooling: Maintain healthy connection pools
- Plan for restarts: Design applications to handle temporary connection failures
- Keep connections alive: Configure TCP keep-alive appropriately
- Optimize queries: Reduce query execution time to avoid timeout issues
Debugging steps
- Identify the error type (broken pipe, connection refused, socket not connected, connection reset)
- Check server logs for the specific error pattern
- Check for pod restarts (Kubernetes)
- Monitor distributed query failures
- Check network connectivity between client and server
- Compare the query duration against the client timeout
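The log-pattern step can be partially automated by matching lines against the typical error texts. The patterns below mirror the standard strerror() messages; exact ClickHouse log formats vary by version, so treat this as a sketch.

```python
import re

# Typical substrings of network-error messages as they appear in logs.
PATTERNS = {
    "broken_pipe": re.compile(r"Broken pipe", re.IGNORECASE),
    "connection_refused": re.compile(r"Connection refused", re.IGNORECASE),
    "connection_reset": re.compile(r"Connection reset by peer", re.IGNORECASE),
    "not_connected": re.compile(r"not connected", re.IGNORECASE),
}

def classify_log_line(line: str) -> str:
    """Return the first matching subcategory, or 'other'."""
    for name, pattern in PATTERNS.items():
        if pattern.search(line):
            return name
    return "other"

print(classify_log_line("Code: 210. DB::NetException: Connection reset by peer"))
```

Counting classifications over a log window shows whether errors are occasional (normal) or sustained (needs investigation).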
Special considerations
For "broken pipe" errors:
- Usually indicates client disconnected
- Query may have completed successfully before disconnect
- Common with long-running queries and short client timeouts
- Often not a server-side issue
For "connection refused" errors:
- Server not ready to accept connections
- Common during pod restarts or scaling
- Temporary and usually resolved by retry
- Check if server is actually running
For "socket not connected" errors:
- Appears in ServerErrorHandler logs
- Often logged after query already completed
- Client disconnected before server could send final response
- Usually benign if query completed successfully
For distributed queries:
- Each shard connection can fail independently
- ALL_CONNECTION_TRIES_FAILED means no replicas are accessible
- Check network between cluster nodes
- Verify all nodes are running
Common error subcategories
Broken pipe (errno 32):
- Client closed write end of connection
- Server trying to send data to closed socket
- Usually client-side timeout or crash
Connection refused (errno 111):
- No process listening on target port
- Server not started or port closed
- Firewall blocking connection
- Wrong hostname or port
Socket not connected (errno 107):
- Operation on socket that isn't connected
- Client disconnected before operation
- Premature connection close
Connection reset by peer (errno 104):
- Remote side sent TCP RST
- Forceful connection termination
- Often due to firewall or remote crash
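The errno numbers above are the conventional Linux values; in client code it is safer to compare against the symbolic constants from the errno module, e.g.:

```python
import errno

# Map the subcategories above to their symbolic errno constants.
SUBCATEGORIES = {
    errno.EPIPE: "broken pipe",                    # 32 on Linux
    errno.ECONNREFUSED: "connection refused",      # 111 on Linux
    errno.ENOTCONN: "socket not connected",        # 107 on Linux
    errno.ECONNRESET: "connection reset by peer",  # 104 on Linux
}

def subcategory(exc: OSError) -> str:
    return SUBCATEGORIES.get(exc.errno, "other network error")

print(subcategory(ConnectionRefusedError(errno.ECONNREFUSED, "refused")))
```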
Network error settings
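A sketch of the session-level timeout settings most relevant here. The setting names are ClickHouse's; the values are illustrative, and defaults vary by version, so check the documentation for your release:

```sql
SET connect_timeout = 10;          -- seconds to establish a connection
SET send_timeout = 300;            -- seconds allowed to send data to the peer
SET receive_timeout = 300;         -- seconds allowed to receive data from the peer
SET tcp_keep_alive_timeout = 290;  -- idle seconds before TCP keep-alive probes
```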
Handling in distributed queries
For distributed queries with failover:
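A few settings are commonly tuned for failover behavior; values here are illustrative, so verify names and defaults against your version's documentation:

```sql
SET connect_timeout_with_failover_ms = 1000;  -- per-attempt connect timeout
SET connections_with_failover_max_tries = 3;  -- connection attempts per replica
SET skip_unavailable_shards = 1;              -- partial results instead of failing
SET load_balancing = 'nearest_hostname';      -- replica selection policy
```

Note that skip_unavailable_shards trades completeness for availability: queries succeed, but silently omit data from unreachable shards.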
Client-side best practices
- Set realistic timeouts
- Implement retry logic
- Handle long-running queries
- Monitor connection health:
  - Log connection errors on client side
  - Track retry counts
  - Alert on sustained network errors
Distinguishing from other errors
- NETWORK_ERROR (210): Network/socket I/O failure
- SOCKET_TIMEOUT (209): Timeout during socket operation
- TIMEOUT_EXCEEDED (159): Query execution time limit
- ALL_CONNECTION_TRIES_FAILED (279): All connection attempts failed
NETWORK_ERROR is specifically about connection failures and broken sockets.
Query patterns that commonly trigger this
- Long-running SELECT queries:
  - Query duration exceeds client timeout
  - Results in broken pipe when server tries to send results
- Large data transfers:
  - Client buffer overflows
  - Client application can't keep up with data rate
- INSERT INTO ... SELECT FROM s3():
  - Long-running imports from S3
  - Client timeout during multi-hour operations
- Distributed queries:
  - Connection to remote shards fails
  - Network issues between cluster nodes
If you're experiencing this error:
- Check the specific error message (broken pipe, connection refused, etc.)
- For "broken pipe": verify client timeout settings and query duration
- For "connection refused": check if the server is running and accessible
- For "socket not connected": usually harmless if query completed
- Test network connectivity between client and server
- Check for pod restarts or infrastructure changes (Cloud/Kubernetes)
- Implement retry logic for transient network failures
- For very long queries (>1 hour), consider alternative patterns
- Monitor frequency - occasional errors are normal, sustained errors need investigation