
zorksylar

Nothing is impossible, if distributed.

 
 
 


[Repost] Machine Check Exceptions (MCE)

2012-02-27 17:03:41 | Category: Linux learning

Reposted from: http://www.advancedclustering.com/faq/im-getting-mce-machine-check-exception-errors-what-does-this-mean.html

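When a server misbehaves, the first thing to do is read the logs.
I have recently been deploying a cloud storage system for hfut,
and the servers kept rebooting at irregular intervals, fortunately only a few times.
Looking through syslog and messages, I found this line appearing repeatedly:
Machine check events logged.
Googling it led me to mcelog.
MCE: Machine Check Exceptions
These are generally hardware problems. The system corrects them when it can; more serious ones can cause a kernel panic.
mcelog is a tool for viewing MCEs, and it has to be installed first.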
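
The log check described above can also be scripted. The following is a minimal sketch and not part of the original post: `find_mce_notices` is a hypothetical helper name, and the sample log lines (timestamps, host name, sshd entry) are made up for illustration. Only the "Machine check events logged" text comes from the logs quoted above.

```python
import re

# Kernel notice quoted in the post; matched case-insensitively because
# its capitalisation differs between sources.
MCE_NOTICE = re.compile(r"machine check events? logged", re.IGNORECASE)


def find_mce_notices(lines):
    """Return (line_number, text) pairs for lines containing the MCE notice."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if MCE_NOTICE.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical log excerpt, for illustration only.
sample_log = [
    "Feb 27 16:58:02 node1 kernel: Machine check events logged\n",
    "Feb 27 16:58:05 node1 sshd[2101]: Accepted publickey for root\n",
    "Feb 27 17:01:44 node1 kernel: Machine check events logged\n",
]

for number, text in find_mce_notices(sample_log):
    print(f"line {number}: {text}")
```

Because the function accepts any iterable of lines, an open file object for the system log can be passed in directly; the location of that log differs between distributions.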

1. Install mcelog
2. Make sure MCE support is enabled on the system:
First check whether MCE is enabled in the kernel configuration

#cd /boot

#grep MCE config*

3. Tell mcelog which CPU type to decode for, and run it in daemon mode

#mcelog --cpu your_cpu --daemon

With this in place, any MCE that occurs will show up in /var/log/mcelog
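
The check in step 2 can also be done from a script. The sketch below is an illustration rather than part of the original post: `mce_options` is a hypothetical helper that mirrors `grep MCE config*` on the text of a kernel config file, and the sample config text is made up for the example. On many distributions the real file is named config-<kernel version> and sits in /boot, but treat that location as an assumption.

```python
def mce_options(config_text):
    """Map every kernel config option mentioning MCE to its setting.

    Values are returned as written in the file ('y', 'm', ...), or
    'not set' for lines of the form '# CONFIG_FOO is not set'.
    """
    suffix = " is not set"
    options = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if "MCE" not in line:
            continue
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Strip the leading '# ' and the trailing ' is not set'.
            options[line[2:-len(suffix)]] = "not set"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


# Hypothetical excerpt of a kernel config file, for illustration only.
sample_config = """\
CONFIG_SMP=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
# CONFIG_X86_MCE_INJECT is not set
"""

for name, value in mce_options(sample_config).items():
    print(f"{name}: {value}")
```

If `CONFIG_X86_MCE` is missing or reported as not set, the kernel was built without machine check support and mcelog will have nothing to read.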


The rest is copied verbatim from the source:

What are Machine Check Exceptions (or MCE)?

A machine check exception is an error detected by your system's processor. There are two major types of MCE: a notice or warning, and a fatal exception. A warning is recorded with a "Machine check events logged" notice in your system logs and can be viewed later with Linux utilities. A fatal MCE causes the machine to stop responding, and the details of the MCE are printed to the system's console.

What causes MCE errors?

The most common causes of MCE events are:

  • Memory errors or Error Correction Code (ECC) problems
  • Inadequate cooling / processor over-heating
  • System bus errors
  • Cache errors in the processor or hardware

How do I find out what the errors mean?

If you see the message "Machine check events logged" on your console or in your system logs, you can run the mcelog command to read the message from the kernel. Once mcelog has read the events, re-running it will not show them again, so it is best to redirect the output to a file for later analysis. For example:

root@localhost:/root> /usr/sbin/mcelog > mcelog.out

Some systems run mcelog for you on a regular basis and write the output to /var/log/mcelog. If you see the "Machine check events logged" message but mcelog returns no data, check /var/log/mcelog.

The output may not always be easy to understand. If you have any questions about the decoded error message, please create a support ticket and we will help analyze the problem.

What if I get a fatal machine check event that causes my machine to stop responding?

These errors are almost always caused by faulty hardware. Capture the MCE message so that you can run it through the mcelog program once the machine is back up. Here is an example of a message you might see:

CPU 1: Machine Check Exception:                4 Bank 4:  f600200137080813
TSC b0ce27165dd3 ADDR 180ee1b40
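
When a console message like this has been copied by hand, it can help to confirm that every field survived intact before decoding it. The following is a minimal sketch, not part of the original FAQ: `parse_fatal_mce` and the regular expression are hypothetical and only assume the layout of the two lines shown above, with all values except the CPU and bank numbers read as hexadecimal.

```python
import re

# Layout assumed from the example message: CPU number, a global status
# value, the bank number, the bank status, then TSC and an optional ADDR.
MCE_RE = re.compile(
    r"CPU\s+(?P<cpu>\d+):\s+Machine Check Exception:\s+(?P<mcgstatus>[0-9a-fA-F]+)\s+"
    r"Bank\s+(?P<bank>\d+):\s+(?P<status>[0-9a-fA-F]+)\s+"
    r"TSC\s+(?P<tsc>[0-9a-fA-F]+)(?:\s+ADDR\s+(?P<addr>[0-9a-fA-F]+))?"
)


def parse_fatal_mce(text):
    """Return the fields of a fatal MCE console message, or None if no match."""
    match = MCE_RE.search(text)
    if match is None:
        return None
    fields = {"cpu": int(match.group("cpu")), "bank": int(match.group("bank"))}
    for key in ("mcgstatus", "status", "tsc", "addr"):
        if match.group(key) is not None:
            fields[key] = int(match.group(key), 16)
    return fields


message = """\
CPU 1: Machine Check Exception:                4 Bank 4:  f600200137080813
TSC b0ce27165dd3 ADDR 180ee1b40
"""

for key, value in parse_fatal_mce(message).items():
    print(key, value if key in ("cpu", "bank") else hex(value))
```

A result of `None` suggests the message was truncated or mistyped while being copied.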

Paste or type the error message into a file, and then run it through mcelog, for example:

root@localhost:/root> /usr/sbin/mcelog --k8 --ascii < myerror

Use the --k8 option if you are using an AMD Opteron or Athlon 64 processor, or replace it with --p4 for a Pentium 4 or Xeon. Here is the output for the MCE error shown above:

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC b0ce27165dd3
Northbridge Chipkill ECC error
Chipkill ECC syndrome = 3700
bit32 = err cpu0
bit45 = uncorrected ecc error
bit57 = processor context corrupt
bit61 = error uncorrected
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS f600200137080813 MCGSTATUS 4

This indicates that an uncorrected ECC error occurred, which means that one of your memory modules has failed. For further analysis, please submit a support ticket with the complete MCE error message and the output of mcelog.
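
The bitNN lines in the decoded output correspond to individual bits of the STATUS value. The sketch below is an illustration, not how mcelog itself is implemented: `decode_status` is a hypothetical helper, and its table contains only the bit descriptions that appear in the output above. Bits 32 and 45 in particular belong to this northbridge report and should not be assumed to mean the same thing for other banks or processors.

```python
# Bit descriptions copied from the mcelog output above.
STATUS_BITS = {
    32: "err cpu0",
    45: "uncorrected ecc error",
    57: "processor context corrupt",
    61: "error uncorrected",
    62: "error overflow (multiple errors)",
}


def decode_status(status):
    """List the known bits that are set in a 64-bit status value."""
    return [
        f"bit{bit} = {description}"
        for bit, description in sorted(STATUS_BITS.items())
        if (status >> bit) & 1
    ]


for line in decode_status(0xf600200137080813):
    print(line)
```

For the STATUS value f600200137080813 this prints the same five bitNN lines as the mcelog output, in the same order.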
