When troubleshooting faults at my company, we usually add log statements at key points in the code, replace the JAR in the environment's container, and then analyze the output. This process is quite cumbersome:
- First, the logs have to be comprehensive. If a key piece of information was not logged, you have to add more logging, replace the package, and restart the service again, which wastes a lot of time.
- Second, not every environment allows package replacement and service restarts.
So I recently went looking for a better way to diagnose faults and found Arthas, Alibaba's open-source diagnostic tool for Java services, which looks quite good. It can show and record method call parameters, return values, call paths, call durations, and invocation, success, and failure counts. These notes record what I learned while trying it out.
What is Arthas#
Official introduction:
Arthas is an online monitoring and diagnostic product that allows real-time viewing of application load, memory, GC, and thread status information from a global perspective. It can diagnose business problems without modifying application code, including viewing method call parameters, exceptions, monitoring method execution time, class loading information, etc., greatly improving the efficiency of online problem troubleshooting.
Operating Environment#
- Requires JDK 6 or later
- Written in Java and cross-platform: Linux (the main target), Mac, and Windows
Features#
- Uses command-line interactive mode
- Provides Tab key auto-completion
Initial Use#
Since my company's services mostly run in containers, the notes below focus on using the tool in a Linux environment.
Download Usage Package#
The company environment is an intranet without direct access to GitHub, so the installation package cannot be downloaded from inside the container. Instead, I download it manually and copy it into the service container.
Download the full installation package from the GitHub releases page: https://github.com/alibaba/arthas/releases
Unzip in Container Environment#
# Create a directory dedicated to arthas, as many files will be generated after unzipping
mkdir arthas
# Unzip to the newly created arthas directory
unzip -d arthas arthas-bin.zip
Uninstall#
After locating the problem, it is important to clean up, so I am also recording how to uninstall the tool.
Run the following three commands to uninstall it:
rm -rf arthas
rm -rf ~/.arthas/
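# Remove only the Arthas log directory rather than all of ~/logs, which other programs may also use
rm -rf ~/logs/arthas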
Run#
First, start a long-running Java program. The official installation package ships with a JAR for practice, math-game.jar. Our own services run continuously anyway, but these notes use the official demo.
# Start this Java program; if you have your own service, you can skip this step
java -jar math-game.jar
Then start Arthas:
# 1. Start
java -jar arthas-boot.jar
# 2. Choose the Java process to attach to: enter its number in the list that arthas-boot prints and press Enter (there is only one process here, listed as number 1)
1
# When the Arthas logo appears, Arthas has attached to the selected process
Common Commands#
help#
# Enter help to list the Arthas commands and their descriptions.
[arthas@421554]$ help
NAME DESCRIPTION
help Display Arthas Help
auth Authenticates the current session
keymap Display all the available keymap for the specified connection.
sc Search all the classes loaded by JVM
sm Search the method of classes loaded by JVM
classloader Show classloader info
jad Decompile class
getstatic Show the static field of a class
monitor Monitor method execution statistics, e.g. total/success/failure count, average rt, fail rate, etc.
stack Display the stack trace for the specified class and method
thread Display thread info, thread stack
trace Trace the execution time of specified method invocation.
watch Display the input/output parameter, return object, and thrown exception of specified method invocation
tt Time Tunnel
jvm Display the target JVM information
memory Display jvm memory info.
perfcounter Display the perf counter information.
ognl Execute ognl expression.
mc Memory compiler, compiles java files into bytecode and class files in memory.
redefine Redefine classes. @see Instrumentation#redefineClasses(ClassDefinition...)
retransform Retransform classes. @see Instrumentation#retransformClasses(Class...)
dashboard Overview of target jvm's thread, memory, gc, vm, tomcat info.
dump Dump class byte array from JVM
heapdump Heap dump
options View and change various Arthas options
cls Clear the screen
reset Reset all the enhanced classes
version Display Arthas version
session Display current session information
sysprop Display and change the system properties.
sysenv Display the system env.
vmoption Display, and update the vm diagnostic options.
logger Print logger info, and update the logger level
history Display command history
cat Concatenate and print files
base64 Encode and decode using Base64 representation
echo write arguments to the standard output
pwd Return working directory name
mbean Display the mbean information
grep grep command for pipes.
tee tee command for pipes.
profiler Async Profiler. https://github.com/jvm-profiling-tools/async-profiler
vmtool jvm tool
stop Stop/Shutdown Arthas server and exit the console.
jfr Java Flight Recorder Command
dashboard#
Dashboard: displays a real-time data panel for the current system. Without it, we can generally only view runtime information through Linux's built-in top command.
Type dashboard and press Enter to display information about the current process. Press Ctrl+C or enter q to stop.
The displayed information is roughly divided into three main sections:
- The top section is thread-related information
- The middle area is JVM memory-related information
- The bottom section is information about the Java runtime environment
For specific information in each column, please refer to the official documentation.
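To get a feel for where this kind of data can come from, note that the JDK exposes similar values through the standard java.lang.management beans. The following is a minimal sketch, not Arthas's implementation: the class name MiniDashboard and the map keys are hypothetical, and it collects only a few values for each of the three sections described above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.RuntimeMXBean;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Arthas.
final class MiniDashboard {

    /** Collects a small snapshot of the current JVM, keyed by made-up names. */
    static Map<String, Object> snapshot() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();

        Map<String, Object> out = new LinkedHashMap<>();
        // Thread-related information
        out.put("thread.count", threads.getThreadCount());
        out.put("thread.daemonCount", threads.getDaemonThreadCount());
        // JVM memory-related information
        out.put("heap.usedBytes", heap.getUsed());
        out.put("heap.maxBytes", heap.getMax()); // -1 when the maximum is undefined
        // Java runtime environment information
        out.put("java.version", System.getProperty("java.version"));
        out.put("os.name", System.getProperty("os.name"));
        out.put("uptimeMillis", runtime.getUptime());
        return out;
    }

    public static void main(String[] args) {
        snapshot().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The difference is that this code has to live inside the application, while dashboard attaches to an already running process.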
thread#
View information about the current threads and their stacks.
When there are no parameters, display the first page of thread information#
thread
By default, threads are sorted in descending order by CPU increment time (the CPU time each thread consumed during the sampling interval), and only the first page of data is displayed.
thread -n N, display the top N busiest threads and print their stacks#
thread -n N
thread --all, display all matching threads#
# Display all matching thread information. Sometimes you need to obtain all JVM thread data for analysis.
thread --all
thread id, display the running stack of the specified thread#
[arthas@421554]$ thread 1
"main" Id=1 TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at java.lang.Thread.sleep(Thread.java:342)
at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:386)
at demo.MathGame.main(MathGame.java:17)
thread -b, find the thread that is currently blocking other threads#
thread -b
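To make the idea of a blocking thread concrete, here is a minimal sketch that uses the JDK's standard ThreadMXBean to find the thread that the most BLOCKED threads are waiting on. It illustrates the general technique under my own assumptions and is not Arthas's code; BlockingThreadFinder, findMostBlockingThreadId and runDemo are hypothetical names. The sketch only counts threads that are blocked entering a synchronized monitor.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper for illustration; not part of Arthas.
final class BlockingThreadFinder {

    /** Returns the id of the thread that the most BLOCKED threads wait on, or -1 if there is none. */
    static long findMostBlockingThreadId() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, Integer> waitersByOwner = new HashMap<>();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            // Entries are null for threads that ended between the two calls.
            if (info != null
                    && info.getThreadState() == Thread.State.BLOCKED
                    && info.getLockOwnerId() != -1) {
                waitersByOwner.merge(info.getLockOwnerId(), 1, Integer::sum);
            }
        }
        long bestId = -1;
        int bestCount = 0;
        for (Map.Entry<Long, Integer> e : waitersByOwner.entrySet()) {
            if (e.getValue() > bestCount) {
                bestCount = e.getValue();
                bestId = e.getKey();
            }
        }
        return bestId;
    }

    /** Demo: one thread holds a monitor while another blocks on it. Returns {foundId, holderId}. */
    static long[] runDemo() {
        final Object lock = new Object();
        final CountDownLatch holding = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                holding.countDown();
                try {
                    release.await(); // keep the monitor until the demo is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // Nothing to do; this thread only needs to contend for the monitor.
            }
        }, "waiter");
        holder.setDaemon(true);
        waiter.setDaemon(true);
        try {
            holder.start();
            holding.await();
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10);
            }
            long found = findMostBlockingThreadId();
            long holderId = holder.getId();
            release.countDown();
            holder.join();
            waiter.join();
            return new long[] {found, holderId};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long[] ids = runDemo();
        System.out.println("blocking thread id = " + ids[0] + ", holder thread id = " + ids[1]);
    }
}
```

In the demo the two printed ids should match, because the holder thread owns the monitor that the waiter thread is blocked on.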
watch#
Observe invocations of the specified method.
You can observe the method's return value, its parameters, and the exceptions it throws, and you can write OGNL expressions to inspect the corresponding variables.
Observe the parameters, this object, and return value when the function call returns#
# The default observation expression is {params, target, returnObj}. The command below observes the parameters, the this object, and the return value when the function returns.
# -x sets the traversal depth of the printed result, i.e. how many levels of sub-objects are expanded; the maximum depth is 4.
[arthas@421554]$ watch demo.MathGame primeFactors -x 2
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 28 ms, listenerId: 2
method=demo.MathGame.primeFactors location=AtExit
ts=2023-05-10 00:25:42; [cost=0.106133ms] result=@ArrayList[
@Object[][
@Integer[1],
],
@MathGame[
random=@Random[java.util.Random@254989ff],
illegalArgumentCount=@Integer[85442],
],
@ArrayList[
@Integer[103],
@Integer[1667],
],
]
# Change the depth to 3
[arthas@421554]$ watch demo.MathGame primeFactors -x 3
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 26 ms, listenerId: 3
method=demo.MathGame.primeFactors location=AtExit
ts=2023-05-10 00:26:17; [cost=0.34344ms] result=@ArrayList[
@Object[][
@Integer[1],
],
@MathGame[
random=@Random[
serialVersionUID=@Long[3905348978240129619],
seed=@AtomicLong[97774455668942],
multiplier=@Long[25214903917],
addend=@Long[11],
mask=@Long[281474976710655],
DOUBLE_UNIT=@Double[1.1102230246251565E-16],
BadBound=@String[bound must be positive],
BadRange=@String[bound must be greater than origin],
BadSize=@String[size must be non-negative],
seedUniquifier=@AtomicLong[-3282039941672302964],
nextNextGaussian=@Double[0.0],
haveNextNextGaussian=@Boolean[false],
serialPersistentFields=@ObjectStreamField[][isEmpty=false;size=3],
unsafe=@Unsafe[sun.misc.Unsafe@5d099f62],
seedOffset=@Long[24],
],
illegalArgumentCount=@Integer[85459],
],
@ArrayList[
@Integer[7],
@Integer[21313],
],
]
Observe both before the function is called and after the function returns#
[arthas@421554]$ watch demo.MathGame primeFactors "{params,target,returnObj}" -x 2 -b -s
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 39 ms, listenerId: 6
method=demo.MathGame.primeFactors location=AtEnter
ts=2023-05-10 00:30:00; [cost=0.036373ms] result=@ArrayList[
@Object[][
@Integer[163405],
],
@MathGame[
random=@Random[java.util.Random@254989ff],
illegalArgumentCount=@Integer[85576],
],
null,
]
method=demo.MathGame.primeFactors location=AtExit
ts=2023-05-10 00:30:00; [cost=1.4721136326180643E10ms] result=@ArrayList[
@Object[][
@Integer[1],
],
@MathGame[
random=@Random[java.util.Random@254989ff],
illegalArgumentCount=@Integer[85576],
],
@ArrayList[
@Integer[5],
@Integer[11],
@Integer[2971],
],
]
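One detail in the output above is easy to miss: at AtEnter the parameter is 163405, but at AtExit it is 1. The values are captured at the observation point, so a method that reassigns its own parameter shows a different value on exit. The sketch below is my reconstruction of what primeFactors in the demo probably looks like. It is an assumption based on the observed output, not a copy of the Arthas source, and PrimeFactorsSketch and lastNumberAtExit are hypothetical names added to make the effect visible without Arthas.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction for illustration; not the actual demo source.
final class PrimeFactorsSketch {
    int illegalArgumentCount = 0;
    int lastNumberAtExit; // value of the parameter just before the method returns

    List<Integer> primeFactors(int number) {
        if (number < 2) {
            illegalArgumentCount++;
            throw new IllegalArgumentException("number is: " + number + ", need >= 2");
        }
        List<Integer> result = new ArrayList<>();
        int i = 2;
        while (i <= number) {
            if (number % i == 0) {
                result.add(i);
                number = number / i; // the parameter itself is overwritten here
                i = 2;
            } else {
                i++;
            }
        }
        lastNumberAtExit = number;
        return result;
    }

    public static void main(String[] args) {
        PrimeFactorsSketch game = new PrimeFactorsSketch();
        System.out.println(game.primeFactors(163405)); // prints [5, 11, 2971]
        System.out.println(game.lastNumberAtExit);     // prints 1
    }
}
```

The input 163405 is 5 × 11 × 2971, which matches the return value shown at AtExit, and the parameter has been divided down to 1 by the time the method returns.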
Observe the properties in the current object#
To view a field of the current object when the function runs, use the target keyword: target represents the current object, and target.field_name accesses a specific field of it. In the output below, the field is printed both when the method returns normally (AtExit) and when it throws an exception (AtExceptionExit).
[arthas@421554]$ watch demo.MathGame primeFactors 'target.illegalArgumentCount'
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 31 ms, listenerId: 7
method=demo.MathGame.primeFactors location=AtExit
ts=2023-05-10 00:33:50; [cost=0.081212ms] result=@Integer[85676]
method=demo.MathGame.primeFactors location=AtExceptionExit
ts=2023-05-10 00:33:51; [cost=0.102672ms] result=@Integer[85677]
trace#
Prints the internal call path of a method together with the time spent at each node on the path. Use it when a service call takes too long and you need to find the slow step.
[arthas@421554]$ trace demo.MathGame run
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 55 ms, listenerId: 9
`---ts=2023-05-11 00:24:38;thread_name=main;id=1;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@74a14482
`---[0.54719ms] demo.MathGame:run()
+---[20.76% 0.113574ms ] demo.MathGame:primeFactors() #24
`---[53.28% 0.29155ms ] demo.MathGame:print() #25
- In the output, #24 indicates that the primeFactors() function was called at line 24 of the source file.
- In the output, #25 indicates that the print() function was called at line 25 of the source file.
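Without a tool like trace, getting comparable numbers means wrapping each step in timing code by hand and redeploying. The sketch below shows that manual approach for comparison. ManualTrace, busyWork and the step labels are hypothetical, and the report format only loosely imitates the trace output above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of Arthas.
final class ManualTrace {
    private final Map<String, Long> nanosByStep = new LinkedHashMap<>();

    /** Runs the step and accumulates its elapsed time under the given label. */
    void time(String step, Runnable body) {
        long start = System.nanoTime();
        try {
            body.run();
        } finally {
            nanosByStep.merge(step, System.nanoTime() - start, Long::sum);
        }
    }

    /** Elapsed nanoseconds per step, in the order the steps were first timed. */
    Map<String, Long> timings() {
        return nanosByStep;
    }

    /** Formats each step as "[share% durationMs] label", relative to totalNanos. */
    String report(long totalNanos) {
        long total = Math.max(totalNanos, 1L); // avoid dividing by zero
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanosByStep.entrySet()) {
            sb.append(String.format(Locale.ROOT, "[%.2f%% %.6fms] %s%n",
                    100.0 * e.getValue() / total, e.getValue() / 1e6, e.getKey()));
        }
        return sb.toString();
    }

    /** Stand-in for real work so that the first step takes measurable time. */
    static long busyWork(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        ManualTrace trace = new ManualTrace();
        long start = System.nanoTime();
        trace.time("primeFactors()", () -> busyWork(200_000));
        trace.time("print()", () -> System.out.println("factors printed"));
        System.out.print(trace.report(System.nanoTime() - start));
    }
}
```

Every new measurement point here requires a code change and a restart, which is exactly the overhead that trace avoids.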
stack#
Outputs the call path that leads to the specified method. When a method is called from many places and we need to know where a particular invocation came from, this command lets us trace back to the caller.
[arthas@421554]$ stack demo.MathGame primeFactors
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 38 ms, listenerId: 12
ts=2023-05-11 00:32:43;thread_name=main;id=1;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@74a14482
@demo.MathGame.primeFactors()
at demo.MathGame.run(MathGame.java:24)
at demo.MathGame.main(MathGame.java:16)
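For comparison, the same kind of information can be obtained from inside the code by capturing a stack trace, at the cost of modifying and redeploying the code. A minimal sketch, assuming nothing about Arthas internals: CallPathSketch is a hypothetical class whose run() and primeFactors() methods only mimic the names in the demo.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Arthas or the demo.
final class CallPathSketch {

    /** Returns the frames that lead to the caller of this method, innermost first. */
    static List<String> currentCallPath() {
        StackTraceElement[] frames = new Throwable().getStackTrace();
        List<String> path = new ArrayList<>();
        // frames[0] is currentCallPath() itself, so start from its caller.
        for (int i = 1; i < frames.length; i++) {
            path.add(frames[i].getClassName() + "." + frames[i].getMethodName());
        }
        return path;
    }

    static List<String> primeFactors() {
        return currentCallPath();
    }

    static List<String> run() {
        return primeFactors();
    }

    public static void main(String[] args) {
        for (String frame : run()) {
            System.out.println("at " + frame);
        }
    }
}
```

The first two entries are primeFactors and run, mirroring the order in the stack output above; any further frames depend on how the program was launched.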
jad#
Decompiles the specified loaded class back to source code, which makes it easier to understand the business logic that is actually running online. The decompiled code is syntax-highlighted.
jad demo.MathGame